- 00:00 Introduction and Speaker Background
- 00:47 Poll Results and Audience Overview
- 01:09 Project Overview and AI in Security
- 04:00 Challenges in Security Engineering
- 07:33 Legacy Security Tools and Modernization
- 10:23 AI Solutions and Practical Examples
- 15:51 Data and Intelligence Framework
- 23:40 Demo and Workflow Simulation
- 26:31 Q&A and Future Directions
Introduction and Speaker Background
[00:00] Ammar Alim: My name is Ammar, and I am a senior manager at Adobe leading DevSecOps. I primarily lead two projects. One is around web application firewalling: figuring out defenses to protect web applications. The other is around testing our desktop products, the ones you're probably familiar with, like Photoshop. Those are C++ codebases that were built a long time ago, and they need to be looked at and tested. Luckily, we've been able to make more progress lately thanks to developments in the AI industry.
Poll Results and Audience Overview
[00:47] Jenn Gile: Right on. Just looking at our poll results, almost half of people said they would categorize themselves as a youngling: they can handle the basics. The next category up is our padawans, who know enough to be dangerous. Then we have our Jedi, and then a handful of people who are hands-off. So, great diversity of people here.
Project Overview and AI in Security
[01:09] Jenn Gile: I'll let you take it away and talk about the project your team has been working on at Adobe.
[01:12] Ammar Alim: Absolutely, I'll share my screen.
[01:34] Jenn Gile: We have it. I'm going to go ahead and hide, but I assure you I'm still here. I'll be keeping an eye on the Q&A.
[01:42] Ammar Alim: Thank you. Again, I'm very excited. First, I'm really honored to be here, because I'm a big fan of Adam's work, for example. So it's a bit like: how did I end up presenting alongside someone like Adam? Very honored. I'm also a practitioner like most of you. I'm someone who wakes up in the morning and does security nine to five, and I've always been part of smaller teams, which we'll talk about. I'd like to make the problem concrete.
Generally: why are we looking into using AI within security engineering? We'll pick a single example. That example is how to use the latest AI features to modernize some of the legacy security tools out there. These are tools that may have been around for 10 or 20 years and are signature-based or rule-based, like most security tooling. So let's get started.
If you're living right now, these are unprecedented, very interesting times. They'll go down in history like the discovery of electricity, putting cars on the highways, or even discovering fire. From maybe 2019 to 2030 a lot is going on: lots of discoveries, we're building data centers, and the way we work is changing. There's hype for sure, but there are also useful tools coming out of it that we should not ignore. That's my take: a balanced approach to AI. I'm not all in, but I'm carefully looking at what I can take out of it and use day to day to make my life easier, for the reasons coming up.
Challenges in Security Engineering
[04:00] Ammar Alim: So let's talk about our challenges as security practitioners. If you've been doing security for a long time, you know these, though maybe you haven't listed and talked through all of them. If you're new to security, these are the challenges you're going to face coming into this field. Adam and Jenn both talked about it, and it's what this conference, Lean AppSec, is all about: we're always lean, and we always have more information to process than people to process it.

So the first challenge is smaller, overloaded teams. You could be a SOC analyst with a lot of alerts. You could be an AppSec engineer dealing with false positives from legacy tools that scan your codebase and report things that are not true, or things you don't know how to sort through. On every team I've worked on, that was the case. We were not as well funded as other teams. We're a central team, and we cannot grow linearly with the rest of the organization; no one is going to allow that. This is the reality.
The second challenge is that software is eating the world: there's more software every day than the day before. Thanks to AI, anyone can now generate software, and experienced developers are outdoing their previous selves, maybe by 10x. This slide is just a subset of the products I need to secure, not even all of them. You've probably heard of some of these products, and some you've never heard of, but this is just a subset of what I have to worry about day to day.
Third, there are always more regulations; today there are more than the day before. It could be the EU being concerned about AI development, or additional updates to established frameworks like FedRAMP, PCI, you name it. There's always something new. You have to budget for this and allocate time to work on it. So not only are you overloaded and facing more software, there are also more regulations.
Then vulnerabilities. This is how I start my day: I go to Google, search security news, and this is what I get daily. It's a lot. There's a CVE every day, and there's an npm issue for you to work on every other week or month. Sometimes you know about an issue because someone detected it, and sometimes you don't, because no one detected it. So there are more vulnerabilities every day. We had about 40,000 documented CVEs last year. I don't even know how many we have this year, but it's probably more with all the AI, the MCP servers, and whatnot. These are our day-to-day problems. Now let's pick one thing we can use as an example, instead of a broad set of problems.
Legacy Security Tools and Modernization
[07:33] Ammar Alim: Most of our vendors were not created for this era; most were created maybe 20 or 30 years ago. Take web application firewalls, which is something I work on. WAFs are developed for all customers. They're not developed for your use cases, for your application specifically, or for the type of traffic you process, whether or not you handle credit card information. The WAF is built generically and you kind of have to use it, but it doesn't handle the last-mile problem. Amazon delivers a package to your door; the WAF isn't designed like that, so you have to figure out that last mile yourself.
Let's go deeper into these gaps. Take tuning. You have to tune a WAF, meaning you create a general set of rules for known issues. Imagine this: you're only able to detect and respond to things that were detected in the past. The WAF has a database of known issues. If a new issue emerges, you're out of luck until someone writes a rule for it. So you'll always have rules you don't need, or fewer than you need, and you don't know what you need. That's because the WAF isn't designed for your source code; it's designed for all the source code out there. You end up with false positives, meaning you can block a legitimate request just because the WAF got confused. Maybe a user was trying to reset their password and tried several times in quick succession. The WAF says, "This looks automated; I'm going to block it."
Also, the WAF does not have access to your source code, your threat model, or your application architecture; it doesn't understand any of that. The rules are static, not dynamic; they don't evolve. You need a WAF team that regularly reviews these things manually and updates the WAF accordingly. Say you have 400 web applications, or 200 in my case, like the slide of products I have to take care of. You cannot scale that with a small team. It's just not possible, and in the past we just gave up: we did our best, but that was it.
AI Solutions and Practical Examples
[10:23] Ammar Alim: So what's the solution? I think it's apparent: look at the developments in engineering and AI and see what you can do. Let's talk about some of the things we did internally. There are some low-cost or no-cost opportunities here, so let's pick a few, and we can do a demo if we have time.

Imagine this. In the past, we'd be told, "Hey, Photoshop is going live; they have a web application now," and we'd have to deploy the WAF. We had a general security baseline of rules that we deployed regardless. Maybe you don't have a SQL database, but we'd give you SQL injection rules anyway, because we just didn't know until we sorted through the code. Most WAF engineers don't know how to read a codebase. Even for developers, being told "you have to read 50 files of code" is an overwhelming task.
But an AI IDE like Cursor or Claude Code can read the code for you and map out all the public endpoints and their expected parameters. For example, if you have a public email endpoint that only accepts emails, we can figure that out. If we have a users endpoint that deletes, updates, and adds users, we know for a fact that it shouldn't do anything else. So we can create a rule smart enough to say: no one should be tinkering with this endpoint trying to add an email. It's a users endpoint, not an email endpoint, so block the request, because it doesn't make sense.

If you have access to one of those IDEs today, point it at one of your web applications. Ask it to build a JSON output of all the public endpoints, all the expected parameters, and all the interactions within your app. Now you have a context-aware file. Then you can say, "Hey IDE, based on this codebase, can you generate AWS WAF rules targeted just at this application?" You'll be surprised at what a good starting point you get, and that now becomes our baseline. We no longer deploy a generic set of baseline security rules; we generate very targeted web security rules.
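A minimal sketch of the endpoint-inventory idea, assuming the AI IDE has been asked to emit a JSON file with one entry per public endpoint. The JSON schema, the `check_request` helper, and the example paths are all hypothetical. In practice the inventory would be translated into WAF rules (for example AWS WAF rules) rather than checked in Python, but the logic is the same: anything the code does not expect gets blocked.

```python
import json

# Hypothetical inventory, in the shape an AI IDE might be asked to produce
# after reading a codebase. The schema is an assumption, not a standard.
INVENTORY_JSON = """
{
  "endpoints": [
    {"path": "/api/subscribe", "methods": ["POST"], "params": ["email"]},
    {"path": "/api/users", "methods": ["GET", "POST", "PUT", "DELETE"],
     "params": ["user_id", "name", "role"]}
  ]
}
"""


def load_inventory(raw: str) -> dict:
    """Index endpoints by path, storing allowed methods and params as sets."""
    data = json.loads(raw)
    return {
        ep["path"]: {"methods": set(ep["methods"]), "params": set(ep["params"])}
        for ep in data["endpoints"]
    }


def check_request(inventory: dict, path: str, method: str, params: dict) -> tuple:
    """Return ("ALLOW" or "BLOCK", reason), judged only against the inventory."""
    ep = inventory.get(path)
    if ep is None:
        return "BLOCK", f"unknown endpoint {path}"
    if method not in ep["methods"]:
        return "BLOCK", f"method {method} not expected on {path}"
    unexpected = set(params) - ep["params"]
    if unexpected:
        return "BLOCK", f"unexpected params {sorted(unexpected)} on {path}"
    return "ALLOW", "matches inventory"


inventory = load_inventory(INVENTORY_JSON)
# A normal update on the users endpoint is allowed...
print(check_request(inventory, "/api/users", "PUT", {"user_id": "42", "name": "Ana"}))
# ...but someone pushing an email parameter into it is blocked.
print(check_request(inventory, "/api/users", "POST", {"email": "x@example.com"}))
```

Blocking unknown endpoints outright is strict. A safer first deployment logs (counts) instead of blocking until the inventory has proven complete.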
There are other examples, and I won't go into detail on all of them, but let's talk about one more. Imagine you want to expand this system. You can't manually open an IDE and go through each codebase one at a time. That's a good starting point, but what if you want to generate rules based on your documented threat model? Maybe you want to generate a threat model dynamically using an AI agent. Maybe you have the application architecture diagrams and all of that good stuff, or the compliance frameworks. You can take all of this information and put it in a RAG database, a retrieval-augmented generation database. When you talk to an LLM, it then has access to a database with very recent and accurate information. It's not using its training data or data from the internet; it's using your internal data.
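The retrieval step of RAG can be sketched without any external services. This toy version scores a few hypothetical internal documents against the question using bag-of-words cosine similarity, then prepends the best matches to the prompt. A production RAG setup would typically use embeddings and a vector database instead, so treat the scoring here as a stand-in. The document names and contents are made up.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical internal documents; in practice these would be chunks of your
# threat model, architecture docs, and compliance notes.
DOCS = {
    "threat-model": "Attackers may abuse the password reset endpoint to enumerate users.",
    "architecture": "The users service exposes REST endpoints behind an AWS load balancer.",
    "compliance": "Cardholder data is never stored; payments go through a PCI-scoped vendor.",
}


def tokens(text: str) -> Counter:
    """Lowercased word counts: a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 2) -> List[str]:
    """Return the ids of the k documents most similar to the query."""
    q = tokens(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, tokens(DOCS[d])), reverse=True)
    return ranked[:k]


def build_prompt(question: str) -> str:
    """Ground the question in retrieved internal context before it reaches an LLM."""
    context = "\n".join(f"[{d}] {DOCS[d]}" for d in retrieve(question))
    return f"Use only this internal context:\n{context}\n\nQuestion: {question}"


print(build_prompt("Which WAF rules should protect the password reset endpoint?"))
```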
Why is this powerful? WAF vendors don't have access to any of this data; they give you generic WAF rules. With this, you can say, "Generate WAF rules based on my information: the application-specific threat model, the documentation, the codebase." Now you've added more context, and AI thrives on context. The more accurate and recent the data you give it, the better the results you get. But don't rely on the LLM to do all the work for you, because LLMs carry risks; we'll talk about those in a few slides.

AI can also prioritize rules dynamically. The WAF processes requests according to rule priority, so a priority-one rule is evaluated first, but AI can look at your traffic patterns as they change. This could be a custom ML model you built, which I'll talk about later, because in some cases you really have to do that. That's enough about the opportunities; alerting could be approached the same way, and so on.
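WAFs such as AWS WAF evaluate rules from the lowest numeric priority to the highest. Below is a minimal sketch of traffic-driven reprioritization, assuming per-rule hit counts are available from your logs. The `Rule` type, the pinning convention, and the rule names are hypothetical. The "busiest rules first" heuristic is just one possible policy, not necessarily the one used at Adobe.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int         # lower number is evaluated first, as in AWS WAF
    pinned: bool = False  # e.g. allowlists that must always run first


def reprioritize(rules: list, hits: dict) -> list:
    """Return new Rule objects ordered as follows:
    1. Pinned rules first, in their existing order.
    2. The rest by recent hit count (busiest first), ties broken by old priority.
    Priorities are renumbered 0..n-1; the input rules are not modified."""
    pinned = sorted((r for r in rules if r.pinned), key=lambda r: r.priority)
    others = sorted(
        (r for r in rules if not r.pinned),
        key=lambda r: (-hits.get(r.name, 0), r.priority),
    )
    return [replace(r, priority=i) for i, r in enumerate(pinned + others)]


rules = [
    Rule("allow-office-ips", 0, pinned=True),
    Rule("block-sqli", 1),
    Rule("block-scraper-bots", 2),
]
# Hypothetical hit counts from the last hour of traffic.
new_order = reprioritize(rules, {"block-scraper-bots": 500, "block-sqli": 3})
print([(r.name, r.priority) for r in new_order])
```

Reordering is only safe among rules that share the same action. If an allow rule and a block rule can both match a request, moving one ahead of the other changes the outcome, which is why allowlists are pinned here. Any new order should still be reviewed before it reaches production.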
Data and Intelligence Framework
[15:51] Ammar Alim: So the first thing you have to do when solving this problem is get the data. The hardest thing I'm dealing with now is curating clean, good data. With AI, the data is really the whole game: if you give AI bad data, for example to train a custom ML model, you'll end up with a bad solution.

So, for example, I take my source code, which WAF vendors don't have access to, and clean it up. I look at my architecture documents and threat model. I look at vulnerability scanning results, which are really important: dynamic scanning results, static code analysis, and the business logic.

I could also build an MCP server, which is just a way to give your LLM access to tools. If there's a complicated tool with an API, you don't have to master the API to talk to it. You can use natural language, like English, and tell the LLM, "Can you get me the last 10 bad IPs that were reported to us?" That data could live in a threat intel database you keep locally. So this is a use case for RAG and MCP.

Once all of this data is cleaned up and ready, you can feed it to an AI to make more than WAF decisions; there are many security decisions you can make from it. Maybe you want to harden the application by fixing bugs or detecting issues. There's a lot you can do based on the app, and since you don't have many engineers to do this manually, you might as well use AI, because it's much easier.
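The "last 10 bad IPs" request works when the LLM can call a narrow tool that queries your threat intel store. The function below is the kind of thing you would register as a tool on an MCP server. The MCP wiring itself is omitted, and the SQLite schema, function names, and sample IPs (taken from the RFC 5737 documentation ranges) are hypothetical.

```python
import sqlite3


def init_threat_intel(conn: sqlite3.Connection) -> None:
    """Create a hypothetical local threat-intel table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bad_ips ("
        " ip TEXT NOT NULL, reason TEXT, reported_at TEXT NOT NULL)"
    )


def get_last_bad_ips(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Return the most recently reported bad IPs, newest first.

    A narrow, read-only function like this is what you would expose as a tool,
    so the LLM never writes SQL itself."""
    limit = max(1, min(int(limit), 100))  # clamp model-supplied input
    rows = conn.execute(
        "SELECT ip, reason, reported_at FROM bad_ips "
        "ORDER BY reported_at DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [{"ip": ip, "reason": reason, "reported_at": ts} for ip, reason, ts in rows]


conn = sqlite3.connect(":memory:")
init_threat_intel(conn)
conn.executemany(
    "INSERT INTO bad_ips VALUES (?, ?, ?)",
    [
        # ISO-8601 UTC timestamps sort correctly as plain text.
        ("203.0.113.7", "credential stuffing", "2025-01-02T10:00:00Z"),
        ("198.51.100.23", "scraping", "2025-01-03T09:30:00Z"),
        ("192.0.2.44", "SQLi probes", "2025-01-01T08:15:00Z"),
    ],
)
print(get_last_bad_ips(conn, limit=2))
```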
Once you have the data, you have to build the intelligence. The intelligence is a collection of AI tools. It's clear to me that LLMs are not getting much better. Look at the difference between GPT-4 and GPT-5: despite a huge investment compared to GPT-4, we're not seeing a lot of gains. There are some gains in some areas, but at this point it's all about the tooling around LLMs: MCP servers, agents, RAG (retrieval-augmented generation) databases, and so on. So step one is to get the data. Step two is to build the intelligence framework, the brain that does all of this, and that really depends on the use case. Sometimes you can't use an LLM and have to build a custom, lightweight ML model.
For example, we decided to analyze WAF logs, or HTTP traffic, with a custom-built machine learning model, just a classical, traditional one. Depending on what you're trying to do, there are many algorithms to pick from: random forests, decision trees, and other options like clustering. You have to pick the right algorithm for the job, and I'll give you 20% of how to do ML in one slide. All training looks the same, and for most of it you don't need to be a machine learning expert. That would be nice, but I don't think you need to know all the math behind those algorithms, because many libraries now abstract it away. You just bring good, clean data and start exploring. Some of these models are lightweight enough to run on your laptop, so you can start learning today.
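Classical models need numbers, not raw log lines, so "bring good, clean data" starts with feature extraction. A minimal sketch, assuming a simplified, hypothetical log format. The six features and the attack regex are illustrative choices, not the features used in the talk.

```python
import re
from typing import List, Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical, simplified log format: <METHOD> <URL> <STATUS> "<USER-AGENT>"
LOG_RE = re.compile(r'^(?P<method>[A-Z]+) (?P<url>\S+) (?P<status>\d{3}) "(?P<ua>[^"]*)"$')
# A few illustrative attack markers; real feature sets are much richer.
SUSPICIOUS = re.compile(r"union\s+select|<script|\.\./|'\s*or\s*'1'\s*=\s*'1", re.IGNORECASE)
BOT_UA_MARKERS = ("curl", "python-requests", "bot")


def extract_features(line: str) -> Optional[List[float]]:
    """Turn one log line into a fixed-length numeric vector, or None if malformed.

    Features: [path_len, n_params, n_special_chars, is_error, bot_like_ua, suspicious]
    """
    m = LOG_RE.match(line.strip())
    if m is None:
        return None  # drop malformed lines rather than training on garbage
    parts = urlsplit(m["url"])
    params = parse_qs(parts.query)  # also URL-decodes the values
    values = " ".join(v for vals in params.values() for v in vals)
    ua = m["ua"].lower()
    return [
        float(len(parts.path)),
        float(len(params)),
        float(sum(not c.isalnum() for c in values)),
        1.0 if int(m["status"]) >= 400 else 0.0,
        1.0 if not ua or any(b in ua for b in BOT_UA_MARKERS) else 0.0,
        1.0 if SUSPICIOUS.search(values) else 0.0,
    ]


print(extract_features('GET /home 200 "Mozilla/5.0"'))
print(extract_features('GET /search?q=%27%20or%20%271%27%3D%271 403 "curl/8.4.0"'))
```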
So let's assume your system requires a custom ML model. You may have one system that can use LLMs and another that can't, because you just don't trust the data. In that case, you can either build a custom machine learning model or give the LLM access to your data through retrieval-augmented generation. All machine learning training looks like the three lines I've commented here.

First, import the model you want to use. In this case it's scikit-learn, and there's also TensorFlow and other libraries built by very smart people at companies like Google, Meta, and so on. They're used for very sophisticated things, including in security, and they're open source. So you import the model. Second, you instantiate it; that's line two here, which is commented. Hopefully it's big enough for you to see. Third, you train it, which means fitting the model. When people say "model," they really mean what you have after training. The algorithm is saying, "This is my understanding of the patterns in this data," and it gives you a file you can save locally. Anytime you ask it about new data, it should be able to figure it out.
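The three steps described above (import, instantiate, fit) look like this in scikit-learn, followed by saving the trained model to a file. The feature columns and the six training rows are made up for illustration. A random forest is just one of the algorithms mentioned; real training data would come from cleaned WAF logs and be far larger.

```python
import os
import tempfile

import joblib  # installed as a scikit-learn dependency

# 1. Import the model class.
from sklearn.ensemble import RandomForestClassifier

# Made-up, already-cleaned feature vectors (hypothetical columns):
# [path_len, n_params, n_special_chars, is_error, bot_like_ua, suspicious]
X = [
    [5, 0, 0, 0, 0, 0], [12, 2, 1, 0, 0, 0], [8, 1, 0, 0, 0, 0],   # human-like
    [7, 1, 7, 1, 1, 1], [30, 4, 15, 1, 1, 1], [9, 1, 6, 1, 1, 0],  # bot-like
]
y = [0, 0, 0, 1, 1, 1]  # 0 = human, 1 = bot/attack

# 2. Instantiate it (fixed random_state so runs are repeatable).
model = RandomForestClassifier(n_estimators=50, random_state=0)

# 3. Train ("fit") it. The fitted object is what people usually mean by "the model".
model.fit(X, y)

# Save the trained model to a file and load it back later to score new traffic.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "waf_traffic_model.joblib")
    joblib.dump(model, path)
    reloaded = joblib.load(path)

predictions = reloaded.predict([[5, 0, 0, 0, 0, 0], [30, 4, 15, 1, 1, 1]])
print(predictions)
```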
You typically need training data plus evaluation and testing data, and the evaluation data should be about one-third of the whole dataset. If the tests start passing, you use the model in production. But don't get too excited and just trust it; no one fully trusts AI. It can hallucinate. Its training data can be stale, since most of these models are trained on older data. It can also be nondeterministic. That's actually fine, because you can wrap a nondeterministic, probabilistic core in a very deterministic layer. The nondeterministic core would be your LLM, and the deterministic parts could be your corporate data, your source code, and so on.

AI can also get expensive if you're not careful. I really encourage responsible use of AI, partly because of the environmental impact: you're burning a lot of tokens. So understand how to save tokens and be efficient. In keeping with the theme, it's all about being lean, so be careful there. That said, cheaper, very small models are coming, like the tiny recursive models, which compete with heavy large LLMs. Things keep improving, costs keep going down, and we keep figuring out how to reduce hardware spending, so I'm hopeful there will be some breakthroughs within the next year.
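One way to read "nondeterministic core, deterministic wrapper" is to never apply an LLM-suggested rule directly and instead pass it through plain code checks first. A minimal sketch, assuming the LLM is prompted to return a JSON object with `name`, `path`, and `action` fields. That format, the allowed actions, and the known-path list are hypothetical.

```python
import json

ALLOWED_ACTIONS = {"BLOCK", "COUNT"}  # never let the model emit allow-everything rules
# Hypothetical list; in practice it would come from your own endpoint inventory.
KNOWN_PATHS = {"/api/users", "/api/subscribe", "/login"}


def validate_suggestion(raw: str) -> tuple:
    """Deterministic checks around a probabilistic LLM output.

    Returns (accepted, reason)."""
    try:
        rule = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(rule, dict):
        return False, "expected a JSON object"
    missing = {"name", "path", "action"} - set(rule)
    if missing:
        return False, f"missing fields {sorted(missing)}"
    if rule["action"] not in ALLOWED_ACTIONS:
        return False, f"action {rule['action']!r} not allowed"
    if rule["path"] not in KNOWN_PATHS:
        return False, f"path {rule['path']!r} not in inventory (possible hallucination)"
    return True, "ok"


print(validate_suggestion('{"name": "login-throttle", "path": "/login", "action": "BLOCK"}'))
print(validate_suggestion('{"name": "x", "path": "/admin/v2", "action": "BLOCK"}'))
```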
Demo and Workflow Simulation
[23:40] Ammar Alim: So, it's time for the demo. If we don't have time, we can go straight to questions.
[23:44] Jenn Gile: Yeah, I don't know if we'll have time. We've got about five minutes before Katie comes on, but we don't have anything in the Q&A yet, so if anybody has questions for Ammar, go ahead and put them in.
[23:59] Jenn Gile: And Ammar, if you want to bring up your demo and there are a couple of things you want to show, go ahead. When we went through it, I thought even just seeing the sequence of tasks that run, without watching them run, would be super interesting to this group.
[24:11] Ammar Alim: Yeah, sure. I actually ran it already, so we don't have to wait for it. Nice. So this is a simulation of one of the workflows we have. This came up in an interview, and we decided to just start messing with it.
For example, let's assume we have traffic where red represents bad traffic and green represents human traffic. Now we need to train a custom ML model to distinguish between them. The red is bot traffic: bots are now trying to scrape your data for training, or they could be doing e-commerce abuse. So you train a model. Don't worry, this is a very small model on a small dataset, so it trains very fast. That was fast; the model is trained.

The second step is to ask, "LLM, could you look at the outcome of this training and analyze the WAF logs based on it?" It does that quickly and suggests a few rules; here it suggested four. Based on those suggestions, I can generate Terraform. I have a Terraform pipeline and a RAG that knows what's already deployed, so I don't recreate a rule I've already written. The RAG can say, "You already did this; why are you doing it again?"

Once all of that is done, I deploy, but not to production immediately. You need a human in the loop: someone still has to say, "Let's put this in staging." You can automate that with an agent working behind the scenes. Once it's working in staging, isn't blocking legitimate traffic, and isn't doing anything crazy, you can promote it to production. You can also do a little more testing there before enforcing. That's all I wanted to show. This is one example; there's a lot more you can do this way.
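The demo's flow (suggest rules, check what's already deployed, stage, then promote with a human sign-off) can be sketched as a small state machine. Everything here is hypothetical: the rule fields, the fingerprint format, and the promotion criteria. The Terraform generation and the RAG lookup are reduced to a set of fingerprints of rules already in production.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class RulePipeline:
    """Hypothetical staging gate for LLM-suggested WAF rules."""
    deployed: Set[str] = field(default_factory=set)  # fingerprints of rules in production
    staging: List[dict] = field(default_factory=list)

    @staticmethod
    def fingerprint(rule: dict) -> str:
        # Same path, match condition, and action means the same rule,
        # whatever name the LLM gave it.
        return f"{rule['path']}|{rule['match']}|{rule['action']}"

    def propose(self, suggestions: List[dict]) -> List[dict]:
        """Drop suggestions that duplicate production or staging; stage the rest."""
        known = self.deployed | {self.fingerprint(r) for r in self.staging}
        new = []
        for rule in suggestions:
            fp = self.fingerprint(rule)
            if fp not in known:
                new.append(rule)
                known.add(fp)
        self.staging.extend(new)
        return new

    def promote(self, rule: dict, approved_by: Optional[str], legit_blocked: int) -> bool:
        """Promote to production only with a named human approver and
        no legitimate traffic blocked while the rule ran in staging."""
        if approved_by is None or legit_blocked > 0 or rule not in self.staging:
            return False
        self.staging.remove(rule)
        self.deployed.add(self.fingerprint(rule))
        return True


pipeline = RulePipeline(deployed={"/login|rate>100/5min|BLOCK"})
suggested = [
    {"name": "login-throttle", "path": "/login", "match": "rate>100/5min", "action": "BLOCK"},
    {"name": "users-no-email", "path": "/api/users", "match": "param:email", "action": "BLOCK"},
]
staged = pipeline.propose(suggested)  # only the /api/users rule is new
print([r["name"] for r in staged])
print(pipeline.promote(staged[0], approved_by="on-call analyst", legit_blocked=0))
```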
Q&A and Future Directions
[26:31] Jenn Gile: While we're waiting to see if more questions come in, Ammar, talk a bit about how you're validating this internally. How will you decide when it's time to launch it as a tool at Adobe?
[26:47] Ammar Alim: Yeah. I think the most interesting and difficult part of building anything with AI is evals: evaluating what your AI is doing. AI is nondeterministic, so you always tune it with more deterministic input. You typically start purely nondeterministic. You say, "I'm going to rely on an LLM that has access to nothing but my source code, for example. Analyze this and tell me." That gets you maybe 80% of the way there. Then you start adding more context, but be careful with it. AI loves context, but too much of it can be confusing and can start making it hallucinate. So you explore that, and you try techniques like prompt chaining; there are many techniques out there now that you have to try. Then you test and record your results. You're ready for production once you're confident that whatever you settled on is consistent: the same or similar input produces the same output. So you need a systematic process; you can't just skip testing your workflow.
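A minimal eval harness for this kind of consistency check, assuming the workflow can be called as a function that returns a decision string. It runs every labeled case several times and reports two numbers: the accuracy of the majority answer and the share of cases whose answer never changed. The two stand-in classifiers are hypothetical. They show why accuracy alone can hide an unstable workflow.

```python
import itertools
from collections import Counter
from typing import Callable, List, Tuple


def evaluate(run: Callable[[str], str], cases: List[Tuple[str, str]], trials: int = 5) -> dict:
    """Run each (input, expected) case `trials` times.

    accuracy:    share of cases whose most common answer equals the expected one
    consistency: share of cases that got the identical answer on every trial
    """
    correct = stable = 0
    for text, expected in cases:
        answers = Counter(run(text) for _ in range(trials))
        majority, _count = answers.most_common(1)[0]
        correct += majority == expected
        stable += len(answers) == 1
    return {"accuracy": correct / len(cases), "consistency": stable / len(cases)}


# Stand-ins for a real workflow; a real harness would call your LLM pipeline.
def steady(text: str) -> str:
    return "BLOCK" if "union select" in text.lower() else "ALLOW"


_flip = itertools.cycle(["BLOCK", "ALLOW"])


def flaky(text: str) -> str:
    return next(_flip)  # ignores its input and alternates answers


cases = [("id=1 UNION SELECT password FROM users", "BLOCK"), ("id=1", "ALLOW")]
steady_report = evaluate(steady, cases)  # {'accuracy': 1.0, 'consistency': 1.0}
flaky_report = evaluate(flaky, cases)    # {'accuracy': 1.0, 'consistency': 0.0}
print(steady_report, flaky_report)
```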
[28:15] Jenn Gile: We have a question from Katie. Knowing what Katie does for a living, I'm not surprised she's asking this. She says, "How do you evaluate build versus buy as time goes on and WAF vendors potentially advance their own capabilities?" You and I have talked about this a little, so I'll add a question: do you think WAF vendors will ever replace what you've built in this workflow? How do you decide where to spend your time versus your money?
[28:47] Ammar Alim: Yeah, sure. I'm actually passionate about this area, build versus buy. I come from engineering, and I think engineers are really good at making those decisions because they build a lot. In security, we don't build a lot. I think that's changing: people in security are building more, and engineering is coming to security, but it's not distributed equally. So we don't have that muscle yet.

I don't encourage you to think a lot about building. Instead, think about the many existing tools you already have; no company I've worked for lacked security tools. Solve their gaps. I found the WAF gaps and solved around them. Any tool you have today has gaps, so find the last-mile problem and solve it. That's how you start building the muscle. Once you have it, you can say, "I really don't need to buy a tool for this."
For WAF, you have three options. One, build your own, which is a huge undertaking. Two, buy a SaaS product with out-of-the-box support and whatnot, which still has limitations. Three, use open source, like ModSecurity. Maybe start there in the middle, rather than at either extreme, and see what you can do with open source. Then you can decide whether the open source is too high-maintenance or whether you can keep up with the maintenance using AI. If it's too high-maintenance, go buy the tool. It's always about money, so focus on the money. How much will I spend if I buy it, versus how much will I spend on engineering hours if I build it myself? Once you have the numbers, make the decision based on the numbers, not on anything else.
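The "decide on the numbers" advice amounts to comparing yearly costs. A minimal sketch with made-up figures (hourly rate, hours, license price); plug in your own. Note that a bought tool still costs engineering hours to tune and integrate.

```python
HOURLY_RATE = 100.0  # hypothetical fully loaded cost of an engineering hour (USD)


def annual_cost(license_per_year: float, eng_hours_per_month: float,
                infra_per_year: float = 0.0, hourly_rate: float = HOURLY_RATE) -> float:
    """Yearly cost = license + engineering time + infrastructure."""
    return license_per_year + eng_hours_per_month * 12 * hourly_rate + infra_per_year


# Made-up numbers for the three WAF options.
options = {
    "build": annual_cost(0, eng_hours_per_month=60, infra_per_year=6_000),
    "open_source": annual_cost(0, eng_hours_per_month=25, infra_per_year=6_000),
    "buy_saas": annual_cost(80_000, eng_hours_per_month=10),
}
cheapest = min(options, key=options.get)
print(options, cheapest)
```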
[30:57] Jenn Gile: Remind me of the question again. Oh: will there ever be a situation where a WAF vendor could do this for you? And I'll add to that. I know you've said Adobe uses several different WAF vendors. Do you see a future where you'd drop this workflow and offload it to a vendor? Or do you not see that being possible, given the intentional choice to have tools for all?
[31:23] Ammar Alim: Yeah. First, no company is going to share its source code, its intellectual property, with vendors, so that's a limitation for vendors. We're also not going to keep a vendor updated on our internal threat model; if that leaks, there's a lot at stake. There are many things I don't want to share with a vendor, and that gives internal teams an advantage.

Second, vendors do use some machine learning, but it's built for the masses. It's trained on internet data, not on my Photoshop traffic. I think I gave this analogy before: would you rather be paid based on the average cost of living in Washington state or in Seattle? You'd probably want Seattle, because it's the expensive city. So I want to build a solution that's custom for each application, and the vendor isn't going to do that for me. It's not cost-efficient for them, and they don't have time for it. They'll solve for most people, and I'll take that solution and bring it to each of my teams, because that's what I'm hired to do. Context matters in security, and AI also thrives on context. ML is pattern recognition on steroids. The more data you give it, and the cleaner and more custom that data is, the better the solution you'll get.
[33:03] Jenn Gile: One last question. First, there was a question about sharing slides. After this event wraps up, we'll publish a blog version of the talk with the video, a transcript, and a PDF copy of Ammar's slides, all available on leanapsac.com. The second part of the question, Ammar: there's some excitement about what your team has done, and curiosity about whether there's a GitHub repo or a guide to how you accomplished this, so others can learn from you. What are you thinking in terms of sharing this with the world?
[33:46] Ammar Alim: That's good. Some of this is still in motion; it's not ready yet. It's a nondeterministic tool, so I'll probably need to figure out the disclaimers. You'd have to adjust the tool, or use it as is, depending on your use case. But we're happy to open source a demo version that you can expand from; I think we can do that. Just give us some time. Find me on LinkedIn and message me, and I'll work something out with whoever is excited about this. It would be nice to get more ideas from people and think about how we open source this.
[34:38] Jenn Gile: I love it. That's what this community is all about: connecting people.