Project Overview
In my continuing interest in cybersecurity, I embarked on researching the end-to-end deployment of a DevSecOps strategy. I had to start somewhere, so while this may not be 100% the “best” way, it certainly is “a” way and it reflects my self-taught journey in understanding this complex field.
I also know a plan like this can be unique to the structure of the business, the readiness of the leaders to push a plan like this and the pitfalls of security as a whole.
In my research I found some pretty amazing documentation, and I included some of it at the bottom. Some of what I found, though, was written by businesses pushing their paid solutions, which, as anyone in tech knows, can read more like a sales document than an assistive one, even with the best intentions.
Approach & Plan
Initial Plan
The deployment idea I had must provide actionable information using these ideas as the baseline:
Documentation - document information related to setup, education, logging, methodology, etc. If a setup isn’t documented well, it may fall by the wayside as too complicated and siloed to be effective.
Education - developers are told to shift left EVERYTHING. To help, I wanted to make sure there were good resources so they aren’t wasting valuable time learning how to fix a vulnerable piece of code. Instead, they can click a link, learn more about it right away and move on to the next ticket or feature.
Logging - secure code comes from finding what is wrong. That’s done by being able to log what happened. We don’t know what the effect is till it’s in staging or production. That’s why logging is important to see what’s going on.
Dashboards - health dashboards are important, but they also require you to know what metrics are important to you. What’s important to you may not be important to a DevOps Engineer or Cloud Engineer. As a Systems Engineer turned DevSecOps Engineer, my purpose was creating a system that developers, support agents, or managers could use to create dashboards that fit their requirements and find what’s important to them.
Alerting - breaches are only good if we know about them. Well, they’re never good, but you get what I mean. If they are screaming at you, at least you can fix them. Alerts are also different for each person: an API Engineer may be interested in API error rate or fuzz scanning, while a security engineer is interested in failed login attempts. What I learned is to find out what the general security vulnerabilities are, set up a solid alerting system, then work with the devs to identify which metrics would be problematic if they failed and create alerts based on those.
Protocols - what are the protocols if something bad is found? Say a security engineer finds a vulnerability in an API, how do they report that to the development team? Through a ticket? How is that assigned and classified? What’s the CVE or EPSS score, or even the dynamic risk assessment for the current environment? Or worse, what happens if a customer finds a vulnerability? Is management’s first response to throw the legal team at them? Or will you be willing to fix it and appreciate the report? Do you have a bug bounty program?
Compliance - what are the base requirements to ensure secure coding from a firm, legal or policy standpoint? These should be technical safeguards, but that may not be possible, so you move to administrative safeguards, which, while not as strong, should be well documented so devs know what to expect. Also, how do you report compliance-related issues? If there was a breach, what’s the correct path to shut down the offending service and report it to compliance so they can alert the correct legal entities, divisions, or customers if they were affected?
Mapping - A system is only as good as what you know is out there. Is there shadow IT happening? How do you provide enough flexibility to devs while maintaining a secure environment? Enabling some mapping tools lets you find what’s out there while also making sure some shadow system doesn’t get introduced into the pipeline, causing other security vulnerabilities.
Tools - Tools, tools, tools. So many you don’t even know where to start. I’ve always started this process by asking: what’s the overhead to set it up and maintain it, will it be used after the first month, what’s the impact if it had to go away because of cost, and will I be the only champion of the tool? Depending on your answers, you may find it’s not worth implementing. Either ask for more budget or person-hours to manage it, or tell management it’s not possible to implement that tool or set of tools.
Internal Pen Testing (Blue, Red, Purple) - While important, it may not be necessary. I did this more by game than by actual pen testing. I used Backdoors & Breaches and an hour with some of the dev & cloud team to share improvements to methodologies, thoughts and workflows to improve the team’s security stance.
Phew, OK, the basic idea and thoughts are pretty much done. I’m sure there are other improvements to be made, but that’s where I ended with the overview of the plan.
Approach
There were two approaches I was thinking about.
The first approach (bigger team, more administrative tooling) fits well if I had a security team that can handle an increase in requests and can be a bit more hands-on and less automated. I’m not saying automation won’t be good for efficiency, but I still wanted to make sure I could handle those out-of-cycle requests without overloading people.
The second approach (smaller team & more automated tooling) fits well if I had a smaller team. The downside is I would need to focus on specific responses and a narrower end goal since I wouldn’t be able to handle all requests even if I had automated a significant part of the plan.
A completely automated system, while efficient, can’t respond to out of cycle requests very well and the end result would likely turn into tool fatigue or lack of actionable improvements. It’s a balance.
I wanted to make something simple with low overhead for the developers, as the world of developers is all “shifting left”. While “shifting everything left” may not be typical of a dev team, I’ve seen it a few times. It’s difficult for a developer to know all the ins and outs of a codebase along with the best known technical, security and implementation methods. Developers typically want to just make it work so they can close out the feature request, rather than belabor a security point that may or may not be vulnerable.
This is especially true as generative AI becomes more mainstream with developers, who can import libraries AND have generative AI create code for them without realizing what vulnerabilities they are importing. While tools aren’t going to make the code better or more secure, they can at least catch some issues that come from both generative tools and human error.
Administrative Controls
So, because of the “shift left” approach that many businesses are implementing, where the devs were implementing their own fixes, I wanted to make sure the administrative controls were less onerous.
My first step was to get the teams on my side, understanding what I was trying to achieve and what I expected of them. I started by requesting a security champion from each of the teams, sharing what my expectation was and how I was planning to improve DevSecOps.
It included:
Additional security training.
Being my main point of contact when deploying security related changes.
Promoting security best practices and learning of new information in the security world.
Other administrative controls I had either successfully implemented or was planning to implement include:
Tagging security related changes in the ticket management system and knowledge base when designing or considering new features.
Formal security-related discussions when adding a new product, including what process the team will be following, what infrastructure and language will be used, and what the potential issues with them are.
Writing up if and why a risk was accepted, including its controlling factors, and having any accepted risk approved by the manager and the security champion.
Defining the expected response to a critical or high CVE or EPSS score with its related dynamic assessment, including whether it will be fixed or added to the suppressed list, why it was accepted, and the risk mitigation plans, if any.
Template creation in the Git Repo. My goal was so developers could pick a project type they were working on and get started with all the required security features enabled.
Public - Made for projects that were to be deployed to either customers or public facing locations.
Internal - Made for projects that were not to be used or interact with customer data, but still needed some controls in place to maintain security.
Private - Made for projects that were not to be used in any customer-facing project, either test projects or personal projects.
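The risk acceptance write-up described above (reason, controlling factors, sign-off by the manager and the security champion) can be sketched as a small record. This is a minimal illustration only: the class name, field names and role strings are hypothetical and not taken from any specific ticketing tool.

```python
from dataclasses import dataclass, field

# Hypothetical role names; the two required sign-offs come from the text above.
REQUIRED_ROLES = frozenset({"manager", "security_champion"})


@dataclass
class RiskAcceptance:
    """Illustrative record of an accepted risk."""
    finding_id: str
    reason: str
    controlling_factors: list[str]
    approvals: set[str] = field(default_factory=set)

    def approve(self, role: str) -> None:
        # Reject sign-offs from roles that are not part of the process.
        if role not in REQUIRED_ROLES:
            raise ValueError(f"unknown approver role: {role}")
        self.approvals.add(role)

    def is_valid(self) -> bool:
        # Valid only with a written reason and every required sign-off.
        return bool(self.reason.strip()) and REQUIRED_ROLES <= self.approvals
```

A record like this would only count as accepted once both roles have approved it, which keeps the "why" and the "who" attached to the finding.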
There were other improvements I was making, but this was the start. I wasn’t trying to make it this burdensome requirement, more beginning the discussion on what administrative controls could be implemented and providing teams with
Technical Controls
The tool selection depended heavily on what goal I had in mind. For my deployment, I needed to focus on a smaller team with more automated, self-serve tooling, so the developers could act on the information themselves without needing a security engineer or analyst to report the vulnerability to them.
Tools
First, I needed to find out what I was going to cover. There are so many areas to security that I had to start small and grow. Here’s the full list of categories I wanted to cover, and some tools I either tested or researched for a DevSecOps or general security deployment plan.
Category | Tools |
---|---|
Linting | Sonar Linting, GitLab Clean as you Code, Trivy VS Code |
SAST | GitLab SAST, SonarQube, Semgrep |
DAST | GitLab DAST, OWASP ZAP |
Container Scanning | GitLab Container Scanning, Aqua Security |
Infrastructure Scanning | KICS, Trivy |
Kubernetes | See Below |
SOAR | Rapid7, Splunk, Kondukto |
ASOC | DefectDojo |
RBAC Controls | Azure AD |
PAM Tools | CyberArk, Azure PIM |
SIEM | Wazuh, Elasticsearch |
Logging | Azure Monitor, Log Analytics |
API Fuzzing | GitLab Web API Fuzz Testing |
PaC | Open Policy Agent |
This list is not exhaustive. There are so many components and tools in security that it’s hard to list them all. Once I got a basic list, I focused on what was important.
Additional Controls
Some additional controls I wanted across the environment were:
Only Critical and High vulnerabilities would be a blocker for developers. If a Critical or High vulnerability is found, the merge would be blocked until it was resolved or risk accepted. The risk would be based on the individual, team or product unit, based on where the vulnerability was found.
Because I typically knew who pushed the code, I could notify a Slack channel tagging the developer with the information on how to fix the issue.
I could also post the result to Kondukto and manage the request there, or create a ticket in Jira that needed to be closed, which would auto-update Kondukto when the ticket was completed.
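The blocking rule above can be sketched in a few lines. This is a minimal sketch, assuming findings arrive as dictionaries with `id`, `severity` and `help_url` keys; those key names, the function names and the message wording are hypothetical, and actually posting to Slack, Kondukto or Jira is left out.

```python
# Only these severities block a merge, per the control described above.
BLOCKING_SEVERITIES = {"critical", "high"}


def blocking_findings(findings, accepted_ids):
    """Return Critical/High findings that have not been risk accepted."""
    return [
        f for f in findings
        if f["severity"].lower() in BLOCKING_SEVERITIES
        and f["id"] not in accepted_ids
    ]


def merge_allowed(findings, accepted_ids=frozenset()):
    """A merge is allowed only when nothing blocking remains."""
    return not blocking_findings(findings, accepted_ids)


def slack_message(author, finding):
    """Build the notification text (sending it is out of scope here)."""
    return (
        f"@{author}: {finding['severity'].upper()} finding {finding['id']} "
        f"is blocking the merge. How to fix: {finding['help_url']}"
    )
```

Medium and low findings still show up in the results, but they never stop the merge, which keeps the gate from becoming noise for developers.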
Other Thoughts
Because of the potential of false positives, I had to think about how I’d solve large amounts of failures. I was thinking I could assign a certain target reduction per month until the initial batch was completed, then do the same thing when new languages or new products were introduced.
I wanted to make sure I introduced it early enough in the new development workflow so developers were thinking of it as soon as development started.
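The monthly target reduction idea comes down to simple arithmetic. Here is a minimal sketch, assuming the initial backlog is split evenly across a fixed number of months; the function name and the even-split rule are assumptions for illustration, and a real plan would probably weight by severity.

```python
import math


def monthly_targets(backlog: int, months: int) -> list[int]:
    """Return the number of open findings allowed at the end of each month."""
    if months <= 0:
        raise ValueError("months must be positive")
    # Round up so the backlog reaches zero by the final month.
    per_month = math.ceil(backlog / months)
    targets = []
    remaining = backlog
    for _ in range(months):
        remaining = max(0, remaining - per_month)
        targets.append(remaining)
    return targets
```

The same function could be rerun with a fresh backlog count whenever a new language or product is onboarded.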
Deployment
Kondukto
I needed to find a way to link the tools I was going to deploy across the environment into one central location. Having all these tools is great, but if it’s a pain to access, or requires extensive training on what to do when you get an alert, it wasn’t going to be well accepted, and devs would find ways around it rather than embrace it.
After some significant research, I settled on Kondukto, because it had the right balance of tooling, training, education, and controls in place that would allow a successful deployment.
Now, I couldn’t really pin down what category Kondukto falls under. Some see it as a Security Orchestration, Automation, and Response (SOAR) tool, but it also kinda fits in the ASOC category. Let’s settle on it being a Security Orchestration and Vulnerability Management tool… if that’s a thing.
Some things that stuck out to me about Kondukto:
Ability to import free or paid scanner vulnerability reports from multiple scanners that could paint a better picture of the vulnerabilities or issues in the code.
Deduplication of vulnerabilities (e.g., CVEs) during scanning, combined with EPSS scores to generate a dynamic risk assessment score based on the currently deployed code.
Automatic vulnerability routing to the correct team.
Allowing the team lead or security lead to sign off on vulnerabilities that are marked won’t fix or where the risk is accepted.
Policy enforcement for all projects as a whole.
Remediation playbooks so devs don’t have to hunt around for how to resolve issues.
Training for why a vulnerability is important to fix, in case the dev hasn’t come across that type of vulnerability.
Consistent scanning, so newly found vulnerabilities are reported and show up as a ticket.
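To make the deduplication and dynamic risk ideas from the list above concrete, here is a rough sketch. It is not how Kondukto actually computes anything; the dictionary keys, the exposure weights (loosely borrowed from the Public/Internal/Private template idea) and the scoring formula are all assumptions for illustration.

```python
# Hypothetical weights: how exposed the deployed code is.
EXPOSURE_WEIGHT = {"public": 1.0, "internal": 0.6, "private": 0.3}


def deduplicate(findings):
    """Merge findings that share a CVE and component across scanners."""
    merged = {}
    for f in findings:
        key = (f["cve"], f["component"])
        entry = merged.setdefault(key, {**f, "scanners": set()})
        entry["scanners"].add(f["scanner"])
        # Keep the highest CVSS reported by any scanner.
        entry["cvss"] = max(entry["cvss"], f["cvss"])
    return list(merged.values())


def risk_score(cvss, epss, exposure):
    """Illustrative 0-100 score: severity x exploit likelihood x exposure."""
    return round((cvss / 10.0) * epss * EXPOSURE_WEIGHT[exposure] * 100, 1)
```

The point of the sketch is the shape of the idea: the same CVE reported by two scanners becomes one item, and the same CVE scores lower on a private test project than on a public-facing one.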
Man, the list goes on. It’s probably one of my favorite tools, with what seems like a decent amount of flexibility. Unfortunately, I only got to talk to the sales reps and wasn’t able to use the tool personally.
Either way, I knew Kondukto would be the way I’d go if I needed to implement a system like this again.
One of the downsides with this tool was its inability to scan Kubernetes systems. It may have improved since when I got an intro, but that was something else I’d have to figure out.
Linters & Continued Education
I think linters and continued education are an important part of the security pipeline. The earlier you can catch an issue, the more productive developers can be so they aren’t wasting their time on failed PRs and fixes that shouldn’t have made it into the pipeline in the first place.
Here’s what I learned:
Enable linters that are related to the codebase the developers are working on. This isn’t an exhaustive list, just a few options.
SonarLint
Trivy VSCode
Semgrep
ESLint
Continued education could lead to fewer code issues. Requesting some recommended reading or guiding devs to resources that could improve code quality could lead to fewer mistakes and a more productive dev.
While linters and continued education are typically implemented as separate initiatives, monitoring their usage and stats can provide valuable metrics on the number of issues that reach the pull request stage. The fewer issues that reach the PR stage, the better the coding (in theory).
SAST
The thing with SAST is there are so many options and tools, including tools specific to certain types of codebases (as with many of the scanning tools). Either way, SAST findings relate to individual contributors, so they can be resolved by them.
Here’s what I learned:
Some of my research led me to believe that it would be better to have multiple scanners as some scanners picked up some things and others would pick up other things. I knew that I would go one by one if I did add additional scan tools as I thought it would be too much complication for what would be marginal gains.
Some scanners are better at some languages than others, so pick the ones that work for the language being developed in. This would require some research into what each repo was developed in, then picking a scanner or group of scanners that could work for most of the codebase in the company’s git repo. It does make things difficult because there’s now tool fragmentation, but it may work in the long run. I’d only apply it to the top 3 languages first, then expand after that.
For SAST, because it was based on the individual contribution of the developers, those developers were to be the ones who fixed the findings, or potentially had their manager sign off on accepting the risk. Now, I know it could get overwhelming, so there would have to be some balancing work to do here.
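Finding the top 3 languages mentioned above is a quick counting exercise once you have per-repo language data. A minimal sketch, assuming you already have a mapping of repo name to bytes of code per language (how you collect that from your Git host is not shown, and the function name is hypothetical):

```python
from collections import Counter


def top_languages(repo_languages, n=3):
    """repo_languages maps repo name -> {language: bytes of code}."""
    totals = Counter()
    for languages in repo_languages.values():
        # Counter.update adds the per-language byte counts together.
        totals.update(languages)
    return [lang for lang, _ in totals.most_common(n)]
```

With that short list in hand, scanner selection can start with whatever covers those languages best and expand later.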
Container Scanning
Container scanning is based on the team’s deployment, so it typically requires more team-based actions.
Here’s what I learned:
Scans happen after the builds are complete and are either in staging or on a dev-test system.
Scans should be continuous, meaning they would happen first on release, then in the container registry on a regular basis, before release to the customer, then continuously until the container is no longer in use. This moves more toward runtime scans, and from what I found continuous scanning should also be done in the container orchestration system at runtime, but that’s a detail for another system.
Container scanning can also help with secret detection and other network based vulnerabilities.
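The "keep scanning until the container is no longer in use" idea can be sketched as a simple due-for-rescan check. This is only an illustration: the image dictionary keys, the one-day default interval and the function name are assumptions, and the actual scan would be handed off to whatever scanner is in place.

```python
from datetime import datetime, timedelta


def images_due_for_rescan(images, now, max_age=timedelta(days=1)):
    """Return names of in-use images whose last scan is missing or too old."""
    due = []
    for img in images:
        if not img["in_use"]:
            continue  # retired images drop out of the schedule
        last = img["last_scanned"]
        if last is None or now - last >= max_age:
            due.append(img["name"])
    return due
```

Running a check like this on a schedule against the registry inventory is one way to make sure an image that was clean at release still gets rechecked as new vulnerabilities are published.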
DAST Scanning
DAST Scanning is more of a finished, running product scan, so usually is done near final deployment. This means that in a perfect world DevSecOps or AppSec would be responsible for ensuring the devs got what they needed to fix the issue. In my case, I’d have to automate and make it easy for teams to gather feedback on the DAST Scan, especially for smaller teams or if there are no DevSecOps teams able to handle the bulk of that load.
Here’s what I learned:
While similar to container scanning in that it is usually done near the end of the development cycle, DAST scanning runs against the live application, checking the security of the app while it is running.
I wanted to make sure all customer-facing products were scanned, so the requirement would be that all customer facing products have DAST Scanning.
DAST scanning is a pretty big beast of its own, with lots of things it can scan for. Since it actively interacts with servers and can slow them down, it would likely need to be put on its own server so it doesn’t cause failures in an active deployment.
Infrastructure Scanning
This one I didn’t get too far into the weeds on. I’m sure there’s more that could be said, but this is mainly to satisfy devs using templates to start infrastructure requirements rather than in-depth infrastructure design.
Here’s what I learned:
Any IaC needed to be scanned. For my use case, I recommended KICS. I also looked at other scanners like Checkov, Tenable, Aqua and Anchore, but didn’t try them out.
One issue with infrastructure scanning is that the rules can sometimes get in the way, depending on the infrastructure. I found that adding a global rule would sometimes impact other downstream infrastructure scans, so I sometimes had to separate the rules so they wouldn’t affect codebases that the rule either didn’t have the full picture of or didn’t work well with.
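Separating rules per codebase, as described above, can be pictured as a small global list plus per-repo overrides. This is a generic sketch, not the configuration format of KICS or any other scanner; the function name, repo names and rule IDs are hypothetical.

```python
def effective_exclusions(global_excludes, repo_excludes, repo):
    """Combine the global exclusions with those scoped to one repo."""
    return set(global_excludes) | set(repo_excludes.get(repo, ()))
```

For example, with `global_excludes = ["RULE-001"]` and `repo_excludes = {"legacy-infra": ["RULE-042"]}`, only `legacy-infra` skips `RULE-042`, while every other repo is still checked against it. Keeping the global list short is what stops one codebase’s exception from quietly weakening scans everywhere else.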
Container Orchestration Security
OK, this was the last big one. There’s a lot to say here since Container Orchestration is such a big thing, with so many different options. Things like Kubernetes, Terraform, Ansible and so many more. One of the benefits of container orchestration is that it’s easier to rapidly deploy and scale when needed. The issue is that when you have a complicated, highly available, multi-tenant infrastructure, you have to have some pretty good controls to manage these scaling systems.
There are tons of services, open-source, paid, and otherwise available for Kubernetes. Here’s what I learned, this is specifically correlated to my own testing on a MicroK8s server and other K8s research I’ve done.
Here are some tools I’ve found to help with K8s security. I didn’t test many of them, but these are just a few tools I found that could help with some of the security issues of running K8s.
Tool | Use | Link |
---|---|---|
IceKube | Privilege Escalation Scanner | |
Trivy-Operator | Security Toolkit | |
KubeClarity | SBOM & Vulnerability Scanner | |
K8Lens | Kubernetes Scanner/Dashboard | |
KubeLinter | Kube Linter | |
Falco | Runtime Security Tool (used to detect abnormal behavior in real time) | The Falco Project |
Terrascan | Admissions Controller | |
Open Policy Agent | Network & Runtime Policy System | |
Checkov | SCA Tool | |
rbac-lookup | RBAC Audit | |
Clair V4 | Container Scanning | |
Kubeaudit, Kube-bench | Misconfiguration Detection | |
Kube-hunter | Pen Test Tool | |
Conclusion
Now I know, scanning isn’t a cover-all and there are so many more aspects that I didn’t cover here. It is a HUGE area, and I just wanted to share what I’ve learned in the short time I’ve been involved in DevSecOps.
Some things I didn’t cover are:
AutoDevOps: having the Git system automatically find vulnerabilities and create PRs for devs to review. That was a cool feature I would have loved to have gotten more into.
How best to report and managerially fix the vulnerabilities, as well as how best to deploy policies related to how the scans would impact the workflow.
How best to start the process of enabling scanners. I had estimated it would be a slow process, as initial scans would have been overwhelming. It would take at least a year to implement and fix some of the scan findings before it would be able to run itself a bit more automatically.
SIEMs are great if you have the bandwidth to look at the reports. If the team is small, it may not make sense to invest heavily in that infrastructure.
One of the downsides of focusing so much energy on scanning is the expectation that downstream and upstream CVE and EPSS scores are being reported and have a big impact on the infrastructure’s security. I couldn’t assume fixing every vulnerability would make the infrastructure that much more secure. Companies don’t always like to create CVEs unless it’s super important, because a low or medium CVE isn’t going to get fixed anyway given the noise already being created by all the other dependencies.
I also know scanning isn't the only thing a DevSecOps position should be looking at, but in a world dominated by automation it’s probably a big part of the daily workload, especially at the beginning of the project. If anything, I’d definitely focus on gathering mapping and data on what is out there before implementing lots of scanning tools.
Sometimes the best way is to train devs AND give them enough time to complete timely code reviews and fix technical debt, rather than pushing them to create features for features’ sake. Customers can wait.
My mantra is listen, plan, deploy and evolve. I HOPE I come back to this and say dang, this is an awful plan. Or even hey, I was on the right track, I just needed to EVOLVE in my understanding. If I can do that, I’ve succeeded.
Helpful Resources
Here’s a random collection of links I found interesting, helpful or cool in my research.
General Services Administration - https://tech.gsa.gov/guides/dev_sec_ops_guide/
Atlassian - https://www.atlassian.com/devops/devops-tools/devsecops-tools
Kondukto - https://kondukto.io/
GitHub - https://github.com/resources/articles/devops/devsecops
Kubernetes Security - https://github.com/magnologan/awesome-k8s-security & https://github.com/ksoclabs/awesome-kubernetes-security
Jit - https://www.jit.io/
Securing Kubernetes - https://kubernetes.io/docs/concepts/security/security-checklist/
Detection as Code - https://blog.runreveal.com/introducing-detection-as-code-support/?utm_source=tldrinfosec