Policy-As-Code: Measurable Security for Cloud Environments

As organizations transition to the cloud, enterprise security programs must align themselves with DevOps and cloud-native architectures. At its core, DevOps is all about automation and visibility. Much of the time-consuming grunt work involved in deploying, monitoring, and managing infrastructure is automated through software, bringing us the catchphrase “infrastructure as code.” 

When implementing infrastructure as code, DevOps managers can apply configuration and other management data rapidly, repeatedly, consistently, and at a scale manual processes can never achieve. Via APIs and continuous telemetry, cloud implementations are highly visible, making them testable and verifiable in or near real-time. Managers move from thinking they know how the system works to knowing how it works. Telemetry, consistent metrics, and continuous testing make it possible to prove that systems are operating and meeting specific requirements.

In stark contrast, security controls often operate under much murkier conditions. As Phil Venables, chief information security officer for Google Cloud, said in a recent post, most bad things happen “because the control we thought we had in place was not present or operational when we needed it the most.” Such silent failures become apparent only after an incident, and security teams have to figure out why a control failed—and how to fix it—without a great deal of information to go on. Simply put, security is not as measurable or verifiable as it should be.

The need to align security programs with DevOps, make security more measurable, and make failures more visible is driving organizations to implement security controls and policies as code. Policy-as-code has become one of the most important trends in enterprise security.

What is Policy-As-Code?

In its most literal definition, “policy-as-code” is the process of implementing policies in software, using a programming language. But policy-as-code is much more than simply writing code. It means applying the same rigor and discipline that goes into any software engineering effort to security controls and policies. The typical cycle for developing and deploying policy-as-code is illustrated in Figure 1.

Figure 1: The policy-as-code development and deployment cycle.

Implementing policy-as-code also means adopting many DevOps methodologies. Maintaining controls in a central repository, applying version control, enabling automatic validation in the pipeline, and continuously monitoring performance are all crucial elements of policy-as-code.

In cloud systems, implementing policy-as-code involves creating a general-purpose policy architecture. A general-purpose policy engine can work across multiple services and applications in today’s microservices environments, supporting a standard API and a common language for writing policies. This basic architecture is illustrated in Figure 2.

Figure 2: A general-purpose policy engine architecture.
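To make the idea of a general-purpose engine concrete, here is a minimal sketch in Python. It assumes a toy model in which a policy is just a function that takes an input document and returns a list of violations. The `PolicyEngine` class, its `register` and `evaluate` methods, and the `tags/owner` policy are hypothetical names for illustration; this is not OPA or any product's API. The point is the shape: every service calls the same `evaluate` entry point with a JSON-like document and gets back a uniform decision.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical model: a policy maps an input document to a list of violation
# messages. An empty list means the input complies.
Policy = Callable[[dict[str, Any]], list[str]]


@dataclass
class Decision:
    allowed: bool
    violations: list[str] = field(default_factory=list)


class PolicyEngine:
    """Toy general-purpose engine: every service calls the same evaluate() API."""

    def __init__(self) -> None:
        self._policies: dict[str, Policy] = {}

    def register(self, name: str, policy: Policy) -> None:
        self._policies[name] = policy

    def evaluate(self, name: str, input_doc: dict[str, Any]) -> Decision:
        if name not in self._policies:
            # Fail closed: an unknown policy is an explicit denial, not a silent pass.
            return Decision(False, [f"unknown policy: {name}"])
        violations = self._policies[name](input_doc)
        return Decision(allowed=not violations, violations=violations)


def require_owner_tag(doc: dict[str, Any]) -> list[str]:
    # Hypothetical policy: every resource must carry an "owner" tag.
    if "owner" not in doc.get("tags", {}):
        return [f"{doc.get('id', '?')}: missing owner tag"]
    return []


engine = PolicyEngine()
engine.register("tags/owner", require_owner_tag)
print(engine.evaluate("tags/owner", {"id": "vm-1", "tags": {"owner": "payments"}}))
print(engine.evaluate("tags/owner", {"id": "vm-2", "tags": {}}))
```

Failing closed on an unknown policy name is a deliberate choice here: a misspelled or missing policy produces an explicit denial rather than a silent pass.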

By expressing policies in code and defining governance and control activities in software, organizations can systematically declare, test, execute, measure, and maintain security. Managers can accurately measure how well the system adheres to a given policy (or a compliance framework) and understand what it takes to reach compliance. Following in the footsteps of infrastructure as code, policy-as-code makes security measurable, aligning it with cloud architecture.
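Measuring adherence can be as simple as running each policy over the full set of resources and reporting the fraction that comply. The sketch below assumes resources are plain dictionaries and each policy is a hypothetical boolean check; the field names (`encrypted`, `public`) and the `compliance_report` helper are illustrative, not taken from any tool.

```python
from typing import Any, Callable

# Hypothetical check: returns True when a resource complies with the policy.
Check = Callable[[dict[str, Any]], bool]


def compliance_report(
    resources: list[dict[str, Any]], checks: dict[str, Check]
) -> dict[str, dict[str, Any]]:
    """For each policy, report the share of resources that comply and list the rest."""
    report: dict[str, dict[str, Any]] = {}
    for name, check in checks.items():
        failing = [r["id"] for r in resources if not check(r)]
        total = len(resources)
        passing = total - len(failing)
        report[name] = {
            # An empty inventory is reported as fully (vacuously) compliant.
            "compliance": passing / total if total else 1.0,
            "failing": failing,
        }
    return report


resources = [
    {"id": "bucket-a", "encrypted": True, "public": False},
    {"id": "bucket-b", "encrypted": False, "public": False},
    {"id": "bucket-c", "encrypted": True, "public": True},
    {"id": "bucket-d", "encrypted": True, "public": False},
]
checks = {
    "encryption-at-rest": lambda r: r["encrypted"],
    "no-public-access": lambda r: not r["public"],
}
for policy, result in compliance_report(resources, checks).items():
    print(f"{policy}: {result['compliance']:.0%} compliant, failing={result['failing']}")
```

With these four sample buckets, each policy reports 75% compliance and names the one bucket that fails. An empty resource list is reported as fully compliant, which is vacuously true but worth flagging in a real report.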

The Advantages of Policy-As-Code

Policy-as-code brings many of the advantages of infrastructure as code to security. These include:

  • Automation: Enterprise security teams spend a lot of time on menial tasks, manually managing many security policies. In many cases, security policies live in spreadsheets, making them more intentional than actionable. When a policy is expressed as code, managers can automate deployment and update processes, moving the policy from intention to measurable reality. As we’ve said in previous posts, automation is a prerequisite to applying enterprise security to cloud-native architectures.

  • Management at Scale: Modern enterprise environments demand large-scale deployment and management. In those environments, it’s impossible to manage policies manually. Through automation, security teams can manage policies consistently and efficiently at scale.

  • Testing and Monitoring: Security teams can conduct continuous code testing, validating that the control is working as intended. Such testing substantially increases the likelihood of discovering silent failures before an incident occurs. Monitoring can also capture exceptions and the reasoning behind them, allowing ongoing improvements in control implementations as real-world operations reveal unforeseen requirements.

  • Continuous Integration/Continuous Deployment (CI/CD) Integration: Security and DevOps teams can integrate software-based controls into the organization’s CI/CD workflows, making them an integral part of production systems. Instead of being an afterthought, security testing and validation become a part of the CI/CD process.

  • Explicit failure conditions: Silent failures are a major problem in security architecture. To have confidence in security controls, managers must know when those controls fail. Policy-as-code makes failures explicit, which gives the security team much-needed visibility into the actual performance of the controls they implement.
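Several of these advantages, CI/CD integration and explicit failure conditions in particular, come down to one mechanism: a pipeline step that evaluates policies and fails the build when any check fails. The following Python sketch assumes a hypothetical `run_gate` helper and dictionary-shaped resources. It treats a check that crashes as a failure, so a broken control cannot pass silently.

```python
import sys
from typing import Any, Callable

Check = Callable[[dict[str, Any]], bool]


def run_gate(resources: list[dict[str, Any]], checks: dict[str, Check]) -> int:
    """Return a process exit code: 0 if every check passes, 1 otherwise.

    A check that raises is reported as a failure, so a broken control
    surfaces loudly instead of passing silently.
    """
    failures = 0
    for name, check in checks.items():
        for r in resources:
            try:
                ok = bool(check(r))
            except Exception as exc:  # a crashing check is itself a failure
                print(f"ERROR {name} on {r.get('id')}: {exc!r}", file=sys.stderr)
                ok = False
            if not ok:
                print(f"FAIL {name}: {r.get('id')}", file=sys.stderr)
                failures += 1
    return 1 if failures else 0


# db-2 lacks the "encrypted" key, so the check raises KeyError and is counted as a failure.
planned = [{"id": "db-1", "encrypted": True}, {"id": "db-2"}]
print("exit code:", run_gate(planned, {"encrypted": lambda r: r["encrypted"]}))
```

In a pipeline step, passing the result to `sys.exit(...)` turns a nonzero return into a failed build, which is how most CI systems decide whether a step passed.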

Differences From Infrastructure as Code

Infrastructure as code allows managers to provision and manage cloud infrastructure at scale, operating at a relatively low level of abstraction. While efficient and effective, infrastructure as code also creates new risks. Configuration mistakes are amplified at scale, significantly impacting business operations and security posture. 

That’s where policy-as-code comes in. It operates at a higher level of abstraction, supplying the logic for expressing business, compliance, security, and other constraints. Infrastructure as code lets managers deploy S3 buckets rapidly and enable 2FA on them, for example, but it’s policy-as-code that expresses and enforces the condition that all S3 buckets must have 2FA enabled. 
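As an illustration of a policy layered on top of infrastructure as code, the sketch below checks every S3 bucket declared in a configuration for a 2FA-style protection. We model that protection as S3's MFA Delete setting, which is our assumption about the control meant here, and we use a simplified, hypothetical resource shape rather than any IaC tool's actual plan format.

```python
from typing import Any


def s3_mfa_delete_violations(resources: list[dict[str, Any]]) -> list[str]:
    """Policy: every S3 bucket declared in IaC must have MFA Delete enabled.

    `resources` uses a simplified, hypothetical shape:
    {"type": "s3_bucket", "name": ..., "mfa_delete": bool}
    A bucket that omits the setting is treated as non-compliant.
    """
    return [
        r["name"]
        for r in resources
        if r.get("type") == "s3_bucket" and not r.get("mfa_delete", False)
    ]


declared = [
    {"type": "s3_bucket", "name": "logs", "mfa_delete": True},
    {"type": "s3_bucket", "name": "uploads"},  # setting omitted
    {"type": "ec2_instance", "name": "web-1"},  # not governed by this policy
]
print(s3_mfa_delete_violations(declared))
```

If the check runs against declared configuration before deployment, a bucket that omits the setting is caught before it ever exists in the cloud account.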

In conjunction with infrastructure as code, policy-as-code can define and validate constraints on cloud systems, limiting the exposure of the underlying infrastructure to risks introduced by human error. 

The Starting Point: Visibility

Without a clear understanding of an organization’s assets, policy-as-code is a pipe dream. So the first step toward policy-as-code must be creating a consistent and continuous asset inventory. This is, of course, a difficult task at best for most organizations: the ease of subscribing to and using cloud services makes assets hard to keep track of. But the very characteristics of the cloud, automation and scale, can also be brought to bear on the problem. 

The overwhelming market need for creating and maintaining a credible inventory of computing assets drove our investment in JupiterOne. As discussed in a previous post, JupiterOne’s platform discovers and documents assets, information about those assets (such as configuration data), and the relationships between those assets, storing them in a graph database. The platform also includes a powerful query engine that allows security managers to compose arbitrarily sophisticated queries, which reveal both an inventory of assets and insight into their status and operation. 

Tools such as JupiterOne are essential in the quest to create policy-as-code, allowing security teams to continuously test and understand the conditions under which policies, and the assets they govern, operate. A company may have a policy that states that all Amazon S3 buckets must have two-factor authentication enabled, for example. But knowing how many S3 buckets the organization has, much less whether they all have two-factor authentication enabled, has historically been extremely difficult. JupiterOne can tell an organization how many S3 buckets it has and which of them don’t have two-factor authentication enabled. 
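Under the hood, this kind of question is a traversal of an asset graph. The Python sketch below is a toy in-memory stand-in, not JupiterOne's data model or its query language: nodes hold asset properties, edges record relationships such as an account that HAS a bucket, and the query counts S3 buckets and lists the non-compliant ones along with their owning accounts. Type names like `aws_s3_bucket` and the `mfa_delete` property are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AssetGraph:
    """Toy asset inventory: nodes carry properties, edges record relationships."""

    nodes: dict[str, dict[str, Any]] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (src, verb, dst)

    def add(self, node_id: str, **props: Any) -> None:
        self.nodes[node_id] = props

    def relate(self, src: str, verb: str, dst: str) -> None:
        self.edges.append((src, verb, dst))

    def find(self, asset_type: str) -> list[str]:
        return [n for n, p in self.nodes.items() if p.get("type") == asset_type]

    def owners_of(self, node_id: str) -> list[str]:
        # Follow HAS edges backwards from the asset to whoever owns it.
        return [s for s, v, d in self.edges if v == "HAS" and d == node_id]


g = AssetGraph()
g.add("acct-prod", type="aws_account")
g.add("acct-dev", type="aws_account")
g.add("bkt-1", type="aws_s3_bucket", mfa_delete=True)
g.add("bkt-2", type="aws_s3_bucket", mfa_delete=False)
g.add("bkt-3", type="aws_s3_bucket")  # property never collected
g.relate("acct-prod", "HAS", "bkt-1")
g.relate("acct-prod", "HAS", "bkt-2")
g.relate("acct-dev", "HAS", "bkt-3")

buckets = g.find("aws_s3_bucket")
# A missing property counts as non-compliant rather than passing by default.
noncompliant = {b: g.owners_of(b) for b in buckets if not g.nodes[b].get("mfa_delete")}
print(len(buckets), noncompliant)
```

Treating a missing property as non-compliant (as with `bkt-3`) is a design choice: an asset whose configuration was never collected should surface as a gap, not pass by default.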

Managers can save such queries and schedule them to run as often as necessary, giving them the tools to understand what assets the organization has and how well (or poorly) a given policy is performing. 
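A saved, scheduled query also yields a trend over time, which is what shows whether a policy is getting better or worse. The sketch below assumes a hypothetical `QueryScheduler` that stores named queries and records how many non-compliant assets each run returns; actual scheduling (cron, a task queue, or a platform's built-in scheduler) is left out.

```python
from datetime import datetime, timezone
from typing import Callable

SavedQuery = Callable[[], list[str]]  # returns IDs of non-compliant assets


class QueryScheduler:
    """Keeps named queries and a history of each run's result count.

    Real scheduling is out of scope; run_all() is what a periodic
    trigger would call.
    """

    def __init__(self) -> None:
        self.queries: dict[str, SavedQuery] = {}
        self.history: dict[str, list[tuple[datetime, int]]] = {}

    def save(self, name: str, query: SavedQuery) -> None:
        self.queries[name] = query
        self.history.setdefault(name, [])

    def run_all(self) -> dict[str, list[str]]:
        results: dict[str, list[str]] = {}
        now = datetime.now(timezone.utc)
        for name, query in self.queries.items():
            found = query()
            self.history[name].append((now, len(found)))
            results[name] = found
        return results

    def trend(self, name: str) -> list[int]:
        return [count for _, count in self.history[name]]


inventory = {"bkt-1": True, "bkt-2": False, "bkt-3": False}  # bucket -> MFA Delete on?
sched = QueryScheduler()
sched.save("s3-without-mfa-delete", lambda: sorted(b for b, ok in inventory.items() if not ok))

sched.run_all()
inventory["bkt-2"] = True  # someone remediates one bucket between runs
sched.run_all()
print(sched.trend("s3-without-mfa-delete"))  # [2, 1]
```

After one bucket is remediated between runs, the count for the saved query drops from 2 to 1.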

Other Characteristics

In addition to visibility, policy-as-code requires several functional elements. These include:

  • A Common Framework: In theory, it’s possible to implement policy-as-code using different policy languages, models, and APIs for different systems. But in practice, the complexity of such brute-force integrations is a significant impediment. More importantly, translating policy between frameworks doesn’t always yield the level of fidelity security programs demand. A common framework provides a declarative policy language, a uniform policy model, a standard API, and a consistent enforcement layer, making policy-as-code practical. The Open Policy Agent (OPA), an open source, general-purpose policy engine, is an encouraging effort to address this need and one worth investigating.

  • Testing: As we’ve said before, security testing must become a more continuous practice, reflecting the nature of CI/CD operations. Before deploying policies at scale, security teams must test those policies for both results and unintended consequences, including conflicts with how people work and with other policies.

  • Detection: Security systems must detect policy violations and control failures. As is the case with most cloud architectures, automation plays an essential role, allowing detection to happen in or near real time. The system must also generate alerts and deliver them to the right people and systems through the proper channels. We wrote extensively on detection engineering in a previous post.

  • Remediation: As we discussed in our post on detection engineering, automated response and remediation are essential components of any cloud security system, matching the scale and speed of the cloud itself. Response orchestrations can bring the environment back to policy compliance by automating common fixes before an alert reaches a human. Automated communications to the proper humans allow follow-up. Today, many organizations have yet to reach the maturity levels that support automated remediation. But automated responses are a long-term requirement, so organizations must build a plan for getting there.
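Detection and remediation fit together as a loop: evaluate checks, try an automated fix where a playbook exists, and route everything else to a human. The Python sketch below is a minimal version of that loop under stated assumptions: the policy names, the `playbooks` mapping, and `enable_default_encryption` (which stands in for a real cloud API call) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

Resource = dict[str, Any]
Check = Callable[[Resource], bool]  # True means compliant
Remediation = Callable[[Resource], bool]  # True means the fix succeeded


@dataclass
class Violation:
    policy: str
    resource_id: str


def detect(resources: dict[str, Resource], checks: dict[str, Check]) -> list[Violation]:
    """Evaluate every check against every resource and collect violations."""
    return [
        Violation(policy, rid)
        for policy, check in checks.items()
        for rid, r in resources.items()
        if not check(r)
    ]


def respond(
    violations: list[Violation],
    resources: dict[str, Resource],
    playbooks: dict[str, Remediation],
    notify: Callable[[str], None],
) -> list[Violation]:
    """Try automated fixes first; return the violations that still need a human."""
    unresolved: list[Violation] = []
    for v in violations:
        fix = playbooks.get(v.policy)
        if fix is not None and fix(resources[v.resource_id]):
            notify(f"auto-remediated {v.policy} on {v.resource_id}")
        else:
            notify(f"ACTION NEEDED: {v.policy} on {v.resource_id}")
            unresolved.append(v)
    return unresolved


def enable_default_encryption(r: Resource) -> bool:
    r["default_encryption"] = True  # stand-in for a real cloud API call
    return True


resources = {
    "bkt-1": {"default_encryption": False},
    "bkt-2": {"default_encryption": True, "public": True},
}
checks = {
    "s3-default-encryption": lambda r: r.get("default_encryption", False),
    "s3-no-public": lambda r: not r.get("public", False),
}
messages: list[str] = []
left = respond(
    detect(resources, checks),
    resources,
    {"s3-default-encryption": enable_default_encryption},
    messages.append,
)
print(messages)
print(left)
```

Notifying on successful auto-remediation, not just on failures, keeps humans informed of what the automation changed.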

Real-World Examples

Not all policies can be implemented as code, but several policy categories lend themselves to software-defined enforcement. Access, governance, and configuration policies are all well suited to automation. A governance policy that states “any operation requiring core system changes must go through code review” can be enforced in CI/CD workflows, for example. 
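The code review rule above can be expressed as a small gate that a CI job evaluates for each change. This sketch assumes a hypothetical list of core paths and a simplified `Change` record. Hosted Git platforms typically offer required-review settings that enforce similar rules natively, so treat this as an illustration of the logic rather than a recommended replacement.

```python
from dataclasses import dataclass

# Hypothetical path prefixes treated as "core system" code.
CORE_PATHS = ("infra/", "auth/", "billing/")


@dataclass
class Change:
    author: str
    files: list[str]
    approvers: list[str]


def review_gate(change: Change) -> tuple[bool, str]:
    """Core changes need approval from at least one person other than the author."""
    touches_core = any(f.startswith(CORE_PATHS) for f in change.files)
    if not touches_core:
        return True, "no core files changed"
    independent = [a for a in change.approvers if a != change.author]
    if independent:
        return True, f"approved by {', '.join(independent)}"
    return False, "core change requires review by someone other than the author"


print(review_gate(Change("ana", ["auth/login.py"], ["ana"])))
```

Ignoring self-approval matters here: without that filter, an author could satisfy the policy by approving their own change.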

How organizations write a policy can also affect how well it fits within an automation framework. A policy that states “users cannot share their passwords” isn’t enforceable in software, for example. (One can argue that it isn’t enforceable in any domain before the fact.) But security teams can refactor such policies, making them enforceable in software while achieving the same end goal. A policy that states “all users must use 2FA,” for example, is enforceable in software and makes sharing passwords ineffective.
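The refactored 2FA policy is enforceable precisely because it can be checked. A minimal audit, assuming a hypothetical directory export with `active` and `mfa_enrolled` fields, looks like this:

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    active: bool
    mfa_enrolled: bool


def mfa_violations(users: list[User]) -> list[str]:
    """Enforceable form of the policy: every active account must have 2FA.

    A rule like "don't share passwords" has nothing to check; this one does.
    Once 2FA is also required at sign-in, a shared password alone is no
    longer enough to log in.
    """
    return [u.name for u in users if u.active and not u.mfa_enrolled]


directory = [
    User("alice", active=True, mfa_enrolled=True),
    User("bob", active=True, mfa_enrolled=False),
    User("carol", active=False, mfa_enrolled=False),  # disabled account, out of scope
]
print(mfa_violations(directory))
```

Excluding disabled accounts is a judgment call; some teams prefer to flag them too.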

Other examples of policies that security teams can express and enforce in software include:

  • All hosts must use approved standard images: JupiterOne allows managers to write queries that find all the Docker containers in an organization and identify any that aren’t running an approved image.

  • All S3 buckets must use default encryption: Similarly, managers can use JupiterOne to find S3 buckets for which default encryption isn’t enabled, triggering automated remediation for those instances.

  • Google Compute Engine instances cannot use the default service account: Finding any instance that still runs as the default service account, which is often granted broad project permissions, and reassigning it a narrowly scoped account decreases risk and enhances security.

  • Administrators must grant user access to all systems using the principle of least privilege (PoLP), restricting user access rights to only those resources required to perform routine, authorized activities: Querying JupiterOne can tell security managers which users have any given permission, which users have full administrative access, or which users and systems have access to sensitive data, just to name a few examples.
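The first three examples reduce to per-asset checks, and the least-privilege example to a query over access grants. The Python sketch below runs such checks over a simplified, hypothetical inventory export. The image allowlist, the `admin` role name, and the asset fields are assumptions; the one detail taken from Google Cloud is that Compute Engine default service accounts have email addresses of the form `PROJECT_NUMBER-compute@developer.gserviceaccount.com`.

```python
from typing import Any

# Hypothetical allowlist of approved base images.
APPROVED_IMAGES = {
    "registry.example.com/base/python:3.12",
    "registry.example.com/base/nginx:1.27",
}
# Compute Engine default service accounts end with this suffix.
GCE_DEFAULT_SA_SUFFIX = "-compute@developer.gserviceaccount.com"


def check_container(c: dict[str, Any]) -> list[str]:
    if c.get("image") not in APPROVED_IMAGES:
        return [f"{c['id']}: unapproved image {c.get('image')}"]
    return []


def check_bucket(b: dict[str, Any]) -> list[str]:
    if not b.get("default_encryption", False):
        return [f"{b['id']}: default encryption not enabled"]
    return []


def check_gce_instance(vm: dict[str, Any]) -> list[str]:
    if vm.get("service_account", "").endswith(GCE_DEFAULT_SA_SUFFIX):
        return [f"{vm['id']}: runs as the default service account"]
    return []


CHECKS = {"container": check_container, "s3_bucket": check_bucket, "gce_instance": check_gce_instance}


def audit(assets: list[dict[str, Any]]) -> list[str]:
    """Dispatch each asset to the check for its type; unknown types are skipped."""
    findings: list[str] = []
    for asset in assets:
        check = CHECKS.get(asset["type"])
        if check is not None:
            findings.extend(check(asset))
    return findings


def full_admins(grants: dict[str, set[str]], admin_role: str = "admin") -> list[str]:
    """Least-privilege review: list principals holding the (hypothetical) admin role."""
    return sorted(p for p, roles in grants.items() if admin_role in roles)


assets = [
    {"type": "container", "id": "web-7", "image": "registry.example.com/base/python:3.12"},
    {"type": "container", "id": "job-2", "image": "docker.io/library/python:latest"},
    {"type": "s3_bucket", "id": "reports", "default_encryption": False},
    {"type": "gce_instance", "id": "vm-9",
     "service_account": "123456789-compute@developer.gserviceaccount.com"},
]
grants = {"alice": {"admin"}, "bob": {"read:reports"}, "ci-bot": {"admin", "deploy"}}
print(audit(assets))
print(full_admins(grants))
```

Keeping each check a small function with a shared signature makes it easy to add the next policy without touching the dispatch loop.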

Where to Start

As we said earlier, access, governance, and configuration policies are the low-hanging fruit. Of those three, governance is the category in which most organizations spend far too much time on manual processes. Similarly, access policies relate directly to risk management but are difficult to assess without the right tools. Enterprises looking to get started should pick one area and start with a manageable scope, working toward scale. If an organization can automate, monitor, and measure governance and access policies through code, it will make huge strides towards measuring the efficacy of its security programs.

As part of any effort to implement policy-as-code, enterprises should investigate and experiment with OPA. Understanding what OPA can (and can’t) do is an excellent way to define a starting point. OPA allows security managers to control sudo and SSH operations, for example, and implementing those controls on a few systems is a great place to start.
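OPA is typically run as a service that other programs query over HTTP. As a hedged sketch of what experimenting with it can look like, the Python code below builds a request against OPA's REST Data API (`POST /v1/data/<path>` with an `{"input": ...}` body) and reads the `result` field of the response. It does not use OPA's PAM integration for sudo and SSH; the rule path `ssh/authz/allow` and the input fields are hypothetical, and the address assumes OPA's default port of 8181.

```python
import json
import urllib.request
from typing import Any

OPA_URL = "http://localhost:8181"  # OPA's default listen address when run as a server


def build_decision_request(rule_path: str, input_doc: dict[str, Any]) -> urllib.request.Request:
    """Build a POST to OPA's Data API: /v1/data/<package path>/<rule>."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        f"{OPA_URL}/v1/data/{rule_path.strip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_opa(rule_path: str, input_doc: dict[str, Any], timeout: float = 2.0) -> Any:
    """Return the rule's value, or None if the rule is undefined for this input."""
    req = build_decision_request(rule_path, input_doc)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("result")
```

A caller might write `allowed = query_opa("ssh/authz/allow", {"user": "alice", "host": "db-1"}) is True`, so that an undefined result or any non-true value is treated as a denial; a connection error should likewise be treated as a deny. Defaulting to deny keeps a missing or misconfigured policy from silently granting access.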

Conclusion

To protect cloud systems, security programs must align themselves with DevOps and cloud architecture. Policy-as-code is an essential component of that alignment, providing the automation and measurability cloud security requires.