Securing DataOps: The Case for Satori

Few business leaders would argue against the value of making data-driven decisions. Today, data analytics give businesses insights in or near real time, allowing them to respond to changing markets and customer needs. These benefits have driven a rapid evolution of the decades-old concept of the data warehouse. “Big data” and platforms such as Hadoop emerged, followed by “data as a service” and data lakes. Services such as Snowflake, Amazon Redshift Spectrum, and Google BigQuery brought massive scale to data operations by leveraging the cloud’s distributed and elastic nature, creating huge repositories that encompass both data warehouses and lakes.

These technologies evolved in lockstep with the use cases for the data itself. The days of putting in a request for data and waiting hours, let alone days or weeks, are long gone. Today, data is a self-serve resource: data scientists can access whatever data they need whenever they need it, which has made data operations essential for many companies. And just as DevOps transformed application development, DataOps has changed how data is stored, accessed, and used, especially in cloud-native environments.

While they are an obvious boon to the business, these self-serve data models create new risks. As companies concentrate data in these huge repositories, the security controls on the original data sources -- which don’t fit either cloud technology or the self-serve use case -- get left behind. Both security and compliance problems can occur as a result. Without proper controls, people can access personally identifiable information (PII) that regulations, security policies, or privacy policies intended to protect. More concerning still, security teams can quickly lose sight of how data flows in the organization, making it impossible to provide any level of consistent governance. 

In the world of self-serve data, then, security leaders must strike a careful balance between the business’s need to move quickly and the security controls necessary to manage these risks. But that’s easier said than done. 

These massive repositories contain information from many sources -- sometimes in their original formats -- with multiple points of ingress and egress. Building a new control layer on each data source that feeds into the repository isn’t just extremely difficult. It’s a patchwork job whose complexity comes close to guaranteeing failure, and one the organization must repeat every time it adopts a new data platform. As a result, organizations are being pushed to build new controls that operate across multiple data services and many data types, often at the application layer. And that means the responsibility for building, monitoring, and maintaining these controls is often shared with, if not owned outright by, data operations people. The question for data leaders is how to work with security and privacy leaders to enable controls that manage compliance and security risk while meeting the business’s needs.

Satori’s Approach: Agile Data Governance and Security

Satori co-founders Eldad Chai and Yoav Cohen understood this problem from their previous experience securing big data systems, so they set out to build a security and privacy control layer for DataOps. The Satori Secure Data Access Platform provides data governance and security for cloud data stores, allowing organizations to apply controls consistently across services such as Snowflake, BigQuery, and Redshift, independently of the data store or type.

Satori’s platform delivers these data governance functions: 

  • Fine-grained policies: Satori can make access control decisions based on a variety of factors, including user identities, groups, data types, and schemas. Teams can manage policies as code via APIs or through the Satori console. The platform comes with policies for implementing the NIST Cybersecurity Framework (CSF), the Payment Card Industry Data Security Standard (PCI DSS), and others.

  • Sophisticated data protection: In addition to data-level access control, Satori can dynamically mask data in line with minimum-access policies or enforce a specific access pattern. For example, Satori could ensure that analysts can retrieve PII only in masked form and only for a limited amount of time. It can also trigger a workflow in real time based on data types, user identities, and access policies.

  • Detailed data access audit: With Satori, security teams can gather detailed information about how data flows within an organization. Administrators can generate granular data maps and data access audits, examining each access to the data store, including who accessed the store, the access context, and what data the user accessed. Organizations can leverage these data flow maps for security and business analytics, looking at data usage by users, groups, volume, tags, and locations. 
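
To make the three functions above more concrete, here is a minimal Python sketch of how a policy decision, dynamic masking, and an audit trail might fit together. It is not Satori's API: the `Policy` fields, the `decide`, `mask`, and `apply_policies` helpers, the group and data type names, and the sample row are all hypothetical, chosen only to illustrate the idea of time-limited, masked access to PII.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical rule: what `group` may do with columns of `data_type`."""
    group: str
    data_type: str                         # e.g. "general" or "pii"
    action: str                            # "allow" or "mask"
    expires_at: Optional[datetime] = None  # optional time limit on the rule


def decide(policies, user_groups, data_type, now):
    """Return the action of the first unexpired matching policy, else "deny"."""
    for p in policies:
        if p.group in user_groups and p.data_type == data_type:
            if p.expires_at is None or now < p.expires_at:
                return p.action
    return "deny"


def mask(value):
    """Replace every character of the value with an asterisk."""
    return "*" * len(str(value))


def apply_policies(row, column_types, policies, user, user_groups, now, audit_log):
    """Filter one result row and append one audit entry per column."""
    visible = {}
    for column, value in row.items():
        data_type = column_types.get(column, "general")
        action = decide(policies, user_groups, data_type, now)
        audit_log.append({"user": user, "column": column,
                          "data_type": data_type, "action": action,
                          "at": now.isoformat()})
        if action == "allow":
            visible[column] = value
        elif action == "mask":
            visible[column] = mask(value)
        # "deny": the column is left out of the result entirely
    return visible


now = datetime(2020, 9, 1, tzinfo=timezone.utc)
policies = [
    Policy(group="analysts", data_type="general", action="allow"),
    Policy(group="analysts", data_type="pii", action="mask",
           expires_at=now + timedelta(days=7)),
]
audit_log = []
row = {"order_id": 1001, "email": "ana@example.com"}
visible = apply_policies(row, {"email": "pii"}, policies,
                         user="ana", user_groups={"analysts"},
                         now=now, audit_log=audit_log)
print(visible)  # the email value is replaced by asterisks
```

In this sketch the default is to deny: a column is returned only when an unexpired policy explicitly allows or masks it, and every decision is recorded whether or not data is returned.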

Architecturally Speaking

Satori is a transparent proxy service that consists of two key components: the Context Engine and the Policy Engine. The Context Engine asynchronously inspects all queries and their results, building a map of how data flows in the environment and how the organization is using it. Depending on the data access context, the Policy Engine applies the policy for accessing a specific type of data.

In implementing these functions, Satori prioritized reliability and low latency to ensure the Secure Data Access Platform balances the business’s performance and security needs appropriately. Satori accomplishes that goal by: 

  • Using a proven network proxy for reliability and performance: While they can be effective, application layer proxies require organizations to add another component to their technology stack, increasing both complexity and the potential for higher latency. Instead, Satori based its service on Nginx, a proven and reliable network proxy. (According to Netcraft, Nginx served or proxied 25.75 percent of the busiest websites in August 2020 and 36.45 percent of all active sites.) Consequently, Satori doesn’t require organizations to add application proxy components to their technology stacks, and Satori can focus on its query inspection, data mapping, and policy application functions, leveraging Nginx’s performance and reliability.

  • Using dynamic inlining to ensure low latency: As Nginx proxies queries, Satori’s Context Engine asynchronously analyzes the combination of who the consumer is, what data they are requesting, and what they want to do with the data. Using dynamic inlining, Satori’s Policy Engine interrupts connections only when it has to, applying the policies that developers and security teams define and taking the actions those policies dictate. Benchmarks show no added latency for small to medium result sets (10MB or less) and about 5 percent added latency for large result sets (over 100MB).

  • Integrating with existing identity and access management systems: Effective data security policies operate on the combination of user identities and the data they want to access. Instead of creating its own user management and access control structure, Satori relies on existing identity and access management (IAM) systems to determine the user identities and attributes that drive access control policies. Today, Satori works with Okta. The company plans Active Directory support for the near future. 

  • Mapping and classifying data in real time: Satori’s asynchronous architecture also allows the service to map data flows and classify data in real time without adversely affecting performance. Satori automatically classifies various data types in the result set based on the actual data and on metadata, such as column or field names. Consequently, Satori can detect new occurrences of sensitive data and apply the appropriate policy to them. Combined with classification, data mapping gives data operations and security teams a clear picture of how data is moving in their organizations. Security teams can map governance programs to the reality of how people are using data.

  • Using the Rust programming language to ensure performance and safety: The Rust programming language is quickly emerging as a go-to choice for secure and safety-critical software components. With no garbage collector and only a minimal runtime (among other features), it yields high performance, and its focus on memory safety supports higher levels of reliability and security. Given the Data Access Controller’s place in data operations, Satori chose to implement it in Rust.
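
The classification step described in the list above, which tags columns using both metadata and the actual values, can be illustrated with a small Python sketch. This is not Satori's classifier: the tag names, the regular expressions, and the 80 percent threshold are assumptions made for illustration, and a production classifier would need far more thorough rules.

```python
import re

# Hypothetical hints based on column or field names (metadata).
NAME_HINTS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
}

# Hypothetical patterns applied to the values themselves (actual data).
VALUE_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def classify_column(name, values, threshold=0.8):
    """Return a data type tag for one column, or None if nothing matches."""
    # 1. Metadata: a suggestive column name is enough to tag the column.
    for tag, pattern in NAME_HINTS.items():
        if pattern.search(name):
            return tag
    # 2. Actual data: tag the column if enough non-null values match.
    samples = [str(v) for v in values if v is not None]
    if not samples:
        return None
    for tag, pattern in VALUE_PATTERNS.items():
        hits = sum(1 for s in samples if pattern.fullmatch(s))
        if hits / len(samples) >= threshold:
            return tag
    return None


def classify_result_set(columns, rows):
    """Classify every column of a result set given as a list of row tuples."""
    return {
        name: classify_column(name, [row[i] for row in rows])
        for i, name in enumerate(columns)
    }


columns = ["id", "contact", "user_email"]
rows = [(1, "a@example.com", None), (2, "b@example.org", None)]
tags = classify_result_set(columns, rows)
print(tags)
```

Checking values as well as names is what lets a classifier of this kind notice sensitive data in a column with an uninformative name, such as `contact` here, so that a policy can then be applied to it.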

Deployment Options and Future Directions

The Secure Data Access Platform is fully containerized and runs on Kubernetes. Satori hosts the cloud service, allowing organizations to start using it quickly. Alternatively, organizations can run the service on-premises in their own Kubernetes clusters.

The asynchronous architecture enabled by dynamic inlining can also operate in either a fail-open or a fail-closed configuration, allowing organizations to balance their overall performance and security approach. In a fail-open configuration (the default), the Data Access Controller will not interrupt connections between data consumers and data stores if the service is offline. In a fail-closed configuration, no queries go through unless the Data Access Controller is inspecting them.
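
The difference between the two failure modes can be shown with a short Python sketch. It is an illustration only, not Satori's implementation: `FailMode`, `InspectorUnavailable`, and `route_query` are hypothetical names, and the `inspect` and `forward` callables stand in for the inspection service and the data store connection.

```python
from enum import Enum, auto


class FailMode(Enum):
    OPEN = auto()    # keep serving queries if inspection is unavailable
    CLOSED = auto()  # block queries if inspection is unavailable


class InspectorUnavailable(Exception):
    """Raised by the hypothetical inspector when it cannot be reached."""


def route_query(query, inspect, forward, mode=FailMode.OPEN):
    """Forward a query to the data store, honoring the configured mode.

    `inspect` and `forward` are caller-supplied callables (assumptions
    made for this sketch).
    """
    try:
        inspect(query)
    except InspectorUnavailable as exc:
        if mode is FailMode.CLOSED:
            # Fail closed: nothing passes without inspection.
            raise PermissionError("inspection unavailable; query blocked") from exc
        # Fail open: fall through and forward the query uninspected.
    return forward(query)


def offline_inspector(query):
    """Simulates an inspection service that is down."""
    raise InspectorUnavailable("inspector offline")


def fake_store(query):
    """Simulates the data store by echoing the query."""
    return f"rows for: {query}"


print(route_query("SELECT 1", offline_inspector, fake_store))
```

With the default mode, the call above still returns rows even though the inspector is offline; passing `mode=FailMode.CLOSED` makes the same call raise `PermissionError` instead. The trade-off is availability of data access against a guarantee that no query goes uninspected.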

Today, Satori is focused on securing data warehouses and data lakes, including services such as Snowflake, Amazon Redshift, and Google BigQuery. But given its role as an intermediary between data consumers and data stores, Satori brings a level of future-proofing with it. As data operations and technologies continue to evolve, Satori can grow with them, providing its dynamic policy and data mapping capabilities. In the long term, Satori envisions using its architecture to help organizations with other aspects of data operations, such as performance management and troubleshooting.

Conclusion

The ongoing and rapid evolution of data operations technologies and services makes data access governance both more difficult and more necessary. Satori’s Secure Data Access Platform balances the business’s performance needs and security requirements, creating an agile governance layer that works across multiple data stores and types. The company’s focus on reliability and low latency shows in its product architecture -- and the results seen by its early customers -- yielding an effective solution that can evolve as data operations and services change. And that’s why we invested in the business.

Jamie Lewis