Don’t Leave Your Cluster Unguarded - Use Gatekeeper Instead

When you work with Kubernetes, it slowly becomes your production temple. You invest time and resources into developing and nurturing it, and you naturally begin looking for ways to control the Kubernetes end-user. What can they do? What resources can they create? Can they label two deployments in a specific way? Which best practices should we follow?

In this article, I’ll introduce OPA Gatekeeper and show you how to use it to create and enforce policies and governance for your Kubernetes clusters, so that the resources you apply comply with those policies.

Why use OPA Gatekeeper?

Simply put, OPA Gatekeeper provides you with two critical abilities:

  1. Control what the end-user can do on the cluster.
  2. Enforce company policies in the cluster.

However, the true power of Gatekeeper is actually its effect on organizations. Gatekeeper provides a way to reduce the dependency between DevOps admins and the developers themselves. 

With Gatekeeper, enforcement of your organization's policies can be automated, which frees DevOps engineers from worrying about the developers making mistakes, and provides developers with instant feedback about what went wrong and what they need to change.

Prerequisites

OPA Gatekeeper, a sub-project of Open Policy Agent, is specifically designed to implement OPA in a Kubernetes cluster. This article requires a basic understanding of both Kubernetes and OPA, so if you’re already familiar with OPA, feel free to skip this part and move on to the next one ✌🏻

What is OPA?

OPA is like a super-engine. You write all of your policies in it, then evaluate each input against those policies to check whether it violates any of them and, if so, in what way.

The main idea behind OPA is to decouple policy decision-making from policy enforcement.

Suppose you work in a microservices architecture. You might have to make policy decisions, for example, whenever a microservice receives an API request (such as an authorization check). That logic is based on predefined rules in your organization, so you can offload and unify all of your decision-making logic in a dedicated service - OPA.

How to use OPA 

  1. Integrate with OPA. If your services are written in Go, you can embed OPA as a package within your project. Otherwise, you can deploy OPA as a host-level daemon.
  2. Write and store your policies. To define your policies in OPA, you need to write them in Rego and send them to OPA. This way, whenever you use OPA for policy enforcement, OPA will query the input against these policies.

  3. Request policy evaluation. When your application needs to make a policy decision, it sends OPA an API query over HTTP with a JSON body containing all the required data (see the sketch below).
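
To make that more concrete, here is a minimal, hypothetical sketch of what such a Rego policy might look like. The package name, rule, and input fields are purely illustrative, not taken from the article:


package httpapi.authz

# Deny by default; allow only requests that match a rule below.
default allow = false

# Allow a user to GET their own record, e.g. for the input
# {"method": "GET", "path": ["records", "alice"], "user": "alice"}
allow {
    input.method == "GET"
    input.path == ["records", input.user]
}
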

If you want to read more about OPA and how to use it, and learn more about its capabilities, I highly recommend reading its docs.

TL;DR

OPA Gatekeeper is a sub-project of Open Policy Agent, which is specifically designed to implement OPA in Kubernetes.

But before we dive into how Gatekeeper works under the hood, we first need to learn about Kubernetes Admission WebHooks.

Kubernetes Admission WebHooks

When a request comes into the Kubernetes API, it passes through a series of steps before it’s executed. 

  1. The request is authenticated and authorized.
  2. The request is processed by a series of special Kubernetes components called admission controllers, which can mutate, modify, and validate the objects in the request.
  3. The request is persisted into etcd to be executed.

What are admission controllers?

Kubernetes admission controllers are the cluster’s middleware - they control what can proceed into the cluster. Admission controllers can reject deployments that request too many resources, enforce pod security policies, and even block vulnerable images from being deployed.

How admission controllers work under the hood

Under the hood, an admission controller is a collection of pre-defined HTTP callbacks (i.e. webhooks) that intercept requests to the Kubernetes API and process them after they have been authenticated and authorized.

There are two types of admission controllers: 

  • MutatingAdmissionWebhook
  • ValidatingAdmissionWebhook

Mutating admission controllers are invoked first, because their job is to enforce custom defaults and, if necessary, to modify the objects sent to the API server. 

After all the modifications are completed and the incoming object has been validated, the Validating admission controllers are invoked and can reject requests to enforce custom policies.

Note that some controllers are both validating and mutating. If one of these rejects the request, the request will fail.

It’s powerful, free and you might already use it

Several admission controllers come pre-configured out-of-the-box, and you probably already use them. LimitRanger, for example, is an admission controller that enforces the defaults and limits defined by a namespace’s LimitRange object, rejecting pods that exceed those limits.

For further reading about MutatingAdmissionWebhook, I recommend this article.


Dynamic admission control

You may be wondering why admission controllers are implemented as webhooks. This is actually where admission controllers shine, and where dynamic admission control comes in.

Webhooks give developers the freedom and flexibility to customize the admission logic to actions like CREATE, UPDATE or DELETE on any resource. This is extremely useful, because almost every organization will need to add/adjust its policies and best practices.

Built-in admission controllers have a key limitation: modifying them requires recompiling them into kube-apiserver, and they can only be enabled when the apiserver starts. Implementing admission control with webhooks, on the other hand, lets administrators create customized webhooks and add mutating or validating admission webhooks to the admission webhook chain without recompiling anything; the Kubernetes apiserver simply calls the registered webhooks through a standard interface.
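
For illustration, here is a hedged sketch of how a custom validating webhook might be registered. The names, namespace, path, and rules are all hypothetical:


apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy-webhook           # hypothetical name
webhooks:
  - name: policy.example.com             # hypothetical webhook name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore
    rules:
      - apiGroups: ["apps"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["deployments"]
    clientConfig:
      service:
        name: example-webhook-service     # hypothetical Service that serves the callback
        namespace: example-system
        path: /validate
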

For further reading about dynamic admission control, I recommend reading K8s docs.

Now that we know what an admission controller/webhook is, we can delve into how Gatekeeper works.

How OPA Gatekeeper works

Gatekeeper acts as a bridge between the Kubernetes API server and OPA. In practice, this means that Gatekeeper checks every request that comes into the cluster to see if it violates any of the predefined policies. If it does, the apiserver will reject it.

Under the hood, Gatekeeper integrates with Kubernetes using the Dynamic Admission Control API and is installed as a customizable ValidatingAdmission webhook. Once it’s installed, the apiserver triggers it whenever a resource in the cluster is created, updated, or deleted.

Since Gatekeeper operates through OPA, all policies must be written in Rego. Fortunately, Gatekeeper has that covered with the OPA Constraint Framework.

A constraint is a custom resource (defined by a CRD) that represents the policy we want to enforce on a specific kind of resource. When the validating admission webhook is invoked, Gatekeeper evaluates all relevant constraints and sends OPA the request together with the policies to enforce. All constraints are evaluated as a logical AND: if any constraint is not satisfied, the whole request is rejected.

Fun fact: This project is a collaboration between Google, Microsoft, Red Hat, and Styra.

Additional info about constraints

In order to create a constraint, we first need to create a ConstraintTemplate, which defines both the Rego policy used to enforce the constraint and the general schema of the constraint. Once you define a template, you can create multiple constraints that share its policy but target different resources or parameters.

🛠 How to use OPA Gatekeeper - a simple scenario

Let’s say that you want to prevent resources from being created without an owner label:

  • Install Gatekeeper on your cluster

kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/release-3.4/deploy/gatekeeper.yaml


You can test it by running the following command:


kubectl get pods --all-namespaces


If everything works correctly, you should see a running pod named gatekeeper-controller-manager in the gatekeeper-system namespace.

  • Apply the ConstraintTemplate that will require all the labels described by the constraint to be present:

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        # Schema for the `parameters` field
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels
        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }



  • Apply the constraint that uses the K8sRequiredLabels template we created above, scoped to namespaces, so that every namespace will be required to have an owner label:


apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]


🧭 Audit

As I like to say, the cluster is our production temple, so we want ongoing monitoring that lets us detect and remediate pre-existing misconfigurations. This is where audit comes in.

What are audits?

Audit periodically evaluates the resources that already exist in the cluster against the enforced constraints. Gatekeeper stores the audit results as violations in the status field of the relevant constraint, for example:


apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
status:
  auditTimestamp: "2019-08-06T01:46:13Z"
  byPod:
  - enforced: true
    id: gatekeeper-controller-manager-0
  violations:
  - enforcementAction: deny
    kind: Namespace
    message: 'you must provide labels: {"owner"}'
    name: default
  - enforcementAction: deny
    kind: Namespace
    message: 'you must provide labels: {"owner"}'
    name: gatekeeper-system
  - enforcementAction: deny
    kind: Namespace
    message: 'you must provide labels: {"owner"}'
    name: kube-public
  - enforcementAction: deny
    kind: Namespace
    message: 'you must provide labels: {"owner"}'
    name: kube-system


A Simple Scenario

Let’s say you want to create a policy that enforces all ingress hostnames to be unique. In this case, you would want to use the audit feature. However, the constraint that enforces a unique ingress hostname must have access to all other ingresses other than the object under evaluation—this requires data replication into OPA. 
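
To give a sense of why replication is needed, here is a rough, hypothetical sketch of a rule from such a ConstraintTemplate. It assumes Ingress objects have been replicated into OPA (see the Config resource below) and is not the exact policy shipped in the gatekeeper-library:


violation[{"msg": msg}] {
  host := input.review.object.spec.rules[_].host
  # data.inventory holds the cluster state that Gatekeeper has replicated into OPA
  other := data.inventory.namespace[other_ns][_]["Ingress"][other_name]
  other.spec.rules[_].host == host
  # a real policy would also exclude the object under review itself
  msg := sprintf("ingress host %v is already used by ingress %v/%v", [host, other_ns, other_name])
}
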

By default, an audit does not require replication but there are two ways to configure data replication manually:

  1. Use the OPA cache mechanism. Simply set the flag audit-from-cache to ‘true’, which will enable the OPA cache to be used as the source-of-truth for all audit queries; thus, any object must first be cached before it can be audited for constraint violations.

  2. Use a Kubernetes config resource. Create a config resource and define the resources you want replicated into OPA in syncOnly. Don’t worry, updating syncOnly dynamically updates the set of synced objects.

For example, the following configuration replicates all namespace and pod resources to OPA:


apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: "gatekeeper-system"
spec:
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Namespace"
      - group: ""
        version: "v1"
        kind: "Pod"


Dry Run

Let’s take it even further and say that we want to test the constraints before we add and start enforcing them. This is where the Dry Run feature comes in. 

What is Dry Run?

Dry run provides the same violation-reporting functionality as audit: you can deploy a constraint and see all of its violations reported in the status field, but offending requests are not actually rejected.

How to use Dry Run?

In order to configure a constraint for dry-run mode, all you need to do is set enforcementAction: dryrun in the constraint’s spec. By default, enforcementAction is set to deny, since the default behavior is to reject admission requests with any violation.

For example:


apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  enforcementAction: dryrun
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
status:
  auditTimestamp: "2019-08-06T01:46:13Z"
  byPod:
  - enforced: true
    id: gatekeeper-controller-manager-0
  violations:
  - enforcementAction: dryrun
    kind: Namespace
    message: 'you must provide labels: {"owner"}'
    name: default
  - enforcementAction: dryrun
    kind: Namespace
    message: 'you must provide labels: {"owner"}'
    name: gatekeeper-system
  - enforcementAction: dryrun
    kind: Namespace
    message: 'you must provide labels: {"owner"}'
    name: kube-public
  - enforcementAction: dryrun
    kind: Namespace
    message: 'you must provide labels: {"owner"}'
    name: kube-system


🎁 Think big

Now that your production is safe and sound, I’d like to pause for a second and ask you a question: how is Gatekeeper different from unit tests? Personally, when I first heard about Gatekeeper, I couldn’t help but wonder: “Why does it have to be only about production?” After all, ideally, doesn’t it all go through the same pipeline?

As a developer, thinking about Kubernetes policies as if they were unit-tests made so much sense that it got me wondering - what's the difference between my code and Kubernetes resources?

Enter Datree.

What is Datree?

Datree is a CLI solution that enables you to test policies against YAML files. The CLI comes with built-in policies for all Kubernetes best practices – as well as a centralized management solution for any policies you create. Run Datree in the CI, or as a pre-commit hook, and use it as you would use a local testing library. It’s an open-source project, so it’s free 🤑
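
For example, a local run against some manifest files might look like this (the file path is just a placeholder):


datree test ./k8s/*.yaml
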

You can find the project here and all the info on our website.
I encourage you to review our code and give us your feedback so it can be the best solution for your production.

🌸 Summary

The way I see it, just because Kubernetes allows you to deploy a pod with access to the host network namespace, for example, doesn't mean it's a good idea.

IMHO, when you adopt Kubernetes you're also changing the culture of your organization. DevOps is not something that happens overnight; it’s a process and it’s important to know how to manage it, especially in terms of scale. 

I hope this will inspire you to start thinking about your policies and how to enforce them within your organization.

