Over the last couple of years, Kubernetes has become the most popular system for container orchestration and scheduling. In this article, we cover the top 10 things you should look out for when pushing Kubernetes configurations from development through your deployment process. Our recommendations come from real-world experience with Kubernetes configurations that made it to production but never should have.
If you set “type: LoadBalancer” in a Service definition, the cloud you are running on will create a load balancer for you. In AWS this is an ELB (external by default) and in GCP this is a LoadBalancer (external). All too often, though, this is a security risk, since you are exposing something to the Internet with few or no security controls. Typically there is already at least one external load balancer that handles the services you want to expose to the Internet, and everything should route through that.
Any pull request that adds or changes a “type: LoadBalancer” line should be flagged, and someone - a DevOps Engineer perhaps - should review it for verification and approval.
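As a minimal sketch of what a reviewer would be looking for, the Service below requests a cloud load balancer. The service name, labels, and ports are hypothetical and only serve as an illustration.

```yaml
# Hypothetical Service; name, labels and ports are illustrative only.
apiVersion: v1
kind: Service
metadata:
  name: web-frontend
spec:
  type: LoadBalancer      # the line to flag: asks the cloud provider for a load balancer
  selector:
    app: web-frontend     # pods carrying this label receive the traffic
  ports:
    - port: 80            # port exposed by the load balancer
      targetPort: 8080    # port the container listens on
```

A simple review rule could be that any diff touching the `type:` field of a Service requires explicit approval.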
A Service with “type: NodePort” opens a port on every node, where it can be reached from the network external to the cluster. This exposes the cluster to a security risk, so caution should be taken when making this decision. Just like type LoadBalancer, it is really easy to do, because many tutorials direct you to do it for ease of use. Even a lot of helm / stable charts do this by default to make it “easy” for you to reach the application.
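For comparison, a NodePort Service might look like the sketch below. The names and port numbers are hypothetical; the only fixed point is that `nodePort` has to fall inside the cluster's configured node port range, which is 30000-32767 by default.

```yaml
# Hypothetical Service; name, labels and ports are illustrative only.
apiVersion: v1
kind: Service
metadata:
  name: web-frontend
spec:
  type: NodePort          # the line to flag: opens the port on every node
  selector:
    app: web-frontend
  ports:
    - port: 80            # cluster-internal service port
      targetPort: 8080    # port the container listens on
      nodePort: 30080     # port opened on each node (default range 30000-32767)
```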
A “readiness” probe should be defined in every deployment. It is simply a signal that tells Kubernetes when the pod can be put behind the load balancer, and the service behind the proxy, to serve traffic. If you put an application behind the load balancer before it is ready, a user can reach the pod but will not get the response expected from a healthy server. This rule is here to alert the pod developer that the readiness probe should be enabled.
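A readiness probe could be declared roughly as follows. This is a sketch under assumptions: the image, the `/ready` endpoint, the port, and the timing values are all hypothetical and would need to match the real application.

```yaml
# Hypothetical Deployment; image, endpoint, port and timings are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
        - name: web
          image: registry.example.com/web-frontend:1.0.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /ready            # endpoint the application must implement
              port: 8080
            initialDelaySeconds: 5    # wait before the first check
            periodSeconds: 10         # check interval
```

Until the probe succeeds, the pod is not added to the Service endpoints and therefore receives no traffic.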
The liveness probe is just as important as the readiness probe. The liveness probe lets Kubernetes know whether the pod is in a healthy state; if it isn’t, Kubernetes restarts the container. This is done via a simple check, such as getting an HTTP 200 OK from some endpoint, or a more complex check based on some bash commands. Either way, it is important - and very handy - to let Kubernetes know when the application isn’t working and needs to be restarted.
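The two kinds of check mentioned above could be expressed as in the fragment below, which shows only the `containers` section of a pod template. The container names, images, endpoint, file path, and timings are hypothetical.

```yaml
# Fragment of a pod template; all names, paths and timings are assumptions.
containers:
  - name: web
    image: registry.example.com/web-frontend:1.0.0
    livenessProbe:
      httpGet:                  # HTTP check: status codes 200-399 count as success
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
      failureThreshold: 3       # restart after three consecutive failures
  - name: worker
    image: registry.example.com/worker:1.0.0
    livenessProbe:
      exec:                     # command check: exit code 0 counts as success
        command: ["/bin/sh", "-c", "test -f /tmp/healthy"]
      periodSeconds: 30
```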
The container(s) in a deployment should declare the CPU and memory resources they need (requests) and the maximum they are allowed to use (limits). Requests prevent the pod from being starved of resources, and limits prevent a single container from consuming all of the CPU or memory on a node.
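A resource declaration might look like the following fragment of a pod template. The values are placeholders, not recommendations; suitable numbers depend on how the application actually behaves under load.

```yaml
# Fragment of a pod template; the numbers are placeholders, not recommendations.
containers:
  - name: web
    image: registry.example.com/web-frontend:1.0.0
    resources:
      requests:           # what the scheduler reserves for the container
        cpu: 250m         # a quarter of one CPU core
        memory: 256Mi
      limits:             # hard ceiling enforced at runtime
        cpu: 500m
        memory: 512Mi
```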
Your company may be more or less stringent about where binaries come from, depending on its policy on third-party binaries. If you are pulling common images that many organizations use - like the official nginx, MySQL, or Redis - your organization might want to build them from source and/or rehost the images internally instead of pulling from Docker Hub.
The reason is that the images stored in Docker Hub can change if someone pushes the same image and tag to it. That means what you get from pulling the same image and tag may be different from one day to another, causing confusion. Additionally, the difference could be something malicious that could compromise your infrastructure and application. To mitigate these risks, you can either build the image from the source and host it in your own repository, or push the same images into your repository.
If your organization hosts some or all of its container images, you should apply this rule to flag any image that does not come from your organization so that someone can approve it.
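To illustrate the difference a reviewer would look for, the fragment below contrasts an image pulled straight from Docker Hub with the same image rehosted internally. The registry host `registry.example.com` and the repository path are hypothetical.

```yaml
# Fragment of a pod template; registry host and paths are hypothetical.
containers:
  - name: cache-from-hub
    image: redis:7.2                               # no registry prefix: pulled from Docker Hub, should be flagged
  - name: cache-rehosted
    image: registry.example.com/mirror/redis:7.2   # rehosted copy in the organization's own registry
```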
Applying Security Group policies to your VMs or your Kubernetes worker nodes is considered essential to security. We should do the same with Kubernetes workloads.
The best practice is to limit inbound and outbound traffic to only what you need, so you don’t accidentally expose services or allow unwanted outbound connections. Kubernetes has Network Policy functionality that is comparable to Security Groups. All resources should have Network Policy rules associated with their deployments.
For every deployment, there should be an accompanying network policy file or an equivalent NetworkPolicy resource.
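A NetworkPolicy of this kind could be sketched as follows. Every label, name, and port here is an assumption made for illustration: the policy selects pods labelled `app: web-frontend`, accepts traffic only from a hypothetical ingress gateway on port 8080, and allows outbound traffic only to a hypothetical database on port 5432.

```yaml
# Hypothetical NetworkPolicy; all labels, names and ports are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-frontend
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: web-frontend            # pods this policy applies to
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: ingress-gateway   # only these pods may connect
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: orders-db          # only this destination is allowed
      ports:
        - protocol: TCP
          port: 5432
```

Note that once egress is restricted like this, DNS lookups are blocked as well unless a further egress rule permits them.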
If you want to get super fancy with it, you can match the ports list in the network policy against the container ports exposed by the deployment's pods and/or the service's port list.
Ideally, these would all match so that the developers know that everything reconciles and the network policy doesn't list a port that is not used by the service or pod.
Also make sure that you provision your cluster with a network plugin (CNI) that supports network policies.
Service accounts provide an identity, mapped to some set of Kubernetes API server access permissions, for a pod to use. These might be very minor changes that are easily overlooked, but they can have big ramifications for security and API server access. When these items change, the change should be flagged and the right people notified to review it before it is allowed to merge.
If you see this type of change, flag it:
If a service account name changes, flag it:
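As a hedged illustration of where a service account is referenced, the fragment below shows a pod template that sets `serviceAccountName`. The account name is hypothetical; a diff that adds or alters this field is the kind of change a reviewer would want surfaced.

```yaml
# Fragment of a Deployment; the service account name is hypothetical.
spec:
  template:
    spec:
      serviceAccountName: web-frontend-sa   # identity the pod uses against the API server
      containers:
        - name: web
          image: registry.example.com/web-frontend:1.0.0
```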
The principle of least privilege is something your security team will be bugging you about. It's a compelling reason to get your RBAC configuration right. This is something that requires an overall review, starting with subjects that can create resources like Deployments or Pods in general or read sensitive resources like Secrets.
The challenging part is making sure that Role (or ClusterRole) resources do not accumulate privileges over time.
Such a change is definitely something that should be flagged for review.
For example: changing verbs: ["get"] to verbs: ["*"] is a significant change.
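A narrowly scoped Role might look like the sketch below. The role name, namespace, and resource are hypothetical; the point is the `verbs` line, where a one-character edit would widen access considerably.

```yaml
# Hypothetical Role; name, namespace and resource are illustrative only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: config-reader
  namespace: default
rules:
  - apiGroups: [""]             # "" is the core API group
    resources: ["configmaps"]
    verbs: ["get"]              # changing this to ["*"] would also grant create, update, delete, ...
```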
Check out these sample RBAC policies to get started.
The Kubernetes master nodes are the control nodes of the entire cluster. Therefore, only certain items should be permitted to run on these nodes. To effectively limit what can run there, taints are placed on the nodes so that only pods that tolerate the taint can be scheduled on them. However, this does not prevent anyone from adding the matching toleration to their own pods in order to run on the master nodes.
If you encounter the toleration below on a Pod specification in one of your deployment resources, and your cluster is self-managed, it should be flagged for review:
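A toleration of this kind typically looks like the fragment below. It assumes the nodes carry the `node-role.kubernetes.io/master` taint that kubeadm has traditionally applied; newer kubeadm-provisioned clusters use the `node-role.kubernetes.io/control-plane` key instead, so the key to watch for depends on how the cluster was set up.

```yaml
# Fragment of a pod specification; assumes the traditional kubeadm master taint.
tolerations:
  - key: node-role.kubernetes.io/master   # or node-role.kubernetes.io/control-plane on newer clusters
    operator: Exists
    effect: NoSchedule                    # lets the pod be scheduled despite the taint
```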
The above 10 best practices should always be implemented before running deployments in order to mitigate security risks and ensure operational excellence. There are tools that could help you do that effectively, including Alcide and Datree.
Alcide's Kubernetes Advisor audits your Kubernetes cluster, nodes and pods configuration to make sure that the cluster is tuned and runs according to security best practices.
Datree allows you to check Kubernetes configuration files early on during development - when code is committed to source control - to ensure these practices and your other development standards are adopted.