Validating Kubernetes Manifests

At ZipRecruiter my team is hard at work making Kubernetes our production platform. This is an incredible effort and I can only take the credit for very small parts of it. The issue that I was tasked with most recently was to verify and transform Kubernetes manifests; this post demonstrates how to do that reliably.

TL;DR: I built a Go package to walk Kubernetes manifests by type that allows transformation or validation of resources.

🔗 Kubernetes Manifests

For those who haven’t used Kubernetes I should describe what a manifest is. In brief it is (typically) one or more YAML documents that describe one or more Kubernetes resources. A resource is a meaningless long word that just means “thing,” but it’s the word normally used in k8s. For a complete listing check out the official docs. The things on the left are resource types.

🔗 Validate a Manifest

The most obvious validation you might do on a manifest is to ensure that it is valid. There are a number of existing solutions to this already; a convenient one is kubeval, which is an OK solution but reaches out to GitHub to pull API definitions for each resource. That is not acceptable for us, though fixing it isn’t too hard.

In addition, we have policies for how people use Kubernetes that we need to enforce. A super obvious example is the image: you should ensure that you’re using either a fully resolved image (one that includes the repo digest) or an immutable version tag.
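As a rough illustration of that image policy, here is a minimal sketch of a check on a single image string. The function name imagePinned is made up for this example, and treating every tag other than latest as immutable is an assumption; a real policy would probably be stricter.

```go
package main

import (
	"fmt"
	"strings"
)

// imagePinned reports whether an image reference carries a digest
// (name@sha256:...) or an explicit tag other than "latest".
// Hypothetical helper: treating any non-"latest" tag as immutable is
// an assumption of this sketch, not a rule from the post.
func imagePinned(image string) bool {
	if strings.Contains(image, "@sha256:") {
		return true
	}
	// Only look for a tag after the last slash, so a registry port
	// such as "registry:5000/app" is not mistaken for a tag.
	name := image[strings.LastIndex(image, "/")+1:]
	i := strings.LastIndex(name, ":")
	if i < 0 {
		return false
	}
	tag := name[i+1:]
	return tag != "" && tag != "latest"
}

func main() {
	for _, img := range []string{
		"nginx",
		"nginx:latest",
		"nginx:1.15.7",
		"registry:5000/app",
		"example.com/app@sha256:0123abcd",
	} {
		fmt.Printf("%-35s pinned=%v\n", img, imagePinned(img))
	}
}
```

Checking one string is the easy part; the hard part is finding every place an image can appear.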

This sounds easy until you do a little bit of research and find that containers (which have the image field) appear in manifests in over 100 distinct locations, so you can’t just manually write code for each case. For example, deployments contain a deployment spec, which contains a pod template, which contains a pod spec, which contains a list of containers.

I decided I’d try to build a visitor that would allow you to walk a given resource and have a callback be triggered whenever a certain resource type appeared. So for example, if you pass a Pod that defines two inner Containers, you could walk the Pod and get your Container function called twice, once for each container.

🔗 Building Walk

First and foremost, I started with the OpenAPI specification that is provided with Kubernetes. I don’t actually know anything about OpenAPI or Swagger or whatever, but looking at the data I came up with some code to build a path from a given type to another type.

All resources have a type, which is expressed with the apiVersion and kind fields within the manifest. My idea was this: given a resource, we should be able to enumerate all possible paths from the root of the type to the resource types we are looking for. I wrote a Perl script that generates that listing by doing a brute force search of the OpenAPI spec. (It also generates the mapping of kind and apiVersion that the ResourceType function uses.)
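To make the apiVersion and kind lookup concrete, here is a small sketch. The typeNames table and the resourceType function are hand-written stand-ins invented for this example; the real mapping is generated from the OpenAPI spec and the package's own ResourceType function may look quite different.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// typeNames is a hypothetical, hand-written stand-in for the generated
// mapping from (apiVersion, kind) to OpenAPI definition names.
var typeNames = map[[2]string]string{
	{"v1", "Pod"}:             "io.k8s.api.core.v1.Pod",
	{"apps/v1", "Deployment"}: "io.k8s.api.apps.v1.Deployment",
}

// resourceType looks up the definition name for a decoded manifest.
func resourceType(resource map[string]interface{}) (string, error) {
	apiVersion, _ := resource["apiVersion"].(string)
	kind, _ := resource["kind"].(string)
	name, ok := typeNames[[2]string{apiVersion, kind}]
	if !ok {
		return "", fmt.Errorf("unknown resource type %q %q", apiVersion, kind)
	}
	return name, nil
}

func main() {
	// JSON is used here only to stay within the standard library.
	raw := []byte(`{"apiVersion": "apps/v1", "kind": "Deployment"}`)
	var resource map[string]interface{}
	if err := json.Unmarshal(raw, &resource); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	name, err := resourceType(resource)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(name)
}
```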

The Go package then recursively walks resources using the paths that the Perl code generated.
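The recursive walk can be sketched roughly like this. This is not the package's actual code: walkPath is a made-up name, the path here is hard-coded rather than generated, and the "[]" segment meaning "every element of this list" is an encoding invented for the example.

```go
package main

import "fmt"

// walkPath follows one path from node and calls fn on whatever it reaches.
// The "[]" segment (visit every list element) is an assumption of this sketch.
func walkPath(node interface{}, path []string, fn func(interface{}) error) error {
	if len(path) == 0 {
		return fn(node)
	}
	if path[0] == "[]" {
		items, ok := node.([]interface{})
		if !ok {
			return nil // not a list; nothing to visit
		}
		for _, item := range items {
			if err := walkPath(item, path[1:], fn); err != nil {
				return err
			}
		}
		return nil
	}
	m, ok := node.(map[string]interface{})
	if !ok {
		return nil
	}
	child, ok := m[path[0]]
	if !ok {
		return nil // optional field absent: the callback never fires
	}
	return walkPath(child, path[1:], fn)
}

func main() {
	deployment := map[string]interface{}{
		"spec": map[string]interface{}{
			"template": map[string]interface{}{
				"spec": map[string]interface{}{
					"containers": []interface{}{
						map[string]interface{}{"name": "web", "image": "nginx:1.15.7"},
						map[string]interface{}{"name": "sidecar", "image": "envoy:v1.8.0"},
					},
				},
			},
		},
	}
	// One of the many paths from a Deployment to its containers.
	toContainers := []string{"spec", "template", "spec", "containers", "[]"}
	err := walkPath(deployment, toContainers, func(c interface{}) error {
		fmt.Println(c.(map[string]interface{})["image"])
		return nil
	})
	if err != nil {
		fmt.Println("walk failed:", err)
	}
}
```

Running this calls the callback once per container. Supporting a new target type then only means generating more paths, not writing more traversal code.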

🔗 Using Walk

Here’s a partial listing of what we use this for at work, which is to make a non-alpha, improved version of a PodPreset (ask me at some point and I can give reasons why, or maybe I’ll blog that later):

err = manifests.Walk("io.k8s.api.core.v1.PodSpec", resource, func(i interface{}) error {
	v, ok := i.(map[string]interface{})
	if !ok {
		return errors.New("Cannot transform non-hash resource", "resource", fmt.Sprintf("%#v", i))
	}

	vols := []interface{}{}

	// Reuse the existing volumes list if the pod spec already has one.
	volsRaw, ok := v["volumes"]
	if !ok {
		v["volumes"] = vols
	} else {
		vols, ok = volsRaw.([]interface{})
		if !ok {
			return errors.New("volumes were not an array", "volumes", fmt.Sprintf("%#v", volsRaw))
		}
	}

	// Append the config map volume to whatever was already there.
	v["volumes"] = append(vols, []interface{}{map[string]interface{}{
		"name":      "config-volume",
		"configMap": map[string]interface{}{"name": "config-map"},
	}}...)

	return nil
})

The code is super annoying because it uses map[string]interface{} and []interface{} types instead of any actual structs. Arguably Go is one of the worst languages for this, but it has to run outside of containers so it’s worth it to avoid any runtime deps.

In addition to the type related annoyances, this approach has two limitations:

  1. It doesn’t support recursive types.
  2. It (at least in its current form) can’t validate or transform missing resources.

The recursive thing hasn’t been an issue for me, but there are some types in the OpenAPI spec that are recursive. I only discovered this because I tried to simplify the Perl script by generating the paths for all types and found that I couldn’t without fixing the recursive issue. Patches welcome for that.

The second issue is more frustrating and I’m not sure that it’d be sensible to fix it generically. In theory you could do this:

meta := "io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta"
err := manifests.Walk(meta, resource, func(r interface{}) error {
	noop := func(interface{}) error { return nil }
	if _, ok := manifests.Check(r, []string{"labels", "whatever"}, noop); !ok {
		return errors.New("whatever not set")
	}
	return nil
})

(Note the use of the Check function which allows descending into the objects with a list of strings as path segments.) But the meta key is optional, so if a resource is lacking it, the above won’t be called for the missing meta. I suspect I could do some kind of crazy to allow injecting values like this but I have a gut feeling it would end up a mess and not actually that useful.
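To show what descending by path segments looks like, and why a missing key is invisible to a walker, here is a minimal sketch. The lookup function is a hypothetical helper written for this example; it is not the package's Check function and does not share its signature.

```go
package main

import "fmt"

// lookup descends through nested maps one key at a time and reports
// whether the whole path exists. Hypothetical helper for this sketch.
func lookup(node interface{}, path ...string) (interface{}, bool) {
	for _, key := range path {
		m, ok := node.(map[string]interface{})
		if !ok {
			return nil, false
		}
		if node, ok = m[key]; !ok {
			return nil, false
		}
	}
	return node, true
}

func main() {
	labeled := map[string]interface{}{
		"metadata": map[string]interface{}{
			"labels": map[string]interface{}{"whatever": "yes"},
		},
	}
	// No metadata at all: there is nothing for a walker to visit.
	bare := map[string]interface{}{"kind": "Pod"}

	_, ok := lookup(labeled, "metadata", "labels", "whatever")
	fmt.Println("labeled:", ok)
	_, ok = lookup(bare, "metadata", "labels", "whatever")
	fmt.Println("bare:", ok)
}
```

A check like this has to start from the root of the resource, which is exactly what a type-driven walk doesn't give you for missing values.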

Is this the best or even only way to implement manifest validation? Absolutely not. But leveraging the official API specs to generate the boring part is clearly useful. Also generating Go code from Perl makes me feel like I’m cheating the devil or something.

If you want to learn more about Kubernetes, you might want to check out Kelsey Hightower’s book Kubernetes: Up and Running. He knows what’s up.

If you want to learn more about Go I would suggest The Go Programming Language which is one of the best tech books I’ve ever read.

Posted Tue, Dec 18, 2018
