Kubernetes Cluster Logging: Manually Installing Fluentd + ElasticSearch

Overview

This is a supplement to the main k8s article found here. Following that article is the easiest way to get this working, but if for whatever reason you're not kube-up'ing, read on.

There are three parts to getting this running:

1. Get ES running, as this is where all the logs are to be pushed (or pulled from).
2. Set up the agent on each of the nodes to filter and deliver the logs to ES.
3. Bring up kibana to allow us to fiddle with the logs in a semi-controlled clicky fashion.

The yaml for parts one and three can be found here.

Deploy ElasticSearch

Again, here's a link to the yaml files in the k8s project that we'll be using below.

Edit es-controller.yaml if you want to have more than two ES nodes running. You can always adjust this later. The line you want to edit is replicas: 2.
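
For reference, the relevant fragment of es-controller.yaml looks roughly like this (a trimmed-down sketch -- the metadata and labels in the version you pull down will have more to them):

apiVersion: v1
kind: ReplicationController
metadata:
  name: elasticsearch-logging-v1
  namespace: kube-system
spec:
  replicas: 2   # bump this if you want more ES pods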

Use kubectl to create the yamls -- first bringing up the RC and then the service:

kubectl create -f es-controller.yaml      
kubectl create -f es-service.yaml         
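
A quick sanity check that both objects were created:

kubectl get rc,svc --namespace=kube-system | grep elasticsearch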

Confirm ES started properly

This is easier said than done inside of a k8s cluster. What I'd suggest doing (well, honestly I'd say assume it's all good until it isn't, but let's say you're not a cowboy) is running a busybox pod that you can use as a shell INSIDE k8s.

Here's a command to do that. Be sure to specify the right namespace when doing this. Also consider changing --tty jfbbox to some other unique name in case a few different admins are using this technique at the same time.

kubectl run -i --tty jfbbox --image=busybox --generator="run-pod/v1" --namespace=kube-system

Inside the busybox shell, you can run the following which performs a request against the service elasticsearch-logging:

wget http://elasticsearch-logging:9200

That command just saves the response to a file named index.html, so you'll have to cat it after that to see what you pulled back.

That full session should look something like this:

/ # wget http://elasticsearch-logging:9200
Connecting to elasticsearch-logging:9200 (10.65.12.179:9200)
index.html           100% |*******************************|   346   0:00:00 ETA
/ # cat index.html
{
  "status" : 200,
  "name" : "Garrison Kane",
  "cluster_name" : "kubernetes-logging",
  "version" : {
    "number" : "1.5.2",
    "build_hash" : "62ff9868b4c8a0c45860bebb259e21980778ab1c",
    "build_timestamp" : "2015-04-27T09:21:06Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
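
If you'd rather skip the intermediate file, busybox's wget should also let you print the response straight to stdout (assuming your busybox image's wget supports -q and -O):

wget -qO- http://elasticsearch-logging:9200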

Once done, exit out of the shell and then delete the busybox pod with the following command:

kubectl delete pod jfbbox --namespace=kube-system

Checking the ES logs

To see what went wrong, or just as a point of interest, you can check the logs for each of the pods. First get the names of the pods you're interested in.

kubectl get pods --namespace=kube-system | grep elasticsearch-logging | cut -d " " -f1

That should output something similar to the following lines, with only the last five characters differing:

elasticsearch-logging-v1-56784
elasticsearch-logging-v1-rg13v

and then get the logs for each of the pods listed:

kubectl logs elasticsearch-logging-v1-56784 --namespace=kube-system

Add an -f flag if you want to live-tail this. This will dump out a lot of stuff. You're probably like me (an ES n00b), so you'll ask "what does an error look like?" Some things I've seen are errors about masters not found, etc... At this point it might be an ES problem OR it could be ES interacting with various things in your cluster. For example -- is your DNS the issue? Good luck. Don't proceed until the above test works though.
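
If you want a quick look at the tail of every ES pod's logs in one shot, a small shell loop over the pod names from the grep above works:

for pod in $(kubectl get pods --namespace=kube-system | grep elasticsearch-logging | cut -d " " -f1); do
  echo "=== $pod ==="
  kubectl logs $pod --namespace=kube-system | tail -n 20
done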

Get Fluentd Agent Running on the Nodes

Now we get the Fluentd agent running on each of the nodes to filter the logs and ship them to ES. To do this, we simply drop the yaml found here into the directory /etc/kubernetes/manifests on all of your nodes.

You can test it by manually dropping it onto a k8s node and moving it to the right directory, but you should add that step to whatever automation you're using to build out your nodes.
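
For a one-off manual test, something like this works (a sketch -- the file name fluentd-es.yaml and the node hostname are placeholders for whatever you're actually using):

scp fluentd-es.yaml myhost-kubenode-001:/tmp/
ssh myhost-kubenode-001 "sudo mv /tmp/fluentd-es.yaml /etc/kubernetes/manifests/"

The kubelet watches that directory and will start the pod on its own once the file lands there.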

Confirm that it's running and working as you'd expect.

kubectl get pods --namespace=kube-system

and you should see a fluentd pod running for each of your nodes -- something like this.

fluentd-elasticsearch-myhost-kubenode-001   1/1       Running   0          19h

You can check the fluentd logs using the same basic process as outlined above for ES.
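
For example, using the pod name shown above:

kubectl logs fluentd-elasticsearch-myhost-kubenode-001 --namespace=kube-system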

Note: one way you can tell you're killing ES with log volume is that you'll see timeouts in the fluentd logs like the following:

2016-06-14 19:28:45 +0000 [warn]: Could not push logs to Elasticsearch, resetting connection and trying again. read timeout reached
2016-06-14 19:28:53 +0000 [info]: Connection opened to Elasticsearch cluster => {:host=>"elasticsearch-logging", :port=>9200, :scheme=>"http"}

When you see this, you'll want to increase the number of replicas in your ES replication controller. That can be done with the following command (in this instance, we go from the default of 2 up to 5 pods):

kubectl scale rc elasticsearch-logging-v1 --replicas=5 --namespace=kube-system
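
You can confirm the new replica count took with something like:

kubectl get rc elasticsearch-logging-v1 --namespace=kube-system
kubectl get pods --namespace=kube-system | grep elasticsearch-logging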

Kibana

This one is the blackest of black boxes to me. Hope it works for you right off the bat :D

Here's the yaml we want to use.

https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/fluentd-elasticsearch

One thing before you begin -- more than likely, you'll want to expose your kibana service OUTSIDE of k8s. To do that, you'll need to make a small change to kibana-service.yaml in the above directory. If you don't want that, don't worry about it. We're going to add type: LoadBalancer like this:

 spec:
+  type: "LoadBalancer"
   ports:
   - port: 80
     protocol: TCP

Then we're ready to create it all.

kubectl create -f kibana-controller.yaml
kubectl create -f kibana-service.yaml
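
If you went the LoadBalancer route, you can find the externally reachable address once your cloud provider has provisioned it (assuming the service is named kibana-logging, as in the addon yaml) -- look at the external IP column:

kubectl get service kibana-logging --namespace=kube-system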

If Kibana doesn't find an index when it first comes up, that means logs aren't getting into ES as you'd expect. Otherwise, that's it. Do a victory lap!