Wednesday, May 24, 2017

Kubernetes Readiness and Liveness with Apache Kafka REST Proxy

When setting up Readiness and Liveness checks in Kubernetes for Kafka connectors, the httpGet probe described in my previous blog post (Kubernetes Readiness and Liveness with Spring Boot Actuator) is not an option, because a connector has no health endpoint of its own to reference. Connectors can be deployed with the Apache Kafka REST Proxy, which gets us on the right path but doesn't quite work the way we want in this respect.

The Kafka REST Proxy provides endpoints that return basic status info about connectors. However, a standard Kubernetes httpGet probe treats any status code >= 200 and < 400 as success, and the Kafka REST status endpoint returns a 200 status code whether the connector is running or paused, so an httpGet probe cannot tell us that a connector is down.
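
To make the httpGet rule concrete, here is a small sketch of the same success test written as a shell function. The name http_probe_ok is hypothetical and only illustrates the range check; it is not how Kubernetes itself is implemented.

```shell
#!/bin/sh
# Mimic the httpGet success rule: a status code >= 200 and < 400 passes.
# $1 is the numeric HTTP status code to evaluate.
http_probe_ok() {
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}
```

Because the status endpoint answers 200 for a RUNNING connector and for a PAUSED one, this test passes in both cases, which is why the response body has to be inspected instead.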

What we would like to do is check the content of the status call, and do a string comparison. For example, when the service is up, the status endpoint indicates that the state is "RUNNING":
# curl http://10.30.128.1:8083/connectors/mysql-kafka-connector/status
{"name":"mysql-kafka-connector","connector":{"state":"RUNNING","worker_id":"10.30.128.1:8083"},"tasks":[{"state":"RUNNING","id":0,"worker_id":"10.30.128.1:8083"}]}

We can pause the connector using this endpoint:
# curl -i -X PUT http://10.30.128.1:8083/connectors/mysql-kafka-connector/pause
HTTP/1.1 202 Accepted

And then the state is changed to PAUSED:
# curl http://10.30.128.1:8083/connectors/mysql-kafka-connector/status
{"name":"mysql-kafka-connector","connector":{"state":"PAUSED","worker_id":"10.30.128.1:8083"},"tasks":[{"state":"PAUSED","id":0,"worker_id":"10.30.128.1:8083"}]}
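
Since the check hinges on the connector's state field, here is a minimal sketch of pulling just that value out of the response with sed. The function name connector_state is made up for illustration, and the pattern assumes the compact single-line JSON shown above; a JSON-aware tool such as jq would be more robust if it is available in the image.

```shell
#!/bin/sh
# Print the connector-level "state" value from a status document read on stdin.
# Assumes compact JSON in which "connector":{"state":"..." appears on one line.
# Prints nothing if the pattern is not found.
connector_state() {
  sed -n 's/.*"connector":{"state":"\([A-Z_]*\)".*/\1/p'
}
```

It could be used as: curl -s http://10.30.128.1:8083/connectors/mysql-kafka-connector/status | connector_state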

To accomplish this check, we can leverage the exec command probe:
readinessProbe: 
  exec: 
    command:
      - /bin/sh 
      - -c 
      - curl -s http://127.0.0.1:8083/connectors/mysql-kafka-connector/status | grep "RUNNING"
  initialDelaySeconds: 240
  periodSeconds: 5
  timeoutSeconds: 5 
  successThreshold: 1
  failureThreshold: 10 
livenessProbe:
  exec: 
    command: 
      - /bin/sh
      - -c 
      - curl -s http://127.0.0.1:8083/connectors/mysql-kafka-connector/status | grep "RUNNING"
  initialDelaySeconds: 300 
  periodSeconds: 60
  timeoutSeconds: 10 
  successThreshold: 1
  failureThreshold: 3

The exec probe allows us to execute a shell command inside the container. In this case:
  • running the shell (/bin/sh)
  • telling it to run a single command (-c)
  • with the command being a cURL call to the specific connector status endpoint, and grepping for the string "RUNNING"
When grep finds a match it exits with status 0, which Kubernetes interprets as a success. If "RUNNING" is not found, grep exits with a non-zero status and the probe is treated as a failure. You can test this on the server by pausing the connector in question as described above.
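
One caveat with a plain grep for "RUNNING": the status JSON is a single line, so the match succeeds if any state in it is RUNNING, for example a running connector whose task has failed. A stricter variant could look like the sketch below. The function name all_running is hypothetical, the script would have to be present in the container image, and it relies on grep -o, which GNU and BusyBox grep support but POSIX does not require.

```shell
#!/bin/sh
# Succeed only when every "state" field in the status JSON on stdin is RUNNING.
all_running() {
  # Collect each "state":"..." pair on its own line; tolerate no matches.
  states=$(grep -o '"state":"[A-Z_]*"' || true)
  # No state fields at all (empty or error response) counts as a failure.
  [ -n "$states" ] || return 1
  # Any state other than RUNNING (PAUSED, FAILED, ...) fails the check.
  if printf '%s\n' "$states" | grep -qv '"state":"RUNNING"'; then
    return 1
  fi
  return 0
}
```

In a probe this would replace the grep, along the lines of: curl -s http://127.0.0.1:8083/connectors/mysql-kafka-connector/status | all_running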

To get the connector running again, and passing the readiness and liveness checks again, use the resume endpoint:
# curl -i -X PUT http://10.30.128.1:8083/connectors/mysql-kafka-connector/resume
HTTP/1.1 202 Accepted

