This document highlights how to troubleshoot the deployment of a Kubernetes cluster; it does not cover debugging of workloads inside Kubernetes. This page assumes you have a working Juju-deployed cluster.
Using `juju status` can give you some insight into what's happening in a cluster:
```
Model  Controller  Cloud/Region   Version
kubes  work-multi  aws/us-east-2  2.0.2.1

App                Version  Status  Scale  Charm              Store       Rev  OS      Notes
easyrsa            3.0.1    active  1      easyrsa            jujucharms  3    ubuntu
etcd               2.2.5    active  1      etcd               jujucharms  17   ubuntu
flannel            0.6.1    active  2      flannel            jujucharms  6    ubuntu
kubernetes-master  1.4.5    active  1      kubernetes-master  jujucharms  8    ubuntu  exposed
kubernetes-worker  1.4.5    active  1      kubernetes-worker  jujucharms  11   ubuntu  exposed

Unit                  Workload  Agent  Machine  Public address  Ports           Message
easyrsa/0*            active    idle   0/lxd/0  10.0.0.55                       Certificate Authority connected.
etcd/0*               active    idle   0        52.15.47.228    2379/tcp        Healthy with 1 known peers.
kubernetes-master/0*  active    idle   0        52.15.47.228    6443/tcp        Kubernetes master services ready.
  flannel/1           active    idle            52.15.47.228                    Flannel subnet 10.1.75.1/24
kubernetes-worker/0*  active    idle   1        52.15.177.233   80/tcp,443/tcp  Kubernetes worker running.
  flannel/0*          active    idle            52.15.177.233                   Flannel subnet 10.1.63.1/24

Machine  State    DNS            Inst id              Series  AZ
0        started  52.15.47.228   i-0bb211a18be691473  xenial  us-east-2a
0/lxd/0  started  10.0.0.55      juju-153b74-0-lxd-0  xenial
1        started  52.15.177.233  i-0502d7de733be31bb  xenial  us-east-2b
```
In this example we can glean some information. The Workload column shows the status of a given service. The Message section shows the health of a given service in the cluster. During deployment and maintenance these workload statuses update to reflect what a given node is doing. For example, the workload may say `maintenance` while the message describes this maintenance as `Installing docker`.
During normal operation the Workload should read `active`, the Agent column (which reflects what the Juju agent is doing) should read `idle`, and the messages will either say `Ready` or another descriptive term. `juju status --color` will also return all green results when a cluster's deployment is healthy.
Status can become unwieldy for large clusters; it is then recommended to check the status of individual services. For example, to check the status of the workers only:

```
juju status kubernetes-worker
```

or just the etcd cluster:

```
juju status etcd
```
Errors will have an obvious message and will return a red result when used with `juju status --color`. Nodes that come up in this manner should be investigated.
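As a quick way to find units that need investigation, you can filter the tabular output for any unit whose Workload is not `active`. A sketch against a saved copy of the output; the awk column positions are an assumption based on the layout shown above, and the `blocked` unit in the sample is invented for illustration:

```shell
# Save a sample of the `juju status` unit table (the blocked etcd unit is
# a made-up example of an unhealthy state).
cat > /tmp/juju-status.txt <<'EOF'
Unit                  Workload  Agent  Machine  Public address  Ports     Message
easyrsa/0*            active    idle   0/lxd/0  10.0.0.55                 Certificate Authority connected.
etcd/0*               blocked   idle   0        52.15.47.228    2379/tcp  Waiting for peers.
EOF

# Print any unit whose second (Workload) column is not "active".
awk 'NR > 1 && $2 != "active" { print $1, "->", $2 }' /tmp/juju-status.txt
# → etcd/0* -> blocked
```

In practice you would pipe `juju status` directly into the awk filter instead of using a saved file.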
You can easily ssh to individual units with the convention `juju ssh <servicename>/<unit#>`:

```
juju ssh kubernetes-worker/3
```

This will automatically ssh you to worker unit 3.

```
juju ssh easyrsa/0
```

This will automatically ssh you to the easyrsa unit.
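`juju ssh` also accepts a command to run on the remote unit, which is handy for repeating a quick check across several units. A sketch, written as a dry run with `echo` so the commands can be inspected before executing them; the `systemctl is-active kubelet` probe is an assumption about how the worker units are set up:

```shell
# Dry run: print the command that would be executed on each worker unit.
# Remove the leading "echo" to actually run the checks.
# "systemctl is-active kubelet" is an assumed health probe, not part of CDK.
for i in 0 1 2; do
  echo juju ssh "kubernetes-worker/${i}" -- systemctl is-active kubelet
done
```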
Sometimes it is useful to collect all the information from a cluster to share with a developer to identify problems. This is best accomplished with CDK Field Agent.
Download and execute the collect.py script from CDK Field Agent on a box that has a Juju client configured with the current controller and model pointing at the CDK deployment of interest.
Running the script generates a tarball of system information, including basics such as systemctl status, Juju logs, charm unit data, etc. Additional application-specific information may be included as well.
This section assumes you have a working deployment of Kubernetes via Juju using a Load Balancer for the API, and that you are using Helm to deploy charts.
To deploy Helm you will have run:

```
helm init
$HELM_HOME has been configured at /home/ubuntu/.helm

Tiller (the helm server side component) has been installed into your Kubernetes Cluster.
Happy Helming!
```
Then when using helm you may see one of the following errors:

```
helm version
Client: &version.Version{SemVer:"v2.1.3", GitCommit:"5cbc48fb305ca4bf68c26eb8d2a7eb363227e973", GitTreeState:"clean"}
Error: cannot connect to Tiller
```

```
helm install <chart> --debug
Error: forwarding ports: error upgrading connection: Upgrade request required
```
This is caused by the API load balancer not forwarding ports in the context of the helm client-server relationship. To deploy using helm, you will need to follow these steps:
First, expose the kubernetes-master application:

```
juju expose kubernetes-master
```

Then identify the public address of the master:

```
juju status kubernetes-master
```
```
Model       Controller  Cloud/Region   Version
production  k8s-admin   aws/us-east-1  2.0.0

App                Version  Status  Scale  Charm              Store       Rev  OS      Notes
flannel            0.6.1    active  1      flannel            jujucharms  7    ubuntu
kubernetes-master  1.5.1    active  1      kubernetes-master  jujucharms  10   ubuntu  exposed

Unit                  Workload  Agent  Machine  Public address  Ports     Message
kubernetes-master/0*  active    idle   5        54.210.100.102  6443/tcp  Kubernetes master running.
  flannel/0           active    idle            54.210.100.102            Flannel subnet 10.1.50.1/24

Machine  State    DNS             Inst id              Series  AZ
5        started  54.210.100.102  i-002b7150639eb183b  xenial  us-east-1a

Relation      Provides               Consumes           Type
certificates  easyrsa                kubernetes-master  regular
etcd          etcd                   flannel            regular
etcd          etcd                   kubernetes-master  regular
cni           flannel                kubernetes-master  regular
loadbalancer  kubeapi-load-balancer  kubernetes-master  regular
cni           kubernetes-master      flannel            subordinate
cluster-dns   kubernetes-master      kubernetes-worker  regular
cni           kubernetes-worker      flannel            subordinate
```
In this context the public IP address is 54.210.100.102.
If you want to access this data programmatically, you can use the JSON output. For example, to pull each unit's public address:

```
juju show-status kubernetes-master --format json | jq --raw-output '.applications."kubernetes-master".units[]."public-address"'
54.210.100.102
```
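If jq is not available, the same field can be extracted with Python's standard json module. A sketch against a saved sample of the JSON; the sample below only mirrors the fields used here, not the full juju output:

```shell
# Sample of the relevant slice of `juju show-status --format json` output
# (only the fields used in this walkthrough).
cat > /tmp/status.json <<'EOF'
{"applications": {"kubernetes-master": {"units": {
  "kubernetes-master/0": {"public-address": "54.210.100.102"}}}}}
EOF

# Print each unit's public address without jq.
python3 -c '
import json
with open("/tmp/status.json") as f:
    status = json.load(f)
for unit in status["applications"]["kubernetes-master"]["units"].values():
    print(unit["public-address"])
'
# → 54.210.100.102
```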
Identify the kubeconfig file or section used for this cluster, and edit the server configuration. By default, it will look like `https://54.213.123.123:443`. Replace it with the Kubernetes Master endpoint `https://54.210.100.102:6443` and save. Note that the default port used by CDK for the Kubernetes Master API is 6443, while the port exposed by the load balancer is 443.
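The server edit can also be scripted. A sketch using sed against a minimal kubeconfig fragment; the file path and both addresses are just the example values from this walkthrough:

```shell
# Minimal kubeconfig fragment for illustration (example values only).
cat > /tmp/kubeconfig <<'EOF'
clusters:
- cluster:
    server: https://54.213.123.123:443
  name: juju-cluster
EOF

# Rewrite the server line to point directly at the master API on port 6443.
sed -i 's|https://54.213.123.123:443|https://54.210.100.102:6443|' /tmp/kubeconfig
grep 'server:' /tmp/kubeconfig
```

After the edit, the `server:` line shows the master endpoint on port 6443.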
```
helm install <chart> --debug
Created tunnel using local port: '36749'
SERVER: "localhost:36749"
CHART PATH: /home/ubuntu/.helm/<chart>
NAME: <chart>
...
...
```
By default there is no log aggregation of the Kubernetes nodes; each node logs locally. Please read over the logging page for more information.