I have deployed a 3-node cluster of CoreOS Vagrant VMs following this guide, modified as described here.
The VMs are healthy and running, the K8s controller and worker nodes are fine, and I can deploy Pods, ReplicaSets, etc.
However, DNS does not seem to work and, when I look at the state of the flannel pods, they are positively unhealthy:
$ kubectl get po --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
apps frontend-cluster-q4gvm 1/1 Running 1 1h
apps frontend-cluster-tl5ts 1/1 Running 0 1h
apps frontend-cluster-xgktz 1/1 Running 1 1h
kube-system kube-apiserver-172.17.8.101 1/1 Running 2 32d
kube-system kube-controller-manager-172.17.8.101 1/1 Running 2 32d
kube-system kube-flannel-ds-6csjl 0/1 CrashLoopBackOff 46 31d
kube-system kube-flannel-ds-f8czg 0/1 CrashLoopBackOff 48 31d
kube-system kube-flannel-ds-qbtlc 0/1 CrashLoopBackOff 52 31d
kube-system kube-proxy-172.17.8.101 1/1 Running 2 32d
kube-system kube-proxy-172.17.8.102 1/1 Running 0 6m
kube-system kube-proxy-172.17.8.103 1/1 Running 0 2m
kube-system kube-scheduler-172.17.8.101 1/1 Running 2 32d

Further, when I try to deploy KubeDNS, the Pods fail too, with the same failure mode:
$ kubectl logs kube-flannel-ds-f8czg -n kube-system
I0608 23:03:32.526331 1 main.go:475] Determining IP address of default interface
I0608 23:03:32.528108 1 main.go:488] Using interface with name eth0 and address 10.0.2.15
I0608 23:03:32.528135 1 main.go:505] Defaulting external address to interface address (10.0.2.15)
E0608 23:04:02.627348 1 main.go:232] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-f8czg': Get https://10.3.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-f8czg: dial tcp 10.3.0.1:443: i/o timeout
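As an aside, the log above shows flannel picking eth0 at 10.0.2.15, which on Vagrant boxes is the NAT interface rather than the 172.17.8.x host-only network. flanneld accepts an --iface flag, so the DaemonSet can presumably be pointed at the right interface; a sketch of the relevant container args (assuming the upstream kube-flannel manifest layout, and that eth1 is the host-only interface):
  containers:
  - name: kube-flannel
    image: quay.io/coreos/flannel:v0.10.0-amd64
    args:
    - --ip-masq
    - --kube-subnet-mgr
    - --iface=eth1   # bind to the host-only network instead of Vagrant's NAT eth0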
In any case, it appears that the controller service, running off the 10.3.0.1 IP, is not reachable from other pods:
$ kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
apps frontend ClusterIP 10.3.0.170 <none> 80/TCP,443/TCP 1h
default kubernetes ClusterIP 10.3.0.1 <none> 443/TCP 32d
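Since 10.3.0.1 is a virtual IP that exists only as NAT rules programmed by kube-proxy on each node, it is worth checking what the VIP is supposed to translate to:
$ kubectl get endpoints kubernetes
This should list 172.17.8.101:443, i.e. the API server itself.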
My guesses were around either Flannel's etcd configuration or the kube-proxy YAML, so I added the following to all the nodes:
core@core-01 ~ $ etcdctl ls /flannel/network/subnets
/flannel/network/subnets/10.1.80.0-24
/flannel/network/subnets/10.1.76.0-24
/flannel/network/subnets/10.3.0.0-16
/flannel/network/subnets/10.1.34.0-24
core@core-01 ~ $ etcdctl get /flannel/network/subnets/10.3.0.0-16
{"PublicIP": "172.17.8.101"}and restarted flanneld:
core@core-01 ~ $ sudo systemctl restart flanneld
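After the restart, flanneld's effective state can be double-checked on each node (assuming the default --subnet-file location):
core@core-01 ~ $ cat /run/flannel/subnet.env
core@core-01 ~ $ ip route show | grep flannel
The first shows the lease flanneld actually acquired (FLANNEL_NETWORK, FLANNEL_SUBNET, FLANNEL_MTU); the second, whether a route for the overlay network was installed.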
However, that does not appear to do any good; from within a running Pod:
# This is expected (no client certs):
root@frontend-cluster-q4gvm:/opt/simple# curl -k https://172.17.8.101/api/v1/pods
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "Unauthorized",
"reason": "Unauthorized",
"code": 401
}
# But this one just times out:
root@frontend-cluster-q4gvm:/opt/simple# curl -k https://10.3.0.1/api/v1/pods
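The contrast is telling: the node IP is reachable from inside the pod, but the ClusterIP times out, which points at the service NAT rules rather than at the overlay itself. Assuming kube-proxy runs in its default iptables mode, the rules for the VIP can be inspected on any node with:
core@core-02 ~ $ sudo iptables-save -t nat | grep 10.3.0.1
If that prints nothing, kube-proxy never programmed the service rules on that node.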
Then I looked into kube-proxy.yaml and suspected that the --master configuration (for the worker nodes) was somehow incorrect:
core@core-02 /etc/kubernetes/manifests $ cat kube-proxy.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-proxy
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-proxy
    image: quay.io/coreos/hyperkube:v1.10.1_coreos.0
    command:
    - /hyperkube
    - proxy
    # >>>>> Should it be like this?
    - --master=https://172.17.8.101
    # >>>>> or like this?
    - --master=http://127.0.0.1:8080
    securityContext:
      privileged: true
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ssl-certs-host
      readOnly: true
  volumes:
  - hostPath:
      path: /usr/share/ca-certificates
    name: ssl-certs-host

The 127.0.0.1:8080 configuration would appear to work (at best) only on the controller node, but would surely lead nowhere on the other nodes?
Modifying --master as indicated above and restarting the pods, however, does not do any good either.
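My current understanding is that on a worker node the proxy needs both the API server's external endpoint and client credentials; roughly along these lines (a sketch only; the worker-kubeconfig.yaml path is an assumption borrowed from the CoreOS guide's layout, not something from my working setup):
    command:
    - /hyperkube
    - proxy
    - --master=https://172.17.8.101
    - --kubeconfig=/etc/kubernetes/worker-kubeconfig.yaml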
Bottom line: how do I make the Controller API reachable on 10.3.0.1? And how can I enable KubeDNS? (I tried the instructions in the "Hard Way" guide, but got exactly the same failure mode as above.)
Many thanks in advance!
Update
This is the file with the flanneld options:
$ cat /etc/flannel/options.env
FLANNELD_IFACE=172.17.8.101
FLANNELD_ETCD_ENDPOINTS=http://172.17.8.102:2379,http://172.17.8.103:2379,http://172.17.8.101:2379
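To rule etcd itself out, the cluster can be probed with the same v2 etcdctl used above; all three endpoints should report as healthy:
core@core-01 ~ $ etcdctl cluster-health
core@core-01 ~ $ etcdctl member list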
I have now removed the flannel daemon set:
kc delete ds kube-flannel-ds -n kube-system

and deployed KubeDNS, following these instructions: the service is defined here, and the deployment here:
$ kc -n kube-system get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.3.0.10 <none> 53/UDP,53/TCP 4d
$ kc get po -n kube-system
NAME READY STATUS RESTARTS AGE
kube-apiserver-172.17.8.101 1/1 Running 5 36d
kube-controller-manager-172.17.8.101 1/1 Running 5 36d
kube-dns-7868b65c7b-ntc95 3/4 Running 2 3m
kube-proxy-172.17.8.101 1/1 Running 5 36d
kube-proxy-172.17.8.102 1/1 Running 3 4d
kube-proxy-172.17.8.103 1/1 Running 2 4d
kube-scheduler-172.17.8.101 1/1 Running 5 36d

However, I am still getting the timeout error (actually, a bunch of them):
E0613 19:02:27.193691 1 sync.go:105] Error getting ConfigMap kube-system:kube-dns err: Get https://10.3.0.1:443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp 10.3.0.1:443: i/o timeout
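A missing kube-dns ConfigMap by itself would be harmless (kube-dns falls back to its defaults), so the i/o timeout reaching 10.3.0.1 is the real problem here. The rest of the errors can be pulled per container; assuming the standard container names from the upstream kube-dns manifest:
$ kubectl -n kube-system logs kube-dns-7868b65c7b-ntc95 -c kubedns
$ kubectl -n kube-system logs kube-dns-7868b65c7b-ntc95 -c sidecar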
Update #2
On a system set up similarly, I have the following flanneld configuration:
core@core-02 ~ $ etcdctl get /flannel/network/config
{ "Network": "10.1.0.0/16" }
core@core-02 ~ $ etcdctl ls /flannel/network/subnets
/flannel/network/subnets/10.1.5.0-24
/flannel/network/subnets/10.1.66.0-24
/flannel/network/subnets/10.1.6.0-24
core@core-02 ~ $ etcdctl get /flannel/network/subnets/10.1.66.0-24
{"PublicIP":"172.17.8.102"}(and similarly for the others, pointing to 101 and 103) - should there be something in config for the 10.3.0.0/16 subnet? Also, should there be an entry (pointing to to 172.17.8.101) for the Controller API at 10.3.0.1? Something along the lines of:
/flannel/network/subnets/10.3.0.0-24
{"PublicIP":"172.17.8.101"}Does anyone know where to find good flanneld documentation (CoreOS docs are truly insufficient and feel somewhat "abandoned")? Or something else to use that actually works?
Does anyone know where to find good flanneld documentation (the CoreOS docs are truly insufficient and feel somewhat "abandoned")? Or something else to use that actually works?
Thanks!