Kubernetes bare metal NFS PVs error with elasticsearch helm chart

9/29/2018

I deployed Kubernetes on a bare metal dedicated server using conjure-up kubernetes on Ubuntu 18.04 LTS. This also means the nodes are LXD containers.

I need persistent volumes for Elasticsearch and MongoDB, and after some research I decided that the simplest way to get that working in my deployment was an NFS share. I created one on the host OS, with the following /etc/exports entry:

/srv/volumes 127.0.0.1(rw) 10.78.69.*(rw,no_root_squash)

10.78.69.* appears to be the bridge network used by Kubernetes; at least, looking at ifconfig, there's nothing else that fits.
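For reference, after editing /etc/exports the export has to be reloaded; roughly, on the host (assuming the usual nfs-kernel-server tooling on Ubuntu):

sudo exportfs -ra          # reload /etc/exports
sudo exportfs -v           # show what is currently exported, with options
showmount -e 10.78.69.1    # check the export list as a client would see it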

Then I created two folders, /srv/volumes/1 and /srv/volumes/2, and made a PV from each of them, with this configuration for the first (the second is similar):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: elastic-pv1
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /srv/volumes/1
    server: 10.78.69.1
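As a sanity check of the NFS layer itself, independent of Kubernetes, the export can also be mounted by hand from inside one of the LXD nodes; a rough sketch, with /mnt/nfs-test as an arbitrary mount point:

sudo apt-get install -y nfs-common                       # NFS client tools must be present on the node
sudo mkdir -p /mnt/nfs-test
sudo mount -t nfs 10.78.69.1:/srv/volumes/1 /mnt/nfs-test
touch /mnt/nfs-test/write-test && ls -l /mnt/nfs-test    # verify read/write access
sudo umount /mnt/nfs-test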

Then I deployed the Elasticsearch helm chart (https://github.com/helm/charts/tree/master/incubator/elasticsearch), and it created two claims which successfully bound to my PVs.
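The binding can be double-checked with the usual kubectl commands:

kubectl get pv     # both PVs should show STATUS "Bound"
kubectl get pvc    # the chart's two claims, each bound to one of the PVs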

The issue is that afterwards the containers seem to encounter errors:

Error: failed to start container "sysctl": Error response from daemon: linux runtime spec devices: lstat /dev/.lxc/proc/17848/fdinfo/24: no such file or directory Back-off restarting failed container

(Screenshot: Pods view)

(Screenshot: Persistent Volume Claims view)
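To narrow down where the pods fail, the usual inspection commands help (the pod name below is a placeholder):

kubectl get pods
kubectl describe pod <elasticsearch-pod>    # init container status and events
kubectl logs <elasticsearch-pod> -c sysctl  # output of the failing "sysctl" init container, if any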

I'm kinda stuck here. I've tried searching for the error but I haven't been able to find a solution to this issue.

Before I set the allowed IP range in /etc/exports to 10.78.69.*, Kubernetes would report "permission denied" from the NFS server while trying to mount, so I assume mounting now succeeds, since that error has disappeared.

EDIT:

I decided to purge the helm deployment and try again, this time with a different storage type, local-storage volumes. I created them following the guide from Canonical, and I know they work because I set up one for MongoDB this way and it works perfectly.
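For reference, a local-storage PV of this kind looks roughly like the sketch below (not the exact manifest from the Canonical guide; the PV name, storage class and node name are placeholders):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-local-pv1
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage       # assumes a matching no-provisioner StorageClass exists
  local:
    path: /srv/volumes/1                # directory that exists on the chosen node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - <node-name>           # the node that actually holds the directory
EOF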

The configuration for the elasticsearch helm deployment changed, since now I have to pin the pods to the nodes on which the persistent volumes were created:

values.yaml:

data:
    replicas: 1
    nodeSelector:
        elasticsearch: data
master:
    replicas: 1
    nodeSelector:
        elasticsearch: master
client:
    replicas: 1
cluster:
    env: {MINIMUM_MASTER_NODES: "1"}
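For those nodeSelector blocks to match anything, the nodes holding the local volumes have to carry the corresponding labels; roughly (node names are placeholders):

kubectl get nodes
kubectl label node <data-node> elasticsearch=data
kubectl label node <master-node> elasticsearch=master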

I deployed using

helm install --name site-search -f values.yaml incubator/elasticsearch
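The state of the release can then be checked with, for example:

helm ls                               # the release should show as DEPLOYED
helm status site-search
kubectl get pods | grep site-search   # the client/master/data pods of the release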

These are the only changes; however, elasticsearch still presents the same issues.

Additional information:

kubectl version:

Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T18:02:47Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T17:53:03Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

The elasticsearch image is the default one in the helm chart:

docker.elastic.co/elasticsearch/elasticsearch-oss:6.4.1

The various pods' (master, client, data) logs are empty. The error is the same.

-- Reasv
cloud-bare-metal
docker
kubernetes
kubernetes-helm
nfs

2 Answers

10/3/2018

This looks like a common issue that has been observed in various environments and configurations; however, it's quite unclear what exactly is causing it. Could you provide more details about your software versions, log fragments, etc.?

-- getslaf
Source: StackOverflow

10/6/2018

I was able to solve the issue by running sysctl -w vm.max_map_count=262144 myself on the host machine, and removing the "sysctl" init container which was trying to do this unsuccessfully.
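Concretely, the host-side part boils down to roughly the following (the sysctl.d file name is an arbitrary choice); since the LXD nodes share the host kernel, setting it once on the host covers all of them:

sudo sysctl -w vm.max_map_count=262144                                          # apply immediately on the bare-metal host
echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/99-elasticsearch.conf   # persist across reboots
sudo sysctl --system                                                            # reload all sysctl configuration
sysctl vm.max_map_count                                                         # confirm the value

How the "sysctl" init container is dropped depends on the chart version; check the chart's values.yaml for a toggle, or patch the rendered manifests.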

-- Reasv
Source: StackOverflow