Installing Kubernetes in High-Availability Mode with kubeadm, and Recovering from a Failed Master
Known bug: kubelet keeps logging the error "open /sys/fs/cgroup/devices/libcontainer_124798_systemd_test_default.slice: no such file or directory", see https://github.com/kubernetes/kubernetes/issues/76531
1. Base environment
#Environment
kubernetes v1.14.2 (note: the kubelet client and server versions should match as closely as possible)
kubeadm issues certificates with a default validity of 1 year
Linux: CentOS Linux release 7.5.1804 (Core), 64-bit
192.168.10.241 master01
192.168.10.242 master02
192.168.10.243 master03
192.168.10.244 node01
192.168.10.245 node02
192.168.10.246 node03
#Set the hostname on each node
hostnamectl set-hostname master01
hostnamectl set-hostname master02
hostnamectl set-hostname master03
hostnamectl set-hostname node01
hostnamectl set-hostname node02
hostnamectl set-hostname node03
#Add name resolution entries to /etc/hosts on every node
cat <<EOF >> /etc/hosts
192.168.10.241 master01
192.168.10.242 master02
192.168.10.243 master03
192.168.10.244 node01
192.168.10.245 node02
192.168.10.246 node03
EOF
#Permanently disable the firewall on every node
systemctl stop firewalld
systemctl disable firewalld
#Permanently disable SELinux on every node
sed -i 's/^SELINUX=.*$/SELINUX=disabled/' /etc/selinux/config
reboot #check the current state with getenforce
#Disable swap and set the required kernel parameters on every node
cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
vm.swappiness=0
EOF
sysctl --system #list all parameters with sysctl -a
swapoff -a #temporarily disable all active swap areas (those configured in /etc/fstab)
sed -i 's/^\/dev\/mapper\/centos-swap/#&/' /etc/fstab #permanently disable swap
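#Optional sanity check: confirm that no swap is active and that swappiness took effect
cat /proc/swaps #should list no swap devices
free -m #the Swap line should show 0 total
sysctl vm.swappiness #should print 0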
#Install Docker on every node; for the detailed procedure see the official guide https://docs.docker.com/install/linux/docker-ce/centos/ and the Kubernetes CRI notes https://kubernetes.io/docs/setup/cri/#docker
echo "nameserver 114.114.114.114" >> /etc/resolv.conf
yum install -y yum-utils device-mapper-persistent-data lvm2 && yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum install -y docker-ce docker-ce-cli containerd.io
cat > /etc/docker/daemon.json <<EOF
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
]
}
EOF
systemctl enable docker && systemctl restart docker
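#Optional check: confirm Docker picked up the systemd cgroup driver and the overlay2 storage driver configured above
docker info | grep -iE 'cgroup driver|storage driver'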
#Enable forwarding on every node
iptables -P FORWARD ACCEPT
#Load the IPVS kernel modules on every node
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules
#Verify that the IPVS-related modules are loaded
lsmod | grep -e ip_vs -e nf_conntrack_ipv4
2. Install kubeadm, kubelet, kubectl, and ipvsadm on every node
#Configure the Aliyun yum mirror on every node
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
#Install kubeadm, kubelet, kubectl, and ipvsadm on every node
yum install -y kubelet-1.14.1 kubeadm-1.14.1 kubectl-1.14.1 ipvsadm
systemctl enable kubelet && systemctl start kubelet
#Configure kubelet parameters
cat <<EOF > /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS="--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice"
EOF
3. Configure HAProxy and keepalived on all master nodes
#Pull the haproxy Docker image on all master nodes
docker pull haproxy:latest
#Create the haproxy configuration file
mkdir -pv /etc/haproxy
cat <<EOF > /etc/haproxy/haproxy.cfg
global
log 127.0.0.1 local0 err
maxconn 50000
uid 99 #nobody
gid 99 #nobody
#daemon
nbproc 1
pidfile haproxy.pid
defaults
mode http
log 127.0.0.1 local0 err
maxconn 50000
retries 3
timeout connect 5s
timeout client 30s
timeout server 30s
timeout check 2s
listen admin_stats
mode http
bind 0.0.0.0:1080
log 127.0.0.1 local0 err
stats refresh 30s
stats uri /haproxy-status
stats realm Haproxy\ Statistics
stats auth admin:admin
stats hide-version
stats admin if TRUE
frontend k8smaster-haproxy-https
bind 0.0.0.0:8443
mode tcp
default_backend k8smaster-https
backend k8smaster-https
mode tcp
balance roundrobin
server master01 192.168.10.241:6443 weight 1 maxconn 1000 check inter 2000 rise 2 fall 3
server master02 192.168.10.242:6443 weight 1 maxconn 1000 check inter 2000 rise 2 fall 3
server master03 192.168.10.243:6443 weight 1 maxconn 1000 check inter 2000 rise 2 fall 3
EOF
#Check the configuration file for syntax errors
docker run -it --rm --name haproxy-syntax-check -v /etc/haproxy:/usr/local/etc/haproxy:ro haproxy:latest haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg
#Start haproxy
docker run -d --name k8smaster-haproxy -v /etc/haproxy:/usr/local/etc/haproxy:ro -p 8443:8443 -p 1080:1080 --restart always haproxy:latest
#Reload the haproxy configuration
docker kill -s HUP k8smaster-haproxy
#View the haproxy logs
docker logs k8smaster-haproxy
#Check the status page in a browser
http://192.168.10.241:1080/haproxy-status
http://192.168.10.242:1080/haproxy-status
http://192.168.10.243:1080/haproxy-status
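#Optional command-line check of the stats page (admin:admin is the stats auth configured above)
curl -u admin:admin http://192.168.10.241:1080/haproxy-status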
#Pull the keepalived Docker image on all master nodes
docker pull osixia/keepalived:latest
#Start the keepalived container on all master nodes
docker run -d --name k8smaster-keepalived --net=host --cap-add=NET_ADMIN -e KEEPALIVED_INTERFACE=ens33 -e KEEPALIVED_VIRTUAL_IPS="#PYTHON2BASH:['192.168.10.240']" -e KEEPALIVED_UNICAST_PEERS="#PYTHON2BASH:['192.168.10.241','192.168.10.242','192.168.10.243']" -e KEEPALIVED_PASSWORD=admin --restart always osixia/keepalived:latest
#View the keepalived container logs
docker logs k8smaster-keepalived
#At this point the VIP 192.168.10.240 should be bound to one of the three masters (master01)
ping 192.168.10.240 #should respond
#If something goes wrong, remove it with
docker rm -f k8smaster-keepalived
ip a del 192.168.10.240/32 dev ens33
4. Configure the first master (for parameter usage see https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-config/#cmd-config-from-file)
#List the default component images
kubeadm config images list
#Print the default configuration (kubeadm config print init-defaults or kubeadm config print join-defaults) and write it to a file (for detailed parameters see https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta1)
cd /root/
kubeadm config print init-defaults > k8s-init-master01.yaml
#Edit the k8s-init-master01.yaml configuration file
See the k8s-init-master01.yaml file in the same directory for the full contents; a rough sketch of the key fields is shown below.
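The following is only an illustrative sketch of the kind of fields that file sets, not the actual file; the image repository and the IPVS proxy mode are assumptions, and the essential parts are pointing controlPlaneEndpoint at the haproxy VIP and matching podSubnet to the flannel network used later.
apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.10.241
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.14.2
controlPlaneEndpoint: "192.168.10.240:8443"
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
networking:
  podSubnet: "10.244.0.0/16"
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs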
#Pull the images in advance
kubeadm config images pull --config k8s-init-master01.yaml
#Check whether the .240 VIP is on master01 (if not, restart the keepalived containers on the other two masters so the VIP floats back to master01)
ip addr
docker restart k8smaster-keepalived
#Initialize (if the init times out and fails, a likely cause is that the .240 VIP is not on master01 and kubelet cannot reach the VIP; check the IPs with ip addr)
kubeadm init --config k8s-init-master01.yaml
mkdir -p $HOME/.kube && sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config && sudo chown $(id -u):$(id -g) $HOME/.kube/config
#systemctl status -l kubelet will report "Unable to update cni config: No networks found in /etc/cni/net.d" because the flannel CNI plugin has not been installed yet
#Be sure to save the join commands that are returned
kubeadm join 192.168.10.240:8443 --token rx75vh.v77joay2977m2p5g \
--discovery-token-ca-cert-hash sha256:fc9c5027264df3320d30894078f75daf69565ac50dbb411e956493e65646e4f3 \
--experimental-control-plane
kubeadm join 192.168.10.240:8443 --token rx75vh.v77joay2977m2p5g \
--discovery-token-ca-cert-hash sha256:fc9c5027264df3320d30894078f75daf69565ac50dbb411e956493e65646e4f3
#To reset a node (when resetting the whole cluster, run this on every master and worker node):
kubeadm reset -f
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm --clear
rm -rf $HOME/.kube
systemctl daemon-reload && systemctl restart kubelet
#Copy the certificates to the other master nodes; they do not need to be copied to the worker nodes
cd /etc/kubernetes && tar -czvf k8s-ca.tar.gz admin.conf pki/ca.* pki/sa.* pki/front-proxy-ca.* pki/etcd/ca.* && cd ~
#scp to master02 and master03, then extract in the same directory with tar -xzvf k8s-ca.tar.gz
scp /etc/kubernetes/k8s-ca.tar.gz master02:/etc/kubernetes/
scp /etc/kubernetes/k8s-ca.tar.gz master03:/etc/kubernetes/
#Extract k8s-ca.tar.gz on master02 and on master03
cd /etc/kubernetes/ && tar -xzvf k8s-ca.tar.gz && cd ~
cd /etc/kubernetes/ && tar -xzvf k8s-ca.tar.gz && cd ~
5. Configure the second master (the following joins the cluster step by step using kubeadm init phases)
#Copy k8s-init-master01.yaml to k8s-init-master02.yaml and edit its contents
See k8s-init-master02.yaml in the same directory for the full contents; a rough sketch of the differences is shown below.
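An illustrative sketch of what typically changes relative to master01's file (assumed, not the actual file): the local API endpoint and node name move to master02, and the local etcd member is told to join the existing cluster through extraArgs, matching the initial-cluster settings referenced again in section 13.
apiVersion: kubeadm.k8s.io/v1beta1
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.10.242
  bindPort: 6443
nodeRegistration:
  name: master02
---
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.14.2
controlPlaneEndpoint: "192.168.10.240:8443"
networking:
  podSubnet: "10.244.0.0/16"
etcd:
  local:
    extraArgs:
      initial-cluster: "master01=https://192.168.10.241:2380,master02=https://192.168.10.242:2380"
      initial-cluster-state: existing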
#Initialize
#Generate certificates
kubeadm init phase certs all --config k8s-init-master02.yaml
kubeadm init phase etcd local --config k8s-init-master02.yaml
#Initialize kubelet
kubeadm init phase kubeconfig kubelet --config k8s-init-master02.yaml
kubeadm init phase kubelet-start --config k8s-init-master02.yaml
mkdir -p $HOME/.kube && sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config && sudo chown $(id -u):$(id -g) $HOME/.kube/config
#Add this node's etcd member to the etcd cluster
kubectl exec -n kube-system etcd-master01 -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://192.168.10.241:2379 member add master02 https://192.168.10.242:2380
#List the existing etcd cluster members
kubectl exec -n kube-system etcd-master01 -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://192.168.10.241:2379 member list
#Start kube-apiserver, kube-controller-manager, and kube-scheduler
kubeadm init phase kubeconfig all --config k8s-init-master02.yaml
kubeadm init phase control-plane all --config k8s-init-master02.yaml
#Mark the node as a master
#Check status
kubectl get nodes #ROLES shows <none>
#Mark as master
kubeadm init phase mark-control-plane --config k8s-init-master02.yaml #ROLES becomes master
#Check status
kubectl get nodes #ROLES shows master
6. Configure the third master (the following joins the cluster step by step using kubeadm init phases)
#Copy k8s-init-master02.yaml to k8s-init-master03.yaml and edit its contents
See k8s-init-master03.yaml in the same directory for the full contents; the edits mirror master02's, as sketched below.
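Illustrative only (assumed, not the actual file): compared with the master02 sketch, the endpoint and node name move to master03, and master03 is appended to initial-cluster.
localAPIEndpoint:
  advertiseAddress: 192.168.10.243
nodeRegistration:
  name: master03
etcd:
  local:
    extraArgs:
      initial-cluster: "master01=https://192.168.10.241:2380,master02=https://192.168.10.242:2380,master03=https://192.168.10.243:2380"
      initial-cluster-state: existing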
#Initialize
#Generate certificates
kubeadm init phase certs all --config k8s-init-master03.yaml
kubeadm init phase etcd local --config k8s-init-master03.yaml
#Initialize kubelet
kubeadm init phase kubeconfig kubelet --config k8s-init-master03.yaml
kubeadm init phase kubelet-start --config k8s-init-master03.yaml
mkdir -p $HOME/.kube && cp -i /etc/kubernetes/admin.conf $HOME/.kube/config && chown $(id -u):$(id -g) $HOME/.kube/config
#Add this node's etcd member to the etcd cluster
kubectl exec -n kube-system etcd-master01 -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://192.168.10.241:2379 member add master03 https://192.168.10.243:2380
#List the existing etcd cluster members
kubectl exec -n kube-system etcd-master01 -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://192.168.10.241:2379 member list
#Start kube-apiserver, kube-controller-manager, and kube-scheduler
kubeadm init phase kubeconfig all --config k8s-init-master03.yaml
kubeadm init phase control-plane all --config k8s-init-master03.yaml
#Mark the node as a master
#Check status
kubectl get nodes #ROLES shows <none>
#Mark as master
kubeadm init phase mark-control-plane --config k8s-init-master03.yaml #ROLES becomes master
#Check status
kubectl get nodes #ROLES shows master
7. Install flannel (run on any one master node)
#Download the flannel yml file
wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
#Check that the Network in net-conf.json matches your cluster's pod network CIDR; if not, edit kube-flannel.yml to the actual CIDR (a command to look up the configured pod CIDR follows the snippet below)
net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }
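#Optional: look up the pod CIDR the cluster was initialized with and compare it with Network above (this reads the kubeadm-config ConfigMap that kubeadm creates; podSubnet only appears if it was set in the init config)
kubectl -n kube-system get cm kubeadm-config -o yaml | grep -i podSubnet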
#By default flannel uses the host's first network interface; if the host has multiple NICs, add an extra argument to name the interface explicitly
containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.11.0-amd64
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq
  - --kube-subnet-mgr
  - --iface=ens33
#Install flannel
kubectl apply -f kube-flannel.yml
#Check node status
kubectl get nodes #STATUS should be Ready on every node
kubectl get pods --all-namespaces #many pods should now be listed
systemctl status -l kubelet #the CNI error is no longer reported
8. Steps on all worker nodes
#On every worker node, run the join command returned by the master01 init, i.e. the one WITHOUT the --experimental-control-plane flag
Run the join command that master01 returned above
#If you have lost the join command, regenerate it (on a master):
kubeadm token create --print-join-command
#Reset commands
kubeadm reset -f
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm --clear
rm -rf $HOME/.kube
systemctl daemon-reload && systemctl restart kubelet
9. Verify the installation (run on any one master node)
#Node status
kubectl get nodes
#Component status
kubectl get cs
#List all pods
kubectl get pods --namespace kube-system -o wide
#View a pod's logs
kubectl logs etcd-master02 --namespace kube-system
#List all services
kubectl get svc --all-namespaces
#List services in a specific namespace
kubectl get svc --namespace kube-system
#Service accounts
kubectl get serviceaccount --all-namespaces
#List roles
kubectl get roles --all-namespaces
#Cluster info
kubectl cluster-info
#Verify DNS
kubectl run -it --rm --restart=Never --image=infoblox/dnstools:latest dnstools
nslookup kubernetes.default
#Check whether deployments have been created:
kubectl get deployments --all-namespaces
#Edit a deployment's configuration
kubectl edit deployments -n kube-system
#Check whether ReplicationControllers have been created:
kubectl get rc --all-namespaces
#Check whether ReplicaSets have been created:
kubectl get rs --all-namespaces
10. Deploy highly available CoreDNS (run on any one master node; skip this step if CoreDNS is already HA; reference: https://github.com/coredns/deployment/tree/master/kubernetes)
#Check the pods
kubectl get pods --all-namespaces -o wide
#The initially installed coredns pods all sit on the single node master03, so if master03 goes down, the coredns service goes down with it
#Download and generate the coredns.yaml file
wget https://raw.githubusercontent.com/coredns/deployment/master/kubernetes/coredns.yaml.sed
wget https://raw.githubusercontent.com/coredns/deployment/master/kubernetes/deploy.sh
yum install -y epel-release && yum install -y jq && chmod u+x deploy.sh
./deploy.sh > coredns.yaml
#Delete the existing coredns first
kubectl delete -f coredns.yaml
kubectl get pods --all-namespaces -o wide
#coredns.yaml is the coredns.yaml in the same directory (edited for HA)
#Install coredns
kubectl apply -f coredns.yaml
kubectl get pods --all-namespaces -o wide
#Verify DNS
kubectl run -it --rm --restart=Never --image=infoblox/dnstools:latest dnstools
nslookup kubernetes.default
11. Install the Kubernetes dashboard (run on any one master node; references: https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/#deploying-the-dashboard-ui , https://github.com/kubernetes/dashboard/wiki/Creating-sample-user)
#Download kubernetes-dashboard.yaml
wget https://raw.githubusercontent.com/kubernetes/dashboard/v1.10.1/src/deploy/recommended/kubernetes-dashboard.yaml
#Edit kubernetes-dashboard.yaml to match the kubernetes-dashboard.yaml in the same directory
#Install kubernetes-dashboard
kubectl apply -f kubernetes-dashboard.yaml
#Create a dashboard-adminuser.yaml file with the same contents as the dashboard-adminuser.yaml in the same directory (a sketch is shown after the apply command below)
#Create a dashboard user with admin privileges (adjust the permissions to your actual needs)
kubectl apply -f dashboard-adminuser.yaml
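An illustrative sketch of what such a file usually contains, following the "Creating sample user" wiki linked above (the admin-user name and the kube-system namespace are assumptions; the actual file shipped with the tutorial may differ):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kube-system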
#Generate a client certificate for the browser
grep 'client-certificate-data' ~/.kube/config | head -n 1 | awk '{print $2}' | base64 -d > kubecfg.crt
grep 'client-key-data' ~/.kube/config | head -n 1 | awk '{print $2}' | base64 -d > kubecfg.key
openssl pkcs12 -export -clcerts -inkey kubecfg.key -in kubecfg.crt -out kubecfg.p12 -name "kubernetes-client" #you can set an export password here; kubecfg.p12 is the resulting personal certificate
#Import the kubecfg.p12 certificate into the browser
This must be done manually in the browser settings; if an export password was set during generation, the same password must be entered when importing
#After the import succeeds, restart the browser and open the Dashboard UI
https://192.168.10.240:8443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/
#Log in with a token by pasting the token obtained below (login itself works; depending on browser brand/version (Chrome and Firefox tested OK) and the HTTPS certificate this step can produce various errors, but the steps above are correct)
#Get the kubernetes-dashboard service account token
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep kubernetes-dashboard | awk '{print $1}') | grep token: | awk '{print $2}'
#Get the token of the admin-user service account
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}') | grep token: | awk '{print $2}'
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi11c2VyLXRva2VuLXM4cWhkIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImFkbWluLXVzZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiIxYzU3MzVjMy03MmRjLTExZTktYmE3ZC0wMDBjMjk4Y2M4YjMiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06YWRtaW4tdXNlciJ9.DkEKxwIaSeaEWRW10RefhBGoFneF4S_A_WDhqAN1bhZDuMQY93cqm7-WIw5iMvqt6SWGZSyb8KsaGUNR5ISmg5SAuLc6TUezzCyGJuntoTQRLYN0NIipbCEPj7OWGTVjiChr0Ss4X4opDpBUvQ7OkL-UzSQLpYOtSGYFPKa9CXUF5vjnNa4ova5KO61HJLInMZ-4XkPyWubsoyqihPVHXu36TnWJ0pEmpLxt9NhdCykUy1Oxcojj0r889cl5hUiLCl2Wk3mlMs3azC1pCOvRY0mcLmzz8g4OihVYMc5FK0rqG6LQ4K9SHpwB0AZQnhgzC5feHMGL9o5A2W-qek2uWQ
#Log in with a kubeconfig file (the user in the kubeconfig needs either username & password or a token; here we use a token; reference: https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/)
DASHBOARD_TOKEN=$(kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | awk '{print $1}') | grep token: | awk '{print $2}')
kubectl config set-cluster kubernetes --server=https://192.168.10.240:8443 --kubeconfig=/root/dashboard-admin.conf
kubectl config set-credentials dashboard-admin --token=$DASHBOARD_TOKEN --kubeconfig=/root/dashboard-admin.conf
kubectl config set-context dashboard-admin@kubernetes --cluster=kubernetes --user=dashboard-admin --kubeconfig=/root/dashboard-admin.conf
kubectl config use-context dashboard-admin@kubernetes --kubeconfig=/root/dashboard-admin.conf
The generated dashboard-admin.conf can now be used to log in to the dashboard via the kubeconfig option
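#Optional check: the token in the kubeconfig can be tested against the API server; --insecure-skip-tls-verify is only needed here because the kubeconfig above was written without a CA certificate
kubectl --kubeconfig=/root/dashboard-admin.conf --insecure-skip-tls-verify get nodes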
#Note: the above accesses the dashboard through the API server; you could also create an nginx ingress for the dashboard, but that is less direct than the API route, because if the ingress goes down the dashboard is unreachable through it
12. Simulate a master01 failure
#Run on master01
shutdown -h now
#It takes a while before Kubernetes detects that master01 is not ready
#Reopen https://192.168.10.240:8443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/ in the browser and check that it still works
#In the dashboard's node list, master01's Ready condition changes to Unknown
#Run kubectl get nodes on a healthy master and check the result
#master01's STATUS changes to NotReady
#Check whether the pods are healthy
kubectl get pods --namespace kube-system -o wide
#All other services remain healthy, which shows the Kubernetes HA setup works
#Of course, simply booting master01 back up at this point (its data is intact, it was only rebooted) lets the whole cluster recover automatically.
#To simulate a more realistic recovery, destroy master01's data more thoroughly
kubeadm reset -f
#Check the cluster state again
kubectl get nodes #master01's STATUS changes to NotReady
kubectl get pods --namespace kube-system -o wide
13. Recover master01 after the simulated failure
#First, on a healthy master, run the following to get the ID of the failed member in the etcd cluster (run on a healthy master)
ETCD=`docker ps|grep etcd|grep -v POD|awk '{print $1}'`
docker exec -it ${ETCD} etcdctl --endpoints https://127.0.0.1:2379 --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key cluster-health
#Remove the failed node's etcd member (run on a healthy master)
ETCD=`docker ps|grep etcd|grep -v POD|awk '{print $1}'`
docker exec -it ${ETCD} etcdctl --endpoints https://127.0.0.1:2379 --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key member remove 6bb5472fc24f7d4c
docker exec -it ${ETCD} etcdctl --endpoints https://127.0.0.1:2379 --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key cluster-health
#Re-join the repaired machine (it must keep the original master01 hostname and IP) and reuse the CA certificates and k8s-init-master01.yaml
#Note: change initial-cluster: "master01=https://192.168.10.241:2380" to
initial-cluster: "master01=https://192.168.10.241:2380,master02=https://192.168.10.242:2380,master03=https://192.168.10.243:2380"
initial-cluster-state: existing
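For reference, these two keys live under the local etcd section of the kubeadm config (layout assumed, consistent with the master02/master03 sketches above, not copied from the actual file):
etcd:
  local:
    extraArgs:
      initial-cluster: "master01=https://192.168.10.241:2380,master02=https://192.168.10.242:2380,master03=https://192.168.10.243:2380"
      initial-cluster-state: existing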
#Reset master01 (everything from here on runs on master01)
#kubeadm reset -f (already run in section 12; run it here if you have not)
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm --clear
rm -rf $HOME/.kube
systemctl daemon-reload && systemctl restart kubelet
#Redo the master01 initialization
#Remember to copy the CA certificates etc. from a healthy master again first
cd /etc/kubernetes/ && tar -xzvf k8s-ca.tar.gz && cd ~
#Generate certificates
kubeadm init phase certs all --config k8s-init-master01.yaml
kubeadm init phase etcd local --config k8s-init-master01.yaml
#Initialize kubelet
kubeadm init phase kubeconfig kubelet --config k8s-init-master01.yaml
kubeadm init phase kubelet-start --config k8s-init-master01.yaml
mkdir -p $HOME/.kube && sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config && sudo chown $(id -u):$(id -g) $HOME/.kube/config
#Add the etcd member back to the cluster
#List the existing etcd cluster members
kubectl exec -n kube-system etcd-master02 -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://192.168.10.242:2379 member list
#Add to the cluster
kubectl exec -n kube-system etcd-master02 -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://192.168.10.242:2379 member add master01 https://192.168.10.241:2380
#List the existing etcd cluster members
kubectl exec -n kube-system etcd-master02 -- etcdctl --ca-file /etc/kubernetes/pki/etcd/ca.crt --cert-file /etc/kubernetes/pki/etcd/peer.crt --key-file /etc/kubernetes/pki/etcd/peer.key --endpoints=https://192.168.10.242:2379 member list
#Start kube-apiserver, kube-controller-manager, and kube-scheduler
kubeadm init phase kubeconfig all --config k8s-init-master01.yaml
kubeadm init phase control-plane all --config k8s-init-master01.yaml
#Mark the node as a master
kubectl get nodes #ROLES shows <none>
kubeadm init phase mark-control-plane --config k8s-init-master01.yaml #ROLES becomes master
kubectl get nodes #ROLES shows master
#Check that the recovery succeeded (if it failed, remove the bad etcd member again and repeat the procedure)
kubectl get nodes #master01's STATUS becomes Ready
kubectl get pods --namespace kube-system -o wide
#Recovery complete
14. Install Helm (the Kubernetes package manager)
#Prerequisites:
A Kubernetes cluster must be installed
A locally configured kubectl must be available
#Install the client (Helm CLI)
#Install via script
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get > get_helm.sh
chmod u+x get_helm.sh
./get_helm.sh
#Install from the binary release
wget https://storage.googleapis.com/kubernetes-helm/helm-v2.14.0-linux-amd64.tar.gz
tar -xzvf helm-v2.14.0-linux-amd64.tar.gz
cp linux-amd64/helm /usr/local/bin/helm && cp linux-amd64/tiller /usr/local/bin/tiller
#Install the server side (tiller is the server component and is installed into the Kubernetes cluster)
#Install tiller with SSL/TLS (https://helm.sh/docs/using_helm/#using-ssl-between-helm-and-tiller)
cd /etc/kubernetes/pki && mkdir helm
cd helm
#Generate the CA RSA private key
openssl genrsa -out ./ca.key.pem 4096
#Generate the CA root certificate
openssl req -key ca.key.pem -new -x509 -days 7300 -sha256 -out ca.cert.pem -extensions v3_ca -subj "/C=CN/O=HELM/CN=HELM"
#Generate the tiller RSA private key
openssl genrsa -out ./tiller.key.pem 4096
#Generate the helm RSA private key
openssl genrsa -out ./helm.key.pem 4096
#Generate a certificate signing request (CSR) for the tiller key
openssl req -key tiller.key.pem -new -sha256 -out tiller.csr.pem -subj "/C=CN/O=HELM/CN=TILLER"
#Generate a certificate signing request (CSR) for the helm key
openssl req -key helm.key.pem -new -sha256 -out helm.csr.pem -subj "/C=CN/O=HELM/CN=HELM"
#Sign the tiller CSR with the CA root certificate
#openssl x509 -req -CA ca.cert.pem -CAkey ca.key.pem -CAcreateserial -in tiller.csr.pem -out tiller.cert.pem -days 365
#By default the Helm client connects to Tiller through a tunnel (the kube proxy) at 127.0.0.1. During the TLS handshake a hostname (e.g. example.com) is normally presented and checked against the certificate, including its subject alternative names. Because of the tunnel, however, the target is an IP address, so to validate the certificate the address 127.0.0.1 must be listed in the Tiller certificate as an IP subject alternative name (IP SAN).
For example, to list 127.0.0.1 as an IP SAN when generating the Tiller certificate:
echo subjectAltName=IP:127.0.0.1 > extfile.cnf
openssl x509 -req -CA ca.cert.pem -CAkey ca.key.pem -CAcreateserial -in tiller.csr.pem -out tiller.cert.pem -days 365 -extfile extfile.cnf
#Sign the helm CSR with the CA root certificate
openssl x509 -req -CA ca.cert.pem -CAkey ca.key.pem -CAcreateserial -in helm.csr.pem -out helm.cert.pem -days 365
#Command to inspect a certificate's contents
openssl x509 -text -noout -in tiller.cert.pem
#Initialize tiller (note: the helm client version must be compatible with the server-side tiller image version)
cd ~
helm version #check that the client and the tiller image versions are compatible
#Without TLS
helm init --history-max=0 --debug --upgrade --service-account tiller --tiller-namespace kube-system --tiller-image gcr.azk8s.cn/google_containers/tiller:v2.14.0 --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
#With TLS enabled (the rest of this guide assumes TLS is enabled)
helm init --history-max=0 --debug --upgrade --service-account tiller --tiller-namespace kube-system --tiller-image registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.14.0 --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts --tiller-tls --tiller-tls-cert /etc/kubernetes/pki/helm/tiller.cert.pem --tiller-tls-key /etc/kubernetes/pki/helm/tiller.key.pem --tiller-tls-verify --tls-ca-cert /etc/kubernetes/pki/helm/ca.cert.pem
#The default Google chart repo (https://kubernetes-charts.storage.googleapis.com) may be unreachable from some networks
helm init --history-max=0 --debug --upgrade --service-account tiller --tiller-namespace kube-system --tiller-image registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.14.0 --tiller-tls --tiller-tls-cert /etc/kubernetes/pki/helm/tiller.cert.pem --tiller-tls-key /etc/kubernetes/pki/helm/tiller.key.pem --tiller-tls-verify --tls-ca-cert /etc/kubernetes/pki/helm/ca.cert.pem
#Create the tiller service account and role binding on Kubernetes (see helm-tiller.yaml in the same directory; a sketch follows after the commands below)
kubectl apply -f helm-tiller.yaml
#kubectl delete -f helm-tiller.yaml
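An illustrative sketch of what helm-tiller.yaml typically contains (assumed, not the actual file): a tiller ServiceAccount in kube-system bound to cluster-admin, matching the --service-account tiller flag used above.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system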
#Check that the service account binding took effect
kubectl get deploy --namespace kube-system tiller-deploy --output yaml | grep serviceAccount
#Configure client-side TLS
#Running helm ls now fails with "Error: transport is closing" because the Helm client does not have the right certificates to authenticate to Tiller.
helm ls --debug
helm ls --tls --tls-ca-cert /etc/kubernetes/pki/helm/ca.cert.pem --tls-cert /etc/kubernetes/pki/helm/helm.cert.pem --tls-key /etc/kubernetes/pki/helm/helm.key.pem --debug
#A more convenient approach is
\cp -f /etc/kubernetes/pki/helm/ca.cert.pem $HOME/.helm/ca.pem
\cp -f /etc/kubernetes/pki/helm/helm.cert.pem $HOME/.helm/cert.pem
\cp -f /etc/kubernetes/pki/helm/helm.key.pem $HOME/.helm/key.pem
helm ls --tls --debug
#Verify
helm version --tls --debug
kubectl get pods -n kube-system | grep tiller
kubectl -n kube-system get deployment
kubectl logs $(kubectl get pods -n kube-system | grep tiller | awk '{print $1}') -n kube-system
#Check the service
kubectl get svc -n kube-system | grep tiller
#Verify the port
kubectl get pods tiller-deploy-c595846c-vf4vr --template='{{(index (index .spec.containers 0).ports 0).containerPort}}{{"\n"}}' -n kube-system
#How to delete and reset
kubectl delete deployment tiller-deploy -n kube-system
kubectl delete service tiller-deploy -n kube-system
kubectl delete -f helm-tiller.yaml
rm -rf $HOME/.helm
or
helm reset -f --tls && helm reset --remove-helm-home --tls #(if the tiller pod is unreachable the reset may fail; in that case use the manual deletion above)
#Use helm
helm search
helm repo list
helm repo update
helm list --tls
#Because RBAC is enabled in the cluster, the required RBAC resources must be created at install time
helm install stable/nginx-ingress --name nginx-ingress --tls --debug --set rbac.create=true
#Check status
helm status nginx-ingress --tls
Note: for a complete reinstall it is best to clean up all previously generated files and restart the related services, ideally rebooting the servers; in production, remember to back up the etcd data regularly
15. Upgrade the Kubernetes version with kubeadm
#https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade-1-15/
#https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/
#https://github.com/kubernetes/kubeadm/issues/1322
#https://github.com/kubernetes/kubernetes/pull/69366
#https://juejin.im/post/5c9ce517e51d452b837c959e
#Find the available kubeadm versions
yum list --showduplicates kubeadm --disableexcludes=kubernetes
#Upgrade the first control-plane node (run on master01)
#Install the newer kubeadm
yum install -y kubeadm-1.15.0-0 --disableexcludes=kubernetes
#Verify the download works and reports the expected version
kubeadm version
#Get the upgrade plan
kubeadm upgrade plan
#Choose the target version and run the corresponding command
kubeadm upgrade apply v1.15.0
#Manually upgrade your CNI provider plugin
#Do this as needed
#Upgrade kubelet and kubectl on this control-plane node
yum install -y kubelet-1.15.0-0 kubectl-1.15.0-0 --disableexcludes=kubernetes
#Restart kubelet
systemctl restart kubelet
#Upgrade the other control-plane nodes (run on each remaining master)
#Same procedure as the first control-plane node, but use:
kubeadm upgrade node instead of kubeadm upgrade apply
#View kubeadm-config
kubectl -n kube-system get cm kubeadm-config -oyaml
#Edit it and add the following content
ClusterStatus: |
  nodeRegistration:
    criSocket: /var/run/dockershim.sock
  apiEndpoints:
    master01:
      advertiseAddress: 192.168.10.241
      bindPort: 6443
    master02:
      advertiseAddress: 192.168.10.242
      bindPort: 6443
    master03:
      advertiseAddress: 192.168.10.243
      bindPort: 6443
  apiVersion: kubeadm.k8s.io/v1beta2
  kind: ClusterStatus
#Without the content above, running kubeadm upgrade node fails with:
unable to fetch the kubeadm-config ConfigMap: failed to get node registration: node master02 doesn't have kubeadm.alpha.kubernetes.io/cri-socket annotation
unable to fetch the kubeadm-config ConfigMap: failed to getAPIEndpoint: failed to get APIEndpoint information for this node
#Upgrade the worker nodes (run on each worker node)
#Install the newer kubeadm
yum install -y kubeadm-1.15.0-0 --disableexcludes=kubernetes
#Drain the node (run on a master)
#Prepare the node for maintenance by marking it unschedulable and evicting its workloads
kubectl drain node01 --ignore-daemonsets
#Upgrade the kubelet configuration
kubeadm upgrade node
#Upgrade kubelet and kubectl
yum install -y kubelet-1.15.x-0 kubectl-1.15.x-0 --disableexcludes=kubernetes
#Restart kubelet
systemctl restart kubelet
#Uncordon the node (run on a master)
kubectl uncordon node01
16. The kubeadm certificate validity problem
By default kubeadm issues certificates valid for one year; once they expire the API server becomes unusable and you start seeing: x509: certificate has expired or is not yet valid
Certificate lifetime has long been a sore point of kubeadm-deployed clusters: by default everything is valid for one year. With kubeadm 1.14.x some certificates are still issued for one year, although a few have been changed to ten years; in practice a one-year validity is still an easy trap to fall into and needs to be dealt with.
Modifying the kubeadm 1.14.x source to adjust the certificate expiry
After a kubeadm 1.14.x install, the crt certificates are as follows
/etc/kubernetes/pki/apiserver.crt
/etc/kubernetes/pki/front-proxy-ca.crt #10-year validity
/etc/kubernetes/pki/ca.crt #10-year validity
/etc/kubernetes/pki/apiserver-etcd-client.crt
/etc/kubernetes/pki/front-proxy-client.crt #10-year validity
/etc/kubernetes/pki/etcd/server.crt
/etc/kubernetes/pki/etcd/ca.crt #10-year validity
/etc/kubernetes/pki/etcd/peer.crt #10-year validity
/etc/kubernetes/pki/etcd/healthcheck-client.crt
/etc/kubernetes/pki/apiserver-kubelet-client.crt
As shown above, except for the certificates marked as 10-year validity, everything else is valid for 1 year. To see where the validity is set, clone the Kubernetes source and check out the 1.14.1 tag:
Source path: staging/src/k8s.io/client-go/util/cert/cert.go
There are roughly two solutions:
Option 1: change the certificate validity in the source and rebuild: http://team.jiunile.com/blog/2018/12/k8s-kubeadm-ca-upgdate.html
Option 2: re-issue the certificates:
#It is best not to regenerate the CA certificates: once the CA changes, every node in the cluster needs manual work (including re-joining) before the cluster is healthy again
#Before starting, mv the certificate files under /etc/kubernetes/pki to another directory as a temporary backup; do not delete them.
kubeadm init phase certs etcd-healthcheck-client --config cluster.yaml
kubeadm init phase certs etcd-peer --config cluster.yaml
kubeadm init phase certs etcd-server --config cluster.yaml
kubeadm init phase certs front-proxy-client --config cluster.yaml
kubeadm init phase certs apiserver-etcd-client --config cluster.yaml
kubeadm init phase certs apiserver-kubelet-client --config cluster.yaml
kubeadm init phase certs apiserver --config cluster.yaml
kubeadm init phase certs sa --config cluster.yaml
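#Optional check after re-issuing: print the new expiry date (repeat for each certificate path listed earlier)
openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver.crt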
These notes were pasted in directly, so the formatting is rough; the original tutorial files and the yaml configuration files can be downloaded from Baidu Cloud: https://pan.baidu.com/s/14KArQ7yWhqWJEgcVZrxE6Q extraction code: 86mq