1 Single-master and multi-master cluster architectures
1.1 Single-master cluster
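A k8s cluster consists of a set of nodes running k8s. A node can be a physical machine, a virtual machine, or a cloud server. Nodes in a k8s cluster take one of two roles: master or node.
- master node: the master node controls and manages the whole cluster and runs the key components, such as kube-apiserver, kube-scheduler, and kube-controller-manager. A cluster can have one or more master nodes. When there are several, they keep their data consistent through etcd, a distributed key-value store.
- node (worker) node: a node is a worker that hosts user applications and runs the required components, such as kubelet, kube-proxy, and a container runtime. A cluster can have one or more nodes. When there are several, a network plugin provides communication and routing between them.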
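Usually we build a single-master, multi-node cluster. This common k8s architecture has exactly one master node and several worker nodes. Its advantage is that it is simple to set up, which suits learning and testing k8s features. Its drawback is that the master node is a single point of failure: if the master node fails, the whole cluster can no longer work properly.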
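There are several ways to build a single-master, multi-node k8s cluster. Choose one according to your needs and scenario. Common options are:
- kubeadm: the official tool for quickly creating and managing a cluster. kubeadm bootstraps the control plane and joins nodes, configuring the kubelet and deploying kube-proxy along the way. The kubelet, kubeadm itself, and the container runtime must already be installed on each node. This option suits learning and testing, or simple production environments.
- kops: an open-source tool for creating and managing clusters on cloud providers such as AWS and GCP. kops creates and configures the cloud resources (virtual machines, networking, storage, and so on) and installs and configures the components each node needs. This option suits highly available, scalable clusters in the cloud.
- Other tools or platforms: third-party tools such as Ansible, Terraform, and Rancher can automate and customize cluster creation and configuration. Alternatively, a managed service from a cloud provider (EKS, GKE, AKS, and so on) creates and manages the cluster for you. These options cover a range of needs and preferences, but may cost more in learning and debugging.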
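1.2 Master high-availability architecture
A Kubernetes multi-master cluster uses several master nodes to improve the availability and fault tolerance of the cluster. The master node controls and manages the cluster's resources and services, and runs the following components:
- kube-apiserver: the key service process that exposes the HTTP REST interface. It is the single entry point for create, delete, update, and query operations on all cluster resources, and the entry point for cluster control.
- kube-scheduler: the process responsible for resource scheduling (Pod scheduling), comparable to the dispatch office of a bus company.
- kube-controller-manager: the automation control center for all resource objects in the cluster, which can be thought of as their "chief steward".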
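As a container cluster system, Kubernetes gives Pods self-healing through health checks plus restart policies. Its scheduling algorithm distributes Pods across nodes and maintains the desired replica count, and when a Node fails it automatically starts Pods on other Nodes. Together these provide high availability at the application layer.
For the Kubernetes cluster itself, high availability must also cover two further layers: the etcd database and the Kubernetes Master components.
The Master node acts as the control center and keeps the whole cluster healthy by continuously communicating with the kubelet and kube-proxy on the worker nodes. If the Master node fails, neither kubectl nor the API can be used for any cluster management.
The Master node runs three main services: kube-apiserver, kube-controller-manager, and kube-scheduler. Of these, kube-controller-manager and kube-scheduler already achieve high availability through their built-in leader election, so Master high availability is mainly about kube-apiserver. Because that component serves an HTTP API, making it highly available is much like doing so for a web server: put a load balancer in front of it, and it can then be scaled horizontally.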
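Multi-master architecture diagram:
There are several ways to implement a Kubernetes multi-master cluster. Choose one according to your needs and scenario. By implementation, the load balancing in front of the masters falls into the following categories:
- Hardware load balancing: dedicated hardware appliances, such as F5 or Cisco devices. The advantages are high performance and strong stability. The drawbacks are high cost and poor scalability.
- Software load balancing: ordinary servers running software such as Nginx or HAProxy. The advantages are low cost and good scalability. The drawbacks are lower performance and stability than dedicated hardware.
- Hybrid load balancing: a combination of both, for example a hardware appliance as the global entry point with software handling local distribution. The advantage is a balance of performance and cost. The drawbacks are higher complexity and harder maintenance.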
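1.2.1 Highly available storage cluster
etcd: a distributed key-value store that holds the state and metadata of all resource objects in the cluster.
This section covers configuring a highly available (HA) etcd cluster for Kubernetes.
Two kinds of HA cluster can be set up:
- With stacked control plane nodes, where etcd members are colocated with the control plane nodes
- With external etcd nodes, where etcd runs on nodes separate from the control plane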
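1.2.1.1 Stacked etcd topology (built-in etcd cluster)
A stacked HA cluster is a topology in which the etcd distributed data store is stacked on the control plane nodes managed by kubeadm and runs as a component of the control plane.
Each control plane node runs an instance of kube-apiserver, kube-scheduler, and kube-controller-manager. The kube-apiserver is exposed to worker nodes through a load balancer.
Each control plane node creates a local etcd member, and this etcd member communicates only with the kube-apiserver of that node. The same applies to the local kube-controller-manager and kube-scheduler instances.
This topology couples the control plane and the etcd members on the same nodes. Compared with an external etcd cluster, it is simpler to set up and easier to manage for replication.
However, a stacked cluster runs the risk of coupled failure. If one node goes down, both an etcd member and a control plane instance are lost, and redundancy is compromised. Adding more control plane nodes reduces this risk.
An HA cluster should therefore run at least three stacked control plane nodes.
This is the default topology in kubeadm. A local etcd member is created automatically on control plane nodes when using kubeadm init and kubeadm join --control-plane.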
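The "at least three" recommendation follows from etcd's majority rule: a cluster of N members needs floor(N/2) + 1 of them alive to keep serving requests. The short sketch below only does that arithmetic. The function names are made up for illustration and nothing here talks to a real etcd.

```shell
# Quorum arithmetic for an etcd cluster of N members (illustrative helpers,
# not part of etcd or kubeadm).
# quorum = floor(N / 2) + 1; tolerated failures = N - quorum.

etcd_quorum() {
  local members="$1"
  echo $(( members / 2 + 1 ))
}

etcd_fault_tolerance() {
  local members="$1"
  echo $(( members - (members / 2 + 1) ))
}

# Print a small table for cluster sizes 1 through 5.
for n in 1 2 3 4 5; do
  printf 'members=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(etcd_quorum "$n")" "$(etcd_fault_tolerance "$n")"
done
```

With two members the quorum is two, so a two-master stacked cluster tolerates no failures at all. Three members is the smallest size that survives the loss of one node.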
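1.2.1.2 External etcd topology (external etcd cluster)
An HA cluster with external etcd is a topology in which the etcd distributed data store runs on nodes that are separate from the control plane nodes.
As in the stacked etcd topology, each control plane node in an external etcd topology runs an instance of kube-apiserver, kube-scheduler, and kube-controller-manager, and the kube-apiserver is exposed to worker nodes through a load balancer. The etcd members, however, run on separate hosts, and each etcd host communicates with the kube-apiserver of every control plane node.
This topology decouples the control plane from the etcd members. It therefore provides an HA setup in which losing a control plane instance or an etcd member has less impact and does not affect cluster redundancy as much as in the stacked HA topology.
However, this topology requires twice as many hosts as the stacked HA topology. An HA cluster with this topology needs at least three hosts for control plane nodes and three hosts for etcd nodes.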
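2 Deploying a high-availability cluster in practice
2.1 Upgrading a single-master cluster to a high-availability cluster
2.1.1 Deploy the load balancer
nginx node: 10.220.43.211:16443
2.1.1.1 Install nginx
nginx is used as the load balancer in this example.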
$ yum install nginx -y
2.1.1.2 Configure nginx
$ vim /etc/nginx/nginx.conf
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 1024;
}

# Layer-4 load balancing for the Master apiserver components
stream {
    log_format main '$remote_addr $upstream_addr - [$time_local] $status $upstream_bytes_sent';
    access_log /var/log/nginx/k8s-access.log main;

    upstream k8s-apiserver {
        server 10.220.43.203:6443;  # Master1 APISERVER IP:PORT
    }

    server {
        listen 16443;  # nginx may share a host with a master node, so this port must not be 6443 or it would conflict
        proxy_pass k8s-apiserver;
    }
}

http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

    access_log /var/log/nginx/access.log main;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    server {
        listen 80 default_server;
        server_name _;

        location / {
        }
    }
}
2.1.1.3 Start nginx
$ nginx -t
$ systemctl start nginx
2.1.2 Switch the master
2.1.2.1 Update the k8s certificates
Run the following on ops-master-1.
If the cluster was created with kubeadm init, first export the kubeadm configuration.
$ kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' > kubeadm.yaml
$ cat kubeadm.yaml
apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.21.9
networking:
  dnsDomain: cluster.local
  podSubnet: 172.25.0.0/16
  serviceSubnet: 192.168.0.0/16
scheduler: {}
2.1.2.2 Add the certificate SANs
Add the load balancer and node addresses under certSANs, and set controlPlaneEndpoint to the load balancer address and port (16443, the port nginx listens on).
$ vim kubeadm.yaml
apiServer:
  certSANs:
  - 10.220.43.211
  - 10.220.43.203
  - 10.220.43.204
  - 10.220.43.205
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 10.220.43.211:16443
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.21.9
networking:
  dnsDomain: cluster.local
  podSubnet: 172.25.0.0/16
  serviceSubnet: 192.168.0.0/16
scheduler: {}
2.1.2.3 Generate the new certificate
2.1.2.3.1 Back up the old certificate
$ mkdir bak
$ mv /etc/kubernetes/pki/apiserver.{crt,key} bak/
2.1.2.3.2 Generate the new certificate
$ kubeadm init phase certs apiserver --config kubeadm.yaml
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local ops-master-1] and IPs [192.168.0.1 10.220.43.203 10.220.43.211 10.220.43.204 10.220.43.205]
2.1.2.3.3 Verify the certificate
Confirm that it contains the newly added SAN entries.
$ openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text
......
X509v3 Subject Alternative Name: DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:ops-master-1, IP Address:192.168.0.1, IP Address:10.220.43.203, IP Address:10.220.43.211, IP Address:10.220.43.204, IP Address:10.220.43.205
......
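Reading the SAN line by eye is error-prone once the list grows. As a hedged sketch, the helper below (the name `missing_sans` is made up for illustration) reads certificate text on stdin and prints every expected IP that does not appear as an `IP Address:` entry. It assumes the text format shown in the openssl output above.

```shell
# Print each expected IP that is NOT present as an "IP Address:" SAN in the
# certificate text read from stdin. Prints nothing when all are present.
# (Hypothetical helper; assumes the `openssl x509 -text` output format.)
missing_sans() {
  local cert_text ip escaped
  cert_text="$(cat)"
  for ip in "$@"; do
    # Escape the dots so they match literally, and require a delimiter after
    # the address so 10.220.43.20 does not match 10.220.43.203.
    escaped="$(printf '%s' "$ip" | sed 's/\./\\./g')"
    if ! printf '%s\n' "$cert_text" | grep -Eq "IP Address:${escaped}(,|[[:space:]]|\$)"; then
      printf '%s\n' "$ip"
    fi
  done
}
```

A possible invocation on ops-master-1 would be `openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | missing_sans 10.220.43.211 10.220.43.203 10.220.43.204 10.220.43.205`. Empty output means every address is covered.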
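2.1.2.3.5 Restart the apiserver
The apiserver has to be restarted before it serves the new certificate. The command recorded below deletes the kube-controller-manager Pod. To restart the apiserver itself, target the kube-apiserver-ops-master-1 Pod instead.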
$ kubectl get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-5d4b78db86-rrgw4 1/1 Running 0 54m 172.25.13.1 ops-master-1 <none> <none>
calico-node-jk7zc 1/1 Running 0 51m 10.220.43.204 ops-worker-1 <none> <none>
calico-node-p2c7d 1/1 Running 0 54m 10.220.43.203 ops-master-1 <none> <none>
calico-node-v8z5x 1/1 Running 0 51m 10.220.43.205 ops-worker-2 <none> <none>
coredns-59d64cd4d4-gkrz6 1/1 Running 0 87m 172.25.13.2 ops-master-1 <none> <none>
coredns-59d64cd4d4-nmdfh 1/1 Running 0 87m 172.25.13.3 ops-master-1 <none> <none>
etcd-ops-master-1 1/1 Running 0 87m 10.220.43.203 ops-master-1 <none> <none>
kube-apiserver-ops-master-1 1/1 Running 0 87m 10.220.43.203 ops-master-1 <none> <none>
kube-controller-manager-ops-master-1 1/1 Running 0 87m 10.220.43.203 ops-master-1 <none> <none>
kube-proxy-f7mct 1/1 Running 0 51m 10.220.43.205 ops-worker-2 <none> <none>
kube-proxy-j9bmp 1/1 Running 0 51m 10.220.43.204 ops-worker-1 <none> <none>
kube-proxy-pm77c 1/1 Running 0 87m 10.220.43.203 ops-master-1 <none> <none>
kube-scheduler-ops-master-1 1/1 Running 0 87m 10.220.43.203 ops-master-1 <none> <none>
$ kubectl delete pod kube-controller-manager-ops-master-1 -n kube-system
pod "kube-controller-manager-ops-master-1" deleted
2.1.2.3.6 Save the new configuration
$ kubeadm init phase upload-config kubeadm --config kubeadm.yaml
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
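2.1.2.4 Update the configuration
The certificate has been updated and the load balancer is deployed. Next, every component configuration that still uses the old address must be changed to the load balancer address.
2.1.2.4.1 kubelet.conf
$ vim /etc/kubernetes/kubelet.conf
...
    server: https://10.220.43.211:16443
  name: kubernetes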
...
$ systemctl restart kubelet
2.1.2.4.2 controller-manager.conf
$ vim /etc/kubernetes/controller-manager.conf
...
    server: https://10.220.43.211:16443
  name: kubernetes
...
# Restart kube-controller-manager
$ kubectl delete pod -n kube-system kube-controller-manager-ops-master-1
2.1.2.4.3 scheduler.conf
$ vim /etc/kubernetes/scheduler.conf
...
    server: https://10.220.43.211:16443
  name: kubernetes
...
# Restart kube-scheduler
$ kubectl delete pod -n kube-system kube-scheduler-ops-master-1
2.1.2.4.4 kube-proxy
$ kubectl edit configmap kube-proxy -n kube-system
...
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://10.220.43.211:16443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
...
configmap/kube-proxy edited
$ kubectl rollout restart daemonset kube-proxy -n kube-system
2.1.2.4.5 Update the kubeconfig
Both ~/.kube/config and /etc/kubernetes/admin.conf need to be modified.
$ vim /etc/kubernetes/admin.conf
...
    server: https://10.220.43.211:16443
  name: kubernetes
...
$ vim /root/.kube/config
...
    server: https://10.220.43.211:16443
  name: kubernetes
...
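The edits in this section all change the same `server:` line by hand. As a rough sketch, a small helper could make that change with sed and keep a backup. The function name `set_apiserver` is invented here, and it assumes each file contains a single cluster entry, as the kubeadm-generated files above do.

```shell
# Point the "server:" entry of a kubeconfig-style file at a new endpoint.
# The original file is kept next to it as <file>.bak.
# (Hypothetical helper; assumes one "server:" line per file.)
set_apiserver() {
  local file="$1" url="$2"
  # Keep the original indentation and the "server:" key, replace only the value.
  sed -i.bak -E "s#^([[:space:]]*server:[[:space:]]*).*#\\1${url}#" "$file"
}
```

It could then be applied to each file in turn, for example `set_apiserver /etc/kubernetes/kubelet.conf https://10.220.43.211:16443`, followed by the same restarts shown above (`systemctl restart kubelet` and so on). The backup makes it easy to roll back if the load balancer turns out to be unreachable.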
2.1.3 Switch the workers to the new apiserver address
2.1.3.1 kubelet.conf
$ vim /etc/kubernetes/kubelet.conf
...
    server: https://10.220.43.211:16443
  name: kubernetes
...
$ systemctl restart kubelet
2.1.3.2 Update the kubeconfig
On a worker, only ~/.kube/config needs to be modified.
$ vim ~/.kube/config
...
    server: https://10.220.43.211:16443
  name: kubernetes
...
2.1.4 Verification
2.1.4.1 Verify on the master
Verify on ops-master-1.
$ cat /root/.kube/config | grep server
server: https://10.220.43.211:16443
$ kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5d4b78db86-rrgw4 1/1 Running 0 65m
calico-node-jk7zc 1/1 Running 0 62m
calico-node-p2c7d 1/1 Running 0 65m
calico-node-v8z5x 1/1 Running 0 62m
coredns-59d64cd4d4-gkrz6 1/1 Running 0 97m
coredns-59d64cd4d4-nmdfh 1/1 Running 0 97m
etcd-ops-master-1 1/1 Running 0 98m
kube-apiserver-ops-master-1 1/1 Running 0 98m
kube-controller-manager-ops-master-1 1/1 Running 0 5m44s
kube-proxy-dhjxj 1/1 Running 0 2m30s
kube-proxy-rm64j 1/1 Running 0 2m32s
kube-proxy-xg6bp 1/1 Running 0 2m35s
kube-scheduler-ops-master-1 1/1 Running 0 4m16s
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ops-master-1 Ready control-plane,master 101m v1.21.9
ops-worker-1 Ready <none> 65m v1.21.9
ops-worker-2 Ready <none> 65m v1.21.9
2.1.4.2 Verify on a worker
Verify on ops-worker-1.
$ kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-5d4b78db86-rrgw4 1/1 Running 0 74m
calico-node-jk7zc 1/1 Running 0 71m
calico-node-p2c7d 1/1 Running 0 74m
calico-node-v8z5x 1/1 Running 0 71m
coredns-59d64cd4d4-gkrz6 1/1 Running 0 107m
coredns-59d64cd4d4-nmdfh 1/1 Running 0 107m
etcd-ops-master-1 1/1 Running 0 107m
kube-apiserver-ops-master-1 1/1 Running 0 107m
kube-controller-manager-ops-master-1 1/1 Running 0 14m
kube-proxy-dhjxj 1/1 Running 0 11m
kube-proxy-rm64j 1/1 Running 0 11m
kube-proxy-xg6bp 1/1 Running 0 11m
kube-scheduler-ops-master-1 1/1 Running 0 13m
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ops-master-1 Ready control-plane,master 109m v1.21.9
ops-worker-1 Ready <none> 74m v1.21.9
ops-worker-2 Ready <none> 73m v1.21.9
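2.2 Adding master nodes to the high-availability cluster
New master node: 10.220.43.209 ops-master-2
2.2.1 Deploy the k8s services on the new master
2.2.1.1 Add the new master to the hosts file on every node
# ops-master-1/ops-worker-1/ops-worker-2:
echo "10.220.43.209 ops-master-2" >> /etc/hosts
2.2.1.2 Deploy the k8s services
Reference: "Kubernetes in Practice (9): Installing a k8s cluster with kubeadm" (CSDN blog)
2.2.2 Join the new master to the cluster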
$ kubeadm join 10.220.43.211:16443 --token 9puv2h.sr5dvg9skqlqhofm --discovery-token-ca-cert-hash sha256:b85555d7fdf2e1f28afe09dcb649117a34ac330ace38434fb604e2705b5df207 --control-plane --certificate-key a96e54087b299b962dae6321e519386fd9bdb1876a6cd4067c55484a0fe0c5e0
[preflight] Running pre-flight checks
	[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost ops-master-2] and IPs [10.220.43.209 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost ops-master-2] and IPs [10.220.43.209 127.0.0.1 ::1]
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local ops-master-2] and IPs [192.168.0.1 10.220.43.209 10.220.43.211 10.220.43.203 10.220.43.204 10.220.43.205]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[mark-control-plane] Marking the node ops-master-2 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node ops-master-2 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

	mkdir -p $HOME/.kube
	sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
	sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.
The node joined successfully.
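The join command in this section uses a token, a CA certificate hash, and a certificate key, but the text does not show where they come from. One common way, sketched here under assumptions, is to generate them on an existing control-plane node with `kubeadm token create --print-join-command` and `kubeadm init phase upload-certs --upload-certs`, then append the control-plane flags. The helper name `control_plane_join_cmd` is invented, and taking the last line of the upload-certs output as the key is an assumption about that command's output format.

```shell
# Compose a control-plane join command from a worker join command and a
# certificate key. Pure string handling; it does not contact the cluster.
# (Hypothetical helper name.)
control_plane_join_cmd() {
  local worker_join="$1" cert_key="$2"
  printf '%s --control-plane --certificate-key %s\n' "$worker_join" "$cert_key"
}

# Intended use on an existing control-plane node (not executed here):
#   join="$(kubeadm token create --print-join-command)"
#   key="$(kubeadm init phase upload-certs --upload-certs | tail -n 1)"
#   control_plane_join_cmd "$join" "$key"
```

The printed command should point at the load balancer address (10.220.43.211:16443 here), which depends on controlPlaneEndpoint having been saved to the kubeadm-config ConfigMap earlier.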
2.2.3 Check the status
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ops-master-1 Ready control-plane,master 147m v1.21.9
ops-master-2 NotReady control-plane,master 27s v1.21.9
ops-worker-1 Ready <none> 111m v1.21.9
ops-worker-2 Ready <none> 111m v1.21.9
The status takes a while to update. Check again after 2 to 3 minutes:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ops-master-1 Ready control-plane,master 150m v1.21.9
ops-master-2 Ready control-plane,master 3m46s v1.21.9
ops-worker-1 Ready <none> 114m v1.21.9
ops-worker-2 Ready <none> 114m v1.21.9
$ kubectl get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-5d4b78db86-rrgw4 1/1 Running 0 117m 172.25.13.1 ops-master-1 <none> <none>
calico-node-f5s6w 1/1 Running 0 4m1s 10.220.43.209 ops-master-2 <none> <none>
calico-node-jk7zc 1/1 Running 0 114m 10.220.43.204 ops-worker-1 <none> <none>
calico-node-p2c7d 1/1 Running 0 117m 10.220.43.203 ops-master-1 <none> <none>
calico-node-v8z5x 1/1 Running 0 114m 10.220.43.205 ops-worker-2 <none> <none>
coredns-59d64cd4d4-gkrz6 1/1 Running 0 150m 172.25.13.2 ops-master-1 <none> <none>
coredns-59d64cd4d4-nmdfh 1/1 Running 0 150m 172.25.13.3 ops-master-1 <none> <none>
etcd-ops-master-1 1/1 Running 0 150m 10.220.43.203 ops-master-1 <none> <none>
etcd-ops-master-2 1/1 Running 0 3m56s 10.220.43.209 ops-master-2 <none> <none>
kube-apiserver-ops-master-1 1/1 Running 0 150m 10.220.43.203 ops-master-1 <none> <none>
kube-apiserver-ops-master-2 1/1 Running 0 3m56s 10.220.43.209 ops-master-2 <none> <none>
kube-controller-manager-ops-master-1 1/1 Running 1 5m9s 10.220.43.203 ops-master-1 <none> <none>
kube-controller-manager-ops-master-2 1/1 Running 0 3m56s 10.220.43.209 ops-master-2 <none> <none>
kube-proxy-dhjxj 1/1 Running 0 54m 10.220.43.203 ops-master-1 <none> <none>
kube-proxy-rm64j 1/1 Running 0 54m 10.220.43.204 ops-worker-1 <none> <none>
kube-proxy-xg6bp 1/1 Running 0 54m 10.220.43.205 ops-worker-2 <none> <none>
kube-proxy-zcvzs 1/1 Running 0 4m1s 10.220.43.209 ops-master-2 <none> <none>
kube-scheduler-ops-master-1 1/1 Running 1 56m 10.220.43.203 ops-master-1 <none> <none>
kube-scheduler-ops-master-2 1/1 Running 0 3m56s 10.220.43.209 ops-master-2 <none> <none>
All components on the new master node are now installed.
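Instead of re-running kubectl get nodes by hand while the new master becomes Ready, the node list can be filtered with a one-line helper. This is only a sketch: the name `not_ready_nodes` is made up, and it treats any STATUS other than exactly `Ready` (including `Ready,SchedulingDisabled`) as not ready.

```shell
# Read `kubectl get nodes --no-headers` output on stdin and print the name of
# every node whose STATUS column is not exactly "Ready".
# (Hypothetical helper; column 1 is NAME, column 2 is STATUS.)
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}
```

For example, `kubectl get nodes --no-headers | not_ready_nodes` prints nothing once every node is Ready, so it can serve as the condition of a simple wait loop.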
2.2.4 Verify high availability
2.2.4.1 Shut down ops-master-1
[root@ops-master-1 ~]# init 0
2.2.4.2 Verify from the other nodes
[root@ops-master-2 etc]# kubectl get nodes
Error from server: etcdserver: request timed out
[root@ops-worker-1 .kube]# kubectl get nodes
Error from server: rpc error: code = Unknown desc = OK: HTTP status code 200; transport: missing content-type field
Analysis at first pointed to coredns: both coredns Pods were running on ops-master-1, so once ops-master-1 went down no coredns instance was available.
2.2.4.3 Spread the coredns Pods across nodes
$ kubectl delete pod coredns-59d64cd4d4-gkrz6 -n kube-system
pod "coredns-59d64cd4d4-gkrz6" deleted
$ kubectl get pod -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-5d4b78db86-rrgw4 1/1 Running 1 125m 172.25.13.6 ops-master-1 <none> <none>
calico-node-f5s6w 1/1 Running 0 11m 10.220.43.209 ops-master-2 <none> <none>
calico-node-jk7zc 1/1 Running 0 122m 10.220.43.204 ops-worker-1 <none> <none>
calico-node-p2c7d 1/1 Running 1 125m 10.220.43.203 ops-master-1 <none> <none>
calico-node-v8z5x 1/1 Running 0 122m 10.220.43.205 ops-worker-2 <none> <none>
coredns-59d64cd4d4-nmdfh 1/1 Running 1 158m 172.25.13.5 ops-master-1 <none> <none>
coredns-59d64cd4d4-zr4hd 1/1 Running 0 40s 172.25.78.65 ops-worker-1 <none> <none>
etcd-ops-master-1 1/1 Running 1 158m 10.220.43.203 ops-master-1 <none> <none>
etcd-ops-master-2 1/1 Running 1 11m 10.220.43.209 ops-master-2 <none> <none>
kube-apiserver-ops-master-1 1/1 Running 1 158m 10.220.43.203 ops-master-1 <none> <none>
kube-apiserver-ops-master-2 1/1 Running 4 11m 10.220.43.209 ops-master-2 <none> <none>
kube-controller-manager-ops-master-1 1/1 Running 2 12m 10.220.43.203 ops-master-1 <none> <none>
kube-controller-manager-ops-master-2 1/1 Running 1 11m 10.220.43.209 ops-master-2 <none> <none>
kube-proxy-dhjxj 1/1 Running 1 62m 10.220.43.203 ops-master-1 <none> <none>
kube-proxy-rm64j 1/1 Running 0 62m 10.220.43.204 ops-worker-1 <none> <none>
kube-proxy-xg6bp 1/1 Running 0 62m 10.220.43.205 ops-worker-2 <none> <none>
kube-proxy-zcvzs 1/1 Running 0 11m 10.220.43.209 ops-master-2 <none> <none>
kube-scheduler-ops-master-1 1/1 Running 2 64m 10.220.43.203 ops-master-1 <none> <none>
kube-scheduler-ops-master-2 1/1 Running 1 11m 10.220.43.209 ops-master-2 <none> <none>
The coredns Pods are now spread across nodes.
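To confirm the spread without scanning the whole table, the Pod list can be grouped by node. The sketch below uses an invented helper name (`pods_per_node`) and assumes the `-o wide` column layout shown above, where NODE is the seventh column.

```shell
# Read `kubectl get pod -o wide --no-headers` output on stdin and print, for
# Pods whose name starts with the given prefix, how many run on each node.
# (Hypothetical helper; assumes NODE is column 7 as in the output above.)
pods_per_node() {
  local prefix="$1"
  awk -v p="$prefix" '
    index($1, p) == 1 { count[$7]++ }
    END { for (node in count) print node, count[node] }
  ' | sort
}
```

Run as `kubectl get pod -n kube-system -o wide --no-headers | pods_per_node coredns`. Two different node names in the output mean a single node failure no longer takes out every coredns replica.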
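Shutting down ops-master-1 at this point still leaves the cluster unavailable.
The cause turns out to be that there are only two etcd Pods. etcd is a distributed store that needs a majority of its members (a quorum) to elect a leader and serve requests. With two members the quorum is two, so losing either one makes etcd unavailable. An odd number of members is therefore recommended, and one more master node has to be deployed to bring etcd to three members.
The external etcd topology is recommended here, rather than the stacked etcd topology.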
2.2.5 Deploy the ops-master-3 node
Reference: "Kubernetes in Practice (9): Installing a k8s cluster with kubeadm" (CSDN blog)
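The nginx upstream configured in section 2.1.1.2 lists only ops-master-1. The text does not show it being extended, but for the load balancer to keep working when ops-master-1 is down, the other apiservers presumably need to be listed as well. The sketch below prints such an upstream block from a list of addresses. The helper name `render_upstream` is invented, port 6443 is assumed for every master, and the address of ops-master-3 is not given in this article, so only the two known masters appear in the example call.

```shell
# Print an nginx `upstream` block with one apiserver backend per argument.
# (Hypothetical helper; assumes every apiserver listens on port 6443.)
render_upstream() {
  local addr
  echo "upstream k8s-apiserver {"
  for addr in "$@"; do
    echo "    server ${addr}:6443;"
  done
  echo "}"
}

# Example with the two master addresses that appear in this article.
render_upstream 10.220.43.203 10.220.43.209
```

After replacing the upstream block inside the stream section of /etc/nginx/nginx.conf, `nginx -t` followed by `systemctl reload nginx` would apply the change.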
2.2.6 Verification
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ops-master-1 Ready control-plane,master 168m v1.21.9
ops-master-2 Ready control-plane,master 21m v1.21.9
ops-master-3 Ready control-plane,master 2m28s v1.21.9
ops-worker-1 Ready <none> 132m v1.21.9
ops-worker-2 Ready <none> 132m v1.21.9
Take the ops-master-1 node offline.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ops-master-1 NotReady control-plane,master 168m v1.21.9
ops-master-2 NotReady control-plane,master 22m v1.21.9
ops-master-3 NotReady control-plane,master 2m47s v1.21.9
ops-worker-1 Ready <none> 133m v1.21.9
ops-worker-2 Ready <none> 132m v1.21.9
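All three masters are reported as NotReady.
Investigation showed that kubelet.conf on the new masters still pointed at 10.220.43.203:6443. Once ops-master-1 (10.220.43.203) went down, the kubelets on the new masters could no longer reach the cluster, so those nodes went offline.
Solution (on each new master):
$ vim /etc/kubernetes/kubelet.conf
......
    server: https://10.220.43.211:16443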
......
$ systemctl restart kubelet
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ops-master-1 NotReady control-plane,master 4h15m v1.21.9
ops-master-2 Ready control-plane,master 108m v1.21.9
ops-master-3 Ready control-plane,master 88m v1.21.9
ops-worker-1 Ready <none> 3h39m v1.21.9
ops-worker-2 Ready <none> 3h39m v1.21.9
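A leftover reference to the old apiserver address like this one is easy to miss. As a hedged sketch, the helper below lists the `.conf` files in a directory whose `server:` line still mentions a given address. The name `stale_kubeconfigs` is made up, and /etc/kubernetes is assumed to be where kubeadm wrote the files.

```shell
# List the *.conf files in a directory whose "server:" line still points at
# the old apiserver address (given as host:port).
# (Hypothetical helper; prints nothing when no file matches.)
stale_kubeconfigs() {
  local old_addr="$1" dir="$2"
  grep -lF "server: https://${old_addr}" "$dir"/*.conf 2>/dev/null
}
```

Running `stale_kubeconfigs 10.220.43.203:6443 /etc/kubernetes` on each master and worker before shutting down ops-master-1 would have flagged the kubelet.conf files that caused the NotReady state above.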
At this point, adding master nodes to the high-availability cluster is complete.