K8S Troubleshooting: Cleanup Fails Because the Calico CNI Cannot Reach the Kubernetes API Server

kubeadm reset -f --cri-socket unix:///run/containerd/containerd.sock
[preflight] Running pre-flight checks
W0305 16:42:40.141287   31801 removeetcdmember.go:106] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
W0305 16:42:40.166536   31801 cleanupnode.go:134] [reset] Failed to evaluate the "/var/lib/kubelet" directory. Skipping its unmount and cleanup: lstat /var/lib/kubelet: no such file or directory
W0305 16:42:42.220281   31801 cleanupnode.go:99] [reset] Failed to remove containers: [failed to stop running pod 77c3648139b04fa5925e645cf26b78fe69b122584959ef390cc080494a96799a: output: E0305 16:42:41.207891   31928 remote_runtime.go:248] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to destroy network for sandbox \"77c3648139b04fa5925e645cf26b78fe69b122584959ef390cc080494a96799a\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.1.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": dial tcp 10.1.0.1:443: connect: connection refused" podSandboxID="77c3648139b04fa5925e645cf26b78fe69b122584959ef390cc080494a96799a"
time="2025-03-05T16:42:41+08:00" level=fatal msg="stopping the pod sandbox \"77c3648139b04fa5925e645cf26b78fe69b122584959ef390cc080494a96799a\": rpc error: code = Unknown desc = failed to destroy network for sandbox \"77c3648139b04fa5925e645cf26b78fe69b122584959ef390cc080494a96799a\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.1.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": dial tcp 10.1.0.1:443: connect: connection refused"
: exit status 1, failed to stop running pod 31310acd9b4330ee909d06bfb986ca2d1a8340144c330458b32bb437b01b5e4e: output: E0305 16:42:42.215736   32071 remote_runtime.go:248] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to destroy network for sandbox \"31310acd9b4330ee909d06bfb986ca2d1a8340144c330458b32bb437b01b5e4e\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.1.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": dial tcp 10.1.0.1:443: connect: connection refused" podSandboxID="31310acd9b4330ee909d06bfb986ca2d1a8340144c330458b32bb437b01b5e4e"
time="2025-03-05T16:42:42+08:00" level=fatal msg="stopping the pod sandbox \"31310acd9b4330ee909d06bfb986ca2d1a8340144c330458b32bb437b01b5e4e\": rpc error: code = Unknown desc = failed to destroy network for sandbox \"31310acd9b4330ee909d06bfb986ca2d1a8340144c330458b32bb437b01b5e4e\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: Get \"https://10.1.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\": dial tcp 10.1.0.1:443: connect: connection refused"
: exit status 1]
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.

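According to the error output, the main problem is that the Calico CNI plugin cannot reach the Kubernetes API server (10.1.0.1:443), so the cleanup fails. The step-by-step fix follows.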
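To confirm which endpoint the plugin failed to reach, the refused address can be pulled out of the saved reset output. This is a minimal sketch: `extract_refused_endpoints` is a hypothetical helper name, and it assumes the log uses the same `dial tcp HOST:PORT: connect: connection refused` wording shown above.

```shell
# Sketch: print each unique "host:port" that the log reports as "connection refused".
# extract_refused_endpoints is a hypothetical helper, not part of kubeadm or Calico.
extract_refused_endpoints() {
  grep -oE 'dial tcp [0-9.]+:[0-9]+: connect: connection refused' \
    | awk '{ sub(/:$/, "", $3); print $3 }' \
    | sort -u
}
```

With the reset output saved to a file (say `reset.log`, an assumed name), `extract_refused_endpoints < reset.log` should list only the API server's service address if that is the sole unreachable endpoint.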

Step-by-step solution

Stop the related services and clean up containers

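# Stop kubelet first so it does not restart containers during cleanup
sudo systemctl stop kubelet

# Manually remove all containers (containerd environment).
# containerd must still be running here: crictl and ctr both talk to its socket.
# "list -q" prints only container IDs, so the table header is not passed to xargs.
sudo crictl rm -fa 2>/dev/null || \
  sudo ctr -n k8s.io containers list -q | xargs -r sudo ctr -n k8s.io containers delete

# Stop containerd once the containers are gone
sudo systemctl stop containerd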
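To check the cleanup afterwards, it helps to know exactly which pod sandboxes the reset failed to stop. The sketch below extracts their IDs from the saved reset output; `failed_sandbox_ids` is a hypothetical helper name, and it assumes the `failed to stop running pod <64-hex-id>` wording seen in the log above.

```shell
# Sketch: list the unique sandbox IDs that kubeadm reset reported as
# "failed to stop running pod <id>".
# failed_sandbox_ids is a hypothetical helper, not a kubeadm or crictl command.
failed_sandbox_ids() {
  grep -oE 'failed to stop running pod [0-9a-f]{64}' \
    | awk '{ print $NF }' \
    | sort -u
}
```

While containerd is running, the printed IDs can be compared against the output of `sudo crictl pods` to see whether any of those sandboxes are still present.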

Delete the CNI configuration and leftover network interfaces

# Delete the CNI configuration files
sudo rm -rf /etc/cni/net.d/*

# Clean up leftover network interfaces and runtime directories
sudo ip link delete cni0 2>/dev/null
sudo ip link delete flannel.1 2>/dev/null
sudo rm -rf /var/run/calico /var/run/flannel
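The commands above target `cni0` and `flannel.1`, but a Calico node usually has interfaces of its own: `cali*` veth endpoints, plus `tunl0` (IPIP mode) or `vxlan.calico` (VXLAN mode). The sketch below only lists them from `ip -o link show` output; `calico_links` is a hypothetical helper name, and the name patterns are an assumption based on Calico's usual naming.

```shell
# Sketch: read `ip -o link show` output on stdin and print the names of
# interfaces that look Calico-managed (cali*, tunl0, vxlan.calico).
# calico_links is a hypothetical helper; the patterns are assumptions.
calico_links() {
  awk -F': ' '{
    name = $2
    sub(/@.*/, "", name)   # drop the "@ifN" / "@NONE" peer suffix
    if (name ~ /^(cali[0-9a-f]+|tunl0|vxlan\.calico)$/) print name
  }'
}
```

Run it as `ip -o link show | calico_links` and review the list before deleting anything with `sudo ip link delete <name>`. Deleting `tunl0` may fail while the `ipip` kernel module is still loaded.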

Clean up iptables and IPVS rules

# Reset iptables
sudo iptables -F
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -X

# If IPVS is in use
sudo ipvsadm --clear
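After flushing, one way to verify the result is to count the rules that still reference Kubernetes or Calico chains. This is a minimal sketch: `count_k8s_rules` is a hypothetical helper name, and it assumes the usual `KUBE-` and `cali-` chain name prefixes in `iptables-save` output.

```shell
# Sketch: read `iptables-save` output on stdin and print how many rules
# (lines starting with "-A") still mention KUBE- or cali- chains.
# count_k8s_rules is a hypothetical helper; the prefixes are assumptions.
count_k8s_rules() {
  # grep -c prints 0 but exits non-zero when nothing matches; ignore that status.
  grep -cE '^-A .*(KUBE-|cali-)' || true
}
```

Running `sudo iptables-save | count_k8s_rules` should print `0` once the flush has removed the old rules. Empty chain declarations (lines starting with `:`) are deliberately not counted.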

Reset kubeadm and remove leftover files

# Run kubeadm reset
sudo kubeadm reset -f --cri-socket unix:///run/containerd/containerd.sock

# Remove leftover directories
sudo rm -rf /var/lib/kubelet /etc/kubernetes "$HOME/.kube"
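A quick check that the directories are really gone can be sketched as follows. `leftover_paths` is a hypothetical helper name; it simply prints whichever of the given paths still exist.

```shell
# Sketch: print every argument that still exists on disk.
# leftover_paths is a hypothetical helper, not part of kubeadm.
leftover_paths() {
  local p
  for p in "$@"; do
    if [ -e "$p" ]; then
      printf '%s\n' "$p"
    fi
  done
}
```

For example, `leftover_paths /var/lib/kubelet /etc/kubernetes "$HOME/.kube"` prints nothing when the cleanup succeeded. Any path it does print needs another look, for instance because a mount is still active under it.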

Reboot the node (optional)
```
# Make sure all changes take effect
sudo reboot
```

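Key explanations

Calico connection failure: kubeadm reset has already shut down the API server, so the Calico CNI plugin cannot reach it at 10.1.0.1:443 when it tears down pod networks. The containers and network configuration therefore have to be cleaned up manually.
Forced container cleanup: containers are removed directly with crictl or ctr, bypassing kubelet.
Leftover network state: deleting the CNI configuration and network interfaces keeps the old configuration from affecting a new cluster.
iptables/IPVS: flushing them ensures that no old rules interfere with network components installed later.

After completing the steps above, you can reinitialize the cluster (kubeadm init) or carry on with other operations.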

Notes

If some files cannot be deleted, reboot the server and repeat the steps.
