gopaddle User Guide

Node Issues

Troubleshooting node events when node is not in ready status


Last updated 1 year ago

🔔 Event: FreeDiskSpaceFailed

🔍 Reason: Not enough disk space in the node

By default, Kubernetes image garbage collection (GC) is triggered when disk usage on a node crosses the HighThresholdPercent value (85% by default in current kubelet releases; older releases used 90%). The ImageGCManager then deletes unused images, starting with the least recently used, until disk usage drops to the LowThresholdPercent value. In some cases GC is not triggered, or it cannot free enough space. In such scenarios, the FreeDiskSpaceFailed event occurs.
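As a rough illustration of the threshold check described above, the sketch below compares a filesystem's usage with a high threshold. `IMAGE_FS_PATH` and `HIGH_THRESHOLD` are hypothetical variables introduced here, not kubelet settings; point the path at wherever your container runtime stores images and set the threshold to your kubelet's configured value. It only approximates the check and is not the kubelet's own logic.

```shell
#!/bin/sh
# Assumed values for illustration; adjust both to your node.
IMAGE_FS_PATH="${IMAGE_FS_PATH:-/}"
HIGH_THRESHOLD="${HIGH_THRESHOLD:-85}"

# Print the usage percentage (digits only) of the filesystem holding a path.
disk_usage_percent() {
    df -P "$1" | awk 'NR==2 { gsub(/%/, "", $5); print $5 }'
}

# Succeed when usage has reached the threshold.
high_threshold_reached() {
    [ "$1" -ge "$2" ]
}

usage="$(disk_usage_percent "$IMAGE_FS_PATH")"
if high_threshold_reached "$usage" "$HIGH_THRESHOLD"; then
    echo "Usage ${usage}% has reached ${HIGH_THRESHOLD}%: image GC is expected to run"
else
    echo "Usage ${usage}% is below ${HIGH_THRESHOLD}%"
fi
```

If the node reports usage above the threshold but no cleanup follows, that matches the situation in which FreeDiskSpaceFailed is raised.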

💡 Solution

Free up disk space or resize the volume. Look for unused Docker images and remove them. For instance, you can run Spotify's docker-gc to clean up images on the node manually.

docker run --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /etc:/etc:ro spotify/docker-gc

🔔 Event: ImageGCFailed

🔍 Reason: When disk usage crosses the HighThresholdPercent value, ImageGCManager cleans up images automatically. If that garbage collection fails, the ImageGCFailed error appears in the node events.

💡 Solution: Same as FreeDiskSpaceFailed
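Both FreeDiskSpaceFailed and ImageGCFailed call for the same image cleanup. As an alternative to docker-gc, the sketch below prints a built-in cleanup command for whichever runtime CLI is present on the node. The helper names `image_cleanup_command` and `detect_runtime_cli` are hypothetical, and the detection order is an assumption. The script only prints a suggestion and removes nothing.

```shell
#!/bin/sh
# Print an image cleanup command for the given container runtime CLI.
# The runtime name is passed in explicitly; nothing is executed here.
image_cleanup_command() {
    case "$1" in
        docker) echo "docker image prune --all --force" ;;
        crictl) echo "crictl rmi --prune" ;;
        *)      echo "unsupported runtime: $1" >&2; return 1 ;;
    esac
}

# Pick the first CLI found on this node (detection order is an assumption).
detect_runtime_cli() {
    for cli in docker crictl; do
        if command -v "$cli" >/dev/null 2>&1; then
            echo "$cli"
            return 0
        fi
    done
    return 1
}

if cli="$(detect_runtime_cli)"; then
    echo "Suggested cleanup: $(image_cleanup_command "$cli")"
else
    echo "No docker or crictl CLI found on this node"
fi
```

Review the suggested command before running it: both variants remove every image that no container currently uses.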

🔔 Event: ContainerGCFailed

🔍 Reason 1: Node is overloaded (not always reflected as disk or memory pressure). Docker does not have enough resources allocated and fails to respond in time.

💡 Solution 1

  1. Set limits for pods to prevent overloading the node

  2. Cordon and evict the pods

  3. Reboot the server
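The cordon-and-evict step above can be sketched with standard kubectl commands. The function names and the `KUBECTL` override are hypothetical conveniences added here. The `--delete-emptydir-data` flag assumes a reasonably recent kubectl; older releases used a different flag name.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Cordon a node, then evict its pods.
evacuate_node() {
    node="$1"
    # Stop new pods from being scheduled onto the node.
    "$KUBECTL" cordon "$node" || return 1
    # Evict running pods; DaemonSet pods stay, emptyDir data is discarded.
    "$KUBECTL" drain "$node" --ignore-daemonsets --delete-emptydir-data
}

# Make the node schedulable again once it is healthy.
restore_node() {
    "$KUBECTL" uncordon "$1"
}
```

For example, call `evacuate_node worker-1` before rebooting and `restore_node worker-1` afterwards, where `worker-1` is a placeholder node name.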

🔍 Reason 2: Eviction thresholds are too close to the node's physical memory limits

💡 Solution 2: Leave some buffer while setting eviction thresholds
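To make the buffer concrete, the sketch below derives a `memory.available` hard-eviction threshold as a percentage of the node's total memory. The 10% figure and the `eviction_threshold_mib` helper are illustrative assumptions, not recommendations from gopaddle or Kubernetes; choose a value that suits your workloads.

```shell
#!/bin/sh
# Assumed buffer for illustration only: percent of total memory kept free.
BUFFER_PERCENT="${BUFFER_PERCENT:-10}"

# Convert total memory (KiB) and a percentage into a threshold in MiB.
eviction_threshold_mib() {
    echo $(( $1 * $2 / 100 / 1024 ))
}

# MemTotal in /proc/meminfo is reported in kB on Linux.
if [ -r /proc/meminfo ]; then
    total_kib="$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)"
    threshold="$(eviction_threshold_mib "$total_kib" "$BUFFER_PERCENT")"
    # Print the value in the form the kubelet --eviction-hard flag expects.
    echo "--eviction-hard=memory.available<${threshold}Mi"
fi
```

The larger the threshold, the earlier the kubelet starts evicting pods, which leaves more headroom before the node itself runs out of memory.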

For more troubleshooting, see the link at the end of this page.

🔔 Event: InvalidDiskCapacity

🔍 Reason 1: invalid capacity 0 on image filesystem, and the node is in 'NotReady' status

This occurs when the kubelet cannot determine the disk capacity of the image filesystem.

💡 Solution 1: Restart containerd and kubelet daemons on the node.

systemctl restart containerd
systemctl restart kubelet

(or, on MicroK8s)

sudo systemctl restart snap.microk8s.daemon-kubelet
sudo systemctl status snap.microk8s.daemon-kubelet
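After restarting the daemons, it helps to confirm that the node has left the 'NotReady' state. The sketch below polls the node's Ready condition through kubectl. The function names, the `KUBECTL` override, and the retry defaults are assumptions made for illustration.

```shell
#!/bin/sh
# KUBECTL can be overridden to point at another kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

# Print the status of the node's Ready condition: True, False, or Unknown.
node_ready_status() {
    "$KUBECTL" get node "$1" \
        -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Poll until the node reports Ready or the attempts run out.
wait_for_ready() {
    node="$1"
    attempts="${2:-10}"
    delay="${3:-6}"
    while [ "$attempts" -gt 0 ]; do
        if [ "$(node_ready_status "$node")" = "True" ]; then
            echo "$node is Ready"
            return 0
        fi
        attempts=$(( attempts - 1 ))
        sleep "$delay"
    done
    echo "$node is still not Ready" >&2
    return 1
}
```

For example, `wait_for_ready worker-1` checks a placeholder node named `worker-1` up to ten times, six seconds apart.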

🔍 Reason 2: cgroups are not enabled on the node (edge ARM devices)

💡 Solution 2: Enable cgroups and reboot the node

# cmdline.txt must stay a single line, so append the flags to the end of line 1
sudo sed -i '1 s/$/ cgroup_enable=memory cgroup_memory=1/' /boot/firmware/cmdline.txt
sudo reboot
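Once the node is back up, you can check whether the kernel actually booted with the cgroup flags. This is a minimal sketch: `has_memory_cgroup_flags` is a hypothetical helper, and it assumes a Linux node that exposes `/proc/cmdline`.

```shell
#!/bin/sh
# Succeed when a kernel command line carries both memory cgroup flags.
has_memory_cgroup_flags() {
    case " $1 " in
        *" cgroup_enable=memory "*) ;;
        *) return 1 ;;
    esac
    case " $1 " in
        *" cgroup_memory=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the parameters the running kernel booted with.
if [ -r /proc/cmdline ]; then
    if has_memory_cgroup_flags "$(cat /proc/cmdline)"; then
        echo "memory cgroup flags are present"
    else
        echo "memory cgroup flags are missing"
    fi
fi
```

If the flags are missing after a reboot, confirm that they were added to the first line of the boot command line file rather than on a new line.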

Further troubleshooting reference: https://kubernetes.feisky.xyz/v/en/index/cluster