# Node Issues

<details>

<summary>🔔 Event: FreeDiskSpaceFailed</summary>

🔍 **Reason:** Not enough disk space on the node

By default, Kubernetes image garbage collection (GC) is triggered when disk usage on a node crosses the HighThresholdPercent value (85% in the upstream kubelet defaults; your distribution may configure a different value). The ImageGCManager deletes unused images, least recently used first, until disk usage drops to the LowThresholdPercent value. In some cases, GC is not triggered or cannot reclaim enough space. In such scenarios, the FreeDiskSpaceFailed event occurs.

💡 **Solution**

Free up disk space or resize the volume. Look for unused Docker images and remove them. For instance, you can run Spotify's docker-gc to manually clean up the images on the node.

```
docker run --rm --privileged -v /var/run/docker.sock:/var/run/docker.sock -v /etc:/etc:ro spotify/docker-gc
```
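
Before cleaning up, it can help to confirm how full the image filesystem actually is. The following is a minimal sketch, assuming images live under Docker's default data directory `/var/lib/docker` (it falls back to `/` if that directory is absent) and that 85 is the threshold you want to compare against. `usage_percent` and `over_threshold` are hypothetical helper names, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helpers for checking disk usage on a node.

# Print the usage (integer percent, no "%" sign) of the filesystem holding $1.
usage_percent() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when the filesystem holding $1 is at or above $2 percent.
over_threshold() {
  [ "$(usage_percent "$1")" -ge "$2" ]
}

# Assumption: Docker's default data directory; fall back to / if it is absent.
IMAGE_DIR=/var/lib/docker
[ -d "$IMAGE_DIR" ] || IMAGE_DIR=/
THRESHOLD=85   # assumed threshold; match it to your kubelet settings

if over_threshold "$IMAGE_DIR" "$THRESHOLD"; then
  echo "$IMAGE_DIR is at $(usage_percent "$IMAGE_DIR")%: consider 'docker image prune -a'"
else
  echo "$IMAGE_DIR is at $(usage_percent "$IMAGE_DIR")%: below ${THRESHOLD}%"
fi
```

`docker image prune -a` removes every image that is not used by a container, so review what is running before using it. On nodes that use containerd rather than Docker, `crictl rmi --prune` is the likely equivalent.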

</details>

<details>

<summary>🔔 Event: ImageGCFailed</summary>

**🔍 Reason:** When disk usage reaches the image GC high threshold (85% in the upstream kubelet defaults), the ImageGCManager cleans up unused images automatically. If that garbage collection fails, an ImageGCFailed event appears in the node events.

**💡 Solution:** Same as FreeDiskSpaceFailed
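
To confirm which node raised the event, you can filter cluster events by reason. This is a small sketch, assuming `kubectl` is configured for the cluster; `node_events` is a hypothetical wrapper name.

```shell
#!/bin/sh
# Hypothetical helper: list events with a given reason that were recorded on Node objects.
node_events() {
  kubectl get events --all-namespaces \
    --field-selector "involvedObject.kind=Node,reason=$1"
}

# Example (run where kubectl has access to the cluster):
#   node_events ImageGCFailed
#   node_events FreeDiskSpaceFailed
```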

</details>

<details>

<summary>🔔 Event: ContainerGCFailed</summary>

**🔍 Reason 1:** The node is overloaded (this is not always reflected as disk or memory pressure). Docker does not have enough resources and fails to respond in time.

**💡 Solution 1**

1. Set resource limits for pods to prevent overloading the node
2. Cordon and evict the pods
3. Reboot the server
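
The cordon, evict, and reboot steps above can be sketched with standard `kubectl` commands. The helper names `drain_node` and `restore_node` and the node name `worker-1` are placeholders for illustration.

```shell
#!/bin/sh
# Hypothetical helper: stop scheduling onto a node, then evict its pods.
drain_node() {
  kubectl cordon "$1" &&
    kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Hypothetical helper: allow scheduling again once the node is back.
restore_node() {
  kubectl uncordon "$1"
}

# Example, with "worker-1" as a placeholder node name:
#   drain_node worker-1
#   sudo reboot            # run on the node itself
#   restore_node worker-1  # after the node reports Ready
```

Note that `--delete-emptydir-data` discards data held in `emptyDir` volumes on the drained node, so omit it if your pods keep anything there that must survive.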

🔍 **Reason 2:** Eviction thresholds are too close to the node's physical memory limits

💡 **Solution 2:** Leave some buffer while setting eviction thresholds
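
One way to judge how much buffer a node currently has is to compare its memory capacity with its allocatable memory, since the kubelet derives allocatable by subtracting reserved resources and the hard eviction threshold from capacity. A minimal sketch, assuming `kubectl` access; `node_memory` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: show a node's memory capacity next to its allocatable memory.
node_memory() {
  kubectl get node "$1" \
    -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.memory,ALLOCATABLE:.status.allocatable.memory
}

# Example, with "worker-1" as a placeholder node name:
#   node_memory worker-1
```

If the two values are nearly identical, the node has very little headroom before the kubelet starts evicting pods.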

For more troubleshooting tips, see <https://kubernetes.feisky.xyz/v/en/index/cluster>

</details>

<details>

<summary>🔔 Event: InvalidDiskCapacity</summary>

**🔍 Reason 1:** Kubelet reports `invalid capacity 0 on image filesystem` and the node is in `NotReady` status

This occurs when kubelet cannot determine the disk capacity of the image filesystem.

💡 **Solution 1:** Restart containerd and kubelet daemons on the node.

```
systemctl restart containerd
systemctl restart kubelet
```

Or, on MicroK8s:

```
sudo systemctl restart snap.microk8s.daemon-kubelet
sudo systemctl status snap.microk8s.daemon-kubelet
```
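
After restarting the daemons, you may want to confirm that the node returns to `Ready`. A small sketch, assuming `kubectl` access from a machine that can reach the cluster; `wait_node_ready` is a hypothetical helper and the 120-second default timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: block until the node reports the Ready condition.
# $1 = node name, $2 = optional timeout (defaults to an assumed 120s).
wait_node_ready() {
  kubectl wait --for=condition=Ready "node/$1" --timeout="${2:-120s}"
}

# Example, with "worker-1" as a placeholder node name:
#   wait_node_ready worker-1
#   wait_node_ready worker-1 300s
```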

🔍 **Reason 2:** The memory cgroup is not enabled on the node (edge ARM devices)

💡 **Solution 2:** Enable cgroups and reboot the node

```
# cmdline.txt must stay a single line, so append to the end of that line
sudo sed -i '1 s/$/ cgroup_enable=memory cgroup_memory=1/' /boot/firmware/cmdline.txt
sudo reboot
```
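
Kernel parameters in `cmdline.txt` must all sit on one line, and appending the same parameter twice leaves duplicates. The sketch below shows one way to make the change repeatable; `add_kernel_param` is a hypothetical helper, and it assumes GNU `sed` (for `-i`) and must run as root to modify the real file.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter to the first line of a
# cmdline file, but only if that exact parameter is not already present.
add_kernel_param() {
  file=$1
  param=$2
  # Split the first line into one word per line and look for an exact match.
  if head -n 1 "$file" | tr ' ' '\n' | grep -qxF -- "$param"; then
    return 0
  fi
  # Append to the end of line 1 so the file stays a single line.
  sed -i "1 s/\$/ $param/" "$file"
}

# Example (run as root on the node, then reboot):
#   add_kernel_param /boot/firmware/cmdline.txt cgroup_enable=memory
#   add_kernel_param /boot/firmware/cmdline.txt cgroup_memory=1
```

After the reboot, `cat /proc/cmdline` should show both parameters.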

</details>


