Deployment Issues

Service stuck in pending state - Reason : 0/1 nodes are available: Too many pods.

Scenario

When an application is deployed, the services are stuck in pending state in the application view page. When a specific replica is viewed, it shows the error - 0/1 nodes are available: <nodecount> Too many pods.

Under containers, there are no containers provisioned.

Resolution

This error can occur for two reasons.

The number of replicas provisioned on a specific node exceeded the maximum limit assigned on that node.
The number of private IP addresses assigned to the replicas exceeded the maximum number of private IPs possible for node size.
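For the second reason, the limit can be estimated ahead of time. The sketch below assumes an AWS EKS node running the AWS VPC CNI in its default mode (no prefix delegation and no custom max-pods setting), where the per-node pod limit follows from the number of network interfaces (ENIs) and the IPv4 addresses each one supports. The function name and the sample numbers are illustrative; look up the actual ENI and address counts for your instance type.

```shell
# Estimate the per-node pod limit on EKS with the AWS VPC CNI in its
# default mode (assumption: no prefix delegation, no custom max-pods).
# Formula: ENIs * (IPv4 addresses per ENI - 1) + 2
max_pods() {
  enis=$1
  ips_per_eni=$2
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

# Example: an instance type with 3 ENIs and 6 IPv4 addresses per ENI
max_pods 3 6   # prints 17

# To see the limit a node actually reports, you can run (requires kubectl access):
#   kubectl get nodes -o custom-columns=NAME:.metadata.name,MAXPODS:.status.allocatable.pods
```

Other providers and CNI configurations compute the limit differently, so treat this as a rough check rather than an exact figure.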

In both scenarios, the cluster does not automatically scale and add another node. To overcome this issue, you can increase the desired node count in the cluster.

Navigate to the cluster view page, click on the node pool and edit the desired node count.
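To pick a new desired node count, a rough calculation can help. The following sketch assumes every node in the pool has the same per-node pod limit and simply rounds up; the function name and the sample numbers are hypothetical. Remember that system pods running on each node also count toward the limit, so the real requirement may be somewhat higher.

```shell
# Smallest node count that fits a given number of pods, assuming every
# node has the same per-node pod limit (ceiling division).
nodes_needed() {
  total_pods=$1
  max_pods_per_node=$2
  echo $(( (total_pods + max_pods_per_node - 1) / max_pods_per_node ))
}

# Example: 40 pods on nodes that each allow 17 pods
nodes_needed 40 17   # prints 3
```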

Below is the list of node pool fields to edit for the different types of managed clusters.

AWS EKS - Desired Capacity
Google GKE - Initial Count
Azure AKS - Desired Count

exec user process caused "exec format error"

Scenario

When the application is deployed, the service moves to pending state and the container moves to Waiting state with the reason CrashLoopBackOff. The container logs show the error standard_init_linux.go:178: exec user process caused "exec format error".

Reason 1:

This error occurs when a Go binary is run inside a Docker container but was compiled for a different operating system or CPU architecture, typically that of the machine it was built on. The Go binary must be compiled for the operating system and architecture of the Docker container in order to be executed.

Solution

If you are onboarding a Dockerfile-based container, make sure the target operating system and architecture are specified in your build command. For instance, if you build the binary on macOS but your runtime Docker container is based on Ubuntu, you can compile your binary as follows:

GOOS=linux go build -o myprogram
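If the CPU architecture also differs (for example, building on an ARM-based laptop for x86-64 nodes), GOARCH can be set alongside GOOS. The helper below is a minimal sketch that maps the output of `uname -m` to a GOARCH value; the function name is hypothetical and only the two most common machine names are covered.

```shell
# Map a `uname -m` machine name to the matching GOARCH value.
# Covers only two common cases; extend the list as needed.
goarch_for() {
  case "$1" in
    x86_64)        echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "unmapped machine: $1" >&2; return 1 ;;
  esac
}

# Run `uname -m` where the container will run, then pass the result:
#   GOOS=linux GOARCH="$(goarch_for x86_64)" go build -o myprogram
goarch_for x86_64   # prints amd64
```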

Reason 2:

This error can appear when the container start script is missing a script header such as #!/bin/bash or #!/bin/sh. The error may also occur if there is an empty line or a space before the script header.

Solution

Click on the info link for the container, and edit the start script to include the header as the first line of the start script. Save the start script.

Here is an example of a start script with the script header:

#!/bin/bash
npm start

After the start script is saved, the replica restarts automatically after a few seconds. To restart the replica immediately, click the Delete option next to the replica. This brings down the replica and creates a new replica with the updated start script in the container.
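If you keep a local copy of the start script, the header can be checked before it is pasted into the container settings. The sketch below is one possible check, with a hypothetical function name: it passes only when the very first two bytes of the file are "#!", so a leading blank line or space is caught.

```shell
# Succeeds only when the file's first two bytes are "#!", so a blank line
# or a space before the header is reported as a failure.
has_shebang() {
  [ "$(head -c 2 "$1")" = "#!" ]
}

# Demo with a throwaway script (the path comes from mktemp)
script=$(mktemp)
printf '#!/bin/bash\nnpm start\n' > "$script"
has_shebang "$script" && echo "header ok"
rm -f "$script"
```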

pod has unbound immediate PersistentVolumeClaims : node(s) had taints that the pod didn't tolerate.

Scenario

When an application with Stateful services is deployed, the Stateful Service stays stuck in pending state for a long time. The Service replica log shows the error:

Failed to provision volume with StorageClass "gp-landingpage-sc-uj2z": InvalidZone.NotFound: The zone 'us-east-1a' does not exist. pod has unbound immediate PersistentVolumeClaims 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate. no nodes available to schedule pods

This happens when the Service is scheduled on a node pool in one availability zone, whereas the Volume Provision Policy provisions the volume in a different availability zone.

Resolution

Modify the availability zone in the Volume Provision Policy to match the availability zone of the node pool.
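To confirm the mismatch, it can help to pull the zone out of the error text and compare it with the zone of the nodes. The snippet below is a small sketch with a hypothetical function name; it only parses a message of the form shown in the error above and does not contact the cluster.

```shell
# Pull the availability zone out of an "InvalidZone.NotFound" message.
zone_from_error() {
  printf '%s\n' "$1" | sed -n "s/.*The zone '\([^']*\)' does not exist.*/\1/p"
}

msg="InvalidZone.NotFound: The zone 'us-east-1a' does not exist."
zone_from_error "$msg"   # prints us-east-1a

# To list the zone of each node for comparison (requires kubectl access):
#   kubectl get nodes -L topology.kubernetes.io/zone
```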

