SearchSpot Tech Blog Series 1: Karpenter AL2023 Nodes Not Joining EKS 1.29+ (Access Entries + nodeadm Authentication Trap)
Migrating from EKS managed node groups to Karpenter on Amazon Linux 2023 (AL2023) is usually a big win: better bin-packing, faster scale-out, and lower idle cost. We did the same at SearchSpot because our workloads are bursty (campaign spikes, new feature launches), and waiting minutes for node groups to scale is a tax.
But there’s a sharp edge that can burn days: AL2023 nodes launched by Karpenter can come up healthy in EC2 and still never register as Kubernetes Nodes on EKS 1.29+, especially when you’re using EKS Access Entries for auth.
This post is the fix we wish existed when we hit it.
TL;DR
AL2023 + Karpenter requires a hybrid authentication setup.
- Keep an EKS Access Entry of type EC2_LINUX (gives system:nodes)
- Enable authentication_mode = "API_AND_CONFIG_MAP"
- Add an aws-auth ConfigMap role mapping that includes both system:bootstrappers and system:nodes
Why: with Access Entries, you cannot attach system:* groups via a STANDARD entry, and the EC2_LINUX entry alone doesn’t cover everything AL2023 bootstrap needs.
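If you're not sure which mode your cluster is currently in, a quick check (using the my-eks-cluster name from the examples later in this post):
aws eks describe-cluster --name my-eks-cluster \
  --query 'cluster.accessConfig.authenticationMode' --output text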
When this applies
You’re likely affected if all are true:
- ✅ Karpenter provisions the instances (NodeClaim created, EC2 instance Running)
- ✅ AMI family is AL2023
- ✅ EKS cluster version 1.29+ (we’ve seen it across 1.29–1.33)
- ✅ You’re relying on EKS Access Entries (not only aws-auth)
- ✅ Symptom is “nodes launch but never join” (kubectl get nodes never shows them)
If your nodes are joining fine, you don’t need this post.
What you see when it fails
Symptom 1: NodeClaim stuck “Unknown” / “Node not registered with cluster”
kubectl get nodeclaim
Example:
NAME TYPE CAPACITY ZONE NODE READY AGE
general-purpose-wkg62 t4g.medium on-demand us-east-1a Unknown 8m
- Karpenter provisions EC2
- Instance health checks pass
- But no node appears in kubectl get nodes
Symptom 2: Logs vary (don’t overfit to one signal)
Depending on timing and configuration, you might see any of these:
- kubelet/nodeadm logs imply auth/bootstrap permission issues
- CSRs may be missing or stuck (more on this below)
- EC2 console output sometimes shows a warning like:
aws ec2 get-console-output --instance-id i-xxx --region <region> --latest
cloud-init: Unhandled unknown content-type (application/node.eks.aws)
That message is commonly observed in AL2023/nodeadm contexts, but it’s not the only failure signature. Treat it as “seen in the wild,” not as sole proof.
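If you can reach the instance (for example over SSM Session Manager, assuming the node role allows it), the kubelet and nodeadm journals are usually the fastest signal. The unit names below match current AL2023 EKS AMIs but can vary across releases; i-xxx is a placeholder instance ID:
aws ssm start-session --target i-xxx
# then, on the instance:
sudo journalctl -u nodeadm-config -u nodeadm-run --no-pager | tail -n 50
sudo journalctl -u kubelet --no-pager | tail -n 50
sudo tail -n 50 /var/log/cloud-init-output.log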
Why this happens (the real trap)
AL2023 bootstraps with nodeadm, not bootstrap.sh
Amazon Linux 2023 for EKS uses nodeadm as the bootstrap mechanism. This is a meaningful change from AL2’s /etc/eks/bootstrap.sh.
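If you want to see exactly what nodeadm was handed, decode the instance's user data; the MIME part with content-type application/node.eks.aws is the NodeConfig Karpenter rendered (i-xxx is a placeholder):
aws ec2 describe-instance-attribute \
  --instance-id i-xxx \
  --attribute userData \
  --query 'UserData.Value' \
  --output text | base64 --decode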
The authentication sharp edge: Access Entries vs bootstrap groups
To join the cluster, nodes need the right Kubernetes identity groups during bootstrap and normal operation:
- system:bootstrappers: used for initial registration/bootstrapping flows
- system:nodes: used for ongoing node permissions
Now combine that with Access Entries constraints:
- An EC2_LINUX Access Entry automatically maps the role as a node identity and grants system:nodes
- A STANDARD Access Entry lets you specify custom groups… but AWS rejects system:* groups (reserved prefix)
So you end up in a catch-22:
- EC2_LINUX gives you system:nodes, but you still need system:bootstrappers for AL2023 bootstrap to complete reliably
- STANDARD can't be used to add system:bootstrappers because system:* is blocked
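You can see the STANDARD-entry restriction directly from the CLI; a call like the one below (the role ARN is illustrative) gets rejected by EKS because system:* is a reserved prefix:
aws eks create-access-entry \
  --cluster-name my-eks-cluster \
  --principal-arn arn:aws:iam::ACCOUNT:role/some-other-role \
  --kubernetes-groups system:bootstrappers
# expected: EKS rejects this with an error about the reserved system: prefix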
Managed Node Groups work because AWS wires up the node auth path automatically for you. Karpenter nodes are self-managed from the cluster’s perspective, so you have to wire up that auth path yourself.
Quick decision tree (before you change anything)
- If you’re on AL2023 and using custom userData: remove the custom userData first. AL2023 uses nodeadm; don’t force bootstrap.sh.
- If you’re on Access Entries only (authentication_mode = API): you likely need the hybrid fix below.
- If you’re on AL2 and nodes don’t join: this is usually networking/security-group/endpoint reachability, not this specific auth trap.
The fix: Hybrid authentication (Access Entry + aws-auth)
Step 0: Set cluster authentication mode to hybrid
You must run EKS auth in API_AND_CONFIG_MAP mode so both Access Entries and aws-auth are honored.
# modules/eks/main.tf
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "20.35.0"
cluster_name = "my-eks-cluster"
cluster_version = "1.33"
# Required for the hybrid fix
authentication_mode = "API_AND_CONFIG_MAP"
# ... rest of EKS config
}
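If the cluster isn’t managed by this Terraform module, the equivalent CLI call is below. Note that EKS only lets you move the authentication mode toward API, not back, so confirm before switching:
aws eks update-cluster-config \
  --name my-eks-cluster \
  --access-config authenticationMode=API_AND_CONFIG_MAP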
Step 1: Keep the EC2_LINUX Access Entry (don’t remove it)
This continues to provide the node identity mapping and system:nodes.
# modules/eks/main.tf
module "eks" {
# ...
access_entries = {
karpenter_node = {
principal_arn = aws_iam_role.karpenter_node.arn
type = "EC2_LINUX"
}
}
}
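The same mapping without Terraform, in case you manage access entries by hand:
aws eks create-access-entry \
  --cluster-name my-eks-cluster \
  --principal-arn arn:aws:iam::ACCOUNT:role/karpenter-node-role \
  --type EC2_LINUX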
Step 2: Add aws-auth role mapping with BOTH groups
This is the missing piece for AL2023 bootstrap reliability.
Important: Do not put aws-auth inside the eks module if it creates circular dependencies. Put it at the root (environment) level.
# environments/dev/main.tf
module "aws_auth" {
source = "terraform-aws-modules/eks/aws//modules/aws-auth"
version = "20.35.0"
manage_aws_auth_configmap = true
aws_auth_roles = [
{
rolearn = module.eks.karpenter_node_role_arn
username = "system:node:{{EC2PrivateDNSName}}"
groups = ["system:bootstrappers", "system:nodes"]
}
]
depends_on = [module.eks]
}
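If you’d rather not manage aws-auth from Terraform, eksctl can add the same mapping (a sketch; adjust region and account in the ARN):
eksctl create iamidentitymapping \
  --cluster my-eks-cluster \
  --region us-east-1 \
  --arn arn:aws:iam::ACCOUNT:role/karpenter-node-role \
  --username "system:node:{{EC2PrivateDNSName}}" \
  --group system:bootstrappers \
  --group system:nodes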
Step 3: Configure the Kubernetes provider (for aws-auth module)
# environments/dev/provider.tf
provider "kubernetes" {
host = module.eks.cluster_endpoint
cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
}
}
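The exec block shells out to aws eks get-token, so it’s worth confirming that command succeeds from the machine running terraform apply before blaming the provider:
aws eks get-token --cluster-name my-eks-cluster | head -c 300; echo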
Karpenter correctness checklist (common “not joining” causes)
Even after fixing auth, these can still block registration. We hit multiple during migration, so here’s the tightened list.
1) Don’t use custom userData with AL2023
AL2023 uses nodeadm. Let Karpenter generate the NodeConfig automatically.
# ✅ Correct: omit userData for AL2023
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
name: general-purpose-nodeclass
spec:
amiFamily: AL2023
role: karpenter-node-role
amiSelectorTerms:
- alias: al2023@latest
subnetSelectorTerms:
- tags:
karpenter.sh/discovery: "my-eks-cluster"
securityGroupSelectorTerms:
- tags:
karpenter.sh/discovery: "my-eks-cluster"
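After applying the EC2NodeClass, check that its status actually resolved AMIs, subnets, and security groups; empty status fields almost always point at a tag or role problem rather than the auth trap above (field names as of recent Karpenter versions):
kubectl apply -f ec2nodeclass.yaml
kubectl get ec2nodeclass general-purpose-nodeclass -o yaml
# look for populated status.amis, status.subnets, status.securityGroups, status.instanceProfile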
2) spec.role is IAM role name, not instance profile
# ✅ Correct
spec:
role: karpenter-node-role
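A quick way to confirm you’re passing a role name (and that it exists), rather than an instance profile name:
aws iam get-role --role-name karpenter-node-role --query 'Role.Arn' --output text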
3) Subnet tagging: tag only the subnets you actually want Karpenter to use
If you tag public subnets unintentionally, Karpenter can schedule nodes there. Whether that breaks depends on your endpoint mode (public/private), NAT, routing, and egress controls. In many real setups it results in “instance up, node never registers.”
# ✅ Recommended: tag private subnets for Karpenter discovery
private_subnet_tags = {
"karpenter.sh/discovery" = var.cluster_name
}
# Keep public subnet tags for ELB only
public_subnet_tags = {
"kubernetes.io/role/elb" = 1
}
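To audit which subnets actually carry the discovery tag (MapPublicIpOnLaunch is a rough heuristic for “public”):
aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=my-eks-cluster" \
  --query 'Subnets[].{Id:SubnetId,AZ:AvailabilityZone,PublicIPOnLaunch:MapPublicIpOnLaunch}' \
  --output table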
IRSA + IAM: keep Karpenter able to create instance profiles
Step 4: Verify the controller ServiceAccount name (version drift)
Karpenter Helm chart versions have changed the default controller ServiceAccount naming over time. Make sure your IRSA trust policy matches the ServiceAccount that is actually deployed.
module "karpenter_irsa" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "5.48.0"
role_name = "karpenter-controller"
attach_karpenter_controller_policy = true
karpenter_controller_cluster_name = module.eks.cluster_name
karpenter_controller_node_iam_role_arns = [aws_iam_role.karpenter_node.arn]
karpenter_sqs_queue_arn = aws_sqs_queue.karpenter_interruption.arn
oidc_providers = {
ex = {
provider_arn = module.eks.oidc_provider_arn
namespace_service_accounts = ["karpenter:karpenter-controller"]
}
}
}
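To confirm what’s actually deployed matches the trust policy above, check the ServiceAccount the controller runs as (this assumes the default karpenter namespace and a Deployment named karpenter):
kubectl get serviceaccount -n karpenter
kubectl get deployment karpenter -n karpenter \
  -o jsonpath='{.spec.template.spec.serviceAccountName}'; echo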
Step 5: Ensure Karpenter has the IAM permissions to manage instance profiles (including PassRole)
This is required for Karpenter to attach the node IAM role to instance profiles.
resource "aws_iam_policy" "karpenter_instance_profile_policy" {
name = "karpenter-instance-profile-policy"
description = "Allow Karpenter to manage instance profiles"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"iam:GetInstanceProfile",
"iam:CreateInstanceProfile",
"iam:DeleteInstanceProfile",
"iam:AddRoleToInstanceProfile",
"iam:RemoveRoleFromInstanceProfile",
"iam:TagInstanceProfile",
"iam:PassRole"
]
Resource = "*"
}
]
})
}
resource "aws_iam_role_policy_attachment" "karpenter_controller_instance_profile_policy" {
policy_arn = aws_iam_policy.karpenter_instance_profile_policy.arn
role = module.karpenter_irsa.iam_role_name
}
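Sanity-check that the policy ended up attached to the controller role:
aws iam list-attached-role-policies --role-name karpenter-controller --output table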
Verification (copy/paste)
1) aws-auth ConfigMap contains both groups
kubectl get configmap aws-auth -n kube-system -o yaml
Expected snippet:
data:
mapRoles: |
- groups:
- system:bootstrappers
- system:nodes
rolearn: arn:aws:iam::ACCOUNT:role/karpenter-node-role
username: system:node:{{EC2PrivateDNSName}}
2) Access Entry exists (EC2_LINUX)
aws eks describe-access-entry \
--cluster-name my-eks-cluster \
--principal-arn arn:aws:iam::ACCOUNT:role/karpenter-node-role
Expected:
{
"accessEntry": {
"type": "EC2_LINUX",
"kubernetesGroups": ["system:nodes"],
"username": "system:node:{{EC2PrivateDNSName}}"
}
}
3) Launch a pod to trigger provisioning
kubectl run test --image=nginx --requests=cpu=100m,memory=128Mi
kubectl get nodeclaim -w
4) Optional: check CSRs (helpful bootstrap signal)
kubectl get csr
kubectl describe csr <name>
5) Node appears
kubectl get nodes
You should see a Ready node within a couple minutes.
Debugging commands (when it still doesn’t join)
# NodeClaim details
kubectl describe nodeclaim <name>
# Karpenter controller logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter --tail=200
# EC2 console output (often useful on AL2023)
aws ec2 get-console-output --instance-id i-xxx --region <region> --latest
# Verify the instance profile attached
aws ec2 describe-instances --instance-ids i-xxx \
--query 'Reservations[0].Instances[0].IamInstanceProfile.Arn'
Key takeaways
- AL2023 + Karpenter + Access Entries can require hybrid auth: use API_AND_CONFIG_MAP, keep EC2_LINUX, and add an aws-auth mapping with system:bootstrappers + system:nodes.
- Don’t specify custom userData on AL2023: let Karpenter generate the nodeadm config.
- Be intentional with subnet discovery tags: tag only the subnets you want Karpenter to use.
- IRSA + IAM permissions matter: make sure the controller SA name is correct and Karpenter can manage instance profiles (including iam:PassRole).
References
- Karpenter docs
- EKS Access Entries
- Amazon Linux 2023 EKS-optimized AMI docs
- nodeadm docs
- Related GitHub issues: AL2023 node registration, cloud-init content-type warnings