AWS EKS Cheatsheet

Getting started

Introduction

Amazon Elastic Kubernetes Service (EKS) is a fully managed Kubernetes service by AWS. AWS manages the control plane (API server, etcd, scheduler, controller manager) while you manage worker nodes.

What is EKS

Managed Control Plane: AWS handles patching, scaling, and high availability
Two Modes: Standard EKS (AWS manages the control plane) and EKS Auto Mode (AWS manages the control plane and the data plane)
AWS Integration: Works with IAM, VPC, EBS, EFS, ALB, NLB
Multi-AZ: API server runs across multiple availability zones

Architecture Overview

Control Plane (AWS-Managed):
  - API Server (Multi-AZ)
  - etcd
  - Scheduler
  - Controller Manager

Data Plane (Compute Options):
  - EC2 Managed Nodes
  - EC2 Self-Managed Nodes
  - Fargate (Serverless)

Quick Start

# Create cluster
aws eks create-cluster \
  --name my-cluster \
  --role-arn arn:aws:iam::123456789012:role/eksServiceRole \
  --resources-vpc-config subnetIds=subnet-xxx,subnet-yyy

# Update kubeconfig
aws eks update-kubeconfig --name my-cluster

# Verify connection
kubectl get svc

VPC Requirements

Requirement	Value
Minimum Subnets	2 (in different AZs)
IPs per Subnet	6+ available (16 recommended)
Upgrade IPs	Up to 5 for cluster upgrades
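
A minimal sketch for checking free IPs against the numbers above. The helper name `low_ip_subnets` is hypothetical and it only filters text; the AWS CLI call that could feed it is shown as a comment.

```shell
# Print subnets whose free IP count is below a threshold.
# Input (stdin): one "SUBNET_ID COUNT" pair per line, for example from
#   aws ec2 describe-subnets --subnet-ids subnet-xxx subnet-yyy \
#     --query 'Subnets[].[SubnetId,AvailableIpAddressCount]' --output text
# Argument: minimum free IPs wanted (defaults to 16, the recommended value).
low_ip_subnets() {
  awk -v min="${1:-16}" '($2 + 0) < (min + 0) { print $1 " has only " $2 " free IPs" }'
}
```

Passing 5 as the argument checks the upgrade headroom instead of the recommended value.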

Versions

Support Types

Type	Duration	Cost	Details
Standard	14 months	Included	After version release
Extended	+12 months	Extra cost	Total 26 months

Currently Available

Standard Support:

1.34
1.33
1.32

Extended Support:

1.31
1.30
1.29

Upgrade Rules

# Can only upgrade one minor version at a time
1.28 → 1.29 ✓
1.28 → 1.30 ✗ (must go 1.28 → 1.29 → 1.30)

# Auto-upgrade when extended support ends
# (control plane only, not worker nodes)
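
The one-minor-version rule can be sketched as a small helper that lists each step. `upgrade_path` is a hypothetical function, not an AWS CLI feature, and it assumes plain MAJOR.MINOR strings with major version 1.

```shell
# Print every version to pass through when upgrading, one minor release per step.
# Assumes plain MAJOR.MINOR strings with major version 1 (for example 1.28).
upgrade_path() {
  up_from=${1#*.}
  up_to=${2#*.}
  if [ "$up_to" -le "$up_from" ]; then
    echo "target version must be newer than the current version" >&2
    return 1
  fi
  up_minor=$((up_from + 1))
  while [ "$up_minor" -le "$up_to" ]; do
    echo "1.$up_minor"
    up_minor=$((up_minor + 1))
  done
}

# Example use (commented out because it changes a real cluster):
# for version in $(upgrade_path 1.28 1.30); do
#   aws eks update-cluster-version --name my-cluster --kubernetes-version "$version"
#   aws eks wait cluster-active --name my-cluster
# done
```

In practice, nodes and add-ons are usually brought up to date between steps rather than after the last one.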

Version Skew

Control Plane Version	Max kubelet Lag
1.28+	3 minor versions behind
Before 1.28	2 minor versions behind
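
A rough sketch of the skew table as a check. `kubelet_skew_ok` is a hypothetical helper; it assumes both arguments are plain MAJOR.MINOR strings (control plane first, kubelet second) and treats a kubelet newer than the control plane as invalid.

```shell
# Succeed (exit 0) when the kubelet version is within the allowed lag
# behind the control plane version, fail (exit 1) otherwise.
kubelet_skew_ok() {
  skew_cp=${1#*.}
  skew_kubelet=${2#*.}
  if [ "$skew_cp" -ge 28 ]; then
    skew_max=3   # 1.28+: up to 3 minor versions behind
  else
    skew_max=2   # before 1.28: up to 2 minor versions behind
  fi
  skew_lag=$((skew_cp - skew_kubelet))
  [ "$skew_lag" -ge 0 ] && [ "$skew_lag" -le "$skew_max" ]
}
```

For example, `kubelet_skew_ok 1.30 1.27` succeeds, while `kubelet_skew_ok 1.30 1.26` fails.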

Upgrade Process

New API server nodes launched
Health checks performed
Old nodes replaced
Rolling update (cannot pause/stop)
Requires up to 5 available IPs in subnets

# Update cluster version
aws eks update-cluster-version \
  --name my-cluster \
  --kubernetes-version 1.30

Upgrade Insights

Automatically scans for deprecated API usage
Identifies upgrade blockers
Refreshes every 24 hours
Deprecated API usage within the last 30 days is reported as an upgrade blocker
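
Insights can also be read from the CLI. The sketch below assumes the output of `aws eks list-insights` has been flattened to tab-separated "STATUS, NAME" lines (the query is shown as a comment); `upgrade_blockers` is a hypothetical filter name.

```shell
# Print every insight that is not PASSING and fail if any were found.
# Input (stdin): tab-separated "STATUS<TAB>NAME" lines, for example from
#   aws eks list-insights --cluster-name my-cluster \
#     --query 'insights[].[insightStatus.status,name]' --output text
upgrade_blockers() {
  awk -F '\t' '
    $1 != "PASSING" { print $1 ": " $2; found = 1 }
    END { exit found ? 1 : 0 }
  '
}
```

Because the function exits non-zero when something is flagged, it could gate an upgrade script with a plain `if`.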

Node Types

Managed Node Groups (EC2)

# Create managed node group
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodes \
  --node-role arn:aws:iam::123456789012:role/NodeRole \
  --subnets subnet-xxx subnet-yyy \
  --instance-types t3.medium

Features:

AWS automates provisioning and lifecycle
Part of EC2 Auto Scaling group
Labeled with eks.amazonaws.com/capacityType

Allocation Strategies:

On-Demand: prioritized
Spot: price-capacity-optimized (K8s 1.28+) or capacity-optimized (1.27 and earlier)

Self-Managed Nodes (EC2)

# Manual management required
# More control over configuration
# Requires manual updates

Use Cases:

Custom AMIs
Specific instance configurations
Advanced networking requirements

Fargate (Serverless)

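# Fargate profile (eksctl ClusterConfig; Fargate profiles are EKS resources, not Kubernetes objects)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
fargateProfiles:
  - name: my-profile
    selectors:
      - namespace: default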

Features:

On-demand, right-sized compute
Dedicated VM boundary per Pod
No shared kernel, CPU, memory, or ENI

Limitations:

❌ No HostPort/HostNetwork
❌ No DaemonSets
❌ No GPUs
❌ No Spot instances
✓ Requires private subnets (NAT gateway or VPC endpoints for outbound access)

Comparison Table

Feature	Managed Nodes	Self-Managed	Fargate
Management	AWS automated	Manual	Fully serverless
Cost	EC2 pricing	EC2 pricing	Per Pod pricing
Control	Medium	High	Low
DaemonSets	✓	✓	✗
GPUs	✓	✓	✗
Spot	✓	✓	✗
HostNetwork	✓	✓	✗

Spot Capacity Rebalancing

# Enabled by default for Spot nodes
# 2-minute interruption notice
# Recommend 30s or less termination grace periods

spec:
  terminationGracePeriodSeconds: 30

Warning: Pods may be forcibly terminated during concurrent reclamations.

IAM & Authentication

aws-iam-authenticator

# Uses IAM for cluster authentication
# Integrates with OpenID Connect (OIDC)

# Check authentication
aws sts get-caller-identity

OIDC Provider

# Hosts public OIDC discovery endpoint per cluster
# Contains signing keys for service account tokens

# Private keys rotate every 7 days
# Public keys kept until expiry

IRSA (IAM Roles for Service Accounts)

# Service Account with IAM role
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-sa
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/MyRole

Injected Environment Variables:

AWS_ROLE_ARN
AWS_WEB_IDENTITY_TOKEN_FILE

Benefits:

Least privilege
Credential isolation
Auditability

Create IAM Role for IRSA

# Create OIDC provider
eksctl utils associate-iam-oidc-provider \
  --cluster my-cluster \
  --approve

# Create IAM role
eksctl create iamserviceaccount \
  --name my-sa \
  --namespace default \
  --cluster my-cluster \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
  --approve
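
When the role is created by hand instead of with eksctl, it needs a trust policy tied to the cluster's OIDC provider. A minimal sketch that prints one; `irsa_trust_policy` is a hypothetical helper and the OIDC provider ID in the example is a placeholder.

```shell
# Print an IAM trust policy that lets one service account assume a role.
# Arguments: ACCOUNT_ID OIDC_PROVIDER NAMESPACE SERVICE_ACCOUNT
# OIDC_PROVIDER is the cluster issuer URL without the https:// prefix, from
#   aws eks describe-cluster --name my-cluster \
#     --query 'cluster.identity.oidc.issuer' --output text
irsa_trust_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::$1:oidc-provider/$2"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "$2:sub": "system:serviceaccount:$3:$4",
          "$2:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF
}

# Example (placeholder provider ID):
# irsa_trust_policy 123456789012 oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE default my-sa
```

The `sub` condition is what restricts the role to a single namespace and service account.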

aws-auth ConfigMap (Legacy)

# Maps IAM roles/users to Kubernetes RBAC groups
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::123456789012:role/NodeRole
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes

Note: Deprecated in favor of access entries.

Access Entries (New Method)

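# Replaces aws-auth ConfigMap
# Requires minimum platform version and authentication mode API or API_AND_CONFIG_MAP

aws eks create-access-entry \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::123456789012:role/MyRole

# Grant permissions by associating an access policy
aws eks associate-access-policy \
  --cluster-name my-cluster \
  --principal-arn arn:aws:iam::123456789012:role/MyRole \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy \
  --access-scope type=cluster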

Required Roles

eks:node-manager:

ClusterRole and ClusterRoleBinding
Required for managed node groups
Missing/broken causes AccessDenied errors

# Verify role exists
kubectl get clusterrole eks:node-manager
kubectl get clusterrolebinding eks:node-manager

Networking

VPC CNI Plugin

# Check VPC CNI
kubectl get pods -n kube-system -l k8s-app=aws-node

# View CNI configuration
kubectl get daemonset -n kube-system aws-node -o yaml

Features:

Manages Pod networking
Allocates VPC IP addresses directly to Pods
Uses secondary IPs from ENI or prefix delegation
Requires IAM permissions (AmazonEKS_CNI_Policy)
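
Because Pods take VPC addresses from the node's ENIs, the instance type bounds how many Pods fit on a node. A sketch of the commonly used formula; `max_pods` is a hypothetical helper, and the ENI count and IPs per ENI must be looked up for the instance type (t3.medium has 3 ENIs with 6 IPv4 addresses each).

```shell
# Estimate the Pod limit for a node from its ENI limits.
# Arguments: ENI_COUNT IPS_PER_ENI [prefix]
# Secondary-IP mode:  ENIs * (IPs per ENI - 1) + 2
# Prefix delegation:  ENIs * (IPs per ENI - 1) * 16 + 2  (each slot holds a /28)
max_pods() {
  if [ "${3:-}" = "prefix" ]; then
    echo $(($1 * ($2 - 1) * 16 + 2))
  else
    echo $(($1 * ($2 - 1) + 2))
  fi
}

max_pods 3 6          # t3.medium, secondary IPs
max_pods 3 6 prefix   # t3.medium, prefix delegation (uncapped figure)
```

The prefix delegation figure is a theoretical ceiling; the kubelet max-pods setting is normally capped well below it on small instances.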

Security Groups for Pods

# Assign different VPC security groups to Pods
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: my-sg-policy
spec:
  podSelector:
    matchLabels:
      app: my-app
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0

Availability:

✓ Fargate
✓ EC2 nodes (most Nitro-based instance types)

Application Load Balancer (ALB)

# Ingress with ALB
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80

Traffic Modes:

Instance: NodePort proxy
IP: Direct to Pod (required for Fargate)

Network Load Balancer (NLB)

# Service with NLB
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080

Features:

Layer 4 load balancing
Provisioned via Service type LoadBalancer

Subnet Tagging

Required for load balancer discovery:

Subnet Type	Tag	Value
Private	`kubernetes.io/role/internal-elb`	`1`
Public	`kubernetes.io/role/elb`	`1`

# Tag private subnet
aws ec2 create-tags \
  --resources subnet-xxx \
  --tags Key=kubernetes.io/role/internal-elb,Value=1

IPv6 Support

# Enable IPv6 for ALB
metadata:
  annotations:
    alb.ingress.kubernetes.io/ip-address-type: dualstack
    alb.ingress.kubernetes.io/target-type: ip # Required

Note: Only works with IP target type.

Add-ons

Default Add-ons (Self-managed)

# VPC CNI
kubectl get daemonset -n kube-system aws-node

# kube-proxy
kubectl get daemonset -n kube-system kube-proxy

# CoreDNS
kubectl get deployment -n kube-system coredns

EKS-Managed Add-ons

# List available add-ons
aws eks describe-addon-versions

# Install add-on
aws eks create-addon \
  --cluster-name my-cluster \
  --addon-name vpc-cni \
  --addon-version v1.18.0-eksbuild.1

# Update add-on
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name vpc-cni \
  --addon-version v1.18.1-eksbuild.1

Add-on Types

Type	Description	Examples
AWS	AWS-curated, latest security patches	VPC CNI, kube-proxy, CoreDNS
AWS Marketplace	Third-party verified	Datadog, New Relic
Community	Open-source, AWS-validated	Metrics Server, Cluster Autoscaler

AWS Load Balancer Controller

# Install via Helm
helm repo add eks https://aws.github.io/eks-charts
helm install aws-load-balancer-controller \
  eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=my-cluster

Required for:

ALB Ingress
NLB with IP targets
Fargate load balancing

Storage

EBS CSI Driver

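# Create IAM role for CSI driver (role only; the add-on creates the service account)
# AmazonEKS_EBS_CSI_DriverRole is an example role name
eksctl create iamserviceaccount \
  --name ebs-csi-controller-sa \
  --namespace kube-system \
  --cluster my-cluster \
  --role-name AmazonEKS_EBS_CSI_DriverRole \
  --role-only \
  --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --approve

# Install EBS CSI driver add-on with that role
aws eks create-addon \
  --cluster-name my-cluster \
  --addon-name aws-ebs-csi-driver \
  --service-account-role-arn arn:aws:iam::123456789012:role/AmazonEKS_EBS_CSI_DriverRole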

Compatibility:

✓ EC2 nodes
✗ Fargate
✗ Hybrid Nodes

Note: Node DaemonSet only runs on EC2, controller can run on Fargate.

StorageClass with EBS

# EBS StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012
volumeBindingMode: WaitForFirstConsumer

KMS Encryption for EBS

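{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kms:CreateGrant",
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:ReEncrypt*",
        "kms:GenerateDataKey*",
        "kms:DescribeKey"
      ],
      "Resource": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
    }
  ]
}

Required IAM Permissions:

kms:CreateGrant
kms:Encrypt
kms:Decrypt
kms:ReEncrypt*
kms:GenerateDataKey*
kms:DescribeKey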

EFS Support

# EFS PersistentVolume (Static Provisioning)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345678

Fargate Support:

✓ Automatic EFS mount (no driver installation)
✓ Static provisioning only
✗ Dynamic provisioning

FSx for Lustre

# Supported via CSI driver
# EC2 nodes only
# Not available on Fargate

Storage Comparison

Storage Type	Fargate	EC2	Dynamic Provisioning	Use Case
EBS	✗	✓	✓	Block storage, single AZ
EFS	✓ (static)	✓	✓	Shared file storage, multi-AZ
FSx Lustre	✗	✓	✓	High-performance computing

Troubleshooting

Node Join Failures

aws-auth ConfigMap Issues:

# Check aws-auth ConfigMap
kubectl get configmap aws-auth -n kube-system -o yaml

# Common issues:
# - Missing or incorrect aws-auth entries
# - ARN cannot include path other than /

Fix:

# Correct format
mapRoles: |
  - rolearn: arn:aws:iam::123456789012:role/NodeRole  # No path
    username: system:node:{{EC2PrivateDNSName}}
    groups:
      - system:bootstrappers
      - system:nodes
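
If the node role was created with a path, the path has to be dropped before the ARN goes into aws-auth. A small sketch of that rewrite; `strip_role_path` is a hypothetical helper that works on the ARN string only.

```shell
# Remove any path from an IAM role ARN, keeping only the role name.
#   arn:aws:iam::123456789012:role/team/eng/NodeRole
#   becomes arn:aws:iam::123456789012:role/NodeRole
strip_role_path() {
  arn_prefix=${1%%:role/*}   # everything before ":role/"
  arn_name=${1##*/}          # text after the last "/"
  echo "${arn_prefix}:role/${arn_name}"
}
```

An ARN that already has no path passes through unchanged.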

Node Tags:

# Nodes must have cluster tag
aws ec2 create-tags \
  --resources i-1234567890abcdef0 \
  --tags Key=kubernetes.io/cluster/my-cluster,Value=owned

Public IP Issues:

# Public subnet nodes need public IP or Elastic IP
# Private subnets need NAT gateway route

# Check route table
aws ec2 describe-route-tables --route-table-id rtb-xxx

DNS Issues:

# Node needs private DNS entry
# VPC requires DHCP options set

# Check DHCP options
aws ec2 describe-dhcp-options --dhcp-options-id dopt-xxx

# Required:
# - domain-name
# - domain-name-servers

NodeCreationFailure

# 15-minute timeout
# Windows AMIs may need fast launch enabled

# Check node group events
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodes \
  --query 'nodegroup.health'

InsufficientFreeAddresses

# Not enough available IPs in subnet
# Need to free IPs or change subnets

# Check available IPs
aws ec2 describe-subnets --subnet-ids subnet-xxx \
  --query 'Subnets[0].AvailableIpAddressCount'

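# Solution: subnets of a managed node group cannot be changed after creation.
# Create a new node group in subnets with free IPs, then migrate workloads.
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodes-v2 \
  --node-role arn:aws:iam::123456789012:role/NodeRole \
  --subnets subnet-new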

AccessDenied Error

# Missing eks:node-manager ClusterRole or ClusterRoleBinding

# Verify role exists
kubectl get clusterrole eks:node-manager
kubectl get clusterrolebinding eks:node-manager

# Recreate the ClusterRole and ClusterRoleBinding if missing.
# Note: the aws-auth-cm.yaml template only defines the aws-auth ConfigMap;
# applying it does not restore eks:node-manager.

Container Runtime Network Not Ready

# Missing or incorrect aws-auth/access entry for node IAM role

# Check VPC CNI (aws-node) logs
kubectl logs -n kube-system -l k8s-app=aws-node

# Verify IAM role in aws-auth
kubectl get configmap aws-auth -n kube-system -o yaml

# Check node authorization
kubectl describe node NODE_NAME | grep -i auth

TLS Handshake Timeout

# Node cannot reach public API endpoint
# Check route table & security groups

# Test from node
curl -k https://API_SERVER_ENDPOINT

# Check security group rules
aws ec2 describe-security-groups --group-ids sg-xxx

# Required: Allow HTTPS (443) from node security group

HTTP 401 Unauthorized

# Stale service account tokens (>90 days old)
# Kubernetes client SDK must be recent version

# Recreate service account
kubectl delete sa SERVICE_ACCOUNT_NAME -n NAMESPACE
kubectl create sa SERVICE_ACCOUNT_NAME -n NAMESPACE

# Update client SDK
# Use latest aws-sdk, kubectl, or client library

Too Many Requests

# Launching many nodes causes describeCluster throttling

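# Solution: Pass cluster details to the bootstrap script (Amazon Linux 2 AMIs)
# so each node skips the describeCluster call
/etc/eks/bootstrap.sh my-cluster \
  --apiserver-endpoint ENDPOINT \
  --b64-cluster-ca CERTIFICATE_AUTHORITY \
  --dns-cluster-ip DNS_CLUSTER_IP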

# Reduces API calls during node bootstrap

EKS Log Collector

# Pre-built script on nodes
/etc/eks/log-collector-script/eks-log-collector.sh

# Run on node
sudo /etc/eks/log-collector-script/eks-log-collector.sh

# Collects:
# - kubelet logs
# - Container runtime logs
# - VPC CNI logs
# - System information

Cluster Health Issues

Detection:

# EKS detects infrastructure/configuration issues
# Stores in health object
# Can take up to 3 hours to detect

aws eks describe-cluster --name my-cluster \
  --query 'cluster.health'

Notifications:

AWS sends email
AWS Health Dashboard notification

Recoverable Errors

Error	Cause	Recovery
SUBNET_NOT_FOUND	Cluster subnet deleted	`update-cluster-config`
SECURITY_GROUP_NOT_FOUND	Cluster SG deleted	`update-cluster-config`
KMS_KEY_DISABLED	KMS key disabled	Re-enable key

# Recover from SUBNET_NOT_FOUND
aws eks update-cluster-config \
  --name my-cluster \
  --resources-vpc-config subnetIds=subnet-new1,subnet-new2

Non-Recoverable Errors

Error	Cause	Action
VPC_NOT_FOUND	Cluster VPC deleted	Delete and recreate cluster
KMS_KEY_NOT_FOUND	KMS key deleted	Delete and recreate cluster

Warning: These errors require cluster recreation. Backup data before deleting.

Commands

AWS CLI - Cluster Operations

# Describe cluster
aws eks describe-cluster --name my-cluster

# List clusters
aws eks list-clusters

# Get cluster versions
aws eks describe-cluster-versions

# Update cluster version
aws eks update-cluster-version \
  --name my-cluster \
  --kubernetes-version 1.30

# Update cluster config
aws eks update-cluster-config \
  --name my-cluster \
  --resources-vpc-config subnetIds=subnet-xxx,subnet-yyy

# Delete cluster
aws eks delete-cluster --name my-cluster

AWS CLI - Node Groups

# List node groups
aws eks list-nodegroups --cluster-name my-cluster

# Describe node group
aws eks describe-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodes

# Create node group
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodes \
  --node-role arn:aws:iam::123456789012:role/NodeRole \
  --subnets subnet-xxx subnet-yyy

# Update node group config
aws eks update-nodegroup-config \
  --cluster-name my-cluster \
  --nodegroup-name my-nodes \
  --scaling-config minSize=2,maxSize=10,desiredSize=4

# Delete node group
aws eks delete-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name my-nodes

AWS CLI - Add-ons

# List add-ons
aws eks list-addons --cluster-name my-cluster

# Describe add-on versions
aws eks describe-addon-versions --addon-name vpc-cni

# Create add-on
aws eks create-addon \
  --cluster-name my-cluster \
  --addon-name vpc-cni \
  --addon-version v1.18.0-eksbuild.1

# Update add-on
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name vpc-cni \
  --addon-version v1.18.1-eksbuild.1

# Delete add-on
aws eks delete-addon \
  --cluster-name my-cluster \
  --addon-name vpc-cni

kubectl - Node Management

# View nodes
kubectl get nodes

# Check node labels
kubectl get nodes --show-labels

# Check node capacity type
kubectl get nodes -L eks.amazonaws.com/capacityType

# Describe node
kubectl describe node NODE_NAME

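# Check kubelet version per node
kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion

# Check client and API server versions
kubectl version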

# Drain node
kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data

# Cordon node
kubectl cordon NODE_NAME

# Uncordon node
kubectl uncordon NODE_NAME

kubectl - Add-ons

# Check VPC CNI
kubectl get pods -n kube-system -l k8s-app=aws-node

# Check CoreDNS
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Check kube-proxy
kubectl get pods -n kube-system -l k8s-app=kube-proxy

# Check EBS CSI driver
kubectl get pods -n kube-system \
  -l app.kubernetes.io/name=aws-ebs-csi-driver

# Check AWS Load Balancer Controller
kubectl get pods -n kube-system \
  -l app.kubernetes.io/name=aws-load-balancer-controller

kubectl - IRSA & IAM

# Check service account
kubectl get sa SERVICE_ACCOUNT_NAME -n NAMESPACE -o yaml

# View annotations (should show eks.amazonaws.com/role-arn)
kubectl describe sa SERVICE_ACCOUNT_NAME -n NAMESPACE

# Check aws-auth ConfigMap (legacy)
kubectl get configmap aws-auth -n kube-system -o yaml

# Edit aws-auth ConfigMap
kubectl edit configmap aws-auth -n kube-system

# Create service account with IRSA
kubectl create sa my-sa -n default
kubectl annotate sa my-sa -n default \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/MyRole

kubectl - Troubleshooting

# View events (sorted)
kubectl get events --sort-by='.lastTimestamp' -A

# Check pod on specific node
kubectl get pods -A -o wide --field-selector spec.nodeName=NODE_NAME

# Describe failing pod
kubectl describe pod POD_NAME -n NAMESPACE

# Check logs
kubectl logs POD_NAME -n NAMESPACE

# Check previous logs (crashed pod)
kubectl logs POD_NAME -n NAMESPACE --previous

# Execute into pod
kubectl exec -it POD_NAME -n NAMESPACE -- /bin/sh

# Check resource usage
kubectl top nodes
kubectl top pods -A

eksctl - Quick Operations

# Create cluster
eksctl create cluster \
  --name my-cluster \
  --region us-east-1 \
  --nodegroup-name my-nodes \
  --nodes 3

# Create IRSA
eksctl create iamserviceaccount \
  --name my-sa \
  --namespace default \
  --cluster my-cluster \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
  --approve

# Associate OIDC provider
eksctl utils associate-iam-oidc-provider \
  --cluster my-cluster \
  --approve

# Delete cluster
eksctl delete cluster --name my-cluster

Security

Pod Security Standards

# Replace deprecated PodSecurityPolicies
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Levels:

Privileged: Unrestricted
Baseline: Minimally restrictive
Restricted: Heavily restricted

Secrets Encryption

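# Enable KMS encryption at cluster creation
aws eks create-cluster \
  --name my-cluster \
  --encryption-config '[{"resources":["secrets"],"provider":{"keyArn":"arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"}}]' \
  # ... other parameters

# Enable on an existing cluster (cannot be disabled afterwards)
aws eks associate-encryption-config \
  --cluster-name my-cluster \
  --encryption-config '[{"resources":["secrets"],"provider":{"keyArn":"arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"}}]'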

Encrypts:

Kubernetes secrets at rest
etcd data

Private Clusters

# API server endpoint only accessible from within VPC
aws eks create-cluster \
  --name my-cluster \
  --resources-vpc-config endpointPrivateAccess=true,endpointPublicAccess=false \
  # ... other parameters

# Requires VPN or Direct Connect for external access

Options:

Public Only: Default, accessible from internet
Public + Private: Both access methods
Private Only: VPC-only access (requires VPN/Direct Connect)

IMDS Restriction

# Block Pod access to instance metadata service
# Prevents credential access from Pods

# Require IMDSv2 with hop limit 1 on a node instance
# (set the same options in the node launch template to apply them at launch)
aws ec2 modify-instance-metadata-options \
  --instance-id i-1234567890abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 1

Recommended:

Use IRSA instead of instance roles
Block IMDS v1
Require IMDSv2 with hop limit 1

Network Policies

# Restrict Pod-to-Pod communication
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

Requires:

VPC CNI v1.14+ with network policy support enabled (enableNetworkPolicy), or
Calico or another network policy engine

Audit Logging

# Enable control plane logging
aws eks update-cluster-config \
  --name my-cluster \
  --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'

# View logs in CloudWatch
# Log group: /aws/eks/my-cluster/cluster

Log Types:

api: API server logs
audit: Audit logs
authenticator: Authenticator logs
controllerManager: Controller manager logs
scheduler: Scheduler logs

Best Practices

Cluster Design

Multi-AZ: Deploy across 3+ availability zones
Separate Subnets: Use different subnets for public/private workloads
Control Plane Logging: Enable all log types
Managed Nodes: Use managed node groups for easier lifecycle
Version Currency: Keep clusters within standard support window

Networking

# Use VPC CNI prefix delegation for more IPs per node
# (set as environment variables on the aws-node DaemonSet)
kubectl set env daemonset aws-node -n kube-system \
  ENABLE_PREFIX_DELEGATION=true \
  WARM_PREFIX_TARGET=1

Best Practices:

Tag subnets for load balancer discovery
Use security groups for Pods for fine-grained control
Implement network policies (VPC CNI network policy support or Calico)
Use private subnets for worker nodes
Use public subnets only for load balancers

Security

IAM & Authentication:

Enable secrets encryption with KMS
Use IRSA for Pod IAM permissions (avoid node IAM roles)
Restrict IMDS access from Pods (use IMDSv2)
Use private API endpoint when possible
Enable audit logging

Pod Security:

Apply Pod Security Standards
Run containers as non-root
Use read-only root filesystems
Drop unnecessary capabilities
Set resource limits

Cost Optimization

# Use Spot instances for fault-tolerant workloads
aws eks create-nodegroup \
  --cluster-name my-cluster \
  --nodegroup-name spot-nodes \
  --node-role arn:aws:iam::123456789012:role/NodeRole \
  --subnets subnet-xxx subnet-yyy \
  --capacity-type SPOT \
  --instance-types t3.medium t3a.medium t2.medium

Strategies:

Right-size node instance types
Use Fargate for unpredictable workloads
Implement Cluster Autoscaler or Karpenter
Monitor and optimize resource requests/limits
Use Savings Plans or Reserved Instances
Clean up unused resources (volumes, snapshots, AMIs)

Operations

Cluster Maintenance:

Keep clusters updated (within standard support)
Automate cluster creation with IaC (eksctl, Terraform)
Monitor with CloudWatch Container Insights
Use Upgrade Insights before upgrades
Test upgrades in non-production first

Monitoring:

Enable CloudWatch Container Insights
Use Prometheus + Grafana for metrics
Implement log aggregation (Fluent Bit, CloudWatch Logs)
Set up alerting for critical events

Disaster Recovery:

etcd backups are handled by AWS; back up workloads and volumes yourself (e.g., Velero)
Version control Kubernetes manifests
Use GitOps (ArgoCD, Flux)
Document runbooks
Test disaster recovery procedures

Resource Management

# Set resource requests and limits
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: app
      image: my-app
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"

Recommendations:

Always set requests (for scheduling)
Set memory limits (a container that exceeds its limit is OOM-killed instead of starving the node)
Use Vertical Pod Autoscaler for right-sizing
Implement Horizontal Pod Autoscaler
Use PodDisruptionBudgets for availability

Pricing

Cluster Pricing

Standard Support:

$0.10 per hour per cluster
$73 per month per cluster

Extended Support:

$0.60 per hour per cluster (replaces the $0.10 rate)
Applies while the cluster runs a version in extended support

Compute Pricing

EC2 Nodes:

Standard EC2 pricing applies
On-Demand, Reserved, Spot available
Separate from cluster fees

Fargate:

Per-vCPU and per-GB memory pricing
Each Pod billed individually
No node management overhead

Example Fargate (approximate, us-east-1 Linux/x86; rates vary by Region):

0.25 vCPU, 0.5 GB: ~$0.012/hour
1 vCPU, 2 GB: ~$0.049/hour
4 vCPU, 8 GB: ~$0.197/hour
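
Since Fargate bills per vCPU and per GB, the hourly figure for any Pod size is a simple sum. A sketch with the rates passed in as arguments; `fargate_hourly_cost` is a hypothetical helper, and the rates in the example (0.04048 per vCPU-hour, 0.004445 per GB-hour) are assumed values that should be checked against current pricing for your Region.

```shell
# Hourly Fargate cost = vCPU * vCPU rate + GB * GB rate.
# Arguments: VCPU MEMORY_GB VCPU_RATE GB_RATE (rates in USD per hour)
fargate_hourly_cost() {
  awk -v vcpu="$1" -v gb="$2" -v vcpu_rate="$3" -v gb_rate="$4" \
    'BEGIN { printf "%.4f\n", vcpu * vcpu_rate + gb * gb_rate }'
}

# Example with assumed rates:
fargate_hourly_cost 0.25 0.5 0.04048 0.004445   # prints 0.0123
```

Multiplying the result by 730 gives a rough monthly figure for a Pod that runs continuously.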

Additional Costs

Networking:

VPC NAT Gateway: $0.045/hour + data transfer
ALB: $0.0225/hour + LCU pricing
NLB: $0.0225/hour + NLCU pricing
Data transfer: $0.09/GB (varies by region)

Storage:

EBS: $0.08-0.10/GB-month (gp3)
EFS: $0.30/GB-month (standard)
FSx Lustre: $0.14/GB-month

Data Transfer:

Within same AZ: Free
Between AZs: $0.01/GB in each direction
To internet: $0.09/GB (first 10 TB)

Cost Optimization

Use Spot instances (up to 90% savings)
Use Fargate for unpredictable workloads
Right-size instances and Pods
Use Savings Plans (20-50% savings)
Monitor and eliminate waste
Use Cluster Autoscaler to scale down
