Getting started
Introduction
Amazon Elastic Kubernetes Service (EKS) is a fully managed Kubernetes service by AWS. AWS manages the control plane (API server, etcd, scheduler, controller manager) while you manage worker nodes.
What is EKS
- Managed Control Plane: AWS handles patching, scaling, and high availability
- Two Modes: Standard EKS (AWS manages the control plane) and EKS Auto Mode (AWS manages the control plane and the data plane)
- AWS Integration: Works with IAM, VPC, EBS, EFS, ALB, NLB
- Multi-AZ: API server runs across multiple availability zones
Architecture Overview
Control Plane (AWS-Managed):
- API Server (Multi-AZ)
- etcd
- Scheduler
- Controller Manager
Data Plane (Customer-Managed):
- EC2 Managed Nodes
- EC2 Self-Managed Nodes
- Fargate (Serverless)
Quick Start
# Create cluster
aws eks create-cluster \
--name my-cluster \
--role-arn arn:aws:iam::123456789012:role/eksServiceRole \
--resources-vpc-config subnetIds=subnet-xxx,subnet-yyy
# Update kubeconfig
aws eks update-kubeconfig --name my-cluster
# Verify connection
kubectl get svc
VPC Requirements
| Requirement | Value |
|---|---|
| Minimum Subnets | 2 (in different AZs) |
| IPs per Subnet | 6+ available (16 recommended) |
| Upgrade IPs | Up to 5 for cluster upgrades |
Versions
Support Types
| Type | Duration | Cost | Details |
|---|---|---|---|
| Standard | 14 months | Included | After version release |
| Extended | +12 months | Extra cost | Total 26 months |
Currently Available
Standard Support:
- 1.34
- 1.33
- 1.32
Extended Support:
- 1.31
- 1.30
- 1.29
Upgrade Rules
# Can only upgrade one minor version at a time
1.28 → 1.29 ✓
1.28 → 1.30 ✗ (must go 1.28 → 1.29 → 1.30)
# Auto-upgrade when extended support ends
# (control plane only, not worker nodes)
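The one-minor-version-at-a-time rule can be sketched as a small shell helper that lists every hop between two versions. This is only an illustration: `eks_upgrade_path` is a hypothetical function, not an AWS CLI feature, and it assumes versions are given as `major.minor`.

```shell
# Print each single-minor-version hop needed to go from one version to another.
# eks_upgrade_path is a hypothetical helper; arguments look like 1.28 and 1.30.
eks_upgrade_path() {
  major=${1%%.*}        # "1.28" -> "1"
  from_minor=${1#*.}    # "1.28" -> "28"
  to_minor=${2#*.}      # "1.30" -> "30"
  if [ "$to_minor" -le "$from_minor" ]; then
    echo "target must be newer than the current version" >&2
    return 1
  fi
  current=$from_minor
  while [ "$current" -lt "$to_minor" ]; do
    next=$((current + 1))
    echo "$major.$current -> $major.$next"
    current=$next
  done
}

# Prints two hops: 1.28 -> 1.29, then 1.29 -> 1.30
eks_upgrade_path 1.28 1.30
```

Each printed hop corresponds to one `aws eks update-cluster-version` call, run only after the previous update has finished.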
Version Skew
| Kubernetes Version | Max kubelet Lag |
|---|---|
| 1.28+ | 3 minor versions behind |
| Before 1.28 | 2 minor versions behind |
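A quick way to apply the table is a check that compares the control plane and kubelet minor versions. The sketch below assumes `major.minor` inputs; `skew_ok` is a hypothetical helper name. It also rejects a kubelet newer than the control plane, which the upstream Kubernetes skew policy does not allow.

```shell
# Succeed when the kubelet version is within the allowed skew of the control plane.
# skew_ok is a hypothetical helper; arguments are "major.minor" strings.
skew_ok() {
  cp_minor=${1#*.}         # control plane minor, e.g. 30
  kubelet_minor=${2#*.}    # kubelet minor, e.g. 27
  if [ "$cp_minor" -ge 28 ]; then max_lag=3; else max_lag=2; fi
  lag=$((cp_minor - kubelet_minor))
  # lag < 0 means the kubelet is newer than the control plane
  [ "$lag" -ge 0 ] && [ "$lag" -le "$max_lag" ]
}

if skew_ok 1.30 1.27; then echo "1.27 kubelet is supported on a 1.30 control plane"; fi
if ! skew_ok 1.30 1.26; then echo "1.26 kubelet is too old for a 1.30 control plane"; fi
```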
Upgrade Process
- New API server nodes launched
- Health checks performed
- Old nodes replaced
- Rolling update (cannot pause/stop)
- Requires up to 5 available IPs in subnets
# Update cluster version
aws eks update-cluster-version \
--name my-cluster \
--kubernetes-version 1.30
Upgrade Insights
- Automatically scans for deprecated API usage
- Identifies upgrade blockers
- Refreshes every 24 hours
- Cannot upgrade if deprecated APIs used in last 30 days
Node Types
Managed Node Groups (EC2)
# Create managed node group
aws eks create-nodegroup \
--cluster-name my-cluster \
--nodegroup-name my-nodes \
--node-role arn:aws:iam::123456789012:role/NodeRole \
--subnets subnet-xxx subnet-yyy \
--instance-types t3.medium
Features:
- AWS automates provisioning and lifecycle
- Part of EC2 Auto Scaling group
- Labeled with eks.amazonaws.com/capacityType
Allocation Strategies:
- On-Demand: prioritized
- Spot: price-capacity-optimized (K8s 1.28+) or capacity-optimized (1.27 and earlier)
Self-Managed Nodes (EC2)
# Manual management required
# More control over configuration
# Requires manual updates
Use Cases:
- Custom AMIs
- Specific instance configurations
- Advanced networking requirements
Fargate (Serverless)
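# Fargate profile (created through the EKS API, not a Kubernetes manifest)
# FargatePodExecutionRole is a placeholder role name
aws eks create-fargate-profile \
--cluster-name my-cluster \
--fargate-profile-name my-profile \
--pod-execution-role-arn arn:aws:iam::123456789012:role/FargatePodExecutionRole \
--selectors namespace=default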
Features:
- On-demand, right-sized compute
- Dedicated VM boundary per Pod
- No shared kernel, CPU, memory, or ENI
Limitations:
- ❌ No HostPort/HostNetwork
- ❌ No DaemonSets
- ❌ No GPUs
- ❌ No Spot instances
- Requires private subnets (outbound access needs a NAT gateway or VPC endpoints)
Comparison Table
| Feature | Managed Nodes | Self-Managed | Fargate |
|---|---|---|---|
| Management | AWS automated | Manual | Fully serverless |
| Cost | EC2 pricing | EC2 pricing | Per Pod pricing |
| Control | Medium | High | Low |
| DaemonSets | ✓ | ✓ | ✗ |
| GPUs | ✓ | ✓ | ✗ |
| Spot | ✓ | ✓ | ✗ |
| HostNetwork | ✓ | ✓ | ✗ |
Spot Capacity Rebalancing
# Enabled by default for Spot nodes
# 2-minute interruption notice
# Recommend 30s or less termination grace periods
spec:
  terminationGracePeriodSeconds: 30
Warning: Pods may be forcibly terminated during concurrent reclamations.
IAM & Authentication
aws-iam-authenticator
# Uses IAM for cluster authentication
# Integrates with OpenID Connect (OIDC)
# Check authentication
aws sts get-caller-identity
OIDC Provider
# Hosts public OIDC discovery endpoint per cluster
# Contains signing keys for service account tokens
# Private keys rotate every 7 days
# Public keys kept until expiry
IRSA (IAM Roles for Service Accounts)
# Service Account with IAM role
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-sa
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/MyRole
Injected Environment Variables:
- AWS_ROLE_ARN
- AWS_WEB_IDENTITY_TOKEN_FILE
Benefits:
- Least privilege
- Credential isolation
- Auditability
Create IAM Role for IRSA
# Create OIDC provider
eksctl utils associate-iam-oidc-provider \
--cluster my-cluster \
--approve
# Create IAM role
eksctl create iamserviceaccount \
--name my-sa \
--namespace default \
--cluster my-cluster \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
--approve
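For readers creating the role by hand rather than with eksctl, the role needs a trust policy that ties it to the cluster's OIDC provider and to one service account. The helper below prints such a policy as a sketch. `irsa_trust_policy` is a hypothetical function, and `EXAMPLE0000` stands in for a real OIDC provider ID.

```shell
# Print an IAM trust policy that lets one service account assume a role.
# irsa_trust_policy is a hypothetical helper. The second argument is the
# cluster's OIDC issuer URL without the https:// prefix.
irsa_trust_policy() {
  account_id=$1
  oidc_provider=$2
  namespace=$3
  service_account=$4
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/${oidc_provider}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${oidc_provider}:aud": "sts.amazonaws.com",
          "${oidc_provider}:sub": "system:serviceaccount:${namespace}:${service_account}"
        }
      }
    }
  ]
}
EOF
}

# EXAMPLE0000 is a placeholder OIDC provider ID, not a real one.
irsa_trust_policy 123456789012 \
  oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE0000 default my-sa
```

The `sub` condition is what limits the role to a single namespace and service account, which is where the least-privilege benefit comes from.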
aws-auth ConfigMap (Legacy)
# Maps IAM roles/users to Kubernetes RBAC groups
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::123456789012:role/NodeRole
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
Note: Being replaced by access entries.
Access Entries (New Method)
# Replaces aws-auth ConfigMap
# Requires a minimum platform version and cluster authentication mode API or API_AND_CONFIG_MAP
aws eks create-access-entry \
--cluster-name my-cluster \
--principal-arn arn:aws:iam::123456789012:role/MyRole
Required Roles
eks:node-manager:
- ClusterRole and ClusterRoleBinding
- Required for managed node groups
- Missing/broken causes AccessDenied errors
# Verify role exists
kubectl get clusterrole eks:node-manager
kubectl get clusterrolebinding eks:node-manager
Networking
VPC CNI Plugin
# Check VPC CNI
kubectl get pods -n kube-system -l k8s-app=aws-node
# View CNI configuration
kubectl get daemonset -n kube-system aws-node -o yaml
Features:
- Manages Pod networking
- Allocates VPC IP addresses directly to Pods
- Uses secondary IPs from ENI or prefix delegation
- Requires IAM permissions (AmazonEKS_CNI_Policy)
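Because each Pod takes a VPC IP address, the number of Pods a node can run depends on its ENI limits. The sketch below applies the usual EKS estimate, ENIs × (IPs per ENI − 1) + 2, and multiplies the slots by 16 when each slot holds a /28 prefix. `max_pods` is a hypothetical helper, and the ENI figures must come from the EC2 limits for the instance type (a t3.medium has 3 ENIs with 6 IPv4 addresses each).

```shell
# Estimate how many Pods the VPC CNI can address on one node.
# max_pods is a hypothetical helper. Arguments: ENI count, IPv4 addresses
# per ENI, and mode (secondary or prefix; defaults to secondary).
max_pods() {
  enis=$1
  ips_per_eni=$2
  mode=${3:-secondary}
  # The first address on each ENI belongs to the ENI itself.
  slots=$((enis * (ips_per_eni - 1)))
  if [ "$mode" = "prefix" ]; then
    # With prefix delegation each slot holds a /28 prefix (16 addresses).
    slots=$((slots * 16))
  fi
  # +2 accounts for host-network Pods such as aws-node and kube-proxy.
  echo $((slots + 2))
}

max_pods 3 6           # t3.medium, secondary IP mode: prints 17
max_pods 3 6 prefix    # t3.medium, prefix delegation: prints 242
```

The prefix-delegation result is an address-capacity ceiling only. In practice the kubelet max-pods setting is usually capped well below it on small instances.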
Security Groups for Pods
# Assign different VPC security groups to Pods
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: my-sg-policy
spec:
  podSelector:
    matchLabels:
      app: my-app
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0
Availability:
- ✓ Fargate
- ✓ EC2 nodes
Application Load Balancer (ALB)
# Ingress with ALB
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
Traffic Modes:
- Instance: NodePort proxy
- IP: Direct to Pod (required for Fargate)
Network Load Balancer (NLB)
# Service with NLB
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
Features:
- Layer 4 load balancing
- Provisioned via Service type LoadBalancer
Subnet Tagging
Required for load balancer discovery:
| Subnet Type | Tag | Value |
|---|---|---|
| Private | kubernetes.io/role/internal-elb | 1 |
| Public | kubernetes.io/role/elb | 1 |
# Tag private subnet
aws ec2 create-tags \
--resources subnet-xxx \
--tags Key=kubernetes.io/role/internal-elb,Value=1
IPv6 Support
# Enable IPv6 for ALB
metadata:
  annotations:
    alb.ingress.kubernetes.io/ip-address-type: dualstack
    alb.ingress.kubernetes.io/target-type: ip # Required
Note: Only works with IP target type.
Add-ons
Default Add-ons (Self-managed)
# VPC CNI
kubectl get daemonset -n kube-system aws-node
# kube-proxy
kubectl get daemonset -n kube-system kube-proxy
# CoreDNS
kubectl get deployment -n kube-system coredns
EKS-Managed Add-ons
# List available add-ons
aws eks describe-addon-versions
# Install add-on
aws eks create-addon \
--cluster-name my-cluster \
--addon-name vpc-cni \
--addon-version v1.18.0-eksbuild.1
# Update add-on
aws eks update-addon \
--cluster-name my-cluster \
--addon-name vpc-cni \
--addon-version v1.18.1-eksbuild.1
Add-on Types
| Type | Description | Examples |
|---|---|---|
| AWS | AWS-curated, latest security patches | VPC CNI, kube-proxy, CoreDNS |
| AWS Marketplace | Third-party verified | Datadog, New Relic |
| Community | Open-source, AWS-validated | Metrics Server, Cluster Autoscaler |
AWS Load Balancer Controller
# Install via Helm
helm repo add eks https://aws.github.io/eks-charts
helm install aws-load-balancer-controller \
eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName=my-cluster
Required for:
- ALB Ingress
- NLB with IP targets
- Fargate load balancing
Storage
EBS CSI Driver
# Install EBS CSI driver add-on
aws eks create-addon \
--cluster-name my-cluster \
--addon-name aws-ebs-csi-driver
# Create IAM role for CSI driver
eksctl create iamserviceaccount \
--name ebs-csi-controller-sa \
--namespace kube-system \
--cluster my-cluster \
--attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
--approve
Compatibility:
- ✓ EC2 nodes
- ✗ Fargate
- ✗ Hybrid Nodes
Note: Node DaemonSet only runs on EC2, controller can run on Fargate.
StorageClass with EBS
# EBS StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  kmsKeyId: arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012
volumeBindingMode: WaitForFirstConsumer
KMS Encryption for EBS
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["kms:CreateGrant", "kms:Encrypt", "kms:Decrypt"],
"Resource": "arn:aws:kms:*:*:key/*"
}
]
}
Required IAM Permissions:
- kms:CreateGrant
- kms:Encrypt
- kms:Decrypt
EFS Support
# EFS PersistentVolume (Static Provisioning)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345678
Fargate Support:
- ✓ Automatic EFS mount (no driver installation)
- ✓ Static provisioning only
- ✗ Dynamic provisioning
FSx for Lustre
# Supported via CSI driver
# EC2 nodes only
# Not available on Fargate
Storage Comparison
| Storage Type | Fargate | EC2 | Dynamic Provisioning | Use Case |
|---|---|---|---|---|
| EBS | ✗ | ✓ | ✓ | Block storage, single AZ |
| EFS | ✓ (static) | ✓ | ✓ | Shared file storage, multi-AZ |
| FSx Lustre | ✗ | ✓ | ✓ | High-performance computing |
Troubleshooting
Node Join Failures
aws-auth ConfigMap Issues:
# Check aws-auth ConfigMap
kubectl get configmap aws-auth -n kube-system -o yaml
# Common issues:
# - Missing or incorrect aws-auth entries
# - ARN cannot include path other than /
Fix:
# Correct format
mapRoles: |
  - rolearn: arn:aws:iam::123456789012:role/NodeRole # No path
    username: system:node:{{EC2PrivateDNSName}}
    groups:
      - system:bootstrappers
      - system:nodes
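When a node role was created with a path, the path has to be dropped before the ARN goes into aws-auth. A minimal sketch of that rewrite follows. `strip_role_path` is a hypothetical helper and assumes a standard `arn:aws:iam::ACCOUNT:role/...` ARN.

```shell
# Remove the path from an IAM role ARN so it can be used in aws-auth.
# strip_role_path is a hypothetical helper.
strip_role_path() {
  prefix=${1%%:role/*}    # everything before ":role/"
  role_name=${1##*/}      # text after the last slash
  echo "$prefix:role/$role_name"
}

# Prints arn:aws:iam::123456789012:role/NodeRole
strip_role_path arn:aws:iam::123456789012:role/team/platform/NodeRole
```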
Node Tags:
# Nodes must have cluster tag
aws ec2 create-tags \
--resources i-1234567890abcdef0 \
--tags Key=kubernetes.io/cluster/my-cluster,Value=owned
Public IP Issues:
# Public subnet nodes need public IP or Elastic IP
# Private subnets need NAT gateway route
# Check route table
aws ec2 describe-route-tables --route-table-id rtb-xxx
DNS Issues:
# Node needs private DNS entry
# VPC requires DHCP options set
# Check DHCP options
aws ec2 describe-dhcp-options --dhcp-options-id dopt-xxx
# Required:
# - domain-name
# - domain-name-servers
NodeCreationFailure
# 15-minute timeout
# Windows AMIs may need fast launch enabled
# Check node group events
aws eks describe-nodegroup \
--cluster-name my-cluster \
--nodegroup-name my-nodes \
--query 'nodegroup.health'
InsufficientFreeAddresses
# Not enough available IPs in subnet
# Need to free IPs or change subnets
# Check available IPs
aws ec2 describe-subnets --subnet-ids subnet-xxx \
--query 'Subnets[0].AvailableIpAddressCount'
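# Solution: a managed node group's subnets cannot be changed after creation,
# so create a replacement node group in subnets with free addresses
aws eks create-nodegroup \
--cluster-name my-cluster \
--nodegroup-name my-nodes-new \
--node-role arn:aws:iam::123456789012:role/NodeRole \
--subnets subnet-new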
AccessDenied Error
# Missing eks:node-manager ClusterRole or ClusterRoleBinding
# Verify role exists
kubectl get clusterrole eks:node-manager
kubectl get clusterrolebinding eks:node-manager
# Recreate if missing, using the eks:node-manager manifest from the EKS troubleshooting guide
# (do not apply aws-auth-cm.yaml here: it is the aws-auth ConfigMap template and does not contain this role)
Container Runtime Not Ready
# Missing or incorrect aws-auth/access entry for node IAM role
# Check VPC CNI (aws-node) Pod logs
kubectl logs -n kube-system -l k8s-app=aws-node
# Verify IAM role in aws-auth
kubectl get configmap aws-auth -n kube-system -o yaml
# Check node authorization
kubectl describe node NODE_NAME | grep -i auth
TLS Handshake Timeout
# Node cannot reach public API endpoint
# Check route table & security groups
# Test from node
curl -k https://API_SERVER_ENDPOINT
# Check security group rules
aws ec2 describe-security-groups --group-ids sg-xxx
# Required: Allow HTTPS (443) from node security group
HTTP 401 Unauthorized
# Stale service account tokens (>90 days old)
# Kubernetes client SDK must be recent version
# Recreate service account
kubectl delete sa SERVICE_ACCOUNT_NAME -n NAMESPACE
kubectl create sa SERVICE_ACCOUNT_NAME -n NAMESPACE
# Update client SDK
# Use latest aws-sdk, kubectl, or client library
Too Many Requests
# Launching many nodes causes describeCluster throttling
# Solution: Pass cluster details to the node bootstrap script
/etc/eks/bootstrap.sh my-cluster \
--apiserver-endpoint ENDPOINT \
--b64-cluster-ca CERTIFICATE_AUTHORITY \
--dns-cluster-ip DNS_CLUSTER_IP
# Skips the describeCluster call during node bootstrap
EKS Log Collector
# Pre-built script on nodes
/etc/eks/log-collector-script/eks-log-collector.sh
# Run on node
sudo /etc/eks/log-collector-script/eks-log-collector.sh
# Collects:
# - kubelet logs
# - Container runtime logs
# - VPC CNI logs
# - System information
Cluster Health Issues
Detection:
# EKS detects infrastructure/configuration issues
# Stores in health object
# Can take up to 3 hours to detect
aws eks describe-cluster --name my-cluster \
--query 'cluster.health'
Notifications:
- AWS sends email
- AWS Health Dashboard notification (formerly Personal Health Dashboard)
Recoverable Errors
| Error | Cause | Recovery |
|---|---|---|
| SUBNET_NOT_FOUND | Cluster subnet deleted | update-cluster-config |
| SECURITY_GROUP_NOT_FOUND | Cluster SG deleted | update-cluster-config |
| KMS_KEY_DISABLED | KMS key disabled | Re-enable key |
# Recover from SUBNET_NOT_FOUND
aws eks update-cluster-config \
--name my-cluster \
--resources-vpc-config subnetIds=subnet-new1,subnet-new2
Non-Recoverable Errors
| Error | Cause | Action |
|---|---|---|
| VPC_NOT_FOUND | Cluster VPC deleted | Delete and recreate cluster |
| KMS_KEY_NOT_FOUND | KMS key deleted | Delete and recreate cluster |
Warning: These errors require cluster recreation. Back up data before deleting.
Commands
AWS CLI - Cluster Operations
# Describe cluster
aws eks describe-cluster --name my-cluster
# List clusters
aws eks list-clusters
# Get cluster versions
aws eks describe-cluster-versions
# Update cluster version
aws eks update-cluster-version \
--name my-cluster \
--kubernetes-version 1.30
# Update cluster config
aws eks update-cluster-config \
--name my-cluster \
--resources-vpc-config subnetIds=subnet-xxx,subnet-yyy
# Delete cluster
aws eks delete-cluster --name my-cluster
AWS CLI - Node Groups
# List node groups
aws eks list-nodegroups --cluster-name my-cluster
# Describe node group
aws eks describe-nodegroup \
--cluster-name my-cluster \
--nodegroup-name my-nodes
# Create node group
aws eks create-nodegroup \
--cluster-name my-cluster \
--nodegroup-name my-nodes \
--node-role arn:aws:iam::123456789012:role/NodeRole \
--subnets subnet-xxx subnet-yyy
# Update node group config
aws eks update-nodegroup-config \
--cluster-name my-cluster \
--nodegroup-name my-nodes \
--scaling-config minSize=2,maxSize=10,desiredSize=4
# Delete node group
aws eks delete-nodegroup \
--cluster-name my-cluster \
--nodegroup-name my-nodes
AWS CLI - Add-ons
# List add-ons
aws eks list-addons --cluster-name my-cluster
# Describe add-on versions
aws eks describe-addon-versions --addon-name vpc-cni
# Create add-on
aws eks create-addon \
--cluster-name my-cluster \
--addon-name vpc-cni \
--addon-version v1.18.0-eksbuild.1
# Update add-on
aws eks update-addon \
--cluster-name my-cluster \
--addon-name vpc-cni \
--addon-version v1.18.1-eksbuild.1
# Delete add-on
aws eks delete-addon \
--cluster-name my-cluster \
--addon-name vpc-cni
kubectl - Node Management
# View nodes
kubectl get nodes
# Check node labels
kubectl get nodes --show-labels
# Check node capacity type
kubectl get nodes -L eks.amazonaws.com/capacityType
# Describe node
kubectl describe node NODE_NAME
# Check kubelet version on each node
kubectl get nodes -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion
# Drain node
kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data
# Cordon node
kubectl cordon NODE_NAME
# Uncordon node
kubectl uncordon NODE_NAME
kubectl - Add-ons
# Check VPC CNI
kubectl get pods -n kube-system -l k8s-app=aws-node
# Check CoreDNS
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Check kube-proxy
kubectl get pods -n kube-system -l k8s-app=kube-proxy
# Check EBS CSI driver
kubectl get pods -n kube-system \
-l app.kubernetes.io/name=aws-ebs-csi-driver
# Check AWS Load Balancer Controller
kubectl get pods -n kube-system \
-l app.kubernetes.io/name=aws-load-balancer-controller
kubectl - IRSA & IAM
# Check service account
kubectl get sa SERVICE_ACCOUNT_NAME -n NAMESPACE -o yaml
# View annotations (should show eks.amazonaws.com/role-arn)
kubectl describe sa SERVICE_ACCOUNT_NAME -n NAMESPACE
# Check aws-auth ConfigMap (legacy)
kubectl get configmap aws-auth -n kube-system -o yaml
# Edit aws-auth ConfigMap
kubectl edit configmap aws-auth -n kube-system
# Create service account with IRSA
kubectl create sa my-sa -n default
kubectl annotate sa my-sa -n default \
eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/MyRole
kubectl - Troubleshooting
# View events (sorted)
kubectl get events --sort-by='.lastTimestamp' -A
# Check pod on specific node
kubectl get pods -A -o wide --field-selector spec.nodeName=NODE_NAME
# Describe failing pod
kubectl describe pod POD_NAME -n NAMESPACE
# Check logs
kubectl logs POD_NAME -n NAMESPACE
# Check previous logs (crashed pod)
kubectl logs POD_NAME -n NAMESPACE --previous
# Execute into pod
kubectl exec -it POD_NAME -n NAMESPACE -- /bin/sh
# Check resource usage
kubectl top nodes
kubectl top pods -A
eksctl - Quick Operations
# Create cluster
eksctl create cluster \
--name my-cluster \
--region us-east-1 \
--nodegroup-name my-nodes \
--nodes 3
# Create IRSA
eksctl create iamserviceaccount \
--name my-sa \
--namespace default \
--cluster my-cluster \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
--approve
# Associate OIDC provider
eksctl utils associate-iam-oidc-provider \
--cluster my-cluster \
--approve
# Delete cluster
eksctl delete cluster --name my-cluster
Security
Pod Security Standards
# Replaces PodSecurityPolicy (removed in Kubernetes 1.25)
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
Levels:
- Privileged: Unrestricted
- Baseline: Minimally restrictive
- Restricted: Heavily restricted
Secrets Encryption
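# Enable KMS envelope encryption at cluster creation
# (add the other required parameters, such as --role-arn and --resources-vpc-config)
aws eks create-cluster \
--name my-cluster \
--encryption-config '[{"resources":["secrets"],"provider":{"keyArn":"arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"}}]'
# Can also be enabled on an existing cluster
aws eks associate-encryption-config \
--cluster-name my-cluster \
--encryption-config '[{"resources":["secrets"],"provider":{"keyArn":"arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"}}]'
# Once enabled, it cannot be disabled
Encrypts:
- Kubernetes secrets at rest (envelope encryption with your KMS key)
- Secret objects only; other etcd data relies on the disk-level encryption AWS applies to etcd volumes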
Private Clusters
# API server endpoint only accessible from within VPC
# (add the other required parameters, such as --role-arn and subnetIds)
aws eks create-cluster \
--name my-cluster \
--resources-vpc-config endpointPrivateAccess=true,endpointPublicAccess=false
# Requires VPN or Direct Connect for external access
Options:
- Public Only: Default, accessible from internet
- Public + Private: Both access methods
- Private Only: VPC-only access (requires VPN/Direct Connect)
IMDS Restriction
# Block Pod access to instance metadata service (IMDS)
# Prevents Pods from reading the node's IAM credentials
# Require IMDSv2 and set the hop limit to 1 so containers cannot reach IMDS
# (for node groups, set the same options in the launch template)
aws ec2 modify-instance-metadata-options \
--instance-id i-1234567890abcdef0 \
--http-tokens required \
--http-put-response-hop-limit 1
Recommended:
- Use IRSA instead of instance roles
- Block IMDS v1
- Require IMDSv2 with hop limit 1
Network Policies
# Restrict Pod-to-Pod communication
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
Requires:
- A CNI or policy engine that enforces NetworkPolicy
- VPC CNI v1.14+ enforces network policies natively once the feature is enabled; Calico is an alternative
Audit Logging
# Enable control plane logging
aws eks update-cluster-config \
--name my-cluster \
--logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
# View logs in CloudWatch
# Log group: /aws/eks/my-cluster/cluster
Log Types:
- api: API server logs
- audit: Audit logs
- authenticator: Authenticator logs
- controllerManager: Controller manager logs
- scheduler: Scheduler logs
Best Practices
Cluster Design
- Multi-AZ: Deploy across 3+ availability zones
- Separate Subnets: Use different subnets for public/private workloads
- Control Plane Logging: Enable all log types
- Managed Nodes: Use managed node groups for easier lifecycle
- Version Currency: Keep clusters within standard support window
Networking
# Use VPC CNI prefix delegation for more IPs per node
# (set as environment variables on the aws-node DaemonSet; Nitro-based instances only)
kubectl set env daemonset aws-node -n kube-system \
ENABLE_PREFIX_DELEGATION=true \
WARM_PREFIX_TARGET=1
Best Practices:
- Tag subnets for load balancer discovery
- Use security groups for Pods for fine-grained control
- Implement network policies (VPC CNI v1.14+ or Calico)
- Use private subnets for worker nodes
- Use public subnets only for load balancers
Security
IAM & Authentication:
- Enable secrets encryption with KMS
- Use IRSA for Pod IAM permissions (avoid node IAM roles)
- Restrict IMDS access from Pods (use IMDSv2)
- Use private API endpoint when possible
- Enable audit logging
Pod Security:
- Apply Pod Security Standards
- Run containers as non-root
- Use read-only root filesystems
- Drop unnecessary capabilities
- Set resource limits
Cost Optimization
# Use Spot instances for fault-tolerant workloads
aws eks create-nodegroup \
--cluster-name my-cluster \
--nodegroup-name spot-nodes \
--node-role arn:aws:iam::123456789012:role/NodeRole \
--subnets subnet-xxx subnet-yyy \
--capacity-type SPOT \
--instance-types t3.medium t3a.medium t2.medium
Strategies:
- Right-size node instance types
- Use Fargate for unpredictable workloads
- Implement Cluster Autoscaler or Karpenter
- Monitor and optimize resource requests/limits
- Use Savings Plans or Reserved Instances
- Clean up unused resources (volumes, snapshots, AMIs)
Operations
Cluster Maintenance:
- Keep clusters updated (within standard support)
- Automate cluster creation with IaC (eksctl, Terraform)
- Monitor with CloudWatch Container Insights
- Use Upgrade Insights before upgrades
- Test upgrades in non-production first
Monitoring:
- Enable CloudWatch Container Insights
- Use Prometheus + Grafana for metrics
- Implement log aggregation (Fluent Bit, CloudWatch Logs)
- Set up alerting for critical events
Disaster Recovery:
- etcd backups are handled by AWS; back up workload data such as persistent volumes yourself
- Version control Kubernetes manifests
- Use GitOps (ArgoCD, Flux)
- Document runbooks
- Test disaster recovery procedures
Resource Management
# Set resource requests and limits
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
    - name: app
      image: my-app
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
Recommendations:
- Always set requests (for scheduling)
- Set limits for memory (a container over its limit is OOM-killed instead of exhausting node memory)
- Use Vertical Pod Autoscaler for right-sizing
- Implement Horizontal Pod Autoscaler
- Use PodDisruptionBudgets for availability
Pricing
Cluster Pricing
Standard Support:
- $0.10 per hour per cluster
- $73 per month per cluster
Extended Support:
- $0.60 per hour per cluster (about $438 per month)
- Applies while the cluster runs a Kubernetes version in extended support
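The monthly cluster figure comes from multiplying the hourly rate by roughly 730 hours. A small sketch of that arithmetic, which also scales to several clusters, is shown below. `cluster_monthly_cost` is a hypothetical helper, and the 730-hour month is an approximation.

```shell
# Monthly control-plane fee for a number of clusters at an hourly rate.
# cluster_monthly_cost is a hypothetical helper. Arguments: hourly rate in
# USD and cluster count (defaults to 1). 730 hours approximates one month.
cluster_monthly_cost() {
  awk -v rate="$1" -v clusters="${2:-1}" \
    'BEGIN { printf "%.2f\n", rate * 730 * clusters }'
}

cluster_monthly_cost 0.10      # one cluster at $0.10/hour: prints 73.00
cluster_monthly_cost 0.10 3    # three clusters: prints 219.00
```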
Compute Pricing
EC2 Nodes:
- Standard EC2 pricing applies
- On-Demand, Reserved, Spot available
- Separate from cluster fees
Fargate:
- Per-vCPU and per-GB memory pricing
- Each Pod billed individually
- No node management overhead
Example Fargate:
- 0.25 vCPU, 0.5 GB: ~$0.012/hour
- 1 vCPU, 2 GB: ~$0.046/hour
- 4 vCPU, 8 GB: ~$0.185/hour
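Since Fargate bills vCPU and memory separately, a Pod's hourly cost is the sum of two products. The sketch below shows the calculation only. `fargate_hourly_cost` is a hypothetical helper, and the rates passed in the example are round placeholder numbers, not published prices. Substitute the current rates for your region.

```shell
# Hourly cost of one Fargate Pod from its size and the per-unit rates.
# fargate_hourly_cost is a hypothetical helper. Arguments: vCPU, memory in
# GB, price per vCPU-hour, price per GB-hour (all supplied by the caller).
fargate_hourly_cost() {
  awk -v vcpu="$1" -v gb="$2" -v vcpu_rate="$3" -v gb_rate="$4" \
    'BEGIN { printf "%.4f\n", vcpu * vcpu_rate + gb * gb_rate }'
}

# Placeholder rates for illustration only: 0.04 per vCPU-hour, 0.004 per GB-hour.
fargate_hourly_cost 1 2 0.04 0.004
fargate_hourly_cost 0.25 0.5 0.04 0.004
```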
Additional Costs
Networking:
- VPC NAT Gateway: $0.045/hour + data transfer
- ALB: $0.0225/hour + LCU pricing
- NLB: $0.0225/hour + NLCU pricing
- Data transfer: $0.09/GB (varies by region)
Storage:
- EBS: $0.08-0.10/GB-month (gp3)
- EFS: $0.30/GB-month (standard)
- FSx Lustre: $0.14/GB-month
Data Transfer:
- Within same AZ: Free
- Between AZs: $0.01/GB
- To internet: $0.09/GB (first 10 TB)
Cost Optimization
- Use Spot instances (up to 90% savings)
- Use Fargate for unpredictable workloads
- Right-size instances and Pods
- Use Savings Plans (20-50% savings)
- Monitor and eliminate waste
- Use Cluster Autoscaler to scale down