AWS Terraform Workshop: From Fundamentals to Production Architecture

This document provides a complete, high-fidelity manual for the AWS Terraform Workshop. It covers technical depth, service definitions, architectural "whys," and complete step-by-step implementation blocks for every topic.

1. Introduction

Workshop Introduction

This is a guided, hands-on workshop where you build a realistic AWS environment using Terraform and learn the core practices used in production Infrastructure as Code (IaC): repeatable environments, safe change workflows, state management, modularity, and least-privilege access.

Figure: Sample AWS architecture diagram

What you will build (sample AWS architecture)

Over the course of the modules, you will incrementally build an AWS environment that resembles a production-ready baseline. Conceptually, the target architecture includes:

  • Networking foundation
    • A dedicated VPC with public and private subnets across Availability Zones
    • Internet access via an Internet Gateway, and private outbound access via NAT (where applicable)
    • Route tables and security boundaries to separate internet-facing and internal workloads
  • Compute and traffic entry
    • An internet-facing Application Load Balancer (ALB)
    • A scalable compute layer using an Auto Scaling Group (ASG) for EC2-based workloads
  • Identity and access
    • Practical IAM usage (users vs groups vs roles vs policies), and how Terraform interacts with AWS APIs securely
  • Kubernetes (optional/advanced track)
    • An Amazon EKS cluster, including foundational concepts like cluster identity, kubeconfig access, and (optionally) OIDC/IRSA for pod-to-AWS permissions

The workshop is designed so each module builds on the previous one. Even if you don’t implement every “advanced” item, you’ll still end with a coherent environment and a solid Terraform workflow.

How to use this workshop (self-paced)

You can complete this workshop end-to-end on your own. It supports two modes:

  1. Live mode (deploy to AWS)

    • Use this mode when the goal is to practice real infrastructure deployment and validation in AWS.
    • Suggested flow per module:
      • terraform fmt
      • terraform validate
      • terraform plan
      • terraform apply
      • Verify using AWS Console / CLI
  2. Non-live mode (no AWS deploy; visualize with TerraViz)

    • Use this mode when the goal is to learn Terraform structure, references, modules, and architecture without creating cloud resources. It is not necessary to install Terraform!
    • Suggested flow per module:
      • Write the .tf files for that module (treat the module as a “checkpoint”).
      • Upload your Terraform code to TerraViz to visualize the dependency graph and validate your references.
      • Iterate on your .tf until the graph matches the intended architecture, then proceed to the next module.

Recommended approach (both modes):

  1. Read the “why” first, then type the “how”
    • Skim each module’s explanation to understand the intent.
    • Then implement the Terraform code in order.
  2. Move forward only when your current state is healthy
    • If a step fails (or the graph doesn’t look right), stop and resolve it before continuing.
  3. Keep notes and commit often
    • Treat this like software development: small changes, frequent commits, clear messages.
  4. Cost and cleanup (Live mode only)
    • Some resources (especially EKS/ALB/NAT) can incur cost.
    • When finished, use the cleanup module and verify resources are removed.

Suggested learning outcomes

By the end, you should be able to:

  • Explain how Terraform builds and uses a dependency graph
  • Use variables, outputs, and basic module patterns
  • Understand state, remote state, and why locking matters
  • Apply practical IAM patterns for Terraform operators and workload permissions
  • Confidently reason about common AWS building blocks (VPC, subnets, SGs, ALB, ASG)

Infrastructure as Code (IaC): What it is and why it matters

IaC (100-level)

Infrastructure as Code means you define infrastructure (networks, servers, databases, permissions, etc.) as version-controlled code instead of manually clicking in a console.

At a high level:

  • You write the desired infrastructure in code.
  • You run a tool (Terraform) to compare:
    • Desired state (your code)
    • Actual state (what exists in AWS)
  • Terraform creates a plan (what will change), then applies it.

Figure: IaC feedback loop (Code → Plan → Apply → Cloud → Drift detection → back to Code)

Why IaC is important (practical outcomes)

  • Repeatability: rebuild the same environment (dev/stage/prod) reliably.
  • Safety: changes are reviewed like software (pull requests, approvals, change history).
  • Speed: provisioning becomes automated and consistent.
  • Auditability: “who changed what” is visible in Git history.

Figure: ClickOps vs IaC (manual steps vs automated pipeline)

ClickOps vs IaC (simple example)

Console-driven approach:

  • Create VPC
  • Create subnets
  • Create route tables
  • Create security groups
  • Launch instance

IaC-driven approach:

  • Describe those pieces in .tf files
  • Terraform determines order and dependencies automatically

Example snippet (HCL):

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "public" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}

IaC (200–300 level): core concepts you will use constantly

  • Declarative configuration: you describe what you want, not the step-by-step procedure.
  • Idempotency: applying the same code repeatedly converges to the same result.
  • Dependency graph: Terraform builds a graph from references (e.g., aws_subnet.public depends on aws_vpc.main).
  • State: Terraform records what it created in a state file (terraform.tfstate). This is how it knows what exists and what must change.
  • Drift: if someone changes resources manually, Terraform can detect it at plan time.

Figure: Terraform dependency graph (VPC → Subnet → EC2)

IaC (advanced intro): production-minded framing

As systems grow, IaC becomes part of software delivery:

  • Environments: separate state and configuration for dev/stage/prod.
  • Modules: reusable building blocks (like libraries) for VPCs, ECS clusters, RDS, etc.
  • Remote state: store state centrally (e.g., S3 + DynamoDB locking) to enable teamwork safely.
  • Policy and guardrails: enforce standards (naming, tagging, allowed regions/instance types) via policy-as-code and CI checks.
  • CI/CD: run terraform fmt, terraform validate, and terraform plan in pipelines; require approvals before apply.

Figure: Team workflow (Git PR → CI Plan → Approval → Apply)

Preparation and Comparison

Before writing code, it helps to compare two approaches to tracking infrastructure state: a service that manages state for you, and a tool whose state you manage yourself.
AWS CloudFormation is an AWS-native service. You upload a template (JSON/YAML), and AWS handles the execution and state internally. It is highly integrated but locked to the AWS ecosystem.
Terraform is an Infrastructure as Code tool by HashiCorp. It is cloud-agnostic, allowing you to manage AWS, Azure, GCP, and even SaaS tools like GitHub in the same configuration. Terraform uses a state file (.tfstate), a JSON map of your infrastructure, to remember what it built. Comparing your code against that state enables the plan phase, a "dry run" that shows what will change before you apply.

Setting Up Your Environment

On Windows (PowerShell)

  1. Terraform Installation:

    • Download the ZIP from https://developer.hashicorp.com/terraform/install.
    • Create a folder named terraform in your C:\ drive (C:\terraform).
    • Extract terraform.exe from the ZIP into C:\terraform.
    • Press Win + R, type sysdm.cpl, and press Enter to open System Properties.
    • Go to the "Advanced" tab and click "Environment Variables."
    • Under "System variables," find and select the "Path" variable, then click "Edit."
    • Click "New" and add C:\terraform to the list, then click “OK” on all open windows to apply and save the changes.
  2. AWS CLI: Download and run the MSI Installer.

  3. Git: Install from https://git-scm.com/install/windows.

  4. Verify: Open a new Command Prompt (CMD) or PowerShell window, so the updated Path is loaded, and run each command:

    • terraform -version
    • aws --version
    • git --version

On macOS (Terminal)

  1. Install: Use Homebrew: brew install hashicorp/tap/terraform awscli git.
  2. Verify: run each
    • terraform -version
    • aws --version
    • git --version

Introduction to Terraform and Infrastructure as Code (IaC) as a part of DevOps

Why is IaC important in DevOps?

Traditional "ClickOps" (manual console clicks) leads to Configuration Drift, where servers become unique "Snowflakes" that are impossible to replicate. IaC treats infrastructure like software. This is particularly important for the automation flows of a CI/CD pipeline.

IaC in the DevOps & CI/CD Lifecycle (200-level)

IaC is the "glue" of DevOps, allowing infrastructure to move at the speed of software.

  • The CI/CD link: pipelines can run terraform fmt, terraform validate, and terraform plan to verify changes.
  • Promotion: the same code pattern can be promoted from dev -> staging -> prod with different inputs/state.

Figure: CI/CD flow for Terraform (PR → CI plan → approval → apply)

Immutable infrastructure (200-level)

Instead of manually patching a long-lived server, many teams treat servers as disposable:

  • Update configuration in code.
  • Provision a new instance (or an autoscaling group rollout).
  • Decommission the old instance.

This reduces "snowflakes" and makes changes predictable.
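
One way to express this replace-rather-than-patch flow in Terraform is the create_before_destroy lifecycle setting. The sketch below is illustrative only: the resource name web_immutable and the variable ami_id are hypothetical and not part of the workshop files.

```hcl
# Hypothetical input: the AMI to roll out. Changing it forces a replacement.
variable "ami_id" {
  type        = string
  description = "AMI ID for the new server image (assumed to be supplied by the operator)"
}

resource "aws_instance" "web_immutable" {
  ami           = var.ami_id
  instance_type = "t3.micro"

  lifecycle {
    # Create the replacement instance first, then destroy the old one.
    create_before_destroy = true
  }
}
```

With this setting, a change to ami_id shows up in terraform plan as a replacement in which the new instance is created before the old one is removed.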

Core mechanics: how Terraform thinks (300-level)

To master Terraform, keep three pillars in mind:

  • Declarative syntax: you describe what you want ("a VPC, subnets, an ALB"), Terraform figures out ordering.
  • Dependency graph: Terraform builds a graph based on references (e.g., Subnet depends on VPC).
  • State management: Terraform uses terraform.tfstate as memory to map your code to real AWS resource IDs.

Figure: Dependency graph and ordering (VPC → Subnet → ALB → EC2)

Key Concepts

  • Declarative: You define the end state ("I want 3 servers"), and Terraform handles the logic.
  • Idempotency: Running the code twice results in the same state. If the server exists, Terraform does nothing.
  • Reconciliation Loop: Terraform compares your code (Desired State) to AWS (Actual State) and only fixes the difference (The Delta).

1.1 Introduction to Git and Repositories

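A Repository is a version-controlled folder. We use Git to track every change. If a Terraform change accidentally deletes a database, Git lets you roll back to the last working version of your code and re-apply it. This restores the infrastructure definition, not the deleted data, so backups are still required.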

Step-by-Step: Initializing Git & Terraform

  1. Create Folder: Open a new Command Prompt (CMD) and run: mkdir tf-workshop && cd tf-workshop
  2. Git Setup: Run: git init in the Command Prompt (CMD) to initialize the repository.
  3. The .gitignore: Run the following commands in the Command Prompt to create a .gitignore file with the necessary entries. Do not ignore .terraform.lock.hcl; commit it so that everyone uses the same provider versions:
    echo .terraform/ > .gitignore
    echo *.tfstate* >> .gitignore
    
  4. First Snapshot: If this is your first time using Git, set your user name and email before committing:
    git config --global user.name "Your Name"
    git config --global user.email "you@example.com"
    
    Then run git add . followed by git commit -m "Initial infra setup".

2. Terraform Syntax & Structure

Setting Up Terraform with the AWS Provider

Terraform is a "plug-and-play" engine. It needs the AWS Provider—a plugin that translates HCL code into AWS API calls.

AWS Credentials

You need an Access Key ID and Secret Access Key.
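Security Rule: Never hardcode keys in .tf files. Use aws configure to store them in your local AWS credentials file (~/.aws/credentials, or %UserProfile%\.aws\credentials on Windows). That file is plain text, not encrypted, so protect it with file permissions and never commit it.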

Step-by-Step: Authenticating

  1. Configure: Run aws configure. Enter your keys and a default region (us-east-1, to match the provider block and Availability Zones used in this workshop).
  2. Define Provider: In main.tf:

Terraform

provider "aws" {
  region = "us-east-1"
}
  3. Init: Run terraform init. This downloads the provider binary to a hidden .terraform folder.
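
Because terraform init downloads whichever provider version satisfies the configuration, it is common to pin versions explicitly. The block below is a minimal sketch; the version constraints are assumptions, not values specified by this workshop.

```hcl
terraform {
  # Assumed minimum CLI version; adjust to the version you installed.
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed major version, not specified by the workshop
    }
  }
}
```

After init, the exact provider version that was selected is recorded in .terraform.lock.hcl.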

3. Terraform Variables

Resources are the components you build (Servers, VPCs). Variables are inputs that make your code reusable.

What are variables? (100-level)

A variable is a symbolic name for a value that can change. Instead of writing the same value everywhere (like t2.micro), you store it in one place and reference it.

At the 100-level, think of a variable as a labeled input box:

  • You give the box a name (like instance_type).
  • You tell Terraform what kind of value it holds (like string).
  • You optionally provide a default.
  • Anywhere you need that value, you reference the box (like var.instance_type).

This is beneficial in code because it improves:

  • Readability: instance_type = var.instance_type communicates intent (“this is configurable”) better than a random hardcoded string.
  • Maintainability: change the value once, and every place that references it updates.
  • Consistency: tags, names, instance sizes, CIDR blocks, and regions stay aligned.

Simple before/after example:

Hardcoded:

resource "aws_instance" "web" {
  instance_type = "t2.micro"
}

Variable-driven:

variable "instance_type" {
  type    = string
  default = "t2.micro"
}

resource "aws_instance" "web" {
  instance_type = var.instance_type
}

Where do variable values come from?

  • Defaults inside the variable block.
  • terraform.tfvars / *.tfvars files (common for workshop/student inputs).
  • Command line flags like -var and -var-file.
  • Environment variables like TF_VAR_instance_type.
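
As a small illustration of the second source in the list above, a terraform.tfvars file uses plain HCL assignments. The value shown is only an example.

```hcl
# terraform.tfvars (loaded automatically by terraform plan and terraform apply)
instance_type = "t3.micro"
```

When the same variable is set in several places, the command line (-var and -var-file) takes precedence over terraform.tfvars, which takes precedence over TF_VAR_ environment variables, which take precedence over the default in the variable block.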

Why variables matter in DevOps (200-level)

  • Reusability: use the same code for dev and prod by changing inputs.
  • Security: avoid hardcoding secrets; pass sensitive values at runtime (environment variables, secret managers).
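
Both points can be sketched with two variable blocks. The names db_password and environment are hypothetical and do not appear elsewhere in the workshop.

```hcl
# Hypothetical secret: no default, so it must be supplied at runtime,
# for example through the TF_VAR_db_password environment variable.
variable "db_password" {
  type        = string
  sensitive   = true # redacts the value in plan and apply output
  description = "Database password supplied at runtime"
}

# Hypothetical environment selector with a guardrail on allowed values.
variable "environment" {
  type    = string
  default = "dev"

  validation {
    condition     = contains(["dev", "stage", "prod"], var.environment)
    error_message = "environment must be one of: dev, stage, prod."
  }
}
```

Note that sensitive only hides the value in CLI output; it is still written to the state file, which is one more reason to protect state.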

Step-by-Step: Parameterization

  1. Create variables.tf:

Terraform

variable "project_name" {
  type    = string
  default = "devops-workshop"
}

variable "instance_type" {
  type    = string
  default = "t2.micro"
}
  2. Reference in main.tf (the data.aws_ami.latest_linux data source used here is defined in section 6.1):

Terraform

resource "aws_instance" "web_server" {
  ami           = data.aws_ami.latest_linux.id
  instance_type = var.instance_type

  tags = {
    Name = "${var.project_name}-server"
  }
}
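
Outputs are the counterpart to variables: they expose values from the configuration after apply. A minimal sketch, assuming the web_server resource above and a hypothetical file name outputs.tf:

```hcl
# outputs.tf (hypothetical file name)
output "web_server_public_ip" {
  description = "Public IP of the instance defined in main.tf"
  value       = aws_instance.web_server.public_ip
}
```

After terraform apply, running terraform output web_server_public_ip prints the value.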

4. AWS Basics: IAM

IAM (Identity and Access Management) is the most critical security layer in AWS. It defines Who (Principal) can do What (Action) to Which (Resource).

Think of IAM as the authorization system for AWS:

  • It answers questions like:
    • “Can this person log in and create an EC2 instance?”
    • “Can this application read from this specific S3 bucket?”
    • “Can this EC2 instance upload logs to CloudWatch?”
  • It is evaluated on every API call.

At a high level, AWS permission evaluation is:

  • Principal (an identity: user/role) makes a request
  • AWS checks Policies attached to that principal (and sometimes to the resource)
  • The request is either Allowed or Denied

Core IAM building blocks (and how they differ)

Users

A user is an identity meant to represent a human or a long-lived application identity.

  • Users can have:
    • a console password (for AWS Console login)
    • access keys (for CLI/API)
  • Users are best for:
    • humans (students, engineers) who need direct access
  • Users are generally not best for:
    • EC2/EKS/ECS workloads (use roles instead)

Terraform resources you will commonly see:

  • aws_iam_user
  • aws_iam_access_key
  • aws_iam_user_login_profile

Groups

A group is a container for users.

  • Groups do not have credentials.
  • You attach policies to the group, and every user in the group inherits them.
  • Groups are best for:
    • “teams” or “job functions” (e.g., devops, read_only, billing_admin)

Terraform resources:

  • aws_iam_group
  • aws_iam_user_group_membership
  • aws_iam_group_policy_attachment

Policies

A policy is a JSON document that defines permissions.

  • Policies define:
    • Action: what API calls are allowed/denied (e.g., s3:GetObject)
    • Resource: what the action applies to (e.g., a bucket ARN)
    • Effect: Allow or Deny
  • Policies come in two common flavors:
    • AWS managed policies: maintained by AWS (quick to use, broad)
    • Customer managed policies: written by you (more precise)

Important principle:

  • Least privilege: grant only what is needed.

Terraform resources:

  • aws_iam_policy (customer managed)
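  • aws_iam_policy_attachment / aws_iam_role_policy_attachment / aws_iam_user_policy_attachment (prefer the role, user, and group variants; aws_iam_policy_attachment manages a policy's attachments exclusively and can detach it from principals managed elsewhere)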

Roles

A role is an identity that is assumed temporarily.

  • Roles do not have long-lived credentials stored in code.
  • When assumed, AWS returns temporary credentials via STS.
  • Roles are the standard way to grant permissions to:
    • AWS services (EC2, Lambda, EKS Pods via IRSA)
    • CI/CD systems
    • cross-account access

Key concept:

  • A role has a trust policy (who/what is allowed to assume it) and permission policies (what it can do once assumed).

Terraform resources:

  • aws_iam_role
  • aws_iam_role_policy_attachment
  • aws_iam_instance_profile (often needed for EC2)

Quick mental model (one-liners)

  • User: “A person/app identity with credentials.”
  • Group: “A way to assign policies to many users at once.”
  • Policy: “The permission rules (Allow/Deny) written as JSON.”
  • Role: “An identity assumed temporarily (best for AWS services and automation).”

Common workshop patterns (practical examples)

  • Humans:

    • Create users for students
    • Put them into a workshop_students group
    • Attach a scoped policy to the group
  • EC2 instances needing S3/CloudWatch:

    • Create an IAM role with a trust policy for ec2.amazonaws.com
    • Attach a policy like s3:GetObject and/or CloudWatch logs permissions
    • Attach it to instances via an instance profile
  • Kubernetes workloads (EKS):

    • Use roles + IRSA rather than node-wide permissions
    • Each controller/workload can get its own least-privilege role
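
The EC2 pattern above can be sketched as follows. The resource and role names are hypothetical; the policy ARN refers to the AWS managed policy AmazonS3ReadOnlyAccess, used here as a stand-in for a narrower customer managed policy.

```hcl
resource "aws_iam_role" "ec2_app" {
  name = "workshop-ec2-app-role" # hypothetical name

  # Trust policy: only the EC2 service may assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# Permission policy: what the role can do once assumed.
resource "aws_iam_role_policy_attachment" "ec2_app_s3_read" {
  role       = aws_iam_role.ec2_app.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}

# EC2 instances receive a role through an instance profile.
resource "aws_iam_instance_profile" "ec2_app" {
  name = "workshop-ec2-app-profile" # hypothetical name
  role = aws_iam_role.ec2_app.name
}
```

An instance would then reference the profile with iam_instance_profile = aws_iam_instance_profile.ec2_app.name.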

Hierarchy of Permissions

  • Users: Unique identities for people or applications.
  • Groups: Containers of users that receive policy attachments as a unit.
  • Policies: JSON documents that explicitly grant or deny permissions.
  • Roles: Identities that AWS services assume to obtain temporary credentials. For example, an EC2 instance can "assume a role" to upload files to S3 without access keys stored on the server's disk.

4.1. AWS on Terraform: IAM

Terraform Implementation

Implementation:

  1. Create a new file in your editor
  2. Add the following code block to the file

Terraform

resource "aws_iam_user" "workshop_user" {  
  name = "terraform-student"  
}

resource "aws_iam_policy" "s3_read_only" {  
  name        = "S3ReadOnlyPolicy"  
  description = "Allows read access to S3"  
  policy      = jsonencode({  
    Version = "2012-10-17"  
    Statement = [{  
      Action   = ["s3:Get*", "s3:List*"]  
      Effect   = "Allow"  
      Resource = "*"  
    }]  
  })  
}

resource "aws_iam_user_policy_attachment" "attach_s3" {  
  user       = aws_iam_user.workshop_user.name  
  policy_arn = aws_iam_policy.s3_read_only.arn  
}
  3. Save the file as iam.tf
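
As an alternative to attaching the policy directly to the user, the group pattern described in section 4 could look like the sketch below. It reuses aws_iam_user.workshop_user and aws_iam_policy.s3_read_only from iam.tf; the group and attachment names are assumptions.

```hcl
resource "aws_iam_group" "workshop_students" {
  name = "workshop_students"
}

# Put the student user into the group.
resource "aws_iam_user_group_membership" "student" {
  user   = aws_iam_user.workshop_user.name
  groups = [aws_iam_group.workshop_students.name]
}

# Every member of the group inherits this policy.
resource "aws_iam_group_policy_attachment" "students_s3_read" {
  group      = aws_iam_group.workshop_students.name
  policy_arn = aws_iam_policy.s3_read_only.arn
}
```

With the group attachment in place, the per-user aws_iam_user_policy_attachment becomes redundant and could be removed.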

5. AWS Basics: VPC Networking

VPC (Virtual Private Cloud)

An AWS Virtual Private Cloud (VPC) is a private, isolated section of the AWS cloud where you can launch resources like servers (EC2) and databases (RDS). Think of it as your own digital data center in the cloud. You have full control over the environment, including IP address ranges, subnets, and network gateways.

AWS Service Components

  • Subnets: Divide your VPC into smaller IP ranges. A public subnet has a route to an Internet Gateway; a private subnet does not.
  • Internet Gateway (IGW): The logical router that allows traffic to flow between your VPC and the public internet.
  • Security Groups: Act as a "Virtual Firewall" at the instance level. They are stateful, meaning if you allow an inbound request, the outbound response is automatically allowed.

5.1 AWS on Terraform: VPC Networking

Terraform Implementation

Implementation:

  1. Create a new file in your editor
  2. Add the following codeblock to the file

Terraform

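resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  tags                 = { Name = "workshop-vpc" }
}

resource "aws_internet_gateway" "main_gw" {
  vpc_id = aws_vpc.main.id
}

resource "aws_subnet" "public_a" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true
}

# Second public subnet, referenced later as aws_subnet.public_b by the ALB and ASG.
# The CIDR block and Availability Zone are assumptions.
resource "aws_subnet" "public_b" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.2.0/24"
  availability_zone       = "us-east-1b"
  map_public_ip_on_launch = true
}

resource "aws_security_group" "web_sg" {
  name   = "allow_http"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Terraform removes the default allow-all egress rule, so declare it explicitly.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
  3. Save the file as vpc.tf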
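
A subnet only behaves as a public subnet once its route table sends internet-bound traffic to the Internet Gateway. The sketch below adds that route for public_a; the resource names and the tag value are assumptions.

```hcl
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  # Default route: send all non-local traffic to the Internet Gateway.
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main_gw.id
  }

  tags = { Name = "workshop-public-rt" } # hypothetical tag value
}

# Associate the route table with the public subnet.
# Repeat this association for any additional public subnets.
resource "aws_route_table_association" "public_a" {
  subnet_id      = aws_subnet.public_a.id
  route_table_id = aws_route_table.public.id
}
```

Without an association, a subnet falls back to the VPC's main route table, which has no internet route by default.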

6. AWS Basics: EC2

AWS EC2 (Elastic Compute Cloud)

AWS EC2 (Elastic Compute Cloud) is a web service that provides secure, resizable compute capacity in the cloud. In simple terms, it allows you to rent virtual computers (called instances) to run your own applications.

Instead of buying physical hardware and waiting for it to be delivered and installed, you can launch an EC2 instance in minutes.

Key Features of EC2

  • Elasticity: You can increase or decrease the number of instances you have running in minutes. This is often handled automatically using Auto Scaling.

  • Full Control: You have administrative access (root or administrator) to each instance. You can choose your operating system (Linux or Windows), CPU, memory, and storage.

  • Security: Instances are located within your VPC. You control access using Security Groups, which act as a virtual firewall for your server.

  • Pay-as-you-go: You only pay for the compute power you actually use. Compute charges stop when you stop or terminate an instance, but the EBS volumes of a stopped instance are still billed until you delete them.

Understanding Instance Types

AWS offers different families of instances optimized for different workloads. A type name combines a family letter, a generation number, and a size (e.g., t3.medium is the burstable t family, generation 3, size medium).

To launch one, you need an AMI (Amazon Machine Image), which is the OS template.

6.1 AWS on Terraform: EC2

Implementation:

  1. Create a new file in your editor
  2. Add the following codeblock to the file

Step-by-Step: EC2 Lab

  1. Add Data Source:

Terraform

data "aws_ami" "latest_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }
}
  2. Add Resource:

Terraform

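resource "aws_instance" "web" {
  ami           = data.aws_ami.latest_linux.id
  instance_type = var.instance_type
  # Place the instance in the workshop VPC; without these it launches in the default VPC.
  subnet_id              = aws_subnet.public_a.id
  vpc_security_group_ids = [aws_security_group.web_sg.id]
  tags                   = { Name = "Workshop-Server" }
}
  3. Save the file as ec2.tf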

7. AWS Basics: ALB

AWS Application Load Balancer (ALB) is a service that automatically distributes incoming app traffic across multiple targets, such as EC2 instances, containers, and IP addresses.

It functions at the Application Layer (Layer 7 of the OSI model). This means it is "smart" enough to look at the content of the network traffic (like the URL path or HTTP headers) and decide where to send the request.

Core Components

  • Listener: A process that checks for connection requests using the protocol and port you configure (e.g., HTTPS on port 443).

  • Rules: These determine how the load balancer routes requests to targets. For example, a rule can send traffic to one group of servers if the URL contains /images and another if it contains /api.

  • Target Group: A logical grouping of resources (like EC2 instances) that receive the traffic. The ALB performs health checks on these targets to ensure it only sends traffic to servers that are running correctly.

Key Advantages

  • Content-Based Routing: You can host multiple applications or microservices behind a single load balancer by routing based on the host header or path.

  • High Availability: If one server in a target group fails, the ALB automatically stops sending traffic to it and redirects users to healthy servers.

  • Security: ALB supports SSL/TLS termination, meaning it can handle the heavy lifting of encrypting and decrypting web traffic so your servers don't have to.

  • Sticky Sessions: It can remember a user and ensure all their requests during a session are sent to the same backend server.

This provides high availability: if one server fails, the ALB reroutes traffic to the others.

Components

  • Listener: Checks for connection requests on a specific port (e.g., 80).
  • Target Group: The pool of EC2 instances that receive the traffic.
  • Health Checks: The ALB regularly sends health check requests to each target; if a target stops responding, it is marked as "unhealthy" and receives no new traffic.

7.1 AWS on Terraform: ALB

Terraform Implementation

Implementation:

  1. Create a new file in your editor
  2. Add the following codeblock to the file

Terraform

resource "aws_lb" "web_alb" {  
  name               = "workshop-alb"  
  internal           = false  
  load_balancer_type = "application"  
  security_groups    = [aws_security_group.web_sg.id]  
  subnets            = [aws_subnet.public_a.id, aws_subnet.public_b.id]  
}

resource "aws_lb_target_group" "web_tg" {  
  name     = "web-target-group"  
  port     = 80  
  protocol = "HTTP"  
  vpc_id   = aws_vpc.main.id  
}

resource "aws_lb_listener" "http" {  
  load_balancer_arn = aws_lb.web_alb.arn  
  port              = "80"  
  protocol          = "HTTP"

  default_action {  
    type             = "forward"  
    target_group_arn = aws_lb_target_group.web_tg.arn  
  }  
}
  3. Save the file as alb.tf
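
The target group above starts empty, so the listener has nowhere to forward requests. A minimal sketch of registering the EC2 instance from section 6.1 is shown below; the attachment and output names are assumptions.

```hcl
# Register the single EC2 instance as a target on port 80.
resource "aws_lb_target_group_attachment" "web" {
  target_group_arn = aws_lb_target_group.web_tg.arn
  target_id        = aws_instance.web.id
  port             = 80
}

# Expose the load balancer's DNS name for testing in a browser or with curl.
output "alb_dns_name" {
  value = aws_lb.web_alb.dns_name
}
```

Targets only receive traffic once they pass health checks, so the instance needs a web server answering on port 80.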

8. Terraform Modules

Modules are self-contained packages of Terraform configurations. They allow you to build a "Standard VPC" or "Standard Database" blueprint once and reuse it across multiple projects.

Why use them?

  • Encapsulation: Hide complex logic behind simple input variables.
  • Standardization: Ensure every team in your company builds infrastructure the same way.
  • Versioning: Update the module code in one place and have users "upgrade" their stacks when ready.

Now we will put everything together.

8.1 Terraform Modules

Terraform Implementation

  1. Create a folder modules/iam

  2. Copy the iam.tf to the file modules/iam/main.tf

  3. Create a folder modules/vpc

  4. Copy the vpc.tf to the file modules/vpc/main.tf

  5. Create a folder modules/ec2

  6. Copy the ec2.tf to the file modules/ec2/main.tf

  7. Create a folder modules/alb

  8. Copy the alb.tf to modules/alb/main.tf

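  9. In your root main.tf, call each of the modules. This layout is enough for TerraViz visualization; to run terraform plan, cross-module references such as aws_vpc.main.id must be passed in as module input variables, because a module cannot see resources declared in another module: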

Terraform

module "iam" {  
  source      = "./modules/iam"   
}

module "vpc" {
  source      = "./modules/vpc"
}

module "ec2" {
  source      = "./modules/ec2"
}

module "alb" {
  source      = "./modules/alb"
}
  10. Put the entire directory into a zip file so it can be uploaded to TerraViz.
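
For Live mode, modules need explicit wiring, because a resource in one module is not visible from another. The sketch below shows one possible shape for the vpc and alb modules; the output and variable names are assumptions, and it presumes a second public subnet named public_b, as referenced in alb.tf.

```hcl
# modules/vpc/outputs.tf: expose the values other modules need
output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnet_ids" {
  value = [aws_subnet.public_a.id, aws_subnet.public_b.id]
}

output "web_sg_id" {
  value = aws_security_group.web_sg.id
}

# modules/alb/variables.tf: declare the inputs the module expects
variable "vpc_id" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "security_group_ids" {
  type = list(string)
}

# root main.tf: these calls replace the input-less vpc and alb calls
module "vpc" {
  source = "./modules/vpc"
}

module "alb" {
  source             = "./modules/alb"
  vpc_id             = module.vpc.vpc_id
  subnet_ids         = module.vpc.public_subnet_ids
  security_group_ids = [module.vpc.web_sg_id]
}
```

Inside modules/alb/main.tf, references such as aws_vpc.main.id would then become var.vpc_id, and the subnet and security group lists would come from var.subnet_ids and var.security_group_ids. The ec2 module would follow the same pattern.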

Advanced Topics

9. Terraform State

The State File is Terraform's memory. For teams, we move this to Amazon S3 (storage) and use DynamoDB (database) for State Locking. This prevents two people from running apply simultaneously and corrupting the file.

Why remote state & locking? (300-level)

  • Shared source of truth: everyone sees the same infrastructure status.
  • Concurrency safety: DynamoDB locking prevents two applies at once.
  • Recoverability: state stored in S3 can be versioned and backed up.

Step-by-Step: Remote State

  1. Configure Backend:

Terraform

terraform {  
  backend "s3" {  
    bucket         = "my-tf-state-bucket"  
    key            = "global/s3/terraform.tfstate"  
    region         = "us-east-1"  
    dynamodb_table = "tf-lock-table"  
    encrypt        = true
  }  
}
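  2. Migrate: Create the S3 bucket and DynamoDB table first, since the backend does not create them. Then run terraform init -migrate-state and type yes to copy the existing local state to S3.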
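
The bucket and lock table named in the backend block must exist before terraform init can use them. A minimal sketch of creating them, typically from a separate small configuration that keeps local state, is shown below. The resource names are assumptions; the bucket and table names match the backend block above.

```hcl
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-tf-state-bucket" # bucket names are globally unique; choose your own
}

# Versioning keeps earlier copies of the state file for recovery.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "tf_lock" {
  name         = "tf-lock-table"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the S3 backend expects this exact key name

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Keeping these resources in their own configuration avoids the circular situation of a backend that stores the state of the bucket it lives in.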

10. Auto Scaling Groups (ASG)

An Auto Scaling Group (ASG) is an Amazon EC2 Auto Scaling resource that runs a fleet of EC2 instances and automatically adjusts how many instances are running based on:

  • demand (CPU, requests, queue depth)
  • time (schedule)
  • instance health (replace unhealthy instances)
  • desired availability (multi-AZ)

Using Terraform for ASGs is a core pattern for production automation because you can:

  • version-control the compute configuration
  • standardize “how we run instances” across teams
  • evolve capacity and scaling policies safely via PRs

Key terms and concepts (definitions)

Auto Scaling Group (ASG)

  • A managed group of EC2 instances.
  • You specify a desired capacity (how many instances you want), plus min and max bounds.
  • ASG can run across multiple subnets / Availability Zones for high availability.

Launch Template

  • A reusable “instance blueprint”:
    • AMI
    • instance type
    • security groups
    • IAM instance profile
    • user data (bootstrap script)
  • In Terraform this is aws_launch_template.

Target Group (ALB/NLB)

  • A set of IPs/instances that a load balancer routes traffic to.
  • ASGs can attach to a target group so instances automatically register/deregister.

Health checks

  • EC2 health check: instance-level status checks.
  • ELB health check: the load balancer target group health check. ASG can use this for smarter replacement.

Scaling policy

  • Rules that change desired capacity.
  • Common styles:
    • Target tracking (recommended default): “keep average CPU near 50%”.
    • Step scaling: “if CPU > 80% add 2 instances”.
    • Scheduled scaling: “scale up at 9am, down at 6pm”.
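
Of the three styles, step scaling is the one that needs an explicit CloudWatch alarm. The sketch below illustrates the "if CPU > 80% add 2 instances" rule; it refers to aws_autoscaling_group.web from Example 1, and the policy name, alarm name, period, and evaluation count are assumptions.

```hcl
resource "aws_autoscaling_policy" "cpu_step_up" {
  name                   = "cpu-step-scale-up" # hypothetical name
  policy_type            = "StepScaling"
  adjustment_type        = "ChangeInCapacity"
  autoscaling_group_name = aws_autoscaling_group.web.name

  # Add 2 instances whenever the alarm threshold is breached.
  step_adjustment {
    metric_interval_lower_bound = 0
    scaling_adjustment          = 2
  }
}

resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "workshop-asg-cpu-high" # hypothetical name
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 80
  period              = 60 # seconds per evaluation window (assumed)
  evaluation_periods  = 2  # consecutive breaches before the alarm fires (assumed)

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.web.name
  }

  # Firing the alarm invokes the step scaling policy.
  alarm_actions = [aws_autoscaling_policy.cpu_step_up.arn]
}
```

Target tracking creates and manages equivalent alarms for you, which is one reason it is the recommended default.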

When to use an ASG vs single EC2

  • Use single EC2 when:

    • it’s a demo, a dev sandbox, or a one-off host
    • you can tolerate downtime
  • Use ASG when:

    • you want automation, self-healing, and capacity changes
    • you’re behind an ALB/NLB
    • you need multi-AZ resilience

Example 1: Minimal ASG (no load balancer)

This example demonstrates the smallest set of resources required to automate a fleet.

Terraform

data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }
}

resource "aws_launch_template" "web" {
  name_prefix   = "workshop-web-"
  image_id      = data.aws_ami.amazon_linux.id
  instance_type = "t3.micro"

  vpc_security_group_ids = [aws_security_group.web_sg.id]

  user_data = base64encode(<<-EOF
    #!/bin/bash
    set -e
    dnf -y update
    dnf -y install nginx
    systemctl enable nginx
    systemctl start nginx
    echo "hello from ASG" > /usr/share/nginx/html/index.html
  EOF
  )

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "workshop-asg-web"
    }
  }
}

resource "aws_autoscaling_group" "web" {
  name                = "workshop-asg-web"
  min_size            = 1
  desired_capacity    = 2
  max_size            = 4
  vpc_zone_identifier = [aws_subnet.public_a.id, aws_subnet.public_b.id]

  health_check_type         = "EC2"
  health_check_grace_period = 120

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  tag {
    key                 = "Environment"
    value               = terraform.workspace
    propagate_at_launch = true
  }
}

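Example 2: ASG behind an ALB (target group attachment)

This pattern connects the ASG to a target group so instances automatically join/leave the load balancer. The aws_autoscaling_group.web and aws_lb_listener.http blocks below replace the earlier definitions with the same names (Example 1 and section 7.1); Terraform rejects duplicate resource addresses in one module.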

Terraform

resource "aws_lb_target_group" "web" {
  name     = "workshop-asg-tg"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    path                = "/"
    healthy_threshold   = 2
    unhealthy_threshold = 2
    interval            = 15
    timeout             = 5
    matcher             = "200"
  }
}

resource "aws_autoscaling_group" "web" {
  name                = "workshop-asg-web"
  min_size            = 1
  desired_capacity    = 2
  max_size            = 6
  vpc_zone_identifier = [aws_subnet.public_a.id, aws_subnet.public_b.id]

  health_check_type         = "ELB"
  health_check_grace_period = 180

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  target_group_arns = [aws_lb_target_group.web.arn]
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.web_alb.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}

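Example 3: Target tracking scaling policy

Target tracking is usually the simplest and most stable choice: you pick a metric and a target value (here, 50% average CPU), and Auto Scaling adds or removes instances to stay near it.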

Terraform

resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-tracking"
  policy_type            = "TargetTrackingScaling"
  autoscaling_group_name = aws_autoscaling_group.web.name

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }

    target_value = 50.0
  }
}
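
CPU is not always the best signal for a web tier. As a hedged variation on the policy above, the sketch below tracks requests per target instead. It assumes the aws_lb.web_alb, aws_lb_target_group.web, and aws_lb_listener.http resources from the target group example; the resource name requests_target and the target of 500 requests per instance are illustrative, not recommendations.

```hcl
resource "aws_autoscaling_policy" "requests_target" {
  name                   = "alb-requests-target-tracking"
  policy_type            = "TargetTrackingScaling"
  autoscaling_group_name = aws_autoscaling_group.web.name

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      # AWS expects: <load-balancer-arn-suffix>/<target-group-arn-suffix>
      resource_label = "${aws_lb.web_alb.arn_suffix}/${aws_lb_target_group.web.arn_suffix}"
    }

    # Illustrative value: average requests per instance to aim for.
    target_value = 500
  }

  # The target group must already be attached to the ALB via a listener.
  depends_on = [aws_lb_listener.http]
}
```

An ASG can carry more than one target tracking policy; Auto Scaling then uses whichever policy calls for the larger capacity.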

Example 4: Scheduled scaling (time-based automation)

Scheduled scaling is ideal for predictable traffic patterns (e.g., business hours).

Terraform

resource "aws_autoscaling_schedule" "scale_up_morning" {
  scheduled_action_name  = "scale-up-morning"
  autoscaling_group_name = aws_autoscaling_group.web.name
  recurrence             = "0 9 * * 1-5"  # Mon-Fri 09:00 UTC
  desired_capacity       = 4
  min_size               = 2
  max_size               = 8
}

resource "aws_autoscaling_schedule" "scale_down_evening" {
  scheduled_action_name  = "scale-down-evening"
  autoscaling_group_name = aws_autoscaling_group.web.name
  recurrence             = "0 18 * * 1-5" # Mon-Fri 18:00 UTC
  desired_capacity       = 2
  min_size               = 1
  max_size               = 6
}
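
One side effect worth knowing: scaling policies and scheduled actions change the group's capacity outside Terraform, so a later terraform apply may try to reset desired_capacity to the configured value. A common mitigation, sketched here as a fragment to merge into the existing ASG resource rather than a second definition, is to ignore that attribute.

```hcl
resource "aws_autoscaling_group" "web" {
  # ...existing config...

  lifecycle {
    # Let scaling policies and scheduled actions own the running capacity.
    # Terraform still manages every other setting on the group.
    ignore_changes = [desired_capacity]
  }
}
```

Scheduled actions that set min_size and max_size have the same interaction. Add those attributes to the list only if you want the schedule, not the Terraform configuration, to be the source of truth for them.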

Example 5: Rolling updates with Instance Refresh (safe deployments)

When you change the launch template (AMI, user data, instance type), you usually want a controlled rollout.

Terraform

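resource "aws_autoscaling_group" "web" {
  # ...existing config...

  launch_template {
    id = aws_launch_template.web.id
    # Reference the numeric latest version. With "$Latest" the ASG arguments
    # never change, so Terraform would not start a refresh.
    version = aws_launch_template.web.latest_version
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 90
      instance_warmup        = 120
    }
    # No "triggers" entry needed: a launch_template change always starts a refresh.
  }
}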

Example 6: Mixed instances (cost + resilience)

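This lets the ASG run multiple instance types (and optionally Spot) to improve capacity availability and reduce cost. The mixed_instances_policy block takes the place of the top-level launch_template block used in the earlier examples; the two cannot be set together.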

Terraform

resource "aws_autoscaling_group" "web" {
  name                = "workshop-asg-web"
  min_size            = 1
  desired_capacity    = 2
  max_size            = 8
  vpc_zone_identifier = [aws_subnet.public_a.id, aws_subnet.public_b.id]

  mixed_instances_policy {
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.web.id
        version            = "$Latest"
      }

      override { instance_type = "t3.micro" }
      override { instance_type = "t3.small" }
      override { instance_type = "t3a.micro" }
    }

    instances_distribution {
      on_demand_base_capacity                  = 1
      on_demand_percentage_above_base_capacity = 50
      spot_allocation_strategy                 = "capacity-optimized"
    }
  }
}

Operational notes (what teams usually get wrong)

  • Put ASGs that serve traffic behind an ALB/NLB so instances can be replaced without clients noticing.
  • Prefer TargetTrackingScaling first; add more complex policies only if needed.
  • Use ELB health checks when using a load balancer.
  • Use Instance Refresh for safe rollouts when you change AMIs/user data.
  • Keep min_size >= 2 for high availability in production (multi-AZ).

11. Terraform Workspaces

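Workspaces let you use the same code to manage separate environments (Dev, Staging, Prod). Each workspace keeps its own state, so a plan or apply in "dev" only touches the resources recorded in the dev state. Workspaces share the same backend and credentials, so treat them as a convenience for separating state, not as a hard security boundary between environments.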

Commands

  • terraform workspace new dev
  • terraform workspace select dev

Terraform Implementation

Use the terraform.workspace value (written as ${terraform.workspace} inside strings) to name and size resources per environment.

Terraform

resource "aws_instance" "app_server" {
  # Same AMI data source used by the launch template in the ASG section.
  ami           = data.aws_ami.amazon_linux.id
  instance_type = terraform.workspace == "prod" ? "t3.medium" : "t2.micro"

  tags = {
    Name        = "server-${terraform.workspace}"
    Environment = terraform.workspace
  }
}
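
Once more than one setting differs per environment, chained conditionals get hard to read. One possible alternative, sketched below, keeps per-workspace values in a single map. The names env_settings and settings and the values in the map are hypothetical and only illustrate the shape.

```hcl
locals {
  # Hypothetical per-environment settings; adjust names and sizes to your needs.
  env_settings = {
    dev     = { instance_type = "t3.micro", min_size = 1 }
    staging = { instance_type = "t3.small", min_size = 1 }
    prod    = { instance_type = "t3.medium", min_size = 2 }
  }

  # Fall back to the dev settings for any workspace not listed (including "default").
  settings = lookup(local.env_settings, terraform.workspace, local.env_settings["dev"])
}

output "selected_instance_type" {
  value = local.settings.instance_type
}
```

Resources can then reference local.settings.instance_type or local.settings.min_size instead of repeating a conditional in each block.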

12. Managing EKS (Kubernetes) Clusters Using Terraform

Amazon EKS (Elastic Kubernetes Service) is AWS’s managed Kubernetes control plane. Terraform lets you provision:

  • the EKS cluster (control plane + networking)
  • worker nodes (managed node groups)
  • IAM integration for Pods (IRSA)
  • cluster add-ons (CoreDNS, VPC CNI, kube-proxy, EBS CSI)
  • optional Kubernetes objects (namespaces, config maps, service accounts)

This section focuses on infrastructure automation for Kubernetes on AWS: creating a production-ready baseline you can replicate across environments.

Key terms and concepts (definitions)

Kubernetes (K8s)

  • An orchestration platform for running containers.
  • You deploy workloads (Pods) and Kubernetes schedules them onto worker nodes.

EKS Cluster / Control plane

  • Managed by AWS: Kubernetes API server, etcd, control plane components.
  • You pay for the cluster control plane and then separately for worker nodes.

Worker nodes

  • EC2 instances that run your Pods.
  • In EKS you normally use:
    • Managed Node Groups (aws_eks_node_group): AWS manages the underlying ASG.
    • Fargate profiles: serverless Pods (not covered deeply here).

kubeconfig

  • A local config file that tells kubectl how to talk to a Kubernetes cluster.
  • For EKS, authentication uses AWS IAM.

OIDC provider

  • EKS exposes an OIDC issuer for the cluster.
  • Terraform can create an IAM OIDC provider so IAM roles can be assumed by Kubernetes service accounts.

IRSA (IAM Roles for Service Accounts)

  • A pattern where Kubernetes Pods assume AWS IAM roles without node-level credentials.
  • You create:
    • an IAM role with a trust policy for the cluster’s OIDC provider
    • a Kubernetes service account annotated with that role ARN

Add-ons

  • EKS-managed components installed into the cluster (CoreDNS, VPC CNI, kube-proxy, EBS CSI).
  • Managed add-ons simplify upgrades and lifecycle.

Target architecture (baseline)

  • EKS control plane (managed by AWS) connected to your VPC through network interfaces in the cluster subnets
  • Private subnets for worker nodes (recommended)
  • Optional public subnets for ALB/NLB
  • Node group spanning at least 2 AZs
  • IRSA enabled for controllers (ALB controller, external-dns, cluster-autoscaler, etc.)

Example 1: Minimal EKS cluster + managed node group (Terraform-only)

This creates:

  • an EKS cluster
  • a managed node group

To keep the example short, both the cluster and the node group use the workshop's public subnets. Nodes in public subnets only join the cluster if those subnets auto-assign public IPs (map_public_ip_on_launch = true); for production, place nodes in private subnets with NAT as recommended above.

Terraform

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
    tls = {
      source  = "hashicorp/tls"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

data "aws_caller_identity" "current" {}

resource "aws_iam_role" "eks_cluster" {
  name = "workshop-eks-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = { Service = "eks.amazonaws.com" }
        Action = "sts:AssumeRole"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_AmazonEKSClusterPolicy" {
  role       = aws_iam_role.eks_cluster.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}

resource "aws_eks_cluster" "workshop" {
  name     = "workshop-eks-${terraform.workspace}"
  version  = "1.30"
  role_arn = aws_iam_role.eks_cluster.arn

  vpc_config {
    subnet_ids = [aws_subnet.public_a.id, aws_subnet.public_b.id]
  }

  depends_on = [aws_iam_role_policy_attachment.eks_cluster_AmazonEKSClusterPolicy]
}

resource "aws_iam_role" "eks_nodes" {
  name = "workshop-eks-node-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = { Service = "ec2.amazonaws.com" }
        Action = "sts:AssumeRole"
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "node_AmazonEKSWorkerNodePolicy" {
  role       = aws_iam_role.eks_nodes.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}

resource "aws_iam_role_policy_attachment" "node_AmazonEKS_CNI_Policy" {
  role       = aws_iam_role.eks_nodes.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}

resource "aws_iam_role_policy_attachment" "node_AmazonEC2ContainerRegistryReadOnly" {
  role       = aws_iam_role.eks_nodes.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}

resource "aws_eks_node_group" "managed" {
  cluster_name    = aws_eks_cluster.workshop.name
  node_group_name = "managed-ng"
  node_role_arn   = aws_iam_role.eks_nodes.arn

  subnet_ids = [aws_subnet.public_a.id, aws_subnet.public_b.id]

  scaling_config {
    desired_size = 2
    min_size     = 2
    max_size     = 6
  }

  instance_types = ["t3.medium"]
  capacity_type  = "ON_DEMAND"

  update_config {
    max_unavailable = 1
  }

  depends_on = [
    aws_iam_role_policy_attachment.node_AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.node_AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.node_AmazonEC2ContainerRegistryReadOnly,
  ]
}

Example 2: Exporting kubeconfig access (getting kubectl working)

Terraform can output the data needed to configure access.

Terraform

data "aws_eks_cluster" "workshop" {
  name = aws_eks_cluster.workshop.name
}

data "aws_eks_cluster_auth" "workshop" {
  name = aws_eks_cluster.workshop.name
}

output "eks_endpoint" {
  value = data.aws_eks_cluster.workshop.endpoint
}

output "eks_ca" {
  value = data.aws_eks_cluster.workshop.certificate_authority[0].data
}

To configure local access, common approaches:

  • Use AWS CLI:
    • aws eks update-kubeconfig --name workshop-eks-dev --region us-east-1
  • Or manage a dedicated kubeconfig file per workspace.
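
Because the cluster name includes the workspace, the exact CLI command differs per environment. A small convenience, sketched here under the assumption that var.aws_region is the same variable used by the provider block, is to have Terraform print the command for you. The output name kubeconfig_command is illustrative.

```hcl
output "kubeconfig_command" {
  description = "Run this command to add the cluster to your local kubeconfig."
  value       = "aws eks update-kubeconfig --name ${aws_eks_cluster.workshop.name} --region ${var.aws_region}"
}
```

After an apply, terraform output -raw kubeconfig_command prints the command for the currently selected workspace.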

Example 3: Enable OIDC + IRSA (IAM Roles for Service Accounts)

This is the key mechanism for giving Pods AWS permissions safely.

Terraform

data "tls_certificate" "eks" {
  url = aws_eks_cluster.workshop.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "eks" {
  url = aws_eks_cluster.workshop.identity[0].oidc[0].issuer

  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
}

resource "aws_iam_role" "irsa_example" {
  name = "workshop-irsa-example"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = "sts:AssumeRoleWithWebIdentity"
        Principal = { Federated = aws_iam_openid_connect_provider.eks.arn }
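        Condition = {
          StringEquals = {
            # Restrict the role to one service account and to tokens issued for STS.
            "${replace(aws_eks_cluster.workshop.identity[0].oidc[0].issuer, "https://", "")}:sub" = "system:serviceaccount:kube-system:aws-lb-controller"
            "${replace(aws_eks_cluster.workshop.identity[0].oidc[0].issuer, "https://", "")}:aud" = "sts.amazonaws.com"
          }
        }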
      }
    ]
  })
}

At this point you typically also:

  • Attach IAM policies to aws_iam_role.irsa_example
  • Create a Kubernetes service account annotated with that role ARN

Example 4: Install EKS managed add-ons

Add-ons are commonly managed by Terraform so upgrades are repeatable.

Terraform

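resource "aws_eks_addon" "coredns" {
  cluster_name = aws_eks_cluster.workshop.name
  addon_name   = "coredns"

  # CoreDNS Pods need nodes to run on; without them the add-on stays degraded.
  depends_on = [aws_eks_node_group.managed]
}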

resource "aws_eks_addon" "kube_proxy" {
  cluster_name = aws_eks_cluster.workshop.name
  addon_name   = "kube-proxy"
}

resource "aws_eks_addon" "vpc_cni" {
  cluster_name = aws_eks_cluster.workshop.name
  addon_name   = "vpc-cni"
}
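
The EBS CSI driver mentioned in the add-on list is not shown above because it also needs AWS permissions. A minimal sketch, assuming the OIDC provider from Example 3 exists and that the driver uses its default service account (ebs-csi-controller-sa in kube-system), might look like this. The resource names and the oidc_host local are illustrative.

```hcl
locals {
  # Issuer URL without the scheme, as used in IAM condition keys.
  oidc_host = replace(aws_eks_cluster.workshop.identity[0].oidc[0].issuer, "https://", "")
}

resource "aws_iam_role" "ebs_csi" {
  name = "workshop-ebs-csi-${terraform.workspace}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Action    = "sts:AssumeRoleWithWebIdentity"
        Principal = { Federated = aws_iam_openid_connect_provider.eks.arn }
        Condition = {
          StringEquals = {
            "${local.oidc_host}:aud" = "sts.amazonaws.com"
            "${local.oidc_host}:sub" = "system:serviceaccount:kube-system:ebs-csi-controller-sa"
          }
        }
      }
    ]
  })
}

# AWS managed policy intended for the EBS CSI driver.
resource "aws_iam_role_policy_attachment" "ebs_csi" {
  role       = aws_iam_role.ebs_csi.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
}

resource "aws_eks_addon" "ebs_csi" {
  cluster_name             = aws_eks_cluster.workshop.name
  addon_name               = "aws-ebs-csi-driver"
  service_account_role_arn = aws_iam_role.ebs_csi.arn

  # The driver's controller Pods need nodes to schedule onto.
  depends_on = [aws_eks_node_group.managed]
}
```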

Example 5: Managing Kubernetes objects with Terraform (Kubernetes provider)

Terraform can manage resources inside the cluster (namespaces, config maps, service accounts).

Terraform

provider "kubernetes" {
  host                   = aws_eks_cluster.workshop.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.workshop.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.workshop.token
}

resource "kubernetes_namespace" "apps" {
  metadata {
    name = "apps"
  }
}
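
The token from aws_eks_cluster_auth is short-lived, so long applies can fail partway with authentication errors. One commonly used alternative, shown here as a sketch, has the provider fetch a fresh token through the AWS CLI on each call. It assumes the AWS CLI is installed where Terraform runs and that hashicorp/kubernetes is declared in the existing required_providers block. Use it instead of the provider block above, not alongside it.

```hcl
provider "kubernetes" {
  host                   = aws_eks_cluster.workshop.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.workshop.certificate_authority[0].data)

  # Ask the AWS CLI for a token whenever the provider needs one.
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", aws_eks_cluster.workshop.name]
  }
}
```

Many teams also keep in-cluster objects in a separate Terraform configuration from the cluster itself, so the provider is never configured from a cluster that does not exist yet.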

Example 6: Safe upgrades (cluster version + node group rollout)

In production you typically upgrade in this order:

  1. Upgrade EKS cluster version
  2. Upgrade add-ons (coredns, kube-proxy, vpc-cni)
  3. Roll node groups (managed node group update)
  4. Roll workloads if necessary

Terraform tactics:

  • Update aws_eks_cluster.workshop.version
  • Keep update_config.max_unavailable = 1 in node groups
  • Use separate node groups for blue/green capacity if you need zero downtime
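
To make the upgrade order above repeatable, the target version can be driven by a single variable, and add-on versions can be looked up for that version rather than hard-coded. The sketch below is one way to do it; the variable name eks_version and the output name are hypothetical.

```hcl
variable "eks_version" {
  description = "Kubernetes version for the cluster; raise one minor version at a time."
  type        = string
  default     = "1.30"
}

# Look up the newest CoreDNS add-on build published for the target version.
data "aws_eks_addon_version" "coredns" {
  addon_name         = "coredns"
  kubernetes_version = var.eks_version
  most_recent        = true
}

output "coredns_version_for_target" {
  value = data.aws_eks_addon_version.coredns.version
}
```

With this in place, aws_eks_cluster.workshop can set version = var.eks_version, and the coredns add-on can set addon_version = data.aws_eks_addon_version.coredns.version, so one variable change plans the cluster and add-on upgrades together.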

13. Clean Up

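terraform destroy removes everything Terraform manages in the current workspace. Terraform reads the state file, identifies every resource it tracks, and deletes them in reverse dependency order (e.g., instances first, then subnets, then the VPC). Resources created outside Terraform, such as load balancers provisioned by Kubernetes controllers, are not in state and must be removed separately.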

Step-by-Step Cleanup

  1. Preview: Run terraform plan -destroy.
  2. Execute: Run terraform destroy. Type yes.
  3. Workspace Cleanup: Remember to switch workspaces (terraform workspace select prod) and run destroy for each environment to avoid orphaned costs.

Appendix

Technical Prerequisites (2026 Edition)

Before starting, ensure all participants have the following versions installed:

  • Terraform CLI: 1.14+
  • AWS CLI: 2.x
  • Git: 2.40+ (any recent version is fine)
  • Terraform AWS Provider: ~> 6.0

Diagram: Local workstation toolchain (Terraform + AWS CLI + Git → AWS APIs)

🛠️ Final Consolidated Source Code (main.tf)

Terraform

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

locals { env = terraform.workspace }

# Network Layer
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  tags       = { Name = "vpc-${local.env}" }
}

# An internet-facing ALB can only be created in a VPC with an internet gateway.
resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
}

resource "aws_subnet" "pub_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
}

# Second subnet: an ALB needs subnets in at least two Availability Zones.
resource "aws_subnet" "pub_b" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = "us-east-1b"
}

# ALB & Security Group
resource "aws_security_group" "lb_sg" {
  vpc_id = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_lb" "app_lb" {
  name               = "alb-${local.env}"
  load_balancer_type = "application"
  security_groups    = [aws_security_group.lb_sg.id]
  subnets            = [aws_subnet.pub_a.id, aws_subnet.pub_b.id]

  depends_on = [aws_internet_gateway.igw]
}

# EC2 Compute
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-*-x86_64"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"
  subnet_id     = aws_subnet.pub_a.id
  tags          = { Name = "server-${local.env}" }
}