Cloud Incident Response

Blue Team · Expert

Master cloud-native incident response — reading CloudTrail events to reconstruct attack timelines, detecting IAM privilege escalation and persistence techniques, understanding the shared responsibility model, and applying AWS-native containment and eradication procedures that preserve forensic evidence while stopping active compromise.

Expert Blue Team Path ⏱ 28 min read

What is it?

Cloud IR follows the same principles as traditional incident response but requires understanding cloud-specific attack surfaces, log sources, and attacker techniques. Cloud environments present unique challenges: infrastructure is ephemeral (instances can be terminated, losing volatile evidence), logs may not persist by default, and the blast radius of a compromised cloud identity can be enormous — a single IAM key with excessive permissions can compromise an entire cloud estate within minutes.

This lab focuses on AWS, but the principles apply to Azure and GCP. The attacker techniques are similar across platforms; only the log source names and CLI syntax differ. The fundamental principle — that identity is the security boundary in cloud environments — is universal.

🚨IAM is the perimeter: In cloud environments, identity and access management IS the security boundary. A leaked IAM key or SSRF that reaches the metadata service can give an attacker the same access as a legitimate administrator — from anywhere on the internet, with no network-layer controls to stop them.

How Cloud IR Differs from Traditional IR

Traditional on-premises incident response assumes a relatively stable infrastructure: servers stay running, logs persist on disk, and the network perimeter provides some control over attacker movement. Cloud environments break all three of these assumptions in ways that require fundamentally different response procedures:

Ephemeral infrastructure: Auto-scaling groups launch and terminate instances constantly. An instance involved in a security event may be terminated by normal scaling operations before you can capture its memory or disk state. Cloud IR requires proactive evidence preservation (snapshotting EBS volumes and enabling instance termination protection) before attempting any investigation or remediation. Termination protection does not stop an Auto Scaling group from terminating an instance, so instances in a group also need scale-in protection or detachment from the group.
Default log gaps: AWS CloudTrail is enabled by default for management API events, but S3 data events, Lambda invocations, and RDS query logs require explicit enablement. Many cloud breaches cannot be fully reconstructed because the relevant logs were never enabled. The first hardening question for cloud IR readiness is: what are we not logging?
Identity-based lateral movement: In on-premises environments, lateral movement requires network access. In cloud environments, an attacker with stolen IAM credentials can pivot to any authorised resource from their own laptop — bypassing all network-layer controls. The blast radius is defined by IAM permissions, not network topology.
Shared responsibility: AWS manages the security of the cloud infrastructure; you manage the security of what you put in the cloud. AWS will not respond to your incident — that is entirely your responsibility. Understanding what AWS logs are available, how to access them under time pressure, and what they do and don't contain is prerequisite knowledge for cloud IR.

📌 Non-Technical Analogy — Cloud IR vs Traditional IR

Traditional IR is like investigating a burglary in a building you own — the rooms are still there, the CCTV footage is still on the DVR, and you can take your time examining everything. Cloud IR is like investigating a crime in a hotel where rooms are automatically reassigned every few hours, security footage is only recorded for 90 days by default and requires you to specifically request the right footage in advance, the hotel's security team won't help you because their responsibility ends at the building — what happens inside the room is yours — and the suspect used a valid electronic keycard they obtained through a locksmith rather than breaking a window. Different tools, different assumptions, different urgency around evidence preservation.

Cloud attack surface

Cloud-Specific Attack Techniques

AWS Attack Techniques and Primary Log Sources

Credential theft      IAM key from GitHub, code repos, env vars, SSRF → metadata service
Privilege escalation  iam:PassRole, iam:CreatePolicyVersion, iam:AttachUserPolicy, iam:AddUserToGroup
Persistence           New IAM user, access key creation, Lambda backdoor, SSO hijack
Discovery             DescribeInstances, ListBuckets, GetCallerIdentity, ListRoles
Lateral movement      AssumeRole, switch between accounts, cross-account trust abuse
Exfiltration          S3 GetObject bulk download, RDS snapshot export, EBS snapshot share
Impact                EC2 cryptomining, S3 ransomware (object deletion/encryption), route table

Primary forensic log source: AWS CloudTrail (management events by default, 90-day Event History; data events require a trail)
Secondary sources: VPC Flow Logs, S3 Access Logs, CloudWatch Logs, GuardDuty findings

The IAM Privilege Escalation Landscape

IAM privilege escalation is one of the most critical categories for cloud IR analysts to understand. Attackers who obtain limited IAM credentials frequently use specific API call sequences to expand their permissions toward full administrator access. The Pacu framework documents over 30 distinct escalation paths from different starting IAM permission sets. Each path leaves a specific CloudTrail signature that a well-tuned alert rule can detect.

The most commonly observed paths in real incidents are: iam:CreatePolicyVersion (create a new policy version with AdministratorAccess and make it active), iam:AttachUserPolicy (attach an existing AWS-managed admin policy directly to the attacker's user), and iam:PassRole combined with service abuse (pass a highly-privileged role to an EC2 instance or Lambda, then access that service to execute code with the elevated permissions). All three share a common detection pattern: IAM modification events originating from an unexpected source IP or outside business hours.
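
The shared detection pattern can be sketched in Python. This is a minimal illustration, not a production rule: the event dictionaries mirror CloudTrail record fields (eventSource, eventName, sourceIPAddress, eventTime), while the function name, the trusted network list, and the 08:00 to 18:00 UTC business-hours window are assumptions made for the example. iam:PassRole is left out because it is a permission rather than a logged API call.

```python
from datetime import datetime, time
from ipaddress import ip_address, ip_network

# IAM API calls associated with the escalation paths discussed above.
ESCALATION_EVENTS = {
    "CreatePolicyVersion",
    "SetDefaultPolicyVersion",
    "AttachUserPolicy",
    "AddUserToGroup",
    "CreateAccessKey",
}


def suspicious_iam_change(event, trusted_networks,
                          work_start=time(8, 0), work_end=time(18, 0)):
    """Return the reasons an IAM modification event looks suspicious.

    An empty list means the event is not flagged. Times are treated as UTC,
    which is how CloudTrail records eventTime.
    """
    if event.get("eventSource") != "iam.amazonaws.com":
        return []
    if event.get("eventName") not in ESCALATION_EVENTS:
        return []

    reasons = []
    try:
        addr = ip_address(event.get("sourceIPAddress", ""))
        if not any(addr in ip_network(net) for net in trusted_networks):
            reasons.append("unexpected source IP")
    except ValueError:
        # sourceIPAddress can be an AWS service name rather than an IP.
        pass

    when = datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ").time()
    if not (work_start <= when < work_end):
        reasons.append("outside business hours")
    return reasons
```

A rule like this is noisy on its own; in practice it would feed an alert queue alongside GuardDuty findings rather than page anyone directly.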

Examples

Cloud IR Investigation in Practice

Example 01: Detecting IAM key exposure from GitHub

When an IAM key is exposed in a public repository, automated bots typically use it within minutes. CloudTrail shows API calls from unexpected sources providing the timeline of compromise.

# Developer pushes code with AWS key to public GitHub repo
# Within 8 minutes, CloudTrail shows:
Time: 14:33:01  Event: GetCallerIdentity  UserAgent: aws-cli/2.x
  Source IP: 185.220.101.45  (Tor exit node -- not developer's IP)
Time: 14:33:15  Event: ListBuckets  (enumerating S3)
Time: 14:33:44  Event: DescribeInstances  (enumerating EC2)
Time: 14:34:02  Event: GetSecretValue  SecretId: prod/database/password
Time: 14:34:18  Event: CreateUser  UserName: backup-svc  (persistence)
# Key compromised, recon complete, persistence established in 77 seconds
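
Reconstructing a timeline like the one above is mostly sorting and subtraction. The sketch below assumes events have already been exported as dictionaries with the CloudTrail fields eventTime, eventName, and sourceIPAddress; the function name build_timeline is hypothetical.

```python
from datetime import datetime


def build_timeline(events):
    """Order CloudTrail-style records by time and annotate each with the
    number of seconds elapsed since the first event."""
    def ts(event):
        return datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ")

    ordered = sorted(events, key=ts)
    if not ordered:
        return []
    start = ts(ordered[0])
    return [
        (int((ts(e) - start).total_seconds()),
         e["eventName"],
         e.get("sourceIPAddress", ""))
        for e in ordered
    ]
```

Applied to the five events in this example, the last tuple carries an offset of 77, matching the 77 seconds between GetCallerIdentity and CreateUser.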

Example 02: SSRF to metadata service credential theft

SSRF vulnerabilities in web apps can reach the EC2 metadata service and steal the instance's IAM role credentials — granting all permissions the role has without any IAM key being exposed.

# Attacker exploits SSRF in web app to reach metadata service:
GET http://169.254.169.254/latest/meta-data/iam/security-credentials/
WebAppRole  (IAM role name)
GET http://169.254.169.254/latest/meta-data/iam/security-credentials/WebAppRole
{
  "AccessKeyId": "ASIA...",
  "SecretAccessKey": "...",
  "Token": "...",
  "Expiration": "2026-05-14T16:00:00Z"
}
# Temporary credentials stolen -- attacker has WebAppRole permissions
# IMDSv2 (requiring a PUT token request first) blocks this GET-only SSRF: the attacker cannot obtain or send the session token
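
To make the IMDSv2 difference concrete, the sketch below builds the two requests a legitimate client on the instance would send: a PUT for a session token, then a GET that presents it. The endpoint paths and header names are the documented IMDSv2 ones; the function names are hypothetical, and the requests are only constructed here, not sent.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"


def build_token_request(ttl_seconds=21600):
    """Step 1: a PUT request asking IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_credentials_request(token, role_name):
    """Step 2: a GET request that presents the token from step 1."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/iam/security-credentials/{role_name}",
        method="GET",
        headers={"X-aws-ec2-metadata-token": token},
    )
```

On an instance, each request would be passed to urllib.request.urlopen. An SSRF flaw that can only make the server issue plain GET requests has no way to perform step 1 or attach the header in step 2, which is why the attack shown above fails when IMDSv2 is required.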

Example 03: IAM privilege escalation detection

Attackers with limited IAM permissions use specific API calls to escalate to full admin access. The sequence in CloudTrail is distinctive and should trigger immediate alerting.

# CloudTrail shows privilege escalation chain:
14:35:01  CreatePolicyVersion  (attacker creates a new version of an existing policy granting administrator permissions)
14:35:12  SetDefaultPolicyVersion  (makes new admin version active)
14:35:33  AttachUserPolicy  UserName=attacker  PolicyArn=arn:aws:iam::aws:policy/AdministratorAccess
14:36:01  CreateAccessKey  UserName=attacker  (new persistent key created)
# 60 seconds from limited to full admin access using iam:CreatePolicyVersion
# The Pacu framework automates this from 30+ different starting positions
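
A sequence like this can be matched per principal. The following is a deliberately simple sketch: it assumes events carry the CloudTrail fields eventTime, eventName, and userIdentity.arn, and it looks for the four calls above in order within a time window. The chain definition, the 5-minute window, and the function name are illustrative assumptions, not a complete escalation detector.

```python
from datetime import datetime, timedelta

# Ordered API calls from the example chain above.
CHAIN = ("CreatePolicyVersion", "SetDefaultPolicyVersion",
         "AttachUserPolicy", "CreateAccessKey")


def _ts(event):
    return datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ")


def find_escalation_chains(events, chain=CHAIN, window=timedelta(minutes=5)):
    """Return (principal ARN, duration in seconds) for each principal that
    performed every call in `chain`, in order, within `window`."""
    by_principal = {}
    for event in sorted(events, key=_ts):
        by_principal.setdefault(event["userIdentity"]["arn"], []).append(event)

    hits = []
    for arn, principal_events in by_principal.items():
        step, started = 0, None
        for event in principal_events:
            when = _ts(event)
            if started is not None and when - started > window:
                step, started = 0, None  # window expired, start over
            if event["eventName"] == chain[step]:
                if step == 0:
                    started = when
                step += 1
                if step == len(chain):
                    hits.append((arn, int((when - started).total_seconds())))
                    step, started = 0, None
    return hits
```

Real attackers vary the order and skip steps, so a production rule would more likely alert on any single call from the chain and use the sequence only to raise severity.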

Example 04: S3 data exfiltration detection

Attackers with S3 access enumerate buckets and systematically exfiltrate sensitive data. CloudTrail logs every S3 API call when data event logging is explicitly enabled.

14:40:01  ListBuckets  (enumerating all accessible buckets)
14:40:22  ListObjectsV2  Bucket: corp-financial-data-prod
14:40:44  GetBucketAcl  (checking if bucket is public)
14:41:01 to 14:58:33  GetObject  Bucket: corp-financial-data-prod
  Objects downloaded: 4,821 files  Total: 2.3 GB in 17 minutes
# Systematic exfiltration of entire S3 bucket
# GuardDuty fires: "Unusual data access from anomalous location"
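
When S3 data events are available, bulk downloads stand out as a simple count. The sketch below assumes exported data-event records with eventSource, eventName, requestParameters.bucketName, and userIdentity.arn; the function name and the threshold of 1,000 objects are assumptions chosen for illustration.

```python
from collections import Counter


def flag_bulk_downloads(events, threshold=1000):
    """Count GetObject calls per (principal, bucket) and return the pairs
    whose count meets or exceeds `threshold`."""
    counts = Counter()
    for event in events:
        if (event.get("eventSource") == "s3.amazonaws.com"
                and event.get("eventName") == "GetObject"):
            bucket = (event.get("requestParameters") or {}).get("bucketName")
            principal = event.get("userIdentity", {}).get("arn", "unknown")
            counts[(principal, bucket)] += 1
    return {pair: n for pair, n in counts.items() if n >= threshold}
```

A fixed threshold will misfire on legitimate batch jobs, so it is better treated as a triage aid than as a verdict.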

Example 05: Containment and remediation in AWS

Cloud IR containment uses AWS-native tools to isolate compromised identities and resources without disrupting unaffected services. Order of operations matters — preserve evidence before remediation.

Immediate (IAM key compromise) — preserve then disable
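  aws iam update-access-key --user-name <user> --access-key-id AKIA... --status Inactive
  aws iam attach-user-policy --user-name attacker --policy-arn arn:aws:iam::aws:policy/AWSDenyAll
  Review and revoke all access keys created by compromised identity
  Delete the leaked key only after its CloudTrail activity has been exported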

Containment (EC2 instance) — isolate before terminating
  Isolate: move instance to quarantine security group (no inbound/outbound)
  Preserve: create EBS snapshot before any remediation
  Memory: SSM Run Command to acquire memory dump if feasible

Eradication
  Delete attacker-created IAM users, access keys, and policy versions
  Review CloudTrail for ALL actions taken with compromised credentials
  Enable AWS Config rules to detect future policy violations
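
The deny-all containment step comes down to a small policy document. The sketch below builds two variants as plain dictionaries: an unconditional deny for a compromised user, and a deny limited to temporary credentials issued before a cutoff time, which is the usual way to cut off stolen role credentials that cannot be deleted. The function names and the Sid values are hypothetical; the aws:TokenIssueTime condition key is a standard IAM global condition key.

```python
import json


def deny_all_policy():
    """Inline policy that denies every action on every resource."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "IncidentDenyAll",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
        }],
    }


def revoke_older_sessions_policy(cutoff_iso):
    """Inline policy that denies requests made with temporary credentials
    issued before `cutoff_iso` (ISO 8601 UTC, e.g. 2026-05-14T15:00:00Z)."""
    policy = deny_all_policy()
    statement = policy["Statement"][0]
    statement["Sid"] = "IncidentRevokeOlderSessions"
    statement["Condition"] = {
        "DateLessThan": {"aws:TokenIssueTime": cutoff_iso}
    }
    return policy


if __name__ == "__main__":
    print(json.dumps(revoke_older_sessions_policy("2026-05-14T15:00:00Z"), indent=2))
```

The JSON output can be saved to a file and applied with aws iam put-user-policy or aws iam put-role-policy. Because an explicit Deny overrides any Allow, this contains the identity without deleting anything an investigator may still need.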

Key Concepts

What You Need to Know

🔑

IAM as the Perimeter

In cloud environments, identity IS the security boundary. A compromised IAM key or role provides the same access as a legitimate user — without any network-layer controls applying.

📜

CloudTrail

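AWS CloudTrail records API activity: who, what, when, from where. Management events are logged by default; data events such as S3 object access must be enabled explicitly. Primary forensic data source for any AWS incident. Enable in all regions and enable S3 data events.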

📡

IMDS and SSRF

EC2 Instance Metadata Service (169.254.169.254) provides temporary credentials. SSRF can reach it. IMDSv2 (token-required PUT before GET) blocks the common GET-only SSRF path to those credentials.

⚡

Blast Radius

A single over-privileged IAM key can compromise all accounts, buckets, and instances. Least privilege is the primary cloud security control — audit permissions regularly.

🛡️

GuardDuty

AWS managed threat detection. Analyses CloudTrail, VPC Flow Logs, and DNS logs for known attack patterns. Enable in all regions. It is a first-line detection layer that feeds a SIEM rather than a replacement for one.

📊

Ephemeral Infrastructure

Cloud instances can be terminated by scaling events, losing volatile evidence. Always snapshot EBS volumes and capture CloudTrail logs before terminating any compromised instance.

Forensic Log Sources

CloudTrail Deep Dive — What It Logs and What It Misses

CloudTrail is the cornerstone of AWS forensics, but understanding its coverage gaps is as important as knowing how to query it. An analyst who trusts CloudTrail as a complete audit log of all activity in their AWS account will miss significant attacker activity that occurs outside its scope.

What CloudTrail Covers by Default

CloudTrail management events are enabled by default and capture all control-plane API calls: IAM changes, EC2 instance launch and termination, S3 bucket creation and policy changes, VPC configuration changes, and similar operations. These events are retained for 90 days in the CloudTrail Event History with no additional configuration.

What management events do NOT capture: S3 object-level operations (GetObject, PutObject, and DeleteObject calls are not recorded at all, so you cannot tell who downloaded which file), Lambda function invocations and their input data, RDS query execution, or the actual network traffic within VPCs. For a complete investigation, management events must be supplemented by the data-plane logs described below.

What Requires Explicit Enablement

S3 data events: Captures every GetObject, PutObject, and DeleteObject call on specific buckets. Required for exfiltration investigation. Has cost implications — only enable on sensitive buckets if budget-constrained.
Lambda data events: Captures every Lambda function invocation. Required for investigating backdoored Lambda functions.
VPC Flow Logs: Network traffic metadata (source IP, destination IP, port, protocol, bytes transferred). Does not capture payload content. Essential for lateral movement investigation and anomaly detection.
CloudWatch Logs: Application-level logs from EC2 instances, ECS containers, and other services. Requires CloudWatch agent installation or direct API calls from applications.
S3 Server Access Logs: A complementary record to CloudTrail S3 data events. Captures requester IP, request URI, response code, and bytes transferred per request, but delivery is best-effort, so occasional records can be delayed or missing.

⚠️ Log retention gap: CloudTrail Event History keeps only the last 90 days of management events, and that window cannot be extended. Longer retention, and any data events, require a trail. If you have a trail writing to S3 with a lifecycle policy, that policy determines actual retention. Many organisations discover during an incident that their CloudTrail logs were being deleted after 30 days by a cost-saving lifecycle rule. The first action in any cloud IR readiness programme is verifying that CloudTrail is writing to a protected S3 bucket with Object Lock enabled, preventing deletion or modification of log records.
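
The lifecycle check described above is easy to automate. The sketch below inspects a lifecycle configuration in the shape returned by the S3 GetBucketLifecycleConfiguration API (a dictionary with a Rules list) and reports enabled rules that expire objects sooner than a required retention period. The function name and the 365-day default are assumptions; date-based and noncurrent-version expirations are not examined.

```python
def rules_violating_retention(lifecycle_config, min_days=365):
    """Return (rule ID, expiration days) for every enabled lifecycle rule
    that deletes objects before `min_days` have passed."""
    violations = []
    for rule in lifecycle_config.get("Rules", []):
        if rule.get("Status") != "Enabled":
            continue
        days = rule.get("Expiration", {}).get("Days")
        if days is not None and days < min_days:
            violations.append((rule.get("ID", "<unnamed>"), days))
    return violations
```

Running this against the CloudTrail log bucket as part of a readiness review would surface a 30-day cost-saving rule before an incident does.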

Example 06: CloudTrail query patterns for common investigation scenarios

Structured CloudTrail query patterns for the most common investigation scenarios — usable directly in Athena, CloudWatch Logs Insights, or exported to a SIEM.

--- Find all actions by a specific access key (scope compromise) ---
--- lookup-events searches only the 90-day Event History of management events ---
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=AccessKeyId,AttributeValue=AKIAIOSFODNN7EXAMPLE \
  --start-time 2026-05-01 --end-time 2026-05-15

--- Athena query: all IAM modifications in past 24 hours ---
--- eventTime is an ISO 8601 string in the standard CloudTrail table, so parse it before comparing ---
SELECT eventTime, userIdentity.arn, eventName, requestParameters
FROM cloudtrail_logs
WHERE eventSource = 'iam.amazonaws.com'
AND readOnly = 'false'
AND from_iso8601_timestamp(eventTime) > current_timestamp - interval '24' hour
ORDER BY eventTime;

--- GuardDuty findings filtered by severity ---
aws guardduty list-findings \
  --detector-id [id] \
  --finding-criteria '{"Criterion":{"severity":{"Gte":7}}}'
# severity >= 7 = High/Critical findings requiring immediate investigation

--- Find newly created IAM users in last 30 days ---
--- requestParameters is a JSON string, so extract the user name from it ---
SELECT eventTime, userIdentity.arn, json_extract_scalar(requestParameters, '$.userName') AS newUserName
FROM cloudtrail_logs
WHERE eventName = 'CreateUser'
AND from_iso8601_timestamp(eventTime) > current_timestamp - interval '30' day;

IR Methodology

Cloud IR Phase Framework

Cloud incident response follows the same NIST SP 800-61 Rev. 2 lifecycle as traditional IR (Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity), but each phase has cloud-specific procedures and sequencing requirements that differ significantly from on-premises response.

Phase 1

Detect

GuardDuty finding, CloudTrail anomaly, billing alert

Phase 2

Preserve

Snapshot EBS, export CloudTrail, enable termination protection

Phase 3

Contain

Quarantine SG, deny-all IAM policy, disable keys

Phase 4

Investigate

Reconstruct timeline, scope compromise, identify persistence

Phase 5

Eradicate

Delete attacker resources, rotate secrets, patch root cause

Phase 2 — Evidence Preservation Is Non-Negotiable

The most common cloud IR failure is performing containment actions that destroy evidence before it is preserved. Deleting an IAM key immediately sounds correct, but deletion is irreversible and discards the key's IAM metadata. CloudTrail keeps the key's past events, yet Event History ages them out after 90 days, so export them first and deactivate the key rather than delete it. Terminating a compromised EC2 instance loses volatile memory state. Cloud IR requires a strict preserve-then-contain sequencing that many practitioners trained on traditional IR get wrong initially.

The evidence preservation checklist before any containment action:

Export all CloudTrail events associated with the compromised credential's access key ID to persistent storage outside the affected account.
Create an EBS snapshot of any compromised EC2 instances before isolation or termination. Tag the snapshot with the incident ID and timestamp.
Export VPC Flow Logs for the relevant time window and relevant VPC.
Document the current state of all IAM policies, attached policies, and inline policies for the compromised user or role — before revoking anything.
Record the exact IAM permissions the attacker had at the time of discovery. This is needed for scoping what resources could have been accessed.
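
The EC2 part of this checklist can be sketched as a single function. It assumes a boto3 EC2 client is passed in as ec2 and uses three of its operations: modify_instance_attribute, describe_volumes, and create_snapshot. The function name, tag keys, and description format are hypothetical, and pagination and error handling are omitted for brevity.

```python
def preserve_instance(ec2, instance_id, incident_id, timestamp):
    """Turn on termination protection for an instance, then snapshot every
    attached EBS volume, tagging each snapshot with the incident ID and
    capture time. Returns the list of snapshot IDs."""
    # Block accidental termination through the EC2 API while evidence is collected.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        DisableApiTermination={"Value": True},
    )

    volumes = ec2.describe_volumes(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )["Volumes"]

    snapshot_ids = []
    for volume in volumes:
        snapshot = ec2.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description=f"IR evidence {incident_id} {instance_id}",
            TagSpecifications=[{
                "ResourceType": "snapshot",
                "Tags": [
                    {"Key": "IncidentId", "Value": incident_id},
                    {"Key": "CapturedAt", "Value": timestamp},
                ],
            }],
        )
        snapshot_ids.append(snapshot["SnapshotId"])
    return snapshot_ids
```

With boto3 installed, a call might look like preserve_instance(boto3.client("ec2"), instance_id, incident_id, timestamp). Passing the client in keeps the function easy to exercise against a stub before it is needed under pressure.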

IR Scenario: IAM Key Exposed in GitHub (8-Minute Compromise to Full Response)

T+0:00 — Detection: AWS GuardDuty fires an "UnauthorizedAccess:IAMUser/TorIPCaller" finding for API calls made with the key from a Tor exit node. Simultaneously, GitHub's secret scanning sends an email notification that a credential was detected in a public commit pushed 8 minutes ago. The responder's first action is to pull the full CloudTrail record for the affected access key ID, not to delete the key.

T+0:05 — Scoping: CloudTrail shows GetCallerIdentity (reconnaissance), ListBuckets, ListSecrets, GetSecretValue (two secrets accessed), CreateUser, and CreateAccessKey — all within 77 seconds of the key's first use. The attacker has read two production secrets and created a new backdoor IAM user.

T+0:08 — Preserve then contain: Export CloudTrail events for both the original key and the new backdoor user. Document all permissions on both. Then: deactivate the original leaked key, apply a deny-all inline policy to the backdoor user, and begin rotating the two accessed secrets in parallel.

T+0:15 — Scope confirmation: Review each API call for downstream impact. GetSecretValue on prod/database/password — check RDS CloudWatch metrics and access logs for anomalous query patterns in the 15-minute window. No anomalous database activity found — likely key was collected but not yet used for database access.

T+0:30 — Eradication: Backdoor IAM user deleted. Original key deleted. Both secrets rotated in Secrets Manager with all consumers updated. Root cause: developer committed .env file containing credentials. Post-incident action: implement pre-commit hooks with secret scanning and enforce git history scrubbing policy.
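
The pre-commit control named in the post-incident action can start very small. The sketch below scans text for strings shaped like AWS access key IDs (a four-letter AKIA or ASIA prefix followed by 16 uppercase alphanumeric characters). The function name is hypothetical, and the pattern is a common heuristic rather than an official specification; dedicated secret scanners cover far more credential types.

```python
import re

# Heuristic: long-term (AKIA) and temporary (ASIA) access key ID shapes.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")


def find_access_key_ids(text):
    """Return (line number, matched key ID) for every access-key-shaped
    string in `text`. Line numbers start at 1."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ACCESS_KEY_ID.finditer(line):
            findings.append((lineno, match.group(0)))
    return findings
```

A pre-commit hook would run this over staged files and exit non-zero when the result is non-empty, which would have stopped the .env commit in this scenario before it reached GitHub.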

Hardening

Preventive Controls — Building a Resilient Cloud Posture

IR capability is the response to failure — but the goal is to build controls that prevent, detect, and limit the blast radius of cloud compromises before they escalate. The following controls represent the highest-ROI investments for cloud security posture.

Identity Controls

No long-lived IAM access keys for human users. Use AWS SSO/IAM Identity Center with short-lived credentials. Reserve IAM users for service accounts only.

Enforce least privilege: Regularly audit IAM policies with AWS IAM Access Analyzer. Remove unused permissions. Use permission boundaries to cap maximum effective permissions.

MFA on all human accounts: Require MFA for AWS Console access. Enforce with SCPs at the organisation level so individual account administrators cannot bypass it.

IMDSv2 everywhere: Require IMDSv2 (token-based) for all EC2 instances via instance metadata options. Blocks the common GET-only SSRF route to role credentials.

Detection Controls

GuardDuty in all regions and all accounts. Route findings to a centralised security account. Enable S3 malware protection and EKS audit log monitoring.

CloudTrail with Object Lock: Write to S3 with Object Lock (Compliance mode) preventing any deletion or modification. Multi-region trail covering all regions including ones you don't actively use.

AWS Config rules: Automated compliance checks — alert on public S3 buckets, security groups with 0.0.0.0/0 ingress, IAM users without MFA, and root account usage.

Billing alerts: Unusual spending (particularly EC2 GPU instances) is often the first indicator of cryptomining. Set budget alerts for sudden spending increases.

✅The Shared Responsibility Model in Practice: AWS secures the physical infrastructure, the hypervisor, and the managed services. Everything deployed into the cloud — IAM configuration, EC2 security groups, S3 bucket policies, encryption, logging — is your responsibility. The most important mindset shift for cloud security is understanding that "AWS didn't get hacked" and "your AWS environment got compromised" are both simultaneously true. The shared responsibility model doesn't protect your data; your configuration does.

Reference

Core Concepts Summary

🔑

IAM as the Perimeter

Identity is the cloud security boundary. Over-privileged keys provide internet-accessible access with no network controls. Least privilege + no long-lived keys is the foundation. Audit regularly with Access Analyzer.

📜

CloudTrail Coverage

Management events enabled by default (90-day retention). S3 data events, Lambda invocations, and VPC Flow Logs require explicit enablement. Write to S3 with Object Lock. Verify logs aren't being deleted by lifecycle rules.

📡

IMDSv2

IMDSv2 requires a PUT token request before credential retrieval. Blocks the GET-only SSRF requests behind most metadata credential theft. Enforce via instance metadata options at launch. Verify via AWS Config rule. Highest-ROI single EC2 hardening control.

⚡

Blast Radius Scoping

Every compromised credential investigation must answer: what permissions did this identity have? What resources could it access? What did it actually access (CloudTrail)? Scope determines containment and notification obligations.

🛡️

GuardDuty

Enable in all regions and all accounts — route to central security account. Detects: Tor/VPN usage of IAM keys, anomalous API calls, cryptomining, S3 exfiltration patterns, reconnaissance. First alert layer for most cloud incidents.

📦

Preserve Before Contain

Export CloudTrail for compromised key ID. Snapshot EBS before isolation. Document IAM permissions before revoking. Terminating or deleting before preserving destroys the investigation. Sequence is mandatory.

⏱️

77-Second Window

Automated bots compromise exposed GitHub keys within minutes. GetCallerIdentity → enumerate → create persistence → exfiltrate in under 2 minutes is realistic. Detection and response capability must match this speed.

🏢

Shared Responsibility

AWS secures the cloud infrastructure. You secure what you put in it: IAM config, bucket policies, security groups, encryption, logging. "AWS didn't get hacked" and "your environment was compromised" are simultaneously true.
