The simplest way to explain red teaming to someone outside the industry: a penetration test asks “are there holes in the wall?” A red team operation asks “if someone climbs through one, will anyone notice?” The first is a finding count. The second is a test of the entire detection-and-response stack: SIEM rules, SOC analyst shifts, the playbook for “we just saw beacon traffic from a finance laptop,” the legal escalation chain, and the reflexes of the analyst who has to make the call at 2 a.m.
This is the first post in the Red Team Operations series. It’s the conceptual layer: what an engagement actually looks like, where the framework boundaries sit, and where the working operator’s day-to-day diverges from the version of the job people imagine. Later posts go into specific techniques.
Red teaming versus penetration testing
These get conflated constantly, partly because the sales material from a lot of consultancies blurs the line on purpose. They’re different products and they exist for different buyers.
| Property | Penetration test | Red team operation |
|---|---|---|
| Goal | Enumerate vulnerabilities and prove exploitation | Test detection, response, and recovery capability |
| Scope | Narrow (“the customer portal,” “the corporate Wi-Fi”) | Broad (everything an objective requires touching) |
| Duration | 1-4 weeks | 1-6+ months |
| Stealth | Not a requirement; often explicitly out of scope | Central requirement; detection is itself a finding |
| Starting position | Typically white- or grey-box with credentials and architecture docs | Black-box external, or assumed-breach starting inside |
| Output | Findings report with CVSS-scored vulnerabilities | Narrative of operator actions vs. defender response, with timeline |
A useful framing: the pen test is the safety inspection. The red team operation is the fire drill. Both have value. They answer different questions and they shouldn’t be priced against each other.
A caveat on the “black-box” row: a lot of modern red team work is assumed breach. The customer hands you an authenticated workstation (or a phished credential, or a planted USB) and you start the campaign from there. This isn’t a compromise of red-team integrity; it’s a practical choice that lets the engagement focus on internal detection rather than spending three weeks burning external phishing infrastructure to get the foothold you would have gotten anyway. If your goal is to test the SOC’s ability to catch lateral movement, you don’t need to spend a month re-proving that phishing works.
The Unified Kill Chain as a working model
Lockheed Martin’s 2011 Cyber Kill Chain is the canonical model and it’s still useful for explaining what an intrusion looks like to a non-technical audience. The issue is that the original seven phases (Reconnaissance, Weaponization, Delivery, Exploitation, Installation, Command & Control, Actions on Objectives) are weighted toward getting in: six of the seven cover the path to a foothold and a working C2 channel. Everything that happens after the foothold (lateral movement, internal recon, persistence layering, credential theft, the actual data exfil) gets bundled into “Actions on Objectives” as if it were a single step.
Paul Pols’ Unified Kill Chain (2017, updated 2022) splits that out into eighteen phases organized into three high-level groups: In (external reconnaissance through foothold), Through (internal pivoting, privilege escalation, credential access), and Out (collection, exfiltration, impact). The model maps cleanly onto how modern red team work actually unfolds, where 80% of the operator hours are post-foothold and the interesting question is whether you can move from one segment to another without being noticed.
In practice the phases an operator thinks about during an engagement are roughly:
- Reconnaissance — passive OSINT (LinkedIn, job postings, GitHub leaks, certificate transparency logs) and active scanning of whatever’s in scope.
- Resource development — registering domains, standing up redirectors and C2 servers, generating payloads.
- Initial access — phishing, exposed services, web app exploits, supply chain. Or assumed breach.
- Execution and persistence — getting the implant running and surviving reboots / EDR disinfection.
- Defense evasion — staying off the EDR’s radar, AMSI/ETW patching, in-memory only, indirect syscalls.
- Credential access — Mimikatz / kiwi, Kerberoasting, AS-REP roasting, DCSync, LSASS dumps.
- Discovery — BloodHound, ADExplorer, internal port scanning, service enumeration.
- Lateral movement — pass-the-hash, pass-the-ticket, WMI, SMB, WinRM, RDP, exploiting trusts.
- Collection and exfiltration — actually retrieving the data the engagement objective specifies.
- Impact (or simulated impact) — usually not actually executed in a red team engagement, but documented as “we could have.”
The exact terminology varies. MITRE ATT&CK Enterprise currently catalogs fourteen tactics — Reconnaissance, Resource Development, Initial Access, Execution, Persistence, Privilege Escalation, Defense Evasion, Credential Access, Discovery, Lateral Movement, Collection, Command and Control, Exfiltration, and Impact — and is the operational vocabulary most reports get written in. Knowing the ATT&CK technique IDs for what you actually did makes the customer’s threat intel team’s life much easier.
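As a rough illustration of what "knowing the technique IDs" looks like in practice, here is a minimal Python sketch that tags operator log entries with ATT&CK technique IDs. The technique IDs are real ATT&CK Enterprise identifiers; the action labels, the `LoggedAction` type, and the `tag_actions` helper are hypothetical names invented for this example, not part of any C2 framework.

```python
from dataclasses import dataclass

# Right-hand side: MITRE ATT&CK Enterprise technique IDs.
# Left-hand side: hypothetical labels an operator log might use.
ATTACK_IDS = {
    "kerberoast": "T1558.003",       # Kerberoasting
    "asrep_roast": "T1558.004",      # AS-REP Roasting
    "lsass_dump": "T1003.001",       # OS Credential Dumping: LSASS Memory
    "dcsync": "T1003.006",           # OS Credential Dumping: DCSync
    "pass_the_hash": "T1550.002",    # Use Alternate Authentication Material
    "pass_the_ticket": "T1550.003",
    "rdp": "T1021.001",              # Remote Services: RDP
    "smb_admin_share": "T1021.002",  # Remote Services: SMB/Windows Admin Shares
    "winrm": "T1021.006",            # Remote Services: Windows Remote Management
    "wmi_exec": "T1047",             # Windows Management Instrumentation
}


@dataclass
class LoggedAction:
    timestamp: str  # ISO 8601 string, kept as text for simplicity
    host: str
    action: str     # one of the hypothetical labels above


def tag_actions(actions):
    """Return (timestamp, host, action, technique_id) rows for the report.

    Actions with no mapping are marked 'UNMAPPED' so they are reviewed
    by hand instead of silently dropped.
    """
    return [
        (a.timestamp, a.host, a.action, ATTACK_IDS.get(a.action, "UNMAPPED"))
        for a in actions
    ]
```

Keeping unmapped actions visible is a deliberate choice: a gap in the mapping is easier to fix while writing the report than after the customer's threat intel team asks about it.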
C2 infrastructure
A red team’s command-and-control infrastructure is the thing that most reliably gets it caught. The implant on the endpoint is one detection surface; the network traffic from that implant back to your team server is another. Direct connections from a target host to a single attacker-controlled IP are exactly what every modern SIEM is tuned to spot.
The standard architectural answer is tiered infrastructure:
- Team server — the actual C2 backend (Cobalt Strike, Sliver, Mythic, Havoc, etc.). Never exposed to the public internet. Lives behind a firewall, accessible only from operator workstations over a VPN.
- Long-haul redirectors — cheap VPS instances running socat, nginx, or Apache mod_rewrite that take inbound C2 traffic from compromised hosts and forward it to the team server. These are disposable. When one gets blocked or burned, you spin up a new one and update DNS.
- Short-haul / phishing infrastructure — separate from the C2 stack, used for delivery only. Burns fast and isn’t expected to survive long.
- Categorization aging — domains used for serious operations are usually registered weeks or months ahead of the engagement, then put through aging / categorization services so they don’t look like brand-new attacker domains to filtering engines.
The redirector tier is the load-bearing concept. If a SOC analyst pivots from “weird beacon traffic” to the IP it’s hitting and blocks that IP, you’ve lost one $5 VPS. The team server is untouched. You bring up another redirector, update the DNS record the implant resolves, and the beacon reconnects.
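The failover logic can be sketched as a toy model in Python. Nothing here touches a network: the `Infra` class, the hostname, and the documentation-range IPs are all hypothetical, and the only point is to show that blocking a redirector IP never reveals or affects the team server.

```python
# Toy in-memory model of the redirector tier. All names and IPs are
# hypothetical; the IPs come from the RFC 5737 documentation ranges.
TEAM_SERVER = "team-server (private, VPN only)"


class Infra:
    def __init__(self):
        self.dns = {}          # beacon hostname -> current redirector IP
        self.redirectors = {}  # redirector IP -> upstream it forwards to
        self.blocked = set()   # IPs the defender has blocked at egress

    def add_redirector(self, ip, hostname):
        """Stand up a redirector and point the beacon hostname at it."""
        self.redirectors[ip] = TEAM_SERVER
        self.dns[hostname] = ip

    def block(self, ip):
        """Defender action: block an IP seen in beacon traffic."""
        self.blocked.add(ip)

    def beacon(self, hostname):
        """Return the upstream the beacon reaches, or None if it is cut off."""
        ip = self.dns.get(hostname)
        if ip is None or ip in self.blocked:
            return None
        return self.redirectors.get(ip)
```

In this model the defender only ever sees and blocks redirector IPs, so recovery is one call to `add_redirector` with a fresh IP for the same hostname.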
Domain fronting (and why it isn’t 2018 anymore)
Domain fronting used to be a powerful evasion technique: the TLS handshake names a high-reputation domain hosted on a CDN (say, ajax.microsoft.com) in the SNI field, but the HTTP Host header inside the TLS tunnel routes the request to your actual C2 backend on the same CDN. To a network observer, the traffic is just HTTPS to Microsoft. To the receiving CDN, it’s a request for your origin.
This was the canonical CDN-as-cover technique for years. It’s mostly gone now. Google disabled domain fronting on Google App Engine in April 2018; AWS CloudFront published its enhanced domain protections on April 27, 2018, just days later; Microsoft Azure Front Door began blocking domain fronting on newly created resources in November 2022, with full enforcement against existing resources from January 2024. The major CDNs all reject mismatched SNI vs Host header now. There are still niche providers and specific architectures where variants work, but assuming domain fronting in 2026 is generally a planning mistake.
The current replacements:
- CDN-resident C2 — your team server lives behind a CDN, but the SNI and Host match. The CDN gives you a legitimate cert and a high-reputation IP without the fronting trick.
- Reputable cloud hosting — Cloudflare Workers, AWS Lambda, GitHub Pages, Azure Functions, anything where the traffic is to a major provider’s IP range. SOCs are reluctant to block these wholesale because the false positive cost is huge.
- Malleable C2 profiles — Cobalt Strike’s profile language and Sliver’s HTTP profile system let you make the beacon traffic look like specific legitimate services: a stale Amazon shopping session, a Slack heartbeat, an Office 365 sync.
- Encrypted DNS / DoH — using DNS-over-HTTPS to cloudflare-dns.com or dns.google for the C2 channel. Limited bandwidth but extremely hard to filter.
Purple teaming
The traditional adversarial framing of red versus blue creates a real cultural problem: the red team finds a gap, writes it up, and hands it over the wall. The blue team gets a report saying they failed at a thing, fixes it as best they can, and never sees the next variant. Both sides feel adversarial in a way that doesn’t actually make defenses better.
Purple teaming is the collaborative answer. Operators and defenders sit in the same room (or video call) and the cycle is fast:
- Operator: “I’m about to run a Kerberoasting attack against DC01 at 10:05.”
- Operator runs it.
- Defender: “I see nothing in our SIEM.”
- Together: “Why? Is Windows event 4769 not being collected? Is the SIEM rule looking for the wrong service name pattern? Is the alert too noisy and being auto-deduped?”
- Defender enables collection / fixes the rule / re-tunes the alert.
- Operator re-runs.
- Defender: “Got it, alert fired in 90 seconds.”
The same loop, against ten different ATT&CK techniques in a day, produces more useful tuning than a quarter of after-the-fact report-based remediation. The downside is that purple teaming doesn’t test the SOC’s real-world response time or process — everyone knows what’s coming. Use both.
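One lightweight way to keep score during a session is to record every run and whether an alert fired. The Python sketch below assumes a hypothetical `Run` record keyed by ATT&CK technique ID; it is a bookkeeping aid, not a feature of any SIEM or C2 product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Run:
    technique: str                          # ATT&CK technique ID, e.g. "T1558.003"
    executed_at: datetime                   # when the operator ran it
    alerted_at: Optional[datetime] = None   # when the alert fired, if it did

    def seconds_to_alert(self) -> Optional[float]:
        """Seconds between execution and alert, or None if nothing fired."""
        if self.alerted_at is None:
            return None
        return (self.alerted_at - self.executed_at).total_seconds()


def latest_status(runs):
    """Map each technique to the seconds-to-alert of its most recent run.

    A value of None means the latest attempt still went undetected.
    """
    status = {}
    for run in sorted(runs, key=lambda r: r.executed_at):
        status[run.technique] = run.seconds_to_alert()
    return status
```

At the end of the day, any technique whose latest status is still `None` is the tuning backlog.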
Deconfliction
During a stealth red team engagement, the SOC will sometimes detect activity that looks like a real attacker. This is good — it’s exactly the test the engagement is designed to run. The problem is what happens next.
If the SOC genuinely doesn’t know it’s a red team, they call an incident, page the CISO, possibly bring in external IR, and the organization burns real money and adrenaline on responding to a simulated event. Worse, they might shut down systems, contact law enforcement, or notify a regulator — all of which is hard to walk back.
Deconfliction is the process that prevents this. There’s a designated white cell: one or two trusted people inside the customer organization who know the engagement is happening and who have the operator’s contact information. When the SOC detects suspicious activity, before escalating, they check with the white cell. The white cell asks the operator: “Was that you hitting the DC from 192.0.2.50 at 14:00 UTC?” The operator’s logs answer yes or no.
For this to work the operator side has to log everything: every IP used, every command run, every timestamp, every host touched. This is also useful for the eventual report; it’s how you write the timeline section. Cobalt Strike, Sliver, and Mythic all have built-in logging that’s usually sufficient if you remember to turn it on. The mistake is realizing on engagement day three that you never enabled the log retention and now the white cell can’t deconflict the activity.
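Answering a white cell query is a lookup against that log. The Python sketch below assumes the C2 logs have already been exported into a hypothetical `LogEntry` structure; the field names and the five-minute tolerance are illustrative choices, not something any framework prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogEntry:
    when: datetime      # timezone-aware UTC timestamp of the action
    source_ip: str      # IP the activity came from (redirector, pivot host, ...)
    target_host: str    # host that was touched
    command: str        # what was run


def deconflict(log, source_ip, target_host, when,
               tolerance=timedelta(minutes=5)):
    """Return the operator log entries that match a white cell query.

    An empty list means "not us": the SOC should treat the activity as real.
    The tolerance absorbs clock skew between SOC and operator timestamps.
    """
    return [
        entry for entry in log
        if entry.source_ip == source_ip
        and entry.target_host.lower() == target_host.lower()
        and abs(entry.when - when) <= tolerance
    ]
```

Returning the matching entries, instead of a bare yes or no, lets the operator tell the white cell exactly which command produced the alert.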
Operator OPSEC
You are trying to evade them. They are trying to detect you. The basic hygiene that decides which way that goes:
Don’t generate predictable network patterns. Default beacon intervals (60 seconds, no jitter) are one of the cleanest IOCs an EDR or SIEM can write a rule for. Run with random jitter (60s ±20-40%), longer intervals on long-haul beacons (15-30 minutes), and avoid generating traffic that’s exactly the same size on every callback.
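To see why a fixed interval is such an easy rule to write, it helps to look at the callback timing the way a detection engineer would. The Python sketch below generates sleep intervals and computes their coefficient of variation (standard deviation divided by mean). The symmetric ± jitter is a modeling assumption taken from the wording above; real frameworks differ in how they apply jitter, and the function names are hypothetical.

```python
import random
import statistics


def beacon_sleeps(base_seconds, jitter_fraction, count, rng):
    """Generate `count` sleep intervals of base_seconds +/- jitter_fraction.

    Symmetric jitter is an assumption for illustration only.
    """
    return [
        base_seconds * (1 + rng.uniform(-jitter_fraction, jitter_fraction))
        for _ in range(count)
    ]


def coefficient_of_variation(intervals):
    """Population standard deviation divided by the mean.

    Values near zero mean near-perfect periodicity, which is what a
    simple beaconing rule keys on.
    """
    mean = statistics.fmean(intervals)
    return statistics.pstdev(intervals) / mean


rng = random.Random(7)  # fixed seed so the example is repeatable
fixed = beacon_sleeps(60, 0.0, 50, rng)      # default-style beacon
jittered = beacon_sleeps(60, 0.3, 50, rng)   # 60s +/- 30%
print(coefficient_of_variation(fixed), coefficient_of_variation(jittered))
```

The fixed beacon has zero variation, so a threshold on this one statistic catches it. Jitter raises the number but does not make the traffic invisible, which is why interval length and payload size matter too.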
Don’t reuse credentials at machine-gun speed. If you obtain a domain user’s password, the temptation is to spray it against every host you can see to confirm where they have access. A SOC with even mediocre auditing will see “user X authenticated to 87 hosts in three minutes from a workstation that’s never seen most of them” and respond accordingly. Use it slowly, against specific targets that justify the noise.
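The defender's side of this is a small piece of logic. The Python sketch below flags any user who authenticates to more than a threshold number of distinct hosts inside a sliding window. The event format, the function name, and both thresholds are hypothetical; a real SOC would express this as a SIEM query over authentication events.

```python
from collections import defaultdict, deque
from datetime import timedelta


def fan_out_alerts(events, window=timedelta(minutes=3), max_hosts=10):
    """Return the set of users with suspicious authentication fan-out.

    `events` is an iterable of (timestamp, user, dest_host) tuples sorted
    by timestamp. A user is flagged once they have authenticated to more
    than `max_hosts` distinct hosts within `window`.
    """
    recent = defaultdict(deque)  # user -> (timestamp, host) pairs in window
    flagged = set()
    for ts, user, host in events:
        queue = recent[user]
        queue.append((ts, host))
        # Drop events that have aged out of the sliding window.
        while queue and ts - queue[0][0] > window:
            queue.popleft()
        if len({h for _, h in queue}) > max_hosts:
            flagged.add(user)
    return flagged
```

Against this kind of rule, 87 hosts in three minutes trips the threshold almost immediately, while the same credential used against a handful of chosen targets over hours never fills the window.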
Clean your binaries. Strip symbols (strip on Linux, --passL:-s for Nim, strip = true in Cargo for Rust). Remove debug strings, PDB paths, and anything that says “compiled on the operator’s laptop at /home/operator/projects/totally-not-malware/”. A surprising number of red team payloads have been caught by exactly this — a forgotten build path embedded in the binary that links it back to a known operator’s tooling.
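A pre-ship check for this is easy to automate. The Python sketch below scans a payload's bytes for printable ASCII strings that look like home-directory build paths or PDB references. The patterns and the `find_build_leaks` name are illustrative assumptions; the scan only covers ASCII, so UTF-16 strings common in Windows binaries would need a separate pass.

```python
import re

# Hypothetical patterns for strings that tie a binary to a build machine.
# [\x20-\x7e] is the printable ASCII range.
LEAK_PATTERNS = [
    re.compile(rb"/home/[A-Za-z0-9._-]+/[\x20-\x7e]*"),   # Linux home dirs
    re.compile(rb"/Users/[A-Za-z0-9._-]+/[\x20-\x7e]*"),  # macOS home dirs
    re.compile(rb"[A-Za-z]:\\Users\\[\x20-\x7e]*"),       # Windows profiles
    re.compile(rb"[\x20-\x7e]{1,200}\.pdb"),              # PDB debug paths
]


def find_build_leaks(data: bytes):
    """Return a sorted, de-duplicated list of suspicious strings in `data`."""
    hits = set()
    for pattern in LEAK_PATTERNS:
        for match in pattern.finditer(data):
            hits.add(match.group(0).decode("ascii"))
    return sorted(hits)
```

Typical use is to read the built payload with `open(path, "rb").read()` and fail the build if the returned list is not empty.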
Assume burn. Every payload you ship is going to be caught eventually. Build for graceful degradation: when the implant gets sandboxed or the C2 channel gets blocked, you should still have a foothold somewhere else that the defender hasn’t found yet. Two-channel C2, multiple persistence mechanisms, and a clear answer to “what’s the next thing I do when this one dies” are how engagements survive.
Don’t fight the EDR. If you find yourself trying to bypass a specific EDR product through escalating evasion tricks during an engagement, you’re almost always better off going around it: find an unmanaged host (a forgotten Linux server, a contractor’s laptop, a dev environment) and operate from there. Most networks have one. The phrase “managed endpoint” is doing a lot of work in EDR vendor marketing — there’s usually a long tail of stuff that isn’t.
Closing
Red teaming is a strange discipline. The deliverable isn’t a list of bugs; it’s a story. What an adversary actually does inside the environment. Where detection holds and where it folds. Where the gap sits between tooling, process, and the people running both. What to fix first so next year’s engagement is harder.
The technical skill is table stakes. The harder part is the discipline around it: OPSEC, deconfliction, careful logging, restrained scoping, and being clear with the customer about what the test is actually measuring. The point isn’t to prove you personally can run Mimikatz. It’s to make the defenders better. The customers who get the most out of red team work treat the report as input to a tuning cycle, not as a verdict on the SOC.
Next posts in the series will dig into specific techniques: phishing infrastructure, lateral movement chains in Active Directory, EDR evasion patterns, and how to write the report.
References
- The Unified Kill Chain (Paul Pols) — the model, with the 18 phases.
- MITRE ATT&CK — the canonical tactic/technique catalog.
- Lockheed Martin Cyber Kill Chain — the original 2011 model.
- Red Team Infrastructure Wiki by bluscreenofjeff — older but still useful infrastructure reference.
- Cobalt Strike Malleable C2 Profile reference
- “Domain Fronting Is Dead, Long Live Domain Fronting” (Erik Hjelmvik, NETRESEC) — context on the 2018-2022 shutdown.
- TIBER-EU framework — the European Central Bank’s red team engagement framework, useful as a reference for how mature engagements are scoped.