Stack-based buffer overflows are the oldest exploitable memory corruption bug and still the easiest to walk through end-to-end. A program reserves N bytes of stack space for a buffer, copies more than N bytes into it, and the excess overwrites whatever’s adjacent in the stack frame, including the saved return address. When the function returns, the CPU pops that overwritten value into the instruction pointer and starts executing whatever the attacker put there.
On a 1999 Linux box with no mitigations, that’s the whole exploit: pad the buffer, overwrite the return address with the location of your shellcode, get code execution. On a 2026 Windows machine with ASLR, DEP, stack canaries, CFG, and CET shadow stacks all enabled, the same vulnerability needs four or five additional primitives stitched together. The bug class hasn’t gone away. The mitigation stack has gotten enormous.
This post walks the mechanics, the classic mitigations and their bypasses, the modern layers added on top, and what an exploit-dev workflow looks like against a target that’s deliberately not hardened (a CTF binary, a vulnerable VM, an IoT firmware image). Production-hardened software is a different post.
If you’ve never written one of these by hand, the answer to “why bother in 2026” is that the mechanics are the foundation of everything from heap UAFs to JIT compiler bugs to kernel exploitation. The bug class is dated; the muscle memory transfers.
Stack frame mechanics
A buffer is a contiguous block of memory. Local C variables get placed on the stack, a region the CPU manages via the stack pointer (rsp on x86-64, esp on x86) and the frame pointer (rbp / ebp).
When a function is called on x86, the stack frame looks like this. The stack grows downward, meaning new frames live at lower addresses than older ones:
Stack layout, from high addresses (top) to low addresses (bottom):
- function arguments: pushed by caller (or in registers on x86-64)
- return address: pushed by CALL
- saved EBP / RBP: pushed by function prologue
- local variables: char buffer of 64 bytes, etc.
- ESP / RSP points here, at the lowest address of the frame
The detail that makes the exploit possible: writes into the buffer move from low addresses to high addresses. Assuming the buffer sits directly below the saved frame pointer with no padding or canary in between, write 65 bytes into a 64-byte buffer and byte 65 lands on the saved frame pointer. Keep writing and you reach the return address. Keep writing past that and you’re scribbling on the caller’s frame.
When the function executes ret, the CPU pops whatever’s at [rsp] into the instruction pointer. Control the bytes at that address and you control execution.
That’s the foundational primitive. Every other technique in this post is bypassing a defense added against it.
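A minimal sketch of that layout, modeled in Python rather than run against a real process. The frame is a byte array laid out lowest address first; the addresses and the helper names (build_frame, unchecked_copy, read_return_address) are made up for illustration, and the model assumes no padding or canary between the buffer and the saved frame pointer.

```python
import struct

BUF_SIZE = 64  # the 64-byte local buffer from the layout above


def build_frame(saved_rbp: int, return_address: int) -> bytearray:
    """Model one frame, lowest address first: buffer, saved RBP, return address."""
    return bytearray(BUF_SIZE) + struct.pack("<QQ", saved_rbp, return_address)


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy the way strcpy or gets would: nothing compares len(data) to BUF_SIZE."""
    if len(data) > len(frame):
        raise ValueError("payload runs past the modeled frame")
    frame[:len(data)] = data


def read_return_address(frame: bytearray) -> int:
    """The return address sits after the buffer and the 8-byte saved RBP."""
    return struct.unpack_from("<Q", frame, BUF_SIZE + 8)[0]


frame = build_frame(saved_rbp=0x7FFD00001000, return_address=0x401136)

unchecked_copy(frame, b"A" * 64)           # exactly fills the buffer
print(hex(read_return_address(frame)))     # 0x401136, untouched

# 64 bytes of buffer + 8 bytes over saved RBP + 8 bytes over the return address
overflow = b"A" * 64 + b"B" * 8 + struct.pack("<Q", 0x4343434343434343)
unchecked_copy(frame, overflow)
print(hex(read_return_address(frame)))     # 0x4343434343434343
```

The second copy is the whole bug in miniature: the bytes past offset 72 are what ret would load into the instruction pointer.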
The classic mitigation trio
Between roughly 2003 and 2008 the major operating systems shipped three defenses that together make a 1999-style overflow useless. They are still the right place to start because most newer mitigations layer on top of them.
ASLR
Address Space Layout Randomization. Randomizes where the stack, heap, and libraries get loaded each time the process starts. Hardcoded jump addresses break because the target will be at a different address next run.
On Windows, ASLR is opt-in per module via the /DYNAMICBASE linker flag. One library compiled without it gives the attacker a fixed address to pivot through, which is why mitigation audits walk every loaded DLL one at a time. On Linux, the kernel randomizes the stack, heap, and shared libraries on its own, but the main executable is only randomized if it is built as a PIE (position-independent executable). PIE has been the default on amd64 in Ubuntu since 16.10 (2016), Debian 9 (2017), and similarly in other modern distros.
DEP / NX
Data Execution Prevention. Stack and heap pages are marked writable but not executable. Drop shellcode in a buffer and jump to it and the CPU faults on the first instruction. Implemented via the NX bit on the page table entry (AMD calls it NX, Intel calls it XD).
DEP killed the simplest exploit shape (write shellcode to the stack, jump to it) and pushed exploit development toward ROP.
Stack canaries
A random per-process value gets written between the local variables and the saved frame pointer at function entry. The function epilogue checks it before returning; if an overflow has corrupted it, the program calls __stack_chk_fail (Linux glibc) or __report_gsfailure (Windows MSVC) and aborts before ret runs.
GCC’s -fstack-protector-strong and MSVC’s /GS turn it on. Most distros and toolchains have it on by default now.
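The epilogue check can be sketched as a toy model. This is not how the compiler emits it (the real check is a few instructions comparing the stack slot against a per-process value); it only shows why a linear overflow cannot reach the return address without passing through the canary first. The frame layout, the addresses, and the names run_function and StackSmashDetected are assumptions for illustration.

```python
import os
import struct

BUF_SIZE = 64
CANARY_SIZE = 8
ORIGINAL_RET = 0x401136  # made-up return address


class StackSmashDetected(Exception):
    """Stands in for __stack_chk_fail / __report_gsfailure aborting the process."""


def run_function(data: bytes, canary: bytes) -> int:
    """Model a protected frame: buffer, canary, saved RBP, return address.

    Returns the address that ret would jump to, or raises if the canary changed.
    """
    frame = (bytearray(BUF_SIZE) + canary
             + struct.pack("<QQ", 0x7FFD00001000, ORIGINAL_RET))
    if len(data) > len(frame):
        raise ValueError("payload runs past the modeled frame")
    frame[:len(data)] = data  # the unchecked copy

    # Epilogue: compare the stack slot against the reference value before ret.
    if bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) != canary:
        raise StackSmashDetected("canary corrupted, aborting before ret")
    return struct.unpack_from("<Q", frame, BUF_SIZE + CANARY_SIZE + 8)[0]


canary = os.urandom(CANARY_SIZE)           # fresh random value per "process"
print(hex(run_function(b"A" * 64, canary)))  # in-bounds write returns normally

try:
    run_function(b"A" * 88, canary)        # overflow reaches the return address
except StackSmashDetected as exc:
    print("aborted:", exc)
```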
How each gets bypassed
Bypassing DEP with ROP
You can’t execute your own code, so you reuse the program’s existing code. ROP (return-oriented programming) chains short instruction sequences ending in ret, called gadgets, that already exist in the binary or loaded libraries.
The standard ROP chain on Windows calls VirtualProtect to flip the stack page from RW to RWX, then jumps into shellcode that’s now executable. The chain threads arguments into the right registers for the calling convention. On x64 Windows that’s rcx, rdx, r8, r9:
- pop rcx; ret (address of the stack page)
- pop rdx; ret (size, typically 0x1000)
- pop r8; ret (new protection, PAGE_EXECUTE_READWRITE)
- pop r9; ret (pointer to receive old protection)
- ret lands on VirtualProtect
- after the call returns, ret to the start of the now-executable shellcode
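As a rough sketch, the chain above is just a sequence of 8-byte little-endian words: each pop gadget address is followed by the value it pops. Every address below is a placeholder that does not point at anything real, and the name build_chain is made up for illustration. Only PAGE_EXECUTE_READWRITE (0x40) is an actual Windows constant. A working chain would take its gadget addresses from a gadget finder run against the target's modules.

```python
import struct

# Placeholder addresses, NOT real gadgets. Assumed for illustration only.
GADGET_POP_RCX = 0x1111111111110001
GADGET_POP_RDX = 0x1111111111110002
GADGET_POP_R8 = 0x1111111111110003
GADGET_POP_R9 = 0x1111111111110004
VIRTUALPROTECT = 0x2222222222220000

PAGE_EXECUTE_READWRITE = 0x40  # Windows memory protection constant


def build_chain(stack_page: int, old_protect_ptr: int, shellcode_addr: int) -> bytes:
    """Lay out gadget/value pairs in the order ret will consume them."""
    words = [
        GADGET_POP_RCX, stack_page,              # rcx = lpAddress
        GADGET_POP_RDX, 0x1000,                  # rdx = dwSize
        GADGET_POP_R8, PAGE_EXECUTE_READWRITE,   # r8  = flNewProtect
        GADGET_POP_R9, old_protect_ptr,          # r9  = lpflOldProtect (must be writable)
        VIRTUALPROTECT,                          # final gadget's ret lands here
        shellcode_addr,                          # VirtualProtect returns here
    ]
    return b"".join(struct.pack("<Q", w) for w in words)


chain = build_chain(stack_page=0x3333333333330000,
                    old_protect_ptr=0x3333333333330F00,
                    shellcode_addr=0x3333333333330100)
print(len(chain))  # 80 bytes: ten 8-byte words
```

One detail the flat list hides: the Windows x64 calling convention gives the callee 32 bytes of shadow space just above its return address, so anything placed directly after the last word may be overwritten during the call.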
Gadget finders: ROPgadget, ropper, pwntools’s ROP helper. Building one chain by hand is worth doing once. After that, use the tools.
Bypassing ASLR with information leaks
ASLR randomizes module base addresses, but the internal layout of each module is constant. The offset between two functions inside kernel32.dll is the same regardless of where the DLL got mapped. Leak a single pointer that lives inside the module and subtract the known offset to recover the base.
Common leak primitives: format string bugs (printf with attacker-controlled format), out-of-bounds reads in parsers, info-leak gadgets that hand back a register value. A leak plus an overflow is the standard two-bug chain against any ASLR-protected target.
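The base-recovery arithmetic is small enough to show directly. The leaked pointer and both offsets below are invented values, and the names module_base and resolve are made up for illustration. In practice the offsets come from the exact module version on disk. The page-alignment check is a cheap sanity test, since mapped modules start on a page boundary.

```python
PAGE_MASK = 0xFFF  # modules are mapped on (at least) 4 KiB page boundaries


def module_base(leaked_pointer: int, known_offset: int) -> int:
    """Recover the randomized base from one leaked pointer inside the module."""
    base = leaked_pointer - known_offset
    if base & PAGE_MASK:
        # A misaligned result usually means the wrong offset or module version.
        raise ValueError(f"base {base:#x} is not page-aligned")
    return base


def resolve(base: int, offset: int) -> int:
    """Turn a static offset inside the module into this run's absolute address."""
    return base + offset


# Invented numbers: a pointer leaked at runtime and two offsets read from the file.
leaked = 0x7FFB1A2C5A10
offset_of_leaked_symbol = 0x25A10
offset_of_wanted_function = 0x1B3D0

base = module_base(leaked, offset_of_leaked_symbol)
print(hex(base))                                       # 0x7ffb1a2a0000
print(hex(resolve(base, offset_of_wanted_function)))   # 0x7ffb1a2bb3d0
```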
Bypassing canaries
Three options:
- Read the canary with an info leak (the same format string bug or OOB read used for the ASLR bypass usually works), then write it back unchanged during the overflow.
- Brute force against forking servers. Pre-fork() Apache and similar servers inherit the same canary in every child. Wrong canary kills the child; correct canary lets the request continue. Overwrite one canary byte at a time and each byte takes at most 256 guesses, so the canary gets recovered in linear rather than exponential time. Variants of this against Apache have been published since the late 2000s.
- Skip the canary by overwriting something past the return address that gets dereferenced before the function returns. On 32-bit Windows the classic version was SEH overwrite, where corrupting the Structured Exception Handler chain and then triggering an exception transferred execution before the canary check ran.
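The cost difference in the brute-force option is easy to see in a simulation. The sketch below runs entirely in-process: the oracle stands in for "did the forked child survive", and the names make_oracle and recover_canary are made up for illustration. It assumes the overflow can stop after any number of canary bytes, so a partial overwrite leaves the remaining bytes intact.

```python
import os

CANARY_LEN = 8


def make_oracle(secret: bytes):
    """Simulate a forking server: every child shares the same canary."""
    def child_survives(overwritten_prefix: bytes) -> bool:
        # Only the overwritten bytes can differ; the rest of the canary is intact.
        return secret[:len(overwritten_prefix)] == overwritten_prefix
    return child_survives


def recover_canary(child_survives, length: int = CANARY_LEN):
    """Guess one byte at a time, keeping each byte once a child survives."""
    known = b""
    attempts = 0
    for _ in range(length):
        for candidate in range(256):
            attempts += 1
            if child_survives(known + bytes([candidate])):
                known += bytes([candidate])
                break
        else:
            raise RuntimeError("no candidate survived; oracle is unreliable")
    return known, attempts


secret = os.urandom(CANARY_LEN)
recovered, attempts = recover_canary(make_oracle(secret))
print(recovered == secret)           # True
print(attempts <= 256 * CANARY_LEN)  # True: at most 2048 tries, not 2**64
```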
What the classic trio misses
The 2010s and 2020s added more layers. A complete picture of modern exploit dev has to mention them, even at primer level:
- CFG (Control Flow Guard) on Windows. Validates the target of each indirect call against a bitmap of valid call targets (function entry points the compiler marked as callable). Breaks vtable hijacks unless the attacker finds a CFG-suppressed function or a non-CFG module loaded in the same process.
- CET (Control-flow Enforcement Technology) on Intel 11th-gen and AMD Zen 3 onward. Hardware shadow stack holding a second copy of every return address. The CPU compares stack and shadow stack on ret; mismatch faults. This breaks classical ROP at the hardware level.
- ARMv8.3 PAC (Pointer Authentication Codes). Signs return addresses and function pointers with a keyed MAC stored in the unused upper bits of the pointer and verified before use. iOS leans on this aggressively.
- Heap hardening. LFH randomization, Segment Heap on Windows, GWP-ASan in Chrome, and a long list of allocator improvements that’s out of scope here.
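The shadow-stack idea from the CET bullet fits in a few lines as a toy model. This is a conceptual sketch, not a description of the hardware interface: the class and exception names are invented, and the real shadow stack lives in memory that ordinary stores cannot write.

```python
class ControlProtectionFault(Exception):
    """Stands in for the fault raised when stack and shadow stack disagree."""


class ToyCpu:
    def __init__(self) -> None:
        self.stack = []         # normal stack: writable by the program (and by overflows)
        self.shadow_stack = []  # shadow stack: only call/ret touch it in this model

    def call(self, return_address: int) -> None:
        """CALL pushes the return address to both stacks."""
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self) -> int:
        """RET pops both copies and faults if they differ."""
        target = self.stack.pop()
        expected = self.shadow_stack.pop()
        if target != expected:
            raise ControlProtectionFault(
                f"return to {target:#x}, shadow stack says {expected:#x}")
        return target


cpu = ToyCpu()
cpu.call(0x401136)
print(hex(cpu.ret()))            # 0x401136: untouched return address passes

cpu.call(0x401136)
cpu.stack[-1] = 0x4141414141414141   # what an overflow does to the normal stack
try:
    cpu.ret()
except ControlProtectionFault as exc:
    print("fault:", exc)
```

Every link in a ROP chain is a return to an address that was never pushed by a call, which is why the comparison stops the chain at its first gadget.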
Honest version of modern stack overflow exploitation: against fully hardened Chrome or recent Windows, you’re chaining four or five primitives. An info leak, an ASLR bypass, a CFG-bypassing call gadget, a CET-avoiding control flow primitive, and a kernel exploit on top if you want out of the sandbox. CTF and firmware targets are the ones that still look like a 2005 textbook exploit.
Shellcode and bad chars
Shellcode is position-independent machine code that does whatever the exploit needs to do: spawn a shell, call back to a C2, write a file, load a stage 2. msfvenom, pwntools, and donut all generate it. You can hand-write it in NASM for small payloads.
The constraint nobody warns you about is bad characters. Whatever function the overflow flows through has its own list of bytes it can’t pass. strcpy stops at null. gets stops at newline. A custom protocol parser looking for a length field might stop at 0x00, 0xff, or whatever marks end-of-input.
Generating shellcode that avoids specific bytes:
msfvenom -p windows/x64/exec CMD=calc.exe -b "\x00\x0a\x0d" -f python
The -b flag is the badchar list. msfvenom picks an encoder and prepends a decoder stub that unpacks the rest at runtime. The cost is size, plus the constraint that the decoder stub itself has to be free of the listed bytes, which can fail when the badchar list is long. The encoder options on x64 are much more limited than on x86, where shikata_ga_nai covers most cases.
Identifying badchars empirically is part of the workflow. Send a payload containing every byte 0x01–0xff in order, attach a debugger, and compare what landed in memory against what you sent. The first byte that is missing or altered is bad: remove it from the payload and retry until the whole sequence arrives intact.
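The compare-and-retry loop is mechanical, so here is a small sketch of it. The "target" is a stand-in function that truncates at newline or carriage return. Against a real program, the observed bytes would be copied out of the debugger's memory view instead. All function names are made up for illustration.

```python
def simulated_target(data: bytes) -> bytes:
    """Stand-in for the vulnerable input path: stops at the first \\n or \\r."""
    for i, b in enumerate(data):
        if b in (0x0A, 0x0D):
            return data[:i]
    return data


def first_bad_byte(sent: bytes, observed: bytes):
    """Return the first sent byte that is missing or altered in memory."""
    for i, b in enumerate(sent):
        if i >= len(observed) or observed[i] != b:
            return b
    return None


def find_badchars(target, candidates=range(0x01, 0x100)) -> list:
    """Drop one culprit per round until the payload survives unchanged."""
    payload = bytes(candidates)
    bad = []
    while True:
        culprit = first_bad_byte(payload, target(payload))
        if culprit is None:
            return bad
        bad.append(culprit)
        payload = bytes(b for b in payload if b != culprit)


print([hex(b) for b in find_badchars(simulated_target)])  # ['0xa', '0xd']
```

0x00 is left out of the candidate range on purpose, matching the 0x01–0xff range above; it is bad for almost any string-handling path.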
Workflow for an unhardened target
The sequence for a CTF binary, a vulnerable VM, or an old IoT image where most mitigations are missing:
1. Fuzz to crash. Send increasingly long inputs (A * 100, A * 1000, A * 5000) until the program segfaults. Or use a fuzzer.
2. Find the offset. Send a unique pattern (pattern_create.rb from Metasploit or cyclic from pwntools) and check what value ends up in rip / eip at crash. On x86-64 the ret usually faults on the non-canonical address before rip changes, so read the value at the top of the stack ([rsp]) instead. That tells you exactly how many bytes precede the return address overwrite.
3. Identify badchars. Send all bytes 0x01–0xff in order, see which get mangled.
4. Find a control transfer primitive. With no DEP, a jmp esp gadget in a non-ASLR module. With DEP, a ROP chain that calls VirtualProtect / mprotect. With CET, find a different bug.
5. Plant shellcode and a NOP sled. The sled (0x90 repeated on x86) gives the jump some slop if the address isn’t exact.
6. Test, debug, iterate. Most exploits don’t work first try. Common failure modes: badchars you missed, wrong calling convention, register state at the gadget didn’t match what you assumed, the heap got rearranged between development and deployment.
A hardened target makes steps 4 and 5 multi-stage. Everything before that is the same.
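The offset-finding step relies on a pattern in which every short window is unique, so the bytes seen at the crash identify their own position. Below is a minimal pure-Python sketch in the style of the upper/lower/digit pattern that pattern_create.rb produces. It is not the Metasploit or pwntools implementation, and make_pattern and find_offset are hypothetical names.

```python
import string


def make_pattern(length: int) -> bytes:
    """Build Aa0Aa1Aa2... where each 3-character group appears once."""
    out = bytearray()
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out += f"{upper}{lower}{digit}".encode()
                if len(out) >= length:
                    return bytes(out[:length])
    raise ValueError("requested length exceeds this pattern's 20280-byte maximum")


def find_offset(pattern: bytes, crash_value: int, width: int = 4) -> int:
    """Locate the bytes seen in the register; x86 stores them little-endian.

    Returns -1 if the value does not occur in the pattern.
    """
    needle = crash_value.to_bytes(width, "little")
    return pattern.find(needle)


pattern = make_pattern(200)

# Pretend the debugger showed these four pattern bytes in eip at the crash.
crash_value = int.from_bytes(pattern[72:76], "little")
print(find_offset(pattern, crash_value))  # 72 bytes of padding precede the overwrite
```

With the real tools the same lookup is pattern_offset.rb in Metasploit or cyclic_find in pwntools.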
What this still applies to
Browsers and modern desktop apps have made classical stack overflows look like a museum piece. CET, CFG, sandboxes, and aggressive fuzz testing during development mean a clean stack-overflow-to-shell exploit against current Chrome or recent Windows is research-grade work, not a tutorial exercise.
What still looks like 2005:
- IoT and embedded firmware. ASLR is often off, DEP can’t be relied on because many ARM/MIPS firmware images run kernels without per-page execute permissions or the vendor compiled without the relevant flags, and the device may be on a 2014 kernel.
- Industrial control systems and SCADA, same reasons.
- Legacy enterprise software nobody recompiles. There are still production Windows binaries from the late 1990s and 2000s running with no /DYNAMICBASE.
- Embedded Linux on routers. Busybox-based firmware tends to be unhardened userspace.
- Specific Linux kernel paths that lag userspace hardening.
Practicing on a CTF target teaches the mechanics. Applying them in 2026 means picking target classes where the hardening hasn’t caught up, and most of those targets aren’t on someone’s laptop.
References
- Corelan exploit writing tutorials. Older, but still the most thorough walkthrough of the classic workflow on Windows.
- pwn.college. Free hands-on exploit dev curriculum. Recommended starting point in 2026.
- LiveOverflow’s binary exploitation series. Video walkthroughs that build intuition.
- Microsoft on Control Flow Guard
- Intel on Control-flow Enforcement Technology
- Project Zero blog. Current examples of how modern exploit chains compose against hardened targets.
- OWASP Buffer Overflow