Aura Stealer #2 beatin the obfuscation

blog.xyris.mov · 01xyris · 5 months ago · research
Hello party people, today we're gonna deep dive into the topic of obfuscation in AuraStealer. Just kidding, this ain't ChatGPT, this is my bad English.

So let's look into the obfuscation in Aura Stealer. It uses a technique very similar to the one described here: https://cloud.google.com/blog/topics/threat-intelligence/lummac2-obfuscation-through-indirect-control-flow

It is called indirect control flow. I'll use WinMain() as a good example. We can see right away that there are no direct function calls. Let's take a look at the first call.

    .text:0042F6D7    mov     eax, 30BCh
    .text:0042F6DC
    .text:0042F6DC loc_42F6DC:
    .text:0042F6DC    add     eax, off_49AEE4    ; dword ptr [0x49aee4]
    .text:0042F6E2    call    eax

So what is happening here? mov eax, 30BCh loads the value 30BCh into the register. After that comes add eax, off_49AEE4. off_49AEE4 lies in .data and points to loc_450461+3, which is the address 0x00450464. Important: we add that address, not the data stored at it, because the operand off_49AEE4 is the same as dword ptr [0x49aee4], so the CPU reads the 4-byte pointer stored at 0x49AEE4.

What does that mean for calculating our function's address? We just have to calculate

    0x30BC + 0x00450464 = 0x00453520

so eax = 0x00453520, and call 0x00453520 should be a function. Let's see... and boom, we find a function here :3

As we can see, there are many patterns like this, with jmp/call variations. So what now? First I tried hardcoding every pattern... but then I had the idea to just look for call/jmp register in the disassembled code and trace back every register that matters for calculating the final call address. Turns out I should have read the whole article “LummaC2: Obfuscation Through Indirect Control Flow” (https://cloud.google.com/blog/topics/threat-intelligence/lummac2-obfuscation-through-indirect-control-flow). There it is described as symbolic backward slicing. But hey, at least I had the idea too, I just needed to fail for 3 days straight wondering how many patterns are left.
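As a quick sanity check of the WinMain arithmetic above, here is a minimal Python sketch. It is not taken from the author's script: `DATA_BASE`, `DATA` and `resolve_target` are hypothetical names, and the one-dword "section" is a stand-in that only holds the pointer the post says is stored at 0x49AEE4.

```python
import struct

# Hypothetical stand-in for .data: just the dword stored at 0x49AEE4,
# which (per the post) is the pointer value 0x00450464.
DATA_BASE = 0x49AEE4
DATA = struct.pack("<I", 0x00450464)

def read_dword(va):
    """Read a little-endian 32-bit value from the fake section."""
    off = va - DATA_BASE
    return struct.unpack("<I", DATA[off:off + 4])[0]

def resolve_target(imm, ptr_va):
    """mov reg, imm ; add reg, dword ptr [ptr_va] -> final register value."""
    # Mask to 32 bits because x86 register arithmetic wraps around.
    return (imm + read_dword(ptr_va)) & 0xFFFFFFFF

target = resolve_target(0x30BC, 0x49AEE4)
print(hex(target))  # 0x453520
```

The mask does not matter for this particular pair of values, but it keeps the sketch honest for patterns where the immediate is large and the sum wraps.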
    ; earlier example from WinMain we calculated
    0x0042F6D7: mov  eax, 30BCh
    0x0042F6DC: add  eax, dword ptr [0x49AEE4]
    0x0042F6E2: call eax

Flowchart of how the script handles this example:

Code Analysis
1. analyze_range(0x0042F6D7, 0x0042F6E2)
2. Disassemble x86 instructions
3. Loop through instructions
4. Find call eax at 0x0042F6E2

Register Dependency Tracing
5. trace_register_dependencies(idx=2, target_reg=eax)
6. tracked_regs = eax
7. Walk backwards
8. idx=1: add eax, dword ptr [0x49AEE4]
9. get_register_written returns eax
10. Add to dependencies
11. get_registers_read returns eax, [0x49AEE4]
12. tracked_regs = eax
13. idx=0: mov eax, 30BCh
14. get_register_written returns eax
15. Add to dependencies
16. get_registers_read returns empty
17. tracked_regs = empty, DONE
18. Return dependencies: mov + add

Unicorn Emulation
19. emulate_dependencies(deps, target_reg=eax)
20. unicorn emulator
21. Execute: mov eax, 0x30BC, so eax = 0x30BC
22. Execute: add eax, dword ptr [0x49AEE4]
23. read_dword 0x49AEE4 returns 0x00450464
24. eax = 0x30BC + 0x00450464 = 0x00453520
25. Return: 0x00453520

The snippets below are methods of the analyzer class. They rely on these imports:

    import struct
    from capstone.x86 import X86_OP_REG, X86_OP_MEM
    from unicorn import Uc, UC_ARCH_X86, UC_MODE_32
    from unicorn.x86_const import UC_X86_REG_ESP

Reads a 4-byte integer from a virtual address by finding the correct section and unpacking the bytes.

    def read_dword(self, va):
        for s in self.sections.values():
            if s['start'] <= va < s['end']:
                off = va - s['start']
                if off + 4 <= len(s['data']):
                    return struct.unpack('[...]

    [...] = start or inst.mnemonic in ['cmp', 'test', 'sub', 'add', 'xor', 'or', 'and']:
        if op.type == X86_OP_REG:
            regs.add(op.reg)
        elif op.type == X86_OP_MEM:
            if op.mem.base:
                regs.add(op.mem.base)
            if op.mem.index:
                regs.add(op.mem.index)
    return regs

Walks backward through instructions to find all instructions that contribute to computing the value of a target register.

    def trace_dependencies(self, instructions, target_idx, target_reg, max_lookback=100):
        deps, tracked, seen = [], {target_reg}, set()
        # The stop value is exclusive, so use -1 (not 0) as the lower bound;
        # otherwise the instruction at index 0 is never visited.
        for idx in range(target_idx - 1, max(-1, target_idx - max_lookback), -1):
            inst = instructions[idx]
            # Skip branches/returns and instructions already in the slice.
            if inst.mnemonic in ['jmp', 'je', 'jne', 'jz', 'jnz', 'jg', 'jl', 'jge',
                                 'jle', 'ja', 'jb', 'jae', 'jbe', 'ret'] or inst.address in seen:
                continue
            # Keep instructions that write a register we are currently tracking.
            if (inst.operands and inst.operands[0].type == X86_OP_REG
                    and inst.operands[0].reg in tracked
                    and inst.mnemonic in ['mov', 'lea', 'add', 'sub', 'xor', 'or', 'and', 'shl',
                                          'shr', 'sar', 'imul', 'mul', 'movzx', 'movsx']):
                deps.insert(0, inst)
                seen.add(inst.address)
                # The written register is resolved here; track what it was computed from.
                tracked.remove(inst.operands[0].reg)
                tracked.update(self.get_registers_read(inst))
                if not tracked:
                    break
        return deps

Uses Unicorn Engine to execute the dependency chain and resolve the final value of the target register.

    def emulate_dependencies(self, deps, target_reg):
        if not deps:
            return None
        mu = Uc(UC_ARCH_X86, UC_MODE_32)
        code_base, stack_base = 0x1000000, 0x2000000
        mu.mem_map(code_base, 0x10000)
        mu.mem_map(stack_base, 0x10000)
        mu.reg_write(UC_X86_REG_ESP, stack_base + 0x8000)
        # Map every section of the sample at its real address, rounded up to a page.
        for s in self.sections.values():
            try:
                mu.mem_map(s['start'], ((len(s['data']) + 0xFFF) // 0x1000) * 0x1000)
                mu.mem_write(s['start'], bytes(s['data']))
            except Exception:
                pass  # skip sections that fail to map

Disassembles a code range and finds all indirect calls/jumps, then traces and resolves their target addresses.

    def analyze_range(self, start_va, end_va):
        code = self.get_code_range(start_va, end_va)
        if not code:
            print(f"Failed to read code range 0x{start_va:08x} - 0x{end_va:08x}")
            return
        instructions = list(self.cs.disasm(code, start_va))
        for idx, inst in enumerate(instructions):
            # Only indirect transfers through a register are interesting: call reg / jmp reg.
            if inst.mnemonic in ["call", "jmp"] and inst.operands and inst.operands[0].type == X86_OP_REG:
                target_reg = self.cs.reg_name(inst.operands[0].reg)
                print(f"\n[+] Analyzing: 0x{inst.address:08x}: {inst.mnemonic} {target_reg}")
                deps = self.trace_dependencies(instructions, idx, inst.operands[0].reg)
                target_value = self.emulate_dependencies(deps, inst.operands[0].reg)
                if target_value is not None:
                    print(f"\n[*] Target resolved: 0x{target_value:08x}\n")

The full script can be found here: https://github.com/01Xyris/RE-Malware/blob/main/AuraStealer/find_cff.py

Now we just need to patch this in the binary. The patcher takes the last instruction before the call and overwrites it with mov reg, calculated_address.

Example, before:

    0x0042F6D7: mov  eax, 30BCh
    0x0042F6DC: add  eax, dword ptr [0x49AEE4]
    0x0042F6E2: call eax

After:

    0x0042F6D7: mov  eax, 30BCh
    0x0042F6DC: mov  eax, 0x00453520    ; calculated_address
                nop                     ; nop padding so size is the same
    0x0042F6E2: call eax

The full deobfuscation script can be found here: https://github.com/01Xyris/RE-Malware/blob/main/AuraStealer/aura_patch_obf.py
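To make the byte-level side of that patch concrete, here is a small sketch of how the replacement bytes could be built. It is an illustration under assumptions, not the code from aura_patch_obf.py: `REG_INDEX` and `build_patch` are hypothetical names, and it only covers the short `mov r32, imm32` form (opcode 0xB8 plus the register number, followed by the 32-bit immediate in little-endian order).

```python
import struct

# Register numbers used by the one-byte "mov r32, imm32" opcode (0xB8 + reg).
REG_INDEX = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
             "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def build_patch(reg, target, original_size):
    """Encode `mov reg, target` and pad with NOPs to the size of the
    instruction being overwritten (hypothetical helper)."""
    mov = bytes([0xB8 + REG_INDEX[reg]]) + struct.pack("<I", target & 0xFFFFFFFF)
    if len(mov) > original_size:
        raise ValueError("replacement does not fit into the original instruction")
    return mov + b"\x90" * (original_size - len(mov))

# `add eax, dword ptr [0x49AEE4]` is 6 bytes long (0x42F6DC..0x42F6E1),
# so the 5-byte mov needs one NOP of padding.
patch = build_patch("eax", 0x00453520, 6)
print(patch.hex())  # b82035450090
```

Padding to the original length keeps every following address unchanged, which is why the `call eax` at 0x0042F6E2 stays where it is in the example above.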