Rationale

Malware and cheats often call system APIs to get things done. Calls like VirtualAlloc and VirtualProtect are routed through NtAllocateVirtualMemory and NtProtectVirtualMemory after some user-mode checks in Kernel32.dll (KernelBase.dll on modern Windows). However, the higher up the API stack a call starts, the more places it can fall victim to user-mode hooks.

For example, instrumentation callbacks are set using NtSetInformationProcess. By using the info class ProcessInstrumentationCallback and passing a PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION structure containing a callback function, whoever installs the callback gets to handle syscalls coming from all threads of the process. The callback can also use TLS to store context while it examines the return address and arguments of each syscall.


Briefly on Syscalls

When we perform a syscall, we request that the OS do something on our behalf. On NT 6 and later versions of Windows, these system calls are implemented in ntoskrnl and win32k and dispatched through the service tables KiServiceTable and W32pServiceTable respectively.
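To make the idea of a service table concrete, here is a toy model of index-based dispatch. It is only a sketch: the real tables live in the kernel and their entry format differs between architectures, and every name below (toy_service_table, toy_dispatch, svc_double, svc_increment) is hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Toy model only: a "service" is just a function taking one argument.
using service_fn = std::uint32_t (*)(std::uint32_t arg);

inline std::uint32_t svc_double(std::uint32_t arg) { return arg * 2; }
inline std::uint32_t svc_increment(std::uint32_t arg) { return arg + 1; }

// Index 0 -> svc_double, index 1 -> svc_increment.
inline constexpr std::array<service_fn, 2> toy_service_table = { svc_double, svc_increment };

// Dispatch by index, the way a syscall number selects a routine.
// An out-of-range index yields std::nullopt instead of reading past the table.
inline std::optional<std::uint32_t> toy_dispatch(std::uint32_t index, std::uint32_t arg)
{
  if (index >= toy_service_table.size())
    return std::nullopt;
  return toy_service_table[index](arg);
}
```

The point is simply that the caller names a routine by number rather than by address, which is why the index placed in EAX matters so much later on.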

We can use syscalls to bridge the user-kernel gap not covered by public APIs, though many public routines are little more than renamed syscalls, often with some error checking added on top. The majority of syscalls are trampolined via ntdll. Let's disassemble an arbitrary x86 syscall stub.

Syscall Disasm

The routine first places the syscall index into EAX. It then does something peculiar. Because a 32-bit process is not native on 64-bit Windows, the syscall is instead routed through a translation layer, Wow64SystemServiceCall. Instead of containing the expected SYSENTER instruction, the stub hands off to this layer, which jumps into 64-bit mode and issues a SYSCALL.
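Since the stub begins by loading the index into EAX, the index can be read straight out of the stub's bytes. The following is a minimal sketch, assuming the stub starts with the B8 imm32 encoding of mov eax, imm32; the function name read_syscall_index is hypothetical, and it works on a plain byte buffer rather than on a live module.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Reads the syscall index from the first instruction of a 32-bit stub.
// Expects the bytes to begin with B8 imm32 (mov eax, imm32). Anything else,
// for example an E9 jmp written over the stub by a hook, yields std::nullopt.
inline std::optional<std::uint32_t> read_syscall_index(const std::uint8_t* stub, std::size_t size)
{
  constexpr std::uint8_t mov_eax_imm32 = 0xB8;
  if (stub == nullptr || size < 5 || stub[0] != mov_eax_imm32)
    return std::nullopt;

  // The immediate is stored little-endian; assemble it byte by byte so the
  // result does not depend on the host's endianness or alignment rules.
  return static_cast<std::uint32_t>(stub[1])
       | static_cast<std::uint32_t>(stub[2]) << 8
       | static_cast<std::uint32_t>(stub[3]) << 16
       | static_cast<std::uint32_t>(stub[4]) << 24;
}
```

Returning std::nullopt when the first byte is not B8 doubles as a cheap sanity check that the stub has not been patched.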

Wow64 Translation

We want to retrieve the address of this function, place the syscall ID in EAX, call it, fix up the stack dynamically and return. Instead of calling the ntdll export, we avoid any hooks, filters and callbacks placed on it and ship the call straight to the kernel. We can do this by allocating executable memory, writing the stub into it and calling it on the go.

u64 nt_wow64_transition(u8* nt_base)
{
  // 1. vv vv vv vv
  // BA A0 A0 31 4B       mov     edx, 4B31A0A0h
  // FF D2                call    edx

  // 2.    vv vv vv vv
  // FF 25 24 22 3B 4B    jmp      ds:_Wow64Transition
  return *uti::ida_sig<u64*>("BA ?? ?? ?? ?? FF D2 C2 34 00").ref(1).ref(2);
}

template<u32 id, typename... arg_t>
u32 nt_make_call(arg_t... args)
{
  u8 call_stub[15] = {
    0xb8, 0x00, 0x00, 0x00, 0x00,  // mov eax, ?? ?? ?? ??
    0xba, 0x00, 0x00, 0x00, 0x00,  // mov edx, ?? ?? ?? ??
    0xff, 0xd2,                    // call edx
    0xc2, 0x00, 0x00               // retn ?? ??
  };

  // VirtualAlloc returns void*, so cast before indexing byte by byte
  auto mem_alloc = (u8*)VirtualAlloc(0, sizeof(call_stub), MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
  if (!mem_alloc)
    return STATUS_UNSUCCESSFUL;

  *(u32*)(&call_stub[0x1]) = id;
  *(u32*)(&call_stub[0x6]) = (u32)nt_wow64_transition((u8*)LoadLibraryA("ntdll.dll")); // get wow64 transition
  *(u16*)(&call_stub[0xd]) = (u16)(sizeof(arg_t) + ... + 0); // + 0 keeps the fold valid with no arguments

  for (size_t i = 0; i < sizeof(call_stub); i++) mem_alloc[i] = call_stub[i]; // write stub to RWX alloc

  // call the stub we just wrote; the cast alone only names a function-pointer type
  auto ret_status = ((u32(__stdcall*)(arg_t...))mem_alloc)(args...);

  // MEM_RELEASE requires a size of 0 and cannot be combined with MEM_DECOMMIT
  if (!VirtualFree(mem_alloc, 0, MEM_RELEASE))
    return STATUS_UNSUCCESSFUL;

  return ret_status;
}
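
The byte patching in the stub above can be checked in isolation, without allocating or executing anything. The sketch below rebuilds the same 15-byte template in an ordinary array. It also rounds each argument up to a 4-byte stack slot when computing the retn operand, since 32-bit stdcall pushes every argument in multiples of 4 bytes; summing raw sizeof values gives the same answer only when every argument is already a multiple of 4 bytes wide. The names stack_slot_bytes, stdcall_arg_bytes, store_le and build_stub are hypothetical helpers, not part of the original code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Bytes one argument occupies on the 32-bit stdcall stack (rounded up to 4).
template <typename T>
constexpr std::size_t stack_slot_bytes() { return (sizeof(T) + 3) & ~std::size_t{3}; }

// Total bytes for the retn operand; the trailing + 0 keeps the fold valid
// when the parameter pack is empty.
template <typename... arg_t>
constexpr std::uint16_t stdcall_arg_bytes()
{
  return static_cast<std::uint16_t>((stack_slot_bytes<arg_t>() + ... + 0));
}

// Writes the low `bytes` bytes of `value` in little-endian order.
inline void store_le(std::uint8_t* dst, std::uint32_t value, std::size_t bytes)
{
  for (std::size_t i = 0; i < bytes; ++i)
    dst[i] = static_cast<std::uint8_t>(value >> (8 * i));
}

// Fills in the template: syscall id, call target, and bytes to pop on return.
inline std::array<std::uint8_t, 15> build_stub(std::uint32_t id, std::uint32_t target, std::uint16_t arg_bytes)
{
  std::array<std::uint8_t, 15> stub = {
    0xb8, 0x00, 0x00, 0x00, 0x00,  // mov eax, id
    0xba, 0x00, 0x00, 0x00, 0x00,  // mov edx, target
    0xff, 0xd2,                    // call edx
    0xc2, 0x00, 0x00               // retn arg_bytes
  };
  store_le(&stub[0x1], id, 4);
  store_le(&stub[0x6], target, 4);
  store_le(&stub[0xd], arg_bytes, 2);
  return stub;
}
```

Building the bytes in a plain array first also makes it easy to compare the result against the disassembly of a real ntdll stub before anything is copied into executable memory.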


Conclusion

With this method, we are able to issue syscalls on the go without calling the ntdll trampolines. I leave further analysis of Wow64Transition as an exercise to the reader. The bottom line is that we are able to implement syscalling trivially, effectively getting around many of the checks and balances that would otherwise catch us - such as those previously mentioned.

Similar methods are available for 64-bit processes, but the technique must be adapted somewhat. I leave this, too, as an exercise to the reader. Syscalls are a very interesting mechanic of operating system design.