
Detecting process injection with ETW

lobste.rs - Thu Apr 8 01:27

The goal of this series is to document some key concepts around memory injection detection, and to experiment with some simple detection use cases leveraging the TiEtw kernel feed. I'll also build a PoC detection agent based on this ETW, and experiment with different bypass techniques.

To understand why Microsoft's "Threat Intelligence Tracing" is useful and much needed for detecting modern and often obscure injection techniques, let's first briefly recall the most common injection detection techniques that defenders and security vendors have used over the last 15 years, and where they fall short.

Scanning process memory for signatures, much as an AV scans accessed files, is perhaps the oldest and most trivial technique for catching memory-resident malware. Depending on the specific implementation, the scans can be selective and take place after a new process is started, or periodic (a.k.a. persistence scanning) and cover all running processes.

While in theory this approach is very attractive, there are some challenges:

  • Firstly, memory scanning is notorious for being resource-intensive. The constantly growing average RAM size and the wide adoption of the x64 architecture only add to the challenge. This means that whatever our implementation, we will need to make accuracy-performance tradeoffs.

  • Because periodic scanners can't do full memory scans in short intervals (see above), they tend to be used for finding interesting and anomalous memory segment metadata, such as PAGE_EXECUTE_* protection in conjunction with the MEM_RESERVE | MEM_COMMIT allocation type (this is a GREAT simplification). These can then trigger a full memory scan of a particular process.

  • Traditional memory scanners lack contextual awareness:

    • Which process and thread has created/written to the memory page?

    • Was this a remote or local process memory allocation?

    • What sequence of WinAPI calls was used? What arguments were passed etc.

    • When to scan?

All this means traditional ("full") memory scanning is mainly used in AV on-demand scanning, as well as during incident response - often on "offline" memory dumps. Most if not all major endpoint security solutions rely on hybrid approaches: sandboxed execution plus fingerprinting, and event-triggered real-time scanning.
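To make the metadata triage mentioned above a bit more concrete, here is a minimal sketch of how a periodic scanner might shortlist regions before committing to a full scan. The region dictionaries are hypothetical stand-ins for a few MEMORY_BASIC_INFORMATION fields, and the extra "private memory only" condition is my own assumption rather than something any particular product is known to do. The constants are the standard winnt.h values.

```python
# Windows memory constants (values from winnt.h).
PAGE_READWRITE = 0x04
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80
MEM_COMMIT = 0x1000
MEM_PRIVATE = 0x20000
MEM_IMAGE = 0x1000000

EXECUTABLE_MASK = (PAGE_EXECUTE | PAGE_EXECUTE_READ |
                   PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)


def is_suspicious_region(region):
    """Return True for committed, private, executable regions.

    `region` is a hypothetical dict with 'protect', 'state' and 'type' keys.
    """
    executable = bool(region["protect"] & EXECUTABLE_MASK)
    committed = region["state"] == MEM_COMMIT
    private = region["type"] == MEM_PRIVATE  # assumption: skip image-backed code
    return executable and committed and private


def regions_to_scan(regions):
    """Pick the regions that would justify a full scan of the process."""
    return [r for r in regions if is_suspicious_region(r)]


# Made-up sample: a loaded image, an RWX private region, a plain data region.
sample = [
    {"base": 0x7FF600000000, "protect": PAGE_EXECUTE_READ,
     "state": MEM_COMMIT, "type": MEM_IMAGE},
    {"base": 0x1F000000000, "protect": PAGE_EXECUTE_READWRITE,
     "state": MEM_COMMIT, "type": MEM_PRIVATE},
    {"base": 0x1F000100000, "protect": PAGE_READWRITE,
     "state": MEM_COMMIT, "type": MEM_PRIVATE},
]
flagged = regions_to_scan(sample)
```

Only the second region survives the filter. A real scanner would also need the context listed above (who allocated it, when, and how), which is exactly what plain metadata cannot tell us.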

The most common techniques leveraged for event-triggered process scanning (but also more generally AV/EDR detection) involve using kernel driver notifications/callbacks, as well as user-mode Win32 API inline hooking. Both have their advantages and weaknesses.

Other interfaces e.g. AMSI (Anti-Malware Scanning Interface) exist and can be used by defenders for event-triggered scanning of buffers in various Windows interfaces such as VBA7.DLL or WMI.

Kernel driver callbacks

Windows allows driver writers to register callback routines to be called when a particular type of system event occurs. For instance, if we are interested in monitoring new process creation events, we can register a callback routine for it by calling PsSetCreateProcessNotifyRoutine. When a process is next started, our driver will get notified.

Some facts about driver callbacks:

  • Run in kernel-mode

    • Known bypass techniques require SeLoadDriverPrivilege and a vulnerable signed driver, or another method to enter kernel-land.

  • The number of callbacks is limited, and no callbacks exist for memory management/process injection APIs (thread creation is an exception). The most useful types of events we can intercept include:

    • Process creation

    • Image load

    • File creation

    • Thread creation

    • Registry operations

    • Minifilter callbacks

  • Apparently kernel callbacks can suffer from a race condition. However, this is more important when trying to proactively block execution, and does not affect detection capability.

While it's possible to use only driver callbacks for process creation, thread creation and image loads to inform our memory scanning, doing so doesn't solve the issues described earlier. This is why defenders had to resort to hooking, and historically even kernel-mode hooking, to gain visibility into all memory and process management operations.

Userland-hooking

Thanks to userland hooking we can intercept actual memory management (and other) API calls, and process these individually to provide better visibility, accuracy and performance. Moreover, we are no longer at Microsoft's mercy and can intercept whatever we think is useful for detection.

As an example - instead of continuously scanning process memory for the presence of malicious code, we can detour a Native API such as NtWriteVirtualMemory and scan the buffer passed to it at runtime. Or we could monitor a subset of calls to NtAllocateVirtualMemory, and trigger scanning based on the call parameters.
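A minimal sketch of the "scan the buffer at the hook" idea, with Python standing in for what would really be a native detour. The handler name, its arguments, the signature set and the "skip local writes" filter are all hypothetical, and the byte patterns are placeholders rather than real malware signatures.

```python
# Hypothetical signature set: name -> byte pattern. Placeholder bytes only.
SIGNATURES = {
    "demo_marker_a": bytes.fromhex("deadbeef"),
    "demo_marker_b": b"EXAMPLE-PAYLOAD",
}


def scan_buffer(buffer, signatures=SIGNATURES):
    """Return the sorted names of all signatures present in the buffer."""
    return sorted(name for name, pattern in signatures.items()
                  if pattern in buffer)


def on_write_virtual_memory(source_pid, target_pid, buffer):
    """Stand-in for the handler a detour would call before the real write.

    Only cross-process writes are scanned here; that filter is an assumption.
    """
    if source_pid == target_pid:
        return []
    return scan_buffer(buffer)


# A remote write whose buffer contains one of the placeholder patterns.
hits = on_write_virtual_memory(
    4321, 8765, b"\x90\x90" + bytes.fromhex("deadbeef") + b"\xcc")
```

The point is that we scan a few hundred kilobytes once, at the moment they are written, instead of repeatedly walking the whole address space.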

In case of many injection techniques, having visibility into arguments passed to calls might be enough to perform high-fidelity detection, without additional scanning of the target memory segments. A good example here would be queueing an APC routine outside of a loaded module.
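The APC example boils down to a range lookup: does the routine address fall inside any module loaded in the target process? A rough sketch follows, assuming we already have the module list as (base, size, name) tuples. The addresses are made up and the function names are mine.

```python
from bisect import bisect_right


def build_module_index(modules):
    """Sort (base, size, name) tuples by base address for binary search."""
    return sorted(modules)


def module_for_address(index, address):
    """Return the name of the module containing `address`, or None."""
    bases = [base for base, _, _ in index]
    pos = bisect_right(bases, address) - 1
    if pos < 0:
        return None  # address lies below every loaded module
    base, size, name = index[pos]
    return name if base <= address < base + size else None


def apc_is_suspicious(index, apc_routine):
    """An APC routine that is not backed by any loaded module is suspicious."""
    return module_for_address(index, apc_routine) is None


# Made-up module layout of a target process.
index = build_module_index([
    (0x7FFA10000000, 0x1F0000, "ntdll.dll"),
    (0x7FF9F0000000, 0x0C0000, "kernel32.dll"),
])
inside = apc_is_suspicious(index, 0x7FFA10001000)   # within ntdll.dll
outside = apc_is_suspicious(index, 0x1F000000000)   # private memory
```

No memory scanning is involved here at all; the call argument alone carries the signal.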

This approach is not perfect either:

  • Function prologues can be restored ("unhooking")

  • Techniques such as APC-based EarlyBird injection allow for code execution before protection DLLs are loaded and have time to apply hooks (including in ntdll)

  • Direct syscalls bypass userland hooking, and kernel-land hooking has not been possible on x64 systems since 2005, when Kernel Patch Protection (PatchGuard) was introduced

    • This also means no visibility into kernel-originating injections as was demonstrated by DOUBLEPULSAR in early 2017.
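To illustrate why inline hooks are fragile, here is a small sketch of what hooking and "unhooking" look like at the byte level. It assumes the usual x64 ntdll syscall stub, which starts with mov r10, rcx followed by mov eax, imm32, and a hook that overwrites the prologue with a jmp rel32. The syscall number and jump offset below are placeholders, and the function names are mine.

```python
# First bytes of a typical x64 ntdll syscall stub:
#   4C 8B D1   mov r10, rcx
#   B8 imm32   mov eax, <syscall number>
SYSCALL_STUB_PREFIX = bytes.fromhex("4c8bd1b8")
JMP_REL32 = 0xE9  # opcode commonly used by inline hooks


def classify_prologue(in_memory, clean_copy):
    """Compare the in-memory prologue against a clean copy (e.g. from disk)."""
    if in_memory == clean_copy:
        return "intact"
    if in_memory[:1] == bytes([JMP_REL32]):
        return "hooked (jmp rel32)"
    return "modified"


def unhook(clean_copy):
    """Restoring a prologue amounts to writing the clean bytes back."""
    return bytes(clean_copy)


# Placeholder syscall number 0x18 and placeholder jump offset 0x1000.
clean = SYSCALL_STUB_PREFIX + (0x18).to_bytes(4, "little")
hooked = bytes([JMP_REL32]) + (0x1000).to_bytes(4, "little") + clean[5:]

before = classify_prologue(hooked, clean)
after = classify_prologue(unhook(clean), clean)
```

The same comparison works in both directions: a defender can use it to notice that a hook was removed, and an attacker can use it to notice that a hook is present.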

In late 2018, a year after widespread DOUBLEPULSAR infections made it apparent that security vendors need kernel-level hooking into common process injection APIs, Windows 10 version 1809 added new kernel instrumentation that did just that.

Events intercepted from kernel-mode memory management and APC APIs are traced through an ETW provider called "Microsoft-Windows-Threat-Intelligence", and are available for subscription to processes running with the PPL-Antimalware level of protection. In practice this means only Microsoft-recognized security vendors with proper signing certificates can consume the feed.

As an example, on the latest Windows 10 x64 release, any memory allocation API will eventually reach nt!MiAllocateVirtualMemory. This is where the associated logging function EtwTiLogAllocExecVm is called from; assuming the trace is enabled, it logs all contextual information about the call.

We can list all the threat intel logging functions by checking on xrefs to EtwThreatIntProvRegHandle in the kernel image.

Other than ensuring kernel-to-usermode injections are logged, this kind of implementation makes bypass attempts fairly difficult, unlike ntdll inline hooking. And because the provider does not run in usermode, it cannot be patched out like the .NET CLR ETW and other usermode session providers.

Although a few bypasses have been researched and successfully attempted for the APC logic of the sensor, we have yet to see a good one for the fundamental memory management operations (allocations / memory writes / protection mask changes) required for any type of process injection.

Event types

The sensor logs at least 14 different memory and thread management events, divided into LOCAL and REMOTE groups, where remote means the API-calling process is different from the target process, and local signifies a process making changes within its own memory space.

KERNEL_THREATINT_TASK_ALLOCVM_REMOTE

KERNEL_THREATINT_TASK_PROTECTVM_REMOTE

KERNEL_THREATINT_TASK_MAPVIEW_REMOTE

KERNEL_THREATINT_TASK_QUEUEUSERAPC_REMOTE

KERNEL_THREATINT_TASK_SETTHREADCONTEXT_REMOTE

KERNEL_THREATINT_TASK_ALLOCVM_LOCAL

KERNEL_THREATINT_TASK_PROTECTVM_LOCAL

KERNEL_THREATINT_TASK_MAPVIEW_LOCAL

KERNEL_THREATINT_TASK_QUEUEUSERAPC_LOCAL

KERNEL_THREATINT_TASK_SETTHREADCONTEXT_LOCAL

KERNEL_THREATINT_TASK_READVM_LOCAL

KERNEL_THREATINT_TASK_WRITEVM_LOCAL

KERNEL_THREATINT_TASK_READVM_REMOTE

KERNEL_THREATINT_TASK_WRITEVM_REMOTE

As you can imagine, the LOCAL operations are very common and quite noisy. So much so in fact that folks at Microsoft decided not to even send most of those to event consumers registered using the default (zeroed) "any/all" keyword bitmasks.

The events themselves contain a ton of useful fields including source and target process signature trust level or 1:1 details from the MEMORY_BASIC_INFORMATION structure. Some events - e.g. SetThreadContext - go as far as including current state of all registers, disk path of the module in VAD pointed to by EIP at the time of the call, and more. The full extracted manifests can be found here.

KERNEL_THREATINT_TASK_ALLOCVM_REMOTE (

UInt32 CallingProcessId,

FILETIME CallingProcessCreateTime,

UInt64 CallingProcessStartKey,

UInt8 CallingProcessSignatureLevel,

UInt8 CallingProcessSectionSignatureLevel,

UInt8 CallingProcessProtection,

UInt32 CallingThreadId,

FILETIME CallingThreadCreateTime,

UInt32 TargetProcessId,

FILETIME TargetProcessCreateTime,

UInt64 TargetProcessStartKey,

UInt8 TargetProcessSignatureLevel,

UInt8 TargetProcessSectionSignatureLevel,

UInt8 TargetProcessProtection,

UInt32 OriginalProcessId,

FILETIME OriginalProcessCreateTime,

UInt64 OriginalProcessStartKey,

UInt8 OriginalProcessSignatureLevel,

UInt8 OriginalProcessSectionSignatureLevel,

UInt8 OriginalProcessProtection,

Pointer BaseAddress,

Pointer RegionSize,

UInt32 AllocationType,

UInt32 ProtectionMask

)
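When eyeballing these events it helps to turn the raw AllocationType and ProtectionMask values into their symbolic names. A minimal sketch, assuming the event has already been parsed into a dictionary keyed by the field names above with integer values (how your ETW consumer serializes them may differ). The constants are the standard winnt.h values; the helper names are mine.

```python
ALLOCATION_TYPE_FLAGS = {
    0x1000: "MEM_COMMIT",
    0x2000: "MEM_RESERVE",
    0x80000: "MEM_RESET",
    0x100000: "MEM_TOP_DOWN",
    0x20000000: "MEM_LARGE_PAGES",
}
PAGE_PROTECTIONS = {
    0x01: "PAGE_NOACCESS", 0x02: "PAGE_READONLY",
    0x04: "PAGE_READWRITE", 0x08: "PAGE_WRITECOPY",
    0x10: "PAGE_EXECUTE", 0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE", 0x80: "PAGE_EXECUTE_WRITECOPY",
}
PAGE_MODIFIERS = {
    0x100: "PAGE_GUARD", 0x200: "PAGE_NOCACHE", 0x400: "PAGE_WRITECOMBINE",
}


def decode_allocation_type(value):
    """Turn an AllocationType bitmask into 'A | B' form."""
    names = [name for bit, name in ALLOCATION_TYPE_FLAGS.items() if value & bit]
    return " | ".join(names) if names else hex(value)


def decode_protection(value):
    """Decode the base protection (low byte) plus any modifier flags."""
    names = [PAGE_PROTECTIONS.get(value & 0xFF, hex(value & 0xFF))]
    names += [name for bit, name in PAGE_MODIFIERS.items() if value & bit]
    return " | ".join(names)


def summarize(event):
    """Condense an ALLOCVM event into a few human-readable fields."""
    return {
        "remote": event["CallingProcessId"] != event["TargetProcessId"],
        "allocation": decode_allocation_type(event["AllocationType"]),
        "protection": decode_protection(event["ProtectionMask"]),
        "size_kb": event["RegionSize"] / 1024,
    }


# Made-up event values, using field names from the manifest above.
sample_event = {
    "CallingProcessId": 4321, "TargetProcessId": 8765,
    "RegionSize": 204800, "AllocationType": 0x3000, "ProtectionMask": 0x40,
}
summary = summarize(sample_event)
```

For the sample above this yields a remote, 200 kB, MEM_COMMIT | MEM_RESERVE allocation with PAGE_EXECUTE_READWRITE protection.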

Data exploration

Because NTOSKRNL will only forward collected logs to PPL protected processes, getting the events themselves is not a very straightforward task. The easiest method involves using a self-signed certificate, a custom service executable configured to start as SERVICE_LAUNCH_PROTECTED_ANTIMALWARE_LIGHT, and an ELAM driver. All in testsigning mode.

Fortunately, @pathtofile has already done and documented this, so we can be lazy and use his ppl_runner to start our SilkETW or Sealighter session as PPL. Both tools will serialize the events to a series of JSONs.

For the first detection use case I decided to take a look at something simple, yet critically needed for any injection into a foreign process - the remote VM allocations - and more specifically only the ones where protections imply future code execution.

While collecting the events for about 1 hour, I simulated some typical office workstation activity by downloading, installing and using the Microsoft Office suite, different web browsers, Slack, virtualization software, Spotify and others. At the end of the test I executed 8 different CobaltStrike and Metasploit executables (both staged and stageless), and performed additional injections to see in what ways these stand out in the data.

Plotting AllocationType against RegionSize for ALLOCVM_REMOTE reveals that all the collected legitimate events are for minimal size memory segments (4 kB) and the MEM_COMMIT allocation type, while all the attack framework injections allocate between 170 kB and 350 kB for the payload, and use the MEM_COMMIT | MEM_RESERVE options.

In this case the malicious allocations also used PAGE_EXECUTE_READWRITE protection constant, but this is easily avoidable.

import io

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Jupyter magic: render plots inline in the notebook
%matplotlib inline

with open('output.json', 'r') as f:
    data = f.readlines()

# remove the trailing "\n" from each line
data = map(lambda x: x.rstrip(), data)

# each element of 'data' is an individual JSON object;
# join them into a single JSON array string
data_json_str = "[" + ','.join(data) + "]"
data_df = pd.read_json(io.StringIO(data_json_str))

# keep only KERNEL_THREATINT_TASK_ALLOCVM_REMOTE events
data_df = data_df[pd.json_normalize(data_df['header'])['event_id'] == 1]
header_df = pd.json_normalize(data_df['header'])
data_df = pd.json_normalize(data_df['properties'])

sns.set_theme(style="whitegrid")
fig = plt.figure(figsize=(12, 7))
sns.violinplot(
    data=data_df,
    x=data_df['AllocationType'],
    y=data_df['RegionSize'],
    palette='magma',
)
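The separation described above can also be expressed as a plain filter over the same JSON lines, without pandas. This is only a sketch: it assumes each line has the header/properties layout used in the notebook code, that numeric fields arrive either as integers or as numeric strings, and the 10 kB threshold is an arbitrary value between the 4 kB legitimate allocations and the 170 kB+ payloads seen in this data set.

```python
import json

MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
ALLOCVM_REMOTE_EVENT_ID = 1      # same event id as filtered on above
MIN_REGION_SIZE = 10 * 1024      # assumption: arbitrary cut-off


def to_int(value):
    """Accept ints as well as decimal or 0x-prefixed strings."""
    return int(value, 0) if isinstance(value, str) else int(value)


def suspicious_allocations(json_lines, min_size=MIN_REGION_SIZE):
    """Return properties of large MEM_COMMIT | MEM_RESERVE remote allocations."""
    hits = []
    for line in json_lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event["header"]["event_id"] != ALLOCVM_REMOTE_EVENT_ID:
            continue
        props = event["properties"]
        size = to_int(props["RegionSize"])
        alloc_type = to_int(props["AllocationType"])
        if size >= min_size and alloc_type == (MEM_COMMIT | MEM_RESERVE):
            hits.append(props)
    return hits


# Made-up sample lines: a small commit, a large commit+reserve, another event id.
sample_lines = [
    '{"header": {"event_id": 1}, "properties": {"RegionSize": 4096, "AllocationType": 4096}}',
    '{"header": {"event_id": 1}, "properties": {"RegionSize": "0x32000", "AllocationType": 12288}}',
    '{"header": {"event_id": 2}, "properties": {"RegionSize": 300000, "AllocationType": 12288}}',
]
hits = suspicious_allocations(sample_lines)
```

Only the second sample line matches. With real data, something like suspicious_allocations(open('output.json')) would do the same over the collected file.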

The small data sample does not take into account the many edge cases such as anti-cheat software, IDEs, compilers, antivirus and other custom software, which would produce at least some false positive events over time. Nevertheless, this single-event detection logic is still a strong indicator, and allocations with these characteristics are flagged by MS Defender ATP itself, and by a few other vendors, at least for a subset of processes.

Microsoft Defender for Windows

F-Secure RDR

Allocations in general are also a good entry point for further development of the detection lifecycle, so with that I decided to write a proof of concept memory injection detection agent based on TI ETW to further research different detection use cases, as well as to see what techniques could be employed to make the sensor blind.

Building a PoC agent

The sensor is written in C/C++, and is a Windows Service executable, similar to how real EDR/AV agents are set up.

As mentioned, to successfully register an event consumer for Microsoft-Windows-Threat-Intelligence, we need to be running as at least a PPL process, and the most elegant way to do it is to sign your software with a Microsoft-verified certificate (with Code Signing EKU), or simulate such a situation with testsigning, as described in the earlier linked post by @pathtofile. For no reason I chose the latter ; )

Once we have the basic service + ELAM driver installation handled, we can move on to event consumption - here I chose to use the krabsetw library by Microsoft. The session setup is straightforward:

  • Create new trace object

  • Create new provider object

  • Optionally add filters and define any/all collection flags. Here I decided to collect only one event type, using krabs::predicates::id_is((int)KERNEL_THREATINT_TASK_ALLOCVM_REMOTE), simply because there isn't yet a use case for the others in the detection code. The number of remote events is small enough, however, that we could safely skip this step and filter later.

  • Register a callback function

  • Enable and start the trace

DWORD agent_worker()
{
    DWORD ret{ 0 };

    log_debug(L"TiEtwAgent: Started the agent worker\n");

    // Trace session, provider and event-id filter
    krabs::user_trace trace(ETS_NAME);
    krabs::provider<> provider(L"Microsoft-Windows-Threat-Intelligence");
    krabs::event_filter filter(krabs::predicates::id_is((int)KERNEL_THREATINT_TASK_ALLOCVM_REMOTE));

    try {
        log_debug(L"TiEtwAgent: Setting up the trace session\n");
        provider.add_on_event_callback(parse_single_event);
        provider.add_filter(filter);
        trace.enable(provider);
        trace.start();  // blocks while the session is running
    }
    catch (...) {
        log_debug(L"TiEtwAgent: Failed to setup a trace session\n");
        trace.stop();
    }

    ret = GetLastError();
    return ret;
}

Next we can get the schema, set up the parser and parse the events. Since the fields I needed to use always contained an unsigned integer of some size, I decided to put them into a map<wstring, uint64_t>. This will likely need a rework soon, but for now serves its purpose.

VOID parse_single_event(const EVENT_RECORD& record, const krabs::trace_context& trace_context) {
    krabs::schema schema(record, trace_context.schema_locator);
    krabs::parser parser(schema);

    map<wstring, uint64_t> parsed_event;
    int eid = schema.event_id();

    switch (eid) {
    case KERNEL_THREATINT_TASK_ALLOCVM_REMOTE:
        parsed_event = parse_allocvm_remote(schema, parser);
        break;
    // Not handled yet: these fall through to the default case
    case KERNEL_THREATINT_TASK_PROTECTVM_REMOTE:
    case KERNEL_THREATINT_TASK_MAPVIEW_REMOTE:
    case KERNEL_THREATINT_TASK_QUEUEUSERAPC_REMOTE:
    case KERNEL_THREATINT_TASK_SETTHREADCONTEXT_REMOTE:
    case KERNEL_THREATINT_TASK_ALLOCVM_LOCAL:
    case KERNEL_THREATINT_TASK_PROTECTVM_LOCAL:
    case KERNEL_THREATINT_TASK_MAPVIEW_LOCAL:
    case KERNEL_THREATINT_TASK_QUEUEUSERAPC_LOCAL:
    case KERNEL_THREATINT_TASK_SETTHREADCONTEXT_LOCAL:
    case KERNEL_THREATINT_TASK_READVM_LOCAL:
    case KERNEL_THREATINT_TASK_WRITEVM_LOCAL:
    case KERNEL_THREATINT_TASK_READVM_REMOTE:
    case KERNEL_THREATINT_TASK_WRITEVM_REMOTE:
    default:
        return;
    }

    if (parsed_event.empty()) {
        log_debug(L"TiEtwAgent: Failed to parse an event\n");
    }
    else {
        detect_event(parsed_event, eid);
    }
    return;
}

// Excerpt from the body of the allocation event parser; the function
// signature and the allocation_fields / zero_map definitions are not shown.
try {
    for (krabs::property property : parser.properties()) {
        std::wstring wsPropertyName = property.name();

        // Only keep the fields we have pre-registered in allocation_fields
        if (allocation_fields.find(wsPropertyName) != allocation_fields.end()) {
            switch (property.type()) {
            case TDH_INTYPE_UINT32:
                allocation_fields[wsPropertyName] = parser.parse<std::uint32_t>(wsPropertyName);
                break;
            case TDH_INTYPE_POINTER:
                allocation_fields[wsPropertyName] = parser.parse<krabs::pointer>(wsPropertyName).address;
                break;
            }
        }
    }
    return allocation_fields;
}
catch (...) {
    log_debug(L"Error parsing the event\n");
    return zero_map;
}

The successfully parsed events are ready to go through the detection logic, implemented as a set of independent functions in DetectionLogic.cpp (at the time of writing literally only 3 functions, but we'll work on it in the next post).

The boolean output of one detection function can trigger another one. Very procedural, no scoring etc.

VOID detect_event(std::map<std::wstring, uint64_t> parsed_event, int eid) {
    switch (eid) {
    case KERNEL_THREATINT_TASK_ALLOCVM_REMOTE: {
        // Braces scope s1 to this case, so later case labels
        // do not jump past its initialization
        bool s1 = allocvm_remote_meta_generic(parsed_event);
        if (s1) {
            allocvm_remote_signatures(parsed_event);
        }
        break;
    }
    case KERNEL_THREATINT_TASK_PROTECTVM_REMOTE:
    case KERNEL_THREATINT_TASK_MAPVIEW_REMOTE:
    case KERNEL_THREATINT_TASK_QUEUEUSERAPC_REMOTE:
    (...)

const int ALLOC_PROTECTION{ PAGE_EXECUTE_READWRITE };
const int ALLOC_TYPE{ MEM_RESERVE | MEM_COMMIT };
const int MIN_REGION_SIZE{ 10240 };

// Flags large remote RWX allocations made with MEM_RESERVE | MEM_COMMIT
DWORD allocvm_remote_meta_generic(std::map<std::wstring, uint64_t> alloc_event) {
    if (alloc_event[L"RegionSize"] >= MIN_REGION_SIZE) {
        if (alloc_event[L"AllocationType"] == ALLOC_TYPE) {
            if (alloc_event[L"ProtectionMask"] == ALLOC_PROTECTION) {
                report_detection(ALLOCVM_REMOTE_META_GENERIC, alloc_event);
                return TRUE;
            }
        }
    }
    return FALSE;
}

The whole thing is only around 700 lines, and here is the effect so far

( should've shown it on the DOUBLEPULSAR example... )

In the next post we'll look into bypass techniques.