Malware Reverse Engineering with Ghidra: A Beginner's Guide — HackerXone

Disclaimer: This content is intended for educational purposes and authorized security research only. Analyzing malware should only be performed in isolated environments with proper authorization. Unauthorized analysis of malicious software may violate computer crime laws in your jurisdiction.

Introduction: Why Reverse Engineering Matters in 2026

The malware landscape has evolved dramatically. With the proliferation of AI-generated payloads and polymorphic threats, understanding what malicious code actually does at the binary level has become an essential skill for security professionals. In Q1 2026 alone, CISA reported a 340% increase in novel malware families that evade traditional signature-based detection. When your EDR flags a suspicious binary but can’t tell you what it does, reverse engineering becomes your best investigative tool.

Ghidra, the NSA’s open-source software reverse engineering (SRE) framework, has matured into a powerhouse rivaling commercial alternatives like IDA Pro. Recent releases such as Ghidra 11.2 have improved decompilation accuracy and scripting capabilities, making it an excellent platform for analysts entering the reverse engineering field.

This guide will walk you through setting up a safe analysis environment, navigating Ghidra’s interface, and performing practical analysis on a real malware sample. By the end, you’ll have the foundational skills to begin dissecting unknown threats in your own incident response workflows.

Setting Up Your Malware Analysis Environment

Before touching any malicious sample, you need an isolated environment that prevents accidental infection of your host system or network. This isn’t optional—it’s mandatory for safe analysis.

Creating an Isolated Virtual Machine

I recommend using a dedicated hypervisor with snapshot capabilities. VirtualBox or VMware Workstation Pro both work well, but ensure you configure the VM with the following restrictions:

  • Network isolation: Use host-only networking or disable networking entirely during initial analysis
  • Shared folders and clipboard disabled: Prevent any file or clipboard sharing with the host
  • Snapshot before analysis: Always create a clean snapshot you can revert to
  • Nested virtualization disabled: Some malware detects nested VMs and alters behavior

For your analysis VM, consider REMnux or FLARE-VM. REMnux is an Ubuntu-based Linux distribution preconfigured with analysis tools, while FLARE-VM transforms a Windows installation into an analysis workstation. For Ghidra-focused work, either works well.

Installing Ghidra 11.2

Ghidra 11.2 requires a 64-bit Java Development Kit (JDK) 21 or later. Here’s the installation process on a Linux-based analysis VM:

# Install OpenJDK 21 (package availability depends on your distribution release)
sudo apt update
sudo apt install openjdk-21-jdk unzip -y

# Verify Java installation
java --version

# Download Ghidra 11.2 from the official GitHub releases page
# (the build date is part of the file name; copy the exact link from the release page)
wget https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.2_build/ghidra_11.2_PUBLIC_20240926.zip

# Extract and run
unzip ghidra_11.2_PUBLIC_20240926.zip
cd ghidra_11.2_PUBLIC
./ghidraRun

On first launch, Ghidra will prompt you for the JDK path if it isn’t detected automatically. On most Debian-based systems, point it to /usr/lib/jvm/java-21-openjdk-amd64.

Obtaining Malware Samples Safely

For learning purposes, you need real malware samples. Several legitimate repositories exist for researchers:

  • MalwareBazaar: Abuse.ch’s database with over 2 million samples
  • VirusTotal: Requires a premium account for downloads, but excellent for initial triage
  • theZoo: GitHub repository with curated samples for educational purposes
  • Malware Traffic Analysis: Focuses on network-based samples with PCAPs

Always handle samples in password-protected ZIP archives (standard password is “infected”) and never extract them on your host system.
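
Repositories such as MalwareBazaar publish each sample’s SHA-256, so once the archive is extracted inside the VM it is worth confirming you have the file you think you have. A minimal sketch using only the standard library; the helper names are illustrative, and the expected hash is whatever the repository lists for your sample:

```python
import hashlib

# Illustrative helpers (hypothetical names), not part of any analysis tool


def sha256_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_sha256):
    """Compare a file's SHA-256 with the hash listed by the repository."""
    return sha256_file(path) == expected_sha256.strip().lower()
```

For example, matches_published_hash("sample.exe", "<hash from the repository page>") returns True only when the extracted file is byte-for-byte the published sample. Hashing reads the file as data and never executes it.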

Ghidra Interface Deep Dive

When you first open a binary in Ghidra, the interface can feel overwhelming. Understanding the key windows will accelerate your analysis significantly.

The Code Browser: Your Analysis Hub

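After you create a new project, import a binary, and open it in the CodeBrowser, Ghidra offers to run auto-analysis. This process identifies functions, strings, cross-references, and data types; the Decompiler window then reconstructs pseudo-C on demand for the function you select. For a typical 500KB sample, expect auto-analysis to take roughly 2-5 minutes, depending on your VM’s resources.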

The Code Browser presents several critical panels:

  • Listing Window: Shows disassembly with addresses, bytes, and assembly instructions
  • Decompiler Window: Displays pseudo-C code reconstructed from assembly
  • Symbol Tree: Navigates functions, labels, classes, and namespaces
  • Data Type Manager: Shows recognized structures and allows custom type definition
  • Function Graph: Visualizes control flow within a function

Essential Keyboard Shortcuts

Efficient analysis requires mastering navigation shortcuts:

  • G – Go to address
  • L – Rename label/function
  • ; – Add comment
  • Ctrl+Shift+F – Show references to current location
  • Ctrl+Shift+E – Search program text (for a list of all strings, use Search > For Strings or Window > Defined Strings)
  • D – Disassemble at cursor
  • T – Change data type

Practical Analysis: Dissecting a Ransomware Dropper

Let’s work through analyzing a real-world sample. We’ll use a defanged ransomware dropper from early 2026 that exhibits common techniques including string obfuscation, dynamic API resolution, and process injection.

Initial Triage

Before opening in Ghidra, perform basic static analysis to understand what you’re dealing with:

# Get file type and basic info
file sample.exe
# Output: PE32+ executable (GUI) x86-64, for MS Windows

# Calculate hashes for threat intel lookup
sha256sum sample.exe
# Output: 3a7bd3e2f5d9c8b1a6e4f2d8c9b7a5e3d1c9f8a7b6e5d4c3b2a1f0e9d8c7b6a5

# Extract strings (use FLOSS for obfuscated strings)
floss sample.exe > strings_output.txt

# Check for packing with Detect It Easy
diec sample.exe
# Output: UPX(3.96)[NRV2B,brute]
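
Detect It Easy’s packer verdict can be cross-checked with a rough entropy measurement: compressed or encrypted data tends toward 8 bits per byte, and values above about 7 are a common rule of thumb for packing, not a hard threshold. A minimal sketch of Shannon entropy over raw bytes (helper names are illustrative):

```python
import math
from collections import Counter

# Illustrative helpers (hypothetical names) for a quick packing heuristic


def shannon_entropy(data: bytes) -> float:
    """Return Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # iterating bytes yields ints 0-255
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def file_entropy(path: str) -> float:
    """Entropy of a whole file; per-section values are more telling."""
    with open(path, "rb") as f:
        return shannon_entropy(f.read())
```

Whole-file entropy blends headers, code, and data, so a packed section can be diluted; computing it per PE section (as Detect It Easy’s entropy view does) gives a clearer picture.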

This sample is UPX packed. Before detailed analysis, we should unpack it:

upx -d sample.exe -o sample_unpacked.exe
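
UPX-packed PE files usually carry sections named UPX0 and UPX1 (unless the author renamed them), so listing section names before and after unpacking is a quick sanity check. The sketch below parses just enough of the PE header with the standard library, reading the file as data only; the helper names are illustrative, and a maintained library such as pefile handles malformed headers far more robustly:

```python
import struct

# Illustrative helpers (hypothetical names); minimal PE parsing, no validation beyond signatures


def pe_section_names(data: bytes) -> list:
    """Return section names from a PE file's section table."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    # COFF header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes)
    _, num_sections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, coff)
    table = coff + 20 + opt_size  # section headers follow the optional header
    names = []
    for i in range(num_sections):
        raw = data[table + 40 * i: table + 40 * i + 8]  # 8-byte name field
        names.append(raw.rstrip(b"\x00").decode("ascii", errors="replace"))
    return names


def looks_upx_packed(data: bytes) -> bool:
    return any(name.startswith("UPX") for name in pe_section_names(data))
```

Run it on both files (for example, pe_section_names(open("sample.exe", "rb").read())): the packed sample should show the UPX sections, while the unpacked copy should show conventional names such as .text and .data.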

Identifying Malicious Functions

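After importing the unpacked binary into Ghidra and letting auto-analysis complete, start with the entry point. In the Symbol Tree, open Functions and select entry, the function Ghidra creates at the AddressOfEntryPoint from the PE optional header. In binaries built with a standard C runtime, entry is usually compiler startup code that eventually calls main or WinMain; in this sample, the malicious logic sits directly in entry.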

The decompiler shows the following suspicious code pattern:

void entry(void)
{
  undefined8 uVar1;
  HMODULE hModule;
  FARPROC pFVar2;
  char local_58[32];
  char local_38[40];
  
  /* XOR-decoded string reconstruction */
  decode_string(local_58, &DAT_140005000, 0x1a, 0x5f);
  decode_string(local_38, &DAT_140005020, 0x10, 0x5f);
  
  /* Dynamic API resolution */
  hModule = LoadLibraryA(local_58);  // kernel32.dll
  if (hModule != NULL) {
    pFVar2 = GetProcAddress(hModule, local_38);  // VirtualAlloc
    if (pFVar2 != NULL) {
      uVar1 = (*pFVar2)(0, 0x1000, 0x3000, 0x40);
      /* Shellcode injection follows */
    }
  }
  return;
}

Several red flags appear immediately: XOR string decoding (a common obfuscation technique), dynamic API resolution (which hides the API from static import analysis), and a VirtualAlloc call with MEM_COMMIT | MEM_RESERVE (0x3000) and PAGE_EXECUTE_READWRITE (0x40), which prepares memory that can both hold and execute shellcode.
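
Decompiled calls often show documented Windows constants as raw integers. A small lookup turns them back into names; the sketch below covers only a subset of the MEM_* and PAGE_* values from the Windows headers, and the helper names are illustrative:

```python
# Illustrative helpers (hypothetical names); subset of documented winnt.h constants
MEM_FLAGS = {
    0x00001000: "MEM_COMMIT",
    0x00002000: "MEM_RESERVE",
    0x00080000: "MEM_RESET",
    0x00100000: "MEM_TOP_DOWN",
}

PAGE_PROTECTIONS = {
    0x01: "PAGE_NOACCESS",
    0x02: "PAGE_READONLY",
    0x04: "PAGE_READWRITE",
    0x08: "PAGE_WRITECOPY",
    0x10: "PAGE_EXECUTE",
    0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE",
    0x80: "PAGE_EXECUTE_WRITECOPY",
}

EXECUTABLE = {0x10, 0x20, 0x40, 0x80}
WRITABLE = {0x04, 0x08, 0x40, 0x80}


def decode_alloc_type(value):
    """Split flAllocationType into flag names; unknown bits are shown in hex."""
    names = [name for bit, name in MEM_FLAGS.items() if value & bit]
    unknown = value & ~sum(MEM_FLAGS)
    if unknown:
        names.append(hex(unknown))
    return " | ".join(names)


def decode_protection(value):
    """Name the base protection (low byte) plus PAGE_GUARD/PAGE_NOCACHE modifiers."""
    base = value & 0xFF
    names = [PAGE_PROTECTIONS.get(base, hex(base))]
    if value & 0x100:
        names.append("PAGE_GUARD")
    if value & 0x200:
        names.append("PAGE_NOCACHE")
    return " | ".join(names)


def is_writable_and_executable(value):
    base = value & 0xFF
    return base in EXECUTABLE and base in WRITABLE
```

Applied to the decompiled call (*pFVar2)(0, 0x1000, 0x3000, 0x40), decode_alloc_type(0x3000) returns "MEM_COMMIT | MEM_RESERVE" and decode_protection(0x40) returns "PAGE_EXECUTE_READWRITE".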

Decoding Obfuscated Strings

Let’s examine the decode_string function. In the decompiler, it appears as:

void decode_string(char *output, byte *encoded, int length, byte key)
{
  int i;
  for (i = 0; i < length; i++) {
    output[i] = encoded[i] ^ key;
  }
  output[length] = '\0';
  return;
}

This is a simple single-byte XOR cipher. We can decode the strings using a Ghidra Python script:

# Ghidra Python (Jython) script to decode XOR strings
# Run via Window > Script Manager: create a new Python script, paste this in, and run it

def xor_decode(addr, length, key):
    """Decode a single-byte XOR-encoded string starting at addr"""
    mem = currentProgram.getMemory()
    result = []
    for i in range(length):
        # Java bytes are signed; mask to 0-255 before XORing
        byte_val = mem.getByte(addr.add(i)) & 0xFF
        result.append(chr(byte_val ^ key))
    # LoadLibraryA and GetProcAddress read up to the first NUL, so trim there too
    return ''.join(result).split('\x00')[0]

# First encoded string at 0x140005000, length 0x1a, key 0x5f
string1 = xor_decode(toAddr(0x140005000), 0x1a, 0x5f)
print("Decoded string 1: " + string1)

# Second encoded string at 0x140005020, length 0x10, key 0x5f
string2 = xor_decode(toAddr(0x140005020), 0x10, 0x5f)
print("Decoded string 2: " + string2)

Running this script reveals the decoded strings "kernel32.dll" and "VirtualAlloc", confirming our suspicions about dynamic API resolution.
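
Here the key (0x5f) was visible as a constant in the decompiler. When it isn’t, a single-byte XOR key can usually be recovered from a guessed plaintext fragment (a "crib"), because there are only 256 candidates. A plain-Python sketch, assuming you have copied the encoded bytes out of the Listing; the hex below is "kernel32.dll" encoded with key 0x5F, and the helper names are illustrative:

```python
# Illustrative helpers (hypothetical names) for single-byte XOR work outside Ghidra


def xor_single(data: bytes, key: int) -> bytes:
    """XOR every byte with one key byte; encoding and decoding are the same operation."""
    return bytes(b ^ key for b in data)


def find_single_byte_keys(data: bytes, crib: bytes) -> list:
    """Return every key 0-255 whose decoding contains the crib."""
    return [key for key in range(256) if crib in xor_single(data, key)]


encoded = bytes.fromhex("343a2d313a336c6d713b3333")
for key in find_single_byte_keys(encoded, b".dll"):
    print(hex(key), xor_single(encoded, key))
```

This prints 0x5f b'kernel32.dll'. Short cribs can match more than one key on longer buffers, so treat each hit as a candidate to check against the surrounding code.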

Tracing the Injection Chain

Following the code flow after VirtualAlloc, we find the classic process injection pattern:

  1. VirtualAlloc: Allocates RWX memory in the current process
  2. Memory copy: Shellcode from embedded data section copied to allocated buffer
  3. CreateThread: New thread created with shellcode buffer as start address

The shellcode itself (located at DAT_140006000) appears to be a Cobalt Strike beacon based on characteristic patterns. We can extract it for further analysis:

# In Ghidra's Python console
from java.io import FileOutputStream

# getBytes() returns a Java byte[], which FileOutputStream can write directly
shellcode = getBytes(toAddr(0x140006000), 0x800)  # Adjust size as needed

out = FileOutputStream("/tmp/extracted_shellcode.bin")
try:
    out.write(shellcode)
finally:
    out.close()

print("Shellcode extracted to /tmp/extracted_shellcode.bin")

Attack Scenarios: Understanding Adversary Techniques

Understanding how attackers use these techniques helps you recognize them in future analysis. The sample we examined demonstrates several MITRE ATT&CK techniques:

T1027 - Obfuscated Files or Information

The XOR string encoding prevents static string extraction from revealing IOCs. More sophisticated variants use rolling XOR keys, RC4 encryption, or stack-based string construction. During analysis, run FLOSS (the FLARE Obfuscated String Solver, formerly FireEye Labs Obfuscated String Solver) before importing into Ghidra to automatically decode common obfuscation schemes.
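
"Rolling XOR" covers several schemes. The sketch below shows two common shapes as illustrations, not the scheme of any particular family: a repeating multi-byte key, and a one-byte key that advances by a fixed step after each byte. Both are their own inverse, so the same function encodes and decodes; the helper names are illustrative:

```python
# Illustrative helpers (hypothetical names); confirm the real scheme in the decompiler


def xor_repeating_key(data: bytes, key: bytes) -> bytes:
    """XOR with a multi-byte key that repeats across the buffer."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def xor_incrementing_key(data: bytes, start: int, step: int = 1) -> bytes:
    """XOR where the one-byte key advances by `step` (mod 256) after each byte."""
    out = bytearray()
    key = start & 0xFF
    for b in data:
        out.append(b ^ key)
        key = (key + step) & 0xFF
    return bytes(out)
```

When a decode loop updates its key inside the loop body, mirroring that update exactly (step size, wraparound, starting value) is usually the whole job.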

T1055 - Process Injection

While our sample used self-injection (creating a thread in its own process), the same technique scales to remote process injection. Watch for the pattern: OpenProcess → VirtualAllocEx → WriteProcessMemory → CreateRemoteThread. These API sequences are strong indicators of injection-capable malware.
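
Those API orderings can be checked mechanically against a call trace, for example one exported from an API monitor or sandbox. The sketch assumes the trace is a plain list of API names (a hypothetical format) and tests for each pattern in order while allowing unrelated calls in between:

```python
# Illustrative patterns and helper (hypothetical names)
REMOTE_INJECTION = ["OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"]
SELF_INJECTION = ["VirtualAlloc", "CreateThread"]


def contains_in_order(trace, pattern):
    """True if every API in `pattern` appears in `trace` in order (gaps allowed)."""
    calls = iter(trace)
    # `in` on an iterator consumes it up to the match, which enforces the ordering
    return all(api in calls for api in pattern)
```

A match is a lead, not a verdict: debuggers and some security products legitimately use the same sequence.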

T1027.007 - Obfuscated Files or Information: Dynamic API Resolution

Dynamic API resolution through LoadLibrary/GetProcAddress evades static import table analysis. Advanced malware uses API hashing where each API name is converted to a hash value, and the malware searches loaded modules for matching exports at runtime. Ghidra scripts can automate hash-to-API resolution if you identify the hashing algorithm.
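
As an illustration of hash-to-API resolution, the sketch below assumes the well-known ROR-13 additive scheme seen in Metasploit-style shellcode (rotate the 32-bit running hash right by 13, then add the next character), simplified here to the bare ASCII function name. Real implementations vary (module-name mixing, case folding, NUL handling, other rotation counts), so confirm the algorithm in the decompiler before trusting any table built this way. The helper names and the candidate list are illustrative:

```python
# Illustrative helpers (hypothetical names); one common API-hashing variant


def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by `count` bits (0 < count < 32)."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror13_hash(name: str) -> int:
    """ROR-13 additive hash over an ASCII API name."""
    h = 0
    for ch in name.encode("ascii"):
        h = (ror32(h, 13) + ch) & 0xFFFFFFFF
    return h


def build_hash_table(api_names):
    """Map hash -> name for candidate exports (e.g. kernel32.dll's export list)."""
    return {ror13_hash(name): name for name in api_names}


def resolve(hash_value: int, table: dict) -> str:
    return table.get(hash_value, "<unknown 0x%08x>" % hash_value)


candidates = ["LoadLibraryA", "GetProcAddress", "VirtualAlloc", "CreateThread"]
table = build_hash_table(candidates)
```

In practice you would feed build_hash_table the full export lists of the DLLs the sample loads, then resolve each 32-bit constant you find near the hashing routine.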

Defense Strategies and Detection

Your reverse engineering findings should translate directly to defensive measures. Based on our analysis, here are actionable detections:

YARA Rule for String Obfuscation Pattern

rule Dropper_XOR_String_Obfuscation {
    meta:
        description = "Detects XOR string decoding pattern from analyzed sample"
        author = "HackerXone Research"
        date = "2026-06-08"
        reference = "Internal analysis"
    
    strings:
        // XOR decode loop pattern
        $decode_loop = { 8A ?? ?? 32 ?? 88 ?? ?? FF C? 3B ?? 7C }
        
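        // VirtualAlloc argument setup on x64 (Microsoft x64 calling convention):
        // mov r9d, 0x40 (PAGE_EXECUTE_READWRITE) and mov r8d, 0x3000 (MEM_COMMIT | MEM_RESERVE)
        $rwx_protect = { 41 B9 40 00 00 00 }
        $commit_reserve = { 41 B8 00 30 00 00 }
        
        // "kernel32.dll" XOR-encoded with key 0x5F
        $encoded_k32 = { 34 3A 2D 31 3A 33 6C 6D 71 3B 33 33 }
    
    condition:
        uint16(0) == 0x5A4D and
        $decode_loop and
        (($rwx_protect and $commit_reserve) or $encoded_k32)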
}
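
Hand-encoding byte patterns for rules like this is error-prone, so it is worth generating them. A small sketch that XOR-encodes ASCII text with a one-byte key and formats the result as a YARA hex string; yara_hex_for_xor is an illustrative helper name, not part of any YARA tooling. (YARA’s own xor string modifier is an alternative when you want to match a plaintext under many single-byte keys.)

```python
# Illustrative helper (hypothetical name) for building YARA hex strings


def yara_hex_for_xor(plaintext: str, key: int) -> str:
    """XOR-encode ASCII text with a one-byte key and format it as a YARA hex string."""
    encoded = bytes(b ^ key for b in plaintext.encode("ascii"))
    return "{ " + " ".join(f"{b:02X}" for b in encoded) + " }"


print(yara_hex_for_xor("kernel32.dll", 0x5F))
```

This prints { 34 3A 2D 31 3A 33 6C 6D 71 3B 33 33 }, the $encoded_k32 pattern for this sample’s key.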

Sigma Rule for Behavioral Detection

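Process creation logs cannot observe API calls made inside a compiled binary such as this dropper, so the Sigma rule below only fires when the same technique surfaces on a command line, for example a PowerShell loader that calls VirtualAlloc with RWX protection. Detecting the dropper’s own VirtualAlloc calls requires API or memory telemetry from your EDR (see the endpoint recommendations that follow).

title: Suspicious Memory Allocation with Execute Permissions
id: a3b8c9d7-e5f4-4a2b-9c1d-8e7f6a5b4c3d
status: experimental
date: 2026-06-08
author: HackerXone Research
description: Detects command lines (e.g., PowerShell loaders) that reference VirtualAlloc together with PAGE_EXECUTE_READWRITE, which may indicate shellcode injection
logsource:
    category: process_creation
    product: windows
detection:
    selection_api:
        CommandLine|contains:
            - 'VirtualAlloc'
            - 'VirtualAllocEx'
    selection_rwx:
        CommandLine|contains:
            - '0x40'
            - 'PAGE_EXECUTE_READWRITE'
    condition: all of selection_*
falsepositives:
    - Legitimate administrative or developer scripts that call VirtualAlloc through P/Invoke
level: high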

Endpoint Detection Recommendations

  • Memory scanning: Configure your EDR to scan newly allocated RWX memory regions
  • API monitoring: Alert on LoadLibrary/GetProcAddress sequences followed by VirtualAlloc
  • Thread creation monitoring: Flag threads with start addresses in non-image memory
  • Behavioral analysis: Monitor for encrypted network connections from processes that loaded minimal DLLs
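
The thread-start check above comes down to asking whether an address falls inside any loaded module’s image range. A sketch of that test, assuming your EDR or sandbox reports module base addresses and image sizes; the (base, size, name) tuple format, the class name, and the example values are hypothetical:

```python
from bisect import bisect_right

# Illustrative helper (hypothetical names); assumes module ranges do not overlap


class ModuleMap:
    """Sorted (base, size, name) ranges for the images loaded in one process."""

    def __init__(self, modules):
        self._mods = sorted(modules)
        self._bases = [base for base, _, _ in self._mods]

    def owner(self, address):
        """Return the module containing `address`, or None for non-image memory."""
        i = bisect_right(self._bases, address) - 1
        if i >= 0:
            base, size, name = self._mods[i]
            if base <= address < base + size:
                return name
        return None


def is_suspicious_thread_start(modules, start_address):
    return modules.owner(start_address) is None
```

For example, with ModuleMap([(0x140000000, 0x9000, "sample.exe")]), a thread starting at 0x140001000 belongs to sample.exe, while one starting in a fresh VirtualAlloc region would return None and be flagged.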

Advanced Ghidra Techniques for Continued Learning

Once you've mastered basic analysis, explore these advanced capabilities:

Custom Data Types

Define structures matching malware configuration formats. When you identify a config structure in memory, creating a custom data type allows Ghidra to parse it automatically, dramatically improving decompiler output readability.
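
Before defining a structure in the Data Type Manager, it can help to prototype the layout outside Ghidra. The sketch below parses a hypothetical configuration block with Python’s struct module; the field names, order, and sizes are invented for illustration and are not taken from this sample. The same layout would become a Ghidra structure with matching field types (uint, ushort, char[64]):

```python
import struct
from dataclasses import dataclass

# Hypothetical little-endian layout, for illustration only:
#   uint32 magic, uint16 c2_port, uint16 flags, uint32 sleep_ms, char c2_host[64]
CONFIG_FORMAT = "<IHHI64s"
CONFIG_SIZE = struct.calcsize(CONFIG_FORMAT)  # 76 bytes, no padding with "<"


@dataclass
class MalwareConfig:
    magic: int
    c2_port: int
    flags: int
    sleep_ms: int
    c2_host: str


def parse_config(blob: bytes, offset: int = 0) -> MalwareConfig:
    magic, port, flags, sleep_ms, host = struct.unpack_from(CONFIG_FORMAT, blob, offset)
    # Fixed-size char arrays are NUL-padded; keep the text up to the first NUL
    host_text = host.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return MalwareConfig(magic, port, flags, sleep_ms, host_text)
```

Once the prototype parses real config bytes sensibly, transcribing it into a Ghidra structure and applying it at the config address makes the decompiler show field names instead of raw offsets.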

Ghidra Scripting with Java

While Python scripts work for quick tasks, Java-based analyzers integrate more deeply with Ghidra's analysis framework. Consider developing custom analyzers for malware family-specific patterns you encounter repeatedly.

Collaborative Analysis

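Ghidra Server enables team-based analysis of large samples through shared, version-controlled projects. When facing a complex threat, multiple analysts can work on different functions in parallel and check in their changes, which Ghidra merges into the shared repository so the rest of the team can pick them up.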

Key Takeaways

  • Environment isolation is non-negotiable: Always analyze malware in a properly isolated VM with snapshots before beginning any analysis
  • Triage before deep analysis: Basic static analysis (file type, hashes, strings, packer detection) guides your Ghidra analysis and saves time
  • Learn to recognize patterns: String obfuscation, dynamic API resolution, and process injection are foundational techniques that appear across malware families
  • Translate findings to defenses: Every reverse engineering session should produce actionable intelligence—YARA rules, Sigma detections, or configuration extractions
  • Automation multiplies your capability: Ghidra's scripting capabilities let you build reusable tools that accelerate future analysis
  • Practice consistently: Reverse engineering is a perishable skill. Analyze samples regularly, even outside of active incidents, to maintain proficiency

Malware reverse engineering with Ghidra is a journey, not a destination. The techniques attackers use evolve constantly, requiring continuous learning. Start with simpler samples—basic packers, simple RATs, commodity malware—before tackling sophisticated APT tools. Each analysis builds your pattern recognition and Ghidra proficiency.

The malware authors are counting on analysts being too intimidated to look under the hood. Prove them wrong.
