Introduction to IDAPython for Vulnerability Hunting

Overview

IDAPython is a powerful tool that can be used to automate tedious or complicated reverse engineering tasks. While much has been written about using IDAPython to simplify basic reversing tasks, little has been written about using IDAPython to assist in auditing binaries for vulnerabilities. Since this is not a new idea (Halvar Flake presented on automating vulnerability research with IDA scripting in 2001), it is a bit surprising that there is not more written on this topic. This may be partially a result of the increasing complexity required to perform exploitation on modern operating systems. However, there is still a lot of value in being able to automate portions of the vulnerability research process.

In this post we will begin to describe using basic IDAPython techniques to detect dangerous programming constructs which often result in stack-buffer overflows. Throughout this blog post, I will be walking through automating the detection of a basic stack-buffer overflow using the “ascii_easy” binary from http://pwnable.kr. While this binary is small enough to manually reverse in its entirety, it serves as a good educational example whereby the same IDAPython techniques can be applied to much larger and more complex binaries.

Getting Started

Before we start writing any IDAPython, we must first determine what we would like our scripts to look for. In this case, I have selected a binary with one of the simplest types of vulnerabilities: a stack-buffer overflow caused by using `strcpy` to copy a user-controlled string into a stack buffer. Now that we know what we are looking for, we can begin to think about how to automate finding these types of vulnerabilities.

For our purposes here, we will break this down into two steps:

1. Locating all function calls that may cause the stack-buffer overflow (in this case `strcpy`)
2. Analyzing usages of function calls to determine whether a usage is “interesting” (likely to cause an exploitable overflow)

Locating Function Calls

In order to find all calls to the `strcpy` function, we must first locate the `strcpy` function itself. This is easy to do with the functionality provided by the IDAPython API. Using the code snippet below we can print all function names in the binary:

for functionAddr in Functions():    
   print(GetFunctionName(functionAddr))

Running this IDAPython script on the ascii_easy binary gives us the following output. We can see that all of the function names were printed in the output window of IDA Pro.  

vrblog_fig1_get_func_names.png

Next, we add code to filter the list of functions in order to find the `strcpy` function that is of interest to us. Simple string comparisons will do the trick here. Since imported functions are often given similar but slightly different names (such as `strcpy` vs. `_strcpy` in the example program), it is best to check for substrings rather than exact strings.

Building upon our previous snippet, we now have the following code:

for functionAddr in Functions():
    if "strcpy" in GetFunctionName(functionAddr):
        print(hex(functionAddr))
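
Substring matching is deliberately loose, so it also matches names such as `strcpy_s` or a hypothetical `my_strcpy_wrapper`. One way to tighten it, sketched here in plain Python so that it can be tried outside IDA, is to strip the decorations that imported names commonly carry and then compare exactly. The helper names and the prefix list are assumptions made for illustration; they are not part of the IDAPython API.

```python
# Prefixes that toolchains and disassemblers commonly prepend to imported names.
# This list is an assumption for illustration; extend it for your targets.
DECORATION_PREFIXES = ("__imp_", "j_")


def normalize_import_name(name):
    """Return the function name with common import decorations removed."""
    changed = True
    while changed:
        changed = False
        for prefix in DECORATION_PREFIXES:
            if name.startswith(prefix):
                name = name[len(prefix):]
                changed = True
    # Drop leading underscores and dots (for example "_strcpy" or ".strcpy")
    return name.lstrip("_.")


def is_target(name, target="strcpy"):
    """Exact match after normalization, instead of a substring test."""
    return normalize_import_name(name) == target
```

Inside IDA, `is_target(GetFunctionName(functionAddr))` could stand in for the substring test when the looser match produces too much noise.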

Now that we have the function that we are interested in, we have to identify all locations where it is called. This involves a couple of steps. First we get all cross-references to `strcpy` and then we check each cross-reference to find which cross references are actual `strcpy` function calls. Putting this all together gives us the piece of code below:

for functionAddr in Functions():    
    # Check each function to look for strcpy        
    if "strcpy" in GetFunctionName(functionAddr): 
        xrefs = CodeRefsTo(functionAddr, False)                
        # Iterate over each cross-reference
        for xref in xrefs:                            
            # Check to see if this cross-reference is a function call                            
            if GetMnem(xref).lower() == "call":           
                print(hex(xref))

Running this against the ascii_easy binary yields all calls of `strcpy` in the binary. The result is shown below:

vrblog_fig2_strcpy_calls.png

Analysis of Function Calls

With the above code, we know how to get the addresses of all calls to `strcpy` in a program. The ascii_easy application contains only a single call to `strcpy` (which also happens to be vulnerable), but many applications contain a large number of calls to `strcpy`, many of which are not vulnerable. We therefore need some way to analyze these calls in order to prioritize the ones that are more likely to be vulnerable.

One common feature of exploitable buffer overflows is that they oftentimes involve stack buffers. While exploiting buffer overflows in the heap and elsewhere is possible, stack-buffer overflows represent a simpler exploitation path.

Determining whether a call writes into a stack buffer involves a bit of analysis of the destination argument to the `strcpy` function. We know that the destination argument is the first argument to `strcpy`, and we are able to find this argument by going backwards through the disassembly from the function call. The disassembly of the call to `strcpy` is included below.

vrblog_fig3_function_call.png

In analyzing the above code, there are two ways that one might find the destination argument to the _strcpy function. The first method would be to rely on the automatic IDA Pro analysis which automatically annotates known function arguments. As we can see in the above screenshot, IDA Pro has automatically detected the “dest” argument to the _strcpy function and has marked it as such with a comment at the instruction where the argument is pushed onto the stack.

Another simple way to detect arguments to the function would be to move backwards through the assembly, starting at the function call, looking for “push” instructions. Each time we find a “push” instruction, we increment a counter until we reach the index of the argument that we are looking for. Because the arguments are pushed onto the stack in reverse order, the last “push” before the call holds the first argument. In this case, since we are looking for the “dest” argument, which happens to be the first argument, this method halts at the first “push” instruction encountered while walking backwards from the function call.

In both of these cases, while we are traversing backwards through the code, we must be careful to identify instructions that break sequential code flow. Instructions such as “ret” and “jmp” cause changes in the code flow that make it difficult to accurately identify the arguments. Additionally, we must make sure that we don’t traverse backwards past the start of the function that we are currently in. For now, we will simply detect non-sequential code flow while searching for the arguments and halt the search if any instances of it are found.

We will use the second method of finding arguments (looking for arguments being pushed to the stack). In order to assist us in finding arguments in this way, we should create a helper function. This function will work backwards from the address of a function call, tracking the arguments pushed to the stack and return the operand corresponding to our specified argument.

So for the above example of the call to _strcpy in ascii_easy, our helper function will return the value “eax”, since the “eax” register holds the destination argument when it is pushed to the stack as an argument to _strcpy. Using some basic Python in conjunction with the IDAPython API, we are able to build a function that does that, as shown below.

def find_arg(addr, arg_num):
   # Get the start address of the function that we are in
   function_head = GetFunctionAttr(addr, idc.FUNCATTR_START)    
   steps = 0
   arg_count = 0
   # It is unlikely the arguments are 100 instructions away, include this as a safety check
   while steps < 100:    
       steps = steps + 1
       # Get the previous instruction
       addr = idc.PrevHead(addr)  
       # Get the name of the previous instruction
       op = GetMnem(addr).lower()         
       # Check to ensure that we haven't reached anything that breaks sequential code flow
       if op in ("ret", "retn", "jmp", "b") or addr < function_head:
           return
       if op == "push":
           arg_count = arg_count + 1
           if arg_count == arg_num:
               # Return the operand that was pushed to the stack
               return GetOpnd(addr, 0) 
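
To see how the backward walk behaves without opening IDA, the same logic can be run over a hand-written instruction list. This is only a sketch: the listing format (tuples of mnemonic and first operand) and the name `find_arg_in_listing` are made up for illustration, and the sample assumes arguments are pushed right to left as in the ascii_easy example.

```python
# Hypothetical stand-in for a disassembly listing: (mnemonic, first operand)
LISTING = [
    ("lea", "eax"),       # eax = address of the destination buffer
    ("push", "edx"),      # second argument (src)
    ("push", "eax"),      # first argument (dest), pushed last
    ("call", "_strcpy"),
]

FLOW_BREAKERS = ("ret", "retn", "jmp", "b")


def find_arg_in_listing(listing, call_index, arg_num, max_steps=100):
    """Walk backwards from a call and return the operand of the arg_num-th push."""
    arg_count = 0
    index = call_index
    for _ in range(max_steps):
        index -= 1
        if index < 0:
            return None  # walked past the start of the function
        mnemonic, operand = listing[index]
        if mnemonic in FLOW_BREAKERS:
            return None  # sequential code flow is broken, give up
        if mnemonic == "push":
            arg_count += 1
            if arg_count == arg_num:
                return operand
    return None
```

With this listing, asking for argument 1 of the call at index 3 yields the operand of the push closest to the call, and asking for an argument that was never pushed yields `None`.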

Using this helper function we are able to determine that the “eax” register was used to store the destination argument prior to calling _strcpy. In order to determine whether eax is pointing to a stack buffer when it is pushed to the stack we must now continue to try to track where the value in “eax” came from. In order to do this, we use a similar search loop to that which we used in our previous helper function:

# Assume _addr is the address of the call to _strcpy
# Assume opnd is "eax"
# Find the start address of the function that we are searching in
function_head = GetFunctionAttr(_addr, idc.FUNCATTR_START)
addr = _addr 
while True:
   _addr = idc.PrevHead(_addr)
   _op = GetMnem(_addr).lower()    
   if _op in ("ret", "retn", "jmp", "b") or _addr < function_head:
       break
   elif _op == "lea" and GetOpnd(_addr, 0) == opnd:
       # We found the destination buffer, check to see if it is in the stack
       if is_stack_buffer(_addr, 1):
           print("STACK BUFFER STRCOPY FOUND at 0x%X" % addr)
           break
   # If we detect that the register that we are trying to locate comes from some other register
   # then we update our loop to begin looking for the source of the data in that other register
   elif _op == "mov" and GetOpnd(_addr, 0) == opnd:
       op_type = GetOpType(_addr, 1)
       if op_type == o_reg:
           opnd = GetOpnd(_addr, 1)
           addr = _addr
       else:
           break

In the above code we perform a backwards search through the assembly looking for instructions where the register that holds the destination buffer gets its value. The code also performs a number of other checks such as checking to ensure that we haven’t searched past the start of the function or hit any instructions that would cause a change in the code flow. The code also attempts to trace back the value of any other registers that may have been the source of the register that we were originally searching for. For example, this code attempts to account for the situation demonstrated below.

... 
lea ebx, [ebp-0x24]
... 
mov eax, ebx
...
push eax
...
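
The register-chasing step can also be tried on a mock listing. The sketch below follows the same rules as the loop above (stop at flow breakers, accept a “lea” into the tracked register, follow a register-to-register “mov”), but the tuple format, the register set, and the function name are assumptions made for illustration.

```python
# Hypothetical listing format: (mnemonic, destination operand, source operand)
EXAMPLE = [
    ("lea", "ebx", "[ebp-0x24]"),
    ("mov", "eax", "ebx"),
    ("push", "eax", None),
    ("call", "_strcpy", None),
]

REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi"}
FLOW_BREAKERS = ("ret", "retn", "jmp", "b")


def trace_register_origin(listing, start_index, reg):
    """Walk backwards from start_index and return the lea source feeding reg."""
    index = start_index
    while index > 0:
        index -= 1
        mnemonic, dst, src = listing[index]
        if mnemonic in FLOW_BREAKERS:
            return None
        if dst != reg:
            continue
        if mnemonic == "lea":
            return src  # candidate buffer address expression
        if mnemonic == "mov":
            if src in REGISTERS:
                reg = src  # keep tracing the register the value came from
            else:
                return None  # value came from memory or an immediate
    return None
```

Starting at the “push eax” line of `EXAMPLE`, the trace moves from “eax” to “ebx” and ends at the “lea”, returning `[ebp-0x24]`.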

Additionally, in the above code, we reference the function is_stack_buffer(). This function is one of the last pieces of this script and something that is not defined in the IDA API. This is an additional helper function that we will write in order to assist us with our bug hunting. The purpose of this function is quite simple: given the address of an instruction and an index of an operand, report whether the variable is a stack buffer. While the IDA API doesn’t provide us with this functionality directly, it does provide us with the ability to check this through other means. Using the get_stkvar function and checking whether the result is None or an object, we are able to effectively check whether an operand is a stack variable. We can see our helper function in the code below:

def is_stack_buffer(addr, idx):
   inst = DecodeInstruction(addr)
   return get_stkvar(inst[idx], inst[idx].addr) is not None

Note that the above helper function is not compatible with the IDA 7 API. In our next blog post we will present a new method of checking whether an argument is a stack buffer while maintaining compatibility with all recent versions of the IDA API.
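
Separately from the stack-variable check, the snippets in this post use function names from the older IDA 6.x scripting API, most of which were renamed in IDA 7. As a rough aid, the following sketch rewrites the legacy calls used in this post to their IDA 7 `idc` equivalents. The helper `modernize` is hypothetical, works on source text only, and does not address the `get_stkvar` incompatibility noted above.

```python
import re

# Legacy IDA 6.x names used in this post and their IDA 7 idc equivalents
LEGACY_TO_IDA7 = {
    "GetFunctionName": "idc.get_func_name",
    "GetFunctionAttr": "idc.get_func_attr",
    "GetMnem": "idc.print_insn_mnem",
    "GetOpnd": "idc.print_operand",
    "GetOpType": "idc.get_operand_type",
    "PrevHead": "idc.prev_head",
}

# Match each legacy name as a whole word, with or without an "idc." prefix
_PATTERN = re.compile(r"\b(?:idc\.)?(%s)\b" % "|".join(LEGACY_TO_IDA7))


def modernize(source):
    """Return source with the legacy calls replaced by IDA 7 names."""
    return _PATTERN.sub(lambda match: LEGACY_TO_IDA7[match.group(1)], source)
```

For instance, `modernize("op = GetMnem(addr).lower()")` produces a line that calls `idc.print_insn_mnem` instead.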

We can now put all of this together into a script that finds all of the instances of strcpy being used to copy data into a stack buffer. With these skills it is possible to extend these capabilities beyond strcpy to similar functions such as strcat, sprintf, etc. (see the Microsoft Banned Functions List for inspiration), as well as to add further analysis to our script. The script is included in its entirety at the bottom of the post. Running the script successfully finds the vulnerable strcpy, as shown below.

vrblog_fig4_output.png
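
One way to organize the extension to other functions is a small table that records, for each function of interest, which argument is the destination buffer. The sketch below is plain Python with hypothetical names; the table covers only a few standard C functions and is meant as a starting point, not a complete list.

```python
# Position (1-based) of the destination-buffer argument for a few
# unbounded copy functions from the C standard library.
DEST_ARG_INDEX = {
    "strcpy": 1,   # char *strcpy(char *dest, const char *src)
    "strcat": 1,   # char *strcat(char *dest, const char *src)
    "sprintf": 1,  # int sprintf(char *str, const char *format, ...)
    "gets": 1,     # char *gets(char *s)
}


def plan_checks(function_names):
    """Yield (name, dest_arg_index) for every name that merits a closer look."""
    for name in function_names:
        base = name.lstrip("_.")  # tolerate names such as "_strcpy"
        if base in DEST_ARG_INDEX:
            yield name, DEST_ARG_INDEX[base]
```

Each pair could then drive the same analysis as before, with the index passed to `find_arg` in place of the hard-coded 1.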

Script

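# Written against the IDA 6.x IDAPython API (see the compatibility note above)
import idc
from idaapi import *
from idautils import *
from idc import *

def is_stack_buffer(addr, idx):
   inst = DecodeInstruction(addr)
   return get_stkvar(inst[idx], inst[idx].addr) is not None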

def find_arg(addr, arg_num):
   # Get the start address of the function that we are in
   function_head = GetFunctionAttr(addr, idc.FUNCATTR_START)    
   steps = 0
   arg_count = 0
   # It is unlikely the arguments are 100 instructions away, include this as a safety check
   while steps < 100:    
       steps = steps + 1
       # Get the previous instruction
       addr = idc.PrevHead(addr)  
       # Get the name of the previous instruction        
       op = GetMnem(addr).lower() 
       # Check to ensure that we haven't reached anything that breaks sequential code flow
       if op in ("ret", "retn", "jmp", "b") or addr < function_head:            
           return
       if op == "push":
           arg_count = arg_count + 1
           if arg_count == arg_num:
               # Return the operand that was pushed to the stack
               return GetOpnd(addr, 0) 

for functionAddr in Functions():
   # Check each function to look for strcpy
   if "strcpy" in GetFunctionName(functionAddr): 
       xrefs = CodeRefsTo(functionAddr, False)
       # Iterate over each cross-reference
       for xref in xrefs:
           # Check to see if this cross-reference is a function call
           if GetMnem(xref).lower() == "call":
               # Since the dest is the first argument of strcpy
               opnd = find_arg(xref, 1) 
               function_head = GetFunctionAttr(xref, idc.FUNCATTR_START)
               addr = xref
               _addr = xref                
               while True:
                   _addr = idc.PrevHead(_addr)
                   _op = GetMnem(_addr).lower()                    
                   if _op in ("ret", "retn", "jmp", "b") or _addr < function_head:
                       break
                   elif _op == "lea" and GetOpnd(_addr, 0) == opnd:
                       # We found the destination buffer, check to see if it is in the stack
                       if is_stack_buffer(_addr, 1):
                           print("STACK BUFFER STRCOPY FOUND at 0x%X" % addr)
                           break
                   # If we detect that the register that we are trying to locate comes from some other register
                   # then we update our loop to begin looking for the source of the data in that other register
                   elif _op == "mov" and GetOpnd(_addr, 0) == opnd:
                       op_type = GetOpType(_addr, 1)
                       if op_type == o_reg:
                           opnd = GetOpnd(_addr, 1)
                           addr = _addr
                       else:
                           break
Posted on July 11, 2018 .

GameOn! Abusing SCADA HMI Project Files

Introduction

Back in July 2016, AttackIQ announced that they were hosting a GameOn! Competition for their FireDrill platform. FireDrill aims to help companies improve their network security posture by continuously running real-world network attack simulations, which AttackIQ calls scenarios, on a company’s network to test whether it is susceptible to particular vulnerabilities and network misconfigurations. Scenarios can be selected, deployed, and controlled from an administration console in the AttackIQ cloud. Once a scenario is chosen, AttackIQ servers communicate with and instrument a software agent that has been deployed on the host. This agent performs local/remote attacks on the company’s network, such as testing for pass the hash or outbound firewall rules, like TOR traffic. Note that these tests are harmless and only check for vulnerabilities without actually exploiting them. If a security mechanism, such as a firewall, properly blocks an attack, then the scenario will fail at the last phase it was running. Finally, the agent logs the results back to the cloud, where they can be viewed by the user.

The competition required each team’s submission to be in the form of a custom attack scenario. We took this as an opportunity to spend some time searching for vulnerabilities in common supervisory control and data acquisition (SCADA) human machine interfaces (HMIs) and create custom FireDrill scenarios for them.

This led us to investigate Ecava IntegraXor and Sielco Sistemi Winlog Lite, popular SCADA HMIs that run on Windows platforms. We discovered that it was possible to gain code execution by crafting a project file that abuses the internal scripting engines in both HMIs. IntegraXor uses a JavaScript engine and Winlog Lite uses a custom runtime scripting engine, which executes WLL files. This technique could be compared to malicious macros in Microsoft-related project files. Our reasoning was that many users and defensive technologies are conscious of potentially malicious Windows executables, PDFs, Adobe Flash files, JARs, etc. However, it was less likely that they are aware of dangerous SCADA HMI files. We created two five-phase scenarios named IGX-Poison and WLL-Poison. Other than phase 1, both scenarios are identical in behavior.

Phase 1 - Code Execution

The deployed agent for this scenario starts by running a malicious (poisoned) project file on a victim host. This can be thought of as a user or system administrator running an IntegraXor or Winlog Lite project file that they had received over email or some other means. The poisoned file contains SCADA simulation files and malicious scripting code. The malicious scripting code spawns a set of CMD commands and outputs a malicious base64-encoded Windows batch file. The batch file is then decoded using certutil (a default Windows program). The result is an FTP-based reverse shell that is then executed. Because these HMIs run as an administrative user, the malicious program also gets full administrative privileges.

In IGX-Poison, the project code abuses a design flaw in IntegraXor’s HMI ActiveX engine that allows us to execute CMD commands on the host, which we use to decode an included malicious batch file and run it. We encode the batch file as a means to circumvent, and thereby test, the network’s antivirus, intrusion detection systems (IDS), and intrusion prevention systems (IPS).

Malicious Javascript embedded within the poisoned IntegraXor project

Incredibly enough, the process is nearly identical in WLL-Poison. However, the malicious code is written in Winlog Lite’s custom programming language and abuses its ShellExec function.

Malicious WLL code embedded within the poisoned Winlog Lite project

Phase 2 - Persistence

This is a setup phase that checks if access controls are in place to prevent an attacker from creating files and directories for storing harvested data. If proper access controls are in place, the scenario will fail at this step.

Phase 3 - Reverse Shell

This phase attempts to create a persistent reverse shell by using the Windows built-in FTP client. We chose this route to circumvent antivirus, reduce the dependencies on post-exploitation toolkits, and rely purely on vanilla Windows services. It works by running a script that continuously downloads a text file from an attacker’s remote FTP server and executes its contents.

Reverse shell batch file

Phase 4 - Harvesting

Assuming that an attacker now has remote shell access, the harvesting stage simulates an attacker searching for sensitive data on a host. It is performed by scanning local disks for SCADA HMI projects and extracting database connection strings from the files. Sensitive information can be harvested from these project files and additional data can be gathered for future data exfiltration.

Phase 5 - Exfiltration

The exfiltration phase attempts to send harvested data back to the attacker covertly. This phase encrypts and compresses harvested data into a ZIP file. The ZIP file is then split into 10-byte chunks and sent over the network to the attacker over HTTP. If at any point a firewall blocks data transfer back to an attacker, this scenario phase will fail.
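
To give a sense of the traffic pattern this produces, here is a minimal sketch of the splitting step in plain Python. The function name is hypothetical and this is not the scenario's actual code; it only shows that fixed-size chunking turns one archive into many tiny transfers (a 1 MB archive becomes roughly 100,000 of them), which is the kind of pattern network monitoring can look for.

```python
def split_into_chunks(data, chunk_size=10):
    """Split a bytes object into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# Made-up 25-byte payload standing in for an archive
payload = bytes(range(25))
chunks = split_into_chunks(payload)
print([len(chunk) for chunk in chunks])  # [10, 10, 5]
```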

Defense

When using IntegraXor or Winlog Lite on a production network, it’s possible to implement security measures to prevent exploitation of the vulnerability we highlight in this scenario. The most effective would be limiting the network’s internet access. Doing so would drastically decrease the capability of an attacker to communicate with a compromised host. Application whitelisting, blocking all ingress and egress traffic/ports at the firewall, employing network IDS/IPS, disabling cmd.exe and batch file execution, and email filters to block or flag project files are additional system and network security considerations. In the end, it's important that users and companies understand the risks that SCADA project files pose and how they could be leveraged to exploit and exfiltrate data from a network. 
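
As one illustration of flagging project files, a mail or file filter could look for the command-execution markers described in this post before a project is opened. The sketch below is a heuristic only: the token list is an assumption drawn from the behaviors described above, the function name is hypothetical, and a real project format may store its scripts compressed or encoded, in which case a plain byte search would miss them.

```python
# Tokens associated with the command execution described in this post.
# Matching is case-insensitive; the list is illustrative, not exhaustive.
SUSPICIOUS_TOKENS = (b"shellexec", b"certutil", b"cmd.exe", b"cmd /c")


def flag_project_file(contents):
    """Return the suspicious tokens found in the raw bytes of a project file."""
    lowered = contents.lower()
    return [token.decode("ascii") for token in SUSPICIOUS_TOKENS if token in lowered]
```

A non-empty result would be a reason to quarantine the file for manual review rather than proof that it is malicious.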

Posted on May 2, 2018 .

Electronic Safe Lock Analysis: Part 2

Introduction / Recap

View the full whitepaper here

In a previous post we talked about the SecuRam Prologic B01, a Bluetooth Low Energy (BLE) electronic lock marketed towards commercial applications. After performing an initial tear-down, we were able to map out the device’s behaviors and attack surface. We then focused our efforts on analyzing the device’s BLE wireless communication. The Prologic B01’s main feature is that it can be unlocked by a mobile Android or iOS device over BLE. The end result was a fully-automated attack that allows us to remotely compromise any Prologic B01 lock up to 100 yards away. We have contacted SecuRam about this vulnerability, but since these devices are not capable of OTA (Over-the-Air) firmware updates, it does not look promising that they will be patched. Because of this, we advise all current/prospective customers to avoid this entry pad.

Vulnerabilities Found

The mobile application used to control the Prologic B01 remotely had no anti-reversing protection on the Android version. This allowed us to decompile and conveniently audit the mobile application’s code, which led us to find vulnerabilities within the communication protocol. BLE data between a mobile device and the Prologic B01 lacked encryption, allowing us to sniff traffic in plaintext as it was transmitted. The Prologic B01 also does not possess a secure channel to pair with a mobile device, meaning that any mobile device with the “SecuRam Access” application installed can communicate with any Prologic B01. The lack of encryption and proper key management allows for a fully-automated remote attack.

Attack Model

Because the Prologic B01 has a unique advertising signature, safes can easily be discovered using commodity Bluetooth devices or software defined radios. These characteristics would allow an attacker to wardrive for devices. In other words, this would allow an attacker to drive around and map out the location of safes in a region.

Figure 1: Wireshark capturing Bluetooth traffic with wardriving filter

Since BLE traffic is sent in plaintext, command packets can be decoded. The packet shown in Figure 2 is a captured unlock command. The last four bytes of the receiver’s (pink) and sender’s (cyan) MAC addresses are included. The PIN (green) is parsed as a Long type and is sent in reverse order, as illustrated in Figure 3. Finally, the open time (blue) is included and specifies how long the lock should stay open, in seconds.

Figure 2: Bluetooth application payload containing the receiver MAC address (pink), sender MAC address (cyan), PIN (green), and open time (blue)

Figure 3: PIN in hexadecimal format
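
If “reverse order” is read as little-endian byte order (an assumption; the encoding is not spelled out beyond the figures), decoding such a field is a one-liner in Python. The field width and the sample bytes below are made up for illustration and are not taken from a real capture.

```python
def decode_pin(field):
    """Interpret the PIN field as an unsigned little-endian integer."""
    return int.from_bytes(field, byteorder="little")


# Hypothetical 4-byte field: 0x0001E240 stored least significant byte first
sample_field = bytes.fromhex("40e20100")
print(decode_pin(sample_field))  # 123456
```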

Automating this process is what makes this attack powerful. An attacker can drop BLE scanning devices in areas where Prologic B01 safes were detected. The devices can continuously scan, waiting for unsuspecting victims to connect to their safes with their mobile devices. The BLE traffic is immediately captured and decoded, and the unlock PIN is sent back to the attacker, who is in a secure location.

In our attack, we used a Texas Instruments CC2540 and BLE Sniffer software to capture the BLE traffic of a specific target. The data is then funneled into a Python service that filters on the unlock command packet and extracts the PIN.

Info for Customers

The standards for wireless electronic locks are vague and few, making it difficult for consumers to tell whether they can trust a product or not. The Prologic B01’s datasheet did not explicitly detail its wireless security features, and that omission is a perfect example of why consumers are confused, especially for a device assumed to be secure enough to protect a safe. The best measure for consumers is to avoid wireless electronic locks. Until there are verified security standards put in place, there are just too many unknown variables to take into account.

Info for IoT Companies, Engineers, and Developers

If you’re developing a wireless electronic lock, it is worth the investment to incorporate security features. The Prologic B01’s biggest downfall was that it did not encrypt any of its wireless traffic. Strong encryption schemes exist and would prevent an attacker from sniffing traffic and deriving the plaintext data. This could be implemented utilizing pre-shared keys at the link-level or application-level. Strong encryption, cryptographic integrity, authentication, and strong key management are all effective methods that prevent attacks and they should be implemented to enhance the security of wireless electronic locks.
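
As a small illustration of the authentication and integrity part of that list, the sketch below tags each command with an HMAC computed from a pre-shared key and rejects replays with a message counter. It uses only the Python standard library; the packet layout and function names are hypothetical. It does not provide confidentiality, which would additionally require a vetted authenticated-encryption scheme and sound key provisioning.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def seal_command(key, counter, command):
    """Return counter || command || HMAC-SHA256(key, counter || command)."""
    header = struct.pack(">Q", counter)
    tag = hmac.new(key, header + command, hashlib.sha256).digest()
    return header + command + tag


def open_command(key, last_counter, packet):
    """Return (counter, command) if the packet is authentic and fresh, else None."""
    if len(packet) < 8 + TAG_LEN:
        return None
    body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # forged or corrupted packet
    (counter,) = struct.unpack(">Q", body[:8])
    if counter <= last_counter:
        return None  # replayed packet
    return counter, body[8:]
```

With a scheme along these lines, a sniffed unlock packet could not simply be replayed or modified, although its contents would still be readable without encryption.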

Conclusion / Takeaways

Wireless electronic locks are entering the forefront of the physical security marketplace, and the industry has recently been pushing to create more of them. Because the concept is new, there has been little to no regulation or standardization, causing the security of these devices to suffer. This has been the case for many residential-grade locks for a while now. However, our findings show that commercial-grade locks also suffer from the same vulnerabilities.

Posted on December 12, 2016 .

Electronic Safe Lock Analysis: Part 1 - Teardown

Introduction

With the rise of IoT devices and the age of convenience, electronic locks are becoming more commonplace on safes, homes, businesses, and even handheld padlocks. While classic mechanical locks have gone through decades of rigorous testing, their electronic counterparts are still considered unfamiliar territory. Securam, founded in 2006, has designed locks for personal use, commercial, corporate, ATM, and bank security containers. Many of their products are Underwriters Laboratories (UL) certified at the highest level of security (UL Type-1). These locks can include features such as biometric scanning, Wi-Fi, Bluetooth Low Energy, and mobile application interoperability. While adding convenience, these features do not necessarily harden the security of their product line. With this in mind, we decided to analyze several Securam devices to see how they worked and if the devices had any potential security vulnerabilities.

How It Works

The following teardown was performed on the Securam ProLogic 0601A-B01 entry pad and Securam EL-0701 lock body. The entry pad allows a user to enter a 6-digit PIN. When the PIN is correct, the entry pad sends an electrical signal to the lock body, which is positioned on the inside of the safe. The lock body will then retract the bolt, allowing the container to be opened. The entry pad also supports Bluetooth communication, and the device can be controlled using an iOS or Android application.

Teardown

Front side of the entry pad circuit board shows power connections for a 9V battery and a 10-pin debugging interface (J1).

Back side of the entry pad is where the board’s main MCU, a Renesas μPD78F0515A (U1), can be found. It’s accompanied by an NXP QN902X SoC (U6) for BLE communication. Peripherals, such as the 8-pin keypad header (P1) and 4-pin serial interface (P3; ref above) to the lock body, are also located on this side. This serial interface is used both for communication and to carry power to the lock body. There are unpopulated footprints (U2, U4, BAT1), which may have been used in previous revisions of this board, for debugging purposes, or for other models of this entry pad.

Front side of the lock body circuit board uses a less featureful Renesas μPD78F9234 MCU (U1). It is only known to communicate with an entry pad over the wired 4-pin serial communication interface (JP1), but there is another serial interface (JP2) adjacent to it that is hidden by the lock body cover. Also hidden is a 2-pin connector (P1) that is tied to pin 14 on the MCU. Its purpose is still undetermined.

Back side of the lock body reveals another 8-pin debugging interface (J1; under sticker) and the hardware reset button (SW1). The reset button allows the lock body and entry pad to “relink” if the devices were to somehow fall out of sync with each other. The reset button is also used when connecting a preconfigured lock body with a new entry pad.

The lock body is the only mechanical part of the system. It is composed of a DC motor and bolt. If 5 volts is applied across the red and yellow wires shown, the DC motor will retract the bolt and allow the security container to be opened.

Conclusion

By inspecting the lock body and the keypad, we were able to gain more insight into how the device operates. This additional information gave us clues into its security and potential vulnerabilities. In part two of our blog posts we'll cover a more in-depth security analysis of the devices and some vulnerabilities we discovered. Follow us on Twitter @SomersetRecon to catch our next posts in the series.

Posted on June 8, 2016 .