PTS2022

Binbloom reloaded
2022-07-05, 14:35–15:10 (Europe/Paris), Amphitheater

Reverse-engineering hardware devices usually requires extracting data from
memory, be it the internal flash of a SoC or an external NAND or SPI flash
chip. Extracting the memory content is part of the job, but once that is done
we still need to analyze it and face the inevitable truth: we may be looking
at an unknown memory dump, or simply have no idea how information is stored
in it or even how it is loaded into the SoC or MCU memory.
In this talk we will introduce Binbloom version 2, a tool able to identify the base address of any firmware code as well as some specific structures such as UDS databases (often encountered in ECUs), whatever the architecture (32-bit or 64-bit).


Detailed outline

I. Introduction (5 minutes)

I.1. Quick introduction and demo of the tool
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I will start the talk by explaining the main reason why this new version of Binbloom was developed, and I will demonstrate it live on various 32-bit and 64-bit firmware images. I will also stress that this tool implements a new method, which will be detailed in this talk, and that other tools exist too.

I.3. How existing tools work
~~~~~~~~~~~~~~~~~~~~~

I will then talk about how I came to improve Binbloom and about the other tools that are able to guess a firmware base address (rbasefind, for instance), and I will detail their internals (basically, they try every possible base address and compute a score for each one based on some heuristics).
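
To make the "try every base address and score it" idea concrete, here is a minimal Python sketch. It is not the code of rbasefind or Binbloom: the heuristic (counting 32-bit words that, once the candidate base is subtracted, land exactly on the start of an ASCII string), the function names and the tiny synthetic image are all assumptions chosen for illustration.

```python
import re
import struct

def find_string_offsets(fw, min_len=4):
    """Return file offsets where a NUL-terminated printable ASCII string starts."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}\x00" % min_len)
    return {m.start() for m in pattern.finditer(fw)}

def read_words32(fw, little_endian=True):
    """Interpret the image as consecutive 32-bit words (possible pointers)."""
    fmt = "<I" if little_endian else ">I"
    usable = len(fw) - len(fw) % 4
    return [w for (w,) in struct.iter_unpack(fmt, fw[:usable])]

def score_base(base, words, string_offsets):
    """Assumed heuristic: number of words pointing at a string start for this base."""
    return sum(1 for w in words if (w - base) in string_offsets)

def brute_force_base(fw, candidates):
    """Score every candidate base address and keep the best one."""
    strings = find_string_offsets(fw)
    words = read_words32(fw)
    return max(candidates, key=lambda base: score_base(base, words, strings))

def build_demo_image(base):
    """Tiny synthetic image: three pointers followed by the strings they reference."""
    fw = bytearray(0x40)
    entries = [(0x10, b"hello"), (0x20, b"world"), (0x30, b"binbloom")]
    for i, (off, text) in enumerate(entries):
        struct.pack_into("<I", fw, 4 * i, base + off)
        fw[off:off + len(text)] = text  # zero padding acts as the NUL terminator
    return bytes(fw)

if __name__ == "__main__":
    image = build_demo_image(0x08000000)
    # Page-aligned candidates in a small window around the (known) demo base.
    guess = brute_force_base(image, range(0x07FF0000, 0x08010000, 0x1000))
    print(hex(guess))
```

The cost of this approach grows with the number of candidates, since every candidate requires a pass over all the words of the image.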

I.4. Current limitations (64-bit architecture)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I will then talk about the current limitations of these existing tools, in particular their lack of support for 64-bit architectures.
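
A quick back-of-the-envelope calculation shows why exhaustive search does not carry over to 64-bit address spaces. The 4 KiB alignment and the rate of one million candidates scored per second are assumptions picked only to give an order of magnitude; they are not measurements of any tool.

```python
PAGE = 0x1000          # assumed alignment of candidate base addresses (4 KiB)
RATE = 1_000_000       # assumed number of candidates scored per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def candidate_count(address_bits):
    """Number of page-aligned base addresses in an address space of this width."""
    return (1 << address_bits) // PAGE

def search_time_seconds(address_bits):
    """Time needed to score every candidate at the assumed rate."""
    return candidate_count(address_bits) / RATE

if __name__ == "__main__":
    for bits in (32, 64):
        seconds = search_time_seconds(bits)
        print(bits, candidate_count(bits), seconds, seconds / SECONDS_PER_YEAR)
```

Under these assumptions the 32-bit space holds 2^20 candidates and is covered in about a second, while the 64-bit space holds 2^52 candidates and would take more than a century.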

II. Bruteforce vs. Inference (7 minutes)

In this part of the talk, I will detail the algorithm implemented in Binbloom v2, which does not rely on bruteforce but instead tries to infer the base address from data found in the firmware.

II.1. Entropy
~~~~~~~~~

I will present the first interesting metric that other tools are lacking: entropy. Firmware entropy can be useful to tell code and data apart, based on thresholds that have to be determined.
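
As an illustration of the metric, the following Python sketch computes the Shannon entropy (in bits per byte) of fixed-size blocks of an image. The block size and the `low`/`high` thresholds are placeholders, not values used by Binbloom; as noted above, suitable thresholds have to be determined.

```python
import math
from collections import Counter

def shannon_entropy(block):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    n = len(block)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(block).values())

def entropy_profile(fw, block_size=1024):
    """Return (offset, entropy) for each consecutive block of the image."""
    return [
        (off, shannon_entropy(fw[off:off + block_size]))
        for off in range(0, len(fw), block_size)
    ]

def blocks_between(profile, low, high):
    """Offsets of blocks whose entropy lies in [low, high].

    The thresholds are hypothetical and must be tuned per target.
    """
    return [off for off, h in profile if low <= h <= high]

if __name__ == "__main__":
    # Synthetic image: one block of padding, then one block using every byte value.
    image = bytes(1024) + bytes(range(256)) * 4
    for off, h in entropy_profile(image):
        print(hex(off), round(h, 2))
```

Padding and erased areas sit near 0 bits per byte, while compressed or encrypted regions approach 8; machine code and structured data usually fall somewhere in between, which is why thresholds are needed.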

II.2. Introducing Binbloom v2 internals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It is time to go into the details, with a focus on the inference mechanism implemented in Binbloom v2. This mechanism allows Binbloom to deduce a set of potential base addresses rather than bruteforcing every possible value, which is more efficient on 64-bit firmware files while remaining backward-compatible with 32-bit architectures.
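
The abstract does not spell out the algorithm, so the following Python sketch only illustrates the general idea of inference versus bruteforce, under my own assumptions: every pair of a pointer-like value found in the image and an offset of interest (for example the start of a string or of a function) suggests one candidate base, and the candidates are ranked by votes. The function name, the alignment filter and the demo values are hypothetical and do not describe Binbloom's actual implementation.

```python
from collections import Counter

def infer_base_candidates(pointers, targets, align=0x1000, top=5):
    """Rank candidate base addresses derived from the data itself.

    pointers: integer values read from the image that may be absolute addresses
    targets:  file offsets of points of interest (strings, function starts, ...)
    Each (pointer, target) pair votes for base = pointer - target, provided the
    result is non-negative and aligned (the alignment is an assumption).
    """
    votes = Counter()
    for p in pointers:
        for t in targets:
            base = p - t
            if base >= 0 and base % align == 0:
                votes[base] += 1
    return votes.most_common(top)

if __name__ == "__main__":
    base = 0xFFFFFF8000080000          # arbitrary 64-bit base for the demo
    targets = [0x10, 0x20, 0x30, 0x44]
    pointers = [base + 0x10, base + 0x20, base + 0x30, 0x1234]
    for candidate, count in infer_base_candidates(pointers, targets):
        print(hex(candidate), count)
```

The work done here depends on how many pointers and targets the image contains, not on the width of the address space, which is why the same code handles 32-bit and 64-bit values alike.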

II.3. Implementation constraints (memory usage, performance and firmware file size)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I will then talk about some technical constraints I faced during the development of Binbloom, especially memory usage issues and how I had to deal with a huge number of candidate addresses. I will also talk about performance issues and code optimization.

II.4. 32-bit and 64-bit architectures support

In this part of the talk, I will again stress that this method is generic and may be used on both 32-bit and 64-bit firmware files, with the same efficiency.

III. Binbloom v2 (3 minutes)

III.1. Comparison between Binbloom v2, rbasefind and Binbloom v1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this section I will present the results of a comparative analysis of Binbloom v1, Binbloom v2 and rbasefind, aiming at evaluating the efficiency of these three tools on a set of firmware files gathered on the Internet (thanks, Twitter!) and internally at Quarkslab.

III.2. Improvements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I will then present some planned improvements (on our to-do list) for Binbloom v2, and what they may bring to the tool. It is also a good time to ask the audience to contribute to this project! I will give the repository URL and invite attendees to give it a try (and report issues as well) =)

See also: 🎥 video

See also: slides

Damien Cauquil is a Security Researcher at Quarkslab who loves reverse-engineering hardware devices, firmware and protocols.