Fuzzing Rust code: cargo-fuzz and honggfuzz

This post explains how to test Rust code using fuzzers.

Parsers are good targets for fuzzers, especially because they are usually functions that take only bytes as input.

Preamble: why fuzz Rust code?

Since Rust provides memory safety guarantees, it may at first seem strange to fuzz Rust code. However, there are several kinds of bugs that a fuzzer will help you find, including:

debug or unfinished code, like unimplemented! and panic! calls
out of range accesses, like array[i]
integer overflows/underflows, like base + offset (see end of article)
stack overflows, unbounded recursion
crashes in unsafe code
direct calls to std::process::exit
timeouts and functions that take too long
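To make the list above more concrete, here is a minimal sketch of the out-of-range case. The one-byte length prefix format and both function names are made up for this illustration (they are not taken from der-parser): the naive version panics on short input, the checked version returns `None`.

```rust
/// Hypothetical format: one length byte followed by that many payload bytes.
/// This version panics on empty input (`data[0]`) and on truncated input
/// (the slice range goes past the end), which is what a fuzzer would report.
pub fn read_field_naive(data: &[u8]) -> &[u8] {
    let len = data[0] as usize;
    &data[1..1 + len]
}

/// Same hypothetical format, but every access is checked,
/// so malformed input yields `None` instead of a panic.
pub fn read_field(data: &[u8]) -> Option<&[u8]> {
    let (&len, rest) = data.split_first()?;
    rest.get(..len as usize)
}
```

Feeding `&[5, 1]` (length 5, but only one payload byte) to `read_field_naive` panics, while `read_field` simply reports the input as invalid.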

The last kind of bug is very similar to those in the previous list, but is quite annoying:

functions from other crates, even std, which can panic (like Add, Sub and other ops for Duration)
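As a small illustration of the `Duration` case: subtracting a larger `Duration` from a smaller one with `-` panics, while `checked_sub` returns `None`. The `elapsed` helper below is a hypothetical name used only for this sketch.

```rust
use std::time::Duration;

/// `later - earlier` would panic if `earlier` is the larger value.
/// `checked_sub` returns `None` instead, which matters when both
/// values are derived from untrusted input.
pub fn elapsed(later: Duration, earlier: Duration) -> Option<Duration> {
    later.checked_sub(earlier)
}
```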

Note that fuzzing has other advantages: the saved inputs act as regression tests when you change or update code, and a fuzzer can even be used for functional testing if you add assertions.
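A sketch of what "adding assertions" can look like: a round-trip check, where anything that decodes must re-encode to the same bytes. The toy tag/length/value format and the names `decode`, `encode` and `check_roundtrip` are assumptions made for this example, not part of any real crate.

```rust
/// Hypothetical toy format used only for this sketch:
/// one tag byte, one length byte, then exactly `length` value bytes.
pub fn decode(data: &[u8]) -> Option<(u8, &[u8])> {
    let (&tag, rest) = data.split_first()?;
    let (&len, value) = rest.split_first()?;
    if value.len() == len as usize {
        Some((tag, value))
    } else {
        None
    }
}

/// Inverse of `decode`; returns `None` if the value does not fit
/// in a single length byte.
pub fn encode(tag: u8, value: &[u8]) -> Option<Vec<u8>> {
    if value.len() > 255 {
        return None;
    }
    let mut out = vec![tag, value.len() as u8];
    out.extend_from_slice(value);
    Some(out)
}

/// Body of a fuzz target: parse errors are fine, but anything that decodes
/// must re-encode to the exact same bytes. A failed assertion is a panic,
/// which the fuzzer reports like any other crash.
pub fn check_roundtrip(data: &[u8]) {
    if let Some((tag, value)) = decode(data) {
        let again = encode(tag, value).expect("decoded value always fits");
        assert_eq!(again.as_slice(), data);
    }
}
```

Calling `check_roundtrip(data)` from the fuzzer entry point turns the fuzzer into a functional test of the encoder/decoder pair.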

The good news is that some tools are well integrated with cargo, making them very easy to use. We’ll show how to use two tools: cargo-fuzz and honggfuzz.

cargo-fuzz

cargo-fuzz is a nice command-line wrapper around libFuzzer, an LLVM library for coverage-based fuzzing.

Install cargo-fuzz:

$ cargo install cargo-fuzz

Note that you will need the nightly version of the toolchain.

Note: I’m using stable as default, so all commands requiring nightly will explicitly set +nightly.

Initialize directories:

$ cargo +nightly fuzz init

This will create the fuzz/ directory, where fuzzer code and data will live.

We’ll then add a new fuzzer. I’ll take the example of the parse_der function of der-parser (DER is always a good target for fuzzing).

$ cargo +nightly fuzz add fuzzer_parse_der

It will create fuzz/fuzz_targets/fuzzer_parse_der.rs with template code, and add stuff to fuzz/Cargo.toml so it builds. Edit fuzzer_parse_der.rs, and add a call to the targeted function:

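#![no_main]
extern crate libfuzzer_sys;
extern crate der_parser;
// Entry point called by libFuzzer for every generated input.
#[export_name="rust_fuzzer_test_input"]
pub extern fn go(data: &[u8]) {
    // The result is ignored: only panics and crashes matter here.
    let _ = der_parser::parse_der(data);
}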
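One pattern that may help is to keep the fuzz body in a plain function, so that the same code can be called from the fuzzer entry point and from ordinary tests. The sketch below uses a stand-in parser (`parse_stub`, hypothetical) so that it compiles without der-parser. The commented-out part shows the `fuzz_target!` macro form used by more recent libfuzzer-sys templates, and `my_crate` there is a placeholder name.

```rust
/// Stand-in for `der_parser::parse_der`, so that this sketch compiles on its own.
/// Hypothetical behaviour: it accepts any input starting with the DER SEQUENCE tag 0x30.
pub fn parse_stub(data: &[u8]) -> Result<usize, &'static str> {
    match data.first().copied() {
        Some(0x30) => Ok(data.len()),
        Some(_) => Err("unexpected tag"),
        None => Err("empty input"),
    }
}

/// Shared fuzz body: the result is ignored on purpose, only panics matter.
pub fn fuzz_one(data: &[u8]) {
    let _ = parse_stub(data);
}

// With a recent libfuzzer-sys, the target file could then look like this
// (kept as a comment because it needs the libfuzzer-sys crate):
//
//     #![no_main]
//     use libfuzzer_sys::fuzz_target;
//     fuzz_target!(|data: &[u8]| { my_crate::fuzz_one(data); });
```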

That’s all you need to start. Run the fuzzer:

$ cargo +nightly fuzz run fuzzer_parse_der

It will show you the progress of execution. To understand the output, see libFuzzer Output. Many items are interesting:

NEW items show that a new code path has been discovered. With time, NEW events will appear less often.
cov and ft give an idea of the current coverage (edges and comparisons).
exec/s shows how many times per second the function has executed. This should remain high for the fuzzer to be efficient.

Next, a few tips that will make fuzzing much more efficient.

Use a corpus

libFuzzer is a mutation-based fuzzer. If you give it enough time, it should be able to find all execution paths and branches. However, discovery of new paths can be slow.

Providing examples in the corpus (as many as possible, both valid and invalid) makes this process much faster. This is especially true if the input data is expected to be structured.

Simply copy files to the fuzz/corpus/fuzzer_parse_der/ directory.
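If the seeds are easier to generate than to collect, a few lines of Rust can write them into the corpus directory. This is only a sketch: the `write_seeds` helper and the `seed-NNN` file names are arbitrary choices for this example, since the fuzzer only looks at the file contents.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes each seed to `dir` as `seed-000`, `seed-001`, ... and returns
/// how many files were written. The directory is created if needed.
pub fn write_seeds(dir: &Path, seeds: &[&[u8]]) -> io::Result<usize> {
    fs::create_dir_all(dir)?;
    for (i, seed) in seeds.iter().enumerate() {
        fs::write(dir.join(format!("seed-{:03}", i)), seed)?;
    }
    Ok(seeds.len())
}
```

For the fuzzer above, `dir` would be `fuzz/corpus/fuzzer_parse_der`, and a seed could be the bytes `30 03 02 01 01` (a DER SEQUENCE containing the INTEGER 1).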

Neat point: if your fuzzer is already running, do nothing more! Files copied to the corpus are automatically detected, and the fuzzer will issue a RELOAD.

Use parallelism

cargo-fuzz can start many processes (instead of 1 by default) using the --jobs n option.

$ cargo +nightly fuzz run --jobs 24 fuzzer_parse_der

All processes share the same corpus, and paths discovered by one process are automatically used by the others (when a new item is added to the corpus, the other processes reload it).

Handling crashes

When a crash is detected, the input will be saved in the fuzz/artifacts/<fuzzer_name>/ directory.

This is useful, especially for testing a candidate fix. Passing a file name as an argument to cargo-fuzz runs the fuzzer on that file only.

$ cargo +nightly fuzz run fuzzer_parse_der fuzz/artifacts/<fuzzer_name>/<input_file>
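Saved crash inputs can also be replayed from an ordinary test, without the fuzzer. The helper below is a sketch (the name `replay_dir` is made up for this example): it runs a closure on every file of a directory, so a panic in the target fails the calling test.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Runs `target` once on the contents of every regular file in `dir`
/// and returns the number of inputs replayed. A panic in `target`
/// propagates to the caller.
pub fn replay_dir<F: FnMut(&[u8])>(dir: &Path, mut target: F) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() {
            target(&fs::read(&path)?);
            count += 1;
        }
    }
    Ok(count)
}
```

A regression test would call it with the artifacts directory and a closure that calls the parser and ignores its result.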

Minimize corpus

libFuzzer adds new samples to the corpus for every new path. With time, the corpus can grow very large, which slows down fuzzing and makes it hard to manage (please do not commit a corpus of several gigs to GitHub when your source code is only a few kilobytes!).

The cmin subcommand can be used to minimize corpus examples, while preserving coverage.

$ cargo +nightly fuzz cmin fuzzer_parse_der

Also fuzz in release mode

By default, cargo-fuzz uses the debug mode (which is good, because integer overflow checks are only enabled in debug mode by default). Fuzzing in release mode has several advantages: it is much faster, and it provides a target closer to the code that will be executed in the end.

Just add --release to command-line arguments:

$ cargo +nightly fuzz run --jobs 24 --release fuzzer_parse_der

Note that fuzzing in release mode is complementary to debug mode, but does not replace it.
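The reason both modes matter for integers: with default settings, a plain `base + offset` that overflows panics in a debug build and silently wraps in a release build. The sketch below (function names are illustrative only) shows the two explicit alternatives, which behave the same in every build mode.

```rust
/// Computes `base + offset` the way a parser should: overflow is reported
/// as `None` in every build mode, instead of a panic (debug) or a silent
/// wrap (default release settings).
pub fn end_offset(base: usize, offset: usize) -> Option<usize> {
    base.checked_add(offset)
}

/// What a plain `base + offset` evaluates to in a default release build
/// when the sum does not fit: the value wraps around.
pub fn end_offset_wrapping(base: usize, offset: usize) -> usize {
    base.wrapping_add(offset)
}
```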

Visualizing code coverage

After torturing the parser, I want to look at the coverage of source code by the current corpus. For this I used kcov, passing the entire corpus as argument of the fuzzer:

$ cd fuzz
$ mkdir cov
$ kcov ./cov ./target/debug/fuzzer_parse_der corpus/fuzzer_parse_der/*

with not much success:

WARNING: Failed to find function "__sanitizer_acquire_crash_state".
WARNING: Failed to find function "__sanitizer_print_stack_trace".
WARNING: Failed to find function "__sanitizer_set_death_callback".
INFO: Seed: 631975293
kcov: Process exited with signal 11 (SIGSEGV) at 0x5555555a8f75

After fighting a bit with search results, I found this kcov issue, which helped solve the problem. After adding some arguments (to include the source paths for the fuzzer and the der-parser lib), kcov worked:

kcov --include-path .,.. ./cov ./target/debug/fuzzer_parse_der corpus/fuzzer_parse_der/*


honggfuzz

honggfuzz is another fuzzer, maintained by Google, that is easy to use from Rust thanks to honggfuzz-rs.

I do not have a complete comparison of honggfuzz versus cargo-fuzz. Here are a few points that look important to me:

cargo-fuzz makes it easier to create multiple fuzzers in the same project. It is more efficient to write multiple, small fuzzers if your project can parse several formats. It is faster, and makes managing the corpora easier.
honggfuzz seems faster in terms of executions per second (not sure why, just looking at the numbers).
honggfuzz does not require nightly.

Similarly to the fuzz directory of cargo-fuzz, I used a hfuzz directory to store the project.

Create a new crate:

$ cargo new --bin hfuzz

Enter this directory, and edit the Cargo.toml file:

# ...
[dependencies]
honggfuzz = "0.5"

[dependencies.der-parser]
path = ".."

Edit the src/main.rs file, and call the targeted function:

#[macro_use]
extern crate honggfuzz;
extern crate der_parser;

fn main() {
    println!("Starting fuzzer");
    loop {
        // The fuzz macro gives an arbitrary object (see `arbitrary crate`)
        // to a closure-like block of code.
        // For performance reasons, it is recommended that you use the native type
        // `&[u8]` when possible.
        // Here, this slice will contain a "random" quantity of "random" data.
        fuzz!(|data: &[u8]| {
            let _ = der_parser::parse_der(data);
        });
    }
}

Build and run the fuzzer:

$ cargo hfuzz build
$ cargo hfuzz run hfuzz

This will display the fuzzing console, with some log output and current progress display.

Parallelism

By default, honggfuzz uses half the logical number of CPUs. To set the number of workers, use the -n argument:

$ HFUZZ_RUN_ARGS="-n 12" cargo hfuzz run hfuzz

Corpus

honggfuzz stores its corpus (by default) in hfuzz_workspace/<fuzzer_name>/input/.

Note that the corpus is similar to the one used in libFuzzer or cargo-fuzz, so files can just be copied to populate the initial corpus.

Exit upon crash

Unlike cargo-fuzz, honggfuzz will continue when the target function crashes. While this can be interesting in some cases, during the development phase it is easier to stop the fuzzer and fix the bug before continuing.

To do that, add --exit_upon_crash to HFUZZ_RUN_ARGS.

Release mode

By default, honggfuzz uses instrumented release mode. To build (run) without instrumentation, or in debug mode, use the following targets: build-no-instr, build-debug (run-no-instr, run-debug) instead of build (run).

honggfuzz arguments

The list of possible arguments for HFUZZ_RUN_ARGS can be displayed using:

$ cargo honggfuzz --help

Digression on checked integer operations

In Rust, the overflowing_xxx and checked_xxx families of functions help detect overflows: overflowing_xxx returns the result of the operation together with a boolean set to true if an overflow occurred, and checked_xxx returns None in that case.

This is great, because a parser should be careful with overflows (especially when adding numbers coming from untrusted sources like the network).

However, checked_shl and overflowing_shl are misleading in the check they perform: they do not check for overflow, only that the number of bits shifted is smaller than the bit width of the left argument. While this is important to test, it is not sufficient to detect shift overflows.

See this Rust playground example for a silent overflow.

Note that this is usually the case for the CPU instructions (sal for x86, lsl, etc.) where the shifted bit goes to the carry flag, so if you shift several bits and the last is a 0, the overflow cannot be detected using carry. LLVM does not provide intrinsics to test it.

As such, it is neither a bug nor a new thing, but something that you have to be careful with.

To test for an overflow when shifting bits, you have to test manually.
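One possible manual test, as a sketch: shift, then shift back and compare with the original value. If any set bit was pushed out, the round trip does not give the same number. The function name `shl_lossless` is made up for this example.

```rust
/// Left shift that fails if the shift amount is too large *or* if any
/// set bit would be shifted out. `checked_shl` only covers the first case.
pub fn shl_lossless(x: u32, n: u32) -> Option<u32> {
    // `checked_shl` returns `None` when `n >= 32`, so `n < 32` below.
    let shifted = x.checked_shl(n)?;
    if shifted >> n == x {
        Some(shifted)
    } else {
        None
    }
}
```

For instance, `0x8000_0000u32.checked_shl(1)` is `Some(0)`: the top bit is silently lost. `shl_lossless(0x8000_0000, 1)` returns `None` instead.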

Other articles

UEFI SecureBoot on Debian
Date Thu 15 October 2015 Tags Security Debian UEFI x509 PKI SecureBoot

This post explains how to enable UEFI SecureBoot on Debian, using your own trust chain. The technical part itself is very light; most of the post explains the what and the why.

What is SecureBoot?

UEFI SecureBoot is a mechanism to verify a cryptographic signature of UEFI Images before loading them into the Firmware (the new name for the BIOS¹). This provides a way to control which images are allowed, as well as the drivers and option ROMs used by the Firmware, and to fight bootkits and malware based on them. For an example of such dangers, see my past presentations on malicious UEFI Option ROMs ([FR] at SSTIC, and [EN] at PacSec).

Roughly, SecureBoot relies on cryptographic signatures (mainly using SHA-256 and RSA-2048) that are embedded into files using the Authenticode file format. The integrity of the executable is verified by checking the hash, and its authenticity and trust by checking the signature, based on X.509 certificates, which have to be trusted by the platform.

At a high level, the Firmware has four different sets of objects (see figures for details):
- the Platform Key (PK): this is the main key. This key, usually belonging to …
Site Changed
Date Thu 20 August 2015 Tags Life

Since the blog was migrated (in fact, the entire server), I was wondering what to do with the previous content (especially the redmine installation, and the public git).

I don’t like the idea of installing a new redmine server (and dynamic content), so I’ve decided to throw away all the previous content and publish only the new, static, content. This also makes the installation easier, because the LXC containing the web server is read-only :)

Some of the active projects are now published on GitHub:
The change may break all of the previous links - I’m sorry for that, if something was important please contact me.

Comments may be re-added soon, maybe using Isso, allowing self-hosted comments, and no need for Disqus and others.
OCaml LLVM bindings tutorial, part 4
Date Wed 01 July 2015 Tags Programming OCaml LLVM Compiler

See also:
In the previous examples, we’ve seen how to build OCaml applications to read, manipulate and write LLVM bitcode.

To be able to generate realistic code, we now need to add a few more things. This part explains how to create bitcode with a correctly specified target triple, how to verify bitcode, and write a hello world application.

Target Triple and Data Layout

While LLVM IR is (or should be) target independent, there are a few things that are not. For example, the support for some instructions, the padding and alignment inside structures, the endianness, the size of pointers, etc. All these things are specified in two attributes of modules: the target triple, and the data layout.

In the current (3.5) version of LLVM, these two attributes are optional. However, they could become mandatory in the future, so it is best to specify them.

Note: in my personal opinion, specifying that inside the module is clearly redundant with the -march= option of llc. Most of this could have been handled by compiler flags, instead of creating situations where one can …
OCaml LLVM bindings tutorial, part 3
Date Wed 24 June 2015 Tags Programming OCaml LLVM Compiler

See also:
- OCaml LLVM bindings tutorial, part 1
- OCaml LLVM bindings tutorial, part 2
The previous articles explain how to build applications using the OCaml-LLVM bindings, and how to use the API to manipulate the LLVM objects. This was the “read-only” part of the tutorial, which can be used to analyze LLVM IR.

This part explains how to create LLVM IR, and write a simple application from scratch, and see how to build and run it.

Modules

As in the previous tutorial, we need to create a context and a module:
```
let llctx = global_context () in
let llm = create_module llctx "mymodule" in
```
Functions

There are two actions that can be done on functions:
- declare_function to give only a declaration of the prototype,
- define_function to give both the declaration and the implementation.
In both cases, we need to give the signature (return type, number and type of arguments) of the function.

This is pretty similar to C. We’ll use this to declare the function int main(void).

The int type is a bit problematic in LLVM (and in C, but for other reasons): integer types must have a known size in LLVM. While this does not change the architecture-independent property …
OCaml LLVM bindings tutorial, part 2
Date Thu 11 June 2015 Tags Programming OCaml LLVM Compiler

See also:
- OCaml LLVM bindings tutorial, part 1
In the previous tutorial, we’ve seen how to use ocamlbuild and make to build a simple application. In this part, we’ll start exploring the API, and see how to access values and attributes of LLVM objects.

The base of the code is the same as in part 1: it reads an existing LLVM bitcode file, for example one generated by clang.

As in the previous part of the tutorial, knowing the LLVM C++ API is not required (but can help).

LLVM objects

The top-level container is a module (llmodule). The module contains global variables, types and functions. Functions in turn contain basic blocks, and basic blocks contain instructions.

Values

In the OCaml bindings, all objects (variables, functions, instructions) are instances of the opaque type llvalue.

A value has a type, a name, a definition, a list of users, and other things like attributes (for ex. visibility or linkage options) or aliases.

Each value has a type (lltype), which is a composite object to define the type of a value and its arguments. To match the real type, it needs to be converted to a TypeKind.t:
```
let rec print_type llty =
  let ty = Llvm …
```
OCaml LLVM bindings tutorial, part 1
Date Tue 09 June 2015 Tags Programming OCaml LLVM Compiler

This is the first part of a tutorial series on how to use the OCaml bindings for LLVM. Why use OCaml bindings? Because you can avoid using the C++ API, spending huge amounts of time compiling Clang sources, then your plugin, then debugging the segfaults again and again. The bindings are stable, cover most of the API, and are quite simple to use, thanks to the Debian packages.

This tutorial is written based on Debian Sid; things may differ but should stay similar on other distributions.

The objectives of this first part are:
- install the required packages
- set up a build environment for ocamlbuild
- build a simple application that reads an LLVM bitcode file and prints it
Installation

The required packages are:
- llvm-3.5-dev
- libllvm-3.5-ocaml-dev
- the LLVM and OCaml compilers (llvm-3.5, ocaml)
- optionally, clang
The current LLVM version is 3.6, however the OCaml bindings are currently disabled (See Debian bug #783919), because of changes in the required dependencies.

Project Layout

The sources are organized as follows:
```
part1/
├── build
├── Makefile
└── src
    └── tutorial01.ml
```
First application

First, create file src/tutorial01.ml:
```
let _ =
  let llctx = Llvm.global_context () in
  let llmem = Llvm.MemoryBuffer.of_file Sys.argv.(1) in
  let …
```
Materials for my talk at SSTIC 2015 - PICON : Control Flow Integrity on LLVM IR
Date Mon 08 June 2015 Tags Programming LLVM Compiler Security

Here are the materials for the talk PICON : Control Flow Integrity on LLVM IR, given during SSTIC 2015. While SSTIC is a French-speaking conference, I publish here in English because my other posts are also in English.

Here is the summary, from the website:

Control flow integrity has been a well explored field of software security for more than a decade.

However, most of the proposed approaches are stalled in a proof of concept state - when the implementation is publicly available - or have been designed with a minimal performance overhead as their primary objective, sacrificing security.

Currently, none of the proposed approaches can be used to fully protect real-world programs compiled with most common compilers (e.g. GCC, Clang/LLVM).

In this paper we describe a control flow integrity enforcement mechanism whose main objective is security. Our approach is based on compile-time code instrumentation, making the program communicate with its external execution monitor. The program is terminated by the monitor as soon as a control flow integrity violation is detected.

Our approach is implemented as an LLVM plugin and is working on LLVM’s Intermediate Representation.
- Article (EN)
- Slides (FR)
- Video (FR)
Code is currently being published (with an opensource …
Blog migrated

Date Mon 01 June 2015 Tags Life

It’s been a long time since the last post, so this is a kind of “I’m alive” post. This blog has been migrated (as well as the entire server) to a static site generator, pelican. The theme is a custom one, inspired by pelican-boostrap. Why a static content generator? Because browsing is fast, because it’s easy to create a read-only container to host pages, and because thanks to that I can avoid having PHP running. As the migration was both an automatic and a manual process, don’t be surprised if things are broken; I will repair everything (don’t hesitate to give feedback).

In fact, the entire server has been migrated, and it took some time before getting things to work again.

What will be published? Same kind as before: random posts on technical stuff, an OCaml-LLVM tutorial, thoughts on TLS, and maybe some sysadmin points on the installation of my servers.
grsec kernel with nvidia module
Date Wed 29 February 2012 Tags Security Debian Kernel

Compiling a grsec kernel on a laptop/workstation is a good way to add protection against wide classes of attacks. However, while the options may be easy to choose on a server, this may be difficult because a typical desktop needs more privileges. Here are a few points:
- Xorg (wants privileged I/O, unless you use KMS) conflicts with PAX_NOEXEC and GRKERNSEC_IO
- power management: applets to display the battery level want (non-root) read permission on /sys, this will conflict with GRKERNSEC_SYSFS_RESTRICT. You can enable SYSFS_DEPRECATED as a workaround.
- power management: ACPI is required for a laptop (if you want to be able to use suspend/resume, control fan speed, etc.)
- power management: suspend/restore conflicts with some options (PAX_MEMORY_UDEREF and PAX_KERNEXEC)
- virtualization: PAX_KERNEXEC conflicts with kvm/vmx
If you have other points to add/corrections, just send them to me!

Now, another problem I have is that I must use the proprietary driver. Not that I really want to, but it is the only driver with proper support for my graphics card (GT555M), since the nouveau driver has some problems here: breaks suspend to ram/disk, sucks battery (I have 2h30 of autonomy with nouveau, and about 5 with Nvidia …


Pollux's corner
