LLVM

OCaml

This is the first part of a tutorial series on how to use the OCaml bindings for LLVM. Why use the OCaml bindings? Because they let you avoid the C++ API, and with it the huge amounts of time spent compiling the Clang sources, then your plugin, then debugging segfaults again and again. The bindings are stable, cover most of the API and, thanks to the Debian packages, are quite simple to use.

This tutorial was written on Debian Sid. Details may differ on other distributions, but the overall steps should be similar.

The objectives of this first part are:

  • install the required packages
  • set up a build environment for ocamlbuild
  • build a simple application that reads an LLVM bitcode file and prints it

Installation

The required packages are:

  • llvm-3.5-dev
  • libllvm-3.5-ocaml-dev
  • the LLVM and OCaml compilers (llvm-3.5, ocaml)
  • optionally, clang

The current LLVM version is 3.6, but its OCaml bindings are currently disabled (see Debian bug #783919) because of changes in the required dependencies, which is why this tutorial uses version 3.5.

Project Layout

The sources are organized as follows:

part1/
├── build
├── Makefile
└── src
    └── tutorial01.ml

First application

First, create the file src/tutorial01.ml:

let _ =
  let llctx = Llvm.global_context () in
  let llmem = Llvm.MemoryBuffer.of_file Sys.argv.(1) in
  let llm = Llvm_bitreader.parse_bitcode llctx llmem in
  Llvm.dump_module llm ;
  ()

Let’s go through the file and comment on each line.

  let llctx = Llvm.global_context () in

LLVM requires a context (LLVMContext in the C++ API) that transparently owns and manages all data. Here there is no need to create a dedicated context, so we use the global one.
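When a program does need its own context, for instance to keep modules isolated from one another, the bindings also expose Llvm.create_context and Llvm.dispose_context. The following is a minimal sketch of that pattern, not something this tutorial requires. The helper name with_private_context and the module name "scratch" are made up for the illustration.

```ocaml
(* Hypothetical helper: run [f] with a freshly created context and
   dispose of the context afterwards, even if [f] raises. *)
let with_private_context f =
  let llctx = Llvm.create_context () in
  let result =
    try f llctx
    with e -> Llvm.dispose_context llctx; raise e
  in
  Llvm.dispose_context llctx;
  result

let () =
  with_private_context (fun llctx ->
    (* "scratch" is an arbitrary module name chosen for this example. *)
    let llm = Llvm.create_module llctx "scratch" in
    (* Print the (empty) module to stderr. *)
    Llvm.dump_module llm;
    (* A module must be disposed of before the context that owns it. *)
    Llvm.dispose_module llm)
```

For a short-lived tool such as the one in this tutorial, the global context is simpler and perfectly adequate.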

  let llmem = Llvm.MemoryBuffer.of_file Sys.argv.(1) in

This line takes the first command-line argument of the application and uses the LLVM OCaml bindings to read that file into memory (as an opaque llmemorybuffer object). The input should be LLVM bitcode, usually a file with the .bc extension.
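The tutorial program assumes that an argument is given and that the file is readable. A slightly more defensive variant could look like the sketch below. It relies on the Llvm.IoError exception declared by the bindings; the names read_buffer and main are hypothetical helpers introduced only for this example.

```ocaml
(* Hypothetical helper (not part of the bindings): read a file into an
   llmemorybuffer, returning None instead of raising on I/O failure. *)
let read_buffer path =
  try Some (Llvm.MemoryBuffer.of_file path)
  with Llvm.IoError msg ->
    prerr_endline ("cannot read " ^ path ^ ": " ^ msg);
    None

(* Returns a process exit code: 0 on success, 1 on any error. *)
let main argv =
  if Array.length argv < 2 then begin
    prerr_endline ("usage: " ^ argv.(0) ^ " <file.bc>");
    1
  end else
    match read_buffer argv.(1) with
    | None -> 1
    | Some llmem ->
      (* as_string copies the buffer contents into an OCaml string. *)
      let size = String.length (Llvm.MemoryBuffer.as_string llmem) in
      Printf.printf "read %d bytes from %s\n" size argv.(1);
      Llvm.MemoryBuffer.dispose llmem;
      0
```

Ending the file with let () = exit (main Sys.argv) turns this into a complete program.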

  let llm = Llvm_bitreader.parse_bitcode llctx llmem in

Once the bitcode file has been read, the llmemorybuffer can be parsed to create an LLVM module (an llmodule in OCaml). In LLVM, a module is a single unit of code to process. It contains things like functions, structure definitions and global variables, and usually matches the content of a single source file.
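If the buffer does not contain valid bitcode, the bindings document that parse_bitcode raises Llvm_bitreader.Error with a message. The sketch below catches that exception and, on success, queries two properties of the resulting module. It follows the 3.5 interface used in this tutorial; parse_buffer, summarize and main are hypothetical names, and behaviour on other LLVM versions may differ.

```ocaml
(* Hypothetical helper: parse a buffer, returning None instead of
   raising when the contents are not valid bitcode. *)
let parse_buffer llctx llmem =
  try Some (Llvm_bitreader.parse_bitcode llctx llmem)
  with Llvm_bitreader.Error msg ->
    prerr_endline ("invalid bitcode: " ^ msg);
    None

(* Build a one-line description from module-level properties. *)
let summarize llm =
  Printf.sprintf "triple=%s datalayout=%s"
    (Llvm.target_triple llm) (Llvm.data_layout llm)

(* Returns a process exit code: 0 on success, 1 on any error.
   I/O errors from of_file are deliberately left unhandled here. *)
let main argv =
  if Array.length argv < 2 then 1
  else
    let llctx = Llvm.global_context () in
    let llmem = Llvm.MemoryBuffer.of_file argv.(1) in
    match parse_buffer llctx llmem with
    | None -> 1
    | Some llm -> print_endline (summarize llm); 0
```

As before, let () = exit (main Sys.argv) at the end of the file would make this a standalone program.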

  Llvm.dump_module llm ;

The dump_module function prints the contents of the module to stderr in textual LLVM IR form. It is mainly intended for debugging, which suits the goal of this first tutorial well.
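Because dump_module always writes to stderr, it is awkward when the output should be piped or captured. One possible alternative, sketched here under the assumption that the installed bindings provide Llvm.string_of_llmodule (present in the 3.5 bindings as far as I know), is to obtain the IR as a string and print it yourself. The helper name print_module_to_stdout and the module name "demo" are invented for the example.

```ocaml
(* Sketch: write the textual IR to stdout rather than stderr. *)
let print_module_to_stdout llm =
  print_string (Llvm.string_of_llmodule llm)

let () =
  let llctx = Llvm.create_context () in
  (* An empty module is enough to see the textual header. *)
  let llm = Llvm.create_module llctx "demo" in
  print_module_to_stdout llm;
  Llvm.dispose_module llm;
  Llvm.dispose_context llctx
```

In the tutorial program, the same helper could be called on llm in place of Llvm.dump_module.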

Makefile

Build systems are certainly not OCaml’s strength. To make things a little easier, I’ve decided to use ocamlbuild, wrapped in a Makefile to simplify the arguments. As I don’t like _tags files, everything is passed on the command line.

The Makefile only wraps (more or less) the following command:

export OCAMLPATH=/usr/lib/ocaml/llvm-$(LLVM_VERSION)
ocamlbuild -classic-display -j 0 -cflags -w,@a-4  -use-ocamlfind -pkgs llvm,llvm.bitreader  -I src -build-dir build/tutorial01 tutorial01.byte

The options should be rather easy to understand:

  • The first group of options, -classic-display -j 0 -cflags -w,@a-4, sets some generic ocamlbuild flags: classic build display, an unlimited number of parallel jobs, and all compiler warnings enabled and treated as errors, except warning 4 (fragile pattern matching),
  • -use-ocamlfind -pkgs llvm,llvm.bitreader are the most important options: they ask ocamlbuild to locate, through ocamlfind, the llvm and llvm.bitreader packages required by our example. This is why we have to set OCAMLPATH to the directory containing the bindings,
  • the remaining options specify where the sources are and where to put the compiled files.
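To make the -w,@a-4 setting more concrete, here is a small illustration of the kind of code that warning 4 is about. It is unrelated to LLVM; the type and function names are made up for the example.

```ocaml
type shape = Circle | Square | Triangle

(* Warning 4 ("fragile match") targets this style: the wildcard would
   silently accept any constructor added to [shape] later. With
   -w @a-4 the warning is switched off, so this compiles cleanly. *)
let is_round shape =
  match shape with
  | Circle -> true
  | _ -> false

(* The non-fragile form lists every constructor, so adding a new one
   makes the match non-exhaustive and forces a review of this code. *)
let is_round_explicit shape =
  match shape with
  | Circle -> true
  | Square | Triangle -> false
```

Every other warning remains an error under @a, so a forgotten match case, for example, stops the build instead of scrolling past unnoticed.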

Running the application

We use clang to compile a simple Hello World file to an LLVM bitcode file.

$ clang -c -emit-llvm hello.c
$ file hello.bc
hello.bc: LLVM IR bitcode

We can now use our first application to dump the bitcode file as textual LLVM IR:

$ LD_LIBRARY_PATH=/usr/lib/ocaml/llvm-3.5/ ./build/tutorial01/src/tutorial01.byte ./hello.bc
; ModuleID = './hello.bc'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@.str = private unnamed_addr constant [14 x i8] c"hello, world\0A\00", align 1

; Function Attrs: nounwind uwtable
define i32 @main() #0 {
  %1 = alloca i32, align 4
  store i32 0, i32* %1
  %2 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([14 x i8]* @.str, i32 0, i32 0))
  ret i32 0
}

declare i32 @printf(i8*, ...) #1

attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.ident = !{!0}

!0 = metadata !{metadata !"Debian clang version 3.5.2-1 (tags/RELEASE_352/final) (based on LLVM 3.5.2)"}

Our example application works as expected. That’s it for part 1 of the tutorial: you should now be able to build an application using the OCaml LLVM bindings.

The example code has been published on GitHub, in the ocaml-llvm-tutorial project.

To get it, run:

$ git clone https://github.com/chifflier/ocaml-llvm-tutorial.git

Next time

In part 2, we’ll see how to iterate over functions and access simple values and attributes.
