In the previous part of this tutorial, we saw how to use ocamlbuild and make to build a simple application. In this part, we’ll start exploring the API and see how to access the values and attributes of LLVM objects.

The base of the code is the same as in part 1: it reads an existing LLVM bitcode file, for example one generated by clang.

As in the previous part, knowing the LLVM C++ API is not required (but it can help).

LLVM objects

The top-level container is a module (llmodule). A module contains global variables, types and functions. Functions in turn contain basic blocks, and basic blocks contain instructions.

Values

In the OCaml bindings, all objects (variables, functions, instructions) are instances of the opaque type llvalue.

A value has a type, a name, a definition, a list of users, and other properties such as attributes (for example visibility or linkage options) or aliases.
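
As a rough illustration of these attributes, the sketch below prints the linkage and visibility of a global value (a function or a global variable). The helper names string_of_linkage, string_of_visibility and print_attrs are made up for this example; only a few linkage constructors are named, and the wildcard covers the rest.

```ocaml
(* Hypothetical helpers, not part of the bindings. *)
let string_of_linkage = function
  | Llvm.Linkage.External -> "external"
  | Llvm.Linkage.Internal -> "internal"
  | Llvm.Linkage.Private  -> "private"
  | _                     -> "other"

let string_of_visibility = function
  | Llvm.Visibility.Default   -> "default"
  | Llvm.Visibility.Hidden    -> "hidden"
  | Llvm.Visibility.Protected -> "protected"

(* Print a few attributes of a global value. *)
let print_attrs lv =
  Printf.printf "  linkage %s\n" (string_of_linkage (Llvm.linkage lv)) ;
  Printf.printf "  visibility %s\n" (string_of_visibility (Llvm.visibility lv)) ;
  (* A declaration has no body (function) or initializer (variable). *)
  Printf.printf "  declaration only: %b\n" (Llvm.is_declaration lv)
```

These functions are only meaningful on global values; calling them on an instruction is not expected to work.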

Each value has a type (lltype), a composite object that describes the type of the value and, for functions, the types of its arguments. To pattern-match on the kind of type, convert it to a TypeKind.t with classify_type:

let rec print_type llty =
  let ty = Llvm.classify_type llty in
  match ty with
  | Llvm.TypeKind.Function -> Printf.printf "  function\n"
  | Llvm.TypeKind.Pointer  -> Printf.printf "  pointer to" ; print_type (Llvm.element_type llty)
  | _                      -> Printf.printf "  other type\n"
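
When the type is a function type, the bindings also give access to its return type and parameter types. The following is a minimal sketch; describe_function_type is a name chosen for this example, and it expects the function type itself (not a pointer to it).

```ocaml
(* Hypothetical helper: summarize a function type as a string. *)
let describe_function_type llty =
  match Llvm.classify_type llty with
  | Llvm.TypeKind.Function ->
    let params = Llvm.param_types llty in
    Printf.sprintf "function returning %s with %d parameter(s)%s"
      (Llvm.string_of_lltype (Llvm.return_type llty))
      (Array.length params)
      (if Llvm.is_var_arg llty then " (variadic)" else "")
  | _ -> "not a function type"
```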

We define a simple function to print some information about the input llvalue argument:

let print_val lv =
  Printf.printf "Value\n" ;
  Printf.printf "  name %s\n" (Llvm.value_name lv) ;
  let llty = Llvm.type_of lv in
  Printf.printf "  type %s\n" (Llvm.string_of_lltype llty) ;
  print_type llty ;
  ()

Functions

The lookup_function function can be used to get the llvalue associated with a function. It returns an llvalue option, so we must use match to check whether the function exists:

let opt_lv = Llvm.lookup_function "main" llm in
match opt_lv with
| Some lv -> print_val lv
| None    -> Printf.printf "'main' function not found\n"

If you don’t know the names of the functions, or simply want to iterate over all functions, you can use iter_functions, fold_left_functions, and similar functions:

Llvm.iter_functions print_val llm ;
let count =
  Llvm.fold_left_functions
    (fun acc lv ->
      print_val lv ;
      acc + 1
    )
    0
    llm
in
Printf.printf "Functions count: %d\n" count ;

If you run the above code, note that the value you get for each function when iterating has a pointer-to-function type, not the function type itself.

As usual in OCaml, it is better to use the tail-recursive functions (fold_right_functions, for example, is not tail-recursive), especially when working on large LLVM modules. Fortunately, the documentation clearly indicates whether each iteration function is tail-recursive.
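
For instance, a list of function names can be built with the tail-recursive fold_left_functions and reversed afterwards. This is only a sketch; function_names is a name invented for the example.

```ocaml
(* Collect the names of all functions of a module, in module order.
   The accumulator is built in reverse, then put back in order with
   List.rev, which is also tail-recursive. *)
let function_names llm =
  Llvm.fold_left_functions
    (fun acc lv -> Llvm.value_name lv :: acc)
    []
    llm
  |> List.rev
```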

Basic blocks and instructions

In LLVM, a function is made of basic blocks, which are lists of instructions. A basic block contains zero or more ordinary instructions and must end with a terminator instruction, which indicates which block is executed once the current one has finished. A terminator is either a control-flow change (ret, br, switch, indirectbr, invoke, resume) or unreachable.
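
The terminator of a block can be retrieved with block_terminator, which returns an option, and its kind with instr_opcode. The sketch below (describe_terminator is a hypothetical helper) only names the terminators listed above and groups everything else under a wildcard.

```ocaml
(* Return a short description of the terminator of a basic block. *)
let describe_terminator llbb =
  match Llvm.block_terminator llbb with
  | None -> "no terminator (malformed block)"
  | Some term ->
    (match Llvm.instr_opcode term with
     | Llvm.Opcode.Ret         -> "ret"
     | Llvm.Opcode.Br          -> "br"
     | Llvm.Opcode.Switch      -> "switch"
     | Llvm.Opcode.IndirectBr  -> "indirectbr"
     | Llvm.Opcode.Invoke      -> "invoke"
     | Llvm.Opcode.Resume      -> "resume"
     | Llvm.Opcode.Unreachable -> "unreachable"
     | _                       -> "other")
```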

A function that has a body has at least one basic block, the entry block. A function that is only declared has none.

LLVM instructions are in static single assignment (SSA) form: a value is created by an instruction and can be assigned only once, and an instruction must only use values that are previously defined (more precisely, the definition of a value must dominate all of its uses).

It is very important that the LLVM bitcode is well-formed: the compiler checks these constraints and rejects the module if they are violated. However, many checks in the LLVM source code are written as assertions, which are disabled in release builds, so with a release build you may get a segmentation fault instead of an error message …
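
One way to check a module before handing it to other LLVM code is the verifier from the Llvm_analysis module. The sketch below assumes that the program is also linked against the llvm.analysis library, which the build setup of this tutorial may not do by default; check_module is a name chosen for the example.

```ocaml
(* Run the LLVM verifier on a module.
   verify_module returns None when the module is valid, or Some reason
   describing the first problems found. *)
let check_module llm =
  match Llvm_analysis.verify_module llm with
  | None ->
    Printf.printf "module is well-formed\n" ;
    true
  | Some reason ->
    Printf.printf "invalid module: %s\n" reason ;
    false
```

Calling check_module llm right after loading the bitcode file gives a readable error message rather than a crash later on.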

For example, to iterate on all instructions of all basic blocks of a function:

let print_fun lv =
  Llvm.iter_blocks
    (fun llbb ->
      Printf.printf "  bb: %s\n" (Llvm.value_name (Llvm.value_of_block (llbb))) ;
      Llvm.iter_instrs
        (fun lli ->
          Printf.printf "    instr: %s\n" (Llvm.string_of_llvalue lli)
        )
        llbb
    )
    lv

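Note that iter_blocks visits the basic blocks in the order in which they are stored in the function, starting with the entry block. This is the layout order, which is not necessarily a traversal of the control flow graph of the function.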
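
The fold variants work the same way at each level. As a small sketch, the following counts the instructions of a function by nesting fold_left_instrs inside fold_left_blocks; count_instrs is a hypothetical name.

```ocaml
(* Count the instructions of a function.
   A function that is only declared has no blocks, so the result is 0. *)
let count_instrs lv =
  Llvm.fold_left_blocks
    (fun acc llbb ->
      (* Continue counting from the running total of previous blocks. *)
      Llvm.fold_left_instrs (fun n _ -> n + 1) acc llbb
    )
    0
    lv
```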

Global variables

Access to global variables is done using similar functions: iter_globals, fold_left_globals, etc.
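
A possible sketch, mirroring the function examples: print_globals and count_globals are names made up here, and is_global_constant tells whether a global was declared constant.

```ocaml
(* Print the name and type of every global variable of a module. *)
let print_globals llm =
  Llvm.iter_globals
    (fun lv ->
      Printf.printf "global %s: %s%s\n"
        (Llvm.value_name lv)
        (Llvm.string_of_lltype (Llvm.type_of lv))
        (if Llvm.is_global_constant lv then " (constant)" else "")
    )
    llm

(* Count the global variables with the tail-recursive fold. *)
let count_globals llm =
  Llvm.fold_left_globals (fun acc _ -> acc + 1) 0 llm
```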

Next time

In this part, we’ve covered how to access the basic elements of LLVM using the OCaml bindings. With these, it is fairly easy to develop applications that analyze LLVM bitcode, check some properties, etc.

The example code is in the part2 directory of the ocaml-llvm-tutorial project.

To get it, run:

$ git clone https://github.com/chifflier/ocaml-llvm-tutorial.git
$ cd ocaml-llvm-tutorial
$ cd part2
$ make

In part 3, we’ll see how to create or modify LLVM bitcode: functions, instructions, values, etc.