September 6, 2010
RocketShip Overview

Background

Many years ago I decided that I liked the idea of UML.  The main drawback was that it imposed some pretty strict restraints when you were only trying to get the initial ideas presented.  It failed for what I was trying to accomplish, namely loose specification of a piece of software before I began writing it.   I tried to use it as a sort of “scratch pad” to flesh out my ideas so I could come up with the general approach.   While it did not fit this role very well, the documentation it presented made it easy to refer to when looking back on things I had done.  There was no denying the benefits of good documentation.

There are several tools out there that will automatically generate documentation about your code for you.  Doxygen is one of my personal favorites, generating multiple diagrams as well as the full documentation you add alongside your code.  However, it only provides a limited set of diagrams and graphs (dependency graphs, inheritance diagrams and collaboration diagrams).   It is designed around the idea of extracting limited information from the code and creating useful documentation based on that information.

One of the often overlooked UML diagrams is the Activity Diagram.   The usefulness of activity diagrams is quickly lost if they are not constantly maintained with every code change.   They describe the actual behavior of a system, easily representing complex behaviors in a straightforward manner.  While it would be great if they could be generated by Doxygen, it requires semantic knowledge of language internals to be able to automatically generate the diagrams.  Doxygen does not have the framework in place for the level of analysis required.

Introducing RocketShip

RocketShip was created as a way to extract more useful documentation from the actual code that has been written.  It will currently generate a directed graph that is very similar to UML activity diagrams from the code itself.   It is implemented as an analysis pass for LLVM’s optimizer.   This allows it to be used to automatically generate the corresponding graphs every time the code is compiled (providing you load the RocketShip module and pass the -rocketship flag to the optimizer).

Key Features

  • Language Independent - RocketShip operates on the LLVM bitcode passed to the optimizer, so any language that targets LLVM is supported.
  • Simple Output Format - RocketShip outputs dot files suitable for use with Graphviz, not some custom format that requires a special tool to use.
  • Fast - RocketShip processes each instruction in the bitcode only once, and requires only one pass to output the dot files.
  • Useful Diagrams - Rather than simply provide the full graph of the instructions and opcodes present in the bitcode, RocketShip combines instructions and condenses the graph to closely mimic the structure of the original source code automatically.

Current Limitations

  • C++ Support - Due to the way in which names are mangled in C++, RocketShip does not generate useful graphs for code that was originally in C++. This is at the top of the todo list.
  • Labels - LLVM bitcode uses labels to indicate jumps between blocks. These labels don’t exist within the code but are currently used within the graph.
  • Variable Names - Variable names are not fully resolved, occasionally having _addr or an integer appended to them. This is due to how they are represented in the bitcode but is on the priority todo list.
  • Output - The dot files are output in the directory the optimizer is invoked from and only indicate the name of the function. They should be placed somewhere more meaningful instead.
  • Only tested against C and C++ - I have not tested against other languages that target LLVM, but nearly all languages should result in bitcode that is similar to C or C++.

Release

RocketShip is available at http://github.com/ismarc/RocketShip.  As I have not decided on a license yet, you are free to download and use the software for evaluation, but please refrain from distribution.   Once I have decided on a license, the repository will be updated (and the license will be an Open Source license, I just wanted to make sure the source was available before I finished weighing the pros of each one).

Example

What new product would be complete without a demonstration of capabilities? Rather than a contrived set of code to generate a graph that looks wonderful, I have been testing RocketShip against code I had laying around from working on Project Euler problems. Below is the source code for the function, the assembly representation of the bitcode and the subsequent graph that was generated automatically.

Original Source

void multiplyBigInt(int first[], int second[], int destination[])
{
    int i;
    int temp[2000] = { 0 };
    int temp_two[2000] = { 0 };

    for (i = 0; i < 2000; i++) {
        assignBigInt(temp, 0);
        multiplyBigDigit(second, temp, first[i]);
        shiftBigDigit(temp, i);
        addBigInts(temp_two, temp, destination);
        copyBigInt(destination, temp_two);
    }
}

LLVM Assembly bitcode representation

define void @multiplyBigInt(i32* %first, i32* %second, i32* %destination) nounwind {
entry:
  %first_addr = alloca i32*                       ;  [#uses=2]
  %second_addr = alloca i32*                      ;  [#uses=2]
  %destination_addr = alloca i32*                 ;  [#uses=3]
  %i = alloca i32                                 ;  [#uses=6]
  %j = alloca i32                                 ;  [#uses=0]
  %temp = alloca [2000 x i32]                     ; <[2000 x i32]*> [#uses=5]
  %temp_two = alloca [2000 x i32]                 ; <[2000 x i32]*> [#uses=3]
  %"alloca point" = bitcast i32 0 to i32          ;  [#uses=0]
  store i32* %first, i32** %first_addr
  store i32* %second, i32** %second_addr
  store i32* %destination, i32** %destination_addr
  %temp1 = bitcast [2000 x i32]* %temp to i8*     ;  [#uses=1]
  call void @llvm.memset.i64(i8* %temp1, i8 0, i64 8000, i32 4)
  %temp_two2 = bitcast [2000 x i32]* %temp_two to i8* ;  [#uses=1]
  call void @llvm.memset.i64(i8* %temp_two2, i8 0, i64 8000, i32 4)
  store i32 0, i32* %i, align 4
  br label %bb9

bb:                                               ; preds = %bb9
  %temp3 = bitcast [2000 x i32]* %temp to i32*    ;  [#uses=1]
  call void @assignBigInt(i32* %temp3, i32 0) nounwind
  %0 = load i32** %first_addr, align 8            ;  [#uses=1]
  %1 = load i32* %i, align 4                      ;  [#uses=1]
  %2 = sext i32 %1 to i64                         ;  [#uses=1]
  %3 = getelementptr inbounds i32* %0, i64 %2     ;  [#uses=1]
  %4 = load i32* %3, align 1                      ;  [#uses=1]
  %5 = load i32** %second_addr, align 8           ;  [#uses=1]
  %temp4 = bitcast [2000 x i32]* %temp to i32*    ;  [#uses=1]
  call void @multiplyBigDigit(i32* %5, i32* %temp4, i32 %4) nounwind
  %temp5 = bitcast [2000 x i32]* %temp to i32*    ;  [#uses=1]
  %6 = load i32* %i, align 4                      ;  [#uses=1]
  call void @shiftBigDigit(i32* %temp5, i32 %6) nounwind
  %temp_two6 = bitcast [2000 x i32]* %temp_two to i32* ;  [#uses=1]
  %temp7 = bitcast [2000 x i32]* %temp to i32*    ;  [#uses=1]
  %7 = load i32** %destination_addr, align 8      ;  [#uses=1]
  call void @addBigInts(i32* %temp_two6, i32* %temp7, i32* %7) nounwind
  %8 = load i32** %destination_addr, align 8      ;  [#uses=1]
  %temp_two8 = bitcast [2000 x i32]* %temp_two to i32* ;  [#uses=1]
  call void @copyBigInt(i32* %8, i32* %temp_two8) nounwind
  %9 = load i32* %i, align 4                      ;  [#uses=1]
  %10 = add nsw i32 %9, 1                         ;  [#uses=1]
  store i32 %10, i32* %i, align 4
  br label %bb9

bb9:                                              ; preds = %bb, %entry
  %11 = load i32* %i, align 4                     ;  [#uses=1]
  %12 = icmp sle i32 %11, 1999                    ;  [#uses=1]
  br i1 %12, label %bb, label %bb10

bb10:                                             ; preds = %bb9
  br label %return

return:                                           ; preds = %bb10
  ret void
}

Generated Graph

multiplyBigInt