November 1, 2010
Hacking the Hiring Process

Hacking the interview process

Recently, the team I work on has had several positions open, so we have been doing a lot of interviewing.  The position requires a rare intersection of skills, so it is unlikely that any single candidate has all of the qualifications.  We were really looking for a mid-level developer with the aptitude to learn quickly.  Normally, I’m a confident fellow; I know my strengths and a good number of my weaknesses.  However, I don’t think myself anything special or out of the ordinary.  I would place my development skills within one standard deviation (plus or minus) of average.  Not a bad spot to be, considering my educational background (several false starts and no degrees).

What do these two things have in common?  Nothing, really.  However, I have also been interviewing on occasion with other companies, mainly earlier-stage startups.  If I leave my relatively cushy job, I want to go do something interesting, not just chase a bigger paycheck.  Now, this is where the first two points begin to merge.  On the hiring side, we had a VERY difficult time finding people who even matched their resumes, let alone had a base of knowledge to build on or the natural curiosity to learn.  The general impression was that folks wanted a job that didn’t ask too much of them.

This isn’t a bad thing; some folks honestly peak earlier on the career ladder than others.  But therein lies the conundrum.  The result of the interviews where I was the interviewee was that I would not be a good fit for the position.  That is very understandable; they are looking for a specific candidate and skill set/skill level.  But the feedback is generally atrocious.  It amounts to a sense of general rejection without anything to improve upon.

So I decided to hack the system a bit and solicit feedback on the areas they perceived as needing improvement.  From one company I received excellent feedback precisely by getting no direct feedback.  They said they felt I was probably a good developer, but that the impression wasn’t strong enough to fly me out for an in-person interview (understandable, this company was looking for the best).  They also brought up the interview anti-loop (http://steve-yegge.blogspot.com/2008/03/get-that-job-at-google.html).  I took this as positive feedback: it suggested the problem was not a lack of skill, but happening to hit several weak areas.

I thought about pressing for more details so that I had general areas to work on for personal improvement, but decided against it.  I thought back to all the candidates that I had rejected and tried to decide what I would respond with if I were asked the same question.  And I drew a blank.  I could not come up with answers that wouldn’t be a potential liability or cause the requester more grief than just the general feeling of rejection.

And thus ended my attempted hack of the interview process.  I’ll still ask for feedback, but I expect responses as generic as the rejection itself.  The hiring process is deeply flawed when the only quantitative skills and characteristics for any given position are “can breathe, with assistance is okay” and “can show up.”  Nearly all other requirements are subjective, carry a large number of caveats and cannot be quantified or measured.  The cover letter becomes a fraud to get you in the door so that a subjective measurement of intangible qualities can be taken and weighed against (usually) arbitrary valuations of those characteristics as needed for the position.

Why the current hiring process is the current hiring process

This leads to an interesting aspect of the hiring process.  Every employer knows and recognizes that there is a “ramp up” period before a new hire can be completely productive.  The amount of time is based on the type of position and the learning speed of the individual, but is rarely zero.  Combine this with the cost of hiring and there is a significant investment in each employee by the time they are able to contribute to the company.  The current system is working to balance these costs with the more expensive scenario of bringing on a candidate that has to be let go.

The cost of hiring an employee is amortized over the length of their employment.  Employers are willing to spend more money to acquire a candidate that will stay with the company long enough to recoup their costs and profit from the employee’s contributions.  For any system to replace the current method, it needs to be proven to reduce the cost of hiring significantly, increase the length of tenure of an employee or significantly reduce the ramp up time for new employees.  The challenge is that the length of tenure and ramp up time can be directly controlled with documentation, training process and company culture.  This necessitates an approach that is more efficient, even when paired with those systems.
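To make the amortization argument concrete, here is a minimal sketch of a break-even calculation.  The struct, its field names and the assumption that a new hire produces nothing during ramp up are all mine and purely illustrative; real costs are far messier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model in whole currency units; none of these
 * fields or figures come from a real employer. */
struct hire_costs {
    long hiring_cost;     /* one-time cost of recruiting and interviewing */
    long monthly_cost;    /* salary and overhead, paid from the first month */
    long monthly_value;   /* value produced per month once fully ramped up */
    long ramp_up_months;  /* months before the hire is fully productive */
};

/* Months of tenure needed before the value produced covers everything
 * spent so far, or -1 if the hire never pays for itself.
 * Simplifying assumption: no value is produced during ramp up. */
long break_even_months(const struct hire_costs *c)
{
    assert(c != NULL && c->ramp_up_months >= 0);

    long margin = c->monthly_value - c->monthly_cost;
    if (margin <= 0)
        return -1;

    /* Everything spent before the first productive month. */
    long sunk = c->hiring_cost + c->monthly_cost * c->ramp_up_months;

    /* Round up: a partial month still has to be worked. */
    return c->ramp_up_months + (sunk + margin - 1) / margin;
}
```

With a hiring cost of 20000, a monthly cost of 8000, a monthly value of 12000 and a three month ramp up, the break-even point is 14 months.  Cutting the ramp up to one month brings it down to 8, which is why documentation and training compete so directly with any new hiring method.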

Variations on a theme

To be able to provide an alternative method of seeking candidates, the first necessary piece is to be able to define measurable requirements for a particular position.  This definition is difficult because it is assumed that the candidate will need training in at least one area.  The comprehensive, perfect candidate will need less training than others, but still a non-zero amount of training.  This provides for a wide range of variability within the necessary skills.

With the wide variety in skill levels, there is also a variance in the base skills that are needed.  This occurs because proficiency in one area may be as beneficial as proficiency in another.  This complexity increases the space of applicants and greatly increases the need for a human or manual element in determining if the applicant is indeed a potential match.

If the skill requirements can be determined and the skills of an applicant can be properly quantified, the filtering process becomes trivial.  The key components of an alternative talent acquisition process then become the definition of the required quantitative skills and the measurement of those skills for each applicant.  Each position at each company is unique.  There has been some convergence in different areas, but there will always be variation.
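As a rough illustration of how trivial the filter becomes once both sides are numbers, here is a minimal sketch.  The skill axes, the scoring scale and the idea of a fixed "trainable gaps" allowance are assumptions made up for the example, not a proposal for how skills should actually be measured.

```c
#include <assert.h>
#include <stddef.h>

enum { SKILL_COUNT = 4 };  /* hypothetical number of measured skills */

/* What a position asks for (illustrative fields only). */
struct position {
    int minimum[SKILL_COUNT];  /* required score per skill, 0 = not needed */
    int trainable_gaps;        /* shortfalls the team is willing to train */
};

/* What was measured for one applicant. */
struct applicant {
    const char *name;
    int score[SKILL_COUNT];
};

/* An applicant matches if they fall short in no more skills than the
 * position is prepared to train. */
int is_match(const struct position *p, const struct applicant *a)
{
    int gaps = 0;
    for (int i = 0; i < SKILL_COUNT; i++) {
        if (a->score[i] < p->minimum[i])
            gaps++;
    }
    return gaps <= p->trainable_gaps;
}

/* Stores pointers to the matching applicants in matches (which must
 * have room for count entries) and returns how many were kept. */
size_t filter_applicants(const struct position *p,
                         const struct applicant *pool, size_t count,
                         const struct applicant **matches)
{
    assert(p != NULL && pool != NULL && matches != NULL);

    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (is_match(p, &pool[i]))
            matches[kept++] = &pool[i];
    }
    return kept;
}
```

The hard part is everything this sketch takes for granted: choosing the skills, agreeing on the minimums and producing trustworthy scores in the first place.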

I don’t have all the answers, yet.

All the pontification in the world will not create a suitable “replacement” for the current hiring process.  I even decided to post this write up prior to finding the answer myself because simply rationalizing about it will not solve the problems with the current system.  No amount of technology can determine if a candidate should be hired.  However, technology can be used to improve the process of determining which candidates should be evaluated by a person.

Problems with the current process that need to be addressed.

  • No quantitative skills necessary - This can be solved by technology and I believe it is one of the key components of improving the current process.
  • Subjective determinations are needed - While many facets of an individual can be whittled down to measurable values, there will always be a need for a subjective assessment of the individual as a whole.
  • The filterer is not the one that knows the most about the position - In nearly every company, the representative who manages the openings, resumes and filtering process is divorced from the actual position.  The hiring manager must be more easily involved in this part of the process.  This can be overcome with company culture/process, but tools that automate as much of it as possible help to dislodge ingrained methods.
  • Uncertainty - The cost of hiring and losing an individual is so high that any amount of uncertainty results in not hiring the individual.  This is difficult to overcome as many of the costs are independent of the company.
  • Feedback - Laws and regulations limit what employers are willing to disclose to applicants when they are not selected for employment.  Opening the company to potential liability is more disastrous than helping people to improve.  In addition, new technologies mean that companies receive many orders of magnitude more responses than before.  This can be resolved by an automated system informing the applicant of the determination if the resume is not forwarded along.

September 6, 2010
RocketShip Overview

Background

Many years ago I decided that I liked the idea of UML.  The main drawback was that it imposed some pretty strict constraints when you were only trying to get the initial ideas down.  It failed for what I was trying to accomplish, namely loose specification of a piece of software before I began writing it.  I tried to use it as a sort of “scratch pad” to flesh out my ideas so I could come up with the general approach.  While it did not fit this role very well, the documentation it produced was easy to refer back to when looking at things I had done.  There was no denying the benefits of good documentation.

There are several tools out there that will automatically generate documentation about your code for you.  Doxygen is one of my personal favorites, generating multiple diagrams as well as the full documentation you add alongside your code.  However, it only provides a limited set of diagrams and graphs (dependency graphs, inheritance diagrams and collaboration diagrams).   It is designed around the idea of extracting limited information from the code and creating useful documentation based on that information.

One of the often overlooked UML diagrams is the Activity Diagram.  Activity diagrams describe the actual behavior of a system, representing complex behaviors in a straightforward manner.  Their usefulness is quickly lost, however, if they are not maintained with every code change.  While it would be great if Doxygen could generate them, doing so automatically requires semantic knowledge of language internals, and Doxygen does not have the framework in place for the level of analysis required.

Introducing RocketShip

RocketShip was created as a way to extract more useful documentation from the actual code that has been written.  It currently generates, from the code itself, a directed graph that is very similar to a UML activity diagram.  It is implemented as an analysis pass for LLVM’s optimizer.  This allows it to generate the corresponding graphs automatically every time the code is compiled (provided you load the RocketShip module and pass the -rocketship flag to the optimizer).

Key Features

  • Language Independent - RocketShip operates on the LLVM bitcode passed to the optimizer, so any language that targets LLVM is supported.
  • Simple Output Format - RocketShip outputs dot files suitable for use with Graphviz, not some custom format that requires a special tool to use.
  • Fast - RocketShip processes each instruction in the bitcode only once, and requires only one pass to output the dot files.
  • Useful Diagrams - Rather than simply provide the full graph of the instructions and opcodes present in the bitcode, RocketShip combines instructions and condenses the graph to closely mimic the structure of the original source code automatically.

Current Limitations

  • C++ Support - Due to the way in which names are mangled in C++, RocketShip does not generate useful graphs for code that was originally in C++. This is at the top of the todo list.
  • Labels - LLVM bitcode uses labels to indicate jumps between blocks. These labels don’t exist within the code but are currently used within the graph.
  • Variable Names - Variable names are not fully resolved, occasionally having _addr or an integer appended to them. This is due to how they are represented in the bitcode but is on the priority todo list.
  • Output - The dot files are output in the directory the optimizer is invoked from and only indicate the name of the function. They should be placed somewhere more meaningful instead.
  • Only tested against C and C++ - I have not tested against other languages that target LLVM, but nearly all languages should result in bitcode that is similar to C or C++.

Release

RocketShip is available at http://github.com/ismarc/RocketShip.  As I have not decided on a license yet, you are free to download and use the software for evaluation, but please refrain from distribution.   Once I have decided on a license, the repository will be updated (and the license will be an Open Source license, I just wanted to make sure the source was available before I finished weighing the pros of each one).

Example

What new product would be complete without a demonstration of capabilities? Rather than a contrived set of code to generate a graph that looks wonderful, I have been testing RocketShip against code I had lying around from working on Project Euler problems. Below are the source code for the function, the assembly representation of the bitcode and the graph that was generated automatically from it.

Original Source

void multiplyBigInt(int first[], int second[], int destination[])
{
    int i;
    int temp[2000] = { 0 };
    int temp_two[2000] = { 0 };

    for (i = 0; i < 2000; i++) {
        assignBigInt(temp, 0);
        multiplyBigDigit(second, temp, first[i]);
        shiftBigDigit(temp, i);
        addBigInts(temp_two, temp, destination);
        copyBigInt(destination, temp_two);
    }
}

LLVM Assembly bitcode representation

define void @multiplyBigInt(i32* %first, i32* %second, i32* %destination) nounwind {
entry:
  %first_addr = alloca i32*                       ; <i32**> [#uses=2]
  %second_addr = alloca i32*                      ; <i32**> [#uses=2]
  %destination_addr = alloca i32*                 ; <i32**> [#uses=3]
  %i = alloca i32                                 ; <i32*> [#uses=6]
  %j = alloca i32                                 ; <i32*> [#uses=0]
  %temp = alloca [2000 x i32]                     ; <[2000 x i32]*> [#uses=5]
  %temp_two = alloca [2000 x i32]                 ; <[2000 x i32]*> [#uses=3]
  %"alloca point" = bitcast i32 0 to i32          ; <i32> [#uses=0]
  store i32* %first, i32** %first_addr
  store i32* %second, i32** %second_addr
  store i32* %destination, i32** %destination_addr
  %temp1 = bitcast [2000 x i32]* %temp to i8*     ; <i8*> [#uses=1]
  call void @llvm.memset.i64(i8* %temp1, i8 0, i64 8000, i32 4)
  %temp_two2 = bitcast [2000 x i32]* %temp_two to i8* ; <i8*> [#uses=1]
  call void @llvm.memset.i64(i8* %temp_two2, i8 0, i64 8000, i32 4)
  store i32 0, i32* %i, align 4
  br label %bb9

bb:                                               ; preds = %bb9
  %temp3 = bitcast [2000 x i32]* %temp to i32*    ; <i32*> [#uses=1]
  call void @assignBigInt(i32* %temp3, i32 0) nounwind
  %0 = load i32** %first_addr, align 8            ; <i32*> [#uses=1]
  %1 = load i32* %i, align 4                      ; <i32> [#uses=1]
  %2 = sext i32 %1 to i64                         ; <i64> [#uses=1]
  %3 = getelementptr inbounds i32* %0, i64 %2     ; <i32*> [#uses=1]
  %4 = load i32* %3, align 1                      ; <i32> [#uses=1]
  %5 = load i32** %second_addr, align 8           ; <i32*> [#uses=1]
  %temp4 = bitcast [2000 x i32]* %temp to i32*    ; <i32*> [#uses=1]
  call void @multiplyBigDigit(i32* %5, i32* %temp4, i32 %4) nounwind
  %temp5 = bitcast [2000 x i32]* %temp to i32*    ; <i32*> [#uses=1]
  %6 = load i32* %i, align 4                      ; <i32> [#uses=1]
  call void @shiftBigDigit(i32* %temp5, i32 %6) nounwind
  %temp_two6 = bitcast [2000 x i32]* %temp_two to i32* ; <i32*> [#uses=1]
  %temp7 = bitcast [2000 x i32]* %temp to i32*    ; <i32*> [#uses=1]
  %7 = load i32** %destination_addr, align 8      ; <i32*> [#uses=1]
  call void @addBigInts(i32* %temp_two6, i32* %temp7, i32* %7) nounwind
  %8 = load i32** %destination_addr, align 8      ; <i32*> [#uses=1]
  %temp_two8 = bitcast [2000 x i32]* %temp_two to i32* ; <i32*> [#uses=1]
  call void @copyBigInt(i32* %8, i32* %temp_two8) nounwind
  %9 = load i32* %i, align 4                      ; <i32> [#uses=1]
  %10 = add nsw i32 %9, 1                         ; <i32> [#uses=1]
  store i32 %10, i32* %i, align 4
  br label %bb9

bb9:                                              ; preds = %bb, %entry
  %11 = load i32* %i, align 4                     ; <i32> [#uses=1]
  %12 = icmp sle i32 %11, 1999                    ; <i1> [#uses=1]
  br i1 %12, label %bb, label %bb10

bb10:                                             ; preds = %bb9
  br label %return

return:                                           ; preds = %bb10
  ret void
}
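The br instructions in the listing above define five edges between the labelled blocks.  As a minimal sketch of what "outputting a dot file" amounts to, the following writes those edges out by hand.  This is not RocketShip's code or its output (RocketShip is an LLVM pass and condenses the graph further); the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* One control-flow edge between two basic blocks, identified by label.
 * Hypothetical type, not part of RocketShip or LLVM. */
struct cfg_edge {
    const char *from;
    const char *to;
};

/* Writes the edges as a Graphviz digraph named graph_name.
 * Returns 0 on success, -1 if any write failed. */
int write_dot(FILE *out, const char *graph_name,
              const struct cfg_edge *edges, size_t count)
{
    assert(out != NULL && graph_name != NULL);

    if (fprintf(out, "digraph \"%s\" {\n", graph_name) < 0)
        return -1;
    for (size_t i = 0; i < count; i++) {
        if (fprintf(out, "    \"%s\" -> \"%s\";\n",
                    edges[i].from, edges[i].to) < 0)
            return -1;
    }
    if (fprintf(out, "}\n") < 0)
        return -1;
    return 0;
}

/* Edges read off the br instructions in the listing, in listing order. */
static const struct cfg_edge multiply_big_int_edges[] = {
    { "entry", "bb9"    },  /* fall into the loop test */
    { "bb",    "bb9"    },  /* loop body back to the test */
    { "bb9",   "bb"     },  /* test passed: run the body */
    { "bb9",   "bb10"   },  /* test failed: leave the loop */
    { "bb10",  "return" },
};
```

Calling write_dot(stdout, "multiplyBigInt", multiply_big_int_edges, 5) and saving the result to a file gives something Graphviz can render with, for example, dot -Tpng.  Note that the raw block graph still shows the bb and bb10 labels that do not exist in the C source, which is exactly the limitation listed under Labels.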

Generated Graph

[Image: generated graph for multiplyBigInt]
