Annotalysis is a flow- and path-sensitive analysis in which we track the set of locks known to be held at each program point, and verify that any access to guarded data, or call to a guarded method, is made while holding the appropriate lock. Annotations are used to mark data members and methods as "guarded".

In the ideal world of our imagination, we would like an intermediate representation that supplies the following information, in order of priority:

(1) It must provide a CFG that accurately reflects the source code. This is critical.
(2) It must provide accurate source-language type information for each expression, so that we can find annotations on virtual methods.
(3) We need to look at the IR before any optimizations occur.
(4) It should be stable from release to release of the compiler.
(5) All local variables should be in SSA form; SSA makes life much easier.
(6) It should identify loads and stores.
(7) It should be simple and easy to traverse.

Gimple satisfies 5-7 almost perfectly, does 1-3 pretty well, and fails on 4. Unfortunately, "pretty well" on 1-3 was not sufficient for our needs, and 4 was becoming a problem. I will provide a few examples of things that have caused major headaches. These examples are enclosed in gcc-problem-examples.cpp, which you can compile with -fdump-tree-ssa to follow along. The descriptions refer to gcc 4.6; I have not tested with gcc 4.7.

========
(1) Inaccurate CFG

void foo1(Mutex* mu, bool c) {
  mu->Lock();
  if (c) {
    MyClass mc;
    if (c) {
      mu->Unlock();
      return;
    }
    // extra join point here to call ~MyClass()
  }
  mu->Unlock();
  return;
}

The lowering to GIMPLE produces an extra join point in the CFG that is not present in the source code, in order to call the destructor of MyClass. From a codegen standpoint this is ordinarily a good thing, since different paths in the CFG can share the same destructor code. However, our analysis algorithm merges locksets whenever it hits a join point, and here the mutex is held on one incoming edge but not on the other, so we get false positives. Fixing this problem would require a substantial rewrite of the algorithm to handle conditionally held locks.

========
(2) Inaccurate type information

void foo2(void* p) {
  ((MyClass*) p)->myMethod();
}

This is a minor nit. The virtual function call uses a type cast to grab the vtable, but then passes p (with type void*) as 'this'. So when I go back and try to locate the method later, I get a method lookup failure. To be fair, this sort of thing does not happen very often.

========
(3) "Local" optimizations with non-local effects

Annotalysis is scheduled as a local optimization pass that runs before all other optimizations. Unfortunately, some "local" optimizations have non-local effects that interfere with the analysis. IPA-SRA has been the main source of headaches:

class DummyMutex {
 public:
  void Lock()   EXCLUSIVE_LOCK_FUNCTION(this) { }
  void Unlock() UNLOCK_FUNCTION(this)         { }
};

void foo3(DummyMutex* dmu1, DummyMutex* dmu2) {
  dmu1->Lock();
  dmu2->Lock();
  dmu2->Unlock();
  dmu1->Unlock();
}

DummyMutex is a class that implements the interface of a mutex without doing anything. The EXCLUSIVE_LOCK_FUNCTION and UNLOCK_FUNCTION macros are annotations that the analysis uses to build locksets. IPA-SRA will helpfully notice that the bodies of these functions do not refer to 'this', and thus that they can be lifted to static methods; a good optimization. It fails to notice that the *annotation* refers to 'this'.
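Conceptually, the effect is as if the code had been rewritten like this before the analysis runs. (This is a hypothetical sketch of what the transformation does to the calls, not the actual GIMPLE output; the static function names are invented for illustration.)

// Hypothetical sketch: the effect of IPA-SRA as seen by the analysis.
// The unused 'this' parameter has been dropped, so the annotations,
// which name 'this', no longer identify any object.
static void DummyMutex_Lock()   { }  // was DummyMutex::Lock(), 'this' dropped
static void DummyMutex_Unlock() { }  // was DummyMutex::Unlock(), 'this' dropped

void foo3(DummyMutex* dmu1, DummyMutex* dmu2) {
  DummyMutex_Lock();    // which mutex is being locked? dmu1 no longer appears
  DummyMutex_Lock();
  DummyMutex_Unlock();
  DummyMutex_Unlock();  // locks can no longer be matched to unlocks
}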
The lifting happens "locally", before the compilation of foo3(), so when checking foo3() the analysis can no longer determine which object is being locked or unlocked. The net result of this change is that we get false positives when compiling with optimizations turned on, even though the analysis is supposed to run before any optimizations. (Compare the output of -fdump-tree-ssa when compiling normally versus compiling with -O3.)

========
(4) Unstable platform

Problems (1) and (3) were both introduced in gcc 4.6; they did not exist in earlier versions of the compiler. gcc 4.7 introduces new breakages that I have not investigated. The lowering algorithm from C++ to gimple, the gimple intermediate language itself, and the optimization pass framework are all very fluid, so a lot of maintenance work is required to keep annotalysis up to date.

========
Comparison with Clang

The clang AST satisfies 1-4 perfectly. The AST records all information and follows the C++ spec as closely as possible. Clang does not perform any transformations or optimizations in the front end. It builds a CFG on top of the original AST, with pointers to the original expressions. The AST is lowered to LLVM IR in a separate pass, which happens only after the AST has been fully constructed. Because the AST is keyed to the C++ spec, it rarely changes.

The clang AST fails utterly on 5-7. The AST is not simple, because C++ is not simple. For example, whereas gimple has function calls, clang has constructor, destructor, new, delete, implicit destructor, function, and method calls, each represented differently, so handling a "call" involves considerably more work (see the sketch at the end of this note). We also had to implement our own use-def chains that mimic SSA, and identify expressions that would cause a load or store. (Note that the clang static analyzer framework does provide some of this functionality, but our current implementation does not make use of it.)

As you can see, neither platform perfectly meets our needs. However, issues 5-7 were lower priority, and we had a straightforward way to work around them that did not require major changes to the algorithm, or alterations to the architecture of the compiler.

========
Conclusion

From our perspective, the ideal solution would be a strongly typed, high-level intermediate language, somewhat like the JVM or .NET IRs, with an initial lowering from C++ source code to that high-level IR. Static analysis passes and high-level optimizations could be run on this IR. The high-level IR would then be lowered to a low-level IR like Gimple or LLVM, where low-level optimization and codegen passes would occur. Java uses an architecture like this, which is one reason why academic research has overwhelmingly focused on Java; Java bytecode is vastly easier to work with than C++. Neither clang nor gcc currently uses such an architecture.
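To make the "C++ is not simple" point concrete, here is a rough sketch of the case analysis that the clang AST forces on a checker. It assumes clang's C++ API (RecursiveASTVisitor and the various call-expression node classes); it illustrates the shape of the problem and is not our actual implementation.

#include "clang/AST/RecursiveASTVisitor.h"

using namespace clang;

// Sketch: in gimple, a call is a call. In the clang AST, each flavor of
// "call" is a different node class, and a checker must handle all of them.
// (RecursiveASTVisitor also invokes the Visit* methods of base classes,
// e.g. VisitCallExpr fires for member calls too, which a real checker
// has to account for.)
class CallWalker : public RecursiveASTVisitor<CallWalker> {
 public:
  bool VisitCallExpr(CallExpr *E)                       { return true; }  // plain function call
  bool VisitCXXMemberCallExpr(CXXMemberCallExpr *E)     { return true; }  // method call
  bool VisitCXXOperatorCallExpr(CXXOperatorCallExpr *E) { return true; }  // overloaded operator
  bool VisitCXXConstructExpr(CXXConstructExpr *E)       { return true; }  // constructor call
  bool VisitCXXNewExpr(CXXNewExpr *E)                   { return true; }  // operator new
  bool VisitCXXDeleteExpr(CXXDeleteExpr *E)             { return true; }  // operator delete
  // Implicit destructor calls have no AST node at all; they show up only
  // as elements of the clang CFG (e.g. automatic-object destructors that
  // run at scope exit).
};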