> So that also applies to
> 
> "... and the second profiling function is called before the exit
> +corresponding to this first entry"
> 
> specifically "corresponding to this first entry"?   As if the second
> entry exits first will that call the second profiling function or will
> it really be the thread that called the first profiling function
> (what happens when that thread terminates before calling the second
> profiling function? (***)).  Consider re-wording this slightly.

The calls are always paired, i.e. if a thread calls the first function, then 
that same thread will call the second function; I can state this explicitly 
in the doc.

> +      /* If -finstrument-functions-once is specified, generate:
> +
> +          static volatile bool F.0 = true;
> +          bool tmp_first;
> 
> is there any good reason to make F.0 volatile?  That doesn't prevent
> races.

No, it does not, but it guarantees a single read and therefore the pairing.

> Any reason to make F.0 initialized to true rather than false (bss init?)

None, changed.

> (***) looking at the implementation the second profiling function
> can end up being never called when the thread calling the first
> profiling function does not exit the function.  So I wonder if
> the "optimization"(?) not re-reading F.0 makes sense (it also
> requires to keep the value of F.0 live across the whole function)

It's for the pairing.  The value should be spilled onto the stack if need be, 
so you'd get at most 2 loads, just as if you re-read the variable.
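
To make the pairing argument concrete, here is a rough hand-written C sketch of 
the shape of the instrumentation, not the code from the patch.  It assumes the 
flag is now initialized to false (so it can live in .bss) with the test 
inverted accordingly; the names called, tmp_called, profile_enter and 
profile_exit are hypothetical stand-ins, and the counters exist only to make 
the behavior observable.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two profiling functions; they only
   count how many times each one has been invoked.  */
static int enter_calls;
static int exit_calls;

static void profile_enter (void) { enter_calls++; }
static void profile_exit (void) { exit_calls++; }

/* Assumed per-function flag: zero-initialized, so it goes into .bss.
   volatile does not prevent races; it only forces exactly one read
   for the single access written below.  */
static volatile bool called;

static void
instrumented (void)
{
  /* The flag is read exactly once; both tests use this snapshot.  */
  bool tmp_called = called;

  if (!tmp_called)
    {
      called = true;
      profile_enter ();
    }

  /* ... original function body would go here ... */

  /* Decided from the snapshot, not from a second read of the flag, so
     an invocation that called profile_enter also calls profile_exit
     when it reaches this point.  */
  if (!tmp_called)
    profile_exit ();
}
```

Calling instrumented three times in a row from a single thread invokes each 
stand-in once.  The sketch still allows two threads to both see the flag clear 
and both call the first function; what the snapshot buys is only that each 
such thread then also calls the second one.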

Revised patch attached.

-- 
Eric Botcazou