public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically
@ 2012-11-16 17:03 konstantin.s.serebryany at gmail dot com
  2012-11-16 17:15 ` [Bug other/55354] " pinskia at gcc dot gnu.org
                   ` (27 more replies)
  0 siblings, 28 replies; 29+ messages in thread
From: konstantin.s.serebryany at gmail dot com @ 2012-11-16 17:03 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

             Bug #: 55354
           Summary: [asan] by default, the asan run-time should be linked
                    statically, not dynamically
    Classification: Unclassified
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: konstantin.s.serebryany@gmail.com
                CC: dseketel@redhat.com, dvyukov@google.com,
                    jakub@redhat.com


Today, -faddress-sanitizer (to be changed to -fsanitize=address) causes the 
asan run-time library to be linked dynamically. 
It needs to be changed to static linkage because 
  - it matches clang behavior 
  - causes less confusion for users (where is my libasan.so???)
  - better for tsan performance (we'll need to link tsan statically too)


Note that on MacOS we probably want to keep using dynamic linkage
since the future library-interposition-based run-time will require it.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
@ 2012-11-16 17:15 ` pinskia at gcc dot gnu.org
  2012-11-16 17:21 ` dvyukov at google dot com
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: pinskia at gcc dot gnu.org @ 2012-11-16 17:15 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|unknown                     |4.8.0

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-11-16 17:15:31 UTC ---
>  - it matches clang behavior 

But it is inconsistent with the rest of the target libraries of GCC.

>  - causes less confusion for users (where is my libasan.so???)

So this is a well documented issue already and easy fixed.  

>  - better for tsan performance (we'll need to link tsan statically too)

Not much better performance.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
  2012-11-16 17:15 ` [Bug other/55354] " pinskia at gcc dot gnu.org
@ 2012-11-16 17:21 ` dvyukov at google dot com
  2012-11-16 17:22 ` jakub at gcc dot gnu.org
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: dvyukov at google dot com @ 2012-11-16 17:21 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #2 from Dmitry Vyukov <dvyukov at google dot com> 2012-11-16 17:20:43 UTC ---
> Not much better performance.

Sole -fPIE vs -fPIC gives us 20% speedup on real programs. Indirect call will
add another 10%.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
  2012-11-16 17:15 ` [Bug other/55354] " pinskia at gcc dot gnu.org
  2012-11-16 17:21 ` dvyukov at google dot com
@ 2012-11-16 17:22 ` jakub at gcc dot gnu.org
  2012-11-16 20:28 ` konstantin.s.serebryany at gmail dot com
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-11-16 17:22 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |jakub at gcc dot gnu.org
         Resolution|                            |INVALID

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-16 17:22:03 UTC ---
This has been discussed already.  And Diego agreed Google can have a different
default in google branches if desirable for that kind of usage scenarios, but
otherwise it is undesirable.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (2 preceding siblings ...)
  2012-11-16 17:22 ` jakub at gcc dot gnu.org
@ 2012-11-16 20:28 ` konstantin.s.serebryany at gmail dot com
  2012-11-16 20:47 ` jakub at gcc dot gnu.org
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: konstantin.s.serebryany at gmail dot com @ 2012-11-16 20:28 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #4 from Konstantin Serebryany <konstantin.s.serebryany at gmail dot com> 2012-11-16 20:28:34 UTC ---
You have been warned (especially about tsan performance. tsan run-time heavily
depends on TLS, and TLS is much slower with -fPIC than with -fPIE).

Do we have a flag today which will cause libasan.a to be linked statically, 
while not forcing anything else to link statically?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (3 preceding siblings ...)
  2012-11-16 20:28 ` konstantin.s.serebryany at gmail dot com
@ 2012-11-16 20:47 ` jakub at gcc dot gnu.org
  2012-11-16 20:54 ` konstantin.s.serebryany at gmail dot com
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-11-16 20:47 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-16 20:46:35 UTC ---
For TLS, you can just use -ftls-model=initial-exec or __attribute__((tls_model
("initial-exec"))).  libasan from what I can see doesn't use TLS at all, and
you really can't just have -fno-pic libasan.a only anyway, any time you want to
instrument a shared library, if you linked non-pic libasan.a into it, it
wouldn't link on many architectures and on others would fail SELinux
restrictions.
If IE TLS mode is used, there is no problem if an app is linked against it or
if libraries it depends on are linked against it, there could be a problem if
its TLS usage is too large and app isn't linked against the library, and you
only dlopen some -fsanitize=address compiled/linked shared library.  Then
dlopen could fail (unless you e.g. LD_PRELOAD=libasan.so.0 or otherwise make
sure the app is linked against it).  Other GCC shared libraries (e.g.
libgomp.so.1) are also using the IE model.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (4 preceding siblings ...)
  2012-11-16 20:47 ` jakub at gcc dot gnu.org
@ 2012-11-16 20:54 ` konstantin.s.serebryany at gmail dot com
  2012-11-17 20:36 ` hjl.tools at gmail dot com
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: konstantin.s.serebryany at gmail dot com @ 2012-11-16 20:54 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #6 from Konstantin Serebryany <konstantin.s.serebryany at gmail dot com> 2012-11-16 20:54:40 UTC ---
Answering my own question: we can get static linking with 
 -Wl,-Bstatic -lasan -Wl,-Bdynamic -ldl -lpthread 

>> For TLS, you can just use -ftls-model=initial-exec 
I did not know, thanks. 
Sounds like this could be a good solution for tsan. Will check.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (5 preceding siblings ...)
  2012-11-16 20:54 ` konstantin.s.serebryany at gmail dot com
@ 2012-11-17 20:36 ` hjl.tools at gmail dot com
  2012-11-17 21:08 ` markus at trippelsdorf dot de
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: hjl.tools at gmail dot com @ 2012-11-17 20:36 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #7 from H.J. Lu <hjl.tools at gmail dot com> 2012-11-17 20:35:57 UTC ---
(In reply to comment #6)
> Answering my own question: we can get static linking with 
>  -Wl,-Bstatic -lasan -Wl,-Bdynamic -ldl -lpthread 
> 

The -static-libasan option was added by

http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00536.html


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (6 preceding siblings ...)
  2012-11-17 20:36 ` hjl.tools at gmail dot com
@ 2012-11-17 21:08 ` markus at trippelsdorf dot de
  2012-11-18 19:36 ` konstantin.s.serebryany at gmail dot com
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: markus at trippelsdorf dot de @ 2012-11-17 21:08 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

Markus Trippelsdorf <markus at trippelsdorf dot de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |markus at trippelsdorf dot
                   |                            |de

--- Comment #8 from Markus Trippelsdorf <markus at trippelsdorf dot de> 2012-11-17 21:08:21 UTC ---
(In reply to comment #7)
> (In reply to comment #6)
> > Answering my own question: we can get static linking with 
> >  -Wl,-Bstatic -lasan -Wl,-Bdynamic -ldl -lpthread 
> > 
> 
> The -static-libasan option was added by
> 
> http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00536.html

Can you please add "-ldl -lpthread" to it?
Otherwise the user has to set this by hand:

markus@x4 ~ % g++ -faddress-sanitizer -static-libasan -g test.cpp
/home/markus/gcc/libsanitizer/sanitizer_common/sanitizer_linux.cc:135: error:
undefined reference to 'pthread_getattr_np'
/home/markus/gcc/libsanitizer/sanitizer_common/sanitizer_linux.cc:138: error:
undefined reference to 'pthread_attr_getstack'
/home/markus/gcc/libsanitizer/asan/asan_posix.cc:103: error: undefined
reference to 'pthread_key_create'
/home/markus/gcc/libsanitizer/asan/asan_posix.cc:108: error: undefined
reference to 'pthread_getspecific'
/home/markus/gcc/libsanitizer/asan/asan_posix.cc:113: error: undefined
reference to 'pthread_setspecific'
/home/markus/gcc/libsanitizer/interception/interception_linux.cc:22: error:
undefined reference to 'dlsym'
collect2: error: ld returned 1 exit status


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (7 preceding siblings ...)
  2012-11-17 21:08 ` markus at trippelsdorf dot de
@ 2012-11-18 19:36 ` konstantin.s.serebryany at gmail dot com
  2012-11-18 19:54 ` jakub at gcc dot gnu.org
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: konstantin.s.serebryany at gmail dot com @ 2012-11-18 19:36 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #9 from Konstantin Serebryany <konstantin.s.serebryany at gmail dot com> 2012-11-18 19:35:43 UTC ---
As dvyuokv@ pointed out, 
-ftls-model=initial-exec improves the situation, but does not fully help. 

Experiment: 

% cat x.c 
__thread int a;
int foo() {
  return a;
}


HORRIBLE: -fPIC -shared
% gcc x.c -O2 -fPIC -shared -o x.so  ; objdump -d x.so  | grep foo.: -A 5 
00000000000006e0 <foo>:
 6e0:   66 48 8d 3d f0 08 20    lea    0x2008f0(%rip),%rdi        # 200fd8
<_DYNAMIC+0x1b8>
 6e7:   00 
 6e8:   66 66 48 e8 10 ff ff    callq  600 <__tls_get_addr@plt>
 6ef:   ff 
 6f0:   8b 00                   mov    (%rax),%eax


NOT-SO-BAD: -fPIC -shared  -ftls-model=initial-exec
% gcc x.c -O2 -fPIC -shared -o x.so  -ftls-model=initial-exec ; objdump -d x.so
 | grep foo.: -A 5 
0000000000000630 <foo>:
 630:   48 8b 05 a9 09 20 00    mov    0x2009a9(%rip),%rax        # 200fe0
<_DYNAMIC+0x1b8>
 637:   64 8b 00                mov    %fs:(%rax),%eax
 63a:   c3                      retq   


GOOD: -fPIE 
% gcc -c x.c -O2 -fPIE -o x.o  ; objdump -d x.o  | grep foo.: -A 5 
0000000000000000 <foo>:
   0:   64 8b 04 25 00 00 00    mov    %fs:0x0,%eax
   7:   00 
   8:   c3                      retq   


So, while -ftls-model=initial-exec improves the TLS performance, it is still 
2x slower than -fPIE. 

For tsan, which does this for *every* memory access in the original program, 
this will cost 5%-10% slowdown. 

For our users this is a big deal, so they will link the static library whenever
possible. Which default is used in gcc -- I don't care that much.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (8 preceding siblings ...)
  2012-11-18 19:36 ` konstantin.s.serebryany at gmail dot com
@ 2012-11-18 19:54 ` jakub at gcc dot gnu.org
  2012-11-18 19:59 ` konstantin.s.serebryany at gmail dot com
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-11-18 19:54 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-18 19:54:37 UTC ---
(In reply to comment #9)
> NOT-SO-BAD: -fPIC -shared  -ftls-model=initial-exec
> % gcc x.c -O2 -fPIC -shared -o x.so  -ftls-model=initial-exec ; objdump -d x.so
>  | grep foo.: -A 5 
> 0000000000000630 <foo>:
>  630:   48 8b 05 a9 09 20 00    mov    0x2009a9(%rip),%rax        # 200fe0
> <_DYNAMIC+0x1b8>
>  637:   64 8b 00                mov    %fs:(%rax),%eax
>  63a:   c3                      retq   
> 
> 
> GOOD: -fPIE 
> % gcc -c x.c -O2 -fPIE -o x.o  ; objdump -d x.o  | grep foo.: -A 5 
> 0000000000000000 <foo>:
>    0:   64 8b 04 25 00 00 00    mov    %fs:0x0,%eax
>    7:   00 
>    8:   c3                      retq   
> 
> 
> So, while -ftls-model=initial-exec improves the TLS performance, it is still 
> 2x slower than -fPIE. 

Except obviously you can't use the last code sequence if you want to link it
into a shared library.  The extra indirection is the standard cost of
relocatable code, especially if there are just a few TLS vars in libtsan and
they are accessed a lot, that memory (the .got section entry) is in caches most
likely and so the indirection can be just a cycle or at most a few of them.
No idea how would you plan to compile libtsan with -fPIE flag, for libtsan.so.0
you obviously can't, it would fail to link or load, and for libtsan.a it would
make the shared library
only usable in executables or PIEs, not from shared libraries.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (9 preceding siblings ...)
  2012-11-18 19:54 ` jakub at gcc dot gnu.org
@ 2012-11-18 19:59 ` konstantin.s.serebryany at gmail dot com
  2012-11-18 20:10 ` jakub at gcc dot gnu.org
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: konstantin.s.serebryany at gmail dot com @ 2012-11-18 19:59 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #11 from Konstantin Serebryany <konstantin.s.serebryany at gmail dot com> 2012-11-18 19:59:42 UTC ---
The above comment is correct. 
-fPIE is only applicable if we build libtsan.a and link it statically to the
pie executable. 
This mode however, works pretty well and most our users pay this price for
performance. 

On every memory access tsan touches a few (two or three) extra cache lines. 
Not using -fPIE makes it touch one extra cache line. 
Even if that line is in L1, it still has a non-zero cost. 

We will try to provide benchmark numbers next week.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (10 preceding siblings ...)
  2012-11-18 19:59 ` konstantin.s.serebryany at gmail dot com
@ 2012-11-18 20:10 ` jakub at gcc dot gnu.org
  2012-11-19  4:13 ` konstantin.s.serebryany at gmail dot com
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-11-18 20:10 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-18 20:09:39 UTC ---
That would effectively require building libtsan as libtsan.so.0, libtsan.a
(both -fPIC built) and libtsan_pie.a (-fPIE built), where the gcc driver would
do:
%{static-libtsan:!shared:-ltsan_pie;:-ltsan} or so.  Or there could be even
libtsan_nonpic.a for !shared:!pie, of course everything would need to be done
only given appropriate benchmarks of real-world programs.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (11 preceding siblings ...)
  2012-11-18 20:10 ` jakub at gcc dot gnu.org
@ 2012-11-19  4:13 ` konstantin.s.serebryany at gmail dot com
  2012-11-19  8:55 ` jakub at gcc dot gnu.org
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: konstantin.s.serebryany at gmail dot com @ 2012-11-19  4:13 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #13 from Konstantin Serebryany <konstantin.s.serebryany at gmail dot com> 2012-11-19 04:13:23 UTC ---
>> of course everything would need to be done only given appropriate benchmarks of real-world programs.

We have a synthetic benchmark which perfectly reflects the only major hot spot
in tsan: the set of functions __tsan_{read,write}{1,2,4,8} that are called on
every memory access.


When building libtsan as a shared library (for which I had to hack our assembly
blobs a bit) we get two sources of slowdown: 
  1. __tsan_read8 and friends are called through PLT
  2. __tsan_read8 and friends use one extra load to get to TLS

The result is > 10% slowdown.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (12 preceding siblings ...)
  2012-11-19  4:13 ` konstantin.s.serebryany at gmail dot com
@ 2012-11-19  8:55 ` jakub at gcc dot gnu.org
  2012-11-19  9:03 ` konstantin.s.serebryany at gmail dot com
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-11-19  8:55 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-19 08:54:47 UTC ---
I bet 9.5% or more of that is due to the PLT call.  The thing is, even when you
have initial-exec TLS model code, if you link it into an executable and the
referenced TLS variable is in the executable, the linker TLS transitions
optimization changes the IE model into LE model, so instead of something like:
  mov    0x2009a9(%rip),%rax
  mov    %fs:(%rax),%eax
you'll end up with
  mov    $-4,%rax
  mov    %fs:(%rax),%eax
or so (compared to
  mov    %fs:-4,%eax
if it was local-exec model from the beginning).  Given the amount of code in
__tsan_read8, I seriously doubt it is noticeable.  So, please compare libtsan
built with -fPIC -ftls-model=initial-exec with libtsan built with -fPIE
(-ftls-model=local-exec).  The former will not require any special hacks and
will work just fine even with shared libraries, the latter won't.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (13 preceding siblings ...)
  2012-11-19  8:55 ` jakub at gcc dot gnu.org
@ 2012-11-19  9:03 ` konstantin.s.serebryany at gmail dot com
  2012-11-19  9:06 ` konstantin.s.serebryany at gmail dot com
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: konstantin.s.serebryany at gmail dot com @ 2012-11-19  9:03 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #15 from Konstantin Serebryany <konstantin.s.serebryany at gmail dot com> 2012-11-19 09:03:35 UTC ---
You are right that "-fPIC -ftls-model=initial-exec" does not affect performance 
if we link libtsan statically (I checked). 
As you say, the linker nukes one of the loads. 

But if we link libtsan.so dynamically, we still get both sources of overhead.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (14 preceding siblings ...)
  2012-11-19  9:03 ` konstantin.s.serebryany at gmail dot com
@ 2012-11-19  9:06 ` konstantin.s.serebryany at gmail dot com
  2012-11-19 10:53 ` dvyukov at google dot com
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: konstantin.s.serebryany at gmail dot com @ 2012-11-19  9:06 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #16 from Konstantin Serebryany <konstantin.s.serebryany at gmail dot com> 2012-11-19 09:06:26 UTC ---
So, using "-fPIC -ftls-model=initial-exec" is a great idea, it will allow 
to build the files once and have both static and dynamic library.
But we need to agree that the dynamic library is noticeably slower.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (15 preceding siblings ...)
  2012-11-19  9:06 ` konstantin.s.serebryany at gmail dot com
@ 2012-11-19 10:53 ` dvyukov at google dot com
  2012-11-21  7:45 ` dvyukov at google dot com
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: dvyukov at google dot com @ 2012-11-19 10:53 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #17 from Dmitry Vyukov <dvyukov at google dot com> 2012-11-19 10:53:04 UTC ---
>When building libtsan as a shared library (for which I had to hack our assembly
>blobs a bit) we get two sources of slowdown: 
>  1. __tsan_read8 and friends are called through PLT
>  2. __tsan_read8 and friends use one extra load to get to TLS

> I bet 9.5% or more of that is due to the PLT call.

That's not the overhead you are looking for, Luke.

We currently compile with -fPIC and link statically, linker inserts only 1
memory dereference in this case. However, -fPIC affects code generation in
compiler, it has to reserve more registers for tls access code and has to
allocate stack frame because of the potential call. Only that causes *20%*
slowdown on a real application (not a synthetic benchmark).

Kostya, to evaluate initial-exec you need to insure that code characteristics
of __tsan_read/write are not affected, i.e. 0 stack spills and analyze script
passes. Everything else we have w/o initial-exec.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (16 preceding siblings ...)
  2012-11-19 10:53 ` dvyukov at google dot com
@ 2012-11-21  7:45 ` dvyukov at google dot com
  2012-11-21  8:43 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: dvyukov at google dot com @ 2012-11-21  7:45 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #18 from Dmitry Vyukov <dvyukov at google dot com> 2012-11-21 07:45:20 UTC ---
(In reply to comment #17)
> >When building libtsan as a shared library (for which I had to hack our assembly
> >blobs a bit) we get two sources of slowdown: 
> >  1. __tsan_read8 and friends are called through PLT
> >  2. __tsan_read8 and friends use one extra load to get to TLS
> 
> > I bet 9.5% or more of that is due to the PLT call.
> 
> That's not the overhead you are looking for, Luke.
> 
> We currently compile with -fPIC and link statically, linker inserts only 1
> memory dereference in this case. However, -fPIC affects code generation in
> compiler, it has to reserve more registers for tls access code and has to
> allocate stack frame because of the potential call. Only that causes *20%*
> slowdown on a real application (not a synthetic benchmark).
> 
> Kostya, to evaluate initial-exec you need to insure that code characteristics
> of __tsan_read/write are not affected, i.e. 0 stack spills and analyze script
> passes. Everything else we have w/o initial-exec.

For actual ThreadSanitizer runtime -fPIC -ftls-model=initial-exec causes
degradation of generated code. Linker emits the same tls access code in all
cases, but the compiler generates worse code.
The table below show various stats about generated code for the hot functions.
We are mostly interested in stack access instructions (rsp/push/pop):

gcc -fPIE
    write1 tot 325; size 1316; rsp 1; push 0; pop 0; call 2; load 16; store  5;
sh  52; mov  68; lea   6; cmp  46
    write2 tot 326; size 1340; rsp 1; push 0; pop 0; call 2; load 16; store  5;
sh  52; mov  68; lea   6; cmp  46
    write4 tot 325; size 1316; rsp 1; push 0; pop 0; call 2; load 16; store  5;
sh  52; mov  68; lea   6; cmp  46
    write8 tot 325; size 1316; rsp 1; push 0; pop 0; call 2; load 16; store  5;
sh  52; mov  68; lea   6; cmp  46
     read1 tot 355; size 1476; rsp 1; push 0; pop 0; call 2; load 17; store  5;
sh  52; mov  71; lea   6; cmp  52
     read2 tot 344; size 1428; rsp 1; push 0; pop 0; call 2; load 16; store  5;
sh  52; mov  71; lea   6; cmp  51
     read4 tot 344; size 1436; rsp 1; push 0; pop 0; call 2; load 16; store  5;
sh  52; mov  71; lea   6; cmp  51
     read8 tot 344; size 1436; rsp 1; push 0; pop 0; call 2; load 16; store  5;
sh  52; mov  71; lea   6; cmp  51
func_entry tot  28; size  116; rsp 0; push 0; pop 0; call 1; load  3; store  1;
sh   2; mov   8; lea   0; cmp   1
 func_exit tot  25; size  100; rsp 0; push 0; pop 0; call 1; load  1; store  0;
sh   2; mov   5; lea   0; cmp   1

gcc -fPIC -ftls-model=initial-exec
    write1 tot 323; size 1268; rsp 1; push 1; pop 5; call 2; load 17; store  7;
sh  52; mov  64; lea   6; cmp  46
    write2 tot 321; size 1275; rsp 1; push 1; pop 5; call 2; load 17; store  7;
sh  52; mov  64; lea   6; cmp  46
    write4 tot 323; size 1268; rsp 1; push 1; pop 5; call 2; load 17; store  7;
sh  52; mov  64; lea   6; cmp  46
    write8 tot 323; size 1268; rsp 1; push 1; pop 5; call 2; load 17; store  7;
sh  52; mov  64; lea   6; cmp  46
     read1 tot 342; size 1380; rsp 1; push 1; pop 4; call 2; load 18; store  7;
sh  52; mov  67; lea   6; cmp  52
     read2 tot 331; size 1331; rsp 1; push 1; pop 3; call 2; load 17; store  7;
sh  52; mov  67; lea   6; cmp  51
     read4 tot 334; size 1356; rsp 1; push 1; pop 3; call 2; load 17; store  7;
sh  52; mov  67; lea   6; cmp  51
     read8 tot 334; size 1356; rsp 1; push 1; pop 3; call 2; load 17; store  7;
sh  52; mov  67; lea   6; cmp  51
func_entry tot   7; size   24; rsp 0; push 0; pop 0; call 0; load  0; store  1;
sh   0; mov   2; lea   0; cmp   0
 func_exit tot   6; size   21; rsp 0; push 0; pop 0; call 0; load  0; store  1;
sh   0; mov   1; lea   0; cmp   0

gcc -fPIC
    write1 tot 379; size 1571; rsp 23; push 0; pop 0; call 2; load 25; store
20; sh  52; mov 100; lea   6; cmp  42
    write2 tot 383; size 1603; rsp 23; push 0; pop 0; call 2; load 25; store
20; sh  52; mov 100; lea   6; cmp  42
    write4 tot 379; size 1571; rsp 23; push 0; pop 0; call 2; load 25; store
20; sh  52; mov 100; lea   6; cmp  42
    write8 tot 379; size 1571; rsp 23; push 0; pop 0; call 2; load 25; store
20; sh  52; mov 100; lea   6; cmp  42
     read1 tot 402; size 1715; rsp 23; push 0; pop 0; call 2; load 26; store
20; sh  52; mov 103; lea   6; cmp  48
     read2 tot 393; size 1659; rsp 23; push 0; pop 0; call 2; load 25; store
20; sh  52; mov 103; lea   6; cmp  47
     read4 tot 391; size 1659; rsp 23; push 0; pop 0; call 2; load 25; store
20; sh  52; mov 103; lea   6; cmp  47
     read8 tot 391; size 1659; rsp 23; push 0; pop 0; call 2; load 25; store
20; sh  52; mov 103; lea   6; cmp  47
func_entry tot   9; size   32; rsp 0; push 1; pop 1; call 0; load  0; store  0;
sh   0; mov   4; lea   0; cmp   0
 func_exit tot   7; size   32; rsp 0; push 0; pop 0; call 0; load  0; store  0;
sh   0; mov   2; lea   0; cmp   0

clang -fPIE
    write1 tot 326; size 1194; rsp 1; push 1; pop 5; call 2; load 16; store  5;
sh  37; mov  77; lea   4; cmp  42
    write2 tot 330; size 1210; rsp 1; push 1; pop 5; call 2; load 16; store  5;
sh  37; mov  77; lea   4; cmp  46
    write4 tot 330; size 1210; rsp 1; push 1; pop 5; call 2; load 16; store  5;
sh  37; mov  77; lea   4; cmp  46
    write8 tot 330; size 1210; rsp 1; push 1; pop 5; call 2; load 16; store  5;
sh  37; mov  77; lea   4; cmp  46
     read1 tot 350; size 1304; rsp 1; push 0; pop 0; call 2; load 17; store  5;
sh  38; mov  85; lea   3; cmp  47
     read2 tot 355; size 1311; rsp 1; push 1; pop 5; call 2; load 16; store  5;
sh  37; mov  83; lea   4; cmp  50
     read4 tot 355; size 1311; rsp 1; push 1; pop 5; call 2; load 16; store  5;
sh  37; mov  83; lea   4; cmp  50
     read8 tot 355; size 1311; rsp 1; push 1; pop 5; call 2; load 16; store  5;
sh  37; mov  83; lea   4; cmp  50
func_entry tot  26; size  108; rsp 0; push 0; pop 0; call 1; load  2; store  0;
sh   1; mov   7; lea   0; cmp   1
 func_exit tot  21; size   93; rsp 0; push 0; pop 0; call 1; load  1; store  0;
sh   1; mov   4; lea   0; cmp   1

clang -fPIC -ftls-model=initial-exec
    write1 tot 315; size 1173; rsp 1; push 2; pop 2; call 2; load 20; store  6;
sh  37; mov  78; lea   4; cmp  42
    write2 tot 319; size 1189; rsp 1; push 2; pop 2; call 2; load 20; store  6;
sh  37; mov  78; lea   4; cmp  46
    write4 tot 319; size 1189; rsp 1; push 2; pop 2; call 2; load 20; store  6;
sh  37; mov  78; lea   4; cmp  46
    write8 tot 319; size 1189; rsp 1; push 2; pop 2; call 2; load 20; store  6;
sh  37; mov  78; lea   4; cmp  46
     read1 tot 347; size 1271; rsp 1; push 1; pop 5; call 2; load 21; store  6;
sh  38; mov  86; lea   3; cmp  47
     read2 tot 345; size 1268; rsp 1; push 2; pop 2; call 2; load 20; store  6;
sh  37; mov  84; lea   4; cmp  50
     read4 tot 345; size 1268; rsp 1; push 2; pop 2; call 2; load 20; store  6;
sh  37; mov  84; lea   4; cmp  50
     read8 tot 345; size 1268; rsp 1; push 2; pop 2; call 2; load 20; store  6;
sh  37; mov  84; lea   4; cmp  50
func_entry tot  21; size   96; rsp 0; push 0; pop 0; call 1; load  4; store  1;
sh   1; mov   8; lea   0; cmp   1
 func_exit tot  19; size   82; rsp 0; push 0; pop 0; call 1; load  2; store  0;
sh   1; mov   5; lea   0; cmp   1

clang -fPIC
    write1 tot 409; size 1625; rsp 17; push 6; pop 6; call 2; load 17; store
25; sh  37; mov 131; lea  20; cmp  42
    write2 tot 413; size 1641; rsp 17; push 6; pop 6; call 2; load 17; store
25; sh  37; mov 131; lea  20; cmp  46
    write4 tot 413; size 1641; rsp 17; push 6; pop 6; call 2; load 17; store
25; sh  37; mov 131; lea  20; cmp  46
    write8 tot 413; size 1641; rsp 17; push 6; pop 6; call 2; load 17; store
25; sh  37; mov 131; lea  20; cmp  46
     read1 tot 423; size 1654; rsp 15; push 6; pop 6; call 2; load 13; store
26; sh  38; mov 123; lea  19; cmp  47
     read2 tot 446; size 1742; rsp 19; push 6; pop 6; call 2; load 19; store
27; sh  37; mov 140; lea  20; cmp  50
     read4 tot 446; size 1742; rsp 19; push 6; pop 6; call 2; load 19; store
27; sh  37; mov 140; lea  20; cmp  50
     read8 tot 446; size 1742; rsp 19; push 6; pop 6; call 2; load 19; store
27; sh  37; mov 140; lea  20; cmp  50
func_entry tot  36; size  129; rsp 0; push 3; pop 3; call 1; load  4; store  1;
sh   1; mov  10; lea   2; cmp   1
 func_exit tot  30; size  114; rsp 0; push 3; pop 2; call 1; load  2; store  0;
sh   1; mov   6; lea   2; cmp   1

clang -fPIE project-stream
    write1 tot 293; size 1071; rsp 1; push 0; pop 0; call 2; load 17; store  5;
sh  33; mov  72; lea   1; cmp  38
    write2 tot 297; size 1087; rsp 1; push 0; pop 0; call 2; load 17; store  5;
sh  33; mov  72; lea   1; cmp  42
    write4 tot 297; size 1087; rsp 1; push 0; pop 0; call 2; load 17; store  5;
sh  33; mov  72; lea   1; cmp  42
    write8 tot 297; size 1087; rsp 1; push 0; pop 0; call 2; load 17; store  5;
sh  33; mov  72; lea   1; cmp  42
     read1 tot 334; size 1252; rsp 1; push 0; pop 0; call 2; load 17; store  5;
sh  33; mov  78; lea   1; cmp  38
     read2 tot 341; size 1283; rsp 1; push 0; pop 0; call 2; load 17; store  5;
sh  33; mov  79; lea   1; cmp  42
     read4 tot 341; size 1283; rsp 1; push 0; pop 0; call 2; load 17; store  5;
sh  33; mov  79; lea   1; cmp  42
     read8 tot 341; size 1283; rsp 1; push 0; pop 0; call 2; load 17; store  5;
sh  33; mov  79; lea   1; cmp  42
func_entry tot  26; size  102; rsp 0; push 0; pop 0; call 1; load  2; store  0;
sh   0; mov   7; lea   0; cmp   1
 func_exit tot  21; size   87; rsp 0; push 0; pop 0; call 1; load  1; store  0;
sh   0; mov   4; lea   0; cmp   1

clang -fPIC -ftls-model=initial-exec project-stream
    write1 tot 293; size 1051; rsp 1; push 1; pop 5; call 2; load 21; store  6;
sh  33; mov  73; lea   1; cmp  38
    write2 tot 297; size 1067; rsp 1; push 1; pop 5; call 2; load 21; store  6;
sh  33; mov  73; lea   1; cmp  42
    write4 tot 297; size 1067; rsp 1; push 1; pop 5; call 2; load 21; store  6;
sh  33; mov  73; lea   1; cmp  42
    write8 tot 297; size 1067; rsp 1; push 1; pop 5; call 2; load 21; store  6;
sh  33; mov  73; lea   1; cmp  42
     read1 tot 327; size 1231; rsp 1; push 0; pop 0; call 2; load 21; store  6;
sh  33; mov  79; lea   1; cmp  38
     read2 tot 340; size 1263; rsp 1; push 1; pop 5; call 2; load 21; store  6;
sh  33; mov  80; lea   1; cmp  42
     read4 tot 340; size 1263; rsp 1; push 1; pop 5; call 2; load 21; store  6;
sh  33; mov  80; lea   1; cmp  42
     read8 tot 340; size 1263; rsp 1; push 1; pop 5; call 2; load 21; store  6;
sh  33; mov  80; lea   1; cmp  42
func_entry tot  21; size   90; rsp 0; push 0; pop 0; call 1; load  4; store  2;
sh   0; mov   8; lea   0; cmp   1
 func_exit tot  17; size   76; rsp 0; push 0; pop 0; call 1; load  2; store  1;
sh   0; mov   5; lea   0; cmp   1

clang -fPIC project-stream
    write1 tot 341; size 1289; rsp 7; push 6; pop 6; call 2; load 22; store  7;
sh  33; mov  91; lea  12; cmp  38
    write2 tot 345; size 1305; rsp 7; push 6; pop 6; call 2; load 22; store  7;
sh  33; mov  91; lea  12; cmp  42
    write4 tot 345; size 1305; rsp 7; push 6; pop 6; call 2; load 22; store  7;
sh  33; mov  91; lea  12; cmp  42
    write8 tot 345; size 1305; rsp 7; push 6; pop 6; call 2; load 22; store  7;
sh  33; mov  91; lea  12; cmp  42
     read1 tot 379; size 1489; rsp 7; push 6; pop 6; call 2; load 21; store  6;
sh  33; mov  93; lea  12; cmp  38
     read2 tot 388; size 1533; rsp 9; push 6; pop 6; call 2; load 22; store  7;
sh  33; mov  97; lea  12; cmp  42
     read4 tot 388; size 1533; rsp 9; push 6; pop 6; call 2; load 22; store  7;
sh  33; mov  97; lea  12; cmp  42
     read8 tot 388; size 1533; rsp 9; push 6; pop 6; call 2; load 22; store  7;
sh  33; mov  97; lea  12; cmp  42
func_entry tot  33; size  123; rsp 0; push 3; pop 3; call 1; load  4; store  2;
sh   0; mov  10; lea   2; cmp   1
 func_exit tot  28; size  108; rsp 0; push 3; pop 2; call 1; load  2; store  1;
sh   0; mov   6; lea   2; cmp   1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (17 preceding siblings ...)
  2012-11-21  7:45 ` dvyukov at google dot com
@ 2012-11-21  8:43 ` jakub at gcc dot gnu.org
  2012-11-21  9:06 ` dvyukov at google dot com
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-11-21  8:43 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #19 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-21 08:40:52 UTC ---
(In reply to comment #18)
> For actual ThreadSanitizer runtime -fPIC -ftls-model=initial-exec causes
> degradation of generated code. Linker emits the same tls access code in all
> cases, but the compiler generates worse code.

-fPIC -ftls-model=initial-exec is by definition almost equivalent to -fPIE, the
only exceptions are:
1) -fPIE code is allowed to assume globally visible symbols aren't interposed
2) if TLS vars are defined locally (or hidden visibility), then local-exec
   model can be used instead of initial-exec (one less dereference)

As for 2), I've explained already that by linking -fPIC code into the
executable if the TLS var is defined in the executable, linker TLS transition
transform all other TLS models (even global and local dynamic) into local-exec,
just might result in some nops or for IE->LE setting of a register to an
immediate and using that register as opposed to just using the immediate in the
%fs: prefixed insn.

And for 1), for the fast path, for any symbols on the fast path that shouldn't
be interposeable and that are defined in libtsan, you should be able to just
use visibility attributes and get the same effect.

-fPIE flag simply isn't usable for a library that is to be used also by shared
libraries.  How do you link -fsanitize=thread shared libraries anyway?  Just
don't link libtsan in for -static-libtsan, and rely on the executable being
linked against it?  Such libraries will fail to link with -Wl,-z,defs ...
Of course, having multiple tsan TLS roots in the same process isn't a good idea
either (which is why I think we can't default to -static-libtsan).


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (18 preceding siblings ...)
  2012-11-21  8:43 ` jakub at gcc dot gnu.org
@ 2012-11-21  9:06 ` dvyukov at google dot com
  2012-11-21  9:26 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: dvyukov at google dot com @ 2012-11-21  9:06 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #20 from Dmitry Vyukov <dvyukov at google dot com> 2012-11-21 09:04:07 UTC ---
(In reply to comment #19)
> (In reply to comment #18)
> > For actual ThreadSanitizer runtime -fPIC -ftls-model=initial-exec causes
> > degradation of generated code. Linker emits the same tls access code in all
> > cases, but the compiler generates worse code.
> 
> -fPIC -ftls-model=initial-exec is by definition almost equivalent to -fPIE, the
> only exceptions are:
> 1) -fPIE code is allowed to assume globally visible symbols aren't interposed
> 2) if TLS vars are defined locally (or hidden visibility), then local-exec
>    model can be used instead of initial-exec (one less dereference)


What I see is that it also affect code generation (register allocation). Do we
need to file a bug on that?


> As for 2), I've explained already that by linking -fPIC code into the
> executable if the TLS var is defined in the executable, linker TLS transition
> transform all other TLS models (even global and local dynamic) into local-exec,
> just might result in some nops or for IE->LE setting of a register to an
> immediate and using that register as opposed to just using the immediate in the
> %fs: prefixed insn.
> 
> And for 1), for the fast path, for any symbols on the fast path that shouldn't
> be interposeable and that are defined in libtsan, you should be able to just
> use visibility attributes and get the same effect.
> 
> -fPIE flag simply isn't usable for a library that is to be used also by shared
> libraries.  How do you link -fsanitize=thread shared libraries anyway?  Just
> don't link libtsan in for -static-libtsan, and rely on the executable being
> linked against it?


Yes, we rely on the library being linked into the executable, because we want
the runtime be linked statically.
For dynamic libraries that are loaded into a non-instrumented executable (e.g.
swig so preloaded into python process), we statically link the tsan runtime
into the so.



> Such libraries will fail to link with -Wl,-z,defs ...
> Of course, having multiple tsan TLS roots in the same process isn't a good idea
> either (which is why I think we can't default to -static-libtsan).


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (19 preceding siblings ...)
  2012-11-21  9:06 ` dvyukov at google dot com
@ 2012-11-21  9:26 ` jakub at gcc dot gnu.org
  2012-11-23  7:17 ` [Bug sanitizer/55354] " dvyukov at google dot com
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-11-21  9:26 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #21 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-21 09:23:56 UTC ---
(In reply to comment #20)
> What I see is that it also affect code generation (register allocation). Do we
> need to file a bug on that?

If you see a code generation difference even with -ftls-model=local-exec -fPIC
vs. -fPIE, then it must mean you don't have visibility attributes on the
symbols used in the fast path.  For initial-exec, the RA effects should be
minimal, the TLS offset load from got is usually very close to the actual TLS
memory load (or lea), and thus it will just pick up some short lived scratch
register.  Generally in GCC, -fPIE sets flag_pic and not flag_shlib, while
-fPIC sets flag_pic and flag_shlib.  flag_pic is about whether position
independent code needs to be generated, flag_shlib is about whether locally
defined symbols can be interposed (plus it affects TLS model default choice).

> For dynamic libraries that are loaded into a non-instrumented executable (e.g.
> swig so preloaded into python process), we statically link the tsan runtime
> into the so.

And you don't get linker errors from that?  That must be by pure luck.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug sanitizer/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (20 preceding siblings ...)
  2012-11-21  9:26 ` jakub at gcc dot gnu.org
@ 2012-11-23  7:17 ` dvyukov at google dot com
  2012-11-23  7:28 ` dvyukov at google dot com
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: dvyukov at google dot com @ 2012-11-23  7:17 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #22 from Dmitry Vyukov <dvyukov at google dot com> 2012-11-23 07:16:14 UTC ---
> > For dynamic libraries that are loaded into a non-instrumented executable (e.g.
> > swig so preloaded into python process), we statically link the tsan runtime
> > into the so.
> 
> And you don't get linker errors from that?  That must be by pure luck.

Why must I get errors in this case?


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug sanitizer/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (21 preceding siblings ...)
  2012-11-23  7:17 ` [Bug sanitizer/55354] " dvyukov at google dot com
@ 2012-11-23  7:28 ` dvyukov at google dot com
  2012-11-23  8:14 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: dvyukov at google dot com @ 2012-11-23  7:28 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #23 from Dmitry Vyukov <dvyukov at google dot com> 2012-11-23 07:27:27 UTC ---
(In reply to comment #21)
> (In reply to comment #20)
> > What I see is that it also affect code generation (register allocation). Do we
> > need to file a bug on that?
> 
> If you see a code generation difference even with -ftls-model=local-exec -fPIC
> vs. -fPIE, then it must mean you don't have visibility attributes on the
> symbols used in the fast path.  For initial-exec, the RA effects should be
> minimal, the TLS offset load from got is usually very close to the actual TLS
> memory load (or lea), and thus it will just pick up some short lived scratch
> register.  Generally in GCC, -fPIE sets flag_pic and not flag_shlib, while
> -fPIC sets flag_pic and flag_shlib.  flag_pic is about whether position
> independent code needs to be generated, flag_shlib is about whether locally
> defined symbols can be interposed (plus it affects TLS model default choice).

When I compile with -fvisibility=hidden, it does not affect generated code.
It's not that we access a lot of symbols in the function, there is one
thread-local and one static global var.

That "minimal" RA effects do have effect in our case. We don't have a reserve
to squeeze another register for tls access:

// -fPIE
000000000009ca30 <__tsan_write2>:
   9ca30:       64 48 8b 04 25 40 1f    mov    %fs:0xffffffffffeb1f40,%rax
   9ca37:       eb ff 
   9ca39:       48 8b 0c 24             mov    (%rsp),%rcx
   9ca3d:       a8 01                   test   $0x1,%al
   9ca3f:       0f 85 d3 00 00 00       jne    9cb18 <__tsan_write2+0xe8>
   9ca45:       48 83 e8 80             sub    $0xffffffffffffff80,%rax
   9ca49:       48 89 fe                mov    %rdi,%rsi
   9ca4c:       48 89 c2                mov    %rax,%rdx
   9ca4f:       64 48 89 04 25 40 1f    mov    %rax,%fs:0xffffffffffeb1f40
   9ca56:       eb ff 

// -fPIC -ftls-model=initial-exec
00000000000969f0 <__tsan_write2>:
   969f0:       48 c7 c2 40 1f eb ff    mov    $0xffffffffffeb1f40,%rdx
   969f7:       53                      push   %rbx
   969f8:       48 8b 4c 24 08          mov    0x8(%rsp),%rcx
   969fd:       64 48 8b 02             mov    %fs:(%rdx),%rax
   96a01:       a8 01                   test   $0x1,%al
   96a03:       0f 85 c7 00 00 00       jne    96ad0 <__tsan_write2+0xe0>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug sanitizer/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (22 preceding siblings ...)
  2012-11-23  7:28 ` dvyukov at google dot com
@ 2012-11-23  8:14 ` jakub at gcc dot gnu.org
  2012-11-23  8:22 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-11-23  8:14 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #24 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-23 08:13:11 UTC ---
When I rebuild libtsan with -fPIE instead of -fPIC in the Makefile, and
g++ -shared -Wl,--whole-archive -o libtsanx.so libtsan.a -ldl -lpthread
(note that .libs/*.o are still built with -fPIC because libtool overrides those
flags and adds its -fPIC at the end), then this link fails with:
/usr/bin/ld: libtsan.a(tsan_rtl.o): relocation R_X86_64_TPOFF32 against
`_ZN6__tsan22cur_thread_placeholderE' can not be used when making a shared
object; recompile with -fPIC
libtsan.a(tsan_rtl.o): could not read symbols: Bad value
collect2: error: ld returned 1 exit status
(obviously, local-exec model would need to patch the insn at dynamic linking
time, making the library DT_TEXTREL (not acceptable for SELinux, not allowed
for x86_64 at all)).

Yes, I see the code generation differences, but the functions are huge anyway,
is it really so crucial that you'd want to make another libtsan_pie.a for it?

BTW, the tsan that was added to GCC yesterday doesn't have
-ftls-model=initial-exec even, so it is even slower.
-ftls-model=initial-exec can only be done for *-*-linux* targets btw, I don't
think other dynamic linkers support dlopening IE model shared libraries.
E.g. libgomp has in its configure.tgt
if test $gcc_cv_have_tls = yes ; then
  case "${target}" in

    *-*-linux*)
        XCFLAGS="${XCFLAGS} -ftls-model=initial-exec"
        ;;
  esac
fi

so libsanitizer/configure.tgt would need to add it to say TSAN_CXXFLAGS var
that would be substituted by configure.

Would be nice if you could check the numerous warnings from tsan build, e.g. it
seems the ALWAYS_INLINE macro doesn't include the inline keyword, and you are
using it on functions that don't have inline keyword, which gives a warning and
if it is inlined, it is by pure luck.  Either you should add inline keywords
manually, or put inline keyword into ALWAYS_INLINE macro.  There are other
warnings...


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug sanitizer/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (23 preceding siblings ...)
  2012-11-23  8:14 ` jakub at gcc dot gnu.org
@ 2012-11-23  8:22 ` jakub at gcc dot gnu.org
  2012-11-23  8:36 ` dvyukov at google dot com
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2012-11-23  8:22 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #25 from Jakub Jelinek <jakub at gcc dot gnu.org> 2012-11-23 08:20:40 UTC ---
Have you considered using __builtin_expect to help the compiler in the
performance sensitive code?  Or even better would be profile-feedback, build
libtsan once, run some benchmark, build libtsan again.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug sanitizer/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (24 preceding siblings ...)
  2012-11-23  8:22 ` jakub at gcc dot gnu.org
@ 2012-11-23  8:36 ` dvyukov at google dot com
  2012-11-23 10:47 ` konstantin.s.serebryany at gmail dot com
  2012-11-23 11:16 ` konstantin.s.serebryany at gmail dot com
  27 siblings, 0 replies; 29+ messages in thread
From: dvyukov at google dot com @ 2012-11-23  8:36 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #26 from Dmitry Vyukov <dvyukov at google dot com> 2012-11-23 08:34:52 UTC ---
(In reply to comment #25)
> Have you considered using __builtin_expect to help the compiler in the
> performance sensitive code?  Or even better would be profile-feedback, build
> libtsan once, run some benchmark, build libtsan again.

There are few of them in the code in the obvious cases. The performance is very
shaky and depends on compiler, compiler version and options. I remember I get
worse results when I added some other likily/unlikely. I didn't have time for
more comprehensive investigation. FDO may be a good idea for prebuild
libraries.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug sanitizer/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (25 preceding siblings ...)
  2012-11-23  8:36 ` dvyukov at google dot com
@ 2012-11-23 10:47 ` konstantin.s.serebryany at gmail dot com
  2012-11-23 11:16 ` konstantin.s.serebryany at gmail dot com
  27 siblings, 0 replies; 29+ messages in thread
From: konstantin.s.serebryany at gmail dot com @ 2012-11-23 10:47 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #27 from Konstantin Serebryany <konstantin.s.serebryany at gmail dot com> 2012-11-23 10:47:14 UTC ---
>> is it really so crucial that you'd want to make another libtsan_pie.a for it?
I would actually prefer to have *only* libtsan_pie.a, and not bother with .so
version at all. This is what we have in clang land.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug sanitizer/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
  2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
                   ` (26 preceding siblings ...)
  2012-11-23 10:47 ` konstantin.s.serebryany at gmail dot com
@ 2012-11-23 11:16 ` konstantin.s.serebryany at gmail dot com
  27 siblings, 0 replies; 29+ messages in thread
From: konstantin.s.serebryany at gmail dot com @ 2012-11-23 11:16 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #28 from Konstantin Serebryany <konstantin.s.serebryany at gmail dot com> 2012-11-23 11:14:58 UTC ---
Note that today's upstream tsan sources don't link with -fPIC at all
because our assembly blobs are not PIC-friendly. 

/usr/bin/ld: .libs/tsan_rtl_thread.o: relocation R_X86_64_PC32 against
undefined symbol `__tsan_trace_switch_thunk' can not be used when making a
shared objec

Apparently, Wei hacked this around by enabling pure-C++ code, which is slower.


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2012-11-23 11:16 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-16 17:03 [Bug other/55354] New: [asan] by default, the asan run-time should be linked statically, not dynamically konstantin.s.serebryany at gmail dot com
2012-11-16 17:15 ` [Bug other/55354] " pinskia at gcc dot gnu.org
2012-11-16 17:21 ` dvyukov at google dot com
2012-11-16 17:22 ` jakub at gcc dot gnu.org
2012-11-16 20:28 ` konstantin.s.serebryany at gmail dot com
2012-11-16 20:47 ` jakub at gcc dot gnu.org
2012-11-16 20:54 ` konstantin.s.serebryany at gmail dot com
2012-11-17 20:36 ` hjl.tools at gmail dot com
2012-11-17 21:08 ` markus at trippelsdorf dot de
2012-11-18 19:36 ` konstantin.s.serebryany at gmail dot com
2012-11-18 19:54 ` jakub at gcc dot gnu.org
2012-11-18 19:59 ` konstantin.s.serebryany at gmail dot com
2012-11-18 20:10 ` jakub at gcc dot gnu.org
2012-11-19  4:13 ` konstantin.s.serebryany at gmail dot com
2012-11-19  8:55 ` jakub at gcc dot gnu.org
2012-11-19  9:03 ` konstantin.s.serebryany at gmail dot com
2012-11-19  9:06 ` konstantin.s.serebryany at gmail dot com
2012-11-19 10:53 ` dvyukov at google dot com
2012-11-21  7:45 ` dvyukov at google dot com
2012-11-21  8:43 ` jakub at gcc dot gnu.org
2012-11-21  9:06 ` dvyukov at google dot com
2012-11-21  9:26 ` jakub at gcc dot gnu.org
2012-11-23  7:17 ` [Bug sanitizer/55354] " dvyukov at google dot com
2012-11-23  7:28 ` dvyukov at google dot com
2012-11-23  8:14 ` jakub at gcc dot gnu.org
2012-11-23  8:22 ` jakub at gcc dot gnu.org
2012-11-23  8:36 ` dvyukov at google dot com
2012-11-23 10:47 ` konstantin.s.serebryany at gmail dot com
2012-11-23 11:16 ` konstantin.s.serebryany at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).