public inbox for cygwin@cygwin.com
* thread_local performance using g++ for cygwin
@ 2019-05-06  7:09 Arthur Norman
  2019-05-06 20:10 ` Brian Inglis
From: Arthur Norman @ 2019-05-06  7:09 UTC
  To: cygwin

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1141 bytes --]

The attached code times two loops, each of which just calls a function that
increments an integer variable. One loop uses a plain global variable, the
other a variable with the thread_local qualifier. I put in ugly annotations
to prevent g++ from inlining the functions even though I compile with -O3;
in real code, separate compilation forces each thread_local access to be
independent anyway.
The timing difference between the two cases is EXTREME on cygwin (both 32
and 64-bit). However, g++ on Linux and the Microsoft compiler on Windows
both manage to keep the base of the thread-local region in a segment
register, so that the thread_local overhead is minimal. The cygwin
thread_local overhead is large enough to be very visible in my code as a
whole. I can see that changing to use a segment register might be a
painful API change even if it were feasible, but has there been any
consideration of it?
Note that x86_64-w64-mingw32-g++ and clang also do not use the segment
register and so suffer the same significant speed penalty, so maybe it
would be hard to match what Microsoft manages?

Sample output (times in seconds):
     simple 1.265
     thread_local 33.219
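
If the cost comes from locating the thread-local slot on every access (an assumption; the mechanism is not confirmed here), one way to check is to compare the per-call form against a form that takes the address once and then works through a pointer. The following is only a sketch: tl_counter, tl_inc_each_call, tl_inc_hoisted, seconds and compare are hypothetical names for illustration, and the hoisted loop also avoids the per-iteration function call, so the two timings are indicative rather than strictly comparable.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <iostream>

// Hypothetical counter for this sketch; not the tl_var of the attachment.
// Unsigned so that wrap-around on very large counts is well defined.
thread_local std::uint32_t tl_counter = 0;

// Per-call form: each call has to locate the thread-local slot again.
__attribute__((noinline)) void tl_inc_each_call()
{
    tl_counter++;
}

// Hoisted form: locate the slot once, then update it through a pointer.
// The pointer is volatile-qualified so that -O3 cannot fold the loop into
// a single addition; one load and one store remain per iteration.
__attribute__((noinline)) void tl_inc_hoisted(std::uint32_t n)
{
    volatile std::uint32_t *slot = &tl_counter;
    for (std::uint32_t i = 0; i < n; i++) *slot = *slot + 1;
}

// Wall-clock seconds taken by one call of f.
template <typename F>
double seconds(F f)
{
    auto t0 = std::chrono::steady_clock::now();
    f();
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return dt.count();
}

// Runs both forms n times each and prints the two timings.
void compare(std::uint32_t n)
{
    const std::uint32_t before = tl_counter;
    double each = seconds([n] {
        for (std::uint32_t i = 0; i < n; i++) tl_inc_each_call();
    });
    double hoisted = seconds([n] { tl_inc_hoisted(n); });
    assert(tl_counter == before + 2u * n);  // both forms did all their work
    std::cout << "per-call " << each << "\n"
              << "hoisted " << hoisted << std::endl;
}
```

Calling compare(0x40000000) from main would match the iteration count of the attached test. Hoisting only helps where the hot loop stays on one thread and can see the variable, which is not always possible across separately compiled units.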


            Arthur

[-- Attachment #2: Type: TEXT/PLAIN, Size: 678 bytes --]

#include <time.h>
#include <iostream>

thread_local int tl_var = 0;
int simple_var = 0;

// noinline forces a real call per increment, even at -O3.
void simple_inc() __attribute__ ((noinline));
void simple_inc()
{   simple_var++;
}

void tl_inc() __attribute__ ((noinline));
void tl_inc()
{   tl_var++;
}

int main()
{   // Time 2^30 increments of the plain global.
    clock_t c0 = clock();
    for (unsigned int i=0; i<0x40000000; i++) simple_inc();
    std::cout << "simple " << ((clock()-c0)/(double)CLOCKS_PER_SEC)
              << std::endl;
    // Time 2^30 increments of the thread_local variable.
    c0 = clock();
    for (unsigned int i=0; i<0x40000000; i++) tl_inc();
    std::cout << "thread_local " << ((clock()-c0)/(double)CLOCKS_PER_SEC)
              << std::endl;
    return 0;
}
