public inbox for cygwin@cygwin.com
* thread_local performance using g++ for cygwin
@ 2019-05-06  7:09 Arthur Norman
  2019-05-06 20:10 ` Brian Inglis
  0 siblings, 1 reply; 2+ messages in thread
From: Arthur Norman @ 2019-05-06  7:09 UTC
  To: cygwin

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1141 bytes --]

The attached code times two loops, each of which just calls a function that
increments an integer variable. In one loop the variable is an ordinary
global; in the other it has the thread_local qualifier. I put in ugly
annotations to prevent g++ from inlining the functions even though I compile
with -O3; in real code, separate compilation forces each thread_local access
to be independent anyway.
The difference in timing between the two cases is EXTREME on Cygwin (both
32-bit and 64-bit), whereas g++ on Linux and the Microsoft compiler on
Windows both manage to keep the base of the thread-local region in a segment
register, so that the thread_local overhead is minimal. The Cygwin
thread_local overhead is large enough to be very visible in my code as a
whole. I can see that changing to use a segment register might be a
painful API change even if it were feasible, but has there been any
consideration of it?
Note that x86_64-w64-mingw32-g++ and clang also do not use the segment
register and so suffer the same significant speed penalty, so maybe it would
be hard to match what Microsoft manages?

Sample output (times in seconds):
     simple 1.265
     thread_local 33.219


            Arthur

[-- Attachment #2: Type: TEXT/PLAIN, Size: 678 bytes --]

#include <time.h>
#include <iostream>
thread_local int tl_var = 0;
int simple_var = 0;
void simple_inc() __attribute__ ((noinline));
void simple_inc()
{   simple_var++;
}
void tl_inc() __attribute__ ((noinline));
void tl_inc()
{   tl_var++;
}
int main(int argc, char *argv[])
{   clock_t c0 = clock();
    for (unsigned int i=0; i<0x40000000; i++) simple_inc();
    std::cout << "simple " << ((clock()-c0)/(double)CLOCKS_PER_SEC)
              << std::flush << std::endl;
    c0 = clock();
    for (unsigned int i=0; i<0x40000000; i++) tl_inc();
    std::cout << "thread_local " << ((clock()-c0)/(double)CLOCKS_PER_SEC)
              << std::flush << std::endl;
    return 0;
}
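
One way to narrow down where the time goes is to add a third case that resolves the thread_local address once and reuses it. The sketch below is not part of the original attachment: `ptr_inc`, `Timing` and `compare_tls_access` are hypothetical names, and the premise that the cost lies in a per-access lookup (for example a call into a TLS emulation helper) is an assumption that the -S output would have to confirm.

```cpp
#include <chrono>
#include <cstdint>

thread_local int tl_var = 0;

// Baseline, as in the original test: every call resolves tl_var again.
__attribute__((noinline)) void tl_inc() { tl_var++; }

// Variant: the caller resolves the address once and passes it in,
// so the callee performs a plain memory increment.
__attribute__((noinline)) void ptr_inc(int *p) { (*p)++; }

// Elapsed seconds for each variant (hypothetical helper type).
struct Timing {
    double per_call_lookup;
    double hoisted_lookup;
};

// Runs n increments through each variant and times them.
Timing compare_tls_access(std::uint32_t n) {
    using steady = std::chrono::steady_clock;
    auto secs = [](steady::time_point a, steady::time_point b) {
        return std::chrono::duration<double>(b - a).count();
    };

    auto t0 = steady::now();
    for (std::uint32_t i = 0; i < n; i++) tl_inc();
    auto t1 = steady::now();

    int *p = &tl_var;  // single lookup; valid only on the calling thread
    for (std::uint32_t i = 0; i < n; i++) ptr_inc(p);
    auto t2 = steady::now();

    return {secs(t0, t1), secs(t1, t2)};
}
```

Calling `compare_tls_access(0x40000000)` from `main` and printing both fields gives a direct comparison with the figures above. If the hoisted case comes out close to the simple-variable time, the overhead is in the lookup itself rather than in the memory access. The cached pointer must not be handed to another thread or used after the owning thread exits.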

[-- Attachment #3: Type: text/plain, Size: 219 bytes --]


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: thread_local performance using g++ for cygwin
  2019-05-06  7:09 thread_local performance using g++ for cygwin Arthur Norman
@ 2019-05-06 20:10 ` Brian Inglis
  0 siblings, 0 replies; 2+ messages in thread
From: Brian Inglis @ 2019-05-06 20:10 UTC
  To: cygwin

On 2019-05-06 01:09, Arthur Norman wrote:
> The attached code times two loops, each of which just calls a function that
> increments an integer variable. In one loop the variable is an ordinary
> global; in the other it has the thread_local qualifier. I put in ugly
> annotations to prevent g++ from inlining the functions even though I compile
> with -O3; in real code, separate compilation forces each thread_local access
> to be independent anyway.
> The difference in timing between the two cases is EXTREME on Cygwin (both
> 32-bit and 64-bit), whereas g++ on Linux and the Microsoft compiler on
> Windows both manage to keep the base of the thread-local region in a segment
> register, so that the thread_local overhead is minimal. The Cygwin
> thread_local overhead is large enough to be very visible in my code as a
> whole. I can see that changing to use a segment register might be a
> painful API change even if it were feasible, but has there been any
> consideration of it?
> Note that x86_64-w64-mingw32-g++ and clang also do not use the segment
> register and so suffer the same significant speed penalty, so maybe it would
> be hard to match what Microsoft manages?
> 
> Sample output:
>     simple 1.265
>     thread_local 33.219

See:
https://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git;f=winsup/cygwin/how-cygtls-works.txt;a=blob

and you may want to compare the gcc default options and -S assembler output for
your test case on Linux and Cygwin, and perhaps also any glibc and newlib TLS
support functions called: running your Linux tests under some WSL distro will
even out OS kernel differences.
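
When collecting timings from several toolchains for such a comparison, it can help to have each binary label its own output. The following is only a sketch: `build_description` is a hypothetical helper, and it relies on the predefined macros `__clang_version__`, `__VERSION__`, `_MSC_VER`, `__CYGWIN__`, `__MINGW32__`, `_WIN32` and `__linux__`.

```cpp
#include <string>

// Returns a short label naming the compiler and target that built this
// binary, based on predefined macros only (nothing is probed at run time).
std::string build_description() {
    std::string s;
#if defined(__clang__)
    s += "clang ";
    s += __clang_version__;
#elif defined(__GNUC__)
    s += "gcc ";
    s += __VERSION__;
#elif defined(_MSC_VER)
    s += "msvc ";
    s += std::to_string(_MSC_VER);
#else
    s += "unknown compiler";
#endif

#if defined(__CYGWIN__)
    s += " / cygwin";
#elif defined(__MINGW32__)
    s += " / mingw";
#elif defined(_WIN32)
    s += " / windows";
#elif defined(__linux__)
    s += " / linux";
#else
    s += " / unknown target";
#endif
    return s;
}
```

Printing this string ahead of the two timing lines keeps results from different builds apart. A binary built under WSL reports itself as a Linux target, so it cannot be told apart from a native Linux build this way; `uname` output is still needed for that.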

My own tests on Win 10.0.17763.437 (1809), first under WSL Debian and then
under Cygwin 64, show an even larger gap on Cygwin than yours:

$ g++ -O3 -o tltime.{bin,cpp}
$ ./tltime.bin
simple 1.60938
thread_local 1.95312
$ uname -srvmo
Linux 4.4.0-17763-Microsoft #379-Microsoft Wed Mar 06 19:16:00 PST 2019 x86_64
GNU/Linux
$ head /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

$ g++ -O3 -o tltime.{exe,cpp}
$ ./tltime.exe
simple 1.608
thread_local 53.25
$ uname -srvmo
CYGWIN_NT-10.0 3.0.7(0.338/5/3) 2019-04-30 18:08 x86_64 Cygwin
$ head /etc/os-release
PRETTY_NAME="Cygwin 64 3.0.7 2019-04-30"
NAME=Cygwin
ID=cygwin
ID_LIKE=msys mingw
VARIANT="64"
VARIANT_ID="x86_64"
VERSION="3.0.7 (0.338/5/3) 2019-04-30 18:08"
VERSION_ID="3.0.7"
BUILD_ID="0.338/5/3 2019-04-30 18:08"
CPE_NAME="cpe:/a:cygwin:cygwin:3.0.7::~~~~x64~Windows%3e%3d6.0"

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.

