* calloc speed difference
From: Lee @ 2018-01-12  7:19 UTC
To: cygwin

Why is the cygwin gcc calloc so much slower than the
i686-w64-mingw32-gcc calloc?
  1:12 vs 0:11

$cat calloc-test.c
#include <stdio.h>
#include <stdlib.h>
#define ALLOCATION_SIZE (100 * 1024 * 1024)
int main (int argc, char *argv[]) {
    for (int i = 0; i < 10000; i++) {
        void *temp = calloc(ALLOCATION_SIZE, 1);
        if ( temp == NULL ) {
           printf("drat! calloc returned NULL\n");
           return 1;
        }
        free(temp);
    }
    return 0;
}

$gcc calloc-test.c
$time ./a

real    1m12.459s
user    0m0.640s
sys     1m11.750s

$i686-w64-mingw32-gcc calloc-test.c
$time ./a

real    0m11.119s
user    0m0.000s
sys     0m0.000s

$gcc calloc-test.c
$time ./a

real    1m12.323s
user    0m0.656s
sys     1m11.640s

$i686-w64-mingw32-gcc calloc-test.c
$time ./a

real    0m11.080s
user    0m0.000s
sys     0m0.000s

$ gcc --version
gcc (GCC) 6.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ i686-w64-mingw32-gcc --version
i686-w64-mingw32-gcc (GCC) 6.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
* Re: calloc speed difference
From: Eliot Moss @ 2018-01-12  8:38 UTC
To: cygwin

On 1/12/2018 2:19 AM, Lee wrote:
> Why is the cygwin gcc calloc so much slower than the
> i686-w64-mingw32-gcc calloc?

Since your test repeatedly allocates and frees one chunk of size 100 MB (ouch!),
my guess is that the slow behavior is rooted in something to do with mmap.

Perhaps Corinna or another internals expert can explain why large mmap requests
would be a problem for cygwin -- and perhaps it is something that could be
improved if the effort is warranted ...

Eliot Moss
* Re: calloc speed difference
From: Marco Atzeri @ 2018-01-12  9:07 UTC
To: cygwin

On 12/01/2018 08:19, Lee wrote:
> Why is the cygwin gcc calloc so much slower than the
> i686-w64-mingw32-gcc calloc?
>    1:12 vs 0:11
>
> $cat calloc-test.c
> #include <stdio.h>
> #include <stdlib.h>
> #define ALLOCATION_SIZE (100 * 1024 * 1024)
> int main (int argc, char *argv[]) {
>      for (int i = 0; i < 10000; i++) {
>          void *temp = calloc(ALLOCATION_SIZE, 1);
>          if ( temp == NULL ) {
>             printf("drat! calloc returned NULL\n");
>             return 1;
>          }
>          free(temp);
>      }
>      return 0;
> }
>
> $gcc calloc-test.c
> $time ./a
>
> real    1m12.459s
> user    0m0.640s
> sys     1m11.750s

it seems a local problem, maybe BLODA?

I have roughly the same for both 32 and 64 cygwin version on W7-64

$ time ./calloc-tests

real    0m8.346s
user    0m0.904s
sys     0m7.175s
* Re: calloc speed difference 2018-01-12 9:07 ` Marco Atzeri @ 2018-01-12 10:52 ` Lee 2018-01-21 11:01 ` Marco Atzeri 0 siblings, 1 reply; 15+ messages in thread From: Lee @ 2018-01-12 10:52 UTC (permalink / raw) To: cygwin On 1/12/18, Marco Atzeri wrote: > On 12/01/2018 08:19, Lee wrote: >> Why is the cygwin gcc calloc so much slower than the >> i686-w64-mingw32-gcc calloc? >> 1:12 vs 0:11 >> >> $cat calloc-test.c >> #include <stdio.h> >> #include <stdlib.h> >> #define ALLOCATION_SIZE (100 * 1024 * 1024) >> int main (int argc, char *argv[]) { >> for (int i = 0; i < 10000; i++) { >> void *temp = calloc(ALLOCATION_SIZE, 1); >> if ( temp == NULL ) { >> printf("drat! calloc returned NULL\n"); >> return 1; >> } >> free(temp); >> } >> return 0; >> } >> >> $gcc calloc-test.c >> $time ./a >> >> real 1m12.459s >> user 0m0.640s >> sys 1m11.750s > > it seems a local problem, maybe BLODA? I've seen windows defender get in the way & slow things down before - this doesn't look anything like that but how does one know for sure? when running the cygwin gcc version sysinternals process explorer shows system idle process 72.x a.exe 24.9x procexp64.exe 1.x and everything else is < 1% CPU is an Intel i3 w/ 4 logical processors, so I'm guessing that 25% cpu busy is one processor 100% busy It looks roughly the same when running the mingw gcc version .. except that a.exe shows 24.9x% cpu busy for a much shorter time :) In any case, I tried turning off windows defender - no change in how long it takes calloc-test to run (i already had c:\cygwin in the exclusion list) > I have roughly the same for both 32 and 64 cygwin version on W7-64 which flavor of gcc - the cygwin version that builds an executable that pulls in the posix emulation layer or the mingw version that builds an executable that runs "native" windows code? 
Thanks, Lee
* Re: calloc speed difference
From: Marco Atzeri @ 2018-01-21 11:01 UTC
To: cygwin

On 12/01/2018 18:52, Lee wrote:
> On 1/12/18, Marco Atzeri wrote:
>> On 12/01/2018 08:19, Lee wrote:
>
>
>> I have roughly the same for both 32 and 64 cygwin version on W7-64
>
> which flavor of gcc - the cygwin version that builds an executable
> that pulls in the posix emulation layer or the mingw version that
> builds an executable that runs "native" windows code?
>
> Thanks,
> Lee

As I wrote, the cygwin version.

Regards
Marco
* Re: calloc speed difference
From: Christian Franke @ 2018-01-12 14:05 UTC
To: cygwin

Lee wrote:
> Why is the cygwin gcc calloc so much slower than the
> i686-w64-mingw32-gcc calloc?
>    1:12 vs 0:11
>
> $cat calloc-test.c
> #include <stdio.h>
> #include <stdlib.h>
> #define ALLOCATION_SIZE (100 * 1024 * 1024)
> int main (int argc, char *argv[]) {
>      for (int i = 0; i < 10000; i++) {
>          void *temp = calloc(ALLOCATION_SIZE, 1);
>          if ( temp == NULL ) {
>             printf("drat! calloc returned NULL\n");
>             return 1;
>          }
>          free(temp);
>      }
>      return 0;
> }
>

Could reproduce the difference on an older i7-2600K machine:

Cygwin: ~20s
MinGW: ~4s

Timing [cm]alloc() calls without actually using the allocated memory
might produce misleading results due to lazy page allocation and/or
zero-filling.

MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call
malloc() and then memset(). It directly calls:

   mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size);

which possibly only reserves allocate-and-zero-fill-on-demand pages for
later.

Cygwin's calloc() is different.

This variant of the above code adds one write access to each 4KiB page
(guarded by "volatile" to prevent dead assignment optimization):

#include <stdio.h>
#include <stdlib.h>
#define ALLOCATION_SIZE (100 * 1024 * 1024)
int main (int argc, char *argv[]) {
    for (int i = 0; i < 1000; i++) {
        void *temp = calloc(ALLOCATION_SIZE, 1);
        if ( temp == NULL ) {
           printf("drat! calloc returned NULL\n");
           return 1;
        }
        for (int j = 0; j < ALLOCATION_SIZE; j += 4096)
          ((volatile char *)temp)[j] = (char)i;
        free(temp);
    }
    return 0;
}

Results:

Cygwin: ~310s
MinGW: ~210s

Christian
* Re: calloc speed difference
From: Corinna Vinschen @ 2018-01-12 14:33 UTC
To: cygwin

On Jan 12 15:06, Christian Franke wrote:
> Lee wrote:
> > Why is the cygwin gcc calloc so much slower than the
> > i686-w64-mingw32-gcc calloc?
> >    1:12 vs 0:11
> >
> > $cat calloc-test.c
> > #include <stdio.h>
> > #include <stdlib.h>
> > #define ALLOCATION_SIZE (100 * 1024 * 1024)
> > int main (int argc, char *argv[]) {
> >      for (int i = 0; i < 10000; i++) {
> >          void *temp = calloc(ALLOCATION_SIZE, 1);
> >          if ( temp == NULL ) {
> >             printf("drat! calloc returned NULL\n");
> >             return 1;
> >          }
> >          free(temp);
> >      }
> >      return 0;
> > }
> >
>
> Could reproduce the difference on an older i7-2600K machine:
>
> Cygwin: ~20s
> MinGW: ~4s
>
> Timing [cm]alloc() calls without actually using the allocated memory might
> produce misleading results due to lazy page allocation and/or zero-filling.
>
> MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call
> malloc() and then memset(). It directly calls:
>
>   mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size);
>
> which possibly only reserves allocate-and-zero-fill-on-demand pages for
> later.
>
> Cygwin's calloc() is different.

But then again, Cygwin's malloc *is* slow, particularly in
memory-demanding multi-threaded scenarios since that serializes all
malloc/free calls.

The memory handling within Cygwin is tricky.  Attempts to replace good
old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only
means the developer (i.e., me, in case of ptmalloc) was too lazy...
busy!  I mean busy... to pull this through.

Having said that, if somebody would like to take a stab at replacing
dlmalloc with something leaner, I would be very happy and assist as
much as I can.

Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat
* Re: calloc speed difference 2018-01-12 14:33 ` Corinna Vinschen @ 2018-01-12 19:59 ` cyg Simple 2018-01-12 20:07 ` cyg Simple 2018-01-12 20:41 ` Corinna Vinschen 2018-01-13 10:04 ` Lee 1 sibling, 2 replies; 15+ messages in thread From: cyg Simple @ 2018-01-12 19:59 UTC (permalink / raw) To: cygwin On 1/12/2018 9:33 AM, Corinna Vinschen wrote: > On Jan 12 15:06, Christian Franke wrote: >> Lee wrote: >>> Why is the cygwin gcc calloc so much slower than the >>> i686-w64-mingw32-gcc calloc? >>> 1:12 vs 0:11 >>> >>> $cat calloc-test.c >>> #include <stdio.h> >>> #include <stdlib.h> >>> #define ALLOCATION_SIZE (100 * 1024 * 1024) >>> int main (int argc, char *argv[]) { >>> for (int i = 0; i < 10000; i++) { >>> void *temp = calloc(ALLOCATION_SIZE, 1); >>> if ( temp == NULL ) { >>> printf("drat! calloc returned NULL\n"); >>> return 1; >>> } >>> free(temp); >>> } >>> return 0; >>> } >>> >> >> Could reproduce the difference on an older i7-2600K machine: >> >> Cygwin: ~20s >> MinGW: ~4s >> >> Timing [cm]alloc() calls without actually using the allocated memory might >> produce misleading results due to lazy page allocation and/or zero-filling. >> >> MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call >> malloc() and then memset(). It directly calls: >> >> mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size); >> >> which possibly only reserves allocate-and-zero-fill-on-demand pages for >> later. >> >> Cygwin's calloc() is different. > > But then again, Cygwin's malloc *is* slow, particulary in > memory-demanding multi-threaded scenarios since that serializes all > malloc/free calls. > > The memory handling within Cygwin is tricky. Attempts to replace good > old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only > means the developer (i.e., me, in case of ptmalloc) was too lazy... > busy! I mean busy... to pull this through. 
> > Having said that, if somebody would like to take a stab at replacing > dlmalloc with something leaner, I would be very happy and assist as > much as I can. Corina, how reliable is the Cygwin time function on a non-Cygwin executable? Isn't this a comparison of apples to oranges? -- cyg Simple
* Re: calloc speed difference 2018-01-12 19:59 ` cyg Simple @ 2018-01-12 20:07 ` cyg Simple 2018-01-12 20:41 ` Corinna Vinschen 1 sibling, 0 replies; 15+ messages in thread From: cyg Simple @ 2018-01-12 20:07 UTC (permalink / raw) To: cygwin On 1/12/2018 2:59 PM, cyg Simple wrote: > On 1/12/2018 9:33 AM, Corinna Vinschen wrote: >> On Jan 12 15:06, Christian Franke wrote: >>> Lee wrote: >>>> Why is the cygwin gcc calloc so much slower than the >>>> i686-w64-mingw32-gcc calloc? >>>> 1:12 vs 0:11 >>>> >>>> $cat calloc-test.c >>>> #include <stdio.h> >>>> #include <stdlib.h> >>>> #define ALLOCATION_SIZE (100 * 1024 * 1024) >>>> int main (int argc, char *argv[]) { >>>> for (int i = 0; i < 10000; i++) { >>>> void *temp = calloc(ALLOCATION_SIZE, 1); >>>> if ( temp == NULL ) { >>>> printf("drat! calloc returned NULL\n"); >>>> return 1; >>>> } >>>> free(temp); >>>> } >>>> return 0; >>>> } >>>> >>> >>> Could reproduce the difference on an older i7-2600K machine: >>> >>> Cygwin: ~20s >>> MinGW: ~4s >>> >>> Timing [cm]alloc() calls without actually using the allocated memory might >>> produce misleading results due to lazy page allocation and/or zero-filling. >>> >>> MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call >>> malloc() and then memset(). It directly calls: >>> >>> mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size); >>> >>> which possibly only reserves allocate-and-zero-fill-on-demand pages for >>> later. >>> >>> Cygwin's calloc() is different. >> >> But then again, Cygwin's malloc *is* slow, particulary in >> memory-demanding multi-threaded scenarios since that serializes all >> malloc/free calls. >> >> The memory handling within Cygwin is tricky. Attempts to replace good >> old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only >> means the developer (i.e., me, in case of ptmalloc) was too lazy... >> busy! I mean busy... to pull this through. 
>> >> Having said that, if somebody would like to take a stab at replacing >> dlmalloc with something leaner, I would be very happy and assist as >> much as I can. > > Corina, how reliable is the Cygwin time function on a non-Cygwin > executable? Isn't this a comparison of apples to oranges? > s/Corina/Corinna Sorry, -- cyg Simple
* Re: calloc speed difference 2018-01-12 19:59 ` cyg Simple 2018-01-12 20:07 ` cyg Simple @ 2018-01-12 20:41 ` Corinna Vinschen 2018-01-12 22:34 ` cyg Simple 1 sibling, 1 reply; 15+ messages in thread From: Corinna Vinschen @ 2018-01-12 20:41 UTC (permalink / raw) To: cygwin [-- Attachment #1: Type: text/plain, Size: 1760 bytes --] On Jan 12 14:59, cyg Simple wrote: > On 1/12/2018 9:33 AM, Corinna Vinschen wrote: > > On Jan 12 15:06, Christian Franke wrote: > >> Timing [cm]alloc() calls without actually using the allocated memory might > >> produce misleading results due to lazy page allocation and/or zero-filling. > >> > >> MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call > >> malloc() and then memset(). It directly calls: > >> > >> mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size); > >> > >> which possibly only reserves allocate-and-zero-fill-on-demand pages for > >> later. > >> > >> Cygwin's calloc() is different. > > > > But then again, Cygwin's malloc *is* slow, particulary in > > memory-demanding multi-threaded scenarios since that serializes all > > malloc/free calls. > > > > The memory handling within Cygwin is tricky. Attempts to replace good > > old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only > > means the developer (i.e., me, in case of ptmalloc) was too lazy... > > busy! I mean busy... to pull this through. > > > > Having said that, if somebody would like to take a stab at replacing > > dlmalloc with something leaner, I would be very happy and assist as > > much as I can. > > Corina, how reliable is the Cygwin time function on a non-Cygwin > executable? Isn't this a comparison of apples to oranges? I wasn't comparing, in fact. I was just saying that Cygwin's malloc is slow, partially because dlmalloc is not the fastest one, partially due to the serialization overhead in multithreading scenarios. 
Corinna
* Re: calloc speed difference 2018-01-12 20:41 ` Corinna Vinschen @ 2018-01-12 22:34 ` cyg Simple 2018-01-13 10:48 ` Lee 0 siblings, 1 reply; 15+ messages in thread From: cyg Simple @ 2018-01-12 22:34 UTC (permalink / raw) To: cygwin On 1/12/2018 3:41 PM, Corinna Vinschen wrote: > On Jan 12 14:59, cyg Simple wrote: >> On 1/12/2018 9:33 AM, Corinna Vinschen wrote: >>> On Jan 12 15:06, Christian Franke wrote: >>>> Timing [cm]alloc() calls without actually using the allocated memory might >>>> produce misleading results due to lazy page allocation and/or zero-filling. >>>> >>>> MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call >>>> malloc() and then memset(). It directly calls: >>>> >>>> mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size); >>>> >>>> which possibly only reserves allocate-and-zero-fill-on-demand pages for >>>> later. >>>> >>>> Cygwin's calloc() is different. >>> >>> But then again, Cygwin's malloc *is* slow, particulary in >>> memory-demanding multi-threaded scenarios since that serializes all >>> malloc/free calls. >>> >>> The memory handling within Cygwin is tricky. Attempts to replace good >>> old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only >>> means the developer (i.e., me, in case of ptmalloc) was too lazy... >>> busy! I mean busy... to pull this through. >>> >>> Having said that, if somebody would like to take a stab at replacing >>> dlmalloc with something leaner, I would be very happy and assist as >>> much as I can. >> >> Corina, how reliable is the Cygwin time function on a non-Cygwin >> executable? Isn't this a comparison of apples to oranges? > > I wasn't comparing, in fact. I was just saying that Cygwin's malloc > is slow, partially because dlmalloc is not the fastest one, partially > due to the serialization overhead in multithreading scenarios. No, but the OP *is* doing a compare. 
From what I remember, doing a time comparison of a non-Cygwin app compared to a Cygwin app isn't really a logical comparison. Even if the two were both Cygwin apps, multiple runs of the same app will show variance. -- cyg Simple
* Re: calloc speed difference 2018-01-12 22:34 ` cyg Simple @ 2018-01-13 10:48 ` Lee 0 siblings, 0 replies; 15+ messages in thread From: Lee @ 2018-01-13 10:48 UTC (permalink / raw) To: cygwin On 1/12/18, cyg Simple wrote: > On 1/12/2018 3:41 PM, Corinna Vinschen wrote: >> On Jan 12 14:59, cyg Simple wrote: >>> On 1/12/2018 9:33 AM, Corinna Vinschen wrote: >>>> On Jan 12 15:06, Christian Franke wrote: >>>>> Timing [cm]alloc() calls without actually using the allocated memory >>>>> might >>>>> produce misleading results due to lazy page allocation and/or >>>>> zero-filling. >>>>> >>>>> MinGW binaries use calloc() from msvcrt.dll. This calloc() does not >>>>> call >>>>> malloc() and then memset(). It directly calls: >>>>> >>>>> mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size); >>>>> >>>>> which possibly only reserves allocate-and-zero-fill-on-demand pages >>>>> for >>>>> later. >>>>> >>>>> Cygwin's calloc() is different. >>>> >>>> But then again, Cygwin's malloc *is* slow, particulary in >>>> memory-demanding multi-threaded scenarios since that serializes all >>>> malloc/free calls. >>>> >>>> The memory handling within Cygwin is tricky. Attempts to replace good >>>> old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only >>>> means the developer (i.e., me, in case of ptmalloc) was too lazy... >>>> busy! I mean busy... to pull this through. >>>> >>>> Having said that, if somebody would like to take a stab at replacing >>>> dlmalloc with something leaner, I would be very happy and assist as >>>> much as I can. >>> >>> Corina, how reliable is the Cygwin time function on a non-Cygwin >>> executable? Isn't this a comparison of apples to oranges? The wall-clock time seems reliable. Timing a non-Cygwin executable gives you 0.0 for the user & sys categories but I don't care about them anywhere near as much as how long it takes for the program to run. >> I wasn't comparing, in fact. 
I was just saying that Cygwin's malloc >> is slow, partially because dlmalloc is not the fastest one, partially >> due to the serialization overhead in multithreading scenarios. > > No, but the OP *is* doing a compare. From what I remember doing a time > comparison of a non-Cygwin app compared to a Cygwin app isn't really a > logical comparison. I'm probably missing something, but.. I have the source code. I have the choice of using the cygwin gcc compiler or the mingw cross compiler. If both executables produce the same results then comparing execution times seems perfectly valid. On the other hand, if the program produces text output (ie. dos vs. unix line endings) then dealing with dos line endings in the cygwin environment might be enough of a pain that I accept the ease-of-use vs. execution time tradeoff and keep everything compatible w/ cygwin. > Even if the two were a Cygwin app multiple runs of > the same app will show variance. But not the seconds vs. minutes difference that I occasionally see when comparing cygwin vs. native app performance. Regards, Lee
* Re: calloc speed difference
From: Lee @ 2018-01-13 10:04 UTC
To: cygwin

On 1/12/18, Corinna Vinschen wrote:
> On Jan 12 15:06, Christian Franke wrote:
>> Lee wrote:
>> > Why is the cygwin gcc calloc so much slower than the
>> > i686-w64-mingw32-gcc calloc?
>> >    1:12 vs 0:11

  <.. snip example prog ..>

>>
>> Could reproduce the difference on an older i7-2600K machine:
>>
>> Cygwin: ~20s
>> MinGW: ~4s

  <.. snip possible explanation ..>

>
> But then again, Cygwin's malloc *is* slow, particulary in
> memory-demanding multi-threaded scenarios since that serializes all
> malloc/free calls.
>
> The memory handling within Cygwin is tricky.  Attempts to replace good
> old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only
> means the developer (i.e., me, in case of ptmalloc) was too lazy...
> busy!  I mean busy... to pull this through.
>
> Having said that, if somebody would like to take a stab at replacing
> dlmalloc with something leaner, I would be very happy and assist as
> much as I can.

I just took a quick look at some malloc code & docs and I know enough
to know that I'm not going to be the one taking a stab at replacing
dlmalloc.

Sorry :(
Lee
* Re: calloc speed difference
From: Eliot Moss @ 2018-01-12 22:00 UTC
To: cygwin

On 1/12/2018 9:06 AM, Christian Franke wrote:

> This variant of the above code adds one write access to each 4KiB page (guarded by "volatile" to
> prevent dead assignment optimization):
>
> #include <stdio.h>
> #include <stdlib.h>
> #define ALLOCATION_SIZE (100 * 1024 * 1024)
> int main (int argc, char *argv[]) {
>     for (int i = 0; i < 1000; i++) {
>         void *temp = calloc(ALLOCATION_SIZE, 1);
>         if ( temp == NULL ) {
>            printf("drat! calloc returned NULL\n");
>            return 1;
>         }
>         for (int j = 0; j < ALLOCATION_SIZE; j += 4096)
>           ((volatile char *)temp)[j] = (char)i;
>         free(temp);
>     }
>     return 0;
> }
>
> Results:
>
> Cygwin: ~310s
> MinGW: ~210s

Good analysis!  There remains a lot of room for improvement,
but this shows good reasons to dig deeper to understand what
goes on with large allocations.

EM
* Re: calloc speed difference 2018-01-12 14:05 ` Christian Franke 2018-01-12 14:33 ` Corinna Vinschen 2018-01-12 22:00 ` Eliot Moss @ 2018-01-13 8:35 ` Lee 2 siblings, 0 replies; 15+ messages in thread From: Lee @ 2018-01-13 8:35 UTC (permalink / raw) To: cygwin On 1/12/18, Christian Franke wrote: > Lee wrote: >> Why is the cygwin gcc calloc so much slower than the >> i686-w64-mingw32-gcc calloc? >> 1:12 vs 0:11 >> >> $cat calloc-test.c >> #include <stdio.h> >> #include <stdlib.h> >> #define ALLOCATION_SIZE (100 * 1024 * 1024) >> int main (int argc, char *argv[]) { >> for (int i = 0; i < 10000; i++) { >> void *temp = calloc(ALLOCATION_SIZE, 1); >> if ( temp == NULL ) { >> printf("drat! calloc returned NULL\n"); >> return 1; >> } >> free(temp); >> } >> return 0; >> } >> > > Could reproduce the difference on an older i7-2600K machine: > > Cygwin: ~20s > MinGW: ~4s > > Timing [cm]alloc() calls without actually using the allocated memory > might produce misleading results due to lazy page allocation and/or > zero-filling. > > MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call > malloc() and then memset(). It directly calls: > > mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size); > > which possibly only reserves allocate-and-zero-fill-on-demand pages for > later. Which seems like it could be viewed as a feature? Sort of like buying on credit - you don't pay for it all up front, just pay a bit each time you reference another zero fill on demand page. > Cygwin's calloc() is different. > > This variant of the above code adds one write access to each 4KiB page > (guarded by "volatile" to prevent dead assignment optimization): > > #include <stdio.h> > #include <stdlib.h> > #define ALLOCATION_SIZE (100 * 1024 * 1024) > int main (int argc, char *argv[]) { > for (int i = 0; i < 1000; i++) { > void *temp = calloc(ALLOCATION_SIZE, 1); > if ( temp == NULL ) { > printf("drat! 
calloc returned NULL\n"); > return 1; > } > for (int j = 0; j < ALLOCATION_SIZE; j += 4096) > ((volatile char *)temp)[j] = (char)i; > free(temp); > } > return 0; > } > > Results: > > Cygwin: ~310s > MinGW: ~210s Wow! Really nice explanation & example - Thank you. Lee