public inbox for cygwin@cygwin.com
* calloc speed difference
@ 2018-01-12  7:19 Lee
  2018-01-12  8:38 ` Eliot Moss
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Lee @ 2018-01-12  7:19 UTC
  To: cygwin

Why is the cygwin gcc calloc so much slower than the
i686-w64-mingw32-gcc calloc?
  1:12 vs 0:11

$cat calloc-test.c
#include <stdio.h>
#include <stdlib.h>
#define ALLOCATION_SIZE (100 * 1024 * 1024)
int main (int argc, char *argv[]) {
    for (int i = 0; i < 10000; i++) {
        void *temp = calloc(ALLOCATION_SIZE, 1);
        if ( temp == NULL ) {
           printf("drat! calloc returned NULL\n");
           return 1;
        }
        free(temp);
    }
    return 0;
}

$gcc calloc-test.c
$time ./a

real    1m12.459s
user    0m0.640s
sys     1m11.750s
$i686-w64-mingw32-gcc calloc-test.c
$time ./a

real    0m11.119s
user    0m0.000s
sys     0m0.000s
$gcc calloc-test.c
$time ./a

real    1m12.323s
user    0m0.656s
sys     1m11.640s
$i686-w64-mingw32-gcc calloc-test.c
$time ./a

real    0m11.080s
user    0m0.000s
sys     0m0.000s
$
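A rough back-of-the-envelope reading of the first pair of runs above (the editor's arithmetic on the posted numbers, not part of the original report): each calloc()/free() pair in the Cygwin build costs several milliseconds, and almost all of that is reported as kernel (sys) time.

```latex
\[
\text{Cygwin: } \frac{72.459\ \text{s}}{10\,000} \approx 7.2\ \text{ms per calloc()/free() pair},
\qquad
\text{MinGW: } \frac{11.119\ \text{s}}{10\,000} \approx 1.1\ \text{ms}
\]
\[
\frac{72.459}{11.119} \approx 6.5\times,
\qquad
\frac{\text{sys}}{\text{real}} = \frac{71.750}{72.459} \approx 0.99 \ \text{(Cygwin build)}
\]
```

The MinGW build's user and sys figures of 0 are not meaningful here, since the shell cannot report them for a non-Cygwin process.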


$ gcc --version
gcc (GCC) 6.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ i686-w64-mingw32-gcc --version
i686-w64-mingw32-gcc (GCC) 6.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: calloc speed difference
  2018-01-12  7:19 calloc speed difference Lee
@ 2018-01-12  8:38 ` Eliot Moss
  2018-01-12  9:07 ` Marco Atzeri
  2018-01-12 14:05 ` Christian Franke
  2 siblings, 0 replies; 15+ messages in thread
From: Eliot Moss @ 2018-01-12  8:38 UTC
  To: cygwin

On 1/12/2018 2:19 AM, Lee wrote:
> Why is the cygwin gcc calloc so much slower than the
> i686-w64-mingw32-gcc calloc?

Since your test repeatedly allocates and frees one chunk
of size 100 MiB (ouch!), my guess is that the slow behavior is
rooted in something to do with mmap.  Perhaps Corinna
or another internals expert can explain why large mmap
requests would be a problem for cygwin -- and perhaps it
is something that could be improved if the effort is
warranted ...

Eliot Moss


* Re: calloc speed difference
  2018-01-12  7:19 calloc speed difference Lee
  2018-01-12  8:38 ` Eliot Moss
@ 2018-01-12  9:07 ` Marco Atzeri
  2018-01-12 10:52   ` Lee
  2018-01-12 14:05 ` Christian Franke
  2 siblings, 1 reply; 15+ messages in thread
From: Marco Atzeri @ 2018-01-12  9:07 UTC
  To: cygwin

On 12/01/2018 08:19, Lee wrote:
> Why is the cygwin gcc calloc so much slower than the
> i686-w64-mingw32-gcc calloc?
>    1:12 vs 0:11
> 
> $cat calloc-test.c
> #include <stdio.h>
> #include <stdlib.h>
> #define ALLOCATION_SIZE (100 * 1024 * 1024)
> int main (int argc, char *argv[]) {
>      for (int i = 0; i < 10000; i++) {
>          void *temp = calloc(ALLOCATION_SIZE, 1);
>          if ( temp == NULL ) {
>             printf("drat! calloc returned NULL\n");
>             return 1;
>          }
>          free(temp);
>      }
>      return 0;
> }
> 
> $gcc calloc-test.c
> $time ./a
> 
> real    1m12.459s
> user    0m0.640s
> sys     1m11.750s

It seems to be a local problem, maybe BLODA (interference from one of the Big List Of Dodgy Apps)?

I get roughly the same for both the 32-bit and 64-bit Cygwin versions on W7-64:

$ time ./calloc-tests

real    0m8.346s
user    0m0.904s
sys     0m7.175s



* Re: calloc speed difference
  2018-01-12  9:07 ` Marco Atzeri
@ 2018-01-12 10:52   ` Lee
  2018-01-21 11:01     ` Marco Atzeri
  0 siblings, 1 reply; 15+ messages in thread
From: Lee @ 2018-01-12 10:52 UTC
  To: cygwin

On 1/12/18, Marco Atzeri wrote:
> On 12/01/2018 08:19, Lee wrote:
>> Why is the cygwin gcc calloc so much slower than the
>> i686-w64-mingw32-gcc calloc?
>>    1:12 vs 0:11
>>
>> $cat calloc-test.c
>> #include <stdio.h>
>> #include <stdlib.h>
>> #define ALLOCATION_SIZE (100 * 1024 * 1024)
>> int main (int argc, char *argv[]) {
>>      for (int i = 0; i < 10000; i++) {
>>          void *temp = calloc(ALLOCATION_SIZE, 1);
>>          if ( temp == NULL ) {
>>             printf("drat! calloc returned NULL\n");
>>             return 1;
>>          }
>>          free(temp);
>>      }
>>      return 0;
>> }
>>
>> $gcc calloc-test.c
>> $time ./a
>>
>> real    1m12.459s
>> user    0m0.640s
>> sys     1m11.750s
>
> It seems to be a local problem, maybe BLODA (interference from one of the Big List Of Dodgy Apps)?

I've seen Windows Defender get in the way and slow things down before;
this doesn't look anything like that, but how does one know for sure?

When running the Cygwin gcc version, Sysinternals Process Explorer shows:
  System Idle Process   72.x%
  a.exe                 24.9x%
  procexp64.exe          1.x%
and everything else is < 1%.
The CPU is an Intel i3 with 4 logical processors, so I'm guessing that
25% CPU busy is one processor 100% busy.

It looks roughly the same when running the MinGW gcc version, except
that a.exe shows 24.9x% CPU busy for a much shorter time :)

In any case, I tried turning off Windows Defender: no change in how
long it takes calloc-test to run (I already had c:\cygwin in the
exclusion list).


> I get roughly the same for both the 32-bit and 64-bit Cygwin versions on W7-64:

which flavor of gcc - the cygwin version that builds an executable
that pulls in the posix emulation layer or the mingw version that
builds an executable that runs "native" windows code?

Thanks,
Lee


* Re: calloc speed difference
  2018-01-12  7:19 calloc speed difference Lee
  2018-01-12  8:38 ` Eliot Moss
  2018-01-12  9:07 ` Marco Atzeri
@ 2018-01-12 14:05 ` Christian Franke
  2018-01-12 14:33   ` Corinna Vinschen
                     ` (2 more replies)
  2 siblings, 3 replies; 15+ messages in thread
From: Christian Franke @ 2018-01-12 14:05 UTC
  To: cygwin

Lee wrote:
> Why is the cygwin gcc calloc so much slower than the
> i686-w64-mingw32-gcc calloc?
>    1:12 vs 0:11
>
> $cat calloc-test.c
> #include <stdio.h>
> #include <stdlib.h>
> #define ALLOCATION_SIZE (100 * 1024 * 1024)
> int main (int argc, char *argv[]) {
>      for (int i = 0; i < 10000; i++) {
>          void *temp = calloc(ALLOCATION_SIZE, 1);
>          if ( temp == NULL ) {
>             printf("drat! calloc returned NULL\n");
>             return 1;
>          }
>          free(temp);
>      }
>      return 0;
> }
>

Could reproduce the difference on an older i7-2600K machine:

Cygwin: ~20s
MinGW: ~4s

Timing [cm]alloc() calls without actually using the allocated memory 
might produce misleading results due to lazy page allocation and/or 
zero-filling.

MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call 
malloc() and then memset(). It directly calls:

   mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size);

which possibly only reserves allocate-and-zero-fill-on-demand pages for 
later.
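A minimal sketch of the msvcrt-style approach described above, assuming the process heap (`GetProcessHeap()`) as a stand-in for msvcrt's private `_crtheap`: zeroing is requested from the heap in a single call instead of being done by a separate memset. `zero_alloc`/`zero_free` are hypothetical names, and the non-Windows branch simply falls back to calloc() so the sketch also compiles elsewhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical msvcrt-style calloc: one zeroing allocation call. */
static void *zero_alloc(size_t nmemb, size_t size)
{
    /* Reject nmemb * size overflow, as calloc() must. */
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;
#ifdef _WIN32
    /* The heap, not the caller, is responsible for delivering zeroed memory. */
    return HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, nmemb * size);
#else
    return calloc(nmemb, size);   /* fallback so the sketch builds off Windows */
#endif
}

static void zero_free(void *p)
{
#ifdef _WIN32
    if (p != NULL)
        HeapFree(GetProcessHeap(), 0, p);
#else
    free(p);
#endif
}
```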

Cygwin's calloc() is different.
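For contrast, a minimal sketch of the generic malloc()-then-memset() strategy that msvcrt's calloc() avoids. This thread does not establish that Cygwin's calloc() takes exactly this path (dlmalloc-derived allocators may skip the memset for freshly mapped chunks), so treat the hypothetical `calloc_via_memset` as an illustration of why eager zeroing writes to, and therefore commits, every page up front.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic calloc: allocate, then clear every byte eagerly.
   The memset touches each page, even if the caller never uses most of them. */
static void *calloc_via_memset(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;                  /* nmemb * size would overflow */
    size_t total = nmemb * size;
    void *p = malloc(total);
    if (p != NULL)
        memset(p, 0, total);
    return p;
}
```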

This variant of Lee's original test program adds one write access to each 4KiB page
(guarded by "volatile" to prevent dead-assignment optimization) and runs 1000
instead of 10000 iterations:

#include <stdio.h>
#include <stdlib.h>
#define ALLOCATION_SIZE (100 * 1024 * 1024)
int main (int argc, char *argv[]) {
     for (int i = 0; i < 1000; i++) {
         void *temp = calloc(ALLOCATION_SIZE, 1);
         if ( temp == NULL ) {
            printf("drat! calloc returned NULL\n");
            return 1;
         }
         for (int j = 0; j < ALLOCATION_SIZE; j += 4096)
           ((volatile char *)temp)[j] = (char)i;
         free(temp);
     }
     return 0;
}

Results:

Cygwin: ~310s
MinGW: ~210s
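A rough reading of these numbers (the editor's arithmetic, assuming all of the run time is attributed to page touches; it also includes the calloc()/free() calls, so this overstates the per-page cost):

```latex
\[
\frac{100 \cdot 2^{20}\ \text{B}}{4096\ \text{B/page}} = 25\,600\ \text{pages per iteration},
\qquad
1000 \times 25\,600 = 2.56 \times 10^{7}\ \text{page touches}
\]
\[
\text{Cygwin: } \frac{310\ \text{s}}{2.56 \times 10^{7}} \approx 12.1\ \mu\text{s/page},
\qquad
\text{MinGW: } \frac{210\ \text{s}}{2.56 \times 10^{7}} \approx 8.2\ \mu\text{s/page}
\]
\[
\frac{20\ \text{s}}{4\ \text{s}} = 5\times \ \text{(pages untouched)}
\quad\text{vs.}\quad
\frac{310\ \text{s}}{210\ \text{s}} \approx 1.5\times \ \text{(every page touched)}
\]
```

Once every page is actually used, the gap between the two builds shrinks considerably, which is consistent with the lazy zero-fill explanation.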

Christian



* Re: calloc speed difference
  2018-01-12 14:05 ` Christian Franke
@ 2018-01-12 14:33   ` Corinna Vinschen
  2018-01-12 19:59     ` cyg Simple
  2018-01-13 10:04     ` Lee
  2018-01-12 22:00   ` Eliot Moss
  2018-01-13  8:35   ` Lee
  2 siblings, 2 replies; 15+ messages in thread
From: Corinna Vinschen @ 2018-01-12 14:33 UTC
  To: cygwin


On Jan 12 15:06, Christian Franke wrote:
> Lee wrote:
> > Why is the cygwin gcc calloc so much slower than the
> > i686-w64-mingw32-gcc calloc?
> >    1:12 vs 0:11
> > 
> > $cat calloc-test.c
> > #include <stdio.h>
> > #include <stdlib.h>
> > #define ALLOCATION_SIZE (100 * 1024 * 1024)
> > int main (int argc, char *argv[]) {
> >      for (int i = 0; i < 10000; i++) {
> >          void *temp = calloc(ALLOCATION_SIZE, 1);
> >          if ( temp == NULL ) {
> >             printf("drat! calloc returned NULL\n");
> >             return 1;
> >          }
> >          free(temp);
> >      }
> >      return 0;
> > }
> > 
> 
> Could reproduce the difference on an older i7-2600K machine:
> 
> Cygwin: ~20s
> MinGW: ~4s
> 
> Timing [cm]alloc() calls without actually using the allocated memory might
> produce misleading results due to lazy page allocation and/or zero-filling.
> 
> MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call
> malloc() and then memset(). It directly calls:
> 
>   mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size);
> 
> which possibly only reserves allocate-and-zero-fill-on-demand pages for
> later.
> 
> Cygwin's calloc() is different.

But then again, Cygwin's malloc *is* slow, particularly in
memory-demanding multi-threaded scenarios since that serializes all
malloc/free calls.
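The serialization point can be illustrated with a minimal sketch, assuming POSIX threads (build with `-pthread`): a wrapper that funnels every allocation and free through one global mutex, so concurrent callers queue up behind each other. This is a hypothetical illustration of a single-lock allocator, not Cygwin's malloc; `locked_malloc`, `locked_free` and `run_workers` are invented names.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* One lock shared by all threads: every malloc/free waits its turn. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

static void *locked_malloc(size_t n)
{
    pthread_mutex_lock(&heap_lock);
    void *p = malloc(n);
    pthread_mutex_unlock(&heap_lock);
    return p;
}

static void locked_free(void *p)
{
    pthread_mutex_lock(&heap_lock);
    free(p);
    pthread_mutex_unlock(&heap_lock);
}

/* Each worker performs `*(int *)arg` allocate/free pairs. */
static void *worker(void *arg)
{
    int rounds = *(int *)arg;
    long ok = 0;
    for (int i = 0; i < rounds; i++) {
        void *p = locked_malloc(64);
        if (p != NULL) {
            ok++;
            locked_free(p);
        }
    }
    return (void *)(intptr_t)ok;   /* number of successful allocations */
}

/* Run up to 16 workers and return the total number of successes. */
static long run_workers(int nthreads, int rounds)
{
    pthread_t tid[16];
    long total = 0;
    int created = 0;
    if (nthreads > 16)
        nthreads = 16;
    for (; created < nthreads; created++)
        if (pthread_create(&tid[created], NULL, worker, &rounds) != 0)
            break;
    for (int t = 0; t < created; t++) {
        void *res;
        pthread_join(tid[t], &res);
        total += (long)(intptr_t)res;
    }
    return total;
}
```

For example, `run_workers(4, 1000)` performs 4000 allocate/free pairs; with a single global lock, adding threads tends to add contention rather than throughput.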

The memory handling within Cygwin is tricky.  Attempts to replace good
old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only
means the developer (i.e., me, in case of ptmalloc) was too lazy...
busy! I mean busy... to pull this through.

Having said that, if somebody would like to take a stab at replacing
dlmalloc with something leaner, I would be very happy and assist as
much as I can.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


* Re: calloc speed difference
  2018-01-12 14:33   ` Corinna Vinschen
@ 2018-01-12 19:59     ` cyg Simple
  2018-01-12 20:07       ` cyg Simple
  2018-01-12 20:41       ` Corinna Vinschen
  2018-01-13 10:04     ` Lee
  1 sibling, 2 replies; 15+ messages in thread
From: cyg Simple @ 2018-01-12 19:59 UTC
  To: cygwin

On 1/12/2018 9:33 AM, Corinna Vinschen wrote:
> On Jan 12 15:06, Christian Franke wrote:
>> Lee wrote:
>>> Why is the cygwin gcc calloc so much slower than the
>>> i686-w64-mingw32-gcc calloc?
>>>    1:12 vs 0:11
>>>
>>> $cat calloc-test.c
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #define ALLOCATION_SIZE (100 * 1024 * 1024)
>>> int main (int argc, char *argv[]) {
>>>      for (int i = 0; i < 10000; i++) {
>>>          void *temp = calloc(ALLOCATION_SIZE, 1);
>>>          if ( temp == NULL ) {
>>>             printf("drat! calloc returned NULL\n");
>>>             return 1;
>>>          }
>>>          free(temp);
>>>      }
>>>      return 0;
>>> }
>>>
>>
>> Could reproduce the difference on an older i7-2600K machine:
>>
>> Cygwin: ~20s
>> MinGW: ~4s
>>
>> Timing [cm]alloc() calls without actually using the allocated memory might
>> produce misleading results due to lazy page allocation and/or zero-filling.
>>
>> MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call
>> malloc() and then memset(). It directly calls:
>>
>>   mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size);
>>
>> which possibly only reserves allocate-and-zero-fill-on-demand pages for
>> later.
>>
>> Cygwin's calloc() is different.
> 
> But then again, Cygwin's malloc *is* slow, particularly in
> memory-demanding multi-threaded scenarios since that serializes all
> malloc/free calls.
> 
> The memory handling within Cygwin is tricky.  Attempts to replace good
> old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only
> means the developer (i.e., me, in case of ptmalloc) was too lazy...
> busy! I mean busy... to pull this through.
> 
> Having said that, if somebody would like to take a stab at replacing
> dlmalloc with something leaner, I would be very happy and assist as
> much as I can.

Corina, how reliable is the Cygwin time function on a non-Cygwin
executable?  Isn't this a comparison of apples to oranges?

-- 
cyg Simple


* Re: calloc speed difference
  2018-01-12 19:59     ` cyg Simple
@ 2018-01-12 20:07       ` cyg Simple
  2018-01-12 20:41       ` Corinna Vinschen
  1 sibling, 0 replies; 15+ messages in thread
From: cyg Simple @ 2018-01-12 20:07 UTC
  To: cygwin

On 1/12/2018 2:59 PM, cyg Simple wrote:
> On 1/12/2018 9:33 AM, Corinna Vinschen wrote:
>> On Jan 12 15:06, Christian Franke wrote:
>>> Lee wrote:
>>>> Why is the cygwin gcc calloc so much slower than the
>>>> i686-w64-mingw32-gcc calloc?
>>>>    1:12 vs 0:11
>>>>
>>>> $cat calloc-test.c
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #define ALLOCATION_SIZE (100 * 1024 * 1024)
>>>> int main (int argc, char *argv[]) {
>>>>      for (int i = 0; i < 10000; i++) {
>>>>          void *temp = calloc(ALLOCATION_SIZE, 1);
>>>>          if ( temp == NULL ) {
>>>>             printf("drat! calloc returned NULL\n");
>>>>             return 1;
>>>>          }
>>>>          free(temp);
>>>>      }
>>>>      return 0;
>>>> }
>>>>
>>>
>>> Could reproduce the difference on an older i7-2600K machine:
>>>
>>> Cygwin: ~20s
>>> MinGW: ~4s
>>>
>>> Timing [cm]alloc() calls without actually using the allocated memory might
>>> produce misleading results due to lazy page allocation and/or zero-filling.
>>>
>>> MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call
>>> malloc() and then memset(). It directly calls:
>>>
>>>   mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size);
>>>
>>> which possibly only reserves allocate-and-zero-fill-on-demand pages for
>>> later.
>>>
>>> Cygwin's calloc() is different.
>>
>> But then again, Cygwin's malloc *is* slow, particularly in
>> memory-demanding multi-threaded scenarios since that serializes all
>> malloc/free calls.
>>
>> The memory handling within Cygwin is tricky.  Attempts to replace good
>> old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only
>> means the developer (i.e., me, in case of ptmalloc) was too lazy...
>> busy! I mean busy... to pull this through.
>>
>> Having said that, if somebody would like to take a stab at replacing
>> dlmalloc with something leaner, I would be very happy and assist as
>> much as I can.
> 
> Corina, how reliable is the Cygwin time function on a non-Cygwin
> executable?  Isn't this a comparison of apples to oranges?
> 

s/Corina/Corinna

Sorry,
-- 
cyg Simple


* Re: calloc speed difference
  2018-01-12 19:59     ` cyg Simple
  2018-01-12 20:07       ` cyg Simple
@ 2018-01-12 20:41       ` Corinna Vinschen
  2018-01-12 22:34         ` cyg Simple
  1 sibling, 1 reply; 15+ messages in thread
From: Corinna Vinschen @ 2018-01-12 20:41 UTC
  To: cygwin


On Jan 12 14:59, cyg Simple wrote:
> On 1/12/2018 9:33 AM, Corinna Vinschen wrote:
> > On Jan 12 15:06, Christian Franke wrote:
> >> Timing [cm]alloc() calls without actually using the allocated memory might
> >> produce misleading results due to lazy page allocation and/or zero-filling.
> >>
> >> MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call
> >> malloc() and then memset(). It directly calls:
> >>
> >>   mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size);
> >>
> >> which possibly only reserves allocate-and-zero-fill-on-demand pages for
> >> later.
> >>
> >> Cygwin's calloc() is different.
> > 
> > But then again, Cygwin's malloc *is* slow, particularly in
> > memory-demanding multi-threaded scenarios since that serializes all
> > malloc/free calls.
> > 
> > The memory handling within Cygwin is tricky.  Attempts to replace good
> > old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only
> > means the developer (i.e., me, in case of ptmalloc) was too lazy...
> > busy! I mean busy... to pull this through.
> > 
> > Having said that, if somebody would like to take a stab at replacing
> > dlmalloc with something leaner, I would be very happy and assist as
> > much as I can.
> 
> Corina, how reliable is the Cygwin time function on a non-Cygwin
> executable?  Isn't this a comparison of apples to oranges?

I wasn't comparing, in fact.  I was just saying that Cygwin's malloc
is slow, partially because dlmalloc is not the fastest one, partially
due to the serialization overhead in multithreading scenarios.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat


* Re: calloc speed difference
  2018-01-12 14:05 ` Christian Franke
  2018-01-12 14:33   ` Corinna Vinschen
@ 2018-01-12 22:00   ` Eliot Moss
  2018-01-13  8:35   ` Lee
  2 siblings, 0 replies; 15+ messages in thread
From: Eliot Moss @ 2018-01-12 22:00 UTC
  To: cygwin

On 1/12/2018 9:06 AM, Christian Franke wrote:

> This variant of the above code adds one write access to each 4KiB page (guarded by "volatile" to 
> prevent dead assignment optimization):
> 
> #include <stdio.h>
> #include <stdlib.h>
> #define ALLOCATION_SIZE (100 * 1024 * 1024)
> int main (int argc, char *argv[]) {
>      for (int i = 0; i < 1000; i++) {
>          void *temp = calloc(ALLOCATION_SIZE, 1);
>          if ( temp == NULL ) {
>             printf("drat! calloc returned NULL\n");
>             return 1;
>          }
>          for (int j = 0; j < ALLOCATION_SIZE; j += 4096)
>            ((volatile char *)temp)[j] = (char)i;
>          free(temp);
>      }
>      return 0;
> }
> 
> Results:
> 
> Cygwin: ~310s
> MinGW: ~210s

Good analysis!  There remains a lot of room for improvement, but this
shows good reasons to dig deeper to understand what goes on with large
allocations.

EM


* Re: calloc speed difference
  2018-01-12 20:41       ` Corinna Vinschen
@ 2018-01-12 22:34         ` cyg Simple
  2018-01-13 10:48           ` Lee
  0 siblings, 1 reply; 15+ messages in thread
From: cyg Simple @ 2018-01-12 22:34 UTC
  To: cygwin

On 1/12/2018 3:41 PM, Corinna Vinschen wrote:
> On Jan 12 14:59, cyg Simple wrote:
>> On 1/12/2018 9:33 AM, Corinna Vinschen wrote:
>>> On Jan 12 15:06, Christian Franke wrote:
>>>> Timing [cm]alloc() calls without actually using the allocated memory might
>>>> produce misleading results due to lazy page allocation and/or zero-filling.
>>>>
>>>> MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call
>>>> malloc() and then memset(). It directly calls:
>>>>
>>>>   mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size);
>>>>
>>>> which possibly only reserves allocate-and-zero-fill-on-demand pages for
>>>> later.
>>>>
>>>> Cygwin's calloc() is different.
>>>
>>> But then again, Cygwin's malloc *is* slow, particularly in
>>> memory-demanding multi-threaded scenarios since that serializes all
>>> malloc/free calls.
>>>
>>> The memory handling within Cygwin is tricky.  Attempts to replace good
>>> old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only
>>> means the developer (i.e., me, in case of ptmalloc) was too lazy...
>>> busy! I mean busy... to pull this through.
>>>
>>> Having said that, if somebody would like to take a stab at replacing
>>> dlmalloc with something leaner, I would be very happy and assist as
>>> much as I can.
>>
>> Corina, how reliable is the Cygwin time function on a non-Cygwin
>> executable?  Isn't this a comparison of apples to oranges?
> 
> I wasn't comparing, in fact.  I was just saying that Cygwin's malloc
> is slow, partially because dlmalloc is not the fastest one, partially
> due to the serialization overhead in multithreading scenarios.

No, but the OP *is* doing a comparison.  From what I remember, timing a
non-Cygwin app against a Cygwin app isn't really a logical comparison.
Even if both were Cygwin apps, multiple runs of the same app will show
variance.
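One common way to account for run-to-run variance, offered as a sketch rather than anything proposed in the thread: repeat each build several times and compare medians instead of single runs. Using Lee's two runs per build from the start of the thread, the medians are about 72.4 s (Cygwin) and 11.1 s (MinGW), while each build's own runs differ by well under a second. `median_seconds` is a hypothetical helper.

```c
#include <stdlib.h>

/* qsort() comparison callback for doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sort the timings in place and return their median (n must be > 0).
   The median is less sensitive to one unusually slow run than the mean. */
static double median_seconds(double *t, size_t n)
{
    qsort(t, n, sizeof t[0], cmp_double);
    return (n % 2) ? t[n / 2] : (t[n / 2 - 1] + t[n / 2]) / 2.0;
}
```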

-- 
cyg Simple


* Re: calloc speed difference
  2018-01-12 14:05 ` Christian Franke
  2018-01-12 14:33   ` Corinna Vinschen
  2018-01-12 22:00   ` Eliot Moss
@ 2018-01-13  8:35   ` Lee
  2 siblings, 0 replies; 15+ messages in thread
From: Lee @ 2018-01-13  8:35 UTC
  To: cygwin

On 1/12/18, Christian Franke  wrote:
> Lee wrote:
>> Why is the cygwin gcc calloc so much slower than the
>> i686-w64-mingw32-gcc calloc?
>>    1:12 vs 0:11
>>
>> $cat calloc-test.c
>> #include <stdio.h>
>> #include <stdlib.h>
>> #define ALLOCATION_SIZE (100 * 1024 * 1024)
>> int main (int argc, char *argv[]) {
>>      for (int i = 0; i < 10000; i++) {
>>          void *temp = calloc(ALLOCATION_SIZE, 1);
>>          if ( temp == NULL ) {
>>             printf("drat! calloc returned NULL\n");
>>             return 1;
>>          }
>>          free(temp);
>>      }
>>      return 0;
>> }
>>
>
> Could reproduce the difference on an older i7-2600K machine:
>
> Cygwin: ~20s
> MinGW: ~4s
>
> Timing [cm]alloc() calls without actually using the allocated memory
> might produce misleading results due to lazy page allocation and/or
> zero-filling.
>
> MinGW binaries use calloc() from msvcrt.dll. This calloc() does not call
> malloc() and then memset(). It directly calls:
>
>    mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size);
>
> which possibly only reserves allocate-and-zero-fill-on-demand pages for
> later.

Which seems like it could be viewed as a feature?  Sort of like buying
on credit: you don't pay for it all up front, you just pay a bit each
time you reference another zero-fill-on-demand page.


> Cygwin's calloc() is different.
>
> This variant of the above code adds one write access to each 4KiB page
> (guarded by "volatile" to prevent dead assignment optimization):
>
> #include <stdio.h>
> #include <stdlib.h>
> #define ALLOCATION_SIZE (100 * 1024 * 1024)
> int main (int argc, char *argv[]) {
>      for (int i = 0; i < 1000; i++) {
>          void *temp = calloc(ALLOCATION_SIZE, 1);
>          if ( temp == NULL ) {
>             printf("drat! calloc returned NULL\n");
>             return 1;
>          }
>          for (int j = 0; j < ALLOCATION_SIZE; j += 4096)
>            ((volatile char *)temp)[j] = (char)i;
>          free(temp);
>      }
>      return 0;
> }
>
> Results:
>
> Cygwin: ~310s
> MinGW: ~210s

Wow!  Really nice explanation & example - Thank you.
Lee


* Re: calloc speed difference
  2018-01-12 14:33   ` Corinna Vinschen
  2018-01-12 19:59     ` cyg Simple
@ 2018-01-13 10:04     ` Lee
  1 sibling, 0 replies; 15+ messages in thread
From: Lee @ 2018-01-13 10:04 UTC
  To: cygwin

On 1/12/18, Corinna Vinschen  wrote:
> On Jan 12 15:06, Christian Franke wrote:
>> Lee wrote:
>> > Why is the cygwin gcc calloc so much slower than the
>> > i686-w64-mingw32-gcc calloc?
>> >    1:12 vs 0:11
    <.. snip example prog ..>
>>
>> Could reproduce the difference on an older i7-2600K machine:
>>
>> Cygwin: ~20s
>> MinGW: ~4s
     <.. snip possible explanation ..>
>
> But then again, Cygwin's malloc *is* slow, particularly in
> memory-demanding multi-threaded scenarios since that serializes all
> malloc/free calls.
>
> The memory handling within Cygwin is tricky.  Attempts to replace good
> old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only
> means the developer (i.e., me, in case of ptmalloc) was too lazy...
> busy! I mean busy... to pull this through.
>
> Having said that, if somebody would like to take a stab at replacing
> dlmalloc with something leaner, I would be very happy and assist as
> much as I can.

I just took a quick look at some malloc code & docs and I know enough
to know that I'm not going to be the one taking a stab at replacing
dlmalloc.  Sorry :(

Lee


* Re: calloc speed difference
  2018-01-12 22:34         ` cyg Simple
@ 2018-01-13 10:48           ` Lee
  0 siblings, 0 replies; 15+ messages in thread
From: Lee @ 2018-01-13 10:48 UTC
  To: cygwin

On 1/12/18, cyg Simple  wrote:
> On 1/12/2018 3:41 PM, Corinna Vinschen wrote:
>> On Jan 12 14:59, cyg Simple wrote:
>>> On 1/12/2018 9:33 AM, Corinna Vinschen wrote:
>>>> On Jan 12 15:06, Christian Franke wrote:
>>>>> Timing [cm]alloc() calls without actually using the allocated memory
>>>>> might
>>>>> produce misleading results due to lazy page allocation and/or
>>>>> zero-filling.
>>>>>
>>>>> MinGW binaries use calloc() from msvcrt.dll. This calloc() does not
>>>>> call
>>>>> malloc() and then memset(). It directly calls:
>>>>>
>>>>>   mem = HeapAlloc(_crtheap, HEAP_ZERO_MEMORY, size);
>>>>>
>>>>> which possibly only reserves allocate-and-zero-fill-on-demand pages
>>>>> for
>>>>> later.
>>>>>
>>>>> Cygwin's calloc() is different.
>>>>
>>>> But then again, Cygwin's malloc *is* slow, particularly in
>>>> memory-demanding multi-threaded scenarios since that serializes all
>>>> malloc/free calls.
>>>>
>>>> The memory handling within Cygwin is tricky.  Attempts to replace good
>>>> old dlmalloc with a fresher jemalloc or ptmalloc failed, but that only
>>>> means the developer (i.e., me, in case of ptmalloc) was too lazy...
>>>> busy! I mean busy... to pull this through.
>>>>
>>>> Having said that, if somebody would like to take a stab at replacing
>>>> dlmalloc with something leaner, I would be very happy and assist as
>>>> much as I can.
>>>
>>> Corina, how reliable is the Cygwin time function on a non-Cygwin
>>> executable?  Isn't this a comparison of apples to oranges?

The wall-clock time seems reliable.  Timing a non-Cygwin executable
gives you 0.0 for the user and sys categories, but I care far less
about those than about how long the program takes to
run.
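To compare both builds with one clock, regardless of how the shell's `time` treats a non-Cygwin process, the program can time itself. A minimal sketch, assuming `clock_gettime(CLOCK_MONOTONIC)` is available (it is on Cygwin and Linux; whether a given MinGW toolchain provides it is an assumption). `time_calloc_loop` is a hypothetical helper whose `touch` flag reproduces Christian's one-write-per-4-KiB-page variant.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose clock_gettime() */
#endif
#include <stdlib.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Time `iterations` calloc()/free() pairs of `size` bytes.  If `touch` is
   nonzero, write one byte per 4 KiB page before freeing.
   Returns elapsed wall-clock seconds, or -1.0 if calloc() fails. */
static double time_calloc_loop(size_t size, int iterations, int touch)
{
    double start = now_seconds();
    for (int i = 0; i < iterations; i++) {
        volatile char *p = calloc(size, 1);   /* volatile: keep the writes */
        if (p == NULL)
            return -1.0;
        if (touch)
            for (size_t j = 0; j < size; j += 4096)
                p[j] = (char)i;
        free((void *)p);
    }
    return now_seconds() - start;
}
```

For example, printing `time_calloc_loop(100u * 1024 * 1024, 1000, 1)` from a program built once with each toolchain should give wall-clock figures measured the same way in both cases.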

>> I wasn't comparing, in fact.  I was just saying that Cygwin's malloc
>> is slow, partially because dlmalloc is not the fastest one, partially
>> due to the serialization overhead in multithreading scenarios.
>
> No, but the OP *is* doing a comparison.  From what I remember, timing a
> non-Cygwin app against a Cygwin app isn't really a logical
> comparison.

I'm probably missing something, but.. I have the source code.  I have
the choice of using the cygwin gcc compiler or the mingw cross
compiler.  If both executables produce the same results then comparing
execution times seems perfectly valid.

On the other hand, if the program produces text output (i.e., DOS vs.
Unix line endings), then dealing with DOS line endings in the Cygwin
environment might be enough of a pain that I accept the ease-of-use
vs. execution-time tradeoff and keep everything compatible with Cygwin.
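If line endings are the sticking point, one option, sketched here under the assumption that the output is post-processed as a byte buffer, is to strip the CR from each CRLF pair. Alternatively, the MinGW build can put stdout into binary mode with `_setmode(_fileno(stdout), _O_BINARY)` (from `<io.h>` and `<fcntl.h>`) so that msvcrt does not translate `\n` into `\r\n` in the first place. `crlf_to_lf` is a hypothetical helper.

```c
#include <stddef.h>

/* Convert CRLF line endings to LF in place; returns the new length.
   A lone '\r' that is not followed by '\n' is left unchanged. */
static size_t crlf_to_lf(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;               /* drop the CR of a CRLF pair */
        buf[out++] = buf[in];
    }
    return out;
}
```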

>  Even if both were Cygwin apps, multiple runs of
> the same app will show variance.

But not the seconds vs. minutes difference that I occasionally see
when comparing cygwin vs. native app performance.

Regards,
Lee


* Re: calloc speed difference
  2018-01-12 10:52   ` Lee
@ 2018-01-21 11:01     ` Marco Atzeri
  0 siblings, 0 replies; 15+ messages in thread
From: Marco Atzeri @ 2018-01-21 11:01 UTC
  To: cygwin

On 12/01/2018 18:52, Lee wrote:
> On 1/12/18, Marco Atzeri wrote:
>> On 12/01/2018 08:19, Lee wrote:
> 
> 
>> I get roughly the same for both the 32-bit and 64-bit Cygwin versions on W7-64:
> 
> which flavor of gcc - the cygwin version that builds an executable
> that pulls in the posix emulation layer or the mingw version that
> builds an executable that runs "native" windows code?
> 
> Thanks,
> Lee

As I wrote, the Cygwin version.

Regards
Marco




Thread overview: 15+ messages
2018-01-12  7:19 calloc speed difference Lee
2018-01-12  8:38 ` Eliot Moss
2018-01-12  9:07 ` Marco Atzeri
2018-01-12 10:52   ` Lee
2018-01-21 11:01     ` Marco Atzeri
2018-01-12 14:05 ` Christian Franke
2018-01-12 14:33   ` Corinna Vinschen
2018-01-12 19:59     ` cyg Simple
2018-01-12 20:07       ` cyg Simple
2018-01-12 20:41       ` Corinna Vinschen
2018-01-12 22:34         ` cyg Simple
2018-01-13 10:48           ` Lee
2018-01-13 10:04     ` Lee
2018-01-12 22:00   ` Eliot Moss
2018-01-13  8:35   ` Lee
