public inbox for cygwin@cygwin.com
* C++ program using <thread>
From: André Bleau @ 2021-06-11  1:14 UTC
  To: cygwin

Hi all,

I have a small C++ program using <thread>. Mainly, I have a series of long tasks for which I can use a chosen number of threads. I create a given number of instances of the std::thread class, each executing the same function but with different data. After launching the threads, my program waits for the results by calling thread.join() on each thread. Some threads finish sooner than others, depending on the data they process.

I have 16 cores with hyperthreading, so 32 virtual cores. I tried a big run with 63 threads.

If I compile as a Cygwin program with g++ 10.2.0, when the program runs with 63 threads the CPU load for the program never exceeds 6 or 7%. It remains constant as threads finish, until the number of threads remaining goes down to about 4.

If I compile as a Mingw program with x86_64-w64-mingw32-c++ 10.2.0, when the program runs with 63 threads the CPU load for the program reaches 98% then gradually diminishes as threads finish their task.

Needless to say, the total running time for the Mingw version is much shorter.

So, is there a limit to the CPU load that a threaded Cygwin program can get? If yes, how can it be changed?

Regards,

- André Bleau


* Re: C++ program using <thread>
From: Mark Geisert @ 2021-06-11  5:52 UTC
  To: cygwin

Hi André,

André Bleau via Cygwin wrote:
> Hi all,
> 
> I have a small C++ program using <thread>. Mainly, I have a series of long tasks for which I can use a chosen number of threads. I create a given number of instances of the std::thread class, each executing the same function but with different data. After launching the threads, my program waits for the results by calling thread.join() on each thread. Some threads finish sooner than others, depending on the data they process.
> 
> I have 16 cores with hyperthreading, so 32 virtual cores. I tried a big run with 63 threads.
> 
> If I compile as a Cygwin program with g++ 10.2.0, when the program runs with 63 threads the CPU load for the program never exceeds 6 or 7%. It remains constant as threads finish, until the number of threads remaining goes down to about 4.
> 
> If I compile as a Mingw program with x86_64-w64-mingw32-c++ 10.2.0, when the program runs with 63 threads the CPU load for the program reaches 98% then gradually diminishes as threads finish their task.
> 
> Needless to say, the total running time for the Mingw version is much shorter.
> 
> So, is there a limit to the CPU load that a threaded Cygwin program can get? If yes, how can it be changed?

Fascinating report; thanks!  I doubt there's an intentional limit, but perhaps 
there is some difference between the Cygwin and MinGW environments that explains 
the symptoms.  No idea at this point though.

Oh, does your program set process or thread affinity anywhere?

Would you be able to post the source of your test program, or provide it to me by 
private email?  I would run my cygmon profiler on it to look for clues.  There 
have been several other reports of multi-threaded programs taking far too long to 
complete, but the reports came in as possibly malloc-related issues so they may 
differ from yours.  But cygmon has found something troubling in those cases that 
I'm still investigating.

Regards,

..mark


* Re: C++ program using <thread>
From: André Bleau @ 2021-06-12  3:04 UTC
  To: cygwin

Mark Geisert wrote:

>Hi André,
>
>Fascinating report; thanks!  I doubt there's an intentional limit, but perhaps
>there is some difference between the Cygwin and MinGW environments that explains
>the symptoms.  No idea at this point though.
>
>Oh, does your program set process or thread affinity anywhere?
>
>Would you be able to post the source of your test program, or provide it to me by
>private email?  I would run my cygmon profiler on it to look for clues.  There
>have been several other reports of multi-threaded programs taking far too long to
>complete, but the reports came in as possibly malloc-related issues so they may
>differ from yours.  But cygmon has found something troubling in those cases that
>I'm still investigating.
>Regards,
>
>..mark

Thanks for your interest in my problem, Mark.

I have tried to build a simple test case (STC) with the same thread structure as my program. In that STC, the threaded function does no allocation and uses nothing from the standard library. With the STC, the Cygwin and MinGW versions behave similarly to each other as the number of threads increases. So I guess the CPU-usage ceiling I see in the threaded Cygwin build comes either from memory allocation or from something in std. I will add more to the test program's threaded function to find the culprit.

Regards, and sorry to have first responded to you personally instead of responding to the list. Outlook.com is somewhat awkward with mailing lists.

- André Bleau
