public inbox for cygwin@cygwin.com
* Re: /proc/cpuinfo vs. processor groups
       [not found] <878t9vt3vs.fsf@Rainer.invalid>
@ 2018-04-11  7:02 ` Corinna Vinschen
  2018-04-11  9:28   ` Corinna Vinschen
  0 siblings, 1 reply; 4+ messages in thread
From: Corinna Vinschen @ 2018-04-11  7:02 UTC (permalink / raw)
  To: cygwin; +Cc: cygwin-apps


Thanks for the report, but this belongs on the cygwin ML.
I'm redirecting it here.


Corinna

On Apr 10 18:36, Achim Gratz wrote:
> 
> As briefly discussed on IRC, I've got a new Server 2016 blade with 2
> sockets × 8 cores × 2 HT = 32 logical processors, and Cygwin spews
> errors for processor ID 16 and up (top doesn't quite work either,
> likely for the same reason, although the code path may be unrelated
> to the /proc/cpuinfo bug described here).
> 
> --8<---------------cut here---------------start------------->8---
> 64bit (166)~ > cat /proc/cpuinfo
>       0 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(10000,0 (10/16)) failed Win32 error 87
>     209 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(20000,0 (11/17)) failed Win32 error 87
>     913 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(40000,0 (12/18)) failed Win32 error 87
>    1047 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(80000,0 (13/19)) failed Win32 error 87
>    1151 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(100000,0 (14/20)) failed Win32 error 87
>    1266 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(200000,0 (15/21)) failed Win32 error 87
>    1383 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(400000,0 (16/22)) failed Win32 error 87
>    1479 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(800000,0 (17/23)) failed Win32 error 87
>    1573 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(1000000,0 (18/24)) failed Win32 error 87
>    1675 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(2000000,0 (19/25)) failed Win32 error 87
>    1806 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(4000000,0 (1A/26)) failed Win32 error 87
>    1888 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(8000000,0 (1B/27)) failed Win32 error 87
>    1971 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(10000000,0 (1C/28)) failed Win32 error 87
>    2069 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(20000000,0 (1D/29)) failed Win32 error 87
>    2154 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(40000000,0 (1E/30)) failed Win32 error 87
>    2247 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(80000000,0 (1F/31)) failed Win32 error 87
> --8<---------------cut here---------------end--------------->8---
> 
> It turns out this is related to processor groups and some changes that
> probably weren't even in the making when the Cygwin code was written.
> These changes were opt-in patches up to Server 2008 R2, but are now the
> default in Server 2016:
> 
> https://blogs.msdn.microsoft.com/saponsqlserver/2011/10/08/uneven-windows-processor-groups/
> 
> The BIOS on that server does something rather peculiar (it does make
> sense in a way, but Cygwin clearly didn't expect it):
> 
> https://support.hpe.com/hpsc/doc/public/display?sp4ts.oid=7271227&docId=emr_na-c04650594&docLocale=en_US
> 
> This results in Windows coming up with two 64-core processor groups that
> have 16 active logical processors each:
> 
> --8<---------------cut here---------------start------------->8---
> (gdb) print plpi
> $1 = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX) 0x600008020
> (gdb) print *plpi
> $2 = {Relationship = RelationGroup, Size = 128, {Processor = {Flags = 2 '\002', Reserved = "\000\002", '\000' <repeats 18 times>, GroupCount = 0,
>       GroupMask = {{Mask = 4160, Group = 0, Reserved = {0, 0, 0}}}}, NumaNode = {NodeNumber = 131074, Reserved = '\000' <repeats 19 times>,
>       GroupMask = {Mask = 4160, Group = 0, Reserved = {0, 0, 0}}}, Cache = {Level = 2 '\002', Associativity = 0 '\000', LineSize = 2, CacheSize = 0,
>       Type = CacheUnified, Reserved = '\000' <repeats 12 times>, "@\020\000\000\000\000\000", GroupMask = {Mask = 0, Group = 0, Reserved = {0, 0,
>           0}}}, Group = {MaximumGroupCount = 2, ActiveGroupCount = 2, Reserved = '\000' <repeats 19 times>, GroupInfo = {{
>           MaximumProcessorCount = 64 '@', ActiveProcessorCount = 16 '\020', Reserved = '\000' <repeats 37 times>, ActiveProcessorMask = 65535}}}}}
> --8<---------------cut here---------------end--------------->8---
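> 
> The same record can be retrieved with nothing but documented Win32
> calls; here's a minimal, untested sketch (this is not the Cygwin code)
> that prints the per-group counts and masks:
> 
> --8<---------------cut here---------------start------------->8---
> #include <windows.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> int main (void)
> {
>   DWORD len = 0;
> 
>   /* First call fails with ERROR_INSUFFICIENT_BUFFER and sets len. */
>   GetLogicalProcessorInformationEx (RelationGroup, NULL, &len);
>   PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX plpi = malloc (len);
>   if (!plpi || !GetLogicalProcessorInformationEx (RelationGroup, plpi, &len))
>     return 1;
>   /* RelationGroup yields a single record describing all groups. */
>   for (WORD g = 0; g < plpi->Group.ActiveGroupCount; g++)
>     {
>       PROCESSOR_GROUP_INFO *gi = &plpi->Group.GroupInfo[g];
>       printf ("group %u: max %u, active %u, mask %llx\n", (unsigned) g,
>               (unsigned) gi->MaximumProcessorCount,
>               (unsigned) gi->ActiveProcessorCount,
>               (unsigned long long) gi->ActiveProcessorMask);
>     }
>   free (plpi);
>   return 0;
> }
> --8<---------------cut here---------------end--------------->8---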
> 
> I've confirmed that the error message is not printed if I manually
> correct the information for processor ID 17 as follows:
> 
> --8<---------------cut here---------------start------------->8---
> (gdb) print affinity
> $2 = {Mask = 131072, Group = 0, Reserved = {0, 0, 0}}
> (gdb) set affinity.Mask=2
> (gdb) set affinity.Group=1
> --8<---------------cut here---------------end--------------->8---
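> 
> Spelled out as code, the failing and the corrected call amount to
> something like this (untested sketch):
> 
> --8<---------------cut here---------------start------------->8---
> #include <windows.h>
> #include <stdio.h>
> 
> int main (void)
> {
>   GROUP_AFFINITY ga = { 0 };    /* Reserved[] must stay zero */
> 
>   /* What Cygwin tried: bit 17 of group 0, which is beyond the 16
>      active CPUs of that group. */
>   ga.Group = 0;
>   ga.Mask = (KAFFINITY) 1 << 17;
>   if (!SetThreadGroupAffinity (GetCurrentThread (), &ga, NULL))
>     printf ("error %lu\n", GetLastError ());  /* 87, invalid parameter */
> 
>   /* The manual gdb fix: the same logical CPU is bit 1 of group 1. */
>   ga.Group = 1;
>   ga.Mask = (KAFFINITY) 1 << 1;
>   if (!SetThreadGroupAffinity (GetCurrentThread (), &ga, NULL))
>     printf ("error %lu\n", GetLastError ());  /* not reached */
>   return 0;
> }
> --8<---------------cut here---------------end--------------->8---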
> 
> However, the same or possibly even stranger processor group setups can
> be created using boot options that force different organizations of
> processor groups.  There is an option to force a separate processor
> group for each NUMA node and another one to force a specific number of
> groups.  The upshot is that even the first processor groups may not
> have the maximum number of processors present, so you need to check
> the number of active processors instead.  I couldn't find out whether
> the processor mask is still guaranteed to be filled contiguously from
> the LSB, or whether one can rely on only the last group having fewer
> processors than the first few.  It seems more prudent to check the
> group-specific ActiveProcessorMask, although that significantly
> complicates the code.  I don't think Windows can currently switch CPUs
> online/offline once booted.
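> 
> Something along these lines might do; an untested sketch, where
> cpu_to_group_affinity is a made-up name, not anything from the Cygwin
> source:
> 
> --8<---------------cut here---------------start------------->8---
> #include <windows.h>
> #include <string.h>
> 
> /* Map a /proc/cpuinfo CPU index to a GROUP_AFFINITY by walking each
>    group's ActiveProcessorMask bit by bit, so neither a full first
>    group nor a contiguously filled mask is assumed. */
> static BOOL
> cpu_to_group_affinity (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX plpi,
>                        DWORD cpu, GROUP_AFFINITY *ga)
> {
>   DWORD seen = 0;
> 
>   memset (ga, 0, sizeof *ga);
>   for (WORD g = 0; g < plpi->Group.ActiveGroupCount; g++)
>     {
>       KAFFINITY mask = plpi->Group.GroupInfo[g].ActiveProcessorMask;
>       for (unsigned bit = 0; bit < 8 * sizeof (KAFFINITY); bit++)
>         if (((mask >> bit) & 1) && seen++ == cpu)
>           {
>             ga->Group = g;
>             ga->Mask = (KAFFINITY) 1 << bit;
>             return TRUE;
>           }
>     }
>   return FALSE;   /* CPU index beyond the active CPU count */
> }
> --8<---------------cut here---------------end--------------->8---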
> 
> 
> As an aside, the cache size is reported as 256 KiB (not just for this
> processor, but also for a Celeron 1037U on another machine), which seems
> to be the L2 cache of a single hardware core on these architectures.
> Linux now reports L3 cache sizes (and possibly L4 if present) for these
> (20 MiB and 2 MiB per socket, respectively).
> 
> 
> Regards,
> Achim.
> --
> +<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+
> 
> Factory and User Sound Singles for Waldorf Blofeld:
> http://Synth.Stromeko.net/Downloads.html#WaldorfSounds

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat



* Re: /proc/cpuinfo vs. processor groups
  2018-04-11  7:02 ` /proc/cpuinfo vs. processor groups Corinna Vinschen
@ 2018-04-11  9:28   ` Corinna Vinschen
  2018-04-11 10:49     ` Corinna Vinschen
  0 siblings, 1 reply; 4+ messages in thread
From: Corinna Vinschen @ 2018-04-11  9:28 UTC (permalink / raw)
  To: cygwin


On Apr 11 09:02, Corinna Vinschen wrote:
> Thanks for the report, but this belongs on the cygwin ML.
> I'm redirecting it here.
> 
> 
> Corinna
> 
> On Apr 10 18:36, Achim Gratz wrote:
> > 
> > As briefly discussed on IRC, I've got a new Server 2016 blade with 2
> > sockets × 8 cores × 2 HT = 32 logical processors, and Cygwin spews
> > errors for processor ID 16 and up (top doesn't quite work either,
> > likely for the same reason, although the code path may be unrelated
> > to the /proc/cpuinfo bug described here).
> > 
> > --8<---------------cut here---------------start------------->8---
> > 64bit (166)~ > cat /proc/cpuinfo
> >       0 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(10000,0 (10/16)) failed Win32 error 87
> > [...]
> > This results in Windows coming up with two 64-core processor groups that
> > have 16 active logical processors each:
> > 
> > --8<---------------cut here---------------start------------->8---
> > (gdb) print plpi
> > $1 = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX) 0x600008020
> > (gdb) print *plpi
> > $2 = {Relationship = RelationGroup, Size = 128, {Processor = {Flags = 2 '\002', Reserved = "\000\002", '\000' <repeats 18 times>, GroupCount = 0,
> >       GroupMask = {{Mask = 4160, Group = 0, Reserved = {0, 0, 0}}}}, NumaNode = {NodeNumber = 131074, Reserved = '\000' <repeats 19 times>,
> >       GroupMask = {Mask = 4160, Group = 0, Reserved = {0, 0, 0}}}, Cache = {Level = 2 '\002', Associativity = 0 '\000', LineSize = 2, CacheSize = 0,
> >       Type = CacheUnified, Reserved = '\000' <repeats 12 times>, "@\020\000\000\000\000\000", GroupMask = {Mask = 0, Group = 0, Reserved = {0, 0,
> >           0}}}, Group = {MaximumGroupCount = 2, ActiveGroupCount = 2, Reserved = '\000' <repeats 19 times>, GroupInfo = {{
> >           MaximumProcessorCount = 64 '@', ActiveProcessorCount = 16 '\020', Reserved = '\000' <repeats 37 times>, ActiveProcessorMask = 65535}}}}}
> > --8<---------------cut here---------------end--------------->8---
> > 
> > I've confirmed that the error message is not printed if I manually
> > correct the information for processor ID 17 as follows:
> > [...]

I'm a bit puzzled about the connection between MaximumProcessorCount
and ActiveProcessorCount here.  Why isn't MaximumProcessorCount 16
as well?  Setting it to 64 doesn't make any sense for a system with
32 logical CPUs in total.

I'm not sure just simply using ActiveProcessorCount rather than
MaximumProcessorCount is the right thing to do...
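
For clarity, the simple variant would look roughly like this (a sketch,
not the actual patch; note that it silently assumes each group's mask
is filled from bit 0 and does no bounds checking):

--8<---------------cut here---------------start------------->8---
#include <windows.h>
#include <string.h>

/* Advance through the groups by their ActiveProcessorCount and assume
   the active CPUs occupy the low bits of each group's mask. */
static void
cpu_to_group_affinity_simple (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX plpi,
                              DWORD cpu, GROUP_AFFINITY *ga)
{
  WORD g = 0;

  while (cpu >= plpi->Group.GroupInfo[g].ActiveProcessorCount)
    cpu -= plpi->Group.GroupInfo[g++].ActiveProcessorCount;
  memset (ga, 0, sizeof *ga);
  ga->Group = g;
  ga->Mask = (KAFFINITY) 1 << cpu;
}
--8<---------------cut here---------------end--------------->8---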

> > As an aside, the cache size is reported as 256 KiB (not just for this
> > processor, but also for a Celeron 1037U on another machine), which seems
> > to be the L2 cache of a single hardware core on these architectures.
> > Linux now reports L3 cache sizes (and possibly L4 if present) for these
> > (20 MiB and 2 MiB per socket, respectively).

L3 is easy.  Checking the Linux kernel source I don't see that it
reports L4.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat



* Re: /proc/cpuinfo vs. processor groups
  2018-04-11  9:28   ` Corinna Vinschen
@ 2018-04-11 10:49     ` Corinna Vinschen
  2018-04-11 17:03       ` Achim Gratz
  0 siblings, 1 reply; 4+ messages in thread
From: Corinna Vinschen @ 2018-04-11 10:49 UTC (permalink / raw)
  To: cygwin


On Apr 11 11:28, Corinna Vinschen wrote:
> On Apr 11 09:02, Corinna Vinschen wrote:
> > On Apr 10 18:36, Achim Gratz wrote:
> > > As briefly discussed on IRC, I've got a new Server 2016 blade with 2
> > > sockets × 8 cores × 2 HT = 32 logical processors, and Cygwin spews
> > > errors for processor ID 16 and up (top doesn't quite work either,
> > > likely for the same reason, although the code path may be unrelated
> > > to the /proc/cpuinfo bug described here).
> > > 
> > > --8<---------------cut here---------------start------------->8---
> > > 64bit (166)~ > cat /proc/cpuinfo
> > >       0 [main] cat 10068 format_proc_cpuinfo: SetThreadGroupAffinity(10000,0 (10/16)) failed Win32 error 87
> > > [...]
> 
> I'm a bit puzzled about the connection between MaximumProcessorCount
> and ActiveProcessorCount here.  Why isn't MaximumProcessorCount 16
> as well?  Setting it to 64 doesn't make any sense for a system with
> 32 logical CPUs in total.
> 
> I'm not sure just simply using ActiveProcessorCount rather than
> MaximumProcessorCount is the right thing to do...

Nevertheless I pushed a patch doing just that, plus...
> 
> > > As an aside, the cache size is reported as 256 KiB (not just for this
> > > processor, but also for a Celeron 1037U on another machine), which seems
> > > to be the L2 cache of a single hardware core on these architectures.
> > > Linux now reports L3 cache sizes (and possibly L4 if present) for these
> > > (20 MiB and 2 MiB per socket, respectively).
> 
> L3 is easy.  Checking the Linux kernel source I don't see that it
> reports L4.

...L3 reporting for Intel CPUs.  I'm just building a new developer
snapshot I'll upload to https://cygwin.com/snapshots/ shortly.  Please
give it a try.
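
For the record, the L3 size on Intel CPUs can be read from CPUID leaf 4
("deterministic cache parameters"); roughly like this untested sketch,
which is not a quote from the patch:

--8<---------------cut here---------------start------------->8---
#include <stdio.h>
#include <cpuid.h>

int main (void)
{
  unsigned eax, ebx, ecx, edx;

  /* One subleaf per cache; type 0 terminates the list.  The size is
     ways * partitions * line size * sets, each field stored minus 1. */
  for (unsigned sub = 0; ; sub++)
    {
      if (!__get_cpuid_count (4, sub, &eax, &ebx, &ecx, &edx)
          || (eax & 0x1f) == 0)
        break;
      unsigned long long size = (unsigned long long)
          (((ebx >> 22) & 0x3ff) + 1)   /* ways */
        * (((ebx >> 12) & 0x3ff) + 1)   /* partitions */
        * ((ebx & 0xfff) + 1)           /* line size */
        * (ecx + 1);                    /* sets */
      printf ("L%u cache: %llu KiB\n", (eax >> 5) & 0x7, size / 1024);
    }
  return 0;
}
--8<---------------cut here---------------end--------------->8---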


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat



* Re: /proc/cpuinfo vs. processor groups
  2018-04-11 10:49     ` Corinna Vinschen
@ 2018-04-11 17:03       ` Achim Gratz
  0 siblings, 0 replies; 4+ messages in thread
From: Achim Gratz @ 2018-04-11 17:03 UTC (permalink / raw)
  To: cygwin

Corinna Vinschen writes:
>> I'm a bit puzzled about the connection between MaximumProcessorCount
>> and ActiveProcessorCount here.  Why isn't MaximumProcessorCount 16
>> as well?  Setting it to 64 doesn't make any sense for a system with
>> 32 logical CPUs in total.

The way I understand it, this is basically a lie by the BIOS to get
the desired effect of Windows creating two processor groups that
neatly separate along the NUMA boundary (which here means the two
sockets).  With newer Windows versions (or certain patches applied to
the older ones) that would not be necessary (and would probably create
processor groups where the number of active processors equals the
maximum number of processors).  I still don't know whether it's
possible to create shifted or discontinuous processor maps, but quite
a few programs would presumably stop working if that happened, so it
can't be common.

>> I'm not sure just simply using ActiveProcessorCount rather than
>> MaximumProcessorCount is the right thing to do...
>
> Nevertheless I pushed a patch doing just that, plus...
>> 
>> > > As an aside, the cache size is reported as 256 KiB (not just for this
>> > > processor, but also for a Celeron 1037U on another machine), which seems
>> > > to be the L2 cache of a single hardware core on these architectures.
>> > > Linux now reports L3 cache sizes (and possibly L4 if present) for these
>> > > (20 MiB and 2 MiB per socket, respectively).
>> 
>> L3 is easy.  Checking the Linux kernel source I don't see that it
>> reports L4.
>
> ...L3 reporting for Intel CPUs.  I'm just building a new developer
> snapshot I'll upload to https://cygwin.com/snapshots/ shortly.  Please
> give it a try.

Just as I had my own patch ready... :-)  Not only did I get a beefier
CPU than requested, I also got a large enough data disk configured
this time, so I can compile Cygwin from source again.  I can confirm
this works as expected now.  I think the flags indicating the presence
of the new barrier instructions are still missing.  It would also be
nice if the microcode patch level could be exposed, but I don't even
know whether it's accessible from userspace on Windows.


There are other applications that still don't work (at all or
correctly) when more than one processor group is present, so I'll have
our hardware admins change the BIOS settings to "flat" mode in about a
week or so.  I might be able to play a bit with the boot options to
create processor groups in the Windows kernel if I'm allowed to change
these myself (haven't asked yet), but if you need more info from the
system in its current configuration, please let me know.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptations for KORG EX-800 and Poly-800MkII V0.9:
http://Synth.Stromeko.net/Downloads.html#KorgSDada


