public inbox for gcc@gcc.gnu.org
* Will GCC eventually support SSE2 or SSE4.1?
@ 2023-05-26  6:46 Stefan Kanthak
  2023-05-26  7:00 ` Andrew Pinski
                   ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26  6:46 UTC (permalink / raw)
  To: gcc

Hi,

compile the following function on a system with Core2 processor
(released January 2008) for the 32-bit execution environment:

--- demo.c ---
int ispowerof2(unsigned long long argument)
{
    return (argument & argument - 1) == 0;
}
--- EOF ---

GCC 13.3: gcc -m32 -O3 demo.c

NOTE: -mtune=native is the default!

# https://godbolt.org/z/b43cjGdY9
ispowerof2(unsigned long long):
        movq    xmm1, [esp+4]
        pcmpeqd xmm0, xmm0
        paddq   xmm0, xmm1
        pand    xmm0, xmm1
        movd    edx, xmm0      #    pxor    xmm1, xmm1
        psrlq   xmm0, 32       #    pcmpeqb xmm0, xmm1
        movd    eax, xmm0      #    pmovmskb eax, xmm0
        or      edx, eax       #    cmp     al, 255
        sete    al             #    sete    al
        movzx   eax, al        #
        ret

11 instructions in 40 bytes    # 10 instructions in 36 bytes

OOPS: why does GCC (ab)use the SSE2 alias "Willamette New Instruction Set"
      here instead of the native SSE4.1 alias "Penryn New Instruction Set"
      of the Core2 (and all later processors)?

OUCH: why does it FAIL to REALLY use SSE2, as shown in the comments on the
      right side?


Now add the -mtune=core2 option to EXPLICITLY enable the NATIVE SSE4.1
alias "Penryn New Instruction Set" of the Core2 processor:

GCC 13.3: gcc -m32 -mtune=core2 -O3 demo.c

# https://godbolt.org/z/svhEoYT11
ispowerof2(unsigned long long):
                               #    xor      eax, eax
        movq    xmm1, [esp+4]  #    movq     xmm1, [esp+4]
        pcmpeqd xmm0, xmm0     #    pcmpeqq  xmm0, xmm0
        paddq   xmm0, xmm1     #    paddq    xmm0, xmm1
        pand    xmm0, xmm1     #    ptest    xmm0, xmm1
        movd    edx, xmm0      #
        psrlq   xmm0, 32       #
        movd    eax, xmm0      #
        or      edx, eax       #
        sete    al             #    sete     al
        movzx   eax, al        #
        ret                    #    ret

11 instructions in 40 bytes    # 7 instructions in 26 bytes

OUCH: GCC FAILS to use SSE4.1 as shown in the comments on the right side.
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Last compile with -mtune=i386 for the i386 processor:

GCC 13.3: gcc -m32 -mtune=i386 -O3 demo.c

# https://godbolt.org/z/e76W6dsMj
ispowerof2(unsigned long long):
        push    ebx            #
        mov     ecx, [esp+8]   #    mov    eax, [esp+4]
        mov     ebx, [esp+12]  #    mov    edx, [esp+8]
        mov     eax, ecx       #
        mov     edx, ebx       #
        add     eax, -1        #    add    eax, -1
        adc     edx, -1        #    adc    edx, -1
        and     eax, ecx       #    and    eax, [esp+4]
        and     edx, ebx       #    and    edx, [esp+8]
        or      eax, edx       #    or     eax, edx
        sete    al             #    neg    eax
        movzx   eax, al        #    sbb    eax, eax
        pop     ebx            #    inc    eax
        ret                    #    ret

14 instructions in 33 bytes    # 11 instructions in 32 bytes

OUCH: why does GCC abuse EBX (and ECX too) and perform a superfluous
      memory write?
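
For reference, the integer sequence in the right-hand column above (add/adc
on the two halves, then and/or) can be modelled in C roughly as follows.
This is only a sketch: the helper name ispowerof2_halves and the split of the
argument into lo/hi words are assumptions made for illustration, not
something GCC or the original post defines.

```c
#include <stdint.h>

/* The function from demo.c, kept as the reference implementation. */
int ispowerof2(unsigned long long argument)
{
    return (argument & argument - 1) == 0;
}

/* Hypothetical helper: the same test on two 32-bit halves, mirroring
   the add/adc/and/or sequence of the -m32 integer code. */
int ispowerof2_halves(uint32_t lo, uint32_t hi)
{
    uint32_t lo1 = lo - 1u;          /* add eax, -1                      */
    uint32_t hi1 = hi - (lo == 0u);  /* adc edx, -1: borrow iff lo == 0  */
    return ((lo & lo1) | (hi & hi1)) == 0u;
}
```

Note that both versions also return 1 for an argument of 0, since
0 & (0 - 1) is 0.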


Stefan Kanthak


* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  6:46 Will GCC eventually support SSE2 or SSE4.1? Stefan Kanthak
@ 2023-05-26  7:00 ` Andrew Pinski
  2023-05-26  7:30   ` Jonathan Wakely
  2023-05-26 11:34 ` Nicholas Vinson
  2023-05-26 15:10 ` LIU Hao
  2 siblings, 1 reply; 43+ messages in thread
From: Andrew Pinski @ 2023-05-26  7:00 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: gcc

On Thu, May 25, 2023 at 11:56 PM Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>
> Hi,
>
> compile the following function on a system with Core2 processor
> (released January 2008) for the 32-bit execution environment:
>
> --- demo.c ---
> int ispowerof2(unsigned long long argument)
> {
>     return (argument & argument - 1) == 0;
> }
> --- EOF ---
>
> GCC 13.3: gcc -m32 -O3 demo.c
>
> NOTE: -mtune=native is the default!

You need to use -march=native and not -mtune=native .... to turn on
the architecture features.

Thanks,
Andrew

>
> # https://godbolt.org/z/b43cjGdY9
> ispowerof2(unsigned long long):
>         movq    xmm1, [esp+4]
>         pcmpeqd xmm0, xmm0
>         paddq   xmm0, xmm1
>         pand    xmm0, xmm1
>         movd    edx, xmm0      #    pxor    xmm1, xmm1
>         psrlq   xmm0, 32       #    pcmpeqb xmm0, xmm1
>         movd    eax, xmm0      #    pmovmskb eax, xmm0
>         or      edx, eax       #    cmp     al, 255
>         sete    al             #    sete    al
>         movzx   eax, al        #
>         ret
>
> 11 instructions in 40 bytes    # 10 instructions in 36 bytes
>
> OOPS: why does GCC (ab)use the SSE2 alias "Willamette New Instruction Set"
>       here instead of the native SSE4.1 alias "Penryn New Instruction Set"
>       of the Core2 (and all later processors)?
>
> OUCH: why does it FAIL to REALLY use SSE2, as shown in the comments on the
>       right side?
>
>
> Now add the -mtune=core2 option to EXPLICITLY enable the NATIVE SSE4.1
> alias "Penryn New Instruction Set" of the Core2 processor:
>
> GCC 13.3: gcc -m32 -mtune=core2 -O3 demo.c
>
> # https://godbolt.org/z/svhEoYT11
> ispowerof2(unsigned long long):
>                                #    xor      eax, eax
>         movq    xmm1, [esp+4]  #    movq     xmm1, [esp+4]
>         pcmpeqd xmm0, xmm0     #    pcmpeqq  xmm0, xmm0
>         paddq   xmm0, xmm1     #    paddq    xmm0, xmm1
>         pand    xmm0, xmm1     #    ptest    xmm0, xmm1
>         movd    edx, xmm0      #
>         psrlq   xmm0, 32       #
>         movd    eax, xmm0      #
>         or      edx, eax       #
>         sete    al             #    sete     al
>         movzx   eax, al        #
>         ret                    #    ret
>
> 11 instructions in 40 bytes    # 7 instructions in 26 bytes
>
> OUCH: GCC FAILS to use SSE4.1 as shown in the comments on the right side.
>       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Last compile with -mtune=i386 for the i386 processor:
>
> GCC 13.3: gcc -m32 -mtune=i386 -O3 demo.c
>
> # https://godbolt.org/z/e76W6dsMj
> ispowerof2(unsigned long long):
>         push    ebx            #
>         mov     ecx, [esp+8]   #    mov    eax, [esp+4]
>         mov     ebx, [esp+12]  #    mov    edx, [esp+8]
>         mov     eax, ecx       #
>         mov     edx, ebx       #
>         add     eax, -1        #    add    eax, -1
>         adc     edx, -1        #    adc    edx, -1
>         and     eax, ecx       #    and    eax, [esp+4]
>         and     edx, ebx       #    and    edx, [esp+8]
>         or      eax, edx       #    or     eax, edx
>         sete    al             #    neg    eax
>         movzx   eax, al        #    sbb    eax, eax
>         pop     ebx            #    inc    eax
>         ret                    #    ret
>
> 14 instructions in 33 bytes    # 11 instructions in 32 bytes
>
> OUCH: why does GCC abuse EBX (and ECX too) and perform a superfluous
>       memory write?
>
>
> Stefan Kanthak


* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  7:00 ` Andrew Pinski
@ 2023-05-26  7:30   ` Jonathan Wakely
  2023-05-26  7:58     ` Stefan Kanthak
  2023-05-26  8:12     ` Hagen Paul Pfeifer
  0 siblings, 2 replies; 43+ messages in thread
From: Jonathan Wakely @ 2023-05-26  7:30 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: gcc, Andrew Pinski


On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, <gcc@gcc.gnu.org> wrote:

> On Thu, May 25, 2023 at 11:56 PM Stefan Kanthak <stefan.kanthak@nexgo.de>
> wrote:
> >
> > Hi,
> >
> > compile the following function on a system with Core2 processor
> > (released January 2008) for the 32-bit execution environment:
> >
> > --- demo.c ---
> > int ispowerof2(unsigned long long argument)
> > {
> >     return (argument & argument - 1) == 0;
> > }
> > --- EOF ---
> >
> > GCC 13.3: gcc -m32 -O3 demo.c
> >
> > NOTE: -mtune=native is the default!
>
> You need to use -march=native and not -mtune=native .... to turn on
> the architecture features.



Yes this is just user error. You didn't use the right options to say you
want SSE2. GCC supports it fine already.

This is also the wrong mailing list for this kind of question, please use
gcc-help@gcc.gnu.org for this kind of thing, thanks.


* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  7:30   ` Jonathan Wakely
@ 2023-05-26  7:58     ` Stefan Kanthak
  2023-05-26  8:16       ` Sam James
  2023-05-26  8:28       ` Jonathan Wakely
  2023-05-26  8:12     ` Hagen Paul Pfeifer
  1 sibling, 2 replies; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26  7:58 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: gcc, Andrew Pinski

"Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:

> On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, <gcc@gcc.gnu.org> wrote:
>
>> On Thu, May 25, 2023 at 11:56 PM Stefan Kanthak <stefan.kanthak@nexgo.de>
>> wrote:
>>>
>>> Hi,
>>>
>>> compile the following function on a system with Core2 processor
>>> (released January 2008) for the 32-bit execution environment:
>>>
>>> --- demo.c ---
>>> int ispowerof2(unsigned long long argument)
>>> {
>>>     return (argument & argument - 1) == 0;
>>> }
>>> --- EOF ---
>>>
>>> GCC 13.3: gcc -m32 -O3 demo.c
>>>
>>> NOTE: -mtune=native is the default!
>>
>> You need to use -march=native and not -mtune=native .... to turn on
>> the architecture features.

(Un)fortunately this changes nothing!

STOP: that's wrong, it makes it even WORSE!

# Compilation provided by Compiler Explorer at https://godbolt.org/
ispowerof2(unsigned long long):
        vmovq   xmm1, QWORD PTR [esp+4]
        vpcmpeqd        xmm0, xmm0, xmm0
        xor     eax, eax
        vpaddq  xmm0, xmm1, xmm0
        vpand   xmm0, xmm0, xmm1
        vpunpcklqdq     xmm0, xmm0, xmm0
        vptest  xmm0, xmm0
        sete    al
        ret

That's what I call a REALLY EPIC FAILURE!

Compare this inefficient BLOAT to the SSE4.1 code from my original post!

> Yes this is just user error. You didn't use the right options to say you
> want SSE2.

ARGH: please read CAREFULLY what I wrote!

1) I didn't tell GCC to use SSE at all (I DON'T want any compiler to use
   SSE per default, especially when the generated code is SLOWER and BIGGER
   than conventional code using the general purpose registers)!

2) GCC uses SSE2 on its own, but doesn't support it well: it FAILS to use
   PMOVMSKB here, despite -O3!

3) -march=core2 doesn't help either, GCC fails to use SSE4.1 at all!

> GCC supports it fine already.

DREAM ON!
Again: view the 2 counter examples from my original post CAREFULLY!

not amused
Stefan


* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  7:30   ` Jonathan Wakely
  2023-05-26  7:58     ` Stefan Kanthak
@ 2023-05-26  8:12     ` Hagen Paul Pfeifer
  2023-05-26  9:51       ` Jonathan Wakely
  1 sibling, 1 reply; 43+ messages in thread
From: Hagen Paul Pfeifer @ 2023-05-26  8:12 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Stefan Kanthak, gcc, Andrew Pinski

* Jonathan Wakely via Gcc | 2023-05-26 08:30:06 [+0100]:

>On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, <gcc@gcc.gnu.org> wrote:
>
>> > GCC 13.3: gcc -m32 -O3 demo.c
>> >
>> > NOTE: -mtune=native is the default!
>>
>> You need to use -march=native and not -mtune=native .... to turn on
>> the architecture features.
>
>Yes this is just user error. You didn't use the right options to say you
>want SSE2. GCC supports it fine already.
>
>This is also the wrong mailing list for this kind of question, please use
>gcc-help@gcc.gnu.org for this kind of thing, thanks.

Correct, that was also my first thought - but: this mistake has been repeated
again and again for decades. Here specifically Stefan Kanthak realized that
something is wrong - in many cases simply mtune=native is used and the error
is not realized.

Maybe we should think about how you can support the users better? Maybe by an
explicit hint in the documentation or by an info message at execution time. And
for such discussions this is the right mailing list! ;-)

Hagen


* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  7:58     ` Stefan Kanthak
@ 2023-05-26  8:16       ` Sam James
  2023-05-26  8:28       ` Jonathan Wakely
  1 sibling, 0 replies; 43+ messages in thread
From: Sam James @ 2023-05-26  8:16 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Jonathan Wakely, gcc, Andrew Pinski, gcc



"Stefan Kanthak" <stefan.kanthak@nexgo.de> writes:

> "Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:
>
>> On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, <gcc@gcc.gnu.org> wrote:
>>
>>> On Thu, May 25, 2023 at 11:56 PM Stefan Kanthak <stefan.kanthak@nexgo.de>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> compile the following function on a system with Core2 processor
>>>> (released January 2008) for the 32-bit execution environment:
>>>>
>>>> --- demo.c ---
>>>> int ispowerof2(unsigned long long argument)
>>>> {
>>>>     return (argument & argument - 1) == 0;
>>>> }
>>>> --- EOF ---
>>>>
>>>> GCC 13.3: gcc -m32 -O3 demo.c
>>>>
>>>> NOTE: -mtune=native is the default!
>>>
>>> You need to use -march=native and not -mtune=native .... to turn on
>>> the architecture features.
>
> That's what I call a REALLY EPIC FAILURE!

Please read https://gcc.gnu.org/bugs/ carefully.




* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  7:58     ` Stefan Kanthak
  2023-05-26  8:16       ` Sam James
@ 2023-05-26  8:28       ` Jonathan Wakely
  2023-05-26  8:59         ` Stefan Kanthak
  1 sibling, 1 reply; 43+ messages in thread
From: Jonathan Wakely @ 2023-05-26  8:28 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: gcc, Andrew Pinski

On Fri, 26 May 2023 at 09:00, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>
> "Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:
>
> > On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, <gcc@gcc.gnu.org> wrote:
> >
> >> On Thu, May 25, 2023 at 11:56 PM Stefan Kanthak <stefan.kanthak@nexgo.de>
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> compile the following function on a system with Core2 processor
> >>> (released January 2008) for the 32-bit execution environment:
> >>>
> >>> --- demo.c ---
> >>> int ispowerof2(unsigned long long argument)
> >>> {
> >>>     return (argument & argument - 1) == 0;
> >>> }
> >>> --- EOF ---
> >>>
> >>> GCC 13.3: gcc -m32 -O3 demo.c
> >>>
> >>> NOTE: -mtune=native is the default!
> >>
> >> You need to use -march=native and not -mtune=native .... to turn on
> >> the architecture features.
>
> (Un)fortunately this changes nothing!
>
> STOP: that's wrong, it makes it even WORSE!
>
> # Compilation provided by Compiler Explorer at https://godbolt.org/
> ispowerof2(unsigned long long):
>         vmovq   xmm1, QWORD PTR [esp+4]
>         vpcmpeqd        xmm0, xmm0, xmm0
>         xor     eax, eax
>         vpaddq  xmm0, xmm1, xmm0
>         vpand   xmm0, xmm0, xmm1
>         vpunpcklqdq     xmm0, xmm0, xmm0
>         vptest  xmm0, xmm0
>         sete    al
>         ret
>
> That's what I call a REALLY EPIC FAILURE!
>
> Compare this inefficient BLOAT to the SSE4.1 code from my original post!
>
> > Yes this is just user error. You didn't use the right options to say you
> > want SSE2.
>
> ARGH: please read CAREFULLY what I wrote!

You wrote "Now add the -mtune=core2 option to EXPLICITLY enable the
NATIVE SSE4.1
alias "Penryn New Instruction Set" of the Core2 processor" which is
wrong, that's not what -mtune does.

Read the docs CAREFULLY: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

>
> 1) I didn't tell GCC to use SSE at all (I DON'T want any compiler to use
>    SSE per default, especially when the generated code is SLOWER and BIGGER
>    than conventional code using the general purpose registers)!
>
> 2) GCC uses SSE2 on its own, but doesn't support it well: it FAILS to use
>    PMOVMSKB here, despite -O3!

So report a bug to bugzilla, not via an email to the wrong list.

>
> 3) -march=core2 doesn't help either, GCC fails to use SSE4.1 at all!

core2 doesn't enable SSE4.1, as clearly shown in the docs:
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

If you send emails full of confused mistakes, don't be surprised if
the replies aren't what you want.

If you think GCC is generating bad code, file a bug. But make sure
you're actually using the right options to enable the right
instruction sets before complaining about the instructions used.


* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  8:28       ` Jonathan Wakely
@ 2023-05-26  8:59         ` Stefan Kanthak
  2023-05-26  9:22           ` Jakub Jelinek
  2023-05-26  9:22           ` Will GCC eventually support SSE2 or SSE4.1? Jonathan Wakely
  0 siblings, 2 replies; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26  8:59 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: gcc, Andrew Pinski

"Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:

> On Fri, 26 May 2023 at 09:00, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>>
>> "Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:
>>
>> > On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, <gcc@gcc.gnu.org> wrote:
>> >
>> >> On Thu, May 25, 2023 at 11:56 PM Stefan Kanthak <stefan.kanthak@nexgo.de>
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> compile the following function on a system with Core2 processor
>> >>> (released January 2008) for the 32-bit execution environment:
>> >>>
>> >>> --- demo.c ---
>> >>> int ispowerof2(unsigned long long argument)
>> >>> {
>> >>>     return (argument & argument - 1) == 0;
>> >>> }
>> >>> --- EOF ---
>> >>>
>> >>> GCC 13.3: gcc -m32 -O3 demo.c
>> >>>
>> >>> NOTE: -mtune=native is the default!
>> >>
>> >> You need to use -march=native and not -mtune=native .... to turn on
>> >> the architecture features.
>>
>> (Un)fortunately this changes nothing!
>>
>> STOP: that's wrong, it makes it even WORSE!
>>
>> # Compilation provided by Compiler Explorer at https://godbolt.org/
>> ispowerof2(unsigned long long):
>>         vmovq   xmm1, QWORD PTR [esp+4]
>>         vpcmpeqd        xmm0, xmm0, xmm0
>>         xor     eax, eax
>>         vpaddq  xmm0, xmm1, xmm0
>>         vpand   xmm0, xmm0, xmm1
>>         vpunpcklqdq     xmm0, xmm0, xmm0
>>         vptest  xmm0, xmm0
>>         sete    al
>>         ret
>>
>> That's what I call a REALLY EPIC FAILURE!
>>
>> Compare this inefficient BLOAT to the SSE4.1 code from my original post!
>>
>> > Yes this is just user error. You didn't use the right options to say you
>> > want SSE2.
>>
>> ARGH: please read CAREFULLY what I wrote!
> 
> You wrote "Now add the -mtune=core2 option to EXPLICITLY enable the
> NATIVE SSE4.1
> alias "Penryn New Instruction Set" of the Core2 processor" which is
> wrong, that's not what -mtune does.
> 
> Read the docs CAREFULLY: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
   That's bad, REALITY CHECK, please!

4) If the documentation is right, then the behaviour of GCC is wrong: it
   doesn't allow using SSE4.1 without SSE4.2!

5) Compile the function with -march=nehalem (which according to the
   documentation enables support for BOTH SSE4.1 and SSE4.2) and notice
   that GCC fails to use SSE4.1!

>> 1) I didn't tell GCC to use SSE at all (I DON'T want any compiler to use
>>    SSE per default, especially when the generated code is SLOWER and BIGGER
>>    than conventional code using the general purpose registers)!
>>
>> 2) GCC uses SSE2 on its own, but doesn't support it well: it FAILS to use
>>    PMOVMSKB here, despite -O3!
> 
> So report a bug to bugzilla, not via an email to the wrong list.
> 
>>
>> 3) -march=core2 doesn't help either, GCC fails to use SSE4.1 at all!
> 
> core2 doesn't enable SSE4.1, as clearly shown in the docs:
> https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
> 
> If you send emails full of confused mistakes, don't be surprised if
> the replies aren't what you want.
> 
> If you think GCC is generating bad code, file a bug. But make sure
> you're actually using the right options to enable the right
> instruction sets before complaining about the instructions used.

See above: GCC fails to use SSE4.1, despite -march=nehalem
And (if the documentation is right, then) GCC fails to support SSE4.1
without SSE4.2.

Stefan


* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  8:59         ` Stefan Kanthak
@ 2023-05-26  9:22           ` Jakub Jelinek
  2023-05-26 11:28             ` Stefan Kanthak
  2023-05-26 11:36             ` Stefan Kanthak
  2023-05-26  9:22           ` Will GCC eventually support SSE2 or SSE4.1? Jonathan Wakely
  1 sibling, 2 replies; 43+ messages in thread
From: Jakub Jelinek @ 2023-05-26  9:22 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Jonathan Wakely, gcc, Andrew Pinski

On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote:
> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
>    That's bad, REALITY CHECK, please!

You're wrong.
SSE4.1 first appeared in the 45nm versions of Core2, the 65nm versions
didn't have it.
The supported CPU names don't distinguish between core2 submodels,
so if you have core2 with sse4.1, you should either be using -march=native
if compiling on such a machine, or use -march=core2 -msse4.1,
there is no -march={conroe,allendale,wolfdale,merom,penryn,...}.
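
One way to check which instruction sets a given combination of options has
actually enabled is to look at GCC's predefined macros (__SSE2__, __SSSE3__,
__SSE4_1__, __SSE4_2__). These follow -march and -msse* options, while -mtune
on its own leaves them unchanged. The following is only a minimal sketch; the
function name enabled_sse_level is an assumption made for illustration.

```c
/* Hypothetical helper: reports the highest SSE level the compiler was
   allowed to use for this translation unit, based on the macros GCC
   predefines for the selected -march / -msse* options. */
const char *enabled_sse_level(void)
{
#if defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none of SSE2/SSSE3/SSE4.1/SSE4.2";
#endif
}
```

Compiling this once with -mtune=core2 and once with -march=core2 -msse4.1
and printing the result should make the difference between the two options
visible without reading any assembly.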
> 
> 4) If the documentation is right, then the behaviour of GCC is wrong: it
>    doesn't allow using SSE4.1 without SSE4.2!

If you aren't able to read the documentation, it is hard to argue.

	Jakub



* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  8:59         ` Stefan Kanthak
  2023-05-26  9:22           ` Jakub Jelinek
@ 2023-05-26  9:22           ` Jonathan Wakely
  1 sibling, 0 replies; 43+ messages in thread
From: Jonathan Wakely @ 2023-05-26  9:22 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: gcc, Andrew Pinski

On Fri, 26 May 2023 at 10:06, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>
> "Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:
>
> > On Fri, 26 May 2023 at 09:00, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
> >>
> >> "Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:
> >>
> >> > On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, <gcc@gcc.gnu.org> wrote:
> >> >
> >> >> On Thu, May 25, 2023 at 11:56 PM Stefan Kanthak <stefan.kanthak@nexgo.de>
> >> >> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> compile the following function on a system with Core2 processor
> >> >>> (released January 2008) for the 32-bit execution environment:
> >> >>>
> >> >>> --- demo.c ---
> >> >>> int ispowerof2(unsigned long long argument)
> >> >>> {
> >> >>>     return (argument & argument - 1) == 0;
> >> >>> }
> >> >>> --- EOF ---
> >> >>>
> >> >>> GCC 13.3: gcc -m32 -O3 demo.c
> >> >>>
> >> >>> NOTE: -mtune=native is the default!
> >> >>
> >> >> You need to use -march=native and not -mtune=native .... to turn on
> >> >> the architecture features.
> >>
> >> (Un)fortunately this changes nothing!
> >>
> >> STOP: that's wrong, it makes it even WORSE!
> >>
> >> # Compilation provided by Compiler Explorer at https://godbolt.org/
> >> ispowerof2(unsigned long long):
> >>         vmovq   xmm1, QWORD PTR [esp+4]
> >>         vpcmpeqd        xmm0, xmm0, xmm0
> >>         xor     eax, eax
> >>         vpaddq  xmm0, xmm1, xmm0
> >>         vpand   xmm0, xmm0, xmm1
> >>         vpunpcklqdq     xmm0, xmm0, xmm0
> >>         vptest  xmm0, xmm0
> >>         sete    al
> >>         ret
> >>
> >> That's what I call a REALLY EPIC FAILURE!
> >>
> >> Compare this inefficient BLOAT to the SSE4.1 code from my original post!
> >>
> >> > Yes this is just user error. You didn't use the right options to say you
> >> > want SSE2.
> >>
> >> ARGH: please read CAREFULLY what I wrote!
> >
> > You wrote "Now add the -mtune=core2 option to EXPLICITLY enable the
> > NATIVE SSE4.1
> > alias "Penryn New Instruction Set" of the Core2 processor" which is
> > wrong, that's not what -mtune does.
> >
> > Read the docs CAREFULLY: https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
>
> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
>    That's bad, REALITY CHECK, please!

Are you sure about that?

My understanding is that Core2 introduced SSSE3 and Penryn introduced
SSE4.1. The list at
https://en.wikipedia.org/wiki/List_of_Intel_Core_2_processors shows a
lot of Core2 processors without SSE4.1, is it wrong?

e.g. Intel Core2 E6400 doesn't support SSE4.1


>
> 4) If the documentation is right, then the behaviour of GCC is wrong: it
>    doesn't allow using SSE4.1 without SSE4.2!

It's not "wrong", it just means GCC has chosen not to add customized
behaviour for the models that only support SSE4.1 and not SSE4.2.
That's not "wrong" unless it's leaving real performance on the floor
for real hardware used by real users.

How common are those models, and is there any significant performance
benefit in adding yet another arch option for those models?


> 5) Compile the function with -march=nehalem (which according to the
>    documentation enables support for BOTH SSE4.1 and SSE4.2) and notice
>    that GCC fails to use SSE4.1!

If you think the code would perform better with SSE4.1 instructions
and GCC doesn't use them for -march=nehalem, PLEASE FILE A BUG.

Stop yelling about it on the mailing list, it just makes you look like
a troll who isn't actually interested in improving anything, just
complaining.

If you think there's something that should be fixed in GCC file a bug.
File a bug. File a bug.

Did anybody mention yet that you should file a bug?


* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  8:12     ` Hagen Paul Pfeifer
@ 2023-05-26  9:51       ` Jonathan Wakely
  0 siblings, 0 replies; 43+ messages in thread
From: Jonathan Wakely @ 2023-05-26  9:51 UTC (permalink / raw)
  To: Hagen Paul Pfeifer; +Cc: Stefan Kanthak, gcc, Andrew Pinski

On Fri, 26 May 2023 at 10:31, Hagen Paul Pfeifer wrote:
>
> * Jonathan Wakely via Gcc | 2023-05-26 08:30:06 [+0100]:
>
> >On Fri, 26 May 2023, 08:01 Andrew Pinski via Gcc, <gcc@gcc.gnu.org> wrote:
> >
> >> > GCC 13.3: gcc -m32 -O3 demo.c
> >> >
> >> > NOTE: -mtune=native is the default!
> >>
> >> You need to use -march=native and not -mtune=native .... to turn on
> >> the architecture features.
> >
> >Yes this is just user error. You didn't use the right options to say you
> >want SSE2. GCC supports it fine already.
> >
> >This is also the wrong mailing list for this kind of question, please use
> >gcc-help@gcc.gnu.org for this kind of thing, thanks.
>
> Correct, that was also my first thought - but: this mistake has been repeated
> again and again for decades. Here specifically Stefan Kanthak realized that
> something is wrong - in many cases simply mtune=native is used and the error
> is not realized.

I suppose we could give a warning if -mtune is used without an
explicit -march but it would probably annoy a lot of people.

It's not *always* wrong to use -mtune without -march. It's fine if you
know what the compiler's default -march is and you're happy with that
default.

> Maybe we should think about how you can support the users better? Maybe by an
> explicit hint in the documentation or by an info message at execution time. And
> for such discussions this is the right mailing list! ;-)

Yes :-)


* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  9:22           ` Jakub Jelinek
@ 2023-05-26 11:28             ` Stefan Kanthak
  2023-05-26 11:42               ` Jonathan Wakely
  2023-05-26 11:36             ` Stefan Kanthak
  1 sibling, 1 reply; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26 11:28 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jonathan Wakely, gcc, Andrew Pinski

"Jakub Jelinek" <jakub@redhat.com> wrote:

> On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote:
>> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
>>    That's bad, REALITY CHECK, please!
> 
> You're wrong.
> SSE4.1 first appeared in the 45nm versions of Core2, the 65nm versions
> didn't have it.

That's correct, I failed to see this difference.

> The supported CPU names don't distinguish between core2 submodels,
> so if you have core2 with sse4.1, you should either be using -march=native
> if compiling on such a machine, or use -march=core2 -msse4.1,

This is one of the combinations I didn't test until now; with it (and with
-m32 -msse4.1 too) GCC generates SSE4.1 instructions, but FAILS to optimise:

# Compilation provided by Compiler Explorer at https://godbolt.org/
ispowerof2(unsigned long long):
        movq    xmm1, QWORD PTR [esp+4]
        pcmpeqd xmm0, xmm0
        xor     eax, eax
        paddq   xmm0, xmm1
        pand    xmm0, xmm1            # SUPERFLUOUS!
        punpcklqdq      xmm0, xmm0    # SUPERFLUOUS!
        ptest   xmm0, xmm0            #    ptest    xmm0, xmm1
        sete    al
        ret

9 instructions in 36 bytes instead of 7 instructions in 26 bytes.

JFTR: the documentation of MOVQ specifies

| when the destination operand is an XMM register, the quadword is
| stored to the low quadword of the register, and the high quadword
| is cleared to all 0s.

> there is no -march={conroe,allendale,wolfdale,merom,penryn,...}.
> 
>> 4) If the documentation is right, then the behaviour of GCC is wrong: it
>>    doesn't allow using SSE4.1 without SSE4.2!
> 
> If you aren't able to read the documentation, it is hard to argue.

When the documentation is wrong or incomplete it's hard to trust it!

| -m32
...
| The -m32 option sets int, long, and pointer types to 32 bits, and
| generates code that runs on any i386 system.
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OUCH: as shown in https://godbolt.org/z/b43cjGdY9, -m32 ALONE
      generates SSE2 instructions which DON'T run on ANY i386 system!

OOPS: as shown above, -m32 -msse4.1 (or another -msse*) also generates
      code that does NOT run on ANY i386 system!

Where is the precedence of the different -m* options for the CPU type
documented?
Where is their influence on each other documented?

| -march=cpu-type
...
|   Specifying -march=cpu-type implies -mtune=cpu-type, except where noted
|   otherwise.
...
| -mtune=cpu-type
...
|    the compiler does not generate any code that cannot run on the default
|    machine type unless you use a -march=cpu-type option.

Why is the "default machine type" not mentioned/specified with -march=?

Stefan


* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  6:46 Will GCC eventually support SSE2 or SSE4.1? Stefan Kanthak
  2023-05-26  7:00 ` Andrew Pinski
@ 2023-05-26 11:34 ` Nicholas Vinson
  2023-05-26 15:10 ` LIU Hao
  2 siblings, 0 replies; 43+ messages in thread
From: Nicholas Vinson @ 2023-05-26 11:34 UTC (permalink / raw)
  To: gcc

On 5/26/23 02:46, Stefan Kanthak wrote:

> Hi,
>
> compile the following function on a system with Core2 processor
> (released January 2008) for the 32-bit execution environment:
>
> --- demo.c ---
> int ispowerof2(unsigned long long argument)
> {
>      return (argument & argument - 1) == 0;
> }
> --- EOF ---
>
> GCC 13.3: gcc -m32 -O3 demo.c
>
> NOTE: -mtune=native is the default!
>
> # https://godbolt.org/z/b43cjGdY9
> ispowerof2(unsigned long long):
>          movq    xmm1, [esp+4]
>          pcmpeqd xmm0, xmm0
>          paddq   xmm0, xmm1
>          pand    xmm0, xmm1
>          movd    edx, xmm0      #    pxor    xmm1, xmm1
>          psrlq   xmm0, 32       #    pcmpeqb xmm0, xmm1
>          movd    eax, xmm0      #    pmovmskb eax, xmm0
>          or      edx, eax       #    cmp     al, 255
>          sete    al             #    sete    al
>          movzx   eax, al        #
>          ret
>
> 11 instructions in 40 bytes # 10 instructions in 36 bytes 

You cannot delete the 'movzx eax, al' instruction. The expression "(argument & 
argument - 1) == 0" must evaluate to a 0 or a 1. The movzx is required 
to ensure that the upper 24 bits of the eax register are properly zeroed.
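
To make the point about the upper bits concrete, here is a minimal sketch in C
that models EAX as a plain 32-bit value. The helper names sete_al and
movzx_eax_al are made up for illustration; they only mimic what the two
instructions do to the register and are not part of any real API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "sete al": only the low 8 bits of EAX are written,
   the upper 24 bits keep whatever value was there before. */
uint32_t sete_al(uint32_t eax, int zero_flag)
{
    assert(zero_flag == 0 || zero_flag == 1);
    return (eax & 0xffffff00u) | (uint32_t)zero_flag;
}

/* Hypothetical model of "movzx eax, al": zero-extend the low byte. */
uint32_t movzx_eax_al(uint32_t eax)
{
    return eax & 0xffu;
}
```

With stale data in the register, sete_al(0xdeadbe00, 1) yields 0xdeadbe01,
which is not a valid int result. Applying movzx_eax_al afterwards yields 1.
Clearing EAX with xor before the comparison, as the SSE4.1 variants quoted in
this thread do, has the same effect.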


> OOPS: why does GCC (ab)use the SSE2 alias "Willamette New Instruction Set"
>        here instead of the native SSE4.1 alias "Penryn New Instruction Set"
>        of the Core2 (and all later processors)?
>
> OUCH: why does it FAIL to REALLY use SSE2, as shown in the comments on the
> right side?
After correcting for the above error, your solution is the same size 
as the solution gcc generated. Therefore, the only remaining question 
would be "Is your solution faster than the code gcc produced?"

If you claim it is, I'd like to see evidence supporting that claim.
> Now add the -mtune=core2 option to EXPLICITLY enable the NATIVE SSE4.1
> alias "Penryn New Instruction Set" of the Core2 processor:
>
> GCC 13.3: gcc -m32 -mtune=core2 -O3 demo.c
>
> # https://godbolt.org/z/svhEoYT11
> ispowerof2(unsigned long long):
>                                 #    xor      eax, eax
>          movq    xmm1, [esp+4]  #    movq     xmm1, [esp+4]
>          pcmpeqd xmm0, xmm0     #    pcmpeqq  xmm0, xmm0
>          paddq   xmm0, xmm1     #    paddq    xmm0, xmm1
>          pand    xmm0, xmm1     #    ptest    xmm0, xmm1
>          movd    edx, xmm0      #
>          psrlq   xmm0, 32       #
>          movd    eax, xmm0      #
>          or      edx, eax       #
>          sete    al             #    sete     al
>          movzx   eax, al        #
>          ret                    #    ret
>
> 11 instructions in 40 bytes    # 7 instructions in 26 bytes
>
> OUCH: GCC FAILS to use SSE4.1 as shown in the comments on the right side.
>        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As pointed out elsewhere in this thread, you used the wrong flags. With 
the proper flags, I get

% gcc -march=x86-64 -msse4.1 -m32 -O3 -c ispowerof2.c && objdump -d ispowerof2.o


ispowerof2.o:     file format elf32-i386


Disassembly of section .text:

00000000 <ispowerof2>:
    0:   f3 0f 7e 4c 24 04       movq   0x4(%esp),%xmm1
    6:   66 0f 76 c0             pcmpeqd %xmm0,%xmm0
    a:   31 c0                   xor    %eax,%eax
    c:   66 0f d4 c1             paddq  %xmm1,%xmm0
   10:   66 0f db c1             pand   %xmm1,%xmm0
   14:   66 0f 6c c0             punpcklqdq %xmm0,%xmm0
   18:   66 0f 38 17 c0          ptest  %xmm0,%xmm0
   1d:   0f 94 c0                sete   %al
   20:   c3                      ret

so with just the SSE4.1 instruction set the output is 33 bytes long.

> Last compile with -mtune=i386 for the i386 processor:
>
> GCC 13.3: gcc -m32 -mtune=i386 -O3 demo.c
>
> # https://godbolt.org/z/e76W6dsMj
> ispowerof2(unsigned long long):
>          push    ebx            #
>          mov     ecx, [esp+8]   #    mov    eax, [esp+4]
>          mov     ebx, [esp+12]  #    mov    edx, [esp+8]
>          mov     eax, ecx       #
>          mov     edx, ebx       #
>          add     eax, -1        #    add    eax, -1
>          adc     edx, -1        #    adc    edx, -1
>          and     eax, ecx       #    and    eax, [esp+4]
>          and     edx, ebx       #    and    edx, [esp+8]
>          or      eax, edx       #    or     eax, edx
>          sete    al             #    neg    eax
>          movzx   eax, al        #    sbb    eax, eax
>          pop     ebx            #    inc    eax
>          ret                    #    ret
>
> 14 instructions in 33 bytes    # 11 instructions in 32 bytes
>
> OUCH: why does GCC abuse EBX (and ECX too) and performs a superfluous
>        memory write?

At -O1 gcc produces:

% gcc -march=x86-64 -mtune=i386 -m32 -O -c ispowerof2.c && objdump -Mintel -d ispowerof2.o

ispowerof2.o:     file format elf32-i386


Disassembly of section .text:

00000000 <ispowerof2>:
    0:   8b 44 24 04             mov    eax,DWORD PTR [esp+0x4]
    4:   8b 54 24 08             mov    edx,DWORD PTR [esp+0x8]
    8:   83 c0 ff                add    eax,0xffffffff
    b:   83 d2 ff                adc    edx,0xffffffff
    e:   23 44 24 04             and    eax,DWORD PTR [esp+0x4]
   12:   23 54 24 08             and    edx,DWORD PTR [esp+0x8]
   16:   09 d0                   or     eax,edx
   18:   0f 94 c0                sete   al
   1b:   0f b6 c0                movzx  eax,al
   1e:   c3                      ret

which is 1 instruction and 1 byte shorter than your proposed solution.
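
For readers who don't want to trace the add/adc pair by hand, here is a rough
C model of that sequence working on the two 32-bit halves of the argument.
The function names are made up for this sketch, and the carry is computed by
hand since C has no direct equivalent of the carry flag.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the add/adc/and/and/or/sete sequence on the two halves
   of the 64-bit argument (lo = [esp+4], hi = [esp+8]). */
int ispowerof2_halves(uint32_t lo, uint32_t hi)
{
    uint32_t eax = lo + 0xffffffffu;          /* add eax, -1 */
    uint32_t carry = (eax < lo);              /* CF is set unless lo == 0 */
    uint32_t edx = hi + 0xffffffffu + carry;  /* adc edx, -1 */

    eax &= lo;                                /* and eax, [esp+4] */
    edx &= hi;                                /* and edx, [esp+8] */
    return (eax | edx) == 0;                  /* or, sete, movzx */
}

/* Cross-check against the original 64-bit expression from demo.c. */
void check_halves(unsigned long long argument)
{
    int expected = (argument & (argument - 1)) == 0;
    assert(ispowerof2_halves((uint32_t)argument,
                             (uint32_t)(argument >> 32)) == expected);
}
```

Calling check_halves with a few values that straddle the 32-bit boundary
(for example 1ULL << 32) exercises the case where the low half is zero and
no carry reaches the high half.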

However, at -O2 or -O3 it produces the code you mention above. The 
reason for that is simple. It's faster to read from registers than it is 
to read from cache or RAM, and gcc is taking advantage of that fact when 
optimizing at -O2 or higher.

>
> Stefan Kanthak

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  9:22           ` Jakub Jelinek
  2023-05-26 11:28             ` Stefan Kanthak
@ 2023-05-26 11:36             ` Stefan Kanthak
  2023-05-26 11:45               ` Jonathan Wakely
  1 sibling, 1 reply; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26 11:36 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jonathan Wakely, gcc, Andrew Pinski

"Jakub Jelinek" <jakub@redhat.com> wrote:

> On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote:
>> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
>>    That's bad, REALITY CHECK, please!
> 
> You're wrong.
> SSE4.1 first appeared in the 45nm versions of Core2, the 65nm versions
> didn't have it.

That's correct, I failed to see this difference.

> The supported CPU names don't distinguish between core2 submodels,
> so if you have core2 with sse4.1, you should either be using -march=native
> if compiling on such a machine, or use -march=core2 -msse4.1,

This is one of the combinations I didn't test until now; with it (and with
-m32 -msse4.1 too) GCC generates SSE4.1 instructions, but FAILS to optimise:

# Compilation provided by Compiler Explorer at https://godbolt.org/
ispowerof2(unsigned long long):
        movq    xmm1, QWORD PTR [esp+4]
        pcmpeqd xmm0, xmm0
        xor     eax, eax
        paddq   xmm0, xmm1
        pand    xmm0, xmm1            # SUPERFLUOUS!
        punpcklqdq      xmm0, xmm0    # SUPERFLUOUS!
        ptest   xmm0, xmm0            #    ptest    xmm0, xmm1
        sete    al
        ret

9 instructions in 36 bytes instead of 7 instructions in 26 bytes.

JFTR: the documentation of MOVQ specifies

| when the destination operand is an XMM register, the quadword is
| stored to the low quadword of the register, and the high quadword
| is cleared to all 0s.
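
The same idea can be written with intrinsics, which is easier to experiment
with than hand-written assembly. This is only a sketch: the function name
ispowerof2_ptest is made up, the SSE4.1 path is taken only when the compiler
defines __SSE4_1__ (for example with -msse4.1), and the plain C expression is
used otherwise. Whether GCC then emits the shorter sequence is not verified
here.

```c
#include <assert.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>   /* SSE4.1 intrinsics; pulls in the SSE2 ones too */
#endif

int ispowerof2_ptest(unsigned long long argument)
{
    assert(sizeof argument == 8);   /* the 64-bit load below reads 8 bytes */
#if defined(__SSE4_1__)
    /* movq: low quadword is loaded, high quadword is cleared to zero */
    __m128i x = _mm_loadl_epi64((const __m128i *)&argument);
    __m128i ones = _mm_cmpeq_epi32(x, x);   /* pcmpeqd: all bits set */
    __m128i xm1 = _mm_add_epi64(x, ones);   /* paddq: adds -1 per quadword */
    /* ptest: result is 1 when (xm1 & x) is all zero. The high quadword
       of x is zero, so it cannot contribute any set bit. */
    return _mm_testz_si128(xm1, x);
#else
    return (argument & (argument - 1)) == 0;   /* portable fallback */
#endif
}
```

Because the high quadword of x is already zero after the 64-bit load, the
AND with xm1 cannot produce set bits there, which is why a separate pand and
punpcklqdq should not be needed before the test.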

> there is no -march={conroe,allendale,wolfdale,merom,penryn,...}.
> 
>> 4) If the documentation is right, then the behaviour of GCC is wrong: it
>>    doesn't allow using SSE4.1 without SSE4.2!
> 
> If you aren't able to read the documentation, it is hard to argue.

When the documentation is wrong or incomplete it's hard to trust it!

| -m32
...
| The -m32 option sets int, long, and pointer types to 32 bits, and
| generates code that runs on any i386 system.
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OUCH: as shown in https://godbolt.org/z/b43cjGdY9, -m32 ALONE
      generates SSE2 instructions which DON'T run on ANY i386 system!

OOPS: as shown above, -m32 -msse4.1 (or another -msse*) also generates
      code that does NOT run on ANY i386 system!

Where is the precedence of the different -m* feature options for the CPU
type documented?
Where is their influence on each other documented?
Why does the documentation FAIL to specify that CPU features given by
-m* override -m32 or are enabled in ADDITION to those enabled by -march=?

| -march=cpu-type
...
|   Specifying -march=cpu-type implies -mtune=cpu-type, except where noted
|   otherwise.
...
| -mtune=cpu-type
...
|    the compiler does not generate any code that cannot run on the default
|    machine type unless you use a -march=cpu-type option.

Why is the "default machine type" not mentioned/specified with -march=?

Stefan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 11:28             ` Stefan Kanthak
@ 2023-05-26 11:42               ` Jonathan Wakely
  2023-05-26 12:03                 ` Stefan Kanthak
  0 siblings, 1 reply; 43+ messages in thread
From: Jonathan Wakely @ 2023-05-26 11:42 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Jakub Jelinek, gcc, Andrew Pinski

On Fri, 26 May 2023 at 12:29, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>
> "Jakub Jelinek" <jakub@redhat.com> wrote:
>
> > On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote:
> >> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
> >>    That's bad, REALITY CHECK, please!
> >
> > You're wrong.
> > SSE4.1 first appeared in the 45nm versions of Core2, the 65nm versions
> > didn't have it.
>
> That's correct, I failed to see this difference.

REALITY CHECK please!


> > The supported CPU names don't distinguish between core2 submodels,
> > so if you have core2 with sse4.1, you should either be using -march=native
> > if compiling on such a machine, or use -march=core2 -msse4.1,
>
> This is one of the combinations I didn't test until now; with it (and with
> -m32 -msse4.1 too) GCC generates SSE4.1 instructions, but FAILS to optimise:
>
> # Compilation provided by Compiler Explorer at https://godbolt.org/
> ispowerof2(unsigned long long):
>         movq    xmm1, QWORD PTR [esp+4]
>         pcmpeqd xmm0, xmm0
>         xor     eax, eax
>         paddq   xmm0, xmm1
>         pand    xmm0, xmm1            # SUPERFLUOUS!
>         punpcklqdq      xmm0, xmm0    # SUPERFLUOUS!
>         ptest   xmm0, xmm0            #    ptest    xmm0, xmm1
>         sete    al
>         ret
>
> 9 instructions in 36 bytes instead of 7 instructions in 26 bytes.
>
> JFTR: the documentation of MOVQ specifies
>
> | when the destination operand is an XMM register, the quadword is
> | stored to the low quadword of the register, and the high quadword
> | is cleared to all 0s.
>
> > there is no -march={conroe,allendale,wolfdale,merom,penryn,...}.
> >
> >> 4) If the documentation is right, then the behaviour of GCC is wrong: it
> >>    doesn't allow using SSE4.1 without SSE4.2!
> >
> > If you aren't able to read the documentation, it is hard to argue.
>
> When the documentation is wrong or incomplete it's hard to trust it!

Just like when you make incorrect statements and assume everybody else is wrong.

The documentation isn't perfect, but you should not just ignore it and
assume you know better in all cases.

> | -m32
> ...
> | The -m32 option sets int, long, and pointer types to 32 bits, and
> | generates code that runs on any i386 system.
>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> OUCH: as shown in https://godbolt.org/z/b43cjGdY9 -m32 ALONE but
>       generates SSE2 instructions which DONT run on ANY i386 system!

That's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109954

> OOPS: as shown above, -m32 -msse4.1 (or another -msse*) also generates
>       code that does NOT run on ANY i386 system!
>
> Where is the precedence of the different -m* options for the CPU type
> documented?
> Where is their influence on each other documented?

-march enables the instructions listed for the relevant cpu family,
then using -mxxx or -mno-xxx adds or removes particular instruction
sets from the ones enabled by -march.

If you give an option twice, e.g. -march=core2 -march=nehalem, then
the second one wins. If you use -msse2 -mno-sse2 then the second one
wins.

You can check this using e.g.

gcc -Q --help=target -march=core2 -msse2
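
Another way to see what a given combination of options enables is to look at
the macros the compiler predefines. A minimal sketch, assuming GCC's usual
__SSE2__ and __SSE4_1__ macros; the function names are made up for
illustration:

```c
#include <assert.h>
#include <stdio.h>

/* 1 if the compiler was allowed to emit SSE2 for this translation unit. */
int have_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* 1 if the compiler was allowed to emit SSE4.1 for this translation unit. */
int have_sse4_1(void)
{
#ifdef __SSE4_1__
    return 1;
#else
    return 0;
#endif
}

void report_isa(void)
{
    /* -msse4.1 implies -msse2, so this combination should never occur. */
    assert(!have_sse4_1() || have_sse2());
    printf("SSE2:   %s\n", have_sse2() ? "enabled" : "not enabled");
    printf("SSE4.1: %s\n", have_sse4_1() ? "enabled" : "not enabled");
}
```

Compiling this once with -m32 -march=core2 and once with
-m32 -march=core2 -msse4.1 should change only the second line of the report,
while -mtune= alone should not change either line.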

> | -march=cpu-type
> ...
> |   Specifying -march=cpu-type implies -mtune=cpu-type, except where noted
> |   otherwise.
> ...
> | -mtune=cpu-type
> ...
> |    the compiler does not generate any code that cannot run on the default
> |    machine type unless you use a -march=cpu-type option.
>
> Why is the "default machine type" not mentioned/specified with -march=?

Using -march overrides it. The default is set during configure. Adding
-v to the compilation will show what -march option is used by cc1 by
default.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 11:36             ` Stefan Kanthak
@ 2023-05-26 11:45               ` Jonathan Wakely
  2023-05-26 12:19                 ` Stefan Kanthak
  0 siblings, 1 reply; 43+ messages in thread
From: Jonathan Wakely @ 2023-05-26 11:45 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Jakub Jelinek, gcc, Andrew Pinski

On Fri, 26 May 2023 at 12:42, Stefan Kanthak wrote:
> Why does the documentation FAIL to specify that CPU features given by
> -m* override -m32 or enables them in ADDITION to those enabled by -march=?

Because it's obvious.  If you ask for sse2 you get it.

I find it very SURPRISING that you're only just learning the basics of
how to use gcc NOW, after YELLING about all the OUCH.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 11:42               ` Jonathan Wakely
@ 2023-05-26 12:03                 ` Stefan Kanthak
  2023-05-26 12:16                   ` Jonathan Wakely
  2023-05-26 12:23                   ` Jonathan Wakely
  0 siblings, 2 replies; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26 12:03 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Jakub Jelinek, gcc, Andrew Pinski

"Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:

> On Fri, 26 May 2023 at 12:29, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>>
>> "Jakub Jelinek" <jakub@redhat.com> wrote:
>>
>> > On Fri, May 26, 2023 at 10:59:03AM +0200, Stefan Kanthak wrote:
>> >> 3) SSE4.1 is supported since Core2, but -march=core2 fails to enable it.
>> >>    That's bad, REALITY CHECK, please!
>> >
>> > You're wrong.
>> > SSE4.1 first appeared in the 45nm versions of Core2, the 65nm versions
>> > didn't have it.
>>
>> That's correct, I failed to see this difference.
> 
> REALITY CHECK please!

Dumbass check please!

>> > The supported CPU names don't distinguish between core2 submodels,
>> > so if you have core2 with sse4.1, you should either be using -march=native
>> > if compiling on such a machine, or use -march=core2 -msse4.1,
>>
>> This is one of the combinations I didn't test until now; with it (and with
>> -m32 -msse4.1 too) GCC generates SSE4.1 instructions, but FAILS to optimise:
>>
>> # Compilation provided by Compiler Explorer at https://godbolt.org/
>> ispowerof2(unsigned long long):
>>         movq    xmm1, QWORD PTR [esp+4]
>>         pcmpeqd xmm0, xmm0
>>         xor     eax, eax
>>         paddq   xmm0, xmm1
>>         pand    xmm0, xmm1            # SUPERFLUOUS!
>>         punpcklqdq      xmm0, xmm0    # SUPERFLUOUS!
>>         ptest   xmm0, xmm0            #    ptest    xmm0, xmm1
>>         sete    al
>>         ret
>>
>> 9 instructions in 36 bytes instead of 7 instructions in 26 bytes.

No comment here?

>> JFTR: the documentation of MOVQ specifies
>>
>> | when the destination operand is an XMM register, the quadword is
>> | stored to the low quadword of the register, and the high quadword
>> | is cleared to all 0s.
>>
>> > there is no -march={conroe,allendale,wolfdale,merom,penryn,...}.
>> >
>> >> 4) If the documentation is right, then the behaviour of GCC is wrong: it
>> >>    doesn't allow using SSE4.1 without SSE4.2!
>> >
>> > If you aren't able to read the documentation, it is hard to argue.
>>
>> When the documentation is wrong or incomplete it's hard to trust it!
> 
> Just like when you make incorrect statements and assume everybody else is wrong.

Do I assume that? Or did you just make this up?

> The documentation isn't perfect, but you should not just ignore it and
> assume you know better in all cases.
> 
>> | -m32
>> ...
>> | The -m32 option sets int, long, and pointer types to 32 bits, and
>> | generates code that runs on any i386 system.
>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> OUCH: as shown in https://godbolt.org/z/b43cjGdY9 -m32 ALONE but
>>       generates SSE2 instructions which DONT run on ANY i386 system!
> 
> That's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109954

I posted this here some years ago; see for example
<https://skanthak.homepage.t-online.de/gcc.html#case27>
Ignorance is bliss?!

>> OOPS: as shown above, -m32 -msse4.1 (or another -msse*) also generates
>>       code that does NOT run on ANY i386 system!
>>
>> Where is the precedence of the different -m* options for the CPU type
>> documented?
>> Where is their influence on each other documented?
> 
> -march enables the instructions listed for the relevant cpu family,
> then using -mxxx or -mno-xxx adds or removes particular instruction
> sets from the ones enabled by -march.

ADD THIS TO THE DOCUMENTATION!

> If you give an option twice, e.g. -march=core2 -march=nehalem, then
> the second one wins. If you use -msse2 -mno-sse2 then the second one
> wins.

ARGH: not repetitions of ONE particular option or its negation, stupid!

> You can check this using e.g.
> 
> gcc -Q --help=target -march=core2 -msse2
> 
>> | -march=cpu-type
>> ...
>> |   Specifying -march=cpu-type implies -mtune=cpu-type, except where noted
>> |   otherwise.
>> ...
>> | -mtune=cpu-type
>> ...
>> |    the compiler does not generate any code that cannot run on the default
>> |    machine type unless you use a -march=cpu-type option.
>>
>> Why is the "default machine type" not mentioned/specified with -march=?
> 
> Using -march overrides it. The default is set during configure.

And exactly this is missing in the documentation for -march=!
Guess why I cited the documentation for -mtune= where it is mentioned?

> Adding -v to the compilation will show what -march option is used by cc1 by
> default.

Not reliable unless documented elsewhere!

Stefan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 12:03                 ` Stefan Kanthak
@ 2023-05-26 12:16                   ` Jonathan Wakely
  2023-05-26 12:22                     ` Stefan Kanthak
  2023-05-26 12:23                   ` Jonathan Wakely
  1 sibling, 1 reply; 43+ messages in thread
From: Jonathan Wakely @ 2023-05-26 12:16 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Jakub Jelinek, gcc, Andrew Pinski

On Fri, 26 May 2023 at 13:09, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>
> "Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:
>
> > On Fri, 26 May 2023 at 12:29, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
> >> OUCH: as shown in https://godbolt.org/z/b43cjGdY9 -m32 ALONE but
> >>       generates SSE2 instructions which DONT run on ANY i386 system!
> >
> > That's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109954
>
> I posted this here some years ago; see for example
> <https://skanthak.homepage.t-online.de/gcc.html#case27>
> Ignorance is bliss?!

And when did you report it to bugzilla?

Nobody reads your silly webpage.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 11:45               ` Jonathan Wakely
@ 2023-05-26 12:19                 ` Stefan Kanthak
  2023-05-26 12:30                   ` Jonathan Wakely
  2023-05-26 12:37                   ` Jakub Jelinek
  0 siblings, 2 replies; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26 12:19 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Jakub Jelinek, gcc, Andrew Pinski

"Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:

> On Fri, 26 May 2023 at 12:42, Stefan Kanthak wrote:
>> Why does the documentation FAIL to specify that CPU features given by
>> -m* override -m32 or enables them in ADDITION to those enabled by -march=?
> 
> Because it's obvious.  If you ask for sse2 you get it.

ARGH! The documentation for -m32 contradicts

| -m32
...
| The -m32 option sets int, long, and pointer types to 32 bits, and
| generates code that runs on any i386 system.
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

> I find it very SURPRISING that you're only just learning the basics of
> how to use gcc NOW, after YELLING about all the OUCH.

I'm NOT surprised that you don't grok it!

gcc -msse4.1 -m32 -march=core2 ...

Which -m* options win here?
Do -m32 or -march=core2 override -msse4.1?

Stefan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 12:16                   ` Jonathan Wakely
@ 2023-05-26 12:22                     ` Stefan Kanthak
  2023-05-26 13:00                       ` Mark Wielaard
  0 siblings, 1 reply; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26 12:22 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Jakub Jelinek, gcc, Andrew Pinski

"Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:

> On Fri, 26 May 2023 at 13:09, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>>
>> "Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:
>>
>> > On Fri, 26 May 2023 at 12:29, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>> >> OUCH: as shown in https://godbolt.org/z/b43cjGdY9 -m32 ALONE but
>> >>       generates SSE2 instructions which DONT run on ANY i386 system!
>> >
>> > That's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109954
>>
>> I posted this here some years ago; see for example
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> <https://skanthak.homepage.t-online.de/gcc.html#case27>
>> Ignorance is bliss?!
> 
> And when did you report it to bugzilla?
> 
> Nobody reads your silly webpage.

Thanks stupid. I don't read your silly bugzilla!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 12:03                 ` Stefan Kanthak
  2023-05-26 12:16                   ` Jonathan Wakely
@ 2023-05-26 12:23                   ` Jonathan Wakely
  1 sibling, 0 replies; 43+ messages in thread
From: Jonathan Wakely @ 2023-05-26 12:23 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Jakub Jelinek, gcc, Andrew Pinski

On Fri, 26 May 2023 at 13:09, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>
> "Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:
>
> > On Fri, 26 May 2023 at 12:29, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
> >>
> >> "Jakub Jelinek" <jakub@redhat.com> wrote:
> >> > If you aren't able to read the documentation, it is hard to argue.
> >>
> >> When the documentation is wrong or incomplete it's hard to trust it!
> >
> > Just like when you make incorrect statements and assume everybody else is wrong.
>
> Do I assume that? Or did you just make this up?

Your initial email in this thread, and examples on your silly webpage,
assume that -mtune=i386 or -mtune=core2 or -mtune=native will control
the instruction set used. Which isn't how it works. You didn't read
the docs, or you read them and assumed you knew better.

And your charming "REALITY CHECK please!" about -march=core2 which
turned out to be, again, because you don't know what you're talking
about.

I would suggest that you fix the silly webpage, but it doesn't really
matter because nobody's going to take it seriously. Have fun being
rude and wrong.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 12:19                 ` Stefan Kanthak
@ 2023-05-26 12:30                   ` Jonathan Wakely
  2023-05-26 12:42                     ` Stefan Kanthak
  2023-05-26 12:37                   ` Jakub Jelinek
  1 sibling, 1 reply; 43+ messages in thread
From: Jonathan Wakely @ 2023-05-26 12:30 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Jakub Jelinek, gcc, Andrew Pinski

On Fri, 26 May 2023 at 13:23, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>
> "Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:
>
> > On Fri, 26 May 2023 at 12:42, Stefan Kanthak wrote:
> >> Why does the documentation FAIL to specify that CPU features given by
> >> -m* override -m32 or enables them in ADDITION to those enabled by -march=?
> >
> > Because it's obvious.  If you ask for sse2 you get it.
>
> ARGH! The documentation for -m32 contradicts
>
> | -m32
> ...
> | The -m32 option sets int, long, and pointer types to 32 bits, and
> | generates code that runs on any i386 system.
>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> > I find it very SURPRISING that you're only just learning the basics of
> > how to use gcc NOW, after YELLING about all the OUCH.
>
> I'm NOT surprised that you don't grok it!
>
> gcc -msse4.1 -m32 -march=core2 ...
>
> Which -m* options win here?
> Do -m32 or -march=core2 override -msse4.1?

No, because -m32 says to generate code for the 32-bit model, it
doesn't select an instruction set.

A multilib x86_64 compiler has a default 64-bit arch and a default
32-bit arch. If you don't configure GCC with --with-arch-32 and/or
--with-arch-64 then you get -march=x86-64 for both 32-bit and 64-bit.
Using -m32 without -march will use the default 32-bit arch, which is
probably x86-64. Using -m32 with any explicit -march will override the
default, and use the one you specified.

And I already said that -march selects the base instruction set, and
then -msse4.1 adds to that, enabling sse4.1 as well.

I said:

-march enables the instructions listed for the relevant cpu family,
then using -mxxx or -mno-xxx adds or removes particular instruction
sets from the ones enabled by -march.

So -march=core2 selects the instruction sets listed in the docs, and
then -msse4.1 adds to that. I don't know how to say it more clearly.

All this could have been explained easily and without conflict if
you'd used the right mailing list in the first place and asked how
things work, instead of storming in acting like a clown and being
rude.

"Will GCC eventually support SSE2 or SSE4.1?" is confrontational, and
makes you look dumb. And it's just got worse since then.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 12:19                 ` Stefan Kanthak
  2023-05-26 12:30                   ` Jonathan Wakely
@ 2023-05-26 12:37                   ` Jakub Jelinek
  2023-05-26 13:49                     ` Stefan Kanthak
  1 sibling, 1 reply; 43+ messages in thread
From: Jakub Jelinek @ 2023-05-26 12:37 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Jonathan Wakely, gcc, Andrew Pinski

On Fri, May 26, 2023 at 02:19:54PM +0200, Stefan Kanthak wrote:
> > I find it very SURPRISING that you're only just learning the basics of
> > how to use gcc NOW, after YELLING about all the OUCH.
> 
> I'm NOT surprised that you don't grok it!
> 
> gcc -msse4.1 -m32 -march=core2 ...
> 
> Which -m* options win here?
> Do -m32 or -march=core2 override -msse4.1?

Jonathan told you what to use to find it out (-Q --help=target).
-m32/-m64/-mx32/-m16 options don't affect the ISA, they switch the
main ABI (ilp32 32-bit code, lp64 64-bit code, ilp32 code running
in 64-bit mode, 16-bit code).  The -march= option selects the ISA base (which
CPU family to compile for as a minimum);
if you don't supply it, the default from how gcc has been configured
is selected (e.g. if you configure --with-arch-32=core2, then that
will be the -m32 default, if you configure --with-arch=x86-64, that will
be the -march default if --with-arch-32= isn't specified, etc.).
If more than one -march= is specified, the last one wins.
The -mISA options then tweak the ISA set.  Most ISAs have dependencies,
say -msse4.1 enables -mssse3 which enables -msse3 which enables -msse2 etc.,
and similarly the -mno-ISA options disable the ISAs that depend on them, so
-mno-avx disables -mavx2 which disables -mavx512f which disables ...
The -mtune= option specifies for which CPU family the code should be tuned;
the code will still run on any CPU compatible with the implicit or explicit
-march=, but instructions will be scheduled, and alternative forms chosen
from the selected ISAs, to perform best on the -mtune= family.
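
The split between ABI and ISA can be observed from C itself: -m32 and -m64
change the sizes of long and of pointers, not the instruction sets. A small
sketch, assuming GCC's __LP64__ and __ILP32__ macros where they are defined;
the function name data_model is made up and only the common models are
distinguished:

```c
#include <assert.h>
#include <stddef.h>

/* Name the data model this translation unit was compiled for. */
const char *data_model(void)
{
#ifdef __LP64__
    assert(sizeof(long) == 8 && sizeof(void *) == 8);
#endif
#ifdef __ILP32__
    assert(sizeof(long) == 4 && sizeof(void *) == 4);
#endif
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";   /* what -m32 (and also -mx32) selects */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";    /* what -m64 selects */
    return "other";
}
```

The result should differ between -m32 and -m64 but stay the same when only
-march=, -mtune= or an -mISA option is changed.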

	Jakub


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 12:30                   ` Jonathan Wakely
@ 2023-05-26 12:42                     ` Stefan Kanthak
  2023-05-26 13:33                       ` Nicholas Vinson
  0 siblings, 1 reply; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26 12:42 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Jakub Jelinek, gcc, Andrew Pinski

"Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:

> On Fri, 26 May 2023 at 13:23, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>>
>> "Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:
>>
>> > On Fri, 26 May 2023 at 12:42, Stefan Kanthak wrote:
>> >> Why does the documentation FAIL to specify that CPU features given by
>> >> -m* override -m32 or enables them in ADDITION to those enabled by -march=?
>> >
>> > Because it's obvious.  If you ask for sse2 you get it.
>>
>> ARGH! The documentation for -m32 contradicts
>>
>> | -m32
>> ...
>> | The -m32 option sets int, long, and pointer types to 32 bits, and
>> | generates code that runs on any i386 system.
>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> > I find it very SURPRISING that you're only just learning the basics of
>> > how to use gcc NOW, after YELLING about all the OUCH.
>>
>> I'm NOT surprised that you don't grok it!
>>
>> gcc -msse4.1 -m32 -march=core2 ...
>>
>> Which -m* options win here?
>> Do -m32 or -march=core2 override -msse4.1?
> 
> No, because -m32 says to generate code for the 32-bit model, it
> doesn't select an instruction set.

I underlined the relevant part, EXTRA FOR YOU!
Bonus question: does every 32-bit i386 system support SSE instructions?

> A multilib x86_64 compiler has a default 64-bit arch and a default
> 32-bit arch. If you don't configure GCC with --with-arch-32 and/or
> --with-arch-64 then you get -march=x86-64 for both 32-bit and 64-bit.
> Using -m32 without -march will use the default 32-bit arch, which is
> probably x86-64. Using -m32 with any explicit -march will override the
> default, and use the one you specified.

This is NOT what the underlined part of the documentation says!

> And I already said that -march selects the base instruction set, and
> then -msse4.1 adds to that, enabling sse4.1 as well.
> 
> I said:
> 
> -march enables the instructions listed for the relevant cpu family,
> then using -mxxx or -mno-xxx adds or removes particular instruction
> sets from the ones enabled by -march.

"Then" means after, not before!
Guess why I asked EXPLICITLY for the preferences?!

> So -march=core2 selects the instruction sets listed in the docs, and
> then -msse4.1 adds to that. I don't know how to say it more clearly.

That's your problem! Set some sequence points...

> All this could have been explained easily and without conflict if
> you'd use the right mailing list in the first place and asked how
> things work, instead of storming in acting like a clown and being
> rude.
> 
> "Will GCC eventually support SSE2 or SSE4.1?" is confrontational, and
> makes you look dumb. And it's just got worse since then.

I could have added PROPERLY, because that's where it CLEARLY fails, as
shown by the generated unoptimised code.

Stefan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 12:22                     ` Stefan Kanthak
@ 2023-05-26 13:00                       ` Mark Wielaard
  0 siblings, 0 replies; 43+ messages in thread
From: Mark Wielaard @ 2023-05-26 13:00 UTC (permalink / raw)
  To: Stefan Kanthak, Jonathan Wakely; +Cc: Jakub Jelinek, gcc, Andrew Pinski

Stefan,

On Fri, 2023-05-26 at 14:22 +0200, Stefan Kanthak wrote:
> "Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:
> > And when did you report it to bugzilla?
> > 
> > Nobody reads your silly webpage.
> 
> Thanks stupid. I don't read your silly bugzilla!

Stop calling people stupid or dumbass.

People are trying to help you understand your issues and explaining
your mistakes and asking you to please report any real issues to
bugzilla for proper triage. Which you have refused to do for a couple
of years now.

It is totally inappropriate to respond to such requests with insults
(even if they call your website silly, others also shouldn't use such
denigrating language).

Thanks,

Mark

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 12:42                     ` Stefan Kanthak
@ 2023-05-26 13:33                       ` Nicholas Vinson
  0 siblings, 0 replies; 43+ messages in thread
From: Nicholas Vinson @ 2023-05-26 13:33 UTC (permalink / raw)
  To: gcc

On 5/26/23 08:42, Stefan Kanthak wrote:
>
> I could have added PROPERLY, because that's where it CLEARLY fails, as
> shown by the generated unoptimised code.

 From what I've seen so far, I find your arguments unconvincing.

In this thread alone, you've proven that you don't know how to properly 
control gcc via its command-line flags, and that you don't know how to 
properly generate assembly code for your own C example (properly in this 
case meaning to exhibit the behavior the ISO C standard requires) which 
makes it hard for me to accept your claims at face value (your C example 
is also logically incorrect, but that's not important to this discussion).
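
For reference, the logical flaw mentioned above is presumably the zero case: `(argument & argument - 1) == 0` is also true when argument is 0, which is not a power of two. A minimal sketch of a variant that rejects zero follows; the name `ispowerof2_nonzero` is made up for illustration and does not appear in the original demo.c.

```c
/* Original test from demo.c: also returns 1 for argument == 0. */
int ispowerof2(unsigned long long argument)
{
    return (argument & (argument - 1)) == 0;
}

/* Hypothetical variant (not from the thread's demo.c) that rejects zero. */
int ispowerof2_nonzero(unsigned long long argument)
{
    return argument != 0 && (argument & (argument - 1)) == 0;
}
```

The extra comparison changes the code the compiler has to generate, so any instruction counts quoted for the original function do not carry over to this variant.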

That said, assuming that your "optimized assembly" examples (with the 
exception of the first) are correct, all you've done is show that your 
versions are slightly smaller in both instruction count and size and 
declare your examples "proper". The optimization flag -O3 (like most of 
the -On flags) optimizes for speed over all else, and it has been shown 
that the fastest code isn't necessarily the code with the fewest instructions 
or the smallest size (see the RISC v CISC debate).

To accept that your suggestions are the proper ways to generate code 
using SSE4.1 instructions at -O3, I insist on data that clearly 
demonstrates that your suggestions are at least as performant as what 
GCC currently does.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 12:37                   ` Jakub Jelinek
@ 2023-05-26 13:49                     ` Stefan Kanthak
  2023-05-26 14:07                       ` Jonathan Wakely
  0 siblings, 1 reply; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26 13:49 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jonathan Wakely, gcc, Andrew Pinski

"Jakub Jelinek" <jakub@redhat.com> wrote:

> On Fri, May 26, 2023 at 02:19:54PM +0200, Stefan Kanthak wrote:
>> > I find it very SURPRISING that you're only just learning the basics of
>> > how to use gcc NOW, after YELLING about all the OUCH.
>> 
>> I'm NOT surprised that you don't grok it!
>> 
>> gcc -msse4.1 -m32 -march=core2 ...
>> 
>> Which -m* options win here?
>> Do -m32 or -march=core2 override -msse4.1?
> 
> Jonathan told you what to use to find it out (-Q --help=target).
> -m32/-m64/-mx32/-m16 options don't affect the ISA, they switch the
> main ABI (ilp32 32-bit code, lp64 64-bit code, ilp32 code running
> in 64-bit mode, 16-bit code).

PLEASE read your own documentation and specify ANY:

| -m32
...
|    generates code that runs on any i386 system.

The first example of my initial posts falsifies "runs on any i386 system"!

> -march= option selects the ISA base (which CPU family to compile for
> as minimum), if you don't supply it, the default from how gcc has been
> configured is selected (e.g. if you configure --with-arch-32=core2, then
> that will be the -m32 default, if you configure --with-arch=x86-64, that
> will be the -march default if --with-arch-32= isn't specified, etc.).
> If more than one -march= is specified, the last one wins.

But "the last one wins" does NOT apply to -m32 or -m<feature>: the latter
are additive, INDEPENDENT of their position before or after -march=,
which is NOT documented, and -m32 fails to disable all -m<feature> not
common to ANY i386 system set before, thus contradicting the documentation.

> And, the -mISA options then tweak the ISA set.  Most ISAs have dependencies,
> say -msse4.1 enables -mssse3 which enables -msse3 which enables -msse2 etc.,
> and similarly the -mno-ISA options disable the ISAs that depend on it, so
> -mno-avx implies -mno-avx2 which implies -mno-avx512f which implies ...

These points are obvious.
What is NOT obvious is that -m<feature> -march=<lowerISA> does not clear any
<feature> not supported in <lowerISA>, i.e. the last one does NOT win here.

JFTR: because -m32 -mtune=i386 disables SSE instructions I made the wrong
      assumption "-mtune=native is the default", upon which (not only) you
      reacted like your compiler when it encounters UB and stopped reading.

> -mtune= option specifies for which CPU family the code should be tuned,
> it will still run on any CPU compatible with the implicit or explicit
> -march=, but will schedule instructions or choose from alternative forms
> from the selected ISAs to perform best on the -mtune=  family.

The third example of my initial post disables SSE2 with -mtune=i386, but
does NOT "perform best" on the i386 family: before the corei all i386 CPUs
run MOV <reg>,<reg> plus AND <reg>,<reg> slower than AND <reg>,<memory>,
i.e. the code generated with -mtune=i586 performs best.

JFTR: my post is NOT about "How to use GCC", it's about the not properly
      optimised code.

Stefan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 13:49                     ` Stefan Kanthak
@ 2023-05-26 14:07                       ` Jonathan Wakely
  2023-05-26 14:18                         ` Jakub Jelinek
  2023-05-26 14:26                         ` Stefan Kanthak
  0 siblings, 2 replies; 43+ messages in thread
From: Jonathan Wakely @ 2023-05-26 14:07 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Jakub Jelinek, gcc, Andrew Pinski

On Fri, 26 May 2023 at 14:55, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>
> "Jakub Jelinek" <jakub@redhat.com> wrote:
>
> > On Fri, May 26, 2023 at 02:19:54PM +0200, Stefan Kanthak wrote:
> >> > I find it very SURPRISING that you're only just learning the basics of
> >> > how to use gcc NOW, after YELLING about all the OUCH.
> >>
> >> I'm NOT surprised that you don't grok it!
> >>
> >> gcc -msse4.1 -m32 -march=core2 ...
> >>
> >> Which -m* options win here?
> >> Do -m32 or -march=core2 override -msse4.1?
> >
> > Jonathan told you what to use to find it out (-Q --help=target).
> > -m32/-m64/-mx32/-m16 options don't affect the ISA, they switch the
> > main ABI (ilp32 32-bit code, lp64 64-bit code, ilp32 code running
> > in 64-bit mode, 16-bit code).
>
> PLEASE read your own documentation and specify ANY:
>
> | -m32
> ...
> |    generates code that runs on any i386 system.
>
> The first example of my initial posts falsifies "runs on any i386 system"!

Which is why that word "any" has been acknowledged as a bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109954
The word "any" is wrong. We know it's wrong. We've acknowledged it's
wrong. I've even proposed wording to fix it in that bug report.
What part of this is still hard to understand?


>
> > -march= option selects the ISA base (which CPU family to compile for
> > as minimum), if you don't supply it, the default from how gcc has been
> > configured is selected (e.g. if you configure --with-arch-32=core2, then
> > that will be the -m32 default, if you configure --with-arch=x86-64, that
> > will be the -march default if --with-arch-32= isn't specified, etc.).
> > If more than one -march= is specified, the last one wins.
>
> But "the last one wins" does NOT apply to -m32 or -m<feature>: the latter
> are additive, INDEPENDENT of their position before or after -march=,
> which is NOT documented, and -m32 fails to disable all -m<feature> not
> common to ANY i386 system set before, thus contradicting the documentation.

See above.


> > And, the -mISA options then tweak the ISA set.  Most ISAs have dependencies,
> > say -msse4.1 enables -mssse3 which enables -msse3 which enables -msse2 etc.,
> > and similarly the -mno-ISA options disable the ISAs that depend on it, so
> > -mno-avx implies -mno-avx2 which implies -mno-avx512f which implies ...
>
> These points are obvious.
> What is NOT obvious is that -m<feature> -march=<lowerISA> does not clear any
> <feature> not supported in <lowerISA>, i.e. the last one does NOT win here.

The last -march option selects the base set of instructions. The -mISA
options modify that base.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 14:07                       ` Jonathan Wakely
@ 2023-05-26 14:18                         ` Jakub Jelinek
  2023-05-26 14:41                           ` Stefan Kanthak
  2023-05-26 14:26                         ` Stefan Kanthak
  1 sibling, 1 reply; 43+ messages in thread
From: Jakub Jelinek @ 2023-05-26 14:18 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Stefan Kanthak, gcc, Andrew Pinski

On Fri, May 26, 2023 at 03:07:59PM +0100, Jonathan Wakely wrote:
> > These points are obvious.
> > What is NOT obvious is that -m<feature> -march=<lowerISA> does not clear any
> > <feature> not supported in <lowerISA>, i.e. the last one does NOT win here.
> 
> The last -march option selects the base set of instructions. The -mISA
> options modify that base.

And for -m32 it is also the last option that wins, but as with many other
cases just the last one from a certain set of options.  In the -m32 case, the
last one of the -m{32,64,x32,16} options wins.  In the -march= case, the
last -march= option wins.  In, say, the PIC/PIE options case, the last one
of the -f{,no-}{pic,PIC,pie,PIE} options wins.  The -mISA options are processed
left to right after setting the base from -march=.
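
The two-phase processing described above can be sketched as a toy model in C. This is an illustration under stated assumptions, not GCC's actual option handling: the function names, the five-level SSE ladder and the tiny -march table (only "core2" is known, everything else maps to "no SSE") are all made up here, and the configured default arch is ignored.

```c
#include <string.h>

/* Toy ISA ladder; bit k of the result is set when level k is enabled. */
enum { SSE, SSE2, SSE3, SSSE3, SSE4_1, N_LEVELS };
static const char *const level_name[N_LEVELS] =
    { "sse", "sse2", "sse3", "ssse3", "sse4.1" };

/* Hypothetical base table: highest level implied by -march=<arch>,
   or -1 for none (e.g. "i386" in this toy model). */
static int base_level(const char *arch)
{
    return strcmp(arch, "core2") == 0 ? SSSE3 : -1;
}

/* Map an ISA name to its level, or -1 if it is not in the ladder. */
static int find_level(const char *name)
{
    for (int k = 0; k < N_LEVELS; k++)
        if (strcmp(name, level_name[k]) == 0)
            return k;
    return -1;
}

/* Returns a bitmask of enabled levels for the given option list. */
unsigned resolve_isa(int argc, const char *const argv[])
{
    unsigned mask = 0;
    int base = -1;

    /* Phase 1: only -march= options are looked at; the last one wins. */
    for (int i = 0; i < argc; i++)
        if (strncmp(argv[i], "-march=", 7) == 0)
            base = base_level(argv[i] + 7);
    for (int k = 0; k <= base; k++)
        mask |= 1u << k;

    /* Phase 2: -mISA / -mno-ISA options, left to right, modify that base
       wherever they appear relative to -march=. */
    for (int i = 0; i < argc; i++) {
        const char *opt = argv[i];
        if (strncmp(opt, "-mno-", 5) == 0) {
            /* Disable the level and everything that depends on it. */
            for (int k = find_level(opt + 5); k >= 0 && k < N_LEVELS; k++)
                mask &= ~(1u << k);
        } else if (strncmp(opt, "-m", 2) == 0) {
            /* Enable the level and everything it depends on;
               -m32 and -march= fall through here as unknown names. */
            for (int k = find_level(opt + 2); k >= 0; k--)
                mask |= 1u << k;
        }
    }
    return mask;
}
```

In this model "-msse4.1 -m32 -march=core2" and "-m32 -march=core2 -msse4.1" give the same mask, because the position of -msse4.1 relative to -march= never enters phase 2. Only the order among the -mISA options themselves matters.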

	Jakub


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 14:07                       ` Jonathan Wakely
  2023-05-26 14:18                         ` Jakub Jelinek
@ 2023-05-26 14:26                         ` Stefan Kanthak
  2023-05-26 14:58                           ` Jonathan Wakely
  1 sibling, 1 reply; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26 14:26 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Jakub Jelinek, gcc, Andrew Pinski

"Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:

> On Fri, 26 May 2023 at 14:55, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:

[...]

>> What is NOT obvious is that -m<feature> -march=<lowerISA> does not clear any
>> <feature> not supported in <lowerISA>, i.e. the last one does NOT win here.
> 
> The last -march option selects the base set of instructions. The -mISA
> options modify that base.

But you missed the point, AGAIN: the modifications per -mISA and -mno-ISA
persist, i.e. they are NOT reset by the last -march= option.
Is this SOOOO hard to grok?
Is this soooo hard to document?

Stefan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 14:18                         ` Jakub Jelinek
@ 2023-05-26 14:41                           ` Stefan Kanthak
  2023-05-26 14:55                             ` Jonathan Wakely
  0 siblings, 1 reply; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26 14:41 UTC (permalink / raw)
  To: Jakub Jelinek, Jonathan Wakely; +Cc: gcc, Andrew Pinski

"Jakub Jelinek" <jakub@redhat.com> wrote:

[...]

> And for -m32 it is also the last option that wins, but as with
> many other cases just the last one from a certain set of options. [...]
> The -mISA options are processed left to right after

as well as BEFORE

> setting base from -march=.

In other words: although -march= selects a (documented sub)set of
-mISA options, it does NEITHER reset any -mISA option set NOR any
-mno-ISA option reset BEFORE or AFTER itself, i.e. all -m[no-]ISA
options have precedence even if they precede -march=.

Just document that!

Stefan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 14:41                           ` Stefan Kanthak
@ 2023-05-26 14:55                             ` Jonathan Wakely
  2023-05-26 15:07                               ` Stefan Kanthak
  0 siblings, 1 reply; 43+ messages in thread
From: Jonathan Wakely @ 2023-05-26 14:55 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Jakub Jelinek, gcc, Andrew Pinski

On Fri, 26 May 2023 at 15:48, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>
> "Jakub Jelinek" <jakub@redhat.com> wrote:
>
> [...]
>
> > And for -m32 it is also the last option that wins, but as with
> > many other cases just the last one from a certain set of options. [...]
> > The -mISA options are processed left to right after
>
> as well as BEFORE

No. You seem to be interpreting "after" to mean "later in the command
line" but Jakub means *at a later time*. He used "left to right" to
describe position in the command line, and "after" means "at a later
time".

Any -march options are processed first, from left to right. After
that, there is a base arch in effect.
Then, after that, the -mISA options are processed, and take effect
relative to the base arch.

What Jakub wrote is correct. If you try a bit harder to understand
what has been said repeatedly, you might get it.

>
> > setting base from -march=.
>
> In other words: although -march= selects a (documented sub)set of
> -mISA options, it does NEITHER reset any -mISA option set NOR any
> -mno-ISA option reset BEFORE or AFTER itself, i.e. all -m[no-]ISA
> options have precedence even if they precede -march=.
>
> Just document that!

That is not far from unreadable.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 14:26                         ` Stefan Kanthak
@ 2023-05-26 14:58                           ` Jonathan Wakely
  2023-05-26 15:49                             ` Stefan Kanthak
  0 siblings, 1 reply; 43+ messages in thread
From: Jonathan Wakely @ 2023-05-26 14:58 UTC (permalink / raw)
  To: Stefan Kanthak; +Cc: Jakub Jelinek, gcc, Andrew Pinski

On Fri, 26 May 2023 at 15:34, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>
> "Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:
>
> > On Fri, 26 May 2023 at 14:55, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>
> [...]
>
> >> What is NOT obvious is that -m<feature> -march=<lowerISA> does not clear any
> >> <feature> not supported in <lowerISA>, i.e. the last one does NOT win here.
> >
> > The last -march option selects the base set of instructions. The -mISA
> > options modify that base.
>
> But you missed the point, AGAIN: the modifications per -mISA and -mno-ISA
> persist, i.e. they are NOT reset by the last -march= option.

Nobody said they are reset, and the docs don't say that, so assume they are not.

The last -march option selects a base and the -mISA options modify the
base. Note *the* base. The one that was selected. By the last -march
option. The base.

> Is this SOOOO hard to grok?

I understand your question. It's based on failing to read or
understand what has been said.

There is a base ISA. Then there are additions or subtractions relative
to that base. That's it. That actually tells you everything you need
to know, if you just apply some thought.

Choosing a base does not remove the effects of the additions or
subtractions, because they are additions and subtractions relative to
whichever base arch is in effect.

Where the -misa options appear on the command line relative to -march
doesn't matter, which is why it doesn't need to be stated explicitly.
The order only matters for -march relative to other -march options,
and -misa options relative to other -misa options.
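
One way to check this without reading the option tables is to let the compiler report which ISA macros it predefines. The sketch below relies on the macros GCC defines for enabled extensions (`__SSE__`, `__SSE2__`, `__SSE3__`, `__SSSE3__`, `__SSE4_1__`); the function names are made up for illustration.

```c
#include <stdio.h>

/* Highest SSE level the compiler may use, judged by its predefined macros:
   0 = none, 1 = SSE, 2 = SSE2, 3 = SSE3, 4 = SSSE3, 5 = SSE4.1 or later. */
int enabled_sse_level(void)
{
#if defined(__SSE4_1__)
    return 5;
#elif defined(__SSSE3__)
    return 4;
#elif defined(__SSE3__)
    return 3;
#elif defined(__SSE2__)
    return 2;
#elif defined(__SSE__)
    return 1;
#else
    return 0;
#endif
}

/* Print the level in readable form. */
void report_sse_level(void)
{
    static const char *const names[] =
        { "none", "SSE", "SSE2", "SSE3", "SSSE3", "SSE4.1 or later" };
    printf("highest enabled SSE level: %s\n", names[enabled_sse_level()]);
}
```

If the description above is right, a file calling `report_sse_level()` should print the same level whether it is built with "gcc -m32 -msse4.1 -march=core2" or "gcc -m32 -march=core2 -msse4.1", and a lower one with "-m32 -march=core2" alone.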


> Is this soooo hard to document?

I prefer arguing with trolls, it's even easier.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 14:55                             ` Jonathan Wakely
@ 2023-05-26 15:07                               ` Stefan Kanthak
  0 siblings, 0 replies; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26 15:07 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Jakub Jelinek, gcc, Andrew Pinski

"Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:

> On Fri, 26 May 2023 at 15:48, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>>
>> "Jakub Jelinek" <jakub@redhat.com> wrote:
>>
>> [...]
>>
>> > And for -m32 it is also the last option that wins, but as with
>> > many other cases just the last one from a certain set of options. [...]
>> > The -mISA options are processed left to right after
>>
>> as well as BEFORE
> 
> No. You seem to be interpreting "after" to mean "later in the command
> line"

OUCH: "left" and "right" denote SPACE, not TIME!
"after" follows "left to right", so it can only mean the position in the
command line; if he meant "later or earlier/before" instead he should
write that!

> but Jakub means *at a later time*. He used "left to right" to
> describe position in the command line, and "after" means "at a later
> time".

See above: that's his fault!

> Any -march options are processed first, from left to right. After
> that, there is a base arch in effect.
> Then, after that, the -mISA options are processed, and take effect
> relative to the base arch.
> 
> What Jakub wrote is correct. If you try a bit harder to understand
> what has been said repeatedly, you might get it.

Why don't you and Jakub try to express yourselves unambiguously?

>> > setting base from -march=.
>>
>> In other words: although -march= selects a (documented sub)set of
>> -mISA options, it does NEITHER reset any -mISA option set NOR any
>> -mno-ISA option reset BEFORE or AFTER itself, i.e. all -m[no-]ISA
>> options have precedence even if they precede -march=.
>>
>> Just document that!
> 
> That is not far from unreadable.

Far in respect to space or time?

Stefan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26  6:46 Will GCC eventually support SSE2 or SSE4.1? Stefan Kanthak
  2023-05-26  7:00 ` Andrew Pinski
  2023-05-26 11:34 ` Nicholas Vinson
@ 2023-05-26 15:10 ` LIU Hao
  2023-05-26 15:40   ` Stefan Kanthak
  2 siblings, 1 reply; 43+ messages in thread
From: LIU Hao @ 2023-05-26 15:10 UTC (permalink / raw)
  To: Stefan Kanthak, gcc


[-- Attachment #1.1: Type: text/plain, Size: 1911 bytes --]

在 2023-05-26 14:46, Stefan Kanthak 写道:
> OOPS: why does GCC (ab)use the SSE2 alias "Willamette New Instruction Set"
> (... ...)
> OUCH: why does it FAIL to REALLY use SSE2, as shown in the comments on the
>        right side?

Please stop yelling like that. It makes you look like a naughty pupil.


> 14 instructions in 33 bytes    # 11 instructions in 32 bytes
> 
> OUCH: why does GCC abuse EBX (and ECX too) and performs a superfluous
>        memory write?

Apart from the SSE question: You are performing 64-bit arithmetic on a 32-bit machine, which GCC 
isn't good at. The preferred way to check whether a 64-bit integer is a power of two is to cast it 
to a float, then examine whether its 23-bit mantissa is all zeroes. Like yours, this also mistakes 
zero for a 'power of two', which it isn't:

    ```
    sub   esp, 0x0C                  ; 83 EC 0C
    fild  qword ptr [esp + 0x10]     ; DF 6C 24 10
    xor   eax, eax                   ; 33 C0
    fstp  dword ptr [esp]            ; D9 1C 24
    shl   dword ptr [esp], 9         ; C1 24 24 09
    setz  al                         ; 0F 94 C0
    add   esp, 0x0C                  ; 83 C4 0C
    ret                              ; C3
    ```
That's 8 instructions and 23 bytes in total.

In 64-bit mode, 64-bit integers can be converted to floats directly:
    ```
    cvtsi2ss  xmm0, qword ptr [rsp + 0x08]   ; F3 48 0F 2A 44 24 08
    xor       eax, eax                       ; 33 C0
    movd      ecx, xmm0                      ; 66 0F 7E C1
    shl       ecx, 9                         ; C1 E1 09
    setz      al                             ; 0F 94 C0
    ret                                      ; C3
    ```
That's 6 instructions and 20 bytes in total.

GCC has its own limitations, so if you would like aggressive optimization like this, you must do it 
yourself.


-- 
Best regards,
LIU Hao


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 15:10 ` LIU Hao
@ 2023-05-26 15:40   ` Stefan Kanthak
  2023-05-27 18:20     ` LIU Hao
  0 siblings, 1 reply; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26 15:40 UTC (permalink / raw)
  To: gcc, LIU Hao

You wrote:

>在 2023-05-26 14:46, Stefan Kanthak 写道:
>> OOPS: why does GCC (ab)use the SSE2 alias "Willamette New Instruction Set"
>> (... ...)
>> OUCH: why does it FAIL to REALLY use SSE2, as shown in the comments on the
>>        right side?
>
> Please stop yelling like that. It makes you look like a naughty pupil.

That's EMPHASISING, kid!

>> 14 instructions in 33 bytes    # 11 instructions in 32 bytes
>>
>> OUCH: why does GCC abuse EBX (and ECX too) and performs a superfluous
>>        memory write?
>
> Apart from the SSE question: You are performing 64-bit arithmetic on a 32-bit
> machine, which GCC isn't good at.

So it's REALLY time to fix that.

> The preferred way to check whether a 64-bit integer is a power of two is to cast it
> to a float, then examine whether its 23-bit mantissa is all zeroes:

Feel free to propose this alternative here (better elsewhere, where you'll
earn less laughter).
But don't forget that this 23-bit mantissa will be all zeroes for quite some
64-bit (and even 32-bit) integers which are not powers of 2, for example
0x8000003fffffffff, and that both FILD and CVTSI2SS only work on SIGNED
integers.
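
The false positives can be reproduced in portable C. The sketch below is an illustration, not code from the thread: it assumes `float` is IEEE-754 binary32 and that conversions round to nearest (the usual default), and the names `is_pow2_bits` and `is_pow2_float` are made up.

```c
#include <stdint.h>
#include <string.h>

/* Bit test in the style of demo.c, with the zero case excluded. */
int is_pow2_bits(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* The proposed float test: convert as a SIGNED 64-bit integer (as FILD and
   CVTSI2SS do) and check whether the 23 stored mantissa bits are all zero.
   Assumes float is IEEE-754 binary32. */
int is_pow2_float(int64_t v)
{
    float f = (float)v;              /* rounds to 24 significant bits */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the float's encoding */
    return (bits & 0x007FFFFFu) == 0;
}
```

With these definitions, `is_pow2_float(INT64_MIN + 0x3fffffffffLL)` (the bit pattern 0x8000003fffffffff) and `is_pow2_float(0x40000001)` both return 1 because the low bits are rounded away, while `is_pow2_bits` returns 0 for the same bit patterns.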

I instead prefer to show that, and how, GCC's current code generator fails to
optimise properly.

Stefan


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 14:58                           ` Jonathan Wakely
@ 2023-05-26 15:49                             ` Stefan Kanthak
  2023-05-26 16:44                               ` David Brown
  0 siblings, 1 reply; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-26 15:49 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Jakub Jelinek, gcc, Andrew Pinski

"Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:

> On Fri, 26 May 2023 at 15:34, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>>
>> "Jonathan Wakely" <jwakely.gcc@gmail.com> wrote:
>>
>> > On Fri, 26 May 2023 at 14:55, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>>
>> [...]
>>
>> >> What is NOT obvious is that -m<feature> -march=<lowerISA> does not clear any
>> >> <feature> not supported in <lowerISA>, i.e. the last one does NOT win here.
>> >
>> > The last -march option selects the base set of instructions. The -mISA
>> > options modify that base.
>>
>> But you missed the point, AGAIN: the modifications per -mISA and -mno-ISA
>> persist, i.e. they are NOT reset by the last -march= option.
> 
> Nobody said they are reset, and the docs don't say that, so assume they are not.

For the supported -march= the documentation EXPLICITLY enumerates (all?)
options!
Why should I assume that these options are "sticky" and override the
DOCUMENTED set?

Unless you document the behaviour in either way it is UB, so every user
of GCC can assume anything he wants!

> The last -march option selects a base and the -mISA options modify the
> base. Note *the* base. The one that was selected. By the last -march
> option. The base.
> 
>> Is this SOOOO hard to grok?
> 
> I understand your question. It's based on failing to read or
> understand what has been said.

Yes, silly!

[...]

>> Is this soooo hard to document?
> 
> I prefer arguing with trolls, it's even easier.

I don't like to argue with idiots: they beat me with experience!

Stefan

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 15:49                             ` Stefan Kanthak
@ 2023-05-26 16:44                               ` David Brown
  2023-05-27 18:16                                 ` Will GCC eventually support correct code compilation? Dave Blanchard
  0 siblings, 1 reply; 43+ messages in thread
From: David Brown @ 2023-05-26 16:44 UTC (permalink / raw)
  To: gcc

On 26/05/2023 17:49, Stefan Kanthak wrote:

> I don't like to argue with idiots: they beat me with experience!
> 
> Stefan
> 

Stefan, you are clearly not happy about the /free/ compiler you are 
using, and its /free/ documentation (which, despite its flaws, is better 
than I have seen for most other compilers).

Instead of filing a bug report, as you have been asked to do, or reading 
the documentation, or thinking, or posting to an appropriate mailing 
list, you have chosen to rant, yell, shout at and insult the very people 
who could make the changes and improvements you want.

So who, exactly, do you think is acting like an idiot?  I'd say it is 
the rude and arrogant fool that is sawing off the branch he is sitting on.

Remember, these are people with /no/ obligation to help you.  Some do 
gcc development as voluntary contributions, others are paid to work on 
it - but they are not paid by /you/.  And none are paid to sit and 
listen to your tantrums.


So if you want to shout and rant and blow off steam, go make a tweet or 
something.  If you actually hope to see gcc change its optimisation, 
flag details or documentation to your liking, then your current 
behaviour is the worst possible tactic.  So let your final post to this 
thread be an apology, then register bug reports with what you see as 
bugs or scope for improvement in the project.  Please - for the sanity 
of the gcc developers and for the benefit of gcc users everywhere - stop 
your aggravating posts here, so that Jonathan and the others can get back to 
what they do best - improving gcc for everyone.

David



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Will GCC eventually support correct code compilation?
  2023-05-26 16:44                               ` David Brown
@ 2023-05-27 18:16                                 ` Dave Blanchard
  2023-05-27 18:59                                   ` Jason Merrill
  2023-05-28 11:50                                   ` David Brown
  0 siblings, 2 replies; 43+ messages in thread
From: Dave Blanchard @ 2023-05-27 18:16 UTC (permalink / raw)
  To: gcc; +Cc: David Brown

On Fri, 26 May 2023 18:44:41 +0200
David Brown via Gcc <gcc@gcc.gnu.org> wrote:

> On 26/05/2023 17:49, Stefan Kanthak wrote:
> 
> > I don't like to argue with idiots: they beat me with experience!
> > 
> > Stefan
> > 
> 
> Stefan, you are clearly not happy about the /free/ compiler you are 
> using, and its /free/ documentation (which, despite its flaws, is better 
> than I have seen for most other compilers).

When the flaws continue to stack up as things get provably worse over time, at some point you need to stop patting yourself on the back, riding on the coattails of your past successes, and get to work making things right.

At the very least, GCC documentation is HORRIBLE, as this previous thread proves.

> Instead of filing a bug report, as you have been asked to do, or reading 
> the documentation, or thinking, or posting to an appropriate mailing 
> list, you have chosen to rant, yell, shout at and insult the very people 
> who could make the changes and improvements you want.

Actually, no, that's not what happened. He made a valid observation and got the run-around; the typical "just RTFM noob" treatment, despite pointing out again and again that the documentation LIES. 

The overall point however was successfully buried in the noise: looks like the code quality of GCC is shit anymore.

If you hand me a pile of shit wrapped up nicely in a plastic bag, guess what: I still don't want it, even if it's free. So I think this man (and the people of this mailing list) deserve a real explanation. Why does GCC generate such shit code?

> So who, exactly, do you think is acting like an idiot?  I'd say it is 
> the rude and arrogant fool that is sawing off the branch he is sitting on.

If the branch is rotten and splintered then maybe it's time to get off that branch and climb onto another one.

> Remember, these are people with /no/ obligation to help you. 

... and it often shows!

> Some do gcc development as voluntary contributions, others are paid to work on 
> it - but they are not paid by /you/.  And none are paid to sit and 
> listen to your tantrums.

So is this proof of the technical and intellectual bankruptcy of the open source development model, or...?

If nobody wants to have detailed discussions about the technical workings of a very serious tool that millions are relying on day in and day out, what is this mailing list FOR, exactly?

Dave

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-26 15:40   ` Stefan Kanthak
@ 2023-05-27 18:20     ` LIU Hao
  2023-05-27 18:49       ` Stefan Kanthak
  0 siblings, 1 reply; 43+ messages in thread
From: LIU Hao @ 2023-05-27 18:20 UTC (permalink / raw)
  To: Stefan Kanthak, gcc


[-- Attachment #1.1: Type: text/plain, Size: 1226 bytes --]

在 2023-05-26 23:40, Stefan Kanthak 写道:
> Feel free to propose this alternative here (better elsewhere, where you'll
> earn less laughter).
> But don't forget that this 23-bit mantissa will be all zeroes for quite some
> 64-bit (and even 32-bit) integers which are not powers of 2, for example
> 0x8000003fffffffff, and that both FILD and CVTSI2SS only work on SIGNED
> integers.

The precision loss can be detected by examining the PE bit (6th bit, i.e. `0x20`) of the x87 status 
register. It doesn't matter whether the number is interpreted as signed or unsigned: 
`-0x80000000'00000000` still only has one bit in its mantissa. Another option is to store the number 
in the 80-bit extended precision format, with a 64-bit mantissa which includes the otherwise hidden 
bit (so if the number is a power of two, the mantissa will be `0x80000000'00000000`).

But anyway, traditional x86 has very few GPRs and GCC doesn't optimize multi-word arithmetic very 
well. Performance may or may not vary depending on cache locality and number of μops; not to mention 
`movq` and `movd` which have relatively high latencies. I would like to see some benchmarking results 
first.



-- 
Best regards,
LIU Hao


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: Will GCC eventually support SSE2 or SSE4.1?
  2023-05-27 18:20     ` LIU Hao
@ 2023-05-27 18:49       ` Stefan Kanthak
  0 siblings, 0 replies; 43+ messages in thread
From: Stefan Kanthak @ 2023-05-27 18:49 UTC (permalink / raw)
  To: gcc, LIU Hao

You wrote:

>On 2023-05-26 23:40, Stefan Kanthak wrote:
>> Feel free to propose this alternative here (better elsewhere, where you'll
>> earn less laughter).
>> But don't forget that this 23-bit mantissa will be all zeroes for quite some
>> 64-bit (and even 32-bit) integers which are not powers of 2, for example
>> 0x8000003fffffffff, and that both FILD and CVTSI2SS only work on SIGNED
>> integers.
>
> The precision loss can be detected by examining the PE (precision exception)
> flag (6th bit, i.e. `0x20`) of the x87 status word.

How many instructions and conditional branches do you need then?
Is the COMPLETE code using x87 instructions shorter/faster than the pure i386
code?

JFTR: I show the DELTA to the generated code with intention; I don't create
      completely different code.

> It doesn't matter whether the number is interpreted as signed or unsigned:
> `-0x80000000'00000000` still only has one bit in its mantissa. Another option
> is to store the number in the 80-bit extended precision format, with a 64-bit
> mantissa which includes the otherwise hidden bit (so if the number is a power
> of two, the mantissa will be `0x80000000'00000000`).

Correct; but you proposed to use the 23-bit mantissa!

> But anyway, traditional x86 has very few GPRs and GCC doesn't optimize
> multi-word arithmetic very well. Performance may or may not vary depending on
> cache locality and number of μops; not to mention `movq` and `movd`, which have
> relatively high latencies. I would like to see some benchmarking results first.

Ask the GCC developers why they generate SSE2 instructions here in the first
place, and why they ignore their shortcomings, instead of sticking with i386 code.

JFTR: adding "(argument != 0) &&" stops their nonsense!
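
For reference, a minimal sketch of the variant this remark appears to describe, assuming the extra test is simply prepended to the expression from demo.c. The function name `ispowerof2_nonzero` is hypothetical and not taken from the thread.

```c
/* Hypothetical variant of ispowerof2() from demo.c with the zero test added. */
int ispowerof2_nonzero(unsigned long long argument)
{
    /* Zero is rejected first; the remaining test is the original expression. */
    return (argument != 0) && (argument & (argument - 1)) == 0;
}
```

Note that this changes the result for an argument of 0 from 1 to 0, so it is not a drop-in replacement for the original function. Which instructions a given GCC version emits for it is best checked by compiling it with the options used earlier in the thread.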

Stefan



* Re: Will GCC eventually support correct code compilation?
  2023-05-27 18:16                                 ` Will GCC eventually support correct code compilation? Dave Blanchard
@ 2023-05-27 18:59                                   ` Jason Merrill
  2023-05-28 11:50                                   ` David Brown
  1 sibling, 0 replies; 43+ messages in thread
From: Jason Merrill @ 2023-05-27 18:59 UTC (permalink / raw)
  To: Dave Blanchard; +Cc: gcc, David Brown


On Sat, May 27, 2023 at 2:15 PM Dave Blanchard <dave@killthe.net> wrote:

> If nobody wants to have detailed discussions about the technical workings
> of a very serious tool that millions are relying on day in and day out,
> what is this mailing list FOR, exactly?
>

For discussions, absolutely.  Not for Usenet-style flame wars like the one
he and you seem determined to start.

If you lead with "your work is bad and you should feel bad" that doesn't
tend to lead to constructive discussion.

Jason


* Re: Will GCC eventually support correct code compilation?
  2023-05-27 18:16                                 ` Will GCC eventually support correct code compilation? Dave Blanchard
  2023-05-27 18:59                                   ` Jason Merrill
@ 2023-05-28 11:50                                   ` David Brown
  1 sibling, 0 replies; 43+ messages in thread
From: David Brown @ 2023-05-28 11:50 UTC (permalink / raw)
  To: gcc

On 27/05/2023 20:16, Dave Blanchard wrote:
> On Fri, 26 May 2023 18:44:41 +0200 David Brown via Gcc
> <gcc@gcc.gnu.org> wrote:
> 
>> On 26/05/2023 17:49, Stefan Kanthak wrote:
>> 
>>> I don't like to argue with idiots: they beat me with experience!
>>> 
>>> Stefan
>>> 
>> 
>> Stefan, you are clearly not happy about the /free/ compiler you
>> are using, and its /free/ documentation (which, despite its flaws,
>> is better than I have seen for most other compilers).
> 
> When the flaws continue to stack up as things get provably worse over
> time, at some point you need to stop patting yourself on the back,
> riding on the coattails of your past successes, and get to work
> making things right.
> 

I think your idea of "proof" might differ from that of everyone else. 
The GCC developers are entirely aware that their tools have bugs and 
scope for improvement, but anyone who has followed the project for any 
length of time can see it has continually progressed in many ways. 
There are regularly minor regressions, and occasionally serious issues - 
but the serious issues get fixed.

This is open source software.  If newer versions were "getting provably 
worse over time", then people would simply fork earlier versions and use 
them.  That's what happens in projects where a significant number of 
users or developers feel the project is moving in the wrong direction.

> At the very least, GCC documentation is HORRIBLE, as this previous
> thread proves.

Now I am sure that you don't know what "proof" is.  In regard to 
documentation, this thread proves that GCC's documentation is not 
perfect, that the GCC developers know this, that they ask people for 
suggestions for improvement, and that they keep track of suggestions or 
complaints so that they can be fixed when time and resources allow.

> 
> If the branch is rotten and splintered then maybe it's time to get
> off that branch and climb onto another one.

Feel free to do so.

> 
>> Remember, these are people with /no/ obligation to help you.
> 
> ... and it often shows!

My experience, like that of most people (judging from the mailing lists 
and the bugzilla discussions I have read), is different - those who 
treat the GCC developers politely and with the respect due any fellow 
human, get a great deal of help.  They might not always agree on what 
should be changed, but even then you can generally come out of the 
discussion with an understanding of /why/ they cannot or will not change 
GCC as you'd like.

But - like everyone else - the GCC developers can quickly lose interest 
in helping those who come across as rude, demanding, unhelpful and 
wilfully ignorant.

> 
>> Some do gcc development as voluntary contributions, others are paid
>> to work on it - but they are not paid by /you/.  And none are paid
>> to sit and listen to your tantrums.
> 
> So is this proof of the technical and intellectual bankruptcy of
> the open source development model, or...?

No, it is not.

> 
> If nobody wants to have detailed discussions about the technical
> workings of a very serious tool that millions are relying on day in
> and day out, what is this mailing list FOR, exactly?
> 

It /is/ for such discussions.  This thread has not been a discussion - 
it has been driven by someone who preferred to yell and whine rather 
than discuss, and insisted on continuing here rather than filing bug 
reports in the right places.  The GCC developers prefer to work /with/ 
the users in finding out how to make the toolchain better - /that/ is 
what the mailing lists are for.




Thread overview: 43+ messages
2023-05-26  6:46 Will GCC eventually support SSE2 or SSE4.1? Stefan Kanthak
2023-05-26  7:00 ` Andrew Pinski
2023-05-26  7:30   ` Jonathan Wakely
2023-05-26  7:58     ` Stefan Kanthak
2023-05-26  8:16       ` Sam James
2023-05-26  8:28       ` Jonathan Wakely
2023-05-26  8:59         ` Stefan Kanthak
2023-05-26  9:22           ` Jakub Jelinek
2023-05-26 11:28             ` Stefan Kanthak
2023-05-26 11:42               ` Jonathan Wakely
2023-05-26 12:03                 ` Stefan Kanthak
2023-05-26 12:16                   ` Jonathan Wakely
2023-05-26 12:22                     ` Stefan Kanthak
2023-05-26 13:00                       ` Mark Wielaard
2023-05-26 12:23                   ` Jonathan Wakely
2023-05-26 11:36             ` Stefan Kanthak
2023-05-26 11:45               ` Jonathan Wakely
2023-05-26 12:19                 ` Stefan Kanthak
2023-05-26 12:30                   ` Jonathan Wakely
2023-05-26 12:42                     ` Stefan Kanthak
2023-05-26 13:33                       ` Nicholas Vinson
2023-05-26 12:37                   ` Jakub Jelinek
2023-05-26 13:49                     ` Stefan Kanthak
2023-05-26 14:07                       ` Jonathan Wakely
2023-05-26 14:18                         ` Jakub Jelinek
2023-05-26 14:41                           ` Stefan Kanthak
2023-05-26 14:55                             ` Jonathan Wakely
2023-05-26 15:07                               ` Stefan Kanthak
2023-05-26 14:26                         ` Stefan Kanthak
2023-05-26 14:58                           ` Jonathan Wakely
2023-05-26 15:49                             ` Stefan Kanthak
2023-05-26 16:44                               ` David Brown
2023-05-27 18:16                                 ` Will GCC eventually support correct code compilation? Dave Blanchard
2023-05-27 18:59                                   ` Jason Merrill
2023-05-28 11:50                                   ` David Brown
2023-05-26  9:22           ` Will GCC eventually support SSE2 or SSE4.1? Jonathan Wakely
2023-05-26  8:12     ` Hagen Paul Pfeifer
2023-05-26  9:51       ` Jonathan Wakely
2023-05-26 11:34 ` Nicholas Vinson
2023-05-26 15:10 ` LIU Hao
2023-05-26 15:40   ` Stefan Kanthak
2023-05-27 18:20     ` LIU Hao
2023-05-27 18:49       ` Stefan Kanthak
