* Another epic optimiser failure
@ 2023-05-27 21:04 Stefan Kanthak
2023-05-27 21:20 ` Jakub Jelinek
2023-05-28 6:28 ` Nicholas Vinson
0 siblings, 2 replies; 11+ messages in thread
From: Stefan Kanthak @ 2023-05-27 21:04 UTC (permalink / raw)
To: gcc
--- .c ---
int ispowerof2(unsigned long long argument) {
    return __builtin_popcountll(argument) == 1;
}
--- EOF ---
GCC 13.3    gcc -m32 -march=alderlake -O3
            gcc -m32 -march=sapphirerapids -O3
            gcc -m32 -mpopcnt -mtune=sapphirerapids -O3
https://gcc.godbolt.org/z/cToYrrYPq
ispowerof2(unsigned long long):
        xor     eax, eax        # superfluous
        xor     edx, edx        # superfluous
        popcnt  eax, [esp+4]
        popcnt  edx, [esp+8]
        add     eax, edx
        cmp     eax, 1          ->    dec eax
        sete    al
        movzx   eax, al         # superfluous
        ret
9 instructions in 28 bytes # 6 instructions in 20 bytes
OUCH: popcnt writes the WHOLE result register, there is ABSOLUTELY
no need to clear it beforehand nor to clear the higher 24 bits
afterwards!
JFTR: before GCC zealots write nonsense: see -march= or -mtune=
GCC 13.3 gcc -mpopcnt -mtune=barcelona -O3
https://gcc.godbolt.org/z/3Ks8vh7a6
ispowerof2(unsigned long long):
        popcnt  rdi, rdi        ->    popcnt rax, rdi
        xor     eax, eax        # superfluous!
        dec     edi             ->    dec eax
        sete    al              ->    setz al
        ret
GCC 13.3 gcc -m32 -mpopcnt -mtune=barcelona -O3
https://gcc.godbolt.org/z/s5s5KTGnv
ispowerof2(unsigned long long):
        popcnt  eax, [esp+4]
        popcnt  edx, [esp+8]
        add     eax, edx
        dec     eax
        sete    al
        movzx   eax, al         # superfluous!
        ret
Will GCC eventually generate properly optimised code instead of bloat?
Stefan
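For readers following the -m32 listings above: on a 32-bit target the 64-bit argument sits in two 32-bit stack slots, so the compiler counts the bits of each half and adds the two counts. A minimal C sketch of that split, assuming GCC or Clang for the `__builtin_popcount` builtins; the name `ispowerof2_split` is invented here for illustration and does not appear in the original message:

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: mirrors what the -m32
   listings do, i.e. one 32-bit popcount per half, then an add. */
int ispowerof2_split(unsigned long long argument)
{
    uint32_t lo = (uint32_t)argument;          /* [esp+4] in the listing */
    uint32_t hi = (uint32_t)(argument >> 32);  /* [esp+8] in the listing */

    return __builtin_popcount(lo) + __builtin_popcount(hi) == 1;
}

/* The function from the original message, for comparison. */
int ispowerof2(unsigned long long argument)
{
    return __builtin_popcountll(argument) == 1;
}
```

Both functions should agree for every input; the sketch says nothing about which instruction sequence is faster, which is the actual point of contention in the thread.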
* Re: Another epic optimiser failure
2023-05-27 21:04 Another epic optimiser failure Stefan Kanthak
@ 2023-05-27 21:20 ` Jakub Jelinek
2023-05-27 21:28 ` Stefan Kanthak
2023-05-28 6:28 ` Nicholas Vinson
1 sibling, 1 reply; 11+ messages in thread
From: Jakub Jelinek @ 2023-05-27 21:20 UTC (permalink / raw)
To: Stefan Kanthak; +Cc: gcc
On Sat, May 27, 2023 at 11:04:11PM +0200, Stefan Kanthak wrote:
> OUCH: popcnt writes the WHOLE result register, there is ABSOLUTELY
> no need to clear it beforehand nor to clear the higher 24 bits
> afterwards!
Except that there is. See https://gcc.gnu.org/PR62011 for details.
Jakub
* Re: Another epic optimiser failure
2023-05-27 21:20 ` Jakub Jelinek
@ 2023-05-27 21:28 ` Stefan Kanthak
2023-05-27 21:42 ` Andrew Pinski
0 siblings, 1 reply; 11+ messages in thread
From: Stefan Kanthak @ 2023-05-27 21:28 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: gcc
"Jakub Jelinek" <jakub@redhat.com> wrote, completely clueless:
> On Sat, May 27, 2023 at 11:04:11PM +0200, Stefan Kanthak wrote:
>> OUCH: popcnt writes the WHOLE result register, there is ABSOLUTELY
>> no need to clear it beforehand nor to clear the higher 24 bits
>> afterwards!
>
> Except that there is. See https://gcc.gnu.org/PR62011 for details.
GUESS WHY I EXPLICITLY WROTE
| JFTR: before GCC zealots write nonsense: see -march= or -mtune=
NOT AMUSED ABOUT YOUR CRYING INCOMPETENCE!
Stefan
* Re: Another epic optimiser failure
2023-05-27 21:28 ` Stefan Kanthak
@ 2023-05-27 21:42 ` Andrew Pinski
2023-05-27 22:00 ` Stefan Kanthak
0 siblings, 1 reply; 11+ messages in thread
From: Andrew Pinski @ 2023-05-27 21:42 UTC (permalink / raw)
To: Stefan Kanthak; +Cc: Jakub Jelinek, gcc
On Sat, May 27, 2023 at 2:38 PM Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>
> "Jakub Jelinek" <jakub@redhat.com> wrote, completely clueless:
>
> > On Sat, May 27, 2023 at 11:04:11PM +0200, Stefan Kanthak wrote:
> >> OUCH: popcnt writes the WHOLE result register, there is ABSOLUTELY
> >> no need to clear it beforehand nor to clear the higher 24 bits
> >> afterwards!
> >
> > Except that there is. See https://gcc.gnu.org/PR62011 for details.
>
> GUESS WHY I EXPLICITLY WROTE
>
> | JFTR: before GCC zealots write nonsense: see -march= or -mtune=
>
> NOT AMUSED ABOUT YOUR CRYING INCOMPETENCE!
So you want to complain about GCC knowing about an Intel performance erratum????
If you read that bug report, you would have learned why the zeroing is
done. I am sorry you hate GCC for actually following advice from the
processor maker after all.
> Stefan
* Re: Another epic optimiser failure
2023-05-27 21:42 ` Andrew Pinski
@ 2023-05-27 22:00 ` Stefan Kanthak
2023-05-27 22:46 ` Jonathan Wakely
0 siblings, 1 reply; 11+ messages in thread
From: Stefan Kanthak @ 2023-05-27 22:00 UTC (permalink / raw)
To: Andrew Pinski; +Cc: Jakub Jelinek, gcc
"Andrew Pinski" <pinskia@gmail.com> wrote:
> On Sat, May 27, 2023 at 2:38 PM Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>>
>> "Jakub Jelinek" <jakub@redhat.com> wrote, completely clueless:
>>
>>> On Sat, May 27, 2023 at 11:04:11PM +0200, Stefan Kanthak wrote:
>>>> OUCH: popcnt writes the WHOLE result register, there is ABSOLUTELY
>>>> no need to clear it beforehand nor to clear the higher 24 bits
>>>> afterwards!
>>>
>>> Except that there is. See https://gcc.gnu.org/PR62011 for details.
>>
>> GUESS WHY I EXPLICITLY WROTE
>>
>> | JFTR: before GCC zealots write nonsense: see -march= or -mtune=
>>
>> NOT AMUSED ABOUT YOUR CRYING INCOMPETENCE!
>
> So you want to complain about GCC knowing about an Intel performance erratum????
Ever heard about cargo cult?
Read the options I used CAREFULLY
| gcc -m32 -march=alderlake -O3
| gcc -m32 -march=sapphirerapids -O3
| gcc -m32 -mpopcnt -mtune=sapphirerapids -O3
> If you read that bug report, you would have learned why the zeroing is
> done. I am sorry you hate GCC for actually following advice from the
> processor maker after all.
HSD146 does NOT apply to Alder Lake or Sapphire Rapids!
NOT AMUSED ABOUT YOUR INCOMPETENCE TOO!
Stefan
* Re: Another epic optimiser failure
2023-05-27 22:00 ` Stefan Kanthak
@ 2023-05-27 22:46 ` Jonathan Wakely
0 siblings, 0 replies; 11+ messages in thread
From: Jonathan Wakely @ 2023-05-27 22:46 UTC (permalink / raw)
To: Stefan Kanthak; +Cc: Andrew Pinski, Jakub Jelinek, gcc
On Sat, 27 May 2023 at 23:03, Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
>
> "Andrew Pinski" <pinskia@gmail.com> wrote:
>
> > On Sat, May 27, 2023 at 2:38 PM Stefan Kanthak <stefan.kanthak@nexgo.de> wrote:
> >>
> >> "Jakub Jelinek" <jakub@redhat.com> wrote, completely clueless:
> >>
> >>> On Sat, May 27, 2023 at 11:04:11PM +0200, Stefan Kanthak wrote:
> >>>> OUCH: popcnt writes the WHOLE result register, there is ABSOLUTELY
> >>>> no need to clear it beforehand nor to clear the higher 24 bits
> >>>> afterwards!
> >>>
> >>> Except that there is. See https://gcc.gnu.org/PR62011 for details.
> >>
> >> GUESS WHY I EXPLICITLY WROTE
> >>
> >> | JFTR: before GCC zealots write nonsense: see -march= or -mtune=
> >>
> >> NOT AMUSED ABOUT YOUR CRYING INCOMPETENCE!
> >
> > > So you want to complain about GCC knowing about an Intel performance erratum????
>
> Ever heard about cargo cult?
>
> Read the options I used CAREFULLY
>
> | gcc -m32 -march=alderlake -O3
> | gcc -m32 -march=sapphirerapids -O3
> | gcc -m32 -mpopcnt -mtune=sapphirerapids -O3
>
> > If you read that bug report, you would have learned why the zeroing is
> > done. I am sorry you hate GCC for actually following advice from the
> > processor maker after all.
>
> HSD146 does NOT apply to Alder Lake or Sapphire Rapids!
>
> NOT AMUSED ABOUT YOUR INCOMPETENCE TOO!
> Stefan
File a proper bug report then, instead of subjecting this mailing list
to your clown show.
* Re: Another epic optimiser failure
2023-05-27 21:04 Another epic optimiser failure Stefan Kanthak
2023-05-27 21:20 ` Jakub Jelinek
@ 2023-05-28 6:28 ` Nicholas Vinson
1 sibling, 0 replies; 11+ messages in thread
From: Nicholas Vinson @ 2023-05-28 6:28 UTC (permalink / raw)
To: gcc
On 5/27/23 17:04, Stefan Kanthak wrote:
> --- .c ---
> int ispowerof2(unsigned long long argument) {
> return __builtin_popcountll(argument) == 1;
> }
> --- EOF ---
>
> GCC 13.3 gcc -m32 -march=alderlake -O3
> gcc -m32 -march=sapphirerapids -O3
> gcc -m32 -mpopcnt -mtune=sapphirerapids -O3
>
> https://gcc.godbolt.org/z/cToYrrYPq
> ispowerof2(unsigned long long):
>         xor     eax, eax        # superfluous
>         xor     edx, edx        # superfluous
>         popcnt  eax, [esp+4]
>         popcnt  edx, [esp+8]
>         add     eax, edx
>         cmp     eax, 1          ->    dec eax
>         sete    al
>         movzx   eax, al         # superfluous
>         ret
>
> 9 instructions in 28 bytes # 6 instructions in 20 bytes
I agree this can be done using 6 instructions, but you cannot do it
using the dec instruction. If you use the dec instruction, "movzx eax,
al" becomes a required instruction (consider the case when the input is
0) resulting in 7 instructions and 22 bytes.
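The input-0 case can be checked with a small register model. The sketch below is not from the original message: the function names are invented for illustration, and it models only the architectural fact that `sete al` writes the low 8 bits of eax and leaves the upper 24 bits as they were.

```c
#include <stdint.h>

/* Model of "sete al": only the low 8 bits of eax are written. */
static uint32_t sete_al(uint32_t eax, int zero_flag)
{
    return (eax & 0xFFFFFF00u) | (zero_flag ? 1u : 0u);
}

/* add eax, edx ; cmp eax, 1 ; sete al   (no movzx afterwards) */
uint32_t tail_with_cmp(uint32_t popcount)
{
    uint32_t eax = popcount;      /* cmp leaves eax untouched */
    return sete_al(eax, eax == 1);
}

/* add eax, edx ; dec eax ; sete al   (no movzx afterwards) */
uint32_t tail_with_dec(uint32_t popcount)
{
    uint32_t eax = popcount - 1;  /* dec overwrites eax and wraps at 0 */
    return sete_al(eax, eax == 0);
}
```

Under this model, a population count of 0 makes `tail_with_dec` return 0xFFFFFF00, which C treats as true, while `tail_with_cmp` returns 0. For counts from 1 to 64 both variants return 0 or 1, since the upper 24 bits of eax are already clear.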
* Re: Another epic optimiser failure
2023-05-29 19:01 ` Dave Blanchard
2023-05-29 23:44 ` Nicholas Vinson
@ 2023-05-30 4:04 ` Julian Waters
1 sibling, 0 replies; 11+ messages in thread
From: Julian Waters @ 2023-05-30 4:04 UTC (permalink / raw)
To: Dave Blanchard, gcc
"There's your first mistake. Hint: people who are able to hand deconstruct
the output of a compiler's code generator and point out exactly how
instructions are wasted are never correctly referred to as an "idiot", in
the context of computer programming at least."
gdb -batch -ex 'set disassembly-flavor intel' -ex 'file /bin/ls' -ex
'disassemble method'
Being able to run a disassembler does not make you intelligent or hot shit ;)
"Strange reasoning you've used here. Is this sort of like how if I'm
against Donald Trump, then I must be for Hillary Clinton, or vice
versa?
That's called a "false dichotomy" FYI."
Literally a few sentences later:
"Are the GCC developers *trying* to subtly push everyone toward Clang,
by slowly degrading GCC over time in hopes that people will eventually
give up and leave in frustration? Serious question."
"You mean the ones which are unclear and uncertain, because the GCC
documentation is inaccurate or simply lies?"
I will concede that gcc's documentation is pretty horrible, but that's
the exact reason I asked the gcc developers
and maintainers to list the options out here instead. Maybe you need
to brush up on your comprehension skills?
"Do you have any rebuttals of his argument to present yourself? Or do
you prefer to just sit back and wait on "y'all" to do the heavy
lifting?"
I would, if he hadn't already been absolutely schooled in more recent
replies pointing out why gcc produces certain code sequences.
At least he had the maturity to apologize and take his leave once he
found out he had made several mistakes, unlike
a certain someone I'm speaking to right now. And the few corner cases
he mentions that are valid he didn't even bother
filing a bug report, but instead resorted to endlessly screaming on a
mailing list without letting anyone talk to
him properly.
"What version of GCC can we expect to generate efficient and correct
code for this brand new, just-released "x86" instruction set? Maybe
GCC 97 will finally get it right...which at the current rate of major
version number increase, should be some time next year I guess.
Or rather more accurately, when will GCC's code generator stop
regressing as it seemingly has done for many versions now, and finally
Make Compiling Great Again?"
Again, you may want to look at the more recent code snippets posted
after the main argument got sidetracked, I don't
even have to do anything, hilariously enough.
It's also amazing how you managed to overpoliticize the rest of your
post after using that criticism against me
initially, how ironic.
"Ever heard the saying "if you can't run with the big dogs, stay under
the porch"?"
You think you're one of the big dogs? Pffft, that's cute.
I have no interest in getting into a compiler pissing match past this
point, so be on your way. Go annoy other people instead with
your garbage.
On Tue, May 30, 2023 at 2:59 AM Dave Blanchard <dave@killthe.net> wrote:
> On Sun, 28 May 2023 15:50:41 +0800
> Julian Waters via Gcc <gcc@gcc.gnu.org> wrote:
>
> > Man, these clang fanboys sure are getting out of hand
>
> Strange reasoning you've used here. Is this sort of like how if I'm
> against Donald Trump, then I must be for Hillary Clinton, or vice versa?
>
> That's called a "false dichotomy" FYI.
>
> > I feel like all this garbage can be easily resolved by y'all showing this
> > idiot
>
> There's your first mistake. Hint: people who are able to hand deconstruct
> the output of a compiler's code generator and point out exactly how
> instructions are wasted are never correctly referred to as an "idiot", in
> the context of computer programming at least.
>
> He's certainly got a few things wrong from time to time in his zeal, but
> his overall point seems to stand. Do you have any rebuttals of his argument
> to present yourself? Or do you prefer to just sit back and wait on "y'all"
> to do the heavy lifting?
>
> > the exact proper options required
>
> You mean the ones which are unclear and uncertain, because the GCC
> documentation is inaccurate or simply lies?
>
> > and attaching the resulting compiled assembly exactly as he wants it
>
> And what if GCC is unable to produce anything like that, because the code
> generator is at the very least questionable, as his postings seem to prove?
>
> > or if gcc doesn't compile the exact assembly he wants, explaining why gcc chose a different
> > route than the quote-unquote "Perfect assembly" that he expects it to spit
> > out
>
> What version of GCC can we expect to generate efficient and correct code
> for this brand new, just-released "x86" instruction set? Maybe GCC 97 will
> finally get it right...which at the current rate of major version number
> increase, should be some time next year I guess.
>
> Or rather more accurately, when will GCC's code generator stop regressing
> as it seemingly has done for many versions now, and finally Make Compiling
> Great Again?
>
> > And Stefan? Ever heard of the saying that "the loudest man in the room is
> > always the weakest"?
>
> Ever heard the saying "if you can't run with the big dogs, stay under the
> porch"?
>
> Are the GCC developers *trying* to subtly push everyone toward Clang, by
> slowly degrading GCC over time in hopes that people will eventually give up
> and leave in frustration? Serious question.
>
> Dave
>
>
* Re: Another epic optimiser failure
2023-05-29 19:01 ` Dave Blanchard
@ 2023-05-29 23:44 ` Nicholas Vinson
2023-05-30 4:04 ` Julian Waters
1 sibling, 0 replies; 11+ messages in thread
From: Nicholas Vinson @ 2023-05-29 23:44 UTC (permalink / raw)
To: gcc
On 5/29/23 15:01, Dave Blanchard wrote:
> He's certainly got a few things wrong from time to time in his zeal, but his overall point seems to stand. Do you have any rebuttals of his argument to present yourself? Or do you prefer to just sit back and wait on "y'all" to do the heavy lifting?
He's gotten many details wrong including the proper flags to set for gcc
(and the "bad documentation" does not justify all the errors he's made),
his hand-generated assembly (I've personally pointed out logic errors in
his assembly on more than one occasion), and has failed to provide
evidence that his solutions are better.
In almost all of his examples, he uses -O3 which is basically the "speed
above all else" optimization level. I pointed this out before; I also
pointed out that the smallest code (in bytes) with the fewest
instructions is not always the fastest. He has not provided any data
showing that his solutions result in faster executing code than what gcc
produces. He has also raised questions that show a distinct lack of
understanding when it comes to storage hierarchy; something I feel one
would need to know to properly write fast assembly. Finally, I will
admit some of the examples of gcc produced code are a bit suspicious,
and probably should be reviewed.
In short Stefan is not being taken seriously because he is not
presenting himself, or his arguments, in a manner that would convince
people to take him seriously. As long as Stefan continues to communicate
in such a manner, we're going to see similar such responses from (some
of) the gcc devs (unfortunately).
The best next steps for Stefan would be to review the constructive
criticism, expand on his examples by providing explanation and proof as
to why they're better, and then present these updated findings in the
proper manner.
Using his first example as my own, take the C code:
int ispowerof2(unsigned long long argument)
{
    return (argument & argument - 1) == 0;
}
when compiled produces:
% gcc -m32 -O3 -c ispowerof2.c && objdump -d -Mintel ispowerof2.o
ispowerof2.o: file format elf32-i386
Disassembly of section .text:
00000000 <ispowerof2>:
   0:   f3 0f 7e 4c 24 04       movq    xmm1,QWORD PTR [esp+0x4]
   6:   66 0f 76 c0             pcmpeqd xmm0,xmm0
   a:   66 0f d4 c1             paddq   xmm0,xmm1
   e:   66 0f db c1             pand    xmm0,xmm1
  12:   66 0f 7e c2             movd    edx,xmm0
  16:   66 0f 73 d0 20          psrlq   xmm0,0x20
  1b:   66 0f 7e c0             movd    eax,xmm0
  1f:   09 c2                   or      edx,eax
  21:   0f 94 c0                sete    al
  24:   0f b6 c0                movzx   eax,al
  27:   c3                      ret
Whereas he claims the following is better:
        movq     xmm1, [esp+4]
        pcmpeqd  xmm0, xmm0
        paddq    xmm0, xmm1
        pand     xmm0, xmm1
        pxor     xmm1, xmm1
        pcmpeqb  xmm0, xmm1
        pmovmskb eax, xmm0
        cmp      al, 255
        sete     al
        ret
because it has 10 instructions and is 36 bytes long vs the 11
instructions and 40 bytes. However, the rebuttals are 1. his code is
wrong (can return values other than 0 or 1) and 2. -O3 doesn't optimize
on instruction count or byte size (as an aside: clang's output uses 14
instructions but is only 32 bytes in size -- is it better or worse than
gcc's?).
Therefore, while his version is 1 instruction and 4 bytes shorter (1 byte
shorter once the needed correction is added), he presents no evidence that his
solution is actually faster. What he would need to do instead is show
proof that his solution is indeed faster than what gcc produces.
Afterwards, he would be in a position to re-present this data in a proper
manner.
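One reading of the "can return values other than 0 or 1" objection can be checked with a portable model of the hand-written sequence. This is a sketch, not code from the thread: the function names are invented, and it assumes only the documented behaviour that `movq` clears the upper half of xmm1, that `pmovmskb` writes a 16-bit mask zero-extended into eax, and that `sete al` leaves bits 8 to 31 of eax alone.

```c
#include <stdint.h>

/* Model of the tail of the hand-written sequence:
     pxor xmm1, xmm1 ; pcmpeqb xmm0, xmm1 ; pmovmskb eax, xmm0
     cmp al, 255 ; sete al ; ret
   'low' is the low 64 bits of xmm0 after pand. The high 64 bits of xmm0
   are zero at that point, because movq cleared the upper half of xmm1. */
uint32_t sse_tail_result(uint64_t low)
{
    uint32_t mask = 0xFF00u;  /* the 8 high bytes are zero: mask bits 8..15 set */

    for (int i = 0; i < 8; i++) {
        if (((low >> (8 * i)) & 0xFFu) == 0)
            mask |= 1u << i;  /* pcmpeqb + pmovmskb: one bit per zero byte */
    }

    uint32_t eax = mask;                          /* pmovmskb eax, xmm0 */
    int zf = (eax & 0xFFu) == 0xFFu;              /* cmp al, 255 */
    return (eax & 0xFFFFFF00u) | (zf ? 1u : 0u);  /* sete al writes al only */
}

/* Hypothetical wrapper: value left in eax for a given argument. */
uint32_t claimed_sequence(uint64_t argument)
{
    return sse_tail_result(argument & (argument - 1));
}
```

Under this model eax ends up as 0xFF01 or 0xFF00, so the low byte is right but the full int is nonzero either way. That is consistent with the remark above that a correction costing 3 bytes (such as `movzx eax, al`) is needed.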
* Re: Another epic optimiser failure
2023-05-28 7:50 Julian Waters
@ 2023-05-29 19:01 ` Dave Blanchard
2023-05-29 23:44 ` Nicholas Vinson
2023-05-30 4:04 ` Julian Waters
0 siblings, 2 replies; 11+ messages in thread
From: Dave Blanchard @ 2023-05-29 19:01 UTC (permalink / raw)
To: gcc; +Cc: Julian Waters
On Sun, 28 May 2023 15:50:41 +0800
Julian Waters via Gcc <gcc@gcc.gnu.org> wrote:
> Man, these clang fanboys sure are getting out of hand
Strange reasoning you've used here. Is this sort of like how if I'm against Donald Trump, then I must be for Hillary Clinton, or vice versa?
That's called a "false dichotomy" FYI.
> I feel like all this garbage can be easily resolved by y'all showing this
> idiot
There's your first mistake. Hint: people who are able to hand deconstruct the output of a compiler's code generator and point out exactly how instructions are wasted are never correctly referred to as an "idiot", in the context of computer programming at least.
He's certainly got a few things wrong from time to time in his zeal, but his overall point seems to stand. Do you have any rebuttals of his argument to present yourself? Or do you prefer to just sit back and wait on "y'all" to do the heavy lifting?
> the exact proper options required
You mean the ones which are unclear and uncertain, because the GCC documentation is inaccurate or simply lies?
> and attaching the resulting compiled assembly exactly as he wants it
And what if GCC is unable to produce anything like that, because the code generator is at the very least questionable, as his postings seem to prove?
> or if gcc doesn't compile the exact assembly he wants, explaining why gcc chose a different
> route than the quote-unquote "Perfect assembly" that he expects it to spit
> out
What version of GCC can we expect to generate efficient and correct code for this brand new, just-released "x86" instruction set? Maybe GCC 97 will finally get it right...which at the current rate of major version number increase, should be some time next year I guess.
Or rather more accurately, when will GCC's code generator stop regressing as it seemingly has done for many versions now, and finally Make Compiling Great Again?
> And Stefan? Ever heard of the saying that "the loudest man in the room is
> always the weakest"?
Ever heard the saying "if you can't run with the big dogs, stay under the porch"?
Are the GCC developers *trying* to subtly push everyone toward Clang, by slowly degrading GCC over time in hopes that people will eventually give up and leave in frustration? Serious question.
Dave
* Re: Another epic optimiser failure
@ 2023-05-28 7:50 Julian Waters
2023-05-29 19:01 ` Dave Blanchard
0 siblings, 1 reply; 11+ messages in thread
From: Julian Waters @ 2023-05-28 7:50 UTC (permalink / raw)
To: gcc
Man, these clang fanboys sure are getting out of hand
I feel like all this garbage can be easily resolved by y'all showing this
idiot the exact proper options required and attaching the resulting
compiled assembly exactly as he wants it, or if gcc doesn't compile the
exact assembly he wants, explaining why gcc chose a different
route than the quote-unquote "Perfect assembly" that he expects it to spit
out
And Stefan? Ever heard of the saying that "the loudest man in the room is
always the weakest"?