public inbox for gcc@gcc.gnu.org
From: Nicholas Vinson <nvinson234@gmail.com>
To: gcc@gcc.gnu.org
Subject: Re: Another epic optimiser failure
Date: Mon, 29 May 2023 19:44:42 -0400
Message-ID: <b922435c-2aae-d31e-5075-8af05db2ea34@gmail.com>
In-Reply-To: <20230529140119.91d6f657a19b31f68d80467d@killthe.net>

On 5/29/23 15:01, Dave Blanchard wrote:

> He's certainly got a few things wrong from time to time in his zeal, but his overall point seems to stand. Do you have any rebuttals of his argument to present yourself? Or do you prefer to just sit back and wait on "y'all" to do the heavy lifting?

He's gotten many details wrong, including the proper flags to set for gcc
(and the "bad documentation" does not justify all the errors he's made)
and his hand-generated assembly (I've personally pointed out logic errors
in his assembly on more than one occasion), and he has failed to provide
evidence that his solutions are better.

In almost all of his examples, he uses -O3 which is basically the "speed 
above all else" optimization level. I pointed this out before; I also 
pointed out that the smallest code (in bytes) with the fewest 
instructions is not always the fastest. He has not provided any data 
showing that his solutions result in faster executing code than what gcc 
produces. He has also raised questions that show a distinct lack of
understanding when it comes to storage hierarchy, something I feel one
would need to know to properly write fast assembly. Finally, I will
admit some of the examples of gcc-produced code are a bit suspicious
and probably should be reviewed.

In short, Stefan is not being taken seriously because he is not
presenting himself, or his arguments, in a manner that would convince
people to take him seriously. As long as Stefan continues to communicate
in such a manner, we're going to see similar responses from (some of)
the gcc devs (unfortunately).

The best next steps for Stefan would be to review the constructive
criticism, expand on his examples by providing explanation and proof as
to why they're better, and then present these updated findings in the
proper manner.

Using his first example as my own, take the C code:

	int ispowerof2(unsigned long long argument)
	{
		return (argument & argument - 1) == 0;
	}

when compiled produces:

% gcc -m32 -O3 -c ispowerof2.c && objdump -d -Mintel ispowerof2.o

ispowerof2.o:     file format elf32-i386

Disassembly of section .text:

	00000000 <ispowerof2>:
	   0:   f3 0f 7e 4c 24 04       movq   xmm1,QWORD PTR [esp+0x4]
	   6:   66 0f 76 c0             pcmpeqd xmm0,xmm0
	   a:   66 0f d4 c1             paddq  xmm0,xmm1
	   e:   66 0f db c1             pand   xmm0,xmm1
	  12:   66 0f 7e c2             movd   edx,xmm0
	  16:   66 0f 73 d0 20          psrlq  xmm0,0x20
	  1b:   66 0f 7e c0             movd   eax,xmm0
	  1f:   09 c2                   or     edx,eax
	  21:   0f 94 c0                sete   al
	  24:   0f b6 c0                movzx  eax,al
	  27:   c3                      ret

Whereas he claims the following is better:

	movq    xmm1, [esp+4]
	pcmpeqd xmm0, xmm0
	paddq   xmm0, xmm1
	pand    xmm0, xmm1
	pxor    xmm1, xmm1
	pcmpeqb xmm0, xmm1
	pmovmskb eax, xmm0
	cmp     al, 255
	sete    al
	ret

because it has 10 instructions and is 36 bytes long vs gcc's 11
instructions and 40 bytes. However, the rebuttals are 1. his code is
wrong (can return values other than 0 or 1) and 2. -O3 doesn't optimize
for instruction count or byte size (as an aside: clang's output uses 14
instructions but is only 32 bytes in size -- is it better or worse than
gcc's?).
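
To make rebuttal 1 concrete, here is a rough C model of what the
hand-written sequence appears to leave in eax. It is only a sketch:
model_eax and check_model are hypothetical names of mine, and the model
assumes that movq zeroes the upper half of xmm1 and that sete writes
only al, leaving the rest of eax as pmovmskb set it.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the C function from this message. */
static int ispowerof2(unsigned long long argument)
{
	return (argument & (argument - 1)) == 0;
}

/* Hypothetical model of eax at the end of the hand-written sequence. */
static uint32_t model_eax(unsigned long long argument)
{
	/* paddq + pand: low qword of xmm0 holds argument & (argument - 1). */
	unsigned long long v = argument & (argument - 1);
	uint32_t mask = 0;

	/* pcmpeqb against zero, then pmovmskb: one mask bit per zero byte. */
	for (int i = 0; i < 8; i++)
		if (((v >> (8 * i)) & 0xff) == 0)
			mask |= 1u << i;

	/* Assumption: the upper 8 bytes of xmm0 are zero after pand, so
	   they all compare equal and mask bits 8..15 are always set. */
	mask |= 0xff00u;

	/* cmp al,255 ; sete al: only the low byte of eax is replaced. */
	return (mask & ~0xffu) | ((mask & 0xffu) == 0xffu);
}

/* Spot checks of the model against the reference function. */
static void check_model(void)
{
	assert(ispowerof2(8) == 1 && model_eax(8) == 0xff01u);
	assert(ispowerof2(6) == 0 && model_eax(6) == 0xff00u);
}
```

Under those assumptions the low byte matches the C function but the
full 32-bit return value does not, which is why a trailing movzx
eax,al would be needed.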

Therefore, while his version is 1 instruction and 4 bytes shorter (1 byte
shorter if you add the needed correction), he presents no evidence that
his solution is actually faster. What he would need to do instead is
measure both and show that his solution is indeed faster than what gcc
produces.
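
As a rough idea of what such evidence could look like, here is a minimal
timing sketch in C. It is not a rigorous benchmark: time_candidate and
candidate_fn are hypothetical names, the xorshift input generator and
the use of clock() are my own choices, and a real comparison would also
need repeated runs, a fixed CPU frequency, and the hand-written routine
linked in as a separate symbol.

```c
#include <assert.h>
#include <time.h>

/* Any routine with the same signature as ispowerof2 can be timed. */
typedef int (*candidate_fn)(unsigned long long);

/* Reference: the C function from this message. */
static int ispowerof2(unsigned long long argument)
{
	return (argument & (argument - 1)) == 0;
}

/* Returns the CPU seconds spent calling fn on `iterations`
   pseudo-random inputs. The number of nonzero results is stored in
   *hits so the compiler cannot discard the calls as unused. */
static double time_candidate(candidate_fn fn, unsigned long iterations,
			     unsigned long *hits)
{
	unsigned long long x = 0x9e3779b97f4a7c15ULL; /* arbitrary seed */
	unsigned long count = 0;
	clock_t start, end;

	assert(fn != 0 && hits != 0);

	start = clock();
	for (unsigned long i = 0; i < iterations; i++) {
		/* xorshift64 step: cheap, deterministic input stream. */
		x ^= x << 13;
		x ^= x >> 7;
		x ^= x << 17;
		count += fn(x) != 0;
	}
	end = clock();

	*hits = count;
	return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_candidate once with the gcc-compiled function and once
with the assembly version, on the same iteration count, would give two
numbers to compare. Note that the loop overhead and the call through a
function pointer are included in both measurements.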

Afterwards, he would be in a position to present this data in a proper
manner.

