From: Nicholas Vinson
To: gcc@gcc.gnu.org
Date: Mon, 29 May 2023 19:44:42 -0400
Subject: Re: Another epic optimiser failure

On 5/29/23 15:01, Dave Blanchard wrote:
> He's certainly got a few things wrong from time to time in his zeal, but his overall point seems to stand. Do you have any rebuttals of his argument to present yourself?
> Or do you prefer to just sit back and wait on "y'all" to do the heavy lifting?

He's gotten many details wrong, including the proper flags to pass to gcc (and the "bad documentation" does not justify all the errors he's made) and his hand-generated assembly (I've personally pointed out logic errors in his assembly on more than one occasion), and he has failed to provide evidence that his solutions are better.

In almost all of his examples, he uses -O3, which is basically the "speed above all else" optimization level. I pointed this out before; I also pointed out that the smallest code (in bytes) with the fewest instructions is not always the fastest. He has not provided any data showing that his solutions result in faster-executing code than what gcc produces.

He has also raised questions that show a distinct lack of understanding of the storage hierarchy, something I feel one would need to know to write fast assembly properly.

Finally, I will admit that some of the examples of gcc-produced code are a bit suspicious and probably should be reviewed.

In short, Stefan is not being taken seriously because he is not presenting himself, or his arguments, in a manner that would convince people to take him seriously. As long as Stefan continues to communicate in this manner, we're going to see similar responses from (some of) the gcc devs (unfortunately).

The best next steps for Stefan would be to review the constructive criticism, expand on his examples by explaining and proving why they're better, and then present these updated findings in the proper manner.
Using his first example as my own, take the C code:

int ispowerof2(unsigned long long argument) {
    return (argument & argument - 1) == 0;
}

When compiled, it produces:

% gcc -m32 -O3 -c ispowerof2.c && objdump -d -Mintel ispowerof2.o

ispowerof2.o:     file format elf32-i386

Disassembly of section .text:

00000000 :
   0:   f3 0f 7e 4c 24 04       movq   xmm1,QWORD PTR [esp+0x4]
   6:   66 0f 76 c0             pcmpeqd xmm0,xmm0
   a:   66 0f d4 c1             paddq  xmm0,xmm1
   e:   66 0f db c1             pand   xmm0,xmm1
  12:   66 0f 7e c2             movd   edx,xmm0
  16:   66 0f 73 d0 20          psrlq  xmm0,0x20
  1b:   66 0f 7e c0             movd   eax,xmm0
  1f:   09 c2                   or     edx,eax
  21:   0f 94 c0                sete   al
  24:   0f b6 c0                movzx  eax,al
  27:   c3                      ret

Whereas he claims the following is better:

movq     xmm1, [esp+4]
pcmpeqd  xmm0, xmm0
paddq    xmm0, xmm1
pand     xmm0, xmm1
pxor     xmm1, xmm1
pcmpeqb  xmm0, xmm1
pmovmskb eax, xmm0
cmp      al, 255
sete     al
ret

because it has 10 instructions and is 36 bytes long, versus gcc's 11 instructions and 40 bytes. However, the rebuttals are:

1. his code is wrong (it can return values other than 0 or 1), and
2. -O3 doesn't optimize for instruction count or byte size (as an aside: clang's output uses 14 instructions but is only 32 bytes in size; is it better or worse than gcc's?).

So while his version is one instruction shorter and 4 bytes smaller (only 1 byte smaller once the needed correction is added), he presents no evidence that it is actually faster. What he would need to do instead is show proof that his solution is indeed faster than what gcc produces. Only then would he be in a position to present this data in a proper manner.