From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Stefan Kanthak"
Subject: Epic code generator/optimiser failures
Date: Sat, 27 May 2023 19:32:52 +0200
Organization: Me, myself & IT
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

--- demo.c ---
int ispowerof2(unsigned long long argument)
{
    return (argument != 0) && ((argument & argument - 1) == 0);
}
--- EOF ---

GCC 13.1    gcc -m32 -mavx -O3    # or -march=native instead of -mavx

https://gcc.godbolt.org/z/T31Gzo85W

ispowerof2(unsigned long long):
    vmovq       xmm1, QWORD PTR [esp+4] -> movq    xmm0, qword ptr [esp+4]
    xor         eax, eax                -> xor     eax, eax
    vpunpcklqdq xmm0, xmm1, xmm1        # superfluous
    vptest      xmm0, xmm0              -> ptest   xmm0, xmm0
    je          .L1                     -> jz      .L1
    vpcmpeqd    xmm0, xmm0, xmm0        -> pcmpeqd xmm1, xmm1
    xor         eax, eax                # superfluous
    vpaddq      xmm0, xmm1, xmm0        -> paddq   xmm1, xmm0
    vpand       xmm0, xmm0, xmm1        # superfluous
    vpunpcklqdq xmm0, xmm0, xmm0        # superfluous
    vptest      xmm0, xmm0              -> ptest   xmm1, xmm0
    sete        al                      -> setz    al
.L1:
    ret                                 -> ret

4 out of 13 instructions are SUPERFLUOUS here!

OUCH #1: there's ABSOLUTELY no need to generate AVX instructions and
         bloat the code through VEX prefixes and longer instructions!

OUCH #2: [V]MOVQ clears the upper lane of XMM registers; there's
         ABSOLUTELY no need for [V]PUNPCKLQDQ instructions.

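The shorter sequence in the right-hand column relies on [V]MOVQ zeroing the upper lane, so that lane contributes nothing to either PTEST. The following is only a rough scalar sketch of that reasoning, not anything GCC emits: it models an XMM register as two 64-bit lanes and checks the modelled sequence against the function from demo.c. The names xmm_model, ptest_zf, ispowerof2_model and check_model are hypothetical helpers introduced for this illustration, and the inner parentheses in ispowerof2 are added only to avoid precedence warnings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Function from demo.c (inner parentheses added, same meaning). */
int ispowerof2(unsigned long long argument)
{
    return (argument != 0) && ((argument & (argument - 1)) == 0);
}

/* Hypothetical model: a 128-bit XMM register as two 64-bit lanes. */
typedef struct { uint64_t lo, hi; } xmm_model;

/* Models the zero flag after PTEST a, b: set when (a AND b) is all zero. */
static int ptest_zf(xmm_model a, xmm_model b)
{
    return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* Scalar model of the proposed instruction sequence (an assumption-laden
   sketch, one statement per instruction). */
int ispowerof2_model(uint64_t argument)
{
    xmm_model x0 = { argument, 0 };             /* movq: upper lane is zeroed */
    xmm_model x1 = { UINT64_MAX, UINT64_MAX };  /* pcmpeqd xmm1, xmm1: all ones */

    if (ptest_zf(x0, x0))                       /* ptest xmm0, xmm0 ; jz .L1 */
        return 0;                               /* eax was cleared by xor */

    x1.lo += x0.lo;                             /* paddq xmm1, xmm0: lo = argument - 1 */
    x1.hi += x0.hi;                             /*                   hi = all ones + 0 */

    return ptest_zf(x1, x0);                    /* ptest xmm1, xmm0 ; setz al */
}

/* Compares the model with the C function on a few values and on every
   single-bit value; aborts via assert on the first mismatch. */
void check_model(void)
{
    static const uint64_t samples[] = {
        0, 1, 2, 3, 6, 0x80000000u, 0x100000000ull, 0x100000001ull, UINT64_MAX
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(ispowerof2_model(samples[i]) == ispowerof2(samples[i]));

    for (int bit = 0; bit < 64; ++bit)
        assert(ispowerof2_model((uint64_t)1 << bit) == 1);
}
```

Calling check_model() from a small main() is enough to exercise it. In the model the upper lane of x0 is always zero, so the AND in the final PTEST depends only on the low lane, which is the point made in OUCH #2.
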
GCC 13.1    gcc -m32 -msse4.1 -O3

https://gcc.godbolt.org/z/bqsqec6r1

ispowerof2(unsigned long long):
    movq        xmm1, QWORD PTR [esp+4] -> movq    xmm0, [esp+4]
    xor         eax, eax                -> xor     eax, eax
    movdqa      xmm0, xmm1              # superfluous
    punpcklqdq  xmm0, xmm1              # superfluous
    ptest       xmm0, xmm0              -> ptest   xmm0, xmm0
    je          .L1                     -> jz      .L1
    pcmpeqd     xmm0, xmm0              -> pcmpeqq xmm1, xmm1
    xor         eax, eax                # superfluous
    paddq       xmm0, xmm1              -> paddq   xmm1, xmm0
    pand        xmm0, xmm1              # superfluous
    punpcklqdq  xmm0, xmm0              # superfluous
    ptest       xmm0, xmm0              -> ptest   xmm1, xmm0
    sete        al                      -> setz    al
.L1:
    ret                                 -> ret

5 out of 14 instructions are superfluous here, or 18 of 50 bytes!

OUCH #3/#4: see above!

Will GCC eventually generate proper SSE4.1/AVX code?

Stefan