From: "Stefan Kanthak"
To: "Andrew Pinski"
Subject: Re: Who cares about performance (or Intel's CPU errata)?
Date: Sun, 28 May 2023 00:52:30 +0200

"Andrew Pinski" wrote:

> On Sat, May 27, 2023 at 2:25 PM Stefan Kanthak wrote:
>>
>> Just to show how SLOPPY, INCONSEQUENTIAL and INCOMPETENT GCC's developers are:
>>
>> --- dontcare.c ---
>> int ispowerof2(unsigned __int128 argument) {
>>     return __builtin_popcountll(argument) + __builtin_popcountll(argument >> 64) == 1;
>> }
>> --- EOF ---
>>
>> GCC 13.3    gcc -march=haswell -O3
>>
>> https://gcc.godbolt.org/z/PPzYsPzMc
>> ispowerof2(unsigned __int128):
>>         popcnt  rdi, rdi
>>         popcnt  rsi, rsi
>>         add     esi, edi
>>         xor     eax, eax
>>         cmp     esi, 1
>>         sete    al
>>         ret
>>
>> OOPS: what about Intel's CPU errata regarding the false dependency on POPCNTs output?
>
> Because the popcount is going to the same register, there is no false
> dependency ....
> The false dependency errata only applies if the result of the popcnt
> is going to a different register: the processor thinks it depends on
> the result in that register from a previous instruction, but it does
> not (which is why it is called a false dependency). In this case it
> actually does depend on the previous result since the output is the
> same as the input.

OUCH, my fault; sorry for the confusion and the wrong accusation.

Nevertheless GCC fails to optimise code properly:

--- .c ---
int ispowerof2(unsigned long long argument) {
    return __builtin_popcountll(argument) == 1;
}
--- EOF ---

GCC 13.3    gcc -m32 -mpopcnt -O3

https://godbolt.org/z/fT7a7jP4e
ispowerof2(unsigned long long):
        xor     eax, eax
        xor     edx, edx
        popcnt  eax, [esp+4]
        popcnt  edx, [esp+8]
        add     eax, edx        # eax is at most 64!
        cmp     eax, 1    ->    dec eax     # 2 bytes shorter
        sete    al
        movzx   eax, al         # superfluous
        ret

5 bytes and 1 instruction saved; 5 bytes here and there accumulate to
kilo- or even megabytes, and they can push code across a cache line
or a 16-byte alignment boundary.

JFTR: the same holds for "__builtin_popcount(argument) == 1;" with a
32-bit argument.

JFTR: GCC is notorious for generating superfluous MOVZX instructions
where its optimiser SHOULD be able to see that the value is already
less than 256!

Stefan