Message-ID: <958cc2e2-62c4-498a-e408-600ffae56d11@gmail.com>
Date: Sun, 28 May 2023 02:40:55 -0400
Subject: Re: Who cares about performance (or Intel's CPU errata)?
To: gcc@gcc.gnu.org
References: <23A490318B7149D88618A7CDA2CEDB14@H270> <8DB226CF451A4430A8D7D5CBFE6B3972@H270>
From: Nicholas Vinson
In-Reply-To: <8DB226CF451A4430A8D7D5CBFE6B3972@H270>

On 5/27/23 18:52, Stefan Kanthak wrote:
> "Andrew Pinski" wrote:
>
>> On Sat, May 27, 2023 at 2:25 PM Stefan Kanthak wrote:
>>> Just to show how SLOPPY, INCONSEQUENTIAL and INCOMPETENT GCC's developers are:
>>>
>>> --- dontcare.c ---
>>> int ispowerof2(unsigned __int128 argument) {
>>>     return __builtin_popcountll(argument) + __builtin_popcountll(argument >> 64) == 1;
>>> }
>>> --- EOF ---
>>>
>>> GCC 13.3    gcc -march=haswell -O3
>>>
>>> https://gcc.godbolt.org/z/PPzYsPzMc
>>> ispowerof2(unsigned __int128):
>>>         popcnt  rdi, rdi
>>>         popcnt  rsi, rsi
>>>         add     esi, edi
>>>         xor     eax, eax
>>>         cmp     esi, 1
>>>         sete    al
>>>         ret
>>>
>>> OOPS: what about Intel's CPU errata regarding the false dependency on POPCNTs output?
>> Because the popcount is going to the same register, there is no false
>> dependency ....
>> The false dependency errata only applies if the result of the popcnt
>> is going to a different register, the processor thinks it depends on
>> the result in that register from a previous instruction but it does
>> not (which is why it is called a false dependency). In this case it
>> actually does depend on the previous result since the input is the
>> same as the input.
> OUCH, my fault; sorry for the confusion and the wrong accusation.
>
> Nevertheless GCC fails to optimise code properly:
>
> --- .c ---
> int ispowerof2(unsigned long long argument) {
>     return __builtin_popcountll(argument) == 1;
> }
> --- EOF ---
>
> GCC 13.3    gcc -m32 -mpopcnt -O3
>
> https://godbolt.org/z/fT7a7jP4e
> ispowerof2(unsigned long long):
>         xor     eax, eax
>         xor     edx, edx
>         popcnt  eax, [esp+4]
>         popcnt  edx, [esp+8]
>         add     eax, edx                 # eax is less than 64!

Less than or equal to 64 (consider the case when the input is
(unsigned long long)-1).

>         cmp     eax, 1    ->    dec eax  # 2 bytes shorter
>         sete    al
>         movzx   eax, al                  # superfluous

Not when dec is used. If you use dec and omit this instruction, you may
get a result value of 0xffffff00 (consider the case when the input is
(unsigned long long)0).

>         ret
>
> 5 bytes and 1 instruction saved; 5 bytes here and there accumulate to
> kilo- or even megabytes, and they can extend code to cross a cache line
> or a 16-byte alignment boundary.
>
> JFTR: same for "__builtin_popcount(argument) == 1;" and 32-bit argument
>
> JFTR: GCC is notorious for generating superfluous MOVZX instructions
> where its optimiser SHOULD be able see that the value is already
> less than 256!
>
> Stefan
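
For anyone who wants to check the two inline replies without reading
flags by hand, here is a rough C model of both instruction sequences.
It is only a sketch, not compiler output: the function names
popcount_halves, model_cmp_sete_movzx and model_dec_sete are
hypothetical, and the register behaviour is modelled manually under the
assumptions that "sete al" writes only the low byte of eax and that
"dec" sets ZF exactly when its result is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Popcount of a 64-bit value the way the -m32 code does it:
   one 32-bit popcount per half, followed by an add. */
uint32_t popcount_halves(uint64_t x)
{
    uint32_t lo = (uint32_t)__builtin_popcount((uint32_t)x);
    uint32_t hi = (uint32_t)__builtin_popcount((uint32_t)(x >> 32));
    uint32_t sum = lo + hi;
    assert(sum <= 64);              /* 64 is reachable: all bits set */
    return sum;
}

/* Hand-written model of: cmp eax, 1 ; sete al ; movzx eax, al */
uint32_t model_cmp_sete_movzx(uint64_t x)
{
    uint32_t eax = popcount_halves(x);
    uint32_t zf = (eax == 1);       /* cmp leaves eax unchanged */
    eax = (eax & 0xffffff00u) | zf; /* sete al: only the low byte */
    eax &= 0xffu;                   /* movzx eax, al: clear upper bits */
    return eax;
}

/* Hand-written model of: dec eax ; sete al   (movzx omitted) */
uint32_t model_dec_sete(uint64_t x)
{
    uint32_t eax = popcount_halves(x);
    eax -= 1;                       /* dec: 0 wraps to 0xffffffff */
    uint32_t zf = (eax == 0);       /* ZF reflects the result of dec */
    eax = (eax & 0xffffff00u) | zf; /* sete al: upper 24 bits survive */
    return eax;
}
```

In this model the dec variant agrees with the original sequence for
every nonzero input, because the decremented count is at most 63 and
the upper bytes of eax stay clear. The only input that differs is 0,
where the wrapped value leaves 0xffffff00 in eax, which a caller would
read as a nonzero (true) int.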