From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl1-x634.google.com (mail-pl1-x634.google.com [IPv6:2607:f8b0:4864:20::634]) by sourceware.org (Postfix) with ESMTPS id 5ADBA38C2994; Fri, 7 Jun 2024 18:42:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5ADBA38C2994 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 5ADBA38C2994 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::634 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1717785780; cv=none; b=hj3vZOPJyFAzJ/4eiJ+5Ql+xaUODuNvA24CAdsleE8JzH00t1C6TFUq7f06ibAjJgOFa3UgaxTRU1L7oy8aILnJyh/tZqMVKBWEd7OBg/2j0/d+Rj8WCqnMzbX6z1WErzNy9g5Px3notkYd2FxOKqxZsfzjSEp0pkjphIGD0Zzc= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1717785780; c=relaxed/simple; bh=/oupRqCU3KI7Irboi/z6C6NXLpvdiWxFfGjLrZl2VrE=; h=DKIM-Signature:Message-ID:Date:MIME-Version:From:Subject:To; b=oOX85gHkh87JfiCWrdlEdfkW/1UFiZIKMYoXHH7FDUvJTX+/zrwshdRPOn3vkLQ+t2vhOxkSPl2myhYZIDKvLa+p4oPJXuSZqfNmJwhLaDGXTd1mF//eAAI/7VxhSNZngDXV3AebR3tB5Q6Gm7CAjsR3Kxm9gGj4G5c+95VKlp0= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pl1-x634.google.com with SMTP id d9443c01a7336-1f6c7cded01so13734005ad.0; Fri, 07 Jun 2024 11:42:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1717785769; x=1718390569; darn=gcc.gnu.org; h=content-transfer-encoding:in-reply-to:content-language:references :cc:to:subject:from:user-agent:mime-version:date:message-id:from:to :cc:subject:date:message-id:reply-to; bh=ReOllLR2UUe33gS9K7ocF+PbDbGEWg7NitsY1HCf+xM=; b=GQPkZnPnTjw0ixhC44Qo3O0AqzjXahEpoqiNNzPfyTslzNH+bQg+ws4o5rntxuISTd 3T8ZwuD3wTXnnHcl73wM8dOJLmr0nRqheWCzPkaUtuuq6eZzF7yulIW/1K5XZRFj6rBW YbtmX8YgiogIw8IbOMngkmhdWRAjz87ypHt0H3daQbJbO+sxLW68QsmxASEKQKdiMOAs rwuxD/52b8kJfyDd93fq35wNeaucaQsVFUUAQYR6Briti4Vjdt59wAreYtAqCaH3dF2W CeDJam1M20FjOK3Co/DmJsz8obsHuw3VFYMdPCMwsqxyomk4S/LdVo4yGMMk+RFcLQKp mBxQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717785769; x=1718390569; h=content-transfer-encoding:in-reply-to:content-language:references :cc:to:subject:from:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ReOllLR2UUe33gS9K7ocF+PbDbGEWg7NitsY1HCf+xM=; b=GurlYyifZwBredl9g1F650cpY5lwppl8rVG1j9Jh3jUIwjTrwuqx2s4edi+hbb+PlM ZlmIsOQxPQnss/7KrDe9Ndca5V/gDXjlCmMksn3/foL5X/uNhIybZeUYdMxn7WOLkAfS WeMEoV22h5iKgp+O+TD5Gnz9fqRoipW7MTfHaSGxv945Bh8/OGqowpv6DtQbTu66MlOb 0BJogG/QnHQ7S2ATK3qFn4FZ6BJtULowaPHEqKoi5oI3YPGc4c/pgDqZXcDXoECUD4Mh QwaNzDrDAyddJUJBjqq9R49Bi7pusRs1o/xEa5xE08DWaWPoAUS0vmLfN0AM0hwnpLVQ MtOQ== X-Forwarded-Encrypted: i=1; AJvYcCWE4QEx/mLCrXB3uKW9HkqbG5trczYNlJYzAtq2QNjPTwE/98IgOYB3H6A8md0LyMDtKVzIEeogd8eoS43e78s/Pv6zlQMDdA== X-Gm-Message-State: AOJu0YxX/coc7je+Jzd4iQwvXZiGq9rTMDAZNcpCo7JeUeM7jMVYr/aW QJK3UeKls5PJppnnxUUhywNLefFCIaS1+58T8sP1iOH7jY3/hIZLnfsY5Q== X-Google-Smtp-Source: AGHT+IFYMg0N2pnfe1J/vJgiEpN0AhUk3aKXF306FHPI10dkhebVCI0Q0tbtw8afPiK+OG2jYf6Q6Q== X-Received: by 2002:a17:902:c952:b0:1f6:a766:2ffb with SMTP id d9443c01a7336-1f6d03dba1amr39949805ad.65.1717785769149; Fri, 07 Jun 2024 11:42:49 -0700 (PDT) Received: from ?IPV6:2604:4080:1153:80f1:80e4:d2f5:8760:4e7f? ([2604:4080:1153:80f1:80e4:d2f5:8760:4e7f]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-1f6bd76d724sm37434245ad.103.2024.06.07.11.42.48 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 07 Jun 2024 11:42:48 -0700 (PDT) Message-ID: <147980a4-20f3-47f4-8ab3-fb31d2ae3e58@gmail.com> Date: Fri, 7 Jun 2024 11:42:34 -0700 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird From: Stephen Face Subject: Re: [PATCH] libstdc++: Optimize std::gcd To: Jonathan Wakely , Sam James Cc: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org References: <9e5636e8-224f-4dde-9c60-489b4aad6534@gmail.com> <87wmn1m8pu.fsf@gentoo.org> Content-Language: en-US In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 6/7/24 2:30 AM, Jonathan Wakely wrote: > On Fri, 7 Jun 2024 at 09:57, Sam James wrote: >> >> Stephen Face writes: >> >>> This patch is to optimize the runtime execution of gcd. Mathematically, >>> it computes with the same algorithm as before, but subtractions and >>> branches are rearranged to encourage generation of code that can use >>> flags from the subtractions for conditional moves. Additionally, most >>> pairs of integers are coprime, so this patch also includes a check for >>> one of the integers to be equal to 1, and then it will exit the loop >>> early in this case. >> >> Is it worth filing a bug for the missed optimisation? You shouldn't have >> to write things in a specific order. Thanks. This patch is not a pure reordering. It has the added comparison with 1. Also, the argument to the trailing zero count is now the result of the subtraction before the comparison of m and n. This shortens the chain of dependencies, which can help the loop run faster. > > Yes, I think a bug report would be good. But 20%-60% decreases in run > time seems significant enough that we should take the libstdc++ patch > now, rather than wait for a possible compiler fix to come later. > > Stephen, could you please confirm whether you have a copyright > assignment in place for GCC, and if not whether you would be will to > complete one, or alternatively contribute this under the DCO terms: > https://gcc.gnu.org/dco.html > Thanks! I do not have a copyright assignment in place for GCC. I can certify to the DCO terms for this contribution. Signed-off-by: Stephen Face > > >> >>> >>> libstdc++-v3/ChangeLog: >>> >>> * include/std/numeric(__gcd): Optimize. >>> --- >>> I have tested this on x86_64-linux and aarch64-linux. I have tested the >>> timing with random distributions of small inputs and large inputs on a >>> couple of machines with -O2 and found decreases in execution time from >>> 20% to 60% depending on the machine and distribution of inputs. >>> >>> libstdc++-v3/include/std/numeric | 21 +++++++++++---------- >>> 1 file changed, 11 insertions(+), 10 deletions(-) >>> >>> diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric >>> index c912db4a519..3c9e8387a0e 100644 >>> --- a/libstdc++-v3/include/std/numeric >>> +++ b/libstdc++-v3/include/std/numeric >>> @@ -148,19 +148,20 @@ namespace __detail >>> >>> while (true) >>> { >>> - if (__m > __n) >>> - { >>> - _Tp __tmp = __m; >>> - __m = __n; >>> - __n = __tmp; >>> - } >>> + _Tp __m_minus_n = __m - __n; >>> + if (__m_minus_n == 0) >>> + return __m << __k; >>> >>> - __n -= __m; >>> + _Tp __next_m = __m < __n ? __m : __n; >>> >>> - if (__n == 0) >>> - return __m << __k; >>> + if (__next_m == 1) >>> + return __next_m << __k; >>> + >>> + _Tp __n_minus_m = __n - __m; >>> + __n = __n < __m ? __m_minus_n : __n_minus_m; >>> + __m = __next_m; >>> >>> - __n >>= std::__countr_zero(__n); >>> + __n >>= std::__countr_zero(__m_minus_n); >>> } >>> } >>> } // namespace __detail >> >