From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x32e.google.com (mail-wm1-x32e.google.com [IPv6:2a00:1450:4864:20::32e]) by sourceware.org (Postfix) with ESMTPS id 20EFF3858D33 for ; Thu, 9 Nov 2023 05:57:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 20EFF3858D33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 20EFF3858D33 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::32e ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699509472; cv=none; b=MBoBX8sqImlWnOVvthiUAf8h7LimSaaX9DPtERdv8q5LaAzLE/vsuDKuUX5cN/57UbmeiivZu25HPY4b2JG9Mh9jrEoVdQr8RrbudKDDjV5LWJuoVIz4r8bYJL+tRXhuTIUt32W5rZtSPaOHhb3lqgxvaE95V8bqqrWjivIpeqs= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699509472; c=relaxed/simple; bh=8VEg4famo5WSOSl/K4wOydtrmCpKim7zdVHRdVyrQVE=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=ZXf73XZXKp0Dvvidki1sGthm7/vQ6+iz5T+urSwdbJT0/Cl5L1DkjE6VeLua+n4/NU8Fsx9D1mHNurf/alvhdPlAt1OOhRCI6n0mRgG5rg3EFfP9jb5AXVj8wBJ7RMQFsor440IaJnh8ugWmogU9WZstWH/qzpKaoKiyMnZdy40= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-x32e.google.com with SMTP id 5b1f17b1804b1-409299277bbso2834645e9.2 for ; Wed, 08 Nov 2023 21:57:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1699509469; x=1700114269; darn=gcc.gnu.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=uGrFKdx4tcwnTyWJJ1QyW0huZtIZUMv7VI+RQy4KO7c=; b=dRdMAeftmzcoH942HaiHJNMcU1tVPP1V5iTMhbzXvHx0XTYNA6jVUExmrljtHcBPPl ng2rO3NrYegpAdRNT1S3XMkdfLc8C80RPAjW1JyzRLAJQOsfGac/LRrYhFNt1L7wJ63I T661OZSF3fphMnXf5AfpdyUgxmMlPRPEVPadurRdhSEDWptqC228ULsQ/+ESCj5ozEmM IHGVA70HRFzTsaOElItqluVdW/DXdp3UC0Vd+w2K95T2FbPru0IlykJHHxtZafbPy/8O ma80jiyLApTCk1DTW0znPK3LY0EVT6qjPGHsrMDsTJMOmvn54bfzY4dtGMckU1K7vQWO JOuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699509469; x=1700114269; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=uGrFKdx4tcwnTyWJJ1QyW0huZtIZUMv7VI+RQy4KO7c=; b=A5kU1ZEwNd/i8zUWp/uhOnkgcEmx/pFWeR4mXnqBffnVXxY3AKpC41fBDbSHNFzK56 149HDgvNbY9z0MfzFYdRLZuY/7NuLYozbP+SQaanhDNghkt4GVUJ8+WX/EgIgdhWrrpP rfdKz/qFO6t113V7NIgTOJNkfqv663Dw9obJeYJvufcl0TCB5e6JW0Rn40lzNKIkqIEb yRr0veqEm+UdYujE4auRT2kumpYMVlEjFG5emDA9g8jVE1ct8OQ+wH3mCDWjRmPDMuuU SqFWeOGctvXyBi+UfupK2SP04gi/yt/v7Y8grq21LsvyeEs/lZuXjOtna7NvIfQ/SNZX X1qw== X-Gm-Message-State: AOJu0YxU7mRuw+rOr6rfUd0oxr54hX3lpLBqFJ+ICRSVz/aBQguOAhBi +vgXmKv+e1Qn7ejztRsTTQs= X-Google-Smtp-Source: AGHT+IGLwLKTH3ML36Dlp2ER49gh3cU1VUppr/xeWvG3p7kk6apEI4Zy5PoYqsQeA8fpyjy16R15Kg== X-Received: by 2002:a05:600c:4715:b0:406:84cb:b014 with SMTP id v21-20020a05600c471500b0040684cbb014mr3501360wmo.22.1699509468209; Wed, 08 Nov 2023 21:57:48 -0800 (PST) Received: from [10.8.0.82] ([89.207.171.96]) by smtp.gmail.com with ESMTPSA id i15-20020a05600c354f00b004067e905f44sm987462wmq.9.2023.11.08.21.57.47 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 08 Nov 2023 21:57:47 -0800 (PST) Message-ID: <0e2723e5-a60d-484e-b4f4-951bc8dfde8d@gmail.com> Date: Thu, 9 Nov 2023 06:57:45 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3] libstdc++: optimize bit iterators assuming normalization [PR110807] Content-Language: en-US To: Alexandre Oliva , Jonathan Wakely Cc: Jonathan Wakely , libstdc++ References: From: =?UTF-8?Q?Fran=C3=A7ois_Dumont?= In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-10.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,URI_HEX autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 09/11/2023 04:36, Alexandre Oliva wrote: > On Nov 8, 2023, Jonathan Wakely wrote: > >> ofst needs to be __ofst but OK for trunk with that change. > Oh, doh, thanks for catching that last-minute tweak. > > Retesting with that change completed successfully, so I've just pushed > the following: > > > libstdc++: optimize bit iterators assuming normalization [PR110807] > > The representation of bit iterators, using a pointer into an array of > words, and an unsigned bit offset into that word, makes for some > optimization challenges: because the compiler doesn't know that the > offset is always in a certain narrow range, beginning at zero and > ending before the word bitwidth, when a function loads an offset that > it hasn't normalized itself, it may fail to derive certain reasonable > conclusions, even to the point of retaining useless calls that elicit > incorrect warnings. > > Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit > list to a global bit vector variable. Based on the compile-time > constant length of the list, we decide in _M_insert_range whether to > use the existing storage or to allocate new storage for the vector. > After allocation, we decide in _M_copy_aligned how to copy any > preexisting portions of the vector to the newly-allocated storage. > When copying two or more words, we use __builtin_memmove. > > However, because we compute the available room using bit offsets > without range information, even comparing them with constants, we fail > to infer ranges for the preexisting vector depending on word size, and > may thus retain the memmove call despite knowing we've only allocated > one word. > > Other parts of the compiler then detect the mismatch between the > constant allocation size and the much larger range that could > theoretically be copied into the newly-allocated storage if we could > reach the call. > > Ensuring the compiler is aware of the constraints on the offset range > enables it to do a much better job at optimizing. Using attribute > assume (_M_offset <= ...) didn't work, because gimple lowered that to > something that vrp could only use to ensure 'this' was non-NULL. > Exposing _M_offset as an automatic variable/gimple register outside > the unevaluated assume operand enabled the optimizer to do its job. > > Rather than placing such load-then-assume constructs all over, I > introduced an always-inline member function in bit iterators that does > the job of conveying to the compiler the information that the > assumption is supposed to hold, and various calls throughout functions > pertaining to bit iterators that might not otherwise know that the > offsets have to be in range, so that the compiler no longer needs to > make conservative assumptions that prevent optimizations. > > With the explicit assumptions, the compiler can correlate the test for > available storage in the vector with the test for how much storage > might need to be copied, and determine that, if we're not asking for > enough room for two or more words, we can omit entirely the code to > copy two or more words, without any runtime overhead whatsoever: no > traces remain of the undefined behavior or of the tests that inform > the compiler about the assumptions that must hold. > > > for libstdc++-v3/ChangeLog > > PR libstdc++/110807 > * include/bits/stl_bvector.h (_Bit_iterator_base): Add > _M_assume_normalized member function. Call it in _M_bump_up, > _M_bump_down, _M_incr, operator==, operator<=>, operator<, and > operator-. > (_Bit_iterator): Also call it in operator*. > (_Bit_const_iterator): Likewise. > --- > libstdc++-v3/include/bits/stl_bvector.h | 37 ++++++++++++++++++++++++++++--- > 1 file changed, 34 insertions(+), 3 deletions(-) > > diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h > index 8d18bcaffd434..2b91af2005f2d 100644 > --- a/libstdc++-v3/include/bits/stl_bvector.h > +++ b/libstdc++-v3/include/bits/stl_bvector.h > @@ -56,6 +56,10 @@ > #ifndef _STL_BVECTOR_H > #define _STL_BVECTOR_H 1 > > +#ifndef _GLIBCXX_ALWAYS_INLINE > +#define _GLIBCXX_ALWAYS_INLINE inline __attribute__((__always_inline__)) > +#endif > + IMHO using [[__gnu__::__always_inline__]] would give the same result, but without the macro !