From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 0061F3858291 for ; Thu, 9 Nov 2023 08:16:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0061F3858291 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 0061F3858291 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699517787; cv=none; b=K52pVgSejQ5mSE2ckeX9XtItU9BPm+t/l9XyEM4lUi+0In/qnwPH8FJjpk2GPsewINnKInyPqkykL9Xc/9Z8UqO1Xsg58R+W3GCGG4uRvU7mSQM+Jcn9t+tRttQcbWkyubNZV1Pef2Gp+iRL9i53Xce77IqRkZ88HEUcNLybh/k= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699517787; c=relaxed/simple; bh=cn+1mBGy0ylofJ3mVkEYGw8xzSgE+hX+LEG/nuMPLOg=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=v12hSYIQFayJJrV3QtYJl4rZp0Qeyb1grgOO2ez8IaKd1mxgWvbT8E5aIuSNuVqb/tW7yUkLZuzHn+w/bj3p9tzo9RiQTaW5BDa4WO6gnn3IyPUDiO6yXW84giPhBwA/5TkBQsHPvXxcd6uN6ITzdbTqWPke4NPlFp5goPBfRRs= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1699517785; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=duwIRhYwtXDBtEvXJQOT9U4kkyleVeRmMtv3TdzUa5M=; b=CK2s7Ost3ZUm/ZIoJNbnQB39vuaOczb20MBDEk/neuZktgeAAVb9fJCQBShDv6sZXfj2B4 nBo5ob416G1xPiLz8LSh/XpL1+baRrf6LKvXmf7be+DFsWHRQgHH5oXqrTtn9Eot0r6hJV 0q7Mp3gJuDrCm+UVA5lrZu5gq5wqwUc= Received: from mail-yb1-f199.google.com (mail-yb1-f199.google.com [209.85.219.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-501-P21yqKU_NlCmIyRLlRg7Gg-1; Thu, 09 Nov 2023 03:16:24 -0500 X-MC-Unique: P21yqKU_NlCmIyRLlRg7Gg-1 Received: by mail-yb1-f199.google.com with SMTP id 3f1490d57ef6-daee86e2d70so482186276.0 for ; Thu, 09 Nov 2023 00:16:24 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699517783; x=1700122583; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=duwIRhYwtXDBtEvXJQOT9U4kkyleVeRmMtv3TdzUa5M=; b=sfEUK6o3PZ2+8iZVmcrmBmwzosoukYJ2PiAGb+AYdG675CimdlCGb879eG2fvVWkV0 xsQZIpT5xvgY93blHaxGWl7FADsUFA1yRhMC7OSSiqTypiLPl3JAd3inSTzIrykqKH1c OqSyKZMW+tUgqTymNk1C2otI/HBcrUVRyCemE/+n1m1DCJ6koLyWTt814O7lxFEhiP7h LKQ+Zwvvf8j3Gg3e9FKRAWOs0X/+fCJr9JhcpHpeN7/ceCaYVDf0Vx3FMGke4nyfbs4v GLk+M72J+K3pp+lZhGRYF6/WR3+OL29vVDqt9pCRzFWbi5mm2qygJ0yloKC6+M9er4yy BXew== X-Gm-Message-State: AOJu0Yyz7TEsRpgfzdOBCtvMOIMNqNm0tZwRKCH6lzWgHTDbE0I921m4 UR1no6AH7DS+RpuVYU/9J4pYMehA41xOz1cLSD3UKYjQOrpN98A9rX2RtO+eMnnPUqTpnYc+94/ I/0wQmueYweX2ZhZMbLZ5qT2rs8CXCSE= X-Received: by 2002:a25:dc86:0:b0:d9c:7d48:3020 with SMTP id y128-20020a25dc86000000b00d9c7d483020mr4117461ybe.20.1699517783680; Thu, 09 Nov 2023 00:16:23 -0800 (PST) X-Google-Smtp-Source: AGHT+IGLAYWkncPSAZQdJwELLxEWYCU8BNQ1FCbqB/WsZldO2sxkLpFU//eKktxjoXdi0TYITwhBJFG9etX8O4ifnpY= X-Received: by 2002:a25:dc86:0:b0:d9c:7d48:3020 with SMTP id y128-20020a25dc86000000b00d9c7d483020mr4117448ybe.20.1699517783414; Thu, 09 Nov 2023 00:16:23 -0800 (PST) MIME-Version: 1.0 References: <0e2723e5-a60d-484e-b4f4-951bc8dfde8d@gmail.com> In-Reply-To: <0e2723e5-a60d-484e-b4f4-951bc8dfde8d@gmail.com> From: Jonathan Wakely Date: Thu, 9 Nov 2023 08:16:12 +0000 Message-ID: Subject: Re: [PATCH v3] libstdc++: optimize bit iterators assuming normalization [PR110807] To: =?UTF-8?Q?Fran=C3=A7ois_Dumont?= Cc: Alexandre Oliva , Jonathan Wakely , "libstdc++" X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE,URI_HEX autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, 9 Nov 2023 at 05:57, Fran=C3=A7ois Dumont wr= ote: > > > On 09/11/2023 04:36, Alexandre Oliva wrote: > > On Nov 8, 2023, Jonathan Wakely wrote: > > > >> ofst needs to be __ofst but OK for trunk with that change. > > Oh, doh, thanks for catching that last-minute tweak. > > > > Retesting with that change completed successfully, so I've just pushed > > the following: > > > > > > libstdc++: optimize bit iterators assuming normalization [PR110807] > > > > The representation of bit iterators, using a pointer into an array of > > words, and an unsigned bit offset into that word, makes for some > > optimization challenges: because the compiler doesn't know that the > > offset is always in a certain narrow range, beginning at zero and > > ending before the word bitwidth, when a function loads an offset that > > it hasn't normalized itself, it may fail to derive certain reasonable > > conclusions, even to the point of retaining useless calls that elicit > > incorrect warnings. > > > > Case at hand: The 110807.cc testcase for bit vectors assigns a 1-bit > > list to a global bit vector variable. Based on the compile-time > > constant length of the list, we decide in _M_insert_range whether to > > use the existing storage or to allocate new storage for the vector. > > After allocation, we decide in _M_copy_aligned how to copy any > > preexisting portions of the vector to the newly-allocated storage. > > When copying two or more words, we use __builtin_memmove. > > > > However, because we compute the available room using bit offsets > > without range information, even comparing them with constants, we fail > > to infer ranges for the preexisting vector depending on word size, and > > may thus retain the memmove call despite knowing we've only allocated > > one word. > > > > Other parts of the compiler then detect the mismatch between the > > constant allocation size and the much larger range that could > > theoretically be copied into the newly-allocated storage if we could > > reach the call. > > > > Ensuring the compiler is aware of the constraints on the offset range > > enables it to do a much better job at optimizing. Using attribute > > assume (_M_offset <=3D ...) didn't work, because gimple lowered that to > > something that vrp could only use to ensure 'this' was non-NULL. > > Exposing _M_offset as an automatic variable/gimple register outside > > the unevaluated assume operand enabled the optimizer to do its job. > > > > Rather than placing such load-then-assume constructs all over, I > > introduced an always-inline member function in bit iterators that does > > the job of conveying to the compiler the information that the > > assumption is supposed to hold, and various calls throughout functions > > pertaining to bit iterators that might not otherwise know that the > > offsets have to be in range, so that the compiler no longer needs to > > make conservative assumptions that prevent optimizations. > > > > With the explicit assumptions, the compiler can correlate the test for > > available storage in the vector with the test for how much storage > > might need to be copied, and determine that, if we're not asking for > > enough room for two or more words, we can omit entirely the code to > > copy two or more words, without any runtime overhead whatsoever: no > > traces remain of the undefined behavior or of the tests that inform > > the compiler about the assumptions that must hold. > > > > > > for libstdc++-v3/ChangeLog > > > > PR libstdc++/110807 > > * include/bits/stl_bvector.h (_Bit_iterator_base): Add > > _M_assume_normalized member function. Call it in _M_bump_up, > > _M_bump_down, _M_incr, operator=3D=3D, operator<=3D>, operator<, = and > > operator-. > > (_Bit_iterator): Also call it in operator*. > > (_Bit_const_iterator): Likewise. > > --- > > libstdc++-v3/include/bits/stl_bvector.h | 37 ++++++++++++++++++++++= ++++++--- > > 1 file changed, 34 insertions(+), 3 deletions(-) > > > > diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/inc= lude/bits/stl_bvector.h > > index 8d18bcaffd434..2b91af2005f2d 100644 > > --- a/libstdc++-v3/include/bits/stl_bvector.h > > +++ b/libstdc++-v3/include/bits/stl_bvector.h > > @@ -56,6 +56,10 @@ > > #ifndef _STL_BVECTOR_H > > #define _STL_BVECTOR_H 1 > > > > +#ifndef _GLIBCXX_ALWAYS_INLINE > > +#define _GLIBCXX_ALWAYS_INLINE inline __attribute__((__always_inline__= )) > > +#endif > > + > > IMHO using [[__gnu__::__always_inline__]] would give the same result, > but without the macro ! But only valid for C++11 and later, not C++98. I think the reason we use _GLIBCXX_ALWAYS_INLINE like that is in case it causes compiler errors ("'always_inline' function might not be inlinable"), so that users can define the macro themselves to disable the attribute. At least, that's my assumption of why we do it via macros in several places. But I've just realised we probably want to #undef the macro at the end of bits/stl_bvector.h too.