From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 129BE383138C for ; Fri, 21 Jun 2024 16:08:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 129BE383138C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 129BE383138C Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1718986122; cv=none; b=P7d9BWCK6OoGrkgPab8jqLBuvUyE6YAhN6gCe4FW2v/n/RyB9QN1FLyGn1DVWZwrBpYwTNhQ9CJcw0VD78CdzlUX286XC4YTA5WMnzfDC3A+BWqT/dB133XqfsFSIsjIX8pJyIZn8kwP5h/bLNScS0ITTsXWVGXiiQPBg75Aj2k= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1718986122; c=relaxed/simple; bh=NNw9WRocwXN2hq0cGpr/5sY7Zyh+axPTQFk9c0/GmgU=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=Dn93to8kPDTDfbvYh+DIN3SGlUkXvR7fAShdHeyawgbiBNtovv1nhNPtCZEW3qbt3Zhl/QQE/TA9e6UvVO6hVU33UAn10E4ycYftqDUSxEKFDIj8SM7UBo/AS6AS5KiOnFwg5eWtfLL3IAI8LYCtXItgt0drjMB/SlHzHVlmdGo= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1718986119; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=B+hgW+JJAYWG3kDeRE5H3XZVA0r6o8wF85URoRXDU5U=; b=hfTedkd/jN6Dv2OjRXNUljcC+3wLJa1K5DoXHaiZ1kfYnJDKQTkOXxQUzC5jAOFwhBzYbK +LNKhf8pHJHmtdQY+4vZsAXfmbpbqUmMUCRfRItkWj+qdBTKbZwDi7Yn9W0liemRMvSOIO 1Ywlnne8RQKC2SxvSUtau2qc/UxIA2g= Received: from mail-yw1-f199.google.com (mail-yw1-f199.google.com [209.85.128.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-134-l3jBYVL5MI-07l5R0w3IBw-1; Fri, 21 Jun 2024 12:08:34 -0400 X-MC-Unique: l3jBYVL5MI-07l5R0w3IBw-1 Received: by mail-yw1-f199.google.com with SMTP id 00721157ae682-62f8a1b2969so39459207b3.3 for ; Fri, 21 Jun 2024 09:08:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718986113; x=1719590913; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=B+hgW+JJAYWG3kDeRE5H3XZVA0r6o8wF85URoRXDU5U=; b=ipX+lq932Ajum/nfsGxad4n4IS8mwQGCZB+xCv69uls91GGJBjJUCPhg65f1tkj3Gp yojPgOCM3Ku5s0AJapMkqQBDzVMIn/XxNV4K5GCA3T80S/RlkNF4xP6VZb18xEkzjaFr 0bgLEkXvFjQoDVRNV5I8vOZifoBeNUxWVveu7IF8qSgNUx0Ee3O0r1aypIWhOYiNnbcp GwSiJsyu8KNPHD5RZAfDL3l6b1lidCnEThRmU8nt+J3TloLdqSjCfUhXIsmvJ/Ih6eLi GDs/DOpyeDkzTnQdhB8J60/Qdp6Lqm+sTe+U1Z4EC/ZVRWNZ5Ti1vr7uW0hub4QMWYaG IlAw== X-Forwarded-Encrypted: i=1; AJvYcCX52OUvFjcXUewtyKOKgTh6LMnauw4u3sotgMqq/OHdzjYY/H4kpN96YxzJVqmfP66w6OOxvdCZKIhj4mh/pPmMcmdT3o6IiQ== X-Gm-Message-State: AOJu0Yz0jCWOSEbcO5Vsi7ria9vH+LbCjBtefG/8+GVb9K3MH8hUZchu a+lliZoUtJbBFmR5YEIBtRJGf2lI+m50DC5jmPn/hJOG5c7n/b5fRAkrBJ2boBONh2HTcryX3Nb ghtwdJN/5ZgQSn/p7Ltquu9ug6lcGE+XTV/rKAzGVvjPnrTC1lsmkUOtSyW9BH7xaWPBrrGJ/eQ fTk14YAv6N36/hmQpL1HayGbkEDYjTZWqAQAKggA== X-Received: by 2002:a0d:f781:0:b0:62f:d165:6ee8 with SMTP id 00721157ae682-63a8deedb8dmr92308287b3.18.1718986112385; Fri, 21 Jun 2024 09:08:32 -0700 (PDT) X-Google-Smtp-Source: AGHT+IG7dHpzT3g3JrV2zauQVPp1pdaQpzh5SxEYGhe+0E470EKJ0ShbvG9kS1NlHr0/ZQkZnLX5Jkd1kcog4n8WC0Y= X-Received: by 2002:a0d:f781:0:b0:62f:d165:6ee8 with SMTP id 00721157ae682-63a8deedb8dmr92307097b3.18.1718986110311; Fri, 21 Jun 2024 09:08:30 -0700 (PDT) MIME-Version: 1.0 References: <20240620152811.2020226-1-jwakely@redhat.com> In-Reply-To: <20240620152811.2020226-1-jwakely@redhat.com> From: Jonathan Wakely Date: Fri, 21 Jun 2024 17:08:14 +0100 Message-ID: Subject: Re: [PATCH] libstdc++: Fix std::fill and std::fill_n optimizations [PR109150] To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP,URI_HEX autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Pushed to trunk now. On Thu, 20 Jun 2024 at 16:28, Jonathan Wakely wrote: > > I think the new conditions are correct. They're certainly an improvment > on just checking __is_scalar without considering what we're assigning it > to. > > Tested x86_64-linux. > > -- >8 -- > > As noted in the PR, the optimization used for scalar types in std::fill > and std::fill_n is non-conforming, because it doesn't consider that > assigning a scalar type might have non-trivial side effects which are > affected by the optimization. > > By changing the condition under which the optimization is done we ensure > it's only performed when safe to do so, and we also enable it for > additional types, which was the original subject of the PR. > > Instead of two overloads using __enable_if<__is_scalar::__value, R> > we can combine them into one and create a local variable which is either > a local copy of __value or another reference to it, depending on whether > the optimization is allowed. > > This removes a use of std::__is_scalar, which is a step towards fixing > PR 115497 by removing std::__is_pointer from > > libstdc++-v3/ChangeLog: > > PR libstdc++/109150 > * include/bits/stl_algobase.h (__fill_a1): Combine the > !__is_scalar and __is_scalar overloads into one and rewrite the > condition used to decide whether to perform the load outside the > loop. > * testsuite/25_algorithms/fill/109150.cc: New test. > * testsuite/25_algorithms/fill_n/109150.cc: New test. > --- > libstdc++-v3/include/bits/stl_algobase.h | 75 ++++++++++++------- > .../testsuite/25_algorithms/fill/109150.cc | 62 +++++++++++++++ > .../testsuite/25_algorithms/fill_n/109150.cc | 62 +++++++++++++++ > 3 files changed, 171 insertions(+), 28 deletions(-) > create mode 100644 libstdc++-v3/testsuite/25_algorithms/fill/109150.cc > create mode 100644 libstdc++-v3/testsuite/25_algorithms/fill_n/109150.cc > > diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h > index d831e0e9883..1a0f8c14073 100644 > --- a/libstdc++-v3/include/bits/stl_algobase.h > +++ b/libstdc++-v3/include/bits/stl_algobase.h > @@ -929,28 +929,39 @@ _GLIBCXX_END_NAMESPACE_CONTAINER > #define _GLIBCXX_MOVE_BACKWARD3(_Tp, _Up, _Vp) std::copy_backward(_Tp, _Up, _Vp) > #endif > > +#pragma GCC diagnostic push > +#pragma GCC diagnostic ignored "-Wc++17-extensions" > template > _GLIBCXX20_CONSTEXPR > - inline typename > - __gnu_cxx::__enable_if::__value, void>::__type > + inline void > __fill_a1(_ForwardIterator __first, _ForwardIterator __last, > const _Tp& __value) > { > - for (; __first != __last; ++__first) > - *__first = __value; > - } > + // We can optimize this loop by moving the load from __value outside > + // the loop, but only if we know that making that copy is trivial, > + // and the assignment in the loop is also trivial (so that the identity > + // of the operand doesn't matter). > + const bool __load_outside_loop = > +#if __has_builtin(__is_trivially_constructible) \ > + && __has_builtin(__is_trivially_assignable) > + __is_trivially_constructible(_Tp, const _Tp&) > + && __is_trivially_assignable(__decltype(*__first), const _Tp&) > +#else > + __is_trivially_copyable(_Tp) > + && __is_same(_Tp, __typeof__(*__first)) > +#endif > + && sizeof(_Tp) <= sizeof(long long); > > - template > - _GLIBCXX20_CONSTEXPR > - inline typename > - __gnu_cxx::__enable_if<__is_scalar<_Tp>::__value, void>::__type > - __fill_a1(_ForwardIterator __first, _ForwardIterator __last, > - const _Tp& __value) > - { > - const _Tp __tmp = __value; > + // When the condition is true, we use a copy of __value, > + // otherwise we just use another reference. > + typedef typename __gnu_cxx::__conditional_type<__load_outside_loop, > + const _Tp, > + const _Tp&>::__type _Up; > + _Up __val(__value); > for (; __first != __last; ++__first) > - *__first = __tmp; > + *__first = __val; > } > +#pragma GCC diagnostic pop > > // Specialization: for char types we can use memset. > template > @@ -1079,28 +1090,36 @@ _GLIBCXX_END_NAMESPACE_CONTAINER > __size_to_integer(__float128 __n) { return (long long)__n; } > #endif > > +#pragma GCC diagnostic push > +#pragma GCC diagnostic ignored "-Wc++17-extensions" > template > _GLIBCXX20_CONSTEXPR > - inline typename > - __gnu_cxx::__enable_if::__value, _OutputIterator>::__type > + inline _OutputIterator > __fill_n_a1(_OutputIterator __first, _Size __n, const _Tp& __value) > { > - for (; __n > 0; --__n, (void) ++__first) > - *__first = __value; > - return __first; > - } > + // See std::__fill_a1 for explanation of this condition. > + const bool __load_outside_loop = > +#if __has_builtin(__is_trivially_constructible) \ > + && __has_builtin(__is_trivially_assignable) > + __is_trivially_constructible(_Tp, const _Tp&) > + && __is_trivially_assignable(__decltype(*__first), const _Tp&) > +#else > + __is_trivially_copyable(_Tp) > + && __is_same(_Tp, __typeof__(*__first)) > +#endif > + && sizeof(_Tp) <= sizeof(long long); > > - template > - _GLIBCXX20_CONSTEXPR > - inline typename > - __gnu_cxx::__enable_if<__is_scalar<_Tp>::__value, _OutputIterator>::__type > - __fill_n_a1(_OutputIterator __first, _Size __n, const _Tp& __value) > - { > - const _Tp __tmp = __value; > + // When the condition is true, we use a copy of __value, > + // otherwise we just use another reference. > + typedef typename __gnu_cxx::__conditional_type<__load_outside_loop, > + const _Tp, > + const _Tp&>::__type _Up; > + _Up __val(__value); > for (; __n > 0; --__n, (void) ++__first) > - *__first = __tmp; > + *__first = __val; > return __first; > } > +#pragma GCC diagnostic pop > > template typename _Tp> > diff --git a/libstdc++-v3/testsuite/25_algorithms/fill/109150.cc b/libstdc++-v3/testsuite/25_algorithms/fill/109150.cc > new file mode 100644 > index 00000000000..9491a998ff0 > --- /dev/null > +++ b/libstdc++-v3/testsuite/25_algorithms/fill/109150.cc > @@ -0,0 +1,62 @@ > +// { dg-do run } > + > +// Test the problematic cases identified in PR libstdc++/109150 > +// where the previous std::fill was non-conforming. > + > +#include > +#include > + > +const int global = 0; > + > +struct X { > + void operator=(const int& i) { VERIFY(&i == &global); } > +}; > + > +void > +test_identity_matters() > +{ > + X x; > + // Assigning int to X has non-trivial side effects, so we cannot > + // hoist the load outside the loop, we have to do exactly what the > + // standard says to do. > + std::fill(&x, &x+1, global); > +} > + > +struct Y { > + int i; > + void operator=(int ii) { i = ii + 1; } > +}; > + > +void > +test_self_aliasing() > +{ > + Y y[2] = { }; > + // Assigning int to X has non-trivial side effects, altering the value > + // used to fill the later elements. Must not load it outside the loop. > + std::fill(y, y+2, y[0].i); > + VERIFY(y[1].i == 2); > +} > + > +struct Z > +{ > + Z() { } > +#if __cplusplus >= 201103L > + explicit Z(const Z&) = default; > +#endif > +}; > + > +void > +test_explicit_copy_ctor() > +{ > + Z z; > + // The optimization that copies the fill value must use direct-initialization > + // otherwise this case would be ill-formed due to the explicit constructor. > + std::fill(&z, &z, z); > +} > + > +int main() > +{ > + test_identity_matters(); > + test_self_aliasing(); > + test_explicit_copy_ctor(); > +} > diff --git a/libstdc++-v3/testsuite/25_algorithms/fill_n/109150.cc b/libstdc++-v3/testsuite/25_algorithms/fill_n/109150.cc > new file mode 100644 > index 00000000000..13bb31c9344 > --- /dev/null > +++ b/libstdc++-v3/testsuite/25_algorithms/fill_n/109150.cc > @@ -0,0 +1,62 @@ > +// { dg-do run } > + > +// Test the problematic cases identified in PR libstdc++/109150 > +// where the previous std::fill_n was non-conforming. > + > +#include > +#include > + > +const int global = 0; > + > +struct X { > + void operator=(const int& i) { VERIFY(&i == &global); } > +}; > + > +void > +test_identity_matters() > +{ > + X x; > + // Assigning int to X has non-trivial side effects, so we cannot > + // hoist the load outside the loop, we have to do exactly what the > + // standard says to do. > + std::fill_n(&x, 1, global); > +} > + > +struct Y { > + int i; > + void operator=(int ii) { i = ii + 1; } > +}; > + > +void > +test_self_aliasing() > +{ > + Y y[2] = { }; > + // Assigning int to X has non-trivial side effects, altering the value > + // used to fill the later elements. Must not load it outside the loop. > + std::fill_n(y, 2, y[0].i); > + VERIFY(y[1].i == 2); > +} > + > +struct Z > +{ > + Z() { } > +#if __cplusplus >= 201103L > + explicit Z(const Z&) = default; > +#endif > +}; > + > +void > +test_explicit_copy_ctor() > +{ > + Z z; > + // The optimization that copies the fill value must use direct-initialization > + // otherwise this case would be ill-formed due to the explicit constructor. > + std::fill_n(&z, 1, z); > +} > + > +int main() > +{ > + test_identity_matters(); > + test_self_aliasing(); > + test_explicit_copy_ctor(); > +} > -- > 2.45.2 >