From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 59EB63890431 for ; Thu, 20 Jun 2024 15:28:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 59EB63890431 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 59EB63890431 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1718897303; cv=none; b=ldmwWKd8G85RdyM+yQ4QgAVQHN5gxvrZz1n3JuMqQg3leKmf0eNYDYvcCX4VreCQvUGMdYaEhaQuqT0deP+QJeMUeePaE52LqaEYqcFEjQzGlfy2vsvxcenV0IkSYNDDrERp8tAFfUYy8cpkcnHQgvHRZzY/D8YloVjgR4auJOY= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1718897303; c=relaxed/simple; bh=GYgGq4dTe/vqc28aR7IEUD3zGO65Il+CjPAF9Oy2vFg=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=pjbyFlB6q2wFty0GjYnSkHFpIE+tc4jP0WPfYE3LxEuurKm8Z+kEumJKDiD889PMxf5fCjjCDCZaqvKQ/6NkUzg9wGxljY8zn8tDbIUG2nUIc5KfGZV2eqoAhc1l2mrMRb3Ng+/m95GyK/AOJwfuPe8QvgrUa04anJC4CJCXSQ4= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1718897300; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=bz7XvSV1s1lpKOYCDpPFOZXcwIcz02ntc/dHLvn3sXE=; b=O2gm2EKrFopphINwy3GWuluOWrq7BPn37XW9HKY5CswRAjDnaJw5TSQRpwBjISvec4gL5j y4omNQM5XQZAa2kiq3LC3pyp6Q66dcUbSRtc5lNZ6oNE5gHyO1GF5BZ5TyE3vOHZt6mo1D uC+dY7oASdClI0lMrzaGt5vmBxS9tsk= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-369-OWk5jhtJNWO9autgRZJ_Pg-1; Thu, 20 Jun 2024 11:28:15 -0400 X-MC-Unique: OWk5jhtJNWO9autgRZJ_Pg-1 Received: from mx-prod-int-04.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-04.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.40]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 8463519560B0; Thu, 20 Jun 2024 15:28:13 +0000 (UTC) Received: from localhost (unknown [10.42.28.182]) by mx-prod-int-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id E69C919560B3; Thu, 20 Jun 2024 15:28:12 +0000 (UTC) From: Jonathan Wakely To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: [PATCH] libstdc++: Fix std::fill and std::fill_n optimizations [PR109150] Date: Thu, 20 Jun 2024 16:27:19 +0100 Message-ID: <20240620152811.2020226-1-jwakely@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.40 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,RCVD_IN_SBL_CSS,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE,URI_HEX autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: I think the new conditions are correct. They're certainly an improvment on just checking __is_scalar without considering what we're assigning it to. Tested x86_64-linux. -- >8 -- As noted in the PR, the optimization used for scalar types in std::fill and std::fill_n is non-conforming, because it doesn't consider that assigning a scalar type might have non-trivial side effects which are affected by the optimization. By changing the condition under which the optimization is done we ensure it's only performed when safe to do so, and we also enable it for additional types, which was the original subject of the PR. Instead of two overloads using __enable_if<__is_scalar::__value, R> we can combine them into one and create a local variable which is either a local copy of __value or another reference to it, depending on whether the optimization is allowed. This removes a use of std::__is_scalar, which is a step towards fixing PR 115497 by removing std::__is_pointer from libstdc++-v3/ChangeLog: PR libstdc++/109150 * include/bits/stl_algobase.h (__fill_a1): Combine the !__is_scalar and __is_scalar overloads into one and rewrite the condition used to decide whether to perform the load outside the loop. * testsuite/25_algorithms/fill/109150.cc: New test. * testsuite/25_algorithms/fill_n/109150.cc: New test. --- libstdc++-v3/include/bits/stl_algobase.h | 75 ++++++++++++------- .../testsuite/25_algorithms/fill/109150.cc | 62 +++++++++++++++ .../testsuite/25_algorithms/fill_n/109150.cc | 62 +++++++++++++++ 3 files changed, 171 insertions(+), 28 deletions(-) create mode 100644 libstdc++-v3/testsuite/25_algorithms/fill/109150.cc create mode 100644 libstdc++-v3/testsuite/25_algorithms/fill_n/109150.cc diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h index d831e0e9883..1a0f8c14073 100644 --- a/libstdc++-v3/include/bits/stl_algobase.h +++ b/libstdc++-v3/include/bits/stl_algobase.h @@ -929,28 +929,39 @@ _GLIBCXX_END_NAMESPACE_CONTAINER #define _GLIBCXX_MOVE_BACKWARD3(_Tp, _Up, _Vp) std::copy_backward(_Tp, _Up, _Vp) #endif +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wc++17-extensions" template _GLIBCXX20_CONSTEXPR - inline typename - __gnu_cxx::__enable_if::__value, void>::__type + inline void __fill_a1(_ForwardIterator __first, _ForwardIterator __last, const _Tp& __value) { - for (; __first != __last; ++__first) - *__first = __value; - } + // We can optimize this loop by moving the load from __value outside + // the loop, but only if we know that making that copy is trivial, + // and the assignment in the loop is also trivial (so that the identity + // of the operand doesn't matter). + const bool __load_outside_loop = +#if __has_builtin(__is_trivially_constructible) \ + && __has_builtin(__is_trivially_assignable) + __is_trivially_constructible(_Tp, const _Tp&) + && __is_trivially_assignable(__decltype(*__first), const _Tp&) +#else + __is_trivially_copyable(_Tp) + && __is_same(_Tp, __typeof__(*__first)) +#endif + && sizeof(_Tp) <= sizeof(long long); - template - _GLIBCXX20_CONSTEXPR - inline typename - __gnu_cxx::__enable_if<__is_scalar<_Tp>::__value, void>::__type - __fill_a1(_ForwardIterator __first, _ForwardIterator __last, - const _Tp& __value) - { - const _Tp __tmp = __value; + // When the condition is true, we use a copy of __value, + // otherwise we just use another reference. + typedef typename __gnu_cxx::__conditional_type<__load_outside_loop, + const _Tp, + const _Tp&>::__type _Up; + _Up __val(__value); for (; __first != __last; ++__first) - *__first = __tmp; + *__first = __val; } +#pragma GCC diagnostic pop // Specialization: for char types we can use memset. template @@ -1079,28 +1090,36 @@ _GLIBCXX_END_NAMESPACE_CONTAINER __size_to_integer(__float128 __n) { return (long long)__n; } #endif +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wc++17-extensions" template _GLIBCXX20_CONSTEXPR - inline typename - __gnu_cxx::__enable_if::__value, _OutputIterator>::__type + inline _OutputIterator __fill_n_a1(_OutputIterator __first, _Size __n, const _Tp& __value) { - for (; __n > 0; --__n, (void) ++__first) - *__first = __value; - return __first; - } + // See std::__fill_a1 for explanation of this condition. + const bool __load_outside_loop = +#if __has_builtin(__is_trivially_constructible) \ + && __has_builtin(__is_trivially_assignable) + __is_trivially_constructible(_Tp, const _Tp&) + && __is_trivially_assignable(__decltype(*__first), const _Tp&) +#else + __is_trivially_copyable(_Tp) + && __is_same(_Tp, __typeof__(*__first)) +#endif + && sizeof(_Tp) <= sizeof(long long); - template - _GLIBCXX20_CONSTEXPR - inline typename - __gnu_cxx::__enable_if<__is_scalar<_Tp>::__value, _OutputIterator>::__type - __fill_n_a1(_OutputIterator __first, _Size __n, const _Tp& __value) - { - const _Tp __tmp = __value; + // When the condition is true, we use a copy of __value, + // otherwise we just use another reference. + typedef typename __gnu_cxx::__conditional_type<__load_outside_loop, + const _Tp, + const _Tp&>::__type _Up; + _Up __val(__value); for (; __n > 0; --__n, (void) ++__first) - *__first = __tmp; + *__first = __val; return __first; } +#pragma GCC diagnostic pop template diff --git a/libstdc++-v3/testsuite/25_algorithms/fill/109150.cc b/libstdc++-v3/testsuite/25_algorithms/fill/109150.cc new file mode 100644 index 00000000000..9491a998ff0 --- /dev/null +++ b/libstdc++-v3/testsuite/25_algorithms/fill/109150.cc @@ -0,0 +1,62 @@ +// { dg-do run } + +// Test the problematic cases identified in PR libstdc++/109150 +// where the previous std::fill was non-conforming. + +#include +#include + +const int global = 0; + +struct X { + void operator=(const int& i) { VERIFY(&i == &global); } +}; + +void +test_identity_matters() +{ + X x; + // Assigning int to X has non-trivial side effects, so we cannot + // hoist the load outside the loop, we have to do exactly what the + // standard says to do. + std::fill(&x, &x+1, global); +} + +struct Y { + int i; + void operator=(int ii) { i = ii + 1; } +}; + +void +test_self_aliasing() +{ + Y y[2] = { }; + // Assigning int to X has non-trivial side effects, altering the value + // used to fill the later elements. Must not load it outside the loop. + std::fill(y, y+2, y[0].i); + VERIFY(y[1].i == 2); +} + +struct Z +{ + Z() { } +#if __cplusplus >= 201103L + explicit Z(const Z&) = default; +#endif +}; + +void +test_explicit_copy_ctor() +{ + Z z; + // The optimization that copies the fill value must use direct-initialization + // otherwise this case would be ill-formed due to the explicit constructor. + std::fill(&z, &z, z); +} + +int main() +{ + test_identity_matters(); + test_self_aliasing(); + test_explicit_copy_ctor(); +} diff --git a/libstdc++-v3/testsuite/25_algorithms/fill_n/109150.cc b/libstdc++-v3/testsuite/25_algorithms/fill_n/109150.cc new file mode 100644 index 00000000000..13bb31c9344 --- /dev/null +++ b/libstdc++-v3/testsuite/25_algorithms/fill_n/109150.cc @@ -0,0 +1,62 @@ +// { dg-do run } + +// Test the problematic cases identified in PR libstdc++/109150 +// where the previous std::fill_n was non-conforming. + +#include +#include + +const int global = 0; + +struct X { + void operator=(const int& i) { VERIFY(&i == &global); } +}; + +void +test_identity_matters() +{ + X x; + // Assigning int to X has non-trivial side effects, so we cannot + // hoist the load outside the loop, we have to do exactly what the + // standard says to do. + std::fill_n(&x, 1, global); +} + +struct Y { + int i; + void operator=(int ii) { i = ii + 1; } +}; + +void +test_self_aliasing() +{ + Y y[2] = { }; + // Assigning int to X has non-trivial side effects, altering the value + // used to fill the later elements. Must not load it outside the loop. + std::fill_n(y, 2, y[0].i); + VERIFY(y[1].i == 2); +} + +struct Z +{ + Z() { } +#if __cplusplus >= 201103L + explicit Z(const Z&) = default; +#endif +}; + +void +test_explicit_copy_ctor() +{ + Z z; + // The optimization that copies the fill value must use direct-initialization + // otherwise this case would be ill-formed due to the explicit constructor. + std::fill_n(&z, 1, z); +} + +int main() +{ + test_identity_matters(); + test_self_aliasing(); + test_explicit_copy_ctor(); +} -- 2.45.2