From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 21FE53858D20 for ; Tue, 15 Mar 2022 15:38:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 21FE53858D20 Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-244-tCZjTqqxPiOW-NBI0c_qpg-1; Tue, 15 Mar 2022 11:38:31 -0400 X-MC-Unique: tCZjTqqxPiOW-NBI0c_qpg-1 Received: by mail-qv1-f71.google.com with SMTP id dj3-20020a056214090300b004354a9c60aaso16953503qvb.0 for ; Tue, 15 Mar 2022 08:38:31 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:in-reply-to :content-transfer-encoding; bh=DAnuKoNFpXealeCqpB1NtKR0Xbez0cKPsy8gKbDsrEA=; b=lT+X+B6IpsrQAnwd4NmM98LszheULm8eHPdpgMZax3pzeRyOs8pwfw7OUQtainAm+O a9bKBqune75GbDD02enyp9Amz0B4kI0AUPLoeeA4tJ7KqqAgvzwzvOqSqgzhsnfqcPgE nsDux2R8Kqj8fzYSXiAF/b2x4Dsk94Aj2xvOum5vQEIlkfbVtYxyZFt32UDcMFFtxyS5 YLhGFu0c/krEeg57I9V9D6j8U0wowof+be+cE8Mx9ftKSsktlBqbQVBxqLreroPhgdKI FO0uQWZKANqXiawrNUe5sEUhUbo6KiLDDI4/MJSyVrAecXCKrVHxq+kqiUGwNWKDDvVo o32w== X-Gm-Message-State: AOAM533EQMe3g5WvKAbKbRrEAem6F9flRUMElP4wL7j/I2VaKMReEVB3 xGXOXUK5eKp5sQGBhRzU668PZZKB49oNDm4pdWaDm5Coydop/IQSFn49/Fva5JZkGyezZvhQBN2 RErlYM6oLaaHbSF0Vnw== X-Received: by 2002:a05:620a:1094:b0:67b:1f05:4942 with SMTP id g20-20020a05620a109400b0067b1f054942mr18288395qkk.224.1647358710831; Tue, 15 Mar 2022 08:38:30 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyKqY1aDuillZUzp1S+9WETvHSa/ngBi3FJbWhGG8MRPAA21qz/TMmmg7OC4sdrGcwlBXuLdw== X-Received: by 2002:a05:620a:1094:b0:67b:1f05:4942 with SMTP id g20-20020a05620a109400b0067b1f054942mr18288364qkk.224.1647358710246; Tue, 15 Mar 2022 08:38:30 -0700 (PDT) Received: from [192.168.1.149] (130-44-159-43.s15913.c3-0.arl-cbr1.sbo-arl.ma.cable.rcncustomer.com. [130.44.159.43]) by smtp.gmail.com with ESMTPSA id b1-20020a05622a020100b002de94100619sm13457550qtx.81.2022.03.15.08.38.29 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 15 Mar 2022 08:38:29 -0700 (PDT) Message-ID: <50b67f0d-4f22-2b83-c383-533219334365@redhat.com> Date: Tue, 15 Mar 2022 11:38:28 -0400 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Subject: Re: [PATCH] c++: fold calls to std::move/forward [PR96780] To: Patrick Palka Cc: gcc-patches@gcc.gnu.org, Jonathan Wakely References: <20220301220821.1732163-1-ppalka@redhat.com> <1de0b889-66e7-8202-5c5e-526b728a36e8@redhat.com> <6b2257e0-cd0f-4771-7c12-1be4dbf88423@idea> <7b1f33b7-3c59-bf1e-1fa4-7226504a5ae2@redhat.com> <9db703fb-4517-5ee2-fe29-b84cdd30ed9e@idea> From: Jason Merrill In-Reply-To: <9db703fb-4517-5ee2-fe29-b84cdd30ed9e@idea> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-13.3 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Mar 2022 15:38:36 -0000 On 3/15/22 10:03, Patrick Palka wrote: > On Mon, 14 Mar 2022, Jason Merrill wrote: > >> On 3/14/22 13:13, Patrick Palka wrote: >>> On Fri, 11 Mar 2022, Jason Merrill wrote: >>> >>>> On 3/10/22 11:27, Patrick Palka wrote: >>>>> On Wed, 9 Mar 2022, Jason Merrill wrote: >>>>> >>>>>> On 3/1/22 18:08, Patrick Palka wrote: >>>>>>> A well-formed call to std::move/forward is equivalent to a cast, but >>>>>>> the >>>>>>> former being a function call means it comes with bloated debug info, >>>>>>> which >>>>>>> persists even after the call has been inlined away, for an operation >>>>>>> that >>>>>>> is never interesting to debug. >>>>>>> >>>>>>> This patch addresses this problem in a relatively ad-hoc way by >>>>>>> folding >>>>>>> calls to std::move/forward into casts as part of the frontend's >>>>>>> general >>>>>>> expression folding routine. After this patch with -O2 and a >>>>>>> non-checking >>>>>>> compiler, debug info size for some testcases decreases by about ~10% >>>>>>> and >>>>>>> overall compile time and memory usage decreases by ~2%. >>>>>> >>>>>> Impressive. Which testcases? >>>>> >>>>> I saw the largest percent reductions in debug file object size in >>>>> various tests from cmcstl2 and range-v3, e.g. >>>>> test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp >>>>> (which are among their biggest tests). >>>>> >>>>> Significant reductions in debug object file size can be observed in >>>>> some libstdc++ testcases too, such as a 5.5% reduction in >>>>> std/ranges/adaptor/join.cc >>>>> >>>>>> >>>>>> Do you also want to handle addressof and as_const in this patch, as >>>>>> Jonathan >>>>>> suggested? >>>>> >>>>> Yes, good idea. Since each of their argument and return types are >>>>> indirect types, I think we can use the same NOP_EXPR-based folding for >>>>> them. >>>>> >>>>>> >>>>>> I think we can do this now, and think about generalizing more in stage >>>>>> 1. >>>>>> >>>>>>> Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something >>>>>>> we >>>>>>> want to consider for GCC 12? >>>>>>> >>>>>>> PR c++/96780 >>>>>>> >>>>>>> gcc/cp/ChangeLog: >>>>>>> >>>>>>> * cp-gimplify.cc (cp_fold) : When optimizing, >>>>>>> fold calls to std::move/forward into simple casts. >>>>>>> * cp-tree.h (is_std_move_p, is_std_forward_p): Declare. >>>>>>> * typeck.cc (is_std_move_p, is_std_forward_p): Export. >>>>>>> >>>>>>> gcc/testsuite/ChangeLog: >>>>>>> >>>>>>> * g++.dg/opt/pr96780.C: New test. >>>>>>> --- >>>>>>> gcc/cp/cp-gimplify.cc | 18 ++++++++++++++++++ >>>>>>> gcc/cp/cp-tree.h | 2 ++ >>>>>>> gcc/cp/typeck.cc | 6 ++---- >>>>>>> gcc/testsuite/g++.dg/opt/pr96780.C | 24 ++++++++++++++++++++++++ >>>>>>> 4 files changed, 46 insertions(+), 4 deletions(-) >>>>>>> create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C >>>>>>> >>>>>>> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc >>>>>>> index d7323fb5c09..0b009b631c7 100644 >>>>>>> --- a/gcc/cp/cp-gimplify.cc >>>>>>> +++ b/gcc/cp/cp-gimplify.cc >>>>>>> @@ -2756,6 +2756,24 @@ cp_fold (tree x) >>>>>>> case CALL_EXPR: >>>>>>> { >>>>>>> + if (optimize >>>>>> >>>>>> I think this should check flag_no_inline rather than optimize. >>>>> >>>>> Sounds good. >>>>> >>>>> Here's a patch that extends the folding to as_const and addressof (as >>>>> well as __addressof, which I'm kind of unsure about since it's >>>>> non-standard). I suppose it also doesn't hurt to verify that the return >>>>> and argument type of the function are sane before we commit to folding. >>>>> >>>>> -- >8 -- >>>>> >>>>> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] >>>>> >>>>> A well-formed call to std::move/forward is equivalent to a cast, but the >>>>> former being a function call means the compiler generates debug info for >>>>> it, which persists even after the call has been inlined away, for an >>>>> operation that's never interesting to debug. >>>>> >>>>> This patch addresses this problem in a relatively ad-hoc way by folding >>>>> calls to std::move/forward and other cast-like functions into simple >>>>> casts as part of the frontend's general expression folding routine. >>>>> After this patch with -O2 and a non-checking compiler, debug info size >>>>> for some testcases decreases by about ~10% and overall compile time and >>>>> memory usage decreases by ~2%. >>>>> >>>>> PR c++/96780 >>>>> >>>>> gcc/cp/ChangeLog: >>>>> >>>>> * cp-gimplify.cc (cp_fold) : When optimizing, >>>>> fold calls to std::move/forward and other cast-like functions >>>>> into simple casts. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> >>>>> * g++.dg/opt/pr96780.C: New test. >>>>> --- >>>>> gcc/cp/cp-gimplify.cc | 36 +++++++++++++++++++++++++++- >>>>> gcc/testsuite/g++.dg/opt/pr96780.C | 38 >>>>> ++++++++++++++++++++++++++++++ >>>>> 2 files changed, 73 insertions(+), 1 deletion(-) >>>>> create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C >>>>> >>>>> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc >>>>> index d7323fb5c09..efc4c8f0eb9 100644 >>>>> --- a/gcc/cp/cp-gimplify.cc >>>>> +++ b/gcc/cp/cp-gimplify.cc >>>>> @@ -2756,9 +2756,43 @@ cp_fold (tree x) >>>>> case CALL_EXPR: >>>>> { >>>>> - int sv = optimize, nw = sv; >>>>> tree callee = get_callee_fndecl (x); >>>>> + /* "Inline" calls to std::move/forward and other cast-like >>>>> functions >>>>> + by simply folding them into the corresponding cast determined by >>>>> + their return type. This is cheaper than relying on the middle-end >>>>> + to do so, and also means we avoid generating useless debug info for >>>>> + them at all. >>>>> + >>>>> + At this point the argument has already been converted into a >>>>> + reference, so it suffices to use a NOP_EXPR to express the >>>>> + cast. */ >>>>> + if (!flag_no_inline >>>> >>>> In our conversation yesterday it occurred to me that we might make this a >>>> separate flag that defaults to the value of flag_no_inline; I was thinking >>>> of >>>> -ffold-simple-inlines. Then Vittorio et al can specify that explicitly at >>>> -O0 >>>> if they'd like. >>> >>> Makes sense, like so? Bootstrapped and regtested on x86_64-pc-linux-gnu. >>> >>> The patch defaults -ffold-simple-inlines according to the value of >>> flag_no_inline at startup. IIUC this means that if the flag has been >>> defaulted to set, then e.g. an optimize("O0") function attribute won't >>> disable -ffold-simple-inlines for that function, since we only compute >>> its default value once. >>> >>> I wonder if we therefore instead want to handle defaulting the flag >>> when it's used, e.g. check >>> >>> (flag_fold_simple_inlines == -1 >>> ? flag_no_inline >>> : flag_fold_simple_inlines) >>> >>> instead of >>> >>> flag_fold_simple_inlines >>> >>> in cp_fold? >> >> I guess that makes sense, we can't add front-end options to the >> default_options_table. But I think let's use OPTION_SET_P instead of checking >> for -1. > > Done. > >> >>> -- >8 -- >>> >>> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] >>> >>> A well-formed call to std::move/forward is equivalent to a cast, but the >>> former being a function call means the compiler generates debug info for >>> it, which persists even after the call has been inlined away, for an >>> operation that's never interesting to debug. >>> >>> This patch addresses this problem by folding calls to std::move/forward >>> and other cast-like functions into simple casts as part of the frontend's >>> general expression folding routine. This behavior is controlled by a >>> new flag -ffold-simple-inlines which defaults to the value of -fno-inline. >>> >>> After this patch with -O2 and a non-checking compiler, debug info size >>> for some testcases (e.g. from range-v3 and cmcstl2) decreases by about >>> ~10% and overall compile time and memory usage decreases by ~2%. >> >> Did you compare the reduction after handling more functions? > > The numbers are roughly the same, which I guess is not too surprising > since calls to std::move/forward outnumber the other functions by about > 10:1 in libstdc++, range-v3 and cmcstl2. > > The biggest reduction in debug object file size (measured by du) I've > observed is 14% with range-v3's test/algorithm/stable_partition.cpp. > The biggest reduction in peak memory usage is (measured by /usr/bin/time -v) > is 5% with cmcstl's test/algorithm/set_symmetric_difference4.cpp. The > biggest reduction in compile time (measured by perf stat) is about 3%, > also from that testcase. > > -- >8 -- > > Subject: [PATCH] c++: fold calls to std::move/forward [PR96780] > > A well-formed call to std::move/forward is equivalent to a cast, but the > former being a function call means the compiler generates debug info for > it, which persists even after the call has been inlined away, for an > operation that's never interesting to debug. > > This patch addresses this problem by folding calls to std::move/forward > and other cast-like functions into simple casts as part of the frontend's > general expression folding routine. This behavior is controlled by a > new flag -ffold-simple-inlines, and otherwise by -fno-inline, so that > users can enable such folding even with -O0 (which implies -fno-inline). > > After this patch with -O2 and a non-checking compiler, debug info size > for some testcases (e.g. from range-v3 and cmcstl2) decreases by about > ~10% and overall compile time and memory usage decreases by ~2%. > > PR c++/96780 > > gcc/c-family/ChangeLog: > > * c-opts.cc (c_common_post_options): Handle defaulting of > flag_fold_simple_inlines. > * c.opt: Add -ffold-simple-inlines. Looks like you still need a doc/invoke.texi change for the new flag. The rest of the patch looks good. > gcc/cp/ChangeLog: > > * cp-gimplify.cc (cp_fold) : Fold calls to > std::move/forward and other cast-like functions into simple > casts. > > gcc/testsuite/ChangeLog: > > * g++.dg/opt/pr96780.C: New test. > --- > gcc/c-family/c.opt | 4 ++++ > gcc/cp/cp-gimplify.cc | 38 +++++++++++++++++++++++++++++- > gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++ > 3 files changed, 79 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C > > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt > index 9cfd2a6bc4e..9a4828ebe37 100644 > --- a/gcc/c-family/c.opt > +++ b/gcc/c-family/c.opt > @@ -1731,6 +1731,10 @@ Support dynamic initialization of thread-local variables in a different translat > fexternal-templates > C++ ObjC++ WarnRemoved > > +ffold-simple-inlines > +C++ ObjC++ Optimization Var(flag_fold_simple_inlines) > +Fold calls to simple inline functions. > + > ffor-scope > C++ ObjC++ WarnRemoved > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc > index d7323fb5c09..41daaf18725 100644 > --- a/gcc/cp/cp-gimplify.cc > +++ b/gcc/cp/cp-gimplify.cc > @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. If not see > #include "file-prefix-map.h" > #include "cgraph.h" > #include "omp-general.h" > +#include "opts.h" > > /* Forward declarations. */ > > @@ -2756,9 +2757,44 @@ cp_fold (tree x) > > case CALL_EXPR: > { > - int sv = optimize, nw = sv; > tree callee = get_callee_fndecl (x); > > + /* "Inline" calls to std::move/forward and other cast-like functions > + by simply folding them into a corresponding cast to their return > + type. This is cheaper than relying on the middle-end to do so, and > + also means we avoid generating useless debug info for them at all. > + > + At this point the argument has already been converted into a > + reference, so it suffices to use a NOP_EXPR to express the > + cast. */ > + if ((OPTION_SET_P (flag_fold_simple_inlines) > + ? flag_fold_simple_inlines > + : !flag_no_inline) > + && call_expr_nargs (x) == 1 > + && decl_in_std_namespace_p (callee) > + && DECL_NAME (callee) != NULL_TREE > + && (id_equal (DECL_NAME (callee), "move") > + || id_equal (DECL_NAME (callee), "forward") > + || id_equal (DECL_NAME (callee), "addressof") > + /* This addressof equivalent is used heavily in libstdc++. */ > + || id_equal (DECL_NAME (callee), "__addressof") > + || id_equal (DECL_NAME (callee), "as_const"))) > + { > + r = CALL_EXPR_ARG (x, 0); > + /* Check that the return and arguments types are sane before > + folding. */ > + if (INDIRECT_TYPE_P (TREE_TYPE (x)) > + && INDIRECT_TYPE_P (TREE_TYPE (r))) > + { > + if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r))) > + r = build_nop (TREE_TYPE (x), r); > + x = cp_fold (r); > + break; > + } > + } > + > + int sv = optimize, nw = sv; > + > /* Some built-in function calls will be evaluated at compile-time in > fold (). Set optimize to 1 when folding __builtin_constant_p inside > a constexpr function so that fold_builtin_1 doesn't fold it to 0. */ > diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C b/gcc/testsuite/g++.dg/opt/pr96780.C > new file mode 100644 > index 00000000000..61e11855eeb > --- /dev/null > +++ b/gcc/testsuite/g++.dg/opt/pr96780.C > @@ -0,0 +1,38 @@ > +// PR c++/96780 > +// Verify calls to std::move/forward are folded away by the frontend. > +// { dg-do compile { target c++11 } } > +// { dg-additional-options "-ffold-simple-inlines -fdump-tree-gimple" } > + > +#include > + > +struct A; > + > +extern A& a; > +extern const A& ca; > + > +void f() { > + auto&& x1 = std::move(a); > + auto&& x2 = std::forward(a); > + auto&& x3 = std::forward(a); > + > + auto&& x4 = std::move(ca); > + auto&& x5 = std::forward(ca); > + auto&& x6 = std::forward(ca); > + > + auto x7 = std::addressof(a); > + auto x8 = std::addressof(ca); > +#if __GLIBCXX__ > + auto x9 = std::__addressof(a); > + auto x10 = std::__addressof(ca); > +#endif > +#if __cpp_lib_as_const > + auto&& x11 = std::as_const(a); > + auto&& x12 = std::as_const(ca); > +#endif > +} > + > +// { dg-final { scan-tree-dump-not "= std::move" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::forward" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } } > +// { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }