public inbox for gcc-patches@gcc.gnu.org
From: Jason Merrill <jason@redhat.com>
To: Patrick Palka <ppalka@redhat.com>
Cc: gcc-patches@gcc.gnu.org, Jonathan Wakely <jwakely.gcc@gmail.com>
Subject: Re: [PATCH] c++: fold calls to std::move/forward [PR96780]
Date: Tue, 15 Mar 2022 17:03:21 -0400
Message-ID: <2136b888-4456-278d-542f-2069accf15b5@redhat.com>
In-Reply-To: <f295e306-a79f-a6ca-840a-aaedf57c36b1@idea>

On 3/15/22 13:09, Patrick Palka wrote:
> On Tue, 15 Mar 2022, Jason Merrill wrote:
> 
>> On 3/15/22 10:03, Patrick Palka wrote:
>>> On Mon, 14 Mar 2022, Jason Merrill wrote:
>>>
>>>> On 3/14/22 13:13, Patrick Palka wrote:
>>>>> On Fri, 11 Mar 2022, Jason Merrill wrote:
>>>>>
>>>>>> On 3/10/22 11:27, Patrick Palka wrote:
>>>>>>> On Wed, 9 Mar 2022, Jason Merrill wrote:
>>>>>>>
>>>>>>>> On 3/1/22 18:08, Patrick Palka wrote:
>>>>>>>>> A well-formed call to std::move/forward is equivalent to a cast, but the
>>>>>>>>> former being a function call means it comes with bloated debug info, which
>>>>>>>>> persists even after the call has been inlined away, for an operation that
>>>>>>>>> is never interesting to debug.
>>>>>>>>>
>>>>>>>>> This patch addresses this problem in a relatively ad-hoc way by folding
>>>>>>>>> calls to std::move/forward into casts as part of the frontend's general
>>>>>>>>> expression folding routine.  After this patch with -O2 and a non-checking
>>>>>>>>> compiler, debug info size for some testcases decreases by about 10% and
>>>>>>>>> overall compile time and memory usage decrease by ~2%.
>>>>>>>>
>>>>>>>> Impressive.  Which testcases?
>>>>>>>
>>>>>>> I saw the largest percent reductions in debug object file size in
>>>>>>> various tests from cmcstl2 and range-v3, e.g.
>>>>>>> test/algorithm/set_symmetric_difference4.cpp and .../rotate_copy.cpp
>>>>>>> (which are among their biggest tests).
>>>>>>>
>>>>>>> Significant reductions in debug object file size can be observed in
>>>>>>> some libstdc++ testcases too, such as a 5.5% reduction in
>>>>>>> std/ranges/adaptor/join.cc.
>>>>>>>
>>>>>>>>
>>>>>>>> Do you also want to handle addressof and as_const in this patch, as
>>>>>>>> Jonathan suggested?
>>>>>>>
>>>>>>> Yes, good idea.  Since each of their argument and return types are
>>>>>>> indirect types, I think we can use the same NOP_EXPR-based folding for
>>>>>>> them.
>>>>>>>
>>>>>>>>
>>>>>>>> I think we can do this now, and think about generalizing more in stage 1.
>>>>>>>>
>>>>>>>>> Bootstrapped and regtested on x86_64-pc-linux-gnu, is this something we
>>>>>>>>> want to consider for GCC 12?
>>>>>>>>>
>>>>>>>>> 	PR c++/96780
>>>>>>>>>
>>>>>>>>> gcc/cp/ChangeLog:
>>>>>>>>>
>>>>>>>>> 	* cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing,
>>>>>>>>> 	fold calls to std::move/forward into simple casts.
>>>>>>>>> 	* cp-tree.h (is_std_move_p, is_std_forward_p): Declare.
>>>>>>>>> 	* typeck.cc (is_std_move_p, is_std_forward_p): Export.
>>>>>>>>>
>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>
>>>>>>>>> 	* g++.dg/opt/pr96780.C: New test.
>>>>>>>>> ---
>>>>>>>>>       gcc/cp/cp-gimplify.cc              | 18 ++++++++++++++++++
>>>>>>>>>       gcc/cp/cp-tree.h                   |  2 ++
>>>>>>>>>       gcc/cp/typeck.cc                   |  6 ++----
>>>>>>>>>       gcc/testsuite/g++.dg/opt/pr96780.C | 24 ++++++++++++++++++++++++
>>>>>>>>>       4 files changed, 46 insertions(+), 4 deletions(-)
>>>>>>>>>       create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C
>>>>>>>>>
>>>>>>>>> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
>>>>>>>>> index d7323fb5c09..0b009b631c7 100644
>>>>>>>>> --- a/gcc/cp/cp-gimplify.cc
>>>>>>>>> +++ b/gcc/cp/cp-gimplify.cc
>>>>>>>>> @@ -2756,6 +2756,24 @@ cp_fold (tree x)
>>>>>>>>>             case CALL_EXPR:
>>>>>>>>>             {
>>>>>>>>> +	if (optimize
>>>>>>>>
>>>>>>>> I think this should check flag_no_inline rather than optimize.
>>>>>>>
>>>>>>> Sounds good.
>>>>>>>
>>>>>>> Here's a patch that extends the folding to as_const and addressof (as
>>>>>>> well as __addressof, which I'm kind of unsure about since it's
>>>>>>> non-standard).  I suppose it also doesn't hurt to verify that the return
>>>>>>> and argument type of the function are sane before we commit to folding.
>>>>>>>
>>>>>>> -- >8 --
>>>>>>>
>>>>>>> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]
>>>>>>>
>>>>>>> A well-formed call to std::move/forward is equivalent to a cast, but the
>>>>>>> former being a function call means the compiler generates debug info for
>>>>>>> it, which persists even after the call has been inlined away, for an
>>>>>>> operation that's never interesting to debug.
>>>>>>>
>>>>>>> This patch addresses this problem in a relatively ad-hoc way by folding
>>>>>>> calls to std::move/forward and other cast-like functions into simple
>>>>>>> casts as part of the frontend's general expression folding routine.
>>>>>>> After this patch with -O2 and a non-checking compiler, debug info size
>>>>>>> for some testcases decreases by about 10% and overall compile time and
>>>>>>> memory usage decrease by ~2%.
>>>>>>>
>>>>>>> 	PR c++/96780
>>>>>>>
>>>>>>> gcc/cp/ChangeLog:
>>>>>>>
>>>>>>> 	* cp-gimplify.cc (cp_fold) <case CALL_EXPR>: When optimizing,
>>>>>>> 	fold calls to std::move/forward and other cast-like functions
>>>>>>> 	into simple casts.
>>>>>>>
>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>
>>>>>>> 	* g++.dg/opt/pr96780.C: New test.
>>>>>>> ---
>>>>>>>      gcc/cp/cp-gimplify.cc              | 36 +++++++++++++++++++++++++++-
>>>>>>>      gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++
>>>>>>>      2 files changed, 73 insertions(+), 1 deletion(-)
>>>>>>>      create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C
>>>>>>>
>>>>>>> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
>>>>>>> index d7323fb5c09..efc4c8f0eb9 100644
>>>>>>> --- a/gcc/cp/cp-gimplify.cc
>>>>>>> +++ b/gcc/cp/cp-gimplify.cc
>>>>>>> @@ -2756,9 +2756,43 @@ cp_fold (tree x)
>>>>>>>            case CALL_EXPR:
>>>>>>>            {
>>>>>>> -	int sv = optimize, nw = sv;
>>>>>>>      	tree callee = get_callee_fndecl (x);
>>>>>>>
>>>>>>> +	/* "Inline" calls to std::move/forward and other cast-like functions
>>>>>>> +	   by simply folding them into the corresponding cast determined by
>>>>>>> +	   their return type.  This is cheaper than relying on the middle-end
>>>>>>> +	   to do so, and also means we avoid generating useless debug info for
>>>>>>> +	   them at all.
>>>>>>> +
>>>>>>> +	   At this point the argument has already been converted into a
>>>>>>> +	   reference, so it suffices to use a NOP_EXPR to express the
>>>>>>> +	   cast.  */
>>>>>>> +	if (!flag_no_inline
>>>>>>
>>>>>> In our conversation yesterday it occurred to me that we might make this a
>>>>>> separate flag that defaults to the value of flag_no_inline; I was thinking
>>>>>> of -ffold-simple-inlines.  Then Vittorio et al can specify that explicitly
>>>>>> at -O0 if they'd like.
>>>>>
>>>>> Makes sense, like so?  Bootstrapped and regtested on
>>>>> x86_64-pc-linux-gnu.
>>>>>
>>>>> The patch defaults -ffold-simple-inlines according to the value of
>>>>> flag_no_inline at startup.  IIUC this means that if the flag has been
>>>>> defaulted to set, then e.g. an optimize("O0") function attribute won't
>>>>> disable -ffold-simple-inlines for that function, since we only compute
>>>>> its default value once.
>>>>>
>>>>> I wonder if we therefore instead want to handle defaulting the flag
>>>>> when it's used, e.g. check
>>>>>
>>>>>      (flag_fold_simple_inlines == -1
>>>>>       ? !flag_no_inline
>>>>>       : flag_fold_simple_inlines)
>>>>>
>>>>> instead of
>>>>>
>>>>>      flag_fold_simple_inlines
>>>>>
>>>>> in cp_fold?
>>>>
>>>> I guess that makes sense, we can't add front-end options to the
>>>> default_options_table.  But I think let's use OPTION_SET_P instead of
>>>> checking for -1.
>>>
>>> Done.
>>>
>>>>
>>>>> -- >8 --
>>>>>
>>>>> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]
>>>>>
>>>>> A well-formed call to std::move/forward is equivalent to a cast, but the
>>>>> former being a function call means the compiler generates debug info for
>>>>> it, which persists even after the call has been inlined away, for an
>>>>> operation that's never interesting to debug.
>>>>>
>>>>> This patch addresses this problem by folding calls to std::move/forward
>>>>> and other cast-like functions into simple casts as part of the frontend's
>>>>> general expression folding routine.  This behavior is controlled by a
>>>>> new flag -ffold-simple-inlines, which is enabled by default unless
>>>>> -fno-inline is active.
>>>>>
>>>>> After this patch with -O2 and a non-checking compiler, debug info size
>>>>> for some testcases (e.g. from range-v3 and cmcstl2) decreases by about
>>>>> 10% and overall compile time and memory usage decrease by ~2%.
>>>>
>>>> Did you compare the reduction after handling more functions?
>>>
>>> The numbers are roughly the same, which I guess is not too surprising
>>> since calls to std::move/forward outnumber the other functions by about
>>> 10:1 in libstdc++, range-v3 and cmcstl2.
>>>
>>> The biggest reduction in debug object file size (measured by du) I've
>>> observed is 14% with range-v3's test/algorithm/stable_partition.cpp.
>>> The biggest reduction in peak memory usage (measured by /usr/bin/time -v)
>>> is 5% with cmcstl2's test/algorithm/set_symmetric_difference4.cpp.  The
>>> biggest reduction in compile time (measured by perf stat) is about 3%,
>>> also from that testcase.
>>>
>>> -- >8 --
>>>
>>> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]
>>>
>>> A well-formed call to std::move/forward is equivalent to a cast, but the
>>> former being a function call means the compiler generates debug info for
>>> it, which persists even after the call has been inlined away, for an
>>> operation that's never interesting to debug.
>>>
>>> This patch addresses this problem by folding calls to std::move/forward
>>> and other cast-like functions into simple casts as part of the frontend's
>>> general expression folding routine.  This behavior is controlled by a
>>> new flag -ffold-simple-inlines, and otherwise by -fno-inline, so that
>>> users can enable such folding even with -O0 (which implies -fno-inline).
>>>
>>> After this patch with -O2 and a non-checking compiler, debug info size
>>> for some testcases (e.g. from range-v3 and cmcstl2) decreases by about
>>> 10% and overall compile time and memory usage decrease by ~2%.
>>>
>>> 	PR c++/96780
>>>
>>> gcc/c-family/ChangeLog:
>>>
>>> 	* c-opts.cc (c_common_post_options): Handle defaulting of
>>> 	flag_fold_simple_inlines.
>>> 	* c.opt: Add -ffold-simple-inlines.
>>
>> Looks like you still need a doc/invoke.texi change for the new flag. The rest
>> of the patch looks good.
> 
> Like this perhaps?  I opted to document the current scope of the flag
> (which only cares about a fixed set of functions) as opposed to its
> future scope (folding all sufficiently simple inline functions).

OK.

> -- >8 --
> 
> Subject: [PATCH] c++: fold calls to std::move/forward [PR96780]
> 
> A well-formed call to std::move/forward is equivalent to a cast, but the
> former being a function call means the compiler generates debug info for
> it, which persists even after the call has been inlined away, for an
> operation that's never interesting to debug.
> 
> This patch addresses this problem by folding calls to std::move/forward
> and other cast-like functions into simple casts as part of the frontend's
> general expression folding routine.  This behavior is controlled by a
> new flag -ffold-simple-inlines, and otherwise by -fno-inline, so that
> users can enable this folding with -O0 (which implies -fno-inline).
> 
> After this patch with -O2 and a non-checking compiler, debug info size
> for some testcases from range-v3 and cmcstl2 decreases by as much as ~10%
> and overall compile time and memory usage decrease by ~2%.
> 
> 	PR c++/96780
> 
> gcc/c-family/ChangeLog:
> 
> 	* c.opt: Add -ffold-simple-inlines.
> 
> gcc/ChangeLog:
> 
> 	* doc/invoke.texi (C++ Dialect Options): Document
> 	-ffold-simple-inlines.
> 
> gcc/cp/ChangeLog:
> 
> 	* cp-gimplify.cc (cp_fold) <case CALL_EXPR>: Fold calls to
> 	std::move/forward and other cast-like functions into simple
> 	casts.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* g++.dg/opt/pr96780.C: New test.
> ---
>   gcc/c-family/c.opt                 |  4 ++++
>   gcc/cp/cp-gimplify.cc              | 38 +++++++++++++++++++++++++++++-
>   gcc/doc/invoke.texi                | 10 ++++++++
>   gcc/testsuite/g++.dg/opt/pr96780.C | 38 ++++++++++++++++++++++++++++++
>   4 files changed, 89 insertions(+), 1 deletion(-)
>   create mode 100644 gcc/testsuite/g++.dg/opt/pr96780.C
> 
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 9cfd2a6bc4e..9a4828ebe37 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -1731,6 +1731,10 @@ Support dynamic initialization of thread-local variables in a different translat
>   fexternal-templates
>   C++ ObjC++ WarnRemoved
>   
> +ffold-simple-inlines
> +C++ ObjC++ Optimization Var(flag_fold_simple_inlines)
> +Fold calls to simple inline functions.
> +
>   ffor-scope
>   C++ ObjC++ WarnRemoved
>   
> diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> index d7323fb5c09..e4c2644af15 100644
> --- a/gcc/cp/cp-gimplify.cc
> +++ b/gcc/cp/cp-gimplify.cc
> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
>   #include "file-prefix-map.h"
>   #include "cgraph.h"
>   #include "omp-general.h"
> +#include "opts.h"
>   
>   /* Forward declarations.  */
>   
> @@ -2756,9 +2757,44 @@ cp_fold (tree x)
>   
>       case CALL_EXPR:
>         {
> -	int sv = optimize, nw = sv;
>   	tree callee = get_callee_fndecl (x);
>   
> +	/* "Inline" calls to std::move/forward and other cast-like functions
> +	   by simply folding them into a corresponding cast to their return
> +	   type.  This is cheaper than relying on the middle end to do so, and
> +	   also means we avoid generating useless debug info for them at all.
> +
> +	   At this point the argument has already been converted into a
> +	   reference, so it suffices to use a NOP_EXPR to express the
> +	   cast.  */
> +	if ((OPTION_SET_P (flag_fold_simple_inlines)
> +	     ? flag_fold_simple_inlines
> +	     : !flag_no_inline)
> +	    && call_expr_nargs (x) == 1
> +	    && decl_in_std_namespace_p (callee)
> +	    && DECL_NAME (callee) != NULL_TREE
> +	    && (id_equal (DECL_NAME (callee), "move")
> +		|| id_equal (DECL_NAME (callee), "forward")
> +		|| id_equal (DECL_NAME (callee), "addressof")
> +		/* This addressof equivalent is used heavily in libstdc++.  */
> +		|| id_equal (DECL_NAME (callee), "__addressof")
> +		|| id_equal (DECL_NAME (callee), "as_const")))
> +	  {
> +	    r = CALL_EXPR_ARG (x, 0);
> +	    /* Check that the return and argument types are sane before
> +	       folding.  */
> +	    if (INDIRECT_TYPE_P (TREE_TYPE (x))
> +		&& INDIRECT_TYPE_P (TREE_TYPE (r)))
> +	      {
> +		if (!same_type_p (TREE_TYPE (x), TREE_TYPE (r)))
> +		  r = build_nop (TREE_TYPE (x), r);
> +		x = cp_fold (r);
> +		break;
> +	      }
> +	  }
> +
> +	int sv = optimize, nw = sv;
> +
>   	/* Some built-in function calls will be evaluated at compile-time in
>   	   fold ().  Set optimize to 1 when folding __builtin_constant_p inside
>   	   a constexpr function so that fold_builtin_1 doesn't fold it to 0.  */
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 2a14e1a9472..d65979bba3f 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -3124,6 +3124,16 @@ On targets that support symbol aliases, the default is
>   @option{-fextern-tls-init}.  On targets that do not support symbol
>   aliases, the default is @option{-fno-extern-tls-init}.
>   
> +@item -ffold-simple-inlines
> +@itemx -fno-fold-simple-inlines
> +@opindex ffold-simple-inlines
> +@opindex fno-fold-simple-inlines
> +Permit the C++ frontend to fold calls to @code{std::move}, @code{std::forward},
> +@code{std::addressof} and @code{std::as_const}.  In contrast to inlining, this
> +means no debug information will be generated for such calls.  Since these
> +functions are rarely interesting to debug, this flag is enabled by default
> +unless @option{-fno-inline} is active.
> +
>   @item -fno-gnu-keywords
>   @opindex fno-gnu-keywords
>   @opindex fgnu-keywords
> diff --git a/gcc/testsuite/g++.dg/opt/pr96780.C b/gcc/testsuite/g++.dg/opt/pr96780.C
> new file mode 100644
> index 00000000000..61e11855eeb
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/opt/pr96780.C
> @@ -0,0 +1,38 @@
> +// PR c++/96780
> +// Verify calls to std::move/forward are folded away by the frontend.
> +// { dg-do compile { target c++11 } }
> +// { dg-additional-options "-ffold-simple-inlines -fdump-tree-gimple" }
> +
> +#include <utility>
> +
> +struct A;
> +
> +extern A& a;
> +extern const A& ca;
> +
> +void f() {
> +  auto&& x1 = std::move(a);
> +  auto&& x2 = std::forward<A>(a);
> +  auto&& x3 = std::forward<A&>(a);
> +
> +  auto&& x4 = std::move(ca);
> +  auto&& x5 = std::forward<const A>(ca);
> +  auto&& x6 = std::forward<const A&>(ca);
> +
> +  auto x7 = std::addressof(a);
> +  auto x8 = std::addressof(ca);
> +#if __GLIBCXX__
> +  auto x9 = std::__addressof(a);
> +  auto x10 = std::__addressof(ca);
> +#endif
> +#if __cpp_lib_as_const
> +  auto&& x11 = std::as_const(a);
> +  auto&& x12 = std::as_const(ca);
> +#endif
> +}
> +
> +// { dg-final { scan-tree-dump-not "= std::move" "gimple" } }
> +// { dg-final { scan-tree-dump-not "= std::forward" "gimple" } }
> +// { dg-final { scan-tree-dump-not "= std::addressof" "gimple" } }
> +// { dg-final { scan-tree-dump-not "= std::__addressof" "gimple" } }
> +// { dg-final { scan-tree-dump-not "= std::as_const" "gimple" } }


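The defaulting condition used in cp_fold can be modelled outside the compiler as follows.  This is a rough sketch under stated assumptions: should_fold and its three bool parameters are hypothetical stand-ins for OPTION_SET_P (flag_fold_simple_inlines), flag_fold_simple_inlines and flag_no_inline, and no GCC internals are used.

```cpp
// Hypothetical model of the condition in the patch:
//   OPTION_SET_P (flag_fold_simple_inlines)
//     ? flag_fold_simple_inlines
//     : !flag_no_inline
//
// explicitly_set       stands in for OPTION_SET_P (flag_fold_simple_inlines)
// fold_simple_inlines  stands in for flag_fold_simple_inlines
// no_inline            stands in for flag_no_inline
bool should_fold(bool explicitly_set, bool fold_simple_inlines, bool no_inline) {
  // An explicit -ffold-simple-inlines / -fno-fold-simple-inlines always wins;
  // otherwise folding follows whether inlining is enabled at the point of use.
  return explicitly_set ? fold_simple_inlines : !no_inline;
}
```

Because the default is evaluated at the point of use rather than once at startup, a per-function change to the inlining setting is reflected in the result, which is the concern raised earlier in the thread.  An explicit -ffold-simple-inlines at -O0 (which implies -fno-inline) corresponds to should_fold(true, true, true).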
Thread overview: 11+ messages
2022-03-01 22:08 Patrick Palka
2022-03-10  4:09 ` Jason Merrill
2022-03-10 15:27   ` Patrick Palka
2022-03-10 15:32     ` Jonathan Wakely
2022-03-12  1:31     ` Jason Merrill
2022-03-14 17:13       ` Patrick Palka
2022-03-14 22:20         ` Jason Merrill
2022-03-15 14:03           ` Patrick Palka
2022-03-15 15:38             ` Jason Merrill
2022-03-15 17:09               ` Patrick Palka
2022-03-15 21:03                 ` Jason Merrill [this message]
