public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
@ 2022-06-14  7:57 Kewen.Lin
  2022-06-14 11:37 ` Jan Hubicka
  2022-06-14 12:53 ` Segher Boessenkool
  0 siblings, 2 replies; 9+ messages in thread
From: Kewen.Lin @ 2022-06-14  7:57 UTC (permalink / raw)
  To: GCC Patches
  Cc: Jan Hubicka, Richard Biener, Richard Sandiford, Segher Boessenkool

Hi,

Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
if func->decl is not null but no cgraph node is available for it.
As PR105818 shows, this could give unexpected result.  For the
case in PR105818, when parsing bar decl in function foo, the cfun
is a function structure for foo, for which there is none cgraph
node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
the context is to optimize for size, the flag optimize_size is
true.

The patch is to make optimize_function_for_size_p to check
optimize_size as what it does when func->decl is unavailable.

One regression failure got exposed on aarch64-linux-gnu:

PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i

The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
used in function fold_range_test during c parsing, it uses
optimize_function_for_speed_p which is equal to the invertion
of optimize_function_for_size_p.  At that time cfun->decl is valid
but no cgraph node for it, w/o this patch function
optimize_function_for_speed_p returns true eventually, while it
returns false with this patch.  Since the command line option -Os
is specified, there is no reason to interpret it as "for speed".
I think this failure is expected and adjust the test case
accordingly.

Is it ok for trunk?

BR,
Kewen
-----

	PR target/105818

gcc/ChangeLog:

	* predict.cc (optimize_function_for_size_p): Check optimize_size when
	func->decl is valid but its cgraph node is unavailable.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr105818.c: New test.
	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
---
 gcc/predict.cc                              | 2 +-
 gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
 gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
 3 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c

diff --git a/gcc/predict.cc b/gcc/predict.cc
index 5734e4c8516..6c60a973236 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
   cgraph_node *n = cgraph_node::get (fun->decl);
   if (n)
     return n->optimize_for_size_p ();
-  return OPTIMIZE_SIZE_NO;
+  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
 }

 /* Return true if function FUN should always be optimized for speed.  */
diff --git a/gcc/testsuite/gcc.dg/guality/pr54693-2.c b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
index 68aa6c63d71..14ca94ad37d 100644
--- a/gcc/testsuite/gcc.dg/guality/pr54693-2.c
+++ b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
@@ -17,7 +17,7 @@ foo (int x, int y, int z)
   int i = 0;
   while (x > 3 && y > 3 && z > 3)
     {		/* { dg-final { gdb-test .+2 "i" "v + 1" } } */
-		/* { dg-final { gdb-test .+1 "x" "10 - i" } } */
+		/* { dg-final { gdb-test .+1 "x" "10 - i" { xfail { aarch64*-*-* && { any-opts "-Os" } } } } } */
       bar (i);	/* { dg-final { gdb-test . "y" "20 - 2 * i" } } */
 		/* { dg-final { gdb-test .-1 "z" "30 - 3 * i" { xfail { aarch64*-*-* && { any-opts "-fno-fat-lto-objects" "-Os" } } } } } */
       i++, x--, y -= 2, z -= 3;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr105818.c b/gcc/testsuite/gcc.target/powerpc/pr105818.c
new file mode 100644
index 00000000000..18781edbf9e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr105818.c
@@ -0,0 +1,9 @@
+/* { dg-options "-Os -fno-tree-vectorize" } */
+
+#pragma GCC optimize "-fno-tree-vectorize"
+
+void
+foo (void)
+{
+  void bar (void);
+}
--
2.27.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
  2022-06-14  7:57 [PATCH] predict: Adjust optimize_function_for_size_p [PR105818] Kewen.Lin
@ 2022-06-14 11:37 ` Jan Hubicka
  2022-06-15  6:20   ` Kewen.Lin
  2022-06-14 12:53 ` Segher Boessenkool
  1 sibling, 1 reply; 9+ messages in thread
From: Jan Hubicka @ 2022-06-14 11:37 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: GCC Patches, Richard Sandiford, Segher Boessenkool

> Hi,
> 
> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
> if func->decl is not null but no cgraph node is available for it.
> As PR105818 shows, this could give unexpected result.  For the
> case in PR105818, when parsing bar decl in function foo, the cfun
> is a function structure for foo, for which there is none cgraph
> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
> the context is to optimize for size, the flag optimize_size is
> true.
> 
> The patch is to make optimize_function_for_size_p to check
> optimize_size as what it does when func->decl is unavailable.
> 
> One regression failure got exposed on aarch64-linux-gnu:
> 
> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
> 	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
> 
> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
> used in function fold_range_test during c parsing, it uses
> optimize_function_for_speed_p which is equal to the invertion
> of optimize_function_for_size_p.  At that time cfun->decl is valid
> but no cgraph node for it, w/o this patch function
> optimize_function_for_speed_p returns true eventually, while it
> returns false with this patch.  Since the command line option -Os
> is specified, there is no reason to interpret it as "for speed".
> I think this failure is expected and adjust the test case
> accordingly.
> 
> Is it ok for trunk?
> 
> BR,
> Kewen
> -----
> 
> 	PR target/105818
> 
> gcc/ChangeLog:
> 
> 	* predict.cc (optimize_function_for_size_p): Check optimize_size when
> 	func->decl is valid but its cgraph node is unavailable.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/powerpc/pr105818.c: New test.
> 	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
> ---
>  gcc/predict.cc                              | 2 +-
>  gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
>  3 files changed, 11 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
> 
> diff --git a/gcc/predict.cc b/gcc/predict.cc
> index 5734e4c8516..6c60a973236 100644
> --- a/gcc/predict.cc
> +++ b/gcc/predict.cc
> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>    cgraph_node *n = cgraph_node::get (fun->decl);
>    if (n)
>      return n->optimize_for_size_p ();
> -  return OPTIMIZE_SIZE_NO;
> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;

We could also do (opt_for_fn (cfun->decl, optimize_size) that is
probably better since one can change optimize_size with optimization
attribute.
However I think in most cases we check for optimize_size early I think
we are doing something wrong, since at that time htere is no profile
available.  Why exactly PR105818 hits the flag change issue?

Honza
>  }
> 
>  /* Return true if function FUN should always be optimized for speed.  */
> diff --git a/gcc/testsuite/gcc.dg/guality/pr54693-2.c b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
> index 68aa6c63d71..14ca94ad37d 100644
> --- a/gcc/testsuite/gcc.dg/guality/pr54693-2.c
> +++ b/gcc/testsuite/gcc.dg/guality/pr54693-2.c
> @@ -17,7 +17,7 @@ foo (int x, int y, int z)
>    int i = 0;
>    while (x > 3 && y > 3 && z > 3)
>      {		/* { dg-final { gdb-test .+2 "i" "v + 1" } } */
> -		/* { dg-final { gdb-test .+1 "x" "10 - i" } } */
> +		/* { dg-final { gdb-test .+1 "x" "10 - i" { xfail { aarch64*-*-* && { any-opts "-Os" } } } } } */
>        bar (i);	/* { dg-final { gdb-test . "y" "20 - 2 * i" } } */
>  		/* { dg-final { gdb-test .-1 "z" "30 - 3 * i" { xfail { aarch64*-*-* && { any-opts "-fno-fat-lto-objects" "-Os" } } } } } */
>        i++, x--, y -= 2, z -= 3;
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr105818.c b/gcc/testsuite/gcc.target/powerpc/pr105818.c
> new file mode 100644
> index 00000000000..18781edbf9e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr105818.c
> @@ -0,0 +1,9 @@
> +/* { dg-options "-Os -fno-tree-vectorize" } */
> +
> +#pragma GCC optimize "-fno-tree-vectorize"
> +
> +void
> +foo (void)
> +{
> +  void bar (void);
> +}
> --
> 2.27.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
  2022-06-14  7:57 [PATCH] predict: Adjust optimize_function_for_size_p [PR105818] Kewen.Lin
  2022-06-14 11:37 ` Jan Hubicka
@ 2022-06-14 12:53 ` Segher Boessenkool
  2022-06-15  6:21   ` Kewen.Lin
  1 sibling, 1 reply; 9+ messages in thread
From: Segher Boessenkool @ 2022-06-14 12:53 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: GCC Patches, Jan Hubicka, Richard Biener, Richard Sandiford

On Tue, Jun 14, 2022 at 03:57:08PM +0800, Kewen.Lin wrote:
> 	PR target/105818

Change this to something else?  It is not a target bug.  "middle-end"
perhaps?

Thanks for fixing this,


Segher

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
  2022-06-14 11:37 ` Jan Hubicka
@ 2022-06-15  6:20   ` Kewen.Lin
  2022-07-11  3:42     ` Kewen.Lin
  0 siblings, 1 reply; 9+ messages in thread
From: Kewen.Lin @ 2022-06-15  6:20 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: GCC Patches, Richard Sandiford, Segher Boessenkool

Hi Honza,

Thanks for the comments!  Some replies are inlined below.

on 2022/6/14 19:37, Jan Hubicka wrote:
>> Hi,
>>
>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
>> if func->decl is not null but no cgraph node is available for it.
>> As PR105818 shows, this could give unexpected result.  For the
>> case in PR105818, when parsing bar decl in function foo, the cfun
>> is a function structure for foo, for which there is none cgraph
>> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
>> the context is to optimize for size, the flag optimize_size is
>> true.
>>
>> The patch is to make optimize_function_for_size_p to check
>> optimize_size as what it does when func->decl is unavailable.
>>
>> One regression failure got exposed on aarch64-linux-gnu:
>>
>> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
>> 	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
>>
>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
>> used in function fold_range_test during c parsing, it uses
>> optimize_function_for_speed_p which is equal to the invertion
>> of optimize_function_for_size_p.  At that time cfun->decl is valid
>> but no cgraph node for it, w/o this patch function
>> optimize_function_for_speed_p returns true eventually, while it
>> returns false with this patch.  Since the command line option -Os
>> is specified, there is no reason to interpret it as "for speed".
>> I think this failure is expected and adjust the test case
>> accordingly.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -----
>>
>> 	PR target/105818
>>
>> gcc/ChangeLog:
>>
>> 	* predict.cc (optimize_function_for_size_p): Check optimize_size when
>> 	func->decl is valid but its cgraph node is unavailable.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.target/powerpc/pr105818.c: New test.
>> 	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
>> ---
>>  gcc/predict.cc                              | 2 +-
>>  gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
>>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
>>
>> diff --git a/gcc/predict.cc b/gcc/predict.cc
>> index 5734e4c8516..6c60a973236 100644
>> --- a/gcc/predict.cc
>> +++ b/gcc/predict.cc
>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>>    cgraph_node *n = cgraph_node::get (fun->decl);
>>    if (n)
>>      return n->optimize_for_size_p ();
>> -  return OPTIMIZE_SIZE_NO;
>> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
> 
> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
> probably better since one can change optimize_size with optimization
> attribute.

Good point, agree!

> However I think in most cases we check for optimize_size early I think
> we are doing something wrong, since at that time htere is no profile
> available.  Why exactly PR105818 hits the flag change issue?

For PR105818, the reason why the flag changs is that:

Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit
of rs6000_isa_flags_explicit, it's set as below:

/* If we can shrink-wrap the TOC register save separately, then use
   -msave-toc-indirect unless explicitly disabled.  */
if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
    && flag_shrink_wrap_separate
    && optimize_function_for_speed_p (cfun))
  rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;

Initially, rs6000 initialize target_option_default_node with
OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL
and optimize_size is true.

Later, when c parser handling function foo, it builds target
option node as target_option_default_node in function
handle_optimize_attribute, it does global option saving and
verifying there as well, at that time the cfun is NULL, no
issue is found.  And function store_parm_decls allocates
struct_function for foo then, cfun becomes function struct
for foo, when c parser continues to handle the decl bar in
foo, function handle_optimize_attribute works as before,
tries to restore the target options at the end, it calls
targetm.target_option.restore (rs6000_function_specific_restore)
which calls function rs6000_option_override_internal again,
at this time the cfun is not NULL while there is no cgraph
node for its decl, optimize_function_for_speed_p returns true
and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag
rs6000_isa_flags set unexpectedly.  It becomes inconsistent
as the one saved previously.

IMHO, both contexts of global and function decl foo here hold
optimize_size, function optimize_function_for_speed_p should
not return true anyway.

btw, the aarch64 failed case also gets the unexpected
result for optimize_function_for_speed_p during c parsing
(fold_range_test <- ... <- c_parser_condition).

IIUC, in parsing time we don't have the profile information
available.

BR,
Kewen

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
  2022-06-14 12:53 ` Segher Boessenkool
@ 2022-06-15  6:21   ` Kewen.Lin
  0 siblings, 0 replies; 9+ messages in thread
From: Kewen.Lin @ 2022-06-15  6:21 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: GCC Patches, Jan Hubicka, Richard Biener, Richard Sandiford

on 2022/6/14 20:53, Segher Boessenkool wrote:
> On Tue, Jun 14, 2022 at 03:57:08PM +0800, Kewen.Lin wrote:
>> 	PR target/105818
> 
> Change this to something else?  It is not a target bug.  "middle-end"
> perhaps?
> 

Good catch, will fix this.  Thanks Segher!

BR,
Kewen

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
  2022-06-15  6:20   ` Kewen.Lin
@ 2022-07-11  3:42     ` Kewen.Lin
  2022-08-15  8:33       ` Kewen.Lin
  0 siblings, 1 reply; 9+ messages in thread
From: Kewen.Lin @ 2022-07-11  3:42 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Richard Biener

on 2022/6/15 14:20, Kewen.Lin wrote:
> Hi Honza,
> 
> Thanks for the comments!  Some replies are inlined below.
> 
> on 2022/6/14 19:37, Jan Hubicka wrote:
>>> Hi,
>>>
>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
>>> if func->decl is not null but no cgraph node is available for it.
>>> As PR105818 shows, this could give unexpected result.  For the
>>> case in PR105818, when parsing bar decl in function foo, the cfun
>>> is a function structure for foo, for which there is none cgraph
>>> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
>>> the context is to optimize for size, the flag optimize_size is
>>> true.
>>>
>>> The patch is to make optimize_function_for_size_p to check
>>> optimize_size as what it does when func->decl is unavailable.
>>>
>>> One regression failure got exposed on aarch64-linux-gnu:
>>>
>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
>>> 	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
>>>
>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
>>> used in function fold_range_test during c parsing, it uses
>>> optimize_function_for_speed_p which is equal to the invertion
>>> of optimize_function_for_size_p.  At that time cfun->decl is valid
>>> but no cgraph node for it, w/o this patch function
>>> optimize_function_for_speed_p returns true eventually, while it
>>> returns false with this patch.  Since the command line option -Os
>>> is specified, there is no reason to interpret it as "for speed".
>>> I think this failure is expected and adjust the test case
>>> accordingly.
>>>
>>> Is it ok for trunk?
>>>
>>> BR,
>>> Kewen
>>> -----
>>>
>>> 	PR target/105818
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* predict.cc (optimize_function_for_size_p): Check optimize_size when
>>> 	func->decl is valid but its cgraph node is unavailable.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* gcc.target/powerpc/pr105818.c: New test.
>>> 	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
>>> ---
>>>  gcc/predict.cc                              | 2 +-
>>>  gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
>>>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
>>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
>>>
>>> diff --git a/gcc/predict.cc b/gcc/predict.cc
>>> index 5734e4c8516..6c60a973236 100644
>>> --- a/gcc/predict.cc
>>> +++ b/gcc/predict.cc
>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>>>    cgraph_node *n = cgraph_node::get (fun->decl);
>>>    if (n)
>>>      return n->optimize_for_size_p ();
>>> -  return OPTIMIZE_SIZE_NO;
>>> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
>>
>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
>> probably better since one can change optimize_size with optimization
>> attribute.
> 
> Good point, agree!
> 
>> However I think in most cases we check for optimize_size early I think
>> we are doing something wrong, since at that time htere is no profile
>> available.  Why exactly PR105818 hits the flag change issue?
> 
> For PR105818, the reason why the flag changs is that:
> 
> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit
> of rs6000_isa_flags_explicit, it's set as below:
> 
> /* If we can shrink-wrap the TOC register save separately, then use
>    -msave-toc-indirect unless explicitly disabled.  */
> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
>     && flag_shrink_wrap_separate
>     && optimize_function_for_speed_p (cfun))
>   rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
> 
> Initially, rs6000 initialize target_option_default_node with
> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL
> and optimize_size is true.
> 
> Later, when c parser handling function foo, it builds target
> option node as target_option_default_node in function
> handle_optimize_attribute, it does global option saving and
> verifying there as well, at that time the cfun is NULL, no
> issue is found.  And function store_parm_decls allocates
> struct_function for foo then, cfun becomes function struct
> for foo, when c parser continues to handle the decl bar in
> foo, function handle_optimize_attribute works as before,
> tries to restore the target options at the end, it calls
> targetm.target_option.restore (rs6000_function_specific_restore)
> which calls function rs6000_option_override_internal again,
> at this time the cfun is not NULL while there is no cgraph
> node for its decl, optimize_function_for_speed_p returns true
> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag
> rs6000_isa_flags set unexpectedly.  It becomes inconsistent
> as the one saved previously.
> 
> IMHO, both contexts of global and function decl foo here hold
> optimize_size, function optimize_function_for_speed_p should
> not return true anyway.
> 
> btw, the aarch64 failed case also gets the unexpected
> result for optimize_function_for_speed_p during c parsing
> (fold_range_test <- ... <- c_parser_condition).
> 
> IIUC, in parsing time we don't have the profile information
> available.
> 

Hi Honza,

Does the above explanation sound reasonable to you?

BR,
Kewen

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
  2022-07-11  3:42     ` Kewen.Lin
@ 2022-08-15  8:33       ` Kewen.Lin
  2022-08-29  6:35         ` Kewen.Lin
  0 siblings, 1 reply; 9+ messages in thread
From: Kewen.Lin @ 2022-08-15  8:33 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: GCC Patches, Richard Sandiford, Segher Boessenkool, Richard Biener

on 2022/7/11 11:42, Kewen.Lin wrote:
> on 2022/6/15 14:20, Kewen.Lin wrote:
>> Hi Honza,
>>
>> Thanks for the comments!  Some replies are inlined below.
>>
>> on 2022/6/14 19:37, Jan Hubicka wrote:
>>>> Hi,
>>>>
>>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
>>>> if func->decl is not null but no cgraph node is available for it.
>>>> As PR105818 shows, this could give unexpected result.  For the
>>>> case in PR105818, when parsing bar decl in function foo, the cfun
>>>> is a function structure for foo, for which there is none cgraph
>>>> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
>>>> the context is to optimize for size, the flag optimize_size is
>>>> true.
>>>>
>>>> The patch is to make optimize_function_for_size_p to check
>>>> optimize_size as what it does when func->decl is unavailable.
>>>>
>>>> One regression failure got exposed on aarch64-linux-gnu:
>>>>
>>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
>>>> 	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
>>>>
>>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
>>>> used in function fold_range_test during c parsing, it uses
>>>> optimize_function_for_speed_p which is equal to the invertion
>>>> of optimize_function_for_size_p.  At that time cfun->decl is valid
>>>> but no cgraph node for it, w/o this patch function
>>>> optimize_function_for_speed_p returns true eventually, while it
>>>> returns false with this patch.  Since the command line option -Os
>>>> is specified, there is no reason to interpret it as "for speed".
>>>> I think this failure is expected and adjust the test case
>>>> accordingly.
>>>>
>>>> Is it ok for trunk?
>>>>
>>>> BR,
>>>> Kewen
>>>> -----
>>>>
>>>> 	PR target/105818
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 	* predict.cc (optimize_function_for_size_p): Check optimize_size when
>>>> 	func->decl is valid but its cgraph node is unavailable.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 	* gcc.target/powerpc/pr105818.c: New test.
>>>> 	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
>>>> ---
>>>>  gcc/predict.cc                              | 2 +-
>>>>  gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
>>>>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
>>>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
>>>>
>>>> diff --git a/gcc/predict.cc b/gcc/predict.cc
>>>> index 5734e4c8516..6c60a973236 100644
>>>> --- a/gcc/predict.cc
>>>> +++ b/gcc/predict.cc
>>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>>>>    cgraph_node *n = cgraph_node::get (fun->decl);
>>>>    if (n)
>>>>      return n->optimize_for_size_p ();
>>>> -  return OPTIMIZE_SIZE_NO;
>>>> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
>>>
>>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
>>> probably better since one can change optimize_size with optimization
>>> attribute.
>>
>> Good point, agree!
>>
>>> However I think in most cases we check for optimize_size early I think
>>> we are doing something wrong, since at that time htere is no profile
>>> available.  Why exactly PR105818 hits the flag change issue?
>>
>> For PR105818, the reason why the flag changs is that:
>>
>> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit
>> of rs6000_isa_flags_explicit, it's set as below:
>>
>> /* If we can shrink-wrap the TOC register save separately, then use
>>    -msave-toc-indirect unless explicitly disabled.  */
>> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
>>     && flag_shrink_wrap_separate
>>     && optimize_function_for_speed_p (cfun))
>>   rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>>
>> Initially, rs6000 initialize target_option_default_node with
>> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL
>> and optimize_size is true.
>>
>> Later, when c parser handling function foo, it builds target
>> option node as target_option_default_node in function
>> handle_optimize_attribute, it does global option saving and
>> verifying there as well, at that time the cfun is NULL, no
>> issue is found.  And function store_parm_decls allocates
>> struct_function for foo then, cfun becomes function struct
>> for foo, when c parser continues to handle the decl bar in
>> foo, function handle_optimize_attribute works as before,
>> tries to restore the target options at the end, it calls
>> targetm.target_option.restore (rs6000_function_specific_restore)
>> which calls function rs6000_option_override_internal again,
>> at this time the cfun is not NULL while there is no cgraph
>> node for its decl, optimize_function_for_speed_p returns true
>> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag
>> rs6000_isa_flags set unexpectedly.  It becomes inconsistent
>> as the one saved previously.
>>
>> IMHO, both contexts of global and function decl foo here hold
>> optimize_size, function optimize_function_for_speed_p should
>> not return true anyway.
>>
>> btw, the aarch64 failed case also gets the unexpected
>> result for optimize_function_for_speed_p during c parsing
>> (fold_range_test <- ... <- c_parser_condition).
>>
>> IIUC, in parsing time we don't have the profile information
>> available.
>>
> 
> Hi Honza,
> 
> Does the above explanation sound reasonable to you?
> 

Hi Honza,

Gentle ping ...

BR,
Kewen

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
  2022-08-15  8:33       ` Kewen.Lin
@ 2022-08-29  6:35         ` Kewen.Lin
  2022-09-28  5:46           ` Kewen.Lin
  0 siblings, 1 reply; 9+ messages in thread
From: Kewen.Lin @ 2022-08-29  6:35 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Richard Sandiford, GCC Patches, Segher Boessenkool, Jan Hubicka,
	Richard Biener

on 2022/8/15 16:33, Kewen.Lin via Gcc-patches wrote:
> on 2022/7/11 11:42, Kewen.Lin wrote:
>> on 2022/6/15 14:20, Kewen.Lin wrote:
>>> Hi Honza,
>>>
>>> Thanks for the comments!  Some replies are inlined below.
>>>
>>> on 2022/6/14 19:37, Jan Hubicka wrote:
>>>>> Hi,
>>>>>
>>>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
>>>>> if func->decl is not null but no cgraph node is available for it.
>>>>> As PR105818 shows, this could give unexpected result.  For the
>>>>> case in PR105818, when parsing bar decl in function foo, the cfun
>>>>> is a function structure for foo, for which there is none cgraph
>>>>> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
>>>>> the context is to optimize for size, the flag optimize_size is
>>>>> true.
>>>>>
>>>>> The patch is to make optimize_function_for_size_p to check
>>>>> optimize_size as what it does when func->decl is unavailable.
>>>>>
>>>>> One regression failure got exposed on aarch64-linux-gnu:
>>>>>
>>>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
>>>>> 	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
>>>>>
>>>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
>>>>> used in function fold_range_test during c parsing, it uses
>>>>> optimize_function_for_speed_p which is equal to the invertion
>>>>> of optimize_function_for_size_p.  At that time cfun->decl is valid
>>>>> but no cgraph node for it, w/o this patch function
>>>>> optimize_function_for_speed_p returns true eventually, while it
>>>>> returns false with this patch.  Since the command line option -Os
>>>>> is specified, there is no reason to interpret it as "for speed".
>>>>> I think this failure is expected and adjust the test case
>>>>> accordingly.
>>>>>
>>>>> Is it ok for trunk?
>>>>>
>>>>> BR,
>>>>> Kewen
>>>>> -----
>>>>>
>>>>> 	PR target/105818
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 	* predict.cc (optimize_function_for_size_p): Check optimize_size when
>>>>> 	func->decl is valid but its cgraph node is unavailable.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> 	* gcc.target/powerpc/pr105818.c: New test.
>>>>> 	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
>>>>> ---
>>>>>  gcc/predict.cc                              | 2 +-
>>>>>  gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
>>>>>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
>>>>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
>>>>>
>>>>> diff --git a/gcc/predict.cc b/gcc/predict.cc
>>>>> index 5734e4c8516..6c60a973236 100644
>>>>> --- a/gcc/predict.cc
>>>>> +++ b/gcc/predict.cc
>>>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>>>>>    cgraph_node *n = cgraph_node::get (fun->decl);
>>>>>    if (n)
>>>>>      return n->optimize_for_size_p ();
>>>>> -  return OPTIMIZE_SIZE_NO;
>>>>> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
>>>>
>>>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
>>>> probably better since one can change optimize_size with optimization
>>>> attribute.
>>>
>>> Good point, agree!
>>>
>>>> However I think in most cases we check for optimize_size early I think
>>>> we are doing something wrong, since at that time htere is no profile
>>>> available.  Why exactly PR105818 hits the flag change issue?
>>>
>>> For PR105818, the reason why the flag changs is that:
>>>
>>> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit
>>> of rs6000_isa_flags_explicit, it's set as below:
>>>
>>> /* If we can shrink-wrap the TOC register save separately, then use
>>>    -msave-toc-indirect unless explicitly disabled.  */
>>> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
>>>     && flag_shrink_wrap_separate
>>>     && optimize_function_for_speed_p (cfun))
>>>   rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>>>
>>> Initially, rs6000 initialize target_option_default_node with
>>> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL
>>> and optimize_size is true.
>>>
>>> Later, when c parser handling function foo, it builds target
>>> option node as target_option_default_node in function
>>> handle_optimize_attribute, it does global option saving and
>>> verifying there as well, at that time the cfun is NULL, no
>>> issue is found.  And function store_parm_decls allocates
>>> struct_function for foo then, cfun becomes function struct
>>> for foo, when c parser continues to handle the decl bar in
>>> foo, function handle_optimize_attribute works as before,
>>> tries to restore the target options at the end, it calls
>>> targetm.target_option.restore (rs6000_function_specific_restore)
>>> which calls function rs6000_option_override_internal again,
>>> at this time the cfun is not NULL while there is no cgraph
>>> node for its decl, optimize_function_for_speed_p returns true
>>> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag
>>> rs6000_isa_flags set unexpectedly.  It becomes inconsistent
>>> as the one saved previously.
>>>
>>> IMHO, both contexts of global and function decl foo here hold
>>> optimize_size, function optimize_function_for_speed_p should
>>> not return true anyway.
>>>
>>> btw, the aarch64 failed case also gets the unexpected
>>> result for optimize_function_for_speed_p during c parsing
>>> (fold_range_test <- ... <- c_parser_condition).
>>>
>>> IIUC, in parsing time we don't have the profile information
>>> available.
>>>
>>
>> Hi Honza,
>>
>> Does the above explanation sound reasonable to you?
>>
> 

Hi Honza,

Gentle ping again ...

BR,
Kewen

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] predict: Adjust optimize_function_for_size_p [PR105818]
  2022-08-29  6:35         ` Kewen.Lin
@ 2022-09-28  5:46           ` Kewen.Lin
  0 siblings, 0 replies; 9+ messages in thread
From: Kewen.Lin @ 2022-09-28  5:46 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Richard Sandiford, GCC Patches, Jan Hubicka, Segher Boessenkool,
	Richard Biener

on 2022/8/29 14:35, Kewen.Lin via Gcc-patches wrote:
> on 2022/8/15 16:33, Kewen.Lin via Gcc-patches wrote:
>> on 2022/7/11 11:42, Kewen.Lin wrote:
>>> on 2022/6/15 14:20, Kewen.Lin wrote:
>>>> Hi Honza,
>>>>
>>>> Thanks for the comments!  Some replies are inlined below.
>>>>
>>>> on 2022/6/14 19:37, Jan Hubicka wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Function optimize_function_for_size_p returns OPTIMIZE_SIZE_NO
>>>>>> if func->decl is not null but no cgraph node is available for it.
>>>>>> As PR105818 shows, this could give unexpected result.  For the
>>>>>> case in PR105818, when parsing bar decl in function foo, the cfun
>>>>>> is a function structure for foo, for which there is none cgraph
>>>>>> node, so it returns OPTIMIZE_SIZE_NO.  But it's incorrect since
>>>>>> the context is to optimize for size, the flag optimize_size is
>>>>>> true.
>>>>>>
>>>>>> The patch is to make optimize_function_for_size_p to check
>>>>>> optimize_size as what it does when func->decl is unavailable.
>>>>>>
>>>>>> One regression failure got exposed on aarch64-linux-gnu:
>>>>>>
>>>>>> PASS->FAIL: gcc.dg/guality/pr54693-2.c   -Os \
>>>>>> 	    -DPREVENT_OPTIMIZATION  line 21 x == 10 - i
>>>>>>
>>>>>> The difference comes from the macro LOGICAL_OP_NON_SHORT_CIRCUIT
>>>>>> used in function fold_range_test during c parsing, it uses
>>>>>> optimize_function_for_speed_p which is equal to the invertion
>>>>>> of optimize_function_for_size_p.  At that time cfun->decl is valid
>>>>>> but no cgraph node for it, w/o this patch function
>>>>>> optimize_function_for_speed_p returns true eventually, while it
>>>>>> returns false with this patch.  Since the command line option -Os
>>>>>> is specified, there is no reason to interpret it as "for speed".
>>>>>> I think this failure is expected and adjust the test case
>>>>>> accordingly.
>>>>>>
>>>>>> Is it ok for trunk?
>>>>>>
>>>>>> BR,
>>>>>> Kewen
>>>>>> -----
>>>>>>
>>>>>> 	PR target/105818
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> 	* predict.cc (optimize_function_for_size_p): Check optimize_size when
>>>>>> 	func->decl is valid but its cgraph node is unavailable.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>> 	* gcc.target/powerpc/pr105818.c: New test.
>>>>>> 	* gcc.dg/guality/pr54693-2.c: Adjust for aarch64.
>>>>>> ---
>>>>>>  gcc/predict.cc                              | 2 +-
>>>>>>  gcc/testsuite/gcc.dg/guality/pr54693-2.c    | 2 +-
>>>>>>  gcc/testsuite/gcc.target/powerpc/pr105818.c | 9 +++++++++
>>>>>>  3 files changed, 11 insertions(+), 2 deletions(-)
>>>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr105818.c
>>>>>>
>>>>>> diff --git a/gcc/predict.cc b/gcc/predict.cc
>>>>>> index 5734e4c8516..6c60a973236 100644
>>>>>> --- a/gcc/predict.cc
>>>>>> +++ b/gcc/predict.cc
>>>>>> @@ -268,7 +268,7 @@ optimize_function_for_size_p (struct function *fun)
>>>>>>    cgraph_node *n = cgraph_node::get (fun->decl);
>>>>>>    if (n)
>>>>>>      return n->optimize_for_size_p ();
>>>>>> -  return OPTIMIZE_SIZE_NO;
>>>>>> +  return optimize_size ? OPTIMIZE_SIZE_MAX : OPTIMIZE_SIZE_NO;
>>>>>
>>>>> We could also do (opt_for_fn (cfun->decl, optimize_size) that is
>>>>> probably better since one can change optimize_size with optimization
>>>>> attribute.
>>>>
>>>> Good point, agree!
>>>>
>>>>> However I think in most cases we check for optimize_size early I think
>>>>> we are doing something wrong, since at that time htere is no profile
>>>>> available.  Why exactly PR105818 hits the flag change issue?
>>>>
>>>> For PR105818, the reason why the flag changs is that:
>>>>
>>>> Firstly, the inconsistent flag is OPTION_MASK_SAVE_TOC_INDIRECT bit
>>>> of rs6000_isa_flags_explicit, it's set as below:
>>>>
>>>> /* If we can shrink-wrap the TOC register save separately, then use
>>>>    -msave-toc-indirect unless explicitly disabled.  */
>>>> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
>>>>     && flag_shrink_wrap_separate
>>>>     && optimize_function_for_speed_p (cfun))
>>>>   rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>>>>
>>>> Initially, rs6000 initialize target_option_default_node with
>>>> OPTION_MASK_SAVE_TOC_INDIRECT unset, at that time cfun is NULL
>>>> and optimize_size is true.
>>>>
>>>> Later, when c parser handling function foo, it builds target
>>>> option node as target_option_default_node in function
>>>> handle_optimize_attribute, it does global option saving and
>>>> verifying there as well, at that time the cfun is NULL, no
>>>> issue is found.  And function store_parm_decls allocates
>>>> struct_function for foo then, cfun becomes function struct
>>>> for foo, when c parser continues to handle the decl bar in
>>>> foo, function handle_optimize_attribute works as before,
>>>> tries to restore the target options at the end, it calls
>>>> targetm.target_option.restore (rs6000_function_specific_restore)
>>>> which calls function rs6000_option_override_internal again,
>>>> at this time the cfun is not NULL while there is no cgraph
>>>> node for its decl, optimize_function_for_speed_p returns true
>>>> and gets the OPTION_MASK_SAVE_TOC_INDIRECT bit of flag
>>>> rs6000_isa_flags set unexpectedly.  It becomes inconsistent
>>>> as the one saved previously.
>>>>
>>>> IMHO, both contexts of global and function decl foo here hold
>>>> optimize_size, function optimize_function_for_speed_p should
>>>> not return true anyway.
>>>>
>>>> btw, the aarch64 failed case also gets the unexpected
>>>> result for optimize_function_for_speed_p during c parsing
>>>> (fold_range_test <- ... <- c_parser_condition).
>>>>
>>>> IIUC, in parsing time we don't have the profile information
>>>> available.
>>>>
>>>
>>> Hi Honza,
>>>
>>> Does the above explanation sound reasonable to you?
>>>

Hi Honza,

Gentle ping^4 ...

BR,
Kewen

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-09-28  5:46 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-14  7:57 [PATCH] predict: Adjust optimize_function_for_size_p [PR105818] Kewen.Lin
2022-06-14 11:37 ` Jan Hubicka
2022-06-15  6:20   ` Kewen.Lin
2022-07-11  3:42     ` Kewen.Lin
2022-08-15  8:33       ` Kewen.Lin
2022-08-29  6:35         ` Kewen.Lin
2022-09-28  5:46           ` Kewen.Lin
2022-06-14 12:53 ` Segher Boessenkool
2022-06-15  6:21   ` Kewen.Lin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).