* [PATCH] rs6000: Don't use optimize_function_for_speed_p too early [PR108184]
@ 2023-01-04 9:20 Kewen.Lin
2023-01-04 10:46 ` Segher Boessenkool
0 siblings, 1 reply; 5+ messages in thread
From: Kewen.Lin @ 2023-01-04 9:20 UTC (permalink / raw)
To: GCC Patches; +Cc: Segher Boessenkool, David Edelsohn, Peter Bergner
Hi,
As Honza pointed out in [1], the current uses of function
optimize_function_for_speed_p in rs6000_option_override_internal
are too early, since the query results from the functions
optimize_function_for_{speed,size}_p could be changed later due
to profile feedback and some function attributes handlings etc.
This patch is to move optimize_function_for_speed_p to all the
use places of the corresponding flags, which follows the existing
practices. Maybe we can cache it somewhere at an appropriate
timing, but that's another thing.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607527.html
Bootstrapped and regtested on powerpc64-linux-gnu P8 and
powerpc64le-linux-gnu P9 and P10.
I'm going to push this soon if no objections.
BR,
Kewen
-----
PR target/108184
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Remove
all optimize_function_for_speed_p uses.
* config/rs6000/predicates.md (fusion_gpr_mem_load): Call
optimize_function_for_speed_p along with TARGET_P8_FUSION_SIGN.
(fusion_gpr_load_p): Likewise.
(expand_fusion_gpr_load): Likewise.
(rs6000_call_aix): Call optimize_function_for_speed_p along with
TARGET_SAVE_TOC_INDIRECT.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr108184.c: New test.
---
gcc/config/rs6000/predicates.md | 4 +++-
gcc/config/rs6000/rs6000.cc | 16 ++++++++++------
gcc/testsuite/gcc.target/powerpc/pr108184.c | 15 +++++++++++++++
3 files changed, 28 insertions(+), 7 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184.c
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index b1fcc69bb60..11f1779e7bf 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1878,7 +1878,9 @@ (define_predicate "fusion_gpr_mem_load"
/* Handle sign/zero extend. */
if (GET_CODE (op) == ZERO_EXTEND
- || (TARGET_P8_FUSION_SIGN && GET_CODE (op) == SIGN_EXTEND))
+ || (TARGET_P8_FUSION_SIGN
+ && GET_CODE (op) == SIGN_EXTEND
+ && optimize_function_for_speed_p (cfun)))
{
op = XEXP (op, 0);
mode = GET_MODE (op);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index eb7ad5e954f..bbf829f45d9 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3978,8 +3978,7 @@ rs6000_option_override_internal (bool global_init_p)
/* If we can shrink-wrap the TOC register save separately, then use
-msave-toc-indirect unless explicitly disabled. */
if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
- && flag_shrink_wrap_separate
- && optimize_function_for_speed_p (cfun))
+ && flag_shrink_wrap_separate)
rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
/* Enable power8 fusion if we are tuning for power8, even if we aren't
@@ -4013,7 +4012,6 @@ rs6000_option_override_internal (bool global_init_p)
zero extending load, and an explicit sign extension. */
if (TARGET_P8_FUSION
&& !(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN)
- && optimize_function_for_speed_p (cfun)
&& optimize >= 3)
rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN;
@@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
/* Can we optimize saving the TOC in the prologue or
do we need to do it at every call? */
- if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca)
+ if (TARGET_SAVE_TOC_INDIRECT
+ && !cfun->calls_alloca
+ && optimize_function_for_speed_p (cfun))
cfun->machine->save_toc_in_prologue = true;
else
{
@@ -27471,7 +27471,9 @@ fusion_gpr_load_p (rtx addis_reg, /* register set via addis. */
/* Allow sign/zero extension. */
if (GET_CODE (mem) == ZERO_EXTEND
- || (GET_CODE (mem) == SIGN_EXTEND && TARGET_P8_FUSION_SIGN))
+ || (GET_CODE (mem) == SIGN_EXTEND
+ && TARGET_P8_FUSION_SIGN
+ && optimize_function_for_speed_p (cfun)))
mem = XEXP (mem, 0);
if (!MEM_P (mem))
@@ -27535,7 +27537,9 @@ expand_fusion_gpr_load (rtx *operands)
enum rtx_code extend = UNKNOWN;
if (GET_CODE (orig_mem) == ZERO_EXTEND
- || (TARGET_P8_FUSION_SIGN && GET_CODE (orig_mem) == SIGN_EXTEND))
+ || (TARGET_P8_FUSION_SIGN
+ && GET_CODE (orig_mem) == SIGN_EXTEND
+ && optimize_function_for_speed_p (cfun)))
{
extend = GET_CODE (orig_mem);
orig_mem = XEXP (orig_mem, 0);
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108184.c b/gcc/testsuite/gcc.target/powerpc/pr108184.c
new file mode 100644
index 00000000000..8f1e91d9258
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108184.c
@@ -0,0 +1,15 @@
+/* Only possible to fuse sign extended loads with the addis when
+ optimize >= 3 and Power8 fusion takes effects. */
+/* { dg-options "-mdejagnu-tune=power8 -O3" } */
+
+/* Verify it doesn't optimize this function for speed as it's cold,
+ by checking it doesn't try to fuse sign extended loads with addis,
+ which results in a zero extended load and a sign extension. */
+
+__attribute__ ((cold)) int
+fusion_short (short *p)
+{
+ return p[0x12345];
+}
+
+/* { dg-final { scan-assembler-not {\mextsh\M} } } */
--
2.27.0
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] rs6000: Don't use optimize_function_for_speed_p too early [PR108184]
2023-01-04 9:20 [PATCH] rs6000: Don't use optimize_function_for_speed_p too early [PR108184] Kewen.Lin
@ 2023-01-04 10:46 ` Segher Boessenkool
2023-01-04 12:15 ` Kewen.Lin
0 siblings, 1 reply; 5+ messages in thread
From: Segher Boessenkool @ 2023-01-04 10:46 UTC (permalink / raw)
To: Kewen.Lin; +Cc: GCC Patches, David Edelsohn, Peter Bergner
On Wed, Jan 04, 2023 at 05:20:14PM +0800, Kewen.Lin wrote:
> As Honza pointed out in [1], the current uses of function
> optimize_function_for_speed_p in rs6000_option_override_internal
> are too early, since the query results from the functions
> optimize_function_for_{speed,size}_p could be changed later due
> to profile feedback and some function attributes handlings etc.
>
> This patch is to move optimize_function_for_speed_p to all the
> use places of the corresponding flags, which follows the existing
> practices. Maybe we can cache it somewhere at an appropriate
> timing, but that's another thing.
> @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
>
> /* Can we optimize saving the TOC in the prologue or
> do we need to do it at every call? */
> - if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca)
> + if (TARGET_SAVE_TOC_INDIRECT
> + && !cfun->calls_alloca
> + && optimize_function_for_speed_p (cfun))
> cfun->machine->save_toc_in_prologue = true;
Is this correct? If so, it really needs a separate testcase.
The rest looks good. Thanks!
Segher
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] rs6000: Don't use optimize_function_for_speed_p too early [PR108184]
2023-01-04 10:46 ` Segher Boessenkool
@ 2023-01-04 12:15 ` Kewen.Lin
2023-01-04 14:02 ` Segher Boessenkool
0 siblings, 1 reply; 5+ messages in thread
From: Kewen.Lin @ 2023-01-04 12:15 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: GCC Patches, David Edelsohn, Peter Bergner
Hi Segher,
Thanks for the comments.
on 2023/1/4 18:46, Segher Boessenkool wrote:
> On Wed, Jan 04, 2023 at 05:20:14PM +0800, Kewen.Lin wrote:
>> As Honza pointed out in [1], the current uses of function
>> optimize_function_for_speed_p in rs6000_option_override_internal
>> are too early, since the query results from the functions
>> optimize_function_for_{speed,size}_p could be changed later due
>> to profile feedback and some function attributes handlings etc.
>>
>> This patch is to move optimize_function_for_speed_p to all the
>> use places of the corresponding flags, which follows the existing
>> practices. Maybe we can cache it somewhere at an appropriate
>> timing, but that's another thing.
>
>> @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
>>
>> /* Can we optimize saving the TOC in the prologue or
>> do we need to do it at every call? */
>> - if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca)
>> + if (TARGET_SAVE_TOC_INDIRECT
>> + && !cfun->calls_alloca
>> + && optimize_function_for_speed_p (cfun))
>> cfun->machine->save_toc_in_prologue = true;
>
> Is this correct? If so, it really needs a separate testcase.
>
Yes, it just moves the condition from:
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3978,8 +3978,7 @@ rs6000_option_override_internal (bool global_init_p)
/* If we can shrink-wrap the TOC register save separately, then use
-msave-toc-indirect unless explicitly disabled. */
if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
- && flag_shrink_wrap_separate
- && optimize_function_for_speed_p (cfun))
+ && flag_shrink_wrap_separate)
rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
here.
I tried to find one test case before, but failed to find one which is not fragile
to test. And I thought the associated test case has demonstrated why the use of
optimize_function_for_{speed,size}_p is too early in function
rs6000_option_override_internal, so I gave up then. Do you worry about that we
could revert it unexpectedly in future and no sensitive test case is on it?
BR,
Kewen
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] rs6000: Don't use optimize_function_for_speed_p too early [PR108184]
2023-01-04 12:15 ` Kewen.Lin
@ 2023-01-04 14:02 ` Segher Boessenkool
2023-01-05 4:04 ` Kewen.Lin
0 siblings, 1 reply; 5+ messages in thread
From: Segher Boessenkool @ 2023-01-04 14:02 UTC (permalink / raw)
To: Kewen.Lin; +Cc: GCC Patches, David Edelsohn, Peter Bergner
Hi!
On Wed, Jan 04, 2023 at 08:15:03PM +0800, Kewen.Lin wrote:
> on 2023/1/4 18:46, Segher Boessenkool wrote:
> >> @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
> >>
> >> /* Can we optimize saving the TOC in the prologue or
> >> do we need to do it at every call? */
> >> - if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca)
> >> + if (TARGET_SAVE_TOC_INDIRECT
> >> + && !cfun->calls_alloca
> >> + && optimize_function_for_speed_p (cfun))
> >> cfun->machine->save_toc_in_prologue = true;
> >
> > Is this correct? If so, it really needs a separate testcase.
>
> Yes, it just moves the condition from:
>
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -3978,8 +3978,7 @@ rs6000_option_override_internal (bool global_init_p)
> /* If we can shrink-wrap the TOC register save separately, then use
> -msave-toc-indirect unless explicitly disabled. */
> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
> - && flag_shrink_wrap_separate
> - && optimize_function_for_speed_p (cfun))
> + && flag_shrink_wrap_separate)
> rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>
> here.
That "just" reinforces that this really needs a testcase! It is all
action at a distance, none of this is trivial (if it was there would
not be a bug here in the first place, of course).
> I tried to find one test case before, but failed to find one which is not fragile
> to test. And I thought the associated test case has demonstrated why the use of
> optimize_function_for_{speed,size}_p is too early in function
> rs6000_option_override_internal, so I gave up then. Do you worry about that we
> could revert it unexpectedly in future and no sensitive test case is on it?
I worry that it might contradict what some other code does. I also
worry that it just is not a sensible thing to do.
I do not worry that your patch is not an improvement. But the resulting
code more clearly (than the original) is problematic. Where is r2 saved
to the frame if save_toc_in_prologue is false?
Segher
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] rs6000: Don't use optimize_function_for_speed_p too early [PR108184]
2023-01-04 14:02 ` Segher Boessenkool
@ 2023-01-05 4:04 ` Kewen.Lin
0 siblings, 0 replies; 5+ messages in thread
From: Kewen.Lin @ 2023-01-05 4:04 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: GCC Patches, David Edelsohn, Peter Bergner
on 2023/1/4 22:02, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Jan 04, 2023 at 08:15:03PM +0800, Kewen.Lin wrote:
>> on 2023/1/4 18:46, Segher Boessenkool wrote:
>>>> @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
>>>>
>>>> /* Can we optimize saving the TOC in the prologue or
>>>> do we need to do it at every call? */
>>>> - if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca)
>>>> + if (TARGET_SAVE_TOC_INDIRECT
>>>> + && !cfun->calls_alloca
>>>> + && optimize_function_for_speed_p (cfun))
>>>> cfun->machine->save_toc_in_prologue = true;
>>>
>>> Is this correct? If so, it really needs a separate testcase.
>>
>> Yes, it just moves the condition from:
>>
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -3978,8 +3978,7 @@ rs6000_option_override_internal (bool global_init_p)
>> /* If we can shrink-wrap the TOC register save separately, then use
>> -msave-toc-indirect unless explicitly disabled. */
>> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
>> - && flag_shrink_wrap_separate
>> - && optimize_function_for_speed_p (cfun))
>> + && flag_shrink_wrap_separate)
>> rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>>
>> here.
>
> That "just" reinforces that this really needs a testcase! It is all
> action at a distance, none of this is trivial (if it was there would
> not be a bug here in the first place, of course).
OK, I'll make a test case for it. :)
>
>> I tried to find one test case before, but failed to find one which is not fragile
>> to test. And I thought the associated test case has demonstrated why the use of
>> optimize_function_for_{speed,size}_p is too early in function
>> rs6000_option_override_internal, so I gave up then. Do you worry about that we
>> could revert it unexpectedly in future and no sensitive test case is on it?
>
> I worry that it might contradict what some other code does. I also
> worry that it just is not a sensible thing to do.
>
> I do not worry that your patch is not an improvement. But the resulting
> code more clearly (than the original) is problematic. Where is r2 saved
> to the frame if save_toc_in_prologue is false?
If save_toc_in_prologue is false, the r2 saving to frame would occur at each
indirect call. Currently separate shrink-wrapping will check save_toc_in_prologue
to decide whether to consider saving toc as one component, I think that's why
we enable save-toc-indirect implicitly (going to set save_toc_in_prologue)
if it's not specified explicitly and doing separate shrink-wrapping.
BR,
Kewen
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-01-05 4:04 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-04 9:20 [PATCH] rs6000: Don't use optimize_function_for_speed_p too early [PR108184] Kewen.Lin
2023-01-04 10:46 ` Segher Boessenkool
2023-01-04 12:15 ` Kewen.Lin
2023-01-04 14:02 ` Segher Boessenkool
2023-01-05 4:04 ` Kewen.Lin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).