public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack
@ 2021-07-02 16:13 Jeff Law
  2021-07-05 11:17 ` Richard Biener
  2021-07-09  2:39 ` [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack - V2 Jeff Law
  0 siblings, 2 replies; 6+ messages in thread
From: Jeff Law @ 2021-07-02 16:13 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3450 bytes --]


This is a minor missed optimization we found with our internal port.

Given this code:

typedef struct {short a; short b;} T;

extern void g1();

void f(T s)
{
         if (s.a < 0)
                 g1();
}


"s" is passed in a register, but it's still a BLKmode object because the 
alignment of T is smaller than the alignment that an integer of the same 
size would have (16 bits vs 32 bits).


Because "s" is BLKmode function.c is going to store it into a stack slot 
and we'll load it from the that slot for each reference.  So on the v850 
(just to pick a port that likely has the same behavior we're seeing) we 
have this RTL from CSE2:


(insn 2 4 3 2 (set (mem/c:SI (plus:SI (reg/f:SI 34 .fp)
                 (const_int -4 [0xfffffffffffffffc])) [2 S4 A32])
         (reg:SI 6 r6)) "j.c":6:1 7 {*movsi_internal}
      (expr_list:REG_DEAD (reg:SI 6 r6)
         (nil)))
(note 3 2 8 2 NOTE_INSN_FUNCTION_BEG)
(insn 8 3 9 2 (set (reg:HI 44 [ s.a ])
         (mem/c:HI (plus:SI (reg/f:SI 34 .fp)
                 (const_int -4 [0xfffffffffffffffc])) [1 s.a+0 S2 A32])) 
"j.c":7:5 3 {*movhi_internal}
      (nil))
(insn 9 8 10 2 (parallel [
             (set (reg:SI 45)
                 (ashift:SI (subreg:SI (reg:HI 44 [ s.a ]) 0)
                     (const_int 16 [0x10])))
             (clobber (reg:CC 32 psw))
         ]) "j.c":7:5 94 {ashlsi3_clobber_flags}
      (expr_list:REG_DEAD (reg:HI 44 [ s.a ])
         (expr_list:REG_UNUSED (reg:CC 32 psw)
             (nil))))
(insn 10 9 11 2 (parallel [
             (set (reg:SI 43)
                 (ashiftrt:SI (reg:SI 45)
                     (const_int 16 [0x10])))
             (clobber (reg:CC 32 psw))
         ]) "j.c":7:5 104 {ashrsi3_clobber_flags}
      (expr_list:REG_DEAD (reg:SI 45)
         (expr_list:REG_UNUSED (reg:CC 32 psw)
             (nil))))


Insn 2 is the store into the stack. insn 8 is the load for s.a in the 
conditional.  DSE1 replaces the MEM in insn 8 with (reg 6) since (reg 6) 
has the value we want.  After that the store at insn 2 is dead.  Sadly 
DSE never removes the store.

The problem is RTL DSE considers a store with no MEM_EXPR as escaping, 
which keeps the MEM live.  The lack of a MEM_EXPR is due to call to 
change_address to twiddle the mode on the MEM for the store at insn 2.  
It should be safe to copy the MEM_EXPR (which should always be a 
PARM_DECL) from the original memory to the memory returned by 
change_address.  Doing so results in DSE1 removing the store at insn 2.

It would be nice to remove the stack setup/teardown.   I'm not offhand 
aware of mechanisms to remove the setup/teardown after we've already 
allocated a slot, even if the slot is no longer used.

Bootstrapped and regression tested on x86, though I don't think that's a 
particularly useful test.  So I also ran it through my tester across 
those pesky embedded targets without regressions as well.

I didn't include a test simply because I didn't want to have an insane 
target selector.  I guess if we really wanted a test we could look after 
DSE1 is done and verify there aren't any MEMs left at all.  Willing to 
try that if the consensus is we want this tested.

OK for the trunk?

Jeff



[-- Attachment #2: dse.patch --]
[-- Type: text/plain, Size: 1110 bytes --]

diff --git a/gcc/function.c b/gcc/function.c
index 00b2fe70c7d..e47931a2695 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -3036,7 +3036,23 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
 		  reg = gen_rtx_REG (word_mode, REGNO (entry_parm));
 		  reg = convert_to_mode (mode, copy_to_reg (reg), 1);
 		}
-	      emit_move_insn (change_address (mem, mode, 0), reg);
+
+	      /* We use change_address to get a new MEM with the mode
+		 changed.  But that new MEM will not have a full set
+		 of attributes such as MEM_EXPR.
+
+		 This can inhibit later optimizations like DSE which
+		 assumes if there is no MEM_EXPR, then the memory
+		 must escape the current function.
+
+		 The MEM_EXPR on the original MEM expression should be
+		 the PARM_DECL which should be safe to attach to the
+		 new MEM.  */
+	      gcc_assert (MEM_EXPR (mem));
+	      gcc_assert (TREE_CODE (MEM_EXPR (mem)) == PARM_DECL);
+	      rtx new_mem = change_address (mem, mode, 0);
+	      set_mem_attributes (new_mem, MEM_EXPR (mem), 1);
+	      emit_move_insn (new_mem, reg);
 	    }
 
 #ifdef BLOCK_REG_PADDING

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack
  2021-07-02 16:13 [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack Jeff Law
@ 2021-07-05 11:17 ` Richard Biener
  2021-07-06  4:01   ` Jeff Law
  2021-07-08 19:19   ` Jeff Law
  2021-07-09  2:39 ` [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack - V2 Jeff Law
  1 sibling, 2 replies; 6+ messages in thread
From: Richard Biener @ 2021-07-05 11:17 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Fri, Jul 2, 2021 at 6:13 PM Jeff Law <jlaw@tachyum.com> wrote:
>
>
> This is a minor missed optimization we found with our internal port.
>
> Given this code:
>
> typedef struct {short a; short b;} T;
>
> extern void g1();
>
> void f(T s)
> {
>          if (s.a < 0)
>                  g1();
> }
>
>
> "s" is passed in a register, but it's still a BLKmode object because the
> alignment of T is smaller than the alignment that an integer of the same
> size would have (16 bits vs 32 bits).
>
>
> Because "s" is BLKmode function.c is going to store it into a stack slot
> and we'll load it from the that slot for each reference.  So on the v850
> (just to pick a port that likely has the same behavior we're seeing) we
> have this RTL from CSE2:
>
>
> (insn 2 4 3 2 (set (mem/c:SI (plus:SI (reg/f:SI 34 .fp)
>                  (const_int -4 [0xfffffffffffffffc])) [2 S4 A32])
>          (reg:SI 6 r6)) "j.c":6:1 7 {*movsi_internal}
>       (expr_list:REG_DEAD (reg:SI 6 r6)
>          (nil)))
> (note 3 2 8 2 NOTE_INSN_FUNCTION_BEG)
> (insn 8 3 9 2 (set (reg:HI 44 [ s.a ])
>          (mem/c:HI (plus:SI (reg/f:SI 34 .fp)
>                  (const_int -4 [0xfffffffffffffffc])) [1 s.a+0 S2 A32]))
> "j.c":7:5 3 {*movhi_internal}
>       (nil))
> (insn 9 8 10 2 (parallel [
>              (set (reg:SI 45)
>                  (ashift:SI (subreg:SI (reg:HI 44 [ s.a ]) 0)
>                      (const_int 16 [0x10])))
>              (clobber (reg:CC 32 psw))
>          ]) "j.c":7:5 94 {ashlsi3_clobber_flags}
>       (expr_list:REG_DEAD (reg:HI 44 [ s.a ])
>          (expr_list:REG_UNUSED (reg:CC 32 psw)
>              (nil))))
> (insn 10 9 11 2 (parallel [
>              (set (reg:SI 43)
>                  (ashiftrt:SI (reg:SI 45)
>                      (const_int 16 [0x10])))
>              (clobber (reg:CC 32 psw))
>          ]) "j.c":7:5 104 {ashrsi3_clobber_flags}
>       (expr_list:REG_DEAD (reg:SI 45)
>          (expr_list:REG_UNUSED (reg:CC 32 psw)
>              (nil))))
>
>
> Insn 2 is the store into the stack. insn 8 is the load for s.a in the
> conditional.  DSE1 replaces the MEM in insn 8 with (reg 6) since (reg 6)
> has the value we want.  After that the store at insn 2 is dead.  Sadly
> DSE never removes the store.
>
> The problem is RTL DSE considers a store with no MEM_EXPR as escaping,
> which keeps the MEM live.  The lack of a MEM_EXPR is due to call to
> change_address to twiddle the mode on the MEM for the store at insn 2.
> It should be safe to copy the MEM_EXPR (which should always be a
> PARM_DECL) from the original memory to the memory returned by
> change_address.  Doing so results in DSE1 removing the store at insn 2.
>
> It would be nice to remove the stack setup/teardown.   I'm not offhand
> aware of mechanisms to remove the setup/teardown after we've already
> allocated a slot, even if the slot is no longer used.
>
> Bootstrapped and regression tested on x86, though I don't think that's a
> particularly useful test.  So I also ran it through my tester across
> those pesky embedded targets without regressions as well.
>
> I didn't include a test simply because I didn't want to have an insane
> target selector.  I guess if we really wanted a test we could look after
> DSE1 is done and verify there aren't any MEMs left at all.  Willing to
> try that if the consensus is we want this tested.
>
> OK for the trunk?

I wonder why the code doesn't use adjust_address instead?  That
handles most cases already and the code doesn't change the
address but just the mode (and access size)?

Richard.

> Jeff
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack
  2021-07-05 11:17 ` Richard Biener
@ 2021-07-06  4:01   ` Jeff Law
  2021-07-08 19:19   ` Jeff Law
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff Law @ 2021-07-06  4:01 UTC (permalink / raw)
  To: Richard Biener, Jeff Law; +Cc: GCC Patches



On 7/5/2021 5:17 AM, Richard Biener via Gcc-patches wrote:
> On Fri, Jul 2, 2021 at 6:13 PM Jeff Law <jlaw@tachyum.com> wrote:
>>
>> This is a minor missed optimization we found with our internal port.
>>
>> Given this code:
>>
>> typedef struct {short a; short b;} T;
>>
>> extern void g1();
>>
>> void f(T s)
>> {
>>           if (s.a < 0)
>>                   g1();
>> }
>>
>>
>> "s" is passed in a register, but it's still a BLKmode object because the
>> alignment of T is smaller than the alignment that an integer of the same
>> size would have (16 bits vs 32 bits).
>>
>>
>> Because "s" is BLKmode function.c is going to store it into a stack slot
>> and we'll load it from the that slot for each reference.  So on the v850
>> (just to pick a port that likely has the same behavior we're seeing) we
>> have this RTL from CSE2:
>>
>>
>> (insn 2 4 3 2 (set (mem/c:SI (plus:SI (reg/f:SI 34 .fp)
>>                   (const_int -4 [0xfffffffffffffffc])) [2 S4 A32])
>>           (reg:SI 6 r6)) "j.c":6:1 7 {*movsi_internal}
>>        (expr_list:REG_DEAD (reg:SI 6 r6)
>>           (nil)))
>> (note 3 2 8 2 NOTE_INSN_FUNCTION_BEG)
>> (insn 8 3 9 2 (set (reg:HI 44 [ s.a ])
>>           (mem/c:HI (plus:SI (reg/f:SI 34 .fp)
>>                   (const_int -4 [0xfffffffffffffffc])) [1 s.a+0 S2 A32]))
>> "j.c":7:5 3 {*movhi_internal}
>>        (nil))
>> (insn 9 8 10 2 (parallel [
>>               (set (reg:SI 45)
>>                   (ashift:SI (subreg:SI (reg:HI 44 [ s.a ]) 0)
>>                       (const_int 16 [0x10])))
>>               (clobber (reg:CC 32 psw))
>>           ]) "j.c":7:5 94 {ashlsi3_clobber_flags}
>>        (expr_list:REG_DEAD (reg:HI 44 [ s.a ])
>>           (expr_list:REG_UNUSED (reg:CC 32 psw)
>>               (nil))))
>> (insn 10 9 11 2 (parallel [
>>               (set (reg:SI 43)
>>                   (ashiftrt:SI (reg:SI 45)
>>                       (const_int 16 [0x10])))
>>               (clobber (reg:CC 32 psw))
>>           ]) "j.c":7:5 104 {ashrsi3_clobber_flags}
>>        (expr_list:REG_DEAD (reg:SI 45)
>>           (expr_list:REG_UNUSED (reg:CC 32 psw)
>>               (nil))))
>>
>>
>> Insn 2 is the store into the stack. insn 8 is the load for s.a in the
>> conditional.  DSE1 replaces the MEM in insn 8 with (reg 6) since (reg 6)
>> has the value we want.  After that the store at insn 2 is dead.  Sadly
>> DSE never removes the store.
>>
>> The problem is RTL DSE considers a store with no MEM_EXPR as escaping,
>> which keeps the MEM live.  The lack of a MEM_EXPR is due to call to
>> change_address to twiddle the mode on the MEM for the store at insn 2.
>> It should be safe to copy the MEM_EXPR (which should always be a
>> PARM_DECL) from the original memory to the memory returned by
>> change_address.  Doing so results in DSE1 removing the store at insn 2.
>>
>> It would be nice to remove the stack setup/teardown.   I'm not offhand
>> aware of mechanisms to remove the setup/teardown after we've already
>> allocated a slot, even if the slot is no longer used.
>>
>> Bootstrapped and regression tested on x86, though I don't think that's a
>> particularly useful test.  So I also ran it through my tester across
>> those pesky embedded targets without regressions as well.
>>
>> I didn't include a test simply because I didn't want to have an insane
>> target selector.  I guess if we really wanted a test we could look after
>> DSE1 is done and verify there aren't any MEMs left at all.  Willing to
>> try that if the consensus is we want this tested.
>>
>> OK for the trunk?
> I wonder why the code doesn't use adjust_address instead?  That
> handles most cases already and the code doesn't change the
> address but just the mode (and access size)?
No idea.  It should be easy enough to try that approach though.

jeff


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack
  2021-07-05 11:17 ` Richard Biener
  2021-07-06  4:01   ` Jeff Law
@ 2021-07-08 19:19   ` Jeff Law
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff Law @ 2021-07-08 19:19 UTC (permalink / raw)
  To: Richard Biener, Jeff Law; +Cc: GCC Patches



On 7/5/2021 5:17 AM, Richard Biener via Gcc-patches wrote:
> On Fri, Jul 2, 2021 at 6:13 PM Jeff Law <jlaw@tachyum.com> wrote:
>>
>> This is a minor missed optimization we found with our internal port.
>>
>> Given this code:
>>
>> typedef struct {short a; short b;} T;
>>
>> extern void g1();
>>
>> void f(T s)
>> {
>>           if (s.a < 0)
>>                   g1();
>> }
>>
>>
>> "s" is passed in a register, but it's still a BLKmode object because the
>> alignment of T is smaller than the alignment that an integer of the same
>> size would have (16 bits vs 32 bits).
>>
>>
>> Because "s" is BLKmode function.c is going to store it into a stack slot
>> and we'll load it from the that slot for each reference.  So on the v850
>> (just to pick a port that likely has the same behavior we're seeing) we
>> have this RTL from CSE2:
>>
>>
>> (insn 2 4 3 2 (set (mem/c:SI (plus:SI (reg/f:SI 34 .fp)
>>                   (const_int -4 [0xfffffffffffffffc])) [2 S4 A32])
>>           (reg:SI 6 r6)) "j.c":6:1 7 {*movsi_internal}
>>        (expr_list:REG_DEAD (reg:SI 6 r6)
>>           (nil)))
>> (note 3 2 8 2 NOTE_INSN_FUNCTION_BEG)
>> (insn 8 3 9 2 (set (reg:HI 44 [ s.a ])
>>           (mem/c:HI (plus:SI (reg/f:SI 34 .fp)
>>                   (const_int -4 [0xfffffffffffffffc])) [1 s.a+0 S2 A32]))
>> "j.c":7:5 3 {*movhi_internal}
>>        (nil))
>> (insn 9 8 10 2 (parallel [
>>               (set (reg:SI 45)
>>                   (ashift:SI (subreg:SI (reg:HI 44 [ s.a ]) 0)
>>                       (const_int 16 [0x10])))
>>               (clobber (reg:CC 32 psw))
>>           ]) "j.c":7:5 94 {ashlsi3_clobber_flags}
>>        (expr_list:REG_DEAD (reg:HI 44 [ s.a ])
>>           (expr_list:REG_UNUSED (reg:CC 32 psw)
>>               (nil))))
>> (insn 10 9 11 2 (parallel [
>>               (set (reg:SI 43)
>>                   (ashiftrt:SI (reg:SI 45)
>>                       (const_int 16 [0x10])))
>>               (clobber (reg:CC 32 psw))
>>           ]) "j.c":7:5 104 {ashrsi3_clobber_flags}
>>        (expr_list:REG_DEAD (reg:SI 45)
>>           (expr_list:REG_UNUSED (reg:CC 32 psw)
>>               (nil))))
>>
>>
>> Insn 2 is the store into the stack. insn 8 is the load for s.a in the
>> conditional.  DSE1 replaces the MEM in insn 8 with (reg 6) since (reg 6)
>> has the value we want.  After that the store at insn 2 is dead.  Sadly
>> DSE never removes the store.
>>
>> The problem is RTL DSE considers a store with no MEM_EXPR as escaping,
>> which keeps the MEM live.  The lack of a MEM_EXPR is due to call to
>> change_address to twiddle the mode on the MEM for the store at insn 2.
>> It should be safe to copy the MEM_EXPR (which should always be a
>> PARM_DECL) from the original memory to the memory returned by
>> change_address.  Doing so results in DSE1 removing the store at insn 2.
>>
>> It would be nice to remove the stack setup/teardown.   I'm not offhand
>> aware of mechanisms to remove the setup/teardown after we've already
>> allocated a slot, even if the slot is no longer used.
>>
>> Bootstrapped and regression tested on x86, though I don't think that's a
>> particularly useful test.  So I also ran it through my tester across
>> those pesky embedded targets without regressions as well.
>>
>> I didn't include a test simply because I didn't want to have an insane
>> target selector.  I guess if we really wanted a test we could look after
>> DSE1 is done and verify there aren't any MEMs left at all.  Willing to
>> try that if the consensus is we want this tested.
>>
>> OK for the trunk?
> I wonder why the code doesn't use adjust_address instead?  That
> handles most cases already and the code doesn't change the
> address but just the mode (and access size)?
Yea, adjust_address seems to work fine.  I'm spinning that in my tester 
at the moment.

Jeff


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack - V2
  2021-07-02 16:13 [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack Jeff Law
  2021-07-05 11:17 ` Richard Biener
@ 2021-07-09  2:39 ` Jeff Law
  2021-07-09  6:44   ` Richard Biener
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff Law @ 2021-07-09  2:39 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3882 bytes --]



On 7/2/2021 10:13 AM, Jeff Law wrote:
>
> This is a minor missed optimization we found with our internal port.
>
> Given this code:
>
> typedef struct {short a; short b;} T;
>
> extern void g1();
>
> void f(T s)
> {
>         if (s.a < 0)
>                 g1();
> }
>
>
> "s" is passed in a register, but it's still a BLKmode object because 
> the alignment of T is smaller than the alignment that an integer of 
> the same size would have (16 bits vs 32 bits).
>
>
> Because "s" is BLKmode function.c is going to store it into a stack 
> slot and we'll load it from the that slot for each reference.  So on 
> the v850 (just to pick a port that likely has the same behavior we're 
> seeing) we have this RTL from CSE2:
>
>
> (insn 2 4 3 2 (set (mem/c:SI (plus:SI (reg/f:SI 34 .fp)
>                 (const_int -4 [0xfffffffffffffffc])) [2 S4 A32])
>         (reg:SI 6 r6)) "j.c":6:1 7 {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 6 r6)
>         (nil)))
> (note 3 2 8 2 NOTE_INSN_FUNCTION_BEG)
> (insn 8 3 9 2 (set (reg:HI 44 [ s.a ])
>         (mem/c:HI (plus:SI (reg/f:SI 34 .fp)
>                 (const_int -4 [0xfffffffffffffffc])) [1 s.a+0 S2 
> A32])) "j.c":7:5 3 {*movhi_internal}
>      (nil))
> (insn 9 8 10 2 (parallel [
>             (set (reg:SI 45)
>                 (ashift:SI (subreg:SI (reg:HI 44 [ s.a ]) 0)
>                     (const_int 16 [0x10])))
>             (clobber (reg:CC 32 psw))
>         ]) "j.c":7:5 94 {ashlsi3_clobber_flags}
>      (expr_list:REG_DEAD (reg:HI 44 [ s.a ])
>         (expr_list:REG_UNUSED (reg:CC 32 psw)
>             (nil))))
> (insn 10 9 11 2 (parallel [
>             (set (reg:SI 43)
>                 (ashiftrt:SI (reg:SI 45)
>                     (const_int 16 [0x10])))
>             (clobber (reg:CC 32 psw))
>         ]) "j.c":7:5 104 {ashrsi3_clobber_flags}
>      (expr_list:REG_DEAD (reg:SI 45)
>         (expr_list:REG_UNUSED (reg:CC 32 psw)
>             (nil))))
>
>
> Insn 2 is the store into the stack. insn 8 is the load for s.a in the 
> conditional.  DSE1 replaces the MEM in insn 8 with (reg 6) since (reg 
> 6) has the value we want.  After that the store at insn 2 is dead.  
> Sadly DSE never removes the store.
>
> The problem is RTL DSE considers a store with no MEM_EXPR as escaping, 
> which keeps the MEM live.  The lack of a MEM_EXPR is due to call to 
> change_address to twiddle the mode on the MEM for the store at insn 
> 2.  It should be safe to copy the MEM_EXPR (which should always be a 
> PARM_DECL) from the original memory to the memory returned by 
> change_address.  Doing so results in DSE1 removing the store at insn 2.
>
> It would be nice to remove the stack setup/teardown.   I'm not offhand 
> aware of mechanisms to remove the setup/teardown after we've already 
> allocated a slot, even if the slot is no longer used.
>
> Bootstrapped and regression tested on x86, though I don't think that's 
> a particularly useful test.  So I also ran it through my tester across 
> those pesky embedded targets without regressions as well.
>
> I didn't include a test simply because I didn't want to have an insane 
> target selector.  I guess if we really wanted a test we could look 
> after DSE1 is done and verify there aren't any MEMs left at all.  
> Willing to try that if the consensus is we want this tested.
>
> OK for the trunk?
So Richi questioned if using adjust_address rather than change_address 
was sufficient to solve the problem.  It is.  I've bootstrapped & 
regression tested on x86_64 and regression tested this against the usual 
set of crosses in the tester.

OK for the trunk now?

JEff


[-- Attachment #2: P --]
[-- Type: text/plain, Size: 952 bytes --]

	* function.c (assign_parm_setup_block): Use adjust_address instead
	of change_address to preserve MEM_EXPR and friends.

diff --git a/gcc/function.c b/gcc/function.c
index 00b2fe70c7d..af3d57b32a3 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -3036,7 +3036,15 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
 		  reg = gen_rtx_REG (word_mode, REGNO (entry_parm));
 		  reg = convert_to_mode (mode, copy_to_reg (reg), 1);
 		}
-	      emit_move_insn (change_address (mem, mode, 0), reg);
+
+	      /* We use adjust_address to get a new MEM with the mode
+		 changed.  adjust_address is better than change_address
+		 for this purpose because adjust_address does not lose
+		 the MEM_EXPR associated with the MEM.
+
+		 If the MEM_EXPR is lost, then optimizations like DSE
+		 assume the MEM escapes and thus is not subject to DSE.  */
+	      emit_move_insn (adjust_address (mem, mode, 0), reg);
 	    }
 
 #ifdef BLOCK_REG_PADDING

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack - V2
  2021-07-09  2:39 ` [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack - V2 Jeff Law
@ 2021-07-09  6:44   ` Richard Biener
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Biener @ 2021-07-09  6:44 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Fri, Jul 9, 2021 at 4:39 AM Jeff Law <jlaw@tachyum.com> wrote:
>
>
>
> On 7/2/2021 10:13 AM, Jeff Law wrote:
> >
> > This is a minor missed optimization we found with our internal port.
> >
> > Given this code:
> >
> > typedef struct {short a; short b;} T;
> >
> > extern void g1();
> >
> > void f(T s)
> > {
> >         if (s.a < 0)
> >                 g1();
> > }
> >
> >
> > "s" is passed in a register, but it's still a BLKmode object because
> > the alignment of T is smaller than the alignment that an integer of
> > the same size would have (16 bits vs 32 bits).
> >
> >
> > Because "s" is BLKmode function.c is going to store it into a stack
> > slot and we'll load it from the that slot for each reference.  So on
> > the v850 (just to pick a port that likely has the same behavior we're
> > seeing) we have this RTL from CSE2:
> >
> >
> > (insn 2 4 3 2 (set (mem/c:SI (plus:SI (reg/f:SI 34 .fp)
> >                 (const_int -4 [0xfffffffffffffffc])) [2 S4 A32])
> >         (reg:SI 6 r6)) "j.c":6:1 7 {*movsi_internal}
> >      (expr_list:REG_DEAD (reg:SI 6 r6)
> >         (nil)))
> > (note 3 2 8 2 NOTE_INSN_FUNCTION_BEG)
> > (insn 8 3 9 2 (set (reg:HI 44 [ s.a ])
> >         (mem/c:HI (plus:SI (reg/f:SI 34 .fp)
> >                 (const_int -4 [0xfffffffffffffffc])) [1 s.a+0 S2
> > A32])) "j.c":7:5 3 {*movhi_internal}
> >      (nil))
> > (insn 9 8 10 2 (parallel [
> >             (set (reg:SI 45)
> >                 (ashift:SI (subreg:SI (reg:HI 44 [ s.a ]) 0)
> >                     (const_int 16 [0x10])))
> >             (clobber (reg:CC 32 psw))
> >         ]) "j.c":7:5 94 {ashlsi3_clobber_flags}
> >      (expr_list:REG_DEAD (reg:HI 44 [ s.a ])
> >         (expr_list:REG_UNUSED (reg:CC 32 psw)
> >             (nil))))
> > (insn 10 9 11 2 (parallel [
> >             (set (reg:SI 43)
> >                 (ashiftrt:SI (reg:SI 45)
> >                     (const_int 16 [0x10])))
> >             (clobber (reg:CC 32 psw))
> >         ]) "j.c":7:5 104 {ashrsi3_clobber_flags}
> >      (expr_list:REG_DEAD (reg:SI 45)
> >         (expr_list:REG_UNUSED (reg:CC 32 psw)
> >             (nil))))
> >
> >
> > Insn 2 is the store into the stack. insn 8 is the load for s.a in the
> > conditional.  DSE1 replaces the MEM in insn 8 with (reg 6) since (reg
> > 6) has the value we want.  After that the store at insn 2 is dead.
> > Sadly DSE never removes the store.
> >
> > The problem is RTL DSE considers a store with no MEM_EXPR as escaping,
> > which keeps the MEM live.  The lack of a MEM_EXPR is due to call to
> > change_address to twiddle the mode on the MEM for the store at insn
> > 2.  It should be safe to copy the MEM_EXPR (which should always be a
> > PARM_DECL) from the original memory to the memory returned by
> > change_address.  Doing so results in DSE1 removing the store at insn 2.
> >
> > It would be nice to remove the stack setup/teardown.   I'm not offhand
> > aware of mechanisms to remove the setup/teardown after we've already
> > allocated a slot, even if the slot is no longer used.
> >
> > Bootstrapped and regression tested on x86, though I don't think that's
> > a particularly useful test.  So I also ran it through my tester across
> > those pesky embedded targets without regressions as well.
> >
> > I didn't include a test simply because I didn't want to have an insane
> > target selector.  I guess if we really wanted a test we could look
> > after DSE1 is done and verify there aren't any MEMs left at all.
> > Willing to try that if the consensus is we want this tested.
> >
> > OK for the trunk?
> So Richi questioned if using adjust_address rather than change_address
> was sufficient to solve the problem.  It is.  I've bootstrapped &
> regression tested on x86_64 and regression tested this against the usual
> set of crosses in the tester.
>
> OK for the trunk now?

OK.

Thanks,
Richard.

> JEff
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-07-09  6:44 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-02 16:13 [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack Jeff Law
2021-07-05 11:17 ` Richard Biener
2021-07-06  4:01   ` Jeff Law
2021-07-08 19:19   ` Jeff Law
2021-07-09  2:39 ` [RFA] Attach MEM_EXPR information when flushing BLKmode args to the stack - V2 Jeff Law
2021-07-09  6:44   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).