[Bug rtl-optimization/44883] New: Combine separate shift and add instructions into a single one

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/44883]  New: Combine separate shift and add instructions into a single one
@ 2010-07-08 23:18 carrot at google dot com
  2010-07-08 23:22 ` [Bug target/44883] " pinskia at gcc dot gnu dot org
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: carrot at google dot com @ 2010-07-08 23:18 UTC (permalink / raw)
  To: gcc-bugs

Compile the following code with options -march=armv7-a -mthumb -Os

struct S1
{
    int f1;
    int f2;
    int f3[6];
};

struct S2
{
    struct S1* pS1;
};

void aaaaaaaaa(struct S2* pS2, int count)
{
        int idx;
        for (idx = 0; idx < count; idx++) {
            struct S1* pS1 = &pS2->pS1[idx];
            foo(pS1->f1);
            pS1->f2 = 6;
        }
}

GCC generates:

aaaaaaaaa:
        push    {r4, r5, r6, r7, r8, lr}
        mov     r4, r0
        mov     r7, r1
        movs    r5, #0
        movs    r6, #6
        b       .L2
.L3:
        ldr     r2, [r4, #0]
        lsls    r3, r5, #5       // A
        adds    r5, r5, #1
        add     r8, r2, r3       // B
        ldr     r0, [r2, r3]     // C
        bl      foo
        str     r6, [r8, #4]
.L2:
        cmp     r5, r7
        blt     .L3
        pop     {r4, r5, r6, r7, r8, pc}

Instructions AB can be merged into one instruction and C should be modified
accordingly

      add r8, r2, r5 << 5
      ldr     r0, [r8]

The related rtl insns before fwprop2 pass is:

(insn 13 12 14 3 src/to.c:13 (set (reg:SI 143)
        (ashift:SI (reg/v:SI 135 [ idx ])
            (const_int 5 [0x5]))) 119 {*arm_shiftsi3} (nil))

(insn 15 14 16 3 src/to.c:17 (set (reg/v/f:SI 137 [ pS1 ])
        (plus:SI (reg/f:SI 144 [ pS2_4(D)->pS1 ])
            (reg:SI 143))) 4 {*arm_addsi3} (expr_list:REG_DEAD (reg/f:SI 144 [
pS2_4(D)->pS1 ])
        (expr_list:REG_DEAD (reg:SI 143)
            (nil))))

(insn 16 15 17 3 src/to.c:18 (set (reg:SI 0 r0)
        (mem/s:SI (reg/v/f:SI 137 [ pS1 ]) [5 pS1_8->f1+0 S4 A32])) 661
{*thumb2_movsi_insn} (nil))

It looks can be handled by combine pass. But the fwprop2 pass propagates the
following expression into memory load

    (plus:SI (reg/f:SI 144 [ pS2_4(D)->pS1 ])
            (reg:SI 143))

So now we get:

(insn 13 12 14 3 src/to.c:13 (set (reg:SI 143)
        (ashift:SI (reg/v:SI 135 [ idx ])
            (const_int 5 [0x5]))) 119 {*arm_shiftsi3} (nil))

(insn 15 14 16 3 src/to.c:17 (set (reg/v/f:SI 137 [ pS1 ])
        (plus:SI (reg/f:SI 144 [ pS2_4(D)->pS1 ])
            (reg:SI 143))) 4 {*arm_addsi3} (expr_list:REG_DEAD (reg/f:SI 144 [
pS2_4(D)->pS1 ])
        (expr_list:REG_DEAD (reg:SI 143)
            (nil))))

(insn 16 15 17 3 src/to.c:18 (set (reg:SI 0 r0)
        (mem/s:SI (plus:SI (reg/f:SI 144 [ pS2_4(D)->pS1 ])
                (reg:SI 143)) [5 pS1_8->f1+0 S4 A32])) 661 {*thumb2_movsi_insn}
(nil))

Now r143 is used in both insn 15 and insn 16. Combine insn 13 and insn 15 can't
bring any benefit.

So in function fwprop_addr before deciding propagate an expression should we
also check if it is the only use of the corresponding def?


-- 
           Summary: Combine separate shift and add instructions into a
                    single one
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: carrot at google dot com
 GCC build triplet: i686-linux
  GCC host triplet: i686-linux
GCC target triplet: arm-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44883


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/44883] Combine separate shift and add instructions into a single one
  2010-07-08 23:18 [Bug rtl-optimization/44883] New: Combine separate shift and add instructions into a single one carrot at google dot com
@ 2010-07-08 23:22 ` pinskia at gcc dot gnu dot org
  2010-07-09  0:04 ` carrot at google dot com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2010-07-08 23:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from pinskia at gcc dot gnu dot org  2010-07-08 23:22 -------
>So in function fwprop_addr before deciding propagate an expression should we
also check if it is the only use of the corresponding def?

It does somewhat.  Though address cost might be lower for r2+r3 than just r8. 
Please make sure that fwprop_addr has the correct address cost.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |target


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44883


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/44883] Combine separate shift and add instructions into a single one
  2010-07-08 23:18 [Bug rtl-optimization/44883] New: Combine separate shift and add instructions into a single one carrot at google dot com
  2010-07-08 23:22 ` [Bug target/44883] " pinskia at gcc dot gnu dot org
@ 2010-07-09  0:04 ` carrot at google dot com
  2010-07-09  0:06 ` pinskia at gcc dot gnu dot org
  2010-07-09 22:11 ` carrot at google dot com
  3 siblings, 0 replies; 6+ messages in thread
From: carrot at google dot com @ 2010-07-09  0:04 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from carrot at google dot com  2010-07-09 00:04 -------
(In reply to comment #1)
> >So in function fwprop_addr before deciding propagate an expression should we
> also check if it is the only use of the corresponding def?
> 
> It does somewhat.  Though address cost might be lower for r2+r3 than just r8. 
> Please make sure that fwprop_addr has the correct address cost.
> 

It occurs before register allocation, it is hard to say (plus (reg 144) (reg
143)) is cheaper than (reg 137).

But the address cost looks really strange. The arm/thumb2 address cost function
is

arm_arm_address_cost (rtx x)
{
  enum rtx_code c  = GET_CODE (x);
  if (c == PRE_INC || c == PRE_DEC || c == POST_INC || c == POST_DEC)
    return 0;
  if (c == MEM || c == LABEL_REF || c == SYMBOL_REF)
    return 10;
  if (c == PLUS)
    {
      if (GET_CODE (XEXP (x, 1)) == CONST_INT)
        return 2;
      if (ARITHMETIC_P (XEXP (x, 0)) || ARITHMETIC_P (XEXP (x, 1)))
        return 3;
      return 4;
    }
  return 6;
}

Give it a single register, the cost is 6. Give it the rtx (plus (reg 144) (reg
143)), the cost is 4. So a single register is more expensive than a plus
expression, not reasonable.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44883


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/44883] Combine separate shift and add instructions into a single one
  2010-07-08 23:18 [Bug rtl-optimization/44883] New: Combine separate shift and add instructions into a single one carrot at google dot com
  2010-07-08 23:22 ` [Bug target/44883] " pinskia at gcc dot gnu dot org
  2010-07-09  0:04 ` carrot at google dot com
@ 2010-07-09  0:06 ` pinskia at gcc dot gnu dot org
  2010-07-09 22:11 ` carrot at google dot com
  3 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2010-07-09  0:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from pinskia at gcc dot gnu dot org  2010-07-09 00:06 -------
(In reply to comment #2)
> Give it a single register, the cost is 6. Give it the rtx (plus (reg 144) (reg
> 143)), the cost is 4. So a single register is more expensive than a plus
> expression, not reasonable.

And that causes forwprop to do what it does; it is a lower total cost according
to the cost function :).


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2010-07-09 00:06:33
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44883


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/44883] Combine separate shift and add instructions into a single one
  2010-07-08 23:18 [Bug rtl-optimization/44883] New: Combine separate shift and add instructions into a single one carrot at google dot com
                   ` (2 preceding siblings ...)
  2010-07-09  0:06 ` pinskia at gcc dot gnu dot org
@ 2010-07-09 22:11 ` carrot at google dot com
  3 siblings, 0 replies; 6+ messages in thread
From: carrot at google dot com @ 2010-07-09 22:11 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #4 from carrot at google dot com  2010-07-09 22:11 -------
Consider following case:

    add r3, r1, r2
    ldr r4, [r3]

Suppose there is no other usage of r3, propagate the first instruction into the
second and remove the first is the correct behavior and beneficial. This is
also the current cost model's behavior. So I was really confused by the
function name arm_arm_address_cost.

Suppose there is another usage of r3 in later instructions, even we propagate
the first instruction into the second instruction, we can't delete the first
one. So there is no beneficial with the propagation and we should stop. My
original test case is very similar. The (reg 137) is used more than once, the
propagation can't make insn 15 dead, so the result is not better. On the
contrary it prevents other optimization (combine of insn 13 and insn 15).

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44883

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/44883] Combine separate shift and add instructions into a single one
       [not found] <bug-44883-4@http.gcc.gnu.org/bugzilla/>
@ 2021-09-26 22:43 ` pinskia at gcc dot gnu.org
  0 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-26 22:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44883

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
In GCC 9+ (due to 2->2 combine) we get:
.L2:
        cmp     r4, r5
        blt     .L3
        pop     {r4, r5, r6, r7, r8, pc}
.L3:
        ldr     r3, [r6]
        lsls    r2, r4, #5
        add     r8, r3, r4, lsl #5
        adds    r4, r4, #1
        ldr     r0, [r3, r2]
        bl      foo
        str     r7, [r8, #4]
        b       .L2

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-09-26 22:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-08 23:18 [Bug rtl-optimization/44883] New: Combine separate shift and add instructions into a single one carrot at google dot com
2010-07-08 23:22 ` [Bug target/44883] " pinskia at gcc dot gnu dot org
2010-07-09  0:04 ` carrot at google dot com
2010-07-09  0:06 ` pinskia at gcc dot gnu dot org
2010-07-09 22:11 ` carrot at google dot com
     [not found] <bug-44883-4@http.gcc.gnu.org/bugzilla/>
2021-09-26 22:43 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).