public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?
@ 2019-09-23 11:56 Jozef Lawrynowicz
  2019-09-23 13:25 ` Richard Biener
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jozef Lawrynowicz @ 2019-09-23 11:56 UTC (permalink / raw)
  To: gcc

For msp430-elf in the large memory model there are PSImode (20-bit) pointers.
POINTERS_EXTEND_UNSIGNED == 1 and "char" is unsigned by default.

We get poor code generated for the following code snippet:

const int table[2] = {1, 2};

int
foo (char i)
{
  return table[i];
}

The RTL generated by expand uses two insns to convert "i" to a register's
natural mode; there is a sign extension which would be unnecessary if the first
instruction had a PSImode register as the lvalue:

(insn 2 4 3 2 (set (reg/v:HI 25 [ i ])
        (zero_extend:HI (reg:QI 12 R12 [ i ])))
     (nil))
.....
(insn 7 6 8 2 (set (reg:PSI 28)
        (subreg:PSI (sign_extend:SI (reg/v:HI 25 [ i ])) 0))
     (nil))

All we really need is:

(insn (set (reg:PSI 28 [ i ])
        (zero_extend:PSI (reg:QI 12 R12 [ i ])))
     (nil))

The zero extend is implicit with byte sized register operations, so this would
result in assembly such as:
  MOV.B R12, R12

My question is whether this is the type of thing that should be handled with a
peephole optimization or if it is worth trying to fix the initial RTL generated
by expand (or in a later RTL pass e.g. combine)?

Thanks,
Jozef

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?
  2019-09-23 11:56 peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer? Jozef Lawrynowicz
@ 2019-09-23 13:25 ` Richard Biener
  2019-09-23 13:30 ` Jeff Law
  2019-09-23 15:56 ` Segher Boessenkool
  2 siblings, 0 replies; 7+ messages in thread
From: Richard Biener @ 2019-09-23 13:25 UTC (permalink / raw)
  To: Jozef Lawrynowicz; +Cc: GCC Development

On Mon, Sep 23, 2019 at 1:56 PM Jozef Lawrynowicz <jozefl.gcc@gmail.com> wrote:
>
> For msp430-elf in the large memory model there are PSImode (20-bit) pointers.
> POINTERS_EXTEND_UNSIGNED == 1 and "char" is unsigned by default.
>
> We get poor code generated for the following code snippet:
>
> const int table[2] = {1, 2};
>
> int
> foo (char i)
> {
>   return table[i];
> }
>
> The RTL generated by expand uses two insns to convert "i" to a register's
> natural mode; there is a sign extension which would be unnecessary if the first
> instruction had a PSImode register as the lvalue:
>
> (insn 2 4 3 2 (set (reg/v:HI 25 [ i ])
>         (zero_extend:HI (reg:QI 12 R12 [ i ])))
>      (nil))
> .....
> (insn 7 6 8 2 (set (reg:PSI 28)
>         (subreg:PSI (sign_extend:SI (reg/v:HI 25 [ i ])) 0))
>      (nil))
>
> All we really need is:
>
> (insn (set (reg:PSI 28 [ i ])
>         (zero_extend:PSI (reg:QI 12 R12 [ i ])))
>      (nil))
>
> The zero extend is implicit with byte sized register operations, so this would
> result in assembly such as:
>   MOV.B R12, R12
>
> My question is whether this is the type of thing that should be handled with a
> peephole optimization or if it is worth trying to fix the initial RTL generated
> by expand (or in a later RTL pass e.g. combine)?

The C frontend promotes 'i' already and likely that promotion "sticks" via
your choice of SIZETYPE.

But yes, preferably RTL expansion would behave better here.

Richard.

>
> Thanks,
> Jozef

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?
  2019-09-23 11:56 peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer? Jozef Lawrynowicz
  2019-09-23 13:25 ` Richard Biener
@ 2019-09-23 13:30 ` Jeff Law
  2019-09-23 17:02   ` Jozef Lawrynowicz
  2019-09-23 15:56 ` Segher Boessenkool
  2 siblings, 1 reply; 7+ messages in thread
From: Jeff Law @ 2019-09-23 13:30 UTC (permalink / raw)
  To: Jozef Lawrynowicz, gcc

On 9/23/19 5:56 AM, Jozef Lawrynowicz wrote:
> For msp430-elf in the large memory model there are PSImode (20-bit) pointers.
> POINTERS_EXTEND_UNSIGNED == 1 and "char" is unsigned by default.
> 
> We get poor code generated for the following code snippet:
> 
> const int table[2] = {1, 2};
> 
> int
> foo (char i)
> {
>   return table[i];
> }
> 
> The RTL generated by expand uses two insns to convert "i" to a register's
> natural mode; there is a sign extension which would be unnecessary if the first
> instruction had a PSImode register as the lvalue:
> 
> (insn 2 4 3 2 (set (reg/v:HI 25 [ i ])
>         (zero_extend:HI (reg:QI 12 R12 [ i ])))
>      (nil))
> .....
> (insn 7 6 8 2 (set (reg:PSI 28)
>         (subreg:PSI (sign_extend:SI (reg/v:HI 25 [ i ])) 0))
>      (nil))
> 
> All we really need is:
> 
> (insn (set (reg:PSI 28 [ i ])
>         (zero_extend:PSI (reg:QI 12 R12 [ i ])))
>      (nil))
> 
> The zero extend is implicit with byte sized register operations, so this would
> result in assembly such as:
>   MOV.B R12, R12
> 
> My question is whether this is the type of thing that should be handled with a
> peephole optimization or if it is worth trying to fix the initial RTL generated
> by expand (or in a later RTL pass e.g. combine)?
Ideally it'd be fixed earlier, but that can be painful due to the way
promotions work.

I certainly recall doing some combiner patterns to catch the more
egregious codegen issues on the mn102 port (similar, but with 24bit
pointers).  I think I mostly focused on stuff that showed up with
integer loop indices and their annoying promotion to ptrdiff_t when used
for array indexing.

I'd bet if you extracted mn10200.md out of the archive the patterns
would be obvious ;-)

In this specific case I'd think a combiner pattern ought to do the right
thing except for the fact that you're probably dealing with hard
registers which combine no longer handles as aggressively.

jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?
  2019-09-23 11:56 peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer? Jozef Lawrynowicz
  2019-09-23 13:25 ` Richard Biener
  2019-09-23 13:30 ` Jeff Law
@ 2019-09-23 15:56 ` Segher Boessenkool
  2019-09-23 17:56   ` Jozef Lawrynowicz
  2 siblings, 1 reply; 7+ messages in thread
From: Segher Boessenkool @ 2019-09-23 15:56 UTC (permalink / raw)
  To: Jozef Lawrynowicz; +Cc: gcc

On Mon, Sep 23, 2019 at 12:56:20PM +0100, Jozef Lawrynowicz wrote:
> (insn 2 4 3 2 (set (reg/v:HI 25 [ i ])
>         (zero_extend:HI (reg:QI 12 R12 [ i ])))
>      (nil))
> .....
> (insn 7 6 8 2 (set (reg:PSI 28)
>         (subreg:PSI (sign_extend:SI (reg/v:HI 25 [ i ])) 0))
>      (nil))
> 
> All we really need is:
> 
> (insn (set (reg:PSI 28 [ i ])
>         (zero_extend:PSI (reg:QI 12 R12 [ i ])))
>      (nil))
> 
> The zero extend is implicit with byte sized register operations, so this would
> result in assembly such as:
>   MOV.B R12, R12
> 
> My question is whether this is the type of thing that should be handled with a
> peephole optimization or if it is worth trying to fix the initial RTL generated
> by expand (or in a later RTL pass e.g. combine)?

combine does (well, it did in June, I don't think things changed since then)

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ i ])
    (sign_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ i ]))))
Failed to match this instruction:
(set (reg:PSI 28 [ i ])
    (sign_extend:PSI (and:HI (reg:HI 12 R12)
            (const_int 255 [0xff]))))

It could have just done that as

(set (reg:PSI 28)
     (zero_extend:PSI (reg:QI 12)))

as far as I can see?  Do you already have a machine pattern that matches
that?

Please file a PR for this.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?
  2019-09-23 13:30 ` Jeff Law
@ 2019-09-23 17:02   ` Jozef Lawrynowicz
  0 siblings, 0 replies; 7+ messages in thread
From: Jozef Lawrynowicz @ 2019-09-23 17:02 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc

On Mon, 23 Sep 2019 07:30:21 -0600
Jeff Law <law@redhat.com> wrote:

> On 9/23/19 5:56 AM, Jozef Lawrynowicz wrote:
> > For msp430-elf in the large memory model there are PSImode (20-bit) pointers.
> > POINTERS_EXTEND_UNSIGNED == 1 and "char" is unsigned by default.
> > 
> > We get poor code generated for the following code snippet:
> > 
> > const int table[2] = {1, 2};
> > 
> > int
> > foo (char i)
> > {
> >   return table[i];
> > }
> > 
> > The RTL generated by expand uses two insns to convert "i" to a register's
> > natural mode; there is a sign extension which would be unnecessary if the first
> > instruction had a PSImode register as the lvalue:
> > 
> > (insn 2 4 3 2 (set (reg/v:HI 25 [ i ])
> >         (zero_extend:HI (reg:QI 12 R12 [ i ])))
> >      (nil))
> > .....
> > (insn 7 6 8 2 (set (reg:PSI 28)
> >         (subreg:PSI (sign_extend:SI (reg/v:HI 25 [ i ])) 0))
> >      (nil))
> > 
> > All we really need is:
> > 
> > (insn (set (reg:PSI 28 [ i ])
> >         (zero_extend:PSI (reg:QI 12 R12 [ i ])))
> >      (nil))
> > 
> > The zero extend is implicit with byte sized register operations, so this would
> > result in assembly such as:
> >   MOV.B R12, R12
> > 
> > My question is whether this is the type of thing that should be handled with a
> > peephole optimization or if it is worth trying to fix the initial RTL generated
> > by expand (or in a later RTL pass e.g. combine)?  
> Ideally it'd be fixed earlier, but that can be painful due to the way
> promotions work.
> 
> I certainly recall doing some combiner patterns to catch the more
> egregious codegen issues on the mn102 port (similar, but with 24bit
> pointers).  I think I mostly focused on stuff that showed up with
> integer loop indices and their annoying promotion to ptrdiff_t when used
> for array indexing.
> 
> I'd bet if you extracted mn10200.md out of the archive the patterns
> would be obvious ;-)

Ok thanks for the tip, I'll take a look. I'm sure there's likely to be other
stuff that could be applicable to msp430 as well.

Jozef
> 
> In this specific case I'd think a combiner pattern ought to do the right
> thing except for the fact that you're probably dealing with hard
> registers which combine no longer handles as aggressively.
> 
> jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?
  2019-09-23 15:56 ` Segher Boessenkool
@ 2019-09-23 17:56   ` Jozef Lawrynowicz
  2019-09-23 22:53     ` Segher Boessenkool
  0 siblings, 1 reply; 7+ messages in thread
From: Jozef Lawrynowicz @ 2019-09-23 17:56 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc

On Mon, 23 Sep 2019 10:56:55 -0500
Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Mon, Sep 23, 2019 at 12:56:20PM +0100, Jozef Lawrynowicz wrote:
> > (insn 2 4 3 2 (set (reg/v:HI 25 [ i ])
> >         (zero_extend:HI (reg:QI 12 R12 [ i ])))
> >      (nil))
> > .....
> > (insn 7 6 8 2 (set (reg:PSI 28)
> >         (subreg:PSI (sign_extend:SI (reg/v:HI 25 [ i ])) 0))
> >      (nil))
> > 
> > All we really need is:
> > 
> > (insn (set (reg:PSI 28 [ i ])
> >         (zero_extend:PSI (reg:QI 12 R12 [ i ])))
> >      (nil))
> > 
> > The zero extend is implicit with byte sized register operations, so this would
> > result in assembly such as:
> >   MOV.B R12, R12
> > 
> > My question is whether this is the type of thing that should be handled with a
> > peephole optimization or if it is worth trying to fix the initial RTL generated
> > by expand (or in a later RTL pass e.g. combine)?  
> 
> combine does (well, it did in June, I don't think things changed since then)
> 
> Trying 2 -> 7:
>     2: r25:HI=zero_extend(R12:QI)
>       REG_DEAD R12:QI
>     7: r28:PSI=sign_extend(r25:HI)#0
>       REG_DEAD r25:HI
> Failed to match this instruction:
> (set (reg:PSI 28 [ i ])
>     (sign_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ i ]))))
> Failed to match this instruction:
> (set (reg:PSI 28 [ i ])
>     (sign_extend:PSI (and:HI (reg:HI 12 R12)
>             (const_int 255 [0xff]))))
> 
> It could have just done that as
> 
> (set (reg:PSI 28)
>      (zero_extend:PSI (reg:QI 12)))
> 
> as far as I can see?  Do you already have a machine pattern that matches
> that?

Yes that combination is what I was expecting to see. I guess since char is
unsigned, a zero extend is needed to widen it, and then for the following insn a
sign extend is added to widen the HImode integer.

There isn't currently a pattern which matches that, but adding it
doesn't fix the issue which is why I thought it might need to be fixed earlier
in RTL generation.
Here is the pattern that is missing:

+(define_insn "movqipsi"
+  [(set (match_operand:PSI                 0 "register_operand" "=r,r")
+       (zero_extend:PSI (match_operand:QI 1 "general_operand" "rYs,m")))]
+  "msp430x"
+  "@
+   MOV.B\t%1, %0
+   MOV%X1.B\t%1, %0"
+)
+

So will combine potentially be able to do away with (sign_extend (zero_extend))
in certain situations? I filed PR/91865 which has all the relevant details
from this thread.

I can add the following nameless insn and combine will catch this case and
generate the better, smaller code:

+(define_insn "*movqihipsi"
+  [(set (match_operand:PSI                 0 "register_operand" "=r,r")
+       (sign_extend:PSI (zero_extend:HI (match_operand:QI 1 "general_operand"
"rYs,m"))))]
+  "msp430x"
+  "@
+   MOV.B\t%1, %0
+   MOV%X1.B\t%1, %0"
+)
+

Thanks,
Jozef

> 
> Please file a PR for this.  Thanks!
> 
> 
> Segher

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer?
  2019-09-23 17:56   ` Jozef Lawrynowicz
@ 2019-09-23 22:53     ` Segher Boessenkool
  0 siblings, 0 replies; 7+ messages in thread
From: Segher Boessenkool @ 2019-09-23 22:53 UTC (permalink / raw)
  To: Jozef Lawrynowicz; +Cc: gcc

On Mon, Sep 23, 2019 at 06:56:12PM +0100, Jozef Lawrynowicz wrote:
> > It could have just done that as
> > 
> > (set (reg:PSI 28)
> >      (zero_extend:PSI (reg:QI 12)))
> > 
> > as far as I can see?  Do you already have a machine pattern that matches
> > that?
> 
> Yes that combination is what I was expecting to see. I guess since char is
> unsigned, a zero extend is needed to widen it, and then for the following insn a
> sign extend is added to widen the HImode integer.

Yeah, but a sign_extend:M1 of a zero_extend:M2 of an M3 (with M2>M3) is
just a zero_extend:M1 of that M3.  Somehow combine (or simplify-rtx)
missed that; or come to think of it, it probably does that for MODE_INT
things, but this is a MODE_PARTIAL_INT.  Hrm, would it be correct then.
I think it is?  M2 is a normal MODE_INT, all bits in M2 are defined, the
M2 value always is non-negative, the sign_extend:M1 always is the same
as the zero_extend:M1 would be.

> There isn't currently a pattern which matches that, but adding it
> doesn't fix the issue which is why I thought it might need to be fixed earlier
> in RTL generation.
> Here is the pattern that is missing:
> 
> +(define_insn "movqipsi"
> +  [(set (match_operand:PSI                 0 "register_operand" "=r,r")
> +       (zero_extend:PSI (match_operand:QI 1 "general_operand" "rYs,m")))]
> +  "msp430x"
> +  "@
> +   MOV.B\t%1, %0
> +   MOV%X1.B\t%1, %0"
> +)
> +
> 
> So will combine potentially be able to do away with (sign_extend (zero_extend))
> in certain situations?

Yes.

> I filed PR/91865 which has all the relevant details from this thread.

Thanks!

> I can add the following nameless insn and combine will catch this case and
> generate the better, smaller code:
> 
> +(define_insn "*movqihipsi"
> +  [(set (match_operand:PSI                 0 "register_operand" "=r,r")
> +       (sign_extend:PSI (zero_extend:HI (match_operand:QI 1 "general_operand"
> "rYs,m"))))]
> +  "msp430x"
> +  "@
> +   MOV.B\t%1, %0
> +   MOV%X1.B\t%1, %0"
> +)

Good to hear that already works!  combine should come up with the simpler
formulation though, let me see what I can do.


Segher

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-09-23 22:53 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-23 11:56 peephole or expand optimization for fixing poor code generated in conversion from QImode to PSImode pointer? Jozef Lawrynowicz
2019-09-23 13:25 ` Richard Biener
2019-09-23 13:30 ` Jeff Law
2019-09-23 17:02   ` Jozef Lawrynowicz
2019-09-23 15:56 ` Segher Boessenkool
2019-09-23 17:56   ` Jozef Lawrynowicz
2019-09-23 22:53     ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).