public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* RTL alternative selection question
@ 2019-09-23 10:56 Andrew Stubbs
  2019-09-23 13:23 ` Richard Biener
  2019-09-23 14:15 ` Segher Boessenkool
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Stubbs @ 2019-09-23 10:56 UTC (permalink / raw)
  To: GCC Development

Hi All,

I'm trying to figure out how to prevent LRA selecting alternatives that 
result in values being copied from A to B for one instruction, and then 
immediately back from B to A again, when there are apparently more 
sensible alternatives available.

I have an insn with the following pattern (simplified here):

   [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
         (ashift:DI
           (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
           (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
    (clobber (match_scratch:BI 3                 "=cs,X"))]

There are two lshl instructions; one for scalar registers and one for 
vector registers. The vector here has only a single element, so the two 
are equivalent, but we need to pick one.

This operation works for both register files, but there are other 
operations that exist only on one side or the other, so we want those to 
determine in which register file the values are allocated.

Unfortunately, the compiler (almost?) exclusively selects the second 
alternative, even when this means moving the values from one register 
file to the other, and then back again.

The problem is that the scalar instruction clobbers the CC register, 
which results in a "reject++" for that alternative in the LRA dump.

I can fix this by disparaging the second alternative in the pattern:

    (clobber (match_scratch:BI 3                 "=cs,?X"))

This appears to do the right thing. I can now see both kinds of shift 
appearing in the assembly dumps.

But that does "reject+=6", which makes me worry that the balance has now 
shifted too far the other way.

Does this make sense?

    (clobber (match_scratch:BI 3                 "=^cs,?X"))

Is there a better way to discourage the copies? Perhaps without editing 
all the patterns?

What I want is for the two alternatives to appear equal when the CC 
register is not live, and when CC is live for LRA to be able to choose 
between reloading CC or switching to the other alternative according to 
the situation, not on the pattern alone.

Thanks in advance.

Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RTL alternative selection question
  2019-09-23 10:56 RTL alternative selection question Andrew Stubbs
@ 2019-09-23 13:23 ` Richard Biener
  2019-09-23 14:15 ` Segher Boessenkool
  1 sibling, 0 replies; 10+ messages in thread
From: Richard Biener @ 2019-09-23 13:23 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: GCC Development

On Mon, Sep 23, 2019 at 12:56 PM Andrew Stubbs <andrew_stubbs@mentor.com> wrote:
>
> Hi All,
>
> I'm trying to figure out how to prevent LRA selecting alternatives that
> result in values being copied from A to B for one instruction, and then
> immediately back from B to A again, when there are apparently more
> sensible alternatives available.
>
> I have an insn with the following pattern (simplified here):
>
>    [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
>          (ashift:DI
>            (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
>            (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
>     (clobber (match_scratch:BI 3                 "=cs,X"))]
>
> There are two lshl instructions; one for scalar registers and one for
> vector registers. The vector here has only a single element, so the two
> are equivalent, but we need to pick one.
>
> This operation works for both register files, but there are other
> operations that exist only on one side or the other, so we want those to
> determine in which register file the values are allocated.
>
> Unfortunately, the compiler (almost?) exclusively selects the second
> alternative, even when this means moving the values from one register
> file to the other, and then back again.
>
> The problem is that the scalar instruction clobbers the CC register,
> which results in a "reject++" for that alternative in the LRA dump.
>
> I can fix this by disparaging the second alternative in the pattern:
>
>     (clobber (match_scratch:BI 3                 "=cs,?X"))
>
> This appears to do the right thing. I can now see both kinds of shift
> appearing in the assembly dumps.
>
> But that does "reject+=6", which makes me worry that the balance has now
> shifted too far the other way.
>
> Does this make sense?
>
>     (clobber (match_scratch:BI 3                 "=^cs,?X"))
>
> Is there a better way to discourage the copies? Perhaps without editing
> all the patterns?
>
> What I want is for the two alternatives to appear equal when the CC
> register is not live, and when CC is live for LRA to be able to choose
> between reloading CC or switching to the other alternative according to
> the situation, not on the pattern alone.
>
> Thanks in advance.

I've faced a similar situation for PR91154 where for x86 there's a
vector min/max but no scalar one and the vector min/max is faster
than the conditional move sequence on the integer side.

In the end the extra freedom for the RA is bad since it doesn't
do a global decision on the initial regset preference.

So - don't do it.  In the simpler case as it is yours it would indeed
be nice if it just worked...

Richard.

>
> Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RTL alternative selection question
  2019-09-23 10:56 RTL alternative selection question Andrew Stubbs
  2019-09-23 13:23 ` Richard Biener
@ 2019-09-23 14:15 ` Segher Boessenkool
  2019-09-23 14:39   ` Andrew Stubbs
  1 sibling, 1 reply; 10+ messages in thread
From: Segher Boessenkool @ 2019-09-23 14:15 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: GCC Development

On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
>   [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
>         (ashift:DI
>           (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
>           (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
>    (clobber (match_scratch:BI 3                 "=cs,X"))]

> Unfortunately, the compiler (almost?) exclusively selects the second 
> alternative, even when this means moving the values from one register 
> file to the other, and then back again.
> 
> The problem is that the scalar instruction clobbers the CC register, 
> which results in a "reject++" for that alternative in the LRA dump.

What kind of reject?  It prints a reason, too.

Maybe we should make a macro/hook to never do that for your target, for
those flags registers anyway.


Segher

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RTL alternative selection question
  2019-09-23 14:15 ` Segher Boessenkool
@ 2019-09-23 14:39   ` Andrew Stubbs
  2019-09-23 15:21     ` Segher Boessenkool
                       ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andrew Stubbs @ 2019-09-23 14:39 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Development

On 23/09/2019 15:15, Segher Boessenkool wrote:
> On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
>>    [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
>>          (ashift:DI
>>            (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
>>            (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
>>     (clobber (match_scratch:BI 3                 "=cs,X"))]
> 
>> Unfortunately, the compiler (almost?) exclusively selects the second
>> alternative, even when this means moving the values from one register
>> file to the other, and then back again.
>>
>> The problem is that the scalar instruction clobbers the CC register,
>> which results in a "reject++" for that alternative in the LRA dump.
> 
> What kind of reject?  It prints a reason, too.

      0 Non input pseudo reload: reject++

> Maybe we should make a macro/hook to never do that for your target, for
> those flags registers anyway.

That wouldn't be horrible. I suppose I could look at doing that. Any 
suggestions how it should or should not work?

Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RTL alternative selection question
  2019-09-23 14:39   ` Andrew Stubbs
@ 2019-09-23 15:21     ` Segher Boessenkool
  2019-09-23 15:26       ` Andrew Stubbs
  2019-09-23 19:14     ` Richard Sandiford
  2019-10-01 12:12     ` Andrew Stubbs
  2 siblings, 1 reply; 10+ messages in thread
From: Segher Boessenkool @ 2019-09-23 15:21 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: GCC Development

On Mon, Sep 23, 2019 at 03:39:08PM +0100, Andrew Stubbs wrote:
> On 23/09/2019 15:15, Segher Boessenkool wrote:
> >On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
> >>   [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
> >>         (ashift:DI
> >>           (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
> >>           (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
> >>    (clobber (match_scratch:BI 3                 "=cs,X"))]
> >
> >>Unfortunately, the compiler (almost?) exclusively selects the second
> >>alternative, even when this means moving the values from one register
> >>file to the other, and then back again.
> >>
> >>The problem is that the scalar instruction clobbers the CC register,
> >>which results in a "reject++" for that alternative in the LRA dump.
> >
> >What kind of reject?  It prints a reason, too.
> 
>      0 Non input pseudo reload: reject++
> 
> >Maybe we should make a macro/hook to never do that for your target, for
> >those flags registers anyway.
> 
> That wouldn't be horrible. I suppose I could look at doing that. Any 
> suggestions how it should or should not work?

Pass the register class or constraint or something like that to the hook,
then based on what the hook returns, either or not do the reject?  So your
hook would special-case SCC_CONDITIONAL_REG, maybe a few more similar ones
(those are confusing names btw, _REG but they are register classes).


Segher

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RTL alternative selection question
  2019-09-23 15:21     ` Segher Boessenkool
@ 2019-09-23 15:26       ` Andrew Stubbs
  2019-09-23 16:24         ` Jeff Law
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Stubbs @ 2019-09-23 15:26 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: GCC Development

On 23/09/2019 16:21, Segher Boessenkool wrote:
> Pass the register class or constraint or something like that to the hook,
> then based on what the hook returns, either or not do the reject?  So your
> hook would special-case SCC_CONDITIONAL_REG, maybe a few more similar ones
> (those are confusing names btw, _REG but they are register classes).

It's a class of one register!

Thanks

Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RTL alternative selection question
  2019-09-23 15:26       ` Andrew Stubbs
@ 2019-09-23 16:24         ` Jeff Law
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff Law @ 2019-09-23 16:24 UTC (permalink / raw)
  To: Andrew Stubbs, Segher Boessenkool; +Cc: GCC Development

On 9/23/19 9:26 AM, Andrew Stubbs wrote:
> On 23/09/2019 16:21, Segher Boessenkool wrote:
>> Pass the register class or constraint or something like that to the hook,
>> then based on what the hook returns, either or not do the reject?  So
>> your
>> hook would special-case SCC_CONDITIONAL_REG, maybe a few more similar
>> ones
>> (those are confusing names btw, _REG but they are register classes).
> 
> It's a class of one register!
So maybe use CLASS_LIKELY_SPILLED_P as the trigger?

jeff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RTL alternative selection question
  2019-09-23 14:39   ` Andrew Stubbs
  2019-09-23 15:21     ` Segher Boessenkool
@ 2019-09-23 19:14     ` Richard Sandiford
  2019-10-01 12:12     ` Andrew Stubbs
  2 siblings, 0 replies; 10+ messages in thread
From: Richard Sandiford @ 2019-09-23 19:14 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Segher Boessenkool, GCC Development

Andrew Stubbs <andrew_stubbs@mentor.com> writes:
> On 23/09/2019 15:15, Segher Boessenkool wrote:
>> On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
>>>    [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
>>>          (ashift:DI
>>>            (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
>>>            (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
>>>     (clobber (match_scratch:BI 3                 "=cs,X"))]
>> 
>>> Unfortunately, the compiler (almost?) exclusively selects the second
>>> alternative, even when this means moving the values from one register
>>> file to the other, and then back again.
>>>
>>> The problem is that the scalar instruction clobbers the CC register,
>>> which results in a "reject++" for that alternative in the LRA dump.
>> 
>> What kind of reject?  It prints a reason, too.
>
>       0 Non input pseudo reload: reject++

It looks like that comes from:

		  /* Input reloads can be inherited more often than
		     output reloads can be removed, so penalize output
		     reloads.  */
		  if (!REG_P (op) || curr_static_id->operand[nop].type != OP_IN)
		    {
		      if (lra_dump_file != NULL)
			fprintf
			  (lra_dump_file,
			   "            %d Non input pseudo reload: reject++\n",
			   nop);
		      reject++;
		    }

That's a good reason for increasing the cost of output reloads of values
vs. input reloads of values, but I don't think it covers your case of
reloading a SCRATCH, which can never be inherited anyway.  So if the
reason in the comment is the real reason (definitely sounds plausible)
then perhaps we should just exclude SCRATCHes?

I don't think this is really target-specific.  MIPS has a very similar
situation with multiplication, and again the SCRATCH reloads shouldn't
count.  Same for some SVE patterns.

Richard

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RTL alternative selection question
  2019-09-23 14:39   ` Andrew Stubbs
  2019-09-23 15:21     ` Segher Boessenkool
  2019-09-23 19:14     ` Richard Sandiford
@ 2019-10-01 12:12     ` Andrew Stubbs
  2019-10-02  0:07       ` Segher Boessenkool
  2 siblings, 1 reply; 10+ messages in thread
From: Andrew Stubbs @ 2019-10-01 12:12 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Sandiford; +Cc: GCC Development

On 23/09/2019 15:39, Andrew Stubbs wrote:
> On 23/09/2019 15:15, Segher Boessenkool wrote:
>> On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
>>>    [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
>>>          (ashift:DI
>>>            (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
>>>            (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
>>>     (clobber (match_scratch:BI 3                 "=cs,X"))]
>>
>>> Unfortunately, the compiler (almost?) exclusively selects the second
>>> alternative, even when this means moving the values from one register
>>> file to the other, and then back again.
>>>
>>> The problem is that the scalar instruction clobbers the CC register,
>>> which results in a "reject++" for that alternative in the LRA dump.
>>
>> What kind of reject?  It prints a reason, too.
> 
>       0 Non input pseudo reload: reject++

Apparently I was confused by operand "0" versus alternative "0". That 
message did occur, but it wasn't the only one. Here's all of it:

              0 Non input pseudo reload: reject++
              3 Scratch win: reject+=2
            alt=0,overall=9,losers=1,rld_nregs=2
            alt=1,overall=6,losers=1,rld_nregs=2

I don't understand why the "reject++" occurs, but presumably has to do 
with the "Sg" register availability somehow?

The "Scratch win" part comes from this code:

           /* We simulate the behavior of old reload here.
              Although scratches need hard registers and it
              might result in spilling other pseudos, no reload
              insns are generated for the scratches.  So it
              might cost something but probably less than old
              reload pass believes.  */
           if (scratch_p)
             {
               if (lra_dump_file != NULL)
                 fprintf (lra_dump_file,
                          "            %d Scratch win: reject+=2\n",
                          nop);
               reject += 2;
             }
         }

Would it make sense to skip this reject when CLASS_LIKELY_SPILLED_P, as 
Jeff suggested?

Unfortunately, removing the "Scratch win" penalty alone is not enough 
for LRA to select the first alternative -- at least, no in my testcase 
-- so I need to understand the "non input pseudo reload" issue as well. 
I can see why it fires for alt0, but not why it does not fire for alt1.

Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: RTL alternative selection question
  2019-10-01 12:12     ` Andrew Stubbs
@ 2019-10-02  0:07       ` Segher Boessenkool
  0 siblings, 0 replies; 10+ messages in thread
From: Segher Boessenkool @ 2019-10-02  0:07 UTC (permalink / raw)
  To: Andrew Stubbs; +Cc: Richard Sandiford, GCC Development

On Tue, Oct 01, 2019 at 01:12:06PM +0100, Andrew Stubbs wrote:
> On 23/09/2019 15:39, Andrew Stubbs wrote:
> >On 23/09/2019 15:15, Segher Boessenkool wrote:
> >>On Mon, Sep 23, 2019 at 11:56:27AM +0100, Andrew Stubbs wrote:
> >>>   [(set (match_operand:DI 0 "register_operand"  "=Sg,v")
> >>>         (ashift:DI
> >>>           (match_operand:DI 1 "gcn_alu_operand" " Sg,v")
> >>>           (match_operand:SI 2 "gcn_alu_operand" " Sg,v")))
> >>>    (clobber (match_scratch:BI 3                 
> >>>"=cs,X"))]
> >>
> >>>Unfortunately, the compiler (almost?) exclusively selects the second
> >>>alternative, even when this means moving the values from one register
> >>>file to the other, and then back again.
> >>>
> >>>The problem is that the scalar instruction clobbers the CC register,
> >>>which results in a "reject++" for that alternative in the LRA dump.
> >>
> >>What kind of reject?  It prints a reason, too.
> >
> >      0 Non input pseudo reload: reject++
> 
> Apparently I was confused by operand "0" versus alternative "0". That 
> message did occur, but it wasn't the only one. Here's all of it:
> 
>              0 Non input pseudo reload: reject++
>              3 Scratch win: reject+=2
>            alt=0,overall=9,losers=1,rld_nregs=2
>            alt=1,overall=6,losers=1,rld_nregs=2
> 
> I don't understand why the "reject++" occurs, but presumably has to do 
> with the "Sg" register availability somehow?

operands[0] (the output) needed a reload, apparently.

[ snip ]

> Unfortunately, removing the "Scratch win" penalty alone is not enough 
> for LRA to select the first alternative -- at least, no in my testcase 
> -- so I need to understand the "non input pseudo reload" issue as well. 
> I can see why it fires for alt0, but not why it does not fire for alt1.

The reload it decided to do there was for operands[1] or operands[2]?


Segher

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-10-02  0:07 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-23 10:56 RTL alternative selection question Andrew Stubbs
2019-09-23 13:23 ` Richard Biener
2019-09-23 14:15 ` Segher Boessenkool
2019-09-23 14:39   ` Andrew Stubbs
2019-09-23 15:21     ` Segher Boessenkool
2019-09-23 15:26       ` Andrew Stubbs
2019-09-23 16:24         ` Jeff Law
2019-09-23 19:14     ` Richard Sandiford
2019-10-01 12:12     ` Andrew Stubbs
2019-10-02  0:07       ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).