Right way to represent flag-setting arithmetic instructions in MD files

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Right way to represent flag-setting arithmetic instructions in MD files
@ 2017-03-10 10:10 Kyrill Tkachov
  2017-03-10 10:23 ` Eric Botcazou
  2017-03-10 10:38 ` Jakub Jelinek
  0 siblings, 2 replies; 8+ messages in thread
From: Kyrill Tkachov @ 2017-03-10 10:10 UTC (permalink / raw)
  To: gcc

Hi all,

Some (many?) targets have instructions that perform an arithmetic operation and set the condition flags based on the result.
For example, on aarch64, we have instructions like ADDS, SUBS, ANDS etc.
In the machine description we represent them as a PARALLEL pattern of a COMPARE and the arithmetic operation.
For example, the ADDS instruction is represented as:

(define_insn "add<mode>3_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
     (compare:CC_NZ
      (plus:GPI (match_operand:GPI 1 "register_operand" "%r,r,r")
            (match_operand:GPI 2 "aarch64_plus_operand" "r,I,J"))
      (const_int 0)))
    (set (match_operand:GPI 0 "register_operand" "=r,r,r")
     (plus:GPI (match_dup 1) (match_dup 2)))]

My understanding was that the order of the two in this pattern here doesn't matter because there is
an implicit PARALLEL around them, but I found that the compare-elimination pass (compare-elim.c)
assumes that the COMPARE set must be in the second position for it to do the transformations it wants.

Is there a recommended order for specifying the compare and the arithmetic operation in the MD files?
(in which case we should go through the aarch64 MD files and make sure the patterns are written the right
way round). Or is the compare-elimination pass just not robust enough? (In which case we should teach it
to look into both SETs of the pattern).

Thanks,
Kyrill

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Right way to represent flag-setting arithmetic instructions in MD files
  2017-03-10 10:10 Right way to represent flag-setting arithmetic instructions in MD files Kyrill Tkachov
@ 2017-03-10 10:23 ` Eric Botcazou
  2017-03-10 10:30   ` Kyrill Tkachov
  2017-03-10 10:38 ` Jakub Jelinek
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Botcazou @ 2017-03-10 10:23 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: gcc

> My understanding was that the order of the two in this pattern here doesn't
> matter because there is an implicit PARALLEL around them, but I found that
> the compare-elimination pass (compare-elim.c) assumes that the COMPARE set
> must be in the second position for it to do the transformations it wants.

Why do you want to use the compare-elimination pass exactly if the flags are 
exposed before reload, as is the case on Aarch64 I think?  The combiner is 
supposed to do the same job instead for these targets.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Right way to represent flag-setting arithmetic instructions in MD files
  2017-03-10 10:23 ` Eric Botcazou
@ 2017-03-10 10:30   ` Kyrill Tkachov
  2017-03-10 23:57     ` Segher Boessenkool
  0 siblings, 1 reply; 8+ messages in thread
From: Kyrill Tkachov @ 2017-03-10 10:30 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc

<resending due to mailing list failure>

On 10/03/17 10:23, Eric Botcazou wrote:
>> My understanding was that the order of the two in this pattern here doesn't
>> matter because there is an implicit PARALLEL around them, but I found that
>> the compare-elimination pass (compare-elim.c) assumes that the COMPARE set
>> must be in the second position for it to do the transformations it wants.
> Why do you want to use the compare-elimination pass exactly if the flags are
> exposed before reload, as is the case on Aarch64 I think?  The combiner is
> supposed to do the same job instead for these targets.
>

I'm trying to improve the cases where the result of the arithmetic
operation is used in multiple places besides the comparison.
For example:
         add     w0, w0, w1
         add     w1, w0, 2
         cmp     w0, 0

Combine will not attempt to merge the first ADD and CMP because W0 is used
in the second ADD. The compare-elimination pass so far looks like a far simpler
place to implement this transformation than combine.

Thanks,
Kyrill

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Right way to represent flag-setting arithmetic instructions in MD files
  2017-03-10 10:10 Right way to represent flag-setting arithmetic instructions in MD files Kyrill Tkachov
  2017-03-10 10:23 ` Eric Botcazou
@ 2017-03-10 10:38 ` Jakub Jelinek
  2017-03-10 11:45   ` Kyrylo Tkachov
  1 sibling, 1 reply; 8+ messages in thread
From: Jakub Jelinek @ 2017-03-10 10:38 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: gcc

On Fri, Mar 10, 2017 at 10:10:34AM +0000, Kyrill Tkachov wrote:
> Hi all,
> 
> Some (many?) targets have instructions that perform an arithmetic operation and set the condition flags based on the result.
> For example, on aarch64, we have instructions like ADDS, SUBS, ANDS etc.
> In the machine description we represent them as a PARALLEL pattern of a COMPARE and the arithmetic operation.
> For example, the ADDS instruction is represented as:
> 
> (define_insn "add<mode>3_compare0"
>   [(set (reg:CC_NZ CC_REGNUM)
>     (compare:CC_NZ
>      (plus:GPI (match_operand:GPI 1 "register_operand" "%r,r,r")
>            (match_operand:GPI 2 "aarch64_plus_operand" "r,I,J"))
>      (const_int 0)))
>    (set (match_operand:GPI 0 "register_operand" "=r,r,r")
>     (plus:GPI (match_dup 1) (match_dup 2)))]
> 
> My understanding was that the order of the two in this pattern here doesn't matter because there is
> an implicit PARALLEL around them, but I found that the compare-elimination pass (compare-elim.c)
> assumes that the COMPARE set must be in the second position for it to do the transformations it wants.
> 
> Is there a recommended order for specifying the compare and the arithmetic operation in the MD files?
> (in which case we should go through the aarch64 MD files and make sure the patterns are written the right
> way round). Or is the compare-elimination pass just not robust enough? (In which case we should teach it
> to look into both SETs of the pattern).

Please see http://gcc.gnu.org/ml/gcc-patches/2014-12/msg00584.html and
surrounding thread.

	Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Right way to represent flag-setting arithmetic instructions in MD files
  2017-03-10 10:38 ` Jakub Jelinek
@ 2017-03-10 11:45   ` Kyrylo Tkachov
  0 siblings, 0 replies; 8+ messages in thread
From: Kyrylo Tkachov @ 2017-03-10 11:45 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc, nd

<resending for 3rd time. Sorry my email system is having trouble :(>
On 10/03/17 10:38, Jakub Jelinek wrote:
> On Fri, Mar 10, 2017 at 10:10:34AM +0000, Kyrill Tkachov wrote:
>> Hi all,
>>
>> Some (many?) targets have instructions that perform an arithmetic operation and set the condition flags based on the result.
>> For example, on aarch64, we have instructions like ADDS, SUBS, ANDS etc.
>> In the machine description we represent them as a PARALLEL pattern of a COMPARE and the arithmetic operation.
>> For example, the ADDS instruction is represented as:
>>
>> (define_insn "add<mode>3_compare0"
>>    [(set (reg:CC_NZ CC_REGNUM)
>>      (compare:CC_NZ
>>       (plus:GPI (match_operand:GPI 1 "register_operand" "%r,r,r")
>>             (match_operand:GPI 2 "aarch64_plus_operand" "r,I,J"))
>>       (const_int 0)))
>>     (set (match_operand:GPI 0 "register_operand" "=r,r,r")
>>      (plus:GPI (match_dup 1) (match_dup 2)))]
>>
>> My understanding was that the order of the two in this pattern here doesn't matter because there is
>> an implicit PARALLEL around them, but I found that the compare-elimination pass (compare-elim.c)
>> assumes that the COMPARE set must be in the second position for it to do the transformations it wants.
>>
>> Is there a recommended order for specifying the compare and the arithmetic operation in the MD files?
>> (in which case we should go through the aarch64 MD files and make sure the patterns are written the right
>> way round). Or is the compare-elimination pass just not robust enough? (In which case we should teach it
>> to look into both SETs of the pattern).
> Please see http://gcc.gnu.org/ml/gcc-patches/2014-12/msg00584.html and
> surrounding thread.


Thanks, that is helpful.
It seems to me that teaching cmpelim to handle either order would not be a very complicated task.
Would folks object to making such a change?

Kyrill

> 	Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Right way to represent flag-setting arithmetic instructions in MD files
  2017-03-10 10:30   ` Kyrill Tkachov
@ 2017-03-10 23:57     ` Segher Boessenkool
  2017-03-13  9:53       ` Kyrill Tkachov
  0 siblings, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2017-03-10 23:57 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Eric Botcazou, gcc

On Fri, Mar 10, 2017 at 10:30:33AM +0000, Kyrill Tkachov wrote:
> I'm trying to improve the cases where the result of the arithmetic
> operation is used in multiple places besides the comparison.
> For example:
>         add     w0, w0, w1
>         add     w1, w0, 2
>         cmp     w0, 0
> 
> Combine will not attempt to merge the first ADD and CMP because W0 is used
> in the second ADD.

So the LOG_LINK from the first instruction will point at the second,
and combining the first and the third isn't considered (because the
first and second insns don't combine).

combine doesn't try to combine all producer-consumer pairs, only
producer with first consumer, because it would not often help and
could easily take much more time.  On the other hand I'd love to get
rid of the LOG_LINKS and use DF directly.

Note that combining the first and third insns in your example requires
to put the combined insns in the place of the first insn, where
normally it would put it at the third insn.  Maybe we could treat
compares specially?

Do such cases happen a lot, btw?

Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Right way to represent flag-setting arithmetic instructions in MD files
  2017-03-10 23:57     ` Segher Boessenkool
@ 2017-03-13  9:53       ` Kyrill Tkachov
  2017-03-13 16:43         ` Segher Boessenkool
  0 siblings, 1 reply; 8+ messages in thread
From: Kyrill Tkachov @ 2017-03-13  9:53 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Eric Botcazou, gcc


On 10/03/17 23:56, Segher Boessenkool wrote:
> On Fri, Mar 10, 2017 at 10:30:33AM +0000, Kyrill Tkachov wrote:
>> I'm trying to improve the cases where the result of the arithmetic
>> operation is used in multiple places besides the comparison.
>> For example:
>>          add     w0, w0, w1
>>          add     w1, w0, 2
>>          cmp     w0, 0
>>
>> Combine will not attempt to merge the first ADD and CMP because W0 is used
>> in the second ADD.
> So the LOG_LINK from the first instruction will point at the second,
> and combining the first and the third isn't considered (because the
> first and second insns don't combine).

Yeah, that's what I'm seeing

> combine doesn't try to combine all producer-consumer pairs, only
> producer with first consumer, because it would not often help and
> could easily take much more time.  On the other hand I'd love to get
> rid of the LOG_LINKS and use DF directly.

Are there any significant barriers to moving to DF?
I heard some other passes (e.g. LRA) are reluctant to use
DF because it's too slow.


> Note that combining the first and third insns in your example requires
> to put the combined insns in the place of the first insn, where
> normally it would put it at the third insn.  Maybe we could treat
> compares specially?

We'd also need to do a SELECT_CC_MODE if we're replacing the operands of the comparison
and update the users of the CC reg from that comparison I believe.

>
> Do such cases happen a lot, btw?

I hacked together an extension to the cmpelim pass that merges such comparisons
(that previous passes like combine don't catch) and on aarch64 for SPEC2006 it merged
about 480 compares. If combine were extended as you described above I think it could catch
it, but it's a more complex and fragile pass than I feel comfortable hacking for this :)

Thanks,
Kyrill
>
> Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Right way to represent flag-setting arithmetic instructions in MD files
  2017-03-13  9:53       ` Kyrill Tkachov
@ 2017-03-13 16:43         ` Segher Boessenkool
  0 siblings, 0 replies; 8+ messages in thread
From: Segher Boessenkool @ 2017-03-13 16:43 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Eric Botcazou, gcc

On Mon, Mar 13, 2017 at 09:53:11AM +0000, Kyrill Tkachov wrote:
> >combine doesn't try to combine all producer-consumer pairs, only
> >producer with first consumer, because it would not often help and
> >could easily take much more time.  On the other hand I'd love to get
> >rid of the LOG_LINKS and use DF directly.
> 
> Are there any significant barriers to moving to DF?
> I heard some other passes (e.g. LRA) are reluctant to use
> DF because it's too slow.

Updating DF info can be slow.  I'm not sure how bad it really would be
for combine.  But, we do not even need to do a "full" update inside
combine.

Looking at all producer-consumer pairs is just as linear as only
looking at the first consumer for every producer.  It is more work
of course, and it does not generally help all that much.

combine has quite a few problems caused by not working with DF (both
correctness bugs and missed optimisations).

> >Note that combining the first and third insns in your example requires
> >to put the combined insns in the place of the first insn, where
> >normally it would put it at the third insn.  Maybe we could treat
> >compares specially?
> 
> We'd also need to do a SELECT_CC_MODE if we're replacing the operands of 
> the comparison
> and update the users of the CC reg from that comparison I believe.

Yeah probably.

> >Do such cases happen a lot, btw?
> 
> I hacked together an extension to the cmpelim pass that merges such 
> comparisons
> (that previous passes like combine don't catch) and on aarch64 for SPEC2006 
> it merged
> about 480 compares. If combine were extended as you described above I think 
> it could catch
> it, but it's a more complex and fragile pass than I feel comfortable 
> hacking for this :)

Some people call it brave, some people call it foolish.


Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-03-13 16:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-10 10:10 Right way to represent flag-setting arithmetic instructions in MD files Kyrill Tkachov
2017-03-10 10:23 ` Eric Botcazou
2017-03-10 10:30   ` Kyrill Tkachov
2017-03-10 23:57     ` Segher Boessenkool
2017-03-13  9:53       ` Kyrill Tkachov
2017-03-13 16:43         ` Segher Boessenkool
2017-03-10 10:38 ` Jakub Jelinek
2017-03-10 11:45   ` Kyrylo Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).