public inbox for gcc-patches@gcc.gnu.org
* RFC: Combine of compare & and oddity
@ 2015-09-02 17:09 Wilco Dijkstra
  2015-09-02 18:49 ` Segher Boessenkool
  2015-09-02 20:45 ` Jeff Law
  0 siblings, 2 replies; 26+ messages in thread
From: Wilco Dijkstra @ 2015-09-02 17:09 UTC (permalink / raw)
  To: 'GCC Patches', 'Segher Boessenkool'

Hi,

Combine canonicalizes certain AND masks in a comparison with zero into extracts of the widest
register type. During matching these are expanded into a very inefficient sequence that fails to
match. For example (x & 2) == 0 is matched in combine like this:

Failed to match this instruction:
(set (reg:CC 66 cc)
    (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
            (const_int 1 [0x1])
            (const_int 1 [0x1]))
        (const_int 0 [0])))
Failed to match this instruction:
(set (reg:CC 66 cc)
    (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
                (const_int 1 [0x1]))
            (const_int 1 [0x1]))
        (const_int 0 [0])))

Neither matches the AArch64 patterns for ANDS/TST (which is just compare and AND). If the immediate
is not a power of 2 or a power of 2 -1 then it matches correctly as expected.

I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion (it even uses shifts by 0 at
times which are unlikely to ever match anything). Why does combine not try to match the obvious (x &
C) != 0 case? Single-bit and mask tests are very common, so this blocks efficient code generation on
many targets.
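
For concreteness, here is a minimal C reproducer of the shape being discussed
(hypothetical names, and a close variant of the reproducer given later in the
thread; the assembly in the comment shows the compare-and-AND form under
discussion, though a branch like this may instead end up as a single tbz via
the *tbeqdi1 pattern):

/* A single-bit test feeding a branch.  The hoped-for code is a single
   flag-setting AND, e.g.
       tst   w0, #2
       b.eq  .Lskip
   rather than a zero_extract/shift sequence that fails to match.  */
void g (void);

void
f (int x)
{
  if (x & 2)
    g ();
}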

It's trivial to change change_zero_ext to expand extracts always into AND and remove the redundant
subreg. However wouldn't it make more sense to never special case certain AND immediate in the first
place?

Wilco







* Re: RFC: Combine of compare & and oddity
  2015-09-02 17:09 RFC: Combine of compare & and oddity Wilco Dijkstra
@ 2015-09-02 18:49 ` Segher Boessenkool
  2015-09-02 19:56   ` Segher Boessenkool
  2015-09-03 11:51   ` Wilco Dijkstra
  2015-09-02 20:45 ` Jeff Law
  1 sibling, 2 replies; 26+ messages in thread
From: Segher Boessenkool @ 2015-09-02 18:49 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: 'GCC Patches'

Hi Wilco,

On Wed, Sep 02, 2015 at 06:09:24PM +0100, Wilco Dijkstra wrote:
> Combine canonicalizes certain AND masks in a comparison with zero into extracts of the widest
> register type. During matching these are expanded into a very inefficient sequence that fails to
> match. For example (x & 2) == 0 is matched in combine like this:
> 
> Failed to match this instruction:
> (set (reg:CC 66 cc)
>     (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>             (const_int 1 [0x1])
>             (const_int 1 [0x1]))
>         (const_int 0 [0])))

Yes.  Some processors even have specific instructions to do this.

> Failed to match this instruction:
> (set (reg:CC 66 cc)
>     (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>                 (const_int 1 [0x1]))
>             (const_int 1 [0x1]))
>         (const_int 0 [0])))

This is after r223067.  Combine tests only one "final" instruction; that
revision rewrites zero_ext* if it doesn't match and tries again.  This
helps for processors that can do more generic masks (like rs6000, and I
believe also aarch64?): without it, you need many more patterns to match
all the zero_ext{ract,end} cases.

> Neither matches the AArch64 patterns for ANDS/TST (which is just compare and AND). If the immediate
> is not a power of 2 or a power of 2 -1 then it matches correctly as expected.
> 
> I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion

It is zero_extract(x,1,1) really.  This is convenient for (old and embedded)
processors that have special bit-test instructions.  If we now want combine
to not do this, we'll have to update all backends that rely on it.

> (it even uses shifts by 0 at
> times which are unlikely to ever match anything).

It matches fine on some targets.  It certainly looks silly, and could be
expressed simpler.  Patch should be simple; watch this space / remind me /
or file a PR.

> Why does combine not try to match the obvious (x &
> C) != 0 case? Single-bit and mask tests are very common, so this blocks efficient code generation on
> many targets.

They are common, and many processors had instructions for them, which is
(supposedly) why combine special-cased them.

> It's trivial to change change_zero_ext to expand extracts always into AND and remove the redundant
> subreg.

Not really trivial...  Think about comparisons...

int32_t x;
((x >> 31) & 1) > 0;
// is not the same as
(x & 0x80000000) > 0; // signed comparison

(You do not easily know what the COMPARE is used for).
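
To spell out the difference with a concrete value (a sketch; the shift of a
negative value and the out-of-range conversion to int32_t are
implementation-defined in C, but behave as shown with GCC on the usual
two's-complement targets):

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  int32_t x = INT32_MIN;                 /* only bit 31 set */

  /* Extracting bit 31 yields 1, so this signed "> 0" is true.  */
  int a = ((x >> 31) & 1) > 0;

  /* Keeping the masked value signed, it is INT32_MIN, i.e. negative, so
     the signed "> 0" is false.  The cast avoids the promotion to unsigned
     that the plain 0x80000000 literal would otherwise cause.  */
  int b = (int32_t) (x & UINT32_C (0x80000000)) > 0;

  printf ("%d %d\n", a, b);              /* prints "1 0" */
  return 0;
}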

> However wouldn't it make more sense to never special case certain AND immediate in the first
> place?

Yes, but we need to make sure no targets regress (i.e. by rewriting patterns
for those that do).  We need to know the impact of such a change, at the least.


Segher


* Re: RFC: Combine of compare & and oddity
  2015-09-02 17:09 RFC: Combine of compare & and oddity Wilco Dijkstra
  2015-09-02 18:49 ` Segher Boessenkool
@ 2015-09-02 20:45 ` Jeff Law
  2015-09-02 21:05   ` Segher Boessenkool
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff Law @ 2015-09-02 20:45 UTC (permalink / raw)
  To: Wilco Dijkstra, 'GCC Patches', 'Segher Boessenkool'

On 09/02/2015 11:09 AM, Wilco Dijkstra wrote:
> Hi,
>
> Combine canonicalizes certain AND masks in a comparison with zero into extracts of the widest
> register type. During matching these are expanded into a very inefficient sequence that fails to
> match. For example (x & 2) == 0 is matched in combine like this:
>
> Failed to match this instruction:
> (set (reg:CC 66 cc)
>      (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>              (const_int 1 [0x1])
>              (const_int 1 [0x1]))
>          (const_int 0 [0])))
That's a fairly standard looking bit test.  One could argue about the 
mode of the extraction being problematical because it introduces the 
SUBREG, but overall it's a normal looking bit test.  Various ports know 
about this form.


> Failed to match this instruction:
> (set (reg:CC 66 cc)
>      (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>                  (const_int 1 [0x1]))
>              (const_int 1 [0x1]))
>          (const_int 0 [0])))
Yea, this is an alternative form.   I don't offhand remember how/why 
this form appears, but it certainly does.  I don't think any ports 
handle this form (but I certainly haven't done any checks), but I believe 
combine creates it primarily for internal purposes.

>
> Neither matches the AArch64 patterns for ANDS/TST (which is just compare and AND). If the immediate
> is not a power of 2 or a power of 2 -1 then it matches correctly as expected.
It might be advisable to recognize the first form.

>
> I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion (it even uses shifts by 0 at
> times which are unlikely to ever match anything). Why does combine not try to match the obvious (x &
> C) != 0 case? Single-bit and mask tests are very common, so this blocks efficient code generation on
> many targets.
From md.texi:

@cindex @code{zero_extract}, canonicalization of
@cindex @code{sign_extract}, canonicalization of
@item
Equality comparisons of a group of bits (usually a single bit) with zero
will be written using @code{zero_extract} rather than the equivalent
@code{and} or @code{sign_extract} operations.

Jeff


* Re: RFC: Combine of compare & and oddity
  2015-09-02 20:45 ` Jeff Law
@ 2015-09-02 21:05   ` Segher Boessenkool
  2015-09-03 15:26     ` Jeff Law
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2015-09-02 21:05 UTC (permalink / raw)
  To: Jeff Law; +Cc: Wilco Dijkstra, 'GCC Patches'

On Wed, Sep 02, 2015 at 01:59:58PM -0600, Jeff Law wrote:
> >(set (reg:CC 66 cc)
> >     (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 
> >     0)
> >                 (const_int 1 [0x1]))
> >             (const_int 1 [0x1]))
> >         (const_int 0 [0])))
> Yea, this is an alternative form.   I don't offhand remember how/why 
> this form appears, but it certainly does.  I don't think any ports 
> handle this form (but I certainly haven't done any checks), but I believe 
> combine creates it primarily for internal purposes.

Combine replaces zero_ext* with equivalent shift/and patterns and tries
again, if things don't match.  Targets with more generic masking insns
do not want to describe the very many cases that can be described with
zero_ext* separately.

rs6000 handles this exact pattern, btw.  And I'll be very happy if we can
just drop it :-)

> >I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion (it 
> >even uses shifts by 0 at
> >times which are unlikely to ever match anything). Why does combine not try 
> >to match the obvious (x &
> >C) != 0 case? Single-bit and mask tests are very common, so this blocks 
> >efficient code generation on
> >many targets.
> From md.texi:
> 
> @cindex @code{zero_extract}, canonicalization of
> @cindex @code{sign_extract}, canonicalization of
> @item
> Equality comparisons of a group of bits (usually a single bit) with zero
> will be written using @code{zero_extract} rather than the equivalent
> @code{and} or @code{sign_extract} operations.

Oh it's even documented, thanks.  I do still think we should think of
changing this.


Segher


* RE: RFC: Combine of compare & and oddity
  2015-09-02 18:49 ` Segher Boessenkool
  2015-09-02 19:56   ` Segher Boessenkool
@ 2015-09-03 11:51   ` Wilco Dijkstra
  2015-09-03 13:57     ` Segher Boessenkool
  1 sibling, 1 reply; 26+ messages in thread
From: Wilco Dijkstra @ 2015-09-03 11:51 UTC (permalink / raw)
  To: 'Segher Boessenkool'; +Cc: 'GCC Patches'

> Segher Boessenkool wrote:
> Hi Wilco,
> 
> On Wed, Sep 02, 2015 at 06:09:24PM +0100, Wilco Dijkstra wrote:
> > Combine canonicalizes certain AND masks in a comparison with zero into extracts of the
> widest
> > register type. During matching these are expanded into a very inefficient sequence that
> fails to
> > match. For example (x & 2) == 0 is matched in combine like this:
> >
> > Failed to match this instruction:
> > (set (reg:CC 66 cc)
> >     (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> >             (const_int 1 [0x1])
> >             (const_int 1 [0x1]))
> >         (const_int 0 [0])))
> 
> Yes.  Some processors even have specific instructions to do this.

However there are 2 issues with this, one is the spurious subreg, the other is 
that only a subset of legal zero_extracts are tried (only single bit and the
degenerate case of zero_extract with shift of 0 - which I think should not be a
zero_extract). All other AND immediate remain as AND. 

So to emit an AND on targets without such specific instructions, or where such 
instructions are more expensive than a simple AND (*), you need now at least 3 different 
backend patterns for any instruction that can emit AND immediate...

(*) I think that is another issue in combine - when both alternatives match you
want to select the lowest cost one, not the first one that matches.

> > Failed to match this instruction:
> > (set (reg:CC 66 cc)
> >     (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> >                 (const_int 1 [0x1]))
> >             (const_int 1 [0x1]))
> >         (const_int 0 [0])))
> 
> This is after r223067.  Combine tests only one "final" instruction; that
> revision rewrites zero_ext* if it doesn't match and tries again.  This
> helps for processors that can do more generic masks (like rs6000, and I
> believe also aarch64?): without it, you need many more patterns to match
> all the zero_ext{ract,end} cases.

But there are more efficient ways to emit single bit and masks tests that apply
to most CPUs rather than doing something specific that works for just one target
only. For example single bit test is a simple shift into carry flag or into the 
sign bit, and for mask tests, you shift out all the non-mask bits.
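
As a rough source-level illustration of those two rewrites (a sketch only:
whether either form is actually cheaper is entirely target-dependent, the
carry-flag variant has no portable C spelling, the helper names are made up,
and the sign test relies on the usual two's-complement conversion):

#include <stdint.h>

/* Bit test via the sign bit: (x & (1u << n)) != 0 is equivalent to shifting
   bit n up into bit 31 and testing the sign, for 0 <= n <= 31.  */
int
bit_test_via_sign (uint32_t x, unsigned n)
{
  return (int32_t) (x << (31 - n)) < 0;
}

/* Low-order mask test by shifting out the non-mask bits:
   (x & 0x7) != 0 is equivalent to (x << 29) != 0 for 32-bit x.  */
int
low_mask_test (uint32_t x)
{
  return (x << 29) != 0;
}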

So my question is, is it combine's job to try all possible permutations that
constitute a bit or mask test? Or would it be better to let each target decide
on how to canonicalize bit tests and only try that alternative?

> > Neither matches the AArch64 patterns for ANDS/TST (which is just compare and AND). If the
> immediate
> > is not a power of 2 or a power of 2 -1 then it matches correctly as expected.
> >
> > I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion
> 
> It is zero_extract(x,1,1) really.  This is convenient for (old and embedded)
> processors that have special bit-test instructions.  If we now want combine
> to not do this, we'll have to update all backends that rely on it.

Would any backend actually rely on this given it only does some specific masks,
has a redundant shift with 0 for the mask case and the odd subreg as well?

> > (it even uses shifts by 0 at
> > times which are unlikely to ever match anything).
> 
> It matches fine on some targets.  It certainly looks silly, and could be
> expressed simpler.  Patch should be simple; watch this space / remind me /
> or file a PR.

I don't think this issue matters as it seems unlikely it ever matches.

> > Why does combine not try to match the obvious (x &
> > C) != 0 case? Single-bit and mask tests are very common, so this blocks efficient code
> generation on
> > many targets.
> 
> They are common, and many processors had instructions for them, which is
> (supposedly) why combine special-cased them.

Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...

> > It's trivial to change change_zero_ext to expand extracts always into AND and remove the
> redundant
> > subreg.
> 
> Not really trivial...  Think about comparisons...
> 
> int32_t x;
> ((x >> 31) & 1) > 0;
> // is not the same as
> (x & 0x80000000) > 0; // signed comparison
> 
> (You do not easily know what the COMPARE is used for).

Indeed if you had a zero_extract that included the sign-bit then you may have to adjust
the compare condition. However it looks like that case can't happen - (x & 0x80000000) 
comparisons will have the AND optimized away much earlier.
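
For example (a source-level illustration, assuming 32-bit int; "normally"
because this folding is done by the earlier tree/GIMPLE optimizers):

/* Comparisons of (x & sign-bit) against zero are normally folded into a
   plain sign test long before combine runs, so the AND is already gone at
   the RTL level.  */
int sign_bit_set   (int x) { return (x & 0x80000000) != 0; }  /* -> x < 0  */
int sign_bit_clear (int x) { return (x & 0x80000000) == 0; }  /* -> x >= 0 */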

> > However wouldn't it make more sense to never special case certain AND immediate in the first
> > place?
> 
> Yes, but we need to make sure no targets regress (i.e. by rewriting patterns
> for those that do).  We need to know the impact of such a change, at the least.

The alternative would be to let the target decide how to canonicalize bit tests. 
That seems like a better solution than trying multiple possibilities which can never 
match on most targets.

Wilco



* Re: RFC: Combine of compare & and oddity
  2015-09-03 11:51   ` Wilco Dijkstra
@ 2015-09-03 13:57     ` Segher Boessenkool
  2015-09-03 15:00       ` Wilco Dijkstra
  2015-09-03 15:52       ` Jeff Law
  0 siblings, 2 replies; 26+ messages in thread
From: Segher Boessenkool @ 2015-09-03 13:57 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: 'GCC Patches'

On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
> > > Combine canonicalizes certain AND masks in a comparison with zero into extracts of the
> > widest
> > > register type. During matching these are expanded into a very inefficient sequence that
> > fails to
> > > match. For example (x & 2) == 0 is matched in combine like this:
> > >
> > > Failed to match this instruction:
> > > (set (reg:CC 66 cc)
> > >     (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> > >             (const_int 1 [0x1])
> > >             (const_int 1 [0x1]))
> > >         (const_int 0 [0])))
> > 
> > Yes.  Some processors even have specific instructions to do this.
> 
> However there are 2 issues with this, one is the spurious subreg,

Combine didn't make that up out of thin air; something already used
DImode here.  It could simplify it to SImode in this case, that is
true, don't know why it doesn't; it isn't necessarily faster code to
do so, it can be slower, it might not match, etc.

> the other is 
> that only a subset of legal zero_extracts are tried (only single bit and the
> degenerate case of zero_extract with shift of 0 - which I think should not be a
> zero_extract). All other AND immediate remain as AND. 

Yes.  I'm happy to see this weird special case "optimisation",
"canocalisation" gone.

> So to emit an AND on targets without such specific instructions, or where such 
> instructions are more expensive than a simple AND (*), you need now at least 3 different 
> backend patterns for any instruction that can emit AND immediate...

It's only a problem for AND-and-compare, no?

> (*) I think that is another issue in combine - when both alternatives match you
> want to select the lowest cost one, not the first one that matches.

That's recog, not combine.  And quite a few backends rely on "first match
wins", because it always has been that way.  It also is very easy to write
such patterns accidentally (sometimes even the exact same one twice in the
same machine description, etc.)

> > > Failed to match this instruction:
> > > (set (reg:CC 66 cc)
> > >     (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> > >                 (const_int 1 [0x1]))
> > >             (const_int 1 [0x1]))
> > >         (const_int 0 [0])))
> > 
> > This is after r223067.  Combine tests only one "final" instruction; that
> > revision rewrites zero_ext* if it doesn't match and tries again.  This
> > helps for processors that can do more generic masks (like rs6000, and I
> > believe also aarch64?): without it, you need many more patterns to match
> > all the zero_ext{ract,end} cases.
> 
> But there are more efficient ways to emit single bit and masks tests that apply
> to most CPUs rather than doing something specific that works for just one target
> only. For example single bit test is a simple shift into carry flag or into the 
> sign bit, and for mask tests, you shift out all the non-mask bits.

Most of those are quite target-specific.  Some others are already done,
and/or done by other passes.

> So my question is, is it combine's job to try all possible permutations that
> constitute a bit or mask test?

Combine converts the merged instructions to what it thinks is the
canonical or cheapest form, and uses that.  It does not try multiple
options (the zero_ext* -> and+shift rewriting is not changing the
semantics of the pattern at all).

> Or would it be better to let each target decide
> on how to canonicalize bit tests and only try that alternative?

The question is how to write the pattern to be most convenient for all
targets.

> > > Neither matches the AArch64 patterns for ANDS/TST (which is just compare and AND). If the
> > immediate
> > > is not a power of 2 or a power of 2 -1 then it matches correctly as expected.
> > >
> > > I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion
> > 
> > It is zero_extract(x,1,1) really.  This is convenient for (old and embedded)
> > processors that have special bit-test instructions.  If we now want combine
> > to not do this, we'll have to update all backends that rely on it.
> 
> Would any backend actually rely on this given it only does some specific masks,
> has a redundant shift with 0 for the mask case and the odd subreg as well?

Such backends match the zero_extract patterns, of course.  Random example:
the h8300 patterns for the "btst" instruction.

> > They are common, and many processors had instructions for them, which is
> > (supposedly) why combine special-cased them.
> 
> Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...

Combine does not try multiple options.

> > > It's trivial to change change_zero_ext to expand extracts always into AND and remove the
> > redundant
> > > subreg.
> > 
> > Not really trivial...  Think about comparisons...
> > 
> > int32_t x;
> > ((x >> 31) & 1) > 0;
> > // is not the same as
> > (x & 0x80000000) > 0; // signed comparison
> > 
> > (You do not easily know what the COMPARE is used for).
> 
> Indeed if you had a zero_extract that included the sign-bit then you may have to adjust
> the compare condition. However it looks like that case can't happen - (x & 0x80000000) 
> comparisons will have the AND optimized away much earlier.

A long time before combine, yes.  But it is *possible* to give such insns
to combine, and it should not do the wrong thing with them.

> > > However wouldn't it make more sense to never special case certain AND immediate in the first
> > > place?
> > 
> > Yes, but we need to make sure no targets regress (i.e. by rewriting patterns
> > for those that do).  We need to know the impact of such a change, at the least.
> 
> The alternative would be to let the target decide how to canonicalize bit tests. 
> That seems like a better solution than trying multiple possibilities which can never 
> match on most targets.

You will end up with a *lot* of target hooks like this.  It will also
make testing harder (less coverage).  I am not sure that is a good idea.


Segher


* RE: RFC: Combine of compare & and oddity
  2015-09-03 13:57     ` Segher Boessenkool
@ 2015-09-03 15:00       ` Wilco Dijkstra
  2015-09-03 15:21         ` Andrew Pinski
                           ` (2 more replies)
  2015-09-03 15:52       ` Jeff Law
  1 sibling, 3 replies; 26+ messages in thread
From: Wilco Dijkstra @ 2015-09-03 15:00 UTC (permalink / raw)
  To: 'Segher Boessenkool'; +Cc: 'GCC Patches'

> Segher Boessenkool wrote:
> On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
> > > > Combine canonicalizes certain AND masks in a comparison with zero into extracts of the
> > > widest
> > > > register type. During matching these are expanded into a very inefficient sequence that
> > > fails to
> > > > match. For example (x & 2) == 0 is matched in combine like this:
> > > >
> > > > Failed to match this instruction:
> > > > (set (reg:CC 66 cc)
> > > >     (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> > > >             (const_int 1 [0x1])
> > > >             (const_int 1 [0x1]))
> > > >         (const_int 0 [0])))
> > >
> > > Yes.  Some processors even have specific instructions to do this.
> >
> > However there are 2 issues with this, one is the spurious subreg,
> 
> Combine didn't make that up out of thin air; something already used
> DImode here.  It could simplify it to SImode in this case, that is
> true, don't know why it doesn't; it isn't necessarily faster code to
> do so, it can be slower, it might not match, etc.

The relevant RTL instructions on AArch64 are:

(insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
        (and:SI (reg/v:SI 76 [ xD.2641 ])
            (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
     (nil))
 (insn 26 25 27 2 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 77 [ D.2705 ])
            (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
     (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
        (nil)))

I don't see anything using DI...

> > the other is
> > that only a subset of legal zero_extracts are tried (only single bit and the
> > degenerate case of zero_extract with shift of 0 - which I think should not be a
> > zero_extract). All other AND immediate remain as AND.
> 
> Yes.  I'm happy to see this weird special case "optimisation",
> "canocalisation" gone.
> 
> > So to emit an AND on targets without such specific instructions, or where such
> > instructions are more expensive than a simple AND (*), you need now at least 3 different
> > backend patterns for any instruction that can emit AND immediate...
> 
> It's only a problem for AND-and-compare, no?

Yes, so it looks like some other backends match the odd pattern and then have another
pattern change it back into the canonical AND/TST form during the split phase (maybe
the subreg confuses register allocation or block other optimizations). This all seems
a lot of unnecessary complexity for a few special immediates when there is a much 
simpler solution...

> > (*) I think that is another issue in combine - when both alternatives match you
> > want to select the lowest cost one, not the first one that matches.
> 
> That's recog, not combine.  And quite a few backends rely on "first match
> wins", because it always has been that way.  It also is very easy to write
> such patterns accidentally (sometimes even the exact same one twice in the
> same machine description, etc.)
> 
> > > > Failed to match this instruction:
> > > > (set (reg:CC 66 cc)
> > > >     (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
> > > >                 (const_int 1 [0x1]))
> > > >             (const_int 1 [0x1]))
> > > >         (const_int 0 [0])))
> > >
> > > This is after r223067.  Combine tests only one "final" instruction; that
> > > revision rewrites zero_ext* if it doesn't match and tries again.  This
> > > helps for processors that can do more generic masks (like rs6000, and I
> > > believe also aarch64?): without it, you need many more patterns to match
> > > all the zero_ext{ract,end} cases.
> >
> > But there are more efficient ways to emit single bit and masks tests that apply
> > to most CPUs rather than doing something specific that works for just one target
> > only. For example single bit test is a simple shift into carry flag or into the
> > sign bit, and for mask tests, you shift out all the non-mask bits.
> 
> Most of those are quite target-specific.  Some others are already done,
> and/or done by other passes.

But what combine does here is even more target-specific. Shifts and AND setting flags
are universally supported on targets with condition code register.
Bitfield test/extract instructions are more rare, and when they are supported, they
may well be more expensive.

> > So my question is, is it combine's job to try all possible permutations that
> > constitute a bit or mask test?
> 
> Combine converts the merged instructions to what it thinks is the
> canonical or cheapest form, and uses that.  It does not try multiple
> options (the zero_ext* -> and+shift rewriting is not changing the
> semantics of the pattern at all).

But the change from AND to zero_extract is already changing semantics...

> > Or would it be better to let each target decide
> > on how to canonicalize bit tests and only try that alternative?
> 
> The question is how to write the pattern to be most convenient for all
> targets.

The obvious choice is to try the 2 original instructions merged.

> > > > Neither matches the AArch64 patterns for ANDS/TST (which is just compare and AND). If
> the
> > > immediate
> > > > is not a power of 2 or a power of 2 -1 then it matches correctly as expected.
> > > >
> > > > I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion
> > >
> > > It is zero_extract(x,1,1) really.  This is convenient for (old and embedded)
> > > processors that have special bit-test instructions.  If we now want combine
> > > to not do this, we'll have to update all backends that rely on it.
> >
> > Would any backend actually rely on this given it only does some specific masks,
> > has a redundant shift with 0 for the mask case and the odd subreg as well?
> 
> Such backends match the zero_extract patterns, of course.  Random example:
> the h8300 patterns for the "btst" instruction.
> 
> > > They are common, and many processors had instructions for them, which is
> > > (supposedly) why combine special-cased them.
> >
> > Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...
> 
> Combine does not try multiple options.

I'm not following - combine tries zero_extract and shift+AND - that's 2 options.
If that is feasible then adding a 3rd option should be possible.

> > > > It's trivial to change change_zero_ext to expand extracts always into AND and remove the
> > > redundant
> > > > subreg.
> > >
> > > Not really trivial...  Think about comparisons...
> > >
> > > int32_t x;
> > > ((x >> 31) & 1) > 0;
> > > // is not the same as
> > > (x & 0x80000000) > 0; // signed comparison
> > >
> > > (You do not easily know what the COMPARE is used for).
> >
> > Indeed if you had a zero_extract that included the sign-bit then you may have to adjust
> > the compare condition. However it looks like that case can't happen - (x & 0x80000000)
> > comparisons will have the AND optimized away much earlier.
> 
> A long time before combine, yes.  But it is *possible* to give such insns
> to combine, and it should not do the wrong thing with them.

Yes. So that suggests we'd need to block the canonicalization rather than undo it.

> > > > However wouldn't it make more sense to never special case certain AND immediate in the
> first
> > > > place?
> > >
> > > Yes, but we need to make sure no targets regress (i.e. by rewriting patterns
> > > for those that do).  We need to know the impact of such a change, at the least.
> >
> > The alternative would be to let the target decide how to canonicalize bit tests.
> > That seems like a better solution than trying multiple possibilities which can never
> > match on most targets.
> 
> You will end up with a *lot* of target hooks like this.  It will also
> make testing harder (less coverage).  I am not sure that is a good idea.

We certainly need a lot more target hooks in general so GCC can do the right thing
(rather than using costs inconsistently all over the place). But that's a different
discussion...

Wilco



* Re: RFC: Combine of compare & and oddity
  2015-09-03 15:00       ` Wilco Dijkstra
@ 2015-09-03 15:21         ` Andrew Pinski
  2015-09-03 16:15         ` Jeff Law
  2015-09-03 16:24         ` Segher Boessenkool
  2 siblings, 0 replies; 26+ messages in thread
From: Andrew Pinski @ 2015-09-03 15:21 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: Segher Boessenkool, GCC Patches

On Thu, Sep 3, 2015 at 10:59 PM, Wilco Dijkstra <wdijkstr@arm.com> wrote:
>> Segher Boessenkool wrote:
>> On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
>> > > > Combine canonicalizes certain AND masks in a comparison with zero into extracts of the
>> > > widest
>> > > > register type. During matching these are expanded into a very inefficient sequence that
>> > > fails to
>> > > > match. For example (x & 2) == 0 is matched in combine like this:
>> > > >
>> > > > Failed to match this instruction:
>> > > > (set (reg:CC 66 cc)
>> > > >     (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>> > > >             (const_int 1 [0x1])
>> > > >             (const_int 1 [0x1]))
>> > > >         (const_int 0 [0])))
>> > >
>> > > Yes.  Some processors even have specific instructions to do this.
>> >
>> > However there are 2 issues with this, one is the spurious subreg,
>>
>> Combine didn't make that up out of thin air; something already used
>> DImode here.  It could simplify it to SImode in this case, that is
>> true, don't know why it doesn't; it isn't necessarily faster code to
>> do so, it can be slower, it might not match, etc.
>
> The relevant RTL instructions on AArch64 are:
>
> (insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
>         (and:SI (reg/v:SI 76 [ xD.2641 ])
>             (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
>      (nil))
>  (insn 26 25 27 2 (set (reg:CC 66 cc)
>         (compare:CC (reg:SI 77 [ D.2705 ])
>             (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
>      (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
>         (nil)))
>
> I don't see anything using DI...
>
>> > the other is
>> > that only a subset of legal zero_extracts are tried (only single bit and the
>> > degenerate case of zero_extract with shift of 0 - which I think should not be a
>> > zero_extract). All other AND immediate remain as AND.
>>
>> Yes.  I'm happy to see this weird special case "optimisation",
>> "canocalisation" gone.
>>
>> > So to emit an AND on targets without such specific instructions, or where such
>> > instructions are more expensive than a simple AND (*), you need now at least 3 different
>> > backend patterns for any instruction that can emit AND immediate...
>>
>> It's only a problem for AND-and-compare, no?
>
> Yes, so it looks like some other backends match the odd pattern and then have another
> pattern change it back into the canonical AND/TST form during the split phase (maybe
> the subreg confuses register allocation or block other optimizations). This all seems
> a lot of unnecessary complexity for a few special immediates when there is a much
> simpler solution...
>
>> > (*) I think that is another issue in combine - when both alternatives match you
>> > want to select the lowest cost one, not the first one that matches.
>>
>> That's recog, not combine.  And quite a few backends rely on "first match
>> wins", because it always has been that way.  It also is very easy to write
>> such patterns accidentally (sometimes even the exact same one twice in the
>> same machine description, etc.)
>>
>> > > > Failed to match this instruction:
>> > > > (set (reg:CC 66 cc)
>> > > >     (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>> > > >                 (const_int 1 [0x1]))
>> > > >             (const_int 1 [0x1]))
>> > > >         (const_int 0 [0])))
>> > >
>> > > This is after r223067.  Combine tests only one "final" instruction; that
>> > > revision rewrites zero_ext* if it doesn't match and tries again.  This
>> > > helps for processors that can do more generic masks (like rs6000, and I
>> > > believe also aarch64?): without it, you need many more patterns to match
>> > > all the zero_ext{ract,end} cases.
>> >
>> > But there are more efficient ways to emit single bit and masks tests that apply
>> > to most CPUs rather than doing something specific that works for just one target
>> > only. For example single bit test is a simple shift into carry flag or into the
>> > sign bit, and for mask tests, you shift out all the non-mask bits.
>>
>> Most of those are quite target-specific.  Some others are already done,
>> and/or done by other passes.
>
> But what combine does here is even more target-specific. Shifts and AND setting flags
> are universally supported on targets with condition code register.
> Bitfield test/extract instructions are more rare, and when they are supported, they
> may well be more expensive.
>
>> > So my question is, is it combine's job to try all possible permutations that
>> > constitute a bit or mask test?
>>
>> Combine converts the merged instructions to what it thinks is the
>> canonical or cheapest form, and uses that.  It does not try multiple
>> options (the zero_ext* -> and+shift rewriting is not changing the
>> semantics of the pattern at all).
>
> But the change from AND to zero_extract is already changing semantics...
>
>> > Or would it be better to let each target decide
>> > on how to canonicalize bit tests and only try that alternative?
>>
>> The question is how to write the pattern to be most convenient for all
>> targets.
>
> The obvious choice is to try the 2 original instructions merged.
>
>> > > > Neither matches the AArch64 patterns for ANDS/TST (which is just compare and AND). If
>> the
>> > > immediate
>> > > > is not a power of 2 or a power of 2 -1 then it matches correctly as expected.
>> > > >
>> > > > I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion
>> > >
>> > > It is zero_extract(x,1,1) really.  This is convenient for (old and embedded)
>> > > processors that have special bit-test instructions.  If we now want combine
>> > > to not do this, we'll have to update all backends that rely on it.
>> >
>> > Would any backend actually rely on this given it only does some specific masks,
>> > has a redundant shift with 0 for the mask case and the odd subreg as well?
>>
>> Such backends match the zero_extract patterns, of course.  Random example:
>> the h8300 patterns for the "btst" instruction.
>>
>> > > They are common, and many processors had instructions for them, which is
>> > > (supposedly) why combine special-cased them.
>> >
>> > Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...
>>
>> Combine does not try multiple options.
>
> I'm not following - combine tries zero_extract and shift+AND - that's 2 options.
> If that is feasible then adding a 3rd option should be possible.
>
>> > > > It's trivial to change change_zero_ext to expand extracts always into AND and remove the
>> > > redundant
>> > > > subreg.
>> > >
>> > > Not really trivial...  Think about comparisons...
>> > >
>> > > int32_t x;
>> > > ((x >> 31) & 1) > 0;
>> > > // is not the same as
>> > > (x & 0x80000000) > 0; // signed comparison
>> > >
>> > > (You do not easily know what the COMPARE is used for).
>> >
>> > Indeed if you had a zero_extract that included the sign-bit then you may have to adjust
>> > the compare condition. However it looks like that case can't happen - (x & 0x80000000)
>> > comparisons will have the AND optimized away much earlier.
>>
>> A long time before combine, yes.  But it is *possible* to give such insns
>> to combine, and it should not do the wrong thing with them.
>
> Yes. So that suggests we'd need to block the canonicalization rather than undo it.
>
>> > > > However wouldn't it make more sense to never special case certain AND immediate in the
>> first
>> > > > place?
>> > >
>> > > Yes, but we need to make sure no targets regress (i.e. by rewriting patterns
>> > > for those that do).  We need to know the impact of such a change, at the least.
>> >
>> > The alternative would be to let the target decide how to canonicalize bit tests.
>> > That seems like a better solution than trying multiple possibilities which can never
>> > match on most targets.
>>
>> You will end up with a *lot* of target hooks like this.  It will also
>> make testing harder (less coverage).  I am not sure that is a good idea.
>
> We certainly need a lot more target hooks in general so GCC can do the right thing
> (rather than using costs inconsistently all over the place). But that's a different
> discussion...


This is the pattern which I have in my tree (I need someone here to
submit it still):

(define_insn "*and<mode>3_ze_nr_compare0"
  [(set (reg:CC_NZ CC_REGNUM)
(compare:CC_NZ
(zero_extract:GPI  ; loc size pos
                  (match_operand:GPI 0 "register_operand" "r")
                  (match_operand:GPI 1 "const_int_operand" "n")
 (match_operand:GPI 2 "const_int_operand" "n"))
(const_int 0)))]
  "aarch64_bitmask_imm ((((HOST_WIDE_INT_1U << UINTVAL (operands[1]))
- 1) << UINTVAL (operands[2])),<MODE>mode)"
  {
    unsigned HOST_WIDE_INT value  = (((HOST_WIDE_INT_1U << UINTVAL
(operands[1])) - 1) << UINTVAL (operands[2]));
    operands[1] = GEN_INT (value);
    return "tst\\t%<w>0, %<w>1";
  }
  [(set_attr "type" "logics_reg")]
)

--- CUT ---

And then, under the COMPARE case of rtx_costs:
          /* CC_NZmode supports zero extract for free.  */
          if (GET_MODE (x) == CC_NZmode && GET_CODE (op0) == ZERO_EXTRACT)
            op0 = XEXP (op0, 0);

And in aarch64_select_cc_mode:

  if ((GET_MODE (x) == SImode || GET_MODE (x) == DImode)
      && y == const0_rtx
      && (code == EQ || code == NE || code == LT || code == GE)
      && (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND
          || GET_CODE (x) == NEG || GET_CODE (x) == ZERO_EXTRACT))
    return CC_NZmode;
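
As a standalone sanity check of the mask computed in the
*and<mode>3_ze_nr_compare0 condition above (not part of the patch; it just
mirrors the ((1 << size) - 1) << pos arithmetic from the pattern):

#include <stdio.h>

/* Same computation as in the pattern's condition and output template:
   the mask covered by zero_extract (x, size, pos).  */
static unsigned long long
ze_mask (unsigned size, unsigned pos)
{
  return ((1ULL << size) - 1) << pos;
}

int
main (void)
{
  printf ("%#llx\n", ze_mask (1, 1));   /* 0x2 - the (x & 2) test above */
  printf ("%#llx\n", ze_mask (8, 8));   /* 0xff00 - an 8-bit field at bit 8 */
  return 0;
}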


Note I also have the following, to handle the case where a zero_extend of
the AND result is needed (yes, this shows up mostly right after returns
from functions):

(define_insn "*andsi3_compare0_uxtw"
  [(set (reg:CC_NZ CC_REGNUM)
(compare:CC_NZ
(and:SI (match_operand:SI 1 "register_operand" "%r,r")
(match_operand:SI 2 "aarch64_logical_operand" "r,K"))
(const_int 0)))
   (set (match_operand:DI 0 "register_operand" "=r,r")
(zero_extend:DI (and:SI (match_dup 1) (match_dup 2))))]
  ""
  "ands\\t%w0, %w1, %w2"
  [(set_attr "type" "logics_reg,logics_imm")]


Thanks,
Andrew Pinski




>
> Wilco
>
>


* Re: RFC: Combine of compare & and oddity
  2015-09-02 21:05   ` Segher Boessenkool
@ 2015-09-03 15:26     ` Jeff Law
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Law @ 2015-09-03 15:26 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Wilco Dijkstra, 'GCC Patches'

On 09/02/2015 03:00 PM, Segher Boessenkool wrote:
> On Wed, Sep 02, 2015 at 01:59:58PM -0600, Jeff Law wrote:
>>> (set (reg:CC 66 cc)
>>>      (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ])
>>>      0)
>>>                  (const_int 1 [0x1]))
>>>              (const_int 1 [0x1]))
>>>          (const_int 0 [0])))
>> Yea, this is an alternative form.   I don't offhand remember how/why
>> this form appears, but it certainly does.  I don't think any ports
>> handle this form (but I certainly haven't done any checks), but I believe 
>> combine creates it primarily for internal purposes.
>
> Combine replaces zero_ext* with equivalent shift/and patterns and tries
> again, if things don't match.  Targets with more generic masking insns
> do not want to describe the very many cases that can be described with
> zero_ext* separately.
>
> rs6000 handles this exact pattern, btw.  And I'll be very happy if we can
> just drop it :-)
If I still cared, I'd probably look into this for the PA which has some 
rough similarities with the PPC architecture in its bit 
insertion/extraction capabilities.    But the PA just isn't worth the 
time :-)

>> @cindex @code{zero_extract}, canonicalization of
>> @cindex @code{sign_extract}, canonicalization of
>> @item
>> Equality comparisons of a group of bits (usually a single bit) with zero
>> will be written using @code{zero_extract} rather than the equivalent
>> @code{and} or @code{sign_extract} operations.
>
> Oh it's even documented, thanks.  I do still think we should think of
> changing this.
Do-able, but I suspect the fallout would be significant across the older 
ports.

Jeff


* Re: RFC: Combine of compare & and oddity
  2015-09-03 13:57     ` Segher Boessenkool
  2015-09-03 15:00       ` Wilco Dijkstra
@ 2015-09-03 15:52       ` Jeff Law
  1 sibling, 0 replies; 26+ messages in thread
From: Jeff Law @ 2015-09-03 15:52 UTC (permalink / raw)
  To: Segher Boessenkool, Wilco Dijkstra; +Cc: 'GCC Patches'

On 09/03/2015 07:18 AM, Segher Boessenkool wrote:
> On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
>>>> Combine canonicalizes certain AND masks in a comparison with zero into extracts of the
>>> widest
>>>> register type. During matching these are expanded into a very inefficient sequence that
>>> fails to
>>>> match. For example (x & 2) == 0 is matched in combine like this:
>>>>
>>>> Failed to match this instruction:
>>>> (set (reg:CC 66 cc)
>>>>      (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>>>>              (const_int 1 [0x1])
>>>>              (const_int 1 [0x1]))
>>>>          (const_int 0 [0])))
>>>
>>> Yes.  Some processors even have specific instructions to do this.
>>
>> However there are 2 issues with this, one is the spurious subreg,
>
> Combine didn't make that up out of thin air; something already used
> DImode here.  It could simplify it to SImode in this case, that is
> true, don't know why it doesn't; it isn't necessarily faster code to
> do so, it can be slower, it might not match, etc.
Right.  It may also be the case that we're on a 64-bit target, the 
underlying object is 32 bits, and combine wanted to do things in word_mode.

But yes, there's a reason why the subreg is in there and there are times 
when the subregs get in the way of the hand-written pattern matching 
that occurs in combine.c and elsewhere.

So it's generally useful to squash away the subregs when we can. 
However, it's also the case that the subregs can't always be squashed 
away -- so it's also helpful to dig into the transformations in 
combine.c that you want to fire and figure out if and how that code can 
be extended to handle the embedded subregs.

>
>> (*) I think that is another issue in combine - when both alternatives match you
>> want to select the lowest cost one, not the first one that matches.
>
> That's recog, not combine.  And quite a few backends rely on "first match
> wins", because it always has been that way.  It also is very easy to write
> such patterns accidentally (sometimes even the exact same one twice in the
> same machine description, etc.)
Note that it's also been documented that first match wins for 20+ years.



>
>> So my question is, is it combine's job to try all possible permutations that
>> constitute a bit or mask test?
>
> Combine converts the merged instructions to what it thinks is the
> canonical or cheapest form, and uses that.  It does not try multiple
> options (the zero_ext* -> and+shift rewriting is not changing the
> semantics of the pattern at all).
Right.  Once combine finds something that works, it's done and moves 
on to the next set of insns to combine.


>
>>>> Neither matches the AArch64 patterns for ANDS/TST (which is just compare and AND). If the
>>> immediate
>>>> is not a power of 2 or a power of 2 -1 then it matches correctly as expected.
>>>>
>>>> I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion
>>>
>>> It is zero_extract(x,1,1) really.  This is convenient for (old and embedded)
>>> processors that have special bit-test instructions.  If we now want combine
>>> to not do this, we'll have to update all backends that rely on it.
>>
>> Would any backend actually rely on this given it only does some specific masks,
>> has a redundant shift with 0 for the mask case and the odd subreg as well?
>
> Such backends match the zero_extract patterns, of course.  Random example:
> the h8300 patterns for the "btst" instruction.
PA, m68k and almost certainly others.  I suspect it's fairly common in 
older ports.


Jeff


* Re: RFC: Combine of compare & and oddity
  2015-09-03 15:00       ` Wilco Dijkstra
  2015-09-03 15:21         ` Andrew Pinski
@ 2015-09-03 16:15         ` Jeff Law
  2015-09-03 16:40           ` Segher Boessenkool
  2015-09-03 16:24         ` Segher Boessenkool
  2 siblings, 1 reply; 26+ messages in thread
From: Jeff Law @ 2015-09-03 16:15 UTC (permalink / raw)
  To: Wilco Dijkstra, 'Segher Boessenkool'; +Cc: 'GCC Patches'

On 09/03/2015 08:59 AM, Wilco Dijkstra wrote:
>> Segher Boessenkool wrote:
>> On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
>>>>> Combine canonicalizes certain AND masks in a comparison with zero into extracts of the
>>>> widest
>>>>> register type. During matching these are expanded into a very inefficient sequence that
>>>> fails to
>>>>> match. For example (x & 2) == 0 is matched in combine like this:
>>>>>
>>>>> Failed to match this instruction:
>>>>> (set (reg:CC 66 cc)
>>>>>      (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>>>>>              (const_int 1 [0x1])
>>>>>              (const_int 1 [0x1]))
>>>>>          (const_int 0 [0])))
>>>>
>>>> Yes.  Some processors even have specific instructions to do this.
>>>
>>> However there are 2 issues with this, one is the spurious subreg,
>>
>> Combine didn't make that up out of thin air; something already used
>> DImode here.  It could simplify it to SImode in this case, that is
>> true, don't know why it doesn't; it isn't necessarily faster code to
>> do so, it can be slower, it might not match, etc.
>
> The relevant RTL instructions on AArch64 are:
>
> (insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
>          (and:SI (reg/v:SI 76 [ xD.2641 ])
>              (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
>       (nil))
>   (insn 26 25 27 2 (set (reg:CC 66 cc)
>          (compare:CC (reg:SI 77 [ D.2705 ])
>              (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
>       (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
>          (nil)))
>
> I don't see anything using DI...
Is this aarch64?  Might be the case that combine wants to do everything 
in word_mode for some reason or another.

> Yes, so it looks like some other backends match the odd pattern and then have another
> pattern change it back into the canonical AND/TST form during the split phase (maybe
> the subreg confuses register allocation or block other optimizations). This all seems
> a lot of unnecessary complexity for a few special immediates when there is a much
> simpler solution...
subregs get in the way in various places.  So if we can avoid generating 
them, then that's good.

>>> But there are more efficient ways to emit single bit and masks tests that apply
>>> to most CPUs rather than doing something specific that works for just one target
>>> only. For example single bit test is a simple shift into carry flag or into the
>>> sign bit, and for mask tests, you shift out all the non-mask bits.
>>
>> Most of those are quite target-specific.  Some others are already done,
>> and/or done by other passes.
I wouldn't go that far.  Many targets have simple, cheap ways to do 
single bit testing.  And there are some targets where trying to shift a 
bit into the carry flag would be *horribly* bad performance-wise as they 
only have single bit shifters.




>
> But what combine does here is even more target-specific. Shifts and AND setting flags
> are universally supported on targets with condition code register.
> Bitfield test/extract instructions are more rare, and when they are supported, they
> may well be more expensive.
I don't think it's anywhere near that clear cut.


>
>>> So my question is, is it combine's job to try all possible permutations that
>>> constitute a bit or mask test?
>>
>> Combine converts the merged instructions to what it thinks is the
>> canonical or cheapest form, and uses that.  It does not try multiple
>> options (the zero_ext* -> and+shift rewriting is not changing the
>> semantics of the pattern at all).
>
> But the change from AND to zero_extract is already changing semantics...
It shouldn't be changing semantics -- changing form != changing semantics.

>>
>> Combine does not try multiple options.
>
> I'm not following - combine tries zero_extract and shift+AND - that's 2 options.
> If that is feasible then adding a 3rd option should be possible.
Combine will stop once it finds a match.  It may, in some circumstances, 
try more than one representation to find a match, but that's the 
exception rather than the rule.

>> You will end up with a *lot* of target hooks like this.  It will also
>> make testing harder (less coverage).  I am not sure that is a good idea.
>
> We certainly need a lot more target hooks in general so GCC can do the right thing
> (rather than using costs inconsistently all over the place). But that's a different
> discussion...
Let's be very careful here; target hooks aren't always the solution. 
I'd rather see the costing models fixed and used consistently across the 
board.  But frankly, I don't know how to fix the costing models.

jeff


* Re: RFC: Combine of compare & and oddity
  2015-09-03 15:00       ` Wilco Dijkstra
  2015-09-03 15:21         ` Andrew Pinski
  2015-09-03 16:15         ` Jeff Law
@ 2015-09-03 16:24         ` Segher Boessenkool
  2015-09-03 16:36           ` Kyrill Tkachov
  2015-09-03 18:42           ` Jeff Law
  2 siblings, 2 replies; 26+ messages in thread
From: Segher Boessenkool @ 2015-09-03 16:24 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: 'GCC Patches'

On Thu, Sep 03, 2015 at 03:59:00PM +0100, Wilco Dijkstra wrote:
> > > However there are 2 issues with this, one is the spurious subreg,
> > 
> > Combine didn't make that up out of thin air; something already used
> > DImode here.  It could simplify it to SImode in this case, that is
> > true, don't know why it doesn't; it isn't necessarily faster code to
> > do so, it can be slower, it might not match, etc.
> 
> The relevant RTL instructions on AArch64 are:

[ You never gave a full test case, or I missed it, or I cannot find it
  anymore -- but I can reproduce this now:

void g(void);
void f(int *x) { if (*x & 2) g(); }

]

> (insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
>         (and:SI (reg/v:SI 76 [ xD.2641 ])
>             (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
>      (nil))
>  (insn 26 25 27 2 (set (reg:CC 66 cc)
>         (compare:CC (reg:SI 77 [ D.2705 ])
>             (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
>      (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
>         (nil)))
> 
> I don't see anything using DI...

Yeah, I spoke too soon, sorry.  It looks like make_compound_operation came
up with it.

> > It's only a problem for AND-and-compare, no?
> 
> Yes, so it looks like some other backends match the odd pattern and then have another
> pattern change it back into the canonical AND/TST form during the split phase (maybe
> the subreg confuses register allocation or block other optimizations).

A subreg of a pseudo is not anything special; don't worry about it.
register_operand and similar treat it just like any other register.

> This all seems
> a lot of unnecessary complexity for a few special immediates when there is a much 
> simpler solution...

Feel free to post a patch!  I would love to have this all simplified.

> > > But there are more efficient ways to emit single bit and mask tests that apply
> > > to most CPUs rather than doing something specific that works for just one target
> > > only. For example single bit test is a simple shift into carry flag or into the
> > > sign bit, and for mask tests, you shift out all the non-mask bits.
> > 
> > Most of those are quite target-specific.  Some others are already done,
> > and/or done by other passes.
> 
> But what combine does here is even more target-specific.

Combine puts everything (well, most things) through
make_compound_operation, on all targets.

> > Combine converts the merged instructions to what it thinks is the
> > canonical or cheapest form, and uses that.  It does not try multiple
> > options (the zero_ext* -> and+shift rewriting is not changing the
> > semantics of the pattern at all).
> 
> But the change from AND to zero_extract is already changing semantics...

Oh?  It is not supposed to!

> > > Or would it be better to let each target decide
> > > on how to canonicalize bit tests and only try that alternative?
> > 
> > The question is how to write the pattern to be most convenient for all
> > targets.
> 
> The obvious choice is to try the 2 original instructions merged.

... without any simplification.  Yes, I've wanted combine to fall back
to that if the "simplified" version does not work out.  Not so easy to
do though.

> > > Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...
> > 
> > Combine does not try multiple options.
> 
> I'm not following - combine tries zero_extract and shift+AND - that's 2 options.
> If that is feasible then adding a 3rd option should be possible.

The shift+and is *exactly the same* as the zero_extract, just written
differently.
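
[ For concreteness, a minimal C sketch of that identity (an editorial
  illustration, assuming the usual BITS_BIG_ENDIAN == 0 bit numbering,
  under which zero_extract (x, 1, 1) denotes bit 1 of x):

/* Both functions return 1 exactly when bit 1 of x is set, so the
   zero_extract, shift+and and and+compare forms all agree.  */
unsigned int via_shift_and (unsigned int x) { return (x >> 1) & 1; }
unsigned int via_and_cmp   (unsigned int x) { return (x & 2) != 0; }
]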

> We certainly need a lot more target hooks in general so GCC can do the right thing
> (rather than using costs inconsistently all over the place). But that's a different
> discussion...

This isn't about costs though.  That is a big other can of worms, indeed!


Anyway.  In that testcase I made, everything is simplified just fine on
aarch64, using *tbeqdi1; what am I missing?


Segher


* Re: RFC: Combine of compare & and oddity
  2015-09-03 16:24         ` Segher Boessenkool
@ 2015-09-03 16:36           ` Kyrill Tkachov
  2015-09-03 16:44             ` Wilco Dijkstra
  2015-09-03 17:27             ` Segher Boessenkool
  2015-09-03 18:42           ` Jeff Law
  1 sibling, 2 replies; 26+ messages in thread
From: Kyrill Tkachov @ 2015-09-03 16:36 UTC (permalink / raw)
  To: Segher Boessenkool, Wilco Dijkstra; +Cc: 'GCC Patches'


On 03/09/15 17:18, Segher Boessenkool wrote:
> On Thu, Sep 03, 2015 at 03:59:00PM +0100, Wilco Dijkstra wrote:
>>>> However there are 2 issues with this, one is the spurious subreg,
>>> Combine didn't make that up out of thin air; something already used
>>> DImode here.  It could simplify it to SImode in this case, that is
>>> true, don't know why it doesn't; it isn't necessarily faster code to
>>> do so, it can be slower, it might not match, etc.
>> The relevant RTL instructions on AArch64 are:
> [ You never gave a full  test case, or I missed it, or cannot find it
>    anymore -- but I can reproduce this now:
>
> void g(void);
> void f(int *x) { if (*x & 2) g(); }
>
> ]
>
>> (insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
>>          (and:SI (reg/v:SI 76 [ xD.2641 ])
>>              (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
>>       (nil))
>>   (insn 26 25 27 2 (set (reg:CC 66 cc)
>>          (compare:CC (reg:SI 77 [ D.2705 ])
>>              (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
>>       (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
>>          (nil)))
>>
>> I don't see anything using DI...
> Yeah, I spoke too soon, sorry.  It looks like make_compound_operation came
> up with it.
>
>>> It's only a problem for AND-and-compare, no?
>> Yes, so it looks like some other backends match the odd pattern and then have another
>> pattern change it back into the canonical AND/TST form during the split phase (maybe
>> the subreg confuses register allocation or blocks other optimizations).
> A subreg of a pseudo is not anything special, don't worry about it,
> register_operand and similar treat it just like any other register.
>
>> This all seems
>> a lot of unnecessary complexity for a few special immediates when there is a much
>> simpler solution...
> Feel free to post a patch!  I would love to have this all simplified.
>
>>>> But there are more efficient ways to emit single bit and mask tests that apply
>>>> to most CPUs rather than doing something specific that works for just one target
>>>> only. For example single bit test is a simple shift into carry flag or into the
>>>> sign bit, and for mask tests, you shift out all the non-mask bits.
>>> Most of those are quite target-specific.  Some others are already done,
>>> and/or done by other passes.
>> But what combine does here is even more target-specific.
> Combine puts everything (well, most things) through
> make_compound_operation, on all targets.
>
>>> Combine converts the merged instructions to what it thinks is the
>>> canonical or cheapest form, and uses that.  It does not try multiple
>>> options (the zero_ext* -> and+shift rewriting is not changing the
>>> semantics of the pattern at all).
>> But the change from AND to zero_extract is already changing semantics...
> Oh?  It is not supposed to!
>
>>>> Or would it be better to let each target decide
>>>> on how to canonicalize bit tests and only try that alternative?
>>> The question is how to write the pattern to be most convenient for all
>>> targets.
>> The obvious choice is to try the 2 original instructions merged.
> ... without any simplification.  Yes, I've wanted combine to fall back
> to that if the "simplified" version does not work out.  Not so easy to
> do though.
>
>>>> Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...
>>> Combine does not try multiple options.
>> I'm not following - combine tries zero_extract and shift+AND - that's 2 options.
>> If that is feasible then adding a 3rd option should be possible.
> The shift+and is *exactly the same* as the zero_extract, just written
> differently.
>
>> We certainly need a lot more target hooks in general so GCC can do the right thing
>> (rather than using costs inconsistently all over the place). But that's a different
>> discussion...
> This isn't about costs though.  That is a big other can of worms, indeed!
>
>
> Anyway.  In that testcase I made, everything is simplified just fine on
> aarch64, using *tbeqdi1; what am I missing?

A testcase I was looking at is:
int
foo (int a)
{
   return (a & 7) != 0;
}

For me this generates:
         and     w0, w0, 7
         cmp     w0, wzr
         cset    w0, ne
         ret

when it could be:
         tst      w0, 7
         cset     w0, ne
         ret

Kyrill

>
>
> Segher
>


* Re: RFC: Combine of compare & and oddity
  2015-09-03 16:15         ` Jeff Law
@ 2015-09-03 16:40           ` Segher Boessenkool
  2015-09-03 18:56             ` Wilco Dijkstra
  0 siblings, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2015-09-03 16:40 UTC (permalink / raw)
  To: Jeff Law; +Cc: Wilco Dijkstra, 'GCC Patches'

On Thu, Sep 03, 2015 at 10:09:36AM -0600, Jeff Law wrote:
> >>You will end up with a *lot* of target hooks like this.  It will also
> >>make testing harder (less coverage).  I am not sure that is a good idea.
> >
> >We certainly need a lot more target hooks in general so GCC can do the 
> >right thing
> >(rather than using costs inconsistently all over the place). But that's a 
> >different
> >discussion...
> Let's be very careful here, target hooks aren't always the solution. 
> I'd rather see the costing models fixed and use those across the board. 
>  But frankly, I don't know how to fix the costing models.

Combine doesn't currently use costs to decide how to simplify and
canonicalise things.  Simplifications are what is simpler RTL; combine's
job is to make fewer RTL instructions (which is not the same thing as
fewer machine instructions, or cheaper instructions).  Changing what is
canonical based on target hooks would be, uh, interesting.


Segher


* RE: RFC: Combine of compare & and oddity
  2015-09-03 16:36           ` Kyrill Tkachov
@ 2015-09-03 16:44             ` Wilco Dijkstra
  2015-09-03 17:27             ` Segher Boessenkool
  1 sibling, 0 replies; 26+ messages in thread
From: Wilco Dijkstra @ 2015-09-03 16:44 UTC (permalink / raw)
  To: Kyrylo Tkachov, Segher Boessenkool; +Cc: 'GCC Patches'

> Kyrill Tkachov wrote:
> A testcase I was looking at is:
> int
> foo (int a)
> {
>    return (a & 7) != 0;
> }
> 
> For me this generates:
>          and     w0, w0, 7
>          cmp     w0, wzr
>          cset    w0, ne
>          ret
> 
> when it could be:
>          tst      w0, 7
>          cset     w0, ne
>          ret

Indeed. And return (a & 14) != 0; generates a tst just fine (this should be an actual zero_extract:
((a >> 1) & 7) != 0 - but somehow it doesn't get converted into one...).
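
[ Editorial sketch of that equivalence as plain C (ignoring the
  implementation-defined behaviour of >> on negative int, the two
  functions test the same three bits):

int via_and   (int a) { return (a & 14) != 0; }
int via_shift (int a) { return ((a >> 1) & 7) != 0; }
]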

Wilco




* Re: RFC: Combine of compare & and oddity
  2015-09-03 16:36           ` Kyrill Tkachov
  2015-09-03 16:44             ` Wilco Dijkstra
@ 2015-09-03 17:27             ` Segher Boessenkool
  2015-09-03 17:30               ` Oleg Endo
  1 sibling, 1 reply; 26+ messages in thread
From: Segher Boessenkool @ 2015-09-03 17:27 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Wilco Dijkstra, 'GCC Patches'

On Thu, Sep 03, 2015 at 05:25:43PM +0100, Kyrill Tkachov wrote:
> >void g(void);
> >void f(int *x) { if (*x & 2) g(); }

> A testcase I was looking at is:
> int
> foo (int a)
> {
>   return (a & 7) != 0;
> }
> 
> For me this generates:
>         and     w0, w0, 7
>         cmp     w0, wzr
>         cset    w0, ne
>         ret
> 
> when it could be:
>         tst      w0, 7
>         cset     w0, ne
>         ret

Interesting, thanks.

That testcase with 4 (instead of 7) results in a single ubfx (a zero_extract)
(this case is written differently before combine already, however).
With 6 it does what you want (combine does not handle it as an extract,
no matter what the docs say); and 7 is as you say (combine tries the extract,
there is no insn like that).
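
[ For reference, the three variants compared here, written out as C
  (editorial sketch; the comments just restate the observations above):

int v4 (int a) { return (a & 4) != 0; }  /* single ubfx (zero_extract)      */
int v6 (int a) { return (a & 6) != 0; }  /* tst, not treated as an extract  */
int v7 (int a) { return (a & 7) != 0; }  /* extract tried, no matching insn */
]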


Segher


* Re: RFC: Combine of compare & and oddity
  2015-09-03 17:27             ` Segher Boessenkool
@ 2015-09-03 17:30               ` Oleg Endo
  2015-09-03 18:53                 ` Wilco Dijkstra
  0 siblings, 1 reply; 26+ messages in thread
From: Oleg Endo @ 2015-09-03 17:30 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Kyrill Tkachov, Wilco Dijkstra, GCC Patches


On 04 Sep 2015, at 01:54, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Thu, Sep 03, 2015 at 05:25:43PM +0100, Kyrill Tkachov wrote:
>>> void g(void);
>>> void f(int *x) { if (*x & 2) g(); }
> 
>> A testcase I was looking at is:
>> int
>> foo (int a)
>> {
>>  return (a & 7) != 0;
>> }
>> 
>> For me this generates:
>>        and     w0, w0, 7
>>        cmp     w0, wzr
>>        cset    w0, ne
>>        ret
>> 
>> when it could be:
>>        tst      w0, 7
>>        cset     w0, ne
>>        ret
> 
> Interesting, thanks.
> 
> That testcase with 4 (instead of 7) results in a single ubfx (a zero_extract)
> (this case is written differently before combine already, however).
> With 6 it does what you want (combine does not handle it as an extract,
> no matter what the docs say); and 7 is as you say (combine tries the extract,
> there is no insn like that).

I've been through this on SH.  As it currently stands, to generate tst insns basically 4 different combine patterns are required; a testcase for each case is sketched after this list:
 - lsb (e.g. & 1)
 - one bit (zero extract, e.g. & 2)
 - n contiguous bits (zero extract, e.g. & 7)
 - everything else (e.g. 4)
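
[ A hypothetical testcase per case, reusing the example constants from
  the list above (editorial sketch):

int t_lsb  (int a) { return (a & 1) != 0; }  /* lsb               */
int t_bit  (int a) { return (a & 2) != 0; }  /* one bit           */
int t_mask (int a) { return (a & 7) != 0; }  /* n contiguous bits */
int t_rest (int a) { return (a & 4) != 0; }  /* everything else   */
]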

Cheers,
Oleg


* Re: RFC: Combine of compare & and oddity
  2015-09-03 16:24         ` Segher Boessenkool
  2015-09-03 16:36           ` Kyrill Tkachov
@ 2015-09-03 18:42           ` Jeff Law
  2015-09-03 19:06             ` Segher Boessenkool
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff Law @ 2015-09-03 18:42 UTC (permalink / raw)
  To: Segher Boessenkool, Wilco Dijkstra; +Cc: 'GCC Patches'

On 09/03/2015 10:18 AM, Segher Boessenkool wrote:
>
>>>> But there are more efficient ways to emit single bit and mask tests that apply
>>>> to most CPUs rather than doing something specific that works for just one target
>>>> only. For example single bit test is a simple shift into carry flag or into the
>>>> sign bit, and for mask tests, you shift out all the non-mask bits.
>>>
>>> Most of those are quite target-specific.  Some others are already done,
>>> and/or done by other passes.
>>
>> But what combine does here is even more target-specific.
>
> Combine puts everything (well, most things) through
> make_compound_operation, on all targets.
Note huge parts of combine are structured around the needs of processors 
from the late 80s and early 90s.  The canonical forms selected for those 
processors may not be optimal for today's processors.

The problem (of course) is changing the canonical forms can be a ton of 
work in both the backends as well as combine itself to ensure quality of 
code doesn't regress.
>
>>> Combine converts the merged instructions to what it thinks is the
>>> canonical or cheapest form, and uses that.  It does not try multiple
>>> options (the zero_ext* -> and+shift rewriting is not changing the
>>> semantics of the pattern at all).
>>
>> But the change from AND to zero_extract is already changing semantics...
>
> Oh?  It is not supposed to!
Combine should never change semantics.  It can change form and may 
change what happens to "don't care" bits.  But it should never change 
visible semantics.

Jeff


* RE: RFC: Combine of compare & and oddity
  2015-09-03 17:30               ` Oleg Endo
@ 2015-09-03 18:53                 ` Wilco Dijkstra
  0 siblings, 0 replies; 26+ messages in thread
From: Wilco Dijkstra @ 2015-09-03 18:53 UTC (permalink / raw)
  To: 'Oleg Endo', Segher Boessenkool; +Cc: Kyrylo Tkachov, GCC Patches

> Oleg Endo wrote:
> On 04 Sep 2015, at 01:54, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> > On Thu, Sep 03, 2015 at 05:25:43PM +0100, Kyrill Tkachov wrote:
> >>> void g(void);
> >>> void f(int *x) { if (*x & 2) g(); }
> >
> >> A testcase I was looking at is:
> >> int
> >> foo (int a)
> >> {
> >>  return (a & 7) != 0;
> >> }
> >>
> >> For me this generates:
> >>        and     w0, w0, 7
> >>        cmp     w0, wzr
> >>        cset    w0, ne
> >>        ret
> >>
> >> when it could be:
> >>        tst      w0, 7
> >>        cset     w0, ne
> >>        ret
> >
> > Interesting, thanks.
> >
> > That testcase with 4 (instead of 7) results in a single ubfx (a zero_extract)
> > (this case is written differently before combine already, however).
> > With 6 it does what you want (combine does not handle it as an extract,
> > no matter what the docs say); and 7 is as you say (combine tries the extract,
> > there is no insn like that).

(a & 1) != 0 is optimized earlier to a & 1, (a & 2) != 0 to a real zero_extract 
(non-zero shift), so these cases don't need a compare. return (a & C) ? 2 : 3 
always uses a compare with zero, even for C=1.
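
[ A hypothetical testcase for the last point (editorial sketch); it
  still goes through a compare with zero even though C is a single bit:

int sel (int a) { return (a & 1) ? 2 : 3; }
]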

> I've been through this on SH.  As it currently stands, to generate tst insns basically 4
> different combine patterns are required:
>  - lsb (e.g. & 1)
>  - one bit (zero extract, e.g. & 2)
>  - n contiguous bits (zero extract, e.g. & 7)
>  - everything else (e.g. 4)

Also: (a & 255) and (a & 65535), which are converted into a zero_extend by combine.
Interestingly, a subreg is used depending on whether the operand is a virtual or physical reg -
((a + 1) & 255) == 0 vs (a & 65535) == 0:

Failed to match this instruction:
(set (reg:CC_ZESWP 66 cc)
    (compare:CC_ZESWP (reg:HI 0 x0 [ xD.2661 ])
        (const_int 0 [0])))

Failed to match this instruction:
(set (reg:CC_ZESWP 66 cc)
    (compare:CC_ZESWP (subreg:QI (reg:SI 80 [ D.2778 ]) 0)
        (const_int 0 [0])))

So that means another 2 patterns - and all that for one simple instruction...
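
[ The two expressions as compilable testcases (editorial sketch; an
  8-bit mask corresponds to QImode, a 16-bit mask to HImode):

int f_255   (int a) { return ((a + 1) & 255) == 0; }
int f_65535 (int a) { return (a & 65535) == 0; }
]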

Wilco



* RE: RFC: Combine of compare & and oddity
  2015-09-03 16:40           ` Segher Boessenkool
@ 2015-09-03 18:56             ` Wilco Dijkstra
  2015-09-03 19:05               ` Jeff Law
  2015-09-03 19:22               ` Segher Boessenkool
  0 siblings, 2 replies; 26+ messages in thread
From: Wilco Dijkstra @ 2015-09-03 18:56 UTC (permalink / raw)
  To: 'Segher Boessenkool', Jeff Law; +Cc: 'GCC Patches'

> Segher Boessenkool wrote:
> On Thu, Sep 03, 2015 at 10:09:36AM -0600, Jeff Law wrote:
> > >>You will end up with a *lot* of target hooks like this.  It will also
> > >>make testing harder (less coverage).  I am not sure that is a good idea.
> > >
> > >We certainly need a lot more target hooks in general so GCC can do the
> > >right thing
> > >(rather than using costs inconsistently all over the place). But that's a
> > >different
> > >discussion...
> > Let's be very careful here, target hooks aren't always the solution.
> > I'd rather see the costing models fixed and use those across the board.
> >  But frankly, I don't know how to fix the costing models.
> 
> Combine doesn't currently use costs to decide how to simplify and
> canonicalise things.  Simplifications are what is simpler RTL; combine's
> job is to make fewer RTL instructions (which is not the same thing as
> fewer machine instructions, or cheaper instructions).  Changing what is
> canonical based on target hooks would be, uh, interesting.

Would it be reasonable to query the rtx_cost of a compare+and and if the cost
is the same as an AND assume that that instruction does not need to be "improved"
into the canonical form? That way it will use the compare+and pattern if it exists
and still try the zero_extract/shift+and forms for targets that don't have a 
compare+and instruction.

Wilco




* Re: RFC: Combine of compare & and oddity
  2015-09-03 18:56             ` Wilco Dijkstra
@ 2015-09-03 19:05               ` Jeff Law
  2015-09-03 19:20                 ` Richard Sandiford
  2015-09-03 19:22               ` Segher Boessenkool
  1 sibling, 1 reply; 26+ messages in thread
From: Jeff Law @ 2015-09-03 19:05 UTC (permalink / raw)
  To: Wilco Dijkstra, 'Segher Boessenkool'; +Cc: 'GCC Patches'

On 09/03/2015 12:53 PM, Wilco Dijkstra wrote:
>> Segher Boessenkool wrote:
>> On Thu, Sep 03, 2015 at 10:09:36AM -0600, Jeff Law wrote:
>>>>> You will end up with a *lot* of target hooks like this.  It will also
>>>>> make testing harder (less coverage).  I am not sure that is a good idea.
>>>>
>>>> We certainly need a lot more target hooks in general so GCC can do the
>>>> right thing
>>>> (rather than using costs inconsistently all over the place). But that's a
>>>> different
>>>> discussion...
>>> Let's be very careful here, target hooks aren't always the solution.
>>> I'd rather see the costing models fixed and use those across the board.
>>>   But frankly, I don't know how to fix the costing models.
>>
>> Combine doesn't currently use costs to decide how to simplify and
>> canonicalise things.  Simplifications are what is simpler RTL; combine's
>> job is to make fewer RTL instructions (which is not the same thing as
>> fewer machine instructions, or cheaper instructions).  Changing what is
>> canonical based on target hooks would be, uh, interesting.
>
> Would it be reasonable to query the rtx_cost of a compare+and and if the cost
> is the same as an AND assume that that instruction does not need to be "improved"
> into the canonical form? That way it will use the compare+and pattern if it exists
> and still try the zero_extract/shift+and forms for targets that don't have a
> compare+and instruction.
Perhaps -- but you also have to make sure that you don't regress cases 
where canonicalization in turn exposes simplifications due to related 
insns in the chain.

Jeff


* Re: RFC: Combine of compare & and oddity
  2015-09-03 18:42           ` Jeff Law
@ 2015-09-03 19:06             ` Segher Boessenkool
  0 siblings, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2015-09-03 19:06 UTC (permalink / raw)
  To: Jeff Law; +Cc: Wilco Dijkstra, 'GCC Patches'

On Thu, Sep 03, 2015 at 12:42:30PM -0600, Jeff Law wrote:
> Note huge parts of combine are structured around the needs of processors 
> from the late 80s and early 90s.  The canonical forms selected for those 
> processors may not be optimal for today's processors.

Or, more precisely, for the backends for those processors.  Some of the
canonicalisation rules are very inconvenient for some backends.

> The problem (of course) is changing the canonical forms can be a ton of 
> work in both the backends as well as combine itself to ensure quality of 
> code doesn't regress.

Yes exactly.  Even more so than with other combine changes, before we
do such changes we need to evaluate 1) what this changes, on what targets;
and 2) how big the impact of that is.

Without a proposed patch, all I can say is "most targets will need changes".

> >>But the change from AND to zero_extract is already changing semantics...
> >
> >Oh?  It is not supposed to!
> Combine should never change semantics.  It can change form and may 
> change what happens to "don't care" bits.  But it should never change 
> visible semantics.

And in the reverse transform (in change_zero_ext), it is hard to tell
what those "don't care" bits are (so there are no such bits).


Segher


* Re: RFC: Combine of compare & and oddity
  2015-09-03 19:05               ` Jeff Law
@ 2015-09-03 19:20                 ` Richard Sandiford
  2015-09-03 21:20                   ` Jeff Law
  0 siblings, 1 reply; 26+ messages in thread
From: Richard Sandiford @ 2015-09-03 19:20 UTC (permalink / raw)
  To: Jeff Law
  Cc: Wilco Dijkstra, 'Segher Boessenkool', 'GCC Patches'

Jeff Law <law@redhat.com> writes:
> On 09/03/2015 12:53 PM, Wilco Dijkstra wrote:
>>> Segher Boessenkool wrote:
>>> On Thu, Sep 03, 2015 at 10:09:36AM -0600, Jeff Law wrote:
>>>>>> You will end up with a *lot* of target hooks like this.  It will also
>>>>>> make testing harder (less coverage).  I am not sure that is a good idea.
>>>>>
>>>>> We certainly need a lot more target hooks in general so GCC can do the
>>>>> right thing
>>>>> (rather than using costs inconsistently all over the place). But that's a
>>>>> different
>>>>> discussion...
>>>> Let's be very careful here, target hooks aren't always the solution.
>>>> I'd rather see the costing models fixed and use those across the board.
>>>>   But frankly, I don't know how to fix the costing models.
>>>
>>> Combine doesn't currently use costs to decide how to simplify and
>>> canonicalise things.  Simplifications are what is simpler RTL; combine's
>>> job is to make fewer RTL instructions (which is not the same thing as
>>> fewer machine instructions, or cheaper instructions).  Changing what is
>>> canonical based on target hooks would be, uh, interesting.
>>
>> Would it be reasonable to query the rtx_cost of a compare+and and if the cost
>> is the same as an AND assume that that instruction does not need to be
>> "improved"
>> into the canonical form? That way it will use the compare+and pattern
>> if it exists
>> and still try the zero_extract/shift+and forms for targets that don't have a
>> compare+and instruction.
> Perhaps -- but you also have to make sure that you don't regress cases 
> where canonicalization in turn exposes simplifications due to related 
> insns in the chain.

I agree with Segher that the canonical form really shouldn't depend
on costs or target hooks.  It's just going to be a can of worms.
And patterns shouldn't match non-canonical rtl.

If the (and ...) form is a better canonical form (IMO yes) then
I think it would be better to make it the canonical form across
the board and update the existing ports to use it.  The criteria
could be something like no unjustifiable differences in gcc.dg,
g++.dg and gcc.c-torture .s output for -O2, which is relatively
easy to test.

Thanks,
Richard


* Re: RFC: Combine of compare & and oddity
  2015-09-03 18:56             ` Wilco Dijkstra
  2015-09-03 19:05               ` Jeff Law
@ 2015-09-03 19:22               ` Segher Boessenkool
  1 sibling, 0 replies; 26+ messages in thread
From: Segher Boessenkool @ 2015-09-03 19:22 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: Jeff Law, 'GCC Patches'

On Thu, Sep 03, 2015 at 07:53:12PM +0100, Wilco Dijkstra wrote:
> > > >>You will end up with a *lot* of target hooks like this.  It will also
> > > >>make testing harder (less coverage).  I am not sure that is a good idea.
> > > >
> > > >We certainly need a lot more target hooks in general so GCC can do the
> > > >right thing
> > > >(rather than using costs inconsistently all over the place). But that's a
> > > >different
> > > >discussion...
> > > Let's be very careful here, target hooks aren't always the solution.
> > > I'd rather see the costing models fixed and use those across the board.
> > >  But frankly, I don't know how to fix the costing models.
> > 
> > Combine doesn't currently use costs to decide how to simplify and
> > canonicalise things.  Simplifications are what is simpler RTL; combine's
> > job is to make fewer RTL instructions (which is not the same thing as
> > fewer machine instructions, or cheaper instructions).  Changing what is
> > canonical based on target hooks would be, uh, interesting.
> 
> Would it be reasonable to query the rtx_cost of a compare+and and if the cost
> is the same as an AND assume that that instruction does not need to be "improved"
> into the canonical form? That way it will use the compare+and pattern if it exists
> and still try the zero_extract/shift+and forms for targets that don't have a 
> compare+and instruction.

At the point the canonicalisation is done you do not yet know if this
is a valid instruction at all.  Introducing more cost computations for
random things is not such a great idea, and for RTL that can never be
part of a machine instruction doubly so.

I think we really should just change what is the canonical form for such
a comparison.


Segher


* Re: RFC: Combine of compare & and oddity
  2015-09-03 19:20                 ` Richard Sandiford
@ 2015-09-03 21:20                   ` Jeff Law
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Law @ 2015-09-03 21:20 UTC (permalink / raw)
  To: Wilco Dijkstra, 'Segher Boessenkool',
	'GCC Patches',
	rdsandiford

On 09/03/2015 01:14 PM, Richard Sandiford wrote:
>
> If the (and ...) form is a better canonical form (IMO yes) then
> I think it would be better to make it the canonical form across
> the board and update the existing ports to use it.  The criteria
> could be something like no unjustifiable differences in gcc.dg,
> g++.dg and gcc.c-torture .s output for -O2, which is relatively
> easy to test.
I'd support this.

jeff
