public inbox for gcc-patches@gcc.gnu.org
From: Andrew Pinski <pinskia@gmail.com>
To: Wilco Dijkstra <wdijkstr@arm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
	GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: RFC: Combine of compare & and oddity
Date: Thu, 03 Sep 2015 15:21:00 -0000
Message-ID: <CA+=Sn1nP29_a9m3AHy-fOkW_1MQSpcw-V_8TCWcLenH4m+hw+A@mail.gmail.com>
In-Reply-To: <001001d0e659$1120bb60$33623220$@com>

On Thu, Sep 3, 2015 at 10:59 PM, Wilco Dijkstra <wdijkstr@arm.com> wrote:
>> Segher Boessenkool wrote:
>> On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
>> > > > Combine canonicalizes certain AND masks in a comparison with zero into extracts of the
>> > > widest
>> > > > register type. During matching these are expanded into a very inefficient sequence that
>> > > fails to
>> > > > match. For example (x & 2) == 0 is matched in combine like this:
>> > > >
>> > > > Failed to match this instruction:
>> > > > (set (reg:CC 66 cc)
>> > > >     (compare:CC (zero_extract:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>> > > >             (const_int 1 [0x1])
>> > > >             (const_int 1 [0x1]))
>> > > >         (const_int 0 [0])))
>> > >
>> > > Yes.  Some processors even have specific instructions to do this.
>> >
>> > However there are 2 issues with this, one is the spurious subreg,
>>
>> Combine didn't make that up out of thin air; something already used
>> DImode here.  It could simplify it to SImode in this case, that is
>> true, don't know why it doesn't; it isn't necessarily faster code to
>> do so, it can be slower, it might not match, etc.
>
> The relevant RTL instructions on AArch64 are:
>
> (insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
>         (and:SI (reg/v:SI 76 [ xD.2641 ])
>             (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
>      (nil))
>  (insn 26 25 27 2 (set (reg:CC 66 cc)
>         (compare:CC (reg:SI 77 [ D.2705 ])
>             (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
>      (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
>         (nil)))
>
> I don't see anything using DI...
>
>> > the other is
>> > that only a subset of legal zero_extracts are tried (only single bit and the
>> > degenerate case of zero_extract with shift of 0 - which I think should not be a
>> > zero_extract). All other AND immediate remain as AND.
>>
>> Yes.  I'm happy to see this weird special case "optimisation",
>> "canocalisation" gone.
>>
>> > So to emit an AND on targets without such specific instructions, or where such
>> > instructions are more expensive than a simple AND (*), you need now at least 3 different
>> > backend patterns for any instruction that can emit AND immediate...
>>
>> It's only a problem for AND-and-compare, no?
>
> Yes, so it looks like some other backends match the odd pattern and then have another
> pattern change it back into the canonical AND/TST form during the split phase (maybe
> the subreg confuses register allocation or blocks other optimizations). This all seems
> a lot of unnecessary complexity for a few special immediates when there is a much
> simpler solution...
>
>> > (*) I think that is another issue in combine - when both alternatives match you
>> > want to select the lowest cost one, not the first one that matches.
>>
>> That's recog, not combine.  And quite a few backends rely on "first match
>> wins", because it always has been that way.  It also is very easy to write
>> such patterns accidentally (sometimes even the exact same one twice in the
>> same machine description, etc.)
>>
>> > > > Failed to match this instruction:
>> > > > (set (reg:CC 66 cc)
>> > > >     (compare:CC (and:DI (lshiftrt:DI (subreg:DI (reg/v:SI 76 [ xD.2641 ]) 0)
>> > > >                 (const_int 1 [0x1]))
>> > > >             (const_int 1 [0x1]))
>> > > >         (const_int 0 [0])))
>> > >
>> > > This is after r223067.  Combine tests only one "final" instruction; that
>> > > revision rewrites zero_ext* if it doesn't match and tries again.  This
>> > > helps for processors that can do more generic masks (like rs6000, and I
>> > > believe also aarch64?): without it, you need many more patterns to match
>> > > all the zero_ext{ract,end} cases.
>> >
>> > But there are more efficient ways to emit single bit and masks tests that apply
>> > to most CPUs rather than doing something specific that works for just one target
>> > only. For example single bit test is a simple shift into carry flag or into the
>> > sign bit, and for mask tests, you shift out all the non-mask bits.
>>
>> Most of those are quite target-specific.  Some others are already done,
>> and/or done by other passes.
>
> But what combine does here is even more target-specific. Shifts and AND setting flags
> are universally supported on targets with a condition code register.
> Bitfield test/extract instructions are more rare, and when they are supported, they
> may well be more expensive.
>
>> > So my question is, is it combine's job to try all possible permutations that
>> > constitute a bit or mask test?
>>
>> Combine converts the merged instructions to what it thinks is the
>> canonical or cheapest form, and uses that.  It does not try multiple
>> options (the zero_ext* -> and+shift rewriting is not changing the
>> semantics of the pattern at all).
>
> But the change from AND to zero_extract is already changing semantics...
>
>> > Or would it be better to let each target decide
>> > on how to canonicalize bit tests and only try that alternative?
>>
>> The question is how to write the pattern to be most convenient for all
>> targets.
>
> The obvious choice is to try the 2 original instructions merged.
>
>> > > > Neither matches the AArch64 patterns for ANDS/TST (which is just compare and AND). If
>> the
>> > > immediate
>> > > > is not a power of 2 or a power of 2 -1 then it matches correctly as expected.
>> > > >
>> > > > I don't understand how ((x >> 1) & 1) != 0 could be a useful expansion
>> > >
>> > > It is zero_extract(x,1,1) really.  This is convenient for (old and embedded)
>> > > processors that have special bit-test instructions.  If we now want combine
>> > > to not do this, we'll have to update all backends that rely on it.
>> >
>> > Would any backend actually rely on this given it only does some specific masks,
>> > has a redundant shift with 0 for the mask case and the odd subreg as well?
>>
>> Such backends match the zero_extract patterns, of course.  Random example:
>> the h8300 patterns for the "btst" instruction.
>>
>> > > They are common, and many processors had instructions for them, which is
>> > > (supposedly) why combine special-cased them.
>> >
>> > Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...
>>
>> Combine does not try multiple options.
>
> I'm not following - combine tries zero_extract and shift+AND - that's 2 options.
> If that is feasible then adding a 3rd option should be possible.
>
>> > > > It's trivial to change change_zero_ext to expand extracts always into AND and remove the
>> > > redundant
>> > > > subreg.
>> > >
>> > > Not really trivial...  Think about comparisons...
>> > >
>> > > int32_t x;
>> > > ((x >> 31) & 1) > 0;
>> > > // is not the same as
>> > > (x & 0x80000000) > 0; // signed comparison
>> > >
>> > > (You do not easily know what the COMPARE is used for).
>> >
>> > Indeed if you had a zero_extract that included the sign-bit then you may have to adjust
>> > the compare condition. However it looks like that case can't happen - (x & 0x80000000)
>> > comparisons will have the AND optimized away much earlier.
>>
>> A long time before combine, yes.  But it is *possible* to give such insns
>> to combine, and it should not do the wrong thing with them.
>
> Yes. So that suggests we'd need to block the canonicalization rather than undo it.
>
>> > > > However wouldn't it make more sense to never special case certain AND immediate in the
>> first
>> > > > place?
>> > >
>> > > Yes, but we need to make sure no targets regress (i.e. by rewriting patterns
>> > > for those that do).  We need to know the impact of such a change, at the least.
>> >
>> > The alternative would be let the target decide how to canonicalize bit tests.
>> > That seems like a better solution than trying multiple possibilities which can never
>> > match on most targets.
>>
>> You will end up with a *lot* of target hooks like this.  It will also
>> make testing harder (less coverage).  I am not sure that is a good idea.
>
> We certainly need a lot more target hooks in general so GCC can do the right thing
> (rather than using costs inconsistently all over the place). But that's a different
> discussion...


This is the pattern I have in my tree (I still need someone here to
submit it):

(define_insn "*and<mode>3_ze_nr_compare0"
  [(set (reg:CC_NZ CC_REGNUM)
(compare:CC_NZ
(zero_extract:GPI  ; loc size pos
                  (match_operand:GPI 0 "register_operand" "r")
                  (match_operand:GPI 1 "const_int_operand" "n")
 (match_operand:GPI 2 "const_int_operand" "n"))
(const_int 0)))]
  "aarch64_bitmask_imm ((((HOST_WIDE_INT_1U << UINTVAL (operands[1]))
- 1) << UINTVAL (operands[2])),<MODE>mode)"
  {
    unsigned HOST_WIDE_INT value  = (((HOST_WIDE_INT_1U << UINTVAL
(operands[1])) - 1) << UINTVAL (operands[2]));
    operands[1] = GEN_INT (value);
    return "tst\\t%<w>0, %<w>1";
  }
  [(set_attr "type" "logics_reg")]
)
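
As an example of what this is meant to catch, take the (x & 2) == 0 case
from the start of the thread. This is only an illustrative C sketch, and
the assembly is what I would expect rather than something I have verified
here:

int
bit1_clear (unsigned int x)
{
  /* Combine canonicalizes this into a zero_extract of 1 bit at
     position 1 compared against zero, which the pattern above
     turns back into "tst w0, 2" (plus a cset for the result).  */
  return (x & 2) == 0;
}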

--- CUT ---

And then under the COMPARE case of rtx_costs:
          /* CC_NZmode supports zero extract for free.  */
          if (GET_MODE (x) == CC_NZmode && GET_CODE (op0) == ZERO_EXTRACT)
            op0 = XEXP (op0, 0);

And in aarch64_select_cc_mode:

  if ((GET_MODE (x) == SImode || GET_MODE (x) == DImode)
      && y == const0_rtx
      && (code == EQ || code == NE || code == LT || code == GE)
      && (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND
          || GET_CODE (x) == NEG || GET_CODE (x) == ZERO_EXTRACT))
    return CC_NZmode;
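
The power-of-2-minus-1 masks that combine also canonicalizes (the
zero_extract with a shift of 0 that Wilco mentioned) go through the same
path. Another purely illustrative sketch, with expected rather than
verified output:

int
low3_set (unsigned int x)
{
  /* Canonicalized as zero_extract (x, 3, 0) compared against zero;
     with the pattern and the CC_NZ selection above this should come
     out as "tst w0, 7" followed by "cset w0, ne".  */
  return (x & 7) != 0;
}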


Note I also have the following, to handle the case where a zero_extend
of the AND result is needed (this shows up mostly right after returns
from functions):

(define_insn "*andsi3_compare0_uxtw"
  [(set (reg:CC_NZ CC_REGNUM)
(compare:CC_NZ
(and:SI (match_operand:SI 1 "register_operand" "%r,r")
(match_operand:SI 2 "aarch64_logical_operand" "r,K"))
(const_int 0)))
   (set (match_operand:DI 0 "register_operand" "=r,r")
(zero_extend:DI (and:SI (match_dup 1) (match_dup 2))))]
  ""
  "ands\\t%w0, %w1, %w2"
  [(set_attr "type" "logics_reg,logics_imm")]


Thanks,
Andrew Pinski




>
> Wilco
>
>


Thread overview: 26+ messages
2015-09-02 17:09 Wilco Dijkstra
2015-09-02 18:49 ` Segher Boessenkool
2015-09-02 19:56   ` Segher Boessenkool
2015-09-03 11:51   ` Wilco Dijkstra
2015-09-03 13:57     ` Segher Boessenkool
2015-09-03 15:00       ` Wilco Dijkstra
2015-09-03 15:21         ` Andrew Pinski [this message]
2015-09-03 16:15         ` Jeff Law
2015-09-03 16:40           ` Segher Boessenkool
2015-09-03 18:56             ` Wilco Dijkstra
2015-09-03 19:05               ` Jeff Law
2015-09-03 19:20                 ` Richard Sandiford
2015-09-03 21:20                   ` Jeff Law
2015-09-03 19:22               ` Segher Boessenkool
2015-09-03 16:24         ` Segher Boessenkool
2015-09-03 16:36           ` Kyrill Tkachov
2015-09-03 16:44             ` Wilco Dijkstra
2015-09-03 17:27             ` Segher Boessenkool
2015-09-03 17:30               ` Oleg Endo
2015-09-03 18:53                 ` Wilco Dijkstra
2015-09-03 18:42           ` Jeff Law
2015-09-03 19:06             ` Segher Boessenkool
2015-09-03 15:52       ` Jeff Law
2015-09-02 20:45 ` Jeff Law
2015-09-02 21:05   ` Segher Boessenkool
2015-09-03 15:26     ` Jeff Law
