From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-406664-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 44173 invoked by alias); 3 Sep 2015 16:25:51 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 44153 invoked by uid 89); 3 Sep 2015 16:25:50 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 03 Sep 2015 16:25:48 +0000
Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-29-SSlA0QrNSlKlTSHoI-DoGA-1; Thu, 03 Sep 2015 17:25:43 +0100
Received: from [10.2.207.50] ([10.1.2.79]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959);	 Thu, 3 Sep 2015 17:25:43 +0100
Message-ID: <55E87487.4060101@arm.com>
Date: Thu, 03 Sep 2015 16:36:00 -0000
From: Kyrill Tkachov <kyrylo.tkachov@arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Segher Boessenkool <segher@kernel.crashing.org>,  Wilco Dijkstra <Wilco.Dijkstra@arm.com>
CC: 'GCC Patches' <gcc-patches@gcc.gnu.org>
Subject: Re: RFC: Combine of compare & and oddity
References: <000e01d0e5a2$1e2f66b0$5a8e3410$@com> <20150902184747.GA7676@gate.crashing.org> <000f01d0e63d$c40686e0$4c1394a0$@com> <20150903131809.GA27819@gate.crashing.org> <001001d0e659$1120bb60$33623220$@com> <20150903161825.GA13559@gate.crashing.org>
In-Reply-To: <20150903161825.GA13559@gate.crashing.org>
X-MC-Unique: SSlA0QrNSlKlTSHoI-DoGA-1
Content-Type: text/plain; charset=WINDOWS-1252; format=flowed
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2015-09/txt/msg00289.txt.bz2


On 03/09/15 17:18, Segher Boessenkool wrote:
> On Thu, Sep 03, 2015 at 03:59:00PM +0100, Wilco Dijkstra wrote:
>>>> However there are 2 issues with this, one is the spurious subreg,
>>> Combine didn't make that up out of thin air; something already used
>>> DImode here.  It could simplify it to SImode in this case, that is
>>> true, don't know why it doesn't; it isn't necessarily faster code to
>>> do so, it can be slower, it might not match, etc.
>> The relevant RTL instructions on AArch64 are:
> [ You never gave a full test case, or I missed it, or cannot find it
>    anymore -- but I can reproduce this now:
>
> void g(void);
> void f(int *x) { if (*x & 2) g(); }
>
> ]
>
>> (insn 8 3 25 2 (set (reg:SI 77 [ D.2705 ])
>>          (and:SI (reg/v:SI 76 [ xD.2641 ])
>>              (const_int 2 [0x2]))) tmp5.c:122 452 {andsi3}
>>       (nil))
>>   (insn 26 25 27 2 (set (reg:CC 66 cc)
>>          (compare:CC (reg:SI 77 [ D.2705 ])
>>              (const_int 0 [0]))) tmp5.c:122 377 {*cmpsi}
>>       (expr_list:REG_DEAD (reg:SI 77 [ D.2705 ])
>>          (nil)))
>>
>> I don't see anything using DI...
> Yeah, I spoke too soon, sorry.  It looks like make_compound_operation came
> up with it.
>
>>> It's only a problem for AND-and-compare, no?
>> Yes, so it looks like some other backends match the odd pattern and then
>> have another pattern change it back into the canonical AND/TST form during
>> the split phase (maybe the subreg confuses register allocation or block
>> other optimizations).
> A subreg of a pseudo is not anything special, don't worry about it,
> register_operand and similar treat it just like any other register.
>
>> This all seems a lot of unnecessary complexity for a few special
>> immediates when there is a much simpler solution...
> Feel free to post a patch!  I would love to have this all simplified.
>
>>>> But there are more efficient ways to emit single bit and masks tests
>>>> that apply to most CPUs rather than doing something specific that works
>>>> for just one target only. For example single bit test is a simple shift
>>>> into carry flag or into the sign bit, and for mask tests, you shift out
>>>> all the non-mask bits.
>>> Most of those are quite target-specific.  Some others are already done,
>>> and/or done by other passes.
>> But what combine does here is even more target-specific.
> Combine puts everything (well, most things) through
> make_compound_operation, on all targets.
>
>>> Combine converts the merged instructions to what it thinks is the
>>> canonical or cheapest form, and uses that.  It does not try multiple
>>> options (the zero_ext* -> and+shift rewriting is not changing the
>>> semantics of the pattern at all).
>> But the change from AND to zero_extract is already changing semantics...
> Oh?  It is not supposed to!
>
>>>> Or would it be better to let each target decide
>>>> on how to canonicalize bit tests and only try that alternative?
>>> The question is how to write the pattern to be most convenient for all
>>> targets.
>> The obvious choice is to try the 2 original instructions merged.
> ... without any simplification.  Yes, I've wanted combine to fall back
> to that if the "simplified" version does not work out.  Not so easy to
> do though.
>
>>>> Yes, but that doesn't mean (x & C) != 0 shouldn't be tried as well...
>>> Combine does not try multiple options.
>> I'm not following - combine tries zero_extract and shift+AND - that's 2
>> options.
>> If that is feasible then adding a 3rd option should be possible.
> The shift+and is *exactly the same* as the zero_extract, just written
> differently.
>
>> We certainly need a lot more target hooks in general so GCC can do the
>> right thing (rather than using costs inconsistently all over the place).
>> But that's a different discussion...
> This isn't about costs though.  That is a big other can of worms, indeed!
>
>
> Anyway.  In that testcase I made, everything is simplified just fine on
> aarch64, using *tbeqdi1; what am I missing?
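
For reference, the three ways of writing the single-bit test from the
testcase above ((x & 2) != 0) can be spelled out in plain C. This is only a
sketch to show that the forms agree on the value tested: zero_extract() here
is a hypothetical helper that models the RTL operation on a 32-bit value, it
is not a GCC function, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model of RTL zero_extract: LEN bits starting at bit POS,
   zero-extended.  Assumes 1 <= len <= 32 and pos + len <= 32.  */
static uint32_t zero_extract(uint32_t x, unsigned len, unsigned pos)
{
    uint32_t mask = (len >= 32) ? UINT32_MAX : ((UINT32_C(1) << len) - 1);
    return (x >> pos) & mask;
}

/* Plain AND form, as in the source: (x & 2) != 0.  */
static int test_and(uint32_t x)
{
    return (x & 2u) != 0;
}

/* zero_extract form: one bit at position 1, compared against zero.  */
static int test_zero_extract(uint32_t x)
{
    return zero_extract(x, 1, 1) != 0;
}

/* shift+and form: the same extraction written out by hand.  */
static int test_shift_and(uint32_t x)
{
    return ((x >> 1) & 1u) != 0;
}

/* Returns 1 when all three forms give the same answer for X.  */
int forms_agree(uint32_t x)
{
    int a = test_and(x);
    return a == test_zero_extract(x) && a == test_shift_and(x);
}
```

Calling forms_agree() over a range of inputs should always return 1; the
forms differ only in how the pattern is written, not in the result.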

A testcase I was looking at is:
int
foo (int a)
{
   return (a & 7) != 0;
}

For me this generates:
         and     w0, w0, 7
         cmp     w0, wzr
         cset    w0, ne
         ret

when it could be:
         tst     w0, 7
         cset    w0, ne
         ret
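
To make the equivalence concrete, here is a small C sketch of the mask test
next to the shift-based alternatives mentioned earlier in the thread (shift
out the non-mask bits, or shift a single bit into the sign position). The
function names are made up for illustration, and it assumes a 32-bit int as
on AArch64; it says nothing about which form a given target prefers.

```c
#include <stdint.h>

/* The testcase as written: AND, then compare the result with zero.
   The AND result is used only for the comparison, so a flag-setting
   test (tst) can replace the and + cmp pair.  */
int foo_and_cmp(int a)
{
    int t = a & 7;
    return t != 0;
}

/* Mask test by shifting out the non-mask bits: the low 3 bits move to
   the top of the word, so the result is nonzero iff any of them is set.
   The shift is done on an unsigned value to keep it well defined.  */
int foo_shift(int a)
{
    return ((uint32_t)a << 29) != 0;
}

/* Single-bit test by shifting bit 1 into the sign-bit position and
   checking that position.  Equivalent to (a & 2) != 0.  */
int bit1_via_sign(int a)
{
    return (((uint32_t)a << 30) & UINT32_C(0x80000000)) != 0;
}
```

foo_and_cmp() and foo_shift() return the same value for every input, and
bit1_via_sign(a) matches (a & 2) != 0.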

Kyrill

>
>
> Segher
>