From: Richard Biener
To: Andre Vieira
Cc: GCC Patches
Subject: Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify
Date: Fri, 25 Sep 2015 12:22:00 -0000
In-Reply-To: <5605303B.7080102@arm.com>
References: <55E092D9.8070700@arm.com> <55E5AACB.3050205@arm.com> <55E82AFF.4020401@arm.com> <5605303B.7080102@arm.com>

On Fri, Sep 25, 2015 at 1:30 PM, Andre Vieira wrote:
> On 17/09/15 10:46, Richard Biener wrote:
>>
>> On Thu, Sep 3, 2015 at 1:11 PM, Andre Vieira wrote:
>>>
>>> On 01/09/15 15:01, Richard Biener wrote:
>>>>
>>>> On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> On 28/08/15 19:07, Marc Glisse wrote:
>>>>>>
>>>>>> (not a review, I haven't even read the whole patch)
>>>>>>
>>>>>> On Fri, 28 Aug 2015, Andre Vieira wrote:
>>>>>>
>>>>>>> 2015-08-03  Andre Vieira
>>>>>>>
>>>>>>>         * match.pd: Added new patterns:
>>>>>>>           (((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>>>>>           (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>>>>
>>>>>> +(for op0 (rshift rshift lshift lshift bit_and bit_and)
>>>>>> + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
>>>>>> + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)
>>>>>>
>>>>>> You can nest for-loops, it seems clearer as:
>>>>>>
>>>>>> (for op0 (rshift lshift bit_and)
>>>>>>  (for op1 (bit_ior bit_xor)
>>>>>>       op2 (bit_xor bit_ior)
>>>>>
>>>>> Will do, thank you for pointing it out.
>>>>>
>>>>>> +(simplify
>>>>>> + (op2:c
>>>>>> +  (op1:c
>>>>>> +   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>>>>>>
>>>>>> I suspect you will want more :s (single_use) and less :c
>>>>>> (canonicalization should put constants in second position).
>>>>>
>>>>> I can't find the definition of :s (single_use).
>>>>
>>>> Sorry for that - didn't get along updating it yet :/  It restricts
>>>> matching to sub-expressions that have a single use.  So in
>>>>
>>>> +  a &= 0xd123;
>>>> +  unsigned short tem = a ^ 0x6040;
>>>> +  a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
>>>>    ... use of tem ...
>>>>
>>>> we shouldn't do the simplification here because the expression
>>>> (a & 0xd123) ^ 0x6040 is kept live by 'tem'.
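
For illustration only (not part of the patch or the thread; the function
names below are invented), the single-use distinction that :s encodes can
be spelled out as two compilable C functions:

/* In foo_single_use the intermediate (a ^ 0x6040) has exactly one use, so
   folding the whole chain into ((a & 0xd123) | 0xe071) loses nothing.  In
   foo_multi_use the intermediate is also stored through *out, so the
   sub-expression stays live and the combined constant would just add an
   extra operation.  */

unsigned short foo_single_use (unsigned short a)
{
  a &= 0xd123;
  unsigned short tem = a ^ 0x6040;
  return tem | 0xc031;   /* May simplify to (a & 0xd123) | 0xe071.  */
}

unsigned short foo_multi_use (unsigned short a, unsigned short *out)
{
  a &= 0xd123;
  unsigned short tem = a ^ 0x6040;
  *out = tem;            /* Second use keeps 'tem' live.  */
  return tem | 0xc031;   /* :s would block the simplification here.  */
}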
>>>>> GCC internals do point out that canonicalization does put constants
>>>>> in the second position, didn't see that first.  Thank you for
>>>>> pointing it out.
>>>>>
>>>>>> + C1 = wi::bit_and_not (C1,C2);
>>>>>>
>>>>>> Space after ','.
>>>>>
>>>>> Will do.
>>>>>
>>>>>> Having wide_int_storage in many places is surprising, I can't find
>>>>>> similar code anywhere else in gcc.
>>>>>
>>>>> I tried looking for examples of something similar; I think I ended up
>>>>> using wide_int because I was able to convert easily to and from it and
>>>>> it has the "mask" and "wide_int_to_tree" functions.  I welcome any
>>>>> suggestions on what I should be using here for integer constant
>>>>> transformations and comparisons.
>>>>
>>>> Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
>>>> constructors - those are indeed odd.  Is it just for the initializers
>>>> of wide-ints?
>>>>
>>>> + wide_int zero_mask = wi::zero (prec);
>>>> + wide_int C0 = wide_int_storage (@1);
>>>> + wide_int C1 = wide_int_storage (@2);
>>>> + wide_int C2 = wide_int_storage (@3);
>>>> ...
>>>> + zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false, prec));
>>>>
>>>> tree_to_uhwi (@1) should do the trick as well.
>>>>
>>>> + C1 = wi::bit_and_not (C1,C2);
>>>> + cst_emit = wi::bit_or (C1, C2);
>>>>
>>>> the ops should be replaceable with @2 and @3, the store to C1 obviously
>>>> not (but you can use a tree temporary and use wide_int_to_tree here to
>>>> avoid the back-and-forth for the case where C1 is not assigned to).
>>>>
>>>> Note that transforms only doing association are prone to endless
>>>> recursion in case some other pattern does the reverse op...
>>>>
>>>> Richard.
>>>>
>>>>> BR,
>>>>> Andre
>>>
>>> Thank you for all the comments, see reworked version:
>>>
>>> Two new algorithmic optimisations:
>>>   1. (((X op0 C0) op1 C1) op2 C2)
>>>      with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
>>>      zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
>>>      and 0's otherwise.
>>>      if (op1 == '^') C1 &= ~C2 (only changed if actually emitted)
>>>      if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
>>>      if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
>>>   2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>
>>> This patch does two algorithmic optimisations that target patterns like:
>>> (((x << 24) | 0x00FFFFFF) ^ 0xFF000000) and
>>> ((x ^ 0x40000002) >> 1) | 0x80000000.
>>>
>>> The transformation uses the knowledge of which bits are zero after
>>> operations like (X {&,<<,(unsigned)>>} C0) to combine constants,
>>> reducing run-time operations.  The two examples above would be
>>> transformed into (X << 24) ^ 0xFFFFFFFF and (X >> 1) ^ 0xa0000001
>>> respectively.
>>>
>>> The second transformation enables more applications of the first.  Also,
>>> some targets may benefit from delaying shift operations.  I am aware
>>> that such an optimization, in combination with one or more optimizations
>>> that cause the reverse transformation, may lead to an infinite loop,
>>> though such behavior has not been detected during regression testing
>>> and bootstrapping on aarch64.
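
As a sanity check of the constant combination described above (illustrative
only, using plain uint32_t arithmetic instead of GCC's wide_int), both of
the quoted examples can be verified with a small self-contained C program:

#include <assert.h>
#include <stdint.h>

int main (void)
{
  uint32_t x = 0xDEADBEEF;  /* arbitrary test value */

  /* Example 1: ((x << 24) | 0x00FFFFFF) ^ 0xFF000000.
     zero_mask for (x << 24) is 0x00FFFFFF, and 0x00FFFFFF & ~zero_mask == 0,
     so the constants fold to 0x00FFFFFF ^ 0xFF000000 == 0xFFFFFFFF.  */
  assert ((((x << 24) | 0x00FFFFFF) ^ 0xFF000000) == ((x << 24) ^ 0xFFFFFFFF));

  /* Example 2: ((x ^ 0x40000002) >> 1) | 0x80000000.
     The second transformation first gives (x >> 1) ^ 0x20000001; then
     zero_mask for (x >> 1) is 0x80000000, and 0x80000000 & ~zero_mask == 0,
     so the constants fold to 0x20000001 | 0x80000000 == 0xA0000001.  */
  assert ((((x ^ 0x40000002) >> 1) | 0x80000000) == ((x >> 1) ^ 0xA0000001));

  return 0;
}

The same identities hold for every x, since the dropped constant only
touches bits that the shift is already guaranteed to clear.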
>> +/* (X bit_op C0) rshift C1 -> (X rshift C1) bit_op (C0 rshift C1) */
>> +(for bit_op (bit_ior bit_xor bit_and)
>> +(simplify
>> + (rshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
>> + (bit_op
>> +  (rshift @0 @2)
>> +  { wide_int_to_tree (type, wi::rshift (@1, @2, TYPE_SIGN (type))); })))
>> +
>> +/* (X bit_op C0) lshift C1 -> (X lshift C1) bit_op (C0 lshift C1) */
>> +(for bit_op (bit_ior bit_xor bit_and)
>> +(simplify
>> + (lshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
>> + (bit_op
>> +  (lshift @0 @2)
>> +  { wide_int_to_tree (type, wi::lshift (@1, @2)); })))
>
> Will do, good to see that my second transformation still picks up the
> shift @1 @2 as a constant.  I'm assuming there is some constant folding
> going on between simplify transformations?

Yes.

>> this may be one case where not using wide-ints to be able to combine the
>> patterns makes sense.  Thus,
>>
>> (for shift (lshift rshift)
>>  (simplify
>>   (shift ...)
>>   (bit_op
>>    (shift @0 @2)
>>    (shift @1 @2))))
>>
>> note that I'm worried we'd take on "invalid" ubsan here when the above
>> applies to
>>
>> int foo (int i)
>> {
>>   return (i & 0x7fffffff) >> 3;
>> }
>> int main () { return foo (0x80000007); }
>>
>> and i is negative.  That's because right-shift of negative values invokes
>> undefined behavior.  IIRC in the middle-end we will not be taking
>> advantage of that, but the simplifications apply to GENERIC as well and
>> thus may hit before ubsan instrumentation triggers(?)  It would be nice
>> if you could double-check that.
>
> I was looking into this and I understand your worries; though, I found out
> that for the particular case of shifts and bit_and there already is a
> simplify transformation that does the same, regardless of the sign.
>
> /* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
>    (X & C2) >> C1 into (X >> C1) & (C2 >> C1).  */
> (for shift (lshift rshift)
>  (simplify
>   (shift (convert?:s (bit_and:s @0 INTEGER_CST@2)) INTEGER_CST@1)
>   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>    (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
>     (bit_and (shift (convert @0) @1) { mask; })))))

I see ... also an opportunity to merge this pattern with yours.

> Now, I don't quite understand what you mean by ubsan instrumentation: will
> GCC introduce guards into code where it detects potential undefined
> behavior?

Yes.

> Also, it might be worth noting that right shift of negative values is
> denoted as "implementation defined" by the C standard.  GCC however
> doesn't include it in its list of implementation-defined behavior, so I
> guess that is why you refer to it as undefined?

Not sure, I thought it was undefined.  If it is implementation defined
then GCC needs to document its behavior.

> Should we perhaps disable transformations where we cannot guarantee that
> the right shift produced is not one of negative values?  I.e. prohibit
> this transformation for:
>   1) signed types and op1 == bit_xor
>   2) signed types and op1 == bit_and and C1 has the sign bit set.
>
> Also, would it be useful, in cases where you have a signed shift and
> bit_and and C1 is not negative, to do the transformation but replace the
> signed shift by an unsigned shift?  This to make sure we don't introduce
> undefined/implementation-defined behavior where there was none.
>
> This does however change the current behavior!

Yeah, so unless somebody else chimes in, let's consider this a followup
only.
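
To make the signed-shift concern concrete (illustrative only; the function
names are invented), the two forms of the foo example above look like this
in plain C, together with the unsigned-shift variant suggested above:

/* Original form: the bit_and clears the sign bit first, so the value that
   is actually right-shifted is never negative.  */
int keep_and_then_shift (int i)
{
  return (i & 0x7fffffff) >> 3;
}

/* Distributed form: (i >> 3) right-shifts a negative value whenever i < 0.
   Doing the shift in unsigned arithmetic, as suggested above, keeps the
   two forms equivalent without shifting a negative value.  */
int shift_then_and (int i)
{
  return (int) (((unsigned int) i >> 3) & (0x7fffffffu >> 3));
}

Both functions return the same value for every i: masking the sign bit and
then shifting keeps bits 3..30, exactly as shifting first and masking with
0x7fffffff >> 3 == 0x0fffffff does.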
>> + (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type)) && wi::fits_uhwi_p (@1))
>>
>> you only need fits_uhwi_p (@1) in the op0 != BIT_AND_EXPR case it seems,
>> so better move it down to those cases.
>
> So I used to, but I had the problem that I didn't know how to make it
> "fail" the matching if this was not the case.  For instance, if op0 is a
> lshift for which the constant doesn't fit uhwi, then it would fall through
> and never set the zero mask, potentially leading to a wrong transformation.
> Now I'm not sure this can happen, since that would mean that either
> constant @2 or @3 would need to be all 1's and that might already be
> caught by some other transformation, but it's wrong to rely on such
> behavior IMO.  So for now I have changed it to:
>
> (simplify
>  (op2
>   (op1:s
>    (op0@4 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>  (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type))
>       && (op0 == BIT_AND_EXPR || wi::fits_uhwi_p (@1)))
>
> It would be cool to have a FAIL expression, usable in the with clauses, to
> make the pattern match fail, a bit like the one in the machine description
> language.

I'll think about it.  Currently you'd need to add a 'bool fail' in the with
and initialize it, adding a (if (!fail) ...) after it.

>> + (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))
>>
>> I think you can write
>>
>> (if (wi::bit_and (...) == 0)
>>
>> or at least wi::eq_p (wi::bit_and (...), 0).
>
> wi::bit_and (...) == 0 seems to be doing the trick.
>
>> I wonder if we shouldn't improve the pattern by handling (X op0 C0)
>> transparently via using get_nonzero_bits (yeah, that's not exactly
>> zero_mask but its inverse AFAIK).  We'd wrap get_nonzero_bits in a helper
>> that can handle GENERIC and your &, >>, << cases (hmm, such a function
>> must already exist somewhere...).  So you'd reduce the pattern to
>>
>> + (for op1 (bit_ior bit_xor)
>> +      op2 (bit_xor bit_ior)
>> +(simplify
>> + (op2
>> +  (op1:s @0 INTEGER_CST@2) INTEGER_CST@3))
>>   (with
>>    {
>>      wide_int zero_mask_not = get_nonzero_bits (@0);
>>      ...
>>    }
>>
>> This would make use of value-range information determined by VRP, for
>> example.
>
> I'll go look for such a function.
>
>> note that with your pattern you'd want to capture (op0:s @0 INTEGER_CST@1)
>> like via (op0@4 @0 INTEGER_CST@1) so you can re-use it in the replacement
>> like so:
>>
>> + (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))
>> +  (op2 @4 { wide_int_to_tree (type, cst_emit); })
>> +  (if (wi::eq_p (wi::bit_and (@3, zero_mask_not), wi::zero (prec)))
>> +   (op1 @4 { wide_int_to_tree (type, cst_emit); }))))))))
>>
>> the expression doesn't need a :s then, obviously.
>
> Yeah, makes sense.
>
>> Thanks and sorry for the delay in reviewing this.
>> Richard.
>
> Thank you for all the comments!

No problem!

>>> gcc/ChangeLog:
>>>
>>> 2015-08-03  Andre Vieira
>>>
>>>         * match.pd: Added new patterns:
>>>           (((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>           (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2015-08-03  Andre Vieira
>>>             Hale Wang
>>>
>>>         * gcc.dg/tree-ssa/forwprop-33.c: New test.
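
As a closing illustration of the zero_mask computation discussed in this
thread (not GCC code; plain C over 32-bit values, with invented names), the
known-zero-bits mask for the three handled operations is roughly:

#include <stdint.h>

/* Bits guaranteed to be zero in (x OP c0), for the three ops the pattern
   handles: a 1 bit in the result means "this bit is certainly 0".  */
uint32_t zero_mask_for (char op, uint32_t c0)
{
  switch (op)
    {
    case '&':   /* x & c0: every bit clear in c0 is clear in the result.  */
      return ~c0;
    case '<':   /* x << c0: the low c0 bits are clear.  */
      return (c0 >= 32) ? ~0u : ((1u << c0) - 1);
    case '>':   /* (unsigned) x >> c0: the high c0 bits are clear.  */
      return (c0 >= 32) ? ~0u : ~(~0u >> c0);
    default:
      return 0; /* Nothing known.  */
    }
}

For instance, zero_mask_for ('<', 24) is 0x00FFFFFF and
zero_mask_for ('>', 1) is 0x80000000, matching the two worked examples; the
pattern only drops a constant when it is entirely covered by this mask, as
in the C1/C2 checks quoted earlier.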