From: Richard Biener
To: Andre Vieira
Cc: GCC Patches
Subject: Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify
Date: Fri, 25 Sep 2015 12:22:00 -0000
In-Reply-To: <5605303B.7080102@arm.com>
References: <55E092D9.8070700@arm.com> <55E5AACB.3050205@arm.com> <55E82AFF.4020401@arm.com> <5605303B.7080102@arm.com>

On Fri, Sep 25, 2015 at 1:30 PM, Andre Vieira wrote:
> On 17/09/15 10:46, Richard Biener wrote:
>>
>> On Thu, Sep 3, 2015 at 1:11 PM, Andre Vieira wrote:
>>>
>>> On 01/09/15 15:01, Richard Biener wrote:
>>>>
>>>> On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira wrote:
>>>>>
>>>>> Hi Marc,
>>>>>
>>>>> On 28/08/15 19:07, Marc Glisse wrote:
>>>>>>
>>>>>> (not a review, I haven't even read the whole patch)
>>>>>>
>>>>>> On Fri, 28 Aug 2015, Andre Vieira wrote:
>>>>>>
>>>>>>> 2015-08-03  Andre Vieira
>>>>>>>
>>>>>>>         * match.pd: Added new patterns:
>>>>>>>           (((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>>>>>           (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>>>>
>>>>>> +(for op0 (rshift rshift lshift lshift bit_and bit_and)
>>>>>> + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
>>>>>> + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)
>>>>>>
>>>>>> You can nest for-loops, it seems clearer as:
>>>>>>
>>>>>> (for op0 (rshift lshift bit_and)
>>>>>>  (for op1 (bit_ior bit_xor)
>>>>>>       op2 (bit_xor bit_ior)
>>>>>
>>>>> Will do, thank you for pointing it out.
>>>>>
>>>>>> +(simplify
>>>>>> + (op2:c
>>>>>> +  (op1:c
>>>>>> +   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>>>>>>
>>>>>> I suspect you will want more :s (single_use) and less :c
>>>>>> (canonicalization should put constants in second position).
>>>>>
>>>>> I can't find the definition of :s (single_use).
>>>>
>>>> Sorry for that - didn't get along updating it yet :/  It restricts
>>>> matching to sub-expressions that have a single use.  So in
>>>>
>>>> +  a &= 0xd123;
>>>> +  unsigned short tem = a ^ 0x6040;
>>>> +  a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
>>>>    ... use of tem ...
>>>>
>>>> we shouldn't do the simplification here because the expression
>>>> (a & 0xd123) ^ 0x6040 is kept live by 'tem'.
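
For illustration only (not part of the patch or the thread; the function
names below are invented), the single-use distinction that :s encodes can
be spelled out as two compilable C functions:

/* In foo_single_use the intermediate (a ^ 0x6040) has exactly one use, so
   folding the whole chain into ((a & 0xd123) | 0xe071) loses nothing.  In
   foo_multi_use the intermediate is also stored through *out, so the
   sub-expression stays live and the combined constant would just add an
   extra operation.  */

unsigned short foo_single_use (unsigned short a)
{
  a &= 0xd123;
  unsigned short tem = a ^ 0x6040;
  return tem | 0xc031;   /* May simplify to (a & 0xd123) | 0xe071.  */
}

unsigned short foo_multi_use (unsigned short a, unsigned short *out)
{
  a &= 0xd123;
  unsigned short tem = a ^ 0x6040;
  *out = tem;            /* Second use keeps 'tem' live.  */
  return tem | 0xc031;   /* :s would block the simplification here.  */
}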
>>>>> GCC internals do point out that canonicalization does put constants
>>>>> in the second position, didn't see that first.  Thank you for
>>>>> pointing it out.
>>>>>
>>>>>> + C1 = wi::bit_and_not (C1,C2);
>>>>>>
>>>>>> Space after ','.
>>>>>
>>>>> Will do.
>>>>>
>>>>>> Having wide_int_storage in many places is surprising, I can't find
>>>>>> similar code anywhere else in gcc.
>>>>>
>>>>> I tried looking for examples of something similar; I think I ended up
>>>>> using wide_int because I was able to convert easily to and from it and
>>>>> it has the "mask" and "wide_int_to_tree" functions.  I welcome any
>>>>> suggestions on what I should be using here for integer constant
>>>>> transformations and comparisons.
>>>>
>>>> Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
>>>> constructors - those are indeed odd.  Is it just for the initializers
>>>> of wide-ints?
>>>>
>>>> + wide_int zero_mask = wi::zero (prec);
>>>> + wide_int C0 = wide_int_storage (@1);
>>>> + wide_int C1 = wide_int_storage (@2);
>>>> + wide_int C2 = wide_int_storage (@3);
>>>> ...
>>>> + zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false, prec));
>>>>
>>>> tree_to_uhwi (@1) should do the trick as well.
>>>>
>>>> + C1 = wi::bit_and_not (C1,C2);
>>>> + cst_emit = wi::bit_or (C1, C2);
>>>>
>>>> the ops should be replaceable with @2 and @3, the store to C1 obviously
>>>> not (but you can use a tree temporary and use wide_int_to_tree here to
>>>> avoid the back-and-forth for the case where C1 is not assigned to).
>>>>
>>>> Note that transforms only doing association are prone to endless
>>>> recursion in case some other pattern does the reverse op...
>>>>
>>>> Richard.
>>>>
>>>>> BR,
>>>>> Andre
>>>
>>> Thank you for all the comments, see reworked version:
>>>
>>> Two new algorithmic optimisations:
>>>   1. (((X op0 C0) op1 C1) op2 C2)
>>>      with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
>>>      zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
>>>      and 0's otherwise.
>>>      if (op1 == '^') C1 &= ~C2 (only changed if actually emitted)
>>>      if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
>>>      if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
>>>   2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>
>>> This patch does two algorithmic optimisations that target patterns like:
>>> (((x << 24) | 0x00FFFFFF) ^ 0xFF000000) and
>>> ((x ^ 0x40000002) >> 1) | 0x80000000.
>>>
>>> The transformation uses the knowledge of which bits are zero after
>>> operations like (X {&,<<,(unsigned)>>} C0) to combine constants,
>>> reducing run-time operations.  The two examples above would be
>>> transformed into (X << 24) ^ 0xFFFFFFFF and (X >> 1) ^ 0xa0000001
>>> respectively.
>>>
>>> The second transformation enables more applications of the first.  Also,
>>> some targets may benefit from delaying shift operations.  I am aware
>>> that such an optimization, in combination with one or more optimizations
>>> that cause the reverse transformation, may lead to an infinite loop,
>>> though such behavior has not been detected during regression testing
>>> and bootstrapping on aarch64.
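
As a sanity check of the constant combination described above (illustrative
only, using plain uint32_t arithmetic instead of GCC's wide_int), both of
the quoted examples can be verified with a small self-contained C program:

#include <assert.h>
#include <stdint.h>

int main (void)
{
  uint32_t x = 0xDEADBEEF;  /* arbitrary test value */

  /* Example 1: ((x << 24) | 0x00FFFFFF) ^ 0xFF000000.
     zero_mask for (x << 24) is 0x00FFFFFF, and 0x00FFFFFF & ~zero_mask == 0,
     so the constants fold to 0x00FFFFFF ^ 0xFF000000 == 0xFFFFFFFF.  */
  assert ((((x << 24) | 0x00FFFFFF) ^ 0xFF000000) == ((x << 24) ^ 0xFFFFFFFF));

  /* Example 2: ((x ^ 0x40000002) >> 1) | 0x80000000.
     The second transformation first gives (x >> 1) ^ 0x20000001; then
     zero_mask for (x >> 1) is 0x80000000, and 0x80000000 & ~zero_mask == 0,
     so the constants fold to 0x20000001 | 0x80000000 == 0xA0000001.  */
  assert ((((x ^ 0x40000002) >> 1) | 0x80000000) == ((x >> 1) ^ 0xA0000001));

  return 0;
}

The same identities hold for every x, since the dropped constant only
touches bits that the shift is already guaranteed to clear.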
>> +/* (X bit_op C0) rshift C1 -> (X rshift C1) bit_op (C0 rshift C1) */
>> +(for bit_op (bit_ior bit_xor bit_and)
>> +(simplify
>> + (rshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
>> + (bit_op
>> +  (rshift @0 @2)
>> +  { wide_int_to_tree (type, wi::rshift (@1, @2, TYPE_SIGN (type))); })))
>> +
>> +/* (X bit_op C0) lshift C1 -> (X lshift C1) bit_op (C0 lshift C1) */
>> +(for bit_op (bit_ior bit_xor bit_and)
>> +(simplify
>> + (lshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
>> + (bit_op
>> +  (lshift @0 @2)
>> +  { wide_int_to_tree (type, wi::lshift (@1, @2)); })))
>
> Will do, good to see that my second transformation still picks up the
> shift @1 @2 as a constant.  I'm assuming there is some constant folding
> going on between simplify transformations?

Yes.

>> this may be one case where not using wide-ints to be able to combine the
>> patterns makes sense.  Thus,
>>
>> (for shift (lshift rshift)
>>  (simplify
>>   (shift ...)
>>   (bit_op
>>    (shift @0 @2)
>>    (shift @1 @2))))
>>
>> note that I'm worried we'd take on "invalid" ubsan here when the above
>> applies to
>>
>> int foo (int i)
>> {
>>   return (i & 0x7fffffff) >> 3;
>> }
>> int main () { return foo (0x80000007); }
>>
>> and i is negative.  That's because right-shift of negative values invokes
>> undefined behavior.  IIRC in the middle-end we will not be taking
>> advantage of that, but the simplifications apply to GENERIC as well and
>> thus may hit before ubsan instrumentation triggers(?)  It would be nice
>> if you could double-check that.
>
> I was looking into this and I understand your worries; though, I found out
> that for the particular case of shifts and bit_and there already is a
> simplify transformation that does the same, regardless of the sign.
>
> /* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
>    (X & C2) >> C1 into (X >> C1) & (C2 >> C1).  */
> (for shift (lshift rshift)
>  (simplify
>   (shift (convert?:s (bit_and:s @0 INTEGER_CST@2)) INTEGER_CST@1)
>   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>    (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
>     (bit_and (shift (convert @0) @1) { mask; })))))

I see ... also an opportunity to merge this pattern with yours.

> Now, I don't quite understand what you mean by ubsan instrumentation: will
> GCC introduce guards into code where it detects potential undefined
> behavior?

Yes.

> Also, it might be worth noting that right shift of negative values is
> denoted as "implementation defined" by the C standard.  GCC however
> doesn't include it in its list of implementation-defined behavior, so I
> guess that is why you refer to it as undefined?

Not sure, I thought it was undefined.  If it is implementation defined
then GCC needs to document its behavior.

> Should we perhaps disable transformations where we cannot guarantee that
> the right shift produced is not one of negative values?  I.e. prohibit
> this transformation for:
>   1) signed types and op1 == bit_xor
>   2) signed types and op1 == bit_and and C1 has the sign bit set.
>
> Also, would it be useful, in cases where you have a signed shift and
> bit_and and C1 is not negative, to do the transformation but replace the
> signed shift by an unsigned shift?  This to make sure we don't introduce
> undefined/implementation-defined behavior where there was none.
>
> This does however change the current behavior!

Yeah, so unless somebody else chimes in, let's consider this a followup
only.
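
To make the signed-shift concern concrete (illustrative only; the function
names are invented), the two forms of the foo example above look like this
in plain C, together with the unsigned-shift variant suggested above:

/* Original form: the bit_and clears the sign bit first, so the value that
   is actually right-shifted is never negative.  */
int keep_and_then_shift (int i)
{
  return (i & 0x7fffffff) >> 3;
}

/* Distributed form: (i >> 3) right-shifts a negative value whenever i < 0.
   Doing the shift in unsigned arithmetic, as suggested above, keeps the
   two forms equivalent without shifting a negative value.  */
int shift_then_and (int i)
{
  return (int) (((unsigned int) i >> 3) & (0x7fffffffu >> 3));
}

Both functions return the same value for every i: masking the sign bit and
then shifting keeps bits 3..30, exactly as shifting first and masking with
0x7fffffff >> 3 == 0x0fffffff does.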
>> + (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type)) && wi::fits_uhwi_p (@1))
>>
>> you only need fits_uhwi_p (@1) in the op0 != BIT_AND_EXPR case it seems,
>> so better move it down to those cases.
>
> So I used to, but I had the problem that I didn't know how to make it
> "fail" the matching if this was not the case.  For instance, if op0 is a
> lshift for which the constant doesn't fit uhwi, then it would fall through
> and never set the zero mask, potentially leading to a wrong transformation.
> Now I'm not sure this can happen, since that would mean that either
> constant @2 or @3 would need to be all 1's and that might already be
> caught by some other transformation, but it's wrong to rely on such
> behavior IMO.  So for now I have changed it to:
>
> (simplify
>  (op2
>   (op1:s
>    (op0@4 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>  (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type))
>       && (op0 == BIT_AND_EXPR || wi::fits_uhwi_p (@1)))
>
> It would be cool to have a FAIL expression, usable in the with clauses, to
> make the pattern match fail, a bit like the one in the machine description
> language.

I'll think about it.  Currently you'd need to add a 'bool fail' in the with
and initialize it, adding a (if (!fail) ...) after it.

>> + (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))
>>
>> I think you can write
>>
>> (if (wi::bit_and (...) == 0)
>>
>> or at least wi::eq_p (wi::bit_and (...), 0).
>
> wi::bit_and (...) == 0 seems to be doing the trick.
>
>> I wonder if we shouldn't improve the pattern by handling (X op0 C0)
>> transparently via using get_nonzero_bits (yeah, that's not exactly
>> zero_mask but its inverse AFAIK).  We'd wrap get_nonzero_bits in a helper
>> that can handle GENERIC and your &, >>, << cases (hmm, such a function
>> must already exist somewhere...).  So you'd reduce the pattern to
>>
>> + (for op1 (bit_ior bit_xor)
>> +      op2 (bit_xor bit_ior)
>> +(simplify
>> + (op2
>> +  (op1:s @0 INTEGER_CST@2) INTEGER_CST@3))
>>   (with
>>    {
>>      wide_int zero_mask_not = get_nonzero_bits (@0);
>>      ...
>>    }
>>
>> This would make use of value-range information determined by VRP, for
>> example.
>
> I'll go look for such a function.
>
>> note that with your pattern you'd want to capture (op0:s @0 INTEGER_CST@1)
>> like via (op0@4 @0 INTEGER_CST@1) so you can re-use it in the replacement
>> like so:
>>
>> + (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))
>> +  (op2 @4 { wide_int_to_tree (type, cst_emit); })
>> +  (if (wi::eq_p (wi::bit_and (@3, zero_mask_not), wi::zero (prec)))
>> +   (op1 @4 { wide_int_to_tree (type, cst_emit); }))))))))
>>
>> the expression doesn't need a :s then, obviously.
>
> Yeah, makes sense.
>
>> Thanks and sorry for the delay in reviewing this.
>> Richard.
>
> Thank you for all the comments!

No problem!

>>> gcc/ChangeLog:
>>>
>>> 2015-08-03  Andre Vieira
>>>
>>>         * match.pd: Added new patterns:
>>>           (((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>           (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2015-08-03  Andre Vieira
>>>             Hale Wang
>>>
>>>         * gcc.dg/tree-ssa/forwprop-33.c: New test.
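
As a closing illustration of the zero_mask computation discussed in this
thread (not GCC code; plain C over 32-bit values, with invented names), the
known-zero-bits mask for the three handled operations is roughly:

#include <stdint.h>

/* Bits guaranteed to be zero in (x OP c0), for the three ops the pattern
   handles: a 1 bit in the result means "this bit is certainly 0".  */
uint32_t zero_mask_for (char op, uint32_t c0)
{
  switch (op)
    {
    case '&':   /* x & c0: every bit clear in c0 is clear in the result.  */
      return ~c0;
    case '<':   /* x << c0: the low c0 bits are clear.  */
      return (c0 >= 32) ? ~0u : ((1u << c0) - 1);
    case '>':   /* (unsigned) x >> c0: the high c0 bits are clear.  */
      return (c0 >= 32) ? ~0u : ~(~0u >> c0);
    default:
      return 0; /* Nothing known.  */
    }
}

For instance, zero_mask_for ('<', 24) is 0x00FFFFFF and
zero_mask_for ('>', 1) is 0x80000000, matching the two worked examples; the
pattern only drops a constant when it is entirely covered by this mask, as
in the C1/C2 checks quoted earlier.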