From: Richard Biener
To: Andre Vieira
Cc: GCC Patches
Subject: Re: [PATCH V3][GCC] Algorithmic optimization in match and simplify
Date: Thu, 08 Oct 2015 12:29:00 -0000
In-Reply-To: <5614D5F7.2000806@arm.com>
References: <55E092D9.8070700@arm.com> <55E5AACB.3050205@arm.com> <55E82AFF.4020401@arm.com> <5605303B.7080102@arm.com> <5614D5F7.2000806@arm.com>

On Wed, Oct 7, 2015 at 10:21 AM, Andre Vieira wrote:
> On 25/09/15 12:42, Richard Biener wrote:
>> On Fri, Sep 25, 2015 at 1:30 PM, Andre Vieira wrote:
>>> On 17/09/15 10:46, Richard Biener wrote:
>>>> On Thu, Sep 3, 2015 at 1:11 PM, Andre Vieira wrote:
>>>>> On 01/09/15 15:01, Richard Biener wrote:
>>>>>> On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira wrote:
>>>>>>> Hi Marc,
>>>>>>>
>>>>>>> On 28/08/15 19:07, Marc Glisse wrote:
>>>>>>>> (not a review, I haven't even read the whole patch)
>>>>>>>>
>>>>>>>> On Fri, 28 Aug 2015, Andre Vieira wrote:
>>>>>>>>
>>>>>>>>> 2015-08-03  Andre Vieira
>>>>>>>>>
>>>>>>>>>         * match.pd: Added new patterns:
>>>>>>>>>           (((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>>>>>>>           (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>>>>>>
>>>>>>>> +(for op0 (rshift rshift lshift lshift bit_and bit_and)
>>>>>>>> + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor)
>>>>>>>> + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior)
>>>>>>>>
>>>>>>>> You can nest for-loops, it seems clearer as:
>>>>>>>>
>>>>>>>> (for op0 (rshift lshift bit_and)
>>>>>>>>  (for op1 (bit_ior bit_xor)
>>>>>>>>       op2 (bit_xor bit_ior)
>>>>>>>
>>>>>>> Will do, thank you for pointing it out.
>>>>>>>
>>>>>>>> +(simplify
>>>>>>>> + (op2:c
>>>>>>>> +  (op1:c
>>>>>>>> +   (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>>>>>>>>
>>>>>>>> I suspect you will want more :s (single_use) and less :c
>>>>>>>> (canonicalization should put constants in second position).
>>>>>>>>
>>>>>>> I can't find the definition of :s (single_use).
>>>>>>
>>>>>> Sorry for that - didn't get along updating it yet :/  It restricts
>>>>>> matching to sub-expressions that have a single use.
>>>>>> So
>>>>>>
>>>>>>   + a &= 0xd123;
>>>>>>   + unsigned short tem = a ^ 0x6040;
>>>>>>   + a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071).  */
>>>>>>   ... use of tem ...
>>>>>>
>>>>>> we shouldn't do the simplification here because the expression
>>>>>> (a & 0xd123) ^ 0x6040 is kept live by 'tem'.
>>>>>>
>>>>>>> GCC internals do point out that canonicalization does put constants
>>>>>>> in the second position, didn't see that first. Thank you for pointing
>>>>>>> it out.
>>>>>>>
>>>>>>>> + C1 = wi::bit_and_not (C1,C2);
>>>>>>>>
>>>>>>>> Space after ','.
>>>>>>>>
>>>>>>> Will do.
>>>>>>>
>>>>>>>> Having wide_int_storage in many places is surprising, I can't find
>>>>>>>> similar code anywhere else in gcc.
>>>>>>>
>>>>>>> I tried looking for examples of something similar, I think I ended up
>>>>>>> using wide_int because I was able to convert easily to and from it and
>>>>>>> it has the "mask" and "wide_int_to_tree" functions. I welcome any
>>>>>>> suggestions on what I should be using here for integer constant
>>>>>>> transformations and comparisons.
>>>>>>
>>>>>> Using wide-ints is fine, but you shouldn't need 'wide_int_storage'
>>>>>> constructors - those are indeed odd.  Is it just for the initializers
>>>>>> of wide-ints?
>>>>>>
>>>>>> + wide_int zero_mask = wi::zero (prec);
>>>>>> + wide_int C0 = wide_int_storage (@1);
>>>>>> + wide_int C1 = wide_int_storage (@2);
>>>>>> + wide_int C2 = wide_int_storage (@3);
>>>>>> ...
>>>>>> + zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false, prec));
>>>>>>
>>>>>> tree_to_uhwi (@1) should do the trick as well
>>>>>>
>>>>>> + C1 = wi::bit_and_not (C1,C2);
>>>>>> + cst_emit = wi::bit_or (C1, C2);
>>>>>>
>>>>>> the ops should be replaceable with @2 and @3, the store to C1 obviously
>>>>>> not (but you can use a tree temporary and use wide_int_to_tree here to
>>>>>> avoid the back-and-forth for the case where C1 is not assigned to).
>>>>>>
>>>>>> Note that transforms only doing association are prone to endless
>>>>>> recursion in case some other pattern does the reverse op...
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> BR,
>>>>>>> Andre
>>>>>>
>>>>> Thank you for all the comments, see reworked version:
>>>>>
>>>>> Two new algorithmic optimisations:
>>>>>   1. (((X op0 C0) op1 C1) op2 C2)
>>>>>      with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2
>>>>>      zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0)
>>>>>      and 0's otherwise.
>>>>>      if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted)
>>>>>      if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2)
>>>>>      if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2)
>>>>>   2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>>>
>>>>> This patch does two algorithmic optimisations that target patterns like:
>>>>> (((x << 24) | 0x00FFFFFF) ^ 0xFF000000) and
>>>>> ((x ^ 0x40000002) >> 1) | 0x80000000.
>>>>>
>>>>> The transformation uses the knowledge of which bits are zero after
>>>>> operations like (X {&,<<,(unsigned)>>}) to combine constants, reducing
>>>>> run-time operations.  The two examples above would be transformed into
>>>>> (X << 24) ^ 0xFFFFFFFF and (X >> 1) ^ 0xa0000001 respectively, as
>>>>> illustrated below.
>>>>>
>>>>> The second transformation enables more applications of the first.  Also
>>>>> some targets may benefit from delaying shift operations.  I am aware
>>>>> that such an optimization, in combination with one or more optimizations
>>>>> that cause the reverse transformation, may lead to an infinite loop.
>>>>> Though such behavior has not been detected during regression testing
>>>>> and bootstrapping on aarch64.
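>>>>>
>>>>> For illustration only (a sketch with made-up function names, not the
>>>>> actual testcase; it assumes a 32-bit unsigned int), the two examples
>>>>> above correspond to code like:
>>>>>
>>>>>   /* Before the transformations.  */
>>>>>   unsigned int f1 (unsigned int x)
>>>>>   {
>>>>>     return ((x << 24) | 0x00FFFFFFu) ^ 0xFF000000u;
>>>>>   }
>>>>>
>>>>>   unsigned int f2 (unsigned int x)
>>>>>   {
>>>>>     return ((x ^ 0x40000002u) >> 1) | 0x80000000u;
>>>>>   }
>>>>>
>>>>>   /* After.  In f1 the low 24 bits of (x << 24) are known zero, so the
>>>>>      constants 0x00FFFFFF and 0xFF000000 fold into a single xor
>>>>>      constant.  In f2 the shift is distributed first, giving
>>>>>      (x >> 1) ^ 0x20000001, and since the top bit of the unsigned
>>>>>      (x >> 1) is known zero the remaining | 0x80000000 merges into the
>>>>>      xor constant as well.  */
>>>>>   unsigned int f1_opt (unsigned int x)
>>>>>   {
>>>>>     return (x << 24) ^ 0xFFFFFFFFu;
>>>>>   }
>>>>>
>>>>>   unsigned int f2_opt (unsigned int x)
>>>>>   {
>>>>>     return (x >> 1) ^ 0xA0000001u;
>>>>>   }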
>>>>
>>>> +/* (X bit_op C0) rshift C1 -> (X rshift C0) bit_op (C0 rshift C1) */
>>>> +(for bit_op (bit_ior bit_xor bit_and)
>>>> +(simplify
>>>> + (rshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
>>>> + (bit_op
>>>> +  (rshift @0 @2)
>>>> +  { wide_int_to_tree (type, wi::rshift (@1, @2, TYPE_SIGN (type))); })))
>>>> +
>>>> +/* (X bit_op C0) lshift C1 -> (X lshift C0) bit_op (C0 lshift C1) */
>>>> +(for bit_op (bit_ior bit_xor bit_and)
>>>> +(simplify
>>>> + (lshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2)
>>>> + (bit_op
>>>> +  (lshift @0 @2)
>>>> +  { wide_int_to_tree (type, wi::lshift (@1, @2)); })))
>>>
>>> Will do, good to see that my second transformation still picks up the
>>> shift @1 @2 as a constant.  I'm assuming there is some constant folding
>>> going on between simplify transformations?
>>
>> Yes.
>>
>>>> this may be one case where not using wide-ints to be able to combine the
>>>> patterns makes sense.  Thus,
>>>>
>>>> (for shift (lshift rshift)
>>>>  (simplify
>>>>   (shift ...)
>>>>   (bit_op
>>>>    (shift @0 @2)
>>>>    (shift @1 @2))))
>>>>
>>>> note that I'm worried we'd take on "invalid" ubsan here when the above
>>>> applies to
>>>>
>>>> int foo (int i)
>>>> {
>>>>   return (i & 0x7fffffff) >> 3;
>>>> }
>>>> int main () { return foo (0x80000007); }
>>>>
>>>> and i is negative.  That's because right-shift of negative values
>>>> invokes undefined behavior.  IIRC in the middle-end we will not be
>>>> taking advantage of that but the simplifications apply to GENERIC as
>>>> well and thus may hit before ubsan instrumentation triggers(?)  It would
>>>> be nice if you could double-check that.
>>>
>>> I was looking into this and I understand your worries, though, I found
>>> out that for the particular case of shifts and bit_and there already is a
>>> simplify transformation that does the same, regardless of the sign.
>>>
>>> /* Fold (X & C2) << C1 into (X << C1) & (C2 << C1)
>>>    (X & C2) >> C1 into (X >> C1) & (C2 >> C1).  */
>>> (for shift (lshift rshift)
>>>  (simplify
>>>   (shift (convert?:s (bit_and:s @0 INTEGER_CST@2)) INTEGER_CST@1)
>>>   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
>>>    (with { tree mask = int_const_binop (shift, fold_convert (type, @2), @1); }
>>>     (bit_and (shift (convert @0) @1) { mask; })))))
>>
>> I see ... also an opportunity to merge this pattern with yours.
>>
>>> Now, I don't quite understand what you mean by ubsan instrumentation,
>>> will GCC introduce guards into code where it detects potential undefined
>>> behavior?
>>
>> Yes.
>>
>>> Also, it might be worth to note that right shift of negative values is
>>> denoted as "implementation defined" by the C standard.  GCC however
>>> doesn't include it in its list of implementation defined behavior so I
>>> guess that is why you refer to it as undefined?
>>
>> Not sure, I thought it was undefined.  If it's implementation defined
>> then GCC needs to document its behavior.
>>
>>> Should we perhaps disable transformations where we cannot guarantee that
>>> the right shift produced never operates on negative values?  I.e.
>>> prohibit this transformation for:
>>> 1) signed types and op1 == bit_xor
>>> 2) signed types and op1 == bit_and and C1 has sign bit set.
>>>
>>> Also, would it be useful, in cases where you have a signed shift and
>>> bit_and, and C1 is not negative, to do the transformation but replace the
>>> signed shift by an unsigned shift?  This would make sure we don't
>>> introduce undefined/implementation defined behavior where there was none,
>>> as in the sketch below.
>>>
>>> This does however change the current behavior!
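>>>
>>> As a small illustration of the concern (a sketch only, assuming 32-bit
>>> int and GCC's sign-extending right shift):
>>>
>>>   int g (int i)
>>>   {
>>>     /* (i & 0x7fffffff) is never negative, so this right shift never
>>>        operates on a negative value.  */
>>>     return (i & 0x7fffffff) >> 3;
>>>   }
>>>
>>>   int g_transformed (int i)
>>>   {
>>>     /* After distributing the shift, i itself is shifted; for negative i
>>>        the behavior of i >> 3 is implementation defined (sign-extending
>>>        with GCC), even though the masked result ends up the same.  */
>>>     return (i >> 3) & (0x7fffffff >> 3);
>>>   }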
>>
>> Yeah, so unless somebody else chimes in let's consider this as followup
>> only.
>>
>>>> + (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type)) && wi::fits_uhwi_p (@1))
>>>>
>>>> you only need fits_uhwi_p (@1) in the op0 != BIT_AND_EXPR case it seems,
>>>> so better move it down to those cases.
>>>
>>> So I used to, but I had the problem that I didn't know how to make it
>>> "fail" the matching if this was not the case.  For instance if op0 is a
>>> lshift for which the constant doesn't fit uhwi, then it would fall
>>> through and never set the zero mask, potentially leading to a wrong
>>> transformation.  Now I'm not sure this can happen, since that would mean
>>> that either constant @2 or @3 would need to be all 1's and that might
>>> already be caught by some other transformation, but it's wrong to rely on
>>> such behavior IMO.  So for now I have changed it to:
>>>
>>> (simplify
>>>  (op2
>>>   (op1:s
>>>    (op0@4 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>>>  (if (!(op0 == RSHIFT_EXPR && !TYPE_UNSIGNED (type)) &&
>>>       (op0 == BIT_AND_EXPR || wi::fits_uhwi_p (@1)))
>>>
>>> It would be cool to have a FAIL expression, usable in the with clauses,
>>> to make the pattern match fail a bit like the one in the machine
>>> description language.
>>
>> I'll think about it.  Currently you'd need to add a 'bool fail' in the
>> with and initialize it, adding a (if (!fail) ...) after it.
>>
>>>> + (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))
>>>>
>>>> I think you can write
>>>>
>>>> (if (wi::bit_and (...) == 0)
>>>>
>>>> or at least wi::eq_p (wi::bit_and (...), 0).
>>>
>>> wi::bit_and (...) == 0 seems to be doing the trick.
>>>
>>>> I wonder if we shouldn't improve the pattern by handling (X op0 C0)
>>>> transparently via using get_nonzero_bits (yeah, that's not exactly
>>>> zero_mask but its inverse AFAIK).  We'd wrap get_nonzero_bits in a
>>>> helper that can handle GENERIC and your &, >>, << cases (hmm, such
>>>> function must already exist somewhere...).  So you'd reduce the pattern
>>>> to
>>>>
>>>> + (for op1 (bit_ior bit_xor)
>>>> +      op2 (bit_xor bit_ior)
>>>> +(simplify
>>>> + (op2
>>>> +  (op1:s @0 INTEGER_CST@2) INTEGER_CST@3))
>>>>  (with
>>>>   {
>>>>     wide_int zero_mask_not = get_nonzero_bits (@0);
>>>>     ...
>>>>   }
>>>>
>>>> This would make use of value-range information determined by VRP for
>>>> example.
>>>
>>> I'll go look for such a function.
>>>
>>>> note that with your pattern you'd want to capture (op0:s @0 INTEGER_CST@1)
>>>> like via (op0@4 @0 INTEGER_CST@1) so you can re-use it in the replacement
>>>> like so:
>>>>
>>>> + (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec)))
>>>> +  (op2 @4 { wide_int_to_tree (type, cst_emit); })
>>>> +  (if (wi::eq_p (wi::bit_and (@3, zero_mask_not), wi::zero (prec)))
>>>> +   (op1 @4 { wide_int_to_tree (type, cst_emit); }))))))))
>>>>
>>>> the expression doesn't need a :s then obviously.
>>>
>>> Yeah makes sense.
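>>>
>>> To make the value-range idea above concrete, here is a sketch (with a
>>> made-up function, not from the testsuite, assuming 32-bit int) of an
>>> input the get_nonzero_bits variant could handle even though no &, << or
>>> >> feeds the bitwise operations directly:
>>>
>>>   int h (unsigned char c)
>>>   {
>>>     int x = c;                     /* VRP knows x is in [0, 255], i.e.
>>>                                       the upper 24 bits are zero.  */
>>>     return (x ^ 0x1100) | 0x2200;  /* both constants lie entirely in the
>>>                                       known-zero bits, so this could
>>>                                       become x | 0x3300.  */
>>>   }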
>>>>
>>>> Thanks and sorry for the delay in reviewing this.
>>>> Richard.
>>>
>>> Thank you for all the comments!
>>
>> No problem!
>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 2015-08-03  Andre Vieira
>>>>>
>>>>>         * match.pd: Added new patterns:
>>>>>           (((X {&,<<,>>} C0) {|,^} C1) {^,|} C2)
>>>>>           (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> 2015-08-03  Andre Vieira
>>>>>             Hale Wang
>>>>>
>>>>>         * gcc.dg/tree-ssa/forwprop-33.c: New test.
>
> Thanks again for the comments Richard!
>
> A new algorithmic optimisation:
>
>   ((X inner_op C0) outer_op C1)
>   with X being a tree for which value_range has reasoned certain bits to
>   always be zero throughout its computed value range (we will call this
>   set of bits the zero_mask), and with inner_op = {|,^}, outer_op = {|,^}
>   and inner_op != outer_op.
>   if (inner_op == '^') C0 &= ~C1;
>   if ((C0 & ~zero_mask) == 0) then emit (X outer_op (C0 outer_op C1))
>   if ((C1 & ~zero_mask) == 0) then emit (X inner_op (C0 outer_op C1))
>
> And extended '(X & C2) << C1 into (X << C1) & (C2 << C1)' and
> '(X & C2) >> C1 into (X >> C1) & (C2 >> C1)' to also accept the bitwise
> or and xor operators:
> '(X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1)' and
> '(X {&,^,|} C2) >> C1 into (X >> C1) {&,^,|} (C2 >> C1)'.
>
> The second transformation enables more applications of the first.  Also
> some targets may benefit from delaying shift operations.  I am aware that
> such an optimization, in combination with one or more optimizations that
> cause the reverse transformation, may lead to an infinite loop.  Though
> such behavior has not been detected during regression testing and
> bootstrapping on aarch64.
>
> gcc/ChangeLog:
>
> 2015-10-05  Andre Vieira
>
>         * match.pd: Added a new pattern
>           ((X inner_op C0) outer_op C1)
>           and expanded existing one
>           (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1)
>
> gcc/testsuite/ChangeLog:
>
> 2015-10-05  Andre Vieira
>             Hale Wang
>
>         * gcc.dg/tree-ssa/forwprop-33.c: New test.

Ok.

Thanks,
Richard.