From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 71025 invoked by alias); 25 Sep 2015 11:30:14 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 71007 invoked by uid 89); 25 Sep 2015 11:30:12 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (207.82.80.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 25 Sep 2015 11:30:10 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-36-06SQmDFzRdOLA8Na0n7XVg-1; Fri, 25 Sep 2015 12:30:04 +0100 Received: from [10.2.207.14] ([10.1.2.79]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Fri, 25 Sep 2015 12:30:04 +0100 Subject: Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify To: Richard Biener References: <55E092D9.8070700@arm.com> <55E5AACB.3050205@arm.com> <55E82AFF.4020401@arm.com> Cc: GCC Patches From: Andre Vieira Message-ID: <5605303B.7080102@arm.com> Date: Fri, 25 Sep 2015 11:44:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: X-MC-Unique: 06SQmDFzRdOLA8Na0n7XVg-1 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2015-09/txt/msg01937.txt.bz2 On 17/09/15 10:46, Richard Biener wrote: > On Thu, Sep 3, 2015 at 1:11 PM, Andre Vieira > wrote: >> On 01/09/15 15:01, Richard Biener wrote: >>> >>> On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira >>> wrote: >>>> >>>> Hi Marc, >>>> >>>> On 28/08/15 19:07, Marc Glisse wrote: >>>>> >>>>> >>>>> (not a review, I haven't even read the whole patch) >>>>> >>>>> On Fri, 28 Aug 2015, Andre Vieira wrote: >>>>> >>>>>> 2015-08-03 Andre Vieira >>>>>> >>>>>> * match.pd: Added new patterns: >>>>>> ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2) >>>>>> (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} >>>>>> C1) >>>>> >>>>> >>>>> >>>>> +(for op0 (rshift rshift lshift lshift bit_and bit_and) >>>>> + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor) >>>>> + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior) >>>>> >>>>> You can nest for-loops, it seems clearer as: >>>>> (for op0 (rshift lshift bit_and) >>>>> (for op1 (bit_ior bit_xor) >>>>> op2 (bit_xor bit_ior) >>>> >>>> >>>> >>>> Will do, thank you for pointing it out. >>>>> >>>>> >>>>> >>>>> +(simplify >>>>> + (op2:c >>>>> + (op1:c >>>>> + (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3) >>>>> >>>>> I suspect you will want more :s (single_use) and less :c >>>>> (canonicalization >>>>> should put constants in second position). >>>>> >>>> I can't find the definition of :s (single_use). >>> >>> >>> Sorry for that - didn't get along updating it yet :/ It restricts >>> matching to >>> sub-expressions that have a single-use. So >>> >>> + a &=3D 0xd123; >>> + unsigned short tem =3D a ^ 0x6040; >>> + a =3D tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071). */ >>> ... use of tem ... >>> >>> we shouldn't do the simplifcation here because the expression >>> (a & 0x123) ^ 0x6040 is kept live by 'tem'. >>> >>>> GCC internals do point out >>>> that canonicalization does put constants in the second position, didnt >>>> see >>>> that first. Thank you for pointing it out. >>>> >>>>> + C1 =3D wi::bit_and_not (C1,C2); >>>>> >>>>> Space after ','. >>>>> >>>> Will do. >>>> >>>>> Having wide_int_storage in many places is surprising, I can't find >>>>> similar >>>>> code anywhere else in gcc. >>>>> >>>>> >>>> >>>> I tried looking for examples of something similar, I think I ended up >>>> using >>>> wide_int because I was able to convert easily to and from it and it has >>>> the >>>> "mask" and "wide_int_to_tree" functions. I welcome any suggestions on >>>> what I >>>> should be using here for integer constant transformations and >>>> comparisons. >>> >>> >>> Using wide-ints is fine, but you shouldn't need 'wide_int_storage' >>> constructors - those >>> are indeed odd. Is it just for the initializers of wide-ints? >>> >>> + wide_int zero_mask =3D wi::zero (prec); >>> + wide_int C0 =3D wide_int_storage (@1); >>> + wide_int C1 =3D wide_int_storage (@2); >>> + wide_int C2 =3D wide_int_storage (@3); >>> ... >>> + zero_mask =3D wide_int_storage (wi::mask (C0.to_uhwi (), false, >>> prec)); >>> >>> tree_to_uhwi (@1) should do the trick as well >>> >>> + C1 =3D wi::bit_and_not (C1,C2); >>> + cst_emit =3D wi::bit_or (C1, C2); >>> >>> the ops should be replacable with @2 and @3, the store to C1 obviously = not >>> (but you can use a tree temporary and use wide_int_to_tree here to avoid >>> the back-and-forth for the case where C1 is not assigned to). >>> >>> Note that transforms only doing association are prone to endless recurs= ion >>> in case some other pattern does the reverse op... >>> >>> Richard. >>> >>> >>>> BR, >>>> Andre >>>> >>> >> Thank you for all the comments, see reworked version: >> >> Two new algorithmic optimisations: >> 1.((X op0 C0) op1 C1) op2 C2) >> with op0 =3D {&, >>, <<}, op1 =3D {|,^}, op2 =3D {|,^} and op1 !=3D= op2 >> zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0) >> and 0's otherwise. >> if (op1 =3D=3D '^') C1 &=3D ~C2 (Only changed if actually emitted) >> if ((C1 & ~zero_mask) =3D=3D 0) then emit (X op0 C0) op2 (C1 op2 C2) >> if ((C2 & ~zero_mask) =3D=3D 0) then emit (X op0 C0) op1 (C1 op2 C2) >> 2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1) >> >> >> This patch does two algorithmic optimisations that target patterns like: >> (((x << 24) | 0x00FFFFFF) ^ 0xFF000000) and ((x ^ 0x40000002) >> 1) | >> 0x80000000. >> >> The transformation uses the knowledge of which bits are zero after >> operations like (X {&,<<,(unsigned)>>}) to combine constants, reducing >> run-time operations. >> The two examples above would be transformed into (X << 24) ^ 0xFFFFFFFF = and >> (X >> 1) ^ 0xa0000001 respectively. >> >> The second transformation enables more applications of the first. Also s= ome >> targets may benefit from delaying shift operations. I am aware that such= an >> optimization, in combination with one or more optimizations that cause t= he >> reverse transformation, may lead to an infinite loop. Though such behavi= or >> has not been detected during regression testing and bootstrapping on >> aarch64. > > +/* (X bit_op C0) rshift C1 -> (X rshift C0) bit_op (C0 rshift C1) */ > +(for bit_op (bit_ior bit_xor bit_and) > +(simplify > + (rshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2) > + (bit_op > + (rshift @0 @2) > + { wide_int_to_tree (type, wi::rshift (@1, @2, TYPE_SIGN (type))); }))) > + > +/* (X bit_op C0) lshift C1 -> (X lshift C0) bit_op (C0 lshift C1) */ > +(for bit_op (bit_ior bit_xor bit_and) > +(simplify > + (lshift (bit_op:s @0 INTEGER_CST@1) INTEGER_CST@2) > + (bit_op > + (lshift @0 @2) > + { wide_int_to_tree (type, wi::lshift (@1, @2)); }))) Will do, good to see that my second transformation still picks up the=20 shift @1 @2 as a constant. I'm assuming there is some constant folding=20 going on between simplify transformations? > > this may be one case where not using wide-ints to be able to combine the > patterns makes sense. Thus, > > (for shift (lshift rshift) > (simplify > (shift ...) > (bit_op > (shift @0 @2) > (shift @1 @2)))) > > note that I'm worried we'd take on "invalid" ubsan here when the above > applies to > > int foo (int i) > { > return (i & 0x7fffffff) >> 3; > } > int main () { return foo (0x80000007); } > > and i is negative. That's because right-shift of negative values > invokes undefined > behavior. IIRC in the middle-end we will not be taking advantage of > that but the > simplifications apply to GENERIC as well and thus may hit before ubsan > instrumentation triggers(?) It would be nice if you could double-check t= hat. I was looking into this and I understand your worries, though, I found=20 out that for the particular case of shifts and bit_and there already is=20 a simplify transformation that does the same, regardless of the sign. /* Fold (X & C2) << C1 into (X << C1) & (C2 << C1) (X & C2) >> C1 into (X >> C1) & (C2 >> C1). */ (for shift (lshift rshift) (simplify (shift (convert?:s (bit_and:s @0 INTEGER_CST@2)) INTEGER_CST@1) (if (tree_nop_conversion_p (type, TREE_TYPE (@0))) (with { tree mask =3D int_const_binop (shift, fold_convert (type, @2),= =20 @1); } (bit_and (shift (convert @0) @1) { mask; }))))) Now, I don't quite understand what you mean by ubsan instrumentation,=20 will GCC introduce guards into code where it detects potential undefined=20 behavior? Also, it might be worth to note that right shift of negative=20 values is denoted as "implementation defined" by the C standard. GCC=20 however doesn't include it in its list of implementation defined=20 behavior so I guess that is why you refer to it as undefined? Should we perhaps disable transformations where we can not guarantee=20 that the right shift produced is not one of negative values? I.e.=20 prohibit this transformation for: 1) signed types and op1 =3D=3D bit_xor 2) signed types and op1 =3D=3D bit_and and C1 has sign bit set. Also would it be useful in cases where you have signed shift and=20 bit_and, and C1 is not negative, to do the transformation but replace=20 the signed shift by an unsigned shift. This to make sure we don't=20 introduce undefined/implementation defined behavior were there was none. This does however change the current behavior! > > + (if (!(op0 =3D=3D RSHIFT_EXPR && !TYPE_UNSIGNED (type)) && wi::fits_uhw= i_p (@1)) > > you only need fits_uhwi_p (@1) in the op0 !=3D BIT_AND_EXPR case it > seems, so better > move it down to those cases. So I used to, but I had the problem that I didn't know how to make it=20 "fail" the matching if this was not the case. For instance if op0 is a=20 lshift for which the constant doesn't fit uhwi, then it would fall=20 through and never set the zero mask, potentially leading to a wrong=20 transformation. Now I'm not sure this can happen, since that would mean=20 that either constant @2 or @3 need to be all 1's and that might already=20 be caught by some other transformation, but its wrong to rely on such=20 behavior IMO. So for now I have changed it to: (simplify (op2 (op1:s (op0@4 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3) (if (!(op0 =3D=3D RSHIFT_EXPR && !TYPE_UNSIGNED (type)) && (op0 =3D=3D BIT_AND_EXPR || wi::fits_uhwi_p (@1))) It would be cool to have a FAIL expression, usable in the with clauses,=20 to make the pattern match fail a bit like the one in the machine=20 description language. > > + (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec))) > > I think you can write > > (if (wi::bit_and (...) =3D=3D 0) > > or at least wi:eq_p (wi::bit_and (...), 0). > wi::bit_and (...) =3D=3D 0 seems to be doing the trick. > I wonder if we shouldn't improve the pattern by handling (X op0 C0) > transparently > via using get_nonzero_bits (yeah, that's not exactly zero_mask but its > inverse AFAIK). > We'd wrap get_nonzero_bits in a helper that can handle GENERIC and your > &, >>, << cases (hmm, such function must already exist somewhere...). So= you'd > reduce the pattern to > > + (for op1 (bit_ior bit_xor) > + op2 (bit_xor bit_ior) > +(simplify > + (op2 > + (op1:s @0 INTEGER_CST@2) INTEGER_CST@3)) > (with > { > wide_int zero_mask_not =3D get_nonzero_bits (@0); > ... > } > > This would make use of value-range information determined by VRP for exam= ple. I'll go look for such a function. > > note that with your pattern you'd want to capture (op0:s @0 INTEGER_CST@1) > like via (op0@4 @0 INTEGER_CST@1) so you can re-use it in the replacement > like so: > > + (if (wi::eq_p (wi::bit_and (C1, zero_mask_not), wi::zero (prec))) > + (op2 @4 { wide_int_to_tree (type, cst_emit); }) > + (if (wi::eq_p (wi::bit_and (@3, zero_mask_not), wi::zero (prec))) > + (op1 @4 { wide_int_to_tree (type, cst_emit); })))))))) > > the expression doesn't need a :s then obviously. Yeah makes sense. > > Thanks and sorry for the delay in reviewing this. > Richard. > Thank you for all the comments! > >> gcc/ChangeLog: >> >> 2015-08-03 Andre Vieira >> >> * match.pd: Added new patterns: >> ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2) >> (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1) >> >> gcc/testsuite/ChangeLog: >> >> 2015-08-03 Andre Vieira >> Hale Wang >> >> * gcc.dg/tree-ssa/forwprop-33.c: New test. >