From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 17076 invoked by alias); 17 Nov 2015 09:34:39 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 16814 invoked by uid 89); 17 Nov 2015 09:34:37 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00,KAM_LOTSOFHASH,SPF_PASS autolearn=no version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 17 Nov 2015 09:34:32 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-7-mWDLz5BbRH66DRL6_ExVTA-1; Tue, 17 Nov 2015 09:34:28 +0000 Received: from [10.2.206.221] ([10.1.2.79]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Tue, 17 Nov 2015 09:34:27 +0000 To: GCC Patches From: Andre Vieira Subject: [embedded-5-branch][PATCH 1/2] Backporting algorithmic optimization in match and simplify Message-ID: <564AF4A3.90407@arm.com> Date: Tue, 17 Nov 2015 09:34:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 X-MC-Unique: mWDLz5BbRH66DRL6_ExVTA-1 Content-Type: multipart/mixed; boundary="------------030604000306060800070609" X-IsSubscribed: yes X-SW-Source: 2015-11/txt/msg02046.txt.bz2 This is a multi-part message in MIME format. --------------030604000306060800070609 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable Content-length: 2187 New algorithmic optimisations: ((X inner_op C0) outer_op C1) With X being a tree where value_range has reasoned certain bits to=20 always be zero throughout its computed value range, we will call this the=20 zero_mask, and with inner_op =3D {|,^}, outer_op =3D {|,^} and inner_op !=3D outer= _op. if (inner_op =3D=3D '^') C0 &=3D ~C1; if ((C0 & ~zero_mask) =3D=3D 0) then emit (X outer_op (C0 outer_op C1) if ((C1 & ~zero_mask) =3D=3D 0) then emit (X inner_op (C0 outer_op C1) (X {&,^,|} C2) << C1 -> (X << C1) {&,^,|} (C2 << C1) (X {&,^,|} C2) >> C1 -> (X >> C1) & (C2 >> C1) The second and third transformations enable more applications of the=20 first. Also some targets may benefit from delaying shift operations. I=20 am aware that such an optimization, in combination with one or more=20 optimizations that cause the reverse transformation, may lead to an=20 infinite loop. Though such behavior has not been detected during=20 regression testing and bootstrapping on aarch64. This patch backports the optimization from trunk to the embedded-5-branch. The original patches are at: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01493.html (for the first=20 optimization and changes to second and third) https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00517.html (the addition=20 of the original second and third optimizations) gcc/ChangeLog: 2015-11-27 Andre Vieira Backport from mainline: 2015-10-09 Andre Vieira * match.pd: ((X inner_op C0) outer_op C1) New pattern. ((X & C2) << C1): Expand to... (X {&,^,|} C2 << C1): ...This. ((X & C2) >> C1): Expand to... (X {&,^,|} C2 >> C1): ...This. 2015-07-07 Richard Biener * fold-const.c (fold_binary_loc): Move (X & C2) << C1 -> (X << C1) & (C2 << C1) simplification ... * match.pd: ... here. gcc/testsuite/ChangeLog: 2015-11-27 Andre Vieira Backport from mainline: 2015-10-09 Andre Vieira Hale Wang * gcc.dg/tree-ssa/forwprop-33.c: New. --------------030604000306060800070609 Content-Type: text/x-patch; name=0001-backport-algorithmic-optimization.patch Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="0001-backport-algorithmic-optimization.patch" Content-length: 6184 =46rom ac76659583ab0d9eb96552a540a22e66fdd430f7 Mon Sep 17 00:00:00 2001 From: Andre Simoes Dias Vieira Date: Fri, 23 Oct 2015 15:35:27 +0100 Subject: [PATCH 1/2] backport algorithmic optimization --- gcc/fold-const.c | 21 --------- gcc/match.pd | 54 ++++++++++++++++++++++ gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c | 71 +++++++++++++++++++++++++= ++++ 3 files changed, 125 insertions(+), 21 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 6d085b185895f715d99a53a9e706295aa983d52a..5e0477f974173627317a25e018c= 9a6e098b099fb 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -12080,27 +12080,6 @@ fold_binary_loc (location_t loc, prec) =3D=3D 0) return TREE_OPERAND (arg0, 0); =20 - /* Fold (X & C2) << C1 into (X << C1) & (C2 << C1) - (X & C2) >> C1 into (X >> C1) & (C2 >> C1) - if the latter can be further optimized. */ - if ((code =3D=3D LSHIFT_EXPR || code =3D=3D RSHIFT_EXPR) - && TREE_CODE (arg0) =3D=3D BIT_AND_EXPR - && TREE_CODE (arg1) =3D=3D INTEGER_CST - && TREE_CODE (TREE_OPERAND (arg0, 1)) =3D=3D INTEGER_CST) - { - tree mask =3D fold_build2_loc (loc, code, type, - fold_convert_loc (loc, type, - TREE_OPERAND (arg0, 1)), - arg1); - tree shift =3D fold_build2_loc (loc, code, type, - fold_convert_loc (loc, type, - TREE_OPERAND (arg0, 0)), - arg1); - tem =3D fold_binary_loc (loc, BIT_AND_EXPR, type, shift, mask); - if (tem) - return tem; - } - return NULL_TREE; =20 case MIN_EXPR: diff --git a/gcc/match.pd b/gcc/match.pd index e40720e130fe0302466969e39ad5803960b942da..bea0237ae3b335c73bbc71f6d39= 97b1d84a58986 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -389,6 +389,49 @@ along with GCC; see the file COPYING3. If not see && (TREE_CODE (@4) !=3D SSA_NAME || has_single_use (@4))) (bit_xor (bit_and (bit_xor @0 @1) @2) @0))) =20 +/* ((X inner_op C0) outer_op C1) + With X being a tree where value_range has reasoned certain bits to alwa= ys be + zero throughout its computed value range, + inner_op =3D {|,^}, outer_op =3D {|,^} and inner_op !=3D outer_op + where zero_mask has 1's for all bits that are sure to be 0 in + and 0's otherwise. + if (inner_op =3D=3D '^') C0 &=3D ~C1; + if ((C0 & ~zero_mask) =3D=3D 0) then emit (X outer_op (C0 outer_op C1) + if ((C1 & ~zero_mask) =3D=3D 0) then emit (X inner_op (C0 outer_op C1) +*/ +(for inner_op (bit_ior bit_xor) + outer_op (bit_xor bit_ior) +(simplify + (outer_op + (inner_op@3 @2 INTEGER_CST@0) INTEGER_CST@1) + (with + { + bool fail =3D false; + wide_int zero_mask_not; + wide_int C0; + wide_int cst_emit; + + if (TREE_CODE (@2) =3D=3D SSA_NAME && + (TREE_CODE (@3) !=3D SSA_NAME || has_single_use (@3))) + zero_mask_not =3D get_nonzero_bits (@2); + else + fail =3D true; + + if (inner_op =3D=3D BIT_XOR_EXPR) + { + C0 =3D wi::bit_and_not (@0, @1); + cst_emit =3D wi::bit_or (C0, @1); + } + else + { + C0 =3D @0; + cst_emit =3D wi::bit_xor (@0, @1); + } + } + (if (!fail && wi::bit_and (C0, zero_mask_not) =3D=3D 0) + (outer_op @2 { wide_int_to_tree (type, cst_emit); })) + (if (!fail && wi::bit_and (@1, zero_mask_not) =3D=3D 0) + (inner_op @2 { wide_int_to_tree (type, cst_emit); }))))) =20 /* Associate (p +p off1) +p off2 as (p +p (off1 + off2)). */ (simplify @@ -614,6 +657,17 @@ along with GCC; see the file COPYING3. If not see (cmp (bit_and (lshift integer_onep @0) integer_onep) integer_zerop) (icmp @0 { build_zero_cst (TREE_TYPE (@0)); }))) =20 +/* Fold (X {&,^,|} C2) << C1 into (X << C1) {&,^,|} (C2 << C1) + (X {&,^,|} C2) >> C1 into (X >> C1) & (C2 >> C1). */ + (for shift (lshift rshift) + (for bit_op (bit_and bit_xor bit_ior) + (simplify + (shift (convert? (bit_op@4 @0 INTEGER_CST@2)) INTEGER_CST@1) + (with { bool fail =3D !(TREE_CODE (@4) !=3D SSA_NAME || has_single_use = (@4)); } + (if (!fail && tree_nop_conversion_p (type, TREE_TYPE (@0))) + (with { tree mask =3D int_const_binop (shift, fold_convert (type, @2)= , @1); } + (bit_op (shift (convert @0) @1) { mask; }))))))) + /* Simplifications of conversions. */ =20 /* Basic strip-useless-type-conversions / strip_nops. */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c b/gcc/testsuite/gc= c.dg/tree-ssa/forwprop-33.c new file mode 100644 index 0000000000000000000000000000000000000000..c7124deee11f3655d7c50bd9605= bb7caddba6470 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-33.c @@ -0,0 +1,71 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-forwprop3" } */ + +unsigned short +test1 (unsigned short a) +{ + a ^=3D 0x4002; + a >>=3D 1; + a |=3D 0x8000; /* Simplify to ((a >> 1) ^ 0xa001). */ + return a; +} +/* { dg-final { scan-tree-dump "\\^ 40961" "forwprop3" } } */ + +unsigned short +test2 (unsigned short a) +{ + a |=3D 0x4002; + a <<=3D 1; + a ^=3D 0x0001; /* Simplify to ((a << 1) | 0x8005). */ + return a; +} +/* { dg-final { scan-tree-dump "\\| 32773" "forwprop3" } } */ + +unsigned short +test3 (unsigned short a) +{ + a &=3D 0xd123; + a ^=3D 0x6040; + a |=3D 0xc031; /* Simplify to ((a & 0xd123) | 0xe071). */ + return a; +} +/* { dg-final { scan-tree-dump "\\| 57457" "forwprop3" } } */ + +unsigned short +test4 (unsigned short a) +{ + a ^=3D 0x8002; + a >>=3D 1; + a |=3D 0x8000; + return a; +} +/* { dg-final { scan-tree-dump "\\^ 49153" "forwprop3" } } */ + +unsigned short +test5 (unsigned short a) +{ + a ^=3D 0x8002; + a >>=3D 1; + a |=3D 0x8001; /* Only move shift inward: (((a >> 1) ^ 0x4001) | 0x8001)= . */ + return a; +} +/* { dg-final { scan-tree-dump "\\^ 16385" "forwprop3" } } */ +/* { dg-final { scan-tree-dump "\\| 32769" "forwprop3" } } */ + +short +test6 (short a) +{ + a &=3D 0x7fff; + a >>=3D 2; + return a; +} +/* { dg-final { scan-tree-dump "\\& 8191" "forwprop3" } } */ + +short +test7 (short a) +{ + a &=3D 0x8fff; + a >>=3D 2; + return a; +} +/* { dg-final { scan-tree-dump "\\& -7169" "forwprop3" } } */ --=20 1.9.1 --------------030604000306060800070609--