From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 99765 invoked by alias); 19 Aug 2015 22:55:57 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 99754 invoked by uid 89); 19 Aug 2015 22:55:57 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.3 required=5.0 tests=AWL,BAYES_00,KAM_ASCII_DIVIDERS,SPF_PASS autolearn=no version=3.3.2 X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (207.82.80.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 19 Aug 2015 22:55:54 +0000 Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140]) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-38-FgoaWFmGSCe7i47H7JnWag-1; Wed, 19 Aug 2015 23:55:49 +0100 Received: from e104437-lin ([10.1.2.79]) by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Wed, 19 Aug 2015 23:55:48 +0100 References: <5539A922.7060708@redhat.com> <554054E9.4030400@redhat.com> <55415C2E.9010001@redhat.com> <55CE4E7A.8000603@redhat.com> <55D36E59.8090609@redhat.com> From: Jiong Wang To: Jeff Law Cc: "gcc-patches\@gcc.gnu.org" Subject: Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding Date: Wed, 19 Aug 2015 23:05:00 -0000 In-reply-to: <55D36E59.8090609@redhat.com> Message-ID: MIME-Version: 1.0 X-MC-Unique: FgoaWFmGSCe7i47H7JnWag-1 Content-Type: multipart/mixed; boundary="=-=-=" X-SW-Source: 2015-08/txt/msg01148.txt.bz2 --=-=-= Content-Type: text/plain; charset=WINDOWS-1252 Content-Transfer-Encoding: quoted-printable Content-length: 3657 Jeff Law writes: >> + && ! unsignedp > Don't you need to check that the conversion is actually a sign=20 > extension. Oh, you're relying on the signedness of ops->type. That=20 > should be sufficient. Exactly. >> + if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode) >> + && TREE_INT_CST_LOW (treeop1) < GET_MODE_BITSIZE (word_mode) >> + && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode)) >> + >=3D GET_MODE_BITSIZE (word_mode))) > I keep having trouble with this. I think the key to why this works is=20 > you don't actually use the source of the conversion, but instead the=20 > destination of the conversion (held in op0). Yes, my thought is given any narrower type, the op0 =3D expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL); Will do necesary sign extension, then finally want we get from op0 is a "2 x word_mode_size" register whose high part contains duplication of sign bit, and low part contains the source of the conversion if the narrower mode is word_mode_size, or contains sign extened source, but either way, the low part can be used as our new shift operand. And the high part of op0 generated from above expand_expr is actually dead which will be removed by later optimization pass. > > So this is OK for the trunk with the typo & whitespace nits fixed. During my final testing of the patch on arm, mips32, mips64, powerpc32, powerpc64 I do found two new issues: * As you have mentioned, this patch do have endianness issue! Although the new testcases (wide-shift-128/64.c) under gcc.dg passed my manual powerpc32/powerpc64 test which check combine pass rtl dump, but the generated assembly looks weird to me, then I found simplify_gen_subreg itself is not endianness safe. We need to use subreg_highpart/lowpart_offset to get the offset.=20=20=20=20 So I done minor modifications on the patch by using lowpart_reg and subreg_highpart_offset I think it's endianess safe now. And the instruction sequences generated for powerpc, for example powerpc32 looks sane to me now: long long load1 (int data) { return (long long) data << 12;} load1-before-patch: srawi 9,3,31 mr 4,3 srwi 3,3,20 slwi 4,4,12 rlwimi 3,9,12,0,31-12 blr load1-after-patch: mr 4,3 srawi 3,3,20 slwi 4,4,12 blr * The other issue is still what you have noticed. For some targets, for example arm, there is backend define_expand for double word left shift, then because my patch will only do the tranformation when "! have_insn_for (ASHIFT, mode)", thus this transformation will not happen on these targets. While I belive this transformation do generate better instruction sequences for some case. As backend don't know high level type conversion information, those backends expand logic like arm_emit_coreregs_64bit_shift can only assume the shift operand may come from any cases thus the expand logic will be very generic thus not optimal. We should compare the cost to decide whether to go with this transformation or use the backend's customized expand logic. But anyway, this patch will not cause regression. We need another seperate patch to implement the cost comparision then we can let those targets benefit from this transformation. Thus I removed the arm target testcase. (I recall arm testcase is in this patch because my original local patch don't contain the "have_insn_for" check.) Thanks very much for the review! =20 Committed attached patch (above minor issues I treat them as obivious) as 227018. =20 --=20 Regards, Jiong --=-=-= Content-Type: text/x-diff; charset=WINDOWS-1252 Content-Transfer-Encoding: quoted-printable Content-Disposition: inline; filename=k.patch Content-length: 9120 Index: gcc/ChangeLog =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- gcc/ChangeLog (revision 227017) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,8 @@ +2015-08-19 Jiong Wang + + * expr.c (expand_expr_real_2): Check gimple statement during + LSHIFT_EXPR expand. + 2015-08-19 Magnus Granberg =20 * common.opt (fstack-protector): Initialize to -1. Index: gcc/expr.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- gcc/expr.c (revision 227017) +++ gcc/expr.c (working copy) @@ -8836,24 +8836,113 @@ =20 case LSHIFT_EXPR: case RSHIFT_EXPR: - /* If this is a fixed-point operation, then we cannot use the code - below because "expand_shift" doesn't support sat/no-sat fixed-point - shifts. */ - if (ALL_FIXED_POINT_MODE_P (mode)) - goto binop; + { + /* If this is a fixed-point operation, then we cannot use the code + below because "expand_shift" doesn't support sat/no-sat fixed-point + shifts. */ + if (ALL_FIXED_POINT_MODE_P (mode)) + goto binop; =20 - if (! safe_from_p (subtarget, treeop1, 1)) - subtarget =3D 0; - if (modifier =3D=3D EXPAND_STACK_PARM) - target =3D 0; - op0 =3D expand_expr (treeop0, subtarget, - VOIDmode, EXPAND_NORMAL); - temp =3D expand_variable_shift (code, mode, op0, treeop1, target, - unsignedp); - if (code =3D=3D LSHIFT_EXPR) - temp =3D REDUCE_BIT_FIELD (temp); - return temp; + if (! safe_from_p (subtarget, treeop1, 1)) + subtarget =3D 0; + if (modifier =3D=3D EXPAND_STACK_PARM) + target =3D 0; + op0 =3D expand_expr (treeop0, subtarget, + VOIDmode, EXPAND_NORMAL); =20 + /* Left shift optimization when shifting across word_size boundary. + + If mode =3D=3D GET_MODE_WIDER_MODE (word_mode), then normally there is= n't + native instruction to support this wide mode left shift. Given below + scenario: + + Type A =3D (Type) B << C + + |< T >| + | dest_high | dest_low | + + | word_size | + + If the shift amount C caused we shift B to across the word size + boundary, i.e part of B shifted into high half of destination + register, and part of B remains in the low half, then GCC will use + the following left shift expand logic: + + 1. Initialize dest_low to B. + 2. Initialize every bit of dest_high to the sign bit of B. + 3. Logic left shift dest_low by C bit to finalize dest_low. + The value of dest_low before this shift is kept in a temp D. + 4. Logic left shift dest_high by C. + 5. Logic right shift D by (word_size - C). + 6. Or the result of 4 and 5 to finalize dest_high. + + While, by checking gimple statements, if operand B is coming from + signed extension, then we can simplify above expand logic into: + + 1. dest_high =3D src_low >> (word_size - C). + 2. dest_low =3D src_low << C. + + We can use one arithmetic right shift to finish all the purpose of + steps 2, 4, 5, 6, thus we reduce the steps needed from 6 into 2. */ + + temp =3D NULL_RTX; + if (code =3D=3D LSHIFT_EXPR + && target + && REG_P (target) + && ! unsignedp + && mode =3D=3D GET_MODE_WIDER_MODE (word_mode) + && GET_MODE_SIZE (mode) =3D=3D 2 * GET_MODE_SIZE (word_mode) + && ! have_insn_for (ASHIFT, mode) + && TREE_CONSTANT (treeop1) + && TREE_CODE (treeop0) =3D=3D SSA_NAME) + { + gimple def =3D SSA_NAME_DEF_STMT (treeop0); + if (is_gimple_assign (def) + && gimple_assign_rhs_code (def) =3D=3D NOP_EXPR) + { + machine_mode rmode =3D TYPE_MODE + (TREE_TYPE (gimple_assign_rhs1 (def))); + + if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode) + && TREE_INT_CST_LOW (treeop1) < GET_MODE_BITSIZE (word_mode) + && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode)) + >=3D GET_MODE_BITSIZE (word_mode))) + { + unsigned int high_off =3D subreg_highpart_offset (word_mode, + mode); + rtx low =3D lowpart_subreg (word_mode, op0, mode); + rtx dest_low =3D lowpart_subreg (word_mode, target, mode); + rtx dest_high =3D simplify_gen_subreg (word_mode, target, + mode, high_off); + HOST_WIDE_INT ramount =3D (BITS_PER_WORD + - TREE_INT_CST_LOW (treeop1)); + tree rshift =3D build_int_cst (TREE_TYPE (treeop1), ramount); + + /* dest_high =3D src_low >> (word_size - C). */ + temp =3D expand_variable_shift (RSHIFT_EXPR, word_mode, low, + rshift, dest_high, unsignedp); + if (temp !=3D dest_high) + emit_move_insn (dest_high, temp); + + /* dest_low =3D src_low << C. */ + temp =3D expand_variable_shift (LSHIFT_EXPR, word_mode, low, + treeop1, dest_low, unsignedp); + if (temp !=3D dest_low) + emit_move_insn (dest_low, temp); + + temp =3D target ; + } + } + } + + if (temp =3D=3D NULL_RTX) + temp =3D expand_variable_shift (code, mode, op0, treeop1, target, + unsignedp); + if (code =3D=3D LSHIFT_EXPR) + temp =3D REDUCE_BIT_FIELD (temp); + return temp; + } + /* Could determine the answer when only additive constants differ. = Also, the addition of one can be handled by changing the condition. */ case LT_EXPR: Index: gcc/testsuite/ChangeLog =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- gcc/testsuite/ChangeLog (revision 227017) +++ gcc/testsuite/ChangeLog (working copy) @@ -1,3 +1,9 @@ +2015-08-19 Jiong Wang + + * gcc.dg/wide_shift_64_1.c: New testcase. + * gcc.dg/wide_shift_128_1.c: Likewise. + * gcc.target/aarch64/ashlti3_1.c: Likewise. + 2015-08-19 Magnus Granberg =20 * lib/target-supports.exp Index: gcc/testsuite/gcc.dg/wide-shift-128.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- gcc/testsuite/gcc.dg/wide-shift-128.c (revision 0) +++ gcc/testsuite/gcc.dg/wide-shift-128.c (working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile { target aarch64*-*-* mips64*-*-* sparc64*-*-* } } */ +/* { dg-require-effective-target int128 } */ +/* { dg-options "-O2 -fdump-rtl-combine" } */ + +__int128_t +load2 (int data) +{ + return (__int128_t) data << 50; +} + +/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */ Index: gcc/testsuite/gcc.dg/wide-shift-64.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- gcc/testsuite/gcc.dg/wide-shift-64.c (revision 0) +++ gcc/testsuite/gcc.dg/wide-shift-64.c (working copy) @@ -0,0 +1,10 @@ +/* { dg-do compile { target mips*-*-* sparc*-*-* } } */ +/* { dg-options "-O2 -fdump-rtl-combine" } */ + +long long +load1 (int data) +{ + return (long long) data << 12; +} + +/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */ Index: gcc/testsuite/gcc.target/aarch64/ashltidisi.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- gcc/testsuite/gcc.target/aarch64/ashltidisi.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/ashltidisi.c (working copy) @@ -0,0 +1,49 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -save-temps" } */ + +extern void abort (void); + +#define GEN_TEST_CASE(x, y, z)\ +__uint128_t __attribute__ ((noinline))\ +ushift_##x##_##z (unsigned y data)\ +{\ + return (__uint128_t) data << x;\ +}\ +__int128_t __attribute__ ((noinline)) \ +shift_##x##_##z (y data) \ +{\ + return (__int128_t) data << x;\ +} + +GEN_TEST_CASE (53, int, i) +GEN_TEST_CASE (3, long long, ll) +GEN_TEST_CASE (13, long long, ll) +GEN_TEST_CASE (53, long long, ll) + +int +main (int argc, char **argv) +{ + +#define SHIFT_CHECK(x, y, z, p) \ + if (ushift_##y##_##p (x)\ + !=3D ((__uint128_t) (unsigned z) x << y)) \ + abort ();\ + if (shift_##y##_##p (x)\ + !=3D ((__uint128_t) (signed z) x << y)) \ + abort (); + + SHIFT_CHECK (0x12345678, 53, int, i) + SHIFT_CHECK (0xcafecafe, 53, int, i) + + SHIFT_CHECK (0x1234567890abcdefLL, 3, long long, ll) + SHIFT_CHECK (0x1234567890abcdefLL, 13, long long, ll) + SHIFT_CHECK (0x1234567890abcdefLL, 53, long long, ll) + SHIFT_CHECK (0xcafecafedeaddeadLL, 3, long long, ll) + SHIFT_CHECK (0xcafecafedeaddeadLL, 13, long long, ll) + SHIFT_CHECK (0xcafecafedeaddeadLL, 53, long long, ll) + + return 0; +} + +/* { dg-final { scan-assembler-times "asr" 4 } } */ +/* { dg-final { scan-assembler-not "extr\t" } } */ --=-=-=--