From: Jiong Wang <jiong.wang@arm.com>
To: Jeff Law <law@redhat.com>
Cc: "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
Date: Fri, 14 Aug 2015 17:55:00 -0000 [thread overview]
Message-ID: <n99oai9epez.fsf@arm.com> (raw)
In-Reply-To: <55415C2E.9010001@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 3532 bytes --]
Jeff Law writes:
> On 04/29/2015 03:36 PM, Jiong Wang wrote:
>>
>> Jeff Law writes:
>>
>>> On 04/27/2015 02:21 PM, Jiong Wang wrote:
>>>
>>>> Jeff,
>>>>
>>>> Sorry, I can't understand the meaning of "overlap between t_low and low",
>>>> assume "right" in "right value" means the opposite of "left" not
>>>> "correct".
>>>>
>>>> So what you mean is t_low and low share the same pseudo regiser?
>>> My concern is sharing the same pseudo or memory location. But thinking
>>> more about it, the shifted value has to have range information, so it
>>> must have been an SSA_NAME, right? If so, then it can't overlap with
>>> the destination, so this is a non-issue. Sorry for the confusion.
>>
>> Thanks for the light. By looking at related code, looks like even it's
>> SSA_NAME, it's still possible to share the same pseudo given the
>> destination is in the same SSA map parition after ssa name coleascing?
> If they're the same size and have non-overlapping lifetimes, then yes,
> they could be the same pseudo. That ought to be easy to check.
> Thankfully we don't have to worry about MEMs, which is a harder check.
>
>> OK. I will rework the patch, and I found there is a function named
>> "expand_doubleword_shift" which looks like a more natural place to do
>> this optimization, although it's hard to get range info there. I will do
>> further explore on this.
> Sounds good.
>
> jeff
Jeff,
Sorry for the slow response.
I found we can't integrate this transformation into
"expand_doubleword_shift" as it's at quite deep layer in the call
stack, when we reach there, we have lost some high level info.
So I am keeping this transformaion still in expr.c, and attachment is
the updated version of this patch. The main changes are:
* Figuring out whether the shift source is coming from sign extension
by checking SSA_NAME_DEF_STMT instead of deducing from tree range
info. I fell checking the gimple statement is more reliable and
straigtforward.
* For the pseudo register overlaping issue, given:
RTX target = TREE source << TREE amount
I was trying to make sure those two SSA_NAME won't get the same
pseudo register by comparing their assigned partition using
var_to_partition, but I can't get the associated tree representation
for "target" which is RTX.
Then I just expand "source" and compare the register number with
"target".
But I found the simplest way maybe just reorder
low_part (target) = low_part (source) << amount
high_part (target) = low_part (source) >> amount1
to:
high_part (target) = low_part (source) >> amount1
low_part (target) = low_part (source) << amount
then, even target and source share the same pseudo register, we can
still get what we want, as we are operating on their subreg.
* Added checking on the result value of expand_variable_shift, if it's
not the same as "target" then call emit_move_insn to do that.
How is the re-worked patch looks to you?
x86 bootstrap OK, regression OK. aarch64 bootstrap OK.
2015-08-14 Jiong.Wang <jiong.wang@arm.com>
gcc/
* expr.c (expand_expr_real_2): Check tree format to optimize
LSHIFT_EXPR expand.
gcc/testsuite
* gcc.dg/wide_shift_64_1.c: New testcase.
* gcc.dg/wide_shift_128_1.c: Likewise.
* gcc.target/aarch64/ashlti3_1.c: Likewise.
* gcc.target/arm/ashldisi_1.c: Likewise.
--
Regards,
Jiong
[-- Attachment #2: k.patch --]
[-- Type: text/x-diff, Size: 7508 bytes --]
diff --git a/gcc/expr.c b/gcc/expr.c
index 31b4573..8a25e98 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8836,23 +8836,90 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
case LSHIFT_EXPR:
case RSHIFT_EXPR:
- /* If this is a fixed-point operation, then we cannot use the code
- below because "expand_shift" doesn't support sat/no-sat fixed-point
- shifts. */
- if (ALL_FIXED_POINT_MODE_P (mode))
- goto binop;
+ {
+ /* If this is a fixed-point operation, then we cannot use the code
+ below because "expand_shift" doesn't support sat/no-sat fixed-point
+ shifts. */
+ if (ALL_FIXED_POINT_MODE_P (mode))
+ goto binop;
+
+ if (! safe_from_p (subtarget, treeop1, 1))
+ subtarget = 0;
+ if (modifier == EXPAND_STACK_PARM)
+ target = 0;
+ op0 = expand_expr (treeop0, subtarget,
+ VOIDmode, EXPAND_NORMAL);
+ /* Left shfit optimization:
+
+ If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't
+ native instruction to support this wide mode left shift. Given below
+ example:
+
+ Type A = (Type) B << C
+
+ |< T >|
+ | high | low |
+
+ |<- size ->|
+
+ By checking tree shape, if operand B is coming from signed extension,
+ then the left shift operand can be simplified into:
+
+ 1. high = low >> (size - C);
+ 2. low = low << C;
+ */
+
+ temp = NULL_RTX;
+ if (code == LSHIFT_EXPR
+ && target
+ && REG_P (target)
+ && ! unsignedp
+ && mode == GET_MODE_WIDER_MODE (word_mode)
+ && ! have_insn_for (ASHIFT, mode)
+ && TREE_CONSTANT (treeop1)
+ && TREE_CODE (treeop0) == SSA_NAME)
+ {
+ gimple def = SSA_NAME_DEF_STMT (treeop0);
+ if (is_gimple_assign (def)
+ && gimple_assign_rhs_code (def) == NOP_EXPR)
+ {
+ machine_mode rmode = TYPE_MODE
+ (TREE_TYPE (gimple_assign_rhs1 (def)));
- if (! safe_from_p (subtarget, treeop1, 1))
- subtarget = 0;
- if (modifier == EXPAND_STACK_PARM)
- target = 0;
- op0 = expand_expr (treeop0, subtarget,
- VOIDmode, EXPAND_NORMAL);
- temp = expand_variable_shift (code, mode, op0, treeop1, target,
- unsignedp);
- if (code == LSHIFT_EXPR)
- temp = REDUCE_BIT_FIELD (temp);
- return temp;
+ if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode)
+ && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
+ >= GET_MODE_BITSIZE (word_mode)))
+ {
+ rtx low = simplify_gen_subreg (word_mode, op0, mode, 0);
+ rtx tlow = simplify_gen_subreg (word_mode, target, mode, 0);
+ rtx thigh = simplify_gen_subreg (word_mode, target, mode,
+ UNITS_PER_WORD);
+ HOST_WIDE_INT ramount = (BITS_PER_WORD
+ - TREE_INT_CST_LOW (treeop1));
+ tree rshift = build_int_cst (TREE_TYPE (treeop1), ramount);
+
+ temp = expand_variable_shift (code, word_mode, low, treeop1,
+ tlow, unsignedp);
+ if (temp != tlow)
+ emit_move_insn (tlow, temp);
+
+ temp = expand_variable_shift (RSHIFT_EXPR, word_mode, low,
+ rshift, thigh, unsignedp);
+ if (temp != thigh)
+ emit_move_insn (thigh, temp);
+
+ temp = target ;
+ }
+ }
+ }
+
+ if (temp == NULL_RTX)
+ temp = expand_variable_shift (code, mode, op0, treeop1, target,
+ unsignedp);
+ if (code == LSHIFT_EXPR)
+ temp = REDUCE_BIT_FIELD (temp);
+ return temp;
+ }
/* Could determine the answer when only additive constants differ. Also,
the addition of one can be handled by changing the condition. */
diff --git a/gcc/testsuite/gcc.dg/wide-shift-128.c b/gcc/testsuite/gcc.dg/wide-shift-128.c
new file mode 100644
index 0000000..9b62715
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wide-shift-128.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target aarch64*-*-* mips64*-*-* sparc64*-*-* } } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O2 -fdump-rtl-combine" } */
+
+__int128_t
+load2 (int data)
+{
+ return (__int128_t) data << 50;
+}
+
+/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */
+/* { dg-final { cleanup-rtl-dump "combine" } } */
diff --git a/gcc/testsuite/gcc.dg/wide-shift-64.c b/gcc/testsuite/gcc.dg/wide-shift-64.c
new file mode 100644
index 0000000..5bc278f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wide-shift-64.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target arm*-*-* mips*-*-* sparc*-*-* } } */
+/* { dg-options "-O2 -fdump-rtl-combine" } */
+
+long long
+load1 (int data)
+{
+ return (long long) data << 12;
+}
+
+/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ashltidisi.c b/gcc/testsuite/gcc.target/aarch64/ashltidisi.c
new file mode 100644
index 0000000..aeb2a24
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ashltidisi.c
@@ -0,0 +1,48 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+
+extern void abort (void);
+
+#define GEN_TEST_CASE(x, y, z)\
+__uint128_t __attribute__ ((noinline))\
+ushift_##x##_##z (unsigned y data)\
+{\
+ return (__uint128_t) data << x;\
+}\
+__int128_t __attribute__ ((noinline)) \
+shift_##x##_##z (y data) \
+{\
+ return (__int128_t) data << x;\
+}
+
+GEN_TEST_CASE (53, int, i)
+GEN_TEST_CASE (3, long long, ll)
+GEN_TEST_CASE (13, long long, ll)
+GEN_TEST_CASE (53, long long, ll)
+
+int
+main (int argc, char **argv)
+{
+
+#define SHIFT_CHECK(x, y, z, p) \
+ if (ushift_##y##_##p (x)\
+ != ((__uint128_t) (unsigned z) x << y)) \
+ abort ();\
+ if (shift_##y##_##p (x)\
+ != ((__uint128_t) (signed z) x << y)) \
+ abort ();
+
+ SHIFT_CHECK (0x12345678, 53, int, i)
+ SHIFT_CHECK (0xcafecafe, 53, int, i)
+
+ SHIFT_CHECK (0x1234567890abcdefLL, 3, long long, ll)
+ SHIFT_CHECK (0x1234567890abcdefLL, 13, long long, ll)
+ SHIFT_CHECK (0x1234567890abcdefLL, 53, long long, ll)
+ SHIFT_CHECK (0xcafecafedeaddeadLL, 3, long long, ll)
+ SHIFT_CHECK (0xcafecafedeaddeadLL, 13, long long, ll)
+ SHIFT_CHECK (0xcafecafedeaddeadLL, 53, long long, ll)
+
+ return 0;
+}
+
+/* { dg-final { scan-assembler-times "asr" 4 } } */
diff --git a/gcc/testsuite/gcc.target/arm/ashldisi.c b/gcc/testsuite/gcc.target/arm/ashldisi.c
new file mode 100644
index 0000000..00dc06e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/ashldisi.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+
+extern void abort (void);
+
+#define GEN_TEST_CASE(x)\
+unsigned long long __attribute__ ((noinline))\
+ushift_ ## x (unsigned int data)\
+{\
+ return (unsigned long long) data << x;\
+}\
+long long __attribute__ ((noinline)) \
+shift_ ## x (int data) \
+{\
+ return (long long) data << x;\
+}
+
+GEN_TEST_CASE (3)
+GEN_TEST_CASE (23)
+GEN_TEST_CASE (30)
+int
+main (int argc, char **argv)
+{
+
+#define SHIFT_CHECK(x, y) \
+ if (ushift_ ## y (x)\
+ != ((unsigned long long) (unsigned) x << y)) \
+ abort (); \
+ if (shift_ ## y (x)\
+ != ((long long) (signed) x << y)) \
+ abort ();
+
+ SHIFT_CHECK (0x12345678, 3)
+ SHIFT_CHECK (0xcafecafe, 3)
+ SHIFT_CHECK (0x12345678, 23)
+ SHIFT_CHECK (0xcafecafe, 23)
+ SHIFT_CHECK (0x12345678, 30)
+ SHIFT_CHECK (0xcafecafe, 30)
+
+ return 0;
+}
+
+/* { dg-final { scan-assembler-times "asr" 3 } } */
+/* { dg-final { cleanup-saved-temps } } */
next prev parent reply other threads:[~2015-08-14 17:43 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-16 11:04 Jiong Wang
2015-04-24 2:23 ` Jeff Law
2015-04-27 20:58 ` Jiong Wang
2015-04-29 3:53 ` Jeff Law
2015-04-29 22:14 ` Jiong Wang
2015-04-29 22:55 ` Jeff Law
2015-08-14 17:55 ` Jiong Wang [this message]
2015-08-14 20:30 ` Jeff Law
2015-08-14 22:24 ` Jiong Wang
2015-08-18 13:22 ` Jiong Wang
2015-08-18 17:47 ` Jeff Law
2015-08-19 23:05 ` Jiong Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=n99oai9epez.fsf@arm.com \
--to=jiong.wang@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=law@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).