public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jiong Wang <jiong.wang@arm.com>
To: Jeff Law <law@redhat.com>
Cc: "gcc-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [Patch/rtl-expand] Take tree range info into account to improve LSHIFT_EXP expanding
Date: Tue, 18 Aug 2015 13:22:00 -0000	[thread overview]
Message-ID: <n99614cagfa.fsf@arm.com> (raw)
In-Reply-To: <n99zj1twngh.fsf@arm.com>

[-- Attachment #1: Type: text/plain, Size: 6344 bytes --]


Jiong Wang writes:

> Jeff Law writes:
>
>> On 08/14/2015 11:40 AM, Jiong Wang wrote:
>>>
>>>    * Figuring out whether the shift source is coming from sign extension
>>>      by checking SSA_NAME_DEF_STMT instead of deducing from tree range
>>>      info. I fell checking the gimple statement is more reliable and
>>>      straigtforward.
>> I suspect it'll also apply more often if you're looking for the 
>> nop-conversion rather than looking at range information.
>>
>> I keep thinking there ought to be a generalization here so that we're 
>> not so restrictive on the modes, but it's probably not worth doing.
>>
>> In a perfect world we'd also integrate the other strategies for 
>> double-word shifts that exist in the various ports as special cases in 
>> expansion and remove the double-word shift patterns for ports that don't 
>> actually have them in hardware.  But that's probably a bit much to ask 
>> -- however, it probably couldn't hurt to see if there are any that are 
>> easily handled.
>
> Agree.
>
> Doing these in early tree/rtl expand stage is more clean & safe, and
> expose more details to later RTL optimization passes as I think if you
> handle double-word by defining insn pattern, then you will try to split
> it in RTL split pass which happens later after some important
> optimizations.
>
>>
>>> +
>>> +	   If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't
>>> +	   native instruction to support this wide mode left shift.  Given below
>>> +	   example:
>>> +
>>> +	    Type A = (Type) B  << C
>>> +
>>> +            |<           T	    >|
>>> +            |   high     |   low     |
>>> +
>>> +                         |<- size  ->|
>>> +
>>> +	   By checking tree shape, if operand B is coming from signed extension,
>>> +	   then the left shift operand can be simplified into:
>>> +
>>> +	      1. high = low >> (size - C);
>>> +	      2. low = low << C;
>> You'll want to reorder those to match the code you generate.
>>
>> Doesn't this require that C be less than the bitsize of a word?
>
> Yes.
>
> Above transformation is to handle double word left shift which shift the
> original source across the word boundary.
>
> Part of the contents shifted into the high half of destination, and the
> other remain in the low half.
>
> So if C is bigger than bitsize of a word, the the whole source will be
> shifted into high half, thus should generate what you have listed below
> and that's what gcc is generating, looks like the existed double word
> shift logic can recognize this special cases.
>
> So, I should check the value of C,if it's bigger than bitsize of a word,
> then just fall through to default expand logic.
>
>> If C is larger than the bitsize of a word, then you need some 
>> adjustments, something like:
>>
>>
>> 	      1. high = low << (C - size)
>>                2. low = 0
>>
>> Does this transformation require that the wider mode be exactly 2X the 
>> narrower mode?  If so, that needs to be verified.
>
> I think so, the wider mode should be exactly 2X the word_mode, will
> add the check.
>
>>
>>> +		if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode)
>> So we're assured we have a widening conversion.
>>
>>> +		    && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
>>> +			>= GET_MODE_BITSIZE (word_mode)))
>> This test seems wrong.  I'm not sure what you're really trying to test 
>> here.  It seems you want to look at the shift count relative to the 
>> bitsize of word_mode.  If it's less, then generate the code you 
>> mentioned above in the comment.  If it's more, then generate the 
>> sequence I referenced?  Right?
>
> As explained above, I am trying to check whether the left shift is
> causing the original source across word boundary while I should make sure
> TREE_INT_CST_LOW (treeop1) < GET_MODE_BITSIZE (word_mode) at the same
> time, otherwise fall through to default expand logic.
>
>>
>> I do think you need to be verifying that rmode == wordmode here.  If I 
>> understand everything correctly, the final value is "mode" which must be 
>> 2X the size size of rmode/wordmode here, right?
>
> I think rmode don't need to equal wordmode. For example the
> transformation is OK for the following, where rmode = SImode, and final
> mode = TImode.
>
> int b;
> __128_int a = (__128_int) b;
>
> the expand_expr (treeop0... in the start of those code will generate
> necessary instruction sequences to extend whatever mode b is into TImode
> register (op0 in the patch);
>
>>
>>
>>
>> The other question is are we endianness-safe in these transformations?
>
> I think it is. As this transformation is done with register involved
> only. I think endianness issue will only happen if there is operation on
> memory object.
>
>>> +		    temp = expand_variable_shift (code, word_mode, low, treeop1,
>>> +						  tlow, unsignedp);
>> Why use "code" here right than just using LSHIFT_EXPR?  As noted
>> earlier,
>
> Yes, better to use the constant LSHIFT_EXPR.
>
> Thanks.

patch updated, please review thanks.

Changes are:

  1. s/shfit/shift/
  2. Re-write the comment.
     more explanations on the left shift across word size boundary
     scenario, explan gcc's current expand steps and what this
     optimization can improve.
  3. Match the code to the comment.
     Expand high part first, then low part, so we don't need to check
     the pseudo register overlapping.
  4. Add the check to make sure if we are shifting the whole source
     operand into high part (sfhit amount >= word mode size) then just
     don't do this optimization, use the default expand logic instead
     which generate optimized rtx sequences already.
  5. Only do this optimization when
       GET_MOZE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode)

  bootstrap OK on x86-64, no regression.
  bootstrap OK on aarch64, no regression.

2015-08-18  Jiong.Wang  <jiong.wang@arm.com>

gcc/
  * expr.c (expand_expr_real_2): Check gimple statement during
  LSHIFT_EXPR expand.
  
gcc/testsuite
  * gcc.dg/wide_shift_64_1.c: New testcase.
  * gcc.dg/wide_shift_128_1.c: Likewise.
  * gcc.target/aarch64/ashlti3_1.c: Likewise.
  * gcc.target/arm/ashldisi_1.c: Likewise.
  
-- 
Regards,
Jiong


[-- Attachment #2: k-new.patch --]
[-- Type: text/x-diff, Size: 8548 bytes --]

diff --git a/gcc/expr.c b/gcc/expr.c
index 3202f55..eca9a20 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -8836,23 +8836,110 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 
     case LSHIFT_EXPR:
     case RSHIFT_EXPR:
-      /* If this is a fixed-point operation, then we cannot use the code
-	 below because "expand_shift" doesn't support sat/no-sat fixed-point
-         shifts.   */
-      if (ALL_FIXED_POINT_MODE_P (mode))
-	goto binop;
+      {
+	/* If this is a fixed-point operation, then we cannot use the code
+	   below because "expand_shift" doesn't support sat/no-sat fixed-point
+	   shifts.  */
+	if (ALL_FIXED_POINT_MODE_P (mode))
+	  goto binop;
+
+	if (! safe_from_p (subtarget, treeop1, 1))
+	  subtarget = 0;
+	if (modifier == EXPAND_STACK_PARM)
+	  target = 0;
+	op0 = expand_expr (treeop0, subtarget,
+			   VOIDmode, EXPAND_NORMAL);
+	/* Left shift optimization when shifting across word_size boundary.
+
+	   If mode == GET_MODE_WIDER_MODE (word_mode), then normally there isn't
+	   native instruction to support this wide mode left shift.  Given below
+	   scenario:
+
+	    Type A = (Type) B  << C
+
+	    |<		 T	    >|
+	    | dest_high  |  dest_low |
+
+			 | word_size |
+
+	   If the shfit amount C caused we shift B to across the word size
+	   boundary, i.e part of B shifted into high half of destination
+	   register, and part of B remains in the low half, then GCC will use
+	   the following left shift expand logic:
+
+	   1. Initialize dest_low to B.
+	   2. Initialize every bit of dest_high to the sign bit of B.
+	   3. Logic left shift dest_low by C bit to finalize dest_low.
+	      The value of dest_low before this shift is kept in a temp D.
+	   4. Logic left shift dest_high by C.
+	   5. Logic right shift D by (word_size - C).
+	   6. Or the result of 4 and 5 to finalize dest_high.
+
+	   While, by checking gimple statements, if operand B is coming from
+	   signed extension, then we can simplify above expand logic into:
+
+	      1. dest_high = src_low >> (word_size - C).
+	      2. dest_low = src_low << C.
+
+	   We can use one arithmetic right shift to finish all the purpose of
+	   steps 2, 4, 5, 6, thus we reduce the steps needed from 6 into 2.  */
+
+	temp = NULL_RTX;
+	if (code == LSHIFT_EXPR
+	    && target
+	    && REG_P (target)
+	    && ! unsignedp
+	    && mode == GET_MODE_WIDER_MODE (word_mode)
+	    && GET_MODE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode)
+	    && ! have_insn_for (ASHIFT, mode)
+	    && TREE_CONSTANT (treeop1)
+	    && TREE_CODE (treeop0) == SSA_NAME)
+	  {
+	    gimple def = SSA_NAME_DEF_STMT (treeop0);
+	    if (is_gimple_assign (def)
+		&& gimple_assign_rhs_code (def) == NOP_EXPR)
+	      {
+		machine_mode rmode = TYPE_MODE
+		  (TREE_TYPE (gimple_assign_rhs1 (def)));
 
-      if (! safe_from_p (subtarget, treeop1, 1))
-	subtarget = 0;
-      if (modifier == EXPAND_STACK_PARM)
-	target = 0;
-      op0 = expand_expr (treeop0, subtarget,
-			 VOIDmode, EXPAND_NORMAL);
-      temp = expand_variable_shift (code, mode, op0, treeop1, target,
-				    unsignedp);
-      if (code == LSHIFT_EXPR)
-	temp = REDUCE_BIT_FIELD (temp);
-      return temp;
+		if (GET_MODE_SIZE (rmode) < GET_MODE_SIZE (mode)
+		    && TREE_INT_CST_LOW (treeop1) < GET_MODE_BITSIZE (word_mode)
+		    && ((TREE_INT_CST_LOW (treeop1) + GET_MODE_BITSIZE (rmode))
+			>= GET_MODE_BITSIZE (word_mode)))
+		  {
+		    rtx low = simplify_gen_subreg (word_mode, op0, mode, 0);
+		    rtx dest_low = simplify_gen_subreg (word_mode, target, mode,
+							0);
+		    rtx dest_high = simplify_gen_subreg (word_mode, target,
+							 mode, UNITS_PER_WORD);
+		    HOST_WIDE_INT ramount = (BITS_PER_WORD
+					     - TREE_INT_CST_LOW (treeop1));
+		    tree rshift = build_int_cst (TREE_TYPE (treeop1), ramount);
+
+		    /* dest_high = src_low >> (word_size - C).  */
+		    temp = expand_variable_shift (RSHIFT_EXPR, word_mode, low,
+						  rshift, dest_high, unsignedp);
+		    if (temp != dest_high)
+		      emit_move_insn (dest_high, temp);
+
+		    /* dest_low = src_low << C.  */
+		    temp = expand_variable_shift (LSHIFT_EXPR, word_mode, low,
+						  treeop1, dest_low, unsignedp);
+		    if (temp != dest_low)
+		      emit_move_insn (dest_low, temp);
+
+		    temp = target ;
+		  }
+	      }
+	  }
+
+	if (temp == NULL_RTX)
+	  temp = expand_variable_shift (code, mode, op0, treeop1, target,
+					unsignedp);
+	if (code == LSHIFT_EXPR)
+	  temp = REDUCE_BIT_FIELD (temp);
+	return temp;
+      }
 
       /* Could determine the answer when only additive constants differ.  Also,
 	 the addition of one can be handled by changing the condition.  */
diff --git a/gcc/testsuite/gcc.dg/wide-shift-128.c b/gcc/testsuite/gcc.dg/wide-shift-128.c
new file mode 100644
index 0000000..9b62715
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wide-shift-128.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target aarch64*-*-* mips64*-*-* sparc64*-*-* } } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O2 -fdump-rtl-combine" } */
+
+__int128_t
+load2 (int data)
+{
+    return (__int128_t) data << 50;
+}
+
+/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */
diff --git a/gcc/testsuite/gcc.dg/wide-shift-64.c b/gcc/testsuite/gcc.dg/wide-shift-64.c
new file mode 100644
index 0000000..ea78708
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wide-shift-64.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target arm*-*-* mips*-*-* sparc*-*-* } } */
+/* { dg-options "-O2 -fdump-rtl-combine" } */
+
+long long
+load1 (int data)
+{
+    return (long long) data << 12;
+}
+
+/* { dg-final { scan-rtl-dump-not "ior" "combine" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ashltidisi.c b/gcc/testsuite/gcc.target/aarch64/ashltidisi.c
new file mode 100644
index 0000000..d7b02b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ashltidisi.c
@@ -0,0 +1,48 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+
+extern void abort (void);
+
+#define GEN_TEST_CASE(x, y, z)\
+__uint128_t __attribute__ ((noinline))\
+ushift_##x##_##z (unsigned y data)\
+{\
+  return (__uint128_t) data << x;\
+}\
+__int128_t __attribute__ ((noinline)) \
+shift_##x##_##z (y data) \
+{\
+  return (__int128_t) data << x;\
+}
+
+GEN_TEST_CASE (53, int, i)
+GEN_TEST_CASE (3, long long, ll)
+GEN_TEST_CASE (13, long long, ll)
+GEN_TEST_CASE (53, long long, ll)
+
+int
+main (int argc, char **argv)
+{
+
+#define SHIFT_CHECK(x, y, z, p) \
+	if (ushift_##y##_##p (x)\
+	    != ((__uint128_t) (unsigned z) x << y)) \
+	  abort ();\
+	if (shift_##y##_##p (x)\
+	    != ((__uint128_t) (signed z) x << y)) \
+	  abort ();
+
+  SHIFT_CHECK (0x12345678, 53, int, i)
+  SHIFT_CHECK (0xcafecafe, 53, int, i)
+
+  SHIFT_CHECK (0x1234567890abcdefLL, 3, long long, ll)
+  SHIFT_CHECK (0x1234567890abcdefLL, 13, long long, ll)
+  SHIFT_CHECK (0x1234567890abcdefLL, 53, long long, ll)
+  SHIFT_CHECK (0xcafecafedeaddeadLL, 3, long long, ll)
+  SHIFT_CHECK (0xcafecafedeaddeadLL, 13, long long, ll)
+  SHIFT_CHECK (0xcafecafedeaddeadLL, 53, long long, ll)
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "asr" 4 } } */
diff --git a/gcc/testsuite/gcc.target/arm/ashldisi.c b/gcc/testsuite/gcc.target/arm/ashldisi.c
new file mode 100644
index 0000000..00dc06e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/ashldisi.c
@@ -0,0 +1,44 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+
+extern void abort (void);
+
+#define GEN_TEST_CASE(x)\
+unsigned long long __attribute__ ((noinline))\
+ushift_ ## x (unsigned int data)\
+{\
+  return (unsigned long long) data << x;\
+}\
+long long __attribute__ ((noinline)) \
+shift_ ## x (int data) \
+{\
+  return (long long) data << x;\
+}
+
+GEN_TEST_CASE (3)
+GEN_TEST_CASE (23)
+GEN_TEST_CASE (30)
+int
+main (int argc, char **argv)
+{
+
+#define SHIFT_CHECK(x, y) \
+	if (ushift_ ## y (x)\
+	    != ((unsigned long long) (unsigned) x << y)) \
+	  abort (); \
+	if (shift_ ## y (x)\
+	    != ((long long) (signed) x << y)) \
+	  abort ();
+
+  SHIFT_CHECK (0x12345678, 3)
+  SHIFT_CHECK (0xcafecafe, 3)
+  SHIFT_CHECK (0x12345678, 23)
+  SHIFT_CHECK (0xcafecafe, 23)
+  SHIFT_CHECK (0x12345678, 30)
+  SHIFT_CHECK (0xcafecafe, 30)
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "asr" 3 } } */
+/* { dg-final { cleanup-saved-temps } } */

  reply	other threads:[~2015-08-18 13:13 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-16 11:04 Jiong Wang
2015-04-24  2:23 ` Jeff Law
2015-04-27 20:58   ` Jiong Wang
2015-04-29  3:53     ` Jeff Law
2015-04-29 22:14       ` Jiong Wang
2015-04-29 22:55         ` Jeff Law
2015-08-14 17:55           ` Jiong Wang
2015-08-14 20:30             ` Jeff Law
2015-08-14 22:24               ` Jiong Wang
2015-08-18 13:22                 ` Jiong Wang [this message]
2015-08-18 17:47                   ` Jeff Law
2015-08-19 23:05                     ` Jiong Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=n99614cagfa.fsf@arm.com \
    --to=jiong.wang@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=law@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).