public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix PR63259: bswap not recognized when finishing with rotation
@ 2014-10-08  1:57 Thomas Preud'homme
  2014-10-08  6:38 ` Jakub Jelinek
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Preud'homme @ 2014-10-08  1:57 UTC (permalink / raw)
  To: gcc-patches, Richard Biener

Currently the bswap pass only look for bswap pattern by examining bitwise
OR statement and doing following def-use chains. However a rotation
(left or right) can finish a manual byteswap, as shown in the following example:

unsigned
byteswap_ending_with_rotation (unsigned in)
{
    in = ((in & 0xff00ff00) >>  8) | ((in & 0x00ff00ff) <<  8);
    in = ((in & 0xffff0000) >> 16) | ((in & 0x0000ffff) << 16);
    return in;
}

which is compiled into:

byteswap_ending_with_rotation (unsigned int in)
{
  unsigned int _2;
  unsigned int _3;
  unsigned int _4;
  unsigned int _5;

  <bb 2>:
  _2 = in_1(D) & 4278255360;
  _3 = _2 >> 8;
  _4 = in_1(D) & 16711935;
  _5 = _4 << 8;
  in_6 = _5 | _3;
  in_7 = in_6 r>> 16;
  return in_7;

}

This patch adds rotation (left and right) to the list of statement to consider for byte swap.

ChangeLog are as follows:

*** gcc/ChangeLog ***

2014-09-30  Thomas Preud'homme  <thomas.preudhomme@arm.com>

        PR tree-optimization/63259
        * tree-ssa-math-opts.c (pass_optimize_bswap::execute): Also consider
        bswap in LROTATE_EXPR and RROTATE_EXPR statements.

*** gcc/testsuite/ChangeLog ***

2014-09-30  Thomas Preud'homme  <thomas.preudhomme@arm.com>

        PR tree-optimization/63259
        * optimize-bswapsi-1.c (swap32_e): New bswap pass test.


diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
index 580e6e0..d4b5740 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
@@ -64,5 +64,16 @@ swap32_d (SItype in)
 	 | (((in >> 24) & 0xFF) << 0);
 }
 
-/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 4 "bswap" } } */
+/* This variant comes from PR63259.  It compiles to a gimple sequence that ends
+   with a rotation instead of a bitwise OR.  */
+
+unsigned
+swap32_e (unsigned in)
+{
+  in = ((in & 0xff00ff00) >>  8) | ((in & 0x00ff00ff) <<  8);
+  in = ((in & 0xffff0000) >> 16) | ((in & 0x0000ffff) << 16);
+  return in;
+}
+
+/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 5 "bswap" } } */
 /* { dg-final { cleanup-tree-dump "bswap" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 3c6e935..2023f2e 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2377,11 +2377,16 @@ pass_optimize_bswap::execute (function *fun)
         {
 	  gimple src_stmt, cur_stmt = gsi_stmt (gsi);
 	  tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
+	  enum tree_code code;
 	  struct symbolic_number n;
 	  bool bswap;
 
-	  if (!is_gimple_assign (cur_stmt)
-	      || gimple_assign_rhs_code (cur_stmt) != BIT_IOR_EXPR)
+	  if (!is_gimple_assign (cur_stmt))
+	    continue;
+
+	  code = gimple_assign_rhs_code (cur_stmt);
+	  if (code != BIT_IOR_EXPR && code != LROTATE_EXPR
+	      && code != RROTATE_EXPR)
 	    continue;
 
 	  src_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);

Testing was done by running the testsuite on arm-none-eabi target with QEMU
emulating Cortex-M3: no regression were found. Due to the potential increase
in compilation time, A bootstrap with sequential build (no -j option when calling
make) and with default option was made with and without the patch. The
results shows no increase compilation time:

r215662 with patch:
make  6167.48s user 401.03s system 99% cpu 1:49:52.07 total

r215662 without patch
make  6136.63s user 400.32s system 99% cpu 1:49:27.28 total

Is it ok for trunk?

Best regards,

Thomas Preud'homme



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix PR63259: bswap not recognized when finishing with rotation
  2014-10-08  1:57 [PATCH] Fix PR63259: bswap not recognized when finishing with rotation Thomas Preud'homme
@ 2014-10-08  6:38 ` Jakub Jelinek
  2014-10-08  6:43   ` Thomas Preud'homme
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Jelinek @ 2014-10-08  6:38 UTC (permalink / raw)
  To: Thomas Preud'homme; +Cc: gcc-patches, Richard Biener

On Wed, Oct 08, 2014 at 09:56:51AM +0800, Thomas Preud'homme wrote:
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -2377,11 +2377,16 @@ pass_optimize_bswap::execute (function *fun)
>          {
>  	  gimple src_stmt, cur_stmt = gsi_stmt (gsi);
>  	  tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
> +	  enum tree_code code;
>  	  struct symbolic_number n;
>  	  bool bswap;
>  
> -	  if (!is_gimple_assign (cur_stmt)
> -	      || gimple_assign_rhs_code (cur_stmt) != BIT_IOR_EXPR)
> +	  if (!is_gimple_assign (cur_stmt))
> +	    continue;
> +
> +	  code = gimple_assign_rhs_code (cur_stmt);
> +	  if (code != BIT_IOR_EXPR && code != LROTATE_EXPR
> +	      && code != RROTATE_EXPR)
>  	    continue;
>  
>  	  src_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);

Doesn't it turn 16-bit {L,R}ROTATE_EXPR used alone into __builtin_bswap16?
For those the question is if the canonical GIMPLE should be the rotation or
byteswap, I'd think rotation would be perhaps better.  Or depending on if
the backend has bswaphi2 or rotate pattern?

Also, perhaps you could short-circuit this if the rotation isn't by constant
or not a multiple of BITS_PER_UNIT.  So
	  switch (code)
	    {
	    case BIT_IOR_EXPR:
	      break;
	    case LROTATE_EXPR:
	    case RROTATE_EXPR:
	      if (!tree_fits_uhwi_p (gimple_assign_rhs2 (cur_stmt))
		  || (tree_to_uhwi (gimple_assign_rhs2 (cur_stmt))
		      % BITS_PER_UNIT))
		continue;
	      break;
	    default:
	      continue;
	    }
?    

	Jakub

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] Fix PR63259: bswap not recognized when finishing with rotation
  2014-10-08  6:38 ` Jakub Jelinek
@ 2014-10-08  6:43   ` Thomas Preud'homme
  2014-10-08  6:57     ` Thomas Preud'homme
  2014-10-08  6:58     ` Andrew Pinski
  0 siblings, 2 replies; 9+ messages in thread
From: Thomas Preud'homme @ 2014-10-08  6:43 UTC (permalink / raw)
  To: 'Jakub Jelinek'; +Cc: gcc-patches, Richard Biener

> From: Jakub Jelinek [mailto:jakub@redhat.com]
> Sent: Wednesday, October 08, 2014 2:39 PM
> 
> Doesn't it turn 16-bit {L,R}ROTATE_EXPR used alone into
> __builtin_bswap16?
> For those the question is if the canonical GIMPLE should be the rotation
> or
> byteswap, I'd think rotation would be perhaps better.  Or depending on
> if
> the backend has bswaphi2 or rotate pattern?

Good point. It seems better to keep the status quo.

> 
> Also, perhaps you could short-circuit this if the rotation isn't by constant
> or not a multiple of BITS_PER_UNIT.  So
> 	  switch (code)
> 	    {
> 	    case BIT_IOR_EXPR:
> 	      break;
> 	    case LROTATE_EXPR:
> 	    case RROTATE_EXPR:
> 	      if (!tree_fits_uhwi_p (gimple_assign_rhs2 (cur_stmt))
> 		  || (tree_to_uhwi (gimple_assign_rhs2 (cur_stmt))
> 		      % BITS_PER_UNIT))
> 		continue;
> 	      break;
> 	    default:
> 	      continue;
> 	    }
> ?

Right. Thanks for the comments.

Best regards,

Thomas




^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] Fix PR63259: bswap not recognized when finishing with rotation
  2014-10-08  6:43   ` Thomas Preud'homme
@ 2014-10-08  6:57     ` Thomas Preud'homme
  2014-10-08  7:04       ` Jakub Jelinek
  2014-10-08  6:58     ` Andrew Pinski
  1 sibling, 1 reply; 9+ messages in thread
From: Thomas Preud'homme @ 2014-10-08  6:57 UTC (permalink / raw)
  To: 'Jakub Jelinek'; +Cc: gcc-patches, Richard Biener

> From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> owner@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> Sent: Wednesday, October 08, 2014 2:43 PM
> > Also, perhaps you could short-circuit this if the rotation isn't by constant

Note that do_shift_rotate already check for this. Is it enough?

Best regards,

Thomas 




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix PR63259: bswap not recognized when finishing with rotation
  2014-10-08  6:43   ` Thomas Preud'homme
  2014-10-08  6:57     ` Thomas Preud'homme
@ 2014-10-08  6:58     ` Andrew Pinski
  1 sibling, 0 replies; 9+ messages in thread
From: Andrew Pinski @ 2014-10-08  6:58 UTC (permalink / raw)
  To: Thomas Preud'homme; +Cc: Jakub Jelinek, GCC Patches, Richard Biener

On Tue, Oct 7, 2014 at 11:43 PM, Thomas Preud'homme
<thomas.preudhomme@arm.com> wrote:
>> From: Jakub Jelinek [mailto:jakub@redhat.com]
>> Sent: Wednesday, October 08, 2014 2:39 PM
>>
>> Doesn't it turn 16-bit {L,R}ROTATE_EXPR used alone into
>> __builtin_bswap16?
>> For those the question is if the canonical GIMPLE should be the rotation
>> or
>> byteswap, I'd think rotation would be perhaps better.  Or depending on
>> if
>> the backend has bswaphi2 or rotate pattern?
>
> Good point. It seems better to keep the status quo.


There is already code which converts __builtin_bswap16 to the rotate
so maybe it does not matter.  Though I have not seen code which does
the rotate into a bswaphi2 pattern.

Thanks,
Andrew


>
>>
>> Also, perhaps you could short-circuit this if the rotation isn't by constant
>> or not a multiple of BITS_PER_UNIT.  So
>>         switch (code)
>>           {
>>           case BIT_IOR_EXPR:
>>             break;
>>           case LROTATE_EXPR:
>>           case RROTATE_EXPR:
>>             if (!tree_fits_uhwi_p (gimple_assign_rhs2 (cur_stmt))
>>                 || (tree_to_uhwi (gimple_assign_rhs2 (cur_stmt))
>>                     % BITS_PER_UNIT))
>>               continue;
>>             break;
>>           default:
>>             continue;
>>           }
>> ?
>
> Right. Thanks for the comments.
>
> Best regards,
>
> Thomas
>
>
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix PR63259: bswap not recognized when finishing with rotation
  2014-10-08  6:57     ` Thomas Preud'homme
@ 2014-10-08  7:04       ` Jakub Jelinek
  2014-10-08  7:27         ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Jelinek @ 2014-10-08  7:04 UTC (permalink / raw)
  To: Thomas Preud'homme; +Cc: gcc-patches, Richard Biener

On Wed, Oct 08, 2014 at 02:56:46PM +0800, Thomas Preud'homme wrote:
> > From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
> > owner@gcc.gnu.org] On Behalf Of Thomas Preud'homme
> > Sent: Wednesday, October 08, 2014 2:43 PM
> > > Also, perhaps you could short-circuit this if the rotation isn't by constant
> 
> Note that do_shift_rotate already check for this. Is it enough?

The question is only about the compile time cost needed through all the
routines find_bswap_or_nop calls before it finds out it can't do anything
with the rotate, the short-circuiting could avoid that.  E.g.
find_bswap_or_nop_1 first recurses on the argument etc.

	Jakub

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix PR63259: bswap not recognized when finishing with rotation
  2014-10-08  7:04       ` Jakub Jelinek
@ 2014-10-08  7:27         ` Richard Biener
  2014-10-29 13:34           ` Thomas Preud'homme
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2014-10-08  7:27 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Preud'homme, GCC Patches

On Wed, Oct 8, 2014 at 9:04 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Oct 08, 2014 at 02:56:46PM +0800, Thomas Preud'homme wrote:
>> > From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-
>> > owner@gcc.gnu.org] On Behalf Of Thomas Preud'homme
>> > Sent: Wednesday, October 08, 2014 2:43 PM
>> > > Also, perhaps you could short-circuit this if the rotation isn't by constant
>>
>> Note that do_shift_rotate already check for this. Is it enough?
>
> The question is only about the compile time cost needed through all the
> routines find_bswap_or_nop calls before it finds out it can't do anything
> with the rotate, the short-circuiting could avoid that.  E.g.
> find_bswap_or_nop_1 first recurses on the argument etc.

I wouldn't worry about that too much.  Indeed the question would be what
should be canonical on GIMPLE (expanders should choose the optimal
vairant from both).  I think a tree code should be always prefered to a
builtin function call - which means a rotate is more canonical than a
bswap16 call.

From the performance side the pass could be re-structured to populate
a lattice, thus work from def to use instead of the other way around.  Which
means we visit each stmt exactly once, compute its value symbolically
and check it against a rotate.

Richard.

>         Jakub

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH] Fix PR63259: bswap not recognized when finishing with rotation
  2014-10-08  7:27         ` Richard Biener
@ 2014-10-29 13:34           ` Thomas Preud'homme
  2014-10-31 10:46             ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Preud'homme @ 2014-10-29 13:34 UTC (permalink / raw)
  To: 'Richard Biener', Jakub Jelinek; +Cc: GCC Patches

> From: Richard Biener [mailto:richard.guenther@gmail.com]
> Sent: Wednesday, October 08, 2014 8:27 AM
> 
> I wouldn't worry about that too much.  Indeed the question would be
> what
> should be canonical on GIMPLE (expanders should choose the optimal
> vairant from both).  I think a tree code should be always prefered to a
> builtin function call - which means a rotate is more canonical than a
> bswap16 call.

Below is the updated patch. ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2014-10-29  Thomas Preud'homme  <thomas.preudhomme@arm.com>

        PR tree-optimization/63259
        * tree-ssa-math-opts.c (bswap_replace): Replace expression by a
        rotation left if it is a 16 bit byte swap.
        (pass_optimize_bswap::execute): Also consider bswap in LROTATE_EXPR and
        RROTATE_EXPR statements if it is a byte rotation.


*** gcc/testsuite/ChangeLog ***

2014-10-29  Thomas Preud'homme  <thomas.preudhomme@arm.com>

        PR tree-optimization/63259
        * optimize-bswapsi-1.c (swap32_f): New bswap pass test.
        * optimize-bswaphi-1.c: Drop useless SIType definition and fix typo in
        following comment.


diff --git a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
index 18aba28..692fceb 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
@@ -42,11 +42,10 @@ uint32_t read_be16_3 (unsigned char *data)
   return *(data + 1) | (*data << 8);
 }
 
-typedef int SItype __attribute__ ((mode (SI)));
 typedef int HItype __attribute__ ((mode (HI)));
 
 /* Test that detection of significant sign extension works correctly. This
-   checks that unknown byte marker are set correctly in cast of cast.  */
+   checks that unknown byte markers are set correctly in cast of cast.  */
 
 HItype
 swap16 (HItype in)
diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
index cfde218..ad3ede4 100644
--- a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
+++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
@@ -78,5 +78,16 @@ swap32_e (SItype in)
 	 | (((in >> 24) & 0xFF) << 0);
 }
 
-/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 5 "bswap" } } */
+/* This variant comes from PR63259.  It compiles to a gimple sequence that ends
+   with a rotation instead of a bitwise OR.  */
+
+unsigned
+swap32_f (unsigned in)
+{
+  in = ((in & 0xff00ff00) >>  8) | ((in & 0x00ff00ff) <<  8);
+  in = ((in & 0xffff0000) >> 16) | ((in & 0x0000ffff) << 16);
+  return in;
+}
+
+/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 6 "bswap" } } */
 /* { dg-final { cleanup-tree-dump "bswap" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index e0f2924..5b656e0 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2187,7 +2187,7 @@ bswap_replace (gimple cur_stmt, gimple_stmt_iterator gsi, gimple src_stmt,
 	       struct symbolic_number *n, bool bswap)
 {
   tree src, tmp, tgt;
-  gimple call;
+  gimple bswap_stmt;
 
   src = gimple_assign_rhs1 (src_stmt);
   tgt = gimple_assign_lhs (cur_stmt);
@@ -2293,16 +2293,28 @@ bswap_replace (gimple cur_stmt, gimple_stmt_iterator gsi, gimple src_stmt,
 
   tmp = src;
 
-  /* Convert the src expression if necessary.  */
-  if (!useless_type_conversion_p (TREE_TYPE (tmp), bswap_type))
+  /* Canonical form for 16 bit bswap is a rotate expression.  */
+  if (bswap && n->range == 16)
     {
-      gimple convert_stmt;
-      tmp = make_temp_ssa_name (bswap_type, NULL, "bswapsrc");
-      convert_stmt = gimple_build_assign_with_ops (NOP_EXPR, tmp, src, NULL);
-      gsi_insert_before (&gsi, convert_stmt, GSI_SAME_STMT);
+      tree count = build_int_cst (NULL, BITS_PER_UNIT);
+      bswap_type = TREE_TYPE (src);
+      src = fold_build2 (LROTATE_EXPR, bswap_type, src, count);
+      bswap_stmt = gimple_build_assign (NULL, src);
     }
+  else
+    {
+      /* Convert the src expression if necessary.  */
+      if (!useless_type_conversion_p (TREE_TYPE (tmp), bswap_type))
+	{
+	  gimple convert_stmt;
+	  tmp = make_temp_ssa_name (bswap_type, NULL, "bswapsrc");
+	  convert_stmt = gimple_build_assign_with_ops (NOP_EXPR, tmp, src,
+						       NULL);
+	  gsi_insert_before (&gsi, convert_stmt, GSI_SAME_STMT);
+	}
 
-  call = gimple_build_call (fndecl, 1, tmp);
+      bswap_stmt = gimple_build_call (fndecl, 1, tmp);
+    }
 
   tmp = tgt;
 
@@ -2315,7 +2327,7 @@ bswap_replace (gimple cur_stmt, gimple_stmt_iterator gsi, gimple src_stmt,
       gsi_insert_after (&gsi, convert_stmt, GSI_SAME_STMT);
     }
 
-  gimple_call_set_lhs (call, tmp);
+  gimple_set_lhs (bswap_stmt, tmp);
 
   if (dump_file)
     {
@@ -2324,7 +2336,7 @@ bswap_replace (gimple cur_stmt, gimple_stmt_iterator gsi, gimple src_stmt,
       print_gimple_stmt (dump_file, cur_stmt, 0, 0);
     }
 
-  gsi_insert_after (&gsi, call, GSI_SAME_STMT);
+  gsi_insert_after (&gsi, bswap_stmt, GSI_SAME_STMT);
   gsi_remove (&gsi, true);
   return true;
 }
@@ -2388,13 +2400,29 @@ pass_optimize_bswap::execute (function *fun)
         {
 	  gimple src_stmt, cur_stmt = gsi_stmt (gsi);
 	  tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
+	  enum tree_code code;
 	  struct symbolic_number n;
 	  bool bswap;
 
-	  if (!is_gimple_assign (cur_stmt)
-	      || gimple_assign_rhs_code (cur_stmt) != BIT_IOR_EXPR)
+	  if (!is_gimple_assign (cur_stmt))
 	    continue;
 
+	  code = gimple_assign_rhs_code (cur_stmt);
+	  switch (code)
+	    {
+	    case LROTATE_EXPR:
+	    case RROTATE_EXPR:
+	      if (!tree_fits_uhwi_p (gimple_assign_rhs2 (cur_stmt))
+		  || tree_to_uhwi (gimple_assign_rhs2 (cur_stmt))
+		     % BITS_PER_UNIT)
+		continue;
+	      /* Fall through.  */
+	    case BIT_IOR_EXPR:
+	      break;
+	    default:
+	      continue;
+	    }
+
 	  src_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
 
 	  if (!src_stmt)

Ok for trunk?

> 
> From the performance side the pass could be re-structured to populate
> a lattice, thus work from def to use instead of the other way around.
> Which
> means we visit each stmt exactly once, compute its value symbolically
> and check it against a rotate.

That sounds nice but is an heavy change. IMHO this should be done only if it
can be shown that bswap can have significan impact on compilation time. In
any case, with the approaching end of stage1 this would be material for next
stage1.

Best regards,

Thomas



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] Fix PR63259: bswap not recognized when finishing with rotation
  2014-10-29 13:34           ` Thomas Preud'homme
@ 2014-10-31 10:46             ` Richard Biener
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Biener @ 2014-10-31 10:46 UTC (permalink / raw)
  To: Thomas Preud'homme; +Cc: Jakub Jelinek, GCC Patches

On Wed, Oct 29, 2014 at 2:30 PM, Thomas Preud'homme
<thomas.preudhomme@arm.com> wrote:
>> From: Richard Biener [mailto:richard.guenther@gmail.com]
>> Sent: Wednesday, October 08, 2014 8:27 AM
>>
>> I wouldn't worry about that too much.  Indeed the question would be
>> what
>> should be canonical on GIMPLE (expanders should choose the optimal
>> vairant from both).  I think a tree code should be always prefered to a
>> builtin function call - which means a rotate is more canonical than a
>> bswap16 call.
>
> Below is the updated patch. ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2014-10-29  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>         PR tree-optimization/63259
>         * tree-ssa-math-opts.c (bswap_replace): Replace expression by a
>         rotation left if it is a 16 bit byte swap.
>         (pass_optimize_bswap::execute): Also consider bswap in LROTATE_EXPR and
>         RROTATE_EXPR statements if it is a byte rotation.
>
>
> *** gcc/testsuite/ChangeLog ***
>
> 2014-10-29  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>         PR tree-optimization/63259
>         * optimize-bswapsi-1.c (swap32_f): New bswap pass test.
>         * optimize-bswaphi-1.c: Drop useless SIType definition and fix typo in
>         following comment.
>
>
> diff --git a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
> index 18aba28..692fceb 100644
> --- a/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
> +++ b/gcc/testsuite/gcc.dg/optimize-bswaphi-1.c
> @@ -42,11 +42,10 @@ uint32_t read_be16_3 (unsigned char *data)
>    return *(data + 1) | (*data << 8);
>  }
>
> -typedef int SItype __attribute__ ((mode (SI)));
>  typedef int HItype __attribute__ ((mode (HI)));
>
>  /* Test that detection of significant sign extension works correctly. This
> -   checks that unknown byte marker are set correctly in cast of cast.  */
> +   checks that unknown byte markers are set correctly in cast of cast.  */
>
>  HItype
>  swap16 (HItype in)
> diff --git a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
> index cfde218..ad3ede4 100644
> --- a/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
> +++ b/gcc/testsuite/gcc.dg/optimize-bswapsi-1.c
> @@ -78,5 +78,16 @@ swap32_e (SItype in)
>          | (((in >> 24) & 0xFF) << 0);
>  }
>
> -/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 5 "bswap" } } */
> +/* This variant comes from PR63259.  It compiles to a gimple sequence that ends
> +   with a rotation instead of a bitwise OR.  */
> +
> +unsigned
> +swap32_f (unsigned in)
> +{
> +  in = ((in & 0xff00ff00) >>  8) | ((in & 0x00ff00ff) <<  8);
> +  in = ((in & 0xffff0000) >> 16) | ((in & 0x0000ffff) << 16);
> +  return in;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 6 "bswap" } } */
>  /* { dg-final { cleanup-tree-dump "bswap" } } */
> diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
> index e0f2924..5b656e0 100644
> --- a/gcc/tree-ssa-math-opts.c
> +++ b/gcc/tree-ssa-math-opts.c
> @@ -2187,7 +2187,7 @@ bswap_replace (gimple cur_stmt, gimple_stmt_iterator gsi, gimple src_stmt,
>                struct symbolic_number *n, bool bswap)
>  {
>    tree src, tmp, tgt;
> -  gimple call;
> +  gimple bswap_stmt;
>
>    src = gimple_assign_rhs1 (src_stmt);
>    tgt = gimple_assign_lhs (cur_stmt);
> @@ -2293,16 +2293,28 @@ bswap_replace (gimple cur_stmt, gimple_stmt_iterator gsi, gimple src_stmt,
>
>    tmp = src;
>
> -  /* Convert the src expression if necessary.  */
> -  if (!useless_type_conversion_p (TREE_TYPE (tmp), bswap_type))
> +  /* Canonical form for 16 bit bswap is a rotate expression.  */
> +  if (bswap && n->range == 16)
>      {
> -      gimple convert_stmt;
> -      tmp = make_temp_ssa_name (bswap_type, NULL, "bswapsrc");
> -      convert_stmt = gimple_build_assign_with_ops (NOP_EXPR, tmp, src, NULL);
> -      gsi_insert_before (&gsi, convert_stmt, GSI_SAME_STMT);
> +      tree count = build_int_cst (NULL, BITS_PER_UNIT);
> +      bswap_type = TREE_TYPE (src);
> +      src = fold_build2 (LROTATE_EXPR, bswap_type, src, count);
> +      bswap_stmt = gimple_build_assign (NULL, src);
>      }
> +  else
> +    {
> +      /* Convert the src expression if necessary.  */
> +      if (!useless_type_conversion_p (TREE_TYPE (tmp), bswap_type))
> +       {
> +         gimple convert_stmt;
> +         tmp = make_temp_ssa_name (bswap_type, NULL, "bswapsrc");
> +         convert_stmt = gimple_build_assign_with_ops (NOP_EXPR, tmp, src,
> +                                                      NULL);
> +         gsi_insert_before (&gsi, convert_stmt, GSI_SAME_STMT);
> +       }
>
> -  call = gimple_build_call (fndecl, 1, tmp);
> +      bswap_stmt = gimple_build_call (fndecl, 1, tmp);
> +    }
>
>    tmp = tgt;
>
> @@ -2315,7 +2327,7 @@ bswap_replace (gimple cur_stmt, gimple_stmt_iterator gsi, gimple src_stmt,
>        gsi_insert_after (&gsi, convert_stmt, GSI_SAME_STMT);
>      }
>
> -  gimple_call_set_lhs (call, tmp);
> +  gimple_set_lhs (bswap_stmt, tmp);
>
>    if (dump_file)
>      {
> @@ -2324,7 +2336,7 @@ bswap_replace (gimple cur_stmt, gimple_stmt_iterator gsi, gimple src_stmt,
>        print_gimple_stmt (dump_file, cur_stmt, 0, 0);
>      }
>
> -  gsi_insert_after (&gsi, call, GSI_SAME_STMT);
> +  gsi_insert_after (&gsi, bswap_stmt, GSI_SAME_STMT);
>    gsi_remove (&gsi, true);
>    return true;
>  }
> @@ -2388,13 +2400,29 @@ pass_optimize_bswap::execute (function *fun)
>          {
>           gimple src_stmt, cur_stmt = gsi_stmt (gsi);
>           tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
> +         enum tree_code code;
>           struct symbolic_number n;
>           bool bswap;
>
> -         if (!is_gimple_assign (cur_stmt)
> -             || gimple_assign_rhs_code (cur_stmt) != BIT_IOR_EXPR)
> +         if (!is_gimple_assign (cur_stmt))
>             continue;
>
> +         code = gimple_assign_rhs_code (cur_stmt);
> +         switch (code)
> +           {
> +           case LROTATE_EXPR:
> +           case RROTATE_EXPR:
> +             if (!tree_fits_uhwi_p (gimple_assign_rhs2 (cur_stmt))
> +                 || tree_to_uhwi (gimple_assign_rhs2 (cur_stmt))
> +                    % BITS_PER_UNIT)
> +               continue;
> +             /* Fall through.  */
> +           case BIT_IOR_EXPR:
> +             break;
> +           default:
> +             continue;
> +           }
> +
>           src_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
>
>           if (!src_stmt)
>
> Ok for trunk?

Ok.

>>
>> From the performance side the pass could be re-structured to populate
>> a lattice, thus work from def to use instead of the other way around.
>> Which
>> means we visit each stmt exactly once, compute its value symbolically
>> and check it against a rotate.
>
> That sounds nice but is an heavy change. IMHO this should be done only if it
> can be shown that bswap can have significan impact on compilation time. In
> any case, with the approaching end of stage1 this would be material for next
> stage1.

Yes, of course.

Thanks,
Richard.

> Best regards,
>
> Thomas
>
>
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-10-31 10:42 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-08  1:57 [PATCH] Fix PR63259: bswap not recognized when finishing with rotation Thomas Preud'homme
2014-10-08  6:38 ` Jakub Jelinek
2014-10-08  6:43   ` Thomas Preud'homme
2014-10-08  6:57     ` Thomas Preud'homme
2014-10-08  7:04       ` Jakub Jelinek
2014-10-08  7:27         ` Richard Biener
2014-10-29 13:34           ` Thomas Preud'homme
2014-10-31 10:46             ` Richard Biener
2014-10-08  6:58     ` Andrew Pinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).