public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] bswap: Recognize (int) __builtin_bswap64 (arg) idioms or __builtin_bswap?? (arg) & mask [PR86723]
@ 2021-08-23  8:46 Jakub Jelinek
  2021-08-23  9:50 ` Richard Biener
  0 siblings, 1 reply; 2+ messages in thread
From: Jakub Jelinek @ 2021-08-23  8:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Hi!

The following patch recognizes in the bswap pass (only there for now,
haven't done it for store merging pass yet) code sequences that can
be handled by (int32) __builtin_bswap64 (arg), i.e. where we have
0x05060708 n->n with 64-bit non-memory argument (if it is memory, we
can just load the 32-bit at 4 bytes into the address and n->n would
be 0x01020304; and only 64 -> 32 bit, because 64 -> 16 bit or 32 -> 16 bit
would mean only two bytes in the result and probably not worth it),
and furthermore the case where we have in the 0x0102030405060708 etc.
numbers some bytes 0 (i.e. known to contain zeros rather than source bytes),
as long as we have at least two original bytes in the right
positions (and no unknown bytes).  This can be handled by
__builtin_bswap64 (arg) & 0xff0000ffffff00ffULL etc.
The latter change is the reason why counting the bswap messages doesn't work
too well in optimize-bswap* tests anymore, while the pass iterates from end
of basic block towards start, it will often match both the bswap at the end
and some of the earlier bswaps with some masks (not a problem generally,
we'll just DCE it away whenever possible).  The pass right now doesn't
handle __builtin_bswap* calls in the pattern matching (which is the reason
why it operates backwards), but it uses FOR_EACH_BB_FN (bb, fun) order
of handling blocks and matched sequences can span multiple blocks, so I was
worried about cases like:
void bar (unsigned long long);
unsigned long long
foo (unsigned long long value, int x)
{
  unsigned long long tmp = (((value & 0x00000000000000ffull) << 56)
          | ((value & 0x000000000000ff00ull) << 40)
          | ((value & 0x00000000ff000000ull) << 8));
  if (x)
    bar (tmp);
  return (tmp 
          | ((value & 0x000000ff00000000ull) >> 8)
          | ((value & 0x0000ff0000000000ull) >> 24)
          | ((value & 0x0000000000ff0000ull) << 24)
          | ((value & 0x00ff000000000000ull) >> 40)
          | ((value & 0xff00000000000000ull) >> 56));
}
but it seems we handle even that fine, while bb2 ending in GIMPLE_COND
is processed first, we recognize there a __builtin_bswap64 (value) & mask1,
in the last bb we recognize tmp | (__builtin_bswap64 (value) & mask2) and
PRE optimizes that into t = __builtin_bswap64 (value); tmp = t & mask1;
in the first bb and return t; in the last one.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-08-23  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/86723
	* gimple-ssa-store-merging.c (find_bswap_or_nop_finalize): Add
	cast64_to_32 argument, set *cast64_to_32 to false, unless n is
	non-memory permutation of 64-bit src which only has bytes of
	0 or [5..8] and n->range is 4.
	(find_bswap_or_nop): Add cast64_to_32 and mask arguments, adjust
	find_bswap_or_nop_finalize caller, support bswap with some bytes
	zeroed, as long as at least two bytes are not zeroed.
	(bswap_replace): Add mask argument and handle masking of bswap
	result.
	(maybe_optimize_vector_constructor): Adjust find_bswap_or_nop
	caller, punt if cast64_to_32 or mask is not all ones.
	(pass_optimize_bswap::execute): Adjust find_bswap_or_nop_finalize
	caller, for now punt if cast64_to_32.

	* gcc.dg/pr86723.c: New test.
	* gcc.target/i386/pr86723.c: New test.
	* gcc.dg/optimize-bswapdi-1.c: Use -fdump-tree-optimized instead of
	-fdump-tree-bswap and scan for number of __builtin_bswap64 calls.
	* gcc.dg/optimize-bswapdi-2.c: Likewise.
	* gcc.dg/optimize-bswapsi-1.c: Use -fdump-tree-optimized instead of
	-fdump-tree-bswap and scan for number of __builtin_bswap32 calls.
	* gcc.dg/optimize-bswapdi-5.c: Likewise.
	* gcc.dg/optimize-bswapdi-3.c: Likewise.  Expect one __builtin_bswap32
	call instead of zero.

--- gcc/gimple-ssa-store-merging.c.jj	2021-07-21 09:38:12.114463148 +0200
+++ gcc/gimple-ssa-store-merging.c	2021-08-20 16:34:03.372081938 +0200
@@ -792,7 +792,7 @@ find_bswap_or_nop_1 (gimple *stmt, struc
 
 void
 find_bswap_or_nop_finalize (struct symbolic_number *n, uint64_t *cmpxchg,
-			    uint64_t *cmpnop)
+			    uint64_t *cmpnop, bool *cast64_to_32)
 {
   unsigned rsize;
   uint64_t tmpn, mask;
@@ -802,6 +802,7 @@ find_bswap_or_nop_finalize (struct symbo
      according to the size of the symbolic number before using it.  */
   *cmpxchg = CMPXCHG;
   *cmpnop = CMPNOP;
+  *cast64_to_32 = false;
 
   /* Find real size of result (highest non-zero byte).  */
   if (n->base_addr)
@@ -814,7 +815,27 @@ find_bswap_or_nop_finalize (struct symbo
   if (n->range < (int) sizeof (int64_t))
     {
       mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1;
-      *cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
+      if (n->base_addr == NULL
+	  && n->range == 4
+	  && int_size_in_bytes (TREE_TYPE (n->src)) == 8)
+	{
+	  /* If all bytes in n->n are either 0 or in [5..8] range, this
+	     might be a candidate for (unsigned) __builtin_bswap64 (src).
+	     It is not worth it for (unsigned short) __builtin_bswap64 (src)
+	     or (unsigned short) __builtin_bswap32 (src).  */
+	  *cast64_to_32 = true;
+	  for (tmpn = n->n; tmpn; tmpn >>= BITS_PER_MARKER)
+	    if ((tmpn & MARKER_MASK)
+		&& ((tmpn & MARKER_MASK) <= 4 || (tmpn & MARKER_MASK) > 8))
+	      {
+		*cast64_to_32 = false;
+		break;
+	      }
+	}
+      if (*cast64_to_32)
+	*cmpxchg &= mask;
+      else
+	*cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
       *cmpnop &= mask;
     }
 
@@ -837,6 +858,8 @@ find_bswap_or_nop_finalize (struct symbo
       n->range = rsize;
     }
 
+  if (*cast64_to_32)
+    n->range = 8;
   n->range *= BITS_PER_UNIT;
 }
 
@@ -849,7 +872,8 @@ find_bswap_or_nop_finalize (struct symbo
    expression.  */
 
 gimple *
-find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
+find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap,
+		   bool *cast64_to_32, uint64_t *mask)
 {
   tree type_size = TYPE_SIZE_UNIT (TREE_TYPE (gimple_get_lhs (stmt)));
   if (!tree_fits_uhwi_p (type_size))
@@ -929,17 +953,30 @@ find_bswap_or_nop (gimple *stmt, struct
     }
 
   uint64_t cmpxchg, cmpnop;
-  find_bswap_or_nop_finalize (n, &cmpxchg, &cmpnop);
+  find_bswap_or_nop_finalize (n, &cmpxchg, &cmpnop, cast64_to_32);
 
   /* A complete byte swap should make the symbolic number to start with
      the largest digit in the highest order byte. Unchanged symbolic
      number indicates a read with same endianness as target architecture.  */
+  *mask = ~(uint64_t) 0;
   if (n->n == cmpnop)
     *bswap = false;
   else if (n->n == cmpxchg)
     *bswap = true;
   else
-    return NULL;
+    {
+      int set = 0;
+      for (uint64_t msk = MARKER_MASK; msk; msk <<= BITS_PER_MARKER)
+	if ((n->n & msk) == 0)
+	  *mask &= ~msk;
+	else if ((n->n & msk) == (cmpxchg & msk))
+	  set++;
+	else
+	  return NULL;
+      if (set < 2)
+	return NULL;
+      *bswap = true;
+    }
 
   /* Useless bit manipulation performed by code.  */
   if (!n->base_addr && n->n == cmpnop && n->n_ops == 1)
@@ -1024,10 +1061,10 @@ bswap_view_convert (gimple_stmt_iterator
 tree
 bswap_replace (gimple_stmt_iterator gsi, gimple *ins_stmt, tree fndecl,
 	       tree bswap_type, tree load_type, struct symbolic_number *n,
-	       bool bswap)
+	       bool bswap, uint64_t mask)
 {
   tree src, tmp, tgt = NULL_TREE;
-  gimple *bswap_stmt;
+  gimple *bswap_stmt, *mask_stmt = NULL;
   tree_code conv_code = NOP_EXPR;
 
   gimple *cur_stmt = gsi_stmt (gsi);
@@ -1247,6 +1284,15 @@ bswap_replace (gimple_stmt_iterator gsi,
     tgt = make_ssa_name (bswap_type);
   tmp = tgt;
 
+  if (mask != ~(uint64_t) 0)
+    {
+      tree m = build_int_cst (bswap_type, mask);
+      tmp = make_temp_ssa_name (bswap_type, NULL, "bswapdst");
+      gimple_set_lhs (bswap_stmt, tmp);
+      mask_stmt = gimple_build_assign (tgt, BIT_AND_EXPR, tmp, m);
+      tmp = tgt;
+    }
+
   /* Convert the result if necessary.  */
   if (!useless_type_conversion_p (TREE_TYPE (tgt), bswap_type))
     {
@@ -1260,7 +1306,7 @@ bswap_replace (gimple_stmt_iterator gsi,
       gsi_insert_after (&gsi, convert_stmt, GSI_SAME_STMT);
     }
 
-  gimple_set_lhs (bswap_stmt, tmp);
+  gimple_set_lhs (mask_stmt ? mask_stmt : bswap_stmt, tmp);
 
   if (dump_file)
     {
@@ -1277,11 +1323,17 @@ bswap_replace (gimple_stmt_iterator gsi,
 
   if (cur_stmt)
     {
+      if (mask_stmt)
+	gsi_insert_after (&gsi, mask_stmt, GSI_SAME_STMT);
       gsi_insert_after (&gsi, bswap_stmt, GSI_SAME_STMT);
       gsi_remove (&gsi, true);
     }
   else
-    gsi_insert_before (&gsi, bswap_stmt, GSI_SAME_STMT);
+    {
+      gsi_insert_before (&gsi, bswap_stmt, GSI_SAME_STMT);
+      if (mask_stmt)
+	gsi_insert_before (&gsi, mask_stmt, GSI_SAME_STMT);
+    }
   return tgt;
 }
 
@@ -1341,8 +1393,14 @@ maybe_optimize_vector_constructor (gimpl
       return false;
     }
 
-  gimple *ins_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
-  if (!ins_stmt || n.range != (unsigned HOST_WIDE_INT) sz)
+  bool cast64_to_32;
+  uint64_t mask;
+  gimple *ins_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap,
+					&cast64_to_32, &mask);
+  if (!ins_stmt
+      || n.range != (unsigned HOST_WIDE_INT) sz
+      || cast64_to_32
+      || mask != ~(uint64_t) 0)
     return false;
 
   if (bswap && !fndecl && n.range != 16)
@@ -1351,7 +1409,7 @@ maybe_optimize_vector_constructor (gimpl
   memset (&nop_stats, 0, sizeof (nop_stats));
   memset (&bswap_stats, 0, sizeof (bswap_stats));
   return bswap_replace (gsi_for_stmt (cur_stmt), ins_stmt, fndecl,
-			bswap_type, load_type, &n, bswap) != NULL_TREE;
+			bswap_type, load_type, &n, bswap, mask) != NULL_TREE;
 }
 
 /* Find manual byte swap implementations as well as load in a given
@@ -1405,7 +1463,8 @@ pass_optimize_bswap::execute (function *
 	  tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
 	  enum tree_code code;
 	  struct symbolic_number n;
-	  bool bswap;
+	  bool bswap, cast64_to_32;
+	  uint64_t mask;
 
 	  /* This gsi_prev (&gsi) is not part of the for loop because cur_stmt
 	     might be moved to a different basic block by bswap_replace and gsi
@@ -1442,7 +1501,8 @@ pass_optimize_bswap::execute (function *
 	      continue;
 	    }
 
-	  ins_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
+	  ins_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap,
+					&cast64_to_32, &mask);
 
 	  if (!ins_stmt)
 	    continue;
@@ -1479,7 +1539,7 @@ pass_optimize_bswap::execute (function *
 	    continue;
 
 	  if (bswap_replace (gsi_for_stmt (cur_stmt), ins_stmt, fndecl,
-			     bswap_type, load_type, &n, bswap))
+			     bswap_type, load_type, &n, bswap, mask))
 	    changed = true;
 	}
     }
@@ -2820,7 +2880,8 @@ imm_store_chain_info::try_coalesce_bswap
     }
 
   uint64_t cmpxchg, cmpnop;
-  find_bswap_or_nop_finalize (&n, &cmpxchg, &cmpnop);
+  bool cast64_to_32;
+  find_bswap_or_nop_finalize (&n, &cmpxchg, &cmpnop, &cast64_to_32);
 
   /* A complete byte swap should make the symbolic number to start with
      the largest digit in the highest order byte.  Unchanged symbolic
@@ -2828,6 +2889,10 @@ imm_store_chain_info::try_coalesce_bswap
   if (n.n != cmpnop && n.n != cmpxchg)
     return false;
 
+  /* For now.  */
+  if (cast64_to_32)
+    return false;
+
   if (n.base_addr == NULL_TREE && !is_gimple_val (n.src))
     return false;
 
@@ -4161,7 +4226,8 @@ imm_store_chain_info::output_merged_stor
 	    n->vuse = gimple_vuse (ins_stmt);
 	}
       bswap_res = bswap_replace (gsi_start (seq), ins_stmt, fndecl,
-				 bswap_type, load_type, n, bswap);
+				 bswap_type, load_type, n, bswap,
+				 ~(uint64_t) 0);
       gcc_assert (bswap_res);
     }
 
--- gcc/testsuite/gcc.dg/pr86723.c.jj	2021-08-20 17:29:49.079153045 +0200
+++ gcc/testsuite/gcc.dg/pr86723.c	2021-08-20 17:30:44.027398231 +0200
@@ -0,0 +1,63 @@
+/* PR tree-optimization/86723 */
+/* { dg-do run { target { ilp32 || lp64 } } } */
+/* { dg-options "-O2" } */
+
+__attribute__((noipa)) int
+foo (unsigned long long value)
+{
+  return (((value & 0x00000000000000ffull) << 56)
+	  | ((value & 0x000000000000ff00ull) << 40)
+	  | ((value & 0x0000000000ff0000ull) << 24)
+	  | ((value & 0x00000000ff000000ull) << 8)
+	  | ((value & 0x000000ff00000000ull) >> 8)
+	  | ((value & 0x0000ff0000000000ull) >> 24)
+	  | ((value & 0x00ff000000000000ull) >> 40)
+	  | ((value & 0xff00000000000000ull) >> 56));
+}
+
+__attribute__((noipa)) int
+bar (unsigned long long value)
+{
+  return (((value & 0x000000ff00000000ull) >> 8)
+	  | ((value & 0x0000ff0000000000ull) >> 24)
+	  | ((value & 0x00ff000000000000ull) >> 40)
+	  | ((value & 0xff00000000000000ull) >> 56));
+}
+
+__attribute__((noipa)) unsigned long long
+baz (unsigned long long value)
+{
+  return (((value & 0x00000000000000ffull) << 56)
+	  | ((value & 0x000000000000ff00ull) << 40)
+	  | ((value & 0x00000000ff000000ull) << 8)
+	  | ((value & 0x000000ff00000000ull) >> 8)
+	  | ((value & 0x0000ff0000000000ull) >> 24)
+	  | ((value & 0xff00000000000000ull) >> 56));
+}
+
+__attribute__((noipa)) unsigned int
+qux (unsigned int value)
+{
+  return (((value & 0x000000ff) << 24)
+	  | ((value & 0x00ff0000) >> 8)
+	  | ((value & 0xff000000) >> 24));
+}
+
+__attribute__((noipa)) unsigned int
+corge (unsigned int value)
+{
+  return (((value & 0x000000ff) << 24)
+	  | ((value & 0xff000000) >> 24));
+}
+
+int
+main ()
+{
+  if (foo (0x0102030405060708ull) != 0x04030201
+      || bar (0x0102030405060708ull) != 0x04030201
+      || baz (0x0102030405060708ull) != 0x0807000504030001ull
+      || qux (0x01020304) != 0x04000201
+      || corge (0x01020304) != 0x04000001)
+    __builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/gcc.target/i386/pr86723.c.jj	2021-08-20 17:20:01.090247097 +0200
+++ gcc/testsuite/gcc.target/i386/pr86723.c	2021-08-20 17:19:09.421958400 +0200
@@ -0,0 +1,52 @@
+/* PR tree-optimization/86723 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "\tbswap\t" 5 } } */
+
+int
+foo (unsigned long long value)
+{
+  return (((value & 0x00000000000000ffull) << 56)
+	  | ((value & 0x000000000000ff00ull) << 40)
+	  | ((value & 0x0000000000ff0000ull) << 24)
+	  | ((value & 0x00000000ff000000ull) << 8)
+	  | ((value & 0x000000ff00000000ull) >> 8)
+	  | ((value & 0x0000ff0000000000ull) >> 24)
+	  | ((value & 0x00ff000000000000ull) >> 40)
+	  | ((value & 0xff00000000000000ull) >> 56));
+}
+
+int
+bar (unsigned long long value)
+{
+  return (((value & 0x000000ff00000000ull) >> 8)
+	  | ((value & 0x0000ff0000000000ull) >> 24)
+	  | ((value & 0x00ff000000000000ull) >> 40)
+	  | ((value & 0xff00000000000000ull) >> 56));
+}
+
+unsigned long long
+baz (unsigned long long value)
+{
+  return (((value & 0x00000000000000ffull) << 56)
+	  | ((value & 0x000000000000ff00ull) << 40)
+	  | ((value & 0x00000000ff000000ull) << 8)
+	  | ((value & 0x000000ff00000000ull) >> 8)
+	  | ((value & 0x0000ff0000000000ull) >> 24)
+	  | ((value & 0xff00000000000000ull) >> 56));
+}
+
+unsigned int
+qux (unsigned int value)
+{
+  return (((value & 0x000000ff) << 24)
+	  | ((value & 0x00ff0000) >> 8)
+	  | ((value & 0xff000000) >> 24));
+}
+
+unsigned int
+corge (unsigned int value)
+{
+  return (((value & 0x000000ff) << 24)
+	  | ((value & 0xff000000) >> 24));
+}
--- gcc/testsuite/gcc.dg/optimize-bswapdi-1.c.jj	2020-01-14 20:02:47.333601595 +0100
+++ gcc/testsuite/gcc.dg/optimize-bswapdi-1.c	2021-08-22 16:37:06.662260888 +0200
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target bswap } */
 /* { dg-require-effective-target stdint_types } */
-/* { dg-options "-O2 -fdump-tree-bswap" } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
 /* { dg-additional-options "-mzarch" { target s390*-*-* } } */
 
 #include <stdint.h>
@@ -58,4 +58,4 @@ swap64_c (uint64_t x)
 }
 
 
-/* { dg-final { scan-tree-dump-times "64 bit bswap implementation found at" 3 "bswap" } } */
+/* { dg-final { scan-tree-dump-times "= __builtin_bswap64 \\\(" 3 "optimized" } } */
--- gcc/testsuite/gcc.dg/optimize-bswapdi-2.c.jj	2020-01-14 20:02:47.333601595 +0100
+++ gcc/testsuite/gcc.dg/optimize-bswapdi-2.c	2021-08-22 16:37:32.138910470 +0200
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target bswap } */
 /* { dg-require-effective-target stdint_types } */
-/* { dg-options "-O2 -fdump-tree-bswap" } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
 /* { dg-additional-options "-mzarch" { target s390*-*-* } } */
 
 #include <stdint.h>
@@ -23,4 +23,4 @@ swap64_c (uint64_t x)
 }
 
 
-/* { dg-final { scan-tree-dump-times "64 bit bswap implementation found at" 1 "bswap" } } */
+/* { dg-final { scan-tree-dump-times "= __builtin_bswap64 \\\(" 1 "optimized" } } */
--- gcc/testsuite/gcc.dg/optimize-bswapsi-1.c.jj	2020-01-14 20:02:47.333601595 +0100
+++ gcc/testsuite/gcc.dg/optimize-bswapsi-1.c	2021-08-22 16:38:07.058430169 +0200
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target bswap } */
 /* { dg-require-effective-target stdint_types } */
-/* { dg-options "-O2 -fdump-tree-bswap" } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
 /* { dg-additional-options "-march=z900" { target s390*-*-* } } */
 
 #include <stdint.h>
@@ -89,4 +89,4 @@ swap32_f (unsigned in)
   return in;
 }
 
-/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 6 "bswap" } } */
+/* { dg-final { scan-tree-dump-times "= __builtin_bswap32 \\\(" 6 "optimized" } } */
--- gcc/testsuite/gcc.dg/optimize-bswapsi-3.c.jj	2020-01-14 20:02:47.333601595 +0100
+++ gcc/testsuite/gcc.dg/optimize-bswapsi-3.c	2021-08-22 16:41:11.726890162 +0200
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target bswap } */
 /* { dg-require-effective-target stdint_types } */
-/* { dg-options "-O2 -fdump-tree-bswap" } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
 /* { dg-additional-options "-march=z900" { target s390-*-* } } */
 
 typedef int SItype __attribute__ ((mode (SI)));
@@ -20,4 +20,4 @@ swap32 (SItype in)
 	 | (((in >> 24) & 0xFF) << 0);
 }
 
-/* { dg-final { scan-tree-dump-not "32 bit bswap implementation found at" "bswap" } } */
+/* { dg-final { scan-tree-dump-times "= __builtin_bswap32 \\\(" 1 "optimized" } } */
--- gcc/testsuite/gcc.dg/optimize-bswapsi-5.c.jj	2020-01-14 20:02:47.333601595 +0100
+++ gcc/testsuite/gcc.dg/optimize-bswapsi-5.c	2021-08-22 16:41:43.635451282 +0200
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target bswap } */
-/* { dg-options "-O2 -fdump-tree-bswap -fno-inline-functions" } */
+/* { dg-options "-O2 -fdump-tree-optimized -fno-inline-functions" } */
 /* { dg-additional-options "-march=z900" { target s390-*-* } } */
 
 struct L { unsigned int l[2]; };
@@ -28,4 +28,4 @@ bar (double a, struct L *p)
   foo (a, p);
 }
 
-/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 2 "bswap" } } */
+/* { dg-final { scan-tree-dump-times "= __builtin_bswap32 \\\(" 2 "optimized" } } */

	Jakub


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] bswap: Recognize (int) __builtin_bswap64 (arg) idioms or __builtin_bswap?? (arg) & mask [PR86723]
  2021-08-23  8:46 [PATCH] bswap: Recognize (int) __builtin_bswap64 (arg) idioms or __builtin_bswap?? (arg) & mask [PR86723] Jakub Jelinek
@ 2021-08-23  9:50 ` Richard Biener
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Biener @ 2021-08-23  9:50 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

On Mon, 23 Aug 2021, Jakub Jelinek wrote:

> Hi!
> 
> The following patch recognizes in the bswap pass (only there for now,
> haven't done it for store merging pass yet) code sequences that can
> be handled by (int32) __builtin_bswap64 (arg), i.e. where we have
> 0x05060708 n->n with 64-bit non-memory argument (if it is memory, we
> can just load the 32-bit at 4 bytes into the address and n->n would
> be 0x01020304; and only 64 -> 32 bit, because 64 -> 16 bit or 32 -> 16 bit
> would mean only two bytes in the result and probably not worth it),
> and furthermore the case where we have in the 0x0102030405060708 etc.
> numbers some bytes 0 (i.e. known to contain zeros rather than source bytes),
> as long as we have at least two original bytes in the right
> positions (and no unknown bytes).  This can be handled by
> __builtin_bswap64 (arg) & 0xff0000ffffff00ffULL etc.
> The latter change is the reason why counting the bswap messages doesn't work
> too well in optimize-bswap* tests anymore, while the pass iterates from end
> of basic block towards start, it will often match both the bswap at the end
> and some of the earlier bswaps with some masks (not a problem generally,
> we'll just DCE it away whenever possible).  The pass right now doesn't
> handle __builtin_bswap* calls in the pattern matching (which is the reason
> why it operates backwards), but it uses FOR_EACH_BB_FN (bb, fun) order
> of handling blocks and matched sequences can span multiple blocks, so I was
> worried about cases like:
> void bar (unsigned long long);
> unsigned long long
> foo (unsigned long long value, int x)
> {
>   unsigned long long tmp = (((value & 0x00000000000000ffull) << 56)
>           | ((value & 0x000000000000ff00ull) << 40)
>           | ((value & 0x00000000ff000000ull) << 8));
>   if (x)
>     bar (tmp);
>   return (tmp 
>           | ((value & 0x000000ff00000000ull) >> 8)
>           | ((value & 0x0000ff0000000000ull) >> 24)
>           | ((value & 0x0000000000ff0000ull) << 24)
>           | ((value & 0x00ff000000000000ull) >> 40)
>           | ((value & 0xff00000000000000ull) >> 56));
> }
> but it seems we handle even that fine, while bb2 ending in GIMPLE_COND
> is processed first, we recognize there a __builtin_bswap64 (value) & mask1,
> in the last bb we recognize tmp | (__builtin_bswap64 (value) & mask2) and
> PRE optimizes that into t = __builtin_bswap64 (value); tmp = t & mask1;
> in the first bb and return t; in the last one.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2021-08-23  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/86723
> 	* gimple-ssa-store-merging.c (find_bswap_or_nop_finalize): Add
> 	cast64_to_32 argument, set *cast64_to_32 to false, unless n is
> 	non-memory permutation of 64-bit src which only has bytes of
> 	0 or [5..8] and n->range is 4.
> 	(find_bswap_or_nop): Add cast64_to_32 and mask arguments, adjust
> 	find_bswap_or_nop_finalize caller, support bswap with some bytes
> 	zeroed, as long as at least two bytes are not zeroed.
> 	(bswap_replace): Add mask argument and handle masking of bswap
> 	result.
> 	(maybe_optimize_vector_constructor): Adjust find_bswap_or_nop
> 	caller, punt if cast64_to_32 or mask is not all ones.
> 	(pass_optimize_bswap::execute): Adjust find_bswap_or_nop_finalize
> 	caller, for now punt if cast64_to_32.
> 
> 	* gcc.dg/pr86723.c: New test.
> 	* gcc.target/i386/pr86723.c: New test.
> 	* gcc.dg/optimize-bswapdi-1.c: Use -fdump-tree-optimized instead of
> 	-fdump-tree-bswap and scan for number of __builtin_bswap64 calls.
> 	* gcc.dg/optimize-bswapdi-2.c: Likewise.
> 	* gcc.dg/optimize-bswapsi-1.c: Use -fdump-tree-optimized instead of
> 	-fdump-tree-bswap and scan for number of __builtin_bswap32 calls.
> 	* gcc.dg/optimize-bswapdi-5.c: Likewise.
> 	* gcc.dg/optimize-bswapdi-3.c: Likewise.  Expect one __builtin_bswap32
> 	call instead of zero.
> 
> --- gcc/gimple-ssa-store-merging.c.jj	2021-07-21 09:38:12.114463148 +0200
> +++ gcc/gimple-ssa-store-merging.c	2021-08-20 16:34:03.372081938 +0200
> @@ -792,7 +792,7 @@ find_bswap_or_nop_1 (gimple *stmt, struc
>  
>  void
>  find_bswap_or_nop_finalize (struct symbolic_number *n, uint64_t *cmpxchg,
> -			    uint64_t *cmpnop)
> +			    uint64_t *cmpnop, bool *cast64_to_32)
>  {
>    unsigned rsize;
>    uint64_t tmpn, mask;
> @@ -802,6 +802,7 @@ find_bswap_or_nop_finalize (struct symbo
>       according to the size of the symbolic number before using it.  */
>    *cmpxchg = CMPXCHG;
>    *cmpnop = CMPNOP;
> +  *cast64_to_32 = false;
>  
>    /* Find real size of result (highest non-zero byte).  */
>    if (n->base_addr)
> @@ -814,7 +815,27 @@ find_bswap_or_nop_finalize (struct symbo
>    if (n->range < (int) sizeof (int64_t))
>      {
>        mask = ((uint64_t) 1 << (n->range * BITS_PER_MARKER)) - 1;
> -      *cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
> +      if (n->base_addr == NULL
> +	  && n->range == 4
> +	  && int_size_in_bytes (TREE_TYPE (n->src)) == 8)
> +	{
> +	  /* If all bytes in n->n are either 0 or in [5..8] range, this
> +	     might be a candidate for (unsigned) __builtin_bswap64 (src).
> +	     It is not worth it for (unsigned short) __builtin_bswap64 (src)
> +	     or (unsigned short) __builtin_bswap32 (src).  */
> +	  *cast64_to_32 = true;
> +	  for (tmpn = n->n; tmpn; tmpn >>= BITS_PER_MARKER)
> +	    if ((tmpn & MARKER_MASK)
> +		&& ((tmpn & MARKER_MASK) <= 4 || (tmpn & MARKER_MASK) > 8))
> +	      {
> +		*cast64_to_32 = false;
> +		break;
> +	      }
> +	}
> +      if (*cast64_to_32)
> +	*cmpxchg &= mask;
> +      else
> +	*cmpxchg >>= (64 / BITS_PER_MARKER - n->range) * BITS_PER_MARKER;
>        *cmpnop &= mask;
>      }
>  
> @@ -837,6 +858,8 @@ find_bswap_or_nop_finalize (struct symbo
>        n->range = rsize;
>      }
>  
> +  if (*cast64_to_32)
> +    n->range = 8;
>    n->range *= BITS_PER_UNIT;
>  }
>  
> @@ -849,7 +872,8 @@ find_bswap_or_nop_finalize (struct symbo
>     expression.  */
>  
>  gimple *
> -find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap)
> +find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap,
> +		   bool *cast64_to_32, uint64_t *mask)
>  {
>    tree type_size = TYPE_SIZE_UNIT (TREE_TYPE (gimple_get_lhs (stmt)));
>    if (!tree_fits_uhwi_p (type_size))
> @@ -929,17 +953,30 @@ find_bswap_or_nop (gimple *stmt, struct
>      }
>  
>    uint64_t cmpxchg, cmpnop;
> -  find_bswap_or_nop_finalize (n, &cmpxchg, &cmpnop);
> +  find_bswap_or_nop_finalize (n, &cmpxchg, &cmpnop, cast64_to_32);
>  
>    /* A complete byte swap should make the symbolic number to start with
>       the largest digit in the highest order byte. Unchanged symbolic
>       number indicates a read with same endianness as target architecture.  */
> +  *mask = ~(uint64_t) 0;
>    if (n->n == cmpnop)
>      *bswap = false;
>    else if (n->n == cmpxchg)
>      *bswap = true;
>    else
> -    return NULL;
> +    {
> +      int set = 0;
> +      for (uint64_t msk = MARKER_MASK; msk; msk <<= BITS_PER_MARKER)
> +	if ((n->n & msk) == 0)
> +	  *mask &= ~msk;
> +	else if ((n->n & msk) == (cmpxchg & msk))
> +	  set++;
> +	else
> +	  return NULL;
> +      if (set < 2)
> +	return NULL;
> +      *bswap = true;
> +    }
>  
>    /* Useless bit manipulation performed by code.  */
>    if (!n->base_addr && n->n == cmpnop && n->n_ops == 1)
> @@ -1024,10 +1061,10 @@ bswap_view_convert (gimple_stmt_iterator
>  tree
>  bswap_replace (gimple_stmt_iterator gsi, gimple *ins_stmt, tree fndecl,
>  	       tree bswap_type, tree load_type, struct symbolic_number *n,
> -	       bool bswap)
> +	       bool bswap, uint64_t mask)
>  {
>    tree src, tmp, tgt = NULL_TREE;
> -  gimple *bswap_stmt;
> +  gimple *bswap_stmt, *mask_stmt = NULL;
>    tree_code conv_code = NOP_EXPR;
>  
>    gimple *cur_stmt = gsi_stmt (gsi);
> @@ -1247,6 +1284,15 @@ bswap_replace (gimple_stmt_iterator gsi,
>      tgt = make_ssa_name (bswap_type);
>    tmp = tgt;
>  
> +  if (mask != ~(uint64_t) 0)
> +    {
> +      tree m = build_int_cst (bswap_type, mask);
> +      tmp = make_temp_ssa_name (bswap_type, NULL, "bswapdst");
> +      gimple_set_lhs (bswap_stmt, tmp);
> +      mask_stmt = gimple_build_assign (tgt, BIT_AND_EXPR, tmp, m);
> +      tmp = tgt;
> +    }
> +
>    /* Convert the result if necessary.  */
>    if (!useless_type_conversion_p (TREE_TYPE (tgt), bswap_type))
>      {
> @@ -1260,7 +1306,7 @@ bswap_replace (gimple_stmt_iterator gsi,
>        gsi_insert_after (&gsi, convert_stmt, GSI_SAME_STMT);
>      }
>  
> -  gimple_set_lhs (bswap_stmt, tmp);
> +  gimple_set_lhs (mask_stmt ? mask_stmt : bswap_stmt, tmp);
>  
>    if (dump_file)
>      {
> @@ -1277,11 +1323,17 @@ bswap_replace (gimple_stmt_iterator gsi,
>  
>    if (cur_stmt)
>      {
> +      if (mask_stmt)
> +	gsi_insert_after (&gsi, mask_stmt, GSI_SAME_STMT);
>        gsi_insert_after (&gsi, bswap_stmt, GSI_SAME_STMT);
>        gsi_remove (&gsi, true);
>      }
>    else
> -    gsi_insert_before (&gsi, bswap_stmt, GSI_SAME_STMT);
> +    {
> +      gsi_insert_before (&gsi, bswap_stmt, GSI_SAME_STMT);
> +      if (mask_stmt)
> +	gsi_insert_before (&gsi, mask_stmt, GSI_SAME_STMT);
> +    }
>    return tgt;
>  }
>  
> @@ -1341,8 +1393,14 @@ maybe_optimize_vector_constructor (gimpl
>        return false;
>      }
>  
> -  gimple *ins_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
> -  if (!ins_stmt || n.range != (unsigned HOST_WIDE_INT) sz)
> +  bool cast64_to_32;
> +  uint64_t mask;
> +  gimple *ins_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap,
> +					&cast64_to_32, &mask);
> +  if (!ins_stmt
> +      || n.range != (unsigned HOST_WIDE_INT) sz
> +      || cast64_to_32
> +      || mask != ~(uint64_t) 0)
>      return false;
>  
>    if (bswap && !fndecl && n.range != 16)
> @@ -1351,7 +1409,7 @@ maybe_optimize_vector_constructor (gimpl
>    memset (&nop_stats, 0, sizeof (nop_stats));
>    memset (&bswap_stats, 0, sizeof (bswap_stats));
>    return bswap_replace (gsi_for_stmt (cur_stmt), ins_stmt, fndecl,
> -			bswap_type, load_type, &n, bswap) != NULL_TREE;
> +			bswap_type, load_type, &n, bswap, mask) != NULL_TREE;
>  }
>  
>  /* Find manual byte swap implementations as well as load in a given
> @@ -1405,7 +1463,8 @@ pass_optimize_bswap::execute (function *
>  	  tree fndecl = NULL_TREE, bswap_type = NULL_TREE, load_type;
>  	  enum tree_code code;
>  	  struct symbolic_number n;
> -	  bool bswap;
> +	  bool bswap, cast64_to_32;
> +	  uint64_t mask;
>  
>  	  /* This gsi_prev (&gsi) is not part of the for loop because cur_stmt
>  	     might be moved to a different basic block by bswap_replace and gsi
> @@ -1442,7 +1501,8 @@ pass_optimize_bswap::execute (function *
>  	      continue;
>  	    }
>  
> -	  ins_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap);
> +	  ins_stmt = find_bswap_or_nop (cur_stmt, &n, &bswap,
> +					&cast64_to_32, &mask);
>  
>  	  if (!ins_stmt)
>  	    continue;
> @@ -1479,7 +1539,7 @@ pass_optimize_bswap::execute (function *
>  	    continue;
>  
>  	  if (bswap_replace (gsi_for_stmt (cur_stmt), ins_stmt, fndecl,
> -			     bswap_type, load_type, &n, bswap))
> +			     bswap_type, load_type, &n, bswap, mask))
>  	    changed = true;
>  	}
>      }
> @@ -2820,7 +2880,8 @@ imm_store_chain_info::try_coalesce_bswap
>      }
>  
>    uint64_t cmpxchg, cmpnop;
> -  find_bswap_or_nop_finalize (&n, &cmpxchg, &cmpnop);
> +  bool cast64_to_32;
> +  find_bswap_or_nop_finalize (&n, &cmpxchg, &cmpnop, &cast64_to_32);
>  
>    /* A complete byte swap should make the symbolic number to start with
>       the largest digit in the highest order byte.  Unchanged symbolic
> @@ -2828,6 +2889,10 @@ imm_store_chain_info::try_coalesce_bswap
>    if (n.n != cmpnop && n.n != cmpxchg)
>      return false;
>  
> +  /* For now.  */
> +  if (cast64_to_32)
> +    return false;
> +
>    if (n.base_addr == NULL_TREE && !is_gimple_val (n.src))
>      return false;
>  
> @@ -4161,7 +4226,8 @@ imm_store_chain_info::output_merged_stor
>  	    n->vuse = gimple_vuse (ins_stmt);
>  	}
>        bswap_res = bswap_replace (gsi_start (seq), ins_stmt, fndecl,
> -				 bswap_type, load_type, n, bswap);
> +				 bswap_type, load_type, n, bswap,
> +				 ~(uint64_t) 0);
>        gcc_assert (bswap_res);
>      }
>  
> --- gcc/testsuite/gcc.dg/pr86723.c.jj	2021-08-20 17:29:49.079153045 +0200
> +++ gcc/testsuite/gcc.dg/pr86723.c	2021-08-20 17:30:44.027398231 +0200
> @@ -0,0 +1,63 @@
> +/* PR tree-optimization/86723 */
> +/* { dg-do run { target { ilp32 || lp64 } } } */
> +/* { dg-options "-O2" } */
> +
> +__attribute__((noipa)) int
> +foo (unsigned long long value)
> +{
> +  return (((value & 0x00000000000000ffull) << 56)
> +	  | ((value & 0x000000000000ff00ull) << 40)
> +	  | ((value & 0x0000000000ff0000ull) << 24)
> +	  | ((value & 0x00000000ff000000ull) << 8)
> +	  | ((value & 0x000000ff00000000ull) >> 8)
> +	  | ((value & 0x0000ff0000000000ull) >> 24)
> +	  | ((value & 0x00ff000000000000ull) >> 40)
> +	  | ((value & 0xff00000000000000ull) >> 56));
> +}
> +
> +__attribute__((noipa)) int
> +bar (unsigned long long value)
> +{
> +  return (((value & 0x000000ff00000000ull) >> 8)
> +	  | ((value & 0x0000ff0000000000ull) >> 24)
> +	  | ((value & 0x00ff000000000000ull) >> 40)
> +	  | ((value & 0xff00000000000000ull) >> 56));
> +}
> +
> +__attribute__((noipa)) unsigned long long
> +baz (unsigned long long value)
> +{
> +  return (((value & 0x00000000000000ffull) << 56)
> +	  | ((value & 0x000000000000ff00ull) << 40)
> +	  | ((value & 0x00000000ff000000ull) << 8)
> +	  | ((value & 0x000000ff00000000ull) >> 8)
> +	  | ((value & 0x0000ff0000000000ull) >> 24)
> +	  | ((value & 0xff00000000000000ull) >> 56));
> +}
> +
> +__attribute__((noipa)) unsigned int
> +qux (unsigned int value)
> +{
> +  return (((value & 0x000000ff) << 24)
> +	  | ((value & 0x00ff0000) >> 8)
> +	  | ((value & 0xff000000) >> 24));
> +}
> +
> +__attribute__((noipa)) unsigned int
> +corge (unsigned int value)
> +{
> +  return (((value & 0x000000ff) << 24)
> +	  | ((value & 0xff000000) >> 24));
> +}
> +
> +int
> +main ()
> +{
> +  if (foo (0x0102030405060708ull) != 0x04030201
> +      || bar (0x0102030405060708ull) != 0x04030201
> +      || baz (0x0102030405060708ull) != 0x0807000504030001ull
> +      || qux (0x01020304) != 0x04000201
> +      || corge (0x01020304) != 0x04000001)
> +    __builtin_abort ();
> +  return 0;
> +}
> --- gcc/testsuite/gcc.target/i386/pr86723.c.jj	2021-08-20 17:20:01.090247097 +0200
> +++ gcc/testsuite/gcc.target/i386/pr86723.c	2021-08-20 17:19:09.421958400 +0200
> @@ -0,0 +1,52 @@
> +/* PR tree-optimization/86723 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-times "\tbswap\t" 5 } } */
> +
> +int
> +foo (unsigned long long value)
> +{
> +  return (((value & 0x00000000000000ffull) << 56)
> +	  | ((value & 0x000000000000ff00ull) << 40)
> +	  | ((value & 0x0000000000ff0000ull) << 24)
> +	  | ((value & 0x00000000ff000000ull) << 8)
> +	  | ((value & 0x000000ff00000000ull) >> 8)
> +	  | ((value & 0x0000ff0000000000ull) >> 24)
> +	  | ((value & 0x00ff000000000000ull) >> 40)
> +	  | ((value & 0xff00000000000000ull) >> 56));
> +}
> +
> +int
> +bar (unsigned long long value)
> +{
> +  return (((value & 0x000000ff00000000ull) >> 8)
> +	  | ((value & 0x0000ff0000000000ull) >> 24)
> +	  | ((value & 0x00ff000000000000ull) >> 40)
> +	  | ((value & 0xff00000000000000ull) >> 56));
> +}
> +
> +unsigned long long
> +baz (unsigned long long value)
> +{
> +  return (((value & 0x00000000000000ffull) << 56)
> +	  | ((value & 0x000000000000ff00ull) << 40)
> +	  | ((value & 0x00000000ff000000ull) << 8)
> +	  | ((value & 0x000000ff00000000ull) >> 8)
> +	  | ((value & 0x0000ff0000000000ull) >> 24)
> +	  | ((value & 0xff00000000000000ull) >> 56));
> +}
> +
> +unsigned int
> +qux (unsigned int value)
> +{
> +  return (((value & 0x000000ff) << 24)
> +	  | ((value & 0x00ff0000) >> 8)
> +	  | ((value & 0xff000000) >> 24));
> +}
> +
> +unsigned int
> +corge (unsigned int value)
> +{
> +  return (((value & 0x000000ff) << 24)
> +	  | ((value & 0xff000000) >> 24));
> +}
> --- gcc/testsuite/gcc.dg/optimize-bswapdi-1.c.jj	2020-01-14 20:02:47.333601595 +0100
> +++ gcc/testsuite/gcc.dg/optimize-bswapdi-1.c	2021-08-22 16:37:06.662260888 +0200
> @@ -1,7 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target bswap } */
>  /* { dg-require-effective-target stdint_types } */
> -/* { dg-options "-O2 -fdump-tree-bswap" } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>  /* { dg-additional-options "-mzarch" { target s390*-*-* } } */
>  
>  #include <stdint.h>
> @@ -58,4 +58,4 @@ swap64_c (uint64_t x)
>  }
>  
>  
> -/* { dg-final { scan-tree-dump-times "64 bit bswap implementation found at" 3 "bswap" } } */
> +/* { dg-final { scan-tree-dump-times "= __builtin_bswap64 \\\(" 3 "optimized" } } */
> --- gcc/testsuite/gcc.dg/optimize-bswapdi-2.c.jj	2020-01-14 20:02:47.333601595 +0100
> +++ gcc/testsuite/gcc.dg/optimize-bswapdi-2.c	2021-08-22 16:37:32.138910470 +0200
> @@ -1,7 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target bswap } */
>  /* { dg-require-effective-target stdint_types } */
> -/* { dg-options "-O2 -fdump-tree-bswap" } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>  /* { dg-additional-options "-mzarch" { target s390*-*-* } } */
>  
>  #include <stdint.h>
> @@ -23,4 +23,4 @@ swap64_c (uint64_t x)
>  }
>  
>  
> -/* { dg-final { scan-tree-dump-times "64 bit bswap implementation found at" 1 "bswap" } } */
> +/* { dg-final { scan-tree-dump-times "= __builtin_bswap64 \\\(" 1 "optimized" } } */
> --- gcc/testsuite/gcc.dg/optimize-bswapsi-1.c.jj	2020-01-14 20:02:47.333601595 +0100
> +++ gcc/testsuite/gcc.dg/optimize-bswapsi-1.c	2021-08-22 16:38:07.058430169 +0200
> @@ -1,7 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target bswap } */
>  /* { dg-require-effective-target stdint_types } */
> -/* { dg-options "-O2 -fdump-tree-bswap" } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>  /* { dg-additional-options "-march=z900" { target s390*-*-* } } */
>  
>  #include <stdint.h>
> @@ -89,4 +89,4 @@ swap32_f (unsigned in)
>    return in;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 6 "bswap" } } */
> +/* { dg-final { scan-tree-dump-times "= __builtin_bswap32 \\\(" 6 "optimized" } } */
> --- gcc/testsuite/gcc.dg/optimize-bswapsi-3.c.jj	2020-01-14 20:02:47.333601595 +0100
> +++ gcc/testsuite/gcc.dg/optimize-bswapsi-3.c	2021-08-22 16:41:11.726890162 +0200
> @@ -1,7 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target bswap } */
>  /* { dg-require-effective-target stdint_types } */
> -/* { dg-options "-O2 -fdump-tree-bswap" } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>  /* { dg-additional-options "-march=z900" { target s390-*-* } } */
>  
>  typedef int SItype __attribute__ ((mode (SI)));
> @@ -20,4 +20,4 @@ swap32 (SItype in)
>  	 | (((in >> 24) & 0xFF) << 0);
>  }
>  
> -/* { dg-final { scan-tree-dump-not "32 bit bswap implementation found at" "bswap" } } */
> +/* { dg-final { scan-tree-dump-times "= __builtin_bswap32 \\\(" 1 "optimized" } } */
> --- gcc/testsuite/gcc.dg/optimize-bswapsi-5.c.jj	2020-01-14 20:02:47.333601595 +0100
> +++ gcc/testsuite/gcc.dg/optimize-bswapsi-5.c	2021-08-22 16:41:43.635451282 +0200
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target bswap } */
> -/* { dg-options "-O2 -fdump-tree-bswap -fno-inline-functions" } */
> +/* { dg-options "-O2 -fdump-tree-optimized -fno-inline-functions" } */
>  /* { dg-additional-options "-march=z900" { target s390-*-* } } */
>  
>  struct L { unsigned int l[2]; };
> @@ -28,4 +28,4 @@ bar (double a, struct L *p)
>    foo (a, p);
>  }
>  
> -/* { dg-final { scan-tree-dump-times "32 bit bswap implementation found at" 2 "bswap" } } */
> +/* { dg-final { scan-tree-dump-times "= __builtin_bswap32 \\\(" 2 "optimized" } } */
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-08-23  9:50 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-23  8:46 [PATCH] bswap: Recognize (int) __builtin_bswap64 (arg) idioms or __builtin_bswap?? (arg) & mask [PR86723] Jakub Jelinek
2021-08-23  9:50 ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).