public inbox for gcc-patches@gcc.gnu.org
* [PATCH] Lift alignment restrictions from inlining register size memcpy
From: Richard Biener @ 2016-07-12  9:38 UTC
  To: gcc-patches


The following patch does $subject which is requested in a comment in
PR50417 as this restriction defeats the purpose of having memcpy
as a portable and efficient way to circumvent strict-aliasing violations
(or even as a portable and efficient way to do unaligned loads).
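
To illustrate, a minimal sketch (not part of the patch) of the idiom
this lets us fold on strict-alignment targets as well:

unsigned int
load_u32 (const void *p)
{
  unsigned int v;
  /* With the patch this folds to a single (possibly misaligned)
     load instead of a libcall, whatever the alignment of p.  */
  __builtin_memcpy (&v, p, sizeof v);
  return v;
}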

Bootstrap / regtest running on x86_64-unknown-linux-gnu (quite pointless
as there is no change on that arch).

I have added a testcase that should exercise most relevant cases so
we can look for fallout on "interesting" targets.

Bootstrap / regtest on strict-alignment platforms welcome.

I do expect that with -Os expanding unaligned-unaligned moves
might be a size pessimization compared to a libcall (or what
the target's block move expander does).  But the whole point is
to remove abstraction penalty so it isn't an easy stmt-local
decision to make.  Comments on this front welcome as well.
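
For reference, the unaligned-unaligned case now folds on GIMPLE to a
plain load/store pair; sketched only roughly (not actual dump output):

  __builtin_memcpy (d, s, 8);

becomes

  tmp = MEM[(long unsigned int *)s];
  MEM[(long unsigned int *)d] = tmp;

with the MEM_REF types carrying the (possibly byte) alignment of the
two pointers.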

Unless I hear some positives I'll let the patch sit here as I
don't really care too much about those pesky targets (and
targets can choose to opt-in by providing movmisalign optabs
anyway so they don't go the store/extract_bit_field path).

Thanks,
Richard.

2016-07-12  Richard Biener  <rguenther@suse.de>

	* gimple-fold.c (gimple_fold_builtin_memory_op): Lift alignment
	restrictions from inlining register size memcpy.

	* gcc.dg/torture/builtin-memcpy.c: New testcase.

Index: gcc/gimple-fold.c
===================================================================
*** gcc/gimple-fold.c	(revision 238237)
--- gcc/gimple-fold.c	(working copy)
*************** gimple_fold_builtin_memory_op (gimple_st
*** 705,718 ****
  	      if (type
  		  && TYPE_MODE (type) != BLKmode
  		  && (GET_MODE_SIZE (TYPE_MODE (type)) * BITS_PER_UNIT
! 		      == ilen * 8)
! 		  /* If the destination pointer is not aligned we must be able
! 		     to emit an unaligned store.  */
! 		  && (dest_align >= GET_MODE_ALIGNMENT (TYPE_MODE (type))
! 		      || !SLOW_UNALIGNED_ACCESS (TYPE_MODE (type), dest_align)
! 		      || (optab_handler (movmisalign_optab, TYPE_MODE (type))
! 			  != CODE_FOR_nothing)))
  		{
  		  tree srctype = type;
  		  tree desttype = type;
  		  if (src_align < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
--- 705,718 ----
  	      if (type
  		  && TYPE_MODE (type) != BLKmode
  		  && (GET_MODE_SIZE (TYPE_MODE (type)) * BITS_PER_UNIT
! 		      == ilen * 8))
  		{
+ 		  /* RTL expansion handles misaligned destination / source
+ 		     MEM_REFs either via target provided movmisalign or
+ 		     via extract/store_bit_field for targets that set
+ 		     SLOW_UNALIGNED_ACCESS for the move.  For move
+ 		     quantities up to MOVE_MAX this should be always
+ 		     more efficient than a libcall to memcpy.  */
  		  tree srctype = type;
  		  tree desttype = type;
  		  if (src_align < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
*************** gimple_fold_builtin_memory_op (gimple_st
*** 721,767 ****
  		  tree tem = fold_const_aggregate_ref (srcmem);
  		  if (tem)
  		    srcmem = tem;
! 		  else if (src_align < GET_MODE_ALIGNMENT (TYPE_MODE (type))
! 			   && SLOW_UNALIGNED_ACCESS (TYPE_MODE (type),
! 						     src_align)
! 			   && (optab_handler (movmisalign_optab,
! 					      TYPE_MODE (type))
! 			       == CODE_FOR_nothing))
! 		    srcmem = NULL_TREE;
! 		  if (srcmem)
  		    {
! 		      gimple *new_stmt;
! 		      if (is_gimple_reg_type (TREE_TYPE (srcmem)))
! 			{
! 			  new_stmt = gimple_build_assign (NULL_TREE, srcmem);
! 			  if (gimple_in_ssa_p (cfun))
! 			    srcmem = make_ssa_name (TREE_TYPE (srcmem),
! 						    new_stmt);
! 			  else
! 			    srcmem = create_tmp_reg (TREE_TYPE (srcmem));
! 			  gimple_assign_set_lhs (new_stmt, srcmem);
! 			  gimple_set_vuse (new_stmt, gimple_vuse (stmt));
! 			  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
! 			}
! 		      if (dest_align < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
! 			desttype = build_aligned_type (type, dest_align);
! 		      new_stmt
! 			= gimple_build_assign (fold_build2 (MEM_REF, desttype,
! 							    dest, off0),
! 					       srcmem);
  		      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
- 		      gimple_set_vdef (new_stmt, gimple_vdef (stmt));
- 		      if (gimple_vdef (new_stmt)
- 			  && TREE_CODE (gimple_vdef (new_stmt)) == SSA_NAME)
- 			SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;
- 		      if (!lhs)
- 			{
- 			  gsi_replace (gsi, new_stmt, false);
- 			  return true;
- 			}
  		      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
- 		      goto done;
  		    }
  		}
  	    }
  	}
--- 721,756 ----
  		  tree tem = fold_const_aggregate_ref (srcmem);
  		  if (tem)
  		    srcmem = tem;
! 		  gimple *new_stmt;
! 		  if (is_gimple_reg_type (TREE_TYPE (srcmem)))
  		    {
! 		      new_stmt = gimple_build_assign (NULL_TREE, srcmem);
! 		      if (gimple_in_ssa_p (cfun))
! 			srcmem = make_ssa_name (TREE_TYPE (srcmem),
! 						new_stmt);
! 		      else
! 			srcmem = create_tmp_reg (TREE_TYPE (srcmem));
! 		      gimple_assign_set_lhs (new_stmt, srcmem);
  		      gimple_set_vuse (new_stmt, gimple_vuse (stmt));
  		      gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
  		    }
+ 		  if (dest_align < GET_MODE_ALIGNMENT (TYPE_MODE (type)))
+ 		    desttype = build_aligned_type (type, dest_align);
+ 		  new_stmt
+ 		    = gimple_build_assign (fold_build2 (MEM_REF, desttype,
+ 							dest, off0), srcmem);
+ 		  gimple_set_vuse (new_stmt, gimple_vuse (stmt));
+ 		  gimple_set_vdef (new_stmt, gimple_vdef (stmt));
+ 		  if (gimple_vdef (new_stmt)
+ 		      && TREE_CODE (gimple_vdef (new_stmt)) == SSA_NAME)
+ 		    SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;
+ 		  if (!lhs)
+ 		    {
+ 		      gsi_replace (gsi, new_stmt, false);
+ 		      return true;
+ 		    }
+ 		  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+ 		  goto done;
  		}
  	    }
  	}
Index: gcc/testsuite/gcc.dg/torture/builtin-memcpy.c
===================================================================
*** gcc/testsuite/gcc.dg/torture/builtin-memcpy.c	(revision 0)
--- gcc/testsuite/gcc.dg/torture/builtin-memcpy.c	(working copy)
***************
*** 0 ****
--- 1,89 ----
+ /* { dg-do run } */
+ 
+ char src[32], dest[32];
+ long long llsrc, lldest;
+ 
+ /* Unaligned source/dest variants.  */
+ void __attribute__((noinline,noclone))
+ copyuu16 (long long *src, long long *dest)
+ {
+   __builtin_memcpy (dest, src, 16);
+ }
+ void __attribute__((noinline,noclone))
+ copyuu8 (long *src, long *dest)
+ {
+   __builtin_memcpy (dest, src, 8);
+ }
+ void __attribute__((noinline,noclone))
+ copyuu4 (int *src, int *dest)
+ {
+   __builtin_memcpy (dest, src, 4);
+ }
+ void __attribute__((noinline,noclone))
+ copyuu2 (short *src, short *dest)
+ {
+   __builtin_memcpy (dest, src, 2);
+ }
+ 
+ /* Aligned source, unaligned dest variants.  */
+ void __attribute__((noinline,noclone))
+ copyau16 (long long *dest)
+ {
+   __builtin_memcpy (dest, &llsrc, 16);
+ }
+ void __attribute__((noinline,noclone))
+ copyau8 (long long *dest)
+ {
+   __builtin_memcpy (dest, &llsrc, 8);
+ }
+ void __attribute__((noinline,noclone))
+ copyau4 (long long *dest)
+ {
+   __builtin_memcpy (dest, &llsrc, 4);
+ }
+ void __attribute__((noinline,noclone))
+ copyau2 (long long *dest)
+ {
+   __builtin_memcpy (dest, &llsrc, 2);
+ }
+ 
+ /* Unaligned source, aligned dest variants.  */
+ void __attribute__((noinline,noclone))
+ copyua16 (long long *src)
+ {
+   __builtin_memcpy (&lldest, src, 16);
+ }
+ void __attribute__((noinline,noclone))
+ copyua8 (long *src)
+ {
+   __builtin_memcpy (&lldest, src, 8);
+ }
+ void __attribute__((noinline,noclone))
+ copyua4 (int *src)
+ {
+   __builtin_memcpy (&lldest, src, 4);
+ }
+ void __attribute__((noinline,noclone))
+ copyua2 (short *src)
+ {
+   __builtin_memcpy (&lldest, src, 2);
+ }
+ 
+ int main()
+ {
+   void *usrc = (void *)(((__UINTPTR_TYPE__)src & -16) + 17);
+   void *udest = (void *)(((__UINTPTR_TYPE__)dest & -16) + 17);
+   copyuu16 (udest, usrc);
+   copyuu8 (udest, usrc);
+   copyuu4 (udest, usrc);
+   copyuu2 (udest, usrc);
+   copyau16 (usrc);
+   copyau8 (usrc);
+   copyau4 (usrc);
+   copyau2 (usrc);
+   copyua16 (udest);
+   copyua8 (udest);
+   copyua4 (udest);
+   copyua2 (udest);
+   return 0;
+ }


* Re: [PATCH] Lift alignment restrictions from inlining register size memcpy
From: Eric Botcazou @ 2016-07-12 21:37 UTC
  To: Richard Biener; +Cc: gcc-patches

> I have added a testcase that should exercise most relevant cases so
> we can look for fallout on "interesting" targets.

It passes on SPARC/Solaris 32-bit.

> Bootstrap / regtest on strict-alignment platforms welcome.

Preliminary testing looks good, with a few expected regressions I guess:

+XPASS: gcc.dg/memmove-4.c scan-tree-dump-not optimized "memmove"
+FAIL: gcc.dg/strlenopt-8.c scan-tree-dump-times strlen "strlen \\\\(" 0
+FAIL: gcc.dg/strlenopt-8.c scan-tree-dump-times strlen "memcpy \\\\(" 4

> Unless I hear some positives I'll let the patch sit here as I
> don't really care too much about those pesky targets (and
> targets can choose to opt-in by providing movmisalign optabs
> anyway so they don't go the store/extract_bit_field path).

I'll conduct more thorough testing over the next few days.

-- 
Eric Botcazou


* Re: [PATCH] Lift alignment restrictions from inlining register size memcpy
From: Eric Botcazou @ 2016-07-14 10:29 UTC
  To: Richard Biener; +Cc: gcc-patches

> Bootstrap / regtest on strict-alignment platforms welcome.

Bootstrap/regtest on SPARC/Solaris is OK, modulo:

XPASS: gcc.dg/memmove-4.c scan-tree-dump-not optimized "memmove"
FAIL: gcc.dg/strlenopt-8.c scan-tree-dump-times strlen "strlen \\\\(" 0
FAIL: gcc.dg/strlenopt-8.c scan-tree-dump-times strlen "memcpy \\\\(" 4

for both 32-bit and 64-bit.

> Unless I hear some positives I'll let the patch sit here as I
> don't really care too much about those pesky targets (and
> targets can choose to opt-in by providing movmisalign optabs
> anyway so they don't go the store/extract_bit_field path).

The patch is a step forward but it doesn't fully solve the problem:

#include <stdint.h>
#include <string.h>

uint32_t foo_noalign(const uint32_t *s) {
  uint32_t v;
  memcpy(&v, s, sizeof(v));
  return v;
}

uint32_t foo(const uint32_t *s) {
  uint32_t v;
  memcpy(&v, __builtin_assume_aligned(s, 4), sizeof(v));
  return v;
}

As discussed in the audit trail, the compiler ought to again generate the same
code on strict-alignment targets in both cases, as Clang apparently does and as
GCC itself did during most of its existence (until GCC 4.5, to be precise).

-- 
Eric Botcazou


* Re: [PATCH] Lift alignment restrictions from inlining register size memcpy
From: Richard Biener @ 2016-07-14 11:10 UTC
  To: Eric Botcazou; +Cc: gcc-patches

On Thu, 14 Jul 2016, Eric Botcazou wrote:

> > Bootstrap / regtest on strict-alignment platforms welcome.
> 
> Bootstrap/regtest on SPARC/Solaris is OK, modulo:
> 
> XPASS: gcc.dg/memmove-4.c scan-tree-dump-not optimized "memmove"
> FAIL: gcc.dg/strlenopt-8.c scan-tree-dump-times strlen "strlen \\\\(" 0
> FAIL: gcc.dg/strlenopt-8.c scan-tree-dump-times strlen "memcpy \\\\(" 4
> 
> for both 32-bit and 64-bit.
> 
> > Unless I hear some positives I'll let the patch sit here as I
> > don't really care too much about those pesky targets (and
> > targets can choose to opt-in by providing movmisalign optabs
> > anyway so they don't go the store/extract_bit_field path).
> 
> The patch is a step forward but it doesn't fully solve the problem:
> 
> #include <stdint.h>
> #include <string.h>
> 
> uint32_t foo_noalign(const uint32_t *s) {
>   uint32_t v;
>   memcpy(&v, s, sizeof(v));
>   return v;
> }
> 
> uint32_t foo(const uint32_t *s) {
>   uint32_t v;
>   memcpy(&v, __builtin_assume_aligned(s, 4), sizeof(v));
>   return v;
> }
> 
> As discussed in the audit trail, the compiler ought to again generate
> the same code on strict-alignment targets in both cases, as Clang
> apparently does and as GCC itself did during most of its existence
> (until GCC 4.5, to be precise).

The patch wasn't supposed to address this issue.  It was supposed
to address the fact that the memcpy call is not optimized away
on strict-align targets 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50417#c16).

I've commented on the original issue in that PR extensively (on what
is possible and what not within the constraints of the middle-end).
Iff alignment according to the pointer type passed to memory
functions is important, the FEs need to extract said information
before it gets lost (and before the pointer is cast to the prototype's
argument type).
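
A hypothetical sketch of where that information dies, using the
example above: at the call

  memcpy (&v, s, sizeof (v));

s (a uint32_t *) is implicitly converted to const void * to match the
prototype, so by the time the middle-end sees the call only alignment
derivable from the pointer value itself is left, not from its
original type.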

Richard.


* Re: [PATCH] Lift alignment restrictions from inlining register size memcpy
From: Georg-Johann Lay @ 2016-07-14 11:47 UTC
  To: Richard Biener; +Cc: gcc-patches

On 12.07.2016 11:38, Richard Biener wrote:
>
> The following patch does $subject which is requested in a comment in
> PR50417 as this restriction defeats the purpose of having memcpy
> as a portable and efficient way to circumvent strict-aliasing violations
> (or even as a portable and efficient way to do unaligned loads).

IIUC the original purpose of the PR is a complaint about execution performance 
in situations like, say, memcpy (int*, int*, 0x1000).

GCC 4.5 used int-wide chunks in movmem even if no alignment could be inferred 
from symbols like in

static int a[..], b[..];
memcpy (a, b, 0x1000)

The more I think about it, the more I tend towards not changing the current
behaviour, being conservative and treating memcpy like a function.

If the alignment is deduced from the pointer type, then this is clearly not 
what the standard says as you already pointed out in the PR.

Hence, if we want such an "optimization" it should be optional, by means of a
dedicated command-line option, so that the user can explicitly request this ABI
change for memcpy.

> Bootstrap / regtest running on x86_64-unknown-linux-gnu (quite pointless
> as there is no change on that arch).
>
> I have added a testcase that should exercise most relevant cases so
> we can look for fallout on "interesting" targets.
>
> Bootstrap / regtest on strict-alignment platforms welcome.
>
> I do expect that with -Os expanding unaligned-unaligned moves
> might be a size pessimization compared to a libcall (or what
> the target's block move expander does).  But the whole point is
> to remove abstraction penalty so it isn't an easy stmt-local
> decision to make.  Comments on this front welcome as well.
>
> Unless I hear some positives I'll let the patch sit here as I
> don't really care too much about those pesky targets (and
> targets can choose to opt-in by providing movmisalign optabs
> anyway so they don't go the store/extract_bit_field path).

As I said, my doubts are increasing...

If movmem sees no alignment (where 4.5 passed a bigger alignment) then it could
just FAIL and assume that the libcall provides a fast implementation.  This
means inquiring the alignment at runtime and adding a noticeable penalty for
small numbers of bytes...  but well, we have to live with that I think.

... and if it were guarded by an option, -fstrict-align[ment] would be really
confusing (to gcc people at least) and also to the users of back-ends that
support -mstrict-align[ment].


Johann



* Re: [PATCH] Lift alignment restrictions from inlining register size memcpy
From: Richard Biener @ 2016-07-14 12:02 UTC
  To: Georg-Johann Lay; +Cc: gcc-patches

On Thu, 14 Jul 2016, Georg-Johann Lay wrote:

> On 12.07.2016 11:38, Richard Biener wrote:
> > 
> > The following patch does $subject which is requested in a comment in
> > PR50417 as this restriction defeats the purpose of having memcpy
> > as a portable and efficient way to circumvent strict-aliasing violations
> > (or even as a portable and efficient way to do unaligned loads).
> 
> IIUC the original purpose of the PR is a complaint about execution performance
> in situations like, say, memcpy (int*, int*, 0x1000).
> 
> GCC 4.5 used int-wide chunks in movmem even if no alignment could be inferred
> from symbols like in
> 
> static int a[..], b[..];
> memcpy (a, b, 0x1000)
> 
> The more I think about it, the more I tend towards not changing the current
> behaviour, being conservative and treating memcpy like a function.
> 
> If the alignment is deduced from the pointer type, then this is clearly not
> what the standard says as you already pointed out in the PR.
> 
> Hence, if we want such an "optimization" it should be optional, by means of a
> dedicated command-line option, so that the user can explicitly request this ABI
> change for memcpy.

Yes, as said the patch wasn't for the original issue in the PR but
for the comment complaining about not expanding memcpy inline on 
GIMPLE.

The patch will make the misaligned access explicit (unless the compiler
is later able to prove otherwise).
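
At the source level that is roughly as if the access went through an
under-aligned variant of the type, e.g. the hypothetical

  typedef unsigned int u32_unal __attribute__((aligned(1)));

which mirrors what the fold builds internally via build_aligned_type.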

> > Bootstrap / regtest running on x86_64-unknown-linux-gnu (quite pointless
> > as there is no change on that arch).
> > 
> > I have added a testcase that should exercise most relevant cases so
> > we can look for fallout on "interesting" targets.
> > 
> > Bootstrap / regtest on strict-alignment platforms welcome.
> > 
> > I do expect that with -Os expanding unaligned-unaligned moves
> > might be a size pessimization compared to a libcall (or what
> > the target's block move expander does).  But the whole point is
> > to remove abstraction penalty so it isn't an easy stmt-local
> > decision to make.  Comments on this front welcome as well.
> > 
> > Unless I hear some positives I'll let the patch sit here as I
> > don't really care too much about those pesky targets (and
> > targets can choose to opt-in by providing movmisalign optabs
> > anyway so they don't go the store/extract_bit_field path).
> 
> As I said, my doubts are increasing...
> 
> If movmem sees no alignment (where 4.5 passed a bigger alignment) then it
> could just FAIL and assume that the libcall provides a fast implementation.
> This means inquiring the alignment at runtime and adding a noticeable
> penalty for small numbers of bytes...  but well, we have to live with that
> I think.

Yes, and the patch addresses this by always expanding memcpy to
a load/store pair (the target specifies the largest quantity via
MOVE_MAX, usually word_mode size, but x86_64 for example allows
moves via xmm registers as well).
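
(For reference: ports advertise that limit via the MOVE_MAX macro in
their target headers; a hypothetical strict-alignment 32-bit port
without wider move patterns would have something like

  /* Max number of bytes we can move from memory to memory
     in one reasonably fast instruction.  */
  #define MOVE_MAX 4

there.)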

> ... and if it were guarded by an option, -fstrict-align[ment] would be really
> confusing (to gcc people at least) and also to the users of back-ends that
> support -mstrict-align[ment].

Agreed, but as said, the patch is completely separate from the issue
in the PR (so it doesn't reference it either).

Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
