[PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
@ 2018-03-12  8:57 Martin Liška
  2018-03-12  9:40 ` Marc Glisse
                   ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Martin Liška @ 2018-03-12  8:57 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 1416 bytes --]

Hi.

This is fix for the PR that introduces a new target macro. Using the macro
one can say that a target has a fast mempcpy and thus it's preferred to be used
if possible.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
I also tested on x86_64-linux-gnu.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-03-08  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Add new
	arguments.
	* config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
	New macro.
	* defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise.
	* doc/tm.texi: Likewise.
	* doc/tm.texi.in: Likewise.
	* expr.c (compare_by_pieces): Add support for bail out.
	(emit_block_move_hints): Likewise.
	* expr.h (emit_block_move_hints): Add new arguments.

gcc/testsuite/ChangeLog:

2018-03-09  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* gcc.dg/string-opt-1.c: Adjust test to run only on non-x86
	target.
---
 gcc/builtins.c                      | 13 ++++++++++++-
 gcc/config/i386/i386.h              |  3 +++
 gcc/defaults.h                      |  7 +++++++
 gcc/doc/tm.texi                     |  5 +++++
 gcc/doc/tm.texi.in                  |  5 +++++
 gcc/expr.c                          | 16 +++++++++++++++-
 gcc/expr.h                          |  4 +++-
 gcc/testsuite/gcc.dg/string-opt-1.c |  4 ++--
 8 files changed, 52 insertions(+), 5 deletions(-)



[-- Attachment #2: 0001-Prefer-mempcpy-to-memcpy-on-x86_64-target-PR-middle-.patch --]
[-- Type: text/x-patch, Size: 6311 bytes --]

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 85affa74510..c2ca36934f7 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  bool is_move_done;
+
   /* Copy word part most expediently.  */
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     TARGET_HAS_FAST_MEMPCPY_ROUTINE
+				     && endp == 1,
+				     &is_move_done);
+
+  /* Bail out when a mempcpy call would be expanded as libcall and when
+     we have a target that provides a fast implementation
+     of mempcpy routine.  */
+  if (!is_move_done)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e43edd77b56..8744d706fd7 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1914,6 +1914,9 @@ typedef struct ix86_args {
 
 #define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2)
 
+/* C library provides fast implementation of mempcpy function.  */
+#define TARGET_HAS_FAST_MEMPCPY_ROUTINE 1
+
 /* Define if shifts truncate the shift count which implies one can
    omit a sign-extension or zero-extension of a shift count.
 
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 78a08a33f12..2e5caac8dcd 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1340,6 +1340,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define SET_RATIO(speed) MOVE_RATIO (speed)
 #endif
 
+/* By default do not generate libcall to mempcpy and rather use
+   libcall to memcpy and adjustment of return value.  */
+
+#ifndef TARGET_HAS_FAST_MEMPCPY_ROUTINE
+#define TARGET_HAS_FAST_MEMPCPY_ROUTINE 0
+#endif
+
 /* Supply a default definition of STACK_SAVEAREA_MODE for emit_stack_save.
    Normally move_insn, so Pmode stack pointer.  */
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0c8a3f3298c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6627,6 +6627,11 @@ optimized for speed rather than size.
 If you don't define this, it defaults to the value of @code{MOVE_RATIO}.
 @end defmac
 
+@defmac TARGET_HAS_FAST_MEMPCPY_ROUTINE
+By default do not generate libcall to mempcpy and rather use
+libcall to memcpy and adjustment of return value.
+@end defmac
+
 @defmac USE_LOAD_POST_INCREMENT (@var{mode})
 A C expression used to determine whether a load postincrement is a good
 thing to use for a given mode.  Defaults to the value of
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..e7ef85ab78e 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4560,6 +4560,11 @@ optimized for speed rather than size.
 If you don't define this, it defaults to the value of @code{MOVE_RATIO}.
 @end defmac
 
+@defmac TARGET_HAS_FAST_MEMPCPY_ROUTINE
+By default do not generate libcall to mempcpy and rather use
+libcall to memcpy and adjustment of return value.
+@end defmac
+
 @defmac USE_LOAD_POST_INCREMENT (@var{mode})
 A C expression used to determine whether a load postincrement is a good
 thing to use for a given mode.  Defaults to the value of
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..b6c13652d79 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is set true, do not emit library call and set
+   *IS_MOVE_DONE to false.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,12 +1565,17 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool bail_out_libcall, bool *is_move_done)
 {
   bool may_use_call;
   rtx retval = 0;
   unsigned int align;
 
+  /* When not doing a bail out, we always emit a memory move.  */
+  if (is_move_done)
+    *is_move_done = true;
+
   gcc_assert (size);
   if (CONST_INT_P (size) && INTVAL (size) == 0)
     return 0;
@@ -1625,6 +1632,13 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  if (is_move_done)
+	    *is_move_done = false;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..023bc5aec47 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,9 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool bail_out_libcall = false,
+				  bool *is_move_done = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..851c8b04a33 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-* x86_64-*-* } } } } } */


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-12  8:57 [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657) Martin Liška
@ 2018-03-12  9:40 ` Marc Glisse
  2018-03-13  8:25   ` Martin Liška
  2018-03-12 10:08 ` Richard Biener
  2018-03-12 13:35 ` Jakub Jelinek
  2 siblings, 1 reply; 45+ messages in thread
From: Marc Glisse @ 2018-03-12  9:40 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches, Jakub Jelinek, H.J. Lu

On Mon, 12 Mar 2018, Martin LiÅ¡ka wrote:

> This is fix for the PR that introduces a new target macro. Using the macro
> one can say that a target has a fast mempcpy and thus it's preferred to be used
> if possible.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> I also tested on x86_64-linux-gnu.
>
> Ready to be installed?
> Martin
>
> gcc/ChangeLog:
>
> 2018-03-08  Martin Liska  <mliska@suse.cz>
>
> 	PR middle-end/81657
> 	* builtins.c (expand_builtin_memory_copy_args): Add new
> 	arguments.
> 	* config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
> 	New macro.

Shouldn't the macro be defined in a more specific case, for instance glibc 
on x86? Or do all known libc on x86 happen to provide a fast mempcpy?

> 	* defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise.
> 	* doc/tm.texi: Likewise.
> 	* doc/tm.texi.in: Likewise.
> 	* expr.c (compare_by_pieces): Add support for bail out.
> 	(emit_block_move_hints): Likewise.
> 	* expr.h (emit_block_move_hints): Add new arguments.

-- 
Marc Glisse

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-12  9:40 ` Marc Glisse
@ 2018-03-13  8:25   ` Martin Liška
  2018-03-13  8:35     ` Jakub Jelinek
  0 siblings, 1 reply; 45+ messages in thread
From: Martin Liška @ 2018-03-13  8:25 UTC (permalink / raw)
  To: gcc-patches, Marc Glisse; +Cc: Jakub Jelinek, H.J. Lu

On 03/12/2018 10:39 AM, Marc Glisse wrote:
> On Mon, 12 Mar 2018, Martin LiÅ¡ka wrote:
> 
>> This is fix for the PR that introduces a new target macro. Using the macro
>> one can say that a target has a fast mempcpy and thus it's preferred to be used
>> if possible.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>> I also tested on x86_64-linux-gnu.
>>
>> Ready to be installed?
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2018-03-08Â  Martin LiskaÂ  <mliska@suse.cz>
>>
>> Â Â Â Â PR middle-end/81657
>> Â Â Â Â * builtins.c (expand_builtin_memory_copy_args): Add new
>> Â Â Â Â arguments.
>> Â Â Â Â * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
>> Â Â Â Â New macro.
> 
> Shouldn't the macro be defined in a more specific case, for instance glibc on x86? Or do all known libc on x86 happen to provide a fast mempcpy?

That's Marc a very good question. Do we already have a glibc-related target macros/hooks?
If so, I would add this as one of these.

Thanks,
Martin

> 
>> Â Â Â Â * defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise.
>> Â Â Â Â * doc/tm.texi: Likewise.
>> Â Â Â Â * doc/tm.texi.in: Likewise.
>> Â Â Â Â * expr.c (compare_by_pieces): Add support for bail out.
>> Â Â Â Â (emit_block_move_hints): Likewise.
>> Â Â Â Â * expr.h (emit_block_move_hints): Add new arguments.
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-13  8:25   ` Martin Liška
@ 2018-03-13  8:35     ` Jakub Jelinek
  2018-03-13  9:22       ` Richard Biener
  2018-03-13 15:19       ` Martin Liška
  0 siblings, 2 replies; 45+ messages in thread
From: Jakub Jelinek @ 2018-03-13  8:35 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches, Marc Glisse, H.J. Lu

On Tue, Mar 13, 2018 at 09:24:11AM +0100, Martin LiÅ¡ka wrote:
> On 03/12/2018 10:39 AM, Marc Glisse wrote:
> > On Mon, 12 Mar 2018, Martin LiÅ¡ka wrote:
> > 
> > > This is fix for the PR that introduces a new target macro. Using the macro
> > > one can say that a target has a fast mempcpy and thus it's preferred to be used
> > > if possible.
> > > 
> > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> > > I also tested on x86_64-linux-gnu.
> > > 
> > > Ready to be installed?
> > > Martin
> > > 
> > > gcc/ChangeLog:
> > > 
> > > 2018-03-08Â  Martin LiskaÂ  <mliska@suse.cz>
> > > 
> > > Â Â Â Â PR middle-end/81657
> > > Â Â Â Â * builtins.c (expand_builtin_memory_copy_args): Add new
> > > Â Â Â Â arguments.
> > > Â Â Â Â * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
> > > Â Â Â Â New macro.
> > 
> > Shouldn't the macro be defined in a more specific case, for instance glibc on x86? Or do all known libc on x86 happen to provide a fast mempcpy?
> 
> That's Marc a very good question. Do we already have a glibc-related target macros/hooks?
> If so, I would add this as one of these.

Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
where in particular linux_libc_has_function deals with various C libraries.
Of course, in this case you need another target hook, that is dependent both
on the target backend and C library.

It would be nice to make the target hook a little bit more generic as well,
e.g. pass it enum builtin_function and query if it is fast, slow or
unknown, or even some kind of cost, where the caller could ask for cost of
BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-13  8:35     ` Jakub Jelinek
@ 2018-03-13  9:22       ` Richard Biener
  2018-03-13 15:19       ` Martin Liška
  1 sibling, 0 replies; 45+ messages in thread
From: Richard Biener @ 2018-03-13  9:22 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Martin Liška, GCC Patches, Marc Glisse, H.J. Lu

On Tue, Mar 13, 2018 at 9:32 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Mar 13, 2018 at 09:24:11AM +0100, Martin Liška wrote:
>> On 03/12/2018 10:39 AM, Marc Glisse wrote:
>> > On Mon, 12 Mar 2018, Martin Liška wrote:
>> >
>> > > This is fix for the PR that introduces a new target macro. Using the macro
>> > > one can say that a target has a fast mempcpy and thus it's preferred to be used
>> > > if possible.
>> > >
>> > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>> > > I also tested on x86_64-linux-gnu.
>> > >
>> > > Ready to be installed?
>> > > Martin
>> > >
>> > > gcc/ChangeLog:
>> > >
>> > > 2018-03-08  Martin Liska  <mliska@suse.cz>
>> > >
>> > >     PR middle-end/81657
>> > >     * builtins.c (expand_builtin_memory_copy_args): Add new
>> > >     arguments.
>> > >     * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
>> > >     New macro.
>> >
>> > Shouldn't the macro be defined in a more specific case, for instance glibc on x86? Or do all known libc on x86 happen to provide a fast mempcpy?
>>
>> That's Marc a very good question. Do we already have a glibc-related target macros/hooks?
>> If so, I would add this as one of these.
>
> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
> where in particular linux_libc_has_function deals with various C libraries.
> Of course, in this case you need another target hook, that is dependent both
> on the target backend and C library.
>
> It would be nice to make the target hook a little bit more generic as well,
> e.g. pass it enum builtin_function and query if it is fast, slow or
> unknown, or even some kind of cost, where the caller could ask for cost of
> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.

Or maybe a hook returning the alternative to use?  Pass it BUILT_IN_X
and get back
the same or related builtin?

Richard.

>         Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-13  8:35     ` Jakub Jelinek
  2018-03-13  9:22       ` Richard Biener
@ 2018-03-13 15:19       ` Martin Liška
  2018-03-13 20:35         ` Jakub Jelinek
  1 sibling, 1 reply; 45+ messages in thread
From: Martin Liška @ 2018-03-13 15:19 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Marc Glisse, H.J. Lu

On 03/13/2018 09:32 AM, Jakub Jelinek wrote:
> On Tue, Mar 13, 2018 at 09:24:11AM +0100, Martin LiÅ¡ka wrote:
>> On 03/12/2018 10:39 AM, Marc Glisse wrote:
>>> On Mon, 12 Mar 2018, Martin LiÅ¡ka wrote:
>>>
>>>> This is fix for the PR that introduces a new target macro. Using the macro
>>>> one can say that a target has a fast mempcpy and thus it's preferred to be used
>>>> if possible.
>>>>
>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>> I also tested on x86_64-linux-gnu.
>>>>
>>>> Ready to be installed?
>>>> Martin
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2018-03-08Â  Martin LiskaÂ  <mliska@suse.cz>
>>>>
>>>>  Â Â Â Â PR middle-end/81657
>>>>  Â Â Â Â * builtins.c (expand_builtin_memory_copy_args): Add new
>>>>  Â Â Â Â arguments.
>>>>  Â Â Â Â * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
>>>>  Â Â Â Â New macro.
>>>
>>> Shouldn't the macro be defined in a more specific case, for instance glibc on x86? Or do all known libc on x86 happen to provide a fast mempcpy?
>>
>> That's Marc a very good question. Do we already have a glibc-related target macros/hooks?
>> If so, I would add this as one of these.
> 
> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
> where in particular linux_libc_has_function deals with various C libraries.
> Of course, in this case you need another target hook, that is dependent both
> on the target backend and C library.
> 
> It would be nice to make the target hook a little bit more generic as well,
> e.g. pass it enum builtin_function and query if it is fast, slow or
> unknown, or even some kind of cost, where the caller could ask for cost of
> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.

Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook
definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation.
I would like to implement the function in gcc/targhooks.c, but I don't know how to
make ifdef according to target?

One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the
callback argument with int, that will be casted. One last issue: am I right that I'll have to define
TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)?

Thanks,
Martin

> 
> 	Jakub
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-13 15:19       ` Martin Liška
@ 2018-03-13 20:35         ` Jakub Jelinek
  2018-03-14 12:57           ` Martin Liška
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Jelinek @ 2018-03-13 20:35 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches, Marc Glisse, H.J. Lu

On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin LiÅ¡ka wrote:
> > Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
> > where in particular linux_libc_has_function deals with various C libraries.
> > Of course, in this case you need another target hook, that is dependent both
> > on the target backend and C library.
> > 
> > It would be nice to make the target hook a little bit more generic as well,
> > e.g. pass it enum builtin_function and query if it is fast, slow or
> > unknown, or even some kind of cost, where the caller could ask for cost of
> > BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.
> 
> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook
> definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation.
> I would like to implement the function in gcc/targhooks.c, but I don't know how to
> make ifdef according to target?

Put there just the default implementation (everything is UNKNOWN?).

> One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the
> callback argument with int, that will be casted. One last issue: am I right that I'll have to define
> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)?

And define the i386/x86_64 glibc one in config/i386/*.h, check there
OPTION_GLIBC and only in that case return something other than UNKNOWN.

And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case.

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-13 20:35         ` Jakub Jelinek
@ 2018-03-14 12:57           ` Martin Liška
  2018-03-14 13:08             ` H.J. Lu
  2018-03-14 13:08             ` Jakub Jelinek
  0 siblings, 2 replies; 45+ messages in thread
From: Martin Liška @ 2018-03-14 12:57 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Marc Glisse, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 1705 bytes --]

On 03/13/2018 04:23 PM, Jakub Jelinek wrote:
> On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin LiÅ¡ka wrote:
>>> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
>>> where in particular linux_libc_has_function deals with various C libraries.
>>> Of course, in this case you need another target hook, that is dependent both
>>> on the target backend and C library.
>>>
>>> It would be nice to make the target hook a little bit more generic as well,
>>> e.g. pass it enum builtin_function and query if it is fast, slow or
>>> unknown, or even some kind of cost, where the caller could ask for cost of
>>> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.
>>
>> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook
>> definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation.
>> I would like to implement the function in gcc/targhooks.c, but I don't know how to
>> make ifdef according to target?
> 
> Put there just the default implementation (everything is UNKNOWN?).
> 
>> One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the
>> callback argument with int, that will be casted. One last issue: am I right that I'll have to define
>> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)?
> 
> And define the i386/x86_64 glibc one in config/i386/*.h, check there
> OPTION_GLIBC and only in that case return something other than UNKNOWN.
> 
> And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case.
> 
> 	Jakub
> 

Hi.

I'm sending V2 that can survive bootstrap and regression tests on both x86_64 and ppc64le.

Martin

[-- Attachment #2: 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch --]
[-- Type: text/x-patch, Size: 11154 bytes --]

From 222c7c205a7afc144dc123d2b378a057dcf8816f Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/i386/i386-protos.h (gnu_libc_func_speed): New.
	* config/i386/i386.c (enum libc_speed): Likewise.
	(ix86_libc_func_speed): Likewise.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	* gcc.c-torture/execute/builtins/mempcpy.c (main_test): Adjust
	to not use mempcpy.
	* gcc.dg/string-opt-1.c: Adjust for i386 target.
---
 gcc/builtins.c                                       | 13 ++++++++++++-
 gcc/config/i386/i386-protos.h                        |  2 ++
 gcc/config/i386/i386.c                               | 20 ++++++++++++++++++++
 gcc/coretypes.h                                      |  7 +++++++
 gcc/doc/tm.texi                                      |  4 ++++
 gcc/doc/tm.texi.in                                   |  1 +
 gcc/expr.c                                           | 16 +++++++++++++++-
 gcc/expr.h                                           |  4 +++-
 gcc/target.def                                       |  7 +++++++
 gcc/targhooks.c                                      |  6 ++++++
 gcc/targhooks.h                                      |  1 +
 .../gcc.c-torture/execute/builtins/mempcpy.c         |  2 +-
 gcc/testsuite/gcc.dg/string-opt-1.c                  |  4 ++--
 13 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 85affa74510..eb038dd45b3 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  bool is_move_done;
+
   /* Copy word part most expediently.  */
+  bool bail_out_libcall = endp == 1
+    && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     bail_out_libcall, &is_move_done);
+
+  /* Bail out when a mempcpy call would be expanded as libcall and when
+     we have a target that provides a fast implementation
+     of mempcpy routine.  */
+  if (!is_move_done)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ef7c818986f..d3fc515845b 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void);
 
 extern bool ix86_using_red_zone (void);
 
+extern enum libc_speed gnu_libc_func_speed (int fn);
+
 #ifdef RTX_CODE
 extern int standard_80387_constant_p (rtx);
 extern const char *standard_80387_constant_opcode (rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index af24c6ec5ba..b3eb50a42f3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2733,6 +2733,23 @@ ix86_using_red_zone (void)
 	  && (!cfun->machine->has_local_indirect_jump
 	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
 }
+
+enum libc_speed
+ix86_libc_func_speed (int fn)
+{
+  enum built_in_function f = (built_in_function)fn;
+
+  if (!OPTION_GLIBC)
+    return UNKNOWN_SPEED;
+
+  switch (f)
+    {
+      case BUILT_IN_MEMPCPY:
+	return FAST_SPEED;
+      default:
+	return UNKNOWN_SPEED;
+    }
+}
 \f
 /* Return a string that documents the current -m options.  The caller is
    responsible for freeing the string.  */
@@ -52061,6 +52078,9 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED ix86_libc_func_speed
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 283b4eb33fe..8123df7ccc5 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -384,6 +384,13 @@ enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+enum libc_speed
+{
+  FAST_SPEED,
+  SLOW_SPEED,
+  UNKNOWN_SPEED
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0f7c91a22c4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5501,6 +5501,10 @@ macro, a reasonable default is used.
 This hook determines whether a function from a class of functions
 @var{fn_class} is present at the runtime.
 @end deftypefn
+@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn})
+This hook determines whether a function from libc has a fast implementation
+@var{fn} is present at the runtime.
+@end deftypefn
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..4bb2998a8a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3933,6 +3933,7 @@ macro, a reasonable default is used.
 @end defmac
 
 @hook TARGET_LIBC_HAS_FUNCTION
+@hook TARGET_LIBC_FUNC_SPEED
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..b6c13652d79 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is set true, do not emit library call and set
+   *IS_MOVE_DONE to false.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,12 +1565,17 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool bail_out_libcall, bool *is_move_done)
 {
   bool may_use_call;
   rtx retval = 0;
   unsigned int align;
 
+  /* When not doing a bail out, we always emit a memory move.  */
+  if (is_move_done)
+    *is_move_done = true;
+
   gcc_assert (size);
   if (CONST_INT_P (size) && INTVAL (size) == 0)
     return 0;
@@ -1625,6 +1632,13 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  if (is_move_done)
+	    *is_move_done = false;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..023bc5aec47 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,9 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool bail_out_libcall = false,
+				  bool *is_move_done = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/target.def b/gcc/target.def
index c5b2a1e7e71..3bbddc82776 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2639,6 +2639,13 @@ DEFHOOK
  bool, (enum function_class fn_class),
  default_libc_has_function)
 
+DEFHOOK
+(libc_func_speed,
+ "This hook determines whether a function from libc has a fast implementation\n\
+@var{fn} is present at the runtime.",
+ libc_speed, (int fn),
+ default_libc_func_speed)
+
 /* True if new jumps cannot be created, to replace existing ones or
    not, at the current point in the compilation.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index fafcc6c5196..af0c87ff9fe 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
   return false;
 }
 
+enum libc_speed
+default_libc_func_speed (int)
+{
+  return UNKNOWN_SPEED;
+}
+
 tree
 default_builtin_tm_load_store (tree ARG_UNUSED (type))
 {
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 8a4393f2ba4..7508673ad0a 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void);
 extern bool default_libc_has_function (enum function_class);
 extern bool no_c99_libc_has_function (enum function_class);
 extern bool gnu_libc_has_function (enum function_class);
+extern enum libc_speed default_libc_func_speed (int);
 
 extern tree default_builtin_tm_load_store (tree);
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
index d82e2232d7b..91e1c87f83f 100644
--- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
@@ -62,7 +62,7 @@ main_test (void)
   mempcpy (p + 5, s3, 1);
   if (memcmp (p, "ABCDEFg", 8))
     abort ();
-  mempcpy (p + 6, s1 + 1, l1);
+  memcpy (p + 6, s1 + 1, l1);
   if (memcmp (p, "ABCDEF2", 8))
     abort ();
 }
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..851c8b04a33 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-* x86_64-*-* } } } } } */
-- 
2.16.2


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-14 12:57           ` Martin Liška
@ 2018-03-14 13:08             ` H.J. Lu
  2018-03-14 13:10               ` Martin Liška
  2018-03-14 13:08             ` Jakub Jelinek
  1 sibling, 1 reply; 45+ messages in thread
From: H.J. Lu @ 2018-03-14 13:08 UTC (permalink / raw)
  To: Martin Liška; +Cc: Jakub Jelinek, GCC Patches, Marc Glisse

On Wed, Mar 14, 2018 at 5:54 AM, Martin Liška <mliska@suse.cz> wrote:
> On 03/13/2018 04:23 PM, Jakub Jelinek wrote:
>> On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin Liška wrote:
>>>> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
>>>> where in particular linux_libc_has_function deals with various C libraries.
>>>> Of course, in this case you need another target hook, that is dependent both
>>>> on the target backend and C library.
>>>>
>>>> It would be nice to make the target hook a little bit more generic as well,
>>>> e.g. pass it enum builtin_function and query if it is fast, slow or
>>>> unknown, or even some kind of cost, where the caller could ask for cost of
>>>> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.
>>>
>>> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook
>>> definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation.
>>> I would like to implement the function in gcc/targhooks.c, but I don't know how to
>>> make ifdef according to target?
>>
>> Put there just the default implementation (everything is UNKNOWN?).
>>
>>> One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the
>>> callback argument with int, that will be casted. One last issue: am I right that I'll have to define
>>> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)?
>>
>> And define the i386/x86_64 glibc one in config/i386/*.h, check there
>> OPTION_GLIBC and only in that case return something other than UNKNOWN.
>>
>> And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case.
>>
>>       Jakub
>>
>
> Hi.
>
> I'm sending V2 that can survive bootstrap and regression tests on both x86_64 and ppc64le.
>
> Martin

diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
index d82e2232d7b..91e1c87f83f 100644
--- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
@@ -62,7 +62,7 @@ main_test (void)
   mempcpy (p + 5, s3, 1);
   if (memcmp (p, "ABCDEFg", 8))
     abort ();
-  mempcpy (p + 6, s1 + 1, l1);
+  memcpy (p + 6, s1 + 1, l1);
   if (memcmp (p, "ABCDEF2", 8))
     abort ();
 }

This is a mempcpy test.  Why is mempcpy changed to memcpy?

-- 
H.J.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-14 13:08             ` H.J. Lu
@ 2018-03-14 13:10               ` Martin Liška
  2018-03-14 13:43                 ` Jakub Jelinek
  0 siblings, 1 reply; 45+ messages in thread
From: Martin Liška @ 2018-03-14 13:10 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jakub Jelinek, GCC Patches, Marc Glisse

On 03/14/2018 01:57 PM, H.J. Lu wrote:
> On Wed, Mar 14, 2018 at 5:54 AM, Martin LiÅ¡ka <mliska@suse.cz> wrote:
>> On 03/13/2018 04:23 PM, Jakub Jelinek wrote:
>>> On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin LiÅ¡ka wrote:
>>>>> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
>>>>> where in particular linux_libc_has_function deals with various C libraries.
>>>>> Of course, in this case you need another target hook, that is dependent both
>>>>> on the target backend and C library.
>>>>>
>>>>> It would be nice to make the target hook a little bit more generic as well,
>>>>> e.g. pass it enum builtin_function and query if it is fast, slow or
>>>>> unknown, or even some kind of cost, where the caller could ask for cost of
>>>>> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.
>>>>
>>>> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook
>>>> definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation.
>>>> I would like to implement the function in gcc/targhooks.c, but I don't know how to
>>>> make ifdef according to target?
>>>
>>> Put there just the default implementation (everything is UNKNOWN?).
>>>
>>>> One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the
>>>> callback argument with int, that will be casted. One last issue: am I right that I'll have to define
>>>> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)?
>>>
>>> And define the i386/x86_64 glibc one in config/i386/*.h, check there
>>> OPTION_GLIBC and only in that case return something other than UNKNOWN.
>>>
>>> And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case.
>>>
>>>       Jakub
>>>
>>
>> Hi.
>>
>> I'm sending V2 that can survive bootstrap and regression tests on both x86_64 and ppc64le.
>>
>> Martin
> 
> diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> index d82e2232d7b..91e1c87f83f 100644
> --- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> @@ -62,7 +62,7 @@ main_test (void)
>    mempcpy (p + 5, s3, 1);
>    if (memcmp (p, "ABCDEFg", 8))
>      abort ();
> -  mempcpy (p + 6, s1 + 1, l1);
> +  memcpy (p + 6, s1 + 1, l1);
>    if (memcmp (p, "ABCDEF2", 8))
>      abort ();
>  }
> 
> This is a mempcpy test.  Why is mempcpy changed to memcpy?
> 

Because this mempcpy is not optimized out to memcpy and it then aborts.
It's proper to leave here mempcpy I believe.

Martin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-14 13:10               ` Martin Liška
@ 2018-03-14 13:43                 ` Jakub Jelinek
  0 siblings, 0 replies; 45+ messages in thread
From: Jakub Jelinek @ 2018-03-14 13:43 UTC (permalink / raw)
  To: Martin Liška; +Cc: H.J. Lu, GCC Patches, Marc Glisse

On Wed, Mar 14, 2018 at 02:08:07PM +0100, Martin LiÅ¡ka wrote:
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> > b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> > index d82e2232d7b..91e1c87f83f 100644
> > --- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> > +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> > @@ -62,7 +62,7 @@ main_test (void)
> >    mempcpy (p + 5, s3, 1);
> >    if (memcmp (p, "ABCDEFg", 8))
> >      abort ();
> > -  mempcpy (p + 6, s1 + 1, l1);
> > +  memcpy (p + 6, s1 + 1, l1);
> >    if (memcmp (p, "ABCDEF2", 8))
> >      abort ();
> >  }
> > 
> > This is a mempcpy test.  Why is mempcpy changed to memcpy?
> > 
> 
> Because this mempcpy is not optimized out to memcpy and it then aborts.

Why it isn't optimized to memcpy?  If the lhs is unused, it always should be
folded to memcpy, regardless of whether mempcpy is fast or not (I assume no
target has slow memcpy and fast mempcpy).

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-14 12:57           ` Martin Liška
  2018-03-14 13:08             ` H.J. Lu
@ 2018-03-14 13:08             ` Jakub Jelinek
  2018-03-14 14:04               ` Martin Liška
  1 sibling, 1 reply; 45+ messages in thread
From: Jakub Jelinek @ 2018-03-14 13:08 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches, Marc Glisse, H.J. Lu

On Wed, Mar 14, 2018 at 01:54:39PM +0100, Martin LiÅ¡ka wrote:
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
>    src_mem = get_memory_rtx (src, len);
>    set_mem_align (src_mem, src_align);
>  
> +  bool is_move_done;
> +
>    /* Copy word part most expediently.  */

This comment supposedly belongs right above the emit_block_move_hints call.

> +  bool bail_out_libcall = endp == 1
> +    && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED;

Formatting.  && belongs below endp.  So either:
  bool bail_out_libcall
    = (endp == 1
       && ...);
or
  bool bail_out_libcall = false;
  if (endp == 1
      && ...)
    bail_out_libcall = true;
?
The variable is not named very well, shouldn't it be avoid_libcall or
something similar?  Perhaps the variable should have a comment describing
what it is.  Do you need separate argument for that bool and
is_move_done, rather than just the flag being that some pointer to bool is
non-NULL?

>    dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
>  				     CALL_EXPR_TAILCALL (exp)
>  				     && (endp == 0 || target == const0_rtx)
>  				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
>  				     expected_align, expected_size,
> -				     min_size, max_size, probable_max_size);
> +				     min_size, max_size, probable_max_size,
> +				     bail_out_libcall, &is_move_done);
> +
> +  /* Bail out when a mempcpy call would be expanded as libcall and when
> +     we have a target that provides a fast implementation
> +     of mempcpy routine.  */
> +  if (!is_move_done)
> +    return NULL_RTX;
>  
>    if (dest_addr == 0)
>      {
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2733,6 +2733,23 @@ ix86_using_red_zone (void)
>  	  && (!cfun->machine->has_local_indirect_jump
>  	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
>  }
> +

Missing function comment here.  For target hooks, usually there is a copy of
the target.def comment, perhaps with further details on why it is overridden
and what it does.
> --- a/gcc/coretypes.h
> +++ b/gcc/coretypes.h
> @@ -384,6 +384,13 @@ enum excess_precision_type
>    EXCESS_PRECISION_TYPE_FAST
>  };
>  

Missing comment describing what it is, plus it the enumerators are too
generic, if it is libc_speed enum, perhaps LIBC_FAST_SPEED etc.?
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
>    return false;
>  }
>  

Again, missing function comment.

> +enum libc_speed
> +default_libc_func_speed (int)
> +{
> +  return UNKNOWN_SPEED;
> +}
> +

> --- a/gcc/testsuite/gcc.dg/string-opt-1.c
> +++ b/gcc/testsuite/gcc.dg/string-opt-1.c
> @@ -48,5 +48,5 @@ main (void)
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
> -/* { dg-final { scan-assembler "memcpy" } } */
> +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-* x86_64-*-* } } } } } */

First of all, I don't really understand this, I'd expect both the
two old dg-final lines to be used as is for non-x86 targets and another two
for x86_64, and more importantly, the target hook is only for glibc, not for
musl/bionic etc., nor for non-linux, so you probably need some effective
target for it for whether it is glibc rather than musl/bionic, and use
...-*-linux* and ...-*-gnu* etc. rather than ...-*-*.  Or tweak the dg-fianl
patterns so that it succeeds on both variants of the target hook, but then
does the test test anything at all?

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-14 13:08             ` Jakub Jelinek
@ 2018-03-14 14:04               ` Martin Liška
  2018-03-18 22:18                 ` Martin Liška
  2018-03-21 10:37                 ` Jakub Jelinek
  0 siblings, 2 replies; 45+ messages in thread
From: Martin Liška @ 2018-03-14 14:04 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Marc Glisse, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 4156 bytes --]

On 03/14/2018 02:07 PM, Jakub Jelinek wrote:
> On Wed, Mar 14, 2018 at 01:54:39PM +0100, Martin LiÅ¡ka wrote:
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
>>    src_mem = get_memory_rtx (src, len);
>>    set_mem_align (src_mem, src_align);
>>  
>> +  bool is_move_done;
>> +
>>    /* Copy word part most expediently.  */
> 
> This comment supposedly belongs right above the emit_block_move_hints call.
> 
>> +  bool bail_out_libcall = endp == 1
>> +    && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED;
> 
> Formatting.  && belongs below endp.  So either:
>   bool bail_out_libcall
>     = (endp == 1
>        && ...);
> or
>   bool bail_out_libcall = false;
>   if (endp == 1
>       && ...)
>     bail_out_libcall = true;
> ?
> The variable is not named very well, shouldn't it be avoid_libcall or
> something similar?  Perhaps the variable should have a comment describing
> what it is.  Do you need separate argument for that bool and
> is_move_done, rather than just the flag being that some pointer to bool is
> non-NULL?

Can you please explain me how to replace the 2 new arguments?

> 
>>    dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
>>  				     CALL_EXPR_TAILCALL (exp)
>>  				     && (endp == 0 || target == const0_rtx)
>>  				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
>>  				     expected_align, expected_size,
>> -				     min_size, max_size, probable_max_size);
>> +				     min_size, max_size, probable_max_size,
>> +				     bail_out_libcall, &is_move_done);
>> +
>> +  /* Bail out when a mempcpy call would be expanded as libcall and when
>> +     we have a target that provides a fast implementation
>> +     of mempcpy routine.  */
>> +  if (!is_move_done)
>> +    return NULL_RTX;
>>  
>>    if (dest_addr == 0)
>>      {
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -2733,6 +2733,23 @@ ix86_using_red_zone (void)
>>  	  && (!cfun->machine->has_local_indirect_jump
>>  	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
>>  }
>> +
> 
> Missing function comment here.  For target hooks, usually there is a copy of
> the target.def comment, perhaps with further details on why it is overridden
> and what it does.
>> --- a/gcc/coretypes.h
>> +++ b/gcc/coretypes.h
>> @@ -384,6 +384,13 @@ enum excess_precision_type
>>    EXCESS_PRECISION_TYPE_FAST
>>  };
>>  
> 
> Missing comment describing what it is, plus it the enumerators are too
> generic, if it is libc_speed enum, perhaps LIBC_FAST_SPEED etc.?
>> --- a/gcc/targhooks.c
>> +++ b/gcc/targhooks.c
>> @@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
>>    return false;
>>  }
>>  
> 
> Again, missing function comment.
> 
>> +enum libc_speed
>> +default_libc_func_speed (int)
>> +{
>> +  return UNKNOWN_SPEED;
>> +}
>> +
> 
>> --- a/gcc/testsuite/gcc.dg/string-opt-1.c
>> +++ b/gcc/testsuite/gcc.dg/string-opt-1.c
>> @@ -48,5 +48,5 @@ main (void)
>>    return 0;
>>  }
>>  
>> -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
>> -/* { dg-final { scan-assembler "memcpy" } } */
>> +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */
>> +/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-* x86_64-*-* } } } } } */
> 
> First of all, I don't really understand this, I'd expect both the
> two old dg-final lines to be used as is for non-x86 targets and another two
> for x86_64, and more importantly, the target hook is only for glibc, not for
> musl/bionic etc., nor for non-linux, so you probably need some effective
> target for it for whether it is glibc rather than musl/bionic, and use
> ...-*-linux* and ...-*-gnu* etc. rather than ...-*-*.  Or tweak the dg-fianl
> patterns so that it succeeds on both variants of the target hook, but then
> does the test test anything at all?

I fixed that by preserving the old 2 old-finals and then I added a new for x86_64 target.
Apart from the comment above I've fixed all nits and mempcpy is not used when LHS == NULL.

Martin

> 
> 	Jakub
> 


[-- Attachment #2: 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch --]
[-- Type: text/x-patch, Size: 11017 bytes --]

From 26979038ce9500015f957afd896146022a38490b Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/i386/i386-protos.h (gnu_libc_func_speed): New.
	* config/i386/i386.c (enum libc_speed): Likewise.
	(ix86_libc_func_speed): Likewise.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c                      | 16 +++++++++++++++-
 gcc/config/i386/i386-protos.h       |  2 ++
 gcc/config/i386/i386.c              | 24 ++++++++++++++++++++++++
 gcc/coretypes.h                     |  7 +++++++
 gcc/doc/tm.texi                     |  4 ++++
 gcc/doc/tm.texi.in                  |  1 +
 gcc/expr.c                          | 16 +++++++++++++++-
 gcc/expr.h                          |  4 +++-
 gcc/target.def                      |  7 +++++++
 gcc/targhooks.c                     |  9 +++++++++
 gcc/targhooks.h                     |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 +++--
 12 files changed, 91 insertions(+), 5 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 85affa74510..b668045374c 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,27 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  bool is_move_done;
+
+  /* emit_block_move_hints can generate a library call to memcpy function.
+     In situations when a libc library provides fast implementation
+     of mempcpy, then it's better to call mempcpy directly.  */
+  bool avoid_libcall
+    = (endp == 1
+       && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED
+       && target != const0_rtx);
+
   /* Copy word part most expediently.  */
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     avoid_libcall, &is_move_done);
+
+  if (!is_move_done)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ef7c818986f..d3fc515845b 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void);
 
 extern bool ix86_using_red_zone (void);
 
+extern enum libc_speed gnu_libc_func_speed (int fn);
+
 #ifdef RTX_CODE
 extern int standard_80387_constant_p (rtx);
 extern const char *standard_80387_constant_opcode (rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index af24c6ec5ba..f4cbecdf099 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2733,6 +2733,27 @@ ix86_using_red_zone (void)
 	  && (!cfun->machine->has_local_indirect_jump
 	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
 }
+
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  We override it for i386 and glibc C library
+   as this combination provides fast implementation of mempcpy function.  */
+
+enum libc_speed
+ix86_libc_func_speed (int fn)
+{
+  enum built_in_function f = (built_in_function)fn;
+
+  if (!OPTION_GLIBC)
+    return LIBC_UNKNOWN_SPEED;
+
+  switch (f)
+    {
+      case BUILT_IN_MEMPCPY:
+	return LIBC_FAST_SPEED;
+      default:
+	return LIBC_UNKNOWN_SPEED;
+    }
+}
 \f
 /* Return a string that documents the current -m options.  The caller is
    responsible for freeing the string.  */
@@ -52061,6 +52082,9 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED ix86_libc_func_speed
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 283b4eb33fe..fe618f708f4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -384,6 +384,13 @@ enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+enum libc_speed
+{
+  LIBC_FAST_SPEED,
+  LIBC_SLOW_SPEED,
+  LIBC_UNKNOWN_SPEED
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0f7c91a22c4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5501,6 +5501,10 @@ macro, a reasonable default is used.
 This hook determines whether a function from a class of functions
 @var{fn_class} is present at the runtime.
 @end deftypefn
+@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn})
+This hook determines whether a function from libc has a fast implementation
+@var{fn} is present at the runtime.
+@end deftypefn
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..4bb2998a8a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3933,6 +3933,7 @@ macro, a reasonable default is used.
 @end defmac
 
 @hook TARGET_LIBC_HAS_FUNCTION
+@hook TARGET_LIBC_FUNC_SPEED
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..b6c13652d79 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is set true, do not emit library call and set
+   *IS_MOVE_DONE to false.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,12 +1565,17 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool bail_out_libcall, bool *is_move_done)
 {
   bool may_use_call;
   rtx retval = 0;
   unsigned int align;
 
+  /* When not doing a bail out, we always emit a memory move.  */
+  if (is_move_done)
+    *is_move_done = true;
+
   gcc_assert (size);
   if (CONST_INT_P (size) && INTVAL (size) == 0)
     return 0;
@@ -1625,6 +1632,13 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  if (is_move_done)
+	    *is_move_done = false;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..023bc5aec47 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,9 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool bail_out_libcall = false,
+				  bool *is_move_done = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/target.def b/gcc/target.def
index c5b2a1e7e71..3bbddc82776 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2639,6 +2639,13 @@ DEFHOOK
  bool, (enum function_class fn_class),
  default_libc_has_function)
 
+DEFHOOK
+(libc_func_speed,
+ "This hook determines whether a function from libc has a fast implementation\n\
+@var{fn} is present at the runtime.",
+ libc_speed, (int fn),
+ default_libc_func_speed)
+
 /* True if new jumps cannot be created, to replace existing ones or
    not, at the current point in the compilation.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index fafcc6c5196..6e44f6f79cf 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
   return false;
 }
 
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  */
+
+enum libc_speed
+default_libc_func_speed (int)
+{
+  return LIBC_UNKNOWN_SPEED;
+}
+
 tree
 default_builtin_tm_load_store (tree ARG_UNUSED (type))
 {
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 8a4393f2ba4..7508673ad0a 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void);
 extern bool default_libc_has_function (enum function_class);
 extern bool no_c99_libc_has_function (enum function_class);
 extern bool gnu_libc_has_function (enum function_class);
+extern enum libc_speed default_libc_func_speed (int);
 
 extern tree default_builtin_tm_load_store (tree);
 
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..7faaadcbb1f 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "mempcpy" { target  { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */
-- 
2.16.2


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-14 14:04               ` Martin Liška
@ 2018-03-18 22:18                 ` Martin Liška
  2018-03-21 10:37                 ` Jakub Jelinek
  1 sibling, 0 replies; 45+ messages in thread
From: Martin Liška @ 2018-03-18 22:18 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Marc Glisse, H.J. Lu

PING^1

On 03/14/2018 02:54 PM, Martin LiÅ¡ka wrote:
> On 03/14/2018 02:07 PM, Jakub Jelinek wrote:
>> On Wed, Mar 14, 2018 at 01:54:39PM +0100, Martin LiÅ¡ka wrote:
>>> --- a/gcc/builtins.c
>>> +++ b/gcc/builtins.c
>>> @@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
>>>     src_mem = get_memory_rtx (src, len);
>>>     set_mem_align (src_mem, src_align);
>>>   
>>> +  bool is_move_done;
>>> +
>>>     /* Copy word part most expediently.  */
>>
>> This comment supposedly belongs right above the emit_block_move_hints call.
>>
>>> +  bool bail_out_libcall = endp == 1
>>> +    && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED;
>>
>> Formatting.  && belongs below endp.  So either:
>>    bool bail_out_libcall
>>      = (endp == 1
>>         && ...);
>> or
>>    bool bail_out_libcall = false;
>>    if (endp == 1
>>        && ...)
>>      bail_out_libcall = true;
>> ?
>> The variable is not named very well, shouldn't it be avoid_libcall or
>> something similar?  Perhaps the variable should have a comment describing
>> what it is.  Do you need separate argument for that bool and
>> is_move_done, rather than just the flag being that some pointer to bool is
>> non-NULL?
> 
> Can you please explain me how to replace the 2 new arguments?
> 
>>
>>>     dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
>>>   				     CALL_EXPR_TAILCALL (exp)
>>>   				     && (endp == 0 || target == const0_rtx)
>>>   				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
>>>   				     expected_align, expected_size,
>>> -				     min_size, max_size, probable_max_size);
>>> +				     min_size, max_size, probable_max_size,
>>> +				     bail_out_libcall, &is_move_done);
>>> +
>>> +  /* Bail out when a mempcpy call would be expanded as libcall and when
>>> +     we have a target that provides a fast implementation
>>> +     of mempcpy routine.  */
>>> +  if (!is_move_done)
>>> +    return NULL_RTX;
>>>   
>>>     if (dest_addr == 0)
>>>       {
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -2733,6 +2733,23 @@ ix86_using_red_zone (void)
>>>   	  && (!cfun->machine->has_local_indirect_jump
>>>   	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
>>>   }
>>> +
>>
>> Missing function comment here.  For target hooks, usually there is a copy of
>> the target.def comment, perhaps with further details on why it is overridden
>> and what it does.
>>> --- a/gcc/coretypes.h
>>> +++ b/gcc/coretypes.h
>>> @@ -384,6 +384,13 @@ enum excess_precision_type
>>>     EXCESS_PRECISION_TYPE_FAST
>>>   };
>>>   
>>
>> Missing comment describing what it is, plus it the enumerators are too
>> generic, if it is libc_speed enum, perhaps LIBC_FAST_SPEED etc.?
>>> --- a/gcc/targhooks.c
>>> +++ b/gcc/targhooks.c
>>> @@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
>>>     return false;
>>>   }
>>>   
>>
>> Again, missing function comment.
>>
>>> +enum libc_speed
>>> +default_libc_func_speed (int)
>>> +{
>>> +  return UNKNOWN_SPEED;
>>> +}
>>> +
>>
>>> --- a/gcc/testsuite/gcc.dg/string-opt-1.c
>>> +++ b/gcc/testsuite/gcc.dg/string-opt-1.c
>>> @@ -48,5 +48,5 @@ main (void)
>>>     return 0;
>>>   }
>>>   
>>> -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
>>> -/* { dg-final { scan-assembler "memcpy" } } */
>>> +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */
>>> +/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-* x86_64-*-* } } } } } */
>>
>> First of all, I don't really understand this, I'd expect both the
>> two old dg-final lines to be used as is for non-x86 targets and another two
>> for x86_64, and more importantly, the target hook is only for glibc, not for
>> musl/bionic etc., nor for non-linux, so you probably need some effective
>> target for it for whether it is glibc rather than musl/bionic, and use
>> ...-*-linux* and ...-*-gnu* etc. rather than ...-*-*.  Or tweak the dg-fianl
>> patterns so that it succeeds on both variants of the target hook, but then
>> does the test test anything at all?
> 
> I fixed that by preserving the old 2 old-finals and then I added a new for x86_64 target.
> Apart from the comment above I've fixed all nits and mempcpy is not used when LHS == NULL.
> 
> Martin
> 
>>
>> 	Jakub
>>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-14 14:04               ` Martin Liška
  2018-03-18 22:18                 ` Martin Liška
@ 2018-03-21 10:37                 ` Jakub Jelinek
  2018-03-28 14:29                   ` Martin Liška
  1 sibling, 1 reply; 45+ messages in thread
From: Jakub Jelinek @ 2018-03-21 10:37 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches, Marc Glisse, H.J. Lu

On Wed, Mar 14, 2018 at 02:54:00PM +0100, Martin LiÅ¡ka wrote:
> > The variable is not named very well, shouldn't it be avoid_libcall or
> > something similar?  Perhaps the variable should have a comment describing
> > what it is.  Do you need separate argument for that bool and
> > is_move_done, rather than just the flag being that some pointer to bool is
> > non-NULL?
> 
> Can you please explain me how to replace the 2 new arguments?

So you have one bool arg and one pointer arg into which the function
stores true and optionally based on that other bool arg and other conditions
stores false.
I'm suggesting just one bool * argument, which is NULL if the bool arg would
be false, and non-NULL otherwise.  The single argument still could be
called bool *avoid_libcall, and you'd just
if (avoid_libcall) { *avoid_libcall = true; return retval; }
instead of emitting a libcall, the caller would initialize the bool
variable to false.

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-21 10:37                 ` Jakub Jelinek
@ 2018-03-28 14:29                   ` Martin Liška
  2018-03-28 14:35                     ` Jakub Jelinek
  0 siblings, 1 reply; 45+ messages in thread
From: Martin Liška @ 2018-03-28 14:29 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Marc Glisse, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 1166 bytes --]

On 03/21/2018 11:34 AM, Jakub Jelinek wrote:
> On Wed, Mar 14, 2018 at 02:54:00PM +0100, Martin LiÅ¡ka wrote:
>>> The variable is not named very well, shouldn't it be avoid_libcall or
>>> something similar?  Perhaps the variable should have a comment describing
>>> what it is.  Do you need separate argument for that bool and
>>> is_move_done, rather than just the flag being that some pointer to bool is
>>> non-NULL?
>>
>> Can you please explain me how to replace the 2 new arguments?
> 
> So you have one bool arg and one pointer arg into which the function
> stores true and optionally based on that other bool arg and other conditions
> stores false.
> I'm suggesting just one bool * argument, which is NULL if the bool arg would
> be false, and non-NULL otherwise.  The single argument still could be
> called bool *avoid_libcall, and you'd just
> if (avoid_libcall) { *avoid_libcall = true; return retval; }
> instead of emitting a libcall, the caller would initialize the bool
> variable to false.
> 
> 	Jakub
> 

Got it. I'm sending updated version of the patch. Hope I've had enough fantasy to
write it nice.

Tested on both ppc64le and x86_64.

Martin

[-- Attachment #2: 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch --]
[-- Type: text/x-patch, Size: 10749 bytes --]

From d766330364aa2a23512f7d4e60491b634c5d0523 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/i386/i386-protos.h (gnu_libc_func_speed): New.
	* config/i386/i386.c (enum libc_speed): Likewise.
	(ix86_libc_func_speed): Likewise.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c                      | 15 ++++++++++++++-
 gcc/config/i386/i386-protos.h       |  2 ++
 gcc/config/i386/i386.c              | 24 ++++++++++++++++++++++++
 gcc/coretypes.h                     |  7 +++++++
 gcc/doc/tm.texi                     |  4 ++++
 gcc/doc/tm.texi.in                  |  1 +
 gcc/expr.c                          | 11 ++++++++++-
 gcc/expr.h                          |  3 ++-
 gcc/target.def                      |  7 +++++++
 gcc/targhooks.c                     |  9 +++++++++
 gcc/targhooks.h                     |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 +++--
 12 files changed, 84 insertions(+), 5 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 487d9d58db2..d3cd93ffbfa 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  /* emit_block_move_hints can generate a library call to memcpy function.
+     In situations when a libc library provides fast implementation
+     of mempcpy, then it's better to call mempcpy directly.  */
+  bool avoid_libcall
+    = (endp == 1
+       && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED
+       && target != const0_rtx);
+
   /* Copy word part most expediently.  */
+  bool libcall_avoided = false;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     avoid_libcall ? &libcall_avoided: NULL);
+
+  if (libcall_avoided)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ef7c818986f..d3fc515845b 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void);
 
 extern bool ix86_using_red_zone (void);
 
+extern enum libc_speed gnu_libc_func_speed (int fn);
+
 #ifdef RTX_CODE
 extern int standard_80387_constant_p (rtx);
 extern const char *standard_80387_constant_opcode (rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b4f6aec1434..aadf9fa5ac3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2735,6 +2735,27 @@ ix86_using_red_zone (void)
 	  && (!cfun->machine->has_local_indirect_jump
 	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
 }
+
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  We override it for i386 and glibc C library
+   as this combination provides fast implementation of mempcpy function.  */
+
+enum libc_speed
+ix86_libc_func_speed (int fn)
+{
+  enum built_in_function f = (built_in_function)fn;
+
+  if (!OPTION_GLIBC)
+    return LIBC_UNKNOWN_SPEED;
+
+  switch (f)
+    {
+      case BUILT_IN_MEMPCPY:
+	return LIBC_FAST_SPEED;
+      default:
+	return LIBC_UNKNOWN_SPEED;
+    }
+}
 \f
 /* Return a string that documents the current -m options.  The caller is
    responsible for freeing the string.  */
@@ -52105,6 +52126,9 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED ix86_libc_func_speed
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 283b4eb33fe..fe618f708f4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -384,6 +384,13 @@ enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+enum libc_speed
+{
+  LIBC_FAST_SPEED,
+  LIBC_SLOW_SPEED,
+  LIBC_UNKNOWN_SPEED
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0f7c91a22c4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5501,6 +5501,10 @@ macro, a reasonable default is used.
 This hook determines whether a function from a class of functions
 @var{fn_class} is present at the runtime.
 @end deftypefn
+@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn})
+This hook determines whether a function from libc has a fast implementation
+@var{fn} is present at the runtime.
+@end deftypefn
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..4bb2998a8a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3933,6 +3933,7 @@ macro, a reasonable default is used.
 @end defmac
 
 @hook TARGET_LIBC_HAS_FUNCTION
+@hook TARGET_LIBC_FUNC_SPEED
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..f3bd698bc4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is non-null, do not emit library call and assign
+   true to the pointer when move is not done.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,7 +1565,8 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool *bail_out_libcall)
 {
   bool may_use_call;
   rtx retval = 0;
@@ -1625,6 +1628,12 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  *bail_out_libcall = true;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..c2bf87fd14e 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,8 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool *bail_out_libcall = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/target.def b/gcc/target.def
index c5b2a1e7e71..3bbddc82776 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2639,6 +2639,13 @@ DEFHOOK
  bool, (enum function_class fn_class),
  default_libc_has_function)
 
+DEFHOOK
+(libc_func_speed,
+ "This hook determines whether a function from libc has a fast implementation\n\
+@var{fn} is present at the runtime.",
+ libc_speed, (int fn),
+ default_libc_func_speed)
+
 /* True if new jumps cannot be created, to replace existing ones or
    not, at the current point in the compilation.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index fafcc6c5196..6e44f6f79cf 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
   return false;
 }
 
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  */
+
+enum libc_speed
+default_libc_func_speed (int)
+{
+  return LIBC_UNKNOWN_SPEED;
+}
+
 tree
 default_builtin_tm_load_store (tree ARG_UNUSED (type))
 {
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 8a4393f2ba4..7508673ad0a 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void);
 extern bool default_libc_has_function (enum function_class);
 extern bool no_c99_libc_has_function (enum function_class);
 extern bool gnu_libc_has_function (enum function_class);
+extern enum libc_speed default_libc_func_speed (int);
 
 extern tree default_builtin_tm_load_store (tree);
 
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..7faaadcbb1f 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "mempcpy" { target  { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */
-- 
2.16.2


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-28 14:29                   ` Martin Liška
@ 2018-03-28 14:35                     ` Jakub Jelinek
  2018-03-28 17:59                       ` Martin Liška
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Jelinek @ 2018-03-28 14:35 UTC (permalink / raw)
  To: Martin Liška, Richard Biener, Uros Bizjak
  Cc: gcc-patches, Marc Glisse, H.J. Lu

On Wed, Mar 28, 2018 at 04:18:45PM +0200, Martin LiÅ¡ka wrote:
> 2018-03-14  Martin Liska  <mliska@suse.cz>
> 
> 	PR middle-end/81657
> 	* builtins.c (expand_builtin_memory_copy_args): Handle situation
> 	when libc library provides a fast mempcpy implementation/
> 	* config/i386/i386-protos.h (gnu_libc_func_speed): New.
> 	* config/i386/i386.c (enum libc_speed): Likewise.
> 	(ix86_libc_func_speed): Likewise.
> 	(TARGET_LIBC_FUNC_SPEED): Likewise.
> 	* coretypes.h (enum libc_speed): Likewise.
> 	* doc/tm.texi: Document new target hook.
> 	* doc/tm.texi.in: Likewise.
> 	* expr.c (emit_block_move_hints): Handle libc bail out argument.
> 	* expr.h (emit_block_move_hints): Add new parameters.
> 	* target.def: Add new hook.
> 	* targhooks.c (enum libc_speed): New enum.
> 	(default_libc_func_speed): Provide a default hook
> 	implementation.
> 	* targhooks.h (default_libc_func_speed): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-03-14  Martin Liska  <mliska@suse.cz>
> 
> 	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
> 	and others.
>    /* Copy word part most expediently.  */
> +  bool libcall_avoided = false;
>    dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
>  				     CALL_EXPR_TAILCALL (exp)
>  				     && (endp == 0 || target == const0_rtx)
>  				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
>  				     expected_align, expected_size,
> -				     min_size, max_size, probable_max_size);
> +				     min_size, max_size, probable_max_size,
> +				     avoid_libcall ? &libcall_avoided: NULL);

Missing space before :.
> +
> +  if (libcall_avoided)
> +    return NULL_RTX;
>  
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void);
>  
>  extern bool ix86_using_red_zone (void);
>  
> +extern enum libc_speed gnu_libc_func_speed (int fn);

Here you declare gnu_libc_func_speed.

> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2735,6 +2735,27 @@ ix86_using_red_zone (void)
>  	  && (!cfun->machine->has_local_indirect_jump
>  	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
>  }
> +
> +/* This hook determines whether a function from libc has a fast implementation
> +   FN is present at the runtime.  We override it for i386 and glibc C library
> +   as this combination provides fast implementation of mempcpy function.  */
> +
> +enum libc_speed
> +ix86_libc_func_speed (int fn)

But define a different function.

> +{
> +  enum built_in_function f = (built_in_function)fn;
> +
> +  if (!OPTION_GLIBC)
> +    return LIBC_UNKNOWN_SPEED;

OPTION_GLIBC is only defined if linux.h is included, so I think you break
all other x86 targets this way.  The hook should have linux in the name,
perhaps defined only in config/i386/linux.h and redefined in i386.c through
#ifdef SUBTARGET_LIBC_FUNC_SPEED
#undef TARGET_LIBC_FUNC_SPEED
#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
#endif
or something similar.

Otherwise LGTM, but please get approval also from Richi (for the generic part) and
Uros (for the backend side).

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-28 14:35                     ` Jakub Jelinek
@ 2018-03-28 17:59                       ` Martin Liška
  2018-03-28 18:35                         ` Jakub Jelinek
  0 siblings, 1 reply; 45+ messages in thread
From: Martin Liška @ 2018-03-28 17:59 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener, Uros Bizjak
  Cc: gcc-patches, Marc Glisse, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 3572 bytes --]

On 03/28/2018 04:31 PM, Jakub Jelinek wrote:
> On Wed, Mar 28, 2018 at 04:18:45PM +0200, Martin LiÅ¡ka wrote:
>> 2018-03-14  Martin Liska  <mliska@suse.cz>
>>
>> 	PR middle-end/81657
>> 	* builtins.c (expand_builtin_memory_copy_args): Handle situation
>> 	when libc library provides a fast mempcpy implementation/
>> 	* config/i386/i386-protos.h (gnu_libc_func_speed): New.
>> 	* config/i386/i386.c (enum libc_speed): Likewise.
>> 	(ix86_libc_func_speed): Likewise.
>> 	(TARGET_LIBC_FUNC_SPEED): Likewise.
>> 	* coretypes.h (enum libc_speed): Likewise.
>> 	* doc/tm.texi: Document new target hook.
>> 	* doc/tm.texi.in: Likewise.
>> 	* expr.c (emit_block_move_hints): Handle libc bail out argument.
>> 	* expr.h (emit_block_move_hints): Add new parameters.
>> 	* target.def: Add new hook.
>> 	* targhooks.c (enum libc_speed): New enum.
>> 	(default_libc_func_speed): Provide a default hook
>> 	implementation.
>> 	* targhooks.h (default_libc_func_speed): Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2018-03-14  Martin Liska  <mliska@suse.cz>
>>
>> 	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
>> 	and others.
>>     /* Copy word part most expediently.  */
>> +  bool libcall_avoided = false;
>>     dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
>>   				     CALL_EXPR_TAILCALL (exp)
>>   				     && (endp == 0 || target == const0_rtx)
>>   				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
>>   				     expected_align, expected_size,
>> -				     min_size, max_size, probable_max_size);
>> +				     min_size, max_size, probable_max_size,
>> +				     avoid_libcall ? &libcall_avoided: NULL);
> 
> Missing space before :.

Fixed.

>> +
>> +  if (libcall_avoided)
>> +    return NULL_RTX;
>>   
>> --- a/gcc/config/i386/i386-protos.h
>> +++ b/gcc/config/i386/i386-protos.h
>> @@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void);
>>   
>>   extern bool ix86_using_red_zone (void);
>>   
>> +extern enum libc_speed gnu_libc_func_speed (int fn);
> 
> Here you declare gnu_libc_func_speed.
> 
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -2735,6 +2735,27 @@ ix86_using_red_zone (void)
>>   	  && (!cfun->machine->has_local_indirect_jump
>>   	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
>>   }
>> +
>> +/* This hook determines whether a function from libc has a fast implementation
>> +   FN is present at the runtime.  We override it for i386 and glibc C library
>> +   as this combination provides fast implementation of mempcpy function.  */
>> +
>> +enum libc_speed
>> +ix86_libc_func_speed (int fn)
> 
> But define a different function.

You are right.

> 
>> +{
>> +  enum built_in_function f = (built_in_function)fn;
>> +
>> +  if (!OPTION_GLIBC)
>> +    return LIBC_UNKNOWN_SPEED;
> 
> OPTION_GLIBC is only defined if linux.h is included, so I think you break
> all other x86 targets this way.

That said the proper way is probably to define the function in linux64.c
(should we also use it on i386, or nobody cares?). And corresponding declaration
will be in config/linux-protos.h.

The hook should have linux in the name,
> perhaps defined only in config/i386/linux.h and redefined in i386.c through
> #ifdef SUBTARGET_LIBC_FUNC_SPEED
> #undef TARGET_LIBC_FUNC_SPEED
> #define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
> #endif
> or something similar.

I adopted this suggested mechanism.
Thanks for review.

Martin

> 
> Otherwise LGTM, but please get approval also from Richi (for the generic part) and
> Uros (for the backend side).
> 
> 	Jakub
> 


[-- Attachment #2: 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch --]
[-- Type: text/x-patch, Size: 11379 bytes --]

From 4378fa93aafac3d28eaae48c9a41a2f2ef5e1d18 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/linux-protos.h (ix86_linux_libc_func_speed): New.
	* config/linux.c (ix86_linux_libc_func_speed): Likewise.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/linux64.h (SUBTARGET_LIBC_FUNC_SPEED): Define
	macro.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-28  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c:

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c                      | 15 ++++++++++++++-
 gcc/config/i386/i386.c              |  5 +++++
 gcc/config/i386/linux64.h           |  2 ++
 gcc/config/linux-protos.h           |  1 +
 gcc/config/linux.c                  | 21 +++++++++++++++++++++
 gcc/coretypes.h                     |  7 +++++++
 gcc/doc/tm.texi                     |  4 ++++
 gcc/doc/tm.texi.in                  |  1 +
 gcc/expr.c                          | 11 ++++++++++-
 gcc/expr.h                          |  3 ++-
 gcc/target.def                      |  7 +++++++
 gcc/targhooks.c                     |  9 +++++++++
 gcc/targhooks.h                     |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 +++--
 14 files changed, 87 insertions(+), 5 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 487d9d58db2..98ee3fb272d 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  /* emit_block_move_hints can generate a library call to memcpy function.
+     In situations when a libc library provides fast implementation
+     of mempcpy, then it's better to call mempcpy directly.  */
+  bool avoid_libcall
+    = (endp == 1
+       && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED
+       && target != const0_rtx);
+
   /* Copy word part most expediently.  */
+  bool libcall_avoided = false;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     avoid_libcall ? &libcall_avoided : NULL);
+
+  if (libcall_avoided)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b4f6aec1434..2471ff7b99a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -52105,6 +52105,11 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#ifdef SUBTARGET_LIBC_FUNC_SPEED
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
+#endif
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
index f2d913e30ac..d855f5cc239 100644
--- a/gcc/config/i386/linux64.h
+++ b/gcc/config/i386/linux64.h
@@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
 #undef MUSL_DYNAMIC_LINKERX32
 #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"
+
+#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
diff --git a/gcc/config/linux-protos.h b/gcc/config/linux-protos.h
index 9da8dd7ecaa..b7284735366 100644
--- a/gcc/config/linux-protos.h
+++ b/gcc/config/linux-protos.h
@@ -20,3 +20,4 @@ along with GCC; see the file COPYING3.  If not see
 extern bool linux_has_ifunc_p (void);
 
 extern bool linux_libc_has_function (enum function_class fn_class);
+extern enum libc_speed ix86_linux_libc_func_speed (int fn);
diff --git a/gcc/config/linux.c b/gcc/config/linux.c
index bc06e54a73d..b878d51e4b4 100644
--- a/gcc/config/linux.c
+++ b/gcc/config/linux.c
@@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
 
   return false;
 }
+
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  We override it for i386 and glibc C library
+   as this combination provides fast implementation of mempcpy function.  */
+
+enum libc_speed
+ix86_linux_libc_func_speed (int fn)
+{
+  enum built_in_function f = (built_in_function)fn;
+
+  if (!OPTION_GLIBC)
+    return LIBC_UNKNOWN_SPEED;
+
+  switch (f)
+    {
+      case BUILT_IN_MEMPCPY:
+	return LIBC_FAST_SPEED;
+      default:
+	return LIBC_UNKNOWN_SPEED;
+    }
+}
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 283b4eb33fe..fe618f708f4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -384,6 +384,13 @@ enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+enum libc_speed
+{
+  LIBC_FAST_SPEED,
+  LIBC_SLOW_SPEED,
+  LIBC_UNKNOWN_SPEED
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0f7c91a22c4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5501,6 +5501,10 @@ macro, a reasonable default is used.
 This hook determines whether a function from a class of functions
 @var{fn_class} is present at the runtime.
 @end deftypefn
+@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn})
+This hook determines whether a function from libc has a fast implementation
+@var{fn} is present at the runtime.
+@end deftypefn
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..4bb2998a8a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3933,6 +3933,7 @@ macro, a reasonable default is used.
 @end defmac
 
 @hook TARGET_LIBC_HAS_FUNCTION
+@hook TARGET_LIBC_FUNC_SPEED
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..f3bd698bc4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is non-null, do not emit library call and assign
+   true to the pointer when move is not done.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,7 +1565,8 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool *bail_out_libcall)
 {
   bool may_use_call;
   rtx retval = 0;
@@ -1625,6 +1628,12 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  *bail_out_libcall = true;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..c2bf87fd14e 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,8 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool *bail_out_libcall = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/target.def b/gcc/target.def
index c5b2a1e7e71..3bbddc82776 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2639,6 +2639,13 @@ DEFHOOK
  bool, (enum function_class fn_class),
  default_libc_has_function)
 
+DEFHOOK
+(libc_func_speed,
+ "This hook determines whether a function from libc has a fast implementation\n\
+@var{fn} is present at the runtime.",
+ libc_speed, (int fn),
+ default_libc_func_speed)
+
 /* True if new jumps cannot be created, to replace existing ones or
    not, at the current point in the compilation.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index fafcc6c5196..6e44f6f79cf 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
   return false;
 }
 
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  */
+
+enum libc_speed
+default_libc_func_speed (int)
+{
+  return LIBC_UNKNOWN_SPEED;
+}
+
 tree
 default_builtin_tm_load_store (tree ARG_UNUSED (type))
 {
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 8a4393f2ba4..7508673ad0a 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void);
 extern bool default_libc_has_function (enum function_class);
 extern bool no_c99_libc_has_function (enum function_class);
 extern bool gnu_libc_has_function (enum function_class);
+extern enum libc_speed default_libc_func_speed (int);
 
 extern tree default_builtin_tm_load_store (tree);
 
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..7faaadcbb1f 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "mempcpy" { target  { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */
-- 
2.16.2


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-28 17:59                       ` Martin Liška
@ 2018-03-28 18:35                         ` Jakub Jelinek
  2018-03-29 12:14                           ` Martin Liška
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Jelinek @ 2018-03-28 18:35 UTC (permalink / raw)
  To: Martin Liška
  Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu

On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin LiÅ¡ka wrote:
> --- a/gcc/config/linux.c
> +++ b/gcc/config/linux.c
> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
>  
>    return false;
>  }
> +
> +/* This hook determines whether a function from libc has a fast implementation
> +   FN is present at the runtime.  We override it for i386 and glibc C library
> +   as this combination provides fast implementation of mempcpy function.  */
> +
> +enum libc_speed
> +ix86_linux_libc_func_speed (int fn)

Putting a ix86_ function into config/linux.c used by most linux targets is
weird.  Either we multiple linux targets with mempcpy fast, then name it
somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
And yes, we do care about i?86-linux.  Or it is for x86 only, and then
it shouldn't be in config/linux.c, but either e.g. static inline in
config/i386/linux.h, or we need config/i386/linux.c if we don't have it
already.

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-28 18:35                         ` Jakub Jelinek
@ 2018-03-29 12:14                           ` Martin Liška
  2018-03-29 12:31                             ` H.J. Lu
  2018-03-29 12:43                             ` Jakub Jelinek
  0 siblings, 2 replies; 45+ messages in thread
From: Martin Liška @ 2018-03-29 12:14 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu

On 03/28/2018 06:36 PM, Jakub Jelinek wrote:
> On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin LiÅ¡ka wrote:
>> --- a/gcc/config/linux.c
>> +++ b/gcc/config/linux.c
>> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
>>   
>>     return false;
>>   }
>> +
>> +/* This hook determines whether a function from libc has a fast implementation
>> +   FN is present at the runtime.  We override it for i386 and glibc C library
>> +   as this combination provides fast implementation of mempcpy function.  */
>> +
>> +enum libc_speed
>> +ix86_linux_libc_func_speed (int fn)
> 
> Putting a ix86_ function into config/linux.c used by most linux targets is
> weird.  Either we multiple linux targets with mempcpy fast, then name it
> somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
> And yes, we do care about i?86-linux.  Or it is for x86 only, and then
> it shouldn't be in config/linux.c, but either e.g. static inline in
> config/i386/linux.h, or we need config/i386/linux.c if we don't have it
> already.

I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you please
help me how to conditionally build the file?

Martin

> 
> 	Jakub
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-29 12:14                           ` Martin Liška
@ 2018-03-29 12:31                             ` H.J. Lu
  2018-03-29 12:43                             ` Jakub Jelinek
  1 sibling, 0 replies; 45+ messages in thread
From: H.J. Lu @ 2018-03-29 12:31 UTC (permalink / raw)
  To: Martin Liška
  Cc: Jakub Jelinek, Richard Biener, Uros Bizjak, GCC Patches, Marc Glisse

On Thu, Mar 29, 2018 at 4:28 AM, Martin Liška <mliska@suse.cz> wrote:
> On 03/28/2018 06:36 PM, Jakub Jelinek wrote:
>>
>> On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote:
>>>
>>> --- a/gcc/config/linux.c
>>> +++ b/gcc/config/linux.c
>>> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
>>>       return false;
>>>   }
>>> +
>>> +/* This hook determines whether a function from libc has a fast
>>> implementation
>>> +   FN is present at the runtime.  We override it for i386 and glibc C
>>> library
>>> +   as this combination provides fast implementation of mempcpy function.
>>> */
>>> +
>>> +enum libc_speed
>>> +ix86_linux_libc_func_speed (int fn)
>>
>>
>> Putting a ix86_ function into config/linux.c used by most linux targets is
>> weird.  Either we multiple linux targets with mempcpy fast, then name it
>> somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
>> And yes, we do care about i?86-linux.  Or it is for x86 only, and then
>> it shouldn't be in config/linux.c, but either e.g. static inline in
>> config/i386/linux.h, or we need config/i386/linux.c if we don't have it
>> already.
>
>
> I'm fine with putting the implementation into gcc/config/i386/linux.c. Can
> you please
> help me how to conditionally build the file?

config.gcc:      extra_objs="${extra_objs} linux.o"
config.gcc: extra_objs="$extra_objs powerpcspe-linux.o"
config.gcc: extra_objs="$extra_objs rs6000-linux.o"


-- 
H.J.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-29 12:14                           ` Martin Liška
  2018-03-29 12:31                             ` H.J. Lu
@ 2018-03-29 12:43                             ` Jakub Jelinek
  2018-03-29 12:35                               ` Martin Liška
  1 sibling, 1 reply; 45+ messages in thread
From: Jakub Jelinek @ 2018-03-29 12:43 UTC (permalink / raw)
  To: Martin Liška
  Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu

On Thu, Mar 29, 2018 at 01:28:13PM +0200, Martin LiÅ¡ka wrote:
> On 03/28/2018 06:36 PM, Jakub Jelinek wrote:
> > On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin LiÅ¡ka wrote:
> > > --- a/gcc/config/linux.c
> > > +++ b/gcc/config/linux.c
> > > @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
> > >     return false;
> > >   }
> > > +
> > > +/* This hook determines whether a function from libc has a fast implementation
> > > +   FN is present at the runtime.  We override it for i386 and glibc C library
> > > +   as this combination provides fast implementation of mempcpy function.  */
> > > +
> > > +enum libc_speed
> > > +ix86_linux_libc_func_speed (int fn)
> > 
> > Putting a ix86_ function into config/linux.c used by most linux targets is
> > weird.  Either we multiple linux targets with mempcpy fast, then name it
> > somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
> > And yes, we do care about i?86-linux.  Or it is for x86 only, and then
> > it shouldn't be in config/linux.c, but either e.g. static inline in
> > config/i386/linux.h, or we need config/i386/linux.c if we don't have it
> > already.
> 
> I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you please

Can't you just put it into gcc/config/i386/linux-common.h as static inline,
so that it is optimized away whenever not needed?

If you really want to add a c file, it better not be called linux.c, because
linux.o for it would clash with linux.o from gcc/config/linux.c.  And,
you'd need to add the whatever.o into extra_objs in gcc/config.gcc and add
rules for it into gcc/config/i386/t-linux (see linux.o in config.gcc and
config/t-linux).

> help me how to conditionally build the file?

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-29 12:43                             ` Jakub Jelinek
@ 2018-03-29 12:35                               ` Martin Liška
  2018-04-04 10:13                                 ` Martin Liška
  2018-04-09 12:31                                 ` Martin Liška
  0 siblings, 2 replies; 45+ messages in thread
From: Martin Liška @ 2018-03-29 12:35 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu

On 03/29/2018 02:25 PM, Jakub Jelinek wrote:
> On Thu, Mar 29, 2018 at 01:28:13PM +0200, Martin LiÅ¡ka wrote:
>> On 03/28/2018 06:36 PM, Jakub Jelinek wrote:
>>> On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin LiÅ¡ka wrote:
>>>> --- a/gcc/config/linux.c
>>>> +++ b/gcc/config/linux.c
>>>> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
>>>>      return false;
>>>>    }
>>>> +
>>>> +/* This hook determines whether a function from libc has a fast implementation
>>>> +   FN is present at the runtime.  We override it for i386 and glibc C library
>>>> +   as this combination provides fast implementation of mempcpy function.  */
>>>> +
>>>> +enum libc_speed
>>>> +ix86_linux_libc_func_speed (int fn)
>>>
>>> Putting a ix86_ function into config/linux.c used by most linux targets is
>>> weird.  Either we multiple linux targets with mempcpy fast, then name it
>>> somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
>>> And yes, we do care about i?86-linux.  Or it is for x86 only, and then
>>> it shouldn't be in config/linux.c, but either e.g. static inline in
>>> config/i386/linux.h, or we need config/i386/linux.c if we don't have it
>>> already.
>>
>> I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you please
> 
> Can't you just put it into gcc/config/i386/linux-common.h as static inline,
> so that it is optimized away whenever not needed?

I would like to put it there, but:

g++ -c   -g  -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -DGENERATOR_FILE -fno-PIE -I. -Ibuild -I../../gcc -I../../gcc/build -I../../gcc/../include  -I../../gcc/../libcpp/include  \
	-o build/genpreds.o ../../gcc/genpreds.c
In file included from ./tm.h:38:0,
                  from ../../gcc/genpreds.c:26:
../../gcc/config/i386/linux-common.h: In function â€˜libc_speed ix86_linux_libc_func_speed(int)â€™:
../../gcc/config/i386/linux-common.h:137:8: error: use of enum â€˜built_in_functionâ€™ without previous declaration
    enum built_in_function f = (built_in_function)fn;
         ^~~~~~~~~~~~~~~~~
../../gcc/config/i386/linux-common.h:137:31: error: â€˜built_in_functionâ€™ was not declared in this scope
    enum built_in_function f = (built_in_function)fn;
                                ^~~~~~~~~~~~~~~~~
../../gcc/config/i386/linux-common.h:137:31: note: suggested alternative: â€˜machine_functionâ€™
    enum built_in_function f = (built_in_function)fn;
                                ^~~~~~~~~~~~~~~~~
                                machine_function
../../gcc/config/i386/linux-common.h:144:12: error: â€˜BUILT_IN_MEMPCPYâ€™ was not declared in this scope
        case BUILT_IN_MEMPCPY:
             ^~~~~~~~~~~~~~~~
Martin

> 
> If you really want to add a c file, it better not be called linux.c, because
> linux.o for it would clash with linux.o from gcc/config/linux.c.  And,
> you'd need to add the whatever.o into extra_objs in gcc/config.gcc and add
> rules for it into gcc/config/i386/t-linux (see linux.o in config.gcc and
> config/t-linux).
> 
>> help me how to conditionally build the file?
> 
> 	Jakub
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-29 12:35                               ` Martin Liška
@ 2018-04-04 10:13                                 ` Martin Liška
  2018-04-09 12:31                                 ` Martin Liška
  1 sibling, 0 replies; 45+ messages in thread
From: Martin Liška @ 2018-04-04 10:13 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu

PING^1

On 03/29/2018 02:31 PM, Martin LiÅ¡ka wrote:
> On 03/29/2018 02:25 PM, Jakub Jelinek wrote:
>> On Thu, Mar 29, 2018 at 01:28:13PM +0200, Martin LiÅ¡ka wrote:
>>> On 03/28/2018 06:36 PM, Jakub Jelinek wrote:
>>>> On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin LiÅ¡ka wrote:
>>>>> --- a/gcc/config/linux.c
>>>>> +++ b/gcc/config/linux.c
>>>>> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
>>>>> Â Â Â Â  return false;
>>>>> Â Â  }
>>>>> +
>>>>> +/* This hook determines whether a function from libc has a fast implementation
>>>>> +Â Â  FN is present at the runtime.Â  We override it for i386 and glibc C library
>>>>> +Â Â  as this combination provides fast implementation of mempcpy function.Â  */
>>>>> +
>>>>> +enum libc_speed
>>>>> +ix86_linux_libc_func_speed (int fn)
>>>>
>>>> Putting a ix86_ function into config/linux.c used by most linux targets is
>>>> weird.Â  Either we multiple linux targets with mempcpy fast, then name it
>>>> somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
>>>> And yes, we do care about i?86-linux.Â  Or it is for x86 only, and then
>>>> it shouldn't be in config/linux.c, but either e.g. static inline in
>>>> config/i386/linux.h, or we need config/i386/linux.c if we don't have it
>>>> already.
>>>
>>> I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you please
>>
>> Can't you just put it into gcc/config/i386/linux-common.h as static inline,
>> so that it is optimized away whenever not needed?
> 
> I would like to put it there, but:
> 
> g++ -cÂ Â  -gÂ  -DIN_GCCÂ Â Â Â  -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-commonÂ  -DHAVE_CONFIG_H -DGENERATOR_FILE -fno-PIE -I. -Ibuild -I../../gcc -I../../gcc/build -I../../gcc/../includeÂ  -I../../gcc/../libcpp/includeÂ  \
> Â Â Â Â -o build/genpreds.o ../../gcc/genpreds.c
> In file included from ./tm.h:38:0,
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  from ../../gcc/genpreds.c:26:
> ../../gcc/config/i386/linux-common.h: In function â€˜libc_speed ix86_linux_libc_func_speed(int)â€™:
> ../../gcc/config/i386/linux-common.h:137:8: error: use of enum â€˜built_in_functionâ€™ without previous declaration
> Â Â  enum built_in_function f = (built_in_function)fn;
> Â Â Â Â Â Â Â  ^~~~~~~~~~~~~~~~~
> ../../gcc/config/i386/linux-common.h:137:31: error: â€˜built_in_functionâ€™ was not declared in this scope
> Â Â  enum built_in_function f = (built_in_function)fn;
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  ^~~~~~~~~~~~~~~~~
> ../../gcc/config/i386/linux-common.h:137:31: note: suggested alternative: â€˜machine_functionâ€™
> Â Â  enum built_in_function f = (built_in_function)fn;
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  ^~~~~~~~~~~~~~~~~
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  machine_function
> ../../gcc/config/i386/linux-common.h:144:12: error: â€˜BUILT_IN_MEMPCPYâ€™ was not declared in this scope
> Â Â Â Â Â Â  case BUILT_IN_MEMPCPY:
> Â Â Â Â Â Â Â Â Â Â Â  ^~~~~~~~~~~~~~~~
> Martin
> 
>>
>> If you really want to add a c file, it better not be called linux.c, because
>> linux.o for it would clash with linux.o from gcc/config/linux.c.Â  And,
>> you'd need to add the whatever.o into extra_objs in gcc/config.gcc and add
>> rules for it into gcc/config/i386/t-linux (see linux.o in config.gcc and
>> config/t-linux).
>>
>>> help me how to conditionally build the file?
>>
>> Â Â Â Â Jakub
>>
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-29 12:35                               ` Martin Liška
  2018-04-04 10:13                                 ` Martin Liška
@ 2018-04-09 12:31                                 ` Martin Liška
  2018-04-10 10:35                                   ` Jakub Jelinek
  1 sibling, 1 reply; 45+ messages in thread
From: Martin Liška @ 2018-04-09 12:31 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 148 bytes --]

Hi.

I'm sending updated version of the patch.

Patch can bootstrap on ppc64le-redhat-linux and x86_64-linux and survives regression tests.

Martin

[-- Attachment #2: 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch --]
[-- Type: text/x-patch, Size: 14328 bytes --]

From 6d19b1bf0c28a683e1458e16943e34bb547d36d6 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/linux-protos.h (ix86_linux_libc_func_speed): New.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/linux64.h (SUBTARGET_LIBC_FUNC_SPEED): Define
	macro.
	* config/i386/linux.h (SUBTARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/t-linux: Add x86-linux.o.
	* config.gcc: Likewise.
	* config/i386/x86-linux.c: New file.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-28  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c:

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c                      | 15 ++++++++++-
 gcc/config.gcc                      |  1 +
 gcc/config/i386/i386.c              |  5 ++++
 gcc/config/i386/linux.h             |  2 ++
 gcc/config/i386/linux64.h           |  2 ++
 gcc/config/i386/t-linux             |  6 +++++
 gcc/config/i386/x86-linux.c         | 54 +++++++++++++++++++++++++++++++++++++
 gcc/config/linux-protos.h           |  1 +
 gcc/coretypes.h                     |  7 +++++
 gcc/doc/tm.texi                     |  4 +++
 gcc/doc/tm.texi.in                  |  1 +
 gcc/expr.c                          | 11 +++++++-
 gcc/expr.h                          |  3 ++-
 gcc/target.def                      |  7 +++++
 gcc/targhooks.c                     |  9 +++++++
 gcc/targhooks.h                     |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 ++--
 17 files changed, 129 insertions(+), 5 deletions(-)
 create mode 100644 gcc/config/i386/x86-linux.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 487d9d58db2..98ee3fb272d 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  /* emit_block_move_hints can generate a library call to memcpy function.
+     In situations when a libc library provides fast implementation
+     of mempcpy, then it's better to call mempcpy directly.  */
+  bool avoid_libcall
+    = (endp == 1
+       && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED
+       && target != const0_rtx);
+
   /* Copy word part most expediently.  */
+  bool libcall_avoided = false;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     avoid_libcall ? &libcall_avoided : NULL);
+
+  if (libcall_avoided)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1b58c060a92..6445ff569b3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
 	x86_64-*-linux*)
 		tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h"
 		extra_options="${extra_options} linux-android.opt"
+		extra_objs="${extra_objs} x86-linux.o"
 	  	;;
 	x86_64-*-kfreebsd*-gnu)
 		tm_file="${tm_file} kfreebsd-gnu.h i386/kfreebsd-gnu64.h"
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b4f6aec1434..2471ff7b99a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -52105,6 +52105,11 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#ifdef SUBTARGET_LIBC_FUNC_SPEED
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
+#endif
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/config/i386/linux.h b/gcc/config/i386/linux.h
index 69f97f15b0d..6c59cbd6d62 100644
--- a/gcc/config/i386/linux.h
+++ b/gcc/config/i386/linux.h
@@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.  If not see
 
 #undef MUSL_DYNAMIC_LINKER
 #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1"
+
+#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
index f2d913e30ac..d855f5cc239 100644
--- a/gcc/config/i386/linux64.h
+++ b/gcc/config/i386/linux64.h
@@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
 #undef MUSL_DYNAMIC_LINKERX32
 #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"
+
+#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
diff --git a/gcc/config/i386/t-linux b/gcc/config/i386/t-linux
index 155314c08a7..6e3ebe94fe8 100644
--- a/gcc/config/i386/t-linux
+++ b/gcc/config/i386/t-linux
@@ -1 +1,7 @@
 MULTIARCH_DIRNAME = $(call if_multiarch,i386-linux-gnu)
+
+x86-linux.o: $(srcdir)/config/i386/x86-linux.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+  $(TM_H) $(RTL_H) $(REGS_H) hard-reg-set.h output.h $(TREE_H) flags.h \
+  $(TM_P_H) $(HASHTAB_H) $(GGC_H)
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+	$(srcdir)/config/i386/x86-linux.c
diff --git a/gcc/config/i386/x86-linux.c b/gcc/config/i386/x86-linux.c
new file mode 100644
index 00000000000..71291d18a71
--- /dev/null
+++ b/gcc/config/i386/x86-linux.c
@@ -0,0 +1,54 @@
+/* Implementation for linux-specific functions for i386 and x86-64 systems.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Martin Liska <mliska@suse.cz>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "cp/cp-tree.h" /* This is why we're a separate module.  */
+#include "stringpool.h"
+#include "attribs.h"
+
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  We override it for i386 and glibc C library
+   as this combination provides fast implementation of mempcpy function.  */
+
+enum libc_speed
+ix86_linux_libc_func_speed (int fn)
+{
+  enum built_in_function f = (built_in_function)fn;
+
+  if (!OPTION_GLIBC)
+    return LIBC_UNKNOWN_SPEED;
+
+  switch (f)
+    {
+      case BUILT_IN_MEMPCPY:
+	return LIBC_FAST_SPEED;
+      default:
+	return LIBC_UNKNOWN_SPEED;
+    }
+}
diff --git a/gcc/config/linux-protos.h b/gcc/config/linux-protos.h
index 9da8dd7ecaa..b7284735366 100644
--- a/gcc/config/linux-protos.h
+++ b/gcc/config/linux-protos.h
@@ -20,3 +20,4 @@ along with GCC; see the file COPYING3.  If not see
 extern bool linux_has_ifunc_p (void);
 
 extern bool linux_libc_has_function (enum function_class fn_class);
+extern enum libc_speed ix86_linux_libc_func_speed (int fn);
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 283b4eb33fe..fe618f708f4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -384,6 +384,13 @@ enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+enum libc_speed
+{
+  LIBC_FAST_SPEED,
+  LIBC_SLOW_SPEED,
+  LIBC_UNKNOWN_SPEED
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0f7c91a22c4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5501,6 +5501,10 @@ macro, a reasonable default is used.
 This hook determines whether a function from a class of functions
 @var{fn_class} is present at the runtime.
 @end deftypefn
+@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn})
+This hook determines whether a function from libc has a fast implementation
+@var{fn} is present at the runtime.
+@end deftypefn
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..4bb2998a8a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3933,6 +3933,7 @@ macro, a reasonable default is used.
 @end defmac
 
 @hook TARGET_LIBC_HAS_FUNCTION
+@hook TARGET_LIBC_FUNC_SPEED
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..f3bd698bc4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is non-null, do not emit library call and assign
+   true to the pointer when move is not done.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,7 +1565,8 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool *bail_out_libcall)
 {
   bool may_use_call;
   rtx retval = 0;
@@ -1625,6 +1628,12 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  *bail_out_libcall = true;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..c2bf87fd14e 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,8 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool *bail_out_libcall = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/target.def b/gcc/target.def
index c5b2a1e7e71..3bbddc82776 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2639,6 +2639,13 @@ DEFHOOK
  bool, (enum function_class fn_class),
  default_libc_has_function)
 
+DEFHOOK
+(libc_func_speed,
+ "This hook determines whether a function from libc has a fast implementation\n\
+@var{fn} is present at the runtime.",
+ libc_speed, (int fn),
+ default_libc_func_speed)
+
 /* True if new jumps cannot be created, to replace existing ones or
    not, at the current point in the compilation.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index fafcc6c5196..6e44f6f79cf 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
   return false;
 }
 
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  */
+
+enum libc_speed
+default_libc_func_speed (int)
+{
+  return LIBC_UNKNOWN_SPEED;
+}
+
 tree
 default_builtin_tm_load_store (tree ARG_UNUSED (type))
 {
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 8a4393f2ba4..7508673ad0a 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void);
 extern bool default_libc_has_function (enum function_class);
 extern bool no_c99_libc_has_function (enum function_class);
 extern bool gnu_libc_has_function (enum function_class);
+extern enum libc_speed default_libc_func_speed (int);
 
 extern tree default_builtin_tm_load_store (tree);
 
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..7faaadcbb1f 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "mempcpy" { target  { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */
-- 
2.16.3


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-09 12:31                                 ` Martin Liška
@ 2018-04-10 10:35                                   ` Jakub Jelinek
  2018-04-10 12:28                                     ` Martin Liška
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Jelinek @ 2018-04-10 10:35 UTC (permalink / raw)
  To: Martin Liška
  Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu

On Mon, Apr 09, 2018 at 02:31:04PM +0200, Martin LiÅ¡ka wrote:
> gcc/testsuite/ChangeLog:
> 
> 2018-03-28  Martin Liska  <mliska@suse.cz>
> 
> 	* gcc.dg/string-opt-1.c:

I guess you really didn't mean to keep the above entry around, just the one
below, right?

> gcc/testsuite/ChangeLog:
> 
> 2018-03-14  Martin Liska  <mliska@suse.cz>
> 
> 	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
> 	and others.

> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
>  	x86_64-*-linux*)
>  		tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h"
>  		extra_options="${extra_options} linux-android.opt"
> +		extra_objs="${extra_objs} x86-linux.o"
>  	  	;;

The should go into the i[34567]86-*-linux*) case too (outside of the
if test x$enable_targets = xall; then conditional).
Or maybe better, remove the above and do it in:
        i[34567]86-*-linux* | x86_64-*-linux*)
                extra_objs="${extra_objs} cet.o"
                tmake_file="$tmake_file i386/t-linux i386/t-cet"
                ;;
spot, just add x86-linux.o next to cet.o.

> --- a/gcc/config/i386/linux.h
> +++ b/gcc/config/i386/linux.h
> @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.  If not see
>  
>  #undef MUSL_DYNAMIC_LINKER
>  #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1"
> +
> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
> diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
> index f2d913e30ac..d855f5cc239 100644
> --- a/gcc/config/i386/linux64.h
> +++ b/gcc/config/i386/linux64.h
> @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
>  #undef MUSL_DYNAMIC_LINKERX32
>  #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"
> +
> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed

And the above two changes should be replaced by a change in
gcc/config/i386/linux-common.h.

> +#include "coretypes.h"
> +#include "cp/cp-tree.h" /* This is why we're a separate module.  */

Why do you need cp/cp-tree.h?  That is just too weird.
The function just uses libc_speed (in core-types.h, built_in_function
(likewise), OPTION_GLIBC (config/linux.h).

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-10 10:35                                   ` Jakub Jelinek
@ 2018-04-10 12:28                                     ` Martin Liška
  2018-04-12 13:22                                       ` Martin Liška
  2018-07-05 19:34                                       ` Jeff Law
  0 siblings, 2 replies; 45+ messages in thread
From: Martin Liška @ 2018-04-10 12:28 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu

[-- Attachment #1: Type: text/plain, Size: 2643 bytes --]

On 04/10/2018 11:19 AM, Jakub Jelinek wrote:
> On Mon, Apr 09, 2018 at 02:31:04PM +0200, Martin LiÅ¡ka wrote:
>> gcc/testsuite/ChangeLog:
>>
>> 2018-03-28  Martin Liska  <mliska@suse.cz>
>>
>> 	* gcc.dg/string-opt-1.c:
> 
> I guess you really didn't mean to keep the above entry around, just the one
> below, right?

Sure, fixed.

> 
>> gcc/testsuite/ChangeLog:
>>
>> 2018-03-14  Martin Liska  <mliska@suse.cz>
>>
>> 	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
>> 	and others.
> 
>> --- a/gcc/config.gcc
>> +++ b/gcc/config.gcc
>> @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
>>  	x86_64-*-linux*)
>>  		tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h"
>>  		extra_options="${extra_options} linux-android.opt"
>> +		extra_objs="${extra_objs} x86-linux.o"
>>  	  	;;
> 
> The should go into the i[34567]86-*-linux*) case too (outside of the
> if test x$enable_targets = xall; then conditional).
> Or maybe better, remove the above and do it in:
>         i[34567]86-*-linux* | x86_64-*-linux*)
>                 extra_objs="${extra_objs} cet.o"
>                 tmake_file="$tmake_file i386/t-linux i386/t-cet"
>                 ;;
> spot, just add x86-linux.o next to cet.o.

Done.

> 
>> --- a/gcc/config/i386/linux.h
>> +++ b/gcc/config/i386/linux.h
>> @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.  If not see
>>  
>>  #undef MUSL_DYNAMIC_LINKER
>>  #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1"
>> +
>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
>> diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
>> index f2d913e30ac..d855f5cc239 100644
>> --- a/gcc/config/i386/linux64.h
>> +++ b/gcc/config/i386/linux64.h
>> @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>  #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
>>  #undef MUSL_DYNAMIC_LINKERX32
>>  #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"
>> +
>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
> 
> And the above two changes should be replaced by a change in
> gcc/config/i386/linux-common.h.

Likewise.

> 
>> +#include "coretypes.h"
>> +#include "cp/cp-tree.h" /* This is why we're a separate module.  */
> 
> Why do you need cp/cp-tree.h?  That is just too weird.
> The function just uses libc_speed (in core-types.h, built_in_function
> (likewise), OPTION_GLIBC (config/linux.h).

I ended up with minimal set of includes:

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"

I'm retesting the patch.

Martin

> 
> 	Jakub
> 


[-- Attachment #2: 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch --]
[-- Type: text/x-patch, Size: 13605 bytes --]

From bed35715063f9435b697eaf4c9868f81e8556de8 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/linux-protos.h (ix86_linux_libc_func_speed): New.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/linux-common.h (SUBTARGET_LIBC_FUNC_SPEED): Define
	macro.
	* config/i386/t-linux: Add x86-linux.o.
	* config.gcc: Likewise.
	* config/i386/x86-linux.c: New file.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c                      | 15 ++++++++++-
 gcc/config.gcc                      |  2 +-
 gcc/config/i386/i386.c              |  5 ++++
 gcc/config/i386/linux-common.h      |  2 ++
 gcc/config/i386/t-linux             |  6 +++++
 gcc/config/i386/x86-linux.c         | 52 +++++++++++++++++++++++++++++++++++++
 gcc/config/linux-protos.h           |  1 +
 gcc/coretypes.h                     |  7 +++++
 gcc/doc/tm.texi                     |  4 +++
 gcc/doc/tm.texi.in                  |  1 +
 gcc/expr.c                          | 11 +++++++-
 gcc/expr.h                          |  3 ++-
 gcc/target.def                      |  7 +++++
 gcc/targhooks.c                     |  9 +++++++
 gcc/targhooks.h                     |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 ++--
 16 files changed, 125 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/i386/x86-linux.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 487d9d58db2..98ee3fb272d 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  /* emit_block_move_hints can generate a library call to memcpy function.
+     In situations when a libc library provides fast implementation
+     of mempcpy, then it's better to call mempcpy directly.  */
+  bool avoid_libcall
+    = (endp == 1
+       && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED
+       && target != const0_rtx);
+
   /* Copy word part most expediently.  */
+  bool libcall_avoided = false;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     avoid_libcall ? &libcall_avoided : NULL);
+
+  if (libcall_avoided)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1b58c060a92..7fe43856b6a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4617,7 +4617,7 @@ case ${target} in
 	i[34567]86-*-darwin* | x86_64-*-darwin*)
 		;;
 	i[34567]86-*-linux* | x86_64-*-linux*)
-		extra_objs="${extra_objs} cet.o"
+		extra_objs="${extra_objs} cet.o x86-linux.o"
 		tmake_file="$tmake_file i386/t-linux i386/t-cet"
 		;;
 	i[34567]86-*-kfreebsd*-gnu | x86_64-*-kfreebsd*-gnu)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b4f6aec1434..2471ff7b99a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -52105,6 +52105,11 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#ifdef SUBTARGET_LIBC_FUNC_SPEED
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
+#endif
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/config/i386/linux-common.h b/gcc/config/i386/linux-common.h
index d877387021b..1b48c15e5c0 100644
--- a/gcc/config/i386/linux-common.h
+++ b/gcc/config/i386/linux-common.h
@@ -126,3 +126,5 @@ extern void file_end_indicate_exec_stack_and_cet (void);
 
 #undef TARGET_ASM_FILE_END
 #define TARGET_ASM_FILE_END file_end_indicate_exec_stack_and_cet
+
+#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
diff --git a/gcc/config/i386/t-linux b/gcc/config/i386/t-linux
index 155314c08a7..6e3ebe94fe8 100644
--- a/gcc/config/i386/t-linux
+++ b/gcc/config/i386/t-linux
@@ -1 +1,7 @@
 MULTIARCH_DIRNAME = $(call if_multiarch,i386-linux-gnu)
+
+x86-linux.o: $(srcdir)/config/i386/x86-linux.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+  $(TM_H) $(RTL_H) $(REGS_H) hard-reg-set.h output.h $(TREE_H) flags.h \
+  $(TM_P_H) $(HASHTAB_H) $(GGC_H)
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+	$(srcdir)/config/i386/x86-linux.c
diff --git a/gcc/config/i386/x86-linux.c b/gcc/config/i386/x86-linux.c
new file mode 100644
index 00000000000..5e4331f635a
--- /dev/null
+++ b/gcc/config/i386/x86-linux.c
@@ -0,0 +1,52 @@
+/* Implementation for linux-specific functions for i386 and x86-64 systems.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Martin Liska <mliska@suse.cz>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  We override it for i386 and glibc C library
+   as this combination provides fast implementation of mempcpy function.  */
+
+enum libc_speed
+ix86_linux_libc_func_speed (int fn)
+{
+  enum built_in_function f = (built_in_function)fn;
+
+  if (!OPTION_GLIBC)
+    return LIBC_UNKNOWN_SPEED;
+
+  switch (f)
+    {
+      case BUILT_IN_MEMPCPY:
+	return LIBC_FAST_SPEED;
+      default:
+	return LIBC_UNKNOWN_SPEED;
+    }
+}
diff --git a/gcc/config/linux-protos.h b/gcc/config/linux-protos.h
index 9da8dd7ecaa..b7284735366 100644
--- a/gcc/config/linux-protos.h
+++ b/gcc/config/linux-protos.h
@@ -20,3 +20,4 @@ along with GCC; see the file COPYING3.  If not see
 extern bool linux_has_ifunc_p (void);
 
 extern bool linux_libc_has_function (enum function_class fn_class);
+extern enum libc_speed ix86_linux_libc_func_speed (int fn);
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 283b4eb33fe..fe618f708f4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -384,6 +384,13 @@ enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+enum libc_speed
+{
+  LIBC_FAST_SPEED,
+  LIBC_SLOW_SPEED,
+  LIBC_UNKNOWN_SPEED
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0f7c91a22c4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5501,6 +5501,10 @@ macro, a reasonable default is used.
 This hook determines whether a function from a class of functions
 @var{fn_class} is present at the runtime.
 @end deftypefn
+@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn})
+This hook determines whether a function from libc has a fast implementation
+@var{fn} is present at the runtime.
+@end deftypefn
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..4bb2998a8a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3933,6 +3933,7 @@ macro, a reasonable default is used.
 @end defmac
 
 @hook TARGET_LIBC_HAS_FUNCTION
+@hook TARGET_LIBC_FUNC_SPEED
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..f3bd698bc4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is non-null, do not emit library call and assign
+   true to the pointer when move is not done.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,7 +1565,8 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool *bail_out_libcall)
 {
   bool may_use_call;
   rtx retval = 0;
@@ -1625,6 +1628,12 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  *bail_out_libcall = true;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..c2bf87fd14e 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,8 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool *bail_out_libcall = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/target.def b/gcc/target.def
index c5b2a1e7e71..3bbddc82776 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2639,6 +2639,13 @@ DEFHOOK
  bool, (enum function_class fn_class),
  default_libc_has_function)
 
+DEFHOOK
+(libc_func_speed,
+ "This hook determines whether a function from libc has a fast implementation\n\
+@var{fn} is present at the runtime.",
+ libc_speed, (int fn),
+ default_libc_func_speed)
+
 /* True if new jumps cannot be created, to replace existing ones or
    not, at the current point in the compilation.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index fafcc6c5196..6e44f6f79cf 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
   return false;
 }
 
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  */
+
+enum libc_speed
+default_libc_func_speed (int)
+{
+  return LIBC_UNKNOWN_SPEED;
+}
+
 tree
 default_builtin_tm_load_store (tree ARG_UNUSED (type))
 {
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 8a4393f2ba4..7508673ad0a 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void);
 extern bool default_libc_has_function (enum function_class);
 extern bool no_c99_libc_has_function (enum function_class);
 extern bool gnu_libc_has_function (enum function_class);
+extern enum libc_speed default_libc_func_speed (int);
 
 extern tree default_builtin_tm_load_store (tree);
 
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..7faaadcbb1f 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "mempcpy" { target  { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */
-- 
2.16.3


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-10 12:28                                     ` Martin Liška
@ 2018-04-12 13:22                                       ` Martin Liška
  2018-04-12 13:42                                         ` Jan Hubicka
  2018-04-12 13:52                                         ` Richard Biener
  2018-07-05 19:34                                       ` Jeff Law
  1 sibling, 2 replies; 45+ messages in thread
From: Martin Liška @ 2018-04-12 13:22 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu,
	Jan Hubicka

Hi.

I'm reminding review request from Richi for generic part
and Uros/Honza for target part.

Thanks,
Martin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-12 13:22                                       ` Martin Liška
@ 2018-04-12 13:42                                         ` Jan Hubicka
  2018-04-12 13:52                                         ` Richard Biener
  1 sibling, 0 replies; 45+ messages in thread
From: Jan Hubicka @ 2018-04-12 13:42 UTC (permalink / raw)
  To: Martin Liška
  Cc: Jakub Jelinek, Richard Biener, Uros Bizjak, gcc-patches,
	Marc Glisse, H.J. Lu

> Hi.
> 
> I'm reminding review request from Richi for generic part
> and Uros/Honza for target part.

OK for i386 bits.
Honza
> 
> Thanks,
> Martin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-12 13:22                                       ` Martin Liška
  2018-04-12 13:42                                         ` Jan Hubicka
@ 2018-04-12 13:52                                         ` Richard Biener
  2018-04-12 14:35                                           ` Jakub Jelinek
  1 sibling, 1 reply; 45+ messages in thread
From: Richard Biener @ 2018-04-12 13:52 UTC (permalink / raw)
  To: Martin Liška
  Cc: Jakub Jelinek, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu,
	Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 1418 bytes --]

On Thu, 12 Apr 2018, Martin LiÅ¡ka wrote:

> Hi.
> 
> I'm reminding review request from Richi for generic part
> and Uros/Honza for target part.

Not sure if I missed some important part of the discussion but
for the testcase we want to preserve the tailcall, right?  So
it would be enough to set avoid_libcall to
endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
stpcpy)?

I'm not sure I like the interfacing of that to emit_block_move_hints
very much.  I'd have used sth like BLOCK_OP_ABORT_ON_LIBCALL
and extend the interface in a way to return what kind of method it
chose rather than just a bool.

Not sure what gcc.dg/20050503-1.c did on non-x86 targets - the
test runs on all archs but only x86 is ever tested for a result.

So - I think tail-calling is prefered, and somehow in the PR
the discussion wandered off to whether there's fast implementations
or not - but the testcase looks for a tailcall where the source
was a tailcall, that should be authorative for the "default"
decision when the hook isn't implemented or doesn't cover the case.

IMO target libraries have to be quite stupid if they have anything
slower than

void *mempcpy (void *dest, const void *src, size_t n)
{
  return memcpy (dest, src, n) + n;
}

which should be not (very much) slower than a non-tailcall memcpy call.

So -- remove the hook and instead use CALL_EXPR_TAILCALL (exp) instead
of its result.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-12 13:52                                         ` Richard Biener
@ 2018-04-12 14:35                                           ` Jakub Jelinek
  2018-04-12 14:19                                             ` Richard Biener
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Jelinek @ 2018-04-12 14:35 UTC (permalink / raw)
  To: Richard Biener
  Cc: Martin Liška, Uros Bizjak, gcc-patches, Marc Glisse,
	H.J. Lu, Jan Hubicka

On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
> Not sure if I missed some important part of the discussion but
> for the testcase we want to preserve the tailcall, right?  So
> it would be enough to set avoid_libcall to
> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
> stpcpy)?

For the testcase yes.  There the question is if some targets have so lame
mempcpy that using a tailcall to mempcpy is slower over avoiding the
tailcall (and on aarch64 it looked like maintainer's choice to have lame
mempcpy and hope the compiler will avoid it at all costs).  On the other
side, that change has been forced over to all targets, even when they don't
have lame mempcpy.
So, the tailcall is one issue, and we can either use mempcpy if endp
and CALL_EXPR_TAILCALL, or only do that if -Os.

And another issue is mempcpy uses in other contexts, here again I think x86
has good enough mempcpy that if I use
foo (mempcpy (x, y, z)) then it is better to use mempcpy over memcpy call,
but not so on targets with lame mempcpy.

My preference would be to have non-lame mempcpy etc. on all targets, but the
aarch64 folks disagree.

So, wonder e.g. about Martin's patch, which would use mempcpy if endp and
either FAST_SPEED for mempcpy (regardless of the context), or not
SLOW_SPEED and CALL_EXPR_TAILCALL.  That way, targets could signal they have
so lame mempcpy that they never want to use it (return SLOW_SPEED), or ask
for it to be used every time it makes sense from caller POV, and have the
default something in between (only use it in tail calls).

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-12 14:35                                           ` Jakub Jelinek
@ 2018-04-12 14:19                                             ` Richard Biener
  2018-04-12 14:35                                               ` Jakub Jelinek
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Biener @ 2018-04-12 14:19 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Martin Liška, Uros Bizjak, gcc-patches, Marc Glisse,
	H.J. Lu, Jan Hubicka

On Thu, 12 Apr 2018, Jakub Jelinek wrote:

> On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
> > Not sure if I missed some important part of the discussion but
> > for the testcase we want to preserve the tailcall, right?  So
> > it would be enough to set avoid_libcall to
> > endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
> > stpcpy)?
> 
> For the testcase yes.  There the question is if some targets have so lame
> mempcpy that using a tailcall to mempcpy is slower over avoiding the
> tailcall (and on aarch64 it looked like maintainer's choice to have lame
> mempcpy and hope the compiler will avoid it at all costs).  On the other
> side, that change has been forced over to all targets, even when they don't
> have lame mempcpy.
> So, the tailcall is one issue, and we can either use mempcpy if endp
> and CALL_EXPR_TAILCALL, or only do that if -Os.
> 
> And another issue is mempcpy uses in other contexts, here again I think x86
> has good enough mempcpy that if I use
> foo (mempcpy (x, y, z)) then it is better to use mempcpy over memcpy call,
> but not so on targets with lame mempcpy.
> 
> My preference would be to have non-lame mempcpy etc. on all targets, but the
> aarch64 folks disagree.
> 
> So, wonder e.g. about Martin's patch, which would use mempcpy if endp and
> either FAST_SPEED for mempcpy (regardless of the context), or not
> SLOW_SPEED and CALL_EXPR_TAILCALL.  That way, targets could signal they have
> so lame mempcpy that they never want to use it (return SLOW_SPEED), or ask
> for it to be used every time it makes sense from caller POV, and have the
> default something in between (only use it in tail calls).

Well, but that wouldn't be a fix for a regression and IMHO there's
no reason for a really lame mempcpy.  If targets disgree well,
then they get what they deserve.

I don't see any aarch64 specific mempcpy in glibc btw so hopefully
the default non-stupid one kicks in (it exactly looks like my C
version)

Richard.

> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-12 14:19                                             ` Richard Biener
@ 2018-04-12 14:35                                               ` Jakub Jelinek
  2018-04-12 15:17                                                 ` Richard Biener
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Jelinek @ 2018-04-12 14:35 UTC (permalink / raw)
  To: Richard Biener
  Cc: Martin Liška, Uros Bizjak, gcc-patches, Marc Glisse,
	H.J. Lu, Jan Hubicka

On Thu, Apr 12, 2018 at 04:19:38PM +0200, Richard Biener wrote:
> Well, but that wouldn't be a fix for a regression and IMHO there's
> no reason for a really lame mempcpy.  If targets disgree well,

It is a regression as well, in the past we've emitted mempcpy when user
wrote mempcpy, now we don't.

E.g.
extern void *mempcpy (void *, const void *, __SIZE_TYPE__);
void bar (void *, void *, void *);

void
foo (void *x, void *y, void *z, void *w, __SIZE_TYPE__ n)
{
  bar (mempcpy (x, w, n), mempcpy (y, w, n), mempcpy (z, w, n));
}

is on x86_64-linux -O2 in 7.x using the 3 mempcpy calls and 90 bytes in foo, while
on the trunk uses 3 memcpy calls and 96 bytes in foo.

For -Os that is easily measurable regression, for -O2 it depends on the
relative speed of memcpy vs. mempcpy and whether one or both of them are in
I-cache or not.

> then they get what they deserve.
> 
> I don't see any aarch64 specific mempcpy in glibc btw so hopefully
> the default non-stupid one kicks in (it exactly looks like my C
> version)

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-12 14:35                                               ` Jakub Jelinek
@ 2018-04-12 15:17                                                 ` Richard Biener
  2018-04-12 17:35                                                   ` Jakub Jelinek
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Biener @ 2018-04-12 15:17 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Martin Liška, Uros Bizjak, gcc-patches, Marc Glisse,
	H.J. Lu, Jan Hubicka

On April 12, 2018 4:31:12 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Thu, Apr 12, 2018 at 04:19:38PM +0200, Richard Biener wrote:
>> Well, but that wouldn't be a fix for a regression and IMHO there's
>> no reason for a really lame mempcpy.  If targets disgree well,
>
>It is a regression as well, in the past we've emitted mempcpy when user
>wrote mempcpy, now we don't.
>
>E.g.
>extern void *mempcpy (void *, const void *, __SIZE_TYPE__);
>void bar (void *, void *, void *);
>
>void
>foo (void *x, void *y, void *z, void *w, __SIZE_TYPE__ n)
>{
>  bar (mempcpy (x, w, n), mempcpy (y, w, n), mempcpy (z, w, n));
>}
>
>is on x86_64-linux -O2 in 7.x using the 3 mempcpy calls and 90 bytes in
>foo, while
>on the trunk uses 3 memcpy calls and 96 bytes in foo.
>
>For -Os that is easily measurable regression, for -O2 it depends on the
>relative speed of memcpy vs. mempcpy and whether one or both of them
>are in
>I-cache or not.

Well, then simply unconditionally not generate a libcall from the move expander? 

>
>> then they get what they deserve.
>> 
>> I don't see any aarch64 specific mempcpy in glibc btw so hopefully
>> the default non-stupid one kicks in (it exactly looks like my C
>> version)
>
>	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-12 15:17                                                 ` Richard Biener
@ 2018-04-12 17:35                                                   ` Jakub Jelinek
  2018-04-12 16:03                                                     ` Richard Biener
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Jelinek @ 2018-04-12 17:35 UTC (permalink / raw)
  To: Richard Biener
  Cc: Martin Liška, Uros Bizjak, gcc-patches, Marc Glisse,
	H.J. Lu, Jan Hubicka

On Thu, Apr 12, 2018 at 05:17:29PM +0200, Richard Biener wrote:
> >For -Os that is easily measurable regression, for -O2 it depends on the
> >relative speed of memcpy vs. mempcpy and whether one or both of them
> >are in
> >I-cache or not.
> 
> Well, then simply unconditionally not generate a libcall from the move expander? 

We need to generate libcall for many callers and in fact, we don't have a
mode nor a way to tell the caller that we haven't emitted anything.

What we could do is add another enumerator to enum block_op_methods that
would be like BLOCK_OP_NO_LIBCALL, but would not use emit_block_move_via_loop
if move_by_pieces nor emit_block_move_via_movmem can be used, and say
instead return const0_rtx or pc_rtx or some way to tell the caller that
it hasn't emitted anything and in expand_builtin_memory_copy_args
pass for endp == 1 && target != const0_rtx that new BLOCK_OP_NO_LIBCALL_LOOP
to emit_block_move_hints and return 0 if dest_addr is const0_rtx (or pc_rtx
or whatever is chosen).

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-12 17:35                                                   ` Jakub Jelinek
@ 2018-04-12 16:03                                                     ` Richard Biener
  0 siblings, 0 replies; 45+ messages in thread
From: Richard Biener @ 2018-04-12 16:03 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Martin Liška, Uros Bizjak, gcc-patches, Marc Glisse,
	H.J. Lu, Jan Hubicka

On April 12, 2018 5:38:44 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Thu, Apr 12, 2018 at 05:17:29PM +0200, Richard Biener wrote:
>> >For -Os that is easily measurable regression, for -O2 it depends on
>the
>> >relative speed of memcpy vs. mempcpy and whether one or both of them
>> >are in
>> >I-cache or not.
>> 
>> Well, then simply unconditionally not generate a libcall from the
>move expander? 
>
>We need to generate libcall for many callers and in fact, we don't have
>a
>mode nor a way to tell the caller that we haven't emitted anything.
>
>What we could do is add another enumerator to enum block_op_methods
>that
>would be like BLOCK_OP_NO_LIBCALL, but would not use
>emit_block_move_via_loop
>if move_by_pieces nor emit_block_move_via_movmem can be used, and say
>instead return const0_rtx or pc_rtx or some way to tell the caller that
>it hasn't emitted anything and in expand_builtin_memory_copy_args
>pass for endp == 1 && target != const0_rtx that new
>BLOCK_OP_NO_LIBCALL_LOOP
>to emit_block_move_hints and return 0 if dest_addr is const0_rtx (or
>pc_rtx
>or whatever is chosen).

Yes. Emit the "original" call whenever inline expansion fails. 

Richard. 

>	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-10 12:28                                     ` Martin Liška
  2018-04-12 13:22                                       ` Martin Liška
@ 2018-07-05 19:34                                       ` Jeff Law
  1 sibling, 0 replies; 45+ messages in thread
From: Jeff Law @ 2018-07-05 19:34 UTC (permalink / raw)
  To: Martin Liška, Jakub Jelinek
  Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu

On 04/10/2018 06:27 AM, Martin LiÅ¡ka wrote:
> On 04/10/2018 11:19 AM, Jakub Jelinek wrote:
>> On Mon, Apr 09, 2018 at 02:31:04PM +0200, Martin LiÅ¡ka wrote:
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2018-03-28  Martin Liska  <mliska@suse.cz>
>>>
>>> 	* gcc.dg/string-opt-1.c:
>> I guess you really didn't mean to keep the above entry around, just the one
>> below, right?
> Sure, fixed.
> 
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2018-03-14  Martin Liska  <mliska@suse.cz>
>>>
>>> 	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
>>> 	and others.
>>> --- a/gcc/config.gcc
>>> +++ b/gcc/config.gcc
>>> @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
>>>  	x86_64-*-linux*)
>>>  		tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h"
>>>  		extra_options="${extra_options} linux-android.opt"
>>> +		extra_objs="${extra_objs} x86-linux.o"
>>>  	  	;;
>> The should go into the i[34567]86-*-linux*) case too (outside of the
>> if test x$enable_targets = xall; then conditional).
>> Or maybe better, remove the above and do it in:
>>         i[34567]86-*-linux* | x86_64-*-linux*)
>>                 extra_objs="${extra_objs} cet.o"
>>                 tmake_file="$tmake_file i386/t-linux i386/t-cet"
>>                 ;;
>> spot, just add x86-linux.o next to cet.o.
> Done.
> 
>>> --- a/gcc/config/i386/linux.h
>>> +++ b/gcc/config/i386/linux.h
>>> @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.  If not see
>>>  
>>>  #undef MUSL_DYNAMIC_LINKER
>>>  #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1"
>>> +
>>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
>>> diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
>>> index f2d913e30ac..d855f5cc239 100644
>>> --- a/gcc/config/i386/linux64.h
>>> +++ b/gcc/config/i386/linux64.h
>>> @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>>  #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
>>>  #undef MUSL_DYNAMIC_LINKERX32
>>>  #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"
>>> +
>>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
>> And the above two changes should be replaced by a change in
>> gcc/config/i386/linux-common.h.
> Likewise.
> 
>>> +#include "coretypes.h"
>>> +#include "cp/cp-tree.h" /* This is why we're a separate module.  */
>> Why do you need cp/cp-tree.h?  That is just too weird.
>> The function just uses libc_speed (in core-types.h, built_in_function
>> (likewise), OPTION_GLIBC (config/linux.h).
> I ended up with minimal set of includes:
> 
> #include "config.h"
> #include "system.h"
> #include "coretypes.h"
> #include "backend.h"
> #include "tree.h"
> 
> I'm retesting the patch.
> 
> Martin
> 
>> 	Jakub
>>
> 
> 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch
> 
> 
> From bed35715063f9435b697eaf4c9868f81e8556de8 Mon Sep 17 00:00:00 2001
> From: marxin <mliska@suse.cz>
> Date: Wed, 14 Mar 2018 09:44:18 +0100
> Subject: [PATCH] Introduce new libc_func_speed target hook (PR
>  middle-end/81657).
> 
> gcc/ChangeLog:
> 
> 2018-03-14  Martin Liska  <mliska@suse.cz>
> 
> 	PR middle-end/81657
> 	* builtins.c (expand_builtin_memory_copy_args): Handle situation
> 	when libc library provides a fast mempcpy implementation/
> 	* config/linux-protos.h (ix86_linux_libc_func_speed): New.
> 	(TARGET_LIBC_FUNC_SPEED): Likewise.
> 	* config/i386/linux-common.h (SUBTARGET_LIBC_FUNC_SPEED): Define
> 	macro.
> 	* config/i386/t-linux: Add x86-linux.o.
> 	* config.gcc: Likewise.
> 	* config/i386/x86-linux.c: New file.
> 	* coretypes.h (enum libc_speed): Likewise.
> 	* doc/tm.texi: Document new target hook.
> 	* doc/tm.texi.in: Likewise.
> 	* expr.c (emit_block_move_hints): Handle libc bail out argument.
> 	* expr.h (emit_block_move_hints): Add new parameters.
> 	* target.def: Add new hook.
> 	* targhooks.c (enum libc_speed): New enum.
> 	(default_libc_func_speed): Provide a default hook
> 	implementation.
> 	* targhooks.h (default_libc_func_speed): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-03-14  Martin Liska  <mliska@suse.cz>
> 
> 	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
> 	and others.
This looks pretty reasonable now.  Let's go with it.  If we need to
adjust other targets we certainly can fault them in as their properties
are discovered/updated.

jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-12  8:57 [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657) Martin Liška
  2018-03-12  9:40 ` Marc Glisse
@ 2018-03-12 10:08 ` Richard Biener
  2018-03-12 13:35 ` Jakub Jelinek
  2 siblings, 0 replies; 45+ messages in thread
From: Richard Biener @ 2018-03-12 10:08 UTC (permalink / raw)
  To: Martin Liška; +Cc: GCC Patches, Jakub Jelinek, H.J. Lu

On Mon, Mar 12, 2018 at 9:57 AM, Martin Liška <mliska@suse.cz> wrote:
> Hi.
>
> This is fix for the PR that introduces a new target macro. Using the macro

Don't add new target macros, use target hooks.

> one can say that a target has a fast mempcpy and thus it's preferred to be used
> if possible.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> I also tested on x86_64-linux-gnu.
>
> Ready to be installed?
> Martin
>
> gcc/ChangeLog:
>
> 2018-03-08  Martin Liska  <mliska@suse.cz>
>
>         PR middle-end/81657
>         * builtins.c (expand_builtin_memory_copy_args): Add new
>         arguments.
>         * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
>         New macro.
>         * defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise.
>         * doc/tm.texi: Likewise.
>         * doc/tm.texi.in: Likewise.
>         * expr.c (compare_by_pieces): Add support for bail out.
>         (emit_block_move_hints): Likewise.
>         * expr.h (emit_block_move_hints): Add new arguments.
>
> gcc/testsuite/ChangeLog:
>
> 2018-03-09  Martin Liska  <mliska@suse.cz>
>
>         PR middle-end/81657
>         * gcc.dg/string-opt-1.c: Adjust test to run only on non-x86
>         target.
> ---
>  gcc/builtins.c                      | 13 ++++++++++++-
>  gcc/config/i386/i386.h              |  3 +++
>  gcc/defaults.h                      |  7 +++++++
>  gcc/doc/tm.texi                     |  5 +++++
>  gcc/doc/tm.texi.in                  |  5 +++++
>  gcc/expr.c                          | 16 +++++++++++++++-
>  gcc/expr.h                          |  4 +++-
>  gcc/testsuite/gcc.dg/string-opt-1.c |  4 ++--
>  8 files changed, 52 insertions(+), 5 deletions(-)
>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-03-12  8:57 [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657) Martin Liška
  2018-03-12  9:40 ` Marc Glisse
  2018-03-12 10:08 ` Richard Biener
@ 2018-03-12 13:35 ` Jakub Jelinek
  2 siblings, 0 replies; 45+ messages in thread
From: Jakub Jelinek @ 2018-03-12 13:35 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches, H.J. Lu

On Mon, Mar 12, 2018 at 09:57:09AM +0100, Martin LiÅ¡ka wrote:
> Hi.
> 
> This is fix for the PR that introduces a new target macro. Using the macro
> one can say that a target has a fast mempcpy and thus it's preferred to be used
> if possible.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> I also tested on x86_64-linux-gnu.

I think we shouldn't be adding new target macros, better would be a target
hook that returns this information.

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
@ 2018-04-12 15:53 Wilco Dijkstra
  2018-04-12 16:11 ` H.J. Lu
  2018-04-12 16:35 ` Jakub Jelinek
  0 siblings, 2 replies; 45+ messages in thread
From: Wilco Dijkstra @ 2018-04-12 15:53 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener
  Cc: nd, mliska, ubizjak, GCC Patches, marc.glisse, H.J. Lu, Jan Hubicka

Jakub Jelinek wrote:
> On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
>> Not sure if I missed some important part of the discussion but
>> for the testcase we want to preserve the tailcall, right?  So
>> it would be enough to set avoid_libcall to
>> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
>> stpcpy)?

The tailcall issue is just a distraction. Historically the handling of mempcpy  
has been horribly inefficient in both GCC and GLIBC for practically all targets.
This is why it was decided to defer to memcpy.

For example small constant mempcpy was not expanded inline like memcpy
until PR70140 was fixed. Except for a few targets which have added an
optimized mempcpy, the default mempcpy implementation in almost all
released GLIBCs is much slower than memcpy (due to using a badly written
C implementation).

Recent GLIBCs now call the optimized memcpy - this is better but still adds
extra call/return overheads. So to improve that the GLIBC headers have an
inline that changes any call to mempcpy into memcpy (this is the default but
can be disabled on a per-target basis).

Obviously it is best to do this optimization in GCC, which is what we finally do
in GCC8. Inlining mempcpy means you sometimes miss a tailcall, but this is
not common - in all of GLIBC the inlining on AArch64 adds 166 extra instructions
and 12 callee-save registers. This is a small codesize cost to avoid the overhead
of calling the generic C version.

> My preference would be to have non-lame mempcpy etc. on all targets, but the
> aarch64 folks disagree.

The question is who is going to write the 30+ mempcpy implementations for all
those targets which don't have one? And who says doing this is actually going to 
improve performance? Having mempcpy+memcpy typically means more Icache
misses in code that uses both.

So generally it's a good idea to change mempcpy into memcpy by default. It's
not slower than calling mempcpy even if you have a fast implementation, it's faster
if you use an up to date GLIBC which calls memcpy, and it's significantly better
when using an old GLIBC.

Wilco

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-12 15:53 Wilco Dijkstra
@ 2018-04-12 16:11 ` H.J. Lu
  2018-04-12 16:35 ` Jakub Jelinek
  1 sibling, 0 replies; 45+ messages in thread
From: H.J. Lu @ 2018-04-12 16:11 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: Jakub Jelinek, Richard Biener, nd, mliska, ubizjak, GCC Patches,
	marc.glisse, Jan Hubicka

On Thu, Apr 12, 2018 at 8:53 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:

> So generally it's a good idea to change mempcpy into memcpy by default. It's
> not slower than calling mempcpy even if you have a fast implementation, it's faster
> if you use an up to date GLIBC which calls memcpy, and it's significantly better
> when using an old GLIBC.
>

It is a BAD idea for x86.   We don't want to turn mempcpy to to memcpy on
x86.  PERIOD.

-- 
H.J.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-12 15:53 Wilco Dijkstra
  2018-04-12 16:11 ` H.J. Lu
@ 2018-04-12 16:35 ` Jakub Jelinek
  2018-04-12 16:30   ` Wilco Dijkstra
  1 sibling, 1 reply; 45+ messages in thread
From: Jakub Jelinek @ 2018-04-12 16:35 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: Richard Biener, nd, mliska, ubizjak, GCC Patches, marc.glisse,
	H.J. Lu, Jan Hubicka

On Thu, Apr 12, 2018 at 03:53:13PM +0000, Wilco Dijkstra wrote:
> Jakub Jelinek wrote:
> > On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
> >> Not sure if I missed some important part of the discussion but
> >> for the testcase we want to preserve the tailcall, right?  So
> >> it would be enough to set avoid_libcall to
> >> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
> >> stpcpy)?
> 
> The tailcall issue is just a distraction. Historically the handling of mempcpy  
> has been horribly inefficient in both GCC and GLIBC for practically all targets.
> This is why it was decided to defer to memcpy.

I guess we need to agree to disagree.  But we have a P1 PR that we need to
resolve and it is one of the last 6 blockers we have.  I'm not suggesting to
revert PR70140, just let use mempcpy libcall if it is what the user wrote and
we aren't expanding it inline.

> So generally it's a good idea to change mempcpy into memcpy by default. It's

No.

> not slower than calling mempcpy even if you have a fast implementation, it's faster
> if you use an up to date GLIBC which calls memcpy, and it's significantly better
> when using an old GLIBC.

mempcpy is quite good on many targets even in old GLIBCs.

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-12 16:35 ` Jakub Jelinek
@ 2018-04-12 16:30   ` Wilco Dijkstra
  2018-04-12 17:35     ` Jakub Jelinek
  0 siblings, 1 reply; 45+ messages in thread
From: Wilco Dijkstra @ 2018-04-12 16:30 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, nd, mliska, ubizjak, GCC Patches, marc.glisse,
	H.J. Lu, Jan Hubicka

Jakub Jelinek wrote:
> On Thu, Apr 12, 2018 at 03:53:13PM +0000, Wilco Dijkstra wrote:

>> The tailcall issue is just a distraction. Historically the handling of mempcpy 
>> has been horribly inefficient in both GCC and GLIBC for practically all targets.
>> This is why it was decided to defer to memcpy.
>
> I guess we need to agree to disagree.  But we have a P1 PR that we need to
> resolve and it is one of the last 6 blockers we have.  I'm not suggesting to
> revert PR70140, just let use mempcpy libcall if it is what the user wrote and
> we aren't expanding it inline.

Frankly I don't see why it is a P1 regression. Do you have a benchmark that
regresses significantly (a few percent, not by a few bytes)? I already showed
the AArch64 results for GLIBC, do you have x86 results that prove things are
much worse?

>> So generally it's a good idea to change mempcpy into memcpy by default. It's
>> not slower than calling mempcpy even if you have a fast implementation, it's faster
>> if you use an up to date GLIBC which calls memcpy, and it's significantly better
>> when using an old GLIBC.
>
> mempcpy is quite good on many targets even in old GLIBCs.

Only true if with "many" you mean x86, x86_64 and IIRC sparc.

Wilco
    

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-12 16:30   ` Wilco Dijkstra
@ 2018-04-12 17:35     ` Jakub Jelinek
  2018-04-12 17:29       ` Wilco Dijkstra
  0 siblings, 1 reply; 45+ messages in thread
From: Jakub Jelinek @ 2018-04-12 17:35 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: Richard Biener, nd, mliska, ubizjak, GCC Patches, marc.glisse,
	H.J. Lu, Jan Hubicka

On Thu, Apr 12, 2018 at 04:30:07PM +0000, Wilco Dijkstra wrote:
> Jakub Jelinek wrote:
> > On Thu, Apr 12, 2018 at 03:53:13PM +0000, Wilco Dijkstra wrote:
> 
> >> The tailcall issue is just a distraction. Historically the handling of mempcpyÂ 
> >> has been horribly inefficient in both GCC and GLIBC for practically all targets.
> >> This is why it was decided to defer to memcpy.
> >
> > I guess we need to agree to disagree.Â  But we have a P1 PR that we need to
> > resolve and it is one of the last 6 blockers we have.Â  I'm not suggesting to
> > revert PR70140, just let use mempcpy libcall if it is what the user wrote and
> > we aren't expanding it inline.
> 
> Frankly I don't see why it is a P1 regression. Do you have a benchmark that

That is how regression priorities are defined.

> >> So generally it's a good idea to change mempcpy into memcpy by default. It's
> >> not slower than calling mempcpy even if you have a fast implementation, it's faster
> >> if you use an up to date GLIBC which calls memcpy, and it's significantly better
> >> when using an old GLIBC.
> >
> > mempcpy is quite good on many targets even in old GLIBCs.
> 
> Only true if with "many" you mean x86, x86_64 and IIRC sparc.

Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got added,
in 2013 other power versions, in 2016 s390*, etc.  Doing a decent mempcpy
isn't hard if you have asm version of memcpy and one spare register.

	Jakub

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).
  2018-04-12 17:35     ` Jakub Jelinek
@ 2018-04-12 17:29       ` Wilco Dijkstra
  0 siblings, 0 replies; 45+ messages in thread
From: Wilco Dijkstra @ 2018-04-12 17:29 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Richard Biener, nd, mliska, ubizjak, GCC Patches, marc.glisse,
	H.J. Lu, Jan Hubicka

Jakub Jelinek wrote:
>On Thu, Apr 12, 2018 at 04:30:07PM +0000, Wilco Dijkstra wrote:
>> Jakub Jelinek wrote:

>> Frankly I don't see why it is a P1 regression. Do you have a benchmark that
>
>That is how regression priorities are defined.

How can one justify considering this a release blocker without hard numbers?
If this is a 1% regression on a large body of code it would be very serious, if 0.01% - 
not so much.

>> >> So generally it's a good idea to change mempcpy into memcpy by default. It's
>> >> not slower than calling mempcpy even if you have a fast implementation, it's faster
>> >> if you use an up to date GLIBC which calls memcpy, and it's significantly better
>> >> when using an old GLIBC.
>> >
>> > mempcpy is quite good on many targets even in old GLIBCs.
>> 
>> Only true if with "many" you mean x86, x86_64 and IIRC sparc.
>
> Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got added,
> in 2013 other power versions, in 2016 s390*, etc.  Doing a decent mempcpy
> isn't hard if you have asm version of memcpy and one spare register.

More mempcpy implementations have been added in recent years indeed, but almost all
add an extra copy of the memcpy code rather than using a single combined implementation.
That means it is still better to call memcpy (which is frequently used and thus likely in L1/L2)
rather than mempcpy (which is more likely to be cold and thus not cached).

Wilco

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2018-07-05 19:34 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-12  8:57 [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657) Martin Liška
2018-03-12  9:40 ` Marc Glisse
2018-03-13  8:25   ` Martin Liška
2018-03-13  8:35     ` Jakub Jelinek
2018-03-13  9:22       ` Richard Biener
2018-03-13 15:19       ` Martin Liška
2018-03-13 20:35         ` Jakub Jelinek
2018-03-14 12:57           ` Martin Liška
2018-03-14 13:08             ` H.J. Lu
2018-03-14 13:10               ` Martin Liška
2018-03-14 13:43                 ` Jakub Jelinek
2018-03-14 13:08             ` Jakub Jelinek
2018-03-14 14:04               ` Martin Liška
2018-03-18 22:18                 ` Martin Liška
2018-03-21 10:37                 ` Jakub Jelinek
2018-03-28 14:29                   ` Martin Liška
2018-03-28 14:35                     ` Jakub Jelinek
2018-03-28 17:59                       ` Martin Liška
2018-03-28 18:35                         ` Jakub Jelinek
2018-03-29 12:14                           ` Martin Liška
2018-03-29 12:31                             ` H.J. Lu
2018-03-29 12:43                             ` Jakub Jelinek
2018-03-29 12:35                               ` Martin Liška
2018-04-04 10:13                                 ` Martin Liška
2018-04-09 12:31                                 ` Martin Liška
2018-04-10 10:35                                   ` Jakub Jelinek
2018-04-10 12:28                                     ` Martin Liška
2018-04-12 13:22                                       ` Martin Liška
2018-04-12 13:42                                         ` Jan Hubicka
2018-04-12 13:52                                         ` Richard Biener
2018-04-12 14:35                                           ` Jakub Jelinek
2018-04-12 14:19                                             ` Richard Biener
2018-04-12 14:35                                               ` Jakub Jelinek
2018-04-12 15:17                                                 ` Richard Biener
2018-04-12 17:35                                                   ` Jakub Jelinek
2018-04-12 16:03                                                     ` Richard Biener
2018-07-05 19:34                                       ` Jeff Law
2018-03-12 10:08 ` Richard Biener
2018-03-12 13:35 ` Jakub Jelinek
2018-04-12 15:53 Wilco Dijkstra
2018-04-12 16:11 ` H.J. Lu
2018-04-12 16:35 ` Jakub Jelinek
2018-04-12 16:30   ` Wilco Dijkstra
2018-04-12 17:35     ` Jakub Jelinek
2018-04-12 17:29       ` Wilco Dijkstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).