* [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). @ 2018-03-12 8:57 Martin Liška 2018-03-12 9:40 ` Marc Glisse ` (2 more replies) 0 siblings, 3 replies; 45+ messages in thread From: Martin Liška @ 2018-03-12 8:57 UTC (permalink / raw) To: gcc-patches; +Cc: Jakub Jelinek, H.J. Lu [-- Attachment #1: Type: text/plain, Size: 1416 bytes --] Hi. This is fix for the PR that introduces a new target macro. Using the macro one can say that a target has a fast mempcpy and thus it's preferred to be used if possible. Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. I also tested on x86_64-linux-gnu. Ready to be installed? Martin gcc/ChangeLog: 2018-03-08 Martin Liska <mliska@suse.cz> PR middle-end/81657 * builtins.c (expand_builtin_memory_copy_args): Add new arguments. * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): New macro. * defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise. * doc/tm.texi: Likewise. * doc/tm.texi.in: Likewise. * expr.c (compare_by_pieces): Add support for bail out. (emit_block_move_hints): Likewise. * expr.h (emit_block_move_hints): Add new arguments. gcc/testsuite/ChangeLog: 2018-03-09 Martin Liska <mliska@suse.cz> PR middle-end/81657 * gcc.dg/string-opt-1.c: Adjust test to run only on non-x86 target. --- gcc/builtins.c | 13 ++++++++++++- gcc/config/i386/i386.h | 3 +++ gcc/defaults.h | 7 +++++++ gcc/doc/tm.texi | 5 +++++ gcc/doc/tm.texi.in | 5 +++++ gcc/expr.c | 16 +++++++++++++++- gcc/expr.h | 4 +++- gcc/testsuite/gcc.dg/string-opt-1.c | 4 ++-- 8 files changed, 52 insertions(+), 5 deletions(-) [-- Attachment #2: 0001-Prefer-mempcpy-to-memcpy-on-x86_64-target-PR-middle-.patch --] [-- Type: text/x-patch, Size: 6311 bytes --] diff --git a/gcc/builtins.c b/gcc/builtins.c index 85affa74510..c2ca36934f7 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len, src_mem = get_memory_rtx (src, len); set_mem_align (src_mem, src_align); + bool is_move_done; + /* Copy word part most expediently. */ dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, CALL_EXPR_TAILCALL (exp) && (endp == 0 || target == const0_rtx) ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL, expected_align, expected_size, - min_size, max_size, probable_max_size); + min_size, max_size, probable_max_size, + TARGET_HAS_FAST_MEMPCPY_ROUTINE + && endp == 1, + &is_move_done); + + /* Bail out when a mempcpy call would be expanded as libcall and when + we have a target that provides a fast implementation + of mempcpy routine. */ + if (!is_move_done) + return NULL_RTX; if (dest_addr == 0) { diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index e43edd77b56..8744d706fd7 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1914,6 +1914,9 @@ typedef struct ix86_args { #define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2) +/* C library provides fast implementation of mempcpy function. */ +#define TARGET_HAS_FAST_MEMPCPY_ROUTINE 1 + /* Define if shifts truncate the shift count which implies one can omit a sign-extension or zero-extension of a shift count. diff --git a/gcc/defaults.h b/gcc/defaults.h index 78a08a33f12..2e5caac8dcd 100644 --- a/gcc/defaults.h +++ b/gcc/defaults.h @@ -1340,6 +1340,13 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define SET_RATIO(speed) MOVE_RATIO (speed) #endif +/* By default do not generate libcall to mempcpy and rather use + libcall to memcpy and adjustment of return value. */ + +#ifndef TARGET_HAS_FAST_MEMPCPY_ROUTINE +#define TARGET_HAS_FAST_MEMPCPY_ROUTINE 0 +#endif + /* Supply a default definition of STACK_SAVEAREA_MODE for emit_stack_save. Normally move_insn, so Pmode stack pointer. */ diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index bd8b917ba82..0c8a3f3298c 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6627,6 +6627,11 @@ optimized for speed rather than size. If you don't define this, it defaults to the value of @code{MOVE_RATIO}. @end defmac +@defmac TARGET_HAS_FAST_MEMPCPY_ROUTINE +By default do not generate libcall to mempcpy and rather use +libcall to memcpy and adjustment of return value. +@end defmac + @defmac USE_LOAD_POST_INCREMENT (@var{mode}) A C expression used to determine whether a load postincrement is a good thing to use for a given mode. Defaults to the value of diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index b0207146e8c..e7ef85ab78e 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4560,6 +4560,11 @@ optimized for speed rather than size. If you don't define this, it defaults to the value of @code{MOVE_RATIO}. @end defmac +@defmac TARGET_HAS_FAST_MEMPCPY_ROUTINE +By default do not generate libcall to mempcpy and rather use +libcall to memcpy and adjustment of return value. +@end defmac + @defmac USE_LOAD_POST_INCREMENT (@var{mode}) A C expression used to determine whether a load postincrement is a good thing to use for a given mode. Defaults to the value of diff --git a/gcc/expr.c b/gcc/expr.c index 00660293f72..b6c13652d79 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len, MIN_SIZE is the minimal size of block to move MAX_SIZE is the maximal size of block to move, if it can not be represented in unsigned HOST_WIDE_INT, than it is mask of all ones. + If BAIL_OUT_LIBCALL is set true, do not emit library call and set + *IS_MOVE_DONE to false. Return the address of the new block, if memcpy is called and returns it, 0 otherwise. */ @@ -1563,12 +1565,17 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, unsigned int expected_align, HOST_WIDE_INT expected_size, unsigned HOST_WIDE_INT min_size, unsigned HOST_WIDE_INT max_size, - unsigned HOST_WIDE_INT probable_max_size) + unsigned HOST_WIDE_INT probable_max_size, + bool bail_out_libcall, bool *is_move_done) { bool may_use_call; rtx retval = 0; unsigned int align; + /* When not doing a bail out, we always emit a memory move. */ + if (is_move_done) + *is_move_done = true; + gcc_assert (size); if (CONST_INT_P (size) && INTVAL (size) == 0) return 0; @@ -1625,6 +1632,13 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x)) && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y))) { + if (bail_out_libcall) + { + if (is_move_done) + *is_move_done = false; + return retval; + } + /* Since x and y are passed to a libcall, mark the corresponding tree EXPR as addressable. */ tree y_expr = MEM_EXPR (y); diff --git a/gcc/expr.h b/gcc/expr.h index b3d523bcb24..023bc5aec47 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -110,7 +110,9 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods, unsigned int, HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, - unsigned HOST_WIDE_INT); + unsigned HOST_WIDE_INT, + bool bail_out_libcall = false, + bool *is_move_done = NULL); extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool, by_pieces_constfn, void *); extern bool emit_storent_insn (rtx to, rtx from); diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c index 2f060732bf0..851c8b04a33 100644 --- a/gcc/testsuite/gcc.dg/string-opt-1.c +++ b/gcc/testsuite/gcc.dg/string-opt-1.c @@ -48,5 +48,5 @@ main (void) return 0; } -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */ -/* { dg-final { scan-assembler "memcpy" } } */ +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */ +/* { dg-final { scan-assembler "memcpy" { target { ! { i?86-*-* x86_64-*-* } } } } } */ ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-12 8:57 [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657) Martin Liška @ 2018-03-12 9:40 ` Marc Glisse 2018-03-13 8:25 ` Martin Liška 2018-03-12 10:08 ` Richard Biener 2018-03-12 13:35 ` Jakub Jelinek 2 siblings, 1 reply; 45+ messages in thread From: Marc Glisse @ 2018-03-12 9:40 UTC (permalink / raw) To: Martin Liška; +Cc: gcc-patches, Jakub Jelinek, H.J. Lu On Mon, 12 Mar 2018, Martin LiÅ¡ka wrote: > This is fix for the PR that introduces a new target macro. Using the macro > one can say that a target has a fast mempcpy and thus it's preferred to be used > if possible. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > I also tested on x86_64-linux-gnu. > > Ready to be installed? > Martin > > gcc/ChangeLog: > > 2018-03-08 Martin Liska <mliska@suse.cz> > > PR middle-end/81657 > * builtins.c (expand_builtin_memory_copy_args): Add new > arguments. > * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): > New macro. Shouldn't the macro be defined in a more specific case, for instance glibc on x86? Or do all known libc on x86 happen to provide a fast mempcpy? > * defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise. > * doc/tm.texi: Likewise. > * doc/tm.texi.in: Likewise. > * expr.c (compare_by_pieces): Add support for bail out. > (emit_block_move_hints): Likewise. > * expr.h (emit_block_move_hints): Add new arguments. -- Marc Glisse ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-12 9:40 ` Marc Glisse @ 2018-03-13 8:25 ` Martin Liška 2018-03-13 8:35 ` Jakub Jelinek 0 siblings, 1 reply; 45+ messages in thread From: Martin Liška @ 2018-03-13 8:25 UTC (permalink / raw) To: gcc-patches, Marc Glisse; +Cc: Jakub Jelinek, H.J. Lu On 03/12/2018 10:39 AM, Marc Glisse wrote: > On Mon, 12 Mar 2018, Martin LiÅ¡ka wrote: > >> This is fix for the PR that introduces a new target macro. Using the macro >> one can say that a target has a fast mempcpy and thus it's preferred to be used >> if possible. >> >> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >> I also tested on x86_64-linux-gnu. >> >> Ready to be installed? >> Martin >> >> gcc/ChangeLog: >> >> 2018-03-08 Martin Liska <mliska@suse.cz> >> >>     PR middle-end/81657 >>     * builtins.c (expand_builtin_memory_copy_args): Add new >>     arguments. >>     * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): >>     New macro. > > Shouldn't the macro be defined in a more specific case, for instance glibc on x86? Or do all known libc on x86 happen to provide a fast mempcpy? That's Marc a very good question. Do we already have a glibc-related target macros/hooks? If so, I would add this as one of these. Thanks, Martin > >>     * defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise. >>     * doc/tm.texi: Likewise. >>     * doc/tm.texi.in: Likewise. >>     * expr.c (compare_by_pieces): Add support for bail out. >>     (emit_block_move_hints): Likewise. >>     * expr.h (emit_block_move_hints): Add new arguments. > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-13 8:25 ` Martin Liška @ 2018-03-13 8:35 ` Jakub Jelinek 2018-03-13 9:22 ` Richard Biener 2018-03-13 15:19 ` Martin Liška 0 siblings, 2 replies; 45+ messages in thread From: Jakub Jelinek @ 2018-03-13 8:35 UTC (permalink / raw) To: Martin Liška; +Cc: gcc-patches, Marc Glisse, H.J. Lu On Tue, Mar 13, 2018 at 09:24:11AM +0100, Martin LiÅ¡ka wrote: > On 03/12/2018 10:39 AM, Marc Glisse wrote: > > On Mon, 12 Mar 2018, Martin LiÅ¡ka wrote: > > > > > This is fix for the PR that introduces a new target macro. Using the macro > > > one can say that a target has a fast mempcpy and thus it's preferred to be used > > > if possible. > > > > > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > > > I also tested on x86_64-linux-gnu. > > > > > > Ready to be installed? > > > Martin > > > > > > gcc/ChangeLog: > > > > > > 2018-03-08 Martin Liska <mliska@suse.cz> > > > > > >     PR middle-end/81657 > > >     * builtins.c (expand_builtin_memory_copy_args): Add new > > >     arguments. > > >     * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): > > >     New macro. > > > > Shouldn't the macro be defined in a more specific case, for instance glibc on x86? Or do all known libc on x86 happen to provide a fast mempcpy? > > That's Marc a very good question. Do we already have a glibc-related target macros/hooks? > If so, I would add this as one of these. Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook, where in particular linux_libc_has_function deals with various C libraries. Of course, in this case you need another target hook, that is dependent both on the target backend and C library. It would be nice to make the target hook a little bit more generic as well, e.g. pass it enum builtin_function and query if it is fast, slow or unknown, or even some kind of cost, where the caller could ask for cost of BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs. Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-13 8:35 ` Jakub Jelinek @ 2018-03-13 9:22 ` Richard Biener 2018-03-13 15:19 ` Martin Liška 1 sibling, 0 replies; 45+ messages in thread From: Richard Biener @ 2018-03-13 9:22 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Martin Liška, GCC Patches, Marc Glisse, H.J. Lu On Tue, Mar 13, 2018 at 9:32 AM, Jakub Jelinek <jakub@redhat.com> wrote: > On Tue, Mar 13, 2018 at 09:24:11AM +0100, Martin Liška wrote: >> On 03/12/2018 10:39 AM, Marc Glisse wrote: >> > On Mon, 12 Mar 2018, Martin Liška wrote: >> > >> > > This is fix for the PR that introduces a new target macro. Using the macro >> > > one can say that a target has a fast mempcpy and thus it's preferred to be used >> > > if possible. >> > > >> > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >> > > I also tested on x86_64-linux-gnu. >> > > >> > > Ready to be installed? >> > > Martin >> > > >> > > gcc/ChangeLog: >> > > >> > > 2018-03-08 Martin Liska <mliska@suse.cz> >> > > >> > > PR middle-end/81657 >> > > * builtins.c (expand_builtin_memory_copy_args): Add new >> > > arguments. >> > > * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): >> > > New macro. >> > >> > Shouldn't the macro be defined in a more specific case, for instance glibc on x86? Or do all known libc on x86 happen to provide a fast mempcpy? >> >> That's Marc a very good question. Do we already have a glibc-related target macros/hooks? >> If so, I would add this as one of these. > > Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook, > where in particular linux_libc_has_function deals with various C libraries. > Of course, in this case you need another target hook, that is dependent both > on the target backend and C library. > > It would be nice to make the target hook a little bit more generic as well, > e.g. pass it enum builtin_function and query if it is fast, slow or > unknown, or even some kind of cost, where the caller could ask for cost of > BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs. Or maybe a hook returning the alternative to use? Pass it BUILT_IN_X and get back the same or related builtin? Richard. > Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-13 8:35 ` Jakub Jelinek 2018-03-13 9:22 ` Richard Biener @ 2018-03-13 15:19 ` Martin Liška 2018-03-13 20:35 ` Jakub Jelinek 1 sibling, 1 reply; 45+ messages in thread From: Martin Liška @ 2018-03-13 15:19 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches, Marc Glisse, H.J. Lu On 03/13/2018 09:32 AM, Jakub Jelinek wrote: > On Tue, Mar 13, 2018 at 09:24:11AM +0100, Martin LiÅ¡ka wrote: >> On 03/12/2018 10:39 AM, Marc Glisse wrote: >>> On Mon, 12 Mar 2018, Martin LiÅ¡ka wrote: >>> >>>> This is fix for the PR that introduces a new target macro. Using the macro >>>> one can say that a target has a fast mempcpy and thus it's preferred to be used >>>> if possible. >>>> >>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. >>>> I also tested on x86_64-linux-gnu. >>>> >>>> Ready to be installed? >>>> Martin >>>> >>>> gcc/ChangeLog: >>>> >>>> 2018-03-08 Martin Liska <mliska@suse.cz> >>>> >>>>     PR middle-end/81657 >>>>     * builtins.c (expand_builtin_memory_copy_args): Add new >>>>     arguments. >>>>     * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): >>>>     New macro. >>> >>> Shouldn't the macro be defined in a more specific case, for instance glibc on x86? Or do all known libc on x86 happen to provide a fast mempcpy? >> >> That's Marc a very good question. Do we already have a glibc-related target macros/hooks? >> If so, I would add this as one of these. > > Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook, > where in particular linux_libc_has_function deals with various C libraries. > Of course, in this case you need another target hook, that is dependent both > on the target backend and C library. > > It would be nice to make the target hook a little bit more generic as well, > e.g. pass it enum builtin_function and query if it is fast, slow or > unknown, or even some kind of cost, where the caller could ask for cost of > BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs. Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation. I would like to implement the function in gcc/targhooks.c, but I don't know how to make ifdef according to target? One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the callback argument with int, that will be casted. One last issue: am I right that I'll have to define TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)? Thanks, Martin > > Jakub > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-13 15:19 ` Martin Liška @ 2018-03-13 20:35 ` Jakub Jelinek 2018-03-14 12:57 ` Martin Liška 0 siblings, 1 reply; 45+ messages in thread From: Jakub Jelinek @ 2018-03-13 20:35 UTC (permalink / raw) To: Martin Liška; +Cc: gcc-patches, Marc Glisse, H.J. Lu On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin LiÅ¡ka wrote: > > Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook, > > where in particular linux_libc_has_function deals with various C libraries. > > Of course, in this case you need another target hook, that is dependent both > > on the target backend and C library. > > > > It would be nice to make the target hook a little bit more generic as well, > > e.g. pass it enum builtin_function and query if it is fast, slow or > > unknown, or even some kind of cost, where the caller could ask for cost of > > BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs. > > Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook > definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation. > I would like to implement the function in gcc/targhooks.c, but I don't know how to > make ifdef according to target? Put there just the default implementation (everything is UNKNOWN?). > One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the > callback argument with int, that will be casted. One last issue: am I right that I'll have to define > TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)? And define the i386/x86_64 glibc one in config/i386/*.h, check there OPTION_GLIBC and only in that case return something other than UNKNOWN. And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case. Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-13 20:35 ` Jakub Jelinek @ 2018-03-14 12:57 ` Martin Liška 2018-03-14 13:08 ` Jakub Jelinek 2018-03-14 13:08 ` H.J. Lu 0 siblings, 2 replies; 45+ messages in thread From: Martin Liška @ 2018-03-14 12:57 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches, Marc Glisse, H.J. Lu [-- Attachment #1: Type: text/plain, Size: 1705 bytes --] On 03/13/2018 04:23 PM, Jakub Jelinek wrote: > On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin LiÅ¡ka wrote: >>> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook, >>> where in particular linux_libc_has_function deals with various C libraries. >>> Of course, in this case you need another target hook, that is dependent both >>> on the target backend and C library. >>> >>> It would be nice to make the target hook a little bit more generic as well, >>> e.g. pass it enum builtin_function and query if it is fast, slow or >>> unknown, or even some kind of cost, where the caller could ask for cost of >>> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs. >> >> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook >> definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation. >> I would like to implement the function in gcc/targhooks.c, but I don't know how to >> make ifdef according to target? > > Put there just the default implementation (everything is UNKNOWN?). > >> One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the >> callback argument with int, that will be casted. One last issue: am I right that I'll have to define >> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)? > > And define the i386/x86_64 glibc one in config/i386/*.h, check there > OPTION_GLIBC and only in that case return something other than UNKNOWN. > > And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case. > > Jakub > Hi. I'm sending V2 that can survive bootstrap and regression tests on both x86_64 and ppc64le. Martin [-- Attachment #2: 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch --] [-- Type: text/x-patch, Size: 11154 bytes --] From 222c7c205a7afc144dc123d2b378a057dcf8816f Mon Sep 17 00:00:00 2001 From: marxin <mliska@suse.cz> Date: Wed, 14 Mar 2018 09:44:18 +0100 Subject: [PATCH] Introduce new libc_func_speed target hook (PR middle-end/81657). gcc/ChangeLog: 2018-03-14 Martin Liska <mliska@suse.cz> PR middle-end/81657 * builtins.c (expand_builtin_memory_copy_args): Handle situation when libc library provides a fast mempcpy implementation/ * config/i386/i386-protos.h (gnu_libc_func_speed): New. * config/i386/i386.c (enum libc_speed): Likewise. (ix86_libc_func_speed): Likewise. (TARGET_LIBC_FUNC_SPEED): Likewise. * coretypes.h (enum libc_speed): Likewise. * doc/tm.texi: Document new target hook. * doc/tm.texi.in: Likewise. * expr.c (emit_block_move_hints): Handle libc bail out argument. * expr.h (emit_block_move_hints): Add new parameters. * target.def: Add new hook. * targhooks.c (enum libc_speed): New enum. (default_libc_func_speed): Provide a default hook implementation. * targhooks.h (default_libc_func_speed): Likewise. gcc/testsuite/ChangeLog: 2018-03-14 Martin Liska <mliska@suse.cz> * gcc.c-torture/execute/builtins/mempcpy.c (main_test): Adjust to not use mempcpy. * gcc.dg/string-opt-1.c: Adjust for i386 target. --- gcc/builtins.c | 13 ++++++++++++- gcc/config/i386/i386-protos.h | 2 ++ gcc/config/i386/i386.c | 20 ++++++++++++++++++++ gcc/coretypes.h | 7 +++++++ gcc/doc/tm.texi | 4 ++++ gcc/doc/tm.texi.in | 1 + gcc/expr.c | 16 +++++++++++++++- gcc/expr.h | 4 +++- gcc/target.def | 7 +++++++ gcc/targhooks.c | 6 ++++++ gcc/targhooks.h | 1 + .../gcc.c-torture/execute/builtins/mempcpy.c | 2 +- gcc/testsuite/gcc.dg/string-opt-1.c | 4 ++-- 13 files changed, 81 insertions(+), 6 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index 85affa74510..eb038dd45b3 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len, src_mem = get_memory_rtx (src, len); set_mem_align (src_mem, src_align); + bool is_move_done; + /* Copy word part most expediently. */ + bool bail_out_libcall = endp == 1 + && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED; dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, CALL_EXPR_TAILCALL (exp) && (endp == 0 || target == const0_rtx) ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL, expected_align, expected_size, - min_size, max_size, probable_max_size); + min_size, max_size, probable_max_size, + bail_out_libcall, &is_move_done); + + /* Bail out when a mempcpy call would be expanded as libcall and when + we have a target that provides a fast implementation + of mempcpy routine. */ + if (!is_move_done) + return NULL_RTX; if (dest_addr == 0) { diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index ef7c818986f..d3fc515845b 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void); extern bool ix86_using_red_zone (void); +extern enum libc_speed gnu_libc_func_speed (int fn); + #ifdef RTX_CODE extern int standard_80387_constant_p (rtx); extern const char *standard_80387_constant_opcode (rtx); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index af24c6ec5ba..b3eb50a42f3 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2733,6 +2733,23 @@ ix86_using_red_zone (void) && (!cfun->machine->has_local_indirect_jump || cfun->machine->indirect_branch_type == indirect_branch_keep)); } + +enum libc_speed +ix86_libc_func_speed (int fn) +{ + enum built_in_function f = (built_in_function)fn; + + if (!OPTION_GLIBC) + return UNKNOWN_SPEED; + + switch (f) + { + case BUILT_IN_MEMPCPY: + return FAST_SPEED; + default: + return UNKNOWN_SPEED; + } +} \f /* Return a string that documents the current -m options. The caller is responsible for freeing the string. */ @@ -52061,6 +52078,9 @@ ix86_run_selftests (void) #undef TARGET_WARN_PARAMETER_PASSING_ABI #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi +#undef TARGET_LIBC_FUNC_SPEED +#define TARGET_LIBC_FUNC_SPEED ix86_libc_func_speed + #if CHECKING_P #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests diff --git a/gcc/coretypes.h b/gcc/coretypes.h index 283b4eb33fe..8123df7ccc5 100644 --- a/gcc/coretypes.h +++ b/gcc/coretypes.h @@ -384,6 +384,13 @@ enum excess_precision_type EXCESS_PRECISION_TYPE_FAST }; +enum libc_speed +{ + FAST_SPEED, + SLOW_SPEED, + UNKNOWN_SPEED +}; + /* Support for user-provided GGC and PCH markers. The first parameter is a pointer to a pointer, the second a cookie. */ typedef void (*gt_pointer_operator) (void *, void *); diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index bd8b917ba82..0f7c91a22c4 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5501,6 +5501,10 @@ macro, a reasonable default is used. This hook determines whether a function from a class of functions @var{fn_class} is present at the runtime. @end deftypefn +@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn}) +This hook determines whether a function from libc has a fast implementation +@var{fn} is present at the runtime. +@end deftypefn @defmac NEXT_OBJC_RUNTIME Set this macro to 1 to use the "NeXT" Objective-C message sending conventions diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index b0207146e8c..4bb2998a8a1 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3933,6 +3933,7 @@ macro, a reasonable default is used. @end defmac @hook TARGET_LIBC_HAS_FUNCTION +@hook TARGET_LIBC_FUNC_SPEED @defmac NEXT_OBJC_RUNTIME Set this macro to 1 to use the "NeXT" Objective-C message sending conventions diff --git a/gcc/expr.c b/gcc/expr.c index 00660293f72..b6c13652d79 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len, MIN_SIZE is the minimal size of block to move MAX_SIZE is the maximal size of block to move, if it can not be represented in unsigned HOST_WIDE_INT, than it is mask of all ones. + If BAIL_OUT_LIBCALL is set true, do not emit library call and set + *IS_MOVE_DONE to false. Return the address of the new block, if memcpy is called and returns it, 0 otherwise. */ @@ -1563,12 +1565,17 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, unsigned int expected_align, HOST_WIDE_INT expected_size, unsigned HOST_WIDE_INT min_size, unsigned HOST_WIDE_INT max_size, - unsigned HOST_WIDE_INT probable_max_size) + unsigned HOST_WIDE_INT probable_max_size, + bool bail_out_libcall, bool *is_move_done) { bool may_use_call; rtx retval = 0; unsigned int align; + /* When not doing a bail out, we always emit a memory move. */ + if (is_move_done) + *is_move_done = true; + gcc_assert (size); if (CONST_INT_P (size) && INTVAL (size) == 0) return 0; @@ -1625,6 +1632,13 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x)) && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y))) { + if (bail_out_libcall) + { + if (is_move_done) + *is_move_done = false; + return retval; + } + /* Since x and y are passed to a libcall, mark the corresponding tree EXPR as addressable. */ tree y_expr = MEM_EXPR (y); diff --git a/gcc/expr.h b/gcc/expr.h index b3d523bcb24..023bc5aec47 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -110,7 +110,9 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods, unsigned int, HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, - unsigned HOST_WIDE_INT); + unsigned HOST_WIDE_INT, + bool bail_out_libcall = false, + bool *is_move_done = NULL); extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool, by_pieces_constfn, void *); extern bool emit_storent_insn (rtx to, rtx from); diff --git a/gcc/target.def b/gcc/target.def index c5b2a1e7e71..3bbddc82776 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2639,6 +2639,13 @@ DEFHOOK bool, (enum function_class fn_class), default_libc_has_function) +DEFHOOK +(libc_func_speed, + "This hook determines whether a function from libc has a fast implementation\n\ +@var{fn} is present at the runtime.", + libc_speed, (int fn), + default_libc_func_speed) + /* True if new jumps cannot be created, to replace existing ones or not, at the current point in the compilation. */ DEFHOOK diff --git a/gcc/targhooks.c b/gcc/targhooks.c index fafcc6c5196..af0c87ff9fe 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED) return false; } +enum libc_speed +default_libc_func_speed (int) +{ + return UNKNOWN_SPEED; +} + tree default_builtin_tm_load_store (tree ARG_UNUSED (type)) { diff --git a/gcc/targhooks.h b/gcc/targhooks.h index 8a4393f2ba4..7508673ad0a 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void); extern bool default_libc_has_function (enum function_class); extern bool no_c99_libc_has_function (enum function_class); extern bool gnu_libc_has_function (enum function_class); +extern enum libc_speed default_libc_func_speed (int); extern tree default_builtin_tm_load_store (tree); diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c index d82e2232d7b..91e1c87f83f 100644 --- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c @@ -62,7 +62,7 @@ main_test (void) mempcpy (p + 5, s3, 1); if (memcmp (p, "ABCDEFg", 8)) abort (); - mempcpy (p + 6, s1 + 1, l1); + memcpy (p + 6, s1 + 1, l1); if (memcmp (p, "ABCDEF2", 8)) abort (); } diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c index 2f060732bf0..851c8b04a33 100644 --- a/gcc/testsuite/gcc.dg/string-opt-1.c +++ b/gcc/testsuite/gcc.dg/string-opt-1.c @@ -48,5 +48,5 @@ main (void) return 0; } -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */ -/* { dg-final { scan-assembler "memcpy" } } */ +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */ +/* { dg-final { scan-assembler "memcpy" { target { ! { i?86-*-* x86_64-*-* } } } } } */ -- 2.16.2 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-14 12:57 ` Martin Liška @ 2018-03-14 13:08 ` Jakub Jelinek 2018-03-14 14:04 ` Martin Liška 2018-03-14 13:08 ` H.J. Lu 1 sibling, 1 reply; 45+ messages in thread From: Jakub Jelinek @ 2018-03-14 13:08 UTC (permalink / raw) To: Martin Liška; +Cc: gcc-patches, Marc Glisse, H.J. Lu On Wed, Mar 14, 2018 at 01:54:39PM +0100, Martin LiÅ¡ka wrote: > --- a/gcc/builtins.c > +++ b/gcc/builtins.c > @@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len, > src_mem = get_memory_rtx (src, len); > set_mem_align (src_mem, src_align); > > + bool is_move_done; > + > /* Copy word part most expediently. */ This comment supposedly belongs right above the emit_block_move_hints call. > + bool bail_out_libcall = endp == 1 > + && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED; Formatting. && belongs below endp. So either: bool bail_out_libcall = (endp == 1 && ...); or bool bail_out_libcall = false; if (endp == 1 && ...) bail_out_libcall = true; ? The variable is not named very well, shouldn't it be avoid_libcall or something similar? Perhaps the variable should have a comment describing what it is. Do you need separate argument for that bool and is_move_done, rather than just the flag being that some pointer to bool is non-NULL? > dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, > CALL_EXPR_TAILCALL (exp) > && (endp == 0 || target == const0_rtx) > ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL, > expected_align, expected_size, > - min_size, max_size, probable_max_size); > + min_size, max_size, probable_max_size, > + bail_out_libcall, &is_move_done); > + > + /* Bail out when a mempcpy call would be expanded as libcall and when > + we have a target that provides a fast implementation > + of mempcpy routine. */ > + if (!is_move_done) > + return NULL_RTX; > > if (dest_addr == 0) > { > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -2733,6 +2733,23 @@ ix86_using_red_zone (void) > && (!cfun->machine->has_local_indirect_jump > || cfun->machine->indirect_branch_type == indirect_branch_keep)); > } > + Missing function comment here. For target hooks, usually there is a copy of the target.def comment, perhaps with further details on why it is overridden and what it does. > --- a/gcc/coretypes.h > +++ b/gcc/coretypes.h > @@ -384,6 +384,13 @@ enum excess_precision_type > EXCESS_PRECISION_TYPE_FAST > }; > Missing comment describing what it is, plus it the enumerators are too generic, if it is libc_speed enum, perhaps LIBC_FAST_SPEED etc.? > --- a/gcc/targhooks.c > +++ b/gcc/targhooks.c > @@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED) > return false; > } > Again, missing function comment. > +enum libc_speed > +default_libc_func_speed (int) > +{ > + return UNKNOWN_SPEED; > +} > + > --- a/gcc/testsuite/gcc.dg/string-opt-1.c > +++ b/gcc/testsuite/gcc.dg/string-opt-1.c > @@ -48,5 +48,5 @@ main (void) > return 0; > } > > -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */ > -/* { dg-final { scan-assembler "memcpy" } } */ > +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */ > +/* { dg-final { scan-assembler "memcpy" { target { ! { i?86-*-* x86_64-*-* } } } } } */ First of all, I don't really understand this, I'd expect both the two old dg-final lines to be used as is for non-x86 targets and another two for x86_64, and more importantly, the target hook is only for glibc, not for musl/bionic etc., nor for non-linux, so you probably need some effective target for it for whether it is glibc rather than musl/bionic, and use ...-*-linux* and ...-*-gnu* etc. rather than ...-*-*. Or tweak the dg-fianl patterns so that it succeeds on both variants of the target hook, but then does the test test anything at all? Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-14 13:08 ` Jakub Jelinek @ 2018-03-14 14:04 ` Martin Liška 2018-03-18 22:18 ` Martin Liška 2018-03-21 10:37 ` Jakub Jelinek 0 siblings, 2 replies; 45+ messages in thread From: Martin Liška @ 2018-03-14 14:04 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches, Marc Glisse, H.J. Lu [-- Attachment #1: Type: text/plain, Size: 4156 bytes --] On 03/14/2018 02:07 PM, Jakub Jelinek wrote: > On Wed, Mar 14, 2018 at 01:54:39PM +0100, Martin LiÅ¡ka wrote: >> --- a/gcc/builtins.c >> +++ b/gcc/builtins.c >> @@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len, >> src_mem = get_memory_rtx (src, len); >> set_mem_align (src_mem, src_align); >> >> + bool is_move_done; >> + >> /* Copy word part most expediently. */ > > This comment supposedly belongs right above the emit_block_move_hints call. > >> + bool bail_out_libcall = endp == 1 >> + && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED; > > Formatting. && belongs below endp. So either: > bool bail_out_libcall > = (endp == 1 > && ...); > or > bool bail_out_libcall = false; > if (endp == 1 > && ...) > bail_out_libcall = true; > ? > The variable is not named very well, shouldn't it be avoid_libcall or > something similar? Perhaps the variable should have a comment describing > what it is. Do you need separate argument for that bool and > is_move_done, rather than just the flag being that some pointer to bool is > non-NULL? Can you please explain me how to replace the 2 new arguments? > >> dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, >> CALL_EXPR_TAILCALL (exp) >> && (endp == 0 || target == const0_rtx) >> ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL, >> expected_align, expected_size, >> - min_size, max_size, probable_max_size); >> + min_size, max_size, probable_max_size, >> + bail_out_libcall, &is_move_done); >> + >> + /* Bail out when a mempcpy call would be expanded as libcall and when >> + we have a target that provides a fast implementation >> + of mempcpy routine. */ >> + if (!is_move_done) >> + return NULL_RTX; >> >> if (dest_addr == 0) >> { >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -2733,6 +2733,23 @@ ix86_using_red_zone (void) >> && (!cfun->machine->has_local_indirect_jump >> || cfun->machine->indirect_branch_type == indirect_branch_keep)); >> } >> + > > Missing function comment here. For target hooks, usually there is a copy of > the target.def comment, perhaps with further details on why it is overridden > and what it does. >> --- a/gcc/coretypes.h >> +++ b/gcc/coretypes.h >> @@ -384,6 +384,13 @@ enum excess_precision_type >> EXCESS_PRECISION_TYPE_FAST >> }; >> > > Missing comment describing what it is, plus it the enumerators are too > generic, if it is libc_speed enum, perhaps LIBC_FAST_SPEED etc.? >> --- a/gcc/targhooks.c >> +++ b/gcc/targhooks.c >> @@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED) >> return false; >> } >> > > Again, missing function comment. > >> +enum libc_speed >> +default_libc_func_speed (int) >> +{ >> + return UNKNOWN_SPEED; >> +} >> + > >> --- a/gcc/testsuite/gcc.dg/string-opt-1.c >> +++ b/gcc/testsuite/gcc.dg/string-opt-1.c >> @@ -48,5 +48,5 @@ main (void) >> return 0; >> } >> >> -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */ >> -/* { dg-final { scan-assembler "memcpy" } } */ >> +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */ >> +/* { dg-final { scan-assembler "memcpy" { target { ! { i?86-*-* x86_64-*-* } } } } } */ > > First of all, I don't really understand this, I'd expect both the > two old dg-final lines to be used as is for non-x86 targets and another two > for x86_64, and more importantly, the target hook is only for glibc, not for > musl/bionic etc., nor for non-linux, so you probably need some effective > target for it for whether it is glibc rather than musl/bionic, and use > ...-*-linux* and ...-*-gnu* etc. rather than ...-*-*. Or tweak the dg-fianl > patterns so that it succeeds on both variants of the target hook, but then > does the test test anything at all? I fixed that by preserving the old 2 old-finals and then I added a new for x86_64 target. Apart from the comment above I've fixed all nits and mempcpy is not used when LHS == NULL. Martin > > Jakub > [-- Attachment #2: 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch --] [-- Type: text/x-patch, Size: 11017 bytes --] From 26979038ce9500015f957afd896146022a38490b Mon Sep 17 00:00:00 2001 From: marxin <mliska@suse.cz> Date: Wed, 14 Mar 2018 09:44:18 +0100 Subject: [PATCH] Introduce new libc_func_speed target hook (PR middle-end/81657). gcc/ChangeLog: 2018-03-14 Martin Liska <mliska@suse.cz> PR middle-end/81657 * builtins.c (expand_builtin_memory_copy_args): Handle situation when libc library provides a fast mempcpy implementation/ * config/i386/i386-protos.h (gnu_libc_func_speed): New. * config/i386/i386.c (enum libc_speed): Likewise. (ix86_libc_func_speed): Likewise. (TARGET_LIBC_FUNC_SPEED): Likewise. * coretypes.h (enum libc_speed): Likewise. * doc/tm.texi: Document new target hook. * doc/tm.texi.in: Likewise. * expr.c (emit_block_move_hints): Handle libc bail out argument. * expr.h (emit_block_move_hints): Add new parameters. * target.def: Add new hook. * targhooks.c (enum libc_speed): New enum. (default_libc_func_speed): Provide a default hook implementation. * targhooks.h (default_libc_func_speed): Likewise. gcc/testsuite/ChangeLog: 2018-03-14 Martin Liska <mliska@suse.cz> * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target and others. --- gcc/builtins.c | 16 +++++++++++++++- gcc/config/i386/i386-protos.h | 2 ++ gcc/config/i386/i386.c | 24 ++++++++++++++++++++++++ gcc/coretypes.h | 7 +++++++ gcc/doc/tm.texi | 4 ++++ gcc/doc/tm.texi.in | 1 + gcc/expr.c | 16 +++++++++++++++- gcc/expr.h | 4 +++- gcc/target.def | 7 +++++++ gcc/targhooks.c | 9 +++++++++ gcc/targhooks.h | 1 + gcc/testsuite/gcc.dg/string-opt-1.c | 5 +++-- 12 files changed, 91 insertions(+), 5 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index 85affa74510..b668045374c 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -3651,13 +3651,27 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len, src_mem = get_memory_rtx (src, len); set_mem_align (src_mem, src_align); + bool is_move_done; + + /* emit_block_move_hints can generate a library call to memcpy function. + In situations when a libc library provides fast implementation + of mempcpy, then it's better to call mempcpy directly. */ + bool avoid_libcall + = (endp == 1 + && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED + && target != const0_rtx); + /* Copy word part most expediently. */ dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, CALL_EXPR_TAILCALL (exp) && (endp == 0 || target == const0_rtx) ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL, expected_align, expected_size, - min_size, max_size, probable_max_size); + min_size, max_size, probable_max_size, + avoid_libcall, &is_move_done); + + if (!is_move_done) + return NULL_RTX; if (dest_addr == 0) { diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index ef7c818986f..d3fc515845b 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void); extern bool ix86_using_red_zone (void); +extern enum libc_speed gnu_libc_func_speed (int fn); + #ifdef RTX_CODE extern int standard_80387_constant_p (rtx); extern const char *standard_80387_constant_opcode (rtx); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index af24c6ec5ba..f4cbecdf099 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2733,6 +2733,27 @@ ix86_using_red_zone (void) && (!cfun->machine->has_local_indirect_jump || cfun->machine->indirect_branch_type == indirect_branch_keep)); } + +/* This hook determines whether a function from libc has a fast implementation + FN is present at the runtime. We override it for i386 and glibc C library + as this combination provides fast implementation of mempcpy function. */ + +enum libc_speed +ix86_libc_func_speed (int fn) +{ + enum built_in_function f = (built_in_function)fn; + + if (!OPTION_GLIBC) + return LIBC_UNKNOWN_SPEED; + + switch (f) + { + case BUILT_IN_MEMPCPY: + return LIBC_FAST_SPEED; + default: + return LIBC_UNKNOWN_SPEED; + } +} \f /* Return a string that documents the current -m options. The caller is responsible for freeing the string. */ @@ -52061,6 +52082,9 @@ ix86_run_selftests (void) #undef TARGET_WARN_PARAMETER_PASSING_ABI #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi +#undef TARGET_LIBC_FUNC_SPEED +#define TARGET_LIBC_FUNC_SPEED ix86_libc_func_speed + #if CHECKING_P #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests diff --git a/gcc/coretypes.h b/gcc/coretypes.h index 283b4eb33fe..fe618f708f4 100644 --- a/gcc/coretypes.h +++ b/gcc/coretypes.h @@ -384,6 +384,13 @@ enum excess_precision_type EXCESS_PRECISION_TYPE_FAST }; +enum libc_speed +{ + LIBC_FAST_SPEED, + LIBC_SLOW_SPEED, + LIBC_UNKNOWN_SPEED +}; + /* Support for user-provided GGC and PCH markers. The first parameter is a pointer to a pointer, the second a cookie. */ typedef void (*gt_pointer_operator) (void *, void *); diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index bd8b917ba82..0f7c91a22c4 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5501,6 +5501,10 @@ macro, a reasonable default is used. This hook determines whether a function from a class of functions @var{fn_class} is present at the runtime. @end deftypefn +@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn}) +This hook determines whether a function from libc has a fast implementation +@var{fn} is present at the runtime. +@end deftypefn @defmac NEXT_OBJC_RUNTIME Set this macro to 1 to use the "NeXT" Objective-C message sending conventions diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index b0207146e8c..4bb2998a8a1 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3933,6 +3933,7 @@ macro, a reasonable default is used. @end defmac @hook TARGET_LIBC_HAS_FUNCTION +@hook TARGET_LIBC_FUNC_SPEED @defmac NEXT_OBJC_RUNTIME Set this macro to 1 to use the "NeXT" Objective-C message sending conventions diff --git a/gcc/expr.c b/gcc/expr.c index 00660293f72..b6c13652d79 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len, MIN_SIZE is the minimal size of block to move MAX_SIZE is the maximal size of block to move, if it can not be represented in unsigned HOST_WIDE_INT, than it is mask of all ones. + If BAIL_OUT_LIBCALL is set true, do not emit library call and set + *IS_MOVE_DONE to false. Return the address of the new block, if memcpy is called and returns it, 0 otherwise. */ @@ -1563,12 +1565,17 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, unsigned int expected_align, HOST_WIDE_INT expected_size, unsigned HOST_WIDE_INT min_size, unsigned HOST_WIDE_INT max_size, - unsigned HOST_WIDE_INT probable_max_size) + unsigned HOST_WIDE_INT probable_max_size, + bool bail_out_libcall, bool *is_move_done) { bool may_use_call; rtx retval = 0; unsigned int align; + /* When not doing a bail out, we always emit a memory move. */ + if (is_move_done) + *is_move_done = true; + gcc_assert (size); if (CONST_INT_P (size) && INTVAL (size) == 0) return 0; @@ -1625,6 +1632,13 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x)) && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y))) { + if (bail_out_libcall) + { + if (is_move_done) + *is_move_done = false; + return retval; + } + /* Since x and y are passed to a libcall, mark the corresponding tree EXPR as addressable. */ tree y_expr = MEM_EXPR (y); diff --git a/gcc/expr.h b/gcc/expr.h index b3d523bcb24..023bc5aec47 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -110,7 +110,9 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods, unsigned int, HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, - unsigned HOST_WIDE_INT); + unsigned HOST_WIDE_INT, + bool bail_out_libcall = false, + bool *is_move_done = NULL); extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool, by_pieces_constfn, void *); extern bool emit_storent_insn (rtx to, rtx from); diff --git a/gcc/target.def b/gcc/target.def index c5b2a1e7e71..3bbddc82776 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2639,6 +2639,13 @@ DEFHOOK bool, (enum function_class fn_class), default_libc_has_function) +DEFHOOK +(libc_func_speed, + "This hook determines whether a function from libc has a fast implementation\n\ +@var{fn} is present at the runtime.", + libc_speed, (int fn), + default_libc_func_speed) + /* True if new jumps cannot be created, to replace existing ones or not, at the current point in the compilation. */ DEFHOOK diff --git a/gcc/targhooks.c b/gcc/targhooks.c index fafcc6c5196..6e44f6f79cf 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED) return false; } +/* This hook determines whether a function from libc has a fast implementation + FN is present at the runtime. */ + +enum libc_speed +default_libc_func_speed (int) +{ + return LIBC_UNKNOWN_SPEED; +} + tree default_builtin_tm_load_store (tree ARG_UNUSED (type)) { diff --git a/gcc/targhooks.h b/gcc/targhooks.h index 8a4393f2ba4..7508673ad0a 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void); extern bool default_libc_has_function (enum function_class); extern bool no_c99_libc_has_function (enum function_class); extern bool gnu_libc_has_function (enum function_class); +extern enum libc_speed default_libc_func_speed (int); extern tree default_builtin_tm_load_store (tree); diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c index 2f060732bf0..7faaadcbb1f 100644 --- a/gcc/testsuite/gcc.dg/string-opt-1.c +++ b/gcc/testsuite/gcc.dg/string-opt-1.c @@ -48,5 +48,6 @@ main (void) return 0; } -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */ -/* { dg-final { scan-assembler "memcpy" } } */ +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */ +/* { dg-final { scan-assembler "memcpy" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */ +/* { dg-final { scan-assembler "mempcpy" { target { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */ -- 2.16.2 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-14 14:04 ` Martin Liška @ 2018-03-18 22:18 ` Martin Liška 2018-03-21 10:37 ` Jakub Jelinek 1 sibling, 0 replies; 45+ messages in thread From: Martin Liška @ 2018-03-18 22:18 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches, Marc Glisse, H.J. Lu PING^1 On 03/14/2018 02:54 PM, Martin LiÅ¡ka wrote: > On 03/14/2018 02:07 PM, Jakub Jelinek wrote: >> On Wed, Mar 14, 2018 at 01:54:39PM +0100, Martin LiÅ¡ka wrote: >>> --- a/gcc/builtins.c >>> +++ b/gcc/builtins.c >>> @@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len, >>> src_mem = get_memory_rtx (src, len); >>> set_mem_align (src_mem, src_align); >>> >>> + bool is_move_done; >>> + >>> /* Copy word part most expediently. */ >> >> This comment supposedly belongs right above the emit_block_move_hints call. >> >>> + bool bail_out_libcall = endp == 1 >>> + && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED; >> >> Formatting. && belongs below endp. So either: >> bool bail_out_libcall >> = (endp == 1 >> && ...); >> or >> bool bail_out_libcall = false; >> if (endp == 1 >> && ...) >> bail_out_libcall = true; >> ? >> The variable is not named very well, shouldn't it be avoid_libcall or >> something similar? Perhaps the variable should have a comment describing >> what it is. Do you need separate argument for that bool and >> is_move_done, rather than just the flag being that some pointer to bool is >> non-NULL? > > Can you please explain me how to replace the 2 new arguments? > >> >>> dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, >>> CALL_EXPR_TAILCALL (exp) >>> && (endp == 0 || target == const0_rtx) >>> ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL, >>> expected_align, expected_size, >>> - min_size, max_size, probable_max_size); >>> + min_size, max_size, probable_max_size, >>> + bail_out_libcall, &is_move_done); >>> + >>> + /* Bail out when a mempcpy call would be expanded as libcall and when >>> + we have a target that provides a fast implementation >>> + of mempcpy routine. */ >>> + if (!is_move_done) >>> + return NULL_RTX; >>> >>> if (dest_addr == 0) >>> { >>> --- a/gcc/config/i386/i386.c >>> +++ b/gcc/config/i386/i386.c >>> @@ -2733,6 +2733,23 @@ ix86_using_red_zone (void) >>> && (!cfun->machine->has_local_indirect_jump >>> || cfun->machine->indirect_branch_type == indirect_branch_keep)); >>> } >>> + >> >> Missing function comment here. For target hooks, usually there is a copy of >> the target.def comment, perhaps with further details on why it is overridden >> and what it does. >>> --- a/gcc/coretypes.h >>> +++ b/gcc/coretypes.h >>> @@ -384,6 +384,13 @@ enum excess_precision_type >>> EXCESS_PRECISION_TYPE_FAST >>> }; >>> >> >> Missing comment describing what it is, plus it the enumerators are too >> generic, if it is libc_speed enum, perhaps LIBC_FAST_SPEED etc.? >>> --- a/gcc/targhooks.c >>> +++ b/gcc/targhooks.c >>> @@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED) >>> return false; >>> } >>> >> >> Again, missing function comment. >> >>> +enum libc_speed >>> +default_libc_func_speed (int) >>> +{ >>> + return UNKNOWN_SPEED; >>> +} >>> + >> >>> --- a/gcc/testsuite/gcc.dg/string-opt-1.c >>> +++ b/gcc/testsuite/gcc.dg/string-opt-1.c >>> @@ -48,5 +48,5 @@ main (void) >>> return 0; >>> } >>> >>> -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */ >>> -/* { dg-final { scan-assembler "memcpy" } } */ >>> +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */ >>> +/* { dg-final { scan-assembler "memcpy" { target { ! { i?86-*-* x86_64-*-* } } } } } */ >> >> First of all, I don't really understand this, I'd expect both the >> two old dg-final lines to be used as is for non-x86 targets and another two >> for x86_64, and more importantly, the target hook is only for glibc, not for >> musl/bionic etc., nor for non-linux, so you probably need some effective >> target for it for whether it is glibc rather than musl/bionic, and use >> ...-*-linux* and ...-*-gnu* etc. rather than ...-*-*. Or tweak the dg-fianl >> patterns so that it succeeds on both variants of the target hook, but then >> does the test test anything at all? > > I fixed that by preserving the old 2 old-finals and then I added a new for x86_64 target. > Apart from the comment above I've fixed all nits and mempcpy is not used when LHS == NULL. > > Martin > >> >> Jakub >> > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-14 14:04 ` Martin Liška 2018-03-18 22:18 ` Martin Liška @ 2018-03-21 10:37 ` Jakub Jelinek 2018-03-28 14:29 ` Martin Liška 1 sibling, 1 reply; 45+ messages in thread From: Jakub Jelinek @ 2018-03-21 10:37 UTC (permalink / raw) To: Martin Liška; +Cc: gcc-patches, Marc Glisse, H.J. Lu On Wed, Mar 14, 2018 at 02:54:00PM +0100, Martin LiÅ¡ka wrote: > > The variable is not named very well, shouldn't it be avoid_libcall or > > something similar? Perhaps the variable should have a comment describing > > what it is. Do you need separate argument for that bool and > > is_move_done, rather than just the flag being that some pointer to bool is > > non-NULL? > > Can you please explain me how to replace the 2 new arguments? So you have one bool arg and one pointer arg into which the function stores true and optionally based on that other bool arg and other conditions stores false. I'm suggesting just one bool * argument, which is NULL if the bool arg would be false, and non-NULL otherwise. The single argument still could be called bool *avoid_libcall, and you'd just if (avoid_libcall) { *avoid_libcall = true; return retval; } instead of emitting a libcall, the caller would initialize the bool variable to false. Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-21 10:37 ` Jakub Jelinek @ 2018-03-28 14:29 ` Martin Liška 2018-03-28 14:35 ` Jakub Jelinek 0 siblings, 1 reply; 45+ messages in thread From: Martin Liška @ 2018-03-28 14:29 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches, Marc Glisse, H.J. Lu [-- Attachment #1: Type: text/plain, Size: 1166 bytes --] On 03/21/2018 11:34 AM, Jakub Jelinek wrote: > On Wed, Mar 14, 2018 at 02:54:00PM +0100, Martin LiÅ¡ka wrote: >>> The variable is not named very well, shouldn't it be avoid_libcall or >>> something similar? Perhaps the variable should have a comment describing >>> what it is. Do you need separate argument for that bool and >>> is_move_done, rather than just the flag being that some pointer to bool is >>> non-NULL? >> >> Can you please explain me how to replace the 2 new arguments? > > So you have one bool arg and one pointer arg into which the function > stores true and optionally based on that other bool arg and other conditions > stores false. > I'm suggesting just one bool * argument, which is NULL if the bool arg would > be false, and non-NULL otherwise. The single argument still could be > called bool *avoid_libcall, and you'd just > if (avoid_libcall) { *avoid_libcall = true; return retval; } > instead of emitting a libcall, the caller would initialize the bool > variable to false. > > Jakub > Got it. I'm sending updated version of the patch. Hope I've had enough fantasy to write it nice. Tested on both ppc64le and x86_64. Martin [-- Attachment #2: 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch --] [-- Type: text/x-patch, Size: 10749 bytes --] From d766330364aa2a23512f7d4e60491b634c5d0523 Mon Sep 17 00:00:00 2001 From: marxin <mliska@suse.cz> Date: Wed, 14 Mar 2018 09:44:18 +0100 Subject: [PATCH] Introduce new libc_func_speed target hook (PR middle-end/81657). gcc/ChangeLog: 2018-03-14 Martin Liska <mliska@suse.cz> PR middle-end/81657 * builtins.c (expand_builtin_memory_copy_args): Handle situation when libc library provides a fast mempcpy implementation/ * config/i386/i386-protos.h (gnu_libc_func_speed): New. * config/i386/i386.c (enum libc_speed): Likewise. (ix86_libc_func_speed): Likewise. (TARGET_LIBC_FUNC_SPEED): Likewise. * coretypes.h (enum libc_speed): Likewise. * doc/tm.texi: Document new target hook. * doc/tm.texi.in: Likewise. * expr.c (emit_block_move_hints): Handle libc bail out argument. * expr.h (emit_block_move_hints): Add new parameters. * target.def: Add new hook. * targhooks.c (enum libc_speed): New enum. (default_libc_func_speed): Provide a default hook implementation. * targhooks.h (default_libc_func_speed): Likewise. gcc/testsuite/ChangeLog: 2018-03-14 Martin Liska <mliska@suse.cz> * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target and others. --- gcc/builtins.c | 15 ++++++++++++++- gcc/config/i386/i386-protos.h | 2 ++ gcc/config/i386/i386.c | 24 ++++++++++++++++++++++++ gcc/coretypes.h | 7 +++++++ gcc/doc/tm.texi | 4 ++++ gcc/doc/tm.texi.in | 1 + gcc/expr.c | 11 ++++++++++- gcc/expr.h | 3 ++- gcc/target.def | 7 +++++++ gcc/targhooks.c | 9 +++++++++ gcc/targhooks.h | 1 + gcc/testsuite/gcc.dg/string-opt-1.c | 5 +++-- 12 files changed, 84 insertions(+), 5 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index 487d9d58db2..d3cd93ffbfa 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len, src_mem = get_memory_rtx (src, len); set_mem_align (src_mem, src_align); + /* emit_block_move_hints can generate a library call to memcpy function. + In situations when a libc library provides fast implementation + of mempcpy, then it's better to call mempcpy directly. */ + bool avoid_libcall + = (endp == 1 + && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED + && target != const0_rtx); + /* Copy word part most expediently. */ + bool libcall_avoided = false; dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, CALL_EXPR_TAILCALL (exp) && (endp == 0 || target == const0_rtx) ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL, expected_align, expected_size, - min_size, max_size, probable_max_size); + min_size, max_size, probable_max_size, + avoid_libcall ? &libcall_avoided: NULL); + + if (libcall_avoided) + return NULL_RTX; if (dest_addr == 0) { diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index ef7c818986f..d3fc515845b 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void); extern bool ix86_using_red_zone (void); +extern enum libc_speed gnu_libc_func_speed (int fn); + #ifdef RTX_CODE extern int standard_80387_constant_p (rtx); extern const char *standard_80387_constant_opcode (rtx); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b4f6aec1434..aadf9fa5ac3 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2735,6 +2735,27 @@ ix86_using_red_zone (void) && (!cfun->machine->has_local_indirect_jump || cfun->machine->indirect_branch_type == indirect_branch_keep)); } + +/* This hook determines whether a function from libc has a fast implementation + FN is present at the runtime. We override it for i386 and glibc C library + as this combination provides fast implementation of mempcpy function. */ + +enum libc_speed +ix86_libc_func_speed (int fn) +{ + enum built_in_function f = (built_in_function)fn; + + if (!OPTION_GLIBC) + return LIBC_UNKNOWN_SPEED; + + switch (f) + { + case BUILT_IN_MEMPCPY: + return LIBC_FAST_SPEED; + default: + return LIBC_UNKNOWN_SPEED; + } +} \f /* Return a string that documents the current -m options. The caller is responsible for freeing the string. */ @@ -52105,6 +52126,9 @@ ix86_run_selftests (void) #undef TARGET_WARN_PARAMETER_PASSING_ABI #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi +#undef TARGET_LIBC_FUNC_SPEED +#define TARGET_LIBC_FUNC_SPEED ix86_libc_func_speed + #if CHECKING_P #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests diff --git a/gcc/coretypes.h b/gcc/coretypes.h index 283b4eb33fe..fe618f708f4 100644 --- a/gcc/coretypes.h +++ b/gcc/coretypes.h @@ -384,6 +384,13 @@ enum excess_precision_type EXCESS_PRECISION_TYPE_FAST }; +enum libc_speed +{ + LIBC_FAST_SPEED, + LIBC_SLOW_SPEED, + LIBC_UNKNOWN_SPEED +}; + /* Support for user-provided GGC and PCH markers. The first parameter is a pointer to a pointer, the second a cookie. */ typedef void (*gt_pointer_operator) (void *, void *); diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index bd8b917ba82..0f7c91a22c4 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5501,6 +5501,10 @@ macro, a reasonable default is used. This hook determines whether a function from a class of functions @var{fn_class} is present at the runtime. @end deftypefn +@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn}) +This hook determines whether a function from libc has a fast implementation +@var{fn} is present at the runtime. +@end deftypefn @defmac NEXT_OBJC_RUNTIME Set this macro to 1 to use the "NeXT" Objective-C message sending conventions diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index b0207146e8c..4bb2998a8a1 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3933,6 +3933,7 @@ macro, a reasonable default is used. @end defmac @hook TARGET_LIBC_HAS_FUNCTION +@hook TARGET_LIBC_FUNC_SPEED @defmac NEXT_OBJC_RUNTIME Set this macro to 1 to use the "NeXT" Objective-C message sending conventions diff --git a/gcc/expr.c b/gcc/expr.c index 00660293f72..f3bd698bc4d 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len, MIN_SIZE is the minimal size of block to move MAX_SIZE is the maximal size of block to move, if it can not be represented in unsigned HOST_WIDE_INT, than it is mask of all ones. + If BAIL_OUT_LIBCALL is non-null, do not emit library call and assign + true to the pointer when move is not done. Return the address of the new block, if memcpy is called and returns it, 0 otherwise. */ @@ -1563,7 +1565,8 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, unsigned int expected_align, HOST_WIDE_INT expected_size, unsigned HOST_WIDE_INT min_size, unsigned HOST_WIDE_INT max_size, - unsigned HOST_WIDE_INT probable_max_size) + unsigned HOST_WIDE_INT probable_max_size, + bool *bail_out_libcall) { bool may_use_call; rtx retval = 0; @@ -1625,6 +1628,12 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x)) && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y))) { + if (bail_out_libcall) + { + *bail_out_libcall = true; + return retval; + } + /* Since x and y are passed to a libcall, mark the corresponding tree EXPR as addressable. */ tree y_expr = MEM_EXPR (y); diff --git a/gcc/expr.h b/gcc/expr.h index b3d523bcb24..c2bf87fd14e 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -110,7 +110,8 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods, unsigned int, HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, - unsigned HOST_WIDE_INT); + unsigned HOST_WIDE_INT, + bool *bail_out_libcall = NULL); extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool, by_pieces_constfn, void *); extern bool emit_storent_insn (rtx to, rtx from); diff --git a/gcc/target.def b/gcc/target.def index c5b2a1e7e71..3bbddc82776 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2639,6 +2639,13 @@ DEFHOOK bool, (enum function_class fn_class), default_libc_has_function) +DEFHOOK +(libc_func_speed, + "This hook determines whether a function from libc has a fast implementation\n\ +@var{fn} is present at the runtime.", + libc_speed, (int fn), + default_libc_func_speed) + /* True if new jumps cannot be created, to replace existing ones or not, at the current point in the compilation. */ DEFHOOK diff --git a/gcc/targhooks.c b/gcc/targhooks.c index fafcc6c5196..6e44f6f79cf 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED) return false; } +/* This hook determines whether a function from libc has a fast implementation + FN is present at the runtime. */ + +enum libc_speed +default_libc_func_speed (int) +{ + return LIBC_UNKNOWN_SPEED; +} + tree default_builtin_tm_load_store (tree ARG_UNUSED (type)) { diff --git a/gcc/targhooks.h b/gcc/targhooks.h index 8a4393f2ba4..7508673ad0a 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void); extern bool default_libc_has_function (enum function_class); extern bool no_c99_libc_has_function (enum function_class); extern bool gnu_libc_has_function (enum function_class); +extern enum libc_speed default_libc_func_speed (int); extern tree default_builtin_tm_load_store (tree); diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c index 2f060732bf0..7faaadcbb1f 100644 --- a/gcc/testsuite/gcc.dg/string-opt-1.c +++ b/gcc/testsuite/gcc.dg/string-opt-1.c @@ -48,5 +48,6 @@ main (void) return 0; } -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */ -/* { dg-final { scan-assembler "memcpy" } } */ +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */ +/* { dg-final { scan-assembler "memcpy" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */ +/* { dg-final { scan-assembler "mempcpy" { target { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */ -- 2.16.2 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-28 14:29 ` Martin Liška @ 2018-03-28 14:35 ` Jakub Jelinek 2018-03-28 17:59 ` Martin Liška 0 siblings, 1 reply; 45+ messages in thread From: Jakub Jelinek @ 2018-03-28 14:35 UTC (permalink / raw) To: Martin Liška, Richard Biener, Uros Bizjak Cc: gcc-patches, Marc Glisse, H.J. Lu On Wed, Mar 28, 2018 at 04:18:45PM +0200, Martin LiÅ¡ka wrote: > 2018-03-14 Martin Liska <mliska@suse.cz> > > PR middle-end/81657 > * builtins.c (expand_builtin_memory_copy_args): Handle situation > when libc library provides a fast mempcpy implementation/ > * config/i386/i386-protos.h (gnu_libc_func_speed): New. > * config/i386/i386.c (enum libc_speed): Likewise. > (ix86_libc_func_speed): Likewise. > (TARGET_LIBC_FUNC_SPEED): Likewise. > * coretypes.h (enum libc_speed): Likewise. > * doc/tm.texi: Document new target hook. > * doc/tm.texi.in: Likewise. > * expr.c (emit_block_move_hints): Handle libc bail out argument. > * expr.h (emit_block_move_hints): Add new parameters. > * target.def: Add new hook. > * targhooks.c (enum libc_speed): New enum. > (default_libc_func_speed): Provide a default hook > implementation. > * targhooks.h (default_libc_func_speed): Likewise. > > gcc/testsuite/ChangeLog: > > 2018-03-14 Martin Liska <mliska@suse.cz> > > * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target > and others. > /* Copy word part most expediently. */ > + bool libcall_avoided = false; > dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, > CALL_EXPR_TAILCALL (exp) > && (endp == 0 || target == const0_rtx) > ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL, > expected_align, expected_size, > - min_size, max_size, probable_max_size); > + min_size, max_size, probable_max_size, > + avoid_libcall ? &libcall_avoided: NULL); Missing space before :. > + > + if (libcall_avoided) > + return NULL_RTX; > > --- a/gcc/config/i386/i386-protos.h > +++ b/gcc/config/i386/i386-protos.h > @@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void); > > extern bool ix86_using_red_zone (void); > > +extern enum libc_speed gnu_libc_func_speed (int fn); Here you declare gnu_libc_func_speed. > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -2735,6 +2735,27 @@ ix86_using_red_zone (void) > && (!cfun->machine->has_local_indirect_jump > || cfun->machine->indirect_branch_type == indirect_branch_keep)); > } > + > +/* This hook determines whether a function from libc has a fast implementation > + FN is present at the runtime. We override it for i386 and glibc C library > + as this combination provides fast implementation of mempcpy function. */ > + > +enum libc_speed > +ix86_libc_func_speed (int fn) But define a different function. > +{ > + enum built_in_function f = (built_in_function)fn; > + > + if (!OPTION_GLIBC) > + return LIBC_UNKNOWN_SPEED; OPTION_GLIBC is only defined if linux.h is included, so I think you break all other x86 targets this way. The hook should have linux in the name, perhaps defined only in config/i386/linux.h and redefined in i386.c through #ifdef SUBTARGET_LIBC_FUNC_SPEED #undef TARGET_LIBC_FUNC_SPEED #define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED #endif or something similar. Otherwise LGTM, but please get approval also from Richi (for the generic part) and Uros (for the backend side). Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-28 14:35 ` Jakub Jelinek @ 2018-03-28 17:59 ` Martin Liška 2018-03-28 18:35 ` Jakub Jelinek 0 siblings, 1 reply; 45+ messages in thread From: Martin Liška @ 2018-03-28 17:59 UTC (permalink / raw) To: Jakub Jelinek, Richard Biener, Uros Bizjak Cc: gcc-patches, Marc Glisse, H.J. Lu [-- Attachment #1: Type: text/plain, Size: 3572 bytes --] On 03/28/2018 04:31 PM, Jakub Jelinek wrote: > On Wed, Mar 28, 2018 at 04:18:45PM +0200, Martin LiÅ¡ka wrote: >> 2018-03-14 Martin Liska <mliska@suse.cz> >> >> PR middle-end/81657 >> * builtins.c (expand_builtin_memory_copy_args): Handle situation >> when libc library provides a fast mempcpy implementation/ >> * config/i386/i386-protos.h (gnu_libc_func_speed): New. >> * config/i386/i386.c (enum libc_speed): Likewise. >> (ix86_libc_func_speed): Likewise. >> (TARGET_LIBC_FUNC_SPEED): Likewise. >> * coretypes.h (enum libc_speed): Likewise. >> * doc/tm.texi: Document new target hook. >> * doc/tm.texi.in: Likewise. >> * expr.c (emit_block_move_hints): Handle libc bail out argument. >> * expr.h (emit_block_move_hints): Add new parameters. >> * target.def: Add new hook. >> * targhooks.c (enum libc_speed): New enum. >> (default_libc_func_speed): Provide a default hook >> implementation. >> * targhooks.h (default_libc_func_speed): Likewise. >> >> gcc/testsuite/ChangeLog: >> >> 2018-03-14 Martin Liska <mliska@suse.cz> >> >> * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target >> and others. >> /* Copy word part most expediently. */ >> + bool libcall_avoided = false; >> dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, >> CALL_EXPR_TAILCALL (exp) >> && (endp == 0 || target == const0_rtx) >> ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL, >> expected_align, expected_size, >> - min_size, max_size, probable_max_size); >> + min_size, max_size, probable_max_size, >> + avoid_libcall ? &libcall_avoided: NULL); > > Missing space before :. Fixed. >> + >> + if (libcall_avoided) >> + return NULL_RTX; >> >> --- a/gcc/config/i386/i386-protos.h >> +++ b/gcc/config/i386/i386-protos.h >> @@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void); >> >> extern bool ix86_using_red_zone (void); >> >> +extern enum libc_speed gnu_libc_func_speed (int fn); > > Here you declare gnu_libc_func_speed. > >> --- a/gcc/config/i386/i386.c >> +++ b/gcc/config/i386/i386.c >> @@ -2735,6 +2735,27 @@ ix86_using_red_zone (void) >> && (!cfun->machine->has_local_indirect_jump >> || cfun->machine->indirect_branch_type == indirect_branch_keep)); >> } >> + >> +/* This hook determines whether a function from libc has a fast implementation >> + FN is present at the runtime. We override it for i386 and glibc C library >> + as this combination provides fast implementation of mempcpy function. */ >> + >> +enum libc_speed >> +ix86_libc_func_speed (int fn) > > But define a different function. You are right. > >> +{ >> + enum built_in_function f = (built_in_function)fn; >> + >> + if (!OPTION_GLIBC) >> + return LIBC_UNKNOWN_SPEED; > > OPTION_GLIBC is only defined if linux.h is included, so I think you break > all other x86 targets this way. That said the proper way is probably to define the function in linux64.c (should we also use it on i386, or nobody cares?). And corresponding declaration will be in config/linux-protos.h. The hook should have linux in the name, > perhaps defined only in config/i386/linux.h and redefined in i386.c through > #ifdef SUBTARGET_LIBC_FUNC_SPEED > #undef TARGET_LIBC_FUNC_SPEED > #define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED > #endif > or something similar. I adopted this suggested mechanism. Thanks for review. Martin > > Otherwise LGTM, but please get approval also from Richi (for the generic part) and > Uros (for the backend side). > > Jakub > [-- Attachment #2: 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch --] [-- Type: text/x-patch, Size: 11379 bytes --] From 4378fa93aafac3d28eaae48c9a41a2f2ef5e1d18 Mon Sep 17 00:00:00 2001 From: marxin <mliska@suse.cz> Date: Wed, 14 Mar 2018 09:44:18 +0100 Subject: [PATCH] Introduce new libc_func_speed target hook (PR middle-end/81657). gcc/ChangeLog: 2018-03-14 Martin Liska <mliska@suse.cz> PR middle-end/81657 * builtins.c (expand_builtin_memory_copy_args): Handle situation when libc library provides a fast mempcpy implementation/ * config/linux-protos.h (ix86_linux_libc_func_speed): New. * config/linux.c (ix86_linux_libc_func_speed): Likewise. (TARGET_LIBC_FUNC_SPEED): Likewise. * config/i386/linux64.h (SUBTARGET_LIBC_FUNC_SPEED): Define macro. * coretypes.h (enum libc_speed): Likewise. * doc/tm.texi: Document new target hook. * doc/tm.texi.in: Likewise. * expr.c (emit_block_move_hints): Handle libc bail out argument. * expr.h (emit_block_move_hints): Add new parameters. * target.def: Add new hook. * targhooks.c (enum libc_speed): New enum. (default_libc_func_speed): Provide a default hook implementation. * targhooks.h (default_libc_func_speed): Likewise. gcc/testsuite/ChangeLog: 2018-03-28 Martin Liska <mliska@suse.cz> * gcc.dg/string-opt-1.c: gcc/testsuite/ChangeLog: 2018-03-14 Martin Liska <mliska@suse.cz> * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target and others. --- gcc/builtins.c | 15 ++++++++++++++- gcc/config/i386/i386.c | 5 +++++ gcc/config/i386/linux64.h | 2 ++ gcc/config/linux-protos.h | 1 + gcc/config/linux.c | 21 +++++++++++++++++++++ gcc/coretypes.h | 7 +++++++ gcc/doc/tm.texi | 4 ++++ gcc/doc/tm.texi.in | 1 + gcc/expr.c | 11 ++++++++++- gcc/expr.h | 3 ++- gcc/target.def | 7 +++++++ gcc/targhooks.c | 9 +++++++++ gcc/targhooks.h | 1 + gcc/testsuite/gcc.dg/string-opt-1.c | 5 +++-- 14 files changed, 87 insertions(+), 5 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index 487d9d58db2..98ee3fb272d 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len, src_mem = get_memory_rtx (src, len); set_mem_align (src_mem, src_align); + /* emit_block_move_hints can generate a library call to memcpy function. + In situations when a libc library provides fast implementation + of mempcpy, then it's better to call mempcpy directly. */ + bool avoid_libcall + = (endp == 1 + && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED + && target != const0_rtx); + /* Copy word part most expediently. */ + bool libcall_avoided = false; dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, CALL_EXPR_TAILCALL (exp) && (endp == 0 || target == const0_rtx) ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL, expected_align, expected_size, - min_size, max_size, probable_max_size); + min_size, max_size, probable_max_size, + avoid_libcall ? &libcall_avoided : NULL); + + if (libcall_avoided) + return NULL_RTX; if (dest_addr == 0) { diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b4f6aec1434..2471ff7b99a 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -52105,6 +52105,11 @@ ix86_run_selftests (void) #undef TARGET_WARN_PARAMETER_PASSING_ABI #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi +#ifdef SUBTARGET_LIBC_FUNC_SPEED +#undef TARGET_LIBC_FUNC_SPEED +#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED +#endif + #if CHECKING_P #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h index f2d913e30ac..d855f5cc239 100644 --- a/gcc/config/i386/linux64.h +++ b/gcc/config/i386/linux64.h @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1" #undef MUSL_DYNAMIC_LINKERX32 #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1" + +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed diff --git a/gcc/config/linux-protos.h b/gcc/config/linux-protos.h index 9da8dd7ecaa..b7284735366 100644 --- a/gcc/config/linux-protos.h +++ b/gcc/config/linux-protos.h @@ -20,3 +20,4 @@ along with GCC; see the file COPYING3. If not see extern bool linux_has_ifunc_p (void); extern bool linux_libc_has_function (enum function_class fn_class); +extern enum libc_speed ix86_linux_libc_func_speed (int fn); diff --git a/gcc/config/linux.c b/gcc/config/linux.c index bc06e54a73d..b878d51e4b4 100644 --- a/gcc/config/linux.c +++ b/gcc/config/linux.c @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class) return false; } + +/* This hook determines whether a function from libc has a fast implementation + FN is present at the runtime. We override it for i386 and glibc C library + as this combination provides fast implementation of mempcpy function. */ + +enum libc_speed +ix86_linux_libc_func_speed (int fn) +{ + enum built_in_function f = (built_in_function)fn; + + if (!OPTION_GLIBC) + return LIBC_UNKNOWN_SPEED; + + switch (f) + { + case BUILT_IN_MEMPCPY: + return LIBC_FAST_SPEED; + default: + return LIBC_UNKNOWN_SPEED; + } +} diff --git a/gcc/coretypes.h b/gcc/coretypes.h index 283b4eb33fe..fe618f708f4 100644 --- a/gcc/coretypes.h +++ b/gcc/coretypes.h @@ -384,6 +384,13 @@ enum excess_precision_type EXCESS_PRECISION_TYPE_FAST }; +enum libc_speed +{ + LIBC_FAST_SPEED, + LIBC_SLOW_SPEED, + LIBC_UNKNOWN_SPEED +}; + /* Support for user-provided GGC and PCH markers. The first parameter is a pointer to a pointer, the second a cookie. */ typedef void (*gt_pointer_operator) (void *, void *); diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index bd8b917ba82..0f7c91a22c4 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5501,6 +5501,10 @@ macro, a reasonable default is used. This hook determines whether a function from a class of functions @var{fn_class} is present at the runtime. @end deftypefn +@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn}) +This hook determines whether a function from libc has a fast implementation +@var{fn} is present at the runtime. +@end deftypefn @defmac NEXT_OBJC_RUNTIME Set this macro to 1 to use the "NeXT" Objective-C message sending conventions diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index b0207146e8c..4bb2998a8a1 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3933,6 +3933,7 @@ macro, a reasonable default is used. @end defmac @hook TARGET_LIBC_HAS_FUNCTION +@hook TARGET_LIBC_FUNC_SPEED @defmac NEXT_OBJC_RUNTIME Set this macro to 1 to use the "NeXT" Objective-C message sending conventions diff --git a/gcc/expr.c b/gcc/expr.c index 00660293f72..f3bd698bc4d 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len, MIN_SIZE is the minimal size of block to move MAX_SIZE is the maximal size of block to move, if it can not be represented in unsigned HOST_WIDE_INT, than it is mask of all ones. + If BAIL_OUT_LIBCALL is non-null, do not emit library call and assign + true to the pointer when move is not done. Return the address of the new block, if memcpy is called and returns it, 0 otherwise. */ @@ -1563,7 +1565,8 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, unsigned int expected_align, HOST_WIDE_INT expected_size, unsigned HOST_WIDE_INT min_size, unsigned HOST_WIDE_INT max_size, - unsigned HOST_WIDE_INT probable_max_size) + unsigned HOST_WIDE_INT probable_max_size, + bool *bail_out_libcall) { bool may_use_call; rtx retval = 0; @@ -1625,6 +1628,12 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x)) && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y))) { + if (bail_out_libcall) + { + *bail_out_libcall = true; + return retval; + } + /* Since x and y are passed to a libcall, mark the corresponding tree EXPR as addressable. */ tree y_expr = MEM_EXPR (y); diff --git a/gcc/expr.h b/gcc/expr.h index b3d523bcb24..c2bf87fd14e 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -110,7 +110,8 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods, unsigned int, HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, - unsigned HOST_WIDE_INT); + unsigned HOST_WIDE_INT, + bool *bail_out_libcall = NULL); extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool, by_pieces_constfn, void *); extern bool emit_storent_insn (rtx to, rtx from); diff --git a/gcc/target.def b/gcc/target.def index c5b2a1e7e71..3bbddc82776 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2639,6 +2639,13 @@ DEFHOOK bool, (enum function_class fn_class), default_libc_has_function) +DEFHOOK +(libc_func_speed, + "This hook determines whether a function from libc has a fast implementation\n\ +@var{fn} is present at the runtime.", + libc_speed, (int fn), + default_libc_func_speed) + /* True if new jumps cannot be created, to replace existing ones or not, at the current point in the compilation. */ DEFHOOK diff --git a/gcc/targhooks.c b/gcc/targhooks.c index fafcc6c5196..6e44f6f79cf 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED) return false; } +/* This hook determines whether a function from libc has a fast implementation + FN is present at the runtime. */ + +enum libc_speed +default_libc_func_speed (int) +{ + return LIBC_UNKNOWN_SPEED; +} + tree default_builtin_tm_load_store (tree ARG_UNUSED (type)) { diff --git a/gcc/targhooks.h b/gcc/targhooks.h index 8a4393f2ba4..7508673ad0a 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void); extern bool default_libc_has_function (enum function_class); extern bool no_c99_libc_has_function (enum function_class); extern bool gnu_libc_has_function (enum function_class); +extern enum libc_speed default_libc_func_speed (int); extern tree default_builtin_tm_load_store (tree); diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c index 2f060732bf0..7faaadcbb1f 100644 --- a/gcc/testsuite/gcc.dg/string-opt-1.c +++ b/gcc/testsuite/gcc.dg/string-opt-1.c @@ -48,5 +48,6 @@ main (void) return 0; } -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */ -/* { dg-final { scan-assembler "memcpy" } } */ +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */ +/* { dg-final { scan-assembler "memcpy" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */ +/* { dg-final { scan-assembler "mempcpy" { target { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */ -- 2.16.2 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-28 17:59 ` Martin Liška @ 2018-03-28 18:35 ` Jakub Jelinek 2018-03-29 12:14 ` Martin Liška 0 siblings, 1 reply; 45+ messages in thread From: Jakub Jelinek @ 2018-03-28 18:35 UTC (permalink / raw) To: Martin Liška Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin LiÅ¡ka wrote: > --- a/gcc/config/linux.c > +++ b/gcc/config/linux.c > @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class) > > return false; > } > + > +/* This hook determines whether a function from libc has a fast implementation > + FN is present at the runtime. We override it for i386 and glibc C library > + as this combination provides fast implementation of mempcpy function. */ > + > +enum libc_speed > +ix86_linux_libc_func_speed (int fn) Putting a ix86_ function into config/linux.c used by most linux targets is weird. Either we multiple linux targets with mempcpy fast, then name it somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h. And yes, we do care about i?86-linux. Or it is for x86 only, and then it shouldn't be in config/linux.c, but either e.g. static inline in config/i386/linux.h, or we need config/i386/linux.c if we don't have it already. Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-28 18:35 ` Jakub Jelinek @ 2018-03-29 12:14 ` Martin Liška 2018-03-29 12:31 ` H.J. Lu 2018-03-29 12:43 ` Jakub Jelinek 0 siblings, 2 replies; 45+ messages in thread From: Martin Liška @ 2018-03-29 12:14 UTC (permalink / raw) To: Jakub Jelinek Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu On 03/28/2018 06:36 PM, Jakub Jelinek wrote: > On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin LiÅ¡ka wrote: >> --- a/gcc/config/linux.c >> +++ b/gcc/config/linux.c >> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class) >> >> return false; >> } >> + >> +/* This hook determines whether a function from libc has a fast implementation >> + FN is present at the runtime. We override it for i386 and glibc C library >> + as this combination provides fast implementation of mempcpy function. */ >> + >> +enum libc_speed >> +ix86_linux_libc_func_speed (int fn) > > Putting a ix86_ function into config/linux.c used by most linux targets is > weird. Either we multiple linux targets with mempcpy fast, then name it > somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h. > And yes, we do care about i?86-linux. Or it is for x86 only, and then > it shouldn't be in config/linux.c, but either e.g. static inline in > config/i386/linux.h, or we need config/i386/linux.c if we don't have it > already. I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you please help me how to conditionally build the file? Martin > > Jakub > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-29 12:14 ` Martin Liška @ 2018-03-29 12:31 ` H.J. Lu 2018-03-29 12:43 ` Jakub Jelinek 1 sibling, 0 replies; 45+ messages in thread From: H.J. Lu @ 2018-03-29 12:31 UTC (permalink / raw) To: Martin Liška Cc: Jakub Jelinek, Richard Biener, Uros Bizjak, GCC Patches, Marc Glisse On Thu, Mar 29, 2018 at 4:28 AM, Martin Liška <mliska@suse.cz> wrote: > On 03/28/2018 06:36 PM, Jakub Jelinek wrote: >> >> On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote: >>> >>> --- a/gcc/config/linux.c >>> +++ b/gcc/config/linux.c >>> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class) >>> return false; >>> } >>> + >>> +/* This hook determines whether a function from libc has a fast >>> implementation >>> + FN is present at the runtime. We override it for i386 and glibc C >>> library >>> + as this combination provides fast implementation of mempcpy function. >>> */ >>> + >>> +enum libc_speed >>> +ix86_linux_libc_func_speed (int fn) >> >> >> Putting a ix86_ function into config/linux.c used by most linux targets is >> weird. Either we multiple linux targets with mempcpy fast, then name it >> somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h. >> And yes, we do care about i?86-linux. Or it is for x86 only, and then >> it shouldn't be in config/linux.c, but either e.g. static inline in >> config/i386/linux.h, or we need config/i386/linux.c if we don't have it >> already. > > > I'm fine with putting the implementation into gcc/config/i386/linux.c. Can > you please > help me how to conditionally build the file? config.gcc: extra_objs="${extra_objs} linux.o" config.gcc: extra_objs="$extra_objs powerpcspe-linux.o" config.gcc: extra_objs="$extra_objs rs6000-linux.o" -- H.J. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-29 12:14 ` Martin Liška 2018-03-29 12:31 ` H.J. Lu @ 2018-03-29 12:43 ` Jakub Jelinek 2018-03-29 12:35 ` Martin Liška 1 sibling, 1 reply; 45+ messages in thread From: Jakub Jelinek @ 2018-03-29 12:43 UTC (permalink / raw) To: Martin Liška Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu On Thu, Mar 29, 2018 at 01:28:13PM +0200, Martin LiÅ¡ka wrote: > On 03/28/2018 06:36 PM, Jakub Jelinek wrote: > > On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin LiÅ¡ka wrote: > > > --- a/gcc/config/linux.c > > > +++ b/gcc/config/linux.c > > > @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class) > > > return false; > > > } > > > + > > > +/* This hook determines whether a function from libc has a fast implementation > > > + FN is present at the runtime. We override it for i386 and glibc C library > > > + as this combination provides fast implementation of mempcpy function. */ > > > + > > > +enum libc_speed > > > +ix86_linux_libc_func_speed (int fn) > > > > Putting a ix86_ function into config/linux.c used by most linux targets is > > weird. Either we multiple linux targets with mempcpy fast, then name it > > somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h. > > And yes, we do care about i?86-linux. Or it is for x86 only, and then > > it shouldn't be in config/linux.c, but either e.g. static inline in > > config/i386/linux.h, or we need config/i386/linux.c if we don't have it > > already. > > I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you please Can't you just put it into gcc/config/i386/linux-common.h as static inline, so that it is optimized away whenever not needed? If you really want to add a c file, it better not be called linux.c, because linux.o for it would clash with linux.o from gcc/config/linux.c. And, you'd need to add the whatever.o into extra_objs in gcc/config.gcc and add rules for it into gcc/config/i386/t-linux (see linux.o in config.gcc and config/t-linux). > help me how to conditionally build the file? Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-29 12:43 ` Jakub Jelinek @ 2018-03-29 12:35 ` Martin Liška 2018-04-04 10:13 ` Martin Liška 2018-04-09 12:31 ` Martin Liška 0 siblings, 2 replies; 45+ messages in thread From: Martin Liška @ 2018-03-29 12:35 UTC (permalink / raw) To: Jakub Jelinek Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu On 03/29/2018 02:25 PM, Jakub Jelinek wrote: > On Thu, Mar 29, 2018 at 01:28:13PM +0200, Martin LiÅ¡ka wrote: >> On 03/28/2018 06:36 PM, Jakub Jelinek wrote: >>> On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin LiÅ¡ka wrote: >>>> --- a/gcc/config/linux.c >>>> +++ b/gcc/config/linux.c >>>> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class) >>>> return false; >>>> } >>>> + >>>> +/* This hook determines whether a function from libc has a fast implementation >>>> + FN is present at the runtime. We override it for i386 and glibc C library >>>> + as this combination provides fast implementation of mempcpy function. */ >>>> + >>>> +enum libc_speed >>>> +ix86_linux_libc_func_speed (int fn) >>> >>> Putting a ix86_ function into config/linux.c used by most linux targets is >>> weird. Either we multiple linux targets with mempcpy fast, then name it >>> somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h. >>> And yes, we do care about i?86-linux. Or it is for x86 only, and then >>> it shouldn't be in config/linux.c, but either e.g. static inline in >>> config/i386/linux.h, or we need config/i386/linux.c if we don't have it >>> already. >> >> I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you please > > Can't you just put it into gcc/config/i386/linux-common.h as static inline, > so that it is optimized away whenever not needed? I would like to put it there, but: g++ -c -g -DIN_GCC -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -fno-PIE -I. -Ibuild -I../../gcc -I../../gcc/build -I../../gcc/../include -I../../gcc/../libcpp/include \ -o build/genpreds.o ../../gcc/genpreds.c In file included from ./tm.h:38:0, from ../../gcc/genpreds.c:26: ../../gcc/config/i386/linux-common.h: In function âlibc_speed ix86_linux_libc_func_speed(int)â: ../../gcc/config/i386/linux-common.h:137:8: error: use of enum âbuilt_in_functionâ without previous declaration enum built_in_function f = (built_in_function)fn; ^~~~~~~~~~~~~~~~~ ../../gcc/config/i386/linux-common.h:137:31: error: âbuilt_in_functionâ was not declared in this scope enum built_in_function f = (built_in_function)fn; ^~~~~~~~~~~~~~~~~ ../../gcc/config/i386/linux-common.h:137:31: note: suggested alternative: âmachine_functionâ enum built_in_function f = (built_in_function)fn; ^~~~~~~~~~~~~~~~~ machine_function ../../gcc/config/i386/linux-common.h:144:12: error: âBUILT_IN_MEMPCPYâ was not declared in this scope case BUILT_IN_MEMPCPY: ^~~~~~~~~~~~~~~~ Martin > > If you really want to add a c file, it better not be called linux.c, because > linux.o for it would clash with linux.o from gcc/config/linux.c. And, > you'd need to add the whatever.o into extra_objs in gcc/config.gcc and add > rules for it into gcc/config/i386/t-linux (see linux.o in config.gcc and > config/t-linux). > >> help me how to conditionally build the file? > > Jakub > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-29 12:35 ` Martin Liška @ 2018-04-04 10:13 ` Martin Liška 2018-04-09 12:31 ` Martin Liška 1 sibling, 0 replies; 45+ messages in thread From: Martin Liška @ 2018-04-04 10:13 UTC (permalink / raw) To: Jakub Jelinek Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu PING^1 On 03/29/2018 02:31 PM, Martin LiÅ¡ka wrote: > On 03/29/2018 02:25 PM, Jakub Jelinek wrote: >> On Thu, Mar 29, 2018 at 01:28:13PM +0200, Martin LiÅ¡ka wrote: >>> On 03/28/2018 06:36 PM, Jakub Jelinek wrote: >>>> On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin LiÅ¡ka wrote: >>>>> --- a/gcc/config/linux.c >>>>> +++ b/gcc/config/linux.c >>>>> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class) >>>>>     return false; >>>>>   } >>>>> + >>>>> +/* This hook determines whether a function from libc has a fast implementation >>>>> +  FN is present at the runtime. We override it for i386 and glibc C library >>>>> +  as this combination provides fast implementation of mempcpy function. */ >>>>> + >>>>> +enum libc_speed >>>>> +ix86_linux_libc_func_speed (int fn) >>>> >>>> Putting a ix86_ function into config/linux.c used by most linux targets is >>>> weird. Either we multiple linux targets with mempcpy fast, then name it >>>> somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h. >>>> And yes, we do care about i?86-linux. Or it is for x86 only, and then >>>> it shouldn't be in config/linux.c, but either e.g. static inline in >>>> config/i386/linux.h, or we need config/i386/linux.c if we don't have it >>>> already. >>> >>> I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you please >> >> Can't you just put it into gcc/config/i386/linux-common.h as static inline, >> so that it is optimized away whenever not needed? > > I would like to put it there, but: > > g++ -c  -g -DIN_GCC    -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -fno-PIE -I. -Ibuild -I../../gcc -I../../gcc/build -I../../gcc/../include -I../../gcc/../libcpp/include \ >     -o build/genpreds.o ../../gcc/genpreds.c > In file included from ./tm.h:38:0, >                 from ../../gcc/genpreds.c:26: > ../../gcc/config/i386/linux-common.h: In function âlibc_speed ix86_linux_libc_func_speed(int)â: > ../../gcc/config/i386/linux-common.h:137:8: error: use of enum âbuilt_in_functionâ without previous declaration >   enum built_in_function f = (built_in_function)fn; >        ^~~~~~~~~~~~~~~~~ > ../../gcc/config/i386/linux-common.h:137:31: error: âbuilt_in_functionâ was not declared in this scope >   enum built_in_function f = (built_in_function)fn; >                               ^~~~~~~~~~~~~~~~~ > ../../gcc/config/i386/linux-common.h:137:31: note: suggested alternative: âmachine_functionâ >   enum built_in_function f = (built_in_function)fn; >                               ^~~~~~~~~~~~~~~~~ >                               machine_function > ../../gcc/config/i386/linux-common.h:144:12: error: âBUILT_IN_MEMPCPYâ was not declared in this scope >       case BUILT_IN_MEMPCPY: >            ^~~~~~~~~~~~~~~~ > Martin > >> >> If you really want to add a c file, it better not be called linux.c, because >> linux.o for it would clash with linux.o from gcc/config/linux.c. And, >> you'd need to add the whatever.o into extra_objs in gcc/config.gcc and add >> rules for it into gcc/config/i386/t-linux (see linux.o in config.gcc and >> config/t-linux). >> >>> help me how to conditionally build the file? >> >>     Jakub >> > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-29 12:35 ` Martin Liška 2018-04-04 10:13 ` Martin Liška @ 2018-04-09 12:31 ` Martin Liška 2018-04-10 10:35 ` Jakub Jelinek 1 sibling, 1 reply; 45+ messages in thread From: Martin Liška @ 2018-04-09 12:31 UTC (permalink / raw) To: Jakub Jelinek Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu [-- Attachment #1: Type: text/plain, Size: 148 bytes --] Hi. I'm sending updated version of the patch. Patch can bootstrap on ppc64le-redhat-linux and x86_64-linux and survives regression tests. Martin [-- Attachment #2: 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch --] [-- Type: text/x-patch, Size: 14328 bytes --] From 6d19b1bf0c28a683e1458e16943e34bb547d36d6 Mon Sep 17 00:00:00 2001 From: marxin <mliska@suse.cz> Date: Wed, 14 Mar 2018 09:44:18 +0100 Subject: [PATCH] Introduce new libc_func_speed target hook (PR middle-end/81657). gcc/ChangeLog: 2018-03-14 Martin Liska <mliska@suse.cz> PR middle-end/81657 * builtins.c (expand_builtin_memory_copy_args): Handle situation when libc library provides a fast mempcpy implementation/ * config/linux-protos.h (ix86_linux_libc_func_speed): New. (TARGET_LIBC_FUNC_SPEED): Likewise. * config/i386/linux64.h (SUBTARGET_LIBC_FUNC_SPEED): Define macro. * config/i386/linux.h (SUBTARGET_LIBC_FUNC_SPEED): Likewise. * config/i386/t-linux: Add x86-linux.o. * config.gcc: Likewise. * config/i386/x86-linux.c: New file. * coretypes.h (enum libc_speed): Likewise. * doc/tm.texi: Document new target hook. * doc/tm.texi.in: Likewise. * expr.c (emit_block_move_hints): Handle libc bail out argument. * expr.h (emit_block_move_hints): Add new parameters. * target.def: Add new hook. * targhooks.c (enum libc_speed): New enum. (default_libc_func_speed): Provide a default hook implementation. * targhooks.h (default_libc_func_speed): Likewise. gcc/testsuite/ChangeLog: 2018-03-28 Martin Liska <mliska@suse.cz> * gcc.dg/string-opt-1.c: gcc/testsuite/ChangeLog: 2018-03-14 Martin Liska <mliska@suse.cz> * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target and others. --- gcc/builtins.c | 15 ++++++++++- gcc/config.gcc | 1 + gcc/config/i386/i386.c | 5 ++++ gcc/config/i386/linux.h | 2 ++ gcc/config/i386/linux64.h | 2 ++ gcc/config/i386/t-linux | 6 +++++ gcc/config/i386/x86-linux.c | 54 +++++++++++++++++++++++++++++++++++++ gcc/config/linux-protos.h | 1 + gcc/coretypes.h | 7 +++++ gcc/doc/tm.texi | 4 +++ gcc/doc/tm.texi.in | 1 + gcc/expr.c | 11 +++++++- gcc/expr.h | 3 ++- gcc/target.def | 7 +++++ gcc/targhooks.c | 9 +++++++ gcc/targhooks.h | 1 + gcc/testsuite/gcc.dg/string-opt-1.c | 5 ++-- 17 files changed, 129 insertions(+), 5 deletions(-) create mode 100644 gcc/config/i386/x86-linux.c diff --git a/gcc/builtins.c b/gcc/builtins.c index 487d9d58db2..98ee3fb272d 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len, src_mem = get_memory_rtx (src, len); set_mem_align (src_mem, src_align); + /* emit_block_move_hints can generate a library call to memcpy function. + In situations when a libc library provides fast implementation + of mempcpy, then it's better to call mempcpy directly. */ + bool avoid_libcall + = (endp == 1 + && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED + && target != const0_rtx); + /* Copy word part most expediently. */ + bool libcall_avoided = false; dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, CALL_EXPR_TAILCALL (exp) && (endp == 0 || target == const0_rtx) ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL, expected_align, expected_size, - min_size, max_size, probable_max_size); + min_size, max_size, probable_max_size, + avoid_libcall ? &libcall_avoided : NULL); + + if (libcall_avoided) + return NULL_RTX; if (dest_addr == 0) { diff --git a/gcc/config.gcc b/gcc/config.gcc index 1b58c060a92..6445ff569b3 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu) x86_64-*-linux*) tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h" extra_options="${extra_options} linux-android.opt" + extra_objs="${extra_objs} x86-linux.o" ;; x86_64-*-kfreebsd*-gnu) tm_file="${tm_file} kfreebsd-gnu.h i386/kfreebsd-gnu64.h" diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b4f6aec1434..2471ff7b99a 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -52105,6 +52105,11 @@ ix86_run_selftests (void) #undef TARGET_WARN_PARAMETER_PASSING_ABI #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi +#ifdef SUBTARGET_LIBC_FUNC_SPEED +#undef TARGET_LIBC_FUNC_SPEED +#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED +#endif + #if CHECKING_P #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests diff --git a/gcc/config/i386/linux.h b/gcc/config/i386/linux.h index 69f97f15b0d..6c59cbd6d62 100644 --- a/gcc/config/i386/linux.h +++ b/gcc/config/i386/linux.h @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3. If not see #undef MUSL_DYNAMIC_LINKER #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1" + +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h index f2d913e30ac..d855f5cc239 100644 --- a/gcc/config/i386/linux64.h +++ b/gcc/config/i386/linux64.h @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1" #undef MUSL_DYNAMIC_LINKERX32 #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1" + +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed diff --git a/gcc/config/i386/t-linux b/gcc/config/i386/t-linux index 155314c08a7..6e3ebe94fe8 100644 --- a/gcc/config/i386/t-linux +++ b/gcc/config/i386/t-linux @@ -1 +1,7 @@ MULTIARCH_DIRNAME = $(call if_multiarch,i386-linux-gnu) + +x86-linux.o: $(srcdir)/config/i386/x86-linux.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ + $(TM_H) $(RTL_H) $(REGS_H) hard-reg-set.h output.h $(TREE_H) flags.h \ + $(TM_P_H) $(HASHTAB_H) $(GGC_H) + $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \ + $(srcdir)/config/i386/x86-linux.c diff --git a/gcc/config/i386/x86-linux.c b/gcc/config/i386/x86-linux.c new file mode 100644 index 00000000000..71291d18a71 --- /dev/null +++ b/gcc/config/i386/x86-linux.c @@ -0,0 +1,54 @@ +/* Implementation for linux-specific functions for i386 and x86-64 systems. + Copyright (C) 2018 Free Software Foundation, Inc. + Contributed by Martin Liska <mliska@suse.cz>. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. */ + +#define IN_TARGET_CODE 1 + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "cp/cp-tree.h" /* This is why we're a separate module. */ +#include "stringpool.h" +#include "attribs.h" + +/* This hook determines whether a function from libc has a fast implementation + FN is present at the runtime. We override it for i386 and glibc C library + as this combination provides fast implementation of mempcpy function. */ + +enum libc_speed +ix86_linux_libc_func_speed (int fn) +{ + enum built_in_function f = (built_in_function)fn; + + if (!OPTION_GLIBC) + return LIBC_UNKNOWN_SPEED; + + switch (f) + { + case BUILT_IN_MEMPCPY: + return LIBC_FAST_SPEED; + default: + return LIBC_UNKNOWN_SPEED; + } +} diff --git a/gcc/config/linux-protos.h b/gcc/config/linux-protos.h index 9da8dd7ecaa..b7284735366 100644 --- a/gcc/config/linux-protos.h +++ b/gcc/config/linux-protos.h @@ -20,3 +20,4 @@ along with GCC; see the file COPYING3. If not see extern bool linux_has_ifunc_p (void); extern bool linux_libc_has_function (enum function_class fn_class); +extern enum libc_speed ix86_linux_libc_func_speed (int fn); diff --git a/gcc/coretypes.h b/gcc/coretypes.h index 283b4eb33fe..fe618f708f4 100644 --- a/gcc/coretypes.h +++ b/gcc/coretypes.h @@ -384,6 +384,13 @@ enum excess_precision_type EXCESS_PRECISION_TYPE_FAST }; +enum libc_speed +{ + LIBC_FAST_SPEED, + LIBC_SLOW_SPEED, + LIBC_UNKNOWN_SPEED +}; + /* Support for user-provided GGC and PCH markers. The first parameter is a pointer to a pointer, the second a cookie. */ typedef void (*gt_pointer_operator) (void *, void *); diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index bd8b917ba82..0f7c91a22c4 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5501,6 +5501,10 @@ macro, a reasonable default is used. This hook determines whether a function from a class of functions @var{fn_class} is present at the runtime. @end deftypefn +@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn}) +This hook determines whether a function from libc has a fast implementation +@var{fn} is present at the runtime. +@end deftypefn @defmac NEXT_OBJC_RUNTIME Set this macro to 1 to use the "NeXT" Objective-C message sending conventions diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index b0207146e8c..4bb2998a8a1 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3933,6 +3933,7 @@ macro, a reasonable default is used. @end defmac @hook TARGET_LIBC_HAS_FUNCTION +@hook TARGET_LIBC_FUNC_SPEED @defmac NEXT_OBJC_RUNTIME Set this macro to 1 to use the "NeXT" Objective-C message sending conventions diff --git a/gcc/expr.c b/gcc/expr.c index 00660293f72..f3bd698bc4d 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len, MIN_SIZE is the minimal size of block to move MAX_SIZE is the maximal size of block to move, if it can not be represented in unsigned HOST_WIDE_INT, than it is mask of all ones. + If BAIL_OUT_LIBCALL is non-null, do not emit library call and assign + true to the pointer when move is not done. Return the address of the new block, if memcpy is called and returns it, 0 otherwise. */ @@ -1563,7 +1565,8 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, unsigned int expected_align, HOST_WIDE_INT expected_size, unsigned HOST_WIDE_INT min_size, unsigned HOST_WIDE_INT max_size, - unsigned HOST_WIDE_INT probable_max_size) + unsigned HOST_WIDE_INT probable_max_size, + bool *bail_out_libcall) { bool may_use_call; rtx retval = 0; @@ -1625,6 +1628,12 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x)) && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y))) { + if (bail_out_libcall) + { + *bail_out_libcall = true; + return retval; + } + /* Since x and y are passed to a libcall, mark the corresponding tree EXPR as addressable. */ tree y_expr = MEM_EXPR (y); diff --git a/gcc/expr.h b/gcc/expr.h index b3d523bcb24..c2bf87fd14e 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -110,7 +110,8 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods, unsigned int, HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, - unsigned HOST_WIDE_INT); + unsigned HOST_WIDE_INT, + bool *bail_out_libcall = NULL); extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool, by_pieces_constfn, void *); extern bool emit_storent_insn (rtx to, rtx from); diff --git a/gcc/target.def b/gcc/target.def index c5b2a1e7e71..3bbddc82776 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2639,6 +2639,13 @@ DEFHOOK bool, (enum function_class fn_class), default_libc_has_function) +DEFHOOK +(libc_func_speed, + "This hook determines whether a function from libc has a fast implementation\n\ +@var{fn} is present at the runtime.", + libc_speed, (int fn), + default_libc_func_speed) + /* True if new jumps cannot be created, to replace existing ones or not, at the current point in the compilation. */ DEFHOOK diff --git a/gcc/targhooks.c b/gcc/targhooks.c index fafcc6c5196..6e44f6f79cf 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED) return false; } +/* This hook determines whether a function from libc has a fast implementation + FN is present at the runtime. */ + +enum libc_speed +default_libc_func_speed (int) +{ + return LIBC_UNKNOWN_SPEED; +} + tree default_builtin_tm_load_store (tree ARG_UNUSED (type)) { diff --git a/gcc/targhooks.h b/gcc/targhooks.h index 8a4393f2ba4..7508673ad0a 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void); extern bool default_libc_has_function (enum function_class); extern bool no_c99_libc_has_function (enum function_class); extern bool gnu_libc_has_function (enum function_class); +extern enum libc_speed default_libc_func_speed (int); extern tree default_builtin_tm_load_store (tree); diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c index 2f060732bf0..7faaadcbb1f 100644 --- a/gcc/testsuite/gcc.dg/string-opt-1.c +++ b/gcc/testsuite/gcc.dg/string-opt-1.c @@ -48,5 +48,6 @@ main (void) return 0; } -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */ -/* { dg-final { scan-assembler "memcpy" } } */ +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */ +/* { dg-final { scan-assembler "memcpy" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */ +/* { dg-final { scan-assembler "mempcpy" { target { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */ -- 2.16.3 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-09 12:31 ` Martin Liška @ 2018-04-10 10:35 ` Jakub Jelinek 2018-04-10 12:28 ` Martin Liška 0 siblings, 1 reply; 45+ messages in thread From: Jakub Jelinek @ 2018-04-10 10:35 UTC (permalink / raw) To: Martin Liška Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu On Mon, Apr 09, 2018 at 02:31:04PM +0200, Martin LiÅ¡ka wrote: > gcc/testsuite/ChangeLog: > > 2018-03-28 Martin Liska <mliska@suse.cz> > > * gcc.dg/string-opt-1.c: I guess you really didn't mean to keep the above entry around, just the one below, right? > gcc/testsuite/ChangeLog: > > 2018-03-14 Martin Liska <mliska@suse.cz> > > * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target > and others. > --- a/gcc/config.gcc > +++ b/gcc/config.gcc > @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu) > x86_64-*-linux*) > tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h" > extra_options="${extra_options} linux-android.opt" > + extra_objs="${extra_objs} x86-linux.o" > ;; The should go into the i[34567]86-*-linux*) case too (outside of the if test x$enable_targets = xall; then conditional). Or maybe better, remove the above and do it in: i[34567]86-*-linux* | x86_64-*-linux*) extra_objs="${extra_objs} cet.o" tmake_file="$tmake_file i386/t-linux i386/t-cet" ;; spot, just add x86-linux.o next to cet.o. > --- a/gcc/config/i386/linux.h > +++ b/gcc/config/i386/linux.h > @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3. If not see > > #undef MUSL_DYNAMIC_LINKER > #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1" > + > +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed > diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h > index f2d913e30ac..d855f5cc239 100644 > --- a/gcc/config/i386/linux64.h > +++ b/gcc/config/i386/linux64.h > @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1" > #undef MUSL_DYNAMIC_LINKERX32 > #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1" > + > +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed And the above two changes should be replaced by a change in gcc/config/i386/linux-common.h. > +#include "coretypes.h" > +#include "cp/cp-tree.h" /* This is why we're a separate module. */ Why do you need cp/cp-tree.h? That is just too weird. The function just uses libc_speed (in core-types.h, built_in_function (likewise), OPTION_GLIBC (config/linux.h). Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-10 10:35 ` Jakub Jelinek @ 2018-04-10 12:28 ` Martin Liška 2018-04-12 13:22 ` Martin Liška 2018-07-05 19:34 ` Jeff Law 0 siblings, 2 replies; 45+ messages in thread From: Martin Liška @ 2018-04-10 12:28 UTC (permalink / raw) To: Jakub Jelinek Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu [-- Attachment #1: Type: text/plain, Size: 2643 bytes --] On 04/10/2018 11:19 AM, Jakub Jelinek wrote: > On Mon, Apr 09, 2018 at 02:31:04PM +0200, Martin LiÅ¡ka wrote: >> gcc/testsuite/ChangeLog: >> >> 2018-03-28 Martin Liska <mliska@suse.cz> >> >> * gcc.dg/string-opt-1.c: > > I guess you really didn't mean to keep the above entry around, just the one > below, right? Sure, fixed. > >> gcc/testsuite/ChangeLog: >> >> 2018-03-14 Martin Liska <mliska@suse.cz> >> >> * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target >> and others. > >> --- a/gcc/config.gcc >> +++ b/gcc/config.gcc >> @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu) >> x86_64-*-linux*) >> tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h" >> extra_options="${extra_options} linux-android.opt" >> + extra_objs="${extra_objs} x86-linux.o" >> ;; > > The should go into the i[34567]86-*-linux*) case too (outside of the > if test x$enable_targets = xall; then conditional). > Or maybe better, remove the above and do it in: > i[34567]86-*-linux* | x86_64-*-linux*) > extra_objs="${extra_objs} cet.o" > tmake_file="$tmake_file i386/t-linux i386/t-cet" > ;; > spot, just add x86-linux.o next to cet.o. Done. > >> --- a/gcc/config/i386/linux.h >> +++ b/gcc/config/i386/linux.h >> @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3. If not see >> >> #undef MUSL_DYNAMIC_LINKER >> #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1" >> + >> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed >> diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h >> index f2d913e30ac..d855f5cc239 100644 >> --- a/gcc/config/i386/linux64.h >> +++ b/gcc/config/i386/linux64.h >> @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see >> #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1" >> #undef MUSL_DYNAMIC_LINKERX32 >> #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1" >> + >> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed > > And the above two changes should be replaced by a change in > gcc/config/i386/linux-common.h. Likewise. > >> +#include "coretypes.h" >> +#include "cp/cp-tree.h" /* This is why we're a separate module. */ > > Why do you need cp/cp-tree.h? That is just too weird. > The function just uses libc_speed (in core-types.h, built_in_function > (likewise), OPTION_GLIBC (config/linux.h). I ended up with minimal set of includes: #include "config.h" #include "system.h" #include "coretypes.h" #include "backend.h" #include "tree.h" I'm retesting the patch. Martin > > Jakub > [-- Attachment #2: 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch --] [-- Type: text/x-patch, Size: 13605 bytes --] From bed35715063f9435b697eaf4c9868f81e8556de8 Mon Sep 17 00:00:00 2001 From: marxin <mliska@suse.cz> Date: Wed, 14 Mar 2018 09:44:18 +0100 Subject: [PATCH] Introduce new libc_func_speed target hook (PR middle-end/81657). gcc/ChangeLog: 2018-03-14 Martin Liska <mliska@suse.cz> PR middle-end/81657 * builtins.c (expand_builtin_memory_copy_args): Handle situation when libc library provides a fast mempcpy implementation/ * config/linux-protos.h (ix86_linux_libc_func_speed): New. (TARGET_LIBC_FUNC_SPEED): Likewise. * config/i386/linux-common.h (SUBTARGET_LIBC_FUNC_SPEED): Define macro. * config/i386/t-linux: Add x86-linux.o. * config.gcc: Likewise. * config/i386/x86-linux.c: New file. * coretypes.h (enum libc_speed): Likewise. * doc/tm.texi: Document new target hook. * doc/tm.texi.in: Likewise. * expr.c (emit_block_move_hints): Handle libc bail out argument. * expr.h (emit_block_move_hints): Add new parameters. * target.def: Add new hook. * targhooks.c (enum libc_speed): New enum. (default_libc_func_speed): Provide a default hook implementation. * targhooks.h (default_libc_func_speed): Likewise. gcc/testsuite/ChangeLog: 2018-03-14 Martin Liska <mliska@suse.cz> * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target and others. --- gcc/builtins.c | 15 ++++++++++- gcc/config.gcc | 2 +- gcc/config/i386/i386.c | 5 ++++ gcc/config/i386/linux-common.h | 2 ++ gcc/config/i386/t-linux | 6 +++++ gcc/config/i386/x86-linux.c | 52 +++++++++++++++++++++++++++++++++++++ gcc/config/linux-protos.h | 1 + gcc/coretypes.h | 7 +++++ gcc/doc/tm.texi | 4 +++ gcc/doc/tm.texi.in | 1 + gcc/expr.c | 11 +++++++- gcc/expr.h | 3 ++- gcc/target.def | 7 +++++ gcc/targhooks.c | 9 +++++++ gcc/targhooks.h | 1 + gcc/testsuite/gcc.dg/string-opt-1.c | 5 ++-- 16 files changed, 125 insertions(+), 6 deletions(-) create mode 100644 gcc/config/i386/x86-linux.c diff --git a/gcc/builtins.c b/gcc/builtins.c index 487d9d58db2..98ee3fb272d 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len, src_mem = get_memory_rtx (src, len); set_mem_align (src_mem, src_align); + /* emit_block_move_hints can generate a library call to memcpy function. + In situations when a libc library provides fast implementation + of mempcpy, then it's better to call mempcpy directly. */ + bool avoid_libcall + = (endp == 1 + && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED + && target != const0_rtx); + /* Copy word part most expediently. */ + bool libcall_avoided = false; dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, CALL_EXPR_TAILCALL (exp) && (endp == 0 || target == const0_rtx) ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL, expected_align, expected_size, - min_size, max_size, probable_max_size); + min_size, max_size, probable_max_size, + avoid_libcall ? &libcall_avoided : NULL); + + if (libcall_avoided) + return NULL_RTX; if (dest_addr == 0) { diff --git a/gcc/config.gcc b/gcc/config.gcc index 1b58c060a92..7fe43856b6a 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4617,7 +4617,7 @@ case ${target} in i[34567]86-*-darwin* | x86_64-*-darwin*) ;; i[34567]86-*-linux* | x86_64-*-linux*) - extra_objs="${extra_objs} cet.o" + extra_objs="${extra_objs} cet.o x86-linux.o" tmake_file="$tmake_file i386/t-linux i386/t-cet" ;; i[34567]86-*-kfreebsd*-gnu | x86_64-*-kfreebsd*-gnu) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b4f6aec1434..2471ff7b99a 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -52105,6 +52105,11 @@ ix86_run_selftests (void) #undef TARGET_WARN_PARAMETER_PASSING_ABI #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi +#ifdef SUBTARGET_LIBC_FUNC_SPEED +#undef TARGET_LIBC_FUNC_SPEED +#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED +#endif + #if CHECKING_P #undef TARGET_RUN_TARGET_SELFTESTS #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests diff --git a/gcc/config/i386/linux-common.h b/gcc/config/i386/linux-common.h index d877387021b..1b48c15e5c0 100644 --- a/gcc/config/i386/linux-common.h +++ b/gcc/config/i386/linux-common.h @@ -126,3 +126,5 @@ extern void file_end_indicate_exec_stack_and_cet (void); #undef TARGET_ASM_FILE_END #define TARGET_ASM_FILE_END file_end_indicate_exec_stack_and_cet + +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed diff --git a/gcc/config/i386/t-linux b/gcc/config/i386/t-linux index 155314c08a7..6e3ebe94fe8 100644 --- a/gcc/config/i386/t-linux +++ b/gcc/config/i386/t-linux @@ -1 +1,7 @@ MULTIARCH_DIRNAME = $(call if_multiarch,i386-linux-gnu) + +x86-linux.o: $(srcdir)/config/i386/x86-linux.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ + $(TM_H) $(RTL_H) $(REGS_H) hard-reg-set.h output.h $(TREE_H) flags.h \ + $(TM_P_H) $(HASHTAB_H) $(GGC_H) + $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \ + $(srcdir)/config/i386/x86-linux.c diff --git a/gcc/config/i386/x86-linux.c b/gcc/config/i386/x86-linux.c new file mode 100644 index 00000000000..5e4331f635a --- /dev/null +++ b/gcc/config/i386/x86-linux.c @@ -0,0 +1,52 @@ +/* Implementation for linux-specific functions for i386 and x86-64 systems. + Copyright (C) 2018 Free Software Foundation, Inc. + Contributed by Martin Liska <mliska@suse.cz>. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. */ + +#define IN_TARGET_CODE 1 +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "tree.h" + +/* This hook determines whether a function from libc has a fast implementation + FN is present at the runtime. We override it for i386 and glibc C library + as this combination provides fast implementation of mempcpy function. */ + +enum libc_speed +ix86_linux_libc_func_speed (int fn) +{ + enum built_in_function f = (built_in_function)fn; + + if (!OPTION_GLIBC) + return LIBC_UNKNOWN_SPEED; + + switch (f) + { + case BUILT_IN_MEMPCPY: + return LIBC_FAST_SPEED; + default: + return LIBC_UNKNOWN_SPEED; + } +} diff --git a/gcc/config/linux-protos.h b/gcc/config/linux-protos.h index 9da8dd7ecaa..b7284735366 100644 --- a/gcc/config/linux-protos.h +++ b/gcc/config/linux-protos.h @@ -20,3 +20,4 @@ along with GCC; see the file COPYING3. If not see extern bool linux_has_ifunc_p (void); extern bool linux_libc_has_function (enum function_class fn_class); +extern enum libc_speed ix86_linux_libc_func_speed (int fn); diff --git a/gcc/coretypes.h b/gcc/coretypes.h index 283b4eb33fe..fe618f708f4 100644 --- a/gcc/coretypes.h +++ b/gcc/coretypes.h @@ -384,6 +384,13 @@ enum excess_precision_type EXCESS_PRECISION_TYPE_FAST }; +enum libc_speed +{ + LIBC_FAST_SPEED, + LIBC_SLOW_SPEED, + LIBC_UNKNOWN_SPEED +}; + /* Support for user-provided GGC and PCH markers. The first parameter is a pointer to a pointer, the second a cookie. */ typedef void (*gt_pointer_operator) (void *, void *); diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index bd8b917ba82..0f7c91a22c4 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5501,6 +5501,10 @@ macro, a reasonable default is used. This hook determines whether a function from a class of functions @var{fn_class} is present at the runtime. @end deftypefn +@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn}) +This hook determines whether a function from libc has a fast implementation +@var{fn} is present at the runtime. +@end deftypefn @defmac NEXT_OBJC_RUNTIME Set this macro to 1 to use the "NeXT" Objective-C message sending conventions diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index b0207146e8c..4bb2998a8a1 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3933,6 +3933,7 @@ macro, a reasonable default is used. @end defmac @hook TARGET_LIBC_HAS_FUNCTION +@hook TARGET_LIBC_FUNC_SPEED @defmac NEXT_OBJC_RUNTIME Set this macro to 1 to use the "NeXT" Objective-C message sending conventions diff --git a/gcc/expr.c b/gcc/expr.c index 00660293f72..f3bd698bc4d 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len, MIN_SIZE is the minimal size of block to move MAX_SIZE is the maximal size of block to move, if it can not be represented in unsigned HOST_WIDE_INT, than it is mask of all ones. + If BAIL_OUT_LIBCALL is non-null, do not emit library call and assign + true to the pointer when move is not done. Return the address of the new block, if memcpy is called and returns it, 0 otherwise. */ @@ -1563,7 +1565,8 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, unsigned int expected_align, HOST_WIDE_INT expected_size, unsigned HOST_WIDE_INT min_size, unsigned HOST_WIDE_INT max_size, - unsigned HOST_WIDE_INT probable_max_size) + unsigned HOST_WIDE_INT probable_max_size, + bool *bail_out_libcall) { bool may_use_call; rtx retval = 0; @@ -1625,6 +1628,12 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method, && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x)) && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y))) { + if (bail_out_libcall) + { + *bail_out_libcall = true; + return retval; + } + /* Since x and y are passed to a libcall, mark the corresponding tree EXPR as addressable. */ tree y_expr = MEM_EXPR (y); diff --git a/gcc/expr.h b/gcc/expr.h index b3d523bcb24..c2bf87fd14e 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -110,7 +110,8 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods, unsigned int, HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, - unsigned HOST_WIDE_INT); + unsigned HOST_WIDE_INT, + bool *bail_out_libcall = NULL); extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool, by_pieces_constfn, void *); extern bool emit_storent_insn (rtx to, rtx from); diff --git a/gcc/target.def b/gcc/target.def index c5b2a1e7e71..3bbddc82776 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -2639,6 +2639,13 @@ DEFHOOK bool, (enum function_class fn_class), default_libc_has_function) +DEFHOOK +(libc_func_speed, + "This hook determines whether a function from libc has a fast implementation\n\ +@var{fn} is present at the runtime.", + libc_speed, (int fn), + default_libc_func_speed) + /* True if new jumps cannot be created, to replace existing ones or not, at the current point in the compilation. */ DEFHOOK diff --git a/gcc/targhooks.c b/gcc/targhooks.c index fafcc6c5196..6e44f6f79cf 100644 --- a/gcc/targhooks.c +++ b/gcc/targhooks.c @@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED) return false; } +/* This hook determines whether a function from libc has a fast implementation + FN is present at the runtime. */ + +enum libc_speed +default_libc_func_speed (int) +{ + return LIBC_UNKNOWN_SPEED; +} + tree default_builtin_tm_load_store (tree ARG_UNUSED (type)) { diff --git a/gcc/targhooks.h b/gcc/targhooks.h index 8a4393f2ba4..7508673ad0a 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void); extern bool default_libc_has_function (enum function_class); extern bool no_c99_libc_has_function (enum function_class); extern bool gnu_libc_has_function (enum function_class); +extern enum libc_speed default_libc_func_speed (int); extern tree default_builtin_tm_load_store (tree); diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c index 2f060732bf0..7faaadcbb1f 100644 --- a/gcc/testsuite/gcc.dg/string-opt-1.c +++ b/gcc/testsuite/gcc.dg/string-opt-1.c @@ -48,5 +48,6 @@ main (void) return 0; } -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */ -/* { dg-final { scan-assembler "memcpy" } } */ +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */ +/* { dg-final { scan-assembler "memcpy" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */ +/* { dg-final { scan-assembler "mempcpy" { target { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */ -- 2.16.3 ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-10 12:28 ` Martin Liška @ 2018-04-12 13:22 ` Martin Liška 2018-04-12 13:42 ` Jan Hubicka 2018-04-12 13:52 ` Richard Biener 2018-07-05 19:34 ` Jeff Law 1 sibling, 2 replies; 45+ messages in thread From: Martin Liška @ 2018-04-12 13:22 UTC (permalink / raw) To: Jakub Jelinek Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu, Jan Hubicka Hi. I'm reminding review request from Richi for generic part and Uros/Honza for target part. Thanks, Martin ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-12 13:22 ` Martin Liška @ 2018-04-12 13:42 ` Jan Hubicka 2018-04-12 13:52 ` Richard Biener 1 sibling, 0 replies; 45+ messages in thread From: Jan Hubicka @ 2018-04-12 13:42 UTC (permalink / raw) To: Martin Liška Cc: Jakub Jelinek, Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu > Hi. > > I'm reminding review request from Richi for generic part > and Uros/Honza for target part. OK for i386 bits. Honza > > Thanks, > Martin ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-12 13:22 ` Martin Liška 2018-04-12 13:42 ` Jan Hubicka @ 2018-04-12 13:52 ` Richard Biener 2018-04-12 14:35 ` Jakub Jelinek 1 sibling, 1 reply; 45+ messages in thread From: Richard Biener @ 2018-04-12 13:52 UTC (permalink / raw) To: Martin Liška Cc: Jakub Jelinek, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu, Jan Hubicka [-- Attachment #1: Type: text/plain, Size: 1418 bytes --] On Thu, 12 Apr 2018, Martin LiÅ¡ka wrote: > Hi. > > I'm reminding review request from Richi for generic part > and Uros/Honza for target part. Not sure if I missed some important part of the discussion but for the testcase we want to preserve the tailcall, right? So it would be enough to set avoid_libcall to endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle stpcpy)? I'm not sure I like the interfacing of that to emit_block_move_hints very much. I'd have used sth like BLOCK_OP_ABORT_ON_LIBCALL and extend the interface in a way to return what kind of method it chose rather than just a bool. Not sure what gcc.dg/20050503-1.c did on non-x86 targets - the test runs on all archs but only x86 is ever tested for a result. So - I think tail-calling is prefered, and somehow in the PR the discussion wandered off to whether there's fast implementations or not - but the testcase looks for a tailcall where the source was a tailcall, that should be authorative for the "default" decision when the hook isn't implemented or doesn't cover the case. IMO target libraries have to be quite stupid if they have anything slower than void *mempcpy (void *dest, const void *src, size_t n) { return memcpy (dest, src, n) + n; } which should be not (very much) slower than a non-tailcall memcpy call. So -- remove the hook and instead use CALL_EXPR_TAILCALL (exp) instead of its result. Thanks, Richard. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-12 13:52 ` Richard Biener @ 2018-04-12 14:35 ` Jakub Jelinek 2018-04-12 14:19 ` Richard Biener 0 siblings, 1 reply; 45+ messages in thread From: Jakub Jelinek @ 2018-04-12 14:35 UTC (permalink / raw) To: Richard Biener Cc: Martin Liška, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu, Jan Hubicka On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote: > Not sure if I missed some important part of the discussion but > for the testcase we want to preserve the tailcall, right? So > it would be enough to set avoid_libcall to > endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle > stpcpy)? For the testcase yes. There the question is if some targets have so lame mempcpy that using a tailcall to mempcpy is slower over avoiding the tailcall (and on aarch64 it looked like maintainer's choice to have lame mempcpy and hope the compiler will avoid it at all costs). On the other side, that change has been forced over to all targets, even when they don't have lame mempcpy. So, the tailcall is one issue, and we can either use mempcpy if endp and CALL_EXPR_TAILCALL, or only do that if -Os. And another issue is mempcpy uses in other contexts, here again I think x86 has good enough mempcpy that if I use foo (mempcpy (x, y, z)) then it is better to use mempcpy over memcpy call, but not so on targets with lame mempcpy. My preference would be to have non-lame mempcpy etc. on all targets, but the aarch64 folks disagree. So, wonder e.g. about Martin's patch, which would use mempcpy if endp and either FAST_SPEED for mempcpy (regardless of the context), or not SLOW_SPEED and CALL_EXPR_TAILCALL. That way, targets could signal they have so lame mempcpy that they never want to use it (return SLOW_SPEED), or ask for it to be used every time it makes sense from caller POV, and have the default something in between (only use it in tail calls). Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-12 14:35 ` Jakub Jelinek @ 2018-04-12 14:19 ` Richard Biener 2018-04-12 14:35 ` Jakub Jelinek 0 siblings, 1 reply; 45+ messages in thread From: Richard Biener @ 2018-04-12 14:19 UTC (permalink / raw) To: Jakub Jelinek Cc: Martin Liška, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu, Jan Hubicka On Thu, 12 Apr 2018, Jakub Jelinek wrote: > On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote: > > Not sure if I missed some important part of the discussion but > > for the testcase we want to preserve the tailcall, right? So > > it would be enough to set avoid_libcall to > > endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle > > stpcpy)? > > For the testcase yes. There the question is if some targets have so lame > mempcpy that using a tailcall to mempcpy is slower over avoiding the > tailcall (and on aarch64 it looked like maintainer's choice to have lame > mempcpy and hope the compiler will avoid it at all costs). On the other > side, that change has been forced over to all targets, even when they don't > have lame mempcpy. > So, the tailcall is one issue, and we can either use mempcpy if endp > and CALL_EXPR_TAILCALL, or only do that if -Os. > > And another issue is mempcpy uses in other contexts, here again I think x86 > has good enough mempcpy that if I use > foo (mempcpy (x, y, z)) then it is better to use mempcpy over memcpy call, > but not so on targets with lame mempcpy. > > My preference would be to have non-lame mempcpy etc. on all targets, but the > aarch64 folks disagree. > > So, wonder e.g. about Martin's patch, which would use mempcpy if endp and > either FAST_SPEED for mempcpy (regardless of the context), or not > SLOW_SPEED and CALL_EXPR_TAILCALL. That way, targets could signal they have > so lame mempcpy that they never want to use it (return SLOW_SPEED), or ask > for it to be used every time it makes sense from caller POV, and have the > default something in between (only use it in tail calls). Well, but that wouldn't be a fix for a regression and IMHO there's no reason for a really lame mempcpy. If targets disgree well, then they get what they deserve. I don't see any aarch64 specific mempcpy in glibc btw so hopefully the default non-stupid one kicks in (it exactly looks like my C version) Richard. > Jakub > > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-12 14:19 ` Richard Biener @ 2018-04-12 14:35 ` Jakub Jelinek 2018-04-12 15:17 ` Richard Biener 0 siblings, 1 reply; 45+ messages in thread From: Jakub Jelinek @ 2018-04-12 14:35 UTC (permalink / raw) To: Richard Biener Cc: Martin Liška, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu, Jan Hubicka On Thu, Apr 12, 2018 at 04:19:38PM +0200, Richard Biener wrote: > Well, but that wouldn't be a fix for a regression and IMHO there's > no reason for a really lame mempcpy. If targets disgree well, It is a regression as well, in the past we've emitted mempcpy when user wrote mempcpy, now we don't. E.g. extern void *mempcpy (void *, const void *, __SIZE_TYPE__); void bar (void *, void *, void *); void foo (void *x, void *y, void *z, void *w, __SIZE_TYPE__ n) { bar (mempcpy (x, w, n), mempcpy (y, w, n), mempcpy (z, w, n)); } is on x86_64-linux -O2 in 7.x using the 3 mempcpy calls and 90 bytes in foo, while on the trunk uses 3 memcpy calls and 96 bytes in foo. For -Os that is easily measurable regression, for -O2 it depends on the relative speed of memcpy vs. mempcpy and whether one or both of them are in I-cache or not. > then they get what they deserve. > > I don't see any aarch64 specific mempcpy in glibc btw so hopefully > the default non-stupid one kicks in (it exactly looks like my C > version) Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-12 14:35 ` Jakub Jelinek @ 2018-04-12 15:17 ` Richard Biener 2018-04-12 17:35 ` Jakub Jelinek 0 siblings, 1 reply; 45+ messages in thread From: Richard Biener @ 2018-04-12 15:17 UTC (permalink / raw) To: Jakub Jelinek Cc: Martin Liška, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu, Jan Hubicka On April 12, 2018 4:31:12 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote: >On Thu, Apr 12, 2018 at 04:19:38PM +0200, Richard Biener wrote: >> Well, but that wouldn't be a fix for a regression and IMHO there's >> no reason for a really lame mempcpy. If targets disgree well, > >It is a regression as well, in the past we've emitted mempcpy when user >wrote mempcpy, now we don't. > >E.g. >extern void *mempcpy (void *, const void *, __SIZE_TYPE__); >void bar (void *, void *, void *); > >void >foo (void *x, void *y, void *z, void *w, __SIZE_TYPE__ n) >{ > bar (mempcpy (x, w, n), mempcpy (y, w, n), mempcpy (z, w, n)); >} > >is on x86_64-linux -O2 in 7.x using the 3 mempcpy calls and 90 bytes in >foo, while >on the trunk uses 3 memcpy calls and 96 bytes in foo. > >For -Os that is easily measurable regression, for -O2 it depends on the >relative speed of memcpy vs. mempcpy and whether one or both of them >are in >I-cache or not. Well, then simply unconditionally not generate a libcall from the move expander? > >> then they get what they deserve. >> >> I don't see any aarch64 specific mempcpy in glibc btw so hopefully >> the default non-stupid one kicks in (it exactly looks like my C >> version) > > Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-12 15:17 ` Richard Biener @ 2018-04-12 17:35 ` Jakub Jelinek 2018-04-12 16:03 ` Richard Biener 0 siblings, 1 reply; 45+ messages in thread From: Jakub Jelinek @ 2018-04-12 17:35 UTC (permalink / raw) To: Richard Biener Cc: Martin Liška, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu, Jan Hubicka On Thu, Apr 12, 2018 at 05:17:29PM +0200, Richard Biener wrote: > >For -Os that is easily measurable regression, for -O2 it depends on the > >relative speed of memcpy vs. mempcpy and whether one or both of them > >are in > >I-cache or not. > > Well, then simply unconditionally not generate a libcall from the move expander? We need to generate libcall for many callers and in fact, we don't have a mode nor a way to tell the caller that we haven't emitted anything. What we could do is add another enumerator to enum block_op_methods that would be like BLOCK_OP_NO_LIBCALL, but would not use emit_block_move_via_loop if move_by_pieces nor emit_block_move_via_movmem can be used, and say instead return const0_rtx or pc_rtx or some way to tell the caller that it hasn't emitted anything and in expand_builtin_memory_copy_args pass for endp == 1 && target != const0_rtx that new BLOCK_OP_NO_LIBCALL_LOOP to emit_block_move_hints and return 0 if dest_addr is const0_rtx (or pc_rtx or whatever is chosen). Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-12 17:35 ` Jakub Jelinek @ 2018-04-12 16:03 ` Richard Biener 0 siblings, 0 replies; 45+ messages in thread From: Richard Biener @ 2018-04-12 16:03 UTC (permalink / raw) To: Jakub Jelinek Cc: Martin Liška, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu, Jan Hubicka On April 12, 2018 5:38:44 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote: >On Thu, Apr 12, 2018 at 05:17:29PM +0200, Richard Biener wrote: >> >For -Os that is easily measurable regression, for -O2 it depends on >the >> >relative speed of memcpy vs. mempcpy and whether one or both of them >> >are in >> >I-cache or not. >> >> Well, then simply unconditionally not generate a libcall from the >move expander? > >We need to generate libcall for many callers and in fact, we don't have >a >mode nor a way to tell the caller that we haven't emitted anything. > >What we could do is add another enumerator to enum block_op_methods >that >would be like BLOCK_OP_NO_LIBCALL, but would not use >emit_block_move_via_loop >if move_by_pieces nor emit_block_move_via_movmem can be used, and say >instead return const0_rtx or pc_rtx or some way to tell the caller that >it hasn't emitted anything and in expand_builtin_memory_copy_args >pass for endp == 1 && target != const0_rtx that new >BLOCK_OP_NO_LIBCALL_LOOP >to emit_block_move_hints and return 0 if dest_addr is const0_rtx (or >pc_rtx >or whatever is chosen). Yes. Emit the "original" call whenever inline expansion fails. Richard. > Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-10 12:28 ` Martin Liška 2018-04-12 13:22 ` Martin Liška @ 2018-07-05 19:34 ` Jeff Law 1 sibling, 0 replies; 45+ messages in thread From: Jeff Law @ 2018-07-05 19:34 UTC (permalink / raw) To: Martin Liška, Jakub Jelinek Cc: Richard Biener, Uros Bizjak, gcc-patches, Marc Glisse, H.J. Lu On 04/10/2018 06:27 AM, Martin LiÅ¡ka wrote: > On 04/10/2018 11:19 AM, Jakub Jelinek wrote: >> On Mon, Apr 09, 2018 at 02:31:04PM +0200, Martin LiÅ¡ka wrote: >>> gcc/testsuite/ChangeLog: >>> >>> 2018-03-28 Martin Liska <mliska@suse.cz> >>> >>> * gcc.dg/string-opt-1.c: >> I guess you really didn't mean to keep the above entry around, just the one >> below, right? > Sure, fixed. > >>> gcc/testsuite/ChangeLog: >>> >>> 2018-03-14 Martin Liska <mliska@suse.cz> >>> >>> * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target >>> and others. >>> --- a/gcc/config.gcc >>> +++ b/gcc/config.gcc >>> @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu) >>> x86_64-*-linux*) >>> tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h" >>> extra_options="${extra_options} linux-android.opt" >>> + extra_objs="${extra_objs} x86-linux.o" >>> ;; >> The should go into the i[34567]86-*-linux*) case too (outside of the >> if test x$enable_targets = xall; then conditional). >> Or maybe better, remove the above and do it in: >> i[34567]86-*-linux* | x86_64-*-linux*) >> extra_objs="${extra_objs} cet.o" >> tmake_file="$tmake_file i386/t-linux i386/t-cet" >> ;; >> spot, just add x86-linux.o next to cet.o. > Done. > >>> --- a/gcc/config/i386/linux.h >>> +++ b/gcc/config/i386/linux.h >>> @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3. If not see >>> >>> #undef MUSL_DYNAMIC_LINKER >>> #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1" >>> + >>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed >>> diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h >>> index f2d913e30ac..d855f5cc239 100644 >>> --- a/gcc/config/i386/linux64.h >>> +++ b/gcc/config/i386/linux64.h >>> @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see >>> #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1" >>> #undef MUSL_DYNAMIC_LINKERX32 >>> #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1" >>> + >>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed >> And the above two changes should be replaced by a change in >> gcc/config/i386/linux-common.h. > Likewise. > >>> +#include "coretypes.h" >>> +#include "cp/cp-tree.h" /* This is why we're a separate module. */ >> Why do you need cp/cp-tree.h? That is just too weird. >> The function just uses libc_speed (in core-types.h, built_in_function >> (likewise), OPTION_GLIBC (config/linux.h). > I ended up with minimal set of includes: > > #include "config.h" > #include "system.h" > #include "coretypes.h" > #include "backend.h" > #include "tree.h" > > I'm retesting the patch. > > Martin > >> Jakub >> > > 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch > > > From bed35715063f9435b697eaf4c9868f81e8556de8 Mon Sep 17 00:00:00 2001 > From: marxin <mliska@suse.cz> > Date: Wed, 14 Mar 2018 09:44:18 +0100 > Subject: [PATCH] Introduce new libc_func_speed target hook (PR > middle-end/81657). > > gcc/ChangeLog: > > 2018-03-14 Martin Liska <mliska@suse.cz> > > PR middle-end/81657 > * builtins.c (expand_builtin_memory_copy_args): Handle situation > when libc library provides a fast mempcpy implementation/ > * config/linux-protos.h (ix86_linux_libc_func_speed): New. > (TARGET_LIBC_FUNC_SPEED): Likewise. > * config/i386/linux-common.h (SUBTARGET_LIBC_FUNC_SPEED): Define > macro. > * config/i386/t-linux: Add x86-linux.o. > * config.gcc: Likewise. > * config/i386/x86-linux.c: New file. > * coretypes.h (enum libc_speed): Likewise. > * doc/tm.texi: Document new target hook. > * doc/tm.texi.in: Likewise. > * expr.c (emit_block_move_hints): Handle libc bail out argument. > * expr.h (emit_block_move_hints): Add new parameters. > * target.def: Add new hook. > * targhooks.c (enum libc_speed): New enum. > (default_libc_func_speed): Provide a default hook > implementation. > * targhooks.h (default_libc_func_speed): Likewise. > > gcc/testsuite/ChangeLog: > > 2018-03-14 Martin Liska <mliska@suse.cz> > > * gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target > and others. This looks pretty reasonable now. Let's go with it. If we need to adjust other targets we certainly can fault them in as their properties are discovered/updated. jeff ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-14 12:57 ` Martin Liška 2018-03-14 13:08 ` Jakub Jelinek @ 2018-03-14 13:08 ` H.J. Lu 2018-03-14 13:10 ` Martin Liška 1 sibling, 1 reply; 45+ messages in thread From: H.J. Lu @ 2018-03-14 13:08 UTC (permalink / raw) To: Martin Liška; +Cc: Jakub Jelinek, GCC Patches, Marc Glisse On Wed, Mar 14, 2018 at 5:54 AM, Martin Liška <mliska@suse.cz> wrote: > On 03/13/2018 04:23 PM, Jakub Jelinek wrote: >> On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin Liška wrote: >>>> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook, >>>> where in particular linux_libc_has_function deals with various C libraries. >>>> Of course, in this case you need another target hook, that is dependent both >>>> on the target backend and C library. >>>> >>>> It would be nice to make the target hook a little bit more generic as well, >>>> e.g. pass it enum builtin_function and query if it is fast, slow or >>>> unknown, or even some kind of cost, where the caller could ask for cost of >>>> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs. >>> >>> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook >>> definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation. >>> I would like to implement the function in gcc/targhooks.c, but I don't know how to >>> make ifdef according to target? >> >> Put there just the default implementation (everything is UNKNOWN?). >> >>> One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the >>> callback argument with int, that will be casted. One last issue: am I right that I'll have to define >>> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)? >> >> And define the i386/x86_64 glibc one in config/i386/*.h, check there >> OPTION_GLIBC and only in that case return something other than UNKNOWN. >> >> And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case. >> >> Jakub >> > > Hi. > > I'm sending V2 that can survive bootstrap and regression tests on both x86_64 and ppc64le. > > Martin diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c index d82e2232d7b..91e1c87f83f 100644 --- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c @@ -62,7 +62,7 @@ main_test (void) mempcpy (p + 5, s3, 1); if (memcmp (p, "ABCDEFg", 8)) abort (); - mempcpy (p + 6, s1 + 1, l1); + memcpy (p + 6, s1 + 1, l1); if (memcmp (p, "ABCDEF2", 8)) abort (); } This is a mempcpy test. Why is mempcpy changed to memcpy? -- H.J. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-14 13:08 ` H.J. Lu @ 2018-03-14 13:10 ` Martin Liška 2018-03-14 13:43 ` Jakub Jelinek 0 siblings, 1 reply; 45+ messages in thread From: Martin Liška @ 2018-03-14 13:10 UTC (permalink / raw) To: H.J. Lu; +Cc: Jakub Jelinek, GCC Patches, Marc Glisse On 03/14/2018 01:57 PM, H.J. Lu wrote: > On Wed, Mar 14, 2018 at 5:54 AM, Martin LiÅ¡ka <mliska@suse.cz> wrote: >> On 03/13/2018 04:23 PM, Jakub Jelinek wrote: >>> On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin LiÅ¡ka wrote: >>>>> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook, >>>>> where in particular linux_libc_has_function deals with various C libraries. >>>>> Of course, in this case you need another target hook, that is dependent both >>>>> on the target backend and C library. >>>>> >>>>> It would be nice to make the target hook a little bit more generic as well, >>>>> e.g. pass it enum builtin_function and query if it is fast, slow or >>>>> unknown, or even some kind of cost, where the caller could ask for cost of >>>>> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs. >>>> >>>> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook >>>> definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation. >>>> I would like to implement the function in gcc/targhooks.c, but I don't know how to >>>> make ifdef according to target? >>> >>> Put there just the default implementation (everything is UNKNOWN?). >>> >>>> One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the >>>> callback argument with int, that will be casted. One last issue: am I right that I'll have to define >>>> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)? >>> >>> And define the i386/x86_64 glibc one in config/i386/*.h, check there >>> OPTION_GLIBC and only in that case return something other than UNKNOWN. >>> >>> And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case. >>> >>> Jakub >>> >> >> Hi. >> >> I'm sending V2 that can survive bootstrap and regression tests on both x86_64 and ppc64le. >> >> Martin > > diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c > b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c > index d82e2232d7b..91e1c87f83f 100644 > --- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c > +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c > @@ -62,7 +62,7 @@ main_test (void) > mempcpy (p + 5, s3, 1); > if (memcmp (p, "ABCDEFg", 8)) > abort (); > - mempcpy (p + 6, s1 + 1, l1); > + memcpy (p + 6, s1 + 1, l1); > if (memcmp (p, "ABCDEF2", 8)) > abort (); > } > > This is a mempcpy test. Why is mempcpy changed to memcpy? > Because this mempcpy is not optimized out to memcpy and it then aborts. It's proper to leave here mempcpy I believe. Martin ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-14 13:10 ` Martin Liška @ 2018-03-14 13:43 ` Jakub Jelinek 0 siblings, 0 replies; 45+ messages in thread From: Jakub Jelinek @ 2018-03-14 13:43 UTC (permalink / raw) To: Martin Liška; +Cc: H.J. Lu, GCC Patches, Marc Glisse On Wed, Mar 14, 2018 at 02:08:07PM +0100, Martin LiÅ¡ka wrote: > > diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c > > b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c > > index d82e2232d7b..91e1c87f83f 100644 > > --- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c > > +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c > > @@ -62,7 +62,7 @@ main_test (void) > > mempcpy (p + 5, s3, 1); > > if (memcmp (p, "ABCDEFg", 8)) > > abort (); > > - mempcpy (p + 6, s1 + 1, l1); > > + memcpy (p + 6, s1 + 1, l1); > > if (memcmp (p, "ABCDEF2", 8)) > > abort (); > > } > > > > This is a mempcpy test. Why is mempcpy changed to memcpy? > > > > Because this mempcpy is not optimized out to memcpy and it then aborts. Why it isn't optimized to memcpy? If the lhs is unused, it always should be folded to memcpy, regardless of whether mempcpy is fast or not (I assume no target has slow memcpy and fast mempcpy). Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-12 8:57 [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657) Martin Liška 2018-03-12 9:40 ` Marc Glisse @ 2018-03-12 10:08 ` Richard Biener 2018-03-12 13:35 ` Jakub Jelinek 2 siblings, 0 replies; 45+ messages in thread From: Richard Biener @ 2018-03-12 10:08 UTC (permalink / raw) To: Martin Liška; +Cc: GCC Patches, Jakub Jelinek, H.J. Lu On Mon, Mar 12, 2018 at 9:57 AM, Martin Liška <mliska@suse.cz> wrote: > Hi. > > This is fix for the PR that introduces a new target macro. Using the macro Don't add new target macros, use target hooks. > one can say that a target has a fast mempcpy and thus it's preferred to be used > if possible. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > I also tested on x86_64-linux-gnu. > > Ready to be installed? > Martin > > gcc/ChangeLog: > > 2018-03-08 Martin Liska <mliska@suse.cz> > > PR middle-end/81657 > * builtins.c (expand_builtin_memory_copy_args): Add new > arguments. > * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): > New macro. > * defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise. > * doc/tm.texi: Likewise. > * doc/tm.texi.in: Likewise. > * expr.c (compare_by_pieces): Add support for bail out. > (emit_block_move_hints): Likewise. > * expr.h (emit_block_move_hints): Add new arguments. > > gcc/testsuite/ChangeLog: > > 2018-03-09 Martin Liska <mliska@suse.cz> > > PR middle-end/81657 > * gcc.dg/string-opt-1.c: Adjust test to run only on non-x86 > target. > --- > gcc/builtins.c | 13 ++++++++++++- > gcc/config/i386/i386.h | 3 +++ > gcc/defaults.h | 7 +++++++ > gcc/doc/tm.texi | 5 +++++ > gcc/doc/tm.texi.in | 5 +++++ > gcc/expr.c | 16 +++++++++++++++- > gcc/expr.h | 4 +++- > gcc/testsuite/gcc.dg/string-opt-1.c | 4 ++-- > 8 files changed, 52 insertions(+), 5 deletions(-) > > ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-03-12 8:57 [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657) Martin Liška 2018-03-12 9:40 ` Marc Glisse 2018-03-12 10:08 ` Richard Biener @ 2018-03-12 13:35 ` Jakub Jelinek 2 siblings, 0 replies; 45+ messages in thread From: Jakub Jelinek @ 2018-03-12 13:35 UTC (permalink / raw) To: Martin Liška; +Cc: gcc-patches, H.J. Lu On Mon, Mar 12, 2018 at 09:57:09AM +0100, Martin LiÅ¡ka wrote: > Hi. > > This is fix for the PR that introduces a new target macro. Using the macro > one can say that a target has a fast mempcpy and thus it's preferred to be used > if possible. > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. > I also tested on x86_64-linux-gnu. I think we shouldn't be adding new target macros, better would be a target hook that returns this information. Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). @ 2018-04-12 15:53 Wilco Dijkstra 2018-04-12 16:11 ` H.J. Lu 2018-04-12 16:35 ` Jakub Jelinek 0 siblings, 2 replies; 45+ messages in thread From: Wilco Dijkstra @ 2018-04-12 15:53 UTC (permalink / raw) To: Jakub Jelinek, Richard Biener Cc: nd, mliska, ubizjak, GCC Patches, marc.glisse, H.J. Lu, Jan Hubicka Jakub Jelinek wrote: > On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote: >> Not sure if I missed some important part of the discussion but >> for the testcase we want to preserve the tailcall, right? So >> it would be enough to set avoid_libcall to >> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle >> stpcpy)? The tailcall issue is just a distraction. Historically the handling of mempcpy has been horribly inefficient in both GCC and GLIBC for practically all targets. This is why it was decided to defer to memcpy. For example small constant mempcpy was not expanded inline like memcpy until PR70140 was fixed. Except for a few targets which have added an optimized mempcpy, the default mempcpy implementation in almost all released GLIBCs is much slower than memcpy (due to using a badly written C implementation). Recent GLIBCs now call the optimized memcpy - this is better but still adds extra call/return overheads. So to improve that the GLIBC headers have an inline that changes any call to mempcpy into memcpy (this is the default but can be disabled on a per-target basis). Obviously it is best to do this optimization in GCC, which is what we finally do in GCC8. Inlining mempcpy means you sometimes miss a tailcall, but this is not common - in all of GLIBC the inlining on AArch64 adds 166 extra instructions and 12 callee-save registers. This is a small codesize cost to avoid the overhead of calling the generic C version. > My preference would be to have non-lame mempcpy etc. on all targets, but the > aarch64 folks disagree. The question is who is going to write the 30+ mempcpy implementations for all those targets which don't have one? And who says doing this is actually going to improve performance? Having mempcpy+memcpy typically means more Icache misses in code that uses both. So generally it's a good idea to change mempcpy into memcpy by default. It's not slower than calling mempcpy even if you have a fast implementation, it's faster if you use an up to date GLIBC which calls memcpy, and it's significantly better when using an old GLIBC. Wilco ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-12 15:53 Wilco Dijkstra @ 2018-04-12 16:11 ` H.J. Lu 2018-04-12 16:35 ` Jakub Jelinek 1 sibling, 0 replies; 45+ messages in thread From: H.J. Lu @ 2018-04-12 16:11 UTC (permalink / raw) To: Wilco Dijkstra Cc: Jakub Jelinek, Richard Biener, nd, mliska, ubizjak, GCC Patches, marc.glisse, Jan Hubicka On Thu, Apr 12, 2018 at 8:53 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote: > So generally it's a good idea to change mempcpy into memcpy by default. It's > not slower than calling mempcpy even if you have a fast implementation, it's faster > if you use an up to date GLIBC which calls memcpy, and it's significantly better > when using an old GLIBC. > It is a BAD idea for x86. We don't want to turn mempcpy to to memcpy on x86. PERIOD. -- H.J. ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-12 15:53 Wilco Dijkstra 2018-04-12 16:11 ` H.J. Lu @ 2018-04-12 16:35 ` Jakub Jelinek 2018-04-12 16:30 ` Wilco Dijkstra 1 sibling, 1 reply; 45+ messages in thread From: Jakub Jelinek @ 2018-04-12 16:35 UTC (permalink / raw) To: Wilco Dijkstra Cc: Richard Biener, nd, mliska, ubizjak, GCC Patches, marc.glisse, H.J. Lu, Jan Hubicka On Thu, Apr 12, 2018 at 03:53:13PM +0000, Wilco Dijkstra wrote: > Jakub Jelinek wrote: > > On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote: > >> Not sure if I missed some important part of the discussion but > >> for the testcase we want to preserve the tailcall, right? So > >> it would be enough to set avoid_libcall to > >> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle > >> stpcpy)? > > The tailcall issue is just a distraction. Historically the handling of mempcpy > has been horribly inefficient in both GCC and GLIBC for practically all targets. > This is why it was decided to defer to memcpy. I guess we need to agree to disagree. But we have a P1 PR that we need to resolve and it is one of the last 6 blockers we have. I'm not suggesting to revert PR70140, just let use mempcpy libcall if it is what the user wrote and we aren't expanding it inline. > So generally it's a good idea to change mempcpy into memcpy by default. It's No. > not slower than calling mempcpy even if you have a fast implementation, it's faster > if you use an up to date GLIBC which calls memcpy, and it's significantly better > when using an old GLIBC. mempcpy is quite good on many targets even in old GLIBCs. Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-12 16:35 ` Jakub Jelinek @ 2018-04-12 16:30 ` Wilco Dijkstra 2018-04-12 17:35 ` Jakub Jelinek 0 siblings, 1 reply; 45+ messages in thread From: Wilco Dijkstra @ 2018-04-12 16:30 UTC (permalink / raw) To: Jakub Jelinek Cc: Richard Biener, nd, mliska, ubizjak, GCC Patches, marc.glisse, H.J. Lu, Jan Hubicka Jakub Jelinek wrote: > On Thu, Apr 12, 2018 at 03:53:13PM +0000, Wilco Dijkstra wrote: >> The tailcall issue is just a distraction. Historically the handling of mempcpy >> has been horribly inefficient in both GCC and GLIBC for practically all targets. >> This is why it was decided to defer to memcpy. > > I guess we need to agree to disagree. But we have a P1 PR that we need to > resolve and it is one of the last 6 blockers we have. I'm not suggesting to > revert PR70140, just let use mempcpy libcall if it is what the user wrote and > we aren't expanding it inline. Frankly I don't see why it is a P1 regression. Do you have a benchmark that regresses significantly (a few percent, not by a few bytes)? I already showed the AArch64 results for GLIBC, do you have x86 results that prove things are much worse? >> So generally it's a good idea to change mempcpy into memcpy by default. It's >> not slower than calling mempcpy even if you have a fast implementation, it's faster >> if you use an up to date GLIBC which calls memcpy, and it's significantly better >> when using an old GLIBC. > > mempcpy is quite good on many targets even in old GLIBCs. Only true if with "many" you mean x86, x86_64 and IIRC sparc. Wilco ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-12 16:30 ` Wilco Dijkstra @ 2018-04-12 17:35 ` Jakub Jelinek 2018-04-12 17:29 ` Wilco Dijkstra 0 siblings, 1 reply; 45+ messages in thread From: Jakub Jelinek @ 2018-04-12 17:35 UTC (permalink / raw) To: Wilco Dijkstra Cc: Richard Biener, nd, mliska, ubizjak, GCC Patches, marc.glisse, H.J. Lu, Jan Hubicka On Thu, Apr 12, 2018 at 04:30:07PM +0000, Wilco Dijkstra wrote: > Jakub Jelinek wrote: > > On Thu, Apr 12, 2018 at 03:53:13PM +0000, Wilco Dijkstra wrote: > > >> The tailcall issue is just a distraction. Historically the handling of mempcpy > >> has been horribly inefficient in both GCC and GLIBC for practically all targets. > >> This is why it was decided to defer to memcpy. > > > > I guess we need to agree to disagree. But we have a P1 PR that we need to > > resolve and it is one of the last 6 blockers we have. I'm not suggesting to > > revert PR70140, just let use mempcpy libcall if it is what the user wrote and > > we aren't expanding it inline. > > Frankly I don't see why it is a P1 regression. Do you have a benchmark that That is how regression priorities are defined. > >> So generally it's a good idea to change mempcpy into memcpy by default. It's > >> not slower than calling mempcpy even if you have a fast implementation, it's faster > >> if you use an up to date GLIBC which calls memcpy, and it's significantly better > >> when using an old GLIBC. > > > > mempcpy is quite good on many targets even in old GLIBCs. > > Only true if with "many" you mean x86, x86_64 and IIRC sparc. Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got added, in 2013 other power versions, in 2016 s390*, etc. Doing a decent mempcpy isn't hard if you have asm version of memcpy and one spare register. Jakub ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). 2018-04-12 17:35 ` Jakub Jelinek @ 2018-04-12 17:29 ` Wilco Dijkstra 0 siblings, 0 replies; 45+ messages in thread From: Wilco Dijkstra @ 2018-04-12 17:29 UTC (permalink / raw) To: Jakub Jelinek Cc: Richard Biener, nd, mliska, ubizjak, GCC Patches, marc.glisse, H.J. Lu, Jan Hubicka Jakub Jelinek wrote: >On Thu, Apr 12, 2018 at 04:30:07PM +0000, Wilco Dijkstra wrote: >> Jakub Jelinek wrote: >> Frankly I don't see why it is a P1 regression. Do you have a benchmark that > >That is how regression priorities are defined. How can one justify considering this a release blocker without hard numbers? If this is a 1% regression on a large body of code it would be very serious, if 0.01% - not so much. >> >> So generally it's a good idea to change mempcpy into memcpy by default. It's >> >> not slower than calling mempcpy even if you have a fast implementation, it's faster >> >> if you use an up to date GLIBC which calls memcpy, and it's significantly better >> >> when using an old GLIBC. >> > >> > mempcpy is quite good on many targets even in old GLIBCs. >> >> Only true if with "many" you mean x86, x86_64 and IIRC sparc. > > Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got added, > in 2013 other power versions, in 2016 s390*, etc. Doing a decent mempcpy > isn't hard if you have asm version of memcpy and one spare register. More mempcpy implementations have been added in recent years indeed, but almost all add an extra copy of the memcpy code rather than using a single combined implementation. That means it is still better to call memcpy (which is frequently used and thus likely in L1/L2) rather than mempcpy (which is more likely to be cold and thus not cached). Wilco ^ permalink raw reply [flat|nested] 45+ messages in thread
end of thread, other threads:[~2018-07-05 19:34 UTC | newest] Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-03-12 8:57 [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657) Martin Liška 2018-03-12 9:40 ` Marc Glisse 2018-03-13 8:25 ` Martin Liška 2018-03-13 8:35 ` Jakub Jelinek 2018-03-13 9:22 ` Richard Biener 2018-03-13 15:19 ` Martin Liška 2018-03-13 20:35 ` Jakub Jelinek 2018-03-14 12:57 ` Martin Liška 2018-03-14 13:08 ` Jakub Jelinek 2018-03-14 14:04 ` Martin Liška 2018-03-18 22:18 ` Martin Liška 2018-03-21 10:37 ` Jakub Jelinek 2018-03-28 14:29 ` Martin Liška 2018-03-28 14:35 ` Jakub Jelinek 2018-03-28 17:59 ` Martin Liška 2018-03-28 18:35 ` Jakub Jelinek 2018-03-29 12:14 ` Martin Liška 2018-03-29 12:31 ` H.J. Lu 2018-03-29 12:43 ` Jakub Jelinek 2018-03-29 12:35 ` Martin Liška 2018-04-04 10:13 ` Martin Liška 2018-04-09 12:31 ` Martin Liška 2018-04-10 10:35 ` Jakub Jelinek 2018-04-10 12:28 ` Martin Liška 2018-04-12 13:22 ` Martin Liška 2018-04-12 13:42 ` Jan Hubicka 2018-04-12 13:52 ` Richard Biener 2018-04-12 14:35 ` Jakub Jelinek 2018-04-12 14:19 ` Richard Biener 2018-04-12 14:35 ` Jakub Jelinek 2018-04-12 15:17 ` Richard Biener 2018-04-12 17:35 ` Jakub Jelinek 2018-04-12 16:03 ` Richard Biener 2018-07-05 19:34 ` Jeff Law 2018-03-14 13:08 ` H.J. Lu 2018-03-14 13:10 ` Martin Liška 2018-03-14 13:43 ` Jakub Jelinek 2018-03-12 10:08 ` Richard Biener 2018-03-12 13:35 ` Jakub Jelinek 2018-04-12 15:53 Wilco Dijkstra 2018-04-12 16:11 ` H.J. Lu 2018-04-12 16:35 ` Jakub Jelinek 2018-04-12 16:30 ` Wilco Dijkstra 2018-04-12 17:35 ` Jakub Jelinek 2018-04-12 17:29 ` Wilco Dijkstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).