public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "H.J. Lu" <hjl.tools@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
	Richard Earnshaw <rearnsha@arm.com>
Subject: Re: [PATCH v2] Add TARGET_MOVE_WITH_MODE_P
Date: Tue, 8 Mar 2022 07:43:29 -0800	[thread overview]
Message-ID: <CAMe9rOrDfyVmWCudTT6dF7FQb4iDTsg1SjtkVp-A7oSmAZBSSg@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc3ms8QgC5u13ZGuBo2ByjQhhSPn3a8MvnuGrobCn0gePw@mail.gmail.com>

On Mon, Mar 7, 2022 at 5:45 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Mar 2, 2022 at 10:18 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Wed, Mar 02, 2022 at 09:51:26AM +0100, Richard Biener wrote:
> > > On Tue, Mar 1, 2022 at 11:41 PM H.J. Lu via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > Add TARGET_FOLD_MEMCPY_MAX for the maximum number of bytes to fold memcpy.
> > > > The default is
> > > >
> > > > MOVE_MAX * MOVE_RATIO (optimize_function_for_size_p (cfun))
> > > >
> > > > For x86, it is MOVE_MAX to restore the old behavior before
> > >
> > > I know we've discussed this to death in the PR, I just want to repeat here
> > > that the GIMPLE folding expects to generate a single load and a single
> > > store (that is what it does on the GIMPLE level) which is why MOVE_MAX
> > > was chosen originally (it's documented to what a "single instruction" does).
> > > In practice MOVE_MAX does not seem to cover vector register sizes
> > > so Richard pulled MOVE_RATIO which is really intended to cover
> > > the case of using multiple instructions for moving memory (but then I
> > > don't remember whether for the ARM case the single load/store GIMPLE
> > > will be expanded to multiple load/store instructions).
> > >
> > > TARGET_FOLD_MEMCPY_MAX sounds like a stop-gap solution,
> > > being very specific for memcpy folding (we also fold memmove btw).
> > >
> > > There is also MOVE_MAX_PIECES which _might_ be more appropriate
> > > than MOVE_MAX here and still honor the idea of single instructions.
> > > Now neither arm nor aarch64 define this and it defaults to MOVE_MAX,
> > > not MOVE_MAX * MOVE_RATIO.
> > >
> > > So if we need a new hook then that hook should at least get the
> > > 'speed' argument of MOVE_RATIO and it should get a better name.
> > >
> > > I still think that it should be possible to improve the insn check to
> > > avoid use of "disabled" modes, maybe that's also a point to add
> > > a new hook like .move_with_mode_p or so?  To quote, we do
> >
> > Here is the v2 patch to add TARGET_MOVE_WITH_MODE_P.
>
> Again I'd like to shine light on MOVE_MAX_PIECES which explicitely
> mentions "a load or store used TO COPY MEMORY" (emphasis mine)
> and whose x86 implementation would already be fine (doing larger moves
> and also not doing too large moves).  But appearantly the arm folks
> decided that that's not fit and instead (mis-?)used MOVE_MAX * MOVE_RATIO.
>
> Yes, MOVE_MAX_PIECES is documented to apply to move_by_pieces.
> Still GIMPLE memcpy/memmove inlining wants to mimic exactly that but
> restrict itself to a single load and a single store.
>
> > >
> > >               scalar_int_mode mode;
> > >               if (int_mode_for_size (ilen * 8, 0).exists (&mode)
> > >                   && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
> > >                   && have_insn_for (SET, mode)
> > >                   /* If the destination pointer is not aligned we must be able
> > >                      to emit an unaligned store.  */
> > >                   && (dest_align >= GET_MODE_ALIGNMENT (mode)
> > >                       || !targetm.slow_unaligned_access (mode, dest_align)
> > >                       || (optab_handler (movmisalign_optab, mode)
> > >                           != CODE_FOR_nothing)))
> > >
> > > where I understand the ISA is enabled and if the user explicitely
> > > uses it that's OK but -mprefer-avx128 should tell GCC to never
> > > generate AVX256 code where the user was not explicitely using it
> > > (still for example glibc might happily use AVX256 code to implement
> > > the memcpy we are folding!)
> > >
> > > Note the BB vectorizer also might end up with using AVX256 because
> > > in places it also relies on optab queries and the vector_mode_supported_p
> > > check (but the memcpy folding uses the fake integer modes).  So
> > > x86 might need to implement the related_mode hook to avoid "auto"-using
> > > a larger vector mode which the default implementation would happily do.
> > >
> > > Richard.
> >
> > OK for master?
>
> Looking for opinions from others as well.
>
> Btw, there's a similar use in expand_DEFERRED_INIT:
>
>           && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
>                                 0).exists (&var_mode)
>           && have_insn_for (SET, var_mode))
>
> So it occured to me that maybe targetm.move_with_mode_p should eventually

TARGET_MOVE_WITH_MODE_P shouldn't be used here since we do want to
use var_mode.

> check have_insn_for (SET, var_mode) or we should abstract checking the two
> things to a generic API somewhere (in optabs-query.h maybe, or expmed.h,
> not sure where it would be more appropriate).
>
> > +@deftypefn {Target Hook} bool TARGET_MOVE_WITH_MODE_P (machine_mode @var{mode})
> > +This target hook returns true if move with mode @var{mode} can be
> > +generated implicitly.  The default definition returns true.
> > +@end deftypefn
>
> I know what you mean but I'm not sure "can be generated implicitly" captures
> that.  The compiler always generated stuff explicitely.  It's also
> "should", not "can", no?

TARGET_MOVE_WITH_MODE_P is a target hook for compiler optimization.
It is totally OK for TARGET_MOVE_WITH_MODE_P to return false.   Will
TARGET_FAST_MOVE_WITH_MODE_P be better?

> Thanks,
> Richard.
>
> > Thanks.
> >
> > H.J.
> > ---
> > Add TARGET_MOVE_WITH_MODE_P to return true if move with mode can be
> > generated implicitly.  The default definition returns true.  The x86
> > version returns true if the mode size <= MOVE_MAX, which is the max
> > number of bytes we can move in one reasonably fast instruction.
> >
> > gcc/
> >
> >         PR target/103393
> >         * gimple-fold.cc (gimple_fold_builtin_memory_op): Call
> >         targetm.move_with_mode_p to check if move with mode can be
> >         generated implicitly.
> >         * target.def: Add move_with_mode_p.
> >         * targhooks.cc (default_move_with_mode_p): New.
> >         * targhooks.h (default_move_with_mode_p): Likewise.
> >         * config/i386/i386.cc (ix86_move_with_mode_p): New.
> >         (TARGET_MOVE_WITH_MODE_P): Likewise.
> >         * doc/tm.texi.in: Add TARGET_MOVE_WITH_MODE_P.
> >         * doc/tm.texi: Regenerate.
> >
> > gcc/testsuite/
> >
> >         PR target/103393
> >         * gcc.target/i386/pr103393-1.c: New test.
> >         * gcc.target/i386/pr103393-2.c: Likewise.
> >         * gcc.target/i386/pr103393-3.c: Likewise.
> >         * gcc.target/i386/pr103393-4.c: Likewise.
> >         * gcc.target/i386/pr103393-5.c: Likewise.
> > ---
> >  gcc/config/i386/i386.cc                    | 12 ++++++++++++
> >  gcc/doc/tm.texi                            |  5 +++++
> >  gcc/doc/tm.texi.in                         |  2 ++
> >  gcc/gimple-fold.cc                         |  1 +
> >  gcc/target.def                             |  7 +++++++
> >  gcc/testsuite/gcc.target/i386/pr103393-1.c | 16 ++++++++++++++++
> >  gcc/testsuite/gcc.target/i386/pr103393-2.c | 16 ++++++++++++++++
> >  gcc/testsuite/gcc.target/i386/pr103393-3.c | 16 ++++++++++++++++
> >  gcc/testsuite/gcc.target/i386/pr103393-4.c | 16 ++++++++++++++++
> >  gcc/testsuite/gcc.target/i386/pr103393-5.c | 16 ++++++++++++++++
> >  10 files changed, 107 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103393-5.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index b2bf90576d5..68a2c59220c 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -23918,6 +23918,15 @@ ix86_push_rounding (poly_int64 bytes)
> >    return ROUND_UP (bytes, UNITS_PER_WORD);
> >  }
> >
> > +/* Implement the TARGET_MOVE_WITH_MODE_P hook.  Return true if move
> > +   with MODE can be generated implicitly.  */
> > +
> > +static bool
> > +ix86_move_with_mode_p (machine_mode mode)
> > +{
> > +  return GET_MODE_SIZE (mode) <= MOVE_MAX;
> > +}
> > +
> >  /* Target-specific selftests.  */
> >
> >  #if CHECKING_P
> > @@ -24735,6 +24744,9 @@ static bool ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
> >  #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
> >  #endif /* #if CHECKING_P */
> >
> > +#undef TARGET_MOVE_WITH_MODE_P
> > +#define TARGET_MOVE_WITH_MODE_P ix86_move_with_mode_p
> > +
> >  struct gcc_target targetm = TARGET_INITIALIZER;
> >
> >  #include "gt-i386.h"
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > index 49864dd79f8..9d5058610e1 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -11924,6 +11924,11 @@ statement holding the function call.  Returns true if any change
> >  was made to the GIMPLE stream.
> >  @end deftypefn
> >
> > +@deftypefn {Target Hook} bool TARGET_MOVE_WITH_MODE_P (machine_mode @var{mode})
> > +This target hook returns true if move with mode @var{mode} can be
> > +generated implicitly.  The default definition returns true.
> > +@end deftypefn
> > +
> >  @deftypefn {Target Hook} int TARGET_COMPARE_VERSION_PRIORITY (tree @var{decl1}, tree @var{decl2})
> >  This hook is used to compare the target attributes in two functions to
> >  determine which function's features get higher priority.  This is used
> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > index 95e5e341f07..e9158ab90c6 100644
> > --- a/gcc/doc/tm.texi.in
> > +++ b/gcc/doc/tm.texi.in
> > @@ -7924,6 +7924,8 @@ to by @var{ce_info}.
> >
> >  @hook TARGET_GIMPLE_FOLD_BUILTIN
> >
> > +@hook TARGET_MOVE_WITH_MODE_P
> > +
> >  @hook TARGET_COMPARE_VERSION_PRIORITY
> >
> >  @hook TARGET_GET_FUNCTION_VERSIONS_DISPATCHER
> > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> > index c9179abb27e..93267eeabb2 100644
> > --- a/gcc/gimple-fold.cc
> > +++ b/gcc/gimple-fold.cc
> > @@ -1007,6 +1007,7 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
> >               if (int_mode_for_size (ilen * 8, 0).exists (&mode)
> >                   && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
> >                   && have_insn_for (SET, mode)
> > +                 && targetm.move_with_mode_p (mode)
> >                   /* If the destination pointer is not aligned we must be able
> >                      to emit an unaligned store.  */
> >                   && (dest_align >= GET_MODE_ALIGNMENT (mode)
> > diff --git a/gcc/target.def b/gcc/target.def
> > index 72c2e1ef756..041d944cb38 100644
> > --- a/gcc/target.def
> > +++ b/gcc/target.def
> > @@ -2489,6 +2489,13 @@ was made to the GIMPLE stream.",
> >   bool, (gimple_stmt_iterator *gsi),
> >   hook_bool_gsiptr_false)
> >
> > +DEFHOOK
> > +(move_with_mode_p,
> > + "This target hook returns true if move with mode @var{mode} can be\n\
> > +generated implicitly.  The default definition returns true.",
> > + bool, (machine_mode mode),
> > + hook_bool_mode_true)
> > +
> >  /* Target hook is used to compare the target attributes in two functions to
> >     determine which function's features get higher priority.  This is used
> >     during function multi-versioning to figure out the order in which two
> > diff --git a/gcc/testsuite/gcc.target/i386/pr103393-1.c b/gcc/testsuite/gcc.target/i386/pr103393-1.c
> > new file mode 100644
> > index 00000000000..2091d33c45f
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr103393-1.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -march=x86-64 -mavx -mprefer-avx128 -mprefer-vector-width=128" } */
> > +
> > +struct TestData {
> > +  float arr[8];
> > +};
> > +
> > +void
> > +cpy (struct TestData *s1, struct TestData *s2 )
> > +{
> > +  for(int i=0; i<8; ++i)
> > +    s1->arr[i] = s2->arr[i];
> > +}
> > +
> > +/* { dg-final { scan-assembler "jmp\[\\t \]+_?memmove" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "call\[\\t \]+_?memmove" { target ia32 } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr103393-2.c b/gcc/testsuite/gcc.target/i386/pr103393-2.c
> > new file mode 100644
> > index 00000000000..4ad8c8bf379
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr103393-2.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -march=skylake-avx512" } */
> > +
> > +struct TestData {
> > +  float arr[8];
> > +};
> > +
> > +void
> > +cpy (struct TestData *s1, struct TestData *s2 )
> > +{
> > +  for(int i=0; i<16; ++i)
> > +    s1->arr[i] = s2->arr[i];
> > +}
> > +
> > +/* { dg-final { scan-assembler "jmp\[\\t \]+_?memmove" { target { ! ia32 } } } } */
> > +/* { dg-final { scan-assembler "call\[\\t \]+_?memmove" { target ia32 } } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr103393-3.c b/gcc/testsuite/gcc.target/i386/pr103393-3.c
> > new file mode 100644
> > index 00000000000..7a03160e512
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr103393-3.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -march=sapphirerapids" } */
> > +
> > +struct TestData {
> > +  float arr[8];
> > +};
> > +
> > +void
> > +cpy (struct TestData *s1, struct TestData *s2 )
> > +{
> > +  for(int i=0; i<16; ++i)
> > +    s1->arr[i] = s2->arr[i];
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 2 } } */
> > +/* { dg-final { scan-assembler-not "\[\\t \]+_?memmove" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr103393-4.c b/gcc/testsuite/gcc.target/i386/pr103393-4.c
> > new file mode 100644
> > index 00000000000..ae2599f6411
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr103393-4.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -march=skylake-avx512 -mstore-max=512" } */
> > +
> > +struct TestData {
> > +  float arr[8];
> > +};
> > +
> > +void
> > +cpy (struct TestData *s1, struct TestData *s2 )
> > +{
> > +  for(int i=0; i<16; ++i)
> > +    s1->arr[i] = s2->arr[i];
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 2 } } */
> > +/* { dg-final { scan-assembler-not "\[\\t \]+_?memmove" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr103393-5.c b/gcc/testsuite/gcc.target/i386/pr103393-5.c
> > new file mode 100644
> > index 00000000000..f9173684212
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr103393-5.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -march=skylake-avx512 -mmove-max=512" } */
> > +
> > +struct TestData {
> > +  float arr[8];
> > +};
> > +
> > +void
> > +cpy (struct TestData *s1, struct TestData *s2 )
> > +{
> > +  for(int i=0; i<16; ++i)
> > +    s1->arr[i] = s2->arr[i];
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 2 } } */
> > +/* { dg-final { scan-assembler-not "\[\\t \]+_?memmove" } } */
> > --
> > 2.35.1
> >



-- 
H.J.

  reply	other threads:[~2022-03-08 15:44 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-01 22:41 [PATCH] Add TARGET_FOLD_MEMCPY_MAX H.J. Lu
2022-03-02  8:51 ` Richard Biener
2022-03-02 21:18   ` [PATCH v2] Add TARGET_MOVE_WITH_MODE_P H.J. Lu
2022-03-07 13:45     ` Richard Biener
2022-03-08 15:43       ` H.J. Lu [this message]
2022-03-09  8:25         ` Richard Biener
2022-03-09 18:07           ` H.J. Lu
2022-03-10  8:20             ` Richard Biener
2022-03-09 18:04       ` Richard Sandiford
2022-03-10  8:06         ` Richard Biener
2022-03-14 15:44           ` Richard Sandiford
2022-03-22  2:50             ` H.J. Lu
2022-03-22  8:20               ` Richard Biener
2022-03-23 13:58                 ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMe9rOrDfyVmWCudTT6dF7FQb4iDTsg1SjtkVp-A7oSmAZBSSg@mail.gmail.com \
    --to=hjl.tools@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rearnsha@arm.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).