Re: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: David Edelsohn <dje.gcc@gmail.com>
To: Michael Meissner <meissner@linux.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
	Bill Schmidt <wschmidt@linux.ibm.com>,
	 Peter Bergner <bergner@linux.ibm.com>,
	will schmidt <will_schmidt@vnet.ibm.com>
Subject: Re: [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants
Date: Tue, 14 Dec 2021 12:00:08 -0500	[thread overview]
Message-ID: <CAGWvnyk9JqazrHx4O5OW3dxsykQe-Ywsbr2D4NQA5a+aiHku-Q@mail.gmail.com> (raw)
In-Reply-To: <290083a37613e0cbf62e33fd76e4eb8e2039bca5.camel@vnet.ibm.com>

On Fri, Nov 5, 2021 at 3:24 PM will schmidt <will_schmidt@vnet.ibm.com> wrote:
>
> On Fri, 2021-11-05 at 00:10 -0400, Michael Meissner wrote:
> > Generate XXSPLTIDP for vectors on power10.
> >
> > This patch implements XXSPLTIDP support for all vector constants.  The
> > XXSPLTIDP instruction is given a 32-bit immediate that is converted to a vector
> > of two DFmode constants.  The immediate is in SFmode format, so only constants
> > that fit as SFmode values can be loaded with XXSPLTIDP.
> >
> > The constraint (eP) added in the previous patch for XXSPLTIW is also used
> > for XXSPLTIDP.
> >
>
> ok
>
>
> > DImode scalar constants are not handled.  This is due to the majority of DImode
> > constants will be in the GPR registers.  With vector registers, you have the
> > problem that XXSPLTIDP splats the double word into both elements of the
> > vector.  However, if TImode is loaded with an integer constant, it wants a full
> > 128-bit constant.
>
> This may be worth as adding to a todo somewhere in the code.
>
> >
> > SFmode and DFmode scalar constants are not handled in this patch.  The
> > support for for those constants will be in the next patch.
>
> ok
>
> >
> > I have added a temporary switch (-msplat-float-constant) to control whether or
> > not the XXSPLTIDP instruction is generated.
> >
> > I added 2 new tests to test loading up V2DI and V2DF vector constants.
>
>
>
>
> >
> > 2021-11-05  Michael Meissner  <meissner@the-meissners.org>
> >
> > gcc/
> >
> >       * config/rs6000/predicates.md (easy_fp_constant): Add support for
> >       generating XXSPLTIDP.
> >       (vsx_prefixed_constant): Likewise.
> >       (easy_vector_constant): Likewise.
> >       * config/rs6000/rs6000-protos.h (constant_generates_xxspltidp):
> >       New declaration.
> >       * config/rs6000/rs6000.c (output_vec_const_move): Add support for
> >       generating XXSPLTIDP.
> >       (prefixed_xxsplti_p): Likewise.
> >       (constant_generates_xxspltidp): New function.
> >       * config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.
> >
> > gcc/testsuite/
> >
> >       * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
> >       regex for power10.
> >       * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
> >       * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
> > ---
>
>
> ok
>
> >  gcc/config/rs6000/predicates.md               |   9 ++
> >  gcc/config/rs6000/rs6000-protos.h             |   1 +
> >  gcc/config/rs6000/rs6000.c                    | 108 ++++++++++++++++++
> >  gcc/config/rs6000/rs6000.opt                  |   4 +
> >  .../powerpc/pr86731-fwrapv-longlong.c         |   9 +-
> >  .../powerpc/vec-splat-constant-v2df.c         |  64 +++++++++++
> >  .../powerpc/vec-splat-constant-v2di.c         |  50 ++++++++
> >  7 files changed, 241 insertions(+), 4 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
> >
> > diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> > index ed6252bd0c4..d748b11857c 100644
> > --- a/gcc/config/rs6000/predicates.md
> > +++ b/gcc/config/rs6000/predicates.md
> > @@ -610,6 +610,9 @@ (define_predicate "easy_fp_constant"
> >
> >        if (constant_generates_xxspltiw (&vsx_const))
> >       return true;
> > +
> > +      if (constant_generates_xxspltidp (&vsx_const))
> > +     return true;
> >      }
> >
> >    /* Otherwise consider floating point constants hard, so that the
> > @@ -653,6 +656,9 @@ (define_predicate "vsx_prefixed_constant"
> >    if (constant_generates_xxspltiw (&vsx_const))
> >      return true;
> >
> > +  if (constant_generates_xxspltidp (&vsx_const))
> > +    return true;
> > +
> >    return false;
> >  })
> >
> > @@ -727,6 +733,9 @@ (define_predicate "easy_vector_constant"
> >
> >         if (constant_generates_xxspltiw (&vsx_const))
> >           return true;
> > +
> > +       if (constant_generates_xxspltidp (&vsx_const))
> > +         return true;
> >       }
>
>
> ok
>
> >
> >        if (TARGET_P9_VECTOR
> > diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> > index 99c6a671289..2d28df7442d 100644
> > --- a/gcc/config/rs6000/rs6000-protos.h
> > +++ b/gcc/config/rs6000/rs6000-protos.h
> > @@ -253,6 +253,7 @@ extern bool vec_const_128bit_to_bytes (rtx, machine_mode,
> >                                      vec_const_128bit_type *);
> >  extern unsigned constant_generates_lxvkq (vec_const_128bit_type *);
> >  extern unsigned constant_generates_xxspltiw (vec_const_128bit_type *);
> > +extern unsigned constant_generates_xxspltidp (vec_const_128bit_type *);
> >  #endif /* RTX_CODE */
> >
> >  #ifdef TREE_CODE
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index be24f56eb31..8fde48cf2b3 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -7012,6 +7012,13 @@ output_vec_const_move (rtx *operands)
> >             operands[2] = GEN_INT (imm);
> >             return "xxspltiw %x0,%2";
> >           }
> > +
> > +       imm = constant_generates_xxspltidp (&vsx_const);
> > +       if (imm)
>
>
> Just a nit that the two lines could be combined into a similar form
> as used elsewhere as ...
>         if (constant_generates_xxspltidp(&vsx_const))
>
>
> > +         {
> > +           operands[2] = GEN_INT (imm);
> > +           return "xxspltidp %x0,%2";
> > +         }
>
> >       }
> >
> >        if (TARGET_P9_VECTOR
> > @@ -26809,6 +26816,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
> >      {
> >        if (constant_generates_xxspltiw (&vsx_const))
> >       return true;
> > +
> > +      if (constant_generates_xxspltidp (&vsx_const))
> > +     return true;
> >      }
> >
> >    return false;
> > @@ -29014,6 +29024,104 @@ constant_generates_xxspltiw (vec_const_128bit_type *vsx_const)
> >    return vsx_const->words[0];
> >  }
> >
> > +/* Determine if a vector constant can be loaded with XXSPLTIDP.  Return zero if
> > +   the XXSPLTIDP instruction cannot be used.  Otherwise return the immediate
> > +   value to be used with the XXSPLTIDP instruction.  */
> > +
> > +unsigned
> > +constant_generates_xxspltidp (vec_const_128bit_type *vsx_const)
> > +{
> > +  if (!TARGET_SPLAT_FLOAT_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
> > +    return 0;
> > +
> > +  /* Make sure that the two 64-bit segments are the same.  */
> > +  if (!vsx_const->all_double_words_same)
> > +    return 0;
>
> Perhaps more like "Reject if the two 64-bit segments are (not?) the
> same."
>
>
> > +
> > +  /* If the bytes, half words, or words are all the same, don't use XXSPLTIDP.
> > +     Use a simpler instruction (XXSPLTIB, VSPLTISB, VSPLTISH, or VSPLTISW).  */
> > +  if (vsx_const->all_bytes_same
> > +      || vsx_const->all_half_words_same
> > +      || vsx_const->all_words_same)
> > +    return 0;
> > +
> > +  unsigned HOST_WIDE_INT value = vsx_const->double_words[0];
> > +
> > +  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
> > +     pattern and the signalling NaN bit pattern.  Recognize infinity and
> > +     negative infinity.  */
> > +
> > +  /* Bit representation of DFmode normal quiet NaN.  */
> > +#define RS6000_CONST_DF_NAN  HOST_WIDE_INT_UC (0x7ff8000000000000)
> > +
> > +  /* Bit representation of DFmode normal signaling NaN.  */
> > +#define RS6000_CONST_DF_NANS HOST_WIDE_INT_UC (0x7ff4000000000000)
> > +
> > +  /* Bit representation of DFmode positive infinity.  */
> > +#define RS6000_CONST_DF_INF  HOST_WIDE_INT_UC (0x7ff0000000000000)
> > +
> > +  /* Bit representation of DFmode negative infinity.  */
> > +#define RS6000_CONST_DF_NEG_INF      HOST_WIDE_INT_UC (0xfff0000000000000)
>
> Defines may be more useful in a header file?
>
> > +
> > +  if (value != RS6000_CONST_DF_NAN
> > +      && value != RS6000_CONST_DF_NANS
> > +      && value != RS6000_CONST_DF_INF
> > +      && value != RS6000_CONST_DF_NEG_INF)
> > +    {
> > +      /* The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for
> > +      the exponent, and 52 bits for the mantissa (not counting the hidden
> > +      bit used for normal numbers).  NaN values have the exponent set to all
> > +      1 bits, and the mantissa non-zero (mantissa == 0 is infinity).  */
> > +
> > +      int df_exponent = (value >> 52) & 0x7ff;
> > +      unsigned HOST_WIDE_INT df_mantissa
> > +     = value & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
>
>
> Should the "=" be on the end of the previous line?
>
>
> > +
> > +      if (df_exponent == 0x7ff && df_mantissa != 0)  /* other NaNs.  */
> > +     return 0;
> > +
> > +      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
> > +      the exponent all 0 bits, and the mantissa non-zero.  If the value is
> > +      subnormal, then the hidden bit in the mantissa is not set.  */
> > +      if (df_exponent == 0 && df_mantissa != 0)              /* subnormal.  */
> > +     return 0;
> > +    }
> > +
> > +  /* Change the representation to DFmode constant.  */
> > +  long df_words[2] = { vsx_const->words[0], vsx_const->words[1] };
> > +
> > +  /* real_from_target takes the target words in  target order.  */
>
> Extra space before target order.
>
> > +  if (!BYTES_BIG_ENDIAN)
> > +    std::swap (df_words[0], df_words[1]);
> > +
> > +  REAL_VALUE_TYPE rv_type;
> > +  real_from_target (&rv_type, df_words, DFmode);
> > +
> > +  const REAL_VALUE_TYPE *rv = &rv_type;
> > +
> > +  /* Validate that the number can be stored as a SFmode value.  */
> > +  if (!exact_real_truncate (SFmode, rv))
> > +    return 0;
> > +
> > +  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
> > +     mantissa field is non-zero) which is undefined for the XXSPLTIDP
> > +     instruction.  */
> > +  long sf_value;
> > +  real_to_target (&sf_value, rv, SFmode);
> > +
> > +  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
> > +     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
> > +     0 bits, and the mantissa non-zero.  */
> > +  long sf_exponent = (sf_value >> 23) & 0xFF;
> > +  long sf_mantissa = sf_value & 0x7FFFFF;
> > +
> > +  if (sf_exponent == 0 && sf_mantissa != 0)
> > +    return 0;
> > +
> > +  /* Return the immediate to be used.  */
> > +  return sf_value;
> > +}
>
> ok
>
> > +
> >
> >  struct gcc_target targetm = TARGET_INITIALIZER;
> >
> > diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> > index ec7b106fddb..c1d661d7e6b 100644
> > --- a/gcc/config/rs6000/rs6000.opt
> > +++ b/gcc/config/rs6000/rs6000.opt
> > @@ -644,6 +644,10 @@ msplat-word-constant
> >  Target Var(TARGET_SPLAT_WORD_CONSTANT) Init(1) Save
> >  Generate (do not generate) code that uses the XXSPLTIW instruction.
> >
> > +msplat-float-constant
> > +Target Var(TARGET_SPLAT_FLOAT_CONSTANT) Init(1) Save
> > +Generate (do not generate) code that uses the XXSPLTIDP instruction.
> > +
> >  mieee128-constant
> >  Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
> >  Generate (do not generate) code that uses the LXVKQ instruction.
>
> ok
>
>
> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
> > index bd1502bb30a..dcb30e1d886 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
> > @@ -24,11 +24,12 @@ vector signed long long splats4(void)
> >          return (vector signed long long) vec_sl(mzero, mzero);
> >  }
> >
> > -/* Codegen will consist of splat and shift instructions for most types.
> > -   If folding is enabled, the vec_sl tests using vector long long type will
> > -   generate a lvx instead of a vspltisw+vsld pair.  */
> > +/* Codegen will consist of splat and shift instructions for most types.  If
> > +   folding is enabled, the vec_sl tests using vector long long type will
> > +   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
> > +   generate a xxspltidp instruction instead of the lvx.  */
> >
> >  /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
> >  /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
> > -/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
> > +/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
>
>
> ok
>
> No further comments,
> Thanks
> -Will

Okay.

Thanks, David

next prev parent reply	other threads:[~2021-12-14 17:00 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-05  4:02 [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions Michael Meissner
2021-11-05  4:04 ` [PATCH 1/5] Add XXSPLTI* and LXVKQ instructions (new data structure and function) Michael Meissner
2021-11-05 17:01   ` will schmidt
2021-11-05 18:13     ` Michael Meissner
2021-12-14 16:57       ` David Edelsohn
2021-11-15 16:35   ` Ping: " Michael Meissner
2021-12-13 16:58   ` Ping #2: " Michael Meissner
2021-11-05  4:07 ` [PATCH 2/5] Add Power10 XXSPLTI* and LXVKQ instructions (LXVKQ) Michael Meissner
2021-11-05 17:52   ` will schmidt
2021-11-05 18:01     ` Michael Meissner
2021-12-14 16:57       ` David Edelsohn
2021-11-15 16:36   ` Ping: " Michael Meissner
2021-12-13 17:02   ` Ping #2: " Michael Meissner
2021-11-05  4:09 ` [PATCH 3/5] Add Power10 XXSPLTIW Michael Meissner
2021-11-05 18:50   ` will schmidt
2021-12-14 16:59     ` David Edelsohn
2021-11-15 16:37   ` Ping: " Michael Meissner
2021-12-13 17:04   ` Ping #2: " Michael Meissner
2021-11-05  4:10 ` [PATCH 4/5] Add Power10 XXSPLTIDP for vector constants Michael Meissner
2021-11-05 19:24   ` will schmidt
2021-12-14 17:00     ` David Edelsohn [this message]
2021-11-15 16:38   ` Ping: " Michael Meissner
2021-12-13 17:06   ` Ping #2: " Michael Meissner
2021-11-05  4:11 ` [PATCH 5/5] Add Power10 XXSPLTIDP for SFmode/DFmode constants Michael Meissner
2021-11-05 19:38   ` will schmidt
2021-12-14 17:01     ` David Edelsohn
2021-11-15 16:38   ` Ping: " Michael Meissner
2021-12-13 17:07   ` Ping #2: " Michael Meissner
2021-11-05 13:08 ` [PATCH 0/5] Add Power10 XXSPLTI* and LXVKQ instructions Michael Meissner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGWvnyk9JqazrHx4O5OW3dxsykQe-Ywsbr2D4NQA5a+aiHku-Q@mail.gmail.com \
    --to=dje.gcc@gmail.com \
    --cc=bergner@linux.ibm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=meissner@linux.ibm.com \
    --cc=segher@kernel.crashing.org \
    --cc=will_schmidt@vnet.ibm.com \
    --cc=wschmidt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).