public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Hongtao Liu <crazylht@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: gcc-patches@gcc.gnu.org, Uros Bizjak <ubizjak@gmail.com>
Subject: Re: [x86 PATCH] PR target/106060: Improved SSE vector constant materialization.
Date: Fri, 26 Jan 2024 09:34:11 +0800	[thread overview]
Message-ID: <CAMZc-bz5w+6rZvjo7pX0wnH4K9am4hGQNF+MyCKw1Gnyz+TM_g@mail.gmail.com> (raw)
In-Reply-To: <00d601da4fc1$1f631300$5e293900$@nextmovesoftware.com>

On Fri, Jan 26, 2024 at 3:03 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Hongtao,
> Many thanks for the review.  Here's a revised version of my patch
> that addresses (most of) the issues you've raised.  Firstly the
> handling of zero and all_ones in this function is mostly for
> completeness/documentation, these standard_sse_constant_p
> values are (currently/normally) handled elsewhere.  But I have
> added an "n_var == 0" optimization to ix86_expand_vector_init.
>
> As you've suggested I've added explicit TARGET_SSE2 tests where
> required, and for consistency I've also added support for AVX512's
> V16SImode.
>
> As you've predicted, the eventual goal is to move this after combine
> (or reload) using define_insn_and_split, but that requires a significant
> restructuring that should be done in steps.  This also interacts with
> a similar planned reorganization of TImode constant handling.  If
> all 128-bit (vector) constants are acceptable before combine, then
> STV has the freedom to chose V1TImode (and this broadcast
> functionality) to implement TImode operations on immediate
> constants.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline (in stage 1)?
Ok, thanks for handling this.
>
>
> 2024-01-25  Roger Sayle  <roger@nextmovesoftware.com>
>             Hongtao Liu  <hongtao.liu@intel.com>
>
> gcc/ChangeLog
>         PR target/106060
>         * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New.
>         (struct ix86_vec_bcast_map_simode_t): New type for table below.
>         (ix86_vec_bcast_map_simode): Table of SImode constants that may
>         be efficiently synthesized by a ix86_vec_bcast_alg method.
>         (ix86_vec_bcast_map_simode_cmp): New comparator for bsearch.
>         (ix86_vector_duplicate_simode_const): Efficiently synthesize
>         V4SImode and V8SImode constants that duplicate special constants.
>         (ix86_vector_duplicate_value): Attempt to synthesize "special"
>         vector constants using ix86_vector_duplicate_simode_const.
>         * config/i386/i386.cc (ix86_rtx_costs) <case ABS>: ABS of a
>         vector integer mode costs with a single SSE instruction.
>
> gcc/testsuite/ChangeLog
>         PR target/106060
>         * gcc.target/i386/auto-init-8.c: Update test case.
>         * gcc.target/i386/avx512fp16-3.c: Likewise.
>         * gcc.target/i386/pr100865-9a.c: Likewise.
>         * gcc.target/i386/pr101796-1.c: Likewise.
>         * gcc.target/i386/pr106060-1.c: New test case.
>         * gcc.target/i386/pr106060-2.c: Likewise.
>         * gcc.target/i386/pr106060-3.c: Likewise.
>         * gcc.target/i386/pr70314.c: Update test case.
>         * gcc.target/i386/vect-shiftv4qi.c: Likewise.
>         * gcc.target/i386/vect-shiftv8qi.c: Likewise.
>
>
> Roger
> --
>
> > -----Original Message-----
> > From: Hongtao Liu <crazylht@gmail.com>
> > Sent: 17 January 2024 03:13
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: gcc-patches@gcc.gnu.org; Uros Bizjak <ubizjak@gmail.com>
> > Subject: Re: [x86 PATCH] PR target/106060: Improved SSE vector constant
> > materialization.
> >
> > On Wed, Jan 17, 2024 at 5:59 AM Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > >
> > >
> > > I thought I'd just missed the bug fixing season of stage3, but there
> > > appears to a little latitude in early stage4 (for vector patches), so
> > > I'll post this now.
> > >
> > > This patch resolves PR target/106060 by providing efficient methods
> > > for materializing/synthesizing special "vector" constants on x86.
> > > Currently there are three methods of materializing a vector constant;
> > > the most general is to load a vector from the constant pool, secondly
> > "duplicated"
> > > constants can be synthesized by moving an integer between units and
> > > broadcasting (or shuffling it), and finally the special cases of the
> > > all-zeros vector and all-ones vectors can be loaded via a single SSE
> > > instruction.   This patch handles additional cases that can be synthesized
> > > in two instructions, loading an all-ones vector followed by another
> > > SSE instruction.  Following my recent patch for PR target/112992,
> > > there's conveniently a single place in i386-expand.cc where these
> > > special cases can be handled.
> > >
> > > Two examples are given in the original bugzilla PR for 106060.
> > >
> > > __m256i
> > > should_be_cmpeq_abs ()
> > > {
> > >   return _mm256_set1_epi8 (1);
> > > }
> > >
> > > is now generated (with -O3 -march=x86-64-v3) as:
> > >
> > >         vpcmpeqd        %ymm0, %ymm0, %ymm0
> > >         vpabsb  %ymm0, %ymm0
> > >         ret
> > >
> > > and
> > >
> > > __m256i
> > > should_be_cmpeq_add ()
> > > {
> > >   return _mm256_set1_epi8 (-2);
> > > }
> > >
> > > is now generated as:
> > >
> > >         vpcmpeqd        %ymm0, %ymm0, %ymm0
> > >         vpaddb  %ymm0, %ymm0, %ymm0
> > >         ret
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures.  Ok for mainline?
> > >
> > >
> > > 2024-01-16  Roger Sayle  <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         PR target/106060
> > >         * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New.
> > >         (struct ix86_vec_bcast_map_simode_t): New type for table below.
> > >         (ix86_vec_bcast_map_simode): Table of SImode constants that may
> > >         be efficiently synthesized by a ix86_vec_bcast_alg method.
> > >         (ix86_vec_bcast_map_simode_cmp): New comparator for bsearch.
> > >         (ix86_vector_duplicate_simode_const): Efficiently synthesize
> > >         V4SImode and V8SImode constants that duplicate special constants.
> > >         (ix86_vector_duplicate_value): Attempt to synthesize "special"
> > >         vector constants using ix86_vector_duplicate_simode_const.
> > >         * config/i386/i386.cc (ix86_rtx_costs) <case ABS>: ABS of a
> > >         vector integer mode costs with a single SSE instruction.
> > >
> >
> > +  switch (entry->alg)
> > +    {
> > +    case VEC_BCAST_PXOR:
> > +      if (mode == V8SImode && !TARGET_AVX2) return false;
> > +      emit_move_insn (target, CONST0_RTX (mode));
> > +      return true;
> > +    case VEC_BCAST_PCMPEQ:
> > +      if ((mode == V4SImode && !TARGET_SSE2)
> > +          || (mode == V8SImode && !TARGET_AVX2)) return false;
> > +      emit_move_insn (target, CONSTM1_RTX (mode));
> > +      return true;
> >
> > I think we need to prevent those standard_sse_constant_p getting in
> > ix86_expand_vector_init_duplicate by below codes.
> >
> >   /* If all values are identical, broadcast the value.  */
> >   if (all_same
> >       && (nvars != 0 || !standard_sse_constant_p (gen_rtx_CONST_VECTOR
> > (mode, XVEC (vals, 0)), mode))
> >       && ix86_expand_vector_init_duplicate (mmx_ok, mode, target,
> >     XVECEXP (vals, 0, 0)))
> >     return;
> >
> > +    case VEC_BCAST_PABSB:
> > +      if (mode == V4SImode)
> > + {
> > +  tmp1 = gen_reg_rtx (V16QImode);
> > +  emit_move_insn (tmp1, CONSTM1_RTX (V16QImode));
> > +  tmp2 = gen_reg_rtx (V16QImode);
> > +  emit_insn (gen_absv16qi2 (tmp2, tmp1));
> > Shouldn't it rely on TARGET_SSE2?
> >
> > +    case VEC_BCAST_PADDB:
> > +      if (mode == V4SImode)
> > + {
> > +  tmp1 = gen_reg_rtx (V16QImode);
> > +  emit_move_insn (tmp1, CONSTM1_RTX (V16QImode));
> > +  tmp2 = gen_reg_rtx (V16QImode);
> > +  emit_insn (gen_addv16qi3 (tmp2, tmp1, tmp1));
> > Ditto here and for all logic shift cases.
> > + }
> >
> > +
> > +  if ((mode == V4SImode || mode == V8SImode)
> > +      && CONST_INT_P (val)
> > +      && ix86_vector_duplicate_simode_const (mode, target, INTVAL (val)))
> > +    return true;
> > +
> > The alternative way is adding a pre_reload define_insn_and_split to match
> > specific const_vector and splitt it into new instructions.
> > In theoritically, the constant info can be retained before combine and will enable
> > more simplication.
> >
> > Also the patch can be extend to V16SImode, but it can be a separate patch.
> >
> > > gcc/testsuite/ChangeLog
> > >         PR target/106060
> > >         * gcc.target/i386/auto-init-8.c: Update test case.
> > >         * gcc.target/i386/avx512fp16-3.c: Likewise.
> > >         * gcc.target/i386/pr100865-9a.c: Likewise.
> > >         * gcc.target/i386/pr106060-1.c: New test case.
> > >         * gcc.target/i386/pr106060-2.c: Likewise.
> > >         * gcc.target/i386/pr106060-3.c: Likewise.
> > >         * gcc.target/i386/pr70314-3.c: Update test case.
> > >         * gcc.target/i386/vect-shiftv4qi.c: Likewise.
> > >         * gcc.target/i386/vect-shiftv8qi.c: Likewise.
> > >
> > >
> > > Thanks in advance,
> > > Roger
> > > --
> > >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao

      reply	other threads:[~2024-01-26  1:34 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-16 21:59 Roger Sayle
2024-01-17  3:13 ` Hongtao Liu
2024-01-25 19:03   ` Roger Sayle
2024-01-26  1:34     ` Hongtao Liu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMZc-bz5w+6rZvjo7pX0wnH4K9am4hGQNF+MyCKw1Gnyz+TM_g@mail.gmail.com \
    --to=crazylht@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=roger@nextmovesoftware.com \
    --cc=ubizjak@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).