From: Hongtao Liu <crazylht@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: gcc-patches@gcc.gnu.org, Uros Bizjak <ubizjak@gmail.com>
Subject: Re: [x86 PATCH] PR target/106060: Improved SSE vector constant materialization.
Date: Fri, 26 Jan 2024 09:34:11 +0800 [thread overview]
Message-ID: <CAMZc-bz5w+6rZvjo7pX0wnH4K9am4hGQNF+MyCKw1Gnyz+TM_g@mail.gmail.com> (raw)
In-Reply-To: <00d601da4fc1$1f631300$5e293900$@nextmovesoftware.com>
On Fri, Jan 26, 2024 at 3:03 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Hongtao,
> Many thanks for the review. Here's a revised version of my patch
> that addresses (most of) the issues you've raised. Firstly the
> handling of zero and all_ones in this function is mostly for
> completeness/documentation, these standard_sse_constant_p
> values are (currently/normally) handled elsewhere. But I have
> added an "n_var == 0" optimization to ix86_expand_vector_init.
>
> As you've suggested I've added explicit TARGET_SSE2 tests where
> required, and for consistency I've also added support for AVX512's
> V16SImode.
>
> As you've predicted, the eventual goal is to move this after combine
> (or reload) using define_insn_and_split, but that requires a significant
> restructuring that should be done in steps. This also interacts with
> a similar planned reorganization of TImode constant handling. If
> all 128-bit (vector) constants are acceptable before combine, then
> STV has the freedom to chose V1TImode (and this broadcast
> functionality) to implement TImode operations on immediate
> constants.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline (in stage 1)?
Ok, thanks for handling this.
>
>
> 2024-01-25 Roger Sayle <roger@nextmovesoftware.com>
> Hongtao Liu <hongtao.liu@intel.com>
>
> gcc/ChangeLog
> PR target/106060
> * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New.
> (struct ix86_vec_bcast_map_simode_t): New type for table below.
> (ix86_vec_bcast_map_simode): Table of SImode constants that may
> be efficiently synthesized by a ix86_vec_bcast_alg method.
> (ix86_vec_bcast_map_simode_cmp): New comparator for bsearch.
> (ix86_vector_duplicate_simode_const): Efficiently synthesize
> V4SImode and V8SImode constants that duplicate special constants.
> (ix86_vector_duplicate_value): Attempt to synthesize "special"
> vector constants using ix86_vector_duplicate_simode_const.
> * config/i386/i386.cc (ix86_rtx_costs) <case ABS>: ABS of a
> vector integer mode costs with a single SSE instruction.
>
> gcc/testsuite/ChangeLog
> PR target/106060
> * gcc.target/i386/auto-init-8.c: Update test case.
> * gcc.target/i386/avx512fp16-3.c: Likewise.
> * gcc.target/i386/pr100865-9a.c: Likewise.
> * gcc.target/i386/pr101796-1.c: Likewise.
> * gcc.target/i386/pr106060-1.c: New test case.
> * gcc.target/i386/pr106060-2.c: Likewise.
> * gcc.target/i386/pr106060-3.c: Likewise.
> * gcc.target/i386/pr70314.c: Update test case.
> * gcc.target/i386/vect-shiftv4qi.c: Likewise.
> * gcc.target/i386/vect-shiftv8qi.c: Likewise.
>
>
> Roger
> --
>
> > -----Original Message-----
> > From: Hongtao Liu <crazylht@gmail.com>
> > Sent: 17 January 2024 03:13
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: gcc-patches@gcc.gnu.org; Uros Bizjak <ubizjak@gmail.com>
> > Subject: Re: [x86 PATCH] PR target/106060: Improved SSE vector constant
> > materialization.
> >
> > On Wed, Jan 17, 2024 at 5:59 AM Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > >
> > >
> > > I thought I'd just missed the bug fixing season of stage3, but there
> > > appears to a little latitude in early stage4 (for vector patches), so
> > > I'll post this now.
> > >
> > > This patch resolves PR target/106060 by providing efficient methods
> > > for materializing/synthesizing special "vector" constants on x86.
> > > Currently there are three methods of materializing a vector constant;
> > > the most general is to load a vector from the constant pool, secondly
> > "duplicated"
> > > constants can be synthesized by moving an integer between units and
> > > broadcasting (or shuffling it), and finally the special cases of the
> > > all-zeros vector and all-ones vectors can be loaded via a single SSE
> > > instruction. This patch handles additional cases that can be synthesized
> > > in two instructions, loading an all-ones vector followed by another
> > > SSE instruction. Following my recent patch for PR target/112992,
> > > there's conveniently a single place in i386-expand.cc where these
> > > special cases can be handled.
> > >
> > > Two examples are given in the original bugzilla PR for 106060.
> > >
> > > __m256i
> > > should_be_cmpeq_abs ()
> > > {
> > > return _mm256_set1_epi8 (1);
> > > }
> > >
> > > is now generated (with -O3 -march=x86-64-v3) as:
> > >
> > > vpcmpeqd %ymm0, %ymm0, %ymm0
> > > vpabsb %ymm0, %ymm0
> > > ret
> > >
> > > and
> > >
> > > __m256i
> > > should_be_cmpeq_add ()
> > > {
> > > return _mm256_set1_epi8 (-2);
> > > }
> > >
> > > is now generated as:
> > >
> > > vpcmpeqd %ymm0, %ymm0, %ymm0
> > > vpaddb %ymm0, %ymm0, %ymm0
> > > ret
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures. Ok for mainline?
> > >
> > >
> > > 2024-01-16 Roger Sayle <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > > PR target/106060
> > > * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New.
> > > (struct ix86_vec_bcast_map_simode_t): New type for table below.
> > > (ix86_vec_bcast_map_simode): Table of SImode constants that may
> > > be efficiently synthesized by a ix86_vec_bcast_alg method.
> > > (ix86_vec_bcast_map_simode_cmp): New comparator for bsearch.
> > > (ix86_vector_duplicate_simode_const): Efficiently synthesize
> > > V4SImode and V8SImode constants that duplicate special constants.
> > > (ix86_vector_duplicate_value): Attempt to synthesize "special"
> > > vector constants using ix86_vector_duplicate_simode_const.
> > > * config/i386/i386.cc (ix86_rtx_costs) <case ABS>: ABS of a
> > > vector integer mode costs with a single SSE instruction.
> > >
> >
> > + switch (entry->alg)
> > + {
> > + case VEC_BCAST_PXOR:
> > + if (mode == V8SImode && !TARGET_AVX2) return false;
> > + emit_move_insn (target, CONST0_RTX (mode));
> > + return true;
> > + case VEC_BCAST_PCMPEQ:
> > + if ((mode == V4SImode && !TARGET_SSE2)
> > + || (mode == V8SImode && !TARGET_AVX2)) return false;
> > + emit_move_insn (target, CONSTM1_RTX (mode));
> > + return true;
> >
> > I think we need to prevent those standard_sse_constant_p getting in
> > ix86_expand_vector_init_duplicate by below codes.
> >
> > /* If all values are identical, broadcast the value. */
> > if (all_same
> > && (nvars != 0 || !standard_sse_constant_p (gen_rtx_CONST_VECTOR
> > (mode, XVEC (vals, 0)), mode))
> > && ix86_expand_vector_init_duplicate (mmx_ok, mode, target,
> > XVECEXP (vals, 0, 0)))
> > return;
> >
> > + case VEC_BCAST_PABSB:
> > + if (mode == V4SImode)
> > + {
> > + tmp1 = gen_reg_rtx (V16QImode);
> > + emit_move_insn (tmp1, CONSTM1_RTX (V16QImode));
> > + tmp2 = gen_reg_rtx (V16QImode);
> > + emit_insn (gen_absv16qi2 (tmp2, tmp1));
> > Shouldn't it rely on TARGET_SSE2?
> >
> > + case VEC_BCAST_PADDB:
> > + if (mode == V4SImode)
> > + {
> > + tmp1 = gen_reg_rtx (V16QImode);
> > + emit_move_insn (tmp1, CONSTM1_RTX (V16QImode));
> > + tmp2 = gen_reg_rtx (V16QImode);
> > + emit_insn (gen_addv16qi3 (tmp2, tmp1, tmp1));
> > Ditto here and for all logic shift cases.
> > + }
> >
> > +
> > + if ((mode == V4SImode || mode == V8SImode)
> > + && CONST_INT_P (val)
> > + && ix86_vector_duplicate_simode_const (mode, target, INTVAL (val)))
> > + return true;
> > +
> > The alternative way is adding a pre_reload define_insn_and_split to match
> > specific const_vector and splitt it into new instructions.
> > In theoritically, the constant info can be retained before combine and will enable
> > more simplication.
> >
> > Also the patch can be extend to V16SImode, but it can be a separate patch.
> >
> > > gcc/testsuite/ChangeLog
> > > PR target/106060
> > > * gcc.target/i386/auto-init-8.c: Update test case.
> > > * gcc.target/i386/avx512fp16-3.c: Likewise.
> > > * gcc.target/i386/pr100865-9a.c: Likewise.
> > > * gcc.target/i386/pr106060-1.c: New test case.
> > > * gcc.target/i386/pr106060-2.c: Likewise.
> > > * gcc.target/i386/pr106060-3.c: Likewise.
> > > * gcc.target/i386/pr70314-3.c: Update test case.
> > > * gcc.target/i386/vect-shiftv4qi.c: Likewise.
> > > * gcc.target/i386/vect-shiftv8qi.c: Likewise.
> > >
> > >
> > > Thanks in advance,
> > > Roger
> > > --
> > >
> >
> >
> > --
> > BR,
> > Hongtao
--
BR,
Hongtao
prev parent reply other threads:[~2024-01-26 1:34 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-16 21:59 Roger Sayle
2024-01-17 3:13 ` Hongtao Liu
2024-01-25 19:03 ` Roger Sayle
2024-01-26 1:34 ` Hongtao Liu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAMZc-bz5w+6rZvjo7pX0wnH4K9am4hGQNF+MyCKw1Gnyz+TM_g@mail.gmail.com \
--to=crazylht@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=roger@nextmovesoftware.com \
--cc=ubizjak@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).