From: Hongtao Liu <crazylht@gmail.com>
To: liuhongt <hongtao.liu@intel.com>
Cc: gcc-patches@gcc.gnu.org, hjl.tools@gmail.com, ubizjak@gmail.com
Subject: Re: [PATCH] [x86] Some tidy up for RA related hooks.
Date: Mon, 21 Nov 2022 13:28:02 +0800 [thread overview]
Message-ID: <CAMZc-bw+igw6kO0wv4F75Pq-b6yKr0m1gP5OTrpfBBDweCD=2A@mail.gmail.com> (raw)
In-Reply-To: <20221121021150.3348406-1-hongtao.liu@intel.com>
On Mon, Nov 21, 2022 at 10:13 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> When i'm working at [1] for ix86_can_change_mode_class,
> I notice there're some incorrectness/misoptimization in current RA-related hook.
> This patch tries to do some fix and tidy up for them:
>
> 1. We also need to guard size of TO to be
> less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class.
> 2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode
> into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move
> above SSE2, so no need for the condition of AVX512FP16 for those evex
> sse registers.
> 3. Allocate DI/HImode to sse register for SSE2 above just like
> SImode since we've supported 16-bit data move between sse and gpr
> above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI)
> 0) or else RA will spill it. This enable optimization for
> pieces-memset-{3,37,39}.c
> 4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_can_change_mode_class): Also guard
> size of TO.
> (ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE
> * config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to
> ..
> (VALID_AVX512F_SCALAR_MODE): .. this, also add HImode.
> (VALID_SSE_REG_MODE): Add DI/HImode.
> * config/i386/mmx.md (*mov<mode>_internal): Add
> ix86_hard_reg_move_ok to condition.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pieces-memset-3.c: Remove xfail.
> * gcc.target/i386/pieces-memset-37.c: Remove xfail.
> * gcc.target/i386/pieces-memset-39.c: Remove xfail.
> ---
> gcc/config/i386/i386.cc | 9 ++-------
> gcc/config/i386/i386.h | 16 ++++++++--------
> gcc/config/i386/mmx.md | 6 ++++--
> gcc/testsuite/gcc.target/i386/pieces-memset-3.c | 4 ++--
> gcc/testsuite/gcc.target/i386/pieces-memset-37.c | 4 ++--
> gcc/testsuite/gcc.target/i386/pieces-memset-39.c | 4 ++--
> 6 files changed, 20 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 292b32c5e99..030c26965ab 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
> the vec_dupv4hi pattern.
> NB: SSE2 can load 16bit data to sse register via pinsrw. */
> int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4;
> - if (GET_MODE_SIZE (from) < mov_size)
> + if (GET_MODE_SIZE (from) < mov_size
> + || GET_MODE_SIZE (to) < mov_size)
> return false;
> }
>
> @@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> || VALID_AVX512F_SCALAR_MODE (mode)))
> return true;
>
> - /* For AVX512FP16, vmovw supports movement of HImode
> - and HFmode between GPR and SSE registers. */
> - if (TARGET_AVX512FP16
> - && VALID_AVX512FP16_SCALAR_MODE (mode))
> - return true;
> -
> /* For AVX-5124FMAPS or AVX-5124VNNIW
> allow V64SF and V64SI modes for special regnos. */
> if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 3869db8f2d3..d9a1fb0e420 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
>
> #define VALID_AVX512F_SCALAR_MODE(MODE) \
> - ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode \
> - || (MODE) == SFmode)
> -
> -#define VALID_AVX512FP16_SCALAR_MODE(MODE) \
> - ((MODE) == HImode || (MODE) == HFmode)
> + ((MODE) == DImode || (MODE) == DFmode \
> + || (MODE) == SImode || (MODE) == SFmode \
> + || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode)
>
> #define VALID_AVX512F_REG_MODE(MODE) \
> ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode \
> @@ -1045,13 +1043,15 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> || (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode \
> || (MODE) == V8BFmode || (MODE) == V4BFmode || (MODE) == V2BFmode \
> || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode \
> - || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode \
> - || (MODE) == HFmode || (MODE) == BFmode)
> + || (MODE) == V2DImode || (MODE) == V2QImode \
> + || (MODE) == DFmode || (MODE) == DImode \
> + || (MODE) == HFmode || (MODE) == BFmode || (MODE) == HImode)
I realize there's no TARGET_SSE2 for VALID_SSE_REG_MODE in
hard_regno_mode_ok, it's ok for other modes except HImode which must
require SSE2, orelse we can move 16-bit from/to integer
So Remove HImode from it, and add it separately in hard_regno_mode_ok
as + /* Use pinsrw/pextrw to mov 16-bit data from/to sse to/from
integer. */
+ if (TARGET_SSE2 && mode == HImode)
+ return true;
+
>
> #define VALID_SSE_REG_MODE(MODE) \
> ((MODE) == V1TImode || (MODE) == TImode \
> || (MODE) == V4SFmode || (MODE) == V4SImode \
> - || (MODE) == SFmode || (MODE) == TFmode || (MODE) == TDmode)
> + || (MODE) == SFmode || (MODE) == SImode \
> + || (MODE) == TFmode || (MODE) == TDmode)
>
> #define VALID_MMX_REG_MODE_3DNOW(MODE) \
> ((MODE) == V2SFmode || (MODE) == SFmode)
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index d5134cc351e..63aff287795 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -133,7 +133,8 @@ (define_insn "*mov<mode>_internal"
> (match_operand:MMXMODE 1 "nonimm_or_0_operand"
> "rCo,rC,C,rm,rC,C ,!y,m ,?!y,?!y,r ,C,v,m,v,v,r,*x,!y"))]
> "(TARGET_MMX || TARGET_MMX_WITH_SSE)
> - && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
> + && !(MEM_P (operands[0]) && MEM_P (operands[1]))
> + && ix86_hardreg_mov_ok (operands[0], operands[1])"
> {
> switch (get_attr_type (insn))
> {
> @@ -286,7 +287,8 @@ (define_insn "*mov<mode>_internal"
> "=r ,m ,v,v,v,m,r,v")
> (match_operand:V_32 1 "general_operand"
> "rmC,rC,C,v,m,v,v,r"))]
> - "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> + "!(MEM_P (operands[0]) && MEM_P (operands[1]))
> + && ix86_hardreg_mov_ok (operands[0], operands[1])"
> {
> switch (get_attr_type (insn))
> {
> diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
> index 765441a7c4a..4f105f58b26 100644
> --- a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
> +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
> @@ -13,6 +13,6 @@ foo (int x)
> /* { dg-final { scan-assembler-times "vinserti64x4\[ \\t\]+\[^\n\]*%zmm" 1 } } */
> /* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */
> /* No need to dynamically realign the stack here. */
> -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
> /* Nor use a frame pointer. */
> -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
> index 0c5056be54d..fd09bd153ce 100644
> --- a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
> +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
> @@ -10,6 +10,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
> /* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%ymm" 1 } } */
> /* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
> /* No need to dynamically realign the stack here. */
> -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
> /* Nor use a frame pointer. */
> -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
> index e33644c2f10..0ed88b274bd 100644
> --- a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
> +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
> @@ -11,6 +11,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
> /* { dg-final { scan-assembler-not "vinserti64x4" } } */
> /* { dg-final { scan-assembler-times "vmovdqu8\[ \\t\]+\[^\n\]*%zmm" 1 } } */
> /* No need to dynamically realign the stack here. */
> -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
> /* Nor use a frame pointer. */
> -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
> --
> 2.27.0
>
--
BR,
Hongtao
next prev parent reply other threads:[~2022-11-21 5:24 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-21 2:11 liuhongt
2022-11-21 5:28 ` Hongtao Liu [this message]
2022-11-21 7:17 ` Uros Bizjak
2022-11-21 8:11 ` Hongtao Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAMZc-bw+igw6kO0wv4F75Pq-b6yKr0m1gP5OTrpfBBDweCD=2A@mail.gmail.com' \
--to=crazylht@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=hjl.tools@gmail.com \
--cc=hongtao.liu@intel.com \
--cc=ubizjak@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).