From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw1-x112c.google.com (mail-yw1-x112c.google.com [IPv6:2607:f8b0:4864:20::112c]) by sourceware.org (Postfix) with ESMTPS id A13C63857C51 for ; Mon, 21 Nov 2022 05:24:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A13C63857C51 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yw1-x112c.google.com with SMTP id 00721157ae682-39451671bdfso78115347b3.10 for ; Sun, 20 Nov 2022 21:24:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=sd2xkP6CulcsWseZZMcuevb+BaklpmDHqEkyN3te5Hk=; b=YsF21TLKUbTHoMh1L06H3J0Hl3GMSxpRkWlI26QxsJIG2IDcwexa60EXsnB6BFu3mk B+HQv0ZxfojGqFZzAaDvxprSsVERJ2OMRwC80S4b8Tp3jXq3uGfBoDpwLyOkB4NxDF+d ExlVhX+qk1l3dtwHtoAA5XcqouSRLQ87cK+LOFA4mDwr7tj4XWwEgEZdLKKfCyMt/yHD x59ja6CMcTjuLB02mi+cB6LiOSl4V3Vd5a+n3meQNwzBrlSDCLCPQ+uBig+emGTBdTFy Dpe0Bd4BjbS9xzAa2/JTlYOkEJOD55eqq8Rw2qOXouP3KOnmnR6CTk7r3pbHpetL3KB0 cXbw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=sd2xkP6CulcsWseZZMcuevb+BaklpmDHqEkyN3te5Hk=; b=a+E9ngT3/fsQSvnXUDpEfo3mPcAIYT+TMaZpFIIdK96qSbESAgxQ3u0Ws/8JC9KvTs obSv1GC8bfwJzNSS9tijhKwQM8E0KYZAAqLDg0kklFCevZOLDLmYdxNC5bBhJ16p5A/O XBGwILyaQqh9YXBC/8qRLFHVLemqUw7su5r+ETuEDTTiKYMwQWD/dUTtHvPuYopPX7Yk JKSe3WxZ8G7su3c4eDSMhe6bLmjvemZ4YcflNfZJqcMeIPShdf0yfI8HE4E5nTxNI6JE +U1b3PgO803E2XTZcM/o9VdIVKOX3tE7uQKwqEL0VgpZER0C1ek17z9F8KSVZsQwR+QB h+Vg== X-Gm-Message-State: ANoB5pkuKs5qxZn5Vlr7H5BBr14fhjdHZ4WFc22rCC1jtWf85y7KIDFI kgiuVDi3wWk3PKiGiFc7VZlQUOzoWQVCeO7rUUs= X-Google-Smtp-Source: AA0mqf4D2CLp6pFm4wocqKNNJglxnISg+b623jhJXODltgJ06+gLVetG1c7rdw1p/fiYtCT31mLrKrumiJj7XxCZEB0= X-Received: by 2002:a0d:ce86:0:b0:36b:999a:da62 with SMTP id q128-20020a0dce86000000b0036b999ada62mr15661778ywd.249.1669008293861; Sun, 20 Nov 2022 21:24:53 -0800 (PST) MIME-Version: 1.0 References: <20221121021150.3348406-1-hongtao.liu@intel.com> In-Reply-To: <20221121021150.3348406-1-hongtao.liu@intel.com> From: Hongtao Liu Date: Mon, 21 Nov 2022 13:28:02 +0800 Message-ID: Subject: Re: [PATCH] [x86] Some tidy up for RA related hooks. To: liuhongt Cc: gcc-patches@gcc.gnu.org, hjl.tools@gmail.com, ubizjak@gmail.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-7.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Nov 21, 2022 at 10:13 AM liuhongt wrote: > > When i'm working at [1] for ix86_can_change_mode_class, > I notice there're some incorrectness/misoptimization in current RA-related hook. > This patch tries to do some fix and tidy up for them: > > 1. We also need to guard size of TO to be > less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class. > 2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode > into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move > above SSE2, so no need for the condition of AVX512FP16 for those evex > sse registers. > 3. Allocate DI/HImode to sse register for SSE2 above just like > SImode since we've supported 16-bit data move between sse and gpr > above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI) > 0) or else RA will spill it. This enable optimization for > pieces-memset-{3,37,39}.c > 4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok. > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > * config/i386/i386.cc (ix86_can_change_mode_class): Also guard > size of TO. > (ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE > * config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to > .. > (VALID_AVX512F_SCALAR_MODE): .. this, also add HImode. > (VALID_SSE_REG_MODE): Add DI/HImode. > * config/i386/mmx.md (*mov_internal): Add > ix86_hard_reg_move_ok to condition. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pieces-memset-3.c: Remove xfail. > * gcc.target/i386/pieces-memset-37.c: Remove xfail. > * gcc.target/i386/pieces-memset-39.c: Remove xfail. > --- > gcc/config/i386/i386.cc | 9 ++------- > gcc/config/i386/i386.h | 16 ++++++++-------- > gcc/config/i386/mmx.md | 6 ++++-- > gcc/testsuite/gcc.target/i386/pieces-memset-3.c | 4 ++-- > gcc/testsuite/gcc.target/i386/pieces-memset-37.c | 4 ++-- > gcc/testsuite/gcc.target/i386/pieces-memset-39.c | 4 ++-- > 6 files changed, 20 insertions(+), 23 deletions(-) > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc > index 292b32c5e99..030c26965ab 100644 > --- a/gcc/config/i386/i386.cc > +++ b/gcc/config/i386/i386.cc > @@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to, > the vec_dupv4hi pattern. > NB: SSE2 can load 16bit data to sse register via pinsrw. */ > int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4; > - if (GET_MODE_SIZE (from) < mov_size) > + if (GET_MODE_SIZE (from) < mov_size > + || GET_MODE_SIZE (to) < mov_size) > return false; > } > > @@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode) > || VALID_AVX512F_SCALAR_MODE (mode))) > return true; > > - /* For AVX512FP16, vmovw supports movement of HImode > - and HFmode between GPR and SSE registers. */ > - if (TARGET_AVX512FP16 > - && VALID_AVX512FP16_SCALAR_MODE (mode)) > - return true; > - > /* For AVX-5124FMAPS or AVX-5124VNNIW > allow V64SF and V64SI modes for special regnos. */ > if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW) > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h > index 3869db8f2d3..d9a1fb0e420 100644 > --- a/gcc/config/i386/i386.h > +++ b/gcc/config/i386/i386.h > @@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); > (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode) > > #define VALID_AVX512F_SCALAR_MODE(MODE) \ > - ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode \ > - || (MODE) == SFmode) > - > -#define VALID_AVX512FP16_SCALAR_MODE(MODE) \ > - ((MODE) == HImode || (MODE) == HFmode) > + ((MODE) == DImode || (MODE) == DFmode \ > + || (MODE) == SImode || (MODE) == SFmode \ > + || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode) > > #define VALID_AVX512F_REG_MODE(MODE) \ > ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode \ > @@ -1045,13 +1043,15 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); > || (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode \ > || (MODE) == V8BFmode || (MODE) == V4BFmode || (MODE) == V2BFmode \ > || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode \ > - || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode \ > - || (MODE) == HFmode || (MODE) == BFmode) > + || (MODE) == V2DImode || (MODE) == V2QImode \ > + || (MODE) == DFmode || (MODE) == DImode \ > + || (MODE) == HFmode || (MODE) == BFmode || (MODE) == HImode) I realize there's no TARGET_SSE2 for VALID_SSE_REG_MODE in hard_regno_mode_ok, it's ok for other modes except HImode which must require SSE2, orelse we can move 16-bit from/to integer So Remove HImode from it, and add it separately in hard_regno_mode_ok as + /* Use pinsrw/pextrw to mov 16-bit data from/to sse to/from integer. */ + if (TARGET_SSE2 && mode == HImode) + return true; + > > #define VALID_SSE_REG_MODE(MODE) \ > ((MODE) == V1TImode || (MODE) == TImode \ > || (MODE) == V4SFmode || (MODE) == V4SImode \ > - || (MODE) == SFmode || (MODE) == TFmode || (MODE) == TDmode) > + || (MODE) == SFmode || (MODE) == SImode \ > + || (MODE) == TFmode || (MODE) == TDmode) > > #define VALID_MMX_REG_MODE_3DNOW(MODE) \ > ((MODE) == V2SFmode || (MODE) == SFmode) > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md > index d5134cc351e..63aff287795 100644 > --- a/gcc/config/i386/mmx.md > +++ b/gcc/config/i386/mmx.md > @@ -133,7 +133,8 @@ (define_insn "*mov_internal" > (match_operand:MMXMODE 1 "nonimm_or_0_operand" > "rCo,rC,C,rm,rC,C ,!y,m ,?!y,?!y,r ,C,v,m,v,v,r,*x,!y"))] > "(TARGET_MMX || TARGET_MMX_WITH_SSE) > - && !(MEM_P (operands[0]) && MEM_P (operands[1]))" > + && !(MEM_P (operands[0]) && MEM_P (operands[1])) > + && ix86_hardreg_mov_ok (operands[0], operands[1])" > { > switch (get_attr_type (insn)) > { > @@ -286,7 +287,8 @@ (define_insn "*mov_internal" > "=r ,m ,v,v,v,m,r,v") > (match_operand:V_32 1 "general_operand" > "rmC,rC,C,v,m,v,v,r"))] > - "!(MEM_P (operands[0]) && MEM_P (operands[1]))" > + "!(MEM_P (operands[0]) && MEM_P (operands[1])) > + && ix86_hardreg_mov_ok (operands[0], operands[1])" > { > switch (get_attr_type (insn)) > { > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c > index 765441a7c4a..4f105f58b26 100644 > --- a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c > +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c > @@ -13,6 +13,6 @@ foo (int x) > /* { dg-final { scan-assembler-times "vinserti64x4\[ \\t\]+\[^\n\]*%zmm" 1 } } */ > /* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */ > /* No need to dynamically realign the stack here. */ > -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */ > +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ > /* Nor use a frame pointer. */ > -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */ > +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c > index 0c5056be54d..fd09bd153ce 100644 > --- a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c > +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c > @@ -10,6 +10,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst) > /* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%ymm" 1 } } */ > /* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */ > /* No need to dynamically realign the stack here. */ > -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */ > +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ > /* Nor use a frame pointer. */ > -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */ > +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c > index e33644c2f10..0ed88b274bd 100644 > --- a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c > +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c > @@ -11,6 +11,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst) > /* { dg-final { scan-assembler-not "vinserti64x4" } } */ > /* { dg-final { scan-assembler-times "vmovdqu8\[ \\t\]+\[^\n\]*%zmm" 1 } } */ > /* No need to dynamically realign the stack here. */ > -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */ > +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */ > /* Nor use a frame pointer. */ > -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */ > +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */ > -- > 2.27.0 > -- BR, Hongtao