From: Uros Bizjak
Date: Sun, 7 Mar 2021 10:07:49 +0100
Subject: Re: [PATCH] i386, v2: Fix some -mavx512vl -mno-avx512bw bugs [PR99321]
To: Jakub Jelinek
Cc: "gcc-patches@gcc.gnu.org", Jeff Law
In-Reply-To: <20210307084013.GX745611@tucnak>
References: <20210305205058.GN745611@tucnak> <20210306103436.GP745611@tucnak> <20210307084013.GX745611@tucnak>
On Sun, Mar 7, 2021 at 9:41 AM Jakub Jelinek wrote:
>
> On Sat, Mar 06, 2021 at 01:39:23PM +0100, Uros Bizjak via Gcc-patches wrote:
> > > One possibility would be to change the meaning of Yw, because it
> > > is an internal undocumented constraint and all uses in GCC currently use it
> > > as xYw:
> > > constraints.md:(define_register_constraint "Yw"
> > > mmx.md:  [(set (match_operand:V4HI 0 "register_operand" "=y,xYw")
> > > mmx.md:        (match_operand:V4HI 1 "register_mmxmem_operand" "ym,xYw")
> > > mmx.md:  [(set (match_operand:V4HI 0 "register_operand" "=y,xYw")
> > > mmx.md:          (match_operand:SI 1 "register_operand" "0,xYw"))))]
> > > Would that be ok?
> >
> > Yes, this is an excellent idea.
>
> Ok, here is the full patch, now including a testcase.
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-03-07  Jakub Jelinek
>
> 	PR target/99321
> 	* config/i386/constraints.md (Yw): Use SSE_REGS if TARGET_SSE
> 	but TARGET_AVX512BW or TARGET_AVX512VL is not set.  Adjust description
> 	and comment.
> 	* config/i386/sse.md (v_Yw): New define_mode_attr.
> 	(*<insn><mode>3, *mul<mode>3<mask_name>, *avx2_<code><mode>3,
> 	*sse4_1_<code><mode>3<mask_name>): Use <v_Yw> instead of v
> 	in constraints.
> 	* config/i386/mmx.md (mmx_pshufw_1, *vec_dupv4hi): Use Yw instead of
> 	xYw in constraints.
>
> 	* lib/target-supports.exp
> 	(check_effective_target_assembler_march_noavx512bw): New effective
> 	target.
> 	* gcc.target/i386/avx512vl-pr99321-1.c: New test.

OK.

Thanks,
Uros.
>
> --- gcc/config/i386/constraints.md.jj	2021-01-04 10:25:45.116162680 +0100
> +++ gcc/config/i386/constraints.md	2021-03-06 13:47:38.950644696 +0100
> @@ -110,7 +110,7 @@ (define_register_constraint "v" "TARGET_
>  ;;  v	any EVEX encodable SSE register for AVX512VL target,
>  ;;	otherwise any SSE register
>  ;;  w	any EVEX encodable SSE register for AVX512BW with TARGET_AVX512VL
> -;;	target.
> +;;	target, otherwise any SSE register.
>  
>  (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS"
>   "First SSE register (@code{%xmm0}).")
> @@ -148,8 +148,8 @@ (define_register_constraint "Yv"
>   "@internal For AVX512VL, any EVEX encodable SSE register (@code{%xmm0-%xmm31}), otherwise any SSE register.")
>  
>  (define_register_constraint "Yw"
> - "TARGET_AVX512BW && TARGET_AVX512VL ? ALL_SSE_REGS : NO_REGS"
> - "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31}) for AVX512BW with TARGET_AVX512VL target.")
> + "TARGET_AVX512BW && TARGET_AVX512VL ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS"
> + "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31}) for AVX512BW with TARGET_AVX512VL target, otherwise any SSE register.")
>  
>  ;; We use the B prefix to denote any number of internal operands:
>  ;;  f	FLAGS_REG
> --- gcc/config/i386/sse.md.jj	2021-03-05 21:51:33.728349881 +0100
> +++ gcc/config/i386/sse.md	2021-03-06 13:48:13.283261323 +0100
> @@ -560,6 +560,14 @@ (define_mode_attr avx512
>     (V4SF "avx512vl") (V8SF "avx512vl") (V16SF "avx512f")
>     (V2DF "avx512vl") (V4DF "avx512vl") (V8DF "avx512f")])
>  
> +(define_mode_attr v_Yw
> +  [(V16QI "Yw") (V32QI "Yw") (V64QI "v")
> +   (V8HI "Yw") (V16HI "Yw") (V32HI "v")
> +   (V4SI "v") (V8SI "v") (V16SI "v")
> +   (V2DI "v") (V4DI "v") (V8DI "v")
> +   (V4SF "v") (V8SF "v") (V16SF "v")
> +   (V2DF "v") (V4DF "v") (V8DF "v")])
> +
>  (define_mode_attr sse2_avx_avx512f
>    [(V16QI "sse2") (V32QI "avx") (V64QI "avx512f")
>     (V8HI "avx512vl") (V16HI "avx512vl") (V32HI "avx512bw")
> @@ -11677,10 +11685,10 @@ (define_expand "<sse2_avx2>_<insn><mode>3_mask"
>    "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
>  
>  (define_insn "*<insn><mode>3"
> -  [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
> +  [(set (match_operand:VI_AVX2 0 "register_operand" "=x,<v_Yw>")
>  	(plusminus:VI_AVX2
> -	  (match_operand:VI_AVX2 1 "bcst_vector_operand" "<comm>0,v")
> -	  (match_operand:VI_AVX2 2 "bcst_vector_operand" "xBm,vmBr")))]
> +	  (match_operand:VI_AVX2 1 "bcst_vector_operand" "<comm>0,<v_Yw>")
> +	  (match_operand:VI_AVX2 2 "bcst_vector_operand" "xBm,<v_Yw>mBr")))]
>    "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
>    "@
>     p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
> @@ -11790,9 +11798,9 @@ (define_expand "mul<mode>3<mask_name>"
>    "ix86_fixup_binary_operands_no_copy (MULT, <MODE>mode, operands);")
>  
>  (define_insn "*mul<mode>3<mask_name>"
> -  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,v")
> -	(mult:VI2_AVX2 (match_operand:VI2_AVX2 1 "vector_operand" "%0,v")
> -		       (match_operand:VI2_AVX2 2 "vector_operand" "xBm,vm")))]
> +  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,<v_Yw>")
> +	(mult:VI2_AVX2 (match_operand:VI2_AVX2 1 "vector_operand" "%0,<v_Yw>")
> +		       (match_operand:VI2_AVX2 2 "vector_operand" "xBm,<v_Yw>m")))]
>    "TARGET_SSE2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))
>     && <mask_mode512bit_condition> && <mask_avx512bw_condition>"
>    "@
> @@ -12618,10 +12626,10 @@ (define_expand "<code><mode>3"
>    "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
>  
>  (define_insn "*avx2_<code><mode>3"
> -  [(set (match_operand:VI124_256 0 "register_operand" "=v")
> +  [(set (match_operand:VI124_256 0 "register_operand" "=<v_Yw>")
>  	(maxmin:VI124_256
> -	  (match_operand:VI124_256 1 "nonimmediate_operand" "%v")
> -	  (match_operand:VI124_256 2 "nonimmediate_operand" "vm")))]
> +	  (match_operand:VI124_256 1 "nonimmediate_operand" "%<v_Yw>")
> +	  (match_operand:VI124_256 2 "nonimmediate_operand" "<v_Yw>m")))]
>    "TARGET_AVX2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
>    "vp<maxmin_int><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
>    [(set_attr "type" "sseiadd")
> @@ -12745,10 +12753,10 @@ (define_expand "<code><mode>3"
>  })
>  
>  (define_insn "*sse4_1_<code><mode>3<mask_name>"
> -  [(set (match_operand:VI14_128 0 "register_operand" "=Yr,*x,v")
> +  [(set (match_operand:VI14_128 0 "register_operand" "=Yr,*x,<v_Yw>")
>  	(smaxmin:VI14_128
> -	  (match_operand:VI14_128 1 "vector_operand" "%0,0,v")
> -	  (match_operand:VI14_128 2 "vector_operand" "YrBm,*xBm,vm")))]
> +	  (match_operand:VI14_128 1 "vector_operand" "%0,0,<v_Yw>")
> +	  (match_operand:VI14_128 2 "vector_operand" "YrBm,*xBm,<v_Yw>m")))]
>    "TARGET_SSE4_1
>     && <mask_mode512bit_condition>
>     && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> @@ -12830,10 +12838,10 @@ (define_expand "<code><mode>3"
>  })
>  
>  (define_insn "*sse4_1_<code><mode>3<mask_name>"
> -  [(set (match_operand:VI24_128 0 "register_operand" "=Yr,*x,v")
> +  [(set (match_operand:VI24_128 0 "register_operand" "=Yr,*x,<v_Yw>")
>  	(umaxmin:VI24_128
> -	  (match_operand:VI24_128 1 "vector_operand" "%0,0,v")
> -	  (match_operand:VI24_128 2 "vector_operand" "YrBm,*xBm,vm")))]
> +	  (match_operand:VI24_128 1 "vector_operand" "%0,0,<v_Yw>")
> +	  (match_operand:VI24_128 2 "vector_operand" "YrBm,*xBm,<v_Yw>m")))]
>    "TARGET_SSE4_1
>     && <mask_mode512bit_condition>
>     && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
> --- gcc/config/i386/mmx.md.jj	2021-02-16 08:57:21.151962030 +0100
> +++ gcc/config/i386/mmx.md	2021-03-06 13:48:47.953874141 +0100
> @@ -2021,9 +2021,9 @@ (define_expand "mmx_pshufw"
>  })
>  
>  (define_insn "mmx_pshufw_1"
> -  [(set (match_operand:V4HI 0 "register_operand" "=y,xYw")
> +  [(set (match_operand:V4HI 0 "register_operand" "=y,Yw")
>  	(vec_select:V4HI
> -	  (match_operand:V4HI 1 "register_mmxmem_operand" "ym,xYw")
> +	  (match_operand:V4HI 1 "register_mmxmem_operand" "ym,Yw")
>  	  (parallel [(match_operand 2 "const_0_to_3_operand")
>  		     (match_operand 3 "const_0_to_3_operand")
>  		     (match_operand 4 "const_0_to_3_operand")
> @@ -2105,10 +2105,10 @@ (define_insn "mmx_pswapdv2si2"
>     (set_attr "mode" "DI,TI")])
>  
>  (define_insn "*vec_dupv4hi"
> -  [(set (match_operand:V4HI 0 "register_operand" "=y,xYw")
> +  [(set (match_operand:V4HI 0 "register_operand" "=y,Yw")
>  	(vec_duplicate:V4HI
>  	  (truncate:HI
> -	    (match_operand:SI 1 "register_operand" "0,xYw"))))]
> +	    (match_operand:SI 1 "register_operand" "0,Yw"))))]
>    "(TARGET_MMX || TARGET_MMX_WITH_SSE)
>     && (TARGET_SSE || TARGET_3DNOW_A)"
>    "@
> --- gcc/testsuite/lib/target-supports.exp.jj	2021-03-02 18:20:07.988674666 +0100
> +++ gcc/testsuite/lib/target-supports.exp	2021-03-06 15:58:01.587771549 +0100
> @@ -8945,6 +8945,16 @@ proc check_effective_target_avx512bw { }
>      } "-mavx512bw" ]
>  }
>  
> +# Return 1 if -Wa,-march=+noavx512bw is supported.
> +proc check_effective_target_assembler_march_noavx512bw {} {
> +    if { [istarget i?86*-*-*] || [istarget x86_64*-*-*] } {
> +	return [check_no_compiler_messages assembler_march_noavx512bw object {
> +	    void foo (void) {}
> +	} "-mno-avx512bw -Wa,-march=+noavx512bw"]
> +    }
> +    return 0
> +}
> +
>  # Return 1 if avx512vp2intersect instructions can be compiled.
>  proc check_effective_target_avx512vp2intersect { } {
>      return [check_no_compiler_messages avx512vp2intersect object {
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr99321-1.c.jj	2021-03-06 16:00:20.862232850 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr99321-1.c	2021-03-06 16:00:03.756421839 +0100
> @@ -0,0 +1,39 @@
> +/* PR target/99321 */
> +/* { dg-do assemble { target lp64 } } */
> +/* { dg-require-effective-target avx512vl } */
> +/* { dg-require-effective-target assembler_march_noavx512bw } */
> +/* { dg-options "-O2 -mavx512vl -mno-avx512bw -Wa,-march=+noavx512bw" } */
> +
> +#include <x86intrin.h>
> +
> +typedef unsigned char V1 __attribute__((vector_size (16)));
> +typedef unsigned char V2 __attribute__((vector_size (32)));
> +typedef unsigned short V3 __attribute__((vector_size (16)));
> +typedef unsigned short V4 __attribute__((vector_size (32)));
> +
> +void f1 (void) { register V1 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a += b; __asm ("" : : "v" (a)); }
> +void f2 (void) { register V2 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a += b; __asm ("" : : "v" (a)); }
> +void f3 (void) { register V3 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a += b; __asm ("" : : "v" (a)); }
> +void f4 (void) { register V4 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a += b; __asm ("" : : "v" (a)); }
> +void f5 (void) { register V1 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a -= b; __asm ("" : : "v" (a)); }
> +void f6 (void) { register V2 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a -= b; __asm ("" : : "v" (a)); }
> +void f7 (void) { register V3 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a -= b; __asm ("" : : "v" (a)); }
> +void f8 (void) { register V4 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a -= b; __asm ("" : : "v" (a)); }
> +void f9 (void) { register V3 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a *= b; __asm ("" : : "v" (a)); }
> +void f10 (void) { register V4 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a *= b; __asm ("" : : "v" (a)); }
> +void f11 (void) { register V1 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V1) _mm_min_epu8 ((__m128i) a, (__m128i) b); __asm ("" : : "v" (a)); }
> +void f12 (void) { register V2 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V2) _mm256_min_epu8 ((__m256i) a, (__m256i) b); __asm ("" : : "v" (a)); }
> +void f13 (void) { register V3 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V3) _mm_min_epu16 ((__m128i) a, (__m128i) b); __asm ("" : : "v" (a)); }
> +void f14 (void) { register V4 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V4) _mm256_min_epu16 ((__m256i) a, (__m256i) b); __asm ("" : : "v" (a)); }
> +void f15 (void) { register V1 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V1) _mm_min_epi8 ((__m128i) a, (__m128i) b); __asm ("" : : "v" (a)); }
> +void f16 (void) { register V2 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V2) _mm256_min_epi8 ((__m256i) a, (__m256i) b); __asm ("" : : "v" (a)); }
> +void f17 (void) { register V3 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V3) _mm_min_epi16 ((__m128i) a, (__m128i) b); __asm ("" : : "v" (a)); }
> +void f18 (void) { register V4 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V4) _mm256_min_epi16 ((__m256i) a, (__m256i) b); __asm ("" : : "v" (a)); }
> +void f19 (void) { register V1 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V1) _mm_max_epu8 ((__m128i) a, (__m128i) b); __asm ("" : : "v" (a)); }
> +void f20 (void) { register V2 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V2) _mm256_max_epu8 ((__m256i) a, (__m256i) b); __asm ("" : : "v" (a)); }
> +void f21 (void) { register V3 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V3) _mm_max_epu16 ((__m128i) a, (__m128i) b); __asm ("" : : "v" (a)); }
> +void f22 (void) { register V4 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V4) _mm256_max_epu16 ((__m256i) a, (__m256i) b); __asm ("" : : "v" (a)); }
> +void f23 (void) { register V1 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V1) _mm_max_epi8 ((__m128i) a, (__m128i) b); __asm ("" : : "v" (a)); }
> +void f24 (void) { register V2 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V2) _mm256_max_epi8 ((__m256i) a, (__m256i) b); __asm ("" : : "v" (a)); }
> +void f25 (void) { register V3 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V3) _mm_max_epi16 ((__m128i) a, (__m128i) b); __asm ("" : : "v" (a)); }
> +void f26 (void) { register V4 a __asm ("%xmm16"), b __asm ("%xmm17"); __asm ("" : "=v" (a), "=v" (b)); a = (V4) _mm256_max_epi16 ((__m256i) a, (__m256i) b); __asm ("" : : "v" (a)); }
>
> 	Jakub