From: Hongtao Liu
Date: Mon, 13 Nov 2023 17:10:08 +0800
Subject: Re: [PATCH] Avoid generate vblendps with ymm16+
To: Jakub Jelinek
Cc: "Hu, Lin1", gcc-patches@gcc.gnu.org, hongtao.liu@intel.com, ubizjak@gmail.com
References: <20231109071457.2574044-1-lin1.hu@intel.com>

On Mon, Nov 13, 2023 at 4:45 PM Jakub Jelinek wrote:
>
> On Mon, Nov 13, 2023 at 02:27:35PM +0800, Hongtao Liu wrote:
> > > 1) if it isn't better to use separate alternative instead of
> > > x86_evex_reg_mentioned_p, like in the patch below
> > vblendps doesn't support gpr32 which is checked by x86_evex_reg_mentioned_p.
> > we need to use xjm for operands[1], (I think we don't need to set
> > attribute addr to gpr16 for alternative 0 since the alternative 1 is
> > always available and recog will match alternative1 when gpr32 is used)
>
> Ok, so like this then?  I've incorporated the other two tests into the patch
> as well.
LGTM.
>
> 2023-11-13  Jakub Jelinek
>             Hu, Lin1
>
>         PR target/112435
>         * config/i386/sse.md (avx512vl_shuf_32x4_1,
>         avx512dq_shuf_64x2_1): Add
>         alternative with just x instead of v constraints and xjm instead of
>         vm and use vblendps as optimization only with that alternative.
>
>         * gcc.target/i386/avx512vl-pr112435-1.c: New test.
>         * gcc.target/i386/avx512vl-pr112435-2.c: New test.
>         * gcc.target/i386/avx512vl-pr112435-3.c: New test.
>
> --- gcc/config/i386/sse.md.jj   2023-11-11 08:52:20.377845673 +0100
> +++ gcc/config/i386/sse.md      2023-11-13 09:31:08.568935535 +0100
> @@ -19235,11 +19235,11 @@ (define_expand "avx512dq_shuf_
>  })
>
>  (define_insn "avx512dq_shuf_64x2_1"
> -  [(set (match_operand:VI8F_256 0 "register_operand" "=v")
> +  [(set (match_operand:VI8F_256 0 "register_operand" "=x,v")
>         (vec_select:VI8F_256
>           (vec_concat:
> -           (match_operand:VI8F_256 1 "register_operand" "v")
> -           (match_operand:VI8F_256 2 "nonimmediate_operand" "vm"))
> +           (match_operand:VI8F_256 1 "register_operand" "x,v")
> +           (match_operand:VI8F_256 2 "nonimmediate_operand" "xjm,vm"))
>           (parallel [(match_operand 3 "const_0_to_3_operand")
>                      (match_operand 4 "const_0_to_3_operand")
>                      (match_operand 5 "const_4_to_7_operand")
> @@ -19254,7 +19254,7 @@ (define_insn "avx512dq_shu
>    mask = INTVAL (operands[3]) / 2;
>    mask |= (INTVAL (operands[5]) - 4) / 2 << 1;
>    operands[3] = GEN_INT (mask);
> -  if (INTVAL (operands[3]) == 2 && !)
> +  if (INTVAL (operands[3]) == 2 && ! && which_alternative == 0)
>      return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";
>    return "vshuf64x2\t{%3, %2, %1, %0|%0, %1, %2, %3}";
>  }
> @@ -19386,11 +19386,11 @@ (define_expand "avx512vl_shuf_
>  })
>
>  (define_insn "avx512vl_shuf_32x4_1"
> -  [(set (match_operand:VI4F_256 0 "register_operand" "=v")
> +  [(set (match_operand:VI4F_256 0 "register_operand" "=x,v")
>         (vec_select:VI4F_256
>           (vec_concat:
> -           (match_operand:VI4F_256 1 "register_operand" "v")
> -           (match_operand:VI4F_256 2 "nonimmediate_operand" "vm"))
> +           (match_operand:VI4F_256 1 "register_operand" "x,v")
> +           (match_operand:VI4F_256 2 "nonimmediate_operand" "xjm,vm"))
>           (parallel [(match_operand 3 "const_0_to_7_operand")
>                      (match_operand 4 "const_0_to_7_operand")
>                      (match_operand 5 "const_0_to_7_operand")
> @@ -19414,7 +19414,7 @@ (define_insn "avx512vl_shuf_
>    mask |= (INTVAL (operands[7]) - 8) / 4 << 1;
>    operands[3] = GEN_INT (mask);
>
> -  if (INTVAL (operands[3]) == 2 && !)
> +  if (INTVAL (operands[3]) == 2 && ! && which_alternative == 0)
>      return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}";
>
>    return "vshuf32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr112435-1.c.jj      2023-11-13 09:20:53.330643098 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr112435-1.c 2023-11-13 09:20:53.330643098 +0100
> @@ -0,0 +1,13 @@
> +/* PR target/112435 */
> +/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */
> +/* { dg-options "-mavx512vl -O2" } */
> +
> +#include
> +
> +__m256i
> +foo (__m256i a, __m256i b)
> +{
> +  register __m256i c __asm__("ymm16") = a;
> +  asm ("" : "+v" (c));
> +  return _mm256_shuffle_i32x4 (c, b, 2);
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr112435-2.c.jj      2023-11-13 09:23:04.361788598 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr112435-2.c 2023-11-13 09:34:57.186699876 +0100
> @@ -0,0 +1,63 @@
> +/* PR target/112435 */
> +/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */
> +/* { dg-options "-mavx512vl -O2" } */
> +
> +#include
> +
> +/* vpermi128/vpermf128 */
> +__m256i
> +perm0 (__m256i a, __m256i b)
> +{
> +  register __m256i c __asm__("ymm17") = a;
> +  asm ("":"+v" (c));
> +  return _mm256_permute2x128_si256 (c, b, 50);
> +}
> +
> +__m256i
> +perm1 (__m256i a, __m256i b)
> +{
> +  register __m256i c __asm__("ymm17") = a;
> +  asm ("":"+v" (c));
> +  return _mm256_permute2x128_si256 (c, b, 18);
> +}
> +
> +__m256i
> +perm2 (__m256i a, __m256i b)
> +{
> +  register __m256i c __asm__("ymm17") = a;
> +  asm ("":"+v" (c));
> +  return _mm256_permute2x128_si256 (c, b, 48);
> +}
> +
> +/* vshuf{i,f}{32x4,64x2} ymm .*/
> +__m256i
> +shuff0 (__m256i a, __m256i b)
> +{
> +  register __m256i c __asm__("ymm17") = a;
> +  asm ("":"+v" (c));
> +  return _mm256_shuffle_i32x4 (c, b, 2);
> +}
> +
> +__m256
> +shuff1 (__m256 a, __m256 b)
> +{
> +  register __m256 c __asm__("ymm17") = a;
> +  asm ("":"+v" (c));
> +  return _mm256_shuffle_f32x4 (c, b, 2);
> +}
> +
> +__m256i
> +shuff2 (__m256i a, __m256i b)
> +{
> +  register __m256i c __asm__("ymm17") = a;
> +  asm ("":"+v" (c));
> +  return _mm256_shuffle_i64x2 (c, b, 2);
> +}
> +
> +__m256d
> +shuff3 (__m256d a, __m256d b)
> +{
> +  register __m256d c __asm__("ymm17") = a;
> +  asm ("":"+v" (c));
> +  return _mm256_shuffle_f64x2 (c, b, 2);
> +}
> --- gcc/testsuite/gcc.target/i386/avx512vl-pr112435-3.c.jj      2023-11-13 09:24:52.518257838 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512vl-pr112435-3.c 2023-11-13 09:26:20.761008930 +0100
> @@ -0,0 +1,78 @@
> +/* PR target/112435 */
> +/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */
> +/* { dg-options "-mavx512vl -O2" } */
> +
> +#include
> +
> +/* vpermf128 */
> +__m256
> +perm0 (__m256 a, __m256 b)
> +{
> +  register __m256 c __asm__("ymm17") =a;
> +  asm ("":"+v" (c));
> +  return _mm256_permute2f128_ps (c, b, 50);
> +}
> +
> +__m256
> +perm1 (__m256 a, __m256 b)
> +{
> +  register __m256 c __asm__("ymm17") =a;
> +  asm ("":"+v" (c));
> +  return _mm256_permute2f128_ps (c, b, 18);
> +}
> +
> +__m256
> +perm2 (__m256 a, __m256 b)
> +{
> +  register __m256 c __asm__("ymm17") =a;
> +  asm ("":"+v" (c));
> +  return _mm256_permute2f128_ps (c, b, 48);
> +}
> +
> +__m256i
> +perm3 (__m256i a, __m256i b)
> +{
> +  register __m256i c __asm__("ymm17") =a;
> +  asm ("":"+v" (c));
> +  return _mm256_permute2f128_si256 (c, b, 50);
> +}
> +
> +__m256i
> +perm4 (__m256i a, __m256i b)
> +{
> +  register __m256i c __asm__("ymm17") =a;
> +  asm ("":"+v" (c));
> +  return _mm256_permute2f128_si256 (c, b, 18);
> +}
> +
> +__m256i
> +perm5 (__m256i a, __m256i b)
> +{
> +  register __m256i c __asm__("ymm17") =a;
> +  asm ("":"+v" (c));
> +  return _mm256_permute2f128_si256 (c, b, 48);
> +}
> +
> +__m256d
> +perm6 (__m256d a, __m256d b)
> +{
> +  register __m256d c __asm__("ymm17") =a;
> +  asm ("":"+v" (c));
> +  return _mm256_permute2f128_pd (c, b, 50);
> +}
> +
> +__m256d
> +perm7 (__m256d a, __m256d b)
> +{
> +  register __m256d c __asm__("ymm17") =a;
> +  asm ("":"+v" (c));
> +  return _mm256_permute2f128_pd (c, b, 18);
> +}
> +
> +__m256d
> +perm8 (__m256d a, __m256d b)
> +{
> +  register __m256d c __asm__("ymm17") =a;
> +  asm ("":"+v" (c));
> +  return _mm256_permute2f128_pd (c, b, 48);
> +}
>
>         Jakub
>

-- 
BR,
Hongtao
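
For readers following the thread, here is a minimal sketch in plain C (not GCC code) of why the vblendps shortcut is valid when the computed immediate is 2, and of the alternative check the patch adds. All names here (vec256, sample_vec, vec_equal, shuf64x2_mask, model_vshuf64x2, model_vblendps, may_use_vblendps) are made up for illustration. The mask arithmetic is copied from the avx512dq_shuf_64x2_1 hunk quoted above. The two instruction models assume the documented behaviour of the 256-bit vshuf64x2 and vblendps forms. The patched condition has one more negated term that is not legible in this copy of the patch, so the sketch leaves it out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* A 256-bit vector modelled as eight 32-bit elements.  */
typedef struct { uint32_t e[8]; } vec256;

/* Illustrative helper: vector with elements base, base+1, ..., base+7.  */
static vec256 sample_vec (uint32_t base)
{
  vec256 v;
  for (int i = 0; i < 8; i++)
    v.e[i] = base + (uint32_t) i;
  return v;
}

static bool vec_equal (vec256 x, vec256 y)
{
  return memcmp (x.e, y.e, sizeof x.e) == 0;
}

/* Immediate as computed in the quoted hunk: sel3 stands for operands[3]
   (0..3) and sel5 for operands[5] (4..7), both indexing 64-bit elements
   of the concatenation of the two inputs.  */
static int shuf64x2_mask (int sel3, int sel5)
{
  int mask = sel3 / 2;
  mask |= (sel5 - 4) / 2 << 1;
  return mask;
}

/* Assumed model of 256-bit vshuf64x2: bit 0 of imm picks the 128-bit lane
   of a for the low half, bit 1 picks the lane of b for the high half.  */
static vec256 model_vshuf64x2 (vec256 a, vec256 b, int imm)
{
  vec256 r;
  memcpy (&r.e[0], &a.e[(imm & 1) ? 4 : 0], 4 * sizeof (uint32_t));
  memcpy (&r.e[4], &b.e[(imm & 2) ? 4 : 0], 4 * sizeof (uint32_t));
  return r;
}

/* Assumed model of vblendps: bit i of imm takes element i from b,
   otherwise from a.  */
static vec256 model_vblendps (vec256 a, vec256 b, int imm)
{
  vec256 r;
  for (int i = 0; i < 8; i++)
    r.e[i] = ((imm >> i) & 1) ? b.e[i] : a.e[i];
  return r;
}

/* Simplified form of the patched condition (one term omitted, see the
   lead-in): the shortcut is only taken for alternative 0, the one with
   the "x" and "xjm" constraints.  */
static bool may_use_vblendps (int imm, int which_alternative)
{
  return imm == 2 && which_alternative == 0;
}
```

With these models, model_vshuf64x2 (a, b, 2) and model_vblendps (a, b, 240) agree for any inputs: the low 128 bits come from a and the high 128 bits from b. vblendps has no EVEX encoding, so it cannot name ymm16 and above, which is why the shortcut is tied to the alternative that uses "x" rather than "v".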