From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jakub@redhat.com>
Date: Fri, 23 Apr 2021 11:13:36 +0200
From: Jakub Jelinek <jakub@redhat.com>
To: Hongtao Liu <crazylht@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] [i386] Optimize __builtin_shuffle when it's used to zero
 the upper bits of the dest. [PR target/94680]
Message-ID: <20210423091336.GX1179226@tucnak>
Reply-To: Jakub Jelinek <jakub@redhat.com>
References: <CAMZc-byCJGFwdui=Zj7Gkcub9=QaUACOAn1YtozbcPR5JLr0DQ@mail.gmail.com>
MIME-Version: 1.0
In-Reply-To: <CAMZc-byCJGFwdui=Zj7Gkcub9=QaUACOAn1YtozbcPR5JLr0DQ@mail.gmail.com>
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Fri, Apr 23, 2021 at 12:53:58PM +0800, Hongtao Liu via Gcc-patches wrote:
> +      if (!CONST_INT_P (er))
> +	return 0;
> +      ei = INTVAL (er);
> +      if (i < nelt2 && ei != i)
> +	return 0;
> +      if (i >= nelt2
> +      	 && (ei < nelt || ei >= nelt<<1))

Formatting:
1) The continuation line is indented with spaces followed by a tab; remove
   the spaces.  That said,
      if (i >= nelt2 && (ei < nelt || ei >= nelt << 1))
   fits on one line, so keep it on one line.
2) nelt<<1 should be nelt << 1, with spaces around the <<.
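
For reference, here is roughly how the check reads with both points applied.
This is only a standalone sketch: it works on plain ints instead of CONST_INT
rtxes, and the function name and the perm/nelt parameters are made up for
illustration rather than taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the quoted check.  PERM holds the NELT
   selector values as plain ints rather than CONST_INT rtxes.  */
static bool
zero_upper_half_perm_p (const int *perm, int nelt)
{
  int nelt2 = nelt / 2;

  for (int i = 0; i < nelt; i++)
    {
      int ei = perm[i];

      /* The low half must be the identity on the first operand.  */
      if (i < nelt2 && ei != i)
        return false;
      /* The high half may pick any element of the second operand.  */
      if (i >= nelt2 && (ei < nelt || ei >= nelt << 1))
        return false;
    }
  return true;
}
```

Since << binds tighter than >=, no extra parentheses are needed around
nelt << 1.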

> -(define_insn "*vec_concatv4si_0"
> -  [(set (match_operand:V4SI 0 "register_operand"       "=v,x")
> -	(vec_concat:V4SI
> -	  (match_operand:V2SI 1 "nonimmediate_operand" "vm,?!*y")
> -	  (match_operand:V2SI 2 "const0_operand"       " C,C")))]
> +(define_insn "*vec_concat<mode>_0"
> +  [(set (match_operand:VI124_128 0 "register_operand"       "=v,x")
> +	(vec_concat:VI124_128
> +	  (match_operand:<ssehalfvecmode> 1 "nonimmediate_operand" "vm,?!*y")
> +	  (match_operand:<ssehalfvecmode> 2 "const0_operand"       " C,C")))]
>    "TARGET_SSE2"
>    "@
>     %vmovq\t{%1, %0|%0, %1}
> @@ -22154,6 +22157,24 @@ (define_insn "avx_vec_concat<mode>"
>     (set_attr "prefix" "maybe_evex")
>     (set_attr "mode" "<sseinsnmode>")])
>  
> +(define_insn_and_split "*vec_concat<mode>_0"

It would be better to use a different pattern name; *vec_concat<mode>_0
is already used by the define_insn above.
Could you add some additional suffix after _0?

> +  return __builtin_shuffle (x, (v32qi) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					 0, 0, 0, 0, 0, 0, 0, 0,
> +					 0, 0, 0, 0, 0, 0, 0, 0,
> +					 0, 0, 0, 0, 0, 0, 0, 0 },
> +			       (v32qi) { 0, 1, 2, 3, 4, 5, 6, 7,
> +					 8, 9, 10, 11, 12, 13, 14, 15,
> +					 32, 49, 34, 58, 36, 53, 38, 39,
> +					 40, 60, 42, 43, 63, 45, 46, 47 });

In this testcase the shuffles in the part taking indexes from the zero
vector are nicely randomized.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-pr94680.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512bw -mavx512vbmi -O2" } */
> +/* { dg-final { scan-assembler-times {(?n)vmov[a-z0-9]*[ \t]*%ymm[0-9]} 6} } */
> +/* { dg-final { scan-assembler-not "pxor" } } */
> +
> +
> +typedef float v16sf __attribute__((vector_size(64)));
> +typedef double v8df __attribute__ ((vector_size (64)));
> +typedef long long v8di __attribute__((vector_size(64)));
> +typedef int v16si __attribute__((vector_size(64)));
> +typedef short v32hi __attribute__ ((vector_size (64)));
> +typedef char v64qi __attribute__ ((vector_size (64)));
> +
> +v8df
> +foo_v8df (v8df x)
> +{
> +  return __builtin_shuffle (x, (v8df) { 0, 0, 0, 0, 0, 0, 0, 0 },
> +			    (v8di) { 0, 1, 2, 3, 8, 9, 10, 11 });
> +}
> +
> +v8di
> +foo_v8di (v8di x)
> +{
> +  return __builtin_shuffle (x, (v8di) { 0, 0, 0, 0, 0, 0, 0, 0 },
> +			    (v8di) { 0, 1, 2, 3, 8, 9, 10, 11 });
> +}
> +
> +v16sf
> +foo_v16sf (v16sf x)
> +{
> +  return __builtin_shuffle (x, (v16sf)  { 0, 0, 0, 0, 0, 0, 0, 0,
> +					   0, 0, 0, 0, 0, 0, 0, 0 },
> +			       (v16si) { 0, 1, 2, 3, 4, 5, 6, 7,
> +					 16, 17, 18, 19, 20, 21, 22, 23 });
> +}
> +
> +v16si
> +foo_v16si (v16si x)
> +{
> +    return __builtin_shuffle (x, (v16si)  { 0, 0, 0, 0, 0, 0, 0, 0,
> +					   0, 0, 0, 0, 0, 0, 0, 0 },
> +			       (v16si) { 0, 1, 2, 3, 4, 5, 6, 7,
> +					 16, 17, 18, 19, 20, 21, 22, 23 });
> +}
> +
> +v32hi
> +foo_v32hi (v32hi x)
> +{
> +  return __builtin_shuffle (x, (v32hi) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					 0, 0, 0, 0, 0, 0, 0, 0,
> +					 0, 0, 0, 0, 0, 0, 0, 0,
> +					 0, 0, 0, 0, 0, 0, 0, 0 },
> +			       (v32hi) { 0, 1, 2, 3, 4, 5, 6, 7,
> +					 8, 9, 10, 11, 12, 13, 14, 15,
> +					 32, 33, 34, 35, 36, 37, 38, 39,
> +					 40,41, 42, 43, 44, 45, 46, 47 });
> +}
> +
> +v64qi
> +foo_v64qi (v64qi x)
> +{
> +  return __builtin_shuffle (x, (v64qi) { 0, 0, 0, 0, 0, 0, 0, 0,
> +					 0, 0, 0, 0, 0, 0, 0, 0,
> +					 0, 0, 0, 0, 0, 0, 0, 0,
> +					 0, 0, 0, 0, 0, 0, 0, 0,
> +					 0, 0, 0, 0, 0, 0, 0, 0,
> +					 0, 0, 0, 0, 0, 0, 0, 0,
> +					 0, 0, 0, 0, 0, 0, 0, 0,
> +					 0, 0, 0, 0, 0, 0, 0, 0 },
> +			       (v64qi) {0, 1, 2, 3, 4, 5, 6, 7,
> +					  8, 9, 10, 11, 12, 13, 14, 15,
> +					  16, 17, 18, 19, 20, 21, 22, 23,
> +					  24, 25, 26, 27, 28, 29, 30, 31,
> +					  64, 65, 66, 67, 68, 69, 70, 71,
> +					  72, 73, 74, 75, 76, 77, 78, 79,
> +					  80, 81, 82, 83, 84, 85, 86, 87,
> +					  88, 89, 90, 91, 92, 93, 94, 95 });

Can't you randomize at least some of these a little bit too?
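
Something along these lines is what I have in mind, as a sketch only.  The
foo_v8di_rand name and the particular index order are arbitrary choices of
mine, not part of the patch.  Every element of the second operand is zero, so
any index in the range 8..15 in the upper half should give the same result.

```c
typedef long long v8di __attribute__ ((vector_size (64)));

/* Upper-half indexes as in the quoted test: 8, 9, 10, 11.  */
v8di
foo_v8di (v8di x)
{
  return __builtin_shuffle (x, (v8di) { 0, 0, 0, 0, 0, 0, 0, 0 },
                            (v8di) { 0, 1, 2, 3, 8, 9, 10, 11 });
}

/* Hypothetical variant: any index in 8..15 still selects a zero
   element, so the upper half can be shuffled freely.  */
v8di
foo_v8di_rand (v8di x)
{
  return __builtin_shuffle (x, (v8di) { 0, 0, 0, 0, 0, 0, 0, 0 },
                            (v8di) { 0, 1, 2, 3, 13, 8, 15, 10 });
}
```

Both functions return the low four elements of x followed by four zeros, so
the same scan-assembler pattern ought to apply to the randomized form if the
new pattern matches it.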

Also, what happens with __builtin_shuffle (zero_vector, x, ...), i.e. when
you swap the two vectors and adjust the permutation correspondingly?
Will that also be recognized, or do we just punt on those?
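
To make the swapped case concrete, here is a small sketch using a 128-bit
vector for brevity.  The zero_second and zero_first names are made up for
illustration.  With the zero vector as the first operand, indexes 0..3 refer
to the zeros and 4..7 refer to x, so the selector has to be adjusted as shown.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* Zero vector second: indexes 0..3 select from X, 4..7 from the zeros.  */
v4si
zero_second (v4si x)
{
  return __builtin_shuffle (x, (v4si) { 0, 0, 0, 0 },
                            (v4si) { 0, 1, 4, 5 });
}

/* Zero vector first: indexes 0..3 now select from the zeros,
   4..7 from X.  */
v4si
zero_first (v4si x)
{
  return __builtin_shuffle ((v4si) { 0, 0, 0, 0 }, x,
                            (v4si) { 4, 5, 0, 1 });
}
```

Both forms compute { x[0], x[1], 0, 0 }, so ideally they would end up with
the same code.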

	Jakub