Message-ID: <1db34248-452c-cfdd-dfeb-ae49ad85e403@linux.ibm.com>
Date: Thu, 3 Aug 2023 09:17:11 +0200
Subject: Re: [PATCH] s390: Try to emit vlbr/vstbr instead of vperm et al.
To: Stefan Schulze Frielinghaus, gcc-patches@gcc.gnu.org
From: Andreas Krebbel
In-Reply-To: <20230803065059.951867-2-stefansf@linux.ibm.com>

On 8/3/23 08:51, Stefan Schulze Frielinghaus wrote:
> Bootstrapped and regtested on s390x.  Ok for mainline?
>
> gcc/ChangeLog:
>
>         * config/s390/s390.cc (expand_perm_as_a_vlbr_vstbr_candidate):
>         New function which handles bswap patterns for vec_perm_const.
>         (vectorize_vec_perm_const_1): Call new function.
>         * config/s390/vector.md (*bswap): Fix operands in output
>         template.
>         (*vstbr): New insn.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/s390/s390.exp: Add subdirectory vxe2.
>         * gcc.target/s390/vxe2/vlbr-1.c: New test.
>         * gcc.target/s390/vxe2/vstbr-1.c: New test.
>         * gcc.target/s390/vxe2/vstbr-2.c: New test.

Ok. Thanks!

Andreas

> ---
>  gcc/config/s390/s390.cc                      | 55 ++++++++++++++++++++
>  gcc/config/s390/vector.md                    | 16 ++++--
>  gcc/testsuite/gcc.target/s390/s390.exp       |  3 ++
>  gcc/testsuite/gcc.target/s390/vxe2/vlbr-1.c  | 29 +++++++++++
>  gcc/testsuite/gcc.target/s390/vxe2/vstbr-1.c | 29 +++++++++++
>  gcc/testsuite/gcc.target/s390/vxe2/vstbr-2.c | 42 +++++++++++++++
>  6 files changed, 170 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/vxe2/vlbr-1.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/vxe2/vstbr-1.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/vxe2/vstbr-2.c
>
> diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> index d9f10542473..91eb9232b10 100644
> --- a/gcc/config/s390/s390.cc
> +++ b/gcc/config/s390/s390.cc
> @@ -17698,6 +17698,58 @@ expand_perm_with_vstbrq (const struct expand_vec_perm_d &d)
>    return false;
>  }
>
> +/* Try to emit vlbr/vstbr.  Note, this is only a candidate insn since
> +   TARGET_VECTORIZE_VEC_PERM_CONST operates on vector registers only.  Thus,
> +   either fwprop, combine et al. "fixes" one of the input/output operands into
> +   a memory operand or a splitter has to reverse this into a general vperm
> +   operation.  */
> +
> +static bool
> +expand_perm_as_a_vlbr_vstbr_candidate (const struct expand_vec_perm_d &d)
> +{
> +  static const char perm[4][MAX_VECT_LEN]
> +    = { { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 },
> +        { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 },
> +        { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 },
> +        { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 } };
> +
> +  if (!TARGET_VXE2 || d.vmode != V16QImode || d.op0 != d.op1)
> +    return false;
> +
> +  if (memcmp (d.perm, perm[0], MAX_VECT_LEN) == 0)
> +    {
> +      rtx target = gen_rtx_SUBREG (V8HImode, d.target, 0);
> +      rtx op0 = gen_rtx_SUBREG (V8HImode, d.op0, 0);
> +      emit_insn (gen_bswapv8hi (target, op0));
> +      return true;
> +    }
> +
> +  if (memcmp (d.perm, perm[1], MAX_VECT_LEN) == 0)
> +    {
> +      rtx target = gen_rtx_SUBREG (V4SImode, d.target, 0);
> +      rtx op0 = gen_rtx_SUBREG (V4SImode, d.op0, 0);
> +      emit_insn (gen_bswapv4si (target, op0));
> +      return true;
> +    }
> +
> +  if (memcmp (d.perm, perm[2], MAX_VECT_LEN) == 0)
> +    {
> +      rtx target = gen_rtx_SUBREG (V2DImode, d.target, 0);
> +      rtx op0 = gen_rtx_SUBREG (V2DImode, d.op0, 0);
> +      emit_insn (gen_bswapv2di (target, op0));
> +      return true;
> +    }
> +
> +  if (memcmp (d.perm, perm[3], MAX_VECT_LEN) == 0)
> +    {
> +      rtx target = gen_rtx_SUBREG (V1TImode, d.target, 0);
> +      rtx op0 = gen_rtx_SUBREG (V1TImode, d.op0, 0);
> +      emit_insn (gen_bswapv1ti (target, op0));
> +      return true;
> +    }
> +
> +  return false;
> +}
>
>  /* Try to find the best sequence for the vector permute operation
>     described by D.  Return true if the operation could be
> @@ -17720,6 +17772,9 @@ vectorize_vec_perm_const_1 (const struct expand_vec_perm_d &d)
>    if (expand_perm_with_rot (d))
>      return true;
>
> +  if (expand_perm_as_a_vlbr_vstbr_candidate (d))
> +    return true;
> +
>    return false;
>  }
>
> diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
> index 21bec729efa..f0e9ed3d263 100644
> --- a/gcc/config/s390/vector.md
> +++ b/gcc/config/s390/vector.md
> @@ -47,6 +47,7 @@
>  (define_mode_iterator VI_HW [V16QI V8HI V4SI V2DI])
>  (define_mode_iterator VI_HW_QHS [V16QI V8HI V4SI])
>  (define_mode_iterator VI_HW_HSD [V8HI V4SI V2DI])
> +(define_mode_iterator VI_HW_HSDT [V8HI V4SI V2DI V1TI TI])
>  (define_mode_iterator VI_HW_HS [V8HI V4SI])
>  (define_mode_iterator VI_HW_QH [V16QI V8HI])
>
> @@ -2876,12 +2877,12 @@
>           (use (match_dup 2))])]
>    "TARGET_VX"
>  {
> -  static char p[4][16] =
> +  static const char p[4][16] =
>      { { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 },   /* H */
>        { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 },   /* S */
>        { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 },   /* D */
>        { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 } }; /* T */
> -  char *perm;
> +  const char *perm;
>    rtx perm_rtx[16];
>
>    switch (GET_MODE_SIZE (GET_MODE_INNER (mode)))
> @@ -2933,8 +2934,8 @@
>    "TARGET_VXE2"
>    "@
>     #
> -   vlbr\t%v0,%v1
> -   vstbr\t%v1,%v0"
> +   vlbr\t%v0,%1
> +   vstbr\t%v1,%0"
>    "&& reload_completed
>     && !memory_operand (operands[0], mode)
>     && !memory_operand (operands[1], mode)"
> @@ -2947,6 +2948,13 @@
>    ""
>    [(set_attr "op_type" "*,VRX,VRX")])
>
> +(define_insn "*vstbr"
> +  [(set (match_operand:VI_HW_HSDT 0 "memory_operand" "=R")
> +        (bswap:VI_HW_HSDT (match_operand:VI_HW_HSDT 1 "register_operand" "v")))]
> +  "TARGET_VXE2"
> +  "vstbr\t%v1,%0"
> +  [(set_attr "op_type" "VRX")])
> +
>  ;
>  ; Implement len_load/len_store optabs with vll/vstl.
>  (define_expand "len_load_v16qi"
> diff --git a/gcc/testsuite/gcc.target/s390/s390.exp b/gcc/testsuite/gcc.target/s390/s390.exp
> index 58258492f83..a2b48eed5f2 100644
> --- a/gcc/testsuite/gcc.target/s390/s390.exp
> +++ b/gcc/testsuite/gcc.target/s390/s390.exp
> @@ -254,6 +254,9 @@ dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/arch13/*.{c,S}]] \
>  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vxe/*.{c,S}]] \
>          "" "-O3 -march=arch12 -mzarch"
>
> +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vxe2/*.{c,S}]] \
> +        "" "-O3 -march=arch13 -mzarch"
> +
>  # Some md tests require libatomic
>  atomic_init
>  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/md/*.{c,S}]] \
> diff --git a/gcc/testsuite/gcc.target/s390/vxe2/vlbr-1.c b/gcc/testsuite/gcc.target/s390/vxe2/vlbr-1.c
> new file mode 100644
> index 00000000000..34fd1db23e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/s390/vxe2/vlbr-1.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-final { scan-assembler {\tvlbrh\t} } } */
> +/* { dg-final { scan-assembler {\tvlbrf\t} } } */
> +/* { dg-final { scan-assembler {\tvlbrg\t} } } */
> +/* { dg-final { scan-assembler-not {\tvperm\t} } } */
> +
> +/* The addend X ensures that a LOAD REVERSE and not a STORE REVERSE is
> +   emitted.  */
> +
> +void
> +vlbrh (unsigned short *a, unsigned short x)
> +{
> +  for (int i = 0; i < 128; ++i)
> +    a[i] = __builtin_bswap16 (a[i]) + x;
> +}
> +
> +void
> +vlbrf (unsigned int *a, unsigned int x)
> +{
> +  for (int i = 0; i < 128; ++i)
> +    a[i] = __builtin_bswap32 (a[i]) + x;
> +}
> +
> +void
> +vlbrg (unsigned long long *a, unsigned long long x)
> +{
> +  for (int i = 0; i < 128; ++i)
> +    a[i] = __builtin_bswap64 (a[i]) + x;
> +}
> diff --git a/gcc/testsuite/gcc.target/s390/vxe2/vstbr-1.c b/gcc/testsuite/gcc.target/s390/vxe2/vstbr-1.c
> new file mode 100644
> index 00000000000..38947d12380
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/s390/vxe2/vstbr-1.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-final { scan-assembler {\tvstbrh\t} } } */
> +/* { dg-final { scan-assembler {\tvstbrf\t} } } */
> +/* { dg-final { scan-assembler {\tvstbrg\t} } } */
> +/* { dg-final { scan-assembler-not {\tvperm\t} } } */
> +
> +/* The addend X ensures that a STORE REVERSE and not a LOAD REVERSE is
> +   emitted.  */
> +
> +void
> +vlbrh (unsigned short *a, unsigned short x)
> +{
> +  for (int i = 0; i < 128; ++i)
> +    a[i] = __builtin_bswap16 (a[i] + x);
> +}
> +
> +void
> +vlbrf (unsigned int *a, unsigned int x)
> +{
> +  for (int i = 0; i < 128; ++i)
> +    a[i] = __builtin_bswap32 (a[i] + x);
> +}
> +
> +void
> +vlbrg (unsigned long long *a, unsigned long long x)
> +{
> +  for (int i = 0; i < 128; ++i)
> +    a[i] = __builtin_bswap64 (a[i] + x);
> +}
> diff --git a/gcc/testsuite/gcc.target/s390/vxe2/vstbr-2.c b/gcc/testsuite/gcc.target/s390/vxe2/vstbr-2.c
> new file mode 100644
> index 00000000000..65d2e45381c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/s390/vxe2/vstbr-2.c
> @@ -0,0 +1,42 @@
> +/* { dg-do compile } */
> +/* { dg-final { scan-assembler {\tvstbrh\t} } } */
> +/* { dg-final { scan-assembler {\tvstbrf\t} } } */
> +/* { dg-final { scan-assembler {\tvstbrg\t} } } */
> +/* { dg-final { scan-assembler-not {\tvperm\t} } } */
> +
> +typedef unsigned short __attribute__ ((vector_size (16))) V8HI;
> +typedef unsigned int __attribute__ ((vector_size (16))) V4SI;
> +typedef unsigned long long __attribute__ ((vector_size (16))) V2DI;
> +
> +void
> +vstbrh (V8HI *p, V8HI x)
> +{
> +  V8HI y;
> +
> +  for (int i = 0; i < 8; ++i)
> +    y[i] = __builtin_bswap16 (x[i]);
> +
> +  *p = y;
> +}
> +
> +void
> +vstbrf (V4SI *p, V4SI x)
> +{
> +  V4SI y;
> +
> +  for (int i = 0; i < 4; ++i)
> +    y[i] = __builtin_bswap32 (x[i]);
> +
> +  *p = y;
> +}
> +
> +void
> +vstbrg (V2DI *p, V2DI x)
> +{
> +  V2DI y;
> +
> +  for (int i = 0; i < 2; ++i)
> +    y[i] = __builtin_bswap64 (x[i]);
> +
> +  *p = y;
> +}
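
Each row of perm[] in expand_perm_as_a_vlbr_vstbr_candidate (and of p[] in the vector.md expander) reverses the bytes inside every 2-, 4-, 8- or 16-byte lane of a 16-byte vector, which is why a single-input vec_perm_const with one of these selectors can be expanded as a bswap. The following is only a host-side sketch in plain C, not GCC internals: it regenerates the selectors and cross-checks the 2/4/8-byte rows against __builtin_bswap16/32/64. The helpers make_bswap_perm, apply_perm and perm_matches_bswap are hypothetical names chosen for illustration, and the 16-byte row is generated but not cross-checked against a builtin.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill PERM with the single-input byte selector that
   reverses the bytes inside every ELT_SIZE-byte lane of a 16-byte vector.
   ELT_SIZE 2, 4, 8 and 16 reproduce the four rows of perm[] in the patch.  */
static void
make_bswap_perm (unsigned char perm[16], int elt_size)
{
  assert (elt_size > 0 && 16 % elt_size == 0);
  for (int i = 0; i < 16; ++i)
    {
      int lane = i / elt_size;  /* Index of the element containing byte I.  */
      int pos = i % elt_size;   /* Byte position of I inside that element.  */
      perm[i] = (unsigned char) (lane * elt_size + (elt_size - 1 - pos));
    }
}

/* Model of vec_perm_const with op0 == op1: out[i] = in[perm[i]].  */
static void
apply_perm (unsigned char out[16], const unsigned char in[16],
            const unsigned char perm[16])
{
  for (int i = 0; i < 16; ++i)
    out[i] = in[perm[i]];
}

/* Return 1 if permuting a test pattern with the ELT_SIZE selector yields the
   same bytes as byte-swapping each lane with __builtin_bswap{16,32,64}.
   Only ELT_SIZE 2, 4 and 8 are checked; any other size returns 0.  */
static int
perm_matches_bswap (int elt_size)
{
  unsigned char in[16], by_perm[16], by_bswap[16], perm[16];

  if (elt_size != 2 && elt_size != 4 && elt_size != 8)
    return 0;

  for (int i = 0; i < 16; ++i)
    in[i] = (unsigned char) (0xa0 + i);

  make_bswap_perm (perm, elt_size);
  apply_perm (by_perm, in, perm);

  /* Swap each lane through an integer in host byte order; storing the
     swapped value back reverses its bytes in memory on any endianness.  */
  for (int off = 0; off < 16; off += elt_size)
    {
      if (elt_size == 2)
        {
          uint16_t v;
          memcpy (&v, in + off, sizeof v);
          v = __builtin_bswap16 (v);
          memcpy (by_bswap + off, &v, sizeof v);
        }
      else if (elt_size == 4)
        {
          uint32_t v;
          memcpy (&v, in + off, sizeof v);
          v = __builtin_bswap32 (v);
          memcpy (by_bswap + off, &v, sizeof v);
        }
      else
        {
          uint64_t v;
          memcpy (&v, in + off, sizeof v);
          v = __builtin_bswap64 (v);
          memcpy (by_bswap + off, &v, sizeof v);
        }
    }

  return memcmp (by_perm, by_bswap, sizeof by_perm) == 0;
}
```

For example, make_bswap_perm (p, 4) fills p with { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }, i.e. perm[1], and perm_matches_bswap (4) returns 1. The model says nothing about which instruction the compiler finally selects; per the comment in the patch, that still depends on later passes turning an operand into a memory reference.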