From mboxrd@z Thu Jan 1 00:00:00 1970
From: Stefan Schulze Frielinghaus
To: krebbel@linux.ibm.com, gcc-patches@gcc.gnu.org
Cc: Stefan Schulze Frielinghaus
Subject: [PATCH 3/3] s390: Revise vector reverse elements
Date: Thu, 9 Nov 2023 09:22:11 +0100
Message-ID: <20231109082211.2505-3-stefansf@linux.ibm.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231109082211.2505-1-stefansf@linux.ibm.com>
References: <20231109082211.2505-1-stefansf@linux.ibm.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Replace UNSPEC_VEC_ELTSWAP with a vec_select implementation.
Furthermore, for a vector reverse elements operation between registers
of mode V8HI, perform three rotates instead of a vperm operation, since
the latter involves loading the permutation vector from the literal
pool.

Prior to z15, for a load operation prefer
  vl + vpdi (+ verllg (+ verllf))
over
  larl + vl + vl + vperm.

Likewise, prior to z15, for a store operation prefer
  vpdi (+ verllg (+ verllf)) + vst
over
  larl + vl + vperm + vst.

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

	* config/s390/s390.md: Remove UNSPEC_VEC_ELTSWAP.
	* config/s390/vector.md (eltswapv16qi): New expander.
	(*eltswapv16qi): New insn and splitter.
	(eltswapv8hi): New insn and splitter.
	(eltswap): New insn and splitter for modes V_HW_4 as well as
	V_HW_2.
	* config/s390/vx-builtins.md (eltswap): Remove.
	(*eltswapv16qi): Remove.
	(*eltswap): Remove.
	(*eltswap_emu): Remove.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/zvector/vec-reve-load-halfword-z14.c: Remove
	vperm and substitute by vpdi et al.
	* gcc.target/s390/zvector/vec-reve-load-halfword.c: Likewise.
	* gcc.target/s390/vector/reverse-elements-1.c: New test.
	* gcc.target/s390/vector/reverse-elements-2.c: New test.
	* gcc.target/s390/vector/reverse-elements-3.c: New test.
	* gcc.target/s390/vector/reverse-elements-4.c: New test.
	* gcc.target/s390/vector/reverse-elements-5.c: New test.
	* gcc.target/s390/vector/reverse-elements-6.c: New test.
	* gcc.target/s390/vector/reverse-elements-7.c: New test.
---
 gcc/config/s390/s390.md | 2 -
 gcc/config/s390/vector.md | 146 ++++++++++++++++++
 gcc/config/s390/vx-builtins.md | 143 -----------------
 .../s390/vector/reverse-elements-1.c | 46 ++++++
 .../s390/vector/reverse-elements-2.c | 16 ++
 .../s390/vector/reverse-elements-3.c | 56 +++++++
 .../s390/vector/reverse-elements-4.c | 67 ++++++++
 .../s390/vector/reverse-elements-5.c | 56 +++++++
 .../s390/vector/reverse-elements-6.c | 67 ++++++++
 .../s390/vector/reverse-elements-7.c | 67 ++++++++
 .../s390/zvector/vec-reve-load-halfword-z14.c | 4 +-
 .../s390/zvector/vec-reve-load-halfword.c | 4 +-
 12 files changed, 527 insertions(+), 147 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-2.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-3.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-4.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-5.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-6.c
 create mode 100644 gcc/testsuite/gcc.target/s390/vector/reverse-elements-7.c

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 3f29ba21442..f5e559c1ba4 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -241,8 +241,6 @@ UNSPEC_VEC_VFMIN UNSPEC_VEC_VFMAX - 
UNSPEC_VEC_ELTSWAP - UNSPEC_NNPA_VCLFNHS_V8HI UNSPEC_NNPA_VCLFNLS_V8HI UNSPEC_NNPA_VCRNFS_V8HI diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md index 7d1eb36e844..c478fce09df 100644 --- a/gcc/config/s390/vector.md +++ b/gcc/config/s390/vector.md @@ -948,6 +948,152 @@ operands[5] = simplify_gen_subreg (DFmode, operands[1], TFmode, 8); }) +;; VECTOR REVERSE ELEMENTS V16QI + +(define_expand "eltswapv16qi" + [(parallel + [(set (match_operand:V16QI 0 "nonimmediate_operand") + (vec_select:V16QI + (match_operand:V16QI 1 "nonimmediate_operand") + (match_dup 2))) + (use (match_dup 3))])] + "TARGET_VX" +{ + rtvec vec = rtvec_alloc (16); + for (int i = 0; i < 16; ++i) + RTVEC_ELT (vec, i) = GEN_INT (15 - i); + operands[2] = gen_rtx_PARALLEL (VOIDmode, vec); + operands[3] = gen_rtx_CONST_VECTOR (V16QImode, vec); +}) + +(define_insn_and_split "*eltswapv16qi" + [(set (match_operand:V16QI 0 "nonimmediate_operand" "=v,^R,^v") + (vec_select:V16QI + (match_operand:V16QI 1 "nonimmediate_operand" "v,^v,^R") + (parallel [(const_int 15) + (const_int 14) + (const_int 13) + (const_int 12) + (const_int 11) + (const_int 10) + (const_int 9) + (const_int 8) + (const_int 7) + (const_int 6) + (const_int 5) + (const_int 4) + (const_int 3) + (const_int 2) + (const_int 1) + (const_int 0)]))) + (use (match_operand:V16QI 2 "permute_pattern_operand" "v,X,X"))] + "TARGET_VX" + "@ + # + vstbrq\t%v1,%0 + vlbrq\t%v0,%1" + "&& reload_completed && REG_P (operands[0]) && REG_P (operands[1])" + [(set (match_dup 0) + (unspec:V16QI [(match_dup 1) + (match_dup 1) + (match_dup 2)] + UNSPEC_VEC_PERM))] + "" + [(set_attr "cpu_facility" "*,vxe2,vxe2") + (set_attr "op_type" "*,VRX,VRX")]) + +;; VECTOR REVERSE ELEMENTS V8HI + +(define_insn_and_split "eltswapv8hi" + [(set (match_operand:V8HI 0 "nonimmediate_operand" "=v,R,v") + (vec_select:V8HI + (match_operand:V8HI 1 "nonimmediate_operand" "v,v,R") + (parallel [(const_int 7) + (const_int 6) + (const_int 5) + (const_int 4) + (const_int 3) + 
(const_int 2) + (const_int 1) + (const_int 0)]))) + (clobber (match_scratch:V2DI 2 "=&v,X,X")) + (clobber (match_scratch:V4SI 3 "=&v,X,X"))] + "TARGET_VX" + "@ + # + vsterh\t%v1,%0 + vlerh\t%v0,%1" + "&& reload_completed && REG_P (operands[0]) && REG_P (operands[1])" + [(set (match_dup 2) + (subreg:V2DI (match_dup 1) 0)) + (set (match_dup 2) + (vec_select:V2DI + (match_dup 2) + (parallel [(const_int 1) (const_int 0)]))) + (set (match_dup 2) + (rotate:V2DI + (match_dup 2) + (const_int 32))) + (set (match_dup 3) + (subreg:V4SI (match_dup 2) 0)) + (set (match_dup 3) + (rotate:V4SI + (match_dup 3) + (const_int 16))) + (set (match_dup 0) + (subreg:V8HI (match_dup 3) 0))] + "" + [(set_attr "cpu_facility" "*,vxe2,vxe2") + (set_attr "op_type" "*,VRX,VRX")]) + +;; VECTOR REVERSE ELEMENTS V4SI / V4SF + +(define_insn_and_split "eltswap" + [(set (match_operand:V_HW_4 0 "nonimmediate_operand" "=v,R,v") + (vec_select:V_HW_4 + (match_operand:V_HW_4 1 "nonimmediate_operand" "v,v,R") + (parallel [(const_int 3) + (const_int 2) + (const_int 1) + (const_int 0)]))) + (clobber (match_scratch:V2DI 2 "=&v,X,X"))] + "TARGET_VX" + "@ + # + vsterf\t%v1,%0 + vlerf\t%v0,%1" + "&& reload_completed && REG_P (operands[0]) && REG_P (operands[1])" + [(set (match_dup 2) + (subreg:V2DI (match_dup 1) 0)) + (set (match_dup 2) + (vec_select:V2DI + (match_dup 2) + (parallel [(const_int 1) (const_int 0)]))) + (set (match_dup 2) + (rotate:V2DI + (match_dup 2) + (const_int 32))) + (set (match_dup 0) + (subreg:V_HW_4 (match_dup 2) 0))] + "" + [(set_attr "cpu_facility" "*,vxe2,vxe2") + (set_attr "op_type" "*,VRX,VRX")]) + +;; VECTOR REVERSE ELEMENTS V2DI / V2DF + +(define_insn "eltswap" + [(set (match_operand:V_HW_2 0 "nonimmediate_operand" "=v,R,v") + (vec_select:V_HW_2 + (match_operand:V_HW_2 1 "nonimmediate_operand" "v,v,R") + (parallel [(const_int 1) + (const_int 0)])))] + "TARGET_VX" + "@ + vpdi\t%v0,%v1,%v1,4 + vsterg\t%v1,%0 + vlerg\t%v0,%1" + [(set_attr "cpu_facility" "vx,vxe2,vxe2") + (set_attr 
"op_type" "VRR,VRX,VRX")]) ;; ;; Vector integer arithmetic instructions diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md index 10eae76777f..6f42c91e8ae 100644 --- a/gcc/config/s390/vx-builtins.md +++ b/gcc/config/s390/vx-builtins.md @@ -2163,149 +2163,6 @@ "fmaxb\t%v0,%v1,%v2,%b3" [(set_attr "op_type" "VRR")]) -; The element reversal builtins introduced with z15 have been made -; available also for older CPUs down to z13. -(define_expand "eltswap" - [(set (match_operand:VEC_HW 0 "nonimmediate_operand" "") - (unspec:VEC_HW [(match_operand:VEC_HW 1 "nonimmediate_operand" "")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VX") - -; The byte element reversal is implemented as 128 bit byte swap. -; Alternatively this could be emitted as bswap:V1TI but the required -; subregs appear to confuse combine. -(define_insn "*eltswapv16qi" - [(set (match_operand:V16QI 0 "nonimmediate_operand" "=v,v,R") - (unspec:V16QI [(match_operand:V16QI 1 "nonimmediate_operand" "v,R,v")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VXE2" - "@ - # - vlbrq\t%v0,%v1 - vstbrq\t%v1,%v0" - [(set_attr "op_type" "*,VRX,VRX")]) - -; vlerh, vlerf, vlerg, vsterh, vsterf, vsterg -(define_insn "*eltswap" - [(set (match_operand:V_HW_HSD 0 "nonimmediate_operand" "=v,v,R") - (unspec:V_HW_HSD [(match_operand:V_HW_HSD 1 "nonimmediate_operand" "v,R,v")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VXE2" - "@ - # - vler\t%v0,%v1 - vster\t%v1,%v0" - [(set_attr "op_type" "*,VRX,VRX")]) - -; The emulation pattern below will also accept -; vst (eltswap (vl)) -; i.e. both operands in memory, which reload needs to fix. -; Split into -; vl -; vster (=vst (eltswap)) -; since we prefer vster over vler as long as the latter -; does not support alignment hints. 
-(define_split - [(set (match_operand:VEC_HW 0 "memory_operand" "") - (unspec:VEC_HW [(match_operand:VEC_HW 1 "memory_operand" "")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VXE2 && can_create_pseudo_p ()" - [(set (match_dup 2) (match_dup 1)) - (set (match_dup 0) - (unspec:VEC_HW [(match_dup 2)] UNSPEC_VEC_ELTSWAP))] -{ - operands[2] = gen_reg_rtx (mode); -}) - - -; Swapping v2df/v2di can be done via vpdi on z13 and z14. -(define_split - [(set (match_operand:V_HW_2 0 "register_operand" "") - (unspec:V_HW_2 [(match_operand:V_HW_2 1 "register_operand" "")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VX && can_create_pseudo_p ()" - [(set (match_operand:V_HW_2 0 "register_operand" "=v") - (vec_select:V_HW_2 - (vec_concat: - (match_operand:V_HW_2 1 "register_operand" "v") - (match_dup 1)) - (parallel [(const_int 1) (const_int 2)])))] -) - - -; Swapping v4df/v4si can be done via vpdi and rot. -(define_split - [(set (match_operand:V_HW_4 0 "register_operand" "") - (unspec:V_HW_4 [(match_operand:V_HW_4 1 "register_operand" "")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VX && can_create_pseudo_p ()" - [(set (match_dup 2) - (vec_select:V_HW_4 - (vec_concat: - (match_dup 1) - (match_dup 1)) - (parallel [(const_int 2) (const_int 3) (const_int 4) (const_int 5)]))) - (set (match_dup 3) - (subreg:V2DI (match_dup 2) 0)) - (set (match_dup 4) - (rotate:V2DI - (match_dup 3) - (const_int 32))) - (set (match_operand:V_HW_4 0) - (subreg:V_HW_4 (match_dup 4) 0))] -{ - operands[2] = gen_reg_rtx (mode); - operands[3] = gen_reg_rtx (V2DImode); - operands[4] = gen_reg_rtx (V2DImode); -}) - -; z15 has instructions for doing element reversal from mem to reg -; or the other way around. For reg to reg or on pre z15 machines -; we have to emulate it with vector permute. 
-(define_insn_and_split "*eltswap_emu" - [(set (match_operand:VEC_HW 0 "nonimmediate_operand" "=vR") - (unspec:VEC_HW [(match_operand:VEC_HW 1 "nonimmediate_operand" "vR")] - UNSPEC_VEC_ELTSWAP))] - "TARGET_VX && can_create_pseudo_p ()" - "#" - "&& ((!memory_operand (operands[0], mode) - && !memory_operand (operands[1], mode)) - || !TARGET_VXE2)" - [(set (match_dup 3) - (unspec:V16QI [(match_dup 4) - (match_dup 4) - (match_dup 2)] - UNSPEC_VEC_PERM)) - (set (match_dup 0) (subreg:VEC_HW (match_dup 3) 0))] -{ - static char p[4][16] = - { { 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 }, /* Q */ - { 14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1 }, /* H */ - { 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3 }, /* S */ - { 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7 } }; /* D */ - char *perm; - rtx perm_rtx[16], constv; - - switch (GET_MODE_SIZE (GET_MODE_INNER (mode))) - { - case 1: perm = p[0]; break; - case 2: perm = p[1]; break; - case 4: perm = p[2]; break; - case 8: perm = p[3]; break; - default: gcc_unreachable (); - } - - for (int i = 0; i < 16; i++) - perm_rtx[i] = GEN_INT (perm[i]); - - operands[1] = force_reg (mode, operands[1]); - operands[2] = gen_reg_rtx (V16QImode); - operands[3] = gen_reg_rtx (V16QImode); - operands[4] = simplify_gen_subreg (V16QImode, operands[1], mode, 0); - constv = force_const_mem (V16QImode, gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm_rtx))); - emit_move_insn (operands[2], constv); -}) - ; vec_insert (__builtin_bswap32 (*a), b, 1) set-element-bswap-2.c ; b[1] = __builtin_bswap32 (*a) set-element-bswap-3.c ; vlebrh, vlebrf, vlebrg diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-1.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-1.c new file mode 100644 index 00000000000..4a2541b7ae6 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-1.c @@ -0,0 +1,46 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z13" } */ +/* { 
dg-require-effective-target s390_vx } */ +/* { dg-final { scan-assembler-times {\tvpdi\t} 4 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef short __attribute__ ((vector_size (16))) V8HI; +typedef int __attribute__ ((vector_size (16))) V4SI; +typedef long long __attribute__ ((vector_size (16))) V2DI; +typedef double __attribute__ ((vector_size (16))) V2DF; + +V8HI +v8hi (V8HI x) +{ + V8HI y; + for (int i = 0; i < 8; ++i) + y[i] = x[7 - i]; + return y; +} + +V4SI +v4si (V4SI x) +{ + V4SI y; + for (int i = 0; i < 4; ++i) + y[i] = x[3 - i]; + return y; +} + +V2DI +v2di (V2DI x) +{ + V2DI y; + for (int i = 0; i < 2; ++i) + y[i] = x[1 - i]; + return y; +} + +V2DF +v2df (V2DF x) +{ + V2DF y; + for (int i = 0; i < 2; ++i) + y[i] = x[1 - i]; + return y; +} diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-2.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-2.c new file mode 100644 index 00000000000..ec0d1da7d57 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-2.c @@ -0,0 +1,16 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z14" } */ +/* { dg-require-effective-target s390_vxe } */ +/* { dg-final { scan-assembler-times {\tvpdi\t} 1 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef float __attribute__ ((vector_size (16))) V4SF; + +V4SF +v4sf (V4SF x) +{ + V4SF y; + for (int i = 0; i < 4; ++i) + y[i] = x[3 - i]; + return y; +} diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-3.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-3.c new file mode 100644 index 00000000000..3f69db8831c --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-3.c @@ -0,0 +1,56 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z14" } */ +/* { dg-require-effective-target s390_vxe } */ +/* { dg-final { scan-assembler-times {\tvpdi\t} 5 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef short __attribute__ 
((vector_size (16))) V8HI; +typedef int __attribute__ ((vector_size (16))) V4SI; +typedef long long __attribute__ ((vector_size (16))) V2DI; +typedef float __attribute__ ((vector_size (16))) V4SF; +typedef double __attribute__ ((vector_size (16))) V2DF; + +V8HI +v8hi (V8HI *x) +{ + V8HI y; + for (int i = 0; i < 8; ++i) + y[i] = (*x)[7 - i]; + return y; +} + +V4SI +v4si (V4SI *x) +{ + V4SI y; + for (int i = 0; i < 4; ++i) + y[i] = (*x)[3 - i]; + return y; +} + +V2DI +v2di (V2DI *x) +{ + V2DI y; + for (int i = 0; i < 2; ++i) + y[i] = (*x)[1 - i]; + return y; +} + +V4SF +v4sf (V4SF *x) +{ + V4SF y; + for (int i = 0; i < 4; ++i) + y[i] = (*x)[3 - i]; + return y; +} + +V2DF +v2df (V2DF *x) +{ + V2DF y; + for (int i = 0; i < 2; ++i) + y[i] = (*x)[1 - i]; + return y; +} diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-4.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-4.c new file mode 100644 index 00000000000..5027ed55f50 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-4.c @@ -0,0 +1,67 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z15" } */ +/* { dg-require-effective-target s390_vxe2 } */ +/* { dg-final { scan-assembler-times {\tvlbrq\t} 1 } } */ +/* { dg-final { scan-assembler-times {\tvler[hfg]\t} 5 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef signed char __attribute__ ((vector_size (16))) V16QI; +typedef short __attribute__ ((vector_size (16))) V8HI; +typedef int __attribute__ ((vector_size (16))) V4SI; +typedef long long __attribute__ ((vector_size (16))) V2DI; +typedef float __attribute__ ((vector_size (16))) V4SF; +typedef double __attribute__ ((vector_size (16))) V2DF; + +V16QI +v16qi (V16QI *x) +{ + V16QI y; + for (int i = 0; i < 16; ++i) + y[i] = (*x)[15 - i]; + return y; +} + +V8HI +v8hi (V8HI *x) +{ + V8HI y; + for (int i = 0; i < 8; ++i) + y[i] = (*x)[7 - i]; + return y; +} + +V4SI +v4si (V4SI *x) +{ + V4SI y; + for (int i = 0; i < 4; ++i) + y[i] = (*x)[3 - i]; + 
return y; +} + +V2DI +v2di (V2DI *x) +{ + V2DI y; + for (int i = 0; i < 2; ++i) + y[i] = (*x)[1 - i]; + return y; +} + +V4SF +v4sf (V4SF *x) +{ + V4SF y; + for (int i = 0; i < 4; ++i) + y[i] = (*x)[3 - i]; + return y; +} + +V2DF +v2df (V2DF *x) +{ + V2DF y; + for (int i = 0; i < 2; ++i) + y[i] = (*x)[1 - i]; + return y; +} diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-5.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-5.c new file mode 100644 index 00000000000..8c250aa681b --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-5.c @@ -0,0 +1,56 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z14" } */ +/* { dg-require-effective-target s390_vxe } */ +/* { dg-final { scan-assembler-times {\tvpdi\t} 5 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef short __attribute__ ((vector_size (16))) V8HI; +typedef int __attribute__ ((vector_size (16))) V4SI; +typedef long long __attribute__ ((vector_size (16))) V2DI; +typedef float __attribute__ ((vector_size (16))) V4SF; +typedef double __attribute__ ((vector_size (16))) V2DF; + +void +v8hi (V8HI *x, V8HI y) +{ + V8HI z; + for (int i = 0; i < 8; ++i) + z[i] = y[7 - i]; + *x = z; +} + +void +v4si (V4SI *x, V4SI y) +{ + V4SI z; + for (int i = 0; i < 4; ++i) + z[i] = y[3 - i]; + *x = z; +} + +void +v2di (V2DI *x, V2DI y) +{ + V2DI z; + for (int i = 0; i < 2; ++i) + z[i] = y[1 - i]; + *x = z; +} + +void +v4sf (V4SF *x, V4SF y) +{ + V4SF z; + for (int i = 0; i < 4; ++i) + z[i] = y[3 - i]; + *x = z; +} + +void +v2df (V2DF *x, V2DF y) +{ + V2DF z; + for (int i = 0; i < 2; ++i) + z[i] = y[1 - i]; + *x = z; +} diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-6.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-6.c new file mode 100644 index 00000000000..7e2b2356788 --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-6.c @@ -0,0 +1,67 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z15" } */ 
+/* { dg-require-effective-target s390_vxe2 } */ +/* { dg-final { scan-assembler-times {\tvstbrq\t} 1 } } */ +/* { dg-final { scan-assembler-times {\tvster[hfg]\t} 5 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef signed char __attribute__ ((vector_size (16))) V16QI; +typedef short __attribute__ ((vector_size (16))) V8HI; +typedef int __attribute__ ((vector_size (16))) V4SI; +typedef long long __attribute__ ((vector_size (16))) V2DI; +typedef float __attribute__ ((vector_size (16))) V4SF; +typedef double __attribute__ ((vector_size (16))) V2DF; + +void +v16qi (V16QI *x, V16QI y) +{ + V16QI z; + for (int i = 0; i < 16; ++i) + z[i] = y[15 - i]; + *x = z; +} + +void +v8hi (V8HI *x, V8HI y) +{ + V8HI z; + for (int i = 0; i < 8; ++i) + z[i] = y[7 - i]; + *x = z; +} + +void +v4si (V4SI *x, V4SI y) +{ + V4SI z; + for (int i = 0; i < 4; ++i) + z[i] = y[3 - i]; + *x = z; +} + +void +v2di (V2DI *x, V2DI y) +{ + V2DI z; + for (int i = 0; i < 2; ++i) + z[i] = y[1 - i]; + *x = z; +} + +void +v4sf (V4SF *x, V4SF y) +{ + V4SF z; + for (int i = 0; i < 4; ++i) + z[i] = y[3 - i]; + *x = z; +} + +void +v2df (V2DF *x, V2DF y) +{ + V2DF z; + for (int i = 0; i < 2; ++i) + z[i] = y[1 - i]; + *x = z; +} diff --git a/gcc/testsuite/gcc.target/s390/vector/reverse-elements-7.c b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-7.c new file mode 100644 index 00000000000..046fcc0790a --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/vector/reverse-elements-7.c @@ -0,0 +1,67 @@ +/* { dg-compile } */ +/* { dg-options "-O3 -mzarch -march=z15" } */ +/* { dg-require-effective-target s390_vxe2 } */ +/* { dg-final { scan-assembler-times {\tvstbrq\t} 1 } } */ +/* { dg-final { scan-assembler-times {\tvster[hfg]\t} 5 } } */ +/* { dg-final { scan-assembler-not {\tvperm\t} } } */ + +typedef signed char __attribute__ ((vector_size (16))) V16QI; +typedef short __attribute__ ((vector_size (16))) V8HI; +typedef int __attribute__ ((vector_size (16))) V4SI; +typedef long long 
__attribute__ ((vector_size (16))) V2DI; +typedef float __attribute__ ((vector_size (16))) V4SF; +typedef double __attribute__ ((vector_size (16))) V2DF; + +void +v16qi (V16QI *x, V16QI *y) +{ + V16QI z; + for (int i = 0; i < 16; ++i) + z[i] = (*y)[15 - i]; + *x = z; +} + +void +v8hi (V8HI *x, V8HI *y) +{ + V8HI z; + for (int i = 0; i < 8; ++i) + z[i] = (*y)[7 - i]; + *x = z; +} + +void +v4si (V4SI *x, V4SI *y) +{ + V4SI z; + for (int i = 0; i < 4; ++i) + z[i] = (*y)[3 - i]; + *x = z; +} + +void +v2di (V2DI *x, V2DI *y) +{ + V2DI z; + for (int i = 0; i < 2; ++i) + z[i] = (*y)[1 - i]; + *x = z; +} + +void +v4sf (V4SF *x, V4SF *y) +{ + V4SF z; + for (int i = 0; i < 4; ++i) + z[i] = (*y)[3 - i]; + *x = z; +} + +void +v2df (V2DF *x, V2DF *y) +{ + V2DF z; + for (int i = 0; i < 2; ++i) + z[i] = (*y)[1 - i]; + *x = z; +} diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword-z14.c b/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword-z14.c index 4938ac20613..3c1e9338f80 100644 --- a/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword-z14.c +++ b/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword-z14.c @@ -21,4 +21,6 @@ baz (signed short *x) return vec_reve (vec_xl (0, x)); } -/* { dg-final { scan-assembler-times "vperm\t" 3 } } */ +/* { dg-final { scan-assembler-times "vpdi\t" 3 } } */ +/* { dg-final { scan-assembler-times "verllg\t" 3 } } */ +/* { dg-final { scan-assembler-times "verllf\t" 3 } } */ diff --git a/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword.c b/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword.c index 3c9229922ec..7b1c3f885cd 100644 --- a/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword.c +++ b/gcc/testsuite/gcc.target/s390/zvector/vec-reve-load-halfword.c @@ -9,7 +9,9 @@ foo (vector signed short x) return vec_reve (x); } -/* { dg-final { scan-assembler-times "vperm\t" 1 } } */ +/* { dg-final { scan-assembler-times "vpdi\t" 1 } } */ +/* { dg-final { scan-assembler-times 
"verllg\t" 1 } } */ +/* { dg-final { scan-assembler-times "verllf\t" 1 } } */ vector signed short -- 2.41.0
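
Note (illustration only, not part of the patch): the register-to-register
V8HI split above swaps the two doublewords, rotates each doubleword by 32
bits, and then rotates each word by 16 bits. The following is a minimal
scalar sketch of why that sequence reverses all eight halfwords. It assumes
big-endian element numbering (element 0 in the most significant bits); the
type vec128 and all function names are hypothetical helpers that do not
exist in GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a 128-bit vector register: dw[0] is the
   leftmost doubleword, i.e. it holds halfword elements 0..3.  */
typedef struct { uint64_t dw[2]; } vec128;

/* Pack eight halfwords; element 0 lands in the most significant bits.  */
static vec128
pack_halfwords (const uint16_t h[8])
{
  vec128 v = { { 0, 0 } };
  for (int i = 0; i < 8; ++i)
    v.dw[i / 4] = (v.dw[i / 4] << 16) | h[i];
  return v;
}

/* Inverse of pack_halfwords.  */
static void
unpack_halfwords (vec128 v, uint16_t h[8])
{
  for (int i = 0; i < 8; ++i)
    h[i] = (uint16_t) (v.dw[i / 4] >> (48 - 16 * (i % 4)));
}

/* Step 1: swap the two doublewords (models vpdi %v0,%v1,%v1,4).  */
static vec128
swap_doublewords (vec128 v)
{
  vec128 r = { { v.dw[1], v.dw[0] } };
  return r;
}

/* Step 2: rotate each doubleword left by 32 bits (models verllg).  */
static vec128
rotate_doublewords_32 (vec128 v)
{
  for (int i = 0; i < 2; ++i)
    v.dw[i] = (v.dw[i] << 32) | (v.dw[i] >> 32);
  return v;
}

/* Step 3: rotate each word left by 16 bits (models verllf).  */
static vec128
rotate_words_16 (vec128 v)
{
  for (int i = 0; i < 2; ++i)
    {
      uint32_t hi = (uint32_t) (v.dw[i] >> 32);
      uint32_t lo = (uint32_t) v.dw[i];
      hi = (hi << 16) | (hi >> 16);
      lo = (lo << 16) | (lo >> 16);
      v.dw[i] = ((uint64_t) hi << 32) | lo;
    }
  return v;
}

/* The three steps combined.  */
static vec128
reverse_halfwords (vec128 v)
{
  return rotate_words_16 (rotate_doublewords_32 (swap_doublewords (v)));
}

/* Self-check: the result must equal a plain element-wise reversal.  */
static void
check_reverse_halfwords (void)
{
  const uint16_t in[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  uint16_t out[8];
  unpack_halfwords (reverse_halfwords (pack_halfwords (in)), out);
  for (int i = 0; i < 8; ++i)
    assert (out[i] == in[7 - i]);
}
```

Calling check_reverse_halfwords () from a small main is enough to exercise
the model. Stopping after step 2 gives the word-wise reversal that the
V_HW_4 splitter uses, and step 1 alone is the V_HW_2 case.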