From: Andreas Krebbel
To: gcc@gcc.gnu.org
Subject: [Committed] IBM Z: Fix ICE in expand_perm_as_replicate
Date: Mon, 10 Jun 2024 09:59:20 +0200
Message-ID: <20240610075920.187238-1-krebbel@linux.ibm.com>

The current implementation assumes that it is always invoked with
register operands.  For memory operands, however, we even have a
dedicated instruction (vlrep).  With this patch we try that first and
only if it fails force the input into a register and continue.

vec_splats generation fails for single-element 128-bit types, which
are allowed for vec_splat.  This is something to sort out with another
patch I guess.

Bootstrapped and regtested on IBM Z.  Committed to mainline.
Needs to be committed to GCC 14 branch as well.

gcc/ChangeLog:

	* config/s390/s390.cc (expand_perm_as_replicate): Handle memory
	operands.
	* config/s390/vx-builtins.md (vec_splats<mode>): Turn into
	parameterized expander.
	(@vec_splats<mode>): New expander.

gcc/testsuite/ChangeLog:

	* g++.dg/torture/vshuf-mem.C: New test.
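
For readers unfamiliar with the two code paths, here is a minimal
sketch of what they compute.  It is a plain C++ model, not GCC
internals: the names splat_from_memory and splat_from_register and the
std::array stand-in for a vector register are assumptions made for
illustration only.  The memory path reads a single scalar at byte
offset elem * element size (the same offset the patch passes to
adjust_address) and copies it into every lane; the register path
selects the lane from an already loaded vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a 128-bit vector of four floats.
constexpr unsigned kNelt = 4;
using V4SF = std::array<float, kNelt>;

// Model of "load and replicate": read one scalar straight from the
// vector's memory image at byte offset elem * sizeof (element) and
// broadcast it to all lanes.  No full vector load is needed.
V4SF splat_from_memory(const unsigned char *vec_bytes, unsigned elem)
{
  float scalar;
  std::memcpy(&scalar, vec_bytes + elem * sizeof(float), sizeof(float));
  V4SF result;
  result.fill(scalar);
  return result;
}

// Model of the fallback: the whole vector is already in a register,
// and lane elem is replicated into every lane of the result.
V4SF splat_from_register(const V4SF &reg, unsigned elem)
{
  V4SF result;
  result.fill(reg[elem]);
  return result;
}
```

Both functions should return the same value for the same input, which
is why the memory form can be tried first and the register form kept
as the fallback.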
---
 gcc/config/s390/s390.cc                  | 17 +++++++++++++--
 gcc/config/s390/vx-builtins.md           |  2 +-
 gcc/testsuite/g++.dg/torture/vshuf-mem.C | 27 ++++++++++++++++++++++++
 3 files changed, 43 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/vshuf-mem.C

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index fa517bd3e77..ec836ec3cd4 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -17940,7 +17940,8 @@ expand_perm_as_replicate (const struct expand_vec_perm_d &d)
   unsigned char i;
   unsigned char elem;
   rtx base = d.op0;
-  rtx insn;
+  rtx insn = NULL_RTX;
+
   /* Needed to silence maybe-uninitialized warning.  */
   gcc_assert (d.nelt > 0);
   elem = d.perm[0];
@@ -17954,7 +17955,19 @@ expand_perm_as_replicate (const struct expand_vec_perm_d &d)
       base = d.op1;
       elem -= d.nelt;
     }
-  insn = maybe_gen_vec_splat (d.vmode, d.target, base, GEN_INT (elem));
+  if (memory_operand (base, d.vmode))
+    {
+      /* Try to use vector load and replicate.  */
+      rtx new_base = adjust_address (base, GET_MODE_INNER (d.vmode),
+                                     elem * GET_MODE_UNIT_SIZE (d.vmode));
+      insn = maybe_gen_vec_splats (d.vmode, d.target, new_base);
+    }
+  if (insn == NULL_RTX)
+    {
+      base = force_reg (d.vmode, base);
+      insn = maybe_gen_vec_splat (d.vmode, d.target, base, GEN_INT (elem));
+    }
+
   if (insn == NULL_RTX)
     return false;
   emit_insn (insn);
diff --git a/gcc/config/s390/vx-builtins.md b/gcc/config/s390/vx-builtins.md
index 93c0d408a43..bb271c09a7d 100644
--- a/gcc/config/s390/vx-builtins.md
+++ b/gcc/config/s390/vx-builtins.md
@@ -145,7 +145,7 @@
   DONE;
 })
 
-(define_expand "vec_splats<mode>"
+(define_expand "@vec_splats<mode>"
   [(set (match_operand:VEC_HW                          0 "register_operand" "")
         (vec_duplicate:VEC_HW (match_operand:<non_vec> 1 "general_operand"  "")))]
   "TARGET_VX")
diff --git a/gcc/testsuite/g++.dg/torture/vshuf-mem.C b/gcc/testsuite/g++.dg/torture/vshuf-mem.C
new file mode 100644
index 00000000000..5f1ebf65665
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/vshuf-mem.C
@@ -0,0 +1,27 @@
+// { dg-options "-std=c++11" }
+// { dg-do run }
+// { dg-additional-options "-march=z14" { target s390*-*-* } }
+
+/* This used to trigger (2024-05-28) the vectorize_vec_perm_const
+   backend hook to be invoked with a MEM source operand.  Extracted
+   from onnxruntime's mlas library.  */
+
+typedef float V4SF __attribute__((vector_size (16)));
+typedef int V4SI __attribute__((vector_size (16)));
+
+template < unsigned I0, unsigned I1, unsigned I2, unsigned I3 > V4SF
+MlasShuffleFloat32x4 (V4SF Vector)
+{
+  return __builtin_shuffle (Vector, Vector, V4SI{I0, I1, I2, I3});
+}
+
+int
+main ()
+{
+  V4SF f = { 1.0f, 2.0f, 3.0f, 4.0f };
+  if (MlasShuffleFloat32x4 < 1, 1, 1, 1 > (f)[3] != 2.0f)
+    __builtin_abort ();
+  if (MlasShuffleFloat32x4 < 3, 3, 3, 3 > (f)[1] != 4.0f)
+    __builtin_abort ();
+  return 0;
+}
-- 
2.45.1