From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 86A273858C50 for ; Mon, 20 Nov 2023 04:26:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 86A273858C50 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 86A273858C50 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.158.5 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700454370; cv=none; b=uUHA7MCWUES2A9WeYaaHaw18P830dTm3ls7kXN0WiDollT0J0Dookrohe5qYB35UZH/l+nKUFqmmvmN5EyNHzOwgNH5p4aTQt6yjiHxwSLLqQuJZ3LH7Vxpfdf0vywSlouZjz+/JZxbCWSx4ITMMDFPcg8EIBxw4hEThJc15Bbc= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700454370; c=relaxed/simple; bh=Ys4aFWoVvU4qpItHaS3tzaPIUCYsK3NlbrJKhrHZvJg=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=vrK5BsGFhHENhGH4CFTINwywbtBk2H45Y5Tkfs+nIKIC3UgwRwG8Q2/P5PKyB9ficYUSNN0KKQlrgVeAPVOEfTXls6iVl06RYKL/glZE5rXjnhYdO5WQotdvPjAD18uxYxfcKZsgJzYlu2p2ouxfWLZnBcZDmYCpLwcShIEgG/E= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0353723.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3AK4B9Z2028501; Mon, 20 Nov 2023 04:26:08 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=date : from : to : subject : message-id : references : mime-version : content-type : in-reply-to; s=pp1; bh=FS4lD1pU9xOFb0BVeNRLrVXjWgCXr795LUBiv5RnXS0=; b=FzicYwAanWrE0eQdi1blNMkp07/lEdvLqm6eP+bwmE5FUORVfWYYnlCXWvpYeXTQWH+B jxJ5zKljEtWiJHPAH8wXj9kwm0Sqdxds+SvputQ13uuziYw/ei4JaEPpRWeebYdWvwn4 RPe9r+g8y/DeZygYK1KmxCmz1OKx4szkdWZ/NKOWoAqeGrJJE1easqiN1VWFJ1iqgv0e kVXYeB7ybje8hcZd5PbQihipCBJMiIt3bdcf2PO5WYsR07dxJSDFLYYJY4A7Ceexxp2p 
342TDyT0Zw98frpMPr0YE6rOm4AUdVI30iZbz0Wzym19NE/rv1Hg6cdUE0MJGuXqGO/9 FQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3ufmv13fum-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 20 Nov 2023 04:26:07 +0000 Received: from m0353723.ppops.net (m0353723.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 3AK4Nefb004317; Mon, 20 Nov 2023 04:26:07 GMT Received: from ppma12.dal12v.mail.ibm.com (dc.9e.1632.ip4.static.sl-reverse.com [50.22.158.220]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3ufmv13fud-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 20 Nov 2023 04:26:07 +0000 Received: from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1]) by ppma12.dal12v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 3AK1jCeH015171; Mon, 20 Nov 2023 04:26:06 GMT Received: from smtprelay04.dal12v.mail.ibm.com ([172.16.1.6]) by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 3uf7ksq4f3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 20 Nov 2023 04:26:06 +0000 Received: from smtpav04.wdc07v.mail.ibm.com (smtpav04.wdc07v.mail.ibm.com [10.39.53.231]) by smtprelay04.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 3AK4Q5NG21103170 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 20 Nov 2023 04:26:05 GMT Received: from smtpav04.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 78DCD58050; Mon, 20 Nov 2023 04:26:05 +0000 (GMT) Received: from smtpav04.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C347E58045; Mon, 20 Nov 2023 04:26:04 +0000 (GMT) Received: from cowardly-lion.the-meissners.org (unknown [9.61.1.46]) by smtpav04.wdc07v.mail.ibm.com (Postfix) with ESMTPS; Mon, 20 Nov 2023 04:26:04 +0000 (GMT) Date: Sun, 19 Nov 2023 23:26:03 -0500 From: Michael Meissner To: Michael Meissner , gcc-patches@gcc.gnu.org, 
Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner Subject: [PATCH 3/4] Add integer vector pair mode support to PowerPC Message-ID: Mail-Followup-To: Michael Meissner , gcc-patches@gcc.gnu.org, Segher Boessenkool , "Kewen.Lin" , David Edelsohn , Peter Bergner References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-TM-AS-GCONF: 00 X-Proofpoint-GUID: t7as77OMTUkZgeug1PGey3lwUhD6NY4J X-Proofpoint-ORIG-GUID: P3fAk2o51RN61I6awyWCX44v52rBNdJs X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.987,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-11-20_01,2023-11-17_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 malwarescore=0 phishscore=0 spamscore=0 mlxlogscore=999 lowpriorityscore=0 bulkscore=0 suspectscore=0 mlxscore=0 priorityscore=1501 adultscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2311060000 definitions=main-2311200029 X-Spam-Status: No, score=-10.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id:

The first two patches in the vector pair series were previously posted.  This
patch depends on those two patches.  The first patch implemented the basic
modes and allows for initialization of the modes.  In addition, I added some
optimizations for extracting and setting fields within the vector pair.

The second patch in the vector pair series implemented floating point support.

This third patch implements the integer vector pair support.  It adds the
basic support for doing integer operations on vector pairs.  I have
implemented most of the arithmetic and logical operations that will be needed
in the future when byte shuffling is added.
I did add various combiner insns to fold the logical instructions (e.g. ior
of not becomes orc).  Since the PowerPC architecture does not have a negate
instruction for vectors of 8/16-bit elements, I have added alternate code that
creates a 0 and then does a subtract.

The main instructions that are not supported are shift and rotate
instructions.  In addition, if people want to use vector pair support on
integer types, it might make sense to add support for saturating adds and
subtracts, along with the various specialized instructions (bpermd, etc.).

The fourth patch will add new tests to the test suite.

When I test a saxpy type loop (a[i] += (b[i] * c[i])), I generally see a 10%
improvement over either auto-vectorization, or just using the vector types.

I have tested these patches on a little endian power10 system.  With
-vector-size-32 disabled by default, there are no regressions in the test
suite.

I have also built and run the tests on both little endian power9 and big
endian power9 systems, and there are no regressions.  Can I check these
patches into the master branch?

2023-11-19  Michael Meissner

gcc/

	* config/rs6000/vector-pair.md (VPAIR_INT): New mode iterator.
	(VPAIR_NEG_VNEG): Likewise.
	(VPAIR_NEG_SUB): Likewise.
	(VPAIR_INT_BINARY): New code iterator.
	(neg2, VPAIR_NEG_VNEG iterator): New insn.
	(neg2, VPAIR_NEG_SUB iterator): Likewise.
	(2, VPAIR_LOGICAL_UNARY and VPAIR_INT iterators): Likewise.
	(3, VPAIR_LOGICAL_BINARY and VPAIR_INT iterators): Likewise.
	(nor3_1): Likewise.
	(nor3_2): Likewise.
	(andc3): Likewise.
	(eqv3): Likewise.
	(nand3_1): Likewise.
	(nand3_2): Likewise.
	(orc): Likewise.
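For readers who want to see the negate fallback described above in isolation,
here is a minimal C sketch.  It is not code from the patch: it only models a
32-byte vector pair of 8-bit elements as two 16-byte halves and negates each
half as 0 - x, building the zero once and issuing one subtract per half.  The
type vpair_v32qi and the helpers sub_v16qi and neg_v32qi are hypothetical
names made up for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a V32QI vector pair: two 16-byte halves.  */
typedef struct {
  int8_t half[2][16];
} vpair_v32qi;

/* Subtract one 16-byte half from another, element by element.  The
   arithmetic is done modulo 256 so that wraparound is well defined.  */
static void
sub_v16qi (int8_t *dst, const int8_t *a, const int8_t *b)
{
  for (int i = 0; i < 16; i++)
    dst[i] = (int8_t) (uint8_t) ((uint8_t) a[i] - (uint8_t) b[i]);
}

/* Negate a vector pair as 0 - x: create the zero once, then do one
   subtract for each half of the pair.  */
static vpair_v32qi
neg_v32qi (vpair_v32qi x)
{
  int8_t zero[16];
  vpair_v32qi r;

  memset (zero, 0, sizeof zero);
  sub_v16qi (r.half[0], zero, x.half[0]);
  sub_v16qi (r.half[1], zero, x.half[1]);
  return r;
}
```

The point of the sketch is only the shape of the expansion: one shared zero
and two independent subtracts, which is why the pattern needs a scratch
register in addition to the source and destination pairs.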
---
 gcc/config/rs6000/vector-pair.md | 252 +++++++++++++++++++++++++++++++
 1 file changed, 252 insertions(+)

diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md
index 8e2d7e5cc5b..dc71ea28293 100644
--- a/gcc/config/rs6000/vector-pair.md
+++ b/gcc/config/rs6000/vector-pair.md
@@ -38,6 +38,22 @@ (define_mode_iterator VPAIR_FP [V8SF V4DF])
 (define_code_iterator VPAIR_FP_UNARY [abs neg])
 (define_code_iterator VPAIR_FP_BINARY [plus minus mult smin smax])
 
+;; Integer vector pair ops.  We need the basic logical ops to support
+;; permutation on little endian systems.
+(define_mode_iterator VPAIR_INT [V32QI V16HI V8SI V4DI])
+
+;; Special iterators for NEG (V4SI and V2DI have vneg{w,d}), while V16QI and
+;; V8HI have to use a subtract from 0.
+(define_mode_iterator VPAIR_NEG_VNEG [V4DI V8SI])
+(define_mode_iterator VPAIR_NEG_SUB [V32QI V16HI])
+
+;; Integer unary/binary iterators.  Logical operations can be done on
+;; all VSX registers, while the binary int operators need Altivec registers.
+(define_code_iterator VPAIR_LOGICAL_UNARY [not])
+(define_code_iterator VPAIR_LOGICAL_BINARY [and ior xor])
+
+(define_code_iterator VPAIR_INT_BINARY [plus minus smin smax umin umax])
+
 ;; Iterator for vector pairs with double word elements
 (define_mode_iterator VPAIR_DWORD [V4DI V4DF])
 
@@ -626,4 +642,240 @@ (define_insn_and_split "*nfms_fpcontract_4"
 }
   [(set_attr "length" "8")
    (set_attr "type" "vecfloat")])
+
+;; Vector pair negate if we have the VNEGx instruction.
+(define_insn_and_split "neg2"
+  [(set (match_operand:VPAIR_NEG_VNEG 0 "vsx_register_operand" "=v")
+        (neg:VPAIR_NEG_VNEG
+         (match_operand:VPAIR_NEG_VNEG 1 "vsx_register_operand" "v")))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_unary_vector_pair (mode, operands,
+                           gen_neg2);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "vecfloat")])
+
+;; Vector pair negate if we have to do a subtract from 0
+(define_insn_and_split "neg2"
+  [(set (match_operand:VPAIR_NEG_SUB 0 "vsx_register_operand" "=v")
+        (neg:VPAIR_NEG_SUB
+         (match_operand:VPAIR_NEG_SUB 1 "vsx_register_operand" "v")))
+   (clobber (match_scratch: 2 "=&v"))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  enum machine_mode mode = mode;
+  rtx tmp = operands[2];
+  unsigned reg0 = reg_or_subregno (operands[0]);
+  unsigned reg1 = reg_or_subregno (operands[1]);
+
+  emit_move_insn (tmp, CONST0_RTX (mode));
+  emit_insn (gen_sub3 (gen_rtx_REG (mode, reg0),
+                       tmp,
+                       gen_rtx_REG (mode, reg1)));
+
+  emit_insn (gen_sub3 (gen_rtx_REG (mode, reg0 + 1),
+                       tmp,
+                       gen_rtx_REG (mode, reg1 + 1)));
+
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "vecfloat")])
+
+;; Vector pair logical unary operations.  These operations can use all VSX
+;; registers.
+(define_insn_and_split "2"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (VPAIR_LOGICAL_UNARY:VPAIR_INT
+         (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_unary_vector_pair (mode, operands,
+                           gen_2);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+;; Vector pair logical binary operations.  These operations can use all VSX
+;; registers.
+(define_insn_and_split "3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (VPAIR_LOGICAL_BINARY:VPAIR_INT
+         (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")
+         (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa")))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+;; Vector pair integer binary operations.  These operations require Altivec
+;; registers.
+(define_insn_and_split "3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=v")
+        (VPAIR_INT_BINARY:VPAIR_INT
+         (match_operand:VPAIR_INT 1 "vsx_register_operand" "v")
+         (match_operand:VPAIR_INT 2 "vsx_register_operand" "v")))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "vecsimple")])
+
+;; Optimize vector pair ~(a | b) or ((~a) & (~b)) to produce xxlnor
+(define_insn_and_split "*nor3_1"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (not:VPAIR_INT
+         (ior:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")
+          (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_nor3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+(define_insn_and_split "*nor3_2"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (and:VPAIR_INT
+         (not:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa"))
+         (not:VPAIR_INT
+          (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_nor3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+;; Optimize vector pair (~a) & b to use xxlandc
+(define_insn_and_split "*andc3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (and:VPAIR_INT
+         (not:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa"))
+         (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa")))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_andc3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+;; Optimize vector pair ~(a ^ b) to produce xxleqv
+(define_insn_and_split "*eqv3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (not:VPAIR_INT
+         (xor:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")
+          (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_eqv3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+;; Optimize vector pair ~(a & b) or ((~a) | (~b)) to produce xxlnand
+(define_insn_and_split "*nand3_1"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (not:VPAIR_INT
+         (and:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")
+          (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_nand3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+(define_insn_and_split "*nand3_2"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (ior:VPAIR_INT
+         (not:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa"))
+         (not:VPAIR_INT
+          (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_nand3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
+
+;; Optimize vector pair (~a) | b to produce xxlorc
+(define_insn_and_split "*orc3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+        (ior:VPAIR_INT
+         (not:VPAIR_INT
+          (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa"))
+         (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa")))]
+  "TARGET_MMA && TARGET_VECTOR_SIZE_32"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (mode, operands,
+                            gen_orc3);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "veclogical")])
-- 
2.41.0

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com
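
The combiner patterns in the patch match two spellings each of nor and nand
(the _1 and _2 variants).  As a sanity check on why both spellings can map to
the same instruction, the following C sketch, which is not part of the patch,
evaluates the fused operations on single bytes and confirms that the De Morgan
forms agree for every pair of byte values.  All helper names here (nor8,
nand8, eqv8, andc8, orc8, check_fold_identities) are hypothetical and exist
only for this illustration.

```c
#include <stdint.h>

/* Each helper computes one fused logical operation on a single byte,
   using the same operand shapes as the comments in the patch.  */
static uint8_t nor8 (uint8_t a, uint8_t b)  { return (uint8_t) ~(a | b); }
static uint8_t nand8 (uint8_t a, uint8_t b) { return (uint8_t) ~(a & b); }
static uint8_t eqv8 (uint8_t a, uint8_t b)  { return (uint8_t) ~(a ^ b); }
static uint8_t andc8 (uint8_t a, uint8_t b) { return (uint8_t) (~a & b); }
static uint8_t orc8 (uint8_t a, uint8_t b)  { return (uint8_t) (~a | b); }

/* Return 1 if ~(a | b) == (~a & ~b) and ~(a & b) == (~a | ~b) hold for
   all 65536 combinations of byte operands, 0 otherwise.  */
static int
check_fold_identities (void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      {
        uint8_t x = (uint8_t) a;
        uint8_t y = (uint8_t) b;

        if (nor8 (x, y) != (uint8_t) (~x & ~y))
          return 0;
        if (nand8 (x, y) != (uint8_t) (~x | ~y))
          return 0;
      }
  return 1;
}
```

Because these are purely bitwise identities, they hold independently of the
element width, which is consistent with the patterns being written once over
the VPAIR_INT iterator rather than per element size.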