Date: Sun, 19 Nov 2023 23:20:06 -0500
From: Michael Meissner
To: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [PATCH 1/4] Add basic vector pair mode support

Add basic support for vector_size(32).

We have had several users ask us to implement ways of using the Power10 load vector pair and store vector pair instructions to give their code a speed up due to reduced memory bandwidth.
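For readers who have not used GCC's generic vector extension, here is a minimal sketch of what a vector_size(32) type looks like at the source level. The type name v4df and the function name multiply_add_v4df are hypothetical and are not part of the patch. Whether the compiler actually emits load vector pair and store vector pair instructions for this code depends on the target and options, so treat this as an illustration of the source form only.

```c
#include <stddef.h>

/* Hypothetical type name: a 32-byte vector holding four doubles, declared
   with GCC's generic vector extension.  */
typedef double v4df __attribute__ ((vector_size (32)));

/* Compute a[i] += b[i] * c[i] on whole 32-byte vectors.  The arrays are
   arrays of v4df, so each iteration handles four doubles.  The function
   name is an assumption made for this sketch.  */
void
multiply_add_v4df (v4df *a, const v4df *b, const v4df *c, size_t n_vectors)
{
  for (size_t i = 0; i < n_vectors; i++)
    a[i] += b[i] * c[i];
}
```

The arithmetic operators apply element by element, so no built-in functions are needed to express the loop body.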
I had originally posted the following patches:

  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636077.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636078.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636083.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636080.html
  * https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636081.html

to add a set of built-in functions that use the PowerPC __vector_pair type and that provide a set of functions to do basic operations on vector pairs.

After I posted these patches, it was decided that it would be better to have a new type rather than a bunch of new built-in functions. Within the GCC context, the best way to add this support is to extend the vector modes so that V4DFmode, V8SFmode, V4DImode, V8SImode, V16HImode, and V32QImode are used.

While in theory you could add a whole new type that isn't a larger size vector, my experience with IEEE 128-bit floating point is that GCC really doesn't like 2 modes that are the same size but have different implementations (such as we see with IEEE 128-bit floating point and IBM double-double 128-bit floating point). So I did not consider adding a new mode for use with vector pairs.

My original intention was to just implement V4DFmode and V8SFmode, since the primary users asking for vector pair support are people implementing the high end math libraries like Eigen and BLAS. However, in implementing this code, I discovered that we will need integer vector pair support as well as floating point vector pair support. The integer modes and types are needed to properly implement byte shuffling and vector comparisons, both of which need integer vector pairs.

With the current patches, vector pair support is not enabled by default. The main reason is that I have not implemented the support for byte shuffling, which various tests depend on.

I would also like to implement overloads for the vector built-in functions like vec_add, vec_sum, etc., so that if you give them a vector pair, they handle it just as if you had given them a vector type.

In addition, once the various bugs are addressed, I would then implement the support so that automatic vectorization would consider using vector pairs instead of vectors.

This is the first patch in the series. It implements the basic modes, and it allows for initialization of the modes. I've added some optimizations for extracting and setting fields within the vector pair.

The second patch will implement the floating point vector pair support.

The third patch will implement the integer vector pair support.

The fourth patch will provide new tests to the test suite.

When I test a saxpy type loop (a[i] += (b[i] * c[i])), I generally see a 10% improvement over either auto-vectorization, or just using the vector types.

I have tested these patches on a little endian power10 system. With -mvector-size-32 disabled by default, there are no regressions in the test suite. I have also built and run the tests on both little endian power9 and big endian power9 systems, and there are no regressions. Can I check these patches into the master branch?

2023-11-19  Michael Meissner

gcc/

	* config/rs6000/constraints.md (eV): New constraint.
	* config/rs6000/predicates.md (cons_0_to_31_operand): New predicate.
	(easy_vector_constant): Add support for vector pair constants.
	(easy_vector_pair_constant): New predicate.
	(mma_assemble_input_operand): Allow other 16-byte vector modes than
	V16QImode.
	* config/rs6000/rs6000-c.cc (rs6000_cpu_cpp_builtins): Define
	__VECTOR_SIZE_32__ if -mvector-size-32.
	* config/rs6000/rs6000-protos.h (vector_pair_to_vector_mode): New
	declaration.
	(split_vector_pair_constant): Likewise.
	(rs6000_expand_vector_pair_init): Likewise.
	* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached): Use
	VECTOR_PAIR_MODE instead of comparing mode to OOmode.
	(rs6000_modes_tieable_p): Allow various vector pair modes to pair with
	each other.
	Allow 16-byte vectors to pair with vector pair modes.
	(rs6000_setup_reg_addr_masks): Use VECTOR_PAIR_MODE instead of
	comparing mode to OOmode.
	(rs6000_init_hard_regno_mode_ok): Setup vector pair mode basic type
	information and reload handlers.
	(rs6000_option_override_internal): Warn if -mvector-size-32 is used
	without -mcpu=power10 or -mmma.
	(vector_pair_to_vector_mode): New function.
	(split_vector_pair_constant): Likewise.
	(rs6000_expand_vector_pair_init): Likewise.
	(reg_offset_addressing_ok_p): Add support for vector pair modes.
	(rs6000_emit_move): Likewise.
	(rs6000_preferred_reload_class): Likewise.
	(altivec_expand_vec_perm_le): Likewise.
	(rs6000_opt_vars): Add -mvector-size-32 switch.
	(rs6000_split_multireg_move): Add support for vector pair modes.
	* config/rs6000/rs6000.h (VECTOR_PAIR_MODE): New macro.
	* config/rs6000/rs6000.md (wd mode attribute): Add vector pair modes.
	(RELOAD mode iterator): Likewise.
	(toplevel): Include vector-pair.md.
	* config/rs6000/rs6000.opt (-mvector-size-32): New option.
	* config/rs6000/vector-pair.md: New file.
	* doc/md.texi (PowerPC constraints): Document the eV constraint.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com