Date: Wed, 18 Oct 2023 19:55:38 -0400
From: Michael Meissner
To: gcc-patches@gcc.gnu.org, Michael Meissner,
Segher Boessenkool, "Kewen.Lin", David Edelsohn, Peter Bergner
Subject: [PATCH 0/6] PowerPC Future patches

This patch set provides very preliminary support for a potential new feature of the PowerPC that extends the current power10 MMA architecture. This feature may or may not be present in any specific future PowerPC processor.

In the current MMA subsystem for Power10, there are eight 512-bit accumulator registers. Each accumulator is tied to a set of 4 FPR registers. When you issue a prime instruction, it makes sure the accumulator is a copy of the 4 FPR registers it is tied to. When you issue a deprime instruction, it makes sure that the accumulator's data is logically copied back to the matching FPR registers.
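The prime/deprime relationship above can be illustrated with a minimal C sketch. It assumes a purely software model in which accumulator N is tied to the 4 consecutive 128-bit registers 4N through 4N+3 (VSX registers 0-31, which overlap the FPRs). The type and function names (`mma_model`, `mma_prime`, `mma_deprime`, `first_reg_for_acc`, `acc_for_reg`) are hypothetical and do not correspond to GCC internals or to real instruction names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model (not GCC or ISA code) of the power10 MMA
   register tie-in: 8 accumulators of 512 bits, each tied to 4 consecutive
   128-bit registers among VSX registers 0-31 (which overlap the FPRs). */
enum { NUM_ACCS = 8, REGS_PER_ACC = 4, REG_BYTES = 16 };

typedef struct {
    uint8_t reg[NUM_ACCS * REGS_PER_ACC][REG_BYTES]; /* registers 0-31 */
    uint8_t acc[NUM_ACCS][REGS_PER_ACC * REG_BYTES]; /* 512-bit accumulators */
    bool primed[NUM_ACCS];
} mma_model;

/* First of the 4 registers tied to accumulator ACC (that is, 4 * ACC). */
static int first_reg_for_acc(int acc) {
    assert(acc >= 0 && acc < NUM_ACCS);
    return acc * REGS_PER_ACC;
}

/* Accumulator that register REG belongs to (that is, REG / 4). */
static int acc_for_reg(int reg) {
    assert(reg >= 0 && reg < NUM_ACCS * REGS_PER_ACC);
    return reg / REGS_PER_ACC;
}

/* Prime: make accumulator ACC a copy of its 4 tied registers. */
static void mma_prime(mma_model *m, int acc) {
    int first = first_reg_for_acc(acc);
    for (int i = 0; i < REGS_PER_ACC; i++)
        memcpy(m->acc[acc] + i * REG_BYTES, m->reg[first + i], REG_BYTES);
    m->primed[acc] = true;
}

/* Deprime: logically copy accumulator ACC back to its 4 tied registers. */
static void mma_deprime(mma_model *m, int acc) {
    int first = first_reg_for_acc(acc);
    for (int i = 0; i < REGS_PER_ACC; i++)
        memcpy(m->reg[first + i], m->acc[acc] + i * REG_BYTES, REG_BYTES);
    m->primed[acc] = false;
}
```

This model keeps the accumulators in separate storage and only synchronizes them with the tied registers on prime and deprime, which roughly matches the "logical copy" semantics described above. It does not model the ISA 3.1 rules that restrict what software may do with the tied registers while an accumulator is primed.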
In the potential dense math system, the accumulators are moved to separate registers called dense math registers (DM registers or DMRs). The DMRs are then extended to 1,024 bits, and new instructions will be added to deal with all 1,024 bits of the DMRs. Existing MMA code will work as long as it does not rely on the accumulators overlapping the FPRs and it follows the rules in the ISA 3.1 documentation for using the MMA subsystem.

These patches add support for the 512-bit accumulators within the dense math system, and for allocation of the 1,024-bit DMRs. At this time, no additional built-in functions will be added to support any dense math features other than data movement between the DMRs and the VSX registers. Before we can look at adding any new dense math support other than data movement, the GCC compiler must be able to allocate and use these DMRs.

There are 6 patches in this patch set:

1) The first patch just adds -mcpu=future as an option to enable the new support. This is similar to the -mcpu=future option we added before power10 was announced.
2) The second patch enables GCC to use the load and store vector pair instructions to optimize memory copy operations. For power10, we had to stay with normal vector loads and stores for memory copy operations.
3) The third patch enables 512-bit accumulators that are located within DMRs instead of the FPRs. This patch enables register allocation for them, but it does not move the existing MMA support to use these registers.
4) The fourth patch switches the MMA subsystem to use the 512-bit accumulators within DMRs if you use -mcpu=future.
5) The fifth patch switches the names of the MMA instructions to the equivalent dense math names if you use -mcpu=future.
6) The sixth patch enables using the full 1,024-bit DMRs. Right now, all you can do with DMRs is move a VSX register to a DMR, and move a DMR to a VSX register.

In terms of changes, these patches now use the wD constraint for accumulators.
If you compile with -mcpu=power10, the wD constraint will match the equivalent FPR register that overlaps the accumulator. If you compile with -mcpu=future, the wD constraint will match the DMR register and not the FPR register. These patches also modify the print_operand %A output modifier to print out the DMR register number if -mcpu=future, and to continue printing the FPR register number divided by 4 for -mcpu=power10.

In general, if you only use the built-in functions, your code should work on both systems. If you use extended asm, you will likely need to modify the code. Going forward, if you modify your code to use the wD constraint and the %A output modifier, you can hopefully write code that switches more easily between the two systems.

Again, these are preliminary patches for a potential future machine. Things will likely change in terms of implementation and usage over time.

Originally these patches were submitted in November 2022:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605581.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com