From: Richard Biener
Date: Mon, 6 Feb 2023 08:25:22 +0100
Subject: Re: [PATCH 0/8] PowerPC future support for Dense Math
To: Michael Meissner, gcc-patches@gcc.gnu.org, Segher Boessenkool,
    "Kewen.Lin", David Edelsohn, Peter Bergner, Will Schmidt

On Fri, Feb 3, 2023 at 10:16 PM Michael Meissner via Gcc-patches wrote:
>
> These patches were originally posted on November 10th. Segher has asked that I
> repost them. These patches are somewhat changed since the original posting to
> address some of the comments.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605581.html
>
> In the first patch (adding -mcpu=future), I have taken out the code that made
> -mtune=future act as -mtune=power10. Instead I went through all of the places
> that look at the tuning (mostly in power10.md and rs6000.cc), and added future
> as an option. Obviously at a later time, we will provide a separate tuning
> file for future (or whatever the new name will be if the instructions are added
> officially). But for now, it will suffice.
>
> In patch #3, I fixed the opcode for clearing a dense math register, a problem
> that Peter had noticed. I was using the name based on the existing clear
> instruction, instead of the new instruction.
>
> In patch #6, I fixed the code, relying on the changes for setting the precision
> field to 16 bits. Since that patch will not be able to go into GCC 13 at
> present, we might skip that support for now. The important thing for existing
> users of the MMA code is that the support for accumulators being in the
> separate dense math registers, rather than overlapping the FPR registers, does
> need to go in. We can probably delay the 1,024-bit register support, or
> implement it in a different fashion.
>
> In the insn names, I tried to switch to using _vsx instead of _fpr for the
> existing MMA support instructions. I also tried to clear up the comments to
> specify ISA 3.1 instead of power10 when talking about the existing MMA
> support.
>
> The following is from the original posting (slightly modified):
>
> This patch is very preliminary support for a potential new feature for the
> PowerPC that extends the current power10 MMA architecture. This feature may or
> may not be present in any specific future PowerPC processor.
>
> In the current MMA subsystem for Power10, there are eight 512-bit accumulator
> registers. These accumulators are each tied to sets of 4 FPR registers. When
> you issue a prime instruction, it makes sure the accumulator is a copy of the 4
> FPR registers the accumulator is tied to. When you issue a deprime
> instruction, it makes sure that the accumulator data content is logically
> copied to the matching FPR registers.
>
> In the potential dense math system, the accumulators are moved to separate
> registers called dense math registers (DM registers or DMR). The DMRs are then
> extended to 1,024 bits and new instructions will be added to deal with all
> 1,024 bits of the DMRs.
>
> If you take existing MMA code, it will work as long as you don't do anything
> with accumulators, and you follow the rules in the ISA 3.1 documentation for
> using the MMA subsystem.
>
> These patches add support for the 512-bit accumulators within the dense math
> system, and for allocation of the 1,024-bit DMRs. At this time, no additional
> built-in functions will be added to support any dense math features other than
> doing data movement between the DMRs and the VSX registers. Before we can look
> at adding any new dense math support other than data movement, we need the GCC
> compiler to be able to allocate and use these DMRs.
>
> There are 8 patches in this patch set:
>
> 1) The first patch just adds -mcpu=future as an option to add new support.
> This is similar to the -mcpu=future that we did before power10 was announced.
>
> 2) The second patch enables GCC to use the load and store vector pair
> instructions to optimize memory copy operations in the compiler. For power10,
> we needed to just stay with normal vector load/stores for memory copy
> operations.
>
> 3) The third patch enables 512-bit accumulators stored in DMRs. This patch
> enables the register allocation, but it does not move the existing MMA support
> to use these registers.
>
> 4) The fourth patch switches the MMA subsystem to use 512-bit accumulators
> within DMRs if you use -mcpu=future.
>
> 5) The fifth patch switches the names of the MMA instructions to use the dense
> math equivalent name if -mcpu=future.
>
> 6) The sixth patch enables using the full 1,024-bit DMRs. Right now, all you
> can do with DMRs is move a VSX register to a DMR register, and move a DMR
> register to a VSX register. [As I mentioned above, at the moment, this patch
> is problematic as is.]
>
> 7) The seventh patch is not DMR related. It adds support for variants of the
> load/store vector with length instruction that may be added in future PowerPC
> processors. These variants eliminate having to shift the byte length left by
> 56 bits.
>
> 8) The eighth patch is also not DMR related. It adds support for a saturating
> subtract operation that may be added to future PowerPC processors.
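
For readers who have not used the existing load/store vector with length
instructions, the shift mentioned in item 7 can be sketched in portable C.
This is only a model of the operand encoding, not code from the patch set:
the existing instructions take the byte count in the most-significant byte
of a 64-bit register, hence the shift by 56. The function names are
hypothetical, and the "plain" form is an assumption based solely on the
statement that the new variants eliminate the shift.

```c
#include <assert.h>
#include <stdint.h>

/* Existing form: the byte count lives in the most-significant byte of
   the 64-bit length operand, so it has to be shifted left by 56 bits.
   Hypothetical helper name, for illustration only.  */
uint64_t
length_operand_shifted (unsigned nbytes)
{
  assert (nbytes <= 255);   /* Only one byte is available for the count.  */
  return (uint64_t) nbytes << 56;
}

/* Assumed form for the possible new variants: the byte count is used
   as is, with no shift.  This is an assumption, not taken from the ISA.  */
uint64_t
length_operand_plain (unsigned nbytes)
{
  return (uint64_t) nbytes;
}
```

The saving is one shift instruction per variable-length load or store, which
matters mostly in short memory-copy and string sequences.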
>
> In terms of changes, we now use the wD constraint for accumulators. If you
> compile with -mcpu=power10, the wD constraint will match the equivalent VSX
> register (0..31) that overlaps with the accumulator. If you compile with
> -mcpu=future, the wD constraint will match the DMR register and not the FPR
> register.
>
> This patch also modifies the print_operand %A output modifier to print out DMR
> register numbers if -mcpu=future, and continue to print out the FPR register
> number divided by 4 for -mcpu=power10.
>
> In general, if you only use the built-in functions, things work between the two
> systems. If you use extended asm, you will likely need to modify the code.
> Going forward, hopefully if you modify your code to use the wD constraint and
> %A output modifier, you can write code that switches more easily between the
> two systems.
>
> Again, these are preliminary patches for a potential future machine. Things
> will likely change in terms of implementation and usage over time.

May I ask you to consider delaying this to stage1, for exactly this last
reason?

Richard.

>
> --
> Michael Meissner, IBM
> PO Box 98, Ayer, Massachusetts, USA, 01432
> email: meissner@linux.ibm.com
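
As an illustration of the register numbering described for the %A output
modifier on power10, here is a minimal sketch in portable C. It assumes the
ISA 3.1 layout in which accumulator N overlaps registers 4*N through 4*N+3
of the first 32 VSX registers. The helper names and constants are
hypothetical and are not taken from rs6000.cc.

```c
#include <assert.h>

/* Power10 MMA layout: 8 accumulators, each tied to 4 consecutive FPRs.  */
enum { NUM_ACCUMULATORS = 8, REGS_PER_ACCUMULATOR = 4 };

/* Accumulator number for the first FPR of an accumulator's register set.
   This mirrors "FPR register number divided by 4".  Hypothetical helper.  */
unsigned
acc_number_from_fpr (unsigned fpr)
{
  assert (fpr < NUM_ACCUMULATORS * REGS_PER_ACCUMULATOR);
  assert (fpr % REGS_PER_ACCUMULATOR == 0);   /* Must start a set of 4.  */
  return fpr / REGS_PER_ACCUMULATOR;
}

/* First FPR overlapped by a given accumulator.  Hypothetical helper.  */
unsigned
first_fpr_of_acc (unsigned acc)
{
  assert (acc < NUM_ACCUMULATORS);
  return acc * REGS_PER_ACCUMULATOR;
}
```

With -mcpu=future this mapping would no longer apply, since the accumulator
would be a DMR with its own number rather than an alias of four FPRs.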