Re: [PATCH 0/8] PowerPC future support for Dense Math

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <richard.guenther@gmail.com>
To: Michael Meissner <meissner@linux.ibm.com>,
	gcc-patches@gcc.gnu.org,
	 Segher Boessenkool <segher@kernel.crashing.org>,
	"Kewen.Lin" <linkw@linux.ibm.com>,
	 David Edelsohn <dje.gcc@gmail.com>,
	Peter Bergner <bergner@linux.ibm.com>,
	 Will Schmidt <will_schmidt@vnet.ibm.com>
Subject: Re: [PATCH 0/8] PowerPC future support for Dense Math
Date: Mon, 6 Feb 2023 08:25:22 +0100	[thread overview]
Message-ID: <CAFiYyc34bD+XRzrLaUHdAh0teWNKRy_hehQrusaMeDs_tGJwmA@mail.gmail.com> (raw)
In-Reply-To: <Y915kdrgaQxmJl9A@toto.the-meissners.org>

On Fri, Feb 3, 2023 at 10:16 PM Michael Meissner via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> These patches were originally posted on November 10th.  Segher has asked that I
> repost them.  These patches are somewhat changed since the original posting to
> address some of the comments.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605581.html
>
> In the first patch (adding -mcpu=future), I have taken out the code of making
> -mtune=future act as -mtune=power10.  Instead I went through all of the places
> that look at the tuning (mostly in power10.md and rs6000.cc), and added future
> as an option.  Obviously at a later time, we will provide a separate tuning
> file for future (or whatever the new name will be if the instructions are added
> officially).  But for now, it will suffice.
>
> In patch #3, I fixed the opcode for clearing a dense math register that Peter
> had noticed.  I was using the name based on the existing clear instruction,
> instead of the new instruction.
>
> In patch #6, I fixed the code, relying on the changes for setting the precision
> field to 16 bits.  Since that patch will not be able to go into GCC 13 at
> present, we might skip that support for now.  The important thing for existing
> users of the MMA code is the support for accumulators being in the separate
> dense math registers rather than overlapping does need to go in, and we can
> probably delay the 1,024 bit register support, or implement in a different
> fashion.
>
> In the insn names, I tried to switch to using _vsx instead of _fpr for the
> existing MMA support instructions.  I also tried to clear up the comments to
> specify ISA 3.1 instead of power10 when talking about the existing MMA
> support.
>
> The following is from the original posting (slightly modified):
>
> This patch is very preliminary support for a potential new feature to the
> PowerPC that extends the current power10 MMA architecture.  This feature may or
> may not be present in any specific future PowerPC processor.
>
> In the current MMA subsystem for Power10, there are 8 512-bit accumulator
> registers.  These accumulators are each tied to sets of 4 FPR registers.  When
> you issue a prime instruction, it makes sure the accumulator is a copy of the 4
> FPR registers the accumulator is tied to.  When you issue a deprime
> instruction, it makes sure that the accumulator data content is logically
> copied to the matching FPR register.
>
> In the potential dense math system, the accumulators are moved to separate
> registers called dense math registers (DM registers or DMR).  The DMRs are then
> extended to 1,024 bits and new instructions will be added to deal with all
> 1,024 bits of the DMRs.
>
> If you take existing MMA code, it will work as long as you don't do anything
> with accumulators, and you follow the rules in the ISA 3.1 documentation for
> using the MMA subsystem.
>
> These patches add support for the 512-bit accumulators within the dense math
> system, and for allocation of the 1,024-bit DMRs.  At this time, no additional
> built-in functions will be done to support any dense math features other than
> doing data movement between the DMRs and the VSX registers.  Before we can look
> at adding any new dense math support other than data movement, we need the GCC
> compiler to be able to allocate and use these DMRs.
>
> There are 8 patches in this patch set:
>
> 1) The first patch just adds -mcpu=future as an option to add new support.
> This is similar to the -mcpu=future that we did before power10 was announced.
>
> 2) The second patch enables GCC to use the load and store vector pair
> instructions to optimize memory copy operations in the compiler.  For power10,
> we needed to just stay with normal vector load/stores for memory copy
> operations.
>
> 3) The third patch enables 512-bit accumulators store in DMRs.  This patch
> enables the register allocation, but it does not move the existing MMA to use
> these registers.
>
> 4) The fourth patch switches the MMA subsystem to use 512-bit accumulators
> within DMRs if you use -mcpu=future.
>
> 5) The fifth patch switches the names of the MMA instructions to use the dense
> math equivalent name if -mcpu=future.
>
> 6) The sixth patch enables using the full 1,024-bit DMRs.  Right now, all you
> can do with DMRs is move a VSX register to a DMR register, and to move a DMR
> register to a VSX register.  [As I mentioned above, at the moment, this patch
> is problematical as is]
>
> 7) The seventh patch is not DMR related.  It adds support for variants of the
> load/store vector with length instruction that may be added in future PowerPC
> processors.  These variants eliminate having to shift the byte length left by
> 56 bits.
>
> 8) The eighth patch is also not DMR related.  It adds support for a saturating
> subtract operation that may be added to future PowerPC processors.
>
> In terms of changes, we now use the wD constraint for accumulators.  If you
> compile with -mcpu=power10, the wD constraint will match the equivalent VSX
> register (0..31) that overlaps with the accumulator.  If you compile with
> -mcpu=future, the wD constraint will match the DMR register and not the FPR
> register.
>
> This patch also modifies the print_operand %A output modifier to print out DMR
> register numbers if -mcpu=future, and continue to print out the FPR register
> number divided by 4 for -mcpu=power10.
>
> In general, if you only use the built-in functions, things work between the two
> systems.  If you use extended asm, you will likely need to modify the code.
> Going forward, hopefully if you modify your code to use the wD constraint and
> %A output modifier, you can write code that switches more easily between the
> two systems.
>
> Again, these are preliminary patches for a potential future machine.  Things
> will likely change in terms of implementation and usage over time.

May I ask to consider delaying this to stage1 exactly because of this
last reason?

Richard.

>
> --
> Michael Meissner, IBM
> PO Box 98, Ayer, Massachusetts, USA, 01432
> email: meissner@linux.ibm.com

next prev parent reply	other threads:[~2023-02-06  7:25 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-03 21:16 Michael Meissner
2023-02-03 21:21 ` [PATCH 1/8] PowerPC: Add -mcpu=future Michael Meissner
2023-02-03 21:23 ` [PATCH 1/8] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair Michael Meissner
2023-02-03 21:25 ` [PATCH 2/8] PowerPC: Add support for accumulators in DMR registers Michael Meissner
2023-02-03 21:27 ` [PATCH 3/8] PowerPC: Make MMA insns support " Michael Meissner
2023-02-03 21:29 ` [PATCH 4/8] PowerPC: Switch to dense math names for all MMA operations Michael Meissner
2023-02-03 21:33 ` [PATCH 6/8] PowerPC: Add support for 1,024 bit DMR registers Michael Meissner
2023-02-03 21:36 ` [PATCH 7/8] Support load/store vector with right length Michael Meissner
2023-02-03 21:37 ` [PATCH 8/8] Add saturating subtract built-ins Michael Meissner
2023-02-06  7:25 ` Richard Biener [this message]
2023-02-06 18:22   ` [PATCH 0/8] PowerPC future support for Dense Math Peter Bergner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFiYyc34bD+XRzrLaUHdAh0teWNKRy_hehQrusaMeDs_tGJwmA@mail.gmail.com \
    --to=richard.guenther@gmail.com \
    --cc=bergner@linux.ibm.com \
    --cc=dje.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=linkw@linux.ibm.com \
    --cc=meissner@linux.ibm.com \
    --cc=segher@kernel.crashing.org \
    --cc=will_schmidt@vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).