public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/6] PowerPC Dense Math prelimary support (-mcpu=future)
@ 2022-11-10  2:43 Michael Meissner
  2022-11-10  2:44 ` [PATCH 1/6] PowerPC: Add -mcpu=future Michael Meissner
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Michael Meissner @ 2022-11-10  2:43 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt

This patch set is very preliminary support for a potential new feature of the
PowerPC that extends the current Power10 MMA architecture.  This feature may or
may not be present in any specific future PowerPC processor.

In the current MMA subsystem for Power10, there are 8 512-bit accumulator
registers.  These accumulators are each tied to sets of 4 FPR registers.  When
you issue a prime instruction, it makes sure the accumulator is a copy of the 4
FPR registers the accumulator is tied to.  When you issue a deprime
instruction, it makes sure that the accumulator data content is logically
copied to the matching FPR registers.
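
As a rough illustration of the prime/deprime relationship described above, here
is a small software model in C.  It is only a sketch: the type and function
names (mma_model_t, model_prime, model_deprime) are made up for this example,
and it assumes that accumulator N is tied to the four 128-bit registers
4*N through 4*N+3.  It models the logical copy only, not how the hardware
actually implements it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_ACC      8   /* 8 accumulators on Power10 */
#define REGS_PER_ACC 4   /* each accumulator is tied to 4 registers */
#define NUM_REGS     (NUM_ACC * REGS_PER_ACC)

/* Hypothetical model types, not anything defined by GCC or the ISA.  */
typedef struct { uint8_t b[16]; } reg128_t;               /* 128 bits */
typedef struct { reg128_t row[REGS_PER_ACC]; } acc512_t;  /* 512 bits */

typedef struct {
  reg128_t fpr[NUM_REGS];   /* registers overlapping the accumulators */
  acc512_t acc[NUM_ACC];    /* the accumulators themselves */
} mma_model_t;

/* Prime: make accumulator A a copy of the 4 registers it is tied to.  */
void model_prime (mma_model_t *m, unsigned a)
{
  assert (a < NUM_ACC);
  memcpy (&m->acc[a], &m->fpr[a * REGS_PER_ACC], sizeof (acc512_t));
}

/* Deprime: logically copy accumulator A back to its 4 registers.  */
void model_deprime (mma_model_t *m, unsigned a)
{
  assert (a < NUM_ACC);
  memcpy (&m->fpr[a * REGS_PER_ACC], &m->acc[a], sizeof (acc512_t));
}
```

In this model, priming accumulator 1 reads registers 4 through 7, and depriming
it writes those same four registers back.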

In the potential dense math system, the accumulators are moved to separate
registers called dense math registers (DM registers or DMR).  The DMRs are then
extended to 1,024 bits and new instructions will be added to deal with all
1,024 bits of the DMRs.

If you take existing MMA code, it will work as long as you don't do anything
with accumulators, and you follow the rules in the ISA 3.1 documentation for
using the MMA subsystem.

These patches add support for the 512-bit accumulators within the dense math
system, and for allocation of the 1,024-bit DMRs.  At this time, no additional
built-in functions will be added to support any dense math features other than
data movement between the DMRs and the VSX registers.  Before we can look
at adding any new dense math support other than data movement, we need the GCC
compiler to be able to allocate and use these DMRs.

There are 6 patches in this patch set:

1) The first patch just adds -mcpu=future as an option to add new support.
This is similar to the -mcpu=future that we did before power10 was announced.

2) The second patch enables GCC to use the load and store vector pair
instructions to optimize memory copy operations in the compiler.  For power10,
we needed to just stay with normal vector load/stores for memory copy
operations.

3) The third patch enables 512-bit accumulators stored in DMRs.  This patch
enables the register allocation, but it does not move the existing MMA support
to use these registers.

4) The fourth patch switches the MMA subsystem to use 512-bit accumulators
within DMRs if you use -mcpu=future.

5) The fifth patch switches the names of the MMA instructions to use the dense
math equivalent name if -mcpu=future.

6) The sixth patch enables using the full 1,024-bit DMRs.  Right now, all you
can do with DMRs is move a VSX register to a DMR register, or move a DMR
register to a VSX register.

In terms of changes, we now use the wD constraint for accumulators.  If you
compile with -mcpu=power10, the wD constraint will match the equivalent FPR
register that overlaps with the accumulator.  If you compile with -mcpu=future,
the wD constraint will match the DMR register and not the FPR register.

These patches also modify the print_operand %A output modifier to print out
DMR register numbers if -mcpu=future, and continue to print out the FPR
register number divided by 4 for -mcpu=power10.
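
The numbering rule for %A can be sketched in C as follows.  This is not the
print_operand code from the patches.  The helper name acc_operand_number is
hypothetical, and regno here is an index within the FPR or DMR register file
rather than a GCC hard register number.  The sketch also assumes that only FPRs
whose number is a multiple of 4 can name an accumulator.

```c
#include <assert.h>
#include <stdbool.h>

#define FPRS_PER_ACC 4   /* each power10 accumulator overlaps 4 FPRs */
#define NUM_FPRS     32

/* Hypothetical helper: return the number that %A would print.
   dense_math is true for -mcpu=future, false for -mcpu=power10.  */
unsigned acc_operand_number (unsigned regno, bool dense_math)
{
  if (dense_math)
    return regno;                       /* DMR number is printed as-is */

  /* power10: the accumulator is named by the first FPR of its group.  */
  assert (regno < NUM_FPRS && regno % FPRS_PER_ACC == 0);
  return regno / FPRS_PER_ACC;
}
```

For example, FPR 28 names accumulator 7 under -mcpu=power10, while under
-mcpu=future the register number passed in is printed unchanged.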

In general, if you only use the built-in functions, things work between the two
systems.  If you use extended asm, you will likely need to modify the code.
Going forward, hopefully if you modify your code to use the wD constraint and
%A output modifier, you can write code that switches more easily between the
two systems.

There is one bug that I noticed.  When you use the full DMR instruction, the
constant copy propagation pass issues internal errors.  I believe this is due
to the CCP pass not handling opaque types cleanly enough, and it only shows up
in larger types.  I would like to get these patches committed, and then work
with the maintainers of the CCP pass to fix the problem.

Again, these are preliminary patches for a potential future machine.  Things
will likely change in terms of implementation and usage over time.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

* [PATCH 0/6] PowerPC Future patches
@ 2023-10-18 23:55 Michael Meissner
  2023-10-19  0:06 ` [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers Michael Meissner
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Meissner @ 2023-10-18 23:55 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner

This patch set is very preliminary support for a potential new feature of the
PowerPC that extends the current Power10 MMA architecture.  This feature may or
may not be present in any specific future PowerPC processor.

In the current MMA subsystem for Power10, there are 8 512-bit accumulator
registers.  These accumulators are each tied to sets of 4 FPR registers.  When
you issue a prime instruction, it makes sure the accumulator is a copy of the 4
FPR registers the accumulator is tied to.  When you issue a deprime
instruction, it makes sure that the accumulator data content is logically
copied to the matching FPR registers.

In the potential dense math system, the accumulators are moved to separate
registers called dense math registers (DM registers or DMR).  The DMRs are then
extended to 1,024 bits and new instructions will be added to deal with all
1,024 bits of the DMRs.

If you take existing MMA code, it will work as long as you don't do anything
with accumulators, and you follow the rules in the ISA 3.1 documentation for
using the MMA subsystem.

These patches add support for the 512-bit accumulators within the dense math
system, and for allocation of the 1,024-bit DMRs.  At this time, no additional
built-in functions will be added to support any dense math features other than
data movement between the DMRs and the VSX registers.  Before we can look
at adding any new dense math support other than data movement, we need the GCC
compiler to be able to allocate and use these DMRs.

There are 6 patches in this patch set:

1) The first patch just adds -mcpu=future as an option to add new support.
This is similar to the -mcpu=future that we did before power10 was announced.

2) The second patch enables GCC to use the load and store vector pair
instructions to optimize memory copy operations in the compiler.  For power10,
we needed to just stay with normal vector load/stores for memory copy
operations.

3) The third patch enables 512-bit accumulators that are located within DMRs
instead of the FPRs.  This patch enables the register allocation, but it does
not move the existing MMA support to use these registers.

4) The fourth patch switches the MMA subsystem to use 512-bit accumulators
within DMRs if you use -mcpu=future.

5) The fifth patch switches the names of the MMA instructions to use the dense
math equivalent name if -mcpu=future.

6) The sixth patch enables using the full 1,024-bit DMRs.  Right now, all you
can do with DMRs is move a VSX register to a DMR register, or move a DMR
register to a VSX register.

In terms of changes, these patches now use the wD constraint for accumulators.
If you compile with -mcpu=power10, the wD constraint will match the equivalent
FPR register that overlaps with the accumulator.  If you compile with
-mcpu=future, the wD constraint will match the DMR register and not the FPR
register.

These patches also modify the print_operand %A output modifier to print out
DMR register numbers if -mcpu=future, and continue to print out the FPR
register number divided by 4 for -mcpu=power10.

In general, if you only use the built-in functions, things work between the two
systems.  If you use extended asm, you will likely need to modify the code.
Going forward, hopefully if you modify your code to use the wD constraint and
%A output modifier, you can write code that switches more easily between the
two systems.

Again, these are preliminary patches for a potential future machine.  Things
will likely change in terms of implementation and usage over time.

Originally these patches were submitted in November 2022:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605581.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

* Repost [PATCH 0/6] PowerPC Future patches
@ 2024-01-05 23:27 Michael Meissner
  2024-01-05 23:42 ` Repost [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers Michael Meissner
  0 siblings, 1 reply; 26+ messages in thread
From: Michael Meissner @ 2024-01-05 23:27 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner

I posted these patches on October 18th, 2023, and I never received any feedback
on the changes.  What changes do I need to make with these patches to get them
into GCC 14?

This patch set is very preliminary support for a potential new feature of the
PowerPC that extends the current Power10 MMA architecture.  This feature may or
may not be present in any specific future PowerPC processor.

In the current MMA subsystem for Power10, there are 8 512-bit accumulator
registers.  These accumulators are each tied to sets of 4 FPR registers.  When
you issue a prime instruction, it makes sure the accumulator is a copy of the 4
FPR registers the accumulator is tied to.  When you issue a deprime
instruction, it makes sure that the accumulator data content is logically
copied to the matching FPR registers.

In the potential dense math system, the accumulators are moved to separate
registers called dense math registers (DM registers or DMR).  The DMRs are then
extended to 1,024 bits and new instructions will be added to deal with all
1,024 bits of the DMRs.

If you take existing MMA code, it will work as long as you don't do anything
with accumulators, and you follow the rules in the ISA 3.1 documentation for
using the MMA subsystem.

These patches add support for the 512-bit accumulators within the dense math
system, and for allocation of the 1,024-bit DMRs.  At this time, no additional
built-in functions will be added to support any dense math features other than
data movement between the DMRs and the VSX registers.  Before we can look
at adding any new dense math support other than data movement, we need the GCC
compiler to be able to allocate and use these DMRs.

There are 6 patches in this patch set:

1) The first patch just adds -mcpu=future as an option to add new support.
This is similar to the -mcpu=future that we did before power10 was announced.

2) The second patch enables GCC to use the load and store vector pair
instructions to optimize memory copy operations in the compiler.  For power10,
we needed to just stay with normal vector load/stores for memory copy
operations.

3) The third patch enables 512-bit accumulators that are located within DMRs
instead of the FPRs.  This patch enables the register allocation, but it does
not move the existing MMA support to use these registers.

4) The fourth patch switches the MMA subsystem to use 512-bit accumulators
within DMRs if you use -mcpu=future.

5) The fifth patch switches the names of the MMA instructions to use the dense
math equivalent name if -mcpu=future.

6) The sixth patch enables using the full 1,024-bit DMRs.  Right now, all you
can do with DMRs is move a VSX register to a DMR register, or move a DMR
register to a VSX register.

In terms of changes, these patches now use the wD constraint for accumulators.
If you compile with -mcpu=power10, the wD constraint will match the equivalent
FPR register that overlaps with the accumulator.  If you compile with
-mcpu=future, the wD constraint will match the DMR register and not the FPR
register.

These patches also modify the print_operand %A output modifier to print out
DMR register numbers if -mcpu=future, and continue to print out the FPR
register number divided by 4 for -mcpu=power10.

In general, if you only use the built-in functions, things work between the two
systems.  If you use extended asm, you will likely need to modify the code.
Going forward, hopefully if you modify your code to use the wD constraint and
%A output modifier, you can write code that switches more easily between the
two systems.

Again, these are preliminary patches for a potential future machine.  Things
will likely change in terms of implementation and usage over time.

Originally these patches were submitted in November 2022:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605581.html

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

Thread overview: 26+ messages
2022-11-10  2:43 [PATCH 0/6] PowerPC Dense Math prelimary support (-mcpu=future) Michael Meissner
2022-11-10  2:44 ` [PATCH 1/6] PowerPC: Add -mcpu=future Michael Meissner
2022-11-11 21:07   ` Peter Bergner
2023-01-20 21:05   ` Ping: " Michael Meissner
2023-01-27 20:00     ` Segher Boessenkool
2022-11-10  2:45 ` [PATCH 2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair Michael Meissner
2023-01-20 21:07   ` Ping: " Michael Meissner
2022-11-10  2:46 ` [PATCH 3/6] PowerPC: Add support for accumulators in DMR registers Michael Meissner
2023-01-20 21:08   ` Ping: " Michael Meissner
2022-11-10  2:50 ` [PATCH 4/6] PowerPC: Make MMA insns support " Michael Meissner
2023-01-20 21:10   ` Ping: " Michael Meissner
2022-11-10  2:51 ` [PATCH 5/6] PowerPC: Switch to dense math names for all MMA operations Michael Meissner
2023-01-20 21:11   ` Ping: " Michael Meissner
2022-11-10  2:52 ` [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers Michael Meissner
2023-01-20 21:12   ` Ping: " Michael Meissner
2022-11-12  5:07 ` [PATCH 7] PowerPC: Add -mcpu=future saturating subtract built-ins Michael Meissner
2023-01-20 21:13   ` Ping: " Michael Meissner
2022-11-12  5:10 ` [PATCH 8] PowerPC: Support load/store vector with right length Michael Meissner
2023-01-20 21:15   ` Patch: " Michael Meissner
2023-01-27 19:59 ` [PATCH 0/6] PowerPC Dense Math prelimary support (-mcpu=future) Segher Boessenkool
2023-01-28  7:29   ` Michael Meissner
2023-01-30  2:52     ` Michael Meissner
2023-02-01  3:31       ` Michael Meissner
2023-02-02  0:05         ` Segher Boessenkool
2023-10-18 23:55 [PATCH 0/6] PowerPC Future patches Michael Meissner
2023-10-19  0:06 ` [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers Michael Meissner
2023-10-25 20:21   ` Ping: " Michael Meissner
2024-01-05 23:27 Repost [PATCH 0/6] PowerPC Future patches Michael Meissner
2024-01-05 23:42 ` Repost [PATCH 6/6] PowerPC: Add support for 1,024 bit DMR registers Michael Meissner
2024-01-19 18:49   ` Ping " Michael Meissner