public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/work148-vpair)] Update ChangeLog.*
@ 2023-11-28  6:08 Michael Meissner
  0 siblings, 0 replies; only message in thread
From: Michael Meissner @ 2023-11-28  6:08 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:a6dd9b9da9895ab277ee786e461de767464e0651

commit a6dd9b9da9895ab277ee786e461de767464e0651
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Nov 28 01:08:54 2023 -0500

    Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.vpair | 453 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 453 insertions(+)

diff --git a/gcc/ChangeLog.vpair b/gcc/ChangeLog.vpair
index 89859f31096..89674121eaa 100644
--- a/gcc/ChangeLog.vpair
+++ b/gcc/ChangeLog.vpair
@@ -1,5 +1,458 @@
 ==================== Branch work148-vpair, baseline ====================
 
+<intro for the vector pair built-in patches>
+
+This set of patches adds support for using the vector pair load (lxvp, plxvp,
+and lxvpx) instructions and the vector pair store (stxvp, pstxvp, and stxvpx)
+instructions that were introduced with ISA 3.1 on Power10 systems.
+
+In GCC 13, the only use of vector pairs (and vector quads) was to feed into
+the MMA subsystem.  These patches do not use the MMA subsystem, but they give
+users a way to write code that is extremely memory bandwidth
+intensive.
+
+There are two main ways to add vector pair support to the GCC compiler:
+built-in functions vs. __attribute__((__vector_size__(32))).
+
+The first method is to add a set of built-in functions that use the vector pair
+type and it allows the user to write loops and such using the vector pair type
+(__vector_pair).  Loads are normally done using the load vector pair
+instructions.  Then the operation is done as a post reload split to do the two
+independent vector operations on the two 128-bit vectors located in the vector
+pair.  When the type is stored, normally a store vector pair instruction is
+used.  By keeping the value within a vector pair through register allocation,
+the compiler does not generate extra move instructions which can slow down the
+loop.
+
+The second method is to add support for the V4DF, V8SF, etc. types.  By doing
+so, you can use __attribute__((__vector_size__(32))) to declare variables that
+are vector pairs, and the GCC compiler will generate the appropriate code.  I
+implemented a limited prototype of this support, but it has some problems that
+I haven't addressed.  One potential problem with using the 32-byte vector size
+is that it can generate worse code for operations that aren't covered, as the
+compiler unpacks things and re-packs them.  The compiler would also generate
+these unpacks and packs if you are generating code for a power9 system.  There
+are a bunch of test cases that fail with my prototype implementation that I
+haven't addressed yet.
+
+After discussions within our group, it was decided that using built-in
+functions is the way to go at this time, and these patches implement those
+functions.
+
+In terms of benchmarks, I wrote two benchmarks:
+
+   1)	One benchmark is a saxpy type loop: value[i] += (a[i] * b[i]).  That is
+	a loop with 3 loads and a store per iteration.
+
+   2)	Another benchmark produces a scalar sum of an entire vector.  This is a
+	loop that just has a single load and no store.
+
+For the saxpy type loop, I get the following general numbers for both float and
+double:
+
+   1)	The vector pair built-in functions are roughly 10% faster than using
+	normal vector processing.
+
+   2)	The vector pair built-in functions are roughly 19-20% faster than if I
+	write the loop using the vector pair loads using the existing built-ins,
+	and then manually split the values and do the arithmetic and single
+	vector stores.
+
+   3)	The vector pair built-in functions are roughly 35-40% faster than if I
+	write the loop using the existing built-ins for both vector pair load
+	and vector pair store.  If I apply the patches that Peter Bergner has
+	been writing for PR target/109116, then it improves the speed of the
+	existing built-ins for assembling and disassembling vector pairs.  In
+	this case, the vector pair built-in functions are 20-25% faster,
+	instead of 35-40% faster.  This is due to the patch eliminating extra
+	vector moves.
+
+Unfortunately, for floating point, doing the sum of the whole vector is slower
+using the new vector pair built-in functions with a simple loop (compared to
+using the existing built-ins for disassembling vector pairs).  If I write more
+complex loops that manually unroll the loop, then the floating point vector
+pair built-in functions become like the integer vector pair built-in
+functions.  So there is some amount of tuning that will need to be done.
+
+There are 4 patches within this group of patches.
+
+    1)	The first patch adds vector pair support for 32-bit and 64-bit floating
+	point operations.  The operations provided are absolute value,
+	addition, fused multiply-add, minimum, maximum, multiplication,
+	negation, and subtraction.  I did not add divide or square root because
+	these instructions take long enough to compute that you don't get any
+	advantage from using the vector pair load/store instructions.
+
+    2)	The second patch adds vector pair support for 8-bit, 16-bit, 32-bit, and
+	64-bit integer operations.  The operations provided include addition,
+	bitwise and, bitwise inclusive or, bitwise exclusive or, bitwise not,
+	both signed and unsigned minimum/maximum, negation, and subtraction.  I
+	did not add multiply because the PowerPC architecture does not provide
+	single instructions to do integer vector multiply on the whole vector.
+	I could add shifts and rotates, but I didn't think memory intensive
+	code used these operations.
+
+    3)	The third patch adds methods to create vector pair values (zero, splat
+	from a scalar value, and combine two 128-bit vectors), as well as a
+	convenient method to extract one 128-bit vector from a vector pair.
+
+    4)	The fourth patch adds horizontal addition for 32-bit and 64-bit floating
+	point, and 64-bit integers.  I do wonder if there are more horizontal
+	reductions that should be done.
+
+I have built and tested these patches on:
+
+    *	A little endian power10 server using --with-cpu=power10
+    *	A little endian power9 server using --with-cpu=power9
+    *	A big endian power9 server using --with-cpu=power9.
+
+Can I check these patches into the master branch?
+
+====================
+
+Add support for floating point vector pair built-in functions.
+
+This patch adds a series of built-in functions to allow users to write code to
+do a number of simple operations where the loop is done using the __vector_pair
+type.  The __vector_pair type is an opaque type.  These built-in functions keep
+the two 128-bit vectors within the __vector_pair together, and split the
+operation after register allocation.
+
+This patch provides vector pair operations for 32-bit floating point and 64-bit
+floating point.
+
+2023-11-17  Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-builtins.def (__builtin_vpair_f32_*): Add vector
+	pair built-in functions for float.
+	(__builtin_vpair_f64_*): Add vector pair built-in functions for double.
+	* config/rs6000/rs6000-protos.h (split_unary_vector_pair): Add
+	declaration.
+	(split_binary_vector_pair): Likewise.
+	(split_fma_vector_pair): Likewise.
+	* config/rs6000/rs6000.cc (split_unary_vector_pair): New helper function
+	for vector pair built-in functions.
+	(split_binary_vector_pair): Likewise.
+	(split_fma_vector_pair): Likewise.
+	* config/rs6000/rs6000.md (toplevel): Include vector-pair.md.
+	* config/rs6000/t-rs6000 (MD_INCLUDES): Add vector-pair.md.
+	* config/rs6000/vector-pair.md: New file.
+	* doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document the
+	floating point and general vector pair built-in functions.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vector-pair-1.c: New test.
+	* gcc.target/powerpc/vector-pair-2.c: New test.
+	* gcc.target/powerpc/vector-pair-3.c: New test.
+	* gcc.target/powerpc/vector-pair-4.c: New test.
+
+====================
+
+Add support for integer vector pair built-in functions.
+
+This patch adds a series of built-in functions to allow users to write code to
+do a number of simple operations where the loop is done using the __vector_pair
+type.  The __vector_pair type is an opaque type.  These built-in functions keep
+the two 128-bit vectors within the __vector_pair together, and split the
+operation after register allocation.
+
+This patch provides vector pair operations for 8, 16, 32, and 64-bit integers.
+
+I have built and tested these patches on:
+
+    *	A little endian power10 server using --with-cpu=power10
+    *	A little endian power9 server using --with-cpu=power9
+    *	A big endian power9 server using --with-cpu=power9.
+
+Can I check this patch into the master branch after the preceding patch is
+checked in?
+
+2023-11-17  Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-builtins.def (__builtin_vpair_i8*): Add built-in
+	functions for integer vector pairs.
+	(__builtin_vpair_i16*): Likewise.
+	(__builtin_vpair_i32*): Likewise.
+	(__builtin_vpair_i64*): Likewise.
+	* config/rs6000/vector-pair.md (UNSPEC_VPAIR_V32QI): New unspec.
+	(UNSPEC_VPAIR_V16HI): Likewise.
+	(UNSPEC_VPAIR_V8SI): Likewise.
+	(UNSPEC_VPAIR_V4DI): Likewise.
+	(VP_INT_BINARY): New iterator for integer vector pair.
+	(vp_insn): Add support for integer vector pairs.
+	(vp_ireg): New code attribute for integer vector pairs.
+	(vp_ipredicate): Likewise.
+	(VP_INT): New int iterator for integer vector pairs.
+	(VP_VEC_MODE): Likewise.
+	(vp_pmode): Likewise.
+	(vp_vmode): Likewise.
+	(vp_neg_reg): New int iterator for integer vector pairs.
+	(vpair_neg_<vp_pmode>): Add integer vector pair support insns.
+	(vpair_not_<vp_pmode>2): Likewise.
+	(vpair_<vp_insn>_<vp_pmode>3): Likewise.
+	(vpair_andc_<vp_pmode>): Likewise.
+	(*vpair_iorc_<vp_pmode>): Likewise.
+	(vpair_nand_<vp_pmode>_1): Likewise.
+	(vpair_nand_<vp_pmode>_2): Likewise.
+	(vpair_nor_<vp_pmode>_1): Likewise.
+	(vpair_nor_<vp_pmode>_2): Likewise.
+	* doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document the
+	integer vector pair built-in functions.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vector-pair-5.c: New test.
+	* gcc.target/powerpc/vector-pair-6.c: New test.
+	* gcc.target/powerpc/vector-pair-7.c: New test.
+	* gcc.target/powerpc/vector-pair-8.c: New test.
+
+====================
+
+Add support for initializing and extracting from vector pairs.
+
+This patch adds a series of built-in functions to allow users to write code to
+do a number of simple operations where the loop is done using the __vector_pair
+type.  The __vector_pair type is an opaque type.  These built-in functions keep
+the two 128-bit vectors within the __vector_pair together, and split the
+operation after register allocation.
+
+This patch provides vector pair operations for loading up a vector pair with all
+0's, duplicated (splat) from a scalar type, or combining two vectors into a
+vector pair.  This patch also provides vector pair built-ins to extract one
+128-bit vector from a vector pair.
+
+I have built and tested these patches on:
+
+    *	A little endian power10 server using --with-cpu=power10
+    *	A little endian power9 server using --with-cpu=power9
+    *	A big endian power9 server using --with-cpu=power9.
+
+Can I check this patch into the master branch after the preceding patches have
+been checked in?
+
+2023-11-17  Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/predicates.md (mma_assemble_input_operand): Allow any
+	16-byte vector, not just V16QImode.
+	* config/rs6000/rs6000-builtins.def (__builtin_vpair_zero): New vector
+	pair initialization built-in functions.
+	(__builtin_vpair_*_assemble): Likewise.
+	(__builtin_vpair_*_splat): Likewise.
+	(__builtin_vpair_*_extract_vector): New vector pair extraction built-in
+	functions.
+	* config/rs6000/vector-pair.md (UNSPEC_VPAIR_V32QI): New unspec.
+	(UNSPEC_VPAIR_V16HI): Likewise.
+	(UNSPEC_VPAIR_V8SI): Likewise.
+	(UNSPEC_VPAIR_V4DI): Likewise.
+	(VP_INT_BINARY): New iterator for integer vector pair.
+	(vp_insn): Add support for integer vector pairs.
+	(vp_ireg): New code attribute for integer vector pairs.
+	(vp_ipredicate): Likewise.
+	(VP_INT): New int iterator for integer vector pairs.
+	(VP_VEC_MODE): Likewise.
+	(vp_pmode): Likewise.
+	(vp_vmode): Likewise.
+	(vp_neg_reg): New int iterator for integer vector pairs.
+	(vpair_neg_<vp_pmode>): Add integer vector pair support insns.
+	(vpair_not_<vp_pmode>2): Likewise.
+	(vpair_<vp_insn>_<vp_pmode>3): Likewise.
+	(vpair_andc_<vp_pmode>): Likewise.
+	(vpair_iorc_<vp_pmode>): Likewise.
+	(vpair_nand_<vp_pmode>_1): Likewise.
+	(vpair_nand_<vp_pmode>_2): Likewise.
+	(vpair_nor_<vp_pmode>_1): Likewise.
+	(vpair_nor_<vp_pmode>_2): Likewise.
+	* doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document the
+	integer vector pair built-in functions.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vector-pair-5.c: New test.
+	* gcc.target/powerpc/vector-pair-6.c: New test.
+	* gcc.target/powerpc/vector-pair-7.c: New test.
+	* gcc.target/powerpc/vector-pair-8.c: New test.
+
+====================
+
+Add support for doing a horizontal add on vector pair elements.
+
+This patch adds a series of built-in functions to allow users to write code to
+do a number of simple operations where the loop is done using the __vector_pair
+type.  The __vector_pair type is an opaque type.  These built-in functions keep
+the two 128-bit vectors within the __vector_pair together, and split the
+operation after register allocation.
+
+This patch provides vector pair built-in functions to do a horizontal add on
+vector pair elements.  Only floating point and 64-bit integer horizontal adds are
+provided in this patch.
+
+I have built and tested these patches on:
+
+    *	A little endian power10 server using --with-cpu=power10
+    *	A little endian power9 server using --with-cpu=power9
+    *	A big endian power9 server using --with-cpu=power9.
+
+Can I check this patch into the master branch after the preceding patches have
+been checked in?
+
+2023-11-17  Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-builtins.def (__builtin_vpair_f32_add_elements):
+	New built-in function.
+	(__builtin_vpair_f64_add_elements): Likewise.
+	(__builtin_vpair_i64_add_elements): Likewise.
+	(__builtin_vpair_i64u_add_elements): Likewise.
+	* config/rs6000/vector-pair.md (UNSPEC_VPAIR_REDUCE_PLUS_F32): New
+	unspec.
+	(UNSPEC_VPAIR_REDUCE_PLUS_F64): Likewise.
+	(UNSPEC_VPAIR_REDUCE_PLUS_I64): Likewise.
+	(vpair_reduc_plus_scale_v8sf): New insn.
+	(vpair_reduc_plus_scale_v4df): Likewise.
+	(vpair_reduc_plus_scale_v4di): Likewise.
+	* doc/extend.texi (__builtin_vpair_f32_add_elements): Document.
+	(__builtin_vpair_f64_add_elements): Likewise.
+	(__builtin_vpair_i64_add_elements): Likewise.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vector-pair-16.c: New test.
+
+====================
+
+Add overloads for __builtin_vpair_assemble.
+
+2023-11-17  Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-overloads.def (__builtin_vpair_assemble): Add
+	overloads.
+
+====================
+
+Rename things so it can be combined with the vsize branch.
+
+2023-11-17  Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000-builtins.def (__builtin_vpair*): Rename all insn
+	names from VPAIR... to VPAIR_FUNC... to allow building the combined
+	vsubreg branch.
+	* config/rs6000/rs6000-overloads.def (__builtin_vpair*): Likewise.
+	* config/rs6000/rs6000.md (toplevel): Include vpair-func.md instead of
+	vector-pair.md.
+	* config/rs6000/t-rs6000 (MD_INCLUDES): Change vector-pair.md to
+	vpair-func.md.
+	* config/rs6000/vpair-func.md: Rename from vector-pair.md to
+	vpair-func.md.  Change all VPAIR names to be VPAIR_FUNC.
+
+==================== Branch work148-vpair, patch #1 (work148 branch) ====================
+
+Power10: Add options to disable load and store vector pair.
+
+This is version 2 of the patch to add -mno-load-vector-pair and
+-mno-store-vector-pair undocumented tuning switches.
+
+The difference between the first version of the patch and this version is that
+I added explicit RTL isa attributes for when the compiler can generate the load
+vector pair and store vector pair instructions.  By having these attributes, the
+movoo insn has separate alternatives for when we generate the instruction and
+when we want to split the instruction into 2 separate vector loads or stores.
+
+In the first version of the patch, I had previously provided built-in functions
+that would always generate load vector pair and store vector pair instructions
+even if these instructions are normally disabled.  I found these built-ins
+weren't specified like the other vector pair built-ins, and I didn't include
+documentation for the built-in functions.  If we want such built-in functions,
+we can add them as a separate patch later.
+
+In addition, since both versions of the patch add #pragma target and attribute
+support to change the results for individual functions, we can select on a
+function by function basis what the defaults for load/store vector pair are.
+
+The original text for the patch is:
+
+In working on some future patches that involve utilizing vector pair
+instructions, I wanted to be able to tune my program to enable or disable using
+the vector pair load or store operations while still keeping the other
+operations on the vector pair.
+
+This patch adds two undocumented tuning options.  The -mno-load-vector-pair
+option would tell GCC to generate two load vector instructions instead of a
+single load vector pair.  The -mno-store-vector-pair option would tell GCC to
+generate two store vector instructions instead of a single store vector pair.
+
+If -mno-load-vector-pair is used, GCC will also not generate the indexed
+stxvpx instruction.  Similarly, if -mno-store-vector-pair is used, GCC will
+also not generate the indexed lxvpx instruction.  The reason for this is to
+enable splitting the {,p}lxvp or {,p}stxvp instructions after reload without
+needing a scratch GPR register.
+
+The default for -mcpu=power10 is that both load vector pair and store vector
+pair are enabled.
+
+I added code so that the user code can modify these settings using either a
+'#pragma GCC target' directive or __attribute__((__target__(...))) in the
+function declaration.
+
+I added tests for the switches, #pragma, and attribute options.
+
+I have built this on both little endian power10 systems and big endian power9
+systems doing the normal bootstrap and test.  There were no regressions in any
+of the tests, and the new tests passed.  Can I check this patch into the master
+branch?
+
+2023-11-28  Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/mma.md (movoo): Add support for -mno-load-vector-pair and
+	-mno-store-vector-pair.
+	* config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add support for
+	-mload-vector-pair and -mstore-vector-pair.
+	(POWERPC_MASKS): Likewise.
+	* config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): Only allow
+	indexed mode for OOmode if we are generating both load vector pair and
+	store vector pair instructions.
+	(rs6000_option_override_internal): Add support for -mno-load-vector-pair
+	and -mno-store-vector-pair.
+	(rs6000_opt_masks): Likewise.
+	* config/rs6000/rs6000.md (isa attribute): Add lxvp and stxvp
+	attributes.
+	(enabled attribute): Likewise.
+	* config/rs6000/rs6000.opt (-mload-vector-pair): New option.
+	(-mstore-vector-pair): Likewise.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vector-pair-attribute.c: New test.
+	* gcc.target/powerpc/vector-pair-pragma.c: New test.
+	* gcc.target/powerpc/vector-pair-switch1.c: New test.
+	* gcc.target/powerpc/vector-pair-switch2.c: New test.
+	* gcc.target/powerpc/vector-pair-switch3.c: New test.
+	* gcc.target/powerpc/vector-pair-switch4.c: New test.
+
+==================== Branch work148-vpair, baseline ====================
+
+Add ChangeLog.vpair and update REVISION.
+
+2023-11-28  Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* ChangeLog.vpair: New file for branch.
+	* REVISION: Update.
+
 2023-11-28   Michael Meissner  <meissner@linux.ibm.com>
 
 	Clone branch
