From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <meissner@sourceware.org>
Received: by sourceware.org (Postfix, from userid 1005)
	id 278833858289; Fri, 14 Jul 2023 19:52:22 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 278833858289
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1689364342;
	bh=CbB5ERIwMvlNfi2xfY5jFyQ2lf65TSPGFJLiT6nwutk=;
	h=From:To:Subject:Date:From;
	b=D1uS6r7XB+wyJDmL5pyd9tRLYI+qqTPWW3grQRcnDoKecIJKdpaujYyebuDxizhp4
	 ej5VQTnAod+Dw/29US/giG+sgVDu79mGD8gVF3/6u7UkABYmLbZkVTOoZNzvB6Sx3F
	 sC/OBFEtC59MtgLrHuI5MnOtit0qSEotKfXQsZH0=
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Michael Meissner <meissner@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org, libstdc++-cvs@gcc.gnu.org
Subject: [gcc(refs/users/meissner/heads/work125-vpair)] Merge commit
 'refs/users/meissner/heads/work125-vpair' of git+ssh://gcc.gnu.org/git/gcc
 into me/work
X-Act-Checkin: gcc
X-Git-Author: Michael Meissner <meissner@linux.ibm.com>
X-Git-Refname: refs/users/meissner/heads/work125-vpair
X-Git-Oldrev: 9846b58a20f2172e5bdd502f3f78d20a59228e27
X-Git-Newrev: 630e1ba4d622b53da62dc7d07bcfa842da5f2162
Message-Id: <20230714195222.278833858289@sourceware.org>
Date: Fri, 14 Jul 2023 19:52:22 +0000 (GMT)
List-Id: <libstdc++-cvs.sourceware.org>

https://gcc.gnu.org/g:630e1ba4d622b53da62dc7d07bcfa842da5f2162

commit 630e1ba4d622b53da62dc7d07bcfa842da5f2162
Merge: 9846b58a20f c53c1e9f353
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Jul 14 15:51:50 2023 -0400

    Merge commit 'refs/users/meissner/heads/work125-vpair' of git+ssh://gcc.gnu.org/git/gcc into me/work125-vpair

Diff:

 gcc/ChangeLog.meissner           | 2 +-
 gcc/REVISION                     | 2 +-
 gcc/c-family/ChangeLog.meissner  | 2 +-
 gcc/c/ChangeLog.meissner         | 2 +-
 gcc/cp/ChangeLog.meissner        | 2 +-
 gcc/fortran/ChangeLog.meissner   | 2 +-
 gcc/testsuite/ChangeLog.meissner | 2 +-
 libgcc/ChangeLog.meissner        | 2 +-
 libstdc++-v3/ChangeLog.meissner  | 2 +-
 9 files changed, 9 insertions(+), 9 deletions(-)

diff --cc gcc/ChangeLog.meissner
index 306c070821c,8c6f8f14248..5051fa02d69
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@@ -1,364 -1,4 +1,364 @@@
 +==================== Branch work125, patch #9 ====================
 +
 +PR target/89213 - Optimize vector shift by a constant.
 +
 +Optimize vector shifts by a constant, taking advantage of the fact that the
 +shift instructions only look at the bits of the shift count needed for the element size.
 +
 +The PowerPC doesn't have a VSPLTISD instruction.  This meant that for a V2DI
 +shift of 0..15, we had to use VSPLTISW and VEXTSW2D instructions to load the
 +constant into the vector register.
 +
 +Similarly for V4SI and V2DI, if we wanted to shift more than 15 bits, we would
 +generate XXSPLTIB and VEXTSB2D or VEXTSB2W instructions to load the constant
 +into the vector register.
 +
 +Since the vector shift instructions only look at the bottom 5 (V4SI) or 6 (V2DI)
 +bits of each shift count element, we can load the constant via VSPLTISW or
 +XXSPLTIB instructions and eliminate the sign extend instructions (VEXTSW2D,
 +VEXTSB2D, and VEXTSB2W).
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    *	Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IEEE 128-bit long double
 +    *   Power9,  LE, --with-cpu=power9,  64-bit default long double
 +    *	Power9,  BE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power8,  BE, --with-cpu=power8,  IBM 128-bit long double
 +
 +2023-07-14  Michael Meissner  <meissner@linux.ibm.com>
 +
 +gcc/
 +
 +	PR target/89213
 +	* config/rs6000/altivec.md (UNSPEC_VECTOR_SHIFT): New unspec.
 +	(V4SI_V2DI): New mode iterator.
 +	(vshift_code): New code iterator.
 +	(altivec_<code>_const_<mode>): New insns.
 +	(altivec_shift_const_<mode>): New insns.
 +	* config/rs6000/predicates.md (vector_shift_constant): New
 +	predicate.
 +
 +gcc/testsuite/
 +
 +	PR target/89213
 +	* gcc.target/powerpc/pr89213.c: New test.
 +	* gcc.target/powerpc/vec-rlmi-rlnm.c: Update insn count.
 +
 +==================== Branch work125, patch #8 ====================
 +
 +Update fold-vec-extract insn counts on 32-bit big endian
 +
 +While running tests, I noticed that on 32-bit big endian systems the expected
 +number of ADDI instructions is higher than the number of ADDIs the compiler
 +currently generates.  This patch adjusts those counts.
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    *	Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IEEE 128-bit long double
 +    *   Power9,  LE, --with-cpu=power9,  64-bit default long double
 +    *	Power9,  BE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power8,  BE, --with-cpu=power8,  IBM 128-bit long double
 +
 +2023-07-14   Michael Meissner  <meissner@linux.ibm.com>
 +
 +gcc/testsuite/
 +
 +	* gcc.target/powerpc/fold-vec-extract-char.p7.c: Update insn count for
 +	32-bit.
 +	* gcc.target/powerpc/fold-vec-extract-double.p7.c: Likewise.
 +	* gcc.target/powerpc/fold-vec-extract-float.p7.c: Likewise.
 +	* gcc.target/powerpc/fold-vec-extract-float.p8.c: Likewise.
 +	* gcc.target/powerpc/fold-vec-extract-int.p7.c: Likewise.
 +	* gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
 +	* gcc.target/powerpc/fold-vec-extract-short.p7.c: Likewise.
 +	* gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.
 +
 +==================== Branch work125, patch #7 ====================
 +
 +Allow constant element vec_extract to be converted to floating point
 +
 +This patch allows the result of a vec_extract of the following types to be
 +converted to floating point by loading the value directly into a vector
 +register and doing the conversion there, instead of loading the value into a
 +GPR and then doing a direct move:
 +
 +vector int
 +vector unsigned int
 +vector unsigned short
 +vector unsigned char
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    *	Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IEEE 128-bit long double
 +    *   Power9,  LE, --with-cpu=power9,  64-bit default long double
 +    *	Power9,  BE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power8,  BE, --with-cpu=power8,  IBM 128-bit long double
 +
 +2023-07-14   Michael Meissner  <meissner@linux.ibm.com>
 +
 +gcc/
 +
 +	* config/rs6000/rs6000.md (fp_int_extend): New code attribute.
 +	* config/rs6000/vsx.md (vsx_extract_v4si_load_to_<uns><mode>): New
 +	insn.
 +	(vsx_extract_<VSX_EXTRACT_I2:mode>_load_to_uns<SFDF:mode>): New insn.
 +
 +gcc/testsuite/
 +
 +	* gcc.target/powerpc/vec-extract-mem-char-2.c: New file.
 +	* gcc.target/powerpc/vec-extract-mem-int-2.c: New file.
 +	* gcc.target/powerpc/vec-extract-mem-int_3.c: New file.
 +	* gcc.target/powerpc/vec-extract-mem-short-2.c: New file.
 +
 +==================== Branch work125, patch #6 ====================
 +
 +Add alternatives for vec_extract with constant element loading from memory.
 +
 +This patch expands the alternatives for doing vec_extract of V4SI, V8HI, and
 +V16QI vectors with a constant offset when the vector is in memory.  If the
 +element number is 0, or if we are using offsettable addressing to load GPR
 +registers, we don't need to allocate a temporary base register, since we can
 +fold the offset from the vec_extract into the normal address.
 +
 +I also added alternatives to load the values into vector registers.  If we load
 +the value into vector registers, we require X-form addressing.
 +
 +I added the VSX_EX_ISA mode attribute to distinguish that a power8 system can
 +load 32-bit integers into vector registers, but a power9 system is needed to
 +load 8-bit or 16-bit integers.
 +
 +In general, explicitly loading small integer values into vector registers with
 +vec_extract is probably uncommon.  However, this will be needed in later patches
 +when we want to combine loading a small integer value into a vector register
 +with sign/zero extension.  This happens when we want to do a vec_extract of a
 +small integer value and convert it to floating point.
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    *	Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IEEE 128-bit long double
 +    *   Power9,  LE, --with-cpu=power9,  64-bit default long double
 +    *	Power9,  BE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power8,  BE, --with-cpu=power8,  IBM 128-bit long double
 +
 +2023-07-14   Michael Meissner  <meissner@linux.ibm.com>
 +
 +gcc/
 +
 +	* config/rs6000/vsx.md (VSX_EX_ISA): New mode attribute.
 +	(vsx_extract_<mode>_load): Add more alternatives for memory options.
 +	Allow the load to load up vector registers if needed.
 +
 +gcc/testsuite/
 +
 +	* gcc.target/powerpc/vec-extract-mem-char-1.c: New test.
 +	* gcc.target/powerpc/vec-extract-mem-int-1.c: New test.
 +	* gcc.target/powerpc/vec-extract-mem-short-1.c: New test.
 +
 +==================== Branch work125, patch #5 ====================
 +
 +Optimize vec_extract of V4SF with variable element number being converted to DF
 +
 +This patch adds a combiner insn that folds the conversion of float to double
 +into the load when doing a vec_extract of V4SF from memory with a variable
 +element number.
 +
 +It also removes the '?' from the 'r' constraint so that if the SFmode is needed
 +in a GPR, it doesn't have to load it to the vector unit, store it on the stack,
 +and reload it into a GPR register.
 +
 +2023-07-11   Michael Meissner  <meissner@linux.ibm.com>
 +
 +gcc/
 +
 +	* config/rs6000/vsx.md (vsx_extract_v4sf_var_load): Remove '?' from 'r'
 +	constraint.
 +	(vsx_extract_v4sf_var_load_to_df): New insn.
 +
 +gcc/testsuite/
 +
 +	* gcc.target/powerpc/vec-extract-mem-float-2.c: New file.
 +
 +==================== Branch work125, patch #4 ====================
 +
 +Optimize vec_extract of V4SF from memory with constant element numbers.
 +
 +This patch updates vec_extract of V4SF from memory with constant element
 +numbers.
 +
 +I went through the alternatives and added alternatives to denote when we
 +don't need to allocate a temporary base register.  These cases include
 +extracting element 0, and extracting elements 1-3 where we can use offsettable
 +addresses.
 +
 +I added alternatives for power8 and power9 systems to account for the expanded
 +addressing on these machines (power8 can load SFmode into Altivec registers with
 +X-form addressing, and power9 can use offsettable addressing to load up Altivec
 +registers).
 +
 +This patch corrects the ISA test for loading SF values to altivec registers to
 +be power8 vector, and not power7.
 +
 +This patch adds a combiner patch to combine loading up a SF element and
 +converting it to double.
 +
 +It also removes the '?' from the 'r' constraint so that if the SFmode is needed
 +in a GPR, it doesn't have to load it to the vector unit, store it, and then
 +reload it into the GPR register.
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    *	Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IEEE 128-bit long double
 +    *   Power9,  LE, --with-cpu=power9,  64-bit default long double
 +    *	Power9,  BE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power8,  BE, --with-cpu=power8,  IBM 128-bit long double
 +
 +2023-07-14   Michael Meissner  <meissner@linux.ibm.com>
 +
 +gcc/
 +
 +	* config/rs6000/vsx.md (vsx_extract_v4sf_load): Fix ISA for loading
 +	up SFmode values with x-form addresses.  Remove ? from 'r' constraint.
 +	Add more alternatives to prevent requiring a temporary base register if
 +	we don't need the temporary.
 +	(vsx_extract_v4sf_load_to_df): New insn.
 +
 +gcc/testsuite/
 +
 +	* gcc.target/powerpc/vec-extract-mem-float-1.c: New file.
 +
 +==================== Branch work125, patch #3 ====================
 +
 +Fix typo in insn name.
 +
 +In doing other work, I noticed that there was an insn:
 +
 +	vsx_extract_v4sf_<mode>_load
 +
 +which does not use a mode iterator.  I removed the unnecessary <mode>.
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    *	Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IEEE 128-bit long double
 +    *   Power9,  LE, --with-cpu=power9,  64-bit default long double
 +    *	Power9,  BE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power8,  BE, --with-cpu=power8,  IBM 128-bit long double
 +
 +2023-07-14  Michael Meissner  <meissner@linux.ibm.com>
 +
 +gcc/
 +
 +	* config/rs6000/vsx.md (vsx_extract_v4sf_load): Rename from
 +	vsx_extract_v4sf_<mode>_load.
 +
 +==================== Branch work125, patch #2 ====================
 +
 +Improve 64->128 bit zero extension on PowerPC (PR target/108958)
 +
 +If we are converting an unsigned DImode to a TImode value, and the TImode value
 +will go in a vector register, GCC currently does the DImode to TImode conversion
 +in GPR registers, and then moves the value to the vector register via a mtvsrdd
 +instruction.
 +
 +This patch adds a new zero_extendditi2 insn that moves a GPR to a vector
 +register using the mtvsrdd instruction with RA=0 (which clears the upper 64
 +bits), and uses lxvrdx to load a 64-bit value into the bottom 64 bits of the
 +vector register.
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    *	Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IEEE 128-bit long double
 +    *   Power9,  LE, --with-cpu=power9,  64-bit default long double
 +    *	Power9,  BE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power8,  BE, --with-cpu=power8,  IBM 128-bit long double
 +
 +2023-07-14  Michael Meissner  <meissner@linux.ibm.com>
 +
 +gcc/
 +
 +	PR target/108958
 +	* config/rs6000/rs6000.md (zero_extendditi2): New insn.
 +
 +gcc/testsuite/
 +
 +	PR target/108958
 +	* gcc.target/powerpc/pr108958.c: New test.
 +
 +==================== Branch work125, patch #1 ====================
 +
 +Optimize vec_splats of vec_extract for V2DI/V2DF (PR target/99293)
 +
 +This patch optimizes cases like:
 +
 +	vector double v1, v2;
 +	/* ... */
 +	v2 = vec_splats (vec_extract (v1, 0));	/* or  */
 +	v2 = vec_splats (vec_extract (v1, 1));
 +
 +Previously:
 +
 +	vector long long
 +	splat_dup_l_0 (vector long long v)
 +	{
 +	  return __builtin_vec_splats (__builtin_vec_extract (v, 0));
 +	}
 +
 +would generate:
 +
 +        mfvsrld 9,34
 +        mtvsrdd 34,9,9
 +        blr
 +
 +With this patch, GCC generates:
 +
 +        xxpermdi 34,34,34,3
 +        blr
 +
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    *	Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power9,  LE, --with-cpu=power9,  IEEE 128-bit long double
 +    *   Power9,  LE, --with-cpu=power9,  64-bit default long double
 +    *	Power9,  BE, --with-cpu=power9,  IBM 128-bit long double
 +    *	Power8,  BE, --with-cpu=power8,  IBM 128-bit long double
 +
 +2023-07-14  Michael Meissner  <meissner@linux.ibm.com>
 +
 +gcc/
 +
 +	PR target/99293
 +	* config/rs6000/vsx.md (vsx_splat_extract_<mode>): New combiner
 +	insn.
 +
 +gcc/testsuite/
 +
 +	PR target/99293
 +	* gcc.target/powerpc/pr99293.c: New test.
 +	* gcc.target/powerpc/builtins-1.c: Update insn count.
 +
 +
- ==================== Branch work125, baseline ====================
+ ==================== Branch work125-vpair, baseline ====================
  
  2023-07-14   Michael Meissner  <meissner@linux.ibm.com>