From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <meissner@sourceware.org>
Received: by sourceware.org (Postfix, from userid 1005)
	id 752463858D1E; Mon,  1 May 2023 23:14:00 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 752463858D1E
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1682982840;
	bh=zKj50r0Ll74SRB2EwjwQ8iX+EtCbLoklDwJS9RCzKWs=;
	h=From:To:Subject:Date:From;
	b=ixElE3FNPnWj97O+/csVrNhgeFG3rLmqBg8XwLJCaMogNUj2eQQcGyYOcKBpcJGzk
	 axqRCBdr9mLG9iE7KnSz8OJsuKpWcKORoz/JquyLs4rt0DCgIUoO/2LNjENNcemU2F
	 xEdGswHEpea5BLsGlutj3BTqyO8ZexTDAn8f5vDk=
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Michael Meissner <meissner@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/users/meissner/heads/work120)] Update ChangeLog.meissner
X-Act-Checkin: gcc
X-Git-Author: Michael Meissner <meissner@linux.ibm.com>
X-Git-Refname: refs/users/meissner/heads/work120
X-Git-Oldrev: 956989a0739cb47c07498de4348673980ae3ee46
X-Git-Newrev: 41c363568baf9c9a2051c341e4d0c041fdbc8ec5
Message-Id: <20230501231400.752463858D1E@sourceware.org>
Date: Mon,  1 May 2023 23:14:00 +0000 (GMT)
List-Id: <gcc-cvs.sourceware.org>

https://gcc.gnu.org/g:41c363568baf9c9a2051c341e4d0c041fdbc8ec5

commit 41c363568baf9c9a2051c341e4d0c041fdbc8ec5
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon May 1 19:13:57 2023 -0400

    Update ChangeLog.meissner

Diff:
---
 gcc/ChangeLog.meissner | 319 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 319 insertions(+)
diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index d93fc70d3f1..b4314aa4954 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,322 @@
+==================== Branch work120, patch #30 was reverted ====================
+
+==================== Branch work120, patch #27 ====================
+
+Optimize variable element vec_extract to be converted to floating point
+
+This patch optimizes converting vec_extract with a variable element number to
+floating point for the following types.  The value is loaded directly into a
+vector register and then converted, instead of being loaded into a GPR and
+then moved to the vector register with a direct move:
+
+vector int
+vector unsigned int
+vector unsigned short
+vector unsigned char
+
+2023-04-28   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/vsx.md (vsx_extract_v4si_var_load_to_<uns><mode>): New
+	insn.
+	(vsx_extract_<VSX_EXTRACT_I2:mode>_var_load_to_uns<SFDF:mode>): New
+	insn.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vec-extract-mem-int-6.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-int_7.c: New file.
+
+
+==================== Branch work120, patch #26 ====================
+
+Allow constant element vec_extract to be converted to floating point
+
+This patch allows vec_extract of the following types to be converted to
+floating point by loading the value directly to the vector register, and then
+doing the conversion instead of loading the value to a GPR and then doing a
+direct move:
+
+vector int
+vector unsigned int
+vector unsigned short
+vector unsigned char
+
+2023-05-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/rs6000.md (fp_int_extend): New code attribute.
+	* config/rs6000/vsx.md (vsx_extract_v4si_load_to_<uns><mode>): New
+	insn.
+	(vsx_extract_<VSX_EXTRACT_I2:mode>_load_to_uns<SFDF:mode>): New insn.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vec-extract-mem-char-2.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-int-4.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-int_5.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-short-4.c: New file.
+
+==================== Branch work120, patch #25 ====================
+
+Allow variable element vec_extract to be sign or zero extended
+
+This patch allows vec_extract of V4SI, V8HI, and V16QI vector types with a
+variable element number to be loaded with sign or zero extension, and GCC will
+not generate a separate zero/sign extension instruction.
+
+2023-05-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/vsx.md (vsx_extract_v4si_var_load_to_<su>di): New insn.
+	(vsx_extract_<VSX_EXTRACT_I2:mode>_var_load_to_u<GPR:mode>): New insn.
+	(vsx_extract_v8hi_var_load_to_s<mode>): New insn.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vec-extract-mem-int-3.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-short-3.c: New file.
+
+==================== Branch work120, patch #24 ====================
+
+Allow variable element vec_extract to be loaded into vector registers.
+
+This patch allows vec_extract of V4SI, V8HI, and V16QI vector types with a
+variable element number to be loaded into vector registers directly.
+
+2023-05-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/vsx.md (vsx_extract_<mode>_var_load): Allow vector
+	registers to be loaded.
+
+==================== Branch work120, patch #23 ====================
+
+Optimize sign and zero extension of vec_extract from memory with constant element
+
+This patch allows vec_extract of V4SI, V8HI, and V16QI vector types with a
+constant element number to be zero extended.  It also allows vec_extract of V4SI
+and V8HI vector types with constant element number to be sign extended.
+
+2023-05-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/vsx.md (vsx_extract_v4si_load_to_<su>d): New insn.
+	(vsx_extract_<VSX_EXTRACT_I2:mode>_load_to_u<GPR:mode>): New insn.
+	(vsx_extract_v8hi_load_to_s<mode>): New insn.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vec-extract-mem-char-1.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-int-1.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-int-2.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-short-1.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-short-2.c: New file.
+
+==================== Branch work120, patch #22 ====================
+
+Allow constant element vec_extract to be loaded into vector registers.
+
+This patch allows vec_extract of V4SI, V8HI, and V16QI vector types with a
+constant element number to be loaded into vector registers directly.
+
+This patch also optimizes element number 0 so that it does not need a base
+register temporary.  Likewise, if we have an offsettable address, we don't need
+to allocate a scratch register.
+
+2023-05-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/vsx.md (VSX_EX_ISA): New mode attribute.
+	(vsx_extract_<mode>_load): Allow vector registers to be loaded.  Add
+	optimizations for loading up element 0 and/or with an offsettable
+	address.
+
+==================== Branch work120, patch #21 ====================
+
+Optimize vec_extract of V4SF with variable element number being converted to DF
+
+This patch adds a combiner insn to include the conversion of float to double
+within the memory address when vec_extract of V4SF with a variable element
+number is done.
+
+It also removes the '?' from the 'r' constraint so that if the SFmode is needed
+in a GPR, it doesn't have to load it to the vector unit and then store it.
+
+2023-05-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/vsx.md (vsx_extract_v4sf_var_load): Remove '?' from 'r'
+	constraint.
+	(vsx_extract_v4sf_var_load_to_df): New insn.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vec-extract-mem-float-2.c: New file.
+
+==================== Branch work120, patch #20 ====================
+
+Optimize vec_extract of V4SF from memory with constant element numbers.
+
+This patch updates vec_extract of V4SF from memory with constant element
+numbers.
+
+This patch corrects the ISA for loading SF values to altivec registers to be
+power8 vector, and not power7.
+
+This patch adds a combiner insn to combine loading up a SF element and
+converting it to double.
+
+It also removes the '?' from the 'r' constraint so that if the SFmode is needed
+in a GPR, it doesn't have to load it to the vector unit and then store it.
+
+2023-05-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/vsx.md (vsx_extract_v4sf_load): Fix ISA for loading
+	up SFmode values with x-form addresses.  Remove '?' from 'r' constraint.
+	(vsx_extract_v4sf_load_to_df): New insn.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vec-extract-mem-float-1.c: New file.
+
+==================== Branch work120, patch #3 ====================
+
+Fix typo in insn name.
+
+In doing other work, I noticed that there was an insn:
+
+	vsx_extract_v4sf_<mode>_load
+
+Which did not have an iterator.  I removed the useless <mode>.
+
+2023-05-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	* config/rs6000/vsx.md (vsx_extract_v4sf_load): Rename from
+	vsx_extract_v4sf_<mode>_load.
+
+==================== Branch work120, patch #2 ====================
+
+Improve 64->128 bit zero extension on PowerPC
+
+2023-05-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	PR target/108958
+	* config/rs6000/rs6000.md (zero_extendditi2): New insn.
+
+gcc/testsuite/
+
+	PR target/108958
+	* gcc.target/powerpc/zero-extend-di-ti.c: New test.
+
+==================== Branch work120, patch #1 ====================
+
+Fix splat of extract for long long and double.
+
+2023-05-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	PR target/99293
+	* config/rs6000/vsx.md (vsx_splat_extract_<mode>): New combiner
+	insn.
+
+gcc/testsuite/
+
+	PR target/99293
+	* gcc.target/powerpc/pr99293.c: New test.
+	* gcc.target/powerpc/builtins-1.c: Update insn count.
+
+==================== Branch work120, patch #1 ====================
+
+PR target/105325: Make load/cmp fusion know about prefixed loads.
+
+I posted a version of this patch on March 21st, a second version on March 24th,
+and the third version on March 28th.
+
+The V4 patch just adds a new condition to the new test case.  Previously, I was
+using 'powerpc_prefixed_addr' to determine whether the GCC compiler would
+automatically generate prefixed addresses.  The V4 version also adds a check
+for 'power10_ok'.  Power10_ok is needed in case the compiler could generate
+prefixed addresses, but the assembler does not support prefixed instructions.
+
+The V3 patch makes some code changes suggested in the genfusion.pl code from
+the last 2 patch submissions.  The fusion.md that is produced by genfusion.pl
+is the same in all 3 versions.
+
+In V3, I changed the genfusion.pl to match the suggestion for code layout.  I
+also used the correct comment for each of the instructions (in the 2nd patch,
+when I rewrote the comments about ld and lwa being DS format instructions,
+I had put the ld comment in the section handling lwa, and vice versa).
+
+In V3, I also removed lp64 from the new test.  When I first added the prefixed
+code, it was only done for 64-bit, but now it is allowed for 32-bit.  However,
+the case that shows up (lwa) would not hit in 32-bit, since it only generates
+lwz and not lwa.  It also would not generate ld.  But the test does pass when
+it is built with -m32.
+
+The issue with the original bug is that the power10 load GPR + cmpi -1/0/1
+fusion optimization generates illegal assembler code.
+
+Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
+did not handle the possibility that the load might be prefixed.
+
+The main cause is the constraints for the individual loads in the fusion did not
+match the machine.  In particular, LWA is a ds format instruction when it is
+unprefixed.  The code also did not set the prefixed attribute correctly.
+
+This patch rewrites the genfusion.pl script so that it will have more accurate
+constraints for the LWA and LD instructions (which are DS instructions).  The
+updated genfusion.pl was then run to update fusion.md.  Finally, the code for
+the "prefixed" attribute is modified so that it considers load + compare
+immediate patterns to be like the normal load insns in checking whether
+operand[1] is a prefixed instruction.
+
+I have tested this code on a power9 little endian system (with long double
+being IEEE 128-bit and IBM 128-bit), a power10 little endian system, and a
+power8 big endian system, testing both 32-bit and 64-bit code generation.
+
+For the V4 changes I also built the compiler on a big endian system with an
+older assembler, and I verified that the pr105325.C test was listed as
+unsupported.
+
+Can I put this code into the master branch, and after a waiting period, apply
+it to the GCC 12 and GCC 11 branches (the bug does show up in those branches,
+and the patch applies without change)?
+
+2023-05-01   Michael Meissner  <meissner@linux.ibm.com>
+
+gcc/
+
+	PR target/105325
+	* config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
+	of the ld and lwa instructions which use the DS encoding instead of D.
+	Use the YZ constraint for these loads.  Handle prefixed loads better.
+	Set the sign_extend attribute as appropriate.
+	* config/rs6000/fusion.md: Regenerate.
+	* config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
+	instructions to the list of instructions that might have a prefixed load
+	instruction.
+
+gcc/testsuite/
+
+	PR target/105325
+	* g++.target/powerpc/pr105325.C: New test.
+	* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
+
 ==================== Branch work120, baseline ====================
 
 2023-05-01   Michael Meissner  <meissner@linux.ibm.com>