From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 1005)
	id 752463858D1E; Mon, 1 May 2023 23:14:00 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 752463858D1E
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1682982840;
	bh=zKj50r0Ll74SRB2EwjwQ8iX+EtCbLoklDwJS9RCzKWs=;
	h=From:To:Subject:Date:From;
	b=ixElE3FNPnWj97O+/csVrNhgeFG3rLmqBg8XwLJCaMogNUj2eQQcGyYOcKBpcJGzk
	 axqRCBdr9mLG9iE7KnSz8OJsuKpWcKORoz/JquyLs4rt0DCgIUoO/2LNjENNcemU2F
	 xEdGswHEpea5BLsGlutj3BTqyO8ZexTDAn8f5vDk=
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Michael Meissner
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/users/meissner/heads/work120)] Update ChangeLog.meissner
X-Act-Checkin: gcc
X-Git-Author: Michael Meissner
X-Git-Refname: refs/users/meissner/heads/work120
X-Git-Oldrev: 956989a0739cb47c07498de4348673980ae3ee46
X-Git-Newrev: 41c363568baf9c9a2051c341e4d0c041fdbc8ec5
Message-Id: <20230501231400.752463858D1E@sourceware.org>
Date: Mon, 1 May 2023 23:14:00 +0000 (GMT)
List-Id:

https://gcc.gnu.org/g:41c363568baf9c9a2051c341e4d0c041fdbc8ec5

commit 41c363568baf9c9a2051c341e4d0c041fdbc8ec5
Author: Michael Meissner
Date:   Mon May 1 19:13:57 2023 -0400

    Update ChangeLog.meissner

Diff:
---
 gcc/ChangeLog.meissner | 319 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 319 insertions(+)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index d93fc70d3f1..b4314aa4954 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,3 +1,322 @@
+==================== Branch work120, patch #30 was reverted ====================
+
+==================== Branch work120, patch #27 ====================
+
+Optimize variable element vec_extract to be converted to floating point
+
+This patch optimizes vec_extract with a variable element number of the following
+types to be converted to floating point by loading the value directly to the
+vector register, and then doing the conversion instead of loading the value to a
+GPR and then doing a direct move:
+
+vector int
+vector unsigned int
+vector unsigned short
+vector unsigned char
+
+2023-04-28 Michael Meissner
+
+gcc/
+
+	* config/rs6000/vsx.md (vsx_extract_v4si_var_load_to_): New
+	insn.
+	(vsx_extract__var_load_to_uns): New insn.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vec-extract-mem-int-6.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-int_7.c: New file.
+
+
+==================== Branch work120, patch #26 ====================
+
+Allow constant element vec_extract to be converted to floating point
+
+This patch allows vec_extract of the following types to be converted to
+floating point by loading the value directly to the vector register, and then
+doing the conversion instead of loading the value to a GPR and then doing a
+direct move:
+
+vector int
+vector unsigned int
+vector unsigned short
+vector unsigned char
+
+2023-05-01 Michael Meissner
+
+gcc/
+
+	* config/rs6000/rs6000.md (fp_int_extend): New code attribute.
+	* config/rs6000/vsx.md (vsx_extract_v4si_load_to_): New
+	insn.
+	(vsx_extract__load_to_uns): New insn.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vec-extract-mem-char-2.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-int-4.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-int_5.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-short-4.c: New file.
+
+==================== Branch work120, patch #25 ====================
+
+Allow variable element vec_extract to be sign or zero extended
+
+This patch allows vec_extract of V4SI, V8HI, and V16QI vector types with a
+variable element number to be loaded with sign or zero extension, so that GCC
+does not generate a separate zero/sign extension instruction.
+
+2023-05-01 Michael Meissner
+
+gcc/
+
+	* config/rs6000/vsx.md (vsx_extract_v4si_var_load_to_di): New insn.
+	(vsx_extract__var_load_to_u): New insn.
+	(vsx_extract_v8hi_var_load_to_s): New insn.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vec-extract-mem-int-3.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-short-3.c: New file.
+
+==================== Branch work120, patch #24 ====================
+
+Allow variable element vec_extract to be loaded into vector registers.
+
+This patch allows vec_extract of V4SI, V8HI, and V16QI vector types with a
+variable element number to be loaded into vector registers directly.
+
+2023-05-01 Michael Meissner
+
+gcc/
+
+	* config/rs6000/vsx.md (vsx_extract__var_load): Allow vector
+	registers to be loaded.
+
+==================== Branch work120, patch #23 ====================
+
+Optimize sign and zero extension of vec_extract from memory with constant element
+
+This patch allows vec_extract of V4SI, V8HI, and V16QI vector types with a
+constant element number to be zero extended. It also allows vec_extract of V4SI
+and V8HI vector types with constant element number to be sign extended.
+
+2023-05-01 Michael Meissner
+
+gcc/
+
+	* config/rs6000/vsx.md (vsx_extract_v4si_load_to_d): New insn.
+	(vsx_extract__load_to_u): New insn.
+	(vsx_extract_v8hi_load_to_s): New insn.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vec-extract-mem-char-1.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-int-1.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-int-2.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-short-1.c: New file.
+	* gcc.target/powerpc/vec-extract-mem-short-2.c: New file.
+
+==================== Branch work120, patch #22 ====================
+
+Allow constant element vec_extract to be loaded into vector registers.
+
+This patch allows vec_extract of V4SI, V8HI, and V16QI vector types with a
+constant element number to be loaded into vector registers directly.
+
+This patch also optimizes element number 0 so that it does not need a base
+register temporary. Likewise, if we have an offsettable address, we don't need
+to allocate a scratch register.
+
+2023-05-01 Michael Meissner
+
+gcc/
+
+	* config/rs6000/vsx.md (VSX_EX_ISA): New mode attribute.
+	(vsx_extract__load): Allow vector registers to be loaded. Add
+	optimizations for loading up element 0 and/or with an offsettable
+	address.
+
+==================== Branch work120, patch #21 ====================
+
+Optimize vec_extract of V4SF with variable element number being converted to DF
+
+This patch adds a combiner insn to include the conversion of float to double
+within the memory address when vec_extract of V4SF with a variable element
+number is done.
+
+It also removes the '?' from the 'r' constraint so that if the SFmode is needed
+in a GPR, it doesn't have to load it to the vector unit and then store it.
+
+2023-05-01 Michael Meissner
+
+gcc/
+
+	* config/rs6000/vsx.md (vsx_extract_v4sf_var_load): Remove '?' from 'r'
+	constraint.
+	(vsx_extract_v4sf_var_load_to_df): New insn.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vec-extract-mem-float-2.c: New file.
+
+==================== Branch work120, patch #20 ====================
+
+Optimize vec_extract of V4SF from memory with constant element numbers.
+
+This patch updates vec_extract of V4SF from memory with constant element
+numbers.
+
+This patch corrects the ISA for loading SF values to altivec registers to be
+power8 vector, and not power7.
+
+This patch adds a combiner insn to combine loading up an SF element and
+converting it to double.
+
+It also removes the '?' from the 'r' constraint so that if the SFmode is needed
+in a GPR, it doesn't have to load it to the vector unit and then store it.
+
+2023-05-01 Michael Meissner
+
+gcc/
+
+	* gcc/config/rs6000/vsx.md (vsx_extract_v4sf_load): Fix ISA for loading
+	up SFmode values with x-form addresses. Remove '?' from 'r' constraint.
+	(vsx_extract_v4sf_load_to_df): New insn.
+
+gcc/testsuite/
+
+	* gcc.target/powerpc/vec-extract-mem-float-1.c: New file.
+
+==================== Branch work120, patch #3 ====================
+
+Fix typo in insn name.
+
+In doing other work, I noticed that there was an insn:
+
+	vsx_extract_v4sf__load
+
+Which did not have an iterator. I removed the useless .
+
+2023-05-01 Michael Meissner
+
+gcc/
+
+	* config/rs6000/vsx.md (vsx_extract_v4sf_load): Rename from
+	vsx_extract_v4sf__load.
+
+==================== Branch work120, patch #2 ====================
+
+Improve 64->128 bit zero extension on PowerPC
+
+2023-05-01 Michael Meissner
+
+gcc/
+
+	PR target/108958
+	* gcc/config/rs6000.md (zero_extendditi2): New insn.
+
+gcc/testsuite/
+
+	PR target/108958
+	* gcc.target/powerpc/zero-extend-di-ti.c: New test.
+
+==================== Branch work120, patch #1 ====================
+
+Fix splat of extract for long long and double.
+
+2023-05-01 Michael Meissner
+
+gcc/
+
+	PR target/99293
+	* gcc/config/rs6000/vsx.md (vsx_splat_extract_): New combiner
+	insn.
+
+gcc/testsuite/
+
+	PR target/108958
+	* gcc.target/powerpc/pr99293.c: New test.
+	* gcc.target/powerpc/builtins-1.c: Update insn count.
+
+==================== Branch work120, patch #1 ====================
+
+PR target/105325: Make load/cmp fusion know about prefixed loads.
+
+I posted a version of this patch on March 21st, a second version on March 24th,
+and the third version on March 28th.
+
+The V4 patch just adds a new condition to the new test case. Previously, I was
+using 'powerpc_prefixed_addr' to determine whether the GCC compiler would
+automatically generate prefixed addresses. The V4 version also adds a check
+for 'power10_ok'. Power10_ok is needed in case the compiler could generate
+prefixed addresses, but the assembler does not support prefixed instructions.
+
+The V3 patch makes some code changes suggested in the genfusion.pl code from
+the last 2 patch submissions. The fusion.md that is produced by genfusion.pl
+is the same in all 3 versions.
+
+In V3, I changed the genfusion.pl to match the suggestion for code layout. I
+also used the correct comment for each of the instructions (in the 2nd patch,
+when I rewrote the comments about ld and lwa being DS format instructions,
+I had put the ld comment in the section handling lwa, and vice versa).
+
+In V3, I also removed lp64 from the new test. When I first added the prefixed
+code, it was only done for 64-bit, but now it is allowed for 32-bit. However,
+the case that shows up (lwa) would not hit in 32-bit, since it only generates
+lwz and not lwa. It also would not generate ld. But the test does pass when
+it is built with -m32.
+
+The issue with the original bug is the power10 load GPR + cmpi -1/0/1 fusion
+optimization generates illegal assembler code.
+
+Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
+did not handle the possibility that the load might be prefixed.
+
+The main cause is the constraints for the individual loads in the fusion did not
+match the machine. In particular, LWA is a DS format instruction when it is
+unprefixed. The code also did not set the prefixed attribute correctly.
+
+This patch rewrites the genfusion.pl script so that it will have more accurate
+constraints for the LWA and LD instructions (which are DS instructions). The
+updated genfusion.pl was then run to update fusion.md. Finally, the code for
+the "prefixed" attribute is modified so that it considers load + compare
+immediate patterns to be like the normal load insns in checking whether
+operand[1] is a prefixed instruction.
+
+I have tested this code on a power9 little endian system (with long double
+being IEEE 128-bit and IBM 128-bit), a power10 little endian system, and a
+power8 big endian system, testing both 32-bit and 64-bit code generation.
+
+For the V4 changes I also built the compiler on a big endian system with an
+older assembler, and I verified that the pr105325.C test was listed as
+unsupported.
+
+Can I put this code into the master branch, and after a waiting period, apply
+it to the GCC 12 and GCC 11 branches (the bug does show up in those branches,
+and the patch applies without change)?
+
+2023-05-01 Michael Meissner
+
+gcc/
+
+	PR target/105325
+	* gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
+	of the ld and lwa instructions which use the DS encoding instead of D.
+	Use the YZ constraint for these loads. Handle prefixed loads better.
+	Set the sign_extend attribute as appropriate.
+	* gcc/config/rs6000/fusion.md: Regenerate.
+	* gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
+	instructions to the list of instructions that might have a prefixed load
+	instruction.
+
+gcc/testsuite/
+
+	PR target/105325
+	* g++.target/powerpc/pr105325.C: New test.
+	* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
+
 ==================== Branch work120, baseline ====================
 
 2023-05-01 Michael Meissner