From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 1005)
	id 278833858289; Fri, 14 Jul 2023 19:52:22 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 278833858289
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1689364342;
	bh=CbB5ERIwMvlNfi2xfY5jFyQ2lf65TSPGFJLiT6nwutk=;
	h=From:To:Subject:Date:From;
	b=D1uS6r7XB+wyJDmL5pyd9tRLYI+qqTPWW3grQRcnDoKecIJKdpaujYyebuDxizhp4
	 ej5VQTnAod+Dw/29US/giG+sgVDu79mGD8gVF3/6u7UkABYmLbZkVTOoZNzvB6Sx3F
	 sC/OBFEtC59MtgLrHuI5MnOtit0qSEotKfXQsZH0=
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Michael Meissner
To: gcc-cvs@gcc.gnu.org, libstdc++-cvs@gcc.gnu.org
Subject: [gcc(refs/users/meissner/heads/work125-vpair)] Merge commit 'refs/users/meissner/heads/work125-vpair' of git+ssh://gcc.gnu.org/git/gcc into me/work
X-Act-Checkin: gcc
X-Git-Author: Michael Meissner
X-Git-Refname: refs/users/meissner/heads/work125-vpair
X-Git-Oldrev: 9846b58a20f2172e5bdd502f3f78d20a59228e27
X-Git-Newrev: 630e1ba4d622b53da62dc7d07bcfa842da5f2162
Message-Id: <20230714195222.278833858289@sourceware.org>
Date: Fri, 14 Jul 2023 19:52:22 +0000 (GMT)
List-Id:

https://gcc.gnu.org/g:630e1ba4d622b53da62dc7d07bcfa842da5f2162

commit 630e1ba4d622b53da62dc7d07bcfa842da5f2162
Merge: 9846b58a20f c53c1e9f353
Author: Michael Meissner
Date:   Fri Jul 14 15:51:50 2023 -0400

    Merge commit 'refs/users/meissner/heads/work125-vpair' of git+ssh://gcc.gnu.org/git/gcc into me/work125-vpair

Diff:
 gcc/ChangeLog.meissner           | 2 +-
 gcc/REVISION                     | 2 +-
 gcc/c-family/ChangeLog.meissner  | 2 +-
 gcc/c/ChangeLog.meissner         | 2 +-
 gcc/cp/ChangeLog.meissner        | 2 +-
 gcc/fortran/ChangeLog.meissner   | 2 +-
 gcc/testsuite/ChangeLog.meissner | 2 +-
 libgcc/ChangeLog.meissner        | 2 +-
 libstdc++-v3/ChangeLog.meissner  | 2 +-
 9 files changed, 9 insertions(+), 9 deletions(-)

diff --cc gcc/ChangeLog.meissner
index 306c070821c,8c6f8f14248..5051fa02d69
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@@ -1,364 -1,4 +1,364 @@@
 +==================== Branch work125, patch #9 ====================
 +
 +PR target/89213 - Optimize vector shift by a constant.
 +
 +Optimize vector shifts by a constant, taking advantage of the fact that the
 +shift instructions only look at the bits within the element.
 +
 +The PowerPC doesn't have a VSPLTID instruction.  This meant that if we were
 +doing a V2DI shift of 0..15, we had to use VSPLTISW and VEXTSW2D instructions
 +to load the constant into the vector register.
 +
 +Similarly for V4SI and V2DI, if we wanted to shift more than 15 bits, we would
 +generate XXSPLTIB and VEXTSB2D or VEXTSB2W instructions to load the constant
 +into the vector register.
 +
 +Given the vector shift instructions only look at the bottom 5 or 6 bits of the
 +shift value, we can load the constant via VSPLTISW or XXSPLTIB instructions and
 +eliminate the sign extend instructions (VEXTSW2D, VEXTSB2D, and VEXTSB2W).
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    * Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IEEE 128-bit long double
 +    * Power9, LE, --with-cpu=power9, 64-bit default long double
 +    * Power9, BE, --with-cpu=power9, IBM 128-bit long double
 +    * Power8, BE, --with-cpu=power8, IBM 128-bit long double
 +
 +2023-07-14 Michael Meissner
 +
 +gcc/
 +
 +	PR target/89213
 +	* config/rs6000/altivec.md (UNSPEC_VECTOR_SHIFT): New unspec.
 +	(V4SI_V2DI): New mode iterator.
 +	(vshift_code): New code iterator.
 +	(altivec__const_): New insns.
 +	* config/rs6000/predicates.md (vector_shift_constant): New
 +	predicate.
 +
 +gcc/testsuite/
 +
 +	PR target/89213
 +	* gcc.target/powerpc/pr89213.c: New test.
 +	* gcc.target/powerpc/vec-rlmi-rlnm.c: Update insn count.
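The reasoning in the patch #9 entry, that the sign-extend step can be dropped because the shift only reads the low bits of the count, can be checked with a small scalar model. This is a minimal sketch under stated assumptions, not code from the patch: the function names are hypothetical, each function models a single 64-bit lane in plain C rather than using real vector instructions, and the count is assumed to be in the range 0..63.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit lane of a V2DI shift left:
   only the low 6 bits of the lane's count are used.  */
static uint64_t
model_shift_left_v2di_lane (uint64_t value, uint64_t count)
{
  return value << (count & 63);
}

/* Hypothetical model of the count lane after a byte splat alone
   (XXSPLTIB without VEXTSB2D): the immediate is repeated in every
   byte of the 64-bit lane.  */
static uint64_t
model_splat_byte_lane (uint8_t imm)
{
  return (uint64_t) imm * UINT64_C (0x0101010101010101);
}

/* Hypothetical model of the count lane after the byte splat plus
   sign extension (XXSPLTIB followed by VEXTSB2D).  */
static uint64_t
model_sign_extended_lane (uint8_t imm)
{
  return (uint64_t) (int64_t) (int8_t) imm;
}

/* Return 1 if both ways of loading the count give the same shift
   result for this value and immediate.  */
static int
same_shift_result (uint64_t value, uint8_t imm)
{
  return (model_shift_left_v2di_lane (value, model_splat_byte_lane (imm))
	  == model_shift_left_v2di_lane (value,
					 model_sign_extended_lane (imm)));
}
```

For a count of 40, the splatted lane is 0x2828282828282828 and the sign-extended lane is 40; both have the same low 6 bits, so the modeled shift results agree.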
 +
 +==================== Branch work125, patch #8 ====================
 +
 +Update fold-vec-extract insn counts on 32-bit big endian
 +
 +In running tests, I noticed on big endian systems that the expected number of
 +ADDIs is higher than the current number of ADDIs generated by the compiler.
 +This patch adjusts those counts.
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    * Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IEEE 128-bit long double
 +    * Power9, LE, --with-cpu=power9, 64-bit default long double
 +    * Power9, BE, --with-cpu=power9, IBM 128-bit long double
 +    * Power8, BE, --with-cpu=power8, IBM 128-bit long double
 +
 +2023-07-14 Michael Meissner
 +
 +gcc/testsuite/
 +
 +	* gcc.target/powerpc/fold-vec-extract-char.p7.c: Update insn count for
 +	32-bit.
 +	* gcc.target/powerpc/fold-vec-extract-double.p7.c: Likewise.
 +	* gcc.target/powerpc/fold-vec-extract-float.p7.c: Likewise.
 +	* gcc.target/powerpc/fold-vec-extract-float.p8.c: Likewise.
 +	* gcc.target/powerpc/fold-vec-extract-int.p7.c: Likewise.
 +	* gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
 +	* gcc.target/powerpc/fold-vec-extract-short.p7.c: Likewise.
 +	* gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.
 +
 +==================== Branch work125, patch #7 ====================
 +
 +Allow constant element vec_extract to be converted to floating point
 +
 +This patch allows vec_extract of the following types to be converted to
 +floating point by loading the value directly to the vector register, and then
 +doing the conversion instead of loading the value to a GPR and then doing a
 +direct move:
 +
 +vector int
 +vector unsigned int
 +vector unsigned short
 +vector unsigned char
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    * Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IEEE 128-bit long double
 +    * Power9, LE, --with-cpu=power9, 64-bit default long double
 +    * Power9, BE, --with-cpu=power9, IBM 128-bit long double
 +    * Power8, BE, --with-cpu=power8, IBM 128-bit long double
 +
 +2023-07-14 Michael Meissner
 +
 +gcc/
 +
 +	* config/rs6000/rs6000.md (fp_int_extend): New code attribute.
 +	* config/rs6000/vsx.md (vsx_extract_v4si_load_to_): New
 +	insn.
 +	(vsx_extract__load_to_uns): New insn.
 +
 +gcc/testsuite/
 +
 +	* gcc.target/powerpc/vec-extract-mem-char-2.c: New file.
 +	* gcc.target/powerpc/vec-extract-mem-int-2.c: New file.
 +	* gcc.target/powerpc/vec-extract-mem-int_3.c: New file.
 +	* gcc.target/powerpc/vec-extract-mem-short-2.c: New file.
 +
 +==================== Branch work125, patch #6 ====================
 +
 +Add alternatives for vec_extract with constant element loading from memory.
 +
 +This patch expands the alternatives for doing vec_extract of V4SI, V8HI, and
 +V16QI vectors with a constant offset when the vector is in memory.  If the
 +element number is 0 or we are using offsettable addressing for loading up GPR
 +registers, we don't need to allocate a temporary base register.  We can fold the
 +offset from the vec_extract into the normal address.
 +
 +I also added alternatives to load the values into vector registers.  If we load
 +the value into vector registers, we require X-form addressing.
 +
 +I added the VSX_EX_ISA mode attribute to distinguish that we can load 32-bit
 +integers on a power8 system to vector registers, but we need a power9 system to
 +be able to load 8-bit or 16-bit integers.
 +
 +In general, loading up small integer values with vec_extract into the vector
 +registers explicitly is likely not done that much.  However, this will be needed
 +in later patches when we want to combine loading up a small integer value into a
 +vector register with sign/zero extension.  This happens when we want to do a
 +vec_extract of a small integer value and convert it to floating point.
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    * Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IEEE 128-bit long double
 +    * Power9, LE, --with-cpu=power9, 64-bit default long double
 +    * Power9, BE, --with-cpu=power9, IBM 128-bit long double
 +    * Power8, BE, --with-cpu=power8, IBM 128-bit long double
 +
 +2023-07-14 Michael Meissner
 +
 +gcc/
 +
 +	* config/rs6000/vsx.md (VSX_EX_ISA): New mode attribute.
 +	(vsx_extract__load): Add more alternatives for memory options.
 +	Allow the load to load up vector registers if needed.
 +
 +gcc/testsuite/
 +
 +	* gcc.target/powerpc/vec-extract-mem-char-1.c: New test.
 +	* gcc.target/powerpc/vec-extract-mem-int-1.c: New test.
 +	* gcc.target/powerpc/vec-extract-mem-short-1.c: New test.
 +
 +==================== Branch work125, patch #5 ====================
 +
 +Optimize vec_extract of V4SF with variable element number being converted to DF
 +
 +This patch adds a combiner insn to include the conversion of float to double
 +within the memory address when vec_extract of V4SF with a variable element
 +number is done.
 +
 +It also removes the '?' from the 'r' constraint so that if the SFmode is needed
 +in a GPR, it doesn't have to load it to the vector unit, store it on the stack,
 +and reload it into a GPR register.
 +
 +2023-07-11 Michael Meissner
 +
 +gcc/
 +
 +	* config/rs6000/vsx.md (vsx_extract_v4sf_var_load): Remove '?' from 'r'
 +	constraint.
 +	(vsx_extract_v4sf_var_load_to_df): New insn.
 +
 +gcc/testsuite/
 +
 +	* gcc.target/powerpc/vec-extract-mem-float-2.c: New file.
 +
 +==================== Branch work125, patch #4 ====================
 +
 +Optimize vec_extract of V4SF from memory with constant element numbers.
 +
 +This patch updates vec_extract of V4SF from memory with constant element
 +numbers.
 +
 +I went through the alternatives, and I added alternatives to denote when we
 +don't need to allocate a temporary base register.  These cases include
 +extracting element 0, and extracting elements 1-3 where we can use offsettable
 +addresses.
 +
 +I added alternatives for power8 and power9 units to account for the expanded
 +addressing on these machines (power8 can load SFmode into Altivec registers with
 +x-form addressing, and power9 can use offsettable addressing to load up Altivec
 +registers).
 +
 +This patch corrects the ISA test for loading SF values to altivec registers to
 +be power8 vector, and not power7.
 +
 +This patch adds a combiner patch to combine loading up a SF element and
 +converting it to double.
 +
 +It also removes the '?' from the 'r' constraint so that if the SFmode is needed
 +in a GPR, it doesn't have to load it to the vector unit, store it, and then
 +reload it into the GPR register.
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    * Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IEEE 128-bit long double
 +    * Power9, LE, --with-cpu=power9, 64-bit default long double
 +    * Power9, BE, --with-cpu=power9, IBM 128-bit long double
 +    * Power8, BE, --with-cpu=power8, IBM 128-bit long double
 +
 +2023-07-14 Michael Meissner
 +
 +gcc/
 +
 +	* gcc/config/rs6000/vsx.md (vsx_extract_v4sf_load): Fix ISA for loading
 +	up SFmode values with x-form addresses.  Remove ? from 'r' constraint.
 +	Add more alternatives to prevent requiring a temporary base register if
 +	we don't need the temporary.
 +	(vsx_extract_v4sf_load_to_df): New insn.
 +
 +gcc/testsuite/
 +
 +	* gcc.target/powerpc/vec-extract-mem-float-1.c: New file.
 +
 +==================== Branch work125, patch #3 ====================
 +
 +Fix typo in insn name.
 +
 +In doing other work, I noticed that there was an insn:
 +
 +	vsx_extract_v4sf__load
 +
 +Which did not have an iterator.  I removed the useless .
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    * Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IEEE 128-bit long double
 +    * Power9, LE, --with-cpu=power9, 64-bit default long double
 +    * Power9, BE, --with-cpu=power9, IBM 128-bit long double
 +    * Power8, BE, --with-cpu=power8, IBM 128-bit long double
 +
 +2023-07-14 Michael Meissner
 +
 +gcc/
 +
 +	* config/rs6000/vsx.md (vsx_extract_v4sf_load): Rename from
 +	vsx_extract_v4sf__load.
 +
 +==================== Branch work125, patch #2 ====================
 +
 +Improve 64->128 bit zero extension on PowerPC (PR target/108958)
 +
 +If we are converting an unsigned DImode to a TImode value, and the TImode value
 +will go in a vector register, GCC currently does the DImode to TImode conversion
 +in GPR registers, and then moves the value to the vector register via a mtvsrdd
 +instruction.
 +
 +This patch adds a new zero_extendditi2 insn which optimizes moving a GPR to a
 +vector register using the mtvsrdd instruction with RA=0, and using lxvrdx to
 +load a 64-bit value into the bottom 64-bits of the vector register.
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    * Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IEEE 128-bit long double
 +    * Power9, LE, --with-cpu=power9, 64-bit default long double
 +    * Power9, BE, --with-cpu=power9, IBM 128-bit long double
 +    * Power8, BE, --with-cpu=power8, IBM 128-bit long double
 +
 +2023-07-14 Michael Meissner
 +
 +gcc/
 +
 +	PR target/108958
 +	* gcc/config/rs6000.md (zero_extendditi2): New insn.
 +
 +gcc/testsuite/
 +
 +	PR target/108958
 +	* gcc.target/powerpc/pr108958.c: New test.
 +
 +==================== Branch work125, patch #1 ====================
 +
 +Optimize vec_splats of vec_extract for V2DI/V2DF (PR target/99293)
 +
 +This patch optimizes cases like:
 +
 +	vector double v1, v2;
 +	/* ... */
 +	v2 = vec_splats (vec_extract (v1, 0));	/* or */
 +	v2 = vec_splats (vec_extract (v1, 1));
 +
 +Previously:
 +
 +	vector long long
 +	splat_dup_l_0 (vector long long v)
 +	{
 +	  return __builtin_vec_splats (__builtin_vec_extract (v, 0));
 +	}
 +
 +would generate:
 +
 +	mfvsrld 9,34
 +	mtvsrdd 34,9,9
 +	blr
 +
 +With this patch, GCC generates:
 +
 +	xxpermdi 34,34,34,3
 +	blr
 +
 +
 +I have tested this patch on the following systems and there was no degradation.
 +Can I check it into the trunk branch?
 +
 +    * Power10, LE, --with-cpu=power10, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IBM 128-bit long double
 +    * Power9, LE, --with-cpu=power9, IEEE 128-bit long double
 +    * Power9, LE, --with-cpu=power9, 64-bit default long double
 +    * Power9, BE, --with-cpu=power9, IBM 128-bit long double
 +    * Power8, BE, --with-cpu=power8, IBM 128-bit long double
 +
 +2023-07-14 Michael Meissner
 +
 +gcc/
 +
 +	PR target/99293
 +	* gcc/config/rs6000/vsx.md (vsx_splat_extract_): New combiner
 +	insn.
 +
 +gcc/testsuite/
 +
 +	PR target/108958
 +	* gcc.target/powerpc/pr99293.c: New test.
 +	* gcc.target/powerpc/builtins-1.c: Update insn count.
 +
 +
- ==================== Branch work125, baseline ====================
+ ==================== Branch work125-vpair, baseline ====================
  2023-07-14 Michael Meissner
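The equivalence of the two instruction sequences quoted in the patch #1 entry can be illustrated with a scalar model of the register contents. This is only a sketch under stated assumptions: `struct vsx_reg` and the `model_*` functions are hypothetical names, the register is modeled as two 64-bit doublewords with `dw[0]` the most significant, and the instruction semantics are written from my reading of the Power ISA rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit VSX register: dw[0] is the most
   significant doubleword, dw[1] the least significant.  */
struct vsx_reg
{
  uint64_t dw[2];
};

/* Model of mfvsrld RA,XS: copy the low doubleword to a GPR.  */
static uint64_t
model_mfvsrld (struct vsx_reg xs)
{
  return xs.dw[1];
}

/* Model of mtvsrdd XT,RA,RB with RA != 0: RA becomes the high
   doubleword and RB the low doubleword.  */
static struct vsx_reg
model_mtvsrdd (uint64_t ra, uint64_t rb)
{
  struct vsx_reg xt = { { ra, rb } };
  return xt;
}

/* Model of xxpermdi XT,XA,XB,DM: the high bit of DM selects the
   doubleword taken from XA, the low bit the one taken from XB.  */
static struct vsx_reg
model_xxpermdi (struct vsx_reg xa, struct vsx_reg xb, unsigned dm)
{
  struct vsx_reg xt;
  xt.dw[0] = xa.dw[(dm >> 1) & 1];
  xt.dw[1] = xb.dw[dm & 1];
  return xt;
}

/* The two-instruction sequence generated before the patch.  */
static struct vsx_reg
model_old_splat (struct vsx_reg v)
{
  uint64_t r9 = model_mfvsrld (v);
  return model_mtvsrdd (r9, r9);
}

/* The single instruction generated with the patch.  */
static struct vsx_reg
model_new_splat (struct vsx_reg v)
{
  return model_xxpermdi (v, v, 3);
}
```

In this model both sequences copy the low doubleword into both halves of the result, which matches the little-endian output quoted above for element 0. The single xxpermdi also avoids the round trip through a GPR.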