[gcc(refs/users/meissner/heads/work067)] Fix SFmode subreg of DImode and TImode

public inbox for gcc-cvs@sourceware.org

* [gcc(refs/users/meissner/heads/work067)] Fix SFmode subreg of DImode and TImode
@ 2021-09-08 18:09 Michael Meissner
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Meissner @ 2021-09-08 18:09 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:54d64bd4a3d43d2e4a0bdde2d6ffc8480fa56a20

commit 54d64bd4a3d43d2e4a0bdde2d6ffc8480fa56a20
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Sep 8 14:09:40 2021 -0400

    Fix SFmode subreg of DImode and TImode
    
    I first noticed the problem while building the SPEC 2017 wrf_r and
    blender_r benchmarks.  Once I applied this patch, I also noticed that
    several of the tests now pass.
    
    This patch fixes the breakage in the PowerPC due to a recent change in SUBREG
    behavior.  While it is arguable that the patch that caused the breakage should
    be reverted, this patch should be a bandage to prevent these changes from
    happening again.
    
    This is the second version of the patch.  The first version of the patch
    used a pseudo register to convert the inner integer type to SImode.  This
    version explicitly allows subregs of the larger DImode and TImode without
    using a pseudo register.
    
    The core of the problem is we need to treat SUBREGs of SFmode and SImode
    specially on the PowerPC.  This is due to the fact that SFmode values that are
    in the vector and floating point registers are represented as DFmode.  When we
    want to do a direct move between the GPR registers and the vector registers, we
    have to convert the value from the DFmode representation to/from the SFmode
    representation.
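
As an illustration of the representation difference (not part of the patch), the following C sketch shows that the same value has different bit patterns in single and double precision format, so a direct move between a GPR and a vector register cannot be a plain bit copy. It assumes IEEE 754 float and double, and all helper names are hypothetical. The last helper mirrors the shift by 32 bits that the split in rs6000.md emits ("Move SF value to upper 32-bits for xscvspdpn").

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the 32-bit pattern of a single precision value,
   i.e. what a GPR holds when it contains the SFmode bits.  */
static uint32_t
sf_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Hypothetical helper: the 64-bit pattern of a double precision value.  */
static uint64_t
df_bits (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Bit pattern of the same value once it is held in double precision
   format, which is how the commit message says SFmode values are
   represented in the floating point and vector registers.  */
static uint64_t
sf_as_df_bits (float f)
{
  return df_bits ((double) f);
}

/* Place the 32 single precision bits in the upper word of a 64-bit
   value, as the shift by 32 in the split does before the conversion.  */
static uint64_t
sf_bits_to_upper_word (uint32_t bits)
{
  return (uint64_t) bits << 32;
}
```

For 1.0f the single precision pattern is 0x3f800000, while the double precision pattern of the same value is 0x3ff0000000000000, which is why a conversion step is needed in both directions.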
    
    By doing this special processing instead of doing the transfer via store and
    load, we were able to speed up the math library, which at times wants to use
    SFmode values in a union, do logical operations on them (to test exponent
    ranges, etc.), and then move them over to use as floating point values.
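
The kind of math library code described here can be sketched in C as follows. This is only an illustration under the assumption of IEEE 754 single precision; the function names are hypothetical and are not taken from any particular library. Each function views a float through a union, applies integer logical operations to the bits, and in the last case moves the result back to be used as a floating point value.

```c
#include <stdint.h>

/* Hypothetical union used to view a float as a 32-bit integer.  */
union sf_pun
{
  float f;
  uint32_t u;
};

/* Return the raw 32-bit pattern of F.  */
static uint32_t
float_bits (float f)
{
  union sf_pun pun;
  pun.f = f;
  return pun.u;
}

/* Return the biased exponent field (bits 23..30), the sort of test on
   exponent ranges the commit message mentions.  */
static unsigned
biased_exponent (float f)
{
  return (float_bits (f) >> 23) & 0xffu;
}

/* Clear the sign bit with an integer AND, then use the result as a
   floating point value again.  */
static float
clear_sign (float f)
{
  union sf_pun pun;
  pun.f = f;
  pun.u &= 0x7fffffffu;
  return pun.f;
}
```

With a direct move between the register files, each of these transfers can avoid the store and load round trip through memory.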
    
    I did a bootstrap build on a little endian power9 system with and without the
    patch applied.  There were no regressions in the tests.  I'm doing a build on
    a big endian power8 system, but it hasn't finished yet as I send this email.
    I will check on the big endian progress tomorrow morning.
    
    The following tests now pass once again with the patch.
    
            C tests:
            ========
            gcc.c-torture/compile/20071102-1.c
            gcc.c-torture/compile/pr55921.c
            gcc.c-torture/compile/pr85945.c
            gcc.c-torture/execute/complex-3.c
            gcc.dg/atomic/c11-atomic-exec-1.c
            gcc.dg/atomic/c11-atomic-exec-2.c
            gcc.dg/atomic/c11-atomic-exec-4.c
            gcc.dg/atomic/c11-atomic-exec-5.c
            gcc.dg/c11-atomic-2.c
            gcc.dg/pr42475.c
            gcc.dg/pr47201.c
            gcc.dg/pr48335-1.c
            gcc.dg/torture/pr67741.c
            gcc.dg/tree-ssa/ssa-dom-thread-10.c
            gcc.dg/tsan/pr88030.c
            gcc.dg/ubsan/float-cast-overflow-atomic.c
            gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c
    
            C++ tests:
            ==========
            g++.dg/opt/alias1.C
            g++.dg/template/koenig6.C
            g++.dg/torture/pr40924.C
            tmpdir-g++.dg-struct-layout-1/t001
    
            Fortran tests:
            ==============
            gfortran.dg/array_constructor_type_22.f03
            gfortran.dg/array_function_6.f90
            gfortran.dg/derived_comp_array_ref_7.f90
            gfortran.dg/elemental_scalar_args_1.f90
            gfortran.dg/elemental_subroutine_1.f90
            gfortran.dg/inline_matmul_5.f90
            gfortran.dg/inline_matmul_8.f90
            gfortran.dg/inline_matmul_9.f90
            gfortran.dg/matmul_bounds_6.f90
            gfortran.dg/operator_1.f90
            gfortran.dg/past_eor.f90
            gfortran.dg/pr101121.f
            gfortran.dg/pr91552.f90
            gfortran.dg/spread_shape_1.f90
            gfortran.dg/typebound_operator_3.f03
            gfortran.dg/value_1.f90
            gfortran.fortran-torture/execute/entry_4.f90
            gfortran.fortran-torture/execute/intrinsic_dotprod.f90
            gfortran.fortran-torture/execute/intrinsic_matmul.f90
    
    2021-09-08  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000.c (rs6000_emit_move_si_sf_subreg): Deal
            with SUBREGs of TImode and DImode.
            * config/rs6000/rs6000.md (SI_DI_TI): New mode iterator.
            (movsf_from_<mode>): Replace movsf_from_si to add support for
            subregs of DImode and TImode.

Diff:
---
 gcc/config/rs6000/rs6000.c  | 17 +++++------------
 gcc/config/rs6000/rs6000.md | 15 ++++++++++-----
 2 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index da5abe6f7b5..bd1ae1f8d6e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -10942,23 +10942,16 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
 	  return true;
 	}
 
-      /* In case we are given a SUBREG for a larger type, reduce it to
-	 SImode.  */
-      if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4)
+      /* Deal with subregs of SI/DI/TImode.  */
+      if (mode == SFmode && inner_mode == TImode)
 	{
-	  rtx tmp = gen_reg_rtx (SImode);
-	  emit_move_insn (tmp, gen_lowpart (SImode, source));
-	  emit_insn (gen_movsf_from_si (dest, tmp));
+	  emit_insn (gen_movsf_from_ti (dest, inner_source));
 	  return true;
 	}
 
-      /* In case we are given a SUBREG for a larger type, reduce it to
-	 SImode.  */
-      if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4)
+      if (mode == SFmode && inner_mode == DImode)
 	{
-	  rtx tmp = gen_reg_rtx (SImode);
-	  emit_move_insn (tmp, gen_lowpart (SImode, source));
-	  emit_insn (gen_movsf_from_si (dest, tmp));
+	  emit_insn (gen_movsf_from_di (dest, inner_source));
 	  return true;
 	}
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d6af66a1728..fd3f6043e4b 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -466,6 +466,9 @@
 ; Any supported integer mode.
 (define_mode_iterator INT [QI HI SI DI TI PTI])
 
+; Any supported integer mode that is at least 32-bits in size.
+(define_mode_iterator SI_DI_TI [SI DI TI])
+
 ; Any supported integer mode that fits in one register.
 (define_mode_iterator INT1 [QI HI SI (DI "TARGET_POWERPC64")])
 
@@ -7861,11 +7864,11 @@
 
 ;;	    LWZ          LFS        LXSSP      LXSSPX     STW        STFIWX
 ;;	    STXSIWX      GPR->VSX   VSX->GPR   GPR->GPR
-(define_insn_and_split "movsf_from_si"
+(define_insn_and_split "movsf_from_<mode>"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	    "=!r,       f,         v,         wa,        m,         Z,
 	     Z,         wa,        ?r,        !r")
-	(unspec:SF [(match_operand:SI 1 "input_operand" 
+	(unspec:SF [(match_operand:SI_DI_TI 1 "input_operand" 
 	    "m,         m,         wY,        Z,         r,         f,
 	     wa,        r,         wa,        r")]
 		   UNSPEC_SF_FROM_SI))
@@ -7874,7 +7877,7 @@
              X,         r,         X,         X"))]
   "TARGET_NO_SF_SUBREG
    && (register_operand (operands[0], SFmode)
-       || register_operand (operands[1], SImode))"
+       || register_operand (operands[1], <MODE>mode))"
   "@
    lwz%U1%X1 %0,%1
    lfs%U1%X1 %0,%1
@@ -7889,13 +7892,15 @@
 
   "&& reload_completed
    && vsx_reg_sfsubreg_ok (operands[0], SFmode)
-   && int_reg_operand_not_pseudo (operands[1], SImode)"
+   && int_reg_operand_not_pseudo (operands[1], <MODE>mode)"
   [(const_int 0)]
 {
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
+  rtx op1_di = ((<MODE>mode == SImode)
+		? gen_rtx_REG (DImode, reg_or_subregno (op1))
+		: gen_lowpart (DImode, op1));
 
   /* Move SF value to upper 32-bits for xscvspdpn.  */
   emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));



* [gcc(refs/users/meissner/heads/work067)] Fix SFmode subreg of DImode and TImode
@ 2021-09-08 21:52 Michael Meissner
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Meissner @ 2021-09-08 21:52 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:5cdcc6b5c2ebcec93e153fc80e5ff6804eed05d4

commit 5cdcc6b5c2ebcec93e153fc80e5ff6804eed05d4
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Sep 8 17:51:36 2021 -0400

    Fix SFmode subreg of DImode and TImode
    
    I first noticed the problem while building the SPEC 2017 wrf_r and
    blender_r benchmarks.  Once I applied this patch, I also noticed that
    several of the tests now pass.
    
    This patch fixes the breakage in the PowerPC due to a recent change in SUBREG
    behavior.  While it is arguable that the patch that caused the breakage should
    be reverted, this patch should be a bandage to prevent these changes from
    happening again.
    
    The core of the problem is we need to treat SUBREGs of SFmode and SImode
    specially on the PowerPC.  This is due to the fact that SFmode values that are
    in the vector and floating point registers are represented as DFmode.  When we
    want to do a direct move between the GPR registers and the vector registers, we
    have to convert the value from the DFmode representation to/from the SFmode
    representation.
    
    By doing this special processing instead of doing the transfer via store and
    load, we were able to speed up the math library, which at times wants to use
    SFmode values in a union, do logical operations on them (to test exponent
    ranges, etc.), and then move them over to use as floating point values.
    
    I did a bootstrap build on a little endian power9 system with and without the
    patch applied.  There were no regressions in the tests.  I'm doing a build on
    a big endian power8 system, but it hasn't finished yet as I send this email.
    I will check on the big endian progress tomorrow morning.
    
    The following tests now pass once again with the patch.
    
            C tests:
            ========
            gcc.c-torture/compile/20071102-1.c
            gcc.c-torture/compile/pr55921.c
            gcc.c-torture/compile/pr85945.c
            gcc.c-torture/execute/complex-3.c
            gcc.dg/atomic/c11-atomic-exec-1.c
            gcc.dg/atomic/c11-atomic-exec-2.c
            gcc.dg/atomic/c11-atomic-exec-4.c
            gcc.dg/atomic/c11-atomic-exec-5.c
            gcc.dg/c11-atomic-2.c
            gcc.dg/pr42475.c
            gcc.dg/pr47201.c
            gcc.dg/pr48335-1.c
            gcc.dg/torture/pr67741.c
            gcc.dg/tree-ssa/ssa-dom-thread-10.c
            gcc.dg/tsan/pr88030.c
            gcc.dg/ubsan/float-cast-overflow-atomic.c
            gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c
    
            C++ tests:
            ==========
            g++.dg/opt/alias1.C
            g++.dg/template/koenig6.C
            g++.dg/torture/pr40924.C
            tmpdir-g++.dg-struct-layout-1/t001
    
            Fortran tests:
            ==============
            gfortran.dg/array_constructor_type_22.f03
            gfortran.dg/array_function_6.f90
            gfortran.dg/derived_comp_array_ref_7.f90
            gfortran.dg/elemental_scalar_args_1.f90
            gfortran.dg/elemental_subroutine_1.f90
            gfortran.dg/inline_matmul_5.f90
            gfortran.dg/inline_matmul_8.f90
            gfortran.dg/inline_matmul_9.f90
            gfortran.dg/matmul_bounds_6.f90
            gfortran.dg/operator_1.f90
            gfortran.dg/past_eor.f90
            gfortran.dg/pr101121.f
            gfortran.dg/pr91552.f90
            gfortran.dg/spread_shape_1.f90
            gfortran.dg/typebound_operator_3.f03
            gfortran.dg/value_1.f90
            gfortran.fortran-torture/execute/entry_4.f90
            gfortran.fortran-torture/execute/intrinsic_dotprod.f90
            gfortran.fortran-torture/execute/intrinsic_matmul.f90
    
    2021-09-08  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000.c (rs6000_emit_move_si_sf_subreg): Deal
            with SUBREGs of TImode and DImode.

Diff:
---
 gcc/config/rs6000/rs6000.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b9ebd56c993..7bbf29a3e1c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -10942,6 +10942,16 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
 	  return true;
 	}
 
+      /* In case we are given a SUBREG for a larger type, reduce it to
+	 SImode.  */
+      if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4)
+	{
+	  rtx tmp = gen_reg_rtx (SImode);
+	  emit_move_insn (tmp, gen_lowpart (SImode, source));
+	  emit_insn (gen_movsf_from_si (dest, tmp));
+	  return true;
+	}
+
       if (mode == SFmode && inner_mode == SImode)
 	{
 	  emit_insn (gen_movsf_from_si (dest, inner_source));



* [gcc(refs/users/meissner/heads/work067)] Fix SFmode subreg of DImode and TImode
@ 2021-09-08 19:10 Michael Meissner
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Meissner @ 2021-09-08 19:10 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:93e602421060606c2ae9de1091ba4dfb82f271cd

commit 93e602421060606c2ae9de1091ba4dfb82f271cd
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Sep 8 15:10:01 2021 -0400

    Fix SFmode subreg of DImode and TImode
    
    I first noticed the problem while building the SPEC 2017 wrf_r and
    blender_r benchmarks.  Once I applied this patch, I also noticed that
    several of the tests now pass.
    
    This patch fixes the breakage in the PowerPC due to a recent change in SUBREG
    behavior.  While it is arguable that the patch that caused the breakage should
    be reverted, this patch should be a bandage to prevent these changes from
    happening again.
    
    This is the second version of the patch.  The first version of the patch
    used a pseudo register to convert the inner integer type to SImode.  This
    version explicitly allows subregs of the larger DImode and TImode without
    using a pseudo register.
    
    The core of the problem is we need to treat SUBREGs of SFmode and SImode
    specially on the PowerPC.  This is due to the fact that SFmode values that are
    in the vector and floating point registers are represented as DFmode.  When we
    want to do a direct move between the GPR registers and the vector registers, we
    have to convert the value from the DFmode representation to/from the SFmode
    representation.
    
    By doing this special processing instead of doing the transfer via store and
    load, we were able to speed up the math library, which at times wants to use
    SFmode values in a union, do logical operations on them (to test exponent
    ranges, etc.), and then move them over to use as floating point values.
    
    I did a bootstrap build on a little endian power9 system with and without the
    patch applied.  There were no regressions in the tests.  I'm doing a build on
    a big endian power8 system, but it hasn't finished yet as I send this email.
    I will check on the big endian progress tomorrow morning.
    
    The following tests now pass once again with the patch.
    
            C tests:
            ========
            gcc.c-torture/compile/20071102-1.c
            gcc.c-torture/compile/pr55921.c
            gcc.c-torture/compile/pr85945.c
            gcc.c-torture/execute/complex-3.c
            gcc.dg/atomic/c11-atomic-exec-1.c
            gcc.dg/atomic/c11-atomic-exec-2.c
            gcc.dg/atomic/c11-atomic-exec-4.c
            gcc.dg/atomic/c11-atomic-exec-5.c
            gcc.dg/c11-atomic-2.c
            gcc.dg/pr42475.c
            gcc.dg/pr47201.c
            gcc.dg/pr48335-1.c
            gcc.dg/torture/pr67741.c
            gcc.dg/tree-ssa/ssa-dom-thread-10.c
            gcc.dg/tsan/pr88030.c
            gcc.dg/ubsan/float-cast-overflow-atomic.c
            gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c
    
            C++ tests:
            ==========
            g++.dg/opt/alias1.C
            g++.dg/template/koenig6.C
            g++.dg/torture/pr40924.C
            tmpdir-g++.dg-struct-layout-1/t001
    
            Fortran tests:
            ==============
            gfortran.dg/array_constructor_type_22.f03
            gfortran.dg/array_function_6.f90
            gfortran.dg/derived_comp_array_ref_7.f90
            gfortran.dg/elemental_scalar_args_1.f90
            gfortran.dg/elemental_subroutine_1.f90
            gfortran.dg/inline_matmul_5.f90
            gfortran.dg/inline_matmul_8.f90
            gfortran.dg/inline_matmul_9.f90
            gfortran.dg/matmul_bounds_6.f90
            gfortran.dg/operator_1.f90
            gfortran.dg/past_eor.f90
            gfortran.dg/pr101121.f
            gfortran.dg/pr91552.f90
            gfortran.dg/spread_shape_1.f90
            gfortran.dg/typebound_operator_3.f03
            gfortran.dg/value_1.f90
            gfortran.fortran-torture/execute/entry_4.f90
            gfortran.fortran-torture/execute/intrinsic_dotprod.f90
            gfortran.fortran-torture/execute/intrinsic_matmul.f90
    
    2021-09-08  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000.c (rs6000_emit_move_si_sf_subreg): Deal
            with SUBREGs of TImode and DImode.
            * config/rs6000/rs6000.md (SI_DI_TI): New mode iterator.
            (movsf_from_<mode>): Replace movsf_from_si to add support for
            subregs of DImode and TImode.

Diff:
---
 gcc/config/rs6000/rs6000.c  | 13 +++++++++++++
 gcc/config/rs6000/rs6000.md | 15 ++++++++++-----
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b9ebd56c993..bd1ae1f8d6e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -10942,6 +10942,19 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
 	  return true;
 	}
 
+      /* Deal with subregs of SI/DI/TImode.  */
+      if (mode == SFmode && inner_mode == TImode)
+	{
+	  emit_insn (gen_movsf_from_ti (dest, inner_source));
+	  return true;
+	}
+
+      if (mode == SFmode && inner_mode == DImode)
+	{
+	  emit_insn (gen_movsf_from_di (dest, inner_source));
+	  return true;
+	}
+
       if (mode == SFmode && inner_mode == SImode)
 	{
 	  emit_insn (gen_movsf_from_si (dest, inner_source));
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d6af66a1728..88fec34c87f 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -472,6 +472,9 @@
 ; Integer modes supported in VSX registers with ISA 3.0 instructions
 (define_mode_iterator INT_ISA3 [QI HI SI DI])
 
+; Any supported integer mode that is at least 32-bits in size.
+(define_mode_iterator SI_DI_TI [SI DI TI])
+
 ; Everything we can extend QImode to.
 (define_mode_iterator EXTQI [SI (DI "TARGET_POWERPC64")])
 
@@ -7861,11 +7864,11 @@
 
 ;;	    LWZ          LFS        LXSSP      LXSSPX     STW        STFIWX
 ;;	    STXSIWX      GPR->VSX   VSX->GPR   GPR->GPR
-(define_insn_and_split "movsf_from_si"
+(define_insn_and_split "movsf_from_<mode>"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	    "=!r,       f,         v,         wa,        m,         Z,
 	     Z,         wa,        ?r,        !r")
-	(unspec:SF [(match_operand:SI 1 "input_operand" 
+	(unspec:SF [(match_operand:SI_DI_TI 1 "input_operand" 
 	    "m,         m,         wY,        Z,         r,         f,
 	     wa,        r,         wa,        r")]
 		   UNSPEC_SF_FROM_SI))
@@ -7874,7 +7877,7 @@
              X,         r,         X,         X"))]
   "TARGET_NO_SF_SUBREG
    && (register_operand (operands[0], SFmode)
-       || register_operand (operands[1], SImode))"
+       || register_operand (operands[1], <MODE>mode))"
   "@
    lwz%U1%X1 %0,%1
    lfs%U1%X1 %0,%1
@@ -7889,13 +7892,15 @@
 
   "&& reload_completed
    && vsx_reg_sfsubreg_ok (operands[0], SFmode)
-   && int_reg_operand_not_pseudo (operands[1], SImode)"
+   && int_reg_operand_not_pseudo (operands[1], <MODE>mode)"
   [(const_int 0)]
 {
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
+  rtx op1_di = ((<MODE>mode == SImode)
+		? gen_rtx_REG (DImode, reg_or_subregno (op1))
+		: gen_lowpart (DImode, op1));
 
   /* Move SF value to upper 32-bits for xscvspdpn.  */
   emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));



* [gcc(refs/users/meissner/heads/work067)] Fix SFmode subreg of DImode and TImode
@ 2021-09-08 19:00 Michael Meissner
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Meissner @ 2021-09-08 19:00 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:cdbed964fa7ab39fe5ea130e578f97e2a5cb45cb

commit cdbed964fa7ab39fe5ea130e578f97e2a5cb45cb
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Sep 8 14:59:41 2021 -0400

    Fix SFmode subreg of DImode and TImode
    
    I first noticed the problem while building the SPEC 2017 wrf_r and
    blender_r benchmarks.  Once I applied this patch, I also noticed that
    several of the tests now pass.
    
    This patch fixes the breakage in the PowerPC due to a recent change in SUBREG
    behavior.  While it is arguable that the patch that caused the breakage should
    be reverted, this patch should be a bandage to prevent these changes from
    happening again.
    
    This is the second version of the patch.  The first version of the patch
    used a pseudo register to convert the inner integer type to SImode.  This
    version explicitly allows subregs of the larger DImode and TImode without
    using a pseudo register.
    
    The core of the problem is we need to treat SUBREGs of SFmode and SImode
    specially on the PowerPC.  This is due to the fact that SFmode values that are
    in the vector and floating point registers are represented as DFmode.  When we
    want to do a direct move between the GPR registers and the vector registers, we
    have to convert the value from the DFmode representation to/from the SFmode
    representation.
    
    By doing this special processing instead of doing the transfer via store and
    load, we were able to speed up the math library, which at times wants to use
    SFmode values in a union, do logical operations on them (to test exponent
    ranges, etc.), and then move them over to use as floating point values.
    
    I did a bootstrap build on a little endian power9 system with and without the
    patch applied.  There were no regressions in the tests.  I'm doing a build on
    a big endian power8 system, but it hasn't finished yet as I send this email.
    I will check on the big endian progress tomorrow morning.
    
    The following tests now pass once again with the patch.
    
            C tests:
            ========
            gcc.c-torture/compile/20071102-1.c
            gcc.c-torture/compile/pr55921.c
            gcc.c-torture/compile/pr85945.c
            gcc.c-torture/execute/complex-3.c
            gcc.dg/atomic/c11-atomic-exec-1.c
            gcc.dg/atomic/c11-atomic-exec-2.c
            gcc.dg/atomic/c11-atomic-exec-4.c
            gcc.dg/atomic/c11-atomic-exec-5.c
            gcc.dg/c11-atomic-2.c
            gcc.dg/pr42475.c
            gcc.dg/pr47201.c
            gcc.dg/pr48335-1.c
            gcc.dg/torture/pr67741.c
            gcc.dg/tree-ssa/ssa-dom-thread-10.c
            gcc.dg/tsan/pr88030.c
            gcc.dg/ubsan/float-cast-overflow-atomic.c
            gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c
    
            C++ tests:
            ==========
            g++.dg/opt/alias1.C
            g++.dg/template/koenig6.C
            g++.dg/torture/pr40924.C
            tmpdir-g++.dg-struct-layout-1/t001
    
            Fortran tests:
            ==============
            gfortran.dg/array_constructor_type_22.f03
            gfortran.dg/array_function_6.f90
            gfortran.dg/derived_comp_array_ref_7.f90
            gfortran.dg/elemental_scalar_args_1.f90
            gfortran.dg/elemental_subroutine_1.f90
            gfortran.dg/inline_matmul_5.f90
            gfortran.dg/inline_matmul_8.f90
            gfortran.dg/inline_matmul_9.f90
            gfortran.dg/matmul_bounds_6.f90
            gfortran.dg/operator_1.f90
            gfortran.dg/past_eor.f90
            gfortran.dg/pr101121.f
            gfortran.dg/pr91552.f90
            gfortran.dg/spread_shape_1.f90
            gfortran.dg/typebound_operator_3.f03
            gfortran.dg/value_1.f90
            gfortran.fortran-torture/execute/entry_4.f90
            gfortran.fortran-torture/execute/intrinsic_dotprod.f90
            gfortran.fortran-torture/execute/intrinsic_matmul.f90
    
    2021-09-08  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000.c (rs6000_emit_move_si_sf_subreg): Deal
            with SUBREGs of TImode and DImode.
            * config/rs6000/rs6000.md (SI_DI_TI): New mode iterator.
            (movsf_from_<mode>): Replace movsf_from_si to add support for
            subregs of DImode and TImode.

Diff:
---
 gcc/config/rs6000/rs6000.c  | 15 +++++++++------
 gcc/config/rs6000/rs6000.md | 15 ++++++++++-----
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 7bbf29a3e1c..bd1ae1f8d6e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -10942,13 +10942,16 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
 	  return true;
 	}
 
-      /* In case we are given a SUBREG for a larger type, reduce it to
-	 SImode.  */
-      if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4)
+      /* Deal with subregs of SI/DI/TImode.  */
+      if (mode == SFmode && inner_mode == TImode)
 	{
-	  rtx tmp = gen_reg_rtx (SImode);
-	  emit_move_insn (tmp, gen_lowpart (SImode, source));
-	  emit_insn (gen_movsf_from_si (dest, tmp));
+	  emit_insn (gen_movsf_from_ti (dest, inner_source));
+	  return true;
+	}
+
+      if (mode == SFmode && inner_mode == DImode)
+	{
+	  emit_insn (gen_movsf_from_di (dest, inner_source));
 	  return true;
 	}
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d6af66a1728..fd3f6043e4b 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -466,6 +466,9 @@
 ; Any supported integer mode.
 (define_mode_iterator INT [QI HI SI DI TI PTI])
 
+; Any supported integer mode that is at least 32-bits in size.
+(define_mode_iterator SI_DI_TI [SI DI TI])
+
 ; Any supported integer mode that fits in one register.
 (define_mode_iterator INT1 [QI HI SI (DI "TARGET_POWERPC64")])
 
@@ -7861,11 +7864,11 @@
 
 ;;	    LWZ          LFS        LXSSP      LXSSPX     STW        STFIWX
 ;;	    STXSIWX      GPR->VSX   VSX->GPR   GPR->GPR
-(define_insn_and_split "movsf_from_si"
+(define_insn_and_split "movsf_from_<mode>"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	    "=!r,       f,         v,         wa,        m,         Z,
 	     Z,         wa,        ?r,        !r")
-	(unspec:SF [(match_operand:SI 1 "input_operand" 
+	(unspec:SF [(match_operand:SI_DI_TI 1 "input_operand" 
 	    "m,         m,         wY,        Z,         r,         f,
 	     wa,        r,         wa,        r")]
 		   UNSPEC_SF_FROM_SI))
@@ -7874,7 +7877,7 @@
              X,         r,         X,         X"))]
   "TARGET_NO_SF_SUBREG
    && (register_operand (operands[0], SFmode)
-       || register_operand (operands[1], SImode))"
+       || register_operand (operands[1], <MODE>mode))"
   "@
    lwz%U1%X1 %0,%1
    lfs%U1%X1 %0,%1
@@ -7889,13 +7892,15 @@
 
   "&& reload_completed
    && vsx_reg_sfsubreg_ok (operands[0], SFmode)
-   && int_reg_operand_not_pseudo (operands[1], SImode)"
+   && int_reg_operand_not_pseudo (operands[1], <MODE>mode)"
   [(const_int 0)]
 {
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
-  rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
+  rtx op1_di = ((<MODE>mode == SImode)
+		? gen_rtx_REG (DImode, reg_or_subregno (op1))
+		: gen_lowpart (DImode, op1));
 
   /* Move SF value to upper 32-bits for xscvspdpn.  */
   emit_insn (gen_ashldi3 (op2, op1_di, GEN_INT (32)));



* [gcc(refs/users/meissner/heads/work067)] Fix SFmode subreg of DImode and TImode
@ 2021-09-08 16:53 Michael Meissner
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Meissner @ 2021-09-08 16:53 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:4137e04e9e3edb9ff71ab69d987eec221c32cefc

commit 4137e04e9e3edb9ff71ab69d987eec221c32cefc
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Sep 8 12:50:57 2021 -0400

    Fix SFmode subreg of DImode and TImode
    
    I first noticed the problem while building the SPEC 2017 wrf_r and
    blender_r benchmarks.  Once I applied this patch, I also noticed that
    several of the tests now pass.
    
    This patch fixes the breakage in the PowerPC due to a recent change in SUBREG
    behavior.  While it is arguable that the patch that caused the breakage should
    be reverted, this patch should be a bandage to prevent these changes from
    happening again.
    
    This is the second version of the patch.  The first version of the patch
    used a pseudo register to convert the inner integer type to SImode.  This
    version explicitly allows subregs of the larger DImode and TImode without
    using a pseudo register.
    
    The core of the problem is we need to treat SUBREGs of SFmode and SImode
    specially on the PowerPC.  This is due to the fact that SFmode values that are
    in the vector and floating point registers are represented as DFmode.  When we
    want to do a direct move between the GPR registers and the vector registers, we
    have to convert the value from the DFmode representation to/from the SFmode
    representation.
    
    By doing this special processing instead of doing the transfer via store and
    load, we were able to speed up the math library, which at times wants to use
    SFmode values in a union, do logical operations on them (to test exponent
    ranges, etc.), and then move them over to use as floating point values.
    
    I did a bootstrap build on a little endian power9 system with and without the
    patch applied.  There were no regressions in the tests.  I'm doing a build on
    a big endian power8 system, but it hasn't finished yet as I send this email.
    I will check on the big endian progress tomorrow morning.
    
    The following tests now pass once again with the patch.
    
            C tests:
            ========
            gcc.c-torture/compile/20071102-1.c
            gcc.c-torture/compile/pr55921.c
            gcc.c-torture/compile/pr85945.c
            gcc.c-torture/execute/complex-3.c
            gcc.dg/atomic/c11-atomic-exec-1.c
            gcc.dg/atomic/c11-atomic-exec-2.c
            gcc.dg/atomic/c11-atomic-exec-4.c
            gcc.dg/atomic/c11-atomic-exec-5.c
            gcc.dg/c11-atomic-2.c
            gcc.dg/pr42475.c
            gcc.dg/pr47201.c
            gcc.dg/pr48335-1.c
            gcc.dg/torture/pr67741.c
            gcc.dg/tree-ssa/ssa-dom-thread-10.c
            gcc.dg/tsan/pr88030.c
            gcc.dg/ubsan/float-cast-overflow-atomic.c
            gcc.dg/vect/no-tree-sra-bb-slp-pr50730.c
    
            C++ tests:
            ==========
            g++.dg/opt/alias1.C
            g++.dg/template/koenig6.C
            g++.dg/torture/pr40924.C
            tmpdir-g++.dg-struct-layout-1/t001
    
            Fortran tests:
            ==============
            gfortran.dg/array_constructor_type_22.f03
            gfortran.dg/array_function_6.f90
            gfortran.dg/derived_comp_array_ref_7.f90
            gfortran.dg/elemental_scalar_args_1.f90
            gfortran.dg/elemental_subroutine_1.f90
            gfortran.dg/inline_matmul_5.f90
            gfortran.dg/inline_matmul_8.f90
            gfortran.dg/inline_matmul_9.f90
            gfortran.dg/matmul_bounds_6.f90
            gfortran.dg/operator_1.f90
            gfortran.dg/past_eor.f90
            gfortran.dg/pr101121.f
            gfortran.dg/pr91552.f90
            gfortran.dg/spread_shape_1.f90
            gfortran.dg/typebound_operator_3.f03
            gfortran.dg/value_1.f90
            gfortran.fortran-torture/execute/entry_4.f90
            gfortran.fortran-torture/execute/intrinsic_dotprod.f90
            gfortran.fortran-torture/execute/intrinsic_matmul.f90
    
    2021-09-08  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000.c (rs6000_emit_move_si_sf_subreg): Deal
            with SUBREGs of TImode and DImode.
            * config/rs6000/rs6000.md (SI_DI_TI): New mode iterator.
            (movsf_from_<mode>): Replace movsf_from_si to add support for
            subregs of DImode and TImode.

Diff:
---
 gcc/config/rs6000/rs6000.c  | 17 +++++------------
 gcc/config/rs6000/rs6000.md | 15 +++++++++++----
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index da5abe6f7b5..bd1ae1f8d6e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -10942,23 +10942,16 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
 	  return true;
 	}
 
-      /* In case we are given a SUBREG for a larger type, reduce it to
-	 SImode.  */
-      if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4)
+      /* Deal with subregs of SI/DI/TImode.  */
+      if (mode == SFmode && inner_mode == TImode)
 	{
-	  rtx tmp = gen_reg_rtx (SImode);
-	  emit_move_insn (tmp, gen_lowpart (SImode, source));
-	  emit_insn (gen_movsf_from_si (dest, tmp));
+	  emit_insn (gen_movsf_from_ti (dest, inner_source));
 	  return true;
 	}
 
-      /* In case we are given a SUBREG for a larger type, reduce it to
-	 SImode.  */
-      if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4)
+      if (mode == SFmode && inner_mode == DImode)
 	{
-	  rtx tmp = gen_reg_rtx (SImode);
-	  emit_move_insn (tmp, gen_lowpart (SImode, source));
-	  emit_insn (gen_movsf_from_si (dest, tmp));
+	  emit_insn (gen_movsf_from_di (dest, inner_source));
 	  return true;
 	}
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index d6af66a1728..bce19310321 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -466,6 +466,9 @@
 ; Any supported integer mode.
 (define_mode_iterator INT [QI HI SI DI TI PTI])
 
+; Any supported integer mode that is at least 32-bits in size.
+(define_mode_iterator SI_DI_TI [SI DI TI])
+
 ; Any supported integer mode that fits in one register.
 (define_mode_iterator INT1 [QI HI SI (DI "TARGET_POWERPC64")])
 
@@ -7861,11 +7864,11 @@
 
 ;;	    LWZ          LFS        LXSSP      LXSSPX     STW        STFIWX
 ;;	    STXSIWX      GPR->VSX   VSX->GPR   GPR->GPR
-(define_insn_and_split "movsf_from_si"
+(define_insn_and_split "movsf_from_<mode>"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	    "=!r,       f,         v,         wa,        m,         Z,
 	     Z,         wa,        ?r,        !r")
-	(unspec:SF [(match_operand:SI 1 "input_operand" 
+	(unspec:SF [(match_operand:SI_DI_TI 1 "input_operand" 
 	    "m,         m,         wY,        Z,         r,         f,
 	     wa,        r,         wa,        r")]
 		   UNSPEC_SF_FROM_SI))
@@ -7874,7 +7877,7 @@
              X,         r,         X,         X"))]
   "TARGET_NO_SF_SUBREG
    && (register_operand (operands[0], SFmode)
-       || register_operand (operands[1], SImode))"
+       || register_operand (operands[1], <MODE>mode))"
   "@
    lwz%U1%X1 %0,%1
    lfs%U1%X1 %0,%1
@@ -7889,12 +7892,16 @@
 
   "&& reload_completed
    && vsx_reg_sfsubreg_ok (operands[0], SFmode)
-   && int_reg_operand_not_pseudo (operands[1], SImode)"
+   && int_reg_operand_not_pseudo (operands[1], <MODE>mode)"
   [(const_int 0)]
 {
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
+
+  if (GET_MODE (op1) != SImode)
+    op1 = gen_lowpart (SImode, op1);
+
   rtx op1_di = gen_rtx_REG (DImode, REGNO (op1));
 
   /* Move SF value to upper 32-bits for xscvspdpn.  */



* [gcc(refs/users/meissner/heads/work067)] Fix SFmode subreg of DImode and TImode
@ 2021-09-06 21:17 Michael Meissner
  0 siblings, 0 replies; 6+ messages in thread
From: Michael Meissner @ 2021-09-06 21:17 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:ac07ff1369cd50d70c689933ac7e46bfa70eb229

commit ac07ff1369cd50d70c689933ac7e46bfa70eb229
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon Sep 6 17:16:25 2021 -0400

    Fix SFmode subreg of DImode and TImode
    
    2021-09-06  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000.c (rs6000_emit_move_si_sf_subreg): Deal
            with SUBREGs of TImode and DImode.

Diff:
---
 gcc/config/rs6000/rs6000.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b9ebd56c993..7bbf29a3e1c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -10942,6 +10942,16 @@ rs6000_emit_move_si_sf_subreg (rtx dest, rtx source, machine_mode mode)
 	  return true;
 	}
 
+      /* In case we are given a SUBREG for a larger type, reduce it to
+	 SImode.  */
+      if (mode == SFmode && GET_MODE_SIZE (inner_mode) > 4)
+	{
+	  rtx tmp = gen_reg_rtx (SImode);
+	  emit_move_insn (tmp, gen_lowpart (SImode, source));
+	  emit_insn (gen_movsf_from_si (dest, tmp));
+	  return true;
+	}
+
       if (mode == SFmode && inner_mode == SImode)
 	{
 	  emit_insn (gen_movsf_from_si (dest, inner_source));



