public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]
@ 2023-09-04  5:33 HAO CHEN GUI
  2023-09-12  9:33 ` Kewen.Lin
  0 siblings, 1 reply; 6+ messages in thread
From: HAO CHEN GUI @ 2023-09-04  5:33 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool, David, Kewen.Lin, Peter Bergner

Hi,
  This patch enables SImode in FP registers on P7. Instruction "fctiw"
stores its integer output in an FP register. So SImode in FP register
needs be enabled on P7 if we want support "fctiw" on P7.

  The test case is in the second patch which implements 32bit inline
lrint.

  Compared to the last version, the main change it to remove disparaging
on the alternatives of "fmr". Test shows it doesn't cause regression.
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628435.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.


ChangeLog
rs6000: enable SImode in FP register on P7

gcc/
	PR target/88558
	* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached):
	Enable SImode in FP registers on P7.
	* config/rs6000/rs6000.md (*movsi_internal1): Add fmr for SImode
	move between FP registers.  Set attribute isa of stfiwx to "*"
	and attribute of stxsiwx to "p7".

patch.diff
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 44b448d2ba6..99085c2cdd7 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1903,7 +1903,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
 	  if(GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD)
 	    return 1;

-	  if (TARGET_P8_VECTOR && (mode == SImode))
+	  if (TARGET_POPCNTD && mode == SImode)
 	    return 1;

 	  if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index cdab49fbb91..edf49bd74e3 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7566,7 +7566,7 @@ (define_split

 (define_insn "*movsi_internal1"
   [(set (match_operand:SI 0 "nonimmediate_operand"
-	  "=r,         r,
+	  "=r,         r,          d,
 	   r,          d,          v,
 	   m,          ?Z,         ?Z,
 	   r,          r,          r,          r,
@@ -7575,7 +7575,7 @@ (define_insn "*movsi_internal1"
 	   wa,         r,
 	   r,          *h,         *h")
 	(match_operand:SI 1 "input_operand"
-	  "r,          U,
+	  "r,          U,          d,
 	   m,          ?Z,         ?Z,
 	   r,          d,          v,
 	   I,          L,          eI,         n,
@@ -7588,6 +7588,7 @@ (define_insn "*movsi_internal1"
   "@
    mr %0,%1
    la %0,%a1
+   fmr %0,%1
    lwz%U1%X1 %0,%1
    lfiwzx %0,%y1
    lxsiwzx %x0,%y1
@@ -7611,7 +7612,7 @@ (define_insn "*movsi_internal1"
    mt%0 %1
    nop"
   [(set_attr "type"
-	  "*,          *,
+	  "*,          *,          fpsimple,
 	   load,       fpload,     fpload,
 	   store,      fpstore,    fpstore,
 	   *,          *,          *,          *,
@@ -7620,7 +7621,7 @@ (define_insn "*movsi_internal1"
 	   mtvsr,      mfvsr,
 	   *,          *,          *")
    (set_attr "length"
-	  "*,          *,
+	  "*,          *,          *,
 	   *,          *,          *,
 	   *,          *,          *,
 	   *,          *,          *,          8,
@@ -7629,9 +7630,9 @@ (define_insn "*movsi_internal1"
 	   *,          *,
 	   *,          *,          *")
    (set_attr "isa"
-	  "*,          *,
-	   *,          p8v,        p8v,
-	   *,          p8v,        p8v,
+	  "*,          *,          *,
+	   *,          p7,         p8v,
+	   *,          *,          p8v,
 	   *,          *,          p10,        *,
 	   p8v,        p9v,        p9v,        p8v,
 	   p9v,        p8v,        p9v,


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]
  2023-09-04  5:33 [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558] HAO CHEN GUI
@ 2023-09-12  9:33 ` Kewen.Lin
  2023-09-14  8:35   ` HAO CHEN GUI
  0 siblings, 1 reply; 6+ messages in thread
From: Kewen.Lin @ 2023-09-12  9:33 UTC (permalink / raw)
  To: HAO CHEN GUI; +Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches

Hi Haochen,

on 2023/9/4 13:33, HAO CHEN GUI wrote:
> Hi,
>   This patch enables SImode in FP registers on P7. Instruction "fctiw"
> stores its integer output in an FP register. So SImode in FP register
> needs be enabled on P7 if we want support "fctiw" on P7.
> 
>   The test case is in the second patch which implements 32bit inline
> lrint.
> 
>   Compared to the last version, the main change it to remove disparaging
> on the alternatives of "fmr". Test shows it doesn't cause regression.

Ok, at least regression testing doesn't expose any needs to do disparaging
for this.  Could you also test this patch with SPEC2017 for P7 and P8
separately at options like -O2 or -O3, to see if there is any assembly
change, and if yes filtering out some typical to check it's expected or
not?  I think it can help us to better evaluate the impact.  Thanks!

BR,
Kewen

> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628435.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> 
> 
> ChangeLog
> rs6000: enable SImode in FP register on P7
> 
> gcc/
> 	PR target/88558
> 	* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached):
> 	Enable SImode in FP registers on P7.
> 	* config/rs6000/rs6000.md (*movsi_internal1): Add fmr for SImode
> 	move between FP registers.  Set attribute isa of stfiwx to "*"
> 	and attribute of stxsiwx to "p7".
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 44b448d2ba6..99085c2cdd7 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1903,7 +1903,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
>  	  if(GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD)
>  	    return 1;
> 
> -	  if (TARGET_P8_VECTOR && (mode == SImode))
> +	  if (TARGET_POPCNTD && mode == SImode)
>  	    return 1;
> 
>  	  if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index cdab49fbb91..edf49bd74e3 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7566,7 +7566,7 @@ (define_split
> 
>  (define_insn "*movsi_internal1"
>    [(set (match_operand:SI 0 "nonimmediate_operand"
> -	  "=r,         r,
> +	  "=r,         r,          d,
>  	   r,          d,          v,
>  	   m,          ?Z,         ?Z,
>  	   r,          r,          r,          r,
> @@ -7575,7 +7575,7 @@ (define_insn "*movsi_internal1"
>  	   wa,         r,
>  	   r,          *h,         *h")
>  	(match_operand:SI 1 "input_operand"
> -	  "r,          U,
> +	  "r,          U,          d,
>  	   m,          ?Z,         ?Z,
>  	   r,          d,          v,
>  	   I,          L,          eI,         n,
> @@ -7588,6 +7588,7 @@ (define_insn "*movsi_internal1"
>    "@
>     mr %0,%1
>     la %0,%a1
> +   fmr %0,%1
>     lwz%U1%X1 %0,%1
>     lfiwzx %0,%y1
>     lxsiwzx %x0,%y1
> @@ -7611,7 +7612,7 @@ (define_insn "*movsi_internal1"
>     mt%0 %1
>     nop"
>    [(set_attr "type"
> -	  "*,          *,
> +	  "*,          *,          fpsimple,
>  	   load,       fpload,     fpload,
>  	   store,      fpstore,    fpstore,
>  	   *,          *,          *,          *,
> @@ -7620,7 +7621,7 @@ (define_insn "*movsi_internal1"
>  	   mtvsr,      mfvsr,
>  	   *,          *,          *")
>     (set_attr "length"
> -	  "*,          *,
> +	  "*,          *,          *,
>  	   *,          *,          *,
>  	   *,          *,          *,
>  	   *,          *,          *,          8,
> @@ -7629,9 +7630,9 @@ (define_insn "*movsi_internal1"
>  	   *,          *,
>  	   *,          *,          *")
>     (set_attr "isa"
> -	  "*,          *,
> -	   *,          p8v,        p8v,
> -	   *,          p8v,        p8v,
> +	  "*,          *,          *,
> +	   *,          p7,         p8v,
> +	   *,          *,          p8v,
>  	   *,          *,          p10,        *,
>  	   p8v,        p9v,        p9v,        p8v,
>  	   p9v,        p8v,        p9v,
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]
  2023-09-12  9:33 ` Kewen.Lin
@ 2023-09-14  8:35   ` HAO CHEN GUI
  2023-09-18  7:34     ` Kewen.Lin
  0 siblings, 1 reply; 6+ messages in thread
From: HAO CHEN GUI @ 2023-09-14  8:35 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches

Hi Kewen,

在 2023/9/12 17:33, Kewen.Lin 写道:
> Ok, at least regression testing doesn't expose any needs to do disparaging
> for this.  Could you also test this patch with SPEC2017 for P7 and P8
> separately at options like -O2 or -O3, to see if there is any assembly
> change, and if yes filtering out some typical to check it's expected or
> not?  I think it can help us to better evaluate the impact.  Thanks!

Just compared the object files of SPEC2017 for P7 and P8. There is no
difference between P7s'. For P8, some different object files are found.
All differences are the same. Patched object files replace xxlor with fmr.
It's expected as the fmr is added to ahead of xxlor in "*movsi_internal1".

Thanks
Gui Haochen

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]
  2023-09-14  8:35   ` HAO CHEN GUI
@ 2023-09-18  7:34     ` Kewen.Lin
  2023-09-25  1:57       ` HAO CHEN GUI
  0 siblings, 1 reply; 6+ messages in thread
From: Kewen.Lin @ 2023-09-18  7:34 UTC (permalink / raw)
  To: HAO CHEN GUI; +Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches

Hi Haochen,

on 2023/9/14 16:35, HAO CHEN GUI wrote:
> Hi Kewen,
> 
> 在 2023/9/12 17:33, Kewen.Lin 写道:
>> Ok, at least regression testing doesn't expose any needs to do disparaging
>> for this.  Could you also test this patch with SPEC2017 for P7 and P8
>> separately at options like -O2 or -O3, to see if there is any assembly
>> change, and if yes filtering out some typical to check it's expected or
>> not?  I think it can help us to better evaluate the impact.  Thanks!
> 
> Just compared the object files of SPEC2017 for P7 and P8. There is no
> difference between P7s'. For P8, some different object files are found.
> All differences are the same. Patched object files replace xxlor with fmr.
> It's expected as the fmr is added to ahead of xxlor in "*movsi_internal1".

Thanks for checking!  So for P7, this patch looks neutral, but for P8 and
later, it may cause some few differences in code gen.  I'm curious that how
many total object files and different object files were checked and found
on P8?  fmr or xxlor preference can be further considered along with existing:
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612821.html
I also wonder if it's easy to reduce some of them further as small test cases.

Since xxlor is better than fmr at least on Power10, could you also evaluate
the affected bmks on P10 (even P8/P9) to ensure no performance degradation?

Thanks!

BR,
Kewen


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]
  2023-09-18  7:34     ` Kewen.Lin
@ 2023-09-25  1:57       ` HAO CHEN GUI
  2023-09-27  5:07         ` Kewen.Lin
  0 siblings, 1 reply; 6+ messages in thread
From: HAO CHEN GUI @ 2023-09-25  1:57 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches

Hi Kewen,

在 2023/9/18 15:34, Kewen.Lin 写道:
> hanks for checking!  So for P7, this patch looks neutral, but for P8 and
> later, it may cause some few differences in code gen.  I'm curious that how
> many total object files and different object files were checked and found
> on P8?  
P8 with -O2, following object files are different.
507.cactuBSSN_r datestamp.o
511.povray_r colutils.o
521.wrf_r module_cu_kfeta.fppized.o
526.blender_r particle_edit.o
526.blender_r glutil.o
526.blender_r displist.o
526.blender_r CCGSubSurf.o

P8 with -O3, following object files are different.
502.gcc_r ifcvt.o
502.gcc_r rtlanal.o
548.exchange2_r exchange2.fppized.o
507.cactuBSSN_r datestamp.o
511.povray_r colutils.o
521.wrf_r module_bc.fppized.o
521.wrf_r module_cu_kfeta.fppized.o
526.blender_r particle_edit.o
526.blender_r displist.o
526.blender_r CCGSubSurf.o
526.blender_r sketch.o




> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612821.html
> I also wonder if it's easy to reduce some of them further as small test cases.
> 
> Since xxlor is better than fmr at least on Power10, could you also evaluate
> the affected bmks on P10 (even P8/P9) to ensure no performance degradation?
There is no performance recession on P10/P9/P8. The detail data is listed on
internal issue.

Thanks
Gui Haochen

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558]
  2023-09-25  1:57       ` HAO CHEN GUI
@ 2023-09-27  5:07         ` Kewen.Lin
  0 siblings, 0 replies; 6+ messages in thread
From: Kewen.Lin @ 2023-09-27  5:07 UTC (permalink / raw)
  To: HAO CHEN GUI; +Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches

Hi,

on 2023/9/25 09:57, HAO CHEN GUI wrote:
> Hi Kewen,
> 
> 在 2023/9/18 15:34, Kewen.Lin 写道:
>> hanks for checking!  So for P7, this patch looks neutral, but for P8 and
>> later, it may cause some few differences in code gen.  I'm curious that how
>> many total object files and different object files were checked and found
>> on P8?  
> P8 with -O2, following object files are different.
> 507.cactuBSSN_r datestamp.o
> 511.povray_r colutils.o
> 521.wrf_r module_cu_kfeta.fppized.o
> 526.blender_r particle_edit.o
> 526.blender_r glutil.o
> 526.blender_r displist.o
> 526.blender_r CCGSubSurf.o
> 
> P8 with -O3, following object files are different.
> 502.gcc_r ifcvt.o
> 502.gcc_r rtlanal.o
> 548.exchange2_r exchange2.fppized.o
> 507.cactuBSSN_r datestamp.o
> 511.povray_r colutils.o
> 521.wrf_r module_bc.fppized.o
> 521.wrf_r module_cu_kfeta.fppized.o
> 526.blender_r particle_edit.o
> 526.blender_r displist.o
> 526.blender_r CCGSubSurf.o
> 526.blender_r sketch.o
> 

OK, it's concluded that the percentage of the total number of affected object
files is quite small ...

> 
> 
> 
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612821.html
>> I also wonder if it's easy to reduce some of them further as small test cases.
>>
>> Since xxlor is better than fmr at least on Power10, could you also evaluate
>> the affected bmks on P10 (even P8/P9) to ensure no performance degradation?
> There is no performance recession on P10/P9/P8. The detail data is listed on
> internal issue.

... and no runtime performance impact as evaluated, so this patch looks good to
me and thanks for further testing.

Please wait for a week or so to see if Segher and David have comments.  Thanks!

BR,
Kewen

> 
> Thanks
> Gui Haochen



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-09-27  5:07 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-04  5:33 [PATCH-1v2, rs6000] Enable SImode in FP registers on P7 [PR88558] HAO CHEN GUI
2023-09-12  9:33 ` Kewen.Lin
2023-09-14  8:35   ` HAO CHEN GUI
2023-09-18  7:34     ` Kewen.Lin
2023-09-25  1:57       ` HAO CHEN GUI
2023-09-27  5:07         ` Kewen.Lin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).