* [PATCH] Add explicit VIS intrinsics for addition and subtraction.
@ 2011-09-27 7:19 David Miller
2011-09-27 9:35 ` Eric Botcazou
0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2011-09-27 7:19 UTC (permalink / raw)
To: gcc-patches; +Cc: ebotcazou
Eric, I'm sure you have noticed this, but the Sparc target test
"combined-1.c" fails for some time on 32-bit because of how float
arguments are passed in the 32-bit SPARC ABI.
Since they are passed in integer registers, the vectorizer does
the initial logical operations using non-VIS instructions,
then pops them into the FPU regs to use the VIS vector addition
and subtraction.
Because of this some of the scan-assembler directives in that test
miss.
Would you mind if I added a hack, like we use in the other test
cases and the one I added here, to force the VIS operations to
be used on 32-bit? We could either pass the vectors as pointer
args, or use the trick where we return the vector values from
extern functions. Any preference?
Committed to trunk.
gcc/
* config/sparc/sparc.c (sparc_vis_init_builtins): Add explicit
builtins for VIS vector addition and subtraction.
* config/sparc/visintrin.h (__vis_fpadd16, __vis_fpadd16s,
__vis_fpadd32, __vis_fpadd32s, __vis_fpsub16, __vis_fpsub16s,
__vis_fpsub32, __vis_fpsub32s): New.
* doc/extend.texi: Document new VIS intrinsics.
gcc/testsuite/
* gcc.target/sparc/fpaddsubi.c: New test.
---
gcc/ChangeLog | 7 +++
gcc/config/sparc/sparc.c | 21 ++++++++++
gcc/config/sparc/visintrin.h | 57 +++++++++++++++++++++++++++
gcc/doc/extend.texi | 10 +++++
gcc/testsuite/ChangeLog | 2 +
gcc/testsuite/gcc.target/sparc/fpaddsubi.c | 58 ++++++++++++++++++++++++++++
6 files changed, 155 insertions(+), 0 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/sparc/fpaddsubi.c
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ecdb26b..a3dd883 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -24,6 +24,13 @@
* config/sparc/sparc-protos.h (sparc_target_macros): Declare.
* config/sparc/sparc.h (TARGE_CPU_CPP_BUILTINS): Call it.
+ * config/sparc/sparc.c (sparc_vis_init_builtins): Add explicit
+ builtins for VIS vector addition and subtraction.
+ * config/sparc/visintrin.h (__vis_fpadd16, __vis_fpadd16s,
+ __vis_fpadd32, __vis_fpadd32s, __vis_fpsub16, __vis_fpsub16s,
+ __vis_fpsub32, __vis_fpsub32s): New.
+ * doc/extend.texi: Document new VIS intrinsics.
+
2011-09-26 Georg-Johann Lay <avr@gjlay.de>
* config/avr/avr.md (peephole casesi+2): Use -1 instead of 65536.
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index d1d8355..44d7f20 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -9173,6 +9173,7 @@ sparc_vis_init_builtins (void)
tree v4hi = build_vector_type (intHI_type_node, 4);
tree v2hi = build_vector_type (intHI_type_node, 2);
tree v2si = build_vector_type (intSI_type_node, 2);
+ tree v1si = build_vector_type (intSI_type_node, 1);
tree v4qi_ftype_v4hi = build_function_type_list (v4qi, v4hi, 0);
tree v8qi_ftype_v2si_v8qi = build_function_type_list (v8qi, v2si, v8qi, 0);
@@ -9186,6 +9187,8 @@ sparc_vis_init_builtins (void)
tree v4hi_ftype_v4hi_v4hi = build_function_type_list (v4hi, v4hi, v4hi, 0);
tree v2si_ftype_v2si_v2si = build_function_type_list (v2si, v2si, v2si, 0);
tree v8qi_ftype_v8qi_v8qi = build_function_type_list (v8qi, v8qi, v8qi, 0);
+ tree v2hi_ftype_v2hi_v2hi = build_function_type_list (v2hi, v2hi, v2hi, 0);
+ tree v1si_ftype_v1si_v1si = build_function_type_list (v1si, v1si, v1si, 0);
tree di_ftype_v8qi_v8qi_di = build_function_type_list (intDI_type_node,
v8qi, v8qi,
intDI_type_node, 0);
@@ -9350,6 +9353,24 @@ sparc_vis_init_builtins (void)
def_builtin_const ("__builtin_vis_fcmpeq32", CODE_FOR_fcmpeq32si_vis,
si_ftype_v2si_v2si);
}
+
+ /* Addition and subtraction. */
+ def_builtin_const ("__builtin_vis_fpadd16", CODE_FOR_addv4hi3,
+ v4hi_ftype_v4hi_v4hi);
+ def_builtin_const ("__builtin_vis_fpadd16s", CODE_FOR_addv2hi3,
+ v2hi_ftype_v2hi_v2hi);
+ def_builtin_const ("__builtin_vis_fpadd32", CODE_FOR_addv2si3,
+ v2si_ftype_v2si_v2si);
+ def_builtin_const ("__builtin_vis_fpadd32s", CODE_FOR_addsi3,
+ v1si_ftype_v1si_v1si);
+ def_builtin_const ("__builtin_vis_fpsub16", CODE_FOR_subv4hi3,
+ v4hi_ftype_v4hi_v4hi);
+ def_builtin_const ("__builtin_vis_fpsub16s", CODE_FOR_subv2hi3,
+ v2hi_ftype_v2hi_v2hi);
+ def_builtin_const ("__builtin_vis_fpsub32", CODE_FOR_subv2si3,
+ v2si_ftype_v2si_v2si);
+ def_builtin_const ("__builtin_vis_fpsub32s", CODE_FOR_subsi3,
+ v1si_ftype_v1si_v1si);
}
/* Handle TARGET_EXPAND_BUILTIN target hook.
diff --git a/gcc/config/sparc/visintrin.h b/gcc/config/sparc/visintrin.h
index d37bd95..eb2b4ec 100644
--- a/gcc/config/sparc/visintrin.h
+++ b/gcc/config/sparc/visintrin.h
@@ -25,6 +25,7 @@
#define _VISINTRIN_H_INCLUDED
typedef int __v2si __attribute__ ((__vector_size__ (8)));
+typedef int __v1si __attribute__ ((__vector_size__ (4)));
typedef short __v4hi __attribute__ ((__vector_size__ (8)));
typedef short __v2hi __attribute__ ((__vector_size__ (4)));
typedef unsigned char __v8qi __attribute__ ((__vector_size__ (8)));
@@ -276,4 +277,60 @@ __vis_fcmpeq32 (__v2si __A, __v2si __B)
return __builtin_vis_fcmpeq32 (__A, __B);
}
+extern __inline __v4hi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fpadd16 (__v4hi __A, __v4hi __B)
+{
+ return __builtin_vis_fpadd16 (__A, __B);
+}
+
+extern __inline __v2hi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fpadd16s (__v2hi __A, __v2hi __B)
+{
+ return __builtin_vis_fpadd16s (__A, __B);
+}
+
+extern __inline __v2si
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fpadd32 (__v2si __A, __v2si __B)
+{
+ return __builtin_vis_fpadd32 (__A, __B);
+}
+
+extern __inline __v1si
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fpadd32s (__v1si __A, __v1si __B)
+{
+ return __builtin_vis_fpadd32s (__A, __B);
+}
+
+extern __inline __v4hi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fpsub16 (__v4hi __A, __v4hi __B)
+{
+ return __builtin_vis_fpsub16 (__A, __B);
+}
+
+extern __inline __v2hi
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fpsub16s (__v2hi __A, __v2hi __B)
+{
+ return __builtin_vis_fpsub16s (__A, __B);
+}
+
+extern __inline __v2si
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fpsub32 (__v2si __A, __v2si __B)
+{
+ return __builtin_vis_fpsub32 (__A, __B);
+}
+
+extern __inline __v1si
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__vis_fpsub32s (__v1si __A, __v1si __B)
+{
+ return __builtin_vis_fpsub32s (__A, __B);
+}
+
#endif /* _VISINTRIN_H_INCLUDED */
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e9d0bc7..195fa8c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12929,6 +12929,7 @@ the SPARC Visual Instruction Set (VIS). When you use the @option{-mvis}
switch, the VIS extension is exposed as the following built-in functions:
@smallexample
+typedef int v1si __attribute__ ((vector_size (4)));
typedef int v2si __attribute__ ((vector_size (8)));
typedef short v4hi __attribute__ ((vector_size (8)));
typedef short v2hi __attribute__ ((vector_size (4)));
@@ -12977,6 +12978,15 @@ long __builtin_vis_fcmpgt16 (v4hi, v4hi);
long __builtin_vis_fcmpgt32 (v2si, v2si);
long __builtin_vis_fcmpeq16 (v4hi, v4hi);
long __builtin_vis_fcmpeq32 (v2si, v2si);
+
+v4hi __builtin_vis_fpadd16 (v4hi, v4hi);
+v2hi __builtin_vis_fpadd16s (v2hi, v2hi);
+v2si __builtin_vis_fpadd32 (v2si, v2si);
+v1si __builtin_vis_fpadd32s (v1si, v1si);
+v4hi __builtin_vis_fpsub16 (v4hi, v4hi);
+v2hi __builtin_vis_fpsub16s (v2hi, v2hi);
+v2si __builtin_vis_fpsub32 (v2si, v2si);
+v1si __builtin_vis_fpsub32s (v1si, v1si);
@end smallexample
@node SPU Built-in Functions
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index ebc9385..b430baf 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -14,6 +14,8 @@
* gcc.target/sparc/edge.c: Update for new return types.
* gcc.target/sparc/fcmp.c: Likewise.
+ * gcc.target/sparc/fpaddsubi.c: New test.
+
2011-09-26 Janus Weil <janus@gcc.gnu.org>
PR fortran/50515
diff --git a/gcc/testsuite/gcc.target/sparc/fpaddsubi.c b/gcc/testsuite/gcc.target/sparc/fpaddsubi.c
new file mode 100644
index 0000000..a36108e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/sparc/fpaddsubi.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O -mcpu=ultrasparc -mvis" } */
+typedef int __v2si __attribute__((vector_size(8)));
+typedef int __v1si __attribute__((vector_size(4)));
+typedef short __v4hi __attribute__((vector_size(8)));
+typedef short __v2hi __attribute__((vector_size(4)));
+
+extern __v1si foo_x (void);
+extern __v1si foo_y (void);
+
+__v4hi test_fpadd16 (__v4hi x, __v4hi y)
+{
+ return __builtin_vis_fpadd16 (x, y);
+}
+
+__v2hi test_fpadd16s (__v2hi x, __v2hi y)
+{
+ return __builtin_vis_fpadd16s (x, y);
+}
+
+__v4hi test_fpsub16 (__v4hi x, __v4hi y)
+{
+ return __builtin_vis_fpsub16 (x, y);
+}
+
+__v2hi test_fpsub16s (__v2hi x, __v2hi y)
+{
+ return __builtin_vis_fpsub16s (x, y);
+}
+
+__v2si test_fpadd32 (__v2si x, __v2si y)
+{
+ return __builtin_vis_fpadd32 (x, y);
+}
+
+__v1si test_fpadd32s (void)
+{
+ return __builtin_vis_fpadd32s (foo_x (), foo_y ());
+}
+
+__v2si test_fpsub32 (__v2si x, __v2si y)
+{
+ return __builtin_vis_fpsub32 (x, y);
+}
+
+__v1si test_fpsub32s (__v1si x, __v1si y)
+{
+ return __builtin_vis_fpsub32s (foo_x (), foo_y ());
+}
+
+/* { dg-final { scan-assembler "fpadd16\t%" } } */
+/* { dg-final { scan-assembler "fpadd16s\t%" } } */
+/* { dg-final { scan-assembler "fpsub16\t%" } } */
+/* { dg-final { scan-assembler "fpsub16s\t%" } } */
+/* { dg-final { scan-assembler "fpadd32\t%" } } */
+/* { dg-final { scan-assembler "fpadd32s\t%" } } */
+/* { dg-final { scan-assembler "fpsub32\t%" } } */
+/* { dg-final { scan-assembler "fpsub32s\t%" } } */
--
1.7.6.401.g6a319
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Add explicit VIS intrinsics for addition and subtraction.
2011-09-27 7:19 [PATCH] Add explicit VIS intrinsics for addition and subtraction David Miller
@ 2011-09-27 9:35 ` Eric Botcazou
2011-09-27 9:45 ` David Miller
0 siblings, 1 reply; 7+ messages in thread
From: Eric Botcazou @ 2011-09-27 9:35 UTC (permalink / raw)
To: David Miller; +Cc: gcc-patches
> Eric, I'm sure you have noticed this, but the Sparc target test
> "combined-1.c" fails for some time on 32-bit because of how float
> arguments are passed in the 32-bit SPARC ABI.
>
> Since they are passed in integer registers, the vectorizer does
> the initial logical operations using non-VIS instructions,
> then pops them into the FPU regs to use the VIS vector addition
> and subtraction.
Yes, I noticed it, but this is a regression and probably is unrelated to the
vectorizer. IIRC the problem comes from the RA, which forces a bogus reload.
That wouldn't be the first time some RA change breaks vector support on SPARC
32-bit and this usually means that the change is at least questionable.
> Would you mind if I added a hack, like we use in the other test
> cases and the one I added here, to force the VIS operations to
> be used on 32-bit? We could either pass the vectors as pointer
> args, or use the trick where we return the vector values from
> extern functions.
Let me first investigate a bit.
--
Eric Botcazou
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Add explicit VIS intrinsics for addition and subtraction.
2011-09-27 9:35 ` Eric Botcazou
@ 2011-09-27 9:45 ` David Miller
2011-09-29 1:15 ` Eric Botcazou
0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2011-09-27 9:45 UTC (permalink / raw)
To: ebotcazou; +Cc: gcc-patches
From: Eric Botcazou <ebotcazou@adacore.com>
Date: Tue, 27 Sep 2011 10:04:35 +0200
> Yes, I noticed it, but this is a regression and probably is
> unrelated to the vectorizer. IIRC the problem comes from the RA,
> which forces a bogus reload. That wouldn't be the first time some
> RA change breaks vector support on SPARC 32-bit and this usually
> means that the change is at least questionable.
I see, thanks for the information.
> Let me first investigate a bit.
No problem.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Add explicit VIS intrinsics for addition and subtraction.
2011-09-27 9:45 ` David Miller
@ 2011-09-29 1:15 ` Eric Botcazou
2011-10-13 21:05 ` David Miller
2011-10-14 15:15 ` Vladimir Makarov
0 siblings, 2 replies; 7+ messages in thread
From: Eric Botcazou @ 2011-09-29 1:15 UTC (permalink / raw)
To: David Miller; +Cc: gcc-patches, Vladimir Makarov
[Vlad, if you have a few minutes, would you mind having a look at the couple of
questions at the end of the message? Thanks in advance].
> No problem.
Here are the results of the investigation. Pseudo 116 needs to be assigned a
hard register. It is used mostly in vector instructions so we would like it
to be assigned a FP reg, but it is initialized in insn 2:
(insn 2 5 3 2 (set (reg/v:V4HI 116 [ a ])
(reg:V4HI 24 %i0 [ a ])) combined-1.c:7 93 {*movdf_insn_sp32_v9}
(expr_list:REG_DEAD (reg:V4HI 24 %i0 [ a ])
(nil)))
so it ends up being assigned the (integer) argument register %i0 instead. It
used to be assigned a FP reg as expected with the GCC 4.6.x series.
The register class preference discovery is OK:
r116: preferred EXTRA_FP_REGS, alternative GENERAL_OR_EXTRA_FP_REGS,
allocno GENERAL_OR_EXTRA_FP_REGS
a2 (r116,l0) best EXTRA_FP_REGS, allocno GENERAL_OR_EXTRA_FP_REGS
i.e. EXTRA_FP_REGS is "preferred"/"best". Then it seems that this preference
is dropped and only the class of the allocno, GENERAL_OR_EXTRA_FP_REGS, is
handed down to the coloring stage. By contrast, in the GCC 4.6 series, the
cover_class of the allocno is EXTRA_FP_REGS.
The initial cost for %i0 is twice as high (24000) as the cost of FP regs. But
then it is reduced by 12000 when process_bb_node_for_hard_reg_moves sees insn
2 above and then again by 12000 when process_regs_for_copy sees the same insn.
So, in the end, %i0 is given cost 0 and thus beats every other register. This
doesn't happen in the GCC 4.6 series because %i0 isn't in the cover_class.
This is at -O1. At -O2, there is an extra pass at the discovery stage and it
sets the class of the allocno to EXTRA_FP_REGS, like with the GCC 4.6 series,
so a simple workaround is
Index: gcc.target/sparc/combined-1.c
===================================================================
--- gcc.target/sparc/combined-1.c (revision 179316)
+++ gcc.target/sparc/combined-1.c (working copy)
@@ -1,5 +1,5 @@
/* { dg-do compile } */
-/* { dg-options "-O -mcpu=ultrasparc -mvis" } */
+/* { dg-options "-O2 -mcpu=ultrasparc -mvis" } */
typedef short vec16 __attribute__((vector_size(8)));
typedef int vec32 __attribute__((vector_size(8)));
Finally the couple of questions:
1. Is it expected that the register class preference be dropped at -O1?
2. Is it expected that a single insn be processed by 2 different mechanisms
that independently halve the initial cost of a hard register?
--
Eric Botcazou
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Add explicit VIS intrinsics for addition and subtraction.
2011-09-29 1:15 ` Eric Botcazou
@ 2011-10-13 21:05 ` David Miller
2011-10-14 15:15 ` Vladimir Makarov
1 sibling, 0 replies; 7+ messages in thread
From: David Miller @ 2011-10-13 21:05 UTC (permalink / raw)
To: ebotcazou; +Cc: gcc-patches, vmakarov
From: Eric Botcazou <ebotcazou@adacore.com>
Date: Thu, 29 Sep 2011 00:38:49 +0200
> [Vlad, if you have a few minutes, would you mind having a look at the couple of
> questions at the end of the message? Thanks in advance].
Vlad, ping?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Add explicit VIS intrinsics for addition and subtraction.
2011-09-29 1:15 ` Eric Botcazou
2011-10-13 21:05 ` David Miller
@ 2011-10-14 15:15 ` Vladimir Makarov
2011-10-15 17:02 ` Eric Botcazou
1 sibling, 1 reply; 7+ messages in thread
From: Vladimir Makarov @ 2011-10-14 15:15 UTC (permalink / raw)
To: Eric Botcazou; +Cc: David Miller, gcc-patches
On 09/28/2011 06:38 PM, Eric Botcazou wrote:
> [Vlad, if you have a few minutes, would you mind having a look at the couple of
> questions at the end of the message? Thanks in advance].
>
>> No problem.
> Here are the results of the investigation. Pseudo 116 needs to be assigned a
> hard register. It is used mostly in vector instructions so we would like it
> to be assigned a FP reg, but it is initialized in insn 2:
>
> (insn 2 5 3 2 (set (reg/v:V4HI 116 [ a ])
> (reg:V4HI 24 %i0 [ a ])) combined-1.c:7 93 {*movdf_insn_sp32_v9}
> (expr_list:REG_DEAD (reg:V4HI 24 %i0 [ a ])
> (nil)))
>
> so it ends up being assigned the (integer) argument register %i0 instead. It
> used to be assigned a FP reg as expected with the GCC 4.6.x series.
>
>
> The register class preference discovery is OK:
>
> r116: preferred EXTRA_FP_REGS, alternative GENERAL_OR_EXTRA_FP_REGS,
> allocno GENERAL_OR_EXTRA_FP_REGS
> a2 (r116,l0) best EXTRA_FP_REGS, allocno GENERAL_OR_EXTRA_FP_REGS
>
> i.e. EXTRA_FP_REGS is "preferred"/"best". Then it seems that this preference
> is dropped and only the class of the allocno, GENERAL_OR_EXTRA_FP_REGS, is
> handed down to the coloring stage. By contrast, in the GCC 4.6 series, the
> cover_class of the allocno is EXTRA_FP_REGS.
>
> The initial cost for %i0 is twice as high (24000) as the cost of FP regs. But
> then it is reduced by 12000 when process_bb_node_for_hard_reg_moves sees insn
> 2 above and then again by 12000 when process_regs_for_copy sees the same insn.
> So, in the end, %i0 is given cost 0 and thus beats every other register. This
> doesn't happen in the GCC 4.6 series because %i0 isn't in the cover_class.
>
> This is at -O1. At -O2, there is an extra pass at the discovery stage and it
> sets the class of the allocno to EXTRA_FP_REGS, like with the GCC 4.6 series,
> so a simple workaround is
>
> Index: gcc.target/sparc/combined-1.c
> ===================================================================
> --- gcc.target/sparc/combined-1.c (revision 179316)
> +++ gcc.target/sparc/combined-1.c (working copy)
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-O -mcpu=ultrasparc -mvis" } */
> +/* { dg-options "-O2 -mcpu=ultrasparc -mvis" } */
> typedef short vec16 __attribute__((vector_size(8)));
> typedef int vec32 __attribute__((vector_size(8)));
>
>
> Finally the couple of questions:
>
> 1. Is it expected that the register class preference be dropped at -O1?
>
> 2. Is it expected that a single insn be processed by 2 different mechanisms
> that independently halve the initial cost of a hard register?
>
>
Sorry for the delay with the answer. I missed this email.
About the 1st question. Before gcc4.7, the only class (allocno class)
used for coloring can be a cover class. So it was not possible to use
GENERAL_OR_EXTRA_FP_REGS in gcc4.6 and older versions. Starting gcc4.7,
class used for coloring can be any class which is more profitable than
memory. Although there is inaccuracy in cost calculations for -O1
because only one pass for cost calculations is used (it is very
expensive pass). To get better cost evaluations, more passes should be
used. But again we don't do more 2 passes because even one pass is not
cheap.
In brief, I don't see any criminal that the class calculation is
different for -O1 and -O2.
About the 2nd question. It seems to me wrong. I'd remove function
process_bb_node_for_hard_reg_moves and its call from
setup_allocno_cover_class_and_costs because function
process_regs_for_copy is more accurate (it works with subreg).
Although, I might be miss something here. There were a lot of problems
and tunings of cost calculation code. Generated code *performance* (and
even generation of *valid* code) is very sensitive to changes in
ira-costs.c. So even if such change looks obvious, a lot of testing and
benchmarking should be done. I could do that but it will take a week or
two before committing such change if everything is ok.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] Add explicit VIS intrinsics for addition and subtraction.
2011-10-14 15:15 ` Vladimir Makarov
@ 2011-10-15 17:02 ` Eric Botcazou
0 siblings, 0 replies; 7+ messages in thread
From: Eric Botcazou @ 2011-10-15 17:02 UTC (permalink / raw)
To: Vladimir Makarov; +Cc: David Miller, gcc-patches
> About the 1st question. Before gcc4.7, the only class (allocno class)
> used for coloring can be a cover class. So it was not possible to use
> GENERAL_OR_EXTRA_FP_REGS in gcc4.6 and older versions. Starting gcc4.7,
> class used for coloring can be any class which is more profitable than
> memory. Although there is inaccuracy in cost calculations for -O1
> because only one pass for cost calculations is used (it is very
> expensive pass). To get better cost evaluations, more passes should be
> used. But again we don't do more 2 passes because even one pass is not
> cheap.
>
> In brief, I don't see any criminal that the class calculation is
> different for -O1 and -O2.
Fine with me. I'm going to apply the above patchlet then.
> About the 2nd question. It seems to me wrong. I'd remove function
> process_bb_node_for_hard_reg_moves and its call from
> setup_allocno_cover_class_and_costs because function
> process_regs_for_copy is more accurate (it works with subreg).
> Although, I might be miss something here. There were a lot of problems
> and tunings of cost calculation code. Generated code *performance* (and
> even generation of *valid* code) is very sensitive to changes in
> ira-costs.c. So even if such change looks obvious, a lot of testing and
> benchmarking should be done. I could do that but it will take a week or
> two before committing such change if everything is ok.
Understood. I essentially wanted to bring this to your attention, since it
looked like a small oddity to me. This might be something to play with for
future improvements, but it's your call of course.
Thanks for the detailed answer.
--
Eric Botcazou
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2011-10-15 14:45 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-27 7:19 [PATCH] Add explicit VIS intrinsics for addition and subtraction David Miller
2011-09-27 9:35 ` Eric Botcazou
2011-09-27 9:45 ` David Miller
2011-09-29 1:15 ` Eric Botcazou
2011-10-13 21:05 ` David Miller
2011-10-14 15:15 ` Vladimir Makarov
2011-10-15 17:02 ` Eric Botcazou
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).