public inbox for gcc-patches@gcc.gnu.org
* [RFC][ARM]: Fix reload spill failure (PR 60617)
@ 2014-06-16 12:53 Venkataramanan Kumar
  2014-06-16 15:50 ` Ajit Kumar Agarwal
  2014-06-18  9:59 ` Ramana Radhakrishnan
  0 siblings, 2 replies; 6+ messages in thread
From: Venkataramanan Kumar @ 2014-06-16 12:53 UTC (permalink / raw)
  To: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Marcus Shawcroft, Patch Tracking, Maxim Kuvyrkov

[-- Attachment #1: Type: text/plain, Size: 1761 bytes --]

Hi Maintainers,

This patch fixes PR 60617, which occurs when we turn on the reload pass
in Thumb-2 mode.

It occurs for the pattern "*ior_scc_scc", which gets generated for the
third argument of the function call below.

JIT::emitStoreInt32(dst, regT0, (op1 == dst || op2 == dst));


(----snip---)
(insn 634 633 635 27 (parallel [
            (set (reg:SI 3 r3)
                (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== This operand gets register r5 assigned
                        (reg/v:SI 112 [ op2 ]))
                    (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
                        (reg/v:SI 111 [ op1 ]))))
            (clobber (reg:CC 100 cc))
        ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300
{*ior_scc_scc
(----snip---)

The issue here is that the above pattern demands 5 registers (LO_REGS).

But by the time we reach reload, r0 is used for the pointer to the
class, r1 and r2 for the first and second arguments, and r7 for the
frame pointer.

So we are left with r3, r4, r5 and r6, while the above pattern needs
five LO_REGS. Hence we get a spill failure when processing the last
register operand in that pattern.
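
For reference, here is a minimal stand-alone C sketch of the shape
involved (hypothetical names, not the original JavaScriptCore code):
the third call argument is an IOR of two register/register
comparisons, which combine matches as "*ior_scc_scc", and that pattern
then wants five low registers at reload time in Thumb-2 mode.

/* Hypothetical reproducer sketch; emit_store_int32 stands in for the
   real call and is not the actual JavaScriptCore interface.  */
extern void emit_store_int32 (int dst, int src, int flag);

void
test (int dst, int op1, int op2, int src)
{
  emit_store_int32 (dst, src, op1 == dst || op2 == dst);
}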

In the ARM port, TARGET_CLASS_LIKELY_SPILLED_P is defined for Thumb-1,
and for Thumb-2 mode the use of low registers is mentioned in the
comment below.

"Care should be taken to avoid adding thumb-2 patterns that require
many low registers"

So the conservative fix is to disallow this pattern for Thumb-2 mode.

I allowed these patterns for Thumb-2 when the comparison operands are
constants. That keeps the target tests arm/thumb2-cond-cmp-1.c to
thumb2-cond-cmp-4.c passing.
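
As an illustration (my own sketch, not taken from the PR): with the
patch, Thumb-2 keeps the combined form only when both comparisons are
against constants, since the new "*ior_scc_scc_imm" pattern has just
three register operands; the all-register form is left to TARGET_ARM.

/* Sketch only; the actual code generated depends on optimization
   level and flags.  */
int
still_combined (int a, int b)
{
  return a == 3 || b == 5;   /* constant comparison operands */
}

int
no_longer_combined (int a, int b, int c)
{
  return a == b || a == c;   /* register comparison operands */
}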

Regression tested on the gcc 4.9 branch, since on trunk this bug is
masked by revision 209897.

Please provide your suggestions on this patch.

regards,
Venkat.

[-- Attachment #2: reload_spill_failure_fix.txt --]
[-- Type: text/plain, Size: 3857 bytes --]

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0284f95..e8fbb11 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -10654,7 +10654,7 @@
 		 [(match_operand:SI 4 "s_register_operand" "r")
 		  (match_operand:SI 5 "arm_add_operand" "rIL")])))
    (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT
+  "TARGET_ARM
    && (arm_select_dominance_cc_mode (operands[3], operands[6], DOM_CC_X_OR_Y)
        != CCmode)"
   "#"
@@ -10675,6 +10675,36 @@
    (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*ior_scc_scc_imm"
+  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
+        (ior:SI (match_operator:SI 3 "arm_comparison_operator"
+                 [(match_operand:SI 1 "s_register_operand" "r")
+                  (match_operand:SI 2 "arm_addimm_operand" "IL")])
+                (match_operator:SI 6 "arm_comparison_operator"
+                 [(match_operand:SI 4 "s_register_operand" "r")
+                  (match_operand:SI 5 "arm_addimm_operand" "IL")])))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_THUMB2
+   && (arm_select_dominance_cc_mode (operands[3], operands[6], DOM_CC_X_OR_Y)
+       != CCmode)"
+  "#"
+  "TARGET_THUMB2 && reload_completed"
+  [(set (match_dup 7)
+        (compare
+         (ior:SI
+          (match_op_dup 3 [(match_dup 1) (match_dup 2)])
+          (match_op_dup 6 [(match_dup 4) (match_dup 5)]))
+         (const_int 0)))
+   (set (match_dup 0) (ne:SI (match_dup 7) (const_int 0)))]
+  "operands[7]
+     = gen_rtx_REG (arm_select_dominance_cc_mode (operands[3], operands[6],
+                                                  DOM_CC_X_OR_Y),
+                    CC_REGNUM);"
+  [(set_attr "conds" "clob")
+   (set_attr "length" "16")
+   (set_attr "type" "multiple")]
+)
+
 ; If the above pattern is followed by a CMP insn, then the compare is 
 ; redundant, since we can rework the conditional instruction that follows.
 (define_insn_and_split "*ior_scc_scc_cmp"
@@ -10714,7 +10744,7 @@
 		 [(match_operand:SI 4 "s_register_operand" "r")
 		  (match_operand:SI 5 "arm_add_operand" "rIL")])))
    (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT
+  "TARGET_ARM 
    && (arm_select_dominance_cc_mode (operands[3], operands[6], DOM_CC_X_AND_Y)
        != CCmode)"
   "#"
@@ -10737,6 +10767,38 @@
    (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*and_scc_scc_imm"
+  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
+        (and:SI (match_operator:SI 3 "arm_comparison_operator"
+                 [(match_operand:SI 1 "s_register_operand" "r")
+                  (match_operand:SI 2 "arm_addimm_operand" "IL")])
+                (match_operator:SI 6 "arm_comparison_operator"
+                 [(match_operand:SI 4 "s_register_operand" "r")
+                  (match_operand:SI 5 "arm_addimm_operand" "IL")])))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_THUMB2 
+   && (arm_select_dominance_cc_mode (operands[3], operands[6], DOM_CC_X_AND_Y)
+       != CCmode)"
+  "#"
+  "TARGET_THUMB2 && reload_completed
+   && (arm_select_dominance_cc_mode (operands[3], operands[6], DOM_CC_X_AND_Y)
+       != CCmode)"
+  [(set (match_dup 7)
+        (compare
+         (and:SI
+          (match_op_dup 3 [(match_dup 1) (match_dup 2)])
+          (match_op_dup 6 [(match_dup 4) (match_dup 5)]))
+         (const_int 0)))
+   (set (match_dup 0) (ne:SI (match_dup 7) (const_int 0)))]
+  "operands[7]
+     = gen_rtx_REG (arm_select_dominance_cc_mode (operands[3], operands[6],
+                                                  DOM_CC_X_AND_Y),
+                    CC_REGNUM);"
+  [(set_attr "conds" "clob")
+   (set_attr "length" "16")
+   (set_attr "type" "multiple")]
+)
+
 ; If the above pattern is followed by a CMP insn, then the compare is 
 ; redundant, since we can rework the conditional instruction that follows.
 (define_insn_and_split "*and_scc_scc_cmp"


* RE: [RFC][ARM]: Fix reload spill failure (PR 60617)
  2014-06-16 12:53 [RFC][ARM]: Fix reload spill failure (PR 60617) Venkataramanan Kumar
@ 2014-06-16 15:50 ` Ajit Kumar Agarwal
  2014-06-18  9:59 ` Ramana Radhakrishnan
  1 sibling, 0 replies; 6+ messages in thread
From: Ajit Kumar Agarwal @ 2014-06-16 15:50 UTC (permalink / raw)
  To: Venkataramanan Kumar, gcc-patches, Richard Earnshaw,
	Ramana Radhakrishnan, Marcus Shawcroft, Patch Tracking,
	Maxim Kuvyrkov

You can assign the same register to more than one operand based on
liveness. It becomes tricky if, even on the basis of liveness, no
available register is found. In that case you need to spill one of the
operands and use the register assigned to it for the next operand. This
forms the basis of spilling one of the values in order to assign the
same register to another operand.

This pattern is very specific to LO_REGS, and I am sure there will be
low-register pressure for such operands. Is this pattern used with
dedicated registers for the operands?

Thanks & Regards
Ajit

-----Original Message-----
From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org] On Behalf Of Venkataramanan Kumar
Sent: Monday, June 16, 2014 6:23 PM
To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan; Marcus Shawcroft; Patch Tracking; Maxim Kuvyrkov
Subject: [RFC][ARM]: Fix reload spill failure (PR 60617)

Hi Maintainers,

This patch fixes the PR 60617 that occurs when we turn on reload pass in thumb2 mode.

It occurs for the pattern "*ior_scc_scc" that gets generated for the 3 argument of the below function call.

JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));


(----snip---)
(insn 634 633 635 27 (parallel [
            (set (reg:SI 3 r3)
                (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
r5 is registers gets assigned
                        (reg/v:SI 112 [ op2 ]))
                    (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
                        (reg/v:SI 111 [ op1 ]))))
            (clobber (reg:CC 100 cc))
        ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300 {*ior_scc_scc
(----snip---)

The issue here is that the above pattern demands 5 registers (LO_REGS).

But when we are in reload, registers r0 is used for pointer to the class, r1 and r2 for first and second argument. r7 is used for stack pointer.

So we are left with r3,r4,r5 and r6. But the above patterns needs five LO_REGS. Hence we get spill failure when processing the last register operand in that pattern,

In ARM port,  TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1 and for thumb 2 mode there is mention of using LO_REG in the comment as below.

"Care should be taken to avoid adding thumb-2 patterns that require many low registers"

So conservative fix is not to allow this pattern for Thumb-2 mode.

I allowed these pattern for Thumb2 when we have constant operands for comparison. That makes the target tests arm/thum2-cond-cmp-1.c to thum2-cond-cmp-4.c pass.

Regression tested with gcc 4.9 branch since in trunk this bug is masked revision 209897.

Please provide your suggestion on this patch

regards,
Venkat.


* Re: [RFC][ARM]: Fix reload spill failure (PR 60617)
  2014-06-16 12:53 [RFC][ARM]: Fix reload spill failure (PR 60617) Venkataramanan Kumar
  2014-06-16 15:50 ` Ajit Kumar Agarwal
@ 2014-06-18  9:59 ` Ramana Radhakrishnan
  2014-06-18 10:35   ` Venkataramanan Kumar
  1 sibling, 1 reply; 6+ messages in thread
From: Ramana Radhakrishnan @ 2014-06-18  9:59 UTC (permalink / raw)
  To: Venkataramanan Kumar
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Marcus Shawcroft, Patch Tracking, Maxim Kuvyrkov

On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar
<venkataramanan.kumar@linaro.org> wrote:
> Hi Maintainers,
>
> This patch fixes the PR 60617 that occurs when we turn on reload pass
> in thumb2 mode.
>
> It occurs for the pattern "*ior_scc_scc" that gets generated for the 3
> argument of the below function call.
>
> JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));
>
>
> (----snip---)
> (insn 634 633 635 27 (parallel [
>             (set (reg:SI 3 r3)
>                 (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
> r5 is registers gets assigned
>                         (reg/v:SI 112 [ op2 ]))
>                     (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>                         (reg/v:SI 111 [ op1 ]))))
>             (clobber (reg:CC 100 cc))
>         ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300
> {*ior_scc_scc
> (----snip---)
>
> The issue here is that the above pattern demands 5 registers (LO_REGS).
>
> But when we are in reload, registers r0 is used for pointer to the
> class, r1 and r2 for first and second argument. r7 is used for stack
> pointer.
>
> So we are left with r3,r4,r5 and r6. But the above patterns needs five
> LO_REGS. Hence we get spill failure when processing the last register
> operand in that pattern,
>
> In ARM port,  TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1 and
> for thumb 2 mode there is mention of using LO_REG in the comment as
> below.
>
> "Care should be taken to avoid adding thumb-2 patterns that require
> many low registers"
>
> So conservative fix is not to allow this pattern for Thumb-2 mode.

I don't have an additional solution off the top of my head and
probably need to go do some digging.

It sounds like the conservative fix, but what's the impact of doing so?
Have you measured that in terms of performance or code size on a range
of benchmarks?

>
> I allowed these pattern for Thumb2 when we have constant operands for
> comparison. That makes the target tests arm/thum2-cond-cmp-1.c to
> thum2-cond-cmp-4.c pass.

That sounds fine and fair - no trouble there.

My concern is with removing the register alternatives and losing the
ability to trigger conditional compares on 4.9 and trunk for Thumb-2
till the time the "new" conditional compare work makes it in.

Ramana

>
> Regression tested with gcc 4.9 branch since in trunk this bug is
> masked revision 209897.
>
> Please provide your suggestion on this patch
>
> regards,
> Venkat.


* Re: [RFC][ARM]: Fix reload spill failure (PR 60617)
  2014-06-18  9:59 ` Ramana Radhakrishnan
@ 2014-06-18 10:35   ` Venkataramanan Kumar
  2014-07-07 11:48     ` Venkataramanan Kumar
  0 siblings, 1 reply; 6+ messages in thread
From: Venkataramanan Kumar @ 2014-06-18 10:35 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Marcus Shawcroft, Patch Tracking, Maxim Kuvyrkov

Hi Ramana,

On 18 June 2014 15:29, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
> On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar
> <venkataramanan.kumar@linaro.org> wrote:
>> Hi Maintainers,
>>
>> This patch fixes the PR 60617 that occurs when we turn on reload pass
>> in thumb2 mode.
>>
>> It occurs for the pattern "*ior_scc_scc" that gets generated for the 3
>> argument of the below function call.
>>
>> JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));
>>
>>
>> (----snip---)
>> (insn 634 633 635 27 (parallel [
>>             (set (reg:SI 3 r3)
>>                 (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>> r5 is registers gets assigned
>>                         (reg/v:SI 112 [ op2 ]))
>>                     (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>>                         (reg/v:SI 111 [ op1 ]))))
>>             (clobber (reg:CC 100 cc))
>>         ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300
>> {*ior_scc_scc
>> (----snip---)
>>
>> The issue here is that the above pattern demands 5 registers (LO_REGS).
>>
>> But when we are in reload, registers r0 is used for pointer to the
>> class, r1 and r2 for first and second argument. r7 is used for stack
>> pointer.
>>
>> So we are left with r3,r4,r5 and r6. But the above patterns needs five
>> LO_REGS. Hence we get spill failure when processing the last register
>> operand in that pattern,
>>
>> In ARM port,  TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1 and
>> for thumb 2 mode there is mention of using LO_REG in the comment as
>> below.
>>
>> "Care should be taken to avoid adding thumb-2 patterns that require
>> many low registers"
>>
>> So conservative fix is not to allow this pattern for Thumb-2 mode.
>
> I don't have an additional solution off the top of my head and
> probably need to go do some digging.
>
> It sounds like the conservative fix but what's the impact of doing so
> ? Have you measured that in terms of performance or code size on a
> range of benchmarks ?
>
>>

I haven't done any benchmark testing. I will try and run some
benchmarks with my patch.


>> I allowed these pattern for Thumb2 when we have constant operands for
>> comparison. That makes the target tests arm/thum2-cond-cmp-1.c to
>> thum2-cond-cmp-4.c pass.
>
> That sounds fine and fair - no trouble there.
>
> My concern is with removing the register alternatives and loosing the
> ability to trigger conditional compares on 4.9 and trunk for Thumb1
> till the time the "new" conditional compare work makes it in.
>
> Ramana

This bug does not occur when LRA is enabled. In FSF 4.9 and trunk, the
LRA pass is enabled by default now.
It may be too conservative, but is there a way to enable this pattern
when we have the LRA pass and prevent it when we have the old reload
pass?

regards,
Venkat.



>
>>
>> Regression tested with gcc 4.9 branch since in trunk this bug is
>> masked revision 209897.
>>
>> Please provide your suggestion on this patch
>>
>> regards,
>> Venkat.


* Re: [RFC][ARM]: Fix reload spill failure (PR 60617)
  2014-06-18 10:35   ` Venkataramanan Kumar
@ 2014-07-07 11:48     ` Venkataramanan Kumar
  2014-09-26 14:26       ` Christophe Lyon
  0 siblings, 1 reply; 6+ messages in thread
From: Venkataramanan Kumar @ 2014-07-07 11:48 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Marcus Shawcroft, Patch Tracking, Maxim Kuvyrkov, doko

Hi Ramana/Maxim,


On 18 June 2014 16:05, Venkataramanan Kumar
<venkataramanan.kumar@linaro.org> wrote:
> Hi Ramana,
>
> On 18 June 2014 15:29, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>> On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar
>> <venkataramanan.kumar@linaro.org> wrote:
>>> Hi Maintainers,
>>>
>>> This patch fixes the PR 60617 that occurs when we turn on reload pass
>>> in thumb2 mode.
>>>
>>> It occurs for the pattern "*ior_scc_scc" that gets generated for the 3
>>> argument of the below function call.
>>>
>>> JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));
>>>
>>>
>>> (----snip---)
>>> (insn 634 633 635 27 (parallel [
>>>             (set (reg:SI 3 r3)
>>>                 (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>>> r5 is registers gets assigned
>>>                         (reg/v:SI 112 [ op2 ]))
>>>                     (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>>>                         (reg/v:SI 111 [ op1 ]))))
>>>             (clobber (reg:CC 100 cc))
>>>         ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300
>>> {*ior_scc_scc
>>> (----snip---)
>>>
>>> The issue here is that the above pattern demands 5 registers (LO_REGS).
>>>
>>> But when we are in reload, registers r0 is used for pointer to the
>>> class, r1 and r2 for first and second argument. r7 is used for stack
>>> pointer.
>>>
>>> So we are left with r3,r4,r5 and r6. But the above patterns needs five
>>> LO_REGS. Hence we get spill failure when processing the last register
>>> operand in that pattern,
>>>
>>> In ARM port,  TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1 and
>>> for thumb 2 mode there is mention of using LO_REG in the comment as
>>> below.
>>>
>>> "Care should be taken to avoid adding thumb-2 patterns that require
>>> many low registers"
>>>
>>> So conservative fix is not to allow this pattern for Thumb-2 mode.
>>
>> I don't have an additional solution off the top of my head and
>> probably need to go do some digging.
>>
>> It sounds like the conservative fix but what's the impact of doing so
>> ? Have you measured that in terms of performance or code size on a
>> range of benchmarks ?
>>
>>>
>
> I haven't done any benchmark testing. I will try and run some
> benchmarks with my patch.
>
>
>>> I allowed these pattern for Thumb2 when we have constant operands for
>>> comparison. That makes the target tests arm/thum2-cond-cmp-1.c to
>>> thum2-cond-cmp-4.c pass.
>>
>> That sounds fine and fair - no trouble there.
>>
>> My concern is with removing the register alternatives and loosing the
>> ability to trigger conditional compares on 4.9 and trunk for Thumb1
>> till the time the "new" conditional compare work makes it in.
>>
>> Ramana

I tested this conservative fix with Coremark (ran it on a Chromebook)
and SPEC CPU2006 (cross-compiled and compared assembly differences).

With Coremark there are no performance issues. In fact there are no
assembly differences with the CPU flags for A15 and A9.

For SPEC CPU2006 I cross-compiled and compared assembly differences
with and without the patch (-O3 -fno-common).
I have not run these benchmarks.

There are major code differences, and they are due to conditional
compare instructions not being generated, as you expected. This also
results in different physical register numbers being assigned in the
generated code, and there are code scheduling differences compared
with the base.


I am showing a simple test case to illustrate the conditional compare
difference I am seeing in the SPEC CPU2006 benchmarks.

char a,b;
int i;
int f( int j)
{
  if ( (i >= a) ? (j <= a) : 1)
    return 1;
  else
    return 0;
}

GCC FSF 4.9
-----------

        movw    r2, #:lower16:a
        movw    r3, #:lower16:i
        movt    r2, #:upper16:a
        movt    r3, #:upper16:i
        ldrb    r2, [r2]        @ zero_extendqisi2
        ldr     r3, [r3]
        cmp     r2, r3
        it      le
        cmple   r2, r0          <== conditional compare instruction generated.
        ite     ge
        movge   r0, #1
        movlt   r0, #0
        bx      lr


With patch
---------

        movw    r2, #:lower16:a
        movw    r3, #:lower16:i
        movt    r2, #:upper16:a
        movt    r3, #:upper16:i
        ldr     r3, [r3]
        ldrb    r2, [r2]        @ zero_extendqisi2
        cmp     r2, r3
        ite     le
        movle   r3, #0 <== Suboptimal moves generated.
        movgt   r3, #1 <==
        cmp     r2, r0
        ite     lt
        movlt   r0, r3
        orrge   r0, r3, #1<==
        bx      lr

The following benchmarks show the largest number of conditional compare
pattern differences, and also code scheduling changes and different
physical register numbers in the generated code.
416.gamess/
434.zeusmp/
400.perlbench
403.gcc/
445.gobmk
483.xalancbmk/
401.bzip2
433.milc/
435.gromacs

Assembly file differences are seen in the benchmarks below, but they are not very large.

436.cactusADM/
447.dealII
454.calculix
456.hmmer
453.povray/
465.tonto/
471.omnetpp
459.GemsFDTD
437.leslie3d
464.h264ref


Based on the results, I am of the opinion that this patch would cause
some performance hit if we upstream it to trunk or GCC 4.9.

Another thing to note is that this bug occurs only when we compile
with -fno-tree-dce and -mno-lra on GCC 4.9.

Also, with current FSF trunk and GCC 4.9, LRA is the default for ARM,
and it does not have this bug.

My opinion is that the reporter can use this fix to build the package
that exposed the bug, if it has to be built with -fno-tree-dce.

I am not sure we can do anything at the machine description level to
prevent this bug. I am also not sure it is easy to fix in the reload
pass.

Do you have any suggestions?

regards,
Venkat.

>
> This bug does not occur when LRA is enabled. In 4.9 FSF and trunk, the
>  LRA pass is enabled by default now .
> May be too conservative, but is there a way to enable this pattern
> when we have LRA pass and prevent it we have old reload pass?
>
> regards,
> Venkat.
>
>
>
>>
>>>
>>> Regression tested with gcc 4.9 branch since in trunk this bug is
>>> masked revision 209897.
>>>
>>> Please provide your suggestion on this patch
>>>
>>> regards,
>>> Venkat.


* Re: [RFC][ARM]: Fix reload spill failure (PR 60617)
  2014-07-07 11:48     ` Venkataramanan Kumar
@ 2014-09-26 14:26       ` Christophe Lyon
  0 siblings, 0 replies; 6+ messages in thread
From: Christophe Lyon @ 2014-09-26 14:26 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: gcc-patches, Patch Tracking, Maxim Kuvyrkov, doko

Ramana,

On 7 July 2014 13:48, Venkataramanan Kumar
<venkataramanan.kumar@linaro.org> wrote:
> Hi Ramana/Maxim,
>
>
> On 18 June 2014 16:05, Venkataramanan Kumar
> <venkataramanan.kumar@linaro.org> wrote:
>> Hi Ramana,
>>
>> On 18 June 2014 15:29, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>>> On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar
>>> <venkataramanan.kumar@linaro.org> wrote:
>>>> Hi Maintainers,
>>>>
>>>> This patch fixes the PR 60617 that occurs when we turn on reload pass
>>>> in thumb2 mode.
>>>>
>>>> It occurs for the pattern "*ior_scc_scc" that gets generated for the 3
>>>> argument of the below function call.
>>>>
>>>> JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));
>>>>
>>>>
>>>> (----snip---)
>>>> (insn 634 633 635 27 (parallel [
>>>>             (set (reg:SI 3 r3)
>>>>                 (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>>>> r5 is registers gets assigned
>>>>                         (reg/v:SI 112 [ op2 ]))
>>>>                     (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>>>>                         (reg/v:SI 111 [ op1 ]))))
>>>>             (clobber (reg:CC 100 cc))
>>>>         ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300
>>>> {*ior_scc_scc
>>>> (----snip---)
>>>>
>>>> The issue here is that the above pattern demands 5 registers (LO_REGS).
>>>>
>>>> But when we are in reload, registers r0 is used for pointer to the
>>>> class, r1 and r2 for first and second argument. r7 is used for stack
>>>> pointer.
>>>>
>>>> So we are left with r3,r4,r5 and r6. But the above patterns needs five
>>>> LO_REGS. Hence we get spill failure when processing the last register
>>>> operand in that pattern,
>>>>
>>>> In ARM port,  TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1 and
>>>> for thumb 2 mode there is mention of using LO_REG in the comment as
>>>> below.
>>>>
>>>> "Care should be taken to avoid adding thumb-2 patterns that require
>>>> many low registers"
>>>>
>>>> So conservative fix is not to allow this pattern for Thumb-2 mode.
>>>
>>> I don't have an additional solution off the top of my head and
>>> probably need to go do some digging.
>>>
>>> It sounds like the conservative fix but what's the impact of doing so
>>> ? Have you measured that in terms of performance or code size on a
>>> range of benchmarks ?
>>>
>>>>
>>
>> I haven't done any benchmark testing. I will try and run some
>> benchmarks with my patch.
>>
>>
>>>> I allowed these pattern for Thumb2 when we have constant operands for
>>>> comparison. That makes the target tests arm/thum2-cond-cmp-1.c to
>>>> thum2-cond-cmp-4.c pass.
>>>
>>> That sounds fine and fair - no trouble there.
>>>
>>> My concern is with removing the register alternatives and loosing the
>>> ability to trigger conditional compares on 4.9 and trunk for Thumb1
>>> till the time the "new" conditional compare work makes it in.
>>>
>>> Ramana
>
> I tested this conservative fix with Coremark (ran it on chromebook)and
> SPEC cpu2006 (cross compiled and compared assembly differences).
>
> With Coremark there are no performance issues. In fact there no
> assembly differences with CPU flags for A15 and A9.
>
> For SPEC2006 I cross compiled and compared assembly differences with
> and without patch (-O3 -fno-common).
> I have not run these bechmarks.
>
> There are major code differences and are due to conditional compare
> instructions not getting generated as you expected. This also results
> in different physical register numbers assigned in the generated code
> and also there are code scheduling differences when comparing it with
> base.
>
>
> I am showing a simple test case to showcase the conditional compare
> difference I am seeing in SPEC2006 benchmarks.
>
> char a,b;
> int i;
> int f( int j)
> {
>   if ( (i >= a) ? (j <= a) : 1)
>     return 1;
>   else
>     return 0;
> }
>
> GCC FSF 4.9
> -----------
>
>         movw    r2, #:lower16:a
>         movw    r3, #:lower16:i
>         movt    r2, #:upper16:a
>         movt    r3, #:upper16:i
>         ldrb    r2, [r2]        @ zero_extendqisi2
>         ldr     r3, [r3]
>         cmp     r2, r3
>         it      le
>         cmple   r2, r0          <== conditional compare instrucion generated.
>         ite     ge
>         movge   r0, #1
>         movlt   r0, #0
>         bx      lr
>
>
> With patch
> ---------
>
>         movw    r2, #:lower16:a
>         movw    r3, #:lower16:i
>         movt    r2, #:upper16:a
>         movt    r3, #:upper16:i
>         ldr     r3, [r3]
>         ldrb    r2, [r2]        @ zero_extendqisi2
>         cmp     r2, r3
>         ite     le
>         movle   r3, #0 <== Unoptimal moves generated.
>         movgt   r3, #1 <==
>         cmp     r2, r0
>         ite     lt
>         movlt   r0, r3
>         orrge   r0, r3, #1<==
>         bx      lr
>
> The following benchmarks have maximum number of conditional compare
> pattern differences and also code scheduling changes/different
> physical register numbers in generated code.
> 416.gamess/
> 434.zeusmp/
> 400.perlbench
> 403.gcc/
> 445.gobmk
> 483.xalancbmk/
> 401.bzip2
> 433.milc/
> 435.gromacs
>
> Assembly files differences seen in the below benchmarks buy are not very large.
>
> 436.cactusADM/
> 447.dealII
> 454.calculix
> 456.hmmer
> 453.povray/
> 465.tonto/
> 471.omnetpp
> 459.GemsFDTD
> 437.leslie3d
> 464.h264ref
>
>
> Based on the results, I am of the opinion that this patch would cause
> some performance hit if we have it up streamed on trunk or GCC 4.9.
>
> Another thing to be noted is that this bug occurs when only when we
> compile with -fno-tree-dce and -mno-lra on GCC 4.9.
>
> Also with current FSF trunk and GCC 4.9,  we have LRA as default for
> ARM, which does not have this bug.
>
> My opinion is that reporter can use this fix to build the package that
> caused it and if it has to be built with -fno-tree-dce.
>
> I am not sure if we can do anything in machine descriptions level to
> prevent this bug. Also in reload pass I am not sure if it is easy to
> fix.
>
> Do you have any suggestions?
>
So this is a 4.8-only bug, or a 4.9/trunk bug only when LRA is
disabled.
Ramana, do you think it's worth fixing it in 4.8, at the price of a
likely performance degradation, or leaving it open as a "known bug" on
that branch?

Thanks,

Christophe.


> regards,
> Venkat.
>
>>
>> This bug does not occur when LRA is enabled. In 4.9 FSF and trunk, the
>>  LRA pass is enabled by default now .
>> May be too conservative, but is there a way to enable this pattern
>> when we have LRA pass and prevent it we have old reload pass?
>>
>> regards,
>> Venkat.
>>
>>
>>
>>>
>>>>
>>>> Regression tested with gcc 4.9 branch since in trunk this bug is
>>>> masked revision 209897.
>>>>
>>>> Please provide your suggestion on this patch
>>>>
>>>> regards,
>>>> Venkat.

