public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [patch passes.def]: Fix regression on ARM PR/61608
@ 2014-06-25 13:35 Kai Tietz
  2014-06-25 13:56 ` Richard Biener
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Kai Tietz @ 2014-06-25 13:35 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Henderson, Jeff Law

Hello,

so there seems to be a fallout caused by moving peephole2 pass. See PR/61608.
So we need indeed 2 peephole2 passes.

ChangeLog

2014-06-25  Kai Tietz  <ktietz@redhat.com>

    PR rtl-optimization/61608
    * passes.def (peephole2): Readd peephole2 pass
    before if-after-reload pass.

Tested for arm*-none-*, i686-w64-cygwin, x86_64-unknown-linux-gnu.  Ok
for apply?

Regards,
Kai

Index: passes.def
===================================================================
--- passes.def    (Revision 211971)
+++ passes.def    (Arbeitskopie)
@@ -396,6 +396,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_rtl_dse2);
       NEXT_PASS (pass_stack_adjustments);
       NEXT_PASS (pass_jump2);
+      NEXT_PASS (pass_peephole2);
       NEXT_PASS (pass_if_after_reload);
       NEXT_PASS (pass_regrename);
       NEXT_PASS (pass_cprop_hardreg);
@@ -407,8 +408,7 @@ along with GCC; see the file COPYING3.  If not see
          We have a single indirect branch in the entire function
          before duplicate-compute-gotos pass.  This vastly reduces
          the size of the CFG.
-         For preventing to run peephole2 pass twice, its run after
-         the jump2 got removed.  */
+         We need to run peephole2 pass twice. See PR/61608.  */
       NEXT_PASS (pass_peephole2);
       NEXT_PASS (pass_branch_target_load_optimize2);
       NEXT_PASS (pass_leaf_regs);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch passes.def]: Fix regression on ARM PR/61608
  2014-06-25 13:35 [patch passes.def]: Fix regression on ARM PR/61608 Kai Tietz
@ 2014-06-25 13:56 ` Richard Biener
  2014-06-25 14:04 ` Jeff Law
  2014-06-25 18:12 ` Richard Henderson
  2 siblings, 0 replies; 13+ messages in thread
From: Richard Biener @ 2014-06-25 13:56 UTC (permalink / raw)
  To: Kai Tietz; +Cc: GCC Patches, Richard Henderson, Jeff Law

On Wed, Jun 25, 2014 at 3:35 PM, Kai Tietz <ktietz70@googlemail.com> wrote:
> Hello,
>
> so there seems to be a fallout caused by moving peephole2 pass. See PR/61608.
> So we need indeed 2 peephole2 passes.
>
> ChangeLog
>
> 2014-06-25  Kai Tietz  <ktietz@redhat.com>
>
>     PR rtl-optimization/61608
>     * passes.def (peephole2): Readd peephole2 pass
>     before if-after-reload pass.
>
> Tested for arm*-none-*, i686-w64-cygwin, x86_64-unknown-linux-gnu.  Ok
> for apply?

Uh, every time I try to improve compile-time by removing passes somebody
else adds other passes back.

Please try to avoid this.

Thanks,
Richard.

> Regards,
> Kai
>
> Index: passes.def
> ===================================================================
> --- passes.def    (Revision 211971)
> +++ passes.def    (Arbeitskopie)
> @@ -396,6 +396,7 @@ along with GCC; see the file COPYING3.  If not see
>        NEXT_PASS (pass_rtl_dse2);
>        NEXT_PASS (pass_stack_adjustments);
>        NEXT_PASS (pass_jump2);
> +      NEXT_PASS (pass_peephole2);
>        NEXT_PASS (pass_if_after_reload);
>        NEXT_PASS (pass_regrename);
>        NEXT_PASS (pass_cprop_hardreg);
> @@ -407,8 +408,7 @@ along with GCC; see the file COPYING3.  If not see
>           We have a single indirect branch in the entire function
>           before duplicate-compute-gotos pass.  This vastly reduces
>           the size of the CFG.
> -         For preventing to run peephole2 pass twice, its run after
> -         the jump2 got removed.  */
> +         We need to run peephole2 pass twice. See PR/61608.  */
>        NEXT_PASS (pass_peephole2);
>        NEXT_PASS (pass_branch_target_load_optimize2);
>        NEXT_PASS (pass_leaf_regs);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch passes.def]: Fix regression on ARM PR/61608
  2014-06-25 13:35 [patch passes.def]: Fix regression on ARM PR/61608 Kai Tietz
  2014-06-25 13:56 ` Richard Biener
@ 2014-06-25 14:04 ` Jeff Law
  2014-06-25 15:04   ` Kai Tietz
  2014-06-25 18:12 ` Richard Henderson
  2 siblings, 1 reply; 13+ messages in thread
From: Jeff Law @ 2014-06-25 14:04 UTC (permalink / raw)
  To: Kai Tietz, GCC Patches; +Cc: Richard Henderson

On 06/25/14 07:35, Kai Tietz wrote:
> Hello,
>
> so there seems to be a fallout caused by moving peephole2 pass. See PR/61608.
> So we need indeed 2 peephole2 passes.
>
> ChangeLog
>
> 2014-06-25  Kai Tietz  <ktietz@redhat.com>
>
>      PR rtl-optimization/61608
>      * passes.def (peephole2): Readd peephole2 pass
>      before if-after-reload pass.
So why is the peephole not working in its current location?

Jeff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch passes.def]: Fix regression on ARM PR/61608
  2014-06-25 14:04 ` Jeff Law
@ 2014-06-25 15:04   ` Kai Tietz
  2014-06-25 15:28     ` Jeff Law
  0 siblings, 1 reply; 13+ messages in thread
From: Kai Tietz @ 2014-06-25 15:04 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches, Richard Henderson

2014-06-25 16:04 GMT+02:00 Jeff Law <law@redhat.com>:
> So why is the peephole not working in its current location?
>
> Jeff

Hi Jeff,

that is what I read out of dumps:

If peephole2 is executed early we see following pattern transformation:

(insn 12 11 13 2 (set (reg:CC_NOOV 100 cc)
        (compare:CC_NOOV (zero_extract:SI (reg:SI 4 r4 [orig:110 D.1414 ] [110]
)
                (const_int 1 [0x1])
                (const_int 2 [0x2]))
            (const_int 0 [0]))) arm_epilog-1.c:11 84
{*zeroextractsi_compare0_scratch}
     (expr_list:REG_DEAD (reg:SI 4 r4 [orig:110 D.1414 ] [110])
        (nil)))
(jump_insn 13 12 14 2 (set (pc)
        (if_then_else (eq (reg:CC_NOOV 100 cc)
                 (const_int 0 [0]))
             (label_ref:SI 16)
            (pc))) arm_epilog-1.c:11 236 {arm_cond_branch}
     (expr_list:REG_DEAD (reg:CC_NOOV 100 cc)
        (int_list:REG_BR_PROB 3900 (nil)))
  -> 16)

which gets replaced to:

(insn 62 11 63 2 (parallel [
            (set (reg:CC_NOOV 100 cc)
                (compare:CC_NOOV (ashift:SI (reg:SI 4 r4 [orig:110
D.1414 ] [110])
                        (const_int 29 [0x1d]))
                    (const_int 0 [0])))
            (clobber (reg:SI 4 r4))
        ]) arm_epilog-1.c:11 -1
     (nil))
(jump_insn 63 62 14 2 (set (pc)
        (if_then_else (ge (reg:CC_NOOV 100 cc)
                 (const_int 0 [0]))
             (label_ref:SI 16)
            (pc))) arm_epilog-1.c:11 -1
     (nil)
  -> 16)

If we run peephole2 pass late we see instead the following pattern,
which isn't handled by peephole2 pass anymore.

(insn 12 11 15 2 (set (reg:CC_NOOV 100 cc)
        (compare:CC_NOOV (zero_extract:SI (reg:SI 4 r4 [orig:110 D.1414 ] [110])
                (const_int 1 [0x1])
                (const_int 2 [0x2]))
            (const_int 0 [0]))) arm_epilog-1_.c:11 84 {*zeroextractsi_compare0_s
cratch}
     (expr_list:REG_DEAD (reg:SI 4 r4 [orig:110 D.1414 ] [110])
        (nil)))
(insn 15 12 22 2 (cond_exec (ne (reg:CC_NOOV 100 cc)
            (const_int 0 [0]))
        (set (reg/v:SI 2 r2 [orig:115 c ] [115])
            (plus:SI (reg/v:SI 2 r2 [orig:115 c ] [115])
                (const_int 1 [0x1])))) arm_epilog-1_.c:11 3229 {*p *arm_addsi3}
     (expr_list:REG_DEAD (reg:CC_NOOV 100 cc)
        (nil)))

So this issue might be solvable by extending ARM's peephole2 patterns?
An ARM maintainer might be able to tell.

Kai

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch passes.def]: Fix regression on ARM PR/61608
  2014-06-25 15:04   ` Kai Tietz
@ 2014-06-25 15:28     ` Jeff Law
  2014-06-25 15:47       ` Ramana Radhakrishnan
  2014-06-25 15:52       ` Richard Henderson
  0 siblings, 2 replies; 13+ messages in thread
From: Jeff Law @ 2014-06-25 15:28 UTC (permalink / raw)
  To: Kai Tietz; +Cc: GCC Patches, Richard Henderson

On 06/25/14 09:04, Kai Tietz wrote:
> 2014-06-25 16:04 GMT+02:00 Jeff Law <law@redhat.com>:
>> So why is the peephole not working in its current location?
>>
>> Jeff
>
> Hi Jeff,
>
> that is what I read out of dumps:
>
> If peephole2 is executed early we see following pattern transformation:
[ ... ]
Ask an ARM maintainer if the new code is actually better than the old code.

It appears that with the peep2 pass moved that we actually if-convert 
the fall-thru path of the conditional and eliminate the conditional. 
Which, on the surface seems like a good thing.  It may be the case that 
we need to twiddle the test.  Not sure yet.


Jeff


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch passes.def]: Fix regression on ARM PR/61608
  2014-06-25 15:28     ` Jeff Law
@ 2014-06-25 15:47       ` Ramana Radhakrishnan
  2014-06-25 15:52       ` Richard Henderson
  1 sibling, 0 replies; 13+ messages in thread
From: Ramana Radhakrishnan @ 2014-06-25 15:47 UTC (permalink / raw)
  To: Jeff Law, Kai Tietz; +Cc: GCC Patches, Richard Henderson

[Apologies about the duplicates to folks - resending to make this hit 
the lists]

On 25/06/14 16:28, Jeff Law wrote:
> On 06/25/14 09:04, Kai Tietz wrote:
>> 2014-06-25 16:04 GMT+02:00 Jeff Law <law@redhat.com>:
>>> So why is the peephole not working in its current location?
>>>
>>> Jeff
>>
>> Hi Jeff,
>>
>> that is what I read out of dumps:
>>
>> If peephole2 is executed early we see following pattern transformation:
> [ ... ]
> Ask an ARM maintainer if the new code is actually better than the old code.
>
> It appears that with the peep2 pass moved that we actually if-convert
> the fall-thru path of the conditional and eliminate the conditional.
> Which, on the surface seems like a good thing.  It may be the case that
> we need to twiddle the test.  Not sure yet.


No, we don't need to twiddle the test. The test is good as it is today
as we caught a potential size regression. (Prima-facie the test itself
doesn't show an actual size regression because of padding but you can
see the replacement of a 2 byte instruction with a 4 byte instruction)

These peepholes that were in there gave 0.1% in CSiBe for Thumb2 as per
the original patch 
(https://gcc.gnu.org/ml/gcc-patches/2010-06/msg02518.html). The 
if-conversion happened anyway afterwards regardless of where the pattern 
was but the key here was the conversion of the 4 byte instruction to a 2 
byte instruction.

Looks like the all the peephole2's that have if_then_else's (5 patterns
today) also need to develop a cond_exec variant where possible in the
ARM backend.

It seems unfortunate that the backend has to handle each of the possible 
pattern classes that may be conditionalized by the cond-exec pass and we 
potentially maintain a list of patterns in sync with anything that can 
be conditionalized.

At first glance this looks ugly to "work around" in the backend.

regards
Ramana

>
>
> Jeff
>
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch passes.def]: Fix regression on ARM PR/61608
  2014-06-25 15:28     ` Jeff Law
  2014-06-25 15:47       ` Ramana Radhakrishnan
@ 2014-06-25 15:52       ` Richard Henderson
  2014-06-25 16:02         ` Kai Tietz
  2014-06-25 17:17         ` Jeff Law
  1 sibling, 2 replies; 13+ messages in thread
From: Richard Henderson @ 2014-06-25 15:52 UTC (permalink / raw)
  To: Jeff Law, Kai Tietz; +Cc: GCC Patches

On 06/25/2014 08:28 AM, Jeff Law wrote:
> Ask an ARM maintainer if the new code is actually better than the old code.

It isn't.

> It appears that with the peep2 pass moved that we actually if-convert the
> fall-thru path of the conditional and eliminate the conditional. Which, on the
> surface seems like a good thing.  It may be the case that we need to twiddle
> the test.  Not sure yet.

Looking at the final code in the pr, it looks like we if-convert both ways,
just with changed condition on the test.

It looks like there are at least 3 peepholes in the arm backend that match
compare+branch with a scratch register that are affected by this.  I don't
think it's reasonable to expect a peephole to match compare + n cond_exec
insns, so running peep2 before if-conversion would seem to be called for.

Kai, why does your indirect jump peephole require pass_reorder_blocks?  It
seems to me that instead of moving peephole2 down, we could move
duplicate_computed_gotos up.


r~

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch passes.def]: Fix regression on ARM PR/61608
  2014-06-25 15:52       ` Richard Henderson
@ 2014-06-25 16:02         ` Kai Tietz
  2014-06-25 17:15           ` Jeff Law
  2014-06-25 17:17         ` Jeff Law
  1 sibling, 1 reply; 13+ messages in thread
From: Kai Tietz @ 2014-06-25 16:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jeff Law, GCC Patches

2014-06-25 17:50 GMT+02:00 Richard Henderson <rth@redhat.com>:
> On 06/25/2014 08:28 AM, Jeff Law wrote:
>> Ask an ARM maintainer if the new code is actually better than the old code.
>
> It isn't.
>
>> It appears that with the peep2 pass moved that we actually if-convert the
>> fall-thru path of the conditional and eliminate the conditional. Which, on the
>> surface seems like a good thing.  It may be the case that we need to twiddle
>> the test.  Not sure yet.
>
> Looking at the final code in the pr, it looks like we if-convert both ways,
> just with changed condition on the test.
>
> It looks like there are at least 3 peepholes in the arm backend that match
> compare+branch with a scratch register that are affected by this.  I don't
> think it's reasonable to expect a peephole to match compare + n cond_exec
> insns, so running peep2 before if-conversion would seem to be called for.
>
> Kai, why does your indirect jump peephole require pass_reorder_blocks?  It
> seems to me that instead of moving peephole2 down, we could move
> duplicate_computed_gotos up.
>
>
> r~

We need the peephole2 pass after reorder_blocks due we move in that
pass the jump

...
(code_label 54 52 55 5 8 "" [0 uses])
(note 55 54 56 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(jump_insn 56 55 57 5 (set (pc)
        (reg/f:DI 0 ax [orig:83 D.3469 ] [83])) 638 {*indirect_jump}
     (expr_list:REG_DEAD (reg/f:DI 0 ax [orig:83 D.3469 ] [83])
        (nil)))
(barrier 57 56 58)
....

into each bb, so it can be resolved to an indirect jump on memory.

Testcase to reproduce that is:

// PR 51840/
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

enum {
  S_atop   = 0,
  S_atop_f = 1,
  S_atop_t = 2,
  S_limit  = 3
};

enum {
  I_a_dec        = 0,
  I_a_non_zero_p = 1,
  I_jmp_if_true  = 2,
  I_exit         = 3,
  I_limit        = 4
};


typedef struct {
  uint64_t a0;
} vm_state_t;


__attribute__((noinline, noclone)) void exec_code(vm_state_t *state,
uint8_t *code) {
  static void (* volatile const vm_dispatch[S_limit][I_limit]) = {
    //dispatch for [S_atop = 0][..] with four unique destination labels
    {&&atop__a_dec,
     &&atop__a_non_zero_p,
     &&fixme,
     &&atop__exit},

    //dispatch for [S_atop_f = 1][..] with two unique destination labels
    {&&fixme,
     &&fixme,
     &&atop_f__jmp_if_true,
     &&fixme},

    //dispatch for [S_atop_t = 2][..] with two unique destination labels
    {&&fixme,
     &&fixme,
     &&atop_t__jmp_if_true,
     &&fixme}
  };

  printf("atop__a_dec:         %p\n", &&atop__a_dec);
  printf("atop__a_non_zero_p:  %p\n", &&atop__a_non_zero_p);
  printf("atop__exit:          %p\n", &&atop__exit);
  printf("atop_f__jmp_if_true: %p\n", &&atop_f__jmp_if_true);
  printf("atop_t__jmp_if_true: %p\n", &&atop_t__jmp_if_true);
  printf("fixme:               %p\n", &&fixme);

  volatile uint64_t atop = state->a0;
  volatile int64_t inst_offset=0;

  goto *(vm_dispatch[S_atop][code[inst_offset]]);

 atop__a_dec: {
    printf("atop = %ld\n", atop);
    --atop;
    volatile uint64_t next_inst = code[inst_offset + 1];
    inst_offset += 1;
    goto *(vm_dispatch[S_atop][next_inst]);
  }

 atop__a_non_zero_p: {
    volatile uint64_t next_inst = code[inst_offset + 1];
    inst_offset += 1;
    if (atop != 0) {
      goto *(vm_dispatch[S_atop_t][next_inst]);
    } else {
      goto *(vm_dispatch[S_atop_f][next_inst]);
    }
  }

 atop_f__jmp_if_true: {
    volatile uint64_t next_inst = code[inst_offset + 2];
    inst_offset += 2;
    goto *(vm_dispatch[S_atop][next_inst]);
  }

 atop_t__jmp_if_true: {
    int64_t offset = (int8_t) code[inst_offset + 1];
    uint64_t next_inst = code[inst_offset + offset];
    inst_offset += offset;
    goto *(vm_dispatch[S_atop][next_inst]);
  }

 atop__exit: {
    state->a0 = atop;
    return;
  }

 fixme: {
    printf("Dispatch logic ERROR\n");
    exit(EXIT_FAILURE);
  }
}

int main(void)
{
  vm_state_t state;
  state.a0     = 10;
  uint8_t code[] = {I_a_dec,           //decrement top of stack
            I_a_non_zero_p,    //true if top of stack is non-zero
            I_jmp_if_true, -2, //if true jump back to decrement
            I_exit};           //otherwise exit

  exec_code(&state, code);
  return EXIT_SUCCESS;
}

Regards,
Kai

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch passes.def]: Fix regression on ARM PR/61608
  2014-06-25 16:02         ` Kai Tietz
@ 2014-06-25 17:15           ` Jeff Law
  2014-06-25 17:53             ` Kai Tietz
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Law @ 2014-06-25 17:15 UTC (permalink / raw)
  To: Kai Tietz, Richard Henderson; +Cc: GCC Patches

On 06/25/14 10:02, Kai Tietz wrote:
> 2014-06-25 17:50 GMT+02:00 Richard Henderson <rth@redhat.com>:
>> On 06/25/2014 08:28 AM, Jeff Law wrote:
>>> Ask an ARM maintainer if the new code is actually better than the old code.
>>
>> It isn't.
>>
>>> It appears that with the peep2 pass moved that we actually if-convert the
>>> fall-thru path of the conditional and eliminate the conditional. Which, on the
>>> surface seems like a good thing.  It may be the case that we need to twiddle
>>> the test.  Not sure yet.
>>
>> Looking at the final code in the pr, it looks like we if-convert both ways,
>> just with changed condition on the test.
>>
>> It looks like there are at least 3 peepholes in the arm backend that match
>> compare+branch with a scratch register that are affected by this.  I don't
>> think it's reasonable to expect a peephole to match compare + n cond_exec
>> insns, so running peep2 before if-conversion would seem to be called for.
>>
>> Kai, why does your indirect jump peephole require pass_reorder_blocks?  It
>> seems to me that instead of moving peephole2 down, we could move
>> duplicate_computed_gotos up.
>>
>>
>> r~
>
> We need the peephole2 pass after reorder_blocks due we move in that
> pass the jump
>
> ...
> (code_label 54 52 55 5 8 "" [0 uses])
> (note 55 54 56 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
> (jump_insn 56 55 57 5 (set (pc)
>          (reg/f:DI 0 ax [orig:83 D.3469 ] [83])) 638 {*indirect_jump}
>       (expr_list:REG_DEAD (reg/f:DI 0 ax [orig:83 D.3469 ] [83])
>          (nil)))
> (barrier 57 56 58)
> ....
>
> into each bb, so it can be resolved to an indirect jump on memory.
But I think Richard's question was can we move up 
duplicate_computed_gotos since that's the code which I'd expect to make 
this transformation.  Then you run peep2 immediately after 
duplicate_computed_gotos.


Jeff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch passes.def]: Fix regression on ARM PR/61608
  2014-06-25 15:52       ` Richard Henderson
  2014-06-25 16:02         ` Kai Tietz
@ 2014-06-25 17:17         ` Jeff Law
  1 sibling, 0 replies; 13+ messages in thread
From: Jeff Law @ 2014-06-25 17:17 UTC (permalink / raw)
  To: Richard Henderson, Kai Tietz; +Cc: GCC Patches

On 06/25/14 09:50, Richard Henderson wrote:
> On 06/25/2014 08:28 AM, Jeff Law wrote:
>> Ask an ARM maintainer if the new code is actually better than the old code.
>
> It isn't.
>
>> It appears that with the peep2 pass moved that we actually if-convert the
>> fall-thru path of the conditional and eliminate the conditional. Which, on the
>> surface seems like a good thing.  It may be the case that we need to twiddle
>> the test.  Not sure yet.
>
> Looking at the final code in the pr, it looks like we if-convert both ways,
> just with changed condition on the test.
Yea, I came to the same conclusion when I looked back at the code in the 
BZ.   I was too focused on Kai's message and the RTL contained within 
when I wrote my comment.

>
> It looks like there are at least 3 peepholes in the arm backend that match
> compare+branch with a scratch register that are affected by this.  I don't
> think it's reasonable to expect a peephole to match compare + n cond_exec
> insns, so running peep2 before if-conversion would seem to be called for.
Agreed.

jeff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch passes.def]: Fix regression on ARM PR/61608
  2014-06-25 17:15           ` Jeff Law
@ 2014-06-25 17:53             ` Kai Tietz
  0 siblings, 0 replies; 13+ messages in thread
From: Kai Tietz @ 2014-06-25 17:53 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Henderson, GCC Patches

2014-06-25 19:15 GMT+02:00 Jeff Law <law@redhat.com>:
> On 06/25/14 10:02, Kai Tietz wrote:
>>
>> 2014-06-25 17:50 GMT+02:00 Richard Henderson <rth@redhat.com>:
>>>
>>> On 06/25/2014 08:28 AM, Jeff Law wrote:
>>>>
>>>> Ask an ARM maintainer if the new code is actually better than the old
>>>> code.
>>>
>>>
>>> It isn't.
>>>
>>>> It appears that with the peep2 pass moved that we actually if-convert
>>>> the
>>>> fall-thru path of the conditional and eliminate the conditional. Which,
>>>> on the
>>>> surface seems like a good thing.  It may be the case that we need to
>>>> twiddle
>>>> the test.  Not sure yet.
>>>
>>>
>>> Looking at the final code in the pr, it looks like we if-convert both
>>> ways,
>>> just with changed condition on the test.
>>>
>>> It looks like there are at least 3 peepholes in the arm backend that
>>> match
>>> compare+branch with a scratch register that are affected by this.  I
>>> don't
>>> think it's reasonable to expect a peephole to match compare + n cond_exec
>>> insns, so running peep2 before if-conversion would seem to be called for.
>>>
>>> Kai, why does your indirect jump peephole require pass_reorder_blocks?
>>> It
>>> seems to me that instead of moving peephole2 down, we could move
>>> duplicate_computed_gotos up.
>>>
>>>
>>> r~
>>
>>
>> We need the peephole2 pass after reorder_blocks due we move in that
>> pass the jump
>>
>> ...
>> (code_label 54 52 55 5 8 "" [0 uses])
>> (note 55 54 56 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
>> (jump_insn 56 55 57 5 (set (pc)
>>          (reg/f:DI 0 ax [orig:83 D.3469 ] [83])) 638 {*indirect_jump}
>>       (expr_list:REG_DEAD (reg/f:DI 0 ax [orig:83 D.3469 ] [83])
>>          (nil)))
>> (barrier 57 56 58)
>> ....
>>
>> into each bb, so it can be resolved to an indirect jump on memory.
>
> But I think Richard's question was can we move up duplicate_computed_gotos
> since that's the code which I'd expect to make this transformation.  Then
> you run peep2 immediately after duplicate_computed_gotos.
>
>
> Jeff
>

The pass of question on ARM isn't the duplicate_compute_gotos pass.
It is the if-after-reload pass.
I tested it by moving it.

Kai

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch passes.def]: Fix regression on ARM PR/61608
  2014-06-25 13:35 [patch passes.def]: Fix regression on ARM PR/61608 Kai Tietz
  2014-06-25 13:56 ` Richard Biener
  2014-06-25 14:04 ` Jeff Law
@ 2014-06-25 18:12 ` Richard Henderson
  2014-06-25 18:29   ` Kai Tietz
  2 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2014-06-25 18:12 UTC (permalink / raw)
  To: Kai Tietz, GCC Patches; +Cc: Jeff Law

[-- Attachment #1: Type: text/plain, Size: 337 bytes --]

On 06/25/2014 06:35 AM, Kai Tietz wrote:
> Hello,
> 
> so there seems to be a fallout caused by moving peephole2 pass. See PR/61608.
> So we need indeed 2 peephole2 passes.

We don't need a second peephole pass.  Please try this.

I think there's room for cleanup here, depending on when we leave cfglayout
mode for the last time.


r~


[-- Attachment #2: z --]
[-- Type: text/plain, Size: 2272 bytes --]

	* bb-reorder.c (pass_duplicate_computed_gotos::execute): Cleanup
	the cfg if there were any changes.
	* passes.def: Revert move of peephole2 after reorder_blocks;
	move duplicate_computed_gotos before peephole2.

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 61b0cab..cd68fee 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2520,13 +2520,20 @@ pass_duplicate_computed_gotos::execute (function *fun)
       changed = true;
     }
 
-done:
-  /* Duplicating blocks above will redirect edges and may cause hot blocks
-     previously reached by both hot and cold blocks to become dominated only
-     by cold blocks.  */
+ done:
   if (changed)
-    fixup_partitions ();
-  cfg_layout_finalize ();
+    {
+      /* Duplicating blocks above will redirect edges and may cause hot
+	 blocks previously reached by both hot and cold blocks to become
+	 dominated only by cold blocks.  */
+      fixup_partitions ();
+
+      /* Merge the duplicated blocks into predecessors, when possible.  */
+      cfg_layout_finalize ();
+      cleanup_cfg (0);
+    }
+  else
+    cfg_layout_finalize ();
 
   BITMAP_FREE (candidates);
   return 0;
diff --git a/gcc/passes.def b/gcc/passes.def
index 280cf16..f13df6c 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -396,20 +396,13 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_rtl_dse2);
 	  NEXT_PASS (pass_stack_adjustments);
 	  NEXT_PASS (pass_jump2);
+	  NEXT_PASS (pass_duplicate_computed_gotos);
+	  NEXT_PASS (pass_peephole2);
 	  NEXT_PASS (pass_if_after_reload);
 	  NEXT_PASS (pass_regrename);
 	  NEXT_PASS (pass_cprop_hardreg);
 	  NEXT_PASS (pass_fast_rtl_dce);
-	  NEXT_PASS (pass_duplicate_computed_gotos);
 	  NEXT_PASS (pass_reorder_blocks);
-	  /* We need to run peephole2 pass after the duplicate-
-	     compute-gotos and the reorder-blocks pass (PR/39284).
-	     We have a single indirect branch in the entire function
-	     before duplicate-compute-gotos pass.  This vastly reduces
-	     the size of the CFG.
-	     For preventing to run peephole2 pass twice, its run after
-	     the jump2 got removed.  */
-	  NEXT_PASS (pass_peephole2);
 	  NEXT_PASS (pass_branch_target_load_optimize2);
 	  NEXT_PASS (pass_leaf_regs);
 	  NEXT_PASS (pass_split_before_sched2);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch passes.def]: Fix regression on ARM PR/61608
  2014-06-25 18:12 ` Richard Henderson
@ 2014-06-25 18:29   ` Kai Tietz
  0 siblings, 0 replies; 13+ messages in thread
From: Kai Tietz @ 2014-06-25 18:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: GCC Patches, Jeff Law

2014-06-25 20:12 GMT+02:00 Richard Henderson <rth@redhat.com>:
> On 06/25/2014 06:35 AM, Kai Tietz wrote:
>> Hello,
>>
>> so there seems to be a fallout caused by moving peephole2 pass. See PR/61608.
>> So we need indeed 2 peephole2 passes.
>
> We don't need a second peephole pass.  Please try this.
>
> I think there's room for cleanup here, depending on when we leave cfglayout
> mode for the last time.
>
>
> r~
>

My tests shown that it works for ARM and i386 archs. Doing right now a
bootstrap for x86_64

Kai

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-06-25 18:29 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-25 13:35 [patch passes.def]: Fix regression on ARM PR/61608 Kai Tietz
2014-06-25 13:56 ` Richard Biener
2014-06-25 14:04 ` Jeff Law
2014-06-25 15:04   ` Kai Tietz
2014-06-25 15:28     ` Jeff Law
2014-06-25 15:47       ` Ramana Radhakrishnan
2014-06-25 15:52       ` Richard Henderson
2014-06-25 16:02         ` Kai Tietz
2014-06-25 17:15           ` Jeff Law
2014-06-25 17:53             ` Kai Tietz
2014-06-25 17:17         ` Jeff Law
2014-06-25 18:12 ` Richard Henderson
2014-06-25 18:29   ` Kai Tietz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).