public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [patch i386]: Combine memory and indirect jump
@ 2014-06-11  8:32 Kai Tietz
  2014-06-11  9:58 ` Steven Bosscher
  0 siblings, 1 reply; 30+ messages in thread
From: Kai Tietz @ 2014-06-11  8:32 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Henderson

Hello,

this patch adds simple combining of indirect-jumps on memory-address.
This patch is pretty similar to sibcall-combing.
ChangeLog

2014-06-11  Kai Tietz  <ktietz@redhat.com>

    * config/i386/i386.md (peehole2): To combine
    indirect jump with memory.

2014-06-11  Kai Tietz  <ktietz@redhat.com>

    * gcc.target/i386/indjmp-1.c: New test.

Tested for i686-pc-cygwin, and x86_64-unknown-linux-gnu.  Ok for apply?

Regards,
Kai

Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md    (Revision 211409)
+++ config/i386/i386.md    (Arbeitskopie)
@@ -11471,6 +11471,40 @@
             (match_dup 3))
           (set (reg:SI SP_REG) (match_dup 4))])])

+;; Combining simple memory jump instruction
+
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+        (match_operand:SI 1 "memory_nox32_operand"))
+   (set (pc) (match_dup 0))]
+  "!TARGET_64BIT && peep2_reg_dead_p (2, operands[0])"
+  [(set (pc) (match_dup 1))])
+
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+        (match_operand:SI 1 "memory_nox32_operand"))
+   (unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
+   (set (pc) (match_dup 0))]
+  "!TARGET_64BIT && peep2_reg_dead_p (3, operands[0])"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
+   (set (pc) (match_dup 1))])
+
+(define_peephole2
+  [(set (match_operand:DI 0 "register_operand")
+        (match_operand:DI 1 "memory_nox32_operand"))
+   (set (pc) (match_dup 0))]
+  "TARGET_64BIT && peep2_reg_dead_p (2, operands[0])"
+  [(set (pc) (match_dup 1))])
+
+(define_peephole2
+  [(set (match_operand:DI 0 "register_operand")
+        (match_operand:DI 1 "memory_nox32_operand"))
+   (unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
+   (set (pc) (match_dup 0))]
+  "TARGET_64BIT && peep2_reg_dead_p (3, operands[0])"
+  [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
+   (set (pc) (match_dup 1))])
+
 ;; Call subroutine, returning value in operand 0

 (define_expand "call_value"
Index: testsuite/gcc.target/i386/indjmp-1.c
===================================================================
--- testsuite/gcc.target/i386/indjmp-1.c    (Revision 0)
+++ testsuite/gcc.target/i386/indjmp-1.c    (Arbeitskopie)
@@ -0,0 +1,23 @@
+/* { dg-do compile  { target ia32 } } */
+/* { dg-options "-O2" } */
+
+#define ADVANCE_AND_DISPATCH() goto *addresses[*pc++]
+
+void
+Interpret(const unsigned char *pc)
+{
+    static const void *const addresses[] = {
+      &&l0, &&l1, &&l2
+    };
+
+l0:
+    ADVANCE_AND_DISPATCH();
+
+l1:
+    ADVANCE_AND_DISPATCH();
+
+l2:
+    return;
+}
+
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*.%eax" } } */

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-11  8:32 [patch i386]: Combine memory and indirect jump Kai Tietz
@ 2014-06-11  9:58 ` Steven Bosscher
  2014-06-11 10:37   ` Kai Tietz
  0 siblings, 1 reply; 30+ messages in thread
From: Steven Bosscher @ 2014-06-11  9:58 UTC (permalink / raw)
  To: Kai Tietz; +Cc: GCC Patches, Richard Henderson

On Wed, Jun 11, 2014 at 10:32 AM, Kai Tietz wrote:
> this patch adds simple combining of indirect-jumps on memory-address.
> This patch is pretty similar to sibcall-combing.
> ChangeLog
>
> 2014-06-11  Kai Tietz  <ktietz@...>
>
>     * config/i386/i386.md (peehole2): To combine
>     indirect jump with memory.


Likely fixes part of PR39284, xf. https://gcc.gnu.org/PR39284#c12

Ciao!
Steven

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-11  9:58 ` Steven Bosscher
@ 2014-06-11 10:37   ` Kai Tietz
  2014-06-12 16:21     ` Kai Tietz
  0 siblings, 1 reply; 30+ messages in thread
From: Kai Tietz @ 2014-06-11 10:37 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: GCC Patches, Richard Henderson

2014-06-11 11:57 GMT+02:00 Steven Bosscher <stevenb.gcc@gmail.com>:
> On Wed, Jun 11, 2014 at 10:32 AM, Kai Tietz wrote:
>> this patch adds simple combining of indirect-jumps on memory-address.
>> This patch is pretty similar to sibcall-combing.
>> ChangeLog
>>
>> 2014-06-11  Kai Tietz  <ktietz@...>
>>
>>     * config/i386/i386.md (peehole2): To combine
>>     indirect jump with memory.
>
>
> Likely fixes part of PR39284, xf. https://gcc.gnu.org/PR39284#c12
>
> Ciao!
> Steven

Well, it fixes it just partial AFAICS.  Btw PR target/51840 is related
to the same issue too.

Latter problem shows much better the underlying issue of
jump-shortening. Issue is here that the jump to a jump (or better said
indirect jump) isn't shortened.

Regards,
Kai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-11 10:37   ` Kai Tietz
@ 2014-06-12 16:21     ` Kai Tietz
  2014-06-12 18:53       ` Segher Boessenkool
                         ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Kai Tietz @ 2014-06-12 16:21 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: GCC Patches, Richard Henderson

Hello,

I updated i386.md part of the patch. Initial patch included handling
of blockage, which is obviously superflous.  Additionally I merged
32-bit and 64-bit peephole2 versions by using mode-specifier W.

ChangeLog

2014-06-12  Kai Tietz  <ktietz@redhat.com>

    * config/i386/i386.md (peehole2): To combine
    indirect jump with memory.

2014-06-12  Kai Tietz  <ktietz@redhat.com>

    * gcc.target/i386/indjmp-1.c: New test.

Tested for i686-pc-cygwin, and x86_64-unknown-linux-gnu.  Ok for apply?

with addition of adding a second peephole2 pass after sched2 pass, I
was able to get some improvement for PR target/39284.  I think by this
addition we can close bug as fixed.
Additionally additional peephole2 pass shows better results for PR
target/51840 testcase with disabled ASM_GOTO, too.

2014-06-12  Kai Tietz  <ktietz@redhat.com>

    PR target/39284
    * passes.def (pass_peephole2): Add second peephole2
    run after sched2 pass.

Tested for i686-pc-cygwin, and x86_64-unknown-linux-gnu.  Ok for apply?

Regards,
Kai

Index: testsuite/gcc.target/i386/indjmp-1.c
===================================================================
--- testsuite/gcc.target/i386/indjmp-1.c    (Revision 0)
+++ testsuite/gcc.target/i386/indjmp-1.c    (Arbeitskopie)
@@ -0,0 +1,23 @@
+/* { dg-do compile  { target ia32 } } */
+/* { dg-options "-O2" } */
+
+#define ADVANCE_AND_DISPATCH() goto *addresses[*pc++]
+
+void
+Interpret(const unsigned char *pc)
+{
+    static const void *const addresses[] = {
+      &&l0, &&l1, &&l2
+    };
+
+l0:
+    ADVANCE_AND_DISPATCH();
+
+l1:
+    ADVANCE_AND_DISPATCH();
+
+l2:
+    return;
+}
+
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*.%eax" } } */
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md    (Revision 211489)
+++ config/i386/i386.md    (Arbeitskopie)
@@ -11471,6 +11471,15 @@
             (match_dup 3))
           (set (reg:SI SP_REG) (match_dup 4))])])

+;; Combining simple memory jump instruction
+
+(define_peephole2
+  [(set (match_operand:W 0 "register_operand")
+        (match_operand:W 1 "memory_nox32_operand"))
+   (set (pc) (match_dup 0))]
+  "peep2_reg_dead_p (2, operands[0])"
+  [(set (pc) (match_dup 1))])
+
 ;; Call subroutine, returning value in operand 0

 (define_expand "call_value"
Index: passes.def
===================================================================
--- passes.def    (Revision 211489)
+++ passes.def    (Arbeitskopie)
@@ -396,6 +396,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_leaf_regs);
       NEXT_PASS (pass_split_before_sched2);
       NEXT_PASS (pass_sched2);
+      NEXT_PASS (pass_peephole2);
       NEXT_PASS (pass_stack_regs);
       PUSH_INSERT_PASSES_WITHIN (pass_stack_regs)
           NEXT_PASS (pass_split_before_regstack);

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-12 16:21     ` Kai Tietz
@ 2014-06-12 18:53       ` Segher Boessenkool
  2014-06-12 20:41         ` Kai Tietz
  2014-06-12 21:47       ` David Wohlferd
  2014-06-13 15:36       ` Jeff Law
  2 siblings, 1 reply; 30+ messages in thread
From: Segher Boessenkool @ 2014-06-12 18:53 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Steven Bosscher, GCC Patches, Richard Henderson

On Thu, Jun 12, 2014 at 06:21:32PM +0200, Kai Tietz wrote:
> with addition of adding a second peephole2 pass after sched2 pass, I
> was able to get some improvement for PR target/39284.  I think by this
> addition we can close bug as fixed.
> Additionally additional peephole2 pass shows better results for PR
> target/51840 testcase with disabled ASM_GOTO, too.

Will that work on other targets?  Also, it needs a doc fix (md.texi
says peephole2 runs before scheduling).


Segher

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-12 18:53       ` Segher Boessenkool
@ 2014-06-12 20:41         ` Kai Tietz
  2014-06-12 21:06           ` Segher Boessenkool
  0 siblings, 1 reply; 30+ messages in thread
From: Kai Tietz @ 2014-06-12 20:41 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Steven Bosscher, GCC Patches, Richard Henderson

2014-06-12 20:52 GMT+02:00 Segher Boessenkool <segher@kernel.crashing.org>:
> On Thu, Jun 12, 2014 at 06:21:32PM +0200, Kai Tietz wrote:
>> with addition of adding a second peephole2 pass after sched2 pass, I
>> was able to get some improvement for PR target/39284.  I think by this
>> addition we can close bug as fixed.
>> Additionally additional peephole2 pass shows better results for PR
>> target/51840 testcase with disabled ASM_GOTO, too.

Well, this is the only point I am a bit concerned too.  In general I
wouldn't expect here any issues to run peephole after scheduling, as
peephole doesn't do anything a new run of ira/lra would require.
Anyway it would be good if a global maintainer could comment on that.

> Will that work on other targets?  Also, it needs a doc fix (md.texi
> says peephole2 runs before scheduling).

Thanks for pointing on that.  When I send patch for this additional
peephole pass with testcase, I will adjust md.texi.

>
> Segher

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-12 20:41         ` Kai Tietz
@ 2014-06-12 21:06           ` Segher Boessenkool
  2014-06-13 15:29             ` Jeff Law
  0 siblings, 1 reply; 30+ messages in thread
From: Segher Boessenkool @ 2014-06-12 21:06 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Steven Bosscher, GCC Patches, Richard Henderson

> > Will that work on other targets?

> Well, this is the only point I am a bit concerned too.  In general I
> wouldn't expect here any issues to run peephole after scheduling, as
> peephole doesn't do anything a new run of ira/lra would require.

My concern is that peepholes are rather fragile, so imho it is not
inconceivable that some target will generate wrong code when you add
an extra (later) peephole pass.  Of course, we are in stage1.

My other concern is that running peepholes again after scheduling
could easily generate worse code.

So I think the effect of this change on other targets needs to be
evaluated.

> Anyway it would be good if a global maintainer could comment on that.

Yes :-)


Segher

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-12 16:21     ` Kai Tietz
  2014-06-12 18:53       ` Segher Boessenkool
@ 2014-06-12 21:47       ` David Wohlferd
  2014-06-13  7:22         ` Kai Tietz
  2014-06-13 15:36       ` Jeff Law
  2 siblings, 1 reply; 30+ messages in thread
From: David Wohlferd @ 2014-06-12 21:47 UTC (permalink / raw)
  To: Kai Tietz, gcc-patches


On 6/12/2014 9:21 AM, Kai Tietz wrote:
> with addition of adding a second peephole2 pass after sched2 pass, I
> was able to get some improvement for PR target/39284.  I think by this
> addition we can close bug as fixed.
> Additionally additional peephole2 pass shows better results for PR
> target/51840 testcase with disabled ASM_GOTO, too.

Any chance this also fixes PR 58670 (see comment #5)?

dw


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-12 21:47       ` David Wohlferd
@ 2014-06-13  7:22         ` Kai Tietz
  0 siblings, 0 replies; 30+ messages in thread
From: Kai Tietz @ 2014-06-13  7:22 UTC (permalink / raw)
  To: David Wohlferd; +Cc: gcc-patches

2014-06-12 23:46 GMT+02:00 David Wohlferd <dw@limegreensocks.com>:
>
> On 6/12/2014 9:21 AM, Kai Tietz wrote:
>>
>> with addition of adding a second peephole2 pass after sched2 pass, I
>> was able to get some improvement for PR target/39284.  I think by this
>> addition we can close bug as fixed.
>> Additionally additional peephole2 pass shows better results for PR
>> target/51840 testcase with disabled ASM_GOTO, too.
>
>
> Any chance this also fixes PR 58670 (see comment #5)?
>
> dw

Hi dw,

no it doesn't.  The patch shows affect only for computed goto.

Kai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-12 21:06           ` Segher Boessenkool
@ 2014-06-13 15:29             ` Jeff Law
  2014-06-13 18:46               ` Segher Boessenkool
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Law @ 2014-06-13 15:29 UTC (permalink / raw)
  To: Segher Boessenkool, Kai Tietz
  Cc: Steven Bosscher, GCC Patches, Richard Henderson

On 06/12/14 15:06, Segher Boessenkool wrote:
>>> Will that work on other targets?
>
>> Well, this is the only point I am a bit concerned too.  In general I
>> wouldn't expect here any issues to run peephole after scheduling, as
>> peephole doesn't do anything a new run of ira/lra would require.
>
> My concern is that peepholes are rather fragile, so imho it is not
> inconceivable that some target will generate wrong code when you add
> an extra (later) peephole pass.  Of course, we are in stage1.
peephole2 is significantly robust than the original peephole pass; 
largely because peep2 is required to generate real insns that can be 
processed by the scheduler and other passes.  Another pass or moving the 
pass to a different location in the pipeline shouldn't be inherently 
problematical.

The original peephole pass ran so late that ports could play it pretty 
fast and loose and changing its location in the pipeline or adding 
another pass would be ill-advised.



>
> My other concern is that running peepholes again after scheduling
> could easily generate worse code.
That's always been the the case for anything which runs after the 
scheduler -- such passes perturb the work of the scheduler.  We've 
effectively ignored the effects of reg-stack, delay slot filling and the 
old peephole pass in terms of their effect on the schedule.

I'd be surprised if enough things fall out of peep2 changes to be 
measurable.

Jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-12 16:21     ` Kai Tietz
  2014-06-12 18:53       ` Segher Boessenkool
  2014-06-12 21:47       ` David Wohlferd
@ 2014-06-13 15:36       ` Jeff Law
  2014-06-13 15:56         ` Richard Henderson
  2 siblings, 1 reply; 30+ messages in thread
From: Jeff Law @ 2014-06-13 15:36 UTC (permalink / raw)
  To: Kai Tietz, Steven Bosscher; +Cc: GCC Patches, Richard Henderson

On 06/12/14 10:21, Kai Tietz wrote:
> Hello,
>
> I updated i386.md part of the patch. Initial patch included handling
> of blockage, which is obviously superflous.  Additionally I merged
> 32-bit and 64-bit peephole2 versions by using mode-specifier W.
>
> ChangeLog
>
> 2014-06-12  Kai Tietz  <ktietz@redhat.com>
>
>      * config/i386/i386.md (peehole2): To combine
>      indirect jump with memory.
>
> 2014-06-12  Kai Tietz  <ktietz@redhat.com>
>
>      * gcc.target/i386/indjmp-1.c: New test.
>
> Tested for i686-pc-cygwin, and x86_64-unknown-linux-gnu.  Ok for apply?
So you may have answered this already, but why can't this be a combiner 
pattern?   We have the first insn which sets a reg which is used and 
dies in the subsequent insn.  There's a data dependency so the combiner 
ought to be trying to smash these two insns together.

Or is it the case that combine does smash them together, but we get a 
bad register allocation and generate spills?


>
> with addition of adding a second peephole2 pass after sched2 pass, I
> was able to get some improvement for PR target/39284.  I think by this
> addition we can close bug as fixed.
> Additionally additional peephole2 pass shows better results for PR
> target/51840 testcase with disabled ASM_GOTO, too.
>
> 2014-06-12  Kai Tietz  <ktietz@redhat.com>
>
>      PR target/39284
>      * passes.def (pass_peephole2): Add second peephole2
>      run after sched2 pass.
Need to resolve the prior question before digging into this.  Obviously 
if combine is a better place to handle this optimization, then this 
patch may not be needed.


Jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-13 15:36       ` Jeff Law
@ 2014-06-13 15:56         ` Richard Henderson
  2014-06-13 15:58           ` Jeff Law
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2014-06-13 15:56 UTC (permalink / raw)
  To: Jeff Law, Kai Tietz, Steven Bosscher; +Cc: GCC Patches

On 06/13/2014 08:36 AM, Jeff Law wrote:
> So you may have answered this already, but why can't this be a combiner pattern?

Until pass_duplicate_computed_gotos, we (intentionally) have a single indirect
branch in the entire function.  This vastly reduces the size of the CFG.

Peep2 is currently running before d_c_g, so currently Kai can't solve this
problem in peep2.

I don't think peep2 should run after sched2, but I'll bet we can reorder things
a bit so that d_c_g runs before peep2.


r~

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-13 15:56         ` Richard Henderson
@ 2014-06-13 15:58           ` Jeff Law
  2014-06-13 16:59             ` Kai Tietz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Law @ 2014-06-13 15:58 UTC (permalink / raw)
  To: Richard Henderson, Kai Tietz, Steven Bosscher; +Cc: GCC Patches

On 06/13/14 09:56, Richard Henderson wrote:
> On 06/13/2014 08:36 AM, Jeff Law wrote:
>> So you may have answered this already, but why can't this be a combiner pattern?
>
> Until pass_duplicate_computed_gotos, we (intentionally) have a single indirect
> branch in the entire function.  This vastly reduces the size of the CFG.
Ah, the factoring bits.  Should have known.

>
> Peep2 is currently running before d_c_g, so currently Kai can't solve this
> problem in peep2.
>
> I don't think peep2 should run after sched2, but I'll bet we can reorder things
> a bit so that d_c_g runs before peep2.
Yea, seems worth a try.

jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-13 15:58           ` Jeff Law
@ 2014-06-13 16:59             ` Kai Tietz
  2014-06-17 19:26               ` Jeff Law
  0 siblings, 1 reply; 30+ messages in thread
From: Kai Tietz @ 2014-06-13 16:59 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Henderson, Steven Bosscher, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1072 bytes --]

2014-06-13 17:58 GMT+02:00 Jeff Law <law@redhat.com>:
> On 06/13/14 09:56, Richard Henderson wrote:
>>
>> On 06/13/2014 08:36 AM, Jeff Law wrote:
>>>
>>> So you may have answered this already, but why can't this be a combiner
>>> pattern?
>>
>>
>> Until pass_duplicate_computed_gotos, we (intentionally) have a single
>> indirect
>> branch in the entire function.  This vastly reduces the size of the CFG.
>
> Ah, the factoring bits.  Should have known.
>
>
>>
>> Peep2 is currently running before d_c_g, so currently Kai can't solve this
>> problem in peep2.
>>
>> I don't think peep2 should run after sched2, but I'll bet we can reorder
>> things
>> a bit so that d_c_g runs before peep2.
>
> Yea, seems worth a try.
>
> jeff
>

Well, I tested to put the second sched2 pass before the sched2 pass.
That works in general.  There are just some opportunties which weren't
caught then.  I attached a sample, which demonstrates that pretty
well.  I noticed that I had to put that pass behind reload blocks was
necessary for better hit-rate of the peephole optimization.

Kai

[-- Attachment #2: t_goto2.c --]
[-- Type: text/x-csrc, Size: 2643 bytes --]

// PR 51840/
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

enum {
  S_atop   = 0,
  S_atop_f = 1,
  S_atop_t = 2,
  S_limit  = 3
};

enum {
  I_a_dec        = 0,
  I_a_non_zero_p = 1,
  I_jmp_if_true  = 2,
  I_exit         = 3,
  I_limit        = 4
};


typedef struct {
  uint64_t a0;
} vm_state_t;


__attribute__((noinline, noclone)) void exec_code(vm_state_t *state, uint8_t *code) {
  static void (* volatile const vm_dispatch[S_limit][I_limit]) = {
    //dispatch for [S_atop = 0][..] with four unique destination labels
    {&&atop__a_dec,
     &&atop__a_non_zero_p,
     &&fixme,
     &&atop__exit},

    //dispatch for [S_atop_f = 1][..] with two unique destination labels
    {&&fixme,
     &&fixme,
     &&atop_f__jmp_if_true,
     &&fixme},

    //dispatch for [S_atop_t = 2][..] with two unique destination labels
    {&&fixme,
     &&fixme,
     &&atop_t__jmp_if_true,
     &&fixme}
  };

  printf("atop__a_dec:         %p\n", &&atop__a_dec);
  printf("atop__a_non_zero_p:  %p\n", &&atop__a_non_zero_p);
  printf("atop__exit:          %p\n", &&atop__exit);
  printf("atop_f__jmp_if_true: %p\n", &&atop_f__jmp_if_true);
  printf("atop_t__jmp_if_true: %p\n", &&atop_t__jmp_if_true);
  printf("fixme:               %p\n", &&fixme);

  volatile uint64_t atop = state->a0;
  volatile int64_t inst_offset=0;

  goto *(vm_dispatch[S_atop][code[inst_offset]]);

 atop__a_dec: {
    printf("atop = %ld\n", atop);
    --atop;
    volatile uint64_t next_inst = code[inst_offset + 1];
    inst_offset += 1;
    goto *(vm_dispatch[S_atop][next_inst]);
  }

 atop__a_non_zero_p: {
    volatile uint64_t next_inst = code[inst_offset + 1];
    inst_offset += 1;
    if (atop != 0) {
      goto *(vm_dispatch[S_atop_t][next_inst]);
    } else {
      goto *(vm_dispatch[S_atop_f][next_inst]);
    }
  }

 atop_f__jmp_if_true: {
    volatile uint64_t next_inst = code[inst_offset + 2];
    inst_offset += 2;
    goto *(vm_dispatch[S_atop][next_inst]);
  }

 atop_t__jmp_if_true: {
    int64_t offset = (int8_t) code[inst_offset + 1];
    uint64_t next_inst = code[inst_offset + offset];
    inst_offset += offset;
    goto *(vm_dispatch[S_atop][next_inst]);
  }

 atop__exit: {
    state->a0 = atop;
    return;
  }

 fixme: {
    printf("Dispatch logic ERROR\n");
    exit(EXIT_FAILURE);
  }
}

int main(void)u
{
  vm_state_t state;
  state.a0     = 10;
  uint8_t code[] = {I_a_dec,           //decrement top of stack
		    I_a_non_zero_p,    //true if top of stack is non-zero
		    I_jmp_if_true, -2, //if true jump back to decrement
		    I_exit};           //otherwise exit

  exec_code(&state, code);
  return EXIT_SUCCESS;
}


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-13 15:29             ` Jeff Law
@ 2014-06-13 18:46               ` Segher Boessenkool
  0 siblings, 0 replies; 30+ messages in thread
From: Segher Boessenkool @ 2014-06-13 18:46 UTC (permalink / raw)
  To: Jeff Law; +Cc: Kai Tietz, Steven Bosscher, GCC Patches, Richard Henderson

> >My concern is that peepholes are rather fragile, so imho it is not
> >inconceivable that some target will generate wrong code when you add
> >an extra (later) peephole pass.  Of course, we are in stage1.

> peephole2 is significantly robust than the original peephole pass; 
> largely because peep2 is required to generate real insns that can be 
> processed by the scheduler and other passes.  Another pass or moving the 
> pass to a different location in the pipeline shouldn't be inherently 
> problematical.

Yes.  What I mean is that peephole2s are still fragile in ways that
are unique to peephole2; they need to check manually whether registers
are dead, whether registers overlap, whether constraints match.  I'd be
surprised if there would not be any fallout from adding a later peephole2
pass.  Those of course are target bugs, but it's still fallout.  Doing
this on the back of an x86 patch without testing anywhere else is not
very friendly :-P

> >My other concern is that running peepholes again after scheduling
> >could easily generate worse code.
> That's always been the the case for anything which runs after the 
> scheduler

That's no reason to run more after the scheduler, quite the opposite.

> -- such passes perturb the work of the scheduler.  We've 
> effectively ignored the effects of reg-stack, delay slot filling and the 
> old peephole pass in terms of their effect on the schedule.
> 
> I'd be surprised if enough things fall out of peep2 changes to be 
> measurable.

The very first example in the define_peephole2 description in the manual,
which is for x86, shows a peephole that only makes sense before scheduling
("This pattern tries to split a load from its use in the hope that we'll
be able to schedule around the memory load latency.")

You could well be right that there is no measurable difference on any
target; but we won't know without measuring.


Segher

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-13 16:59             ` Kai Tietz
@ 2014-06-17 19:26               ` Jeff Law
  2014-06-17 20:35                 ` Kai Tietz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Law @ 2014-06-17 19:26 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Richard Henderson, Steven Bosscher, GCC Patches

On 06/13/14 10:59, Kai Tietz wrote:
> 2014-06-13 17:58 GMT+02:00 Jeff Law <law@redhat.com>:
>> On 06/13/14 09:56, Richard Henderson wrote:
>>>
>>> On 06/13/2014 08:36 AM, Jeff Law wrote:
>>>>
>>>> So you may have answered this already, but why can't this be a combiner
>>>> pattern?
>>>
>>>
>>> Until pass_duplicate_computed_gotos, we (intentionally) have a single
>>> indirect
>>> branch in the entire function.  This vastly reduces the size of the CFG.
>>
>> Ah, the factoring bits.  Should have known.
>>
>>
>>>
>>> Peep2 is currently running before d_c_g, so currently Kai can't solve this
>>> problem in peep2.
>>>
>>> I don't think peep2 should run after sched2, but I'll bet we can reorder
>>> things
>>> a bit so that d_c_g runs before peep2.
>>
>> Yea, seems worth a try.
>>
>> jeff
>>
>
> Well, I tested to put the second sched2 pass before the sched2 pass.
> That works in general.  There are just some opportunties which weren't
> caught then.  I attached a sample, which demonstrates that pretty
> well.  I noticed that I had to put that pass behind reload blocks was
> necessary for better hit-rate of the peephole optimization.
So can you tell us why this sample code misses opportunities?  Otherwise 
we have to dig into it ourselves to tease out that information.

I think we're zeroing in on a path to move d_c_g before peep2, but I'd 
like to have a clearer understanding of why we'd still be missing 
opportunities.  If we can avoid running peep2 twice, that'd be good.

jeff


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-17 19:26               ` Jeff Law
@ 2014-06-17 20:35                 ` Kai Tietz
  2014-06-18 19:36                   ` Jeff Law
  0 siblings, 1 reply; 30+ messages in thread
From: Kai Tietz @ 2014-06-17 20:35 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Henderson, Steven Bosscher, GCC Patches

2014-06-17 21:26 GMT+02:00 Jeff Law <law@redhat.com>:
> On 06/13/14 10:59, Kai Tietz wrote:
>>
>> 2014-06-13 17:58 GMT+02:00 Jeff Law <law@redhat.com>:
>>>
>>> On 06/13/14 09:56, Richard Henderson wrote:
>>>>
>>>>
>>>> On 06/13/2014 08:36 AM, Jeff Law wrote:
>>>>>
>>>>>
>>>>> So you may have answered this already, but why can't this be a combiner
>>>>> pattern?
>>>>
>>>>
>>>>
>>>> Until pass_duplicate_computed_gotos, we (intentionally) have a single
>>>> indirect
>>>> branch in the entire function.  This vastly reduces the size of the CFG.
>>>
>>>
>>> Ah, the factoring bits.  Should have known.
>>>
>>>
>>>>
>>>> Peep2 is currently running before d_c_g, so currently Kai can't solve
>>>> this
>>>> problem in peep2.
>>>>
>>>> I don't think peep2 should run after sched2, but I'll bet we can reorder
>>>> things
>>>> a bit so that d_c_g runs before peep2.
>>>
>>>
>>> Yea, seems worth a try.
>>>
>>> jeff
>>>
>>
>> Well, I tested to put the second sched2 pass before the sched2 pass.
>> That works in general.  There are just some opportunties which weren't
>> caught then.  I attached a sample, which demonstrates that pretty
>> well.  I noticed that I had to put that pass behind reload blocks was
>> necessary for better hit-rate of the peephole optimization.
>
> So can you tell us why this sample code misses opportunities?  Otherwise we
> have to dig into it ourselves to tease out that information.
>
> I think we're zeroing in on a path to move d_c_g before peep2, but I'd like
> to have a clearer understanding of why we'd still be missing opportunities.
> If we can avoid running peep2 twice, that'd be good.
>
> jeff

Hi Jeff,

I just did retest my testcase with recent source. I can't reproduce
this missed optimization before sched2 pass anymore.  I moved second
peephole2 pass just before split_before_sched2 and everything got
caught.

To remove first peephole2 pass seems to cause weaker code for
impossible pushes, etc

Nevertheless it might be a point to make this new peephole instead a
define_split?  I admit that this operation isn't a split, nevertheless
we would avoid a second peephole pass.

Kai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-17 20:35                 ` Kai Tietz
@ 2014-06-18 19:36                   ` Jeff Law
  2014-06-20 15:56                     ` Kai Tietz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Law @ 2014-06-18 19:36 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Richard Henderson, Steven Bosscher, GCC Patches

On 06/17/14 14:35, Kai Tietz wrote:
> I just did retest my testcase with recent source. I can't reproduce
> this missed optimization before sched2 pass anymore.  I moved second
> peephole2 pass just before split_before_sched2 and everything got
> caught.
Let's go with this if your idea of using a define_split doesn't work out.



>
> To remove first peephole2 pass seems to cause weaker code for
> impossible pushes, etc
OK.


>
> Nevertheless it might be a point to make this new peephole instead a
> define_split?  I admit that this operation isn't a split, nevertheless
> we would avoid a second peephole pass.
Doesn't hurt to try and as you say, if we can avoid a 2nd peep2 pass, 
that's good.

jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-18 19:36                   ` Jeff Law
@ 2014-06-20 15:56                     ` Kai Tietz
  2014-06-20 17:23                       ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Kai Tietz @ 2014-06-20 15:56 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Henderson, Steven Bosscher, GCC Patches

I tested variant to use additional the split pass for memory combining
for indirect jumps.  By this we don't need to add a second peephole2
pass.  Other advantage of this version is that even for -O1 we do
combining.

ChangeLog

2014-06-20  Kai Tietz  <ktietz@redhat.com>

    PR target/39284
    * config/i386/i386.md (peehole2): To combine
    indirect jump with memory.
    (split2): Likewise.

2014-06-20  Kai Tietz  <ktietz@redhat.com>

    * gcc.target/i386/indjmp-1.c: New test.

Tested for i686-pc-cygwin, and x86_64-unknown-linux-gnu.  Ok for apply?

Regards,
Kai

Index: testsuite/gcc.target/i386/indjmp-1.c
===================================================================
--- testsuite/gcc.target/i386/indjmp-1.c    (Revision 0)
+++ testsuite/gcc.target/i386/indjmp-1.c    (Arbeitskopie)
@@ -0,0 +1,23 @@
+/* { dg-do compile  { target ia32 } } */
+/* { dg-options "-O2" } */
+
+#define ADVANCE_AND_DISPATCH() goto *addresses[*pc++]
+
+void
+Interpret(const unsigned char *pc)
+{
+    static const void *const addresses[] = {
+      &&l0, &&l1, &&l2
+    };
+
+l0:
+    ADVANCE_AND_DISPATCH();
+
+l1:
+    ADVANCE_AND_DISPATCH();
+
+l2:
+    return;
+}
+
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*.%eax" } } */
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md    (Revision 211850)
+++ config/i386/i386.md    (Arbeitskopie)
@@ -11466,6 +11466,24 @@
                 (match_dup 4)))
           (unspec [(const_int 0)] UNSPEC_PEEPSIB)])])

+;; Combining simple memory jump instruction
+
+(define_peephole2
+  [(set (match_operand:W 0 "register_operand")
+        (match_operand:W 1 "memory_operand"))
+   (set (pc) (match_dup 0))]
+  "!TARGET_X32 && peep2_reg_dead_p (2, operands[0])"
+  [(set (pc) (match_dup 1))])
+
+;; For avoiding a second pass for peephole, we use here split pass
+
+(define_split
+  [(set (match_operand:W 0 "register_operand")
+        (match_operand:W 1 "memory_operand"))
+   (set (pc) (match_dup 0))]
+  "!TARGET_X32 && peep2_reg_dead_p (2, operands[0])"
+  [(set (pc) (match_dup 1))])
+
 ;; Call subroutine, returning value in operand 0

 (define_expand "call_value"

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-20 15:56                     ` Kai Tietz
@ 2014-06-20 17:23                       ` Richard Henderson
  2014-06-20 17:45                         ` Kai Tietz
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2014-06-20 17:23 UTC (permalink / raw)
  To: Kai Tietz, Jeff Law; +Cc: Steven Bosscher, GCC Patches

On 06/20/2014 08:56 AM, Kai Tietz wrote:
> +(define_split
> +  [(set (match_operand:W 0 "register_operand")
> +        (match_operand:W 1 "memory_operand"))
> +   (set (pc) (match_dup 0))]
> +  "!TARGET_X32 && peep2_reg_dead_p (2, operands[0])"
> +  [(set (pc) (match_dup 1))])
> +

Huh?  You can't use peep2 data structures in split passes.


r~

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-20 17:23                       ` Richard Henderson
@ 2014-06-20 17:45                         ` Kai Tietz
  2014-06-20 17:52                           ` Kai Tietz
  0 siblings, 1 reply; 30+ messages in thread
From: Kai Tietz @ 2014-06-20 17:45 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jeff Law, Steven Bosscher, GCC Patches

2014-06-20 19:23 GMT+02:00 Richard Henderson <rth@redhat.com>:
> On 06/20/2014 08:56 AM, Kai Tietz wrote:
>> +(define_split
>> +  [(set (match_operand:W 0 "register_operand")
>> +        (match_operand:W 1 "memory_operand"))
>> +   (set (pc) (match_dup 0))]
>> +  "!TARGET_X32 && peep2_reg_dead_p (2, operands[0])"
>> +  [(set (pc) (match_dup 1))])
>> +
>
> Huh?  You can't use peep2 data structures in split passes.
>
>
> r~

Duh, you are right ... that shouldn't work, nevertheless it
bootstrapped fine.  Well, so we will need second peephole2 pass.  I
will come with patch for that soon.

Kai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-20 17:45                         ` Kai Tietz
@ 2014-06-20 17:52                           ` Kai Tietz
  2014-06-20 17:55                             ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Kai Tietz @ 2014-06-20 17:52 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jeff Law, Steven Bosscher, GCC Patches

So revert to use a second peephole2 pass before final split before sched2 pass.

Ok for apply
ChangeLog

2014-06-20  Kai Tietz  <ktietz@redhat.com>

    PR target/39284
    * passes.def (peephole2): Add second peephole2 pass before
    split before sched2 pass.
    * config/i386/i386.md (peehole2): To combine
    indirect jump with memory.
    (split2): Likewise.

2014-06-20  Kai Tietz  <ktietz@redhat.com>

    * gcc.target/i386/indjmp-1.c: New test.

Tested for i686-pc-cygwin, and x86_64-unknown-linux-gnu running.  Ok
for apply when boostraps are passing?

Regards,
Kai

Index: testsuite/gcc.target/i386/indjmp-1.c
===================================================================
--- testsuite/gcc.target/i386/indjmp-1.c    (Revision 0)
+++ testsuite/gcc.target/i386/indjmp-1.c    (Arbeitskopie)
@@ -0,0 +1,23 @@
+/* { dg-do compile  { target ia32 } } */
+/* { dg-options "-O2" } */
+
+#define ADVANCE_AND_DISPATCH() goto *addresses[*pc++]
+
+void
+Interpret(const unsigned char *pc)
+{
+    static const void *const addresses[] = {
+      &&l0, &&l1, &&l2
+    };
+
+l0:
+    ADVANCE_AND_DISPATCH();
+
+l1:
+    ADVANCE_AND_DISPATCH();
+
+l2:
+    return;
+}
+
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*.%eax" } } */
Index: passes.def
===================================================================
--- passes.def    (Revision 211850)
+++ passes.def    (Arbeitskopie)
@@ -393,6 +393,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_reorder_blocks);
       NEXT_PASS (pass_branch_target_load_optimize2);
       NEXT_PASS (pass_leaf_regs);
+      NEXT_PASS (pass_peephole2);
       NEXT_PASS (pass_split_before_sched2);
       NEXT_PASS (pass_sched2);
       NEXT_PASS (pass_stack_regs);
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md    (Revision 211850)
+++ config/i386/i386.md    (Arbeitskopie)
@@ -11466,6 +11466,15 @@
                 (match_dup 4)))
           (unspec [(const_int 0)] UNSPEC_PEEPSIB)])])

+;; Combining simple memory jump instruction
+
+(define_peephole2
+  [(set (match_operand:W 0 "register_operand")
+        (match_operand:W 1 "memory_operand"))
+   (set (pc) (match_dup 0))]
+  "!TARGET_X32 && peep2_reg_dead_p (2, operands[0])"
+  [(set (pc) (match_dup 1))])
+
 ;; Call subroutine, returning value in operand 0

 (define_expand "call_value"

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-20 17:52                           ` Kai Tietz
@ 2014-06-20 17:55                             ` Richard Henderson
  2014-06-20 18:07                               ` Kai Tietz
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2014-06-20 17:55 UTC (permalink / raw)
  To: Kai Tietz; +Cc: Jeff Law, Steven Bosscher, GCC Patches

On 06/20/2014 10:52 AM, Kai Tietz wrote:
> 2014-06-20  Kai Tietz  <ktietz@redhat.com>
> 
>     PR target/39284
>     * passes.def (peephole2): Add second peephole2 pass before
>     split before sched2 pass.
>     * config/i386/i386.md (peehole2): To combine
>     indirect jump with memory.
>     (split2): Likewise.

Why are we adding a second pass instead of just moving the one?


r~

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-20 17:55                             ` Richard Henderson
@ 2014-06-20 18:07                               ` Kai Tietz
  2014-06-20 18:15                                 ` Jeff Law
  0 siblings, 1 reply; 30+ messages in thread
From: Kai Tietz @ 2014-06-20 18:07 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jeff Law, Steven Bosscher, GCC Patches

2014-06-20 19:55 GMT+02:00 Richard Henderson <rth@redhat.com>:
> On 06/20/2014 10:52 AM, Kai Tietz wrote:
>> 2014-06-20  Kai Tietz  <ktietz@redhat.com>
>>
>>     PR target/39284
>>     * passes.def (peephole2): Add second peephole2 pass before
>>     split before sched2 pass.
>>     * config/i386/i386.md (peehole2): To combine
>>     indirect jump with memory.
>>     (split2): Likewise.
>
> Why are we adding a second pass instead of just moving the one?
>
>
> r~

I told that in a prior mail in that thread to Jeff. IIRC there are
some conversion of impossible pushes then done too late, additional
some patterns about split & dieing register too.  Means we produce
weaker code.

Kai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-20 18:07                               ` Kai Tietz
@ 2014-06-20 18:15                                 ` Jeff Law
  2014-06-20 21:59                                   ` Kai Tietz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Law @ 2014-06-20 18:15 UTC (permalink / raw)
  To: Kai Tietz, Richard Henderson; +Cc: Steven Bosscher, GCC Patches

On 06/20/14 12:07, Kai Tietz wrote:
> 2014-06-20 19:55 GMT+02:00 Richard Henderson <rth@redhat.com>:
>> On 06/20/2014 10:52 AM, Kai Tietz wrote:
>>> 2014-06-20  Kai Tietz  <ktietz@redhat.com>
>>>
>>>      PR target/39284
>>>      * passes.def (peephole2): Add second peephole2 pass before
>>>      split before sched2 pass.
>>>      * config/i386/i386.md (peehole2): To combine
>>>      indirect jump with memory.
>>>      (split2): Likewise.
>>
>> Why are we adding a second pass instead of just moving the one?
>>
>>
>> r~
>
> I told that in a prior mail in that thread to Jeff. IIRC there are
> some conversion of impossible pushes then done too late, additional
> some patterns about split & dieing register too.  Means we produce
> weaker code.
So let's dig into this deeper.  Examples & explanations would help.  I 
know it feels like a bit of a runaround, but avoiding adding the pass 
would be good.

jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-20 18:15                                 ` Jeff Law
@ 2014-06-20 21:59                                   ` Kai Tietz
  2014-06-23 14:13                                     ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Kai Tietz @ 2014-06-20 21:59 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Henderson, Steven Bosscher, GCC Patches

2014-06-20 20:14 GMT+02:00 Jeff Law <law@redhat.com>:
> On 06/20/14 12:07, Kai Tietz wrote:
>>
>> 2014-06-20 19:55 GMT+02:00 Richard Henderson <rth@redhat.com>:
>>>
>>> On 06/20/2014 10:52 AM, Kai Tietz wrote:
>>>>
>>>> 2014-06-20  Kai Tietz  <ktietz@redhat.com>
>>>>
>>>>      PR target/39284
>>>>      * passes.def (peephole2): Add second peephole2 pass before
>>>>      split before sched2 pass.
>>>>      * config/i386/i386.md (peehole2): To combine
>>>>      indirect jump with memory.
>>>>      (split2): Likewise.
>>>
>>>
>>> Why are we adding a second pass instead of just moving the one?
>>>
>>>
>>> r~
>>
>>
>> I told that in a prior mail in that thread to Jeff. IIRC there are
>> some conversion of impossible pushes then done too late, additional
>> some patterns about split & dieing register too.  Means we produce
>> weaker code.
>
> So let's dig into this deeper.  Examples & explanations would help.  I know
> it feels like a bit of a runaround, but avoiding adding the pass would be
> good.
>
> jeff

I dug into it a bit.  And couldn't find any significant difference for
x64 target for existing testcases.
I am still a bit concerned - I can't reproduce it for x86/x86_64
targets - that we might cause regressions for targets by moving
peephole2 pass too close before the sched2 pass.  Therefore I searched
for the closest place to the prior place of the peephole2 pass, which
solves still the indirect jump optimization on memory.
By testing for x86/x64 the pass needs to be run directly after the
"reorder blocks" pass.

So I suggest following change of passes.def:

Index: passes.def
===================================================================
--- passes.def  (Revision 211850)
+++ passes.def  (Arbeitskopie)
@@ -384,7 +384,6 @@ along with GCC; see the file COPYING3.  If not see
          NEXT_PASS (pass_rtl_dse2);
          NEXT_PASS (pass_stack_adjustments);
          NEXT_PASS (pass_jump2);
-         NEXT_PASS (pass_peephole2);
          NEXT_PASS (pass_if_after_reload);
          NEXT_PASS (pass_regrename);
          NEXT_PASS (pass_cprop_hardreg);
@@ -391,6 +390,7 @@ along with GCC; see the file COPYING3.  If not see
          NEXT_PASS (pass_fast_rtl_dce);
          NEXT_PASS (pass_duplicate_computed_gotos);
          NEXT_PASS (pass_reorder_blocks);
+         NEXT_PASS (pass_peephole2);
          NEXT_PASS (pass_branch_target_load_optimize2);
          NEXT_PASS (pass_leaf_regs);
          NEXT_PASS (pass_split_before_sched2);

Kai

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-20 21:59                                   ` Kai Tietz
@ 2014-06-23 14:13                                     ` Richard Henderson
  2014-06-23 14:32                                       ` Richard Biener
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Henderson @ 2014-06-23 14:13 UTC (permalink / raw)
  To: Kai Tietz, Jeff Law; +Cc: Steven Bosscher, GCC Patches

On 06/20/2014 02:59 PM, Kai Tietz wrote:
> So I suggest following change of passes.def:
> 
> Index: passes.def
> ===================================================================
> --- passes.def  (Revision 211850)
> +++ passes.def  (Arbeitskopie)
> @@ -384,7 +384,6 @@ along with GCC; see the file COPYING3.  If not see
>           NEXT_PASS (pass_rtl_dse2);
>           NEXT_PASS (pass_stack_adjustments);
>           NEXT_PASS (pass_jump2);
> -         NEXT_PASS (pass_peephole2);
>           NEXT_PASS (pass_if_after_reload);
>           NEXT_PASS (pass_regrename);
>           NEXT_PASS (pass_cprop_hardreg);
> @@ -391,6 +390,7 @@ along with GCC; see the file COPYING3.  If not see
>           NEXT_PASS (pass_fast_rtl_dce);
>           NEXT_PASS (pass_duplicate_computed_gotos);
>           NEXT_PASS (pass_reorder_blocks);
> +         NEXT_PASS (pass_peephole2);
>           NEXT_PASS (pass_branch_target_load_optimize2);
>           NEXT_PASS (pass_leaf_regs);
>           NEXT_PASS (pass_split_before_sched2);

Looks good to me.  I guess just keep an eye out for bug reports for other ports.


r~

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-23 14:13                                     ` Richard Henderson
@ 2014-06-23 14:32                                       ` Richard Biener
  2014-06-23 16:22                                         ` Jeff Law
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Biener @ 2014-06-23 14:32 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Kai Tietz, Jeff Law, Steven Bosscher, GCC Patches

On Mon, Jun 23, 2014 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote:
> On 06/20/2014 02:59 PM, Kai Tietz wrote:
>> So I suggest following change of passes.def:
>>
>> Index: passes.def
>> ===================================================================
>> --- passes.def  (Revision 211850)
>> +++ passes.def  (Arbeitskopie)
>> @@ -384,7 +384,6 @@ along with GCC; see the file COPYING3.  If not see
>>           NEXT_PASS (pass_rtl_dse2);
>>           NEXT_PASS (pass_stack_adjustments);
>>           NEXT_PASS (pass_jump2);
>> -         NEXT_PASS (pass_peephole2);
>>           NEXT_PASS (pass_if_after_reload);
>>           NEXT_PASS (pass_regrename);
>>           NEXT_PASS (pass_cprop_hardreg);
>> @@ -391,6 +390,7 @@ along with GCC; see the file COPYING3.  If not see
>>           NEXT_PASS (pass_fast_rtl_dce);
>>           NEXT_PASS (pass_duplicate_computed_gotos);
>>           NEXT_PASS (pass_reorder_blocks);
>> +         NEXT_PASS (pass_peephole2);
>>           NEXT_PASS (pass_branch_target_load_optimize2);
>>           NEXT_PASS (pass_leaf_regs);
>>           NEXT_PASS (pass_split_before_sched2);
>
> Looks good to me.  I guess just keep an eye out for bug reports for other ports.

Maybe put a comment here because it looks like a random placement to me
which would be obvious to revert.  peepholing before if-after-reload sounds
good anyway.

Did you test effect on code-generation of this change on other targets?

Btw, there is now no DCE after peephole2?  Is peephole2 expected to
cleanup after itself?

Richard.

>
> r~

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-23 14:32                                       ` Richard Biener
@ 2014-06-23 16:22                                         ` Jeff Law
  2014-06-23 16:43                                           ` Richard Henderson
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Law @ 2014-06-23 16:22 UTC (permalink / raw)
  To: Richard Biener, Richard Henderson; +Cc: Kai Tietz, Steven Bosscher, GCC Patches

On 06/23/14 08:32, Richard Biener wrote:
> On Mon, Jun 23, 2014 at 4:13 PM, Richard Henderson <rth@redhat.com> wrote:
>> On 06/20/2014 02:59 PM, Kai Tietz wrote:
>>> So I suggest following change of passes.def:
>>>
>>> Index: passes.def
>>> ===================================================================
>>> --- passes.def  (Revision 211850)
>>> +++ passes.def  (Arbeitskopie)
>>> @@ -384,7 +384,6 @@ along with GCC; see the file COPYING3.  If not see
>>>            NEXT_PASS (pass_rtl_dse2);
>>>            NEXT_PASS (pass_stack_adjustments);
>>>            NEXT_PASS (pass_jump2);
>>> -         NEXT_PASS (pass_peephole2);
>>>            NEXT_PASS (pass_if_after_reload);
>>>            NEXT_PASS (pass_regrename);
>>>            NEXT_PASS (pass_cprop_hardreg);
>>> @@ -391,6 +390,7 @@ along with GCC; see the file COPYING3.  If not see
>>>            NEXT_PASS (pass_fast_rtl_dce);
>>>            NEXT_PASS (pass_duplicate_computed_gotos);
>>>            NEXT_PASS (pass_reorder_blocks);
>>> +         NEXT_PASS (pass_peephole2);
>>>            NEXT_PASS (pass_branch_target_load_optimize2);
>>>            NEXT_PASS (pass_leaf_regs);
>>>            NEXT_PASS (pass_split_before_sched2);
>>
>> Looks good to me.  I guess just keep an eye out for bug reports for other ports.
>
> Maybe put a comment here because it looks like a random placement to me
> which would be obvious to revert.  peepholing before if-after-reload sounds
> good anyway.
Definitely need a comment on the pass placement.

> Btw, there is now no DCE after peephole2?  Is peephole2 expected to
> cleanup after itself?
There were cases where we wanted to change the insns we would output to 
fit into the 4:1:1 issue model of the PPro, but to do so we needed to 
know what registers were live/dead so that we could rewrite the insns 
appropriately.  It didn't fit well into what we could do in the 
splitters and the old peephole ran too late.  Dead code wasn't ever 
really considered.  At least that's my recollection.  RTH might recall more.

I think it'd be worth an experiment here, but I think that can/should 
happen independently of Kai's patch.  Arguably the scheduler should have 
all the necessary dataflow information to quickly identify any dead code.

Jeff

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch i386]: Combine memory and indirect jump
  2014-06-23 16:22                                         ` Jeff Law
@ 2014-06-23 16:43                                           ` Richard Henderson
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Henderson @ 2014-06-23 16:43 UTC (permalink / raw)
  To: Jeff Law, Richard Biener; +Cc: Kai Tietz, Steven Bosscher, GCC Patches

On 06/23/2014 09:22 AM, Jeff Law wrote:
> On 06/23/14 08:32, Richard Biener wrote:
>> Btw, there is now no DCE after peephole2?  Is peephole2 expected to
>> cleanup after itself?
> There were cases where we wanted to change the insns we would output to fit
> into the 4:1:1 issue model of the PPro, but to do so we needed to know what
> registers were live/dead so that we could rewrite the insns appropriately.  It
> didn't fit well into what we could do in the splitters and the old peephole ran
> too late.  Dead code wasn't ever really considered.  At least that's my
> recollection.  RTH might recall more.

Yes, peep2 was about doing what the old "peep1" rtl->text transformation did,
but as an rtl->rtl transformation so we can expose the result to the scheduler.

It's expected that all dead code be gone before sched2, so that the scheduler
sees exactly what needs to be scheduled, and can bundle insns appropriately.

I believe the peep2 pass to also want dead code to be gone, so that it gets an
accurate picture of what registers are live or dead at any point.  As far as I
know, there are no current transformations that create new garbage.



r~

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2014-06-23 16:43 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-11  8:32 [patch i386]: Combine memory and indirect jump Kai Tietz
2014-06-11  9:58 ` Steven Bosscher
2014-06-11 10:37   ` Kai Tietz
2014-06-12 16:21     ` Kai Tietz
2014-06-12 18:53       ` Segher Boessenkool
2014-06-12 20:41         ` Kai Tietz
2014-06-12 21:06           ` Segher Boessenkool
2014-06-13 15:29             ` Jeff Law
2014-06-13 18:46               ` Segher Boessenkool
2014-06-12 21:47       ` David Wohlferd
2014-06-13  7:22         ` Kai Tietz
2014-06-13 15:36       ` Jeff Law
2014-06-13 15:56         ` Richard Henderson
2014-06-13 15:58           ` Jeff Law
2014-06-13 16:59             ` Kai Tietz
2014-06-17 19:26               ` Jeff Law
2014-06-17 20:35                 ` Kai Tietz
2014-06-18 19:36                   ` Jeff Law
2014-06-20 15:56                     ` Kai Tietz
2014-06-20 17:23                       ` Richard Henderson
2014-06-20 17:45                         ` Kai Tietz
2014-06-20 17:52                           ` Kai Tietz
2014-06-20 17:55                             ` Richard Henderson
2014-06-20 18:07                               ` Kai Tietz
2014-06-20 18:15                                 ` Jeff Law
2014-06-20 21:59                                   ` Kai Tietz
2014-06-23 14:13                                     ` Richard Henderson
2014-06-23 14:32                                       ` Richard Biener
2014-06-23 16:22                                         ` Jeff Law
2014-06-23 16:43                                           ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).