public inbox for gcc-help@gcc.gnu.org
* new ira optimization - adding a loop to ira
From: stefan @ 2019-09-13  8:48 UTC
  To: gcc-help

I'm working on a new optimization that gets rid of spilled temporaries (e.g.
those introduced by PRE) by using the source mem ref instead of a stack slot.

To do this, I added a loop to ira.c:ira():

  init_prune_stack_vars ();
  do
    {
#ifndef IRA_NO_OBSTACK
      gcc_obstack_init (&ira_obstack);
#endif
      bitmap_obstack_initialize (&ira_bitmap_obstack);

      ...

      ira_color ();

    }
  while (flag_prune_stack_vars && prune_stack_vars ());

To make this work, the prune_stack_vars function resets some data.
This mostly works, but on some source files it fails due to invalid
reg_equivs. This also happens if the optimizer does nothing and just
loops once.

Currently I'm calling this before looping again:

      regstat_free_n_sets_and_refs ();
      regstat_free_ri ();
      loop_optimizer_finalize ();
      free_dominance_info (CDI_DOMINATORS);

Any hint as to what I'm failing to reset?



* Re: new ira optimization - adding a loop to ira
From: Richard Sandiford @ 2019-09-13 10:15 UTC
  To: stefan; +Cc: gcc-help

I can't see anything obviously missing.  What kind of failure do
you see?  E.g. do you get an internal compiler error or does the
compiler generate incorrect code?

Do you see the failure on an in-tree test case?  FWIW, I just tried
looping like this locally and didn't see any failures for the tests
I tried.  But I was obviously testing without the new optimisation,
and so each loop iteration should just repeat what the previous one did.

Not related to the failure, but: do you do anything with the obstacks
when looping again?  Including the initialisations in the loop as above
would introduce a memory leak if you don't do anything to free the contents.
It'd probably be better to initialise outside the loop unless you're
really confident that no data is carried across iterations.
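
The leak concern can be illustrated with a toy model. This is only a sketch: ToyArena and both helper functions are hypothetical names invented for illustration and are not GCC's obstack API. The model only assumes that re-initialising an arena makes it forget chunks it handed out earlier, without freeing them.

```cpp
#include <cassert>

// Toy model of an obstack's lifetime rules; NOT GCC's obstack API.
struct ToyArena {
  int owned = 0;  // chunks the arena still knows about
  int *live;      // shared count of chunks not yet returned

  explicit ToyArena(int *live_counter) : live(live_counter) {}

  void init() { owned = 0; }                     // re-initialise: forgets old chunks
  void alloc() { ++owned; ++*live; }             // hand out one chunk
  void release() { *live -= owned; owned = 0; }  // free everything still owned
};

// Initialise inside the loop without freeing: chunks from earlier passes
// are no longer reachable when release() finally runs.
int leaked_when_init_inside_loop(int passes) {
  int live = 0;
  ToyArena arena(&live);
  for (int i = 0; i < passes; ++i) {
    arena.init();
    arena.alloc();
  }
  arena.release();
  return live;
}

// Initialise once before the loop: a single release() frees everything.
int leaked_when_init_outside_loop(int passes) {
  int live = 0;
  ToyArena arena(&live);
  arena.init();
  for (int i = 0; i < passes; ++i)
    arena.alloc();
  arena.release();
  return live;
}
```

In this model, every pass except the last one leaves its chunk behind when initialisation happens inside the loop, while initialising once leaves nothing behind.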

Thanks,
Richard


* Re: new ira optimization - adding a loop to ira
From: stefan @ 2019-09-13 10:44 UTC
  To: 'Richard Sandiford'; +Cc: gcc-help

Thanks for the ira_obstack hint. I will take care of this once the loop
mode is working; maybe I can start looping later, or I'll free the memory.

In reload, push_reload(...) raises an error at this assertion:

      gcc_assert (regno < FIRST_PSEUDO_REGISTER
		  || reg_renumber[regno] >= 0
		  || reg_equiv_constant (regno) == NULL_RTX);

I already know that reg_equiv_constant is the clause that fails, and that this
reg_equiv_constant is also set in the unpatched code.

So I am looking into why these additional reloads occur. There are additional
reloads if I enable the loop, interestingly for uids like 2, 3, 4 ...
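
To make the three-way condition of that assertion easier to read, here is a small model of it. This is a sketch under assumptions: ToyRegState, reload_operand_ok and the value of kFirstPseudoRegister are made up for illustration (the real FIRST_PSEUDO_REGISTER is target-specific), and reg_equiv_constant is reduced to a boolean flag per register.

```cpp
#include <cassert>
#include <vector>

// Assumed value for illustration only; the real one is target-specific.
const int kFirstPseudoRegister = 32;

struct ToyRegState {
  std::vector<int> reg_renumber;         // hard reg assigned to each regno, -1 if none
  std::vector<bool> has_equiv_constant;  // true if a constant equivalence is recorded
};

// Mirrors the shape of the quoted gcc_assert: a register operand is fine if
// it is a hard register, or a pseudo that got a hard register, or a pseudo
// with no constant equivalence recorded.
bool reload_operand_ok(const ToyRegState &s, int regno) {
  return regno < kFirstPseudoRegister
         || s.reg_renumber[regno] >= 0
         || !s.has_equiv_constant[regno];
}
```

In this model the only failing combination is a pseudo register that received no hard register and still carries a constant equivalence, which matches the observation that reg_equiv_constant is the clause that trips.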

Thanks,
Stefan



* Re: new ira optimization - adding a loop to ira
From: stefan @ 2019-09-13 10:58 UTC
  To: 'Richard Sandiford'; +Cc: gcc-help



The difference is the additional expr_list, which causes the reload:

(insn 2 10 3 2 (set (reg/f:SI 9 a1 [orig:46 this ] [46])
        (mem/f/c:SI (plus:SI (reg/f:SI 15 sp)
                (const_int 16 [0x10])) [178 this+0 S4 A16]))
engines/sci/engine/kpathing.cpp:758 40 {*movsi_m68k2}
     (expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 15 sp)
                (const_int 16 [0x10])) [178 this+0 S4 A16])
        (nil)))

=> I'll add some code to drop the expr_list from all insns...


* Re: new ira optimization - adding a loop to ira
From: stefan @ 2019-09-13 12:58 UTC
  To: 'Richard Sandiford'; +Cc: gcc-help

I took a wrong turn:

A normal IRA pass changes REG_EQUAL notes to REG_EQUIV notes. This sets reg_equiv and causes the failure during reload...

=> I added code to record all insn/REG_EQUAL-note pairs
=> and to restore these if the loop runs again, dropping the REG_EQUIV notes.

And the issue went away.
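
The record-and-restore idea can be sketched on a toy data structure. Everything here is a simplification for illustration: ToyInsn, NoteSnapshot and the three functions are hypothetical names, the note list of an insn is reduced to a single string, and toy_allocator_pass only stands in for the note promotion described above. It is not the actual patch.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy instruction: the note list is reduced to a single string.
struct ToyInsn {
  std::string note;  // e.g. "REG_EQUAL", "REG_EQUIV", or empty
};

typedef std::vector<std::pair<ToyInsn *, std::string> > NoteSnapshot;

// Record the note of every insn before the allocator runs.
NoteSnapshot record_notes(std::vector<ToyInsn> &insns) {
  NoteSnapshot snap;
  for (ToyInsn &insn : insns)
    snap.push_back(std::make_pair(&insn, insn.note));
  return snap;
}

// Stand-in for an allocator pass promoting REG_EQUAL notes to REG_EQUIV.
void toy_allocator_pass(std::vector<ToyInsn> &insns) {
  for (ToyInsn &insn : insns)
    if (insn.note == "REG_EQUAL")
      insn.note = "REG_EQUIV";
}

// Put the recorded notes back so the next pass sees the original input.
void restore_notes(const NoteSnapshot &snap) {
  for (const auto &entry : snap)
    entry.first->note = entry.second;
}
```

The snapshot stores pointers into the insn vector, so this sketch assumes the vector is not resized between recording and restoring.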

Plus, I moved the loop start further down, so that ira_obstack is only initialized once:

  init_prune_stack_vars ();
  do
    {
      init_reg_equiv ();


=> I can continue to work on the optimizer itself.

To provide an example:

void transformVector( double* restrict inputVector, double const transformMatrix[4][4],double* restrict outputVector)
{
    for(int k = 0; k < 900; k++)
    {
        double x = *inputVector++;
        double y = *inputVector++;
        double z = *inputVector++;

        for(int l = 0; l < 3; l++){
            double res =  transformMatrix[l][0] * x;
            res +=  transformMatrix[l][1] * y;
            res +=  transformMatrix[l][2] * z;
            res +=  transformMatrix[l][3];
            *outputVector++ = res;
        }
    }
}

m68k-amigaos-gcc -m68080 -O3 x.c -S

yields:

#NO_APP
        .text
        .align  2
        .globl  _transformVector
_transformVector:
        link.w a5,#-88
        move.l (16,a5),a0
        move.l (8,a5),a1
        fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp)
        movem.l a4/a3/a2,-(sp)
        move.l (12,a5),a2
        move.l (a2)+,(-16,a5)
        move.l (a2)+,(-12,a5)
        lea (21600,a0),a4
        fdmove.d (a2)+,fp7
        move.l (a2)+,(-8,a5)
        move.l (a2)+,(-4,a5)
        move.l (a2)+,(-24,a5)
        move.l (a2)+,(-20,a5)
        move.l (a2)+,(-32,a5)
        move.l (a2)+,(-28,a5)
        move.l (a2)+,(-40,a5)
        move.l (a2)+,(-36,a5)
        move.l (a2)+,(-48,a5)
        move.l (a2)+,(-44,a5)
        move.l (a2)+,(-56,a5)
        move.l (a2)+,(-52,a5)
        move.l (a2)+,(-64,a5)
        move.l (a2)+,(-60,a5)
        move.l (a2)+,(-72,a5)
        move.l (a2)+,(-68,a5)
        move.l (a2)+,(-80,a5)
        move.l (a2)+,(-76,a5)
        move.l (a2),(-88,a5)
        move.l (4,a2),(-84,a5)
.L2:
        fdmove.d (8,a1),fp0
        lea (24,a1),a3
        lea (24,a0),a2
        fdmove.d (a1),fp6
        move.l a3,a1
        fdmove.x fp0,fp4
        fdmove.d (-16,a5),fp2
        fdmul.x fp6,fp2
        fdmul.x fp7,fp4
...


And with the new option:

m68k-amigaos-gcc -m68080 -O3 x.c -S -fprune-stack-vars

_transformVector:
        link.w a5,#0
        move.l (16,a5),a1
        move.l (12,a5),a0
        fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp)
        movem.l a6/a4/a3/a2,-(sp)
        move.l (8,a5),a2
        lea (21600,a1),a6
.L2:
        fdmove.d (a2),fp2
        lea (24,a2),a4
        lea (24,a1),a3
        fdmove.d (8,a2),fp0
        move.l a4,a2
        fdmove.x fp2,fp3
        fdmove.x fp0,fp5
        fdmul.d (a0),fp3
        fdmul.d (8,a0),fp5
...

Btw: the code is not platform specific, so I guess it's generally useful.

Thanks
Stefan


* Re: new ira optimization - adding a loop to ira
From: stefan @ 2019-09-20 17:07 UTC
  To: gcc-help

OK, it's working now as it should. So if someone else needs to add a loop to ira(), insert it here:

  df_analyze ();

  if (flag_prune_stack_vars)
    init_prune_stack_vars ();
  do
    {
      init_reg_equiv ();
...
      ira_color ();
    }
  while (flag_prune_stack_vars && prune_stack_vars ());

And if you modify something in your loop, you need to do this inside your function:

  if (touched)
    {
      /* make stats visible. */
      if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL)
	calculate_allocation_cost ();

      /* the lifetime of all registers must be reconsidered - reset what's needed. */
      regstat_free_n_sets_and_refs ();
      regstat_free_ri ();
      loop_optimizer_finalize ();
      free_dominance_info (CDI_DOMINATORS);

      /* plus restore the REG_EQUAL notes which were recorded during init_prune_stack_vars() ! */
      std::vector<std::pair<rtx_insn *, rtx> >::iterator i2r = insn2req_equals.begin();
      for (;i2r != insn2req_equals.end(); ++i2r)
	{
	  rtx_insn * insn = i2r->first;
	  REG_NOTES (insn) = i2r->second;
	}

      df_mark_solutions_dirty();
      df_analyze ();
    }


Maybe I should provide it as a patch?

