From: <stefan@franke.ms>
To: <gcc-help@gcc.gnu.org>
Subject: AW: new ira optimization - adding a loop to ira
Date: Fri, 20 Sep 2019 17:07:00 -0000
Message-ID: <0f5d01d56fd5$da609ea0$8f21dbe0$@franke.ms>
In-Reply-To: <024e01d56a32$fd6a0330$f83e0990$@franke.ms>
> -----Original Message-----
> From: gcc-help-owner@gcc.gnu.org <gcc-help-owner@gcc.gnu.org> On
> Behalf Of stefan@franke.ms
> Sent: Friday, 13 September 2019 14:59
> To: 'Richard Sandiford' <richard.sandiford@arm.com>
> Cc: gcc-help@gcc.gnu.org
> Subject: AW: new ira optimization - adding a loop to ira
>
> > -----Original Message-----
> > From: stefan@franke.ms <stefan@franke.ms>
> > Sent: Friday, 13 September 2019 12:58
> > To: 'Richard Sandiford' <richard.sandiford@arm.com>
> > Cc: gcc-help@gcc.gnu.org
> > Subject: AW: new ira optimization - adding a loop to ira
> >
> > > -----Original Message-----
> > > From: stefan@franke.ms <stefan@franke.ms>
> > > Sent: Friday, 13 September 2019 12:45
> > > To: 'Richard Sandiford' <richard.sandiford@arm.com>
> > > Cc: gcc-help@gcc.gnu.org
> > > Subject: AW: new ira optimization - adding a loop to ira
> > >
> > > > -----Original Message-----
> > > > From: Richard Sandiford <richard.sandiford@arm.com>
> > > > Sent: Friday, 13 September 2019 12:16
> > > > To: stefan@franke.ms
> > > > Cc: gcc-help@gcc.gnu.org
> > > > Subject: Re: new ira optimization - adding a loop to ira
> > > >
> > > > <stefan@franke.ms> writes:
> > > > > I'm working on a new optimization to get rid of spilled tmp
> > > > > variables (e.g. introduced by pre) to use the source mem ref
> > > > > instead of a stack slot.
> > > > >
> > > > > To do this, I added a loop into ira.c:ira()
> > > > >
> > > > > init_prune_stack_vars ();
> > > > > do
> > > > > {
> > > > > #ifndef IRA_NO_OBSTACK
> > > > > gcc_obstack_init (&ira_obstack);
> > > > > #endif
> > > > > bitmap_obstack_initialize (&ira_bitmap_obstack);
> > > > >
> > > > > ...
> > > > >
> > > > > ira_color ();
> > > > >
> > > > > }
> > > > > while (flag_prune_stack_vars && prune_stack_vars ());
> > > > >
> > > > > To get it to work, the prune_stack_vars function resets some
> > > > > data. This is mostly working, but on some source files it fails
> > > > > due to invalid reg_equivs. This also happens if the optimizer
> > > > > does nothing and just loops once.
> > > > >
> > > > > Currently I'm calling this, before looping again
> > > > >
> > > > > regstat_free_n_sets_and_refs ();
> > > > > regstat_free_ri ();
> > > > > loop_optimizer_finalize ();
> > > > > free_dominance_info (CDI_DOMINATORS);
> > > > >
> > > > > Any hint as to what I'm failing to reset?
> > > >
> > > > I can't see anything obviously missing. What kind of failure do
> > > > you see? E.g. do you get an internal compiler error or does the
> > > > compiler generate incorrect code?
> > > >
> > > > Do you see the failure on an in-tree test case? FWIW, I just
> > > > tried looping like this locally and didn't see any failures for
> > > > the tests I tried. But I was obviously testing without the new
> > > > optimisation, and so each loop iteration should just repeat what
> > > > the previous one did.
> > > >
> > > > Not related to the failure, but: do you do anything with the
> > > > obstacks when looping again? Including the initialisations in the
> > > > loop as above would introduce a memory leak if you don't do
> > > > anything to free the contents. It'd probably be better to
> > > > initialise outside the loop unless you're really confident that
> > > > no data is carried across iterations.
> > > >
> > > > Thanks,
> > > > Richard
> > >
> > > Thanks for the ira_obstack hint. I will take care of this once the
> > > loop mode is working; maybe I can start the loop later, or I'll free
> > > the memory.
> > >
> > > In reload, this assertion in push_reload(...) fails:
> > >
> > > gcc_assert (regno < FIRST_PSEUDO_REGISTER
> > > || reg_renumber[regno] >= 0
> > > || reg_equiv_constant (regno) == NULL_RTX);
> > >
> > > I already know that it's reg_equiv_constant, and that this
> > > reg_equiv_constant is also set in the unpatched code.
> > >
> > > So I am looking into why these additional reloads occur. They
> > > appear only if I enable the loop, interestingly for uids like 2, 3, 4 ...
> > >
> > > Thanks,
> > > Stefan
> >
> >
> > The difference is the additional expr_list, which causes the reload:
> >
> > (insn 2 10 3 2 (set (reg/f:SI 9 a1 [orig:46 this ] [46])
> > (mem/f/c:SI (plus:SI (reg/f:SI 15 sp)
> > (const_int 16 [0x10])) [178 this+0 S4 A16]))
> > engines/sci/engine/kpathing.cpp:758 40 {*movsi_m68k2}
> > (expr_list:REG_EQUIV (mem/f/c:SI (plus:SI (reg/f:SI 15 sp)
> > (const_int 16 [0x10])) [178 this+0 S4 A16])
> > (nil)))
> >
> > => I'll add some code to drop the expr_list from all insns...
>
> I took a wrong turn:
>
> A normal ira pass changes REG_EQUAL notes into REG_EQUIV notes. This
> sets the reg_equiv and causes the failure during reload...
>
> => I added code to record all insn/REG_EQUAL-note pairs and to restore
> them if the loop is run again, dropping the REG_EQUIV notes.
>
> And this issue went away.
>
> Plus I moved the loop start further below, so the ira_obstack is only
> initialized once:
>
> init_prune_stack_vars ();
> do
> {
> init_reg_equiv ();
>
>
> => I can continue to work on the optimizer itself.
>
> To provide an example:
>
> void transformVector( double* restrict inputVector, double const
> transformMatrix[4][4],double* restrict outputVector) {
> for(int k = 0; k < 900; k++)
> {
> double x = *inputVector++;
> double y = *inputVector++;
> double z = *inputVector++;
>
> for(int l = 0; l < 3; l++){
> double res = transformMatrix[l][0] * x;
> res += transformMatrix[l][1] * y;
> res += transformMatrix[l][2] * z;
> res += transformMatrix[l][3];
> *outputVector++ = res;
> }
> }
> }
>
> m68k-amigaos-gcc -m68080 -O3 x.c -S
>
> yields:
>
> #NO_APP
> .text
> .align 2
> .globl _transformVector
> _transformVector:
> link.w a5,#-88
> move.l (16,a5),a0
> move.l (8,a5),a1
> fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp)
> movem.l a4/a3/a2,-(sp)
> move.l (12,a5),a2
> move.l (a2)+,(-16,a5)
> move.l (a2)+,(-12,a5)
> lea (21600,a0),a4
> fdmove.d (a2)+,fp7
> move.l (a2)+,(-8,a5)
> move.l (a2)+,(-4,a5)
> move.l (a2)+,(-24,a5)
> move.l (a2)+,(-20,a5)
> move.l (a2)+,(-32,a5)
> move.l (a2)+,(-28,a5)
> move.l (a2)+,(-40,a5)
> move.l (a2)+,(-36,a5)
> move.l (a2)+,(-48,a5)
> move.l (a2)+,(-44,a5)
> move.l (a2)+,(-56,a5)
> move.l (a2)+,(-52,a5)
> move.l (a2)+,(-64,a5)
> move.l (a2)+,(-60,a5)
> move.l (a2)+,(-72,a5)
> move.l (a2)+,(-68,a5)
> move.l (a2)+,(-80,a5)
> move.l (a2)+,(-76,a5)
> move.l (a2),(-88,a5)
> move.l (4,a2),(-84,a5)
> .L2:
> fdmove.d (8,a1),fp0
> lea (24,a1),a3
> lea (24,a0),a2
> fdmove.d (a1),fp6
> move.l a3,a1
> fdmove.x fp0,fp4
> fdmove.d (-16,a5),fp2
> fdmul.x fp6,fp2
> fdmul.x fp7,fp4
> ...
>
>
> And with the new option:
>
> m68k-amigaos-gcc -m68080 -O3 x.c -S -fprune-stack-vars
>
> _transformVector:
> link.w a5,#0
> move.l (16,a5),a1
> move.l (12,a5),a0
> fmovem fp2/fp3/fp4/fp5/fp6/fp7,-(sp)
> movem.l a6/a4/a3/a2,-(sp)
> move.l (8,a5),a2
> lea (21600,a1),a6
> .L2:
> fdmove.d (a2),fp2
> lea (24,a2),a4
> lea (24,a1),a3
> fdmove.d (8,a2),fp0
> move.l a4,a2
> fdmove.x fp2,fp3
> fdmove.x fp0,fp5
> fdmul.d (a0),fp3
> fdmul.d (8,a0),fp5
> ...
>
> Btw: the code is not platform specific -> guess it's generally useful
>
> Thanks
> Stefan
Ok, it's working now as it should. So if someone else needs to add a loop to ira(), insert it here:
  df_analyze ();
  if (flag_prune_stack_vars)
    init_prune_stack_vars ();
  do
    {
      init_reg_equiv ();
      ...
      ira_color ();
    }
  while (flag_prune_stack_vars && prune_stack_vars ());
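The control flow of that loop can be sketched outside GCC. This is only a toy model under stated assumptions: ToyAllocation, toy_allocate, toy_prune and run_until_stable are hypothetical stand-ins, not GCC functions, and the real prune_stack_vars () does far more than clear a flag. It shows only that the body runs exactly once when the flag is off, and otherwise repeats until the prune step reports that it changed nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the allocator state; NOT a GCC type.
struct ToyAllocation {
  std::vector<bool> spilled;  // spilled[i]: pseudo i ended up in a stack slot
  int passes = 0;             // number of allocation passes run so far
};

// Stand-in for the loop body (init_reg_equiv () ... ira_color ()).
void toy_allocate(ToyAllocation &a) { ++a.passes; }

// Stand-in for prune_stack_vars (): handle at most one spilled pseudo per
// call and report whether anything was changed.
bool toy_prune(ToyAllocation &a) {
  for (std::size_t i = 0; i < a.spilled.size(); ++i) {
    if (a.spilled[i]) {
      a.spilled[i] = false;
      return true;  // something changed, so allocation must be redone
    }
  }
  return false;  // nothing left to prune, the loop can stop
}

// Same shape as the do/while above: always allocate once, then repeat
// only while the option is enabled and the prune step changed something.
int run_until_stable(ToyAllocation &a, bool flag_prune) {
  do {
    toy_allocate(a);
  } while (flag_prune && toy_prune(a));
  assert(a.passes >= 1);
  return a.passes;
}
```

Because the condition short-circuits on the flag, the prune step is never called when the option is off, so the unpatched behaviour is a single pass.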
And if you modify something in your loop, you need to do this inside your function:
  if (touched)
    {
      /* Make stats visible.  */
      if (internal_flag_ira_verbose > 0 && ira_dump_file != NULL)
        calculate_allocation_cost ();
      /* The lifetime of all registers must be reconsidered - reset what's needed.  */
      regstat_free_n_sets_and_refs ();
      regstat_free_ri ();
      loop_optimizer_finalize ();
      free_dominance_info (CDI_DOMINATORS);
      /* Plus restore the REG_EQUAL notes which were recorded during init_prune_stack_vars ()!  */
      std::vector<std::pair<rtx_insn *, rtx> >::iterator i2r = insn2req_equals.begin ();
      for (; i2r != insn2req_equals.end (); ++i2r)
        {
          rtx_insn *insn = i2r->first;
          REG_NOTES (insn) = i2r->second;
        }
      df_mark_solutions_dirty ();
      df_analyze ();
    }
Maybe I should provide it as a patch?
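Until then, here is a minimal self-contained sketch of the record/restore idea for the notes, for anyone who wants to see the pattern in isolation. It is not the patch and does not use GCC types: ToyInsn, NoteSnapshot, record_notes and restore_notes are hypothetical names, and a plain string stands in for the note list that REG_NOTES (insn) would hold.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Stand-ins, NOT GCC types: ToyInsn plays the role of rtx_insn, and its
// `notes` string plays the role of REG_NOTES (insn).
struct ToyInsn {
  int uid;
  std::string notes;
};

// One (insn, saved notes) pair per instruction, like insn2req_equals above.
using NoteSnapshot = std::vector<std::pair<ToyInsn *, std::string>>;

// Record the notes of every insn before the first allocation pass.
NoteSnapshot record_notes(std::vector<ToyInsn> &insns) {
  NoteSnapshot snap;
  for (ToyInsn &insn : insns)
    snap.emplace_back(&insn, insn.notes);
  return snap;
}

// Put the recorded notes back before looping again, which discards
// whatever the previous pass turned them into.
void restore_notes(const NoteSnapshot &snap) {
  for (const auto &entry : snap) {
    assert(entry.first != nullptr);
    entry.first->notes = entry.second;
  }
}
```

The snapshot stores raw pointers, so in this toy the vector of insns must not be resized between recording and restoring.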