public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Restrict LOOP_ALIGN to loop headers only.
@ 2019-07-09  9:14 Martin Liška
  2019-07-09  9:56 ` Jan Hubicka
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Liška @ 2019-07-09  9:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jan Hubicka

[-- Attachment #1: Type: text/plain, Size: 561 bytes --]

Hi.

I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the
basic blocks for which it makes the biggest sense. I quite some binary
size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly
positive.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-07-09  Martin Liska  <mliska@suse.cz>

	* final.c (compute_alignments): Apply the LOOP_ALIGN only
	to basic blocks that all loop headers.
---
 gcc/final.c | 1 +
 1 file changed, 1 insertion(+)



[-- Attachment #2: 0001-Restrict-LOOP_ALIGN-to-loop-headers-only.patch --]
[-- Type: text/x-patch, Size: 423 bytes --]

diff --git a/gcc/final.c b/gcc/final.c
index fefc4874b24..ce2678da988 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -739,6 +739,7 @@ compute_alignments (void)
       if (has_fallthru
 	  && !(single_succ_p (bb)
 	       && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
+	  && bb->loop_father->header == bb
 	  && optimize_bb_for_speed_p (bb)
 	  && branch_count + fallthru_count > count_threshold
 	  && (branch_count


[-- Attachment #3: lnt-speed.pdf.bz2 --]
[-- Type: application/x-bzip, Size: 76820 bytes --]

[-- Attachment #4: lnt-size.pdf.bz2 --]
[-- Type: application/x-bzip, Size: 102810 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-07-09  9:14 [PATCH] Restrict LOOP_ALIGN to loop headers only Martin Liška
@ 2019-07-09  9:56 ` Jan Hubicka
  2019-07-09 10:11   ` Martin Liška
  2019-07-09 10:24   ` Richard Biener
  0 siblings, 2 replies; 17+ messages in thread
From: Jan Hubicka @ 2019-07-09  9:56 UTC (permalink / raw)
  To: Martin Liška; +Cc: gcc-patches

> Hi.
> 
> I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the
> basic blocks for which it makes the biggest sense. I quite some binary
> size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly
> positive.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
The original idea of distinction between jump alignment and loop
alignment was that they have two basic meanings:
 1) jump alignment is there to avoid jumping just to the end of decode
 window (if the window is aligned) so CPU will get stuck after reaching
 the jump and also to possibly reduce code cache polution by populating
 by code that is not executed
 2) loop alignment aims to fit loop in as few cache windows as possible

Now if you have loop laid in a way that header of loop is not first
basic block, 2) IMO still apply.  I.e.

		jump loop
:loopback
		loop body
:loop
		if cond jump to loopback

So dropping loop alignment for those does not seem to make much sense
from high level.  We may want to have differnt alignment for loops
starting by header and loops starting in the middle, but I still liked
more your patch which did bundles for loops.

modern x86 chips are not very good testing targets on it.  I guess
generic changes to alignment needs to be tested on other chips too.

Honza
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
> 2019-07-09  Martin Liska  <mliska@suse.cz>
> 
> 	* final.c (compute_alignments): Apply the LOOP_ALIGN only
> 	to basic blocks that all loop headers.
> ---
>  gcc/final.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> 

> diff --git a/gcc/final.c b/gcc/final.c
> index fefc4874b24..ce2678da988 100644
> --- a/gcc/final.c
> +++ b/gcc/final.c
> @@ -739,6 +739,7 @@ compute_alignments (void)
>        if (has_fallthru
>  	  && !(single_succ_p (bb)
>  	       && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
> +	  && bb->loop_father->header == bb
>  	  && optimize_bb_for_speed_p (bb)
>  	  && branch_count + fallthru_count > count_threshold
>  	  && (branch_count
> 



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-07-09  9:56 ` Jan Hubicka
@ 2019-07-09 10:11   ` Martin Liška
  2019-07-09 10:24   ` Richard Biener
  1 sibling, 0 replies; 17+ messages in thread
From: Martin Liška @ 2019-07-09 10:11 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc-patches

On 7/9/19 11:56 AM, Jan Hubicka wrote:
>> Hi.
>>
>> I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the
>> basic blocks for which it makes the biggest sense. I quite some binary
>> size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly
>> positive.
>>
>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> Ready to be installed?
> The original idea of distinction between jump alignment and loop
> alignment was that they have two basic meanings:
>  1) jump alignment is there to avoid jumping just to the end of decode
>  window (if the window is aligned) so CPU will get stuck after reaching
>  the jump and also to possibly reduce code cache polution by populating
>  by code that is not executed
>  2) loop alignment aims to fit loop in as few cache windows as possible
> 
> Now if you have loop laid in a way that header of loop is not first
> basic block, 2) IMO still apply.  I.e.
> 
> 		jump loop
> :loopback
> 		loop body
> :loop
> 		if cond jump to loopback
> 
> So dropping loop alignment for those does not seem to make much sense
> from high level.  We may want to have differnt alignment for loops
> starting by header and loops starting in the middle,

That's quite complicated condition, I would not introduce a new alignment.

> but I still liked
> more your patch which did bundles for loops.

The patch caused regression for quite some benchmarks and has it's own
problems (need of a recent GAS, not doing a bundle for bundles that
can't fit in a single bundle window). For that reasons, I decided
to not work on it any longer.

Martin

> 
> modern x86 chips are not very good testing targets on it.  I guess
> generic changes to alignment needs to be tested on other chips too.
> 
> Honza
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2019-07-09  Martin Liska  <mliska@suse.cz>
>>
>> 	* final.c (compute_alignments): Apply the LOOP_ALIGN only
>> 	to basic blocks that all loop headers.
>> ---
>>  gcc/final.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>>
> 
>> diff --git a/gcc/final.c b/gcc/final.c
>> index fefc4874b24..ce2678da988 100644
>> --- a/gcc/final.c
>> +++ b/gcc/final.c
>> @@ -739,6 +739,7 @@ compute_alignments (void)
>>        if (has_fallthru
>>  	  && !(single_succ_p (bb)
>>  	       && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
>> +	  && bb->loop_father->header == bb
>>  	  && optimize_bb_for_speed_p (bb)
>>  	  && branch_count + fallthru_count > count_threshold
>>  	  && (branch_count
>>
> 
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-07-09  9:56 ` Jan Hubicka
  2019-07-09 10:11   ` Martin Liška
@ 2019-07-09 10:24   ` Richard Biener
  2019-07-09 10:26     ` Richard Biener
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Biener @ 2019-07-09 10:24 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Martin Liška, GCC Patches

On Tue, Jul 9, 2019 at 11:56 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > Hi.
> >
> > I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the
> > basic blocks for which it makes the biggest sense. I quite some binary
> > size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly
> > positive.
> >
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >
> > Ready to be installed?
> The original idea of distinction between jump alignment and loop
> alignment was that they have two basic meanings:
>  1) jump alignment is there to avoid jumping just to the end of decode
>  window (if the window is aligned) so CPU will get stuck after reaching
>  the jump and also to possibly reduce code cache polution by populating
>  by code that is not executed
>  2) loop alignment aims to fit loop in as few cache windows as possible
>
> Now if you have loop laid in a way that header of loop is not first
> basic block, 2) IMO still apply.  I.e.
>
>                 jump loop
> :loopback
>                 loop body
> :loop
>                 if cond jump to loopback
>
> So dropping loop alignment for those does not seem to make much sense
> from high level.  We may want to have differnt alignment for loops
> starting by header and loops starting in the middle, but I still liked
> more your patch which did bundles for loops.
>
> modern x86 chips are not very good testing targets on it.  I guess
> generic changes to alignment needs to be tested on other chips too.
>
> Honza
> > Thanks,
> > Martin
> >
> > gcc/ChangeLog:
> >
> > 2019-07-09  Martin Liska  <mliska@suse.cz>
> >
> >       * final.c (compute_alignments): Apply the LOOP_ALIGN only
> >       to basic blocks that all loop headers.
> > ---
> >  gcc/final.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> >
>
> > diff --git a/gcc/final.c b/gcc/final.c
> > index fefc4874b24..ce2678da988 100644
> > --- a/gcc/final.c
> > +++ b/gcc/final.c
> > @@ -739,6 +739,7 @@ compute_alignments (void)
> >        if (has_fallthru
> >         && !(single_succ_p (bb)
> >              && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
> > +       && bb->loop_father->header == bb

I agree that the above is the wrong condition - but I'm not sure we
only end up using LOOP_ALIGN for blocks reached by a DFS_BACK
edge.  Note that DFS_BACK would have to be applied considering
the current CFG layout, simply doing mark_dfs_back_edges doesn't
work (we're in CFG layout mode here, no?).  Eventually the code
counting brances effectively already does this though.

The odd thing is that we apply LOOP_ALIGN only to blocks that
have a fallthru incoming edge.  I don't see Honzas example
above having one.

> >         && optimize_bb_for_speed_p (bb)
> >         && branch_count + fallthru_count > count_threshold
> >         && (branch_count
> >
>
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-07-09 10:24   ` Richard Biener
@ 2019-07-09 10:26     ` Richard Biener
  2019-07-09 10:39       ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2019-07-09 10:26 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Martin Liška, GCC Patches

On Tue, Jul 9, 2019 at 12:22 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Jul 9, 2019 at 11:56 AM Jan Hubicka <hubicka@ucw.cz> wrote:
> >
> > > Hi.
> > >
> > > I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the
> > > basic blocks for which it makes the biggest sense. I quite some binary
> > > size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly
> > > positive.
> > >
> > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> > >
> > > Ready to be installed?
> > The original idea of distinction between jump alignment and loop
> > alignment was that they have two basic meanings:
> >  1) jump alignment is there to avoid jumping just to the end of decode
> >  window (if the window is aligned) so CPU will get stuck after reaching
> >  the jump and also to possibly reduce code cache polution by populating
> >  by code that is not executed
> >  2) loop alignment aims to fit loop in as few cache windows as possible
> >
> > Now if you have loop laid in a way that header of loop is not first
> > basic block, 2) IMO still apply.  I.e.
> >
> >                 jump loop
> > :loopback
> >                 loop body
> > :loop
> >                 if cond jump to loopback
> >
> > So dropping loop alignment for those does not seem to make much sense
> > from high level.  We may want to have differnt alignment for loops
> > starting by header and loops starting in the middle, but I still liked
> > more your patch which did bundles for loops.
> >
> > modern x86 chips are not very good testing targets on it.  I guess
> > generic changes to alignment needs to be tested on other chips too.
> >
> > Honza
> > > Thanks,
> > > Martin
> > >
> > > gcc/ChangeLog:
> > >
> > > 2019-07-09  Martin Liska  <mliska@suse.cz>
> > >
> > >       * final.c (compute_alignments): Apply the LOOP_ALIGN only
> > >       to basic blocks that all loop headers.
> > > ---
> > >  gcc/final.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > >
> >
> > > diff --git a/gcc/final.c b/gcc/final.c
> > > index fefc4874b24..ce2678da988 100644
> > > --- a/gcc/final.c
> > > +++ b/gcc/final.c
> > > @@ -739,6 +739,7 @@ compute_alignments (void)
> > >        if (has_fallthru
> > >         && !(single_succ_p (bb)
> > >              && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
> > > +       && bb->loop_father->header == bb
>
> I agree that the above is the wrong condition - but I'm not sure we
> only end up using LOOP_ALIGN for blocks reached by a DFS_BACK
> edge.  Note that DFS_BACK would have to be applied considering
> the current CFG layout, simply doing mark_dfs_back_edges doesn't
> work (we're in CFG layout mode here, no?).

So a "backedge" in this sense would be e->dest->index < e->src->index.
No?

> Eventually the code
> counting brances effectively already does this though.
>
> The odd thing is that we apply LOOP_ALIGN only to blocks that
> have a fallthru incoming edge.  I don't see Honzas example
> above having one.
>
> > >         && optimize_bb_for_speed_p (bb)
> > >         && branch_count + fallthru_count > count_threshold
> > >         && (branch_count
> > >
> >
> >
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-07-09 10:26     ` Richard Biener
@ 2019-07-09 10:39       ` Richard Biener
  2019-07-09 13:19         ` Michael Matz
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2019-07-09 10:39 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Martin Liška, GCC Patches

On Tue, Jul 9, 2019 at 12:23 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Jul 9, 2019 at 12:22 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Tue, Jul 9, 2019 at 11:56 AM Jan Hubicka <hubicka@ucw.cz> wrote:
> > >
> > > > Hi.
> > > >
> > > > I'm suggesting to restrict LOOP_ALIGN to only loop headers. That are the
> > > > basic blocks for which it makes the biggest sense. I quite some binary
> > > > size reductions on SPEC2006 and SPEC2017. Speed numbers are also slightly
> > > > positive.
> > > >
> > > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> > > >
> > > > Ready to be installed?
> > > The original idea of distinction between jump alignment and loop
> > > alignment was that they have two basic meanings:
> > >  1) jump alignment is there to avoid jumping just to the end of decode
> > >  window (if the window is aligned) so CPU will get stuck after reaching
> > >  the jump and also to possibly reduce code cache polution by populating
> > >  by code that is not executed
> > >  2) loop alignment aims to fit loop in as few cache windows as possible
> > >
> > > Now if you have loop laid in a way that header of loop is not first
> > > basic block, 2) IMO still apply.  I.e.
> > >
> > >                 jump loop
> > > :loopback
> > >                 loop body
> > > :loop
> > >                 if cond jump to loopback
> > >
> > > So dropping loop alignment for those does not seem to make much sense
> > > from high level.  We may want to have differnt alignment for loops
> > > starting by header and loops starting in the middle, but I still liked
> > > more your patch which did bundles for loops.
> > >
> > > modern x86 chips are not very good testing targets on it.  I guess
> > > generic changes to alignment needs to be tested on other chips too.
> > >
> > > Honza
> > > > Thanks,
> > > > Martin
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 2019-07-09  Martin Liska  <mliska@suse.cz>
> > > >
> > > >       * final.c (compute_alignments): Apply the LOOP_ALIGN only
> > > >       to basic blocks that all loop headers.
> > > > ---
> > > >  gcc/final.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > >
> > >
> > > > diff --git a/gcc/final.c b/gcc/final.c
> > > > index fefc4874b24..ce2678da988 100644
> > > > --- a/gcc/final.c
> > > > +++ b/gcc/final.c
> > > > @@ -739,6 +739,7 @@ compute_alignments (void)
> > > >        if (has_fallthru
> > > >         && !(single_succ_p (bb)
> > > >              && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
> > > > +       && bb->loop_father->header == bb
> >
> > I agree that the above is the wrong condition - but I'm not sure we
> > only end up using LOOP_ALIGN for blocks reached by a DFS_BACK
> > edge.  Note that DFS_BACK would have to be applied considering
> > the current CFG layout, simply doing mark_dfs_back_edges doesn't
> > work (we're in CFG layout mode here, no?).
>
> So a "backedge" in this sense would be e->dest->index < e->src->index.
> No?

To me the following would make sense.

Index: gcc/final.c
===================================================================
--- gcc/final.c (revision 273294)
+++ gcc/final.c (working copy)
@@ -669,6 +669,7 @@ compute_alignments (void)
     {
       rtx_insn *label = BB_HEAD (bb);
       bool has_fallthru = 0;
+      bool has_backedge = 0;
       edge e;
       edge_iterator ei;

@@ -693,6 +694,8 @@ compute_alignments (void)
            has_fallthru = 1, fallthru_count += e->count ();
          else
            branch_count += e->count ();
+         if (e->src->index > bb->index)
+           has_backedge = 1;
        }
       if (dump_file)
        {
@@ -736,7 +739,7 @@ compute_alignments (void)
        }
       /* In case block is frequent and reached mostly by non-fallthru edge,
         align it.  It is most likely a first block of loop.  */
-      if (has_fallthru
+      if (has_backedge
          && !(single_succ_p (bb)
               && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
          && optimize_bb_for_speed_p (bb)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-07-09 10:39       ` Richard Biener
@ 2019-07-09 13:19         ` Michael Matz
  2019-07-09 16:41           ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Matz @ 2019-07-09 13:19 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, Martin Liška, GCC Patches

Hi,

On Tue, 9 Jul 2019, Richard Biener wrote:

> > So a "backedge" in this sense would be e->dest->index < e->src->index.
> > No?
> 
> To me the following would make sense.

The basic block index is not a DFS index, so no, that's not a test for 
backedge.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-07-09 13:19         ` Michael Matz
@ 2019-07-09 16:41           ` Richard Biener
  2019-07-10 12:46             ` Michael Matz
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2019-07-09 16:41 UTC (permalink / raw)
  To: Michael Matz; +Cc: Jan Hubicka, Martin Liška, GCC Patches

On July 9, 2019 3:10:19 PM GMT+02:00, Michael Matz <matz@suse.de> wrote:
>Hi,
>
>On Tue, 9 Jul 2019, Richard Biener wrote:
>
>> > So a "backedge" in this sense would be e->dest->index <
>e->src->index.
>> > No?
>> 
>> To me the following would make sense.
>
>The basic block index is not a DFS index, so no, that's not a test for 
>backedge.

I think in CFG RTL mode the BB index designates the order of the BBs in the object file? So this is a way to identify backwards jumps? 

Richard. 

>
>Ciao,
>Michael.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-07-09 16:41           ` Richard Biener
@ 2019-07-10 12:46             ` Michael Matz
  2019-07-10 15:52               ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Matz @ 2019-07-10 12:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jan Hubicka, Martin Liška, GCC Patches

Hi,

On Tue, 9 Jul 2019, Richard Biener wrote:

> >The basic block index is not a DFS index, so no, that's not a test for 
> >backedge.
> 
> I think in CFG RTL mode the BB index designates the order of the BBs in 
> the object file? So this is a way to identify backwards jumps?

Even if it means a backwards jump (and that's not always the case, the 
insns are emitted by following the NEXT_INSN links, without a CFG, and 
that all happens after machine-dependend reorg, and going out of cfg 
layout might link insn together even from high index BBs to low index BBs 
(e.g. because of fall-through)), that's still not a backedge in the 
general case.  If a heuristic is enough here it might be okay, though.

OTOH, as here a CFG still exists, why not simply rely on a proper DFS 
marking backedges?


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-07-10 12:46             ` Michael Matz
@ 2019-07-10 15:52               ` Richard Biener
  2019-07-11 10:18                 ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2019-07-10 15:52 UTC (permalink / raw)
  To: Michael Matz; +Cc: Jan Hubicka, Martin Liška, GCC Patches

On July 10, 2019 2:11:17 PM GMT+02:00, Michael Matz <matz@suse.de> wrote:
>Hi,
>
>On Tue, 9 Jul 2019, Richard Biener wrote:
>
>> >The basic block index is not a DFS index, so no, that's not a test
>for 
>> >backedge.
>> 
>> I think in CFG RTL mode the BB index designates the order of the BBs
>in 
>> the object file? So this is a way to identify backwards jumps?
>
>Even if it means a backwards jump (and that's not always the case, the 
>insns are emitted by following the NEXT_INSN links, without a CFG, and 
>that all happens after machine-dependend reorg, and going out of cfg 
>layout might link insn together even from high index BBs to low index
>BBs 
>(e.g. because of fall-through)), that's still not a backedge in the 
>general case.  If a heuristic is enough here it might be okay, though.
>
>OTOH, as here a CFG still exists, why not simply rely on a proper DFS 
>marking backedges?

Because proper backedges is not what we want here, see honzas example. 

So I'm second-guessing why we have different LOOP_ALIGN and when it makes sense to apply. 

Richard. 

>
>Ciao,
>Michael.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-07-10 15:52               ` Richard Biener
@ 2019-07-11 10:18                 ` Richard Biener
  2019-08-08  9:11                   ` Martin Liška
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2019-07-11 10:18 UTC (permalink / raw)
  To: Michael Matz; +Cc: Jan Hubicka, Martin Liška, GCC Patches

On Wed, Jul 10, 2019 at 5:52 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On July 10, 2019 2:11:17 PM GMT+02:00, Michael Matz <matz@suse.de> wrote:
> >Hi,
> >
> >On Tue, 9 Jul 2019, Richard Biener wrote:
> >
> >> >The basic block index is not a DFS index, so no, that's not a test
> >for
> >> >backedge.
> >>
> >> I think in CFG RTL mode the BB index designates the order of the BBs
> >in
> >> the object file? So this is a way to identify backwards jumps?
> >
> >Even if it means a backwards jump (and that's not always the case, the
> >insns are emitted by following the NEXT_INSN links, without a CFG, and
> >that all happens after machine-dependend reorg, and going out of cfg
> >layout might link insn together even from high index BBs to low index
> >BBs
> >(e.g. because of fall-through)), that's still not a backedge in the
> >general case.  If a heuristic is enough here it might be okay, though.
> >
> >OTOH, as here a CFG still exists, why not simply rely on a proper DFS
> >marking backedges?
>
> Because proper backedges is not what we want here, see honzas example.
>
> So I'm second-guessing why we have different LOOP_ALIGN and when it makes sense to apply.

So docs have

@defmac JUMP_ALIGN (@var{label})
The alignment (log base 2) to put in front of @var{label}, which is
a common destination of jumps and has no fallthru incoming edge.

...

@defmac LOOP_ALIGN (@var{label})
The alignment (log base 2) to put in front of @var{label} that heads
a frequently executed basic block (usually the header of a loop).

so I would expect the alignment pass to have

      if ( (branch_count > count_threshold
              || (bb->count > bb->prev_bb->count.apply_scale (10, 1)
                  && (bb->prev_bb->count
                      <= ENTRY_BLOCK_PTR_FOR_FN (cfun)
                           ->count.apply_scale (1, 2)))))
        {
          align_flags alignment = has_fallthru ? JUMP_ALIGN (label) :
LOOP_ALIGN (label);
          if (dump_file)
            fprintf (dump_file, "  jump alignment added.\n");
          max_alignment = align_flags::max (max_alignment, alignment);
        }

instead of the two different conditions?  Or the JUMP_ALIGN case
computing "common destination" instead of "frequently executed"
somehow but I think it makes sense that those are actually the same
here (frequently executed).  It oddly looks at prev_bb and is not
guarded with optimize_bb_for_speed_p at all so this all is
full of heuristics and changing anything here just based on x86
measurements is surely going to cause issues for targets more
sensitive to (mis-)alignment.

Richard.

> Richard.
>
> >
> >Ciao,
> >Michael.
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-07-11 10:18                 ` Richard Biener
@ 2019-08-08  9:11                   ` Martin Liška
  2019-08-08 13:02                     ` Michael Matz
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Liška @ 2019-08-08  9:11 UTC (permalink / raw)
  To: Richard Biener, Michael Matz; +Cc: Jan Hubicka, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3087 bytes --]

On 7/11/19 11:42 AM, Richard Biener wrote:
> On Wed, Jul 10, 2019 at 5:52 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On July 10, 2019 2:11:17 PM GMT+02:00, Michael Matz <matz@suse.de> wrote:
>>> Hi,
>>>
>>> On Tue, 9 Jul 2019, Richard Biener wrote:
>>>
>>>>> The basic block index is not a DFS index, so no, that's not a test
>>> for
>>>>> backedge.
>>>>
>>>> I think in CFG RTL mode the BB index designates the order of the BBs
>>> in
>>>> the object file? So this is a way to identify backwards jumps?
>>>
>>> Even if it means a backwards jump (and that's not always the case, the
>>> insns are emitted by following the NEXT_INSN links, without a CFG, and
>>> that all happens after machine-dependend reorg, and going out of cfg
>>> layout might link insn together even from high index BBs to low index
>>> BBs
>>> (e.g. because of fall-through)), that's still not a backedge in the
>>> general case.  If a heuristic is enough here it might be okay, though.
>>>
>>> OTOH, as here a CFG still exists, why not simply rely on a proper DFS
>>> marking backedges?
>>
>> Because proper backedges is not what we want here, see honzas example.
>>
>> So I'm second-guessing why we have different LOOP_ALIGN and when it makes sense to apply.
> 
> So docs have
> 
> @defmac JUMP_ALIGN (@var{label})
> The alignment (log base 2) to put in front of @var{label}, which is
> a common destination of jumps and has no fallthru incoming edge.
> 
> ...
> 
> @defmac LOOP_ALIGN (@var{label})
> The alignment (log base 2) to put in front of @var{label} that heads
> a frequently executed basic block (usually the header of a loop).
> 
> so I would expect the alignment pass to have
> 
>       if ( (branch_count > count_threshold
>               || (bb->count > bb->prev_bb->count.apply_scale (10, 1)
>                   && (bb->prev_bb->count
>                       <= ENTRY_BLOCK_PTR_FOR_FN (cfun)
>                            ->count.apply_scale (1, 2)))))
>         {
>           align_flags alignment = has_fallthru ? JUMP_ALIGN (label) :
> LOOP_ALIGN (label);
>           if (dump_file)
>             fprintf (dump_file, "  jump alignment added.\n");
>           max_alignment = align_flags::max (max_alignment, alignment);
>         }

Sorry for the delay.

> 
> instead of the two different conditions?  Or the JUMP_ALIGN case
> computing "common destination" instead of "frequently executed"
> somehow but I think it makes sense that those are actually the same
> here (frequently executed).  It oddly looks at prev_bb and is not
> guarded with optimize_bb_for_speed_p at all so this all is
> full of heuristics and changing anything here just based on x86
> measurements is surely going to cause issues for targets more
> sensitive to (mis-)alignment.

I like you patch, it's a rapid simplification of the code which
we have there.

I'm sending LNT results before and after the revision. It show
there are quite some significant changes. lbm is a known issue
where we are lucky with the patch.

Thoughts?
Martin

> 
> Richard.
> 
>> Richard.
>>
>>>
>>> Ciao,
>>> Michael.
>>


[-- Attachment #2: 0001-Simplify-alignment.patch --]
[-- Type: text/x-patch, Size: 2102 bytes --]

From b5cad0c2bdc611aedf32a839ead1774fd82fdad4 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Tue, 6 Aug 2019 18:09:25 +0200
Subject: [PATCH] Simplify alignment.

---
 gcc/final.c | 32 ++++++++------------------------
 1 file changed, 8 insertions(+), 24 deletions(-)

diff --git a/gcc/final.c b/gcc/final.c
index fefc4874b24..b33dfc6646e 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -722,32 +722,16 @@ compute_alignments (void)
 	 than the predecessor and the predecessor is likely to not be executed
 	 when function is called.  */
 
-      if (!has_fallthru
-	  && (branch_count > count_threshold
-	      || (bb->count > bb->prev_bb->count.apply_scale (10, 1)
-		  && (bb->prev_bb->count
-		      <= ENTRY_BLOCK_PTR_FOR_FN (cfun)
-			   ->count.apply_scale (1, 2)))))
+      if (branch_count > count_threshold
+	  || (bb->count > bb->prev_bb->count.apply_scale (10, 1)
+	      && (bb->prev_bb->count
+		  <= ENTRY_BLOCK_PTR_FOR_FN (cfun)->count.apply_scale (1, 2))))
 	{
-	  align_flags alignment = JUMP_ALIGN (label);
+	  align_flags alignment
+	    = has_fallthru ? JUMP_ALIGN (label) : LOOP_ALIGN (label);
 	  if (dump_file)
-	    fprintf (dump_file, "  jump alignment added.\n");
-	  max_alignment = align_flags::max (max_alignment, alignment);
-	}
-      /* In case block is frequent and reached mostly by non-fallthru edge,
-	 align it.  It is most likely a first block of loop.  */
-      if (has_fallthru
-	  && !(single_succ_p (bb)
-	       && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun))
-	  && optimize_bb_for_speed_p (bb)
-	  && branch_count + fallthru_count > count_threshold
-	  && (branch_count
-	      > fallthru_count.apply_scale
-		    (PARAM_VALUE (PARAM_ALIGN_LOOP_ITERATIONS), 1)))
-	{
-	  align_flags alignment = LOOP_ALIGN (label);
-	  if (dump_file)
-	    fprintf (dump_file, "  internal loop alignment added.\n");
+	    fprintf (dump_file, "  %s alignment added.\n",
+		     has_fallthru ? "loop" : "internal loop");
 	  max_alignment = align_flags::max (max_alignment, alignment);
 	}
       LABEL_TO_ALIGNMENT (label) = max_alignment;
-- 
2.22.0


[-- Attachment #3: lnt-loop-alignment.pdf.bz2 --]
[-- Type: application/x-bzip, Size: 82976 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-08-08  9:11                   ` Martin Liška
@ 2019-08-08 13:02                     ` Michael Matz
  2019-08-09 11:23                       ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Matz @ 2019-08-08 13:02 UTC (permalink / raw)
  To: Martin Liška; +Cc: Richard Biener, Jan Hubicka, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1399 bytes --]

Hi,

On Thu, 8 Aug 2019, Martin Liška wrote:

> > So docs have
> > 
> > @defmac JUMP_ALIGN (@var{label})
> > The alignment (log base 2) to put in front of @var{label}, which is
> > a common destination of jumps and has no fallthru incoming edge.

So, per docu: JUMP_ALIGN implies !fallthru ...

> >           align_flags alignment = has_fallthru ? JUMP_ALIGN (label) :
> > LOOP_ALIGN (label);

... exactly the opposite way here.

> > instead of the two different conditions?  Or the JUMP_ALIGN case
> > computing "common destination" instead of "frequently executed"
> > somehow but I think it makes sense that those are actually the same
> > here (frequently executed).  It oddly looks at prev_bb and is not
> > guarded with optimize_bb_for_speed_p at all so this all is
> > full of heuristics and changing anything here just based on x86
> > measurements is surely going to cause issues for targets more
> > sensitive to (mis-)alignment.
> 
> I like you patch, it's a rapid simplification of the code which
> we have there.

Yeah, but it's also contradicting the documentation.  And I think the docu 
makes sense, because it means that there is no padding inserted on the 
fall-thru path (because there is none).  So please measure with the 
opposite direction.  (I still think these conditions shouldn't be 
hot-needled, but rather should somewhat make sense in the abstract).


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-08-08 13:02                     ` Michael Matz
@ 2019-08-09 11:23                       ` Richard Biener
  2019-08-09 12:41                         ` Michael Matz
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2019-08-09 11:23 UTC (permalink / raw)
  To: Michael Matz; +Cc: Martin Liška, Jan Hubicka, GCC Patches

On Thu, Aug 8, 2019 at 2:24 PM Michael Matz <matz@suse.de> wrote:
>
> Hi,
>
> On Thu, 8 Aug 2019, Martin Liška wrote:
>
> > > So docs have
> > >
> > > @defmac JUMP_ALIGN (@var{label})
> > > The alignment (log base 2) to put in front of @var{label}, which is
> > > a common destination of jumps and has no fallthru incoming edge.
>
> So, per docu: JUMP_ALIGN implies !fallthru ...
>
> > >           align_flags alignment = has_fallthru ? JUMP_ALIGN (label) :
> > > LOOP_ALIGN (label);
>
> ... exactly the opposite way here.

Yeah, sorry - my mistake.

> > > instead of the two different conditions?  Or the JUMP_ALIGN case
> > > computing "common destination" instead of "frequently executed"
> > > somehow but I think it makes sense that those are actually the same
> > > here (frequently executed).  It oddly looks at prev_bb and is not
> > > guarded with optimize_bb_for_speed_p at all so this all is
> > > full of heuristics and changing anything here just based on x86
> > > measurements is surely going to cause issues for targets more
> > > sensitive to (mis-)alignment.
> >
> > I like you patch, it's a rapid simplification of the code which
> > we have there.
>
> Yeah, but it's also contradicting the documentation.  And I think the docu
> makes sense, because it means that there is no padding inserted on the
> fall-thru path (because there is none).  So please measure with the
> opposite direction.  (I still think these conditions shouldn't be
> hot-needled, but rather should somewhat make sense in the abstract).

Of course I'm still afraid that the other code exists for a reason
(tuning/hack/whatever...).

Note that with the patch we're now applying LOOP_ALIGN to L2 here:
  if (a)
    foo = bar;
L2:
  blah;

because there's a jump-around and a fallthru.  So I'm not sure
we don't need to apply some condition on fallthru_count
(which is unused after your patch btw).

Richard.


>
> Ciao,
> Michael.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-08-09 11:23                       ` Richard Biener
@ 2019-08-09 12:41                         ` Michael Matz
  2019-08-09 13:05                           ` Martin Liška
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Matz @ 2019-08-09 12:41 UTC (permalink / raw)
  To: Richard Biener; +Cc: Martin Liška, Jan Hubicka, GCC Patches

Hi,

On Fri, 9 Aug 2019, Richard Biener wrote:

> Of course I'm still afraid that the other code exists for a reason
> (tuning/hack/whatever...).
> 
> Note that with the patch we're now applying LOOP_ALIGN to L2 here:
>   if (a)
>     foo = bar;
> L2:
>   blah;
> 
> because there's a jump-around and a fallthru.

Yeah, and I think that would be wrong.  That's why the existing code (not 
sure about after the patch) does this only when L2 is reached by one edge 
much more often than by the other edges.

> So I'm not sure we don't need to apply some condition on fallthru_count 
> (which is unused after your patch btw).


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-08-09 12:41                         ` Michael Matz
@ 2019-08-09 13:05                           ` Martin Liška
  2019-10-29 11:49                             ` Martin Liška
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Liška @ 2019-08-09 13:05 UTC (permalink / raw)
  To: Michael Matz, Richard Biener; +Cc: Jan Hubicka, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1068 bytes --]

On 8/9/19 2:13 PM, Michael Matz wrote:
> Hi,
> 
> On Fri, 9 Aug 2019, Richard Biener wrote:
> 
>> Of course I'm still afraid that the other code exists for a reason
>> (tuning/hack/whatever...).
>>
>> Note that with the patch we're now applying LOOP_ALIGN to L2 here:
>>   if (a)
>>     foo = bar;
>> L2:
>>   blah;
>>
>> because there's a jump-around and a fallthru.
> 
> Yeah, and I think that would be wrong.  That's why the existing code (not 
> sure about after the patch) does this only when L2 is reached by one edge 
> much more often than by the other edges.
> 
>> So I'm not sure we don't need to apply some condition on fallthru_count 
>> (which is unused after your patch btw).
> 
> 
> Ciao,
> Michael.
> 

I'm sending numbers for the opposite condition.

> Of course I'm still afraid that the other code exists for a reason
> (tuning/hack/whatever...).

I fully agree that the current code is quite hacking and was probably subject
of some tuning.

I'm leaving the decision about simplification to you?
You're much more experienced in the area :)

Martin

[-- Attachment #2: lnt-loop-alignment-v2.pdf.bz2 --]
[-- Type: application/x-bzip, Size: 83848 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Restrict LOOP_ALIGN to loop headers only.
  2019-08-09 13:05                           ` Martin Liška
@ 2019-10-29 11:49                             ` Martin Liška
  0 siblings, 0 replies; 17+ messages in thread
From: Martin Liška @ 2019-10-29 11:49 UTC (permalink / raw)
  To: Michael Matz, Richard Biener; +Cc: Jan Hubicka, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1224 bytes --]

On 8/9/19 2:51 PM, Martin Liška wrote:
> On 8/9/19 2:13 PM, Michael Matz wrote:
>> Hi,
>>
>> On Fri, 9 Aug 2019, Richard Biener wrote:
>>
>>> Of course I'm still afraid that the other code exists for a reason
>>> (tuning/hack/whatever...).
>>>
>>> Note that with the patch we're now applying LOOP_ALIGN to L2 here:
>>>   if (a)
>>>     foo = bar;
>>> L2:
>>>   blah;
>>>
>>> because there's a jump-around and a fallthru.
>>
>> Yeah, and I think that would be wrong.  That's why the existing code (not 
>> sure about after the patch) does this only when L2 is reached by one edge 
>> much more often than by the other edges.
>>
>>> So I'm not sure we don't need to apply some condition on fallthru_count 
>>> (which is unused after your patch btw).
>>
>>
>> Ciao,
>> Michael.
>>
> 
> I'm sending numbers for the opposite condition.
> 
>> Of course I'm still afraid that the other code exists for a reason
>> (tuning/hack/whatever...).
> 
> I fully agree that the current code is quite hacking and was probably subject
> of some tuning.
> 
> I'm leaving the decision about simplification to you?
> You're much more experienced in the area :)
> 
> Martin
> 

@Honza: PING

Martin

[-- Attachment #2: pEpkey.asc --]
[-- Type: application/pgp-keys, Size: 1787 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-10-29 11:34 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-09  9:14 [PATCH] Restrict LOOP_ALIGN to loop headers only Martin Liška
2019-07-09  9:56 ` Jan Hubicka
2019-07-09 10:11   ` Martin Liška
2019-07-09 10:24   ` Richard Biener
2019-07-09 10:26     ` Richard Biener
2019-07-09 10:39       ` Richard Biener
2019-07-09 13:19         ` Michael Matz
2019-07-09 16:41           ` Richard Biener
2019-07-10 12:46             ` Michael Matz
2019-07-10 15:52               ` Richard Biener
2019-07-11 10:18                 ` Richard Biener
2019-08-08  9:11                   ` Martin Liška
2019-08-08 13:02                     ` Michael Matz
2019-08-09 11:23                       ` Richard Biener
2019-08-09 12:41                         ` Michael Matz
2019-08-09 13:05                           ` Martin Liška
2019-10-29 11:49                             ` Martin Liška

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).