public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: postreload-gcse heuristics broken?
@ 2005-01-23  2:07 Ayal Zaks
  2005-01-23 23:56 ` Ulrich Weigand
  0 siblings, 1 reply; 13+ messages in thread
From: Ayal Zaks @ 2005-01-23  2:07 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc, Mostafa Hagog





>the heuristics used in eliminate_partially_redundant_load
>(postreload-gcse.c) to decide whether or not eliminating
>a redundancy appears to be seriously broken.  It tests
>
>  /* Check if it's worth applying the partial redundancy elimination.  */
>  if (ok_count < GCSE_AFTER_RELOAD_PARTIAL_FRACTION * not_ok_count)
>    goto cleanup;
>  if (ok_count < GCSE_AFTER_RELOAD_CRITICAL_FRACTION * critical_count)
>    goto cleanup;
>
>where the counts are accumulated edge->count values, e.g.
>
>      if (EDGE_CRITICAL_P (pred))
>        critical_count += pred->count;
>
>Unfortunately it would appear that unless profile-directed
>feedback is being used, edge->count is always zero!
>
>Thus the check above always reduced to 0 < 0, and therefore
>*every possible* redundancy elimination is performed, resulting
>in much worse code than before in some cases.
>
>Any suggestions how the heuristics can be fixed in the
>absence of profile data?  Otherwise, I'd suggest to make
>the pass conditional on profile-directed feedback ...

One way would be to try and use nesting-levels/static-branch-prediction,
for guesstimating these basic block counts or obtaining inverse-branch
prediction (how probable is it that we reach a basic block from each one of
its predecessors). Without such information, one could make a decision
based on how many predecessors are 'ok' versus 'not_ok' (critical edges
should probably not be split), by doing

          [not_]ok_count += (pred->count + 1);


>Thus the check above always reduced to 0 < 0, and therefore

I think there was a reason not to check 0 <= 0, but I don't recall what it
was ..


>resulting in much worse code than before in some cases.

Is it easy to see what causes the degradation in these cases?

Ayal.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: postreload-gcse heuristics broken?
  2005-01-23  2:07 postreload-gcse heuristics broken? Ayal Zaks
@ 2005-01-23 23:56 ` Ulrich Weigand
  2005-01-24 14:08   ` Mostafa Hagog
  2005-01-25 17:44   ` Mostafa Hagog
  0 siblings, 2 replies; 13+ messages in thread
From: Ulrich Weigand @ 2005-01-23 23:56 UTC (permalink / raw)
  To: Ayal Zaks; +Cc: Ulrich Weigand, gcc, Mostafa Hagog

Ayal Zaks wrote:

> >resulting in much worse code than before in some cases.
> 
> Is it easy to see what causes the degradation in these cases?

Well, I noticed the issue on mgrid's resid function, which is
a triply-nested loop; what postreload-gcse does to this loop
is to split each of the loop backedges and insert tiny basic
blocks consisting only of a load instruction.

So the hot path goes instead of through a single branch now through
a sequence of two taken branches nearly immediately following
each other; this doesn't do the pipeline any good ...

Also, it would appear to me that postreload-gcse inserts much
more moves that it is able to eliminate:

generating move from 7 to 4 on edge from 6 to 2
generating move from 2 to 4 on edge from 1 to 2
generating move from 10 to 10 on edge from 1 to 2
generating on edge from 6 to 2 a copy of load: (set (reg:DI 10 %r10)
    (mem:DI (plus:DI (reg/f:DI 15 %r15)
            (const_int 296 [0x128])) [31 pretmp.29+0 S8 A8]))
generating move from 1 to 1 on edge from 1 to 2
generating on edge from 6 to 2 a copy of load: (set (reg:DI 1 %r1)
    (mem:DI (plus:DI (reg/f:DI 15 %r15)
            (const_int 304 [0x130])) [32 prephitmp.26+0 S8 A8]))
generating move from 4 to 4 on edge from 5 to 3
generating move from 3 to 4 on edge from 2 to 3
generating move from 5 to 5 on edge from 2 to 3
generating on edge from 5 to 3 a copy of load: (set (reg:DI 5 %r5)
    (mem:DI (plus:DI (reg/f:DI 15 %r15)
            (const_int 264 [0x108])) [27 pretmp.45+0 S8 A8]))
generating move from 5 to 7 on edge from 3 to 4
generating on edge from 4 to 4 a copy of load: (set (reg:SI 7 %r7)
    (mem:SI (plus:DI (reg/f:DI 15 %r15)
            (const_int 168 [0xa8])) [15 ivtmp.115+0 S4 A8]))
Edge 4->4 redirected to 8
Edge 5->3 redirected to 9
Edge 6->2 redirected to 10
deleting insn:
(insn/v 376 374 62 2 (set (reg:DI 1 %r1)
        (mem:DI (plus:DI (reg/f:DI 15 %r15)
                (const_int 304 [0x130])) [32 prephitmp.26+0 S8 A8])) 42 {*movdi_64} (nil)
    (nil))

GCSE AFTER RELOAD stats:
copies inserted: 4
moves inserted:  8
insns deleted:   1

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: postreload-gcse heuristics broken?
  2005-01-23 23:56 ` Ulrich Weigand
@ 2005-01-24 14:08   ` Mostafa Hagog
  2005-01-25 17:44   ` Mostafa Hagog
  1 sibling, 0 replies; 13+ messages in thread
From: Mostafa Hagog @ 2005-01-24 14:08 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Ayal Zaks, gcc





Ulrich Weigand/Germany/IBM@IBMDE wrote on 23/01/2005 17:34:14:

> Ayal Zaks wrote:

> > >resulting in much worse code than before in some cases.
> >
> > Is it easy to see what causes the degradation in these cases?

> Well, I noticed the issue on mgrid's resid function, which is
> a triply-nested loop; what postreload-gcse does to this loop
> is to split each of the loop backedges and insert tiny basic
> blocks consisting only of a load instruction.

> So the hot path goes instead of through a single branch now through
> a sequence of two taken branches nearly immediately following
> each other; this doesn't do the pipeline any good ...

> Also, it would appear to me that postreload-gcse inserts much
> more moves that it is able to eliminate:

> generating move from 7 to 4 on edge from 6 to 2
> generating move from 2 to 4 on edge from 1 to 2
> generating move from 10 to 10 on edge from 1 to 2
> generating on edge from 6 to 2 a copy of load: (set (reg:DI 10 %r10)
> (mem:DI (plus:DI (reg/f:DI 15 %r15)
> (const_int 296 [0x128])) [31 pretmp.29+0 S8 A8]))
> generating move from 1 to 1 on edge from 1 to 2
> generating on edge from 6 to 2 a copy of load: (set (reg:DI 1 %r1)
> (mem:DI (plus:DI (reg/f:DI 15 %r15)
> (const_int 304 [0x130])) [32 prephitmp.26+0 S8 A8]))
> generating move from 4 to 4 on edge from 5 to 3
> generating move from 3 to 4 on edge from 2 to 3
> generating move from 5 to 5 on edge from 2 to 3
> generating on edge from 5 to 3 a copy of load: (set (reg:DI 5 %r5)
> (mem:DI (plus:DI (reg/f:DI 15 %r15)
> (const_int 264 [0x108])) [27 pretmp.45+0 S8 A8]))
> generating move from 5 to 7 on edge from 3 to 4
> generating on edge from 4 to 4 a copy of load: (set (reg:SI 7 %r7)
> (mem:SI (plus:DI (reg/f:DI 15 %r15)
> (const_int 168 [0xa8])) [15 ivtmp.115+0 S4 A8]))
> Edge 4->4 redirected to 8
> Edge 5->3 redirected to 9
> Edge 6->2 redirected to 10
> deleting insn:
> (insn/v 376 374 62 2 (set (reg:DI 1 %r1)
> (mem:DI (plus:DI (reg/f:DI 15 %r15)
> (const_int 304 [0x130])) [32 prephitmp.26+0 S8 A8])) 42 {*movdi_64} (nil)
> (nil))

> GCSE AFTER RELOAD stats:
> copies inserted: 4
> moves inserted:  8
> insns deleted:   1

The main goal from gcse is to reduce the number of instructions executed
dynamically.  For example it would delete an instruction from inside a
loop and put 2 or more instruction on several blocks that leads to it.
It means 2 or more new instructions but less instructions executed.

The main problem here is that the counts that postreload gcse is using
to make a decision are wrong.  There are two possible ways to fix this:
1. Disable postreload-gcse when profile information is not available.
2. Use the branch probabilities estimation instead of edge counts - this
   information will be available in both cases (w/o profiling).  And when
   the profile information is available it will be up-to-date.

Mostafa.

> Bye,
> Ulrich

> --
> Dr. Ulrich Weigand
> Linux on zSeries Development
> Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: postreload-gcse heuristics broken?
  2005-01-23 23:56 ` Ulrich Weigand
  2005-01-24 14:08   ` Mostafa Hagog
@ 2005-01-25 17:44   ` Mostafa Hagog
  1 sibling, 0 replies; 13+ messages in thread
From: Mostafa Hagog @ 2005-01-25 17:44 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Ayal Zaks, gcc

[-- Attachment #1: Type: text/plain, Size: 2694 bytes --]





Ulrich Weigand/Germany/IBM@IBMDE wrote on 23/01/2005 17:34:14:

> Ayal Zaks wrote:

> > >resulting in much worse code than before in some cases.
> >
> > Is it easy to see what causes the degradation in these cases?

> Well, I noticed the issue on mgrid's resid function, which is
> a triply-nested loop; what postreload-gcse does to this loop
> is to split each of the loop backedges and insert tiny basic
> blocks consisting only of a load instruction.

> So the hot path goes instead of through a single branch now through
> a sequence of two taken branches nearly immediately following
> each other; this doesn't do the pipeline any good ...

> Also, it would appear to me that postreload-gcse inserts much
> more moves that it is able to eliminate:

> generating move from 7 to 4 on edge from 6 to 2
> generating move from 2 to 4 on edge from 1 to 2
> generating move from 10 to 10 on edge from 1 to 2
> generating on edge from 6 to 2 a copy of load: (set (reg:DI 10 %r10)
> (mem:DI (plus:DI (reg/f:DI 15 %r15)
> (const_int 296 [0x128])) [31 pretmp.29+0 S8 A8]))
> generating move from 1 to 1 on edge from 1 to 2
> generating on edge from 6 to 2 a copy of load: (set (reg:DI 1 %r1)
> (mem:DI (plus:DI (reg/f:DI 15 %r15)
> (const_int 304 [0x130])) [32 prephitmp.26+0 S8 A8]))
> generating move from 4 to 4 on edge from 5 to 3
> generating move from 3 to 4 on edge from 2 to 3
> generating move from 5 to 5 on edge from 2 to 3
> generating on edge from 5 to 3 a copy of load: (set (reg:DI 5 %r5)
> (mem:DI (plus:DI (reg/f:DI 15 %r15)
> (const_int 264 [0x108])) [27 pretmp.45+0 S8 A8]))
> generating move from 5 to 7 on edge from 3 to 4
> generating on edge from 4 to 4 a copy of load: (set (reg:SI 7 %r7)
> (mem:SI (plus:DI (reg/f:DI 15 %r15)
> (const_int 168 [0xa8])) [15 ivtmp.115+0 S4 A8]))
> Edge 4->4 redirected to 8
> Edge 5->3 redirected to 9
> Edge 6->2 redirected to 10
> deleting insn:
> (insn/v 376 374 62 2 (set (reg:DI 1 %r1)
> (mem:DI (plus:DI (reg/f:DI 15 %r15)
> (const_int 304 [0x130])) [32 prephitmp.26+0 S8 A8])) 42 {*movdi_64} (nil)
> (nil))

> GCSE AFTER RELOAD stats:
> copies inserted: 4
> moves inserted:  8
> insns deleted:   1

"insns deleted" is wrong, this is due to a bug in the dumps of
postreload-gcse. We have two places in which we delete instructions and
only one of them is reported.

I have prepared a patch to prevent postreload-gcse to split critical
edges (as Ayal suggested to me) - this should do the trick for you.
The patch also includes the fix for the dumps bug.

postreload-gcse didn't cause any degradation in G5 so I cannot test
if the patch fixes your problem.  can you test it?

Mostafa.


(See attached file: postreload_gcse_nopdf.patch)

[-- Attachment #2: postreload_gcse_nopdf.patch --]
[-- Type: application/octet-stream, Size: 1483 bytes --]

Index: postreload-gcse.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/postreload-gcse.c,v
retrieving revision 2.9
diff -c -p -r2.9 postreload-gcse.c
*** postreload-gcse.c	18 Jan 2005 11:36:16 -0000	2.9
--- postreload-gcse.c	25 Jan 2005 16:55:57 -0000
*************** eliminate_partially_redundant_load (basi
*** 1030,1035 ****
--- 1030,1042 ----
    if (reg_set_or_used_since_bb_start (dest, bb, insn))
      return;
  
+   /* If we don't have profile information we cannot tell if splitting 
+      a critical edge is profitable or not so don't do it.  */
+   if (! profile_info || ! flag_branch_probabilities)
+     FOR_EACH_EDGE (pred, ei, bb->preds)
+       if (EDGE_CRITICAL_P (pred))
+ 	return;
+ 
    /* Check potential for replacing load with copy for predecessors.  */
    FOR_EACH_EDGE (pred, ei, bb->preds)
      {
*************** eliminate_partially_redundant_load (basi
*** 1158,1164 ****
         a_occr = get_bb_avail_insn (bb, a_occr->next));
  
    if (!a_occr)
!     delete_insn (insn);
    else
      a_occr->deleted_p = 1;
  
--- 1165,1181 ----
         a_occr = get_bb_avail_insn (bb, a_occr->next));
  
    if (!a_occr)
!     {
!       stats.insns_deleted++;
! 
!       if (dump_file)
! 	{
! 	  fprintf (dump_file, "deleting insn:\n");
!           print_rtl_single (dump_file, occr->insn);
!           fprintf (dump_file, "\n");
! 	}
!       delete_insn (insn);
!     }
    else
      a_occr->deleted_p = 1;
  

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: postreload-gcse heuristics broken?
       [not found] <OFF354D103.6ACD791B-ONC2256F95.00572C20-C2256F95.005804B0@LocalDomain>
@ 2005-01-26 16:34 ` Ulrich Weigand
  0 siblings, 0 replies; 13+ messages in thread
From: Ulrich Weigand @ 2005-01-26 16:34 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: Ayal Zaks, gcc





Mostafa Hagog/Haifa/IBM wrote on 01/26/2005 05:01:23 PM:

> Yes you are right.  This means that we didn't test the new change,
> just the old one again.
>
> Attached the new patch.

This patch also doesn't split the problematic edge in my test case.
Thanks!


Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand
  Linux for S/390 Design & Development
  IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
  Phone: +49-7031/16-3727   ---   Email: Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: postreload-gcse heuristics broken?
       [not found] <OF1E57F82B.303FCFAA-ON41256F95.00569478-41256F95.0056BC0F@LocalDomain>
@ 2005-01-26 16:29 ` Mostafa Hagog
  0 siblings, 0 replies; 13+ messages in thread
From: Mostafa Hagog @ 2005-01-26 16:29 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Ayal Zaks, gcc

[-- Attachment #1: Type: text/plain, Size: 968 bytes --]





Ulrich Weigand/Germany/IBM wrote on 26/01/2005 17:47:22:

> Mostafa Hagog/Haifa/IBM wrote on 01/25/2005 11:10:50 PM:
>
> > Here is an updated patch, can you please test it?
>
> It does work.  But are you sure you wanted to keep this hunk:

No, this hunk shouldn't be kept.

>
> *************** eliminate_partially_redundant_load (basi
> *** 1030,1035 ****
> --- 1031,1040 ----
>     if (reg_set_or_used_since_bb_start (dest, bb, insn))
>       return;
>
> +     FOR_EACH_EDGE (pred, ei, bb->preds)
> +       if (EDGE_CRITICAL_P (pred))
> +       return;
> +
>     /* Check potential for replacing load with copy for predecessors.  */
>     FOR_EACH_EDGE (pred, ei, bb->preds)
>       {
>
> It would appear the later tests for EDGE_CRITICAL_P are quite
> pointless after this ...

Yes you are right.  This means that we didn't test the new change,
just the old one again.

Attached the new patch.

Thanks,
Mostafa.

(See attached file: postreload_gcse_nopdf3.patch)

[-- Attachment #2: postreload_gcse_nopdf3.patch --]
[-- Type: application/octet-stream, Size: 3467 bytes --]

Index: postreload-gcse.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/postreload-gcse.c,v
retrieving revision 2.9
diff -c -p -r2.9 postreload-gcse.c
*** postreload-gcse.c	18 Jan 2005 11:36:16 -0000	2.9
--- postreload-gcse.c	26 Jan 2005 15:59:34 -0000
*************** eliminate_partially_redundant_load (basi
*** 1016,1021 ****
--- 1016,1022 ----
    gcov_type ok_count = 0; /* Redundant load execution count.  */
    gcov_type critical_count = 0; /* Execution count of critical edges.  */
    edge_iterator ei;
+   bool critical_edge_split = false;
  
    /* The execution count of the loads to be added to make the
       load fully redundant.  */
*************** eliminate_partially_redundant_load (basi
*** 1036,1041 ****
--- 1037,1043 ----
        rtx next_pred_bb_end;
  
        avail_insn = NULL_RTX;
+       avail_reg = NULL_RTX;
        pred_bb = pred->src;
        next_pred_bb_end = NEXT_INSN (BB_END (pred_bb));
        for (a_occr = get_bb_avail_insn (pred_bb, expr->avail_occr); a_occr;
*************** eliminate_partially_redundant_load (basi
*** 1071,1076 ****
--- 1073,1087 ----
  	{
  	  npred_ok++;
  	  ok_count += pred->count;
+ 	  if (! set_noop_p (PATTERN (gen_move_insn (copy_rtx (dest),
+ 						    copy_rtx (avail_reg)))))
+ 	    {
+ 	      /* Check if there is going to be a split.  */
+ 	      if (EDGE_CRITICAL_P (pred))
+ 		critical_edge_split = true;
+ 	    }
+ 	  else /* Its a dead move no need to generate.  */
+ 	    continue;
  	  occr = (struct unoccr *) obstack_alloc (&unoccr_obstack,
  						  sizeof (struct occr));
  	  occr->insn = avail_insn;
*************** eliminate_partially_redundant_load (basi
*** 1082,1087 ****
--- 1093,1101 ----
  	}
        else
  	{
+ 	  /* Adding a load on a critical edge will cuase a split.  */
+ 	  if (EDGE_CRITICAL_P (pred))
+ 	    critical_edge_split = true;
  	  not_ok_count += pred->count;
  	  unoccr = (struct unoccr *) obstack_alloc (&unoccr_obstack,
  						    sizeof (struct unoccr));
*************** eliminate_partially_redundant_load (basi
*** 1097,1103 ****
    if (/* No load can be replaced by copy.  */
        npred_ok == 0
        /* Prevent exploding the code.  */ 
!       || (optimize_size && npred_ok > 1))
      goto cleanup;
  
    /* Check if it's worth applying the partial redundancy elimination.  */
--- 1111,1121 ----
    if (/* No load can be replaced by copy.  */
        npred_ok == 0
        /* Prevent exploding the code.  */ 
!       || (optimize_size && npred_ok > 1)
!       /* If we don't have profile information we cannot tell if splitting 
!          a critical edge is profitable or not so don't do it.  */
!       || ((! profile_info || ! flag_branch_probabilities)
! 	  && critical_edge_split))
      goto cleanup;
  
    /* Check if it's worth applying the partial redundancy elimination.  */
*************** eliminate_partially_redundant_load (basi
*** 1158,1164 ****
         a_occr = get_bb_avail_insn (bb, a_occr->next));
  
    if (!a_occr)
!     delete_insn (insn);
    else
      a_occr->deleted_p = 1;
  
--- 1176,1192 ----
         a_occr = get_bb_avail_insn (bb, a_occr->next));
  
    if (!a_occr)
!     {
!       stats.insns_deleted++;
! 
!       if (dump_file)
! 	{
! 	  fprintf (dump_file, "deleting insn:\n");
!           print_rtl_single (dump_file, insn);
!           fprintf (dump_file, "\n");
! 	}
!       delete_insn (insn);
!     }
    else
      a_occr->deleted_p = 1;
  

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: postreload-gcse heuristics broken?
       [not found] <OF33B6A83D.C92932BB-ONC2256F94.0079B206-C2256F94.0079D784@LocalDomain>
@ 2005-01-26 16:01 ` Ulrich Weigand
  0 siblings, 0 replies; 13+ messages in thread
From: Ulrich Weigand @ 2005-01-26 16:01 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: Ayal Zaks, gcc





Mostafa Hagog/Haifa/IBM wrote on 01/25/2005 11:10:50 PM:

> Here is an updated patch, can you please test it?

It does work.  But are you sure you wanted to keep this hunk:

*************** eliminate_partially_redundant_load (basi
*** 1030,1035 ****
--- 1031,1040 ----
    if (reg_set_or_used_since_bb_start (dest, bb, insn))
      return;

+     FOR_EACH_EDGE (pred, ei, bb->preds)
+       if (EDGE_CRITICAL_P (pred))
+       return;
+
    /* Check potential for replacing load with copy for predecessors.  */
    FOR_EACH_EDGE (pred, ei, bb->preds)
      {

It would appear the later tests for EDGE_CRITICAL_P are quite
pointless after this ...


Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand
  Linux for S/390 Design & Development
  IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
  Phone: +49-7031/16-3727   ---   Email: Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: postreload-gcse heuristics broken?
       [not found] <OF54C43C7C.159EF7C1-ONC2256F94.00735E8A-C2256F94.00739C0E@LocalDomain>
@ 2005-01-26 16:00 ` Ulrich Weigand
  0 siblings, 0 replies; 13+ messages in thread
From: Ulrich Weigand @ 2005-01-26 16:00 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: Ayal Zaks, gcc





Mostafa Hagog/Haifa/IBM wrote on 01/25/2005 10:02:45 PM:

> I will prepare a patch to be submitted to mainline.
> Do you have numbers of the regressions this problem
> causes so I can attach them with the patch ?

Unfortunately not; this is just one of multiple issues
(and likely one of the smaller ones) contributing to
the big performance degradation I'm seeing on mgrid.

I'll ask our performance team to get numbers for just
postreload-gcse, but that'll take until next week ...


Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand
  Linux for S/390 Design & Development
  IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
  Phone: +49-7031/16-3727   ---   Email: Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: postreload-gcse heuristics broken?
       [not found] <OFC154839D.B029149A-ONC1256F94.0071A74D-41256F94.00720B9C@LocalDomain>
  2005-01-25 22:55 ` Mostafa Hagog
@ 2005-01-25 23:15 ` Mostafa Hagog
  1 sibling, 0 replies; 13+ messages in thread
From: Mostafa Hagog @ 2005-01-25 23:15 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Ayal Zaks, gcc

[-- Attachment #1: Type: text/plain, Size: 1310 bytes --]





Here is an updated patch, can you please test it?

Ulrich Weigand/Germany/IBM wrote on 25/01/2005 22:45:39:

> Mostafa Hagog/Haifa/IBM wrote on 01/25/2005 06:05:27 PM:
>
> > "insns deleted" is wrong, this is due to a bug in the dumps of
> > postreload-gcse. We have two places in which we delete instructions and
> > only one of them is reported.
>
> OK, I see.
>
> > I have prepared a patch to prevent postreload-gcse to split critical
> > edges (as Ayal suggested to me) - this should do the trick for you.
> > The patch also includes the fix for the dumps bug.
> >
> > postreload-gcse didn't cause any degradation in G5 so I cannot test
> > if the patch fixes your problem.  can you test it?
>
> Crashes because of NULL 'occr' in:
> !           print_rtl_single (dump_file, occr->insn);
> I guess this should be just 'insn'.
>
> With this fixed, the loop back-edge of my testcase is indeed no longer
> split, so this should fix the problem I was seeing ...
>
> Thanks for taking care of this!
>
>
> Mit freundlichen Gruessen / Best Regards
>
> Ulrich Weigand
>
> --
>   Dr. Ulrich Weigand
>   Linux for S/390 Design & Development
>   IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032
Boeblingen
>   Phone: +49-7031/16-3727   ---   Email: Ulrich.Weigand@de.ibm.com

(See attached file: diff)

[-- Attachment #2: diff --]
[-- Type: application/octet-stream, Size: 3498 bytes --]

Index: postreload-gcse.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/postreload-gcse.c,v
retrieving revision 2.9
diff -c -p -r2.9 postreload-gcse.c
*** postreload-gcse.c	18 Jan 2005 11:36:16 -0000	2.9
--- postreload-gcse.c	25 Jan 2005 22:08:13 -0000
*************** eliminate_partially_redundant_load (basi
*** 1016,1021 ****
--- 1016,1022 ----
    gcov_type ok_count = 0; /* Redundant load execution count.  */
    gcov_type critical_count = 0; /* Execution count of critical edges.  */
    edge_iterator ei;
+   bool critical_edge_split = false;
  
    /* The execution count of the loads to be added to make the
       load fully redundant.  */
*************** eliminate_partially_redundant_load (basi
*** 1030,1035 ****
--- 1031,1040 ----
    if (reg_set_or_used_since_bb_start (dest, bb, insn))
      return;
  
+     FOR_EACH_EDGE (pred, ei, bb->preds)
+       if (EDGE_CRITICAL_P (pred))
+ 	return;
+ 
    /* Check potential for replacing load with copy for predecessors.  */
    FOR_EACH_EDGE (pred, ei, bb->preds)
      {
*************** eliminate_partially_redundant_load (basi
*** 1071,1076 ****
--- 1076,1090 ----
  	{
  	  npred_ok++;
  	  ok_count += pred->count;
+ 	  if (! set_noop_p (PATTERN (gen_move_insn (copy_rtx (dest),
+ 						    copy_rtx (avail_reg)))))
+ 	    {
+ 	      /* Check if there is going to be a split.  */
+ 	      if (EDGE_CRITICAL_P (pred))
+ 		critical_edge_split = true;
+ 	    }
+ 	  else /* Its a dead move no need to generate.  */
+ 	    continue;
  	  occr = (struct unoccr *) obstack_alloc (&unoccr_obstack,
  						  sizeof (struct occr));
  	  occr->insn = avail_insn;
*************** eliminate_partially_redundant_load (basi
*** 1082,1087 ****
--- 1096,1104 ----
  	}
        else
  	{
+ 	  /* Adding a load on a critical edge will cuase a split.  */
+ 	  if (EDGE_CRITICAL_P (pred))
+ 	    critical_edge_split = true;
  	  not_ok_count += pred->count;
  	  unoccr = (struct unoccr *) obstack_alloc (&unoccr_obstack,
  						    sizeof (struct unoccr));
*************** eliminate_partially_redundant_load (basi
*** 1097,1103 ****
    if (/* No load can be replaced by copy.  */
        npred_ok == 0
        /* Prevent exploding the code.  */ 
!       || (optimize_size && npred_ok > 1))
      goto cleanup;
  
    /* Check if it's worth applying the partial redundancy elimination.  */
--- 1114,1124 ----
    if (/* No load can be replaced by copy.  */
        npred_ok == 0
        /* Prevent exploding the code.  */ 
!       || (optimize_size && npred_ok > 1)
!       /* If we don't have profile information we cannot tell if splitting 
!          a critical edge is profitable or not so don't do it.  */
!       || ((! profile_info || ! flag_branch_probabilities)
! 	  && critical_edge_split))
      goto cleanup;
  
    /* Check if it's worth applying the partial redundancy elimination.  */
*************** eliminate_partially_redundant_load (basi
*** 1158,1164 ****
         a_occr = get_bb_avail_insn (bb, a_occr->next));
  
    if (!a_occr)
!     delete_insn (insn);
    else
      a_occr->deleted_p = 1;
  
--- 1179,1195 ----
         a_occr = get_bb_avail_insn (bb, a_occr->next));
  
    if (!a_occr)
!     {
!       stats.insns_deleted++;
! 
!       if (dump_file)
! 	{
! 	  fprintf (dump_file, "deleting insn:\n");
!           print_rtl_single (dump_file, insn);
!           fprintf (dump_file, "\n");
! 	}
!       delete_insn (insn);
!     }
    else
      a_occr->deleted_p = 1;
  

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: postreload-gcse heuristics broken?
       [not found] <OFC154839D.B029149A-ONC1256F94.0071A74D-41256F94.00720B9C@LocalDomain>
@ 2005-01-25 22:55 ` Mostafa Hagog
  2005-01-25 23:15 ` Mostafa Hagog
  1 sibling, 0 replies; 13+ messages in thread
From: Mostafa Hagog @ 2005-01-25 22:55 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Ayal Zaks, gcc





Ulrich Weigand/Germany/IBM wrote on 25/01/2005 22:45:39:

> Mostafa Hagog/Haifa/IBM wrote on 01/25/2005 06:05:27 PM:
>
> > "insns deleted" is wrong, this is due to a bug in the dumps of
> > postreload-gcse. We have two places in which we delete instructions and
> > only one of them is reported.
>
> OK, I see.
>
> > I have prepared a patch to prevent postreload-gcse to split critical
> > edges (as Ayal suggested to me) - this should do the trick for you.
> > The patch also includes the fix for the dumps bug.
> >
> > postreload-gcse didn't cause any degradation in G5 so I cannot test
> > if the patch fixes your problem.  can you test it?
>
> Crashes because of NULL 'occr' in:
> !           print_rtl_single (dump_file, occr->insn);
> I guess this should be just 'insn'.
Correct.

>
> With this fixed, the loop back-edge of my testcase is indeed no longer
> split, so this should fix the problem I was seeing ...
>
> Thanks for taking care of this!

I will prepare a patch to be submitted to mainline.
Do you have numbers of the regressions this problem
causes so I can attach them with the patch ?

Thanks,
Mostafa.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: postreload-gcse heuristics broken?
       [not found] <OFA1A552DE.EC1881EC-ONC2256F94.005D20F7-C2256F94.005DE228@LocalDomain>
@ 2005-01-25 21:02 ` Ulrich Weigand
  0 siblings, 0 replies; 13+ messages in thread
From: Ulrich Weigand @ 2005-01-25 21:02 UTC (permalink / raw)
  To: Mostafa Hagog; +Cc: Ayal Zaks, gcc





Mostafa Hagog/Haifa/IBM wrote on 01/25/2005 06:05:27 PM:

> "insns deleted" is wrong, this is due to a bug in the dumps of
> postreload-gcse. We have two places in which we delete instructions and
> only one of them is reported.

OK, I see.

> I have prepared a patch to prevent postreload-gcse to split critical
> edges (as Ayal suggested to me) - this should do the trick for you.
> The patch also includes the fix for the dumps bug.
>
> postreload-gcse didn't cause any degradation in G5 so I cannot test
> if the patch fixes your problem.  can you test it?

Crashes because of NULL 'occr' in:
!           print_rtl_single (dump_file, occr->insn);
I guess this should be just 'insn'.

With this fixed, the loop back-edge of my testcase is indeed no longer
split, so this should fix the problem I was seeing ...

Thanks for taking care of this!


Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand
  Linux for S/390 Design & Development
  IBM Deutschland Entwicklung GmbH, Schoenaicher Str. 220, 71032 Boeblingen
  Phone: +49-7031/16-3727   ---   Email: Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: postreload-gcse heuristics broken?
  2005-01-20 20:13 Ulrich Weigand
@ 2005-01-24 13:54 ` Mostafa Hagog
  0 siblings, 0 replies; 13+ messages in thread
From: Mostafa Hagog @ 2005-01-24 13:54 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc





gcc-owner@gcc.gnu.org wrote on 20/01/2005 21:26:22:

> Hello,
>
> the heuristics used in eliminate_partially_redundant_load
> (postreload-gcse.c) to decide whether or not eliminating
> a redundancy appears to be seriously broken.  It tests
>
>   /* Check if it's worth applying the partial redundancy elimination.  */
>   if (ok_count < GCSE_AFTER_RELOAD_PARTIAL_FRACTION * not_ok_count)
>     goto cleanup;
>   if (ok_count < GCSE_AFTER_RELOAD_CRITICAL_FRACTION * critical_count)
>     goto cleanup;
>
> where the counts are accumulated edge->count values, e.g.
>
>       if (EDGE_CRITICAL_P (pred))
>         critical_count += pred->count;
>
> Unfortunately it would appear that unless profile-directed
> feedback is being used, edge->count is always zero!
>
> Thus the check above always reduced to 0 < 0, and therefore
> *every possible* redundancy elimination is performed, resulting
> in much worse code than before in some cases.
>
> Any suggestions how the heuristics can be fixed in the
> absence of profile data?  Otherwise, I'd suggest to make
> the pass conditional on profile-directed feedback ...

The underlying assumption to the above heuristic is that even without
profile information the edge->count is estimated according to the branch
probabilities, which seems to be wrong.

Mostafa.

>
> Bye,
> Ulrich
>
> --
>   Dr. Ulrich Weigand
>   Linux on zSeries Development
>   Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* postreload-gcse heuristics broken?
@ 2005-01-20 20:13 Ulrich Weigand
  2005-01-24 13:54 ` Mostafa Hagog
  0 siblings, 1 reply; 13+ messages in thread
From: Ulrich Weigand @ 2005-01-20 20:13 UTC (permalink / raw)
  To: gcc

Hello,

the heuristics used in eliminate_partially_redundant_load
(postreload-gcse.c) to decide whether or not eliminating
a redundancy appears to be seriously broken.  It tests

  /* Check if it's worth applying the partial redundancy elimination.  */
  if (ok_count < GCSE_AFTER_RELOAD_PARTIAL_FRACTION * not_ok_count)
    goto cleanup;
  if (ok_count < GCSE_AFTER_RELOAD_CRITICAL_FRACTION * critical_count)
    goto cleanup;

where the counts are accumulated edge->count values, e.g.

      if (EDGE_CRITICAL_P (pred))
        critical_count += pred->count;

Unfortunately it would appear that unless profile-directed
feedback is being used, edge->count is always zero!

Thus the check above always reduced to 0 < 0, and therefore
*every possible* redundancy elimination is performed, resulting
in much worse code than before in some cases.

Any suggestions how the heuristics can be fixed in the
absence of profile data?  Otherwise, I'd suggest to make
the pass conditional on profile-directed feedback ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2005-01-26 16:12 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-01-23  2:07 postreload-gcse heuristics broken? Ayal Zaks
2005-01-23 23:56 ` Ulrich Weigand
2005-01-24 14:08   ` Mostafa Hagog
2005-01-25 17:44   ` Mostafa Hagog
     [not found] <OFF354D103.6ACD791B-ONC2256F95.00572C20-C2256F95.005804B0@LocalDomain>
2005-01-26 16:34 ` Ulrich Weigand
     [not found] <OF1E57F82B.303FCFAA-ON41256F95.00569478-41256F95.0056BC0F@LocalDomain>
2005-01-26 16:29 ` Mostafa Hagog
     [not found] <OF33B6A83D.C92932BB-ONC2256F94.0079B206-C2256F94.0079D784@LocalDomain>
2005-01-26 16:01 ` Ulrich Weigand
     [not found] <OF54C43C7C.159EF7C1-ONC2256F94.00735E8A-C2256F94.00739C0E@LocalDomain>
2005-01-26 16:00 ` Ulrich Weigand
     [not found] <OFC154839D.B029149A-ONC1256F94.0071A74D-41256F94.00720B9C@LocalDomain>
2005-01-25 22:55 ` Mostafa Hagog
2005-01-25 23:15 ` Mostafa Hagog
     [not found] <OFA1A552DE.EC1881EC-ONC2256F94.005D20F7-C2256F94.005DE228@LocalDomain>
2005-01-25 21:02 ` Ulrich Weigand
  -- strict thread matches above, loose matches on Subject: below --
2005-01-20 20:13 Ulrich Weigand
2005-01-24 13:54 ` Mostafa Hagog

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).