public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/26944]  New: -ftree-ch generates worse code
@ 2006-03-30 16:15 dann at godzilla dot ics dot uci dot edu
  2006-03-30 16:25 ` [Bug tree-optimization/26944] [4.1/4.2 Regression] " rguenth at gcc dot gnu dot org
                   ` (15 more replies)
  0 siblings, 16 replies; 19+ messages in thread
From: dann at godzilla dot ics dot uci dot edu @ 2006-03-30 16:15 UTC (permalink / raw)
  To: gcc-bugs

The loop from the code below is compiled to this when using gcc-4.1 -O2:
.L5:
        movl    16(%ebp), %eax
        addl    %ecx, %eax
        addl    $1, %ecx
        movl    %edx, 20(%ebx,%eax,4)
        leal    (%edx,%ecx), %eax
        cmpl    %edi, %eax
        jle     .L5
but the code is much better when using gcc -fno-tree-ch -O2:
.L3:
        addl    $1, %ecx
        movl    %ebx, -4(%edx)
        addl    $4, %edx
        cmpl    %eax, %ecx
        jle     .L3
This is a regression: gcc-3.4.3 generates code similar to the -fno-tree-ch
version.

The code is from Dhrystone, as included in Unixbench.

The regression is quite important, as embedded processor people still use
Dhrystone for benchmarking compiler/processor speed.

It's strange that tree-ch messes up; the loop is about as simple as loops can
get.

typedef int One_Fifty;
typedef int Arr_1_Dim [50];
typedef int Arr_2_Dim [50] [50];
extern int Int_Glob;

void Proc_8 (Arr_1_Par_Ref, Arr_2_Par_Ref, Int_1_Par_Val, Int_2_Par_Val)
     Arr_1_Dim Arr_1_Par_Ref;
     Arr_2_Dim Arr_2_Par_Ref;
     int Int_1_Par_Val;
     int Int_2_Par_Val;
{
  register One_Fifty Int_Index;
  register One_Fifty Int_Loc;

  Int_Loc = Int_1_Par_Val + 5;
  Arr_1_Par_Ref [Int_Loc] = Int_2_Par_Val;
  Arr_1_Par_Ref [Int_Loc+1] = Arr_1_Par_Ref [Int_Loc];
  Arr_1_Par_Ref [Int_Loc+30] = Int_Loc;
  for (Int_Index = Int_Loc; Int_Index <= Int_Loc+1; ++Int_Index)
    Arr_2_Par_Ref [Int_Loc] [Int_Index] = Int_Loc;
  Arr_2_Par_Ref [Int_Loc] [Int_Loc-1] += 1;
  Arr_2_Par_Ref [Int_Loc+20] [Int_Loc] = Arr_1_Par_Ref [Int_Loc];
  Int_Glob = 5;
}
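
For experimenting with a newer compiler, the loop in question can be isolated
roughly as below. This is only a sketch: fill_row is a hypothetical helper that
is not part of Dhrystone, it uses an ISO C prototype instead of the K&R
definition above, and there is no guarantee that the isolated function shows
the same code generation as the full Proc_8.

```c
#include <assert.h>

typedef int Arr_2_Dim[50][50];

/* Hypothetical helper isolating the inner loop of Proc_8: stores int_loc
   into row int_loc at columns int_loc and int_loc + 1. */
void fill_row(Arr_2_Dim arr, int int_loc)
{
    int int_index;

    /* Keep both stores inside the 50x50 array. */
    assert(int_loc >= 0 && int_loc + 1 < 50);

    for (int_index = int_loc; int_index <= int_loc + 1; ++int_index)
        arr[int_loc][int_index] = int_loc;
}
```

Compiling this with -O2 -S, once with and once without -fno-tree-ch, and
comparing the two assembly files should show whether a given GCC version
treats the two cases differently.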


Intel's compiler generates even tighter code:

..B1.7:                         # Preds ..B1.10 ..B1.7
        movl      %ebx, (%ecx,%edx,4)                           #20.5
        addl      $1, %edx                                      #19.55
        cmpl      %eax, %edx                                    #19.3
        jle       ..B1.7        # Prob 80%                      #19.3


-- 
           Summary: -ftree-ch generates worse code
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26944



* [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
@ 2006-03-30 16:25 ` rguenth at gcc dot gnu dot org
  2006-03-30 16:43 ` dann at godzilla dot ics dot uci dot edu
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2006-03-30 16:25 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2006-03-30 16:25 -------
Note that this may also be PRE confusing SCEV in the presence of loop headers,
i.e. a sort of dup of PR26939.  Confirmed though.  This is a regression from
4.0.3, which also generates good code.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |26939
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
 GCC target triplet|i686-pc-linux-gnu           |
           Keywords|                            |missed-optimization
      Known to work|                            |4.0.3
   Last reconfirmed|0000-00-00 00:00:00         |2006-03-30 16:25:17
               date|                            |
            Summary|-ftree-ch generates worse   |[4.1/4.2 Regression] -ftree-
                   |code                        |ch generates worse code



* [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
  2006-03-30 16:25 ` [Bug tree-optimization/26944] [4.1/4.2 Regression] " rguenth at gcc dot gnu dot org
@ 2006-03-30 16:43 ` dann at godzilla dot ics dot uci dot edu
  2006-03-31 22:41   ` Daniel Berlin
  2006-03-31 22:41 ` dberlin at dberlin dot org
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 19+ messages in thread
From: dann at godzilla dot ics dot uci dot edu @ 2006-03-30 16:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from dann at godzilla dot ics dot uci dot edu  2006-03-30 16:43 -------
(In reply to comment #1)
> Note that this may also be PRE confusing SCEV in the presence of loop headers.

Talking about PRE, here's a possibly interesting observation from the PRE dump:

<L7>:;
  pretmp.30_53 = Int_Loc.0_4 * 200;
  pretmp.32_23 = (int[50] *) pretmp.30_53;
  pretmp.32_11 = pretmp.32_23 + Arr_2_Par_Ref_30;
  goto <bb 4> (<L2>);

<L6>:;
  pretmp.27_59 = Int_Loc.0_4 * 200;
  pretmp.28_45 = (int[50] *) pretmp.27_59;
  pretmp.28_49 = Arr_2_Par_Ref_30 + pretmp.28_45;

  # Int_Index_37 = PHI <Int_Index_58(7), Int_Loc_3(5)>;
<L0>:;
  D.1544_54 = pretmp.27_59;
  D.1545_55 = pretmp.28_45;
  D.1546_56 = pretmp.28_49;
  (*D.1546_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1548_41 >= Int_Index_58) goto <L8>; else goto <L9>;

<L8>:;
  goto <bb 3> (<L0>);

<L9>:;

  # prephitmp.33_40 = PHI <D.1546_56(8), pretmp.32_11(6)>;
  # prephitmp.33_18 = PHI <D.1545_55(8), pretmp.32_23(6)>;
  # prephitmp.31_25 = PHI <D.1544_54(8), pretmp.30_53(6)>;


Compare pretmp.28_49 with pretmp.32_11, why are the arguments in a different
order? Is there something unstable in the PRE algorithm?

One has to wonder what effects tree-ch has on more complex loops.
It might be interesting to test SPEC with and without tree-ch...




* [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
  2006-03-30 16:25 ` [Bug tree-optimization/26944] [4.1/4.2 Regression] " rguenth at gcc dot gnu dot org
  2006-03-30 16:43 ` dann at godzilla dot ics dot uci dot edu
@ 2006-03-31 22:41 ` dberlin at dberlin dot org
  2006-04-02  8:12 ` pinskia at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: dberlin at dberlin dot org @ 2006-03-31 22:41 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from dberlin at gcc dot gnu dot org  2006-03-31 22:41 -------
Subject: Re:  [4.1/4.2 Regression] -ftree-ch generates worse code


> Compare pretmp.28_49 with pretmp.32_11, why are the arguments in a different
> order? Is there something unstable in the PRE algorithm?
> 

No, we just call fold on the expressions we build, and whatever it gives
us, we use :)



* [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
                   ` (2 preceding siblings ...)
  2006-03-31 22:41 ` dberlin at dberlin dot org
@ 2006-04-02  8:12 ` pinskia at gcc dot gnu dot org
  2006-04-16 19:13 ` mmitchel at gcc dot gnu dot org
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-04-02  8:12 UTC (permalink / raw)
  To: gcc-bugs



-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor
   Target Milestone|---                         |4.1.1



* [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
                   ` (3 preceding siblings ...)
  2006-04-02  8:12 ` pinskia at gcc dot gnu dot org
@ 2006-04-16 19:13 ` mmitchel at gcc dot gnu dot org
  2006-05-02 17:38 ` steven at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2006-04-16 19:13 UTC (permalink / raw)
  To: gcc-bugs



-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2



* [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
                   ` (4 preceding siblings ...)
  2006-04-16 19:13 ` mmitchel at gcc dot gnu dot org
@ 2006-05-02 17:38 ` steven at gcc dot gnu dot org
  2006-05-03 18:55 ` dann at godzilla dot ics dot uci dot edu
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: steven at gcc dot gnu dot org @ 2006-05-02 17:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from steven at gcc dot gnu dot org  2006-05-02 17:38 -------
With GVN-PRE disabled, the inner loop in the .cunroll, .ivopts and
.final_cleanup dumps looks like this:

  # Int_Index_37 = PHI <Int_Index_58(5), Int_Loc_3(3)>;
<L0>:;
  (*D.1561_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1563_41 >= Int_Index_58) goto <L8>; else goto <L2>;

<L8>:;
  goto <bb 4> (<L0>);

and

  # ivtmp.34_26 = PHI <ivtmp.34_19(5), ivtmp.34_1(3)>;
  # Int_Index_37 = PHI <Int_Index_58(5), Int_Loc_3(3)>;
<L0>:;
  D.1613_59 = (int *) ivtmp.34_26;
  MEM[base: D.1613_59, offset: 20B] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  ivtmp.34_19 = ivtmp.34_26 + 4B;
  if (D.1563_41 >= Int_Index_58) goto <L8>; else goto <L2>;

<L8>:;
  goto <bb 4> (<L0>);

and

<L0>:;
  MEM[base: (int *) ivtmp.34, offset: 20B] = Int_Loc;
  Int_Index = Int_Index + 1;
  ivtmp.34 = ivtmp.34 + 4B;
  if (D.1563 >= Int_Index) goto <L0>; else goto <L2>;

which compiles to:
.L4:
        addl    $1, %eax
        movl    %ecx, 20(%edx)
        addl    $4, %edx
        cmpl    %eax, %ebx
        jge     .L4



With PRE enabled, we get this:

  # Int_Index_37 = PHI <Int_Index_58(6), Int_Loc_3(4)>;
<L0>:;
  D.1559_54 = pretmp.27_59;
  D.1560_55 = pretmp.28_45;
  D.1561_56 = pretmp.28_49;
  (*pretmp.28_49)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1563_41 >= Int_Index_58) goto <L8>; else goto <L9>;

<L8>:;
  goto <bb 5> (<L0>);

and

  # ivtmp.38_26 = PHI <ivtmp.38_35(6), 0(4)>;
<L0>:;
  D.1559_54 = pretmp.27_59;
  D.1560_55 = pretmp.28_45;
  D.1561_56 = pretmp.28_49;
  D.1622_34 = (int *) pretmp.28_49;
  D.1623_33 = (int *) Int_1_Par_Val_2;
  D.1624_22 = (int *) ivtmp.38_26;
  D.1625_21 = D.1623_33 + D.1624_22;
  MEM[base: D.1622_34, index: D.1625_21, step: 4B, offset: 20B] = Int_Loc_3;
  ivtmp.38_35 = ivtmp.38_26 + 1;
  D.1626_20 = (unsigned int) Int_1_Par_Val_2;
  D.1627_17 = D.1626_20 + ivtmp.38_35;
  D.1628_16 = D.1627_17 + 5;
  Int_Index_15 = (One_Fifty) D.1628_16;
  if (D.1563_41 >= Int_Index_15) goto <L8>; else goto <L9>;

<L8>:;
  goto <bb 5> (<L0>);

and

<L0>:;
  MEM[base: (int *) prephitmp.33, index: (int *) Int_1_Par_Val + (int *) ivtmp.38, step: 4B, offset: 20B] = Int_Loc;
  ivtmp.38 = ivtmp.38 + 1;
  if ((One_Fifty) ((unsigned int) Int_1_Par_Val + 5 + ivtmp.38) <= D.1563) goto <L0>; else goto <L2>;

and from there:
.L5:
        leal    (%edi,%edx), %eax
        addl    $1, %edx
        movl    %ecx, 20(%ebx,%eax,4)
        leal    (%ecx,%edx), %eax
        cmpl    %esi, %eax
        jle     .L5

So it's a mix of PRE and IVOPTs that gives this strange code.
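
To make the difference between the two .ivopts results easier to see, here is
a rough hand-written C rendering of both loop shapes. It is not compiler
output: the function names and the row/start/last/value parameters are made up
for illustration, and the constant 20B offset from the dumps is folded into
the start index. Both functions perform the same stores; the first walks a
pointer, the second rebuilds the address and the exit test from a 0-based
counter on every iteration.

```c
#include <assert.h>

/* Shape seen without PRE: one pointer IV bumped by one element (4 bytes)
   per iteration, plus the original index for the exit test. */
void store_pointer_iv(int *row, int start, int last, int value)
{
    int *p = row + start;          /* plays the role of ivtmp.34 */
    int index = start;

    assert(start <= last);         /* the body runs at least once */
    do {
        *p = value;                /* store through the pointer IV */
        index = index + 1;
        p = p + 1;                 /* ivtmp.34 + 4B */
    } while (last >= index);
}

/* Shape seen with PRE: a counter starting at 0; the address and the exit
   test are both recomputed from it. */
void store_index_iv(int *row, int start, int last, int value)
{
    unsigned int iv = 0;           /* plays the role of ivtmp.38 */

    assert(start <= last);
    do {
        row[start + (int) iv] = value;   /* base + (start + iv) * 4 */
        iv = iv + 1;
    } while ((int) ((unsigned int) start + iv) <= last);
}
```

The second shape needs the extra addition for the address and another one for
the exit test, which matches the two lea instructions in the listing below.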

BTW regarding "Its strange that tree-ch messes up", please next time don't
blame random passes if you don't fully analyze the problem.



* [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
                   ` (5 preceding siblings ...)
  2006-05-02 17:38 ` steven at gcc dot gnu dot org
@ 2006-05-03 18:55 ` dann at godzilla dot ics dot uci dot edu
  2006-05-03 19:00   ` Andrew Pinski
  2006-05-03 19:00 ` pinskia at physics dot uc dot edu
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 19+ messages in thread
From: dann at godzilla dot ics dot uci dot edu @ 2006-05-03 18:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from dann at godzilla dot ics dot uci dot edu  2006-05-03 18:54 -------
IMO Comment #4 does not look close enough at what is actually happening.
IMO tree-ch is the root cause here.

The code looks like this before .ch:
  goto <bb 2> (<L1>);

<L0>:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_1] = Int_Loc_3;
  Int_Index_58 = Int_Index_1 + 1;

  # Int_Index_1 = PHI <Int_Loc_3(0), Int_Index_58(1)>;
<L1>:;
  D.1305_26 = Int_Loc_3 + 1;
  if (Int_Index_1 <= D.1305_26) goto <L0>; else goto <L2>;

<L2>:;


After .ch it looks like this:
  D.1305_41 = Int_Loc_3 + 1;
  if (Int_Loc_3 <= D.1305_41) goto <L0>; else goto <L2>;
  ^-- This just complicates the CFG. Look below to see the effects of this
      in later passes. Plus, just look at the comparison ...

  # Int_Index_37 = PHI <Int_Index_58(1), Int_Loc_3(0)>;
<L0>:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  D.1305_26 = Int_Loc_3 + 1;
  if (D.1305_26 >= Int_Index_58) goto <L0>; else goto <L2>;

<L2>:;

Given the above CFG, critical edge splitting transforms this into:
  D.1305_41 = Int_Loc_3 + 1;
  if (Int_Loc_3 <= D.1305_41) goto <L6>; else goto <L7>;

<L7>:;
  goto <bb 2> (<L2>);

<L6>:;

  # Int_Index_37 = PHI <Int_Index_58(5), Int_Loc_3(3)>;
<L0>:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1305_41 >= Int_Index_58) goto <L8>; else goto <L9>;

<L8>:;
  goto <bb 1> (<L0>);

<L9>:;

<L2>:;

Given the above CFG, PRE will dutifully fill a lot of the empty basic blocks
with code.

After PRE:
  D.1305_41 = Int_Loc_3 + 1;
  if (Int_Loc_3 <= D.1305_41) goto <L6>; else goto <L7>;

<L7>:;
  pretmp.34_45 = Int_Loc.0_4 * 200;
  pretmp.36_57 = (int[50] *) pretmp.34_45;
  pretmp.38_25 = Arr_2_Par_Ref_30 + pretmp.36_57;
  goto <bb 2> (<L2>);

<L6>:;
  pretmp.30_26 = Int_Loc.0_4 * 200;
  pretmp.31_19 = (int[50] *) pretmp.30_26;
  pretmp.32_1 = pretmp.31_19 + Arr_2_Par_Ref_30;

  # Int_Index_37 = PHI <Int_Index_58(5), Int_Loc_3(3)>;
<L0>:;
  D.1301_54 = pretmp.30_26;
  D.1302_55 = pretmp.31_19;
  D.1303_56 = pretmp.32_1;
  (*D.1303_56)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1305_41 >= Int_Index_58) goto <L8>; else goto <L9>;

<L8>:;
  goto <bb 1> (<L0>);

<L9>:;

  # prephitmp.39_23 = PHI <D.1303_56(6), pretmp.38_25(4)>;
  # prephitmp.37_53 = PHI <D.1302_55(6), pretmp.36_57(4)>;
  # prephitmp.35_49 = PHI <D.1301_54(6), pretmp.34_45(4)>;
<L2>:;


Now, when using -fno-tree-ch, the code looks like this before critical edge
splitting:
  goto <bb 2> (<L1>);

<L0>:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_1] = Int_Loc_3;
  Int_Index_58 = Int_Index_1 + 1;

  # Int_Index_1 = PHI <Int_Loc_3(0), Int_Index_58(1)>;
<L1>:;
  D.1305_26 = Int_Loc_3 + 1;
  if (Int_Index_1 <= D.1305_26) goto <L0>; else goto <L2>;

<L2>:;


After critical edge splitting (crited) it looks like this (i.e. no change):

  goto <bb 2> (<L1>);

<L0>:;
  D.1301_54 = Int_Loc.0_4 * 200;
  D.1302_55 = (int[50] *) D.1301_54;
  D.1303_56 = Arr_2_Par_Ref_30 + D.1302_55;
  (*D.1303_56)[Int_Index_1] = Int_Loc_3;
  Int_Index_58 = Int_Index_1 + 1;

  # Int_Index_1 = PHI <Int_Loc_3(0), Int_Index_58(1)>;
<L1>:;
  D.1305_26 = Int_Loc_3 + 1;
  if (Int_Index_1 <= D.1305_26) goto <L0>; else goto <L2>;

<L2>:;

and after PRE:

  goto <bb 2> (<L1>);

<L0>:;
  D.1301_54 = pretmp.31_49;
  D.1302_55 = pretmp.32_45;
  D.1303_56 = pretmp.33_41;
  (*D.1303_56)[Int_Index_1] = Int_Loc_3;
  Int_Index_58 = Int_Index_1 + 1;

  # Int_Index_1 = PHI <Int_Loc_3(0), Int_Index_58(1)>;
<L1>:;
  D.1305_26 = pretmp.30_19;
  if (Int_Index_1 <= D.1305_26) goto <L0>; else goto <L2>;

<L2>:;
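
In source terms, the change that .ch makes to the loop can be pictured roughly
as below. This is a sketch that assumes the usual description of loop header
copying (the exit test is duplicated in front of the loop, so the loop itself
becomes bottom-tested). The function names are hypothetical and the loop body
is a stand-in, not the Dhrystone store.

```c
#include <assert.h>

/* Loop as written: the exit test sits in the loop header (<L1> above). */
int sum_top_tested(int first, int last)
{
    int sum = 0;
    int i = first;
    while (i <= last) {
        sum += i;
        ++i;
    }
    return sum;
}

/* After header copying: one copy of the test guards loop entry, the other
   sits at the bottom of the loop body. */
int sum_bottom_tested(int first, int last)
{
    int sum = 0;
    int i = first;
    if (i <= last) {
        do {
            sum += i;
            ++i;
        } while (i <= last);
    }
    return sum;
}

/* Both forms must agree, including when the loop runs zero times. */
void check_equivalence(void)
{
    int first, last;
    for (first = -3; first <= 3; ++first)
        for (last = -3; last <= 3; ++last)
            assert(sum_top_tested(first, last) ==
                   sum_bottom_tested(first, last));
}
```

The guarding if is the extra comparison and extra CFG edge that show up in the
"after .ch" dump above.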




* [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
                   ` (6 preceding siblings ...)
  2006-05-03 18:55 ` dann at godzilla dot ics dot uci dot edu
@ 2006-05-03 19:00 ` pinskia at physics dot uc dot edu
  2006-05-03 21:33 ` steven at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: pinskia at physics dot uc dot edu @ 2006-05-03 19:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from pinskia at physics dot uc dot edu  2006-05-03 19:00 -------
Subject: Re:  [4.1/4.2 Regression] -ftree-ch generates worse code

> ------- Comment #5 from dann at godzilla dot ics dot uci dot edu  2006-05-03 18:54 -------
> IMO Comment #4 does not look close enough at what is actually happening.
> IMO tree-ch is the root cause here.
> 
> Given the above CFG, critical edge splitting transforms this into:
> Given the above CFG PRE will dutifully fill with code a lot of the empty basic
> blocks: 

None of the above is the real issue.  TREE CH is doing the correct thing by
simplifying the loop.  PRE is doing the correct thing by getting rid of
redundancies.

The main issue is really that the register allocator (RA) is not very good.

-- Pinski



* [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
                   ` (7 preceding siblings ...)
  2006-05-03 19:00 ` pinskia at physics dot uc dot edu
@ 2006-05-03 21:33 ` steven at gcc dot gnu dot org
  2006-05-03 21:53 ` dann at godzilla dot ics dot uci dot edu
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: steven at gcc dot gnu dot org @ 2006-05-03 21:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from steven at gcc dot gnu dot org  2006-05-03 21:33 -------
Re. comment #5, user code could also have a CFG like that, so we should handle
this case properly (and we do, tree-ch is doing the right thing afaict).  Re.
comment #6, I don't see what the register allocator has to do with this at all. 

The bottom line is that for the case where we produce good code, IVOPTs selects
a simple addressing mode and produces a simple loop exit condition; and for the
complicated code, IVOPTs picks an addressing mode that requires a lea and an
extra register.

Look back at that loop for a moment. With tree-ch, ignoring dead code (the sets
to SSA names 5[456] are dead!), the .cunroll dump (i.e. just before IVOPTs)
looks like this:

  # Int_Index_37 = PHI <Int_Index_58(6), Int_Loc_3(4)>;
<L0>:;
  (*pretmp.28_49)[Int_Index_37] = Int_Loc_3;
  Int_Index_58 = Int_Index_37 + 1;
  if (D.1563_41 >= Int_Index_58) goto <L8>; else goto <L9>;

<L8>:;
  goto <bb 5> (<L0>);

That looks rather nice to me. But just after IVOPTs (in the .ivopts dump) we
have turned that simple nice code into this mess:

  # ivtmp.38_26 = PHI <ivtmp.38_35(6), 0(4)>;
<L0>:;
  D.1622_34 = (int *) pretmp.28_49;
  D.1623_33 = (int *) Int_1_Par_Val_2;
  D.1624_22 = (int *) ivtmp.38_26;
  D.1625_21 = D.1623_33 + D.1624_22;
  MEM[base: D.1622_34, index: D.1625_21, step: 4B, offset: 20B] = Int_Loc_3;
  ivtmp.38_35 = ivtmp.38_26 + 1;
  D.1626_20 = (unsigned int) Int_1_Par_Val_2;
  D.1627_17 = D.1626_20 + ivtmp.38_35;
  D.1628_16 = D.1627_17 + 5;
  Int_Index_15 = (One_Fifty) D.1628_16;
  if (D.1563_41 >= Int_Index_15) goto <L8>; else goto <L9>;

<L8>:;
  goto <bb 5> (<L0>);

If this is caused by the register allocator, I'd like to know why you'd think
that.  And if this is the doing of tree-ch, then I'd like to know what you
expect tree-ch to do instead.  But as far as I can tell, this is just a very
poor choice by IVOPTs.



* [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
                   ` (8 preceding siblings ...)
  2006-05-03 21:33 ` steven at gcc dot gnu dot org
@ 2006-05-03 21:53 ` dann at godzilla dot ics dot uci dot edu
  2006-05-04 21:25 ` pinskia at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: dann at godzilla dot ics dot uci dot edu @ 2006-05-03 21:53 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from dann at godzilla dot ics dot uci dot edu  2006-05-03 21:53 -------
WRT this code generated by tree-ch:
  D.1305_41 = Int_Loc_3 + 1;
  if (Int_Loc_3 <= D.1305_41) goto <L0>; else goto <L2>;

AFAICT there's exactly one value for which the comparison can be false. IMO it
would be better to test for that value directly instead of generating a new SSA
name and another expression.
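
The "exactly one value" observation can be checked exhaustively on a narrow
type. The sketch below models an 8-bit signed value whose increment wraps
around. That wrapping is an assumption written out explicitly in the helper,
because in ISO C signed overflow is undefined rather than wrapping. The names
wrapping_inc8 and count_false_cases are illustrative only.

```c
#include <assert.h>

enum { MIN8 = -128, MAX8 = 127 };

/* Increment with explicit two's-complement wraparound for an 8-bit value
   held in an int (assumed model, not what ISO C guarantees for int). */
int wrapping_inc8(int x)
{
    assert(x >= MIN8 && x <= MAX8);
    return (x == MAX8) ? MIN8 : x + 1;
}

/* Count the 8-bit values x for which "x <= x + 1" is false and report the
   last such value through *witness. */
int count_false_cases(int *witness)
{
    int count = 0;
    int x;
    for (x = MIN8; x <= MAX8; ++x) {
        if (!(x <= wrapping_inc8(x))) {
            ++count;
            *witness = x;
        }
    }
    return count;
}
```

Under this model the comparison fails only for the maximum value, so the guard
could be written as a single inequality test against that constant. When
signed overflow is treated as undefined, a compiler is also allowed to assume
the comparison is always true.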



* [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
                   ` (9 preceding siblings ...)
  2006-05-03 21:53 ` dann at godzilla dot ics dot uci dot edu
@ 2006-05-04 21:25 ` pinskia at gcc dot gnu dot org
  2006-05-25  2:39 ` mmitchel at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2006-05-04 21:25 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from pinskia at gcc dot gnu dot org  2006-05-04 21:25 -------
(In reply to comment #8)
> WRT this code generated by tree-ch:
>   D.1305_41 = Int_Loc_3 + 1;
>   if (Int_Loc_3 <= D.1305_41) goto <L0>; else goto <L2>;
> 
> AFAICT there's exactly one value for which the comparison can be false. IMO it
> would be better to test for that value directly instead of generating a new SSA
> name and another expression.

Well, CH should not do this, as it never sees the two expressions together, only
the one COND_EXPR.  If we run VRP after CH, it will not currently fix this
either, because VRP does not record that many symbolic ranges (I forgot the PR
number; it was filed by me).  If VRP did that and we added a VRP pass after CH
but before IV-OPTS, maybe this will fix itself.



* [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
                   ` (10 preceding siblings ...)
  2006-05-04 21:25 ` pinskia at gcc dot gnu dot org
@ 2006-05-25  2:39 ` mmitchel at gcc dot gnu dot org
  2007-02-14  9:10 ` [Bug tree-optimization/26944] [4.1/4.2/4.3 " mmitchel at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2006-05-25  2:39 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from mmitchel at gcc dot gnu dot org  2006-05-25 02:34 -------
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.


-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.1.1                       |4.1.2



* [Bug tree-optimization/26944] [4.1/4.2/4.3 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
                   ` (11 preceding siblings ...)
  2006-05-25  2:39 ` mmitchel at gcc dot gnu dot org
@ 2007-02-14  9:10 ` mmitchel at gcc dot gnu dot org
  2007-06-18  5:28 ` [Bug tree-optimization/26944] [4.1/4.2 " pinskia at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: mmitchel at gcc dot gnu dot org @ 2007-02-14  9:10 UTC (permalink / raw)
  To: gcc-bugs



-- 

mmitchel at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.1.2                       |4.1.3



* [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
                   ` (12 preceding siblings ...)
  2007-02-14  9:10 ` [Bug tree-optimization/26944] [4.1/4.2/4.3 " mmitchel at gcc dot gnu dot org
@ 2007-06-18  5:28 ` pinskia at gcc dot gnu dot org
  2008-07-04 20:22 ` [Bug tree-optimization/26944] [4.2 " jsm28 at gcc dot gnu dot org
  2009-03-30 15:51 ` jsm28 at gcc dot gnu dot org
  15 siblings, 0 replies; 19+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2007-06-18  5:28 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from pinskia at gcc dot gnu dot org  2007-06-18 05:28 -------
The trunk no longer produces a loop, so this has been fixed unless you can
provide a testcase where we still produce worse code.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|4.0.3                       |4.0.3 4.3.0
            Summary|[4.1/4.2/4.3 Regression] -  |[4.1/4.2 Regression] -ftree-
                   |ftree-ch generates worse    |ch generates worse code
                   |code                        |



* [Bug tree-optimization/26944] [4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
                   ` (13 preceding siblings ...)
  2007-06-18  5:28 ` [Bug tree-optimization/26944] [4.1/4.2 " pinskia at gcc dot gnu dot org
@ 2008-07-04 20:22 ` jsm28 at gcc dot gnu dot org
  2009-03-30 15:51 ` jsm28 at gcc dot gnu dot org
  15 siblings, 0 replies; 19+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2008-07-04 20:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from jsm28 at gcc dot gnu dot org  2008-07-04 20:21 -------
Closing 4.1 branch.


-- 

jsm28 at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.1/4.2 Regression] -ftree-|[4.2 Regression] -ftree-ch
                   |ch generates worse code     |generates worse code
   Target Milestone|4.1.3                       |4.2.5



* [Bug tree-optimization/26944] [4.2 Regression] -ftree-ch generates worse code
  2006-03-30 16:15 [Bug tree-optimization/26944] New: -ftree-ch generates worse code dann at godzilla dot ics dot uci dot edu
                   ` (14 preceding siblings ...)
  2008-07-04 20:22 ` [Bug tree-optimization/26944] [4.2 " jsm28 at gcc dot gnu dot org
@ 2009-03-30 15:51 ` jsm28 at gcc dot gnu dot org
  15 siblings, 0 replies; 19+ messages in thread
From: jsm28 at gcc dot gnu dot org @ 2009-03-30 15:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from jsm28 at gcc dot gnu dot org  2009-03-30 15:51 -------
Closing 4.2 branch, fixed in 4.3.


-- 

jsm28 at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
      Known to fail|                            |4.2.5
         Resolution|                            |FIXED
   Target Milestone|4.2.5                       |4.3.0

