public inbox for gcc-bugs@sourceware.org
* [Bug target/95018] New: Excessive unrolling for Fortran library array handling
@ 2020-05-09  7:58 tkoenig at gcc dot gnu.org
  2020-05-09  8:00 ` [Bug target/95018] " tkoenig at gcc dot gnu.org
                   ` (38 more replies)
  0 siblings, 39 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-09  7:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

            Bug ID: 95018
           Summary: Excessive unrolling for Fortran library array handling
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

Created attachment 48488
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48488&action=edit
Generated assembly

The code generated for in_pack_i4.c from libgfortran on POWER9 is huge
and, presumably, slow; I assume that other code in the library is
similarly affected.

The problem manifests itself in the unrolling of the loops which
do all the work:

  while (src)
    {
      /* Copy the data.  */
      *(dest++) = *src;
      /* Advance to the next element.  */
      src += stride0;
      count[0]++;
      /* Advance to the next source element.  */
      index_type n = 0;
      while (count[n] == extent[n])
        {
          /* When we get to the end of a dimension, reset it and increment
             the next dimension.  */
          count[n] = 0;
          /* We could precalculate these products, but this is a less
             frequently used path so probably not worth it.  */
          src -= stride[n] * extent[n];
          n++;
          if (n == dim)
            {
              src = NULL;
              break;
            }
          else
            {
              count[n]++;
              src += stride[n];
            }
        }
    }
  return destptr;
}

One problem here is the while (count[n] == extent[n]) loop.
This is an odometer algorithm that advances to the next source element.
Most of the time its condition is false, so unrolling it is definitely
not a good idea.
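
For illustration (this is not code from the library): with dim == 2 and
stride0 == stride[0], the loop above is equivalent to

  for (count[1] = 0; count[1] < extent[1]; count[1]++)
    {
      for (count[0] = 0; count[0] < extent[0]; count[0]++)
        {
          /* Copy one element and advance within the row.  */
          *(dest++) = *src;
          src += stride0;
        }
      /* Step from the end of one row to the start of the next.  */
      src += stride[1] - stride[0] * extent[0];
    }

(ignoring the final src = NULL), so the inner while body is only reached
once per completed row; the hot path is just the element copy.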

x86_64 does not appear to be affected.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
@ 2020-05-09  8:00 ` tkoenig at gcc dot gnu.org
  2020-05-09  8:10 ` tkoenig at gcc dot gnu.org
                   ` (37 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-09  8:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Created attachment 48489
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48489&action=edit
Preprocessed source

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
  2020-05-09  8:00 ` [Bug target/95018] " tkoenig at gcc dot gnu.org
@ 2020-05-09  8:10 ` tkoenig at gcc dot gnu.org
  2020-05-09  8:36 ` [Bug target/95018] [11 Regression] " tkoenig at gcc dot gnu.org
                   ` (36 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-09  8:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #2 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Created attachment 48490
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48490&action=edit
-fdump-tree-optimized dump

Finally, the -fdump-tree-optimized dump.

Unrolling already appears to happen in the middle end, all the way up to 15
(which is not good).

The PHI nodes are also something to behold:

 <bb 41> [local count: 96219245]:
  # src_79 = PHI <src_268(36), src_251(34), src_285(38), src_74(14),
src_70(16), src_115(18), src_132(20), src_149(22), src_166(24), src_183(26),
src_200(28), src_217(30), src_234(32), src_62(40)>
  # count_I_lsm.31_531 = PHI <count_I_lsm.31_530(36), count_I_lsm.31_530(34),
count_I_lsm.31_530(38), count_I_lsm.31_530(14), count_I_lsm.31_530(16),
count_I_lsm.31_530(18), count_I_lsm.31_530(20), count_I_lsm.31_530(22),
count_I_lsm.31_530(24), count_I_lsm.31_530(26), count_I_lsm.31_530(28),
count_I_lsm.31_530(30), count_I_lsm.31_530(32), _21(40)>
  # count_I_lsm.33_537 = PHI <0(36), 0(34), 0(38), _98(14), 0(16), 0(18),
0(20), 0(22), 0(24), 0(26), 0(28), 0(30), 0(32), 0(40)>
  # count_I_lsm.35_543 = PHI <0(36), 0(34), 0(38), count_I_lsm.35_542(14),
_104(16), 0(18), 0(20), 0(22), 0(24), 0(26), 0(28), 0(30), 0(32), 0(40)>
  # count_I_lsm.37_549 = PHI <0(36), 0(34), 0(38), count_I_lsm.37_548(14),
count_I_lsm.37_548(16), _109(18), 0(20), 0(22), 0(24), 0(26), 0(28), 0(30),
0(32), 0(40)>
  # count_I_lsm.39_555 = PHI <0(36), 0(34), 0(38), count_I_lsm.39_554(14),
count_I_lsm.39_554(16), count_I_lsm.39_554(18), _126(20), 0(22), 0(24), 0(26),
0(28), 0(30), 0(32), 0(40)>
  # count_I_lsm.41_561 = PHI <0(36), 0(34), 0(38), count_I_lsm.41_560(14),
count_I_lsm.41_560(16), count_I_lsm.41_560(18), count_I_lsm.41_560(20),
_143(22), 0(24), 0(26), 0(28), 0(30), 0(32), 0(40)>
  # count_I_lsm.43_567 = PHI <0(36), 0(34), 0(38), count_I_lsm.43_566(14),
count_I_lsm.43_566(16), count_I_lsm.43_566(18), count_I_lsm.43_566(20),
count_I_lsm.43_566(22), _160(24), 0(26), 0(28), 0(30), 0(32), 0(40)>
  # count_I_lsm.45_573 = PHI <0(36), 0(34), 0(38), count_I_lsm.45_572(14),
count_I_lsm.45_572(16), count_I_lsm.45_572(18), count_I_lsm.45_572(20),
count_I_lsm.45_572(22), count_I_lsm.45_572(24), _177(26), 0(28), 0(30), 0(32),
0(40)>
  # count_I_lsm.47_579 = PHI <0(36), 0(34), 0(38), count_I_lsm.47_578(14),
count_I_lsm.47_578(16), count_I_lsm.47_578(18), count_I_lsm.47_578(20),
count_I_lsm.47_578(22), count_I_lsm.47_578(24), count_I_lsm.47_578(26),
_194(28), 0(30), 0(32), 0(40)>
  # count_I_lsm.49_585 = PHI <0(36), 0(34), 0(38), count_I_lsm.49_584(14),
count_I_lsm.49_584(16), count_I_lsm.49_584(18), count_I_lsm.49_584(20),
count_I_lsm.49_584(22), count_I_lsm.49_584(24), count_I_lsm.49_584(26),
count_I_lsm.49_584(28), _211(30), 0(32), 0(40)>
  # count_I_lsm.51_591 = PHI <0(36), 0(34), 0(38), count_I_lsm.51_590(14),
count_I_lsm.51_590(16), count_I_lsm.51_590(18), count_I_lsm.51_590(20),
count_I_lsm.51_590(22), count_I_lsm.51_590(24), count_I_lsm.51_590(26),
count_I_lsm.51_590(28), count_I_lsm.51_590(30), _228(32), 0(40)>
  # count_I_lsm.53_597 = PHI <0(36), _245(34), 0(38), count_I_lsm.53_596(14),
count_I_lsm.53_596(16), count_I_lsm.53_596(18), count_I_lsm.53_596(20),
count_I_lsm.53_596(22), count_I_lsm.53_596(24), count_I_lsm.53_596(26),
count_I_lsm.53_596(28), count_I_lsm.53_596(30), count_I_lsm.53_596(32), 0(40)>
  # count_I_lsm.55_603 = PHI <_262(36), count_I_lsm.55_602(34), 0(38),
count_I_lsm.55_602(14), count_I_lsm.55_602(16), count_I_lsm.55_602(18),
count_I_lsm.55_602(20), count_I_lsm.55_602(22), count_I_lsm.55_602(24),
count_I_lsm.55_602(26), count_I_lsm.55_602(28), count_I_lsm.55_602(30),
count_I_lsm.55_602(32), 0(40)>
  # count_I_lsm.57_609 = PHI <count_I_lsm.57_608(36), count_I_lsm.57_608(34),
_279(38), count_I_lsm.57_608(14), count_I_lsm.57_608(16),
count_I_lsm.57_608(18), count_I_lsm.57_608(20), count_I_lsm.57_608(22),
count_I_lsm.57_608(24), count_I_lsm.57_608(26), count_I_lsm.57_608(28),
count_I_lsm.57_608(30), count_I_lsm.57_608(32), 0(40)>

  <bb 42> [local count: 102536739]:
  # src_33 = PHI <src_79(41), src_57(12)>
  # count_I_lsm.8_29 = PHI <0(41), _15(12)>
  # count_I_lsm.31_532 = PHI <count_I_lsm.31_531(41), count_I_lsm.31_530(12)>
  # count_I_lsm.33_538 = PHI <count_I_lsm.33_537(41), count_I_lsm.33_536(12)>
  # count_I_lsm.35_544 = PHI <count_I_lsm.35_543(41), count_I_lsm.35_542(12)>
  # count_I_lsm.37_550 = PHI <count_I_lsm.37_549(41), count_I_lsm.37_548(12)>
  # count_I_lsm.39_556 = PHI <count_I_lsm.39_555(41), count_I_lsm.39_554(12)>
  # count_I_lsm.41_562 = PHI <count_I_lsm.41_561(41), count_I_lsm.41_560(12)>
  # count_I_lsm.43_568 = PHI <count_I_lsm.43_567(41), count_I_lsm.43_566(12)>
  # count_I_lsm.45_574 = PHI <count_I_lsm.45_573(41), count_I_lsm.45_572(12)>
  # count_I_lsm.47_580 = PHI <count_I_lsm.47_579(41), count_I_lsm.47_578(12)>
  # count_I_lsm.49_586 = PHI <count_I_lsm.49_585(41), count_I_lsm.49_584(12)>
  # count_I_lsm.51_592 = PHI <count_I_lsm.51_591(41), count_I_lsm.51_590(12)>
  # count_I_lsm.53_598 = PHI <count_I_lsm.53_597(41), count_I_lsm.53_596(12)>
  # count_I_lsm.55_604 = PHI <count_I_lsm.55_603(41), count_I_lsm.55_602(12)>
  # count_I_lsm.57_610 = PHI <count_I_lsm.57_609(41), count_I_lsm.57_608(12)>
  goto <bb 12>; [100.00%]

  <bb 43> [local count: 29043813]:
  # _35 = PHI <_63(3), destptr_52(39), destptr_52(10), destptr_52(13),
destptr_52(37), destptr_52(15), destptr_52(17), destptr_52(19), destptr_52(21),
destptr_52(23), destptr_52(25), destptr_52(27), destptr_52(29), destptr_52(31),
destptr_52(33), destptr_52(35), destptr_52(40)>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
  2020-05-09  8:00 ` [Bug target/95018] " tkoenig at gcc dot gnu.org
  2020-05-09  8:10 ` tkoenig at gcc dot gnu.org
@ 2020-05-09  8:36 ` tkoenig at gcc dot gnu.org
  2020-05-09  9:01 ` tkoenig at gcc dot gnu.org
                   ` (35 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-09  8:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0
            Summary|Excessive unrolling for     |[11 Regression] Excessive
                   |Fortran library array       |unrolling for Fortran
                   |handling                    |library array handling

--- Comment #3 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Does not occur on 7.5.0, so it's an 11 regression at least.

Further testing underway.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2020-05-09  8:36 ` [Bug target/95018] [11 Regression] " tkoenig at gcc dot gnu.org
@ 2020-05-09  9:01 ` tkoenig at gcc dot gnu.org
  2020-05-09  9:03 ` tkoenig at gcc dot gnu.org
                   ` (34 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-09  9:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #4 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
9.3.0 is also not affected.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2020-05-09  9:01 ` tkoenig at gcc dot gnu.org
@ 2020-05-09  9:03 ` tkoenig at gcc dot gnu.org
  2020-05-09  9:42 ` [Bug target/95018] [10/11 " tkoenig at gcc dot gnu.org
                   ` (33 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-09  9:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #5 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
For 9.3.0, I get

$ objdump --disassemble in_pack_i4.o  | wc -l
123

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2020-05-09  9:03 ` tkoenig at gcc dot gnu.org
@ 2020-05-09  9:42 ` tkoenig at gcc dot gnu.org
  2020-05-09 13:21 ` tkoenig at gcc dot gnu.org
                   ` (32 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-09  9:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11 Regression] Excessive   |[10/11 Regression]
                   |unrolling for Fortran       |Excessive unrolling for
                   |library array handling      |Fortran library array
                   |                            |handling
                 CC|                            |koenigni at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org
             Target|                            |powerpc64le-unknown-linux-g
                   |                            |nu
   Target Milestone|11.0                        |10.2
            Version|11.0                        |10.0

--- Comment #6 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
And gcc 10 is also affected:

$ objdump --disassemble in_pack_i4.o | wc -l
451

So, marking this a 10/11 Regression.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2020-05-09  9:42 ` [Bug target/95018] [10/11 " tkoenig at gcc dot gnu.org
@ 2020-05-09 13:21 ` tkoenig at gcc dot gnu.org
  2020-05-11  5:03 ` tkoenig at gcc dot gnu.org
                   ` (31 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-09 13:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #7 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Just checked aarch64, and that also isn't affected:

tkoenig@gcc116:~/gcc-bin/aarch64-unknown-linux-gnu/libgfortran$ objdump
--disassemble in_pack_i4.o | wc -l
95

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2020-05-09 13:21 ` tkoenig at gcc dot gnu.org
@ 2020-05-11  5:03 ` tkoenig at gcc dot gnu.org
  2020-05-11  5:18 ` tkoenig at gcc dot gnu.org
                   ` (30 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-11  5:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2020-05-11
             Status|UNCONFIRMED                 |NEW

--- Comment #8 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
The patch which caused the problem was

commit 6d099a76a0f6a040a3e678f2bce7fc69cc3257d8
Author: Jiufu Guo <guojiufu@linux.ibm.com>
Date:   Mon Oct 28 05:23:24 2019 +0000

    rs6000: Enable limited unrolling at -O2

    In PR88760, there were a few discussions about improving or tuning the
    unroller for targets, and we agreed to enable the unroller for small
    loops at -O2 first.  We could see a performance improvement (~10%) for
    the code below:
    ```
      subroutine foo (i, i1, block)
        integer :: i, i1
        integer :: block(9, 9, 9)
        block(i:9,1,i1) = block(i:9,1,i1) - 10
      end subroutine foo

    ```
    This kind of code occurs a few times in the exchange2 benchmark.

    Similar C code:
    ```
      for (i = 0; i < n; i++)
        arr[i] = arr[i] - 10;
    ```

    On powerpcle, at -O2, enabling -funroll-loops and limiting
    PARAM_MAX_UNROLL_TIMES=2 and PARAM_MAX_UNROLLED_INSNS=20, we can see a >2%
    overall improvement for SPEC2017.

    This patch is only for rs6000, where we see a visible performance
    improvement.

    gcc/
    2019-10-25  Jiufu Guo  <guojiufu@linux.ibm.com>

        PR tree-optimization/88760
        * config/rs6000/rs6000-common.c (rs6000_option_optimization_table):
        Enable -funroll-loops for -O2 and above.
        * config/rs6000/rs6000.c (rs6000_option_override_internal): Set
        PARAM_MAX_UNROLL_TIMES to 2 and PARAM_MAX_UNROLLED_INSNS to 20, and
        do not turn on web and rngreg implicitly, if the unroller is not
        explicitly enabled.

    gcc.testsuite/
    2019-10-25  Jiufu Guo  <guojiufu@linux.ibm.com>

        PR tree-optimization/88760
        * gcc.target/powerpc/small-loop-unroll.c: New test.
        * c-c++-common/tsan/thread_leak2.c: Update test.
        * gcc.dg/pr59643.c: Update test.
        * gcc.target/powerpc/loop_align.c: Update test.
        * gcc.target/powerpc/ppc-fma-1.c: Update test.
        * gcc.target/powerpc/ppc-fma-2.c: Update test.
        * gcc.target/powerpc/ppc-fma-3.c: Update test.
        * gcc.target/powerpc/ppc-fma-4.c: Update test.
        * gcc.target/powerpc/pr78604.c: Update test.

    From-SVN: r277501

So, this patch seems to have exposed a problem with the unrolling in
general, or with the parameters for POWER.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2020-05-11  5:03 ` tkoenig at gcc dot gnu.org
@ 2020-05-11  5:18 ` tkoenig at gcc dot gnu.org
  2020-05-11  7:53 ` guojiufu at gcc dot gnu.org
                   ` (29 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-11  5:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #9 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Created attachment 48502
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48502&action=edit
Assembly file on x86 with -O2 -funroll-loops

So, it seems the decisions made for unrolling are bad for this case
independent of architecture - the cold loop is also unrolled 15 times
on x86_64  with -funroll-loops.  The change to POWER just happened
to expose it.

The code on x86_64 looks like

# ../../../trunk/libgfortran/generated/in_pack_i4.c:76:   destptr =
xmallocarray (ssize, sizeof (GFC_INTEGER_4));
        movq    %rax, 152(%rsp) # tmp309, %sfp
# ../../../trunk/libgfortran/generated/in_pack_i4.c:78:   src =
source->base_addr;
        movq    0(%rbp), %rax   # source_42(D)->base_addr, src
# ../../../trunk/libgfortran/generated/in_pack_i4.c:82:   while (src)
        testq   %rax, %rax      # src
        je      .L1     #,
# ../../../trunk/libgfortran/generated/in_pack_i4.c:91:       while (count[n]
== extent[n])
        movq    416(%rsp), %r15 # extent, _68
# ../../../trunk/libgfortran/generated/in_pack_i4.c:108:               src +=
stride[n];
        movq    552(%rsp), %rdi # stride, _92
# ../../../trunk/libgfortran/generated/in_pack_i4.c:87:       src += stride0;
        leaq    0(,%rsi,4), %r12        #, _13
# ../../../trunk/libgfortran/generated/in_pack_i4.c:108:               src +=
stride[n];
        movq    560(%rsp), %r14 # stride, _24
        movq    568(%rsp), %r13 # stride, _112
# ../../../trunk/libgfortran/generated/in_pack_i4.c:98:           src -=
stride[n] * extent[n];
        imulq   %r15, %rsi      # _68, tmp225
# ../../../trunk/libgfortran/generated/in_pack_i4.c:91:       while (count[n]
== extent[n])
        movq    %r15, 16(%rsp)  # _68, %sfp
        movq    424(%rsp), %r15 # extent, _73
        movq    %rdi, %r10      # _92, tmp226
        movq    %r14, %r9       # _24, tmp228
        movq    288(%rsp), %rdx # count, count_I_lsm.8
        movq    296(%rsp), %rcx # count, count_I_lsm.29
# ../../../trunk/libgfortran/generated/in_pack_i4.c:98:           src -=
stride[n] * extent[n];
        imulq   %r15, %rdi      # _73, tmp227
# ../../../trunk/libgfortran/generated/in_pack_i4.c:91:       while (count[n]
== extent[n])
        movq    %r15, 32(%rsp)  # _73, %sfp
        movq    432(%rsp), %r15 # extent, _14
        subq    %rsi, %r10      # tmp225, tmp226
        movq    320(%rsp), %r8  # count, count_I_lsm.35
        movq    304(%rsp), %rsi # count, count_I_lsm.31
# ../../../trunk/libgfortran/generated/in_pack_i4.c:98:           src -=
stride[n] * extent[n];
        imulq   %r15, %r14      # _14, tmp229
        leaq    0(,%r10,4), %r11        #, _78
        movq    %r13, %r10      # _112, tmp230
# ../../../trunk/libgfortran/generated/in_pack_i4.c:91:       while (count[n]
== extent[n])

... and so on.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2020-05-11  5:18 ` tkoenig at gcc dot gnu.org
@ 2020-05-11  7:53 ` guojiufu at gcc dot gnu.org
  2020-05-11  8:07 ` guojiufu at gcc dot gnu.org
                   ` (28 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-05-11  7:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #10 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
For POWER, the patch enables -funroll-loops (with the small-loops unroller in
RTL), which also enables 'cunroll' (the complete unroller) at the tree level.

For this loop (the inner loop), 'cunroll' figures out that the loop is executed
at most 13 times. Then the complete unroller could handle this loop.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2020-05-11  7:53 ` guojiufu at gcc dot gnu.org
@ 2020-05-11  8:07 ` guojiufu at gcc dot gnu.org
  2020-05-11  8:13 ` guojiufu at gcc dot gnu.org
                   ` (27 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-05-11  8:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #11 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
In general, 'cunroll' can visibly help performance on some workloads, like
SPEC.
In this case, it is questionable whether performance improves at all.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2020-05-11  8:07 ` guojiufu at gcc dot gnu.org
@ 2020-05-11  8:13 ` guojiufu at gcc dot gnu.org
  2020-05-11  8:32 ` guojiufu at gcc dot gnu.org
                   ` (26 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-05-11  8:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #12 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
> executed at most 13 times. Then the complete unroller could handle this loop.

Correction: 13+1 times

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2020-05-11  8:13 ` guojiufu at gcc dot gnu.org
@ 2020-05-11  8:32 ` guojiufu at gcc dot gnu.org
  2020-05-11  9:20 ` tkoenig at gcc dot gnu.org
                   ` (25 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-05-11  8:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #13 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
In this case, the loop body executes at most a given number of times, but not
an exact number; at run time only some of the iterations are actually executed.
As said above, the condition 'while (count[n] == extent[n])' is false most of
the time.

There may be a possible enhancement for 'cunroll' here.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2020-05-11  8:32 ` guojiufu at gcc dot gnu.org
@ 2020-05-11  9:20 ` tkoenig at gcc dot gnu.org
  2020-05-11 14:09 ` guojiufu at gcc dot gnu.org
                   ` (24 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-11  9:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #14 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Most Fortran arrays are one- or two-dimensional.

Assuming a 10*10 two-dimensional array is being packed, the
condition will be tested 121 times and the loop body will be entered
12 times; only once will the inner loop run twice.

So, unrolling is definitely not good here.
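
The exact counts for other shapes can be tallied with a small stand-alone
counter (my own throwaway code, not taken from libgfortran; dim and extent
below are just the values of this example):

  #include <stdio.h>

  int main (void)
  {
    long extent[] = { 10, 10 };
    int dim = 2;
    long count[] = { 0, 0 };
    long copies = 0, tests = 0, entries = 0;
    int running = 1;

    while (running)
      {
        copies++;                     /* stands for *(dest++) = *src;  */
        count[0]++;
        int n = 0;
        for (;;)
          {
            tests++;                  /* count[n] == extent[n] evaluated  */
            if (count[n] != extent[n])
              break;
            entries++;                /* body of the inner while entered  */
            count[n] = 0;
            n++;
            if (n == dim)
              {
                running = 0;          /* src = NULL in the real code  */
                break;
              }
            count[n]++;
          }
      }
    printf ("copies %ld, tests %ld, body entries %ld\n",
            copies, tests, entries);
    return 0;
  }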

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2020-05-11  9:20 ` tkoenig at gcc dot gnu.org
@ 2020-05-11 14:09 ` guojiufu at gcc dot gnu.org
  2020-05-11 19:13 ` tkoenig at gcc dot gnu.org
                   ` (23 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-05-11 14:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #15 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
Hi Thomas,
Are you using a test case to check the performance? If you have one, would you
please share it, so we can use it to tune a heuristic improvement for
cunroll.

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2020-05-11 14:09 ` guojiufu at gcc dot gnu.org
@ 2020-05-11 19:13 ` tkoenig at gcc dot gnu.org
  2020-05-12  5:58 ` guojiufu at gcc dot gnu.org
                   ` (22 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-11 19:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #16 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Hi,

I was unable to find a performance problem, so I take back the
presumption of slowness from the original report.  I have checked two versions
of the preprocessed source, with

+#pragma GCC unroll 1
       while (count[n] == extent[n])
         {

as the difference.

What I found was that with the pragma and -O2

- stack size decreased from 752 bytes to 432 bytes, from the stdu 1,-752(1)
  instruction

- object code size decreased from 7042 to 1936 bytes, determined by looking
  at the addresses of objdump --disassemble

(I also found that #pragma GCC unroll 2 was ignored, but that's for another
PR).

Considering we have this idiom around 400 times in libgfortran, I would
estimate an increase in the size of libgfortran of around two megabytes.

So, what to do?  I don't know if the GCC optimization routines can even
recognize this particular loop as one that is rarely executed,
although it is an inner loop.

As far as libgfortran is concerned, we could simply use the
pragma on our sources, but maybe other people have the same issue.
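
If we go the pragma route, a single helper macro would keep the roughly 400
places maintainable (sketch only; the macro name is made up):

  /* In a central header, e.g. libgfortran.h:  */
  #define GFC_DONT_UNROLL _Pragma ("GCC unroll 1")

and each odometer loop then becomes

        GFC_DONT_UNROLL
        while (count[n] == extent[n])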

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2020-05-11 19:13 ` tkoenig at gcc dot gnu.org
@ 2020-05-12  5:58 ` guojiufu at gcc dot gnu.org
  2020-05-12  6:06 ` guojiufu at gcc dot gnu.org
                   ` (21 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-05-12  5:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #17 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
For this case, as you said, I also think it is better to avoid unrolling
the loop.
'#pragma GCC unroll 1' would prevent the loop from being unrolled even if
someone compiles it with aggressive unrolling options.
So, adding the pragma may be acceptable for in_pack_i4.c.

And at the same time, we should also enhance GCC 'cunroll' for this kind of
loop.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2020-05-12  5:58 ` guojiufu at gcc dot gnu.org
@ 2020-05-12  6:06 ` guojiufu at gcc dot gnu.org
  2020-05-12  7:41 ` rguenth at gcc dot gnu.org
                   ` (20 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-05-12  6:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #18 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
Currently, I'm thinking of enhancing GCC 'cunroll' as follows: if the loop has
multiple exits or its upper bound is not a fixed number, we do not completely
unroll it unless -funroll-all-loops is specified.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2020-05-12  6:06 ` guojiufu at gcc dot gnu.org
@ 2020-05-12  7:41 ` rguenth at gcc dot gnu.org
  2020-05-12  7:50 ` rguenth at gcc dot gnu.org
                   ` (19 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-05-12  7:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
Is libgfortran built with -O2 -funroll-loops or with -O3 (IIRC -O3?).  Note we
see

Estimating sizes for loop 3
 BB: 14, after_exit: 0
  size:   1 _20 = count[n_95];
  size:   1 _21 = _20 + 1;
  size:   1 count[n_95] = _21;
  size:   1 _22 = stride[n_95];
  size:   0 _23 = (long unsigned int) _22;
  size:   1 _44 = _23 - _82;
  size:   1 _45 = _44 * 4;
  size:   1 src_62 = src_85 + _45;
  size:   1 _25 = extent[n_95];
  size:   2 if (_21 == _25)
 BB: 20, after_exit: 1
 BB: 13, after_exit: 0
  size:   1 count[n_95] = 0;
  size:   1 _18 = _22 * _25;
  size:   0 _19 = (long unsigned int) _18;
  size:   1 n_60 = n_95 + 1;
   Induction variable computation will be folded away.
  size:   2 if (dim_43 == n_60)
   Exit condition will be eliminated in last copy.
size: 15-1, last_iteration: 15-3
  Loop size: 15
  Estimated size after unrolling: 129
Making edge 13->20 impossible by redistributing probability to other edges.
../../../trunk/libgfortran/generated/in_pack_i4.c:100:14: optimized: loop with
13 iterations completely unrolled (header execution count 23565294)
Last iteration exit edge was proved true.

Note even with the rs6000 limits turned back to default I see the loop
unrolled (with -O3 or -O2 -funroll-loops).

Checking on x86_64 the file is compiled with -O2 only and we have

size: 17-1, last_iteration: 10-3
  Loop size: 17
  Estimated size after unrolling: 154
Not unrolling loop 3: size would grow.

so what's the speciality on POWER?  Code growth should trigger with -O3 only.
Given we have only a guessed profile (and that does not detect the inner
loop as completely cold) we're allowing growth then.  GCC has no idea the
outer loop iterates more than the inner.

Note re-structuring the loop to use down-counting count[] from extent[] to zero
would be worth experimenting with, likewise "peeling" the dim == 0 loop
and not making the outermost loop key on 'src' (can 'src' be NULL on entry?).

Anyway, completely peeling this loop looks useless - the only benefit
might be better branch prediction (each dimension gets its own entry
in the predictor cache).

If POWER cannot cope with large loops then I wonder why POWER people
increased limits (though even the default limits would unroll the loop).

Thomas - where did you measure the slowness?  For which dimensionality?
I'm quite sure the loop structure will be sub-optimal for certain
input shapes... (stride0 == 1 could even use memcpy for the inner dimension).
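
To illustrate that last point, a stand-alone sketch (not a libgfortran patch;
restricted to 2-D and plain int to keep it short, strides counted in elements
as in the library):

  #include <string.h>

  /* Pack one strided 2-D slice into a contiguous buffer.  When the innermost
     stride is 1 the row is contiguous and memcpy can replace the element
     loop.  */
  static void
  pack_2d (int *restrict dest, const int *src,
           long extent0, long stride0, long extent1, long stride1)
  {
    for (long j = 0; j < extent1; j++)
      {
        const int *row = src + j * stride1;
        if (stride0 == 1)
          {
            memcpy (dest, row, (size_t) extent0 * sizeof *dest);
            dest += extent0;
          }
        else
          for (long i = 0; i < extent0; i++)
            *dest++ = row[i * stride0];
      }
  }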

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2020-05-12  7:41 ` rguenth at gcc dot gnu.org
@ 2020-05-12  7:50 ` rguenth at gcc dot gnu.org
  2020-05-12  9:30 ` tkoenig at gcc dot gnu.org
                   ` (18 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-05-12  7:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jiu Fu Guo from comment #18)
> Currently, I'm thinking of enhancing GCC 'cunroll' as follows: if the loop
> has multiple exits or its upper bound is not a fixed number, we do not
> completely unroll it unless -funroll-all-loops is specified.

That doesn't make much sense (-funroll-all-loops is RTL unroller only).

I think the growth limits are simply too large unless we compute a "win"
which we in this case do not.  So I'd say the growth limits should scale
with win ^ (1/new param) thus if we estimate to eliminate 20% of the
loop stmts due to unrolling then the limit to apply is
limit * (0.2 ^ (1/X)) with X maybe defaulting to 2.
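
Spelled out with the suggested default X = 2 (my arithmetic, values only to
make the scaling concrete):

  win = 20%  ->  limit * 0.2^(1/2)   =  about 0.45 * limit
  win =  5%  ->  limit * 0.05^(1/2)  =  about 0.22 * limit
  win =  0%  ->  limit * 0           =  no growth allowed at all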

I'd only apply this new limit for peeling (peeling is when the loop count
is not constant and thus we keep the exit tests).

Of course people want more peeling (hello POWER people!)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2020-05-12  7:50 ` rguenth at gcc dot gnu.org
@ 2020-05-12  9:30 ` tkoenig at gcc dot gnu.org
  2020-05-12  9:46 ` tkoenig at gcc dot gnu.org
                   ` (17 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-12  9:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #21 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #19)
> Is libgfortran built with -O2 -funroll-loops or with -O3 (IIRC -O3?). 

Just plain -O2 (for size reasons), with matmul as an exception
where we add -funroll-loops and other options.

> so what's the speciality on POWER?  Code growth should trigger with -O3 only.
> Given we have only a guessed profile (and that does not detect the inner
> loop as completely cold) we're allowing growth then.  GCC has no idea the
> outer loop iterates more than the inner.

As a test, I changed the condition of the loop in question to

@@ -88,7 +88,7 @@ internal_pack_r4 (gfc_array_r4 * source)
       count[0]++;
       /* Advance to the next source element.  */
       index_type n = 0;
-      while (count[n] == extent[n])
+      while (unlikely (count[n] == extent[n]))
         {
           /* When we get to the end of a dimension, reset it and increment
              the next dimension.  */

which then results in

       while (__builtin_expect(!!(count[n] == extent[n]), 0))

and the loop is still completely peeled on POWER at -O2, which
I do not understand.

> Thomas - where did you measure the slowness?  For which dimensionality?

Actually, I didn't, I just made an assumption that it would be
bad for speed as well.  The tests that I ran then didn't show any
such slowdown, so I guess the POWER9 branch predictor is doing
a good job here.

However, this kind of loop is the standard way of accessing multi-
dimensional arrays of unknown dimension in libgfortran. It occurs
in around 400 files there, sometimes more than once, so the size issue
is significant.  I haven't checked if there is an actual degradation
for other use cases. 

> I'm quite sure the loop structure will be sub-optimal for certain
> input shapes... (stride0 == 1 could even use memcpy for the inner dimension).

Yes. I plan to revisit this when looking at PR 93114, where I have
to touch that part of the code anyway.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2020-05-12  9:30 ` tkoenig at gcc dot gnu.org
@ 2020-05-12  9:46 ` tkoenig at gcc dot gnu.org
  2020-05-12 10:08 ` rguenth at gcc dot gnu.org
                   ` (16 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-12  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #22 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Here are the details of how I tested this.

I generated the in_pack_r4.i and in_unpack_r4.i by adding -save-temps to the
Makefile options in ~/trunk-bin/powerpc64le-unknown-linux-gnu/libgfortran,
then removing in_pack_r4.* and in_unpack_r4.* there and running make.

In the benchmark directory, I then used

bench.f90:

program main
  real, dimension(:,:), allocatable :: a
  allocate (a(50000,4))
  call random_number (a)
  do i=1,5000000
     call foo(a(i::2,:))
     call foo(a)
  end do
end program main

foo.f90:

subroutine foo(a)
  real, dimension(*) :: a
end subroutine foo

(constants can be adjusted).  The first call to foo needs a repacking;
the second one is just there to confuse the optimizer so that it does not
exit the loop early.

With the command line

gfortran -g -fno-inline-arg-packing  -O2 bench.f90 foo.f90  in_pack_r4.i
in_unpack_r4.i -static-libgfortran && time ./a.out

a test can be run. -fno-inline-arg-packing is important because
otherwise the internal packing routines will not be called, and
compiling in_pack_r4.i and in_unpack_r4.i into the executable makes those
be used instead of the ones from the (static) library.

in_pack_r4.i and in_unpack_r4.i can then be adjusted, for
example by adding a #pragma GCC unroll 1.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2020-05-12  9:46 ` tkoenig at gcc dot gnu.org
@ 2020-05-12 10:08 ` rguenth at gcc dot gnu.org
  2020-05-12 10:27 ` rguenther at suse dot de
                   ` (15 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-05-12 10:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #20)
> (In reply to Jiu Fu Guo from comment #18)
> > Currently, I'm thinking of enhancing GCC 'cunroll' as follows: if the loop
> > has multiple exits or its upper bound is not a fixed number, we do not
> > completely unroll it unless -funroll-all-loops is specified.
> 
> That doesn't make much sense (-funroll-all-loops is RTL unroller only).
> 
> I think the growth limits are simply too large unless we compute a "win"
> which we in this case do not.  So I'd say the growth limits should scale
> with win ^ (1/new param) thus if we estimate to eliminate 20% of the
> loop stmts due to unrolling then the limit to apply is
> limit * (0.2 ^ (1/X)) with X maybe defaulting to 2.
> 
> I'd only apply this new limit for peeling (peeling is when the loop count
> is not constant and thus we keep the exit tests).
> 
> Of course people want more peeling (hello POWER people!)

Btw, the issue with the rs6000 code at present is that it uses
unroll_only_small_loops but that only affects the RTL unroller
while the enablement of -funroll-loops at -O2 affects GIMPLE
as well but unconstrained (with -O3 params).  For the main
unroll pass (not cunrolli) this triggers code size growth:

  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
                                                   || flag_peel_loops
                                                   || optimize >= 3, true);

the "original" patch also adjusted parameters.  If the intent is to only
affect the RTL unroller then we need a separate flag controlling it
(yeah, using the same flags as heuristic trigger was probably bad).

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (22 preceding siblings ...)
  2020-05-12 10:08 ` rguenth at gcc dot gnu.org
@ 2020-05-12 10:27 ` rguenther at suse dot de
  2020-05-13  2:33 ` guojiufu at gcc dot gnu.org
                   ` (14 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: rguenther at suse dot de @ 2020-05-12 10:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #24 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 12 May 2020, tkoenig at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018
> 
> --- Comment #21 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #19)
> > Is libgfortran built with -O2 -funroll-loops or with -O3 (IIRC -O3?). 
> 
> Just plain -O2 (for size reasons), with matmul as an exception
> where we add -funroll-loops and other options.
> 
> > so what's the speciality on POWER?  Code growth should trigger with -O3 only.
> > Given we have only a guessed profile (and that does not detect the inner
> > loop as completely cold) we're allowing growth then.  GCC has no idea the
> > outer loop iterates more than the inner.
> 
> As a test, I changed the condition of the loop in question to
> 
> @@ -88,7 +88,7 @@ internal_pack_r4 (gfc_array_r4 * source)
>        count[0]++;
>        /* Advance to the next source element.  */
>        index_type n = 0;
> -      while (count[n] == extent[n])
> +      while (unlikely (count[n] == extent[n]))
>          {
>            /* When we get to the end of a dimension, reset it and increment
>               the next dimension.  */
> 
> which then results in
> 
>        while (__builtin_expect(!!(count[n] == extent[n]), 0))
> 
> and the loop is still completely peeled on POWER at -O2, which
> I do not understand.

We end up with a 90% vs. 10% probability:

;;    succ:       13 [10.0% (guessed)]  count:4740826 (estimated locally) 
(TRUE_VALUE,EXECUTABLE)
;;                15 [90.0% (guessed)]  count:42667438 (estimated locally) 
(FALSE_VALUE,EXECUTABLE)

...

;;   basic block 13, loop depth 2, count 4740826 (estimated locally), 
maybe hot

which is still maybe hot :/  There is now 
__builtin_expect_with_probability, but I wonder why we end up
with a maybe-hot count regardless of the __builtin_expect.

Using

__builtin_expect_with_probability (!!(count[n] == extent[n]), 0, 1)

makes it assumed to be never executed but as soon as I use
__builtin_expect_with_probability basic block counts all end up
zero!?

Honza?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (23 preceding siblings ...)
  2020-05-12 10:27 ` rguenther at suse dot de
@ 2020-05-13  2:33 ` guojiufu at gcc dot gnu.org
  2020-05-13  2:59 ` guojiufu at gcc dot gnu.org
                   ` (13 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-05-13  2:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #25 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #23)
> (In reply to Richard Biener from comment #20)
> > (In reply to Jiu Fu Guo from comment #18)
> > > Currently, I'm thinking of enhancing GCC 'cunroll' as follows: if the
> > > loop has multiple exits or its upper bound is not a fixed number, we do
> > > not completely unroll it unless -funroll-all-loops is specified.
> > 
> > That doesn't make much sense (-funroll-all-loops is RTL unroller only).
> > 
> > I think the growth limits are simply too large unless we compute a "win"
> > which we in this case do not.  So I'd say the growth limits should scale
> > with win ^ (1/new param) thus if we estimate to eliminate 20% of the
> > loop stmts due to unrolling then the limit to apply is
> > limit * (0.2 ^ (1/X)) with X maybe defaulting to 2.
> > 
> > I'd only apply this new limit for peeling (peeling is when the loop count
> > is not constant and thus we keep the exit tests).
> > 
> > Of course people want more peeling (hello POWER people!)
> 
> Btw, the issue with the rs6000 code at present is that it uses
> unroll_only_small_loops but that only affects the RTL unroller
> while the enablement of -funroll-loops at -O2 affects GIMPLE
> as well but unconstrained (with -O3 params).  For the main
> unroll pass (not cunrolli) this triggers code size growth:
> 
>   unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
>                                                    || flag_peel_loops
>                                                    || optimize >= 3, true);
> 
> the "original" patch also adjusted parameters.  If the intent is to only
> affect the RTL unroller then we need a separate flag controlling it
> (yeah, using the same flags as heuristic trigger was probably bad).

Yes, the patch constrains the RTL unroller to small loops, and it also enabled
cunroll (through flag_unroll_loops). This is why cunroll may increase the size,
as you explained.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (24 preceding siblings ...)
  2020-05-13  2:33 ` guojiufu at gcc dot gnu.org
@ 2020-05-13  2:59 ` guojiufu at gcc dot gnu.org
  2020-05-13  3:37 ` guojiufu at gcc dot gnu.org
                   ` (12 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-05-13  2:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #26 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #20)
> (In reply to Jiu Fu Guo from comment #18)
> > Currently, I'm thinking of enhancing GCC 'cunroll' as follows: if the loop
> > has multiple exits or its upper bound is not a fixed number, we do not
> > completely unroll it unless -funroll-all-loops is specified.
> 
> That doesn't make much sense (-funroll-all-loops is RTL unroller only).

-funroll-all-loops is used by the RTL unroller (decide_unroll_stupid, for loops
like "while (cond) body").
And during option handling, -funroll-all-loops also enables -funroll-loops.
When thinking about how to make "cunroll" unroll such loops less, this option
came to my mind as a possible control.

> 
> I think the growth limits are simply too large unless we compute a "win"
> which we in this case do not.  So I'd say the growth limits should scale
> with win ^ (1/new param) thus if we estimate to eliminate 20% of the
> loop stmts due to unrolling then the limit to apply is
> limit * (0.2 ^ (1/X)) with X maybe defaulting to 2.

It seems the growth limit could be refined. The ^ is an exponentiation
operation, right?

> 
> I'd only apply this new limit for peeling (peeling is when the loop count
> is not constant and thus we keep the exit tests).
> 
> Of course people want more peeling (hello POWER people!)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (25 preceding siblings ...)
  2020-05-13  2:59 ` guojiufu at gcc dot gnu.org
@ 2020-05-13  3:37 ` guojiufu at gcc dot gnu.org
  2020-05-13  6:07 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-05-13  3:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #27 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Jiu Fu Guo from comment #26)
> (In reply to Richard Biener from comment #20)
> > (In reply to Jiu Fu Guo from comment #18)
> > > Currently, I'm thinking of enhancing GCC 'cunroll' as follows: if the
> > > loop has multiple exits or its upper bound is not a fixed number, we do
> > > not completely unroll it unless -funroll-all-loops is specified.
> > 
> > That doesn't make much sense (-funroll-all-loops is RTL unroller only).
> 

For a loop with multiple exits, unrolling may not be helpful, and a "complete
unroll" in particular may not be.  Take the loop in in_pack_i4.c: since it can
exit early, some (perhaps most) of the unrolled iterations are never executed.

Is it a good idea to disable GIMPLE cunroll for this kind of loop? RTL
unroll_stupid does not unroll this kind of loop either.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (26 preceding siblings ...)
  2020-05-13  3:37 ` guojiufu at gcc dot gnu.org
@ 2020-05-13  6:07 ` rguenth at gcc dot gnu.org
  2020-05-13  7:54 ` tkoenig at gcc dot gnu.org
                   ` (10 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-05-13  6:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #28 from Richard Biener <rguenth at gcc dot gnu.org> ---
> It seems the growth limit could be refined. The ^ is an exponentiation
> operation, right?

Yes.  The idea is to limit growth more when there is no benefit of unrolling
detected by the cost model (which currently simply counts likely eliminated
stmts).

(In reply to Jiu Fu Guo from comment #27)
> (In reply to Jiu Fu Guo from comment #26)
> > (In reply to Richard Biener from comment #20)
> > > (In reply to Jiu Fu Guo from comment #18)
> > > > Currently, I'm thinking of enhancing GCC 'cunroll' as follows: if the
> > > > loop has multiple exits or its upper bound is not a fixed number, we do
> > > > not completely unroll it unless -funroll-all-loops is specified.
> > > 
> > > That doesn't make much sense (-funroll-all-loops is RTL unroller only).
> > 
> 
> For a loop with multiple exits, unrolling may not be helpful, and a
> "complete unroll" in particular may not be.  Take the loop in in_pack_i4.c:
> since it can exit early, some (perhaps most) of the unrolled iterations are
> never executed.
> 
> Is it a good idea to disable GIMPLE cunroll for this kind of loop? RTL
> unroll_stupid does not unroll this kind of loop either.

Well, GIMPLE cunroll specifically handles the situation of peeling such loops
and has a separate --param to control how many extra branches it may introduce
for those exits.  Generally disabling unrolling of such loops isn't a good
idea;
the reason for completely unrolling loops is abstraction removal and not
necessarily producing more optimal loop kernels (the loop is gone afterwards).
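
For experimenting, those knobs can already be tightened from the command line
without patching GCC (the params exist today; the values below are picked
purely for illustration):

  gcc -O2 -funroll-loops --param max-peel-branches=1 \
      --param max-completely-peel-times=4 \
      --param max-completely-peeled-insns=64 -c in_pack_i4.i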

One of my TODO items is to work on its costing model to the extent that
we run value-numbering on the unrolled body (that's already done) and
roll back the unrolling if there wasn't any visible benefit.  The difficult
cases are like those in SPEC calculix where for full benefit you need to
unroll the 5(!) innermost loops and to even see any benefit you need to
unroll the 3 innermost loops.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (27 preceding siblings ...)
  2020-05-13  6:07 ` rguenth at gcc dot gnu.org
@ 2020-05-13  7:54 ` tkoenig at gcc dot gnu.org
  2020-05-13  8:14 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-05-13  7:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #29 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
It is also interesting that this variant

--- a/libgfortran/generated/in_pack_i4.c
+++ b/libgfortran/generated/in_pack_i4.c
@@ -88,7 +88,7 @@ internal_pack_4 (gfc_array_i4 * source)
       count[0]++;
       /* Advance to the next source element.  */
       index_type n = 0;
-      while (count[n] == extent[n])
+      while (n < dim && count[n] == extent[n])
         {
           /* When we get to the end of a dimension, reset it and increment
              the next dimension.  */
@@ -100,7 +100,6 @@ internal_pack_4 (gfc_array_i4 * source)
           if (n == dim)
             {
               src = NULL;
-              break;
             }
           else
             {

does not get peeled.
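
Just to have something small to stare at in the cunroll dump
(-O2 -fdump-tree-cunroll-details), a stand-alone approximation of only the
odometer-advance step, with the exit test folded into the loop header as in
the patch above, could look like this; MAXDIM and the types here are
illustrative, not the libgfortran declarations:

#define MAXDIM 15   /* illustrative bound only */

void
advance (long count[MAXDIM], const long extent[MAXDIM], long dim)
{
  long n = 0;
  /* Exit condition folded into the loop header, as in the patch.  */
  while (n < dim && count[n] == extent[n])
    {
      count[n] = 0;
      n++;
      if (n < dim)
        count[n]++;
    }
}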

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (28 preceding siblings ...)
  2020-05-13  7:54 ` tkoenig at gcc dot gnu.org
@ 2020-05-13  8:14 ` rguenth at gcc dot gnu.org
  2020-05-13  9:54 ` guojiufu at gcc dot gnu.org
                   ` (8 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-05-13  8:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #30 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Thomas Koenig from comment #29)
> It is also interesting that this variant
> 
> --- a/libgfortran/generated/in_pack_i4.c
> +++ b/libgfortran/generated/in_pack_i4.c
> @@ -88,7 +88,7 @@ internal_pack_4 (gfc_array_i4 * source)
>        count[0]++;
>        /* Advance to the next source element.  */
>        index_type n = 0;
> -      while (count[n] == extent[n])
> +      while (n < dim && count[n] == extent[n])
>          {
>            /* When we get to the end of a dimension, reset it and increment
>               the next dimension.  */
> @@ -100,7 +100,6 @@ internal_pack_4 (gfc_array_i4 * source)
>            if (n == dim)
>              {
>                src = NULL;
> -              break;
>              }
>            else
>              {
> 
> does not get peeled.

More optimal would be

        count[0]--;
        /* Advance to the next source element.  */
        index_type n = 0;
        while (count[n] == 0)
          {
...
          }

Note that completely peeling the inner loop isn't as bad as it looks; it
basically turns the whole loop into

  while (1)
    {
      for (count[0] = 0; count[0] < extent[0]; ++count[0])
        {
          /* Copy the data.  */
          *(dest++) = *src;
          /* Advance to the next element.  */
          src += stride0;
        }
      if (dim == 1)
        break;
      count[0] = 0;
      src -= stride[0] * extent[0];
      count[1]++;
      if (count[1] != extent[1])
        continue;
      if (dim == 2)
        break;
      count[1] = 0;
      src -= stride[1] * extent[1];
      count[2]++;
      if (count[2] != extent[2])
        continue;
      if (dim == 3)
        break;
...
    }

which should be quite good for speed (branch-prediction-wise).  One might
want to optimize code size a bit, sure.  Sacrificing a bit of speed at the
loop exit, one could set extent[n > dim] = 1 and drop the
if (dim == N) break; checks, leaving just the last one.  Likewise, changing
the iteration to count down from extent[N] to zero might make the tests
smaller.  Then, as commented in the code, pre-computing the products might
help as well, at the cost of one additional load.  Interleaving the extent
and product data arrays would help cache locality.

Note that writing the loop as above will make GCC recognize it as a loop
nest.
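
Spelled out as a self-contained function, the count-down variant of the
odometer could look like the following; this is only an illustration of the
idea above, with a made-up element type and signature, not a patch against
in_pack_i4.c:

#include <stddef.h>

typedef long index_type;

static void
pack_countdown (int *dest, const int *src, index_type dim,
                const index_type *stride, const index_type *extent,
                index_type *count)
{
  index_type stride0 = stride[0];

  /* Start every dimension at its extent and count down to zero.  */
  for (index_type n = 0; n < dim; n++)
    count[n] = extent[n];

  while (src)
    {
      /* Copy the data and advance to the next element.  */
      *(dest++) = *src;
      src += stride0;
      count[0]--;
      /* Advance the odometer; most of the time count[0] is still
         non-zero and this inner loop body does not run at all.  */
      index_type n = 0;
      while (count[n] == 0)
        {
          count[n] = extent[n];
          src -= stride[n] * extent[n];
          n++;
          if (n == dim)
            {
              src = NULL;
              break;
            }
          count[n]--;
          src += stride[n];
        }
    }
}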

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (29 preceding siblings ...)
  2020-05-13  8:14 ` rguenth at gcc dot gnu.org
@ 2020-05-13  9:54 ` guojiufu at gcc dot gnu.org
  2020-05-13 10:14 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-05-13  9:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #31 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #28)

> > For a loop with multiple exits, unrolling may not be helpful; in particular
> > "complete unroll" may not be, like the loop in in_pack_i4.c: since it can
> > exit early, some iterations (maybe most iterations) are never executed.
> > 
> > Is it a good idea to disable GIMPLE cunroll for this kind of loop?  RTL
> > unroll_stupid does not unroll this kind of loop either.
> 
> Well, GIMPLE cunroll specifically handles peeling of such loops and has a
> separate --param to control how many extra branches it may introduce for
> those exits.  Generally disabling unrolling of such loops isn't a good
> idea;
> the reason for completely unrolling loops is abstraction removal, not
> necessarily producing more optimal loop kernels (the loop is gone
> afterwards).

I think the --param you mentioned is probably param_max_peel_branches, whose
default is 31.
And currently, param_max_completely_peel_times defaults to 16.
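
(For experiments on the command line these correspond to
--param max-peel-branches=N and --param max-completely-peel-times=N;
together with -fdump-tree-cunroll-details that makes it easy to see which
loops cunroll decides to peel.)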

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (30 preceding siblings ...)
  2020-05-13  9:54 ` guojiufu at gcc dot gnu.org
@ 2020-05-13 10:14 ` rguenth at gcc dot gnu.org
  2020-05-13 11:24 ` guojiufu at gcc dot gnu.org
                   ` (6 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-05-13 10:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #32 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note I don't think the unrolling itself is excessive.  Store motion then
applying to all the count[] entries, plus all the computations hoisted out
of the loop, may be a bit too much for register pressure though, especially
since we're using flag-based store motion.  But it causes the stores to be
materialized on all exits of the loop, which means we end up with N*N
conditional stores :/

I guess SM could be improved here.
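
To illustrate what flag-based store motion does to a single conditional
store (hand-written sketch, not actual compiler output):

void
before (int *count, const int *p, int n)
{
  for (int i = 0; i < n; i++)
    if (p[i])
      *count = i;        /* conditional store inside the loop */
}

void
after (int *count, const int *p, int n)
{
  int tmp = *count;
  _Bool stored = 0;
  for (int i = 0; i < n; i++)
    if (p[i])
      {
        tmp = i;         /* keep the value in a register ...  */
        stored = 1;      /* ... and remember that a store happened */
      }
  if (stored)            /* the store is materialized on the loop exit;
                            with many counters and many exits this
                            multiplies into N*N conditional stores */
    *count = tmp;
}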

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (31 preceding siblings ...)
  2020-05-13 10:14 ` rguenth at gcc dot gnu.org
@ 2020-05-13 11:24 ` guojiufu at gcc dot gnu.org
  2020-05-19  5:42 ` guojiufu at gcc dot gnu.org
                   ` (5 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-05-13 11:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #33 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #32)
> Note I don't think the unrolling is excessive - store motion then applying
> to all count[] and all computations hoisted out of the loop may be a bit
> too much for register pressure though, especially since we're using
> flag-based store-motion.  But it causes the stores to be materialized
> on all exits of the loop which means we end up with N*N conditional stores :/

In general, the defaults param_max_peel_branches = 31 and
param_max_completely_peel_times = 16 may not be overly aggressive.  For
in_pack_i4.c, the loop runs at most 13+1 times and is therefore unrolled,
but for this loop the unrolling increases code size and does not help
performance.

> 
> I guess SM could be improved here.


Thanks all!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (32 preceding siblings ...)
  2020-05-13 11:24 ` guojiufu at gcc dot gnu.org
@ 2020-05-19  5:42 ` guojiufu at gcc dot gnu.org
  2020-06-07  9:41 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-05-19  5:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #34 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
The previous patch 6d099a76a0f6a040a3e678f2bce7fc69cc3257d8 (rs6000: Enable
limited unrolling at -O2) only affects simple loops on rs6000.
We may also want to set limits for GIMPLE cunroll, as is done for the RTL
unroller through a hook, and tune the limits of the unroller for simple
loops.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (33 preceding siblings ...)
  2020-05-19  5:42 ` guojiufu at gcc dot gnu.org
@ 2020-06-07  9:41 ` cvs-commit at gcc dot gnu.org
  2020-06-08  2:54 ` guojiufu at gcc dot gnu.org
                   ` (3 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-06-07  9:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #35 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jiu Fu Guo <guojiufu@gcc.gnu.org>:

https://gcc.gnu.org/g:557a40f599f64e40cc1b20254bf82acc775375f5

commit r11-1020-g557a40f599f64e40cc1b20254bf82acc775375f5
Author: guojiufu <guojiufu@linux.ibm.com>
Date:   Thu May 28 14:10:39 2020 +0800

    rs6000: allow cunroll to grow size according to -funroll-loops or
    -fpeel-loops

    Previously, flag_unroll_loops was turned on at -O2 implicitly.  This
    also turned on cunroll with size growth allowed, and cunroll would
    unroll/peel the loop even if the loop is complex, like the code in
    PR95018.  With this patch, size growth for cunroll is allowed only if
    -funroll-loops or -fpeel-loops or -O3 is specified explicitly.

    gcc/ChangeLog
    2020-06-07  Jiufu Guo  <guojiufu@linux.ibm.com>

            PR target/95018
            * config/rs6000/rs6000.c (rs6000_option_override_internal):
            Override flag_cunroll_grow_size.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (34 preceding siblings ...)
  2020-06-07  9:41 ` cvs-commit at gcc dot gnu.org
@ 2020-06-08  2:54 ` guojiufu at gcc dot gnu.org
  2020-06-08  5:12 ` tkoenig at gcc dot gnu.org
                   ` (2 subsequent siblings)
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-06-08  2:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

Jiu Fu Guo <guojiufu at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #36 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
The patch which restores the previous cunroll behavior has been committed.
We still need to tune cunroll so that it does not peel complex loops like
the one this PR mentions.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (35 preceding siblings ...)
  2020-06-08  2:54 ` guojiufu at gcc dot gnu.org
@ 2020-06-08  5:12 ` tkoenig at gcc dot gnu.org
  2020-06-09  2:50 ` guojiufu at gcc dot gnu.org
  2020-06-19 14:39 ` cvs-commit at gcc dot gnu.org
  38 siblings, 0 replies; 40+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-06-08  5:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #37 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Jiu Fu Guo from comment #36)
> The patch which restores cunroll behavior was committed.

Thanks!

Will you also backport to gcc 10, the other affected branch?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (36 preceding siblings ...)
  2020-06-08  5:12 ` tkoenig at gcc dot gnu.org
@ 2020-06-09  2:50 ` guojiufu at gcc dot gnu.org
  2020-06-19 14:39 ` cvs-commit at gcc dot gnu.org
  38 siblings, 0 replies; 40+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2020-06-09  2:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #38 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Thomas Koenig from comment #37)
> (In reply to Jiu Fu Guo from comment #36)
> 
> Will you also backport to gcc 10, the other affected branch?

Yes; once it has proven stable on trunk, it will be backported to gcc 10.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
  2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
                   ` (37 preceding siblings ...)
  2020-06-09  2:50 ` guojiufu at gcc dot gnu.org
@ 2020-06-19 14:39 ` cvs-commit at gcc dot gnu.org
  38 siblings, 0 replies; 40+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-06-19 14:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #39 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Jiu Fu Guo
<guojiufu@gcc.gnu.org>:

https://gcc.gnu.org/g:60bd3f20baebeeddd60f8a2b85927e7da7c6016e

commit r10-8327-g60bd3f20baebeeddd60f8a2b85927e7da7c6016e
Author: guojiufu <guojiufu@linux.ibm.com>
Date:   Thu May 28 13:42:23 2020 +0800

    Introduce flag_cunroll_grow_size for cunroll and avoid enabling it at -O2

    Currently the GIMPLE complete unroller (cunroll) checks
    flag_unroll_loops and flag_peel_loops to see whether size growth is
    allowed.  Besides affecting cunroll, flag_unroll_loops also controls
    the RTL unroller.  To have more freedom to control cunroll and the RTL
    unroller, this patch introduces flag_cunroll_grow_size.  With this
    patch, we can control cunroll and the RTL unroller independently, and
    enable flag_cunroll_grow_size only if -funroll-loops or -fpeel-loops
    or -O3 is specified explicitly.

    gcc/ChangeLog
    2020-06-19  Jiufu Guo  <guojiufu@linux.ibm.com>

            PR target/95018
            * common.opt (flag_cunroll_grow_size): New flag.
            * toplev.c (process_options): Set flag_cunroll_grow_size.
            * tree-ssa-loop-ivcanon.c (pass_complete_unroll::execute):
            Use flag_cunroll_grow_size.
            * config/rs6000/rs6000.c (rs6000_option_override_internal):
            Override flag_cunroll_grow_size.
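
Roughly, the toplev.c part amounts to deriving the new flag from the
existing options unless it was set explicitly; a sketch of that idea (the
exact committed code may differ):

  /* In process_options (): let -funroll-loops, -fpeel-loops or -O3 imply
     size growth for cunroll unless flag_cunroll_grow_size was already
     set explicitly.  */
  if (!global_options_set.x_flag_cunroll_grow_size)
    flag_cunroll_grow_size
      = flag_unroll_loops || flag_peel_loops || optimize >= 3;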

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2020-06-19 14:39 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-09  7:58 [Bug target/95018] New: Excessive unrolling for Fortran library array handling tkoenig at gcc dot gnu.org
2020-05-09  8:00 ` [Bug target/95018] " tkoenig at gcc dot gnu.org
2020-05-09  8:10 ` tkoenig at gcc dot gnu.org
2020-05-09  8:36 ` [Bug target/95018] [11 Regression] " tkoenig at gcc dot gnu.org
2020-05-09  9:01 ` tkoenig at gcc dot gnu.org
2020-05-09  9:03 ` tkoenig at gcc dot gnu.org
2020-05-09  9:42 ` [Bug target/95018] [10/11 " tkoenig at gcc dot gnu.org
2020-05-09 13:21 ` tkoenig at gcc dot gnu.org
2020-05-11  5:03 ` tkoenig at gcc dot gnu.org
2020-05-11  5:18 ` tkoenig at gcc dot gnu.org
2020-05-11  7:53 ` guojiufu at gcc dot gnu.org
2020-05-11  8:07 ` guojiufu at gcc dot gnu.org
2020-05-11  8:13 ` guojiufu at gcc dot gnu.org
2020-05-11  8:32 ` guojiufu at gcc dot gnu.org
2020-05-11  9:20 ` tkoenig at gcc dot gnu.org
2020-05-11 14:09 ` guojiufu at gcc dot gnu.org
2020-05-11 19:13 ` tkoenig at gcc dot gnu.org
2020-05-12  5:58 ` guojiufu at gcc dot gnu.org
2020-05-12  6:06 ` guojiufu at gcc dot gnu.org
2020-05-12  7:41 ` rguenth at gcc dot gnu.org
2020-05-12  7:50 ` rguenth at gcc dot gnu.org
2020-05-12  9:30 ` tkoenig at gcc dot gnu.org
2020-05-12  9:46 ` tkoenig at gcc dot gnu.org
2020-05-12 10:08 ` rguenth at gcc dot gnu.org
2020-05-12 10:27 ` rguenther at suse dot de
2020-05-13  2:33 ` guojiufu at gcc dot gnu.org
2020-05-13  2:59 ` guojiufu at gcc dot gnu.org
2020-05-13  3:37 ` guojiufu at gcc dot gnu.org
2020-05-13  6:07 ` rguenth at gcc dot gnu.org
2020-05-13  7:54 ` tkoenig at gcc dot gnu.org
2020-05-13  8:14 ` rguenth at gcc dot gnu.org
2020-05-13  9:54 ` guojiufu at gcc dot gnu.org
2020-05-13 10:14 ` rguenth at gcc dot gnu.org
2020-05-13 11:24 ` guojiufu at gcc dot gnu.org
2020-05-19  5:42 ` guojiufu at gcc dot gnu.org
2020-06-07  9:41 ` cvs-commit at gcc dot gnu.org
2020-06-08  2:54 ` guojiufu at gcc dot gnu.org
2020-06-08  5:12 ` tkoenig at gcc dot gnu.org
2020-06-09  2:50 ` guojiufu at gcc dot gnu.org
2020-06-19 14:39 ` cvs-commit at gcc dot gnu.org
