public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/102756] New: [12 Regression] Vectorizer change creates poor code for c-c++-common/torture/vector-compare-2.c
From: law at gcc dot gnu.org @ 2021-10-15  0:16 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756

            Bug ID: 102756
           Summary: [12 Regression] Vectorizer change creates poor code
                    for c-c++-common/torture/vector-compare-2.c
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: law at gcc dot gnu.org
  Target Milestone: ---

The visium-elf port is a bit broken in that any code that calls abort will
fail to link. This has turned out to be useful, because it has pointed out
cases where the quality of our code generation has suffered.

The change to turn on the vectorizer by default at -O2 is yet another example.

Before the vectorizer change, c-c++-common/torture/vector-compare-2.c
compiled down to this code at -O2:

        .file   "j.c"
        .text
        .align  4
        .p2align 8
        .global foo
        .type   foo, @function
foo:
        moviu   r9,65535
        movil   r9,65533
        write.l (r1),r9
        write.l 1(r1),r9
        write.l 2(r1),r9
        bra     tr,r21,r0               ;return
         write.l 3(r1),r9
        .size   foo, .-foo
        .section        .text.startup,"ax",@progbits
        .align  4
        .p2align 8
        .global main
        .type   main, @function
main:
        bra     tr,r21,r0               ;return
         moviq   r1,0           ;movsi  r  J
        .size   main, .-main
        .ident  "GCC: (GNU) 12.0.0 20211008 (experimental)"


Of particular note, "main" does _not_ call abort. The optimizers have
figured everything out and realized that it should never abort.
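
The testcase source is not quoted in this thread. The following is only a hypothetical reduction, not the actual contents of vector-compare-2.c. It is inferred from the assembly above, assuming moviu/movil load the upper and lower 16 bits so that foo stores 0xFFFFFFFD (-3) into four consecutive words. It is also inferred from the GIMPLE in the later comments, where a four-lane int vector r is checked lane by lane against -3. It sketches the kind of check that the optimizers are expected to remove. The names v4si, fill_lanes and check_lanes are made up for illustration.

```c
#include <stdlib.h>

/* GNU vector extension: a vector of four ints (hypothetical typedef). */
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));

/* Hypothetical stand-in for foo: store -3 into every lane. */
static void fill_lanes(v4si *p)
{
  *p = (v4si){ -3, -3, -3, -3 };
}

/* Hypothetical stand-in for the check in main.  Subscripting a vector
   with a variable index, as in r[i], is presumably what appears as
   VIEW_CONVERT_EXPR<int[4]>(r)[i] in the GIMPLE dumps.  Once fill_lanes
   is inlined, every comparison is known to be false, so the loop and the
   abort call can be removed entirely.  */
int check_lanes(void)
{
  v4si r;
  fill_lanes(&r);
  for (int i = 0; i < 4; i++)
    if (r[i] != -3)
      abort();
  return 0;
}
```

With -O2 -fno-tree-vectorize, the expectation is that check_lanes reduces to "return 0", which is the shape of main in the first listing.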

After enabling the vectorizer at -O2, we get:

        .file   "j.c"
        .text
        .align  4
        .p2align 8
        .global foo
        .type   foo, @function
foo:
        moviu   r9,65535
        movil   r9,65533
        write.l (r1),r9
        write.l 1(r1),r9
        write.l 2(r1),r9
        bra     tr,r21,r0               ;return
         write.l 3(r1),r9
        .size   foo, .-foo
        .section        .text.startup,"ax",@progbits
        .align  4
        .p2align 8
        .global main
        .type   main, @function
main:
        subi    sp,36
        moviq   r10,23          ;movsi  r  J
        write.l (sp),fp
        move.l  fp,sp           ;stack_save
        add.l   r8,fp,r10
        lsr.l   r8,r8,4
        asl.l   r8,r8,4
        moviu   r9,65535
        write.l 1(sp),r21
        movil   r9,65533
        write.l (r8),r9
        write.l 1(r8),r9
        write.l 2(r8),r9
        write.l 3(r8),r9
        move.l  r10,r8
        moviq   r8,16           ;movsi  r  J
        add.l   r7,r10,r8
.L5:
        read.l  r8,(r10)
        cmp.l   r8,r9
        brr     ne,.L8
         addi    r10,4
        cmp.l   r10,r7
        brr     ne,.L5
         moviq   r1,0           ;movsi  r  J
        read.l  fp,(sp)
        read.l  r21,1(sp)
        bra     tr,r21,r0               ;return
         addi    sp,36          ;stack pop
.L8:
        moviu   r10,%u abort
        movil   r10,%l abort
        bra     tr,r10,r21
         nop            ;call
        .size   main, .-main
        .ident  "GCC: (GNU) 12.0.0 20211008 (experimental)"


Note the call to abort at the label .L8. More generally, the code in "main"
is considerably larger and more complex.

This is a code quality regression.

* [Bug tree-optimization/102756] [12 Regression] Complete unrolling is too sensitive to PRE; c-c++-common/torture/vector-compare-2.c
From: pinskia at gcc dot gnu.org @ 2021-10-15  0:29 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-10-15
           Keywords|                            |missed-optimization
            Summary|[12 Regression] Vectorizer  |[12 Regression] Complete
                   |change creates poor code    |unrolling is too sensitive
                   |for                         |to PRE;
                   |c-c++-common/torture/vector |c-c++-common/torture/vector
                   |-compare-2.c                |-compare-2.c
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
With -ftree-vectorize
size: 7-4, last_iteration: 7-4
  Loop size: 7
  Estimated size after unrolling: 8


  _1 = VIEW_CONVERT_EXPR<int[4]>(r)[i_10];


With -fno-tree-vectorize
size: 7-4, last_iteration: 6-4
  Loop size: 7
  Estimated size after unrolling: 7

  pretmp_2 = MEM[(vector(4) int *)&r][i_7];


Also, -O2 -fno-tree-vectorize -fno-tree-pre produces the same result as the
-O2 -ftree-vectorize case.



------------------- CUT ----------------------------
Loop 1 iterates 3 times.
Loop 1 iterates at most 3 times.
Loop 1 likely iterates at most 3 times.
Estimating sizes for loop 1
 BB: 3, after_exit: 0
  size:   1 _1 = VIEW_CONVERT_EXPR<int[4]>(r)[i_10];
  size:   2 if (_1 != -3)
 BB: 7, after_exit: 1
 BB: 5, after_exit: 0
  size:   1 i_7 = i_10 + 1;
   Induction variable computation will be folded away.
  size:   1 ivtmp_9 = ivtmp_2 - 1;
   Induction variable computation will be folded away.
  size:   2 if (ivtmp_9 != 0)
   Exit condition will be eliminated in peeled copies.
   Exit condition will be eliminated in last copy.
   Constant conditional.
size: 7-4, last_iteration: 7-4
  Loop size: 7
  Estimated size after unrolling: 8
Not unrolling loop 1: size would grow.


vs:
Estimating sizes for loop 1
 BB: 3, after_exit: 0
  size:   2 if (prephitmp_9 != -3)
 BB: 6, after_exit: 1
  size:   1 pretmp_2 = MEM[(vector(4) int *)&r][i_7];
 BB: 5, after_exit: 0
  size:   1 i_7 = i_10 + 1;
   Induction variable computation will be folded away.
  size:   1 ivtmp_11 = ivtmp_1 - 1;
   Induction variable computation will be folded away.
  size:   2 if (ivtmp_11 != 0)
   Exit condition will be eliminated in peeled copies.
   Exit condition will be eliminated in last copy.
   Constant conditional.
size: 7-4, last_iteration: 6-4
  Loop size: 7
  Estimated size after unrolling: 7


PRE decides to perform the load of MEM[(vector(4) int *)&r][0] before the
loop (the load of each later element moves after the exit test), which is why
the last iteration is 6-4 rather than 7-4.
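
The two "Estimated size after unrolling" figures can be reproduced with a little arithmetic. The sketch below assumes the estimate works the way I read estimated_unrolled_size () in GCC's tree-ssa-loop-ivcanon. It takes nunroll full copies of (size - eliminated), adds the last copy's (last_iteration - eliminated), and scales the sum by 2/3 to allow for later simplification. When growth is not allowed, the result is compared with the original loop size. It ignores the other limits that pass applies, and the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Mirrors the figures printed as "size: A-B, last_iteration: C-D" and
   "Loop size: N" in the dumps.  Field names are illustrative only.  */
struct loop_size_sketch {
  long overall;                   /* A, also printed as "Loop size" */
  long eliminated_by_peeling;     /* B */
  long last_iteration;            /* C */
  long last_iteration_eliminated; /* D */
};

/* nunroll is the number of latch executions ("Loop 1 iterates 3 times"):
   nunroll full copies of the body plus one final copy.  */
static long estimated_unrolled_size_sketch(const struct loop_size_sketch *s,
                                           long nunroll)
{
  long insns = nunroll * (s->overall - s->eliminated_by_peeling)
               + (s->last_iteration - s->last_iteration_eliminated);
  insns = insns * 2 / 3;   /* assume later passes remove about a third */
  return insns > 0 ? insns : 1;
}

/* The "Not unrolling loop N: size would grow." check, as assumed here,
   for the case where the unrolled code may not exceed the original.  */
static bool size_would_grow(const struct loop_size_sketch *s, long nunroll)
{
  return estimated_unrolled_size_sketch(s, nunroll) > s->overall;
}
```

With nunroll = 3, the -ftree-vectorize figures give 3*3 + 3 = 12 and 12*2/3 = 8, which exceeds 7, so the loop is not unrolled. The -fno-tree-vectorize figures give 3*3 + 2 = 11 and 11*2/3 = 7 (integer division), which does not exceed 7. That single statement in the last copy is the whole difference.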

* [Bug tree-optimization/102756] [12 Regression] Complete unrolling is too sensitive to PRE; c-c++-common/torture/vector-compare-2.c
From: pinskia at gcc dot gnu.org @ 2021-10-15  0:29 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0

* [Bug tree-optimization/102756] [12 Regression] Complete unrolling is too sensitive to PRE; c-c++-common/torture/vector-compare-2.c
From: rguenth at gcc dot gnu.org @ 2021-10-15  6:25 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the behavior is as expected: PRE is avoided because of vectorization,
and the unroll costing correctly sees the one statement fewer that is needed
(that statement is, of course, now on the preheader edge). It looks like the
testcase was on the edge as far as unrolling is concerned.

Note that one of the major issues with vectorization at -O2 is that it does
affect what we do in PRE, where we could of course improve the heuristics. In
this case there are multiple exits in the loop, which prevents vectorization
anyway, but we might just have optimized that away (we're in the elimination
phase).
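
The two loop shapes in comment #1 differ only in where the load sits. The hand-written C approximation below contrasts them. It is not compiler output, the function names are hypothetical, and it returns 1 where the testcase would call abort so that it can be tested. When the load sits before the exit test, every copy of the body carries it. After PRE, the first load is on the preheader edge and each later load follows the exit test, so the final copy needs one statement fewer. That matches last_iteration: 6-4 versus 7-4.

```c
#include <stddef.h>

/* Shape with -ftree-vectorize: the load is in the body before the exit
   test, so every copy of the body, including the last, contains it.  */
static int check_load_in_body(const int *r, size_t n, int expected)
{
  for (size_t i = 0; i < n; i++) {
    int v = r[i];          /* _1 = VIEW_CONVERT_EXPR<int[4]>(r)[i_10]; */
    if (v != expected)
      return 1;            /* stands in for abort () */
  }
  return 0;
}

/* Approximate shape after PRE: element 0 is loaded before the loop
   (on the preheader edge), and the next element is loaded only after
   the exit test, so the final copy of the body needs no load.  */
static int check_load_pre_style(const int *r, size_t n, int expected)
{
  if (n == 0)
    return 0;
  int v = r[0];            /* load of element 0 hoisted out of the loop */
  size_t i = 0;
  for (;;) {
    if (v != expected)     /* if (prephitmp_9 != -3) */
      return 1;            /* stands in for abort () */
    if (++i == n)          /* i_7 = i_10 + 1; exit test */
      break;
    v = r[i];              /* pretmp_2 = MEM[(vector(4) int *)&r][i_7]; */
  }
  return 0;
}
```

Both functions compute the same result. Only the placement of the load relative to the exit test changes, and that placement is all the size estimate sees.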

* [Bug tree-optimization/102756] [12 Regression] Complete unrolling is too sensitive to PRE; c-c++-common/torture/vector-compare-2.c
From: law at gcc dot gnu.org @ 2021-10-15 14:44 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756

--- Comment #3 from Jeffrey A. Law <law at gcc dot gnu.org> ---
So if we consider the behavior as expected, and this was just a case where we
crossed a heuristic boundary, I'd be comfortable closing.

* [Bug tree-optimization/102756] [12 Regression] Complete unrolling is too sensitive to PRE; c-c++-common/torture/vector-compare-2.c
From: law at gcc dot gnu.org @ 2021-11-17 16:28 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756

--- Comment #4 from Jeffrey A. Law <law at gcc dot gnu.org> ---
I could also set up a ready-to-debug toolchain in an AWS instance that you
could use, if that would be helpful.

* [Bug tree-optimization/102756] [12 Regression] Complete unrolling is too sensitive to PRE; c-c++-common/torture/vector-compare-2.c
From: law at gcc dot gnu.org @ 2021-11-17 16:29 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756

--- Comment #5 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Ignore the last comment; it was meant for a different BZ.

* [Bug tree-optimization/102756] [12 Regression] Complete unrolling is too sensitive to PRE; c-c++-common/torture/vector-compare-2.c
From: rguenth at gcc dot gnu.org @ 2022-01-19  8:22 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, the same happens on x86-64. With -O2 and vectorization, we end up with:

  <bb 3> [local count: 858993457]:
  # ivtmp.14_11 = PHI <ivtmp.14_12(5), ivtmp.14_13(2)>
  _14 = (void *) ivtmp.14_11;
  _1 = MEM <int> [(vector(4) int *)_14];
  if (_1 != -3)
    goto <bb 4>; [0.00%]
  else
    goto <bb 5>; [100.00%]

  <bb 4> [count: 0]:
  __builtin_abort ();

  <bb 5> [local count: 858993457]:
  ivtmp.14_12 = ivtmp.14_11 + 4;
  if (ivtmp.14_12 != _16)
    goto <bb 3>; [80.00%]
  else
    goto <bb 6>; [20.00%]

  <bb 6> [local count: 214748368]:
  r ={v} {CLOBBER};

while everything is optimized away with -O2 -fno-tree-vectorize.

Let's keep this open as a regression, since -O2 now enables vectorization. In
principle, we could preserve the previous behavior for the very-cheap
vectorizer cost model, or adjust the heuristic for that case to cover only
loops with a single BB.

The real issue here is, of course, that the unroller does not consider the
true size after simplification.
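
The following sketch makes the point about simplification concrete. It writes the check out by hand as four copies, which is roughly what complete unrolling would produce. It is a hedged illustration, not compiler output. store_minus3 and unrolled_check are hypothetical names, the all-(-3) store is inferred from the dumps in this thread, and returning 1 stands in for __builtin_abort ().

```c
/* GNU vector extension: a vector of four ints (hypothetical typedef). */
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));

/* Hypothetical stand-in for the inlined foo.  */
static void store_minus3(v4si *r)
{
  *r = (v4si){ -3, -3, -3, -3 };
}

/* The loop body written out four times, as complete unrolling would.
   Each r[k] reads a lane that store_minus3 has just written, so once the
   stores are forwarded to the loads every condition should fold to false
   and only "return 0" is left.  The size that matters is therefore far
   smaller than an estimate based on the unsimplified body.  */
static int unrolled_check(void)
{
  v4si r;
  store_minus3(&r);
  if (r[0] != -3) return 1;   /* copy 1 */
  if (r[1] != -3) return 1;   /* copy 2 */
  if (r[2] != -3) return 1;   /* copy 3 */
  if (r[3] != -3) return 1;   /* copy 4, the last iteration */
  return 0;
}
```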

* [Bug tree-optimization/102756] [12/13 Regression] Complete unrolling is too sensitive to PRE; c-c++-common/torture/vector-compare-2.c
From: jakub at gcc dot gnu.org @ 2022-05-06  8:31 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.0                        |12.2

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 12.1 is being released; retargeting bugs to GCC 12.2.

* [Bug tree-optimization/102756] [12/13 Regression] Complete unrolling is too sensitive to PRE; c-c++-common/torture/vector-compare-2.c
From: rguenth at gcc dot gnu.org @ 2022-07-26 13:23 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

* [Bug tree-optimization/102756] [12/13/14 Regression] Complete unrolling is too sensitive to PRE; c-c++-common/torture/vector-compare-2.c
From: rguenth at gcc dot gnu.org @ 2023-05-08 12:22 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102756

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.3                        |12.4

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 12.3 is being released; retargeting bugs to GCC 12.4.
