[Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types

public inbox for gcc-bugs@sourceware.org

* [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types
@ 2023-06-23 18:06 lennox.ho at intel dot com
  2023-06-23 18:08 ` [Bug c++/110381] " lennox.ho at intel dot com
                   ` (20 more replies)
  0 siblings, 21 replies; 22+ messages in thread
From: lennox.ho at intel dot com @ 2023-06-23 18:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

            Bug ID: 110381
           Summary: Incorrect loop unrolling for structs of floating point
                    types
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lennox.ho at intel dot com
  Target Milestone: ---

We believe gcc is incorrectly unrolling loops while performing summation of
structs with floating point members:

Here's a minimal example:

```
#include <iostream>

using value_type = double;

struct FOO {
   value_type a = 0;
   value_type b = 0;
   value_type c = 0;
};

value_type sum_8_foos(const FOO* foos) {
    value_type sum = 0;

    for (int i = 0; i < 8; ++i) {
        auto foo = foos[i];

        sum += foo.c;
        sum += foo.b;
        sum += foo.a;
    }

    return sum;
}

int main() {
    FOO foos[8];
    foos[0].b = 5;

    std::cout << sum_8_foos(foos) << '\n';
    return 0;
}
```
With -O1, we get 5.
With -O2, we get 10.

godbolt link: https://godbolt.org/z/7cxeb3Gsv

Slightly reorganising the assembly output for the loop,
```
.L2:
        add     rdi, 48

        addsd   sum, QWORD PTR [rdi-48] // c
        addsd   sum, QWORD PTR [rdi-40] // b
        addsd   sum, QWORD PTR [rdi-32] // a

        addsd   sum, QWORD PTR [rdi-24] // c
        addsd   sum, QWORD PTR [rdi-16] // b
        addsd   sum, QWORD PTR [rdi-8]  // a

        add     rax, 24

        addsd   sum, QWORD PTR [rax-16] // b
        addsd   sum, QWORD PTR [rax-24] // c

        cmp     rdi, end
        jne     .L2
```

There appear to be duplicate additions for the members b and c.

This behaviour appears on gcc 12.1 and is still present in gcc 13.1.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug c++/110381] Incorrect loop unrolling for structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
@ 2023-06-23 18:08 ` lennox.ho at intel dot com
  2023-06-23 19:48 ` [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of " pinskia at gcc dot gnu.org
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: lennox.ho at intel dot com @ 2023-06-23 18:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #1 from Lennox Ho <lennox.ho at intel dot com> ---
This issue does not appear to reproduce with integer types, or if the members
are summed in the "natural" order - a => b => c.


* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
  2023-06-23 18:08 ` [Bug c++/110381] " lennox.ho at intel dot com
@ 2023-06-23 19:48 ` pinskia at gcc dot gnu.org
  2023-06-23 20:01 ` lennox.ho at intel dot com
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-23 19:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.5
      Known to work|                            |10.1.0
   Last reconfirmed|                            |2023-06-23
     Ever confirmed|0                           |1
      Known to fail|                            |11.1.0
           Keywords|                            |needs-bisection, wrong-code
            Summary|Incorrect loop unrolling    |[11/12/13/14 Regression]
                   |for structs of floating     |double counting for sum of
                   |point types                 |structs of floating point
                   |                            |types
          Component|c++                         |tree-optimization
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed. At -O3 (or -O2 -ftree-vectorize) this started to fail in GCC 11.1.0.


* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
  2023-06-23 18:08 ` [Bug c++/110381] " lennox.ho at intel dot com
  2023-06-23 19:48 ` [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of " pinskia at gcc dot gnu.org
@ 2023-06-23 20:01 ` lennox.ho at intel dot com
  2023-06-23 20:07 ` pinskia at gcc dot gnu.org
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: lennox.ho at intel dot com @ 2023-06-23 20:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #3 from Lennox Ho <lennox.ho at intel dot com> ---
Thanks. -fno-tree-vectorize appears to fix GCC 12.1 at -O2.

Curious, why is -ftree-vectorize enabled at -O2 with GCC 12.1?
The documents say it's only turned on by default with -O3
```
https://gcc.gnu.org/projects/tree-ssa/vectorization.html
Vectorization is enabled by the flag -ftree-vectorize and by default at -O3
```
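
As a stopgap, the same effect can be scoped to a single function rather than
the whole translation unit. A minimal sketch, assuming GCC's optimize function
attribute is acceptable in your build (the GCC manual recommends it mainly for
debugging); Foo, sum_8_foos_novec and demo_sum are names made up for this
illustration:

```cpp
struct Foo {
    double a = 0;
    double b = 0;
    double c = 0;
};

// Ask GCC to compile only this function as if -fno-tree-vectorize were given.
// Other compilers skip the attribute and compile the loop normally.
#if defined(__GNUC__) && !defined(__clang__)
__attribute__((optimize("no-tree-vectorize")))
#endif
double sum_8_foos_novec(const Foo* foos) {
    double sum = 0;
    for (int i = 0; i < 8; ++i) {
        // Same member order as in the report: c, then b, then a.
        sum += foos[i].c;
        sum += foos[i].b;
        sum += foos[i].a;
    }
    return sum;
}

// Hypothetical driver: same input as the reproducer in the report.
double demo_sum() {
    Foo foos[8];      // default member initializers zero every field
    foos[0].b = 5;
    return sum_8_foos_novec(foos);
}
```

Passing -fno-tree-vectorize on the command line remains the simpler option when
the whole file can be built that way.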


* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (2 preceding siblings ...)
  2023-06-23 20:01 ` lennox.ho at intel dot com
@ 2023-06-23 20:07 ` pinskia at gcc dot gnu.org
  2023-06-23 22:34 ` mpolacek at gcc dot gnu.org
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-23 20:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Lennox Ho from comment #3)
> Thanks. -fno-tree-vectorize appears to fix GCC 12.1 at -O2.
> 
> Curious, why is -ftree-vectorize enabled at -O2 with GCC 12.1?

Yes, see https://gcc.gnu.org/gcc-12/changes.html#general .


> The documents say it's only turned on by default with -O3
> ```
> https://gcc.gnu.org/projects/tree-ssa/vectorization.html
> Vectorization is enabled by the flag -ftree-vectorize and by default at -O3
> ```

That is not the documentation for the release but rather for the project as it
was during its original development (back in the early 2000s).

The documentation for the GCC 12.1 release is located at:
https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Optimize-Options.html#index-ftree-vectorize

That page specifically mentions that -ftree-loop-vectorize and
-ftree-slp-vectorize are enabled at -O2 and above (-ftree-vectorize is a meta
option for those two options).


* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (3 preceding siblings ...)
  2023-06-23 20:07 ` pinskia at gcc dot gnu.org
@ 2023-06-23 22:34 ` mpolacek at gcc dot gnu.org
  2023-06-26  8:58 ` rguenth at gcc dot gnu.org
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: mpolacek at gcc dot gnu.org @ 2023-06-23 22:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

Marek Polacek <mpolacek at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |
                 CC|                            |mpolacek at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org

--- Comment #5 from Marek Polacek <mpolacek at gcc dot gnu.org> ---
With -O2 -ftree-vectorize, started with r11-4428:

commit 4a369d199bf2f34e037030b18d0da923e8a24997
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Oct 16 09:43:22 2020 +0200

    SLP vectorize across PHI nodes


* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (4 preceding siblings ...)
  2023-06-23 22:34 ` mpolacek at gcc dot gnu.org
@ 2023-06-26  8:58 ` rguenth at gcc dot gnu.org
  2023-06-26  9:33 ` rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-26  8:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
I will have a look.


* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (5 preceding siblings ...)
  2023-06-26  8:58 ` rguenth at gcc dot gnu.org
@ 2023-06-26  9:33 ` rguenth at gcc dot gnu.org
  2023-06-26  9:41 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-26  9:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
C testcase:

struct FOO {
   double a;
   double b;
   double c;
};

double __attribute__((noipa))
sum_8_foos(const struct FOO* foos)
{
    double sum = 0;

    for (int i = 0; i < 8; ++i) {
        struct FOO foo = foos[i];

        sum += foo.c;
        sum += foo.b;
        sum += foo.a;
    }

    return sum;
}

int main()
{
  struct FOO foos[8];

  __builtin_memset (foos, 0, sizeof (foos));
  foos[0].b = 5;

  if (sum_8_foos (foos) != 5)
    __builtin_abort ();
  return 0;
}


* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (6 preceding siblings ...)
  2023-06-26  9:33 ` rguenth at gcc dot gnu.org
@ 2023-06-26  9:41 ` rguenth at gcc dot gnu.org
  2023-06-26 10:50 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-26  9:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so we transform the in-order FOLD_LEFT_REDUCTION as

  # sum_22 = PHI <sum_15(5), 0.0(2)>
  # vectp_foos.7_25 = PHI <vectp_foos.7_23(5), foos_12(D)(2)>
...
  vect_foo_c_8.9_21 = MEM <vector(2) double> [(double *)vectp_foos.7_25];
  vectp_foos.7_20 = vectp_foos.7_25 + 16;
  vect_foo_c_8.10_7 = MEM <vector(2) double> [(double *)vectp_foos.7_20];
  vectp_foos.7_6 = vectp_foos.7_25 + 32;
  vect_foo_c_8.11_5 = MEM <vector(2) double> [(double *)vectp_foos.7_6];
  stmp_sum_13.12_4 = BIT_FIELD_REF <vect_foo_c_8.9_21, 64, 0>;
  stmp_sum_13.12_31 = sum_22 + stmp_sum_13.12_4;
  stmp_sum_13.12_32 = BIT_FIELD_REF <vect_foo_c_8.9_21, 64, 64>;
  stmp_sum_13.12_33 = stmp_sum_13.12_31 + stmp_sum_13.12_32;
  stmp_sum_13.12_34 = BIT_FIELD_REF <vect_foo_c_8.10_7, 64, 0>;
  stmp_sum_13.12_35 = stmp_sum_13.12_33 + stmp_sum_13.12_34;
  stmp_sum_13.12_36 = BIT_FIELD_REF <vect_foo_c_8.10_7, 64, 64>;
  stmp_sum_13.12_37 = stmp_sum_13.12_35 + stmp_sum_13.12_36;
  stmp_sum_13.12_38 = BIT_FIELD_REF <vect_foo_c_8.11_5, 64, 0>;
  stmp_sum_13.12_39 = stmp_sum_13.12_37 + stmp_sum_13.12_38;
  stmp_sum_13.12_40 = BIT_FIELD_REF <vect_foo_c_8.11_5, 64, 64>;
  foo$a_11 = _3->a;
  foo$b_9 = _3->b;
  foo$c_8 = _3->c;
  sum_13 = stmp_sum_13.12_39 + stmp_sum_13.12_40;
  sum_14 = foo$b_9 + sum_13;
  sum_15 = foo$a_11 + sum_14;

where you can see that the final step updates one scalar stmt in the SLP
reduction group, but not the last one.  That causes us to keep the scalar
reads live and apply some elements multiple times.
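
To make the double counting concrete, here is a small scalar model of the loop
shape from the assembly in the original report, next to a straightforward
in-order reference. It is a sketch, not compiler output: the flat array layout,
the function names, and the assumption that the scalar pointer starts at the
same address as the vector pointer are mine.

```cpp
#include <array>

// Eight structs flattened: struct i occupies indices 3*i (a), 3*i+1 (b), 3*i+2 (c).
using FlatFoos = std::array<double, 24>;

// Reference: what the source asks for, one in-order add per member.
double sum_reference(const FlatFoos& flat) {
    double sum = 0;
    for (int i = 0; i < 8; ++i) {
        sum += flat[3 * i + 2];  // c
        sum += flat[3 * i + 1];  // b
        sum += flat[3 * i + 0];  // a
    }
    return sum;
}

// Model of the miscompiled loop: each iteration consumes two whole structs
// through the vector loads, then adds two leftover scalar reads again.
double sum_miscompiled_model(const FlatFoos& flat) {
    double sum = 0;
    for (int iter = 0; iter < 4; ++iter) {
        // Six adds from the three vector(2) double loads, in memory order.
        for (int k = 0; k < 6; ++k) {
            sum += flat[6 * iter + k];
        }
        // The scalar stmts that stayed live (sum_14 and sum_15 in the dump).
        // Assumption: the scalar pointer advances one struct per iteration.
        sum += flat[3 * iter + 1];  // foo$b, counted a second time
        sum += flat[3 * iter + 0];  // foo$a, counted a second time
    }
    return sum;
}

// Hypothetical drivers using the reproducer's input: only foos[0].b is 5.
double demo_reference() {
    FlatFoos flat{};
    flat[1] = 5;
    return sum_reference(flat);
}

double demo_model() {
    FlatFoos flat{};
    flat[1] = 5;
    return sum_miscompiled_model(flat);
}
```

With only foos[0].b set to 5, demo_reference() yields 5 and demo_model() yields
10, matching the -O1 and -O2 results from the original report.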


* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (7 preceding siblings ...)
  2023-06-26  9:41 ` rguenth at gcc dot gnu.org
@ 2023-06-26 10:50 ` rguenth at gcc dot gnu.org
  2023-06-26 12:18 ` cvs-commit at gcc dot gnu.org
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-26 10:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the transform phase is correct but the analysis phase fails to reject
this case because there's a permutation we elide even though that will not
preserve the fold-left reduction semantics.  We analyze the SLP node to

t.c:12:23: note:   Final SLP tree for instance 0x4c90840:
t.c:12:23: note:   node 0x4d57380 (max_nunits=2, refcnt=3) vector(2) double
t.c:12:23: note:   op template: sum_13 = foo$c_8 + sum_22;
t.c:12:23: note:        stmt 0 sum_13 = foo$c_8 + sum_22;
t.c:12:23: note:        stmt 1 sum_14 = foo$b_9 + sum_13;
t.c:12:23: note:        stmt 2 sum_15 = foo$a_11 + sum_14;
t.c:12:23: note:        children 0x4d57408 0x4d57490
t.c:12:23: note:   node 0x4d57408 (max_nunits=2, refcnt=2) vector(2) double
t.c:12:23: note:   op template: foo$c_8 = _3->c;
t.c:12:23: note:        stmt 0 foo$c_8 = _3->c;
t.c:12:23: note:        stmt 1 foo$b_9 = _3->b;
t.c:12:23: note:        stmt 2 foo$a_11 = _3->a;
t.c:12:23: note:        load permutation { 2 1 0 }
t.c:12:23: note:   node 0x4d57490 (max_nunits=2, refcnt=2) vector(2) double
t.c:12:23: note:   op template: sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note:        stmt 0 sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note:        stmt 1 sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note:        stmt 2 sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note:        children 0x4d57380 (nil)

but optimize_slp mangles things here.

We have

  /* We have to mark outgoing permutations facing non-reduction graph
     entries that are not represented as to be materialized.  */
  for (slp_instance instance : m_vinfo->slp_instances)
    if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_ctor)
      {       
        unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
        m_partitions[m_vertices[node_i].partition].layout = 0;
      }

Unfortunately, this all happens before we determine that the reduction is
in-order.  The only thing we can do here is check
needs_fold_left_reduction_p directly.

I'm testing a patch.
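
Why eliding the permutation matters for an in-order reduction can be seen with
plain scalar code. The sketch below (names are illustrative, not taken from
tree-vect-slp.cc) folds three lanes left to right, once in the permuted order
{ 2 1 0 } from the dump and once in memory order, as if the permutation had
been dropped:

```cpp
#include <array>
#include <cfloat>

// In-order (fold-left) sum of three lanes, visited in the order given by perm.
double fold_left_permuted(const std::array<double, 3>& lanes,
                          const std::array<int, 3>& perm) {
    double sum = 0;
    for (int idx : perm) {
        sum += lanes[idx];
    }
    return sum;
}

// Lanes in memory order a, b, c. Values chosen so that the order is visible.
const std::array<double, 3> kFoo = {5.0, DBL_MAX, -DBL_MAX};

// Permutation materialized: add c, then b, then a, as the source requests.
double demo_permutation_kept() {
    const std::array<int, 3> perm = {2, 1, 0};
    return fold_left_permuted(kFoo, perm);
}

// Permutation elided: lanes are added in memory order a, b, c.
double demo_permutation_elided() {
    const std::array<int, 3> perm = {0, 1, 2};
    return fold_left_permuted(kFoo, perm);
}
```

With these values the permuted order gives 5, while memory order gives 0,
because 5 is absorbed when it is added to DBL_MAX first.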


* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (8 preceding siblings ...)
  2023-06-26 10:50 ` rguenth at gcc dot gnu.org
@ 2023-06-26 12:18 ` cvs-commit at gcc dot gnu.org
  2023-06-26 12:18 ` [Bug tree-optimization/110381] [11/12/13 " rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-26 12:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:53d6f57c1b20c6da52aefce737fb7d5263686ba3

commit r14-2095-g53d6f57c1b20c6da52aefce737fb7d5263686ba3
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Jun 26 12:51:37 2023 +0200

    tree-optimization/110381 - preserve SLP permutation with in-order reductions

    The following fixes a bug that manifests itself during fold-left
    reduction transform in picking not the last scalar def to replace
    and thus double-counting some elements.  But the underlying issue
    is that we merge a load permutation into the in-order reduction
    which is of course wrong.

    Now, reduction analysis has not yet been performed when optimizing
    permutations so we have to resort to checking that ourselves.

            PR tree-optimization/110381
            * tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
            Materialize permutes before fold-left reductions.

            * gcc.dg/vect/pr110381.c: New testcase.


* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (9 preceding siblings ...)
  2023-06-26 12:18 ` cvs-commit at gcc dot gnu.org
@ 2023-06-26 12:18 ` rguenth at gcc dot gnu.org
  2023-06-29 16:01 ` clyon at gcc dot gnu.org
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-26 12:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11/12/13/14 Regression]    |[11/12/13 Regression]
                   |double counting for sum of  |double counting for sum of
                   |structs of floating point   |structs of floating point
                   |types                       |types
      Known to work|                            |14.0

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed on trunk so far.


* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (10 preceding siblings ...)
  2023-06-26 12:18 ` [Bug tree-optimization/110381] [11/12/13 " rguenth at gcc dot gnu.org
@ 2023-06-29 16:01 ` clyon at gcc dot gnu.org
  2023-06-30  6:35 ` cvs-commit at gcc dot gnu.org
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: clyon at gcc dot gnu.org @ 2023-06-29 16:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

Christophe Lyon <clyon at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clyon at gcc dot gnu.org

--- Comment #12 from Christophe Lyon <clyon at gcc dot gnu.org> ---
The new testcase (gcc.dg/vect/pr110381.c) fails:
FAIL: gcc.dg/vect/pr110381.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr110381.c execution test

on arm-linux-gnueabihf configured with --with-float=hard
--with-fpu=neon-fp-armv8 --with-mode=thumb --with-arch=armv8-a


* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (11 preceding siblings ...)
  2023-06-29 16:01 ` clyon at gcc dot gnu.org
@ 2023-06-30  6:35 ` cvs-commit at gcc dot gnu.org
  2023-06-30  6:40 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-06-30  6:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:c0439218eb79aa0293291aed92453a59db8c6e85

commit r14-2207-gc0439218eb79aa0293291aed92453a59db8c6e85
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Jun 30 08:34:24 2023 +0200

    tree-optimization/110381 - fix testcase

    This adds a missing check_vect () to the execute testcase.

            PR tree-optimization/110381
            * gcc.dg/vect/pr110381.c: Add check_vect ().


* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (12 preceding siblings ...)
  2023-06-30  6:35 ` cvs-commit at gcc dot gnu.org
@ 2023-06-30  6:40 ` rguenth at gcc dot gnu.org
  2023-06-30 14:05 ` clyon at gcc dot gnu.org
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-06-30  6:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Christophe Lyon from comment #12)
> The new testcase (gcc.dg/vect/pr110381.c) fails:
> FAIL: gcc.dg/vect/pr110381.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr110381.c execution test
> 
> on arm-linux-gnueabihf configured with --with-float=hard
> --with-fpu=neon-fp-armv8 --with-mode=thumb --with-arch=armv8-a

Can you check if it works now?  I've added a missing check_vect () call in
case the harness passes in command-line options that your HW doesn't
support.  Otherwise I'd appreciate command-line options to reproduce.
I cannot get anything to vectorize with a cc1 cross using

> ./cc1 -quiet t.c -O2 -ftree-vectorize -fno-vect-cost-model -fopt-info-vec -I include -march=armv8-a -mthumb -mfpu=neon-fp-armv8 -mfloat-abi=hard -mhard-float

but I have a cross configured with --with-float=hard --with-cpu=cortex-a9
--with-fpu=neon-fp16

I hope the FPU is compliant enough to compute __DBL_MAX__ + -__DBL_MAX__ + 5.
to 5.


* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (13 preceding siblings ...)
  2023-06-30  6:40 ` rguenth at gcc dot gnu.org
@ 2023-06-30 14:05 ` clyon at gcc dot gnu.org
  2023-06-30 20:50 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: clyon at gcc dot gnu.org @ 2023-06-30 14:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #15 from Christophe Lyon <clyon at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #14)
> (In reply to Christophe Lyon from comment #12)
> > The new testcase (gcc.dg/vect/pr110381.c) fails:
> > FAIL: gcc.dg/vect/pr110381.c -flto -ffat-lto-objects execution test
> > FAIL: gcc.dg/vect/pr110381.c execution test
> > 
> > on arm-linux-gnueabihf configured with --with-float=hard
> > --with-fpu=neon-fp-armv8 --with-mode=thumb --with-arch=armv8-a
> 
> Can you check if it works now?  I've added a missing check_vect () call in
> case the harness passes in command-line options that your HW doesn't
> support.  Otherwise I'd appreciate command-line options to reproduce.

It still fails (check_vect() passes on my config, so there's no change).

Here is what sum_8_foos looks like:
sum_8_foos:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vmov.i64        d0, #0  @ float
        add     r3, r0, #192
.L10:
        vldr.64 d16, [r0, #8]
        adds    r0, r0, #24
        vldr.64 d18, [r0, #-24]
        vldr.64 d17, [r0, #-8]
        cmp     r3, r0
        vadd.f64        d16, d16, d18
        vadd.f64        d16, d16, d17
        vadd.f64        d0, d0, d16
        bne     .L10
        bx      lr

so we load:
d16=5
d17=-__DBL_MAX__
d18=__DBL_MAX__
the first addition makes d16=__DBL_MAX__
and the second one makes d16=0
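
The same arithmetic can be checked on any IEEE double implementation. A small
sketch using the register values listed above (function names are
hypothetical): sum_like_arm_loop mirrors the order of the three vadd.f64
instructions, while sum_in_order adds the values one at a time in the order
__DBL_MAX__, -__DBL_MAX__, 5 mentioned in comment #14.

```cpp
#include <cfloat>

// Mirrors one iteration of the loop above: the three loaded values are summed
// among themselves first, and only then added to the accumulator d0.
double sum_like_arm_loop(double d0, double d16, double d18, double d17) {
    d16 = d16 + d18;  // vadd.f64 d16, d16, d18
    d16 = d16 + d17;  // vadd.f64 d16, d16, d17
    d0 = d0 + d16;    // vadd.f64 d0, d0, d16
    return d0;
}

// Strict in-order accumulation: every value is added to the running sum.
double sum_in_order(double sum, double x, double y, double z) {
    sum += x;
    sum += y;
    sum += z;
    return sum;
}

// Hypothetical drivers with the values from this comment.
double demo_arm_order() {
    return sum_like_arm_loop(0.0, 5.0, DBL_MAX, -DBL_MAX);
}

double demo_strict_order() {
    return sum_in_order(0.0, DBL_MAX, -DBL_MAX, 5.0);
}
```

demo_arm_order() returns 0 because 5 + DBL_MAX rounds back to DBL_MAX, whereas
demo_strict_order() returns 5.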


> I cannot get anything to vectorize with a cc1 cross using
> 
> > ./cc1 -quiet t.c -O2 -ftree-vectorize -fno-vect-cost-model -fopt-info-vec -I include tri
> 
> but I have a cross configured with --with-float=hard --with-cpu=cortex-a9
> --with-fpu=neon-fp16

Not sure what happens. I tried my native compiler with the above flags, I get
the same code.
I tried to build my native compiler with the same flags, same code too.

> 
> I hope the FPU is compliant enough to compute __DBL_MAX__ + -__DBL_MAX__ +
> 5. to 5.


* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (14 preceding siblings ...)
  2023-06-30 14:05 ` clyon at gcc dot gnu.org
@ 2023-06-30 20:50 ` pinskia at gcc dot gnu.org
  2023-07-03  7:54 ` clyon at gcc dot gnu.org
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-06-30 20:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #16 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect this patch will fix the arm failure:
```
diff --git a/gcc/testsuite/gcc.dg/vect/pr110381.c b/gcc/testsuite/gcc.dg/vect/pr110381.c
index dc8c6a8f683..ee78666d2e8 100644
--- a/gcc/testsuite/gcc.dg/vect/pr110381.c
+++ b/gcc/testsuite/gcc.dg/vect/pr110381.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target vect_float_strict } */

 #include "tree-vect.h"


```

Arm enables -funsafe-math-optimizations for the vector testsuite (it requires
that because of denormals, IIRC), but this testcase requires strict ordering.
So the test must not be run with `-funsafe-math-optimizations`, which is what
vect_float_strict checks for.


* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (15 preceding siblings ...)
  2023-06-30 20:50 ` pinskia at gcc dot gnu.org
@ 2023-07-03  7:54 ` clyon at gcc dot gnu.org
  2023-07-03  7:57 ` pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: clyon at gcc dot gnu.org @ 2023-07-03  7:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #17 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Thanks Andrew, I wasn't aware of vect_float_strict.
I confirm it makes the test UNSUPPORTED.

Can you commit the fix or do you want me to do it on your behalf?


* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (16 preceding siblings ...)
  2023-07-03  7:54 ` clyon at gcc dot gnu.org
@ 2023-07-03  7:57 ` pinskia at gcc dot gnu.org
  2023-07-03  8:03 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-03  7:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #18 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Christophe Lyon from comment #17)
> Thanks Andrew, I wasn't aware of vect_float_strict.
> I confirm it makes the test UNSUPPORTED.
> 
> Can you commit the fix or do you want me to do it on your behalf?

Can you do it?  I have other things I am working on right now.


* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
  2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
                   ` (17 preceding siblings ...)
  2023-07-03  7:57 ` pinskia at gcc dot gnu.org
@ 2023-07-03  8:03 ` cvs-commit at gcc dot gnu.org
  2023-07-03  8:05 ` clyon at gcc dot gnu.org
  2023-07-07 12:06 ` [Bug tree-optimization/110381] [11/12 " cvs-commit at gcc dot gnu.org
  20 siblings, 0 replies; 22+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-03  8:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #19 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Christophe Lyon <clyon@gcc.gnu.org>:

https://gcc.gnu.org/g:8cb087d869be698a86b082a7248d03e468ef1eb1

commit r14-2254-g8cb087d869be698a86b082a7248d03e468ef1eb1
Author: Christophe Lyon <christophe.lyon@linaro.org>
Date:   Mon Jul 3 08:00:00 2023 +0000

    testsuite: Add vect_float_strict to testcase [PR 110381]

    As discussed in the PR, the testcase needs
    /* { dg-require-effective-target vect_float_strict } */

    2023-02-03  Andrew Pinski  <apinski@marvell.com>

            PR tree-optimization/110381
            gcc/testsuite/

            * gcc.dg/vect/pr110381.c: Add vect_float_strict.

* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
From: clyon at gcc dot gnu.org @ 2023-07-03  8:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #20 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Sorry for the typo in the date in the commit message :-(

* [Bug tree-optimization/110381] [11/12 Regression] double counting for sum of structs of floating point types
From: cvs-commit at gcc dot gnu.org @ 2023-07-07 12:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

--- Comment #21 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-13 branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:32c7f05f8bc6d45dee374fe22be3f0e19836278a

commit r13-7543-g32c7f05f8bc6d45dee374fe22be3f0e19836278a
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Jun 26 12:51:37 2023 +0200

    tree-optimization/110381 - preserve SLP permutation with in-order reductions

    The following fixes a bug that manifests itself during the fold-left
    reduction transform, which picks a scalar def other than the last one
    to replace and thus double-counts some elements.  But the underlying
    issue is that we merge a load permutation into the in-order reduction,
    which is of course wrong.

    Now, reduction analysis has not yet been performed when optimizing
    permutations, so we have to resort to checking that ourselves.

            PR tree-optimization/110381
            * tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
            Materialize permutes before fold-left reductions.

            * gcc.dg/vect/pr110381.c: New testcase.

    (cherry picked from commit 53d6f57c1b20c6da52aefce737fb7d5263686ba3)

end of thread, other threads:[~2023-07-07 12:06 UTC | newest]

Thread overview: 22+ messages
2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
2023-06-23 18:08 ` [Bug c++/110381] " lennox.ho at intel dot com
2023-06-23 19:48 ` [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of " pinskia at gcc dot gnu.org
2023-06-23 20:01 ` lennox.ho at intel dot com
2023-06-23 20:07 ` pinskia at gcc dot gnu.org
2023-06-23 22:34 ` mpolacek at gcc dot gnu.org
2023-06-26  8:58 ` rguenth at gcc dot gnu.org
2023-06-26  9:33 ` rguenth at gcc dot gnu.org
2023-06-26  9:41 ` rguenth at gcc dot gnu.org
2023-06-26 10:50 ` rguenth at gcc dot gnu.org
2023-06-26 12:18 ` cvs-commit at gcc dot gnu.org
2023-06-26 12:18 ` [Bug tree-optimization/110381] [11/12/13 " rguenth at gcc dot gnu.org
2023-06-29 16:01 ` clyon at gcc dot gnu.org
2023-06-30  6:35 ` cvs-commit at gcc dot gnu.org
2023-06-30  6:40 ` rguenth at gcc dot gnu.org
2023-06-30 14:05 ` clyon at gcc dot gnu.org
2023-06-30 20:50 ` pinskia at gcc dot gnu.org
2023-07-03  7:54 ` clyon at gcc dot gnu.org
2023-07-03  7:57 ` pinskia at gcc dot gnu.org
2023-07-03  8:03 ` cvs-commit at gcc dot gnu.org
2023-07-03  8:05 ` clyon at gcc dot gnu.org
2023-07-07 12:06 ` [Bug tree-optimization/110381] [11/12 " cvs-commit at gcc dot gnu.org
