* [Bug c++/110381] Incorrect loop unrolling for structs of floating point types
2023-06-23 18:06 [Bug c++/110381] New: Incorrect loop unrolling for structs of floating point types lennox.ho at intel dot com
@ 2023-06-23 18:08 ` lennox.ho at intel dot com
2023-06-23 19:48 ` [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of " pinskia at gcc dot gnu.org
` (21 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: lennox.ho at intel dot com @ 2023-06-23 18:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #1 from Lennox Ho <lennox.ho at intel dot com> ---
This issue does not appear to reproduce with integer types, or if the members
are summed in the "natural" order: a => b => c.
* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
From: pinskia at gcc dot gnu.org @ 2023-06-23 19:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.5
      Known to work|                            |10.1.0
   Last reconfirmed|                            |2023-06-23
     Ever confirmed|0                           |1
      Known to fail|                            |11.1.0
           Keywords|                            |needs-bisection, wrong-code
            Summary|Incorrect loop unrolling    |[11/12/13/14 Regression]
                   |for structs of floating     |double counting for sum of
                   |point types                 |structs of floating point
                   |                            |types
          Component|c++                         |tree-optimization
             Status|UNCONFIRMED                 |NEW
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed. At -O3 (or -O2 -ftree-vectorize) this started to fail in GCC 11.1.0.
* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
From: lennox.ho at intel dot com @ 2023-06-23 20:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #3 from Lennox Ho <lennox.ho at intel dot com> ---
Thanks. -fno-tree-vectorize appears to fix GCC 12.1 at -O2.
Curious, why is -ftree-vectorize enabled at -O2 with GCC 12.1?
The documents say it's only turned on by default with -O3
```
https://gcc.gnu.org/projects/tree-ssa/vectorization.html
Vectorization is enabled by the flag -ftree-vectorize and by default at -O3
```
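For experimenting with a single function rather than a whole translation unit, GCC's `optimize` function attribute accepts the same option without the `-f` prefix. The following is only a minimal sketch: the macro `NO_TREE_VECTORIZE` and the function `sum_foos_novect` are names made up here, and the function body is assumed to resemble the reduced loop from this report. GCC's manual describes this attribute as a debugging aid, so it is a way to confirm the diagnosis, not a fix.

```c
#include <assert.h>
#include <stddef.h>

struct FOO {
  double a;
  double b;
  double c;
};

/* Hypothetical helper macro: only GCC understands the optimize attribute,
   so it expands to nothing elsewhere.  */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_TREE_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_TREE_VECTORIZE
#endif

/* Sum the members in the "unnatural" order c, b, a, asking GCC not to
   vectorize this one function.  */
NO_TREE_VECTORIZE
double sum_foos_novect(const struct FOO *foos, size_t n)
{
  assert(foos != NULL);
  double sum = 0;
  for (size_t i = 0; i < n; ++i) {
    sum += foos[i].c;
    sum += foos[i].b;
    sum += foos[i].a;
  }
  return sum;
}
```

If the result of such a function changes when the attribute is removed, that points at the vectorizer in the same way the command-line flag does.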
* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
From: pinskia at gcc dot gnu.org @ 2023-06-23 20:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Lennox Ho from comment #3)
> Thanks. -fno-tree-vectorize appears to fix GCC 12.1 at -O2.
>
> Curious, why is -ftree-vectorize enabled at -O2 with GCC 12.1?
Yes see https://gcc.gnu.org/gcc-12/changes.html#general .
> The documents say it's only turned on by default with -O3
> ```
> https://gcc.gnu.org/projects/tree-ssa/vectorization.html
> Vectorization is enabled by the flag -ftree-vectorize and by default at -O3
> ```
That page is not the documentation for a release; it describes the vectorizer
project as it stood during its original development (back in the early 2000s).
The documentation for the GCC 12.1 release is located at:
https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc/Optimize-Options.html#index-ftree-vectorize
It specifically mentions that -ftree-loop-vectorize and -ftree-slp-vectorize are
enabled at -O2 and above (-ftree-vectorize is a meta-option for those two
options).
* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
From: mpolacek at gcc dot gnu.org @ 2023-06-23 22:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
Marek Polacek <mpolacek at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |
                 CC|                            |mpolacek at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
--- Comment #5 from Marek Polacek <mpolacek at gcc dot gnu.org> ---
With -O2 -ftree-vectorize, started with r11-4428:
commit 4a369d199bf2f34e037030b18d0da923e8a24997
Author: Richard Biener <rguenther@suse.de>
Date: Fri Oct 16 09:43:22 2020 +0200
SLP vectorize across PHI nodes
* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
From: rguenth at gcc dot gnu.org @ 2023-06-26 8:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
I will have a look.
* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
From: rguenth at gcc dot gnu.org @ 2023-06-26 9:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
C testcase:
struct FOO {
  double a;
  double b;
  double c;
};

double __attribute__((noipa))
sum_8_foos(const struct FOO* foos)
{
  double sum = 0;
  for (int i = 0; i < 8; ++i) {
    struct FOO foo = foos[i];
    sum += foo.c;
    sum += foo.b;
    sum += foo.a;
  }
  return sum;
}

int main()
{
  struct FOO foos[8];
  __builtin_memset (foos, 0, sizeof (foos));
  foos[0].b = 5;
  if (sum_8_foos (foos) != 5)
    __builtin_abort ();
  return 0;
}
* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
From: rguenth at gcc dot gnu.org @ 2023-06-26 9:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
OK, so we transform the in-order FOLD_LEFT_REDUCTION as
  # sum_22 = PHI <sum_15(5), 0.0(2)>
  # vectp_foos.7_25 = PHI <vectp_foos.7_23(5), foos_12(D)(2)>
  ...
  vect_foo_c_8.9_21 = MEM <vector(2) double> [(double *)vectp_foos.7_25];
  vectp_foos.7_20 = vectp_foos.7_25 + 16;
  vect_foo_c_8.10_7 = MEM <vector(2) double> [(double *)vectp_foos.7_20];
  vectp_foos.7_6 = vectp_foos.7_25 + 32;
  vect_foo_c_8.11_5 = MEM <vector(2) double> [(double *)vectp_foos.7_6];
  stmp_sum_13.12_4 = BIT_FIELD_REF <vect_foo_c_8.9_21, 64, 0>;
  stmp_sum_13.12_31 = sum_22 + stmp_sum_13.12_4;
  stmp_sum_13.12_32 = BIT_FIELD_REF <vect_foo_c_8.9_21, 64, 64>;
  stmp_sum_13.12_33 = stmp_sum_13.12_31 + stmp_sum_13.12_32;
  stmp_sum_13.12_34 = BIT_FIELD_REF <vect_foo_c_8.10_7, 64, 0>;
  stmp_sum_13.12_35 = stmp_sum_13.12_33 + stmp_sum_13.12_34;
  stmp_sum_13.12_36 = BIT_FIELD_REF <vect_foo_c_8.10_7, 64, 64>;
  stmp_sum_13.12_37 = stmp_sum_13.12_35 + stmp_sum_13.12_36;
  stmp_sum_13.12_38 = BIT_FIELD_REF <vect_foo_c_8.11_5, 64, 0>;
  stmp_sum_13.12_39 = stmp_sum_13.12_37 + stmp_sum_13.12_38;
  stmp_sum_13.12_40 = BIT_FIELD_REF <vect_foo_c_8.11_5, 64, 64>;
  foo$a_11 = _3->a;
  foo$b_9 = _3->b;
  foo$c_8 = _3->c;
  sum_13 = stmp_sum_13.12_39 + stmp_sum_13.12_40;
  sum_14 = foo$b_9 + sum_13;
  sum_15 = foo$a_11 + sum_14;
where you can see that the final step replaces one scalar stmt of the SLP
reduction group (the def of sum_13) but not the last one (sum_15). That keeps
the scalar reads of foo$b_9 and foo$a_11 live, so some elements are applied
multiple times.
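The effect of the dump above can be modelled in plain C. This is a minimal sketch, not GCC output: both function names are made up for illustration, and it assumes that the leftover scalar loads through _3 read the first struct of each vectorized pair, which the excerpt does not show.

```c
#include <assert.h>
#include <stddef.h>

struct FOO {
  double a;
  double b;
  double c;
};

/* Source semantics: add c, then b, then a, for each of the 8 structs.  */
double sum_8_foos_reference(const struct FOO *foos)
{
  assert(foos != NULL);
  double sum = 0;
  for (int i = 0; i < 8; ++i) {
    sum += foos[i].c;
    sum += foos[i].b;
    sum += foos[i].a;
  }
  return sum;
}

/* Scalar model of the dump: each iteration covers two structs.  The six
   vector lanes are folded in memory order (the result that replaces
   sum_13), and then the stale scalar statements sum_14 and sum_15 add
   b and a once more.  ASSUMPTION: _3 points at the first struct of the
   pair.  */
double sum_8_foos_miscompiled_model(const struct FOO *foos)
{
  assert(foos != NULL);
  double sum = 0;
  for (int i = 0; i < 8; i += 2) {
    /* Three 2-lane vector loads, folded left to right.  */
    sum += foos[i].a;
    sum += foos[i].b;
    sum += foos[i].c;
    sum += foos[i + 1].a;
    sum += foos[i + 1].b;
    sum += foos[i + 1].c;
    /* Leftover scalar statements: these elements are counted twice.  */
    sum += foos[i].b;
    sum += foos[i].a;
  }
  return sum;
}
```

With the testcase input (all zeros except foos[0].b = 5), the reference returns 5 while the model returns 10, which is the double counting named in the bug summary.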
* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
From: rguenth at gcc dot gnu.org @ 2023-06-26 10:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the transform phase is correct but the analysis phase fails to reject
this case because there's a permutation we elide even though that will not
preserve the fold-left reduction semantics. We analyze the SLP node to
t.c:12:23: note: Final SLP tree for instance 0x4c90840:
t.c:12:23: note: node 0x4d57380 (max_nunits=2, refcnt=3) vector(2) double
t.c:12:23: note: op template: sum_13 = foo$c_8 + sum_22;
t.c:12:23: note: stmt 0 sum_13 = foo$c_8 + sum_22;
t.c:12:23: note: stmt 1 sum_14 = foo$b_9 + sum_13;
t.c:12:23: note: stmt 2 sum_15 = foo$a_11 + sum_14;
t.c:12:23: note: children 0x4d57408 0x4d57490
t.c:12:23: note: node 0x4d57408 (max_nunits=2, refcnt=2) vector(2) double
t.c:12:23: note: op template: foo$c_8 = _3->c;
t.c:12:23: note: stmt 0 foo$c_8 = _3->c;
t.c:12:23: note: stmt 1 foo$b_9 = _3->b;
t.c:12:23: note: stmt 2 foo$a_11 = _3->a;
t.c:12:23: note: load permutation { 2 1 0 }
t.c:12:23: note: node 0x4d57490 (max_nunits=2, refcnt=2) vector(2) double
t.c:12:23: note: op template: sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note: stmt 0 sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note: stmt 1 sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note: stmt 2 sum_22 = PHI <sum_15(5), 0.0(2)>
t.c:12:23: note: children 0x4d57380 (nil)
but optimize_slp mangles things here.
We have
  /* We have to mark outgoing permutations facing non-reduction graph
     entries that are not represented as to be materialized.  */
  for (slp_instance instance : m_vinfo->slp_instances)
    if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_ctor)
      {
        unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
        m_partitions[m_vertices[node_i].partition].layout = 0;
      }
unfortunately this all happens before we determine the reduction is
in-order. The only thing we can do here is check
needs_fold_left_reduction_p directly.
I'm testing a patch.
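Why an elided permutation matters for an in-order reduction can be seen in a small sketch. fold_left_permuted is a hypothetical helper, not a GCC function: it sums lanes strictly left to right in the order given by a permutation, mirroring the load permutation { 2 1 0 } from the SLP dump. The input values are chosen only to make the rounding visible.

```c
#include <assert.h>
#include <stddef.h>

/* Fold-left (in-order) sum of lanes[perm[0]], lanes[perm[1]], ...
   No reassociation: each addition is rounded before the next one.  */
double fold_left_permuted(const double *lanes, const unsigned *perm, size_t n)
{
  assert(lanes != NULL && perm != NULL);
  double sum = 0.0;
  for (size_t i = 0; i < n; ++i)
    sum += lanes[perm[i]];
  return sum;
}
```

For lanes { 1.0, 1e300, -1e300 } in memory order a, b, c, the permutation { 2, 1, 0 } (source order c, b, a) yields 1.0, whereas the identity order { 0, 1, 2 } yields 0.0 because 1.0 is absorbed when it is added to 1e300. Dropping the permutation is only harmless when the reduction may be reassociated.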
* [Bug tree-optimization/110381] [11/12/13/14 Regression] double counting for sum of structs of floating point types
From: cvs-commit at gcc dot gnu.org @ 2023-06-26 12:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:53d6f57c1b20c6da52aefce737fb7d5263686ba3
commit r14-2095-g53d6f57c1b20c6da52aefce737fb7d5263686ba3
Author: Richard Biener <rguenther@suse.de>
Date: Mon Jun 26 12:51:37 2023 +0200
tree-optimization/110381 - preserve SLP permutation with in-order
reductions
The following fixes a bug that manifests itself during fold-left
reduction transform in picking not the last scalar def to replace
and thus double-counting some elements. But the underlying issue
is that we merge a load permutation into the in-order reduction
which is of course wrong.
Now, reduction analysis has not yet been performed when optimizing
permutations so we have to resort to checking that ourselves.
PR tree-optimization/110381
* tree-vect-slp.cc
(vect_optimize_slp_pass::start_choosing_layouts):
Materialize permutes before fold-left reductions.
* gcc.dg/vect/pr110381.c: New testcase.
* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
From: rguenth at gcc dot gnu.org @ 2023-06-26 12:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11/12/13/14 Regression]    |[11/12/13 Regression]
                   |double counting for sum of  |double counting for sum of
                   |structs of floating point   |structs of floating point
                   |types                       |types
      Known to work|                            |14.0
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed on trunk so far.
* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
From: clyon at gcc dot gnu.org @ 2023-06-29 16:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
Christophe Lyon <clyon at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clyon at gcc dot gnu.org
--- Comment #12 from Christophe Lyon <clyon at gcc dot gnu.org> ---
The new testcase (gcc.dg/vect/pr110381.c) fails:
FAIL: gcc.dg/vect/pr110381.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr110381.c execution test
on arm-linux-gnueabihf configured with --with-float=hard
--with-fpu=neon-fp-armv8 --with-mode=thumb --with-arch=armv8-a
* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
From: cvs-commit at gcc dot gnu.org @ 2023-06-30 6:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:c0439218eb79aa0293291aed92453a59db8c6e85
commit r14-2207-gc0439218eb79aa0293291aed92453a59db8c6e85
Author: Richard Biener <rguenther@suse.de>
Date: Fri Jun 30 08:34:24 2023 +0200
tree-optimization/110381 - fix testcase
This adds a missing check_vect () to the execute testcase.
PR tree-optimization/110381
* gcc.dg/vect/pr110381.c: Add check_vect ().
* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
From: rguenth at gcc dot gnu.org @ 2023-06-30 6:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Christophe Lyon from comment #12)
> The new testcase (gcc.dg/vect/pr110381.c) fails:
> FAIL: gcc.dg/vect/pr110381.c -flto -ffat-lto-objects execution test
> FAIL: gcc.dg/vect/pr110381.c execution test
>
> on arm-linux-gnueabihf configured with --with-float=hard
> --with-fpu=neon-fp-armv8 --with-mode=thumb --with-arch=armv8-a
Can you check if it works now? I've added a missing check_vect () call in
case the harness passes in command-line options that your HW doesn't
support. Otherwise I'd appreciate command-line options to reproduce.
I cannot get anything to vectorize with a cc1 cross using
> ./cc1 -quiet t.c -O2 -ftree-vectorize -fno-vect-cost-model -fopt-info-vec -I include -march=armv8-a -mthumb -mfpu=neon-fp-armv8 -mfloat-abi=hard -mhard-float
but I have a cross configured with --with-float=hard --with-cpu=cortex-a9
--with-fpu=neon-fp16
I hope the FPU is compliant enough to evaluate __DBL_MAX__ + -__DBL_MAX__ + 5.0
to 5.
* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
From: clyon at gcc dot gnu.org @ 2023-06-30 14:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #15 from Christophe Lyon <clyon at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #14)
> (In reply to Christophe Lyon from comment #12)
> > The new testcase (gcc.dg/vect/pr110381.c) fails:
> > FAIL: gcc.dg/vect/pr110381.c -flto -ffat-lto-objects execution test
> > FAIL: gcc.dg/vect/pr110381.c execution test
> >
> > on arm-linux-gnueabihf configured with --with-float=hard
> > --with-fpu=neon-fp-armv8 --with-mode=thumb --with-arch=armv8-a
>
> Can you check if it works now? I've added a missing check_vect () call in
> case the harness passes in command-line options that your HW doesn't
> support. Otherwise I'd appreciate command-line options to reproduce.
It still fails (check_vect() passes on my config, so there's no change).
Here is what sum_8_foos looks like:
sum_8_foos:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vmov.i64 d0, #0 @ float
        add r3, r0, #192
.L10:
        vldr.64 d16, [r0, #8]
        adds r0, r0, #24
        vldr.64 d18, [r0, #-24]
        vldr.64 d17, [r0, #-8]
        cmp r3, r0
        vadd.f64 d16, d16, d18
        vadd.f64 d16, d16, d17
        vadd.f64 d0, d0, d16
        bne .L10
        bx lr
so we load:
d16=5
d17=-__DBL_MAX__
d18=__DBL_MAX__
the first addition makes d16=__DBL_MAX__
and the second one makes d16=0
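The arithmetic above can be replayed in portable C. This is a sketch that mirrors the .L10 loop statement by statement; sum_as_arm_loop and add3_left_to_right are names invented for the illustration, and the flat array of doubles stands in for the array of structs (offsets 0, 8 and 16 within each 24-byte element).

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Model of the .L10 loop: per element, add the doubles at offsets 8 and 0
   first, then the one at offset 16, and only then accumulate into d0.  */
double sum_as_arm_loop(const double *p, size_t n_structs)
{
  assert(p != NULL);
  double d0 = 0.0;
  for (size_t i = 0; i < n_structs; ++i, p += 3) {
    double d16 = p[1];   /* vldr.64 d16, [r0, #8]                      */
    double d18 = p[0];   /* vldr.64 d18, [r0, #-24] (after r0 += 24)   */
    double d17 = p[2];   /* vldr.64 d17, [r0, #-8]                     */
    d16 = d16 + d18;     /* vadd.f64 d16, d16, d18                     */
    d16 = d16 + d17;     /* vadd.f64 d16, d16, d17                     */
    d0 = d0 + d16;       /* vadd.f64 d0, d0, d16                       */
  }
  return d0;
}

/* Strict left-to-right evaluation of x + y + z.  */
double add3_left_to_right(double x, double y, double z)
{
  return (x + y) + z;
}
```

With a first element of { DBL_MAX, 5.0, -DBL_MAX } and zeros elsewhere, sum_as_arm_loop returns 0 because 5.0 is absorbed by DBL_MAX, while add3_left_to_right(DBL_MAX, -DBL_MAX, 5.0) returns 5, the value the test expects. Both results assume the file is compiled without -ffast-math or similar options.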
> I cannot get anything to vectorize with a cc1 cross using
>
> > ./cc1 -quiet t.c -O2 -ftree-vectorize -fno-vect-cost-model -fopt-info-vec -I include tri
>
> but I have a cross configured with --with-float=hard --with-cpu=cortex-a9
> --with-fpu=neon-fp16
Not sure what happens. I tried my native compiler with the above flags, I get
the same code.
I tried to build my native compiler with the same flags, same code too.
>
> I hope the FPU is compliant enough to compute __DBL_MAX__ + -__DBL_MAX__ +
> 5. to 5.
* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
From: pinskia at gcc dot gnu.org @ 2023-06-30 20:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #16 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect this patch will fix the arm failure:
```
diff --git a/gcc/testsuite/gcc.dg/vect/pr110381.c b/gcc/testsuite/gcc.dg/vect/pr110381.c
index dc8c6a8f683..ee78666d2e8 100644
--- a/gcc/testsuite/gcc.dg/vect/pr110381.c
+++ b/gcc/testsuite/gcc.dg/vect/pr110381.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target vect_float_strict } */
 #include "tree-vect.h"
```
Arm enables -funsafe-math-optimizations for the vector testsuite (it is
required because of denormals, IIRC), but this testcase requires strict
ordering, so `-funsafe-math-optimizations` must not be in effect, which is
what vect_float_strict ensures.
* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
From: clyon at gcc dot gnu.org @ 2023-07-03 7:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #17 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Thanks Andrew, I wasn't aware of vect_float_strict.
I confirm it makes the test UNSUPPORTED.
Can you commit the fix or do you want me to do it on your behalf?
* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
From: pinskia at gcc dot gnu.org @ 2023-07-03 7:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #18 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Christophe Lyon from comment #17)
> Thanks Andrew, I wasn't aware of vect_float_strict.
> I confirm it makes the test UNSUPPORTED.
>
> Can you commit the fix or do you want me to do it on your behalf?
Can you do it? I have other things I am working on right now.
* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
From: cvs-commit at gcc dot gnu.org @ 2023-07-03 8:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #19 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Christophe Lyon <clyon@gcc.gnu.org>:
https://gcc.gnu.org/g:8cb087d869be698a86b082a7248d03e468ef1eb1
commit r14-2254-g8cb087d869be698a86b082a7248d03e468ef1eb1
Author: Christophe Lyon <christophe.lyon@linaro.org>
Date: Mon Jul 3 08:00:00 2023 +0000
testsuite: Add vect_float_strict to testcase [PR 110381]
As discussed in the PR, the testcase needs
/* { dg-require-effective-target vect_float_strict } */
2023-02-03 Andrew Pinski <apinski@marvell.com>
PR tree-optimization/110381
gcc/testsuite/
* gcc.dg/vect/pr110381.c: Add vect_float_strict.
* [Bug tree-optimization/110381] [11/12/13 Regression] double counting for sum of structs of floating point types
From: clyon at gcc dot gnu.org @ 2023-07-03 8:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #20 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Sorry for the typo in the date in the commit message :-(
* [Bug tree-optimization/110381] [11/12 Regression] double counting for sum of structs of floating point types
From: cvs-commit at gcc dot gnu.org @ 2023-07-07 12:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #21 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-13 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:32c7f05f8bc6d45dee374fe22be3f0e19836278a
commit r13-7543-g32c7f05f8bc6d45dee374fe22be3f0e19836278a
Author: Richard Biener <rguenther@suse.de>
Date: Mon Jun 26 12:51:37 2023 +0200
tree-optimization/110381 - preserve SLP permutation with in-order
reductions
The following fixes a bug that manifests itself during fold-left
reduction transform in picking not the last scalar def to replace
and thus double-counting some elements. But the underlying issue
is that we merge a load permutation into the in-order reduction
which is of course wrong.
Now, reduction analysis has not yet been performed when optimizing
permutations so we have to resort to checking that ourselves.
PR tree-optimization/110381
* tree-vect-slp.cc
(vect_optimize_slp_pass::start_choosing_layouts):
Materialize permutes before fold-left reductions.
* gcc.dg/vect/pr110381.c: New testcase.
(cherry picked from commit 53d6f57c1b20c6da52aefce737fb7d5263686ba3)
* [Bug tree-optimization/110381] [11/12 Regression] double counting for sum of structs of floating point types
From: cvs-commit at gcc dot gnu.org @ 2024-06-04 8:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
--- Comment #22 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-12 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:8f6d889a8e609710ecfd555778fbff602b2c7d74
commit r12-10491-g8f6d889a8e609710ecfd555778fbff602b2c7d74
Author: Richard Biener <rguenther@suse.de>
Date: Mon Jun 26 12:51:37 2023 +0200
tree-optimization/110381 - preserve SLP permutation with in-order
reductions
The following fixes a bug that manifests itself during fold-left
reduction transform in picking not the last scalar def to replace
and thus double-counting some elements. But the underlying issue
is that we merge a load permutation into the in-order reduction
which is of course wrong.
Now, reduction analysis has not yet been performed when optimizing
permutations so we have to resort to checking that ourselves.
PR tree-optimization/110381
* tree-vect-slp.cc
(vect_optimize_slp_pass::start_choosing_layouts):
Materialize permutes before fold-left reductions.
* gcc.dg/vect/pr110381.c: New testcase.
(cherry picked from commit 53d6f57c1b20c6da52aefce737fb7d5263686ba3)
* [Bug tree-optimization/110381] [11 Regression] double counting for sum of structs of floating point types
From: rguenth at gcc dot gnu.org @ 2024-06-04 8:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11/12 Regression] double   |[11 Regression] double
                   |counting for sum of structs |counting for sum of structs
                   |of floating point types     |of floating point types
           Priority|P3                          |P2
      Known to fail|                            |12.3.0
      Known to work|                            |12.3.1