public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/107247] New: SLP reduction results fail to reduce to a single accumulator
@ 2022-10-13 12:37 rguenth at gcc dot gnu.org

From: rguenth at gcc dot gnu.org @ 2022-10-13 12:37 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107247

            Bug ID: 107247
           Summary: SLP reduction results fail to reduce to a single
                    accumulator
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

#include <math.h>

float fl[128];
int x[128];
float foo (int n1)
{
  float sum0, sum1, sum2, sum3;
  sum0 = sum1 = sum2 = sum3 = 0.0f;
  int n = (n1 / 4) * 4;
  for (int i = 0; i < n; i += 4)
    {
      sum0 += fabs (fl[i]);
      sum1 += fabs (fl[i + 1]);
      sum2 += fabs (fl[i + 2]);
      sum3 += fabs (fl[i + 3]);
      x[i] = 1;
    }
  return sum0 + sum1 + sum2 + sum3;
}

shows how we fail to reduce the SLP reduction accumulators to a single one
before extracting the elements:

  <bb 3> [local count: 567644343]:
  # sum0_37 = PHI <sum0_29(7), 0.0(9)>
  # sum1_39 = PHI <sum1_30(7), 0.0(9)>
  # sum2_41 = PHI <sum2_31(7), 0.0(9)>
  # sum3_43 = PHI <sum3_32(7), 0.0(9)>
  # i_45 = PHI <i_34(7), 0(9)>
  # vectp_fl.8_89 = PHI <vectp_fl.8_90(7), &fl(9)>
  # vect_sum3_43.15_102 = PHI <vect_sum3_32.16_106(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
  # vect_sum3_43.15_103 = PHI <vect_sum3_32.16_107(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
  # vect_sum3_43.15_104 = PHI <vect_sum3_32.16_108(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
  # vect_sum3_43.15_105 = PHI <vect_sum3_32.16_109(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
  ...
  vect__12.14_98 = ABS_EXPR <vect__11.10_91>;
  vect__12.14_99 = ABS_EXPR <vect__11.11_93>;
  vect__12.14_100 = ABS_EXPR <vect__11.12_95>;
  vect__12.14_101 = ABS_EXPR <vect__11.13_97>;
  vect_sum3_32.16_106 = vect__12.14_98 + vect_sum3_43.15_102;
  vect_sum3_32.16_107 = vect__12.14_99 + vect_sum3_43.15_103;
  vect_sum3_32.16_108 = vect__12.14_100 + vect_sum3_43.15_104;
  vect_sum3_32.16_109 = vect__12.14_101 + vect_sum3_43.15_105;
  ...

  <bb 11> [local count: 94607391]:
  # sum0_48 = PHI <sum0_29(3)>
  # sum1_36 = PHI <sum1_30(3)>
  # sum2_35 = PHI <sum2_31(3)>
  # sum3_24 = PHI <sum3_32(3)>
  # vect_sum3_32.16_110 = PHI <vect_sum3_32.16_106(3)>
  # vect_sum3_32.16_111 = PHI <vect_sum3_32.16_107(3)>
  # vect_sum3_32.16_112 = PHI <vect_sum3_32.16_108(3)>
  # vect_sum3_32.16_113 = PHI <vect_sum3_32.16_109(3)>
  _114 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 0>;
  _115 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 32>;
  _116 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 64>;
  _117 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 96>;
  _118 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 0>;
  _119 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 32>;
  _120 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 64>;
  _121 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 96>;
  _122 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 0>;
  _123 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 32>;
  _124 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 64>;
  _125 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 96>;
  _126 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 0>;
  _127 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 32>;
  _128 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 64>;
  _129 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 96>;
  _130 = _114 + _118;
  _131 = _115 + _119;
  _132 = _116 + _120;
  _133 = _117 + _121;
  _134 = _130 + _122;
  _135 = _131 + _123;
  _136 = _132 + _124;
  _137 = _133 + _125;
  _138 = _134 + _126;
  _139 = _135 + _127;
  _140 = _136 + _128;
  _141 = _137 + _129;
  ...

instead of doing vector adds and a single series of extracts.
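For readers less used to GIMPLE, the two epilogue shapes can be sketched in C. This is a hedged illustration, not GCC code: it assumes the GCC/Clang vector extension (the v4sf typedef) and uses hypothetical helper names epilogue_scalar and epilogue_vector. Lane j of each of the four accumulator vectors holds a partial sum for sumj. The first helper mirrors the dump (16 BIT_FIELD_REF extracts, then 12 scalar adds); the second is the requested shape (three vector adds, then four extracts).

```c
/* Four-lane float vector via the GCC/Clang vector extension
   (an assumed toolchain feature, e.g. V4SF with SSE).  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Shape currently emitted: extract every lane of every accumulator
   (the _114.._129 BIT_FIELD_REFs) and combine the lanes with scalar
   adds (_130.._141).  out[j] receives the final value for sumj.  */
static void
epilogue_scalar (const v4sf acc[4], float out[4])
{
  for (int lane = 0; lane < 4; lane++)
    {
      float s = acc[0][lane];
      for (int k = 1; k < 4; k++)
        s += acc[k][lane];
      out[lane] = s;
    }
}

/* Requested shape: fold the four accumulator vectors with vector adds,
   then perform a single series of four extracts.  */
static void
epilogue_vector (const v4sf acc[4], float out[4])
{
  v4sf v = acc[0];
  for (int k = 1; k < 4; k++)
    v += acc[k];
  for (int lane = 0; lane < 4; lane++)
    out[lane] = v[lane];
}
```

Both helpers add the accumulators in the same order within each lane ((v0 + v1) + v2) + v3, so they should agree bit for bit; the difference is the instruction count, not the result.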
* [Bug tree-optimization/107247] SLP reduction results fail to reduce to a single accumulator
@ 2022-10-13 12:59 rguenth at gcc dot gnu.org

From: rguenth at gcc dot gnu.org @ 2022-10-13 12:59 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107247

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
     Ever confirmed|0                             |1
             Status|UNCONFIRMED                   |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
   Last reconfirmed|                              |2022-10-13

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I have a patch.
* [Bug tree-optimization/107247] SLP reduction results fail to reduce to a single accumulator
@ 2022-10-13 14:28 cvs-commit at gcc dot gnu.org

From: cvs-commit at gcc dot gnu.org @ 2022-10-13 14:28 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107247

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:e5139d18dfb8130876ea59178e8471fb1b34bb80

commit r13-3276-ge5139d18dfb8130876ea59178e8471fb1b34bb80
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Oct 13 14:56:01 2022 +0200

    tree-optimization/107247 - reduce SLP reduction accumulator

    The following makes sure to reduce a multi-vector SLP reduction
    accumulator to a single vector using vector operations if easily
    possible (if the number of lanes in the vector type is a multiple
    of the number of scalar accumulators).

            PR tree-optimization/107247
            * tree-vect-loop.cc (vect_create_epilog_for_reduction):
            Reduce multi vector SLP reduction accumulators.  Check
            the adjusted number of accumulator vectors against
            one for the re-use in the epilogue.
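The condition in the commit message (lane count a multiple of the number of scalar accumulators) is what makes plain vector adds legal: every accumulator vector then starts at an element index divisible by the group size, so lane l of every vector always feeds scalar accumulator l % group_size. A plain-C model of the resulting two-step epilogue, as a sketch only; the function name, the fixed LANES of 4 and the array representation are assumptions for illustration and not the code in tree-vect-loop.cc:

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4 /* assumed vector width, e.g. four floats per vector */

/* Hypothetical model: ACC holds NVEC accumulator vectors; lane L of
   every vector feeds scalar accumulator L % GROUP_SIZE, which only
   holds when LANES % GROUP_SIZE == 0.  */
static void
reduce_slp_accumulators (float acc[][LANES], size_t nvec,
                         size_t group_size, float *out)
{
  assert (nvec > 0 && group_size > 0 && LANES % group_size == 0);

  /* Step 1: lane-wise (vector) adds fold all NVEC vectors into one.  */
  float v[LANES];
  for (size_t l = 0; l < LANES; l++)
    v[l] = acc[0][l];
  for (size_t k = 1; k < nvec; k++)
    for (size_t l = 0; l < LANES; l++)
      v[l] += acc[k][l];

  /* Step 2: one series of extracts; each scalar accumulator sums the
     LANES / GROUP_SIZE lanes that belong to it.  */
  for (size_t g = 0; g < group_size; g++)
    {
      out[g] = v[g];
      for (size_t l = g + group_size; l < LANES; l += group_size)
        out[g] += v[l];
    }
}
```

For example, with two scalar accumulators and the vectors {1, 2, 3, 4} and {5, 6, 7, 8}, the vector add gives {6, 8, 10, 12} and the extracts give 6 + 10 = 16 and 8 + 12 = 20. With four accumulators, as in the testcase, step 2 degenerates to one extract per lane.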
* [Bug tree-optimization/107247] SLP reduction results fail to reduce to a single accumulator
@ 2022-10-13 14:30 rguenth at gcc dot gnu.org

From: rguenth at gcc dot gnu.org @ 2022-10-13 14:30 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107247

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
         Resolution|---                           |FIXED
             Status|ASSIGNED                      |RESOLVED

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is now fixed for the cases not requiring permutations.

One could think of a three-lane SLP reduction with three four-component
vectors being reduced by first "expanding" them to four three-component
vectors, summing those and then extracting from the lower three lanes.
Likewise for a six-lane SLP reduction, which would need a more complex
extraction of two-vector, six-lane pairs. Unless a compelling case comes
along, I don't consider these important.
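The three-lane case from comment #3 can be modeled the same way. With three accumulators and four-lane vectors, the three accumulator vectors hold the interleaved sequence s0 s1 s2 s0 | s1 s2 s0 s1 | s2 s0 s1 s2, so a lane no longer maps to a fixed accumulator. Regrouping the twelve elements into four three-component chunks, each ordered s0 s1 s2, restores a fixed mapping; in vector code that regrouping is where the permutations come in. This is a scalar sketch of the data movement only; the macro and function names are hypothetical and nothing here reflects code GCC emits.

```c
#define VEC_LANES 4 /* lanes per vector (assumed, e.g. V4SF) */
#define SLP_GROUP 3 /* three scalar accumulators: not a divisor of VEC_LANES */

/* Element E of the three concatenated accumulator vectors.  In vector
   code, gathering these would be a cross-register lane permutation.  */
static float
element_at (float acc[SLP_GROUP][VEC_LANES], int e)
{
  return acc[e / VEC_LANES][e % VEC_LANES];
}

/* Hypothetical model: treat the 12 elements as VEC_LANES chunks of
   SLP_GROUP elements (each ordered s0 s1 s2), add the chunks lane-wise,
   then read the result from the lower three lanes.  */
static void
reduce_three_lane (float acc[SLP_GROUP][VEC_LANES], float out[SLP_GROUP])
{
  float v[SLP_GROUP];
  for (int g = 0; g < SLP_GROUP; g++)
    v[g] = element_at (acc, g);
  for (int chunk = 1; chunk < VEC_LANES; chunk++)
    for (int g = 0; g < SLP_GROUP; g++)
      v[g] += element_at (acc, chunk * SLP_GROUP + g);
  for (int g = 0; g < SLP_GROUP; g++)
    out[g] = v[g];
}
```

Numbering the elements 0 to 11 in memory order, the chunks are {0, 1, 2}, {3, 4, 5}, {6, 7, 8} and {9, 10, 11}, so the three accumulators come out as 18, 22 and 26.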
* [Bug tree-optimization/107247] SLP reduction results fail to reduce to a single accumulator
@ 2023-12-12 3:17 pinskia at gcc dot gnu.org

From: pinskia at gcc dot gnu.org @ 2023-12-12 3:17 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107247

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
   Target Milestone|---                           |13.0