[Bug tree-optimization/107247] New: SLP reduction results fail to reduce to a single accumulator

From: rguenth at gcc dot gnu.org @ 2022-10-13 12:37 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107247

            Bug ID: 107247
           Summary: SLP reduction results fail to reduce to a single
                    accumulator
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

#include <math.h>

float fl[128];
int x[128];
float
foo (int n1)
{
  float sum0, sum1, sum2, sum3;
  sum0 = sum1 = sum2 = sum3 = 0.0f;

  int n = (n1 / 4) * 4;
  for (int i = 0; i < n; i += 4)
    {
      sum0 += fabs (fl[i]);
      sum1 += fabs (fl[i + 1]);
      sum2 += fabs (fl[i + 2]);
      sum3 += fabs (fl[i + 3]);
      x[i] = 1;
    }

  return sum0 + sum1 + sum2 + sum3;
}

The testcase above shows that we fail to reduce the SLP reduction accumulators
to a single vector before extracting the elements:

  <bb 3> [local count: 567644343]:
  # sum0_37 = PHI <sum0_29(7), 0.0(9)>
  # sum1_39 = PHI <sum1_30(7), 0.0(9)>
  # sum2_41 = PHI <sum2_31(7), 0.0(9)>
  # sum3_43 = PHI <sum3_32(7), 0.0(9)>
  # i_45 = PHI <i_34(7), 0(9)>
  # vectp_fl.8_89 = PHI <vectp_fl.8_90(7), &fl(9)>
  # vect_sum3_43.15_102 = PHI <vect_sum3_32.16_106(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
  # vect_sum3_43.15_103 = PHI <vect_sum3_32.16_107(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
  # vect_sum3_43.15_104 = PHI <vect_sum3_32.16_108(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
  # vect_sum3_43.15_105 = PHI <vect_sum3_32.16_109(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
...
  vect__12.14_98 = ABS_EXPR <vect__11.10_91>;
  vect__12.14_99 = ABS_EXPR <vect__11.11_93>;
  vect__12.14_100 = ABS_EXPR <vect__11.12_95>;
  vect__12.14_101 = ABS_EXPR <vect__11.13_97>;
  vect_sum3_32.16_106 = vect__12.14_98 + vect_sum3_43.15_102;
  vect_sum3_32.16_107 = vect__12.14_99 + vect_sum3_43.15_103;
  vect_sum3_32.16_108 = vect__12.14_100 + vect_sum3_43.15_104;
  vect_sum3_32.16_109 = vect__12.14_101 + vect_sum3_43.15_105;
...

  <bb 11> [local count: 94607391]:
  # sum0_48 = PHI <sum0_29(3)>
  # sum1_36 = PHI <sum1_30(3)>
  # sum2_35 = PHI <sum2_31(3)>
  # sum3_24 = PHI <sum3_32(3)>
  # vect_sum3_32.16_110 = PHI <vect_sum3_32.16_106(3)>
  # vect_sum3_32.16_111 = PHI <vect_sum3_32.16_107(3)>
  # vect_sum3_32.16_112 = PHI <vect_sum3_32.16_108(3)>
  # vect_sum3_32.16_113 = PHI <vect_sum3_32.16_109(3)>
  _114 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 0>;
  _115 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 32>;
  _116 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 64>;
  _117 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 96>;
  _118 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 0>;
  _119 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 32>;
  _120 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 64>;
  _121 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 96>;
  _122 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 0>;
  _123 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 32>;
  _124 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 64>;
  _125 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 96>;
  _126 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 0>;
  _127 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 32>;
  _128 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 64>;
  _129 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 96>;
  _130 = _114 + _118;
  _131 = _115 + _119;
  _132 = _116 + _120;
  _133 = _117 + _121;
  _134 = _130 + _122;
  _135 = _131 + _123;
  _136 = _132 + _124;
  _137 = _133 + _125;
  _138 = _134 + _126;
  _139 = _135 + _127;
  _140 = _136 + _128;
  _141 = _137 + _129;
...

The epilogue extracts all 16 lanes and performs 12 scalar adds, instead of
doing three vector adds followed by a single series of four extracts.
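
To make the difference concrete, here is a minimal C sketch of the two epilogue
shapes.  It is only a model: v4f and v4f_add are hypothetical stand-ins for a
four-lane float vector and a vector add, not anything from the vectorizer.
Both variants add each lane in the same order, so they should produce
identical results.

```c
/* Hypothetical stand-in for a four-lane float vector register.  */
typedef struct { float lane[4]; } v4f;

/* Lane-wise add, modelling a single vector add instruction.  */
static v4f
v4f_add (v4f a, v4f b)
{
  v4f r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = a.lane[i] + b.lane[i];
  return r;
}

/* Shape of the epilogue in the dump: every lane of every accumulator
   vector is extracted (16 extracts), then combined with 12 scalar adds.  */
static void
epilogue_scalar (const v4f acc[4], float sum[4])
{
  for (int lane = 0; lane < 4; lane++)
    {
      float s = acc[0].lane[lane] + acc[1].lane[lane];
      s = s + acc[2].lane[lane];
      sum[lane] = s + acc[3].lane[lane];
    }
}

/* Desired shape: three vector adds reduce the four accumulators to one
   vector, followed by a single series of four extracts.  */
static void
epilogue_vector (const v4f acc[4], float sum[4])
{
  v4f v = v4f_add (acc[0], acc[1]);
  v = v4f_add (v, acc[2]);
  v = v4f_add (v, acc[3]);
  for (int lane = 0; lane < 4; lane++)
    sum[lane] = v.lane[lane];
}
```

The final scalar sum0 + sum1 + sum2 + sum3 is the same in both cases; only the
amount of scalar work in the epilogue differs.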

* [Bug tree-optimization/107247] SLP reduction results fail to reduce to a single accumulator
From: rguenth at gcc dot gnu.org @ 2022-10-13 12:59 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107247

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2022-10-13

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I have a patch.

* [Bug tree-optimization/107247] SLP reduction results fail to reduce to a single accumulator
From: cvs-commit at gcc dot gnu.org @ 2022-10-13 14:28 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107247

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:e5139d18dfb8130876ea59178e8471fb1b34bb80

commit r13-3276-ge5139d18dfb8130876ea59178e8471fb1b34bb80
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Oct 13 14:56:01 2022 +0200

    tree-optimization/107247 - reduce SLP reduction accumulator

    The following makes sure to reduce a multi-vector SLP reduction
    accumulator to a single vector using vector operations if
    easily possible (if the number of lanes in the vector type is
    a multiple of the number of scalar accumulators).

            PR tree-optimization/107247
            * tree-vect-loop.cc (vect_create_epilog_for_reduction):
            Reduce multi vector SLP reduction accumulators.  Check
            the adjusted number of accumulator vectors against
            one for the re-use in the epilogue.
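
    A rough way to see why the "multiple of" condition matters, as a
    sketch only: assume that element K of the concatenated accumulator
    vectors belongs to scalar accumulator K % group_size.  That layout
    and the helper names below are assumptions made for illustration;
    this is not the code in tree-vect-loop.cc.

```c
#include <stdbool.h>

/* Assumed layout: element (VEC * NUNITS + LANE) of the concatenated
   accumulator vectors holds scalar accumulator number
   (VEC * NUNITS + LANE) % GROUP_SIZE.  */
static unsigned
lane_owner (unsigned vec, unsigned lane, unsigned nunits,
	    unsigned group_size)
{
  return (vec * nunits + lane) % group_size;
}

/* Return true if every accumulator vector maps its lanes to scalar
   accumulators the same way as vector 0 does.  Only then can the
   vectors be combined with plain lane-wise adds, without permutations.  */
static bool
same_layout_in_all_vectors (unsigned nvecs, unsigned nunits,
			    unsigned group_size)
{
  for (unsigned vec = 1; vec < nvecs; vec++)
    for (unsigned lane = 0; lane < nunits; lane++)
      if (lane_owner (vec, lane, nunits, group_size)
	  != lane_owner (0, lane, nunits, group_size))
	return false;
  return true;
}
```

    Under this model, four accumulators in four-lane vectors line up in
    every vector, while three or six accumulators in four-lane vectors
    do not.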

* [Bug tree-optimization/107247] SLP reduction results fail to reduce to a single accumulator
From: rguenth at gcc dot gnu.org @ 2022-10-13 14:30 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107247

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
This is now fixed for the cases that do not require permutations.  One could
think of a three-lane SLP reduction with three four-component vectors being
reduced by first "expanding" to four three-component vectors, summing them,
and then extracting from the lower three lanes.  Likewise for a six-lane SLP
reduction, which would need a more complex extraction of two-vector, six-lane
pairs.

Unless a compelling case comes along I don't consider these important.
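
For illustration only, the three-lane idea could look roughly like the scalar
C model below.  The function name and the assumed layout (element K of the
twelve concatenated elements belongs to scalar accumulator K % 3) are
hypothetical, and nothing like this is implemented by the patch.

```c
/* Hypothetical sketch: reduce a three-lane SLP reduction held in three
   four-lane accumulator vectors.  */
static void
reduce_three_lane (float acc[3][4], float sum[3])
{
  /* View the three four-lane vectors as 12 consecutive elements.  */
  float flat[12];
  for (int vec = 0; vec < 3; vec++)
    for (int lane = 0; lane < 4; lane++)
      flat[vec * 4 + lane] = acc[vec][lane];

  /* "Expand" into four three-component chunks and sum them lane-wise
     in one four-lane vector; the top lane stays unused.  */
  float v[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
  for (int chunk = 0; chunk < 4; chunk++)
    for (int lane = 0; lane < 3; lane++)
      v[lane] += flat[chunk * 3 + lane];

  /* Extract the results from the lower three lanes.  */
  for (int lane = 0; lane < 3; lane++)
    sum[lane] = v[lane];
}
```

In a real vectorizer the "expanding" step would be a set of permutations,
which is exactly the part the fix avoids.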

* [Bug tree-optimization/107247] SLP reduction results fail to reduce to a single accumulator
From: pinskia at gcc dot gnu.org @ 2023-12-12  3:17 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107247

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.0
