From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 8F41D3858C56; Thu, 13 Oct 2022 12:38:07 +0000 (GMT)
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107247] New: SLP reduction results fail to
 reduce to a single accumulator
Date: Thu, 13 Oct 2022 12:37:56 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
Message-ID: <bug-107247-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107247

            Bug ID: 107247
           Summary: SLP reduction results fail to reduce to a single
                    accumulator
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

#include <math.h>

float fl[128];
int x[128];
float
foo (int n1)
{
  float sum0, sum1, sum2, sum3;
  sum0 = sum1 = sum2 = sum3 = 0.0f;

  int n = (n1 / 4) * 4;
  for (int i = 0; i < n; i += 4)
    {
      sum0 += fabs (fl[i]);
      sum1 += fabs (fl[i + 1]);
      sum2 += fabs (fl[i + 2]);
      sum3 += fabs (fl[i + 3]);
      x[i] = 1;
    }

  return sum0 + sum1 + sum2 + sum3;
}

shows how we fail to reduce the SLP reduction accumulators to a single one
before extracting the elements:

  <bb 3> [local count: 567644343]:
  # sum0_37 = PHI <sum0_29(7), 0.0(9)>
  # sum1_39 = PHI <sum1_30(7), 0.0(9)>
  # sum2_41 = PHI <sum2_31(7), 0.0(9)>
  # sum3_43 = PHI <sum3_32(7), 0.0(9)>
  # i_45 = PHI <i_34(7), 0(9)>
  # vectp_fl.8_89 = PHI <vectp_fl.8_90(7), &fl(9)>
  # vect_sum3_43.15_102 = PHI <vect_sum3_32.16_106(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
  # vect_sum3_43.15_103 = PHI <vect_sum3_32.16_107(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
  # vect_sum3_43.15_104 = PHI <vect_sum3_32.16_108(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
  # vect_sum3_43.15_105 = PHI <vect_sum3_32.16_109(7), { 0.0, 0.0, 0.0, 0.0 }(9)>
...
  vect__12.14_98 = ABS_EXPR <vect__11.10_91>;
  vect__12.14_99 = ABS_EXPR <vect__11.11_93>;
  vect__12.14_100 = ABS_EXPR <vect__11.12_95>;
  vect__12.14_101 = ABS_EXPR <vect__11.13_97>;
  vect_sum3_32.16_106 = vect__12.14_98 + vect_sum3_43.15_102;
  vect_sum3_32.16_107 = vect__12.14_99 + vect_sum3_43.15_103;
  vect_sum3_32.16_108 = vect__12.14_100 + vect_sum3_43.15_104;
  vect_sum3_32.16_109 = vect__12.14_101 + vect_sum3_43.15_105;
...

  <bb 11> [local count: 94607391]:
  # sum0_48 = PHI <sum0_29(3)>
  # sum1_36 = PHI <sum1_30(3)>
  # sum2_35 = PHI <sum2_31(3)>
  # sum3_24 = PHI <sum3_32(3)>
  # vect_sum3_32.16_110 = PHI <vect_sum3_32.16_106(3)>
  # vect_sum3_32.16_111 = PHI <vect_sum3_32.16_107(3)>
  # vect_sum3_32.16_112 = PHI <vect_sum3_32.16_108(3)>
  # vect_sum3_32.16_113 = PHI <vect_sum3_32.16_109(3)>
  _114 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 0>;
  _115 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 32>;
  _116 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 64>;
  _117 = BIT_FIELD_REF <vect_sum3_32.16_110, 32, 96>;
  _118 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 0>;
  _119 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 32>;
  _120 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 64>;
  _121 = BIT_FIELD_REF <vect_sum3_32.16_111, 32, 96>;
  _122 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 0>;
  _123 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 32>;
  _124 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 64>;
  _125 = BIT_FIELD_REF <vect_sum3_32.16_112, 32, 96>;
  _126 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 0>;
  _127 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 32>;
  _128 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 64>;
  _129 = BIT_FIELD_REF <vect_sum3_32.16_113, 32, 96>;
  _130 = _114 + _118;
  _131 = _115 + _119;
  _132 = _116 + _120;
  _133 = _117 + _121;
  _134 = _130 + _122;
  _135 = _131 + _123;
  _136 = _132 + _124;
  _137 = _133 + _125;
  _138 = _134 + _126;
  _139 = _135 + _127;
  _140 = _136 + _128;
  _141 = _137 + _129;
...

instead of doing vector adds and a single series of extracts.
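
To make the difference concrete, here is a rough sketch of the two epilogue
shapes written with the GCC/Clang vector_size extension. The names v4sf,
epilogue_current, epilogue_wanted and epilogues_agree are made up for
illustration and are not vectorizer internals; the sketch only assumes four
V4SF accumulators as in the dump above.

```c
/* Illustrative sketch only; none of these names exist in GCC.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Shape seen in the dump: every lane of every accumulator is extracted
   (16 extracts), then the lanes are combined with 12 scalar adds.  */
static void
epilogue_current (v4sf a, v4sf b, v4sf c, v4sf d, float out[4])
{
  for (int lane = 0; lane < 4; lane++)
    out[lane] = ((a[lane] + b[lane]) + c[lane]) + d[lane];
}

/* Shape asked for: 3 vector adds reduce the four accumulators to one,
   followed by a single series of 4 extracts.  */
static void
epilogue_wanted (v4sf a, v4sf b, v4sf c, v4sf d, float out[4])
{
  v4sf acc = ((a + b) + c) + d;
  for (int lane = 0; lane < 4; lane++)
    out[lane] = acc[lane];
}

/* Returns 1 if both shapes produce the same four partial sums
   for a small set of exactly representable inputs.  */
int
epilogues_agree (void)
{
  v4sf a = { 1.0f, 2.0f, 3.0f, 4.0f };
  v4sf b = { 10.0f, 20.0f, 30.0f, 40.0f };
  v4sf c = { 100.0f, 200.0f, 300.0f, 400.0f };
  v4sf d = { 1000.0f, 2000.0f, 3000.0f, 4000.0f };
  float cur[4], want[4];

  epilogue_current (a, b, c, d, cur);
  epilogue_wanted (a, b, c, d, want);

  for (int lane = 0; lane < 4; lane++)
    if (cur[lane] != want[lane])
      return 0;
  return 1;
}
```

Both variants add the accumulators in the same order within each lane, so
the per-lane association of the floating-point adds is unchanged; only the
number of extracts and the scalar versus vector form of the adds differ.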