From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 8F41D3858C56; Thu, 13 Oct 2022 12:38:07 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8F41D3858C56
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1665664687;
	bh=xcITQ03xyX0/P5DgKnkbP9HU5/IpJNTGcTeJdFayOt8=;
	h=From:To:Subject:Date:From;
	b=KoQ0QGOqcOG2FbiLvnvtJ/VZz8IZ5gqpfEbRXLdexvA////RqRseqBI1phYPxObWc
	 ylyoc0wuJyVQFVOJ2pE44vT0Hi/3nNsjKDRwXjyNC1/qH41j+HqBRXbCt2bZZQ4BCS
	 KA2GdU/vLHNbYtsXB9xU0Dgt3Z92Cvk/5s0vnr7o=
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107247] New: SLP reduction results fail to reduce to a single accumulator
Date: Thu, 13 Oct 2022 12:37:56 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter target_milestone
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107247

            Bug ID: 107247
           Summary: SLP reduction results fail to reduce to a single
                    accumulator
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

float fl[128];
int x[128];

float
foo (int n1)
{
  float sum0, sum1, sum2, sum3;
  sum0 = sum1 = sum2 = sum3 = 0.0f;
  int n = (n1 / 4) * 4;
  for (int i = 0; i < n; i += 4)
    {
      sum0 += fabs (fl[i]);
      sum1 += fabs (fl[i + 1]);
      sum2 += fabs (fl[i + 2]);
      sum3 += fabs (fl[i + 3]);
      x[i] = 1;
    }
  return sum0 + sum1 + sum2 + sum3;
}

shows how we fail to reduce the SLP reduction accumulators to a single one
before extracting the elements:

  [local count: 567644343]:
  # sum0_37 = PHI
  # sum1_39 = PHI
  # sum2_41 = PHI
  # sum3_43 = PHI
  # i_45 = PHI
  # vectp_fl.8_89 = PHI
  # vect_sum3_43.15_102 = PHI
  # vect_sum3_43.15_103 = PHI
  # vect_sum3_43.15_104 = PHI
  # vect_sum3_43.15_105 = PHI
...
  vect__12.14_98 = ABS_EXPR ;
  vect__12.14_99 = ABS_EXPR ;
  vect__12.14_100 = ABS_EXPR ;
  vect__12.14_101 = ABS_EXPR ;
  vect_sum3_32.16_106 = vect__12.14_98 + vect_sum3_43.15_102;
  vect_sum3_32.16_107 = vect__12.14_99 + vect_sum3_43.15_103;
  vect_sum3_32.16_108 = vect__12.14_100 + vect_sum3_43.15_104;
  vect_sum3_32.16_109 = vect__12.14_101 + vect_sum3_43.15_105;
...
  [local count: 94607391]:
  # sum0_48 = PHI
  # sum1_36 = PHI
  # sum2_35 = PHI
  # sum3_24 = PHI
  # vect_sum3_32.16_110 = PHI
  # vect_sum3_32.16_111 = PHI
  # vect_sum3_32.16_112 = PHI
  # vect_sum3_32.16_113 = PHI
  _114 = BIT_FIELD_REF ;
  _115 = BIT_FIELD_REF ;
  _116 = BIT_FIELD_REF ;
  _117 = BIT_FIELD_REF ;
  _118 = BIT_FIELD_REF ;
  _119 = BIT_FIELD_REF ;
  _120 = BIT_FIELD_REF ;
  _121 = BIT_FIELD_REF ;
  _122 = BIT_FIELD_REF ;
  _123 = BIT_FIELD_REF ;
  _124 = BIT_FIELD_REF ;
  _125 = BIT_FIELD_REF ;
  _126 = BIT_FIELD_REF ;
  _127 = BIT_FIELD_REF ;
  _128 = BIT_FIELD_REF ;
  _129 = BIT_FIELD_REF ;
  _130 = _114 + _118;
  _131 = _115 + _119;
  _132 = _116 + _120;
  _133 = _117 + _121;
  _134 = _130 + _122;
  _135 = _131 + _123;
  _136 = _132 + _124;
  _137 = _133 + _125;
  _138 = _134 + _126;
  _139 = _135 + _127;
  _140 = _136 + _128;
  _141 = _137 + _129;
...

instead of doing vector adds and a single series of extracts.
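
For illustration only, the two epilogue shapes can be sketched in C with GCC's generic vector extension. This assumes four-lane float vectors, which is consistent with sixteen extracts coming from four vector accumulators in the dump above. The type name v4sf and both function names are hypothetical and are not vectorizer internals. The first function mirrors the extract-then-add sequence currently emitted; the second does the vector adds first and then a single series of four extracts.

```c
/* Assumption: a 16-byte vector of four floats, as the dump suggests.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Shape of the current epilogue: extract every lane of every
   accumulator (16 extracts), then combine them with 12 scalar adds.  */
void
reduce_extract_first (const v4sf acc[4], float out[4])
{
  for (int lane = 0; lane < 4; lane++)
    {
      float s = acc[0][lane];          /* like _114 .. _117 */
      for (int k = 1; k < 4; k++)
        s += acc[k][lane];             /* like _130 .. _141 */
      out[lane] = s;
    }
}

/* Shape asked for in the report: 3 vector adds collapse the four
   accumulators into one, followed by only 4 extracts.  */
void
reduce_vector_first (const v4sf acc[4], float out[4])
{
  v4sf v = acc[0] + acc[1];
  v = v + acc[2];
  v = v + acc[3];
  for (int lane = 0; lane < 4; lane++)
    out[lane] = v[lane];
}
```

Both variants add the accumulators lane by lane in the same order, ((acc0 + acc1) + acc2) + acc3, so in this sketch the second form does not reassociate anything within a lane and should give the same per-lane results.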