From: "fxue at os dot amperecomputing.com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113091] New: Over-estimate SLP vector-to-scalar cost for non-live pattern statement
Date: Wed, 20 Dec 2023 09:54:01 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113091

            Bug ID: 113091
           Summary: Over-estimate SLP vector-to-scalar cost for non-live pattern statement
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fxue at os dot amperecomputing.com
  Target Milestone: ---

GCC fails to vectorize the testcase below on aarch64.
  int test(unsigned array[8]);

  int foo(char *a, char *b)
  {
    unsigned array[8];

    array[0] = (a[0] - b[0]);
    array[1] = (a[1] - b[1]);
    array[2] = (a[2] - b[2]);
    array[3] = (a[3] - b[3]);
    array[4] = (a[4] - b[4]);
    array[5] = (a[5] - b[5]);
    array[6] = (a[6] - b[6]);
    array[7] = (a[7] - b[7]);

    return test(array);
  }

The dump shows that the loads from a[i] and b[i] are considered live as scalar references, which results in an over-estimated vector-to-scalar cost:

  *a_50(D) 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)a_50(D) + 1B] 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)a_50(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)a_50(D) + 3B] 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)a_50(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)a_50(D) + 5B] 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)a_50(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)a_50(D) + 7B] 1 times vec_to_scalar costs 2 in epilogue
  *b_51(D) 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)b_51(D) + 1B] 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)b_51(D) + 2B] 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)b_51(D) + 3B] 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)b_51(D) + 4B] 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)b_51(D) + 5B] 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)b_51(D) + 6B] 1 times vec_to_scalar costs 2 in epilogue
  MEM[(char *)b_51(D) + 7B] 1 times vec_to_scalar costs 2 in epilogue

The subtraction on char type is recognized as a widen-sub, which involves two kinds of pattern replacement.
  * Original

      _1 = *a_50(D);
      _2 = (int) _1;
      _3 = *b_51(D);
      _4 = (int) _3;
      _5 = _2 - _4;

  * After pattern replacement

      patt_63 = (unsigned short) _1;        // _2 = (int) _1;
      patt_64 = (int) patt_63;              // _2 = (int) _1;
      patt_65 = (unsigned short) _3;        // _4 = (int) _3;
      patt_66 = (int) patt_65;              // _4 = (int) _3;
      patt_67 = .VEC_WIDEN_MINUS (_1, _3);  // _5 = _2 - _4;
      patt_68 = (signed short) patt_67;     // _5 = _2 - _4;
      patt_69 = (int) patt_68;              // _5 = _2 - _4;

For the statement "_2 = (int) _1", its vectorization representative "patt_64 = (int) patt_63" is not marked as PURE_SLP, so it is conservatively considered to have a scalar use and to be live outside of the SLP bb (in the function vect_bb_slp_mark_live_stmts). However, the pattern definition is actually dead and should not contribute to the vector-to-scalar cost.

Since defs from pattern statements are not part of the function body, we cannot track their def/use chains as we do for ordinary SSA names. A quick fix may be possible for one situation: if the original SSA name "_2" has a single use, its existence should be covered only by the vectorized operation, regardless of what it would look like without pattern replacement.