From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/94092] Code size and performance degradations after -ftree-loop-distribute-patterns was enabled at -O[2s]+
Date: Wed, 24 Feb 2021 09:20:14 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092

--- Comment #9 from Richard Biener ---
(In reply to Mel Chen from comment #8)
> Sorry for using the bad example to describe the problem I am facing. Let me
> clarify my question with a more precise example.
>
> void array_mul(int N, int *C, short *A, short *B) {
>   int i, j;
>   for (i = 0; i < N; i++) {
>     C[i] = 0; // Will be transformed to __builtin_memset
>     for (j = 0; j < N; j++) {
>       C[i] += (int)A[i * N + j] * (int)B[j];
>     }
>   }
> }
>
> If I compile the case with -O2 -fno-tree-loop-distribute-patterns, the store
> operation 'C[i] = 0' can be eliminated by dead store elimination (dse3). But
> without -fno-tree-loop-distribute-patterns, it will be transformed to memset
> by loop distribution (ldist) because ldist executes before dse3. Finally the
> memset will not be eliminated.
>
> Another point is if there are other operations in the same level loop as the
> store operation, is it really beneficial to do loop distribution and then
> convert to builtin function?

Sure, it shows a cost modeling issue, given that loop distribution usually
merges partitions which touch the same memory stream (but IIRC maybe only
for loads).  But more to the point, we fail to eliminate the dead store,
which should be apparent at least after PRE: LIM2 applied store motion, but
only PRE elides the resulting load of C[i].  Usually DCE and DSE come in
pairs, but after PRE we have DCE and CDDCE without an accompanying DSE, and
the next DSE only happens after loop distribution.  Which means we should
eventually do

diff --git a/gcc/passes.def b/gcc/passes.def
index e9ed3c7bc57..be3a9becde0 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -254,6 +254,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_sancov);
       NEXT_PASS (pass_asan);
       NEXT_PASS (pass_tsan);
+      NEXT_PASS (pass_dse);
       NEXT_PASS (pass_dce);
       /* Pass group that runs when 1) enabled, 2) there are loops
         in the function.  Make sure to run pass_fix_loops before=