From: "bina2374 at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/94092] Code size and performance degradations after -ftree-loop-distribute-patterns was enabled at -O[2s]+
Date: Wed, 24 Feb 2021 09:05:16 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 10.0
X-Bugzilla-Severity: normal
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: aoliva at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94092

--- Comment #8 from Mel Chen ---
Sorry for using a bad example to describe the problem I am facing. Let me
clarify my question with a more precise example.

void array_mul(int N, int *C, short *A, short *B) {
  int i, j;
  for (i = 0; i < N; i++) {
    C[i] = 0; // Will be transformed to __builtin_memset
    for (j = 0; j < N; j++) {
      C[i] += (int)A[i * N + j] * (int)B[j];
    }
  }
}

If I compile this case with -O2 -fno-tree-loop-distribute-patterns, the store
operation 'C[i] = 0' can be eliminated by dead store elimination (dse3).
But without -fno-tree-loop-distribute-patterns, it is transformed into a memset
by loop distribution (ldist), because ldist executes before dse3. As a result,
the memset is never eliminated.

Another point: if there are other operations at the same loop level as the
store operation, is it really beneficial to do loop distribution and then
convert the store to a builtin function?
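
To make the difference concrete, here is a hand-written sketch of the two shapes the function can end up in. This is not actual GCC output: the names array_mul_distributed and array_mul_accumulate are hypothetical, the N <= 0 guard is my own addition so the memset size is well defined, and the sketch assumes C does not overlap A or B.

```c
#include <string.h>

/* Approximation of the result after loop distribution (ldist):
 * the zeroing store is split into its own loop and becomes a memset,
 * so C is written once by memset and then updated by the multiply loop. */
void array_mul_distributed(int N, int *C, const short *A, const short *B)
{
    if (N <= 0)
        return; /* guard added for this sketch only */
    memset(C, 0, (size_t)N * sizeof(int));
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            C[i] += (int)A[i * N + j] * (int)B[j];
        }
    }
}

/* Approximation of the result with -fno-tree-loop-distribute-patterns:
 * the sum is kept in a local, the initial 'C[i] = 0' store is dead,
 * and each C[i] is stored exactly once. */
void array_mul_accumulate(int N, int *C, const short *A, const short *B)
{
    for (int i = 0; i < N; i++) {
        int sum = 0;
        for (int j = 0; j < N; j++) {
            sum += (int)A[i * N + j] * (int)B[j];
        }
        C[i] = sum;
    }
}
```

Both variants compute the same values; the first just makes an extra pass over C through the memset call, which is the cost I am asking about.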