From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 7EE5B3939C03; Wed, 24 Feb 2021 09:20:14 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7EE5B3939C03
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/94092] Code size and performance degradations
 after -ftree-loop-distribute-patterns was enabled at -O[2s]+
Date: Wed, 24 Feb 2021 09:20:14 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 10.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: aoliva at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-94092-4-X5DwvkQkAh@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-94092-4@http.gcc.gnu.org/bugzilla/>
References: <bug-94092-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 24 Feb 2021 09:20:14 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D94092

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Mel Chen from comment #8)
> Sorry for using the bad example to describe the problem I am facing. Let =
me
> clarify my question with a more precise example.
>=20
> void array_mul(int N, int *C, short *A, short *B) {
>   int i, j;
>   for (i =3D 0; i < N; i++) {
>     C[i] =3D 0; // Will be transformed to __builtin_memset
>     for (j =3D 0; j < N; j++) {
>       C[i] +=3D (int)A[i * N + j] * (int)B[j];
>     }
>   }
> }
>=20
> If I compile the case with -O2 -fno-tree-loop-distribute-patterns, the st=
ore
> operation 'C[i] =3D 0' can be eliminated by dead store elimination (dse3)=
. But
> without -fno-tree-loop-distribute-patterns, it will be transformed to mem=
set
> by loop distribution (ldist) because ldist executes before dse3. Finally =
the
> memset will not be eliminated.
>=20
> Another point is if there are other operations in the same level loop as =
the
> store operation, is it really beneficial to do loop distribution and then
> convert to builtin function?

Sure, it shows a cost modeling issue given that usually loop distribution
merges partitions which touch the same memory stream (but IIRC maybe only
for loads).  But more to the point we're missing to eliminate the dead store
which should be appearant at least after PRE - LIM2 applied store motion
but only PRE elides the resulting load of C[i].  Usually DCE and DSE come in
pairs but after PRE we have DCE, CDDCE w/o accompaning DSE only with the
next DSE only happening after loop distribution.

Which means we should eventually do
diff --git a/gcc/passes.def b/gcc/passes.def
index e9ed3c7bc57..be3a9becde0 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -254,6 +254,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_sancov);
       NEXT_PASS (pass_asan);
       NEXT_PASS (pass_tsan);
+      NEXT_PASS (pass_dse);
       NEXT_PASS (pass_dce);
       /* Pass group that runs when 1) enabled, 2) there are loops
         in the function.  Make sure to run pass_fix_loops before=