From: "manolis.tsamis at vrull dot eu"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114010] Unwanted effects of using SSA free lists.
Date: Fri, 23 Feb 2024 15:02:58 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114010

--- Comment #10 from Manolis Tsamis ---
(In reply to ptomsich from comment #9)
> (In reply to Manolis Tsamis from comment #0)
> > E.g. another loop, non canonicalized names:
> >
> > .L120:
> >         ldr     q30, [x0], 16
> >         movi    v29.2s, 0
> >         ld2     {v26.16b - v27.16b}, [x4], 32
> >         movi    v25.4s, 0
> >         zip1    v29.16b, v30.16b, v29.16b
> >         zip2    v30.16b, v30.16b, v25.16b
> >         umlal   v29.8h, v26.8b, v28.8b
> >         umlal2  v30.8h, v26.16b, v28.16b
> >         uaddw   v31.4s, v31.4s, v29.4h
> >         uaddw2  v31.4s, v31.4s, v29.8h
> >         uaddw   v31.4s, v31.4s, v30.4h
> >         uaddw2  v31.4s, v31.4s, v30.8h
> >         cmp     x5, x0
> >         bne     .L120
>
> Is it just me, or are the zip1 and zip2 instructions dead?
>
> Philipp.

They certainly look dead, but they're not because the umlal/umlal2 (and other
accumulate instructions) also read from the destination register.

There looks to be a missed optimization opportunity to use just a single
`movi v25.4s, 0` here though.

Also, looking again at the generated code in the first example:

        mov     v23.16b, v18.16b
        mla     v23.16b, v17.16b, v25.16b

If I'm correct this could be folded into just

        mla     v18.16b, v17.16b, v25.16b

In which case most of the movs in the first and second example could be
eliminated.

To me it looks like the accumulate instructions are missing some optimizations.
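
In case it helps to see the two points above spelled out, here is a rough
scalar model in C. It is only a sketch: the helper names and test values are
made up for illustration and are not taken from GCC or from the benchmark, and
each "register" is modelled as a plain array of lanes. It shows that umlal
reads its destination, so the zip1 result feeding it is live, and that the
mov + mla pair produces the same value as a single mla into v18, assuming
nothing later needs the old contents of v18.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* zip1 vd.16b, vn.16b, vzero.16b: interleave the low 8 bytes of vn with
   zero bytes. Viewed as 8 halfword lanes this zero-extends each byte. */
static void zip1_with_zero(uint16_t vd[8], const uint8_t vn[16])
{
    for (int i = 0; i < 8; i++)
        vd[i] = vn[i];
}

/* umlal vd.8h, vn.8b, vm.8b: vd is both an input and the output. */
static void umlal_8h(uint16_t vd[8], const uint8_t vn[8], const uint8_t vm[8])
{
    for (int i = 0; i < 8; i++)
        vd[i] = (uint16_t)(vd[i] + vn[i] * vm[i]);
}

/* mla vd.16b, vn.16b, vm.16b: same read-modify-write pattern on bytes. */
static void mla_16b(uint8_t vd[16], const uint8_t vn[16], const uint8_t vm[16])
{
    for (int i = 0; i < 16; i++)
        vd[i] = (uint8_t)(vd[i] + vn[i] * vm[i]);
}

/* Hypothetical check using made-up lane values. */
static void self_check(void)
{
    uint8_t v30[16], v26[8], v28[8], v17[16], v25[16];
    for (int i = 0; i < 16; i++) {
        v30[i] = (uint8_t)(i + 1);
        v17[i] = 2;
        v25[i] = 3;
    }
    for (int i = 0; i < 8; i++) {
        v26[i] = 2;
        v28[i] = 3;
    }

    /* zip1 then umlal: the accumulator's initial value shows up in the
       result, so the zip1 that produced it is not dead. */
    uint16_t v29[8];
    zip1_with_zero(v29, v30);
    umlal_8h(v29, v26, v28);
    assert(v29[0] == 1 + 2 * 3);
    assert(v29[7] == 8 + 2 * 3);

    /* mov v23, v18 ; mla v23, v17, v25 */
    uint8_t v18[16], v23[16];
    memcpy(v18, v30, 16);
    memcpy(v23, v18, 16);
    mla_16b(v23, v17, v25);

    /* Folded form: mla v18, v17, v25. This overwrites v18, so it is only
       equivalent when the old v18 is not used afterwards (an assumption). */
    mla_16b(v18, v17, v25);
    assert(memcmp(v18, v23, 16) == 0);
}
```

Calling self_check() from a small main is enough to exercise both cases. The
caveat in the last comment is the interesting part: whether the fold is legal
in the real loops depends on whether v18 is still live after the mla.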