From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/104723] [12 regression] Redundant usage of stack
Date: Fri, 22 Apr 2022 16:28:07 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104723

--- Comment #10 from Jakub Jelinek ---
(In reply to H.J. Lu from comment #8)
> DSE can remove redundant load/store for TI, but not OI/XI.

DSE can remove redundant load/store for OI/XI just fine; just remove the last
7 from the string so that it is 48 bytes instead of 49 and all of a sudden it
works fine.

> It is due to overlapping store.

It is indeed due to this.

I wonder if we couldn't special-case overlapping stores if they are loaded
from the constant pool and the overlapping bytes have the same values.
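
To make the "overlapping bytes have the same values" condition concrete, here
is a minimal sketch in plain C.  It is not GCC's DSE code; struct const_store
and overlap_bytes_agree are hypothetical names invented for this illustration.
Each store is described by a destination offset, a length and the constant
bytes it writes, and the check only compares the bytes that both stores write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one constant store: `len` bytes taken
   from `bytes` are written at destination offset `off`.  */
struct const_store {
    size_t off;
    size_t len;
    const unsigned char *bytes;
};

/* Return 1 if every destination byte written by both stores gets the
   same value from each of them (or if they do not overlap at all),
   0 otherwise.  */
int overlap_bytes_agree(const struct const_store *a,
                        const struct const_store *b)
{
    assert(a != NULL && b != NULL);

    /* The overlap is the intersection [lo, hi) of the two ranges.  */
    size_t lo = a->off > b->off ? a->off : b->off;
    size_t a_end = a->off + a->len;
    size_t b_end = b->off + b->len;
    size_t hi = a_end < b_end ? a_end : b_end;

    for (size_t i = lo; i < hi; i++) {
        if (a->bytes[i - a->off] != b->bytes[i - b->off])
            return 0;
    }
    return 1;
}
```

For the 49-byte case, the two 32-byte stores would be {0, 32, s} and
{17, 32, s + 17} for the same constant s, so the bytes at offsets 17..31 agree
by construction.
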
And for the backend, the question is how big the penalty for the overlapping
store is compared to doing multiple non-overlapping stores.  Say for those 49
bytes one could do one OI, one TI/V1TI and one QI load/store as opposed to one
aligned and one misaligned OI load/store.

For say:

void
foo (void *p, void *q)
{
  __builtin_memcpy (p, q, 49);
}

we emit the 2 overlapping loads/stores for -mavx512f and 4 non-overlapping
loads/stores with say -mavx2.
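
The two strategies being compared can be sketched in portable C as below.
This only illustrates the access pattern, not what GCC emits: the function
names are made up, the fixed-size memcpy calls merely stand in for the OI (32
bytes), TI (16 bytes) and QI (1 byte) moves, and the split variant follows the
OI + TI + QI suggestion above rather than the 4-move sequence seen with
-mavx2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { N = 49, WIDE = 32 };

/* Strategy A: two 32-byte moves.  The second one is placed so that it
   ends exactly at byte 49, so it rewrites bytes 17..31 of the first.  */
void copy49_overlapping(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(d != NULL && s != NULL);
    memcpy(d, s, WIDE);
    memcpy(d + (N - WIDE), s + (N - WIDE), WIDE);
}

/* Strategy B: 32 + 16 + 1 bytes, no destination byte written twice.  */
void copy49_split(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(d != NULL && s != NULL);
    memcpy(d, s, 32);
    memcpy(d + 32, s + 32, 16);
    memcpy(d + 48, s + 48, 1);
}
```

Both produce the same 49 destination bytes; the difference is only in how many
moves are needed and whether the second move overlaps the first.
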