From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/112508] [14 Regression] Size regression when using -Os starting with r14-4089-gd45ddc2c04e
Date: Fri, 16 Feb 2024 08:11:05 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112508

--- Comment #3 from Richard Biener ---
Loop store-motion is a difficult thing to cost - it's a critical enabler for
many of our loop optimizations, including scalar evolution analysis.  That
said, this might not hold so much for the cases where we end up using an
extra flag to avoid store data races, and this example also shows we're doing
a bad job of unifying flags for variables stored in the same blocks (we don't
try to do this at all ...).  Value-numbering has difficulties getting from
zero flags to "same flags" - it only manages to elide one flag (but maybe
that's all we can do; I didn't analyze it in detail).

Variables that are set conditionally (conditionally within a loop, not so
much in conditionally executed subloops) are at least less likely to help
SCEV, so cost modeling store-motion of those (aka estimating register
pressure in a simplistic way, like counting the number of IVs) might be a
way to combat this.  Or, for example, disable conditional store-motion
entirely at -Os.

For targets where -Os matters, -fallow-store-data-races would likely be a
way to rescue this.  With that I get, on x86_64:

main1:
.LFB1:
        .cfi_startproc
        movb    h(%rip), %sil
        movl    d(%rip), %edx
        movl    g(%rip), %edi
        movl    e(%rip), %ecx
        movl    f(%rip), %eax
.L2:
        testb   %sil, %sil
        je      .L5
        movl    %eax, %ecx
.L6:
        movl    %ecx, %eax
        cmpl    $9, %ecx
        jg      .L9
        testl   %edx, %edx
        je      .L3
        xorl    %edi, %edi
.L3:
        incl    %ecx
        jmp     .L6
.L9:
        decl    %esi
        xorl    %ecx, %ecx
        xorl    %edx, %edx
        jmp     .L2
.L5:
        movb    $0, h(%rip)
        movl    %eax, f(%rip)
        movl    %ecx, e(%rip)
        movl    %edi, g(%rip)
        movl    %edx, d(%rip)
        ret

Actionable items:
 a) disable flag store motion for cold loops (or for stores only happening
    in cold parts of the loop)
 b) optimize flag variable allocation (try to use the same flag for multiple
    vars)
 c) some kind of register pressure estimation, possibly only for
    non-innermost loops
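
To make item b) concrete, here is a minimal C sketch (NOT the testcase from
this PR; the globals and names below are made up for illustration) of the
rough shape conditional store-motion produces when store data races must be
avoided - one guard flag per promoted variable:

    /* Hypothetical input: two globals conditionally stored inside a loop.  */
    int x, y;

    void before (int n, int c)
    {
      for (int i = 0; i < n; i++)
        if (c)
          {
            x = i;
            y = i;
          }
    }

    /* Rough shape after conditional store-motion without
       -fallow-store-data-races: each promoted variable gets a register
       copy plus a guard flag, and the write-back at the loop exit is
       guarded by that flag.  */
    void after_sketch (int n, int c)
    {
      int x_tmp = x, y_tmp = y;
      _Bool x_written = 0, y_written = 0;  /* two flags, set in the same block */
      for (int i = 0; i < n; i++)
        if (c)
          {
            x_tmp = i; x_written = 1;
            y_tmp = i; y_written = 1;
          }
      if (x_written)
        x = x_tmp;
      if (y_written)
        y = y_tmp;
    }

Since x_written and y_written are set in exactly the same blocks they could
share one flag (item b), and with -fallow-store-data-races the flags and the
guarded write-back go away entirely, which is what gives the smaller code
shown above.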