From: "pskocik at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/107831] Missed optimization: -fstack-clash-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page
Date: Wed, 23 Nov 2022 21:27:04 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

--- Comment #6 from Petr Skocik ---
(In reply to Jakub Jelinek from comment #2)
> (In reply to Petr Skocik from comment #1)
> > Sidenote regarding the stack-allocating code for cases when the size is not
> > known to be less than pagesize: the code generated for those cases is quite
> > large.
> > It could be replaced (at least under -Os) with a call to a special
> > assembly function that would pop the return address (assuming the target
> > machine pushes return addresses to the stack), adjust and allocate the
> > stack in a piecemeal fashion so as not to skip guard pages, then repush
> > the return address and return to the caller with the stack expanded.
>
> You certainly don't want to kill the return stack the CPU has, even if it
> results in a few saved bytes for -Os.

That's a very interesting point, because I have written x86_64 assembly
"functions" that pop the return address, push something to the stack, and
then repush the return address and return. In a loop, that does not seem to
perform badly compared to inline code, so I figure it is not messing with
the return stack buffer. After all, even though the return happens through a
different place in the call stack, it is still a return to the original
caller.

The one time I am sure I accidentally messed with the return stack buffer
was when I wrote a context-switching routine and originally tried to "ret"
into the new context. That turned out to be measurably many times slower
than `pop %rcx; jmp *%rcx` (also measured in a loop). That is why I think
popping a return address, allocating on the stack, and then repushing and
returning is not really a performance killer (on my Intel CPU, anyway). If
it were messing with the return stack buffer, I would expect slowdowns
similar to those I got with the context-switching code trying to `ret`.