From: "pskocik at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/107831] Missed optimization: -fstack-clash-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page
Date: Wed, 23 Nov 2022 21:27:04 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

--- Comment #6 from Petr Skocik ---
(In reply to Jakub Jelinek from comment #2)
> (In reply to Petr Skocik from comment #1)
> > Sidenote regarding the stack-allocating code for cases when the size is not
> > known to be less than pagesize: the code generated for those cases is quite
> > large.
> > It could be replaced (at least under -Os) with a call to a special
> > assembly function that would pop the return address (assuming the target
> > machine pushes return addresses to the stack), adjust and allocate the
> > stack in a piecemeal fashion so as not to skip guard pages, then repush
> > the return address and return to the caller with the stack expanded.
>
> You certainly don't want to kill the return stack the CPU has, even if it
> results in a few saved bytes for -Os.

That's a very interesting point, because I have written x86_64 assembly
"functions" that pop the return address, push something to the stack, and
then repush the return address and return. In a loop, that does not seem to
perform badly compared to inline code, so I figure it is not messing with
the return stack buffer. After all, even though the return happens through a
different place in the call stack, it is still a return to the original
caller.

The one time I am sure I accidentally messed with the return stack buffer
was when I wrote a context-switching routine and originally tried to "ret"
into the new context. That turned out to be measurably many times slower
than `pop %rcx; jmp *%rcx` (also measured in a loop). That is why I think
popping a return address, allocating on the stack, and then repushing and
returning is not really a performance killer (on my Intel CPU, anyway). If
it were messing with the return stack buffer, I would expect slowdowns
similar to those I got with the context-switching code trying to `ret`.