From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 53A033858CDB; Mon, 26 Feb 2024 15:17:10 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 53A033858CDB
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1708960630;
	bh=N2BdUKfHT7ta9KL308rDEXqp7e0pCUqNfFZ0DLS1Rqo=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=O+A1wE/lGwSVl0ZIupV6zwc09iUTlo7wcZCj/aXtEEgXKoUu8qCgDlpGdFDYtT1vX
	 xzJHr5T0WdeLvgFdqW4UeOMCcIjOros6DIJ9OduqPgQsJ2Tm4F+xAy28XkigsA8rIJ
	 XVAcMkxqhfsJvdMx8ytLTWoOhkcVK+VXGjSh/Ey4=
From: "lukas.graetz@tu-darmstadt.de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/10837] noreturn attribute causes no sibling
 calling optimization
Date: Mon, 26 Feb 2024 15:17:07 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 3.4.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: lukas.graetz@tu-darmstadt.de
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Resolution: WONTFIX
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 3.4.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-10837-4-aO52MIbsEl@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-10837-4@http.gcc.gnu.org/bugzilla/>
References: <bug-10837-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837

--- Comment #20 from Lukas Grätz <lukas.graetz@tu-darmstadt.de> ---
(In reply to Petr Skocik from comment #19)
> IMO(In reply to Xi Ruoyao from comment #16)
>
> > In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> > only executed one time so optimizing against a cold path does not help much.
> > I don't think it's a good idea to encourage people to construct some fancy
> > code by a recursive _Noreturn function (why not just use a loop?!)  And if
> > you must write such fancy code anyway IMO musttail attribute (PR83324) will
> > be a better solution.
>
> There's also longjmp, which may not be all that super cold and may be
> executed multiple times. And while yeah, nobody will notice a single call vs
> jmp time save against a process spawn/exit, for a longjmp wrapper, it'll
> make it a few % faster (as would utilizing _Noreturn attributes for better
> register allocation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097,
> which would also save a bit of codesize too). Taillcalls can also save a bit
> of codesize if the target is near.


Just to emphasize, tail call optimization is not just about speed. It is
essential for avoiding wasted stack space. In particular, avoiding potential
stack overflows should _not_ require replacing all recursion with loops, as
Xi Ruoyao suggests. I also think that recursion in C is not fancy (anymore),
since everyone expects the compiler to do sibcall or similar optimizations.
Noreturn functions are the exception to that. So it would indeed be
consistent to do sibcall optimization for noreturn functions, too!
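
To illustrate the stack-space point, here is a minimal sketch (not taken
from the bug report; sum_to is a hypothetical name). The recursive call is
in tail position, so a compiler that performs sibcall optimization can
reuse the current frame instead of pushing a new one. Whether that actually
happens depends on the compiler and optimization level:

```c
/* Hypothetical helper, not from the bug report: sums 1..n using an
   accumulator so that the recursive call is the last action taken. */
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    /* Tail position: nothing is left to do in this frame after the call,
       so an optimizing compiler may reuse the frame and emit a jump.
       Without that optimization, every level consumes one stack frame. */
    return sum_to(n - 1, acc + n);
}
```

Called as sum_to(n, 0), the stack depth grows linearly with n unless the
call is turned into a jump, which is why the optimization matters beyond
speed.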

Personally, I would be satisfied with the new musttail attribute to enforce
tail calls whenever necessary (given that it will be available for C, not
only C++). But speed-wise, musttail might not have the desired effect. It is
meant for preserving stack space.

---

Following Petr Skocik, I quick-tested on my computer:

===== longjmp_wrapper.c =====================
#include <setjmp.h>

__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val) {
    longjmp(env, val);
}

===== longjmp_main.c ========================
#include <setjmp.h>
#include <limits.h>

__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val);

int main(void) {
    jmp_buf env;
    for (int i = 0; i < INT_MAX; i++) {
        if (setjmp(env) == 0) {
            longjmp_wrapper(env, 1);
        }
    }
}
=============================================

After compiling with

$ gcc -O3 -m32 -c -S longjmp_wrapper.c -o longjmp_wrapper.S

I copied and manually modified the generated longjmp_wrapper.S as follows:

9,15c9
<       subl    $20, %esp
<       .cfi_def_cfa_offset 24
<       pushl   28(%esp)
<       .cfi_def_cfa_offset 28
<       pushl   28(%esp)
<       .cfi_def_cfa_offset 32
<       call    longjmp
---
> 	jmp 	longjmp


Then I compiled both versions with longjmp_main.c, again with -m32. Measured
with "time", the sibcall version and the unmodified version took around
23.5 sec and 24.5 sec, respectively, on my computer. So around 4 % improvement
for 32-bit x86. For 64-bit x86, both took around 18 sec with no noticeable
speed difference (perhaps because both arguments are passed in registers
instead of on the stack under the 64-bit calling convention).
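
For anyone who wants to repeat the measurement without relying on "time",
here is a rough in-process sketch. It is not the setup used above:
jump_back and time_round_trips are hypothetical names, the wrapper lives in
the same file (kept out of line with noinline), and it assumes clock() has
adequate resolution for the chosen iteration count:

```c
#include <setjmp.h>
#include <time.h>

static jmp_buf env;

/* Hypothetical stand-in for longjmp_wrapper; noinline keeps the call
   (or jump) to longjmp in a separate function, as in the two-file test. */
__attribute__((noreturn, noinline))
static void jump_back(int val)
{
    longjmp(env, val);
}

/* Performs `iterations` setjmp/longjmp round trips, stores the CPU time
   used in *seconds, and returns the number of completed round trips. */
long time_round_trips(long iterations, double *seconds)
{
    /* volatile: these locals are modified and live across setjmp/longjmp */
    volatile long i;
    volatile long completed = 0;
    clock_t start = clock();

    for (i = 0; i < iterations; i++) {
        if (setjmp(env) == 0)
            jump_back(1);
        completed++;   /* reached only after longjmp lands back at setjmp */
    }

    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return completed;
}
```

Comparing the sibcall and non-sibcall variants would still require editing
the generated assembly for jump_back as described above.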