From: "lukas.graetz@tu-darmstadt.de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/10837] noreturn attribute causes no sibling calling optimization
Date: Mon, 26 Feb 2024 15:17:07 +0000
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Resolution: WONTFIX

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837

--- Comment #20 from Lukas Grätz ---
(In reply to Petr Skocik from comment #19)
> IMO(In reply to Xi Ruoyao from comment #16)
>
> > In practice most _Noreturn functions are abort, exit, ..., i.e. they are
> > only executed one time so optimizing against a cold path does not help much.
> > I don't think it's a good idea to encourage people to construct some fancy
> > code by a recursive _Noreturn function (why not just use a loop?!)
> > And if you must write such fancy code anyway IMO musttail attribute (PR83324)
> > will be a better solution.
>
> There's also longjmp, which may not be all that super cold and may be
> executed multiple times. And while yeah, nobody will notice a single call vs
> jmp time save against a process spawn/exit, for a longjmp wrapper, it'll
> make it a few % faster (as would utilizing _Noreturn attributes for better
> register allocation: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114097,
> which would also save a bit of codesize too). Taillcalls can also save a bit
> of codesize if the target is near.

Just to emphasize, tail call optimization is not just about speed. It is
essential for avoiding wasted stack space. In particular, avoiding potential
stack overflows should _not_ require replacing all recursion with loops, as
Xi Ruoyao suggests. I also think that recursion in C is not fancy (anymore),
since everyone expects the compiler to do sibcall or similar optimizations.
Noreturn functions are the exception to that. So it would indeed be consistent
to do sibcall optimization for noreturn functions, too!

Personally, I would be satisfied with the new musttail attribute to enforce
tail calls whenever necessary (given that it will be available for C, not only
for C++). But speed-wise, musttail might not have the desired effect. It is
meant for preserving stack space.
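To make the stack-space point concrete, here is a minimal sketch that is not part of the original report. The names sum_to, finish, cleanup_and_finish and sum_to_demo are hypothetical and chosen only for illustration. The first function has its recursive call in tail position, which GCC can normally turn into a jump when sibling-call optimization is enabled (-foptimize-sibling-calls, on by default at -O2 and above). The noreturn pair shows the pattern this bug is about: the call is also in tail position, but according to this report it is emitted as a regular call.

```c
#include <assert.h>
#include <stdlib.h>

/* Tail-recursive sum of 1..n. The recursive call is the last action, so
   with sibling-call optimization the stack depth can stay constant.
   Without it, every level of recursion needs its own frame. */
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);   /* call in tail position */
}

/* Hypothetical noreturn pair (illustration only). In a real experiment the
   two functions would live in separate files so the call is not inlined. */
__attribute__((noreturn)) void finish(int status)
{
    exit(status);
}

__attribute__((noreturn)) void cleanup_and_finish(int status)
{
    /* Also a call in tail position, but the callee is noreturn. This is
       the case for which the report says no sibcall is generated. */
    finish(status);
}

/* Small usage example for the returning variant only. */
void sum_to_demo(void)
{
    assert(sum_to(10, 0) == 55);
}
```

The recursion depth in sum_to_demo is deliberately small, so it works with or without the optimization; only for large n does the difference in stack use matter.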
---

Following Petr Skocik, I ran a quick test on my computer:

===== longjmp_wrapper.c =====================
#include <setjmp.h>

__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val) {
    longjmp(env, val);
}

===== longjmp_main.c ========================
#include <setjmp.h>
#include <limits.h>

__attribute__((noreturn))
void longjmp_wrapper(jmp_buf env, int val);

int main(void) {
    jmp_buf env;
    for (int i = 0; i < INT_MAX; i++) {
        if (setjmp(env) == 0) {
            longjmp_wrapper(env, 1);
        }
    }
}
=============================================

After compiling with

$ gcc -O3 -m32 -c -S longjmp_wrapper.c -o longjmp_wrapper.S

I copied and manually modified the generated longjmp_wrapper.S as follows:

9,15c9
<       subl    $20, %esp
<       .cfi_def_cfa_offset 24
<       pushl   28(%esp)
<       .cfi_def_cfa_offset 28
<       pushl   28(%esp)
<       .cfi_def_cfa_offset 32
<       call    longjmp
---
>       jmp     longjmp

Then I compiled both versions with longjmp_main.c, again with -m32. Measured
with "time", the sibcall version took around 23.5 sec and the unmodified
version around 24.5 sec on my computer. So around 4 % improvement for 32 bit
x86. For 64 bit x86, both took around 18 secs with no noticeable speed
difference (perhaps because both arguments are passed in registers instead of
on the stack under the 64 bit calling convention).
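For anyone who wants to sanity-check the loop before timing it, the following single-file sketch is an illustrative variant and not the code that was measured above. It assumes a hypothetical helper, count_round_trips, that runs a caller-chosen number of iterations and counts how often control comes back through longjmp. The noinline attribute is an added assumption that stands in for the separate translation unit used in the test.

```c
#include <assert.h>
#include <setjmp.h>

/* Same wrapper as in the test above; noinline is added here (an assumption
   of this sketch) because wrapper and caller share one file. */
__attribute__((noreturn, noinline))
void longjmp_wrapper(jmp_buf env, int val)
{
    longjmp(env, val);
}

/* Hypothetical helper: performs `rounds` setjmp/longjmp round trips through
   the wrapper and returns how many times setjmp returned via longjmp. */
int count_round_trips(int rounds)
{
    jmp_buf env;
    /* volatile keeps the counters well defined across longjmp */
    volatile int i;
    volatile int returned = 0;

    assert(rounds >= 0);
    for (i = 0; i < rounds; i++) {
        if (setjmp(env) == 0)
            longjmp_wrapper(env, 1);   /* never returns here */
        else
            returned++;                /* reached via longjmp */
    }
    return returned;
}
```

Calling count_round_trips(1000) should yield 1000, one longjmp return per iteration. The extra counter adds work to the loop, so this variant is meant for checking correctness and not for reproducing the timings.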