From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 48) id 1BFFF3858408; Fri, 19 Jan 2024 16:43:39 +0000 (GMT)
From: "hubicka at ucw dot cz"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug ipa/113478] -Os does not inline single instruction function
Date: Fri, 19 Jan 2024 16:43:38 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: ipa
X-Bugzilla-Version: 13.2.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: hubicka at ucw dot cz
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
Content-Type: text/plain; charset="UTF-8"
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113478

--- Comment #4 from Jan Hubicka ---
> Possibly, at least when we know it doesn't expand to a libatomic call? OTOH
> even then a function just wrapping such call should probably be inlined,
> so the question is whether the problem that
> is estimated as too big compared to the call calling the function
> (OTOH a1.test () has no arguments while __atomic_load_1 has two).
If we really want to optimize for size, calling a function with one parameter is shorter than calling a function with two parameters. The code size model takes into account when the offline copy of the function will disappear, and it also has some bias towards understanding that a lot of comdat functions are not really shared across units.

The testcase calls the function 15 times, and I guess the wrapper function on most architectures is shorter than 15 load-zero instructions...

We now have -Os and -Oz and two-level optimize_size predicates. We may make this less restrictive at the lower size optimization level. But when optimizing for size, and if __atomic_load were an ordinary function call, I think the decision is correct.
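
To make the size argument concrete, here is a rough C++ sketch. It is not the testcase from the bug report: the struct `A`, the relaxed memory order, the helper `growth_if_inlined`, and the unit costs (one extra instruction per call site to load the zero memory-order argument, a three-instruction wrapper) are all assumptions chosen for illustration, and the arithmetic is a toy model, not GCC's actual inliner cost model.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the class in the testcase.
struct A {
    std::atomic<bool> flag{false};

    // Callers pass one argument (`this`). If the load were an ordinary call
    // such as __atomic_load_1, it would take two: the address and the memory
    // order (assumed relaxed here, i.e. the constant 0).
    bool test() const { return flag.load(std::memory_order_relaxed); }
};

// Toy estimate, in abstract instruction units, of how much the code grows
// when the wrapper is inlined at every call site.
//   call_sites            number of calls to the wrapper
//   extra_insns_per_site  extra instructions each inlined site needs
//                         (e.g. loading the zero memory-order argument)
//   wrapper_insns         size of the out-of-line wrapper body
//   offline_copy_removed  whether inlining lets the out-of-line copy go away
constexpr int growth_if_inlined(int call_sites, int extra_insns_per_site,
                                int wrapper_insns, bool offline_copy_removed) {
    return call_sites * extra_insns_per_site
           - (offline_copy_removed ? wrapper_insns : 0);
}

// 15 call sites, 1 extra instruction each, 3-instruction wrapper:
// even if the offline copy disappears, inlining adds 15 - 3 = 12 units.
static_assert(growth_if_inlined(15, 1, 3, true) == 12,
              "inlining grows the code in this toy model");

// Small usage check: the wrapper returns whatever the atomic holds.
inline void run_example() {
    A a1;
    assert(!a1.test());
    a1.flag.store(true, std::memory_order_relaxed);
    assert(a1.test());
}
```

Under these made-up numbers a positive result means inlining makes the program larger, which is the situation described above; with a single call site and a removable offline copy the same formula goes negative and inlining would win.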