From: "mkretz at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/108030] `std::experimental::simd` not inlined
Date: Sun, 19 Feb 2023 14:39:56 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: libstdc++
X-Bugzilla-Version: 12.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: mkretz at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108030

--- Comment #6 from Matthias Kretz (Vir) ---
This improves the situation. But according to tests of fixed_size_simd in a GNU Radio prototype it still isn't enough. It seems like I need to always_inline all non-cmath operations on fixed_size. It's a shame to see GCC choose "4 (or more) stores, vzeroupper, function call, 4 loads, 4 arithmetic SIMD ops, 4 stores, ret, 4 (or more) loads" over "4 arithmetic SIMD ops inlined".
For simd, the semantics should be similar to builtin types, so always_inline would be the right choice anyway (i.e. inline at -O0 as well). But there seems to be room for improvement in GCC. I'll try to find time to produce reasonable testcases for a new PR.
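
For illustration only, here is a minimal sketch of what forcing inlining on element-wise operations could look like. It uses a hypothetical stand-in type `fixed_vec` and a hypothetical helper `mul_add`, not libstdc++'s actual `fixed_size_simd` implementation, and it assumes a compiler that understands the GCC-style `[[gnu::always_inline]]` attribute (GCC or Clang).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fixed-size SIMD wrapper. This is NOT the
// libstdc++ type; it only mimics the "N lanes stored by value" shape.
template <typename T, std::size_t N>
struct fixed_vec {
    std::array<T, N> lanes;
};

// [[gnu::always_inline]] asks GCC/Clang to inline the call regardless of the
// usual cost heuristics, including at -O0. It is paired with `inline`, which
// GCC expects on always_inline functions.
template <typename T, std::size_t N>
[[gnu::always_inline]] inline fixed_vec<T, N>
operator+(const fixed_vec<T, N>& a, const fixed_vec<T, N>& b) {
    fixed_vec<T, N> r{};
    for (std::size_t i = 0; i < N; ++i)
        r.lanes[i] = a.lanes[i] + b.lanes[i];
    return r;
}

template <typename T, std::size_t N>
[[gnu::always_inline]] inline fixed_vec<T, N>
operator*(const fixed_vec<T, N>& a, const fixed_vec<T, N>& b) {
    fixed_vec<T, N> r{};
    for (std::size_t i = 0; i < N; ++i)
        r.lanes[i] = a.lanes[i] * b.lanes[i];
    return r;
}

// Hypothetical caller: with the operators force-inlined, the arithmetic ends
// up in this function's body instead of behind out-of-line calls that would
// pass the operands through memory.
template <typename T, std::size_t N>
inline fixed_vec<T, N> mul_add(const fixed_vec<T, N>& a,
                               const fixed_vec<T, N>& b,
                               const fixed_vec<T, N>& c) {
    return a * b + c;
}
```

Whether the inlined loops are then turned into the expected SIMD instructions still depends on the optimization level and target flags; the attribute only removes the call boundary.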