From: "dave.rodgman at arm dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug other/110946] 3x perf regression with -Os on M1 Pro
Date: Tue, 08 Aug 2023 12:40:32 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: other
X-Bugzilla-Version: 12.1.0
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946

--- Comment #5 from Dave Rodgman ---
(In reply to Richard Biener from comment #3)
> Note you shouldn't use -Os if you care about performance. GCC is quite
> reasonable with code size increases at -O2 (as compared to other compilers).
> Instead I suggest you use -flto with -O2 to decrease the size of the final
> executable/library and give GCC better knowledge on unit growth.

Understood, but I think it depends on the magnitude of the perf difference. I'd
expect a smallish perf drop, say 10%, from -Os to be reasonable, but I'd
consider a 3x perf difference to be a compiler issue.

(In reply to Alexander Monakov from comment #2)
> So basically missed inlining at -Os, even memcpy wrappers are not inlined.
>
> Can you provide a reproducible testcase?
>
> Note that inline functions in mbedtls/library/alignment.h all miss the
> 'static' qualifier, which affects inlining decisions, and looks like a
> mistake anyway (if they are really meant to be non-static inlines, shouldn't
> there be a comment?)
>
> Does making them 'static inline' rectify the problem?

The easiest way to reproduce is to use the benchmark tool:

make programs/test/benchmark CC=gcc CFLAGS="-Os"
programs/test/benchmark aes_xts

I don't have a compact reproducer, sorry.
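
For readers following the 'static inline' question from comment #2, here is a minimal sketch of the kind of memcpy wrapper being discussed. It is not copied from mbedtls/library/alignment.h and is not a reproducer for this bug: the names get_unaligned_u32, put_unaligned_u32 and xor_bytes are hypothetical, chosen only for illustration. The point it shows is the linkage difference: in C99 and later, a plain 'inline' definition has external linkage and does not by itself provide the out-of-line copy, whereas 'static inline' gives each translation unit its own internal-linkage function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers (assumed names, not the mbedtls API).
 * 'static inline' gives internal linkage, so no external definition
 * has to exist in some other translation unit. */
static inline uint32_t get_unaligned_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));   /* well-defined read from an unaligned address */
    return v;
}

static inline void put_unaligned_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof(v));   /* well-defined write to an unaligned address */
}

/* XOR n bytes of a and b into r: four bytes at a time through the
 * wrappers, then any remaining tail bytes one at a time. */
static void xor_bytes(unsigned char *r, const unsigned char *a,
                      const unsigned char *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        put_unaligned_u32(r + i,
                          get_unaligned_u32(a + i) ^ get_unaligned_u32(b + i));
    }
    for (; i < n; i++) {
        r[i] = a[i] ^ b[i];
    }
}
```

One way to check whether the wrappers were inlined is to compile a file like this with -Os -S and look in the generated assembly for calls to the helper names.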