From: "hubicka at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/110015] openjpeg is slower when built with gcc13 compared to clang16
Date: Sun, 28 May 2023 19:42:56 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015

--- Comment #1 from Jan Hubicka ---
opj_t1_enc_refpass is not inlined due to large function growth, and some
others are not inlined due to max-inline-insns-auto. With inlining forced
I get this profile:

  87.35%  opj_t1_cblk_encode_processor
   6.22%  opj_dwt_encode_and_deinterleave_v.lto_priv.0
   1.80%  opj_mqc_byteout
   1.50%  opj_dwt_encode_and_deinterleave_h_one_row.lto_priv.0

So pretty much the same profile as for clang.
However, runtime is still 45573 with

  -O3 -flto -march=native -fno-semantic-interposition --param large-function-insns=1000000 --param max-inline-insns-auto=50000

So it does not seem to be missing IPA optimizations. There are a number of
conditional moves in the clang code; -mbranch-cost helps a bit, but not enough.
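
To illustrate the conditional-move point, here is a minimal sketch of the kind of data-dependent select where a compiler has to choose between a branch and a conditional move. The function clamp_sum and its parameters are hypothetical and not taken from openjpeg; which form any given compiler actually emits depends on the target, the cost model and the optimization level, and the loop may also be vectorized instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, not openjpeg code: sum the values of v[0..n-1],
   clamping each one to at most `limit`. */
int32_t clamp_sum(const int32_t *v, size_t n, int32_t limit)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t x = v[i];
        /* Data-dependent select: a candidate for if-conversion. It may be
           emitted as a conditional branch, as a conditional move, or as a
           vector min, depending on the compiler's cost decisions. */
        int32_t y = (x > limit) ? limit : x;
        sum += y;
    }
    return sum;
}
```

Compiling this file with -S under both compilers and comparing the assembly for the loop body is one way to see which form each compiler picked.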