From: "jamborm at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/113600] [14 regression] 525.x264_r run-time regresses by 8% with PGO -Ofast -march=znver4
Date: Fri, 26 Jan 2024 18:27:35 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113600

--- Comment #4 from Martin Jambor ---
(In reply to Hongtao Liu from comment #2)
> A patch is posted at
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640276.html
>
> Would you give a try to see if it fixes the regression, I don't currently
> have a znver4 machine for testing.

Unfortunately it does not.

(In reply to Richard Biener from comment #3)
> I think we need to figure out what exactly gets slower (and hope it's not
> scattered all over the place)

I have collected some profiles:

r14-5602-ge6269bb69c0734

# Samples: 516K of event 'cycles:u'
# Event count (approx.): 468008188417
#
# Overhead       Samples  Command          Shared Object                          Symbol
# ........  ............  ...............  .....................................  .................................................
#
    13.55%         69886  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] mc_chroma
    11.05%         57017  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_16x16
     9.24%         47693  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_8x8
     8.67%         44733  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] get_ref
     4.84%         24984  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub16x16_dct
     4.16%         21484  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_me_search_ref
     3.30%         17033  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_hadamard_ac_16x16
     2.28%         11770  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_4x4
     2.10%         10824  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_trellis_cabac
     2.07%         10694  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] hpel_filter
     2.05%         10616  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub8x8_dct
     1.86%          9593  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] refine_subpel
     1.70%          8788  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_4x4
     1.57%          8077  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sad_16x16
     1.16%          6324  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] frame_init_lowres_core
     1.14%          5867  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sa8d_8x8
     1.11%          5738  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_cabac_encode_decision_c
     1.08%          5736  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_var_16x16

r14-5603-g2b59e2b4dff421

# Samples: 550K of event 'cycles:u'
# Event count (approx.): 498834737657
#
# Overhead       Samples  Command          Shared Object                          Symbol
# ........  ............  ...............  .....................................  .................................................
#
    18.21%        100151  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_16x16
    12.37%         68006  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] mc_chroma
     8.51%         46815  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_8x8
     7.56%         41560  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] get_ref
     4.53%         24901  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub16x16_dct
     3.92%         21561  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_me_search_ref
     3.08%         16963  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_hadamard_ac_16x16
     2.41%         13239  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_satd_4x4
     1.99%         10931  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_trellis_cabac
     1.96%         10801  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] hpel_filter
     1.95%         10764  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] sub8x8_dct
     1.56%          8587  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] quant_4x4
     1.49%          8166  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] refine_subpel
     1.48%          8124  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sad_16x16
     1.09%          6328  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] frame_init_lowres_core
     1.07%          5901  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_pixel_sa8d_8x8
     1.04%          5703  x264_r_peak.min  x264_r_peak.mine-pgo-Ofast-native-m64  [.] x264_cabac_encode_decision_c
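To see which function accounts for the extra cycles between the two profiles, here is a minimal Python sketch that diffs them per symbol. The dictionaries are hand-transcribed from the two listings (the "Samples" column), x264_pixel_var_16x16 is left out because it only appears in the first listing, and comparing raw sample counts assumes both runs used the same perf sampling setup. The helper names sample_deltas and cycle_ratio are purely illustrative.

```python
# Hand-transcribed "Samples" column from the two perf listings, limited to
# symbols that appear in both (x264_pixel_var_16x16 is only in the first).
BEFORE = {  # r14-5602-ge6269bb69c0734
    "mc_chroma": 69886, "x264_pixel_satd_16x16": 57017,
    "x264_pixel_satd_8x8": 47693, "get_ref": 44733,
    "sub16x16_dct": 24984, "x264_me_search_ref": 21484,
    "x264_pixel_hadamard_ac_16x16": 17033, "x264_pixel_satd_4x4": 11770,
    "quant_trellis_cabac": 10824, "hpel_filter": 10694,
    "sub8x8_dct": 10616, "refine_subpel": 9593, "quant_4x4": 8788,
    "x264_pixel_sad_16x16": 8077, "frame_init_lowres_core": 6324,
    "x264_pixel_sa8d_8x8": 5867, "x264_cabac_encode_decision_c": 5738,
}
AFTER = {  # r14-5603-g2b59e2b4dff421
    "mc_chroma": 68006, "x264_pixel_satd_16x16": 100151,
    "x264_pixel_satd_8x8": 46815, "get_ref": 41560,
    "sub16x16_dct": 24901, "x264_me_search_ref": 21561,
    "x264_pixel_hadamard_ac_16x16": 16963, "x264_pixel_satd_4x4": 13239,
    "quant_trellis_cabac": 10931, "hpel_filter": 10801,
    "sub8x8_dct": 10764, "refine_subpel": 8166, "quant_4x4": 8587,
    "x264_pixel_sad_16x16": 8124, "frame_init_lowres_core": 6328,
    "x264_pixel_sa8d_8x8": 5901, "x264_cabac_encode_decision_c": 5703,
}
# "Event count (approx.)" from each profile header (cycles:u).
EVENTS_BEFORE = 468008188417
EVENTS_AFTER = 498834737657


def sample_deltas(before, after):
    """Return (symbol, after - before) pairs, largest increase first.

    Ties are broken alphabetically so the order is deterministic.
    """
    common = before.keys() & after.keys()
    deltas = [(sym, after[sym] - before[sym]) for sym in common]
    return sorted(deltas, key=lambda item: (-item[1], item[0]))


def cycle_ratio(events_before, events_after):
    """Ratio of total user-space cycles, after / before."""
    return events_after / events_before


if __name__ == "__main__":
    print(f"total cycles ratio: {cycle_ratio(EVENTS_BEFORE, EVENTS_AFTER):.3f}")
    for sym, delta in sample_deltas(BEFORE, AFTER)[:5]:
        print(f"{sym:32} {delta:+7d}")
```

With these numbers, the total cycle count grows by a factor of about 1.066, and x264_pixel_satd_16x16 alone gains about 43K samples, more than the roughly 34K increase in total samples reported in the headers, while most other listed functions stay roughly flat or shrink slightly.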