From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107099] uncprop a bit too eager
Date: Thu, 06 Oct 2022 08:53:13 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107099

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-10-06
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener ---
uncprop should really be merged into the out-of-SSA coalescing machinery; its
sole purpose is to avoid edge copies (which for constants you always have to
perform) when there's the possibility to generate reg-reg moves or coalescing
instead.  As such it's a heuristic that needs to be weighed against other
heuristics in place.

In this case we emit a copy instead of a zeroing (and zeroing is cheap on the
target, so zero might be a good candidate to special-case anyway), but we
have the zeroing in the loop body now.

The odd thing is that we perform the desired coalescing:

Partition 0 (_1(D) - 1 6 11 )

(_1(D) is created for DECL_RESULT) but expansion of __builtin_ia32_ptestz128
does

;; _11 = __builtin_ia32_ptestz128 (_4, _4);

(insn 17 16 18 (set (reg:SI 90)
        (const_int 0 [0])) "include/smmintrin.h":69:10 -1
     (nil))

(insn 18 17 19 (set (reg:CC 17 flags)
        (unspec:CC [
                (reg:V2DI 82 [ _4 ]) repeated x2
            ] UNSPEC_PTEST)) "include/smmintrin.h":69:10 -1
     (nil))

(insn 19 18 20 (set (strict_low_part (subreg:QI (reg:SI 90) 0))
        (eq:QI (reg:CC 17 flags)
            (const_int 0 [0]))) "include/smmintrin.h":69:10 -1
     (nil))

(insn 20 19 0 (set (reg:SI 86 [ ])
        (reg:SI 90)) "include/smmintrin.h":69:10 -1
     (nil))

which eventually ends up as

(insn 21 20 22 4 (set (reg:SI 86 [ ])
        (eq:SI (reg:CC 17 flags)
            (const_int 0 [0]))) "t.c":7:12 940 {*setcc_si_1_movzbl}
     (nil))
(jump_insn 22 21 23 4 (set (pc)
        (if_then_else (ne (reg:CC 17 flags)
                (const_int 0 [0]))
            (label_ref:DI 32)
            (pc))) "t.c":7:12 946 {*jcc}
     (int_list:REG_BR_PROB 59055804 (expr_list:REG_DEAD (reg:CCZ 17 flags)
            (nil))))

that's not foreseen by coalescing: _11 is actually not needed inside
the loop, only the CC result.

So heuristically we might want to disable uncprop when definitions from
calls (to [target] builtins) would be used.
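
For illustration, the general source shape where uncprop can kick in might
look like the following minimal sketch.  This is not the testcase from this
PR; the names probe and all_zero are made up, and probe merely stands in for
a call whose result feeds a branch.  Whether uncprop actually fires on it
depends on what earlier passes do.

```c
#include <stddef.h>

/* Hypothetical stand-in for a call (e.g. to a target builtin) whose
   result feeds a branch; not taken from the PR.  */
static int
probe (unsigned char b)
{
  return b == 0;
}

/* Returns 1 if every byte in buf[0..n) is zero, 0 otherwise.  */
int
all_zero (const unsigned char *buf, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int t = probe (buf[i]);
      /* On the edge taken here t is known to equal 0, so the constant 0
         flowing into the return-value PHI could be replaced by t.  That
         avoids materializing a constant on the edge, but it also keeps t
         live past the compare, which is the trade-off discussed above.  */
      if (t == 0)
        return 0;
    }
  return 1;
}
```

If only the flags result of the compare is needed inside the loop, keeping t
around for the exit edge is where a copy can end up costing more than simply
zeroing the result register.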