From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 3177A38618D4; Wed, 14 Oct 2020 15:29:44 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3177A38618D4
From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/97387] we are near 2021, add carry intrinsic still does the wrong thing and generates silly code.
Date: Wed, 14 Oct 2020 15:29:44 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
List-Subscribe:
X-List-Received-Date: Wed, 14 Oct 2020 15:29:44 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97387

--- Comment #11 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:06bec55e80d98419121f3998d98d969990a75b0b

commit r11-3882-g06bec55e80d98419121f3998d98d969990a75b0b
Author: Jakub Jelinek
Date:   Wed Oct 14 17:14:47 2020 +0200

    i386: Improve chaining of _{addcarry,subborrow}_u{32,64} [PR97387]

    These builtins have two known issues and this patch fixes one of them.

    One issue is that the builtins effectively return two results and
    they make the destination addressable until expansion, which means
    a stack slot is allocated for them and e.g. with -fstack-protector*
    DSE isn't able to optimize that away.  I think for that we want to use
    the technique of returning a complex value; the patch doesn't handle
    that though.  See PR93990 for that.

    The other problem is optimization of successive uses of the builtin,
    e.g. for arbitrary precision arithmetic additions/subtractions.
    As shown in PR93990, combine is able to optimize the case when the first
    argument to these builtins is 0 (the first instance when several are used
    together), and also the last one if the last one ignores its result (i.e.
    the carry/borrow is dead and thrown away in that case).
    As shown in this PR, the combiner refuses to optimize the rest, where it
    sees:
    (insn 10 9 11 2 (set (reg:QI 88 [ _31 ])
            (ltu:QI (reg:CCC 17 flags)
                (const_int 0 [0]))) "include/adxintrin.h":69:10 785 {*setcc_qi}
         (expr_list:REG_DEAD (reg:CCC 17 flags)
            (nil)))
    - set pseudo 88 to CF from flags, then some uninteresting insns that
    don't modify flags, and finally:
    (insn 17 15 18 2 (parallel [
                (set (reg:CCC 17 flags)
                    (compare:CCC (plus:QI (reg:QI 88 [ _31 ])
                            (const_int -1 [0xffffffffffffffff]))
                        (reg:QI 88 [ _31 ])))
                (clobber (scratch:QI))
            ]) "include/adxintrin.h":69:10 350 {*addqi3_cconly_overflow_1}
         (expr_list:REG_DEAD (reg:QI 88 [ _31 ])
            (nil)))
    to set CF in flags back to what we saved earlier.  The combiner just punts
    when trying to combine insn 10, insn 17 and the following addcarrydi
    (etc.) instruction, because
      if (i1 && !can_combine_p (i1, i3, i0, NULL, i2, NULL, &i1dest, &i1src))
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "Can't combine i1 into i3\n");
          undo_all ();
          return 0;
        }
    fails - the 3 insns aren't all adjacent and
          || (! all_adjacent
              && (((!MEM_P (src)
                    || ! find_reg_note (insn, REG_EQUIV, src))
                   && modified_between_p (src, insn, i3))
    src (flags hard register) is modified between the first and third insn -
    in the second insn.

    The following patch optimizes this by optimizing just the two insns,
    10 and 17 above, i.e. save CF into pseudo, set CF from that pseudo, into
    a nop.  The new define_insn_and_split matches how combine simplifies those
    two together (except without the ix86_cc_mode change it was choosing CCmode
    for the destination instead of CCCmode, so had to change that function too,
    and also adjust costs so that the combiner understands it is beneficial).

    With this, all the testcases are optimized, so that the:
            setc    %dl
    ...
            addb    $-1, %dl
    insns in between the ad[dc][lq] or s[ub]b[lq] instructions are all
    optimized away (sure, if something clobbered flags in between they
    wouldn't be, but there is nothing that can be done about that).

    2020-10-14  Jakub Jelinek

            PR target/97387
            * config/i386/i386.md (CC_CCC): New mode iterator.
            (*setcc_qi_addqi3_cconly_overflow_1_): New
            define_insn_and_split.
            * config/i386/i386.c (ix86_cc_mode): Return CCCmode
            for *setcc_qi_addqi3_cconly_overflow_1_ pattern operands.
            (ix86_rtx_costs): Return true and *total = 0; for
            *setcc_qi_addqi3_cconly_overflow_1_ pattern.  Use op0 and
            op1 temporaries to simplify COMPARE checks.

            * gcc.target/i386/pr97387-1.c: New test.
            * gcc.target/i386/pr97387-2.c: New test.
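
For illustration only, the "successive uses of the builtin" that the commit message refers to could look roughly like the sketch below: an arbitrary precision addition where the carry-out of one _addcarry_u64 call feeds the carry-in of the next. The helper names add_limb and add_limbs are hypothetical and are not taken from the pr97387 testcases; the non-x86-64 fallback is an assumption added only so the sketch builds on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <immintrin.h>

/* One link of the chain: out = a + b + carry_in, returns carry-out.
   Uses the intrinsic the commit message is about.  */
static unsigned char
add_limb (unsigned char carry_in, uint64_t a, uint64_t b, uint64_t *out)
{
  unsigned long long tmp;
  unsigned char carry_out = _addcarry_u64 (carry_in, a, b, &tmp);
  *out = tmp;
  return carry_out;
}
#else
/* Portable fallback (assumption, not part of the commit): same result
   computed with plain unsigned arithmetic.  */
static unsigned char
add_limb (unsigned char carry_in, uint64_t a, uint64_t b, uint64_t *out)
{
  uint64_t s = a + b;
  unsigned char c1 = s < a;          /* carry from a + b */
  uint64_t t = s + carry_in;
  unsigned char c2 = t < s;          /* carry from adding carry_in */
  *out = t;
  return (unsigned char) (c1 | c2);  /* at most one of them can be set */
}
#endif

/* Hypothetical helper: r = a + b over n 64-bit limbs, least significant
   limb first.  Returns the final carry-out.  The first call gets a zero
   carry-in; every later call consumes the previous call's carry-out,
   which is the pattern where a setc/addb pair could appear between the
   adc instructions.  */
static unsigned char
add_limbs (uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
  assert (n == 0 || (r != NULL && a != NULL && b != NULL));
  unsigned char carry = 0;
  for (size_t i = 0; i < n; i++)
    carry = add_limb (carry, a[i], b[i], &r[i]);
  return carry;
}
```

Whether the intermediate carry actually stays in the flags register for code like this depends on the compiler version, the optimization level and on nothing clobbering flags between the links of the chain.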