From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 8150 invoked by alias); 27 May 2011 14:47:45 -0000 Received: (qmail 8136 invoked by uid 22791); 27 May 2011 14:47:44 -0000 X-SWARE-Spam-Status: No, hits=-2.7 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 X-Spam-Check-By: sourceware.org Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 27 May 2011 14:47:15 +0000 From: "jakub at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: rtl-optimization X-Bugzilla-Keywords: missed-optimization X-Bugzilla-Severity: normal X-Bugzilla-Who: jakub at gcc dot gnu.org X-Bugzilla-Status: ASSIGNED X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated Content-Type: text/plain; charset="UTF-8" MIME-Version: 1.0 Date: Fri, 27 May 2011 14:55:00 -0000 Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-bugs-owner@gcc.gnu.org X-SW-Source: 2011-05/txt/msg02691.txt.bz2 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095 --- Comment #7 from Jakub Jelinek 2011-05-27 14:47:08 UTC --- (In reply to comment #6) > Jakub - the patch looks sane, although I don't currently have a gcc build > environment to actually test it with, and there is no way I'm going to claim > that I follow all the matches and understand why you have that third pattern > with SWI12 (but I'm guessing it's because the op and the test are being done in > the "explicit cast to integer" mode). E.g. in the testcase from the patch, in hcharplus routine peephole2 sees: (insn 27 6 28 2 (set (reg:QI 0 ax [orig:62 D.3491 ] [62]) (mem/c:QI (reg/v/f:DI 3 bx [orig:64 x ] [64]) [0 *x_1(D)+0 S1 A8])) pr49095.c:63 66 {*movqi_internal} (nil)) (insn 28 27 8 2 (parallel [ (set (reg:SI 0 ax [orig:62 D.3491 ] [62]) (plus:SI (reg:SI 0 ax [orig:62 D.3491 ] [62]) (const_int 24 [0x18]))) (clobber (reg:CC 17 flags)) ]) pr49095.c:63 249 {*addsi_1} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) (insn 8 28 9 2 (set (mem:QI (reg/v/f:DI 3 bx [orig:64 x ] [64]) [0 *x_1(D)+0 S1 A8]) (reg:QI 0 ax [orig:62 D.3491 ] [62])) pr49095.c:63 66 {*movqi_internal} (nil)) (insn 9 8 10 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg:QI 0 ax [orig:62 D.3491 ] [62]) (const_int 0 [0]))) pr49095.c:63 0 {*cmpqi_ccno_1} (expr_list:REG_DEAD (reg:QI 0 ax [orig:62 D.3491 ] [62]) (nil))) It used to be normal QImode addition addqi_1_lea, but it got later on split to make the code shorter: ;; Avoid redundant prefixes by splitting HImode arithmetic to SImode. (define_split [(set (match_operand 0 "register_operand" "") (match_operator 3 "promotable_binary_operator" [(match_operand 1 "register_operand" "") (match_operand 2 "aligned_operand" "")])) (clobber (reg:CC FLAGS_REG))] "! TARGET_PARTIAL_REG_STALL && reload_completed && ((GET_MODE (operands[0]) == HImode && ((optimize_function_for_speed_p (cfun) && !TARGET_FAST_PREFIX) /* ??? next two lines just !satisfies_constraint_K (...) */ || !CONST_INT_P (operands[2]) || satisfies_constraint_K (operands[2]))) || (GET_MODE (operands[0]) == QImode && (TARGET_PROMOTE_QImode || optimize_function_for_size_p (cfun))))" [(parallel [(set (match_dup 0) (match_op_dup 3 [(match_dup 1) (match_dup 2)])) (clobber (reg:CC FLAGS_REG))])] "operands[0] = gen_lowpart (SImode, operands[0]); operands[1] = gen_lowpart (SImode, operands[1]); if (GET_CODE (operands[3]) != ASHIFT) operands[2] = gen_lowpart (SImode, operands[2]); PUT_MODE (operands[3], SImode);") > One thing I wanted to check, though: I hope everybody is aware that > > op $x,mem > > is not identically the same as > > mov mem,reg > op $x, reg > mov reg,mem > test reg > > WRT the carry flag. The "test" version will always clear the carry flag, while > the "op" version will obviously set it according to the "op". For the logical > ops, this isn't a problem (they also clear carry), but for "add" and "sub" this > transformation *will* result in a difference in the C flag. > > Now, "carry" is meaningless when talking about compare with zero, so this > should all be trivially ok. But I bring it up because it is possible that gcc > perhaps still uses the "unsigned compare" versions with zero. > > In particular, the thing to look out for would be code like > > unsigned int *a; > > if (--*a > 0) > do_something(); > > and if the comparison uses "jg" ("unsigned greater than") for the comparison, > it will get the wrong end result. > > Now, unsigned compares against zero are always expressible as "ne" or "eq", so > this is not a fundamental problem. In fact, my trivial testing with a few cases > seems to imply that gcc already does that conversion to inequality, and you'd > obviously have exactly the same issue with eliding the "test" instruction for > the cases you already handle (when things are in registers). That ought to be handled by the ix86_match_ccmode tests in the peepholes, using CCGOCmode for PLUS/MINUS and CCNOmode for AND/IOR/XOR. I've just copied what the actual patterns these peepholes will turn into use. The documentation about those modes says: Add CCNO to indicate comparisons against zero that requires Overflow flag to be unset. Sign bit test is used instead and thus can be used to form "a&b>0" type of tests. Add CCGOC to indicate comparisons against zero that allows unspecified garbage in the Carry and Overflow flag. This mode is used to simulate comparisons of (a-b) and (a+b) against zero using sub/cmp/add operations. and ix86_match_ccmode should check if the test type in which the flags register is tested uses only the flags that will be set correctly by the operation. In your testcase the comparison was done in CCZmode, which is documented as: Add CCZ to indicate that only the Zero flag is valid. and ix86_match_ccmode allows that for both CCNOmode and CCGOCmode requests, as the zero flag is correct in that case. If that didn't work properly, we'd have problems already without these peepholes, as {add,sub,and,ior,xor}_2 patterns often match during combine when the target is in register, not memory. BTW, the patch bootstrapped/regtested on both x86_64-linux and i686-linux, I'm just running second set of bootstrap without the patch to be able to compare how much the patch changed code on gcc itself.