From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-358024-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 8150 invoked by alias); 27 May 2011 14:47:45 -0000
Received: (qmail 8136 invoked by uid 22791); 27 May 2011 14:47:44 -0000
X-SWARE-Spam-Status: No, hits=-2.7 required=5.0	tests=ALL_TRUSTED,AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 27 May 2011 14:47:15 +0000
From: "jakub at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jakub at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID: <bug-49095-4-75YMPRSGYx@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-49095-4@http.gcc.gnu.org/bugzilla/>
References: <bug-49095-4@http.gcc.gnu.org/bugzilla/>
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Date: Fri, 27 May 2011 14:55:00 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2011-05/txt/msg02691.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-27 14:47:08 UTC ---
(In reply to comment #6)
> Jakub - the patch looks sane, although I don't currently have a gcc build
> environment to actually test it with, and there is no way I'm going to claim
> that I follow all the matches and understand why you have that third pattern
> with SWI12 (but I'm guessing it's because the op and the test are being done in
> the "explicit cast to integer" mode).

E.g. in the testcase from the patch, in hcharplus routine peephole2 sees:
(insn 27 6 28 2 (set (reg:QI 0 ax [orig:62 D.3491 ] [62])
        (mem/c:QI (reg/v/f:DI 3 bx [orig:64 x ] [64]) [0 *x_1(D)+0 S1 A8]))
pr49095.c:63 66 {*movqi_internal}
     (nil))

(insn 28 27 8 2 (parallel [
            (set (reg:SI 0 ax [orig:62 D.3491 ] [62])
                (plus:SI (reg:SI 0 ax [orig:62 D.3491 ] [62])
                    (const_int 24 [0x18])))
            (clobber (reg:CC 17 flags))
        ]) pr49095.c:63 249 {*addsi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

(insn 8 28 9 2 (set (mem:QI (reg/v/f:DI 3 bx [orig:64 x ] [64]) [0 *x_1(D)+0 S1
A8])
        (reg:QI 0 ax [orig:62 D.3491 ] [62])) pr49095.c:63 66 {*movqi_internal}
     (nil))

(insn 9 8 10 2 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:QI 0 ax [orig:62 D.3491 ] [62])
            (const_int 0 [0]))) pr49095.c:63 0 {*cmpqi_ccno_1}
     (expr_list:REG_DEAD (reg:QI 0 ax [orig:62 D.3491 ] [62])
        (nil)))

It used to be normal QImode addition addqi_1_lea, but it got later on split
to make the code shorter:

;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.

(define_split
  [(set (match_operand 0 "register_operand" "")
        (match_operator 3 "promotable_binary_operator"
           [(match_operand 1 "register_operand" "")
            (match_operand 2 "aligned_operand" "")]))
   (clobber (reg:CC FLAGS_REG))]
  "! TARGET_PARTIAL_REG_STALL && reload_completed
   && ((GET_MODE (operands[0]) == HImode
        && ((optimize_function_for_speed_p (cfun) && !TARGET_FAST_PREFIX)
            /* ??? next two lines just !satisfies_constraint_K (...) */
            || !CONST_INT_P (operands[2])
            || satisfies_constraint_K (operands[2])))
       || (GET_MODE (operands[0]) == QImode               
           && (TARGET_PROMOTE_QImode || optimize_function_for_size_p (cfun))))"
  [(parallel [(set (match_dup 0)                          
                   (match_op_dup 3 [(match_dup 1) (match_dup 2)]))
              (clobber (reg:CC FLAGS_REG))])]             
  "operands[0] = gen_lowpart (SImode, operands[0]);       
   operands[1] = gen_lowpart (SImode, operands[1]);       
   if (GET_CODE (operands[3]) != ASHIFT)
     operands[2] = gen_lowpart (SImode, operands[2]);
   PUT_MODE (operands[3], SImode);")

> One thing I wanted to check, though: I hope everybody is aware that
> 
>    op $x,mem
> 
> is not identically the same as
> 
>    mov mem,reg
>    op $x, reg
>    mov reg,mem
>    test reg
> 
> WRT the carry flag. The "test" version will always clear the carry flag, while
> the "op" version will obviously set it according to the "op". For the logical
> ops, this isn't a problem (they also clear carry), but for "add" and "sub" this
> transformation *will* result in a difference in the C flag.
> 
> Now, "carry" is meaningless when talking about compare with zero, so this
> should all be trivially ok. But I bring it up because it is possible that gcc
> perhaps still uses the "unsigned compare" versions with zero.
> 
> In particular, the thing to look out for would be code like
> 
>   unsigned int *a;
> 
>   if (--*a > 0)
>     do_something();
> 
> and if the comparison uses "jg" ("unsigned greater than") for the comparison,
> it will get the wrong end result.
> 
> Now, unsigned compares against zero are always expressible as "ne" or "eq", so
> this is not a fundamental problem. In fact, my trivial testing with a few cases
> seems to imply that gcc already does that conversion to inequality, and you'd
> obviously have exactly the same issue with eliding the "test" instruction for
> the cases you already handle (when things are in registers).

That ought to be handled by the
ix86_match_ccmode tests in the peepholes, using CCGOCmode for PLUS/MINUS and
CCNOmode for AND/IOR/XOR.  I've just copied what the actual patterns these
peepholes will turn into use.  The documentation about those modes says:
   Add CCNO to indicate comparisons against zero that requires
   Overflow flag to be unset.  Sign bit test is used instead and
   thus can be used to form "a&b>0" type of tests.

   Add CCGOC to indicate comparisons against zero that allows
   unspecified garbage in the Carry and Overflow flag. This
   mode is used to simulate comparisons of (a-b) and (a+b)
   against zero using sub/cmp/add operations.
and ix86_match_ccmode should check if the test type in which the flags register
is tested uses only the flags that will be set correctly by the operation.  In
your testcase the comparison was done in CCZmode, which is documented as:
   Add CCZ to indicate that only the Zero flag is valid.
and ix86_match_ccmode allows that for both CCNOmode and CCGOCmode requests, as
the zero flag is correct in that case.  If that didn't work properly, we'd have
problems already without these peepholes, as {add,sub,and,ior,xor}<mode>_2
patterns often match during combine when the target is in register, not memory.

BTW, the patch bootstrapped/regtested on both x86_64-linux and i686-linux, I'm
just running second set of bootstrap without the patch to be able to compare
how much the patch changed code on gcc itself.