On 05/08/2015 08:54 AM, Richard Henderson wrote:
> I have a feeling I know why these didn't get merged.
>
> The global optimizers aren't allowed to operate on hard registers lest they
> extend the lifetime of the hard register such that it creates an impossible
> situation for the register allocator.  Think what would happen if EAX were
> suddenly live across the entire function.
>
> Similarly, combine is allowed to merge insns with hard registers if the insns
> are sequential.  But if the insns aren't sequential, we're lengthening the
> lifetime of the hard register.  Now, I thought this didn't apply to fixed
> registers like esp or flags, but perhaps not.
>
> Note what happens if you swap the order of le and pf in the asm:
>
>   asm("cmpl %3,%2" : "=@ccp" (pf), "=@ccle" (le) : "r" (x), "g" (y));
>
> the order of the two setcc insns is reversed, and then the setle is in fact
> merged with the branch.
>
> Anyway, I'll look into whether the branch around alpha can be optimized, but
> I'd be shocked if I'd be able to do anything about the branch around beta.
> True, there's nothing in between that will clobber the flags so it would be
> an excellent improvement, but combine doesn't work across basic blocks and
> changing that would be a major task.

My feeling was wrong.
    (insn 9 11 10 2 (set (reg:QI 91 [ le ])
            (le:QI (reg:CCGC 17 flags)
                (const_int 0 [0]))) z.c:8 601 {*setcc_qi}
         (nil))
    (insn 10 9 12 2 (set (reg:QI 92 [ pf ])
            (eq:QI (reg:CCP 17 flags)
                (const_int 0 [0]))) z.c:8 601 {*setcc_qi}
         (expr_list:REG_DEAD (reg:CCP 17 flags)
            (nil)))
    (insn 12 10 13 2 (set (reg:CCZ 17 flags)
            (compare:CCZ (reg:QI 91 [ le ])
                (const_int 0 [0]))) z.c:12 1 {*cmpqi_ccno_1}
         (expr_list:REG_DEAD (reg:QI 91 [ le ])
            (nil)))
    (jump_insn 13 12 14 2 (set (pc)
            (if_then_else (eq (reg:CCZ 17 flags)
                    (const_int 0 [0]))
                (label_ref 17)
                (pc))) z.c:12 606 {*jcc_1}
         (expr_list:REG_DEAD (reg:CCZ 17 flags)
            (int_list:REG_BR_PROB 7929 (nil))))

Combine won't allow a combination of insns 9 -> 12 -> 13 in one step, because one of the inputs to insn 9, the flags, is clobbered in between insn 9 and insn 13; namely, by insn 12. Combine has a special case for allowing this when all the insns are consecutive, but not when there's something in the middle. But it *does* try to match an intermediate pattern,

    (set (reg:CCGC 17 flags)
        (compare:CCGC (reg:CCGC 17 flags)
            (const_int 0 [0])))

which can be considered a no-op move. If I add the attached pattern, then the combination happens in two steps -- 9 -> 12, 12 -> 13 -- and we get what we hoped:

    (note 9 11 10 2 NOTE_INSN_DELETED)
    (insn 10 9 12 2 (set (reg:QI 92 [ pf ])
            (eq:QI (reg:CCP 17 flags)
                (const_int 0 [0]))) z.c:8 601 {*setcc_qi}
         (nil))
    (note 12 10 13 2 NOTE_INSN_DELETED)
    (jump_insn 13 12 14 2 (set (pc)
            (if_then_else (gt (reg:CCGC 17 flags)
                    (const_int 0 [0]))
                (label_ref 17)
                (pc))) z.c:12 606 {*jcc_1}
         (int_list:REG_BR_PROB 7929 (expr_list:REG_DEAD (reg:CCZ 17 flags)
            (nil)))
     -> 17)

Jeff or Segher, is it worth complicating the can_combine_p test near line 1958,

      /* Make sure that the value that is to be substituted for the register
         does not use any registers whose values alter in between.  However,
         if the insns are adjacent, a use can't cross a set even though we
         think it might (this can happen for a sequence of insns each setting
         the same destination; last_set of that register might point to
         a NOTE).  If INSN has a REG_EQUIV note, the register is always
         equivalent to the memory so the substitution is valid even if there
         are intervening stores.  Also, don't move a volatile asm or
         UNSPEC_VOLATILE across any other insns.  */
      || (! all_adjacent
          && (((!MEM_P (src)
                || ! find_reg_note (insn, REG_EQUIV, src))
               && use_crosses_set_p (src, DF_INSN_LUID (insn)))
              || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
              || GET_CODE (src) == UNSPEC_VOLATILE))

to notice that the set is one of SUCC or SUCC2, and is thus included in the insns being combined? That does seem cleaner and more general than the hacky i386 nop_cmp pattern, but it would certainly require tons more testing...

r~