From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugs-return-436985-listarch-gcc-bugs=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 4732 invoked by alias); 8 Dec 2013 13:47:03 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-bugs.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-help@gcc.gnu.org>
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 4675 invoked by uid 48); 8 Dec 2013 13:46:58 -0000
From: "olegendo at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
Date: Sun, 08 Dec 2013 13:47:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 4.6.1
X-Bugzilla-Keywords:
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: olegendo at gcc dot gnu.org
X-Bugzilla-Status: REOPENED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: olegendo at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID: <bug-49263-4-WQALggtx3I@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-49263-4@http.gcc.gnu.org/bugzilla/>
References: <bug-49263-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-12/txt/msg00640.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263
--- Comment #21 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #18)
> 
> It seems that combine is trying to look for the following patterns:
> 
> Failed to match this instruction:
> (set (pc)
>     (if_then_else (ne (and:SI (reg:SI 5 r5 [ xb ])
>                 (const_int 85 [0x55]))
>             (const_int 0 [0]))
>         (label_ref:SI 15)
>         (pc)))

Implementing a combine pattern for this, such as
(define_insn_and_split "*tst_cbranch"
  [(set (pc)
    (if_then_else (ne (and:SI (match_operand:SI 0 "logical_operand")
                  (match_operand:SI 1 "const_int_operand"))
              (const_int 0))
              (label_ref (match_operand 2))
              (pc)))
   (clobber (reg:SI T_REG))]
  "TARGET_SH1"
  "#"
  "&& 1"
  [(set (reg:SI T_REG) (eq:SI (and:SI (match_dup 0) (match_dup 1))
                  (const_int 0)))
   (set (pc) (if_then_else (eq (reg:SI T_REG) (const_int 0))
               (label_ref (match_dup 2))
               (pc)))])


results in code such as the following:
        mov     #33,r1
        mov     r5,r0
        tst     #33,r0
        bf/s    .L3
        and     r5,r1
        mov.l   r1,@r4
.L3:
        rts
        nop

which is worse, because the and is still computed in addition to the tst.
What happens is that the sequence is expanded to RTL as follows:

(insn 7 4 8 2 (set (reg:SI 163 [ D.1856 ])
        (and:SI (reg/v:SI 162 [ xb ])
            (const_int 33 [0x21]))) sh_tmp.cpp:17 -1
     (nil))
(insn 8 7 9 2 (set (reg:SI 147 t)
        (eq:SI (reg:SI 163 [ D.1856 ])
            (const_int 0 [0]))) sh_tmp.cpp:17 -1
     (nil))
(jump_insn 9 8 10 2 (set (pc)
        (if_then_else (eq (reg:SI 147 t)
                (const_int 0 [0]))
            (label_ref:SI 15)
            (pc))) sh_tmp.cpp:17 301 {*cbranch_t}
     (int_list:REG_BR_PROB 3900 (nil))
 -> 15)
(note 10 9 11 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 11 10 12 4 (set (reg:SI 164)
        (const_int 0 [0])) sh_tmp.cpp:18 -1
     (nil))
(insn 12 11 15 4 (set (mem:SI (reg/v/f:SI 161 [ x ]) [2 *x_5(D)+0 S4 A32])
        (reg:SI 164)) sh_tmp.cpp:18 -1
     (nil))


and the cse1 pass decides that the result of the and operation can be shared
(reg:SI 163 is known to be zero on the fall-through path) and replaces the
operand in insn 12 with reg:SI 163:

(insn 12 11 15 3 (set (mem:SI (reg/v/f:SI 161 [ x ]) [2 *x_5(D)+0 S4 A32])
        (reg:SI 163 [ D.1856 ])) sh_tmp.cpp:18 258 {movsi_ie}
     (expr_list:REG_DEAD (reg:SI 164)
        (expr_list:REG_DEAD (reg/v/f:SI 161 [ x ])
            (nil))))

Insn 11 then becomes dead code and is eliminated.
All of that happens a long time before combine, so the tst combine patterns
have no chance to reconstruct the original code.
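
To illustrate what cse1 does at the source level, here is a rough C++
sketch.  The function bodies are my reconstruction from the RTL above and
are an assumption, not the actual sh_tmp.cpp test case; all names are
made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed shape of the test case: store zero if none of the bits in
// the mask 33 (0x21) are set in xb.
void store_as_written(int32_t* x, int32_t xb)
{
  if ((xb & 33) == 0)
    *x = 0;                  // insn 11 + insn 12 before cse1
}

// Source-level view of the RTL after cse1: the stored zero is taken
// from the and result, so the and result stays live past the test.
void store_after_cse1(int32_t* x, int32_t xb)
{
  int32_t masked = xb & 33;  // insn 7
  if (masked == 0)           // insn 8 + jump_insn 9
    *x = masked;             // insn 12 after cse1 (reg 163 instead of reg 164)
}

// Both variants must behave the same for every input.
void check_same_behaviour()
{
  for (int32_t xb = -256; xb < 256; ++xb)
    {
      int32_t a = -1, b = -1;
      store_as_written(&a, xb);
      store_after_cse1(&b, xb);
      assert(a == b);
    }
}
```

The two functions are equivalent, but in the second one the and result is
needed as a value, which is why a tst alone can no longer replace it.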

A sequence such as

        mov     r5,r0
        mov     #0,r1
        tst     #33,r0
        bf      .L3
        mov.l   r1,@r4
.L3:
        rts
        nop

could probably be achieved by combining insn 7 and insn 8 shortly after RTL
expansion, or even during the expansion of insn 8 (by looking at previous
already expanded insns and emitting a tst insn directly).
The idea would be to reduce dependencies on the tested register, which would
allow better scheduling.  In addition to that, on SH4A "mov #imm8,Rn" is an MT
group instruction, which has a higher probability of being executed in parallel.
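
As a sanity check of the proposed sequence, here is a toy C++ model of it.
This is only a sketch: the Regs struct and the function names are
hypothetical, and the model covers nothing beyond the four instructions
involved.  It assumes the usual semantics of "tst #imm,R0" (T is set when
R0 & imm is zero, no general register is written) and of "bf" (branch when
T is 0).

```cpp
#include <cassert>
#include <cstdint>

// Toy register file: only what the sequence touches.
struct Regs
{
  uint32_t r0 = 0;
  uint32_t r1 = 0;
  bool t = false;
};

// tst #imm,R0: T = ((R0 & imm) == 0), imm is a zero-extended 8 bit value.
void tst_imm_r0(Regs& r, uint8_t imm)
{
  r.t = (r.r0 & imm) == 0;
}

// Models the proposed sequence.  r5 holds xb, mem is the word at @r4.
void preferred_sequence(uint32_t r5, uint32_t& mem)
{
  Regs r;
  r.r0 = r5;          // mov   r5,r0
  r.r1 = 0;           // mov   #0,r1   (does not depend on the test)
  tst_imm_r0(r, 33);  // tst   #33,r0
  if (!r.t)           // bf    .L3     (taken when T == 0)
    return;
  mem = r.r1;         // mov.l r1,@r4
}

void check_preferred_sequence()
{
  uint32_t mem = 7;
  preferred_sequence(1, mem);  // bit 0 is in the mask: store skipped
  assert(mem == 7);
  preferred_sequence(2, mem);  // no mask bit set: zero stored
  assert(mem == 0);
}
```

Note that r1 is loaded with a constant here, so nothing but the tst itself
depends on the tested value.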