public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/64345] New: [SH] Improve single bit extraction
@ 2014-12-17 22:21 olegendo at gcc dot gnu.org
  2015-01-24 13:05 ` [Bug target/64345] " olegendo at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-12-17 22:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64345

            Bug ID: 64345
           Summary: [SH] Improve single bit extraction
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: olegendo at gcc dot gnu.org
            Target: sh*-*-*

Single bit extractions can be done through the T bit.
Some examples:

unsigned int test0 (unsigned int x)
{
  return ((x >> 4) ^ 1) & 1;
}

unsigned int test1 (unsigned int x)
{
  return ((x >> 4) & 1) ^ 1;
}

unsigned int test2 (unsigned int x)
{
  return ~(x >> 4) & 1;
}

now:
        mov     r4,r0
        shlr2   r0
        shlr2   r0
        xor     #1,r0
        and     #1,r0

non-sh2a:
        mov     r4,r0
        tst     #(1<<4),r0
        movt    r0

sh2a:
        bld     #4,r4
        movrt   r0


unsigned int test3 (unsigned int x)
{
  return ((~x >> 4) & 1);
}

now:
        not     r4,r4
        mov     r4,r0
        shlr2   r0
        shlr2   r0
        rts
        and     #1,r0

non-sh2a:
        mov     r4,r0
        tst     #(1<<4),r0
        movt    r0

sh2a:
        bld     #4,r4
        movrt   r0


unsigned int test4 (unsigned int x)
{
  return (x >> 4) & 1;
}

now:
        mov     r4,r0
        shlr2   r0
        shlr2   r0
        and     #1,r0

non-sh2:
        not     r4,r0
        tst     #(1<<4),r0
        movt    r0

sh2a (1):
        mov     r4,r0
        tst     #(1<<4),r0
        movrt   r0

sh2a (2)
        bld     #4,r4
        movt    r0


This can be realized by implementing zero_extract combine patterns such as:
set (reg:SI 170 [ D.1727 ])
    (zero_extract:SI (xor:SI (reg:SI 4 r4 [ x ])
            (const_int 16 [0x10]))
        (const_int 1 [0x1])
        (const_int 4 [0x4])))

Using the recently added support for multi-set patterns in combine, the T bit
contents can be described exactly, instead of simply T bit clobbers.

If the bit position is 0...7 the SH2A bld insn can be used.  For higher bit
positions it might be better to load a constant and use the tst rm,rn insn.  

Alternatively the reg can be shifted right via shlr8 and shlr16 before the tst
insn.  However, in this case it's probably better to use a shld + and #1,r0
sequence, if dynamic shifts are available.  Although that forces the result
value into r0, which might need to be moved into another reg.  With the tst
insn, r0 is used only as a temporary scratch register.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/64345] [SH] Improve single bit extraction
  2014-12-17 22:21 [Bug target/64345] New: [SH] Improve single bit extraction olegendo at gcc dot gnu.org
@ 2015-01-24 13:05 ` olegendo at gcc dot gnu.org
  2015-07-19  8:54 ` olegendo at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-01-24 13:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64345

--- Comment #1 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Sat Jan 24 13:04:53 2015
New Revision: 220081

URL: https://gcc.gnu.org/viewcvs?rev=220081&root=gcc&view=rev
Log:
gcc/
    PR target/49263
    PR target/53987
    PR target/64345
    PR target/59533
    PR target/52933
    PR target/54236
    PR target/51244
    * config/sh/sh-protos.h
    (sh_extending_set_of_reg::can_use_as_unextended_reg,
    sh_extending_set_of_reg::use_as_unextended_reg,
    sh_is_nott_insn, sh_movt_set_dest, sh_movrt_set_dest, sh_is_movt_insn,
    sh_is_movrt_insn, sh_insn_operands_modified_between_p,
    sh_reg_dead_or_unused_after_insn, sh_in_recog_treg_set_expr,
    sh_recog_treg_set_expr, sh_split_treg_set_expr): New functions.
    (sh_treg_insns): New class.
    * config/sh/sh.c (TARGET_LEGITIMATE_COMBINED_INSN): Define target hook.
    (scope_counter): New class.
    (sh_legitimate_combined_insn, sh_is_nott_insn, sh_movt_set_dest,
    sh_movrt_set_dest, sh_reg_dead_or_unused_after_insn,
    sh_extending_set_of_reg::can_use_as_unextended_reg,
    sh_extending_set_of_reg::use_as_unextended_reg, sh_recog_treg_set_expr,
    sh_in_recog_treg_set_expr, sh_try_split_insn_simple,
    sh_split_treg_set_expr): New functions.
    (addsubcosts): Handle treg_set_expr.
    (sh_rtx_costs): Handle IF_THEN_ELSE and ZERO_EXTRACT.
    (sh_rtx_costs): Use arith_reg_operand in SIGN_EXTEND and ZERO_EXTEND.
    (sh_rtx_costs): Handle additional bit test patterns in EQ and AND cases.
    (sh_insn_operands_modified_between_p): Make non-static.
    * config/sh/predicates.md (zero_extend_movu_operand): Allow
    simple_mem_operand in addition to displacement_mem_operand.
    (zero_extend_operand): Don't allow zero_extend_movu_operand.
    (treg_set_expr, treg_set_expr_not_const01,
    arith_reg_or_treg_set_expr): New predicates.
    * config/sh/sh.md (tstsi_t): Use arith_reg_operand and
    arith_or_int_operand instead of logical_operand.  Convert to
    insn_and_split.  Try to optimize constant operand in splitter.
    (tsthi_t, tstqi_t): Fold into *tst<mode>_t.  Convert to insn_and_split.
    (*tstqi_t_zero): Delete.
    (*tst<mode>_t_subregs): Add !sh_in_recog_treg_set_expr split condition.
    (tstsi_t_and_not): Delete.
    (tst<mode>_t_zero_extract_eq): Rename to *tst<mode>_t_zero_extract.
    Convert to insn_and_split.
    (unnamed split, tstsi_t_zero_extract_xor,
    tstsi_t_zero_extract_subreg_xor_little,
    tstsi_t_zero_extract_subreg_xor_big): Delete.
    (*tstsi_t_shift_mask): New insn_and_split.
    (cmpeqsi_t, cmpgesi_t): Add new split for const_int 0 operands and try
    to recombine with surrounding insns when splitting.
    (*negtstsi): Add !sh_in_recog_treg_set_expr condition.
    (cmp_div0s_0, cmp_div0s_1, *cmp_div0s_0, *cmp_div0s_1): Rewrite as ...
    (cmp_div0s, *cmp_div0s_1, *cmp_div0s_2, *cmp_div0s_3, *cmp_div0s_4,
    *cmp_div0s_5, *cmp_div0s_6): ... these new insn_and_split patterns.
    (*cbranch_div0s: Delete.
    (*addc): Convert to insn_and_split.  Use treg_set_expr as 3rd operand.
    Try to recombine with surrounding insns when splitting.  Add operand
    order variants.
    (*addc_t_r, *addc_r_t): Use treg_set_expr_not_const01.
    (*addc_r_r_1, *addc_r_lsb, *addc_r_r_lsb, *addc_r_lsb_r, *addc_r_msb,
    *addc_r_r_msb, *addc_2r_msb): Delete.
    (*addc_2r_lsb): Rename to *addc_2r_t.  Use treg_set_expr.  Add operand
    order variant.
    (*addc_negreg_t): New insn_and_split.
    (*subc): Convert to insn_and_split.  Use treg_set_expr as 3rd operand.
    Try to recombine with surrounding insns when splitting.
    Add operand order variants.  
    (*subc_negt_reg, *subc_negreg_t, *reg_lsb_t, *reg_msb_t): New
    insn_and_split patterns.
    (*rotcr): Use arith_reg_or_treg_set_expr.  Try to recombine with
    surrounding insns when splitting.
    (unnamed rotcr split): Use arith_reg_or_treg_set_expr.
    (*rotcl): Likewise.  Add zero_extract variant.
    (*ashrsi2_31): New insn_and_split.
    (*negc): Convert to insn_and_split.  Use treg_set_expr.
    (*zero_extend<mode>si2_disp_mem): Update comment.
    (movrt_negc, *movrt_negc, nott): Add !sh_in_recog_treg_set_expr split
    condition.
    (*mov_t_msb_neg, mov_neg_si_t): Use treg_set_expr.  Try to recombine
    with surrounding insns when splitting.
    (any_treg_expr_to_reg): New insn_and_split.
    (*neg_zero_extract_0, *neg_zero_extract_1, *neg_zero_extract_2,
    *neg_zero_extract_3, *neg_zero_extract_4, *neg_zero_extract_5,
    *neg_zero_extract_6, *zero_extract_0, *zero_extract_1,
    *zero_extract_2): New single bit zero extract patterns.
    (bld_reg, *bld_regqi): Fold into bld<mode>_reg.
    (*get_thread_pointersi, store_gbr, *mov<mode>_gbr_load,
    *mov<mode>_gbr_load, *mov<mode>_gbr_load, *mov<mode>_gbr_load,
    *movdi_gbr_load): Use arith_reg_dest instead of register_operand for
    set destination.
    (set_thread_pointersi, load_gbr): Use arith_reg_operand instead of
    register_operand for set source.

gcc/testsuite/
    PR target/49263
    PR target/53987
    PR target/64345
    PR target/59533
    PR target/52933
    PR target/54236
    PR target/51244
    * gcc.target/sh/pr64345-1.c: New.
    * gcc.target/sh/pr64345-2.c: New.
    * gcc.target/sh/pr59533-1.c: New.
    * gcc.target/sh/pr49263.c: Adjust matching of expected insns.
    * gcc.target/sh/pr52933-2.c: Likewise.
    * gcc.target/sh/pr54089-1.c: Likewise.
    * gcc.target/sh/pr54236-1.c: Likewise.
    * gcc.target/sh/pr51244-20-sh2a.c: Likewise.
    * gcc.target/sh/pr49263-1.c: Remove xfails.
    * gcc.target/sh/pr49263-2.c: Likewise.
    * gcc.target/sh/pr49263-3.c: Likewise.
    * gcc.target/sh/pr53987-1.c: Likewise.
    * gcc.target/sh/pr52933-1.c: Adjust matching of expected insns.
    (test_24, test_25, test_26, test_27, test_28, test_29, test_30): New.
    * gcc.target/sh/pr51244-12.c: Adjust matching of expected insns.
    (test05, test06, test07, test08, test09, test10, test11, test12): New.
    * gcc.target/sh/pr54236-3.c: Adjust matching of expected insns.
    (test_002, test_003, test_004, test_005, test_006, test_007, test_008,
    test_009): New.
    * gcc.target/sh/pr51244-4.c: Adjust matching of expected insns.
    (test_02): New.

Added:
    trunk/gcc/testsuite/gcc.target/sh/pr59533-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr64345-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr64345-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/predicates.md
    trunk/gcc/config/sh/sh-protos.h
    trunk/gcc/config/sh/sh.c
    trunk/gcc/config/sh/sh.md
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr49263-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr49263-2.c
    trunk/gcc/testsuite/gcc.target/sh/pr49263-3.c
    trunk/gcc/testsuite/gcc.target/sh/pr49263.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-12.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-20-sh2a.c
    trunk/gcc/testsuite/gcc.target/sh/pr51244-4.c
    trunk/gcc/testsuite/gcc.target/sh/pr52933-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr52933-2.c
    trunk/gcc/testsuite/gcc.target/sh/pr53987-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr54089-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr54236-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr54236-3.c


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/64345] [SH] Improve single bit extraction
  2014-12-17 22:21 [Bug target/64345] New: [SH] Improve single bit extraction olegendo at gcc dot gnu.org
  2015-01-24 13:05 ` [Bug target/64345] " olegendo at gcc dot gnu.org
@ 2015-07-19  8:54 ` olegendo at gcc dot gnu.org
  2015-07-19 13:11 ` olegendo at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-07-19  8:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64345

--- Comment #2 from Oleg Endo <olegendo at gcc dot gnu.org> ---
A recent change in the middle end has triggered this:

FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times negc 2

The test

int
test_07 (int *vec)
{
  /* Must not see a 'sett' or 'addc' here.
     This is a case where combine tries to produce
     'a + (0 - b) + 1' out of 'a - b + 1'.
     On non-SH2A there is a 'tst + negc', on SH2A a 'bld + movt'.  */
  int z = vec[0];
  int vi = vec[1];
  int zi = vec[2];

  if (zi != 0 && z < -1)
    vi -= (((vi >> 7) & 0x01) << 1) - 1;

  return vi;
}

fails to produce the expected tst + negc sequence.
A reduced case is:

int test (int vi)
{
  return vi - (((vi >> 6) & 0x01) << 1);
}

-m2a -O2 (gcc 5):
        bld     #6,r4
        movt    r0
        rts     
        add     r0,r0

trunk:
        mov     #-5,r1
        mov     r4,r0
        shld    r1,r0
        rts     
        and     #2,r0

-m4 -O2 (gcc 5):
        mov     r4,r0
        tst     #64,r0
        mov     #-1,r0
        negc    r0,r0
        rts     
        add     r0,r0

trunk:
        mov     #-5,r1
        mov     r4,r0
        shld    r1,r0
        rts     
        and     #2,r0


For the more complex expression

int test (int vi)
{
  return vi - (((vi >> 6) & 0x01) << 1) - 1;
}

-m4 -O2 (gcc 5):
        mov     r4,r0
        mov     #-1,r1
        tst     #64,r0
        negc    r1,r1
        add     r1,r1
        rts     
        subc    r1,r0

trunk:
        mov     #-5,r1
        mov     r4,r0
        shld    r1,r0
        sett
        and     #2,r0
        subc    r0,r4
        rts     
        mov     r4,r0

It seems the zero_extract patterns are not used for this anymore.  Instead
combine is looking for something like

(set (reg:SI 168 [ D.1649 ])
    (and:SI (lshiftrt:SI (reg:SI 4 r4 [ vi ])
            (const_int 5 [0x5]))
        (const_int 2 [0x2])))

This pattern should be added as a zero_extract variant.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/64345] [SH] Improve single bit extraction
  2014-12-17 22:21 [Bug target/64345] New: [SH] Improve single bit extraction olegendo at gcc dot gnu.org
  2015-01-24 13:05 ` [Bug target/64345] " olegendo at gcc dot gnu.org
  2015-07-19  8:54 ` olegendo at gcc dot gnu.org
@ 2015-07-19 13:11 ` olegendo at gcc dot gnu.org
  2015-09-19 10:01 ` olegendo at gcc dot gnu.org
  2015-09-21 13:49 ` olegendo at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-07-19 13:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64345

--- Comment #3 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #2)

The following pattern seems to fix the test case.

===================================================================
--- gcc/config/sh/sh.md (revision 225987)
+++ gcc/config/sh/sh.md (working copy)
@@ -14220,6 +14220,31 @@
   [(set (reg:SI T_REG)
        (zero_extract:SI (match_dup 0) (const_int 1) (match_dup 1)))])

+(define_insn_and_split "*zero_extract_3"
+  [(set (match_operand:SI 0 "arith_reg_dest")
+       (and:SI (lshiftrt:SI (match_operand:SI 1 "arith_reg_operand")
+                            (match_operand 2 "const_int_operand"))
+               (match_operand 3 "const_int_operand")))
+   (clobber (reg:SI T_REG))]
+  "TARGET_SH1 && can_create_pseudo_p ()
+   && exact_log2 (INTVAL (operands[3])) >= 0"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  int rshift = INTVAL (operands[2]);
+  int lshift = exact_log2 (INTVAL (operands[3]));
+
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_rtx_PARALLEL (VOIDmode,
+    gen_rtvec (2,
+      gen_rtx_SET (tmp,
+                  gen_rtx_ZERO_EXTRACT (SImode, operands[1], const1_rtx,
+                                        GEN_INT (rshift + lshift))),
+      gen_rtx_CLOBBER (VOIDmode, get_t_reg_rtx ()))));
+  emit_insn (gen_ashlsi3 (operands[0], tmp, GEN_INT (lshift)));
+})
+
 ;; -------------------------------------------------------------------------
 ;; SH2A instructions for bitwise operations.
 ;; FIXME: Convert multiple instruction insns to insn_and_split.


However, in some cases it might indeed be better to emit a shift-and sequence,
especially if the shift value is 1,2,8,16.
For instance

  return (((vi >> 6) & 0x01) << 3);

results in

        mov     r4,r0
        tst     #64,r0
        mov     #-1,r0
        negc    r0,r0
        shll2   r0
        rts     
        add     r0,r0

which is better as:
        mov     r4,r0
        shll2   r0
        shlr    r0
        rts
        and     #8,r0

Even more so when the bit test constant doesn't fit into 8 bits:

  return (((vi >> 10) & 0x01) << 1);

        mov.w   .L2,r1
        mov     #-1,r0
        tst     r1,r4
        negc    r0,r0
        rts     
        add     r0,r0
        .align 1
.L2:
        .short  1024

better:
        mov     r4,r0
        shlr8   r0
        shlr    r0
        rts
        and     #2,r0

Generally, it seems better to use an shift-and sequence, unless the tst-negc
sequence can simplify the trailing shift for SH variants without dynamic
shifts.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/64345] [SH] Improve single bit extraction
  2014-12-17 22:21 [Bug target/64345] New: [SH] Improve single bit extraction olegendo at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2015-07-19 13:11 ` olegendo at gcc dot gnu.org
@ 2015-09-19 10:01 ` olegendo at gcc dot gnu.org
  2015-09-21 13:49 ` olegendo at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-09-19 10:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64345

--- Comment #4 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Oleg Endo from comment #2)
> A recent change in the middle end has triggered this:
> 
> FAIL: gcc.target/sh/pr54236-1.c scan-assembler-times negc 2
> 

I've created a new PR 67636 for this.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/64345] [SH] Improve single bit extraction
  2014-12-17 22:21 [Bug target/64345] New: [SH] Improve single bit extraction olegendo at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2015-09-19 10:01 ` olegendo at gcc dot gnu.org
@ 2015-09-21 13:49 ` olegendo at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-09-21 13:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64345

--- Comment #5 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Mon Sep 21 13:49:07 2015
New Revision: 227971

URL: https://gcc.gnu.org/viewcvs?rev=227971&root=gcc&view=rev
Log:
testsuite/
        PR target/64345
        * gcc.target/sh/pr64345-1.c: Adjust expected insn counts for SH2A.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr64345-1.c


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-09-21 13:49 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-17 22:21 [Bug target/64345] New: [SH] Improve single bit extraction olegendo at gcc dot gnu.org
2015-01-24 13:05 ` [Bug target/64345] " olegendo at gcc dot gnu.org
2015-07-19  8:54 ` olegendo at gcc dot gnu.org
2015-07-19 13:11 ` olegendo at gcc dot gnu.org
2015-09-19 10:01 ` olegendo at gcc dot gnu.org
2015-09-21 13:49 ` olegendo at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).