public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/60884] New: [SH] improve inlined strlen-like builtin functions
@ 2014-04-18  9:52 olegendo at gcc dot gnu.org
  2014-05-03 22:01 ` [Bug target/60884] " olegendo at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-04-18  9:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60884

            Bug ID: 60884
           Summary: [SH] improve inlined strlen-like builtin functions
           Product: gcc
           Version: 4.10.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: olegendo at gcc dot gnu.org
            Target: sh*-*-*

This is a carry over from the proposed improvement to the strlen-like builtin
functions by Christian.  The idea is to unroll some of the loops that are
expanded:
http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01406.html

As mentioned in
http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01687.html

maybe the unrolling should be done at -O3 and not at -O2, since it increases
code size in some places, but does not seem to lead to significant speed
improvements (~ 1% as measured by Christian).


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/60884] [SH] improve inlined strlen-like builtin functions
  2014-04-18  9:52 [Bug target/60884] New: [SH] improve inlined strlen-like builtin functions olegendo at gcc dot gnu.org
@ 2014-05-03 22:01 ` olegendo at gcc dot gnu.org
  2014-05-04 16:26 ` olegendo at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-05-03 22:01 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60884

Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |christian.bruel at st dot com

--- Comment #1 from Oleg Endo <olegendo at gcc dot gnu.org> ---
With the following patch applied to current trunk (r210026)

Index: gcc/config/sh/sh-mem.cc
===================================================================
--- gcc/config/sh/sh-mem.cc    (revision 210037)
+++ gcc/config/sh/sh-mem.cc    (working copy)
@@ -568,7 +568,7 @@

   addr1 = adjust_automodify_address (addr1, SImode, current_addr, 0);

-  /*start long loop.  */
+  /* start long loop.  */
   emit_label (L_loop_long);

   /* tmp1 is aligned, OK to load.  */
@@ -589,29 +589,15 @@
   addr1 = adjust_address (addr1, QImode, 0);

   /* unroll remaining bytes.  */
-  emit_insn (gen_extendqisi2 (tmp1, addr1));
-  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
-  jump = emit_jump_insn (gen_branch_true (L_return));
-  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+  for (int i = 0; i < 4; ++i)
+    {
+      emit_insn (gen_extendqisi2 (tmp1, addr1));
+      emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
+      emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
+      jump = emit_jump_insn (gen_branch_true (L_return));
+      add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+    }

-  emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
-
-  emit_insn (gen_extendqisi2 (tmp1, addr1));
-  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
-  jump = emit_jump_insn (gen_branch_true (L_return));
-  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
-
-  emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
-
-  emit_insn (gen_extendqisi2 (tmp1, addr1));
-  emit_insn (gen_cmpeqsi_t (tmp1, const0_rtx));
-  jump = emit_jump_insn (gen_branch_true (L_return));
-  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
-
-  emit_move_insn (current_addr, plus_constant (Pmode, current_addr, 1));
-
-  emit_insn (gen_extendqisi2 (tmp1, addr1));
-  jump = emit_jump_insn (gen_jump_compact (L_return));
   emit_barrier_after (jump);

   /* start byte loop.  */
@@ -626,10 +612,9 @@

   /* end loop.  */

-  emit_insn (gen_addsi3 (start_addr, start_addr, GEN_INT (1)));
-
   emit_label (L_return);

+  emit_insn (gen_addsi3 (start_addr, start_addr, GEN_INT (1)));
   emit_insn (gen_subsi3 (operands[0], current_addr, start_addr));

   return true;


I get the following when compiling

unsigned int test (const char* x)
{
  return __builtin_strlen (x);
}

with -O2 -m4:
_test:
    mov    r4,r0
    tst    #3,r0
    bf/s    .L12
    mov    r4,r1
    mov    #0,r3
.L4:
    mov.l    @r1+,r2
    cmp/str    r3,r2
    bf    .L4
    add    #-4,r1
    mov.b    @r1+,r2
    tst    r2,r2
    bt    .L2
    mov.b    @r1+,r2
    tst    r2,r2
    bt    .L2
    mov.b    @r1+,r2
    tst    r2,r2
    mov    #-1,r2
    negc    r2,r2
    add    r2,r1
.L2:
    mov    r1,r0
    rts
    subc    r4,r0
    .align 1
.L12:
    mov.b    @r1+,r2
    tst    r2,r2
    bf/s    .L12
    mov    r1,r0
    rts
    subc    r4,r0

which is 5 insns shorter than the currently expanded sequence.
It seems that other optimizers are able to figure out that the 4th byte load is
not needed and eliminate it.
Moving the 'emit_insn (gen_addsi3 ...' after the return label allows it to
utilize the subc insn, which is difficult to get otherwise, as combine looks
only at one BB at a time.

Christian, what do you think?  Any objections?


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/60884] [SH] improve inlined strlen-like builtin functions
  2014-04-18  9:52 [Bug target/60884] New: [SH] improve inlined strlen-like builtin functions olegendo at gcc dot gnu.org
  2014-05-03 22:01 ` [Bug target/60884] " olegendo at gcc dot gnu.org
@ 2014-05-04 16:26 ` olegendo at gcc dot gnu.org
  2014-05-05 12:23 ` chrbr at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-05-04 16:26 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60884

--- Comment #2 from Oleg Endo <olegendo at gcc dot gnu.org> ---
I was trying to see how to implement the strchr builtin function which could
also utilize the cmp/str insn.  However, it seems that the necessary builtin
expansion code for strchr is not there (yet).


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/60884] [SH] improve inlined strlen-like builtin functions
  2014-04-18  9:52 [Bug target/60884] New: [SH] improve inlined strlen-like builtin functions olegendo at gcc dot gnu.org
  2014-05-03 22:01 ` [Bug target/60884] " olegendo at gcc dot gnu.org
  2014-05-04 16:26 ` olegendo at gcc dot gnu.org
@ 2014-05-05 12:23 ` chrbr at gcc dot gnu.org
  2014-05-07 20:09 ` olegendo at gcc dot gnu.org
  2015-01-17 23:34 ` olegendo at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: chrbr at gcc dot gnu.org @ 2014-05-05 12:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60884

chrbr at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chrbr at gcc dot gnu.org

--- Comment #3 from chrbr at gcc dot gnu.org ---

> 
> Christian, what do you think?  Any objections?

yes, nice catch to use the subc to swallow the last iteration. post-inc is best
this way indeed, thanks.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/60884] [SH] improve inlined strlen-like builtin functions
  2014-04-18  9:52 [Bug target/60884] New: [SH] improve inlined strlen-like builtin functions olegendo at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2014-05-05 12:23 ` chrbr at gcc dot gnu.org
@ 2014-05-07 20:09 ` olegendo at gcc dot gnu.org
  2015-01-17 23:34 ` olegendo at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-05-07 20:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60884

--- Comment #4 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Author: olegendo
Date: Wed May  7 20:08:23 2014
New Revision: 210187

URL: http://gcc.gnu.org/viewcvs?rev=210187&root=gcc&view=rev
Log:
gcc/
    PR target/60884
    * config/sh/sh-mem.cc (sh_expand_strlen): Use loop when emitting
    unrolled byte insns.  Emit address increments after move insns.

gcc/testsuite/
    PR target/60884
    * gcc.target/sh/pr53976-1.c (test_02): Remove inappropriate test case.
    (test_03): Rename to test_02.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/sh/sh-mem.cc
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/sh/pr53976-1.c


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/60884] [SH] improve inlined strlen-like builtin functions
  2014-04-18  9:52 [Bug target/60884] New: [SH] improve inlined strlen-like builtin functions olegendo at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2014-05-07 20:09 ` olegendo at gcc dot gnu.org
@ 2015-01-17 23:34 ` olegendo at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-01-17 23:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60884

--- Comment #5 from Oleg Endo <olegendo at gcc dot gnu.org> ---
The test case gcc.target/sh/memset.c:

void
test00(char *dstb)
{
  __builtin_memset (dstb, 0, 15);
}


compiles to:
        mov     r4,r0
        tst     #3,r0
        mov     #0,r1
        bf/s    .L5
        mov     #15,r2
        mov     #3,r2
.L3:
        mov.l   r1,@r4   << loop runs 3x.
        dt      r2       << better emit 3x mov.l
        bf/s    .L3
        add     #4,r4

        mov.b   r1,@r4
        add     #1,r4
        mov.b   r1,@r4
        add     #1,r4
        rts
        mov.b   r1,@r4
        .align 1
.L5:
        mov.b   r1,@r4
        dt      r2
        bf/s    .L5
        add     #1,r4
        rts
        nop

Especially when the number of the iterations is known, we should try to unroll
the loops.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-01-17 23:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-18  9:52 [Bug target/60884] New: [SH] improve inlined strlen-like builtin functions olegendo at gcc dot gnu.org
2014-05-03 22:01 ` [Bug target/60884] " olegendo at gcc dot gnu.org
2014-05-04 16:26 ` olegendo at gcc dot gnu.org
2014-05-05 12:23 ` chrbr at gcc dot gnu.org
2014-05-07 20:09 ` olegendo at gcc dot gnu.org
2015-01-17 23:34 ` olegendo at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).