public inbox for gcc-bugs@sourceware.org
* [Bug target/67782] New: [SH] Improve bit tests of values loaded from memory
@ 2015-09-30 13:35 olegendo at gcc dot gnu.org
From: olegendo at gcc dot gnu.org @ 2015-09-30 13:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67782
Bug ID: 67782
Summary: [SH] Improve bit tests of values loaded from memory
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: olegendo at gcc dot gnu.org
Target Milestone: ---
Target: sh*-*-*
The following example
int test (int* x)
{
return (*x & (1 << 14)) == 0;
}
compiled with -O2 -m4 -ml:
mov.l @r4,r1
mov.w .L2,r2
tst r2,r1
rts
movt r0
.align 1
.L2:
.short 16384
compiled with -Os -m4 -ml (uses some constant optimization in tstsi_t pattern):
mov.l @r4,r0
swap.b r0,r0
tst #64,r0
rts
movt r0
Instead of loading the whole 32-bit value from memory, loading one byte is
enough:
mov.b @(1,r4),r0
tst #64,r0
rts
movt r0
Because the value has to go into R0 anyway, using a displacement mov.b is OK,
as long as the displacement is in range and no further address calculations
are needed (e.g. for an addressing mode other than displacement addressing).
If the constant is not shared with anything else, this can be a win.
Actually the SLOW_BYTE_ACCESS macro has a similar effect. Defining it to 0
makes some optimizations try transformations like the one above. Although that
particular case doesn't see any improvement, there are some hits in the CSiBE
set. For example in Linux's tcp_input.c:
SLOW_BYTE_ACCESS = 1:
mov.l @(32,r4),r3
mov.l @(12,r3),r3
tst r10,r3
SLOW_BYTE_ACCESS = 0:
mov.l @(32,r4),r0
mov.b @(13,r0),r0
tst #192,r0
tcp_input.c seems to have quite a few such cases. There are also other
cases, like in binfmt_script.s:
SLOW_BYTE_ACCESS = 1:
mov.l @r4,r1
add #-68,r15
mov.w .L54,r2
extu.w r1,r1
cmp/eq r2,r1
bf/s .L67
SLOW_BYTE_ACCESS = 0:
mov.w .L54,r1
add #-68,r15
mov.w @r4,r2
cmp/eq r1,r2
bf/s .L67
However, overall the code seems to get a bit worse. It seems this kind of
transformation has to be done with a bit more context taken into account. One
idea would be to do it later, before/during peephole2, although then
utilizing tst #imm,R0 might be difficult.
It would also be possible to do this during combine by using some special
patterns/predicates that accept a memory operand before register allocation,
and split out the memory load in split1. However, there are quite a few
patterns involved and the final tstsi_t pattern is formed during split1. So
maybe tstsi_t can be extended to look for a memory load of the operand and its
addressing mode, and convert the memory load accordingly. Although that
wouldn't catch the cmp/eq case above.