public inbox for gcc-bugs@sourceware.org
* [Bug target/67782] New: [SH] Improve bit tests of values loaded from memory
@ 2015-09-30 13:35 olegendo at gcc dot gnu.org
From: olegendo at gcc dot gnu.org @ 2015-09-30 13:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67782
Bug ID: 67782
Summary: [SH] Improve bit tests of values loaded from memory
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: olegendo at gcc dot gnu.org
Target Milestone: ---
Target: sh*-*-*
The following example
int test (int* x)
{
return (*x & (1 << 14)) == 0;
}
compiled with -O2 -m4 -ml:
mov.l @r4,r1
mov.w .L2,r2
tst r2,r1
rts
movt r0
.align 1
.L2:
.short 16384
compiled with -Os -m4 -ml (uses some constant optimization in tstsi_t pattern):
mov.l @r4,r0
swap.b r0,r0
tst #64,r0
rts
movt r0
Instead of loading the whole 32-bit value from memory, loading one byte is
enough:
mov.b @(1,r4),r0
tst #64,r0
rts
movt r0
Because the value has to go into R0 anyway, using a displacement mov.b is OK,
as long as the displacement is in range and no further address calculations
are needed (e.g. for an addressing mode other than displacement addressing).
If the constant is not shared with anything else, this can be a win.
Actually the SLOW_BYTE_ACCESS macro has a similar effect. Defining it to 0
makes some optimizations try transformations like the one above. Although that
particular case doesn't see any improvement, there are some hits in the CSiBE
set. For example in Linux's tcp_input.c:
SLOW_BYTE_ACCESS = 1:
mov.l @(32,r4),r3
mov.l @(12,r3),r3
tst r10,r3
SLOW_BYTE_ACCESS = 0:
mov.l @(32,r4),r0
mov.b @(13,r0),r0
tst #192,r0
tcp_input.c seems to have quite a few such cases. There are also other
cases, like in binfmt_script.s:
SLOW_BYTE_ACCESS = 1:
mov.l @r4,r1
add #-68,r15
mov.w .L54,r2
extu.w r1,r1
cmp/eq r2,r1
bf/s .L67
SLOW_BYTE_ACCESS = 0:
mov.w .L54,r1
add #-68,r15
mov.w @r4,r2
cmp/eq r1,r2
bf/s .L67
However, overall the code seems to get a bit worse. It seems this kind of
transformation has to be done with a bit more context taken into account. One
idea would be to do it later, before/during peephole2, although then
utilizing tst #imm,R0 might be difficult.
It would also be possible to do this during combine by using some special
patterns/predicates that accept a memory operand before register allocation,
and split out the memory load in split1. However, there are quite a few
patterns involved and the final tstsi_t pattern is formed during split1. So
maybe tstsi_t can be extended to look for a memory load of the operand and its
addressing mode, and convert the memory load accordingly. Although that
wouldn't catch the cmp/eq case above.