public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/61241] New: built-in memset makes the caller function slower than normal memset
From: ma.jiang at zte dot com.cn @ 2014-05-20 1:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61241
Bug ID: 61241
Summary: built-in memset makes the caller function slower than
normal memset
Product: gcc
Version: 4.10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ma.jiang at zte dot com.cn
Compiling the following test case with -O2:
#include <string.h>
extern int off;
void *test(char *a1, char* a2)
{
  memset(a2, 123, 123);
  return a2 + off;
}
produces the following assembly:
mov ip, r1
mov r1, #123
stmfd sp!, {r3, lr}
mov r0, ip
mov r2, r1
bl memset
movw r3, #:lower16:off
movt r3, #:upper16:off
mov ip, r0
ldr r0, [r3]
add r0, ip, r0
ldmfd sp!, {r3, pc}
After adding -fno-builtin, the assembly code becomes shorter:
stmfd sp!, {r4, lr}
mov r4, r1
mov r1, #123
mov r0, r4
mov r2, r1
bl memset
movw r3, #:lower16:off
movt r3, #:upper16:off
ldr r0, [r3]
add r0, r4, r0
ldmfd sp!, {r4, pc}
One reason is that the ARM EABI requires the stack to be 8-byte aligned, so
the compiler pushes an otherwise meaningless r3. But that is not the most
important reason.
When the built-in memset is used, IRA can know that memset does not change the
value of r0 (memset returns its first argument). Choosing r0 instead of ip is
then clearly more profitable, because this choice gets rid of the redundant
"mov ip,r0; mov r0,ip;" pair.
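To illustrate the point about r0 at the C level: the C standard specifies that memset returns its first argument, which is why the destination pointer is still available in r0 after the call. The following is a minimal sketch, assuming a stand-in definition of off (the report only declares it extern) and a hypothetical variant named test_via_return that is not part of the report or of the patch.

```c
#include <string.h>

/* Stand-in definition so the sketch links on its own (assumption);
   in the report `off` is only declared extern. */
int off = 4;

/* The function from the report: a2 must survive the call to memset. */
void *test(char *a1, char *a2)
{
    (void)a1;                 /* unused, as in the report */
    memset(a2, 123, 123);
    return a2 + off;
}

/* Hypothetical equivalent: memset returns its first argument, so its
   return value can be used in place of a2 after the call. */
void *test_via_return(char *a1, char *a2)
{
    (void)a1;
    return (char *)memset(a2, 123, 123) + off;
}
```

Both functions return the same pointer for the same buffer. With the built-in memset the compiler is allowed to rely on this equivalence itself, which is what makes keeping a2 in r0 across the call attractive.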
For this rtl sequence:
(insn 7 8 9 2 (set (reg:SI 0 r0)
(reg/v/f:SI 115 [ a2 ])) open_test.c:5 186 {*arm_movsi_insn}
(nil))
(insn 9 7 10 2 (set (reg:SI 2 r2)
(reg:SI 1 r1)) open_test.c:5 186 {*arm_movsi_insn}
(expr_list:REG_EQUAL (const_int 123 [0x7b])
(nil)))
(call_insn 10 9 24 2 (parallel [
(set (reg:SI 0 r0)
(call (mem:SI (symbol_ref:SI ("memset") [flags 0x41]
<function_decl 0xb7d72500 memset>) [0 __builtin_memset S4 A32])
(const_int 0 [0])))
(use (const_int 0 [0]))
(clobber (reg:SI 14 lr))
]) open_test.c:5 251 {*call_value_symbol}
(expr_list:REG_RETURNED (reg/v/f:SI 115 [ a2 ])
(expr_list:REG_DEAD (reg:SI 2 r2)
(expr_list:REG_DEAD (reg:SI 1 r1)
(expr_list:REG_UNUSED (reg:SI 0 r0)
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil))))))
(expr_list:REG_CFA_WINDOW_SAVE (set (reg:SI 0 r0)
(reg:SI 0 r0))
(expr_list:REG_CFA_WINDOW_SAVE (use (reg:SI 2 r2))
(expr_list:REG_CFA_WINDOW_SAVE (use (reg:SI 1 r1))
(expr_list:REG_CFA_WINDOW_SAVE (use (reg:SI 0 r0))
(nil))))))
Assigning r0 to r115 is blocked by two pieces of code in
process_bb_node_lives (in ira-lives.c).
1:
call_p = CALL_P (insn);
for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
if (!call_p || !DF_REF_FLAGS_IS_SET (*def_rec, DF_REF_MAY_CLOBBER))
mark_ref_live (*def_rec);
2:
/* Mark each used value as live. */
for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
mark_ref_live (*use_rec);
In piece 1, "set (reg:SI 0) (reg/v/f:SI 115)" makes r0 conflict with r115
while r115 is live. This is not necessary: if r115 were assigned r0, the insn
would become "set (reg:SI 0) (reg:SI 0)", which cannot hurt any other
instruction. Making r0 conflict with every live pseudo register loses the
chance to optimize away a set instruction. I think that, at least for a simple
single set, we should not make the source register conflict with the
destination register when one of them is a hard register and the other is not.
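The rule suggested for piece 1 resembles the classic treatment of copies when building an interference graph: the destination of a plain copy need not conflict with its source, since both hold the same value at that point. The following is a toy model only, not GCC's data structures and not the attached patch; struct toy_insn, struct toy_graph and build_conflicts are hypothetical names, and registers are plain small integers.

```c
#include <stdbool.h>
#include <stddef.h>

enum { NREGS = 8 };

/* Toy instruction: dst = src (is_copy) or dst = op(src). Not GCC's RTL. */
struct toy_insn {
    int dst;
    int src;
    bool is_copy;
};

/* conflict[a][b] is true when a and b may not share a register. */
struct toy_graph {
    bool conflict[NREGS][NREGS];
};

static void add_conflict(struct toy_graph *g, int a, int b)
{
    if (a != b) {
        g->conflict[a][b] = true;
        g->conflict[b][a] = true;
    }
}

/* Walk the block backwards. On entry `live` marks the registers that
   are live after the last instruction; on exit it marks the registers
   that are live before the first one. */
void build_conflicts(struct toy_graph *g, const struct toy_insn *insns,
                     size_t n, bool live[NREGS])
{
    for (size_t i = n; i-- > 0; ) {
        const struct toy_insn *in = &insns[i];

        for (int r = 0; r < NREGS; r++) {
            if (!live[r])
                continue;
            /* A definition conflicts with everything live across it,
               except the source of a plain copy: both hold one value. */
            if (in->is_copy && r == in->src)
                continue;
            add_conflict(g, in->dst, r);
        }
        live[in->dst] = false;  /* dst is dead before its definition */
        live[in->src] = true;   /* src is live before its use */
    }
}
```

With register 0 standing for r0 and register 5 for pseudo 115, processing the single copy "0 = 5" while both are live afterwards records no conflict between them, so an allocator would remain free to give both the same hard register. Without the copy exception the same walk records a conflict between 0 and 5.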
In piece 2, after the call to memset, r0 becomes live and then conflicts with
the live r115. This code neglects that r115 is the result of
find_call_crossed_cheap_reg, and that r115 in fact holds the same value as r0.
As discussed above, these two pieces of code prevent IRA from making a more
profitable choice. I have built a patch to fix this problem. With the patch,
the assembly code with the built-in memset becomes shorter than with the
normal memset:
mov r0, r1
mov r1, #123
stmfd sp!, {r3, lr}
mov r2, r1
bl memset
movw r3, #:lower16:off
movt r3, #:upper16:off
ldr r3, [r3]
add r0, r0, r3
ldmfd sp!, {r3, pc}
I have run a bootstrap and "make check" on x86; nothing changes with the
patch. Is the patch OK for trunk?
* [Bug rtl-optimization/61241] built-in memset makes the caller function slower than normal memset
From: ma.jiang at zte dot com.cn @ 2014-05-20 1:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61241
--- Comment #1 from ma.jiang at zte dot com.cn ---
Created attachment 32822
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32822&action=edit
proposed patch
* [Bug rtl-optimization/61241] built-in memset makes the caller function slower than normal memset
From: ma.jiang at zte dot com.cn @ 2014-05-20 1:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61241
--- Comment #2 from ma.jiang at zte dot com.cn ---
Created attachment 32823
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32823&action=edit
testcase
It should be put into gcc/testsuite/gcc.target/arm.
* [Bug rtl-optimization/61241] built-in memset makes the caller function slower
From: ktkachov at gcc dot gnu.org @ 2014-05-20 8:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61241
ktkachov at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ktkachov at gcc dot gnu.org
--- Comment #3 from ktkachov at gcc dot gnu.org ---
Can you please send the patch to gcc-patches@gcc.gnu.org including a ChangeLog?
* [Bug rtl-optimization/61241] built-in memset makes the caller function slower
From: ma.jiang at zte dot com.cn @ 2014-05-20 14:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61241
--- Comment #4 from ma.jiang at zte dot com.cn ---
(In reply to ktkachov from comment #3)
> Can you please send the patch to gcc-patches@gcc.gnu.org including a
> ChangeLog
Done! Thanks.