public inbox for gcc-bugs@sourceware.org
* [Bug target/47764] New: The constant load instruction should be hoisted out of loop
@ 2011-02-16 7:35 carrot at google dot com
2011-02-18 11:49 ` [Bug target/47764] " ibolton at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: carrot at google dot com @ 2011-02-16 7:35 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47764
Summary: The constant load instruction should be hoisted out of loop
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: carrot@google.com
Target: arm-linux-androideabi
Created attachment 23359
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23359
testcase
The attached test case is extracted from zlib. Compile it with options
-march=armv7-a -mthumb -Os, gcc 4.6 generates:
init_block:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        movs    r3, #0
.L2:
        adds    r2, r0, r3
        adds    r3, r3, #4
        movs    r1, #0                  // A
        cmp     r3, #1144
        strh    r1, [r2, #60]   @ movhi // B
        bne     .L2
        movs    r3, #0
.L3:
        adds    r2, r0, r3
        adds    r3, r3, #4
        movs    r1, #0                  // C
        cmp     r3, #120
        strh    r1, [r2, #2352] @ movhi
        bne     .L3
        movs    r2, #0
.L4:
        adds    r1, r0, r2
        adds    r2, r2, #4
        movs    r3, #0                  // D
        cmp     r2, #76
        strh    r3, [r1, #2596] @ movhi
        bne     .L4
        movs    r2, #1
        str     r3, [r0, #2760]
        strh    r2, [r0, #1084] @ movhi
        str     r3, [r0, #2756]
        str     r3, [r0, #2764]
        str     r3, [r0, #2752]
        bx      lr
Note that instruction A in loop L2 loads the constant 0 into register r1, and
instruction B then stores r1 to memory. There is no other use of r1 in the loop,
so it would be better to move instruction A out of the loop.
Similarly, instruction C can be moved out of loop L3. In fact it can be removed
entirely, since r1 already contains 0 after instruction A and no later
instruction modifies it.
Similarly, instruction D can be moved out of loop L4. It can also be removed if
we exchange the register usage of r1 and r3 in loop L4.
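For readers without the attachment, the source shape behind this assembly can be sketched as below. This is a simplified stand-in, not the attached test case: the struct `state`, its field names, and its layout are assumptions made for illustration (the real `deflate_state` has many more fields and different offsets). The loop bounds 286, 30 and 19 and the index 256 are inferred from the assembly above (1144/4, 120/4, 76/4, and (1084 - 60)/4).

```c
/* Hypothetical, simplified model of the zlib data involved.
   Names and layout are illustrative assumptions only. */
#define L_CODES   286   /* inferred from: cmp r3, #1144 (1144 / 4) */
#define D_CODES   30    /* inferred from: cmp r3, #120  (120 / 4)  */
#define BL_CODES  19    /* inferred from: cmp r2, #76   (76 / 4)   */
#define END_BLOCK 256   /* inferred from: strh r2, [r0, #1084]     */

struct ct_data {
    unsigned short freq;   /* the halfword cleared by each strh */
    unsigned short code;   /* untouched, hence the stride of 4  */
};

struct state {
    struct ct_data dyn_ltree[L_CODES];
    struct ct_data dyn_dtree[D_CODES];
    struct ct_data bl_tree[BL_CODES];
    unsigned long opt_len, static_len;
    unsigned int last_lit, matches;
};

/* Three loops that each store the same constant 0, followed by a few
   scalar stores.  The constant is loop invariant in every loop. */
void init_block(struct state *s)
{
    int n;

    for (n = 0; n < L_CODES; n++)
        s->dyn_ltree[n].freq = 0;      /* loop .L2 */
    for (n = 0; n < D_CODES; n++)
        s->dyn_dtree[n].freq = 0;      /* loop .L3 */
    for (n = 0; n < BL_CODES; n++)
        s->bl_tree[n].freq = 0;        /* loop .L4 */

    s->dyn_ltree[END_BLOCK].freq = 1;
    s->opt_len = s->static_len = 0;
    s->last_lit = s->matches = 0;
}
```

Every store in the three loops writes the same value, so a single `movs rN, #0` before `.L2` would be enough for all of them, which is the point of instructions A, C and D above.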
* [Bug target/47764] The constant load instruction should be hoisted out of loop
From: ibolton at gcc dot gnu.org @ 2011-02-18 11:49 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47764
Ian Bolton <ibolton at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011.02.18 11:36:27
                 CC|                            |ibolton at gcc dot gnu.org
     Ever Confirmed|0                           |1
      Known to fail|                            |4.6.0
--- Comment #1 from Ian Bolton <ibolton at gcc dot gnu.org> 2011-02-18 11:36:27 UTC ---
I have confirmed this for r170052 of trunk.
Any ideas of how this improvement could be implemented, Carrot?
* [Bug target/47764] The constant load instruction should be hoisted out of loop
From: pinskia at gcc dot gnu.org @ 2011-02-19 7:06 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47764
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-02-19 03:23:51 UTC ---
This is most likely a cost issue.
* [Bug target/47764] The constant load instruction should be hoisted out of loop
From: carrot at google dot com @ 2011-02-21 4:10 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47764
--- Comment #3 from Carrot <carrot at google dot com> 2011-02-21 03:15:45 UTC ---
> Any ideas of how this improvement could be implemented, Carrot?
The root cause of this problem is that the arm/thumb store instruction can't
store an immediate value directly to memory, but gcc doesn't realize this early
enough. Through most of the rtl phase, the following form is kept:
(insn 41 38 42 3 (set (mem:HI (plus:SI (reg/f:SI 169)
                (const_int 60 [0x3c])) [2 MEM[(struct deflate_state *)D.2085_3 + 60B]+0 S2 A16])
        (const_int 0 [0])) src/trees.c:45 696 {*thumb2_movhi_insn}
     (expr_list:REG_DEAD (reg/f:SI 169)
        (nil)))
Only at register allocation does gcc discover the restriction on the store
instruction and split it into two instructions: load 0 into a register, then
store that register to memory. By then it is too late for loop optimization.
One possible method is to split this insn before loop optimization (maybe
directly in the expand pass), and let the loop and cse optimizations do the
rest. This may increase register pressure in parts of the program, so we
should rematerialize the constant in such cases.
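As a source-level analogy of the three forms involved, here is a minimal sketch. The function names are made up for illustration, and a real compiler will normally generate the same code for all three; the point is only to show where the constant is materialized in each form, not to reproduce the bug from C.

```c
#include <stddef.h>

/* Form kept through most of the rtl phase: the store takes the
   constant directly, so there is no separate insn to hoist. */
void clear_store_imm(unsigned short *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = 0;
}

/* Form after a late split (at register allocation): the constant is
   materialized inside the loop body on every iteration. */
void clear_split_in_loop(unsigned short *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned short zero = 0;   /* corresponds to: movs r1, #0 */
        p[i] = zero;               /* corresponds to: strh r1, [...] */
    }
}

/* Form that an early split would expose to loop invariant motion:
   the constant is materialized once, before the loop. */
void clear_hoisted(unsigned short *p, size_t n)
{
    unsigned short zero = 0;       /* executed once */
    for (size_t i = 0; i < n; i++)
        p[i] = zero;
}
```

With the split done early, the second form exists while the loop passes still run, so they can turn it into the third. The cost is one register held live across the loop, which is where the rematerialization concern comes from.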
* [Bug target/47764] The constant load instruction should be hoisted out of loop
From: ramana at gcc dot gnu.org @ 2011-08-10 22:18 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47764
Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ramana at gcc dot gnu.org
--- Comment #4 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2011-08-10 22:17:23 UTC ---
(In reply to comment #3)
> > Any ideas of how this improvement could be implemented, Carrot?
>
> The root cause of this problem is that arm/thumb store instruction can't
> directly store a immediate number to memory, but gcc doesn't realize this early
> enough. In most part of the rtl phase, the following form is kept.
>
> (insn 41 38 42 3 (set (mem:HI (plus:SI (reg/f:SI 169)
> (const_int 60 [0x3c])) [2 MEM[(struct deflate_state *)D.2085
> _3 + 60B]+0 S2 A16])
> (const_int 0 [0])) src/trees.c:45 696 {*thumb2_movhi_insn}
> (expr_list:REG_DEAD (reg/f:SI 169)
> (nil)))
>
> Until register allocation it finds the restriction of the store instruction and
> split it into two instructions, load 0 into register and store register to
> memory. But it's too late to do a loop optimization.
Eh, how is splitting this early going to help with hoisting this out of a loop?
Ramana
* [Bug rtl-optimization/47764] The constant load instruction should be hoisted out of loop
From: ubizjak at gmail dot com @ 2013-01-24 7:25 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47764
Uros Bizjak <ubizjak at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|arm-linux-androideabi       |
                 CC|                            |ubizjak at gmail dot com
          Component|target                      |rtl-optimization
      Known to fail|                            |4.7.0, 4.8.0
--- Comment #5 from Uros Bizjak <ubizjak at gmail dot com> 2013-01-24 07:25:04 UTC ---
This is a problem with rtl-optimization, in the gcse2 pass.
The testcase below also fails on x86_64 with 4.8 [1], which removes the (!o,F)
alternative.
For comparison, the following test, when compiled with -O3, does have its
memory load hoisted out of the loop:
--cut here--
volatile double y;
void
test ()
{
  int z;
  for (z = 0; z < 1000; z++)
    y = 0.1;
}
--cut here--
_.210r.postreload:
15: L15:
8: NOTE_INSN_BASIC_BLOCK 3
23: xmm0:DF=[`*.LC0']
10: [`y']=xmm0:DF
REG_DEAD xmm0:DF
11: NOTE_INSN_DELETED
12: {flags:CCZ=cmp(ax:SI-0x1,0);ax:SI=ax:SI-0x1;}
13: pc={(flags:CCZ!=0)?L15:pc}
REG_BR_PROB 0x26ab
_.211r.gcse2:
26: xmm0:DF=[`*.LC0']
15: L15:
8: NOTE_INSN_BASIC_BLOCK 3
10: [`y']=xmm0:DF
REG_DEAD xmm0:DF
11: NOTE_INSN_DELETED
12: {flags:CCZ=cmp(ax:SI-0x1,0);ax:SI=ax:SI-0x1;}
13: pc={(flags:CCZ!=0)?L15:pc}
REG_BR_PROB 0x26ab
However, when the constant is changed to 0.0 (so that it can be loaded directly
into an %xmm register using an xorpd insn):
--cut here--
volatile double y;
void
test ()
{
  int z;
  for (z = 0; z < 1000; z++)
    y = 0.0;
}
--cut here--
gcc -O3:
_.211r.gcse2:
15: L15:
8: NOTE_INSN_BASIC_BLOCK 3
10: xmm0:DF=0.0
23: [`y']=xmm0:DF
REG_DEAD xmm0:DF
11: NOTE_INSN_DELETED
12: {flags:CCZ=cmp(ax:SI-0x1,0);ax:SI=ax:SI-0x1;}
13: pc={(flags:CCZ!=0)?L15:pc}
REG_BR_PROB 0x26ab
The constant load remains inside the loop. It looks as if the gcse2 pass cares
only about loads from memory, but I see no reason why constant loads should not
be considered as well. It looks like an oversight to me.
The same happens with:
--cut here--
volatile long long y;
void
test ()
{
  int z;
  for (z = 0; z < 1000; z++)
    y = 0x123456789;
}
--cut here--
_.211r.gcse2:
15: L15:
8: NOTE_INSN_BASIC_BLOCK 3
23: dx:DI=0x123456789
24: [`y']=dx:DI
REG_DEAD dx:DI
11: NOTE_INSN_DELETED
12: {flags:CCZ=cmp(ax:SI-0x1,0);ax:SI=ax:SI-0x1;}
13: pc={(flags:CCZ!=0)?L15:pc}
REG_BR_PROB 0x26ab
resulting in:
.L3:
        movabsq $4886718345, %rdx
        subl    $1, %eax
        movq    %rdx, y(%rip)
        jne     .L3
Reconfirmed as an rtl-optimization (gcse2 pass) problem.
[1] 4.8.0 20130124 (experimental) [trunk revision 195417]
* [Bug rtl-optimization/47764] The constant load instruction should be hoisted out of loop
From: carrot at google dot com @ 2014-12-17 0:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47764
--- Comment #6 from Carrot <carrot at google dot com> ---
Another example, this time for ppc.
The following code is disassembled from sha1dgst.o in openssl, as compiled by
gcc:
0000000000000000 <sha1_block_data_order>:
...
80: 82 5a 52 3f addis r26,r18,23170
84: 78 9a 4a 7e xor r10,r18,r19
88: 08 00 c4 8a lbz r22,8(r4)
8c: 88 00 1f ea ld r16,136(r31)
90: 0b 00 a4 8b lbz r29,11(r4)
94: 02 00 c4 8b lbz r30,2(r4)
98: 99 79 5a 3b addi r26,r26,31129
...
It uses two instructions (addis, then addi) to compute
r18 + (23170 << 16) + 31129, and this large constant is used many times. In the
following commands, sha1.gcc is the disassembly of sha1dgst.o.
$ grep 31129 sha1.gcc | wc
20 140 881
$ grep 23170 sha1.gcc | wc
20 140 886
If we loaded this large constant into a register once and reused that register
afterwards, we could save 18 instructions (20 addis/addi pairs would become one
pair plus 20 register adds).
There are more such cases in the same function:
$ grep 28378 sha1.gcc | wc
20 140 875
$ grep "\-5215" sha1.gcc | wc
20 140 867
$ grep "\-28900" sha1.gcc | wc
20 140 915
$ grep "\-17188" sha1.gcc | wc
20 140 916
$ grep "\-13725" sha1.gcc | wc
20 140 915
$ grep "\-15914" sha1.gcc | wc
20 140 914
Worse still, this code is inside a hot loop.
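The immediates in the grep commands above are consistent with the four SHA-1 round constants (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6) split for an addis/addi pair. The sketch below shows that split and the instruction count behind the "18 instructions" figure. The helper names `split_addis_addi` and `instructions_saved` are made up for illustration, and the model assumes each use costs exactly one addis plus one addi today and one register add after the change.

```c
#include <stdint.h>

/* The two 16-bit signed immediates of an addis/addi pair, chosen so
   that value == (hi << 16) + lo in 32-bit wrap-around arithmetic. */
struct imm_pair {
    int32_t hi;   /* immediate of addis */
    int32_t lo;   /* immediate of addi  */
};

/* Interpret the low 16 bits of x as a signed 16-bit value. */
static int32_t sign_extend16(uint32_t x)
{
    x &= 0xffffu;
    return x >= 0x8000u ? (int32_t)x - 0x10000 : (int32_t)x;
}

struct imm_pair split_addis_addi(uint32_t value)
{
    struct imm_pair p;

    p.lo = sign_extend16(value);
    /* addi sign-extends lo, so a negative lo has to be compensated
       by adding one to the high half; the subtraction does that. */
    p.hi = sign_extend16((value - (uint32_t)p.lo) >> 16);
    return p;
}

/* Instructions saved when a constant used `uses` times is built once
   (2 instructions) and then added from a register (1 per use),
   instead of being rebuilt with 2 instructions at every use. */
int instructions_saved(int uses)
{
    return 2 * uses - (2 + uses);
}
```

For example, `split_addis_addi(0x5A827999)` yields hi = 23170 and lo = 31129, matching the addis/addi pair in the disassembly, and `split_addis_addi(0x6ED9EBA1)` yields hi = 28378 and lo = -5215. `instructions_saved(20)` gives 18 per constant.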