* [Bug target/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
2014-09-16 23:01 [Bug target/63281] New: powerpc64le creates 64 bit constants from scratch instead of loading them anton at samba dot org
@ 2014-09-17 6:32 ` amodra at gmail dot com
2014-09-17 6:40 ` amodra at gmail dot com
` (13 subsequent siblings)
14 siblings, 0 replies; 16+ messages in thread
From: amodra at gmail dot com @ 2014-09-17 6:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #2 from Alan Modra <amodra at gmail dot com> ---
Created attachment 33504
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33504&action=edit
this moves constants from the toc to rodata
For -mcmodel=medium it is just as efficient to load a constant from .rodata as
it is from .toc, so keep all constants out of the TOC. (FP is already excluded
by the defaults selected for -mcmodel=medium.) These constants are actually put
into .rodata.cst8, so there is some chance they might be merged with an
identical constant in another object file, which is a win over putting them in
.toc. Also, this means .toc should contain only addresses, which is necessary
for the current ppc64 linux kernel that wants to relocate .toc en masse.
* [Bug target/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: amodra at gmail dot com @ 2014-09-17 6:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #3 from Alan Modra <amodra at gmail dot com> ---
Curiously, trunk + patch1 gives better usage of registers (only r31 of the
non-volatile regs is used) and finds some fusion opportunities.
trunk+patch1+patch2 results in r27-r31 being used (r28-r31 for -mlra), and no
fusion. Yet the two instruction sequences look very similar going into reload.
* [Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: guojiufu at gcc dot gnu.org @ 2021-12-21 9:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
Jiu Fu Guo <guojiufu at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |guojiufu at gcc dot gnu.org
--- Comment #10 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
With the latest trunk (AT14 is similar), the generated code looks like this:
-O
lis %r9,0x8123
ori %r9,%r9,0x4567
rldimi %r9,%r9,32,0
std %r9,0(%r10)
Or
-O3
lis %r11,0x1234
lis %r31,0x2345
lis %r12,0x3456
ori %r11,%r11,0x5678
ori %r31,%r31,0x6781
ori %r12,%r12,0x7812
rldimi %r11,%r11,32,0
rldimi %r31,%r31,32,0
rldimi %r12,%r12,32,0
...
This code seems better than the previous one.
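To check which constant the -O sequence above materializes, the three instructions can be modeled on 64-bit integers. This is only a minimal sketch, assuming the usual semantics of lis (sign-extended 16-bit immediate shifted left by 16), ori (OR with a zero-extended immediate) and rldimi (rotate left, then insert under a mask). The helper names are made up for illustration, and mask wrap-around in rldimi is not modeled.

```python
MASK64 = (1 << 64) - 1

def lis(imm16):
    """lis rD,imm: sign-extend the 16-bit immediate, then shift left by 16."""
    value = imm16 & 0xFFFF
    if value & 0x8000:
        value -= 0x10000
    return (value << 16) & MASK64

def ori(reg, imm16):
    """ori rA,rS,imm: OR with a zero-extended 16-bit immediate."""
    return reg | (imm16 & 0xFFFF)

def rldimi(dest, src, sh, mb):
    """rldimi rA,rS,SH,MB: rotate src left by sh and insert it into dest
    under a mask covering bits mb..63-sh (IBM numbering, bit 0 = MSB).
    Assumes mb <= 63 - sh, i.e. no wrap-around mask."""
    rotated = ((src << sh) | (src >> (64 - sh))) & MASK64
    me = 63 - sh
    mask = ((1 << (64 - mb)) - 1) & ~((1 << (63 - me)) - 1)
    return (rotated & mask) | (dest & ~mask & MASK64)

def build(hi16, lo16):
    """Model lis; ori; rldimi rX,rX,32,0 as used in the listing above."""
    reg = lis(hi16)
    reg = ori(reg, lo16)
    return rldimi(reg, reg, 32, 0)

print(hex(build(0x8123, 0x4567)))  # 0x8123456781234567
```

The rldimi copies the low 32 bits into the high 32 bits, which is why this three-instruction form only applies to constants whose two halves are identical.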
* [Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: guojiufu at gcc dot gnu.org @ 2021-12-21 11:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #11 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
However, for the constant Bill mentioned in comment #9, 0x000800004100001,
the code sequence still needs five instructions to build the value, e.g.:
li %r11,0
ori %r11,%r11,0x8000
sldi %r11,%r11,32
oris %r11,%r11,0x410
ori %r11,%r11,0x1
std %r11,0(%r3)
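The sequence above can be traced step by step to confirm the value it builds. The following is a minimal sketch, assuming the usual semantics of li, ori, oris and sldi; the helper names and the trace format are illustrative only.

```python
MASK64 = (1 << 64) - 1

def li(imm16):
    """li rD,imm: load a sign-extended 16-bit immediate."""
    value = imm16 & 0xFFFF
    if value & 0x8000:
        value -= 0x10000
    return value & MASK64

def ori(reg, imm16):
    """ori: OR with a zero-extended 16-bit immediate."""
    return reg | (imm16 & 0xFFFF)

def oris(reg, imm16):
    """oris: OR with the 16-bit immediate shifted left by 16."""
    return reg | ((imm16 & 0xFFFF) << 16)

def sldi(reg, sh):
    """sldi: shift left by sh bits within 64 bits."""
    return (reg << sh) & MASK64

def build_chain():
    """Return (mnemonic, r11 value) after each instruction of the sequence."""
    r11 = li(0)
    trace = [("li", r11)]
    for name, op, arg in (("ori", ori, 0x8000), ("sldi", sldi, 32),
                          ("oris", oris, 0x410), ("ori", ori, 0x1)):
        r11 = op(r11, arg)  # each step consumes the previous result
        trace.append((name, r11))
    return trace

for name, value in build_chain():
    print(f"{name:5s} -> {value:#018x}")
```

Every step reads the r11 produced by the step before it, so the five instructions form a single serial dependency chain ahead of the std.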
* [Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: segher at gcc dot gnu.org @ 2021-12-21 14:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #12 from Segher Boessenkool <segher at gcc dot gnu.org> ---
This is my g:72b2f3317b44, two years and a day old :-)
* [Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: segher at gcc dot gnu.org @ 2021-12-21 14:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #13 from Segher Boessenkool <segher at gcc dot gnu.org> ---
If we need more than three insns to create a constant we are better off loading
it from memory, in all cases. Maybe three is too much already, at least on
some processors?
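The rule of thumb above can be sketched as a small decision function. This is a rough illustration only: estimate_insns is a hypothetical, much simplified counter that knows just the li, lis/ori, rldimi-duplication and shift-and-OR patterns seen in this thread. It is not GCC's num_insns_constant, which recognizes more forms, and the threshold of 3 is taken from the comment above.

```python
MASK64 = (1 << 64) - 1

def count32(x32):
    """Insns to build a sign-extended 32-bit value: li, lis, or lis+ori."""
    signed = x32 - (1 << 32) if x32 & 0x80000000 else x32
    if -0x8000 <= signed < 0x8000:
        return 1            # li
    if x32 & 0xFFFF == 0:
        return 1            # lis
    return 2                # lis + ori

def estimate_insns(value):
    """Hypothetical estimate of the insns needed to build a 64-bit constant."""
    value &= MASK64
    lo = value & 0xFFFFFFFF
    hi = value >> 32
    sext_lo = lo | (0xFFFFFFFF00000000 if lo & 0x80000000 else 0)
    if sext_lo == value:
        return count32(lo)          # fits in a signed 32-bit value
    if hi == lo:
        return count32(lo) + 1      # build one half, duplicate with rldimi
    count = count32(hi) + 1         # build the high half, then sldi 32
    if lo >> 16:
        count += 1                  # oris for bits 16..31
    if lo & 0xFFFF:
        count += 1                  # ori for bits 0..15
    return count

def prefer_load(value, threshold=3):
    """Load from memory when building takes more than `threshold` insns."""
    return estimate_insns(value) > threshold

for constant in (0x1, 0x8123456781234567, 0x800004100001):
    print(hex(constant), estimate_insns(constant), prefer_load(constant))
```

Under these assumptions the duplicated-halves constant stays at three instructions and is built inline, while 0x800004100001 needs five and would be loaded. Lowering the threshold to 2 would also send the three-instruction case to memory.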
* [Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: guojiufu at gcc dot gnu.org @ 2021-12-30 3:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #14 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
For a constant like 0x000800004100001, which takes 5 insns to build, the
'expand' pass prefers to keep it in memory, but the cse1 pass then replaces the
load with the constant again.
expand:
7: r119:DI=[unspec[`*.LC0',%r2:DI] 47]
REG_EQUAL 0x800004100001
8: [r117:DI]=r119:DI
cse1:
7: r119:DI=0x800004100001
REG_EQUAL 0x800004100001
8: [r117:DI]=r119:DI
This is because:
expand_assignment invokes force_const_mem/gen_const_mem under the condition:
(num_insns_constant (operands[1], mode) > (TARGET_CMODEL != CMODEL_SMALL ? 3 :
2))
At cse1, when the costs of 'fold_const' and 'src' are compared, 'fold_const'
is selected because the following holds:
'preferable (src_folded_cost, src_folded_regcost, src_cost, src_regcost) <= 0'
src:
(mem/u/c:DI (unspec:DI [
(symbol_ref/u:DI ("*.LC0") [flags 0x82])
(reg:DI 2 2)
] UNSPEC_TOCREL) [2 S8 A8])
fold_const:
(const_int 140737556512769 [0x800004100001])
One way to keep the data in memory (.rodata) would be to adjust the cost of
the constant.
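The effect of the cost comparison can be illustrated with a toy model. This is a sketch under stated assumptions, not GCC's code: preferable here is a simplified stand-in for the cse helper of the same name, one insn is assumed to cost 4 units, and the memory-load cost of 8 is a made-up number chosen only to show the flip.

```python
INSN_COST = 4   # assumed cost units per instruction
LOAD_COST = 8   # hypothetical cost of the TOC-relative load ('src')

def preferable(cost_a, regcost_a, cost_b, regcost_b):
    """Toy comparison: negative if A is cheaper, positive if B is, 0 on a tie."""
    if cost_a != cost_b:
        return cost_a - cost_b
    return regcost_a - regcost_b

def choose(src_cost, folded_cost):
    """Mirror the quoted test: pick the folded constant when it is not worse."""
    if preferable(folded_cost, 0, src_cost, 0) <= 0:
        return "fold_const"
    return "src"

# Constant costed as a single insn, however many insns it really takes.
flat_cost = INSN_COST * 1
# Constant costed by the 5 insns needed to build 0x800004100001.
scaled_cost = INSN_COST * 5

print(choose(LOAD_COST, flat_cost))
print(choose(LOAD_COST, scaled_cost))
```

With the flat cost the constant wins and the load created at expand is undone; once the constant's cost reflects the five-instruction sequence, the memory form is kept.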
* [Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: amodra at gmail dot com @ 2021-12-30 7:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #15 from Alan Modra <amodra at gmail dot com> ---
(In reply to Jiu Fu Guo from comment #14)
> It would be a way to keep the data in memory(.rodata) through adjusting the
> cost of constant.
Yes, I posted a series of patches that fix this problem and other rtx costs.
Look for patches with "rs6000_rtx_costs" in the subject. Some of the patches
were even approved, but not all in the series. I am disillusioned enough with
gcc that I won't be pushing those patches or attempting any future gcc work.
You or anyone else are welcome to pick up the pieces.
* [Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: guojiufu at gcc dot gnu.org @ 2021-12-31 1:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #16 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
Thanks, Alan!
I saw your patches in this PR. They would help us get the sequence we have in
mind. And, as you said in the comments, fixing the insn and rtx costs is a big
problem.
* [Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: guojiufu at gcc dot gnu.org @ 2021-12-31 2:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #17 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
One thing I'm wondering is whether it is really 'slow' to build the const with
instructions (even with 5 insns).
For example, there seems to be no big difference in runtime between the two
pieces of code below on a real machine.
1.
foo:
.LFB0:
.cfi_startproc
std %r31,-8(%r1)
.cfi_offset 31, -8
li %r12,2
li %r31,1
li %r0,3
li %r11,4
std %r31,0(%r3)
std %r12,0(%r4)
std %r0,0(%r5)
std %r11,0(%r6)
std %r31,0(%r7)
std %r12,0(%r8)
ld %r31,-8(%r1)
std %r0,0(%r9)
std %r11,0(%r10)
.cfi_restore 31
blr
2.
foo:
.LFB0:
.cfi_startproc
std 31,-8(1)
.cfi_offset 31, -8
li 11,0
li 31,0
li 12,0
ori 11,11,0x8000
ori 31,31,0x8000
ori 12,12,0x8000
sldi 11,11,32
sldi 31,31,32
sldi 12,12,32
oris 11,11,0x410
oris 31,31,0x410
oris 12,12,0x410
ori 11,11,0x1
ori 31,31,0x3
ori 12,12,0x5
li 0,0
std 11,0(3)
std 31,0(4)
li 3,0
li 4,0
std 12,0(5)
li 5,0
ori 0,0,0x8000
ld 31,-8(1)
ori 3,3,0x8000
ori 4,4,0x8000
ori 5,5,0x8000
sldi 0,0,32
sldi 3,3,32
sldi 4,4,32
sldi 5,5,32
oris 0,0,0x410
oris 3,3,0x410
oris 4,4,0x410
oris 5,5,0x410
ori 0,0,0x7
addi 11,11,5
ori 3,3,0xa
ori 4,4,0xe
ori 5,5,0xc
std 0,0(6)
std 11,0(7)
std 3,0(8)
std 4,0(9)
std 5,0(10)
.cfi_restore 31
blr
* [Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: segher at gcc dot gnu.org @ 2021-12-31 7:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #18 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Yes, it is slow. Five sequential dependent integer instructions instead of
one load instruction. Depending on how you benchmark this you possibly won't
see the slowness: the values are stored to memory, and that can happen very
many cycles later, so this is totally out of the critical path and will not
clog up any pipelines.
* [Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: guojiufu at gcc dot gnu.org @ 2022-01-04 6:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #19 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Segher Boessenkool from comment #18)
Thanks for the clarification!
> Yes, it is slow. Five sequential dependent integer instructions instead of
> one load instruction. Depending on how you benchmark this you possibly won't
Yes, it depends on how the cases are benchmarked. There are some factors that
affect the runtime. This is really the point!
In the above cases, the several std(s) and the one spill of r31 all affect
the runtime and would hide the cost of the const-building instructions.
Focusing only on the sequence that builds a const, the 5-insn sequence is a lot
slower than the 1-insn sequence.
* [Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: guojiufu at gcc dot gnu.org @ 2022-01-04 8:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #20 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
Created attachment 52114
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52114&action=edit
testcases
With these test cases, invoking 'foo' 1,000,000,000 times in each case gives
the following runtime ranking:
Building the constant with 1 insn is fastest.
Next fastest are building the const with 2 instructions, loading from rodata,
and loading from toc.
Building the const with 3 instructions is slower than loading from rodata, and
building the const with 5 insns is slowest.
* [Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: guojiufu at gcc dot gnu.org @ 2022-01-06 10:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #21 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
I also tested on powerpc with -m32. In testing, there seems to be no
significant benefit to loading from 'rodata' vs. building constants with
instructions.
lis %r7,0x410
ori %r7,%r7,0x103c
lis %r6,0x710
ori %r6,%r6,0xe005
lis %r12,.LC3@ha
la %r12,.LC3@l(%r12)
lwz %r3,0(%r12)
lwz %r4,4(%r12)
* [Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them
From: guojiufu at gcc dot gnu.org @ 2022-01-10 6:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281
--- Comment #22 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
On power10, loading a constant needs only 1 instruction, e.g.:
pld 9,.LC0@pcrel
In tests, it seems nearly as fast as using 1 instruction to build the const.