public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads
[not found] <bug-70053-4@http.gcc.gnu.org/bugzilla/>
@ 2020-05-21 5:29 ` luoxhu at gcc dot gnu.org
2020-05-22 14:59 ` wschmidt at gcc dot gnu.org
` (7 subsequent siblings)
8 siblings, 0 replies; 9+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2020-05-21 5:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053
luoxhu at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bergner at gcc dot gnu.org,
| |luoxhu at gcc dot gnu.org
--- Comment #4 from luoxhu at gcc dot gnu.org ---
Is this just the difference of O3 and O2? Since O3 is OK, maybe this bug is not
effective?
$ /opt/at10.0/bin/gcc -O3 -S pr70053.c
$ cat pr70053.s
.file "pr70053.c"
.abiversion 2
.section ".text"
.align 2
.p2align 4,,15
.globl D256_add_finite
.type D256_add_finite, @function
D256_add_finite:
dcmpuq 7,4,6
beq 7,.L3
fmr 7,3
fmr 6,2
fmr 3,7
fmr 2,6
blr
.p2align 4,,15
.L3:
fmr 5,7
fmr 4,6
fmr 3,7
fmr 2,6
blr
.long 0
.byte 0,0,0,0,0,0,0,0
.size D256_add_finite,.-D256_add_finite
.ident "GCC: (GNU) 6.4.1 20170720 (Advance-Toolchain-at10.0) IBM AT 10
branch, based on subversion id 250395."
.section .note.GNU-stack,"",@progbits
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads
[not found] <bug-70053-4@http.gcc.gnu.org/bugzilla/>
2020-05-21 5:29 ` [Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads luoxhu at gcc dot gnu.org
@ 2020-05-22 14:59 ` wschmidt at gcc dot gnu.org
2020-05-25 8:06 ` luoxhu at gcc dot gnu.org
` (6 subsequent siblings)
8 siblings, 0 replies; 9+ messages in thread
From: wschmidt at gcc dot gnu.org @ 2020-05-22 14:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053
Bill Schmidt <wschmidt at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
Last reconfirmed| |2020-05-22
--- Comment #5 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
I'd like to understand why the difference between -O2 and -O3 exists. We
shouldn't generate this kind of nasty store-load at -O2.
Confirmed, BTW. :)
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads
[not found] <bug-70053-4@http.gcc.gnu.org/bugzilla/>
2020-05-21 5:29 ` [Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads luoxhu at gcc dot gnu.org
2020-05-22 14:59 ` wschmidt at gcc dot gnu.org
@ 2020-05-25 8:06 ` luoxhu at gcc dot gnu.org
2020-05-25 9:50 ` luoxhu at gcc dot gnu.org
` (5 subsequent siblings)
8 siblings, 0 replies; 9+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2020-05-25 8:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053
--- Comment #6 from luoxhu at gcc dot gnu.org ---
"-O2 -ftree-slp-vectorize" could also generate the expected simple fmrs.
Reason is pass_cselim will transform conditional stores into unconditional ones
with PHI instructions when vectorization and if-conversion is
enabled(gcc/tree-ssa-phiopt.c:2482).
pr70053.c.108t.cdce:
D256_add_finite (_Decimal128 a, _Decimal128 b, _Decimal128 c)
{
struct TDx2_t D.2914;
<bb 2> [local count: 1073741824]:
if (b_4(D) == c_5(D))
goto <bb 3>; [34.00%]
else
goto <bb 4>; [66.00%]
<bb 3> [local count: 365072224]:
D.2914.td0 = c_5(D);
D.2914.td1 = c_5(D);
goto <bb 5>; [100.00%]
<bb 4> [local count: 708669601]:
D.2914.td0 = a_3(D);
D.2914.td1 = b_4(D);
<bb 5> [local count: 1073741824]:
return D.2914;
}
=> pr70053.c.109t.cselim:
D256_add_finite (_Decimal128 a, _Decimal128 b, _Decimal128 c)
{
struct TDx2_t D.2914;
_Decimal128 cstore_10;
_Decimal128 cstore_11;
<bb 2> [local count: 1073741824]:
if (b_4(D) == c_5(D))
goto <bb 4>; [34.00%]
else
goto <bb 3>; [66.00%]
<bb 3> [local count: 708669601]:
<bb 4> [local count: 1073741824]:
# cstore_10 = PHI <c_5(D)(2), a_3(D)(3)>
# cstore_11 = PHI <c_5(D)(2), b_4(D)(3)>
D.2914.td1 = cstore_11;
D.2914.td0 = cstore_10;
return D.2914;
}
Then at expand pass, the PHI instruction "cstore_10 = PHI <c_5(D)(2),
a_3(D)(3)>" will be expanded to move for "-O2 -ftree-slp-vectorize". If no such
PHI generated, bb3 and bb4 in pr70053.c.108t.cdce will be expanded to
STORE/LOAD with TD->DI conversion, causing a lot st/ld conversion finally.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads
[not found] <bug-70053-4@http.gcc.gnu.org/bugzilla/>
` (2 preceding siblings ...)
2020-05-25 8:06 ` luoxhu at gcc dot gnu.org
@ 2020-05-25 9:50 ` luoxhu at gcc dot gnu.org
2020-05-25 17:22 ` segher at gcc dot gnu.org
` (4 subsequent siblings)
8 siblings, 0 replies; 9+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2020-05-25 9:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053
luoxhu at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |segher at gcc dot gnu.org
--- Comment #7 from luoxhu at gcc dot gnu.org ---
When expanding "D.2914.td0 = c_5(D);" in expand_assignment (to=<component_ref
0x7ffff55909c0>, from=<ssa_name 0x7ffff5390b40>, nontemporal=false) at
../../gcc-master/gcc/expr.c:5058
1) expr.c:5158: to_rtx = expand_expr (tem, NULL_RTX, VOIDmode, EXPAND_WRITE);
gdb pr to_rtx
(mem/c:BLK (reg/f:DI 112 virtual-stack-vars) [2 D.2914+0 S32 A128])
...
2) expr.c:5167: to_rtx = adjust_address (to_rtx, mode1, 0);
p mode1
$86 = E_TDmode
(gdb) pr to_rtx
(mem/c:TD (reg/f:DI 112 virtual-stack-vars) [2 D.2914+0 S16 A128])
to_rtx is generated with address conversion from DImode to TDmode here.
...
3) expr.c:5374: result = store_field (to_rtx, bitsize,
bitpos,bitregion_start, bitregion_end, mode1, from, get_alias_set (to),
nontemporal, reversep);
then the assignment instruction is generated as below:
(insn 11 10 12 4 (set (mem/c:TD (reg/f:DI 112 virtual-stack-vars) [1
D.2914.td0+0 S16 A128]) (reg/v:TD 121 [ c ])) "pr70053.c":20:14 -1 (nil))
So if we need remove the redundant store/load in expand, the conversion from
DImode to TDmode should be avoided for this case when using virtual-stack-vars
registers. (For PR65421, there are similar DImode to DFmode conversion).
pr70053.c.236r.expand with -O2:
1: NOTE_INSN_DELETED
6: NOTE_INSN_BASIC_BLOCK 2
2: r119:TD=%2:TD
3: r120:TD=%4:TD
4: r121:TD=%6:TD
5: NOTE_INSN_FUNCTION_BEG
8: r122:CCFP=cmp(r120:TD,r121:TD)
9: pc={(r122:CCFP!=0)?L16:pc}
REG_BR_PROB 708669604
10: NOTE_INSN_BASIC_BLOCK 4
11: [r112:DI]=r121:TD
12: r123:DI=r112:DI+0x10
13: [r123:DI]=r121:TD
14: pc=L21
15: barrier
16: L16:
17: NOTE_INSN_BASIC_BLOCK 5
18: [r112:DI]=r119:TD
19: r124:DI=r112:DI+0x10
20: [r124:DI]=r120:TD
21: L21:
22: NOTE_INSN_BASIC_BLOCK 6
23: r125:TD=[r112:DI]
24: r127:DI=r112:DI+0x10
25: r126:TD=[r127:DI]
26: r117:TD=r125:TD
27: r118:TD=r126:TD
31: %2:TD=r117:TD
32: %4:TD=r118:TD
33: use %2:TD
34: use %4:TD
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads
[not found] <bug-70053-4@http.gcc.gnu.org/bugzilla/>
` (3 preceding siblings ...)
2020-05-25 9:50 ` luoxhu at gcc dot gnu.org
@ 2020-05-25 17:22 ` segher at gcc dot gnu.org
2020-06-01 2:40 ` luoxhu at gcc dot gnu.org
` (3 subsequent siblings)
8 siblings, 0 replies; 9+ messages in thread
From: segher at gcc dot gnu.org @ 2020-05-25 17:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053
--- Comment #8 from Segher Boessenkool <segher at gcc dot gnu.org> ---
I see no conversion there?
But, why does it it store to memory at all?
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads
[not found] <bug-70053-4@http.gcc.gnu.org/bugzilla/>
` (4 preceding siblings ...)
2020-05-25 17:22 ` segher at gcc dot gnu.org
@ 2020-06-01 2:40 ` luoxhu at gcc dot gnu.org
2021-01-31 1:53 ` michaeljclark at mac dot com
` (2 subsequent siblings)
8 siblings, 0 replies; 9+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2020-06-01 2:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053
--- Comment #9 from luoxhu at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #8)
> I see no conversion there?
>
> But, why does it it store to memory at all?
Yes, no conversion for this case, only adjust_address to TImode. mem/c:TD means
a MEM cannot trap.
Reason of store to memory:
D.2914 is a local struct variable here, seems we need do some optimization to
sink the D.2914.td0 and D.2914.td1 from BB3&BB4 to BB5 to avoid store/load on
stack? Or if there already exists some pass in Gimple? Or should this be
optimized after expander by some new pass like store sink?
O2/pr70053.c.234t.optimized:
D256_add_finite (_Decimal128 a, _Decimal128 b, _Decimal128 c)
{
struct TDx2_t D.2914;
<bb 2> [local count: 1073741824]:
if (b_4(D) == c_5(D))
goto <bb 3>; [34.00%]
else
goto <bb 4>; [66.00%]
<bb 3> [local count: 365072224]:
D.2914.td0 = c_5(D);
D.2914.td1 = c_5(D);
goto <bb 5>; [100.00%]
<bb 4> [local count: 708669601]:
D.2914.td0 = a_3(D);
D.2914.td1 = b_4(D);
<bb 5> [local count: 1073741824]:
return D.2914;
}
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads
[not found] <bug-70053-4@http.gcc.gnu.org/bugzilla/>
` (5 preceding siblings ...)
2020-06-01 2:40 ` luoxhu at gcc dot gnu.org
@ 2021-01-31 1:53 ` michaeljclark at mac dot com
2021-02-02 0:01 ` segher at gcc dot gnu.org
2022-10-28 6:56 ` LpSolit at gmail dot com
8 siblings, 0 replies; 9+ messages in thread
From: michaeljclark at mac dot com @ 2021-01-31 1:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053
Michael Clark <michaeljclark at mac dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |michaeljclark at mac dot com
--- Comment #10 from Michael Clark <michaeljclark at mac dot com> ---
another data point. I am seeing something similar on x86-64.
SysV x86-64 ABI specifies that _Decimal128 is to be passed in
xmm regs so I believe the stack stores here are redundant.
; cat > dec1.c << EOF
_Decimal128 add_d(_Decimal128 a, _Decimal128 b) { return a + b; }
EOF
; gcc -O2 -S -masm=intel dec1.c
; cat dec1.s
add_d:
.LFB0:
.cfi_startproc
endbr64
sub rsp, 40
.cfi_def_cfa_offset 48
movaps XMMWORD PTR [rsp], xmm0
movaps XMMWORD PTR 16[rsp], xmm1
call __bid_addtd3@PLT
movaps XMMWORD PTR [rsp], xmm0
add rsp, 40
.cfi_def_cfa_offset 8
ret
.cfi_endproc
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads
[not found] <bug-70053-4@http.gcc.gnu.org/bugzilla/>
` (6 preceding siblings ...)
2021-01-31 1:53 ` michaeljclark at mac dot com
@ 2021-02-02 0:01 ` segher at gcc dot gnu.org
2022-10-28 6:56 ` LpSolit at gmail dot com
8 siblings, 0 replies; 9+ messages in thread
From: segher at gcc dot gnu.org @ 2021-02-02 0:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053
--- Comment #11 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Please open a separate bug for x86 problems.
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads
[not found] <bug-70053-4@http.gcc.gnu.org/bugzilla/>
` (7 preceding siblings ...)
2021-02-02 0:01 ` segher at gcc dot gnu.org
@ 2022-10-28 6:56 ` LpSolit at gmail dot com
8 siblings, 0 replies; 9+ messages in thread
From: LpSolit at gmail dot com @ 2022-10-28 6:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70053
Jiu Fu Guo <guojiufu at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
--- Comment #12 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
The generated code is a lot better on the trunk (both -O2 and -O3):
.cfi_startproc
dcmpuq 0,4,6
beq 0,.L4
blr
.p2align 4,,15
.L4:
fmr 5,7
fmr 4,6
fmr 3,7
fmr 2,6
blr
And the expanded RTL like below, and cse/fwprop/dse optimize it as expected.
2: r119:TD=%2:TD
3: r120:TD=%4:TD
4: r121:TD=%6:TD
5: NOTE_INSN_FUNCTION_BEG
10: r122:CCFP=cmp(r120:TD,r121:TD)
11: pc={(r122:CCFP!=0)?L13:pc}
REG_BR_PROB 708669604
12: NOTE_INSN_BASIC_BLOCK 4
6: r120:TD=r121:TD
7: r119:TD=r121:TD
13: L13:
14: NOTE_INSN_BASIC_BLOCK 5
15: r123:DI=r112:DI+0x10
16: [r123:DI]=r120:TD
17: [r112:DI]=r119:TD
18: r124:TD=[r112:DI]
19: r126:DI=r112:DI+0x10
20: r125:TD=[r126:DI]
21: r117:TD=r124:TD
22: r118:TD=r125:TD
26: %2:TD=r117:TD
27: %4:TD=r118:TD
28: use %2:TD
29: use %4:TD
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-01-14 21:55 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <bug-70053-4@http.gcc.gnu.org/bugzilla/>
2020-05-21 5:29 ` [Bug target/70053] Returning a struct of _Decimal128 values generates extraneous stores and loads luoxhu at gcc dot gnu.org
2020-05-22 14:59 ` wschmidt at gcc dot gnu.org
2020-05-25 8:06 ` luoxhu at gcc dot gnu.org
2020-05-25 9:50 ` luoxhu at gcc dot gnu.org
2020-05-25 17:22 ` segher at gcc dot gnu.org
2020-06-01 2:40 ` luoxhu at gcc dot gnu.org
2021-01-31 1:53 ` michaeljclark at mac dot com
2021-02-02 0:01 ` segher at gcc dot gnu.org
2022-10-28 6:56 ` LpSolit at gmail dot com
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).