public inbox for gcc-bugs@sourceware.org
* [Bug target/96201] New: x86 movsd/movsq string instructions and alignment inference
@ 2020-07-15 1:50 michaeljclark at mac dot com
2020-07-15 5:42 ` [Bug target/96201] " crazylht at gmail dot com
2020-09-15 11:15 ` amker at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: michaeljclark at mac dot com @ 2020-07-15 1:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96201
Bug ID: 96201
Summary: x86 movsd/movsq string instructions and alignment inference
Product: gcc
Version: 10.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: michaeljclark at mac dot com
Target Milestone: ---
Recording some observations and minimal test code for alignment inference and
x86 string instruction selection.
GCC9 and GCC10 do not generate x86 string instructions in some cases, apparently
because the compiler believes the addresses are not aligned.
GCC10 appears to have an additional issue whereby x86 string instructions are
not selected unless the address is aligned to twice the natural alignment.
Two observations:
* (GCC9/10) integer alignment is not inferred from expressions, e.g. x & ~3
* (GCC10) __builtin_assume_aligned appears to require double the alignment
The double alignment issue was observed with both int/movsd and long/movsq:
GCC10 will only generate movsd or movsq if the alignment is double the
type's natural alignment. The test case here is for int.
--- BEGIN SAMPLE CODE ---
void f1(long d, long s, unsigned n)
{
    int *sn = (int*)( (long)(s    ) & ~3l );
    int *dn = (int*)( (long)(d    ) & ~3l );
    int *de = (int*)( (long)(d + n) & ~3l );
    while (dn < de) *dn++ = *sn++;
}
void f2(long d, long s, unsigned n)
{
    int *sn = (int*)( (long)(s    ) & ~7l );
    int *dn = (int*)( (long)(d    ) & ~7l );
    int *de = (int*)( (long)(d + n) & ~7l );
    while (dn < de) *dn++ = *sn++;
}
void f3(long d, long s, unsigned n)
{
    int *sn = __builtin_assume_aligned( (int*)( (long)(s    ) & ~3l ), 4 );
    int *dn = __builtin_assume_aligned( (int*)( (long)(d    ) & ~3l ), 4 );
    int *de = __builtin_assume_aligned( (int*)( (long)(d + n) & ~3l ), 4 );
    while (dn < de) *dn++ = *sn++;
}
void f4(long d, long s, unsigned n)
{
    int *sn = __builtin_assume_aligned( (int*)((long)(s    ) & ~3l ), 8 );
    int *dn = __builtin_assume_aligned( (int*)((long)(d    ) & ~3l ), 8 );
    int *de = __builtin_assume_aligned( (int*)((long)(d + n) & ~3l ), 8 );
    while (dn < de) *dn++ = *sn++;
}
--- END SAMPLE CODE ---
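As a quick check of the arithmetic behind the test cases: masking an address
with ~3l clears its low two bits, so the result is always a multiple of 4, and
~7l likewise gives a multiple of 8. The sketch below is not part of the
original report; align_down and check_masks are hypothetical names used only
for illustration. It also shows that a value masked with ~3l is not
necessarily 8-byte aligned, which may be worth keeping in mind when reading f4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the report): round addr down to a multiple
   of align, where align must be a power of two. */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    return addr & ~(align - 1);
}

/* Checks the masks used by f1..f4 on a range of small addresses.
   Returns the number of addresses checked. */
static unsigned check_masks(void)
{
    unsigned checked = 0;
    for (uintptr_t a = 0; a < 64; a++) {
        assert(align_down(a, 4) == (a & ~(uintptr_t)3));
        assert(align_down(a, 4) % 4 == 0);   /* same mask as ~3l */
        assert(align_down(a, 8) % 8 == 0);   /* same mask as ~7l */
        checked++;
    }
    /* 12 & ~3 is 12: 4-byte aligned, but not 8-byte aligned. */
    assert(align_down(12, 4) % 8 == 4);
    return checked;
}
```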
GCC9 generates this for f1, f2 and GCC10 generates this for f1, f2, f3:
.Ln:
        leaq    (%rax,%rsi), %rcx
        movq    %rax, %rdx
        addq    $4, %rax
        movl    (%rcx), %ecx
        movl    %ecx, (%rdx)
        cmpq    %rax, %rdi
        ja      .Ln
GCC9 generates this for f3, f4 and GCC10 generates this only for f4
(movsl is the AT&T mnemonic for movsd):
.Ln:
        movsl
        cmpq    %rdi, %rdx
        ja      .Ln
* [Bug target/96201] x86 movsd/movsq string instructions and alignment inference
From: crazylht at gmail dot com @ 2020-07-15 5:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96201
Hongtao.liu <crazylht at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |crazylht at gmail dot com
--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
The issue is caused by pass_ivopt: ivopt selects only one IV for f3 (dn), which
seems not to be optimal, and selects two IVs for f4 (sn, dn), which seems optimal.
---
loop in f3:
Selected IV set for loop 1 at pr96201.c:25, 10 avg niters, 1 IVs:
Candidate 8:
Var befor: dn_24
Var after: dn_18
Incr POS: orig biv
IV struct:
Type: int *
Base: (int *) _3
Step: 4
Biv: N
Overflowness wrto loop niter: Overflow
loop in f4:
Selected IV set for loop 1 at pr96201.c:34, 10 avg niters, 2 IVs:
Candidate 6:
Var befor: sn_26
Var after: sn_20
Incr POS: orig biv
IV struct:
Type: int *
Base: sn_14
Step: 4
Object: (void *) sn_14
Biv: N
Overflowness wrto loop niter: Overflow
Candidate 8:
Var befor: dn_27
Var after: dn_21
Incr POS: orig biv
IV struct:
Type: int *
Base: dn_16
Step: 4
Object: (void *) dn_16
Biv: N
Overflowness wrto loop niter: Overflow
---
As a result, more instructions are generated for f3, and pass_combine fails to
combine them.
---
loop in f3:
Trying 19 -> 22:
19: r83:DI=r92:DI
22: [r83:DI]=r89:SI
REG_DEAD r89:SI
REG_DEAD r83:DI
Can't combine i2 into i3
Trying 21 -> 22:
21: r89:SI=[r93:DI]
REG_DEAD r93:DI
22: [r83:DI]=r89:SI
REG_DEAD r89:SI
REG_DEAD r83:DI
Failed to match this instruction:
(set (mem:SI (reg/v/f:DI 83 [ dn ]) [1 *dn_2+0 S4 A32])
(mem:SI (reg/f:DI 93 [ _20 ]) [1 *_20+0 S4 A32]))
Trying 18, 21 -> 22:
18: {r93:DI=r92:DI+r102:DI;clobber flags:CC;}
REG_UNUSED flags:CC
21: r89:SI=[r93:DI]
REG_DEAD r93:DI
22: [r83:DI]=r89:SI
REG_DEAD r89:SI
REG_DEAD r83:DI
Can't combine i1 into i3
Trying 21, 19 -> 22:
21: r89:SI=[r93:DI]
REG_DEAD r93:DI
19: r83:DI=r92:DI
22: [r83:DI]=r89:SI
REG_DEAD r89:SI
REG_DEAD r83:DI
Can't combine i1 into i3
(insn 18 16 19 4 (parallel [
(set (reg/f:DI 93 [ _20 ])
(plus:DI (reg/v/f:DI 92 [ dn ])
(reg:DI 102)))
(clobber (reg:CC 17 flags))
]) 210 {*adddi_1}
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 19 18 20 4 (set (reg/v/f:DI 83 [ dn ])
(reg/v/f:DI 92 [ dn ])) 74 {*movdi_internal}
(nil))
(insn 20 19 21 4 (parallel [
(set (reg/v/f:DI 92 [ dn ])
(plus:DI (reg/v/f:DI 92 [ dn ])
(const_int 4 [0x4])))
(clobber (reg:CC 17 flags))
]) "pr96201.c":25:24 210 {*adddi_1}
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 21 20 22 4 (set (reg:SI 89 [ _9 ])
(mem:SI (reg/f:DI 93 [ _20 ]) [1 *_20+0 S4 A32])) "pr96201.c":25:29 75
{*movsi_internal}
(expr_list:REG_DEAD (reg/f:DI 93 [ _20 ])
(nil)))
(insn 22 21 24 4 (set (mem:SI (reg/v/f:DI 83 [ dn ]) [1 *dn_2+0 S4 A32])
(reg:SI 89 [ _9 ])) "pr96201.c":25:27 75 {*movsi_internal}
(expr_list:REG_DEAD (reg:SI 89 [ _9 ])
(expr_list:REG_DEAD (reg/v/f:DI 83 [ dn ])
(nil))))
loop in f4:
Trying 16, 18, 17 -> 19:
16: {r89:DI=r89:DI+0x4;clobber flags:CC;}
REG_UNUSED flags:CC
18: r88:SI=[r89:DI-0x4]
17: {r90:DI=r90:DI+0x4;clobber flags:CC;}
REG_UNUSED flags:CC
19: [r90:DI-0x4]=r88:SI
REG_DEAD r88:SI
Successfully matched this instruction:
(parallel [
(set (mem:SI (reg/v/f:DI 90 [ dn ]) [1 MEM[base: dn_21, offset: -4B]+0
S4 A32])
(mem:SI (reg/v/f:DI 89 [ sn ]) [1 MEM[base: sn_20, offset: -4B]+0
S4 A32]))
(set (reg/v/f:DI 90 [ dn ])
(plus:DI (reg/v/f:DI 90 [ dn ])
(const_int 4 [0x4])))
(set (reg/v/f:DI 89 [ sn ])
(plus:DI (reg/v/f:DI 89 [ sn ])
(const_int 4 [0x4])))
])
(insn 16 15 17 3 (parallel [
(set (reg/v/f:DI 89 [ sn ])
(plus:DI (reg/v/f:DI 89 [ sn ])
(const_int 4 [0x4])))
(clobber (reg:CC 17 flags))
]) "pr96201.c":34:32 210 {*adddi_1}
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 17 16 18 3 (parallel [
(set (reg/v/f:DI 90 [ dn ])
(plus:DI (reg/v/f:DI 90 [ dn ])
(const_int 4 [0x4])))
(clobber (reg:CC 17 flags))
]) "pr96201.c":34:24 210 {*adddi_1}
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 18 17 19 3 (set (reg:SI 88 [ _9 ])
(mem:SI (plus:DI (reg/v/f:DI 89 [ sn ])
(const_int -4 [0xfffffffffffffffc])) [1 MEM[base: sn_20,
offset: -4B]+0 S4 A32])) "pr96201.c":34:29 75 {*movsi_internal}
(nil))
(insn 19 18 21 3 (set (mem:SI (plus:DI (reg/v/f:DI 90 [ dn ])
(const_int -4 [0xfffffffffffffffc])) [1 MEM[base: dn_21,
offset: -4B]+0 S4 A32])
(reg:SI 88 [ _9 ])) "pr96201.c":34:27 75 {*movsi_internal}
(expr_list:REG_DEAD (reg:SI 88 [ _9 ])
(nil)))
---
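The parallel that combine matches for f4 has the shape of a single string
move: one 4-byte copy plus a 4-byte increment of both pointers. A minimal C
model of that step is sketched below, assuming the direction flag is clear;
movsl_step and copy_words_model are hypothetical names introduced here for
illustration and are not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Models one movsl (movsd in Intel syntax) with the direction flag clear:
   copy 4 bytes from *src to *dst, then advance both pointers by 4 bytes.
   movsl_step is a hypothetical name, not a GCC or libc function. */
static void movsl_step(uint32_t **dst, const uint32_t **src)
{
    **dst = **src;
    *dst += 1;   /* pointer arithmetic: +1 element == +4 bytes */
    *src += 1;
}

/* The f4 loop body expressed with the model above: both sn and dn are
   induction variables, matching the two IVs selected for f4. */
static void copy_words_model(uint32_t *dn, const uint32_t *de,
                             const uint32_t *sn)
{
    while (dn < de)
        movsl_step(&dn, &sn);
}
```

With only one IV, as selected for f3, the source address is recomputed from dn
on each iteration, so there is no second pointer increment for this pattern to
absorb.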
* [Bug target/96201] x86 movsd/movsq string instructions and alignment inference
From: amker at gcc dot gnu.org @ 2020-09-15 11:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96201
bin cheng <amker at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |amker at gcc dot gnu.org
--- Comment #2 from bin cheng <amker at gcc dot gnu.org> ---
The reason is that memory references in f3 are not identified as address-type IV
uses. I don't remember the details, but this is intended by the commit below:
commit 653a4b32fe72e33bfd4cdd4c25493049524a3805
Author: Bin Cheng <bin.cheng@arm.com>
Date: Thu Mar 2 11:25:11 2017 +0000
re PR tree-optimization/66768 (address space gets lost on literal pointer)
PR tree-optimization/66768
* tree-ssa-loop-ivopts.c (find_interesting_uses_address): Skip addr
iv_use if base object can't be determined.
gcc/testsuite
* gcc.target/i386/pr66768.c: New test.
From-SVN: r245837
For f1/f2, IVOPTs fails to identify the base object because the pointers are
converted from integers. We need to tell the difference better.
For f3, __builtin_assume_aligned is optimized away by GCC-10 before IVOPTs.
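One way to read the explanation above: in f1/f2 the pointers are created from
long values, so IVOPTs sees no base object. A hypothetical variant that takes
real pointer parameters is sketched below. f5_ptr is not from the report, and
whether GCC 9 or 10 selects movs for it has not been checked here; it only
illustrates the distinction between pointer-typed and integer-derived
addresses. Here n counts int elements, not bytes.

```c
#include <assert.h>

/* Hypothetical variant (not from the report): the addresses arrive as
   pointers, so each has an identifiable base object. The caller must
   guarantee that d and s really are 4-byte aligned. */
void f5_ptr(int *d, const int *s, unsigned n)
{
    int *dn = __builtin_assume_aligned(d, 4);
    const int *sn = __builtin_assume_aligned(s, 4);
    int *de = dn + n;            /* one past the last element to write */
    while (dn < de) *dn++ = *sn++;
}
```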