public inbox for gcc-bugs@sourceware.org
* [Bug c/49362] New: Arm Neon intrinsic types not correctly interpreted by compiler.
From: mark.pupilli at dyson dot com @ 2011-06-10 11:35 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49362
Summary: Arm Neon intrinsic types not correctly interpreted by
compiler.
Product: gcc
Version: 4.4.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: mark.pupilli@dyson.com
Created attachment 24485
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24485
C file with 2 functions that show the bug when compiled.
The ARM NEON intrinsics define the type uint32x4x2_t as
typedef struct uint32x4x2_t { uint32x4_t val[2]; } uint32x4x2_t;
This is interpreted by the compiler literally as a struct. This should not be
the case. The compiler should treat it as a pair of registers, just as it
treats uint32_t as a single register and not an array of 4 x uint32_t.
The attached C file contains two versions of the same function: one that uses
quad word loads (vld1q), and one that uses double quad word loads (vld2q). The
function that uses double quad word loads should take 2 instructions fewer, but
it is actually 44 instructions long compared to 19 for the vld1q version. (Both
functions compute the same results.)
I believe this bug arises because the compiler treats the following as array
access instead of a reference into the register file:
uint32x4x2_t A = vld2q_u32 ( a );
A.val[0]; // This statement should be treated as a reference to a register,
// not an array access!
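The attachment itself is not reproduced in this thread. As a rough sketch of what the two variants might look like, assuming eight 32-bit words per input (which matches the four d registers loaded by each vld2.32 in the disassembly in this report), the following uses standard arm_neon.h intrinsics. The function names, the helper reduce_counts and the scalar fallback for non-NEON targets are assumptions made for illustration, not the reporter's code.

```c
#include <stdint.h>

/* Portable reference (assumption: inputs are 8 words each). */
uint32_t hamming_distance_ref(const uint32_t *a, const uint32_t *b)
{
    uint32_t total = 0;
    for (int i = 0; i < 8; ++i) {
        uint32_t x = a[i] ^ b[i];
        while (x) {          /* clear the lowest set bit until none remain */
            x &= x - 1;
            ++total;
        }
    }
    return total;
}

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: sum the set bits of two XORed q registers. */
static inline uint32_t reduce_counts(uint32x4_t x0, uint32x4_t x1)
{
    uint8x16_t c = vaddq_u8(vcntq_u8(vreinterpretq_u8_u32(x0)),
                            vcntq_u8(vreinterpretq_u8_u32(x1)));
    uint32x4_t s = vpaddlq_u16(vpaddlq_u8(c));   /* widen to 4 x u32 */
    uint32x2_t p = vpadd_u32(vget_high_u32(s), vget_low_u32(s));
    p = vpadd_u32(p, p);
    return vget_lane_u32(p, 0);
}

/* Variant using two quad word loads per input. */
uint32_t hamming_distance_vld1q(const uint32_t *a, const uint32_t *b)
{
    uint32x4_t x0 = veorq_u32(vld1q_u32(a),     vld1q_u32(b));
    uint32x4_t x1 = veorq_u32(vld1q_u32(a + 4), vld1q_u32(b + 4));
    return reduce_counts(x0, x1);
}

/* Variant using one de-interleaving double quad word load per input.
   A.val[0] holds words 0,2,4,6 and A.val[1] holds words 1,3,5,7. */
uint32_t hamming_distance_vld2q(const uint32_t *a, const uint32_t *b)
{
    uint32x4x2_t A = vld2q_u32(a);
    uint32x4x2_t B = vld2q_u32(b);
    return reduce_counts(veorq_u32(A.val[0], B.val[0]),
                         veorq_u32(A.val[1], B.val[1]));
}
#else
/* Non-NEON targets: both names fall back to the scalar reference. */
uint32_t hamming_distance_vld1q(const uint32_t *a, const uint32_t *b)
{
    return hamming_distance_ref(a, b);
}

uint32_t hamming_distance_vld2q(const uint32_t *a, const uint32_t *b)
{
    return hamming_distance_ref(a, b);
}
#endif
```

Because the result is a sum of bit counts, the de-interleaving done by vld2q does not change it as long as both inputs are loaded the same way, which is why the two variants can return the same value.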
Assembly for the vld2q version follows. I am new to ARM assembly, so I may be
mistaken, but it appears to do the double quad word loads in the Neon pipeline,
transfer the registers back to the ARM core via the stack, copy them around as
arrays, and then reload them into the Neon pipeline again:
vld2q variant, 44 instructions:
00000014 <_ZN4Neon16hamming_distanceEPjS0_>:
14: e92d0070 push {r4, r5, r6}
18: e24dd084 sub sp, sp, #132 ; 0x84
1c: f460c38f vld2.32 {d28-d31}, [r0]
20: e28d6020 add r6, sp, #32
24: ecc6cb08 vstmia r6, {d28-d31}
28: e1a0c001 mov ip, r1
2c: e8b6000f ldm r6!, {r0, r1, r2, r3}
30: e28d4060 add r4, sp, #96 ; 0x60
34: e1a05004 mov r5, r4
38: f46c038f vld2.32 {d16-d19}, [ip]
3c: e8a5000f stmia r5!, {r0, r1, r2, r3}
40: eccd0b08 vstmia sp, {d16-d19}
44: e896000f ldm r6, {r0, r1, r2, r3}
48: e1a0c00d mov ip, sp
4c: e28d4040 add r4, sp, #64 ; 0x40
50: e885000f stm r5, {r0, r1, r2, r3}
54: e1a05004 mov r5, r4
58: e8bc000f ldm ip!, {r0, r1, r2, r3}
5c: e8a5000f stmia r5!, {r0, r1, r2, r3}
60: e89c000f ldm ip, {r0, r1, r2, r3}
64: e885000f stm r5, {r0, r1, r2, r3}
68: eddd4b10 vldr d20, [sp, #64] ; 0x40
6c: eddd5b12 vldr d21, [sp, #72] ; 0x48
70: edddab18 vldr d26, [sp, #96] ; 0x60
74: edddbb1a vldr d27, [sp, #104] ; 0x68
78: f34a61f4 veor q11, q13, q10
7c: eddd8b14 vldr d24, [sp, #80] ; 0x50
80: eddd9b16 vldr d25, [sp, #88] ; 0x58
84: eddd4b1c vldr d20, [sp, #112] ; 0x70
88: eddd5b1e vldr d21, [sp, #120] ; 0x78
8c: f30461f8 veor q3, q10, q12
90: f3f00546 vcnt.8 q8, q3
94: f3b04566 vcnt.8 q2, q11
98: f2042860 vadd.i8 q1, q2, q8
9c: f3f022c2 vpaddl.u8 q9, q1
a0: f3f422e2 vpaddl.u16 q9, q9
a4: f22201b2 vorr d0, d18, d18
a8: f26321b3 vorr d18, d19, d19
ac: f2620b90 vpadd.i32 d16, d18, d0
b0: f2600bb0 vpadd.i32 d16, d16, d16
b4: ee100b90 vmov.32 r0, d16[0]
b8: e28dd084 add sp, sp, #132 ; 0x84
bc: e8bd0070 pop {r4, r5, r6}
c0: e12fff1e bx lr
vld1q variant, only 19 instructions:
00000014 <_ZN4Neon16hamming_distanceEPjS0_>:
14: e2802010 add r2, r0, #16
18: e2813010 add r3, r1, #16
1c: f4606a8f vld1.32 {d22-d23}, [r0]
20: f4624a8f vld1.32 {d20-d21}, [r2]
24: f463aa8f vld1.32 {d26-d27}, [r3]
28: f461ca8f vld1.32 {d28-d29}, [r1]
2c: f34681fc veor q12, q11, q14
30: f30461fa veor q3, q10, q13
34: f3f00546 vcnt.8 q8, q3
38: f3b04568 vcnt.8 q2, q12
3c: f2042860 vadd.i8 q1, q2, q8
40: f3f022c2 vpaddl.u8 q9, q1
44: f3f422e2 vpaddl.u16 q9, q9
48: f22201b2 vorr d0, d18, d18
4c: f26321b3 vorr d18, d19, d19
50: f2620b90 vpadd.i32 d16, d18, d0
54: f2600bb0 vpadd.i32 d16, d16, d16
58: ee100b90 vmov.32 r0, d16[0]
5c: e12fff1e bx lr
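The instruction counts quoted in this report can be checked from the objdump addresses, since every ARM-mode instruction here is 4 bytes wide. A minimal sketch (the function name arm_insn_count is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-width 4-byte instructions between two objdump
   addresses, both inclusive. Hypothetical helper, not part of the bug. */
unsigned arm_insn_count(uint32_t first_addr, uint32_t last_addr)
{
    assert(last_addr >= first_addr);
    assert((last_addr - first_addr) % 4 == 0);
    return (last_addr - first_addr) / 4 + 1;
}
```

For the vld2q listing, arm_insn_count(0x14, 0xc0) gives 44, and for the vld1q listing, arm_insn_count(0x14, 0x5c) gives 19, matching the figures in the description.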
* [Bug c/49362] Arm Neon intrinsic types not correctly interpreted by compiler.
From: mark.pupilli at dyson dot com @ 2011-06-10 11:38 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49362
--- Comment #1 from mark.pupilli at dyson dot com 2011-06-10 11:38:40 UTC ---
There is a typo -
'treats uint32_t as a single register and not an array of 4 x uint32_t'
should read:
'treats uint32x4_t as a single register and not an array of 4 x uint32_t'
* [Bug c/49362] Arm Neon intrinsic types not correctly interpreted by compiler.
From: mark.pupilli at dyson dot com @ 2011-06-13 19:57 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49362
--- Comment #2 from mark.pupilli at dyson dot com 2011-06-13 19:56:43 UTC ---
The vld2q version should actually be 15 instructions (not 17!) as follows:
vld2.32 {d20-d23}, [r0]
vld2.32 {d26-d29}, [r1]
veor q12, q11, q14
veor q3, q10, q13
vcnt.8 q8, q3
vcnt.8 q2, q12
vadd.i8 q1, q2, q8
vpaddl.u8 q9, q1
vpaddl.u16 q9, q9
vorr d0, d18, d18
vorr d18, d19, d19
vpadd.i32 d16, d18, d0
vpadd.i32 d16, d16, d16
vmov.32 r0, d16[0]
bx lr
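To make the tail of this sequence easier to follow, here is a scalar C model of everything from vcnt.8 onwards, written as a sketch. It takes the two XOR results as 16-byte arrays; the names vcnt8 and reduce_model are invented for this illustration.

```c
#include <stdint.h>

/* Model of vcnt.8: population count of each byte in a q register. */
static void vcnt8(const uint8_t in[16], uint8_t out[16])
{
    for (int i = 0; i < 16; ++i) {
        uint8_t x = in[i], n = 0;
        while (x) {
            x &= (uint8_t)(x - 1);
            ++n;
        }
        out[i] = n;
    }
}

/* Model of the reduction applied to the two veor results x0 and x1. */
uint32_t reduce_model(const uint8_t x0[16], const uint8_t x1[16])
{
    uint8_t c0[16], c1[16], c[16];
    uint16_t h[8];
    uint32_t w[4], p[2];

    vcnt8(x0, c0);                              /* vcnt.8 */
    vcnt8(x1, c1);                              /* vcnt.8 */
    for (int i = 0; i < 16; ++i)                /* vadd.i8: at most 8 + 8 = 16 */
        c[i] = (uint8_t)(c0[i] + c1[i]);
    for (int i = 0; i < 8; ++i)                 /* vpaddl.u8: 16 x u8 -> 8 x u16 */
        h[i] = (uint16_t)(c[2 * i] + c[2 * i + 1]);
    for (int i = 0; i < 4; ++i)                 /* vpaddl.u16: 8 x u16 -> 4 x u32 */
        w[i] = (uint32_t)h[2 * i] + h[2 * i + 1];

    /* The two vorr instructions only copy the low and high halves of q9
       into separate d registers for the pairwise add below. */
    p[0] = w[2] + w[3];                         /* vpadd.i32 lane 0: high half */
    p[1] = w[0] + w[1];                         /* vpadd.i32 lane 1: low half */

    return p[0] + p[1];                         /* vpadd.i32, then vmov.32 r0, d16[0] */
}
```

With x0 set to sixteen 0xFF bytes and x1 to all zeros, the model returns 128, the number of set bits in one q register.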
* [Bug c/49362] Arm Neon intrinsic types not correctly interpreted by compiler.
From: Greta.Yorsh at arm dot com @ 2011-06-14 12:59 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49362
Greta Yorsh <Greta.Yorsh at arm dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |Greta.Yorsh at arm dot com
--- Comment #3 from Greta Yorsh <Greta.Yorsh at arm dot com> 2011-06-14 12:59:11 UTC ---
It looks like the problem you described has already been fixed.
When the example is compiled with gcc from trunk (gcc version 4.7.0 with -O2),
the vld1q variant has 16 instructions and the vld2q variant has 14 instructions,
counting the final bx lr (see below).
You are using gcc 4.4.1. The issue is not fixed in the latest gcc release, 4.6.
The fix should be included in the next release, but it probably won't be
backported to the 4.5 and 4.6 releases.
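For code that has to build with both older and newer compilers, one possible approach (a sketch, not something proposed in this thread) is to select the vld2q path only when the compiler is at least gcc 4.7. The macros __GNUC__ and __GNUC_MINOR__ are standard gcc predefined macros; the helper version_at_least and the macro USE_VLD2Q_PATH are hypothetical names.

```c
/* Nonzero when version major.minor is at least want_major.want_minor. */
int version_at_least(int major, int minor, int want_major, int want_minor)
{
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}

/* Hypothetical switch: prefer the vld2q variant only on gcc >= 4.7.
   clang also defines __GNUC__, so it is excluded explicitly. */
#if defined(__GNUC__) && !defined(__clang__)
#define USE_VLD2Q_PATH version_at_least(__GNUC__, __GNUC_MINOR__, 4, 7)
#else
#define USE_VLD2Q_PATH 0
#endif
```

With gcc 4.4.1 or 4.6 the check evaluates to 0, so a caller could fall back to the vld1q variant there.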
Disassembly of section .text:
00000000 <hamming_distance_vld2q>:
0: f460438f vld2.32 {d20-d23}, [r0]
4: f461038f vld2.32 {d16-d19}, [r1]
8: f34481f0 veor q12, q10, q8
c: f34601f2 veor q8, q11, q9
10: f3f02568 vcnt.8 q9, q12
14: f3f00560 vcnt.8 q8, q8
18: f24208e0 vadd.i8 q8, q9, q8
1c: f3f002e0 vpaddl.u8 q8, q8
20: f3f402e0 vpaddl.u16 q8, q8
24: f26121b1 vorr d18, d17, d17
28: f2620bb0 vpadd.i32 d16, d18, d16
2c: f2600bb0 vpadd.i32 d16, d16, d16
30: ee100b90 vmov.32 r0, d16[0]
34: e12fff1e bx lr
00000038 <hamming_distance_vld1q>:
38: f4602a8d vld1.32 {d18-d19}, [r0]!
3c: f4610a8d vld1.32 {d16-d17}, [r1]!
40: f34221f0 veor q9, q9, q8
44: f4604a8f vld1.32 {d20-d21}, [r0]
48: f3f02562 vcnt.8 q9, q9
4c: f4610a8f vld1.32 {d16-d17}, [r1]
50: f34401f0 veor q8, q10, q8
54: f3f00560 vcnt.8 q8, q8
58: f24208e0 vadd.i8 q8, q9, q8
5c: f3f002e0 vpaddl.u8 q8, q8
60: f3f402e0 vpaddl.u16 q8, q8
64: f26121b1 vorr d18, d17, d17
68: f2620bb0 vpadd.i32 d16, d18, d16
6c: f2600bb0 vpadd.i32 d16, d16, d16
70: ee100b90 vmov.32 r0, d16[0]
74: e12fff1e bx lr
* [Bug c/49362] Arm Neon intrinsic types not correctly interpreted by compiler.
From: mark.pupilli at dyson dot com @ 2011-06-14 13:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49362
mark.pupilli at dyson dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution| |FIXED
--- Comment #4 from mark.pupilli at dyson dot com 2011-06-14 13:46:51 UTC ---
Ok, looks like the optimiser will be better in 4.7 as well. Thank you!
* [Bug c/49362] Arm Neon intrinsic types not correctly interpreted by compiler.
From: ramana at gcc dot gnu.org @ 2011-06-14 14:26 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49362
Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ramana at gcc dot gnu.org
Target Milestone|--- |4.7.0