public inbox for gcc-bugs@sourceware.org
* [Bug c/42596] New: Integer/Floating point vector casts generate XMM register moves from and to the same register
From: adam at consulting dot net dot nz @ 2010-01-03 14:18 UTC
To: gcc-bugs

/*
GCC does not permit a vector of a union type. To dynamically store both integer
and float data in an XMM register, define an integer vector and cast to a float
vector whenever a floating point operation is required upon the data (or use a
union to perform the type conversion).

Instead of treating these casts as a no-op, GCC copies the register using a
floating point XMM move instruction, performs the calculation, then copies the
register back using an integer XMM move instruction.
*/

typedef double xmm_2f64_t __attribute__((vector_size (16)));
typedef long long xmm_2i64_t __attribute__((vector_size (16)));

register xmm_2f64_t xmm_a __asm__("xmm4");
register xmm_2f64_t xmm_b __asm__("xmm5");

register xmm_2i64_t xmm_c __asm__("xmm6");
register xmm_2i64_t xmm_d __asm__("xmm7");

typedef union {
  xmm_2f64_t xmm_2f64;
  xmm_2i64_t xmm_2i64;
} xmm_u;

//"data type of xmm_e isn't suitable for a register"
//register xmm_u xmm_e __asm__("xmm8");

typedef union {
  long long i64;
  double f64;
} r64_u;

//Note that the union above is suitable for a 64-bit register
register r64_u r64 __asm__("r15");

void test_fp_vectors_containing_fp_data() {
  xmm_a+=xmm_b;
}

void test_int_vectors_containing_fp_data() {
  xmm_c=(xmm_2i64_t) ((xmm_2f64_t) xmm_c + (xmm_2f64_t) xmm_d);
}

void test_int_vectors_containing_fp_data_using_a_union() {
  xmm_u u_c, u_d;
  u_c.xmm_2i64=xmm_c;
  u_d.xmm_2i64=xmm_d;
  u_c.xmm_2f64+=u_d.xmm_2f64;
  xmm_c=u_c.xmm_2i64;
}

int main() {
}

Relevant code generation:

$ gcc -O3 dynamic_vectors.c && objdump -d -m i386:x86-64:intel a.out |less

00000000004004a0 <test_fp_vectors_containing_fp_data>:
  4004a0:       66 0f 58 e5             addpd  xmm4,xmm5
  4004a4:       c3                      ret
  4004a5:       66 66 2e 0f 1f 84 00    nop    WORD PTR cs:[rax+rax*1+0x0]
  4004ac:       00 00 00 00

00000000004004b0 <test_int_vectors_containing_fp_data>:
  4004b0:       66 0f 28 c6             movapd xmm0,xmm6
  4004b4:       66 0f 58 c7             addpd  xmm0,xmm7
  4004b8:       66 0f 6f f0             movdqa xmm6,xmm0
  4004bc:       c3                      ret
  4004bd:       0f 1f 00                nop    DWORD PTR [rax]

00000000004004c0 <test_int_vectors_containing_fp_data_using_a_union>:
  4004c0:       66 0f 28 c6             movapd xmm0,xmm6
  4004c4:       66 0f 58 c7             addpd  xmm0,xmm7
  4004c8:       66 0f 6f f0             movdqa xmm6,xmm0
  4004cc:       c3                      ret
  4004cd:       0f 1f 00                nop    DWORD PTR [rax]

The last two functions should generate addpd xmm6,xmm7 instead of first copying
xmm6 to xmm0, performing the calculation, and then copying xmm0 back to xmm6.

-- 
           Summary: Integer/Floating point vector casts generate XMM register
                    moves from and to the same register
           Product: gcc
           Version: 4.4.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: adam at consulting dot net dot nz

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42596

* [Bug c/42596] Integer/Floating point vector casts generate XMM register moves from and to the same register
From: rguenth at gcc dot gnu dot org @ 2010-01-03 14:46 UTC
To: gcc-bugs

------- Comment #1 from rguenth at gcc dot gnu dot org 2010-01-03 14:46 -------
*** Bug 42595 has been marked as a duplicate of this bug. ***

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42596

* [Bug c/42596] Integer/Floating point vector casts generate XMM register moves from and to the same register
From: rguenth at gcc dot gnu dot org @ 2010-01-03 14:50 UTC
To: gcc-bugs

------- Comment #2 from rguenth at gcc dot gnu dot org 2010-01-03 14:50 -------
Likely because you are using hardregs. Re-do the testcase with intrinsics
please.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42596

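For readers unfamiliar with the request in comment #2, here is a minimal sketch of what an intrinsics-based version of the calculation might look like. It is not taken from the thread: the function names are hypothetical, it assumes an x86 target with SSE2 and the standard <emmintrin.h> header, and it passes the vectors as ordinary arguments rather than global register variables. Whether it avoids the extra moves is not established anywhere in this thread.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: store two doubles bit-for-bit in an integer vector.
   _mm_set_pd takes the high element first, so lo ends up in lane 0. */
static __m128i pack_doubles(double lo, double hi)
{
    return _mm_castpd_si128(_mm_set_pd(hi, lo));
}

/* Hypothetical helper: read one double lane back out of an integer vector. */
static double unpack_double(__m128i v, int lane)
{
    assert(lane == 0 || lane == 1);
    __m128d f = _mm_castsi128_pd(v);
    if (lane == 1)
        f = _mm_unpackhi_pd(f, f);  /* move the high lane into lane 0 */
    return _mm_cvtsd_f64(f);
}

/* Same calculation as test_int_vectors_containing_fp_data(): the cast
   intrinsics only reinterpret the bits, and _mm_add_pd does the
   packed double-precision addition. */
static __m128i add_fp_data_in_int_vectors(__m128i c, __m128i d)
{
    return _mm_castpd_si128(_mm_add_pd(_mm_castsi128_pd(c),
                                       _mm_castsi128_pd(d)));
}
```

Compiling a caller of add_fp_data_in_int_vectors() with -O3 and disassembling it, as done in the original report, would show what code this form generates on a given GCC version.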
* [Bug c/42596] Integer/Floating point vector casts generate XMM register moves from and to the same register
From: adam at consulting dot net dot nz @ 2010-01-03 23:20 UTC
To: gcc-bugs

------- Comment #3 from adam at consulting dot net dot nz 2010-01-03 23:20 -------
This is a demo of poor code generation with XMM global register variables
("hardregs") and the vector extensions.

Intrinsics are too low level. The vector extensions can continue to work on a
platform without x86 intrinsics (just replace register variable definitions
with, e.g., __thread variable definitions on a foreign architecture).

test_fp_vectors_containing_fp_data() indicates that code generation can be
optimal with XMM global register variables+vector extensions. But only if the
register variables are defined as the same type as the calculation. This is not
a reasonable restriction for an XMM global register variable that has to
contain integer and/or floating point data over the lifetime of a program.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42596

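The portability point in comment #3 can be sketched as follows. This is an illustration only, not code from the thread: the tls_* variables and the helper functions are hypothetical names, and the sketch assumes a GCC-compatible compiler that supports vector_size, __thread, and vector subscripting. It relies on the documented behaviour that a cast between vector types of the same size reinterprets the bits.

```c
#include <assert.h>

typedef double    xmm_2f64_t __attribute__((vector_size (16)));
typedef long long xmm_2i64_t __attribute__((vector_size (16)));

/* Hypothetical stand-ins for the global register variables xmm_c and
   xmm_d: ordinary thread-local variables, usable on any architecture. */
static __thread xmm_2i64_t tls_c;
static __thread xmm_2i64_t tls_d;

/* Store two doubles bit-for-bit in an integer vector. */
static xmm_2i64_t pack_doubles(double lo, double hi)
{
    xmm_2f64_t v = { lo, hi };
    return (xmm_2i64_t) v;  /* same-size vector cast: bits are unchanged */
}

/* Read one double lane back out of an integer vector. */
static double unpack_double(xmm_2i64_t v, int lane)
{
    assert(lane == 0 || lane == 1);
    xmm_2f64_t f = (xmm_2f64_t) v;
    return f[lane];
}

/* Same expression as test_int_vectors_containing_fp_data(), applied to
   the thread-local variables instead of register variables. */
static void add_fp_data_in_int_vectors(void)
{
    tls_c = (xmm_2i64_t) ((xmm_2f64_t) tls_c + (xmm_2f64_t) tls_d);
}
```

The arithmetic expression is unchanged from the original testcase; only the storage class of the variables differs, which is the substitution the comment describes.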
* [Bug c/42596] Integer/Floating point vector casts generate XMM register moves from and to the same register
From: adam at consulting dot net dot nz @ 2010-01-05 4:17 UTC
To: gcc-bugs

------- Comment #4 from adam at consulting dot net dot nz 2010-01-05 04:17 -------
/* Workaround discovered! */

void test_int_vectors_containing_fp_data_using_local_reg_var_overlay() {
  //create local register variables of the required floating point type
  //(for the same global register variables)
  register xmm_2f64_t local_xmm_c __asm__("xmm6");
  register xmm_2f64_t local_xmm_d __asm__("xmm7");
  //same calculation upon the local register variables. No casts are required.
  local_xmm_c = local_xmm_c + local_xmm_d;
  //the local changes above will be optimised away unless the global register
  //variables are updated. The casts below should be a no-op as the local
  //register variables are aliased to the global register variables.
  xmm_c=(xmm_2i64_t) local_xmm_c;
  xmm_d=(xmm_2i64_t) local_xmm_d;
}

With this workaround generated code is still optimal when the global register
variables have an integer vector type:

0000000000400550 <test_int_vectors_containing_fp_data_using_local_reg_var_overlay>:
  400550:       66 0f 58 f7             addpd  xmm6,xmm7
  400554:       c3                      ret

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42596

* [Bug c/42596] Integer/Floating point vector casts generate XMM register moves from and to the same register
From: rguenth at gcc dot gnu dot org @ 2010-01-05 10:47 UTC
To: gcc-bugs

------- Comment #5 from rguenth at gcc dot gnu dot org 2010-01-05 10:47 -------
Uh, of course. _Don't use global register variables_

They are not supposed to be used for this kind of thing, and nobody spends a
single second optimizing code generation for them. Instead, the most difficult
thing is to prevent the compiler from miscompiling things.

-- 
rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |WONTFIX

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42596