public inbox for gcc-bugs@sourceware.org
* [Bug c/54174] New: Missed optimization: Unnecessary vmovaps generated for __builtin_ia32_vextractf128_ps256(v, 0)
@ 2012-08-04 17:58 dag at nimrod dot no
2012-08-05 10:39 ` [Bug target/54174] " rguenth at gcc dot gnu.org
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: dag at nimrod dot no @ 2012-08-04 17:58 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54174
Bug #: 54174
Summary: Missed optimization: Unnecessary vmovaps generated for
__builtin_ia32_vextractf128_ps256(v, 0)
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: dag@nimrod.no
Pasting the following test code into test.c and compiling with gcc -Wall -O
-mavx -S test.c
----
typedef float v4sf __attribute__ ((vector_size (4*4)));
typedef float v8sf __attribute__ ((vector_size (4*8)));
v4sf add(v8sf v)
{
    v4sf a = __builtin_ia32_vextractf128_ps256(v, 0);
    v4sf b = __builtin_ia32_vextractf128_ps256(v, 1);
    return a + b;
}
----
makes gcc generate the following code:
vmovaps %xmm0, %xmm1
vextractf128 $0x1, %ymm0, %xmm0
vaddps %xmm0, %xmm1, %xmm0
However if the statements for a and b are swapped, i.e.
v4sf b = __builtin_ia32_vextractf128_ps256(v, 1);
v4sf a = __builtin_ia32_vextractf128_ps256(v, 0);
then gcc is able to optimize away the vmovaps instruction:
vextractf128 $0x1, %ymm0, %xmm1
vaddps %xmm1, %xmm0, %xmm0
It thus seems that GCC already has rules to treat
__builtin_ia32_vextractf128_ps256(v, 0) as a no-op, yet in most cases a
vmovaps is still generated (or, rather, not optimized away).
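For reference, here is a minimal sketch of what the two builtin calls compute, written with plain GCC vector subscripting so it does not need -mavx. The helper names extract_half and add_halves are hypothetical and are not part of GCC; the sketch only models the lane selection and says nothing about the code GCC generates.

```c
#include <assert.h>

typedef float v4sf __attribute__ ((vector_size (4*4)));
typedef float v8sf __attribute__ ((vector_size (4*8)));

/* Hypothetical reference model of vextractf128: copy lanes
   4*half .. 4*half+3 of *v. The vector is passed by pointer so the
   sketch also compiles without AVX enabled. */
v4sf extract_half(const v8sf *v, int half)
{
    v4sf r = {0.0f, 0.0f, 0.0f, 0.0f};
    assert(half == 0 || half == 1);
    for (int i = 0; i < 4; i++)
        r[i] = (*v)[4 * half + i];
    return r;
}

/* Same computation as add() in the report; the order in which the two
   halves are extracted cannot change the result. */
v4sf add_halves(const v8sf *v)
{
    v4sf a = extract_half(v, 0);
    v4sf b = extract_half(v, 1);
    return a + b;
}
```

Since half 0 is just lanes 0..3 of the input, no data has to move when the result may live in the low part of the same register, which is why the vmovaps looks avoidable.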
* [Bug target/54174] Missed optimization: Unnecessary vmovaps generated for __builtin_ia32_vextractf128_ps256(v, 0)
From: rguenth at gcc dot gnu.org @ 2012-08-05 10:39 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54174
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |ra
Target| |x86_64-*-*
Component|c |target
--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-05 10:38:21 UTC ---
That's more likely a register allocator issue.
* [Bug target/54174] Missed optimization: Unnecessary vmovaps generated for __builtin_ia32_vextractf128_ps256(v, 0)
From: pinskia at gcc dot gnu.org @ 2021-08-21 19:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54174
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2021-08-21
Severity|normal |enhancement
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.
Most likely the vec_extract_lo_<mode> pattern should have an alternative that
ties the input and output to the same register.
Something like:
(define_insn "vec_extract_lo_<mode>"
  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,v,vm,v")
        (vec_select:<ssehalfvecmode>
          (match_operand:V8FI 1 "nonimmediate_operand" "0,v,v,vm")
          (parallel [(const_int 0) (const_int 1)
                     (const_int 2) (const_int 3)])))]
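As a side illustration of what the "0" constraint means: GCC's extended inline asm uses the same matching-constraint notation as machine descriptions. The sketch below assumes nothing about the sse.md pattern itself; tied_copy and check_tied_copy are hypothetical names, and the generic "r" register class stands in for "v" so it builds on any target.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC. Operand 1 carries the matching
   constraint "0", so the compiler must place 'in' in the same register
   it picked for 'out'. The asm template is empty: no instruction is
   emitted, and 'out' simply observes the value already in that
   register. */
int tied_copy(int in)
{
    int out;
    __asm__ ("" : "=r" (out) : "0" (in));
    return out;
}

/* Small usage check. */
void check_tied_copy(void)
{
    assert(tied_copy(42) == 42);
    assert(tied_copy(-7) == -7);
}
```

In the proposed pattern, the first alternative ("=v" with "0") plays the same role: if the allocator can satisfy it, source and destination share a register and the extract of the low half needs no move.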
* [Bug target/54174] Missed optimization: Unnecessary vmovaps generated for __builtin_ia32_vextractf128_ps256(v, 0)
From: crazylht at gmail dot com @ 2021-08-23 11:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54174
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> That's more likely a register allocator issue.
Yes, LRA allocates registers from back to front, which means that changing the
source code as below eliminates the redundant mov.
typedef float v4sf __attribute__ ((vector_size (4*4)));
typedef float v8sf __attribute__ ((vector_size (4*8)));
v4sf add(v8sf v)
{
    v4sf b = __builtin_ia32_vextractf128_ps256(v, 1);
    v4sf a = __builtin_ia32_vextractf128_ps256(v, 0);
    return a + b;
}
* [Bug target/54174] Missed optimization: Unnecessary vmovaps generated for __builtin_ia32_vextractf128_ps256(v, 0)
From: lin1.hu at intel dot com @ 2024-05-16 1:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54174
Hu Lin <lin1.hu at intel dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |lin1.hu at intel dot com
--- Comment #4 from Hu Lin <lin1.hu at intel dot com> ---
I tried to modify vec_extract_lo_<mode> to:
(define_insn "vec_extract_lo_<mode>"
  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v,v,vm,v")
        (vec_select:<ssehalfvecmode>
          (match_operand:VI4F_256 1 "nonimmediate_operand" "0,v,v,vm")
          (parallel [(const_int 0) (const_int 1)
                     (const_int 2) (const_int 3)])))]
and
(define_insn "vec_extract_lo_<mode>"
  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand"
          "=v,?v,?vm,?v")
        (vec_select:<ssehalfvecmode>
          (match_operand:VI4F_256 1 "nonimmediate_operand" "0,v,v,vm")
          (parallel [(const_int 0) (const_int 1)
                     (const_int 2) (const_int 3)])))]
The 315r.reload dump then shows:
Considering alt=0 of insn 7: (0) =v (1) 0
1 Matching alt: reject+=2
overall=8,losers=1,rld_nregs=1
Considering alt=1 of insn 7: (0) ?v (1) v
Staticly defined alt reject+=6
overall=0,losers=0,rld_nregs=0
Choosing alt 1 in insn 7: (0) ?v (1) v {vec_extract_lo_v8sf}
I also tried using ! instead; alt=0 is still rejected.
I even tried reducing the pattern to a single tied alternative:
(define_insn "vec_extract_lo_<mode>"
  [(set (match_operand:<ssehalfvecmode> 0 "nonimmediate_operand" "=v")
        (vec_select:<ssehalfvecmode>
          (match_operand:VI4F_256 1 "nonimmediate_operand" "0")
          (parallel [(const_int 0) (const_int 1)
                     (const_int 2) (const_int 3)])))]
Although vec_extract_lo_v8sf then uses the same register, %xmm2, for input and
output, the compiler adds an extra insn "vmovaps %ymm0, %ymm2" after reload.
On the other hand, we tried splitting the pattern into
[(set (match_dup 0) (match_dup 1))]
{
operands[1] = gen_lowpart (<ssehalfvecmode>mode, operands[1]);
}
before reload, but GCC can't perform register coalescing the way Clang does.
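To make the gen_lowpart idea concrete at the source level: the low half of a 256-bit vector is simply its first 16 bytes, so taking it is a reinterpretation rather than a computation. A minimal sketch, assuming GCC vector extensions; low_half is a hypothetical helper and this is not the RTL split itself.

```c
#include <assert.h>
#include <string.h>

typedef float v4sf __attribute__ ((vector_size (4*4)));
typedef float v8sf __attribute__ ((vector_size (4*8)));

/* Hypothetical helper: view the first sizeof(v4sf) bytes of *v as a
   v4sf. Vector elements are laid out in memory in index order, so this
   yields lanes 0..3. */
v4sf low_half(const v8sf *v)
{
    v4sf lo;
    assert(2 * sizeof lo == sizeof *v);
    memcpy(&lo, v, sizeof lo);
    return lo;
}
```

Whether the resulting register-to-register copy disappears still depends on the allocator or a later pass merging the two pseudos, which is the part reported as not working here.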