public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
@ 2020-07-13 7:05 ` crazylht at gmail dot com
2020-09-03 8:11 ` cvs-commit at gcc dot gnu.org
` (14 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2020-07-13 7:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
a patch is posted at
https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549713.html
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
2020-07-13 7:05 ` [Bug target/87767] Missing AVX512 memory broadcast for constant vector crazylht at gmail dot com
@ 2020-09-03 8:11 ` cvs-commit at gcc dot gnu.org
2020-09-03 8:30 ` crazylht at gmail dot com
` (13 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-09-03 8:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:433734126996b6fc4fc99b594421510f928a7bb9
commit r11-2991-g433734126996b6fc4fc99b594421510f928a7bb9
Author: liuhongt <hongtao.liu@intel.com>
Date: Wed Jul 8 17:14:36 2020 +0800
Optimize memory broadcast for constant vector under AVX512.
For constant vector having one duplicated value, there's no need to put
whole vector in the constant pool, using embedded broadcast instead.
2020-07-09 Hongtao Liu <hongtao.liu@intel.com>
gcc/ChangeLog:
PR target/87767
* config/i386/i386-features.c
(replace_constant_pool_with_broadcast): New function.
(constant_pool_broadcast): Ditto.
(class pass_constant_pool_broadcast): New pass.
(make_pass_constant_pool_broadcast): Ditto.
(remove_partial_avx_dependency): Call
replace_constant_pool_with_broadcast under TARGET_AVX512F, it
would save compile time when both pass rpad and cpb are
available.
(remove_partial_avx_dependency_gate): New function.
(class pass_remove_partial_avx_dependency::gate): Call
remove_partial_avx_dependency_gate.
* config/i386/i386-passes.def: Insert new pass after combine.
* config/i386/i386-protos.h
(make_pass_constant_pool_broadcast): Declare.
* config/i386/sse.md (*avx512dq_mul<mode>3<mask_name>_bcst):
New define_insn.
(*avx512f_mul<mode>3<mask_name>_bcst): Ditto.
* config/i386/avx512fintrin.h (_mm512_set1_ps,
_mm512_set1_pd,_mm512_set1_epi32, _mm512_set1_epi64): Adjusted.
gcc/testsuite/ChangeLog:
PR target/87767
* gcc.target/i386/avx2-broadcast-pr87767-1.c: New test.
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: New test.
* gcc.target/i386/avx512f-broadcast-pr87767-2.c: New test.
* gcc.target/i386/avx512f-broadcast-pr87767-3.c: New test.
* gcc.target/i386/avx512f-broadcast-pr87767-4.c: New test.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: New test.
* gcc.target/i386/avx512f-broadcast-pr87767-6.c: New test.
* gcc.target/i386/avx512f-broadcast-pr87767-7.c: New test.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: New test.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: New test.
* gcc.target/i386/avx512vl-broadcast-pr87767-2.c: New test.
* gcc.target/i386/avx512vl-broadcast-pr87767-3.c: New test.
* gcc.target/i386/avx512vl-broadcast-pr87767-4.c: New test.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: New test.
* gcc.target/i386/avx512vl-broadcast-pr87767-6.c: New test.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
2020-07-13 7:05 ` [Bug target/87767] Missing AVX512 memory broadcast for constant vector crazylht at gmail dot com
2020-09-03 8:11 ` cvs-commit at gcc dot gnu.org
@ 2020-09-03 8:30 ` crazylht at gmail dot com
2020-09-03 8:37 ` jakub at gcc dot gnu.org
` (12 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2020-09-03 8:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
I think it's fixed in GCC11, but still there're lots of "_bcst" patterns need
to be added.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
` (2 preceding siblings ...)
2020-09-03 8:30 ` crazylht at gmail dot com
@ 2020-09-03 8:37 ` jakub at gcc dot gnu.org
2020-09-03 9:06 ` crazylht at gmail dot com
` (11 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-09-03 8:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Do we really need separate _bcst patterns btw? Wouldn't it be better to just
have some predicate and corresponding constaint that would allow normal MEM
vectors as well as these broadcast from single element and just use that
predicate wherever the broadcasts are allowed? Probably would need multiple of
them though, even when the main would turn to be just memory_operand if
TARGET_AVX512F is false, some instructions only use EVEX encoding if some other
TARGET_AVX* is on...
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
` (3 preceding siblings ...)
2020-09-03 8:37 ` jakub at gcc dot gnu.org
@ 2020-09-03 9:06 ` crazylht at gmail dot com
2020-09-03 9:31 ` jakub at gcc dot gnu.org
` (10 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2020-09-03 9:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #11 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #10)
> Do we really need separate _bcst patterns btw? Wouldn't it be better to
> just have some predicate and corresponding constaint that would allow normal
In currently implementation, vec_duplicate would be used for memory_operand in
broadcast patterns. I'm not sure if vec_duplicate could be used in
define_predicate, or am i misunderstood?
> MEM vectors as well as these broadcast from single element and just use that
> predicate wherever the broadcasts are allowed? Probably would need multiple
> of them though, even when the main would turn to be just memory_operand if
> TARGET_AVX512F is false, some instructions only use EVEX encoding if some
> other TARGET_AVX* is on...
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
` (4 preceding siblings ...)
2020-09-03 9:06 ` crazylht at gmail dot com
@ 2020-09-03 9:31 ` jakub at gcc dot gnu.org
2020-09-04 15:52 ` crazylht at gmail dot com
` (9 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-09-03 9:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
What I mean is that we should try to simplify the md file, instead of adding
hundreds of new *_bcst patterns.
We have e.g.
(define_insn "*<plusminus_insn><mode>3"
[(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
(plusminus:VI_AVX2
(match_operand:VI_AVX2 1 "vector_operand" "<comm>0,v")
(match_operand:VI_AVX2 2 "vector_operand" "xBm,vm")))]
"TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
"@
p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
[(set_attr "isa" "noavx,avx")
(set_attr "type" "sseiadd")
(set_attr "prefix_data16" "1,*")
(set_attr "prefix" "orig,vex")
(set_attr "mode" "<sseinsnmode>")])
(define_insn "*sub<mode>3_bcst"
[(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
(minus:VI48_AVX512VL
(match_operand:VI48_AVX512VL 1 "register_operand" "v")
(vec_duplicate:VI48_AVX512VL
(match_operand:<ssescalarmode> 2 "memory_operand" "m"))))]
"TARGET_AVX512F && ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
"vpsub<ssemodesuffix>\t{%2<avx512bcst>, %1, %0|%0, %1, %2<avx512bcst>}"
[(set_attr "type" "sseiadd")
(set_attr "prefix" "evex")
(set_attr "mode" "<sseinsnmode>")])
What I meant is we could have just:
(define_insn "*<plusminus_insn><mode>3"
[(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
(plusminus:VI_AVX2
(match_operand:VI_AVX2 1 "vector_bcst_operand" "<comm>0,v")
(match_operand:VI_AVX2 2 "vector_bcst_operand" "xBm,vBb")))]
"TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
"@
p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
[(set_attr "isa" "noavx,avx")
(set_attr "type" "sseiadd")
(set_attr "prefix_data16" "1,*")
(set_attr "prefix" "orig,vex")
(set_attr "mode" "<sseinsnmode>")])
where vector_bcst_operand is either vector_operand, or for TARGET_AVX512F
a VEC_DUPLICATE of the right mode with a MEM inside of it with the element mode
of the VEC_DUPLICATE mode, similarly Bb constraint is either m, or for
TARGET_AVX512F also again the VEC_DUPLICATE with MEM inside of it, and that
ix86_binary_operator_ok would treat a VEC_DUPLICATE wrapping MEM the same as
MEM (in particular ensure one e.g. doesn't have one VEC_DUPLICATE and one MEM
operand, or two VEC_DUPLICATE operands) and that the output code would handle
emitting an operand with VEC_DUPLICATE of a MEM properly.
Or perhaps the constraint there could be just for the broadcast and one could
write vmBb. Still, I think the predicate needs to be accurate, i.e. for some
instructions we want e.g. vector_operand or TARGET_AVX512F and
bcst_mem_operand,
for others vector_operand or TARGET_AVX512VL and bcst_mem_operand etc.
Anyway, if we go down this route, might be best to handle just a couple of
patterns, then ask for review and see what Kirill (or if Uros would be
interested) think about it and only later convert more.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
` (5 preceding siblings ...)
2020-09-03 9:31 ` jakub at gcc dot gnu.org
@ 2020-09-04 15:52 ` crazylht at gmail dot com
2020-09-04 15:56 ` crazylht at gmail dot com
` (8 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2020-09-04 15:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #13 from Hongtao.liu <crazylht at gmail dot com> ---
Created attachment 49182
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49182&action=edit
bcst_vector_operand
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
` (6 preceding siblings ...)
2020-09-04 15:52 ` crazylht at gmail dot com
@ 2020-09-04 15:56 ` crazylht at gmail dot com
2020-09-08 4:28 ` crazylht at gmail dot com
` (7 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2020-09-04 15:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #14 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #12)
> What I mean is that we should try to simplify the md file, instead of adding
> hundreds of new *_bcst patterns.
> We have e.g.
> (define_insn "*<plusminus_insn><mode>3"
> [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
> (plusminus:VI_AVX2
> (match_operand:VI_AVX2 1 "vector_operand" "<comm>0,v")
> (match_operand:VI_AVX2 2 "vector_operand" "xBm,vm")))]
> "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
> "@
> p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
> vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
> [(set_attr "isa" "noavx,avx")
> (set_attr "type" "sseiadd")
> (set_attr "prefix_data16" "1,*")
> (set_attr "prefix" "orig,vex")
> (set_attr "mode" "<sseinsnmode>")])
>
> (define_insn "*sub<mode>3_bcst"
> [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
> (minus:VI48_AVX512VL
> (match_operand:VI48_AVX512VL 1 "register_operand" "v")
> (vec_duplicate:VI48_AVX512VL
> (match_operand:<ssescalarmode> 2 "memory_operand" "m"))))]
> "TARGET_AVX512F && ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
> "vpsub<ssemodesuffix>\t{%2<avx512bcst>, %1, %0|%0, %1, %2<avx512bcst>}"
> [(set_attr "type" "sseiadd")
> (set_attr "prefix" "evex")
> (set_attr "mode" "<sseinsnmode>")])
>
> What I meant is we could have just:
> (define_insn "*<plusminus_insn><mode>3"
> [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
> (plusminus:VI_AVX2
> (match_operand:VI_AVX2 1 "vector_bcst_operand" "<comm>0,v")
> (match_operand:VI_AVX2 2 "vector_bcst_operand" "xBm,vBb")))]
> "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
> "@
> p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
> vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
> [(set_attr "isa" "noavx,avx")
> (set_attr "type" "sseiadd")
> (set_attr "prefix_data16" "1,*")
> (set_attr "prefix" "orig,vex")
> (set_attr "mode" "<sseinsnmode>")])
> where vector_bcst_operand is either vector_operand, or for TARGET_AVX512F
> a VEC_DUPLICATE of the right mode with a MEM inside of it with the element
> mode of the VEC_DUPLICATE mode, similarly Bb constraint is either m, or for
> TARGET_AVX512F also again the VEC_DUPLICATE with MEM inside of it, and that
> ix86_binary_operator_ok would treat a VEC_DUPLICATE wrapping MEM the same as
> MEM (in particular ensure one e.g. doesn't have one VEC_DUPLICATE and one
> MEM operand, or two VEC_DUPLICATE operands) and that the output code would
> handle emitting an operand with VEC_DUPLICATE of a MEM properly.
> Or perhaps the constraint there could be just for the broadcast and one
> could write vmBb. Still, I think the predicate needs to be accurate, i.e.
> for some instructions we want e.g. vector_operand or TARGET_AVX512F and
> bcst_mem_operand,
> for others vector_operand or TARGET_AVX512VL and bcst_mem_operand etc.
>
> Anyway, if we go down this route, might be best to handle just a couple of
> patterns, then ask for review and see what Kirill (or if Uros would be
> interested) think about it and only later convert more.
Is there any way to add preference to constraint "Bb", since we always want to
choose "Bb" when vec_duplicate existed, but sometimes, pass_reload would choose
'v', which produce a redudant broadcast instructions.
i.e: with the patch attached.
testcase avx512f-add-df-zmm-1.c would fail to generate embedded broadcast with
-m32.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
` (7 preceding siblings ...)
2020-09-04 15:56 ` crazylht at gmail dot com
@ 2020-09-08 4:28 ` crazylht at gmail dot com
2020-09-09 17:06 ` vmakarov at gcc dot gnu.org
` (6 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2020-09-08 4:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
Hongtao.liu <crazylht at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |vmakarov at redhat dot com
--- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #12)
> What I mean is that we should try to simplify the md file, instead of adding
> hundreds of new *_bcst patterns.
> We have e.g.
> (define_insn "*<plusminus_insn><mode>3"
> [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
> (plusminus:VI_AVX2
> (match_operand:VI_AVX2 1 "vector_operand" "<comm>0,v")
> (match_operand:VI_AVX2 2 "vector_operand" "xBm,vm")))]
> "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
> "@
> p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
> vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
> [(set_attr "isa" "noavx,avx")
> (set_attr "type" "sseiadd")
> (set_attr "prefix_data16" "1,*")
> (set_attr "prefix" "orig,vex")
> (set_attr "mode" "<sseinsnmode>")])
>
> (define_insn "*sub<mode>3_bcst"
> [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
> (minus:VI48_AVX512VL
> (match_operand:VI48_AVX512VL 1 "register_operand" "v")
> (vec_duplicate:VI48_AVX512VL
> (match_operand:<ssescalarmode> 2 "memory_operand" "m"))))]
> "TARGET_AVX512F && ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
> "vpsub<ssemodesuffix>\t{%2<avx512bcst>, %1, %0|%0, %1, %2<avx512bcst>}"
> [(set_attr "type" "sseiadd")
> (set_attr "prefix" "evex")
> (set_attr "mode" "<sseinsnmode>")])
>
> What I meant is we could have just:
> (define_insn "*<plusminus_insn><mode>3"
> [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
> (plusminus:VI_AVX2
> (match_operand:VI_AVX2 1 "vector_bcst_operand" "<comm>0,v")
> (match_operand:VI_AVX2 2 "vector_bcst_operand" "xBm,vBb")))]
> "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
> "@
> p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
> vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
> [(set_attr "isa" "noavx,avx")
> (set_attr "type" "sseiadd")
> (set_attr "prefix_data16" "1,*")
> (set_attr "prefix" "orig,vex")
> (set_attr "mode" "<sseinsnmode>")])
> where vector_bcst_operand is either vector_operand, or for TARGET_AVX512F
> a VEC_DUPLICATE of the right mode with a MEM inside of it with the element
> mode of the VEC_DUPLICATE mode, similarly Bb constraint is either m, or for
> TARGET_AVX512F also again the VEC_DUPLICATE with MEM inside of it, and that
> ix86_binary_operator_ok would treat a VEC_DUPLICATE wrapping MEM the same as
> MEM (in particular ensure one e.g. doesn't have one VEC_DUPLICATE and one
> MEM operand, or two VEC_DUPLICATE operands) and that the output code would
> handle emitting an operand with VEC_DUPLICATE of a MEM properly.
> Or perhaps the constraint there could be just for the broadcast and one
> could write vmBb. Still, I think the predicate needs to be accurate, i.e.
> for some instructions we want e.g. vector_operand or TARGET_AVX512F and
> bcst_mem_operand,
> for others vector_operand or TARGET_AVX512VL and bcst_mem_operand etc.
>
> Anyway, if we go down this route, might be best to handle just a couple of
> patterns, then ask for review and see what Kirill (or if Uros would be
> interested) think about it and only later convert more.
Hi Vladimir Makarov:
I saw you add DEFINE_SPECIAL_MEMORY_CONSTRAINT in PR69299, currently we
encounter a similar problem as PR69299, we want to add
special_memory_constraint for broadcast memory operand(call it bcst_mem_operand
later), but problem is bcst_mem_operand is not MEM_P, it's like
(vec_duplicate:V4SF (mem:SF (reg:...))), so pass_reload can't properly handle
this constraint(it alway assumes the operand should be MEM_P). So the question
is can we enhance the handling of special_memory_constraint, not only
restricted to MEM_P, but also for operand containing a memory_operand
inside(i.e. bcst_mem_operand).
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
` (8 preceding siblings ...)
2020-09-08 4:28 ` crazylht at gmail dot com
@ 2020-09-09 17:06 ` vmakarov at gcc dot gnu.org
2020-10-22 2:29 ` cvs-commit at gcc dot gnu.org
` (5 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2020-09-09 17:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
Vladimir Makarov <vmakarov at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |vmakarov at gcc dot gnu.org
--- Comment #16 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #15)
> (In reply to Jakub Jelinek from comment #12)
>
> Hi Vladimir Makarov:
> I saw you add DEFINE_SPECIAL_MEMORY_CONSTRAINT in PR69299, currently we
> encounter a similar problem as PR69299, we want to add
> special_memory_constraint for broadcast memory operand(call it
> bcst_mem_operand later), but problem is bcst_mem_operand is not MEM_P, it's
> like (vec_duplicate:V4SF (mem:SF (reg:...))), so pass_reload can't properly
> handle this constraint(it alway assumes the operand should be MEM_P). So the
> question is can we enhance the handling of special_memory_constraint, not
> only restricted to MEM_P, but also for operand containing a memory_operand
> inside(i.e. bcst_mem_operand).
Sure, I am open to any patches to solve this problem by extending special
memory constraint semantic or even introducing new special constraint.
It would have no sense to implement this for old reload pass as gcc targets are
moving from it. So you need to do this only for LRA.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
` (9 preceding siblings ...)
2020-09-09 17:06 ` vmakarov at gcc dot gnu.org
@ 2020-10-22 2:29 ` cvs-commit at gcc dot gnu.org
2020-10-22 2:29 ` cvs-commit at gcc dot gnu.org
` (4 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-10-22 2:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #17 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:4de7b010038933dd6ca96bf186ca49f243d0def6
commit r11-4202-g4de7b010038933dd6ca96bf186ca49f243d0def6
Author: liuhongt <hongtao.liu@intel.com>
Date: Sat Sep 26 15:08:32 2020 +0800
Extend special_memory_constraint.
For operand with special_memory_constraint, there could be a wrapper
for memory_operand. Extract mem for operand for conditional judgement
like MEM_P, also for record_address_regs.
gcc/ChangeLog:
PR target/87767
* ira-costs.c (record_operand_costs): Extract memory operand
from recog_data.operand[i] for record_address_regs.
(record_reg_classes): Extract memory operand from OP for
conditional judgement MEM_P.
* ira.c (ira_setup_alts): Ditto.
* lra-constraints.c (extract_mem_from_operand): New function.
(satisfies_memory_constraint_p): Extract memory operand from
OP for decompose_mem_address, return false when there's no
memory operand inside OP.
(process_alt_operands): Remove MEM_P (op) since it would be
judged in satisfies_memory_constraint_p.
* recog.c (asm_operand_ok): Extract memory operand from OP for
judgement of memory_operand (OP, VOIDmode).
(constrain_operands): Don't unwrapper unary operator when
there's memory operand inside.
* rtl.h (extract_mem_from_operand): New decl.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
` (10 preceding siblings ...)
2020-10-22 2:29 ` cvs-commit at gcc dot gnu.org
@ 2020-10-22 2:29 ` cvs-commit at gcc dot gnu.org
2021-07-01 15:11 ` cvs-commit at gcc dot gnu.org
` (3 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-10-22 2:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #18 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:7026bb9504eb0f95e114f832cd6dd14302376861
commit r11-4203-g7026bb9504eb0f95e114f832cd6dd14302376861
Author: liuhongt <hongtao.liu@intel.com>
Date: Sat Sep 26 15:34:23 2020 +0800
Refactor implementation of *_bcst{_1,_2,_3} patterns.
Add new predicate bcst_mem_operand and corresponding constraint "Br"
to merge "$(pattern)_bcst{_1,_2,_3}" into "$(pattern)", also delete
those separate "*_bcst{_1,_2,_3}" patterns.
gcc/ChangeLog:
PR target/87767
* config/i386/constraints.md ("Br"): New special memory
constraint.
* config/i386/i386-expand.c (ix86_binary_operator_ok): Both
source operand cannot be in memory or bcst_memory_operand.
* config/i386/i386.c (ix86_print_operand): Print bcst_mem_operand.
* config/i386/i386.h (VALID_BCST_MODE_P): New.
* config/i386/predicates.md (bcst_mem_operand): New predicate
for AVX512 embedding broadcast memory operand.
(bcst_vector_operand): New predicate, vector_operand or
bcst_mem_operand.
* config/i386/sse.md
(*<plusminus_insn><mode>3<mask_name><round_name>): Extend
predicate and constraints to handle bcst_mem_operand.
(*mul<mode>3<mask_name><round_name>): Ditto.
(<sse>_div<mode>3<mask_name><round_name>): Ditto.
(<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>):
Ditto.
(<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>):
Ditto.
(<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>):
Ditto.
(<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>):
Ditto.
(*<plusminus_insn><mode>3): Ditto.
(avx512dq_mul<mode>3<mask_name>): Ditto.
(*<sse4_1_avx2>_mul<mode>3<mask_name>): Ditto.
(*andnot<mode>3): Ditto.
(<mask_codefor><code><mode>3<mask_name>): Ditto.
(*sub<mode>3<mask_name>_bcst): Removed.
(*add<mode>3<mask_name>_bcst): Ditto.
(*mul<mode>3<mask_name>_bcst): Ditto.
(*<avx512>_div<mode>3<mask_name>_bcst): Ditto.
(*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_1):
Ditto.
(*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_2):
Ditto.
(*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_3):
Ditto.
(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1):
Ditto.
(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2):
Ditto.
(*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3):
Ditto.
(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_1):
Ditto.
(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_2):
Ditto.
(*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_3):
Ditto.
(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_1):
Ditto.
(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_2):
Ditto.
(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_3):
Ditto.
(*sub<mode>3_bcst): Ditto.
(*add<mode>3_bcst): Ditto.
(*avx512dq_mul<mode>3<mask_name>_bcst): Ditto.
(*avx512f_mul<mode>3<mask_name>_bcst): Ditto.
(*andnot<mode>3_bcst): Ditto.
(*<code><mode>3_bcst): Ditto.
* config/i386/subst.md (bcst_round_constraint): New subst
attribute.
(bcst_round_nimm_predicate): Ditto.
(bcst_mask_prefix3): Ditto.
(bcst_mask_prefix4): Ditto.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
` (11 preceding siblings ...)
2020-10-22 2:29 ` cvs-commit at gcc dot gnu.org
@ 2021-07-01 15:11 ` cvs-commit at gcc dot gnu.org
2021-07-21 14:56 ` hjl.tools at gmail dot com
` (2 subsequent siblings)
15 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-07-01 15:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #19 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:
https://gcc.gnu.org/g:edafb35bdadf309ebb9d1eddc5549f9e1ad49c09
commit r12-1958-gedafb35bdadf309ebb9d1eddc5549f9e1ad49c09
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Jun 2 07:15:45 2021 -0700
x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast
1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
operands to vector broadcast from an integer with AVX.
2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
won't increase stack alignment requirement and blocks transformation by
the combine pass.
A small benchmark:
https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast
shows that broadcast is a little bit faster on Intel Core i7-8559U:
$ make
gcc -g -I. -O2 -c -o test.o test.c
gcc -g -c -o memory.o memory.S
gcc -g -c -o broadcast.o broadcast.S
gcc -g -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory : 147215
broadcast : 121213
vec_dup_sse2: 171366
$
broadcast is also smaller:
$ size memory.o broadcast.o
text data bss dec hex filename
132 0 0 132 84 memory.o
122 0 0 122 7a broadcast.o
$
3. Update PR 87767 tests to expect integer broadcast instead of broadcast
from memory.
4. Update avx512f_cond_move.c to expect integer broadcast.
A small benchmark:
https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast
shows that integer broadcast is faster than embedded memory broadcast:
$ make
gcc -g -I. -O2 -march=skylake-avx512 -c -o test.o test.c
gcc -g -c -o memory.o memory.S
gcc -g -c -o broadcast.o broadcast.S
gcc -o test test.o memory.o broadcast.o
./test
memory : 425538
broadcast : 375260
$
gcc/
PR target/100865
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
New prototype.
(ix86_byte_broadcast): New function.
(ix86_convert_const_wide_int_to_broadcast): Likewise.
(ix86_expand_move): Convert CONST_WIDE_INT to broadcast if mode
size is 16 bytes or bigger.
(ix86_broadcast_from_integer_constant): New function.
(ix86_expand_vector_move): Convert CONST_WIDE_INT and CONST_VECTOR
to broadcast if mode size is 16 bytes or bigger.
* config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
prototype.
* config/i386/i386.c (ix86_gen_scratch_sse_rtx): New function.
gcc/testsuite/
PR target/100865
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer
broadcast.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512f_cond_move.c: Also pass
-mprefer-vector-width=512 and expect integer broadcast.
* gcc.target/i386/pr100865-1.c: New test.
* gcc.target/i386/pr100865-2.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr100865-4a.c: Likewise.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-5a.c: Likewise.
* gcc.target/i386/pr100865-5b.c: Likewise.
* gcc.target/i386/pr100865-6a.c: Likewise.
* gcc.target/i386/pr100865-6b.c: Likewise.
* gcc.target/i386/pr100865-6c.c: Likewise.
* gcc.target/i386/pr100865-7a.c: Likewise.
* gcc.target/i386/pr100865-7b.c: Likewise.
* gcc.target/i386/pr100865-7c.c: Likewise.
* gcc.target/i386/pr100865-8a.c: Likewise.
* gcc.target/i386/pr100865-8b.c: Likewise.
* gcc.target/i386/pr100865-8c.c: Likewise.
* gcc.target/i386/pr100865-9a.c: Likewise.
* gcc.target/i386/pr100865-9b.c: Likewise.
* gcc.target/i386/pr100865-9c.c: Likewise.
* gcc.target/i386/pr100865-10a.c: Likewise.
* gcc.target/i386/pr100865-10b.c: Likewise.
* gcc.target/i386/pr100865-11a.c: Likewise.
* gcc.target/i386/pr100865-11b.c: Likewise.
* gcc.target/i386/pr100865-11c.c: Likewise.
* gcc.target/i386/pr100865-12a.c: Likewise.
* gcc.target/i386/pr100865-12b.c: Likewise.
* gcc.target/i386/pr100865-12c.c: Likewise.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
` (12 preceding siblings ...)
2021-07-01 15:11 ` cvs-commit at gcc dot gnu.org
@ 2021-07-21 14:56 ` hjl.tools at gmail dot com
2021-07-27 20:24 ` hjl.tools at gmail dot com
2021-09-18 5:03 ` cvs-commit at gcc dot gnu.org
15 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2021-07-21 14:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |skpgkp2 at gmail dot com
--- Comment #20 from H.J. Lu <hjl.tools at gmail dot com> ---
This has been fixed in GCC 12. Sunil, can you submit a GCC patch to
add testcases to cover this?
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
` (13 preceding siblings ...)
2021-07-21 14:56 ` hjl.tools at gmail dot com
@ 2021-07-27 20:24 ` hjl.tools at gmail dot com
2021-09-18 5:03 ` cvs-commit at gcc dot gnu.org
15 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2021-07-27 20:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
Target Milestone|--- |11.0
--- Comment #21 from H.J. Lu <hjl.tools at gmail dot com> ---
Fixed for GCC 11.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
` (14 preceding siblings ...)
2021-07-27 20:24 ` hjl.tools at gmail dot com
@ 2021-09-18 5:03 ` cvs-commit at gcc dot gnu.org
15 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-09-18 5:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767
--- Comment #22 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:7afcb534239014a713e1f234c8734644245e5c38
commit r12-3644-g7afcb534239014a713e1f234c8734644245e5c38
Author: liuhongt <hongtao.liu@intel.com>
Date: Sat Sep 18 12:14:32 2021 +0800
Support embedded broadcast for AVX512FP16 instructions.
gcc/ChangeLog:
PR target/87767
* config/i386/i386.c (ix86_print_operand): Handle
V8HF/V16HF/V32HFmode.
* config/i386/i386.h (VALID_BCST_MODE_P): Add HFmode.
* config/i386/sse.md (avx512bcst): Remove.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-broadcast-1.c: New test.
* gcc.target/i386/avx512fp16-broadcast-2.c: New test.
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2021-09-18 5:03 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
2020-07-13 7:05 ` [Bug target/87767] Missing AVX512 memory broadcast for constant vector crazylht at gmail dot com
2020-09-03 8:11 ` cvs-commit at gcc dot gnu.org
2020-09-03 8:30 ` crazylht at gmail dot com
2020-09-03 8:37 ` jakub at gcc dot gnu.org
2020-09-03 9:06 ` crazylht at gmail dot com
2020-09-03 9:31 ` jakub at gcc dot gnu.org
2020-09-04 15:52 ` crazylht at gmail dot com
2020-09-04 15:56 ` crazylht at gmail dot com
2020-09-08 4:28 ` crazylht at gmail dot com
2020-09-09 17:06 ` vmakarov at gcc dot gnu.org
2020-10-22 2:29 ` cvs-commit at gcc dot gnu.org
2020-10-22 2:29 ` cvs-commit at gcc dot gnu.org
2021-07-01 15:11 ` cvs-commit at gcc dot gnu.org
2021-07-21 14:56 ` hjl.tools at gmail dot com
2021-07-27 20:24 ` hjl.tools at gmail dot com
2021-09-18 5:03 ` cvs-commit at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).