public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
@ 2020-07-13  7:05 ` crazylht at gmail dot com
  2020-09-03  8:11 ` cvs-commit at gcc dot gnu.org
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2020-07-13  7:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
a patch is posted at
https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549713.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
  2020-07-13  7:05 ` [Bug target/87767] Missing AVX512 memory broadcast for constant vector crazylht at gmail dot com
@ 2020-09-03  8:11 ` cvs-commit at gcc dot gnu.org
  2020-09-03  8:30 ` crazylht at gmail dot com
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-09-03  8:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:433734126996b6fc4fc99b594421510f928a7bb9

commit r11-2991-g433734126996b6fc4fc99b594421510f928a7bb9
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Jul 8 17:14:36 2020 +0800

    Optimize memory broadcast for constant vector under AVX512.

    For constant vector having one duplicated value, there's no need to put
    whole vector in the constant pool, using embedded broadcast instead.

    2020-07-09  Hongtao Liu  <hongtao.liu@intel.com>

    gcc/ChangeLog:

            PR target/87767
            * config/i386/i386-features.c
            (replace_constant_pool_with_broadcast): New function.
            (constant_pool_broadcast): Ditto.
            (class pass_constant_pool_broadcast): New pass.
            (make_pass_constant_pool_broadcast): Ditto.
            (remove_partial_avx_dependency): Call
            replace_constant_pool_with_broadcast under TARGET_AVX512F, it
            would save compile time when both pass rpad and cpb are
            available.
            (remove_partial_avx_dependency_gate): New function.
            (class pass_remove_partial_avx_dependency::gate): Call
            remove_partial_avx_dependency_gate.
            * config/i386/i386-passes.def: Insert new pass after combine.
            * config/i386/i386-protos.h
            (make_pass_constant_pool_broadcast): Declare.
            * config/i386/sse.md (*avx512dq_mul<mode>3<mask_name>_bcst):
            New define_insn.
            (*avx512f_mul<mode>3<mask_name>_bcst): Ditto.
            * config/i386/avx512fintrin.h (_mm512_set1_ps,
            _mm512_set1_pd,_mm512_set1_epi32, _mm512_set1_epi64): Adjusted.

    gcc/testsuite/ChangeLog:

            PR target/87767
            * gcc.target/i386/avx2-broadcast-pr87767-1.c: New test.
            * gcc.target/i386/avx512f-broadcast-pr87767-1.c: New test.
            * gcc.target/i386/avx512f-broadcast-pr87767-2.c: New test.
            * gcc.target/i386/avx512f-broadcast-pr87767-3.c: New test.
            * gcc.target/i386/avx512f-broadcast-pr87767-4.c: New test.
            * gcc.target/i386/avx512f-broadcast-pr87767-5.c: New test.
            * gcc.target/i386/avx512f-broadcast-pr87767-6.c: New test.
            * gcc.target/i386/avx512f-broadcast-pr87767-7.c: New test.
            * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: New test.
            * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: New test.
            * gcc.target/i386/avx512vl-broadcast-pr87767-2.c: New test.
            * gcc.target/i386/avx512vl-broadcast-pr87767-3.c: New test.
            * gcc.target/i386/avx512vl-broadcast-pr87767-4.c: New test.
            * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: New test.
            * gcc.target/i386/avx512vl-broadcast-pr87767-6.c: New test.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
  2020-07-13  7:05 ` [Bug target/87767] Missing AVX512 memory broadcast for constant vector crazylht at gmail dot com
  2020-09-03  8:11 ` cvs-commit at gcc dot gnu.org
@ 2020-09-03  8:30 ` crazylht at gmail dot com
  2020-09-03  8:37 ` jakub at gcc dot gnu.org
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2020-09-03  8:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
I think it's fixed in GCC11, but still there're lots of "_bcst" patterns need
to be added.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2020-09-03  8:30 ` crazylht at gmail dot com
@ 2020-09-03  8:37 ` jakub at gcc dot gnu.org
  2020-09-03  9:06 ` crazylht at gmail dot com
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-09-03  8:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Do we really need separate _bcst patterns btw?  Wouldn't it be better to just
have some predicate and corresponding constaint that would allow normal MEM
vectors as well as these broadcast from single element and just use that
predicate wherever the broadcasts are allowed?  Probably would need multiple of
them though, even when the main would turn to be just memory_operand if
TARGET_AVX512F is false, some instructions only use EVEX encoding if some other
TARGET_AVX* is on...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2020-09-03  8:37 ` jakub at gcc dot gnu.org
@ 2020-09-03  9:06 ` crazylht at gmail dot com
  2020-09-03  9:31 ` jakub at gcc dot gnu.org
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2020-09-03  9:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #11 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #10)
> Do we really need separate _bcst patterns btw?  Wouldn't it be better to
> just have some predicate and corresponding constaint that would allow normal

In currently implementation, vec_duplicate would be used for memory_operand in
broadcast patterns. I'm not sure if vec_duplicate could be used in
define_predicate, or am i misunderstood?

> MEM vectors as well as these broadcast from single element and just use that
> predicate wherever the broadcasts are allowed?  Probably would need multiple
> of them though, even when the main would turn to be just memory_operand if
> TARGET_AVX512F is false, some instructions only use EVEX encoding if some
> other TARGET_AVX* is on...

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2020-09-03  9:06 ` crazylht at gmail dot com
@ 2020-09-03  9:31 ` jakub at gcc dot gnu.org
  2020-09-04 15:52 ` crazylht at gmail dot com
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-09-03  9:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
What I mean is that we should try to simplify the md file, instead of adding
hundreds of new *_bcst patterns.
We have e.g.
(define_insn "*<plusminus_insn><mode>3"
  [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
        (plusminus:VI_AVX2
          (match_operand:VI_AVX2 1 "vector_operand" "<comm>0,v")
          (match_operand:VI_AVX2 2 "vector_operand" "xBm,vm")))]
  "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
  "@
   p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
   vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
  [(set_attr "isa" "noavx,avx")
   (set_attr "type" "sseiadd")
   (set_attr "prefix_data16" "1,*")
   (set_attr "prefix" "orig,vex")
   (set_attr "mode" "<sseinsnmode>")])

(define_insn "*sub<mode>3_bcst"
  [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
        (minus:VI48_AVX512VL
          (match_operand:VI48_AVX512VL 1 "register_operand" "v")
          (vec_duplicate:VI48_AVX512VL
            (match_operand:<ssescalarmode> 2 "memory_operand" "m"))))]
  "TARGET_AVX512F && ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
  "vpsub<ssemodesuffix>\t{%2<avx512bcst>, %1, %0|%0, %1, %2<avx512bcst>}"
  [(set_attr "type" "sseiadd")
   (set_attr "prefix" "evex")
   (set_attr "mode" "<sseinsnmode>")])

What I meant is we could have just:
(define_insn "*<plusminus_insn><mode>3"
  [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
        (plusminus:VI_AVX2
          (match_operand:VI_AVX2 1 "vector_bcst_operand" "<comm>0,v")
          (match_operand:VI_AVX2 2 "vector_bcst_operand" "xBm,vBb")))]
  "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
  "@
   p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
   vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
  [(set_attr "isa" "noavx,avx")
   (set_attr "type" "sseiadd")
   (set_attr "prefix_data16" "1,*")
   (set_attr "prefix" "orig,vex")
   (set_attr "mode" "<sseinsnmode>")])
where vector_bcst_operand is either vector_operand, or for TARGET_AVX512F
a VEC_DUPLICATE of the right mode with a MEM inside of it with the element mode
of the VEC_DUPLICATE mode, similarly Bb constraint is either m, or for
TARGET_AVX512F also again the VEC_DUPLICATE with MEM inside of it, and that
ix86_binary_operator_ok would treat a VEC_DUPLICATE wrapping MEM the same as
MEM (in particular ensure one e.g. doesn't have one VEC_DUPLICATE and one MEM
operand, or two VEC_DUPLICATE operands) and that the output code would handle
emitting an operand with VEC_DUPLICATE of a MEM properly.
Or perhaps the constraint there could be just for the broadcast and one could
write vmBb.  Still, I think the predicate needs to be accurate, i.e. for some
instructions we want e.g. vector_operand or TARGET_AVX512F and
bcst_mem_operand,
for others vector_operand or TARGET_AVX512VL and bcst_mem_operand etc.

Anyway, if we go down this route, might be best to handle just a couple of
patterns, then ask for review and see what Kirill (or if Uros would be
interested) think about it and only later convert more.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2020-09-03  9:31 ` jakub at gcc dot gnu.org
@ 2020-09-04 15:52 ` crazylht at gmail dot com
  2020-09-04 15:56 ` crazylht at gmail dot com
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2020-09-04 15:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #13 from Hongtao.liu <crazylht at gmail dot com> ---
Created attachment 49182
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49182&action=edit
bcst_vector_operand

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2020-09-04 15:52 ` crazylht at gmail dot com
@ 2020-09-04 15:56 ` crazylht at gmail dot com
  2020-09-08  4:28 ` crazylht at gmail dot com
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2020-09-04 15:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #14 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #12)
> What I mean is that we should try to simplify the md file, instead of adding
> hundreds of new *_bcst patterns.
> We have e.g.
> (define_insn "*<plusminus_insn><mode>3"
>   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
>         (plusminus:VI_AVX2
>           (match_operand:VI_AVX2 1 "vector_operand" "<comm>0,v")
>           (match_operand:VI_AVX2 2 "vector_operand" "xBm,vm")))]
>   "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
>   "@
>    p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
>    vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "isa" "noavx,avx")
>    (set_attr "type" "sseiadd")
>    (set_attr "prefix_data16" "1,*")
>    (set_attr "prefix" "orig,vex")
>    (set_attr "mode" "<sseinsnmode>")])
> 
> (define_insn "*sub<mode>3_bcst"
>   [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
>         (minus:VI48_AVX512VL
>           (match_operand:VI48_AVX512VL 1 "register_operand" "v")
>           (vec_duplicate:VI48_AVX512VL
>             (match_operand:<ssescalarmode> 2 "memory_operand" "m"))))]
>   "TARGET_AVX512F && ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
>   "vpsub<ssemodesuffix>\t{%2<avx512bcst>, %1, %0|%0, %1, %2<avx512bcst>}"
>   [(set_attr "type" "sseiadd")
>    (set_attr "prefix" "evex")
>    (set_attr "mode" "<sseinsnmode>")])
> 
> What I meant is we could have just:
> (define_insn "*<plusminus_insn><mode>3"
>   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
>         (plusminus:VI_AVX2
>           (match_operand:VI_AVX2 1 "vector_bcst_operand" "<comm>0,v")
>           (match_operand:VI_AVX2 2 "vector_bcst_operand" "xBm,vBb")))]
>   "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
>   "@
>    p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
>    vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "isa" "noavx,avx")
>    (set_attr "type" "sseiadd")
>    (set_attr "prefix_data16" "1,*")
>    (set_attr "prefix" "orig,vex")
>    (set_attr "mode" "<sseinsnmode>")])
> where vector_bcst_operand is either vector_operand, or for TARGET_AVX512F
> a VEC_DUPLICATE of the right mode with a MEM inside of it with the element
> mode of the VEC_DUPLICATE mode, similarly Bb constraint is either m, or for
> TARGET_AVX512F also again the VEC_DUPLICATE with MEM inside of it, and that
> ix86_binary_operator_ok would treat a VEC_DUPLICATE wrapping MEM the same as
> MEM (in particular ensure one e.g. doesn't have one VEC_DUPLICATE and one
> MEM operand, or two VEC_DUPLICATE operands) and that the output code would
> handle emitting an operand with VEC_DUPLICATE of a MEM properly.
> Or perhaps the constraint there could be just for the broadcast and one
> could write vmBb.  Still, I think the predicate needs to be accurate, i.e.
> for some instructions we want e.g. vector_operand or TARGET_AVX512F and
> bcst_mem_operand,
> for others vector_operand or TARGET_AVX512VL and bcst_mem_operand etc.
> 
> Anyway, if we go down this route, might be best to handle just a couple of
> patterns, then ask for review and see what Kirill (or if Uros would be
> interested) think about it and only later convert more.

Is there any way to add preference to constraint "Bb", since we always want to
choose "Bb" when vec_duplicate existed, but sometimes, pass_reload would choose
'v', which produce a redudant broadcast instructions.

i.e: with the patch attached.
testcase avx512f-add-df-zmm-1.c would fail to generate embedded broadcast with
-m32.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2020-09-04 15:56 ` crazylht at gmail dot com
@ 2020-09-08  4:28 ` crazylht at gmail dot com
  2020-09-09 17:06 ` vmakarov at gcc dot gnu.org
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: crazylht at gmail dot com @ 2020-09-08  4:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at redhat dot com

--- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #12)
> What I mean is that we should try to simplify the md file, instead of adding
> hundreds of new *_bcst patterns.
> We have e.g.
> (define_insn "*<plusminus_insn><mode>3"
>   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
>         (plusminus:VI_AVX2
>           (match_operand:VI_AVX2 1 "vector_operand" "<comm>0,v")
>           (match_operand:VI_AVX2 2 "vector_operand" "xBm,vm")))]
>   "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
>   "@
>    p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
>    vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "isa" "noavx,avx")
>    (set_attr "type" "sseiadd")
>    (set_attr "prefix_data16" "1,*")
>    (set_attr "prefix" "orig,vex")
>    (set_attr "mode" "<sseinsnmode>")])
> 
> (define_insn "*sub<mode>3_bcst"
>   [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
>         (minus:VI48_AVX512VL
>           (match_operand:VI48_AVX512VL 1 "register_operand" "v")
>           (vec_duplicate:VI48_AVX512VL
>             (match_operand:<ssescalarmode> 2 "memory_operand" "m"))))]
>   "TARGET_AVX512F && ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
>   "vpsub<ssemodesuffix>\t{%2<avx512bcst>, %1, %0|%0, %1, %2<avx512bcst>}"
>   [(set_attr "type" "sseiadd")
>    (set_attr "prefix" "evex")
>    (set_attr "mode" "<sseinsnmode>")])
> 
> What I meant is we could have just:
> (define_insn "*<plusminus_insn><mode>3"
>   [(set (match_operand:VI_AVX2 0 "register_operand" "=x,v")
>         (plusminus:VI_AVX2
>           (match_operand:VI_AVX2 1 "vector_bcst_operand" "<comm>0,v")
>           (match_operand:VI_AVX2 2 "vector_bcst_operand" "xBm,vBb")))]
>   "TARGET_SSE2 && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
>   "@
>    p<plusminus_mnemonic><ssemodesuffix>\t{%2, %0|%0, %2}
>    vp<plusminus_mnemonic><ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "isa" "noavx,avx")
>    (set_attr "type" "sseiadd")
>    (set_attr "prefix_data16" "1,*")
>    (set_attr "prefix" "orig,vex")
>    (set_attr "mode" "<sseinsnmode>")])
> where vector_bcst_operand is either vector_operand, or for TARGET_AVX512F
> a VEC_DUPLICATE of the right mode with a MEM inside of it with the element
> mode of the VEC_DUPLICATE mode, similarly Bb constraint is either m, or for
> TARGET_AVX512F also again the VEC_DUPLICATE with MEM inside of it, and that
> ix86_binary_operator_ok would treat a VEC_DUPLICATE wrapping MEM the same as
> MEM (in particular ensure one e.g. doesn't have one VEC_DUPLICATE and one
> MEM operand, or two VEC_DUPLICATE operands) and that the output code would
> handle emitting an operand with VEC_DUPLICATE of a MEM properly.
> Or perhaps the constraint there could be just for the broadcast and one
> could write vmBb.  Still, I think the predicate needs to be accurate, i.e.
> for some instructions we want e.g. vector_operand or TARGET_AVX512F and
> bcst_mem_operand,
> for others vector_operand or TARGET_AVX512VL and bcst_mem_operand etc.
> 
> Anyway, if we go down this route, might be best to handle just a couple of
> patterns, then ask for review and see what Kirill (or if Uros would be
> interested) think about it and only later convert more.

Hi Vladimir Makarov: 
  I saw you add DEFINE_SPECIAL_MEMORY_CONSTRAINT in PR69299, currently we
encounter a similar problem as PR69299, we want to add
special_memory_constraint for broadcast memory operand(call it bcst_mem_operand
later), but problem is bcst_mem_operand is not MEM_P, it's like
(vec_duplicate:V4SF (mem:SF (reg:...))), so pass_reload can't properly handle
this constraint(it alway assumes the operand should be MEM_P). So the question
is can we enhance the handling of special_memory_constraint, not only
restricted to MEM_P, but also for operand containing a memory_operand
inside(i.e. bcst_mem_operand).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2020-09-08  4:28 ` crazylht at gmail dot com
@ 2020-09-09 17:06 ` vmakarov at gcc dot gnu.org
  2020-10-22  2:29 ` cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2020-09-09 17:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

Vladimir Makarov <vmakarov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org

--- Comment #16 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #15)
> (In reply to Jakub Jelinek from comment #12)
>
> Hi Vladimir Makarov: 
>   I saw you add DEFINE_SPECIAL_MEMORY_CONSTRAINT in PR69299, currently we
> encounter a similar problem as PR69299, we want to add
> special_memory_constraint for broadcast memory operand(call it
> bcst_mem_operand later), but problem is bcst_mem_operand is not MEM_P, it's
> like (vec_duplicate:V4SF (mem:SF (reg:...))), so pass_reload can't properly
> handle this constraint(it alway assumes the operand should be MEM_P). So the
> question is can we enhance the handling of special_memory_constraint, not
> only restricted to MEM_P, but also for operand containing a memory_operand
> inside(i.e. bcst_mem_operand).

Sure, I am open to any patches to solve this problem by extending special
memory constraint semantic or even introducing new special constraint.

It would have no sense to implement this for old reload pass as gcc targets are
moving from it.  So you need to do this only for LRA.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2020-09-09 17:06 ` vmakarov at gcc dot gnu.org
@ 2020-10-22  2:29 ` cvs-commit at gcc dot gnu.org
  2020-10-22  2:29 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-10-22  2:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #17 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:4de7b010038933dd6ca96bf186ca49f243d0def6

commit r11-4202-g4de7b010038933dd6ca96bf186ca49f243d0def6
Author: liuhongt <hongtao.liu@intel.com>
Date:   Sat Sep 26 15:08:32 2020 +0800

    Extend special_memory_constraint.

    For operand with special_memory_constraint, there could be a wrapper
    for memory_operand. Extract mem for operand for conditional judgement
    like MEM_P, also for record_address_regs.

    gcc/ChangeLog:

            PR target/87767
            * ira-costs.c (record_operand_costs): Extract memory operand
            from recog_data.operand[i] for record_address_regs.
            (record_reg_classes): Extract memory operand from OP for
            conditional judgement MEM_P.
            * ira.c (ira_setup_alts): Ditto.
            * lra-constraints.c (extract_mem_from_operand): New function.
            (satisfies_memory_constraint_p): Extract memory operand from
            OP for decompose_mem_address, return false when there's no
            memory operand inside OP.
            (process_alt_operands): Remove MEM_P (op) since it would be
            judged in satisfies_memory_constraint_p.
            * recog.c (asm_operand_ok): Extract memory operand from OP for
            judgement of memory_operand (OP, VOIDmode).
            (constrain_operands): Don't unwrapper unary operator when
            there's memory operand inside.
            * rtl.h (extract_mem_from_operand): New decl.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2020-10-22  2:29 ` cvs-commit at gcc dot gnu.org
@ 2020-10-22  2:29 ` cvs-commit at gcc dot gnu.org
  2021-07-01 15:11 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-10-22  2:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #18 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:7026bb9504eb0f95e114f832cd6dd14302376861

commit r11-4203-g7026bb9504eb0f95e114f832cd6dd14302376861
Author: liuhongt <hongtao.liu@intel.com>
Date:   Sat Sep 26 15:34:23 2020 +0800

    Refactor implementation of *_bcst{_1,_2,_3} patterns.

    Add new predicate bcst_mem_operand and corresponding constraint "Br"
    to merge "$(pattern)_bcst{_1,_2,_3}" into "$(pattern)", also delete
    those separate "*_bcst{_1,_2,_3}" patterns.

    gcc/ChangeLog:

            PR target/87767
            * config/i386/constraints.md ("Br"): New special memory
            constraint.
            * config/i386/i386-expand.c (ix86_binary_operator_ok): Both
            source operand cannot be in memory or bcst_memory_operand.
            * config/i386/i386.c (ix86_print_operand): Print bcst_mem_operand.
            * config/i386/i386.h (VALID_BCST_MODE_P): New.
            * config/i386/predicates.md (bcst_mem_operand): New predicate
            for AVX512 embedding broadcast memory operand.
            (bcst_vector_operand): New predicate, vector_operand or
            bcst_mem_operand.
            * config/i386/sse.md
            (*<plusminus_insn><mode>3<mask_name><round_name>): Extend
            predicate and constraints to handle bcst_mem_operand.
            (*mul<mode>3<mask_name><round_name>): Ditto.
            (<sse>_div<mode>3<mask_name><round_name>): Ditto.
            (<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>):
            Ditto.
            (<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>):
            Ditto.
            (<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>):
            Ditto.
            (<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>):
            Ditto.
            (*<plusminus_insn><mode>3): Ditto.
            (avx512dq_mul<mode>3<mask_name>): Ditto.
            (*<sse4_1_avx2>_mul<mode>3<mask_name>): Ditto.
            (*andnot<mode>3): Ditto.
            (<mask_codefor><code><mode>3<mask_name>): Ditto.
            (*sub<mode>3<mask_name>_bcst): Removed.
            (*add<mode>3<mask_name>_bcst): Ditto.
            (*mul<mode>3<mask_name>_bcst): Ditto.
            (*<avx512>_div<mode>3<mask_name>_bcst): Ditto.
            (*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_1):
            Ditto.
            (*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_2):
            Ditto.
            (*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_3):
            Ditto.
            (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1):
            Ditto.
            (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2):
            Ditto.
            (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3):
            Ditto.
            (*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_1):
            Ditto.
            (*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_2):
            Ditto.
            (*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_3):
            Ditto.
            (*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_1):
            Ditto.
            (*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_2):
            Ditto.
            (*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_3):
            Ditto.
            (*sub<mode>3_bcst): Ditto.
            (*add<mode>3_bcst): Ditto.
            (*avx512dq_mul<mode>3<mask_name>_bcst): Ditto.
            (*avx512f_mul<mode>3<mask_name>_bcst): Ditto.
            (*andnot<mode>3_bcst): Ditto.
            (*<code><mode>3_bcst): Ditto.
            * config/i386/subst.md (bcst_round_constraint): New subst
            attribute.
            (bcst_round_nimm_predicate): Ditto.
            (bcst_mask_prefix3): Ditto.
            (bcst_mask_prefix4): Ditto.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2020-10-22  2:29 ` cvs-commit at gcc dot gnu.org
@ 2021-07-01 15:11 ` cvs-commit at gcc dot gnu.org
  2021-07-21 14:56 ` hjl.tools at gmail dot com
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-07-01 15:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #19 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by H.J. Lu <hjl@gcc.gnu.org>:

https://gcc.gnu.org/g:edafb35bdadf309ebb9d1eddc5549f9e1ad49c09

commit r12-1958-gedafb35bdadf309ebb9d1eddc5549f9e1ad49c09
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Jun 2 07:15:45 2021 -0700

    x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

    1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
    operands to vector broadcast from an integer with AVX.
    2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register which
    won't increase stack alignment requirement and blocks transformation by
    the combine pass.

    A small benchmark:

    https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast

    shows that broadcast is a little bit faster on Intel Core i7-8559U:

    $ make
    gcc -g -I. -O2   -c -o test.o test.c
    gcc -g   -c -o memory.o memory.S
    gcc -g   -c -o broadcast.o broadcast.S
    gcc -g   -c -o vec_dup_sse2.o vec_dup_sse2.S
    gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
    ./test
    memory      : 147215
    broadcast   : 121213
    vec_dup_sse2: 171366
    $

    broadcast is also smaller:

    $ size memory.o broadcast.o
       text    data     bss     dec     hex filename
        132       0       0     132      84 memory.o
        122       0       0     122      7a broadcast.o
    $

    3. Update PR 87767 tests to expect integer broadcast instead of broadcast
    from memory.
    4. Update avx512f_cond_move.c to expect integer broadcast.

    A small benchmark:

    https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast

    shows that integer broadcast is faster than embedded memory broadcast:

    $ make
    gcc -g -I. -O2 -march=skylake-avx512   -c -o test.o test.c
    gcc -g   -c -o memory.o memory.S
    gcc -g   -c -o broadcast.o broadcast.S
    gcc -o test test.o memory.o broadcast.o
    ./test
    memory      : 425538
    broadcast   : 375260
    $

    gcc/

            PR target/100865
            * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
            New prototype.
            (ix86_byte_broadcast): New function.
            (ix86_convert_const_wide_int_to_broadcast): Likewise.
            (ix86_expand_move): Convert CONST_WIDE_INT to broadcast if mode
            size is 16 bytes or bigger.
            (ix86_broadcast_from_integer_constant): New function.
            (ix86_expand_vector_move): Convert CONST_WIDE_INT and CONST_VECTOR
            to broadcast if mode size is 16 bytes or bigger.
            * config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
            prototype.
            * config/i386/i386.c (ix86_gen_scratch_sse_rtx): New function.

    gcc/testsuite/

            PR target/100865
            * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer
            broadcast.
            * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
            * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
            * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
            * gcc.target/i386/avx512f_cond_move.c: Also pass
            -mprefer-vector-width=512 and expect integer broadcast.
            * gcc.target/i386/pr100865-1.c: New test.
            * gcc.target/i386/pr100865-2.c: Likewise.
            * gcc.target/i386/pr100865-3.c: Likewise.
            * gcc.target/i386/pr100865-4a.c: Likewise.
            * gcc.target/i386/pr100865-4b.c: Likewise.
            * gcc.target/i386/pr100865-5a.c: Likewise.
            * gcc.target/i386/pr100865-5b.c: Likewise.
            * gcc.target/i386/pr100865-6a.c: Likewise.
            * gcc.target/i386/pr100865-6b.c: Likewise.
            * gcc.target/i386/pr100865-6c.c: Likewise.
            * gcc.target/i386/pr100865-7a.c: Likewise.
            * gcc.target/i386/pr100865-7b.c: Likewise.
            * gcc.target/i386/pr100865-7c.c: Likewise.
            * gcc.target/i386/pr100865-8a.c: Likewise.
            * gcc.target/i386/pr100865-8b.c: Likewise.
            * gcc.target/i386/pr100865-8c.c: Likewise.
            * gcc.target/i386/pr100865-9a.c: Likewise.
            * gcc.target/i386/pr100865-9b.c: Likewise.
            * gcc.target/i386/pr100865-9c.c: Likewise.
            * gcc.target/i386/pr100865-10a.c: Likewise.
            * gcc.target/i386/pr100865-10b.c: Likewise.
            * gcc.target/i386/pr100865-11a.c: Likewise.
            * gcc.target/i386/pr100865-11b.c: Likewise.
            * gcc.target/i386/pr100865-11c.c: Likewise.
            * gcc.target/i386/pr100865-12a.c: Likewise.
            * gcc.target/i386/pr100865-12b.c: Likewise.
            * gcc.target/i386/pr100865-12c.c: Likewise.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2021-07-01 15:11 ` cvs-commit at gcc dot gnu.org
@ 2021-07-21 14:56 ` hjl.tools at gmail dot com
  2021-07-27 20:24 ` hjl.tools at gmail dot com
  2021-09-18  5:03 ` cvs-commit at gcc dot gnu.org
  15 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2021-07-21 14:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |skpgkp2 at gmail dot com

--- Comment #20 from H.J. Lu <hjl.tools at gmail dot com> ---
This has been fixed in GCC 12.  Sunil, can you submit a GCC patch to
add testcases to cover this?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2021-07-21 14:56 ` hjl.tools at gmail dot com
@ 2021-07-27 20:24 ` hjl.tools at gmail dot com
  2021-09-18  5:03 ` cvs-commit at gcc dot gnu.org
  15 siblings, 0 replies; 16+ messages in thread
From: hjl.tools at gmail dot com @ 2021-07-27 20:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
   Target Milestone|---                         |11.0

--- Comment #21 from H.J. Lu <hjl.tools at gmail dot com> ---
Fixed for GCC 11.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Bug target/87767] Missing AVX512 memory broadcast for constant vector
       [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2021-07-27 20:24 ` hjl.tools at gmail dot com
@ 2021-09-18  5:03 ` cvs-commit at gcc dot gnu.org
  15 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-09-18  5:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87767

--- Comment #22 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:7afcb534239014a713e1f234c8734644245e5c38

commit r12-3644-g7afcb534239014a713e1f234c8734644245e5c38
Author: liuhongt <hongtao.liu@intel.com>
Date:   Sat Sep 18 12:14:32 2021 +0800

    Support embedded broadcast for AVX512FP16 instructions.

    gcc/ChangeLog:

            PR target/87767
            * config/i386/i386.c (ix86_print_operand): Handle
            V8HF/V16HF/V32HFmode.
            * config/i386/i386.h (VALID_BCST_MODE_P): Add HFmode.
            * config/i386/sse.md (avx512bcst): Remove.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/avx512fp16-broadcast-1.c: New test.
            * gcc.target/i386/avx512fp16-broadcast-2.c: New test.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-09-18  5:03 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-87767-4@http.gcc.gnu.org/bugzilla/>
2020-07-13  7:05 ` [Bug target/87767] Missing AVX512 memory broadcast for constant vector crazylht at gmail dot com
2020-09-03  8:11 ` cvs-commit at gcc dot gnu.org
2020-09-03  8:30 ` crazylht at gmail dot com
2020-09-03  8:37 ` jakub at gcc dot gnu.org
2020-09-03  9:06 ` crazylht at gmail dot com
2020-09-03  9:31 ` jakub at gcc dot gnu.org
2020-09-04 15:52 ` crazylht at gmail dot com
2020-09-04 15:56 ` crazylht at gmail dot com
2020-09-08  4:28 ` crazylht at gmail dot com
2020-09-09 17:06 ` vmakarov at gcc dot gnu.org
2020-10-22  2:29 ` cvs-commit at gcc dot gnu.org
2020-10-22  2:29 ` cvs-commit at gcc dot gnu.org
2021-07-01 15:11 ` cvs-commit at gcc dot gnu.org
2021-07-21 14:56 ` hjl.tools at gmail dot com
2021-07-27 20:24 ` hjl.tools at gmail dot com
2021-09-18  5:03 ` cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).