public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues
@ 2021-12-06  8:35 ubizjak at gmail dot com
  2021-12-06 11:37 ` [Bug target/103571] " crazylht at gmail dot com
                   ` (28 more replies)
  0 siblings, 29 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-06  8:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

            Bug ID: 103571
           Summary: ABI: V2HF, V4HF and V8HFmode argument passing issues
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

Following testcase:

--cut here--
typedef _Float16 v2hf __attribute__((vector_size(4)));
typedef _Float16 v4hf __attribute__((vector_size(8)));
typedef _Float16 v8hf __attribute__((vector_size(16)));

v2hf foo (v2hf a, v2hf b)
{
  return b;
}

v4hf bar (v4hf a, v4hf b)
{
  return b;
}

v8hf baz (v8hf a, v8hf b)
{
  return b;
}
--cut here--

compiles with -O2 -msse2 -m64 to:

foo:
        movl    16(%rsp), %edx  # 6     [c=9 l=4]  *movsi_internal/0
        movq    %rdi, %rax      # 2     [c=4 l=3]  *movdi_internal/3
        movl    %edx, (%rdi)    # 7     [c=4 l=2]  *movsi_internal/1
        ret             # 18    [c=0 l=1]  simple_return_internal

and with -O2 -msse2 -m32 to:

foo:
        movl    4(%esp), %eax   # 2     [c=9 l=4]  *movsi_internal/0
        movl    12(%esp), %edx  # 6     [c=9 l=4]  *movsi_internal/0
        movl    %edx, (%eax)    # 7     [c=4 l=2]  *movsi_internal/1
        ret     $4              # 17    [c=0 l=3]  simple_return_pop_internal

bar:
        movq    %mm1, %mm0      # 14    [c=4 l=3]  *movv4hf_internal/6
        ret             # 18    [c=0 l=1]  simple_return_internal

baz:

        pushl   %esi    # 53    [c=4 l=1]  *pushsi2/0
        pushl   %ebx    # 54    [c=4 l=1]  *pushsi2/0
        subl    $52, %esp       # 55    [c=4 l=3]  
        movaps  %xmm1, 16(%esp) # 5     [c=4 l=5]  movv8hf_internal/3
        movl    20(%esp), %ecx  # 34    [c=9 l=4]  *movsi_internal/0
        movl    24(%esp), %edx  # 35    [c=9 l=4]  *movsi_internal/0
        movl    28(%esp), %eax  # 36    [c=9 l=4]  *movsi_internal/0
        movd    %xmm1, (%esp)   # 46    [c=4 l=5]  *movsi_internal/11
        movl    %ecx, 4(%esp)   # 47    [c=4 l=4]  *movsi_internal/1
        movl    %edx, 8(%esp)   # 48    [c=4 l=4]  *movsi_internal/1
        movl    %eax, 12(%esp)  # 49    [c=4 l=4]  *movsi_internal/1
        movdqa  (%esp), %xmm0   # 50    [c=17 l=5]  *movti_internal/4
        addl    $52, %esp       # 58    [c=4 l=3]  
        popl    %ebx    # 59    [c=9 l=1]  *popsi1
        popl    %esi    # 60    [c=9 l=1]  *popsi1
        ret             # 61    [c=0 l=1]  simple_return_internal

Does ABI specify how to handle V2HF arguments and returns? foo looks a bit
suspicious to me, corresponding V2HI arguments are simply returned in %eax
register.

Also, baz iz highly un-optimal for 32bit targets.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
@ 2021-12-06 11:37 ` crazylht at gmail dot com
  2021-12-07  3:05 ` crazylht at gmail dot com
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: crazylht at gmail dot com @ 2021-12-06 11:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
I remember psABI does not specify how to pass the 32-bit vector, PR102197 have
reported similar issue.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
  2021-12-06 11:37 ` [Bug target/103571] " crazylht at gmail dot com
@ 2021-12-07  3:05 ` crazylht at gmail dot com
  2021-12-07  7:47 ` wwwhhhyyy333 at gmail dot com
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: crazylht at gmail dot com @ 2021-12-07  3:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---

> 
> Also, baz iz highly un-optimal for 32bit targets.

Yes, it needs to be fixed, note w/ -mavx512fp16 codegen for baz is optimal on
32-bit target, maybe related to vector_mode_supported_p, but then why codegen
for baz on 64-bit target is optimal w/o TARGET_AVX512FP16?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
  2021-12-06 11:37 ` [Bug target/103571] " crazylht at gmail dot com
  2021-12-07  3:05 ` crazylht at gmail dot com
@ 2021-12-07  7:47 ` wwwhhhyyy333 at gmail dot com
  2021-12-07  7:54 ` ubizjak at gmail dot com
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: wwwhhhyyy333 at gmail dot com @ 2021-12-07  7:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Hongyu Wang <wwwhhhyyy333 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wwwhhhyyy333 at gmail dot com

--- Comment #3 from Hongyu Wang <wwwhhhyyy333 at gmail dot com> ---
(In reply to Hongtao.liu from comment #2)
> > 
> > Also, baz iz highly un-optimal for 32bit targets.
> 
> Yes, it needs to be fixed, note w/ -mavx512fp16 codegen for baz is optimal
> on 32-bit target, maybe related to vector_mode_supported_p, but then why
> codegen for baz on 64-bit target is optimal w/o TARGET_AVX512FP16?

For V8HFmode that is unsupported in VALID_SSE2_REG_MODE, function_value_32 has

return gen_rtx_REG (orig_mode, regno); 

so the retval is (reg:BLK 20 xmm0).

while function_value_64 uses construct_container and returns

(parallel:BLK [                                   
        (expr_list:REG_DEP_TRUE (reg:V8HF 20 xmm0)
            (const_int 0 [0]))                    
    ])                                            

This could be optimized to simple movaps finally.

So we may need to support V8HFmode in VALID_SSE2_REG_MODE if we don't want to
modify those function_args and function_value stuff.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (2 preceding siblings ...)
  2021-12-07  7:47 ` wwwhhhyyy333 at gmail dot com
@ 2021-12-07  7:54 ` ubizjak at gmail dot com
  2021-12-07  8:14 ` crazylht at gmail dot com
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-07  7:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #4 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongyu Wang from comment #3)

> So we may need to support V8HFmode in VALID_SSE2_REG_MODE if we don't want
> to modify those function_args and function_value stuff.

We have V8HFmode moves for TARGET_SSE, So I guress we can enable it for
VALID_SSE2_REG_MODE.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (3 preceding siblings ...)
  2021-12-07  7:54 ` ubizjak at gmail dot com
@ 2021-12-07  8:14 ` crazylht at gmail dot com
  2021-12-07 11:04 ` ubizjak at gmail dot com
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: crazylht at gmail dot com @ 2021-12-07  8:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #4)
> (In reply to Hongyu Wang from comment #3)
>  
> > So we may need to support V8HFmode in VALID_SSE2_REG_MODE if we don't want
> > to modify those function_args and function_value stuff.
> 
> We have V8HFmode moves for TARGET_SSE, So I guress we can enable it for
> VALID_SSE2_REG_MODE.

There're several places in i386-expand.c which assume TARGET_AVX512FP16 for
case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in
VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first.

For insn patterns, it's ok since condition is binded to real instruction but
not mode.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (4 preceding siblings ...)
  2021-12-07  8:14 ` crazylht at gmail dot com
@ 2021-12-07 11:04 ` ubizjak at gmail dot com
  2021-12-07 11:17 ` ubizjak at gmail dot com
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-07 11:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #5)

> There're several places in i386-expand.c which assume TARGET_AVX512FP16 for
> case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in
> VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first.

These are of the type:

      use_vector_set = TARGET_AVX512FP16 && one_var == 0;
      gen_vec_set_0 = gen_vec_setv8hf_0;

So they look immune to the above change.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (5 preceding siblings ...)
  2021-12-07 11:04 ` ubizjak at gmail dot com
@ 2021-12-07 11:17 ` ubizjak at gmail dot com
  2021-12-08  5:27 ` crazylht at gmail dot com
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-07 11:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 51941
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51941&action=edit
Proposed patch

The patch moves put V2HF+V4HF+V8HF/V16HF/V32HF TO
VALID_SSE2/AVX256/AVX512F_REG_MODE.

Also, introduces VALID_AVX512FP16_SCALAR_MODE to simplify some code.

(Probably we need to add V2HFmode to VALID_INT_MODE_P, but nevertheless the
patch fixes all the issues from the description):

64-bit targets:

foo:
        movl    %esi, %eax
        ret

bar:
        movaps  %xmm1, %xmm0
        ret

baz:
        movdqa  %xmm1, %xmm0
        ret

and for 32-bit targets:

foo:
        movl    8(%esp), %eax
        ret

bar:
        movq    %mm1, %mm0
        ret

baz:
        movdqa  %xmm1, %xmm0
        ret

The patch "regresses" 32bit testsuite:

FAIL: gcc.target/i386/pr102812.c scan-assembler movdqa

but only due to better generated code:

        pxor    %xmm0, %xmm0
        pinsrw  $0, 4(%esp), %xmm0
        ret

vs. the above demonstrated mess.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (6 preceding siblings ...)
  2021-12-07 11:17 ` ubizjak at gmail dot com
@ 2021-12-08  5:27 ` crazylht at gmail dot com
  2021-12-08  7:10 ` ubizjak at gmail dot com
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: crazylht at gmail dot com @ 2021-12-08  5:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #6)
> (In reply to Hongtao.liu from comment #5)
> 
> > There're several places in i386-expand.c which assume TARGET_AVX512FP16 for
> > case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in
> > VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first.
> 
> These are of the type:
> 
>       use_vector_set = TARGET_AVX512FP16 && one_var == 0;
>       gen_vec_set_0 = gen_vec_setv8hf_0;
> 
> So they look immune to the above change.

This is ok.

I mean in ix86_expand_vector_init_duplicate

    case E_V8HFmode:
    case E_V16HFmode:
    case E_V32HFmode:
      return ix86_vector_duplicate_value (mode, target, val);

AVX2 is needed for V8HF/V16HFmode vpbroadcastw, AVX512BW is needed for
V32HFmode, those modes should be handled same as V8HI/V16HI/V32HImode.

Also in ix86_expand_vector_extract, below should be under TARGET_AVX512BW,
other wise, vector_extract go through stack.

    case E_V32HFmode:
      tmp = gen_reg_rtx (V16HFmode);
      if (elt < 16)
        emit_insn (gen_vec_extract_lo_v32hf (tmp, vec));
      else
        emit_insn (gen_vec_extract_hi_v32hf (tmp, vec));
      ix86_expand_vector_extract (false, target, tmp, elt & 15);
      return;


others seems to be ok.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (7 preceding siblings ...)
  2021-12-08  5:27 ` crazylht at gmail dot com
@ 2021-12-08  7:10 ` ubizjak at gmail dot com
  2021-12-08  7:16 ` crazylht at gmail dot com
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-08  7:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #8)
> (In reply to Uroš Bizjak from comment #6)
> > (In reply to Hongtao.liu from comment #5)
> > 
> > > There're several places in i386-expand.c which assume TARGET_AVX512FP16 for
> > > case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in
> > > VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first.
> > 
> > These are of the type:
> > 
> >       use_vector_set = TARGET_AVX512FP16 && one_var == 0;
> >       gen_vec_set_0 = gen_vec_setv8hf_0;
> > 
> > So they look immune to the above change.
> 
> This is ok.
> 
> I mean in ix86_expand_vector_init_duplicate
> 
>     case E_V8HFmode:
>     case E_V16HFmode:
>     case E_V32HFmode:
>       return ix86_vector_duplicate_value (mode, target, val);
> 
> AVX2 is needed for V8HF/V16HFmode vpbroadcastw, AVX512BW is needed for
> V32HFmode, those modes should be handled same as V8HI/V16HI/V32HImode.
> 
> Also in ix86_expand_vector_extract, below should be under TARGET_AVX512BW,
> other wise, vector_extract go through stack.
> 
>     case E_V32HFmode:
>       tmp = gen_reg_rtx (V16HFmode);
>       if (elt < 16)
> 	emit_insn (gen_vec_extract_lo_v32hf (tmp, vec));
>       else
> 	emit_insn (gen_vec_extract_hi_v32hf (tmp, vec));
>       ix86_expand_vector_extract (false, target, tmp, elt & 15);
>       return;
> 
> 
> others seems to be ok.

Please note that the change mainly affects moves between SSE and GP registers.
Expansion is done way before register allocation, and if we allow these modes
earlier, I'm not sure I understand how it affects expand.

I propose we proceed with my patch and fix eventual fallout as a follow-up.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (8 preceding siblings ...)
  2021-12-08  7:10 ` ubizjak at gmail dot com
@ 2021-12-08  7:16 ` crazylht at gmail dot com
  2021-12-08 14:25 ` ubizjak at gmail dot com
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: crazylht at gmail dot com @ 2021-12-08  7:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #9)
> (In reply to Hongtao.liu from comment #8)
> > (In reply to Uroš Bizjak from comment #6)
> > > (In reply to Hongtao.liu from comment #5)
> > > 
> > > > There're several places in i386-expand.c which assume TARGET_AVX512FP16 for
> > > > case V8HF/V16HF/V32HF, if we want to put V8HF/V16HF/V32HF in
> > > > VALID_SSE2/AVX256/AVX512F_REG_MODE, we need to "fix" them first.
> > > 
> > > These are of the type:
> > > 
> > >       use_vector_set = TARGET_AVX512FP16 && one_var == 0;
> > >       gen_vec_set_0 = gen_vec_setv8hf_0;
> > > 
> > > So they look immune to the above change.
> > 
> > This is ok.
> > 
> > I mean in ix86_expand_vector_init_duplicate
> > 
> >     case E_V8HFmode:
> >     case E_V16HFmode:
> >     case E_V32HFmode:
> >       return ix86_vector_duplicate_value (mode, target, val);
> > 
> > AVX2 is needed for V8HF/V16HFmode vpbroadcastw, AVX512BW is needed for
> > V32HFmode, those modes should be handled same as V8HI/V16HI/V32HImode.
> > 
> > Also in ix86_expand_vector_extract, below should be under TARGET_AVX512BW,
> > other wise, vector_extract go through stack.
> > 
> >     case E_V32HFmode:
> >       tmp = gen_reg_rtx (V16HFmode);
> >       if (elt < 16)
> > 	emit_insn (gen_vec_extract_lo_v32hf (tmp, vec));
> >       else
> > 	emit_insn (gen_vec_extract_hi_v32hf (tmp, vec));
> >       ix86_expand_vector_extract (false, target, tmp, elt & 15);
> >       return;
> > 
> > 
> > others seems to be ok.
> 
> Please note that the change mainly affects moves between SSE and GP
> registers. Expansion is done way before register allocation, and if we allow
> these modes earlier, I'm not sure I understand how it affects expand.
> 
> I propose we proceed with my patch and fix eventual fallout as a follow-up.

Sure.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (9 preceding siblings ...)
  2021-12-08  7:16 ` crazylht at gmail dot com
@ 2021-12-08 14:25 ` ubizjak at gmail dot com
  2021-12-08 14:38 ` ubizjak at gmail dot com
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-08 14:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51941|0                           |1
        is obsolete|                            |

--- Comment #11 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 51948
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51948&action=edit
Proposed patch to enable vector HF modes for TARGET_SSE2+

Attached patch enables vector HF modes for TARGET_SSE2+. In addition to
enabling vector modes for SSE2, AVX and AVX512F targets, it enables
corresponding move insns in sse.md, redefines some mode iterators and moves a
couple of patterns  from TARGET_AVX512FP16 to lower ABIs.

The patch also fixes ix86_expand_vector_init_duplicate,
ix86_expand_vector_extract and expand_vec_perm_broadcast_1, as mentioned in
Comment #8.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (10 preceding siblings ...)
  2021-12-08 14:25 ` ubizjak at gmail dot com
@ 2021-12-08 14:38 ` ubizjak at gmail dot com
  2021-12-08 15:05 ` ubizjak at gmail dot com
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-08 14:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #12 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #10)

> Sure.
Please find attached the complete patch that enables HF vector modes in Comment
#11. The patch survives bootstrap and regression test and works OK for the
following testcase:

--cut here--
typedef _Float16 vf64 __attribute__((vector_size(64)));
typedef _Float16 vf32 __attribute__((vector_size(32)));
typedef _Float16 vf16 __attribute__((vector_size(16)));

#ifdef __AVX512F__
vf64 bar64 (_Float16 a)
{
  return (vf64) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
a, a, a, a, a, a, a, a, a, a, a, a };
}
#endif

#ifdef __AVX__
vf32 bar32 (_Float16 a)
{
  return (vf32) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };
}
#endif

#ifdef __SSE2__
vf16 baz16 (_Float16 a)
{
  return (vf16) { a, a, a, a, a, a, a, a };
}
#endif
--cut here--

for -msse2, -mavx, -mavx512f and -mavx512bw.

Perhaps some VxHF patterns need to be re-enabled for lower ABIs, but the
generic target code auto-detects them. Now the generic target code does not
assume that vector HF modes depend solely on TARGET_AVX512FP16.

Hongtao, can you please review the patch and perhaps test it a bit more?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (11 preceding siblings ...)
  2021-12-08 14:38 ` ubizjak at gmail dot com
@ 2021-12-08 15:05 ` ubizjak at gmail dot com
  2021-12-08 15:07 ` ubizjak at gmail dot com
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-08 15:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #13 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #12)
> Hongtao, can you please review the patch and perhaps test it a bit more?

This part is missing from ix86_expand_vector_set_var:

--cut here
@@ -15912,7 +15921,8 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx
idx)
   /* 512-bits vector byte/word broadcast and comparison only available
      under TARGET_AVX512BW, break 512-bits vector into two 256-bits vector
      when without TARGET_AVX512BW.  */
-  if ((mode == V32HImode || mode == V64QImode) && !TARGET_AVX512BW)
+  if ((mode == V32HImode || mode == V32HFmode || mode == V64QImode)
+      && !TARGET_AVX512BW)
     {
       gcc_assert (TARGET_AVX512F);
       rtx vhi, vlo, idx_hi;
@@ -15926,6 +15936,12 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx
idx)
          extract_hi = gen_vec_extract_hi_v32hi;
          extract_lo = gen_vec_extract_lo_v32hi;
        }
+      else if (mode == V32HFmode)
+       {
+         half_mode = V16HFmode;
+         extract_hi = gen_vec_extract_hi_v32hf;
+         extract_lo = gen_vec_extract_lo_v32hf;
+       }
       else
        {
          half_mode = V32QImode;
@@ -15973,7 +15989,6 @@ ix86_expand_vector_set_var (rtx target, rtx val, rtx
idx)
        case E_V16SFmode:
          cmp_mode = V16SImode;
          break;
-       /* TARGET_AVX512FP16 implies TARGET_AVX512BW.  */
        case E_V8HFmode:
          cmp_mode = V8HImode;
          break;
--cut here--

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (12 preceding siblings ...)
  2021-12-08 15:05 ` ubizjak at gmail dot com
@ 2021-12-08 15:07 ` ubizjak at gmail dot com
  2021-12-09  0:39 ` crazylht at gmail dot com
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-08 15:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51948|0                           |1
        is obsolete|                            |

--- Comment #14 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 51950
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51950&action=edit
Proposed patch to enable vector HF modes for TARGET_SSE2+

Updated patch, see Comment #13.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (13 preceding siblings ...)
  2021-12-08 15:07 ` ubizjak at gmail dot com
@ 2021-12-09  0:39 ` crazylht at gmail dot com
  2021-12-09  0:42 ` crazylht at gmail dot com
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: crazylht at gmail dot com @ 2021-12-09  0:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #12)
> (In reply to Hongtao.liu from comment #10)
> 
> > Sure.
> Please find attached the complete patch that enables HF vector modes in
> Comment #11. The patch survives bootstrap and regression test and works OK
> for the following testcase:
> 
> --cut here--
> typedef _Float16 vf64 __attribute__((vector_size(64)));
> typedef _Float16 vf32 __attribute__((vector_size(32)));
> typedef _Float16 vf16 __attribute__((vector_size(16)));
> 
> #ifdef __AVX512F__
> vf64 bar64 (_Float16 a)
> {
>   return (vf64) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a,
> a, a, a, a, a, a, a, a, a, a, a, a, a };
> }
> #endif
> 
> #ifdef __AVX__
> vf32 bar32 (_Float16 a)
> {
>   return (vf32) { a, a, a, a, a, a, a, a, a, a, a, a, a, a, a, a };
> }
> #endif
> 
> #ifdef __SSE2__
> vf16 baz16 (_Float16 a)
> {
>   return (vf16) { a, a, a, a, a, a, a, a };
> }
> #endif
> --cut here--
> 
> for -msse2, -mavx, -mavx512f and -mavx512bw.
> 
> Perhaps some VxHF patterns need to be re-enabled for lower ABIs, but the
> generic target code auto-detects them. Now the generic target code does not
> assume that vector HF modes depend solely on TARGET_AVX512FP16.
> 
> Hongtao, can you please review the patch and perhaps test it a bit more?

Sure.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (14 preceding siblings ...)
  2021-12-09  0:39 ` crazylht at gmail dot com
@ 2021-12-09  0:42 ` crazylht at gmail dot com
  2021-12-09  4:15 ` crazylht at gmail dot com
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: crazylht at gmail dot com @ 2021-12-09  0:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #16 from Hongtao.liu <crazylht at gmail dot com> ---
There're already testcases for vec_extract/vec_set/vec_duplicate, but those
testcases are written under TARGET_AVX512FP16, i'll make a copy of them and
test them w/o avx512fp16.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (15 preceding siblings ...)
  2021-12-09  0:42 ` crazylht at gmail dot com
@ 2021-12-09  4:15 ` crazylht at gmail dot com
  2021-12-09  5:57 ` crazylht at gmail dot com
                   ` (11 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: crazylht at gmail dot com @ 2021-12-09  4:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #17 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #16)
> There're already testcases for vec_extract/vec_set/vec_duplicate, but those
> testcases are written under TARGET_AVX512FP16, i'll make a copy of them and
> test them w/o avx512fp16.

Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to
avx512vl/f16c so that vect-float16-1.c could be vectorized.

vect-float16-1.c

void
foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
     _Float16 *__restrict__ c)
{
  for (int i = 0; i < 256; i++)
    a[i] = b[i] + c[i];
}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (16 preceding siblings ...)
  2021-12-09  4:15 ` crazylht at gmail dot com
@ 2021-12-09  5:57 ` crazylht at gmail dot com
  2021-12-09  7:07 ` crazylht at gmail dot com
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: crazylht at gmail dot com @ 2021-12-09  5:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #18 from Hongtao.liu <crazylht at gmail dot com> ---
codegen for foo1/foo2 is suboptimal under -mavx2, i guess we can have
vec_setv16hf_0 and with vpblendw.

typedef _Float16 __v16hf __attribute__ ((__vector_size__ (32)));
typedef _Float16 __m256h __attribute__ ((__vector_size__ (32), __may_alias__));

__m256h
__attribute__ ((noinline, noclone))
foo1 (_Float16 x)
{
  return __extension__ (__m256h)(__v16hf) { x, 0.0f, 0.0f, 0.0f,
                                            0.0f, 0.0f, 0.0f, 0.0f,
                                            0.0f, 0.0f, 0.0f, 0.0f,
                                            0.0f, 0.0f, 0.0f, 0.0f };
}

__m256h
__attribute__ ((noinline, noclone))
foo2 (_Float16 *x)
{
  return __extension__ (__m256h)(__v16hf) { *x, 0.0f, 0.0f, 0.0f,
                                            0.0f, 0.0f, 0.0f, 0.0f,
                                            0.0f, 0.0f, 0.0f, 0.0f,
                                            0.0f, 0.0f, 0.0f, 0.0f };
}


foo1:
.LFB0:
        .cfi_startproc
        vpxor   %xmm1, %xmm1, %xmm1
        vpbroadcastw    %xmm0, %ymm0
        vpblendw        $1, %ymm0, %ymm1, %ymm0
        vpblendd        $15, %ymm0, %ymm1, %ymm1
        vmovdqa %ymm1, %ymm0
        ret
        .cfi_endproc
.LFE0:
        .size   foo1, .-foo1
        .p2align 4
        .globl  foo2
        .type   foo2, @function
foo2:
.LFB1:
        .cfi_startproc
        vpbroadcastw    (%rdi), %ymm1
        vpxor   %xmm0, %xmm0, %xmm0
        vpblendw        $1, %ymm1, %ymm0, %ymm1
        vpblendd        $15, %ymm1, %ymm0, %ymm0
        ret
        .cfi_endproc

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (17 preceding siblings ...)
  2021-12-09  5:57 ` crazylht at gmail dot com
@ 2021-12-09  7:07 ` crazylht at gmail dot com
  2021-12-09  7:21 ` crazylht at gmail dot com
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: crazylht at gmail dot com @ 2021-12-09  7:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #19 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #17)
> (In reply to Hongtao.liu from comment #16)
> > There're already testcases for vec_extract/vec_set/vec_duplicate, but those
> > testcases are written under TARGET_AVX512FP16, i'll make a copy of them and
> > test them w/o avx512fp16.
> 
> Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to
> avx512vl/f16c so that vect-float16-1.c could be vectorized.
> 
> vect-float16-1.c
> 
> void
> foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
>      _Float16 *__restrict__ c)
> {
>   for (int i = 0; i < 256; i++)
>     a[i] = b[i] + c[i];
> }

Even w/ support of extend_optab/trunc_optab, veclower still lower v8hf addition
to scalar version. And the mismatch is vectorizer assume '+/-' is supported by
default(w/o check optab, just cehck if v8hf is supported in
vector_mode_supported_p), and then vectorize the loop, but veclower lower
vector operation back to scalar which create much worse code than not
vectorized version. 

after loop vectorizer, dump is quite optimized:
  vect__4.6_27 = MEM <vector(4) _Float16> [(_Float16 *)vectp_b.4_29];
  vect__6.9_24 = MEM <vector(4) _Float16> [(_Float16 *)vectp_c.7_26];
  vect__8.10_23 = vect__4.6_27 + vect__6.9_24;
  MEM <vector(4) _Float16> [(_Float16 *)vectp_a.11_22] = vect__8.10_23;
  vectp_b.4_28 = vectp_b.4_29 + 8;
  vectp_c.7_25 = vectp_c.7_26 + 8;
  vectp_a.11_21 = vectp_a.11_22 + 8;

But after veclower

  vect__4.6_4 = MEM <vector(4) _Float16> [(_Float16 *)b_12(D)];
  vect__6.9_5 = MEM <vector(4) _Float16> [(_Float16 *)c_13(D)];
  _28 = BIT_FIELD_REF <vect__4.6_4, 16, 0>;
  _25 = BIT_FIELD_REF <vect__6.9_5, 16, 0>;
  _21 = _28 + _25;
  _15 = BIT_FIELD_REF <vect__4.6_4, 16, 16>;
  _10 = BIT_FIELD_REF <vect__6.9_5, 16, 16>;
  _17 = _15 + _10;
  _22 = BIT_FIELD_REF <vect__4.6_4, 16, 32>;
  _26 = BIT_FIELD_REF <vect__6.9_5, 16, 32>;
  _29 = _22 + _26;
  _20 = BIT_FIELD_REF <vect__4.6_4, 16, 48>;
  _3 = BIT_FIELD_REF <vect__6.9_5, 16, 48>;
  _2 = _20 + _3;
  vect__8.10_6 = {_21, _17, _29, _2};
  MEM <vector(4) _Float16> [(_Float16 *)a_14(D)] = vect__8.10_6;
  vectp_b.4_8 = b_12(D) + 8;
  vectp_c.7_16 = c_13(D) + 8;
  vectp_a.11_30 = a_14(D) + 8;
  vect__4.6_27 = MEM <vector(4) _Float16> [(_Float16 *)vectp_b.4_8];
  vect__6.9_24 = MEM <vector(4) _Float16> [(_Float16 *)vectp_c.7_16];
  _1 = BIT_FIELD_REF <vect__4.6_27, 16, 0>;
  _19 = BIT_FIELD_REF <vect__6.9_24, 16, 0>;
  _31 = _1 + _19;
  _9 = BIT_FIELD_REF <vect__4.6_27, 16, 16>;
  _32 = BIT_FIELD_REF <vect__6.9_24, 16, 16>;
  _33 = _9 + _32;
  _34 = BIT_FIELD_REF <vect__4.6_27, 16, 32>;
  _35 = BIT_FIELD_REF <vect__6.9_24, 16, 32>;
  _36 = _34 + _35;
  _37 = BIT_FIELD_REF <vect__4.6_27, 16, 48>;
  _38 = BIT_FIELD_REF <vect__6.9_24, 16, 48>;
  _39 = _37 + _38;
  vect__8.10_23 = {_31, _33, _36, _39};
  MEM <vector(4) _Float16> [(_Float16 *)vectp_a.11_30] = vect__8.10_23;


Could veclower try widen mode for addition, even veclower can, vNhfmode better
be supported under avx512vl or f16c, orelse vectorized code is really bad, then
why should we supported vector mode under generic target.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (18 preceding siblings ...)
  2021-12-09  7:07 ` crazylht at gmail dot com
@ 2021-12-09  7:21 ` crazylht at gmail dot com
  2021-12-09  8:15 ` ubizjak at gmail dot com
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: crazylht at gmail dot com @ 2021-12-09  7:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #20 from Hongtao.liu <crazylht at gmail dot com> ---
V2HF/V4HF should also be restricted under AVX512FP16.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (19 preceding siblings ...)
  2021-12-09  7:21 ` crazylht at gmail dot com
@ 2021-12-09  8:15 ` ubizjak at gmail dot com
  2021-12-09  8:36 ` crazylht at gmail dot com
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-09  8:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #21 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #19)
> (In reply to Hongtao.liu from comment #17)
> > (In reply to Hongtao.liu from comment #16)
> > > There're already testcases for vec_extract/vec_set/vec_duplicate, but those
> > > testcases are written under TARGET_AVX512FP16, i'll make a copy of them and
> > > test them w/o avx512fp16.
> > 
> > Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to
> > avx512vl/f16c so that vect-float16-1.c could be vectorized.
> > 
> > vect-float16-1.c
> > 
> > void
> > foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> >      _Float16 *__restrict__ c)
> > {
> >   for (int i = 0; i < 256; i++)
> >     a[i] = b[i] + c[i];
> > }
> 
> Even w/ support of extend_optab/trunc_optab, veclower still lower v8hf
> addition to scalar version. And the mismatch is vectorizer assume '+/-' is
> supported by default(w/o check optab, just cehck if v8hf is supported in
> vector_mode_supported_p), and then vectorize the loop, but veclower lower
> vector operation back to scalar which create much worse code than not
> vectorized version. 

I was under impression that autovectorizer won't vectorize if
TARGET_VECTORIZE_PRFERRED_SIMD_MODE returns word_mode. Also, the documentation
for TARGET_VECTOR_MODE_SUPPORTED_P claims that only moves are needed.

So, it looks that middle end is somehow inconsistent here. Adding CC.

> Could veclower try widen mode for addition, even veclower can, vNhfmode
> better be supported under avx512vl or f16c, orelse vectorized code is really
> bad, then why should we supported vector mode under generic target.

We should use it for parameter passing, moves, inserts, extracts and shuffles.
In case of VxHF, we can reuse HImode insns for all these operations.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (20 preceding siblings ...)
  2021-12-09  8:15 ` ubizjak at gmail dot com
@ 2021-12-09  8:36 ` crazylht at gmail dot com
  2021-12-14 17:28 ` cvs-commit at gcc dot gnu.org
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: crazylht at gmail dot com @ 2021-12-09  8:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #22 from Hongtao.liu <crazylht at gmail dot com> ---

 reply to Uroš Bizjak from comment #21)
> (In reply to Hongtao.liu from comment #19)
> > (In reply to Hongtao.liu from comment #17)
> > > (In reply to Hongtao.liu from comment #16)
> > > > There're already testcases for vec_extract/vec_set/vec_duplicate, but those
> > > > testcases are written under TARGET_AVX512FP16, i'll make a copy of them and
> > > > test them w/o avx512fp16.
> > > 
> > > Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to
> > > avx512vl/f16c so that vect-float16-1.c could be vectorized.
> > > 
> > > vect-float16-1.c
> > > 
> > > void
> > > foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
> > >      _Float16 *__restrict__ c)
> > > {
> > >   for (int i = 0; i < 256; i++)
> > >     a[i] = b[i] + c[i];
> > > }
> > 
> > Even w/ support of extend_optab/trunc_optab, veclower still lower v8hf
> > addition to scalar version. And the mismatch is vectorizer assume '+/-' is
> > supported by default(w/o check optab, just cehck if v8hf is supported in
> > vector_mode_supported_p), and then vectorize the loop, but veclower lower
> > vector operation back to scalar which create much worse code than not
> > vectorized version. 
> 
> I was under impression that autovectorizer won't vectorize if
> TARGET_VECTORIZE_PRFERRED_SIMD_MODE returns word_mode. Also, the
word_mode is also returned for 64-bit/32-bit vector, but they're vectorized.(In
> documentation for TARGET_VECTOR_MODE_SUPPORTED_P claims that only moves are
> needed.
> 
> So, it looks that middle end is somehow inconsistent here. Adding CC.
> 
> > Could veclower try widen mode for addition, even veclower can, vNhfmode
> > better be supported under avx512vl or f16c, orelse vectorized code is really
> > bad, then why should we supported vector mode under generic target.
> 
> We should use it for parameter passing, moves, inserts, extracts and
> shuffles. In case of VxHF, we can reuse HImode insns for all these
> operations.

Yes, besides TARGET_VECTOR_MODE_SUPPORTED_P, other part in the attached patch
looks fine, the condition should be binded to real instructions but not mode.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (21 preceding siblings ...)
  2021-12-09  8:36 ` crazylht at gmail dot com
@ 2021-12-14 17:28 ` cvs-commit at gcc dot gnu.org
  2021-12-14 18:30 ` ubizjak at gmail dot com
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-12-14 17:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #23 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:7a54d3deecf967029f18aa5ed1fcbdb752e213b9

commit r12-5966-g7a54d3deecf967029f18aa5ed1fcbdb752e213b9
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Tue Dec 14 18:27:22 2021 +0100

    i386: Implement VxHF vector set/insert/extract with lower ABI levels

    This is a preparation patch that moves VxHF vector set/insert/extract
    expansions from AVX512FP16 ABI to lower ABIs.  There are no functional
    changes for -mavx512fp16 and a follow-up patch is needed to actually
    enable VxHF vector modes for lower ABIs.

    2021-12-14  Uroš Bizjak  <ubizjak@gmail.com>

    gcc/ChangeLog:

            PR target/103571
            * config/i386/i386-expand.c (ix86_expand_vector_init_duplicate)
            <case E_V8HFmode>: Implement for TARGET_SSE2.
            <case E_V16HFmode>: Implement for TARGET_AVX.
            <case E_V32HFmode>: Implement for TARGET_AVX512F.
            (ix86_expand_vector_set_var): Handle V32HFmode
            without TARGET_AVX512BW.
            (ix86_expand_vector_extract)
            <case E_V8HFmode>: Implement for TARGET_SSE2.
            <case E_V16HFmode>: Implement for TARGET_AVX.
            <case E_V32HFmode>: Implement for TARGET_AVX512BW.
            (expand_vec_perm_broadcast_1) <case E_V8HFmode>: New.
            * config/i386/sse.md (VI12HF_AVX512VL): Remove
            TARGET_AVX512FP16 condition.
            (V): Ditto.
            (V_256_512): Ditto.
            (avx_vbroadcastf128_<mode>): Use V_256H mode iterator.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (22 preceding siblings ...)
  2021-12-14 17:28 ` cvs-commit at gcc dot gnu.org
@ 2021-12-14 18:30 ` ubizjak at gmail dot com
  2021-12-16  8:51 ` ubizjak at gmail dot com
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-14 18:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #51950|0                           |1
        is obsolete|                            |

--- Comment #24 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 52002
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52002&action=edit
Current patch to enable vector VxHF modes for TARGET_SSE+

Current patch after preparation patch was committed.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (23 preceding siblings ...)
  2021-12-14 18:30 ` ubizjak at gmail dot com
@ 2021-12-16  8:51 ` ubizjak at gmail dot com
  2021-12-16 18:35 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-16  8:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #25 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #22)
> Yes, besides TARGET_VECTOR_MODE_SUPPORTED_P, other part in the attached
> patch looks fine, the condition should be binded to real instructions but
> not mode.

OK, will commit the patch to enable vector modes later today:

- mavx512fp16 is unchanged
- vactorizer middle end can be (will be) fixed as a follow-up (I'll open a PR).
- it will be possible to test various ISA levels, possible ICE is relatively
easy to fix by enabling/disabling various code paths in expanders.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (24 preceding siblings ...)
  2021-12-16  8:51 ` ubizjak at gmail dot com
@ 2021-12-16 18:35 ` cvs-commit at gcc dot gnu.org
  2021-12-16 18:49 ` ubizjak at gmail dot com
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 30+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-12-16 18:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #26 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:271e36d9d5b3a75e7f1a927e594477e6a5dd6fc0

commit r12-6021-g271e36d9d5b3a75e7f1a927e594477e6a5dd6fc0
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Thu Dec 16 19:34:50 2021 +0100

    i386: Enable VxHF vector modes lower ABI levels [PR103571]

    Enable VxHF vector modes for SSE2, AVX and AVX512F ABIs.

    2021-12-16  Uroš Bizjak  <ubizjak@gmail.com>

    gcc/ChangeLog:

            PR target/103571
            * config/i386/i386.h (VALID_AVX256_REG_MODE): Add V16HFmode.
            (VALID_AVX256_REG_OR_OI_VHF_MODE): Replace with ...
            (VALID_AVX256_REG_OR_OI_MODE): ... this.  Remove V16HFmode.
            (VALID_AVX512F_SCALAR_MODE): Remove HImode and HFmode.
            (VALID_AVX512FP16_SCALAR_MODE): New.
            (VALID_AVX512F_REG_MODE): Add V32HFmode.
            (VALID_SSE2_REG_MODE): Add V8HFmode, V4HFmode and V2HFmode.
            (VALID_SSE2_REG_VHF_MODE): Remove.
            (VALID_INT_MODE_P): Add V2HFmode.
            * config/i386/i386.c (function_arg_advance_64):
            Remove explicit mention of V16HFmode and V32HFmode.
            (ix86_hard_regno_mode_ok): Remove explicit mention of XImode
            and V32HFmode, use VALID_AVX512F_REG_OR_XI_MODE instead.
            Use VALID_AVX512FP_SCALAR_MODE for TARGET_aVX512FP16.
            Use VALID_AVX256_REG_OR_OI_MODE instead of
            VALID_AVX256_REG_OR_OI_VHF_MODE and VALID_SSE2_REG_MODE instead
            of VALID_SSE2_REG_VHF_MODE.
            (ix86_set_reg_reg_cost): Remove usge of VALID_AVX512FP16_REG_MODE.
            (ix86_vector_mode_supported): Ditto.

    gcc/testsuite/ChangeLog:

            PR target/103571
            * gcc.target/i386/pr102812.c (dg-final): Do not scan for movdqa.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (25 preceding siblings ...)
  2021-12-16 18:35 ` cvs-commit at gcc dot gnu.org
@ 2021-12-16 18:49 ` ubizjak at gmail dot com
  2021-12-16 18:55 ` ubizjak at gmail dot com
  2021-12-16 19:22 ` ubizjak at gmail dot com
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-16 18:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #27 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #17)
> (In reply to Hongtao.liu from comment #16)
> > There're already testcases for vec_extract/vec_set/vec_duplicate, but those
> > testcases are written under TARGET_AVX512FP16, i'll make a copy of them and
> > test them w/o avx512fp16.
> 
> Also we can relax condition of extendv*hfv*sf and truncv*sfv*hf to
> avx512vl/f16c so that vect-float16-1.c could be vectorized.
> 
> vect-float16-1.c
> 
> void
> foo (_Float16 *__restrict__ a, _Float16 *__restrict__ b,
>      _Float16 *__restrict__ c)
> {
>   for (int i = 0; i < 256; i++)
>     a[i] = b[i] + c[i];
> }

This was recently fixed, for -O2 -ftree-vectorize -mfp16c I get:

        vpxor   %xmm2, %xmm2, %xmm2
        vpinsrw $0, (%rsi,%rax), %xmm2, %xmm0
        vpinsrw $0, (%rdx,%rax), %xmm2, %xmm1
        vcvtph2ps       %xmm0, %xmm0
        vcvtph2ps       %xmm1, %xmm1
        vaddss  %xmm1, %xmm0, %xmm0
        vinsertps       $0xe, %xmm0, %xmm0, %xmm0
        vcvtps2ph       $4, %xmm0, %xmm0
        vpextrw $0, %xmm0, (%rdi,%rax)
        addq    $2, %rax
        cmpq    $512, %rax
        jne     .L2
        ret

While it would be nice to partially vectorize with vcvtph2ps/vcvtps2ph, the
compiler doesn't reach that far.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (26 preceding siblings ...)
  2021-12-16 18:49 ` ubizjak at gmail dot com
@ 2021-12-16 18:55 ` ubizjak at gmail dot com
  2021-12-16 19:22 ` ubizjak at gmail dot com
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-16 18:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

--- Comment #28 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #18)
> codegen for foo1/foo2 is suboptimal under -mavx2, i guess we can have
> vec_setv16hf_0 and with vpblendw.

True, some opportunities are missing from expand_vec_perm* functions, someone
should go through these expanders and add corresponding VxHFmode near VxHImode
handling.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Bug target/103571] ABI: V2HF, V4HF and V8HFmode argument passing issues
  2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
                   ` (27 preceding siblings ...)
  2021-12-16 18:55 ` ubizjak at gmail dot com
@ 2021-12-16 19:22 ` ubizjak at gmail dot com
  28 siblings, 0 replies; 30+ messages in thread
From: ubizjak at gmail dot com @ 2021-12-16 19:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103571

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED
   Target Milestone|---                         |12.0

--- Comment #29 from Uroš Bizjak <ubizjak at gmail dot com> ---
Fixed for gcc-12.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-12-16 19:22 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-06  8:35 [Bug target/103571] New: ABI: V2HF, V4HF and V8HFmode argument passing issues ubizjak at gmail dot com
2021-12-06 11:37 ` [Bug target/103571] " crazylht at gmail dot com
2021-12-07  3:05 ` crazylht at gmail dot com
2021-12-07  7:47 ` wwwhhhyyy333 at gmail dot com
2021-12-07  7:54 ` ubizjak at gmail dot com
2021-12-07  8:14 ` crazylht at gmail dot com
2021-12-07 11:04 ` ubizjak at gmail dot com
2021-12-07 11:17 ` ubizjak at gmail dot com
2021-12-08  5:27 ` crazylht at gmail dot com
2021-12-08  7:10 ` ubizjak at gmail dot com
2021-12-08  7:16 ` crazylht at gmail dot com
2021-12-08 14:25 ` ubizjak at gmail dot com
2021-12-08 14:38 ` ubizjak at gmail dot com
2021-12-08 15:05 ` ubizjak at gmail dot com
2021-12-08 15:07 ` ubizjak at gmail dot com
2021-12-09  0:39 ` crazylht at gmail dot com
2021-12-09  0:42 ` crazylht at gmail dot com
2021-12-09  4:15 ` crazylht at gmail dot com
2021-12-09  5:57 ` crazylht at gmail dot com
2021-12-09  7:07 ` crazylht at gmail dot com
2021-12-09  7:21 ` crazylht at gmail dot com
2021-12-09  8:15 ` ubizjak at gmail dot com
2021-12-09  8:36 ` crazylht at gmail dot com
2021-12-14 17:28 ` cvs-commit at gcc dot gnu.org
2021-12-14 18:30 ` ubizjak at gmail dot com
2021-12-16  8:51 ` ubizjak at gmail dot com
2021-12-16 18:35 ` cvs-commit at gcc dot gnu.org
2021-12-16 18:49 ` ubizjak at gmail dot com
2021-12-16 18:55 ` ubizjak at gmail dot com
2021-12-16 19:22 ` ubizjak at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).