* [Bug target/102811] New: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-10-18 10:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

            Bug ID: 102811
           Summary: vcvtph2ps and vcvtps2ph should be used to convert
                    _Float16 to SFmode with -mf16c
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

The following testcase:

_Float16 test (_Float16 a, _Float16 b)
{
  return a + b;
}

compiles with -O2 -mf16c to:

--cut here--
        subq    $24, %rsp
        pextrw  $0, %xmm1, 14(%rsp)
        call    __extendhfsf2
        pinsrw  $0, 14(%rsp), %xmm1
        vmovss  %xmm0, 8(%rsp)
        vmovss  %xmm1, %xmm1, %xmm0
        call    __extendhfsf2
        vaddss  8(%rsp), %xmm0, %xmm0
        call    __truncsfhf2
        addq    $24, %rsp
        ret
--cut here--

Instead of calling __extendhfsf2 and __truncsfhf2, we should use vcvtph2ps and
vcvtps2ph (with zeroed elements 1..3) for -mf16c targets.
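
For reference, a minimal C sketch of the value mapping that a binary16 ->
binary32 extension has to produce for one element, whether it is done by the
__extendhfsf2 libcall or by lane 0 of vcvtph2ps. The name half_bits_to_float is
made up for illustration; this is not libgcc's implementation, and it does not
model exception flags or signalling-NaN quieting.

```c
#include <stdint.h>
#include <string.h>

/* Convert IEEE 754 binary16 bits to a float (binary32).  Every binary16
   value is exactly representable in binary32, so no rounding occurs.
   Hypothetical helper, for illustration only.  */
static float
half_bits_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t man = h & 0x3FFu;
  uint32_t bits;

  if (exp == 0x1Fu)
    /* Inf or NaN: all-ones exponent, payload moved to the top bits.  */
    bits = sign | 0x7F800000u | (man << 13);
  else if (exp != 0)
    /* Normal number: rebias the exponent from 15 to 127.  */
    bits = sign | ((exp + 112u) << 23) | (man << 13);
  else if (man == 0)
    /* Signed zero.  */
    bits = sign;
  else
    {
      /* Subnormal: shift until the implicit bit appears, adjusting the
         exponent for every shift.  */
      uint32_t e = 113;
      while ((man & 0x400u) == 0)
        {
          man <<= 1;
          e--;
        }
      bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
    }

  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

Because the extension is exact, the inline instruction and the libcall must
agree on every finite input; the difference is only the call overhead.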

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-10-18 11:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
Something like (argument and result in %xmm0):

vpxor %xmm1, %xmm1, %xmm1
vpblendw %xmm1, %xmm1, %xmm0, $1
vcvtph2ps %xmm0, %xmm1

instead of __extendhfsf2 and:

vxorps %xmm1, %xmm1, %xmm1
vblendps %xmm1, %xmm1, %xmm0, $1
vcvtps2ph %xmm0, %xmm1

instead of __truncsfhf2.
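
The truncating direction is the one where rounding matters. A rough C sketch of
binary32 -> binary16 for a single element, assuming the default
round-to-nearest-even mode, could look like the following. float_to_half_bits
is a hypothetical helper, not the __truncsfhf2 source, and exception flags are
not modelled.

```c
#include <stdint.h>
#include <string.h>

/* Convert a float (binary32) to IEEE 754 binary16 bits, rounding to
   nearest with ties to even.  Hypothetical helper, for illustration.  */
static uint16_t
float_to_half_bits (float f)
{
  uint32_t x;
  memcpy (&x, &f, sizeof x);

  uint32_t sign = (x >> 16) & 0x8000u;
  uint32_t mag = x & 0x7FFFFFFFu;
  uint32_t man = mag & 0x7FFFFFu;
  int e = (int) (mag >> 23) - 127 + 15;   /* exponent rebiased for binary16 */
  uint32_t half, rem, tie;

  if (mag > 0x7F800000u)
    return (uint16_t) (sign | 0x7E00u | (man >> 13));   /* NaN, made quiet */
  if (e >= 0x1F)
    return (uint16_t) (sign | 0x7C00u);   /* Inf, or too large: overflow */
  if (e < -10)
    return (uint16_t) sign;               /* below half of the smallest subnormal */

  if (e <= 0)
    {
      /* Result is subnormal: make the implicit bit explicit and shift
         the significand down to units of 2^-24.  */
      unsigned shift = (unsigned) (14 - e);   /* 14 .. 24 */
      man |= 0x800000u;
      half = man >> shift;
      rem = man & ((1u << shift) - 1);
      tie = 1u << (shift - 1);
    }
  else
    {
      /* Result is normal: keep the top 10 fraction bits.  */
      half = ((uint32_t) e << 10) | (man >> 13);
      rem = man & 0x1FFFu;
      tie = 0x1000u;
    }

  /* Round to nearest, ties to even.  A carry out of the fraction bumps
     the exponent, which also turns 65520 and above into Inf.  */
  if (rem > tie || (rem == tie && (half & 1)))
    half++;
  return (uint16_t) (sign | half);
}
```

Only one element is handled here; the packed instruction converts all four
lanes of its source, which is why the sequences above zero elements 1..3 first.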

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-10-18 11:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #1)
> vxorps %xmm1, %xmm1, %xmm1
> vblendps %xmm1, %xmm1, %xmm0, $1
> vcvtps2ph %xmm0, %xmm1

vmovss %xmm1, %xmm1, %xmm0

instead of vblendps would also do the trick.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: pinskia at gcc dot gnu.org @ 2021-10-18 21:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Severity|normal                      |enhancement

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: cvs-commit at gcc dot gnu.org @ 2021-11-26  1:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:90cb088ece8d8cc1019d25629d1585e5b0234179

commit r12-5536-g90cb088ece8d8cc1019d25629d1585e5b0234179
Author: konglin1 <lingling.kong@intel.com>
Date:   Wed Nov 10 09:37:32 2021 +0800

    i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

    Add define_insn extendhfsf2 and truncsfhf2 for target_f16c.

    gcc/ChangeLog:

            PR target/102811
            * config/i386/i386.c (ix86_can_change_mode_class): Allow 16 bit
            data in XMM register for TARGET_SSE2.
            * config/i386/i386.md (extendhfsf2): Add extendhfsf2 for
            TARGET_F16C.
            (extendhfdf2): Restrict extendhfdf for TARGET_AVX512FP16 only.
            (*extendhf<mode>2): Rename from extendhf<mode>2.
            (truncsfhf2): Likewise.
            (truncdfhf2): Likewise.
            (*trunc<mode>2): Likewise.

    gcc/testsuite/ChangeLog:

            PR target/102811
            * gcc.target/i386/pr90773-21.c: Allow pextrw instead of movw.
            * gcc.target/i386/pr90773-23.c: Ditto.
            * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: New test.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-26  1:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC 12.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: pinskia at gcc dot gnu.org @ 2021-11-26  1:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
   Target Milestone|---                         |12.0
         Resolution|---                         |FIXED

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 10:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
Created attachment 51879
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51879&action=edit
Improve HI/HFmode scalar insert

The attached patch further improves HFmode -> SFmode conversion. HFmode values
are passed in XMM registers, but the PINSRW insn inserts only from memory or a
GPR.

The patch introduces the *vec_set<V8_128:mode>_0 insn pattern, which also adds
the PBLENDW instruction to handle inserts into element 0 from an XMM source.
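
As a rough illustration of why a word blend can replace the extract/insert
pair, here is a small C model of a PBLENDW-style operation on eight 16-bit
lanes. The function names are hypothetical and the model only covers the
lane selection, not instruction encoding or costs.

```c
#include <stdint.h>

/* Model of a word blend on 8 x 16-bit lanes: lane i of the result comes
   from src when bit i of imm8 is set, otherwise from dst.  */
static void
blend_words (uint16_t out[8], const uint16_t dst[8],
             const uint16_t src[8], uint8_t imm8)
{
  for (int i = 0; i < 8; i++)
    out[i] = ((imm8 >> i) & 1) ? src[i] : dst[i];
}

/* Insert element 0 of an XMM-like source into an all-zero vector.  With
   imm8 = 1 only lane 0 is taken from src, so lanes 1..7 end up zero
   without a round trip through memory or a GPR.  */
static void
insert_elt0_into_zero (uint16_t out[8], const uint16_t src[8])
{
  static const uint16_t zero[8] = { 0 };
  blend_words (out, zero, src, 0x01);
}
```

The zero vector is only read, so one zeroed register can serve several such
inserts.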

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 10:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
The improvement with patch from comment #6:

The testcase:

_Float16 test (_Float16 a, _Float16 b)
{
  return a + b;
}

compiles with unpatched gcc -O2 -mf16c to:

        vmovss  %xmm0, %xmm0, %xmm2     # 27    [c=4 l=4]  *movhf_internal/3
        pextrw  $0, %xmm1, -4(%rsp)     # 28    [c=4 l=6]  *movhf_internal/5
        vpxor   %xmm0, %xmm0, %xmm0     # 7     [c=4 l=4]  movv8hf_internal/0
        vpxor   %xmm1, %xmm1, %xmm1     # 11    [c=4 l=4]  movv8hf_internal/0
        pextrw  $0, %xmm2, -2(%rsp)     # 30    [c=4 l=6]  *movhf_internal/5
        vpinsrw $0, -4(%rsp), %xmm1, %xmm1      # 12    [c=4 l=8]  sse4_1_pinsrph/3
        vpinsrw $0, -2(%rsp), %xmm0, %xmm0      # 8     [c=4 l=8]  sse4_1_pinsrph/3
        vcvtph2ps       %xmm1, %xmm1    # 13    [c=4 l=4]  vcvtph2ps
        vcvtph2ps       %xmm0, %xmm0    # 9     [c=4 l=4]  vcvtph2ps
        vaddss  %xmm1, %xmm0, %xmm0     # 15    [c=12 l=4]  *fop_sf_comm/2
        vinsertps       $0xe, %xmm0, %xmm0, %xmm0       # 17    [c=4 l=4]  vec_setv4sf_0/2
        vcvtps2ph       $4, %xmm0, %xmm0        # 18    [c=4 l=4]  *vcvtps2ph
        ret             # 35    [c=0 l=1]  simple_return_internal

with unpatched gcc -O2 -mf16c -mavx2:

        vpbroadcastw    %xmm0, %xmm0    # 8     [c=4 l=5]  *vec_dupv8hf/1
        vpxor   %xmm2, %xmm2, %xmm2     # 7     [c=4 l=4]  movv8hf_internal/0
        vpbroadcastw    %xmm1, %xmm1    # 13    [c=4 l=5]  *vec_dupv8hf/1
        vpblendw        $1, %xmm0, %xmm2, %xmm2 # 9     [c=4 l=6]  sse4_1_pblendph/2
        vpxor   %xmm0, %xmm0, %xmm0     # 12    [c=4 l=4]  movv8hf_internal/0
        vpblendw        $1, %xmm1, %xmm0, %xmm0 # 14    [c=4 l=6]  sse4_1_pblendph/2
        vcvtph2ps       %xmm2, %xmm2    # 10    [c=4 l=4]  vcvtph2ps
        vcvtph2ps       %xmm0, %xmm0    # 15    [c=4 l=4]  vcvtph2ps
        vaddss  %xmm0, %xmm2, %xmm0     # 17    [c=12 l=4]  *fop_sf_comm/2
        vinsertps       $0xe, %xmm0, %xmm0, %xmm0       # 19    [c=4 l=4]  vec_setv4sf_0/2
        vcvtps2ph       $4, %xmm0, %xmm0        # 20    [c=4 l=4]  *vcvtps2ph
        ret             # 36    [c=0 l=1]  simple_return_internal

And with patched gcc -O2 -mf16c:

        vpxor   %xmm2, %xmm2, %xmm2     # 32    [c=4 l=4]  movv8hf_internal/0
        vpblendw        $1, %xmm0, %xmm2, %xmm0 # 9     [c=4 l=6]  *vec_setv8hf_0/8
        vpblendw        $1, %xmm1, %xmm2, %xmm1 # 14    [c=4 l=6]  *vec_setv8hf_0/8
        vcvtph2ps       %xmm1, %xmm1    # 15    [c=4 l=4]  vcvtph2ps
        vcvtph2ps       %xmm0, %xmm0    # 10    [c=4 l=4]  vcvtph2ps
        vaddss  %xmm1, %xmm0, %xmm0     # 17    [c=12 l=4]  *fop_sf_comm/2
        vinsertps       $0xe, %xmm0, %xmm0, %xmm0       # 19    [c=4 l=4]  vec_setv4sf_0/2
        vcvtps2ph       $4, %xmm0, %xmm0        # 20    [c=4 l=4]  *vcvtps2ph
        ret             # 40    [c=0 l=1]  simple_return_internal

The above dumps show an inconsistency for PEXTRW (it should be VPEXTRW) and also
raise the question of why unpatched gcc prefers a memory temp instead of a GPR
temp for PEXTRW/PINSRW.

The patch improves HI/HFmode inserts to element 0 in general.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 11:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #8 from Uroš Bizjak <ubizjak at gmail dot com> ---
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 68606e57e60..a2ebaa5ac63 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2528,12 +2528,12 @@
     case TYPE_SSELOG:
       if (SSE_REG_P (operands[0]))
        return MEM_P (operands[1])
-         ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
-         : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
+         ? "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}"
+         : "%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}";
       else
        return MEM_P (operands[1])
-         ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
-         : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
+         ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
+         : "%vpextrw\t{$0, %1, %k0|%k0, %k1, 0}";

     case TYPE_MSKLOG:
       if (operands[1] == const0_rtx)
@@ -3788,12 +3788,12 @@
     case TYPE_SSELOG:
       if (SSE_REG_P (operands[0]))
        return MEM_P (operands[1])
-              ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
-              : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
+              ? "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}"
+              : "%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}";
       else
        return MEM_P (operands[1])
-              ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
-              : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
+              ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
+              : "%vpextrw\t{$0, %1, %k0|%k0, %k1, 0}";

     default:
       gcc_unreachable ();

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-26 12:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #8)
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 68606e57e60..a2ebaa5ac63 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -2528,12 +2528,12 @@
>      case TYPE_SSELOG:
>        if (SSE_REG_P (operands[0]))
>         return MEM_P (operands[1])
> -         ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
> -         : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
> +         ? "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}"
> +         : "%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}";
>        else
>         return MEM_P (operands[1])
> -         ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
> -         : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
> +         ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
> +         : "%vpextrw\t{$0, %1, %k0|%k0, %k1, 0}";
>  
>      case TYPE_MSKLOG:
>        if (operands[1] == const0_rtx)
> @@ -3788,12 +3788,12 @@
>      case TYPE_SSELOG:
>        if (SSE_REG_P (operands[0]))
>         return MEM_P (operands[1])
> -              ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
> -              : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
> +              ? "%vpinsrw\t{$0, %1, %d0|%d0, %1, 0}"
> +              : "%vpinsrw\t{$0, %k1, %d0|%d0, %k1, 0}";
>        else
>         return MEM_P (operands[1])
> -              ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
> -              : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
> +              ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
> +              : "%vpextrw\t{$0, %1, %k0|%k0, %k1, 0}";
>  
>      default:
>        gcc_unreachable ();

Yes, I'm testing

modified   gcc/config/i386/i386.c
@@ -19240,7 +19240,7 @@ ix86_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
     }

   /* Require movement to gpr, and then store to memory.  */
-  if (mode == HFmode
+  if ((mode == HFmode || mode == HImode)
       && !TARGET_SSE4_1
       && SSE_CLASS_P (rclass)
       && !in_p && MEM_P (x))
modified   gcc/config/i386/i386.md
@@ -2528,12 +2528,12 @@ (define_insn "*movhi_internal"
     case TYPE_SSELOG:
       if (SSE_REG_P (operands[0]))
        return MEM_P (operands[1])
-         ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
-         : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
+         ? "%vpinsrw\t{$0, %1, %0|%0, %1, 0}"
+         : "%vpinsrw\t{$0, %k1, %0|%0, %k1, 0}";
       else
-       return MEM_P (operands[1])
-         ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
-         : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";
+       return MEM_P (operands[0])
+         ? "%vpextrw\t{$0, %1, %0|%0, %1, 0}"
+         : "%vpextrw\t{$0, %1, %k0|%k0, %k1, 0}";

     case TYPE_MSKLOG:
       if (operands[1] == const0_rtx)
@@ -2557,12 +2557,14 @@ (define_insn "*movhi_internal"
               ]
               (const_string "*")))
    (set (attr "type")
-     (cond [(eq_attr "alternative" "9,10,11,12,13")
+     (cond [(eq_attr "alternative" "9,10,12,13")
              (if_then_else (match_test "TARGET_AVX512FP16")
                (const_string "ssemov")
                (const_string "sselog"))
            (eq_attr "alternative" "4,5,6,7")
              (const_string "mskmov")
+           (eq_attr "alternative" "11")
+             (const_string "ssemov")
            (eq_attr "alternative" "8")
              (const_string "msklog")
            (match_test "optimize_function_for_size_p (cfun)")
@@ -2579,15 +2581,33 @@ (define_insn "*movhi_internal"
              (const_string "imovx")
           ]
           (const_string "imov")))
+    (set (attr "memory")
+        (cond [(eq_attr "alternative" "9,10")
+                 (const_string "none")
+               (eq_attr "alternative" "12")
+                 (const_string "load")
+               (eq_attr "alternative" "13")
+                 (const_string "store")
+               ]
+               (const_string "*")))
     (set (attr "prefix")
-      (if_then_else (eq_attr "alternative" "4,5,6,7,8")
-       (const_string "vex")
-       (const_string "orig")))
+        (cond [(eq_attr "alternative" "9,10,11,12,13")
+                 (const_string "maybe_evex")
+               (eq_attr "alternative" "4,5,6,7,8")
+                 (const_string "vex")
+              ]
+              (const_string "orig")))
     (set (attr "mode")
       (cond [(eq_attr "type" "imovx")
               (const_string "SI")
+            (eq_attr "alternative" "9,10,12,13")
+              (if_then_else (match_test "TARGET_AVX512FP16")
+                (const_string "HI")
+                (const_string "TI"))
             (eq_attr "alternative" "11")
-              (const_string "HF")
+              (if_then_else (match_test "TARGET_AVX512FP16")
+                (const_string "HF")
+                (const_string "SF"))
             (and (eq_attr "alternative" "1,2")
                  (match_operand:HI 1 "aligned_operand"))
               (const_string "SI")
@@ -3791,7 +3811,7 @@ (define_insn "*movhf_internal"
               ? "pinsrw\t{$0, %1, %0|%0, %1, 0}"
               : "pinsrw\t{$0, %k1, %0|%0, %k1, 0}";
       else
-       return MEM_P (operands[1])
+       return MEM_P (operands[0])
               ? "pextrw\t{$0, %1, %0|%0, %1, 0}"
               : "pextrw\t{$0, %1, %k0|%k0, %k1, 0}";

modified   gcc/config/i386/sse.md
@@ -11230,9 +11230,9 @@ (define_insn "*vec_extracthf"
   switch (which_alternative)
     {
     case 0:
-      return "vpextrw\t{%2, %1, %k0|%k0, %1, %2}";
+      return "%vpextrw\t{%2, %1, %k0|%k0, %1, %2}";
     case 1:
-      return "vpextrw\t{%2, %1, %0|%0, %1, %2}";
+      return "%vpextrw\t{%2, %1, %0|%0, %1, %2}";

     case 2:
       operands[2] = GEN_INT (INTVAL (operands[2]) * 2);
@@ -11245,7 +11245,7 @@ (define_insn "*vec_extracthf"
       gcc_unreachable ();
    }
 }
-  [(set_attr "isa" "*,*,noavx,avx")
+  [(set_attr "isa" "*,sse4,noavx,avx")
    (set_attr "type" "sselog1,sselog1,sseishft1,sseishft1")
    (set_attr "prefix" "maybe_evex")
    (set_attr "mode" "TI")])

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 13:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #7)
> compiles with unpatched gcc -O2 -mf16c to:
> 
> 	vmovss  %xmm0, %xmm0, %xmm2     # 27    [c=4 l=4]  *movhf_internal/3
>         pextrw  $0, %xmm1, -4(%rsp)     # 28    [c=4 l=6]  *movhf_internal/5
>         vpxor   %xmm0, %xmm0, %xmm0     # 7     [c=4 l=4]  movv8hf_internal/0
>         vpxor   %xmm1, %xmm1, %xmm1     # 11    [c=4 l=4]  movv8hf_internal/0
>         pextrw  $0, %xmm2, -2(%rsp)     # 30    [c=4 l=6]  *movhf_internal/5
>         vpinsrw $0, -4(%rsp), %xmm1, %xmm1      # 12    [c=4 l=8]  sse4_1_pinsrph/3
>         vpinsrw $0, -2(%rsp), %xmm0, %xmm0      # 8     [c=4 l=8]  sse4_1_pinsrph/3
>         vcvtph2ps       %xmm1, %xmm1    # 13    [c=4 l=4]  vcvtph2ps
>         vcvtph2ps       %xmm0, %xmm0    # 9     [c=4 l=4]  vcvtph2ps
>         vaddss  %xmm1, %xmm0, %xmm0     # 15    [c=12 l=4]  *fop_sf_comm/2
>         vinsertps       $0xe, %xmm0, %xmm0, %xmm0       # 17    [c=4 l=4]  vec_setv4sf_0/2
>         vcvtps2ph       $4, %xmm0, %xmm0        # 18    [c=4 l=4]  *vcvtps2ph
>         ret             # 35    [c=0 l=1]  simple_return_internal

Just noticed that for some reason two VPXORs are emitted. One should be enough
for both VPINSRW insns.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-26 13:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #11 from Hongtao.liu <crazylht at gmail dot com> ---

> The above dumps show an inconsistency for PEXTRW (it should be VPEXTRW) and
> also raise the question of why unpatched gcc prefers a memory temp instead of
> a GPR temp for PEXTRW/PINSRW.
> 
Because the RA thought memory was needed to move between GPR and SSE registers.
modified   gcc/config/i386/i386.c
@@ -19438,7 +19438,7 @@ inline_secondary_memory_needed (machine_mode mode, reg_class_t class1,
        return true;

       /* In addition to SImode moves, AVX512FP16 also enables HImode moves.  */
-      int minsize = GET_MODE_SIZE (TARGET_AVX512FP16 ? HImode : SImode);
+      int minsize = GET_MODE_SIZE (TARGET_SSE2 ? HImode : SImode);

       if (msize < minsize)
> The patch improves HI/HFmode inserts to element 0 in general.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-26 13:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #12 from Hongtao.liu <crazylht at gmail dot com> ---

> 
> Just noticed that for some reason two VPXORs are emitted. One should be
> enough for both VPINSRW insns.

With the new alternative in your attached patch (the vpblendw one), the RA can
reuse the zero register; without it, the upper bits of xmm0/xmm1 need to be
explicitly cleared.
vpblendw        $1, %xmm1, %xmm2, %xmm1 # 14    [c=4 l=6]  *vec_setv8hf_0/8

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 13:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #13 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #12)
> > 
> > Just noticed that for some reason two VPXORs are emitted. One should be
> > enough for both VPINSRW insns.
> 
> With the new alternative in your attached patch (the vpblendw one), the RA
> can reuse the zero register; without it, the upper bits of xmm0/xmm1 need to
> be explicitly cleared.
> vpblendw        $1, %xmm1, %xmm2, %xmm1 # 14    [c=4 l=6]  *vec_setv8hf_0/8

True, but I'd expect some post-reload(?) pass to propagate zeros and remove
redundant initializations.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-26 15:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #14 from Hongtao.liu <crazylht at gmail dot com> ---

(In reply to Uroš Bizjak from comment #13)
> (In reply to Hongtao.liu from comment #12)
> > > 
> > > Just noticed that for some reason two VPXORs are emitted. One should be
> > > enough for both VPINSRW insns.
> > 
> > With the new alternative in your attached patch (the vpblendw one), the RA
> > can reuse the zero register; without it, the upper bits of xmm0/xmm1 need
> > to be explicitly cleared.
> > vpblendw        $1, %xmm1, %xmm2, %xmm1 # 14    [c=4 l=6]  *vec_setv8hf_0/8
> 
> True, but I'd expect some post-reload(?) pass to propagate zeros and remove
> redundant initializations.

On the other hand, if we do not use expand_vector_set (which treats the zero
register as both input and output) but instead emit_insn
(gen_sse4_1_pinsrph (...)) with a new pseudo register as the destination, the
redundant initialization can be optimized away by fwprop1.

        pextrw  $0, %xmm1, %eax
        pextrw  $0, %xmm0, %edx
        vpxor   %xmm1, %xmm1, %xmm1
        vpinsrw $0, %edx, %xmm1, %xmm0
        vpinsrw $0, %eax, %xmm1, %xmm1
        vcvtph2ps       %xmm1, %xmm1
        vcvtph2ps       %xmm0, %xmm0
        vaddss  %xmm1, %xmm0, %xmm0
        vinsertps       $0xe, %xmm0, %xmm0, %xmm0
        vcvtps2ph       $4, %xmm0, %xmm0

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 16:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #15 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #14)
> (In reply to Uroš Bizjak from comment #13)
> > (In reply to Hongtao.liu from comment #12)
> > > > 
> > > > Just noticed that for some reason two VPXORs are emitted. One should be
> > > > enough for both VPINSRW insns.
> > > 
> > > With the new alternative in your attached patch (the vpblendw one), the
> > > RA can reuse the zero register; without it, the upper bits of xmm0/xmm1
> > > need to be explicitly cleared.
> > > vpblendw        $1, %xmm1, %xmm2, %xmm1 # 14    [c=4 l=6]  *vec_setv8hf_0/8
> > 
> > True, but I'd expect some post-reload(?) pass to propagate zeros and remove
> > redundant initializations.
> 
> On the other hand, if we do not use expand_vector_set (which treats the zero
> register as both input and output), but instead
> emit_insn(gen_sse4_1_pinsrph(...)) with a new pseudo register as dest, the
> redundant initialization could be optimized away by fwprop1.
> 
>         pextrw  $0, %xmm1, %eax
>         pextrw  $0, %xmm0, %edx
>         vpxor   %xmm1, %xmm1, %xmm1
>         vpinsrw $0, %edx, %xmm1, %xmm0
>         vpinsrw $0, %eax, %xmm1, %xmm1
>         vcvtph2ps       %xmm1, %xmm1
>         vcvtph2ps       %xmm0, %xmm0
>         vaddss  %xmm1, %xmm0, %xmm0
>         vinsertps       $0xe, %xmm0, %xmm0, %xmm0
>         vcvtps2ph       $4, %xmm0, %xmm0

Then we will lose the optimization in ix86_expand_vector_set:

    case E_V8HFmode:
      if (TARGET_AVX2)
        {
          mmode = SImode;
          gen_blendm = gen_sse4_1_pblendph;
          blendm_const = true;
        }
      else
        use_vec_merge = true;
      break;

Maybe we should simply copy "target" to a new pseudo here:

do_vec_merge:
      tmp = gen_rtx_VEC_DUPLICATE (mode, val);
      tmp = gen_rtx_VEC_MERGE (mode, tmp, target,
                               GEN_INT (HOST_WIDE_INT_1U << elt));
      emit_insn (gen_rtx_SET (target, tmp));

OTOH, if recycling "target" inhibits FWprop, we should perhaps copy "target" to
a new pseudo at the beginning of the expand_vector_set?
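
For reference, the lane selection performed by the VEC_DUPLICATE/VEC_MERGE pair
above can be modelled in plain C. This is only a sketch: model_vec_set and NELT
are hypothetical names, not GCC code, and the destination is passed separately
from "target" to mirror the "new pseudo" idea.

```c
#include <assert.h>
#include <stdint.h>

enum { NELT = 8 };  /* V8HF/V8HI: eight 16-bit lanes */

/* Hypothetical model of
     (vec_merge (vec_duplicate val) target (1 << elt)):
   lane i is taken from the duplicated scalar when bit i of the mask is
   set, otherwise from the old contents of target.  */
static void
model_vec_set (uint16_t dest[NELT], const uint16_t target[NELT],
               uint16_t val, unsigned elt)
{
  unsigned mask = 1u << elt;

  for (unsigned i = 0; i < NELT; i++)
    dest[i] = ((mask >> i) & 1) ? val : target[i];
}
```

With elt = 4 the mask is 16, which matches the immediate of a
"vpblendw $16" insert into element 4.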

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-26 16:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #16 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #15)
> (In reply to Hongtao.liu from comment #14)
> > (In reply to Uroš Bizjak from comment #13)
> > > (In reply to Hongtao.liu from comment #12)
> > > > > 
> > > > > Just noticed that for some reason two VPXORs are emitted. One should be
> > > > > enough for both VPINSRW insns.
> > > > 
> > > > With the new alternative in your attached patch (the vpblendw one), RA
> > > > could reuse the zero register; w/o that, the upper bits of xmm0/xmm1
> > > > need to be explicitly cleared.
> > > > vpblendw        $1, %xmm1, %xmm2, %xmm1 # 14    [c=4 l=6]  *vec_setv8hf_0/8
> > > 
> > > True, but I'd expect some post-reload(?) pass to propagate zeros and remove
> > > redundant initializations.
> > 
> > On the other hand, if we do not use expand_vector_set (which treats the
> > zero register as both input and output), but instead
> > emit_insn(gen_sse4_1_pinsrph(...)) with a new pseudo register as dest, the
> > redundant initialization could be optimized away by fwprop1.
> > 
> >         pextrw  $0, %xmm1, %eax
> >         pextrw  $0, %xmm0, %edx
> >         vpxor   %xmm1, %xmm1, %xmm1
> >         vpinsrw $0, %edx, %xmm1, %xmm0
> >         vpinsrw $0, %eax, %xmm1, %xmm1
> >         vcvtph2ps       %xmm1, %xmm1
> >         vcvtph2ps       %xmm0, %xmm0
> >         vaddss  %xmm1, %xmm0, %xmm0
> >         vinsertps       $0xe, %xmm0, %xmm0, %xmm0
> >         vcvtps2ph       $4, %xmm0, %xmm0
> 
> Then we will lose the optimization in ix86_expand_vector_set:
> 
>     case E_V8HFmode:
>       if (TARGET_AVX2)
> 	{
> 	  mmode = SImode;
> 	  gen_blendm = gen_sse4_1_pblendph;
> 	  blendm_const = true;
> 	}
>       else
> 	use_vec_merge = true;
>       break;
> 
> Maybe we should simply copy "target" to a new pseudo here:
> 
> do_vec_merge:
>       tmp = gen_rtx_VEC_DUPLICATE (mode, val);
>       tmp = gen_rtx_VEC_MERGE (mode, tmp, target,
> 			       GEN_INT (HOST_WIDE_INT_1U << elt));
>       emit_insn (gen_rtx_SET (target, tmp));
> 
> OTOH, if recycling "target" inhibits FWprop, we should perhaps copy "target"
> to a new pseudo at the beginning of the expand_vector_set?

ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
target as both input and output, it seems we can't create a new target for
that.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 16:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #17 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #16)

> ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
> target as both input and output, it seems we can't create a new target for
> that.

OK, let's try to optimize it with gen_pinsr, as you proposed.

(It looks like the add-on patch from Comment #6 will generate VPBLEND in this
case, too.)

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-26 16:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #18 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #17)
> (In reply to Hongtao.liu from comment #16)
> 
> > ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
> > target as both input and output, it seems we can't create a new target for
> > that.
> 
> OK, let's try to optimize it with gen_pinsr, as you proposed.
> 
> (It looks like the add-on patch from Comment #6 will generate VPBLEND in
> this case, too.)

We should manually generate vinsertps from truncsfhf2, too. There is no point
in calling ix86_expand_vector_set if we already know the instruction. It will
use the vec_set<VI4F_128:mode>_0 insn pattern, which has quite a few
alternatives.
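
As an aside, the "vinsertps $0xe" used before vcvtps2ph keeps element 0 and
zeroes elements 1..3. A plain C model of the immediate decoding (count_s in
bits 7:6, count_d in bits 5:4, zero mask in bits 3:0) illustrates this. The
names v4sf and model_insertps are hypothetical and only serve this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { float f[4]; } v4sf;  /* illustrative 128-bit vector */

/* Hypothetical model of (V)INSERTPS with a register source:
     imm8[7:6]  lane of src2 to read,
     imm8[5:4]  lane of the result to overwrite,
     imm8[3:0]  mask of result lanes that are forced to zero.  */
static v4sf
model_insertps (v4sf src1, v4sf src2, uint8_t imm8)
{
  unsigned count_s = (imm8 >> 6) & 3;
  unsigned count_d = (imm8 >> 4) & 3;
  unsigned zmask = imm8 & 0xf;
  v4sf r = src1;

  r.f[count_d] = src2.f[count_s];
  for (unsigned i = 0; i < 4; i++)
    if ((zmask >> i) & 1)
      r.f[i] = 0.0f;
  return r;
}
```

For imm8 = 0x0e both lane selectors are 0 and the zero mask is 0b1110, so
with the same register as both sources the result is {x[0], 0, 0, 0}.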

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-29  2:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #19 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #17)
> (In reply to Hongtao.liu from comment #16)
> 
> > ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
> > target as both input and output, it seems we can't create a new target for
> > that.
> 
> OK, let's try to optimize it with gen_pinsr, as you proposed.
> 
> (It looks like the add-on patch from Comment #6 will generate VPBLEND in
> this case, too.)

I think your attached patch is a separate optimization: the newly added
alternatives, which generate VPBLEND, extend the pattern to accept an SSE
register for the inserted value; currently we only have "rm".

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: crazylht at gmail dot com @ 2021-11-29  3:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #20 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Uroš Bizjak from comment #18)
> (In reply to Uroš Bizjak from comment #17)
> > (In reply to Hongtao.liu from comment #16)
> > 
> > > ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
> > > target as both input and output, it seems we can't create a new target for
> > > that.
> > 
> > OK, let's try to optimize it with gen_pinsr, as you proposed.
> > 
> > (It looks like the add-on patch from Comment #6 will generate VPBLEND in
> > this case, too.)
> 
> We should manually generate vinsertps from truncsfhf2, too. There is no
> point in calling ix86_expand_vector_set if we already know the instruction.
> It will use the vec_set<VI4F_128:mode>_0 insn pattern, which has quite a few
> alternatives.

For AVX2, your attached patch will optimize

        vpxor   %xmm2, %xmm2, %xmm2
-       vpbroadcastw    %xmm1, %xmm1
-       vpbroadcastw    %xmm0, %xmm0
        vpblendw        $1, %xmm0, %xmm2, %xmm0
        vpblendw        $1, %xmm1, %xmm2, %xmm2
        vcvtph2ps       %xmm2, %xmm2

since the upper bits of xmm1/xmm0 are not selected by vpblendw.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-11-29  8:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #21 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao.liu from comment #20)
> (In reply to Uroš Bizjak from comment #18)
> > (In reply to Uroš Bizjak from comment #17)
> > > (In reply to Hongtao.liu from comment #16)
> > > 
> > > > ix86_expand_vector_set is mainly used by vec_set_optab which exactly takes
> > > > target as both input and output, it seems we can't create a new target for
> > > > that.
> > > 
> > > OK, let's try to optimize it with gen_pinsr, as you proposed.
> > > 
> > > (It looks like the add-on patch from Comment #6 will generate VPBLEND in
> > > this case, too.)
> > 
> > We should manually generate vinsertps from truncsfhf2, too. There is no
> > point in calling ix86_expand_vector_set if we already know the instruction.
> > It will use the vec_set<VI4F_128:mode>_0 insn pattern, which has quite a
> > few alternatives.
> 
> For AVX2, your attached patch will optimize
> 
>         vpxor   %xmm2, %xmm2, %xmm2
> -       vpbroadcastw    %xmm1, %xmm1
> -       vpbroadcastw    %xmm0, %xmm0
>         vpblendw        $1, %xmm0, %xmm2, %xmm0
>         vpblendw        $1, %xmm1, %xmm2, %xmm2
>         vcvtph2ps       %xmm2, %xmm2
> 
> since the upper bits of xmm1/xmm0 are not selected by vpblendw.

True, the blending of only element 0 does not need broadcast. I will prepare a
formal patch submission once your changes are committed.
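
The argument can be checked with a small lane-level model in C. This is a
sketch under the usual reading of the instructions (PBLENDW takes lane i from
its second source when bit i of the immediate is set, PBROADCASTW replicates
lane 0). The type v8hi and the model_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t w[8]; } v8hi;  /* illustrative 8 x 16-bit vector */

/* Lane i of the result comes from b when bit i of imm8 is set, else from a.  */
static v8hi
model_pblendw (v8hi a, v8hi b, uint8_t imm8)
{
  v8hi r;

  for (unsigned i = 0; i < 8; i++)
    r.w[i] = ((imm8 >> i) & 1) ? b.w[i] : a.w[i];
  return r;
}

/* Replicate lane 0 into all eight lanes.  */
static v8hi
model_pbroadcastw (v8hi x)
{
  v8hi r;

  for (unsigned i = 0; i < 8; i++)
    r.w[i] = x.w[0];
  return r;
}

/* Return nonzero when both vectors hold the same lanes.  */
static int
v8hi_equal (v8hi x, v8hi y)
{
  return memcmp (&x, &y, sizeof x) == 0;
}
```

With immediate 1 only lane 0 of the second source is read, so blending the
broadcast value and blending the raw register give the same result; for any
other element the broadcast is still required.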

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: cvs-commit at gcc dot gnu.org @ 2021-11-29  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #22 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:9519b694afbf9a35c36cf9f14d35d1c0e9e8cacc

commit r12-5573-g9519b694afbf9a35c36cf9f14d35d1c0e9e8cacc
Author: liuhongt <hongtao.liu@intel.com>
Date:   Fri Nov 26 23:24:20 2021 +0800

    Fix regression introduced by r12-5536.

    There are several failures:
    1. unsupported instruction `pextrw` for "pextrw $0, %xmm31, 16(%rax)";
       %vpextrw should be used in output templates.
    2. ICE in get_attr_memory for movhi_internal since some alternatives
       are marked as TYPE_SSELOG; use TYPE_SSELOG1 instead.

    Also this patch fixes a typo and some latent bugs which are related to
    moving HImode from/to sse register w/o TARGET_AVX512FP16.

    gcc/ChangeLog:

            PR target/102811
            PR target/103463
            * config/i386/i386.c (ix86_secondary_reload): Without
            TARGET_SSE4_1, General register is needed to move HImode from
            sse register to memory.
            * config/i386/sse.md (*vec_extrachf): Use %vpextrw instead of
            pextrw in output templates.
            * config/i386/i386.md (movhi_internal): Ditto, also fix typo of
            MEM_P (operands[1]) and adjust mode/prefix/type attribute for
            alternatives related to sse register.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: cvs-commit at gcc dot gnu.org @ 2021-11-29  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #23 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:11d0a2af33910c6d243e7265fb7ea04d2bc89b25

commit r12-5574-g11d0a2af33910c6d243e7265fb7ea04d2bc89b25
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon Nov 29 10:01:42 2021 +0800

    Optimize _Float16 usage for non AVX512FP16.

    1. No memory is needed to move HI/HFmode between GPR and SSE registers
    under TARGET_SSE2 and above, pinsrw/pextrw are used for them w/o
    AVX512FP16.
    2. Use gen_sse2_pinsrph/gen_vec_setv4sf_0 to replace
    ix86_expand_vector_set in extendhfsf2/truncsfhf2 so that redundant
    initialization could be eliminated.

    gcc/ChangeLog:

            PR target/102811
            * config/i386/i386.c (inline_secondary_memory_needed): HImode
            move between GPR and SSE registers is supported under
            TARGET_SSE2 and above.
            * config/i386/i386.md (extendhfsf2): Optimize expander.
            (truncsfhf2): Ditto.
            * config/i386/sse.md (sse2p4_1): Adjust attr for V8HFmode to
            align with V8HImode.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr102811-2.c: New test.
            * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: Add new
            scan-assembler-times.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: cvs-commit at gcc dot gnu.org @ 2021-11-29 21:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #24 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:ca5667e867252db3c8642ee90f55427149cd92b6

commit r12-5584-gca5667e867252db3c8642ee90f55427149cd92b6
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Mon Nov 29 22:16:12 2021 +0100

    i386: Fix and improve movhi_internal and movhf_internal some more.

    An (*v,C) alternative can be added to movhi_internal to directly load
    HImode constant 0 to xmm register. Also, V4SFmode moves can be used
    for xmm->xmm moves instead of TImode moves when optimizing for size.
    Fix invalid %vpinsrw insn template, which needs to duplicate %xmm
    register for AVX targets.

    Optimize GPR moves in movhf_internal in the same way as in movhi_internal.
    Fix pinsrw and pextrw templates for AVX targets. Use sselog1
    instead of sselog type.  Also, handle TARGET_SSE_PARTIAL_REG_DEPENDENCY
    and TARGET_SSE_SPLIT_REGS targets.

    2021-11-29  Uroš Bizjak  <ubizjak@gmail.com>

    gcc/ChangeLog:

            PR target/102811
            * config/i386/i386.md (*movhi_internal): Introduce (*v,C)
            alternative.  Do not allocate non-GPR registers.  Optimize
            xmm->xmm moves when optimizing for size.  Fix vpinsrw insn
            template.
            (*movhf_internal): Fix pinsrw and pextrw insn templates for
            AVX targets.  Use sselog1 type instead of sselog.  Optimize GPR
            moves.  Optimize xmm->xmm moves for
            TARGET_SSE_PARTIAL_REG_DEPENDENCY and TARGET_SSE_SPLIT_REGS
            targets.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: cvs-commit at gcc dot gnu.org @ 2021-12-01 22:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #25 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:7eb961d83b0eda53aeb1cfaacdc367e1952de613

commit r12-5700-g7eb961d83b0eda53aeb1cfaacdc367e1952de613
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Wed Dec 1 23:01:09 2021 +0100

    i386: Improve V8HI and V8HF inserts [PR102811]

    Introduce vec_set_0 pattern for V8HI and V8HF modes to implement scalar
    element 0 inserts from a GP register, SSE register or memory.  Also
    add V8HI and V8HF AVX2 (x,x,x) alternative to PINSR insn pattern, which is
    split after reload to a sequence of PBROADCASTW and PBLENDW.

    The V8HF inserts from memory improve from:

    -       vpbroadcastw    4(%esp), %xmm1
    -       vpblendw        $16, %xmm1, %xmm0, %xmm0
    +       vpinsrw $4, 4(%esp), %xmm0, %xmm0

    and V8HF inserts from SSE register to element 0 improve from:

            vpxor   %xmm2, %xmm2, %xmm2
    -       vpbroadcastw    %xmm0, %xmm0
            vpblendw        $1, %xmm0, %xmm2, %xmm0

    Based on the above improvements, the register allocator is able to
    determine the optimal instruction (or instruction sequence) based on the
    register set of the input value, so there is no need to manually expand
    V8HI and V8HF inserts to the sequence of VEC_DUPLICATE and VEC_MERGE RTXes.

    2021-12-01  Uroš Bizjak  <ubizjak@gmail.com>

    gcc/ChangeLog:

            PR target/102811
            * config/i386/sse.md (VI2F): Remove mode iterator.
            (VI2F_256_512): New mode iterator.
            (vec_set<V8_128:mode>_0): New insn pattern.
            (vec_set<VI2F_256_512:mode>_0>): Rename from
            vec_set<VI2F:mode>mode.
            Use VI2F_256_512 mode iterator instead of VI2F.
            (*axv512fp16_movsh): Remove.
            (<sse2p4_1>_pinsr<ssemodesuffix>): Add (x,x,x) AVX2 alternative.
            Do not disable V8HF mode insn on AVX2 targets.
            (pinsrw -> pbroadcast + pblendw peephole2): New peephole.
            (pinsrw -> pbroadcast + pblendw splitter): New post-reload
            splitter.
            * config/i386/i386.md (extendhfsf): Call gen_vec_setv8hf_0.
            * config/i386/i386-expand.c (ix86_expand_vector_set)
            <case E_V8HFmode>: Use vec_merge path for TARGET_AVX2.

    gcc/testsuite/ChangeLog:

            PR target/102881
            * gcc.target/i386/pr102811-1.c: New test.
            * gcc.target/i386/avx512fp16-1c.c (dg-final): Update
            scan-assembler-times scan strings for ia32 targets.
            * gcc.target/i386/pr102327-1.c (dg-final): Ditto.
            * gcc.target/i386/pr102811.c: Rename from ...
            * gcc.target/i386/avx512vl-vcvtps2ph-pr102811.c: ... this.

* [Bug target/102811] vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c
From: ubizjak at gmail dot com @ 2021-12-01 22:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102811

--- Comment #26 from Uroš Bizjak <ubizjak at gmail dot com> ---
The testcase now compiles with -O2 -mf16c to:

        vpxor   %xmm2, %xmm2, %xmm2
        vpblendw        $1, %xmm0, %xmm2, %xmm0
        vpblendw        $1, %xmm1, %xmm2, %xmm1
        vcvtph2ps       %xmm1, %xmm1
        vcvtph2ps       %xmm0, %xmm0
        vaddss  %xmm1, %xmm0, %xmm0
        vinsertps       $0xe, %xmm0, %xmm0, %xmm0
        vcvtps2ph       $4, %xmm0, %xmm0
        ret

for 64-bit targets and:

        vpxor   %xmm2, %xmm2, %xmm2
        vpinsrw $0, 4(%esp), %xmm2, %xmm0
        vpinsrw $0, 8(%esp), %xmm2, %xmm1
        vcvtph2ps       %xmm0, %xmm0
        vcvtph2ps       %xmm1, %xmm1
        vaddss  %xmm1, %xmm0, %xmm0
        vinsertps       $0xe, %xmm0, %xmm0, %xmm0
        vcvtps2ph       $4, %xmm0, %xmm0
        ret

for 32-bit targets.
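
For readers without F16C hardware at hand, the value that vcvtph2ps produces
for element 0 (and that the __extendhfsf2 libcall computed before) can be
modelled portably in C. This is a sketch of the IEEE binary16 to binary32
widening only. model_half_to_float is a hypothetical name, and NaN payloads
are passed through unchanged here, whereas the hardware instruction also
quiets signaling NaNs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable model: widen one IEEE binary16 bit pattern to
   float.  The widening is exact, so no rounding mode is involved.  */
static float
model_half_to_float (uint16_t h)
{
  uint32_t sign = (uint32_t) (h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1fu;
  uint32_t frac = h & 0x3ffu;
  uint32_t bits;
  float f;

  if (exp == 0x1f)            /* Inf or NaN: all-ones exponent */
    bits = sign | 0x7f800000u | (frac << 13);
  else if (exp != 0)          /* normal: rebias exponent from 15 to 127 */
    bits = sign | ((exp + 112) << 23) | (frac << 13);
  else if (frac == 0)         /* signed zero */
    bits = sign;
  else
    {
      /* subnormal half: normalize, it is a normal float */
      unsigned shift = 0;

      while (!(frac & 0x400u))
        {
          frac <<= 1;
          shift++;
        }
      bits = sign | ((113u - shift) << 23) | ((frac & 0x3ffu) << 13);
    }

  memcpy (&f, &bits, sizeof f);
  return f;
}
```

For example, 0x3c00 widens to 1.0f and the smallest subnormal 0x0001 widens
to 2^-24.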

Fixed.
