public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/97249] New: Missing vec_select and subreg optimization
@ 2020-09-30  6:30 crazylht at gmail dot com
  2020-09-30  6:38 ` [Bug rtl-optimization/97249] " crazylht at gmail dot com
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-09-30  6:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249

            Bug ID: 97249
           Summary: Missing vec_select and subreg optimization
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
                CC: hjl.tools at gmail dot com, wwwhhhyyy333 at gmail dot com
  Target Milestone: ---
              Host: x86_64-pc-linux-gnu

cat test.c
---
void
foo (unsigned char* p1, unsigned char* p2, short* __restrict p3)
{
    for (int i = 0 ; i != 8; i++)
     p3[i] = p1[i] + p2[i];
     return;
}
---

gcc11 -Ofast -mavx2 test.c produces:

---
foo:
.LFB0:
        .cfi_startproc
        vmovq   (%rdi), %xmm0
        vmovq   (%rsi), %xmm1
        vpmovzxbw       %xmm0, %xmm0
        vpmovzxbw       %xmm1, %xmm1
        vpaddw  %xmm1, %xmm0, %xmm0
        vmovdqu %xmm0, (%rdx)
        ret
        .cfi_endproc
---

The memory operand isn't propagated into *vpmovzxbw* because RTL didn't simplify
---
(insn 9 8 10 2 (set (reg:V8HI 92 [ vect__33.6 ])
        (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V8QI 91 [ vect__40.5 ]) 0)
                (parallel [
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                        (const_int 2 [0x2])
                        (const_int 3 [0x3])
                        (const_int 4 [0x4])
                        (const_int 5 [0x5])
                        (const_int 6 [0x6])
                        (const_int 7 [0x7])
                    ])))) "test.c":5:16 4638 {sse4_1_zero_extendv8qiv8hi2}
     (expr_list:REG_DEAD (reg:V8QI 91 [ vect__40.5 ])
        (nil)))
--- 

to 

---
(insn 9 8 10 2 (set (reg:V8HI 92 [ vect__33.6 ])
        (zero_extend:V8HI (reg:V8QI 91 [ vect__40.5 ]))) "test.c":5:16 4638 {sse4_1_zero_extendv8qiv8hi2}
     (expr_list:REG_DEAD (reg:V8QI 91 [ vect__40.5 ])
        (nil)))
---

Similar for other vector modes.
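
To make the requested simplification concrete, here is a minimal C model of
the two RTL forms above. It is only a sketch: vectors are modeled as plain
byte arrays, the helper names (paradoxical_subreg, vec_select8, zero_extend8,
forms_agree) are hypothetical and not GCC functions, and the undefined upper
lanes of the paradoxical subreg are filled with an arbitrary marker value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NARROW = 8, WIDE = 16 };

/* Model of (subreg:V16QI (reg:V8QI x) 0): the low 8 lanes hold x and the
   upper 8 lanes are unspecified (an arbitrary marker in this sketch).  */
static void paradoxical_subreg(const uint8_t x[NARROW], uint8_t out[WIDE])
{
    memset(out, 0xAA, WIDE);
    memcpy(out, x, NARROW);
}

/* Model of (vec_select:V8QI v (parallel [sel...])).  */
static void vec_select8(const uint8_t v[WIDE], const int sel[NARROW],
                        uint8_t out[NARROW])
{
    for (int i = 0; i < NARROW; i++) {
        assert(sel[i] >= 0 && sel[i] < WIDE);
        out[i] = v[sel[i]];
    }
}

/* Model of (zero_extend:V8HI v): widen each byte lane to 16 bits.  */
static void zero_extend8(const uint8_t v[NARROW], uint16_t out[NARROW])
{
    for (int i = 0; i < NARROW; i++)
        out[i] = v[i];
}

/* Returns 1 when zero_extend (vec_select (subreg x) [0..7]) and
   zero_extend (x) produce the same lanes.  */
static int forms_agree(const uint8_t x[NARROW])
{
    static const int low_lanes[NARROW] = { 0, 1, 2, 3, 4, 5, 6, 7 };
    uint8_t wide[WIDE], picked[NARROW];
    uint16_t before[NARROW], after[NARROW];

    paradoxical_subreg(x, wide);
    vec_select8(wide, low_lanes, picked);
    zero_extend8(picked, before);   /* form in the first insn dump */
    zero_extend8(x, after);         /* form in the second insn dump */
    return memcmp(before, after, sizeof before) == 0;
}
```

Because the selection only reads lanes 0..7, the contents of the upper lanes
never reach the result, which is why the inner V8QI value can be used
directly.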


* [Bug rtl-optimization/97249] Missing vec_select and subreg optimization
  2020-09-30  6:30 [Bug rtl-optimization/97249] New: Missing vec_select and subreg optimization crazylht at gmail dot com
@ 2020-09-30  6:38 ` crazylht at gmail dot com
  2020-09-30  6:39 ` crazylht at gmail dot com
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-09-30  6:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
For the i386 backend, maybe we can adjust the pattern:

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 934b60a288f..2bfa9635fab 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17658,12 +17658,7 @@ (define_expand "<code>v32qiv32hi2"
 (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
   [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
        (any_extend:V8HI
-         (vec_select:V8QI
-           (match_operand:V16QI 1 "register_operand" "Yr,*x,v")
-           (parallel [(const_int 0) (const_int 1)
-                      (const_int 2) (const_int 3)
-                      (const_int 4) (const_int 5)
-                      (const_int 6) (const_int 7)]))))]
+         (subreg:V8QI (match_operand:V16QI 1 "register_operand" "Yr,*x,v") 0)))]
   "TARGET_SSE4_1 && <mask_avx512bw_condition> && <mask_avx512vl_condition>"
   "%vpmov<extsuffix>bw\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
   [(set_attr "isa" "noavx,noavx,avx")


* [Bug rtl-optimization/97249] Missing vec_select and subreg optimization
  2020-09-30  6:30 [Bug rtl-optimization/97249] New: Missing vec_select and subreg optimization crazylht at gmail dot com
  2020-09-30  6:38 ` [Bug rtl-optimization/97249] " crazylht at gmail dot com
@ 2020-09-30  6:39 ` crazylht at gmail dot com
  2020-09-30  6:48 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-09-30  6:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249

--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #1)
> for i386 backend, maybe we can adjust pattern of
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 934b60a288f..2bfa9635fab 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17658,12 +17658,7 @@ (define_expand "<code>v32qiv32hi2"
>  (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
>    [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
>         (any_extend:V8HI
> -         (vec_select:V8QI
> -           (match_operand:V16QI 1 "register_operand" "Yr,*x,v")
> -           (parallel [(const_int 0) (const_int 1)
> -                      (const_int 2) (const_int 3)
> -                      (const_int 4) (const_int 5)
> -                      (const_int 6) (const_int 7)]))))]
> +         (subreg:V8HI(match_operand:V16QI 1 "register_operand" "Yr,*x,v") 0)))]
>    "TARGET_SSE4_1 && <mask_avx512bw_condition> && <mask_avx512vl_condition>"
>    "%vpmov<extsuffix>bw\t{%1, %0<mask_operand2>|%0<mask_operand2>, %1}"
>    [(set_attr "isa" "noavx,noavx,avx")

Corrected typo:
 @@ -17658,12 +17658,7 @@ (define_expand "<code>v32qiv32hi2"
  (define_insn "sse4_1_<code>v8qiv8hi2<mask_name>"
    [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
         (any_extend:V8HI
 -         (vec_select:V8QI
 -           (match_operand:V16QI 1 "register_operand" "Yr,*x,v")
 -           (parallel [(const_int 0) (const_int 1)
 -                      (const_int 2) (const_int 3)
 -                      (const_int 4) (const_int 5)
 -                      (const_int 6) (const_int 7)]))))]
 +         (subreg:V8HI (match_operand:V16QI 1 "register_operand" "Yr,*x,v") 0)))]


* [Bug rtl-optimization/97249] Missing vec_select and subreg optimization
  2020-09-30  6:30 [Bug rtl-optimization/97249] New: Missing vec_select and subreg optimization crazylht at gmail dot com
  2020-09-30  6:38 ` [Bug rtl-optimization/97249] " crazylht at gmail dot com
  2020-09-30  6:39 ` crazylht at gmail dot com
@ 2020-09-30  6:48 ` rguenth at gcc dot gnu.org
  2020-10-12  2:23 ` crazylht at gmail dot com
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-09-30  6:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-09-30
                 CC|                            |segher at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Guess you want to figure what built the (vec_select:V8QI (V16QI)) and if
it was appropriately simplified (and simplify_rtx would handle this case).
In any case the vec_select is the same as (subreg:V8QI (V16QI)).


* [Bug rtl-optimization/97249] Missing vec_select and subreg optimization
  2020-09-30  6:30 [Bug rtl-optimization/97249] New: Missing vec_select and subreg optimization crazylht at gmail dot com
                   ` (2 preceding siblings ...)
  2020-09-30  6:48 ` rguenth at gcc dot gnu.org
@ 2020-10-12  2:23 ` crazylht at gmail dot com
  2020-10-12 14:53 ` segher at gcc dot gnu.org
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-10-12  2:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #3)
> Guess you want to figure what built the (vec_select:V8QI (V16QI)) and if
> it was appropriately simplified (and simplify_rtx would handle this case).
> In any case the vec_select is the same as (subreg:V8QI (V16QI)).

For this testcase, simplify_rtx will be skipped, since the expression is handled in
---
  for (i = 0; i < GET_RTX_LENGTH (code); i++)
    switch (*format_ptr++)
      {
      case 'e':
        if (XEXP (orig, i) != NULL)
          {
            rtx result = cselib_expand_value_rtx_1 (XEXP (orig, i), evd,
                                                    max_depth - 1);
            if (!result)
              return NULL;  /* <-- returns here.  */
            if (copy)
              XEXP (copy, i) = result;
          }
        break;
---

So could we handle it in cselib_expand_value_rtx_1?
---
diff --git a/gcc/cselib.c b/gcc/cselib.c
index 53e9603868d..8882ac60f1e 100644
--- a/gcc/cselib.c
+++ b/gcc/cselib.c
@@ -1864,6 +1864,18 @@ cselib_expand_value_rtx_1 (rtx orig, struct expand_value_data *evd,
        return scopy;
       }

+    /* Handle cases like
+       (vec_select:V8QI (subreg:V16QI (value:V8QI) 0)
+                       (parallel [(const_int 0) (const_int 1)
+                       (const_int 2) (const_int 3)
+                       (const_int 4) (const_int 5)
+                       (const_int 6) (const_int 7)])),
+       it should be equal to (value:V8QI).  */
+    case VEC_SELECT:
+      {
+       
+      }
+
---


* [Bug rtl-optimization/97249] Missing vec_select and subreg optimization
  2020-09-30  6:30 [Bug rtl-optimization/97249] New: Missing vec_select and subreg optimization crazylht at gmail dot com
                   ` (3 preceding siblings ...)
  2020-10-12  2:23 ` crazylht at gmail dot com
@ 2020-10-12 14:53 ` segher at gcc dot gnu.org
  2020-10-13  1:59 ` crazylht at gmail dot com
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: segher at gcc dot gnu.org @ 2020-10-12 14:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249

--- Comment #5 from Segher Boessenkool <segher at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> Guess you want to figure what built the (vec_select:V8QI (V16QI)) and if
> it was appropriately simplified (and simplify_rtx would handle this case).
> In any case the vec_select is the same as (subreg:V8QI (V16QI)).

This case for vec_select isn't yet handled in simplify-rtx.  It
looks like it does not yet handle any cases that do not use full
vector length?  (Or, in other words, it only handles cases where
all vectors are the same length.)


* [Bug rtl-optimization/97249] Missing vec_select and subreg optimization
  2020-09-30  6:30 [Bug rtl-optimization/97249] New: Missing vec_select and subreg optimization crazylht at gmail dot com
                   ` (4 preceding siblings ...)
  2020-10-12 14:53 ` segher at gcc dot gnu.org
@ 2020-10-13  1:59 ` crazylht at gmail dot com
  2020-10-13  6:08 ` crazylht at gmail dot com
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-10-13  1:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249

--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
We already have the code below in simplify-rtx.c; it seems we can also handle
this situation there.

---
    case VEC_SELECT:
      if (!VECTOR_MODE_P (mode))
        {
          gcc_assert (VECTOR_MODE_P (GET_MODE (trueop0)));
          gcc_assert (mode == GET_MODE_INNER (GET_MODE (trueop0)));
          gcc_assert (GET_CODE (trueop1) == PARALLEL);
          gcc_assert (XVECLEN (trueop1, 0) == 1);

          /* We can't reason about selections made at runtime.  */
          if (!CONST_INT_P (XVECEXP (trueop1, 0, 0)))
            return 0;

          if (vec_duplicate_p (trueop0, &elt0))
            return elt0;

          if (GET_CODE (trueop0) == CONST_VECTOR)
            return CONST_VECTOR_ELT (trueop0, INTVAL (XVECEXP
                                                      (trueop1, 0, 0)));

          /* Extract a scalar element from a nested VEC_SELECT expression
             (with optional nested VEC_CONCAT expression).  Some targets
             (i386) extract scalar element from a vector using chain of
             nested VEC_SELECT expressions.  When input operand is a memory
             operand, this operation can be simplified to a simple scalar
             load from an offseted memory address.  */

---


* [Bug rtl-optimization/97249] Missing vec_select and subreg optimization
  2020-09-30  6:30 [Bug rtl-optimization/97249] New: Missing vec_select and subreg optimization crazylht at gmail dot com
                   ` (5 preceding siblings ...)
  2020-10-13  1:59 ` crazylht at gmail dot com
@ 2020-10-13  6:08 ` crazylht at gmail dot com
  2020-10-22  3:38 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-10-13  6:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
I'm testing
---
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 869f0d11b2e..9c397157f28 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -4170,6 +4170,33 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
                    return subop1;
                }
            }
+
+         /* For cases like
+            (vec_select:V2SI (subreg:V4SI (inner:V2SI) 0)
+                             (parallel [(const_int 0) (const_int 1)])).
+            return inner directly.  */
+         if (GET_CODE (trueop0) == SUBREG
+             && paradoxical_subreg_p (trueop0)
+             && mode == GET_MODE (XEXP (trueop0, 0))
+             && (GET_MODE_NUNITS (GET_MODE (trueop0))).is_constant (&l0)
+             && (GET_MODE_NUNITS (mode)).is_constant (&l1)
+             && l0 % l1 == 0)
+           {
+             gcc_assert (known_eq (XVECLEN (trueop1, 0), l1));
+             unsigned HOST_WIDE_INT expect = (HOST_WIDE_INT_1U << l1) - 1;
+             unsigned HOST_WIDE_INT sel = 0;
+             int i = 0;
+             for (;i != l1; i++)
+               {
+                 rtx j = XVECEXP (trueop1, 0, i);
+                 if (!CONST_INT_P (j))
+                   break;
+                 sel |= HOST_WIDE_INT_1U << UINTVAL (j);
+               }
+             /* ??? Need to simplify XEXP (trueop0, 0) here.  */
+             if (sel == expect)
+               return XEXP (trueop0, 0);
+           }
        }

       if (XVECLEN (trueop1, 0) == 1
---
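
The selector check in the patch above can be tried outside GCC with a small
standalone C sketch. The PARALLEL is modeled as a plain array of lane indices,
and both helper names are hypothetical. The first helper mirrors the bitmask
test (sel == expect); as far as can be told from the posted patch, that test
only records which indices occur, so it also accepts the low lanes in a
different order. The second helper is a stricter variant that requires the
identity order 0, 1, ..., n-1; it is an assumption of this sketch, not
something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the bitmask test: true when the n indices in sel cover exactly
   the lanes {0 .. n-1}, in any order.  Hypothetical helper.  */
static int selects_low_lanes_any_order(const unsigned *sel, size_t n)
{
    assert(sel != NULL);
    if (n == 0 || n >= 64)
        return 0;                       /* keep the shifts well defined */

    uint64_t expect = (UINT64_C(1) << n) - 1;
    uint64_t seen = 0;
    for (size_t i = 0; i < n; i++) {
        if (sel[i] >= 64)
            return 0;                   /* index cannot be in the mask */
        seen |= UINT64_C(1) << sel[i];
    }
    return seen == expect;
}

/* Stricter variant (an assumption of this sketch): true only when
   sel[i] == i for every i, i.e. the selection is the identity on the
   low n lanes.  */
static int selects_low_lanes_in_order(const unsigned *sel, size_t n)
{
    assert(sel != NULL);
    for (size_t i = 0; i < n; i++)
        if (sel[i] != i)
            return 0;
    return 1;
}
```

For the selector [1, 0] on a two-lane vector, the first helper returns true
and the second returns false, which shows the difference between the two
checks.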


* [Bug rtl-optimization/97249] Missing vec_select and subreg optimization
  2020-09-30  6:30 [Bug rtl-optimization/97249] New: Missing vec_select and subreg optimization crazylht at gmail dot com
                   ` (6 preceding siblings ...)
  2020-10-13  6:08 ` crazylht at gmail dot com
@ 2020-10-22  3:38 ` cvs-commit at gcc dot gnu.org
  2020-10-22  3:38 ` crazylht at gmail dot com
  2021-08-21 18:28 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-10-22  3:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:52e7f09698ecb5ba6d9e921ffe912d1f66158e9e

commit r11-4204-g52e7f09698ecb5ba6d9e921ffe912d1f66158e9e
Author: liuhongt <hongtao.liu@intel.com>
Date:   Tue Oct 13 15:35:29 2020 +0800

    Simplify vec_select of a subreg of X to just a vec_select of X.

    gcc/ChangeLog
            PR rtl-optimization/97249
            * simplify-rtx.c (simplify_binary_operation_1): Simplify
            vec_select of a subreg of X to a vec_select of X.

    gcc/testsuite/ChangeLog

            * gcc.target/i386/pr97249-1.c: New test.


* [Bug rtl-optimization/97249] Missing vec_select and subreg optimization
  2020-09-30  6:30 [Bug rtl-optimization/97249] New: Missing vec_select and subreg optimization crazylht at gmail dot com
                   ` (7 preceding siblings ...)
  2020-10-22  3:38 ` cvs-commit at gcc dot gnu.org
@ 2020-10-22  3:38 ` crazylht at gmail dot com
  2021-08-21 18:28 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: crazylht at gmail dot com @ 2020-10-22  3:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC 11.


* [Bug rtl-optimization/97249] Missing vec_select and subreg optimization
  2020-09-30  6:30 [Bug rtl-optimization/97249] New: Missing vec_select and subreg optimization crazylht at gmail dot com
                   ` (8 preceding siblings ...)
  2020-10-22  3:38 ` crazylht at gmail dot com
@ 2021-08-21 18:28 ` pinskia at gcc dot gnu.org
  9 siblings, 0 replies; 11+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-21 18:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97249

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0
