public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v1] RISC-V: Remove the type size restriction of vectorizer
@ 2023-10-18  1:20 pan2.li
  2023-10-18  6:34 ` Richard Biener
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: pan2.li @ 2023-10-18  1:20 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, pan2.li, yanzhang.wang, kito.cheng, hongtao.liu,
	richard.guenther

From: Pan Li <pan2.li@intel.com>

The vectoriable_call has one restriction of the size of data type.
Aka DF to DI is allowed but SF to DI isn't. You may see below message
when try to vectorize function call like lrintf.

void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf (in[i]);
}

lrintf.c:5:26: missed: couldn't vectorize loop
lrintf.c:5:26: missed: not vectorized: unsupported data-type

Then the standard name pattern like lrintmn2 cannot work for different
data type size like SF => DI. This patch would like to remove this data
type size check and unblock the standard name like lrintmn2.

Passed the x86 bootstrap and regression test already.

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_call): Remove data size
	check.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/tree-vect-stmts.cc | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index b3a56498595..326e000a71d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo,
 
       return false;
     }
-  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
-     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
-     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
-     by a pack of the two vectors into an SI vector.  We would need
-     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
-  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "mismatched vector sizes %T and %T\n",
-			 vectype_in, vectype_out);
-      return false;
-    }
 
   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
-- 
2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer
  2023-10-18  1:20 [PATCH v1] RISC-V: Remove the type size restriction of vectorizer pan2.li
@ 2023-10-18  6:34 ` Richard Biener
  2023-10-18 11:50   ` Li, Pan2
  2023-10-20  8:43   ` Li, Pan2
  2023-10-26  2:18 ` [PATCH v2] VECT: " pan2.li
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 20+ messages in thread
From: Richard Biener @ 2023-10-18  6:34 UTC (permalink / raw)
  To: pan2.li; +Cc: gcc-patches, juzhe.zhong, yanzhang.wang, kito.cheng, hongtao.liu

On Wed, Oct 18, 2023 at 3:20 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to remove this data
> type size check and unblock the standard name like lrintmn2.
>
> Passed the x86 bootstrap and regression test already.

OK.

On x86 we seem to have lrintsfdi2 but not lrintv4sfv4di2, with SLP
vectorization we could expect to see the following vectorized after
the patch (with loop vectorization you'll see us pre-select same sized
vector types)

long int x[4];
float y[4];

void foo ()
{
  x[0] = __builtin_lrintf (y[0]);
  x[1] = __builtin_lrintf (y[1]);
  x[2] = __builtin_lrintf (y[2]);
  x[3] = __builtin_lrintf (y[3]);
}


> gcc/ChangeLog:
>
>         * tree-vect-stmts.cc (vectorizable_call): Remove data size
>         check.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/tree-vect-stmts.cc | 13 -------------
>  1 file changed, 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b3a56498595..326e000a71d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                        "mismatched vector sizes %T and %T\n",
> -                        vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer
  2023-10-18  6:34 ` Richard Biener
@ 2023-10-18 11:50   ` Li, Pan2
  2023-10-20  8:43   ` Li, Pan2
  1 sibling, 0 replies; 20+ messages in thread
From: Li, Pan2 @ 2023-10-18 11:50 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang, kito.cheng, Liu, Hongtao

Thanks Richard, let's wait for a while incase there are comments from others due to not familiar with these parts.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Wednesday, October 18, 2023 2:34 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer

On Wed, Oct 18, 2023 at 3:20 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to remove this data
> type size check and unblock the standard name like lrintmn2.
>
> Passed the x86 bootstrap and regression test already.

OK.

On x86 we seem to have lrintsfdi2 but not lrintv4sfv4di2, with SLP
vectorization we could expect to see the following vectorized after
the patch (with loop vectorization you'll see us pre-select same sized
vector types)

long int x[4];
float y[4];

void foo ()
{
  x[0] = __builtin_lrintf (y[0]);
  x[1] = __builtin_lrintf (y[1]);
  x[2] = __builtin_lrintf (y[2]);
  x[3] = __builtin_lrintf (y[3]);
}


> gcc/ChangeLog:
>
>         * tree-vect-stmts.cc (vectorizable_call): Remove data size
>         check.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/tree-vect-stmts.cc | 13 -------------
>  1 file changed, 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b3a56498595..326e000a71d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                        "mismatched vector sizes %T and %T\n",
> -                        vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer
  2023-10-18  6:34 ` Richard Biener
  2023-10-18 11:50   ` Li, Pan2
@ 2023-10-20  8:43   ` Li, Pan2
  2023-10-20  8:49     ` juzhe.zhong
  1 sibling, 1 reply; 20+ messages in thread
From: Li, Pan2 @ 2023-10-20  8:43 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang, kito.cheng, Liu, Hongtao

Hi Richard Biener,

The CI of linaro-toolchain@lists.linaro.org reports some aarch64 regression of this change, I will double check about it soon.

FAIL: 12 regressions

regressions.sum:
		=== gcc tests ===

Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ...
FAIL: gcc.target/aarch64/sve/clrsb_1.c (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:284)
FAIL: gcc.target/aarch64/sve/clrsb_1.c (test for excess errors)
FAIL: gcc.target/aarch64/sve/clrsb_1.c scan-assembler-times \\tcls\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d\\n 2
FAIL: gcc.target/aarch64/sve/clrsb_1.c scan-assembler-times \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
FAIL: gcc.target/aarch64/sve/clz_1.c (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:284)
FAIL: gcc.target/aarch64/sve/clz_1.c (test for excess errors)
FAIL: gcc.target/aarch64/sve/clz_1.c scan-assembler-times \\tclz\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d\\n 2
... and 7 more entries

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Wednesday, October 18, 2023 2:34 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer

On Wed, Oct 18, 2023 at 3:20 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to remove this data
> type size check and unblock the standard name like lrintmn2.
>
> Passed the x86 bootstrap and regression test already.

OK.

On x86 we seem to have lrintsfdi2 but not lrintv4sfv4di2, with SLP
vectorization we could expect to see the following vectorized after
the patch (with loop vectorization you'll see us pre-select same sized
vector types)

long int x[4];
float y[4];

void foo ()
{
  x[0] = __builtin_lrintf (y[0]);
  x[1] = __builtin_lrintf (y[1]);
  x[2] = __builtin_lrintf (y[2]);
  x[3] = __builtin_lrintf (y[3]);
}


> gcc/ChangeLog:
>
>         * tree-vect-stmts.cc (vectorizable_call): Remove data size
>         check.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/tree-vect-stmts.cc | 13 -------------
>  1 file changed, 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b3a56498595..326e000a71d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                        "mismatched vector sizes %T and %T\n",
> -                        vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: RE: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer
  2023-10-20  8:43   ` Li, Pan2
@ 2023-10-20  8:49     ` juzhe.zhong
  0 siblings, 0 replies; 20+ messages in thread
From: juzhe.zhong @ 2023-10-20  8:49 UTC (permalink / raw)
  To: gcc-patches

 /* We only enable VLS modes for VLA vectorization since fixed length VLMAX mode
-   is the highest priority choice and should not conflict with VLS modes.  */
-#define TARGET_VECTOR_VLS                                                      \
-  (TARGET_VECTOR && riscv_autovec_preference == RVV_SCALABLE)
+   is the highest priority choice and should not conflict with VLS modes.
+
+   We also enable VLS modes for some cases in fixed-vlmax, aka the bitsize of
+   the VLS mode are smaller than the minimal vla.
+   */
+#define TARGET_VECTOR_VLS(mode)                                                \
+  (TARGET_VECTOR                                                               \
+    && (riscv_autovec_preference == RVV_SCALABLE                               \
+      || (riscv_autovec_preference == RVV_FIXED_VLMAX                          \
+	&& riscv_vector::legal_vls_mode_in_fixed_vlmax_p (mode))))

Remove this macro. Create a function called "vls_mode_valid_p"
and use riscv_vector::vls_mode_valid_p in all places.

Indicate this PR in commit title: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111857

-> RISC-V: Support partial VLS mode when preference fixed-vlmax[PR111857]

+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vls-mode-6.c
Change  all tests name into pr111857-1.c...etc.

juzhe.zhong@rivai.ai
 
From: Li, Pan2
Date: 2023-10-20 16:43
To: Richard Biener
CC: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang; kito.cheng@gmail.com; Liu, Hongtao
Subject: RE: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer
Hi Richard Biener,
 
The CI of linaro-toolchain@lists.linaro.org reports some aarch64 regression of this change, I will double check about it soon.
 
FAIL: 12 regressions
 
regressions.sum:
=== gcc tests ===
 
Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ...
FAIL: gcc.target/aarch64/sve/clrsb_1.c (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:284)
FAIL: gcc.target/aarch64/sve/clrsb_1.c (test for excess errors)
FAIL: gcc.target/aarch64/sve/clrsb_1.c scan-assembler-times \\tcls\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d\\n 2
FAIL: gcc.target/aarch64/sve/clrsb_1.c scan-assembler-times \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
FAIL: gcc.target/aarch64/sve/clz_1.c (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:284)
FAIL: gcc.target/aarch64/sve/clz_1.c (test for excess errors)
FAIL: gcc.target/aarch64/sve/clz_1.c scan-assembler-times \\tclz\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d\\n 2
... and 7 more entries
 
Pan
 
-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com>
Sent: Wednesday, October 18, 2023 2:34 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer
 
On Wed, Oct 18, 2023 at 3:20 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to remove this data
> type size check and unblock the standard name like lrintmn2.
>
> Passed the x86 bootstrap and regression test already.
 
OK.
 
On x86 we seem to have lrintsfdi2 but not lrintv4sfv4di2, with SLP
vectorization we could expect to see the following vectorized after
the patch (with loop vectorization you'll see us pre-select same sized
vector types)
 
long int x[4];
float y[4];
 
void foo ()
{
  x[0] = __builtin_lrintf (y[0]);
  x[1] = __builtin_lrintf (y[1]);
  x[2] = __builtin_lrintf (y[2]);
  x[3] = __builtin_lrintf (y[3]);
}
 
 
> gcc/ChangeLog:
>
>         * tree-vect-stmts.cc (vectorizable_call): Remove data size
>         check.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/tree-vect-stmts.cc | 13 -------------
>  1 file changed, 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b3a56498595..326e000a71d 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                        "mismatched vector sizes %T and %T\n",
> -                        vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2] VECT: Remove the type size restriction of vectorizer
  2023-10-18  1:20 [PATCH v1] RISC-V: Remove the type size restriction of vectorizer pan2.li
  2023-10-18  6:34 ` Richard Biener
@ 2023-10-26  2:18 ` pan2.li
  2023-10-26  8:37   ` Richard Biener
  2023-10-30 12:22 ` [PATCH v3] VECT: Refine the type size restriction of call vectorizer pan2.li
  2023-10-31 15:10 ` [PATCH v4] " pan2.li
  3 siblings, 1 reply; 20+ messages in thread
From: pan2.li @ 2023-10-26  2:18 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, pan2.li, yanzhang.wang, kito.cheng, hongtao.liu,
	richard.guenther

From: Pan Li <pan2.li@intel.com>

Update in v2:

* Fix one ICE of type assertion.
* Adjust some test cases for aarch64 sve and riscv vector.

Original log:

The vectoriable_call has one restriction of the size of data type.
Aka DF to DI is allowed but SF to DI isn't. You may see below message
when try to vectorize function call like lrintf.

void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf (in[i]);
}

lrintf.c:5:26: missed: couldn't vectorize loop
lrintf.c:5:26: missed: not vectorized: unsupported data-type

Then the standard name pattern like lrintmn2 cannot work for different
data type size like SF => DI. This patch would like to remove this data
type size check and unblock the standard name like lrintmn2.

The below test are passed for this patch.

* The x86 bootstrap and regression test.
* The aarch64 regression test.
* The risc-v regression tests.

gcc/ChangeLog:

	* internal-fn.cc (expand_fn_using_insn): Add vector int assertion.
	* tree-vect-stmts.cc (vectorizable_call): Remove size check.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/clrsb_1.c: Adjust checker.
	* gcc.target/aarch64/sve/clz_1.c: Ditto.
	* gcc.target/aarch64/sve/popcount_1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/internal-fn.cc                                  |  3 ++-
 gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c      |  3 +--
 gcc/testsuite/gcc.target/aarch64/sve/clz_1.c        |  3 +--
 gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c   |  3 +--
 .../gcc.target/riscv/rvv/autovec/unop/popcount.c    |  2 +-
 gcc/tree-vect-stmts.cc                              | 13 -------------
 6 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 61d5a9e4772..17c0f4c3805 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
 	emit_move_insn (lhs_rtx, ops[0].value);
       else
 	{
-	  gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
+	  gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+			       || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs)));
 	  convert_move (lhs_rtx, ops[0].value, 0);
 	}
     }
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
index bdc9856faaf..940d08bbc7b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
@@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
 }
 
 /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
index 0c7a4e6d768..58b8ff406d2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
@@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
 }
 
 /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
index dfb6f4ac7a5..0eba898307c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
@@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
 }
 
 /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
-/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
index 585a522aa81..e6e3c70f927 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
@@ -1461,4 +1461,4 @@ main ()
   RUN_ALL ()
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..fa4ca0634e8 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo,
 
       return false;
     }
-  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
-     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
-     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
-     by a pack of the two vectors into an SI vector.  We would need
-     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
-  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "mismatched vector sizes %T and %T\n",
-			 vectype_in, vectype_out);
-      return false;
-    }
 
   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
-- 
2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer
  2023-10-26  2:18 ` [PATCH v2] VECT: " pan2.li
@ 2023-10-26  8:37   ` Richard Biener
  2023-10-26 11:59     ` Li, Pan2
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2023-10-26  8:37 UTC (permalink / raw)
  To: pan2.li
  Cc: gcc-patches, juzhe.zhong, yanzhang.wang, kito.cheng, hongtao.liu,
	Richard Sandiford

On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> Update in v2:
>
> * Fix one ICE of type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
>
> Original log:
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to remove this data
> type size check and unblock the standard name like lrintmn2.
>
> The below test are passed for this patch.
>
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> * The risc-v regression tests.
>
> gcc/ChangeLog:
>
>         * internal-fn.cc (expand_fn_using_insn): Add vector int assertion.
>         * tree-vect-stmts.cc (vectorizable_call): Remove size check.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker.
>         * gcc.target/aarch64/sve/clz_1.c: Ditto.
>         * gcc.target/aarch64/sve/popcount_1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/internal-fn.cc                                  |  3 ++-
>  gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c      |  3 +--
>  gcc/testsuite/gcc.target/aarch64/sve/clz_1.c        |  3 +--
>  gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c   |  3 +--
>  .../gcc.target/riscv/rvv/autovec/unop/popcount.c    |  2 +-
>  gcc/tree-vect-stmts.cc                              | 13 -------------
>  6 files changed, 6 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 61d5a9e4772..17c0f4c3805 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
>         emit_move_insn (lhs_rtx, ops[0].value);
>        else
>         {
> -         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
> +         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> +                              || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs)));

Can you explain why this is necessary?  In particular what is lhs_rtx
mode vs ops[0].value mode?

>           convert_move (lhs_rtx, ops[0].value, 0);

I'm not sure convert_move handles vector modes correctly.  Richard
probably added this code, CCed.

Richard.

>         }
>      }
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> index bdc9856faaf..940d08bbc7b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>  }
>
>  /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> index 0c7a4e6d768..58b8ff406d2 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>  }
>
>  /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> index dfb6f4ac7a5..0eba898307c 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>  }
>
>  /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> index 585a522aa81..e6e3c70f927 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> @@ -1461,4 +1461,4 @@ main ()
>    RUN_ALL ()
>  }
>
> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..fa4ca0634e8 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                        "mismatched vector sizes %T and %T\n",
> -                        vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v2] VECT: Remove the type size restriction of vectorizer
  2023-10-26  8:37   ` Richard Biener
@ 2023-10-26 11:59     ` Li, Pan2
  2023-10-26 13:59       ` Richard Biener
  0 siblings, 1 reply; 20+ messages in thread
From: Li, Pan2 @ 2023-10-26 11:59 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang, kito.cheng, Liu,
	Hongtao, Richard Sandiford

Thanks Richard for comments.

> Can you explain why this is necessary?  In particular what is lhs_rtx
> mode vs ops[0].value mode?

For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as below.

The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]).
The ops[0].value is (reg:VNx2DI 104).

The restriction removing make the vector rtl enter expand_fn_using_insn and of course hit the INTEGER_P assertion.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Thursday, October 26, 2023 4:38 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com>
Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer

On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> Update in v2:
>
> * Fix one ICE of type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
>
> Original log:
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to remove this data
> type size check and unblock the standard name like lrintmn2.
>
> The below test are passed for this patch.
>
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> * The risc-v regression tests.
>
> gcc/ChangeLog:
>
>         * internal-fn.cc (expand_fn_using_insn): Add vector int assertion.
>         * tree-vect-stmts.cc (vectorizable_call): Remove size check.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker.
>         * gcc.target/aarch64/sve/clz_1.c: Ditto.
>         * gcc.target/aarch64/sve/popcount_1.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/internal-fn.cc                                  |  3 ++-
>  gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c      |  3 +--
>  gcc/testsuite/gcc.target/aarch64/sve/clz_1.c        |  3 +--
>  gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c   |  3 +--
>  .../gcc.target/riscv/rvv/autovec/unop/popcount.c    |  2 +-
>  gcc/tree-vect-stmts.cc                              | 13 -------------
>  6 files changed, 6 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 61d5a9e4772..17c0f4c3805 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
>         emit_move_insn (lhs_rtx, ops[0].value);
>        else
>         {
> -         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
> +         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> +                              || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs)));

Can you explain why this is necessary?  In particular what is lhs_rtx
mode vs ops[0].value mode?

>           convert_move (lhs_rtx, ops[0].value, 0);

I'm not sure convert_move handles vector modes correctly.  Richard
probably added this code, CCed.

Richard.

>         }
>      }
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> index bdc9856faaf..940d08bbc7b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>  }
>
>  /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> index 0c7a4e6d768..58b8ff406d2 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>  }
>
>  /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> index dfb6f4ac7a5..0eba898307c 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>  }
>
>  /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> index 585a522aa81..e6e3c70f927 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> @@ -1461,4 +1461,4 @@ main ()
>    RUN_ALL ()
>  }
>
> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..fa4ca0634e8 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                        "mismatched vector sizes %T and %T\n",
> -                        vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer
  2023-10-26 11:59     ` Li, Pan2
@ 2023-10-26 13:59       ` Richard Biener
  2023-10-26 14:42         ` Li, Pan2
  2023-10-26 17:46         ` Richard Sandiford
  0 siblings, 2 replies; 20+ messages in thread
From: Richard Biener @ 2023-10-26 13:59 UTC (permalink / raw)
  To: Li, Pan2
  Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang, kito.cheng, Liu,
	Hongtao, Richard Sandiford



> Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2.li@intel.com>:
> 
> Thanks Richard for comments.
> 
>> Can you explain why this is necessary?  In particular what is lhs_rtx
>> mode vs ops[0].value mode?
> 
> For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as below.
> 
> The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]).
> The ops[0].value is (reg:VNx2DI 104).
> 
> The restriction removing make the vector rtl enter expand_fn_using_insn and of course hit the INTEGER_P assertion.

But I think this shows we mid-selected the optab, a convert_move is certainly not correct unconditionally here (the target might not support that)

> Pan
> 
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com> 
> Sent: Thursday, October 26, 2023 4:38 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com>
> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer
> 
>> On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote:
>> 
>> From: Pan Li <pan2.li@intel.com>
>> 
>> Update in v2:
>> 
>> * Fix one ICE of type assertion.
>> * Adjust some test cases for aarch64 sve and riscv vector.
>> 
>> Original log:
>> 
>> The vectoriable_call has one restriction of the size of data type.
>> Aka DF to DI is allowed but SF to DI isn't. You may see below message
>> when try to vectorize function call like lrintf.
>> 
>> void
>> test_lrintf (long *out, float *in, unsigned count)
>> {
>>  for (unsigned i = 0; i < count; i++)
>>    out[i] = __builtin_lrintf (in[i]);
>> }
>> 
>> lrintf.c:5:26: missed: couldn't vectorize loop
>> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>> 
>> Then the standard name pattern like lrintmn2 cannot work for different
>> data type size like SF => DI. This patch would like to remove this data
>> type size check and unblock the standard name like lrintmn2.
>> 
>> The below test are passed for this patch.
>> 
>> * The x86 bootstrap and regression test.
>> * The aarch64 regression test.
>> * The risc-v regression tests.
>> 
>> gcc/ChangeLog:
>> 
>>        * internal-fn.cc (expand_fn_using_insn): Add vector int assertion.
>>        * tree-vect-stmts.cc (vectorizable_call): Remove size check.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>        * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker.
>>        * gcc.target/aarch64/sve/clz_1.c: Ditto.
>>        * gcc.target/aarch64/sve/popcount_1.c: Ditto.
>>        * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto.
>> 
>> Signed-off-by: Pan Li <pan2.li@intel.com>
>> ---
>> gcc/internal-fn.cc                                  |  3 ++-
>> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c      |  3 +--
>> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c        |  3 +--
>> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c   |  3 +--
>> .../gcc.target/riscv/rvv/autovec/unop/popcount.c    |  2 +-
>> gcc/tree-vect-stmts.cc                              | 13 -------------
>> 6 files changed, 6 insertions(+), 21 deletions(-)
>> 
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index 61d5a9e4772..17c0f4c3805 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
>>        emit_move_insn (lhs_rtx, ops[0].value);
>>       else
>>        {
>> -         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
>> +         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
>> +                              || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs)));
> 
> Can you explain why this is necessary?  In particular what is lhs_rtx
> mode vs ops[0].value mode?
> 
>>          convert_move (lhs_rtx, ops[0].value, 0);
> 
> I'm not sure convert_move handles vector modes correctly.  Richard
> probably added this code, CCed.
> 
> Richard.
> 
>>        }
>>     }
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>> index bdc9856faaf..940d08bbc7b 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>> }
>> 
>> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>> index 0c7a4e6d768..58b8ff406d2 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>> }
>> 
>> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>> index dfb6f4ac7a5..0eba898307c 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>> }
>> 
>> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>> index 585a522aa81..e6e3c70f927 100644
>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>> @@ -1461,4 +1461,4 @@ main ()
>>   RUN_ALL ()
>> }
>> 
>> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index a9200767f67..fa4ca0634e8 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo,
>> 
>>       return false;
>>     }
>> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
>> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
>> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
>> -     by a pack of the two vectors into an SI vector.  We would need
>> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
>> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
>> -    {
>> -      if (dump_enabled_p ())
>> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> -                        "mismatched vector sizes %T and %T\n",
>> -                        vectype_in, vectype_out);
>> -      return false;
>> -    }
>> 
>>   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>>       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
>> --
>> 2.34.1
>> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v2] VECT: Remove the type size restriction of vectorizer
  2023-10-26 13:59       ` Richard Biener
@ 2023-10-26 14:42         ` Li, Pan2
  2023-10-26 17:46         ` Richard Sandiford
  1 sibling, 0 replies; 20+ messages in thread
From: Li, Pan2 @ 2023-10-26 14:42 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang, kito.cheng, Liu,
	Hongtao, Richard Sandiford

> But I think this shows we mid-selected the optab, a convert_move is certainly not correct unconditionally here (the target might not support that)

Make sense, we can wait a while for the confirmation from Richard S.

If convert_move is not designed for Vector (looks like mostly up to a point), I am not sure if we can fix the assertion like below

...
else If (VECTOR_INTERGER_TYPE (TREE_TYPE(lhs)))
  return;
else
  {
    gcc_checking_assert (INTEGRAL_TYPE_TYPE_P (TREE_TYPE (lhs)));
    convert_move (lhs_rtx, ops[0].value, 0);
  }

Aka bypass the vector here, but I am afraid this change may make the llrintf (SF => DI) not working on standard name.
Let me have a try and keep you posted.

Pan
  

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Thursday, October 26, 2023 10:00 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com>
Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer



> Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2.li@intel.com>:
> 
> Thanks Richard for comments.
> 
>> Can you explain why this is necessary?  In particular what is lhs_rtx
>> mode vs ops[0].value mode?
> 
> For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as below.
> 
> The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]).
> The ops[0].value is (reg:VNx2DI 104).
> 
> The restriction removing make the vector rtl enter expand_fn_using_insn and of course hit the INTEGER_P assertion.

But I think this shows we mid-selected the optab, a convert_move is certainly not correct unconditionally here (the target might not support that)

> Pan
> 
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com> 
> Sent: Thursday, October 26, 2023 4:38 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com>
> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer
> 
>> On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote:
>> 
>> From: Pan Li <pan2.li@intel.com>
>> 
>> Update in v2:
>> 
>> * Fix one ICE of type assertion.
>> * Adjust some test cases for aarch64 sve and riscv vector.
>> 
>> Original log:
>> 
>> The vectoriable_call has one restriction of the size of data type.
>> Aka DF to DI is allowed but SF to DI isn't. You may see below message
>> when try to vectorize function call like lrintf.
>> 
>> void
>> test_lrintf (long *out, float *in, unsigned count)
>> {
>>  for (unsigned i = 0; i < count; i++)
>>    out[i] = __builtin_lrintf (in[i]);
>> }
>> 
>> lrintf.c:5:26: missed: couldn't vectorize loop
>> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>> 
>> Then the standard name pattern like lrintmn2 cannot work for different
>> data type size like SF => DI. This patch would like to remove this data
>> type size check and unblock the standard name like lrintmn2.
>> 
>> The below test are passed for this patch.
>> 
>> * The x86 bootstrap and regression test.
>> * The aarch64 regression test.
>> * The risc-v regression tests.
>> 
>> gcc/ChangeLog:
>> 
>>        * internal-fn.cc (expand_fn_using_insn): Add vector int assertion.
>>        * tree-vect-stmts.cc (vectorizable_call): Remove size check.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>        * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker.
>>        * gcc.target/aarch64/sve/clz_1.c: Ditto.
>>        * gcc.target/aarch64/sve/popcount_1.c: Ditto.
>>        * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto.
>> 
>> Signed-off-by: Pan Li <pan2.li@intel.com>
>> ---
>> gcc/internal-fn.cc                                  |  3 ++-
>> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c      |  3 +--
>> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c        |  3 +--
>> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c   |  3 +--
>> .../gcc.target/riscv/rvv/autovec/unop/popcount.c    |  2 +-
>> gcc/tree-vect-stmts.cc                              | 13 -------------
>> 6 files changed, 6 insertions(+), 21 deletions(-)
>> 
>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>> index 61d5a9e4772..17c0f4c3805 100644
>> --- a/gcc/internal-fn.cc
>> +++ b/gcc/internal-fn.cc
>> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
>>        emit_move_insn (lhs_rtx, ops[0].value);
>>       else
>>        {
>> -         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
>> +         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
>> +                              || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs)));
> 
> Can you explain why this is necessary?  In particular what is lhs_rtx
> mode vs ops[0].value mode?
> 
>>          convert_move (lhs_rtx, ops[0].value, 0);
> 
> I'm not sure convert_move handles vector modes correctly.  Richard
> probably added this code, CCed.
> 
> Richard.
> 
>>        }
>>     }
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>> index bdc9856faaf..940d08bbc7b 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>> }
>> 
>> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>> index 0c7a4e6d768..58b8ff406d2 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>> }
>> 
>> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>> index dfb6f4ac7a5..0eba898307c 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>> }
>> 
>> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>> index 585a522aa81..e6e3c70f927 100644
>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>> @@ -1461,4 +1461,4 @@ main ()
>>   RUN_ALL ()
>> }
>> 
>> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index a9200767f67..fa4ca0634e8 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo,
>> 
>>       return false;
>>     }
>> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
>> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
>> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
>> -     by a pack of the two vectors into an SI vector.  We would need
>> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
>> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
>> -    {
>> -      if (dump_enabled_p ())
>> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> -                        "mismatched vector sizes %T and %T\n",
>> -                        vectype_in, vectype_out);
>> -      return false;
>> -    }
>> 
>>   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>>       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
>> --
>> 2.34.1
>> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer
  2023-10-26 13:59       ` Richard Biener
  2023-10-26 14:42         ` Li, Pan2
@ 2023-10-26 17:46         ` Richard Sandiford
  2023-10-27  2:17           ` Li, Pan2
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2023-10-26 17:46 UTC (permalink / raw)
  To: Richard Biener
  Cc: Li, Pan2, gcc-patches, juzhe.zhong, Wang, Yanzhang, kito.cheng,
	Liu, Hongtao

Richard Biener <richard.guenther@gmail.com> writes:
>> Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2.li@intel.com>:
>> 
>> Thanks Richard for comments.
>> 
>>> Can you explain why this is necessary?  In particular what is lhs_rtx
>>> mode vs ops[0].value mode?
>> 
>> For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as below.
>> 
>> The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]).
>> The ops[0].value is (reg:VNx2DI 104).
>> 
>> The restriction removing make the vector rtl enter expand_fn_using_insn and of course hit the INTEGER_P assertion.
>
> But I think this shows we mid-selected the optab, a convert_move is certainly not correct unconditionally here (the target might not support that)

Agreed.  Allowing TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)
makes sense if the called function allows the input and output modes
to vary.  That's true for internal functions that eventually map to
two-mode optabs.  But we can't remove the condition for calls to
other functions, at least not without some fix-ups.

ISTM that the problem being hit is the one described by the removed
comment.

In other words, I don't think simply removing the test from the vectoriser
is correct.  It needs to be replaced by something more selective.

Thanks,
Richard

>> Pan
>> 
>> -----Original Message-----
>> From: Richard Biener <richard.guenther@gmail.com> 
>> Sent: Thursday, October 26, 2023 4:38 PM
>> To: Li, Pan2 <pan2.li@intel.com>
>> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com>
>> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer
>> 
>>> On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote:
>>> 
>>> From: Pan Li <pan2.li@intel.com>
>>> 
>>> Update in v2:
>>> 
>>> * Fix one ICE of type assertion.
>>> * Adjust some test cases for aarch64 sve and riscv vector.
>>> 
>>> Original log:
>>> 
>>> The vectoriable_call has one restriction of the size of data type.
>>> Aka DF to DI is allowed but SF to DI isn't. You may see below message
>>> when try to vectorize function call like lrintf.
>>> 
>>> void
>>> test_lrintf (long *out, float *in, unsigned count)
>>> {
>>>  for (unsigned i = 0; i < count; i++)
>>>    out[i] = __builtin_lrintf (in[i]);
>>> }
>>> 
>>> lrintf.c:5:26: missed: couldn't vectorize loop
>>> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>>> 
>>> Then the standard name pattern like lrintmn2 cannot work for different
>>> data type size like SF => DI. This patch would like to remove this data
>>> type size check and unblock the standard name like lrintmn2.
>>> 
>>> The below test are passed for this patch.
>>> 
>>> * The x86 bootstrap and regression test.
>>> * The aarch64 regression test.
>>> * The risc-v regression tests.
>>> 
>>> gcc/ChangeLog:
>>> 
>>>        * internal-fn.cc (expand_fn_using_insn): Add vector int assertion.
>>>        * tree-vect-stmts.cc (vectorizable_call): Remove size check.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>        * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker.
>>>        * gcc.target/aarch64/sve/clz_1.c: Ditto.
>>>        * gcc.target/aarch64/sve/popcount_1.c: Ditto.
>>>        * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto.
>>> 
>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>> ---
>>> gcc/internal-fn.cc                                  |  3 ++-
>>> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c      |  3 +--
>>> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c        |  3 +--
>>> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c   |  3 +--
>>> .../gcc.target/riscv/rvv/autovec/unop/popcount.c    |  2 +-
>>> gcc/tree-vect-stmts.cc                              | 13 -------------
>>> 6 files changed, 6 insertions(+), 21 deletions(-)
>>> 
>>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>>> index 61d5a9e4772..17c0f4c3805 100644
>>> --- a/gcc/internal-fn.cc
>>> +++ b/gcc/internal-fn.cc
>>> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
>>>        emit_move_insn (lhs_rtx, ops[0].value);
>>>       else
>>>        {
>>> -         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
>>> +         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
>>> +                              || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs)));
>> 
>> Can you explain why this is necessary?  In particular what is lhs_rtx
>> mode vs ops[0].value mode?
>> 
>>>          convert_move (lhs_rtx, ops[0].value, 0);
>> 
>> I'm not sure convert_move handles vector modes correctly.  Richard
>> probably added this code, CCed.
>> 
>> Richard.
>> 
>>>        }
>>>     }
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>>> index bdc9856faaf..940d08bbc7b 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>>> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>>> }
>>> 
>>> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>>> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>>> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>>> index 0c7a4e6d768..58b8ff406d2 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>>> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>>> }
>>> 
>>> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>>> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>>> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>>> index dfb6f4ac7a5..0eba898307c 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>>> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>>> }
>>> 
>>> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>>> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>>> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>>> index 585a522aa81..e6e3c70f927 100644
>>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>>> @@ -1461,4 +1461,4 @@ main ()
>>>   RUN_ALL ()
>>> }
>>> 
>>> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
>>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */
>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>>> index a9200767f67..fa4ca0634e8 100644
>>> --- a/gcc/tree-vect-stmts.cc
>>> +++ b/gcc/tree-vect-stmts.cc
>>> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo,
>>> 
>>>       return false;
>>>     }
>>> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
>>> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
>>> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
>>> -     by a pack of the two vectors into an SI vector.  We would need
>>> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
>>> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
>>> -    {
>>> -      if (dump_enabled_p ())
>>> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>> -                        "mismatched vector sizes %T and %T\n",
>>> -                        vectype_in, vectype_out);
>>> -      return false;
>>> -    }
>>> 
>>>   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>>>       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
>>> --
>>> 2.34.1
>>> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v2] VECT: Remove the type size restriction of vectorizer
  2023-10-26 17:46         ` Richard Sandiford
@ 2023-10-27  2:17           ` Li, Pan2
  2023-10-27 13:31             ` Richard Biener
  0 siblings, 1 reply; 20+ messages in thread
From: Li, Pan2 @ 2023-10-27  2:17 UTC (permalink / raw)
  To: Richard Sandiford, Richard Biener
  Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang, kito.cheng, Liu, Hongtao

Thanks Richard S for comments.

> In other words, I don't think simply removing the test from the vectoriser
> is correct.  It needs to be replaced by something more selective.

Does it mean we need to check if the internal fun allow different modes/sizes here?

For example, standard name lrintmn2 (m, n mode) is allowed here, while rintm2 (only m mode) isn't.

Pan

-----Original Message-----
From: Richard Sandiford <richard.sandiford@arm.com> 
Sent: Friday, October 27, 2023 1:47 AM
To: Richard Biener <richard.guenther@gmail.com>
Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer

Richard Biener <richard.guenther@gmail.com> writes:
>> Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2.li@intel.com>:
>> 
>> Thanks Richard for comments.
>> 
>>> Can you explain why this is necessary?  In particular what is lhs_rtx
>>> mode vs ops[0].value mode?
>> 
>> For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as below.
>> 
>> The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]).
>> The ops[0].value is (reg:VNx2DI 104).
>> 
>> The restriction removing make the vector rtl enter expand_fn_using_insn and of course hit the INTEGER_P assertion.
>
> But I think this shows we mid-selected the optab, a convert_move is certainly not correct unconditionally here (the target might not support that)

Agreed.  Allowing TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)
makes sense if the called function allows the input and output modes
to vary.  That's true for internal functions that eventually map to
two-mode optabs.  But we can't remove the condition for calls to
other functions, at least not without some fix-ups.

ISTM that the problem being hit is the one described by the removed
comment.

In other words, I don't think simply removing the test from the vectoriser
is correct.  It needs to be replaced by something more selective.

Thanks,
Richard

>> Pan
>> 
>> -----Original Message-----
>> From: Richard Biener <richard.guenther@gmail.com> 
>> Sent: Thursday, October 26, 2023 4:38 PM
>> To: Li, Pan2 <pan2.li@intel.com>
>> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com>
>> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer
>> 
>>> On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote:
>>> 
>>> From: Pan Li <pan2.li@intel.com>
>>> 
>>> Update in v2:
>>> 
>>> * Fix one ICE of type assertion.
>>> * Adjust some test cases for aarch64 sve and riscv vector.
>>> 
>>> Original log:
>>> 
>>> The vectoriable_call has one restriction of the size of data type.
>>> Aka DF to DI is allowed but SF to DI isn't. You may see below message
>>> when try to vectorize function call like lrintf.
>>> 
>>> void
>>> test_lrintf (long *out, float *in, unsigned count)
>>> {
>>>  for (unsigned i = 0; i < count; i++)
>>>    out[i] = __builtin_lrintf (in[i]);
>>> }
>>> 
>>> lrintf.c:5:26: missed: couldn't vectorize loop
>>> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>>> 
>>> Then the standard name pattern like lrintmn2 cannot work for different
>>> data type size like SF => DI. This patch would like to remove this data
>>> type size check and unblock the standard name like lrintmn2.
>>> 
>>> The below test are passed for this patch.
>>> 
>>> * The x86 bootstrap and regression test.
>>> * The aarch64 regression test.
>>> * The risc-v regression tests.
>>> 
>>> gcc/ChangeLog:
>>> 
>>>        * internal-fn.cc (expand_fn_using_insn): Add vector int assertion.
>>>        * tree-vect-stmts.cc (vectorizable_call): Remove size check.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>>        * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker.
>>>        * gcc.target/aarch64/sve/clz_1.c: Ditto.
>>>        * gcc.target/aarch64/sve/popcount_1.c: Ditto.
>>>        * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto.
>>> 
>>> Signed-off-by: Pan Li <pan2.li@intel.com>
>>> ---
>>> gcc/internal-fn.cc                                  |  3 ++-
>>> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c      |  3 +--
>>> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c        |  3 +--
>>> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c   |  3 +--
>>> .../gcc.target/riscv/rvv/autovec/unop/popcount.c    |  2 +-
>>> gcc/tree-vect-stmts.cc                              | 13 -------------
>>> 6 files changed, 6 insertions(+), 21 deletions(-)
>>> 
>>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
>>> index 61d5a9e4772..17c0f4c3805 100644
>>> --- a/gcc/internal-fn.cc
>>> +++ b/gcc/internal-fn.cc
>>> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
>>>        emit_move_insn (lhs_rtx, ops[0].value);
>>>       else
>>>        {
>>> -         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
>>> +         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
>>> +                              || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs)));
>> 
>> Can you explain why this is necessary?  In particular what is lhs_rtx
>> mode vs ops[0].value mode?
>> 
>>>          convert_move (lhs_rtx, ops[0].value, 0);
>> 
>> I'm not sure convert_move handles vector modes correctly.  Richard
>> probably added this code, CCed.
>> 
>> Richard.
>> 
>>>        }
>>>     }
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>>> index bdc9856faaf..940d08bbc7b 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
>>> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>>> }
>>> 
>>> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>>> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>>> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>>> index 0c7a4e6d768..58b8ff406d2 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
>>> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>>> }
>>> 
>>> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>>> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>>> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>>> index dfb6f4ac7a5..0eba898307c 100644
>>> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
>>> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
>>> }
>>> 
>>> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
>>> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
>>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>>> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
>>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>>> index 585a522aa81..e6e3c70f927 100644
>>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
>>> @@ -1461,4 +1461,4 @@ main ()
>>>   RUN_ALL ()
>>> }
>>> 
>>> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
>>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */
>>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>>> index a9200767f67..fa4ca0634e8 100644
>>> --- a/gcc/tree-vect-stmts.cc
>>> +++ b/gcc/tree-vect-stmts.cc
>>> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo,
>>> 
>>>       return false;
>>>     }
>>> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
>>> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
>>> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
>>> -     by a pack of the two vectors into an SI vector.  We would need
>>> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
>>> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
>>> -    {
>>> -      if (dump_enabled_p ())
>>> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>> -                        "mismatched vector sizes %T and %T\n",
>>> -                        vectype_in, vectype_out);
>>> -      return false;
>>> -    }
>>> 
>>>   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>>>       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
>>> --
>>> 2.34.1
>>> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer
  2023-10-27  2:17           ` Li, Pan2
@ 2023-10-27 13:31             ` Richard Biener
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Biener @ 2023-10-27 13:31 UTC (permalink / raw)
  To: Li, Pan2
  Cc: Richard Sandiford, gcc-patches, juzhe.zhong, Wang, Yanzhang,
	kito.cheng, Liu, Hongtao

On Fri, Oct 27, 2023 at 4:17 AM Li, Pan2 <pan2.li@intel.com> wrote:
>
> Thanks Richard S for comments.
>
> > In other words, I don't think simply removing the test from the vectoriser
> > is correct.  It needs to be replaced by something more selective.
>
> Does it mean we need to check if the internal fun allow different modes/sizes here?
>
> For example, standard name lrintmn2 (m, n mode) is allowed here, while rintm2 (only m mode) isn't.

We need to check whether the "size" of the LHS is somehow participating in the
optab query.  I think the direct_internal_fn_info type0/1 members,
when -1 say that.
If none is -1 (and -2) then the LHS has to match one of the arguments (if there
are two different I'm not sure which we'd pick).

So patch-wise the existing check can probably be skipped when
vectorizable_internal_function
returns an IFN but that function should have the very same check when
vectype_out isn't
participating in the optab selection.

Richard.

>
> Pan
>
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Friday, October 27, 2023 1:47 AM
> To: Richard Biener <richard.guenther@gmail.com>
> Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer
>
> Richard Biener <richard.guenther@gmail.com> writes:
> >> Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2.li@intel.com>:
> >>
> >> Thanks Richard for comments.
> >>
> >>> Can you explain why this is necessary?  In particular what is lhs_rtx
> >>> mode vs ops[0].value mode?
> >>
> >> For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as below.
> >>
> >> The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]).
> >> The ops[0].value is (reg:VNx2DI 104).
> >>
> >> The restriction removing make the vector rtl enter expand_fn_using_insn and of course hit the INTEGER_P assertion.
> >
> > But I think this shows we mid-selected the optab, a convert_move is certainly not correct unconditionally here (the target might not support that)
>
> Agreed.  Allowing TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)
> makes sense if the called function allows the input and output modes
> to vary.  That's true for internal functions that eventually map to
> two-mode optabs.  But we can't remove the condition for calls to
> other functions, at least not without some fix-ups.
>
> ISTM that the problem being hit is the one described by the removed
> comment.
>
> In other words, I don't think simply removing the test from the vectoriser
> is correct.  It needs to be replaced by something more selective.
>
> Thanks,
> Richard
>
> >> Pan
> >>
> >> -----Original Message-----
> >> From: Richard Biener <richard.guenther@gmail.com>
> >> Sent: Thursday, October 26, 2023 4:38 PM
> >> To: Li, Pan2 <pan2.li@intel.com>
> >> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com>
> >> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer
> >>
> >>> On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote:
> >>>
> >>> From: Pan Li <pan2.li@intel.com>
> >>>
> >>> Update in v2:
> >>>
> >>> * Fix one ICE of type assertion.
> >>> * Adjust some test cases for aarch64 sve and riscv vector.
> >>>
> >>> Original log:
> >>>
> >>> The vectoriable_call has one restriction of the size of data type.
> >>> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> >>> when try to vectorize function call like lrintf.
> >>>
> >>> void
> >>> test_lrintf (long *out, float *in, unsigned count)
> >>> {
> >>>  for (unsigned i = 0; i < count; i++)
> >>>    out[i] = __builtin_lrintf (in[i]);
> >>> }
> >>>
> >>> lrintf.c:5:26: missed: couldn't vectorize loop
> >>> lrintf.c:5:26: missed: not vectorized: unsupported data-type
> >>>
> >>> Then the standard name pattern like lrintmn2 cannot work for different
> >>> data type size like SF => DI. This patch would like to remove this data
> >>> type size check and unblock the standard name like lrintmn2.
> >>>
> >>> The below test are passed for this patch.
> >>>
> >>> * The x86 bootstrap and regression test.
> >>> * The aarch64 regression test.
> >>> * The risc-v regression tests.
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>        * internal-fn.cc (expand_fn_using_insn): Add vector int assertion.
> >>>        * tree-vect-stmts.cc (vectorizable_call): Remove size check.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>        * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker.
> >>>        * gcc.target/aarch64/sve/clz_1.c: Ditto.
> >>>        * gcc.target/aarch64/sve/popcount_1.c: Ditto.
> >>>        * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto.
> >>>
> >>> Signed-off-by: Pan Li <pan2.li@intel.com>
> >>> ---
> >>> gcc/internal-fn.cc                                  |  3 ++-
> >>> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c      |  3 +--
> >>> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c        |  3 +--
> >>> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c   |  3 +--
> >>> .../gcc.target/riscv/rvv/autovec/unop/popcount.c    |  2 +-
> >>> gcc/tree-vect-stmts.cc                              | 13 -------------
> >>> 6 files changed, 6 insertions(+), 21 deletions(-)
> >>>
> >>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> >>> index 61d5a9e4772..17c0f4c3805 100644
> >>> --- a/gcc/internal-fn.cc
> >>> +++ b/gcc/internal-fn.cc
> >>> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs,
> >>>        emit_move_insn (lhs_rtx, ops[0].value);
> >>>       else
> >>>        {
> >>> -         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)));
> >>> +         gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> >>> +                              || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs)));
> >>
> >> Can you explain why this is necessary?  In particular what is lhs_rtx
> >> mode vs ops[0].value mode?
> >>
> >>>          convert_move (lhs_rtx, ops[0].value, 0);
> >>
> >> I'm not sure convert_move handles vector modes correctly.  Richard
> >> probably added this code, CCed.
> >>
> >> Richard.
> >>
> >>>        }
> >>>     }
> >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> >>> index bdc9856faaf..940d08bbc7b 100644
> >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c
> >>> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
> >>> }
> >>>
> >>> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
> >>> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
> >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> >>> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
> >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> >>> index 0c7a4e6d768..58b8ff406d2 100644
> >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c
> >>> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
> >>> }
> >>>
> >>> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
> >>> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
> >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> >>> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
> >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> >>> index dfb6f4ac7a5..0eba898307c 100644
> >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c
> >>> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
> >>> }
> >>>
> >>> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */
> >>> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */
> >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> >>> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */
> >>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> >>> index 585a522aa81..e6e3c70f927 100644
> >>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> >>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c
> >>> @@ -1461,4 +1461,4 @@ main ()
> >>>   RUN_ALL ()
> >>> }
> >>>
> >>> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */
> >>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */
> >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> >>> index a9200767f67..fa4ca0634e8 100644
> >>> --- a/gcc/tree-vect-stmts.cc
> >>> +++ b/gcc/tree-vect-stmts.cc
> >>> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo,
> >>>
> >>>       return false;
> >>>     }
> >>> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> >>> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> >>> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> >>> -     by a pack of the two vectors into an SI vector.  We would need
> >>> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> >>> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> >>> -    {
> >>> -      if (dump_enabled_p ())
> >>> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >>> -                        "mismatched vector sizes %T and %T\n",
> >>> -                        vectype_in, vectype_out);
> >>> -      return false;
> >>> -    }
> >>>
> >>>   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
> >>>       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> >>> --
> >>> 2.34.1
> >>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3] VECT: Refine the type size restriction of call vectorizer
  2023-10-18  1:20 [PATCH v1] RISC-V: Remove the type size restriction of vectorizer pan2.li
  2023-10-18  6:34 ` Richard Biener
  2023-10-26  2:18 ` [PATCH v2] VECT: " pan2.li
@ 2023-10-30 12:22 ` pan2.li
  2023-10-31 12:58   ` Richard Biener
  2023-10-31 15:10 ` [PATCH v4] " pan2.li
  3 siblings, 1 reply; 20+ messages in thread
From: pan2.li @ 2023-10-30 12:22 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, pan2.li, yanzhang.wang, kito.cheng, hongtao.liu,
	richard.guenther

From: Pan Li <pan2.li@intel.com>

Update in v3:

* Add func to predicate type size is legal or not for vectorizer call.

Update in v2:

* Fix one ICE of type assertion.
* Adjust some test cases for aarch64 sve and riscv vector.

Original log:

The vectoriable_call has one restriction of the size of data type.
Aka DF to DI is allowed but SF to DI isn't. You may see below message
when try to vectorize function call like lrintf.

void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf (in[i]);
}

lrintf.c:5:26: missed: couldn't vectorize loop
lrintf.c:5:26: missed: not vectorized: unsupported data-type

Then the standard name pattern like lrintmn2 cannot work for different
data type size like SF => DI. This patch would like to refine this data
type size check and unblock the standard name like lrintmn2 on conditions.

The type size of vectype_out need to be exactly the same as the type
size of vectype_in when the vectype_out size isn't participating in
the optab selection. While there is no such restriction when the
vectype_out is somehow a part of the optab query.

The below test are passed for this patch.

* The x86 bootstrap and regression test.
* The aarch64 regression test.
* The risc-v regression tests.
* Ensure the lrintf standard name in risc-v.

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_type_size_legal_p): New
	func impl to predicate the type size is legal or not.
	(vectorizable_call): Leverage vectorizable_type_size_legal_p.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/tree-vect-stmts.cc | 51 +++++++++++++++++++++++++++++++-----------
 1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..24b3448d961 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1430,6 +1430,35 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
   return IFN_LAST;
 }
 
+/* Return TRUE when the type size is legal for the call vectorizer,
+   or FALSE.
+   The type size of both the vectype_in and vectype_out should be
+   exactly the same when vectype_out isn't participating the optab.
+   While there is no restriction for type size when vectype_out
+   is part of the optab query.
+ */
+static bool
+vectorizable_type_size_legal_p (internal_fn ifn, tree vectype_out,
+				tree vectype_in)
+{
+  bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
+
+  if (ifn == IFN_LAST || !direct_internal_fn_p (ifn))
+    return same_size_p;
+
+  const direct_internal_fn_info &difn_info = direct_internal_fn (ifn);
+
+  if (!difn_info.vectorizable)
+    return same_size_p;
+
+  /* According to vectorizable_internal_function, the type0/1 < 0 indicates
+     the vectype_out participating the optable selection.  Aka the type size
+     check can be skipped here.  */
+  if (difn_info.type0 < 0 || difn_info.type1 < 0)
+    return true;
+
+  return same_size_p;
+}
 
 static tree permute_vec_elements (vec_info *, tree, tree, tree, stmt_vec_info,
 				  gimple_stmt_iterator *);
@@ -3361,19 +3390,6 @@ vectorizable_call (vec_info *vinfo,
 
       return false;
     }
-  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
-     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
-     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
-     by a pack of the two vectors into an SI vector.  We would need
-     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
-  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "mismatched vector sizes %T and %T\n",
-			 vectype_in, vectype_out);
-      return false;
-    }
 
   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
@@ -3431,6 +3447,15 @@ vectorizable_call (vec_info *vinfo,
     ifn = vectorizable_internal_function (cfn, callee, vectype_out,
 					  vectype_in);
 
+  if (!vectorizable_type_size_legal_p (ifn, vectype_out, vectype_in))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "mismatched vector sizes %T and %T\n",
+			 vectype_in, vectype_out);
+      return false;
+    }
+
   /* If that fails, try asking for a target-specific built-in function.  */
   if (ifn == IFN_LAST)
     {
-- 
2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] VECT: Refine the type size restriction of call vectorizer
  2023-10-30 12:22 ` [PATCH v3] VECT: Refine the type size restriction of call vectorizer pan2.li
@ 2023-10-31 12:58   ` Richard Biener
  2023-10-31 14:10     ` Li, Pan2
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Biener @ 2023-10-31 12:58 UTC (permalink / raw)
  To: pan2.li; +Cc: gcc-patches, juzhe.zhong, yanzhang.wang, kito.cheng, hongtao.liu

On Mon, Oct 30, 2023 at 1:23 PM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> Update in v3:
>
> * Add func to predicate type size is legal or not for vectorizer call.
>
> Update in v2:
>
> * Fix one ICE of type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
>
> Original log:
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to refine this data
> type size check and unblock the standard name like lrintmn2 on conditions.
>
> The type size of vectype_out need to be exactly the same as the type
> size of vectype_in when the vectype_out size isn't participating in
> the optab selection. While there is no such restriction when the
> vectype_out is somehow a part of the optab query.
>
> The below test are passed for this patch.
>
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> * The risc-v regression tests.
> * Ensure the lrintf standard name in risc-v.
>
> gcc/ChangeLog:
>
>         * tree-vect-stmts.cc (vectorizable_type_size_legal_p): New
>         func impl to predicate the type size is legal or not.
>         (vectorizable_call): Leverage vectorizable_type_size_legal_p.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/tree-vect-stmts.cc | 51 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 38 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..24b3448d961 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1430,6 +1430,35 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>    return IFN_LAST;
>  }
>
> +/* Return TRUE when the type size is legal for the call vectorizer,
> +   or FALSE.
> +   The type size of both the vectype_in and vectype_out should be
> +   exactly the same when vectype_out isn't participating the optab.
> +   While there is no restriction for type size when vectype_out
> +   is part of the optab query.
> + */
> +static bool
> +vectorizable_type_size_legal_p (internal_fn ifn, tree vectype_out,
> +                               tree vectype_in)
> +{
> +  bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
> +
> +  if (ifn == IFN_LAST || !direct_internal_fn_p (ifn))
> +    return same_size_p;
> +
> +  const direct_internal_fn_info &difn_info = direct_internal_fn (ifn);
> +
> +  if (!difn_info.vectorizable)
> +    return same_size_p;
> +
> +  /* According to vectorizable_internal_function, the type0/1 < 0 indicates
> +     the vectype_out participating the optable selection.  Aka the type size
> +     check can be skipped here.  */
> +  if (difn_info.type0 < 0 || difn_info.type1 < 0)
> +    return true;

can you instead amend vectorizable_internal_function to contain the check,
returning IFN_LAST if it doesn't hold?

> +
> +  return same_size_p;
> +}
>
>  static tree permute_vec_elements (vec_info *, tree, tree, tree, stmt_vec_info,
>                                   gimple_stmt_iterator *);
> @@ -3361,19 +3390,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                        "mismatched vector sizes %T and %T\n",
> -                        vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> @@ -3431,6 +3447,15 @@ vectorizable_call (vec_info *vinfo,
>      ifn = vectorizable_internal_function (cfn, callee, vectype_out,
>                                           vectype_in);
>
> +  if (!vectorizable_type_size_legal_p (ifn, vectype_out, vectype_in))
> +    {
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                        "mismatched vector sizes %T and %T\n",
> +                        vectype_in, vectype_out);
> +      return false;
> +    }
> +
>    /* If that fails, try asking for a target-specific built-in function.  */
>    if (ifn == IFN_LAST)
>      {
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v3] VECT: Refine the type size restriction of call vectorizer
  2023-10-31 12:58   ` Richard Biener
@ 2023-10-31 14:10     ` Li, Pan2
  0 siblings, 0 replies; 20+ messages in thread
From: Li, Pan2 @ 2023-10-31 14:10 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang, kito.cheng, Liu, Hongtao

> can you instead amend vectorizable_internal_function to contain the check,
> returning IFN_LAST if it doesn't hold?

Sure, will send v4 for this.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Tuesday, October 31, 2023 8:58 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v3] VECT: Refine the type size restriction of call vectorizer

On Mon, Oct 30, 2023 at 1:23 PM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> Update in v3:
>
> * Add func to predicate type size is legal or not for vectorizer call.
>
> Update in v2:
>
> * Fix one ICE of type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
>
> Original log:
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to refine this data
> type size check and unblock the standard name like lrintmn2 on conditions.
>
> The type size of vectype_out need to be exactly the same as the type
> size of vectype_in when the vectype_out size isn't participating in
> the optab selection. While there is no such restriction when the
> vectype_out is somehow a part of the optab query.
>
> The below test are passed for this patch.
>
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> * The risc-v regression tests.
> * Ensure the lrintf standard name in risc-v.
>
> gcc/ChangeLog:
>
>         * tree-vect-stmts.cc (vectorizable_type_size_legal_p): New
>         func impl to predicate the type size is legal or not.
>         (vectorizable_call): Leverage vectorizable_type_size_legal_p.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/tree-vect-stmts.cc | 51 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 38 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..24b3448d961 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1430,6 +1430,35 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>    return IFN_LAST;
>  }
>
> +/* Return TRUE when the type size is legal for the call vectorizer,
> +   or FALSE.
> +   The type size of both the vectype_in and vectype_out should be
> +   exactly the same when vectype_out isn't participating the optab.
> +   While there is no restriction for type size when vectype_out
> +   is part of the optab query.
> + */
> +static bool
> +vectorizable_type_size_legal_p (internal_fn ifn, tree vectype_out,
> +                               tree vectype_in)
> +{
> +  bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
> +
> +  if (ifn == IFN_LAST || !direct_internal_fn_p (ifn))
> +    return same_size_p;
> +
> +  const direct_internal_fn_info &difn_info = direct_internal_fn (ifn);
> +
> +  if (!difn_info.vectorizable)
> +    return same_size_p;
> +
> +  /* According to vectorizable_internal_function, the type0/1 < 0 indicates
> +     the vectype_out participating the optable selection.  Aka the type size
> +     check can be skipped here.  */
> +  if (difn_info.type0 < 0 || difn_info.type1 < 0)
> +    return true;

can you instead amend vectorizable_internal_function to contain the check,
returning IFN_LAST if it doesn't hold?

> +
> +  return same_size_p;
> +}
>
>  static tree permute_vec_elements (vec_info *, tree, tree, tree, stmt_vec_info,
>                                   gimple_stmt_iterator *);
> @@ -3361,19 +3390,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                        "mismatched vector sizes %T and %T\n",
> -                        vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> @@ -3431,6 +3447,15 @@ vectorizable_call (vec_info *vinfo,
>      ifn = vectorizable_internal_function (cfn, callee, vectype_out,
>                                           vectype_in);
>
> +  if (!vectorizable_type_size_legal_p (ifn, vectype_out, vectype_in))
> +    {
> +      if (dump_enabled_p ())
> +       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                        "mismatched vector sizes %T and %T\n",
> +                        vectype_in, vectype_out);
> +      return false;
> +    }
> +
>    /* If that fails, try asking for a target-specific built-in function.  */
>    if (ifn == IFN_LAST)
>      {
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4] VECT: Refine the type size restriction of call vectorizer
  2023-10-18  1:20 [PATCH v1] RISC-V: Remove the type size restriction of vectorizer pan2.li
                   ` (2 preceding siblings ...)
  2023-10-30 12:22 ` [PATCH v3] VECT: Refine the type size restriction of call vectorizer pan2.li
@ 2023-10-31 15:10 ` pan2.li
  2023-10-31 23:36   ` Li, Pan2
  2023-11-01 16:43   ` Richard Biener
  3 siblings, 2 replies; 20+ messages in thread
From: pan2.li @ 2023-10-31 15:10 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, pan2.li, yanzhang.wang, kito.cheng, hongtao.liu,
	richard.guenther

From: Pan Li <pan2.li@intel.com>

Update in v4:

* Append the check to vectorizable_internal_function.

Update in v3:

* Add func to predicate type size is legal or not for vectorizer call.

Update in v2:

* Fix one ICE of type assertion.
* Adjust some test cases for aarch64 sve and riscv vector.

Original log:

The vectoriable_call has one restriction of the size of data type.
Aka DF to DI is allowed but SF to DI isn't. You may see below message
when try to vectorize function call like lrintf.

void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf (in[i]);
}

lrintf.c:5:26: missed: couldn't vectorize loop
lrintf.c:5:26: missed: not vectorized: unsupported data-type

Then the standard name pattern like lrintmn2 cannot work for different
data type size like SF => DI. This patch would like to refine this data
type size check and unblock the standard name like lrintmn2 on conditions.

The type size of vectype_out need to be exactly the same as the type
size of vectype_in when the vectype_out size isn't participating in
the optab selection. While there is no such restriction when the
vectype_out is somehow a part of the optab query.

The below test are passed for this patch.

* The risc-v regression tests.
* Ensure the lrintf standard name in risc-v.

The below test are ongoing.

* The x86 bootstrap and regression test.
* The aarch64 regression test.

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_internal_function): Add type
	size check for vectype_out doesn't participating for optab query.
	(vectorizable_call): Remove the type size check.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/tree-vect-stmts.cc | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..799b4ab10c7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1420,8 +1420,17 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
       const direct_internal_fn_info &info = direct_internal_fn (ifn);
       if (info.vectorizable)
 	{
+	  bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
 	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
 	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
+
+	  /* The type size of both the vectype_in and vectype_out should be
+	     exactly the same when vectype_out isn't participating the optab.
+	     While there is no restriction for type size when vectype_out
+	     is part of the optab query.  */
+	  if (type0 != vectype_out && type1 != vectype_out && !same_size_p)
+	    return IFN_LAST;
+
 	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
 					      OPTIMIZE_FOR_SPEED))
 	    return ifn;
@@ -3361,19 +3370,6 @@ vectorizable_call (vec_info *vinfo,
 
       return false;
     }
-  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
-     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
-     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
-     by a pack of the two vectors into an SI vector.  We would need
-     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
-  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "mismatched vector sizes %T and %T\n",
-			 vectype_in, vectype_out);
-      return false;
-    }
 
   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
-- 
2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v4] VECT: Refine the type size restriction of call vectorizer
  2023-10-31 15:10 ` [PATCH v4] " pan2.li
@ 2023-10-31 23:36   ` Li, Pan2
  2023-11-01 16:43   ` Richard Biener
  1 sibling, 0 replies; 20+ messages in thread
From: Li, Pan2 @ 2023-10-31 23:36 UTC (permalink / raw)
  To: gcc-patches
  Cc: juzhe.zhong, Wang, Yanzhang, kito.cheng, Liu, Hongtao, richard.guenther

The below test are passed for this patch.

* The x86 bootstrap and regression test.
* The aarch64 regression test.
* The risc-v regression tests.
* Ensure the lrintf standard name in RVV.

Pan

-----Original Message-----
From: Li, Pan2 <pan2.li@intel.com> 
Sent: Tuesday, October 31, 2023 11:10 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Li, Pan2 <pan2.li@intel.com>; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; richard.guenther@gmail.com
Subject: [PATCH v4] VECT: Refine the type size restriction of call vectorizer

From: Pan Li <pan2.li@intel.com>

Update in v4:

* Append the check to vectorizable_internal_function.

Update in v3:

* Add func to predicate type size is legal or not for vectorizer call.

Update in v2:

* Fix one ICE of type assertion.
* Adjust some test cases for aarch64 sve and riscv vector.

Original log:

The vectoriable_call has one restriction of the size of data type.
Aka DF to DI is allowed but SF to DI isn't. You may see below message
when try to vectorize function call like lrintf.

void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf (in[i]);
}

lrintf.c:5:26: missed: couldn't vectorize loop
lrintf.c:5:26: missed: not vectorized: unsupported data-type

Then the standard name pattern like lrintmn2 cannot work for different
data type size like SF => DI. This patch would like to refine this data
type size check and unblock the standard name like lrintmn2 on conditions.

The type size of vectype_out need to be exactly the same as the type
size of vectype_in when the vectype_out size isn't participating in
the optab selection. While there is no such restriction when the
vectype_out is somehow a part of the optab query.

The below test are passed for this patch.

* The risc-v regression tests.
* Ensure the lrintf standard name in risc-v.

The below test are ongoing.

* The x86 bootstrap and regression test.
* The aarch64 regression test.

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_internal_function): Add type
	size check for vectype_out doesn't participating for optab query.
	(vectorizable_call): Remove the type size check.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/tree-vect-stmts.cc | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..799b4ab10c7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1420,8 +1420,17 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
       const direct_internal_fn_info &info = direct_internal_fn (ifn);
       if (info.vectorizable)
 	{
+	  bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
 	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
 	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
+
+	  /* The type size of both the vectype_in and vectype_out should be
+	     exactly the same when vectype_out isn't participating the optab.
+	     While there is no restriction for type size when vectype_out
+	     is part of the optab query.  */
+	  if (type0 != vectype_out && type1 != vectype_out && !same_size_p)
+	    return IFN_LAST;
+
 	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
 					      OPTIMIZE_FOR_SPEED))
 	    return ifn;
@@ -3361,19 +3370,6 @@ vectorizable_call (vec_info *vinfo,
 
       return false;
     }
-  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
-     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
-     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
-     by a pack of the two vectors into an SI vector.  We would need
-     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
-  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
-    {
-      if (dump_enabled_p ())
-	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			 "mismatched vector sizes %T and %T\n",
-			 vectype_in, vectype_out);
-      return false;
-    }
 
   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
-- 
2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4] VECT: Refine the type size restriction of call vectorizer
  2023-10-31 15:10 ` [PATCH v4] " pan2.li
  2023-10-31 23:36   ` Li, Pan2
@ 2023-11-01 16:43   ` Richard Biener
  2023-11-02  0:54     ` Li, Pan2
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Biener @ 2023-11-01 16:43 UTC (permalink / raw)
  To: pan2.li; +Cc: gcc-patches, juzhe.zhong, yanzhang.wang, kito.cheng, hongtao.liu



> Am 31.10.2023 um 16:10 schrieb pan2.li@intel.com:
> 
> From: Pan Li <pan2.li@intel.com>
> 
> Update in v4:
> 
> * Append the check to vectorizable_internal_function.
> 
> Update in v3:
> 
> * Add func to predicate type size is legal or not for vectorizer call.
> 
> Update in v2:
> 
> * Fix one ICE of type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
> 
> Original log:
> 
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
> 
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>  for (unsigned i = 0; i < count; i++)
>    out[i] = __builtin_lrintf (in[i]);
> }
> 
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
> 
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to refine this data
> type size check and unblock the standard name like lrintmn2 on conditions.
> 
> The type size of vectype_out need to be exactly the same as the type
> size of vectype_in when the vectype_out size isn't participating in
> the optab selection. While there is no such restriction when the
> vectype_out is somehow a part of the optab query.
> 
> The below test are passed for this patch.
> 
> * The risc-v regression tests.
> * Ensure the lrintf standard name in risc-v.
> 
> The below test are ongoing.
> 
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> 

Ok

Thanks,
Richard 

> gcc/ChangeLog:
> 
>    * tree-vect-stmts.cc (vectorizable_internal_function): Add type
>    size check for vectype_out doesn't participating for optab query.
>    (vectorizable_call): Remove the type size check.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
> gcc/tree-vect-stmts.cc | 22 +++++++++-------------
> 1 file changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..799b4ab10c7 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1420,8 +1420,17 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>       const direct_internal_fn_info &info = direct_internal_fn (ifn);
>       if (info.vectorizable)
>    {
> +      bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
>      tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
>      tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +
> +      /* The type size of both the vectype_in and vectype_out should be
> +         exactly the same when vectype_out isn't participating the optab.
> +         While there is no restriction for type size when vectype_out
> +         is part of the optab query.  */
> +      if (type0 != vectype_out && type1 != vectype_out && !same_size_p)
> +        return IFN_LAST;
> +
>      if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
>                          OPTIMIZE_FOR_SPEED))
>        return ifn;
> @@ -3361,19 +3370,6 @@ vectorizable_call (vec_info *vinfo,
> 
>       return false;
>     }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -             "mismatched vector sizes %T and %T\n",
> -             vectype_in, vectype_out);
> -      return false;
> -    }
> 
>   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH v4] VECT: Refine the type size restriction of call vectorizer
  2023-11-01 16:43   ` Richard Biener
@ 2023-11-02  0:54     ` Li, Pan2
  0 siblings, 0 replies; 20+ messages in thread
From: Li, Pan2 @ 2023-11-02  0:54 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, juzhe.zhong, Wang, Yanzhang, kito.cheng, Liu, Hongtao

Committed, thanks Richard.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: Thursday, November 2, 2023 12:43 AM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v4] VECT: Refine the type size restriction of call vectorizer



> Am 31.10.2023 um 16:10 schrieb pan2.li@intel.com:
> 
> From: Pan Li <pan2.li@intel.com>
> 
> Update in v4:
> 
> * Append the check to vectorizable_internal_function.
> 
> Update in v3:
> 
> * Add func to predicate type size is legal or not for vectorizer call.
> 
> Update in v2:
> 
> * Fix one ICE of type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
> 
> Original log:
> 
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
> 
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>  for (unsigned i = 0; i < count; i++)
>    out[i] = __builtin_lrintf (in[i]);
> }
> 
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
> 
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to refine this data
> type size check and unblock the standard name like lrintmn2 on conditions.
> 
> The type size of vectype_out need to be exactly the same as the type
> size of vectype_in when the vectype_out size isn't participating in
> the optab selection. While there is no such restriction when the
> vectype_out is somehow a part of the optab query.
> 
> The below test are passed for this patch.
> 
> * The risc-v regression tests.
> * Ensure the lrintf standard name in risc-v.
> 
> The below test are ongoing.
> 
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> 

Ok

Thanks,
Richard 

> gcc/ChangeLog:
> 
>    * tree-vect-stmts.cc (vectorizable_internal_function): Add type
>    size check for vectype_out doesn't participating for optab query.
>    (vectorizable_call): Remove the type size check.
> 
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
> gcc/tree-vect-stmts.cc | 22 +++++++++-------------
> 1 file changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..799b4ab10c7 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1420,8 +1420,17 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>       const direct_internal_fn_info &info = direct_internal_fn (ifn);
>       if (info.vectorizable)
>    {
> +      bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
>      tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
>      tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +
> +      /* The type size of both the vectype_in and vectype_out should be
> +         exactly the same when vectype_out isn't participating the optab.
> +         While there is no restriction for type size when vectype_out
> +         is part of the optab query.  */
> +      if (type0 != vectype_out && type1 != vectype_out && !same_size_p)
> +        return IFN_LAST;
> +
>      if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
>                          OPTIMIZE_FOR_SPEED))
>        return ifn;
> @@ -3361,19 +3370,6 @@ vectorizable_call (vec_info *vinfo,
> 
>       return false;
>     }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector.  We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs.  */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -             "mismatched vector sizes %T and %T\n",
> -             vectype_in, vectype_out);
> -      return false;
> -    }
> 
>   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-11-02  0:54 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-18  1:20 [PATCH v1] RISC-V: Remove the type size restriction of vectorizer pan2.li
2023-10-18  6:34 ` Richard Biener
2023-10-18 11:50   ` Li, Pan2
2023-10-20  8:43   ` Li, Pan2
2023-10-20  8:49     ` juzhe.zhong
2023-10-26  2:18 ` [PATCH v2] VECT: " pan2.li
2023-10-26  8:37   ` Richard Biener
2023-10-26 11:59     ` Li, Pan2
2023-10-26 13:59       ` Richard Biener
2023-10-26 14:42         ` Li, Pan2
2023-10-26 17:46         ` Richard Sandiford
2023-10-27  2:17           ` Li, Pan2
2023-10-27 13:31             ` Richard Biener
2023-10-30 12:22 ` [PATCH v3] VECT: Refine the type size restriction of call vectorizer pan2.li
2023-10-31 12:58   ` Richard Biener
2023-10-31 14:10     ` Li, Pan2
2023-10-31 15:10 ` [PATCH v4] " pan2.li
2023-10-31 23:36   ` Li, Pan2
2023-11-01 16:43   ` Richard Biener
2023-11-02  0:54     ` Li, Pan2

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).