public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/3] Optimize loongarch vector implementation.
@ 2023-10-16  2:00 Jiahao Xu
  2023-10-16  2:00 ` [PATCH 1/3] LoongArch:Implement avg and sad standard names Jiahao Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Jiahao Xu @ 2023-10-16  2:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: xry111, i, chenglulu, xuchenghua, Jiahao Xu

The following three patches further enhance LoongArch's vectorization capabilities.

Patch 1 adds LoongArch support for AVG_CEIL/AVG_FLOOR and for the
sum-of-absolute-differences (SAD) standard names.
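
For readers unfamiliar with the averaging operations, here is a minimal
scalar sketch of what AVG_FLOOR and AVG_CEIL compute for unsigned bytes.
The helper names avg_floor_u8 and avg_ceil_u8 are made up for this
illustration and do not appear in the patch; the point is only that the
sum is formed in a wider type so that it cannot overflow.

```c
#include <stdint.h>

/* Hypothetical scalar model of AVG_FLOOR: (a + b) >> 1, with the sum
   computed in a wider type so that it cannot wrap around.  */
static uint8_t
avg_floor_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((unsigned int) a + b) >> 1);
}

/* Hypothetical scalar model of AVG_CEIL: (a + b + 1) >> 1, i.e. the
   average rounded up instead of down.  */
static uint8_t
avg_ceil_u8 (uint8_t a, uint8_t b)
{
  return (uint8_t) (((unsigned int) a + b + 1) >> 1);
}
```

The loops in the new avg-floor and avg-ceil tests have this shape, which
is what lets the vectorizer recognize them as averaging operations.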

Patch 2 adds LoongArch support for the vec_widen_mult/add/sub_lo/hi patterns.

Patch 3 makes LoongArch use the new vector cost hooks and implements the
costing function determine_suggested_unroll_factor, so that the port can
suggest an unroll factor for a loop being vectorized, based on the vec_ops
analysis performed during vector costing and on the available issue
information.  The patch also adjusts the cost model based on performance
analysis.

Jiahao Xu (3):
  LoongArch:Implement avg and sad standard names.
  LoongArch:Implement vec_widen standard names.
  LoongArch:Implement the new vector cost model framework.

 gcc/config/loongarch/genopts/loongarch.opt.in |  15 +-
 gcc/config/loongarch/lasx.md                  | 156 ++++++++-
 gcc/config/loongarch/loongarch-protos.h       |   1 +
 gcc/config/loongarch/loongarch.cc             | 309 +++++++++++++++++-
 gcc/config/loongarch/loongarch.md             |   2 +
 gcc/config/loongarch/loongarch.opt            |  15 +-
 gcc/config/loongarch/lsx.md                   |  74 +++++
 gcc/doc/invoke.texi                           |   7 +
 .../gcc.target/loongarch/avg-ceil-lasx.c      |  22 ++
 .../gcc.target/loongarch/avg-ceil-lsx.c       |  22 ++
 .../gcc.target/loongarch/avg-floor-lasx.c     |  22 ++
 .../gcc.target/loongarch/avg-floor-lsx.c      |  22 ++
 gcc/testsuite/gcc.target/loongarch/sad-lasx.c |  20 ++
 gcc/testsuite/gcc.target/loongarch/sad-lsx.c  |  20 ++
 .../gcc.target/loongarch/vect-widen-add.c     |  26 ++
 .../gcc.target/loongarch/vect-widen-mul.c     |  26 ++
 .../gcc.target/loongarch/vect-widen-sub.c     |  26 ++
 17 files changed, 746 insertions(+), 39 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/avg-ceil-lasx.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/avg-ceil-lsx.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/avg-floor-lasx.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/avg-floor-lsx.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/sad-lasx.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/sad-lsx.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-widen-add.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-widen-mul.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-widen-sub.c

-- 
2.20.1



* [PATCH 1/3] LoongArch:Implement avg and sad standard names.
  2023-10-16  2:00 [PATCH 0/3] Optimize loongarch vector implementation Jiahao Xu
@ 2023-10-16  2:00 ` Jiahao Xu
  2023-10-16  2:00 ` [PATCH 2/3] LoongArch:Implement vec_widen " Jiahao Xu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Jiahao Xu @ 2023-10-16  2:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: xry111, i, chenglulu, xuchenghua, Jiahao Xu

gcc/ChangeLog:
        * config/loongarch/lasx.md (avg<mode>3_floor, uavg<mode>3_floor,
        avg<mode>3_ceil, uavg<mode>3_ceil, ssadv32qi, usadv32qi): New patterns.
        * config/loongarch/lsx.md (avg<mode>3_floor, uavg<mode>3_floor,
        avg<mode>3_ceil, uavg<mode>3_ceil, ssadv16qi, usadv16qi): New patterns.

gcc/testsuite/ChangeLog:
        * gcc.target/loongarch/avg-ceil-lasx.c: New test.
        * gcc.target/loongarch/avg-ceil-lsx.c: New test.
        * gcc.target/loongarch/avg-floor-lasx.c: New test.
        * gcc.target/loongarch/avg-floor-lsx.c: New test.
        * gcc.target/loongarch/sad-lasx.c: New test.
        * gcc.target/loongarch/sad-lsx.c: New test.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 2bc5d47ed4a..483d78bb210 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -5171,3 +5171,77 @@
 					      const0_rtx));
   DONE;
 })
+
+(define_expand "avg<mode>3_ceil"
+  [(match_operand:ILASX_WHB 0 "register_operand")
+   (match_operand:ILASX_WHB 1 "register_operand")
+   (match_operand:ILASX_WHB 2 "register_operand")]
+  "ISA_HAS_LASX"
+{
+  emit_insn (gen_lasx_xvavgr_s_<lasxfmt> (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "uavg<mode>3_ceil"
+  [(match_operand:ILASX_WHB 0 "register_operand")
+   (match_operand:ILASX_WHB 1 "register_operand")
+   (match_operand:ILASX_WHB 2 "register_operand")]
+  "ISA_HAS_LASX"
+{
+  emit_insn (gen_lasx_xvavgr_u_<lasxfmt_u> (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "avg<mode>3_floor"
+  [(match_operand:ILASX_WHB 0 "register_operand")
+   (match_operand:ILASX_WHB 1 "register_operand")
+   (match_operand:ILASX_WHB 2 "register_operand")]
+  "ISA_HAS_LASX"
+{
+  emit_insn (gen_lasx_xvavg_s_<lasxfmt> (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "uavg<mode>3_floor"
+  [(match_operand:ILASX_WHB 0 "register_operand")
+   (match_operand:ILASX_WHB 1 "register_operand")
+   (match_operand:ILASX_WHB 2 "register_operand")]
+  "ISA_HAS_LASX"
+{
+  emit_insn (gen_lasx_xvavg_u_<lasxfmt_u> (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "usadv32qi"
+  [(match_operand:V8SI 0 "register_operand")
+   (match_operand:V32QI 1 "register_operand")
+   (match_operand:V32QI 2 "register_operand")
+   (match_operand:V8SI 3 "register_operand")]
+  "ISA_HAS_LASX"
+{
+  rtx t1 = gen_reg_rtx (V32QImode);
+  rtx t2 = gen_reg_rtx (V16HImode);
+  rtx t3 = gen_reg_rtx (V8SImode);
+  emit_insn (gen_lasx_xvabsd_u_bu (t1, operands[1], operands[2]));
+  emit_insn (gen_lasx_xvhaddw_h_b (t2, t1, t1));
+  emit_insn (gen_lasx_xvhaddw_w_h (t3, t2, t2));
+  emit_insn (gen_addv8si3 (operands[0], t3, operands[3]));
+  DONE;
+})
+
+(define_expand "ssadv32qi"
+  [(match_operand:V8SI 0 "register_operand")
+   (match_operand:V32QI 1 "register_operand")
+   (match_operand:V32QI 2 "register_operand")
+   (match_operand:V8SI 3 "register_operand")]
+  "ISA_HAS_LASX"
+{
+  rtx t1 = gen_reg_rtx (V32QImode);
+  rtx t2 = gen_reg_rtx (V16HImode);
+  rtx t3 = gen_reg_rtx (V8SImode);
+  emit_insn (gen_lasx_xvabsd_s_b (t1, operands[1], operands[2]));
+  emit_insn (gen_lasx_xvhaddw_h_b (t2, t1, t1));
+  emit_insn (gen_lasx_xvhaddw_w_h (t3, t2, t2));
+  emit_insn (gen_addv8si3 (operands[0], t3, operands[3]));
+  DONE;
+})
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 075f6ba569d..b63c6ff4dee 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -3581,6 +3581,80 @@
   DONE;
 })
 
+(define_expand "avg<mode>3_ceil"
+  [(match_operand:ILSX_WHB 0 "register_operand")
+   (match_operand:ILSX_WHB 1 "register_operand")
+   (match_operand:ILSX_WHB 2 "register_operand")]
+  "ISA_HAS_LSX"
+{
+  emit_insn (gen_lsx_vavgr_s_<lsxfmt> (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "uavg<mode>3_ceil"
+  [(match_operand:ILSX_WHB 0 "register_operand")
+   (match_operand:ILSX_WHB 1 "register_operand")
+   (match_operand:ILSX_WHB 2 "register_operand")]
+  "ISA_HAS_LSX"
+{
+  emit_insn (gen_lsx_vavgr_u_<lsxfmt_u> (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "avg<mode>3_floor"
+  [(match_operand:ILSX_WHB 0 "register_operand")
+   (match_operand:ILSX_WHB 1 "register_operand")
+   (match_operand:ILSX_WHB 2 "register_operand")]
+  "ISA_HAS_LSX"
+{
+  emit_insn (gen_lsx_vavg_s_<lsxfmt> (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "uavg<mode>3_floor"
+  [(match_operand:ILSX_WHB 0 "register_operand")
+   (match_operand:ILSX_WHB 1 "register_operand")
+   (match_operand:ILSX_WHB 2 "register_operand")]
+  "ISA_HAS_LSX"
+{
+  emit_insn (gen_lsx_vavg_u_<lsxfmt_u> (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "usadv16qi"
+  [(match_operand:V4SI 0 "register_operand")
+   (match_operand:V16QI 1 "register_operand")
+   (match_operand:V16QI 2 "register_operand")
+   (match_operand:V4SI 3 "register_operand")]
+  "ISA_HAS_LSX"
+{
+  rtx t1 = gen_reg_rtx (V16QImode);
+  rtx t2 = gen_reg_rtx (V8HImode);
+  rtx t3 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_lsx_vabsd_u_bu (t1, operands[1], operands[2]));
+  emit_insn (gen_lsx_vhaddw_h_b (t2, t1, t1));
+  emit_insn (gen_lsx_vhaddw_w_h (t3, t2, t2));
+  emit_insn (gen_addv4si3 (operands[0], t3, operands[3]));
+  DONE;
+})
+
+(define_expand "ssadv16qi"
+  [(match_operand:V4SI 0 "register_operand")
+   (match_operand:V16QI 1 "register_operand")
+   (match_operand:V16QI 2 "register_operand")
+   (match_operand:V4SI 3 "register_operand")]
+  "ISA_HAS_LSX"
+{
+  rtx t1 = gen_reg_rtx (V16QImode);
+  rtx t2 = gen_reg_rtx (V8HImode);
+  rtx t3 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_lsx_vabsd_s_b (t1, operands[1], operands[2]));
+  emit_insn (gen_lsx_vhaddw_h_b (t2, t1, t1));
+  emit_insn (gen_lsx_vhaddw_w_h (t3, t2, t2));
+  emit_insn (gen_addv4si3 (operands[0], t3, operands[3]));
+  DONE;
+})
+
 (define_insn "lsx_v<optab>wev_d_w<u>"
   [(set (match_operand:V2DI 0 "register_operand" "=f")
 	(addsubmul:V2DI
diff --git a/gcc/testsuite/gcc.target/loongarch/avg-ceil-lasx.c b/gcc/testsuite/gcc.target/loongarch/avg-ceil-lasx.c
new file mode 100644
index 00000000000..a4fc7a63f97
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/avg-ceil-lasx.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mlasx" } */
+/* { dg-final { scan-assembler "xvavgr.b" } } */
+/* { dg-final { scan-assembler "xvavgr.bu" } } */
+/* { dg-final { scan-assembler "xvavgr.hu" } } */
+/* { dg-final { scan-assembler "xvavgr.h" } } */
+
+#define N 1024
+
+#define TEST(TYPE, NAME)                                        \
+  TYPE a_##NAME[N], b_##NAME[N], c_##NAME[N];                   \
+  void f_##NAME (void)                                          \
+  {                                                             \
+    int i;                                                      \
+    for (i = 0; i < N; i++)                                     \
+      a_##NAME[i] = (b_##NAME[i] + c_##NAME[i] + 1) >> 1;       \
+  }
+                                                                 
+TEST(char, 1);
+TEST(short, 2);
+TEST(unsigned char, 3);
+TEST(unsigned short, 4);
diff --git a/gcc/testsuite/gcc.target/loongarch/avg-ceil-lsx.c b/gcc/testsuite/gcc.target/loongarch/avg-ceil-lsx.c
new file mode 100644
index 00000000000..7aae01600d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/avg-ceil-lsx.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mlsx" } */
+/* { dg-final { scan-assembler "vavgr.b" } } */
+/* { dg-final { scan-assembler "vavgr.bu" } } */
+/* { dg-final { scan-assembler "vavgr.hu" } } */
+/* { dg-final { scan-assembler "vavgr.h" } } */
+
+#define N 1024
+
+#define TEST(TYPE, NAME)                                        \
+  TYPE a_##NAME[N], b_##NAME[N], c_##NAME[N];                   \
+  void f_##NAME (void)                                          \
+  {                                                             \
+    int i;                                                      \
+    for (i = 0; i < N; i++)                                     \
+      a_##NAME[i] = (b_##NAME[i] + c_##NAME[i] + 1) >> 1;       \
+  }
+                                                                 
+TEST(char, 1);
+TEST(short, 2);
+TEST(unsigned char, 3);
+TEST(unsigned short, 4);
diff --git a/gcc/testsuite/gcc.target/loongarch/avg-floor-lasx.c b/gcc/testsuite/gcc.target/loongarch/avg-floor-lasx.c
new file mode 100644
index 00000000000..da6956f6f91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/avg-floor-lasx.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mlasx" } */
+/* { dg-final { scan-assembler "xvavg.b" } } */
+/* { dg-final { scan-assembler "xvavg.bu" } } */
+/* { dg-final { scan-assembler "xvavg.hu" } } */
+/* { dg-final { scan-assembler "xvavg.h" } } */
+
+#define N 1024
+
+#define TEST(TYPE, NAME)                                        \
+  TYPE a_##NAME[N], b_##NAME[N], c_##NAME[N];                   \
+  void f_##NAME (void)                                          \
+  {                                                             \
+    int i;                                                      \
+    for (i = 0; i < N; i++)                                     \
+      a_##NAME[i] = (b_##NAME[i] + c_##NAME[i]) >> 1;           \
+  }
+                                                                 
+TEST(char, 1);
+TEST(short, 2);
+TEST(unsigned char, 3);
+TEST(unsigned short, 4);
diff --git a/gcc/testsuite/gcc.target/loongarch/avg-floor-lsx.c b/gcc/testsuite/gcc.target/loongarch/avg-floor-lsx.c
new file mode 100644
index 00000000000..d16c23ac0cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/avg-floor-lsx.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mlsx" } */
+/* { dg-final { scan-assembler "vavg.b" } } */
+/* { dg-final { scan-assembler "vavg.bu" } } */
+/* { dg-final { scan-assembler "vavg.hu" } } */
+/* { dg-final { scan-assembler "vavg.h" } } */
+
+#define N 1024
+
+#define TEST(TYPE, NAME)                                        \
+  TYPE a_##NAME[N], b_##NAME[N], c_##NAME[N];                   \
+  void f_##NAME (void)                                          \
+  {                                                             \
+    int i;                                                      \
+    for (i = 0; i < N; i++)                                     \
+      a_##NAME[i] = (b_##NAME[i] + c_##NAME[i]) >> 1;           \
+  }
+                                                                 
+TEST(char, 1);
+TEST(short, 2);
+TEST(unsigned char, 3);
+TEST(unsigned short, 4);
diff --git a/gcc/testsuite/gcc.target/loongarch/sad-lasx.c b/gcc/testsuite/gcc.target/loongarch/sad-lasx.c
new file mode 100644
index 00000000000..47ca4039489
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/sad-lasx.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */                               
+/* { dg-options "-O3 -mlasx" } */
+
+#define N 1024
+
+#define TEST(SIGN)                                             \
+  SIGN char a_##SIGN[N], b_##SIGN[N];                          \
+  int f_##SIGN (void)                                          \
+  {                                                            \
+    int i, sum = 0;                                            \
+    for (i = 0; i < N; i++)                                    \
+      sum += __builtin_abs (a_##SIGN[i] - b_##SIGN[i]);        \
+    return sum;                                                \
+  }
+
+TEST(signed);
+TEST(unsigned);
+
+/* { dg-final { scan-assembler {\txvabsd.bu\t} } } */
+/* { dg-final { scan-assembler {\txvabsd.b\t} } } */
diff --git a/gcc/testsuite/gcc.target/loongarch/sad-lsx.c b/gcc/testsuite/gcc.target/loongarch/sad-lsx.c
new file mode 100644
index 00000000000..2aadf3d9309
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/sad-lsx.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */                               
+/* { dg-options "-O3 -mlsx" } */
+
+#define N 1024
+
+#define TEST(SIGN)                                             \
+  SIGN char a_##SIGN[N], b_##SIGN[N];                          \
+  int f_##SIGN (void)                                          \
+  {                                                            \
+    int i, sum = 0;                                            \
+    for (i = 0; i < N; i++)                                    \
+      sum += __builtin_abs (a_##SIGN[i] - b_##SIGN[i]);        \
+    return sum;                                                \
+  }
+
+TEST(signed);
+TEST(unsigned);
+
+/* { dg-final { scan-assembler {\tvabsd.bu\t} } } */
+/* { dg-final { scan-assembler {\tvabsd.b\t} } } */
-- 
2.20.1



* [PATCH 2/3] LoongArch:Implement vec_widen standard names.
  2023-10-16  2:00 [PATCH 0/3] Optimize loongarch vector implementation Jiahao Xu
  2023-10-16  2:00 ` [PATCH 1/3] LoongArch:Implement avg and sad standard names Jiahao Xu
@ 2023-10-16  2:00 ` Jiahao Xu
  2023-10-16  2:00 ` [PATCH 3/3] LoongArch:Implement the new vector cost model framework Jiahao Xu
  2023-10-19  6:16 ` Re:[pushed] [PATCH 0/3] Optimize loongarch vector implementation chenglulu
  3 siblings, 0 replies; 5+ messages in thread
From: Jiahao Xu @ 2023-10-16  2:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: xry111, i, chenglulu, xuchenghua, Jiahao Xu

Add support for the vec_widen lo/hi patterns.  These do not map directly
onto LoongArch LASX instructions, but they can be emulated with the
even/odd widening instructions followed by a vector merge.
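
To make the emulation concrete, here is a simplified scalar sketch for
the widening multiply.  It is not the code in the patch: the names NELT,
widen_mul_even_odd and widen_mul_hilo are invented for this example, and
it ignores the two 128-bit lanes of a real LASX register.  It only shows
why interleaving one half of the even results with the same half of the
odd results yields the lo (or hi) products in element order.

```c
#include <stddef.h>
#include <stdint.h>

#define NELT 8	/* Illustrative element count, not the real LASX width.  */

/* Widening multiply of the even (odd == 0) or odd (odd == 1) input
   elements, modelling an xvmulwev/xvmulwod style operation.  */
static void
widen_mul_even_odd (int32_t out[NELT / 2], const int16_t a[NELT],
		    const int16_t b[NELT], int odd)
{
  for (size_t i = 0; i < NELT / 2; i++)
    out[i] = (int32_t) a[2 * i + odd] * b[2 * i + odd];
}

/* Emulate the lo (high == 0) or hi (high == 1) widening multiply by
   interleaving the matching half of the even and odd results.  */
static void
widen_mul_hilo (int32_t out[NELT / 2], const int16_t a[NELT],
		const int16_t b[NELT], int high)
{
  int32_t even[NELT / 2], odd[NELT / 2];
  size_t base = high ? NELT / 4 : 0;

  widen_mul_even_odd (even, a, b, 0);
  widen_mul_even_odd (odd, a, b, 1);
  for (size_t i = 0; i < NELT / 4; i++)
    {
      out[2 * i] = even[base + i];
      out[2 * i + 1] = odd[base + i];
    }
}
```

With high == 0 the output holds the products of input elements 0..3 in
order, and with high == 1 those of elements 4..7, which is the layout
the lo/hi patterns are expected to produce.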

gcc/ChangeLog:
        * config/loongarch/lasx.md (vec_widen_<su>add_hi_<mode>,
        vec_widen_<su>add_lo_<mode>, vec_widen_<su>sub_hi_<mode>,
        vec_widen_<su>sub_lo_<mode>, vec_widen_<su>mult_hi_<mode>,
        vec_widen_<su>mult_lo_<mode>): New patterns.
        * config/loongarch/loongarch.cc (loongarch_expand_vec_widen_hilo):
        New function.

gcc/testsuite/ChangeLog:
        * gcc.target/loongarch/vect-widen-add.c: New test.
        * gcc.target/loongarch/vect-widen-sub.c: New test.
        * gcc.target/loongarch/vect-widen-mul.c: New test.

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 483d78bb210..02c6019e1dd 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -5048,23 +5048,71 @@
   [(set_attr "type" "simd_store")
    (set_attr "mode" "DI")])
 
-(define_insn "vec_widen_<su>mult_even_v8si"
-  [(set (match_operand:V4DI 0 "register_operand" "=f")
-    (mult:V4DI
-      (any_extend:V4DI
-        (vec_select:V4SI
-          (match_operand:V8SI 1 "register_operand" "%f")
-          (parallel [(const_int 0) (const_int 2)
-                         (const_int 4) (const_int 6)])))
-      (any_extend:V4DI
-        (vec_select:V4SI
-          (match_operand:V8SI 2 "register_operand" "f")
-          (parallel [(const_int 0) (const_int 2)
-             (const_int 4) (const_int 6)])))))]
-  "ISA_HAS_LASX"
-  "xvmulwev.d.w<u>\t%u0,%u1,%u2"
-  [(set_attr "type" "simd_int_arith")
-   (set_attr "mode" "V4DI")])
+(define_expand "vec_widen_<su>add_hi_<mode>"
+  [(match_operand:<VDMODE256> 0 "register_operand")
+   (any_extend:<VDMODE256> (match_operand:ILASX_HB 1 "register_operand"))
+   (any_extend:<VDMODE256> (match_operand:ILASX_HB 2 "register_operand"))]
+  "ISA_HAS_LASX"
+{
+  loongarch_expand_vec_widen_hilo (operands[0], operands[1], operands[2],
+                        <u_bool>, true, "add");
+  DONE;
+})
+
+(define_expand "vec_widen_<su>add_lo_<mode>"
+  [(match_operand:<VDMODE256> 0 "register_operand")
+   (any_extend:<VDMODE256> (match_operand:ILASX_HB 1 "register_operand"))
+   (any_extend:<VDMODE256> (match_operand:ILASX_HB 2 "register_operand"))]
+  "ISA_HAS_LASX"
+{
+  loongarch_expand_vec_widen_hilo (operands[0], operands[1], operands[2],
+                        <u_bool>, false, "add");
+  DONE;
+})
+
+(define_expand "vec_widen_<su>sub_hi_<mode>"
+  [(match_operand:<VDMODE256> 0 "register_operand")
+   (any_extend:<VDMODE256> (match_operand:ILASX_HB 1 "register_operand"))
+   (any_extend:<VDMODE256> (match_operand:ILASX_HB 2 "register_operand"))]
+  "ISA_HAS_LASX"
+{
+  loongarch_expand_vec_widen_hilo (operands[0], operands[1], operands[2],
+                        <u_bool>, true, "sub");
+  DONE;
+})
+
+(define_expand "vec_widen_<su>sub_lo_<mode>"
+  [(match_operand:<VDMODE256> 0 "register_operand")
+   (any_extend:<VDMODE256> (match_operand:ILASX_HB 1 "register_operand"))
+   (any_extend:<VDMODE256> (match_operand:ILASX_HB 2 "register_operand"))]
+  "ISA_HAS_LASX"
+{
+  loongarch_expand_vec_widen_hilo (operands[0], operands[1], operands[2],
+                        <u_bool>, false, "sub");
+  DONE;
+})
+
+(define_expand "vec_widen_<su>mult_hi_<mode>"
+  [(match_operand:<VDMODE256> 0 "register_operand")
+   (any_extend:<VDMODE256> (match_operand:ILASX_HB 1 "register_operand"))
+   (any_extend:<VDMODE256> (match_operand:ILASX_HB 2 "register_operand"))]
+  "ISA_HAS_LASX"
+{
+  loongarch_expand_vec_widen_hilo (operands[0], operands[1], operands[2],
+                        <u_bool>, true, "mult");
+  DONE;
+})
+
+(define_expand "vec_widen_<su>mult_lo_<mode>"
+  [(match_operand:<VDMODE256> 0 "register_operand")
+   (any_extend:<VDMODE256> (match_operand:ILASX_HB 1 "register_operand"))
+   (any_extend:<VDMODE256> (match_operand:ILASX_HB 2 "register_operand"))]
+  "ISA_HAS_LASX"
+{
+  loongarch_expand_vec_widen_hilo (operands[0], operands[1], operands[2],
+                        <u_bool>, false, "mult");
+  DONE;
+})
 
 ;; Vector reduction operation
 (define_expand "reduc_plus_scal_v4di"
diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h
index 251011c5414..72ae9918b09 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -205,6 +205,7 @@ extern void loongarch_register_frame_header_opt (void);
 extern void loongarch_expand_vec_cond_expr (machine_mode, machine_mode, rtx *);
 extern void loongarch_expand_vec_cond_mask_expr (machine_mode, machine_mode,
 						 rtx *);
+extern void loongarch_expand_vec_widen_hilo (rtx, rtx, rtx, bool, bool, const char *);
 
 /* Routines implemented in loongarch-c.c.  */
 void loongarch_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 9e1b0d0cfa8..472f8fd37c9 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -8032,6 +8032,142 @@ loongarch_expand_vec_perm_even_odd (struct expand_vec_perm_d *d)
   return loongarch_expand_vec_perm_even_odd_1 (d, odd);
 }
 
+static void
+loongarch_expand_vec_interleave (rtx target, rtx op0, rtx op1, bool high_p)
+{
+  struct expand_vec_perm_d d;
+  unsigned i, nelt, base;
+  bool ok;
+
+  d.target = target;
+  d.op0 = op0;
+  d.op1 = op1;
+  d.vmode = GET_MODE (target);
+  d.nelt = nelt = GET_MODE_NUNITS (d.vmode);
+  d.one_vector_p = false;
+  d.testing_p = false;
+
+  base = high_p ? nelt / 2 : 0;
+  for (i = 0; i < nelt / 2; ++i)
+    {
+      d.perm[i * 2] = i + base;
+      d.perm[i * 2 + 1] = i + base + nelt;
+    }
+
+  ok = loongarch_expand_vec_perm_interleave (&d);
+  gcc_assert (ok);
+}
+
+/* The LASX instructions xv{add,sub,mul}wev and xv{add,sub,mul}wod put the
+   widened results for the even or odd source elements in the corresponding
+   elements of the target register.  That is NOT what the vec_widen_*_lo/hi
+   patterns expect, so emulate them with even/odd ops plus a vector merge.  */
+
+void
+loongarch_expand_vec_widen_hilo (rtx dest, rtx op1, rtx op2,
+				 bool uns_p, bool high_p, const char *optab)
+{
+  machine_mode wmode = GET_MODE (dest);
+  machine_mode mode = GET_MODE (op1);
+  rtx t1, t2, t3;
+
+  t1 = gen_reg_rtx (wmode);
+  t2 = gen_reg_rtx (wmode);
+  t3 = gen_reg_rtx (wmode);
+  switch (mode)
+    {
+    case V16HImode:
+      if (!strcmp (optab, "add"))
+	{
+	  if (!uns_p)
+	    {
+	      emit_insn (gen_lasx_xvaddwev_w_h (t1, op1, op2));
+	      emit_insn (gen_lasx_xvaddwod_w_h (t2, op1, op2));
+	    }
+	  else
+	    {
+	      emit_insn (gen_lasx_xvaddwev_w_hu (t1, op1, op2));
+	      emit_insn (gen_lasx_xvaddwod_w_hu (t2, op1, op2));
+	    }
+	}
+      else if (!strcmp (optab, "mult"))
+	{
+	  if (!uns_p)
+	    {
+	      emit_insn (gen_lasx_xvmulwev_w_h (t1, op1, op2));
+	      emit_insn (gen_lasx_xvmulwod_w_h (t2, op1, op2));
+	    }
+	  else
+	    {
+	      emit_insn (gen_lasx_xvmulwev_w_hu (t1, op1, op2));
+	      emit_insn (gen_lasx_xvmulwod_w_hu (t2, op1, op2));
+	    }
+	}
+      else if (!strcmp (optab, "sub"))
+	{
+	  if (!uns_p)
+	    {
+	      emit_insn (gen_lasx_xvsubwev_w_h (t1, op1, op2));
+	      emit_insn (gen_lasx_xvsubwod_w_h (t2, op1, op2));
+	    }
+	  else
+	    {
+	      emit_insn (gen_lasx_xvsubwev_w_hu (t1, op1, op2));
+	      emit_insn (gen_lasx_xvsubwod_w_hu (t2, op1, op2));
+	    }
+	}
+      break;
+
+    case V32QImode:
+      if (!strcmp (optab, "add"))
+	{
+	  if (!uns_p)
+	    {
+	      emit_insn (gen_lasx_xvaddwev_h_b (t1, op1, op2));
+	      emit_insn (gen_lasx_xvaddwod_h_b (t2, op1, op2));
+	    }
+	  else
+	    {
+	      emit_insn (gen_lasx_xvaddwev_h_bu (t1, op1, op2));
+	      emit_insn (gen_lasx_xvaddwod_h_bu (t2, op1, op2));
+	    }
+	}
+      else if (!strcmp (optab, "mult"))
+	{
+	  if (!uns_p)
+	    {
+	      emit_insn (gen_lasx_xvmulwev_h_b (t1, op1, op2));
+	      emit_insn (gen_lasx_xvmulwod_h_b (t2, op1, op2));
+	    }
+	  else
+	    {
+	      emit_insn (gen_lasx_xvmulwev_h_bu (t1, op1, op2));
+	      emit_insn (gen_lasx_xvmulwod_h_bu (t2, op1, op2));
+	    }
+	}
+      else if (!strcmp (optab, "sub"))
+	{
+	  if (!uns_p)
+	    {
+	      emit_insn (gen_lasx_xvsubwev_h_b (t1, op1, op2));
+	      emit_insn (gen_lasx_xvsubwod_h_b (t2, op1, op2));
+	    }
+	  else
+	    {
+	      emit_insn (gen_lasx_xvsubwev_h_bu (t1, op1, op2));
+	      emit_insn (gen_lasx_xvsubwod_h_bu (t2, op1, op2));
+	    }
+	}
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  loongarch_expand_vec_interleave (t3, t1, t2, high_p);
+  emit_move_insn (dest, gen_lowpart (wmode, t3));
+}
+
 /* Expand a variable vector permutation for LASX.  */
 
 void
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 3286b0c56ae..a76bf2c6c9f 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -509,6 +509,8 @@
 ;; <su> is like <u>, but the signed form expands to "s" rather than "".
 (define_code_attr su [(sign_extend "s") (zero_extend "u")])
 
+(define_code_attr u_bool [(sign_extend "false") (zero_extend "true")])
+
 ;; <optab> expands to the name of the optab for a particular code.
 (define_code_attr optab [(ashift "ashl")
 			 (ashiftrt "ashr")
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-widen-add.c b/gcc/testsuite/gcc.target/loongarch/vect-widen-add.c
new file mode 100644
index 00000000000..2d273adaf92
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-widen-add.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */                                            
+/* { dg-options "-O3 -mlasx" } */
+/* { dg-final { scan-assembler "xvaddwev.w.h"  } } */
+/* { dg-final { scan-assembler "xvaddwod.w.h"  } } */
+/* { dg-final { scan-assembler "xvaddwev.w.hu"  } } */
+/* { dg-final { scan-assembler "xvaddwod.w.hu"  } } */
+
+#include <stdint.h>
+
+#define SIZE 1024
+
+void wide_uadd (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+  for ( int i = 0; i < SIZE; i++)
+    {
+      foo[i]   = a[i] + b[i];
+    }
+}
+
+void wide_sadd (int32_t *foo, int16_t *a, int16_t *b)
+{
+  for ( int i = 0; i < SIZE; i++)
+    {
+      foo[i]   = a[i] + b[i];
+    }
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-widen-mul.c b/gcc/testsuite/gcc.target/loongarch/vect-widen-mul.c
new file mode 100644
index 00000000000..282a168369e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-widen-mul.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */                                            
+/* { dg-options "-O3 -mlasx" } */
+/* { dg-final { scan-assembler "xvmulwev.w.h"  } } */
+/* { dg-final { scan-assembler "xvmulwod.w.h"  } } */
+/* { dg-final { scan-assembler "xvmulwev.w.hu"  } } */
+/* { dg-final { scan-assembler "xvmulwod.w.hu"  } } */
+
+#include <stdint.h>
+
+#define SIZE 1024
+
+void wide_umul (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+  for ( int i = 0; i < SIZE; i++)
+    {
+      foo[i]   = a[i] * b[i];
+    }
+}
+
+void wide_smul (int32_t *foo, int16_t *a, int16_t *b)
+{
+  for ( int i = 0; i < SIZE; i++)
+    {
+      foo[i]   = a[i] * b[i];
+    }
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/vect-widen-sub.c b/gcc/testsuite/gcc.target/loongarch/vect-widen-sub.c
new file mode 100644
index 00000000000..30cc2206b81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vect-widen-sub.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */                                            
+/* { dg-options "-O3 -mlasx" } */
+/* { dg-final { scan-assembler "xvsubwev.w.h"  } } */
+/* { dg-final { scan-assembler "xvsubwod.w.h"  } } */
+/* { dg-final { scan-assembler "xvsubwev.w.hu"  } } */
+/* { dg-final { scan-assembler "xvsubwod.w.hu"  } } */
+
+#include <stdint.h>
+
+#define SIZE 1024
+
+void wide_usub (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+  for ( int i = 0; i < SIZE; i++)
+    {
+      foo[i]   = a[i] - b[i];
+    }
+}
+
+void wide_ssub (int32_t *foo, int16_t *a, int16_t *b)
+{
+  for ( int i = 0; i < SIZE; i++)
+    {
+      foo[i]   = a[i] - b[i];
+    }
+}
-- 
2.20.1



* [PATCH 3/3] LoongArch:Implement the new vector cost model framework.
  2023-10-16  2:00 [PATCH 0/3] Optimize loongarch vector implementation Jiahao Xu
  2023-10-16  2:00 ` [PATCH 1/3] LoongArch:Implement avg and sad standard names Jiahao Xu
  2023-10-16  2:00 ` [PATCH 2/3] LoongArch:Implement vec_widen " Jiahao Xu
@ 2023-10-16  2:00 ` Jiahao Xu
  2023-10-19  6:16 ` Re:[pushed] [PATCH 0/3] Optimize loongarch vector implementation chenglulu
  3 siblings, 0 replies; 5+ messages in thread
From: Jiahao Xu @ 2023-10-16  2:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: xry111, i, chenglulu, xuchenghua, Jiahao Xu

This patch makes LoongArch use the new vector cost hooks and implements the
costing function determine_suggested_unroll_factor, so that the port can
suggest an unroll factor for a loop being vectorized, based on the vec_ops
analysis performed during vector costing and on the available issue
information.  The implementation follows the aarch64 and rs6000 ports.
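
As a rough illustration of the idea only, and not the heuristic actually
implemented by determine_suggested_unroll_factor, the sketch below
assumes a very simple model: unroll until the non-memory vector
operations of the loop body fill the issue width, then clamp to the
unroll limit.  The function name suggest_unroll_factor and its
parameters are hypothetical; the values 4 and 6 mentioned below are the
defaults of loongarch-vect-issue-info and loongarch-vect-unroll-limit
from this patch.

```c
/* Hypothetical model: NONMEM_OPS is the number of vector statements in
   the loop body that are neither loads nor stores, ISSUE_WIDTH is how
   many such instructions can issue per cycle, and UNROLL_LIMIT caps the
   result.  Both ISSUE_WIDTH and UNROLL_LIMIT are assumed to be >= 1.  */
static unsigned int
suggest_unroll_factor (unsigned int nonmem_ops, unsigned int issue_width,
		       unsigned int unroll_limit)
{
  /* Nothing but memory accesses: unrolling is assumed not to help.  */
  if (nonmem_ops == 0)
    return 1;

  /* Ceiling division: copies of the body needed to fill the issue width.  */
  unsigned int uf = (issue_width + nonmem_ops - 1) / nonmem_ops;

  /* Never suggest more than the configured limit.  */
  if (uf > unroll_limit)
    uf = unroll_limit;

  return uf;
}
```

Under this model, with the default issue width of 4 and limit of 6, a
loop body with a single non-memory vector operation would be unrolled 4
times, and one with three such operations twice.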

The patch also reduces the cost of unaligned stores, making it equal to the
cost of aligned ones in order to avoid odd alignment peeling.

gcc/ChangeLog:
        * config/loongarch/loongarch.cc (loongarch_vector_costs): Inherit from
        vector_costs.  Add a constructor.
        (loongarch_vector_costs::add_stmt_cost): Use adjust_cost_for_freq to
        adjust the cost for inner loops.
        (loongarch_vector_costs::count_operations): New function.
        (loongarch_vector_costs::determine_suggested_unroll_factor): Ditto.
        (loongarch_vector_costs::finish_cost): Ditto.
        (loongarch_builtin_vectorization_cost): Adjust.
        * config/loongarch/loongarch.opt (loongarch-vect-unroll-limit): New parameter.
        (loongarch-vect-issue-info): Ditto.
        (mmemvec-cost): Delete.
        * doc/invoke.texi (loongarch-vect-unroll-limit): Document new option.

diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in
index 9f98f2d845a..4a2d7438f1b 100644
--- a/gcc/config/loongarch/genopts/loongarch.opt.in
+++ b/gcc/config/loongarch/genopts/loongarch.opt.in
@@ -146,10 +146,6 @@ mbranch-cost=
 Target RejectNegative Joined UInteger Var(loongarch_branch_cost)
 -mbranch-cost=COST	Set the cost of branches to roughly COST instructions.
 
-mmemvec-cost=
-Target RejectNegative Joined UInteger Var(loongarch_vector_access_cost) IntegerRange(1, 5)
-mmemvec-cost=COST      Set the cost of vector memory access instructions.
-
 mcheck-zero-division
 Target Mask(CHECK_ZERO_DIV)
 Trap on integer divide by zero.
@@ -213,3 +209,14 @@ mrelax
 Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION)
 Take advantage of linker relaxations to reduce the number of instructions
 required to materialize symbol addresses.
+
+-param=loongarch-vect-unroll-limit=
+Target Joined UInteger Var(loongarch_vect_unroll_limit) Init(6) IntegerRange(1, 64) Param
+Used to limit unroll factor which indicates how much the autovectorizer may
+unroll a loop.  The default value is 6.
+
+-param=loongarch-vect-issue-info=
+Target Undocumented Joined UInteger Var(loongarch_vect_issue_info) Init(4) IntegerRange(1, 64) Param
+Indicate how many non-memory-access vector instructions can be issued per
+cycle; used in unroll factor determination for the autovectorizer.  The
+default value is 4.
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 472f8fd37c9..cfd35a63ff1 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -65,6 +65,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl-iter.h"
 #include "opts.h"
 #include "function-abi.h"
+#include "cfgloop.h"
+#include "tree-vectorizer.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -3845,8 +3847,6 @@ loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
     }
 }
 
-/* Vectorizer cost model implementation.  */
-
 /* Implement targetm.vectorize.builtin_vectorization_cost.  */
 
 static int
@@ -3865,36 +3865,182 @@ loongarch_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
       case vector_load:
       case vec_to_scalar:
       case scalar_to_vec:
-      case cond_branch_not_taken:
-      case vec_promote_demote:
       case scalar_store:
       case vector_store:
 	return 1;
 
+      case vec_promote_demote:
       case vec_perm:
 	return LASX_SUPPORTED_MODE_P (mode)
 	  && !LSX_SUPPORTED_MODE_P (mode) ? 2 : 1;
 
       case unaligned_load:
-      case vector_gather_load:
-	return 2;
-
       case unaligned_store:
-      case vector_scatter_store:
-	return 10;
+	return 2;
 
       case cond_branch_taken:
-	return 3;
+	return 4;
+
+      case cond_branch_not_taken:
+	return 2;
 
       case vec_construct:
 	elements = TYPE_VECTOR_SUBPARTS (vectype);
-	return elements / 2 + 1;
+	if (ISA_HAS_LASX)
+	  return elements + 1;
+	else
+	  return elements;
 
       default:
 	gcc_unreachable ();
     }
 }
 
+class loongarch_vector_costs : public vector_costs
+{
+public:
+  using vector_costs::vector_costs;
+
+  unsigned int add_stmt_cost (int count, vect_cost_for_stmt kind,
+			      stmt_vec_info stmt_info, slp_tree, tree vectype,
+			      int misalign,
+			      vect_cost_model_location where) override;
+  void finish_cost (const vector_costs *) override;
+
+protected:
+  void count_operations (vect_cost_for_stmt, stmt_vec_info,
+			 vect_cost_model_location, unsigned int);
+  unsigned int determine_suggested_unroll_factor (loop_vec_info);
+  /* The number of vectorized stmts in loop.  */
+  unsigned m_stmts = 0;
+  /* The number of load and store operations in loop.  */
+  unsigned m_loads = 0;
+  unsigned m_stores = 0;
+  /* Reduction factor for suggesting unroll factor.  */
+  unsigned m_reduc_factor = 0;
+  /* True if the loop contains an average operation.  */
+  bool m_has_avg = false;
+};
+
+/* Implement TARGET_VECTORIZE_CREATE_COSTS.  */
+static vector_costs *
+loongarch_vectorize_create_costs (vec_info *vinfo, bool costing_for_scalar)
+{
+  return new loongarch_vector_costs (vinfo, costing_for_scalar);
+}
+
+void
+loongarch_vector_costs::count_operations (vect_cost_for_stmt kind,
+					  stmt_vec_info stmt_info,
+					  vect_cost_model_location where,
+					  unsigned int count)
+{
+  if (!m_costing_for_scalar
+      && is_a<loop_vec_info> (m_vinfo)
+      && where == vect_body)
+    {
+      m_stmts += count;
+
+      if (kind == scalar_load
+	  || kind == vector_load
+	  || kind == unaligned_load)
+	m_loads += count;
+      else if (kind == scalar_store
+	       || kind == vector_store
+	       || kind == unaligned_store)
+	m_stores += count;
+      else if ((kind == scalar_stmt
+		|| kind == vector_stmt
+		|| kind == vec_to_scalar)
+	       && stmt_info && vect_is_reduction (stmt_info))
+	{
+	  tree lhs = gimple_get_lhs (stmt_info->stmt);
+	  unsigned int base = FLOAT_TYPE_P (TREE_TYPE (lhs)) ? 2 : 1;
+	  m_reduc_factor = MAX (base * count, m_reduc_factor);
+	}
+    }
+}
+
+unsigned int
+loongarch_vector_costs::determine_suggested_unroll_factor (loop_vec_info loop_vinfo)
+{
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+
+  if (m_has_avg)
+    return 1;
+
+  /* Don't unroll if it's specified explicitly not to be unrolled.  */
+  if (loop->unroll == 1
+      || (OPTION_SET_P (flag_unroll_loops) && !flag_unroll_loops)
+      || (OPTION_SET_P (flag_unroll_all_loops) && !flag_unroll_all_loops))
+    return 1;
+
+  unsigned int nstmts_nonldst = m_stmts - m_loads - m_stores;
+  /* Don't unroll if the only vector instructions are memory accesses.  */
+  if (nstmts_nonldst == 0)
+    return 1;
+
+  /* Use a simple hardware resource model: how many non-vld/vst vector
+     instructions can be issued per cycle.  */
+  unsigned int issue_info = loongarch_vect_issue_info;
+  unsigned int reduc_factor = m_reduc_factor > 1 ? m_reduc_factor : 1;
+  unsigned int uf = CEIL (reduc_factor * issue_info, nstmts_nonldst);
+  uf = MIN ((unsigned int) loongarch_vect_unroll_limit, uf);
+
+  return 1 << ceil_log2 (uf);
+}
+
+unsigned
+loongarch_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
+				       stmt_vec_info stmt_info, slp_tree,
+				       tree vectype, int misalign,
+				       vect_cost_model_location where)
+{
+  unsigned retval = 0;
+
+  if (flag_vect_cost_model)
+    {
+      int stmt_cost = loongarch_builtin_vectorization_cost (kind, vectype,
+							    misalign);
+      retval = adjust_cost_for_freq (stmt_info, where, count * stmt_cost);
+      m_costs[where] += retval;
+
+      count_operations (kind, stmt_info, where, count);
+    }
+
+  if (stmt_info)
+    {
+      /* Detect the use of an averaging operation.  */
+      gimple *stmt = stmt_info->stmt;
+      if (is_gimple_call (stmt)
+	  && gimple_call_internal_p (stmt))
+	{
+	  switch (gimple_call_internal_fn (stmt))
+	    {
+	    case IFN_AVG_FLOOR:
+	    case IFN_AVG_CEIL:
+	      m_has_avg = true;
+	    default:
+	      break;
+	    }
+	}
+    }
+
+  return retval;
+}
+
+void
+loongarch_vector_costs::finish_cost (const vector_costs *scalar_costs)
+{
+  loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (m_vinfo);
+  if (loop_vinfo)
+    {
+      m_suggested_unroll_factor = determine_suggested_unroll_factor (loop_vinfo);
+    }
+
+  vector_costs::finish_cost (scalar_costs);
+}
+
 /* Implement TARGET_ADDRESS_COST.  */
 
 static int
@@ -7265,9 +7411,6 @@ loongarch_option_override_internal (struct gcc_options *opts,
   if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib)
     error ("%qs cannot be used for compiling a shared library",
 	   "-mdirect-extern-access");
-  if (loongarch_vector_access_cost == 0)
-    loongarch_vector_access_cost = 5;
-
 
   switch (la_target.cmodel)
     {
@@ -11279,6 +11422,8 @@ loongarch_builtin_support_vector_misalignment (machine_mode mode,
 #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
   loongarch_builtin_vectorization_cost
+#undef TARGET_VECTORIZE_CREATE_COSTS
+#define TARGET_VECTORIZE_CREATE_COSTS loongarch_vectorize_create_costs
 
 
 #undef TARGET_IN_SMALL_DATA_P
diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt
index e1b085ae87c..6215abcac04 100644
--- a/gcc/config/loongarch/loongarch.opt
+++ b/gcc/config/loongarch/loongarch.opt
@@ -153,10 +153,6 @@ mbranch-cost=
 Target RejectNegative Joined UInteger Var(loongarch_branch_cost)
 -mbranch-cost=COST	Set the cost of branches to roughly COST instructions.
 
-mmemvec-cost=
-Target RejectNegative Joined UInteger Var(loongarch_vector_access_cost) IntegerRange(1, 5)
-mmemvec-cost=COST      Set the cost of vector memory access instructions.
-
 mcheck-zero-division
 Target Mask(CHECK_ZERO_DIV)
 Trap on integer divide by zero.
@@ -220,3 +216,14 @@ mrelax
 Target Var(loongarch_mrelax) Init(HAVE_AS_MRELAX_OPTION)
 Take advantage of linker relaxations to reduce the number of instructions
 required to materialize symbol addresses.
+
+-param=loongarch-vect-unroll-limit=
+Target Joined UInteger Var(loongarch_vect_unroll_limit) Init(6) IntegerRange(1, 64) Param
+Limit the unroll factor, which indicates how much the autovectorizer may
+unroll a loop.  The default value is 6.
+
+-param=loongarch-vect-issue-info=
+Target Undocumented Joined UInteger Var(loongarch_vect_issue_info) Init(4) IntegerRange(1, 64) Param
+Indicate how many non-memory-access vector instructions can be issued per
+cycle; used in unroll factor determination for the autovectorizer.  The
+default value is 4.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fee659462ff..733723e29d7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -26205,6 +26205,13 @@ environments where no dynamic link is performed, like firmwares, OS
 kernels, executables linked with @option{-static} or @option{-static-pie}.
 @option{-mdirect-extern-access} is not compatible with @option{-fPIC} or
 @option{-fpic}.
+
+@item loongarch-vect-unroll-limit
+The vectorizer will use available tuning information to determine whether it
+would be beneficial to unroll the main vectorized loop and by how much.  This
+parameter sets the upper bound of how much the vectorizer will unroll the main
+loop.  The default value is six.
+
 @end table
 
 
-- 
2.20.1



* Re:[pushed] [PATCH 0/3] Optimize loongarch vector implementation.
  2023-10-16  2:00 [PATCH 0/3] Optimize loongarch vector implementation Jiahao Xu
                   ` (2 preceding siblings ...)
  2023-10-16  2:00 ` [PATCH 3/3] LoongArch:Implement the new vector cost model framework Jiahao Xu
@ 2023-10-19  6:16 ` chenglulu
  3 siblings, 0 replies; 5+ messages in thread
From: chenglulu @ 2023-10-19  6:16 UTC (permalink / raw)
  To: Jiahao Xu, gcc-patches; +Cc: xry111, i, xuchenghua

Pushed to r14-4730.

On 2023/10/16 10:00 AM, Jiahao Xu wrote:
> The following three patches further enhance loongarch’s vectorization capabilities.
>
> Patch one add LoongArch support for AVG_CEIL/FLOOR.
>
> Patch 2 add LoongArch support for vec_widen_mult/add/sub_lo/hi patterns.
>
> patch 3 make loongarch use the new vector hooks and implements the costing
> function determine_suggested_unroll_factor, to make it be able to suggest the
> unroll factor for a given loop being vectorized base vec_ops analysis during
> vector costing and the available issue information.The patch also adjusts cost
> model through performance analysis.
>
> Jiahao Xu (3):
>    LoongArch:Implement avg and sad standard names.
>    LoongArch:Implement vec_widen standard names.
>    LoongArch:Implement the new vector cost model framework.
>
>   gcc/config/loongarch/genopts/loongarch.opt.in |  15 +-
>   gcc/config/loongarch/lasx.md                  | 156 ++++++++-
>   gcc/config/loongarch/loongarch-protos.h       |   1 +
>   gcc/config/loongarch/loongarch.cc             | 309 +++++++++++++++++-
>   gcc/config/loongarch/loongarch.md             |   2 +
>   gcc/config/loongarch/loongarch.opt            |  15 +-
>   gcc/config/loongarch/lsx.md                   |  74 +++++
>   gcc/doc/invoke.texi                           |   7 +
>   .../gcc.target/loongarch/avg-ceil-lasx.c      |  22 ++
>   .../gcc.target/loongarch/avg-ceil-lsx.c       |  22 ++
>   .../gcc.target/loongarch/avg-floor-lasx.c     |  22 ++
>   .../gcc.target/loongarch/avg-floor-lsx.c      |  22 ++
>   gcc/testsuite/gcc.target/loongarch/sad-lasx.c |  20 ++
>   gcc/testsuite/gcc.target/loongarch/sad-lsx.c  |  20 ++
>   .../gcc.target/loongarch/vect-widen-add.c     |  26 ++
>   .../gcc.target/loongarch/vect-widen-mul.c     |  26 ++
>   .../gcc.target/loongarch/vect-widen-sub.c     |  26 ++
>   17 files changed, 746 insertions(+), 39 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/avg-ceil-lasx.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/avg-ceil-lsx.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/avg-floor-lasx.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/avg-floor-lsx.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/sad-lasx.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/sad-lsx.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-widen-add.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-widen-mul.c
>   create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-widen-sub.c
>


Thread overview: 5+ messages
2023-10-16  2:00 [PATCH 0/3] Optimize loongarch vector implementation Jiahao Xu
2023-10-16  2:00 ` [PATCH 1/3] LoongArch:Implement avg and sad standard names Jiahao Xu
2023-10-16  2:00 ` [PATCH 2/3] LoongArch:Implement vec_widen " Jiahao Xu
2023-10-16  2:00 ` [PATCH 3/3] LoongArch:Implement the new vector cost model framework Jiahao Xu
2023-10-19  6:16 ` Re:[pushed] [PATCH 0/3] Optimize loongarch vector implementation chenglulu