From: Joel Hutton <Joel.Hutton@arm.com>
To: Richard Sandiford <Richard.Sandiford@arm.com>,
Joel Hutton via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Richard Biener <rguenther@suse.de>
Subject: Re: [2/3][vect] Add widening add, subtract vect patterns
Date: Fri, 13 Nov 2020 16:48:41 +0000
Message-ID: <DB6PR0802MB22005D7763AFCE5C687E4355F5E60@DB6PR0802MB2200.eurprd08.prod.outlook.com>
In-Reply-To: <mpta6vl30zq.fsf@arm.com>
[-- Attachment #1: Type: text/plain, Size: 6264 bytes --]
Tests are still running, but I believe I've addressed all the comments.
> Like Richard said, the new patterns need to be documented in md.texi
> and the new tree codes need to be documented in generic.texi.
Done.
> While we're using tree codes, I think we need to make the naming
> consistent with other tree codes: WIDEN_PLUS_EXPR instead of
> WIDEN_ADD_EXPR and WIDEN_MINUS_EXPR instead of WIDEN_SUB_EXPR.
> Same idea for the VEC_* codes.
Fixed.
> > gcc/ChangeLog:
> >
> > 2020-11-12 Joel Hutton <joel.hutton@arm.com>
> >
> > * expr.c (expand_expr_real_2): add widen_add,widen_subtract cases
>
> Not that I personally care about this stuff (would love to see changelogs
> go away :-)) but some nits:
>
> Each description is supposed to start with a capital letter and end with
> a full stop (even if it's not a complete sentence). Same for the rest
Fixed.
> > * optabs-tree.c (optab_for_tree_code): optabs for widening adds,subtracts
>
> The line limit for changelogs is 80 characters. The entry should say
> what changed, so “Handle …” or “Add case for …” or something.
Fixed.
> > * tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog ptatern
>
> typo: pattern
Fixed.
> > Add widening add, subtract patterns to tree-vect-patterns.
> > Add aarch64 tests for patterns.
> >
> > fix sad
>
> Would be good to expand on this for the final commit message.
'fix sad' was accidentally included when I squashed two commits. I've made all the commit messages more descriptive.
> > +
> > + case VEC_WIDEN_SUB_HI_EXPR:
> > + return (TYPE_UNSIGNED (type)
> > + ? vec_widen_usubl_hi_optab : vec_widen_ssubl_hi_optab);
> > +
> > +
>
> Nits: excess blank line at the end and excess space before the “:”s.
Fixed.
> > +OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
> > +OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
> > +OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
> > OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
> > OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
> > OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
>
> Looks like the current code groups signed stuff together and
> unsigned stuff together, so would be good to follow that.
Fixed.
> Same comments as the previous patch about having a "+nosve" pragma
> and about the scan-assembler-times lines. Same for the sub test.
Fixed.
> I am missing documentation in md.texi for the new patterns. In
> particular I wonder why you need signed and unsigned variants
> for the add/subtract patterns.
Fixed. There are signed and unsigned variants because they correspond to
distinct signed and unsigned instructions (uaddl/uaddl2 and saddl/saddl2).
> The new functions should have comments before them. Can probably
> just use the vect_recog_widen_mult_pattern comment as a template.
Fixed.
> > + case VEC_WIDEN_SUB_HI_EXPR:
> > + case VEC_WIDEN_SUB_LO_EXPR:
> > + case VEC_WIDEN_ADD_HI_EXPR:
> > + case VEC_WIDEN_ADD_LO_EXPR:
> > + return false;
> > +
>
> I think these should get the same validity checking as
> VEC_WIDEN_MULT_HI_EXPR etc.
Fixed.
> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -1086,8 +1086,10 @@ vect_recog_sad_pattern (vec_info *vinfo,
> > of the above pattern. */
> >
> > tree plus_oprnd0, plus_oprnd1;
> > - if (!vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > - &plus_oprnd0, &plus_oprnd1))
> > + if (!(vect_reassociating_reduction_p (vinfo, stmt_vinfo, PLUS_EXPR,
> > + &plus_oprnd0, &plus_oprnd1)
> > + || vect_reassociating_reduction_p (vinfo, stmt_vinfo, WIDEN_ADD_EXPR,
> > + &plus_oprnd0, &plus_oprnd1)))
> > return NULL;
> >
> > tree sum_type = gimple_expr_type (last_stmt);
>
> I think we should make:
>
> /* Any non-truncating sequence of conversions is OK here, since
> with a successful match, the result of the ABS(U) is known to fit
> within the nonnegative range of the result type. (It cannot be the
> negative of the minimum signed value due to the range of the widening
> MINUS_EXPR.) */
> vect_unpromoted_value unprom_abs;
> plus_oprnd0 = vect_look_through_possible_promotion (vinfo, plus_oprnd0,
> &unprom_abs);
>
> specific to the PLUS_EXPR case. If we look through promotions on
> the operands of a WIDEN_ADD_EXPR, we could potentially have a mixture
> of signednesses involved, one on the operands of the WIDEN_ADD_EXPR
> and one on its inputs.
Fixed.
gcc/ChangeLog:
2020-11-13 Joel Hutton <joel.hutton@arm.com>
* expr.c (expand_expr_real_2): Add cases for widening adds, subtracts.
* optabs-tree.c (optab_for_tree_code): Add cases for widening adds,
subtracts.
* optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds,
subtracts.
* tree-inline.c (estimate_operator_cost): Add case for widening adds,
subtracts.
* tree-vect-generic.c (expand_vector_operations_1): Add case for
widening adds, subtracts.
* tree-vect-patterns.c (vect_recog_widen_plus_pattern): New recog pattern.
(vect_recog_widen_minus_pattern): New recog pattern.
(vect_recog_sad_pattern): Update widened subtract code.
(vect_recog_average_pattern): Update widened add code.
* tree-vect-stmts.c (vectorizable_conversion): Add case for widened add,
subtract.
(supportable_widening_operation): Add case for widened add, subtract.
* tree.def (WIDEN_PLUS_EXPR): New tree code.
(WIDEN_MINUS_EXPR): New tree code.
(VEC_WIDEN_PLUS_HI_EXPR): New tree code.
(VEC_WIDEN_PLUS_LO_EXPR): New tree code.
(VEC_WIDEN_MINUS_HI_EXPR): New tree code.
(VEC_WIDEN_MINUS_LO_EXPR): New tree code.
gcc/testsuite/ChangeLog:
2020-11-13 Joel Hutton <joel.hutton@arm.com>
* gcc.target/aarch64/vect-widen-add.c: New test.
* gcc.target/aarch64/vect-widen-sub.c: New test.
[-- Attachment #2: 0002-vect-Add-widening-add-subtract-patterns.patch --]
[-- Type: text/x-patch; name="0002-vect-Add-widening-add-subtract-patterns.patch", Size: 21630 bytes --]
From d5e20487bbccd69e9b5ac96fef6c9df8710d0cb0 Mon Sep 17 00:00:00 2001
From: Joel Hutton <joel.hutton@arm.com>
Date: Mon, 9 Nov 2020 15:44:18 +0000
Subject: [PATCH 2/3] [vect] Add widening add, subtract patterns
Add widening add, subtract patterns to tree-vect-patterns. Update the
widened code of patterns that detect PLUS_EXPR or MINUS_EXPR to also
detect WIDEN_PLUS_EXPR or WIDEN_MINUS_EXPR. These patterns take 2
vectors with N elements of size S and perform an add/subtract on the
elements, storing the results as N elements of size 2*S (in 2 result
vectors). This is implemented in the aarch64 backend as
uaddl/uaddl2 (saddl/saddl2) and usubl/usubl2 (ssubl/ssubl2)
respectively. Add aarch64 tests for patterns.
---
gcc/doc/generic.texi | 31 +++++++
gcc/doc/md.texi | 22 +++++
gcc/expr.c | 6 ++
gcc/optabs-tree.c | 16 ++++
gcc/optabs.def | 8 ++
.../gcc.target/aarch64/vect-widen-add.c | 92 +++++++++++++++++++
.../gcc.target/aarch64/vect-widen-sub.c | 92 +++++++++++++++++++
gcc/tree-cfg.c | 6 ++
gcc/tree-inline.c | 6 ++
gcc/tree-vect-generic.c | 4 +
gcc/tree-vect-patterns.c | 31 ++++++-
gcc/tree-vect-stmts.c | 15 ++-
gcc/tree.def | 6 ++
13 files changed, 331 insertions(+), 4 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 7373266c69f..3d7d4b0b947 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1790,6 +1790,10 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}.
@tindex VEC_RSHIFT_EXPR
@tindex VEC_WIDEN_MULT_HI_EXPR
@tindex VEC_WIDEN_MULT_LO_EXPR
+@tindex VEC_WIDEN_PLUS_HI_EXPR
+@tindex VEC_WIDEN_PLUS_LO_EXPR
+@tindex VEC_WIDEN_MINUS_HI_EXPR
+@tindex VEC_WIDEN_MINUS_LO_EXPR
@tindex VEC_UNPACK_HI_EXPR
@tindex VEC_UNPACK_LO_EXPR
@tindex VEC_UNPACK_FLOAT_HI_EXPR
@@ -1836,6 +1840,33 @@ vector of @code{N/2} products. In the case of @code{VEC_WIDEN_MULT_LO_EXPR} the
low @code{N/2} elements of the two vector are multiplied to produce the
vector of @code{N/2} products.
+@item VEC_WIDEN_PLUS_HI_EXPR
+@itemx VEC_WIDEN_PLUS_LO_EXPR
+These nodes represent widening vector addition of the high and low parts of
+the two input vectors, respectively. Their operands are vectors that contain
+the same number of elements (@code{N}) of the same integral type. The result
+is a vector that contains half as many elements, of an integral type whose size
+is twice as wide. In the case of @code{VEC_WIDEN_PLUS_HI_EXPR} the high
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} sums. In the case of @code{VEC_WIDEN_PLUS_LO_EXPR} the low
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} sums.
+
+@item VEC_WIDEN_MINUS_HI_EXPR
+@itemx VEC_WIDEN_MINUS_LO_EXPR
+These nodes represent widening vector subtraction of the high and low parts of
+the two input vectors, respectively. Their operands are vectors that contain
+the same number of elements (@code{N}) of the same integral type. The high/low
+elements of the second vector are subtracted from the high/low elements of the
+first. The result is a vector that contains half as many elements, of an
+integral type whose size is twice as wide. In the case of
+@code{VEC_WIDEN_MINUS_HI_EXPR} the high @code{N/2} elements of the second
+vector are subtracted from the high @code{N/2} elements of the first to
+produce the vector of @code{N/2} differences. In the case of
+@code{VEC_WIDEN_MINUS_LO_EXPR} the low @code{N/2} elements of the second
+vector are subtracted from the low @code{N/2} elements of the first to
+produce the vector of @code{N/2} differences.
+
@item VEC_UNPACK_HI_EXPR
@itemx VEC_UNPACK_LO_EXPR
These nodes represent unpacking of the high and low parts of the input vector,
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 813875b973b..da8c9a283dd 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5626,6 +5626,28 @@ with N signed/unsigned elements of size S@. Operand 2 is a constant. Shift
the high/low elements of operand 1, and put the N/2 results of size 2*S in the
output vector (operand 0).
+@cindex @code{vec_widen_saddl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_saddl_lo_@var{m}} instruction pattern
+@cindex @code{vec_widen_uaddl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_uaddl_lo_@var{m}} instruction pattern
+@item @samp{vec_widen_uaddl_hi_@var{m}}, @samp{vec_widen_uaddl_lo_@var{m}}
+@itemx @samp{vec_widen_saddl_hi_@var{m}}, @samp{vec_widen_saddl_lo_@var{m}}
+Signed/Unsigned widening add long. Operands 1 and 2 are vectors with N
+signed/unsigned elements of size S@. Widen the high/low elements of 1 and
+2 and add them together, then put the N/2 results of size 2*S in the
+output vector (operand 0).
+
+@cindex @code{vec_widen_ssubl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_ssubl_lo_@var{m}} instruction pattern
+@cindex @code{vec_widen_usubl_hi_@var{m}} instruction pattern
+@cindex @code{vec_widen_usubl_lo_@var{m}} instruction pattern
+@item @samp{vec_widen_usubl_hi_@var{m}}, @samp{vec_widen_usubl_lo_@var{m}}
+@itemx @samp{vec_widen_ssubl_hi_@var{m}}, @samp{vec_widen_ssubl_lo_@var{m}}
+Signed/Unsigned widening subtract long. Operands 1 and 2 are vectors with N
+signed/unsigned elements of size S@. Widen the high/low elements of 1 and
+2, then subtract those of 2 from those of 1. Put the N/2 results of size
+2*S in the output vector (operand 0).
+
@cindex @code{mulhisi3} instruction pattern
@item @samp{mulhisi3}
Multiply operands 1 and 2, which have mode @code{HImode}, and store
diff --git a/gcc/expr.c b/gcc/expr.c
index ae16f077758..83aa63c41b5 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9034,6 +9034,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
target, unsignedp);
return target;
+ case WIDEN_PLUS_EXPR:
+ case WIDEN_MINUS_EXPR:
case WIDEN_MULT_EXPR:
/* If first operand is constant, swap them.
Thus the following special case checks need only
@@ -9754,6 +9756,10 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
return temp;
}
+ case VEC_WIDEN_PLUS_HI_EXPR:
+ case VEC_WIDEN_PLUS_LO_EXPR:
+ case VEC_WIDEN_MINUS_HI_EXPR:
+ case VEC_WIDEN_MINUS_LO_EXPR:
case VEC_WIDEN_MULT_HI_EXPR:
case VEC_WIDEN_MULT_LO_EXPR:
case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
index 4dfda756932..b797d018c84 100644
--- a/gcc/optabs-tree.c
+++ b/gcc/optabs-tree.c
@@ -170,6 +170,22 @@ optab_for_tree_code (enum tree_code code, const_tree type,
return (TYPE_UNSIGNED (type)
? vec_widen_ushiftl_lo_optab : vec_widen_sshiftl_lo_optab);
+ case VEC_WIDEN_PLUS_LO_EXPR:
+ return (TYPE_UNSIGNED (type)
+ ? vec_widen_uaddl_lo_optab : vec_widen_saddl_lo_optab);
+
+ case VEC_WIDEN_PLUS_HI_EXPR:
+ return (TYPE_UNSIGNED (type)
+ ? vec_widen_uaddl_hi_optab : vec_widen_saddl_hi_optab);
+
+ case VEC_WIDEN_MINUS_LO_EXPR:
+ return (TYPE_UNSIGNED (type)
+ ? vec_widen_usubl_lo_optab : vec_widen_ssubl_lo_optab);
+
+ case VEC_WIDEN_MINUS_HI_EXPR:
+ return (TYPE_UNSIGNED (type)
+ ? vec_widen_usubl_hi_optab : vec_widen_ssubl_hi_optab);
+
case VEC_UNPACK_HI_EXPR:
return (TYPE_UNSIGNED (type)
? vec_unpacku_hi_optab : vec_unpacks_hi_optab);
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa1453..5607f51e6b4 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -383,6 +383,10 @@ OPTAB_D (vec_widen_smult_even_optab, "vec_widen_smult_even_$a")
OPTAB_D (vec_widen_smult_hi_optab, "vec_widen_smult_hi_$a")
OPTAB_D (vec_widen_smult_lo_optab, "vec_widen_smult_lo_$a")
OPTAB_D (vec_widen_smult_odd_optab, "vec_widen_smult_odd_$a")
+OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a")
+OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a")
+OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a")
+OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a")
OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a")
OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a")
OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a")
@@ -391,6 +395,10 @@ OPTAB_D (vec_widen_umult_lo_optab, "vec_widen_umult_lo_$a")
OPTAB_D (vec_widen_umult_odd_optab, "vec_widen_umult_odd_$a")
OPTAB_D (vec_widen_ushiftl_hi_optab, "vec_widen_ushiftl_hi_$a")
OPTAB_D (vec_widen_ushiftl_lo_optab, "vec_widen_ushiftl_lo_$a")
+OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a")
+OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a")
+OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a")
+OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a")
OPTAB_D (sync_add_optab, "sync_add$I$a")
OPTAB_D (sync_and_optab, "sync_and$I$a")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
new file mode 100644
index 00000000000..220bd9352a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
@@ -0,0 +1,92 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#pragma GCC target "+nosve"
+
+#define ARR_SIZE 1024
+
+/* Should produce an uaddl */
+void uadd_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+ for (int i = 0; i < ARR_SIZE - 3; i += 4)
+ {
+ foo[i] = a[i] + b[i];
+ foo[i+1] = a[i+1] + b[i+1];
+ foo[i+2] = a[i+2] + b[i+2];
+ foo[i+3] = a[i+3] + b[i+3];
+ }
+}
+
+__attribute__((optimize (0)))
+void uadd_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+ for (int i = 0; i < ARR_SIZE - 3; i += 4)
+ {
+ foo[i] = a[i] + b[i];
+ foo[i+1] = a[i+1] + b[i+1];
+ foo[i+2] = a[i+2] + b[i+2];
+ foo[i+3] = a[i+3] + b[i+3];
+ }
+}
+
+/* Should produce an saddl */
+void sadd_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+ for (int i = 0; i < ARR_SIZE - 3; i += 4)
+ {
+ foo[i] = a[i] + b[i];
+ foo[i+1] = a[i+1] + b[i+1];
+ foo[i+2] = a[i+2] + b[i+2];
+ foo[i+3] = a[i+3] + b[i+3];
+ }
+}
+
+__attribute__((optimize (0)))
+void sadd_nonopt (int32_t *foo, int16_t *a, int16_t *b)
+{
+ for (int i = 0; i < ARR_SIZE - 3; i += 4)
+ {
+ foo[i] = a[i] + b[i];
+ foo[i+1] = a[i+1] + b[i+1];
+ foo[i+2] = a[i+2] + b[i+2];
+ foo[i+3] = a[i+3] + b[i+3];
+ }
+}
+
+
+void __attribute__((optimize (0)))
+init(uint16_t *a, uint16_t *b)
+{
+ for (int i = 0; i < ARR_SIZE; i++)
+ {
+ a[i] = i;
+ b[i] = 2*i;
+ }
+}
+
+int __attribute__((optimize (0)))
+main()
+{
+ uint32_t foo_arr[ARR_SIZE];
+ uint32_t bar_arr[ARR_SIZE];
+ uint16_t a[ARR_SIZE];
+ uint16_t b[ARR_SIZE];
+
+ init(a, b);
+ uadd_opt(foo_arr, a, b);
+ uadd_nonopt(bar_arr, a, b);
+ if (memcmp (foo_arr, bar_arr, sizeof (foo_arr)) != 0)
+ return 1;
+ sadd_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
+ sadd_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
+ if (memcmp (foo_arr, bar_arr, sizeof (foo_arr)) != 0)
+ return 1;
+ return 0;
+}
+
+/* { dg-final { scan-assembler-times {\tuaddl\t} 1} } */
+/* { dg-final { scan-assembler-times {\tuaddl2\t} 1} } */
+/* { dg-final { scan-assembler-times {\tsaddl\t} 1} } */
+/* { dg-final { scan-assembler-times {\tsaddl2\t} 1} } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
new file mode 100644
index 00000000000..a2bed63affb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
@@ -0,0 +1,92 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps" } */
+#include <stdint.h>
+#include <string.h>
+
+#pragma GCC target "+nosve"
+
+#define ARR_SIZE 1024
+
+/* Should produce an usubl */
+void usub_opt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+ for (int i = 0; i < ARR_SIZE - 3; i += 4)
+ {
+ foo[i] = a[i] - b[i];
+ foo[i+1] = a[i+1] - b[i+1];
+ foo[i+2] = a[i+2] - b[i+2];
+ foo[i+3] = a[i+3] - b[i+3];
+ }
+}
+
+__attribute__((optimize (0)))
+void usub_nonopt (uint32_t *foo, uint16_t *a, uint16_t *b)
+{
+ for (int i = 0; i < ARR_SIZE - 3; i += 4)
+ {
+ foo[i] = a[i] - b[i];
+ foo[i+1] = a[i+1] - b[i+1];
+ foo[i+2] = a[i+2] - b[i+2];
+ foo[i+3] = a[i+3] - b[i+3];
+ }
+}
+
+/* Should produce an ssubl */
+void ssub_opt (int32_t *foo, int16_t *a, int16_t *b)
+{
+ for (int i = 0; i < ARR_SIZE - 3; i += 4)
+ {
+ foo[i] = a[i] - b[i];
+ foo[i+1] = a[i+1] - b[i+1];
+ foo[i+2] = a[i+2] - b[i+2];
+ foo[i+3] = a[i+3] - b[i+3];
+ }
+}
+
+__attribute__((optimize (0)))
+void ssub_nonopt (int32_t *foo, int16_t *a, int16_t *b)
+{
+ for (int i = 0; i < ARR_SIZE - 3; i += 4)
+ {
+ foo[i] = a[i] - b[i];
+ foo[i+1] = a[i+1] - b[i+1];
+ foo[i+2] = a[i+2] - b[i+2];
+ foo[i+3] = a[i+3] - b[i+3];
+ }
+}
+
+
+void __attribute__((optimize (0)))
+init(uint16_t *a, uint16_t *b)
+{
+ for (int i = 0; i < ARR_SIZE; i++)
+ {
+ a[i] = i;
+ b[i] = 2*i;
+ }
+}
+
+int __attribute__((optimize (0)))
+main()
+{
+ uint32_t foo_arr[ARR_SIZE];
+ uint32_t bar_arr[ARR_SIZE];
+ uint16_t a[ARR_SIZE];
+ uint16_t b[ARR_SIZE];
+
+ init(a, b);
+ usub_opt(foo_arr, a, b);
+ usub_nonopt(bar_arr, a, b);
+ if (memcmp (foo_arr, bar_arr, sizeof (foo_arr)) != 0)
+ return 1;
+ ssub_opt((int32_t*) foo_arr, (int16_t*) a, (int16_t*) b);
+ ssub_nonopt((int32_t*) bar_arr, (int16_t*) a, (int16_t*) b);
+ if (memcmp (foo_arr, bar_arr, sizeof (foo_arr)) != 0)
+ return 1;
+ return 0;
+}
+
+/* { dg-final { scan-assembler-times {\tusubl\t} 1} } */
+/* { dg-final { scan-assembler-times {\tusubl2\t} 1} } */
+/* { dg-final { scan-assembler-times {\tssubl\t} 1} } */
+/* { dg-final { scan-assembler-times {\tssubl2\t} 1} } */
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5139f111fec..aaf390bda42 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3885,6 +3885,8 @@ verify_gimple_assign_binary (gassign *stmt)
return false;
}
+ case WIDEN_PLUS_EXPR:
+ case WIDEN_MINUS_EXPR:
case PLUS_EXPR:
case MINUS_EXPR:
{
@@ -4005,6 +4007,10 @@ verify_gimple_assign_binary (gassign *stmt)
return false;
}
+ case VEC_WIDEN_MINUS_HI_EXPR:
+ case VEC_WIDEN_MINUS_LO_EXPR:
+ case VEC_WIDEN_PLUS_HI_EXPR:
+ case VEC_WIDEN_PLUS_LO_EXPR:
case VEC_WIDEN_MULT_HI_EXPR:
case VEC_WIDEN_MULT_LO_EXPR:
case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 32424b169c7..d9814bd10d3 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4224,6 +4224,8 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
case REALIGN_LOAD_EXPR:
+ case WIDEN_PLUS_EXPR:
+ case WIDEN_MINUS_EXPR:
case WIDEN_SUM_EXPR:
case WIDEN_MULT_EXPR:
case DOT_PROD_EXPR:
@@ -4232,6 +4234,10 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
case WIDEN_MULT_MINUS_EXPR:
case WIDEN_LSHIFT_EXPR:
+ case VEC_WIDEN_PLUS_HI_EXPR:
+ case VEC_WIDEN_PLUS_LO_EXPR:
+ case VEC_WIDEN_MINUS_HI_EXPR:
+ case VEC_WIDEN_MINUS_LO_EXPR:
case VEC_WIDEN_MULT_HI_EXPR:
case VEC_WIDEN_MULT_LO_EXPR:
case VEC_WIDEN_MULT_EVEN_EXPR:
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index d7bafa77134..23bc1cb04b7 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -2118,6 +2118,10 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi,
arguments, not the widened result. VEC_UNPACK_FLOAT_*_EXPR is
calculated in the same way above. */
if (code == WIDEN_SUM_EXPR
+ || code == VEC_WIDEN_PLUS_HI_EXPR
+ || code == VEC_WIDEN_PLUS_LO_EXPR
+ || code == VEC_WIDEN_MINUS_HI_EXPR
+ || code == VEC_WIDEN_MINUS_LO_EXPR
|| code == VEC_WIDEN_MULT_HI_EXPR
|| code == VEC_WIDEN_MULT_LO_EXPR
|| code == VEC_WIDEN_MULT_EVEN_EXPR
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index f68a87e05ed..79b521aa436 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -1148,7 +1148,7 @@ vect_recog_sad_pattern (vec_info *vinfo,
/* FORNOW. Can continue analyzing the def-use chain when this stmt in a phi
inside the loop (in case we are analyzing an outer-loop). */
vect_unpromoted_value unprom[2];
- if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, MINUS_EXPR,
+ if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_MINUS_EXPR,
false, 2, unprom, &half_type))
return NULL;
@@ -1262,6 +1262,29 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
"vect_recog_widen_mult_pattern");
}
+/* Try to detect addition on widened inputs, converting PLUS_EXPR
+ to WIDEN_PLUS_EXPR. See vect_recog_widen_op_pattern for details. */
+
+static gimple *
+vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
+ tree *type_out)
+{
+ return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+ PLUS_EXPR, WIDEN_PLUS_EXPR, false,
+ "vect_recog_widen_plus_pattern");
+}
+
+/* Try to detect subtraction on widened inputs, converting MINUS_EXPR
+ to WIDEN_MINUS_EXPR. See vect_recog_widen_op_pattern for details. */
+static gimple *
+vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
+ tree *type_out)
+{
+ return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+ MINUS_EXPR, WIDEN_MINUS_EXPR, false,
+ "vect_recog_widen_minus_pattern");
+}
+
/* Function vect_recog_pow_pattern
Try to find the following pattern:
@@ -1978,7 +2001,7 @@ vect_recog_average_pattern (vec_info *vinfo,
vect_unpromoted_value unprom[3];
tree new_type;
unsigned int nops = vect_widened_op_tree (vinfo, plus_stmt_info, PLUS_EXPR,
- PLUS_EXPR, false, 3,
+ WIDEN_PLUS_EXPR, false, 3,
unprom, &new_type);
if (nops == 0)
return NULL;
@@ -5249,7 +5272,9 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
of mask conversion that are needed for gather and scatter
internal functions. */
{ vect_recog_gather_scatter_pattern, "gather_scatter" },
- { vect_recog_mask_conversion_pattern, "mask_conversion" }
+ { vect_recog_mask_conversion_pattern, "mask_conversion" },
+ { vect_recog_widen_plus_pattern, "widen_plus" },
+ { vect_recog_widen_minus_pattern, "widen_minus" },
};
const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2c7a8a70913..25a8474c774 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -4570,6 +4570,8 @@ vectorizable_conversion (vec_info *vinfo,
if (!CONVERT_EXPR_CODE_P (code)
&& code != FIX_TRUNC_EXPR
&& code != FLOAT_EXPR
+ && code != WIDEN_PLUS_EXPR
+ && code != WIDEN_MINUS_EXPR
&& code != WIDEN_MULT_EXPR
&& code != WIDEN_LSHIFT_EXPR)
return false;
@@ -4615,7 +4617,8 @@ vectorizable_conversion (vec_info *vinfo,
if (op_type == binary_op)
{
- gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR);
+ gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR
+ || code == WIDEN_PLUS_EXPR || code == WIDEN_MINUS_EXPR);
op1 = gimple_assign_rhs2 (stmt);
tree vectype1_in;
@@ -11534,6 +11537,16 @@ supportable_widening_operation (vec_info *vinfo,
c2 = VEC_WIDEN_LSHIFT_HI_EXPR;
break;
+ case WIDEN_PLUS_EXPR:
+ c1 = VEC_WIDEN_PLUS_LO_EXPR;
+ c2 = VEC_WIDEN_PLUS_HI_EXPR;
+ break;
+
+ case WIDEN_MINUS_EXPR:
+ c1 = VEC_WIDEN_MINUS_LO_EXPR;
+ c2 = VEC_WIDEN_MINUS_HI_EXPR;
+ break;
+
CASE_CONVERT:
c1 = VEC_UNPACK_LO_EXPR;
c2 = VEC_UNPACK_HI_EXPR;
diff --git a/gcc/tree.def b/gcc/tree.def
index 6c53fe1bf67..ffbe00cf79f 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1359,6 +1359,8 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "widen_mult_minus_expr", tcc_expression, 3)
the first argument from type t1 to type t2, and then shifting it
by the second argument. */
DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2)
+DEFTREECODE (WIDEN_PLUS_EXPR, "widen_plus_expr", tcc_binary, 2)
+DEFTREECODE (WIDEN_MINUS_EXPR, "widen_minus_expr", tcc_binary, 2)
/* Widening vector multiplication.
The two operands are vectors with N elements of size S. Multiplying the
@@ -1423,6 +1425,10 @@ DEFTREECODE (VEC_PACK_FLOAT_EXPR, "vec_pack_float_expr", tcc_binary, 2)
*/
DEFTREECODE (VEC_WIDEN_LSHIFT_HI_EXPR, "widen_lshift_hi_expr", tcc_binary, 2)
DEFTREECODE (VEC_WIDEN_LSHIFT_LO_EXPR, "widen_lshift_lo_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_PLUS_HI_EXPR, "widen_plus_hi_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_PLUS_LO_EXPR, "widen_plus_lo_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_MINUS_HI_EXPR, "widen_minus_hi_expr", tcc_binary, 2)
+DEFTREECODE (VEC_WIDEN_MINUS_LO_EXPR, "widen_minus_lo_expr", tcc_binary, 2)
/* PREDICT_EXPR. Specify hint for branch prediction. The
PREDICT_EXPR_PREDICTOR specify predictor and PREDICT_EXPR_OUTCOME the
--
2.17.1
Thread overview: 6+ messages
2020-11-12 19:34 Joel Hutton
2020-11-13 7:58 ` Richard Biener
2020-11-13 12:16 ` Richard Sandiford
2020-11-13 16:48 ` Joel Hutton [this message]
2020-11-16 14:04 ` Richard Biener
2020-11-17 13:40 ` Richard Sandiford