* [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant
@ 2015-10-16 15:28 Alan Lawrence
2015-10-19 12:13 ` Richard Biener
0 siblings, 1 reply; 10+ messages in thread
From: Alan Lawrence @ 2015-10-16 15:28 UTC (permalink / raw)
To: gcc-patches
This lets the vectorizer handle some simple strides expressed using left-shift
rather than mul, e.g. a[i << 1] (whereas previously only a[i * 2] would have
been handled).
This patch does *not* handle the general case of shifts - neither a[i << j]
nor a[1 << i] will be handled; that would be a significantly bigger patch
(probably duplicating or generalizing much of chrec_fold_multiply and
chrec_fold_multiply_poly_poly in tree-chrec.c), and would probably also only
be applicable to machines with gather-load support.
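As a rough illustration of that distinction (a standalone sketch, not part of
the patch; the helper names are made up for this example): with a constant
shift count the index i << 1 advances by a fixed stride per iteration, just
like i * 2, whereas 1 << i doubles every iteration and so has no constant
stride.

```c
#include <stdbool.h>

/* Hypothetical helpers for illustration only; this is not GCC code.  */

/* Index used by a[i << 1]; equal to i * 2 for small unsigned i.  */
unsigned
index_shift_by_one (unsigned i)
{
  return i << 1;
}

/* Index used by a[1 << i]; grows geometrically (requires i < 32).  */
unsigned
index_one_shifted (unsigned i)
{
  return 1u << i;
}

/* Return true if index (i + 1) - index (i) is the same for every
   pair of consecutive i in [0, n).  */
bool
has_constant_stride (unsigned (*index) (unsigned), unsigned n)
{
  unsigned stride = index (1) - index (0);
  for (unsigned i = 1; i + 1 < n; i++)
    if (index (i + 1) - index (i) != stride)
      return false;
  return true;
}
```

Here has_constant_stride (index_shift_by_one, 16) holds, while
has_constant_stride (index_one_shifted, 16) does not; only the first kind of
access is what this patch aims to handle.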
Bootstrapped+check-gcc,g++,gfortran on x86_64, AArch64 and ARM, also Ada on x86_64.
Is this OK for trunk?
gcc/ChangeLog:
PR tree-optimization/65963
* tree-scalar-evolution.c (interpret_rhs_expr): Handle some LSHIFT_EXPRs
as equivalent MULT_EXPRs.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-strided-shift-1.c: New.
---
gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 ++++++++++++++++++++++++
gcc/tree-scalar-evolution.c | 18 +++++++++++++
2 files changed, 51 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
new file mode 100644
index 0000000..b1ce2ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
@@ -0,0 +1,33 @@
+/* PR tree-optimization/65963. */
+#include "tree-vect.h"
+
+#define N 512
+
+int in[2*N], out[N];
+
+__attribute__ ((noinline)) void
+loop (void)
+{
+  for (int i = 0; i < N; i++)
+    out[i] = in[i << 1] + 7;
+}
+
+int
+main (int argc, char **argv)
+{
+  check_vect ();
+  for (int i = 0; i < 2*N; i++)
+    {
+      in[i] = i;
+      __asm__ volatile ("" : : : "memory");
+    }
+  loop ();
+  __asm__ volatile ("" : : : "memory");
+  for (int i = 0; i < N; i++)
+    {
+      if (out[i] != i*2 + 7)
+        abort ();
+    }
+  return 0;
+}
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { vect_strided2 } } } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 0753bf3..e478b0e 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1831,12 +1831,30 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
       break;
     case MULT_EXPR:
+    case LSHIFT_EXPR:
+      /* Handle A<<B as A * (1<<B). */
       chrec1 = analyze_scalar_evolution (loop, rhs1);
       chrec2 = analyze_scalar_evolution (loop, rhs2);
       chrec1 = chrec_convert (type, chrec1, at_stmt);
       chrec2 = chrec_convert (type, chrec2, at_stmt);
       chrec1 = instantiate_parameters (loop, chrec1);
       chrec2 = instantiate_parameters (loop, chrec2);
+      if (code == LSHIFT_EXPR)
+        {
+          /* Do the shift in the larger size, as in e.g. (long) << (int)32,
+             we must do 1<<32 as a long or we'd overflow. */
+          tree type = TREE_TYPE (chrec2);
+          if (TYPE_PRECISION (TREE_TYPE (chrec1)) > TYPE_PRECISION (type))
+            type = TREE_TYPE (chrec1);
+          if (TYPE_PRECISION (type) == 0)
+            {
+              res = chrec_dont_know;
+              break;
+            }
+          chrec2 = fold_build2 (LSHIFT_EXPR, type,
+                                build_int_cst (type, 1),
+                                chrec2);
+        }
       res = chrec_fold_multiply (type, chrec1, chrec2);
       break;
--
1.9.1
* Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant
2015-10-16 15:28 [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant Alan Lawrence
@ 2015-10-19 12:13 ` Richard Biener
2015-10-23 15:21 ` Alan Lawrence
0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2015-10-19 12:13 UTC (permalink / raw)
To: Alan Lawrence; +Cc: GCC Patches
On Fri, Oct 16, 2015 at 5:25 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> This lets the vectorizer handle some simple strides expressed using left-shift
> rather than mul, e.g. a[i << 1] (whereas previously only a[i * 2] would have
> been handled).
>
> This patch does *not* handle the general case of shifts - neither a[i << j]
> nor a[1 << i] will be handled; that would be a significantly bigger patch
> (probably duplicating or generalizing much of chrec_fold_multiply and
> chrec_fold_multiply_poly_poly in tree-chrec.c), and would probably also only
> be applicable to machines with gather-load support.
>
> Bootstrapped+check-gcc,g++,gfortran on x86_64, AArch64 and ARM, also Ada on x86_64.
>
> Is this OK for trunk?
>
> gcc/ChangeLog:
>
> PR tree-optimization/65963
> * tree-scalar-evolution.c (interpret_rhs_expr): Handle some LSHIFT_EXPRs
> as equivalent MULT_EXPRs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-strided-shift-1.c: New.
> ---
> gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 ++++++++++++++++++++++++
> gcc/tree-scalar-evolution.c | 18 +++++++++++++
> 2 files changed, 51 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> new file mode 100644
> index 0000000..b1ce2ec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/65963. */
> +#include "tree-vect.h"
> +
> +#define N 512
> +
> +int in[2*N], out[N];
> +
> +__attribute__ ((noinline)) void
> +loop (void)
> +{
> + for (int i = 0; i < N; i++)
> + out[i] = in[i << 1] + 7;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> + check_vect ();
> + for (int i = 0; i < 2*N; i++)
> + {
> + in[i] = i;
> + __asm__ volatile ("" : : : "memory");
> + }
> + loop ();
> + __asm__ volatile ("" : : : "memory");
> + for (int i = 0; i < N; i++)
> + {
> + if (out[i] != i*2 + 7)
> + abort ();
> + }
> + return 0;
> +}
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { vect_strided2 } } } } */
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 0753bf3..e478b0e 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ b/gcc/tree-scalar-evolution.c
> @@ -1831,12 +1831,30 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
> break;
>
> case MULT_EXPR:
> + case LSHIFT_EXPR:
> + /* Handle A<<B as A * (1<<B). */
> chrec1 = analyze_scalar_evolution (loop, rhs1);
> chrec2 = analyze_scalar_evolution (loop, rhs2);
> chrec1 = chrec_convert (type, chrec1, at_stmt);
> chrec2 = chrec_convert (type, chrec2, at_stmt);
> chrec1 = instantiate_parameters (loop, chrec1);
> chrec2 = instantiate_parameters (loop, chrec2);
> + if (code == LSHIFT_EXPR)
> + {
> + /* Do the shift in the larger size, as in e.g. (long) << (int)32,
> + we must do 1<<32 as a long or we'd overflow. */
Err, you should always do the shift in the type of rhs1. You should also
avoid the chrec_convert of rhs2 above for shifts. I think globbing
shifts and multiplies together doesn't make the code any clearer.
Richard.
> + tree type = TREE_TYPE (chrec2);
> + if (TYPE_PRECISION (TREE_TYPE (chrec1)) > TYPE_PRECISION (type))
> + type = TREE_TYPE (chrec1);
> + if (TYPE_PRECISION (type) == 0)
> + {
> + res = chrec_dont_know;
> + break;
> + }
> + chrec2 = fold_build2 (LSHIFT_EXPR, type,
> + build_int_cst (type, 1),
> + chrec2);
> + }
> res = chrec_fold_multiply (type, chrec1, chrec2);
> break;
>
> --
> 1.9.1
>
* Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant
2015-10-19 12:13 ` Richard Biener
@ 2015-10-23 15:21 ` Alan Lawrence
2015-10-26 9:00 ` Richard Biener
0 siblings, 1 reply; 10+ messages in thread
From: Alan Lawrence @ 2015-10-23 15:21 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.guenther
On 19/10/15 12:49, Richard Biener wrote:
> Err, you should always do the shift in the type of rhs1. You should also
> avoid the chrec_convert of rhs2 above for shifts.
Err, yes, indeed. Needed to keep the chrec_convert before the
chrec_fold_multiply, and the rest followed. How's this?
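For what it's worth, the point about doing the shift in the type of rhs1 can be
seen in plain C too, where the result type of a shift is that of the (promoted)
left operand and the type of the shift count does not matter. A minimal sketch,
assuming a 64-bit value shifted by a narrower count; the function names are
hypothetical and this is not the GCC-internal code:

```c
#include <stdint.h>

/* 1 << b computed in the 64-bit type of the value being shifted, so
   b == 32 does not overflow.  Requires b < 64.  */
uint64_t
pow2_in_wide_type (unsigned b)
{
  return (uint64_t) 1 << b;
}

/* a << b rewritten as a * (1 << b), with the constant 1 taken in the
   type of a rather than in the type of the shift count.  */
uint64_t
shl_as_mul_u64 (uint64_t a, unsigned b)
{
  return a * pow2_in_wide_type (b);
}
```

With this, shl_as_mul_u64 (3, 32) gives the same value as (uint64_t) 3 << 32,
whereas building the 1 in a 32-bit type would not be able to represent 1 << 32.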
Bootstrapped+check-gcc,g++ on x86, ARM, AArch64.
gcc/ChangeLog (as before):
PR tree-optimization/65963
* tree-scalar-evolution.c (interpret_rhs_expr): Handle some LSHIFT_EXPRs
as equivalent MULT_EXPRs.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-strided-shift-1.c: New.
---
gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 ++++++++++++++++++++++++
gcc/tree-scalar-evolution.c | 15 +++++++++++
2 files changed, 48 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
new file mode 100644
index 0000000..b1ce2ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
@@ -0,0 +1,33 @@
+/* PR tree-optimization/65963. */
+#include "tree-vect.h"
+
+#define N 512
+
+int in[2*N], out[N];
+
+__attribute__ ((noinline)) void
+loop (void)
+{
+  for (int i = 0; i < N; i++)
+    out[i] = in[i << 1] + 7;
+}
+
+int
+main (int argc, char **argv)
+{
+  check_vect ();
+  for (int i = 0; i < 2*N; i++)
+    {
+      in[i] = i;
+      __asm__ volatile ("" : : : "memory");
+    }
+  loop ();
+  __asm__ volatile ("" : : : "memory");
+  for (int i = 0; i < N; i++)
+    {
+      if (out[i] != i*2 + 7)
+        abort ();
+    }
+  return 0;
+}
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { vect_strided2 } } } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 0753bf3..129682f 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1840,6 +1840,21 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
       res = chrec_fold_multiply (type, chrec1, chrec2);
       break;
+    case LSHIFT_EXPR:
+      /* Handle A<<B as A * (1<<B). */
+      chrec1 = analyze_scalar_evolution (loop, rhs1);
+      chrec2 = analyze_scalar_evolution (loop, rhs2);
+      chrec1 = chrec_convert (type, chrec1, at_stmt);
+      chrec1 = instantiate_parameters (loop, chrec1);
+      chrec2 = instantiate_parameters (loop, chrec2);
+
+      chrec2 = fold_build2 (LSHIFT_EXPR, TREE_TYPE (rhs1),
+                            build_int_cst (TREE_TYPE (rhs1), 1),
+                            chrec2);
+      chrec2 = chrec_convert (type, chrec2, at_stmt);
+      res = chrec_fold_multiply (type, chrec1, chrec2);
+      break;
+
     CASE_CONVERT:
       /* In case we have a truncation of a widened operation that in
          the truncated type has undefined overflow behavior analyze
--
1.9.1
* Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant
2015-10-23 15:21 ` Alan Lawrence
@ 2015-10-26 9:00 ` Richard Biener
2015-10-27 12:42 ` Alan Lawrence
0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2015-10-26 9:00 UTC (permalink / raw)
To: Alan Lawrence; +Cc: GCC Patches
On Fri, Oct 23, 2015 at 5:15 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> On 19/10/15 12:49, Richard Biener wrote:
>
>> Err, you should always do the shift in the type of rhs1. You should also
>> avoid the chrec_convert of rhs2 above for shifts.
>
> Err, yes, indeed. Needed to keep the chrec_convert before the
> chrec_fold_multiply, and the rest followed. How's this?
>
> Bootstrapped+check-gcc,g++ on x86, ARM, AArch64.
>
> gcc/ChangeLog (as before):
>
> PR tree-optimization/65963
> * tree-scalar-evolution.c (interpret_rhs_expr): Handle some LSHIFT_EXPRs
> as equivalent MULT_EXPRs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-strided-shift-1.c: New.
> ---
> gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 ++++++++++++++++++++++++
> gcc/tree-scalar-evolution.c | 15 +++++++++++
> 2 files changed, 48 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> new file mode 100644
> index 0000000..b1ce2ec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/65963. */
> +#include "tree-vect.h"
> +
> +#define N 512
> +
> +int in[2*N], out[N];
> +
> +__attribute__ ((noinline)) void
> +loop (void)
> +{
> + for (int i = 0; i < N; i++)
> + out[i] = in[i << 1] + 7;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> + check_vect ();
> + for (int i = 0; i < 2*N; i++)
> + {
> + in[i] = i;
> + __asm__ volatile ("" : : : "memory");
> + }
> + loop ();
> + __asm__ volatile ("" : : : "memory");
> + for (int i = 0; i < N; i++)
> + {
> + if (out[i] != i*2 + 7)
> + abort ();
> + }
> + return 0;
> +}
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { vect_strided2 } } } } */
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 0753bf3..129682f 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ b/gcc/tree-scalar-evolution.c
> @@ -1840,6 +1840,21 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
> res = chrec_fold_multiply (type, chrec1, chrec2);
> break;
>
> + case LSHIFT_EXPR:
> + /* Handle A<<B as A * (1<<B). */
> + chrec1 = analyze_scalar_evolution (loop, rhs1);
> + chrec2 = analyze_scalar_evolution (loop, rhs2);
> + chrec1 = chrec_convert (type, chrec1, at_stmt);
> + chrec1 = instantiate_parameters (loop, chrec1);
> + chrec2 = instantiate_parameters (loop, chrec2);
> + chrec2 = fold_build2 (LSHIFT_EXPR, TREE_TYPE (rhs1),
> + build_int_cst (TREE_TYPE (rhs1), 1),
'type' instead of TREE_TYPE (rhs1)
> + chrec2);
> + chrec2 = chrec_convert (type, chrec2, at_stmt);
so you can remove this chrec_convert.
Ok with that change.
Richard.
> + res = chrec_fold_multiply (type, chrec1, chrec2);
> + break;
> +
> CASE_CONVERT:
> /* In case we have a truncation of a widened operation that in
> the truncated type has undefined overflow behavior analyze
> --
> 1.9.1
>
* Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant
2015-10-26 9:00 ` Richard Biener
@ 2015-10-27 12:42 ` Alan Lawrence
2015-10-27 22:31 ` H.J. Lu
0 siblings, 1 reply; 10+ messages in thread
From: Alan Lawrence @ 2015-10-27 12:42 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.guenther
--in-reply-to <CAFiYyc3TePGbER2Jqc8-X_ij4GHtjjoxfzFFcNYzHxhGQbE0iQ@mail.gmail.com>
On 26/10/15 08:58, Richard Biener wrote:
>
> On Fri, Oct 23, 2015 at 5:15 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
>> + chrec2 = fold_build2 (LSHIFT_EXPR, TREE_TYPE (rhs1),
>> + build_int_cst (TREE_TYPE (rhs1), 1),
>
> 'type' instead of TREE_TYPE (rhs1)
I presume you mean the first of the two (allowing removal of the chrec_convert),
and that I keep the second TREE_TYPE (rhs1), consistent with your previous
observation that I should do the multiply in the type of rhs1. (This appears
correct looking at e.g. gcc.target/i386/avx2-vpsllwi-2.c)
Hence, I've committed the attached as r229437.
Thanks, Alan
---
gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 ++++++++++++++++++++++++
gcc/tree-scalar-evolution.c | 14 ++++++++++
2 files changed, 47 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
new file mode 100644
index 0000000..b1ce2ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
@@ -0,0 +1,33 @@
+/* PR tree-optimization/65963. */
+#include "tree-vect.h"
+
+#define N 512
+
+int in[2*N], out[N];
+
+__attribute__ ((noinline)) void
+loop (void)
+{
+  for (int i = 0; i < N; i++)
+    out[i] = in[i << 1] + 7;
+}
+
+int
+main (int argc, char **argv)
+{
+  check_vect ();
+  for (int i = 0; i < 2*N; i++)
+    {
+      in[i] = i;
+      __asm__ volatile ("" : : : "memory");
+    }
+  loop ();
+  __asm__ volatile ("" : : : "memory");
+  for (int i = 0; i < N; i++)
+    {
+      if (out[i] != i*2 + 7)
+        abort ();
+    }
+  return 0;
+}
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { vect_strided2 } } } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 0753bf3..8e95ddd 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1840,6 +1840,20 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
       res = chrec_fold_multiply (type, chrec1, chrec2);
       break;
+    case LSHIFT_EXPR:
+      /* Handle A<<B as A * (1<<B). */
+      chrec1 = analyze_scalar_evolution (loop, rhs1);
+      chrec2 = analyze_scalar_evolution (loop, rhs2);
+      chrec1 = chrec_convert (type, chrec1, at_stmt);
+      chrec1 = instantiate_parameters (loop, chrec1);
+      chrec2 = instantiate_parameters (loop, chrec2);
+
+      chrec2 = fold_build2 (LSHIFT_EXPR, type,
+                            build_int_cst (TREE_TYPE (rhs1), 1),
+                            chrec2);
+      res = chrec_fold_multiply (type, chrec1, chrec2);
+      break;
+
     CASE_CONVERT:
       /* In case we have a truncation of a widened operation that in
          the truncated type has undefined overflow behavior analyze
--
1.9.1
* Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant
2015-10-27 12:42 ` Alan Lawrence
@ 2015-10-27 22:31 ` H.J. Lu
2015-11-03 10:15 ` Alan Lawrence
0 siblings, 1 reply; 10+ messages in thread
From: H.J. Lu @ 2015-10-27 22:31 UTC (permalink / raw)
To: Alan Lawrence; +Cc: GCC Patches, Richard Biener
On Tue, Oct 27, 2015 at 5:40 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> --in-reply-to <CAFiYyc3TePGbER2Jqc8-X_ij4GHtjjoxfzFFcNYzHxhGQbE0iQ@mail.gmail.com>
>
> On 26/10/15 08:58, Richard Biener wrote:
>>
>> On Fri, Oct 23, 2015 at 5:15 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>> + chrec2 = fold_build2 (LSHIFT_EXPR, TREE_TYPE (rhs1),
>>> + build_int_cst (TREE_TYPE (rhs1), 1),
>>
>> 'type' instead of TREE_TYPE (rhs1)
>
> I presume you mean the first of the two (allowing removal of the chrec_convert),
> and that I keep the second TREE_TYPE (rhs1), consistent with your previous
> observation that I should do the multiply in the type of rhs1. (This appears
> correct looking at e.g. gcc.target/i386/avx2-vpsllwi-2.c)
>
> Hence, I've committed the attached as r229437.
>
It caused:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68112
--
H.J.
* Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant
2015-10-27 22:31 ` H.J. Lu
@ 2015-11-03 10:15 ` Alan Lawrence
2015-11-03 11:36 ` Richard Biener
0 siblings, 1 reply; 10+ messages in thread
From: Alan Lawrence @ 2015-11-03 10:15 UTC (permalink / raw)
To: gcc-patches; +Cc: hjl.tools, richard.guenther
On 27/10/15 22:27, H.J. Lu wrote:
>
> It caused:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68112
Bah :(.
So yes, in the general case we can't rewrite (a << 1) to (a * 2): for signed
types, (0x7f...f) << 1 == -2, whereas (0x7f...f * 2) is undefined behaviour.
Oh well :(...
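To make the difference concrete, here is a small sketch (illustrative only,
with made-up function names). It models the wrapping shift by going through
unsigned arithmetic, and uses the GCC/Clang builtin __builtin_mul_overflow to
detect the case where the signed multiply would overflow. It assumes the
unsigned-to-int conversion wraps modulo 2^N, which is what GCC documents.

```c
#include <limits.h>
#include <stdbool.h>

/* a << 1 with wraparound, computed in unsigned arithmetic so that no
   signed overflow occurs.  The final conversion back to int is
   implementation-defined in ISO C; GCC reduces modulo 2^N.  */
int
wrapping_shl1 (int a)
{
  return (int) ((unsigned) a << 1);
}

/* Return true if the signed multiplication a * 2 would overflow, i.e.
   the case where rewriting the shift as a signed multiply is unsafe.  */
bool
mul2_overflows (int a)
{
  int ignored;
  return __builtin_mul_overflow (a, 2, &ignored);
}
```

For INT_MAX, wrapping_shl1 yields -2 while mul2_overflows reports true; for a
small value such as 21 both forms agree on 42.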
I don't have a really good fix for this. The best way I can see would be to
make definedness of overflow a property of either the type, or maybe of the
chrec, settable at a finer granularity than at present, rather than
TYPE_OVERFLOW_UNDEFINED = (type is signed) && !(a bunch of global flags).
However, I don't think I'm going to have time for that patch before the end of
stage 1.
So, I've reverted my r229437. There is a simpler fix: to only apply the rewrite
for unsigned types. I attach that patch, which I've bootstrapped on x86; but
although I think this way is correct, I'm not really sure whether this is
something that should go in. Thoughts?
--Alan
---
gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 ++++++++++++++++++++++++
gcc/tree-scalar-evolution.c | 19 ++++++++++++++
2 files changed, 52 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
new file mode 100644
index 0000000..40e6561
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
@@ -0,0 +1,33 @@
+/* PR tree-optimization/65963. */
+#include "tree-vect.h"
+
+#define N 512
+
+int in[2*N], out[N];
+
+__attribute__ ((noinline)) void
+loop (void)
+{
+  for (unsigned i = 0; i < N; i++)
+    out[i] = in[i << 1] + 7;
+}
+
+int
+main (int argc, char **argv)
+{
+  check_vect ();
+  for (int i = 0; i < 2*N; i++)
+    {
+      in[i] = i;
+      __asm__ volatile ("" : : : "memory");
+    }
+  loop ();
+  __asm__ volatile ("" : : : "memory");
+  for (int i = 0; i < N; i++)
+    {
+      if (out[i] != i*2 + 7)
+        abort ();
+    }
+  return 0;
+}
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { vect_strided2 } } } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 0753bf3..d8f3d46 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1840,6 +1840,25 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
       res = chrec_fold_multiply (type, chrec1, chrec2);
       break;
+    case LSHIFT_EXPR:
+      if (!TYPE_OVERFLOW_UNDEFINED (type))
+        {
+          /* Handle A<<B as A * (1<<B). */
+          chrec1 = analyze_scalar_evolution (loop, rhs1);
+          chrec2 = analyze_scalar_evolution (loop, rhs2);
+          chrec1 = chrec_convert (type, chrec1, at_stmt);
+          chrec1 = instantiate_parameters (loop, chrec1);
+          chrec2 = instantiate_parameters (loop, chrec2);
+
+          chrec2 = fold_build2 (LSHIFT_EXPR, type,
+                                build_int_cst (TREE_TYPE (rhs1), 1),
+                                chrec2);
+          res = chrec_fold_multiply (type, chrec1, chrec2);
+        }
+      else
+        res = chrec_dont_know;
+      break;
+
     CASE_CONVERT:
       /* In case we have a truncation of a widened operation that in
          the truncated type has undefined overflow behavior analyze
--
1.9.1
* Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant
2015-11-03 10:15 ` Alan Lawrence
@ 2015-11-03 11:36 ` Richard Biener
2015-11-05 13:26 ` Alan Lawrence
0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2015-11-03 11:36 UTC (permalink / raw)
To: Alan Lawrence; +Cc: GCC Patches, H.J. Lu
On Tue, Nov 3, 2015 at 11:15 AM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> On 27/10/15 22:27, H.J. Lu wrote:
>>
>> It caused:
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68112
>
> Bah :(.
>
> So yes, in general case, we can't rewrite (a << 1) to (a * 2) as for signed
> types (0x7f...f) << 1 == -2 whereas (0x7f...f * 2) is undefined behaviour.
> Oh well :(...
>
> I don't have a really good fix for this. The best way I can see would be to try
> to make definedness of overflow a property of either the type, or maybe of the
> chrec, and settable on a finer granularity than at present, rather than
> TYPE_OVERFLOW_UNDEFINED = (type is signed) && !(a bunch of global flags).
> However, I don't think I'm going to have time for that patch before end of
> stage 1.
>
> So, I've reverted my r229437. There is a simpler fix: to only apply the rewrite
> for unsigned types. I attach that patch, which I've bootstrapped on x86; but
> although I think this way is correct, I'm not really sure whether this is
> something that should go in. Thoughts?
>
> --Alan
> ---
> gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 ++++++++++++++++++++++++
> gcc/tree-scalar-evolution.c | 19 ++++++++++++++
> 2 files changed, 52 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> new file mode 100644
> index 0000000..40e6561
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/65963. */
> +#include "tree-vect.h"
> +
> +#define N 512
> +
> +int in[2*N], out[N];
> +
> +__attribute__ ((noinline)) void
> +loop (void)
> +{
> + for (unsigned i = 0; i < N; i++)
> + out[i] = in[i << 1] + 7;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> + check_vect ();
> + for (int i = 0; i < 2*N; i++)
> + {
> + in[i] = i;
> + __asm__ volatile ("" : : : "memory");
> + }
> + loop ();
> + __asm__ volatile ("" : : : "memory");
> + for (int i = 0; i < N; i++)
> + {
> + if (out[i] != i*2 + 7)
> + abort ();
> + }
> + return 0;
> +}
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { vect_strided2 } } } } */
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 0753bf3..d8f3d46 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ b/gcc/tree-scalar-evolution.c
> @@ -1840,6 +1840,25 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
> res = chrec_fold_multiply (type, chrec1, chrec2);
> break;
>
> + case LSHIFT_EXPR:
> + if (!TYPE_OVERFLOW_UNDEFINED (type))
I think this should simply re-write A << B to (type) (unsigned-type) A
* (1U << B).
Does that then still vectorize the signed case?
> + {
> + /* Handle A<<B as A * (1<<B). */
> + chrec1 = analyze_scalar_evolution (loop, rhs1);
> + chrec2 = analyze_scalar_evolution (loop, rhs2);
> + chrec1 = chrec_convert (type, chrec1, at_stmt);
> + chrec1 = instantiate_parameters (loop, chrec1);
> + chrec2 = instantiate_parameters (loop, chrec2);
> +
> + chrec2 = fold_build2 (LSHIFT_EXPR, type,
> + build_int_cst (TREE_TYPE (rhs1), 1),
> + chrec2);
> + res = chrec_fold_multiply (type, chrec1, chrec2);
> + }
> + else
> + res = chrec_dont_know;
> + break;
> +
> CASE_CONVERT:
> /* In case we have a truncation of a widened operation that in
> the truncated type has undefined overflow behavior analyze
> --
> 1.9.1
>
* Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant
2015-11-03 11:36 ` Richard Biener
@ 2015-11-05 13:26 ` Alan Lawrence
2015-11-05 15:21 ` Richard Biener
0 siblings, 1 reply; 10+ messages in thread
From: Alan Lawrence @ 2015-11-05 13:26 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.guenther, hjl.tools
On 3 November 2015 at 11:35, Richard Biener <richard.guenther@gmail.com> wrote:
>
> I think this should simply re-write A << B to (type) (unsigned-type) A
> * (1U << B).
>
> Does that then still vectorize the signed case?
I didn't realize our representation of chrecs could express that.
Yes, it does - thanks! (And the avx512ifma- test is compiled without warnings.)
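In C terms, the suggested rewrite of A << B to (type) ((unsigned-type) A *
(1U << B)) looks roughly like the sketch below. This is only an analogy for
the chrec manipulation, not the GCC code itself: the function name is
hypothetical, the types are fixed at 32 bits for concreteness, and the final
conversion back to the signed type is assumed to wrap (as GCC documents).

```c
#include <stdint.h>

/* Compute a << b as a multiplication carried out entirely in the
   unsigned type, where overflow wraps and is well defined, then
   convert the result back to the signed type.  Requires b < 32.  */
int32_t
shl_via_unsigned_mul (int32_t a, unsigned b)
{
  uint32_t ua = (uint32_t) a;          /* (unsigned-type) A  */
  uint32_t pow2 = (uint32_t) 1 << b;   /* 1U << B  */
  return (int32_t) (ua * pow2);        /* (type) (... * ...)  */
}
```

Because the multiply happens in the unsigned type, cases such as
shl_via_unsigned_mul (INT32_MAX, 1) or shl_via_unsigned_mul (128, 24) give the
wrapped results (-2 and INT32_MIN) without any signed overflow being involved.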
Patch attached. I've added a platform-independent version of the failing AVX512
test too.
--Alan
gcc/ChangeLog:
PR tree-optimization/65963
* tree-scalar-evolution.c (interpret_rhs_expr): Try to handle
LSHIFT_EXPRs as equivalent unsigned MULT_EXPRs.
gcc/testsuite/ChangeLog:
* gcc.dg/pr68112.c: New.
* gcc.dg/vect/vect-strided-shift-1.c: New.
---
gcc/testsuite/gcc.dg/pr68112.c | 11 ++++++++
gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 ++++++++++++++++++++++++
gcc/tree-scalar-evolution.c | 17 ++++++++++++
3 files changed, 61 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/pr68112.c
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
diff --git a/gcc/testsuite/gcc.dg/pr68112.c b/gcc/testsuite/gcc.dg/pr68112.c
new file mode 100644
index 0000000..0a45b03
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68112.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Waggressive-loop-optimizations" } */
+
+int *a;
+
+void
+foo ()
+{
+  for (int i = 0; i < 65536; i++)
+    *a = i << 24;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
new file mode 100644
index 0000000..b1ce2ec
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
@@ -0,0 +1,33 @@
+/* PR tree-optimization/65963. */
+#include "tree-vect.h"
+
+#define N 512
+
+int in[2*N], out[N];
+
+__attribute__ ((noinline)) void
+loop (void)
+{
+  for (int i = 0; i < N; i++)
+    out[i] = in[i << 1] + 7;
+}
+
+int
+main (int argc, char **argv)
+{
+  check_vect ();
+  for (int i = 0; i < 2*N; i++)
+    {
+      in[i] = i;
+      __asm__ volatile ("" : : : "memory");
+    }
+  loop ();
+  __asm__ volatile ("" : : : "memory");
+  for (int i = 0; i < N; i++)
+    {
+      if (out[i] != i*2 + 7)
+        abort ();
+    }
+  return 0;
+}
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { vect_strided2 } } } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 0753bf3..60d515d 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1840,6 +1840,23 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
       res = chrec_fold_multiply (type, chrec1, chrec2);
       break;
+    case LSHIFT_EXPR:
+      {
+        /* Handle A<<B as A * (1<<B). */
+        tree uns = unsigned_type_for (type);
+        chrec1 = analyze_scalar_evolution (loop, rhs1);
+        chrec2 = analyze_scalar_evolution (loop, rhs2);
+        chrec1 = chrec_convert (uns, chrec1, at_stmt);
+        chrec1 = instantiate_parameters (loop, chrec1);
+        chrec2 = instantiate_parameters (loop, chrec2);
+
+        tree one = build_int_cst (unsigned_type_for (TREE_TYPE (rhs1)), 1);
+        chrec2 = fold_build2 (LSHIFT_EXPR, uns, one, chrec2);
+        res = chrec_fold_multiply (uns, chrec1, chrec2);
+        res = chrec_convert (type, res, at_stmt);
+      }
+      break;
+
     CASE_CONVERT:
       /* In case we have a truncation of a widened operation that in
          the truncated type has undefined overflow behavior analyze
--
1.9.1
* Re: [PATCH] tree-scalar-evolution.c: Handle LSHIFT by constant
2015-11-05 13:26 ` Alan Lawrence
@ 2015-11-05 15:21 ` Richard Biener
0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2015-11-05 15:21 UTC (permalink / raw)
To: Alan Lawrence; +Cc: GCC Patches, H.J. Lu
On Thu, Nov 5, 2015 at 2:26 PM, Alan Lawrence <alan.lawrence@arm.com> wrote:
> On 3 November 2015 at 11:35, Richard Biener <richard.guenther@gmail.com> wrote:
>>
>> I think this should simply re-write A << B to (type) (unsigned-type) A
>> * (1U << B).
>>
>> Does that then still vectorize the signed case?
>
> I didn't realize our representation of chrec's could express that.
> Yes, it does - thanks! (And the avx512ifma- test is compiled without warnings.)
>
> Patch attached. I've added a platform-independent version of the failing AVX512
> test too.
>
> --Alan
>
> gcc/ChangeLog:
>
> PR tree-optimization/65963
> * tree-scalar-evolution.c (interpret_rhs_expr): Try to handle
> LSHIFT_EXPRs as equivalent unsigned MULT_EXPRs.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr68112.c: New.
> * gcc.dg/vect/vect-strided-shift-1.c: New.
> ---
> gcc/testsuite/gcc.dg/pr68112.c | 11 ++++++++
> gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c | 33 ++++++++++++++++++++++++
> gcc/tree-scalar-evolution.c | 17 ++++++++++++
> 3 files changed, 61 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/pr68112.c
> create mode 100644 gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr68112.c b/gcc/testsuite/gcc.dg/pr68112.c
> new file mode 100644
> index 0000000..0a45b03
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr68112.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Waggressive-loop-optimizations" } */
> +
> +int *a;
> +
> +void
> +foo ()
> +{
> + for (int i = 0; i < 65536; i++)
> + *a = i << 24;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> new file mode 100644
> index 0000000..b1ce2ec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-strided-shift-1.c
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/65963. */
> +#include "tree-vect.h"
> +
> +#define N 512
> +
> +int in[2*N], out[N];
> +
> +__attribute__ ((noinline)) void
> +loop (void)
> +{
> + for (int i = 0; i < N; i++)
> + out[i] = in[i << 1] + 7;
> +}
> +
> +int
> +main (int argc, char **argv)
> +{
> + check_vect ();
> + for (int i = 0; i < 2*N; i++)
> + {
> + in[i] = i;
> + __asm__ volatile ("" : : : "memory");
> + }
> + loop ();
> + __asm__ volatile ("" : : : "memory");
> + for (int i = 0; i < N; i++)
> + {
> + if (out[i] != i*2 + 7)
> + abort ();
> + }
> + return 0;
> +}
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" { target { vect_strided2 } } } } */
> diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
> index 0753bf3..60d515d 100644
> --- a/gcc/tree-scalar-evolution.c
> +++ b/gcc/tree-scalar-evolution.c
> @@ -1840,6 +1840,23 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
> res = chrec_fold_multiply (type, chrec1, chrec2);
> break;
>
> + case LSHIFT_EXPR:
> + {
> + /* Handle A<<B as A * (1<<B). */
> + tree uns = unsigned_type_for (type);
> + chrec1 = analyze_scalar_evolution (loop, rhs1);
> + chrec2 = analyze_scalar_evolution (loop, rhs2);
> + chrec1 = chrec_convert (uns, chrec1, at_stmt);
> + chrec1 = instantiate_parameters (loop, chrec1);
> + chrec2 = instantiate_parameters (loop, chrec2);
> +
> + tree one = build_int_cst (unsigned_type_for (TREE_TYPE (rhs1)), 1);
use 'uns' for the type.
Ok with that change.
Richard.
> + chrec2 = fold_build2 (LSHIFT_EXPR, uns, one, chrec2);
> + res = chrec_fold_multiply (uns, chrec1, chrec2);
> + res = chrec_convert (type, res, at_stmt);
> + }
> + break;
> +
> CASE_CONVERT:
> /* In case we have a truncation of a widened operation that in
> the truncated type has undefined overflow behavior analyze
> --
> 1.9.1
>