* [PATCH][omp, simt] Handle alternative IV
@ 2020-10-02 13:21 Tom de Vries
2020-10-15 15:05 ` [PING][PATCH][omp, " Tom de Vries
0 siblings, 1 reply; 5+ messages in thread
From: Tom de Vries @ 2020-10-02 13:21 UTC (permalink / raw)
To: gcc-patches; +Cc: Jakub Jelinek, Alexander Monakov
Hi,
Consider the test-case libgomp.c/pr81778.c added in this commit, with
this core loop (note: CANARY_SIZE set to 0 for simplicity):
...
int s = 1;
#pragma omp target simd
for (int i = N - 1; i > -1; i -= s)
a[i] = 1;
...
which, given that N is 32, sets a[0..31] to 1.
After omp-expand, this looks like:
...
<bb 5> :
simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
.omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
D.3193 = -s;
s.9 = s;
D.3204 = .GOMP_SIMT_LANE ();
D.3205 = -s.9;
D.3206 = (int) D.3204;
D.3207 = D.3205 * D.3206;
i = D.3207 + 31;
D.3209 = 0;
D.3210 = -s.9;
D.3211 = D.3210 - i;
D.3210 = -s.9;
D.3212 = D.3211 / D.3210;
D.3213 = (unsigned int) D.3212;
D.3213 = i >= 0 ? D.3213 : 0;
<bb 19> :
if (D.3209 < D.3213)
goto <bb 6>; [87.50%]
else
goto <bb 7>; [12.50%]
<bb 6> :
a[i] = 1;
D.3215 = -s.9;
D.3219 = .GOMP_SIMT_VF ();
D.3216 = (int) D.3219;
D.3220 = D.3215 * D.3216;
i = D.3220 + i;
D.3209 = D.3209 + 1;
goto <bb 19>; [100.00%]
...
On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
on the lane that is executing) at bb entry.
So we have the following sequence:
- a[0..31] is set to 1
- i is updated to -32..-1
- D.3209 is updated to 1 (being 0 initially)
- bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
to true
- bb6 is once more executed, which should not happen because all the elements
that needed to be handled were already handled.
- consequently, elements that should not be written are written
- with CANARY_SIZE == 0, we may run into a libgomp error:
...
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
...
and with CANARY_SIZE unmodified, we run into:
...
Expected 0, got 1 at base[-961]
Aborted (core dumped)
...
The cause of this is as follows:
- because the step s is a variable rather than a constant, an alternative
IV (D.3209 in our example) is generated in expand_omp_simd, and the
loop condition is tested in terms of the alternative IV rather than
the original IV (i in our example).
- the SIMT code in expand_omp_simd works by modifying step and initial value.
- The initial value fd->loop.n1 is loaded into a variable n1, which is
modified by the SIMT code and then used there-after.
- The step fd->loop.step is loaded into a variable step, which is is modified
by the SIMT code, but afterwards there are uses of both step and
fd->loop.step.
- There are uses of fd->loop.step in the alternative IV handling code,
which should use step instead.
Fix this by introducing an additional variable orig_step, which is not
modified by the SIMT code and replacing all remaining uses of fd->loop.step
by either step or orig_step.
Build on x86_64-linux with nvptx accelerator, tested libgomp.
This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
with driver 450.66.
OK for trunk?
Thanks,
- Tom
[omp, simt] Handle alternative IV
gcc/ChangeLog:
2020-10-02 Tom de Vries <tdevries@suse.de>
* omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
fd->loop.step by either step or orig_step.
libgomp/ChangeLog:
2020-10-02 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/pr81778.c: New test.
---
gcc/omp-expand.c | 11 ++++----
libgomp/testsuite/libgomp.c/pr81778.c | 48 +++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+), 5 deletions(-)
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 99cb4f9dda4..80e35ac0294 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
n2 = OMP_CLAUSE_DECL (innerc);
}
tree step = fd->loop.step;
+ tree orig_step = step; /* May be different from step if is_simt. */
bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
OMP_CLAUSE__SIMT_);
@@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
tree altv = NULL_TREE, altn2 = NULL_TREE;
if (fd->collapse == 1
&& !broken_loop
- && TREE_CODE (fd->loops[0].step) != INTEGER_CST)
+ && TREE_CODE (orig_step) != INTEGER_CST)
{
/* The vectorizer currently punts on loops with non-constant steps
for the main IV (can't compute number of iterations and gives up
@@ -6471,7 +6472,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
itype = signed_type_for (itype);
t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
t = fold_build2 (PLUS_EXPR, itype,
- fold_convert (itype, fd->loop.step), t);
+ fold_convert (itype, step), t);
t = fold_build2 (PLUS_EXPR, itype, t, fold_convert (itype, n2));
t = fold_build2 (MINUS_EXPR, itype, t,
fold_convert (itype, fd->loop.v));
@@ -6479,10 +6480,10 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
t = fold_build2 (TRUNC_DIV_EXPR, itype,
fold_build1 (NEGATE_EXPR, itype, t),
fold_build1 (NEGATE_EXPR, itype,
- fold_convert (itype, fd->loop.step)));
+ fold_convert (itype, step)));
else
t = fold_build2 (TRUNC_DIV_EXPR, itype, t,
- fold_convert (itype, fd->loop.step));
+ fold_convert (itype, step));
t = fold_convert (TREE_TYPE (altv), t);
altn2 = create_tmp_var (TREE_TYPE (altv));
expand_omp_build_assign (&gsi, altn2, t);
@@ -6630,7 +6631,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
if (is_simt)
{
gsi = gsi_start_bb (l2_bb);
- step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), fd->loop.step, step);
+ step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), orig_step, step);
if (POINTER_TYPE_P (type))
t = fold_build_pointer_plus (fd->loop.v, step);
else
diff --git a/libgomp/testsuite/libgomp.c/pr81778.c b/libgomp/testsuite/libgomp.c/pr81778.c
new file mode 100644
index 00000000000..571668eb36a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr81778.c
@@ -0,0 +1,48 @@
+/* Minimized from for-5.c. */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+/* Size of array we want to write. */
+#define N 32
+
+/* Size of extra space before and after. */
+#define CANARY_SIZE (N * 32)
+
+/* Start of array we want to write. */
+#define BASE (CANARY_SIZE)
+
+// Total size to be allocated.
+#define ALLOC_SIZE (CANARY_SIZE + N + CANARY_SIZE)
+
+#pragma omp declare target
+int a[ALLOC_SIZE];
+#pragma omp end declare target
+
+int
+main (void)
+{
+ /* Use variable step in for loop. */
+ int s = 1;
+
+#pragma omp target update to(a)
+
+ /* Write a[BASE] .. a[BASE + N - 1]. */
+#pragma omp target simd
+ for (int i = N - 1; i > -1; i -= s)
+ a[BASE + i] = 1;
+
+#pragma omp target update from(a)
+
+ for (int i = 0; i < ALLOC_SIZE; i++)
+ {
+ int expected = (BASE <= i && i < BASE + N) ? 1 : 0;
+ if (a[i] == expected)
+ continue;
+
+ printf ("Expected %d, got %d at base[%d]\n", expected, a[i], i - BASE);
+ abort ();
+ }
+
+ return 0;
+}
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PING][PATCH][omp, simt] Handle alternative IV
2020-10-02 13:21 [PATCH][omp, simt] Handle alternative IV Tom de Vries
@ 2020-10-15 15:05 ` Tom de Vries
2020-12-17 16:46 ` [PING^2][PATCH][omp, " Tom de Vries
0 siblings, 1 reply; 5+ messages in thread
From: Tom de Vries @ 2020-10-15 15:05 UTC (permalink / raw)
To: gcc-patches; +Cc: Jakub Jelinek, Alexander Monakov
On 10/2/20 3:21 PM, Tom de Vries wrote:
> Hi,
>
> Consider the test-case libgomp.c/pr81778.c added in this commit, with
> this core loop (note: CANARY_SIZE set to 0 for simplicity):
> ...
> int s = 1;
> #pragma omp target simd
> for (int i = N - 1; i > -1; i -= s)
> a[i] = 1;
> ...
> which, given that N is 32, sets a[0..31] to 1.
>
> After omp-expand, this looks like:
> ...
> <bb 5> :
> simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
> .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
> D.3193 = -s;
> s.9 = s;
> D.3204 = .GOMP_SIMT_LANE ();
> D.3205 = -s.9;
> D.3206 = (int) D.3204;
> D.3207 = D.3205 * D.3206;
> i = D.3207 + 31;
> D.3209 = 0;
> D.3210 = -s.9;
> D.3211 = D.3210 - i;
> D.3210 = -s.9;
> D.3212 = D.3211 / D.3210;
> D.3213 = (unsigned int) D.3212;
> D.3213 = i >= 0 ? D.3213 : 0;
>
> <bb 19> :
> if (D.3209 < D.3213)
> goto <bb 6>; [87.50%]
> else
> goto <bb 7>; [12.50%]
>
> <bb 6> :
> a[i] = 1;
> D.3215 = -s.9;
> D.3219 = .GOMP_SIMT_VF ();
> D.3216 = (int) D.3219;
> D.3220 = D.3215 * D.3216;
> i = D.3220 + i;
> D.3209 = D.3209 + 1;
> goto <bb 19>; [100.00%]
> ...
>
> On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
> on the lane that is executing) at bb entry.
>
> So we have the following sequence:
> - a[0..31] is set to 1
> - i is updated to -32..-1
> - D.3209 is updated to 1 (being 0 initially)
> - bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
> to true
> - bb6 is once more executed, which should not happen because all the elements
> that needed to be handled were already handled.
> - consequently, elements that should not be written are written
> - with CANARY_SIZE == 0, we may run into a libgomp error:
> ...
> libgomp: cuCtxSynchronize error: an illegal memory access was encountered
> ...
> and with CANARY_SIZE unmodified, we run into:
> ...
> Expected 0, got 1 at base[-961]
> Aborted (core dumped)
> ...
>
> The cause of this is as follows:
> - because the step s is a variable rather than a constant, an alternative
> IV (D.3209 in our example) is generated in expand_omp_simd, and the
> loop condition is tested in terms of the alternative IV rather than
> the original IV (i in our example).
> - the SIMT code in expand_omp_simd works by modifying step and initial value.
> - The initial value fd->loop.n1 is loaded into a variable n1, which is
> modified by the SIMT code and then used there-after.
> - The step fd->loop.step is loaded into a variable step, which is is modified
> by the SIMT code, but afterwards there are uses of both step and
> fd->loop.step.
> - There are uses of fd->loop.step in the alternative IV handling code,
> which should use step instead.
>
> Fix this by introducing an additional variable orig_step, which is not
> modified by the SIMT code and replacing all remaining uses of fd->loop.step
> by either step or orig_step.
>
> Build on x86_64-linux with nvptx accelerator, tested libgomp.
>
> This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
> with driver 450.66.
>
> OK for trunk?
>
Ping.
Thanks,
- Tom
> [omp, simt] Handle alternative IV
>
> gcc/ChangeLog:
>
> 2020-10-02 Tom de Vries <tdevries@suse.de>
>
> * omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
> fd->loop.step by either step or orig_step.
>
> libgomp/ChangeLog:
>
> 2020-10-02 Tom de Vries <tdevries@suse.de>
>
> * testsuite/libgomp.c/pr81778.c: New test.
>
> ---
> gcc/omp-expand.c | 11 ++++----
> libgomp/testsuite/libgomp.c/pr81778.c | 48 +++++++++++++++++++++++++++++++++++
> 2 files changed, 54 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
> index 99cb4f9dda4..80e35ac0294 100644
> --- a/gcc/omp-expand.c
> +++ b/gcc/omp-expand.c
> @@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
> n2 = OMP_CLAUSE_DECL (innerc);
> }
> tree step = fd->loop.step;
> + tree orig_step = step; /* May be different from step if is_simt. */
>
> bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
> OMP_CLAUSE__SIMT_);
> @@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
> tree altv = NULL_TREE, altn2 = NULL_TREE;
> if (fd->collapse == 1
> && !broken_loop
> - && TREE_CODE (fd->loops[0].step) != INTEGER_CST)
> + && TREE_CODE (orig_step) != INTEGER_CST)
> {
> /* The vectorizer currently punts on loops with non-constant steps
> for the main IV (can't compute number of iterations and gives up
> @@ -6471,7 +6472,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
> itype = signed_type_for (itype);
> t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
> t = fold_build2 (PLUS_EXPR, itype,
> - fold_convert (itype, fd->loop.step), t);
> + fold_convert (itype, step), t);
> t = fold_build2 (PLUS_EXPR, itype, t, fold_convert (itype, n2));
> t = fold_build2 (MINUS_EXPR, itype, t,
> fold_convert (itype, fd->loop.v));
> @@ -6479,10 +6480,10 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
> t = fold_build2 (TRUNC_DIV_EXPR, itype,
> fold_build1 (NEGATE_EXPR, itype, t),
> fold_build1 (NEGATE_EXPR, itype,
> - fold_convert (itype, fd->loop.step)));
> + fold_convert (itype, step)));
> else
> t = fold_build2 (TRUNC_DIV_EXPR, itype, t,
> - fold_convert (itype, fd->loop.step));
> + fold_convert (itype, step));
> t = fold_convert (TREE_TYPE (altv), t);
> altn2 = create_tmp_var (TREE_TYPE (altv));
> expand_omp_build_assign (&gsi, altn2, t);
> @@ -6630,7 +6631,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
> if (is_simt)
> {
> gsi = gsi_start_bb (l2_bb);
> - step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), fd->loop.step, step);
> + step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), orig_step, step);
> if (POINTER_TYPE_P (type))
> t = fold_build_pointer_plus (fd->loop.v, step);
> else
> diff --git a/libgomp/testsuite/libgomp.c/pr81778.c b/libgomp/testsuite/libgomp.c/pr81778.c
> new file mode 100644
> index 00000000000..571668eb36a
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/pr81778.c
> @@ -0,0 +1,48 @@
> +/* Minimized from for-5.c. */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +
> +/* Size of array we want to write. */
> +#define N 32
> +
> +/* Size of extra space before and after. */
> +#define CANARY_SIZE (N * 32)
> +
> +/* Start of array we want to write. */
> +#define BASE (CANARY_SIZE)
> +
> +// Total size to be allocated.
> +#define ALLOC_SIZE (CANARY_SIZE + N + CANARY_SIZE)
> +
> +#pragma omp declare target
> +int a[ALLOC_SIZE];
> +#pragma omp end declare target
> +
> +int
> +main (void)
> +{
> + /* Use variable step in for loop. */
> + int s = 1;
> +
> +#pragma omp target update to(a)
> +
> + /* Write a[BASE] .. a[BASE + N - 1]. */
> +#pragma omp target simd
> + for (int i = N - 1; i > -1; i -= s)
> + a[BASE + i] = 1;
> +
> +#pragma omp target update from(a)
> +
> + for (int i = 0; i < ALLOC_SIZE; i++)
> + {
> + int expected = (BASE <= i && i < BASE + N) ? 1 : 0;
> + if (a[i] == expected)
> + continue;
> +
> + printf ("Expected %d, got %d at base[%d]\n", expected, a[i], i - BASE);
> + abort ();
> + }
> +
> + return 0;
> +}
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PING^2][PATCH][omp, simt] Handle alternative IV
2020-10-15 15:05 ` [PING][PATCH][omp, " Tom de Vries
@ 2020-12-17 16:46 ` Tom de Vries
2021-04-22 11:46 ` [PING^3][PATCH][omp, " Tom de Vries
0 siblings, 1 reply; 5+ messages in thread
From: Tom de Vries @ 2020-12-17 16:46 UTC (permalink / raw)
To: gcc-patches; +Cc: Jakub Jelinek, Alexander Monakov
On 10/15/20 5:05 PM, Tom de Vries wrote:
> On 10/2/20 3:21 PM, Tom de Vries wrote:
>> Hi,
>>
>> Consider the test-case libgomp.c/pr81778.c added in this commit, with
>> this core loop (note: CANARY_SIZE set to 0 for simplicity):
>> ...
>> int s = 1;
>> #pragma omp target simd
>> for (int i = N - 1; i > -1; i -= s)
>> a[i] = 1;
>> ...
>> which, given that N is 32, sets a[0..31] to 1.
>>
>> After omp-expand, this looks like:
>> ...
>> <bb 5> :
>> simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
>> .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
>> D.3193 = -s;
>> s.9 = s;
>> D.3204 = .GOMP_SIMT_LANE ();
>> D.3205 = -s.9;
>> D.3206 = (int) D.3204;
>> D.3207 = D.3205 * D.3206;
>> i = D.3207 + 31;
>> D.3209 = 0;
>> D.3210 = -s.9;
>> D.3211 = D.3210 - i;
>> D.3210 = -s.9;
>> D.3212 = D.3211 / D.3210;
>> D.3213 = (unsigned int) D.3212;
>> D.3213 = i >= 0 ? D.3213 : 0;
>>
>> <bb 19> :
>> if (D.3209 < D.3213)
>> goto <bb 6>; [87.50%]
>> else
>> goto <bb 7>; [12.50%]
>>
>> <bb 6> :
>> a[i] = 1;
>> D.3215 = -s.9;
>> D.3219 = .GOMP_SIMT_VF ();
>> D.3216 = (int) D.3219;
>> D.3220 = D.3215 * D.3216;
>> i = D.3220 + i;
>> D.3209 = D.3209 + 1;
>> goto <bb 19>; [100.00%]
>> ...
>>
>> On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
>> on the lane that is executing) at bb entry.
>>
>> So we have the following sequence:
>> - a[0..31] is set to 1
>> - i is updated to -32..-1
>> - D.3209 is updated to 1 (being 0 initially)
>> - bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
>> to true
>> - bb6 is once more executed, which should not happen because all the elements
>> that needed to be handled were already handled.
>> - consequently, elements that should not be written are written
>> - with CANARY_SIZE == 0, we may run into a libgomp error:
>> ...
>> libgomp: cuCtxSynchronize error: an illegal memory access was encountered
>> ...
>> and with CANARY_SIZE unmodified, we run into:
>> ...
>> Expected 0, got 1 at base[-961]
>> Aborted (core dumped)
>> ...
>>
>> The cause of this is as follows:
>> - because the step s is a variable rather than a constant, an alternative
>> IV (D.3209 in our example) is generated in expand_omp_simd, and the
>> loop condition is tested in terms of the alternative IV rather than
>> the original IV (i in our example).
>> - the SIMT code in expand_omp_simd works by modifying step and initial value.
>> - The initial value fd->loop.n1 is loaded into a variable n1, which is
>> modified by the SIMT code and then used there-after.
>> - The step fd->loop.step is loaded into a variable step, which is is modified
>> by the SIMT code, but afterwards there are uses of both step and
>> fd->loop.step.
>> - There are uses of fd->loop.step in the alternative IV handling code,
>> which should use step instead.
>>
>> Fix this by introducing an additional variable orig_step, which is not
>> modified by the SIMT code and replacing all remaining uses of fd->loop.step
>> by either step or orig_step.
>>
>> Build on x86_64-linux with nvptx accelerator, tested libgomp.
>>
>> This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
>> with driver 450.66.
>>
>> OK for trunk?
>>
>
Ping^2.
Thanks,
- Tom
>> [omp, simt] Handle alternative IV
>>
>> gcc/ChangeLog:
>>
>> 2020-10-02 Tom de Vries <tdevries@suse.de>
>>
>> * omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
>> fd->loop.step by either step or orig_step.
>>
>> libgomp/ChangeLog:
>>
>> 2020-10-02 Tom de Vries <tdevries@suse.de>
>>
>> * testsuite/libgomp.c/pr81778.c: New test.
>>
>> ---
>> gcc/omp-expand.c | 11 ++++----
>> libgomp/testsuite/libgomp.c/pr81778.c | 48 +++++++++++++++++++++++++++++++++++
>> 2 files changed, 54 insertions(+), 5 deletions(-)
>>
>> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
>> index 99cb4f9dda4..80e35ac0294 100644
>> --- a/gcc/omp-expand.c
>> +++ b/gcc/omp-expand.c
>> @@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>> n2 = OMP_CLAUSE_DECL (innerc);
>> }
>> tree step = fd->loop.step;
>> + tree orig_step = step; /* May be different from step if is_simt. */
>>
>> bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
>> OMP_CLAUSE__SIMT_);
>> @@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>> tree altv = NULL_TREE, altn2 = NULL_TREE;
>> if (fd->collapse == 1
>> && !broken_loop
>> - && TREE_CODE (fd->loops[0].step) != INTEGER_CST)
>> + && TREE_CODE (orig_step) != INTEGER_CST)
>> {
>> /* The vectorizer currently punts on loops with non-constant steps
>> for the main IV (can't compute number of iterations and gives up
>> @@ -6471,7 +6472,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>> itype = signed_type_for (itype);
>> t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
>> t = fold_build2 (PLUS_EXPR, itype,
>> - fold_convert (itype, fd->loop.step), t);
>> + fold_convert (itype, step), t);
>> t = fold_build2 (PLUS_EXPR, itype, t, fold_convert (itype, n2));
>> t = fold_build2 (MINUS_EXPR, itype, t,
>> fold_convert (itype, fd->loop.v));
>> @@ -6479,10 +6480,10 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>> t = fold_build2 (TRUNC_DIV_EXPR, itype,
>> fold_build1 (NEGATE_EXPR, itype, t),
>> fold_build1 (NEGATE_EXPR, itype,
>> - fold_convert (itype, fd->loop.step)));
>> + fold_convert (itype, step)));
>> else
>> t = fold_build2 (TRUNC_DIV_EXPR, itype, t,
>> - fold_convert (itype, fd->loop.step));
>> + fold_convert (itype, step));
>> t = fold_convert (TREE_TYPE (altv), t);
>> altn2 = create_tmp_var (TREE_TYPE (altv));
>> expand_omp_build_assign (&gsi, altn2, t);
>> @@ -6630,7 +6631,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>> if (is_simt)
>> {
>> gsi = gsi_start_bb (l2_bb);
>> - step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), fd->loop.step, step);
>> + step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), orig_step, step);
>> if (POINTER_TYPE_P (type))
>> t = fold_build_pointer_plus (fd->loop.v, step);
>> else
>> diff --git a/libgomp/testsuite/libgomp.c/pr81778.c b/libgomp/testsuite/libgomp.c/pr81778.c
>> new file mode 100644
>> index 00000000000..571668eb36a
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c/pr81778.c
>> @@ -0,0 +1,48 @@
>> +/* Minimized from for-5.c. */
>> +
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +
>> +/* Size of array we want to write. */
>> +#define N 32
>> +
>> +/* Size of extra space before and after. */
>> +#define CANARY_SIZE (N * 32)
>> +
>> +/* Start of array we want to write. */
>> +#define BASE (CANARY_SIZE)
>> +
>> +// Total size to be allocated.
>> +#define ALLOC_SIZE (CANARY_SIZE + N + CANARY_SIZE)
>> +
>> +#pragma omp declare target
>> +int a[ALLOC_SIZE];
>> +#pragma omp end declare target
>> +
>> +int
>> +main (void)
>> +{
>> + /* Use variable step in for loop. */
>> + int s = 1;
>> +
>> +#pragma omp target update to(a)
>> +
>> + /* Write a[BASE] .. a[BASE + N - 1]. */
>> +#pragma omp target simd
>> + for (int i = N - 1; i > -1; i -= s)
>> + a[BASE + i] = 1;
>> +
>> +#pragma omp target update from(a)
>> +
>> + for (int i = 0; i < ALLOC_SIZE; i++)
>> + {
>> + int expected = (BASE <= i && i < BASE + N) ? 1 : 0;
>> + if (a[i] == expected)
>> + continue;
>> +
>> + printf ("Expected %d, got %d at base[%d]\n", expected, a[i], i - BASE);
>> + abort ();
>> + }
>> +
>> + return 0;
>> +}
>>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PING^3][PATCH][omp, simt] Handle alternative IV
2020-12-17 16:46 ` [PING^2][PATCH][omp, " Tom de Vries
@ 2021-04-22 11:46 ` Tom de Vries
2021-04-29 12:38 ` [committed][omp, " Tom de Vries
0 siblings, 1 reply; 5+ messages in thread
From: Tom de Vries @ 2021-04-22 11:46 UTC (permalink / raw)
To: gcc-patches; +Cc: Jakub Jelinek, Alexander Monakov
On 12/17/20 5:46 PM, Tom de Vries wrote:
> On 10/15/20 5:05 PM, Tom de Vries wrote:
>> On 10/2/20 3:21 PM, Tom de Vries wrote:
>>> Hi,
>>>
>>> Consider the test-case libgomp.c/pr81778.c added in this commit, with
>>> this core loop (note: CANARY_SIZE set to 0 for simplicity):
>>> ...
>>> int s = 1;
>>> #pragma omp target simd
>>> for (int i = N - 1; i > -1; i -= s)
>>> a[i] = 1;
>>> ...
>>> which, given that N is 32, sets a[0..31] to 1.
>>>
>>> After omp-expand, this looks like:
>>> ...
>>> <bb 5> :
>>> simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
>>> .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
>>> D.3193 = -s;
>>> s.9 = s;
>>> D.3204 = .GOMP_SIMT_LANE ();
>>> D.3205 = -s.9;
>>> D.3206 = (int) D.3204;
>>> D.3207 = D.3205 * D.3206;
>>> i = D.3207 + 31;
>>> D.3209 = 0;
>>> D.3210 = -s.9;
>>> D.3211 = D.3210 - i;
>>> D.3210 = -s.9;
>>> D.3212 = D.3211 / D.3210;
>>> D.3213 = (unsigned int) D.3212;
>>> D.3213 = i >= 0 ? D.3213 : 0;
>>>
>>> <bb 19> :
>>> if (D.3209 < D.3213)
>>> goto <bb 6>; [87.50%]
>>> else
>>> goto <bb 7>; [12.50%]
>>>
>>> <bb 6> :
>>> a[i] = 1;
>>> D.3215 = -s.9;
>>> D.3219 = .GOMP_SIMT_VF ();
>>> D.3216 = (int) D.3219;
>>> D.3220 = D.3215 * D.3216;
>>> i = D.3220 + i;
>>> D.3209 = D.3209 + 1;
>>> goto <bb 19>; [100.00%]
>>> ...
>>>
>>> On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
>>> on the lane that is executing) at bb entry.
>>>
>>> So we have the following sequence:
>>> - a[0..31] is set to 1
>>> - i is updated to -32..-1
>>> - D.3209 is updated to 1 (being 0 initially)
>>> - bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
>>> to true
>>> - bb6 is once more executed, which should not happen because all the elements
>>> that needed to be handled were already handled.
>>> - consequently, elements that should not be written are written
>>> - with CANARY_SIZE == 0, we may run into a libgomp error:
>>> ...
>>> libgomp: cuCtxSynchronize error: an illegal memory access was encountered
>>> ...
>>> and with CANARY_SIZE unmodified, we run into:
>>> ...
>>> Expected 0, got 1 at base[-961]
>>> Aborted (core dumped)
>>> ...
>>>
>>> The cause of this is as follows:
>>> - because the step s is a variable rather than a constant, an alternative
>>> IV (D.3209 in our example) is generated in expand_omp_simd, and the
>>> loop condition is tested in terms of the alternative IV rather than
>>> the original IV (i in our example).
>>> - the SIMT code in expand_omp_simd works by modifying step and initial value.
>>> - The initial value fd->loop.n1 is loaded into a variable n1, which is
>>> modified by the SIMT code and then used there-after.
>>> - The step fd->loop.step is loaded into a variable step, which is is modified
>>> by the SIMT code, but afterwards there are uses of both step and
>>> fd->loop.step.
>>> - There are uses of fd->loop.step in the alternative IV handling code,
>>> which should use step instead.
>>>
>>> Fix this by introducing an additional variable orig_step, which is not
>>> modified by the SIMT code and replacing all remaining uses of fd->loop.step
>>> by either step or orig_step.
>>>
>>> Build on x86_64-linux with nvptx accelerator, tested libgomp.
>>>
>>> This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
>>> with driver 450.66.
>>>
>>> OK for trunk?
>>>
>>
Ping^3.
Thanks,
- Tom
>>> [omp, simt] Handle alternative IV
>>>
>>> gcc/ChangeLog:
>>>
>>> 2020-10-02 Tom de Vries <tdevries@suse.de>
>>>
>>> * omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
>>> fd->loop.step by either step or orig_step.
>>>
>>> libgomp/ChangeLog:
>>>
>>> 2020-10-02 Tom de Vries <tdevries@suse.de>
>>>
>>> * testsuite/libgomp.c/pr81778.c: New test.
>>>
>>> ---
>>> gcc/omp-expand.c | 11 ++++----
>>> libgomp/testsuite/libgomp.c/pr81778.c | 48 +++++++++++++++++++++++++++++++++++
>>> 2 files changed, 54 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
>>> index 99cb4f9dda4..80e35ac0294 100644
>>> --- a/gcc/omp-expand.c
>>> +++ b/gcc/omp-expand.c
>>> @@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>> n2 = OMP_CLAUSE_DECL (innerc);
>>> }
>>> tree step = fd->loop.step;
>>> + tree orig_step = step; /* May be different from step if is_simt. */
>>>
>>> bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
>>> OMP_CLAUSE__SIMT_);
>>> @@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>> tree altv = NULL_TREE, altn2 = NULL_TREE;
>>> if (fd->collapse == 1
>>> && !broken_loop
>>> - && TREE_CODE (fd->loops[0].step) != INTEGER_CST)
>>> + && TREE_CODE (orig_step) != INTEGER_CST)
>>> {
>>> /* The vectorizer currently punts on loops with non-constant steps
>>> for the main IV (can't compute number of iterations and gives up
>>> @@ -6471,7 +6472,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>> itype = signed_type_for (itype);
>>> t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
>>> t = fold_build2 (PLUS_EXPR, itype,
>>> - fold_convert (itype, fd->loop.step), t);
>>> + fold_convert (itype, step), t);
>>> t = fold_build2 (PLUS_EXPR, itype, t, fold_convert (itype, n2));
>>> t = fold_build2 (MINUS_EXPR, itype, t,
>>> fold_convert (itype, fd->loop.v));
>>> @@ -6479,10 +6480,10 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>> t = fold_build2 (TRUNC_DIV_EXPR, itype,
>>> fold_build1 (NEGATE_EXPR, itype, t),
>>> fold_build1 (NEGATE_EXPR, itype,
>>> - fold_convert (itype, fd->loop.step)));
>>> + fold_convert (itype, step)));
>>> else
>>> t = fold_build2 (TRUNC_DIV_EXPR, itype, t,
>>> - fold_convert (itype, fd->loop.step));
>>> + fold_convert (itype, step));
>>> t = fold_convert (TREE_TYPE (altv), t);
>>> altn2 = create_tmp_var (TREE_TYPE (altv));
>>> expand_omp_build_assign (&gsi, altn2, t);
>>> @@ -6630,7 +6631,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>> if (is_simt)
>>> {
>>> gsi = gsi_start_bb (l2_bb);
>>> - step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), fd->loop.step, step);
>>> + step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), orig_step, step);
>>> if (POINTER_TYPE_P (type))
>>> t = fold_build_pointer_plus (fd->loop.v, step);
>>> else
>>> diff --git a/libgomp/testsuite/libgomp.c/pr81778.c b/libgomp/testsuite/libgomp.c/pr81778.c
>>> new file mode 100644
>>> index 00000000000..571668eb36a
>>> --- /dev/null
>>> +++ b/libgomp/testsuite/libgomp.c/pr81778.c
>>> @@ -0,0 +1,48 @@
>>> +/* Minimized from for-5.c. */
>>> +
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +
>>> +/* Size of array we want to write. */
>>> +#define N 32
>>> +
>>> +/* Size of extra space before and after. */
>>> +#define CANARY_SIZE (N * 32)
>>> +
>>> +/* Start of array we want to write. */
>>> +#define BASE (CANARY_SIZE)
>>> +
>>> +// Total size to be allocated.
>>> +#define ALLOC_SIZE (CANARY_SIZE + N + CANARY_SIZE)
>>> +
>>> +#pragma omp declare target
>>> +int a[ALLOC_SIZE];
>>> +#pragma omp end declare target
>>> +
>>> +int
>>> +main (void)
>>> +{
>>> + /* Use variable step in for loop. */
>>> + int s = 1;
>>> +
>>> +#pragma omp target update to(a)
>>> +
>>> + /* Write a[BASE] .. a[BASE + N - 1]. */
>>> +#pragma omp target simd
>>> + for (int i = N - 1; i > -1; i -= s)
>>> + a[BASE + i] = 1;
>>> +
>>> +#pragma omp target update from(a)
>>> +
>>> + for (int i = 0; i < ALLOC_SIZE; i++)
>>> + {
>>> + int expected = (BASE <= i && i < BASE + N) ? 1 : 0;
>>> + if (a[i] == expected)
>>> + continue;
>>> +
>>> + printf ("Expected %d, got %d at base[%d]\n", expected, a[i], i - BASE);
>>> + abort ();
>>> + }
>>> +
>>> + return 0;
>>> +}
>>>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [committed][omp, simt] Handle alternative IV
2021-04-22 11:46 ` [PING^3][PATCH][omp, " Tom de Vries
@ 2021-04-29 12:38 ` Tom de Vries
0 siblings, 0 replies; 5+ messages in thread
From: Tom de Vries @ 2021-04-29 12:38 UTC (permalink / raw)
To: gcc-patches; +Cc: Jakub Jelinek, Alexander Monakov
On 4/22/21 1:46 PM, Tom de Vries wrote:
> On 12/17/20 5:46 PM, Tom de Vries wrote:
>> On 10/15/20 5:05 PM, Tom de Vries wrote:
>>> On 10/2/20 3:21 PM, Tom de Vries wrote:
>>>> Hi,
>>>>
>>>> Consider the test-case libgomp.c/pr81778.c added in this commit, with
>>>> this core loop (note: CANARY_SIZE set to 0 for simplicity):
>>>> ...
>>>> int s = 1;
>>>> #pragma omp target simd
>>>> for (int i = N - 1; i > -1; i -= s)
>>>> a[i] = 1;
>>>> ...
>>>> which, given that N is 32, sets a[0..31] to 1.
>>>>
>>>> After omp-expand, this looks like:
>>>> ...
>>>> <bb 5> :
>>>> simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
>>>> .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
>>>> D.3193 = -s;
>>>> s.9 = s;
>>>> D.3204 = .GOMP_SIMT_LANE ();
>>>> D.3205 = -s.9;
>>>> D.3206 = (int) D.3204;
>>>> D.3207 = D.3205 * D.3206;
>>>> i = D.3207 + 31;
>>>> D.3209 = 0;
>>>> D.3210 = -s.9;
>>>> D.3211 = D.3210 - i;
>>>> D.3210 = -s.9;
>>>> D.3212 = D.3211 / D.3210;
>>>> D.3213 = (unsigned int) D.3212;
>>>> D.3213 = i >= 0 ? D.3213 : 0;
>>>>
>>>> <bb 19> :
>>>> if (D.3209 < D.3213)
>>>> goto <bb 6>; [87.50%]
>>>> else
>>>> goto <bb 7>; [12.50%]
>>>>
>>>> <bb 6> :
>>>> a[i] = 1;
>>>> D.3215 = -s.9;
>>>> D.3219 = .GOMP_SIMT_VF ();
>>>> D.3216 = (int) D.3219;
>>>> D.3220 = D.3215 * D.3216;
>>>> i = D.3220 + i;
>>>> D.3209 = D.3209 + 1;
>>>> goto <bb 19>; [100.00%]
>>>> ...
>>>>
>>>> On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
>>>> on the lane that is executing) at bb entry.
>>>>
>>>> So we have the following sequence:
>>>> - a[0..31] is set to 1
>>>> - i is updated to -32..-1
>>>> - D.3209 is updated to 1 (being 0 initially)
>>>> - bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
>>>> to true
>>>> - bb6 is once more executed, which should not happen because all the elements
>>>> that needed to be handled were already handled.
>>>> - consequently, elements that should not be written are written
>>>> - with CANARY_SIZE == 0, we may run into a libgomp error:
>>>> ...
>>>> libgomp: cuCtxSynchronize error: an illegal memory access was encountered
>>>> ...
>>>> and with CANARY_SIZE unmodified, we run into:
>>>> ...
>>>> Expected 0, got 1 at base[-961]
>>>> Aborted (core dumped)
>>>> ...
>>>>
>>>> The cause of this is as follows:
>>>> - because the step s is a variable rather than a constant, an alternative
>>>> IV (D.3209 in our example) is generated in expand_omp_simd, and the
>>>> loop condition is tested in terms of the alternative IV rather than
>>>> the original IV (i in our example).
>>>> - the SIMT code in expand_omp_simd works by modifying step and initial value.
>>>> - The initial value fd->loop.n1 is loaded into a variable n1, which is
>>>> modified by the SIMT code and then used there-after.
>>>> - The step fd->loop.step is loaded into a variable step, which is is modified
>>>> by the SIMT code, but afterwards there are uses of both step and
>>>> fd->loop.step.
>>>> - There are uses of fd->loop.step in the alternative IV handling code,
>>>> which should use step instead.
>>>>
>>>> Fix this by introducing an additional variable orig_step, which is not
>>>> modified by the SIMT code and replacing all remaining uses of fd->loop.step
>>>> by either step or orig_step.
>>>>
>>>> Build on x86_64-linux with nvptx accelerator, tested libgomp.
>>>>
>>>> This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
>>>> with driver 450.66.
>>>>
>>>> OK for trunk?
>>>>
>>>
>
> Ping^3.
>
Committed.
Thanks,
- Tom
>>>> [omp, simt] Handle alternative IV
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2020-10-02 Tom de Vries <tdevries@suse.de>
>>>>
>>>> * omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
>>>> fd->loop.step by either step or orig_step.
>>>>
>>>> libgomp/ChangeLog:
>>>>
>>>> 2020-10-02 Tom de Vries <tdevries@suse.de>
>>>>
>>>> * testsuite/libgomp.c/pr81778.c: New test.
>>>>
>>>> ---
>>>> gcc/omp-expand.c | 11 ++++----
>>>> libgomp/testsuite/libgomp.c/pr81778.c | 48 +++++++++++++++++++++++++++++++++++
>>>> 2 files changed, 54 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
>>>> index 99cb4f9dda4..80e35ac0294 100644
>>>> --- a/gcc/omp-expand.c
>>>> +++ b/gcc/omp-expand.c
>>>> @@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>> n2 = OMP_CLAUSE_DECL (innerc);
>>>> }
>>>> tree step = fd->loop.step;
>>>> + tree orig_step = step; /* May be different from step if is_simt. */
>>>>
>>>> bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
>>>> OMP_CLAUSE__SIMT_);
>>>> @@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>> tree altv = NULL_TREE, altn2 = NULL_TREE;
>>>> if (fd->collapse == 1
>>>> && !broken_loop
>>>> - && TREE_CODE (fd->loops[0].step) != INTEGER_CST)
>>>> + && TREE_CODE (orig_step) != INTEGER_CST)
>>>> {
>>>> /* The vectorizer currently punts on loops with non-constant steps
>>>> for the main IV (can't compute number of iterations and gives up
>>>> @@ -6471,7 +6472,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>> itype = signed_type_for (itype);
>>>> t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
>>>> t = fold_build2 (PLUS_EXPR, itype,
>>>> - fold_convert (itype, fd->loop.step), t);
>>>> + fold_convert (itype, step), t);
>>>> t = fold_build2 (PLUS_EXPR, itype, t, fold_convert (itype, n2));
>>>> t = fold_build2 (MINUS_EXPR, itype, t,
>>>> fold_convert (itype, fd->loop.v));
>>>> @@ -6479,10 +6480,10 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>> t = fold_build2 (TRUNC_DIV_EXPR, itype,
>>>> fold_build1 (NEGATE_EXPR, itype, t),
>>>> fold_build1 (NEGATE_EXPR, itype,
>>>> - fold_convert (itype, fd->loop.step)));
>>>> + fold_convert (itype, step)));
>>>> else
>>>> t = fold_build2 (TRUNC_DIV_EXPR, itype, t,
>>>> - fold_convert (itype, fd->loop.step));
>>>> + fold_convert (itype, step));
>>>> t = fold_convert (TREE_TYPE (altv), t);
>>>> altn2 = create_tmp_var (TREE_TYPE (altv));
>>>> expand_omp_build_assign (&gsi, altn2, t);
>>>> @@ -6630,7 +6631,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>> if (is_simt)
>>>> {
>>>> gsi = gsi_start_bb (l2_bb);
>>>> - step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), fd->loop.step, step);
>>>> + step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), orig_step, step);
>>>> if (POINTER_TYPE_P (type))
>>>> t = fold_build_pointer_plus (fd->loop.v, step);
>>>> else
>>>> diff --git a/libgomp/testsuite/libgomp.c/pr81778.c b/libgomp/testsuite/libgomp.c/pr81778.c
>>>> new file mode 100644
>>>> index 00000000000..571668eb36a
>>>> --- /dev/null
>>>> +++ b/libgomp/testsuite/libgomp.c/pr81778.c
>>>> @@ -0,0 +1,48 @@
>>>> +/* Minimized from for-5.c. */
>>>> +
>>>> +#include <stdio.h>
>>>> +#include <stdlib.h>
>>>> +
>>>> +/* Size of array we want to write. */
>>>> +#define N 32
>>>> +
>>>> +/* Size of extra space before and after. */
>>>> +#define CANARY_SIZE (N * 32)
>>>> +
>>>> +/* Start of array we want to write. */
>>>> +#define BASE (CANARY_SIZE)
>>>> +
>>>> +// Total size to be allocated.
>>>> +#define ALLOC_SIZE (CANARY_SIZE + N + CANARY_SIZE)
>>>> +
>>>> +#pragma omp declare target
>>>> +int a[ALLOC_SIZE];
>>>> +#pragma omp end declare target
>>>> +
>>>> +int
>>>> +main (void)
>>>> +{
>>>> + /* Use variable step in for loop. */
>>>> + int s = 1;
>>>> +
>>>> +#pragma omp target update to(a)
>>>> +
>>>> + /* Write a[BASE] .. a[BASE + N - 1]. */
>>>> +#pragma omp target simd
>>>> + for (int i = N - 1; i > -1; i -= s)
>>>> + a[BASE + i] = 1;
>>>> +
>>>> +#pragma omp target update from(a)
>>>> +
>>>> + for (int i = 0; i < ALLOC_SIZE; i++)
>>>> + {
>>>> + int expected = (BASE <= i && i < BASE + N) ? 1 : 0;
>>>> + if (a[i] == expected)
>>>> + continue;
>>>> +
>>>> + printf ("Expected %d, got %d at base[%d]\n", expected, a[i], i - BASE);
>>>> + abort ();
>>>> + }
>>>> +
>>>> + return 0;
>>>> +}
>>>>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-04-29 12:38 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-02 13:21 [PATCH][omp, simt] Handle alternative IV Tom de Vries
2020-10-15 15:05 ` [PING][PATCH][omp, " Tom de Vries
2020-12-17 16:46 ` [PING^2][PATCH][omp, " Tom de Vries
2021-04-22 11:46 ` [PING^3][PATCH][omp, " Tom de Vries
2021-04-29 12:38 ` [committed][omp, " Tom de Vries
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).