public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][omp, simt] Handle alternative IV
@ 2020-10-02 13:21 Tom de Vries
  2020-10-15 15:05 ` [PING][PATCH][omp, " Tom de Vries
  0 siblings, 1 reply; 5+ messages in thread
From: Tom de Vries @ 2020-10-02 13:21 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Alexander Monakov

Hi,

Consider the test-case libgomp.c/pr81778.c added in this commit, with
this core loop (note: CANARY_SIZE set to 0 for simplicity):
...
  int s = 1;
  #pragma omp target simd
  for (int i = N - 1; i > -1; i -= s)
    a[i] = 1;
...
which, given that N is 32, sets a[0..31] to 1.

After omp-expand, this looks like:
...
  <bb 5> :
  simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
  .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
  D.3193 = -s;
  s.9 = s;
  D.3204 = .GOMP_SIMT_LANE ();
  D.3205 = -s.9;
  D.3206 = (int) D.3204;
  D.3207 = D.3205 * D.3206;
  i = D.3207 + 31;
  D.3209 = 0;
  D.3210 = -s.9;
  D.3211 = D.3210 - i;
  D.3210 = -s.9;
  D.3212 = D.3211 / D.3210;
  D.3213 = (unsigned int) D.3212;
  D.3213 = i >= 0 ? D.3213 : 0;

  <bb 19> :
  if (D.3209 < D.3213)
    goto <bb 6>; [87.50%]
  else
    goto <bb 7>; [12.50%]

  <bb 6> :
  a[i] = 1;
  D.3215 = -s.9;
  D.3219 = .GOMP_SIMT_VF ();
  D.3216 = (int) D.3219;
  D.3220 = D.3215 * D.3216;
  i = D.3220 + i;
  D.3209 = D.3209 + 1;
  goto <bb 19>; [100.00%]
...

On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
on the lane that is executing) at bb entry.

So we have the following sequence:
- a[0..31] is set to 1
- i is updated to -32..-1
- D.3209 is updated to 1 (being 0 initially)
- bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
  to true
- bb6 is once more executed, which should not happen because all the elements
  that needed to be handled were already handled.
- consequently, elements that should not be written are written
- with CANARY_SIZE == 0, we may run into a libgomp error:
  ...
  libgomp: cuCtxSynchronize error: an illegal memory access was encountered
  ...
  and with CANARY_SIZE unmodified, we run into:
  ...
  Expected 0, got 1 at base[-961]
  Aborted (core dumped)
  ...

The cause of this is as follows:
- because the step s is a variable rather than a constant, an alternative
  IV (D.3209 in our example) is generated in expand_omp_simd, and the
  loop condition is tested in terms of the alternative IV rather than
  the original IV (i in our example).
- the SIMT code in expand_omp_simd works by modifying step and initial value.
- The initial value fd->loop.n1 is loaded into a variable n1, which is
  modified by the SIMT code and then used there-after.
- The step fd->loop.step is loaded into a variable step, which is is modified
  by the SIMT code, but afterwards there are uses of both step and
  fd->loop.step.
- There are uses of fd->loop.step in the alternative IV handling code,
  which should use step instead.

Fix this by introducing an additional variable orig_step, which is not
modified by the SIMT code and replacing all remaining uses of fd->loop.step
by either step or orig_step.

Build on x86_64-linux with nvptx accelerator, tested libgomp.

This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
with driver 450.66.

OK for trunk?

Thanks,
- Tom

[omp, simt] Handle alternative IV

gcc/ChangeLog:

2020-10-02  Tom de Vries  <tdevries@suse.de>

	* omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
	fd->loop.step by either step or orig_step.

libgomp/ChangeLog:

2020-10-02  Tom de Vries  <tdevries@suse.de>

	* testsuite/libgomp.c/pr81778.c: New test.

---
 gcc/omp-expand.c                      | 11 ++++----
 libgomp/testsuite/libgomp.c/pr81778.c | 48 +++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 99cb4f9dda4..80e35ac0294 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
       n2 = OMP_CLAUSE_DECL (innerc);
     }
   tree step = fd->loop.step;
+  tree orig_step = step; /* May be different from step if is_simt.  */
 
   bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
 				  OMP_CLAUSE__SIMT_);
@@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
   tree altv = NULL_TREE, altn2 = NULL_TREE;
   if (fd->collapse == 1
       && !broken_loop
-      && TREE_CODE (fd->loops[0].step) != INTEGER_CST)
+      && TREE_CODE (orig_step) != INTEGER_CST)
     {
       /* The vectorizer currently punts on loops with non-constant steps
 	 for the main IV (can't compute number of iterations and gives up
@@ -6471,7 +6472,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
 	itype = signed_type_for (itype);
       t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
       t = fold_build2 (PLUS_EXPR, itype,
-		       fold_convert (itype, fd->loop.step), t);
+		       fold_convert (itype, step), t);
       t = fold_build2 (PLUS_EXPR, itype, t, fold_convert (itype, n2));
       t = fold_build2 (MINUS_EXPR, itype, t,
 		       fold_convert (itype, fd->loop.v));
@@ -6479,10 +6480,10 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
 	t = fold_build2 (TRUNC_DIV_EXPR, itype,
 			 fold_build1 (NEGATE_EXPR, itype, t),
 			 fold_build1 (NEGATE_EXPR, itype,
-				      fold_convert (itype, fd->loop.step)));
+				      fold_convert (itype, step)));
       else
 	t = fold_build2 (TRUNC_DIV_EXPR, itype, t,
-			 fold_convert (itype, fd->loop.step));
+			 fold_convert (itype, step));
       t = fold_convert (TREE_TYPE (altv), t);
       altn2 = create_tmp_var (TREE_TYPE (altv));
       expand_omp_build_assign (&gsi, altn2, t);
@@ -6630,7 +6631,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
   if (is_simt)
     {
       gsi = gsi_start_bb (l2_bb);
-      step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), fd->loop.step, step);
+      step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), orig_step, step);
       if (POINTER_TYPE_P (type))
 	t = fold_build_pointer_plus (fd->loop.v, step);
       else
diff --git a/libgomp/testsuite/libgomp.c/pr81778.c b/libgomp/testsuite/libgomp.c/pr81778.c
new file mode 100644
index 00000000000..571668eb36a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/pr81778.c
@@ -0,0 +1,48 @@
+/* Minimized from for-5.c.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+/* Size of array we want to write.  */
+#define N 32
+
+/* Size of extra space before and after.  */
+#define CANARY_SIZE (N * 32)
+
+/* Start of array we want to write.  */
+#define BASE (CANARY_SIZE)
+
+// Total size to be allocated.
+#define ALLOC_SIZE (CANARY_SIZE + N + CANARY_SIZE)
+
+#pragma omp declare target
+int a[ALLOC_SIZE];
+#pragma omp end declare target
+
+int
+main (void)
+{
+  /* Use variable step in for loop.  */
+  int s = 1;
+
+#pragma omp target update to(a)
+
+  /* Write a[BASE] .. a[BASE + N - 1].  */
+#pragma omp target simd
+  for (int i = N - 1; i > -1; i -= s)
+    a[BASE + i] = 1;
+
+#pragma omp target update from(a)
+
+  for (int i = 0; i < ALLOC_SIZE; i++)
+    {
+      int expected = (BASE <= i && i < BASE + N) ? 1 : 0;
+      if (a[i] == expected)
+	continue;
+
+      printf ("Expected %d, got %d at base[%d]\n", expected, a[i], i - BASE);
+      abort ();
+    }
+
+  return 0;
+}

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PING][PATCH][omp, simt] Handle alternative IV
  2020-10-02 13:21 [PATCH][omp, simt] Handle alternative IV Tom de Vries
@ 2020-10-15 15:05 ` Tom de Vries
  2020-12-17 16:46   ` [PING^2][PATCH][omp, " Tom de Vries
  0 siblings, 1 reply; 5+ messages in thread
From: Tom de Vries @ 2020-10-15 15:05 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Alexander Monakov

On 10/2/20 3:21 PM, Tom de Vries wrote:
> Hi,
> 
> Consider the test-case libgomp.c/pr81778.c added in this commit, with
> this core loop (note: CANARY_SIZE set to 0 for simplicity):
> ...
>   int s = 1;
>   #pragma omp target simd
>   for (int i = N - 1; i > -1; i -= s)
>     a[i] = 1;
> ...
> which, given that N is 32, sets a[0..31] to 1.
> 
> After omp-expand, this looks like:
> ...
>   <bb 5> :
>   simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
>   .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
>   D.3193 = -s;
>   s.9 = s;
>   D.3204 = .GOMP_SIMT_LANE ();
>   D.3205 = -s.9;
>   D.3206 = (int) D.3204;
>   D.3207 = D.3205 * D.3206;
>   i = D.3207 + 31;
>   D.3209 = 0;
>   D.3210 = -s.9;
>   D.3211 = D.3210 - i;
>   D.3210 = -s.9;
>   D.3212 = D.3211 / D.3210;
>   D.3213 = (unsigned int) D.3212;
>   D.3213 = i >= 0 ? D.3213 : 0;
> 
>   <bb 19> :
>   if (D.3209 < D.3213)
>     goto <bb 6>; [87.50%]
>   else
>     goto <bb 7>; [12.50%]
> 
>   <bb 6> :
>   a[i] = 1;
>   D.3215 = -s.9;
>   D.3219 = .GOMP_SIMT_VF ();
>   D.3216 = (int) D.3219;
>   D.3220 = D.3215 * D.3216;
>   i = D.3220 + i;
>   D.3209 = D.3209 + 1;
>   goto <bb 19>; [100.00%]
> ...
> 
> On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
> on the lane that is executing) at bb entry.
> 
> So we have the following sequence:
> - a[0..31] is set to 1
> - i is updated to -32..-1
> - D.3209 is updated to 1 (being 0 initially)
> - bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
>   to true
> - bb6 is once more executed, which should not happen because all the elements
>   that needed to be handled were already handled.
> - consequently, elements that should not be written are written
> - with CANARY_SIZE == 0, we may run into a libgomp error:
>   ...
>   libgomp: cuCtxSynchronize error: an illegal memory access was encountered
>   ...
>   and with CANARY_SIZE unmodified, we run into:
>   ...
>   Expected 0, got 1 at base[-961]
>   Aborted (core dumped)
>   ...
> 
> The cause of this is as follows:
> - because the step s is a variable rather than a constant, an alternative
>   IV (D.3209 in our example) is generated in expand_omp_simd, and the
>   loop condition is tested in terms of the alternative IV rather than
>   the original IV (i in our example).
> - the SIMT code in expand_omp_simd works by modifying step and initial value.
> - The initial value fd->loop.n1 is loaded into a variable n1, which is
>   modified by the SIMT code and then used there-after.
> - The step fd->loop.step is loaded into a variable step, which is is modified
>   by the SIMT code, but afterwards there are uses of both step and
>   fd->loop.step.
> - There are uses of fd->loop.step in the alternative IV handling code,
>   which should use step instead.
> 
> Fix this by introducing an additional variable orig_step, which is not
> modified by the SIMT code and replacing all remaining uses of fd->loop.step
> by either step or orig_step.
> 
> Build on x86_64-linux with nvptx accelerator, tested libgomp.
> 
> This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
> with driver 450.66.
> 
> OK for trunk?
> 

Ping.

Thanks,
- Tom

> [omp, simt] Handle alternative IV
> 
> gcc/ChangeLog:
> 
> 2020-10-02  Tom de Vries  <tdevries@suse.de>
> 
> 	* omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
> 	fd->loop.step by either step or orig_step.
> 
> libgomp/ChangeLog:
> 
> 2020-10-02  Tom de Vries  <tdevries@suse.de>
> 
> 	* testsuite/libgomp.c/pr81778.c: New test.
> 
> ---
>  gcc/omp-expand.c                      | 11 ++++----
>  libgomp/testsuite/libgomp.c/pr81778.c | 48 +++++++++++++++++++++++++++++++++++
>  2 files changed, 54 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
> index 99cb4f9dda4..80e35ac0294 100644
> --- a/gcc/omp-expand.c
> +++ b/gcc/omp-expand.c
> @@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>        n2 = OMP_CLAUSE_DECL (innerc);
>      }
>    tree step = fd->loop.step;
> +  tree orig_step = step; /* May be different from step if is_simt.  */
>  
>    bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
>  				  OMP_CLAUSE__SIMT_);
> @@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>    tree altv = NULL_TREE, altn2 = NULL_TREE;
>    if (fd->collapse == 1
>        && !broken_loop
> -      && TREE_CODE (fd->loops[0].step) != INTEGER_CST)
> +      && TREE_CODE (orig_step) != INTEGER_CST)
>      {
>        /* The vectorizer currently punts on loops with non-constant steps
>  	 for the main IV (can't compute number of iterations and gives up
> @@ -6471,7 +6472,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>  	itype = signed_type_for (itype);
>        t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
>        t = fold_build2 (PLUS_EXPR, itype,
> -		       fold_convert (itype, fd->loop.step), t);
> +		       fold_convert (itype, step), t);
>        t = fold_build2 (PLUS_EXPR, itype, t, fold_convert (itype, n2));
>        t = fold_build2 (MINUS_EXPR, itype, t,
>  		       fold_convert (itype, fd->loop.v));
> @@ -6479,10 +6480,10 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>  	t = fold_build2 (TRUNC_DIV_EXPR, itype,
>  			 fold_build1 (NEGATE_EXPR, itype, t),
>  			 fold_build1 (NEGATE_EXPR, itype,
> -				      fold_convert (itype, fd->loop.step)));
> +				      fold_convert (itype, step)));
>        else
>  	t = fold_build2 (TRUNC_DIV_EXPR, itype, t,
> -			 fold_convert (itype, fd->loop.step));
> +			 fold_convert (itype, step));
>        t = fold_convert (TREE_TYPE (altv), t);
>        altn2 = create_tmp_var (TREE_TYPE (altv));
>        expand_omp_build_assign (&gsi, altn2, t);
> @@ -6630,7 +6631,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>    if (is_simt)
>      {
>        gsi = gsi_start_bb (l2_bb);
> -      step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), fd->loop.step, step);
> +      step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), orig_step, step);
>        if (POINTER_TYPE_P (type))
>  	t = fold_build_pointer_plus (fd->loop.v, step);
>        else
> diff --git a/libgomp/testsuite/libgomp.c/pr81778.c b/libgomp/testsuite/libgomp.c/pr81778.c
> new file mode 100644
> index 00000000000..571668eb36a
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/pr81778.c
> @@ -0,0 +1,48 @@
> +/* Minimized from for-5.c.  */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +
> +/* Size of array we want to write.  */
> +#define N 32
> +
> +/* Size of extra space before and after.  */
> +#define CANARY_SIZE (N * 32)
> +
> +/* Start of array we want to write.  */
> +#define BASE (CANARY_SIZE)
> +
> +// Total size to be allocated.
> +#define ALLOC_SIZE (CANARY_SIZE + N + CANARY_SIZE)
> +
> +#pragma omp declare target
> +int a[ALLOC_SIZE];
> +#pragma omp end declare target
> +
> +int
> +main (void)
> +{
> +  /* Use variable step in for loop.  */
> +  int s = 1;
> +
> +#pragma omp target update to(a)
> +
> +  /* Write a[BASE] .. a[BASE + N - 1].  */
> +#pragma omp target simd
> +  for (int i = N - 1; i > -1; i -= s)
> +    a[BASE + i] = 1;
> +
> +#pragma omp target update from(a)
> +
> +  for (int i = 0; i < ALLOC_SIZE; i++)
> +    {
> +      int expected = (BASE <= i && i < BASE + N) ? 1 : 0;
> +      if (a[i] == expected)
> +	continue;
> +
> +      printf ("Expected %d, got %d at base[%d]\n", expected, a[i], i - BASE);
> +      abort ();
> +    }
> +
> +  return 0;
> +}
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PING^2][PATCH][omp, simt] Handle alternative IV
  2020-10-15 15:05 ` [PING][PATCH][omp, " Tom de Vries
@ 2020-12-17 16:46   ` Tom de Vries
  2021-04-22 11:46     ` [PING^3][PATCH][omp, " Tom de Vries
  0 siblings, 1 reply; 5+ messages in thread
From: Tom de Vries @ 2020-12-17 16:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Alexander Monakov

On 10/15/20 5:05 PM, Tom de Vries wrote:
> On 10/2/20 3:21 PM, Tom de Vries wrote:
>> Hi,
>>
>> Consider the test-case libgomp.c/pr81778.c added in this commit, with
>> this core loop (note: CANARY_SIZE set to 0 for simplicity):
>> ...
>>   int s = 1;
>>   #pragma omp target simd
>>   for (int i = N - 1; i > -1; i -= s)
>>     a[i] = 1;
>> ...
>> which, given that N is 32, sets a[0..31] to 1.
>>
>> After omp-expand, this looks like:
>> ...
>>   <bb 5> :
>>   simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
>>   .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
>>   D.3193 = -s;
>>   s.9 = s;
>>   D.3204 = .GOMP_SIMT_LANE ();
>>   D.3205 = -s.9;
>>   D.3206 = (int) D.3204;
>>   D.3207 = D.3205 * D.3206;
>>   i = D.3207 + 31;
>>   D.3209 = 0;
>>   D.3210 = -s.9;
>>   D.3211 = D.3210 - i;
>>   D.3210 = -s.9;
>>   D.3212 = D.3211 / D.3210;
>>   D.3213 = (unsigned int) D.3212;
>>   D.3213 = i >= 0 ? D.3213 : 0;
>>
>>   <bb 19> :
>>   if (D.3209 < D.3213)
>>     goto <bb 6>; [87.50%]
>>   else
>>     goto <bb 7>; [12.50%]
>>
>>   <bb 6> :
>>   a[i] = 1;
>>   D.3215 = -s.9;
>>   D.3219 = .GOMP_SIMT_VF ();
>>   D.3216 = (int) D.3219;
>>   D.3220 = D.3215 * D.3216;
>>   i = D.3220 + i;
>>   D.3209 = D.3209 + 1;
>>   goto <bb 19>; [100.00%]
>> ...
>>
>> On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
>> on the lane that is executing) at bb entry.
>>
>> So we have the following sequence:
>> - a[0..31] is set to 1
>> - i is updated to -32..-1
>> - D.3209 is updated to 1 (being 0 initially)
>> - bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
>>   to true
>> - bb6 is once more executed, which should not happen because all the elements
>>   that needed to be handled were already handled.
>> - consequently, elements that should not be written are written
>> - with CANARY_SIZE == 0, we may run into a libgomp error:
>>   ...
>>   libgomp: cuCtxSynchronize error: an illegal memory access was encountered
>>   ...
>>   and with CANARY_SIZE unmodified, we run into:
>>   ...
>>   Expected 0, got 1 at base[-961]
>>   Aborted (core dumped)
>>   ...
>>
>> The cause of this is as follows:
>> - because the step s is a variable rather than a constant, an alternative
>>   IV (D.3209 in our example) is generated in expand_omp_simd, and the
>>   loop condition is tested in terms of the alternative IV rather than
>>   the original IV (i in our example).
>> - the SIMT code in expand_omp_simd works by modifying step and initial value.
>> - The initial value fd->loop.n1 is loaded into a variable n1, which is
>>   modified by the SIMT code and then used there-after.
>> - The step fd->loop.step is loaded into a variable step, which is is modified
>>   by the SIMT code, but afterwards there are uses of both step and
>>   fd->loop.step.
>> - There are uses of fd->loop.step in the alternative IV handling code,
>>   which should use step instead.
>>
>> Fix this by introducing an additional variable orig_step, which is not
>> modified by the SIMT code and replacing all remaining uses of fd->loop.step
>> by either step or orig_step.
>>
>> Build on x86_64-linux with nvptx accelerator, tested libgomp.
>>
>> This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
>> with driver 450.66.
>>
>> OK for trunk?
>>
> 

Ping^2.

Thanks,
- Tom

>> [omp, simt] Handle alternative IV
>>
>> gcc/ChangeLog:
>>
>> 2020-10-02  Tom de Vries  <tdevries@suse.de>
>>
>> 	* omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
>> 	fd->loop.step by either step or orig_step.
>>
>> libgomp/ChangeLog:
>>
>> 2020-10-02  Tom de Vries  <tdevries@suse.de>
>>
>> 	* testsuite/libgomp.c/pr81778.c: New test.
>>
>> ---
>>  gcc/omp-expand.c                      | 11 ++++----
>>  libgomp/testsuite/libgomp.c/pr81778.c | 48 +++++++++++++++++++++++++++++++++++
>>  2 files changed, 54 insertions(+), 5 deletions(-)
>>
>> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
>> index 99cb4f9dda4..80e35ac0294 100644
>> --- a/gcc/omp-expand.c
>> +++ b/gcc/omp-expand.c
>> @@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>        n2 = OMP_CLAUSE_DECL (innerc);
>>      }
>>    tree step = fd->loop.step;
>> +  tree orig_step = step; /* May be different from step if is_simt.  */
>>  
>>    bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
>>  				  OMP_CLAUSE__SIMT_);
>> @@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>    tree altv = NULL_TREE, altn2 = NULL_TREE;
>>    if (fd->collapse == 1
>>        && !broken_loop
>> -      && TREE_CODE (fd->loops[0].step) != INTEGER_CST)
>> +      && TREE_CODE (orig_step) != INTEGER_CST)
>>      {
>>        /* The vectorizer currently punts on loops with non-constant steps
>>  	 for the main IV (can't compute number of iterations and gives up
>> @@ -6471,7 +6472,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>  	itype = signed_type_for (itype);
>>        t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
>>        t = fold_build2 (PLUS_EXPR, itype,
>> -		       fold_convert (itype, fd->loop.step), t);
>> +		       fold_convert (itype, step), t);
>>        t = fold_build2 (PLUS_EXPR, itype, t, fold_convert (itype, n2));
>>        t = fold_build2 (MINUS_EXPR, itype, t,
>>  		       fold_convert (itype, fd->loop.v));
>> @@ -6479,10 +6480,10 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>  	t = fold_build2 (TRUNC_DIV_EXPR, itype,
>>  			 fold_build1 (NEGATE_EXPR, itype, t),
>>  			 fold_build1 (NEGATE_EXPR, itype,
>> -				      fold_convert (itype, fd->loop.step)));
>> +				      fold_convert (itype, step)));
>>        else
>>  	t = fold_build2 (TRUNC_DIV_EXPR, itype, t,
>> -			 fold_convert (itype, fd->loop.step));
>> +			 fold_convert (itype, step));
>>        t = fold_convert (TREE_TYPE (altv), t);
>>        altn2 = create_tmp_var (TREE_TYPE (altv));
>>        expand_omp_build_assign (&gsi, altn2, t);
>> @@ -6630,7 +6631,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>    if (is_simt)
>>      {
>>        gsi = gsi_start_bb (l2_bb);
>> -      step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), fd->loop.step, step);
>> +      step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), orig_step, step);
>>        if (POINTER_TYPE_P (type))
>>  	t = fold_build_pointer_plus (fd->loop.v, step);
>>        else
>> diff --git a/libgomp/testsuite/libgomp.c/pr81778.c b/libgomp/testsuite/libgomp.c/pr81778.c
>> new file mode 100644
>> index 00000000000..571668eb36a
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.c/pr81778.c
>> @@ -0,0 +1,48 @@
>> +/* Minimized from for-5.c.  */
>> +
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +
>> +/* Size of array we want to write.  */
>> +#define N 32
>> +
>> +/* Size of extra space before and after.  */
>> +#define CANARY_SIZE (N * 32)
>> +
>> +/* Start of array we want to write.  */
>> +#define BASE (CANARY_SIZE)
>> +
>> +// Total size to be allocated.
>> +#define ALLOC_SIZE (CANARY_SIZE + N + CANARY_SIZE)
>> +
>> +#pragma omp declare target
>> +int a[ALLOC_SIZE];
>> +#pragma omp end declare target
>> +
>> +int
>> +main (void)
>> +{
>> +  /* Use variable step in for loop.  */
>> +  int s = 1;
>> +
>> +#pragma omp target update to(a)
>> +
>> +  /* Write a[BASE] .. a[BASE + N - 1].  */
>> +#pragma omp target simd
>> +  for (int i = N - 1; i > -1; i -= s)
>> +    a[BASE + i] = 1;
>> +
>> +#pragma omp target update from(a)
>> +
>> +  for (int i = 0; i < ALLOC_SIZE; i++)
>> +    {
>> +      int expected = (BASE <= i && i < BASE + N) ? 1 : 0;
>> +      if (a[i] == expected)
>> +	continue;
>> +
>> +      printf ("Expected %d, got %d at base[%d]\n", expected, a[i], i - BASE);
>> +      abort ();
>> +    }
>> +
>> +  return 0;
>> +}
>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PING^3][PATCH][omp, simt] Handle alternative IV
  2020-12-17 16:46   ` [PING^2][PATCH][omp, " Tom de Vries
@ 2021-04-22 11:46     ` Tom de Vries
  2021-04-29 12:38       ` [committed][omp, " Tom de Vries
  0 siblings, 1 reply; 5+ messages in thread
From: Tom de Vries @ 2021-04-22 11:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Alexander Monakov

On 12/17/20 5:46 PM, Tom de Vries wrote:
> On 10/15/20 5:05 PM, Tom de Vries wrote:
>> On 10/2/20 3:21 PM, Tom de Vries wrote:
>>> Hi,
>>>
>>> Consider the test-case libgomp.c/pr81778.c added in this commit, with
>>> this core loop (note: CANARY_SIZE set to 0 for simplicity):
>>> ...
>>>   int s = 1;
>>>   #pragma omp target simd
>>>   for (int i = N - 1; i > -1; i -= s)
>>>     a[i] = 1;
>>> ...
>>> which, given that N is 32, sets a[0..31] to 1.
>>>
>>> After omp-expand, this looks like:
>>> ...
>>>   <bb 5> :
>>>   simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
>>>   .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
>>>   D.3193 = -s;
>>>   s.9 = s;
>>>   D.3204 = .GOMP_SIMT_LANE ();
>>>   D.3205 = -s.9;
>>>   D.3206 = (int) D.3204;
>>>   D.3207 = D.3205 * D.3206;
>>>   i = D.3207 + 31;
>>>   D.3209 = 0;
>>>   D.3210 = -s.9;
>>>   D.3211 = D.3210 - i;
>>>   D.3210 = -s.9;
>>>   D.3212 = D.3211 / D.3210;
>>>   D.3213 = (unsigned int) D.3212;
>>>   D.3213 = i >= 0 ? D.3213 : 0;
>>>
>>>   <bb 19> :
>>>   if (D.3209 < D.3213)
>>>     goto <bb 6>; [87.50%]
>>>   else
>>>     goto <bb 7>; [12.50%]
>>>
>>>   <bb 6> :
>>>   a[i] = 1;
>>>   D.3215 = -s.9;
>>>   D.3219 = .GOMP_SIMT_VF ();
>>>   D.3216 = (int) D.3219;
>>>   D.3220 = D.3215 * D.3216;
>>>   i = D.3220 + i;
>>>   D.3209 = D.3209 + 1;
>>>   goto <bb 19>; [100.00%]
>>> ...
>>>
>>> On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
>>> on the lane that is executing) at bb entry.
>>>
>>> So we have the following sequence:
>>> - a[0..31] is set to 1
>>> - i is updated to -32..-1
>>> - D.3209 is updated to 1 (being 0 initially)
>>> - bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
>>>   to true
>>> - bb6 is once more executed, which should not happen because all the elements
>>>   that needed to be handled were already handled.
>>> - consequently, elements that should not be written are written
>>> - with CANARY_SIZE == 0, we may run into a libgomp error:
>>>   ...
>>>   libgomp: cuCtxSynchronize error: an illegal memory access was encountered
>>>   ...
>>>   and with CANARY_SIZE unmodified, we run into:
>>>   ...
>>>   Expected 0, got 1 at base[-961]
>>>   Aborted (core dumped)
>>>   ...
>>>
>>> The cause of this is as follows:
>>> - because the step s is a variable rather than a constant, an alternative
>>>   IV (D.3209 in our example) is generated in expand_omp_simd, and the
>>>   loop condition is tested in terms of the alternative IV rather than
>>>   the original IV (i in our example).
>>> - the SIMT code in expand_omp_simd works by modifying step and initial value.
>>> - The initial value fd->loop.n1 is loaded into a variable n1, which is
>>>   modified by the SIMT code and then used there-after.
>>> - The step fd->loop.step is loaded into a variable step, which is is modified
>>>   by the SIMT code, but afterwards there are uses of both step and
>>>   fd->loop.step.
>>> - There are uses of fd->loop.step in the alternative IV handling code,
>>>   which should use step instead.
>>>
>>> Fix this by introducing an additional variable orig_step, which is not
>>> modified by the SIMT code and replacing all remaining uses of fd->loop.step
>>> by either step or orig_step.
>>>
>>> Build on x86_64-linux with nvptx accelerator, tested libgomp.
>>>
>>> This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
>>> with driver 450.66.
>>>
>>> OK for trunk?
>>>
>>

Ping^3.

Thanks,
- Tom

>>> [omp, simt] Handle alternative IV
>>>
>>> gcc/ChangeLog:
>>>
>>> 2020-10-02  Tom de Vries  <tdevries@suse.de>
>>>
>>> 	* omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
>>> 	fd->loop.step by either step or orig_step.
>>>
>>> libgomp/ChangeLog:
>>>
>>> 2020-10-02  Tom de Vries  <tdevries@suse.de>
>>>
>>> 	* testsuite/libgomp.c/pr81778.c: New test.
>>>
>>> ---
>>>  gcc/omp-expand.c                      | 11 ++++----
>>>  libgomp/testsuite/libgomp.c/pr81778.c | 48 +++++++++++++++++++++++++++++++++++
>>>  2 files changed, 54 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
>>> index 99cb4f9dda4..80e35ac0294 100644
>>> --- a/gcc/omp-expand.c
>>> +++ b/gcc/omp-expand.c
>>> @@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>        n2 = OMP_CLAUSE_DECL (innerc);
>>>      }
>>>    tree step = fd->loop.step;
>>> +  tree orig_step = step; /* May be different from step if is_simt.  */
>>>  
>>>    bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
>>>  				  OMP_CLAUSE__SIMT_);
>>> @@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>    tree altv = NULL_TREE, altn2 = NULL_TREE;
>>>    if (fd->collapse == 1
>>>        && !broken_loop
>>> -      && TREE_CODE (fd->loops[0].step) != INTEGER_CST)
>>> +      && TREE_CODE (orig_step) != INTEGER_CST)
>>>      {
>>>        /* The vectorizer currently punts on loops with non-constant steps
>>>  	 for the main IV (can't compute number of iterations and gives up
>>> @@ -6471,7 +6472,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>  	itype = signed_type_for (itype);
>>>        t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
>>>        t = fold_build2 (PLUS_EXPR, itype,
>>> -		       fold_convert (itype, fd->loop.step), t);
>>> +		       fold_convert (itype, step), t);
>>>        t = fold_build2 (PLUS_EXPR, itype, t, fold_convert (itype, n2));
>>>        t = fold_build2 (MINUS_EXPR, itype, t,
>>>  		       fold_convert (itype, fd->loop.v));
>>> @@ -6479,10 +6480,10 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>  	t = fold_build2 (TRUNC_DIV_EXPR, itype,
>>>  			 fold_build1 (NEGATE_EXPR, itype, t),
>>>  			 fold_build1 (NEGATE_EXPR, itype,
>>> -				      fold_convert (itype, fd->loop.step)));
>>> +				      fold_convert (itype, step)));
>>>        else
>>>  	t = fold_build2 (TRUNC_DIV_EXPR, itype, t,
>>> -			 fold_convert (itype, fd->loop.step));
>>> +			 fold_convert (itype, step));
>>>        t = fold_convert (TREE_TYPE (altv), t);
>>>        altn2 = create_tmp_var (TREE_TYPE (altv));
>>>        expand_omp_build_assign (&gsi, altn2, t);
>>> @@ -6630,7 +6631,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>    if (is_simt)
>>>      {
>>>        gsi = gsi_start_bb (l2_bb);
>>> -      step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), fd->loop.step, step);
>>> +      step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), orig_step, step);
>>>        if (POINTER_TYPE_P (type))
>>>  	t = fold_build_pointer_plus (fd->loop.v, step);
>>>        else
>>> diff --git a/libgomp/testsuite/libgomp.c/pr81778.c b/libgomp/testsuite/libgomp.c/pr81778.c
>>> new file mode 100644
>>> index 00000000000..571668eb36a
>>> --- /dev/null
>>> +++ b/libgomp/testsuite/libgomp.c/pr81778.c
>>> @@ -0,0 +1,48 @@
>>> +/* Minimized from for-5.c.  */
>>> +
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +
>>> +/* Size of array we want to write.  */
>>> +#define N 32
>>> +
>>> +/* Size of extra space before and after.  */
>>> +#define CANARY_SIZE (N * 32)
>>> +
>>> +/* Start of array we want to write.  */
>>> +#define BASE (CANARY_SIZE)
>>> +
>>> +// Total size to be allocated.
>>> +#define ALLOC_SIZE (CANARY_SIZE + N + CANARY_SIZE)
>>> +
>>> +#pragma omp declare target
>>> +int a[ALLOC_SIZE];
>>> +#pragma omp end declare target
>>> +
>>> +int
>>> +main (void)
>>> +{
>>> +  /* Use variable step in for loop.  */
>>> +  int s = 1;
>>> +
>>> +#pragma omp target update to(a)
>>> +
>>> +  /* Write a[BASE] .. a[BASE + N - 1].  */
>>> +#pragma omp target simd
>>> +  for (int i = N - 1; i > -1; i -= s)
>>> +    a[BASE + i] = 1;
>>> +
>>> +#pragma omp target update from(a)
>>> +
>>> +  for (int i = 0; i < ALLOC_SIZE; i++)
>>> +    {
>>> +      int expected = (BASE <= i && i < BASE + N) ? 1 : 0;
>>> +      if (a[i] == expected)
>>> +	continue;
>>> +
>>> +      printf ("Expected %d, got %d at base[%d]\n", expected, a[i], i - BASE);
>>> +      abort ();
>>> +    }
>>> +
>>> +  return 0;
>>> +}
>>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [committed][omp, simt] Handle alternative IV
  2021-04-22 11:46     ` [PING^3][PATCH][omp, " Tom de Vries
@ 2021-04-29 12:38       ` Tom de Vries
  0 siblings, 0 replies; 5+ messages in thread
From: Tom de Vries @ 2021-04-29 12:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Alexander Monakov

On 4/22/21 1:46 PM, Tom de Vries wrote:
> On 12/17/20 5:46 PM, Tom de Vries wrote:
>> On 10/15/20 5:05 PM, Tom de Vries wrote:
>>> On 10/2/20 3:21 PM, Tom de Vries wrote:
>>>> Hi,
>>>>
>>>> Consider the test-case libgomp.c/pr81778.c added in this commit, with
>>>> this core loop (note: CANARY_SIZE set to 0 for simplicity):
>>>> ...
>>>>   int s = 1;
>>>>   #pragma omp target simd
>>>>   for (int i = N - 1; i > -1; i -= s)
>>>>     a[i] = 1;
>>>> ...
>>>> which, given that N is 32, sets a[0..31] to 1.
>>>>
>>>> After omp-expand, this looks like:
>>>> ...
>>>>   <bb 5> :
>>>>   simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
>>>>   .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
>>>>   D.3193 = -s;
>>>>   s.9 = s;
>>>>   D.3204 = .GOMP_SIMT_LANE ();
>>>>   D.3205 = -s.9;
>>>>   D.3206 = (int) D.3204;
>>>>   D.3207 = D.3205 * D.3206;
>>>>   i = D.3207 + 31;
>>>>   D.3209 = 0;
>>>>   D.3210 = -s.9;
>>>>   D.3211 = D.3210 - i;
>>>>   D.3210 = -s.9;
>>>>   D.3212 = D.3211 / D.3210;
>>>>   D.3213 = (unsigned int) D.3212;
>>>>   D.3213 = i >= 0 ? D.3213 : 0;
>>>>
>>>>   <bb 19> :
>>>>   if (D.3209 < D.3213)
>>>>     goto <bb 6>; [87.50%]
>>>>   else
>>>>     goto <bb 7>; [12.50%]
>>>>
>>>>   <bb 6> :
>>>>   a[i] = 1;
>>>>   D.3215 = -s.9;
>>>>   D.3219 = .GOMP_SIMT_VF ();
>>>>   D.3216 = (int) D.3219;
>>>>   D.3220 = D.3215 * D.3216;
>>>>   i = D.3220 + i;
>>>>   D.3209 = D.3209 + 1;
>>>>   goto <bb 19>; [100.00%]
>>>> ...
>>>>
>>>> On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
>>>> on the lane that is executing) at bb entry.
>>>>
>>>> So we have the following sequence:
>>>> - a[0..31] is set to 1
>>>> - i is updated to -32..-1
>>>> - D.3209 is updated to 1 (being 0 initially)
>>>> - bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
>>>>   to true
>>>> - bb6 is once more executed, which should not happen because all the elements
>>>>   that needed to be handled were already handled.
>>>> - consequently, elements that should not be written are written
>>>> - with CANARY_SIZE == 0, we may run into a libgomp error:
>>>>   ...
>>>>   libgomp: cuCtxSynchronize error: an illegal memory access was encountered
>>>>   ...
>>>>   and with CANARY_SIZE unmodified, we run into:
>>>>   ...
>>>>   Expected 0, got 1 at base[-961]
>>>>   Aborted (core dumped)
>>>>   ...
>>>>
>>>> The cause of this is as follows:
>>>> - because the step s is a variable rather than a constant, an alternative
>>>>   IV (D.3209 in our example) is generated in expand_omp_simd, and the
>>>>   loop condition is tested in terms of the alternative IV rather than
>>>>   the original IV (i in our example).
>>>> - the SIMT code in expand_omp_simd works by modifying step and initial value.
>>>> - The initial value fd->loop.n1 is loaded into a variable n1, which is
>>>>   modified by the SIMT code and then used there-after.
>>>> - The step fd->loop.step is loaded into a variable step, which is is modified
>>>>   by the SIMT code, but afterwards there are uses of both step and
>>>>   fd->loop.step.
>>>> - There are uses of fd->loop.step in the alternative IV handling code,
>>>>   which should use step instead.
>>>>
>>>> Fix this by introducing an additional variable orig_step, which is not
>>>> modified by the SIMT code and replacing all remaining uses of fd->loop.step
>>>> by either step or orig_step.
>>>>
>>>> Build on x86_64-linux with nvptx accelerator, tested libgomp.
>>>>
>>>> This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
>>>> with driver 450.66.
>>>>
>>>> OK for trunk?
>>>>
>>>
> 
> Ping^3.
> 

Committed.

Thanks,
- Tom

>>>> [omp, simt] Handle alternative IV
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2020-10-02  Tom de Vries  <tdevries@suse.de>
>>>>
>>>> 	* omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
>>>> 	fd->loop.step by either step or orig_step.
>>>>
>>>> libgomp/ChangeLog:
>>>>
>>>> 2020-10-02  Tom de Vries  <tdevries@suse.de>
>>>>
>>>> 	* testsuite/libgomp.c/pr81778.c: New test.
>>>>
>>>> ---
>>>>  gcc/omp-expand.c                      | 11 ++++----
>>>>  libgomp/testsuite/libgomp.c/pr81778.c | 48 +++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 54 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
>>>> index 99cb4f9dda4..80e35ac0294 100644
>>>> --- a/gcc/omp-expand.c
>>>> +++ b/gcc/omp-expand.c
>>>> @@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>>        n2 = OMP_CLAUSE_DECL (innerc);
>>>>      }
>>>>    tree step = fd->loop.step;
>>>> +  tree orig_step = step; /* May be different from step if is_simt.  */
>>>>  
>>>>    bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
>>>>  				  OMP_CLAUSE__SIMT_);
>>>> @@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>>    tree altv = NULL_TREE, altn2 = NULL_TREE;
>>>>    if (fd->collapse == 1
>>>>        && !broken_loop
>>>> -      && TREE_CODE (fd->loops[0].step) != INTEGER_CST)
>>>> +      && TREE_CODE (orig_step) != INTEGER_CST)
>>>>      {
>>>>        /* The vectorizer currently punts on loops with non-constant steps
>>>>  	 for the main IV (can't compute number of iterations and gives up
>>>> @@ -6471,7 +6472,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>>  	itype = signed_type_for (itype);
>>>>        t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
>>>>        t = fold_build2 (PLUS_EXPR, itype,
>>>> -		       fold_convert (itype, fd->loop.step), t);
>>>> +		       fold_convert (itype, step), t);
>>>>        t = fold_build2 (PLUS_EXPR, itype, t, fold_convert (itype, n2));
>>>>        t = fold_build2 (MINUS_EXPR, itype, t,
>>>>  		       fold_convert (itype, fd->loop.v));
>>>> @@ -6479,10 +6480,10 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>>  	t = fold_build2 (TRUNC_DIV_EXPR, itype,
>>>>  			 fold_build1 (NEGATE_EXPR, itype, t),
>>>>  			 fold_build1 (NEGATE_EXPR, itype,
>>>> -				      fold_convert (itype, fd->loop.step)));
>>>> +				      fold_convert (itype, step)));
>>>>        else
>>>>  	t = fold_build2 (TRUNC_DIV_EXPR, itype, t,
>>>> -			 fold_convert (itype, fd->loop.step));
>>>> +			 fold_convert (itype, step));
>>>>        t = fold_convert (TREE_TYPE (altv), t);
>>>>        altn2 = create_tmp_var (TREE_TYPE (altv));
>>>>        expand_omp_build_assign (&gsi, altn2, t);
>>>> @@ -6630,7 +6631,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>>    if (is_simt)
>>>>      {
>>>>        gsi = gsi_start_bb (l2_bb);
>>>> -      step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), fd->loop.step, step);
>>>> +      step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), orig_step, step);
>>>>        if (POINTER_TYPE_P (type))
>>>>  	t = fold_build_pointer_plus (fd->loop.v, step);
>>>>        else
>>>> diff --git a/libgomp/testsuite/libgomp.c/pr81778.c b/libgomp/testsuite/libgomp.c/pr81778.c
>>>> new file mode 100644
>>>> index 00000000000..571668eb36a
>>>> --- /dev/null
>>>> +++ b/libgomp/testsuite/libgomp.c/pr81778.c
>>>> @@ -0,0 +1,48 @@
>>>> +/* Minimized from for-5.c.  */
>>>> +
>>>> +#include <stdio.h>
>>>> +#include <stdlib.h>
>>>> +
>>>> +/* Size of array we want to write.  */
>>>> +#define N 32
>>>> +
>>>> +/* Size of extra space before and after.  */
>>>> +#define CANARY_SIZE (N * 32)
>>>> +
>>>> +/* Start of array we want to write.  */
>>>> +#define BASE (CANARY_SIZE)
>>>> +
>>>> +// Total size to be allocated.
>>>> +#define ALLOC_SIZE (CANARY_SIZE + N + CANARY_SIZE)
>>>> +
>>>> +#pragma omp declare target
>>>> +int a[ALLOC_SIZE];
>>>> +#pragma omp end declare target
>>>> +
>>>> +int
>>>> +main (void)
>>>> +{
>>>> +  /* Use variable step in for loop.  */
>>>> +  int s = 1;
>>>> +
>>>> +#pragma omp target update to(a)
>>>> +
>>>> +  /* Write a[BASE] .. a[BASE + N - 1].  */
>>>> +#pragma omp target simd
>>>> +  for (int i = N - 1; i > -1; i -= s)
>>>> +    a[BASE + i] = 1;
>>>> +
>>>> +#pragma omp target update from(a)
>>>> +
>>>> +  for (int i = 0; i < ALLOC_SIZE; i++)
>>>> +    {
>>>> +      int expected = (BASE <= i && i < BASE + N) ? 1 : 0;
>>>> +      if (a[i] == expected)
>>>> +	continue;
>>>> +
>>>> +      printf ("Expected %d, got %d at base[%d]\n", expected, a[i], i - BASE);
>>>> +      abort ();
>>>> +    }
>>>> +
>>>> +  return 0;
>>>> +}
>>>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-04-29 12:38 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-02 13:21 [PATCH][omp, simt] Handle alternative IV Tom de Vries
2020-10-15 15:05 ` [PING][PATCH][omp, " Tom de Vries
2020-12-17 16:46   ` [PING^2][PATCH][omp, " Tom de Vries
2021-04-22 11:46     ` [PING^3][PATCH][omp, " Tom de Vries
2021-04-29 12:38       ` [committed][omp, " Tom de Vries

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).