public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Tom de Vries <tdevries@suse.de>
To: gcc-patches@gcc.gnu.org
Cc: Jakub Jelinek <jakub@redhat.com>, Alexander Monakov <amonakov@ispras.ru>
Subject: [committed][omp, simt] Handle alternative IV
Date: Thu, 29 Apr 2021 14:38:54 +0200	[thread overview]
Message-ID: <aaca8351-8b27-10ec-cfcb-8f726902b0d9@suse.de> (raw)
In-Reply-To: <4fde92b4-d893-790e-fe52-83265a51c107@suse.de>

On 4/22/21 1:46 PM, Tom de Vries wrote:
> On 12/17/20 5:46 PM, Tom de Vries wrote:
>> On 10/15/20 5:05 PM, Tom de Vries wrote:
>>> On 10/2/20 3:21 PM, Tom de Vries wrote:
>>>> Hi,
>>>>
>>>> Consider the test-case libgomp.c/pr81778.c added in this commit, with
>>>> this core loop (note: CANARY_SIZE set to 0 for simplicity):
>>>> ...
>>>>   int s = 1;
>>>>   #pragma omp target simd
>>>>   for (int i = N - 1; i > -1; i -= s)
>>>>     a[i] = 1;
>>>> ...
>>>> which, given that N is 32, sets a[0..31] to 1.
>>>>
>>>> After omp-expand, this looks like:
>>>> ...
>>>>   <bb 5> :
>>>>   simduid.7 = .GOMP_SIMT_ENTER (simduid.7);
>>>>   .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7);
>>>>   D.3193 = -s;
>>>>   s.9 = s;
>>>>   D.3204 = .GOMP_SIMT_LANE ();
>>>>   D.3205 = -s.9;
>>>>   D.3206 = (int) D.3204;
>>>>   D.3207 = D.3205 * D.3206;
>>>>   i = D.3207 + 31;
>>>>   D.3209 = 0;
>>>>   D.3210 = -s.9;
>>>>   D.3211 = D.3210 - i;
>>>>   D.3210 = -s.9;
>>>>   D.3212 = D.3211 / D.3210;
>>>>   D.3213 = (unsigned int) D.3212;
>>>>   D.3213 = i >= 0 ? D.3213 : 0;
>>>>
>>>>   <bb 19> :
>>>>   if (D.3209 < D.3213)
>>>>     goto <bb 6>; [87.50%]
>>>>   else
>>>>     goto <bb 7>; [12.50%]
>>>>
>>>>   <bb 6> :
>>>>   a[i] = 1;
>>>>   D.3215 = -s.9;
>>>>   D.3219 = .GOMP_SIMT_VF ();
>>>>   D.3216 = (int) D.3219;
>>>>   D.3220 = D.3215 * D.3216;
>>>>   i = D.3220 + i;
>>>>   D.3209 = D.3209 + 1;
>>>>   goto <bb 19>; [100.00%]
>>>> ...
>>>>
>>>> On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
>>>> on the lane that is executing) at bb entry.
>>>>
>>>> So we have the following sequence:
>>>> - a[0..31] is set to 1
>>>> - i is updated to -32..-1
>>>> - D.3209 is updated to 1 (being 0 initially)
>>>> - bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
>>>>   to true
>>>> - bb6 is once more executed, which should not happen because all the elements
>>>>   that needed to be handled were already handled.
>>>> - consequently, elements that should not be written are written
>>>> - with CANARY_SIZE == 0, we may run into a libgomp error:
>>>>   ...
>>>>   libgomp: cuCtxSynchronize error: an illegal memory access was encountered
>>>>   ...
>>>>   and with CANARY_SIZE unmodified, we run into:
>>>>   ...
>>>>   Expected 0, got 1 at base[-961]
>>>>   Aborted (core dumped)
>>>>   ...
>>>>
>>>> The cause of this is as follows:
>>>> - because the step s is a variable rather than a constant, an alternative
>>>>   IV (D.3209 in our example) is generated in expand_omp_simd, and the
>>>>   loop condition is tested in terms of the alternative IV rather than
>>>>   the original IV (i in our example).
>>>> - the SIMT code in expand_omp_simd works by modifying step and initial value.
>>>> - The initial value fd->loop.n1 is loaded into a variable n1, which is
>>>>   modified by the SIMT code and then used there-after.
>>>> - The step fd->loop.step is loaded into a variable step, which is is modified
>>>>   by the SIMT code, but afterwards there are uses of both step and
>>>>   fd->loop.step.
>>>> - There are uses of fd->loop.step in the alternative IV handling code,
>>>>   which should use step instead.
>>>>
>>>> Fix this by introducing an additional variable orig_step, which is not
>>>> modified by the SIMT code and replacing all remaining uses of fd->loop.step
>>>> by either step or orig_step.
>>>>
>>>> Build on x86_64-linux with nvptx accelerator, tested libgomp.
>>>>
>>>> This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
>>>> with driver 450.66.
>>>>
>>>> OK for trunk?
>>>>
>>>
> 
> Ping^3.
> 

Committed.

Thanks,
- Tom

>>>> [omp, simt] Handle alternative IV
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2020-10-02  Tom de Vries  <tdevries@suse.de>
>>>>
>>>> 	* omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
>>>> 	fd->loop.step by either step or orig_step.
>>>>
>>>> libgomp/ChangeLog:
>>>>
>>>> 2020-10-02  Tom de Vries  <tdevries@suse.de>
>>>>
>>>> 	* testsuite/libgomp.c/pr81778.c: New test.
>>>>
>>>> ---
>>>>  gcc/omp-expand.c                      | 11 ++++----
>>>>  libgomp/testsuite/libgomp.c/pr81778.c | 48 +++++++++++++++++++++++++++++++++++
>>>>  2 files changed, 54 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
>>>> index 99cb4f9dda4..80e35ac0294 100644
>>>> --- a/gcc/omp-expand.c
>>>> +++ b/gcc/omp-expand.c
>>>> @@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>>        n2 = OMP_CLAUSE_DECL (innerc);
>>>>      }
>>>>    tree step = fd->loop.step;
>>>> +  tree orig_step = step; /* May be different from step if is_simt.  */
>>>>  
>>>>    bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt),
>>>>  				  OMP_CLAUSE__SIMT_);
>>>> @@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>>    tree altv = NULL_TREE, altn2 = NULL_TREE;
>>>>    if (fd->collapse == 1
>>>>        && !broken_loop
>>>> -      && TREE_CODE (fd->loops[0].step) != INTEGER_CST)
>>>> +      && TREE_CODE (orig_step) != INTEGER_CST)
>>>>      {
>>>>        /* The vectorizer currently punts on loops with non-constant steps
>>>>  	 for the main IV (can't compute number of iterations and gives up
>>>> @@ -6471,7 +6472,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>>  	itype = signed_type_for (itype);
>>>>        t = build_int_cst (itype, (fd->loop.cond_code == LT_EXPR ? -1 : 1));
>>>>        t = fold_build2 (PLUS_EXPR, itype,
>>>> -		       fold_convert (itype, fd->loop.step), t);
>>>> +		       fold_convert (itype, step), t);
>>>>        t = fold_build2 (PLUS_EXPR, itype, t, fold_convert (itype, n2));
>>>>        t = fold_build2 (MINUS_EXPR, itype, t,
>>>>  		       fold_convert (itype, fd->loop.v));
>>>> @@ -6479,10 +6480,10 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>>  	t = fold_build2 (TRUNC_DIV_EXPR, itype,
>>>>  			 fold_build1 (NEGATE_EXPR, itype, t),
>>>>  			 fold_build1 (NEGATE_EXPR, itype,
>>>> -				      fold_convert (itype, fd->loop.step)));
>>>> +				      fold_convert (itype, step)));
>>>>        else
>>>>  	t = fold_build2 (TRUNC_DIV_EXPR, itype, t,
>>>> -			 fold_convert (itype, fd->loop.step));
>>>> +			 fold_convert (itype, step));
>>>>        t = fold_convert (TREE_TYPE (altv), t);
>>>>        altn2 = create_tmp_var (TREE_TYPE (altv));
>>>>        expand_omp_build_assign (&gsi, altn2, t);
>>>> @@ -6630,7 +6631,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
>>>>    if (is_simt)
>>>>      {
>>>>        gsi = gsi_start_bb (l2_bb);
>>>> -      step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), fd->loop.step, step);
>>>> +      step = fold_build2 (MINUS_EXPR, TREE_TYPE (step), orig_step, step);
>>>>        if (POINTER_TYPE_P (type))
>>>>  	t = fold_build_pointer_plus (fd->loop.v, step);
>>>>        else
>>>> diff --git a/libgomp/testsuite/libgomp.c/pr81778.c b/libgomp/testsuite/libgomp.c/pr81778.c
>>>> new file mode 100644
>>>> index 00000000000..571668eb36a
>>>> --- /dev/null
>>>> +++ b/libgomp/testsuite/libgomp.c/pr81778.c
>>>> @@ -0,0 +1,48 @@
>>>> +/* Minimized from for-5.c.  */
>>>> +
>>>> +#include <stdio.h>
>>>> +#include <stdlib.h>
>>>> +
>>>> +/* Size of array we want to write.  */
>>>> +#define N 32
>>>> +
>>>> +/* Size of extra space before and after.  */
>>>> +#define CANARY_SIZE (N * 32)
>>>> +
>>>> +/* Start of array we want to write.  */
>>>> +#define BASE (CANARY_SIZE)
>>>> +
>>>> +// Total size to be allocated.
>>>> +#define ALLOC_SIZE (CANARY_SIZE + N + CANARY_SIZE)
>>>> +
>>>> +#pragma omp declare target
>>>> +int a[ALLOC_SIZE];
>>>> +#pragma omp end declare target
>>>> +
>>>> +int
>>>> +main (void)
>>>> +{
>>>> +  /* Use variable step in for loop.  */
>>>> +  int s = 1;
>>>> +
>>>> +#pragma omp target update to(a)
>>>> +
>>>> +  /* Write a[BASE] .. a[BASE + N - 1].  */
>>>> +#pragma omp target simd
>>>> +  for (int i = N - 1; i > -1; i -= s)
>>>> +    a[BASE + i] = 1;
>>>> +
>>>> +#pragma omp target update from(a)
>>>> +
>>>> +  for (int i = 0; i < ALLOC_SIZE; i++)
>>>> +    {
>>>> +      int expected = (BASE <= i && i < BASE + N) ? 1 : 0;
>>>> +      if (a[i] == expected)
>>>> +	continue;
>>>> +
>>>> +      printf ("Expected %d, got %d at base[%d]\n", expected, a[i], i - BASE);
>>>> +      abort ();
>>>> +    }
>>>> +
>>>> +  return 0;
>>>> +}
>>>>

      reply	other threads:[~2021-04-29 12:38 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-02 13:21 [PATCH][omp, " Tom de Vries
2020-10-15 15:05 ` [PING][PATCH][omp, " Tom de Vries
2020-12-17 16:46   ` [PING^2][PATCH][omp, " Tom de Vries
2021-04-22 11:46     ` [PING^3][PATCH][omp, " Tom de Vries
2021-04-29 12:38       ` Tom de Vries [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aaca8351-8b27-10ec-cfcb-8f726902b0d9@suse.de \
    --to=tdevries@suse.de \
    --cc=amonakov@ispras.ru \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).