* [Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
2020-09-25 8:52 [Bug target/97203] New: [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call burnus at gcc dot gnu.org
@ 2020-09-25 8:57 ` burnus at gcc dot gnu.org
2020-10-08 9:56 ` vries at gcc dot gnu.org
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: burnus at gcc dot gnu.org @ 2020-09-25 8:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #1 from Tobias Burnus <burnus at gcc dot gnu.org> ---
Besides PR95654, see PR81778 and PR80053.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
2020-09-25 8:52 [Bug target/97203] New: [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call burnus at gcc dot gnu.org
2020-09-25 8:57 ` [Bug target/97203] " burnus at gcc dot gnu.org
@ 2020-10-08 9:56 ` vries at gcc dot gnu.org
2020-10-08 12:57 ` vries at gcc dot gnu.org
` (8 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2020-10-08 9:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
Minimal version (without inlining sinf code from newlib):
...
/* { dg-additional-options "-lm -foffload=-lm" } */
#define N 1
int
main (void) {
float k[N];
float res;
for (int i = 0; i < N; i++)
k[i] = 300;
#pragma omp target map(to:k) map(from:res)
{
float sum = 0.0;
#pragma omp simd reduction(+:sum)
for (int i = 0; i < N; i++)
sum += __builtin_sinf (k[i]);
res = sum;
}
return 0;
}
...
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
2020-09-25 8:52 [Bug target/97203] New: [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call burnus at gcc dot gnu.org
2020-09-25 8:57 ` [Bug target/97203] " burnus at gcc dot gnu.org
2020-10-08 9:56 ` vries at gcc dot gnu.org
@ 2020-10-08 12:57 ` vries at gcc dot gnu.org
2020-10-08 15:27 ` vries at gcc dot gnu.org
` (7 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2020-10-08 12:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #3 from Tom de Vries <vries at gcc dot gnu.org> ---
[ Note, this is with GOMP_NVPTX_JIT=-O0. ]
In sinf, we have:
...
45: return -__kernel_cosf(y[0],y[1]);
...
which translates to:
...
.loc 1 45 12
ld.f32 %r67,[%frame+4];
ld.f32 %r65,[%frame];
{
.param .f32 %value_in;
.param .f32 %out_arg1;
st.param.f32 [%out_arg1],%r65;
.param .f32 %out_arg2;
st.param.f32 [%out_arg2],%r67;
call (%value_in),__kernel_cosf,(%out_arg1,%out_arg2);
ld.param.f32 %r68,[%value_in];
}
.loc 1 45 11
neg.f32 %r37,%r68;
...
If I place (using GOMP_NVPTX_PTXRW) a trap before the first load:
...
.loc 1 45 12
+trap
ld.f32 %r67,[%frame+4];
...
I get:
...
libgomp: cuCtxSynchronize error: an illegal instruction was encountered
...
If I place it after the first load, I get:
...
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
...
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
2020-09-25 8:52 [Bug target/97203] New: [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call burnus at gcc dot gnu.org
` (2 preceding siblings ...)
2020-10-08 12:57 ` vries at gcc dot gnu.org
@ 2020-10-08 15:27 ` vries at gcc dot gnu.org
2020-10-08 15:37 ` vries at gcc dot gnu.org
` (6 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2020-10-08 15:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #4 from Tom de Vries <vries at gcc dot gnu.org> ---
So, I think calling functions from simd code is atm not supported for nvptx.
Stack variables in simd code are mapped on a per-thread stack rather than on
the
usual per-warp stack.
The functions are compiled with the usual per-warp stack, so calling those
functions from simd might mean the different lanes are gonna disagree about
what the value in a stack variable should be.
Having said that, for the example in comment 2, there only should be one thread
executing the call, so this doesn't explain the illegal memory access.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
2020-09-25 8:52 [Bug target/97203] New: [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call burnus at gcc dot gnu.org
` (3 preceding siblings ...)
2020-10-08 15:27 ` vries at gcc dot gnu.org
@ 2020-10-08 15:37 ` vries at gcc dot gnu.org
2020-10-09 8:36 ` amonakov at gcc dot gnu.org
` (5 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2020-10-08 15:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #5 from Tom de Vries <vries at gcc dot gnu.org> ---
FWIW, another aspect here is convergence (as usual).
Looking at the SASS code for main$_omp_fn$0$impl, I don't find evidence for the
usual divergence/convergence ops (SSY/SYNC), which might mean that the
following shfl is executed in divergent mode, so, even if we would not get the
memory access error, we would not get correct results.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
2020-09-25 8:52 [Bug target/97203] New: [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call burnus at gcc dot gnu.org
` (4 preceding siblings ...)
2020-10-08 15:37 ` vries at gcc dot gnu.org
@ 2020-10-09 8:36 ` amonakov at gcc dot gnu.org
2020-10-12 8:44 ` vries at gcc dot gnu.org
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: amonakov at gcc dot gnu.org @ 2020-10-09 8:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
Alexander Monakov <amonakov at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |amonakov at gcc dot gnu.org
--- Comment #6 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #4)
> So, I think calling functions from simd code is atm not supported for nvptx.
>
> Stack variables in simd code are mapped on a per-thread stack rather than on
> the
> usual per-warp stack.
>
> The functions are compiled with the usual per-warp stack, so calling those
> functions from simd might mean the different lanes are gonna disagree about
> what the value in a stack variable should be.
This is inaccurate. In -msoft-stack mode there's no baked-in assumption that
stacks are always per-warp. The "soft stack" pointer can point either to global
memory (outside of SIMD regions), or to local memory (inside SIMD regions). The
pointer is switched between per-warp global memory and per-lane local memory by
nvptx.c:nvptx_output_softstack_switch.
The main requirement is that functions callable from OpenMP offloaded code are
compiled for -mgomp multilib variant. The design allows calling functions even
from inside SIMD regions, and it should be supported.
It is very disappointing that the first reaction was "I think ... is not
supported" without reaching out and asking questions. Lack of efficient
communication was a huge issue when OpenMP offloading support was contributed,
and it's disappointing to see it again years later.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
2020-09-25 8:52 [Bug target/97203] New: [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call burnus at gcc dot gnu.org
` (5 preceding siblings ...)
2020-10-09 8:36 ` amonakov at gcc dot gnu.org
@ 2020-10-12 8:44 ` vries at gcc dot gnu.org
2020-10-12 9:36 ` amonakov at gcc dot gnu.org
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2020-10-12 8:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #7 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #6)
> (In reply to Tom de Vries from comment #4)
> > So, I think calling functions from simd code is atm not supported for nvptx.
> >
> > Stack variables in simd code are mapped on a per-thread stack rather than on
> > the
> > usual per-warp stack.
> >
> > The functions are compiled with the usual per-warp stack, so calling those
> > functions from simd might mean the different lanes are gonna disagree about
> > what the value in a stack variable should be.
>
> This is inaccurate. In -msoft-stack mode there's no baked-in assumption that
> stacks are always per-warp. The "soft stack" pointer can point either to
> global memory (outside of SIMD regions), or to local memory (inside SIMD
> regions). The pointer is switched between per-warp global memory and
> per-lane local memory by nvptx.c:nvptx_output_softstack_switch.
>
> The main requirement is that functions callable from OpenMP offloaded code
> are compiled for -mgomp multilib variant. The design allows calling
> functions even from inside SIMD regions, and it should be supported.
I see, that's helpful, thanks.
I guess I was thrown off by seeing a %simtstack_ar of 136 bytes:
...
.local .align 8 .b8 %simtstack_ar[136];
...
which seems more of an amount claimed by a single function.
Is it possible you meant the default of -msoft-stack-reserve-local=128 to mean
128kb (similar to what is claimed in nvptx_stacks_size in the plugin)? Because
currently it means 128 bytes.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
2020-09-25 8:52 [Bug target/97203] New: [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call burnus at gcc dot gnu.org
` (6 preceding siblings ...)
2020-10-12 8:44 ` vries at gcc dot gnu.org
@ 2020-10-12 9:36 ` amonakov at gcc dot gnu.org
2020-10-12 13:00 ` vries at gcc dot gnu.org
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: amonakov at gcc dot gnu.org @ 2020-10-12 9:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #8 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
No, -msoft-stack-reserve-local is really meant to be in bytes: it may not
exceed the amount of .local memory reserved by CUDA driver (which is just 1-2
KB, unless overridden via cuCtxSetLimit, which nvptx-run.c does, but
plugin-nvptx.c does not).
Keep in mind that .local memory reservation is multiplied by number of active
contexts, which could be in range 20000-30000 when the code was written: 128KB
local memory per active thread would imply a 2.5GB allocation on the GPU.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
2020-09-25 8:52 [Bug target/97203] New: [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call burnus at gcc dot gnu.org
` (7 preceding siblings ...)
2020-10-12 9:36 ` amonakov at gcc dot gnu.org
@ 2020-10-12 13:00 ` vries at gcc dot gnu.org
2020-10-12 15:10 ` vries at gcc dot gnu.org
2020-10-12 15:15 ` amonakov at gcc dot gnu.org
10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2020-10-12 13:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #9 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #2)
> Minimal version (without inlining sinf code from newlib):
> ...
> /* { dg-additional-options "-lm -foffload=-lm" } */
>
> #define N 1
>
> int
> main (void) {
> float k[N];
> float res;
>
> for (int i = 0; i < N; i++)
> k[i] = 300;
>
> #pragma omp target map(to:k) map(from:res)
> {
> float sum = 0.0;
> #pragma omp simd reduction(+:sum)
> for (int i = 0; i < N; i++)
> sum += __builtin_sinf (k[i]);
>
> res = sum;
> }
>
> return 0;
> }
> ...
Starts passing at -foffload=-msoft-stack-reserve-local=346.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
2020-09-25 8:52 [Bug target/97203] New: [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call burnus at gcc dot gnu.org
` (8 preceding siblings ...)
2020-10-12 13:00 ` vries at gcc dot gnu.org
@ 2020-10-12 15:10 ` vries at gcc dot gnu.org
2020-10-12 15:15 ` amonakov at gcc dot gnu.org
10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2020-10-12 15:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #10 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #8)
> No, -msoft-stack-reserve-local is really meant to be in bytes: it may not
> exceed the amount of .local memory reserved by CUDA driver (which is just
> 1-2 KB, unless overridden via cuCtxSetLimit, which nvptx-run.c does, but
> plugin-nvptx.c does not).
>
> Keep in mind that .local memory reservation is multiplied by number of
> active contexts, which could be in range 20000-30000 when the code was
> written: 128KB local memory per active thread would imply a 2.5GB allocation
> on the GPU.
With the number of active contexts, do you mean the sm_count * thread_max as
used in nvptx-run.c (which, FWIW, is 10.240 on my card)?
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/97203] [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call
2020-09-25 8:52 [Bug target/97203] New: [nvptx] 'illegal memory access was encountered' with 'omp simd'/SIMT and cexpf call burnus at gcc dot gnu.org
` (9 preceding siblings ...)
2020-10-12 15:10 ` vries at gcc dot gnu.org
@ 2020-10-12 15:15 ` amonakov at gcc dot gnu.org
10 siblings, 0 replies; 12+ messages in thread
From: amonakov at gcc dot gnu.org @ 2020-10-12 15:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #11 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Yes, that.
^ permalink raw reply [flat|nested] 12+ messages in thread