* [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||
@ 2021-05-06 10:17 Tobias Burnus
2021-05-06 10:30 ` Jakub Jelinek
0 siblings, 1 reply; 11+ messages in thread
From: Tobias Burnus @ 2021-05-06 10:17 UTC (permalink / raw)
To: gcc-patches, Jakub Jelinek, Tom de Vries
[-- Attachment #1: Type: text/plain, Size: 2211 bytes --]
The complex/float && and || reduction patch missed a target testcase
(→ attached) which revealed that also a SIMT needed some special
handling, but just runs on non-SIMT systems.
The omp-low.c patch is rather simple - and I think it semantically
okay.
[Note to the change: It looks more completed than it is:
- moving 'zero' decl out of the 'if' block
- moving that if block before the 'if (sctx.is_simt)' block
- 'if (is_fp_and_or)' to the 'if (sctx.is_simt)' block.]
I think at least the testcase should be added, possibly also
the omp-low.c change – albeit I get a later ICE (see below),
which needs either an XFAIL or a fix.
* * *
ICE with NVPTX:
When the device lto1 starts, it fails when expanding the
intrinsic XCHG_BFLY function.
We have 'ivar' = complex float, which at rtx level is
converted to a concatenation (via gen_reg_rtx()).
In omp-low.c:
IFN_GOMP_SIMT_XCHG_BFLY (TREE_TYPE(ivar), ivar, simt_lane)
Later in expand_GOMP_SIMT_XCHG_BFLY, we call:
371 expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
which fails by running into unreachable of 'expand_insn'
7844 if (!maybe_expand_insn (icode, nops, ops))
7845 gcc_unreachable ();
icode = CODE_FOR_omp_simt_xchg_bfly
nops = 3
(gdb) p ops[0]->type
$3 = EXPAND_OUTPUT
(gdb) p debug(ops[0]->value)
(concat:SC (reg:SF 85)
(reg:SF 86))
(gdb) p ops[1]->type
$5 = EXPAND_INPUT
(gdb) p debug(ops[1]->value)
(concat:SC (reg:SF 26 [ orfc ])
(reg:SF 27 [ orfc+4 ]))
(gdb) p ops[2]->type
$7 = EXPAND_INPUT
(gdb) p debug(ops[2]->value)
(reg:SI 52 [ _74 ])
The mentioned concat happens in
How to fix this? Or does this fall into the same category as
PR100321 (fixed by: r12-395, Disable SIMT for user-defined reduction) with its
follow-up PR 100408?
Small testcase is:
_Complex float rcf[1024];
int
reduction_or ()
{
_Complex float orfc = 0;
for (int i=0; i < 1024; ++i)
orfc = orfc || rcf[i];
return __real__ orfc;
}
Tobias
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
[-- Attachment #2: red-nvptx-bfly.diff --]
[-- Type: text/x-patch, Size: 7233 bytes --]
OpenMP: Fix SIMT for complex/float reduction with && and ||
gcc/ChangeLog:
* omp-low.c (lower_rec_input_clauses): Also handle SIMT part
for complex/float recution with && and ||.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
complex/floating-point || + && recduction with 'omp target'.
gcc/omp-low.c | 30 ++--
.../testsuite/libgomp.c-c++-common/reduction-5.c | 192 +++++++++++++++++++++
2 files changed, 210 insertions(+), 12 deletions(-)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 26ceaf7..46220c5 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6432,28 +6432,34 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
gimplify_assign (unshare_expr (ivar), x, &llist[0]);
- if (sctx.is_simt)
- {
- if (!simt_lane)
- simt_lane = create_tmp_var (unsigned_type_node);
- x = build_call_expr_internal_loc
- (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
- TREE_TYPE (ivar), 2, ivar, simt_lane);
- x = build2 (code, TREE_TYPE (ivar), ivar, x);
- gimplify_assign (ivar, x, &llist[2]);
- }
tree ivar2 = ivar;
tree ref2 = ref;
+ tree zero = NULL_TREE;
if (is_fp_and_or)
{
- tree zero = build_zero_cst (TREE_TYPE (ivar));
+ zero = build_zero_cst (TREE_TYPE (ivar));
ivar2 = fold_build2_loc (clause_loc, NE_EXPR,
integer_type_node, ivar,
zero);
ref2 = fold_build2_loc (clause_loc, NE_EXPR,
integer_type_node, ref, zero);
}
- x = build2 (code, TREE_TYPE (ref), ref2, ivar2);
+ if (sctx.is_simt)
+ {
+ if (!simt_lane)
+ simt_lane = create_tmp_var (unsigned_type_node);
+ x = build_call_expr_internal_loc
+ (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
+ TREE_TYPE (ivar), 2, ivar, simt_lane);
+ if (is_fp_and_or)
+ x = fold_build2_loc (clause_loc, NE_EXPR,
+ integer_type_node, x, zero);
+ x = build2 (code, TREE_TYPE (ivar2), ivar2, x);
+ if (is_fp_and_or)
+ x = fold_convert (TREE_TYPE (ivar), x);
+ gimplify_assign (ivar, x, &llist[2]);
+ }
+ x = build2 (code, TREE_TYPE (ref2), ref2, ivar2);
if (is_fp_and_or)
x = fold_convert (TREE_TYPE (ref), x);
ref = build_outer_var_ref (var, ctx);
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
new file mode 100644
index 0000000..346c882
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
@@ -0,0 +1,192 @@
+/* C / C++'s logical AND and OR operators take any scalar argument
+ which compares (un)equal to 0 - the result 1 or 0 and of type int.
+
+ In this testcase, the int result is again converted to a floating-poing
+ or complex type.
+
+ While having a floating-point/complex array element with || and && can make
+ sense, having a non-integer/non-bool reduction variable is odd but valid.
+
+ Test: FP reduction variable + FP array. */
+
+#define N 1024
+_Complex float rcf[N];
+_Complex double rcd[N];
+float rf[N];
+double rd[N];
+
+int
+reduction_or ()
+{
+ float orf = 0;
+ double ord = 0;
+ _Complex float orfc = 0;
+ _Complex double ordc = 0;
+
+ #pragma omp target parallel reduction(||: orf)
+ for (int i=0; i < N; ++i)
+ orf = orf || rf[i];
+
+ #pragma omp target parallel for reduction(||: ord)
+ for (int i=0; i < N; ++i)
+ ord = ord || rcd[i];
+
+ #pragma omp target parallel for simd reduction(||: orfc)
+ for (int i=0; i < N; ++i)
+ orfc = orfc || rcf[i];
+
+ #pragma omp target parallel loop reduction(||: ordc)
+ for (int i=0; i < N; ++i)
+ ordc = ordc || rcd[i];
+
+ return orf + ord + __real__ orfc + __real__ ordc;
+}
+
+int
+reduction_or_teams ()
+{
+ float orf = 0;
+ double ord = 0;
+ _Complex float orfc = 0;
+ _Complex double ordc = 0;
+
+ #pragma omp target teams distribute parallel for reduction(||: orf)
+ for (int i=0; i < N; ++i)
+ orf = orf || rf[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(||: ord)
+ for (int i=0; i < N; ++i)
+ ord = ord || rcd[i];
+
+ #pragma omp target teams distribute parallel for reduction(||: orfc)
+ for (int i=0; i < N; ++i)
+ orfc = orfc || rcf[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(||: ordc)
+ for (int i=0; i < N; ++i)
+ ordc = ordc || rcd[i];
+
+ return orf + ord + __real__ orfc + __real__ ordc;
+}
+
+int
+reduction_and ()
+{
+ float andf = 1;
+ double andd = 1;
+ _Complex float andfc = 1;
+ _Complex double anddc = 1;
+
+ #pragma omp target parallel reduction(&&: andf)
+ for (int i=0; i < N; ++i)
+ andf = andf && rf[i];
+
+ #pragma omp target parallel for reduction(&&: andd)
+ for (int i=0; i < N; ++i)
+ andd = andd && rcd[i];
+
+ #pragma omp target parallel for simd reduction(&&: andfc)
+ for (int i=0; i < N; ++i)
+ andfc = andfc && rcf[i];
+
+ #pragma omp target parallel loop reduction(&&: anddc)
+ for (int i=0; i < N; ++i)
+ anddc = anddc && rcd[i];
+
+ return andf + andd + __real__ andfc + __real__ anddc;
+}
+
+int
+reduction_and_teams ()
+{
+ float andf = 1;
+ double andd = 1;
+ _Complex float andfc = 1;
+ _Complex double anddc = 1;
+
+ #pragma omp target teams distribute parallel for reduction(&&: andf)
+ for (int i=0; i < N; ++i)
+ andf = andf && rf[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(&&: andd)
+ for (int i=0; i < N; ++i)
+ andd = andd && rcd[i];
+
+ #pragma omp target teams distribute parallel for reduction(&&: andfc)
+ for (int i=0; i < N; ++i)
+ andfc = andfc && rcf[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(&&: anddc)
+ for (int i=0; i < N; ++i)
+ anddc = anddc && rcd[i];
+
+ return andf + andd + __real__ andfc + __real__ anddc;
+}
+
+int
+main ()
+{
+ for (int i = 0; i < N; ++i)
+ {
+ rf[i] = 0;
+ rd[i] = 0;
+ rcf[i] = 0;
+ rcd[i] = 0;
+ }
+
+ if (reduction_or () != 0)
+ __builtin_abort ();
+ if (reduction_or_teams () != 0)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ rf[10] = 1.0;
+ rd[15] = 1.0;
+ rcf[10] = 1.0;
+ rcd[15] = 1.0i;
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ for (int i = 0; i < N; ++i)
+ {
+ rf[i] = 1;
+ rd[i] = 1;
+ rcf[i] = 1;
+ rcd[i] = 1;
+ }
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 4)
+ __builtin_abort ();
+ if (reduction_and_teams () != 4)
+ __builtin_abort ();
+
+ rf[10] = 0.0;
+ rd[15] = 0.0;
+ rcf[10] = 0.0;
+ rcd[15] = 0.0;
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ return 0;
+}
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||
2021-05-06 10:17 [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and || Tobias Burnus
@ 2021-05-06 10:30 ` Jakub Jelinek
2021-05-06 13:12 ` Tom de Vries
0 siblings, 1 reply; 11+ messages in thread
From: Jakub Jelinek @ 2021-05-06 10:30 UTC (permalink / raw)
To: Tobias Burnus; +Cc: gcc-patches, Tom de Vries
On Thu, May 06, 2021 at 12:17:03PM +0200, Tobias Burnus wrote:
> OpenMP: Fix SIMT for complex/float reduction with && and ||
>
> gcc/ChangeLog:
>
> * omp-low.c (lower_rec_input_clauses): Also handle SIMT part
> for complex/float recution with && and ||.
>
> libgomp/ChangeLog:
>
> * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
> complex/floating-point || + && recduction with 'omp target'.
As the float/complex ||/&& reductions are IMHO just conformance issues, not
something anyone would actually use in meaningful code - floats or complex
aren't the most obvious or efficient holders of boolean values - I think
punting SIMT on those isn't a workaround, but the right solution.
Jakub
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||
2021-05-06 10:30 ` Jakub Jelinek
@ 2021-05-06 13:12 ` Tom de Vries
2021-05-06 13:22 ` Jakub Jelinek
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Tom de Vries @ 2021-05-06 13:12 UTC (permalink / raw)
To: Jakub Jelinek, Tobias Burnus; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1103 bytes --]
On 5/6/21 12:30 PM, Jakub Jelinek wrote:
> On Thu, May 06, 2021 at 12:17:03PM +0200, Tobias Burnus wrote:
>> OpenMP: Fix SIMT for complex/float reduction with && and ||
>>
>> gcc/ChangeLog:
>>
>> * omp-low.c (lower_rec_input_clauses): Also handle SIMT part
>> for complex/float recution with && and ||.
>>
>> libgomp/ChangeLog:
>>
>> * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
>> complex/floating-point || + && recduction with 'omp target'.
>
> As the float/complex ||/&& reductions are IMHO just conformance issues, not
> something anyone would actually use in meaningful code - floats or complex
> aren't the most obvious or efficient holders of boolean values - I think
> punting SIMT on those isn't a workaround, but the right solution.
>
Ack.
WIP patch below tries that approach and fixes the ICE, but this simple
example still doesn't work:
...
int
main ()
{
float andf = 1;
#pragma omp target parallel reduction(&&: andf)
for (int i=0; i < 1024; ++i)
andf = andf && 0.0;
if ((int)andf != 0)
__builtin_abort ();
return 0;
}
...
Thanks,
- Tom
[-- Attachment #2: tmp.patch --]
[-- Type: text/x-patch, Size: 1184 bytes --]
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 26ceaf74b2d..d8f2487054f 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4389,14 +4389,28 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
{
for (tree c = gimple_omp_for_clauses (ctx->stmt); c;
c = OMP_CLAUSE_CHAIN (c))
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
- && OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
- {
- /* UDR reductions are not supported yet for SIMT, disable
- SIMT. */
- sctx->max_vf = 1;
- break;
- }
+ {
+ if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+ continue;
+
+ if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
+ {
+ /* UDR reductions are not supported yet for SIMT, disable
+ SIMT. */
+ sctx->max_vf = 1;
+ break;
+ }
+
+ if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))
+ && TREE_CODE (TREE_TYPE (new_var)) != BOOLEAN_TYPE)
+ {
+ /* Doing boolean operations on non-boolean types is
+ for conformance only, it's not worth supporting this
+ for SIMT. */
+ sctx->max_vf = 1;
+ break;
+ }
+ }
}
if (maybe_gt (sctx->max_vf, 1U))
{
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||
2021-05-06 13:12 ` Tom de Vries
@ 2021-05-06 13:22 ` Jakub Jelinek
2021-05-06 14:05 ` Tom de Vries
2021-05-06 14:21 ` Tobias Burnus
2 siblings, 0 replies; 11+ messages in thread
From: Jakub Jelinek @ 2021-05-06 13:22 UTC (permalink / raw)
To: Tom de Vries; +Cc: Tobias Burnus, gcc-patches
On Thu, May 06, 2021 at 03:12:59PM +0200, Tom de Vries wrote:
> + if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))
> + && TREE_CODE (TREE_TYPE (new_var)) != BOOLEAN_TYPE)
I would use && !INTEGRAL_TYPE_P (TREE_TYPE (new_var))
Especially in C code using || or && with int or other non-_Bool types
will pretty frequent.
Of course, if that doesn't work for SIMT either, it needs further work
and punting on those could be a temporary workaround. But it would be
a preexisting issue, not something introduced with accepting &&/|| for
floating/complex types - we've accepted &&/|| for integral types forever.
Jakub
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||
2021-05-06 13:12 ` Tom de Vries
2021-05-06 13:22 ` Jakub Jelinek
@ 2021-05-06 14:05 ` Tom de Vries
2021-05-06 14:21 ` Tobias Burnus
2 siblings, 0 replies; 11+ messages in thread
From: Tom de Vries @ 2021-05-06 14:05 UTC (permalink / raw)
To: Jakub Jelinek, Tobias Burnus; +Cc: gcc-patches
On 5/6/21 3:12 PM, Tom de Vries wrote:
> On 5/6/21 12:30 PM, Jakub Jelinek wrote:
>> On Thu, May 06, 2021 at 12:17:03PM +0200, Tobias Burnus wrote:
>>> OpenMP: Fix SIMT for complex/float reduction with && and ||
>>>
>>> gcc/ChangeLog:
>>>
>>> * omp-low.c (lower_rec_input_clauses): Also handle SIMT part
>>> for complex/float recution with && and ||.
>>>
>>> libgomp/ChangeLog:
>>>
>>> * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
>>> complex/floating-point || + && recduction with 'omp target'.
>>
>> As the float/complex ||/&& reductions are IMHO just conformance issues, not
>> something anyone would actually use in meaningful code - floats or complex
>> aren't the most obvious or efficient holders of boolean values - I think
>> punting SIMT on those isn't a workaround, but the right solution.
>>
>
> Ack.
>
> WIP patch below tries that approach and fixes the ICE, but this simple
> example still doesn't work:
> ...
> int
> main ()
> {
> float andf = 1;
>
> #pragma omp target parallel reduction(&&: andf)
> for (int i=0; i < 1024; ++i)
> andf = andf && 0.0;
>
> if ((int)andf != 0)
> __builtin_abort ();
>
> return 0;
> }
> ...
Hm, after rewriting things like this:
...
#pragma omp target map (tofrom: andf)
#pragma omp parallel reduction(&&: andf)
for (int i=0; i < 1024; ++i)
andf = andf && 0.0;
...
it does work.
My limited openmp knowledge is not enough to decide whether the fail of
the first variant is a test-case issue, or a gcc issue.
Thanks,
- Tom
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||
2021-05-06 13:12 ` Tom de Vries
2021-05-06 13:22 ` Jakub Jelinek
2021-05-06 14:05 ` Tom de Vries
@ 2021-05-06 14:21 ` Tobias Burnus
2021-05-06 14:32 ` Jakub Jelinek
2 siblings, 1 reply; 11+ messages in thread
From: Tobias Burnus @ 2021-05-06 14:21 UTC (permalink / raw)
To: Tom de Vries, Jakub Jelinek; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 773 bytes --]
On 06.05.21 15:12, Tom de Vries wrote:
> WIP patch below tries that approach and fixes the ICE,
Thanks!
> but this simple example still doesn't work:
> ...
> #pragma omp target parallel reduction(&&: andf)
Try: map(andf). [Cf. PR99928 with pending patch at
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/567838.html ]
I have now added your WIP patch to my patch, honoring the comment by Jakub.
I also copied the _Complex int example to -6.c to have also a target
version for this.
Lightly tested for now w/ and w/o offloading, will run the testsuite now.
Tobias
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
[-- Attachment #2: red-nvptx-bfly-v2.diff --]
[-- Type: text/x-patch, Size: 14032 bytes --]
OpenMP: Fix SIMT for complex/float reduction with && and ||
2021-05-06 Tobias Burnus <tobias@codesourcery.com>
Tom de Vries <tdevries@suse.de>
gcc/ChangeLog:
* omp-low.c (lower_rec_simd_input_clauses): Set max_vf = 1 if
a truth_value_p reduction variable is nonintegral.
(lower_rec_input_clauses): Also handle SIMT part
for complex/float recution with && and ||.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
complex/floating-point || + && reduction with 'omp target'.
* testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.
gcc/omp-low.c | 58 ++++--
.../testsuite/libgomp.c-c++-common/reduction-5.c | 192 ++++++++++++++++++++
.../testsuite/libgomp.c-c++-common/reduction-6.c | 195 +++++++++++++++++++++
3 files changed, 426 insertions(+), 19 deletions(-)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 26ceaf74b2d..c3c72241486 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4389,14 +4389,28 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
{
for (tree c = gimple_omp_for_clauses (ctx->stmt); c;
c = OMP_CLAUSE_CHAIN (c))
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
- && OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
- {
- /* UDR reductions are not supported yet for SIMT, disable
- SIMT. */
- sctx->max_vf = 1;
- break;
+ {
+ if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+ continue;
+
+ if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
+ {
+ /* UDR reductions are not supported yet for SIMT, disable
+ SIMT. */
+ sctx->max_vf = 1;
+ break;
+ }
+
+ if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))
+ && !INTEGRAL_TYPE_P (TREE_TYPE (new_var)))
+ {
+ /* Doing boolean operations on non-boolean types is
+ for conformance only, it's not worth supporting this
+ for SIMT. */
+ sctx->max_vf = 1;
+ break;
}
+ }
}
if (maybe_gt (sctx->max_vf, 1U))
{
@@ -6432,28 +6446,34 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
gimplify_assign (unshare_expr (ivar), x, &llist[0]);
- if (sctx.is_simt)
- {
- if (!simt_lane)
- simt_lane = create_tmp_var (unsigned_type_node);
- x = build_call_expr_internal_loc
- (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
- TREE_TYPE (ivar), 2, ivar, simt_lane);
- x = build2 (code, TREE_TYPE (ivar), ivar, x);
- gimplify_assign (ivar, x, &llist[2]);
- }
tree ivar2 = ivar;
tree ref2 = ref;
+ tree zero = NULL_TREE;
if (is_fp_and_or)
{
- tree zero = build_zero_cst (TREE_TYPE (ivar));
+ zero = build_zero_cst (TREE_TYPE (ivar));
ivar2 = fold_build2_loc (clause_loc, NE_EXPR,
integer_type_node, ivar,
zero);
ref2 = fold_build2_loc (clause_loc, NE_EXPR,
integer_type_node, ref, zero);
}
- x = build2 (code, TREE_TYPE (ref), ref2, ivar2);
+ if (sctx.is_simt)
+ {
+ if (!simt_lane)
+ simt_lane = create_tmp_var (unsigned_type_node);
+ x = build_call_expr_internal_loc
+ (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
+ TREE_TYPE (ivar), 2, ivar, simt_lane);
+ if (is_fp_and_or)
+ x = fold_build2_loc (clause_loc, NE_EXPR,
+ integer_type_node, x, zero);
+ x = build2 (code, TREE_TYPE (ivar2), ivar2, x);
+ if (is_fp_and_or)
+ x = fold_convert (TREE_TYPE (ivar), x);
+ gimplify_assign (ivar, x, &llist[2]);
+ }
+ x = build2 (code, TREE_TYPE (ref2), ref2, ivar2);
if (is_fp_and_or)
x = fold_convert (TREE_TYPE (ref), x);
ref = build_outer_var_ref (var, ctx);
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
new file mode 100644
index 00000000000..8ac9930b241
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
@@ -0,0 +1,192 @@
+/* C / C++'s logical AND and OR operators take any scalar argument
+ which compares (un)equal to 0 - the result 1 or 0 and of type int.
+
+ In this testcase, the int result is again converted to a floating-poing
+ or complex type.
+
+ While having a floating-point/complex array element with || and && can make
+ sense, having a non-integer/non-bool reduction variable is odd but valid.
+
+ Test: FP reduction variable + FP array - as reduction-1.c but with target */
+
+#define N 1024
+_Complex float rcf[N];
+_Complex double rcd[N];
+float rf[N];
+double rd[N];
+
+int
+reduction_or ()
+{
+ float orf = 0;
+ double ord = 0;
+ _Complex float orfc = 0;
+ _Complex double ordc = 0;
+
+ #pragma omp target parallel reduction(||: orf) map(orf)
+ for (int i=0; i < N; ++i)
+ orf = orf || rf[i];
+
+ #pragma omp target parallel for reduction(||: ord) map(ord)
+ for (int i=0; i < N; ++i)
+ ord = ord || rcd[i];
+
+ #pragma omp target parallel for simd reduction(||: orfc) map(orfc)
+ for (int i=0; i < N; ++i)
+ orfc = orfc || rcf[i];
+
+ #pragma omp target parallel loop reduction(||: ordc) map(ordc)
+ for (int i=0; i < N; ++i)
+ ordc = ordc || rcd[i];
+
+ return orf + ord + __real__ orfc + __real__ ordc;
+}
+
+int
+reduction_or_teams ()
+{
+ float orf = 0;
+ double ord = 0;
+ _Complex float orfc = 0;
+ _Complex double ordc = 0;
+
+ #pragma omp target teams distribute parallel for reduction(||: orf) map(orf)
+ for (int i=0; i < N; ++i)
+ orf = orf || rf[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(||: ord) map(ord)
+ for (int i=0; i < N; ++i)
+ ord = ord || rcd[i];
+
+ #pragma omp target teams distribute parallel for reduction(||: orfc) map(orfc)
+ for (int i=0; i < N; ++i)
+ orfc = orfc || rcf[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(||: ordc) map(ordc)
+ for (int i=0; i < N; ++i)
+ ordc = ordc || rcd[i];
+
+ return orf + ord + __real__ orfc + __real__ ordc;
+}
+
+int
+reduction_and ()
+{
+ float andf = 1;
+ double andd = 1;
+ _Complex float andfc = 1;
+ _Complex double anddc = 1;
+
+ #pragma omp target parallel reduction(&&: andf) map(andf)
+ for (int i=0; i < N; ++i)
+ andf = andf && rf[i];
+
+ #pragma omp target parallel for reduction(&&: andd) map(andd)
+ for (int i=0; i < N; ++i)
+ andd = andd && rcd[i];
+
+ #pragma omp target parallel for simd reduction(&&: andfc) map(andfc)
+ for (int i=0; i < N; ++i)
+ andfc = andfc && rcf[i];
+
+ #pragma omp target parallel loop reduction(&&: anddc) map(anddc)
+ for (int i=0; i < N; ++i)
+ anddc = anddc && rcd[i];
+
+ return andf + andd + __real__ andfc + __real__ anddc;
+}
+
+int
+reduction_and_teams ()
+{
+ float andf = 1;
+ double andd = 1;
+ _Complex float andfc = 1;
+ _Complex double anddc = 1;
+
+ #pragma omp target teams distribute parallel for reduction(&&: andf) map(andf)
+ for (int i=0; i < N; ++i)
+ andf = andf && rf[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(&&: andd) map(andd)
+ for (int i=0; i < N; ++i)
+ andd = andd && rcd[i];
+
+ #pragma omp target teams distribute parallel for reduction(&&: andfc) map(andfc)
+ for (int i=0; i < N; ++i)
+ andfc = andfc && rcf[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(&&: anddc) map(anddc)
+ for (int i=0; i < N; ++i)
+ anddc = anddc && rcd[i];
+
+ return andf + andd + __real__ andfc + __real__ anddc;
+}
+
+int
+main ()
+{
+ for (int i = 0; i < N; ++i)
+ {
+ rf[i] = 0;
+ rd[i] = 0;
+ rcf[i] = 0;
+ rcd[i] = 0;
+ }
+
+ if (reduction_or () != 0)
+ __builtin_abort ();
+ if (reduction_or_teams () != 0)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ rf[10] = 1.0;
+ rd[15] = 1.0;
+ rcf[10] = 1.0;
+ rcd[15] = 1.0i;
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ for (int i = 0; i < N; ++i)
+ {
+ rf[i] = 1;
+ rd[i] = 1;
+ rcf[i] = 1;
+ rcd[i] = 1;
+ }
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 4)
+ __builtin_abort ();
+ if (reduction_and_teams () != 4)
+ __builtin_abort ();
+
+ rf[10] = 0.0;
+ rd[15] = 0.0;
+ rcf[10] = 0.0;
+ rcd[15] = 0.0;
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
new file mode 100644
index 00000000000..a223d296183
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
@@ -0,0 +1,195 @@
+/* C / C++'s logical AND and OR operators take any scalar argument
+ which compares (un)equal to 0 - the result 1 or 0 and of type int.
+
+ In this testcase, the int result is again converted to an integer complex
+ type.
+
+ While having a floating-point/complex array element with || and && can make
+ sense, having a complex reduction variable is odd but valid.
+
+ Test: int complex reduction variable + int complex array.
+ as reduction-4.c but with target. */
+
+#define N 1024
+_Complex char rcc[N];
+_Complex short rcs[N];
+_Complex int rci[N];
+_Complex long long rcl[N];
+
+int
+reduction_or ()
+{
+ _Complex char orc = 0;
+ _Complex short ors = 0;
+ _Complex int ori = 0;
+ _Complex long orl = 0;
+
+ #pragma omp target parallel reduction(||: orc) map(orc)
+ for (int i=0; i < N; ++i)
+ orc = orc || rcl[i];
+
+ #pragma omp target parallel for reduction(||: ors) map(ors)
+ for (int i=0; i < N; ++i)
+ ors = ors || rci[i];
+
+ #pragma omp target parallel for simd reduction(||: ori) map(ori)
+ for (int i=0; i < N; ++i)
+ ori = ori || rcs[i];
+
+ #pragma omp target parallel loop reduction(||: orl) map(orl)
+ for (int i=0; i < N; ++i)
+ orl = orl || rcc[i];
+
+ return __real__ (orc + ors + ori + orl) + __imag__ (orc + ors + ori + orl);
+}
+
+int
+reduction_or_teams ()
+{
+ _Complex char orc = 0;
+ _Complex short ors = 0;
+ _Complex int ori = 0;
+ _Complex long orl = 0;
+
+ #pragma omp target teams distribute parallel for reduction(||: orc) map(orc)
+ for (int i=0; i < N; ++i)
+ orc = orc || rcc[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(||: ors) map(ors)
+ for (int i=0; i < N; ++i)
+ ors = ors || rcs[i];
+
+ #pragma omp target teams distribute parallel for reduction(||: ori) map(ori)
+ for (int i=0; i < N; ++i)
+ ori = ori || rci[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(||: orl) map(orl)
+ for (int i=0; i < N; ++i)
+ orl = orl || rcl[i];
+
+ return __real__ (orc + ors + ori + orl) + __imag__ (orc + ors + ori + orl);
+}
+
+int
+reduction_and ()
+{
+ _Complex char andc = 1;
+ _Complex short ands = 1;
+ _Complex int andi = 1;
+ _Complex long andl = 1;
+
+ #pragma omp target parallel reduction(&&: andc) map(andc)
+ for (int i=0; i < N; ++i)
+ andc = andc && rcc[i];
+
+ #pragma omp target parallel for reduction(&&: ands) map(ands)
+ for (int i=0; i < N; ++i)
+ ands = ands && rcs[i];
+
+ #pragma omp target parallel for simd reduction(&&: andi) map(andi)
+ for (int i=0; i < N; ++i)
+ andi = andi && rci[i];
+
+ #pragma omp target parallel loop reduction(&&: andl) map(andl)
+ for (int i=0; i < N; ++i)
+ andl = andl && rcl[i];
+
+ return __real__ (andc + ands + andi + andl)
+ + __imag__ (andc + ands + andi + andl);
+}
+
+int
+reduction_and_teams ()
+{
+ _Complex char andc = 1;
+ _Complex short ands = 1;
+ _Complex int andi = 1;
+ _Complex long andl = 1;
+
+ #pragma omp target teams distribute parallel for reduction(&&: andc) map(andc)
+ for (int i=0; i < N; ++i)
+ andc = andc && rcl[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(&&: ands) map(ands)
+ for (int i=0; i < N; ++i)
+ ands = ands && rci[i];
+
+ #pragma omp target teams distribute parallel for reduction(&&: andi) map(andi)
+ for (int i=0; i < N; ++i)
+ andi = andi && rcs[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(&&: andl) map(andl)
+ for (int i=0; i < N; ++i)
+ andl = andl && rcc[i];
+
+ return __real__ (andc + ands + andi + andl)
+ + __imag__ (andc + ands + andi + andl);
+}
+
+int
+main ()
+{
+ for (int i = 0; i < N; ++i)
+ {
+ rcc[i] = 0;
+ rcs[i] = 0;
+ rci[i] = 0;
+ rcl[i] = 0;
+ }
+
+ if (reduction_or () != 0)
+ __builtin_abort ();
+ if (reduction_or_teams () != 0)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ rcc[10] = 1.0;
+ rcs[15] = 1.0i;
+ rci[10] = 1.0;
+ rcl[15] = 1.0i;
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ for (int i = 0; i < N; ++i)
+ {
+ rcc[i] = 1;
+ rcs[i] = 1i;
+ rci[i] = 1;
+ rcl[i] = 1 + 1i;
+ }
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 4)
+ __builtin_abort ();
+ if (reduction_and_teams () != 4)
+ __builtin_abort ();
+
+ rcc[10] = 0.0;
+ rcs[15] = 0.0;
+ rci[10] = 0.0;
+ rcl[15] = 0.0;
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ return 0;
+}
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||
2021-05-06 14:21 ` Tobias Burnus
@ 2021-05-06 14:32 ` Jakub Jelinek
2021-05-07 10:05 ` Tobias Burnus
0 siblings, 1 reply; 11+ messages in thread
From: Jakub Jelinek @ 2021-05-06 14:32 UTC (permalink / raw)
To: Tobias Burnus; +Cc: Tom de Vries, gcc-patches
On Thu, May 06, 2021 at 04:21:40PM +0200, Tobias Burnus wrote:
> * omp-low.c (lower_rec_simd_input_clauses): Set max_vf = 1 if
> a truth_value_p reduction variable is nonintegral.
> (lower_rec_input_clauses): Also handle SIMT part
> for complex/float recution with && and ||.
s/recution/reduction/
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -4389,14 +4389,28 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
> {
> for (tree c = gimple_omp_for_clauses (ctx->stmt); c;
> c = OMP_CLAUSE_CHAIN (c))
> - if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
> - && OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
> - {
> - /* UDR reductions are not supported yet for SIMT, disable
> - SIMT. */
> - sctx->max_vf = 1;
> - break;
> + {
> + if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
> + continue;
> +
> + if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
> + {
> + /* UDR reductions are not supported yet for SIMT, disable
> + SIMT. */
> + sctx->max_vf = 1;
> + break;
> + }
> +
> + if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))
> + && !INTEGRAL_TYPE_P (TREE_TYPE (new_var)))
> + {
> + /* Doing boolean operations on non-boolean types is
> + for conformance only, it's not worth supporting this
> + for SIMT. */
This comment needs to be adjusted to talk about non-integral types.
> + sctx->max_vf = 1;
> + break;
> }
> + }
> }
> if (maybe_gt (sctx->max_vf, 1U))
> {
> @@ -6432,28 +6446,34 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
>
> gimplify_assign (unshare_expr (ivar), x, &llist[0]);
>
> - if (sctx.is_simt)
> - {
> - if (!simt_lane)
> - simt_lane = create_tmp_var (unsigned_type_node);
> - x = build_call_expr_internal_loc
> - (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
> - TREE_TYPE (ivar), 2, ivar, simt_lane);
> - x = build2 (code, TREE_TYPE (ivar), ivar, x);
> - gimplify_assign (ivar, x, &llist[2]);
> - }
> tree ivar2 = ivar;
> tree ref2 = ref;
> + tree zero = NULL_TREE;
> if (is_fp_and_or)
> {
> - tree zero = build_zero_cst (TREE_TYPE (ivar));
> + zero = build_zero_cst (TREE_TYPE (ivar));
> ivar2 = fold_build2_loc (clause_loc, NE_EXPR,
> integer_type_node, ivar,
> zero);
> ref2 = fold_build2_loc (clause_loc, NE_EXPR,
> integer_type_node, ref, zero);
> }
> - x = build2 (code, TREE_TYPE (ref), ref2, ivar2);
> + if (sctx.is_simt)
> + {
> + if (!simt_lane)
> + simt_lane = create_tmp_var (unsigned_type_node);
> + x = build_call_expr_internal_loc
> + (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
> + TREE_TYPE (ivar), 2, ivar, simt_lane);
> + if (is_fp_and_or)
> + x = fold_build2_loc (clause_loc, NE_EXPR,
> + integer_type_node, x, zero);
> + x = build2 (code, TREE_TYPE (ivar2), ivar2, x);
> + if (is_fp_and_or)
> + x = fold_convert (TREE_TYPE (ivar), x);
> + gimplify_assign (ivar, x, &llist[2]);
> + }
> + x = build2 (code, TREE_TYPE (ref2), ref2, ivar2);
> if (is_fp_and_or)
> x = fold_convert (TREE_TYPE (ref), x);
> ref = build_outer_var_ref (var, ctx);
Is this hunk still needed when the first hunk is in?
I mean, this is in code guarded with
is_simd && lower_rec_simd_input_clauses (...) and that function
will return false for if (known_eq (sctx->max_vf, 1U)) which the first hunk
ensures.
So sctx.is_simt && is_fp_and_or shouldn't be true in that code.
Jakub
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||
2021-05-06 14:32 ` Jakub Jelinek
@ 2021-05-07 10:05 ` Tobias Burnus
2021-05-07 10:06 ` Jakub Jelinek
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Tobias Burnus @ 2021-05-07 10:05 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Tom de Vries, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 453 bytes --]
On 06.05.21 16:32, Jakub Jelinek wrote:
> s/recution/reduction/
Fixed.
> This comment needs to be adjusted to talk about non-integral types.
Fixed.
> Is this hunk still needed when the first hunk is in?
No - and now removed.
Updated code attached.
Tobias
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
[-- Attachment #2: red-nvptx-bfly-v4.diff --]
[-- Type: text/x-patch, Size: 12403 bytes --]
OpenMP: Fix SIMT for complex/float reduction with && and ||
2021-05-07 Tobias Burnus <tobias@codesourcery.com>
Tom de Vries <tdevries@suse.de>
gcc/ChangeLog:
* omp-low.c (lower_rec_simd_input_clauses): Set max_vf = 1 if
a truth_value_p reduction variable is nonintegral.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
complex/floating-point || + && reduction with 'omp target'.
* testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.
gcc/omp-low.c | 28 ++-
.../testsuite/libgomp.c-c++-common/reduction-5.c | 193 ++++++++++++++++++++
.../testsuite/libgomp.c-c++-common/reduction-6.c | 196 +++++++++++++++++++++
3 files changed, 410 insertions(+), 7 deletions(-)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 26ceaf74b2d..2325cfcfc34 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4389,14 +4389,28 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
{
for (tree c = gimple_omp_for_clauses (ctx->stmt); c;
c = OMP_CLAUSE_CHAIN (c))
- if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
- && OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
- {
- /* UDR reductions are not supported yet for SIMT, disable
- SIMT. */
- sctx->max_vf = 1;
- break;
+ {
+ if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+ continue;
+
+ if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
+ {
+ /* UDR reductions are not supported yet for SIMT, disable
+ SIMT. */
+ sctx->max_vf = 1;
+ break;
+ }
+
+ if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))
+ && !INTEGRAL_TYPE_P (TREE_TYPE (new_var)))
+ {
+ /* Doing boolean operations on non-integral types is
+ for conformance only, it's not worth supporting this
+ for SIMT. */
+ sctx->max_vf = 1;
+ break;
}
+ }
}
if (maybe_gt (sctx->max_vf, 1U))
{
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
new file mode 100644
index 00000000000..21540512e23
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
@@ -0,0 +1,193 @@
+/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+/* C / C++'s logical AND and OR operators take any scalar argument
+ which compares (un)equal to 0 - the result 1 or 0 and of type int.
+
+ In this testcase, the int result is again converted to a floating-poing
+ or complex type.
+
+ While having a floating-point/complex array element with || and && can make
+ sense, having a non-integer/non-bool reduction variable is odd but valid.
+
+ Test: FP reduction variable + FP array - as reduction-1.c but with target */
+
+#define N 1024
+_Complex float rcf[N];
+_Complex double rcd[N];
+float rf[N];
+double rd[N];
+
+int
+reduction_or ()
+{
+ float orf = 0;
+ double ord = 0;
+ _Complex float orfc = 0;
+ _Complex double ordc = 0;
+
+ #pragma omp target parallel reduction(||: orf) map(orf)
+ for (int i=0; i < N; ++i)
+ orf = orf || rf[i];
+
+ #pragma omp target parallel for reduction(||: ord) map(ord)
+ for (int i=0; i < N; ++i)
+ ord = ord || rcd[i];
+
+ #pragma omp target parallel for simd reduction(||: orfc) map(orfc)
+ for (int i=0; i < N; ++i)
+ orfc = orfc || rcf[i];
+
+ #pragma omp target parallel loop reduction(||: ordc) map(ordc)
+ for (int i=0; i < N; ++i)
+ ordc = ordc || rcd[i];
+
+ return orf + ord + __real__ orfc + __real__ ordc;
+}
+
+int
+reduction_or_teams ()
+{
+ float orf = 0;
+ double ord = 0;
+ _Complex float orfc = 0;
+ _Complex double ordc = 0;
+
+ #pragma omp target teams distribute parallel for reduction(||: orf) map(orf)
+ for (int i=0; i < N; ++i)
+ orf = orf || rf[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(||: ord) map(ord)
+ for (int i=0; i < N; ++i)
+ ord = ord || rcd[i];
+
+ #pragma omp target teams distribute parallel for reduction(||: orfc) map(orfc)
+ for (int i=0; i < N; ++i)
+ orfc = orfc || rcf[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(||: ordc) map(ordc)
+ for (int i=0; i < N; ++i)
+ ordc = ordc || rcd[i];
+
+ return orf + ord + __real__ orfc + __real__ ordc;
+}
+
+int
+reduction_and ()
+{
+ float andf = 1;
+ double andd = 1;
+ _Complex float andfc = 1;
+ _Complex double anddc = 1;
+
+ #pragma omp target parallel reduction(&&: andf) map(andf)
+ for (int i=0; i < N; ++i)
+ andf = andf && rf[i];
+
+ #pragma omp target parallel for reduction(&&: andd) map(andd)
+ for (int i=0; i < N; ++i)
+ andd = andd && rcd[i];
+
+ #pragma omp target parallel for simd reduction(&&: andfc) map(andfc)
+ for (int i=0; i < N; ++i)
+ andfc = andfc && rcf[i];
+
+ #pragma omp target parallel loop reduction(&&: anddc) map(anddc)
+ for (int i=0; i < N; ++i)
+ anddc = anddc && rcd[i];
+
+ return andf + andd + __real__ andfc + __real__ anddc;
+}
+
+int
+reduction_and_teams ()
+{
+ float andf = 1;
+ double andd = 1;
+ _Complex float andfc = 1;
+ _Complex double anddc = 1;
+
+ #pragma omp target teams distribute parallel for reduction(&&: andf) map(andf)
+ for (int i=0; i < N; ++i)
+ andf = andf && rf[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(&&: andd) map(andd)
+ for (int i=0; i < N; ++i)
+ andd = andd && rcd[i];
+
+ #pragma omp target teams distribute parallel for reduction(&&: andfc) map(andfc)
+ for (int i=0; i < N; ++i)
+ andfc = andfc && rcf[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(&&: anddc) map(anddc)
+ for (int i=0; i < N; ++i)
+ anddc = anddc && rcd[i];
+
+ return andf + andd + __real__ andfc + __real__ anddc;
+}
+
+int
+main ()
+{
+ for (int i = 0; i < N; ++i)
+ {
+ rf[i] = 0;
+ rd[i] = 0;
+ rcf[i] = 0;
+ rcd[i] = 0;
+ }
+
+ if (reduction_or () != 0)
+ __builtin_abort ();
+ if (reduction_or_teams () != 0)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ rf[10] = 1.0;
+ rd[15] = 1.0;
+ rcf[10] = 1.0;
+ rcd[15] = 1.0i;
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ for (int i = 0; i < N; ++i)
+ {
+ rf[i] = 1;
+ rd[i] = 1;
+ rcf[i] = 1;
+ rcd[i] = 1;
+ }
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 4)
+ __builtin_abort ();
+ if (reduction_and_teams () != 4)
+ __builtin_abort ();
+
+ rf[10] = 0.0;
+ rd[15] = 0.0;
+ rcf[10] = 0.0;
+ rcd[15] = 0.0;
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
new file mode 100644
index 00000000000..27d9ef6b635
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
@@ -0,0 +1,196 @@
+/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+/* C / C++'s logical AND and OR operators take any scalar argument
+ which compares (un)equal to 0 - the result 1 or 0 and of type int.
+
+ In this testcase, the int result is again converted to an integer complex
+ type.
+
+ While having a floating-point/complex array element with || and && can make
+ sense, having a complex reduction variable is odd but valid.
+
+ Test: int complex reduction variable + int complex array.
+ as reduction-4.c but with target. */
+
+#define N 1024
+_Complex char rcc[N];
+_Complex short rcs[N];
+_Complex int rci[N];
+_Complex long long rcl[N];
+
+int
+reduction_or ()
+{
+ _Complex char orc = 0;
+ _Complex short ors = 0;
+ _Complex int ori = 0;
+ _Complex long orl = 0;
+
+ #pragma omp target parallel reduction(||: orc) map(orc)
+ for (int i=0; i < N; ++i)
+ orc = orc || rcl[i];
+
+ #pragma omp target parallel for reduction(||: ors) map(ors)
+ for (int i=0; i < N; ++i)
+ ors = ors || rci[i];
+
+ #pragma omp target parallel for simd reduction(||: ori) map(ori)
+ for (int i=0; i < N; ++i)
+ ori = ori || rcs[i];
+
+ #pragma omp target parallel loop reduction(||: orl) map(orl)
+ for (int i=0; i < N; ++i)
+ orl = orl || rcc[i];
+
+ return __real__ (orc + ors + ori + orl) + __imag__ (orc + ors + ori + orl);
+}
+
+int
+reduction_or_teams ()
+{
+ _Complex char orc = 0;
+ _Complex short ors = 0;
+ _Complex int ori = 0;
+ _Complex long orl = 0;
+
+ #pragma omp target teams distribute parallel for reduction(||: orc) map(orc)
+ for (int i=0; i < N; ++i)
+ orc = orc || rcc[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(||: ors) map(ors)
+ for (int i=0; i < N; ++i)
+ ors = ors || rcs[i];
+
+ #pragma omp target teams distribute parallel for reduction(||: ori) map(ori)
+ for (int i=0; i < N; ++i)
+ ori = ori || rci[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(||: orl) map(orl)
+ for (int i=0; i < N; ++i)
+ orl = orl || rcl[i];
+
+ return __real__ (orc + ors + ori + orl) + __imag__ (orc + ors + ori + orl);
+}
+
+int
+reduction_and ()
+{
+ _Complex char andc = 1;
+ _Complex short ands = 1;
+ _Complex int andi = 1;
+ _Complex long andl = 1;
+
+ #pragma omp target parallel reduction(&&: andc) map(andc)
+ for (int i=0; i < N; ++i)
+ andc = andc && rcc[i];
+
+ #pragma omp target parallel for reduction(&&: ands) map(ands)
+ for (int i=0; i < N; ++i)
+ ands = ands && rcs[i];
+
+ #pragma omp target parallel for simd reduction(&&: andi) map(andi)
+ for (int i=0; i < N; ++i)
+ andi = andi && rci[i];
+
+ #pragma omp target parallel loop reduction(&&: andl) map(andl)
+ for (int i=0; i < N; ++i)
+ andl = andl && rcl[i];
+
+ return __real__ (andc + ands + andi + andl)
+ + __imag__ (andc + ands + andi + andl);
+}
+
+int
+reduction_and_teams ()
+{
+ _Complex char andc = 1;
+ _Complex short ands = 1;
+ _Complex int andi = 1;
+ _Complex long andl = 1;
+
+ #pragma omp target teams distribute parallel for reduction(&&: andc) map(andc)
+ for (int i=0; i < N; ++i)
+ andc = andc && rcl[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(&&: ands) map(ands)
+ for (int i=0; i < N; ++i)
+ ands = ands && rci[i];
+
+ #pragma omp target teams distribute parallel for reduction(&&: andi) map(andi)
+ for (int i=0; i < N; ++i)
+ andi = andi && rcs[i];
+
+ #pragma omp target teams distribute parallel for simd reduction(&&: andl) map(andl)
+ for (int i=0; i < N; ++i)
+ andl = andl && rcc[i];
+
+ return __real__ (andc + ands + andi + andl)
+ + __imag__ (andc + ands + andi + andl);
+}
+
+int
+main ()
+{
+ for (int i = 0; i < N; ++i)
+ {
+ rcc[i] = 0;
+ rcs[i] = 0;
+ rci[i] = 0;
+ rcl[i] = 0;
+ }
+
+ if (reduction_or () != 0)
+ __builtin_abort ();
+ if (reduction_or_teams () != 0)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ rcc[10] = 1.0;
+ rcs[15] = 1.0i;
+ rci[10] = 1.0;
+ rcl[15] = 1.0i;
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ for (int i = 0; i < N; ++i)
+ {
+ rcc[i] = 1;
+ rcs[i] = 1i;
+ rci[i] = 1;
+ rcl[i] = 1 + 1i;
+ }
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 4)
+ __builtin_abort ();
+ if (reduction_and_teams () != 4)
+ __builtin_abort ();
+
+ rcc[10] = 0.0;
+ rcs[15] = 0.0;
+ rci[10] = 0.0;
+ rcl[15] = 0.0;
+
+ if (reduction_or () != 4)
+ __builtin_abort ();
+ if (reduction_or_teams () != 4)
+ __builtin_abort ();
+ if (reduction_and () != 0)
+ __builtin_abort ();
+ if (reduction_and_teams () != 0)
+ __builtin_abort ();
+
+ return 0;
+}
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||
2021-05-07 10:05 ` Tobias Burnus
@ 2021-05-07 10:06 ` Jakub Jelinek
2021-05-07 10:08 ` Tom de Vries
2021-05-18 11:07 ` Thomas Schwinge
2 siblings, 0 replies; 11+ messages in thread
From: Jakub Jelinek @ 2021-05-07 10:06 UTC (permalink / raw)
To: Tobias Burnus; +Cc: Tom de Vries, gcc-patches
On Fri, May 07, 2021 at 12:05:11PM +0200, Tobias Burnus wrote:
> 2021-05-07 Tobias Burnus <tobias@codesourcery.com>
> Tom de Vries <tdevries@suse.de>
>
> gcc/ChangeLog:
>
> * omp-low.c (lower_rec_simd_input_clauses): Set max_vf = 1 if
> a truth_value_p reduction variable is nonintegral.
>
> libgomp/ChangeLog:
>
> * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
> complex/floating-point || + && reduction with 'omp target'.
> * testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.
>
> gcc/omp-low.c | 28 ++-
> .../testsuite/libgomp.c-c++-common/reduction-5.c | 193 ++++++++++++++++++++
> .../testsuite/libgomp.c-c++-common/reduction-6.c | 196 +++++++++++++++++++++
> 3 files changed, 410 insertions(+), 7 deletions(-)
Ok, thanks.
Jakub
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||
2021-05-07 10:05 ` Tobias Burnus
2021-05-07 10:06 ` Jakub Jelinek
@ 2021-05-07 10:08 ` Tom de Vries
2021-05-18 11:07 ` Thomas Schwinge
2 siblings, 0 replies; 11+ messages in thread
From: Tom de Vries @ 2021-05-07 10:08 UTC (permalink / raw)
To: Tobias Burnus, Jakub Jelinek; +Cc: gcc-patches
On 5/7/21 12:05 PM, Tobias Burnus wrote:
> On 06.05.21 16:32, Jakub Jelinek wrote:
>
>> s/recution/reduction/
> Fixed.
>> This comment needs to be adjusted to talk about non-integral types.
> Fixed.
>> Is this hunk still needed when the first hunk is in?
>
> No - and now removed.
>
> Updated code attached.
>
> libgomp/ChangeLog:
>
> * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
> complex/floating-point || + && reduction with 'omp target'.
> * testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.
5 -> 6.
Otherwise, LGTM.
Thanks,
- Tom
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||
2021-05-07 10:05 ` Tobias Burnus
2021-05-07 10:06 ` Jakub Jelinek
2021-05-07 10:08 ` Tom de Vries
@ 2021-05-18 11:07 ` Thomas Schwinge
2 siblings, 0 replies; 11+ messages in thread
From: Thomas Schwinge @ 2021-05-18 11:07 UTC (permalink / raw)
To: gcc-patches, Tobias Burnus; +Cc: Jakub Jelinek, Tom de Vries
[-- Attachment #1: Type: text/plain, Size: 941 bytes --]
Hi!
On 2021-05-07T12:05:11+0200, Tobias Burnus <tobias@codesourcery.com> wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
> @@ -0,0 +1,193 @@
> +/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
> @@ -0,0 +1,196 @@
> +/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
Causes issues if more than nvptx offloading compilation is enabled. Thus
pushed "'libgomp.c-c++-common/reduction-{5,6}.c': Restrict '-latomic' to
nvptx offloading compilation" to master branch in commit
937fa5fb7840c19c96b1fdf1ce678699649a6c5e, see attached.
Grüße
Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-libgomp.c-c-common-reduction-5-6-.c-Restrict-latomic.patch --]
[-- Type: text/x-diff, Size: 2103 bytes --]
From 937fa5fb7840c19c96b1fdf1ce678699649a6c5e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 17 May 2021 08:05:40 +0200
Subject: [PATCH] 'libgomp.c-c++-common/reduction-{5,6}.c': Restrict '-latomic'
to nvptx offloading compilation
Fix-up for recent commit 33b647956caa977d1ae489f9baed9cef70b4f382
"OpenMP: Fix SIMT for complex/float reduction with && and ||"; see
commit d42088e453042f4f8ba9190a7e29efd937ea2181 "Avoid -latomic for amdgcn
offloading".
libgomp/
* testsuite/libgomp.c-c++-common/reduction-5.c: Restrict
'-latomic' to nvptx offloading compilation.
* testsuite/libgomp.c-c++-common/reduction-6.c: Likewise.
---
libgomp/testsuite/libgomp.c-c++-common/reduction-5.c | 2 +-
libgomp/testsuite/libgomp.c-c++-common/reduction-6.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
index 21540512e23..31fa2670312 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload=nvptx-none=-latomic" { target { offload_target_nvptx } } } */
/* C / C++'s logical AND and OR operators take any scalar argument
which compares (un)equal to 0 - the result 1 or 0 and of type int.
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
index 27d9ef6b635..727e11e4edf 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-6.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-foffload=-latomic" { target { offload_target_nvptx } } } */
+/* { dg-additional-options "-foffload=nvptx-none=-latomic" { target { offload_target_nvptx } } } */
/* C / C++'s logical AND and OR operators take any scalar argument
which compares (un)equal to 0 - the result 1 or 0 and of type int.
--
2.30.2
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2021-05-18 11:07 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-06 10:17 [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and || Tobias Burnus
2021-05-06 10:30 ` Jakub Jelinek
2021-05-06 13:12 ` Tom de Vries
2021-05-06 13:22 ` Jakub Jelinek
2021-05-06 14:05 ` Tom de Vries
2021-05-06 14:21 ` Tobias Burnus
2021-05-06 14:32 ` Jakub Jelinek
2021-05-07 10:05 ` Tobias Burnus
2021-05-07 10:06 ` Jakub Jelinek
2021-05-07 10:08 ` Tom de Vries
2021-05-18 11:07 ` Thomas Schwinge
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).