[PATCH] [og10] openmp: Scale precision of collapsed iteration variable

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] [og10] openmp: Scale precision of collapsed iteration variable
@ 2021-03-01 22:28 Kwok Cheung Yeung
  2021-03-04  9:55 ` Tobias Burnus
  0 siblings, 1 reply; 2+ messages in thread
From: Kwok Cheung Yeung @ 2021-03-01 22:28 UTC (permalink / raw)
  To: Thomas Schwinge, Tobias Burnus; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2043 bytes --]

Hello

When two or more nested loops are collapsed using the OpenMP collapse clause, a 
single iteration variable is used to represent the combined iteration space. In 
the usual case (i.e. statically scheduled, no ordered clause), the type of this 
variable is picked by taking the unsigned version of the largest of the iterator 
types in the loop nest:

          else if (i == 0
                   || TYPE_PRECISION (iter_type)
                      < TYPE_PRECISION (TREE_TYPE (loop->v)))
            iter_type
              = build_nonstandard_integer_type
                  (TYPE_PRECISION (TREE_TYPE (loop->v)), 1);

If needed, the original indices of the collapsed loops are recalculated from the 
combined index. However, this can be problematic if the combined iteration space 
of the collapsed loops is larger than can be represented by the type of the 
largest individual loop iterator type. e.g.

for (int i = 0; i < 80000; i++)
   for (int j = 0; j < 80000; j++)

In this case, the combined iteration space is [0..6,400,000,000), which is 
larger than the [0..4,294,967,296) range of a 32-bit unsigned int.

This patch attempts to avoid this problem by setting the precision of the 
combined iteration variable to the sum of the precision of the collapsed 
iterators, rounded up to the nearest power of 2. This is capped at the size of a 
long long (i.e. 64 bits) to avoid an excessive performance hit. If any of the 
loops use a larger type (e.g. __int128), then that is used instead.

I believe OpenACC suffers from a similar problem, but it uses a different 
code-path and should be dealt with separately. The patch caused regressions in 
some OpenACC tests related to tiling (pr84955-1.c, pr84955.c, tile-1.c, 
pr84955.f90) due to the type of diff_type changing between when it was used to 
define the '.tile' variables in expand_oacc_collapse_init and when the '.tile' 
variables are used in expand_oacc_for. I fixed this by adding a cast to the 
current diff_type when '.tile' is multiplied.

Okay for OG10?

Thanks

Kwok

[-- Attachment #2: 0001-openmp-Scale-type-precision-of-collapsed-iterator-va.patch --]
[-- Type: text/plain, Size: 5699 bytes --]

From df1332b7a1575920c8de17359b2dfcad5404a112 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Mon, 1 Mar 2021 14:15:30 -0800
Subject: [PATCH] openmp: Scale type precision of collapsed iterator variable

This sets the type precision of the collapsed iterator variable to the
sum of the precision of the collapsed loop variables, up to a maximum of
sizeof(long long) (i.e. 64-bits).

2021-03-01  Kwok Cheung Yeung  <kcy@codesourcery.com>

	gcc/
	* omp-expand.c (expand_oacc_for): Convert .tile variable to
	diff_type before multiplying.
	* omp-general.c (omp_extract_for_data): Use accumulated precision
	of all collapsed for-loops as precision of iteration variable, up
	to the precision of a long long.

	libgomp/
	* testsuite/libgomp.c-c++-common/collapse-4.c: New.
	* testsuite/libgomp.fortran/collapse5.f90: New.
---
 gcc/ChangeLog.omp                                  |  8 ++++++
 gcc/omp-expand.c                                   |  5 +++-
 gcc/omp-general.c                                  | 29 +++++++++++++++++-----
 libgomp/ChangeLog.omp                              |  5 ++++
 .../testsuite/libgomp.c-c++-common/collapse-4.c    | 19 ++++++++++++++
 libgomp/testsuite/libgomp.fortran/collapse5.f90    | 14 +++++++++++
 6 files changed, 73 insertions(+), 7 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/collapse-4.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/collapse5.f90

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index a59c25b..374665d 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,11 @@
+2021-03-01  Kwok Cheung Yeung  <kcy@codesourcery.com>
+
+	* omp-expand.c (expand_oacc_for): Convert .tile variable to
+	diff_type before multiplying.
+	* omp-general.c (omp_extract_for_data): Use accumulated precision
+	of all collapsed for-loops as precision of iteration variable, up
+	to the precision of a long long.
+
 2021-02-24  Julian Brown  <julian@codesourcery.com>
 
 	Backport from mainline
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index e4a2f3a..f8347c0 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -7328,7 +7328,10 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd)
       tile_size = create_tmp_var (diff_type, ".tile_size");
       expr = build_int_cst (diff_type, 1);
       for (int ix = 0; ix < fd->collapse; ix++)
-	expr = fold_build2 (MULT_EXPR, diff_type, counts[ix].tile, expr);
+	{
+	  tree tile = fold_convert (diff_type, counts[ix].tile);
+	  expr = fold_build2 (MULT_EXPR, diff_type, tile, expr);
+	}
       expr = force_gimple_operand_gsi (&gsi, expr, true,
 				       NULL_TREE, true, GSI_SAME_STMT);
       ass = gimple_build_assign (tile_size, expr);
diff --git a/gcc/omp-general.c b/gcc/omp-general.c
index 8e5b961..97f94e1 100644
--- a/gcc/omp-general.c
+++ b/gcc/omp-general.c
@@ -356,6 +356,7 @@ omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd,
 	  fd->non_rect = true;
 	}
     }
+  int accum_iter_precision = 0;
   for (i = 0; i < cnt; i++)
     {
       if (i == 0
@@ -438,12 +439,28 @@ omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd,
 	{
 	  if (fd->collapse == 1 && !fd->tiling)
 	    iter_type = TREE_TYPE (loop->v);
-	  else if (i == 0
-		   || TYPE_PRECISION (iter_type)
-		      < TYPE_PRECISION (TREE_TYPE (loop->v)))
-	    iter_type
-	      = build_nonstandard_integer_type
-		  (TYPE_PRECISION (TREE_TYPE (loop->v)), 1);
+	  else
+	    {
+	      int loop_precision = TYPE_PRECISION (TREE_TYPE (loop->v));
+	      int iter_type_precision = 0;
+	      const int max_accum_precision
+		= TYPE_PRECISION (long_long_unsigned_type_node);
+
+	      accum_iter_precision += loop_precision;
+
+	      if (i == 0
+		  || (loop_precision >= max_accum_precision
+		      && loop_precision >= TYPE_PRECISION (iter_type)))
+		iter_type_precision = loop_precision;
+	      else if (TYPE_PRECISION (iter_type) < max_accum_precision)
+		iter_type_precision
+		  = MIN (1 << ceil_log2 (accum_iter_precision),
+			 max_accum_precision);
+
+	      if (iter_type_precision)
+		iter_type = build_nonstandard_integer_type
+			      (iter_type_precision, 1);
+	    }
 	}
       else if (iter_type != long_long_unsigned_type_node)
 	{
diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index d1dcf20..0e3fd12 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,8 @@
+2021-03-01  Kwok Cheung Yeung  <kcy@codesourcery.com>
+
+	* testsuite/libgomp.c-c++-common/collapse-4.c: New.
+	* testsuite/libgomp.fortran/collapse5.f90: New.
+
 2021-02-24  Julian Brown  <julian@codesourcery.com>
 
 	Backport from mainline
diff --git a/libgomp/testsuite/libgomp.c-c++-common/collapse-4.c b/libgomp/testsuite/libgomp.c-c++-common/collapse-4.c
new file mode 100644
index 0000000..04cf747
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/collapse-4.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+int
+main (void)
+{
+  int i, j;
+  int count = 0;
+
+  #pragma omp parallel for collapse(2)
+    for (i = 0; i < 80000; i++)
+      for (j = 0; j < 80000; j++)
+	if (i == 66666 && j == 77777)
+	  count++;
+
+  if (count != 1)
+    abort ();
+}
diff --git a/libgomp/testsuite/libgomp.fortran/collapse5.f90 b/libgomp/testsuite/libgomp.fortran/collapse5.f90
new file mode 100644
index 0000000..c76424d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/collapse5.f90
@@ -0,0 +1,14 @@
+! { dg-do run }
+
+  integer :: i, j
+  integer :: count = 0
+
+  !$omp parallel do collapse (2)
+    do i = 0, 80000
+      do j = 0, 80000
+        if (i .eq. 66666 .and. j .eq. 77777) count = count + 1
+      end do
+    end do
+
+  if (count .ne. 1) stop 1
+end
-- 
2.8.1


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] [og10] openmp: Scale precision of collapsed iteration variable
  2021-03-01 22:28 [PATCH] [og10] openmp: Scale precision of collapsed iteration variable Kwok Cheung Yeung
@ 2021-03-04  9:55 ` Tobias Burnus
  0 siblings, 0 replies; 2+ messages in thread
From: Tobias Burnus @ 2021-03-04  9:55 UTC (permalink / raw)
  To: Kwok Cheung Yeung, Thomas Schwinge, Tobias Burnus; +Cc: GCC Patches

Hello Kwok,

On 01.03.21 23:28, Kwok Cheung Yeung wrote:
> When two or more nested loops are collapsed using the OpenMP collapse
> clause, a single iteration variable is used to represent the combined
> iteration space. In the usual case (i.e. statically scheduled, no
> ordered clause), the type of this variable is picked by taking the
> unsigned version of the largest of the iterator types in the loop nest
> [...] However, this can be problematic if the combined iteration space
> of the collapsed loops is larger than can be represented by the type
> of the largest individual loop iterator type.

Unfortunately, we are deeply in the implementation-defined realm.
I have now filled an OpenMP spec issue to make it possible to
avoid/mitigate those issues as user: OpenMP Issue 2693.
Namely, a way to specify the used integer type/precision
and/or refine what the user can expect.

In any case, the following applies currently:

"The iterations of some number of associated loops can be
  collapsed into one larger iteration space that is called
  the logical iteration space. The particular integer type
  used to compute the iteration count for the collapsed
  loop is implementation defined."
(2.11.1 Canonical Loop Nest Form)

Hence, both is sensible – the current behavior of GCC
and the one you have proposed and implemented:

> This patch attempts to avoid this problem by setting the precision of
> the combined iteration variable to the sum of the precision of the
> collapsed iterators, rounded up to the nearest power of 2. This is
> capped at the size of a long long (i.e. 64 bits) to avoid an excessive
> performance hit. If any of the loops use a larger type (e.g.
> __int128), then that is used instead.
* * *
> I believe OpenACC suffers from a similar problem, but it uses a
> different code-path and should be dealt with separately.
Regarding OpenACC, I think diff_type → expand_oacc_collapse_init,
expand_oacc_collapse_vars would be the equivalent. In the spec, either a
note similar to OpenMP is missing or I have just not found it.
> Okay for OG10?

I think that's fine for the OG10 branch for now. Another question is how
to handle it best on mainline, which is then also Jakub's call. And also
how the OpenMP issue will be resolved.

Can you add a comment to the two testcases that the internal variable,
used to calculate the iteration count, needs a higher precision than
(32bit) 'int'? Otherwise it is not obvious.

Tobias
> +         {
> +           int loop_precision = TYPE_PRECISION (TREE_TYPE (loop->v));
> +           int iter_type_precision = 0;
> +           const int max_accum_precision
> +             = TYPE_PRECISION (long_long_unsigned_type_node);
> +
> +           accum_iter_precision += loop_precision;
> +
> +           if (i == 0
> +               || (loop_precision >= max_accum_precision
> +                   && loop_precision >= TYPE_PRECISION (iter_type)))
> +             iter_type_precision = loop_precision;
> +           else if (TYPE_PRECISION (iter_type) < max_accum_precision)
> +             iter_type_precision
> +               = MIN (1 << ceil_log2 (accum_iter_precision),
> +                      max_accum_precision);
> +
> +           if (iter_type_precision)
> +             iter_type = build_nonstandard_integer_type
> +                           (iter_type_precision, 1);
> +         }
>       }
>         else if (iter_type != long_long_unsigned_type_node)
>       {
> diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
> index d1dcf20..0e3fd12 100644
> --- a/libgomp/ChangeLog.omp
> +++ b/libgomp/ChangeLog.omp
> @@ -1,3 +1,8 @@
> +2021-03-01  Kwok Cheung Yeung<kcy@codesourcery.com>
> +
> +     * testsuite/libgomp.c-c++-common/collapse-4.c: New.
> +     * testsuite/libgomp.fortran/collapse5.f90: New.
> +
>   2021-02-24  Julian Brown<julian@codesourcery.com>
>
>       Backport from mainline
> diff --git a/libgomp/testsuite/libgomp.c-c++-common/collapse-4.c b/libgomp/testsuite/libgomp.c-c++-common/collapse-4.c
> new file mode 100644
> index 0000000..04cf747
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/collapse-4.c
> @@ -0,0 +1,19 @@
> +/* { dg-do run } */
> +
> +#include <stdlib.h>
> +
> +int
> +main (void)
> +{
> +  int i, j;
> +  int count = 0;
> +
> +  #pragma omp parallel for collapse(2)
> +    for (i = 0; i < 80000; i++)
> +      for (j = 0; j < 80000; j++)
> +     if (i == 66666 && j == 77777)
> +       count++;
> +
> +  if (count != 1)
> +    abort ();
> +}
> diff --git a/libgomp/testsuite/libgomp.fortran/collapse5.f90 b/libgomp/testsuite/libgomp.fortran/collapse5.f90
> new file mode 100644
> index 0000000..c76424d
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/collapse5.f90
> @@ -0,0 +1,14 @@
> +! { dg-do run }
> +
> +  integer :: i, j
> +  integer :: count = 0
> +
> +  !$omp parallel do collapse (2)
> +    do i = 0, 80000
> +      do j = 0, 80000
> +        if (i .eq. 66666 .and. j .eq. 77777) count = count + 1
> +      end do
> +    end do
> +
> +  if (count .ne. 1) stop 1
> +end
> -- 2.8.1
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-03-04  9:55 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-01 22:28 [PATCH] [og10] openmp: Scale precision of collapsed iteration variable Kwok Cheung Yeung
2021-03-04  9:55 ` Tobias Burnus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).