public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [gomp4] fix atomic tests
@ 2016-01-22 21:59 ` Nathan Sidwell
  2016-04-29 14:01   ` [Openacc] Adjust automatic loop partitioning Nathan Sidwell
  0 siblings, 1 reply; 8+ messages in thread
From: Nathan Sidwell @ 2016-01-22 21:59 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 312 bytes --]

These two tests presumed a particular ordering of atomicc operation execution, 
which is kind of anethema to why  you'd want atomic ops and parallelizing loops. 
  I've removed the more obviously incorrect assumptions, but I have a suspicion 
the fortran one at least still contains undefined behaviour.

nathan

[-- Attachment #2: gomp4-atomic.patch --]
[-- Type: text/x-patch, Size: 3183 bytes --]

2016-01-22  Nathan Sidwell  <nathan@codesourcery.com>

	* testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c: Don't
	assume atomic op ordering.
	* testsuite/libgomp.oacc-fortran/atomic_capture-1.f90: Likewise.

Index: libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c	(revision 232738)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c	(working copy)
@@ -783,31 +783,6 @@ main(int argc, char **argv)
   fgot = 1.0;
   fexp = 0.0;
 
-#pragma acc data copy (fgot, fdata[0:N])
-  {
-#pragma acc parallel loop
-    for (i = 0; i < N; i++)
-      {
-        float expr = 32.0;
-
-#pragma acc atomic capture
-        fdata[i] = fgot = expr - fgot;
-      }
-  }
-
-  for (i = 0; i < N; i++)
-    if (i % 2 == 0)
-      {
-	if (fdata[i] != 31.0)
-	  abort ();
-      }
-    else
-      {
-	if (fdata[i] != 1.0)
-	  abort ();
-      }
-
-
   /* BINOP = / */
   fexp = 1.0;
   fgot = 8192.0*8192.0*64.0;
Index: libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90
===================================================================
--- libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90	(revision 232738)
+++ libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90	(working copy)
@@ -257,7 +257,6 @@ program main
     end do
   !$acc end parallel loop
 
-  if (itmp /= iexp - 1) call abort
   if (igot /= iexp) call abort
 
   igot = N
@@ -272,7 +271,6 @@ program main
     end do
   !$acc end parallel loop
 
-  if (itmp /= iexp) call abort
   if (igot /= iexp) call abort
 
   igot = -1
@@ -288,7 +286,6 @@ program main
     end do
   !$acc end parallel loop
 
-  if (itmp /= ibset (iexp, N - 1)) call abort
   if (igot /= iexp) call abort
 
   igot = 0
@@ -304,7 +301,6 @@ program main
     end do
   !$acc end parallel loop
 
-  if (itmp /= ieor (iexp, lshift (1, N - 1))) call abort
   if (igot /= iexp) call abort
 
   igot = -1
@@ -320,7 +316,6 @@ program main
     end do
   !$acc end parallel loop
 
-  if (itmp /= ior (iexp, lshift (1, N - 1))) call abort
   if (igot /= iexp) call abort
 
   igot = 1
@@ -335,7 +330,6 @@ program main
     end do
   !$acc end parallel loop
 
-  if (itmp /= iexp - 1) call abort
   if (igot /= iexp) call abort
 
   igot = N
@@ -350,7 +344,6 @@ program main
     end do
   !$acc end parallel loop
 
-  if (itmp /= iexp) call abort
   if (igot /= iexp) call abort
 
   igot = -1
@@ -366,7 +359,6 @@ program main
     end do
   !$acc end parallel loop
 
-  if (itmp /= ibset (iexp, N - 1)) call abort
   if (igot /= iexp) call abort
 
   igot = 0
@@ -382,7 +374,6 @@ program main
     end do
   !$acc end parallel loop
 
-  if (itmp /= ieor (iexp, lshift (1, N - 1))) call abort
   if (igot /= iexp) call abort
 
   igot = -1
@@ -398,7 +389,6 @@ program main
     end do
   !$acc end parallel loop
 
-  if (itmp /= ior (iexp, lshift (1, N - 1))) call abort
   if (igot /= iexp) call abort
 
   fgot = 1234.0
@@ -525,7 +515,6 @@ program main
     end do
   !$acc end parallel loop
 
-  if (ftmp /= fexp) call abort
   if (fgot /= fexp) call abort
 
   fgot = 1.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [gomp4] gang partitioning
@ 2016-01-22 22:11 Nathan Sidwell
  2016-01-22 21:59 ` [gomp4] fix atomic tests Nathan Sidwell
  0 siblings, 1 reply; 8+ messages in thread
From: Nathan Sidwell @ 2016-01-22 22:11 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 492 bytes --]

I've committed this patch to gomp4 branch.  It changes the auto partitioning 
logic to allocate the outermost loop to the outermost available partitioning. 
For instance, gang partitioning will be used for the outermost loop of a 
parallel region.   Innermost loops remain partitioned at the  innermost 
available level.

This means that if we run out of available partitions, we've parallelized the 
outer loop and the innermost loops, rather than just parallelized the inner loops.

nathan

[-- Attachment #2: gomp4-auto-gang.patch --]
[-- Type: text/x-patch, Size: 7954 bytes --]

2016-01-22  Nathan Sidwell  <nathan@codesourcery.com>

	gcc/
	* omp-low.c (struct oacc_loop): Add 'inner' field.
	(new_oacc_loop_raw): Initialize it to zero.
	(oacc_loop_fixed_partitions): Initialize it.
	(oacc_loop_auto_partitions): Partition outermost loop to outermost
	available partitioning.

	gcc/testsuite/
	* c-c++-common/goacc/loop-auto-1.c: Adjust expected warnings.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust
	expected partitioning.

Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c	(revision 232749)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c	(working copy)
@@ -102,9 +102,11 @@ int vector_1 (int *ary, int size)
   
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
   {
+#pragma acc loop gang
+    for (int jx = 0; jx < 1; jx++)
 #pragma acc loop auto
-    for (int ix = 0; ix < size; ix++)
-      ary[ix] = place ();
+      for (int ix = 0; ix < size; ix++)
+	ary[ix] = place ();
   }
 
   return check (ary, size, 0, 0, 1);
@@ -117,7 +119,7 @@ int vector_2 (int *ary, int size)
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
   {
 #pragma acc loop worker
-    for (int jx = 0; jx <  size  / 64; jx++)
+    for (int jx = 0; jx < size  / 64; jx++)
 #pragma acc loop auto
       for (int ix = 0; ix < 64; ix++)
 	ary[ix + jx * 64] = place ();
@@ -132,30 +134,16 @@ int worker_1 (int *ary, int size)
   
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
   {
+#pragma acc loop gang
+    for (int kx = 0; kx < 1; kx++)
 #pragma acc loop auto
-    for (int jx = 0; jx <  size  / 64; jx++)
+      for (int jx = 0; jx <  size  / 64; jx++)
 #pragma acc loop vector
-      for (int ix = 0; ix < 64; ix++)
-	ary[ix + jx * 64] = place ();
-  }
-
-  return check (ary, size, 0, 1, 1);
-}
-
-int worker_2 (int *ary, int size)
-{
-  clear (ary, size);
-  
-#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
-  {
-#pragma acc loop auto
-    for (int jx = 0; jx <  size  / 64; jx++)
-#pragma acc loop auto
-      for (int ix = 0; ix < 64; ix++)
-	ary[ix + jx * 64] = place ();
+	for (int ix = 0; ix < 64; ix++)
+	  ary[ix + jx * 64] = place ();
   }
 
-  return check (ary, size, 0, 1, 1);
+  return check (ary, size, 0,  1, 1);
 }
 
 int gang_1 (int *ary, int size)
@@ -192,6 +180,22 @@ int gang_2 (int *ary, int size)
   return check (ary, size, 1, 1, 1);
 }
 
+int gang_3 (int *ary, int size)
+{
+  clear (ary, size);
+  
+#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
+  {
+#pragma acc loop auto
+    for (int jx = 0; jx <  size  / 64; jx++)
+#pragma acc loop auto
+      for (int ix = 0; ix < 64; ix++)
+	ary[ix + jx * 64] = place ();
+  }
+
+  return check (ary, size, 1, 0, 1);
+}
+
 #define N (32*32*32)
 int main ()
 {
@@ -213,13 +217,13 @@ int main ()
 
   if (worker_1 (ary,  N))
     return 1;
-  if (worker_2 (ary,  N))
-    return 1;
   
   if (gang_1 (ary,  N))
     return 1;
   if (gang_2 (ary,  N))
     return 1;
+  if (gang_3 (ary,  N))
+    return 1;
 
   return 0;
 }
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	(revision 232749)
+++ gcc/omp-low.c	(working copy)
@@ -249,8 +249,9 @@ struct oacc_loop
   tree routine;  /* Pseudo-loop enclosing a routine.  */
 
   unsigned mask;   /* Partitioning mask.  */
-  unsigned flags;   /* Partitioning flags.  */
-  tree chunk_size;   /* Chunk size.  */
+  unsigned inner;  /* Partitioning of inner loops.  */
+  unsigned flags;  /* Partitioning flags.  */
+  tree chunk_size; /* Chunk size.  */
   gcall *head_end; /* Final marker of head sequence.  */
 };
 
@@ -19434,7 +19435,7 @@ new_oacc_loop_raw (oacc_loop *parent, lo
   memset (loop->tails, 0, sizeof (loop->tails));
   loop->routine = NULL_TREE;
 
-  loop->mask = loop->flags = 0;
+  loop->mask = loop->flags = loop->inner = 0;
   loop->chunk_size = 0;
   loop->head_end = NULL;
 
@@ -19941,8 +19942,11 @@ oacc_loop_fixed_partitions (oacc_loop *l
   mask_all |= this_mask;
   
   if (loop->child)
-    mask_all |= oacc_loop_fixed_partitions (loop->child,
-					    outer_mask | this_mask);
+    {
+      loop->inner = oacc_loop_fixed_partitions (loop->child,
+						outer_mask | this_mask); 
+      mask_all |= loop->inner;
+    }
 
   if (loop->sibling)
     mask_all |= oacc_loop_fixed_partitions (loop->sibling, outer_mask);
@@ -19958,7 +19962,7 @@ oacc_loop_fixed_partitions (oacc_loop *l
 static unsigned
 oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask)
 {
-  unsigned inner_mask = 0;
+  bool assign = (loop->flags & OLF_AUTO) && (loop->flags & OLF_INDEPENDENT);
   bool noisy = true;
 
 #ifdef ACCEL_COMPILER
@@ -19967,16 +19971,33 @@ oacc_loop_auto_partitions (oacc_loop *lo
   noisy = false;
 #endif
 
+  if (assign && outer_mask < GOMP_DIM_MASK (GOMP_DIM_MAX - 1))
+    {
+      /* Allocate the outermost loop at the outermost available
+	 level.  */
+      unsigned this_mask = outer_mask + 1;
+
+      if (!(this_mask & loop->inner))
+	loop->mask = this_mask;
+    }
+
   if (loop->child)
-    inner_mask |= oacc_loop_auto_partitions (loop->child,
-					     outer_mask | loop->mask);
+    {
+      unsigned child_mask = outer_mask | loop->mask;
+
+      if (loop->mask || assign)
+	child_mask |= GOMP_DIM_MASK (GOMP_DIM_MAX);
 
-  if ((loop->flags & OLF_AUTO) && (loop->flags & OLF_INDEPENDENT))
+      loop->inner = oacc_loop_auto_partitions (loop->child, child_mask);
+    }
+
+  if (assign && !loop->mask)
     {
+      /* Allocate the loop at the innermost available level.  */
       unsigned this_mask = 0;
       
       /* Determine the outermost partitioning used within this loop. */
-      this_mask = inner_mask | GOMP_DIM_MASK (GOMP_DIM_MAX);
+      this_mask = loop->inner | GOMP_DIM_MASK (GOMP_DIM_MAX);
       this_mask = (this_mask & -this_mask);
 
       /* Pick the partitioning just inside that one.  */
@@ -19989,17 +20010,20 @@ oacc_loop_auto_partitions (oacc_loop *lo
 	warning_at (loop->loc, 0,
 		    "insufficient partitioning available to parallelize loop");
 
-      if (dump_file)
-	fprintf (dump_file, "Auto loop %s:%d assigned %d\n",
-		 LOCATION_FILE (loop->loc), LOCATION_LINE (loop->loc),
-		 this_mask);
-
       loop->mask = this_mask;
     }
-  inner_mask |= loop->mask;
+
+  if (assign && dump_file)
+    fprintf (dump_file, "Auto loop %s:%d assigned %d\n",
+	     LOCATION_FILE (loop->loc), LOCATION_LINE (loop->loc),
+	     loop->mask);
+
+  unsigned inner_mask = 0;
   
   if (loop->sibling)
     inner_mask |= oacc_loop_auto_partitions (loop->sibling, outer_mask);
+  
+  inner_mask |= loop->inner | loop->mask;
 
   return inner_mask;
 }
Index: gcc/testsuite/c-c++-common/goacc/loop-auto-1.c
===================================================================
--- gcc/testsuite/c-c++-common/goacc/loop-auto-1.c	(revision 232749)
+++ gcc/testsuite/c-c++-common/goacc/loop-auto-1.c	(working copy)
@@ -186,10 +186,10 @@ void Worker (void)
 	for (int jx = 0; jx < 10; jx++) {}
       }
 
-#pragma acc loop auto /* { dg-warning "insufficient partitioning" } */
+#pragma acc loop auto
     for (int ix = 0; ix < 10; ix++)
       {
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-warning "insufficient partitioning" } */
 	for (int jx = 0; jx < 10; jx++)
 	  {
 #pragma acc loop auto
@@ -214,10 +214,10 @@ void Vector (void)
 #pragma acc loop auto
     for (int ix = 0; ix < 10; ix++) {}
 
-#pragma acc loop auto /* { dg-warning "insufficient partitioning" } */
+#pragma acc loop auto
     for (int ix = 0; ix < 10; ix++)
       {
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-warning "insufficient partitioning" } */
 	for (int jx = 0; jx < 10; jx++) {}
       }
 }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Openacc] Adjust automatic loop partitioning
@ 2016-04-29 14:01   ` Nathan Sidwell
  2016-05-02  7:15     ` Jakub Jelinek
  2016-05-03 10:35     ` Thomas Schwinge
  0 siblings, 2 replies; 8+ messages in thread
From: Nathan Sidwell @ 2016-04-29 14:01 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 571 bytes --]

Jakub,
currently automatic loop partitioning assigns from the innermost loop outwards 
-- that was the simplest thing to implement.  A better algorithm is to assign 
the outermost loop to the outermost available axis, and then assign from the 
innermost loop outwards.   That way we (generally) get gang partitioning on the 
outermost loop.  Just inside that we'll get non-partitioned loops if the nest is 
too deep, and the two innermost nested loops will get worker and vector 
partitioning.

This patch has been on the gomp4 branch for a while.  ok for trunk?

nathan

[-- Attachment #2: trunk-gang.patch --]
[-- Type: text/x-patch, Size: 7883 bytes --]

2016-04-29  Nathan Sidwell  <nathan@codesourcery.com>

	gcc/
	* omp-low.c (struct oacc_loop): Add 'inner' field.
	(new_oacc_loop_raw): Initialize it to zero.
	(oacc_loop_fixed_partitions): Initialize it.
	(oacc_loop_auto_partitions): Partition outermost loop to outermost
	available partitioning.

	gcc/testsuite/
	* c-c++-common/goacc/loop-auto-1.c: Adjust expected warnings.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust
	expected partitioning.

Index: libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c	(revision 235511)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c	(working copy)
@@ -103,9 +103,11 @@ int vector_1 (int *ary, int size)
   
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
   {
+#pragma acc loop gang
+    for (int jx = 0; jx < 1; jx++)
 #pragma acc loop auto
-    for (int ix = 0; ix < size; ix++)
-      ary[ix] = place ();
+      for (int ix = 0; ix < size; ix++)
+	ary[ix] = place ();
   }
 
   return check (ary, size, 0, 0, 1);
@@ -118,7 +120,7 @@ int vector_2 (int *ary, int size)
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
   {
 #pragma acc loop worker
-    for (int jx = 0; jx <  size  / 64; jx++)
+    for (int jx = 0; jx < size  / 64; jx++)
 #pragma acc loop auto
       for (int ix = 0; ix < 64; ix++)
 	ary[ix + jx * 64] = place ();
@@ -133,30 +135,16 @@ int worker_1 (int *ary, int size)
   
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
   {
+#pragma acc loop gang
+    for (int kx = 0; kx < 1; kx++)
 #pragma acc loop auto
-    for (int jx = 0; jx <  size  / 64; jx++)
+      for (int jx = 0; jx <  size  / 64; jx++)
 #pragma acc loop vector
-      for (int ix = 0; ix < 64; ix++)
-	ary[ix + jx * 64] = place ();
-  }
-
-  return check (ary, size, 0, 1, 1);
-}
-
-int worker_2 (int *ary, int size)
-{
-  clear (ary, size);
-  
-#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
-  {
-#pragma acc loop auto
-    for (int jx = 0; jx <  size  / 64; jx++)
-#pragma acc loop auto
-      for (int ix = 0; ix < 64; ix++)
-	ary[ix + jx * 64] = place ();
+	for (int ix = 0; ix < 64; ix++)
+	  ary[ix + jx * 64] = place ();
   }
 
-  return check (ary, size, 0, 1, 1);
+  return check (ary, size, 0,  1, 1);
 }
 
 int gang_1 (int *ary, int size)
@@ -193,6 +181,22 @@ int gang_2 (int *ary, int size)
   return check (ary, size, 1, 1, 1);
 }
 
+int gang_3 (int *ary, int size)
+{
+  clear (ary, size);
+  
+#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
+  {
+#pragma acc loop auto
+    for (int jx = 0; jx <  size  / 64; jx++)
+#pragma acc loop auto
+      for (int ix = 0; ix < 64; ix++)
+	ary[ix + jx * 64] = place ();
+  }
+
+  return check (ary, size, 1, 0, 1);
+}
+
 #define N (32*32*32)
 int main ()
 {
@@ -214,13 +218,13 @@ int main ()
 
   if (worker_1 (ary,  N))
     return 1;
-  if (worker_2 (ary,  N))
-    return 1;
   
   if (gang_1 (ary,  N))
     return 1;
   if (gang_2 (ary,  N))
     return 1;
+  if (gang_3 (ary,  N))
+    return 1;
 
   return 0;
 }
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	(revision 235511)
+++ gcc/omp-low.c	(working copy)
@@ -241,6 +241,7 @@ struct oacc_loop
   tree routine;  /* Pseudo-loop enclosing a routine.  */
 
   unsigned mask;   /* Partitioning mask.  */
+  unsigned inner;  /* Partitioning of inner loops.  */
   unsigned flags;  /* Partitioning flags.  */
   unsigned ifns;   /* Contained loop abstraction functions.  */
   tree chunk_size; /* Chunk size.  */
@@ -18921,7 +18922,7 @@ new_oacc_loop_raw (oacc_loop *parent, lo
   memset (loop->tails, 0, sizeof (loop->tails));
   loop->routine = NULL_TREE;
 
-  loop->mask = loop->flags = 0;
+  loop->mask = loop->flags = loop->inner = 0;
   loop->ifns = 0;
   loop->chunk_size = 0;
   loop->head_end = NULL;
@@ -19449,8 +19450,11 @@ oacc_loop_fixed_partitions (oacc_loop *l
   mask_all |= this_mask;
   
   if (loop->child)
-    mask_all |= oacc_loop_fixed_partitions (loop->child,
-					    outer_mask | this_mask);
+    {
+      loop->inner = oacc_loop_fixed_partitions (loop->child,
+						outer_mask | this_mask); 
+      mask_all |= loop->inner;
+    }
 
   if (loop->sibling)
     mask_all |= oacc_loop_fixed_partitions (loop->sibling, outer_mask);
@@ -19466,7 +19470,7 @@ oacc_loop_fixed_partitions (oacc_loop *l
 static unsigned
 oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask)
 {
-  unsigned inner_mask = 0;
+  bool assign = (loop->flags & OLF_AUTO) && (loop->flags & OLF_INDEPENDENT);
   bool noisy = true;
 
 #ifdef ACCEL_COMPILER
@@ -19475,16 +19479,33 @@ oacc_loop_auto_partitions (oacc_loop *lo
   noisy = false;
 #endif
 
+  if (assign && outer_mask < GOMP_DIM_MASK (GOMP_DIM_MAX - 1))
+    {
+      /* Allocate the outermost loop at the outermost available
+	 level.  */
+      unsigned this_mask = outer_mask + 1;
+
+      if (!(this_mask & loop->inner))
+	loop->mask = this_mask;
+    }
+
   if (loop->child)
-    inner_mask |= oacc_loop_auto_partitions (loop->child,
-					     outer_mask | loop->mask);
+    {
+      unsigned child_mask = outer_mask | loop->mask;
+
+      if (loop->mask || assign)
+	child_mask |= GOMP_DIM_MASK (GOMP_DIM_MAX);
 
-  if ((loop->flags & OLF_AUTO) && (loop->flags & OLF_INDEPENDENT))
+      loop->inner = oacc_loop_auto_partitions (loop->child, child_mask);
+    }
+
+  if (assign && !loop->mask)
     {
+      /* Allocate the loop at the innermost available level.  */
       unsigned this_mask = 0;
       
       /* Determine the outermost partitioning used within this loop. */
-      this_mask = inner_mask | GOMP_DIM_MASK (GOMP_DIM_MAX);
+      this_mask = loop->inner | GOMP_DIM_MASK (GOMP_DIM_MAX);
       this_mask = (this_mask & -this_mask);
 
       /* Pick the partitioning just inside that one.  */
@@ -19497,17 +19518,20 @@ oacc_loop_auto_partitions (oacc_loop *lo
 	warning_at (loop->loc, 0,
 		    "insufficient partitioning available to parallelize loop");
 
-      if (dump_file)
-	fprintf (dump_file, "Auto loop %s:%d assigned %d\n",
-		 LOCATION_FILE (loop->loc), LOCATION_LINE (loop->loc),
-		 this_mask);
-
       loop->mask = this_mask;
     }
-  inner_mask |= loop->mask;
+
+  if (assign && dump_file)
+    fprintf (dump_file, "Auto loop %s:%d assigned %d\n",
+	     LOCATION_FILE (loop->loc), LOCATION_LINE (loop->loc),
+	     loop->mask);
+
+  unsigned inner_mask = 0;
   
   if (loop->sibling)
     inner_mask |= oacc_loop_auto_partitions (loop->sibling, outer_mask);
+  
+  inner_mask |= loop->inner | loop->mask;
 
   return inner_mask;
 }
Index: gcc/testsuite/c-c++-common/goacc/loop-auto-1.c
===================================================================
--- gcc/testsuite/c-c++-common/goacc/loop-auto-1.c	(revision 235511)
+++ gcc/testsuite/c-c++-common/goacc/loop-auto-1.c	(working copy)
@@ -186,10 +186,10 @@ void Worker (void)
 	for (int jx = 0; jx < 10; jx++) {}
       }
 
-#pragma acc loop auto /* { dg-warning "insufficient partitioning" } */
+#pragma acc loop auto
     for (int ix = 0; ix < 10; ix++)
       {
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-warning "insufficient partitioning" } */
 	for (int jx = 0; jx < 10; jx++)
 	  {
 #pragma acc loop auto
@@ -214,10 +214,10 @@ void Vector (void)
 #pragma acc loop auto
     for (int ix = 0; ix < 10; ix++) {}
 
-#pragma acc loop auto /* { dg-warning "insufficient partitioning" } */
+#pragma acc loop auto
     for (int ix = 0; ix < 10; ix++)
       {
-#pragma acc loop auto
+#pragma acc loop auto /* { dg-warning "insufficient partitioning" } */
 	for (int jx = 0; jx < 10; jx++) {}
       }
 }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Openacc] Adjust automatic loop partitioning
  2016-04-29 14:01   ` [Openacc] Adjust automatic loop partitioning Nathan Sidwell
@ 2016-05-02  7:15     ` Jakub Jelinek
  2016-05-03 10:35     ` Thomas Schwinge
  1 sibling, 0 replies; 8+ messages in thread
From: Jakub Jelinek @ 2016-05-02  7:15 UTC (permalink / raw)
  To: Nathan Sidwell; +Cc: GCC Patches

On Fri, Apr 29, 2016 at 10:00:43AM -0400, Nathan Sidwell wrote:
> Jakub,
> currently automatic loop partitioning assigns from the innermost loop
> outwards -- that was the simplest thing to implement.  A better algorithm is
> to assign the outermost loop to the outermost available axis, and then
> assign from the innermost loop outwards.   That way we (generally) get gang
> partitioning on the outermost loop.  Just inside that we'll get
> non-partitioned loops if the nest is too deep, and the two innermost nested
> loops will get worker and vector partitioning.
> 
> This patch has been on the gomp4 branch for a while.  ok for trunk?
> 
> nathan

> 2016-04-29  Nathan Sidwell  <nathan@codesourcery.com>
> 
> 	gcc/
> 	* omp-low.c (struct oacc_loop): Add 'inner' field.
> 	(new_oacc_loop_raw): Initialize it to zero.
> 	(oacc_loop_fixed_partitions): Initialize it.
> 	(oacc_loop_auto_partitions): Partition outermost loop to outermost
> 	available partitioning.
> 
> 	gcc/testsuite/
> 	* c-c++-common/goacc/loop-auto-1.c: Adjust expected warnings.
> 
> 	libgomp/
> 	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust
> 	expected partitioning.

Ok.

	Jakub

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Openacc] Adjust automatic loop partitioning
  2016-04-29 14:01   ` [Openacc] Adjust automatic loop partitioning Nathan Sidwell
  2016-05-02  7:15     ` Jakub Jelinek
@ 2016-05-03 10:35     ` Thomas Schwinge
  2016-05-04 17:25       ` [PATCH] tail merge ICE Nathan Sidwell
  1 sibling, 1 reply; 8+ messages in thread
From: Thomas Schwinge @ 2016-05-03 10:35 UTC (permalink / raw)
  To: Nathan Sidwell; +Cc: GCC Patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 4922 bytes --]

Hi Nathan!

On Fri, 29 Apr 2016 10:00:43 -0400, Nathan Sidwell <nathan@acm.org> wrote:
> currently automatic loop partitioning assigns from the innermost loop outwards 
> -- that was the simplest thing to implement.  A better algorithm is to assign 
> the outermost loop to the outermost available axis, and then assign from the 
> innermost loop outwards.   That way we (generally) get gang partitioning on the 
> outermost loop.  Just inside that we'll get non-partitioned loops if the nest is 
> too deep, and the two innermost nested loops will get worker and vector 
> partitioning.

> 	gcc/
> 	* omp-low.c (struct oacc_loop): Add 'inner' field.
> 	(new_oacc_loop_raw): Initialize it to zero.
> 	(oacc_loop_fixed_partitions): Initialize it.
> 	(oacc_loop_auto_partitions): Partition outermost loop to outermost
> 	available partitioning.

I'm now observing the sporadic failures (that you had mentioned before)
of libgomp.oacc-c-c++-common/atomic_capture-1.c and
libgomp.oacc-fortran/atomic_capture-1.f90.  I suppose the problem is that
constructs such as libgomp.oacc-c-c++-common/atomic_capture-1.c:

      fgot = 1.0;
      fexp = 0.0;
    
    #pragma acc data copy (fgot, fdata[0:N])
      {
    #pragma acc parallel loop
        for (i = 0; i < N; i++)
          {
            float expr = 32.0;
    
    #pragma acc atomic capture
            fdata[i] = fgot = expr - fgot;
          }
      }
    
      for (i = 0; i < N; i++)
        if (i % 2 == 0)
          {
            if (fdata[i] != 31.0)
              abort ();
          }
        else
          {
            if (fdata[i] != 1.0)
              abort ();
          }

... are no longer executed in stable/ascending order, and instead of the
exact "i % 2 == 0" classifier, we should now instead verify what the 31.0
and 1.0 cases each appear with probability 0.5?  Are you looking into
resolving that, or should somebody else have a look?


I'm also seeing the following regression for C and C++,
libgomp.oacc-c-c++-common/loop-auto-1.c with -O2:

    source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: In function 'vector_1._omp_fn.0':
    source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c:104:9: internal compiler error: Segmentation fault
     #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
             ^

    #4  0x0000000000f73d46 in internal_error (gmsgid=gmsgid@entry=0x105be63 "%s")
        at [...]/source-gcc/gcc/diagnostic.c:1270
    #5  0x00000000009fccb0 in crash_signal (signo=<optimized out>)
        at [...]/source-gcc/gcc/toplev.c:333
    #6  <signal handler called>
    #7  0x0000000000beaf2e in same_succ_flush_bb (bb=<optimized out>, bb=<optimized out>)
        at [...]/source-gcc/gcc/hash-table.h:919
    #8  0x0000000000bec499 in same_succ_flush_bbs (bbs=<optimized out>)
        at [...]/source-gcc/gcc/tree-ssa-tail-merge.c:823
    #9  update_worklist () at [...]/source-gcc/gcc/tree-ssa-tail-merge.c:870
    #10 tail_merge_optimize (todo=todo@entry=32)
        at [...]/source-gcc/gcc/tree-ssa-tail-merge.c:1716
    #11 0x0000000000b99057 in (anonymous namespace)::pass_pre::execute (this=<optimized out>, fun=<optimized out>)
        at [...]/source-gcc/gcc/tree-ssa-pre.c:4818
    #12 0x0000000000937e9d in execute_one_pass (pass=pass@entry=0x1530970)
        at [...]/source-gcc/gcc/passes.c:2348
    #13 0x00000000009384b8 in execute_pass_list_1 (pass=0x1530970)
        at [...]/source-gcc/gcc/passes.c:2432
    #14 0x00000000009384ca in execute_pass_list_1 (pass=0x152fa10)
        at [...]/source-gcc/gcc/passes.c:2433
    #15 0x0000000000938515 in execute_pass_list (fn=0x7ffff69a5930, pass=<optimized out>)
        at [...]/source-gcc/gcc/passes.c:2443
    #16 0x00000000005fdded in cgraph_node::expand (this=this@entry=0x7ffff6990170)
        at [...]/source-gcc/gcc/cgraphunit.c:1982
    #17 0x00000000005ff8c4 in expand_all_functions ()
        at [...]/source-gcc/gcc/cgraphunit.c:2118
    #18 symbol_table::compile (this=0x7ffff68d2000) at [...]/source-gcc/gcc/cgraphunit.c:2474
    #19 0x0000000000561db8 in lto_main () at [...]/source-gcc/gcc/lto/lto.c:3328
    #20 0x00000000009fccef in compile_file () at [...]/source-gcc/gcc/toplev.c:463
    #21 0x000000000052e5ba in do_compile () at [...]/source-gcc/gcc/toplev.c:1987
    #22 toplev::main (this=this@entry=0x7fffffffcc80, argc=argc@entry=18, argv=0x150aec0, argv@entry=0x7fffffffcd88)
        at [...]/source-gcc/gcc/toplev.c:2095
    #23 0x0000000000530247 in main (argc=18, argv=0x7fffffffcd88)
        at [...]/source-gcc/gcc/main.c:39

Are you seeing that, too?  I can't remember seeing that on
gomp-4_0-branch, so it may be due to a recent trunk change, independent
of your omp-low change.  Are you going to have a look, or want me to?


Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] tail merge ICE
  2016-05-03 10:35     ` Thomas Schwinge
@ 2016-05-04 17:25       ` Nathan Sidwell
  2016-05-06 10:32         ` Richard Biener
  2016-06-10 10:25         ` Thomas Schwinge
  0 siblings, 2 replies; 8+ messages in thread
From: Nathan Sidwell @ 2016-05-04 17:25 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: GCC Patches, Jakub Jelinek, Richard Guenther

[-- Attachment #1: Type: text/plain, Size: 2575 bytes --]

This patch fixes an ICE Thomas observed in tree-ssa-tail-merge.c:

On 05/03/16 06:34, Thomas Schwinge wrote:

> I'm also seeing the following regression for C and C++,
> libgomp.oacc-c-c++-common/loop-auto-1.c with -O2:
>
>     source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: In function 'vector_1._omp_fn.0':
>     source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c:104:9: internal compiler error: Segmentation fault
>      #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
>              ^
>
>     #4  0x0000000000f73d46 in internal_error (gmsgid=gmsgid@entry=0x105be63 "%s")
>         at [...]/source-gcc/gcc/diagnostic.c:1270
>     #5  0x00000000009fccb0 in crash_signal (signo=<optimized out>)
>         at [...]/source-gcc/gcc/toplev.c:333
>     #6  <signal handler called>
>     #7  0x0000000000beaf2e in same_succ_flush_bb (bb=<optimized out>, bb=<optimized out>)
>         at [...]/source-gcc/gcc/hash-table.h:919
>     #8  0x0000000000bec499 in same_succ_flush_bbs (bbs=<optimized out>)
>         at [...]/source-gcc/gcc/tree-ssa-tail-merge.c:823

What's happening is we're trying to delete an object from a hash table, and 
asserting that we did indeed find the object.  The hash's equality function 
compares gimple sequences and ends up calling gimple_call_same_target_p.  That 
returns false if the call is IFN_UNIQUE, and so the deletion fails to find 
anything.  IFN_UNIQUE function calls should not compare equal, but they should 
compare eq (in the lispy sense).

The local fix is to augment the hash compare function with a check for pointer 
equality.  That way deleting items from the table works and comparing different 
sequences functions as before.

The more general fix is to augment gimple_call_same_target_p so that unique fns 
are eq but not equal.  A cursory look at the other users of that function did 
not indicate this currently causes a problem, but IMHO it is odd for a value to 
not compare the same as itself -- though IEEE NaNs do that :)

I placed the pointer equality comparison in gimple_call_same_target_p after the 
check for unique_fn_p, as I suspect that it is the rare case for that to be 
called with the same gimple call object for both parameters.  Although pointer 
equality would be applicable to all cases, in most instances it's going to be false.

Of course, the gimple_call_same_target_p change fixes the problem on its own, 
but the local change to same_succ::equal seems beneficial on its own merits.

ok?

nathan
-- 
Nathan Sidwell

[-- Attachment #2: unique-ice.patch --]
[-- Type: text/x-patch, Size: 1184 bytes --]

2016-05-04  Nathan Sidwell  <nathan@codesourcery.com>

	* gimple.c (gimple_call_same_target_p): Unique functions are eq.
	* tree-ssa-tail-merge.c (same_succ::equal): Check pointer eq
	equality first.

Index: gimple.c
===================================================================
--- gimple.c	(revision 235871)
+++ gimple.c	(working copy)
@@ -1355,7 +1355,8 @@ gimple_call_same_target_p (const gimple
   if (gimple_call_internal_p (c1))
     return (gimple_call_internal_p (c2)
 	    && gimple_call_internal_fn (c1) == gimple_call_internal_fn (c2)
-	    && !gimple_call_internal_unique_p (as_a <const gcall *> (c1)));
+	    && (!gimple_call_internal_unique_p (as_a <const gcall *> (c1))
+		|| c1 == c2));
   else
     return (gimple_call_fn (c1) == gimple_call_fn (c2)
 	    || (gimple_call_fndecl (c1)
Index: tree-ssa-tail-merge.c
===================================================================
--- tree-ssa-tail-merge.c	(revision 235871)
+++ tree-ssa-tail-merge.c	(working copy)
@@ -538,6 +538,9 @@ same_succ::equal (const same_succ *e1, c
   gimple *s1, *s2;
   basic_block bb1, bb2;
 
+  if (e1 == e2)
+    return 1;
+
   if (e1->hashval != e2->hashval)
     return 0;
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tail merge ICE
  2016-05-04 17:25       ` [PATCH] tail merge ICE Nathan Sidwell
@ 2016-05-06 10:32         ` Richard Biener
  2016-06-10 10:25         ` Thomas Schwinge
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Biener @ 2016-05-06 10:32 UTC (permalink / raw)
  To: Nathan Sidwell; +Cc: Thomas Schwinge, GCC Patches, Jakub Jelinek

On Wed, May 4, 2016 at 7:25 PM, Nathan Sidwell <nathan@codesourcery.com> wrote:
> This patch fixes an ICE Thomas observed in tree-ssa-tail-merge.c:
>
> On 05/03/16 06:34, Thomas Schwinge wrote:
>
>> I'm also seeing the following regression for C and C++,
>> libgomp.oacc-c-c++-common/loop-auto-1.c with -O2:
>>
>>     source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c:
>> In function 'vector_1._omp_fn.0':
>>
>> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c:104:9:
>> internal compiler error: Segmentation fault
>>      #pragma acc parallel num_workers (32) vector_length(32)
>> copy(ary[0:size]) firstprivate (size)
>>              ^
>>
>>     #4  0x0000000000f73d46 in internal_error
>> (gmsgid=gmsgid@entry=0x105be63 "%s")
>>         at [...]/source-gcc/gcc/diagnostic.c:1270
>>     #5  0x00000000009fccb0 in crash_signal (signo=<optimized out>)
>>         at [...]/source-gcc/gcc/toplev.c:333
>>     #6  <signal handler called>
>>     #7  0x0000000000beaf2e in same_succ_flush_bb (bb=<optimized out>,
>> bb=<optimized out>)
>>         at [...]/source-gcc/gcc/hash-table.h:919
>>     #8  0x0000000000bec499 in same_succ_flush_bbs (bbs=<optimized out>)
>>         at [...]/source-gcc/gcc/tree-ssa-tail-merge.c:823
>
>
> What's happening is we're trying to delete an object from a hash table, and
> asserting that we did indeed find the object.  The hash's equality function
> compares gimple sequences and ends up calling gimple_call_same_target_p.
> That returns false if the call is IFN_UNIQUE, and so the deletion fails to
> find anything.  IFN_UNIQUE function calls should not compare equal, but they
> should compare eq (in the lispy sense).
>
> The local fix is to augment the hash compare function with a check for
> pointer equality.  That way deleting items from the table works and
> comparing different sequences functions as before.
>
> The more general fix is to augment gimple_call_same_target_p so that unique
> fns are eq but not equal.  A cursory look at the other users of that
> function did not indicate this currently causes a problem, but IMHO it is
> odd for a value to not compare the same as itself -- though IEEE NaNs do
> that :)
>
> I placed the pointer equality comparison in gimple_call_same_target_p after
> the check for unique_fn_p, as I suspect that it is the rare case for that to
> be called with the same gimple call object for both parameters.  Although
> pointer equality would be applicable to all cases, in most instances it's
> going to be false.
>
> Of course, the gimple_call_same_target_p change fixes the problem on its
> own, but the local change to same_succ::equal seems beneficial on its own
> merits.
>
> ok?

Ok.

Richard.

> nathan
> --
> Nathan Sidwell

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tail merge ICE
  2016-05-04 17:25       ` [PATCH] tail merge ICE Nathan Sidwell
  2016-05-06 10:32         ` Richard Biener
@ 2016-06-10 10:25         ` Thomas Schwinge
  1 sibling, 0 replies; 8+ messages in thread
From: Thomas Schwinge @ 2016-06-10 10:25 UTC (permalink / raw)
  To: GCC Patches, Nathan Sidwell; +Cc: Jakub Jelinek, Richard Guenther

[-- Attachment #1: Type: text/plain, Size: 2478 bytes --]

Hi!

On Wed, 4 May 2016 13:25:19 -0400, Nathan Sidwell <nathan@codesourcery.com> wrote:
> This patch fixes an ICE Thomas observed in tree-ssa-tail-merge.c: [...]

I've recently run into the same ICE on gcc-6-branch; committed to
gcc-6-branch in r237294, as obvious:

commit f6355e94a162792e7e3ace0b20efd6e73f030585
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Jun 10 09:45:51 2016 +0000

    tail merge ICE
    
    Backport trunk r235964:
    
    	gcc/
    	* gimple.c (gimple_call_same_target_p): Unique functions are eq.
    	* tree-ssa-tail-merge.c (same_succ::equal): Check pointer eq
    	equality first.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-6-branch@237294 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog             | 9 +++++++++
 gcc/gimple.c              | 3 ++-
 gcc/tree-ssa-tail-merge.c | 3 +++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 9773a36..4b7b8f6 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,12 @@
+2016-06-10  Thomas Schwinge  <thomas@codesourcery.com>
+
+	Backport trunk r235964:
+	2016-05-06  Nathan Sidwell  <nathan@codesourcery.com>
+
+	* gimple.c (gimple_call_same_target_p): Unique functions are eq.
+	* tree-ssa-tail-merge.c (same_succ::equal): Check pointer eq
+	equality first.
+
 2016-06-09  Michael Meissner  <meissner@linux.vnet.ibm.com>
 
 	Back port from trunk
diff --git gcc/gimple.c gcc/gimple.c
index b0e19d5..b06e62c 100644
--- gcc/gimple.c
+++ gcc/gimple.c
@@ -1355,7 +1355,8 @@ gimple_call_same_target_p (const gimple *c1, const gimple *c2)
   if (gimple_call_internal_p (c1))
     return (gimple_call_internal_p (c2)
 	    && gimple_call_internal_fn (c1) == gimple_call_internal_fn (c2)
-	    && !gimple_call_internal_unique_p (as_a <const gcall *> (c1)));
+	    && (!gimple_call_internal_unique_p (as_a <const gcall *> (c1))
+		|| c1 == c2));
   else
     return (gimple_call_fn (c1) == gimple_call_fn (c2)
 	    || (gimple_call_fndecl (c1)
diff --git gcc/tree-ssa-tail-merge.c gcc/tree-ssa-tail-merge.c
index e95879f..3df41fd 100644
--- gcc/tree-ssa-tail-merge.c
+++ gcc/tree-ssa-tail-merge.c
@@ -538,6 +538,9 @@ same_succ::equal (const same_succ *e1, const same_succ *e2)
   gimple *s1, *s2;
   basic_block bb1, bb2;
 
+  if (e1 == e2)
+    return 1;
+
   if (e1->hashval != e2->hashval)
     return 0;
 


Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-06-10 10:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-22 22:11 [gomp4] gang partitioning Nathan Sidwell
2016-01-22 21:59 ` [gomp4] fix atomic tests Nathan Sidwell
2016-04-29 14:01   ` [Openacc] Adjust automatic loop partitioning Nathan Sidwell
2016-05-02  7:15     ` Jakub Jelinek
2016-05-03 10:35     ` Thomas Schwinge
2016-05-04 17:25       ` [PATCH] tail merge ICE Nathan Sidwell
2016-05-06 10:32         ` Richard Biener
2016-06-10 10:25         ` Thomas Schwinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).