[gomp4] loop partition optimization

public inbox for gcc-patches@gcc.gnu.org

* [gomp4] loop partition optimization
@ 2015-08-26 12:33 Nathan Sidwell
  2015-11-10 22:34 ` [ptx] partitioning optimization Nathan Sidwell
  0 siblings, 1 reply; 12+ messages in thread
From: Nathan Sidwell @ 2015-08-26 12:33 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 681 bytes --]

I've committed this patch, which implements a simple partitioned execution
optimization.  A loop over both worker and vector dimensions emits separate
FORK and JOIN markers for the two dimensions -- there may be reduction pieces
between them, as Cesar will shortly be committing.

However, if there aren't reductions, then we end up with one partitioned region
sitting entirely inside another region.  This is inefficient, as it
causes us to emit separate worker and vector partitioning startup code.

This optimization looks for regions of this form and, if found, consumes the
inner region into the outer region.  Then we only emit a single setup block of code.
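
As a rough illustration of the merge described above, here is a minimal sketch on a toy data structure.  It is not the nvptx.c code: `Region`, `swallow_inner`, `kWorker` and `kVector` are hypothetical names, and the sketch only checks the structural preconditions (one inner region, two wrapper blocks, disjoint axes), not the instruction-level ones.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical axis bits; the real compiler derives these from GOMP_DIM_MASK.
constexpr unsigned kWorker = 1u << 0;
constexpr unsigned kVector = 1u << 1;

// Toy stand-in for a partitioned region: the axes it is partitioned
// over, the blocks it owns directly, and any nested regions.
struct Region {
  unsigned mask = 0;
  std::vector<int> blocks;
  std::vector<std::unique_ptr<Region>> inner;
};

// Merge a lone inner region into OUTER.  Returns true if the merge happened.
bool swallow_inner(Region &outer) {
  if (outer.mask == 0)            // the outermost dummy region
    return false;
  if (outer.inner.size() != 1)    // need exactly one inner region
    return false;
  if (outer.blocks.size() != 2)   // only the blocks wrapping the inner region
    return false;

  Region &in = *outer.inner.front();
  if (outer.mask & in.mask)       // axes must be disjoint
    return false;

  outer.mask |= in.mask;
  outer.blocks.insert(outer.blocks.end(), in.blocks.begin(), in.blocks.end());

  // Adopt the grandchildren; assigning over outer.inner frees the old inner
  // region, so `in` must not be used after this point.
  std::vector<std::unique_ptr<Region>> grandchildren = std::move(in.inner);
  outer.inner = std::move(grandchildren);
  return true;
}
```

With a worker region that wraps a single vector region, one call leaves a single region whose mask covers both axes and which owns all four blocks, so only one setup sequence would need to be emitted for it.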

nathan

[-- Attachment #2: gomp4-part-opt.patch --]
[-- Type: text/x-patch, Size: 4274 bytes --]

2015-08-26  Nathan Sidwell  <nathan@codesourcery.com>

	* config/nvptx/nvptx.opt (moptimize): New flag.
	* config/nvptx/nvptx.c (nvptx_option_override): Default
	nvptx_optimize.
	(nvptx_optimize_inner): New.
	(nvptx_process_pars): Call it.
	* doc/invoke.texi (Nvptx options): Document moptimize.

Index: gcc/config/nvptx/nvptx.c
===================================================================
--- gcc/config/nvptx/nvptx.c	(revision 227180)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -178,6 +178,9 @@ nvptx_option_override (void)
   write_symbols = NO_DEBUG;
   debug_info_level = DINFO_LEVEL_NONE;
 
+  if (nvptx_optimize < 0)
+    nvptx_optimize = optimize > 0;
+
   declared_fndecls_htab = hash_table<tree_hasher>::create_ggc (17);
   needed_fndecls_htab = hash_table<tree_hasher>::create_ggc (17);
   declared_libfuncs_htab
@@ -3005,6 +3008,64 @@ nvptx_skip_par (unsigned mask, parallel
   nvptx_single (mask, par->forked_block, pre_tail);
 }
 
+/* If PAR has a single inner parallel and PAR itself only contains
+   empty entry and exit blocks, swallow the inner PAR.  */
+
+static void
+nvptx_optimize_inner (parallel *par)
+{
+  parallel *inner = par->inner;
+
+  /* We mustn't be the outer dummy par.  */
+  if (!par->mask)
+    return;
+
+  /* We must have a single inner par.  */
+  if (!inner || inner->next)
+    return;
+
+  /* We must only contain 2 blocks ourselves -- the head and tail of
+     the inner par.  */
+  if (par->blocks.length () != 2)
+    return;
+
+  /* We must be disjoint partitioning.  As we only have vector and
+     worker partitioning, this is sufficient to guarantee the pars
+     have adjacent partitioning.  */
+  if ((par->mask & inner->mask) & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1))
+    /* This indicates malformed code generation.  */
+    return;
+
+  /* The outer forked insn should be the only one in its block.  */
+  rtx_insn *probe;
+  rtx_insn *forked = par->forked_insn;
+  for (probe = BB_END (par->forked_block);
+       probe != forked; probe = PREV_INSN (probe))
+    if (INSN_P (probe))
+      return;
+
+  /* The outer joining insn, if any, must be in the same block as the inner
+     joined instruction, which must otherwise be empty of insns.  */
+  rtx_insn *joining = par->joining_insn;
+  rtx_insn *join = inner->join_insn;
+  for (probe = BB_END (inner->join_block);
+       probe != join; probe = PREV_INSN (probe))
+    if (probe != joining && INSN_P (probe))
+      return;
+
+  /* Preconditions met.  Swallow the inner par.  */
+  par->mask |= inner->mask & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1);
+
+  par->blocks.reserve (inner->blocks.length ());
+  while (inner->blocks.length ())
+    par->blocks.quick_push (inner->blocks.pop ());
+
+  par->inner = inner->inner;
+  inner->inner = NULL;
+
+  delete inner;
+}
+
 /* Process the parallel PAR and all its contained
    parallels.  We do everything but the neutering.  Return mask of
    partitioned modes used within this parallel.  */
@@ -3012,8 +3073,11 @@ nvptx_skip_par (unsigned mask, parallel
 static unsigned
 nvptx_process_pars (parallel *par)
 {
-  unsigned inner_mask = par->mask;
+  if (nvptx_optimize)
+    nvptx_optimize_inner (par);
   
+  unsigned inner_mask = par->mask;
+
   /* Do the inner parallels first.  */
   if (par->inner)
     {
Index: gcc/config/nvptx/nvptx.opt
===================================================================
--- gcc/config/nvptx/nvptx.opt	(revision 227180)
+++ gcc/config/nvptx/nvptx.opt	(working copy)
@@ -29,6 +29,10 @@ mmainkernel
 Target Report RejectNegative
 Link in code for a __main kernel.
 
+moptimize
+Target Report Var(nvptx_optimize) Init(-1)
+Optimize partition neutering
+
 Enum
 Name(ptx_isa) Type(int)
 Known PTX ISA versions (for use with the -misa= option):
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 227180)
+++ gcc/doc/invoke.texi	(working copy)
@@ -18814,6 +18814,11 @@ Generate code for 32-bit or 64-bit ABI.
 Link in code for a __main kernel.  This is for stand-alone instead of
 offloading execution.
 
+@item -moptimize
+@opindex moptimize
+Apply partitioned execution optimizations.  This is the default when any
+level of optimization is selected.
+
 @end table
 
 @node PDP-11 Options


* [ptx] partitioning optimization
@ 2015-11-10 22:34 ` Nathan Sidwell
  2015-11-10 22:45   ` Ilya Verbin
  2015-11-11 12:06   ` Bernd Schmidt
  0 siblings, 2 replies; 12+ messages in thread
From: Nathan Sidwell @ 2015-11-10 22:34 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 633 bytes --]

I've committed this patch to trunk.  It implements a partitioning optimization 
for a loop partitioned over both vector and worker axes.  We can elide the inner 
vector partitioning state propagation, if there are no intervening instructions 
in the worker-partitioned outer loop other than the forking and joining.  We 
simply execute the worker propagation on all vectors.
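
The "no intervening instructions" condition can be pictured on a flattened instruction stream.  This is only a sketch under assumptions: the `Insn` enum and both function names are hypothetical, and the real check walks RTL insns and basic blocks rather than a vector.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Toy instruction kinds: the outer (worker) markers are Forked/Joining,
// the inner (vector) markers are Fork/Join, everything else is Other.
enum class Insn { Forked, Fork, Join, Joining, Other };

// True if SECOND comes directly after the first occurrence of FIRST.
bool immediately_follows(const std::vector<Insn> &seq, Insn first, Insn second) {
  auto it = std::find(seq.begin(), seq.end(), first);
  if (it == seq.end() || std::next(it) == seq.end())
    return false;
  return *std::next(it) == second;
}

// The inner partitioning can be elided only when nothing sits between the
// outer and inner markers on either side of the loop body.
bool can_elide_inner(const std::vector<Insn> &seq) {
  return immediately_follows(seq, Insn::Forked, Insn::Fork)
         && immediately_follows(seq, Insn::Join, Insn::Joining);
}
```

A stream of Forked, Fork, body, Join, Joining qualifies; putting any other instruction between Forked and Fork (or between Join and Joining) disqualifies it, because that instruction would have to run in worker-partitioned but vector-single mode.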

I've been unable to introduce a testcase for this.  The difficulty is we want to
check an rtl dump from the acceleration compiler, and there doesn't appear to
be existing machinery for that in the testsuite.  Perhaps something to be added
later?

nathan

[-- Attachment #2: trunk-ptx-opt-1110.patch --]
[-- Type: text/x-patch, Size: 4574 bytes --]

2015-11-10  Nathan Sidwell  <nathan@codesourcery.com>

	* config/nvptx/nvptx.opt (moptimize): New flag.
	* config/nvptx/nvptx.c (nvptx_option_override): Set nvptx_optimize
	default.
	(nvptx_optimize_inner): New.
	(nvptx_process_pars): Call it when optimizing.
	* doc/invoke.texi (Nvidia PTX Options): Document -moptimize.

Index: config/nvptx/nvptx.c
===================================================================
--- config/nvptx/nvptx.c	(revision 230112)
+++ config/nvptx/nvptx.c	(working copy)
@@ -137,6 +137,9 @@ nvptx_option_override (void)
   write_symbols = NO_DEBUG;
   debug_info_level = DINFO_LEVEL_NONE;
 
+  if (nvptx_optimize < 0)
+    nvptx_optimize = optimize > 0;
+
   declared_fndecls_htab = hash_table<tree_hasher>::create_ggc (17);
   needed_fndecls_htab = hash_table<tree_hasher>::create_ggc (17);
   declared_libfuncs_htab
@@ -2942,6 +2945,69 @@ nvptx_skip_par (unsigned mask, parallel
   nvptx_single (mask, par->forked_block, pre_tail);
 }
 
+/* If PAR has a single inner parallel and PAR itself only contains
+   empty entry and exit blocks, swallow the inner PAR.  */
+
+static void
+nvptx_optimize_inner (parallel *par)
+{
+  parallel *inner = par->inner;
+
+  /* We mustn't be the outer dummy par.  */
+  if (!par->mask)
+    return;
+
+  /* We must have a single inner par.  */
+  if (!inner || inner->next)
+    return;
+
+  /* We must only contain 2 blocks ourselves -- the head and tail of
+     the inner par.  */
+  if (par->blocks.length () != 2)
+    return;
+
+  /* We must be disjoint partitioning.  As we only have vector and
+     worker partitioning, this is sufficient to guarantee the pars
+     have adjacent partitioning.  */
+  if ((par->mask & inner->mask) & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1))
+    /* This indicates malformed code generation.  */
+    return;
+
+  /* The outer forked insn should be immediately followed by the inner
+     fork insn.  */
+  rtx_insn *forked = par->forked_insn;
+  rtx_insn *fork = BB_END (par->forked_block);
+
+  if (NEXT_INSN (forked) != fork)
+    return;
+  gcc_checking_assert (recog_memoized (fork) == CODE_FOR_nvptx_fork);
+
+  /* The outer joining insn must immediately follow the inner join
+     insn.  */
+  rtx_insn *joining = par->joining_insn;
+  rtx_insn *join = inner->join_insn;
+  if (NEXT_INSN (join) != joining)
+    return;
+
+  /* Preconditions met.  Swallow the inner par.  */
+  if (dump_file)
+    fprintf (dump_file, "Merging loop %x [%d,%d] into %x [%d,%d]\n",
+	     inner->mask, inner->forked_block->index,
+	     inner->join_block->index,
+	     par->mask, par->forked_block->index, par->join_block->index);
+
+  par->mask |= inner->mask & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1);
+
+  par->blocks.reserve (inner->blocks.length ());
+  while (inner->blocks.length ())
+    par->blocks.quick_push (inner->blocks.pop ());
+
+  par->inner = inner->inner;
+  inner->inner = NULL;
+
+  delete inner;
+}
+
 /* Process the parallel PAR and all its contained
    parallels.  We do everything but the neutering.  Return mask of
    partitioned modes used within this parallel.  */
@@ -2949,6 +3015,9 @@ nvptx_skip_par (unsigned mask, parallel
 static unsigned
 nvptx_process_pars (parallel *par)
 {
+  if (nvptx_optimize)
+    nvptx_optimize_inner (par);
+  
   unsigned inner_mask = par->mask;
 
   /* Do the inner parallels first.  */
Index: config/nvptx/nvptx.opt
===================================================================
--- config/nvptx/nvptx.opt	(revision 230112)
+++ config/nvptx/nvptx.opt	(working copy)
@@ -28,3 +28,7 @@ Generate code for a 64-bit ABI.
 mmainkernel
 Target Report RejectNegative
 Link in code for a __main kernel.
+
+moptimize
+Target Report Var(nvptx_optimize) Init(-1)
+Optimize partition neutering
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 230112)
+++ doc/invoke.texi	(working copy)
@@ -873,7 +873,7 @@ Objective-C and Objective-C++ Dialects}.
 -march=@var{arch} -mbmx -mno-bmx -mcdx -mno-cdx}
 
 @emph{Nvidia PTX Options}
-@gccoptlist{-m32 -m64 -mmainkernel}
+@gccoptlist{-m32 -m64 -mmainkernel -moptimize}
 
 @emph{PDP-11 Options}
 @gccoptlist{-mfpu  -msoft-float  -mac0  -mno-ac0  -m40  -m45  -m10 @gol
@@ -18960,6 +18960,11 @@ Generate code for 32-bit or 64-bit ABI.
 Link in code for a __main kernel.  This is for stand-alone instead of
 offloading execution.
 
+@item -moptimize
+@opindex moptimize
+Apply partitioned execution optimizations.  This is the default when any
+level of optimization is selected.
+
 @end table
 
 @node PDP-11 Options


* Re: [ptx] partitioning optimization
  2015-11-10 22:34 ` [ptx] partitioning optimization Nathan Sidwell
@ 2015-11-10 22:45   ` Ilya Verbin
  2015-11-11 13:37     ` Nathan Sidwell
  2015-11-11 12:06   ` Bernd Schmidt
  1 sibling, 1 reply; 12+ messages in thread
From: Ilya Verbin @ 2015-11-10 22:45 UTC (permalink / raw)
  To: Nathan Sidwell; +Cc: GCC Patches

> I've been unable to introduce a testcase for this. The difficulty is we want
> to check an rtl dump from the acceleration compiler, and there doesn't
> appear to be existing machinery for that in the testsuite.  Perhaps
> something to be added later?

I haven't tried it, but doesn't
/* { dg-options "-foffload=-fdump-rtl-..." } */
with
/* { dg-final { scan-rtl-dump ... } } */
work?

  -- Ilya


* Re: [ptx] partitioning optimization
  2015-11-10 22:34 ` [ptx] partitioning optimization Nathan Sidwell
  2015-11-10 22:45   ` Ilya Verbin
@ 2015-11-11 12:06   ` Bernd Schmidt
  2015-11-11 13:59     ` Nathan Sidwell
  1 sibling, 1 reply; 12+ messages in thread
From: Bernd Schmidt @ 2015-11-11 12:06 UTC (permalink / raw)
  To: Nathan Sidwell, GCC Patches

On 11/10/2015 11:33 PM, Nathan Sidwell wrote:
> I've committed this patch to trunk.  It implements a partitioning
> optimization for a loop partitioned over both vector and worker axes.
> We can elide the inner vector partitioning state propagation, if there
> are no intervening instructions in the worker-partitioned outer loop
> other than the forking and joining.  We simply execute the worker
> propagation on all vectors.

Patch LGTM, although I wonder if you really need the extra option rather 
than just optimize.

> I've been unable to introduce a testcase for this. The difficulty is we
> want to check an rtl dump from the acceleration compiler, and there
> doesn't  appear to be existing machinery for that in the testsuite.
> Perhaps something to be added later?

What's the difficulty exactly? Getting a dump should be possible with 
-foffload=-fdump-whatever, does the testsuite have a problem finding the 
right filename?


Bernd


* Re: [ptx] partitioning optimization
  2015-11-10 22:45   ` Ilya Verbin
@ 2015-11-11 13:37     ` Nathan Sidwell
  0 siblings, 0 replies; 12+ messages in thread
From: Nathan Sidwell @ 2015-11-11 13:37 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: GCC Patches

On 11/10/15 17:45, Ilya Verbin wrote:
>> I've been unable to introduce a testcase for this. The difficulty is we want
>> to check an rtl dump from the acceleration compiler, and there doesn't
>> appear to be existing machinery for that in the testsuite.  Perhaps
>> something to be added later?
>
> I haven't tried it, but doesn't
> /* { dg-options "-foffload=-fdump-rtl-..." } */
> with
> /* { dg-final { scan-rtl-dump ... } } */
> work?

In the gcc testsuite directories?  That's the approach I was going for.

The issue is detecting when the test should be run.  target==nvptx-*-* isn't 
right, as the target is the x86 host machine.  There doesn't seem to be an 
existing dejagnu predicate there to select for 'accel_target==FOO'.  Am I 
missing something?

nathan


* Re: [ptx] partitioning optimization
  2015-11-11 12:06   ` Bernd Schmidt
@ 2015-11-11 13:59     ` Nathan Sidwell
  2015-11-11 14:19       ` Bernd Schmidt
  2015-11-11 17:16       ` Thomas Schwinge
  0 siblings, 2 replies; 12+ messages in thread
From: Nathan Sidwell @ 2015-11-11 13:59 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 11/11/15 07:06, Bernd Schmidt wrote:
> On 11/10/2015 11:33 PM, Nathan Sidwell wrote:
>> I've committed this patch to trunk.  It implements a partitioning
>> optimization for a loop partitioned over both vector and worker axes.
>> We can elide the inner vector partitioning state propagation, if there
>> are no intervening instructions in the worker-partitioned outer loop
>> other than the forking and joining.  We simply execute the worker
>> propagation on all vectors.
>
> Patch LGTM, although I wonder if you really need the extra option rather than
> just optimize.

The reason I added the option was to be able to turn it off independently of the
other optimizations (for debugging).
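
A minimal sketch of how such a tri-state flag resolves, mirroring the `Init(-1)` default and the `nvptx_optimize < 0` test in the patch.  The function name `resolve_partition_opt` is hypothetical:

```cpp
#include <cassert>

// FLAG is -1 when the option was not given on the command line, otherwise
// 0 or 1.  OPTIMIZE_LEVEL is the general optimization level (0 for -O0).
int resolve_partition_opt(int flag, int optimize_level) {
  if (flag < 0)
    return optimize_level > 0;  // follow the general optimization level
  return flag;                  // an explicit setting always wins
}
```

So the partitioning optimization follows the optimization level by default, while an explicit setting of the flag overrides that in either direction, which is what makes it possible to disable this one transformation while leaving the others on.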

>> I've been unable to introduce a testcase for this. The difficulty is we
>> want to check an rtl dump from the acceleration compiler, and there
>> doesn't  appear to be existing machinery for that in the testsuite.
>> Perhaps something to be added later?
>
> What's the difficulty exactly? Getting a dump should be possible with
> -foffload=-fdump-whatever, does the testsuite have a problem finding the right
> filename?


That's not the problem.  How to conditionally enable the test is the difficulty.
I suspect porting something concerning accel_compiler from the libgomp
testsuite is needed?

nathan


* Re: [ptx] partitioning optimization
  2015-11-11 13:59     ` Nathan Sidwell
@ 2015-11-11 14:19       ` Bernd Schmidt
  2015-11-13 20:07         ` Nathan Sidwell
  2015-11-11 17:16       ` Thomas Schwinge
  1 sibling, 1 reply; 12+ messages in thread
From: Bernd Schmidt @ 2015-11-11 14:19 UTC (permalink / raw)
  To: Nathan Sidwell, GCC Patches

On 11/11/2015 02:59 PM, Nathan Sidwell wrote:
> That's not the problem.  How to conditionally enable the test is the
> difficulty.  I suspect porting something concerning accel_compiler from
> the libgomp testsuite is needed?

Maybe a check_effective_target_offload_nvptx which tries to see if 
-foffload=nvptx gives an error (I would hope it does if it's unsupported).


Bernd


* Re: [ptx] partitioning optimization
  2015-11-11 13:59     ` Nathan Sidwell
  2015-11-11 14:19       ` Bernd Schmidt
@ 2015-11-11 17:16       ` Thomas Schwinge
  1 sibling, 0 replies; 12+ messages in thread
From: Thomas Schwinge @ 2015-11-11 17:16 UTC (permalink / raw)
  To: Nathan Sidwell; +Cc: Bernd Schmidt, GCC Patches, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 1686 bytes --]

Hi!

On Wed, 11 Nov 2015 08:59:17 -0500, Nathan Sidwell <nathan@acm.org> wrote:
> On 11/11/15 07:06, Bernd Schmidt wrote:
> > On 11/10/2015 11:33 PM, Nathan Sidwell wrote:
> >> I've been unable to introduce a testcase for this.

(But you still committed an update to gcc/testsuite/ChangeLog.)

You'll need to put such an offloading test into the libgomp testsuite --
offloading compilation requires linking, and during that, the offloading
compiler(s) will be invoked, which only the libgomp testsuite is set up
to do, as discussed before.

> >> The difficulty is we
> >> want to check an rtl dump from the acceleration compiler, and there
> >> doesn't  appear to be existing machinery for that in the testsuite.
> >> Perhaps something to be added later?
> >
> > What's the difficulty exactly? Getting a dump should be possible with
> > -foffload=-fdump-whatever, does the testsuite have a problem finding the right
> > filename?

Currently, this will create cc* files, for example ccdjj2z9.o.271r.final
for -foffload=-fdump-rtl-final.  (I don't know if you can come up with
dg-* directives to scan these.)  The reason is -- I think -- because of
the lto-wrapper and/or mkoffloads not specifying a more suitable "base
name" for the temporary input files to lto1.
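
To make the naming problem concrete, here is a small sketch of how the dump file name appears to be composed, inferred only from the example name above (base name, pass number, a kind letter, pass name).  `dump_file_name` is a hypothetical helper, and the pass number is just the one from that example:

```cpp
#include <cassert>
#include <string>

// Compose "<base>.<pass number><kind>.<pass name>", e.g. kind 'r' for an
// RTL dump.  The base is whatever input name the compiler proper was given.
std::string dump_file_name(const std::string &base, int pass_no, char kind,
                           const std::string &pass) {
  return base + "." + std::to_string(pass_no) + kind + "." + pass;
}
```

With the temporary base "ccdjj2z9.o" this reproduces the name seen above, whose random prefix a dg-final scan cannot predict.  If a stable base name were supplied instead, only the prefix would change and the dump would become findable.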

> That's not the problem.  How to conditionally enable the test is the difficulty. 
>   I suspect porting something concerning accel_compiler from the libgomp 
> testsuite is needed?

Use "{ target openacc_nvidia_accel_selected }", as implemented by
libgomp/testsuite/lib/libgomp.exp:check_effective_target_openacc_nvidia_accel_selected
(already present on trunk).

Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]


* Re: [ptx] partitioning optimization
  2015-11-11 14:19       ` Bernd Schmidt
@ 2015-11-13 20:07         ` Nathan Sidwell
  2015-11-13 20:22           ` Bernd Schmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Nathan Sidwell @ 2015-11-13 20:07 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 573 bytes --]

On 11/11/15 09:19, Bernd Schmidt wrote:
> On 11/11/2015 02:59 PM, Nathan Sidwell wrote:
>> That's not the problem.  How to conditionally enable the test is the
>> difficulty.  I suspect porting something concerning accel_compiler from
>> the libgomp testsuite is needed?
>
> Maybe a check_effective_target_offload_nvptx which tries to see if
> -foffload=nvptx gives an error (I would hope it does if it's unsupported).

This patch seems to do the trick.  Tested on an offload-aware build (passes)
and a regular x86_64-linux build (skipped as unsupported).

ok?

nathan


[-- Attachment #2: trunk-sese-test.patch --]
[-- Type: text/x-patch, Size: 1626 bytes --]

2015-11-13  Nathan Sidwell  <nathan@codesourcery.com>

	* lib/target-supports.exp (check_effective_target_offload_nvptx): New.
	* gcc.dg/goacc/nvptx-merged-loop.c: New.

Index: testsuite/gcc.dg/goacc/nvptx-merged-loop.c
===================================================================
--- testsuite/gcc.dg/goacc/nvptx-merged-loop.c	(revision 0)
+++ testsuite/gcc.dg/goacc/nvptx-merged-loop.c	(working copy)
@@ -0,0 +1,30 @@
+/* { dg-do link } */
+/* { dg-require-effective-target offload_nvptx } */
+/* { dg-options "-fopenacc -O2 -foffload=-fdump-rtl-mach\\ -dumpbase\\ nvptx-merged-loop.c\\ -Wa,--no-verify" } */
+
+#define N (32*32*32+17)
+void __attribute__ ((noinline)) Foo (int *ary)
+{
+  int ix;
+
+#pragma acc parallel num_workers(32) vector_length(32) copyout(ary[0:N])
+  {
+    /* Loop partitioning should be merged.  */
+#pragma acc loop worker vector
+    for (unsigned ix = 0; ix < N; ix++)
+      {
+	ary[ix] = ix;
+      }
+  }
+}
+
+int main ()
+{
+  int ary[N];
+
+  Foo (ary);
+
+  return 0;
+}   
+
+/* { dg-final { scan-rtl-dump "Merging loop .* into " "mach" } } */
Index: testsuite/lib/target-supports.exp
===================================================================
--- testsuite/lib/target-supports.exp	(revision 230324)
+++ testsuite/lib/target-supports.exp	(working copy)
@@ -6716,3 +6716,11 @@ proc check_effective_target_vect_max_red
     }
     return 0
 }
+
+# Return 1 if there is an nvptx offload compiler.
+
+proc check_effective_target_offload_nvptx { } {
+    return [check_no_compiler_messages offload_nvptx object {
+	int main () {return 0;}
+    } "-foffload=nvptx-none" ]
+}


* Re: [ptx] partitioning optimization
  2015-11-13 20:07         ` Nathan Sidwell
@ 2015-11-13 20:22           ` Bernd Schmidt
  2015-11-23  7:47             ` Thomas Schwinge
  0 siblings, 1 reply; 12+ messages in thread
From: Bernd Schmidt @ 2015-11-13 20:22 UTC (permalink / raw)
  To: Nathan Sidwell, GCC Patches

On 11/13/2015 09:06 PM, Nathan Sidwell wrote:
> On 11/11/15 09:19, Bernd Schmidt wrote:
>> On 11/11/2015 02:59 PM, Nathan Sidwell wrote:
>>> That's not the problem.  How to conditionally enable the test is the
>>> difficulty.  I suspect porting something concerning accel_compiler from
>>> the libgomp testsuite is needed?
>>
>> Maybe a check_effective_target_offload_nvptx which tries to see if
>> -foffload=nvptx gives an error (I would hope it does if it's
>> unsupported).
>
> This patch seems to do the trick.  tested on  an offload-aware build
> (passes), and a regular x86_64-linux build (skipped as unsupported).

I think this is fine.


Bernd


* Re: [ptx] partitioning optimization
  2015-11-13 20:22           ` Bernd Schmidt
@ 2015-11-23  7:47             ` Thomas Schwinge
  2015-11-23  8:46               ` Jakub Jelinek
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Schwinge @ 2015-11-23  7:47 UTC (permalink / raw)
  To: Bernd Schmidt, Nathan Sidwell, Jakub Jelinek; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2052 bytes --]

Hi!

On Fri, 13 Nov 2015 21:22:11 +0100, Bernd Schmidt <bschmidt@redhat.com> wrote:
> On 11/13/2015 09:06 PM, Nathan Sidwell wrote:
> > On 11/11/15 09:19, Bernd Schmidt wrote:
> >> On 11/11/2015 02:59 PM, Nathan Sidwell wrote:
> >>> That's not the problem.  How to conditionally enable the test is the
> >>> difficulty.  I suspect porting something concerning accel_compiler from
> >>> the libgomp testsuite is needed?
> >>
> >> Maybe a check_effective_target_offload_nvptx which tries to see if
> >> -foffload=nvptx gives an error (I would hope it does if it's
> >> unsupported).
> >
> > This patch seems to do the trick.  tested on  an offload-aware build
> > (passes), and a regular x86_64-linux build (skipped as unsupported).

Thanks for adding such tests!

> I think this is fine.

I may have pointed this out before a few times... ;-) --
<http://news.gmane.org/find-root.php?message_id=%3C8737wcv4lg.fsf%40kepler.schwinge.homeip.net%3E>
-- this doesn't work for build-tree testing ("make check"), if GCC has
not yet been installed ("make install"), which is still the default
testing procedure as far as I know.

    spawn [...]/build-gcc/gcc/xgcc -B[...]/build-gcc/gcc/ [...]/source-gcc/gcc/testsuite/gcc.dg/goacc/nvptx-merged-loop.c -fno-diagnostics-show-caret -fdiagnostics-color=never -fopenacc -O2 -foffload=-fdump-rtl-mach -dumpbase nvptx-merged-loop.c -Wa,--no-verify -ffat-lto-objects -lm -o nvptx-merged-loop.exe
    xgcc: error: libgomp.spec: No such file or directory
    compiler exited with status 1
    [...]
    FAIL: gcc.dg/goacc/nvptx-merged-loop.c (test for excess errors)
    [...]
    UNRESOLVED: gcc.dg/goacc/nvptx-merged-loop.c scan-rtl-dump mach "Merging loop .* into "

Here, -fopenacc induces -lgomp.  So, we'll either need a (dummy?) libgomp
available to link against in gcc/testsuite/, or come up with a way to do
LTO/offloading compilation without actually linking (libgomp into) the
final executable, or move such tests into libgomp/testsuite/.  (Jakub?)


Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]


* Re: [ptx] partitioning optimization
  2015-11-23  7:47             ` Thomas Schwinge
@ 2015-11-23  8:46               ` Jakub Jelinek
  0 siblings, 0 replies; 12+ messages in thread
From: Jakub Jelinek @ 2015-11-23  8:46 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Bernd Schmidt, Nathan Sidwell, GCC Patches

On Mon, Nov 23, 2015 at 08:46:30AM +0100, Thomas Schwinge wrote:
> Here, -fopenacc induces -lgomp.  So, we'll either need a (dummy?) libgomp
> available to link against in gcc/testsuite/, or come up with a way to do
> LTO/offloading compilation without actually linking (libgomp into) the
> final executable, or move such tests into libgomp/testsuite/.  (Jakub?)

Link/run tests that link against libgomp belong to libgomp/testsuite/.

	Jakub


Thread overview: 12+ messages
2015-08-26 12:33 [gomp4] loop partition optimization Nathan Sidwell
2015-11-10 22:34 ` [ptx] partitioning optimization Nathan Sidwell
2015-11-10 22:45   ` Ilya Verbin
2015-11-11 13:37     ` Nathan Sidwell
2015-11-11 12:06   ` Bernd Schmidt
2015-11-11 13:59     ` Nathan Sidwell
2015-11-11 14:19       ` Bernd Schmidt
2015-11-13 20:07         ` Nathan Sidwell
2015-11-13 20:22           ` Bernd Schmidt
2015-11-23  7:47             ` Thomas Schwinge
2015-11-23  8:46               ` Jakub Jelinek
2015-11-11 17:16       ` Thomas Schwinge
