From: Thomas Schwinge <thomas@codesourcery.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
Cc: Tom de Vries <Tom_deVries@mentor.com>,
Richard Biener <rguenther@suse.de>,
Jakub Jelinek <jakub@redhat.com>
Subject: Re: [PATCH, 7/8] Add pass_parallelize_loops_oacc_kernels to pass_oacc_kernels
Date: Tue, 21 Apr 2015 20:09:00 -0000 [thread overview]
Message-ID: <87sibtz1ka.fsf@kepler.schwinge.homeip.net> (raw)
In-Reply-To: <54746B24.3030409@mentor.com>
[-- Attachment #1: Type: text/plain, Size: 35601 bytes --]
Hi!
On Tue, 25 Nov 2014 12:42:28 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 15-11-14 18:23, Tom de Vries wrote:
> > On 15-11-14 13:14, Tom de Vries wrote:
> >> I'm submitting a patch series with initial support for the oacc kernels
> >> directive.
> >>
> >> The patch series uses pass_parallelize_loops to implement parallelization of
> >> loops in the oacc kernels region.
> >>
> >> The patch series consists of these 8 patches:
> >> ...
> >> 1 Expand oacc kernels after pass_build_ealias
> >> 2 Add pass_oacc_kernels
> >> 3 Add pass_ch_oacc_kernels to pass_oacc_kernels
> >> 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels
> >> 5 Add pass_loop_im to pass_oacc_kernels
> >> 6 Add pass_ccp to pass_oacc_kernels
> >> 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels
> >> 8 Do simple omp lowering for no address taken var
> >> ...
> >
> > This patch adds:
> > - a specialized version of pass_parallelize_loops called
> > pass_parloops_oacc_kernels to pass group pass_oacc_kernels, and
> > - relevant test-cases.
> >
> > The pass only handles loops that are in a kernels region, and skips over bits of
> > pass_parallelize_loops that are already done for oacc kernels.
> >
> > The pass reintroduces the use of omp_expand_local, I haven't managed to make it
> > work yet using the external pass pass_expand_omp_ssa.
> >
> > An obvious limitation of the patch is the fact that we copy over the clauses
> > from the kernels directive to the generated parallel directive. We'll need to do
> > something more intelligent here, f.i. setting vector_length based on the
> > parallelization factor.
> >
> > Another limitation is that the pass still needs -ftree-parallelize-loops to
> > trigger.
> >
>
> Updated for using pass_copyprop instead of pass_ccp in pass_oacc_kernels.
>
> Bootstrapped and reg-tested as before.
>
> OK for trunk?
Committed to gomp-4_0-branch in r222285:
commit 74e09b9dbbe43321fb20b0174f926893bf2111bc
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Tue Apr 21 20:06:16 2015 +0000
Add pass_parallelize_loops_oacc_kernels to pass_oacc_kernels
gcc/
* passes.def: Add pass_parallelize_loops_oacc_kernels in pass group
pass_oacc_kernels.
* tree-parloops.c (create_parallel_loop, gen_parallel_loop): Add
function parameters region_entry and bool oacc_kernels_p. Handle
oacc_kernels_p.
Call create_parallel_loop with additional args.
(parallelize_loops): Add function parameter oacc_kernels_p. Calculate
dominance info. Skip loops that are not in a kernels region. Call
gen_parallel_loop with additional args.
(pass_parallelize_loops::execute): Call parallelize_loops with false
argument.
(pass_data_parallelize_loops_oacc_kernels): New pass_data.
(class pass_parallelize_loops_oacc_kernels): New pass.
(pass_parallelize_loops_oacc_kernels::execute)
(make_pass_parallelize_loops_oacc_kernels): New function.
* tree-pass.h (make_pass_parallelize_loops_oacc_kernels): Declare.
gcc/testsuite/
* c-c++-common/goacc/kernels-loop-2.c: New test.
* c-c++-common/goacc/kernels-loop.c: New test.
* c-c++-common/goacc/kernels-loop-n.c: New test.
* c-c++-common/goacc/kernels-loop-mod-not-zero.c: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: New test.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: New test.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c:
New test.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222285 138bc75d-0d04-0410-961f-82ee72b054a4
---
gcc/ChangeLog.gomp | 17 ++
gcc/passes.def | 1 +
gcc/testsuite/ChangeLog.gomp | 5 +
gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c | 62 +++++
.../c-c++-common/goacc/kernels-loop-mod-not-zero.c | 53 ++++
gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c | 48 ++++
gcc/testsuite/c-c++-common/goacc/kernels-loop.c | 53 ++++
gcc/tree-parloops.c | 282 ++++++++++++++++----
gcc/tree-pass.h | 2 +
libgomp/ChangeLog.gomp | 9 +
.../libgomp.oacc-c-c++-common/kernels-loop-2.c | 47 ++++
.../kernels-loop-mod-not-zero.c | 41 +++
.../libgomp.oacc-c-c++-common/kernels-loop-n.c | 47 ++++
.../libgomp.oacc-c-c++-common/kernels-loop.c | 41 +++
14 files changed, 650 insertions(+), 58 deletions(-)
diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 0be9191..bf0ee52 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,22 @@
2015-04-21 Tom de Vries <tom@codesourcery.com>
+ * passes.def: Add pass_parallelize_loops_oacc_kernels in pass group
+ pass_oacc_kernels.
+ * tree-parloops.c (create_parallel_loop, gen_parallel_loop): Add
+ function parameters region_entry and bool oacc_kernels_p. Handle
+ oacc_kernels_p.
+ Call create_parallel_loop with additional args.
+ (parallelize_loops): Add function parameter oacc_kernels_p. Calculate
+ dominance info. Skip loops that are not in a kernels region. Call
+ gen_parallel_loop with additional args.
+ (pass_parallelize_loops::execute): Call parallelize_loops with false
+ argument.
+ (pass_data_parallelize_loops_oacc_kernels): New pass_data.
+ (class pass_parallelize_loops_oacc_kernels): New pass.
+ (pass_parallelize_loops_oacc_kernels::execute)
+ (make_pass_parallelize_loops_oacc_kernels): New function.
+ * tree-pass.h (make_pass_parallelize_loops_oacc_kernels): Declare.
+
* passes.def: Add pass_copy_prop to pass group pass_oacc_kernels.
* tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init
conservatively.
diff --git gcc/passes.def gcc/passes.def
index e6f1c33..2d2e286 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -94,6 +94,7 @@ along with GCC; see the file COPYING3. If not see
NEXT_PASS (pass_tree_loop_init);
NEXT_PASS (pass_lim);
NEXT_PASS (pass_copy_prop);
+ NEXT_PASS (pass_parallelize_loops_oacc_kernels);
NEXT_PASS (pass_expand_omp_ssa);
NEXT_PASS (pass_tree_loop_done);
POP_INSERT_PASSES ()
diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index 68d6d93..2c6abff 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,6 +1,11 @@
2015-04-21 Tom de Vries <tom@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>
+ * c-c++-common/goacc/kernels-loop-2.c: New test.
+ * c-c++-common/goacc/kernels-loop.c: New test.
+ * c-c++-common/goacc/kernels-loop-n.c: New test.
+ * c-c++-common/goacc/kernels-loop-mod-not-zero.c: New test.
+
* c-c++-common/restrict-2.c: Update for new pass_lim.
* c-c++-common/restrict-4.c: Same.
* g++.dg/tree-ssa/pr33615.C: Same.
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
new file mode 100644
index 0000000..ab69fe9
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
@@ -0,0 +1,62 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+ unsigned int *__restrict a;
+ unsigned int *__restrict b;
+ unsigned int *__restrict c;
+
+ a = (unsigned int *)malloc (N * sizeof (unsigned int));
+ b = (unsigned int *)malloc (N * sizeof (unsigned int));
+ c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+#pragma acc kernels copyout (a[0:N])
+ {
+ for (COUNTERTYPE i = 0; i < N; i++)
+ a[i] = i * 2;
+ }
+
+#pragma acc kernels copyout (b[0:N])
+ {
+ for (COUNTERTYPE i = 0; i < N; i++)
+ b[i] = i * 4;
+ }
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+ {
+ for (COUNTERTYPE ii = 0; ii < N; ii++)
+ c[ii] = a[ii] + b[ii];
+ }
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ if (c[i] != a[i] + b[i])
+ abort ();
+
+ free (a);
+ free (b);
+ free (c);
+
+ return 0;
+}
+
+/* Check that only three loops are analyzed, and that all can be
+ parallelized. */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function. */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
new file mode 100644
index 0000000..261d213
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
@@ -0,0 +1,53 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N ((1024 * 512) + 1)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+ unsigned int *__restrict a;
+ unsigned int *__restrict b;
+ unsigned int *__restrict c;
+
+ a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+ b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+ c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ a[i] = i * 2;
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+ {
+ for (COUNTERTYPE ii = 0; ii < N; ii++)
+ c[ii] = a[ii] + b[ii];
+ }
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ if (c[i] != a[i] + b[i])
+ abort ();
+
+ free (a);
+ free (b);
+ free (c);
+
+ return 0;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized. */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function. */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
new file mode 100644
index 0000000..7bf744e
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
@@ -0,0 +1,48 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* TODO: parallelize this example. */
+
+#include <stdlib.h>
+
+#define N ((1024 * 512) + 1)
+#define COUNTERTYPE unsigned int
+
+static int __attribute__((noinline,noclone))
+foo (COUNTERTYPE n)
+{
+ unsigned int *__restrict a;
+ unsigned int *__restrict b;
+ unsigned int *__restrict c;
+
+ a = (unsigned int *__restrict)malloc (n * sizeof (unsigned int));
+ b = (unsigned int *__restrict)malloc (n * sizeof (unsigned int));
+ c = (unsigned int *__restrict)malloc (n * sizeof (unsigned int));
+
+ for (COUNTERTYPE i = 0; i < n; i++)
+ a[i] = i * 2;
+
+ for (COUNTERTYPE i = 0; i < n; i++)
+ b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:n], b[0:n]) copyout (c[0:n])
+ {
+ for (COUNTERTYPE ii = 0; ii < n; ii++)
+ c[ii] = a[ii] + b[ii];
+ }
+
+ for (COUNTERTYPE i = 0; i < n; i++)
+ if (c[i] != a[i] + b[i])
+ abort ();
+
+ free (a);
+ free (b);
+ free (c);
+
+ return 0;
+}
+
+int
+main (void)
+{
+ return foo (N);
+}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop.c gcc/testsuite/c-c++-common/goacc/kernels-loop.c
new file mode 100644
index 0000000..2391148
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-loop.c
@@ -0,0 +1,53 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+ unsigned int *__restrict a;
+ unsigned int *__restrict b;
+ unsigned int *__restrict c;
+
+ a = (unsigned int *)malloc (N * sizeof (unsigned int));
+ b = (unsigned int *)malloc (N * sizeof (unsigned int));
+ c = (unsigned int *)malloc (N * sizeof (unsigned int));
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ a[i] = i * 2;
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+ {
+ for (COUNTERTYPE ii = 0; ii < N; ii++)
+ c[ii] = a[ii] + b[ii];
+ }
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ if (c[i] != a[i] + b[i])
+ abort ();
+
+ free (a);
+ free (b);
+ free (c);
+
+ return 0;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized. */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function. */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+
+/* { dg-final { cleanup-tree-dump "parloops_oacc_kernels" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
diff --git gcc/tree-parloops.c gcc/tree-parloops.c
index 9a233f4..e218a90 100644
--- gcc/tree-parloops.c
+++ gcc/tree-parloops.c
@@ -1612,9 +1612,10 @@ transform_to_exit_first_loop (struct loop *loop,
of LOOP_FN. N_THREADS is the requested number of threads. Returns the
basic block containing GIMPLE_OMP_PARALLEL tree. */
-static basic_block
+static void
create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
- tree new_data, unsigned n_threads, location_t loc)
+ tree new_data, unsigned n_threads, location_t loc,
+ basic_block region_entry, bool oacc_kernels_p)
{
gimple_stmt_iterator gsi;
basic_block bb, paral_bb, for_bb, ex_bb;
@@ -1631,15 +1632,69 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
/* Prepare the GIMPLE_OMP_PARALLEL statement. */
bb = loop_preheader_edge (loop)->src;
paral_bb = single_pred (bb);
- gsi = gsi_last_bb (paral_bb);
+ if (!oacc_kernels_p)
+ gsi = gsi_last_bb (paral_bb);
+ else
+ /* Make sure the oacc parallel is inserted on top of the oacc kernels
+ region. */
+ gsi = gsi_last_bb (region_entry);
- t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
- OMP_CLAUSE_NUM_THREADS_EXPR (t)
- = build_int_cst (integer_type_node, n_threads);
- omp_par_stmt = gimple_build_omp_parallel (NULL, t, loop_fn, data);
- gimple_set_location (omp_par_stmt, loc);
+ if (!oacc_kernels_p)
+ {
+ t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
+ OMP_CLAUSE_NUM_THREADS_EXPR (t)
+ = build_int_cst (integer_type_node, n_threads);
+ omp_par_stmt = gimple_build_omp_parallel (NULL, t, loop_fn, data);
+ gimple_set_location (omp_par_stmt, loc);
- gsi_insert_after (&gsi, omp_par_stmt, GSI_NEW_STMT);
+ gsi_insert_after (&gsi, omp_par_stmt, GSI_NEW_STMT);
+ }
+ else
+ {
+ /* Create oacc parallel pragma based on oacc kernels pragma. */
+ gomp_target *kernels = as_a <gomp_target *> (gsi_stmt (gsi));
+
+ tree clauses = gimple_omp_target_clauses (kernels);
+ /* FIXME: We need a more intelligent mapping onto vector, gangs,
+ workers. */
+ if (1)
+ {
+ tree clause = build_omp_clause (gimple_location (kernels),
+ OMP_CLAUSE_VECTOR_LENGTH);
+ OMP_CLAUSE_VECTOR_LENGTH_EXPR (clause)
+ = build_int_cst (integer_type_node, n_threads);
+ OMP_CLAUSE_CHAIN (clause) = clauses;
+ clauses = clause;
+ }
+ gomp_target *stmt
+ = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_PARALLEL,
+ clauses);
+ tree child_fn = gimple_omp_target_child_fn (kernels);
+ gimple_omp_target_set_child_fn (stmt, child_fn);
+ tree data_arg = gimple_omp_target_data_arg (kernels);
+ gimple_omp_target_set_data_arg (stmt, data_arg);
+
+ gimple_set_location (stmt, loc);
+
+ /* Insert oacc parallel pragma after the oacc kernels pragma. */
+ {
+ gimple_stmt_iterator gsi2;
+ gsi = gsi_last_bb (region_entry);
+ gsi2 = gsi;
+ gsi_prev (&gsi2);
+
+ /* Insert pragma acc parallel. */
+ gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+
+ /* Remove GOACC_kernels. */
+ replace_uses_by (gimple_vdef (gsi_stmt (gsi2)),
+ gimple_vuse (gsi_stmt (gsi2)));
+ gsi_remove (&gsi2, true);
+
+ /* Remove pragma acc kernels. */
+ gsi_remove (&gsi2, true);
+ }
+ }
/* Initialize NEW_DATA. */
if (data)
@@ -1657,12 +1712,18 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
}
- /* Emit GIMPLE_OMP_RETURN for GIMPLE_OMP_PARALLEL. */
- bb = split_loop_exit_edge (single_dom_exit (loop));
- gsi = gsi_last_bb (bb);
- omp_return_stmt1 = gimple_build_omp_return (false);
- gimple_set_location (omp_return_stmt1, loc);
- gsi_insert_after (&gsi, omp_return_stmt1, GSI_NEW_STMT);
+ /* Skip insertion of OMP_RETURN for oacc_kernels_p. We've already generated
+ one when lowering the oacc kernels directive in
+ pass_lower_omp/lower_omp (). */
+ if (!oacc_kernels_p)
+ {
+ /* Emit GIMPLE_OMP_RETURN for GIMPLE_OMP_PARALLEL. */
+ bb = split_loop_exit_edge (single_dom_exit (loop));
+ gsi = gsi_last_bb (bb);
+ omp_return_stmt1 = gimple_build_omp_return (false);
+ gimple_set_location (omp_return_stmt1, loc);
+ gsi_insert_after (&gsi, omp_return_stmt1, GSI_NEW_STMT);
+ }
/* Extract data for GIMPLE_OMP_FOR. */
gcc_assert (loop->header == single_dom_exit (loop)->src);
@@ -1719,7 +1780,11 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
t = build_omp_clause (loc, OMP_CLAUSE_SCHEDULE);
OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_STATIC;
- for_stmt = gimple_build_omp_for (NULL, GF_OMP_FOR_KIND_FOR, t, 1, NULL);
+ for_stmt = gimple_build_omp_for (NULL,
+ (oacc_kernels_p
+ ? GF_OMP_FOR_KIND_OACC_LOOP
+ : GF_OMP_FOR_KIND_FOR),
+ NULL_TREE, 1, NULL);
gimple_set_location (for_stmt, loc);
gimple_omp_for_set_index (for_stmt, 0, initvar);
gimple_omp_for_set_initial (for_stmt, 0, cvar_init);
@@ -1749,8 +1814,6 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
/* After the above dom info is hosed. Re-compute it. */
free_dominance_info (CDI_DOMINATORS);
calculate_dominance_info (CDI_DOMINATORS);
-
- return paral_bb;
}
/* Generates code to execute the iterations of LOOP in N_THREADS
@@ -1762,7 +1825,8 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
static void
gen_parallel_loop (struct loop *loop,
reduction_info_table_type *reduction_list,
- unsigned n_threads, struct tree_niter_desc *niter)
+ unsigned n_threads, struct tree_niter_desc *niter,
+ basic_block region_entry, bool oacc_kernels_p)
{
tree many_iterations_cond, type, nit;
tree arg_struct, new_arg_struct;
@@ -1843,41 +1907,44 @@ gen_parallel_loop (struct loop *loop,
if (stmts)
gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
- if (loop->inner)
- m_p_thread=2;
- else
- m_p_thread=MIN_PER_THREAD;
-
- many_iterations_cond =
- fold_build2 (GE_EXPR, boolean_type_node,
- nit, build_int_cst (type, m_p_thread * n_threads));
-
- many_iterations_cond
- = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
- invert_truthvalue (unshare_expr (niter->may_be_zero)),
- many_iterations_cond);
- many_iterations_cond
- = force_gimple_operand (many_iterations_cond, &stmts, false, NULL_TREE);
- if (stmts)
- gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
- if (!is_gimple_condexpr (many_iterations_cond))
+ if (!oacc_kernels_p)
{
+ if (loop->inner)
+ m_p_thread=2;
+ else
+ m_p_thread=MIN_PER_THREAD;
+
+ many_iterations_cond =
+ fold_build2 (GE_EXPR, boolean_type_node,
+ nit, build_int_cst (type, m_p_thread * n_threads));
+
many_iterations_cond
- = force_gimple_operand (many_iterations_cond, &stmts,
- true, NULL_TREE);
+ = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
+ invert_truthvalue (unshare_expr (niter->may_be_zero)),
+ many_iterations_cond);
+ many_iterations_cond
+ = force_gimple_operand (many_iterations_cond, &stmts, false, NULL_TREE);
if (stmts)
gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
+ if (!is_gimple_condexpr (many_iterations_cond))
+ {
+ many_iterations_cond
+ = force_gimple_operand (many_iterations_cond, &stmts,
+ true, NULL_TREE);
+ if (stmts)
+ gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
+ }
+
+ initialize_original_copy_tables ();
+
+ /* We assume that the loop usually iterates a lot. */
+ prob = 4 * REG_BR_PROB_BASE / 5;
+ loop_version (loop, many_iterations_cond, NULL,
+ prob, prob, REG_BR_PROB_BASE - prob, true);
+ update_ssa (TODO_update_ssa);
+ free_original_copy_tables ();
}
- initialize_original_copy_tables ();
-
- /* We assume that the loop usually iterates a lot. */
- prob = 4 * REG_BR_PROB_BASE / 5;
- loop_version (loop, many_iterations_cond, NULL,
- prob, prob, REG_BR_PROB_BASE - prob, true);
- update_ssa (TODO_update_ssa);
- free_original_copy_tables ();
-
/* Base all the induction variables in LOOP on a single control one. */
canonicalize_loop_ivs (loop, &nit, true);
@@ -1893,19 +1960,30 @@ gen_parallel_loop (struct loop *loop,
entry = loop_preheader_edge (loop);
exit = single_dom_exit (loop);
- eliminate_local_variables (entry, exit);
- /* In the old loop, move all variables non-local to the loop to a structure
- and back, and create separate decls for the variables used in loop. */
- separate_decls_in_region (entry, exit, reduction_list, &arg_struct,
- &new_arg_struct, &clsn_data);
+ /* This rewrites the body in terms of new variables. This has already
+ been done for oacc_kernels_p in pass_lower_omp/lower_omp (). */
+ if (!oacc_kernels_p)
+ {
+ eliminate_local_variables (entry, exit);
+ /* In the old loop, move all variables non-local to the loop to a
+ structure and back, and create separate decls for the variables used in
+ loop. */
+ separate_decls_in_region (entry, exit, reduction_list, &arg_struct,
+ &new_arg_struct, &clsn_data);
+ }
+ else
+ {
+ arg_struct = NULL_TREE;
+ new_arg_struct = NULL_TREE;
+ }
/* Create the parallel constructs. */
loc = UNKNOWN_LOCATION;
cond_stmt = last_stmt (loop->header);
if (cond_stmt)
loc = gimple_location (cond_stmt);
- create_parallel_loop (loop, create_loop_fn (loc), arg_struct,
- new_arg_struct, n_threads, loc);
+ create_parallel_loop (loop, create_loop_fn (loc), arg_struct, new_arg_struct,
+ n_threads, loc, region_entry, oacc_kernels_p);
if (reduction_list->elements () > 0)
create_call_for_reduction (loop, reduction_list, &clsn_data);
@@ -2145,7 +2223,7 @@ try_create_reduction_list (loop_p loop,
otherwise. */
static bool
-parallelize_loops (void)
+parallelize_loops (bool oacc_kernels_p)
{
unsigned n_threads = flag_tree_parallelize_loops;
bool changed = false;
@@ -2154,6 +2232,7 @@ parallelize_loops (void)
struct obstack parloop_obstack;
HOST_WIDE_INT estimated;
source_location loop_loc;
+ basic_block region_entry, region_exit;
/* Do not parallelize loops in the functions created by parallelization. */
if (parallelized_function_p (cfun->decl))
@@ -2165,9 +2244,46 @@ parallelize_loops (void)
reduction_info_table_type reduction_list (10);
init_stmt_vec_info_vec ();
+ calculate_dominance_info (CDI_DOMINATORS);
+
FOR_EACH_LOOP (loop, 0)
{
reduction_list.empty ();
+
+ if (oacc_kernels_p)
+ {
+ if (!loop_in_oacc_kernels_region_p (loop, ®ion_entry,
+ ®ion_exit))
+ continue;
+
+ /* TODO: Allow nested loops. */
+ if (loop->inner)
+ continue;
+
+ gcc_assert (single_succ_p (region_entry));
+ basic_block first = single_succ (region_entry);
+
+ /* TODO: Allow conditional loop entry. This test triggers when the
+ loop bound is not known at compile time. */
+ if (!single_succ_p (first))
+ continue;
+
+ /* TODO: allow more complex loops. */
+ if (single_exit (loop) == NULL)
+ continue;
+
+ /* TODO: Allow other code than a single loop inside a kernels
+ region. */
+ if (loop->header != single_succ (first)
+ || single_exit (loop)->dest != region_exit)
+ continue;
+
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+ "Trying loop %d with header bb %d in oacc kernels region\n",
+ loop->num, loop->header->index);
+ }
+
if (dump_file && (dump_flags & TDF_DETAILS))
{
fprintf (dump_file, "Trying loop %d as candidate\n",loop->num);
@@ -2209,6 +2325,7 @@ parallelize_loops (void)
/* FIXME: Bypass this check as graphite doesn't update the
count and frequency correctly now. */
if (!flag_loop_parallelize_all
+ && !oacc_kernels_p
&& ((estimated != -1
&& estimated <= (HOST_WIDE_INT) n_threads * MIN_PER_THREAD)
/* Do not bother with loops in cold areas. */
@@ -2237,8 +2354,9 @@ parallelize_loops (void)
fprintf (dump_file, "\nloop at %s:%d: ",
LOCATION_FILE (loop_loc), LOCATION_LINE (loop_loc));
}
+
gen_parallel_loop (loop, &reduction_list,
- n_threads, &niter_desc);
+ n_threads, &niter_desc, region_entry, oacc_kernels_p);
}
free_stmt_vec_info_vec ();
@@ -2289,7 +2407,7 @@ pass_parallelize_loops::execute (function *fun)
if (number_of_loops (fun) <= 1)
return 0;
- if (parallelize_loops ())
+ if (parallelize_loops (false))
{
fun->curr_properties &= ~(PROP_gimple_eomp);
return TODO_update_ssa;
@@ -2305,3 +2423,51 @@ make_pass_parallelize_loops (gcc::context *ctxt)
{
return new pass_parallelize_loops (ctxt);
}
+
+namespace {
+
+const pass_data pass_data_parallelize_loops_oacc_kernels =
+{
+ GIMPLE_PASS, /* type */
+ "parloops_oacc_kernels", /* name */
+ OPTGROUP_LOOP, /* optinfo_flags */
+ TV_TREE_PARALLELIZE_LOOPS, /* tv_id */
+ ( PROP_cfg | PROP_ssa ), /* properties_required */
+ 0, /* properties_provided */
+ 0, /* properties_destroyed */
+ 0, /* todo_flags_start */
+ 0, /* todo_flags_finish */
+};
+
+class pass_parallelize_loops_oacc_kernels : public gimple_opt_pass
+{
+public:
+ pass_parallelize_loops_oacc_kernels (gcc::context *ctxt)
+ : gimple_opt_pass (pass_data_parallelize_loops_oacc_kernels, ctxt)
+ {}
+
+ /* opt_pass methods: */
+ virtual bool gate (function *) { return flag_tree_parallelize_loops > 1; }
+ virtual unsigned int execute (function *);
+
+}; // class pass_parallelize_loops_oacc_kernels
+
+unsigned
+pass_parallelize_loops_oacc_kernels::execute (function *fun)
+{
+ if (number_of_loops (fun) <= 1)
+ return 0;
+
+ if (parallelize_loops (true))
+ return TODO_update_ssa;
+
+ return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_parallelize_loops_oacc_kernels (gcc::context *ctxt)
+{
+ return new pass_parallelize_loops_oacc_kernels (ctxt);
+}
diff --git gcc/tree-pass.h gcc/tree-pass.h
index 321229a..effcb50 100644
--- gcc/tree-pass.h
+++ gcc/tree-pass.h
@@ -375,6 +375,8 @@ extern gimple_opt_pass *make_pass_slp_vectorize (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_complete_unroll (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_complete_unrolli (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_parallelize_loops (gcc::context *ctxt);
+extern gimple_opt_pass *
+ make_pass_parallelize_loops_oacc_kernels (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index f052d3e..f6968b8 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,12 @@
+2015-04-21 Tom de Vries <tom@codesourcery.com>
+ Thomas Schwinge <thomas@codesourcery.com>
+
+ * testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: New test.
+ * testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: New test.
+ * testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: New test.
+ * testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c:
+ New test.
+
2015-03-13 Thomas Schwinge <thomas@codesourcery.com>
* testsuite/libgomp.fortran/fortran.exp (DG_TORTURE_OPTIONS): Add
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
new file mode 100644
index 0000000..0a0d754
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+ unsigned int *__restrict a;
+ unsigned int *__restrict b;
+ unsigned int *__restrict c;
+
+ a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+ b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+ c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+#pragma acc kernels copyout (a[0:N])
+ {
+ for (COUNTERTYPE i = 0; i < N; i++)
+ a[i] = i * 2;
+ }
+
+#pragma acc kernels copyout (b[0:N])
+ {
+ for (COUNTERTYPE i = 0; i < N; i++)
+ b[i] = i * 4;
+ }
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+ {
+ for (COUNTERTYPE ii = 0; ii < N; ii++)
+ c[ii] = a[ii] + b[ii];
+ }
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ if (c[i] != a[i] + b[i])
+ abort ();
+
+ free (a);
+ free (b);
+ free (c);
+
+ return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
new file mode 100644
index 0000000..fdd6256
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N ((1024 * 512) + 1)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+ unsigned int *__restrict a;
+ unsigned int *__restrict b;
+ unsigned int *__restrict c;
+
+ a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+ b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+ c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ a[i] = i * 2;
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+ {
+ for (COUNTERTYPE ii = 0; ii < N; ii++)
+ c[ii] = a[ii] + b[ii];
+ }
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ if (c[i] != a[i] + b[i])
+ abort ();
+
+ free (a);
+ free (b);
+ free (c);
+
+ return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
new file mode 100644
index 0000000..52d8e24
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N ((1024 * 512) + 1)
+#define COUNTERTYPE unsigned int
+
+static int __attribute__((noinline,noclone))
+foo (COUNTERTYPE n)
+{
+ unsigned int *__restrict a;
+ unsigned int *__restrict b;
+ unsigned int *__restrict c;
+
+ a = (unsigned int *__restrict)malloc (n * sizeof (unsigned int));
+ b = (unsigned int *__restrict)malloc (n * sizeof (unsigned int));
+ c = (unsigned int *__restrict)malloc (n * sizeof (unsigned int));
+
+ for (COUNTERTYPE i = 0; i < n; i++)
+ a[i] = i * 2;
+
+ for (COUNTERTYPE i = 0; i < n; i++)
+ b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:n], b[0:n]) copyout (c[0:n])
+ {
+ for (COUNTERTYPE ii = 0; ii < n; ii++)
+ c[ii] = a[ii] + b[ii];
+ }
+
+ for (COUNTERTYPE i = 0; i < n; i++)
+ if (c[i] != a[i] + b[i])
+ abort ();
+
+ free (a);
+ free (b);
+ free (c);
+
+ return 0;
+}
+
+int
+main (void)
+{
+ return foo (N);
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
new file mode 100644
index 0000000..294a5bf
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-ftree-parallelize-loops=32 -O2" } */
+
+#include <stdlib.h>
+
+#define N (1024 * 512)
+#define COUNTERTYPE unsigned int
+
+int
+main (void)
+{
+ unsigned int *__restrict a;
+ unsigned int *__restrict b;
+ unsigned int *__restrict c;
+
+ a = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+ b = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+ c = (unsigned int *__restrict)malloc (N * sizeof (unsigned int));
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ a[i] = i * 2;
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ b[i] = i * 4;
+
+#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
+ {
+ for (COUNTERTYPE ii = 0; ii < N; ii++)
+ c[ii] = a[ii] + b[ii];
+ }
+
+ for (COUNTERTYPE i = 0; i < N; i++)
+ if (c[i] != a[i] + b[i])
+ abort ();
+
+ free (a);
+ free (b);
+ free (c);
+
+ return 0;
+}
Grüße,
Thomas
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]
next prev parent reply other threads:[~2015-04-21 20:09 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-15 14:08 openacc kernels directive -- initial support Tom de Vries
2014-11-15 17:21 ` [PATCH, 1/8] Expand oacc kernels after pass_build_ealias Tom de Vries
2014-11-24 11:29 ` Tom de Vries
2014-11-25 11:30 ` Tom de Vries
2015-04-21 19:40 ` Expand oacc kernels after pass_fre (was: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias) Thomas Schwinge
2015-04-22 7:36 ` Richard Biener
2015-06-04 16:50 ` Expand oacc kernels after pass_fre Tom de Vries
2015-06-08 7:29 ` Richard Biener
2015-06-19 9:04 ` Tom de Vries
2015-08-05 7:24 ` [committed, gomp4] Fix release_dangling_ssa_names Tom de Vries
2015-08-05 7:29 ` Richard Biener
2015-08-05 8:48 ` Tom de Vries
2015-08-05 9:30 ` Richard Biener
2015-08-05 10:49 ` Tom de Vries
2015-08-05 11:13 ` Richard Biener
2015-08-11 9:25 ` [committed] Add todo comment for move_sese_region_to_fn Tom de Vries
2015-08-11 18:53 ` [PATCH] Don't create superfluous parm in expand_omp_taskreg Tom de Vries
2015-08-12 10:51 ` Richard Biener
2015-09-24 6:36 ` Thomas Schwinge
2015-09-24 7:21 ` Tom de Vries
2015-09-24 9:31 ` Thomas Schwinge
2015-09-30 8:05 ` [gomp4,committed] Remove release_dangling_ssa_names Tom de Vries
2015-09-30 10:05 ` Thomas Schwinge
2015-09-30 10:25 ` Tom de Vries
2015-09-30 10:43 ` Thomas Schwinge
2014-11-15 17:22 ` [PATCH, 2/8] Add pass_oacc_kernels Tom de Vries
2014-11-25 11:31 ` Tom de Vries
2015-04-21 19:46 ` Thomas Schwinge
2014-11-15 17:23 ` [PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels Tom de Vries
2014-11-25 11:42 ` Tom de Vries
2015-04-21 19:52 ` Thomas Schwinge
2015-04-22 7:40 ` Richard Biener
2015-06-02 13:52 ` Tom de Vries
2015-06-02 13:58 ` Richard Biener
2015-06-02 15:40 ` Tom de Vries
2015-06-03 11:26 ` Richard Biener
2014-11-15 17:23 ` [PATCH, 3/8] Add pass_ch_oacc_kernels " Tom de Vries
2014-11-25 11:39 ` Tom de Vries
2015-04-21 19:49 ` Thomas Schwinge
2015-04-22 7:39 ` Richard Biener
2015-06-03 9:22 ` Tom de Vries
2015-06-03 11:21 ` Richard Biener
2015-06-04 15:59 ` Tom de Vries
2015-06-03 10:05 ` Tom de Vries
2015-06-03 11:22 ` Richard Biener
2014-11-15 17:24 ` [PATCH, 5/8] Add pass_loop_im " Tom de Vries
2014-11-25 12:00 ` Tom de Vries
2015-04-21 19:57 ` [PATCH, 5/8] Add pass_lim " Thomas Schwinge
2014-11-15 18:32 ` [PATCH, 6/8] Add pass_ccp " Tom de Vries
2014-11-25 12:03 ` Tom de Vries
2015-04-21 20:01 ` [PATCH, 6/8] Add pass_copy_prop in pass_oacc_kernels Thomas Schwinge
2015-04-22 7:42 ` Richard Biener
2015-06-02 13:04 ` Tom de Vries
2014-11-15 18:52 ` [PATCH, 7/8] Add pass_parloops_oacc_kernels to pass_oacc_kernels Tom de Vries
2014-11-25 12:15 ` Tom de Vries
2015-04-21 20:09 ` Thomas Schwinge [this message]
2014-11-15 19:04 ` [PATCH, 8/8] Do simple omp lowering for no address taken var Tom de Vries
2014-11-17 10:29 ` Richard Biener
2014-11-18 9:13 ` Eric Botcazou
2014-11-18 9:53 ` Richard Biener
2014-11-18 12:20 ` Richard Biener
2014-11-24 11:53 ` Tom de Vries
2014-11-24 11:55 ` Tom de Vries
2014-11-24 12:42 ` Richard Biener
2014-11-24 18:49 ` Tom de Vries
2014-11-24 12:40 ` Richard Biener
2014-11-19 20:34 ` openacc kernels directive -- initial support Tom de Vries
2015-04-21 19:27 ` Add BUILT_IN_GOACC_KERNELS_INTERNAL (was: openacc kernels directive -- initial support) Thomas Schwinge
2015-04-21 20:24 ` Handle global loop counters in fortran oacc kernels " Thomas Schwinge
2015-04-21 20:29 ` Handle global loop counters in c/c++ " Thomas Schwinge
2015-04-21 20:33 ` Handle oacc kernels with other directives " Thomas Schwinge
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87sibtz1ka.fsf@kepler.schwinge.homeip.net \
--to=thomas@codesourcery.com \
--cc=Tom_deVries@mentor.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=jakub@redhat.com \
--cc=rguenther@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).