From: Cesar Philippidis <cesar@codesourcery.com>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: Tom de Vries <Tom_deVries@mentor.com>
Subject: [og7] vector_length extension part 3: reductions
Date: Fri, 02 Mar 2018 17:51:00 -0000
Message-ID: <d00ad9ed-18c4-d8b2-603e-1e1993ccf07b@codesourcery.com>
In-Reply-To: <d6642c62-6d01-10ce-dda2-f5fa453ed971@codesourcery.com>
[-- Attachment #1: Type: text/plain, Size: 2551 bytes --]
This patch teaches the nvptx BE how to process vector reductions with
large vector lengths. The original vector reduction finalizer won't work
because it uses warp shuffle operations, which only exchange data within
a single warp. Now that vectors may contain multiple warps, they need to
store the partial reductions in shared memory, like workers do. Once the
reduction variable is placed in shared memory, it is updated with the
same atomic finalizer that the workers use.
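As a rough illustration of the shared-memory path described above (this
is not code from the patch), the following host-side C sketch simulates
one vector whose threads fold their partial values into a single
shared cell with a compare-and-swap loop. The names red_cell,
atomic_finalize_add and simulate_vector_reduction are hypothetical, and
the use of C11 atomics is only an assumption standing in for the PTX
atomics the real finalizer emits.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for one reduction variable in shared memory.  */
typedef _Atomic int red_cell;

/* Lockless update: retry the compare-and-swap until no other thread
   modified the cell between our read and our write.  */
static void
atomic_finalize_add (red_cell *cell, int partial)
{
  int expected = atomic_load (cell);
  while (!atomic_compare_exchange_weak (cell, &expected, expected + partial))
    ; /* On failure, EXPECTED is refreshed with the current value.  */
}

/* Sequential simulation of one vector of VECTOR_LENGTH threads, each
   contributing PARTIALS[tid] to a cell seeded with INCOMING.  */
static int
simulate_vector_reduction (int incoming, const int *partials,
			   int vector_length)
{
  red_cell cell;
  atomic_init (&cell, incoming);	/* setup: store the incoming value */
  for (int tid = 0; tid < vector_length; tid++)
    atomic_finalize_add (&cell, partials[tid]);	/* fini: atomic update */
  return atomic_load (&cell);		/* teardown: read the result back */
}
```

Because every thread funnels through the same cell, the number of
updates grows linearly with vector_length.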
Much like the shared-memory spill-and-fill vector state propagation
extension, the nvptx BE needs to reserve enough shared memory for each
worker that may encounter a vector reduction. That's why the reduction
functions have been augmented with an offload_attrs argument. The
offload_attrs struct contains a max_workers field. Unlike vector_length,
which is fixed as a compile-time constant, num_workers can be altered
dynamically at runtime. Given that the maximum size of a CUDA block is
fixed, max_workers is set to max_block_size / vector_length. This will be
discussed further in the next patch.
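To make the max_workers arithmetic concrete, here is a small C sketch.
The constants SKETCH_MAX_BLOCK_SIZE (1024) and SKETCH_WARP_SIZE (32) and
the helper sketch_max_workers are assumptions for illustration only; the
real values come from the nvptx BE, and the requirement that
vector_length be a multiple of the warp size is likewise assumed here.

```c
#include <assert.h>

/* Assumed values for illustration; not taken from the patch.  */
#define SKETCH_MAX_BLOCK_SIZE 1024
#define SKETCH_WARP_SIZE 32

/* Upper bound on the number of workers that fit in one CUDA block for
   a given compile-time vector_length.  */
static unsigned
sketch_max_workers (unsigned vector_length)
{
  assert (vector_length >= SKETCH_WARP_SIZE
	  && vector_length <= SKETCH_MAX_BLOCK_SIZE);
  assert (vector_length % SKETCH_WARP_SIZE == 0);
  return SKETCH_MAX_BLOCK_SIZE / vector_length;
}
```

Under these assumptions, a vector_length of 32 leaves room for 32
workers, 128 leaves room for 8, and 1024 leaves room for only 1.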
Effectively, the nvptx BE will now maintain a shared-memory reduction
buffer, named vector_red_sym, that contains max_workers logical
reduction partitions, where each partition contains enough shared memory
for all of the reductions used by a single vector. By design, OpenACC
reductions are expanded relatively early during oaccdevlow. Because
accessing the reduction partition is a common operation, the partition
offset is placed in a register stored in cfun->machine->red_partition and
initialized in nvptx_init_axis_predicate. Due to how late that register
becomes available, nvptx_expand_shared_addr emits a
gen_nvptx_red_partition instruction to acquire the shared-memory address
for the reduction variable indirectly.
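The buffer layout can be sketched in plain C as follows. The two
rounding expressions mirror the ones in nvptx_expand_shared_addr in the
attached patch; everything else is hypothetical. In particular,
sketch_var_offset assumes that a worker's partition starts at
worker_id * partition_size, which is my reading of the design rather
than something this patch shows.

```c
#include <assert.h>

/* Hypothetical summary of the sizes tracked for the vector buffer.  */
struct sketch_red_layout
{
  unsigned partition_size;	/* bytes reserved per worker */
  unsigned buffer_size;		/* bytes reserved for all workers */
};

/* Round VALUE up to a multiple of ALIGN, which must be a power of 2.  */
static unsigned
sketch_round_up (unsigned value, unsigned align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (value + align - 1) & ~(align - 1);
}

/* Compute the layout for a reduction variable of SIZE bytes placed at
   OFFSET within a partition, using the same arithmetic as the patch.  */
static struct sketch_red_layout
sketch_layout (unsigned offset, unsigned size, unsigned align,
	       unsigned max_workers)
{
  struct sketch_red_layout l;
  l.partition_size = sketch_round_up (size + offset, align);
  l.buffer_size = sketch_round_up ((size + offset) * max_workers, align);
  return l;
}

/* Byte offset of the variable within the whole buffer for WORKER_ID,
   assuming partitions are laid out back to back.  */
static unsigned
sketch_var_offset (const struct sketch_red_layout *l, unsigned worker_id,
		   unsigned offset)
{
  return worker_id * l->partition_size + offset;
}
```

For example, an 8-byte variable at offset 8 with 8-byte alignment and
max_workers of 8 gives 16-byte partitions in a 128-byte buffer.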
You may notice a hack in nvptx_declare_function_name. I observed that
GCC will sometimes mark red_partition as dead and not emit PTX code to
declare it. That's why nvptx_declare_function_name manually inserts it
into regno_reg_rtx prior to declaring all of the PTX registers. I think
there might be something wrong with the nvptx_red_partition instruction.
Tom, can you take a look at it?
Ultimately, I suspect that large vectors would greatly benefit from
a new parallel tree reduction finalizer. Whereas the atomic finalizer
may have been suitable for a maximum of 32 workers, vector_length can be
up to 1024 threads, and a sequential finalizer will be slow. However,
that's a project for another day.
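For reference, the general shape of a tree reduction is sketched below
as a host-side C simulation. It is not part of this patch and not a
proposed implementation; sketch_tree_reduce is a hypothetical name, and
the inner loop stands in for work that GPU threads would perform in
parallel with a barrier between steps.

```c
#include <assert.h>

/* Reduce the N partial sums in PARTIALS into PARTIALS[0], halving the
   number of active slots on each step.  N must be a power of 2.
   Returns the number of steps taken.  */
static unsigned
sketch_tree_reduce (int *partials, unsigned n)
{
  assert (n != 0 && (n & (n - 1)) == 0);
  unsigned steps = 0;
  for (unsigned stride = n / 2; stride > 0; stride /= 2)
    {
      /* On a GPU, each iteration of this loop would be one thread,
	 and a barrier would follow the loop.  */
      for (unsigned tid = 0; tid < stride; tid++)
	partials[tid] += partials[tid + stride];
      steps++;
    }
  return steps;
}
```

With 1024 partial values this takes 10 steps, compared with 1024
serialized updates when every thread updates the same cell.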
I'll commit this patch to openacc-gcc-7-branch after Tom reviews the new
nvptx_red_partition insn.
Cesar
[-- Attachment #2: og7-vl-part3-reductions.diff --]
[-- Type: text/x-patch, Size: 14258 bytes --]
2018-03-02 Cesar Philippidis <cesar@codesourcery.com>
gcc/
* config/nvptx/nvptx-protos.h (nvptx_output_red_partition): Declare.
* config/nvptx/nvptx.c (vector_red_size, vector_red_align,
vector_red_partition, vector_red_sym): New global variables.
(nvptx_option_override): Initialize vector_red_sym.
(nvptx_declare_function_name): Restore red_partition register.
(nvptx_file_end): Emit code to declare the vector reduction variables.
(nvptx_output_red_partition): New function.
(nvptx_expand_shared_addr): Add vector argument. Use it to handle
large vector reductions.
(enum nvptx_builtins): Add NVPTX_BUILTIN_VECTOR_ADDR.
(nvptx_init_builtins): Add VECTOR_ADDR.
(nvptx_expand_builtin): Update call to nvptx_expand_shared_addr.
Handle NVPTX_BUILTIN_VECTOR_ADDR.
(nvptx_get_shared_red_addr): Add vector argument and handle large
vectors.
(nvptx_goacc_reduction_setup): Add offload_attrs argument and handle
large vectors.
(nvptx_goacc_reduction_init): Likewise.
(nvptx_goacc_reduction_fini): Likewise.
(nvptx_goacc_reduction_teardown): Likewise.
(nvptx_goacc_reduction): Update calls to nvptx_goacc_reduction_{setup,
init,fini,teardown}.
* config/nvptx/nvptx.md (UNSPECV_RED_PART): New unspecv.
(nvptx_red_partition): New insn.
From 3834101d5144666f30d8798e983e276bd2c66636 Mon Sep 17 00:00:00 2001
From: Cesar Philippidis <cesar@codesourcery.com>
Date: Fri, 2 Mar 2018 07:36:11 -0800
Subject: [PATCH] reductions
---
gcc/config/nvptx/nvptx-protos.h | 1 +
gcc/config/nvptx/nvptx.c | 146 +++++++++++++++++++++++++++++++---------
gcc/config/nvptx/nvptx.md | 12 ++++
3 files changed, 128 insertions(+), 31 deletions(-)
diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
index 16b316f12b8..326c38c5dc7 100644
--- a/gcc/config/nvptx/nvptx-protos.h
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -55,5 +55,6 @@ extern const char *nvptx_output_return (void);
extern const char *nvptx_output_set_softstack (unsigned);
extern const char *nvptx_output_simt_enter (rtx, rtx, rtx);
extern const char *nvptx_output_simt_exit (rtx);
+extern const char *nvptx_output_red_partition (rtx, rtx);
#endif
#endif
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 4a48d44f44c..9d77176c638 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -142,6 +142,14 @@ static unsigned worker_red_size;
static unsigned worker_red_align;
static GTY(()) rtx worker_red_sym;
+/* Buffer needed for vector reductions, when vector_length >
+ PTX_WARP_SIZE. This has to be distinct from the worker broadcast
+ array, as both may be live concurrently. */
+static unsigned vector_red_size;
+static unsigned vector_red_align;
+static unsigned vector_red_partition;
+static GTY(()) rtx vector_red_sym;
+
/* Shared memory block for gang-private variables. */
static unsigned gangprivate_shared_size;
static unsigned gangprivate_shared_align;
@@ -215,6 +223,10 @@ nvptx_option_override (void)
SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+ vector_red_sym = gen_rtx_SYMBOL_REF (Pmode, "__vector_red");
+ SET_SYMBOL_DATA_AREA (vector_red_sym, DATA_AREA_SHARED);
+ vector_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
gangprivate_shared_sym = gen_rtx_SYMBOL_REF (Pmode, "__gangprivate_shared");
SET_SYMBOL_DATA_AREA (gangprivate_shared_sym, DATA_AREA_SHARED);
gangprivate_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
@@ -1296,6 +1308,12 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
fprintf (file, "\t.local.align 8 .b8 %%simtstack_ar["
HOST_WIDE_INT_PRINT_DEC "];\n", simtsz);
}
+
+ /* Restore the vector reduction partition register, if necessary. */
+ if (cfun->machine->red_partition)
+ regno_reg_rtx[REGNO (cfun->machine->red_partition)]
+ = cfun->machine->red_partition;
+
/* Declare the pseudos we have as ptx registers. */
int maxregs = max_reg_num ();
for (int i = LAST_VIRTUAL_REGISTER + 1; i < maxregs; i++)
@@ -4732,6 +4750,10 @@ nvptx_file_end (void)
write_shared_buffer (asm_out_file, worker_red_sym,
worker_red_align, worker_red_size);
+ if (vector_red_size)
+ write_shared_buffer (asm_out_file, vector_red_sym,
+ vector_red_align, vector_red_size);
+
if (gangprivate_shared_size)
write_shared_buffer (asm_out_file, gangprivate_shared_sym,
gangprivate_shared_align, gangprivate_shared_size);
@@ -4781,33 +4803,78 @@ nvptx_expand_shuffle (tree exp, rtx target, machine_mode mode, int ignore)
return target;
}
-/* Worker reduction address expander. */
+const char *
+nvptx_output_red_partition (rtx dst, rtx offset)
+{
+ const char *zero_offset = "\t\tmov.u64\t%%r%d, %%r%d; // vred buffer\n";
+ const char *with_offset = "\t\tadd.u64\t%%r%d, %%r%d, %d; // vred buffer\n";
+
+ if (offset == const0_rtx)
+ fprintf (asm_out_file, zero_offset, REGNO (dst),
+ REGNO (cfun->machine->red_partition));
+ else
+ fprintf (asm_out_file, with_offset, REGNO (dst),
+ REGNO (cfun->machine->red_partition), UINTVAL (offset));
+
+ return "";
+}
+
+/* Shared-memory reduction address expander. */
static rtx
nvptx_expand_shared_addr (tree exp, rtx target,
- machine_mode ARG_UNUSED (mode), int ignore)
+ machine_mode ARG_UNUSED (mode), int ignore,
+ int vector)
{
if (ignore)
return target;
unsigned align = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 2));
- if (align > worker_red_align)
- worker_red_align = align;
-
unsigned offset = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 0));
unsigned size = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1));
- if (size + offset > worker_red_size)
- worker_red_size = size + offset;
-
rtx addr = worker_red_sym;
- if (offset)
+
+ if (vector)
{
- addr = gen_rtx_PLUS (Pmode, addr, GEN_INT (offset));
- addr = gen_rtx_CONST (Pmode, addr);
+ offload_attrs oa;
+ unsigned new_size = size + offset;
+
+ populate_offload_attrs (&oa);
+
+ new_size = (new_size * oa.max_workers + align - 1) & ~(align - 1);
+
+ if (align > vector_red_align)
+ vector_red_align = align;
+
+ if (cfun->machine->red_partition == NULL)
+ cfun->machine->red_partition = gen_reg_rtx (Pmode);
+
+ if (new_size > vector_red_size)
+ {
+ int partition_size = (size + offset + align - 1) & ~(align -1);
+ vector_red_size = new_size;
+ vector_red_partition = partition_size;
+ }
+
+ addr = gen_reg_rtx (Pmode);
+ emit_insn (gen_nvptx_red_partition (addr, GEN_INT (offset)));
}
+ else
+ {
+ if (align > worker_red_align)
+ worker_red_align = align;
- emit_move_insn (target, addr);
+ if (size + offset > worker_red_size)
+ worker_red_size = size + offset;
+ if (offset)
+ {
+ addr = gen_rtx_PLUS (Pmode, addr, GEN_INT (offset));
+ addr = gen_rtx_CONST (Pmode, addr);
+ }
+ }
+
+ emit_move_insn (target, addr);
return target;
}
@@ -4874,6 +4941,7 @@ enum nvptx_builtins
NVPTX_BUILTIN_SHUFFLE,
NVPTX_BUILTIN_SHUFFLELL,
NVPTX_BUILTIN_WORKER_ADDR,
+ NVPTX_BUILTIN_VECTOR_ADDR,
NVPTX_BUILTIN_CMP_SWAP,
NVPTX_BUILTIN_CMP_SWAPLL,
NVPTX_BUILTIN_COND_UNI,
@@ -4912,6 +4980,8 @@ nvptx_init_builtins (void)
DEF (SHUFFLELL, "shufflell", (LLUINT, LLUINT, UINT, UINT, NULL_TREE));
DEF (WORKER_ADDR, "worker_addr",
(PTRVOID, ST, UINT, UINT, NULL_TREE));
+ DEF (VECTOR_ADDR, "vector_addr",
+ (PTRVOID, ST, UINT, UINT, NULL_TREE));
DEF (CMP_SWAP, "cmp_swap", (UINT, PTRVOID, UINT, UINT, NULL_TREE));
DEF (CMP_SWAPLL, "cmp_swapll", (LLUINT, PTRVOID, LLUINT, LLUINT, NULL_TREE));
DEF (COND_UNI, "cond_uni", (integer_type_node, integer_type_node, NULL_TREE));
@@ -4941,7 +5011,10 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
return nvptx_expand_shuffle (exp, target, mode, ignore);
case NVPTX_BUILTIN_WORKER_ADDR:
- return nvptx_expand_shared_addr (exp, target, mode, ignore);
+ return nvptx_expand_shared_addr (exp, target, mode, ignore, false);
+
+ case NVPTX_BUILTIN_VECTOR_ADDR:
+ return nvptx_expand_shared_addr (exp, target, mode, ignore, true);
case NVPTX_BUILTIN_CMP_SWAP:
case NVPTX_BUILTIN_CMP_SWAPLL:
@@ -5197,10 +5270,13 @@ nvptx_goacc_fork_join (gcall *call, const int dims[],
data at that location. */
static tree
-nvptx_get_shared_red_addr (tree type, tree offset)
+nvptx_get_shared_red_addr (tree type, tree offset, bool vector)
{
+ enum nvptx_builtins addr_dim = NVPTX_BUILTIN_WORKER_ADDR;
+ if (vector)
+ addr_dim = NVPTX_BUILTIN_VECTOR_ADDR;
machine_mode mode = TYPE_MODE (type);
- tree fndecl = nvptx_builtin_decl (NVPTX_BUILTIN_WORKER_ADDR, true);
+ tree fndecl = nvptx_builtin_decl (addr_dim, true);
tree size = build_int_cst (unsigned_type_node, GET_MODE_SIZE (mode));
tree align = build_int_cst (unsigned_type_node,
GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT);
@@ -5631,7 +5707,7 @@ nvptx_adjust_reduction_type (tree var, tree type, gimple_seq *seq)
/* NVPTX implementation of GOACC_REDUCTION_SETUP. */
static void
-nvptx_goacc_reduction_setup (gcall *call)
+nvptx_goacc_reduction_setup (gcall *call, offload_attrs *oa)
{
gimple_stmt_iterator gsi = gsi_for_stmt (call);
tree lhs = gimple_call_lhs (call);
@@ -5654,11 +5730,13 @@ nvptx_goacc_reduction_setup (gcall *call)
}
}
- if (level == GOMP_DIM_WORKER)
+ if (level == GOMP_DIM_WORKER
+ || (level == GOMP_DIM_VECTOR && oa->vector_length > PTX_WARP_SIZE))
{
/* Store incoming value to worker reduction buffer. */
tree offset = gimple_call_arg (call, 5);
- tree call = nvptx_get_shared_red_addr (TREE_TYPE (var), offset);
+ tree call = nvptx_get_shared_red_addr (TREE_TYPE (var), offset,
+ level == GOMP_DIM_VECTOR);
tree ptr = make_ssa_name (TREE_TYPE (call));
gimplify_assign (ptr, call, &seq);
@@ -5677,7 +5755,7 @@ nvptx_goacc_reduction_setup (gcall *call)
/* NVPTX implementation of GOACC_REDUCTION_INIT. */
static void
-nvptx_goacc_reduction_init (gcall *call)
+nvptx_goacc_reduction_init (gcall *call, offload_attrs *oa)
{
gimple_stmt_iterator gsi = gsi_for_stmt (call);
tree lhs = gimple_call_lhs (call);
@@ -5691,7 +5769,7 @@ nvptx_goacc_reduction_init (gcall *call)
push_gimplify_context (true);
- if (level == GOMP_DIM_VECTOR)
+ if (level == GOMP_DIM_VECTOR && oa->vector_length == PTX_WARP_SIZE)
{
/* Initialize vector-non-zeroes to INIT_VAL (OP). */
tree tid = make_ssa_name (integer_type_node);
@@ -5763,7 +5841,7 @@ nvptx_goacc_reduction_init (gcall *call)
/* NVPTX implementation of GOACC_REDUCTION_FINI. */
static void
-nvptx_goacc_reduction_fini (gcall *call)
+nvptx_goacc_reduction_fini (gcall *call, offload_attrs *oa)
{
gimple_stmt_iterator gsi = gsi_for_stmt (call);
tree lhs = gimple_call_lhs (call);
@@ -5777,17 +5855,18 @@ nvptx_goacc_reduction_fini (gcall *call)
push_gimplify_context (true);
- if (level == GOMP_DIM_VECTOR)
+ if (level == GOMP_DIM_VECTOR && oa->vector_length == PTX_WARP_SIZE)
r = nvptx_vector_reduction (gimple_location (call), &gsi, var, op);
else
{
tree accum = NULL_TREE;
- if (level == GOMP_DIM_WORKER)
+ if (level == GOMP_DIM_WORKER || level == GOMP_DIM_VECTOR)
{
/* Get reduction buffer address. */
tree offset = gimple_call_arg (call, 5);
- tree call = nvptx_get_shared_red_addr (TREE_TYPE (var), offset);
+ tree call = nvptx_get_shared_red_addr (TREE_TYPE (var), offset,
+ level == GOMP_DIM_VECTOR);
tree ptr = make_ssa_name (TREE_TYPE (call));
gimplify_assign (ptr, call, &seq);
@@ -5822,7 +5901,7 @@ nvptx_goacc_reduction_fini (gcall *call)
/* NVPTX implementation of GOACC_REDUCTION_TEARDOWN. */
static void
-nvptx_goacc_reduction_teardown (gcall *call)
+nvptx_goacc_reduction_teardown (gcall *call, offload_attrs *oa)
{
gimple_stmt_iterator gsi = gsi_for_stmt (call);
tree lhs = gimple_call_lhs (call);
@@ -5831,11 +5910,13 @@ nvptx_goacc_reduction_teardown (gcall *call)
gimple_seq seq = NULL;
push_gimplify_context (true);
- if (level == GOMP_DIM_WORKER)
+ if (level == GOMP_DIM_WORKER
+ || (level == GOMP_DIM_VECTOR && oa->vector_length > PTX_WARP_SIZE))
{
/* Read the worker reduction buffer. */
tree offset = gimple_call_arg (call, 5);
- tree call = nvptx_get_shared_red_addr(TREE_TYPE (var), offset);
+ tree call = nvptx_get_shared_red_addr(TREE_TYPE (var), offset,
+ level == GOMP_DIM_VECTOR);
tree ptr = make_ssa_name (TREE_TYPE (call));
gimplify_assign (ptr, call, &seq);
@@ -5870,23 +5951,26 @@ static void
nvptx_goacc_reduction (gcall *call)
{
unsigned code = (unsigned)TREE_INT_CST_LOW (gimple_call_arg (call, 0));
+ offload_attrs oa;
+
+ populate_offload_attrs (&oa);
switch (code)
{
case IFN_GOACC_REDUCTION_SETUP:
- nvptx_goacc_reduction_setup (call);
+ nvptx_goacc_reduction_setup (call, &oa);
break;
case IFN_GOACC_REDUCTION_INIT:
- nvptx_goacc_reduction_init (call);
+ nvptx_goacc_reduction_init (call, &oa);
break;
case IFN_GOACC_REDUCTION_FINI:
- nvptx_goacc_reduction_fini (call);
+ nvptx_goacc_reduction_fini (call, &oa);
break;
case IFN_GOACC_REDUCTION_TEARDOWN:
- nvptx_goacc_reduction_teardown (call);
+ nvptx_goacc_reduction_teardown (call, &oa);
break;
default:
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index ac7b7cc8440..28ae263c867 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -66,6 +66,8 @@
UNSPECV_SIMT_ENTER
UNSPECV_SIMT_EXIT
+
+ UNSPECV_RED_PART
])
(define_attr "subregs_ok" "false,true"
@@ -1427,3 +1429,13 @@
""
"\\t.pragma \\\"nounroll\\\";"
[(set_attr "predicable" "false")])
+
+(define_insn "nvptx_red_partition"
+ [(set (match_operand:DI 0 "nonimmediate_operand" "=R")
+ (unspec_volatile [(match_operand:DI 1 "const_int_operand")]
+ UNSPECV_RED_PART))]
+ ""
+ {
+ return nvptx_output_red_partition (operands[0], operands[1]);
+ }
+ [(set_attr "predicable" "false")])
--
2.14.3