* [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack) @ 2016-04-20 17:00 Alexander Monakov 2016-04-21 14:08 ` Nathan Sidwell 0 siblings, 1 reply; 10+ messages in thread From: Alexander Monakov @ 2016-04-20 17:00 UTC (permalink / raw) To: gcc-patches; +Cc: Nathan Sidwell This patch implements per-warp compiler-defined stacks under -msoft-stack option, and implements alloca on top of that. In a few obvious places, changes from -muniform-simt patch are present in the hunks. Previously posted here: [PATCH] nvptx: implement automatic storage in custom stacks https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01519.html [gomp-nvptx] nvptx backend: implement alloca with -msoft-stack https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01397.html [gomp-nvptx 7/7] nvptx backend: define STACK_SIZE_MODE https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01108.html 2016-03-15 Alexander Monakov <amonakov@ispras.ru> * config/nvptx/nvptx.h (STACK_SIZE_MODE): Define. 2015-12-14 Alexander Monakov <amonakov@ispras.ru> * config/nvptx/nvptx.c (nvptx_declare_function_name): Emit %outargs using .local %outargs_ar only if not TARGET_SOFT_STACK. Emit %outargs under TARGET_SOFT_STACK by offsetting from %frame. (nvptx_get_drap_rtx): Return %argp as the DRAP if needed. * config/nvptx/nvptx.md (nvptx_register_operand): Allow %outargs under TARGET_SOFT_STACK. (nvptx_nonimmediate_operand): Ditto. (allocate_stack): Implement for TARGET_SOFT_STACK. Remove unused code. (allocate_stack_<mode>): Remove unused pattern. (set_softstack_insn): New pattern. (restore_stack_block): Handle for TARGET_SOFT_STACK. 2015-12-09 Alexander Monakov <amonakov@ispras.ru> * config/nvptx/nvptx.c: (need_softstack_decl): New variable. (nvptx_declare_function_name): Handle TARGET_SOFT_STACK. (nvptx_output_return): Emit stack restore if needed. (nvptx_file_end): Handle need_softstack_decl. * config/nvptx/nvptx.h: (TARGET_CPU_CPP_BUILTINS): Define __nvptx_softstack__ when -msoft-stack is active. (struct machine_function): New bool field using_softstack. * config/nvptx/nvptx.opt: (msoft-stack): New option. * doc/invoke.texi (msoft-stack): Document. diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 2d4dad1..e9e4d06 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -139,6 +129,12 @@ static GTY(()) rtx worker_red_sym; /* Global lock variable, needed for 128bit worker & gang reductions. */ static GTY(()) tree global_lock_var; +/* True if any function references __nvptx_stacks. */ +static bool need_softstack_decl; + +/* True if any function references __nvptx_uni. */ +static bool need_unisimt_decl; + /* Allocate a new, cleared machine_function structure. */ static struct machine_function * @@ -992,16 +1086,55 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) fprintf (file, "%s", s.str().c_str()); - /* Declare a local var for outgoing varargs. */ - if (cfun->machine->has_varadic) - init_frame (file, STACK_POINTER_REGNUM, - UNITS_PER_WORD, crtl->outgoing_args_size); - - /* Declare a local variable for the frame. */ HOST_WIDE_INT sz = get_frame_size (); - if (sz || cfun->machine->has_chain) - init_frame (file, FRAME_POINTER_REGNUM, - crtl->stack_alignment_needed / BITS_PER_UNIT, sz); + bool need_frameptr = sz || cfun->machine->has_chain; + int alignment = crtl->stack_alignment_needed / BITS_PER_UNIT; + if (!TARGET_SOFT_STACK) + { + /* Declare a local var for outgoing varargs. */ + if (cfun->machine->has_varadic) + init_frame (file, STACK_POINTER_REGNUM, + UNITS_PER_WORD, crtl->outgoing_args_size); + + /* Declare a local variable for the frame. */ + if (need_frameptr) + init_frame (file, FRAME_POINTER_REGNUM, alignment, sz); + } + else if (need_frameptr || cfun->machine->has_varadic || cfun->calls_alloca) + { + /* Maintain 64-bit stack alignment. */ + int keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT; + sz = ROUND_UP (sz, keep_align); + int bits = POINTER_SIZE; + fprintf (file, "\t.reg.u%d %%frame;\n", bits); + fprintf (file, "\t.reg.u32 %%fstmp0;\n"); + fprintf (file, "\t.reg.u%d %%fstmp1;\n", bits); + fprintf (file, "\t.reg.u%d %%fstmp2;\n", bits); + fprintf (file, "\tmov.u32 %%fstmp0, %%tid.y;\n"); + fprintf (file, "\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n", + bits == 64 ? ".wide" : ".lo", bits / 8); + fprintf (file, "\tmov.u%d %%fstmp2, __nvptx_stacks;\n", bits); + /* fstmp2 = &__nvptx_stacks[tid.y]; */ + fprintf (file, "\tadd.u%d %%fstmp2, %%fstmp2, %%fstmp1;\n", bits); + fprintf (file, "\tld.shared.u%d %%fstmp1, [%%fstmp2];\n", bits); + fprintf (file, "\tsub.u%d %%frame, %%fstmp1, " + HOST_WIDE_INT_PRINT_DEC ";\n", bits, sz); + if (alignment > keep_align) + fprintf (file, "\tand.b%d %%frame, %%frame, %d;\n", + bits, -alignment); + fprintf (file, "\t.reg.u%d %%stack;\n", bits); + sz = crtl->outgoing_args_size; + gcc_assert (sz % keep_align == 0); + fprintf (file, "\tsub.u%d %%stack, %%frame, " + HOST_WIDE_INT_PRINT_DEC ";\n", bits, sz); + /* crtl->is_leaf is not initialized because RA is not run. */ + if (!leaf_function_p ()) + { + fprintf (file, "\tst.shared.u%d [%%fstmp2], %%stack;\n", bits); + cfun->machine->using_softstack = true; + } + need_softstack_decl = true; + } /* Declare the pseudos we have as ptx registers. */ int maxregs = max_reg_num (); @@ -1037,6 +1172,10 @@ nvptx_output_return (void) { machine_mode mode = (machine_mode)cfun->machine->return_mode; + if (cfun->machine->using_softstack) + fprintf (asm_out_file, "\tst.shared.u%d [%%fstmp2], %%fstmp1;\n", + POINTER_SIZE); + if (mode != VOIDmode) fprintf (asm_out_file, "\tst.param%s\t[%s_out], %s;\n", nvptx_ptx_type_from_mode (mode, false), @@ -1068,6 +1207,8 @@ nvptx_function_ok_for_sibcall (tree, tree) static rtx nvptx_get_drap_rtx (void) { + if (TARGET_SOFT_STACK && stack_realign_drap) + return arg_pointer_rtx; return NULL_RTX; } @@ -3939,6 +4183,18 @@ nvptx_file_end (void) if (worker_red_size) write_worker_buffer (asm_out_file, worker_red_sym, worker_red_align, worker_red_size); + + if (need_softstack_decl) + { + write_var_marker (asm_out_file, false, true, "__nvptx_stacks"); + fprintf (asm_out_file, ".extern .shared .u%d __nvptx_stacks[32];\n", + POINTER_SIZE); + } + if (need_unisimt_decl) + { + write_var_marker (asm_out_file, false, true, "__nvptx_uni"); + fprintf (asm_out_file, ".extern .shared .u32 __nvptx_uni[32];\n"); + } } /* Expander for the shuffle builtins. */ diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h index 381269e..6da4d06 100644 --- a/gcc/config/nvptx/nvptx.h +++ b/gcc/config/nvptx/nvptx.h @@ -31,6 +31,10 @@ builtin_assert ("machine=nvptx"); \ builtin_assert ("cpu=nvptx"); \ builtin_define ("__nvptx__"); \ + if (TARGET_SOFT_STACK) \ + builtin_define ("__nvptx_softstack__"); \ + if (TARGET_UNIFORM_SIMT) \ + builtin_define ("__nvptx_unisimt__"); \ } while (0) /* Avoid the default in ../../gcc.c, which adds "-pthread", which is not @@ -79,6 +83,7 @@ #define POINTER_SIZE (TARGET_ABI64 ? 64 : 32) #define Pmode (TARGET_ABI64 ? DImode : SImode) +#define STACK_SIZE_MODE Pmode /* Registers. Since ptx is a virtual target, we just define a few hard registers for special purposes and leave pseudos unallocated. @@ -200,10 +205,13 @@ struct GTY(()) machine_function bool is_varadic; /* This call is varadic */ bool has_varadic; /* Current function has a varadic call. */ bool has_chain; /* Current function has outgoing static chain. */ + bool using_softstack; /* Need to restore __nvptx_stacks[tid.y]. */ int num_args; /* Number of args of current call. */ int return_mode; /* Return mode of current fn. (machine_mode not defined yet.) */ rtx axis_predicate[2]; /* Neutering predicates. */ + rtx unisimt_master; /* Master lane index for "uniform simt" mode. */ + rtx unisimt_predicate; /* Predicate register for "uniform simt". */ }; #endif \f diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 33a4862..e5650b6 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -961,31 +986,41 @@ (define_expand "allocate_stack" (match_operand 1 "nvptx_register_operand")] "" { + if (TARGET_SOFT_STACK) + { + emit_move_insn (stack_pointer_rtx, + gen_rtx_MINUS (Pmode, stack_pointer_rtx, operands[1])); + emit_insn (gen_set_softstack_insn (stack_pointer_rtx)); + emit_move_insn (operands[0], virtual_stack_dynamic_rtx); + DONE; + } /* The ptx documentation specifies an alloca intrinsic (for 32 bit only) but notes it is not implemented. The assembler emits a confused error message. Issue a blunt one now instead. */ sorry ("target cannot support alloca."); emit_insn (gen_nop ()); DONE; - if (TARGET_ABI64) - emit_insn (gen_allocate_stack_di (operands[0], operands[1])); - else - emit_insn (gen_allocate_stack_si (operands[0], operands[1])); - DONE; }) -(define_insn "allocate_stack_<mode>" - [(set (match_operand:P 0 "nvptx_register_operand" "=R") - (unspec:P [(match_operand:P 1 "nvptx_register_operand" "R")] - UNSPEC_ALLOCA))] - "" - "%.\\tcall (%0), %%alloca, (%1);") +(define_insn "set_softstack_insn" + [(unspec [(match_operand 0 "nvptx_register_operand" "R")] UNSPEC_ALLOCA)] + "TARGET_SOFT_STACK" +{ + return (cfun->machine->using_softstack + ? "%.\\tst.shared%t0\\t[%%fstmp2], %0;" + : ""); +}) (define_expand "restore_stack_block" [(match_operand 0 "register_operand" "") (match_operand 1 "register_operand" "")] "" { + if (TARGET_SOFT_STACK) + { + emit_move_insn (operands[0], operands[1]); + emit_insn (gen_set_softstack_insn (operands[0])); + } DONE; }) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack) 2016-04-20 17:00 [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack) Alexander Monakov @ 2016-04-21 14:08 ` Nathan Sidwell 2016-05-20 15:09 ` Alexander Monakov 0 siblings, 1 reply; 10+ messages in thread From: Nathan Sidwell @ 2016-04-21 14:08 UTC (permalink / raw) To: Alexander Monakov, gcc-patches On 04/20/16 12:59, Alexander Monakov wrote: > This patch implements per-warp compiler-defined stacks under -msoft-stack > option, and implements alloca on top of that. In a few obvious places, > changes from -muniform-simt patch are present in the hunks. > It'd be better to not mix fragments of patches, and have a description of how soft stacks works. > + /* fstmp2 = &__nvptx_stacks[tid.y]; */ ? > + /* crtl->is_leaf is not initialized because RA is not run. */ Cryptic comment is cryptic. > + fprintf (asm_out_file, ".extern .shared .u%d __nvptx_stacks[32];\n", Magic constant '32'? > + if (need_unisimt_decl) > + { > + write_var_marker (asm_out_file, false, true, "__nvptx_uni"); > + fprintf (asm_out_file, ".extern .shared .u32 __nvptx_uni[32];\n"); > + } Looks like some other patch? > > /* Expander for the shuffle builtins. */ > diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h > index 381269e..6da4d06 100644 > --- a/gcc/config/nvptx/nvptx.h > +++ b/gcc/config/nvptx/nvptx.h > + if (TARGET_UNIFORM_SIMT) \ > + builtin_define ("__nvptx_unisimt__"); \ Likewise. > + rtx unisimt_master; /* Master lane index for "uniform simt" mode. */ > + rtx unisimt_predicate; /* Predicate register for "uniform simt". */ Likewise. Needs testcases. nathan ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack) 2016-04-21 14:08 ` Nathan Sidwell @ 2016-05-20 15:09 ` Alexander Monakov 2016-05-20 16:36 ` Sandra Loosemore 2016-05-20 19:42 ` Nathan Sidwell 0 siblings, 2 replies; 10+ messages in thread From: Alexander Monakov @ 2016-05-20 15:09 UTC (permalink / raw) To: Nathan Sidwell; +Cc: gcc-patches, Sandra Loosemore On Thu, 21 Apr 2016, Nathan Sidwell wrote: > On 04/20/16 12:59, Alexander Monakov wrote: > > This patch implements per-warp compiler-defined stacks under -msoft-stack > > option, and implements alloca on top of that. In a few obvious places, > > changes from -muniform-simt patch are present in the hunks. > > > > It'd be better to not mix fragments of patches, and have a description of how > soft stacks works. Added a description in the intro text. > > + /* fstmp2 = &__nvptx_stacks[tid.y]; */ > > ? This is what is being computed in the emitted assembly. Changed the comment to say, + /* Now %fstmp2 holds the value of '&__nvptx_stacks[%tid.y]'. */ > > + /* crtl->is_leaf is not initialized because RA is not run. */ > > Cryptic comment is cryptic. Changed the comment to say, + /* Ideally we'd use 'crtl->is_leaf' here, but it is computed during + register allocator initialization, which is not done on NVPTX. */ > > + fprintf (asm_out_file, ".extern .shared .u%d __nvptx_stacks[32];\n", > > Magic constant '32'? It's the maximum warp count in a CUDA block, but since it's an external declaration, it can be omitted altogether; changed to use empty brackets. > > + if (need_unisimt_decl) > > + { > > + write_var_marker (asm_out_file, false, true, "__nvptx_uni"); > > + fprintf (asm_out_file, ".extern .shared .u32 __nvptx_uni[32];\n"); > > + } > > Looks like some other patch? Yes, and likewise in other instances. Both the cover letter and the patch description mentioned that. Removed in this resubmission. > Needs testcases. Added a testcase that enables -msoft-stack explicitely; testing with RUNTESTFLAGS=--target_board=nvptx-none-run/-msoft-stack gives this functionality plenty of exercise, otherwise. Updated patch below. Alexander This patch implements '-msoft-stack' code generation variant for NVPTX. The goal is to avoid relying on '.local' memory space for placement of automatic data, and instead have an explicitely-maintained stack pointer (which can be set up to point to preallocated global memory space). This allows to have stack data accessible from all threads and modifiable with atomic instructions. This also allows to implement variable-length stack allocation (for 'alloca' and C99 VLAs). Each warp has its own 'soft stack' pointer. It lives in shared memory array called __nvptx_stacks at index %tid.y (like OpenACC, OpenMP is offloading is going to use launch geometry such that %tid.y gives the warp index). It is retrieved in function prologue (if the function needs a stack frame) and may also be written there (if the function is non-leaf, so that its callees see the updated stack pointer), and restored prior to returning. Startup code is responsible for setting up the initial soft-stack pointer. For -mmainkernel testing it is libgcc's __main, for OpenMP offloading it's the kernel region entry code. 2016-05-19 Alexander Monakov <amonakov@ispras.ru> * lib/target-supports.exp (check_effective_target_alloca): Use a compile test. 2016-05-19 Alexander Monakov <amonakov@ispras.ru> * gcc.target/nvptx/softstack.c: New test. 2016-05-19 Alexander Monakov <amonakov@ispras.ru> * config/nvptx/nvptx.c (nvptx_declare_function_name): Expand comments. (nvptx_file_end): Do not emit element count in external declaration of __nvptx_stacks. 2016-05-19 Alexander Monakov <amonakov@ispras.ru> * doc/invoke.texi (msoft-stack): Rewrite. 2016-03-15 Alexander Monakov <amonakov@ispras.ru> * config/nvptx/nvptx.h (STACK_SIZE_MODE): Define. 2015-12-14 Alexander Monakov <amonakov@ispras.ru> * config/nvptx/nvptx.c (nvptx_declare_function_name): Emit %outargs using .local %outargs_ar only if not TARGET_SOFT_STACK. Emit %outargs under TARGET_SOFT_STACK by offsetting from %frame. (nvptx_get_drap_rtx): Return %argp as the DRAP if needed. * config/nvptx/nvptx.md (nvptx_register_operand): Allow %outargs under TARGET_SOFT_STACK. (nvptx_nonimmediate_operand): Ditto. (allocate_stack): Implement for TARGET_SOFT_STACK. Remove unused code. (allocate_stack_<mode>): Remove unused pattern. (set_softstack_insn): New pattern. (restore_stack_block): Handle for TARGET_SOFT_STACK. 2015-12-09 Alexander Monakov <amonakov@ispras.ru> * config/nvptx/nvptx.c: (need_softstack_decl): New variable. (nvptx_declare_function_name): Handle TARGET_SOFT_STACK. (nvptx_output_return): Emit stack restore if needed. (nvptx_file_end): Handle need_softstack_decl. * config/nvptx/nvptx.h: (TARGET_CPU_CPP_BUILTINS): Define __nvptx_softstack__ when -msoft-stack is active. (struct machine_function): New bool field using_softstack. * config/nvptx/nvptx.opt: (msoft-stack): New option. * doc/invoke.texi (msoft-stack): Document. diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 2d4dad1..700c4b0 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -139,6 +129,9 @@ static GTY(()) rtx worker_red_sym; /* Global lock variable, needed for 128bit worker & gang reductions. */ static GTY(()) tree global_lock_var; +/* True if any function references __nvptx_stacks. */ +static bool need_softstack_decl; + /* Allocate a new, cleared machine_function structure. */ static struct machine_function * @@ -992,16 +1091,56 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) fprintf (file, "%s", s.str().c_str()); - /* Declare a local var for outgoing varargs. */ - if (cfun->machine->has_varadic) - init_frame (file, STACK_POINTER_REGNUM, - UNITS_PER_WORD, crtl->outgoing_args_size); - - /* Declare a local variable for the frame. */ HOST_WIDE_INT sz = get_frame_size (); - if (sz || cfun->machine->has_chain) - init_frame (file, FRAME_POINTER_REGNUM, - crtl->stack_alignment_needed / BITS_PER_UNIT, sz); + bool need_frameptr = sz || cfun->machine->has_chain; + int alignment = crtl->stack_alignment_needed / BITS_PER_UNIT; + if (!TARGET_SOFT_STACK) + { + /* Declare a local var for outgoing varargs. */ + if (cfun->machine->has_varadic) + init_frame (file, STACK_POINTER_REGNUM, + UNITS_PER_WORD, crtl->outgoing_args_size); + + /* Declare a local variable for the frame. */ + if (need_frameptr) + init_frame (file, FRAME_POINTER_REGNUM, alignment, sz); + } + else if (need_frameptr || cfun->machine->has_varadic || cfun->calls_alloca) + { + /* Maintain 64-bit stack alignment. */ + int keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT; + sz = ROUND_UP (sz, keep_align); + int bits = POINTER_SIZE; + fprintf (file, "\t.reg.u%d %%frame;\n", bits); + fprintf (file, "\t.reg.u32 %%fstmp0;\n"); + fprintf (file, "\t.reg.u%d %%fstmp1;\n", bits); + fprintf (file, "\t.reg.u%d %%fstmp2;\n", bits); + fprintf (file, "\tmov.u32 %%fstmp0, %%tid.y;\n"); + fprintf (file, "\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n", + bits == 64 ? ".wide" : ".lo", bits / 8); + fprintf (file, "\tmov.u%d %%fstmp2, __nvptx_stacks;\n", bits); + fprintf (file, "\tadd.u%d %%fstmp2, %%fstmp2, %%fstmp1;\n", bits); + /* Now %fstmp2 holds the value of '&__nvptx_stacks[%tid.y]'. */ + fprintf (file, "\tld.shared.u%d %%fstmp1, [%%fstmp2];\n", bits); + fprintf (file, "\tsub.u%d %%frame, %%fstmp1, " + HOST_WIDE_INT_PRINT_DEC ";\n", bits, sz); + if (alignment > keep_align) + fprintf (file, "\tand.b%d %%frame, %%frame, %d;\n", + bits, -alignment); + fprintf (file, "\t.reg.u%d %%stack;\n", bits); + sz = crtl->outgoing_args_size; + gcc_assert (sz % keep_align == 0); + fprintf (file, "\tsub.u%d %%stack, %%frame, " + HOST_WIDE_INT_PRINT_DEC ";\n", bits, sz); + /* Ideally we'd use 'crtl->is_leaf' here, but it is computed during + register allocator initialization, which is not done on NVPTX. */ + if (!leaf_function_p ()) + { + fprintf (file, "\tst.shared.u%d [%%fstmp2], %%stack;\n", bits); + cfun->machine->using_softstack = true; + } + need_softstack_decl = true; + } /* Declare the pseudos we have as ptx registers. */ int maxregs = max_reg_num (); @@ -1037,6 +1178,10 @@ nvptx_output_return (void) { machine_mode mode = (machine_mode)cfun->machine->return_mode; + if (cfun->machine->using_softstack) + fprintf (asm_out_file, "\tst.shared.u%d [%%fstmp2], %%fstmp1;\n", + POINTER_SIZE); + if (mode != VOIDmode) fprintf (asm_out_file, "\tst.param%s\t[%s_out], %s;\n", nvptx_ptx_type_from_mode (mode, false), @@ -1068,6 +1213,8 @@ nvptx_function_ok_for_sibcall (tree, tree) static rtx nvptx_get_drap_rtx (void) { + if (TARGET_SOFT_STACK && stack_realign_drap) + return arg_pointer_rtx; return NULL_RTX; } @@ -3939,6 +4189,13 @@ nvptx_file_end (void) if (worker_red_size) write_worker_buffer (asm_out_file, worker_red_sym, worker_red_align, worker_red_size); + + if (need_softstack_decl) + { + write_var_marker (asm_out_file, false, true, "__nvptx_stacks"); + fprintf (asm_out_file, ".extern .shared .u%d __nvptx_stacks[];\n", + POINTER_SIZE); + } } /* Expander for the shuffle builtins. */ diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h index 381269e..6da4d06 100644 --- a/gcc/config/nvptx/nvptx.h +++ b/gcc/config/nvptx/nvptx.h @@ -31,6 +31,8 @@ builtin_assert ("machine=nvptx"); \ builtin_assert ("cpu=nvptx"); \ builtin_define ("__nvptx__"); \ + if (TARGET_SOFT_STACK) \ + builtin_define ("__nvptx_softstack__"); \ } while (0) /* Avoid the default in ../../gcc.c, which adds "-pthread", which is not @@ -79,6 +83,7 @@ #define POINTER_SIZE (TARGET_ABI64 ? 64 : 32) #define Pmode (TARGET_ABI64 ? DImode : SImode) +#define STACK_SIZE_MODE Pmode /* Registers. Since ptx is a virtual target, we just define a few hard registers for special purposes and leave pseudos unallocated. @@ -200,6 +205,7 @@ struct GTY(()) machine_function bool is_varadic; /* This call is varadic */ bool has_varadic; /* Current function has a varadic call. */ bool has_chain; /* Current function has outgoing static chain. */ + bool using_softstack; /* Need to restore __nvptx_stacks[tid.y]. */ int num_args; /* Number of args of current call. */ int return_mode; /* Return mode of current fn. (machine_mode not defined yet.) */ diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 33a4862..e5650b6 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -961,31 +986,41 @@ (define_expand "allocate_stack" (match_operand 1 "nvptx_register_operand")] "" { + if (TARGET_SOFT_STACK) + { + emit_move_insn (stack_pointer_rtx, + gen_rtx_MINUS (Pmode, stack_pointer_rtx, operands[1])); + emit_insn (gen_set_softstack_insn (stack_pointer_rtx)); + emit_move_insn (operands[0], virtual_stack_dynamic_rtx); + DONE; + } /* The ptx documentation specifies an alloca intrinsic (for 32 bit only) but notes it is not implemented. The assembler emits a confused error message. Issue a blunt one now instead. */ sorry ("target cannot support alloca."); emit_insn (gen_nop ()); DONE; - if (TARGET_ABI64) - emit_insn (gen_allocate_stack_di (operands[0], operands[1])); - else - emit_insn (gen_allocate_stack_si (operands[0], operands[1])); - DONE; }) -(define_insn "allocate_stack_<mode>" - [(set (match_operand:P 0 "nvptx_register_operand" "=R") - (unspec:P [(match_operand:P 1 "nvptx_register_operand" "R")] - UNSPEC_ALLOCA))] - "" - "%.\\tcall (%0), %%alloca, (%1);") +(define_insn "set_softstack_insn" + [(unspec [(match_operand 0 "nvptx_register_operand" "R")] UNSPEC_ALLOCA)] + "TARGET_SOFT_STACK" +{ + return (cfun->machine->using_softstack + ? "%.\\tst.shared%t0\\t[%%fstmp2], %0;" + : ""); +}) (define_expand "restore_stack_block" [(match_operand 0 "register_operand" "") (match_operand 1 "register_operand" "")] "" { + if (TARGET_SOFT_STACK) + { + emit_move_insn (operands[0], operands[1]); + emit_insn (gen_set_softstack_insn (operands[0])); + } DONE; }) diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt index 056b9b2..1a7608b 100644 --- a/gcc/config/nvptx/nvptx.opt +++ b/gcc/config/nvptx/nvptx.opt @@ -32,3 +32,7 @@ Link in code for a __main kernel. moptimize Target Report Var(nvptx_optimize) Init(-1) Optimize partition neutering + +msoft-stack +Target Report Mask(SOFT_STACK) +Use custom stacks instead of local memory for automatic storage. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index d281975..f0331e2 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -19341,6 +19341,18 @@ offloading execution. Apply partitioned execution optimizations. This is the default when any level of optimization is selected. +@item -msoft-stack +@opindex msoft-stack +Switch to code generation variant that does not use @code{.local} memory +directly for stack storage. Instead, a per-warp stack pointer is +maintained explicitely. This enables variable-length stack allocation (with +variable-length arrays or @code{alloca}), and when global memory is used for +underlying storage, makes possible to access automatic variables from other +threads, or with atomic instructions. This code generation variant is used +for OpenMP offloading, but the option is exposed on its own for the purpose +of testing the compiler; to generate code suitable for linking into programs +using OpenMP offloading, use option @option{-mgomp}. + @end table @node PDP-11 Options diff --git a/gcc/testsuite/gcc.target/nvptx/softstack.c b/gcc/testsuite/gcc.target/nvptx/softstack.c new file mode 100644 index 0000000..73e60f2 --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/softstack.c @@ -0,0 +1,23 @@ +/* { dg-options "-O2 -msoft-stack" } */ +/* { dg-do run } */ + +static __attribute__((noinline,noclone)) int f(int *p) +{ + return __sync_lock_test_and_set(p, 1); +} + +static __attribute__((noinline,noclone)) int g(int n) +{ + /* Check that variable-length stack allocation works. */ + int v[n]; + v[0] = 0; + /* Check that atomic operations can be applied to auto data. */ + return f(v) == 0 && v[0] == 1; +} + +int main() +{ + if (!g(1)) + __builtin_abort(); + return 0; +} diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index dceabb5..878fdab 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -729,7 +729,10 @@ proc check_effective_target_untyped_assembly {} { proc check_effective_target_alloca {} { if { [istarget nvptx-*-*] } { - return 0 + return [check_no_compiler_messages alloca assembly { + void f (void*); + void g (int n) { f (__builtin_alloca (n)); } + }] } return 1 } ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack) 2016-05-20 15:09 ` Alexander Monakov @ 2016-05-20 16:36 ` Sandra Loosemore 2016-05-20 19:42 ` Nathan Sidwell 1 sibling, 0 replies; 10+ messages in thread From: Sandra Loosemore @ 2016-05-20 16:36 UTC (permalink / raw) To: Alexander Monakov, Nathan Sidwell; +Cc: gcc-patches On 05/20/2016 09:09 AM, Alexander Monakov wrote: > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index d281975..f0331e2 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -19341,6 +19341,18 @@ offloading execution. > Apply partitioned execution optimizations. This is the default when any > level of optimization is selected. > > +@item -msoft-stack > +@opindex msoft-stack > +Switch to code generation variant that does not use @code{.local} memory s/Switch to code generation variant/Generate code/ > +directly for stack storage. Instead, a per-warp stack pointer is > +maintained explicitely. This enables variable-length stack allocation (with s/explicitely/explicitly/ > +variable-length arrays or @code{alloca}), and when global memory is used for > +underlying storage, makes possible to access automatic variables from other s/makes possible/makes it possible/ > +threads, or with atomic instructions. This code generation variant is used > +for OpenMP offloading, but the option is exposed on its own for the purpose > +of testing the compiler; to generate code suitable for linking into programs > +using OpenMP offloading, use option @option{-mgomp}. > + > @end table > > @node PDP-11 Options The documentation bits are OK with those changes. -Sandra ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack) 2016-05-20 15:09 ` Alexander Monakov 2016-05-20 16:36 ` Sandra Loosemore @ 2016-05-20 19:42 ` Nathan Sidwell 2016-05-25 2:32 ` Alexander Monakov 1 sibling, 1 reply; 10+ messages in thread From: Nathan Sidwell @ 2016-05-20 19:42 UTC (permalink / raw) To: Alexander Monakov; +Cc: gcc-patches, Sandra Loosemore On 05/20/16 11:09, Alexander Monakov wrote: > This patch implements '-msoft-stack' code generation variant for NVPTX. The > goal is to avoid relying on '.local' memory space for placement of automatic > data, and instead have an explicitely-maintained stack pointer (which can be > set up to point to preallocated global memory space). This allows to have > stack data accessible from all threads and modifiable with atomic > instructions. This also allows to implement variable-length stack allocation > (for 'alloca' and C99 VLAs). > > Each warp has its own 'soft stack' pointer. It lives in shared memory array > called __nvptx_stacks at index %tid.y (like OpenACC, OpenMP is offloading is > going to use launch geometry such that %tid.y gives the warp index). It is > retrieved in function prologue (if the function needs a stack frame) and may > also be written there (if the function is non-leaf, so that its callees see > the updated stack pointer), and restored prior to returning. > > Startup code is responsible for setting up the initial soft-stack pointer. For > -mmainkernel testing it is libgcc's __main, for OpenMP offloading it's the > kernel region entry code. ah, that's much more understandable, thanks. Presumably this doesn't support worker-single mode (in OpenACC parlance, I don't know what the OpenMP version of that is?) And neither would it support calls from vector-partitioned code (I think that's SIMD in OpenMP-land?). It seems like we should reject the combination of -msoft-stack -fopenacc? > 2016-05-19 Alexander Monakov <amonakov@ispras.ru> > 2016-05-19 Alexander Monakov <amonakov@ispras.ru> > 2016-05-19 Alexander Monakov <amonakov@ispras.ru> > 2016-05-19 Alexander Monakov <amonakov@ispras.ru> > 2016-03-15 Alexander Monakov <amonakov@ispras.ru> > 2015-12-14 Alexander Monakov <amonakov@ispras.ru> > 2015-12-09 Alexander Monakov <amonakov@ispras.ru> why so many changelogs? The on-branch development history is irrelvant for trunk -- the usual single changelog style should be followed. > diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c > index 2d4dad1..700c4b0 100644 > @@ -992,16 +1091,56 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) > + else if (need_frameptr || cfun->machine->has_varadic || cfun->calls_alloca) > + { > + /* Maintain 64-bit stack alignment. */ This block needs a more descriptive comment -- it appears to be doing a great deal more than maintaining 64-bit stack alignment! > + int keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT; > + sz = ROUND_UP (sz, keep_align); > + int bits = POINTER_SIZE; > + fprintf (file, "\t.reg.u%d %%frame;\n", bits); > + fprintf (file, "\t.reg.u32 %%fstmp0;\n"); > + fprintf (file, "\t.reg.u%d %%fstmp1;\n", bits); > + fprintf (file, "\t.reg.u%d %%fstmp2;\n", bits); Some of these register names appear to be long lived -- and referenced in other functions. It would be better to give those more descriptive names, or even give them hard-regs. You should certainly do so for those that are already hard regs (%frame & %stack) -- is it more feasible to augment init_frame to initialize them? > @@ -1037,6 +1178,10 @@ nvptx_output_return (void) > { > machine_mode mode = (machine_mode)cfun->machine->return_mode; > > + if (cfun->machine->using_softstack) > + fprintf (asm_out_file, "\tst.shared.u%d [%%fstmp2], %%fstmp1;\n", See note above about obscure reg names. Since ptx is a virtual target, we just define a few > hard registers for special purposes and leave pseudos unallocated. > @@ -200,6 +205,7 @@ struct GTY(()) machine_function > bool is_varadic; /* This call is varadic */ > bool has_varadic; /* Current function has a varadic call. */ > bool has_chain; /* Current function has outgoing static chain. */ > + bool using_softstack; /* Need to restore __nvptx_stacks[tid.y]. */ Comment should describe what the attribute is, not what it implies. In this case I think it's /* Current function has a soft stack frame. */ > diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md > index 33a4862..e5650b6 100644 > --- a/gcc/config/nvptx/nvptx.md > +++ b/gcc/config/nvptx/nvptx.md > +(define_insn "set_softstack_insn" > + [(unspec [(match_operand 0 "nvptx_register_operand" "R")] UNSPEC_ALLOCA)] > + "TARGET_SOFT_STACK" > +{ > + return (cfun->machine->using_softstack > + ? "%.\\tst.shared%t0\\t[%%fstmp2], %0;" > + : ""); > +}) Is this alloca related (UNSPEC_ALLOCA) or restore related (invoked in restore_stack_block), or stack setting (as insn name suggests). Things seem inconsistently named. Comments would be good. > > (define_expand "restore_stack_block" > [(match_operand 0 "register_operand" "") > (match_operand 1 "register_operand" "")] > "" > { > + if (TARGET_SOFT_STACK) > + { > + emit_move_insn (operands[0], operands[1]); > + emit_insn (gen_set_softstack_insn (operands[0])); Is it necessary to store the soft stack here? Only restore_stack_block seems defined, and not save_stack_block. Why the asymmetry? > diff --git a/gcc/testsuite/gcc.target/nvptx/softstack.c b/gcc/testsuite/gcc.target/nvptx/softstack.c > new file mode 100644 > index 0000000..73e60f2 > --- /dev/null This test is ok, but probably wise to check varadic generates the expected code? Are changes needed in nvptx's crt0.s (or an auxilary file in libc or libgcc) to make this work for testing? Have you tested an nvptx-elf target with this flag on? nathan ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack) 2016-05-20 19:42 ` Nathan Sidwell @ 2016-05-25 2:32 ` Alexander Monakov 2016-05-25 13:33 ` Nathan Sidwell 0 siblings, 1 reply; 10+ messages in thread From: Alexander Monakov @ 2016-05-25 2:32 UTC (permalink / raw) To: Nathan Sidwell; +Cc: gcc-patches, Sandra Loosemore On Fri, 20 May 2016, Nathan Sidwell wrote: > ah, that's much more understandable, thanks. Presumably this doesn't > support worker-single mode (in OpenACC parlance, I don't know what the OpenMP > version of that is?) I don't see why you have concerns. In OpenMP, what OpenACC calls 'worker-single mode' should correspond to execution of a sequential region (outside of any 'parallel' region). The region is executed by the initial thread (warp 0), while other warps, having formed a thread pool, are suspended on that thread pool's barrier. When the initial thread reaches the parallel region, it unblocks the warps in the pool. The other warps may need data that is allocated on warp 0 stack, so here it's essential that soft-stacks can exist on global memory and thus be world-readable. > And neither would it support calls from vector-partitioned code (I think > that's SIMD in OpenMP-land?). Actually it would: the plan is to switch soft-stack pointer to a region of .local memory when entering OpenMP SIMD region. This makes soft-stacks use lane-private storage inside of SIMD regions (but then it's, of course, no longer world-readable and not modifiable by atomics). > It seems like we should reject the combination of -msoft-stack -fopenacc? Possibly; the doc text makes it explicit that the option is exposed only for the purpose of testing the compiler, anyway. > why so many changelogs? The on-branch development history is irrelvant for > trunk -- the usual single changelog style should be followed. OK, if branch history is not interesting for review, I can squash it; I'll have to do that for the final commit anyway. > > + else if (need_frameptr || cfun->machine->has_varadic || > > cfun->calls_alloca) > > + { > > + /* Maintain 64-bit stack alignment. */ > > This block needs a more descriptive comment -- it appears to be doing a great > deal more than maintaining 64-bit stack alignment! The comment is just for the line that follows, not the whole block. > > + int keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT; > > + sz = ROUND_UP (sz, keep_align); > > + int bits = POINTER_SIZE; > > + fprintf (file, "\t.reg.u%d %%frame;\n", bits); > > + fprintf (file, "\t.reg.u32 %%fstmp0;\n"); > > + fprintf (file, "\t.reg.u%d %%fstmp1;\n", bits); > > + fprintf (file, "\t.reg.u%d %%fstmp2;\n", bits); > > Some of these register names appear to be long lived -- and referenced in > other functions. It would be better to give those more descriptive names, or > even give them hard-regs. That's just %fstmp2 (pointer into __nvptx_stacks) and %fstmp1 (previous stack pointer that we need to restore). I can rename them to %ssloc and %ssold (better names welcome), but I don't see a value in making them hard-regs -- there's no interface with the middle-end that would be interested in those. > You should certainly do so for those that are already hard regs (%frame & > %stack) Sorry, do what? They are already hard regs, and have descriptive names. > -- is it more feasible to augment init_frame to initialize them? I don't think so. The whole block could be moved to a separate function though. > Since ptx is a virtual target, we just define a few > > hard registers for special purposes and leave pseudos unallocated. > > @@ -200,6 +205,7 @@ struct GTY(()) machine_function > > bool is_varadic; /* This call is varadic */ > > bool has_varadic; /* Current function has a varadic call. */ > > bool has_chain; /* Current function has outgoing static chain. */ > > + bool using_softstack; /* Need to restore __nvptx_stacks[tid.y]. */ > > Comment should describe what the attribute is, not what it implies. In this > case I think it's /* Current function has a soft stack frame. */ Yes; note it's false when current function is leaf, so the description should be more like "Current function has a soft stack frame that needs restoring". > > diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md > > index 33a4862..e5650b6 100644 > > --- a/gcc/config/nvptx/nvptx.md > > +++ b/gcc/config/nvptx/nvptx.md > > > > +(define_insn "set_softstack_insn" > > + [(unspec [(match_operand 0 "nvptx_register_operand" "R")] UNSPEC_ALLOCA)] > > + "TARGET_SOFT_STACK" > > +{ > > + return (cfun->machine->using_softstack > > + ? "%.\\tst.shared%t0\\t[%%fstmp2], %0;" > > + : ""); > > +}) > > Is this alloca related (UNSPEC_ALLOCA) or restore related (invoked in > restore_stack_block), or stack setting (as insn name suggests). Things seem > inconsistently named. Comments would be good. OK, I'll add some in a respin. This is related to stack setting. I can add a new UNSPEC for that (UNSPEC_SET_SOFTSTACK). > > > > (define_expand "restore_stack_block" > > [(match_operand 0 "register_operand" "") > > (match_operand 1 "register_operand" "")] > > "" > > { > > + if (TARGET_SOFT_STACK) > > + { > > + emit_move_insn (operands[0], operands[1]); > > + emit_insn (gen_set_softstack_insn (operands[0])); > > Is it necessary to store the soft stack here? As a QoI matter, yes -- otherwise calls following a stack restore will see an unrestored stack pointer, causing stack space waste. > Only restore_stack_block seems defined, and not save_stack_block. Why the > asymmetry? Because no special handling is needed on saving. We need to update the pointer in shared memory when stack pointer is moved (thus, in allocate_stack and restore_stack_block), but when the stack pointer is copied to a save area we don't need to do anything special. > This test is ok, but probably wise to check varadic generates the expected > code? Perhaps; I can add a printf to the test. > Are changes needed in nvptx's crt0.s (or an auxilary file in libc or libgcc) > to make this work for testing? Yes. > Have you tested an nvptx-elf target with this flag on? (assuming you mean nvptx-none via nvptx-none-run) Yes, with make -k check-c DEJAGNU=... RUNTESTFLAGS=--target_board=nvptx-none-run/-msoft-stack I saw some changes due to tests passing when they otherwise wouldn't (e.g. changing an auto variable with an atomic op). There are also changes in either direction due to PTX translation bugs in the CUDA driver. Thanks. Alexander ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack) 2016-05-25 2:32 ` Alexander Monakov @ 2016-05-25 13:33 ` Nathan Sidwell 2016-06-02 21:22 ` Alexander Monakov 0 siblings, 1 reply; 10+ messages in thread From: Nathan Sidwell @ 2016-05-25 13:33 UTC (permalink / raw) To: Alexander Monakov; +Cc: gcc-patches, Sandra Loosemore On 05/24/16 17:29, Alexander Monakov wrote: > On Fri, 20 May 2016, Nathan Sidwell wrote: >> ah, that's much more understandable, thanks. Presumably this doesn't >> support worker-single mode (in OpenACC parlance, I don't know what the OpenMP >> version of that is?) > > I don't see why you have concerns. Because I want to understand it. >> It seems like we should reject the combination of -msoft-stack -fopenacc? > > Possibly; the doc text makes it explicit that the option is exposed only for > the purpose of testing the compiler, anyway. It is always best to prevent the user doing something you don't recommend, rather than presume they'll be sensible. >>> + else if (need_frameptr || cfun->machine->has_varadic || >>> cfun->calls_alloca) >>> + { >>> + /* Maintain 64-bit stack alignment. */ >> >> This block needs a more descriptive comment -- it appears to be doing a great >> deal more than maintaining 64-bit stack alignment! > > The comment is just for the line that follows, not the whole block. It's the only comment in the block and right at the start. Fix it. >>> + int keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT; >>> + sz = ROUND_UP (sz, keep_align); >>> + int bits = POINTER_SIZE; >>> + fprintf (file, "\t.reg.u%d %%frame;\n", bits); >>> + fprintf (file, "\t.reg.u32 %%fstmp0;\n"); >>> + fprintf (file, "\t.reg.u%d %%fstmp1;\n", bits); >>> + fprintf (file, "\t.reg.u%d %%fstmp2;\n", bits); >> >> Some of these register names appear to be long lived -- and referenced in >> other functions. It would be better to give those more descriptive names, or >> even give them hard-regs. > > That's just %fstmp2 (pointer into __nvptx_stacks) and %fstmp1 (previous stack > pointer that we need to restore). I can rename them to %ssloc and %ssold > (better names welcome), but I don't see a value in making them hard-regs -- > there's no interface with the middle-end that would be interested in those. Making them hard regs would elide the need to keep naming them in multiple places. Sure, you'll have to use some #define symbol to index the array, but then you'll get a build time error if there's a mismatch. You can also put a comment on that #define saying what it is in an obvious place. The lack of comments means it's not clear to me what ssold is -- is it the caller's value of the slot? %sspslot for the address of the __soft_stack array element. %sspprev for the caller'svalue (if that is indeed what you mean by ssold) >> You should certainly do so for those that are already hard regs (%frame & >> %stack) > > Sorry, do what? They are already hard regs, and have descriptive names. Use the hard reg name array, not duplicated string constants. >> -- is it more feasible to augment init_frame to initialize them? > > I don't think so. The whole block could be moved to a separate function though. That would be acceptable too. >>> + bool using_softstack; /* Need to restore __nvptx_stacks[tid.y]. */ >> >> Comment should describe what the attribute is, not what it implies. In this >> case I think it's /* Current function has a soft stack frame. */ > > Yes; note it's false when current function is leaf, so the description should > be more like "Current function has a soft stack frame that needs restoring". Ok. so the name doesn't match the semantics. I suggest renaming it to 'restore_softstack', with a similarly adjusted comment. >> Is this alloca related (UNSPEC_ALLOCA) or restore related (invoked in >> restore_stack_block), or stack setting (as insn name suggests). Things seem >> inconsistently named. Comments would be good. > > OK, I'll add some in a respin. This is related to stack setting. I can add a > new UNSPEC for that (UNSPEC_SET_SOFTSTACK). Thanks -- although integers are bounded in our computery world, it's not like there's a shortage here! >> Only restore_stack_block seems defined, and not save_stack_block. Why the >> asymmetry? > > Because no special handling is needed on saving. We need to update the pointer > in shared memory when stack pointer is moved (thus, in allocate_stack and > restore_stack_block), but when the stack pointer is copied to a save area we > don't need to do anything special. Thanks -- lack of comments here really impedes understanding. >> Are changes needed in nvptx's crt0.s (or an auxilary file in libc or libgcc) >> to make this work for testing? > > Yes. Please include that chunk in the next revision of this patch. But see the commit I just made. I intend to move the malloc stuff to newlib next, but that shouldn't affect the changes needed for this. > I saw some changes due to tests passing when they otherwise wouldn't (e.g. > changing an auto variable with an atomic op). There are also changes in either > direction due to PTX translation bugs in the CUDA driver. It'd be good to know what that delta is. nathan ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack) 2016-05-25 13:33 ` Nathan Sidwell @ 2016-06-02 21:22 ` Alexander Monakov 2016-06-09 19:49 ` Nathan Sidwell 0 siblings, 1 reply; 10+ messages in thread From: Alexander Monakov @ 2016-06-02 21:22 UTC (permalink / raw) To: Nathan Sidwell; +Cc: gcc-patches On Wed, 25 May 2016, Nathan Sidwell wrote: > > > It seems like we should reject the combination of -msoft-stack -fopenacc? > > > > Possibly; the doc text makes it explicit that the option is exposed only for > > the purpose of testing the compiler, anyway. > > It is always best to prevent the user doing something you don't recommend, > rather than presume they'll be sensible. No change for that yet in the respin; do you want something like if (flag_openacc && TARGET_SOFT_STACK) sorry ("-fopenacc and -msoft-stack are mutually exclusive"); in nvptx_override_options? > > > > + bool using_softstack; /* Need to restore __nvptx_stacks[tid.y]. */ > > > > > > Comment should describe what the attribute is, not what it implies. In > > > this > > > case I think it's /* Current function has a soft stack frame. */ > > > > Yes; note it's false when current function is leaf, so the description > > should > > be more like "Current function has a soft stack frame that needs restoring". > > Ok. so the name doesn't match the semantics. I suggest renaming it to > 'restore_softstack', with a similarly adjusted comment. I went with 'has_softstack' (and the comment you've suggested previously) and made the code check leafness separately. > > > Are changes needed in nvptx's crt0.s (or an auxilary file in libc or > > > libgcc) > > > to make this work for testing? > > > > Yes. > > Please include that chunk in the next revision of this patch. But see the > commit I just made. I intend to move the malloc stuff to newlib next, but > that shouldn't affect the changes needed for this. Done. Updated patch below. Alexander gcc/: * config/nvptx/nvptx-protos.h (nvptx_output_set_softstack): Declare. * config/nvptx/nvptx.c: (need_softstack_decl): New variable. (init_softstack_frame): New. (nvptx_declare_function_name): Handle TARGET_SOFT_STACK. (nvptx_output_set_softstack): New. (nvptx_output_return): Emit stack restore if needed. (nvptx_get_drap_rtx): Return %argp as the DRAP if needed. (nvptx_file_end): Handle need_softstack_decl. * config/nvptx/nvptx.h: (TARGET_CPU_CPP_BUILTINS): Define __nvptx_softstack__ when -msoft-stack is active. (STACK_SIZE_MODE): Define. (FIXED_REGISTERS): Adjust. (SOFTSTACK_SLOT_REGNUM): New. (SOFTSTACK_PREV_REGNUM): New. (REGISTER_NAMES): Adjust. (struct machine_function): New bool field has_softstack. * config/nvptx/nvptx.md (UNSPEC_SET_SOFTSTACK): New. (allocate_stack): Implement for TARGET_SOFT_STACK. Remove unused code. (allocate_stack_<mode>): Remove unused pattern. (set_softstack_insn): New pattern. (restore_stack_block): Handle for TARGET_SOFT_STACK. * config/nvptx/nvptx.opt: (msoft-stack): New option. * doc/invoke.texi (msoft-stack): Document. gcc/testsuite/: * gcc.target/nvptx/softstack.c: New test. * lib/target-supports.exp (check_effective_target_alloca): Use a compile test. libgcc/: * config/nvptx/crt0.c (__main) [__nvptx_softstack__]: Setup __nvptx_stacks. * config/nvptx/stacks.c: New file. * config/nvptx/t-nvptx (LIB2ADD): Add stacks.c. diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h index ec4588e..331ec0a 100644 --- a/gcc/config/nvptx/nvptx-protos.h +++ b/gcc/config/nvptx/nvptx-protos.h @@ -36,5 +36,6 @@ extern void nvptx_register_pragmas (void); extern const char *nvptx_output_mov_insn (rtx, rtx); extern const char *nvptx_output_call_insn (rtx_insn *, rtx, rtx); extern const char *nvptx_output_return (void); +extern const char *nvptx_output_set_softstack (unsigned); #endif #endif diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index 2d4dad1..05cc6dd 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -139,6 +129,9 @@ static GTY(()) rtx worker_red_sym; /* Global lock variable, needed for 128bit worker & gang reductions. */ static GTY(()) tree global_lock_var; +/* True if any function references __nvptx_stacks. */ +static bool need_softstack_decl; + /* Allocate a new, cleared machine_function structure. */ static struct machine_function * @@ -931,6 +932,60 @@ init_frame (FILE *file, int regno, unsigned align, unsigned size) POINTER_SIZE, reg_names[regno], reg_names[regno]); } +/* Emit soft stack frame setup sequence. */ + +static void +init_softstack_frame (FILE *file, unsigned alignment, unsigned size) +{ + /* Maintain 64-bit stack alignment. */ + int keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT; + size = ROUND_UP (size, keep_align); + int bits = POINTER_SIZE; + const char *reg_stack = reg_names[STACK_POINTER_REGNUM]; + const char *reg_frame = reg_names[FRAME_POINTER_REGNUM]; + const char *reg_sspslot = reg_names[SOFTSTACK_SLOT_REGNUM]; + const char *reg_sspprev = reg_names[SOFTSTACK_PREV_REGNUM]; + fprintf (file, "\t.reg.u%d %s;\n", bits, reg_stack); + fprintf (file, "\t.reg.u%d %s;\n", bits, reg_frame); + fprintf (file, "\t.reg.u%d %s;\n", bits, reg_sspslot); + fprintf (file, "\t.reg.u%d %s;\n", bits, reg_sspprev); + fprintf (file, "\t{\n"); + fprintf (file, "\t\t.reg.u32 %%fstmp0;\n"); + fprintf (file, "\t\t.reg.u%d %%fstmp1;\n", bits); + fprintf (file, "\t\t.reg.u%d %%fstmp2;\n", bits); + fprintf (file, "\t\tmov.u32 %%fstmp0, %%tid.y;\n"); + fprintf (file, "\t\tmul%s.u32 %%fstmp1, %%fstmp0, %d;\n", + bits == 64 ? ".wide" : ".lo", bits / 8); + fprintf (file, "\t\tmov.u%d %%fstmp2, __nvptx_stacks;\n", bits); + /* Initialize %sspslot = &__nvptx_stacks[tid.y]. */ + fprintf (file, "\t\tadd.u%d %s, %%fstmp2, %%fstmp1;\n", bits, reg_sspslot); + /* Initialize %sspprev = __nvptx_stacks[tid.y]. */ + fprintf (file, "\t\tld.shared.u%d %s, [%s];\n", + bits, reg_sspprev, reg_sspslot); + /* Initialize %frame = %sspprev - size. */ + fprintf (file, "\t\tsub.u%d %s, %s, " HOST_WIDE_INT_PRINT_DEC ";\n", + bits, reg_frame, reg_sspprev, size); + /* Apply alignment, if larger than 64. */ + if (alignment > keep_align) + fprintf (file, "\t\tand.b%d %s, %s, %d;\n", + bits, reg_frame, reg_frame, -alignment); + size = crtl->outgoing_args_size; + gcc_assert (size % keep_align == 0); + /* Initialize %stack. */ + fprintf (file, "\t\tsub.u%d %s, %s, " HOST_WIDE_INT_PRINT_DEC ";\n", + bits, reg_stack, reg_frame, size); + /* Usually 'crtl->is_leaf' is computed during register allocator + initialization, which is not done on NVPTX. Compute it now. */ + gcc_assert (!crtl->is_leaf); + crtl->is_leaf = leaf_function_p (); + if (!crtl->is_leaf) + fprintf (file, "\t\tst.shared.u%d [%s], %s;\n", + bits, reg_sspslot, reg_stack); + fprintf (file, "\t}\n"); + cfun->machine->has_softstack = true; + need_softstack_decl = true; +} + /* Emit code to initialize the REGNO predicate register to indicate whether we are not lane zero on the NAME axis. */ @@ -992,16 +1145,22 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) fprintf (file, "%s", s.str().c_str()); - /* Declare a local var for outgoing varargs. */ - if (cfun->machine->has_varadic) - init_frame (file, STACK_POINTER_REGNUM, - UNITS_PER_WORD, crtl->outgoing_args_size); - - /* Declare a local variable for the frame. */ HOST_WIDE_INT sz = get_frame_size (); - if (sz || cfun->machine->has_chain) - init_frame (file, FRAME_POINTER_REGNUM, - crtl->stack_alignment_needed / BITS_PER_UNIT, sz); + bool need_frameptr = sz || cfun->machine->has_chain; + int alignment = crtl->stack_alignment_needed / BITS_PER_UNIT; + if (!TARGET_SOFT_STACK) + { + /* Declare a local var for outgoing varargs. */ + if (cfun->machine->has_varadic) + init_frame (file, STACK_POINTER_REGNUM, + UNITS_PER_WORD, crtl->outgoing_args_size); + + /* Declare a local variable for the frame. */ + if (need_frameptr) + init_frame (file, FRAME_POINTER_REGNUM, alignment, sz); + } + else if (need_frameptr || cfun->machine->has_varadic || cfun->calls_alloca) + init_softstack_frame (file, alignment, sz); /* Declare the pseudos we have as ptx registers. */ int maxregs = max_reg_num (); @@ -1027,6 +1186,21 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) REGNO (cfun->machine->axis_predicate[1]), "x"); } +/* Output instruction that sets soft stack pointer in shared memory to the + value in register given by SRC_REGNO. */ + +const char * +nvptx_output_set_softstack (unsigned src_regno) +{ + if (cfun->machine->has_softstack && !crtl->is_leaf) + { + fprintf (asm_out_file, "\tst.shared.u%d\t[%s], ", + POINTER_SIZE, reg_names[SOFTSTACK_SLOT_REGNUM]); + output_reg (asm_out_file, src_regno, VOIDmode); + fprintf (asm_out_file, ";\n"); + } + return ""; +} /* Output a return instruction. Also copy the return value to its outgoing location. */ @@ -1037,6 +1213,9 @@ nvptx_output_return (void) { machine_mode mode = (machine_mode)cfun->machine->return_mode; + if (TARGET_SOFT_STACK) + nvptx_output_set_softstack (SOFTSTACK_PREV_REGNUM); + if (mode != VOIDmode) fprintf (asm_out_file, "\tst.param%s\t[%s_out], %s;\n", nvptx_ptx_type_from_mode (mode, false), @@ -1068,6 +1247,8 @@ nvptx_function_ok_for_sibcall (tree, tree) static rtx nvptx_get_drap_rtx (void) { + if (TARGET_SOFT_STACK && stack_realign_drap) + return arg_pointer_rtx; return NULL_RTX; } @@ -3939,6 +4223,13 @@ nvptx_file_end (void) if (worker_red_size) write_worker_buffer (asm_out_file, worker_red_sym, worker_red_align, worker_red_size); + + if (need_softstack_decl) + { + write_var_marker (asm_out_file, false, true, "__nvptx_stacks"); + fprintf (asm_out_file, ".extern .shared .u%d __nvptx_stacks[];\n", + POINTER_SIZE); + } } /* Expander for the shuffle builtins. */ diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h index 381269e..aceaa40 100644 --- a/gcc/config/nvptx/nvptx.h +++ b/gcc/config/nvptx/nvptx.h @@ -31,6 +31,8 @@ builtin_assert ("machine=nvptx"); \ builtin_assert ("cpu=nvptx"); \ builtin_define ("__nvptx__"); \ + if (TARGET_SOFT_STACK) \ + builtin_define ("__nvptx_softstack__"); \ } while (0) /* Avoid the default in ../../gcc.c, which adds "-pthread", which is not @@ -79,13 +83,14 @@ #define POINTER_SIZE (TARGET_ABI64 ? 64 : 32) #define Pmode (TARGET_ABI64 ? DImode : SImode) +#define STACK_SIZE_MODE Pmode /* Registers. Since ptx is a virtual target, we just define a few hard registers for special purposes and leave pseudos unallocated. We have to have some available hard registers, to keep gcc setup happy. */ #define FIRST_PSEUDO_REGISTER 16 -#define FIXED_REGISTERS { 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 } +#define FIXED_REGISTERS { 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 } #define CALL_USED_REGISTERS { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 } #define HARD_REGNO_NREGS(REG, MODE) \ @@ -133,10 +138,17 @@ enum reg_class { NO_REGS, ALL_REGS, LIM_REG_CLASSES }; #define FRAME_POINTER_REGNUM 2 #define ARG_POINTER_REGNUM 3 #define STATIC_CHAIN_REGNUM 4 +/* This register points to the shared memory location with the current warp's + soft stack pointer (__nvptx_stacks[tid.y]). */ +#define SOFTSTACK_SLOT_REGNUM 5 +/* This register is used to save the previous value of the soft stack pointer + in the prologue and restore it when returning. */ +#define SOFTSTACK_PREV_REGNUM 6 #define REGISTER_NAMES \ { \ - "%value", "%stack", "%frame", "%args", "%chain", "%hr5", "%hr6", "%hr7", \ + "%value", "%stack", "%frame", "%args", \ + "%chain", "%sspslot", "%sspprev", "%hr7", \ "%hr8", "%hr9", "%hr10", "%hr11", "%hr12", "%hr13", "%hr14", "%hr15" \ } @@ -200,6 +212,7 @@ struct GTY(()) machine_function bool is_varadic; /* This call is varadic */ bool has_varadic; /* Current function has a varadic call. */ bool has_chain; /* Current function has outgoing static chain. */ + bool has_softstack; /* Current function has a soft stack frame. */ int num_args; /* Number of args of current call. */ int return_mode; /* Return mode of current fn. (machine_mode not defined yet.) */ diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 33a4862..8e08178 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -36,6 +36,8 @@ (define_c_enum "unspec" [ UNSPEC_ALLOCA + UNSPEC_SET_SOFTSTACK + UNSPEC_DIM_SIZE UNSPEC_BIT_CONV @@ -961,31 +988,40 @@ (define_expand "allocate_stack" (match_operand 1 "nvptx_register_operand")] "" { + if (TARGET_SOFT_STACK) + { + emit_move_insn (stack_pointer_rtx, + gen_rtx_MINUS (Pmode, stack_pointer_rtx, operands[1])); + emit_insn (gen_set_softstack_insn (stack_pointer_rtx)); + emit_move_insn (operands[0], virtual_stack_dynamic_rtx); + DONE; + } /* The ptx documentation specifies an alloca intrinsic (for 32 bit only) but notes it is not implemented. The assembler emits a confused error message. Issue a blunt one now instead. */ sorry ("target cannot support alloca."); emit_insn (gen_nop ()); DONE; - if (TARGET_ABI64) - emit_insn (gen_allocate_stack_di (operands[0], operands[1])); - else - emit_insn (gen_allocate_stack_si (operands[0], operands[1])); - DONE; }) -(define_insn "allocate_stack_<mode>" - [(set (match_operand:P 0 "nvptx_register_operand" "=R") - (unspec:P [(match_operand:P 1 "nvptx_register_operand" "R")] - UNSPEC_ALLOCA))] - "" - "%.\\tcall (%0), %%alloca, (%1);") +(define_insn "set_softstack_insn" + [(unspec [(match_operand 0 "nvptx_register_operand" "R")] + UNSPEC_SET_SOFTSTACK)] + "TARGET_SOFT_STACK" +{ + return nvptx_output_set_softstack (REGNO (operands[0])); +}) (define_expand "restore_stack_block" [(match_operand 0 "register_operand" "") (match_operand 1 "register_operand" "")] "" { + if (TARGET_SOFT_STACK) + { + emit_move_insn (operands[0], operands[1]); + emit_insn (gen_set_softstack_insn (operands[0])); + } DONE; }) diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt index 056b9b2..1a7608b 100644 --- a/gcc/config/nvptx/nvptx.opt +++ b/gcc/config/nvptx/nvptx.opt @@ -32,3 +32,7 @@ Link in code for a __main kernel. moptimize Target Report Var(nvptx_optimize) Init(-1) Optimize partition neutering + +msoft-stack +Target Report Mask(SOFT_STACK) +Use custom stacks instead of local memory for automatic storage. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index d281975..44d75c0 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -19341,6 +19341,18 @@ offloading execution. Apply partitioned execution optimizations. This is the default when any level of optimization is selected. +@item -msoft-stack +@opindex msoft-stack +Generate code that does not use @code{.local} memory +directly for stack storage. Instead, a per-warp stack pointer is +maintained explicitly. This enables variable-length stack allocation (with +variable-length arrays or @code{alloca}), and when global memory is used for +underlying storage, makes it possible to access automatic variables from other +threads, or with atomic instructions. This code generation variant is used +for OpenMP offloading, but the option is exposed on its own for the purpose +of testing the compiler; to generate code suitable for linking into programs +using OpenMP offloading, use option @option{-mgomp}. + @end table @node PDP-11 Options diff --git a/gcc/testsuite/gcc.target/nvptx/softstack.c b/gcc/testsuite/gcc.target/nvptx/softstack.c new file mode 100644 index 0000000..73e60f2 --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/softstack.c @@ -0,0 +1,23 @@ +/* { dg-options "-O2 -msoft-stack" } */ +/* { dg-do run } */ + +static __attribute__((noinline,noclone)) int f(int *p) +{ + return __sync_lock_test_and_set(p, 1); +} + +static __attribute__((noinline,noclone)) int g(int n) +{ + /* Check that variable-length stack allocation works. */ + int v[n]; + v[0] = 0; + /* Check that atomic operations can be applied to auto data. */ + return f(v) == 0 && v[0] == 1; +} + +int main() +{ + if (!g(1)) + __builtin_abort(); + return 0; +} diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index dceabb5..878fdab 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -729,7 +729,10 @@ proc check_effective_target_untyped_assembly {} { proc check_effective_target_alloca {} { if { [istarget nvptx-*-*] } { - return 0 + return [check_no_compiler_messages alloca assembly { + void f (void*); + void g (int n) { f (__builtin_alloca (n)); } + }] } return 1 } diff --git a/libgcc/config/nvptx/crt0.c b/libgcc/config/nvptx/crt0.c index 3b7382d..a147c7e 100644 --- a/libgcc/config/nvptx/crt0.c +++ b/libgcc/config/nvptx/crt0.c @@ -33,5 +33,11 @@ __main (int *rval_ptr, int argc, void **argv) if (rval_ptr) *rval_ptr = 255; +#ifdef __nvptx_softstack__ + extern char *__nvptx_stacks[] __attribute__((shared)); + static char stack[131072] __attribute__((aligned(8))); + __nvptx_stacks[0] = stack + sizeof stack; +#endif + exit (main (argc, argv)); } diff --git a/libgcc/config/nvptx/stacks.c b/libgcc/config/nvptx/stacks.c new file mode 100644 index 0000000..4640fc9 --- /dev/null +++ b/libgcc/config/nvptx/stacks.c @@ -0,0 +1,25 @@ +/* Define shared memory array for -msoft-stack. + + Copyright (C) 2015-2016 Free Software Foundation, Inc. + + This file is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by the + Free Software Foundation; either version 3, or (at your option) any + later version. + + This file is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +char *__nvptx_stacks[32] __attribute__((shared,nocommon)); diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx index daf252f..23a8252 100644 --- a/libgcc/config/nvptx/t-nvptx +++ b/libgcc/config/nvptx/t-nvptx @@ -1,4 +1,5 @@ -LIB2ADD=$(srcdir)/config/nvptx/reduction.c +LIB2ADD=$(srcdir)/config/nvptx/reduction.c \ + $(srcdir)/config/nvptx/stacks.c LIB2ADDEH= LIB2FUNCS_EXCLUDE=__main ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack) 2016-06-02 21:22 ` Alexander Monakov @ 2016-06-09 19:49 ` Nathan Sidwell 2016-06-09 20:11 ` Alexander Monakov 0 siblings, 1 reply; 10+ messages in thread From: Nathan Sidwell @ 2016-06-09 19:49 UTC (permalink / raw) To: Alexander Monakov; +Cc: gcc-patches On 06/02/16 17:22, Alexander Monakov wrote: > On Wed, 25 May 2016, Nathan Sidwell wrote: >>>> It seems like we should reject the combination of -msoft-stack -fopenacc? >>> >>> Possibly; the doc text makes it explicit that the option is exposed only for >>> the purpose of testing the compiler, anyway. >> >> It is always best to prevent the user doing something you don't recommend, >> rather than presume they'll be sensible. > > No change for that yet in the respin; do you want something like > > if (flag_openacc && TARGET_SOFT_STACK) > sorry ("-fopenacc and -msoft-stack are mutually exclusive"); > > in nvptx_override_options? Yes, but it's not a sorry, just a regular error. > +static void > +init_softstack_frame (FILE *file, unsigned alignment, unsigned size) > +{ > + /* Maintain 64-bit stack alignment. */ This still needs blank lines to aid readibility. > @@ -1027,6 +1186,21 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) > REGNO (cfun->machine->axis_predicate[1]), "x"); > } > > +/* Output instruction that sets soft stack pointer in shared memory to the > + value in register given by SRC_REGNO. */ > + > +const char * > +nvptx_output_set_softstack (unsigned src_regno) > +{ > + if (cfun->machine->has_softstack && !crtl->is_leaf) > + { > + fprintf (asm_out_file, "\tst.shared.u%d\t[%s], ", > + POINTER_SIZE, reg_names[SOFTSTACK_SLOT_REGNUM]); > + output_reg (asm_out_file, src_regno, VOIDmode); > + fprintf (asm_out_file, ";\n"); > + } > + return ""; I think this would be clearer by having the softstack insn's pattern mention both the SRC and softstack reg and then simply return either "" or "st.shared%t\\t%0, %1" (and do that directly in the insn's emitter section rather than an fn call?) Then also have an epilogue expander emit the insn, rather than embed it in the return emitter. > --- a/gcc/config/nvptx/nvptx.md > +++ b/gcc/config/nvptx/nvptx.md > (define_expand "restore_stack_block" > [(match_operand 0 "register_operand" "") > (match_operand 1 "register_operand" "")] you've not addressed my previous comments about this. > @@ -0,0 +1,25 @@ > +char *__nvptx_stacks[32] __attribute__((shared,nocommon)); Is there a reason this can't be in crt0? It should be 'void *' Also, why '32' when only slot zero is initialized? ISTM that this should be size 1. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack) 2016-06-09 19:49 ` Nathan Sidwell @ 2016-06-09 20:11 ` Alexander Monakov 0 siblings, 0 replies; 10+ messages in thread From: Alexander Monakov @ 2016-06-09 20:11 UTC (permalink / raw) To: Nathan Sidwell; +Cc: gcc-patches On Thu, 9 Jun 2016, Nathan Sidwell wrote: > > (define_expand "restore_stack_block" > > [(match_operand 0 "register_operand" "") > > (match_operand 1 "register_operand" "")] > > you've not addressed my previous comments about this. To be clear -- do you mean that "restore_stack_block" should have a comment mentioning why "save_stack_block" is not overridden? > > @@ -0,0 +1,25 @@ > > > +char *__nvptx_stacks[32] __attribute__((shared,nocommon)); > > Is there a reason this can't be in crt0? It's also needed for offloading compilation, which doesn't link crt0. > It should be 'void *' Also, why '32' when only slot zero is initialized? > ISTM that this should be size 1. Offloading will need the full width. Thanks. Alexander ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2016-06-09 20:11 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-04-20 17:00 [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack) Alexander Monakov 2016-04-21 14:08 ` Nathan Sidwell 2016-05-20 15:09 ` Alexander Monakov 2016-05-20 16:36 ` Sandra Loosemore 2016-05-20 19:42 ` Nathan Sidwell 2016-05-25 2:32 ` Alexander Monakov 2016-05-25 13:33 ` Nathan Sidwell 2016-06-02 21:22 ` Alexander Monakov 2016-06-09 19:49 ` Nathan Sidwell 2016-06-09 20:11 ` Alexander Monakov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).