Document AMD GCN.

2019-01-18  Andrew Stubbs

	gcc/
	* doc/extend.texi (AMD GCN Function Attributes): New section.
	* doc/install.texi (amdgcn-unknown-amdhsa): New instructions.
	* doc/invoke.texi (AMD GCN Options): New section.
	* doc/md.texi (Constraints for Particular Machines): Add AMD GCN.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ebd5648..465de30 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2393,6 +2393,7 @@ GCC plugins may provide their own attributes.
 @menu
 * Common Function Attributes::
 * AArch64 Function Attributes::
+* AMD GCN Function Attributes::
 * ARC Function Attributes::
 * ARM Function Attributes::
 * AVR Function Attributes::
@@ -3954,6 +3955,96 @@ Note that CPU tuning options and attributes such as the @option{-mcpu=},
 @option{-mcpu=} option or the @code{cpu=} attribute conflicts with the
 architectural feature rules specified above.
 
+@node AMD GCN Function Attributes
+@subsection AMD GCN Function Attributes
+
+These function attributes are supported by the AMD GCN back end:
+
+@table @code
+@item amdgpu_hsa_kernel
+@cindex @code{amdgpu_hsa_kernel} function attribute, AMD GCN
+This attribute indicates that the corresponding function should be compiled as
+a kernel function, that is, an entry point that can be invoked from the host
+via the HSA runtime library. By default, functions are callable only from
+other GCN functions.
+
+This attribute is implicitly applied to any function named @code{main}, using
+default parameters.
+
+Kernel functions may return an integer value, which will be written to a
+conventional place within the HSA "kernargs" region.
+
+The attribute parameters configure what values are passed into the kernel
+function by the GPU drivers, via the initial register state. Some values are
+used by the compiler, and therefore forced on. Enabling other options may
+break assumptions in the compiler and/or run-time libraries.
+
+@table @code
+@item private_segment_buffer
+Set @code{enable_sgpr_private_segment_buffer} flag. Always on (required to
+locate the stack).
+
+@item dispatch_ptr
+Set @code{enable_sgpr_dispatch_ptr} flag. Always on (required to locate the
+launch dimensions).
+
+@item queue_ptr
+Set @code{enable_sgpr_queue_ptr} flag. Always on (required to convert address
+spaces).
+
+@item kernarg_segment_ptr
+Set @code{enable_sgpr_kernarg_segment_ptr} flag. Always on (required to
+locate the kernel arguments, "kernargs").
+
+@item dispatch_id
+Set @code{enable_sgpr_dispatch_id} flag.
+
+@item flat_scratch_init
+Set @code{enable_sgpr_flat_scratch_init} flag.
+
+@item private_segment_size
+Set @code{enable_sgpr_private_segment_size} flag.
+
+@item grid_workgroup_count_X
+Set @code{enable_sgpr_grid_workgroup_count_x} flag. Always on (required to
+use OpenACC/OpenMP).
+
+@item grid_workgroup_count_Y
+Set @code{enable_sgpr_grid_workgroup_count_y} flag.
+
+@item grid_workgroup_count_Z
+Set @code{enable_sgpr_grid_workgroup_count_z} flag.
+
+@item workgroup_id_X
+Set @code{enable_sgpr_workgroup_id_x} flag.
+
+@item workgroup_id_Y
+Set @code{enable_sgpr_workgroup_id_y} flag.
+
+@item workgroup_id_Z
+Set @code{enable_sgpr_workgroup_id_z} flag.
+
+@item workgroup_info
+Set @code{enable_sgpr_workgroup_info} flag.
+
+@item private_segment_wave_offset
+Set @code{enable_sgpr_private_segment_wave_byte_offset} flag. Always on
+(required to locate the stack).
+
+@item work_item_id_X
+Set @code{enable_vgpr_workitem_id} parameter. Always on (can't be disabled).
+
+@item work_item_id_Y
+Set @code{enable_vgpr_workitem_id} parameter. Always on (required to enable
+vectorization).
+
+@item work_item_id_Z
+Set @code{enable_vgpr_workitem_id} parameter. Always on (required to use
+OpenACC/OpenMP).
+
+@end table
+@end table
+
 @node ARC Function Attributes
 @subsection ARC Function Attributes
 
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index d5e1edb..81a15a0 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3447,6 +3447,27 @@ This is a synonym for @samp{x86_64-*-solaris2.1[0-9]*}.
 @html
 <hr />
 @end html
+@anchor{amdgcn-unknown-amdhsa}
+@heading amdgcn-unknown-amdhsa
+AMD GCN GPU target.
+
+Instead of GNU Binutils, you will need to install LLVM 6, or later, and copy
+@file{bin/llvm-mc} to @file{amdgcn-unknown-amdhsa/bin/as},
+@file{bin/lld} to @file{amdgcn-unknown-amdhsa/bin/ld},
+@file{bin/llvm-nm} to @file{amdgcn-unknown-amdhsa/bin/nm}, and
+@file{bin/llvm-ar} to both @file{bin/amdgcn-unknown-amdhsa-ar} and
+@file{bin/amdgcn-unknown-amdhsa-ranlib}.
+
+Use Newlib (2019-01-16, or newer).
+
+To run the binaries, install the HSA Runtime from the
+@uref{https://rocm.github.io,,ROCm Platform}, and use
+@file{libexec/gcc/amdgcn-unknown-amdhsa/@var{version}/gcn-run} to launch them
+on the GPU.
+
+@html
+<hr />
+@end html
 @anchor{arc-x-elf32}
 @heading arc-*-elf32
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1151708..ff8cd10 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -643,6 +643,9 @@ Objective-C and Objective-C++ Dialects}.
 -mfp-mode=@var{mode} -mvect-double -max-vect-align=@var{num} @gol
 -msplit-vecmove-early -m1reg-@var{reg}}
 
+@emph{AMD GCN Options}
+@gccoptlist{-march=@var{gpu} -mtune=@var{gpu} -mstack-size=@var{bytes}}
+
 @emph{ARC Options}
 @gccoptlist{-mbarrel-shifter -mjli-always @gol
 -mcpu=@var{cpu} -mA6 -mARC600 -mA7 -mARC700 @gol
@@ -15479,6 +15482,7 @@ platform.
 @menu
 * AArch64 Options::
 * Adapteva Epiphany Options::
+* AMD GCN Options::
 * ARC Options::
 * ARM Options::
 * AVR Options::
@@ -16083,6 +16087,41 @@ purpose. The default is @option{-m1reg-none}.
 
 @end table
 
+@node AMD GCN Options
+@subsection AMD GCN Options
+@cindex AMD GCN Options
+
+These options are defined specifically for the AMD GCN port.
+
+@table @gcctabopt
+
+@item -march=@var{gpu}
+@opindex march
+@itemx -mtune=@var{gpu}
+@opindex mtune
+Set architecture type or tuning for @var{gpu}. Supported values for @var{gpu}
+are
+
+@table @samp
+@opindex fiji
+@item fiji
+Compile for GCN3 Fiji devices (gfx803).
+
+@item gfx900
+Compile for GCN5 Vega 10 devices (gfx900).
+
+@end table
+
+@item -mstack-size=@var{bytes}
+@opindex mstack-size
+Specify how many @var{bytes} of stack space will be requested for each GPU
+thread (wave-front). Beware that there may be many threads and limited memory
+available. The size of the stack allocation may also have an impact on
+run-time performance. The default is 32KB when using OpenACC or OpenMP, and
+1MB otherwise.
+
+@end table
+
 @node ARC Options
 @subsection ARC Options
 @cindex ARC options
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 18b8af0..6ffb69b 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -1800,6 +1800,100 @@ DF modes
 
 @end table
 
+@item AMD GCN ---@file{config/gcn/constraints.md}
+@table @code
+@item I
+Immediate integer in the range @minus{}16 to 64
+
+@item J
+Immediate 16-bit signed integer
+
+@item Kf
+Immediate constant @minus{}1
+
+@item L
+Immediate 15-bit unsigned integer
+
+@item A
+Immediate constant that can be inlined in an instruction encoding: integer
+@minus{}16..64, or float 0.0, +/@minus{}0.5, +/@minus{}1.0, +/@minus{}2.0,
++/@minus{}4.0, 1.0/(2.0*PI)
+
+@item B
+Immediate 32-bit signed integer that can be attached to an instruction encoding
+
+@item C
+Immediate 32-bit integer in range @minus{}16..4294967295 (i.e. 32-bit unsigned
+integer or @samp{A} constraint)
+
+@item DA
+Immediate 64-bit constant that can be split into two @samp{A} constants
+
+@item DB
+Immediate 64-bit constant that can be split into two @samp{B} constants
+
+@item U
+Any @code{unspec}
+
+@item Y
+Any @code{symbol_ref} or @code{label_ref}
+
+@item v
+VGPR register
+
+@item Sg
+SGPR register
+
+@item SD
+SGPR registers valid for instruction destinations, including VCC, M0 and EXEC
+
+@item SS
+SGPR registers valid for instruction sources, including VCC, M0, EXEC and SCC
+
+@item Sm
+SGPR registers valid as a source for scalar memory instructions (excludes M0
+and EXEC)
+
+@item Sv
+SGPR registers valid as a source or destination for vector instructions
+(excludes EXEC)
+
+@item ca
+All condition registers: SCC, VCCZ, EXECZ
+
+@item cs
+Scalar condition register: SCC
+
+@item cV
+Vector condition register: VCC, VCC_LO, VCC_HI
+
+@item e
+EXEC register (EXEC_LO and EXEC_HI)
+
+@item RB
+Memory operand with address space suitable for @code{buffer_*} instructions
+
+@item RF
+Memory operand with address space suitable for @code{flat_*} instructions
+
+@item RS
+Memory operand with address space suitable for @code{s_*} instructions
+
+@item RL
+Memory operand with address space suitable for @code{ds_*} LDS instructions
+
+@item RG
+Memory operand with address space suitable for @code{ds_*} GDS instructions
+
+@item RD
+Memory operand with address space suitable for any @code{ds_*} instructions
+
+@item RM
+Memory operand with address space suitable for @code{global_*} instructions
+
+@end table
+
+
 @item ARC ---@file{config/arc/constraints.md}
 @table @code
 @item q