public inbox for gcc-patches@gcc.gnu.org
* The nvptx port [0/11+]
@ 2014-10-20 14:19 Bernd Schmidt
  2014-10-20 14:21 ` The nvptx port [1/11+] indirect jumps Bernd Schmidt
                   ` (16 more replies)
  0 siblings, 17 replies; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-20 14:19 UTC (permalink / raw)
  To: GCC Patches

This is a patch kit that adds the nvptx port to gcc. It contains 
preliminary patches to add needed functionality, the target files, and 
one somewhat optional patch with additional target tools. There'll be 
more patch series, one for the testsuite, and one to make the offload 
functionality work with this port. Also required are the previous four 
rtl patches, two of which weren't entirely approved yet.

For the moment, I've stripped out all the address space support that got 
bogged down in review by brokenness in our representation of address 
spaces. The ptx address spaces are of course still defined and used 
inside the backend.

Ptx really isn't a usual target - it is a virtual target which is then 
translated by another compiler (ptxas) to the final code that runs on 
the GPU. There are many restrictions, some imposed by the GPU hardware, 
and some by the fact that not everything you'd want can be represented 
in ptx. Here are some of the highlights:
  * Everything is typed - variables, functions, registers. This can
    cause problems with K&R style C or anything else that doesn't
    have a proper type internally.
  * Declarations are needed, even for undefined variables.
  * Can't emit initializers referring to their variable's address since
    you can't write forward declarations for variables.
  * Variables can be declared only as scalars or arrays, not
    structures. Initializers must be in the variable's declared type,
    which requires some code in the backend, and it means that packed
    pointer values are not representable.
  * Since it's a virtual target, we skip register allocation - no good
    can probably come from doing that twice. This means asm statements
    aren't fixed up and will fail if they use matching constraints.
  * No support for indirect jumps, label values, nonlocal gotos.
  * No alloca - ptx defines it, but it's not implemented.
  * No trampolines.
  * No debugging (at all, for now - we may add line number directives).
  * Limited C library support - I have a hacked up copy of newlib
    that provides a reasonable subset.
  * malloc and free are defined by ptx (these appear to be
    undocumented), but there isn't a realloc. I have one patch for
    Fortran to use a malloc/memcpy helper function in cases where we
    know the old size.
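
The realloc workaround in the last bullet can be sketched in plain C. This is only an illustration, not the Fortran patch itself: the helper name realloc_known_size and its signature are made up here, and it assumes the caller really does know the size of the old block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (the name is not taken from the patch series):
   resize a block whose current size is known to the caller, using only
   malloc, memcpy and free.  */
void *
realloc_known_size (void *old, size_t old_size, size_t new_size)
{
  void *fresh = malloc (new_size);
  if (fresh == NULL)
    return NULL;   /* As with realloc, OLD stays valid on failure.  */
  if (old != NULL)
    {
      /* Copy only the bytes that fit in both blocks.  */
      memcpy (fresh, old, old_size < new_size ? old_size : new_size);
      free (old);
    }
  return fresh;
}
```

Unlike a real realloc, this always allocates a fresh block and copies, so it never resizes in place.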

All in all, this is not intended to be used as a C (or any other source 
language) compiler. I've gone through a lot of effort to make it work 
reasonably well, but only in order to get sufficient test coverage from 
the testsuites. The intended use for this is only to build it as an 
offload compiler, and use it through OpenACC by way of lto1. That leaves 
the question of how we should document it - does it need the usual 
constraint and option documentation, given that users aren't expected 
to use any of it?

A slightly earlier version of the entire patch kit was bootstrapped and 
tested on x86_64-linux. Ok for trunk?


Bernd


* The nvptx port [1/11+] indirect jumps
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
@ 2014-10-20 14:21 ` Bernd Schmidt
  2014-10-21 18:29   ` Jeff Law
  2014-11-04 15:35   ` Bernd Schmidt
  2014-10-20 14:22 ` The nvptx port [2/11+] No register allocation Bernd Schmidt
                   ` (15 subsequent siblings)
  16 siblings, 2 replies; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-20 14:21 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 101 bytes --]

ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
defined.  Add a sorry.
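
For reference, this is the kind of source construct that needs an indirect jump. The sketch below is not from the patch; it relies on the GNU C "labels as values" extension, and the function name classify is made up. On a target without an indirect_jump pattern, code like this would be expected to reach the new sorry rather than compile.

```c
#include <assert.h>

/* Uses the GNU C "labels as values" extension (&&label and goto *ptr).
   The "goto *" below is an indirect jump: its destination is only known
   at run time.  */
int
classify (int n)
{
  static void *const table[] = { &&zero, &&one, &&many };
  int idx = n < 0 ? 2 : (n > 2 ? 2 : n);

  goto *table[idx];

zero:
  return 0;
one:
  return 10;
many:
  return 20;
}
```

On a host compiler that supports the extension, classify (1) returns 10 and any argument outside 0..1 returns 20.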


Bernd

[-- Attachment #2: 001-indjumps.diff --]
[-- Type: text/x-patch, Size: 850 bytes --]

	gcc/
	* optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a
	sorry if necessary.

------------------------------------------------------------------------
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 422345)
+++ gcc/optabs.c	(revision 422346)
@@ -4477,13 +4477,16 @@ prepare_float_lib_cmp (rtx x, rtx y, enu
 /* Generate code to indirectly jump to a location given in the rtx LOC.  */
 
 void
-emit_indirect_jump (rtx loc)
+emit_indirect_jump (rtx loc ATTRIBUTE_UNUSED)
 {
+#ifndef HAVE_indirect_jump
+  sorry ("indirect jumps are not available on this target");
+#else
   struct expand_operand ops[1];
-
   create_address_operand (&ops[0], loc);
   expand_jump_insn (CODE_FOR_indirect_jump, 1, ops);
   emit_barrier ();
+#endif
 }
 \f
 #ifdef HAVE_conditional_move


* The nvptx port [2/11+] No register allocation
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
  2014-10-20 14:21 ` The nvptx port [1/11+] indirect jumps Bernd Schmidt
@ 2014-10-20 14:22 ` Bernd Schmidt
  2014-10-20 14:24 ` The nvptx port [3/11+] Struct returns Bernd Schmidt
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-20 14:22 UTC (permalink / raw)
  To: GCC Patches

Since it's a virtual target, I've chosen not to run register allocation. 
This is one of the patches necessary to make that work; it primarily 
adds a target hook to disable it and fixes some of the fallout.


Bernd


* Re: The nvptx port [3/11+] Struct returns
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
  2014-10-20 14:21 ` The nvptx port [1/11+] indirect jumps Bernd Schmidt
  2014-10-20 14:22 ` The nvptx port [2/11+] No register allocation Bernd Schmidt
@ 2014-10-20 14:24 ` Bernd Schmidt
  2014-10-21 18:41   ` Jeff Law
  2014-10-20 14:24 ` The nvptx port [2/11+] No register allocation Bernd Schmidt
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-20 14:24 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 417 bytes --]

Even when returning a structure by passing an invisible reference, gcc 
still likes to set the return register to the address of the struct. 
This is undesirable on ptx where things like the return register have to 
be declared, and the function really returns void at ptx level. I've 
added a target hook to avoid this. I figure other targets might find it 
beneficial to omit this unnecessary set as well.
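
The two lowerings can be written out by hand in C to show the difference. This is a source-level model only, with made-up names; the real change happens in RTL expansion, not in user code.

```c
#include <assert.h>

struct pair { long first; long second; };

/* Conventional lowering: the caller passes the result slot as a hidden
   pointer, and the callee also hands that address back as its return
   value.  */
struct pair *
make_pair_returning_address (struct pair *result_slot, long a, long b)
{
  result_slot->first = a;
  result_slot->second = b;
  return result_slot;   /* The extra set of the return register.  */
}

/* Lowering with the return-register set omitted: at this level the
   function simply returns void.  */
void
make_pair_returning_void (struct pair *result_slot, long a, long b)
{
  result_slot->first = a;
  result_slot->second = b;
}
```

Both variants leave the same data in the caller's slot; only the first has a return value that would need to be declared.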


Bernd


[-- Attachment #2: 003-sretreg.diff --]
[-- Type: text/x-patch, Size: 3075 bytes --]

	gcc/
	* target.def (omit_struct_return_reg): New data hook.
	* doc/tm.texi.in: Add @hook TARGET_OMIT_STRUCT_RETURN_REG.
	* doc/tm.texi: Regenerate.
	* function.c (expand_function_end): Use it.

------------------------------------------------------------------------
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 422355)
+++ gcc/doc/tm.texi	(revision 422356)
@@ -4560,6 +4560,14 @@ need more space than is implied by @code
 saving and restoring an arbitrary return value.
 @end defmac
 
+@deftypevr {Target Hook} bool TARGET_OMIT_STRUCT_RETURN_REG
+Normally, when a function returns a structure by memory, the address
+is passed as an invisible pointer argument, but the compiler also
+arranges to return the address from the function like it would a normal
+pointer return value.  Define this to true if that behaviour is
+undesirable on your target.
+@end deftypevr
+
 @deftypefn {Target Hook} bool TARGET_RETURN_IN_MSB (const_tree @var{type})
 This hook should return true if values of type @var{type} are returned
 at the most significant end of a register (in other words, if they are
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 422355)
+++ gcc/doc/tm.texi.in	(revision 422356)
@@ -3769,6 +3769,8 @@ need more space than is implied by @code
 saving and restoring an arbitrary return value.
 @end defmac
 
+@hook TARGET_OMIT_STRUCT_RETURN_REG
+
 @hook TARGET_RETURN_IN_MSB
 
 @node Aggregate Return
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 422355)
+++ gcc/target.def	(revision 422356)
@@ -3601,6 +3601,16 @@ structure value address at the beginning
 to emit adjusting code, you should do it at this point.",
  rtx, (tree fndecl, int incoming),
  hook_rtx_tree_int_null)
+
+DEFHOOKPOD
+(omit_struct_return_reg,
+ "Normally, when a function returns a structure by memory, the address\n\
+is passed as an invisible pointer argument, but the compiler also\n\
+arranges to return the address from the function like it would a normal\n\
+pointer return value.  Define this to true if that behaviour is\n\
+undesirable on your target.",
+ bool, false)
+
 DEFHOOK
 (return_in_memory,
  "This target hook should return a nonzero value to say to return the\n\
Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 422355)
+++ gcc/function.c	(revision 422356)
@@ -5179,8 +5179,8 @@ expand_function_end (void)
      If returning a structure PCC style,
      the caller also depends on this value.
      And cfun->returns_pcc_struct is not necessarily set.  */
-  if (cfun->returns_struct
-      || cfun->returns_pcc_struct)
+  if ((cfun->returns_struct || cfun->returns_pcc_struct)
+      && !targetm.calls.omit_struct_return_reg)
     {
       rtx value_address = DECL_RTL (DECL_RESULT (current_function_decl));
       tree type = TREE_TYPE (DECL_RESULT (current_function_decl));


* The nvptx port [2/11+] No register allocation
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (2 preceding siblings ...)
  2014-10-20 14:24 ` The nvptx port [3/11+] Struct returns Bernd Schmidt
@ 2014-10-20 14:24 ` Bernd Schmidt
  2014-10-21 18:36   ` Jeff Law
  2014-10-20 14:27 ` The nvptx port [4/11+] Post-RA pipeline Bernd Schmidt
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-20 14:24 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 217 bytes --]

Since it's a virtual target, I've chosen not to run register allocation. 
This is one of the patches necessary to make that work; it primarily 
adds a target hook to disable it and fixes some of the fallout.
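
As a rough model of what the hook does to the pass pipeline, here is a toy pass list in C. Everything in it (struct toy_pass, the pass names chosen, count_passes_run) is invented for illustration and is far simpler than GCC's real pass manager.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for targetm.no_register_allocation.  */
static bool no_register_allocation;

struct toy_pass
{
  const char *name;
  bool (*gate) (void);   /* NULL means "always run".  */
};

/* Gate shared by the allocation passes: run unless the target opts out.  */
static bool
gate_regalloc (void)
{
  return !no_register_allocation;
}

static const struct toy_pass pipeline[] = {
  { "sched1", NULL },
  { "ira", gate_regalloc },
  { "reload", gate_regalloc },
  { "final", NULL },
};

/* Count the passes that would run for the given hook value.  */
size_t
count_passes_run (bool hook_value)
{
  size_t n = 0;

  no_register_allocation = hook_value;
  for (size_t i = 0; i < sizeof pipeline / sizeof pipeline[0]; i++)
    if (pipeline[i].gate == NULL || pipeline[i].gate ())
      n++;
  return n;
}
```

With the hook false all four toy passes run; with it true only the two ungated ones do.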


Bernd


[-- Attachment #2: 002-noregalloc.diff --]
[-- Type: text/x-patch, Size: 5177 bytes --]

	gcc/
	* target.def (no_register_allocation): New data hook.
	* doc/tm.texi.in: Add @hook TARGET_NO_REGISTER_ALLOCATION.
	* doc/tm.texi: Regenerate.
	* ira.c (pass_ira): Add a gate function.
	(pass_reload): Likewise.
	* reload1.c (eliminate_regs): If reg_eliminate is NULL, assert that
	no register allocation happens on the target and return.
	* final.c (alter_subreg): Ensure register is not a pseudo before
	calling simplify_subreg.
	(output_operand): Assert that x isn't a pseudo only if doing
	register allocation.

------------------------------------------------------------------------
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi.orig
+++ gcc/doc/tm.texi
@@ -9520,11 +9520,19 @@ True if the @code{DW_AT_comp_dir} attrib
 @end deftypevr
 
 @deftypevr {Target Hook} bool TARGET_DELAY_SCHED2
-True if sched2 is not to be run at its normal place.  This usually means it will be run as part of machine-specific reorg.
+True if sched2 is not to be run at its normal place.
+This usually means it will be run as part of machine-specific reorg.
 @end deftypevr
 
 @deftypevr {Target Hook} bool TARGET_DELAY_VARTRACK
-True if vartrack is not to be run at its normal place.  This usually means it will be run as part of machine-specific reorg.
+True if vartrack is not to be run at its normal place.
+This usually means it will be run as part of machine-specific reorg.
+@end deftypevr
+
+@deftypevr {Target Hook} bool TARGET_NO_REGISTER_ALLOCATION
+True if register allocation and the passes
+following it should not be run.  Usually true only for virtual assembler
+targets.
 @end deftypevr
 
 @defmac ASM_OUTPUT_DWARF_DELTA (@var{stream}, @var{size}, @var{label1}, @var{label2})
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in.orig
+++ gcc/doc/tm.texi.in
@@ -7188,6 +7188,8 @@ tables, and hence is desirable if it wor
 
 @hook TARGET_DELAY_VARTRACK
 
+@hook TARGET_NO_REGISTER_ALLOCATION
+
 @defmac ASM_OUTPUT_DWARF_DELTA (@var{stream}, @var{size}, @var{label1}, @var{label2})
 A C statement to issue assembly directives that create a difference
 @var{lab1} minus @var{lab2}, using an integer of the given @var{size}.
Index: gcc/target.def
===================================================================
--- gcc/target.def.orig
+++ gcc/target.def
@@ -5379,15 +5379,21 @@ DEFHOOKPOD
  bool, false)
 
 DEFHOOKPOD
-(delay_sched2, "True if sched2 is not to be run at its normal place.  \
+(delay_sched2, "True if sched2 is not to be run at its normal place.\n\
 This usually means it will be run as part of machine-specific reorg.",
 bool, false)
 
 DEFHOOKPOD
-(delay_vartrack, "True if vartrack is not to be run at its normal place.  \
+(delay_vartrack, "True if vartrack is not to be run at its normal place.\n\
 This usually means it will be run as part of machine-specific reorg.",
 bool, false)
 
+DEFHOOKPOD
+(no_register_allocation, "True if register allocation and the passes\n\
+following it should not be run.  Usually true only for virtual assembler\n\
+targets.",
+bool, false)
+
 /* Leave the boolean fields at the end.  */
 
 /* Close the 'struct gcc_target' definition.  */
Index: gcc/final.c
===================================================================
--- gcc/final.c.orig
+++ gcc/final.c
@@ -3129,7 +3129,7 @@ alter_subreg (rtx *xp, bool final_p)
       else
 	*xp = adjust_address_nv (y, GET_MODE (x), offset);
     }
-  else
+  else if (REG_P (y) && HARD_REGISTER_P (y))
     {
       rtx new_rtx = simplify_subreg (GET_MODE (x), y, GET_MODE (y),
 				     SUBREG_BYTE (x));
@@ -3816,7 +3816,8 @@ output_operand (rtx x, int code ATTRIBUT
     x = alter_subreg (&x, true);
 
   /* X must not be a pseudo reg.  */
-  gcc_assert (!x || !REG_P (x) || REGNO (x) < FIRST_PSEUDO_REGISTER);
+  if (!targetm.no_register_allocation)
+    gcc_assert (!x || !REG_P (x) || REGNO (x) < FIRST_PSEUDO_REGISTER);
 
   targetm.asm_out.print_operand (asm_out_file, x, code);
 
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c.orig
+++ gcc/reload1.c
@@ -2947,6 +2947,11 @@ eliminate_regs_1 (rtx x, enum machine_mo
 rtx
 eliminate_regs (rtx x, enum machine_mode mem_mode, rtx insn)
 {
+  if (reg_eliminate == NULL)
+    {
+      gcc_assert (targetm.no_register_allocation);
+      return x;
+    }
   return eliminate_regs_1 (x, mem_mode, insn, false, false);
 }
 
Index: gcc/ira.c
===================================================================
--- gcc/ira.c.orig
+++ gcc/ira.c
@@ -5573,6 +5573,10 @@ public:
   {}
 
   /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return !targetm.no_register_allocation;
+    }
   virtual unsigned int execute (function *)
     {
       ira (dump_file);
@@ -5613,6 +5617,10 @@ public:
   {}
 
   /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return !targetm.no_register_allocation;
+    }
   virtual unsigned int execute (function *)
     {
       do_reload ();


* The nvptx port [4/11+] Post-RA pipeline
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (3 preceding siblings ...)
  2014-10-20 14:24 ` The nvptx port [2/11+] No register allocation Bernd Schmidt
@ 2014-10-20 14:27 ` Bernd Schmidt
  2014-10-21 18:42   ` Jeff Law
  2014-10-20 14:27 ` The nvptx port [5/11+] Variable declarations Bernd Schmidt
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-20 14:27 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 426 bytes --]

This stops most of the post-regalloc passes from being run if the target 
doesn't want register allocation. I'd previously moved them all out of 
postreload to the toplevel, but Jakub (I think) pointed out that they 
live inside postreload so that they are skipped if reload fails, e.g. 
for an invalid asm, which avoids crashes. So I've made a new container 
pass.

A later patch will make thread_prologue_and_epilogue_insns callable from 
the backend.
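
The gating of the new container can be modelled in a few lines of C. The struct and function names below are invented for this sketch; only the gate condition itself is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for GCC's reload_completed flag and the
   no_register_allocation target hook.  */
struct toy_state
{
  bool reload_completed;
  bool no_register_allocation;
};

/* Gate of the container pass: same condition as in the patch.  */
static bool
late_compilation_gate (const struct toy_state *s)
{
  return s->reload_completed || s->no_register_allocation;
}

/* Number of sub-passes executed: all of them or none, depending on the
   container's gate.  */
int
run_late_compilation (const struct toy_state *s, int n_subpasses)
{
  if (!late_compilation_gate (s))
    return 0;
  return n_subpasses;
}
```

The three interesting cases are: reload succeeded on an ordinary target (sub-passes run), reload failed on an ordinary target (skipped), and a target that does no register allocation at all (sub-passes run).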


Bernd


[-- Attachment #2: 004-postra.diff --]
[-- Type: text/x-patch, Size: 2978 bytes --]

	gcc/
	* passes.def (pass_compute_alignments, pass_duplicate_computed_gotos,
	pass_variable_tracking, pass_free_cfg, pass_machine_reorg,
	pass_cleanup_barriers, pass_delay_slots,
	pass_split_for_shorten_branches, pass_convert_to_eh_region_ranges,
	pass_shorten_branches, pass_est_nothrow_function_flags,
	pass_dwarf2_frame, pass_final): Move outside of pass_postreload and
	into pass_late_compilation.
	(pass_late_compilation): Add.
	* passes.c (pass_data_late_compilation, pass_late_compilation,
	make_pass_late_compilation): New.
	* timevar.def (TV_LATE_COMPILATION): New.

------------------------------------------------------------------------
Index: gcc/passes.def
===================================================================
--- gcc/passes.def.orig
+++ gcc/passes.def
@@ -415,6 +415,9 @@ along with GCC; see the file COPYING3.
 	      NEXT_PASS (pass_split_before_regstack);
 	      NEXT_PASS (pass_stack_regs_run);
 	  POP_INSERT_PASSES ()
+      POP_INSERT_PASSES ()
+      NEXT_PASS (pass_late_compilation);
+      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
 	  NEXT_PASS (pass_compute_alignments);
 	  NEXT_PASS (pass_variable_tracking);
 	  NEXT_PASS (pass_free_cfg);
Index: gcc/passes.c
===================================================================
--- gcc/passes.c.orig
+++ gcc/passes.c
@@ -569,6 +569,44 @@ make_pass_postreload (gcc::context *ctxt
   return new pass_postreload (ctxt);
 }
 
+namespace {
+
+const pass_data pass_data_late_compilation =
+{
+  RTL_PASS, /* type */
+  "*all-late_compilation", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_LATE_COMPILATION, /* tv_id */
+  PROP_rtl, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_late_compilation : public rtl_opt_pass
+{
+public:
+  pass_late_compilation (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_late_compilation, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+  {
+    return reload_completed || targetm.no_register_allocation;
+  }
+
+}; // class pass_late_compilation
+
+} // anon namespace
+
+static rtl_opt_pass *
+make_pass_late_compilation (gcc::context *ctxt)
+{
+  return new pass_late_compilation (ctxt);
+}
+
 
 
 /* Set the static pass number of pass PASS to ID and record that
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def.orig
+++ gcc/timevar.def
@@ -270,6 +270,7 @@ DEFTIMEVAR (TV_EARLY_LOCAL	     , "early
 DEFTIMEVAR (TV_OPTIMIZE		     , "unaccounted optimizations")
 DEFTIMEVAR (TV_REST_OF_COMPILATION   , "rest of compilation")
 DEFTIMEVAR (TV_POSTRELOAD	     , "unaccounted post reload")
+DEFTIMEVAR (TV_LATE_COMPILATION	     , "unaccounted late compilation")
 DEFTIMEVAR (TV_REMOVE_UNUSED	     , "remove unused locals")
 DEFTIMEVAR (TV_ADDRESS_TAKEN	     , "address taken")
 DEFTIMEVAR (TV_TODO		     , "unaccounted todo")


* The nvptx port [5/11+] Variable declarations
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (4 preceding siblings ...)
  2014-10-20 14:27 ` The nvptx port [4/11+] Post-RA pipeline Bernd Schmidt
@ 2014-10-20 14:27 ` Bernd Schmidt
  2014-10-21 18:44   ` Jeff Law
  2014-10-20 14:31 ` The nvptx port [6/11+] Pseudo call args Bernd Schmidt
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-20 14:27 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 191 bytes --]

ptx assembly follows rather different rules than what's typical 
elsewhere. We need a new hook to add a " };" string when we are finished 
outputting a variable with an initializer.
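
A small stand-alone model of the hook, for illustration. The emitter, the buffer, and both hook functions are made up here, and the directive text is only ptx-like, not exact ptx syntax.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char out[128];   /* Stands in for the assembler output file.  */

static void
emit (const char *text)
{
  strncat (out, text, sizeof out - strlen (out) - 1);
}

/* Default hook: most assemblers need no terminator.  */
void
decl_end_default (void)
{
}

/* Hypothetical hook for a ptx-like assembler: close the initializer.  */
void
decl_end_braces (void)
{
  emit (" };");
}

/* Emit one initialized variable, calling DECL_END once the data is out.
   The directive syntax passed in by callers is illustrative only.  */
const char *
emit_variable (const char *header, const char *data, void (*decl_end) (void))
{
  out[0] = '\0';
  emit (header);
  emit (data);
  decl_end ();
  return out;
}
```

With the default hook the output ends right after the data; with the brace-closing hook the terminator is appended once per variable.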


Bernd


[-- Attachment #2: 005-declend.diff --]
[-- Type: text/x-patch, Size: 2550 bytes --]

	gcc/
	* target.def (decl_end): New hook.
	* varasm.c (assemble_variable_contents, assemble_constant_contents):
	Use it.
	* doc/tm.texi.in (TARGET_ASM_DECL_END): Add.
	* doc/tm.texi: Regenerate.

------------------------------------------------------------------------
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi.orig
+++ gcc/doc/tm.texi
@@ -7575,6 +7575,11 @@ The default implementation of this hook
 when the relevant string is @code{NULL}.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_ASM_DECL_END (void)
+Define this hook if the target assembler requires a special marker to
+terminate an initialized variable declaration.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA (FILE *@var{file}, rtx @var{x})
 A target hook to recognize @var{rtx} patterns that @code{output_addr_const}
 can't deal with, and output assembly code to @var{file} corresponding to
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in.orig
+++ gcc/doc/tm.texi.in
@@ -5412,6 +5412,8 @@ It must not be modified by command-line
 
 @hook TARGET_ASM_INTEGER
 
+@hook TARGET_ASM_DECL_END
+
 @hook TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA
 
 @defmac ASM_OUTPUT_ASCII (@var{stream}, @var{ptr}, @var{len})
Index: gcc/target.def
===================================================================
--- gcc/target.def.orig
+++ gcc/target.def
@@ -127,6 +127,15 @@ when the relevant string is @code{NULL}.
  bool, (rtx x, unsigned int size, int aligned_p),
  default_assemble_integer)
 
+/* Notify the backend that we have completed emitting the data for a
+   decl.  */
+DEFHOOK
+(decl_end,
+ "Define this hook if the target assembler requires a special marker to\n\
+terminate an initialized variable declaration.",
+ void, (void),
+ hook_void_void)
+
 /* Output code that will globalize a label.  */
 DEFHOOK
 (globalize_label,
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c.orig
+++ gcc/varasm.c
@@ -1945,6 +1945,7 @@ assemble_variable_contents (tree decl, c
       else
 	/* Leave space for it.  */
 	assemble_zeros (tree_to_uhwi (DECL_SIZE_UNIT (decl)));
+      targetm.asm_out.decl_end ();
     }
 }
 
@@ -3349,6 +3350,8 @@ assemble_constant_contents (tree exp, co
 
   /* Output the value of EXP.  */
   output_constant (exp, size, align);
+
+  targetm.asm_out.decl_end ();
 }
 
 /* We must output the constant data referred to by SYMBOL; do so.  */


* The nvptx port [6/11+] Pseudo call args
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (5 preceding siblings ...)
  2014-10-20 14:27 ` The nvptx port [5/11+] Variable declarations Bernd Schmidt
@ 2014-10-20 14:31 ` Bernd Schmidt
  2014-10-21 18:56   ` Jeff Law
  2014-10-20 14:32 ` The nvptx port [7/11+] Inform the port about call arguments Bernd Schmidt
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-20 14:31 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 142 bytes --]

On ptx, we'll be using pseudos to pass function args as well, and 
there's one assert that needs to be toned down to make that work.
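
The relaxed check can be modelled outside GCC like this. All names are invented for the sketch, and the register-number boundary of 16 is arbitrary; in GCC the boundary is FIRST_PSEUDO_REGISTER and the list is an RTL EXPR_LIST.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register file: numbers below this bound are hard
   registers, the rest are pseudos.  The value 16 is arbitrary.  */
enum { TOY_FIRST_PSEUDO_REGISTER = 16 };

struct toy_fusage
{
  int regs[8];
  int count;
};

static bool
toy_hard_register_p (int regno)
{
  return regno < TOY_FIRST_PSEUDO_REGISTER;
}

/* Record REGNO as used by a call.  Mirrors the relaxed check: a pseudo
   is accepted but simply not recorded.  */
void
toy_use_reg (struct toy_fusage *fusage, int regno)
{
  assert (regno >= 0);

  if (!toy_hard_register_p (regno))
    return;

  assert (fusage->count < 8);
  fusage->regs[fusage->count++] = regno;
}
```

Before the change, the equivalent of passing a pseudo here would have tripped the assert instead of returning quietly.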


Bernd


[-- Attachment #2: 006-usereg.diff --]
[-- Type: text/x-patch, Size: 668 bytes --]

	gcc/
	* expr.c (use_reg_mode): Just return for pseudo registers.

------------------------------------------------------------------------
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 422421)
+++ gcc/expr.c	(revision 422422)
@@ -2321,7 +2321,10 @@ copy_blkmode_to_reg (enum machine_mode m
 void
 use_reg_mode (rtx *call_fusage, rtx reg, enum machine_mode mode)
 {
-  gcc_assert (REG_P (reg) && REGNO (reg) < FIRST_PSEUDO_REGISTER);
+  gcc_assert (REG_P (reg));
+
+  if (!HARD_REGISTER_P (reg))
+    return;
 
   *call_fusage
     = gen_rtx_EXPR_LIST (mode, gen_rtx_USE (VOIDmode, reg), *call_fusage);


* The nvptx port [8/11+] Write undefined decls.
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (7 preceding siblings ...)
  2014-10-20 14:32 ` The nvptx port [7/11+] Inform the port about call arguments Bernd Schmidt
@ 2014-10-20 14:32 ` Bernd Schmidt
  2014-10-21 22:07   ` Jeff Law
  2014-10-20 14:35 ` The nvptx port [9/11+] Epilogues Bernd Schmidt
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-20 14:32 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 117 bytes --]

ptx assembly requires that declarations are written for undefined 
variables. This adds that functionality.
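
In outline, the new loop walks every variable and calls the hook for those without a definition. The C model below uses invented types and names, and the ".extern" text it prints is illustrative rather than real ptx output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct toy_var
{
  const char *name;
  bool definition;   /* True if defined in this translation unit.  */
};

/* Hypothetical hook type, modelled on the (stream, name, decl) signature.  */
typedef void (*undefined_decl_hook) (FILE *, const char *,
                                     const struct toy_var *);

/* Default: most assemblers need nothing for an undefined variable.  */
void
hook_nothing (FILE *stream, const char *name, const struct toy_var *decl)
{
  (void) stream;
  (void) name;
  (void) decl;
}

/* ptx-like target: write a declaration.  The directive is illustrative.  */
void
hook_declare (FILE *stream, const char *name, const struct toy_var *decl)
{
  (void) decl;
  fprintf (stream, ".extern %s;\n", name);
}

/* Call HOOK for each variable without a definition; return the number
   of calls made.  */
size_t
output_undefined_decls (FILE *stream, const struct toy_var *vars, size_t n,
                        undefined_decl_hook hook)
{
  size_t calls = 0;

  for (size_t i = 0; i < n; i++)
    if (!vars[i].definition)
      {
        hook (stream, vars[i].name, &vars[i]);
        calls++;
      }
  return calls;
}
```

The hook is called the same number of times on every target; whether anything is written depends only on which hook is installed.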


Bernd


[-- Attachment #2: 008-undefdecl.diff --]
[-- Type: text/x-patch, Size: 7898 bytes --]

	gcc/
	* target.def (assemble_undefined_decl): New hook.
	* hooks.c (hook_void_FILEptr_constcharptr_const_tree): New function.
	* hooks.h (hook_void_FILEptr_constcharptr_const_tree): Declare.
	* doc/tm.texi.in (TARGET_ASM_ASSEMBLE_UNDEFINED_DECL): Add.
	* doc/tm.texi: Regenerate.
	* output.h (assemble_undefined_decl): Declare.
	(get_fnname_from_decl): Declare.
	* varasm.c (assemble_undefined_decl): New function.
	(get_fnname_from_decl): New function.
	* final.c (rest_of_handle_final): Use it.
	* varpool.c (varpool_output_variables): Call assemble_undefined_decl
	for nodes without a definition.

------------------------------------------------------------------------
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi.orig
+++ gcc/doc/tm.texi
@@ -7899,6 +7902,13 @@ global; that is, available for reference
 The default implementation uses the TARGET_ASM_GLOBALIZE_LABEL target hook.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_ASM_ASSEMBLE_UNDEFINED_DECL (FILE *@var{stream}, const char *@var{name}, const_tree @var{decl})
+This target hook is a function to output to the stdio stream
+@var{stream} some commands that will declare the name associated with
+@var{decl} which is not defined in the current translation unit.  Most
+assemblers do not require anything to be output in this case.
+@end deftypefn
+
 @defmac ASM_WEAKEN_LABEL (@var{stream}, @var{name})
 A C statement (sans semicolon) to output to the stdio stream
 @var{stream} some commands that will make the label @var{name} weak;
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in.orig
+++ gcc/doc/tm.texi.in
@@ -5693,6 +5693,8 @@ You may wish to use @code{ASM_OUTPUT_SIZ
 
 @hook TARGET_ASM_GLOBALIZE_DECL_NAME
 
+@hook TARGET_ASM_ASSEMBLE_UNDEFINED_DECL
+
 @defmac ASM_WEAKEN_LABEL (@var{stream}, @var{name})
 A C statement (sans semicolon) to output to the stdio stream
 @var{stream} some commands that will make the label @var{name} weak;
Index: gcc/hooks.c
===================================================================
--- gcc/hooks.c.orig
+++ gcc/hooks.c
@@ -139,6 +139,13 @@ hook_void_FILEptr_constcharptr (FILE *a
 {
 }
 
+/* Generic hook that takes (FILE *, const char *, const_tree) and does
+   nothing.  */
+void
+hook_void_FILEptr_constcharptr_const_tree (FILE *, const char *, const_tree)
+{
+}
+
 /* Generic hook that takes (FILE *, rtx) and returns false.  */
 bool
 hook_bool_FILEptr_rtx_false (FILE *a ATTRIBUTE_UNUSED,
Index: gcc/hooks.h
===================================================================
--- gcc/hooks.h.orig
+++ gcc/hooks.h
@@ -69,6 +69,8 @@ extern void hook_void_void (void);
 extern void hook_void_constcharptr (const char *);
 extern void hook_void_rtx_int (rtx, int);
 extern void hook_void_FILEptr_constcharptr (FILE *, const char *);
+extern void hook_void_FILEptr_constcharptr_const_tree (FILE *, const char *,
+						       const_tree);
 extern bool hook_bool_FILEptr_rtx_false (FILE *, rtx);
 extern void hook_void_rtx (rtx);
 extern void hook_void_tree (tree);
Index: gcc/target.def
===================================================================
--- gcc/target.def.orig
+++ gcc/target.def
@@ -158,6 +158,16 @@ global; that is, available for reference
 The default implementation uses the TARGET_ASM_GLOBALIZE_LABEL target hook.",
  void, (FILE *stream, tree decl), default_globalize_decl_name)
 
+/* Output code that will declare an external variable.  */
+DEFHOOK
+(assemble_undefined_decl,
+ "This target hook is a function to output to the stdio stream\n\
+@var{stream} some commands that will declare the name associated with\n\
+@var{decl} which is not defined in the current translation unit.  Most\n\
+assemblers do not require anything to be output in this case.",
+ void, (FILE *stream, const char *name, const_tree decl),
+ hook_void_FILEptr_constcharptr_const_tree)
+
 /* Output code that will emit a label for unwind info, if this
    target requires such labels.  Second argument is the decl the
    unwind info is associated with, third is a boolean: true if
Index: gcc/final.c
===================================================================
--- gcc/final.c.orig
+++ gcc/final.c
@@ -4434,17 +4434,7 @@ leaf_renumber_regs_insn (rtx in_rtx)
 static unsigned int
 rest_of_handle_final (void)
 {
-  rtx x;
-  const char *fnname;
-
-  /* Get the function's name, as described by its RTL.  This may be
-     different from the DECL_NAME name used in the source file.  */
-
-  x = DECL_RTL (current_function_decl);
-  gcc_assert (MEM_P (x));
-  x = XEXP (x, 0);
-  gcc_assert (GET_CODE (x) == SYMBOL_REF);
-  fnname = XSTR (x, 0);
+  const char *fnname = get_fnname_from_decl (current_function_decl);
 
   assemble_start_function (current_function_decl, fnname);
   final_start_function (get_insns (), asm_out_file, optimize);
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c.orig
+++ gcc/varasm.c
@@ -1611,6 +1611,18 @@ decide_function_section (tree decl)
   in_cold_section_p = first_function_block_is_cold;
 }
 
+/* Get the function's name, as described by its RTL.  This may be
+   different from the DECL_NAME name used in the source file.  */
+const char *
+get_fnname_from_decl (tree decl)
+{
+  rtx x = DECL_RTL (decl);
+  gcc_assert (MEM_P (x));
+  x = XEXP (x, 0);
+  gcc_assert (GET_CODE (x) == SYMBOL_REF);
+  return XSTR (x, 0);
+}
+
 /* Output assembler code for the constant pool of a function and associated
    with defining the name of the function.  DECL describes the function.
    NAME is the function's name.  For the constant pool, we use the current
@@ -1977,6 +1989,15 @@ assemble_variable_contents (tree decl, c
     }
 }
 
+/* Write out assembly for the variable DECL, which is not defined in
+   the current translation unit.  */
+void
+assemble_undefined_decl (tree decl)
+{
+  const char *name = XSTR (XEXP (DECL_RTL (decl), 0), 0);
+  targetm.asm_out.assemble_undefined_decl (asm_out_file, name, decl);
+}
+
 /* Assemble everything that is needed for a variable or function declaration.
    Not used for automatic variables, and not used for function definitions.
    Should not be called for variables of incomplete structure type.
Index: gcc/output.h
===================================================================
--- gcc/output.h.orig
+++ gcc/output.h
@@ -176,6 +176,9 @@ extern void default_assemble_visibility
    for an `asm' keyword used between functions.  */
 extern void assemble_asm (tree);
 
+/* Get the function's name from a decl, as described by its RTL.  */
+extern const char *get_fnname_from_decl (tree);
+
 /* Output assembler code for the constant pool of a function and associated
    with defining the name of the function.  DECL describes the function.
    NAME is the function's name.  For the constant pool, we use the current
@@ -201,6 +204,10 @@ extern void assemble_variable (tree, int
    into the preinit array.  */
 extern void assemble_vtv_preinit_initializer (tree);
 
+/* Assemble everything that is needed for a variable declaration that has
+   no definition in the current translation unit.  */
+extern void assemble_undefined_decl (tree);
+
 /* Compute the alignment of variable specified by DECL.
    DONT_OUTPUT_DATA is from assemble_variable.  */
 extern void align_variable (tree decl, bool dont_output_data);
Index: gcc/varpool.c
===================================================================
--- gcc/varpool.c.orig
+++ gcc/varpool.c
@@ -610,6 +642,9 @@ varpool_output_variables (void)
   FOR_EACH_DEFINED_VARIABLE (node)
     varpool_finalize_named_section_flags (node);
 
+  FOR_EACH_VARIABLE (node)
+    if (!node->definition)
+      assemble_undefined_decl (node->decl);
   FOR_EACH_DEFINED_VARIABLE (node)
     if (varpool_assemble_decl (node))
       changed = true;


* The nvptx port [7/11+] Inform the port about call arguments
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (6 preceding siblings ...)
  2014-10-20 14:31 ` The nvptx port [6/11+] Pseudo call args Bernd Schmidt
@ 2014-10-20 14:32 ` Bernd Schmidt
  2014-10-21 21:25   ` Jeff Law
  2014-10-20 14:32 ` The nvptx port [8/11+] Write undefined decls Bernd Schmidt
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-20 14:32 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 462 bytes --]

In ptx assembly we need to decorate call insns with the arguments that 
are being passed. We also need to know the exact function type. This is 
hard to do with the existing infrastructure, since hooks like 
function_arg are also called at times other than when emitting a call. 
This patch therefore adds two more hooks: one called just before the 
argument registers are loaded (once for each arg), and the other just 
after the call is complete.
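
The calling sequence described above can be sketched as a small standalone C fragment. This is only an illustration under assumptions, not GCC code: `arg_reg`, `NO_ARG_SENTINEL`, `demo_call_args`, `demo_end_call_args` and `expand_call_sketch` are hypothetical stand-ins for rtx values, pc_rtx, the two new hooks and expand_call respectively. It shows the first hook being invoked once per register argument (or once with a sentinel when there are none) and the second hook being invoked after the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an rtx; NULL means "passed in memory".  */
typedef const char *arg_reg;

/* Hypothetical stand-in for pc_rtx, used when no register args exist.  */
static const char NO_ARG_SENTINEL[] = "<pc_rtx>";

static int call_args_count;     /* Number of call_args invocations.  */
static int end_call_args_count; /* Number of end_call_args invocations.  */

/* Stand-in for targetm.calls.call_args.  FUNTYPE may be NULL, which
   models the NULL_TREE passed for libcalls.  */
static void
demo_call_args (arg_reg reg, const char *funtype)
{
  call_args_count++;
  printf ("call_args (%s, %s)\n", reg, funtype ? funtype : "NULL_TREE");
}

/* Stand-in for targetm.calls.end_call_args.  */
static void
demo_end_call_args (void)
{
  end_call_args_count++;
  printf ("end_call_args ()\n");
}

/* Models the order of hook calls around one expanded call.  */
static void
expand_call_sketch (const arg_reg *args, size_t num_actuals,
                    const char *funtype)
{
  int any_regs = 0;

  assert (args != NULL || num_actuals == 0);

  /* One hook call per argument that lives in a register.  */
  for (size_t i = 0; i < num_actuals; i++)
    if (args[i] != NULL)
      {
        any_regs = 1;
        demo_call_args (args[i], funtype);
      }

  /* No register arguments at all: call the hook once with the sentinel.  */
  if (!any_regs)
    demo_call_args (NO_ARG_SENTINEL, funtype);

  /* ... the call insn and the return value copy would be emitted here ...  */

  demo_end_call_args ();
}
```

Calling `expand_call_sketch` with two register arguments and one memory argument (a NULL entry) invokes `demo_call_args` twice and `demo_end_call_args` once; calling it with no arguments invokes `demo_call_args` once with the sentinel.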


Bernd


[-- Attachment #2: 007-callargs.diff --]
[-- Type: text/x-patch, Size: 8659 bytes --]

	gcc/
	* target.def (call_args, end_call_args): New hooks.
	* hooks.c (hook_void_rtx_tree): New empty function.
	* hooks.h (hook_void_rtx_tree): Declare.
	* doc/tm.texi.in (TARGET_CALL_ARGS, TARGET_END_CALL_ARGS): Add.
	* doc/tm.texi: Regenerate.
	* calls.c (expand_call): Slightly rearrange the code.  Use the two new
	hooks.
	(emit_library_call_value_1): Use the two new hooks.

------------------------------------------------------------------------
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi.orig
+++ gcc/doc/tm.texi
@@ -5027,6 +5027,29 @@ except the last are treated as named.
 You need not define this hook if it always returns @code{false}.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_CALL_ARGS (rtx, @var{tree})
+While generating RTL for a function call, this target hook is invoked once
+for each argument passed to the function, either a register returned by
+@code{TARGET_FUNCTION_ARG} or a memory location.  It is called just
+before the point where argument registers are stored.  The type of the
+function to be called is also passed as the second argument; it is
+@code{NULL_TREE} for libcalls.  The @code{TARGET_END_CALL_ARGS} hook is
+invoked just after the code to copy the return reg has been emitted.
+This functionality can be used to perform special setup of call argument
+registers if a target needs it.
+For functions without arguments, the hook is called once with @code{pc_rtx}
+passed instead of an argument register.
+Most ports do not need to implement anything for this hook.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_END_CALL_ARGS (void)
+This target hook is invoked while generating RTL for a function call,
+just after the point where the return reg is copied into a pseudo.  It
+signals that all the call argument and return registers for the just
+emitted call are now no longer in use.
+Most ports do not need to implement anything for this hook.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_PRETEND_OUTGOING_VARARGS_NAMED (cumulative_args_t @var{ca})
 If you need to conditionally change ABIs so that one works with
 @code{TARGET_SETUP_INCOMING_VARARGS}, but the other works like neither
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in.orig
+++ gcc/doc/tm.texi.in
@@ -3929,6 +3929,10 @@ These machine description macros help im
 
 @hook TARGET_STRICT_ARGUMENT_NAMING
 
+@hook TARGET_CALL_ARGS
+
+@hook TARGET_END_CALL_ARGS
+
 @hook TARGET_PRETEND_OUTGOING_VARARGS_NAMED
 
 @node Trampolines
Index: gcc/hooks.c
===================================================================
--- gcc/hooks.c.orig
+++ gcc/hooks.c
@@ -245,6 +245,11 @@ hook_void_tree (tree a ATTRIBUTE_UNUSED)
 }
 
 void
+hook_void_rtx_tree (rtx, tree)
+{
+}
+
+void
 hook_void_constcharptr (const char *a ATTRIBUTE_UNUSED)
 {
 }
Index: gcc/hooks.h
===================================================================
--- gcc/hooks.h.orig
+++ gcc/hooks.h
@@ -70,6 +70,7 @@ extern void hook_void_constcharptr (cons
 extern void hook_void_rtx_int (rtx, int);
 extern void hook_void_FILEptr_constcharptr (FILE *, const char *);
 extern bool hook_bool_FILEptr_rtx_false (FILE *, rtx);
+extern void hook_void_rtx_tree (rtx, tree);
 extern void hook_void_tree (tree);
 extern void hook_void_tree_treeptr (tree, tree *);
 extern void hook_void_int_int (int, int);
Index: gcc/target.def
===================================================================
--- gcc/target.def.orig
+++ gcc/target.def
@@ -3825,6 +3825,33 @@ not generate any instructions in this ca
  default_setup_incoming_varargs)
 
 DEFHOOK
+(call_args,
+ "While generating RTL for a function call, this target hook is invoked once\n\
+for each argument passed to the function, either a register returned by\n\
+@code{TARGET_FUNCTION_ARG} or a memory location.  It is called just\n\
+before the point where argument registers are stored.  The type of the\n\
+function to be called is also passed as the second argument; it is\n\
+@code{NULL_TREE} for libcalls.  The @code{TARGET_END_CALL_ARGS} hook is\n\
+invoked just after the code to copy the return reg has been emitted.\n\
+This functionality can be used to perform special setup of call argument\n\
+registers if a target needs it.\n\
+For functions without arguments, the hook is called once with @code{pc_rtx}\n\
+passed instead of an argument register.\n\
+Most ports do not need to implement anything for this hook.",
+ void, (rtx, tree),
+ hook_void_rtx_tree)
+
+DEFHOOK
+(end_call_args,
+ "This target hook is invoked while generating RTL for a function call,\n\
+just after the point where the return reg is copied into a pseudo.  It\n\
+signals that all the call argument and return registers for the just\n\
+emitted call are now no longer in use.\n\
+Most ports do not need to implement anything for this hook.",
+ void, (void),
+ hook_void_void)
+
+DEFHOOK
 (strict_argument_naming,
  "Define this hook to return @code{true} if the location where a function\n\
 argument is passed depends on whether or not it is a named argument.\n\
Index: gcc/calls.c
===================================================================
--- gcc/calls.c.orig
+++ gcc/calls.c
@@ -3011,6 +3011,33 @@ expand_call (tree exp, rtx target, int i
 
       funexp = rtx_for_function_call (fndecl, addr);
 
+      /* Precompute all register parameters.  It isn't safe to compute anything
+	 once we have started filling any specific hard regs.  */
+      precompute_register_parameters (num_actuals, args, &reg_parm_seen);
+
+      if (CALL_EXPR_STATIC_CHAIN (exp))
+	static_chain_value = expand_normal (CALL_EXPR_STATIC_CHAIN (exp));
+      else
+	static_chain_value = 0;
+
+#ifdef REG_PARM_STACK_SPACE
+      /* Save the fixed argument area if it's part of the caller's frame and
+	 is clobbered by argument setup for this call.  */
+      if (ACCUMULATE_OUTGOING_ARGS && pass)
+	save_area = save_fixed_argument_area (reg_parm_stack_space, argblock,
+					      &low_to_save, &high_to_save);
+#endif
+
+      bool any_regs = false;
+      for (i = 0; i < num_actuals; i++)
+	if (args[i].reg != NULL_RTX)
+	  {
+	    any_regs = true;
+	    targetm.calls.call_args (args[i].reg, funtype);
+	  }
+      if (!any_regs)
+	targetm.calls.call_args (pc_rtx, funtype);
+
       /* Figure out the register where the value, if any, will come back.  */
       valreg = 0;
       if (TYPE_MODE (rettype) != VOIDmode
@@ -3037,23 +3064,6 @@ expand_call (tree exp, rtx target, int i
 	    }
 	}
 
-      /* Precompute all register parameters.  It isn't safe to compute anything
-	 once we have started filling any specific hard regs.  */
-      precompute_register_parameters (num_actuals, args, &reg_parm_seen);
-
-      if (CALL_EXPR_STATIC_CHAIN (exp))
-	static_chain_value = expand_normal (CALL_EXPR_STATIC_CHAIN (exp));
-      else
-	static_chain_value = 0;
-
-#ifdef REG_PARM_STACK_SPACE
-      /* Save the fixed argument area if it's part of the caller's frame and
-	 is clobbered by argument setup for this call.  */
-      if (ACCUMULATE_OUTGOING_ARGS && pass)
-	save_area = save_fixed_argument_area (reg_parm_stack_space, argblock,
-					      &low_to_save, &high_to_save);
-#endif
-
       /* Now store (and compute if necessary) all non-register parms.
 	 These come before register parms, since they can require block-moves,
 	 which could clobber the registers used for register parms.
@@ -3458,6 +3468,8 @@ expand_call (tree exp, rtx target, int i
       for (i = 0; i < num_actuals; ++i)
 	free (args[i].aligned_regs);
 
+      targetm.calls.end_call_args ();
+
       insns = get_insns ();
       end_sequence ();
 
@@ -3985,6 +3997,18 @@ emit_library_call_value_1 (int retval, r
     }
 #endif
 
+  /* When expanding a normal call, args are stored in push order,
+     which is the reverse of what we have here.  */
+  bool any_regs = false;
+  for (int i = nargs; i-- > 0; )
+    if (argvec[i].reg != NULL_RTX)
+      {
+	targetm.calls.call_args (argvec[i].reg, NULL_TREE);
+	any_regs = true;
+      }
+  if (!any_regs)
+    targetm.calls.call_args (pc_rtx, NULL_TREE);
+
   /* Push the args that need to be pushed.  */
 
   /* ARGNUM indexes the ARGVEC array in the order in which the arguments
@@ -4224,6 +4248,8 @@ emit_library_call_value_1 (int retval, r
       valreg = gen_rtx_REG (TYPE_MODE (tfom), REGNO (valreg));
     }
 
+  targetm.calls.end_call_args ();
+
   /* For calls to `setjmp', etc., inform function.c:setjmp_warnings
      that it should complain if nonvolatile values are live.  For
      functions that cannot return, inform flow that control does not


* The nvptx port [9/11+] Epilogues
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (8 preceding siblings ...)
  2014-10-20 14:32 ` The nvptx port [8/11+] Write undefined decls Bernd Schmidt
@ 2014-10-20 14:35 ` Bernd Schmidt
  2014-10-21 22:08   ` Jeff Law
  2014-10-20 14:50 ` The nvptx port [10/11+] Target files Bernd Schmidt
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-20 14:35 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 249 bytes --]

We skip the late compilation passes on ptx, but there's one piece we do 
need - fixing up the function so that we get return insns in the right 
places. This patch just makes thread_prologue_and_epilogue_insns 
callable from the reorg pass.


Bernd

[-- Attachment #2: 009-proep.diff --]
[-- Type: text/x-patch, Size: 1057 bytes --]

	gcc/
	* function.c (thread_prologue_and_epilogue_insns): No longer static.
	* function.h (thread_prologue_and_epilogue_insns): Declare.

------------------------------------------------------------------------
Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 422424)
+++ gcc/function.c	(revision 422425)
@@ -5945,7 +5945,7 @@ emit_return_for_exit (edge exit_fallthru
    in a sibcall omit the sibcall_epilogue if the block is not in
    ANTIC.  */
 
-static void
+void
 thread_prologue_and_epilogue_insns (void)
 {
   bool inserted;
Index: gcc/function.h
===================================================================
--- gcc/function.h	(revision 422424)
+++ gcc/function.h	(revision 422425)
@@ -773,6 +773,8 @@ extern void free_after_compilation (stru
 
 extern void init_varasm_status (void);
 
+extern void thread_prologue_and_epilogue_insns (void);
+
 #ifdef RTX_CODE
 extern void diddle_return_value (void (*)(rtx, void*), void*);
 extern void clobber_return_register (void);


* The nvptx port [10/11+] Target files
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (9 preceding siblings ...)
  2014-10-20 14:35 ` The nvptx port [9/11+] Epilogues Bernd Schmidt
@ 2014-10-20 14:50 ` Bernd Schmidt
  2014-10-22 18:12   ` Jeff Law
  2014-10-20 14:58 ` The nvptx port [11/11] More tools Bernd Schmidt
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-20 14:50 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 136 bytes --]

These are the main target files for the ptx port. t-nvptx is empty for 
now but will grow some content with follow-up patches.


Bernd


[-- Attachment #2: 010-target.diff --]
[-- Type: text/x-patch, Size: 115282 bytes --]


	* configure.ac: Allow configuring lto for nvptx.
	* configure: Regenerate.

	gcc/
	* config/nvptx/nvptx.c: New file.
	* config/nvptx/nvptx.h: New file.
	* config/nvptx/nvptx-protos.h: New file.
	* config/nvptx/nvptx.md: New file.
	* config/nvptx/t-nvptx: New file.
	* config/nvptx/nvptx.opt: New file.
	* common/config/nvptx/nvptx-common.c: New file.
	* config.gcc: Handle nvptx-*-*.

	libgcc/
	* config.host: Handle nvptx-*-*.
	* config/nvptx/t-nvptx: New file.
	* config/nvptx/crt0.s: New file.

------------------------------------------------------------------------
Index: gcc/common/config/nvptx/nvptx-common.c
===================================================================
--- /dev/null
+++ gcc/common/config/nvptx/nvptx-common.c
@@ -0,0 +1,38 @@
+/* NVPTX common hooks.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic-core.h"
+#include "tm.h"
+#include "tm_p.h"
+#include "common/common-target.h"
+#include "common/common-target-def.h"
+#include "opts.h"
+#include "flags.h"
+
+#undef TARGET_HAVE_NAMED_SECTIONS
+#define TARGET_HAVE_NAMED_SECTIONS false
+
+#undef TARGET_DEFAULT_TARGET_FLAGS
+#define TARGET_DEFAULT_TARGET_FLAGS MASK_ABI64
+
+struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc.orig
+++ gcc/config.gcc
@@ -420,6 +420,9 @@ nios2-*-*)
 	cpu_type=nios2
 	extra_options="${extra_options} g.opt"
 	;;
+nvptx-*-*)
+	cpu_type=nvptx
+	;;
 powerpc*-*-*)
 	cpu_type=rs6000
 	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
@@ -2148,6 +2151,10 @@ nios2-*-*)
 		;;
         esac
 	;;
+nvptx-*)
+	tm_file="${tm_file} newlib-stdint.h"
+	tmake_file="nvptx/t-nvptx"
+	;;
 pdp11-*-*)
 	tm_file="${tm_file} newlib-stdint.h"
 	use_gcc_stdint=wrap
Index: gcc/config/nvptx/nvptx.c
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx.c
@@ -0,0 +1,2024 @@
+/* Target code for NVPTX.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tree.h"
+#include "insn-flags.h"
+#include "output.h"
+#include "insn-attr.h"
+#include "insn-codes.h"
+#include "expr.h"
+#include "regs.h"
+#include "optabs.h"
+#include "recog.h"
+#include "ggc.h"
+#include "timevar.h"
+#include "tm_p.h"
+#include "tm-preds.h"
+#include "tm-constrs.h"
+#include "function.h"
+#include "langhooks.h"
+#include "dbxout.h"
+#include "target.h"
+#include "target-def.h"
+#include "diagnostic.h"
+#include "basic-block.h"
+#include "stor-layout.h"
+#include "calls.h"
+#include "df.h"
+#include "builtins.h"
+#include "hashtab.h"
+#include <sstream>
+
+/* Record the function decls we've written, and the libfuncs and function
+   decls corresponding to them.  */
+static std::stringstream func_decls;
+static GTY((if_marked ("ggc_marked_p"), param_is (struct rtx_def)))
+  htab_t declared_libfuncs_htab;
+static GTY((if_marked ("ggc_marked_p"), param_is (union tree_node)))
+  htab_t declared_fndecls_htab;
+static GTY((if_marked ("ggc_marked_p"), param_is (union tree_node)))
+  htab_t needed_fndecls_htab;
+
+/* Allocate a new, cleared machine_function structure.  */
+
+static struct machine_function *
+nvptx_init_machine_status (void)
+{
+  struct machine_function *p = ggc_cleared_alloc<machine_function> ();
+  p->ret_reg_mode = VOIDmode;
+  return p;
+}
+
+/* Implement TARGET_OPTION_OVERRIDE.  */
+
+static void
+nvptx_option_override (void)
+{
+  init_machine_status = nvptx_init_machine_status;
+  /* Gives us a predictable order, which we need especially for variables.  */
+  flag_toplevel_reorder = 1;
+  /* Assumes that it will see only hard registers.  */
+  flag_var_tracking = 0;
+  write_symbols = NO_DEBUG;
+  debug_info_level = DINFO_LEVEL_NONE;
+
+  declared_fndecls_htab
+    = htab_create_ggc (17, htab_hash_pointer, htab_eq_pointer, NULL);
+  needed_fndecls_htab
+    = htab_create_ggc (17, htab_hash_pointer, htab_eq_pointer, NULL);
+  declared_libfuncs_htab
+    = htab_create_ggc (17, htab_hash_pointer, htab_eq_pointer, NULL);
+}
+
+/* Return the mode to be used when declaring a ptx object for OBJ.
+   For objects with subparts such as complex modes this is the mode
+   of the subpart.  */
+
+enum machine_mode
+nvptx_underlying_object_mode (rtx obj)
+{
+  if (GET_CODE (obj) == SUBREG)
+    obj = SUBREG_REG (obj);
+  enum machine_mode mode = GET_MODE (obj);
+  if (mode == TImode)
+    return DImode;
+  if (COMPLEX_MODE_P (mode))
+    return GET_MODE_INNER (mode);
+  return mode;
+}
+
+/* Return a ptx type for MODE.  If PROMOTE, then use .u32 for QImode to
+   deal with ptx idiosyncrasies.  */
+
+const char *
+nvptx_ptx_type_from_mode (enum machine_mode mode, bool promote)
+{
+  switch (mode)
+    {
+    case BLKmode:
+      return ".b8";
+    case BImode:
+      return ".pred";
+    case QImode:
+      if (promote)
+	return ".u32";
+      else
+	return ".u8";
+    case HImode:
+      return ".u16";
+    case SImode:
+      return ".u32";
+    case DImode:
+      return ".u64";
+
+    case SFmode:
+      return ".f32";
+    case DFmode:
+      return ".f64";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+static bool
+nvptx_split_reg_p (enum machine_mode mode)
+{
+  if (COMPLEX_MODE_P (mode))
+    return true;
+  if (mode == TImode)
+    return true;
+  return false;
+}
+
+/* Return the number of pieces to use when dealing with a pseudo of *PMODE.
+   Alter *PMODE if we return a number greater than one.  */
+
+static int
+maybe_split_mode (enum machine_mode *pmode)
+{
+  enum machine_mode mode = *pmode;
+
+  if (COMPLEX_MODE_P (mode))
+    {
+      *pmode = GET_MODE_INNER (mode);
+      return 2;
+    }
+  else if (mode == TImode)
+    {
+      *pmode = DImode;
+      return 2;
+    }
+  return 1;
+}
+
+#define PASS_IN_REG_P(MODE, TYPE)				\
+  ((GET_MODE_CLASS (MODE) == MODE_INT				\
+    || GET_MODE_CLASS (MODE) == MODE_FLOAT			\
+    || ((GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT		\
+	 || GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT)	\
+	&& !AGGREGATE_TYPE_P (TYPE)))				\
+   && (MODE) != TImode)
+
+#define RETURN_IN_REG_P(MODE)			\
+  ((GET_MODE_CLASS (MODE) == MODE_INT		\
+    || GET_MODE_CLASS (MODE) == MODE_FLOAT)	\
+   && GET_MODE_SIZE (MODE) <= 8)
+\f
+/* Perform a mode promotion for a function argument.  Return the promoted
+   mode.  */
+static enum machine_mode
+arg_promotion (enum machine_mode mode)
+{
+  if (mode == QImode || mode == HImode)
+    return SImode;
+  return mode;
+}
+
+/* Write the declaration of a function arg of TYPE to S.  I is the index
+   of the argument, MODE its mode.  NO_ARG_TYPES is true if this is for
+   a decl with zero TYPE_ARG_TYPES, i.e. an old-style C decl.  */
+static int
+write_one_arg (std::stringstream &s, tree type, int i, enum machine_mode mode,
+	       bool no_arg_types)
+{
+  if (!PASS_IN_REG_P (mode, type))
+    mode = Pmode;
+
+  int count = maybe_split_mode (&mode);
+
+  if (count == 2)
+    {
+      write_one_arg (s, NULL_TREE, i, mode, false);
+      write_one_arg (s, NULL_TREE, i + 1, mode, false);
+      return i + 1;
+    }
+
+  if (no_arg_types && !AGGREGATE_TYPE_P (type))
+    {
+      if (mode == SFmode)
+	mode = DFmode;
+      mode = arg_promotion (mode);
+    }
+
+  if (i > 0)
+    s << ", ";
+  s << ".param" << nvptx_ptx_type_from_mode (mode, false) << " %in_ar"
+    << (i + 1) << (mode == QImode || mode == HImode ? "[1]" : "");
+  if (mode == BLKmode)
+    s << "[" << int_size_in_bytes (type) << "]";
+  return i;
+}
+
+static bool
+write_as_kernel (tree attrs)
+{
+  return (lookup_attribute ("kernel", attrs) != NULL_TREE
+	  || lookup_attribute ("omp target entrypoint", attrs) != NULL_TREE);
+}
+
+static void
+nvptx_write_function_decl (std::stringstream &s, const char *name, const_tree decl)
+{
+  tree fntype = TREE_TYPE (decl);
+  tree result_type = TREE_TYPE (fntype);
+  tree args = TYPE_ARG_TYPES (fntype);
+  tree attrs = DECL_ATTRIBUTES (decl);
+  bool kernel = write_as_kernel (attrs);
+  bool is_main = strcmp (name, "main") == 0;
+  bool args_from_decl = false;
+
+  /* We get:
+     NULL in TYPE_ARG_TYPES, for old-style functions
+     NULL in DECL_ARGUMENTS, for builtin functions without another
+       declaration.
+     So we have to pick the best one we have.  */
+  if (args == 0)
+    {
+      args = DECL_ARGUMENTS (decl);
+      args_from_decl = true;
+    }
+
+  if (DECL_EXTERNAL (decl))
+    s << ".extern ";
+  else if (TREE_PUBLIC (decl))
+    s << ".visible ";
+
+  if (kernel)
+    s << ".entry ";
+  else
+    s << ".func ";
+
+  /* Declare the result.  */
+  bool return_in_mem = false;
+  if (TYPE_MODE (result_type) != VOIDmode)
+    {
+      enum machine_mode mode = TYPE_MODE (result_type);
+      if (!RETURN_IN_REG_P (mode))
+	return_in_mem = true;
+      else
+	{
+	  mode = arg_promotion (mode);
+	  s << "(.param" << nvptx_ptx_type_from_mode (mode, false)
+	    << " %out_retval)";
+	}
+    }
+
+  if (name[0] == '*')
+    s << (name + 1);
+  else
+    s << name;
+
+  /* Declare argument types.  */
+  if ((args != NULL_TREE
+       && !(TREE_CODE (args) == TREE_LIST && TREE_VALUE (args) == void_type_node))
+      || is_main
+      || return_in_mem
+      || DECL_STATIC_CHAIN (decl))
+    {
+      s << "(";
+      int i = 0;
+      bool any_args = false;
+      if (return_in_mem)
+	{
+	  s << ".param.u" << GET_MODE_BITSIZE (Pmode) << " %in_ar1";
+	  i++;
+	}
+      while (args != NULL_TREE)
+	{
+	  tree type = args_from_decl ? TREE_TYPE (args) : TREE_VALUE (args);
+	  enum machine_mode mode = TYPE_MODE (type);
+
+	  if (mode != VOIDmode)
+	    {
+	      i = write_one_arg (s, type, i, mode,
+				 TYPE_ARG_TYPES (fntype) == 0);
+	      any_args = true;
+	      i++;
+	    }
+	  args = TREE_CHAIN (args);
+	}
+      if (stdarg_p (fntype))
+	{
+	  gcc_assert (i > 0);
+	  s << ", .param.u" << GET_MODE_BITSIZE (Pmode) << " %in_argp";
+	}
+      if (DECL_STATIC_CHAIN (decl))
+	{
+	  if (i > 0)
+	    s << ", ";
+	  s << ".reg.u" << GET_MODE_BITSIZE (Pmode)
+	    << reg_names [STATIC_CHAIN_REGNUM];
+	}
+      if (!any_args && is_main)
+	s << ".param.u32 %argc, .param.u" << GET_MODE_BITSIZE (Pmode)
+	  << " %argv";
+      s << ")";
+    }
+}
+
+/* Walk either ARGTYPES or ARGS if the former is null, and write out part of
+   the function header to FILE.  If WRITE_COPY is false, write reg
+   declarations, otherwise write the copy from the incoming argument to that
+   reg.  RETURN_IN_MEM indicates whether to start counting arg numbers at 1
+   instead of 0.  */
+
+static void
+walk_args_for_param (FILE *file, tree argtypes, tree args, bool write_copy,
+		     bool return_in_mem)
+{
+  int i;
+
+  bool args_from_decl = false;
+  if (argtypes == 0)
+    args_from_decl = true;
+  else
+    args = argtypes;
+
+  for (i = return_in_mem ? 1 : 0; args != NULL_TREE; args = TREE_CHAIN (args))
+    {
+      tree type = args_from_decl ? TREE_TYPE (args) : TREE_VALUE (args);
+      enum machine_mode mode = TYPE_MODE (type);
+
+      if (mode == VOIDmode)
+	break;
+
+      if (!PASS_IN_REG_P (mode, type))
+	mode = Pmode;
+
+      int count = maybe_split_mode (&mode);
+      if (count == 1)
+	{
+	  if (argtypes == NULL && !AGGREGATE_TYPE_P (type))
+	    {
+	      if (mode == SFmode)
+		mode = DFmode;
+
+	    }
+	  mode = arg_promotion (mode);
+	}
+      while (count-- > 0)
+	{
+	  i++;
+	  if (write_copy)
+	    fprintf (file, "\tld.param%s %%ar%d, [%%in_ar%d];\n",
+		     nvptx_ptx_type_from_mode (mode, false), i, i);
+	  else
+	    fprintf (file, "\t.reg%s %%ar%d;\n",
+		     nvptx_ptx_type_from_mode (mode, false), i);
+	}
+    }
+}
+
+static void
+write_function_decl_only (std::stringstream &s, const char *name, const_tree decl)
+{
+  s << "// BEGIN";
+  if (TREE_PUBLIC (decl))
+    s << " GLOBAL";
+  s << " FUNCTION DECL: ";
+  if (name[0] == '*')
+    s << (name + 1);
+  else
+    s << name;
+  s << "\n";
+  nvptx_write_function_decl (s, name, decl);
+  s << ";\n";
+}
+
+/* If DECL is a FUNCTION_DECL, check the hash table to see if we
+   already encountered it, and if not, insert it and write a ptx
+   declaration that will be output at the end of compilation.  */
+static bool
+nvptx_record_fndecl (tree decl, bool force = false)
+{
+  if (decl == NULL_TREE || TREE_CODE (decl) != FUNCTION_DECL
+      || !DECL_EXTERNAL (decl))
+    return true;
+
+  if (!force && TYPE_ARG_TYPES (TREE_TYPE (decl)) == NULL_TREE)
+    return false;
+
+  void **slot = htab_find_slot (declared_fndecls_htab, decl, INSERT);
+  if (*slot == NULL)
+    {
+      *slot = decl;
+      const char *name = get_fnname_from_decl (decl);
+      write_function_decl_only (func_decls, name, decl);
+    }
+  return true;
+}
+
+/* Record that we need to emit a ptx decl for DECL.  Either do it now, or
+   record it for later in case we have no argument information at this
+   point.  */
+void
+nvptx_record_needed_fndecl (tree decl)
+{
+  if (nvptx_record_fndecl (decl))
+    return;
+
+  void **slot = htab_find_slot (needed_fndecls_htab, decl, INSERT);
+  if (*slot == NULL)
+    *slot = decl;
+}
+
+/* Implement ASM_DECLARE_FUNCTION_NAME.  Writes the start of a ptx
+   function, including local var decls and copies from the arguments to
+   local regs.  */
+void
+nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
+{
+  tree fntype = TREE_TYPE (decl);
+  tree result_type = TREE_TYPE (fntype);
+
+  std::stringstream s;
+  write_function_decl_only (s, name, decl);
+  s << "// BEGIN";
+  if (TREE_PUBLIC (decl))
+    s << " GLOBAL";
+  s << " FUNCTION DEF: ";
+
+  if (name[0] == '*')
+    s << (name + 1);
+  else
+    s << name;
+  s << "\n";
+
+  nvptx_write_function_decl (s, name, decl);
+  fprintf (file, "%s", s.str().c_str());
+
+  bool return_in_mem = false;
+  if (TYPE_MODE (result_type) != VOIDmode)
+    {
+      enum machine_mode mode = TYPE_MODE (result_type);
+      if (!RETURN_IN_REG_P (mode))
+	return_in_mem = true;
+    }
+
+  fprintf (file, "\n{\n");
+
+  /* Ensure all arguments that should live in a register have one
+     declared.  We'll emit the copies below.  */
+  walk_args_for_param (file, TYPE_ARG_TYPES (fntype), DECL_ARGUMENTS (decl),
+		       false, return_in_mem);
+  if (return_in_mem)
+    fprintf (file, "\t.reg.u%d %%ar1;\n", GET_MODE_BITSIZE (Pmode));
+  else if (TYPE_MODE (result_type) != VOIDmode)
+    {
+      enum machine_mode mode = arg_promotion (TYPE_MODE (result_type));
+      fprintf (file, ".reg%s %%retval;\n",
+	       nvptx_ptx_type_from_mode (mode, false));
+    }
+
+  if (stdarg_p (fntype))
+    fprintf (file, "\t.reg.u%d %%argp;\n", GET_MODE_BITSIZE (Pmode));
+
+  fprintf (file, "\t.reg.u%d %s;\n", GET_MODE_BITSIZE (Pmode),
+	   reg_names[OUTGOING_STATIC_CHAIN_REGNUM]);
+
+  /* Declare the pseudos we have as ptx registers.  */
+  int maxregs = max_reg_num ();
+  for (int i = LAST_VIRTUAL_REGISTER + 1; i < maxregs; i++)
+    {
+      if (regno_reg_rtx[i] != const0_rtx)
+	{
+	  enum machine_mode mode = PSEUDO_REGNO_MODE (i);
+	  int count = maybe_split_mode (&mode);
+	  if (count > 1)
+	    {
+	      while (count-- > 0)
+		fprintf (file, "\t.reg%s %%r%d$%d;\n",
+			 nvptx_ptx_type_from_mode (mode, true),
+			 i, count);
+	    }
+	  else
+	    fprintf (file, "\t.reg%s %%r%d;\n",
+		     nvptx_ptx_type_from_mode (mode, true),
+		     i);
+	}
+    }
+
+  /* The only reason we might be using outgoing args is if we call a stdargs
+     function.  Allocate the space for this.  If we called varargs functions
+     without passing any variadic arguments, we'll see a reference to outargs
+     even with a zero outgoing_args_size.  */
+  HOST_WIDE_INT sz = crtl->outgoing_args_size;
+  if (sz == 0)
+    sz = 1;
+  if (cfun->machine->has_call_with_varargs)
+    fprintf (file, "\t.reg.u%d %%outargs;\n"
+	     "\t.local.align 8 .b8 %%outargs_ar[" HOST_WIDE_INT_PRINT_DEC "];\n",
+	     BITS_PER_WORD, sz);
+  if (cfun->machine->punning_buffer_size > 0)
+    fprintf (file, "\t.reg.u%d %%punbuffer;\n"
+	     "\t.local.align 8 .b8 %%punbuffer_ar[%d];\n",
+	     BITS_PER_WORD, cfun->machine->punning_buffer_size);
+
+  /* Declare a local variable for the frame.  */
+  sz = get_frame_size ();
+  if (sz > 0 || cfun->machine->has_call_with_sc)
+    {
+      fprintf (file, "\t.reg.u%d %%frame;\n"
+	       "\t.local.align 8 .b8 %%farray[" HOST_WIDE_INT_PRINT_DEC "];\n",
+	       BITS_PER_WORD, sz == 0 ? 1 : sz);
+      fprintf (file, "\tcvta.local.u%d %%frame, %%farray;\n",
+	       BITS_PER_WORD);
+    }
+
+  if (cfun->machine->has_call_with_varargs)
+    fprintf (file, "\tcvta.local.u%d %%outargs, %%outargs_ar;\n",
+	     BITS_PER_WORD);
+  if (cfun->machine->punning_buffer_size > 0)
+    fprintf (file, "\tcvta.local.u%d %%punbuffer, %%punbuffer_ar;\n",
+	     BITS_PER_WORD);
+
+  /* Now emit any copies necessary for arguments.  */
+  walk_args_for_param (file, TYPE_ARG_TYPES (fntype), DECL_ARGUMENTS (decl),
+		       true, return_in_mem);
+  if (return_in_mem)
+    fprintf (file, "\tld.param.u%d %%ar1, [%%in_ar1];\n",
+	     GET_MODE_BITSIZE (Pmode));
+  if (stdarg_p (fntype))
+    fprintf (file, "\tld.param.u%d %%argp, [%%in_argp];\n",
+	     GET_MODE_BITSIZE (Pmode));
+}
+
+/* Output a return instruction.  Also copy the return value to its outgoing
+   location.  */
+
+const char *
+nvptx_output_return (void)
+{
+  tree fntype = TREE_TYPE (current_function_decl);
+  tree result_type = TREE_TYPE (fntype);
+  if (TYPE_MODE (result_type) != VOIDmode)
+    {
+      enum machine_mode mode = TYPE_MODE (result_type);
+      if (RETURN_IN_REG_P (mode))
+	{
+	  mode = arg_promotion (mode);
+	  fprintf (asm_out_file, "\tst.param%s\t[%%out_retval], %%retval;\n",
+		   nvptx_ptx_type_from_mode (mode, false));
+	}
+    }
+
+  return "ret;";
+}
+
+/* Construct a function declaration from a call insn.  This can be
+   necessary for two reasons - either we have an indirect call which
+   requires a .callprototype declaration, or we have a libcall
+   generated by emit_library_call for which no decl exists.  */
+
+static void
+write_func_decl_from_insn (std::stringstream &s, rtx result, rtx pat,
+			   rtx callee)
+{
+  bool callprototype = register_operand (callee, Pmode);
+  const char *name = "_";
+  if (!callprototype)
+    {
+      name = XSTR (callee, 0);
+      s << "// BEGIN GLOBAL FUNCTION DECL: " << name << "\n";
+    }
+  s << (callprototype ? "\t.callprototype\t" : "\t.extern .func ");
+
+  if (result != NULL_RTX)
+    {
+      s << "(.param";
+      s << nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)),
+				     false);
+      s << " ";
+      if (callprototype)
+	s << "_";
+      else
+	s << "%out_retval";
+      s << ")";
+    }
+
+  s << name;
+
+  int nargs = XVECLEN (pat, 0) - 1;
+  if (nargs > 0)
+    {
+      s << " (";
+      for (int i = 0; i < nargs; i++)
+	{
+	  rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+	  enum machine_mode mode = GET_MODE (t);
+	  int count = maybe_split_mode (&mode);
+
+	  while (count-- > 0)
+	    {
+	      s << ".param";
+	      s << nvptx_ptx_type_from_mode (mode, false);
+	      s << " ";
+	      if (callprototype)
+		s << "_";
+	      else
+		s << "%arg" << i;
+	      if (mode == QImode || mode == HImode)
+		s << "[1]";
+	      if (i + 1 < nargs || count > 0)
+		s << ", ";
+	    }
+	}
+      s << ")";
+    }
+  s << ";\n";
+}
+
+void
+nvptx_function_end (FILE *file)
+{
+  fprintf (file, "\t}\n");
+}
+\f
+/* Decide whether we can make a sibling call to a function.  For ptx, we
+   can't.  */
+
+static bool
+nvptx_function_ok_for_sibcall (tree, tree)
+{
+  return false;
+}
+
+/* Implement the TARGET_CALL_ARGS hook.  Record information about one
+   argument to the next call.  */
+
+static void
+nvptx_call_args (rtx arg, tree funtype)
+{
+  if (cfun->machine->start_call == NULL_RTX)
+    {
+      cfun->machine->call_args = NULL;
+      cfun->machine->funtype = funtype;
+      cfun->machine->start_call = const0_rtx;
+    }
+  if (arg == pc_rtx)
+    return;
+
+  rtx_expr_list *args_so_far = cfun->machine->call_args;
+  if (REG_P (arg))
+    cfun->machine->call_args = alloc_EXPR_LIST (VOIDmode, arg, args_so_far);
+}
+
+/* Implement the TARGET_END_CALL_ARGS hook.  Clear and free the information
+   we recorded.  */
+
+static void
+nvptx_end_call_args (void)
+{
+  cfun->machine->start_call = NULL_RTX;
+  free_EXPR_LIST_list (&cfun->machine->call_args);
+}
+
+/* Emit the sequence for a call.  */
+
+void
+nvptx_expand_call (rtx retval, rtx address)
+{
+  int nargs;
+  rtx callee = XEXP (address, 0);
+  rtx pat, t;
+  rtvec vec;
+  bool external_decl = false;
+
+  nargs = 0;
+  for (t = cfun->machine->call_args; t; t = XEXP (t, 1))
+    nargs++;
+
+  bool has_varargs = false;
+  tree decl_type = NULL_TREE;
+
+  if (!call_insn_operand (callee, Pmode))
+    {
+      callee = force_reg (Pmode, callee);
+      address = change_address (address, QImode, callee);
+    }
+
+  if (GET_CODE (callee) == SYMBOL_REF)
+    {
+      tree decl = SYMBOL_REF_DECL (callee);
+      if (decl != NULL_TREE)
+	{
+	  decl_type = TREE_TYPE (decl);
+	  if (DECL_STATIC_CHAIN (decl))
+	    cfun->machine->has_call_with_sc = true;
+	  if (DECL_EXTERNAL (decl))
+	    external_decl = true;
+	}
+    }
+  if (cfun->machine->funtype
+      /* It's possible to construct testcases where we call a variable.
+	 See compile/20020129-1.c.  stdarg_p will crash so avoid calling it
+	 in such a case.  */
+      && (TREE_CODE (cfun->machine->funtype) == FUNCTION_TYPE
+	  || TREE_CODE (cfun->machine->funtype) == METHOD_TYPE)
+      && stdarg_p (cfun->machine->funtype))
+    {
+      has_varargs = true;
+      cfun->machine->has_call_with_varargs = true;
+    }
+  vec = rtvec_alloc (nargs + 1 + (has_varargs ? 1 : 0));
+  pat = gen_rtx_PARALLEL (VOIDmode, vec);
+  if (has_varargs)
+    {
+      rtx this_arg = gen_reg_rtx (Pmode);
+      /* Pass the stack pointer as the extra pointer to the unnamed
+	 arguments.  The move is the same for either width of Pmode.  */
+      emit_move_insn (this_arg, stack_pointer_rtx);
+      XVECEXP (pat, 0, nargs + 1) = gen_rtx_USE (VOIDmode, this_arg);
+    }
+
+  int i;
+  rtx arg;
+  for (i = 1, arg = cfun->machine->call_args; arg; arg = XEXP (arg, 1), i++)
+    {
+      rtx this_arg = XEXP (arg, 0);
+      XVECEXP (pat, 0, i) = gen_rtx_USE (VOIDmode, this_arg);
+    }
+
+  t = gen_rtx_CALL (VOIDmode, address, const0_rtx);
+  if (retval != NULL_RTX)
+    t = gen_rtx_SET (VOIDmode, retval, t);
+  XVECEXP (pat, 0, 0) = t;
+  if (!REG_P (callee)
+      && (decl_type == NULL_TREE
+	  || (external_decl && TYPE_ARG_TYPES (decl_type) == NULL_TREE)))
+    {
+      void **slot = htab_find_slot (declared_libfuncs_htab, callee, INSERT);
+      if (*slot == NULL)
+	{
+	  *slot = callee;
+	  write_func_decl_from_insn (func_decls, retval, pat, callee);
+	}
+    }
+  emit_call_insn (pat);
+}
+
+/* Implement TARGET_FUNCTION_ARG.  */
+
+static rtx
+nvptx_function_arg (cumulative_args_t, enum machine_mode mode,
+		    const_tree, bool named)
+{
+  if (mode == VOIDmode)
+    return NULL_RTX;
+
+  if (named)
+    return gen_reg_rtx (mode);
+  return NULL_RTX;
+}
+
+/* Implement TARGET_FUNCTION_INCOMING_ARG.  */
+
+static rtx
+nvptx_function_incoming_arg (cumulative_args_t cum_v, enum machine_mode mode,
+			     const_tree, bool named)
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  if (mode == VOIDmode)
+    return NULL_RTX;
+
+  if (!named)
+    return NULL_RTX;
+
+  /* No need to deal with split modes here, the only case that can
+     happen is complex modes and those are dealt with by
+     TARGET_SPLIT_COMPLEX_ARG.  */
+  return gen_rtx_UNSPEC (mode,
+			 gen_rtvec (1, GEN_INT (1 + cum->count)),
+			 UNSPEC_ARG_REG);
+}
+
+/* Implement TARGET_FUNCTION_ARG_ADVANCE.  */
+
+static void
+nvptx_function_arg_advance (cumulative_args_t cum_v, enum machine_mode mode,
+			    const_tree type ATTRIBUTE_UNUSED,
+			    bool named ATTRIBUTE_UNUSED)
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  if (mode == TImode)
+    cum->count += 2;
+  else
+    cum->count++;
+}
+
+/* Handle the TARGET_STRICT_ARGUMENT_NAMING target hook.
+
+   For nvptx, we know how to handle functions declared as stdarg: by
+   passing an extra pointer to the unnamed arguments.  However, the
+   Fortran frontend can produce a different situation, where a
+   function pointer is declared with no arguments, but the actual
+   function and calls to it take more arguments.  In that case, we
+   want to ensure the call matches the definition of the function.  */
+
+static bool
+nvptx_strict_argument_naming (cumulative_args_t cum_v)
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  return cum->fntype == NULL_TREE || stdarg_p (cum->fntype);
+}
+
+
+/* Implement TARGET_FUNCTION_ARG_BOUNDARY.  */
+
+static unsigned int
+nvptx_function_arg_boundary (enum machine_mode mode, const_tree type)
+{
+  unsigned int boundary = type ? TYPE_ALIGN (type) : GET_MODE_BITSIZE (mode);
+
+  if (boundary > BITS_PER_WORD)
+    return 2 * BITS_PER_WORD;
+
+  if (mode == BLKmode)
+    {
+      HOST_WIDE_INT size = int_size_in_bytes (type);
+      if (size > 4)
+	return 2 * BITS_PER_WORD;
+      if (boundary < BITS_PER_WORD)
+	{
+	  if (size >= 3)
+	    return BITS_PER_WORD;
+	  if (size >= 2)
+	    return 2 * BITS_PER_UNIT;
+	}
+    }
+  return boundary;
+}
+
+/* Implement TARGET_FUNCTION_ARG_ROUND_BOUNDARY.  */
+static unsigned int
+nvptx_function_arg_round_boundary (enum machine_mode mode, const_tree type)
+{
+  return nvptx_function_arg_boundary (mode, type);
+}
+
+/* TARGET_FUNCTION_VALUE implementation.  Returns an RTX representing the place
+   where function FUNC returns or receives a value of data type TYPE.  */
+
+static rtx
+nvptx_function_value (const_tree type, const_tree func ATTRIBUTE_UNUSED,
+		      bool outgoing)
+{
+  int unsignedp = TYPE_UNSIGNED (type);
+  enum machine_mode orig_mode = TYPE_MODE (type);
+  enum machine_mode mode = promote_function_mode (type, orig_mode,
+						  &unsignedp, NULL_TREE, 1);
+  if (outgoing)
+    return gen_rtx_REG (mode, 4);
+  if (cfun->machine->start_call == NULL_RTX)
+    /* Pretend to return in a hard reg for early uses before pseudos can be
+       generated.  */
+    return gen_rtx_REG (mode, 4);
+  return gen_reg_rtx (mode);
+}
+
+/* Implement TARGET_LIBCALL_VALUE.  */
+
+static rtx
+nvptx_libcall_value (enum machine_mode mode, const_rtx)
+{
+  if (cfun->machine->start_call == NULL_RTX)
+    /* Pretend to return in a hard reg for early uses before pseudos can be
+       generated.  */
+    return gen_rtx_REG (mode, 4);
+  return gen_reg_rtx (mode);
+}
+
+/* Implement TARGET_FUNCTION_VALUE_REGNO_P.  */
+
+static bool
+nvptx_function_value_regno_p (const unsigned int regno)
+{
+  return regno == 4;
+}
+
+/* Types with a mode other than those supported by the machine are passed by
+   reference in memory.  */
+
+static bool
+nvptx_pass_by_reference (cumulative_args_t, enum machine_mode mode,
+			 const_tree type, bool)
+{
+  return !PASS_IN_REG_P (mode, type);
+}
+
+/* Implement TARGET_RETURN_IN_MEMORY.  Decide whether a type should be
+   returned in memory (true) or in a register (false).  */
+
+static bool
+nvptx_return_in_memory (const_tree type, const_tree)
+{
+  enum machine_mode mode = TYPE_MODE (type);
+  return !RETURN_IN_REG_P (mode);
+}
+
+static enum machine_mode
+nvptx_promote_function_mode (const_tree type, enum machine_mode mode,
+			     int *punsignedp,
+			     const_tree funtype, int for_return)
+{
+  if (type == NULL_TREE)
+    return mode;
+  if (for_return)
+    return promote_mode (type, mode, punsignedp);
+  /* For K&R-style functions, try to match the language promotion rules to
+     minimize type mismatches at assembly time.  */
+  if (TYPE_ARG_TYPES (funtype) == NULL_TREE
+      && !AGGREGATE_TYPE_P (type))
+    {
+      if (mode == SFmode)
+	mode = DFmode;
+      mode = arg_promotion (mode);
+    }
+
+  return mode;
+}
+
+/* Implement TARGET_STATIC_CHAIN.  */
+
+static rtx
+nvptx_static_chain (const_tree fndecl, bool incoming_p)
+{
+  if (!DECL_STATIC_CHAIN (fndecl))
+    return NULL;
+
+  if (incoming_p)
+    return gen_rtx_REG (Pmode, STATIC_CHAIN_REGNUM);
+  else
+    return gen_rtx_REG (Pmode, OUTGOING_STATIC_CHAIN_REGNUM);
+}
+\f
+/* Emit a comparison.  */
+
+rtx
+nvptx_expand_compare (rtx compare)
+{
+  rtx pred = gen_reg_rtx (BImode);
+  rtx cmp = gen_rtx_fmt_ee (GET_CODE (compare), BImode,
+			    XEXP (compare, 0), XEXP (compare, 1));
+  emit_insn (gen_rtx_SET (VOIDmode, pred, cmp));
+  return gen_rtx_NE (BImode, pred, const0_rtx);
+}
+
+/* If ORIG_OP is a Pmode address based on a SYMBOL_REF in a non-generic
+   address space, emit a conversion to a generic address and return the
+   register holding the result.  Function symbols are recorded with
+   nvptx_record_needed_fndecl instead.  Otherwise return ORIG_OP.  */
+
+rtx
+nvptx_maybe_convert_symbolic_operand (rtx orig_op)
+{
+  if (GET_MODE (orig_op) != Pmode)
+    return orig_op;
+
+  rtx op = orig_op;
+  while (GET_CODE (op) == PLUS || GET_CODE (op) == CONST)
+    op = XEXP (op, 0);
+  if (GET_CODE (op) != SYMBOL_REF)
+    return orig_op;
+
+  tree decl = SYMBOL_REF_DECL (op);
+  if (decl && TREE_CODE (decl) == FUNCTION_DECL)
+    {
+      nvptx_record_needed_fndecl (decl);
+      return orig_op;
+    }
+
+  addr_space_t as = nvptx_addr_space_from_address (op);
+  if (as == ADDR_SPACE_GENERIC)
+    return orig_op;
+
+  enum unspec code;
+  code = (as == ADDR_SPACE_GLOBAL ? UNSPEC_FROM_GLOBAL
+	  : as == ADDR_SPACE_LOCAL ? UNSPEC_FROM_LOCAL
+	  : as == ADDR_SPACE_SHARED ? UNSPEC_FROM_SHARED
+	  : as == ADDR_SPACE_CONST ? UNSPEC_FROM_CONST
+	  : UNSPEC_FROM_PARAM);
+  rtx dest = gen_reg_rtx (Pmode);
+  emit_insn (gen_rtx_SET (VOIDmode, dest,
+			  gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig_op),
+					  code)));
+  return dest;
+}
+\f
+/* Returns true if X is a valid address for use in a memory reference.  */
+
+static bool
+nvptx_legitimate_address_p (enum machine_mode, rtx x, bool)
+{
+  enum rtx_code code = GET_CODE (x);
+
+  switch (code)
+    {
+    case REG:
+      return true;
+
+    case PLUS:
+      if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
+	return true;
+      return false;
+
+    case CONST:
+    case SYMBOL_REF:
+    case LABEL_REF:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
+bool
+nvptx_hard_regno_mode_ok (int regno, enum machine_mode mode)
+{
+  if (regno != 4 || cfun == NULL || cfun->machine->ret_reg_mode == VOIDmode)
+    return true;
+  return mode == cfun->machine->ret_reg_mode;
+}
+\f
+/* Convert an address space AS to the corresponding ptx string.  */
+
+const char *
+nvptx_section_from_addr_space (addr_space_t as)
+{
+  switch (as)
+    {
+    case ADDR_SPACE_CONST:
+      return ".const";
+
+    case ADDR_SPACE_GLOBAL:
+      return ".global";
+
+    case ADDR_SPACE_SHARED:
+      return ".shared";
+
+    case ADDR_SPACE_GENERIC:
+      return "";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Determine whether DECL goes into .const or .global.  */
+
+const char *
+nvptx_section_for_decl (const_tree decl)
+{
+  bool is_const = (CONSTANT_CLASS_P (decl)
+		   || TREE_CODE (decl) == CONST_DECL
+		   || TREE_READONLY (decl));
+  if (is_const)
+    return ".const";
+
+  return ".global";
+}
+
+/* Look for a SYMBOL_REF in ADDR and return the address space to be used
+   for the insn referencing this address.  */
+
+addr_space_t
+nvptx_addr_space_from_address (rtx addr)
+{
+  while (GET_CODE (addr) == PLUS || GET_CODE (addr) == CONST)
+    addr = XEXP (addr, 0);
+  if (GET_CODE (addr) != SYMBOL_REF)
+    return ADDR_SPACE_GENERIC;
+
+  tree decl = SYMBOL_REF_DECL (addr);
+  if (decl == NULL_TREE || TREE_CODE (decl) == FUNCTION_DECL)
+    return ADDR_SPACE_GENERIC;
+
+  bool is_const = (CONSTANT_CLASS_P (decl)
+		   || TREE_CODE (decl) == CONST_DECL
+		   || TREE_READONLY (decl));
+  if (is_const)
+    return ADDR_SPACE_CONST;
+
+  return ADDR_SPACE_GLOBAL;
+}
+\f
+/* Machinery to output constant initializers.  */
+
+/* Used when assembling integers to ensure data is emitted in
+   pieces whose size matches the declaration we printed.  */
+static unsigned int decl_chunk_size;
+static enum machine_mode decl_chunk_mode;
+/* Used in the same situation, to keep track of the byte offset
+   into the initializer.  */
+static unsigned HOST_WIDE_INT decl_offset;
+/* The initializer part we are currently processing.  */
+static HOST_WIDE_INT init_part;
+/* The total size of the object.  */
+static unsigned HOST_WIDE_INT object_size;
+/* True if we found a skip extending to the end of the object.  Used to
+   assert that no data follows.  */
+static bool object_finished;
+
+static void
+begin_decl_field (void)
+{
+  /* We never see decl_offset at zero by the time we get here.  */
+  if (decl_offset == decl_chunk_size)
+    fprintf (asm_out_file, " = { ");
+  else
+    fprintf (asm_out_file, ", ");
+}
+
+static void
+output_decl_chunk (void)
+{
+  begin_decl_field ();
+  output_address (gen_int_mode (init_part, decl_chunk_mode));
+  init_part = 0;
+}
+
+/* Add value VAL sized SIZE to the data we're emitting, and keep writing
+   out chunks as they fill up.  */
+
+static void
+nvptx_assemble_value (HOST_WIDE_INT val, unsigned int size)
+{
+  unsigned HOST_WIDE_INT chunk_offset = decl_offset % decl_chunk_size;
+  gcc_assert (!object_finished);
+  while (size > 0)
+    {
+      int this_part = size;
+      if (chunk_offset + this_part > decl_chunk_size)
+	this_part = decl_chunk_size - chunk_offset;
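+      HOST_WIDE_INT val_part;
+      /* Build the mask as 2 << (bits - 1) rather than 1 << bits, so that
+	 a part as wide as HOST_WIDE_INT never shifts by the full width.  */
+      HOST_WIDE_INT mask = 2;
+      mask <<= this_part * BITS_PER_UNIT - 1;
+      val_part = val & (mask - 1);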
+      init_part |= val_part << (BITS_PER_UNIT * chunk_offset);
+      val >>= BITS_PER_UNIT * this_part;
+      size -= this_part;
+      decl_offset += this_part;
+      if (decl_offset % decl_chunk_size == 0)
+	output_decl_chunk ();
+
+      chunk_offset = 0;
+    }
+}
+
+/* Target hook for assembling integer object X of size SIZE.  */
+
+static bool
+nvptx_assemble_integer (rtx x, unsigned int size, int ARG_UNUSED (aligned_p))
+{
+  if (GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == CONST)
+    {
+      gcc_assert (size == decl_chunk_size);
+      if (decl_offset % decl_chunk_size != 0)
+	sorry ("cannot emit unaligned pointers in ptx assembly");
+      decl_offset += size;
+      begin_decl_field ();
+
+      HOST_WIDE_INT off = 0;
+      if (GET_CODE (x) == CONST)
+	x = XEXP (x, 0);
+      if (GET_CODE (x) == PLUS)
+	{
+	  off = INTVAL (XEXP (x, 1));
+	  x = XEXP (x, 0);
+	}
+      if (GET_CODE (x) == SYMBOL_REF)
+	{
+	  nvptx_record_needed_fndecl (SYMBOL_REF_DECL (x));
+	  fprintf (asm_out_file, "generic(");
+	  output_address (x);
+	  fprintf (asm_out_file, ")");
+	}
+      if (off != 0)
+	fprintf (asm_out_file, " + " HOST_WIDE_INT_PRINT_DEC, off);
+      return true;
+    }
+
+  HOST_WIDE_INT val;
+  switch (GET_CODE (x))
+    {
+    case CONST_INT:
+      val = INTVAL (x);
+      break;
+    case CONST_DOUBLE:
+      gcc_unreachable ();
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  nvptx_assemble_value (val, size);
+  return true;
+}
+
+void
+nvptx_output_skip (FILE *, unsigned HOST_WIDE_INT size)
+{
+  if (decl_offset + size >= object_size)
+    {
+      if (decl_offset % decl_chunk_size != 0)
+	nvptx_assemble_value (0, decl_chunk_size);
+      object_finished = true;
+      return;
+    }
+
+  while (size > decl_chunk_size)
+    {
+      nvptx_assemble_value (0, decl_chunk_size);
+      size -= decl_chunk_size;
+    }
+  while (size-- > 0)
+    nvptx_assemble_value (0, 1);
+}
+
+void
+nvptx_output_ascii (FILE *, const char *str, unsigned HOST_WIDE_INT size)
+{
+  for (unsigned HOST_WIDE_INT i = 0; i < size; i++)
+    nvptx_assemble_value (str[i], 1);
+}
+
+static void
+nvptx_assemble_decl_end (void)
+{
+  if (decl_offset != 0)
+    {
+      if (!object_finished && decl_offset % decl_chunk_size != 0)
+	nvptx_assemble_value (0, decl_chunk_size);
+
+      fprintf (asm_out_file, " }");
+    }
+  fprintf (asm_out_file, ";\n");
+  fprintf (asm_out_file, "// END VAR DEF\n");
+}
+
+/* Start a declaration of a variable of TYPE with NAME to
+   FILE.  IS_PUBLIC says whether this will be externally visible.
+   Here we just write the linker hint and decide on the chunk size
+   to use.  */
+
+static void
+init_output_initializer (FILE *file, const char *name, const_tree type,
+			 bool is_public)
+{
+  fprintf (file, "// BEGIN%s VAR DEF: ", is_public ? " GLOBAL" : "");
+  assemble_name_raw (file, name);
+  fputc ('\n', file);
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    type = TREE_TYPE (type);
+  int sz = int_size_in_bytes (type);
+  if ((TREE_CODE (type) != INTEGER_TYPE
+       && TREE_CODE (type) != ENUMERAL_TYPE
+       && TREE_CODE (type) != REAL_TYPE)
+      || sz < 0
+      || sz > HOST_BITS_PER_WIDE_INT / BITS_PER_UNIT)
+    type = ptr_type_node;
+  decl_chunk_size = int_size_in_bytes (type);
+  decl_chunk_mode = int_mode_for_mode (TYPE_MODE (type));
+  decl_offset = 0;
+  init_part = 0;
+  object_finished = false;
+}
+
+static void
+nvptx_asm_declare_constant_name (FILE *file, const char *name,
+				 const_tree exp, HOST_WIDE_INT size)
+{
+  tree type = TREE_TYPE (exp);
+  init_output_initializer (file, name, type, false);
+  fprintf (file, "\t.const .align %d .u%d ",
+	   TYPE_ALIGN (TREE_TYPE (exp)) / BITS_PER_UNIT,
+	   decl_chunk_size * BITS_PER_UNIT);
+  assemble_name (file, name);
+  fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]",
+	   (size + decl_chunk_size - 1) / decl_chunk_size);
+  object_size = size;
+}
+
+void
+nvptx_declare_object_name (FILE *file, const char *name, const_tree decl)
+{
+  if (decl && DECL_SIZE (decl))
+    {
+      tree type = TREE_TYPE (decl);
+      unsigned HOST_WIDE_INT size;
+
+      init_output_initializer (file, name, type, TREE_PUBLIC (decl));
+      size = tree_to_uhwi (DECL_SIZE_UNIT (decl));
+      const char *section = nvptx_section_for_decl (decl);
+      fprintf (file, "\t%s%s .align %d .u%d ",
+	       TREE_PUBLIC (decl) ? " .visible" : "", section,
+	       DECL_ALIGN (decl) / BITS_PER_UNIT,
+	       decl_chunk_size * BITS_PER_UNIT);
+      assemble_name (file, name);
+      if (size > 0)
+	fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]",
+		 (size + decl_chunk_size - 1) / decl_chunk_size);
+      else
+	object_finished = true;
+      object_size = size;
+    }
+}
+
+/* Implement the target hook by doing nothing.  */
+static void
+nvptx_globalize_label (FILE *, const char *)
+{
+}
+
+static void
+nvptx_assemble_undefined_decl (FILE *file, const char *name, const_tree decl)
+{
+  if (TREE_CODE (decl) != VAR_DECL)
+    return;
+  const char *section = nvptx_section_for_decl (decl);
+  fprintf (file, "// BEGIN%s VAR DECL: ", TREE_PUBLIC (decl) ? " GLOBAL" : "");
+  assemble_name_raw (file, name);
+  fputs ("\n", file);
+  HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (decl));
+  fprintf (file, ".extern %s .b8 ", section);
+  assemble_name_raw (file, name);
+  if (size > 0)
+    fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]", size);
+  fprintf (file, ";\n// END VAR DECL\n");
+}
+
+const char *
+nvptx_output_call_insn (rtx insn, rtx result, rtx callee)
+{
+  char buf[256];
+  static int labelno;
+  bool needs_tgt = register_operand (callee, Pmode);
+  rtx pat = PATTERN (insn);
+  int nargs = XVECLEN (pat, 0) - 1;
+  tree decl = NULL_TREE;
+
+  fprintf (asm_out_file, "\t{\n");
+  if (result != NULL)
+    {
+      fprintf (asm_out_file, "\t\t.param%s %%retval_in;\n",
+	       nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)),
+					 false));
+    }
+
+  if (GET_CODE (callee) == SYMBOL_REF)
+    {
+      decl = SYMBOL_REF_DECL (callee);
+      if (decl && DECL_EXTERNAL (decl))
+	nvptx_record_fndecl (decl);
+    }
+
+  if (needs_tgt)
+    {
+      ASM_GENERATE_INTERNAL_LABEL (buf, "LCT", labelno);
+      labelno++;
+      ASM_OUTPUT_LABEL (asm_out_file, buf);
+      std::stringstream s;
+      write_func_decl_from_insn (s, result, pat, callee);
+      fputs (s.str().c_str(), asm_out_file);
+    }
+
+  for (int i = 0, argno = 0; i < nargs; i++)
+    {
+      rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+      enum machine_mode mode = GET_MODE (t);
+      int count = maybe_split_mode (&mode);
+
+      while (count-- > 0)
+	fprintf (asm_out_file, "\t\t.param%s %%out_arg%d%s;\n",
+		 nvptx_ptx_type_from_mode (mode, false), argno++,
+		 mode == QImode || mode == HImode ? "[1]" : "");
+    }
+  for (int i = 0, argno = 0; i < nargs; i++)
+    {
+      rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+      gcc_assert (REG_P (t));
+      enum machine_mode mode = GET_MODE (t);
+      int count = maybe_split_mode (&mode);
+
+      if (count == 1)
+	fprintf (asm_out_file, "\t\tst.param%s [%%out_arg%d], %%r%d;\n",
+		 nvptx_ptx_type_from_mode (mode, false), argno++,
+		 REGNO (t));
+      else
+	{
+	  int n = 0;
+	  while (count-- > 0)
+	    fprintf (asm_out_file, "\t\tst.param%s [%%out_arg%d], %%r%d$%d;\n",
+		     nvptx_ptx_type_from_mode (mode, false), argno++,
+		     REGNO (t), n++);
+	}
+    }
+
+  fprintf (asm_out_file, "\t\tcall ");
+  if (result != NULL_RTX)
+    fprintf (asm_out_file, "(%%retval_in), ");
+
+  output_address (callee);
+  if (nargs > 0 || (decl && DECL_STATIC_CHAIN (decl)))
+    {
+      fprintf (asm_out_file, ", (");
+      int i, argno;
+      for (i = 0, argno = 0; i < nargs; i++)
+	{
+	  rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+	  enum machine_mode mode = GET_MODE (t);
+	  int count = maybe_split_mode (&mode);
+
+	  while (count-- > 0)
+	    {
+	      fprintf (asm_out_file, "%%out_arg%d", argno++);
+	      if (i + 1 < nargs || count > 0)
+		fprintf (asm_out_file, ", ");
+	    }
+	}
+      if (decl && DECL_STATIC_CHAIN (decl))
+	{
+	  if (i > 0)
+	    fprintf (asm_out_file, ", ");
+	  fprintf (asm_out_file, "%s",
+		   reg_names [OUTGOING_STATIC_CHAIN_REGNUM]);
+	}
+
+      fprintf (asm_out_file, ")");
+    }
+  if (needs_tgt)
+    {
+      fprintf (asm_out_file, ", ");
+      assemble_name (asm_out_file, buf);
+    }
+  fprintf (asm_out_file, ";\n");
+  if (result != NULL_RTX)
+    return "ld.param%t0\t%0, [%%retval_in];\n\t}";
+
+  return "}";
+}
+
+static bool
+nvptx_print_operand_punct_valid_p (unsigned char c)
+{
+  return c == '.' || c == '#';
+}
+
+static void nvptx_print_operand (FILE *, rtx, int);
+
+/* Subroutine of nvptx_print_operand; used to print a memory reference X
+   to FILE.  */
+
+static void
+nvptx_print_address_operand (FILE *file, rtx x, enum machine_mode)
+{
+  rtx off;
+  if (GET_CODE (x) == CONST)
+    x = XEXP (x, 0);
+  switch (GET_CODE (x))
+    {
+    case PLUS:
+      off = XEXP (x, 1);
+      output_address (XEXP (x, 0));
+      fprintf (file, "+");
+      output_address (off);
+      break;
+
+    case SYMBOL_REF:
+    case LABEL_REF:
+      output_addr_const (file, x);
+      break;
+
+    default:
+      gcc_assert (GET_CODE (x) != MEM);
+      nvptx_print_operand (file, x, 0);
+      break;
+    }
+}
+
+/* Output assembly language output for the address ADDR to FILE.  */
+static void
+nvptx_print_operand_address (FILE *file, rtx addr)
+{
+  nvptx_print_address_operand (file, addr, VOIDmode);
+}
+
+/* Print an operand, X, to FILE, with an optional modifier in CODE.
+
+   Meaning of CODE:
+   . -- print the predicate for the instruction or an empty string for an
+        unconditional one.
+   # -- print a rounding mode for the instruction
+
+   A -- print an address space identifier for a MEM
+   c -- print an opcode suffix for a comparison operator, including a type code
+   d -- print a CONST_INT as a vector dimension (x, y, or z)
+   f -- print a full reg even for something that must always be split
+   j -- print the operand prefixed with @
+   J -- print the operand prefixed with @!
+   t -- print a type opcode suffix, promoting QImode to 32 bits
+   T -- print a type size in bits
+   u -- print a type opcode suffix without promotions.  */
+
+static void
+nvptx_print_operand (FILE *file, rtx x, int code)
+{
+  rtx orig_x = x;
+  enum machine_mode op_mode;
+
+  if (code == '.')
+    {
+      x = current_insn_predicate;
+      if (x)
+	{
+	  unsigned int regno = REGNO (XEXP (x, 0));
+	  fputs ("[", file);
+	  if (GET_CODE (x) == EQ)
+	    fputs ("!", file);
+	  fputs (reg_names [regno], file);
+	  fputs ("]", file);
+	}
+      return;
+    }
+  else if (code == '#')
+    {
+      fputs (".rn", file);
+      return;
+    }
+
+  enum rtx_code x_code = GET_CODE (x);
+
+  switch (code)
+    {
+    case 'A':
+      {
+	addr_space_t as = nvptx_addr_space_from_address (XEXP (x, 0));
+	fputs (nvptx_section_from_addr_space (as), file);
+      }
+      break;
+
+    case 'd':
+      gcc_assert (x_code == CONST_INT);
+      if (INTVAL (x) == 0)
+	fputs (".x", file);
+      else if (INTVAL (x) == 1)
+	fputs (".y", file);
+      else if (INTVAL (x) == 2)
+	fputs (".z", file);
+      else
+	gcc_unreachable ();
+      break;
+
+    case 't':
+      op_mode = nvptx_underlying_object_mode (x);
+      fprintf (file, "%s", nvptx_ptx_type_from_mode (op_mode, true));
+      break;
+
+    case 'u':
+      op_mode = nvptx_underlying_object_mode (x);
+      fprintf (file, "%s", nvptx_ptx_type_from_mode (op_mode, false));
+      break;
+
+    case 'T':
+      fprintf (file, "%d", GET_MODE_BITSIZE (GET_MODE (x)));
+      break;
+
+    case 'j':
+      fprintf (file, "@");
+      goto common;
+
+    case 'J':
+      fprintf (file, "@!");
+      goto common;
+
+    case 'c':
+      op_mode = GET_MODE (XEXP (x, 0));
+      switch (x_code)
+	{
+	case EQ:
+	  fputs (".eq", file);
+	  break;
+	case NE:
+	  if (FLOAT_MODE_P (op_mode))
+	    fputs (".neu", file);
+	  else
+	    fputs (".ne", file);
+	  break;
+	case LE:
+	  fputs (".le", file);
+	  break;
+	case GE:
+	  fputs (".ge", file);
+	  break;
+	case LT:
+	  fputs (".lt", file);
+	  break;
+	case GT:
+	  fputs (".gt", file);
+	  break;
+	case LEU:
+	  fputs (".ls", file);
+	  break;
+	case GEU:
+	  fputs (".hs", file);
+	  break;
+	case LTU:
+	  fputs (".lo", file);
+	  break;
+	case GTU:
+	  fputs (".hi", file);
+	  break;
+	case LTGT:
+	  fputs (".ne", file);
+	  break;
+	case UNEQ:
+	  fputs (".equ", file);
+	  break;
+	case UNLE:
+	  fputs (".leu", file);
+	  break;
+	case UNGE:
+	  fputs (".geu", file);
+	  break;
+	case UNLT:
+	  fputs (".ltu", file);
+	  break;
+	case UNGT:
+	  fputs (".gtu", file);
+	  break;
+	case UNORDERED:
+	  fputs (".nan", file);
+	  break;
+	case ORDERED:
+	  fputs (".num", file);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      if (FLOAT_MODE_P (op_mode)
+	  || x_code == EQ || x_code == NE
+	  || x_code == GEU || x_code == GTU
+	  || x_code == LEU || x_code == LTU)
+	fputs (nvptx_ptx_type_from_mode (op_mode, true), file);
+      else
+	fprintf (file, ".s%d", GET_MODE_BITSIZE (op_mode));
+      break;
+    default:
+    common:
+      switch (x_code)
+	{
+	case SUBREG:
+	  x = SUBREG_REG (x);
+	  /* fall through */
+
+	case REG:
+	  if (HARD_REGISTER_P (x))
+	    fprintf (file, "%s", reg_names[REGNO (x)]);
+	  else
+	    fprintf (file, "%%r%d", REGNO (x));
+	  if (code != 'f' && nvptx_split_reg_p (GET_MODE (x)))
+	    {
+	      gcc_assert (GET_CODE (orig_x) == SUBREG
+			  && !nvptx_split_reg_p (GET_MODE (orig_x)));
+	      fprintf (file, "$%d", SUBREG_BYTE (orig_x) / UNITS_PER_WORD);
+	    }
+	  break;
+
+	case MEM:
+	  fputc ('[', file);
+	  nvptx_print_address_operand (file, XEXP (x, 0), GET_MODE (x));
+	  fputc (']', file);
+	  break;
+
+	case CONST_INT:
+	  output_addr_const (file, x);
+	  break;
+
+	case CONST:
+	case SYMBOL_REF:
+	case LABEL_REF:
+	  /* We could use output_addr_const, but that can print things like
+	     "x-8", which breaks ptxas.  Need to ensure it is output as
+	     "x+-8".  */
+	  nvptx_print_address_operand (file, x, VOIDmode);
+	  break;
+
+	case CONST_DOUBLE:
+	  long vals[2];
+	  REAL_VALUE_TYPE real;
+	  REAL_VALUE_FROM_CONST_DOUBLE (real, x);
+	  real_to_target (vals, &real, GET_MODE (x));
+	  vals[0] &= 0xffffffff;
+	  vals[1] &= 0xffffffff;
+	  if (GET_MODE (x) == SFmode)
+	    fprintf (file, "0f%08lx", vals[0]);
+	  else
+	    fprintf (file, "0d%08lx%08lx", vals[1], vals[0]);
+	  break;
+
+	default:
+	  output_addr_const (file, x);
+	}
+    }
+}
+\f
+/* Record replacement regs used to deal with subreg operands.  */
+struct reg_replace
+{
+  rtx replacement[MAX_RECOG_OPERANDS];
+  enum machine_mode mode;
+  int n_allocated;
+  int n_in_use;
+};
+
+/* Allocate or reuse a replacement in R and return the rtx.  */
+
+static rtx
+get_replacement (struct reg_replace *r)
+{
+  if (r->n_allocated == r->n_in_use)
+    r->replacement[r->n_allocated++] = gen_reg_rtx (r->mode);
+  return r->replacement[r->n_in_use++];
+}
+
+/* Replace problematic subreg operands with new pseudos and conversions.  */
+static void
+nvptx_reorg (void)
+{
+  struct reg_replace qiregs, hiregs, siregs, diregs;
+  rtx_insn *insn, *next;
+
+  /* We are freeing block_for_insn in the toplev to keep compatibility
+     with old MDEP_REORGS that are not CFG based.  Recompute it now.  */
+  compute_bb_for_insn ();
+
+  df_clear_flags (DF_LR_RUN_DCE);
+  df_analyze ();
+
+  thread_prologue_and_epilogue_insns ();
+
+  qiregs.n_allocated = 0;
+  hiregs.n_allocated = 0;
+  siregs.n_allocated = 0;
+  diregs.n_allocated = 0;
+  qiregs.mode = QImode;
+  hiregs.mode = HImode;
+  siregs.mode = SImode;
+  diregs.mode = DImode;
+
+  for (insn = get_insns (); insn; insn = next)
+    {
+      next = NEXT_INSN (insn);
+      if (!NONDEBUG_INSN_P (insn)
+	  || asm_noperands (insn) >= 0
+	  || GET_CODE (PATTERN (insn)) == USE
+	  || GET_CODE (PATTERN (insn)) == CLOBBER)
+	continue;
+      qiregs.n_in_use = 0;
+      hiregs.n_in_use = 0;
+      siregs.n_in_use = 0;
+      diregs.n_in_use = 0;
+      extract_insn (insn);
+      enum attr_subregs_ok s_ok = get_attr_subregs_ok (insn);
+      for (int i = 0; i < recog_data.n_operands; i++)
+	{
+	  rtx op = recog_data.operand[i];
+	  if (GET_CODE (op) != SUBREG)
+	    continue;
+
+	  rtx inner = SUBREG_REG (op);
+
+	  enum machine_mode outer_mode = GET_MODE (op);
+	  enum machine_mode inner_mode = GET_MODE (inner);
+	  gcc_assert (s_ok);
+	  if (s_ok
+	      && (GET_MODE_PRECISION (inner_mode)
+		  >= GET_MODE_PRECISION (outer_mode)))
+	    continue;
+	  gcc_assert (SCALAR_INT_MODE_P (outer_mode));
+	  struct reg_replace *r = (outer_mode == QImode ? &qiregs
+				   : outer_mode == HImode ? &hiregs
+				   : outer_mode == SImode ? &siregs
+				   : &diregs);
+	  rtx new_reg = get_replacement (r);
+
+	  if (recog_data.operand_type[i] != OP_OUT)
+	    {
+	      enum rtx_code code;
+	      if (GET_MODE_PRECISION (inner_mode)
+		  < GET_MODE_PRECISION (outer_mode))
+		code = ZERO_EXTEND;
+	      else
+		code = TRUNCATE;
+
+	      rtx pat = gen_rtx_SET (VOIDmode, new_reg,
+				     gen_rtx_fmt_e (code, outer_mode, inner));
+	      emit_insn_before (pat, insn);
+	    }
+
+	  if (recog_data.operand_type[i] != OP_IN)
+	    {
+	      enum rtx_code code;
+	      if (GET_MODE_PRECISION (inner_mode)
+		  < GET_MODE_PRECISION (outer_mode))
+		code = TRUNCATE;
+	      else
+		code = ZERO_EXTEND;
+
+	      rtx pat = gen_rtx_SET (VOIDmode, inner,
+				     gen_rtx_fmt_e (code, inner_mode, new_reg));
+	      emit_insn_after (pat, insn);
+	    }
+	  validate_change (insn, recog_data.operand_loc[i], new_reg, false);
+	}
+    }
+
+  int maxregs = max_reg_num ();
+  regstat_init_n_sets_and_refs ();
+
+  for (int i = LAST_VIRTUAL_REGISTER + 1; i < maxregs; i++)
+    if (REG_N_SETS (i) == 0 && REG_N_REFS (i) == 0)
+      regno_reg_rtx[i] = const0_rtx;
+  regstat_free_n_sets_and_refs ();
+}
+\f
+/* Handle a "kernel" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+nvptx_handle_kernel_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+			       int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  tree decl = *node;
+
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    {
+      error ("%qE attribute only applies to functions", name);
+      *no_add_attrs = true;
+    }
+
+  else if (TREE_TYPE (TREE_TYPE (decl)) != void_type_node)
+    {
+      error ("%qE attribute requires a void return type", name);
+      *no_add_attrs = true;
+    }
+
+  return NULL_TREE;
+}
+
+/* Table of valid machine attributes.  */
+static const struct attribute_spec nvptx_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
+       affects_type_identity } */
+  { "kernel", 0, 0, true, false,  false, nvptx_handle_kernel_attribute, false },
+  { NULL, 0, 0, false, false, false, NULL, false }
+};
+\f
+/* Limit vector alignments to BIGGEST_ALIGNMENT.  */
+
+static HOST_WIDE_INT
+nvptx_vector_alignment (const_tree type)
+{
+  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
+
+  return MIN (align, BIGGEST_ALIGNMENT);
+}
+\f
+static void
+nvptx_file_start (void)
+{
+  fputs ("// BEGIN PREAMBLE\n", asm_out_file);
+  fputs ("\t.version\t3.1\n", asm_out_file);
+  fputs ("\t.target\tsm_30\n", asm_out_file);
+  fprintf (asm_out_file, "\t.address_size %d\n", GET_MODE_BITSIZE (Pmode));
+  fputs ("// END PREAMBLE\n", asm_out_file);
+}
+
+static int
+write_one_fndecl (void **slot, void *)
+{
+  tree decl = (tree)*slot;
+  nvptx_record_fndecl (decl, true);
+  return 1;
+}
+
+/* Write out the function declarations we've collected.  */
+
+static void
+nvptx_file_end (void)
+{
+  htab_traverse (needed_fndecls_htab,
+		 write_one_fndecl,
+		 NULL);
+  fputs (func_decls.str ().c_str (), asm_out_file);
+}
+\f
+#undef TARGET_OPTION_OVERRIDE
+#define TARGET_OPTION_OVERRIDE nvptx_option_override
+
+#undef TARGET_ATTRIBUTE_TABLE
+#define TARGET_ATTRIBUTE_TABLE nvptx_attribute_table
+
+#undef TARGET_LEGITIMATE_ADDRESS_P
+#define TARGET_LEGITIMATE_ADDRESS_P nvptx_legitimate_address_p
+
+#undef  TARGET_PROMOTE_FUNCTION_MODE
+#define TARGET_PROMOTE_FUNCTION_MODE nvptx_promote_function_mode
+//#undef TARGET_PROMOTE_PROTOTYPES
+//#define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_true
+
+#undef TARGET_FUNCTION_ARG
+#define TARGET_FUNCTION_ARG nvptx_function_arg
+#undef TARGET_FUNCTION_INCOMING_ARG
+#define TARGET_FUNCTION_INCOMING_ARG nvptx_function_incoming_arg
+#undef TARGET_FUNCTION_ARG_ADVANCE
+#define TARGET_FUNCTION_ARG_ADVANCE nvptx_function_arg_advance
+#undef TARGET_FUNCTION_ARG_BOUNDARY
+#define TARGET_FUNCTION_ARG_BOUNDARY nvptx_function_arg_boundary
+#undef TARGET_FUNCTION_ARG_ROUND_BOUNDARY
+#define TARGET_FUNCTION_ARG_ROUND_BOUNDARY \
+  nvptx_function_arg_round_boundary
+#undef TARGET_PASS_BY_REFERENCE
+#define TARGET_PASS_BY_REFERENCE nvptx_pass_by_reference
+#undef TARGET_FUNCTION_VALUE_REGNO_P
+#define TARGET_FUNCTION_VALUE_REGNO_P nvptx_function_value_regno_p
+#undef TARGET_FUNCTION_VALUE
+#define TARGET_FUNCTION_VALUE nvptx_function_value
+#undef TARGET_LIBCALL_VALUE
+#define TARGET_LIBCALL_VALUE nvptx_libcall_value
+#undef TARGET_FUNCTION_OK_FOR_SIBCALL
+#define TARGET_FUNCTION_OK_FOR_SIBCALL nvptx_function_ok_for_sibcall
+#undef TARGET_SPLIT_COMPLEX_ARG
+#define TARGET_SPLIT_COMPLEX_ARG hook_bool_const_tree_true
+#undef TARGET_RETURN_IN_MEMORY
+#define TARGET_RETURN_IN_MEMORY nvptx_return_in_memory
+#undef TARGET_OMIT_STRUCT_RETURN_REG
+#define TARGET_OMIT_STRUCT_RETURN_REG true
+#undef TARGET_STRICT_ARGUMENT_NAMING
+#define TARGET_STRICT_ARGUMENT_NAMING nvptx_strict_argument_naming
+#undef TARGET_STATIC_CHAIN
+#define TARGET_STATIC_CHAIN nvptx_static_chain
+
+#undef TARGET_CALL_ARGS
+#define TARGET_CALL_ARGS nvptx_call_args
+#undef TARGET_END_CALL_ARGS
+#define TARGET_END_CALL_ARGS nvptx_end_call_args
+
+#undef TARGET_ASM_FILE_START
+#define TARGET_ASM_FILE_START nvptx_file_start
+#undef TARGET_ASM_FILE_END
+#define TARGET_ASM_FILE_END nvptx_file_end
+#undef TARGET_ASM_GLOBALIZE_LABEL
+#define TARGET_ASM_GLOBALIZE_LABEL nvptx_globalize_label
+#undef TARGET_ASM_ASSEMBLE_UNDEFINED_DECL
+#define TARGET_ASM_ASSEMBLE_UNDEFINED_DECL nvptx_assemble_undefined_decl
+#undef  TARGET_PRINT_OPERAND
+#define TARGET_PRINT_OPERAND nvptx_print_operand
+#undef  TARGET_PRINT_OPERAND_ADDRESS
+#define TARGET_PRINT_OPERAND_ADDRESS nvptx_print_operand_address
+#undef  TARGET_PRINT_OPERAND_PUNCT_VALID_P
+#define TARGET_PRINT_OPERAND_PUNCT_VALID_P nvptx_print_operand_punct_valid_p
+#undef TARGET_ASM_INTEGER
+#define TARGET_ASM_INTEGER nvptx_assemble_integer
+#undef TARGET_ASM_DECL_END
+#define TARGET_ASM_DECL_END nvptx_assemble_decl_end
+#undef TARGET_ASM_DECLARE_CONSTANT_NAME
+#define TARGET_ASM_DECLARE_CONSTANT_NAME nvptx_asm_declare_constant_name
+#undef TARGET_USE_BLOCKS_FOR_CONSTANT_P
+#define TARGET_USE_BLOCKS_FOR_CONSTANT_P hook_bool_mode_const_rtx_true
+#undef TARGET_ASM_NEED_VAR_DECL_BEFORE_USE
+#define TARGET_ASM_NEED_VAR_DECL_BEFORE_USE true
+
+#undef TARGET_MACHINE_DEPENDENT_REORG
+#define TARGET_MACHINE_DEPENDENT_REORG nvptx_reorg
+#undef TARGET_NO_REGISTER_ALLOCATION
+#define TARGET_NO_REGISTER_ALLOCATION true
+
+#undef TARGET_VECTOR_ALIGNMENT
+#define TARGET_VECTOR_ALIGNMENT nvptx_vector_alignment
+
+struct gcc_target targetm = TARGET_INITIALIZER;
+
+#include "gt-nvptx.h"
Index: gcc/config/nvptx/nvptx.opt
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx.opt
@@ -0,0 +1,30 @@
+; Options for the NVPTX port
+; Copyright 2014 Free Software Foundation, Inc.
+;
+; This file is part of GCC.
+;
+; GCC is free software; you can redistribute it and/or modify it under
+; the terms of the GNU General Public License as published by the Free
+; Software Foundation; either version 3, or (at your option) any later
+; version.
+;
+; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+; FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+; for more details.
+;
+; You should have received a copy of the GNU General Public License
+; along with GCC; see the file COPYING3.  If not see
+; <http://www.gnu.org/licenses/>.
+
+m64
+Target Report RejectNegative Mask(ABI64)
+Generate code for a 64-bit ABI.
+
+m32
+Target Report RejectNegative InverseMask(ABI64)
+Generate code for a 32-bit ABI.
+
+mmainkernel
+Target Report RejectNegative
+Link in code for a __main kernel.
Index: gcc/config/nvptx/t-nvptx
===================================================================
--- /dev/null
+++ gcc/config/nvptx/t-nvptx
@@ -0,0 +1,2 @@
+#
+
Index: gcc/config/nvptx/nvptx.h
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx.h
@@ -0,0 +1,355 @@
+/* Target Definitions for NVPTX.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_NVPTX_H
+#define GCC_NVPTX_H
+
+/* Run-time Target.  */
+
+#define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
+
+#define TARGET_CPU_CPP_BUILTINS()		\
+  do						\
+    {						\
+      builtin_assert ("machine=nvptx");		\
+      builtin_assert ("cpu=nvptx");		\
+      builtin_define ("__nvptx__");		\
+    } while (0)
+
+/* Storage Layout.  */
+
+#define BITS_BIG_ENDIAN 0
+#define BYTES_BIG_ENDIAN 0
+#define WORDS_BIG_ENDIAN 0
+
+/* Chosen such that we won't have to deal with multi-word subregs.  */
+#define UNITS_PER_WORD 8
+
+#define PARM_BOUNDARY 8
+#define STACK_BOUNDARY 64
+#define FUNCTION_BOUNDARY 32
+#define BIGGEST_ALIGNMENT 64
+#define STRICT_ALIGNMENT 1
+
+/* Copied from elf.h and other places.  We'd otherwise use
+   BIGGEST_ALIGNMENT and fail a number of testcases.  */
+#define MAX_OFILE_ALIGNMENT (32768 * 8)
+
+/* Type Layout.  */
+
+#define DEFAULT_SIGNED_CHAR 1
+
+#define SHORT_TYPE_SIZE 16
+#define INT_TYPE_SIZE 32
+#define LONG_TYPE_SIZE (TARGET_ABI64 ? 64 : 32)
+#define LONG_LONG_TYPE_SIZE 64
+#define FLOAT_TYPE_SIZE 32
+#define DOUBLE_TYPE_SIZE 64
+#define LONG_DOUBLE_TYPE_SIZE 64
+
+#undef SIZE_TYPE
+#define SIZE_TYPE (TARGET_ABI64 ? "long unsigned int" : "unsigned int")
+#undef PTRDIFF_TYPE
+#define PTRDIFF_TYPE (TARGET_ABI64 ? "long int" : "int")
+
+#define POINTER_SIZE (TARGET_ABI64 ? 64 : 32)
+
+#define Pmode (TARGET_ABI64 ? DImode : SImode)
+
+/* Registers.  Since ptx is a virtual target, we just define a few
+   hard registers for special purposes and leave pseudos unallocated.  */
+
+#define FIRST_PSEUDO_REGISTER 16
+#define FIXED_REGISTERS					\
+  { 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1 }
+#define CALL_USED_REGISTERS				\
+  { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }
+
+#define HARD_REGNO_NREGS(regno, mode)	1
+#define CANNOT_CHANGE_MODE_CLASS(M1, M2, CLS) ((CLS) == RETURN_REG)
+#define HARD_REGNO_MODE_OK(REG, MODE) nvptx_hard_regno_mode_ok (REG, MODE)
+
+/* Register Classes.  */
+
+enum reg_class
+  {
+    NO_REGS,
+    RETURN_REG,
+    ALL_REGS,
+    LIM_REG_CLASSES
+  };
+
+#define N_REG_CLASSES (int) LIM_REG_CLASSES
+
+#define REG_CLASS_NAMES {	  \
+    "NO_REGS",			  \
+    "RETURN_REG",		  \
+    "ALL_REGS" }
+
+#define REG_CLASS_CONTENTS	\
+{				\
+  /* NO_REGS.  */		\
+  { 0x0000 },			\
+  /* RETURN_REG.  */		\
+  { 0x0010 },			\
+  /* ALL_REGS.  */		\
+  { 0xFFFF },			\
+}
+
+#define GENERAL_REGS ALL_REGS
+
+#define REGNO_REG_CLASS(R) ((R) == 4 ? RETURN_REG : ALL_REGS)
+
+#define BASE_REG_CLASS ALL_REGS
+#define INDEX_REG_CLASS NO_REGS
+
+#define REGNO_OK_FOR_BASE_P(X) true
+#define REGNO_OK_FOR_INDEX_P(X) false
+
+#define CLASS_MAX_NREGS(class, mode) \
+  ((GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)
+
+#define MODES_TIEABLE_P(M1, M2) false
+
+#define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)		\
+  if (GET_MODE_CLASS (MODE) == MODE_INT			\
+      && GET_MODE_SIZE (MODE) < GET_MODE_SIZE (SImode))	\
+    {							\
+      (MODE) = SImode;					\
+    }
+
+/* Address spaces.  */
+#define ADDR_SPACE_GLOBAL 1
+#define ADDR_SPACE_SHARED 3
+#define ADDR_SPACE_CONST 4
+#define ADDR_SPACE_LOCAL 5
+#define ADDR_SPACE_PARAM 101
+
+/* Stack and Calling.  */
+
+#define STARTING_FRAME_OFFSET 0
+#define FRAME_GROWS_DOWNWARD 0
+#define STACK_GROWS_DOWNWARD
+
+#define STACK_POINTER_REGNUM 1
+#define HARD_FRAME_POINTER_REGNUM 2
+#define NVPTX_PUNNING_BUFFER_REGNUM 3
+#define FRAME_POINTER_REGNUM 15
+#define ARG_POINTER_REGNUM 14
+#define RETURN_ADDR_REGNO 13
+
+#define STATIC_CHAIN_REGNUM 12
+#define OUTGOING_ARG_POINTER_REGNUM 11
+#define OUTGOING_STATIC_CHAIN_REGNUM 10
+
+#define FIRST_PARM_OFFSET(FNDECL) 0
+#define PUSH_ARGS_REVERSED 1
+
+#define ACCUMULATE_OUTGOING_ARGS 1
+
+#ifdef HOST_WIDE_INT
+struct nvptx_args {
+  union tree_node *fntype;
+  /* Number of arguments passed in registers so far.  */
+  int count;
+  /* Offset into the stdarg area so far.  */
+  HOST_WIDE_INT off;
+};
+#endif
+
+#define CUMULATIVE_ARGS struct nvptx_args
+
+#define INIT_CUMULATIVE_ARGS(CUM, FNTYPE, LIBNAME, FNDECL, N_NAMED_ARGS) \
+  do { (CUM).fntype = (FNTYPE); (CUM).count = 0; (CUM).off = 0; } while (0)
+
+#define FUNCTION_ARG_REGNO_P(r) 0
+
+#define DEFAULT_PCC_STRUCT_RETURN 0
+
+#define FUNCTION_PROFILER(file, labelno) \
+  fatal_error ("profiling is not yet implemented for this architecture")
+
+#define TRAMPOLINE_SIZE 32
+#define TRAMPOLINE_ALIGNMENT 256
+\f
+/* We don't run reload, so this isn't actually used, but it still needs to be
+   defined.  Showing an argp->fp elimination also stops
+   expand_builtin_setjmp_receiver from generating invalid insns.  */
+#define ELIMINABLE_REGS					\
+  {							\
+    { FRAME_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM},	\
+    { ARG_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM}	\
+  }
+
+/* Define the offset between two registers, one to be eliminated, and the other
+   its replacement, at the start of a routine.  */
+
+#define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET) \
+  ((OFFSET) = 0)
+\f
+/* Addressing Modes.  */
+
+#define MAX_REGS_PER_ADDRESS 1
+
+#define LEGITIMATE_PIC_OPERAND_P(X) 1
+\f
+
+struct nvptx_pseudo_info
+{
+  int true_size;
+  int renumber;
+};
+
+#if defined HOST_WIDE_INT
+struct GTY(()) machine_function
+{
+  rtx_expr_list *call_args;
+  rtx start_call;
+  tree funtype;
+  bool has_call_with_varargs;
+  bool has_call_with_sc;
+  struct GTY((skip)) nvptx_pseudo_info *pseudos;
+  HOST_WIDE_INT outgoing_stdarg_size;
+  int ret_reg_mode;
+  int punning_buffer_size;
+};
+#endif
+\f
+/* Costs.  */
+
+#define NO_FUNCTION_CSE 1
+#define SLOW_BYTE_ACCESS 0
+#define BRANCH_COST(speed_p, predictable_p) 6
+\f
+/* Assembler Format.  */
+
+#undef ASM_DECLARE_FUNCTION_NAME
+#define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL)		\
+  nvptx_declare_function_name (FILE, NAME, DECL)
+
+#undef ASM_DECLARE_FUNCTION_SIZE
+#define ASM_DECLARE_FUNCTION_SIZE(STREAM, NAME, DECL) \
+  nvptx_function_end (STREAM)
+
+#define DWARF2_ASM_LINE_DEBUG_INFO 1
+
+#undef ASM_APP_ON
+#define ASM_APP_ON "\t// #APP \n"
+#undef ASM_APP_OFF
+#define ASM_APP_OFF "\t// #NO_APP \n"
+
+#define ASM_OUTPUT_COMMON(stream, name, size, rounded)
+#define ASM_OUTPUT_LOCAL(stream, name, size, rounded)
+
+#define REGISTER_NAMES							\
+  {									\
+    "%hr0", "%outargs", "%hfp", "%punbuffer", "%retval", "%retval_in", "%hr6", "%hr7",	\
+    "%hr8", "%hr9", "%hr10", "%hr11", "%hr12", "%hr13", "%argp", "%frame" \
+  }
+
+#define DBX_REGISTER_NUMBER(N) N
+
+#define TEXT_SECTION_ASM_OP ""
+#define DATA_SECTION_ASM_OP ""
+
+#undef  ASM_GENERATE_INTERNAL_LABEL
+#define ASM_GENERATE_INTERNAL_LABEL(LABEL, PREFIX, NUM)		\
+  do								\
+    {								\
+      char *__p;						\
+      __p = stpcpy (&(LABEL)[1], PREFIX);			\
+      (LABEL)[0] = '$';						\
+      sprint_ul (__p, (unsigned long) (NUM));			\
+    }								\
+  while (0)
+
+#define ASM_OUTPUT_ALIGN(FILE, POWER)
+#define ASM_OUTPUT_SKIP(FILE, N)		\
+  nvptx_output_skip (FILE, N)
+#undef  ASM_OUTPUT_ASCII
+#define ASM_OUTPUT_ASCII(FILE, STR, LENGTH)			\
+  nvptx_output_ascii (FILE, STR, LENGTH)
+
+#define ASM_DECLARE_OBJECT_NAME(FILE, NAME, DECL)	\
+  nvptx_declare_object_name (FILE, NAME, DECL)
+
+#undef  ASM_OUTPUT_ALIGNED_DECL_COMMON
+#define ASM_OUTPUT_ALIGNED_DECL_COMMON(FILE, DECL, NAME, SIZE, ALIGN)	\
+  do									\
+    {									\
+      fprintf (FILE, "// BEGIN%s VAR DEF: ",				\
+	       TREE_PUBLIC (DECL) ? " GLOBAL" : "");			\
+      assemble_name_raw (FILE, NAME);					\
+      fputc ('\n', FILE);						\
+      const char *sec = nvptx_section_for_decl (DECL);			\
+      fprintf (FILE, ".visible%s.align %d .b8 ", sec,			\
+	       (ALIGN) / BITS_PER_UNIT);				\
+      assemble_name ((FILE), (NAME));					\
+      if ((SIZE) > 0)							\
+	fprintf (FILE, "[" HOST_WIDE_INT_PRINT_DEC "]", (SIZE));	\
+      fprintf (FILE, ";\n");						\
+    }									\
+  while (0)
+
+#undef  ASM_OUTPUT_ALIGNED_DECL_LOCAL
+#define ASM_OUTPUT_ALIGNED_DECL_LOCAL(FILE, DECL, NAME, SIZE, ALIGN)	\
+  do									\
+    {									\
+      fprintf (FILE, "// BEGIN VAR DEF: ");				\
+      assemble_name_raw (FILE, NAME);					\
+      fputc ('\n', FILE);						\
+      const char *sec = nvptx_section_for_decl (DECL);			\
+      fprintf (FILE, ".visible%s.align %d .b8 ", sec,			\
+	       (ALIGN) / BITS_PER_UNIT);				\
+      assemble_name ((FILE), (NAME));					\
+      if ((SIZE) > 0)							\
+	fprintf (FILE, "[" HOST_WIDE_INT_PRINT_DEC "]", (SIZE));	\
+      fprintf (FILE, ";\n");						\
+    }									\
+  while (0)
+
+#define CASE_VECTOR_PC_RELATIVE flag_pic
+#define JUMP_TABLES_IN_TEXT_SECTION flag_pic
+
+#define ADDR_VEC_ALIGN(VEC) (JUMP_TABLES_IN_TEXT_SECTION ? 5 : 2)
+
+/* Misc.  */
+
+#define DWARF2_DEBUGGING_INFO 1
+
+#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_BITSIZE ((MODE)), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_BITSIZE ((MODE)), 2)
+
+#define NO_DOT_IN_LABEL
+#define ASM_COMMENT_START "//"
+
+#define STORE_FLAG_VALUE -1
+#define FLOAT_STORE_FLAG_VALUE(MODE) REAL_VALUE_ATOF("1.0", (MODE))
+
+#define CASE_VECTOR_MODE SImode
+#define MOVE_MAX 4
+#define MOVE_RATIO(SPEED) 4
+#define TRULY_NOOP_TRUNCATION(outprec, inprec) 1
+#define FUNCTION_MODE QImode
+#define HAS_INIT_SECTION 1
+
+#endif /* GCC_NVPTX_H */
Index: gcc/config/nvptx/nvptx-protos.h
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx-protos.h
@@ -0,0 +1,47 @@
+/* Prototypes for exported functions defined in nvptx.c.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_NVPTX_PROTOS_H
+#define GCC_NVPTX_PROTOS_H
+
+extern void nvptx_declare_function_name (FILE *, const char *, const_tree decl);
+extern void nvptx_declare_object_name (FILE *file, const char *name,
+				       const_tree decl);
+extern void nvptx_record_needed_fndecl (tree decl);
+extern void nvptx_function_end (FILE *);
+extern void nvptx_output_skip (FILE *, unsigned HOST_WIDE_INT);
+extern void nvptx_output_ascii (FILE *, const char *, unsigned HOST_WIDE_INT);
+extern void nvptx_register_pragmas (void);
+extern const char *nvptx_section_for_decl (const_tree);
+
+#ifdef RTX_CODE
+extern void nvptx_expand_call (rtx, rtx);
+extern rtx nvptx_expand_compare (rtx);
+extern const char *nvptx_ptx_type_from_mode (enum machine_mode, bool);
+extern const char *nvptx_output_call_insn (rtx, rtx, rtx);
+extern const char *nvptx_output_return (void);
+extern enum machine_mode nvptx_underlying_object_mode (rtx);
+extern const char *nvptx_section_from_addr_space (addr_space_t);
+extern bool nvptx_hard_regno_mode_ok (int, enum machine_mode);
+extern addr_space_t nvptx_addr_space_from_address (rtx);
+extern rtx nvptx_maybe_convert_symbolic_operand (rtx);
+#endif
+#endif
+
Index: gcc/config/nvptx/nvptx.md
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx.md
@@ -0,0 +1,1280 @@
+;; Machine description for NVPTX.
+;; Copyright (C) 2014 Free Software Foundation, Inc.
+;; Contributed by Bernd Schmidt <bernds@codesourcery.com>
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_c_enum "unspec" [
+   UNSPEC_ARG_REG
+   UNSPEC_FROM_GLOBAL
+   UNSPEC_FROM_LOCAL
+   UNSPEC_FROM_PARAM
+   UNSPEC_FROM_SHARED
+   UNSPEC_FROM_CONST
+   UNSPEC_TO_GLOBAL
+   UNSPEC_TO_LOCAL
+   UNSPEC_TO_PARAM
+   UNSPEC_TO_SHARED
+   UNSPEC_TO_CONST
+
+   UNSPEC_CPLX_LOWPART
+   UNSPEC_CPLX_HIGHPART
+
+   UNSPEC_COPYSIGN
+   UNSPEC_LOG2
+   UNSPEC_EXP2
+   UNSPEC_SIN
+   UNSPEC_COS
+
+   UNSPEC_FPINT_FLOOR
+   UNSPEC_FPINT_BTRUNC
+   UNSPEC_FPINT_CEIL
+   UNSPEC_FPINT_NEARBYINT
+
+   UNSPEC_BITREV
+
+   UNSPEC_ALLOCA
+
+   UNSPEC_NTID
+   UNSPEC_TID
+])
+
+(define_attr "subregs_ok" "false,true"
+  (const_string "false"))
+
+(define_predicate "nvptx_register_operand"
+  (match_code "reg,subreg")
+{
+  if (REG_P (op))
+    return !HARD_REGISTER_P (op);
+  if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))
+    return false;
+  if (GET_CODE (op) == SUBREG)
+    return false;
+  return register_operand (op, mode);
+})
+
+(define_predicate "nvptx_reg_or_mem_operand"
+  (match_code "mem,reg,subreg")
+{
+  if (REG_P (op))
+    return !HARD_REGISTER_P (op);
+  if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))
+    return false;
+  if (GET_CODE (op) == SUBREG)
+    return false;
+  return memory_operand (op, mode) || register_operand (op, mode);
+})
+
+;; Allow symbolic constants.
+(define_predicate "symbolic_operand"
+  (match_code "symbol_ref,const"))
+
+;; Allow registers or symbolic constants.  We can allow frame, arg or stack
+;; pointers here since they are actually symbolic constants.
+(define_predicate "nvptx_register_or_symbolic_operand"
+  (match_code "reg,subreg,symbol_ref,const")
+{
+  if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))
+    return false;
+  if (GET_CODE (op) == SUBREG)
+    return false;
+  if (CONSTANT_P (op))
+    return true;
+  return register_operand (op, mode);
+})
+
+;; Registers or constants for normal instructions.  Does not allow symbolic
+;; constants.
+(define_predicate "nvptx_nonmemory_operand"
+  (match_code "reg,subreg,const_int,const_double")
+{
+  if (REG_P (op))
+    return !HARD_REGISTER_P (op);
+  if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))
+    return false;
+  if (GET_CODE (op) == SUBREG)
+    return false;
+  return nonmemory_operand (op, mode);
+})
+
+;; A source operand for a move instruction.  This is the only predicate we use
+;; that accepts symbolic constants.
+(define_predicate "nvptx_general_operand"
+  (match_code "reg,subreg,mem,const,symbol_ref,label_ref,const_int,const_double")
+{
+  if (REG_P (op))
+    return !HARD_REGISTER_P (op);
+  return general_operand (op, mode);
+})
+
+;; A destination operand for a move instruction.  This is the only destination
+;; predicate that accepts the return register since it requires special handling.
+(define_predicate "nvptx_nonimmediate_operand"
+  (match_code "reg,subreg,mem")
+{
+  if (REG_P (op))
+    return (op != frame_pointer_rtx
+	    && op != arg_pointer_rtx
+	    && op != stack_pointer_rtx);
+  return nonimmediate_operand (op, mode);
+})
+
+(define_predicate "const_0_operand"
+  (and (match_code "const_int,const_double,const_vector")
+       (match_test "op == CONST0_RTX (GET_MODE (op))")))
+
+(define_predicate "global_mem_operand"
+  (and (match_code "mem")
+       (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_GLOBAL")))
+
+(define_predicate "const_mem_operand"
+  (and (match_code "mem")
+       (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_CONST")))
+
+(define_predicate "param_mem_operand"
+  (and (match_code "mem")
+       (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_PARAM")))
+
+(define_predicate "shared_mem_operand"
+  (and (match_code "mem")
+       (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_SHARED")))
+
+(define_predicate "const0_operand"
+  (and (match_code "const_int")
+       (match_test "op == const0_rtx")))
+
+;; True if this operator is valid for predication.
+(define_predicate "predicate_operator"
+  (match_code "eq,ne"))
+
+(define_predicate "ne_operator"
+  (match_code "ne"))
+
+(define_predicate "nvptx_comparison_operator"
+  (match_code "eq,ne,le,ge,lt,gt,leu,geu,ltu,gtu"))
+
+(define_predicate "nvptx_float_comparison_operator"
+  (match_code "eq,ne,le,ge,lt,gt,uneq,unle,unge,unlt,ungt,unordered,ordered"))
+
+;; Test for a valid operand for a call instruction.
+(define_special_predicate "call_insn_operand"
+  (match_code "symbol_ref,reg")
+{
+  if (GET_CODE (op) == SYMBOL_REF)
+    {
+      tree decl = SYMBOL_REF_DECL (op);
+      /* This happens for libcalls.  */
+      if (decl == NULL_TREE)
+        return true;
+      return TREE_CODE (SYMBOL_REF_DECL (op)) == FUNCTION_DECL;
+    }
+  return true;
+})
+
+;; Return true if OP is a call with parallel USEs of the argument
+;; pseudos.
+(define_predicate "call_operation"
+  (match_code "parallel")
+{
+  unsigned i;
+
+  for (i = 1; i < XVECLEN (op, 0); i++)
+    {
+      rtx elt = XVECEXP (op, 0, i);
+      enum machine_mode mode;
+      unsigned regno;
+
+      if (GET_CODE (elt) != USE
+          || GET_CODE (XEXP (elt, 0)) != REG
+          || XEXP (elt, 0) == frame_pointer_rtx
+          || XEXP (elt, 0) == arg_pointer_rtx
+          || XEXP (elt, 0) == stack_pointer_rtx)
+
+        return false;
+    }
+  return true;
+})
+
+(define_constraint "P0"
+  "An integer with the value 0."
+  (and (match_code "const_int")
+       (match_test "ival == 0")))
+
+(define_constraint "P1"
+  "An integer with the value 1."
+  (and (match_code "const_int")
+       (match_test "ival == 1")))
+
+(define_constraint "Pn"
+  "An integer with the value -1."
+  (and (match_code "const_int")
+       (match_test "ival == -1")))
+
+(define_constraint "R"
+  "A pseudo register."
+  (match_code "reg"))
+
+(define_constraint "Ia"
+  "Any integer constant."
+  (and (match_code "const_int") (match_test "true")))
+
+(define_mode_iterator QHSDISDFM [QI HI SI DI SF DF])
+(define_mode_iterator QHSDIM [QI HI SI DI])
+(define_mode_iterator HSDIM [HI SI DI])
+(define_mode_iterator BHSDIM [BI HI SI DI])
+(define_mode_iterator SDIM [SI DI])
+(define_mode_iterator SDISDFM [SI DI SF DF])
+(define_mode_iterator QHIM [QI HI])
+(define_mode_iterator QHSIM [QI HI SI])
+(define_mode_iterator SDFM [SF DF])
+(define_mode_iterator SDCM [SC DC])
+
+;; This mode iterator allows :P to be used for patterns that operate on
+;; pointer-sized quantities.  Exactly one of the two alternatives will match.
+(define_mode_iterator P [(SI "Pmode == SImode") (DI "Pmode == DImode")])
+
+;; We should get away with not defining memory alternatives, since we don't
+;; get variables in this mode and pseudos are never spilled.
+(define_insn "movbi"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R,R,R")
+	(match_operand:BI 1 "nvptx_nonmemory_operand" "R,P0,Pn"))]
+  ""
+  "@
+   %.\\tmov%t0\\t%0, %1;
+   %.\\tsetp.eq.u32\\t%0, 1, 0;
+   %.\\tsetp.eq.u32\\t%0, 1, 1;")
+
+(define_insn "*mov<mode>_insn"
+  [(set (match_operand:QHSDIM 0 "nvptx_nonimmediate_operand" "=R,R,R,m")
+	(match_operand:QHSDIM 1 "general_operand" "n,Ri,m,R"))]
+  "!(MEM_P (operands[0])
+     && (!REG_P (operands[1]) || REGNO (operands[1]) <= LAST_VIRTUAL_REGISTER))"
+{
+  if (which_alternative == 2)
+    return "%.\\tld%A1%u1\\t%0, %1;";
+  if (which_alternative == 3)
+    return "%.\\tst%A0%u0\\t%0, %1;";
+
+  rtx dst = operands[0];
+  rtx src = operands[1];
+
+  enum machine_mode dst_mode = nvptx_underlying_object_mode (dst);
+  enum machine_mode src_mode = nvptx_underlying_object_mode (src);
+  if (GET_CODE (dst) == SUBREG)
+    dst = SUBREG_REG (dst);
+  if (GET_CODE (src) == SUBREG)
+    src = SUBREG_REG (src);
+  if (src_mode == QImode)
+    src_mode = SImode;
+  if (dst_mode == QImode)
+    dst_mode = SImode;
+  if (CONSTANT_P (src))
+    {
+      if (GET_MODE_CLASS (dst_mode) != MODE_INT)
+        return "%.\\tmov.b%T0\\t%0, %1;";
+      else
+        return "%.\\tmov%t0\\t%0, %1;";
+    }
+
+  /* Special handling for the return register; we allow this register to
+     only occur in the destination of a move insn.  */
+  if (REG_P (dst) && REGNO (dst) == 4 && dst_mode == HImode)
+    dst_mode = SImode;
+  if (dst_mode == src_mode)
+    return "%.\\tmov%t0\\t%0, %1;";
+  /* Mode-punning between floating point and integer.  */
+  if (GET_MODE_SIZE (dst_mode) == GET_MODE_SIZE (src_mode))
+    return "%.\\tmov.b%T0\\t%0, %1;";
+  return "%.\\tcvt%t0%t1\\t%0, %1;";
+}
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "*mov<mode>_insn"
+  [(set (match_operand:SDFM 0 "nvptx_nonimmediate_operand" "=R,R,m")
+	(match_operand:SDFM 1 "general_operand" "RF,m,R"))]
+  "!(MEM_P (operands[0]) && !REG_P (operands[1]))"
+{
+  if (which_alternative == 1)
+    return "%.\\tld%A1%u0\\t%0, %1;";
+  if (which_alternative == 2)
+    return "%.\\tst%A0%u1\\t%0, %1;";
+
+  rtx dst = operands[0];
+  rtx src = operands[1];
+  if (GET_CODE (dst) == SUBREG)
+    dst = SUBREG_REG (dst);
+  if (GET_CODE (src) == SUBREG)
+    src = SUBREG_REG (src);
+  enum machine_mode dst_mode = GET_MODE (dst);
+  enum machine_mode src_mode = GET_MODE (src);
+  if (dst_mode == src_mode)
+    return "%.\\tmov%t0\\t%0, %1;";
+  if (GET_MODE_SIZE (dst_mode) == GET_MODE_SIZE (src_mode))
+    return "%.\\tmov.b%T0\\t%0, %1;";
+  gcc_unreachable ();
+}
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "load_arg_reg<mode>"
+  [(set (match_operand:QHIM 0 "nvptx_register_operand" "=R")
+	(unspec:QHIM [(match_operand 1 "const_int_operand" "i")]
+		     UNSPEC_ARG_REG))]
+  ""
+  "%.\\tcvt%t0.u32\\t%0, %%ar%1;")
+
+(define_insn "load_arg_reg<mode>"
+  [(set (match_operand:SDISDFM 0 "nvptx_register_operand" "=R")
+	(unspec:SDISDFM [(match_operand 1 "const_int_operand" "i")]
+			UNSPEC_ARG_REG))]
+  ""
+  "%.\\tmov%t0\\t%0, %%ar%1;")
+
+(define_expand "mov<mode>"
+  [(set (match_operand:QHSDISDFM 0 "nvptx_nonimmediate_operand" "")
+	(match_operand:QHSDISDFM 1 "general_operand" ""))]
+  ""
+{
+  operands[1] = nvptx_maybe_convert_symbolic_operand (operands[1]);
+  /* Record the mode of the return register so that we can prevent
+     later optimization passes from changing it.  */
+  if (REG_P (operands[0]) && REGNO (operands[0]) == 4 && cfun)
+    {
+      if (cfun->machine->ret_reg_mode == VOIDmode)
+	cfun->machine->ret_reg_mode = GET_MODE (operands[0]);
+      else
+	gcc_assert (cfun->machine->ret_reg_mode == GET_MODE (operands[0]));
+    }
+
+  /* Hard registers are often actually symbolic operands on this target.
+     Don't allow them when storing to memory.  */
+  if (MEM_P (operands[0])
+      && (!REG_P (operands[1])
+	  || REGNO (operands[1]) <= LAST_VIRTUAL_REGISTER))
+    {
+      rtx tmp = gen_reg_rtx (<MODE>mode);
+      emit_move_insn (tmp, operands[1]);
+      emit_move_insn (operands[0], tmp);
+      DONE;
+    }
+  if (GET_CODE (operands[1]) == SYMBOL_REF)
+    nvptx_record_needed_fndecl (SYMBOL_REF_DECL (operands[1]));
+})
+
+(define_insn "highpartscsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SC 1 "nvptx_register_operand")]
+		   UNSPEC_CPLX_HIGHPART))]
+  ""
+  "%.\\tmov%t0\\t%0, %f1$1;")
+
+(define_insn "set_highpartsfsc2"
+  [(set (match_operand:SC 0 "nvptx_register_operand" "+R")
+	(unspec:SC [(match_dup 0)
+		    (match_operand:SF 1 "nvptx_register_operand")]
+		   UNSPEC_CPLX_HIGHPART))]
+  ""
+  "%.\\tmov%t1\\t%f0$1, %1;")
+
+(define_insn "lowpartscsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SC 1 "nvptx_register_operand")]
+		   UNSPEC_CPLX_LOWPART))]
+  ""
+  "%.\\tmov%t0\\t%0, %f1$0;")
+
+(define_insn "set_lowpartsfsc2"
+  [(set (match_operand:SC 0 "nvptx_register_operand" "+R")
+	(unspec:SC [(match_dup 0)
+		    (match_operand:SF 1 "nvptx_register_operand")]
+		   UNSPEC_CPLX_LOWPART))]
+  ""
+  "%.\\tmov%t1\\t%f0$0, %1;")
+
+(define_expand "mov<mode>"
+  [(set (match_operand:SDCM 0 "nvptx_nonimmediate_operand" "")
+	(match_operand:SDCM 1 "general_operand" ""))]
+  ""
+{
+  enum machine_mode submode = <MODE>mode == SCmode ? SFmode : DFmode;
+  int sz = GET_MODE_SIZE (submode);
+  rtx xops[4];
+  rtx punning_reg = NULL_RTX;
+  rtx copyback = NULL_RTX;
+
+  if (GET_CODE (operands[0]) == SUBREG)
+    {
+      rtx inner = SUBREG_REG (operands[0]);
+      enum machine_mode inner_mode = GET_MODE (inner);
+      int sz2 = GET_MODE_SIZE (inner_mode);
+      gcc_assert (sz2 >= sz);
+      cfun->machine->punning_buffer_size
+        = MAX (cfun->machine->punning_buffer_size, sz2);
+      if (punning_reg == NULL_RTX)
+	punning_reg = gen_rtx_REG (Pmode, NVPTX_PUNNING_BUFFER_REGNUM);
+      copyback = gen_move_insn (inner, gen_rtx_MEM (inner_mode, punning_reg));
+      operands[0] = gen_rtx_MEM (<MODE>mode, punning_reg);
+    }
+  if (GET_CODE (operands[1]) == SUBREG)
+    {
+      rtx inner = SUBREG_REG (operands[1]);
+      enum machine_mode inner_mode = GET_MODE (inner);
+      int sz2 = GET_MODE_SIZE (inner_mode);
+      gcc_assert (sz2 >= sz);
+      cfun->machine->punning_buffer_size
+        = MAX (cfun->machine->punning_buffer_size, sz2);
+      if (punning_reg == NULL_RTX)
+	punning_reg = gen_rtx_REG (Pmode, NVPTX_PUNNING_BUFFER_REGNUM);
+      emit_move_insn (gen_rtx_MEM (inner_mode, punning_reg), inner);
+      operands[1] = gen_rtx_MEM (<MODE>mode, punning_reg);
+    }
+
+  if (REG_P (operands[0]) && submode == SFmode)
+    {
+      xops[0] = gen_reg_rtx (submode);
+      xops[1] = gen_reg_rtx (submode);
+    }
+  else
+    {
+      xops[0] = gen_lowpart (submode, operands[0]);
+      if (MEM_P (operands[0]))
+	xops[1] = adjust_address_nv (operands[0], submode, sz);
+      else
+	xops[1] = gen_highpart (submode, operands[0]);
+    }
+
+  if (REG_P (operands[1]) && submode == SFmode)
+    {
+      xops[2] = gen_reg_rtx (submode);
+      xops[3] = gen_reg_rtx (submode);
+      emit_insn (gen_lowpartscsf2 (xops[2], operands[1]));
+      emit_insn (gen_highpartscsf2 (xops[3], operands[1]));
+    }
+  else
+    {
+      xops[2] = gen_lowpart (submode, operands[1]);
+      if (MEM_P (operands[1]))
+	xops[3] = adjust_address_nv (operands[1], submode, sz);
+      else
+	xops[3] = gen_highpart (submode, operands[1]);
+    }
+
+  emit_move_insn (xops[0], xops[2]);
+  emit_move_insn (xops[1], xops[3]);
+  if (REG_P (operands[0]) && submode == SFmode)
+    {
+      emit_insn (gen_set_lowpartsfsc2 (operands[0], xops[0]));
+      emit_insn (gen_set_highpartsfsc2 (operands[0], xops[1]));
+    }
+  if (copyback)
+    emit_insn (copyback);
+  DONE;
+})
+
+(define_insn "zero_extendqihi2"
+  [(set (match_operand:HI 0 "nvptx_register_operand" "=R,R")
+	(zero_extend:HI (match_operand:QI 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.u16.u%T1\\t%0, %1;
+   %.\\tld%A1.u8\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "zero_extend<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R,R")
+	(zero_extend:SI (match_operand:QHIM 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.u32.u%T1\\t%0, %1;
+   %.\\tld%A1.u%T1\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "zero_extend<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R,R")
+	(zero_extend:DI (match_operand:QHSIM 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.u64.u%T1\\t%0, %1;
+   %.\\tld%A1%u1\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "extend<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R,R")
+	(sign_extend:SI (match_operand:QHIM 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.s32.s%T1\\t%0, %1;
+   %.\\tld%A1.s%T1\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "extend<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R,R")
+	(sign_extend:DI (match_operand:QHSIM 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.s64.s%T1\\t%0, %1;
+   %.\\tld%A1.s%T1\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "trunchiqi2"
+  [(set (match_operand:QI 0 "nvptx_reg_or_mem_operand" "=R,m")
+	(truncate:QI (match_operand:HI 1 "nvptx_register_operand" "R,R")))]
+  ""
+  "@
+   %.\\tcvt%t0.u16\\t%0, %1;
+   %.\\tst%A0.u8\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "truncsi<mode>2"
+  [(set (match_operand:QHIM 0 "nvptx_reg_or_mem_operand" "=R,m")
+	(truncate:QHIM (match_operand:SI 1 "nvptx_register_operand" "R,R")))]
+  ""
+  "@
+   %.\\tcvt%t0.u32\\t%0, %1;
+   %.\\tst%A0.u%T0\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "truncdi<mode>2"
+  [(set (match_operand:QHSIM 0 "nvptx_reg_or_mem_operand" "=R,m")
+	(truncate:QHSIM (match_operand:DI 1 "nvptx_register_operand" "R,R")))]
+  ""
+  "@
+   %.\\tcvt%t0.u64\\t%0, %1;
+   %.\\tst%A0.u%T0\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+;; Pointer address space conversions
+
+(define_int_iterator cvt_code
+  [UNSPEC_FROM_GLOBAL
+   UNSPEC_FROM_LOCAL
+   UNSPEC_FROM_SHARED
+   UNSPEC_FROM_CONST
+   UNSPEC_TO_GLOBAL
+   UNSPEC_TO_LOCAL
+   UNSPEC_TO_SHARED
+   UNSPEC_TO_CONST])
+
+(define_int_attr cvt_name
+  [(UNSPEC_FROM_GLOBAL "from_global")
+   (UNSPEC_FROM_LOCAL "from_local")
+   (UNSPEC_FROM_SHARED "from_shared")
+   (UNSPEC_FROM_CONST "from_const")
+   (UNSPEC_TO_GLOBAL "to_global")
+   (UNSPEC_TO_LOCAL "to_local")
+   (UNSPEC_TO_SHARED "to_shared")
+   (UNSPEC_TO_CONST "to_const")])
+
+(define_int_attr cvt_str
+  [(UNSPEC_FROM_GLOBAL ".global")
+   (UNSPEC_FROM_LOCAL ".local")
+   (UNSPEC_FROM_SHARED ".shared")
+   (UNSPEC_FROM_CONST ".const")
+   (UNSPEC_TO_GLOBAL ".to.global")
+   (UNSPEC_TO_LOCAL ".to.local")
+   (UNSPEC_TO_SHARED ".to.shared")
+   (UNSPEC_TO_CONST ".to.const")])
+
+(define_insn "convaddr_<cvt_name><mode>"
+  [(set (match_operand:P 0 "nvptx_register_operand" "=R")
+	(unspec:P [(match_operand:P 1 "nvptx_register_or_symbolic_operand" "Rs")] cvt_code))]
+  ""
+  "%.\\tcvta<cvt_str>%t0\\t%0, %1;")
+
+;; Integer arithmetic
+
+(define_insn "add<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(plus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tadd%t0\\t%0, %1, %2;")
+
+(define_insn "sub<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(minus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		     (match_operand:HSDIM 2 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tsub%t0\\t%0, %1, %2;")
+
+(define_insn "mul<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(mult:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmul.lo%t0\\t%0, %1, %2;")
+
+(define_insn "*mad<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(plus:HSDIM (mult:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+				(match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri"))
+		    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmad.lo%t0\\t%0, %1, %2, %3;")
+
+(define_insn "div<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(div:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		   (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tdiv.s%T0\\t%0, %1, %2;")
+
+(define_insn "udiv<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(udiv:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tdiv.u%T0\\t%0, %1, %2;")
+
+(define_insn "mod<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(mod:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		   (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\trem.s%T0\\t%0, %1, %2;")
+
+(define_insn "umod<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(umod:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\trem.u%T0\\t%0, %1, %2;")
+
+(define_insn "smin<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(smin:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmin.s%T0\\t%0, %1, %2;")
+
+(define_insn "umin<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(umin:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmin.u%T0\\t%0, %1, %2;")
+
+(define_insn "smax<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(smax:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmax.s%T0\\t%0, %1, %2;")
+
+(define_insn "umax<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(umax:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmax.u%T0\\t%0, %1, %2;")
+
+(define_insn "abs<mode>2"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(abs:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tabs.s%T0\\t%0, %1;")
+
+(define_insn "neg<mode>2"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(neg:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tneg.s%T0\\t%0, %1;")
+
+(define_insn "one_cmpl<mode>2"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(not:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tnot.b%T0\\t%0, %1;")
+
+(define_insn "bitrev<mode>2"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")]
+		     UNSPEC_BITREV))]
+  ""
+  "%.\\tbrev.b%T0\\t%0, %1;")
+
+(define_insn "clz<mode>2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(clz:SI (match_operand:SDIM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tclz.b%T0\\t%0, %1;")
+
+(define_expand "ctz<mode>2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "")
+	(ctz:SI (match_operand:SDIM 1 "nvptx_register_operand" "")))]
+  ""
+{
+  rtx tmpreg = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_bitrev<mode>2 (tmpreg, operands[1]));
+  emit_insn (gen_clz<mode>2 (operands[0], tmpreg));
+  DONE;
+})
+
+;; Shifts
+
+(define_insn "ashl<mode>3"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(ashift:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")
+		     (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tshl.b%T0\\t%0, %1, %2;")
+
+(define_insn "ashr<mode>3"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(ashiftrt:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")
+		       (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tshr.s%T0\\t%0, %1, %2;")
+
+(define_insn "lshr<mode>3"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(lshiftrt:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")
+		       (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tshr.u%T0\\t%0, %1, %2;")
+
+;; Logical operations
+
+(define_insn "and<mode>3"
+  [(set (match_operand:BHSDIM 0 "nvptx_register_operand" "=R")
+	(and:BHSDIM (match_operand:BHSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:BHSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tand.b%T0\\t%0, %1, %2;")
+
+(define_insn "ior<mode>3"
+  [(set (match_operand:BHSDIM 0 "nvptx_register_operand" "=R")
+	(ior:BHSDIM (match_operand:BHSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:BHSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tor.b%T0\\t%0, %1, %2;")
+
+(define_insn "xor<mode>3"
+  [(set (match_operand:BHSDIM 0 "nvptx_register_operand" "=R")
+	(xor:BHSDIM (match_operand:BHSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:BHSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\txor.b%T0\\t%0, %1, %2;")
+
+;; Comparisons and branches
+
+(define_insn "*cmp<mode>"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
+	(match_operator:BI 1 "nvptx_comparison_operator"
+	   [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
+	    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
+  ""
+  "%.\\tsetp%c1 %0,%2,%3;")
+
+(define_insn "*cmp<mode>"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
+	(match_operator:BI 1 "nvptx_float_comparison_operator"
+	   [(match_operand:SDFM 2 "nvptx_register_operand" "R")
+	    (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
+  ""
+  "%.\\tsetp%c1 %0,%2,%3;")
+
+(define_insn "jump"
+  [(set (pc)
+	(label_ref (match_operand 0 "" "")))]
+  ""
+  "%.\\tbra\\t%l0;")
+
+(define_insn "br_true"
+  [(set (pc)
+	(if_then_else (ne (match_operand:BI 0 "nvptx_register_operand" "R")
+			  (const_int 0))
+		      (label_ref (match_operand 1 "" ""))
+		      (pc)))]
+  ""
+  "%j0\\tbra\\t%l1;")
+
+(define_insn "br_false"
+  [(set (pc)
+	(if_then_else (eq (match_operand:BI 0 "nvptx_register_operand" "R")
+			  (const_int 0))
+		      (label_ref (match_operand 1 "" ""))
+		      (pc)))]
+  ""
+  "%J0\\tbra\\t%l1;")
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+	(if_then_else (match_operator 0 "nvptx_comparison_operator"
+		       [(match_operand:HSDIM 1 "nvptx_register_operand" "")
+			(match_operand:HSDIM 2 "nvptx_register_operand" "")])
+		      (label_ref (match_operand 3 "" ""))
+		      (pc)))]
+  ""
+{
+  rtx t = nvptx_expand_compare (operands[0]);
+  operands[0] = t;
+  operands[1] = XEXP (t, 0);
+  operands[2] = XEXP (t, 1);
+})
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+	(if_then_else (match_operator 0 "nvptx_float_comparison_operator"
+		       [(match_operand:SDFM 1 "nvptx_register_operand" "")
+			(match_operand:SDFM 2 "nvptx_register_operand" "")])
+		      (label_ref (match_operand 3 "" ""))
+		      (pc)))]
+  ""
+{
+  rtx t = nvptx_expand_compare (operands[0]);
+  operands[0] = t;
+  operands[1] = XEXP (t, 0);
+  operands[2] = XEXP (t, 1);
+})
+
+(define_expand "cbranchbi4"
+  [(set (pc)
+	(if_then_else (match_operator 0 "predicate_operator"
+		       [(match_operand:BI 1 "nvptx_register_operand" "")
+			(match_operand:BI 2 "const0_operand" "")])
+		      (label_ref (match_operand 3 "" ""))
+		      (pc)))]
+  ""
+  "")
+
+;; Conditional stores
+
+(define_insn "setcc_from_bi"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(ne:SI (match_operand:BI 1 "nvptx_register_operand" "R")
+	       (const_int 0)))]
+  ""
+  "%.\\tselp%t0 %0,-1,0,%1;")
+
+(define_insn "setcc_int<mode>"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(match_operator:SI 1 "nvptx_comparison_operator"
+			   [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
+			    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_insn "setcc_int<mode>"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(match_operator:SI 1 "nvptx_float_comparison_operator"
+			   [(match_operand:SDFM 2 "nvptx_register_operand" "R")
+			    (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_insn "setcc_float<mode>"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(match_operator:SF 1 "nvptx_comparison_operator"
+			   [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
+			    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_insn "setcc_float<mode>"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(match_operator:SF 1 "nvptx_float_comparison_operator"
+			   [(match_operand:SDFM 2 "nvptx_register_operand" "R")
+			    (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_expand "cstorebi4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+	(match_operator:SI 1 "ne_operator"
+         [(match_operand:BI 2 "nvptx_register_operand")
+          (match_operand:BI 3 "const0_operand")]))]
+  ""
+  "")
+
+(define_expand "cstore<mode>4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+	(match_operator:SI 1 "nvptx_comparison_operator"
+         [(match_operand:HSDIM 2 "nvptx_register_operand")
+          (match_operand:HSDIM 3 "nvptx_nonmemory_operand")]))]
+  ""
+  "")
+
+(define_expand "cstore<mode>4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+	(match_operator:SI 1 "nvptx_float_comparison_operator"
+         [(match_operand:SDFM 2 "nvptx_register_operand")
+          (match_operand:SDFM 3 "nvptx_nonmemory_operand")]))]
+  ""
+  "")
+
+;; Calls
+
+(define_insn "call_insn"
+  [(match_parallel 2 "call_operation"
+    [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "Rs"))
+	   (match_operand 1))])]
+  ""
+{
+  return nvptx_output_call_insn (insn, NULL_RTX, operands[0]);
+})
+
+(define_insn "call_value_insn"
+  [(match_parallel 3 "call_operation"
+    [(set (match_operand 0 "nvptx_register_operand" "=R")
+	  (call (mem:QI (match_operand:SI 1 "call_insn_operand" "Rs"))
+		(match_operand 2)))])]
+  ""
+{
+  return nvptx_output_call_insn (insn, operands[0], operands[1]);
+})
+
+(define_expand "call"
+ [(match_operand 0 "" "")]
+ ""
+{
+  nvptx_expand_call (NULL_RTX, operands[0]);
+  DONE;
+})
+
+(define_expand "call_value"
+  [(match_operand 0 "" "")
+   (match_operand 1 "" "")]
+ ""
+{
+  nvptx_expand_call (operands[0], operands[1]);
+  DONE;
+})
+
+;; Floating point arithmetic.
+
+(define_insn "add<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(plus:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		   (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tadd%t0\\t%0, %1, %2;")
+
+(define_insn "sub<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(minus:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		    (match_operand:SDFM 2 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tsub%t0\\t%0, %1, %2;")
+
+(define_insn "mul<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(mult:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		   (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tmul%t0\\t%0, %1, %2;")
+
+(define_insn "fma<mode>4"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(fma:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		  (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")
+		  (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tfma%#%t0\\t%0, %1, %2, %3;")
+
+(define_insn "div<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(div:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		  (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tdiv%#%t0\\t%0, %1, %2;")
+
+(define_insn "copysign<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(unspec:SDFM [(match_operand:SDFM 1 "nvptx_register_operand" "R")
+		      (match_operand:SDFM 2 "nvptx_register_operand" "R")]
+		      UNSPEC_COPYSIGN))]
+  ""
+  "%.\\tcopysign%t0\\t%0, %2, %1;")
+
+(define_insn "smin<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(smin:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		    (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tmin%t0\\t%0, %1, %2;")
+
+(define_insn "smax<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(smax:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		    (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tmax%t0\\t%0, %1, %2;")
+
+(define_insn "abs<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(abs:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tabs%t0\\t%0, %1;")
+
+(define_insn "neg<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(neg:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tneg%t0\\t%0, %1;")
+
+(define_insn "sqrt<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(sqrt:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tsqrt%#%t0\\t%0, %1;")
+
+(define_insn "sinsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_SIN))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tsin.approx%t0\\t%0, %1;")
+
+(define_insn "cossf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_COS))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tcos.approx%t0\\t%0, %1;")
+
+(define_insn "log2sf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_LOG2))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tlg2.approx%t0\\t%0, %1;")
+
+(define_insn "exp2sf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_EXP2))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tex2.approx%t0\\t%0, %1;")
+
+;; Conversions involving floating point
+
+(define_insn "extendsfdf2"
+  [(set (match_operand:DF 0 "nvptx_register_operand" "=R")
+	(float_extend:DF (match_operand:SF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%t0%t1\\t%0, %1;")
+
+(define_insn "truncdfsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(float_truncate:SF (match_operand:DF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0%t1\\t%0, %1;")
+
+(define_insn "floatunssi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(unsigned_float:SDFM (match_operand:SI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.u%T1\\t%0, %1;")
+
+(define_insn "floatsi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(float:SDFM (match_operand:SI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.s%T1\\t%0, %1;")
+
+(define_insn "floatunsdi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(unsigned_float:SDFM (match_operand:DI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.u%T1\\t%0, %1;")
+
+(define_insn "floatdi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(float:SDFM (match_operand:DI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.s%T1\\t%0, %1;")
+
+(define_insn "fixuns_trunc<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(unsigned_fix:SI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt.rzi.u%T0%t1\\t%0, %1;")
+
+(define_insn "fix_trunc<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(fix:SI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt.rzi.s%T0%t1\\t%0, %1;")
+
+(define_insn "fixuns_trunc<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
+	(unsigned_fix:DI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt.rzi.u%T0%t1\\t%0, %1;")
+
+(define_insn "fix_trunc<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
+	(fix:DI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt.rzi.s%T0%t1\\t%0, %1;")
+
+(define_int_iterator FPINT [UNSPEC_FPINT_FLOOR UNSPEC_FPINT_BTRUNC
+			    UNSPEC_FPINT_CEIL UNSPEC_FPINT_NEARBYINT])
+(define_int_attr fpint_name [(UNSPEC_FPINT_FLOOR "floor")
+			     (UNSPEC_FPINT_BTRUNC "btrunc")
+			     (UNSPEC_FPINT_CEIL "ceil")
+			     (UNSPEC_FPINT_NEARBYINT "nearbyint")])
+(define_int_attr fpint_roundingmode [(UNSPEC_FPINT_FLOOR ".rmi")
+				     (UNSPEC_FPINT_BTRUNC ".rzi")
+				     (UNSPEC_FPINT_CEIL ".rpi")
+				     (UNSPEC_FPINT_NEARBYINT "%#i")])
+
+(define_insn "<FPINT:fpint_name><SDFM:mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(unspec:SDFM [(match_operand:SDFM 1 "nvptx_register_operand" "R")]
+		     FPINT))]
+  ""
+  "%.\\tcvt<FPINT:fpint_roundingmode>%t0%t1\\t%0, %1;")
+
+(define_int_iterator FPINT2 [UNSPEC_FPINT_FLOOR UNSPEC_FPINT_CEIL])
+(define_int_attr fpint2_name [(UNSPEC_FPINT_FLOOR "lfloor")
+			     (UNSPEC_FPINT_CEIL "lceil")])
+(define_int_attr fpint2_roundingmode [(UNSPEC_FPINT_FLOOR ".rmi")
+				     (UNSPEC_FPINT_CEIL ".rpi")])
+
+(define_insn "<FPINT2:fpint2_name><SDFM:mode><SDIM:mode>2"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(unspec:SDIM [(match_operand:SDFM 1 "nvptx_register_operand" "R")]
+		     FPINT2))]
+  ""
+  "%.\\tcvt<FPINT2:fpint2_roundingmode>.s%T0%t1\\t%0, %1;")
+
+;; Miscellaneous
+
+(define_insn "nop"
+  [(const_int 0)]
+  ""
+  "")
+
+(define_insn "return"
+  [(return)]
+  ""
+{
+  return nvptx_output_return ();
+})
+
+(define_expand "epilogue"
+  [(clobber (const_int 0))]
+  ""
+{
+  emit_jump_insn (gen_return ());
+  DONE;
+})
+
+(define_expand "nonlocal_goto"
+  [(match_operand 0 "" "")
+   (match_operand 1 "" "")
+   (match_operand 2 "" "")
+   (match_operand 3 "" "")]
+  ""
+{
+  sorry ("target cannot support nonlocal goto");
+  emit_insn (gen_nop ());
+  DONE;
+})
+
+(define_expand "nonlocal_goto_receiver"
+  [(const_int 0)]
+  ""
+{
+  sorry ("target cannot support nonlocal goto");
+})
+
+(define_insn "allocate_stack"
+  [(set (match_operand 0 "nvptx_register_operand" "=R")
+	(unspec [(match_operand 1 "nvptx_register_operand" "R")]
+		  UNSPEC_ALLOCA))]
+  ""
+  "%.\\tcall (%0), %%alloca, (%1);")
+
+(define_expand "restore_stack_block"
+  [(match_operand 0 "register_operand" "")
+   (match_operand 1 "register_operand" "")]
+  ""
+{
+  DONE;
+})
+
+(define_expand "restore_stack_function"
+  [(match_operand 0 "register_operand" "")
+   (match_operand 1 "register_operand" "")]
+  ""
+{
+  DONE;
+})
+
+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 0))]
+  ""
+  "trap;")
+
+(define_insn "trap_if_true"
+  [(trap_if (ne (match_operand:BI 0 "nvptx_register_operand" "R")
+		(const_int 0))
+	    (const_int 0))]
+  ""
+  "%j0 trap;")
+
+(define_insn "trap_if_false"
+  [(trap_if (eq (match_operand:BI 0 "nvptx_register_operand" "R")
+		(const_int 0))
+	    (const_int 0))]
+  ""
+  "%J0 trap;")
+
+(define_expand "ctrap<mode>4"
+  [(trap_if (match_operator 0 "nvptx_comparison_operator"
+			    [(match_operand:SDIM 1 "nvptx_register_operand")
+			     (match_operand:SDIM 2 "nvptx_nonmemory_operand")])
+	    (match_operand 3 "const_0_operand"))]
+  ""
+{
+  rtx t = nvptx_expand_compare (operands[0]);
+  emit_insn (gen_trap_if_true (t));
+  DONE;
+})
+
+(define_insn "*oacc_ntid_insn"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(unspec:SI [(match_operand:SI 1 "const_int_operand" "n")] UNSPEC_NTID))]
+  ""
+  "%.\\tmov.u32 %0, %%ntid%d1;")
+
+(define_expand "oacc_ntid"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "")
+	(unspec:SI [(match_operand:SI 1 "const_int_operand" "")] UNSPEC_NTID))]
+  ""
+{
+  if (INTVAL (operands[1]) < 0 || INTVAL (operands[1]) > 2)
+    FAIL;
+})
+
+(define_insn "*oacc_tid_insn"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(unspec:SI [(match_operand:SI 1 "const_int_operand" "n")] UNSPEC_TID))]
+  ""
+  "%.\\tmov.u32 %0, %%tid%d1;")
+
+(define_expand "oacc_tid"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "")
+	(unspec:SI [(match_operand:SI 1 "const_int_operand" "")] UNSPEC_TID))]
+  ""
+{
+  if (INTVAL (operands[1]) < 0 || INTVAL (operands[1]) > 2)
+    FAIL;
+})
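
A note on the ctz<mode>2 expander in nvptx.md above: it emits a bit
reversal (brev) followed by a count of leading zeros (clz), since the
lowest set bit becomes the highest one after reversal.  The identity can
be checked on the host with a minimal C sketch like the one below.  The
helper names bitrev32, clz32, ctz32 and ctz32_selfcheck are made up for
illustration and are not part of the patch; the loops only model what I
understand brev.b32 and clz.b32 to compute.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bits of a 32-bit value (host-side model of brev.b32).  */
static uint32_t
bitrev32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++)
    {
      r = (r << 1) | (x & 1);
      x >>= 1;
    }
  return r;
}

/* Count leading zero bits; yields 32 for a zero input
   (host-side model of clz.b32).  */
static int
clz32 (uint32_t x)
{
  int n = 0;
  for (uint32_t mask = UINT32_C (1) << 31;
       mask != 0 && (x & mask) == 0;
       mask >>= 1)
    n++;
  return n;
}

/* Count trailing zeros the way the expander does: reverse, then clz.  */
static int
ctz32 (uint32_t x)
{
  return clz32 (bitrev32 (x));
}

/* Hypothetical self-check of the identity on a few values.  */
void
ctz32_selfcheck (void)
{
  assert (ctz32 (1) == 0);
  assert (ctz32 (8) == 3);
  assert (ctz32 (UINT32_C (0x80000000)) == 31);
}
```

For a zero input this sketch yields 32, because the reversed value is
also zero and clz32 then counts all 32 bits.
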
Index: libgcc/config.host
===================================================================
--- libgcc/config.host.orig
+++ libgcc/config.host
@@ -1236,6 +1236,10 @@ mep*-*-*)
 	tmake_file="mep/t-mep t-fdpbit"
 	extra_parts="crtbegin.o crtend.o"
 	;;
+nvptx-*)
+	tmake_file="$tmake_file nvptx/t-nvptx"
+	extra_parts="crt0.o"
+	;;
 *)
 	echo "*** Configuration ${host} not supported" 1>&2
 	exit 1
Index: libgcc/config/nvptx/t-nvptx
===================================================================
--- /dev/null
+++ libgcc/config/nvptx/t-nvptx
@@ -0,0 +1,5 @@
+LIB2ADDEH=
+LIB2FUNCS_EXCLUDE=__main
+
+crt0.o: $(srcdir)/config/nvptx/crt0.s
+	cp $< $@
Index: configure
===================================================================
--- configure.orig
+++ configure
@@ -3771,6 +3771,10 @@ case "${target}" in
   mips*-*-*)
     noconfigdirs="$noconfigdirs gprof"
     ;;
+  nvptx*-*-*)
+    # nvptx is just a compiler
+    noconfigdirs="$noconfigdirs target-libssp target-libstdc++-v3 target-libobjc"
+    ;;
   sh-*-* | sh64-*-*)
     case "${target}" in
       sh*-*-elf)
Index: configure.ac
===================================================================
--- configure.ac.orig
+++ configure.ac
@@ -1130,6 +1130,10 @@ case "${target}" in
   mips*-*-*)
     noconfigdirs="$noconfigdirs gprof"
     ;;
+  nvptx*-*-*)
+    # nvptx is just a compiler
+    noconfigdirs="$noconfigdirs target-libssp target-libstdc++-v3 target-libobjc"
+    ;;
   sh-*-* | sh64-*-*)
     case "${target}" in
       sh*-*-elf)
Index: libgcc/config/nvptx/crt0.s
===================================================================
--- /dev/null
+++ libgcc/config/nvptx/crt0.s
@@ -0,0 +1,45 @@
+	.version 3.1
+	.target	sm_30
+	.address_size 64
+
+.global .u64 %__exitval;
+// BEGIN GLOBAL FUNCTION DEF: abort
+.visible .func abort
+{
+        .reg .u64 %rd1;
+        ld.global.u64   %rd1,[%__exitval];
+        st.u32   [%rd1], 255;
+        exit;
+}
+// BEGIN GLOBAL FUNCTION DEF: exit
+.visible .func exit (.param .u32 %arg)
+{
+        .reg .u64 %rd1;
+	.reg .u32 %val;
+	ld.param.u32 %val,[%arg];
+        ld.global.u64   %rd1,[%__exitval];
+        st.u32   [%rd1], %val;
+        exit;
+}
+
+.extern .func (.param.u32 retval) main (.param.u32 argc, .param.u64 argv);
+
+.visible .entry __main (.param .u64 __retval, .param.u32 __argc, .param.u64 __argv)
+{
+        .reg .u32 %r<3>;
+        .reg .u64 %rd<3>;
+	.param.u32 %argc;
+	.param.u64 %argp;
+	.param.u32 %mainret;
+        ld.param.u64    %rd0, [__retval];
+        st.global.u64   [%__exitval], %rd0;
+
+	ld.param.u32	%r1, [__argc];
+	ld.param.u64	%rd1, [__argv];
+	st.param.u32	[%argc], %r1;
+	st.param.u64	[%argp], %rd1;
+        call.uni        (%mainret), main, (%argc, %argp);
+	ld.param.u32	%r1,[%mainret];
+        st.s32   [%rd0], %r1;
+        exit;
+}


* The nvptx port [11/11] More tools.
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (10 preceding siblings ...)
  2014-10-20 14:50 ` The nvptx port [10/11+] Target files Bernd Schmidt
@ 2014-10-20 14:58 ` Bernd Schmidt
  2014-10-21  0:16   ` Joseph S. Myers
                     ` (2 more replies)
  2014-10-21  8:23 ` The nvptx port [0/11+] Richard Biener
                   ` (4 subsequent siblings)
  16 siblings, 3 replies; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-20 14:58 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1552 bytes --]

This is a "bonus" optional patch which adds ar, ranlib, as and ld to the 
ptx port. This is not proper binutils; ar and ranlib are just linked to 
the host versions, and the other two tools have the following functions:

* nvptx-as is required to convert the compiler output to actual valid
   ptx assembly, primarily by reordering declarations and definitions.
   Believe me when I say that I've tried to make that work in the
   compiler itself and it's pretty much impossible without some really
   invasive changes.
* nvptx-ld is just a pseudo linker that works by concatenating ptx
   input files and separating them with nul characters. Actual linking
   is something that happens later, when calling CUDA library functions,
   but existing build systems make it useful to have something called
   "ld" which is able to bundle everything that's needed into a single
   file, and this seemed to be the simplest way of achieving this.
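
As a rough illustration of the bundling idea described for nvptx-ld, here is a minimal in-memory sketch. It is not taken from nvptx-ld.c: the names bundle_ptx and count_members are hypothetical, and it assumes every member (including the last) is followed by a NUL, which the description above does not spell out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: append each PTX text in
   SRCS to DST and terminate every member with a NUL byte.  Returns the
   number of bytes written, or 0 if DST (capacity CAP) is too small.  */
size_t
bundle_ptx (char *dst, size_t cap, const char *const *srcs, size_t n)
{
  size_t used = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t len = strlen (srcs[i]);
      if (len + 1 > cap - used)
        return 0;
      memcpy (dst + used, srcs[i], len);
      used += len;
      dst[used++] = '\0';	/* member separator (assumed trailing) */
    }
  return used;
}

/* Count the NUL-terminated members in a bundle of LEN bytes.  */
size_t
count_members (const char *buf, size_t len)
{
  size_t count = 0;
  for (size_t i = 0; i < len; i++)
    if (buf[i] == '\0')
      count++;
  return count;
}
```

A consumer can split such a bundle again by scanning for NUL bytes, which works because PTX text is not expected to contain NUL characters itself.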

There's a toplevel configure.ac change necessary to make ar/ranlib 
useable by the libgcc build. Having some tools built like this has some 
precedent in t-vmsnative, but as Thomas noted it does make feature tests 
in gcc's configure somewhat ugly (but everything works well enough to 
build the compiler). The alternative here is to bundle all these files 
into a separate nvptx-tools package which users would have to download - 
something that would be nice to avoid.

These tools currently require GNU extensions - something I probably 
ought to fix if we decide to add them to the gcc build itself.
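
One GNU extension visible in the attached nvptx-as.c is the case-range syntax in the tokenizer (case '0'...'9': and friends). A portable replacement could look roughly like the sketch below. The enum and the function classify_start are hypothetical names chosen for illustration, and the sketch only covers the switch arms that use ranges plus the '$', '%' and '_' cases; the '.' handling in the real tokenizer is left out.

```c
/* Hypothetical classification of a token's first character, written
   without GNU case ranges.  Like the tokenizer's inner loop, it relies
   on letters and digits being contiguous in the character set.  */
enum start_kind
{
  START_OTHER,
  START_NUMBER,	/* '0' .. '9' */
  START_IDENT,	/* '$' local labels, '%' register names */
  START_SYMBOL	/* letters and '_' */
};

enum start_kind
classify_start (unsigned char c)
{
  if (c >= '0' && c <= '9')
    return START_NUMBER;
  if (c == '$' || c == '%')
    return START_IDENT;
  if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_')
    return START_SYMBOL;
  return START_OTHER;
}
```

The tokenizer could then switch on the returned kind instead of on the raw character.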


Bernd


[-- Attachment #2: 011-tools.diff --]
[-- Type: text/x-patch, Size: 43127 bytes --]

	* configure.ac (AR_FOR_TARGET, RANLIB_FOR_TARGET): If nvptx-*,
	look for them in the gcc build directory.
	* configure: Regenerate.

	gcc/
	* config.gcc (nvptx-*): Define extra_programs.
	* config/nvptx/nvptx-as.c: New file.
	* config/nvptx/nvptx-ld.c: New file.
	* config/nvptx/t-nvptx (nvptx-ld.o, nvptx-as.o, collect-ld$(exeext),
	as$(exeext), ar$(exeext), ranlib$(exeext)): New rules.

Index: git/gcc/config.gcc
===================================================================
--- git.orig/gcc/config.gcc
+++ git/gcc/config.gcc
@@ -2154,6 +2154,7 @@ nios2-*-*)
 nvptx-*)
 	tm_file="${tm_file} newlib-stdint.h"
 	tmake_file="nvptx/t-nvptx"
+	extra_programs="collect-ld\$(exeext) as\$(exeext) ar\$(exeext) ranlib\$(exeext)"
 	;;
 pdp11-*-*)
 	tm_file="${tm_file} newlib-stdint.h"
Index: git/gcc/config/nvptx/nvptx-as.c
===================================================================
--- /dev/null
+++ git/gcc/config/nvptx/nvptx-as.c
@@ -0,0 +1,961 @@
+/* An "assembler" for ptx.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Nathan Sidwell <nathan@codesourcery.com>
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Munges gcc-generated PTX assembly so that it becomes acceptable for ptxas.
+
+   This is not a complete assembler.  We presume the source is well
+   formed from the compiler and can die horribly if it is not.  */
+
+#include <getopt.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <string.h>
+#include <wait.h>
+#include <unistd.h>
+#include <errno.h>
+#define obstack_chunk_alloc malloc
+#define obstack_chunk_free free
+#include <obstack.h>
+#define HAVE_DECL_BASENAME 1
+#include <libiberty.h>
+#include <hashtab.h>
+
+#include <list>
+
+static const char *outname = NULL;
+
+static void __attribute__ ((format (printf, 1, 2)))
+fatal_error (const char * cmsgid, ...)
+{
+  va_list ap;
+
+  va_start (ap, cmsgid);
+  fprintf (stderr, "nvptx-as: ");
+  vfprintf (stderr, cmsgid, ap);
+  fprintf (stderr, "\n");
+  va_end (ap);
+
+  unlink (outname);
+  exit (1);
+}
+
+struct Stmt;
+
+class symbol
+{
+ public:
+  symbol (const char *k) : key (k), stmts (0), pending (0), emitted (0)
+    { }
+
+  /* The name of the symbol.  */
+  const char *key;
+  /* A linked list of dependencies for the initializer.  */
+  std::list<symbol *> deps;
+  /* The statement in which it is defined.  */
+  struct Stmt *stmts;
+  bool pending;
+  bool emitted;
+};
+
+/* Hash and comparison functions for these hash tables.  */
+
+static int hash_string_eq (const void *, const void *);
+static hashval_t hash_string_hash (const void *);
+
+static int
+hash_string_eq (const void *s1_p, const void *s2_p)
+{
+  const char *const *s1 = (const char *const *) s1_p;
+  const char *s2 = (const char *) s2_p;
+  return strcmp (*s1, s2) == 0;
+}
+
+static hashval_t
+hash_string_hash (const void *s_p)
+{
+  const char *const *s = (const char *const *) s_p;
+  return (*htab_hash_string) (*s);
+}
+
+static htab_t symbol_table;
+
+/* Look up an entry in the symbol hash table.  */
+
+static symbol *
+symbol_hash_lookup (const char *string)
+{
+  void **e;
+  e = htab_find_slot_with_hash (symbol_table, string,
+                                (*htab_hash_string) (string),
+                                INSERT);
+  if (e == NULL)
+    return NULL;
+  if (*e == NULL)
+    *e = new symbol (string);
+
+  return (symbol *) *e;
+}
+
+#define COMMENT_PREFIX "#"
+
+typedef enum Kind
+{
+  /* 0-ff used for single char tokens */
+  K_symbol = 0x100, /* a symbol */
+  K_label,  /* a label defn (i.e. symbol:) */
+  K_ident,  /* other ident */
+  K_dotted, /* dotted identifier */
+  K_number,
+  K_string,
+  K_comment
+} Kind;
+
+typedef struct Token
+{
+  unsigned short kind : 12;
+  unsigned short space : 1; /* preceded by space */
+  unsigned short end : 1;   /* succeeded by end of line */
+  /* Length of token */
+  unsigned short len;
+
+  /* Token itself */
+  char const *ptr;
+} Token;
+
+/* statement info */
+typedef enum Vis
+{
+  V_dot = 0,  /* random pseudo */
+  V_var = 1,  /* var decl/defn */
+  V_func = 2, /* func decl/defn */
+  V_insn = 3, /* random insn */
+  V_label = 4, /* label defn */
+  V_comment = 5,
+  V_pred = 6,  /* predicate */
+  V_mask = 0x7,
+  V_global = 0x08, /* globalize */
+  V_weak = 0x10,   /* weakly globalize */
+  V_no_eol = 0x20, /* no end of line */
+  V_prefix_comment = 0x40 /* prefixed comment */
+} Vis;
+
+typedef struct Stmt
+{
+  struct Stmt *next;
+  Token *tokens;
+  symbol *sym;
+  unsigned char vis;
+  unsigned len : 12;
+} Stmt;
+
+struct id_map
+{
+  id_map *next;
+  char *ptx_name;
+};
+
+#define alloc_comment(S,E) alloc_stmt (V_comment, S, E, 0)
+#define append_stmt(V, S) ((S)->next = *(V), *(V) = (S))
+
+static Stmt *decls;
+static Stmt *fns;
+
+static id_map *func_ids, **funcs_tail = &func_ids;
+static id_map *var_ids, **vars_tail = &var_ids;
+
+static void
+record_id (const char *p1, id_map ***where)
+{
+  const char *end = strchr (p1, '\n');
+  if (!end)
+    fatal_error ("malformed ptx file");
+
+  id_map *v = XNEW (id_map);
+  size_t len = end - p1;
+  v->ptx_name = XNEWVEC (char, len + 1);
+  memcpy (v->ptx_name, p1, len);
+  v->ptx_name[len] = '\0';
+  v->next = NULL;
+  id_map **tail = *where;
+  *tail = v;
+  *where = &v->next;
+}
+
+/* Read the whole input file.  It will be NUL terminated (but
+   remember, there could be a NUL in the file itself).  */
+
+static const char *
+read_file (FILE *stream)
+{
+  size_t alloc = 16384;
+  size_t base = 0;
+  char *buffer;
+
+  if (!fseek (stream, 0, SEEK_END))
+    {
+      /* Get the file size.  */
+      long s = ftell (stream);
+      if (s >= 0)
+	alloc = s + 100;
+      fseek (stream, 0, SEEK_SET);
+    }
+  buffer = XNEWVEC (char, alloc);
+
+  for (;;)
+    {
+      size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
+
+      if (!n)
+	break;
+      base += n;
+      if (base + 1 == alloc)
+	{
+	  alloc *= 2;
+	  buffer = XRESIZEVEC (char, buffer, alloc);
+	}
+    }
+  buffer[base] = 0;
+  return buffer;
+}
+
+/* Split the NUL-terminated input at PTR into an array of tokens.
+   Comments become K_comment tokens; the array ends with a token of kind 0.  */
+
+static Token *
+tokenize (const char *ptr)
+{
+  unsigned alloc = 1000;
+  unsigned num = 0;
+  Token *toks = XNEWVEC (Token, alloc);
+  int in_comment = 0;
+  int not_comment = 0;
+
+  for (;; num++)
+    {
+      const char *base;
+      unsigned kind;
+      int ws = 0;
+      int eol = 0;
+
+    again:
+      base = ptr;
+      if (in_comment)
+	goto block_comment;
+      switch (kind = *ptr++)
+	{
+	default:
+	  break;
+
+	case '\n':
+	  eol = 1;
+	  /* Fall through */
+	case ' ':
+	case '\t':
+	case '\r':
+	case '\v':
+	  /* White space */
+	  ws = not_comment;
+	  goto again;
+
+	case '/':
+	  {
+	    if (*ptr == '/')
+	      {
+		/* line comment.  Do not include trailing \n */
+		base += 2;
+		for (; *ptr; ptr++)
+		  if (*ptr == '\n')
+		    break;
+		kind = K_comment;
+	      }
+	    else if (*ptr == '*')
+	      {
+		/* block comment */
+		base += 2;
+		ptr++;
+
+	      block_comment:
+		eol = in_comment;
+		in_comment = 1;
+		for (; *ptr; ptr++)
+		  {
+		    if (*ptr == '\n')
+		      {
+			ptr++;
+			break;
+		      }
+		    if (ptr[0] == '*' && ptr[1] == '/')
+		      {
+			in_comment = 2;
+			ptr += 2;
+			break;
+		      }
+		  }
+		kind = K_comment;
+	      }
+	    else
+	      break;
+	  }
+	  break;
+
+	case '"':
+	  /* quoted string */
+	  kind = K_string;
+	  while (*ptr)
+	    if (*ptr == '"')
+	      {
+		ptr++;
+		break;
+	      }
+	    else if (*ptr++ == '\\')
+	      ptr++;
+	  break;
+
+	case '.':
+	  if (*ptr < '0' || *ptr > '9')
+	    {
+	      kind = K_dotted;
+	      ws = not_comment;
+	      goto ident;
+	    }
+	  /* FALLTHROUGH */
+	case '0'...'9':
+	  kind = K_number;
+	  goto ident;
+	  break;
+
+	case '$':  /* local labels.  */
+	case '%':  /* register names, pseudos etc */
+	  kind = K_ident;
+	  goto ident;
+
+	case 'a'...'z':
+	case 'A'...'Z':
+	case '_':
+	  kind = K_symbol; /* possible symbol name */
+	ident:
+	  for (; *ptr; ptr++)
+	    {
+	      if (*ptr >= 'A' && *ptr <= 'Z')
+		continue;
+	      if (*ptr >= 'a' && *ptr <= 'z')
+		continue;
+	      if (*ptr >= '0' && *ptr <= '9')
+		continue;
+	      if (*ptr == '_' || *ptr == '$')
+		continue;
+	      if (*ptr == '.' && kind != K_dotted)
+		/* Idents starting with a dot cannot have internal dots. */
+		continue;
+	      if ((*ptr == '+' || *ptr == '-')
+		  && kind == K_number
+		  && (ptr[-1] == 'e' || ptr[-1] == 'E'
+		      || ptr[-1] == 'p' || ptr[-1] == 'P'))
+		/* exponent */
+		continue;
+	      break;
+	    }
+	  if (*ptr == ':')
+	    {
+	      ptr++;
+	      kind = K_label;
+	    }
+	  break;
+	}
+
+      if (alloc == num)
+	{
+	  alloc *= 2;
+	  toks = XRESIZEVEC (Token, toks, alloc);
+	}
+      Token *tok = toks + num;
+
+      tok->kind = kind;
+      tok->space = ws;
+      tok->end = 0;
+      tok->ptr = base;
+      tok->len = ptr - base - in_comment;
+      in_comment &= 1;
+      not_comment = kind != K_comment;
+      if (eol && num)
+	tok[-1].end = 1;
+      if (!kind)
+	break;
+    }
+
+  return toks;
+}
+
+/* Write an encoded token. */
+
+static void
+write_token (FILE *out, Token const *tok)
+{
+  if (tok->space)
+    fputc (' ', out);
+
+  switch (tok->kind)
+    {
+    case K_string:
+      {
+	const char *c = tok->ptr + 1;
+	size_t len = tok->len - 2;
+
+	fputs ("\"", out);
+	while (len)
+	  {
+	    const char *bs = (const char *)memchr (c, '\\', len);
+	    size_t l = bs ? bs - c : len;
+
+	    fprintf (out, "%.*s", (int)l, c);
+	    len -= l;
+	    c += l;
+	    if (bs)
+	      {
+		fputs ("\\\\", out);
+		len--, c++;
+	      }
+	  }
+	fputs ("\"", out);
+      }
+      break;
+
+    default:
+      /* All other tokens shouldn't have anything magic in them */
+      fprintf (out, "%.*s", tok->len, tok->ptr);
+      break;
+    }
+
+  if (tok->end)
+    fputs ("\n", out);
+}
+
+static Stmt *
+alloc_stmt (unsigned vis, Token *tokens, Token *end, symbol *sym)
+{
+  static unsigned alloc = 0;
+  static Stmt *heap = 0;
+
+  if (!alloc)
+    {
+      alloc = 1000;
+      heap = XNEWVEC (Stmt, alloc);
+    }
+
+  Stmt *stmt = heap++;
+  alloc--;
+
+  tokens->space = 0;
+  stmt->next = 0;
+  stmt->vis = vis;
+  stmt->tokens = tokens;
+  stmt->len = end - tokens;
+  stmt->sym = sym;
+
+  return stmt;
+}
+
+static Stmt *
+rev_stmts (Stmt *stmt)
+{
+  Stmt *prev = 0;
+  Stmt *next;
+
+  while (stmt)
+    {
+      next = stmt->next;
+      stmt->next = prev;
+      prev = stmt;
+      stmt = next;
+    }
+
+  return prev;
+}
+
+static void
+write_stmt (FILE *out, const Stmt *stmt)
+{
+  for (int i = 0; i < stmt->len; i++)
+    {
+      if ((stmt->vis & V_mask) == V_comment)
+	fprintf (out, "//");
+      write_token (out, stmt->tokens + i);
+      if ((stmt->vis & V_mask) == V_pred)
+	fputc (' ', out);
+    }
+}
+
+static void
+write_stmts (FILE *out, const Stmt *stmts)
+{
+  for (; stmts; stmts = stmts->next)
+    write_stmt (out, stmts);
+}
+
+static Token *
+parse_insn (Token *tok)
+{
+  unsigned depth = 0;
+
+  do
+    {
+      Stmt *stmt;
+      unsigned s = V_insn;
+      Token *start = tok;
+
+      switch (tok++->kind)
+	{
+	case K_comment:
+	  while (tok->kind == K_comment)
+	    tok++;
+	  stmt = alloc_comment (start, tok);
+	  append_stmt (&fns, stmt);
+	  continue;
+
+	case '{':
+	  depth++;
+	  break;
+
+	case '}':
+	  depth--;
+	  break;
+
+	case K_label:
+	  tok[-1].end = 1;
+	  s = V_label;
+	  break;
+
+	case '@':
+	  tok->space = 0;
+	  if (tok->kind == '!')
+	    tok++;
+	  tok++;
+	  s = V_pred;
+	  break;
+
+	default:
+	  for (; tok->kind != ';'; tok++)
+	    {
+	      if (tok->kind == ',')
+		tok[1].space = 0;
+	    }
+	  tok++->end = 1;
+	  break;
+	}
+
+      stmt = alloc_stmt (s, start, tok, 0);
+      append_stmt (&fns, stmt);
+
+      if (!tok[-1].end && tok[0].kind == K_comment)
+	{
+	  stmt->vis |= V_no_eol;
+	  stmt = alloc_comment (tok, tok + 1);
+	  append_stmt (&fns, stmt);
+	  tok++;
+	}
+    }
+  while (depth);
+
+  return tok;
+}
+
+/* comma separated list of tokens */
+
+static Token *
+parse_list_nosemi (Token *tok)
+{
+  Token *start = tok;
+
+  do
+    if (!(++tok)->kind)
+      break;
+  while ((++tok)->kind == ',');
+
+  tok[-1].end = 1;
+  Stmt *stmt = alloc_stmt (V_dot, start, tok, 0);
+  append_stmt (&decls, stmt);
+
+  return tok;
+}
+
+#define is_keyword(T,S) \
+  (sizeof (S) == (T)->len && !memcmp ((T)->ptr + 1, (S), (T)->len - 1))
+
+static Token *
+parse_init (Token *tok, symbol *sym)
+{
+  for (;;)
+    {
+      Token *start = tok;
+      Token *def_tok = 0;
+      Stmt *stmt;
+
+      if (tok->kind == K_comment)
+	{
+	  while (tok->kind == K_comment)
+	    tok++;
+	  stmt = alloc_comment (start, tok);
+	  append_stmt (&sym->stmts, stmt);
+	  start = tok;
+	}
+
+      if (tok->kind == '{')
+	tok[1].space = 0;
+      /* Find the last symbol before the next comma.  This allows us
+	 to do the right thing for constructs like "generic (sym)".  */
+      for (; tok->kind != ',' && tok->kind != ';'; tok++)
+	if (tok->kind == K_symbol || tok->kind == K_ident)
+	  def_tok = tok;
+      if (def_tok)
+	sym->deps.push_back (symbol_hash_lookup (xstrndup (def_tok->ptr,
+							   def_tok->len)));
+      tok[1].space = 0;
+      int end = tok++->kind == ';';
+      stmt = alloc_stmt (V_insn, start, tok, 0);
+      append_stmt (&sym->stmts, stmt);
+      if (!tok[-1].end && tok->kind == K_comment)
+	{
+	  stmt->vis |= V_no_eol;
+	  stmt = alloc_comment (tok, tok + 1);
+	  append_stmt (&sym->stmts, stmt);
+	  tok++;
+	}
+      if (end)
+	break;
+    }
+  return tok;
+}
+
+static Token *
+parse_file (Token *tok)
+{
+  Stmt *comment = 0;
+
+  if (tok->kind == K_comment)
+    {
+      Token *start = tok;
+
+      while (tok->kind == K_comment)
+	{
+	  if (strncmp (tok->ptr, ":VAR_MAP ", 9) == 0)
+	    record_id (tok->ptr + 9, &vars_tail);
+	  if (strncmp (tok->ptr, ":FUNC_MAP ", 10) == 0)
+	    record_id (tok->ptr + 10, &funcs_tail);
+	  tok++;
+	}
+      comment = alloc_comment (start, tok);
+      comment->vis |= V_prefix_comment;
+    }
+
+  if (tok->kind == K_dotted)
+    {
+      if (is_keyword (tok, "version")
+	  || is_keyword (tok, "target")
+	  || is_keyword (tok, "address_size"))
+	{
+	  if (comment)
+	    append_stmt (&decls, comment);
+	  tok = parse_list_nosemi (tok);
+	}
+      else
+	{
+	  unsigned vis = 0;
+	  symbol *def = 0;
+	  unsigned is_decl = 0;
+	  Token *start, *def_token = 0;
+
+	  for (start = tok;
+	       tok->kind && tok->kind != '=' && tok->kind != K_comment
+		 && tok->kind != '{' && tok->kind != ';'; tok++)
+	    {
+	      if (is_keyword (tok, "global")
+		  || is_keyword (tok, "const"))
+		vis |= V_var;
+	      else if (is_keyword (tok, "func")
+		       || is_keyword (tok, "entry"))
+		vis |= V_func;
+	      else if (is_keyword (tok, "visible"))
+		vis |= V_global;
+	      else if (is_keyword (tok, "extern"))
+		is_decl = 1;
+	      else if (is_keyword (tok, "weak"))
+		vis |= V_weak;
+	      if (tok->kind == '(')
+		{
+		  tok[1].space = 0;
+		  tok[0].space = 1;
+		}
+	      else if (tok->kind == ')' && tok[1].kind != ';')
+		tok[1].space = 1;
+
+	      if (tok->kind == K_symbol || tok->kind == K_ident)
+		def_token = tok;
+	    }
+	  if (def_token)
+	    def = symbol_hash_lookup (xstrndup (def_token->ptr, def_token->len));
+
+	  if (!tok->kind)
+	    {
+	      /* end of file */
+	      if (comment)
+		append_stmt (&fns, comment);
+	    }
+	  else if (tok->kind == '{'
+		   || tok->kind == K_comment)
+	    {
+	      /* function defn */
+	      Stmt *stmt = alloc_stmt (vis, start, tok, def);
+	      if (comment)
+		{
+		  append_stmt (&fns, comment);
+		  stmt->vis |= V_prefix_comment;
+		}
+	      append_stmt (&fns, stmt);
+	      tok = parse_insn (tok);
+	    }
+	  else
+	    {
+	      int assign = tok->kind == '=';
+
+	      tok++->end = 1;
+	      if ((vis & V_mask) == V_var && !is_decl)
+		{
+		  /* variable */
+		  Stmt *stmt = alloc_stmt (vis, start, tok, def);
+		  if (comment)
+		    {
+		      append_stmt (&def->stmts, comment);
+		      stmt->vis |= V_prefix_comment;
+		    }
+		  append_stmt (&def->stmts, stmt);
+		  if (assign)
+		    tok = parse_init (tok, def);
+		}
+	      else
+		{
+		  /* declaration */
+		  Stmt *stmt = alloc_stmt (vis, start, tok, 0);
+		  if (comment)
+		    {
+		      append_stmt (&decls, comment);
+		      stmt->vis |= V_prefix_comment;
+		    }
+		  append_stmt (&decls, stmt);
+		}
+	    }
+	}
+    }
+  else
+    {
+      /* Something strange.  Ignore it.  */
+      if (comment)
+	append_stmt (&fns, comment);
+
+      while (tok->kind && !tok->end)
+	tok++;
+      if (tok->kind)
+	tok++;
+    }
+  return tok;
+}
+
+static void
+output_symbol (FILE *out, symbol *e)
+{
+  if (e->emitted)
+    return;
+  if (e->pending)
+    fatal_error ("circular reference in variable initializers");
+  e->pending = true;
+  std::list<symbol *>::iterator i;
+  for (i = e->deps.begin (); i != e->deps.end (); i++)
+    output_symbol (out, *i);
+  e->pending = false;
+  write_stmts (out, rev_stmts (e->stmts));
+  e->emitted = true;
+}
+
+static int
+traverse (void **slot, void *data)
+{
+  symbol *e = *(symbol **)slot;
+  output_symbol ((FILE *)data, e);
+  return 1;
+}
+
+static void
+process (FILE *in, FILE *out)
+{
+  symbol_table = htab_create (500, hash_string_hash, hash_string_eq,
+                              NULL);
+
+  const char *input = read_file (in);
+  Token *tok = tokenize (input);
+
+  do
+    tok = parse_file (tok);
+  while (tok->kind);
+
+  write_stmts (out, rev_stmts (decls));
+  htab_traverse (symbol_table, traverse, (void *)out);
+  write_stmts (out, rev_stmts (fns));
+}
+
+/* Wait for a process to finish, and exit if a nonzero status is found.  */
+
+int
+collect_wait (const char *prog, struct pex_obj *pex)
+{
+  int status;
+
+  if (!pex_get_status (pex, 1, &status))
+    fatal_error ("can't get program status: %m");
+  pex_free (pex);
+
+  if (status)
+    {
+      if (WIFSIGNALED (status))
+	{
+	  int sig = WTERMSIG (status);
+	  fatal_error ("%s terminated with signal %d [%s]%s",
+		       prog, sig, strsignal(sig),
+		       WCOREDUMP(status) ? ", core dumped" : "");
+	}
+
+      if (WIFEXITED (status))
+	return WEXITSTATUS (status);
+    }
+  return 0;
+}
+
+static void
+do_wait (const char *prog, struct pex_obj *pex)
+{
+  int ret = collect_wait (prog, pex);
+  if (ret != 0)
+    {
+      fatal_error ("%s returned %d exit status", prog, ret);
+    }
+}
+
+\f
+/* Execute a program, and wait for the reply.  */
+static void
+fork_execute (const char *prog, char *const *argv)
+{
+  struct pex_obj *pex = pex_init (0, "nvptx-as", NULL);
+  if (pex == NULL)
+    fatal_error ("pex_init failed: %m");
+
+  int err;
+  const char *errmsg;
+
+  errmsg = pex_run (pex, PEX_LAST | PEX_SEARCH, argv[0], argv, NULL,
+		    NULL, &err);
+  if (errmsg != NULL)
+    {
+      if (err != 0)
+	{
+	  errno = err;
+	  fatal_error ("%s: %m", errmsg);
+	}
+      else
+	fatal_error (errmsg);
+    }
+  do_wait (prog, pex);
+}
+
+static struct option long_options[] = {
+  {"traditional-format",     no_argument, 0,  0 },
+  {"save-temps",  no_argument,       0,  0 },
+  {"no-verify",  no_argument,       0,  0 },
+  {0,         0,                 0,  0 }
+};
+
+int
+main (int argc, char **argv)
+{
+  FILE *in = stdin;
+  FILE *out = stdout;
+  bool verbose __attribute__((unused)) = false;
+  bool verify = true;
+
+  int o;
+  int option_index = 0;
+  while ((o = getopt_long (argc, argv, "o:I:v", long_options, &option_index)) != -1)
+    {
+      switch (o)
+	{
+	case 0:
+	  if (option_index == 2)
+	    verify = false;
+	  break;
+	case 'v':
+	  verbose = true;
+	  break;
+	case 'o':
+	  if (outname != NULL)
+	    {
+	      fprintf (stderr, "multiple output files specified\n");
+	      exit (1);
+	    }
+	  outname = optarg;
+	  break;
+	case 'I':
+	  /* Ignore include paths.  */
+	  break;
+	default:
+	  break;
+	}
+    }
+
+  if (optind + 1 != argc)
+    fatal_error ("not exactly one input file specified");
+
+  out = fopen (outname, "w");
+  if (!out)
+    fatal_error ("cannot open '%s'", outname);
+
+  in = fopen (argv[optind], "r");
+  if (!in)
+    fatal_error ("cannot open input ptx file");
+
+  process (in, out);
+  fclose (out);
+
+  if (verify)
+    {
+      struct obstack argv_obstack;
+      obstack_init (&argv_obstack);
+      obstack_ptr_grow (&argv_obstack, "ptxas");
+      obstack_ptr_grow (&argv_obstack, "-c");
+      obstack_ptr_grow (&argv_obstack, "-o");
+      obstack_ptr_grow (&argv_obstack, "/dev/null");
+      obstack_ptr_grow (&argv_obstack, outname);
+      obstack_ptr_grow (&argv_obstack, "--gpu-name");
+      obstack_ptr_grow (&argv_obstack, "sm_30");
+      obstack_ptr_grow (&argv_obstack, NULL);
+      char *const *new_argv = XOBFINISH (&argv_obstack, char *const *);
+      fork_execute (new_argv[0], new_argv);
+    }
+  return 0;
+}
Index: git/gcc/config/nvptx/nvptx-ld.c
===================================================================
--- /dev/null
+++ git/gcc/config/nvptx/nvptx-ld.c
@@ -0,0 +1,498 @@
+/* A "linker" for ptx.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <getopt.h>
+#include <unistd.h>
+#include <assert.h>
+
+#include "hashtab.h"
+#include "obstack.h"
+#define HAVE_DECL_BASENAME 1
+#include "libiberty.h"
+
+#include <list>
+#include <string>
+#include <iostream>
+
+struct file_hash_entry;
+
+typedef struct symbol_hash_entry
+{
+  /* The name of the symbol.  */
+  const char *key;
+  /* A linked list of unresolved referenced symbols.  */
+  struct symbol_hash_entry **pprev, *next;
+  /* The file in which it is defined.  */
+  struct file_hash_entry *def;
+  int included;
+  int referenced;
+} symbol;
+
+typedef struct file_hash_entry
+{
+  struct file_hash_entry **pprev, *next;
+  const char *name;
+  const char *arname;
+  const char *data;
+  size_t len;
+} file;
+
+/* Hash and comparison functions for these hash tables.  */
+
+static int hash_string_eq (const void *, const void *);
+static hashval_t hash_string_hash (const void *);
+
+static int
+hash_string_eq (const void *s1_p, const void *s2_p)
+{
+  const char *const *s1 = (const char *const *) s1_p;
+  const char *s2 = (const char *) s2_p;
+  return strcmp (*s1, s2) == 0;
+}
+
+static hashval_t
+hash_string_hash (const void *s_p)
+{
+  const char *const *s = (const char *const *) s_p;
+  return (*htab_hash_string) (*s);
+}
+
+static htab_t symbol_table;
+
+/* Look up an entry in the symbol hash table.  */
+
+static struct symbol_hash_entry *
+symbol_hash_lookup (const char *string, int create)
+{
+  void **e;
+  e = htab_find_slot_with_hash (symbol_table, string,
+                                (*htab_hash_string) (string),
+                                create ? INSERT : NO_INSERT);
+  if (e == NULL)
+    return NULL;
+  if (*e == NULL)
+    {
+      struct symbol_hash_entry *v;
+      *e = v = XCNEW (struct symbol_hash_entry);
+      v->key = string;
+    }
+  return (struct symbol_hash_entry *) *e;
+}
+
+static struct file_hash_entry *
+file_hash_new (const char *data, size_t len, const char *arname, const char *name)
+{
+  struct file_hash_entry *v = XCNEW (struct file_hash_entry);
+  v->data = data;
+  v->len = len;
+  v->name = xstrdup (name);
+  v->arname = xstrdup (arname);
+  return v;
+}
+
+using namespace std;
+
+#define ARMAG  "!<arch>\012"    /* For COFF and a.out archives.  */
+#define SARMAG 8
+#define ARFMAG "`\012"
+
+struct ar_hdr
+{
+  char ar_name[16];             /* Name of this member.  */
+  char ar_date[12];             /* File mtime.  */
+  char ar_uid[6];               /* Owner uid; printed as decimal.  */
+  char ar_gid[6];               /* Owner gid; printed as decimal.  */
+  char ar_mode[8];              /* File mode, printed as octal.   */
+  char ar_size[10];             /* File size, printed as decimal.  */
+  char ar_fmag[2];              /* Should contain ARFMAG.  */
+};
+
+class archive
+{
+  FILE *f;
+  off_t flen;
+  off_t off;
+
+  char name[17];
+  char *contents;
+  size_t len;
+
+ public:
+  archive () : f (NULL), contents (NULL) { }
+  ~archive ()
+  {
+    discard_contents ();
+  }
+  void discard_contents ()
+  {
+    if (contents)
+      delete[] contents;
+    contents = NULL;
+  }
+  bool init (FILE *file)
+  {
+    char magic[SARMAG];
+    if (fread (magic, 1, SARMAG, file) != SARMAG)
+      return false;
+    if (memcmp (magic, ARMAG, SARMAG) != 0)
+      return false;
+    f = file;
+    fseek (f, 0, SEEK_END);
+    flen = ftell (f);
+    fseek (f, SARMAG, SEEK_SET);
+    off = SARMAG;
+
+    struct ar_hdr hdr;
+    if (fread (&hdr, sizeof hdr, 1, f) != 1)
+      return false;
+    if (hdr.ar_name[0] == '/' || hdr.ar_name[0] == ' ')
+      {
+	off += sizeof hdr;
+	long l = atol (hdr.ar_size);
+	if (l < 0 || off + l > flen)
+	  return false;
+	off += l;
+      }
+
+    fseek (f, off, SEEK_SET);
+    return true;
+  }
+
+  bool at_end ()
+  {
+    return off == flen;
+  }
+
+  bool next_file ()
+  {
+    discard_contents ();
+
+    struct ar_hdr hdr;
+    if (fread (&hdr, sizeof hdr, 1, f) != 1)
+      return false;
+    off += sizeof hdr;
+    long l = atol (hdr.ar_size);
+    if (l <= 0 || l > flen)
+      return false;
+    size_t read_len = l + (l & 1);
+    len = l;
+    contents = new char[read_len];
+    if (contents == NULL)
+      return false;
+    if (fread (contents, 1, read_len, f) != read_len)
+      return false;
+    off += read_len;
+    memcpy (name, hdr.ar_name, sizeof hdr.ar_name);
+    name[16] = '\0';
+    return true;
+  }
+  const char *get_contents () { return contents; }
+  const char *get_name () { return name; }
+  size_t get_len () { return len; }
+};
+
+FILE *
+path_open (const char *filename, list<string> &paths)
+{
+  FILE *f = fopen (filename, "r");
+  if (f)
+    return f;
+  if (strchr (filename, '/') != NULL)
+    return NULL;
+
+  for (list<string>::const_iterator iterator = paths.begin(), end = paths.end();
+       iterator != end;
+       ++iterator)
+    {
+      string tmp = *iterator;
+      tmp += '/';
+      tmp += filename;
+      FILE *f = fopen (tmp.c_str (), "r");
+      if (f)
+	return f;
+    }
+  return NULL;
+}
+
+static struct symbol_hash_entry *unresolved;
+
+static void
+enqueue_as_unresolved (struct symbol_hash_entry *e)
+{
+  e->pprev = &unresolved;
+  e->next = unresolved;
+  if (e->next)
+    e->next->pprev = &e->next;
+  unresolved = e;
+  e->referenced = true;
+}
+
+static void
+dequeue_unresolved (struct symbol_hash_entry *e)
+{
+  if (e->pprev != NULL)
+    {
+      if (e->next)
+	e->next->pprev = e->pprev;
+      *e->pprev = e->next;
+    }
+  e->pprev = NULL;
+}
+
+static void
+process_refs_defs (file *f, const char *ptx)
+{
+  while (*ptx != '\0')
+    {
+      if (strncmp (ptx, "\n// BEGIN GLOBAL ", 17) == 0)
+	{
+	  int type = 0;
+	  ptx += 17;
+	  if (strncmp (ptx, "VAR DEF: ", 9) == 0)
+	    {
+	      type = 1;
+	      ptx += 9;
+	    }
+	  else if (strncmp (ptx, "FUNCTION DEF: ", 14) == 0)
+	    {
+	      type = 1;
+	      ptx += 14;
+	    }
+	  else if (strncmp (ptx, "VAR DECL: ", 10) == 0)
+	    {
+	      type = 2;
+	      ptx += 10;
+	    }
+	  else if (strncmp (ptx, "FUNCTION DECL: ", 15) == 0)
+	    {
+	      type = 2;
+	      ptx += 15;
+	    }
+	  if (type == 0)
+	    continue;
+	  const char *end = strchr (ptx, '\n');
+	  if (end == 0)
+	    end = ptx + strlen (ptx);
+	  if ((end - ptx == 6 && memcmp (ptx, "malloc", 6) == 0)
+	      || (end - ptx == 4 && memcmp (ptx, "free", 4) == 0)
+	      || (end - ptx == 7 && memcmp (ptx, "vprintf", 7) == 0))
+	    continue;
+	  const char *sym = xstrndup (ptx, end - ptx);
+	  struct symbol_hash_entry *e = symbol_hash_lookup (sym, 1);
+
+	  if (!e->included)
+	    {
+	      if (type == 1)
+		{
+		  if (f == NULL)
+		    {
+		      e->included = true;
+		      dequeue_unresolved (e);
+		    }
+		  else
+		    e->def = f;
+		}
+	      else
+		{
+		  if (f == NULL)
+		    {
+		      if (!e->referenced)
+			enqueue_as_unresolved (e);
+		    }
+		}
+	    }
+	}
+      ptx++;
+    }
+}
+
+int
+main (int argc, char **argv)
+{
+  const char *outname = NULL;
+  list<string> libraries;
+  list<string> libpaths;
+  bool verbose = false;
+
+  int o;
+  while ((o = getopt (argc, argv, "L:l:o:v")) != -1)
+    {
+      switch (o)
+	{
+	case 'v':
+	  verbose = true;
+	  break;
+	case 'o':
+	  if (outname != NULL)
+	    {
+	      cerr << "multiple output files specified\n";
+	      exit (1);
+	    }
+	  outname = optarg;
+	  break;
+	case 'l':
+	  libraries.push_back (optarg);
+	  break;
+	case 'L':
+	  libpaths.push_back (optarg);
+	  break;
+	default:
+	  break;
+	}
+    }
+
+  libraries.sort ();
+  libraries.unique ();
+  libpaths.unique ();
+
+  if (outname == NULL)
+    outname = "a.out";
+
+  symbol_table = htab_create (500, hash_string_hash, hash_string_eq,
+                              NULL);
+
+  FILE *outfile = fopen (outname, "w");
+  if (outfile == NULL)
+    {
+      cerr << "error opening output file\n";
+      exit (1);
+    }
+  list<string> inputfiles;
+  while (optind < argc)
+    inputfiles.push_back (argv[optind++]);
+
+  int idx = 0;
+  for (list<string>::const_iterator iterator = inputfiles.begin(), end = inputfiles.end();
+       iterator != end;
+       ++iterator)
+    {
+      const string &name = *iterator;
+      FILE *f = path_open (name.c_str (), libpaths);
+      if (f == NULL)
+	{
+	  cerr << "error opening " << name << "\n";
+	  goto error_out;
+	}
+      fseek (f, 0, SEEK_END);
+      off_t len = ftell (f);
+      fseek (f, 0, SEEK_SET);
+      char *buf = new char[len + 1];
+      fread (buf, 1, len, f);
+      buf[len] = '\0';
+      if (ferror (f))
+	{
+	  cerr << "error reading " << name << "\n";
+	  goto error_out;
+	}
+      size_t out = fwrite (buf, 1, len, outfile);
+      if (out != len)
+	{
+	  cerr << "error writing to output file\n";
+	  goto error_out;
+	}
+      process_refs_defs (NULL, buf);
+      delete[] buf;
+      if (verbose)
+	cout << "Linking " << name << " as " << idx++ << "\n";
+      fputc ('\0', outfile);
+    }
+  for (list<string>::const_iterator iterator = libraries.begin(), end = libraries.end();
+       iterator != end;
+       ++iterator)
+    {
+      const string &name = "lib" + *iterator + ".a";
+      if (verbose)
+	cout << "trying lib " << name << "\n";
+      FILE *f = path_open (name.c_str (), libpaths);
+      if (f == NULL)
+	{
+	  cerr << "error opening " << name << "\n";
+	  goto error_out;
+	}
+      archive ar;
+      if (!ar.init (f))
+	{
+	  cerr << name << " is not a valid archive\n";
+	  goto error_out;
+	}
+      while (!ar.at_end ())
+	{
+	  if (!ar.next_file ())
+	    {
+	      cerr << "error reading from archive " << name << "\n";
+	      goto error_out;
+	    }
+	  const char *p = xstrdup (ar.get_contents ());
+	  size_t len = ar.get_len ();
+	  file *f = file_hash_new (p, len, name.c_str (), ar.get_name ());
+	  process_refs_defs (f, p);
+	}
+    }
+
+  while (unresolved)
+    {
+      struct file_hash_entry *to_add = NULL;
+      struct symbol_hash_entry *e;
+      for (e = unresolved; e; e = e->next)
+	{
+	  struct file_hash_entry *f = e->def;
+	  if (!f)
+	    {
+	      cerr << "unresolved symbol " << e->key << "\n";
+	      goto error_out;
+	    }
+	  if (verbose)
+	    cout << "Resolving " << e->key << "\n";
+	  if (!f->pprev)
+	    {
+	      f->pprev = &to_add;
+	      f->next = to_add;
+	      to_add = f;
+	    }
+	  e->included = true;
+	  e->pprev = NULL;
+	}
+      unresolved = NULL;
+      assert (to_add != NULL);
+      struct file_hash_entry *f;
+      for (f = to_add; f; f = f->next)
+	{
+	  f->pprev = NULL;
+	  if (verbose)
+	    cout << "Linking " << f->arname << "::" << f->name << " as " << idx++ << "\n";
+	  if (fwrite (f->data, 1, f->len, outfile) != f->len)
+	    {
+	      cerr << "error writing to output file\n";
+	      goto error_out;
+	    }
+	  fputc ('\0', outfile);
+	  process_refs_defs (NULL, f->data);
+	}
+    }
+  return 0;
+
+ error_out:
+  fclose (outfile);
+  unlink (outname);
+  return 1;
+}
Index: git/gcc/config/nvptx/t-nvptx
===================================================================
--- git.orig/gcc/config/nvptx/t-nvptx
+++ git/gcc/config/nvptx/t-nvptx
@@ -1,2 +1,27 @@
 #
 
+nvptx-ld.o: $(srcdir)/config/nvptx/nvptx-ld.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
+collect-ld$(exeext): nvptx-ld.o $(LIBIBERTY)
+	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
+	  nvptx-ld.o $(LIBIBERTY)
+
+nvptx-as.o: $(srcdir)/config/nvptx/nvptx-as.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
+as$(exeext): nvptx-as.o $(LIBIBERTY)
+	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
+	  nvptx-as.o $(LIBIBERTY)
+
+ar$(exeext):
+	echo -e "#! /bin/sh\n$(AR) \"$$""@\"" >$@
+	chmod a+x $@
+
+ranlib$(exeext):
+	echo -e "#! /bin/sh\n$(RANLIB) \"$$""@\"" >$@
+	chmod a+x $@
+
+
Index: git/configure
===================================================================
--- git.orig/configure
+++ git/configure
@@ -13607,7 +13607,95 @@ fi
 
 RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET"
 
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target ar" >&5
+case "${target}" in
+  nvptx-*)
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target ar" >&5
+$as_echo_n "checking where to find the target ar... " >&6; }
+if test "x${build}" != "x${host}" ; then
+  if expr "x$AR_FOR_TARGET" : "x/" > /dev/null; then
+    # We already found the complete path
+    ac_dir=`dirname $AR_FOR_TARGET`
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" >&5
+$as_echo "pre-installed in $ac_dir" >&6; }
+  else
+    # Canadian cross, just use what we found
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5
+$as_echo "pre-installed" >&6; }
+  fi
+else
+  ok=yes
+  case " ${configdirs} " in
+    *" gcc "*) ;;
+    *) ok=no ;;
+  esac
+
+  if test $ok = yes; then
+    # An in-tree tool is available and we can use it
+    AR_FOR_TARGET='$$r/$(HOST_SUBDIR)/gcc/ar'
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: just compiled" >&5
+$as_echo "just compiled" >&6; }
+  elif expr "x$AR_FOR_TARGET" : "x/" > /dev/null; then
+    # We already found the complete path
+    ac_dir=`dirname $AR_FOR_TARGET`
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" >&5
+$as_echo "pre-installed in $ac_dir" >&6; }
+  elif test "x$target" = "x$host"; then
+    # We can use an host tool
+    AR_FOR_TARGET='$(AR)'
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: host tool" >&5
+$as_echo "host tool" >&6; }
+  else
+    # We need a cross tool
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5
+$as_echo "pre-installed" >&6; }
+  fi
+fi
+
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target ranlib" >&5
+$as_echo_n "checking where to find the target ranlib... " >&6; }
+if test "x${build}" != "x${host}" ; then
+  if expr "x$RANLIB_FOR_TARGET" : "x/" > /dev/null; then
+    # We already found the complete path
+    ac_dir=`dirname $RANLIB_FOR_TARGET`
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" >&5
+$as_echo "pre-installed in $ac_dir" >&6; }
+  else
+    # Canadian cross, just use what we found
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5
+$as_echo "pre-installed" >&6; }
+  fi
+else
+  ok=yes
+  case " ${configdirs} " in
+    *" gcc "*) ;;
+    *) ok=no ;;
+  esac
+
+  if test $ok = yes; then
+    # An in-tree tool is available and we can use it
+    RANLIB_FOR_TARGET='$$r/$(HOST_SUBDIR)/gcc/ranlib'
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: just compiled" >&5
+$as_echo "just compiled" >&6; }
+  elif expr "x$RANLIB_FOR_TARGET" : "x/" > /dev/null; then
+    # We already found the complete path
+    ac_dir=`dirname $RANLIB_FOR_TARGET`
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" >&5
+$as_echo "pre-installed in $ac_dir" >&6; }
+  elif test "x$target" = "x$host"; then
+    # We can use an host tool
+    RANLIB_FOR_TARGET='$(RANLIB)'
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: host tool" >&5
+$as_echo "host tool" >&6; }
+  else
+    # We need a cross tool
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5
+$as_echo "pre-installed" >&6; }
+  fi
+fi
+
+    ;;
+  *)
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target ar" >&5
 $as_echo_n "checking where to find the target ar... " >&6; }
 if test "x${build}" != "x${host}" ; then
   if expr "x$AR_FOR_TARGET" : "x/" > /dev/null; then
@@ -13649,6 +13737,50 @@ $as_echo "pre-installed" >&6; }
   fi
 fi
 
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target ranlib" >&5
+$as_echo_n "checking where to find the target ranlib... " >&6; }
+if test "x${build}" != "x${host}" ; then
+  if expr "x$RANLIB_FOR_TARGET" : "x/" > /dev/null; then
+    # We already found the complete path
+    ac_dir=`dirname $RANLIB_FOR_TARGET`
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" >&5
+$as_echo "pre-installed in $ac_dir" >&6; }
+  else
+    # Canadian cross, just use what we found
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5
+$as_echo "pre-installed" >&6; }
+  fi
+else
+  ok=yes
+  case " ${configdirs} " in
+    *" binutils "*) ;;
+    *) ok=no ;;
+  esac
+
+  if test $ok = yes; then
+    # An in-tree tool is available and we can use it
+    RANLIB_FOR_TARGET='$$r/$(HOST_SUBDIR)/binutils/ranlib'
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: just compiled" >&5
+$as_echo "just compiled" >&6; }
+  elif expr "x$RANLIB_FOR_TARGET" : "x/" > /dev/null; then
+    # We already found the complete path
+    ac_dir=`dirname $RANLIB_FOR_TARGET`
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" >&5
+$as_echo "pre-installed in $ac_dir" >&6; }
+  elif test "x$target" = "x$host"; then
+    # We can use an host tool
+    RANLIB_FOR_TARGET='$(RANLIB)'
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: host tool" >&5
+$as_echo "host tool" >&6; }
+  else
+    # We need a cross tool
+    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5
+$as_echo "pre-installed" >&6; }
+  fi
+fi
+
+    ;;
+esac
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target as" >&5
 $as_echo_n "checking where to find the target as... " >&6; }
 if test "x${build}" != "x${host}" ; then
@@ -14193,48 +14325,6 @@ $as_echo "pre-installed in $ac_dir" >&6;
     { $as_echo "$as_me:${as_lineno-$LINENO}: result: host tool" >&5
 $as_echo "host tool" >&6; }
   else
-    # We need a cross tool
-    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5
-$as_echo "pre-installed" >&6; }
-  fi
-fi
-
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking where to find the target ranlib" >&5
-$as_echo_n "checking where to find the target ranlib... " >&6; }
-if test "x${build}" != "x${host}" ; then
-  if expr "x$RANLIB_FOR_TARGET" : "x/" > /dev/null; then
-    # We already found the complete path
-    ac_dir=`dirname $RANLIB_FOR_TARGET`
-    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" >&5
-$as_echo "pre-installed in $ac_dir" >&6; }
-  else
-    # Canadian cross, just use what we found
-    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5
-$as_echo "pre-installed" >&6; }
-  fi
-else
-  ok=yes
-  case " ${configdirs} " in
-    *" binutils "*) ;;
-    *) ok=no ;;
-  esac
-
-  if test $ok = yes; then
-    # An in-tree tool is available and we can use it
-    RANLIB_FOR_TARGET='$$r/$(HOST_SUBDIR)/binutils/ranlib'
-    { $as_echo "$as_me:${as_lineno-$LINENO}: result: just compiled" >&5
-$as_echo "just compiled" >&6; }
-  elif expr "x$RANLIB_FOR_TARGET" : "x/" > /dev/null; then
-    # We already found the complete path
-    ac_dir=`dirname $RANLIB_FOR_TARGET`
-    { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed in $ac_dir" >&5
-$as_echo "pre-installed in $ac_dir" >&6; }
-  elif test "x$target" = "x$host"; then
-    # We can use an host tool
-    RANLIB_FOR_TARGET='$(RANLIB)'
-    { $as_echo "$as_me:${as_lineno-$LINENO}: result: host tool" >&5
-$as_echo "host tool" >&6; }
-  else
     # We need a cross tool
     { $as_echo "$as_me:${as_lineno-$LINENO}: result: pre-installed" >&5
 $as_echo "pre-installed" >&6; }
Index: git/configure.ac
===================================================================
--- git.orig/configure.ac
+++ git/configure.ac
@@ -3271,7 +3271,16 @@ ACX_CHECK_INSTALLED_TARGET_TOOL(WINDMC_F
 
 RAW_CXX_FOR_TARGET="$CXX_FOR_TARGET"
 
-GCC_TARGET_TOOL(ar, AR_FOR_TARGET, AR, [binutils/ar])
+case "${target}" in
+  nvptx-*)
+    GCC_TARGET_TOOL(ar, AR_FOR_TARGET, AR, [gcc/ar])
+    GCC_TARGET_TOOL(ranlib, RANLIB_FOR_TARGET, RANLIB, [gcc/ranlib])
+    ;;
+  *)
+    GCC_TARGET_TOOL(ar, AR_FOR_TARGET, AR, [binutils/ar])
+    GCC_TARGET_TOOL(ranlib, RANLIB_FOR_TARGET, RANLIB, [binutils/ranlib])
+    ;;
+esac
 GCC_TARGET_TOOL(as, AS_FOR_TARGET, AS, [gas/as-new])
 GCC_TARGET_TOOL(cc, CC_FOR_TARGET, CC, [gcc/xgcc -B$$r/$(HOST_SUBDIR)/gcc/])
 dnl see comments for CXX_FOR_TARGET_FLAG_TO_PASS
@@ -3293,7 +3302,6 @@ GCC_TARGET_TOOL(ld, LD_FOR_TARGET, LD, [
 GCC_TARGET_TOOL(lipo, LIPO_FOR_TARGET, LIPO)
 GCC_TARGET_TOOL(nm, NM_FOR_TARGET, NM, [binutils/nm-new])
 GCC_TARGET_TOOL(objdump, OBJDUMP_FOR_TARGET, OBJDUMP, [binutils/objdump])
-GCC_TARGET_TOOL(ranlib, RANLIB_FOR_TARGET, RANLIB, [binutils/ranlib])
 GCC_TARGET_TOOL(readelf, READELF_FOR_TARGET, READELF, [binutils/readelf])
 GCC_TARGET_TOOL(strip, STRIP_FOR_TARGET, STRIP, [binutils/strip-new])
 GCC_TARGET_TOOL(windres, WINDRES_FOR_TARGET, WINDRES, [binutils/windres])
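
The marker scan that process_refs_defs performs in nvptx-ld.c can be looked at in isolation. What follows is a minimal sketch, not code from the patch: the names scan_markers and Marker are assumptions made up for illustration, and only the marker strings themselves are taken from the patch above.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record for one "// BEGIN GLOBAL ..." marker line.
struct Marker
{
  bool is_def;       // true for "... DEF: ", false for "... DECL: "
  std::string name;  // symbol name that follows the marker
};

// Scan PTX text for marker comments, in the same spirit as
// process_refs_defs in the patch.  Illustrative helper only.
std::vector<Marker>
scan_markers (const char *ptx)
{
  static const struct { const char *text; bool is_def; } kinds[] = {
    { "VAR DEF: ", true },
    { "FUNCTION DEF: ", true },
    { "VAR DECL: ", false },
    { "FUNCTION DECL: ", false },
  };
  static const char prefix[] = "\n// BEGIN GLOBAL ";
  const std::size_t prefix_len = sizeof prefix - 1;

  std::vector<Marker> result;
  while (*ptx != '\0')
    {
      if (std::strncmp (ptx, prefix, prefix_len) != 0)
        {
          ptx++;
          continue;
        }
      ptx += prefix_len;
      for (const auto &k : kinds)
        {
          std::size_t n = std::strlen (k.text);
          if (std::strncmp (ptx, k.text, n) != 0)
            continue;
          ptx += n;
          // The symbol name runs to the end of the line.
          const char *end = std::strchr (ptx, '\n');
          if (end == nullptr)
            end = ptx + std::strlen (ptx);
          result.push_back (Marker { k.is_def, std::string (ptx, end - ptx) });
          // Stay on the newline so a marker on the next line is seen.
          ptx = end;
          break;
        }
    }
  return result;
}
```

As in the patch, a marker is only recognized when it is preceded by a newline, so a marker on the very first line of the input would be missed by this sketch as well.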

* Re: The nvptx port [11/11] More tools.
  2014-10-20 14:58 ` The nvptx port [11/11] More tools Bernd Schmidt
@ 2014-10-21  0:16   ` Joseph S. Myers
  2014-10-22 20:40   ` Jeff Law
  2014-10-31 21:04   ` Jeff Law
  2 siblings, 0 replies; 82+ messages in thread
From: Joseph S. Myers @ 2014-10-21  0:16 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On Mon, 20 Oct 2014, Bernd Schmidt wrote:

> These tools currently require GNU extensions - something I probably ought to
> fix if we decide to add them to the gcc build itself.

And as regards library use, I'd expect the sources to start with #includes 
of config.h and system.h (and so not include system headers directly if 
they are included by system.h) even if no other GCC headers are useful in 
any way.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: The nvptx port [0/11+]
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (11 preceding siblings ...)
  2014-10-20 14:58 ` The nvptx port [11/11] More tools Bernd Schmidt
@ 2014-10-21  8:23 ` Richard Biener
  2014-10-21 10:57   ` Bernd Schmidt
  2014-10-21  9:17 ` Jakub Jelinek
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 82+ messages in thread
From: Richard Biener @ 2014-10-21  8:23 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On Mon, Oct 20, 2014 at 4:17 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> This is a patch kit that adds the nvptx port to gcc. It contains preliminary
> patches to add needed functionality, the target files, and one somewhat
> optional patch with additional target tools. There'll be more patch series,
> one for the testsuite, and one to make the offload functionality work with
> this port. Also required are the previous four rtl patches, two of which
> weren't entirely approved yet.
>
> For the moment, I've stripped out all the address space support that got
> bogged down in review by brokenness in our representation of address spaces.
> The ptx address spaces are of course still defined and used inside the
> backend.
>
> Ptx really isn't a usual target - it is a virtual target which is then
> translated by another compiler (ptxas) to the final code that runs on the
> GPU. There are many restrictions, some imposed by the GPU hardware, and some
> by the fact that not everything you'd want can be represented in ptx. Here
> are some of the highlights:
>  * Everything is typed - variables, functions, registers. This can
>    cause problems with K&R style C or anything else that doesn't
>    have a proper type internally.
>  * Declarations are needed, even for undefined variables.
>  * Can't emit initializers referring to their variable's address since
>    you can't write forward declarations for variables.
>  * Variables can be declared only as scalars or arrays, not
>    structures. Initializers must be in the variable's declared type,
>    which requires some code in the backend, and it means that packed
>    pointer values are not representable.
>  * Since it's a virtual target, we skip register allocation - no good
>    can probably come from doing that twice. This means asm statements
>    aren't fixed up and will fail if they use matching constraints.

So with this restriction I wonder why it didn't make sense to go the
HSA "backend" route emitting PTX from a GIMPLE SSA pass.  This
would have avoided the LTO dance as well ...

That is, what is the advantage of expanding to RTL here - what
main benefits do you get from that which you thought would be
different to handle if doing code generation from GIMPLE SSA?

For HSA we even do register allocation (to a fixed virtual register
set), something simple enough on SSA.  We of course also have to do
instruction selection but luckily virtual ISAs are easy to target.

So were you worried about "duplicating" instruction selection
and/or doing it manually instead of with well-known machine
descriptions?

I'm just curious - I am not asking you to rewrite the beast ;)

Thanks,
Richard.

>  * No support for indirect jumps, label values, nonlocal gotos.
>  * No alloca - ptx defines it, but it's not implemented.
>  * No trampolines.
>  * No debugging (at all, for now - we may add line number directives).
>  * Limited C library support - I have a hacked up copy of newlib
>    that provides a reasonable subset.
>  * malloc and free are defined by ptx (these appear to be
>    undocumented), but there isn't a realloc. I have one patch for
>    Fortran to use a malloc/memcpy helper function in cases where we
>    know the old size.
>
> All in all, this is not intended to be used as a C (or any other source
> language) compiler. I've gone through a lot of effort to make it work
> reasonably well, but only in order to get sufficient test coverage from the
> testsuites. The intended use for this is only to build it as an offload
> compiler, and use it through OpenACC by way of lto1. That leaves the
> question of how we should document it - does it need the usual constraint
> and option documentation, given that users aren't expected to use any of
> it?
>
> A slightly earlier version of the entire patch kit was bootstrapped and
> tested on x86_64-linux. Ok for trunk?
>
>
> Bernd

* Re: The nvptx port [0/11+]
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (12 preceding siblings ...)
  2014-10-21  8:23 ` The nvptx port [0/11+] Richard Biener
@ 2014-10-21  9:17 ` Jakub Jelinek
  2014-10-21 11:19   ` Bernd Schmidt
  2014-11-12 12:36 ` Richard Biener
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 82+ messages in thread
From: Jakub Jelinek @ 2014-10-21  9:17 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On Mon, Oct 20, 2014 at 04:17:56PM +0200, Bernd Schmidt wrote:
>  * Can't emit initializers referring to their variable's address since
>    you can't write forward declarations for variables.

Can't that be handled by emitting the initializer without the address and
some constructor that fixes up the initializer at runtime?
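
To make the suggestion concrete, here is a minimal host-side sketch of the idea. It is not something the port does: the names node, n and fixup_n are hypothetical, and __attribute__ ((constructor)) is a GNU extension used here only to stand in for whatever constructor mechanism a target might provide.

```cpp
// Hypothetical self-referential object.  On a conventional target its
// static initializer would contain its own address.
struct node
{
  node *self;
  int value;
};

// Emitted with a null placeholder where the address would go.
node n = { nullptr, 42 };

// Runtime fix-up.  The constructor attribute (GNU extension) makes
// this run before main; it only illustrates the mechanism.
__attribute__ ((constructor)) void
fixup_n ()
{
  n.self = &n;
}
```

The cost of this scheme is that the object cannot live in read-only storage and is only valid once constructors have run.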

>  * Variables can be declared only as scalars or arrays, not
>    structures. Initializers must be in the variable's declared type,
>    which requires some code in the backend, and it means that packed
>    pointer values are not representable.

Can't you represent structures and unions as arrays of chars?
For constant initializers that don't need relocations the compiler can
surely turn them into arrays of char initializers (e.g. fold-const.c
native_encode_expr/native_interpret_expr could be used for that).
Supposedly it would mean slower than perhaps necessary loads/stores of
aligned larger fields from the structure, but if it is an alternative to
not supporting structures/unions at all, that sounds like such a severe
limitation that it can be pretty fatal for the target.

>  * No support for indirect jumps, label values, nonlocal gotos.

Not even indirect calls?  How do you implement C++ or Fortran vtables?

	Jakub

* Re: The nvptx port [0/11+]
  2014-10-21  8:23 ` The nvptx port [0/11+] Richard Biener
@ 2014-10-21 10:57   ` Bernd Schmidt
  2014-10-21 11:27     ` Richard Biener
  0 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-21 10:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 10/21/2014 10:18 AM, Richard Biener wrote:
> So with this restriction I wonder why it didn't make sense to go the
> HSA "backend" route emitting PTX from a GIMPLE SSA pass.  This
> would have avoided the LTO dance as well ...

Quite simple - there isn't an established way to do this. If I'd known 
you were doing something like this when I started the work I might have 
looked into that approach.


Bernd

* Re: The nvptx port [0/11+]
  2014-10-21  9:17 ` Jakub Jelinek
@ 2014-10-21 11:19   ` Bernd Schmidt
  0 siblings, 0 replies; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-21 11:19 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches

On 10/21/2014 10:42 AM, Jakub Jelinek wrote:
> On Mon, Oct 20, 2014 at 04:17:56PM +0200, Bernd Schmidt wrote:
>>   * Can't emit initializers referring to their variable's address since
>>     you can't write forward declarations for variables.
>
> Can't that be handled by emitting the initializer without the address and
> some constructor that fixes up the initializer at runtime?

That reminds me that constructors are something I forgot to add to the 
list. I'm thinking about making these work with some trickery in the 
"linker", but at the moment they are unsupported.

> Can't you represent structures and unions as arrays of chars?
> For constant initializers that don't need relocations the compiler can
> surely turn them into arrays of char initializers (e.g. fold-const.c
> native_encode_expr/native_interpret_expr could be used for that).
> Supposedly it would mean slower than perhaps necessary loads/stores of
> aligned larger fields from the structure, but if it is an alternative to
> not supporting structures/unions at all, that sounds like such a severe
> limitation that it can be pretty fatal for the target.

Oh, structs and unions are supported, and essentially that's what I'm 
doing - I choose a base integer type to represent them. That happens to 
be the size of a pointer, so properly aligned symbol refs can be 
emitted. It's just the packed ones that can't be done.
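
A rough host-side sketch of that representation, under stated assumptions: the point struct and the as_words helper are made up for illustration, and std::uintptr_t stands in for the pointer-sized base integer type. It is not the backend code.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical aggregate whose pointer member is naturally aligned.
struct point
{
  std::int32_t x;
  std::int32_t y;
  const char *label;
};

// Re-express an object as an array of pointer-sized words, the way an
// initializer would have to be written if only scalars and arrays can
// be declared.
template <typename T>
std::vector<std::uintptr_t>
as_words (const T &obj)
{
  static_assert (sizeof (T) % sizeof (std::uintptr_t) == 0,
                 "object size must be a multiple of the word size");
  std::vector<std::uintptr_t> words (sizeof (T) / sizeof (std::uintptr_t));
  std::memcpy (words.data (), &obj, sizeof (T));
  return words;
}
```

Because label is aligned to the word size, it occupies exactly one array element, so a symbol reference can be written in that slot. In a packed layout the pointer could straddle two elements, and no single element initializer could express it.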

>>   * No support for indirect jumps, label values, nonlocal gotos.
>
> Not even indirect calls?  How do you implement C++ or Fortran vtables?

Indirect calls do exist.


Bernd

* Re: The nvptx port [0/11+]
  2014-10-21 10:57   ` Bernd Schmidt
@ 2014-10-21 11:27     ` Richard Biener
  0 siblings, 0 replies; 82+ messages in thread
From: Richard Biener @ 2014-10-21 11:27 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

On Tue, Oct 21, 2014 at 12:53 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 10/21/2014 10:18 AM, Richard Biener wrote:
>>
>> So with this restriction I wonder why it didn't make sense to go the
>> HSA "backend" route emitting PTX from a GIMPLE SSA pass.  This
>> would have avoided the LTO dance as well ...
>
>
> Quite simple - there isn't an established way to do this. If I'd known you
> were doing something like this when I started the work I might have looked
> into that approach.

Ah, I see.  I think having both ways now is good so we can compare
pros and cons in practice (and make further targets follow the better
approach if there is one).

Richard.

>
> Bernd
>

* Re: The nvptx port [1/11+] indirect jumps
  2014-10-20 14:21 ` The nvptx port [1/11+] indirect jumps Bernd Schmidt
@ 2014-10-21 18:29   ` Jeff Law
  2014-10-21 21:03     ` Bernd Schmidt
  2014-11-04 15:35   ` Bernd Schmidt
  1 sibling, 1 reply; 82+ messages in thread
From: Jeff Law @ 2014-10-21 18:29 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/20/14 14:19, Bernd Schmidt wrote:
> ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
> defined.  Add a sorry.
>
>
> Bernd
>
> 001-indjumps.diff
>
>
> 	gcc/
> 	* optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a
> 	sorry if necessary.
So doesn't this imply no hot-cold partitioning since we use indirect 
jumps to get across the partition?  Similarly doesn't this imply other 
missing features (setjmp/longjmp, nonlocal gotos, computed jumps)?

Do you need some mechanism to ensure that hot/cold partitioning isn't 
enabled?  Do you need some kind of message specific to the other 
features, or are we going to assume that the user will map from the 
indirect jump message back to the use of setjmp/longjmp or something 
similar?

How are switches implemented (if at all)?

Jeff

* Re: The nvptx port [2/11+] No register allocation
  2014-10-20 14:24 ` The nvptx port [2/11+] No register allocation Bernd Schmidt
@ 2014-10-21 18:36   ` Jeff Law
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Law @ 2014-10-21 18:36 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/20/14 14:20, Bernd Schmidt wrote:
> Since it's a virtual target, I've chosen not to run register allocation.
> This is one of the patches necessary to make that work, it primarily
> adds a target hook to disable it and fixes some of the fallout.
>
>
> Bernd
>
>
> 002-noregalloc.diff
>
>
> 	gcc/
> 	* target.def (no_register_allocation): New data hook.
> 	* doc/tm.texi.in: Add @hook TARGET_NO_REGISTER_ALLOCATION.
> 	* doc/tm.texi: Regenerate.
> 	* ira.c (gate_ira): New function.
> 	(pass_data_ira): Set has_gate.
> 	(pass_ira): Add a gate function.
> 	(pass_data_reload): Likewise.
> 	(pass_reload): Add a gate function.
> 	(pass_ira): Use it.
> 	* reload1.c (eliminate_regs): If reg_eliminte_is NULL, assert that
> 	no register allocation happens on the target and return.
> 	* final.c (alter_subreg): Ensure register is not a pseudo before
> 	calling simplify_subreg.
> 	(output_operand): Assert that x isn't a pseudo only if doing
> 	register allocation.
s/reg_eliminte/reg_eliminate/

Otherwise this looks fine. Note potential for rethinking this change at 
some point in the future as we get more experience with these kinds of 
targets.

Jeff

* Re: The nvptx port [3/11+] Struct returns
  2014-10-20 14:24 ` The nvptx port [3/11+] Struct returns Bernd Schmidt
@ 2014-10-21 18:41   ` Jeff Law
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Law @ 2014-10-21 18:41 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/20/14 14:22, Bernd Schmidt wrote:
> Even when returning a structure by passing an invisible reference, gcc
> still likes to set the return register to the address of the struct.
> This is undesirable on ptx where things like the return register have to
> be declared, and the function really returns void at ptx level. I've
> added a target hook to avoid this. I figure other targets might find it
> beneficial to omit this unnecessary set as well.
>
>
> Bernd
>
>
> 003-sretreg.diff
>
>
> 	gcc/
> 	* target.def (omit_struct_return_reg): New data hook.
> 	* doc/tm.texi.in: Add @hook TARGET_OMIT_STRUCT_RETURN_REG.
> 	* doc/tm.texi: Regenerate.
> 	* function.c (expand_function_end): Use it.
My first thought when reading this was surprise that we actually return
a value here, and a desire to just zap that code completely since there's
virtually no chance the optimizer will be able to delete it.

But then I remembered how much I hate dealing with this kind of ABI 
issue.  I suspect nobody actually specifies behavior here other than to 
indicate when pass by invisible reference is used and what register 
holds that incoming value.
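
For readers less familiar with the convention, the two behaviours can be written out at source level. This is only an analogy under stated assumptions: the real decision is made during RTL expansion, and the names big, fill_and_return_address and fill_only are hypothetical.

```cpp
// Hypothetical aggregate large enough to be returned in memory.
struct big
{
  int v[8];
};

// Conventional lowering: the caller passes the address of the result
// object, and the callee also hands that address back in the return
// register.
big *
fill_and_return_address (big *result)
{
  for (int i = 0; i < 8; i++)
    result->v[i] = i;
  return result;
}

// Lowering without the extra set: the function is effectively void,
// which matches a ptx-level declaration with no return value.
void
fill_only (big *result)
{
  for (int i = 0; i < 8; i++)
    result->v[i] = i;
}
```

Both variants leave the same contents in the caller's object; they differ only in whether a return register has to exist and be set.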

Sooooo, OK for the trunk.

jeff

* Re: The nvptx port [4/11+] Post-RA pipeline
  2014-10-20 14:27 ` The nvptx port [4/11+] Post-RA pipeline Bernd Schmidt
@ 2014-10-21 18:42   ` Jeff Law
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Law @ 2014-10-21 18:42 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/20/14 14:24, Bernd Schmidt wrote:
> This stops most of the post-regalloc passes from being run if the target
> doesn't want register allocation. I'd previously moved them all out of
> postreload to the toplevel, but Jakub (I think) pointed out that the
> idea is not to run them to avoid crashes if reload fails e.g. for an
> invalid asm. So I've made a new container pass.
>
> A later patch will make thread_prologue_and_epilogue_insns callable from
> the backend.
>
>
> Bernd
>
>
> 004-postra.diff
>
>
> 	gcc/
> 	* passes.def (pass_compute_alignments, pass_duplicate_computed_gotos,
> 	pass_variable_tracking, pass_free_cfg, pass_machine_reorg,
> 	pass_cleanup_barriers, pass_delay_slots,
> 	pass_split_for_shorten_branches, pass_convert_to_eh_region_ranges,
> 	pass_shorten_branches, pass_est_nothrow_function_flags,
> 	pass_dwarf2_frame, pass_final): Move outside of pass_postreload and
> 	into pass_late_compilation.
> 	(pass_late_compilation): Add.
> 	* passes.c (pass_data_late_compilation, pass_late_compilation,
> 	make_pass_late_compilation): New.
> 	* timevar.def (TV_LATE_COMPILATION): New.
OK.
jeff


* Re: The nvptx port [5/11+] Variable declarations
  2014-10-20 14:27 ` The nvptx port [5/11+] Variable declarations Bernd Schmidt
@ 2014-10-21 18:44   ` Jeff Law
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Law @ 2014-10-21 18:44 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/20/14 14:25, Bernd Schmidt wrote:
> ptx assembly follows rather different rules than what's typical
> elsewhere. We need a new hook to add a " };" string when we are finished
> outputting a variable with an initializer.
>
>
> Bernd
>
>
> 005-declend.diff
>
>
> 	gcc/
> 	* target.def (decl_end): New hook.
> 	* varasm.c (assemble_variable_contents, assemble_constant_contents):
> 	Use it.
> 	* doc/tm.texi.in (TARGET_ASM_DECL_END): Add.
> 	* doc/tm.texi: Regenerate.
Ok.
jeff
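
As an illustration of why a closing hook is needed: a ptx variable with an
initializer is written as one statement that opens with "= {" and must be
closed after the last element.  The following is a minimal sketch only; the
function names, the buffer-based output and the exact declaration text are
assumptions for illustration and are not taken from the patch.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Append formatted text at offset LEN in BUF (capacity SIZE, LEN < SIZE)
   and return the new length, clamped to SIZE - 1 on truncation.  */
static size_t
append (char *buf, size_t size, size_t len, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  int n = vsnprintf (buf + len, size - len, fmt, ap);
  va_end (ap);
  if (n < 0)
    return len;
  size_t room = size - len - 1;
  return len + ((size_t) n < room ? (size_t) n : room);
}

/* Write the start of an initialized array declaration and its elements,
   leaving the initializer open.  */
static size_t
emit_decl_contents (char *buf, size_t size, size_t len, const char *name,
                    const unsigned *vals, size_t n)
{
  len = append (buf, size, len, ".global .align 4 .u32 %s[%zu] = { ", name, n);
  for (size_t i = 0; i < n; i++)
    len = append (buf, size, len, "%s%u", i ? ", " : "", vals[i]);
  return len;
}

/* Hypothetical stand-in for the decl_end hook: close the initializer.  */
static size_t
emit_decl_end (char *buf, size_t size, size_t len)
{
  return append (buf, size, len, " };\n");
}
```

Calling emit_decl_contents followed by emit_decl_end yields a single complete
declaration; without the second call the statement is left unterminated.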


* Re: The nvptx port [6/11+] Pseudo call args
  2014-10-20 14:31 ` The nvptx port [6/11+] Pseudo call args Bernd Schmidt
@ 2014-10-21 18:56   ` Jeff Law
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Law @ 2014-10-21 18:56 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/20/14 14:26, Bernd Schmidt wrote:
> On ptx, we'll be using pseudos to pass function args as well, and
> there's one assert that needs to be toned down to make that work.
>
>
> Bernd
>
>
> 006-usereg.diff
>
>
> 	gcc/
> 	* expr.c (use_reg_mode): Just return for pseudo registers.
OK.

I pondered asking for this to be conditional on the 
no-register-allocation setting, but then decided it wasn't worth 
the effort.

jeff


* Re: The nvptx port [1/11+] indirect jumps
  2014-10-21 18:29   ` Jeff Law
@ 2014-10-21 21:03     ` Bernd Schmidt
  2014-10-21 21:30       ` Jakub Jelinek
  0 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-21 21:03 UTC (permalink / raw)
  To: Jeff Law, GCC Patches

On 10/21/2014 08:26 PM, Jeff Law wrote:
>>     * optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a
>>     sorry if necessary.
> So doesn't this imply no hot-cold partitioning since we use indirect
> jumps to get across the partition?  Similarly doesn't this imply other
> missing features (setjmp/longjmp, nonlocal gotos, computed jumps)?

Pretty much yes to all.

> Do you need some mechanism to ensure that hot/cold partitioning isn't
> enabled?

I guess I could clear flag_reorder_blocks_and_partition in 
nvptx_option_override. The problem hasn't come up so far.

> Do you need some kind of message specific to the other
> features, or are we going to assume that the user will map from the
> indirect jump message back to the use of setjmp/longjmp or something
> similar?

I have some sorry calls in things like a dummy nonlocal_goto pattern. It 
doesn't quite manage to catch everything without an ICE yet though.

> How are switches implemented (if at all)?

Comparison tree as you'd generate for small switches on all other targets.
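
To make "comparison tree" concrete, here is a hand-written sketch of the shape
such a lowering takes: a four-case switch expressed with compares and direct
branches only, so no jump table or indirect jump is required.  The case values
and the function name are made up for illustration.

```c
/* Equivalent of
     switch (x) { case 1: return 10; case 3: return 30;
                  case 5: return 50; case 7: return 70;
                  default: return -1; }
   written as a balanced tree of comparisons.  */
static int
dispatch (int x)
{
  if (x < 5)
    {
      /* Lower half of the case range.  */
      if (x == 1)
        return 10;
      if (x == 3)
        return 30;
      return -1;
    }
  /* Upper half of the case range.  */
  if (x == 5)
    return 50;
  if (x == 7)
    return 70;
  return -1;
}
```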


Bernd


* Re: The nvptx port [7/11+] Inform the port about call arguments
  2014-10-20 14:32 ` The nvptx port [7/11+] Inform the port about call arguments Bernd Schmidt
@ 2014-10-21 21:25   ` Jeff Law
  2014-10-21 21:33     ` Bernd Schmidt
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff Law @ 2014-10-21 21:25 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/20/14 14:29, Bernd Schmidt wrote:
> In ptx assembly we need to decorate call insns with the arguments that
> are being passed. We also need to know the exact function type. This is
> kind of hard to do with the existing infrastructure since things like
> function_arg are called at other times rather than just when emitting a
> call, so this patch adds two more hooks, one called just before argument
> registers are loaded (once for each arg), and the other just after the
> call is complete.
>
>
> Bernd
>
>
> 007-callargs.diff
>
>
> 	gcc/
> 	* target.def (call_args, end_call_args): New hooks.
> 	* hooks.c (hook_void_rtx_tree): New empty function.
> 	* hooks.h (hook_void_rtx_tree): Declare.
> 	* doc/tm.texi.in (TARGET_CALL_ARGS, TARGET_END_CALL_ARGS): Add.
> 	* doc/tm.texi: Regenerate.
> 	* calls.c (expand_call): Slightly rearrange the code.  Use the two new
> 	hooks.
> 	(expand_library_call_value_1): Use the two new hooks.
How exactly do you need to decorate?  Just mention the register, size 
information or do you need full type information?

We've had targets where we had to indicate register banks for each 
argument.  Those would walk CALL_INSN_FUNCTION_USAGE to find the 
argument registers, then from the register # we would know which 
register bank to use.   Would that work for you?

Jeff




* Re: The nvptx port [1/11+] indirect jumps
  2014-10-21 21:03     ` Bernd Schmidt
@ 2014-10-21 21:30       ` Jakub Jelinek
  2014-10-21 21:37         ` Bernd Schmidt
  0 siblings, 1 reply; 82+ messages in thread
From: Jakub Jelinek @ 2014-10-21 21:30 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jeff Law, GCC Patches

On Tue, Oct 21, 2014 at 11:00:35PM +0200, Bernd Schmidt wrote:
> On 10/21/2014 08:26 PM, Jeff Law wrote:
> >>    * optabs.c (emit_indirect_jump): Test HAVE_indirect_jump and emit a
> >>    sorry if necessary.
> >So doesn't this imply no hot-cold partitioning since we use indirect
> >jumps to get across the partition?  Similarly doesn't this imply other
> >missing features (setjmp/longjmp, nonlocal gotos, computed jumps)?
> 
> Pretty much yes to all.
> 
> >Do you need some mechanism to ensure that hot/cold partitioning isn't
> >enabled?
> 
> I guess I could clear flag_reorder_blocks_and_partition in
> nvptx_option_override. The problem hasn't come up so far.
> 
> >Do you need some kind of message specific to the other
> >features, or are we going to assume that the user will map from the
> >indirect jump message back to the use of setjmp/longjmp or something
> >similar?
> 
> I have some sorry calls in things like a dummy nonlocal_goto pattern. It
> doesn't quite manage to catch everything without an ICE yet though.

With all the sorry additions, what is actually the plan for OpenMP (dunno how
OpenACC is different in this regard)?
At least for OpenMP, the best would be if the #pragma omp target regions
and/or #pragma omp declare target functions contain anything a particular
offloading accelerator can't handle, instead of failing the whole
compilation perhaps just emit some at least by default non-fatal warning
and not emit anything for the particular offloading target, which would mean
either host fallback, or, if some other offloading target succeeded, just
that target.
The unsupported stuff can be machine dependent builtins that can't be
transformed, or e.g. the various things you've listed as unsupportable by
the PTX backend right now.

	Jakub


* Re: The nvptx port [7/11+] Inform the port about call arguments
  2014-10-21 21:25   ` Jeff Law
@ 2014-10-21 21:33     ` Bernd Schmidt
  2014-10-21 21:55       ` Jeff Law
  0 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-21 21:33 UTC (permalink / raw)
  To: Jeff Law, GCC Patches

On 10/21/2014 11:11 PM, Jeff Law wrote:
> On 10/20/14 14:29, Bernd Schmidt wrote:
>> In ptx assembly we need to decorate call insns with the arguments that
>> are being passed. We also need to know the exact function type. This is
>> kind of hard to do with the existing infrastructure since things like
>> function_arg are called at other times rather than just when emitting a
>> call, so this patch adds two more hooks, one called just before argument
>> registers are loaded (once for each arg), and the other just after the
>> call is complete.
>>
> How exactly do you need to decorate?  Just mention the register, size
> information or do you need full type information?

A normal call looks like

{
   .param.u32 %retval_in;
   .param.u64 %out_arg0;
   st.param.u64 [%out_arg0], %r1400;
   call (%retval_in), PopCnt, (%out_arg0);
   ld.param.u32    %r1403, [%retval_in];
}

which declares local variables for the args and return value, stores the 
pseudos in the outgoing args, calls the function with explicitly named 
args and return values, and loads the incoming return value. All this is 
produced by nvptx_output_call_insn for a single CALL rtx insn.
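
As a rough sketch of what producing that text involves, the function below
formats the same sequence for a direct call with one 64-bit argument and a
32-bit return value.  It is not the code from nvptx_output_call_insn; the
function name, its parameters and the fixed argument layout are assumptions
made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a ptx direct call to CALLEE that passes the 64-bit pseudo
   %r<ARG_REG> and loads the 32-bit result into pseudo %r<RET_REG>.
   Returns what snprintf returns: the length the full text needs.  */
static int
format_ptx_call (char *buf, size_t size, const char *callee,
                 unsigned arg_reg, unsigned ret_reg)
{
  return snprintf (buf, size,
                   "{\n"
                   "  .param.u32 %%retval_in;\n"
                   "  .param.u64 %%out_arg0;\n"
                   "  st.param.u64 [%%out_arg0], %%r%u;\n"
                   "  call (%%retval_in), %s, (%%out_arg0);\n"
                   "  ld.param.u32 %%r%u, [%%retval_in];\n"
                   "}\n",
                   arg_reg, callee, ret_reg);
}
```

The point of the sketch is that every piece of the block (the .param
declarations, the store, the call operands and the load) depends on the
argument list and return type, which is the information the new hooks collect.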

Indirect calls additionally need to produce a .callprototype pseudo-op 
which looks like a function declaration; for normal calls the called 
function must already be declared elsewhere. The machinery to produce 
such .callprototypes is also used to produce a ptx decl from the call 
insn for an external K&R declaration with no argument types.

> We've had targets where we had to indicate register banks for each
> argument.  Those would walk CALL_INSN_FUNCTION_USAGE to find the
> argument registers, then from the register # we would know which
> register bank to use.   Would that work for you?

Couple of problems with this - the fusage isn't available to gen_call, 
it gets added to the call insn after it is emitted, but the backend 
would like to have this information when emitting the insn. Also, I'd 
need the order to be reliable and I don't think CALL_INSN_FUNCTION_USAGE 
is really designed to guarantee that (I suspect the order of register 
args and things like the struct return reg is wrong). I also need the 
exact function type and the call_args hook seems like the easiest way to 
communicate it to the backend.


Bernd


* Re: The nvptx port [1/11+] indirect jumps
  2014-10-21 21:30       ` Jakub Jelinek
@ 2014-10-21 21:37         ` Bernd Schmidt
  2014-10-22  8:21           ` Richard Biener
  0 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-21 21:37 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Jeff Law, GCC Patches

On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
> At least for OpenMP, the best would be if the #pragma omp target regions
> and/or #pragma omp declare target functions contain anything a particular
> offloading accelerator can't handle, instead of failing the whole
> compilation perhaps just emit some at least by default non-fatal warning
> and not emit anything for the particular offloading target, which would mean
> either host fallback, or, if some other offloading target succeeded, just
> that target.

I guess a test could be added to mkoffload if gcc were to return a 
different value for a sorry vs. any other compilation failure. The tool 
could then choose not to produce offloading support for that target.


Bernd


* Re: The nvptx port [7/11+] Inform the port about call arguments
  2014-10-21 21:33     ` Bernd Schmidt
@ 2014-10-21 21:55       ` Jeff Law
  2014-10-21 22:16         ` Bernd Schmidt
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff Law @ 2014-10-21 21:55 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/21/14 21:29, Bernd Schmidt wrote:
>
> A normal call looks like
>
> {
>    .param.u32 %retval_in;
>    .param.u64 %out_arg0;
>    st.param.u64 [%out_arg0], %r1400;
>    call (%retval_in), PopCnt, (%out_arg0);
>    ld.param.u32    %r1403, [%retval_in];
> }
>
> which declares local variables for the args and return value, stores the
> pseudos in the outgoing args, calls the function with explicitly named
> args and return values, and loads the incoming return value. All this is
> produced by nvptx_output_call_insn for a single CALL rtx insn.
So far, so good.

>
> Indirect calls additionally need to produce a .callprototype pseudo-op
> which looks like a function declaration; for normal calls the called
> function must already be declared elsewhere. The machinery to produce
> such .callprototypes is also used to produce a ptx decl from the call
> insn for an external K&R declaration with no argument types.
Yea, no surprise here.


> Couple of problems with this - the fusage isn't available to gen_call,
> it gets added to the call insn after it is emitted, but the backend
> would like to have this information when emitting the insn.
Right.  Targets which have needed this emit the decorations at 
insn-output time so the fusage has been attached.


> Also, I'd
> need the order to be reliable and I don't think CALL_INSN_FUNCTION_USAGE
> is really designed to guarantee that (I suspect the order of register
> args and things like the struct return reg is wrong). I also need the
> exact function type and the call_args hook seems like the easiest way to
> communicate it to the backend.
We've depended on the ordering in the PA, well, forever.  However, I 
doubt ordering of regs in the fusage is documented at all!  We could 
change that.

So, in the end I'm torn.  I don't like adding new hooks when they're not 
needed, but I have some reservations about relying on the order of stuff 
in CALL_INSN_FUNCTION_USAGE and I worry a bit that you might end up with 
stuff other than arguments on that list -- the PA port could filter on 
the hard registers used for passing arguments, so other stuff appearing 
isn't a big deal.

Let me sleep on this one :-)
Jeff


* Re: The nvptx port [8/11+] Write undefined decls.
  2014-10-20 14:32 ` The nvptx port [8/11+] Write undefined decls Bernd Schmidt
@ 2014-10-21 22:07   ` Jeff Law
  2014-10-21 22:30     ` Bernd Schmidt
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff Law @ 2014-10-21 22:07 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/20/14 14:30, Bernd Schmidt wrote:
> ptx assembly requires that declarations are written for undefined
> variables. This adds that functionality.
>
>
> Bernd
>
>
> 008-undefdecl.diff
>
>
> 	gcc/
> 	* target.def (assemble_undefined_decl): New hooks.
> 	* hooks.c (hook_void_FILEptr_constcharptr_const_tree): New function.
> 	* hooks.h (hook_void_FILEptr_constcharptr_const_tree): Declare.
> 	* doc/tm.texi.in (TARGET_ASM_ASSEMBLE_UNDEFINED_DECL): Add.
> 	* doc/tm.texi: Regenerate.
> 	* output.h (assemble_undefined_decl): Declare.
> 	(get_fnname_from_decl): Declare.
> 	* varasm.c (assemble_undefined_decl): New function.
> 	(get_fnname_from_decl): New function.
> 	* final.c (rest_of_handle_final): Use it.
> 	* varpool.c (varpool_output_variables): Call assemble_undefined_decl
> 	for nodes without a definition.
Does this need to happen at the use site, or can it be deferred?

The PA had to do something similar.  We built up a vector of every 
external object in ASM_OUTPUT_EXTERNAL, but did not emit anything.

Then in ASM_FILE_END, we walked that vector and anything that was 
actually referenced (as opposed to just declared) we would emit the 
magic .IMPORT lines.
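
For readers unfamiliar with that scheme, here is a small self-contained sketch
of the record-now, emit-at-end pattern.  It is not the PA backend's code: the
data structure, the function names, the fixed-size table and the simplified
import line are all assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_EXTERNALS 64

struct external
{
  const char *name;
  bool referenced;
};

static struct external externals[MAX_EXTERNALS];
static size_t n_externals;

/* Record NAME when its declaration is seen; emit nothing yet.  */
static void
note_external (const char *name)
{
  if (n_externals < MAX_EXTERNALS)
    externals[n_externals++] = (struct external) { name, false };
}

/* Mark NAME as actually used by the generated code.  */
static void
mark_referenced (const char *name)
{
  for (size_t i = 0; i < n_externals; i++)
    if (strcmp (externals[i].name, name) == 0)
      externals[i].referenced = true;
}

/* At end of file, write one import line per referenced external.
   Returns the number of lines written.  */
static int
emit_imports (FILE *out)
{
  int count = 0;
  for (size_t i = 0; i < n_externals; i++)
    if (externals[i].referenced)
      {
        fprintf (out, "\t.IMPORT %s\n", externals[i].name);
        count++;
      }
  return count;
}
```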

Jeff


* Re: The nvptx port [9/11+] Epilogues
  2014-10-20 14:35 ` The nvptx port [9/11+] Epilogues Bernd Schmidt
@ 2014-10-21 22:08   ` Jeff Law
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Law @ 2014-10-21 22:08 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/20/14 14:32, Bernd Schmidt wrote:
> We skip the late compilation passes on ptx, but there's one piece we do
> need - fixing up the function so that we get return insns in the right
> places. This patch just makes thread_prologue_and_epilogue_insns
> callable from the reorg pass.
>
>
> Bernd
>
> 009-proep.diff
>
>
> 	gcc/
> 	* function.c (thread_prologue_and_epilogue_insns): No longer static.
> 	* function.h (thread_prologue_and_epilogue_insns): Declare.
OK.
Jeff


* Re: The nvptx port [7/11+] Inform the port about call arguments
  2014-10-21 21:55       ` Jeff Law
@ 2014-10-21 22:16         ` Bernd Schmidt
  2014-10-22 18:23           ` Jeff Law
  0 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-21 22:16 UTC (permalink / raw)
  To: Jeff Law, GCC Patches

On 10/21/2014 11:53 PM, Jeff Law wrote:

> So, in the end I'm torn.  I don't like adding new hooks when they're not
> needed, but I have some reservations about relying on the order of stuff
> in CALL_INSN_FUNCTION_USAGE and I worry a bit that you might end up with
> stuff other than arguments on that list -- the PA port could filter on
> the hard registers used for passing arguments, so other stuff appearing
> isn't a big deal.

This is another worry. Also, at the moment we don't actually add the 
pseudos to CALL_INSN_FUNCTION_USAGE (that's patch 6/11), we use the regs 
saved by the call_args hook to make proper USEs in a PARALLEL. I'm not 
convinced the rest of the compiler would be too happy to see pseudos there.

So, in all I'd say it's probably possible to do it that way, but it 
feels a lot iffier than I'd be happy with. I for one didn't know about 
the PA requirement, so I could easily have broken it unknowingly if I'd 
made some random change modifying call expansion.


Bernd


* Re: The nvptx port [8/11+] Write undefined decls.
  2014-10-21 22:07   ` Jeff Law
@ 2014-10-21 22:30     ` Bernd Schmidt
  2014-10-22 18:23       ` Jeff Law
  0 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-21 22:30 UTC (permalink / raw)
  To: Jeff Law, GCC Patches

On 10/22/2014 12:05 AM, Jeff Law wrote:
> On 10/20/14 14:30, Bernd Schmidt wrote:
>> ptx assembly requires that declarations are written for undefined
>> variables. This adds that functionality.
> Does this need to happen at the use site, or can it be deferred?

This is independent of use sites. The patch just adds another walk over 
the varpool to emit not just the defined vars.

Ideally we'd maintain an order that declares or defines every variable 
before it is referenced by an initializer, but the attempt to do that in 
the compiler totally failed due to references between constant pools and 
regular variables. The nvptx-as tool we have fixes up the order of 
declarations after the first compilation stage.

> The PA had to do something similar.  We built up a vector of every
> external object in ASM_OUTPUT_EXTERNAL, but did not emit anything.
>
> Then in ASM_FILE_END, we walked that vector and anything that was
> actually referenced (as opposed to just declared) we would emit the
> magic .IMPORT lines.

Sounds like the PA could use this hook to simplify its code quite a bit.

Looking at the patch again I noticed there's still some unrelated code 
in here - the patch used to be quite a lot larger and got shrunk due to 
the failure mentioned above. get_fnname_from_decl is just a new function 
broken out of rest_of_handle_final, it is used by the nvptx.c code.


Bernd


* Re: The nvptx port [1/11+] indirect jumps
  2014-10-21 21:37         ` Bernd Schmidt
@ 2014-10-22  8:21           ` Richard Biener
  2014-10-22  8:34             ` Jakub Jelinek
  2014-10-22  8:37             ` Thomas Schwinge
  0 siblings, 2 replies; 82+ messages in thread
From: Richard Biener @ 2014-10-22  8:21 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jakub Jelinek, Jeff Law, GCC Patches

On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
>>
>> At least for OpenMP, the best would be if the #pragma omp target regions
>> and/or #pragma omp declare target functions contain anything a particular
>> offloading accelerator can't handle, instead of failing the whole
>> compilation perhaps just emit some at least by default non-fatal warning
>> and not emit anything for the particular offloading target, which would
>> mean
>> either host fallback, or, if some other offloading target succeeded, just
>> that target.
>
>
> I guess a test could be added to mkoffload if gcc were to return a different
> value for a sorry vs. any other compilation failure. The tool could then
> choose not to produce offloading support for that target.

But that would be for the whole file instead of for the specific region?

So maybe we should produce one LTO offload object for each offload
function and make the symbols they are supposed to provide weak
so a fail doesn't end up failing to link the main program?

Looks like this gets somewhat awkward with the LTO setup.

Richard.

>
> Bernd
>


* Re: The nvptx port [1/11+] indirect jumps
  2014-10-22  8:21           ` Richard Biener
@ 2014-10-22  8:34             ` Jakub Jelinek
  2014-10-22  8:37             ` Thomas Schwinge
  1 sibling, 0 replies; 82+ messages in thread
From: Jakub Jelinek @ 2014-10-22  8:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: Bernd Schmidt, Jeff Law, GCC Patches

On Wed, Oct 22, 2014 at 10:18:49AM +0200, Richard Biener wrote:
> On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> > On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
> >>
> >> At least for OpenMP, the best would be if the #pragma omp target regions
> >> and/or #pragma omp declare target functions contain anything a particular
> >> offloading accelerator can't handle, instead of failing the whole
> >> compilation perhaps just emit some at least by default non-fatal warning
> >> and not emit anything for the particular offloading target, which would
> >> mean
> >> either host fallback, or, if some other offloading target succeeded, just
> >> that target.
> >
> >
> > I guess a test could be added to mkoffload if gcc were to return a different
> > value for a sorry vs. any other compilation failure. The tool could then
> > choose not to produce offloading support for that target.
> 
> But that would be for the whole file instead of for the specific region?
> 
> So maybe we should produce one LTO offload object for each offload
> function and make the symbols they are supposed to provide weak
> so a fail doesn't end up failing to link the main program?
> 
> Looks like this gets somewhat awkward with the LTO setup.

I don't think we want fine-grained granularity here; it will only
lead to significant nightmares.  E.g. a target region can call other target
functions, if a target function it calls (perhaps directly through a series
of other target functions, perhaps indirectly through function pointers
etc.) can't be supported by the host, you'd need to give up on offloading
all target regions that do or could invoke that.  That can be in another TU
within the same shared library etc.  And, if some regions are emitted and
others are not, #pragma omp target data will behave less predictably and
more confusingly, right now it can test, does this library have usable
offloading for everything it provides (i.e. libgomp would ask the plugin to
initialize offloading from the current shared library if not already done,
and if successful, say it supports offloading for the particular device and
map variables to that device as requested, otherwise it would just assume
only host fallback is possible and not really map anything).  When a target
region is hit, from either within the target data region or elsewhere, it is
already figured out if it has to fallback to host or not.

Now, if you have fine-grained offloading, 33.2% of target regions being
offloadable, the rest not, what would you actually do in target data region?
It doesn't generically know what target regions will be encountered.
So act as if offloading perhaps was possible?  But then at each target
region find out if it is really possible?

IMHO people who care about performance will use target regions with care,
with the offloading targets that they care about in mind; those who
don't care about that will either be lucky and have everything work out,
or they will just end up with host fallback.

	Jakub


* Re: The nvptx port [1/11+] indirect jumps
  2014-10-22  8:21           ` Richard Biener
  2014-10-22  8:34             ` Jakub Jelinek
@ 2014-10-22  8:37             ` Thomas Schwinge
  2014-10-22 10:03               ` Richard Biener
  1 sibling, 1 reply; 82+ messages in thread
From: Thomas Schwinge @ 2014-10-22  8:37 UTC (permalink / raw)
  To: Richard Biener, Bernd Schmidt; +Cc: Jakub Jelinek, Jeff Law, GCC Patches


Hi!

On Wed, 22 Oct 2014 10:18:49 +0200, Richard Biener <richard.guenther@gmail.com> wrote:
> On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> > On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
> >>
> >> At least for OpenMP, the best would be if the #pragma omp target regions
> >> and/or #pragma omp declare target functions contain anything a particular
> >> offloading accelerator can't handle, instead of failing the whole
> >> compilation perhaps just emit some at least by default non-fatal warning
> >> and not emit anything for the particular offloading target, which would
> >> mean
> >> either host fallback, or, if some other offloading target succeeded, just
> >> that target.
> >
> >
> > I guess a test could be added to mkoffload if gcc were to return a different
> > value for a sorry vs. any other compilation failure. The tool could then
> > choose not to produce offloading support for that target.
> 
> But that would be for the whole file instead of for the specific region?

I'm not sure that's what you're suggesting, but at least on non-shared
memory offloading devices, you can't switch arbitrarily between
offloading device(s) and host-fallback, for you have to do data
management between the non-shared memories.

> So maybe we should produce one LTO offload object for each offload
> function and make the symbols they are supposed to provide weak
> so a fail doesn't end up failing to link the main program?
> 
> Looks like this gets somewhat awkward with the LTO setup.


Grüße,
 Thomas



* Re: The nvptx port [1/11+] indirect jumps
  2014-10-22  8:37             ` Thomas Schwinge
@ 2014-10-22 10:03               ` Richard Biener
  2014-10-22 10:32                 ` Jakub Jelinek
  0 siblings, 1 reply; 82+ messages in thread
From: Richard Biener @ 2014-10-22 10:03 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Bernd Schmidt, Jakub Jelinek, Jeff Law, GCC Patches

On Wed, Oct 22, 2014 at 10:34 AM, Thomas Schwinge
<thomas@codesourcery.com> wrote:
> Hi!
>
> On Wed, 22 Oct 2014 10:18:49 +0200, Richard Biener <richard.guenther@gmail.com> wrote:
>> On Tue, Oct 21, 2014 at 11:32 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
>> > On 10/21/2014 11:30 PM, Jakub Jelinek wrote:
>> >>
>> >> At least for OpenMP, the best would be if the #pragma omp target regions
>> >> and/or #pragma omp declare target functions contain anything a particular
>> >> offloading accelerator can't handle, instead of failing the whole
>> >> compilation perhaps just emit some at least by default non-fatal warning
>> >> and not emit anything for the particular offloading target, which would
>> >> mean
>> >> either host fallback, or, if some other offloading target succeeded, just
>> >> that target.
>> >
>> >
>> > I guess a test could be added to mkoffload if gcc were to return a different
>> > value for a sorry vs. any other compilation failure. The tool could then
>> > choose not to produce offloading support for that target.
>>
>> But that would be for the whole file instead of for the specific region?
>
> I'm not sure that's what you're suggesting, but at least on non-shared
> memory offloading devices, you can't switch arbitrarily between
> offloading device(s) and host-fallback, for you have to do data
> management between the non-shared memories.

Oh, I see.  For HSA we simply don't emit an offload variant for code
we cannot handle.  But only for those parts.

So it's only offload or fallback for other devices?  Thus also never
share work between both for example (run N threads on the CPU
and M threads on the offload target)?

Richard.

>> So maybe we should produce one LTO offload object for each offload
>> function and make the symbols they are supposed to provide weak
>> so a fail doesn't end up failing to link the main program?
>>
>> Looks like this gets somewhat awkward with the LTO setup.
>
>
> Grüße,
>  Thomas


* Re: The nvptx port [1/11+] indirect jumps
  2014-10-22 10:03               ` Richard Biener
@ 2014-10-22 10:32                 ` Jakub Jelinek
  0 siblings, 0 replies; 82+ messages in thread
From: Jakub Jelinek @ 2014-10-22 10:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: Thomas Schwinge, Bernd Schmidt, Jeff Law, GCC Patches

On Wed, Oct 22, 2014 at 12:02:16PM +0200, Richard Biener wrote:
> > I'm not sure that's what you're suggesting, but at least on non-shared
> > memory offloading devices, you can't switch arbitrarily between
> > offloading device(s) and host-fallback, for you have to do data
> > management between the non-shared memories.
> 
> Oh, I see.  For HSA we simply don't emit an offload variant for code
> we cannot handle.  But only for those parts.
> 
> So it's only offload or fallback for other devices?  Thus also never

Yeah.

> share work between both for example (run N threads on the CPU
> and M threads on the offload target)?

I believe at least for the non-shared memory the OpenMP model wouldn't allow
that.  Of course, the user can do the sharing explicitly (though OpenMP 4.0
doesn't have asynchronous target regions): one could e.g. run a couple of
host tasks that enter the offloading region with if (0), i.e. forced host
fallback, ensuring e.g. one team and one parallel thread in that case, and
then one host task that enters it with if (1) and uses as many teams and
parallel threads as are available on the offloading device.
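
A hedged sketch of that idea follows: two host tasks, one entering a target
region with if (0) so it always runs on the host, the other with if (1) so it
may be offloaded.  The function name, the way the work is split and the exact
map clauses are assumptions for illustration; when compiled without OpenMP
support the pragmas are ignored and the code simply runs serially.

```c
#include <stddef.h>

/* Sum the first SPLIT elements of A with forced host fallback and the
   remaining N - SPLIT elements in a target region that may be offloaded.  */
static long
split_sum (const int *a, size_t n, size_t split)
{
  long host_part = 0, dev_part = 0;

  #pragma omp parallel
  #pragma omp single
  {
    #pragma omp task shared(host_part)
    {
      /* if(0) forces host fallback; the loop runs on a single thread.  */
      #pragma omp target if(0) map(to: a[0:split]) map(tofrom: host_part)
      for (size_t i = 0; i < split; i++)
        host_part += a[i];
    }
    #pragma omp task shared(dev_part)
    {
      /* if(1) allows offloading with as many teams/threads as available.  */
      #pragma omp target if(1) map(to: a[split:n-split]) map(tofrom: dev_part)
      #pragma omp teams distribute parallel for reduction(+: dev_part)
      for (size_t i = split; i < n; i++)
        dev_part += a[i];
    }
    #pragma omp taskwait
  }
  return host_part + dev_part;
}
```

Because target regions are synchronous in OpenMP 4.0, the overlap between host
and device work comes only from the two enclosing host tasks.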

	Jakub


* Re: The nvptx port [10/11+] Target files
  2014-10-20 14:50 ` The nvptx port [10/11+] Target files Bernd Schmidt
@ 2014-10-22 18:12   ` Jeff Law
  2014-10-28 15:10     ` Bernd Schmidt
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff Law @ 2014-10-22 18:12 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/20/14 08:33, Bernd Schmidt wrote:
> These are the main target files for the ptx port. t-nvptx is empty for
> now but will grow some content with follow up patches.
>
>
> Bernd
>
>
> 010-target.diff
>
>
> 	* configure.ac: Allow configuring lto for nvptx.
> 	* configure: Regenerate.
>
> 	gcc/
> 	* config/nvptx/nvptx.c: New file.
> 	* config/nvptx/nvptx.h: New file.
> 	* config/nvptx/nvptx-protos.h: New file.
> 	* config/nvptx/nvptx.md: New file.
> 	* config/nvptx/t-nvptx: New file.
> 	* config/nvptx/nvptx.opt: New file.
> 	* common/config/nvptx/nvptx-common.c: New file.
> 	* config.gcc: Handle nvptx-*-*.
>
> 	libgcc/
> 	* config.host: Handle nvptx-*-*.
> 	* config/nvptx/t-nvptx: New file.
> 	* config/nvptx/crt0.s: New file.
Please make sure all the functions in nvptx.c have function comments. 
nvptx_split_reg_p, write_as_kernel, nvptx_write_function_decl, 
write_function_decl_only, nvptx_function_incoming_arg, 
nvptx_promote_function_mode, nvptx_maybe_convert_symbolic_operand, etc.

There are many others.  A scan over that entire file would be appreciated.


>
> ------------------------------------------------------------------------
> +
> +/* TARGET_FUNCTION_VALUE implementation.  Returns an RTX representing the place
> +   where function FUNC returns or receives a value of data type TYPE.  */
> +
> +static rtx
> +nvptx_function_value (const_tree type, const_tree func ATTRIBUTE_UNUSED,
> +		      bool outgoing)
> +{
> +  int unsignedp = TYPE_UNSIGNED (type);
> +  enum machine_mode orig_mode = TYPE_MODE (type);
> +  enum machine_mode mode = promote_function_mode (type, orig_mode,
> +						  &unsignedp, NULL_TREE, 1);
> +  if (outgoing)
> +    return gen_rtx_REG (mode, 4);
> +  if (cfun->machine->start_call == NULL_RTX)
> +    /* Pretend to return in a hard reg for early uses before pseudos can be
> +       generated.  */
> +    return gen_rtx_REG (mode, 4);
> +  return gen_reg_rtx (mode);
Rather than magic register numbers, can you use something symbolic?

> +}
> +
> +/* Implement TARGET_LIBCALL_VALUE.  */
> +
> +static rtx
> +nvptx_libcall_value (enum machine_mode mode, const_rtx)
> +{
> +  if (cfun->machine->start_call == NULL_RTX)
> +    /* Pretend to return in a hard reg for early uses before pseudos can be
> +       generated.  */
> +    return gen_rtx_REG (mode, 4);
> +  return gen_reg_rtx (mode);
> +}
Similarly.


> +
> +/* Implement TARGET_FUNCTION_VALUE_REGNO_P.  */
> +
> +static bool
> +nvptx_function_value_regno_p (const unsigned int regno)
> +{
> +  return regno == 4;
> +}
Here too.


> +
> +bool
> +nvptx_hard_regno_mode_ok (int regno, enum machine_mode mode)
> +{
> +  if (regno != 4 || cfun == NULL || cfun->machine->ret_reg_mode == VOIDmode)
> +    return true;
> +  return mode == cfun->machine->ret_reg_mode;
> +}
Function comment.  Magic register #.


> +
> +const char *
> +nvptx_output_call_insn (rtx insn, rtx result, rtx callee)
If possible, promote first argument to rtx_insn *.

> +
> +/* Clean up subreg operands.  */
Which means what?  A little more descriptive here would be helpful.  I 
have a guess what you need to do here, but more commentary would be 
helpful for someone that hasn't read through the virtual PTX ISA.

The machine description is about what I would expect, in fact, it shows 
how "nice" a virtual ISA can be.

Overall it seems pretty reasonable.  Most of the difficulty appears to 
be interfacing with the 3rd party tools, but that's largely expected.

I'm surprised there's not more hair around the address space issues.  I 
expected more problems there.

I'm going to trust that all the ABI related stuff is correct.  I'm not 
going to second guess any of that stuff.

I think we've got a couple things to iterate on from yesterday and 
you've got some minor stuff to address as noted above, but this looks 
pretty close to being ready.


jeff

* Re: The nvptx port [8/11+] Write undefined decls.
  2014-10-21 22:30     ` Bernd Schmidt
@ 2014-10-22 18:23       ` Jeff Law
  2014-11-05 12:05         ` Bernd Schmidt
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff Law @ 2014-10-22 18:23 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/21/14 16:15, Bernd Schmidt wrote:
> On 10/22/2014 12:05 AM, Jeff Law wrote:
>> On 10/20/14 14:30, Bernd Schmidt wrote:
>>> ptx assembly requires that declarations are written for undefined
>>> variables. This adds that functionality.
>> Does this need to happen at the use site, or can it be deferred?
>
> This is independent of use sites. The patch just adds another walk over
> the varpool to emit not just the defined vars.
>
> Ideally we'd maintain an order that declares or defines every variable
> before it is referenced by an initializer, but the attempt to do that in
> the compiler totally failed due to references between constant pools and
> regular variables. The nvptx-as tool we have fixes up the order of
> declarations after the first compilation stage.
>
>> The PA had to do something similar.  We built up a vector of every
>> external object in ASM_OUTPUT_EXTERNAL, but did not emit anything.
>>
>> Then in ASM_FILE_END, we walked that vector and emitted the magic
>> .IMPORT lines for anything that was actually referenced (as opposed
>> to just declared).
>
> Sounds like the PA could use this hook to simplify its code quite a bit.
The PA stuff is a trivial amount of code :-)  But it is a bit awkward in 
that we're using a per-variable hook to stash, then the end-file hook to 
walk the stashed stuff.

IIRC, the problem is tentative definitions.  Otherwise we'd just emit 
the .import statements as we saw the declarations.  I believe that was 
to properly interface with the HP assembler/linker.
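
The stash-then-emit scheme described above can be sketched roughly as
follows.  This is a standalone illustration, not the PA backend's code:
the DeferredImports class, its method names and the exact ".IMPORT"
text are all hypothetical stand-ins for what ASM_OUTPUT_EXTERNAL and
the end-of-file hook would do.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical bookkeeping for one external symbol; not GCC's data structure.
struct ExternalSymbol {
  std::string name;
  bool referenced;  // true once the symbol is used, not merely declared
};

class DeferredImports {
public:
  // Per-symbol step: remember the external, emit nothing yet.
  void note_external(const std::string &name) {
    if (find(name) == nullptr)
      symbols_.push_back(ExternalSymbol{name, false});
  }

  // Record that a previously noted symbol is really used.
  void mark_referenced(const std::string &name) {
    ExternalSymbol *sym = find(name);
    assert(sym != nullptr && "symbol must be noted before it is referenced");
    sym->referenced = true;
  }

  // End-of-file step: one directive per referenced symbol, in the order
  // the symbols were first seen.  The directive text is a placeholder.
  std::vector<std::string> finish() const {
    std::vector<std::string> lines;
    for (const ExternalSymbol &sym : symbols_)
      if (sym.referenced)
        lines.push_back(".IMPORT " + sym.name);
    return lines;
  }

private:
  ExternalSymbol *find(const std::string &name) {
    for (ExternalSymbol &sym : symbols_)
      if (sym.name == name)
        return &sym;
    return nullptr;
  }

  std::vector<ExternalSymbol> symbols_;
};
```

Deferring the output to finish() is what lets a symbol that is declared
but never referenced be dropped.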

We also have to defer emitting plabels, but I can't recall the 
braindamage behind that.


I'm not going to insist you do this in the same way as the PA.  That was 
a different era -- we had significant motivation to make things work in 
such a way that everything could be buried in the pa specific files. 
That sometimes led to less than optimal approaches to fix certain problems.


Jeff

* Re: The nvptx port [7/11+] Inform the port about call arguments
  2014-10-21 22:16         ` Bernd Schmidt
@ 2014-10-22 18:23           ` Jeff Law
  2014-10-28 14:57             ` Bernd Schmidt
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff Law @ 2014-10-22 18:23 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/21/14 16:06, Bernd Schmidt wrote:
> On 10/21/2014 11:53 PM, Jeff Law wrote:
>
>> So, in the end I'm torn.  I don't like adding new hooks when they're not
>> needed, but I have some reservations about relying on the order of stuff
>> in CALL_INSN_FUNCTION_USAGE and I worry a bit that you might end up with
>> stuff other than arguments on that list -- the PA port could filter on
>> the hard registers used for passing arguments, so other stuff appearing
>> isn't a big deal.
>
> This is another worry. Also, at the moment we don't actually add the
> pseudos to CALL_INSN_FUNCTION_USAGE (that's patch 6/11), we use the regs
> saved by the call_args hook to make proper USEs in a PARALLEL. I'm not
> convinced the rest of the compiler would be too happy to see pseudos there.
>
> So, in all I'd say it's probably possible to do it that way, but it
> feels a lot iffier than I'd be happy with. I for one didn't know about
> the PA requirement, so I could easily have broken it unknowingly if I'd
> made some random change modifying call expansion.
Yea, let's keep your approach.  Just wanted to explore a bit since the 
PA seems to have a variety of similar characteristics.

jeff

* Re: The nvptx port [11/11] More tools.
  2014-10-20 14:58 ` The nvptx port [11/11] More tools Bernd Schmidt
  2014-10-21  0:16   ` Joseph S. Myers
@ 2014-10-22 20:40   ` Jeff Law
  2014-10-22 21:16     ` Bernd Schmidt
  2014-10-31 21:04   ` Jeff Law
  2 siblings, 1 reply; 82+ messages in thread
From: Jeff Law @ 2014-10-22 20:40 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/20/14 08:48, Bernd Schmidt wrote:
> This is a "bonus" optional patch which adds ar, ranlib, as and ld to the
> ptx port. This is not proper binutils; ar and ranlib are just linked to
> the host versions, and the other two tools have the following functions:
>
> * nvptx-as is required to convert the compiler output to actual valid
>    ptx assembly, primarily by reordering declarations and definitions.
>    Believe me when I say that I've tried to make that work in the
>    compiler itself and it's pretty much impossible without some really
>    invasive changes.
> * nvptx-ld is just a pseudo linker that works by concatenating ptx
>    input files and separating them with nul characters. Actual linking
>    is something that happens later, when calling CUDA library functions,
>    but existing build systems make it useful to have something called
>    "ld" which is able to bundle everything that's needed into a single
>    file, and this seemed to be the simplest way of achieving this.
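
The bundling that nvptx-ld is described as doing can be sketched as
below.  This is only an illustration under assumptions: the function
names are hypothetical, the inputs are modelled as in-memory strings
rather than files, and the NUL is treated as a separator between
inputs (the description does not say whether a trailing NUL is also
written).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Concatenate the contents of several ptx inputs into one bundle,
// placing a NUL character between consecutive inputs.
std::string bundle_ptx(const std::vector<std::string> &inputs) {
  std::string out;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    // ptx is text, so an input is assumed never to contain a NUL itself.
    assert(inputs[i].find('\0') == std::string::npos);
    if (i > 0)
      out.push_back('\0');
    out += inputs[i];
  }
  return out;
}

// Split a bundle back into its pieces at each NUL character.
std::vector<std::string> split_bundle(const std::string &bundle) {
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true) {
    std::size_t nul = bundle.find('\0', start);
    if (nul == std::string::npos) {
      parts.push_back(bundle.substr(start));
      break;
    }
    parts.push_back(bundle.substr(start, nul - start));
    start = nul + 1;
  }
  return parts;
}
```

Because ptx text never contains NUL, a later consumer can recover the
individual modules without any extra framing.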
>
> There's a toplevel configure.ac change necessary to make ar/ranlib
> useable by the libgcc build. Having some tools built like this has some
> precedent in t-vmsnative, but as Thomas noted it does make feature tests
> in gcc's configure somewhat ugly (but everything works well enough to
> build the compiler). The alternative here is to bundle all these files
> into a separate nvptx-tools package which users would have to download -
> something that would be nice to avoid.
>
> These tools currently require GNU extensions - something I probably
> ought to fix if we decide to add them to the gcc build itself.
Would these be more appropriate in binutils?

Jeff

* Re: The nvptx port [11/11] More tools.
  2014-10-22 20:40   ` Jeff Law
@ 2014-10-22 21:16     ` Bernd Schmidt
  2014-10-24 19:52       ` Jeff Law
  0 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-22 21:16 UTC (permalink / raw)
  To: Jeff Law, GCC Patches

On 10/22/2014 10:31 PM, Jeff Law wrote:
>> These tools currently require GNU extensions - something I probably
>> ought to fix if we decide to add them to the gcc build itself.
> Would these be more appropriate in binutils?

I don't think so, given that we don't need any piece of regular 
binutils. There's no meaningful way to build libbfd. It would be strange 
to build binutils and have everything that's normally part of it 
disabled at configure time.


Bernd

* Re: The nvptx port [11/11] More tools.
  2014-10-22 21:16     ` Bernd Schmidt
@ 2014-10-24 19:52       ` Jeff Law
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Law @ 2014-10-24 19:52 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/22/14 15:11, Bernd Schmidt wrote:
> On 10/22/2014 10:31 PM, Jeff Law wrote:
>>> These tools currently require GNU extensions - something I probably
>>> ought to fix if we decide to add them to the gcc build itself.
>> Would these be more appropriate in binutils?
>
> I don't think so, given that we don't need any piece of regular
> binutils. There's no meaningful way to build libbfd. It would be strange
> to build binutils and have everything that's normally part of it
> disabled at configure time.
Fair enough, but I'm having trouble seeing these in GCC.  Makes me 
wonder if they ought to be a package unto themselves, nvptxtools or 
somesuch.

Note that as a separate package, you don't have to remove the GNU 
extensions :-)

jeff

* Re: The nvptx port [7/11+] Inform the port about call arguments
  2014-10-22 18:23           ` Jeff Law
@ 2014-10-28 14:57             ` Bernd Schmidt
  2014-10-29 23:42               ` Jeff Law
  0 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-28 14:57 UTC (permalink / raw)
  To: Jeff Law, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 697 bytes --]

On 10/22/2014 08:12 PM, Jeff Law wrote:
> Yea, let's keep your approach.  Just wanted to explore a bit since the
> PA seems to have a variety of similar characteristics.

Here's an updated version of the patch. I experimented a little with ptx 
calling conventions and ran into an arg that had to be moved with 
memcpy, which exposed an ordering problem - all call_args were added to 
the memcpy call. So the invocation of the hook had to be moved downwards 
a bit, and the calculation of the return value needs to happen after it 
(since nvptx_function_value needs to know whether we are actually trying 
to construct a call at the moment).
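
The resulting order of events can be modelled with a small standalone
sketch.  It does not use GCC's types: Arg and trace_call are
hypothetical, an empty string stands in for a NULL_RTX register, and
the event strings only name the steps.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for one argument slot; an empty reg models args[i].reg == NULL_RTX.
struct Arg {
  std::string reg;
};

// Return the sequence of steps taken for one call, in order.
std::vector<std::string> trace_call(const std::vector<Arg> &args) {
  std::vector<std::string> events;

  // The call_args hook runs once per register argument ...
  bool any_regs = false;
  for (const Arg &a : args)
    if (!a.reg.empty()) {
      any_regs = true;
      events.push_back("call_args(" + a.reg + ")");
    }
  // ... or exactly once with pc_rtx when there is no register argument.
  if (!any_regs)
    events.push_back("call_args(pc_rtx)");

  // The return value location is computed only after the hook has run.
  events.push_back("compute return value location");
  events.push_back("emit call");

  // The end hook signals that argument and return registers are free again.
  events.push_back("end_call_args()");
  assert(events.back() == "end_call_args()");
  return events;
}
```

A target hook that records state in call_args can therefore rely on
that state being set by the time the return value location is
requested, and on end_call_args clearing it afterwards.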

Bootstrapped and tested on x86_64-linux, ok?


Bernd


[-- Attachment #2: 007-callargs.diff --]
[-- Type: text/x-patch, Size: 9038 bytes --]

	gcc/
	* target.def (call_args, end_call_args): New hooks.
	* hooks.c (hook_void_rtx_tree): New empty function.
	* hooks.h (hook_void_rtx_tree): Declare.
	* doc/tm.texi.in (TARGET_CALL_ARGS, TARGET_END_CALL_ARGS): Add.
	* doc/tm.texi: Regenerate.
	* calls.c (expand_call): Slightly rearrange the code.  Use the two new
	hooks.
	(expand_library_call_value_1): Use the two new hooks.

------------------------------------------------------------------------
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi.orig
+++ gcc/doc/tm.texi
@@ -4960,6 +4960,29 @@ except the last are treated as named.
 You need not define this hook if it always returns @code{false}.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_CALL_ARGS (rtx, @var{tree})
+While generating RTL for a function call, this target hook is invoked once
+for each argument passed to the function, either a register returned by
+@code{TARGET_FUNCTION_ARG} or a memory location.  It is called just
+before the point where argument registers are stored.  The type of the
+function to be called is also passed as the second argument; it is
+@code{NULL_TREE} for libcalls.  The @code{TARGET_END_CALL_ARGS} hook is
+invoked just after the code to copy the return reg has been emitted.
+This functionality can be used to perform special setup of call argument
+registers if a target needs it.
+For functions without arguments, the hook is called once with @code{pc_rtx}
+passed instead of an argument register.
+Most ports do not need to implement anything for this hook.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_END_CALL_ARGS (void)
+This target hook is invoked while generating RTL for a function call,
+just after the point where the return reg is copied into a pseudo.  It
+signals that all the call argument and return registers for the just
+emitted call are now no longer in use.
+Most ports do not need to implement anything for this hook.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_PRETEND_OUTGOING_VARARGS_NAMED (cumulative_args_t @var{ca})
 If you need to conditionally change ABIs so that one works with
 @code{TARGET_SETUP_INCOMING_VARARGS}, but the other works like neither
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in.orig
+++ gcc/doc/tm.texi.in
@@ -3856,6 +3856,10 @@ These machine description macros help im
 
 @hook TARGET_STRICT_ARGUMENT_NAMING
 
+@hook TARGET_CALL_ARGS
+
+@hook TARGET_END_CALL_ARGS
+
 @hook TARGET_PRETEND_OUTGOING_VARARGS_NAMED
 
 @node Trampolines
Index: gcc/hooks.c
===================================================================
--- gcc/hooks.c.orig
+++ gcc/hooks.c
@@ -245,6 +245,11 @@ hook_void_tree (tree a ATTRIBUTE_UNUSED)
 }
 
 void
+hook_void_rtx_tree (rtx, tree)
+{
+}
+
+void
 hook_void_constcharptr (const char *a ATTRIBUTE_UNUSED)
 {
 }
Index: gcc/hooks.h
===================================================================
--- gcc/hooks.h.orig
+++ gcc/hooks.h
@@ -71,6 +71,7 @@ extern void hook_void_constcharptr (cons
 extern void hook_void_rtx_insn_int (rtx_insn *, int);
 extern void hook_void_FILEptr_constcharptr (FILE *, const char *);
 extern bool hook_bool_FILEptr_rtx_false (FILE *, rtx);
+extern void hook_void_rtx_tree (rtx, tree);
 extern void hook_void_tree (tree);
 extern void hook_void_tree_treeptr (tree, tree *);
 extern void hook_void_int_int (int, int);
Index: gcc/target.def
===================================================================
--- gcc/target.def.orig
+++ gcc/target.def
@@ -3816,6 +3816,33 @@ not generate any instructions in this ca
  default_setup_incoming_varargs)
 
 DEFHOOK
+(call_args,
+ "While generating RTL for a function call, this target hook is invoked once\n\
+for each argument passed to the function, either a register returned by\n\
+@code{TARGET_FUNCTION_ARG} or a memory location.  It is called just\n\
+before the point where argument registers are stored.  The type of the\n\
+function to be called is also passed as the second argument; it is\n\
+@code{NULL_TREE} for libcalls.  The @code{TARGET_END_CALL_ARGS} hook is\n\
+invoked just after the code to copy the return reg has been emitted.\n\
+This functionality can be used to perform special setup of call argument\n\
+registers if a target needs it.\n\
+For functions without arguments, the hook is called once with @code{pc_rtx}\n\
+passed instead of an argument register.\n\
+Most ports do not need to implement anything for this hook.",
+ void, (rtx, tree),
+ hook_void_rtx_tree)
+
+DEFHOOK
+(end_call_args,
+ "This target hook is invoked while generating RTL for a function call,\n\
+just after the point where the return reg is copied into a pseudo.  It\n\
+signals that all the call argument and return registers for the just\n\
+emitted call are now no longer in use.\n\
+Most ports do not need to implement anything for this hook.",
+ void, (void),
+ hook_void_void)
+
+DEFHOOK
 (strict_argument_naming,
  "Define this hook to return @code{true} if the location where a function\n\
 argument is passed depends on whether or not it is a named argument.\n\
Index: gcc/calls.c
===================================================================
--- gcc/calls.c.orig
+++ gcc/calls.c
@@ -2978,32 +2978,6 @@ expand_call (tree exp, rtx target, int i
 
       funexp = rtx_for_function_call (fndecl, addr);
 
-      /* Figure out the register where the value, if any, will come back.  */
-      valreg = 0;
-      if (TYPE_MODE (rettype) != VOIDmode
-	  && ! structure_value_addr)
-	{
-	  if (pcc_struct_value)
-	    valreg = hard_function_value (build_pointer_type (rettype),
-					  fndecl, NULL, (pass == 0));
-	  else
-	    valreg = hard_function_value (rettype, fndecl, fntype,
-					  (pass == 0));
-
-	  /* If VALREG is a PARALLEL whose first member has a zero
-	     offset, use that.  This is for targets such as m68k that
-	     return the same value in multiple places.  */
-	  if (GET_CODE (valreg) == PARALLEL)
-	    {
-	      rtx elem = XVECEXP (valreg, 0, 0);
-	      rtx where = XEXP (elem, 0);
-	      rtx offset = XEXP (elem, 1);
-	      if (offset == const0_rtx
-		  && GET_MODE (where) == GET_MODE (valreg))
-		valreg = where;
-	    }
-	}
-
       /* Precompute all register parameters.  It isn't safe to compute anything
 	 once we have started filling any specific hard regs.  */
       precompute_register_parameters (num_actuals, args, &reg_parm_seen);
@@ -3082,6 +3056,42 @@ expand_call (tree exp, rtx target, int i
 		sibcall_failure = 1;
 	    }
 
+      bool any_regs = false;
+      for (i = 0; i < num_actuals; i++)
+	if (args[i].reg != NULL_RTX)
+	  {
+	    any_regs = true;
+	    targetm.calls.call_args (args[i].reg, funtype);
+	  }
+      if (!any_regs)
+	targetm.calls.call_args (pc_rtx, funtype);
+
+      /* Figure out the register where the value, if any, will come back.  */
+      valreg = 0;
+      if (TYPE_MODE (rettype) != VOIDmode
+	  && ! structure_value_addr)
+	{
+	  if (pcc_struct_value)
+	    valreg = hard_function_value (build_pointer_type (rettype),
+					  fndecl, NULL, (pass == 0));
+	  else
+	    valreg = hard_function_value (rettype, fndecl, fntype,
+					  (pass == 0));
+
+	  /* If VALREG is a PARALLEL whose first member has a zero
+	     offset, use that.  This is for targets such as m68k that
+	     return the same value in multiple places.  */
+	  if (GET_CODE (valreg) == PARALLEL)
+	    {
+	      rtx elem = XVECEXP (valreg, 0, 0);
+	      rtx where = XEXP (elem, 0);
+	      rtx offset = XEXP (elem, 1);
+	      if (offset == const0_rtx
+		  && GET_MODE (where) == GET_MODE (valreg))
+		valreg = where;
+	    }
+	}
+
       /* If register arguments require space on the stack and stack space
 	 was not preallocated, allocate stack space here for arguments
 	 passed in registers.  */
@@ -3430,6 +3440,8 @@ expand_call (tree exp, rtx target, int i
       for (i = 0; i < num_actuals; ++i)
 	free (args[i].aligned_regs);
 
+      targetm.calls.end_call_args ();
+
       insns = get_insns ();
       end_sequence ();
 
@@ -3956,6 +3968,18 @@ emit_library_call_value_1 (int retval, r
     }
 #endif
 
+  /* When expanding a normal call, args are stored in push order,
+     which is the reverse of what we have here.  */
+  bool any_regs = false;
+  for (int i = nargs; i-- > 0; )
+    if (argvec[i].reg != NULL_RTX)
+      {
+	targetm.calls.call_args (argvec[i].reg, NULL_TREE);
+	any_regs = true;
+      }
+  if (!any_regs)
+    targetm.calls.call_args (pc_rtx, NULL_TREE);
+
   /* Push the args that need to be pushed.  */
 
   /* ARGNUM indexes the ARGVEC array in the order in which the arguments
@@ -4196,6 +4220,8 @@ emit_library_call_value_1 (int retval, r
       valreg = gen_rtx_REG (TYPE_MODE (tfom), REGNO (valreg));
     }
 
+  targetm.calls.end_call_args ();
+
   /* For calls to `setjmp', etc., inform function.c:setjmp_warnings
      that it should complain if nonvolatile values are live.  For
      functions that cannot return, inform flow that control does not

* Re: The nvptx port [10/11+] Target files
  2014-10-22 18:12   ` Jeff Law
@ 2014-10-28 15:10     ` Bernd Schmidt
  2014-10-29 23:51       ` Jeff Law
  2014-11-04 16:48       ` The nvptx port [10/11+] Target files Richard Henderson
  0 siblings, 2 replies; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-28 15:10 UTC (permalink / raw)
  To: Jeff Law, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1433 bytes --]

On 10/22/2014 08:01 PM, Jeff Law wrote:
> Please make sure all the functions in nvptx.c have function comments.

Done, and replaced regno 4 with NVPTX_RETURN_REGNUM.
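
As a rough illustration of the change, here is a standalone sketch.
The value 4 and the name NVPTX_RETURN_REGNUM come from this thread;
how the constant is actually defined in the port (a macro in nvptx.h,
a define_constants entry, or something else) is not shown here, so the
constexpr form and the function name below are assumptions.

```cpp
#include <cassert>

// Symbolic name for the hard register used to return values.
// Assumed form; the port may define this differently.
constexpr unsigned int NVPTX_RETURN_REGNUM = 4;

// Same shape as nvptx_function_value_regno_p in the earlier patch, with
// the literal 4 replaced by the symbolic constant.
bool is_function_value_regno(unsigned int regno) {
  return regno == NVPTX_RETURN_REGNUM;
}
```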

>> +const char *
>> +nvptx_output_call_insn (rtx insn, rtx result, rtx callee)
> If possible, promote first argument to rtx_insn *.

Also done.

>> +/* Clean up subreg operands.  */
> Which means what?  A little more descriptive here would be helpful.

Expanded.

> I'm surprised there's not more hair around the address space issues.  I
> expected more problems there.

I have patches that expose all the address spaces to the middle-end 
through a lower-as pass that runs early. The preliminary patches for 
that ran into some resistance and into general brokenness of our address 
space support, so I decided to rip all that out for the moment to get 
the basic port into the next version.

This new version also implements a way of providing realloc that was 
suggested in another thread. Calls to malloc and free are redirected to 
libgcc variants. I'm not a big fan of wasting extra space on every 
allocation (which is why I didn't originally consider this approach 
viable), but it seems we'll have to do it that way. There's a change to 
the libgcc build system: on ptx we need comments in the assembly to 
survive, so we can't use -xassembler-with-cpp. I've not found any files 
named "*.asm", so I've changed that suffix to mean plain assembler.
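
The general technique of paying extra space per allocation so that
realloc can work might look like the sketch below.  This is not the
code in malloc.asm, free.asm or realloc.c; the names sized_malloc,
sized_free and sized_realloc and the header layout are hypothetical,
and the host malloc stands in for the underlying allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical per-allocation header.  The union pads the header to the
// strictest alignment so the user pointer stays suitably aligned.
union Header {
  std::size_t size;
  std::max_align_t align;
};

// Allocate SIZE bytes plus a header recording SIZE.
void *sized_malloc(std::size_t size) {
  Header *h = static_cast<Header *>(std::malloc(sizeof(Header) + size));
  if (h == nullptr)
    return nullptr;
  h->size = size;
  return h + 1;  // user data starts right after the header
}

// Free a block obtained from sized_malloc.
void sized_free(void *p) {
  if (p != nullptr)
    std::free(static_cast<Header *>(p) - 1);
}

// Resize by allocating a new block and copying; the recorded size tells
// us how many bytes of the old block are valid.
void *sized_realloc(void *p, std::size_t new_size) {
  if (p == nullptr)
    return sized_malloc(new_size);
  std::size_t old_size = (static_cast<Header *>(p) - 1)->size;
  void *q = sized_malloc(new_size);
  if (q == nullptr)
    return nullptr;  // the old block is left untouched on failure
  std::memcpy(q, p, old_size < new_size ? old_size : new_size);
  sized_free(p);
  return q;
}
```

The header is the extra space per allocation mentioned above: without
a recorded size, a realloc built on plain malloc and free would not
know how much of the old block to copy.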


Bernd


[-- Attachment #2: 010-target.diff --]
[-- Type: text/x-patch, Size: 127695 bytes --]

	* configure.ac: Allow configuring lto for nvptx.
	* configure: Regenerate.

	gcc/
	* config/nvptx/nvptx.c: New file.
	* config/nvptx/nvptx.h: New file.
	* config/nvptx/nvptx-protos.h: New file.
	* config/nvptx/nvptx.md: New file.
	* config/nvptx/t-nvptx: New file.
	* config/nvptx/nvptx.opt: New file.
	* common/config/nvptx/nvptx-common.c: New file.
	* config.gcc: Handle nvptx-*-*.

	libgcc/
	* config.host: Handle nvptx-*-*.
	* shared-object.mk (as-flags-$o): Define.
	($(base)$(objext), $(base)_s$(objext)): Use it instead of
	-xassembler-with-cpp.
	* static-object.mk: Identical changes.
	* config/nvptx/t-nvptx: New file.
	* config/nvptx/crt0.s: New file.
	* config/nvptx/free.asm: New file.
	* config/nvptx/malloc.asm: New file.
	* config/nvptx/realloc.c: New file.

------------------------------------------------------------------------
Index: gcc/common/config/nvptx/nvptx-common.c
===================================================================
--- /dev/null
+++ gcc/common/config/nvptx/nvptx-common.c
@@ -0,0 +1,38 @@
+/* NVPTX common hooks.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic-core.h"
+#include "tm.h"
+#include "tm_p.h"
+#include "common/common-target.h"
+#include "common/common-target-def.h"
+#include "opts.h"
+#include "flags.h"
+
+#undef TARGET_HAVE_NAMED_SECTIONS
+#define TARGET_HAVE_NAMED_SECTIONS false
+
+#undef TARGET_DEFAULT_TARGET_FLAGS
+#define TARGET_DEFAULT_TARGET_FLAGS MASK_ABI64
+
+struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc.orig
+++ gcc/config.gcc
@@ -420,6 +420,9 @@ nios2-*-*)
 	cpu_type=nios2
 	extra_options="${extra_options} g.opt"
 	;;
+nvptx-*-*)
+	cpu_type=nvptx
+	;;
 powerpc*-*-*)
 	cpu_type=rs6000
 	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
@@ -2148,6 +2151,10 @@ nios2-*-*)
 		;;
         esac
 	;;
+nvptx-*)
+	tm_file="${tm_file} newlib-stdint.h"
+	tmake_file="nvptx/t-nvptx"
+	;;
 pdp11-*-*)
 	tm_file="${tm_file} newlib-stdint.h"
 	use_gcc_stdint=wrap
Index: gcc/config/nvptx/nvptx.c
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx.c
@@ -0,0 +1,2118 @@
+/* Target code for NVPTX.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tree.h"
+#include "insn-flags.h"
+#include "output.h"
+#include "insn-attr.h"
+#include "insn-codes.h"
+#include "expr.h"
+#include "regs.h"
+#include "optabs.h"
+#include "recog.h"
+#include "ggc.h"
+#include "timevar.h"
+#include "tm_p.h"
+#include "tm-preds.h"
+#include "tm-constrs.h"
+#include "function.h"
+#include "langhooks.h"
+#include "dbxout.h"
+#include "target.h"
+#include "target-def.h"
+#include "diagnostic.h"
+#include "basic-block.h"
+#include "stor-layout.h"
+#include "calls.h"
+#include "df.h"
+#include "builtins.h"
+#include "hashtab.h"
+#include <sstream>
+
+/* Record the function decls we've written, and the libfuncs and function
+   decls corresponding to them.  */
+static std::stringstream func_decls;
+static GTY((if_marked ("ggc_marked_p"), param_is (struct rtx_def)))
+  htab_t declared_libfuncs_htab;
+static GTY((if_marked ("ggc_marked_p"), param_is (union tree_node)))
+  htab_t declared_fndecls_htab;
+static GTY((if_marked ("ggc_marked_p"), param_is (union tree_node)))
+  htab_t needed_fndecls_htab;
+
+/* Allocate a new, cleared machine_function structure.  */
+
+static struct machine_function *
+nvptx_init_machine_status (void)
+{
+  struct machine_function *p = ggc_cleared_alloc<machine_function> ();
+  p->ret_reg_mode = VOIDmode;
+  return p;
+}
+
+/* Implement TARGET_OPTION_OVERRIDE.  */
+
+static void
+nvptx_option_override (void)
+{
+  init_machine_status = nvptx_init_machine_status;
+  /* Gives us a predictable order, which we need especially for variables.  */
+  flag_toplevel_reorder = 1;
+  /* Assumes that it will see only hard registers.  */
+  flag_var_tracking = 0;
+  write_symbols = NO_DEBUG;
+  debug_info_level = DINFO_LEVEL_NONE;
+
+  declared_fndecls_htab
+    = htab_create_ggc (17, htab_hash_pointer, htab_eq_pointer, NULL);
+  needed_fndecls_htab
+    = htab_create_ggc (17, htab_hash_pointer, htab_eq_pointer, NULL);
+  declared_libfuncs_htab
+    = htab_create_ggc (17, htab_hash_pointer, htab_eq_pointer, NULL);
+}
+
+/* Return the mode to be used when declaring a ptx object for OBJ.
+   For objects with subparts such as complex modes this is the mode
+   of the subpart.  */
+
+enum machine_mode
+nvptx_underlying_object_mode (rtx obj)
+{
+  if (GET_CODE (obj) == SUBREG)
+    obj = SUBREG_REG (obj);
+  enum machine_mode mode = GET_MODE (obj);
+  if (mode == TImode)
+    return DImode;
+  if (COMPLEX_MODE_P (mode))
+    return GET_MODE_INNER (mode);
+  return mode;
+}
+
+/* Return a ptx type for MODE.  If PROMOTE, then use .u32 for QImode to
+   deal with ptx idiosyncrasies.  */
+
+const char *
+nvptx_ptx_type_from_mode (enum machine_mode mode, bool promote)
+{
+  switch (mode)
+    {
+    case BLKmode:
+      return ".b8";
+    case BImode:
+      return ".pred";
+    case QImode:
+      if (promote)
+	return ".u32";
+      else
+	return ".u8";
+    case HImode:
+      return ".u16";
+    case SImode:
+      return ".u32";
+    case DImode:
+      return ".u64";
+
+    case SFmode:
+      return ".f32";
+    case DFmode:
+      return ".f64";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return the number of pieces to use when dealing with a pseudo of *PMODE.
+   Alter *PMODE if we return a number greater than one.  */
+
+static int
+maybe_split_mode (enum machine_mode *pmode)
+{
+  enum machine_mode mode = *pmode;
+
+  if (COMPLEX_MODE_P (mode))
+    {
+      *pmode = GET_MODE_INNER (mode);
+      return 2;
+    }
+  else if (mode == TImode)
+    {
+      *pmode = DImode;
+      return 2;
+    }
+  return 1;
+}
+
+/* Like maybe_split_mode, but only return whether or not the mode
+   needs to be split.  */
+static bool
+nvptx_split_reg_p (enum machine_mode mode)
+{
+  if (COMPLEX_MODE_P (mode))
+    return true;
+  if (mode == TImode)
+    return true;
+  return false;
+}
+
+#define PASS_IN_REG_P(MODE, TYPE)				\
+  ((GET_MODE_CLASS (MODE) == MODE_INT				\
+    || GET_MODE_CLASS (MODE) == MODE_FLOAT			\
+    || ((GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT		\
+	 || GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT)	\
+	&& !AGGREGATE_TYPE_P (TYPE)))				\
+   && (MODE) != TImode)
+
+#define RETURN_IN_REG_P(MODE)			\
+  ((GET_MODE_CLASS (MODE) == MODE_INT		\
+    || GET_MODE_CLASS (MODE) == MODE_FLOAT)	\
+   && GET_MODE_SIZE (MODE) <= 8)
+\f
+/* Perform a mode promotion for a function argument with MODE.  Return
+   the promoted mode.  */
+
+static enum machine_mode
+arg_promotion (enum machine_mode mode)
+{
+  if (mode == QImode || mode == HImode)
+    return SImode;
+  return mode;
+}
+
+/* Write the declaration of a function arg of TYPE to S.  I is the index
+   of the argument, MODE its mode.  NO_ARG_TYPES is true if this is for
+   a decl with zero TYPE_ARG_TYPES, i.e. an old-style C decl.  */
+
+static int
+write_one_arg (std::stringstream &s, tree type, int i, enum machine_mode mode,
+	       bool no_arg_types)
+{
+  if (!PASS_IN_REG_P (mode, type))
+    mode = Pmode;
+
+  int count = maybe_split_mode (&mode);
+
+  if (count == 2)
+    {
+      write_one_arg (s, NULL_TREE, i, mode, false);
+      write_one_arg (s, NULL_TREE, i + 1, mode, false);
+      return i + 1;
+    }
+
+  if (no_arg_types && !AGGREGATE_TYPE_P (type))
+    {
+      if (mode == SFmode)
+	mode = DFmode;
+      mode = arg_promotion (mode);
+    }
+
+  if (i > 0)
+    s << ", ";
+  s << ".param" << nvptx_ptx_type_from_mode (mode, false) << " %in_ar"
+    << (i + 1) << (mode == QImode || mode == HImode ? "[1]" : "");
+  if (mode == BLKmode)
+    s << "[" << int_size_in_bytes (type) << "]";
+  return i;
+}
+
+/* Look for attributes in ATTRS that would indicate we must write a function
+   as a .entry kernel rather than a .func.  Return true if one is found.  */
+
+static bool
+write_as_kernel (tree attrs)
+{
+  return (lookup_attribute ("kernel", attrs) != NULL_TREE
+	  || lookup_attribute ("omp target entrypoint", attrs) != NULL_TREE);
+}
+
+/* Write a function decl for DECL to S, where NAME is the name to be used.  */
+
+static void
+nvptx_write_function_decl (std::stringstream &s, const char *name, const_tree decl)
+{
+  tree fntype = TREE_TYPE (decl);
+  tree result_type = TREE_TYPE (fntype);
+  tree args = TYPE_ARG_TYPES (fntype);
+  tree attrs = DECL_ATTRIBUTES (decl);
+  bool kernel = write_as_kernel (attrs);
+  bool is_main = strcmp (name, "main") == 0;
+  bool args_from_decl = false;
+
+  /* We get:
+     NULL in TYPE_ARG_TYPES, for old-style functions
+     NULL in DECL_ARGUMENTS, for builtin functions without another
+       declaration.
+     So we have to pick the best one we have.  */
+  if (args == 0)
+    {
+      args = DECL_ARGUMENTS (decl);
+      args_from_decl = true;
+    }
+
+  if (DECL_EXTERNAL (decl))
+    s << ".extern ";
+  else if (TREE_PUBLIC (decl))
+    s << ".visible ";
+
+  if (kernel)
+    s << ".entry ";
+  else
+    s << ".func ";
+
+  /* Declare the result.  */
+  bool return_in_mem = false;
+  if (TYPE_MODE (result_type) != VOIDmode)
+    {
+      enum machine_mode mode = TYPE_MODE (result_type);
+      if (!RETURN_IN_REG_P (mode))
+	return_in_mem = true;
+      else
+	{
+	  mode = arg_promotion (mode);
+	  s << "(.param" << nvptx_ptx_type_from_mode (mode, false)
+	    << " %out_retval)";
+	}
+    }
+
+  if (name[0] == '*')
+    s << (name + 1);
+  else
+    s << name;
+
+  /* Declare argument types.  */
+  if ((args != NULL_TREE
+       && !(TREE_CODE (args) == TREE_LIST && TREE_VALUE (args) == void_type_node))
+      || is_main
+      || return_in_mem
+      || DECL_STATIC_CHAIN (decl))
+    {
+      s << "(";
+      int i = 0;
+      bool any_args = false;
+      if (return_in_mem)
+	{
+	  s << ".param.u" << GET_MODE_BITSIZE (Pmode) << " %in_ar1";
+	  i++;
+	}
+      while (args != NULL_TREE)
+	{
+	  tree type = args_from_decl ? TREE_TYPE (args) : TREE_VALUE (args);
+	  enum machine_mode mode = TYPE_MODE (type);
+
+	  if (mode != VOIDmode)
+	    {
+	      i = write_one_arg (s, type, i, mode,
+				 TYPE_ARG_TYPES (fntype) == 0);
+	      any_args = true;
+	      i++;
+	    }
+	  args = TREE_CHAIN (args);
+	}
+      if (stdarg_p (fntype))
+	{
+	  gcc_assert (i > 0);
+	  s << ", .param.u" << GET_MODE_BITSIZE (Pmode) << " %in_argp";
+	}
+      if (DECL_STATIC_CHAIN (decl))
+	{
+	  if (i > 0)
+	    s << ", ";
+	  s << ".reg.u" << GET_MODE_BITSIZE (Pmode)
+	    << reg_names [STATIC_CHAIN_REGNUM];
+	}
+      if (!any_args && is_main)
+	s << ".param.u32 %argc, .param.u" << GET_MODE_BITSIZE (Pmode)
+	  << " %argv";
+      s << ")";
+    }
+}
+
+/* Walk either ARGTYPES or ARGS if the former is null, and write out part of
+   the function header to FILE.  If WRITE_COPY is false, write reg
+   declarations, otherwise write the copy from the incoming argument to that
+   reg.  RETURN_IN_MEM indicates whether to start counting arg numbers at 1
+   instead of 0.  */
+
+static void
+walk_args_for_param (FILE *file, tree argtypes, tree args, bool write_copy,
+		     bool return_in_mem)
+{
+  int i;
+
+  bool args_from_decl = false;
+  if (argtypes == 0)
+    args_from_decl = true;
+  else
+    args = argtypes;
+
+  for (i = return_in_mem ? 1 : 0; args != NULL_TREE; args = TREE_CHAIN (args))
+    {
+      tree type = args_from_decl ? TREE_TYPE (args) : TREE_VALUE (args);
+      enum machine_mode mode = TYPE_MODE (type);
+
+      if (mode == VOIDmode)
+	break;
+
+      if (!PASS_IN_REG_P (mode, type))
+	mode = Pmode;
+
+      int count = maybe_split_mode (&mode);
+      if (count == 1)
+	{
+	  if (argtypes == NULL && !AGGREGATE_TYPE_P (type))
+	    {
+	      if (mode == SFmode)
+		mode = DFmode;
+
+	    }
+	  mode = arg_promotion (mode);
+	}
+      while (count-- > 0)
+	{
+	  i++;
+	  if (write_copy)
+	    fprintf (file, "\tld.param%s %%ar%d, [%%in_ar%d];\n",
+		     nvptx_ptx_type_from_mode (mode, false), i, i);
+	  else
+	    fprintf (file, "\t.reg%s %%ar%d;\n",
+		     nvptx_ptx_type_from_mode (mode, false), i);
+	}
+    }
+}
+
+/* Write a .func or .entry declaration (not a definition) along with
+   a helper comment for use by ld.  S is the stream to write to, DECL
+   the decl for the function with name NAME.  */
+
+static void
+write_function_decl_and_comment (std::stringstream &s, const char *name, const_tree decl)
+{
+  s << "// BEGIN";
+  if (TREE_PUBLIC (decl))
+    s << " GLOBAL";
+  s << " FUNCTION DECL: ";
+  if (name[0] == '*')
+    s << (name + 1);
+  else
+    s << name;
+  s << "\n";
+  nvptx_write_function_decl (s, name, decl);
+  s << ";\n";
+}
+
+/* Check NAME for special function names and redirect them by returning a
+   replacement.  This applies to malloc, free and realloc, for which we
+   want to use libgcc wrappers, and call, which triggers a bug in ptxas.  */
+
+static const char *
+nvptx_name_replacement (const char *name)
+{
+  if (strcmp (name, "call") == 0)
+    return "__nvptx_call";
+  if (strcmp (name, "malloc") == 0)
+    return "__nvptx_malloc";
+  if (strcmp (name, "free") == 0)
+    return "__nvptx_free";
+  if (strcmp (name, "realloc") == 0)
+    return "__nvptx_realloc";
+  return name;
+}
+
+/* If DECL is a FUNCTION_DECL, check the hash table to see if we
+   already encountered it, and if not, insert it and write a ptx
+   declaration that will be output at the end of compilation.  */
+
+static bool
+nvptx_record_fndecl (tree decl, bool force = false)
+{
+  if (decl == NULL_TREE || TREE_CODE (decl) != FUNCTION_DECL
+      || !DECL_EXTERNAL (decl))
+    return true;
+
+  if (!force && TYPE_ARG_TYPES (TREE_TYPE (decl)) == NULL_TREE)
+    return false;
+
+  void **slot = htab_find_slot (declared_fndecls_htab, decl, INSERT);
+  if (*slot == NULL)
+    {
+      *slot = decl;
+      const char *name = get_fnname_from_decl (decl);
+      name = nvptx_name_replacement (name);
+      write_function_decl_and_comment (func_decls, name, decl);
+    }
+  return true;
+}
+
+/* Record that we need to emit a ptx decl for DECL.  Either do it now, or
+   record it for later in case we have no argument information at this
+   point.  */
+
+void
+nvptx_record_needed_fndecl (tree decl)
+{
+  if (nvptx_record_fndecl (decl))
+    return;
+
+  void **slot = htab_find_slot (needed_fndecls_htab, decl, INSERT);
+  if (*slot == NULL)
+    *slot = decl;
+}
+
+/* Implement ASM_DECLARE_FUNCTION_NAME.  Writes the start of a ptx
+   function, including local var decls and copies from the arguments to
+   local regs.  */
+
+void
+nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
+{
+  tree fntype = TREE_TYPE (decl);
+  tree result_type = TREE_TYPE (fntype);
+
+  name = nvptx_name_replacement (name);
+
+  std::stringstream s;
+  write_function_decl_and_comment (s, name, decl);
+  s << "// BEGIN";
+  if (TREE_PUBLIC (decl))
+    s << " GLOBAL";
+  s << " FUNCTION DEF: ";
+
+  if (name[0] == '*')
+    s << (name + 1);
+  else
+    s << name;
+  s << "\n";
+
+  nvptx_write_function_decl (s, name, decl);
+  fprintf (file, "%s", s.str().c_str());
+
+  bool return_in_mem = false;
+  if (TYPE_MODE (result_type) != VOIDmode)
+    {
+      enum machine_mode mode = TYPE_MODE (result_type);
+      if (!RETURN_IN_REG_P (mode))
+	return_in_mem = true;
+    }
+
+  fprintf (file, "\n{\n");
+
+  /* Ensure all arguments that should live in a register have one
+     declared.  We'll emit the copies below.  */
+  walk_args_for_param (file, TYPE_ARG_TYPES (fntype), DECL_ARGUMENTS (decl),
+		       false, return_in_mem);
+  if (return_in_mem)
+    fprintf (file, "\t.reg.u%d %%ar1;\n", GET_MODE_BITSIZE (Pmode));
+  else if (TYPE_MODE (result_type) != VOIDmode)
+    {
+      enum machine_mode mode = arg_promotion (TYPE_MODE (result_type));
+      fprintf (file, "\t.reg%s %%retval;\n",
+	       nvptx_ptx_type_from_mode (mode, false));
+    }
+
+  if (stdarg_p (fntype))
+    fprintf (file, "\t.reg.u%d %%argp;\n", GET_MODE_BITSIZE (Pmode));
+
+  fprintf (file, "\t.reg.u%d %s;\n", GET_MODE_BITSIZE (Pmode),
+	   reg_names[OUTGOING_STATIC_CHAIN_REGNUM]);
+
+  /* Declare the pseudos we have as ptx registers.  */
+  int maxregs = max_reg_num ();
+  for (int i = LAST_VIRTUAL_REGISTER + 1; i < maxregs; i++)
+    {
+      if (regno_reg_rtx[i] != const0_rtx)
+	{
+	  enum machine_mode mode = PSEUDO_REGNO_MODE (i);
+	  int count = maybe_split_mode (&mode);
+	  if (count > 1)
+	    {
+	      while (count-- > 0)
+		fprintf (file, "\t.reg%s %%r%d$%d;\n",
+			 nvptx_ptx_type_from_mode (mode, true),
+			 i, count);
+	    }
+	  else
+	    fprintf (file, "\t.reg%s %%r%d;\n",
+		     nvptx_ptx_type_from_mode (mode, true),
+		     i);
+	}
+    }
+
+  /* The only reason we might be using outgoing args is if we call a stdargs
+     function.  Allocate the space for this.  If we called varargs functions
+     without passing any variadic arguments, we'll see a reference to outargs
+     even with a zero outgoing_args_size.  */
+  HOST_WIDE_INT sz = crtl->outgoing_args_size;
+  if (sz == 0)
+    sz = 1;
+  if (cfun->machine->has_call_with_varargs)
+    fprintf (file, "\t.reg.u%d %%outargs;\n"
+	     "\t.local.align 8 .b8 %%outargs_ar[" HOST_WIDE_INT_PRINT_DEC "];\n",
+	     BITS_PER_WORD, sz);
+  if (cfun->machine->punning_buffer_size > 0)
+    fprintf (file, "\t.reg.u%d %%punbuffer;\n"
+	     "\t.local.align 8 .b8 %%punbuffer_ar[%d];\n",
+	     BITS_PER_WORD, cfun->machine->punning_buffer_size);
+
+  /* Declare a local variable for the frame.  */
+  sz = get_frame_size ();
+  if (sz > 0 || cfun->machine->has_call_with_sc)
+    {
+      fprintf (file, "\t.reg.u%d %%frame;\n"
+	       "\t.local.align 8 .b8 %%farray[" HOST_WIDE_INT_PRINT_DEC "];\n",
+	       BITS_PER_WORD, sz == 0 ? 1 : sz);
+      fprintf (file, "\tcvta.local.u%d %%frame, %%farray;\n",
+	       BITS_PER_WORD);
+    }
+
+  if (cfun->machine->has_call_with_varargs)
+      fprintf (file, "\tcvta.local.u%d %%outargs, %%outargs_ar;\n",
+	       BITS_PER_WORD);
+  if (cfun->machine->punning_buffer_size > 0)
+      fprintf (file, "\tcvta.local.u%d %%punbuffer, %%punbuffer_ar;\n",
+	       BITS_PER_WORD);
+
+  /* Now emit any copies necessary for arguments.  */
+  walk_args_for_param (file, TYPE_ARG_TYPES (fntype), DECL_ARGUMENTS (decl),
+		       true, return_in_mem);
+  if (return_in_mem)
+    fprintf (file, "\tld.param.u%d %%ar1, [%%in_ar1];\n",
+	     GET_MODE_BITSIZE (Pmode));
+  if (stdarg_p (fntype))
+    fprintf (file, "\tld.param.u%d %%argp, [%%in_argp];\n",
+	     GET_MODE_BITSIZE (Pmode));
+}
+
+/* Output a return instruction.  Also copy the return value to its outgoing
+   location.  */
+
+const char *
+nvptx_output_return (void)
+{
+  tree fntype = TREE_TYPE (current_function_decl);
+  tree result_type = TREE_TYPE (fntype);
+  if (TYPE_MODE (result_type) != VOIDmode)
+    {
+      enum machine_mode mode = TYPE_MODE (result_type);
+      if (RETURN_IN_REG_P (mode))
+	{
+	  mode = arg_promotion (mode);
+	  fprintf (asm_out_file, "\tst.param%s\t[%%out_retval], %%retval;\n",
+		   nvptx_ptx_type_from_mode (mode, false));
+	}
+    }
+
+  return "ret;";
+}
+
+/* Construct a function declaration from a call insn.  This can be
+   necessary for two reasons - either we have an indirect call which
+   requires a .callprototype declaration, or we have a libcall
+   generated by emit_library_call for which no decl exists.  */
+
+static void
+write_func_decl_from_insn (std::stringstream &s, rtx result, rtx pat,
+			   rtx callee)
+{
+  bool callprototype = register_operand (callee, Pmode);
+  const char *name = "_";
+  if (!callprototype)
+    {
+      name = XSTR (callee, 0);
+      name = nvptx_name_replacement (name);
+      s << "// BEGIN GLOBAL FUNCTION DECL: " << name << "\n";
+    }
+  s << (callprototype ? "\t.callprototype\t" : "\t.extern .func ");
+
+  if (result != NULL_RTX)
+    {
+      s << "(.param";
+      s << nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)),
+				     false);
+      s << " ";
+      if (callprototype)
+	s << "_";
+      else
+	s << "%out_retval";
+      s << ")";
+    }
+
+  s << name;
+
+  int nargs = XVECLEN (pat, 0) - 1;
+  if (nargs > 0)
+    {
+      s << " (";
+      for (int i = 0; i < nargs; i++)
+	{
+	  rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+	  enum machine_mode mode = GET_MODE (t);
+	  int count = maybe_split_mode (&mode);
+
+	  while (count-- > 0)
+	    {
+	      s << ".param";
+	      s << nvptx_ptx_type_from_mode (mode, false);
+	      s << " ";
+	      if (callprototype)
+		s << "_";
+	      else
+		s << "%arg" << i;
+	      if (mode == QImode || mode == HImode)
+		s << "[1]";
+	      if (i + 1 < nargs || count > 0)
+		s << ", ";
+	    }
+	}
+      s << ")";
+    }
+  s << ";\n";
+}
+
+/* Terminate a function by writing a closing brace to FILE.  */
+
+void
+nvptx_function_end (FILE *file)
+{
+  fprintf (file, "\t}\n");
+}
+\f
+/* Decide whether we can make a sibling call to a function.  For ptx, we
+   can't.  */
+
+static bool
+nvptx_function_ok_for_sibcall (tree, tree)
+{
+  return false;
+}
+
+/* Implement the TARGET_CALL_ARGS hook.  Record information about one
+   argument to the next call.  */
+
+static void
+nvptx_call_args (rtx arg, tree funtype)
+{
+  if (cfun->machine->start_call == NULL_RTX)
+    {
+      cfun->machine->call_args = NULL;
+      cfun->machine->funtype = funtype;
+      cfun->machine->start_call = const0_rtx;
+    }
+  if (arg == pc_rtx)
+    return;
+
+  rtx_expr_list *args_so_far = cfun->machine->call_args;
+  if (REG_P (arg))
+    cfun->machine->call_args = alloc_EXPR_LIST (VOIDmode, arg, args_so_far);
+}
+
+/* Implement the corresponding END_CALL_ARGS hook.  Clear and free the
+   information we recorded.  */
+
+static void
+nvptx_end_call_args (void)
+{
+  cfun->machine->start_call = NULL_RTX;
+  free_EXPR_LIST_list (&cfun->machine->call_args);
+}
+
+/* Emit the sequence for a call.  */
+
+void
+nvptx_expand_call (rtx retval, rtx address)
+{
+  int nargs;
+  rtx callee = XEXP (address, 0);
+  rtx pat, t;
+  rtvec vec;
+  bool external_decl = false;
+
+  nargs = 0;
+  for (t = cfun->machine->call_args; t; t = XEXP (t, 1))
+    nargs++;
+
+  bool has_varargs = false;
+  tree decl_type = NULL_TREE;
+
+  if (!call_insn_operand (callee, Pmode))
+    {
+      callee = force_reg (Pmode, callee);
+      address = change_address (address, QImode, callee);
+    }
+
+  if (GET_CODE (callee) == SYMBOL_REF)
+    {
+      tree decl = SYMBOL_REF_DECL (callee);
+      if (decl != NULL_TREE)
+	{
+	  decl_type = TREE_TYPE (decl);
+	  if (DECL_STATIC_CHAIN (decl))
+	    cfun->machine->has_call_with_sc = true;
+	  if (DECL_EXTERNAL (decl))
+	    external_decl = true;
+	}
+    }
+  if (cfun->machine->funtype
+      /* It's possible to construct testcases where we call a variable.
+	 See compile/20020129-1.c.  stdarg_p will crash so avoid calling it
+	 in such a case.  */
+      && (TREE_CODE (cfun->machine->funtype) == FUNCTION_TYPE
+	  || TREE_CODE (cfun->machine->funtype) == METHOD_TYPE)
+      && stdarg_p (cfun->machine->funtype))
+    {
+      has_varargs = true;
+      cfun->machine->has_call_with_varargs = true;
+    }
+  vec = rtvec_alloc (nargs + 1 + (has_varargs ? 1 : 0));
+  pat = gen_rtx_PARALLEL (VOIDmode, vec);
+  if (has_varargs)
+    {
+      rtx this_arg = gen_reg_rtx (Pmode);
+      if (Pmode == DImode)
+	emit_move_insn (this_arg, stack_pointer_rtx);
+      else
+	emit_move_insn (this_arg, stack_pointer_rtx);
+      XVECEXP (pat, 0, nargs + 1) = gen_rtx_USE (VOIDmode, this_arg);
+    }
+
+  int i;
+  rtx arg;
+  for (i = 1, arg = cfun->machine->call_args; arg; arg = XEXP (arg, 1), i++)
+    {
+      rtx this_arg = XEXP (arg, 0);
+      XVECEXP (pat, 0, i) = gen_rtx_USE (VOIDmode, this_arg);
+    }
+
+  rtx tmp_retval = retval;
+  t = gen_rtx_CALL (VOIDmode, address, const0_rtx);
+  if (retval != NULL_RTX)
+    {
+      if (!nvptx_register_operand (retval, GET_MODE (retval)))
+	tmp_retval = gen_reg_rtx (GET_MODE (retval));
+      t = gen_rtx_SET (VOIDmode, tmp_retval, t);
+    }
+  XVECEXP (pat, 0, 0) = t;
+  if (!REG_P (callee)
+      && (decl_type == NULL_TREE
+	  || (external_decl && TYPE_ARG_TYPES (decl_type) == NULL_TREE)))
+    {
+      void **slot = htab_find_slot (declared_libfuncs_htab, callee, INSERT);
+      if (*slot == NULL)
+	{
+	  *slot = callee;
+	  write_func_decl_from_insn (func_decls, retval, pat, callee);
+	}
+    }
+  emit_call_insn (pat);
+  if (tmp_retval != retval)
+    emit_move_insn (retval, tmp_retval);
+}
+
+/* Implement TARGET_FUNCTION_ARG.  */
+
+static rtx
+nvptx_function_arg (cumulative_args_t, enum machine_mode mode,
+		    const_tree, bool named)
+{
+  if (mode == VOIDmode)
+    return NULL_RTX;
+
+  if (named)
+    return gen_reg_rtx (mode);
+  return NULL_RTX;
+}
+
+/* Implement TARGET_FUNCTION_INCOMING_ARG.  */
+
+static rtx
+nvptx_function_incoming_arg (cumulative_args_t cum_v, enum machine_mode mode,
+			     const_tree, bool named)
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  if (mode == VOIDmode)
+    return NULL_RTX;
+
+  if (!named)
+    return NULL_RTX;
+
+  /* No need to deal with split modes here, the only case that can
+     happen is complex modes and those are dealt with by
+     TARGET_SPLIT_COMPLEX_ARG.  */
+  return gen_rtx_UNSPEC (mode,
+			 gen_rtvec (1, GEN_INT (1 + cum->count)),
+			 UNSPEC_ARG_REG);
+}
+
+/* Implement TARGET_FUNCTION_ARG_ADVANCE.  */
+
+static void
+nvptx_function_arg_advance (cumulative_args_t cum_v, enum machine_mode mode,
+			    const_tree type ATTRIBUTE_UNUSED,
+			    bool named ATTRIBUTE_UNUSED)
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  if (mode == TImode)
+    cum->count += 2;
+  else
+    cum->count++;
+}
+
+/* Handle the TARGET_STRICT_ARGUMENT_NAMING target hook.
+
+   For nvptx, we know how to handle functions declared as stdarg: by
+   passing an extra pointer to the unnamed arguments.  However, the
+   Fortran frontend can produce a different situation, where a
+   function pointer is declared with no arguments, but the actual
+   function and calls to it take more arguments.  In that case, we
+   want to ensure the call matches the definition of the function.  */
+
+static bool
+nvptx_strict_argument_naming (cumulative_args_t cum_v)
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  return cum->fntype == NULL_TREE || stdarg_p (cum->fntype);
+}
+
+/* Implement TARGET_FUNCTION_ARG_BOUNDARY.  */
+
+static unsigned int
+nvptx_function_arg_boundary (enum machine_mode mode, const_tree type)
+{
+  unsigned int boundary = type ? TYPE_ALIGN (type) : GET_MODE_BITSIZE (mode);
+
+  if (boundary > BITS_PER_WORD)
+    return 2 * BITS_PER_WORD;
+
+  if (mode == BLKmode)
+    {
+      HOST_WIDE_INT size = int_size_in_bytes (type);
+      if (size > 4)
+        return 2 * BITS_PER_WORD;
+      if (boundary < BITS_PER_WORD)
+        {
+          if (size >= 3)
+            return BITS_PER_WORD;
+          if (size >= 2)
+            return 2 * BITS_PER_UNIT;
+        }
+    }
+  return boundary;
+}
+
+/* TARGET_FUNCTION_VALUE implementation.  Returns an RTX representing the place
+   where function FUNC returns or receives a value of data type TYPE.  */
+
+static rtx
+nvptx_function_value (const_tree type, const_tree func ATTRIBUTE_UNUSED,
+		      bool outgoing)
+{
+  int unsignedp = TYPE_UNSIGNED (type);
+  enum machine_mode orig_mode = TYPE_MODE (type);
+  enum machine_mode mode = promote_function_mode (type, orig_mode,
+						  &unsignedp, NULL_TREE, 1);
+  if (outgoing)
+    return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
+  if (cfun->machine->start_call == NULL_RTX)
+    /* Pretend to return in a hard reg for early uses before pseudos can be
+       generated.  */
+    return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
+  return gen_reg_rtx (mode);
+}
+
+/* Implement TARGET_LIBCALL_VALUE.  */
+
+static rtx
+nvptx_libcall_value (enum machine_mode mode, const_rtx)
+{
+  if (cfun->machine->start_call == NULL_RTX)
+    /* Pretend to return in a hard reg for early uses before pseudos can be
+       generated.  */
+    return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
+  return gen_reg_rtx (mode);
+}
+
+/* Implement TARGET_FUNCTION_VALUE_REGNO_P.  */
+
+static bool
+nvptx_function_value_regno_p (const unsigned int regno)
+{
+  return regno == NVPTX_RETURN_REGNUM;
+}
+
+/* Types with a mode other than those supported by the machine are passed by
+   reference in memory.  */
+
+static bool
+nvptx_pass_by_reference (cumulative_args_t, enum machine_mode mode,
+			 const_tree type, bool)
+{
+  return !PASS_IN_REG_P (mode, type);
+}
+
+/* Implement TARGET_RETURN_IN_MEMORY.  */
+
+static bool
+nvptx_return_in_memory (const_tree type, const_tree)
+{
+  enum machine_mode mode = TYPE_MODE (type);
+  if (!RETURN_IN_REG_P (mode))
+    return true;
+  return false;
+}
+
+/* Implement TARGET_PROMOTE_FUNCTION_MODE.  */
+
+static enum machine_mode
+nvptx_promote_function_mode (const_tree type, enum machine_mode mode,
+			     int *punsignedp,
+			     const_tree funtype, int for_return)
+{
+  if (type == NULL_TREE)
+    return mode;
+  if (for_return)
+    return promote_mode (type, mode, punsignedp);
+  /* For K&R-style functions, try to match the language promotion rules to
+     minimize type mismatches at assembly time.  */
+  if (TYPE_ARG_TYPES (funtype) == NULL_TREE
+      && type != NULL_TREE
+      && !AGGREGATE_TYPE_P (type))
+    {
+      if (mode == SFmode)
+	mode = DFmode;
+      mode = arg_promotion (mode);
+    }
+
+  return mode;
+}
+
+/* Implement TARGET_STATIC_CHAIN.  */
+
+static rtx
+nvptx_static_chain (const_tree fndecl, bool incoming_p)
+{
+  if (!DECL_STATIC_CHAIN (fndecl))
+    return NULL;
+
+  if (incoming_p)
+    return gen_rtx_REG (Pmode, STATIC_CHAIN_REGNUM);
+  else
+    return gen_rtx_REG (Pmode, OUTGOING_STATIC_CHAIN_REGNUM);
+}
+\f
+/* Emit a comparison COMPARE, and return the new test to be used in the
+   jump.  */
+
+rtx
+nvptx_expand_compare (rtx compare)
+{
+  rtx pred = gen_reg_rtx (BImode);
+  rtx cmp = gen_rtx_fmt_ee (GET_CODE (compare), BImode,
+			    XEXP (compare, 0), XEXP (compare, 1));
+  emit_insn (gen_rtx_SET (VOIDmode, pred, cmp));
+  return gen_rtx_NE (BImode, pred, const0_rtx);
+}
+
+/* When loading an operand ORIG_OP, verify whether an address space
+   conversion to generic is required, and if so, perform it.  Also
+   check for SYMBOL_REFs for function decls and call
+   nvptx_record_needed_fndecl as needed.
+   Return either the original operand, or the converted one.  */
+
+rtx
+nvptx_maybe_convert_symbolic_operand (rtx orig_op)
+{
+  if (GET_MODE (orig_op) != Pmode)
+    return orig_op;
+
+  rtx op = orig_op;
+  while (GET_CODE (op) == PLUS || GET_CODE (op) == CONST)
+    op = XEXP (op, 0);
+  if (GET_CODE (op) != SYMBOL_REF)
+    return orig_op;
+
+  tree decl = SYMBOL_REF_DECL (op);
+  if (decl && TREE_CODE (decl) == FUNCTION_DECL)
+    {
+      nvptx_record_needed_fndecl (decl);
+      return orig_op;
+    }
+
+  addr_space_t as = nvptx_addr_space_from_address (op);
+  if (as == ADDR_SPACE_GENERIC)
+    return orig_op;
+
+  enum unspec code;
+  code = (as == ADDR_SPACE_GLOBAL ? UNSPEC_FROM_GLOBAL
+	  : as == ADDR_SPACE_LOCAL ? UNSPEC_FROM_LOCAL
+	  : as == ADDR_SPACE_SHARED ? UNSPEC_FROM_SHARED
+	  : as == ADDR_SPACE_CONST ? UNSPEC_FROM_CONST
+	  : UNSPEC_FROM_PARAM);
+  rtx dest = gen_reg_rtx (Pmode);
+  emit_insn (gen_rtx_SET (VOIDmode, dest,
+			  gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig_op),
+					  code)));
+  return dest;
+}
+\f
+/* Returns true if X is a valid address for use in a memory reference.  */
+
+static bool
+nvptx_legitimate_address_p (enum machine_mode, rtx x, bool)
+{
+  enum rtx_code code = GET_CODE (x);
+
+  switch (code)
+    {
+    case REG:
+      return true;
+
+    case PLUS:
+      if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
+	return true;
+      return false;
+
+    case CONST:
+    case SYMBOL_REF:
+    case LABEL_REF:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
+/* Implement HARD_REGNO_MODE_OK.  We barely use hard regs, but we want
+   to ensure that the return register's mode isn't changed.  */
+
+bool
+nvptx_hard_regno_mode_ok (int regno, enum machine_mode mode)
+{
+  if (regno != NVPTX_RETURN_REGNUM
+      || cfun == NULL || cfun->machine->ret_reg_mode == VOIDmode)
+    return true;
+  return mode == cfun->machine->ret_reg_mode;
+}
+\f
+/* Convert an address space AS to the corresponding ptx string.  */
+
+const char *
+nvptx_section_from_addr_space (addr_space_t as)
+{
+  switch (as)
+    {
+    case ADDR_SPACE_CONST:
+      return ".const";
+
+    case ADDR_SPACE_GLOBAL:
+      return ".global";
+
+    case ADDR_SPACE_SHARED:
+      return ".shared";
+
+    case ADDR_SPACE_GENERIC:
+      return "";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Determine whether DECL goes into .const or .global.  */
+
+const char *
+nvptx_section_for_decl (const_tree decl)
+{
+  bool is_const = (CONSTANT_CLASS_P (decl)
+		   || TREE_CODE (decl) == CONST_DECL
+		   || TREE_READONLY (decl));
+  if (is_const)
+    return ".const";
+
+  return ".global";
+}
+
+/* Look for a SYMBOL_REF in ADDR and return the address space to be used
+   for the insn referencing this address.  */
+
+addr_space_t
+nvptx_addr_space_from_address (rtx addr)
+{
+  while (GET_CODE (addr) == PLUS || GET_CODE (addr) == CONST)
+    addr = XEXP (addr, 0);
+  if (GET_CODE (addr) != SYMBOL_REF)
+    return ADDR_SPACE_GENERIC;
+
+  tree decl = SYMBOL_REF_DECL (addr);
+  if (decl == NULL_TREE || TREE_CODE (decl) == FUNCTION_DECL)
+    return ADDR_SPACE_GENERIC;
+
+  bool is_const = (CONSTANT_CLASS_P (decl)
+		   || TREE_CODE (decl) == CONST_DECL
+		   || TREE_READONLY (decl));
+  if (is_const)
+    return ADDR_SPACE_CONST;
+
+  return ADDR_SPACE_GLOBAL;
+}
+\f
+/* Machinery to output constant initializers.  */
+
+/* Used when assembling integers to ensure data is emitted in
+   pieces whose size matches the declaration we printed.  */
+static unsigned int decl_chunk_size;
+static enum machine_mode decl_chunk_mode;
+/* Used in the same situation, to keep track of the byte offset
+   into the initializer.  */
+static unsigned HOST_WIDE_INT decl_offset;
+/* The initializer part we are currently processing.  */
+static HOST_WIDE_INT init_part;
+/* The total size of the object.  */
+static unsigned HOST_WIDE_INT object_size;
+/* True if we found a skip extending to the end of the object.  Used to
+   assert that no data follows.  */
+static bool object_finished;
+
+/* Write the necessary separator string to begin a new initializer value.  */
+
+static void
+begin_decl_field (void)
+{
+  /* We never see decl_offset at zero by the time we get here.  */
+  if (decl_offset == decl_chunk_size)
+    fprintf (asm_out_file, " = { ");
+  else
+    fprintf (asm_out_file, ", ");
+}
+
+/* Output the currently stored chunk as an initializer value.  */
+
+static void
+output_decl_chunk (void)
+{
+  begin_decl_field ();
+  output_address (gen_int_mode (init_part, decl_chunk_mode));
+  init_part = 0;
+}
+
+/* Add value VAL sized SIZE to the data we're emitting, and keep writing
+   out chunks as they fill up.  */
+
+static void
+nvptx_assemble_value (HOST_WIDE_INT val, unsigned int size)
+{
+  unsigned HOST_WIDE_INT chunk_offset = decl_offset % decl_chunk_size;
+  gcc_assert (!object_finished);
+  while (size > 0)
+    {
+      int this_part = size;
+      if (chunk_offset + this_part > decl_chunk_size)
+	this_part = decl_chunk_size - chunk_offset;
+      HOST_WIDE_INT val_part;
+      HOST_WIDE_INT mask = 2;
+      mask <<= this_part * BITS_PER_UNIT - 1;
+      val_part = val & (mask - 1);
+      init_part |= val_part << (BITS_PER_UNIT * chunk_offset);
+      val >>= BITS_PER_UNIT * this_part;
+      size -= this_part;
+      decl_offset += this_part;
+      if (decl_offset % decl_chunk_size == 0)
+	output_decl_chunk ();
+
+      chunk_offset = 0;
+    }
+}
+
+/* Target hook for assembling integer object X of size SIZE.  */
+
+static bool
+nvptx_assemble_integer (rtx x, unsigned int size, int ARG_UNUSED (aligned_p))
+{
+  if (GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == CONST)
+    {
+      gcc_assert (size == decl_chunk_size);
+      if (decl_offset % decl_chunk_size != 0)
+	sorry ("cannot emit unaligned pointers in ptx assembly");
+      decl_offset += size;
+      begin_decl_field ();
+
+      HOST_WIDE_INT off = 0;
+      if (GET_CODE (x) == CONST)
+	x = XEXP (x, 0);
+      if (GET_CODE (x) == PLUS)
+	{
+	  off = INTVAL (XEXP (x, 1));
+	  x = XEXP (x, 0);
+	}
+      if (GET_CODE (x) == SYMBOL_REF)
+	{
+	  nvptx_record_needed_fndecl (SYMBOL_REF_DECL (x));
+	  fprintf (asm_out_file, "generic(");
+	  output_address (x);
+	  fprintf (asm_out_file, ")");
+	}
+      if (off != 0)
+	fprintf (asm_out_file, " + " HOST_WIDE_INT_PRINT_DEC, off);
+      return true;
+    }
+
+  HOST_WIDE_INT val;
+  switch (GET_CODE (x))
+    {
+    case CONST_INT:
+      val = INTVAL (x);
+      break;
+    case CONST_DOUBLE:
+      gcc_unreachable ();
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  nvptx_assemble_value (val, size);
+  return true;
+}
+
+/* Output SIZE zero bytes.  We ignore the FILE argument since the
+   functions we're calling to perform the output just use
+   asm_out_file.  */
+
+void
+nvptx_output_skip (FILE *, unsigned HOST_WIDE_INT size)
+{
+  if (decl_offset + size >= object_size)
+    {
+      if (decl_offset % decl_chunk_size != 0)
+	nvptx_assemble_value (0, decl_chunk_size);
+      object_finished = true;
+      return;
+    }
+
+  while (size > decl_chunk_size)
+    {
+      nvptx_assemble_value (0, decl_chunk_size);
+      size -= decl_chunk_size;
+    }
+  while (size-- > 0)
+    nvptx_assemble_value (0, 1);
+}
+
+/* Output a string STR with length SIZE.  As in nvptx_output_skip we
+   ignore the FILE arg.  */
+
+void
+nvptx_output_ascii (FILE *, const char *str, unsigned HOST_WIDE_INT size)
+{
+  for (unsigned HOST_WIDE_INT i = 0; i < size; i++)
+    nvptx_assemble_value (str[i], 1);
+}
+
+/* Called when the initializer for a decl has been completely output through
+   combinations of the three functions above.  */
+
+static void
+nvptx_assemble_decl_end (void)
+{
+  if (decl_offset != 0)
+    {
+      if (!object_finished && decl_offset % decl_chunk_size != 0)
+	nvptx_assemble_value (0, decl_chunk_size);
+
+      fprintf (asm_out_file, " }");
+    }
+  fprintf (asm_out_file, ";\n");
+}
+
+/* Start a declaration of a variable of TYPE with NAME to
+   FILE.  IS_PUBLIC says whether this will be externally visible.
+   Here we just write the linker hint and decide on the chunk size
+   to use.  */
+
+static void
+init_output_initializer (FILE *file, const char *name, const_tree type,
+			 bool is_public)
+{
+  fprintf (file, "// BEGIN%s VAR DEF: ", is_public ? " GLOBAL" : "");
+  assemble_name_raw (file, name);
+  fputc ('\n', file);
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    type = TREE_TYPE (type);
+  int sz = int_size_in_bytes (type);
+  if ((TREE_CODE (type) != INTEGER_TYPE
+       && TREE_CODE (type) != ENUMERAL_TYPE
+       && TREE_CODE (type) != REAL_TYPE)
+      || sz < 0
+      || sz > HOST_BITS_PER_WIDE_INT)
+    type = ptr_type_node;
+  decl_chunk_size = int_size_in_bytes (type);
+  decl_chunk_mode = int_mode_for_mode (TYPE_MODE (type));
+  decl_offset = 0;
+  init_part = 0;
+  object_finished = false;
+}
+
+/* Implement TARGET_ASM_DECLARE_CONSTANT_NAME.  Begin the process of
+   writing a constant variable EXP with NAME and SIZE and its
+   initializer to FILE.  */
+
+static void
+nvptx_asm_declare_constant_name (FILE *file, const char *name,
+				 const_tree exp, HOST_WIDE_INT size)
+{
+  tree type = TREE_TYPE (exp);
+  init_output_initializer (file, name, type, false);
+  fprintf (file, "\t.const .align %d .u%d ",
+	   TYPE_ALIGN (TREE_TYPE (exp)) / BITS_PER_UNIT,
+	   decl_chunk_size * BITS_PER_UNIT);
+  assemble_name (file, name);
+  fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]",
+	   (size + decl_chunk_size - 1) / decl_chunk_size);
+  object_size = size;
+}
+
+/* Implement the ASM_DECLARE_OBJECT_NAME macro.  Used to start writing
+   a variable DECL with NAME to FILE.  */
+
+void
+nvptx_declare_object_name (FILE *file, const char *name, const_tree decl)
+{
+  if (decl && DECL_SIZE (decl))
+    {
+      tree type = TREE_TYPE (decl);
+      unsigned HOST_WIDE_INT size;
+
+      init_output_initializer (file, name, type, TREE_PUBLIC (decl));
+      size = tree_to_uhwi (DECL_SIZE_UNIT (decl));
+      const char *section = nvptx_section_for_decl (decl);
+      fprintf (file, "\t%s%s .align %d .u%d ",
+	       TREE_PUBLIC (decl) ? " .visible" : "", section,
+	       DECL_ALIGN (decl) / BITS_PER_UNIT,
+	       decl_chunk_size * BITS_PER_UNIT);
+      assemble_name (file, name);
+      if (size > 0)
+	fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]",
+		 (size + decl_chunk_size - 1) / decl_chunk_size);
+      else
+	object_finished = true;
+      object_size = size;
+    }
+}
+
+/* Implement TARGET_ASM_GLOBALIZE_LABEL by doing nothing.  */
+
+static void
+nvptx_globalize_label (FILE *, const char *)
+{
+}
+
+/* Implement TARGET_ASM_ASSEMBLE_UNDEFINED_DECL.  Write an extern
+   declaration only for variable DECL with NAME to FILE.  */
+static void
+nvptx_assemble_undefined_decl (FILE *file, const char *name, const_tree decl)
+{
+  if (TREE_CODE (decl) != VAR_DECL)
+    return;
+  const char *section = nvptx_section_for_decl (decl);
+  fprintf (file, "// BEGIN%s VAR DECL: ", TREE_PUBLIC (decl) ? " GLOBAL" : "");
+  assemble_name_raw (file, name);
+  fputs ("\n", file);
+  HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (decl));
+  fprintf (file, ".extern %s .b8 ", section);
+  assemble_name_raw (file, name);
+  if (size > 0)
+    fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]", size);
+  fprintf (file, ";\n\n");
+}
+
+/* Output INSN, which is a call to CALLEE with result RESULT.  For ptx, this
+   involves writing .param declarations and in/out copies into them.  */
+
+const char *
+nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx callee)
+{
+  char buf[256];
+  static int labelno;
+  bool needs_tgt = register_operand (callee, Pmode);
+  rtx pat = PATTERN (insn);
+  int nargs = XVECLEN (pat, 0) - 1;
+  tree decl = NULL_TREE;
+
+  fprintf (asm_out_file, "\t{\n");
+  if (result != NULL)
+    {
+      fprintf (asm_out_file, "\t\t.param%s %%retval_in;\n",
+	       nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)),
+					 false));
+    }
+
+  if (GET_CODE (callee) == SYMBOL_REF)
+    {
+      decl = SYMBOL_REF_DECL (callee);
+      if (decl && DECL_EXTERNAL (decl))
+	nvptx_record_fndecl (decl);
+    }
+
+  if (needs_tgt)
+    {
+      ASM_GENERATE_INTERNAL_LABEL (buf, "LCT", labelno);
+      labelno++;
+      ASM_OUTPUT_LABEL (asm_out_file, buf);
+      std::stringstream s;
+      write_func_decl_from_insn (s, result, pat, callee);
+      fputs (s.str().c_str(), asm_out_file);
+    }
+
+  for (int i = 0, argno = 0; i < nargs; i++)
+    {
+      rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+      enum machine_mode mode = GET_MODE (t);
+      int count = maybe_split_mode (&mode);
+
+      while (count-- > 0)
+	fprintf (asm_out_file, "\t\t.param%s %%out_arg%d%s;\n",
+		 nvptx_ptx_type_from_mode (mode, false), argno++,
+		 mode == QImode || mode == HImode ? "[1]" : "");
+    }
+  for (int i = 0, argno = 0; i < nargs; i++)
+    {
+      rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+      gcc_assert (REG_P (t));
+      enum machine_mode mode = GET_MODE (t);
+      int count = maybe_split_mode (&mode);
+
+      if (count == 1)
+	fprintf (asm_out_file, "\t\tst.param%s [%%out_arg%d], %%r%d;\n",
+		 nvptx_ptx_type_from_mode (mode, false), argno++,
+		 REGNO (t));
+      else
+	{
+	  int n = 0;
+	  while (count-- > 0)
+	    fprintf (asm_out_file, "\t\tst.param%s [%%out_arg%d], %%r%d$%d;\n",
+		     nvptx_ptx_type_from_mode (mode, false), argno++,
+		     REGNO (t), n++);
+	}
+    }
+
+  fprintf (asm_out_file, "\t\tcall ");
+  if (result != NULL_RTX)
+    fprintf (asm_out_file, "(%%retval_in), ");
+
+  if (decl)
+    {
+      const char *name = get_fnname_from_decl (decl);
+      name = nvptx_name_replacement (name);
+      assemble_name (asm_out_file, name);
+    }
+  else
+    output_address (callee);
+
+  if (nargs > 0 || (decl && DECL_STATIC_CHAIN (decl)))
+    {
+      fprintf (asm_out_file, ", (");
+      int i, argno;
+      for (i = 0, argno = 0; i < nargs; i++)
+	{
+	  rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+	  enum machine_mode mode = GET_MODE (t);
+	  int count = maybe_split_mode (&mode);
+
+	  while (count-- > 0)
+	    {
+	      fprintf (asm_out_file, "%%out_arg%d", argno++);
+	      if (i + 1 < nargs || count > 0)
+		fprintf (asm_out_file, ", ");
+	    }
+	}
+      if (decl && DECL_STATIC_CHAIN (decl))
+	{
+	  if (i > 0)
+	    fprintf (asm_out_file, ", ");
+	  fprintf (asm_out_file, "%s",
+		   reg_names [OUTGOING_STATIC_CHAIN_REGNUM]);
+	}
+
+      fprintf (asm_out_file, ")");
+    }
+  if (needs_tgt)
+    {
+      fprintf (asm_out_file, ", ");
+      assemble_name (asm_out_file, buf);
+    }
+  fprintf (asm_out_file, ";\n");
+  if (result != NULL_RTX)
+    return "ld.param%t0\t%0, [%%retval_in];\n\t}";
+
+  return "}";
+}
+
+/* Implement TARGET_PRINT_OPERAND_PUNCT_VALID_P.  */
+
+static bool
+nvptx_print_operand_punct_valid_p (unsigned char c)
+{
+  return c == '.' || c == '#';
+}
+
+static void nvptx_print_operand (FILE *, rtx, int);
+
+/* Subroutine of nvptx_print_operand; used to print a memory reference X to FILE.  */
+
+static void
+nvptx_print_address_operand (FILE *file, rtx x, enum machine_mode)
+{
+  rtx off;
+  if (GET_CODE (x) == CONST)
+    x = XEXP (x, 0);
+  switch (GET_CODE (x))
+    {
+    case PLUS:
+      off = XEXP (x, 1);
+      output_address (XEXP (x, 0));
+      fprintf (file, "+");
+      output_address (off);
+      break;
+
+    case SYMBOL_REF:
+    case LABEL_REF:
+      output_addr_const (file, x);
+      break;
+
+    default:
+      gcc_assert (GET_CODE (x) != MEM);
+      nvptx_print_operand (file, x, 0);
+      break;
+    }
+}
+
+/* Write assembly language output for the address ADDR to FILE.  */
+
+static void
+nvptx_print_operand_address (FILE *file, rtx addr)
+{
+  nvptx_print_address_operand (file, addr, VOIDmode);
+}
+
+/* Print an operand, X, to FILE, with an optional modifier in CODE.
+
+   Meaning of CODE:
+   . -- print the predicate for the instruction or an empty string for an
+        unconditional one.
+   # -- print a rounding mode for the instruction
+
+   A -- print an address space identifier for a MEM
+   c -- print an opcode suffix for a comparison operator, including a type code
+   d -- print a CONST_INT as a vector dimension (x, y, or z)
+   f -- print a full reg even for something that must always be split
+   t -- print a type opcode suffix, promoting QImode to 32 bits
+   T -- print a type size in bits
+   u -- print a type opcode suffix without promotions.  */
+
+static void
+nvptx_print_operand (FILE *file, rtx x, int code)
+{
+  rtx orig_x = x;
+  enum machine_mode op_mode;
+
+  if (code == '.')
+    {
+      x = current_insn_predicate;
+      if (x)
+	{
+	  unsigned int regno = REGNO (XEXP (x, 0));
+	  fputs ("[", file);
+	  if (GET_CODE (x) == EQ)
+	    fputs ("!", file);
+	  fputs (reg_names [regno], file);
+	  fputs ("]", file);
+	}
+      return;
+    }
+  else if (code == '#')
+    {
+      fputs (".rn", file);
+      return;
+    }
+
+  enum rtx_code x_code = GET_CODE (x);
+
+  switch (code)
+    {
+    case 'A':
+      {
+	addr_space_t as = nvptx_addr_space_from_address (XEXP (x, 0));
+	fputs (nvptx_section_from_addr_space (as), file);
+      }
+      break;
+
+    case 'd':
+      gcc_assert (x_code == CONST_INT);
+      if (INTVAL (x) == 0)
+	fputs (".x", file);
+      else if (INTVAL (x) == 1)
+	fputs (".y", file);
+      else if (INTVAL (x) == 2)
+	fputs (".z", file);
+      else
+	gcc_unreachable ();
+      break;
+
+    case 't':
+      op_mode = nvptx_underlying_object_mode (x);
+      fprintf (file, "%s", nvptx_ptx_type_from_mode (op_mode, true));
+      break;
+
+    case 'u':
+      op_mode = nvptx_underlying_object_mode (x);
+      fprintf (file, "%s", nvptx_ptx_type_from_mode (op_mode, false));
+      break;
+
+    case 'T':
+      fprintf (file, "%d", GET_MODE_BITSIZE (GET_MODE (x)));
+      break;
+
+    case 'j':
+      fprintf (file, "@");
+      goto common;
+
+    case 'J':
+      fprintf (file, "@!");
+      goto common;
+
+    case 'c':
+      op_mode = GET_MODE (XEXP (x, 0));
+      switch (x_code)
+	{
+	case EQ:
+	  fputs (".eq", file);
+	  break;
+	case NE:
+	  if (FLOAT_MODE_P (op_mode))
+	    fputs (".neu", file);
+	  else
+	    fputs (".ne", file);
+	  break;
+	case LE:
+	  fputs (".le", file);
+	  break;
+	case GE:
+	  fputs (".ge", file);
+	  break;
+	case LT:
+	  fputs (".lt", file);
+	  break;
+	case GT:
+	  fputs (".gt", file);
+	  break;
+	case LEU:
+	  fputs (".ls", file);
+	  break;
+	case GEU:
+	  fputs (".hs", file);
+	  break;
+	case LTU:
+	  fputs (".lo", file);
+	  break;
+	case GTU:
+	  fputs (".hi", file);
+	  break;
+	case LTGT:
+	  fputs (".ne", file);
+	  break;
+	case UNEQ:
+	  fputs (".equ", file);
+	  break;
+	case UNLE:
+	  fputs (".leu", file);
+	  break;
+	case UNGE:
+	  fputs (".geu", file);
+	  break;
+	case UNLT:
+	  fputs (".ltu", file);
+	  break;
+	case UNGT:
+	  fputs (".gtu", file);
+	  break;
+	case UNORDERED:
+	  fputs (".nan", file);
+	  break;
+	case ORDERED:
+	  fputs (".num", file);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      if (FLOAT_MODE_P (op_mode)
+	  || x_code == EQ || x_code == NE
+	  || x_code == GEU || x_code == GTU
+	  || x_code == LEU || x_code == LTU)
+	fputs (nvptx_ptx_type_from_mode (op_mode, true), file);
+      else
+	fprintf (file, ".s%d", GET_MODE_BITSIZE (op_mode));
+      break;
+    default:
+    common:
+      switch (x_code)
+	{
+	case SUBREG:
+	  x = SUBREG_REG (x);
+	  /* fall through */
+
+	case REG:
+	  if (HARD_REGISTER_P (x))
+	    fprintf (file, "%s", reg_names[REGNO (x)]);
+	  else
+	    fprintf (file, "%%r%d", REGNO (x));
+	  if (code != 'f' && nvptx_split_reg_p (GET_MODE (x)))
+	    {
+	      gcc_assert (GET_CODE (orig_x) == SUBREG
+			  && !nvptx_split_reg_p (GET_MODE (orig_x)));
+	      fprintf (file, "$%d", SUBREG_BYTE (orig_x) / UNITS_PER_WORD);
+	    }
+	  break;
+
+	case MEM:
+	  fputc ('[', file);
+	  nvptx_print_address_operand (file, XEXP (x, 0), GET_MODE (x));
+	  fputc (']', file);
+	  break;
+
+	case CONST_INT:
+	  output_addr_const (file, x);
+	  break;
+
+	case CONST:
+	case SYMBOL_REF:
+	case LABEL_REF:
+	  /* We could use output_addr_const, but that can print things like
+	     "x-8", which breaks ptxas.  Need to ensure it is output as
+	     "x+-8".  */
+	  nvptx_print_address_operand (file, x, VOIDmode);
+	  break;
+
+	case CONST_DOUBLE:
+	  long vals[2];
+	  REAL_VALUE_TYPE real;
+	  REAL_VALUE_FROM_CONST_DOUBLE (real, x);
+	  real_to_target (vals, &real, GET_MODE (x));
+	  vals[0] &= 0xffffffff;
+	  vals[1] &= 0xffffffff;
+	  if (GET_MODE (x) == SFmode)
+	    fprintf (file, "0f%08lx", vals[0]);
+	  else
+	    fprintf (file, "0d%08lx%08lx", vals[1], vals[0]);
+	  break;
+
+	default:
+	  output_addr_const (file, x);
+	}
+    }
+}
+\f
+/* Record replacement regs used to deal with subreg operands.  */
+struct reg_replace
+{
+  rtx replacement[MAX_RECOG_OPERANDS];
+  enum machine_mode mode;
+  int n_allocated;
+  int n_in_use;
+};
+
+/* Allocate or reuse a replacement in R and return the rtx.  */
+
+static rtx
+get_replacement (struct reg_replace *r)
+{
+  if (r->n_allocated == r->n_in_use)
+    r->replacement[r->n_allocated++] = gen_reg_rtx (r->mode);
+  return r->replacement[r->n_in_use++];
+}
+
+/* Clean up subreg operands.  In ptx assembly, everything is typed, and
+   the presence of subregs would break the rules for most instructions.
+   Replace them with a suitable new register of the right size, plus
+   conversion copyin/copyout instructions.  */
+
+static void
+nvptx_reorg (void)
+{
+  struct reg_replace qiregs, hiregs, siregs, diregs;
+  rtx_insn *insn, *next;
+
+  /* We are freeing block_for_insn in the toplev to keep compatibility
+     with old MDEP_REORGS that are not CFG based.  Recompute it now.  */
+  compute_bb_for_insn ();
+
+  df_clear_flags (DF_LR_RUN_DCE);
+  df_analyze ();
+
+  thread_prologue_and_epilogue_insns ();
+
+  qiregs.n_allocated = 0;
+  hiregs.n_allocated = 0;
+  siregs.n_allocated = 0;
+  diregs.n_allocated = 0;
+  qiregs.mode = QImode;
+  hiregs.mode = HImode;
+  siregs.mode = SImode;
+  diregs.mode = DImode;
+
+  for (insn = get_insns (); insn; insn = next)
+    {
+      next = NEXT_INSN (insn);
+      if (!NONDEBUG_INSN_P (insn)
+	  || asm_noperands (insn) >= 0
+	  || GET_CODE (PATTERN (insn)) == USE
+	  || GET_CODE (PATTERN (insn)) == CLOBBER)
+	continue;
+      qiregs.n_in_use = 0;
+      hiregs.n_in_use = 0;
+      siregs.n_in_use = 0;
+      diregs.n_in_use = 0;
+      extract_insn (insn);
+      enum attr_subregs_ok s_ok = get_attr_subregs_ok (insn);
+      for (int i = 0; i < recog_data.n_operands; i++)
+	{
+	  rtx op = recog_data.operand[i];
+	  if (GET_CODE (op) != SUBREG)
+	    continue;
+
+	  rtx inner = SUBREG_REG (op);
+
+	  enum machine_mode outer_mode = GET_MODE (op);
+	  enum machine_mode inner_mode = GET_MODE (inner);
+	  gcc_assert (s_ok);
+	  if (s_ok
+	      && (GET_MODE_PRECISION (inner_mode)
+		  >= GET_MODE_PRECISION (outer_mode)))
+	    continue;
+	  gcc_assert (SCALAR_INT_MODE_P (outer_mode));
+	  struct reg_replace *r = (outer_mode == QImode ? &qiregs
+				   : outer_mode == HImode ? &hiregs
+				   : outer_mode == SImode ? &siregs
+				   : &diregs);
+	  rtx new_reg = get_replacement (r);
+
+	  if (recog_data.operand_type[i] != OP_OUT)
+	    {
+	      enum rtx_code code;
+	      if (GET_MODE_PRECISION (inner_mode)
+		  < GET_MODE_PRECISION (outer_mode))
+		code = ZERO_EXTEND;
+	      else
+		code = TRUNCATE;
+
+	      rtx pat = gen_rtx_SET (VOIDmode, new_reg,
+				     gen_rtx_fmt_e (code, outer_mode, inner));
+	      emit_insn_before (pat, insn);
+	    }
+
+	  if (recog_data.operand_type[i] != OP_IN)
+	    {
+	      enum rtx_code code;
+	      if (GET_MODE_PRECISION (inner_mode)
+		  < GET_MODE_PRECISION (outer_mode))
+		code = TRUNCATE;
+	      else
+		code = ZERO_EXTEND;
+
+	      rtx pat = gen_rtx_SET (VOIDmode, inner,
+				     gen_rtx_fmt_e (code, inner_mode, new_reg));
+	      emit_insn_after (pat, insn);
+	    }
+	  validate_change (insn, recog_data.operand_loc[i], new_reg, false);
+	}
+    }
+
+  int maxregs = max_reg_num ();
+  regstat_init_n_sets_and_refs ();
+
+  for (int i = LAST_VIRTUAL_REGISTER + 1; i < maxregs; i++)
+    if (REG_N_SETS (i) == 0 && REG_N_REFS (i) == 0)
+      regno_reg_rtx[i] = const0_rtx;
+  regstat_free_n_sets_and_refs ();
+}
+\f
+/* Handle a "kernel" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+nvptx_handle_kernel_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+			       int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  tree decl = *node;
+
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    {
+      error ("%qE attribute only applies to functions", name);
+      *no_add_attrs = true;
+    }
+
+  else if (TREE_TYPE (TREE_TYPE (decl)) != void_type_node)
+    {
+      error ("%qE attribute requires a void return type", name);
+      *no_add_attrs = true;
+    }
+
+  return NULL_TREE;
+}
+
+/* Table of valid machine attributes.  */
+static const struct attribute_spec nvptx_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
+       affects_type_identity } */
+  { "kernel", 0, 0, true, false,  false, nvptx_handle_kernel_attribute, false },
+  { NULL, 0, 0, false, false, false, NULL, false }
+};
+\f
+/* Limit vector alignments to BIGGEST_ALIGNMENT.  */
+
+static HOST_WIDE_INT
+nvptx_vector_alignment (const_tree type)
+{
+  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
+
+  return MIN (align, BIGGEST_ALIGNMENT);
+}
+\f
+/* Implement TARGET_ASM_FILE_START.  Write the kinds of things ptxas expects
+   at the start of a file.  */
+
+static void
+nvptx_file_start (void)
+{
+  fputs ("// BEGIN PREAMBLE\n", asm_out_file);
+  fputs ("\t.version\t3.1\n", asm_out_file);
+  fputs ("\t.target\tsm_30\n", asm_out_file);
+  fprintf (asm_out_file, "\t.address_size %d\n", GET_MODE_BITSIZE (Pmode));
+  fputs ("// END PREAMBLE\n", asm_out_file);
+}
+
+/* Called through htab_traverse; call nvptx_record_fndecl for every
+   SLOT.  */
+
+static int
+write_one_fndecl (void **slot, void *)
+{
+  tree decl = (tree)*slot;
+  nvptx_record_fndecl (decl, true);
+  return 1;
+}
+
+/* Write out the function declarations we've collected.  */
+
+static void
+nvptx_file_end (void)
+{
+  htab_traverse (needed_fndecls_htab,
+		 write_one_fndecl,
+		 NULL);
+  fputs (func_decls.str().c_str(), asm_out_file);
+}
+\f
+#undef TARGET_OPTION_OVERRIDE
+#define TARGET_OPTION_OVERRIDE nvptx_option_override
+
+#undef TARGET_ATTRIBUTE_TABLE
+#define TARGET_ATTRIBUTE_TABLE nvptx_attribute_table
+
+#undef TARGET_LEGITIMATE_ADDRESS_P
+#define TARGET_LEGITIMATE_ADDRESS_P nvptx_legitimate_address_p
+
+#undef  TARGET_PROMOTE_FUNCTION_MODE
+#define TARGET_PROMOTE_FUNCTION_MODE nvptx_promote_function_mode
+
+#undef TARGET_FUNCTION_ARG
+#define TARGET_FUNCTION_ARG nvptx_function_arg
+#undef TARGET_FUNCTION_INCOMING_ARG
+#define TARGET_FUNCTION_INCOMING_ARG nvptx_function_incoming_arg
+#undef TARGET_FUNCTION_ARG_ADVANCE
+#define TARGET_FUNCTION_ARG_ADVANCE nvptx_function_arg_advance
+#undef TARGET_FUNCTION_ARG_BOUNDARY
+#define TARGET_FUNCTION_ARG_BOUNDARY nvptx_function_arg_boundary
+#undef TARGET_FUNCTION_ARG_ROUND_BOUNDARY
+#define TARGET_FUNCTION_ARG_ROUND_BOUNDARY nvptx_function_arg_boundary
+#undef TARGET_PASS_BY_REFERENCE
+#define TARGET_PASS_BY_REFERENCE nvptx_pass_by_reference
+#undef TARGET_FUNCTION_VALUE_REGNO_P
+#define TARGET_FUNCTION_VALUE_REGNO_P nvptx_function_value_regno_p
+#undef TARGET_FUNCTION_VALUE
+#define TARGET_FUNCTION_VALUE nvptx_function_value
+#undef TARGET_LIBCALL_VALUE
+#define TARGET_LIBCALL_VALUE nvptx_libcall_value
+#undef TARGET_FUNCTION_OK_FOR_SIBCALL
+#define TARGET_FUNCTION_OK_FOR_SIBCALL nvptx_function_ok_for_sibcall
+#undef TARGET_SPLIT_COMPLEX_ARG
+#define TARGET_SPLIT_COMPLEX_ARG hook_bool_const_tree_true
+#undef TARGET_RETURN_IN_MEMORY
+#define TARGET_RETURN_IN_MEMORY nvptx_return_in_memory
+#undef TARGET_OMIT_STRUCT_RETURN_REG
+#define TARGET_OMIT_STRUCT_RETURN_REG true
+#undef TARGET_STRICT_ARGUMENT_NAMING
+#define TARGET_STRICT_ARGUMENT_NAMING nvptx_strict_argument_naming
+#undef TARGET_STATIC_CHAIN
+#define TARGET_STATIC_CHAIN nvptx_static_chain
+
+#undef TARGET_CALL_ARGS
+#define TARGET_CALL_ARGS nvptx_call_args
+#undef TARGET_END_CALL_ARGS
+#define TARGET_END_CALL_ARGS nvptx_end_call_args
+
+#undef TARGET_ASM_FILE_START
+#define TARGET_ASM_FILE_START nvptx_file_start
+#undef TARGET_ASM_FILE_END
+#define TARGET_ASM_FILE_END nvptx_file_end
+#undef TARGET_ASM_GLOBALIZE_LABEL
+#define TARGET_ASM_GLOBALIZE_LABEL nvptx_globalize_label
+#undef TARGET_ASM_ASSEMBLE_UNDEFINED_DECL
+#define TARGET_ASM_ASSEMBLE_UNDEFINED_DECL nvptx_assemble_undefined_decl
+#undef  TARGET_PRINT_OPERAND
+#define TARGET_PRINT_OPERAND nvptx_print_operand
+#undef  TARGET_PRINT_OPERAND_ADDRESS
+#define TARGET_PRINT_OPERAND_ADDRESS nvptx_print_operand_address
+#undef  TARGET_PRINT_OPERAND_PUNCT_VALID_P
+#define TARGET_PRINT_OPERAND_PUNCT_VALID_P nvptx_print_operand_punct_valid_p
+#undef TARGET_ASM_INTEGER
+#define TARGET_ASM_INTEGER nvptx_assemble_integer
+#undef TARGET_ASM_DECL_END
+#define TARGET_ASM_DECL_END nvptx_assemble_decl_end
+#undef TARGET_ASM_DECLARE_CONSTANT_NAME
+#define TARGET_ASM_DECLARE_CONSTANT_NAME nvptx_asm_declare_constant_name
+#undef TARGET_USE_BLOCKS_FOR_CONSTANT_P
+#define TARGET_USE_BLOCKS_FOR_CONSTANT_P hook_bool_mode_const_rtx_true
+#undef TARGET_ASM_NEED_VAR_DECL_BEFORE_USE
+#define TARGET_ASM_NEED_VAR_DECL_BEFORE_USE true
+
+#undef TARGET_MACHINE_DEPENDENT_REORG
+#define TARGET_MACHINE_DEPENDENT_REORG nvptx_reorg
+#undef TARGET_NO_REGISTER_ALLOCATION
+#define TARGET_NO_REGISTER_ALLOCATION true
+
+#undef TARGET_VECTOR_ALIGNMENT
+#define TARGET_VECTOR_ALIGNMENT nvptx_vector_alignment
+
+struct gcc_target targetm = TARGET_INITIALIZER;
+
+#include "gt-nvptx.h"
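As an aside on the CONST_DOUBLE case in nvptx_print_operand above: ptx
takes floating-point constants as the hex image of the IEEE-754 value,
prefixed with 0f (single) or 0d (double). A minimal standalone sketch of
that formatting follows. The helper names are hypothetical and not part
of the patch, and it assumes the host float/double are IEEE-754
binary32/binary64, whereas the port itself goes through real_to_target.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not in the patch): format VALUE as a ptx
   single-precision literal, "0f" plus eight hex digits.  Assumes the
   host float is IEEE-754 binary32.  */
static void
ptx_float_literal (char *buf, size_t len, float value)
{
  uint32_t bits;
  assert (len >= 11);	/* "0f" + 8 digits + NUL.  */
  memcpy (&bits, &value, sizeof bits);
  snprintf (buf, len, "0f%08lx", (unsigned long) bits);
}

/* Hypothetical helper (not in the patch): format VALUE as a ptx
   double-precision literal, "0d" plus sixteen hex digits, high word
   first, matching the "0d%08lx%08lx" format used above.  Assumes the
   host double is IEEE-754 binary64.  */
static void
ptx_double_literal (char *buf, size_t len, double value)
{
  uint64_t bits;
  assert (len >= 19);	/* "0d" + 16 digits + NUL.  */
  memcpy (&bits, &value, sizeof bits);
  snprintf (buf, len, "0d%08lx%08lx",
	    (unsigned long) (bits >> 32),
	    (unsigned long) (bits & 0xffffffffu));
}
```

With these, 1.0f comes out as 0f3f800000 and 1.0 as 0d3ff0000000000000.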
Index: gcc/config/nvptx/nvptx.opt
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx.opt
@@ -0,0 +1,30 @@
+; Options for the NVPTX port
+; Copyright 2014 Free Software Foundation, Inc.
+;
+; This file is part of GCC.
+;
+; GCC is free software; you can redistribute it and/or modify it under
+; the terms of the GNU General Public License as published by the Free
+; Software Foundation; either version 3, or (at your option) any later
+; version.
+;
+; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+; FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+; for more details.
+;
+; You should have received a copy of the GNU General Public License
+; along with GCC; see the file COPYING3.  If not see
+; <http://www.gnu.org/licenses/>.
+
+m64
+Target Report RejectNegative Mask(ABI64)
+Generate code for a 64-bit ABI.
+
+m32
+Target Report RejectNegative InverseMask(ABI64)
+Generate code for a 32-bit ABI.
+
+mmainkernel
+Target Report RejectNegative
+Link in code for a __main kernel.
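For readers less familiar with .opt files: Mask(ABI64) ties -m64 to one
bit of target_flags, and InverseMask(ABI64) makes -m32 clear that same
bit. The sketch below models this outside GCC. The bit position, the
function-style TARGET_ABI64 macro and the apply_* helpers are
illustrative assumptions only; in GCC the options scripts generate
MASK_ABI64 and TARGET_ABI64 themselves.

```c
#include <assert.h>

/* Hypothetical stand-ins for what the options machinery generates from
   Mask(ABI64); the real bit is allocated by GCC's option scripts.  */
#define MASK_ABI64 (1 << 0)
#define TARGET_ABI64(flags) (((flags) & MASK_ABI64) != 0)

/* -m64 sets the bit (Mask).  */
static int
apply_m64 (int flags)
{
  return flags | MASK_ABI64;
}

/* -m32 clears the same bit (InverseMask).  */
static int
apply_m32 (int flags)
{
  return flags & ~MASK_ABI64;
}

/* Mirrors how LONG_TYPE_SIZE and POINTER_SIZE in nvptx.h select 64 or
   32 depending on the ABI bit.  */
static int
long_type_size (int flags)
{
  assert ((flags & ~MASK_ABI64) == 0);	/* Only one bit in this model.  */
  return TARGET_ABI64 (flags) ? 64 : 32;
}
```

Starting from flags of 0, apply_m64 gives a 64-bit long and pointer
size, and a later apply_m32 switches back to 32.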
Index: gcc/config/nvptx/t-nvptx
===================================================================
--- /dev/null
+++ gcc/config/nvptx/t-nvptx
@@ -0,0 +1,2 @@
+#
+
Index: gcc/config/nvptx/nvptx.h
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx.h
@@ -0,0 +1,356 @@
+/* Target Definitions for NVPTX.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_NVPTX_H
+#define GCC_NVPTX_H
+
+/* Run-time Target.  */
+
+#define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
+
+#define TARGET_CPU_CPP_BUILTINS()		\
+  do						\
+    {						\
+      builtin_assert ("machine=nvptx");		\
+      builtin_assert ("cpu=nvptx");		\
+      builtin_define ("__nvptx__");		\
+    } while (0)
+
+/* Storage Layout.  */
+
+#define BITS_BIG_ENDIAN 0
+#define BYTES_BIG_ENDIAN 0
+#define WORDS_BIG_ENDIAN 0
+
+/* Chosen such that we won't have to deal with multi-word subregs.  */
+#define UNITS_PER_WORD 8
+
+#define PARM_BOUNDARY 8
+#define STACK_BOUNDARY 64
+#define FUNCTION_BOUNDARY 32
+#define BIGGEST_ALIGNMENT 64
+#define STRICT_ALIGNMENT 1
+
+/* Copied from elf.h and other places.  We'd otherwise use
+   BIGGEST_ALIGNMENT and fail a number of testcases.  */
+#define MAX_OFILE_ALIGNMENT (32768 * 8)
+
+/* Type Layout.  */
+
+#define DEFAULT_SIGNED_CHAR 1
+
+#define SHORT_TYPE_SIZE 16
+#define INT_TYPE_SIZE 32
+#define LONG_TYPE_SIZE (TARGET_ABI64 ? 64 : 32)
+#define LONG_LONG_TYPE_SIZE 64
+#define FLOAT_TYPE_SIZE 32
+#define DOUBLE_TYPE_SIZE 64
+#define LONG_DOUBLE_TYPE_SIZE 64
+
+#undef SIZE_TYPE
+#define SIZE_TYPE (TARGET_ABI64 ? "long unsigned int" : "unsigned int")
+#undef PTRDIFF_TYPE
+#define PTRDIFF_TYPE (TARGET_ABI64 ? "long int" : "int")
+
+#define POINTER_SIZE (TARGET_ABI64 ? 64 : 32)
+
+#define Pmode (TARGET_ABI64 ? DImode : SImode)
+
+/* Registers.  Since ptx is a virtual target, we just define a few
+   hard registers for special purposes and leave pseudos unallocated.  */
+
+#define FIRST_PSEUDO_REGISTER 16
+#define FIXED_REGISTERS					\
+  { 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1 }
+#define CALL_USED_REGISTERS				\
+  { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }
+
+#define HARD_REGNO_NREGS(regno, mode)	1
+#define CANNOT_CHANGE_MODE_CLASS(M1, M2, CLS) ((CLS) == RETURN_REG)
+#define HARD_REGNO_MODE_OK(REG, MODE) nvptx_hard_regno_mode_ok (REG, MODE)
+
+/* Register Classes.  */
+
+enum reg_class
+  {
+    NO_REGS,
+    RETURN_REG,
+    ALL_REGS,
+    LIM_REG_CLASSES
+  };
+
+#define N_REG_CLASSES (int) LIM_REG_CLASSES
+
+#define REG_CLASS_NAMES {	  \
+    "NO_REGS",			  \
+    "RETURN_REG",		  \
+    "ALL_REGS" }
+
+#define REG_CLASS_CONTENTS	\
+{				\
+  /* NO_REGS.  */		\
+  { 0x0000 },			\
+  /* RETURN_REG.  */		\
+  { 0x0010 },			\
+  /* ALL_REGS.  */		\
+  { 0xFFFF },			\
+}
+
+#define GENERAL_REGS ALL_REGS
+
+#define REGNO_REG_CLASS(R) ((R) == 4 ? RETURN_REG : ALL_REGS)
+
+#define BASE_REG_CLASS ALL_REGS
+#define INDEX_REG_CLASS NO_REGS
+
+#define REGNO_OK_FOR_BASE_P(X) true
+#define REGNO_OK_FOR_INDEX_P(X) false
+
+#define CLASS_MAX_NREGS(class, mode) \
+  ((GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)
+
+#define MODES_TIEABLE_P(M1, M2) false
+
+#define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)		\
+  if (GET_MODE_CLASS (MODE) == MODE_INT			\
+      && GET_MODE_SIZE (MODE) < GET_MODE_SIZE (SImode))	\
+    {							\
+      (MODE) = SImode;					\
+    }
+
+/* Address spaces.  */
+#define ADDR_SPACE_GLOBAL 1
+#define ADDR_SPACE_SHARED 3
+#define ADDR_SPACE_CONST 4
+#define ADDR_SPACE_LOCAL 5
+#define ADDR_SPACE_PARAM 101
+
+/* Stack and Calling.  */
+
+#define STARTING_FRAME_OFFSET 0
+#define FRAME_GROWS_DOWNWARD 0
+#define STACK_GROWS_DOWNWARD
+
+#define STACK_POINTER_REGNUM 1
+#define HARD_FRAME_POINTER_REGNUM 2
+#define NVPTX_PUNNING_BUFFER_REGNUM 3
+#define NVPTX_RETURN_REGNUM 4
+#define FRAME_POINTER_REGNUM 15
+#define ARG_POINTER_REGNUM 14
+#define RETURN_ADDR_REGNO 13
+
+#define STATIC_CHAIN_REGNUM 12
+#define OUTGOING_ARG_POINTER_REGNUM 11
+#define OUTGOING_STATIC_CHAIN_REGNUM 10
+
+#define FIRST_PARM_OFFSET(FNDECL) 0
+#define PUSH_ARGS_REVERSED 1
+
+#define ACCUMULATE_OUTGOING_ARGS 1
+
+#ifdef HOST_WIDE_INT
+struct nvptx_args {
+  union tree_node *fntype;
+  /* Number of arguments passed in registers so far.  */
+  int count;
+  /* Offset into the stdarg area so far.  */
+  HOST_WIDE_INT off;
+};
+#endif
+
+#define CUMULATIVE_ARGS struct nvptx_args
+
+#define INIT_CUMULATIVE_ARGS(CUM, FNTYPE, LIBNAME, FNDECL, N_NAMED_ARGS) \
+  do { (CUM).fntype = (FNTYPE); (CUM).count = 0; (CUM).off = 0; } while (0)
+
+#define FUNCTION_ARG_REGNO_P(r) 0
+
+#define DEFAULT_PCC_STRUCT_RETURN 0
+
+#define FUNCTION_PROFILER(file, labelno) \
+  fatal_error ("profiling is not yet implemented for this architecture")
+
+#define TRAMPOLINE_SIZE 32
+#define TRAMPOLINE_ALIGNMENT 256
+\f
+/* We don't run reload, so this isn't actually used, but it still needs to be
+   defined.  Showing an argp->fp elimination also stops
+   expand_builtin_setjmp_receiver from generating invalid insns.  */
+#define ELIMINABLE_REGS					\
+  {							\
+    { FRAME_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM},	\
+    { ARG_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM}	\
+  }
+
+/* Define the offset between two registers, one to be eliminated, and the other
+   its replacement, at the start of a routine.  */
+
+#define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET) \
+  ((OFFSET) = 0)
+\f
+/* Addressing Modes.  */
+
+#define MAX_REGS_PER_ADDRESS 1
+
+#define LEGITIMATE_PIC_OPERAND_P(X) 1
+\f
+
+struct nvptx_pseudo_info
+{
+  int true_size;
+  int renumber;
+};
+
+#if defined HOST_WIDE_INT
+struct GTY(()) machine_function
+{
+  rtx_expr_list *call_args;
+  rtx start_call;
+  tree funtype;
+  bool has_call_with_varargs;
+  bool has_call_with_sc;
+  struct GTY((skip)) nvptx_pseudo_info *pseudos;
+  HOST_WIDE_INT outgoing_stdarg_size;
+  int ret_reg_mode;
+  int punning_buffer_size;
+};
+#endif
+\f
+/* Costs.  */
+
+#define NO_FUNCTION_CSE 1
+#define SLOW_BYTE_ACCESS 0
+#define BRANCH_COST(speed_p, predictable_p) 6
+\f
+/* Assembler Format.  */
+
+#undef ASM_DECLARE_FUNCTION_NAME
+#define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL)		\
+  nvptx_declare_function_name (FILE, NAME, DECL)
+
+#undef ASM_DECLARE_FUNCTION_SIZE
+#define ASM_DECLARE_FUNCTION_SIZE(STREAM, NAME, DECL) \
+  nvptx_function_end (STREAM)
+
+#define DWARF2_ASM_LINE_DEBUG_INFO 1
+
+#undef ASM_APP_ON
+#define ASM_APP_ON "\t// #APP \n"
+#undef ASM_APP_OFF
+#define ASM_APP_OFF "\t// #NO_APP \n"
+
+#define ASM_OUTPUT_COMMON(stream, name, size, rounded)
+#define ASM_OUTPUT_LOCAL(stream, name, size, rounded)
+
+#define REGISTER_NAMES							\
+  {									\
+    "%hr0", "%outargs", "%hfp", "%punbuffer", "%retval", "%retval_in", "%hr6", "%hr7",	\
+    "%hr8", "%hr9", "%hr10", "%hr11", "%hr12", "%hr13", "%argp", "%frame" \
+  }
+
+#define DBX_REGISTER_NUMBER(N) N
+
+#define TEXT_SECTION_ASM_OP ""
+#define DATA_SECTION_ASM_OP ""
+
+#undef  ASM_GENERATE_INTERNAL_LABEL
+#define ASM_GENERATE_INTERNAL_LABEL(LABEL, PREFIX, NUM)		\
+  do								\
+    {								\
+      char *__p;						\
+      __p = stpcpy (&(LABEL)[1], PREFIX);			\
+      (LABEL)[0] = '$';						\
+      sprint_ul (__p, (unsigned long) (NUM));			\
+    }								\
+  while (0)
+
+#define ASM_OUTPUT_ALIGN(FILE, POWER)
+#define ASM_OUTPUT_SKIP(FILE, N)		\
+  nvptx_output_skip (FILE, N)
+#undef  ASM_OUTPUT_ASCII
+#define ASM_OUTPUT_ASCII(FILE, STR, LENGTH)			\
+  nvptx_output_ascii (FILE, STR, LENGTH)
+
+#define ASM_DECLARE_OBJECT_NAME(FILE, NAME, DECL)	\
+  nvptx_declare_object_name (FILE, NAME, DECL)
+
+#undef  ASM_OUTPUT_ALIGNED_DECL_COMMON
+#define ASM_OUTPUT_ALIGNED_DECL_COMMON(FILE, DECL, NAME, SIZE, ALIGN)	\
+  do									\
+    {									\
+      fprintf (FILE, "// BEGIN%s VAR DEF: ",				\
+	       TREE_PUBLIC (DECL) ? " GLOBAL" : "");			\
+      assemble_name_raw (FILE, NAME);					\
+      fputc ('\n', FILE);						\
+      const char *sec = nvptx_section_for_decl (DECL);			\
+      fprintf (FILE, ".visible%s.align %d .b8 ", sec,			\
+	       (ALIGN) / BITS_PER_UNIT);				\
+      assemble_name ((FILE), (NAME));					\
+      if ((SIZE) > 0)							\
+	fprintf (FILE, "[" HOST_WIDE_INT_PRINT_DEC "]", (SIZE));	\
+      fprintf (FILE, ";\n");						\
+    }									\
+  while (0)
+
+#undef  ASM_OUTPUT_ALIGNED_DECL_LOCAL
+#define ASM_OUTPUT_ALIGNED_DECL_LOCAL(FILE, DECL, NAME, SIZE, ALIGN)	\
+  do									\
+    {									\
+      fprintf (FILE, "// BEGIN VAR DEF: ");				\
+      assemble_name_raw (FILE, NAME);					\
+      fputc ('\n', FILE);						\
+      const char *sec = nvptx_section_for_decl (DECL);			\
+      fprintf (FILE, ".visible%s.align %d .b8 ", sec,			\
+	       (ALIGN) / BITS_PER_UNIT);				\
+      assemble_name ((FILE), (NAME));					\
+      if ((SIZE) > 0)							\
+	fprintf (FILE, "[" HOST_WIDE_INT_PRINT_DEC "]", (SIZE));	\
+      fprintf (FILE, ";\n");						\
+    }									\
+  while (0)
+
+#define CASE_VECTOR_PC_RELATIVE flag_pic
+#define JUMP_TABLES_IN_TEXT_SECTION flag_pic
+
+#define ADDR_VEC_ALIGN(VEC) (JUMP_TABLES_IN_TEXT_SECTION ? 5 : 2)
+
+/* Misc.  */
+
+#define DWARF2_DEBUGGING_INFO 1
+
+#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_BITSIZE ((MODE)), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_BITSIZE ((MODE)), 2)
+
+#define NO_DOT_IN_LABEL
+#define ASM_COMMENT_START "//"
+
+#define STORE_FLAG_VALUE -1
+#define FLOAT_STORE_FLAG_VALUE(MODE) REAL_VALUE_ATOF("1.0", (MODE))
+
+#define CASE_VECTOR_MODE SImode
+#define MOVE_MAX 4
+#define MOVE_RATIO(SPEED) 4
+#define TRULY_NOOP_TRUNCATION(outprec, inprec) 1
+#define FUNCTION_MODE QImode
+#define HAS_INIT_SECTION 1
+
+#endif /* GCC_NVPTX_H */
Index: gcc/config/nvptx/nvptx-protos.h
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx-protos.h
@@ -0,0 +1,47 @@
+/* Prototypes for exported functions defined in nvptx.c.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_NVPTX_PROTOS_H
+#define GCC_NVPTX_PROTOS_H
+
+extern void nvptx_declare_function_name (FILE *, const char *, const_tree decl);
+extern void nvptx_declare_object_name (FILE *file, const char *name,
+				       const_tree decl);
+extern void nvptx_record_needed_fndecl (tree decl);
+extern void nvptx_function_end (FILE *);
+extern void nvptx_output_skip (FILE *, unsigned HOST_WIDE_INT);
+extern void nvptx_output_ascii (FILE *, const char *, unsigned HOST_WIDE_INT);
+extern void nvptx_register_pragmas (void);
+extern const char *nvptx_section_for_decl (const_tree);
+
+#ifdef RTX_CODE
+extern void nvptx_expand_call (rtx, rtx);
+extern rtx nvptx_expand_compare (rtx);
+extern const char *nvptx_ptx_type_from_mode (enum machine_mode, bool);
+extern const char *nvptx_output_call_insn (rtx_insn *, rtx, rtx);
+extern const char *nvptx_output_return (void);
+extern enum machine_mode nvptx_underlying_object_mode (rtx);
+extern const char *nvptx_section_from_addr_space (addr_space_t);
+extern bool nvptx_hard_regno_mode_ok (int, enum machine_mode);
+extern addr_space_t nvptx_addr_space_from_address (rtx);
+extern rtx nvptx_maybe_convert_symbolic_operand (rtx);
+#endif
+#endif
+
Index: gcc/config/nvptx/nvptx.md
===================================================================
--- /dev/null
+++ gcc/config/nvptx/nvptx.md
@@ -0,0 +1,1282 @@
+;; Machine description for NVPTX.
+;; Copyright (C) 2014 Free Software Foundation, Inc.
+;; Contributed by Bernd Schmidt <bernds@codesourcery.com>
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_c_enum "unspec" [
+   UNSPEC_ARG_REG
+   UNSPEC_FROM_GLOBAL
+   UNSPEC_FROM_LOCAL
+   UNSPEC_FROM_PARAM
+   UNSPEC_FROM_SHARED
+   UNSPEC_FROM_CONST
+   UNSPEC_TO_GLOBAL
+   UNSPEC_TO_LOCAL
+   UNSPEC_TO_PARAM
+   UNSPEC_TO_SHARED
+   UNSPEC_TO_CONST
+
+   UNSPEC_CPLX_LOWPART
+   UNSPEC_CPLX_HIGHPART
+
+   UNSPEC_COPYSIGN
+   UNSPEC_LOG2
+   UNSPEC_EXP2
+   UNSPEC_SIN
+   UNSPEC_COS
+
+   UNSPEC_FPINT_FLOOR
+   UNSPEC_FPINT_BTRUNC
+   UNSPEC_FPINT_CEIL
+   UNSPEC_FPINT_NEARBYINT
+
+   UNSPEC_BITREV
+
+   UNSPEC_ALLOCA
+
+   UNSPEC_NTID
+   UNSPEC_TID
+])
+
+(define_attr "subregs_ok" "false,true"
+  (const_string "false"))
+
+(define_predicate "nvptx_register_operand"
+  (match_code "reg,subreg")
+{
+  if (REG_P (op))
+    return !HARD_REGISTER_P (op);
+  if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))
+    return false;
+  if (GET_CODE (op) == SUBREG)
+    return false;
+  return register_operand (op, mode);
+})
+
+(define_predicate "nvptx_reg_or_mem_operand"
+  (match_code "mem,reg,subreg")
+{
+  if (REG_P (op))
+    return !HARD_REGISTER_P (op);
+  if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))
+    return false;
+  if (GET_CODE (op) == SUBREG)
+    return false;
+  return memory_operand (op, mode) || register_operand (op, mode);
+})
+
+;; Allow symbolic constants.
+(define_predicate "symbolic_operand"
+  (match_code "symbol_ref,const"))
+
+;; Allow registers or symbolic constants.  We can allow frame, arg or stack
+;; pointers here since they are actually symbolic constants.
+(define_predicate "nvptx_register_or_symbolic_operand"
+  (match_code "reg,subreg,symbol_ref,const")
+{
+  if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))
+    return false;
+  if (GET_CODE (op) == SUBREG)
+    return false;
+  if (CONSTANT_P (op))
+    return true;
+  return register_operand (op, mode);
+})
+
+;; Registers or constants for normal instructions.  Does not allow symbolic
+;; constants.
+(define_predicate "nvptx_nonmemory_operand"
+  (match_code "reg,subreg,const_int,const_double")
+{
+  if (REG_P (op))
+    return !HARD_REGISTER_P (op);
+  if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))
+    return false;
+  if (GET_CODE (op) == SUBREG)
+    return false;
+  return nonmemory_operand (op, mode);
+})
+
+;; A source operand for a move instruction.  Unlike nvptx_nonmemory_operand,
+;; this predicate also accepts symbolic constants.
+(define_predicate "nvptx_general_operand"
+  (match_code "reg,subreg,mem,const,symbol_ref,label_ref,const_int,const_double")
+{
+  if (REG_P (op))
+    return !HARD_REGISTER_P (op);
+  return general_operand (op, mode);
+})
+
+;; A destination operand for a move instruction.  This is the only destination
+;; predicate that accepts the return register since it requires special handling.
+(define_predicate "nvptx_nonimmediate_operand"
+  (match_code "reg,subreg,mem")
+{
+  if (REG_P (op))
+    return (op != frame_pointer_rtx
+	    && op != arg_pointer_rtx
+	    && op != stack_pointer_rtx);
+  return nonimmediate_operand (op, mode);
+})
+
+(define_predicate "const_0_operand"
+  (and (match_code "const_int,const_double,const_vector")
+       (match_test "op == CONST0_RTX (GET_MODE (op))")))
+
+(define_predicate "global_mem_operand"
+  (and (match_code "mem")
+       (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_GLOBAL")))
+
+(define_predicate "const_mem_operand"
+  (and (match_code "mem")
+       (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_CONST")))
+
+(define_predicate "param_mem_operand"
+  (and (match_code "mem")
+       (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_PARAM")))
+
+(define_predicate "shared_mem_operand"
+  (and (match_code "mem")
+       (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_SHARED")))
+
+(define_predicate "const0_operand"
+  (and (match_code "const_int")
+       (match_test "op == const0_rtx")))
+
+;; True if this operator is valid for predication.
+(define_predicate "predicate_operator"
+  (match_code "eq,ne"))
+
+(define_predicate "ne_operator"
+  (match_code "ne"))
+
+(define_predicate "nvptx_comparison_operator"
+  (match_code "eq,ne,le,ge,lt,gt,leu,geu,ltu,gtu"))
+
+(define_predicate "nvptx_float_comparison_operator"
+  (match_code "eq,ne,le,ge,lt,gt,uneq,unle,unge,unlt,ungt,unordered,ordered"))
+
+;; Test for a valid operand for a call instruction.
+(define_special_predicate "call_insn_operand"
+  (match_code "symbol_ref,reg")
+{
+  if (GET_CODE (op) == SYMBOL_REF)
+    {
+      tree decl = SYMBOL_REF_DECL (op);
+      /* This happens for libcalls.  */
+      if (decl == NULL_TREE)
+        return true;
+      return TREE_CODE (decl) == FUNCTION_DECL;
+    }
+  return true;
+})
+
+;; Return true if OP is a call with parallel USEs of the argument
+;; pseudos.
+(define_predicate "call_operation"
+  (match_code "parallel")
+{
+  unsigned i;
+
+  for (i = 1; i < XVECLEN (op, 0); i++)
+    {
+      rtx elt = XVECEXP (op, 0, i);
+      enum machine_mode mode;
+      unsigned regno;
+
+      if (GET_CODE (elt) != USE
+          || GET_CODE (XEXP (elt, 0)) != REG
+          || XEXP (elt, 0) == frame_pointer_rtx
+          || XEXP (elt, 0) == arg_pointer_rtx
+          || XEXP (elt, 0) == stack_pointer_rtx)
+
+        return false;
+    }
+  return true;
+})
+
+(define_constraint "P0"
+  "An integer with the value 0."
+  (and (match_code "const_int")
+       (match_test "ival == 0")))
+
+(define_constraint "P1"
+  "An integer with the value 1."
+  (and (match_code "const_int")
+       (match_test "ival == 1")))
+
+(define_constraint "Pn"
+  "An integer with the value -1."
+  (and (match_code "const_int")
+       (match_test "ival == -1")))
+
+(define_constraint "R"
+  "A pseudo register."
+  (match_code "reg"))
+
+(define_constraint "Ia"
+  "Any integer constant."
+  (and (match_code "const_int") (match_test "true")))
+
+(define_mode_iterator QHSDISDFM [QI HI SI DI SF DF])
+(define_mode_iterator QHSDIM [QI HI SI DI])
+(define_mode_iterator HSDIM [HI SI DI])
+(define_mode_iterator BHSDIM [BI HI SI DI])
+(define_mode_iterator SDIM [SI DI])
+(define_mode_iterator SDISDFM [SI DI SF DF])
+(define_mode_iterator QHIM [QI HI])
+(define_mode_iterator QHSIM [QI HI SI])
+(define_mode_iterator SDFM [SF DF])
+(define_mode_iterator SDCM [SC DC])
+
+;; This mode iterator allows :P to be used for patterns that operate on
+;; pointer-sized quantities.  Exactly one of the two alternatives will match.
+(define_mode_iterator P [(SI "Pmode == SImode") (DI "Pmode == DImode")])
+
+;; We should get away with not defining memory alternatives, since we don't
+;; get variables in this mode and pseudos are never spilled.
+(define_insn "movbi"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R,R,R")
+	(match_operand:BI 1 "nvptx_nonmemory_operand" "R,P0,Pn"))]
+  ""
+  "@
+   %.\\tmov%t0\\t%0, %1;
+   %.\\tsetp.eq.u32\\t%0, 1, 0;
+   %.\\tsetp.eq.u32\\t%0, 1, 1;")
+
+(define_insn "*mov<mode>_insn"
+  [(set (match_operand:QHSDIM 0 "nvptx_nonimmediate_operand" "=R,R,R,m")
+	(match_operand:QHSDIM 1 "general_operand" "n,Ri,m,R"))]
+  "!(MEM_P (operands[0])
+     && (!REG_P (operands[1]) || REGNO (operands[1]) <= LAST_VIRTUAL_REGISTER))"
+{
+  if (which_alternative == 2)
+    return "%.\\tld%A1%u1\\t%0, %1;";
+  if (which_alternative == 3)
+    return "%.\\tst%A0%u0\\t%0, %1;";
+
+  rtx dst = operands[0];
+  rtx src = operands[1];
+
+  enum machine_mode dst_mode = nvptx_underlying_object_mode (dst);
+  enum machine_mode src_mode = nvptx_underlying_object_mode (src);
+  if (GET_CODE (dst) == SUBREG)
+    dst = SUBREG_REG (dst);
+  if (GET_CODE (src) == SUBREG)
+    src = SUBREG_REG (src);
+  if (src_mode == QImode)
+    src_mode = SImode;
+  if (dst_mode == QImode)
+    dst_mode = SImode;
+  if (CONSTANT_P (src))
+    {
+      if (GET_MODE_CLASS (dst_mode) != MODE_INT)
+        return "%.\\tmov.b%T0\\t%0, %1;";
+      else
+        return "%.\\tmov%t0\\t%0, %1;";
+    }
+
+  /* Special handling for the return register; we allow this register to
+     only occur in the destination of a move insn.  */
+  if (REG_P (dst) && REGNO (dst) == NVPTX_RETURN_REGNUM
+      && dst_mode == HImode)
+    dst_mode = SImode;
+  if (dst_mode == src_mode)
+    return "%.\\tmov%t0\\t%0, %1;";
+  /* Mode-punning between floating point and integer.  */
+  if (GET_MODE_SIZE (dst_mode) == GET_MODE_SIZE (src_mode))
+    return "%.\\tmov.b%T0\\t%0, %1;";
+  return "%.\\tcvt%t0%t1\\t%0, %1;";
+}
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "*mov<mode>_insn"
+  [(set (match_operand:SDFM 0 "nvptx_nonimmediate_operand" "=R,R,m")
+	(match_operand:SDFM 1 "general_operand" "RF,m,R"))]
+  "!(MEM_P (operands[0]) && !REG_P (operands[1]))"
+{
+  if (which_alternative == 1)
+    return "%.\\tld%A1%u0\\t%0, %1;";
+  if (which_alternative == 2)
+    return "%.\\tst%A0%u1\\t%0, %1;";
+
+  rtx dst = operands[0];
+  rtx src = operands[1];
+  if (GET_CODE (dst) == SUBREG)
+    dst = SUBREG_REG (dst);
+  if (GET_CODE (src) == SUBREG)
+    src = SUBREG_REG (src);
+  enum machine_mode dst_mode = GET_MODE (dst);
+  enum machine_mode src_mode = GET_MODE (src);
+  if (dst_mode == src_mode)
+    return "%.\\tmov%t0\\t%0, %1;";
+  if (GET_MODE_SIZE (dst_mode) == GET_MODE_SIZE (src_mode))
+    return "%.\\tmov.b%T0\\t%0, %1;";
+  gcc_unreachable ();
+}
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "load_arg_reg<mode>"
+  [(set (match_operand:QHIM 0 "nvptx_register_operand" "=R")
+	(unspec:QHIM [(match_operand 1 "const_int_operand" "i")]
+		     UNSPEC_ARG_REG))]
+  ""
+  "%.\\tcvt%t0.u32\\t%0, %%ar%1;")
+
+(define_insn "load_arg_reg<mode>"
+  [(set (match_operand:SDISDFM 0 "nvptx_register_operand" "=R")
+	(unspec:SDISDFM [(match_operand 1 "const_int_operand" "i")]
+			UNSPEC_ARG_REG))]
+  ""
+  "%.\\tmov%t0\\t%0, %%ar%1;")
+
+(define_expand "mov<mode>"
+  [(set (match_operand:QHSDISDFM 0 "nvptx_nonimmediate_operand" "")
+	(match_operand:QHSDISDFM 1 "general_operand" ""))]
+  ""
+{
+  operands[1] = nvptx_maybe_convert_symbolic_operand (operands[1]);
+  /* Record the mode of the return register so that we can prevent
+     later optimization passes from changing it.  */
+  if (REG_P (operands[0]) && REGNO (operands[0]) == NVPTX_RETURN_REGNUM
+      && cfun)
+    {
+      if (cfun->machine->ret_reg_mode == VOIDmode)
+	cfun->machine->ret_reg_mode = GET_MODE (operands[0]);
+      else
+        gcc_assert (cfun->machine->ret_reg_mode == GET_MODE (operands[0]));
+    }
+
+  /* Hard registers are often actually symbolic operands on this target.
+     Don't allow them when storing to memory.  */
+  if (MEM_P (operands[0])
+      && (!REG_P (operands[1])
+	  || REGNO (operands[1]) <= LAST_VIRTUAL_REGISTER))
+    {
+      rtx tmp = gen_reg_rtx (<MODE>mode);
+      emit_move_insn (tmp, operands[1]);
+      emit_move_insn (operands[0], tmp);
+      DONE;
+    }
+  if (GET_CODE (operands[1]) == SYMBOL_REF)
+    nvptx_record_needed_fndecl (SYMBOL_REF_DECL (operands[1]));
+})
+
+(define_insn "highpartscsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SC 1 "nvptx_register_operand")]
+		   UNSPEC_CPLX_HIGHPART))]
+  ""
+  "%.\\tmov%t0\\t%0, %f1$1;")
+
+(define_insn "set_highpartsfsc2"
+  [(set (match_operand:SC 0 "nvptx_register_operand" "+R")
+	(unspec:SC [(match_dup 0)
+		    (match_operand:SF 1 "nvptx_register_operand")]
+		   UNSPEC_CPLX_HIGHPART))]
+  ""
+  "%.\\tmov%t1\\t%f0$1, %1;")
+
+(define_insn "lowpartscsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SC 1 "nvptx_register_operand")]
+		   UNSPEC_CPLX_LOWPART))]
+  ""
+  "%.\\tmov%t0\\t%0, %f1$0;")
+
+(define_insn "set_lowpartsfsc2"
+  [(set (match_operand:SC 0 "nvptx_register_operand" "+R")
+	(unspec:SC [(match_dup 0)
+		    (match_operand:SF 1 "nvptx_register_operand")]
+		   UNSPEC_CPLX_LOWPART))]
+  ""
+  "%.\\tmov%t1\\t%f0$0, %1;")
+
+(define_expand "mov<mode>"
+  [(set (match_operand:SDCM 0 "nvptx_nonimmediate_operand" "")
+	(match_operand:SDCM 1 "general_operand" ""))]
+  ""
+{
+  enum machine_mode submode = <MODE>mode == SCmode ? SFmode : DFmode;
+  int sz = GET_MODE_SIZE (submode);
+  rtx xops[4];
+  rtx punning_reg = NULL_RTX;
+  rtx copyback = NULL_RTX;
+
+  if (GET_CODE (operands[0]) == SUBREG)
+    {
+      rtx inner = SUBREG_REG (operands[0]);
+      enum machine_mode inner_mode = GET_MODE (inner);
+      int sz2 = GET_MODE_SIZE (inner_mode);
+      gcc_assert (sz2 >= sz);
+      cfun->machine->punning_buffer_size
+        = MAX (cfun->machine->punning_buffer_size, sz2);
+      if (punning_reg == NULL_RTX)
+	punning_reg = gen_rtx_REG (Pmode, NVPTX_PUNNING_BUFFER_REGNUM);
+      copyback = gen_move_insn (inner, gen_rtx_MEM (inner_mode, punning_reg));
+      operands[0] = gen_rtx_MEM (<MODE>mode, punning_reg);
+    }
+  if (GET_CODE (operands[1]) == SUBREG)
+    {
+      rtx inner = SUBREG_REG (operands[1]);
+      enum machine_mode inner_mode = GET_MODE (inner);
+      int sz2 = GET_MODE_SIZE (inner_mode);
+      gcc_assert (sz2 >= sz);
+      cfun->machine->punning_buffer_size
+        = MAX (cfun->machine->punning_buffer_size, sz2);
+      if (punning_reg == NULL_RTX)
+	punning_reg = gen_rtx_REG (Pmode, NVPTX_PUNNING_BUFFER_REGNUM);
+      emit_move_insn (gen_rtx_MEM (inner_mode, punning_reg), inner);
+      operands[1] = gen_rtx_MEM (<MODE>mode, punning_reg);
+    }
+
+  if (REG_P (operands[0]) && submode == SFmode)
+    {
+      xops[0] = gen_reg_rtx (submode);
+      xops[1] = gen_reg_rtx (submode);
+    }
+  else
+    {
+      xops[0] = gen_lowpart (submode, operands[0]);
+      if (MEM_P (operands[0]))
+	xops[1] = adjust_address_nv (operands[0], submode, sz);
+      else
+	xops[1] = gen_highpart (submode, operands[0]);
+    }
+
+  if (REG_P (operands[1]) && submode == SFmode)
+    {
+      xops[2] = gen_reg_rtx (submode);
+      xops[3] = gen_reg_rtx (submode);
+      emit_insn (gen_lowpartscsf2 (xops[2], operands[1]));
+      emit_insn (gen_highpartscsf2 (xops[3], operands[1]));
+    }
+  else
+    {
+      xops[2] = gen_lowpart (submode, operands[1]);
+      if (MEM_P (operands[1]))
+	xops[3] = adjust_address_nv (operands[1], submode, sz);
+      else
+	xops[3] = gen_highpart (submode, operands[1]);
+    }
+
+  emit_move_insn (xops[0], xops[2]);
+  emit_move_insn (xops[1], xops[3]);
+  if (REG_P (operands[0]) && submode == SFmode)
+    {
+      emit_insn (gen_set_lowpartsfsc2 (operands[0], xops[0]));
+      emit_insn (gen_set_highpartsfsc2 (operands[0], xops[1]));
+    }
+  if (copyback)
+    emit_insn (copyback);
+  DONE;
+})
+
+(define_insn "zero_extendqihi2"
+  [(set (match_operand:HI 0 "nvptx_register_operand" "=R,R")
+	(zero_extend:HI (match_operand:QI 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.u16.u%T1\\t%0, %1;
+   %.\\tld%A1.u8\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "zero_extend<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R,R")
+	(zero_extend:SI (match_operand:QHIM 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.u32.u%T1\\t%0, %1;
+   %.\\tld%A1.u%T1\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "zero_extend<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R,R")
+	(zero_extend:DI (match_operand:QHSIM 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.u64.u%T1\\t%0, %1;
+   %.\\tld%A1%u1\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "extend<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R,R")
+	(sign_extend:SI (match_operand:QHIM 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.s32.s%T1\\t%0, %1;
+   %.\\tld%A1.s%T1\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "extend<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R,R")
+	(sign_extend:DI (match_operand:QHSIM 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.s64.s%T1\\t%0, %1;
+   %.\\tld%A1.s%T1\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "trunchiqi2"
+  [(set (match_operand:QI 0 "nvptx_reg_or_mem_operand" "=R,m")
+	(truncate:QI (match_operand:HI 1 "nvptx_register_operand" "R,R")))]
+  ""
+  "@
+   %.\\tcvt%t0.u16\\t%0, %1;
+   %.\\tst%A0.u8\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "truncsi<mode>2"
+  [(set (match_operand:QHIM 0 "nvptx_reg_or_mem_operand" "=R,m")
+	(truncate:QHIM (match_operand:SI 1 "nvptx_register_operand" "R,R")))]
+  ""
+  "@
+   %.\\tcvt%t0.u32\\t%0, %1;
+   %.\\tst%A0.u%T0\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "truncdi<mode>2"
+  [(set (match_operand:QHSIM 0 "nvptx_reg_or_mem_operand" "=R,m")
+	(truncate:QHSIM (match_operand:DI 1 "nvptx_register_operand" "R,R")))]
+  ""
+  "@
+   %.\\tcvt%t0.u64\\t%0, %1;
+   %.\\tst%A0.u%T0\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+;; Pointer address space conversions
+
+(define_int_iterator cvt_code
+  [UNSPEC_FROM_GLOBAL
+   UNSPEC_FROM_LOCAL
+   UNSPEC_FROM_SHARED
+   UNSPEC_FROM_CONST
+   UNSPEC_TO_GLOBAL
+   UNSPEC_TO_LOCAL
+   UNSPEC_TO_SHARED
+   UNSPEC_TO_CONST])
+
+(define_int_attr cvt_name
+  [(UNSPEC_FROM_GLOBAL "from_global")
+   (UNSPEC_FROM_LOCAL "from_local")
+   (UNSPEC_FROM_SHARED "from_shared")
+   (UNSPEC_FROM_CONST "from_const")
+   (UNSPEC_TO_GLOBAL "to_global")
+   (UNSPEC_TO_LOCAL "to_local")
+   (UNSPEC_TO_SHARED "to_shared")
+   (UNSPEC_TO_CONST "to_const")])
+
+(define_int_attr cvt_str
+  [(UNSPEC_FROM_GLOBAL ".global")
+   (UNSPEC_FROM_LOCAL ".local")
+   (UNSPEC_FROM_SHARED ".shared")
+   (UNSPEC_FROM_CONST ".const")
+   (UNSPEC_TO_GLOBAL ".to.global")
+   (UNSPEC_TO_LOCAL ".to.local")
+   (UNSPEC_TO_SHARED ".to.shared")
+   (UNSPEC_TO_CONST ".to.const")])
+
+(define_insn "convaddr_<cvt_name><mode>"
+  [(set (match_operand:P 0 "nvptx_register_operand" "=R")
+	(unspec:P [(match_operand:P 1 "nvptx_register_or_symbolic_operand" "Rs")] cvt_code))]
+  ""
+  "%.\\tcvta<cvt_str>%t0\\t%0, %1;")
+
+;; Integer arithmetic
+
+(define_insn "add<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(plus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tadd%t0\\t%0, %1, %2;")
+
+(define_insn "sub<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(minus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		     (match_operand:HSDIM 2 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tsub%t0\\t%0, %1, %2;")
+
+(define_insn "mul<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(mult:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmul.lo%t0\\t%0, %1, %2;")
+
+(define_insn "*mad<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(plus:HSDIM (mult:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+				(match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri"))
+		    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmad.lo%t0\\t%0, %1, %2, %3;")
+
+(define_insn "div<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(div:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		   (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tdiv.s%T0\\t%0, %1, %2;")
+
+(define_insn "udiv<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(udiv:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tdiv.u%T0\\t%0, %1, %2;")
+
+(define_insn "mod<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(mod:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		   (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\trem.s%T0\\t%0, %1, %2;")
+
+(define_insn "umod<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(umod:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\trem.u%T0\\t%0, %1, %2;")
+
+(define_insn "smin<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(smin:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmin.s%T0\\t%0, %1, %2;")
+
+(define_insn "umin<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(umin:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmin.u%T0\\t%0, %1, %2;")
+
+(define_insn "smax<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(smax:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmax.s%T0\\t%0, %1, %2;")
+
+(define_insn "umax<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(umax:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmax.u%T0\\t%0, %1, %2;")
+
+(define_insn "abs<mode>2"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(abs:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tabs.s%T0\\t%0, %1;")
+
+(define_insn "neg<mode>2"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(neg:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tneg.s%T0\\t%0, %1;")
+
+(define_insn "one_cmpl<mode>2"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(not:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tnot.b%T0\\t%0, %1;")
+
+(define_insn "bitrev<mode>2"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")]
+		     UNSPEC_BITREV))]
+  ""
+  "%.\\tbrev.b%T0\\t%0, %1;")
+
+(define_insn "clz<mode>2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(clz:SI (match_operand:SDIM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tclz.b%T1\\t%0, %1;")
+
+(define_expand "ctz<mode>2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "")
+	(ctz:SI (match_operand:SDIM 1 "nvptx_register_operand" "")))]
+  ""
+{
+  rtx tmpreg = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_bitrev<mode>2 (tmpreg, operands[1]));
+  emit_insn (gen_clz<mode>2 (operands[0], tmpreg));
+  DONE;
+})
+
+;; Shifts
+
+(define_insn "ashl<mode>3"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(ashift:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")
+		     (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tshl.b%T0\\t%0, %1, %2;")
+
+(define_insn "ashr<mode>3"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(ashiftrt:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")
+		       (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tshr.s%T0\\t%0, %1, %2;")
+
+(define_insn "lshr<mode>3"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(lshiftrt:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")
+		       (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tshr.u%T0\\t%0, %1, %2;")
+
+;; Logical operations
+
+(define_insn "and<mode>3"
+  [(set (match_operand:BHSDIM 0 "nvptx_register_operand" "=R")
+	(and:BHSDIM (match_operand:BHSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:BHSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tand.b%T0\\t%0, %1, %2;")
+
+(define_insn "ior<mode>3"
+  [(set (match_operand:BHSDIM 0 "nvptx_register_operand" "=R")
+	(ior:BHSDIM (match_operand:BHSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:BHSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tor.b%T0\\t%0, %1, %2;")
+
+(define_insn "xor<mode>3"
+  [(set (match_operand:BHSDIM 0 "nvptx_register_operand" "=R")
+	(xor:BHSDIM (match_operand:BHSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:BHSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\txor.b%T0\\t%0, %1, %2;")
+
+;; Comparisons and branches
+
+(define_insn "*cmp<mode>"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
+	(match_operator:BI 1 "nvptx_comparison_operator"
+	   [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
+	    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
+  ""
+  "%.\\tsetp%c1 %0,%2,%3;")
+
+(define_insn "*cmp<mode>"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
+	(match_operator:BI 1 "nvptx_float_comparison_operator"
+	   [(match_operand:SDFM 2 "nvptx_register_operand" "R")
+	    (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
+  ""
+  "%.\\tsetp%c1 %0,%2,%3;")
+
+(define_insn "jump"
+  [(set (pc)
+	(label_ref (match_operand 0 "" "")))]
+  ""
+  "%.\\tbra\\t%l0;")
+
+(define_insn "br_true"
+  [(set (pc)
+	(if_then_else (ne (match_operand:BI 0 "nvptx_register_operand" "R")
+			  (const_int 0))
+		      (label_ref (match_operand 1 "" ""))
+		      (pc)))]
+  ""
+  "%j0\\tbra\\t%l1;")
+
+(define_insn "br_false"
+  [(set (pc)
+	(if_then_else (eq (match_operand:BI 0 "nvptx_register_operand" "R")
+			  (const_int 0))
+		      (label_ref (match_operand 1 "" ""))
+		      (pc)))]
+  ""
+  "%J0\\tbra\\t%l1;")
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+	(if_then_else (match_operator 0 "nvptx_comparison_operator"
+		       [(match_operand:HSDIM 1 "nvptx_register_operand" "")
+			(match_operand:HSDIM 2 "nvptx_register_operand" "")])
+		      (label_ref (match_operand 3 "" ""))
+		      (pc)))]
+  ""
+{
+  rtx t = nvptx_expand_compare (operands[0]);
+  operands[0] = t;
+  operands[1] = XEXP (t, 0);
+  operands[2] = XEXP (t, 1);
+})
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+	(if_then_else (match_operator 0 "nvptx_float_comparison_operator"
+		       [(match_operand:SDFM 1 "nvptx_register_operand" "")
+			(match_operand:SDFM 2 "nvptx_register_operand" "")])
+		      (label_ref (match_operand 3 "" ""))
+		      (pc)))]
+  ""
+{
+  rtx t = nvptx_expand_compare (operands[0]);
+  operands[0] = t;
+  operands[1] = XEXP (t, 0);
+  operands[2] = XEXP (t, 1);
+})
+
+(define_expand "cbranchbi4"
+  [(set (pc)
+	(if_then_else (match_operator 0 "predicate_operator"
+		       [(match_operand:BI 1 "nvptx_register_operand" "")
+			(match_operand:BI 2 "const0_operand" "")])
+		      (label_ref (match_operand 3 "" ""))
+		      (pc)))]
+  ""
+  "")
+
+;; Conditional stores
+
+(define_insn "setcc_from_bi"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(ne:SI (match_operand:BI 1 "nvptx_register_operand" "R")
+	       (const_int 0)))]
+  ""
+  "%.\\tselp%t0 %0,-1,0,%1;")
+
+(define_insn "setcc_int<mode>"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(match_operator:SI 1 "nvptx_comparison_operator"
+			   [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
+			    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_insn "setcc_int<mode>"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(match_operator:SI 1 "nvptx_float_comparison_operator"
+			   [(match_operand:SDFM 2 "nvptx_register_operand" "R")
+			    (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_insn "setcc_float<mode>"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(match_operator:SF 1 "nvptx_comparison_operator"
+			   [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
+			    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_insn "setcc_float<mode>"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(match_operator:SF 1 "nvptx_float_comparison_operator"
+			   [(match_operand:SDFM 2 "nvptx_register_operand" "R")
+			    (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_expand "cstorebi4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+	(match_operator:SI 1 "ne_operator"
+         [(match_operand:BI 2 "nvptx_register_operand")
+          (match_operand:BI 3 "const0_operand")]))]
+  ""
+  "")
+
+(define_expand "cstore<mode>4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+	(match_operator:SI 1 "nvptx_comparison_operator"
+         [(match_operand:HSDIM 2 "nvptx_register_operand")
+          (match_operand:HSDIM 3 "nvptx_nonmemory_operand")]))]
+  ""
+  "")
+
+(define_expand "cstore<mode>4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+	(match_operator:SI 1 "nvptx_float_comparison_operator"
+         [(match_operand:SDFM 2 "nvptx_register_operand")
+          (match_operand:SDFM 3 "nvptx_nonmemory_operand")]))]
+  ""
+  "")
+
+;; Calls
+
+(define_insn "call_insn"
+  [(match_parallel 2 "call_operation"
+    [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "Rs"))
+	   (match_operand 1))])]
+  ""
+{
+  return nvptx_output_call_insn (insn, NULL_RTX, operands[0]);
+})
+
+(define_insn "call_value_insn"
+  [(match_parallel 3 "call_operation"
+    [(set (match_operand 0 "nvptx_register_operand" "=R")
+	  (call (mem:QI (match_operand:SI 1 "call_insn_operand" "Rs"))
+		(match_operand 2)))])]
+  ""
+{
+  return nvptx_output_call_insn (insn, operands[0], operands[1]);
+})
+
+(define_expand "call"
+ [(match_operand 0 "" "")]
+ ""
+{
+  nvptx_expand_call (NULL_RTX, operands[0]);
+  DONE;
+})
+
+(define_expand "call_value"
+  [(match_operand 0 "" "")
+   (match_operand 1 "" "")]
+ ""
+{
+  nvptx_expand_call (operands[0], operands[1]);
+  DONE;
+})
+
+;; Floating point arithmetic.
+
+(define_insn "add<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(plus:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		   (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tadd%t0\\t%0, %1, %2;")
+
+(define_insn "sub<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(minus:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		    (match_operand:SDFM 2 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tsub%t0\\t%0, %1, %2;")
+
+(define_insn "mul<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(mult:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		   (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tmul%t0\\t%0, %1, %2;")
+
+(define_insn "fma<mode>4"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(fma:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		  (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")
+		  (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tfma%#%t0\\t%0, %1, %2, %3;")
+
+(define_insn "div<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(div:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		  (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tdiv%#%t0\\t%0, %1, %2;")
+
+(define_insn "copysign<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(unspec:SDFM [(match_operand:SDFM 1 "nvptx_register_operand" "R")
+		      (match_operand:SDFM 2 "nvptx_register_operand" "R")]
+		      UNSPEC_COPYSIGN))]
+  ""
+  "%.\\tcopysign%t0\\t%0, %2, %1;")
+
+(define_insn "smin<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(smin:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		    (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tmin%t0\\t%0, %1, %2;")
+
+(define_insn "smax<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(smax:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		    (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tmax%t0\\t%0, %1, %2;")
+
+(define_insn "abs<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(abs:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tabs%t0\\t%0, %1;")
+
+(define_insn "neg<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(neg:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tneg%t0\\t%0, %1;")
+
+(define_insn "sqrt<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(sqrt:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tsqrt%#%t0\\t%0, %1;")
+
+(define_insn "sinsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_SIN))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tsin.approx%t0\\t%0, %1;")
+
+(define_insn "cossf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_COS))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tcos.approx%t0\\t%0, %1;")
+
+(define_insn "log2sf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_LOG2))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tlg2.approx%t0\\t%0, %1;")
+
+(define_insn "exp2sf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_EXP2))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tex2.approx%t0\\t%0, %1;")
+
+;; Conversions involving floating point
+
+(define_insn "extendsfdf2"
+  [(set (match_operand:DF 0 "nvptx_register_operand" "=R")
+	(float_extend:DF (match_operand:SF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%t0%t1\\t%0, %1;")
+
+(define_insn "truncdfsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(float_truncate:SF (match_operand:DF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0%t1\\t%0, %1;")
+
+(define_insn "floatunssi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(unsigned_float:SDFM (match_operand:SI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.u%T1\\t%0, %1;")
+
+(define_insn "floatsi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(float:SDFM (match_operand:SI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.s%T1\\t%0, %1;")
+
+(define_insn "floatunsdi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(unsigned_float:SDFM (match_operand:DI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.u%T1\\t%0, %1;")
+
+(define_insn "floatdi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(float:SDFM (match_operand:DI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.s%T1\\t%0, %1;")
+
+(define_insn "fixuns_trunc<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(unsigned_fix:SI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt.rzi.u%T0%t1\\t%0, %1;")
+
+(define_insn "fix_trunc<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(fix:SI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt.rzi.s%T0%t1\\t%0, %1;")
+
+(define_insn "fixuns_trunc<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
+	(unsigned_fix:DI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt.rzi.u%T0%t1\\t%0, %1;")
+
+(define_insn "fix_trunc<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
+	(fix:DI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt.rzi.s%T0%t1\\t%0, %1;")
+
+(define_int_iterator FPINT [UNSPEC_FPINT_FLOOR UNSPEC_FPINT_BTRUNC
+			    UNSPEC_FPINT_CEIL UNSPEC_FPINT_NEARBYINT])
+(define_int_attr fpint_name [(UNSPEC_FPINT_FLOOR "floor")
+			     (UNSPEC_FPINT_BTRUNC "btrunc")
+			     (UNSPEC_FPINT_CEIL "ceil")
+			     (UNSPEC_FPINT_NEARBYINT "nearbyint")])
+(define_int_attr fpint_roundingmode [(UNSPEC_FPINT_FLOOR ".rmi")
+				     (UNSPEC_FPINT_BTRUNC ".rzi")
+				     (UNSPEC_FPINT_CEIL ".rpi")
+				     (UNSPEC_FPINT_NEARBYINT "%#i")])
+
+(define_insn "<FPINT:fpint_name><SDFM:mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(unspec:SDFM [(match_operand:SDFM 1 "nvptx_register_operand" "R")]
+		     FPINT))]
+  ""
+  "%.\\tcvt<FPINT:fpint_roundingmode>%t0%t1\\t%0, %1;")
+
+(define_int_iterator FPINT2 [UNSPEC_FPINT_FLOOR UNSPEC_FPINT_CEIL])
+(define_int_attr fpint2_name [(UNSPEC_FPINT_FLOOR "lfloor")
+			     (UNSPEC_FPINT_CEIL "lceil")])
+(define_int_attr fpint2_roundingmode [(UNSPEC_FPINT_FLOOR ".rmi")
+				     (UNSPEC_FPINT_CEIL ".rpi")])
+
+(define_insn "<FPINT2:fpint2_name><SDFM:mode><SDIM:mode>2"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(unspec:SDIM [(match_operand:SDFM 1 "nvptx_register_operand" "R")]
+		     FPINT2))]
+  ""
+  "%.\\tcvt<FPINT2:fpint2_roundingmode>.s%T0%t1\\t%0, %1;")
+
+;; Miscellaneous
+
+(define_insn "nop"
+  [(const_int 0)]
+  ""
+  "")
+
+(define_insn "return"
+  [(return)]
+  ""
+{
+  return nvptx_output_return ();
+})
+
+(define_expand "epilogue"
+  [(clobber (const_int 0))]
+  ""
+{
+  emit_jump_insn (gen_return ());
+  DONE;
+})
+
+(define_expand "nonlocal_goto"
+  [(match_operand 0 "" "")
+   (match_operand 1 "" "")
+   (match_operand 2 "" "")
+   (match_operand 3 "" "")]
+  ""
+{
+  sorry ("target cannot support nonlocal goto.");
+  emit_insn (gen_nop ());
+  DONE;
+})
+
+(define_expand "nonlocal_goto_receiver"
+  [(const_int 0)]
+  ""
+{
+  sorry ("target cannot support nonlocal goto.");
+})
+
+(define_insn "allocate_stack"
+  [(set (match_operand 0 "nvptx_register_operand" "=R")
+	(unspec [(match_operand 1 "nvptx_register_operand" "R")]
+		  UNSPEC_ALLOCA))]
+  ""
+  "%.\\tcall (%0), %%alloca, (%1);")
+
+(define_expand "restore_stack_block"
+  [(match_operand 0 "register_operand" "")
+   (match_operand 1 "register_operand" "")]
+  ""
+{
+  DONE;
+})
+
+(define_expand "restore_stack_function"
+  [(match_operand 0 "register_operand" "")
+   (match_operand 1 "register_operand" "")]
+  ""
+{
+  DONE;
+})
+
+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 0))]
+  ""
+  "trap;")
+
+(define_insn "trap_if_true"
+  [(trap_if (ne (match_operand:BI 0 "nvptx_register_operand" "R")
+		(const_int 0))
+	    (const_int 0))]
+  ""
+  "%j0 trap;")
+
+(define_insn "trap_if_false"
+  [(trap_if (eq (match_operand:BI 0 "nvptx_register_operand" "R")
+		(const_int 0))
+	    (const_int 0))]
+  ""
+  "%J0 trap;")
+
+(define_expand "ctrap<mode>4"
+  [(trap_if (match_operator 0 "nvptx_comparison_operator"
+			    [(match_operand:SDIM 1 "nvptx_register_operand")
+			     (match_operand:SDIM 2 "nvptx_nonmemory_operand")])
+	    (match_operand 3 "const_0_operand"))]
+  ""
+{
+  rtx t = nvptx_expand_compare (operands[0]);
+  emit_insn (gen_trap_if_true (t));
+  DONE;
+})
+
+(define_insn "*oacc_ntid_insn"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(unspec:SI [(match_operand:SI 1 "const_int_operand" "n")] UNSPEC_NTID))]
+  ""
+  "%.\\tmov.u32 %0, %%ntid%d1;")
+
+(define_expand "oacc_ntid"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "")
+	(unspec:SI [(match_operand:SI 1 "const_int_operand" "")] UNSPEC_NTID))]
+  ""
+{
+  if (INTVAL (operands[1]) < 0 || INTVAL (operands[1]) > 2)
+    FAIL;
+})
+
+(define_insn "*oacc_tid_insn"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(unspec:SI [(match_operand:SI 1 "const_int_operand" "n")] UNSPEC_TID))]
+  ""
+  "%.\\tmov.u32 %0, %%tid%d1;")
+
+(define_expand "oacc_tid"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "")
+	(unspec:SI [(match_operand:SI 1 "const_int_operand" "")] UNSPEC_TID))]
+  ""
+{
+  if (INTVAL (operands[1]) < 0 || INTVAL (operands[1]) > 2)
+    FAIL;
+})
Index: libgcc/config.host
===================================================================
--- libgcc/config.host.orig
+++ libgcc/config.host
@@ -1236,6 +1236,10 @@ mep*-*-*)
 	tmake_file="mep/t-mep t-fdpbit"
 	extra_parts="crtbegin.o crtend.o"
 	;;
+nvptx-*)
+	tmake_file="$tmake_file nvptx/t-nvptx"
+	extra_parts="crt0.o"
+	;;
 *)
 	echo "*** Configuration ${host} not supported" 1>&2
 	exit 1
Index: libgcc/config/nvptx/t-nvptx
===================================================================
--- /dev/null
+++ libgcc/config/nvptx/t-nvptx
@@ -0,0 +1,9 @@
+LIB2ADD=$(srcdir)/config/nvptx/malloc.asm \
+	$(srcdir)/config/nvptx/free.asm \
+	$(srcdir)/config/nvptx/realloc.c
+
+LIB2ADDEH=
+LIB2FUNCS_EXCLUDE=__main
+
+crt0.o: $(srcdir)/config/nvptx/crt0.s
+	cp $< $@
Index: configure
===================================================================
--- configure.orig
+++ configure
@@ -3771,6 +3771,10 @@ case "${target}" in
   mips*-*-*)
     noconfigdirs="$noconfigdirs gprof"
     ;;
+  nvptx*-*-*)
+    # nvptx is just a compiler
+    noconfigdirs="$noconfigdirs target-libssp target-libstdc++-v3 target-libobjc"
+    ;;
   sh-*-* | sh64-*-*)
     case "${target}" in
       sh*-*-elf)
Index: configure.ac
===================================================================
--- configure.ac.orig
+++ configure.ac
@@ -1130,6 +1130,10 @@ case "${target}" in
   mips*-*-*)
     noconfigdirs="$noconfigdirs gprof"
     ;;
+  nvptx*-*-*)
+    # nvptx is just a compiler
+    noconfigdirs="$noconfigdirs target-libssp target-libstdc++-v3 target-libobjc"
+    ;;
   sh-*-* | sh64-*-*)
     case "${target}" in
       sh*-*-elf)
Index: libgcc/config/nvptx/crt0.s
===================================================================
--- /dev/null
+++ libgcc/config/nvptx/crt0.s
@@ -0,0 +1,45 @@
+	.version 3.1
+	.target	sm_30
+	.address_size 64
+
+.global .u64 %__exitval;
+// BEGIN GLOBAL FUNCTION DEF: abort
+.visible .func abort
+{
+        .reg .u64 %rd1;
+        ld.global.u64   %rd1,[%__exitval];
+        st.u32   [%rd1], 255;
+        exit;
+}
+// BEGIN GLOBAL FUNCTION DEF: exit
+.visible .func exit (.param .u32 %arg)
+{
+        .reg .u64 %rd1;
+	.reg .u32 %val;
+	ld.param.u32 %val,[%arg];
+        ld.global.u64   %rd1,[%__exitval];
+        st.u32   [%rd1], %val;
+        exit;
+}
+
+.extern .func (.param.u32 retval) main (.param.u32 argc, .param.u64 argv);
+
+.visible .entry __main (.param .u64 __retval, .param.u32 __argc, .param.u64 __argv)
+{
+        .reg .u32 %r<3>;
+        .reg .u64 %rd<3>;
+	.param.u32 %argc;
+	.param.u64 %argp;
+	.param.u32 %mainret;
+        ld.param.u64    %rd0, [__retval];
+        st.global.u64   [%__exitval], %rd0;
+
+	ld.param.u32	%r1, [__argc];
+	ld.param.u64	%rd1, [__argv];
+	st.param.u32	[%argc], %r1;
+	st.param.u64	[%argp], %rd1;
+        call.uni        (%mainret), main, (%argc, %argp);
+	ld.param.u32	%r1,[%mainret];
+        st.s32   [%rd0], %r1;
+        exit;
+}
Index: libgcc/config/nvptx/nvptx-malloc.h
===================================================================
--- /dev/null
+++ libgcc/config/nvptx/nvptx-malloc.h
@@ -0,0 +1,26 @@
+/* Declarations for the malloc wrappers.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+extern void __nvptx_free (void *);
+extern void *__nvptx_malloc (size_t);
+extern void *__nvptx_realloc (void *, size_t);
Index: libgcc/config/nvptx/realloc.c
===================================================================
--- /dev/null
+++ libgcc/config/nvptx/realloc.c
@@ -0,0 +1,51 @@
+/* Implement realloc with the help of the malloc and free wrappers.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdlib.h>
+#include <string.h>
+#include "nvptx-malloc.h"
+
+void *
+__nvptx_realloc (void *ptr, size_t newsz)
+{
+  if (newsz == 0)
+    {
+      __nvptx_free (ptr);
+      return NULL;
+    }
+  void *newptr = __nvptx_malloc (newsz);
+
+  size_t oldsz;
+  if (ptr == NULL)
+    oldsz = 0;
+  else
+    {
+      size_t *sp = __extension__ (size_t *)(ptr - 8);
+      oldsz = *sp;
+    }
+  if (oldsz != 0)
+    memcpy (newptr, ptr, oldsz > newsz ? newsz : oldsz);
+
+  __nvptx_free (ptr);
+  return newptr;
+}
Index: libgcc/config/nvptx/free.asm
===================================================================
--- /dev/null
+++ libgcc/config/nvptx/free.asm
@@ -0,0 +1,50 @@
+// A wrapper around free to enable a realloc implementation.
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+
+// This file is free software; you can redistribute it and/or modify it
+// under the terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option) any
+// later version.
+
+// This file is distributed in the hope that it will be useful, but
+// WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+// General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+        .version        3.1
+        .target sm_30
+        .address_size 64
+
+.extern .func free(.param.u64 %in_ar1);
+
+// BEGIN GLOBAL FUNCTION DEF: __nvptx_free
+.visible .func __nvptx_free(.param.u64 %in_ar1)
+{
+	.reg.u64 %ar1;
+	.reg.u64 %hr10;
+	.reg.u64 %r23;
+	.reg.pred %r25;
+	.reg.u64 %r27;
+	ld.param.u64 %ar1, [%in_ar1];
+		mov.u64	%r23, %ar1;
+		setp.eq.u64 %r25,%r23,0;
+	@%r25	bra	$L1;
+		add.u64	%r27, %r23, -8;
+	{
+		.param.u64 %out_arg0;
+		st.param.u64 [%out_arg0], %r27;
+		call free, (%out_arg0);
+	}
+$L1:
+	ret;
+	}
Index: libgcc/config/nvptx/malloc.asm
===================================================================
--- /dev/null
+++ libgcc/config/nvptx/malloc.asm
@@ -0,0 +1,55 @@
+// A wrapper around malloc to enable a realloc implementation.
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+
+// This file is free software; you can redistribute it and/or modify it
+// under the terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option) any
+// later version.
+
+// This file is distributed in the hope that it will be useful, but
+// WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+// General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+        .version        3.1
+        .target sm_30
+        .address_size 64
+
+.extern .func (.param.u64 %out_retval) malloc(.param.u64 %in_ar1);
+
+// BEGIN GLOBAL FUNCTION DEF: __nvptx_malloc
+.visible .func (.param.u64 %out_retval) __nvptx_malloc(.param.u64 %in_ar1)
+{
+        .reg.u64 %ar1;
+.reg.u64 %retval;
+        .reg.u64 %hr10;
+        .reg.u64 %r26;
+        .reg.u64 %r28;
+        .reg.u64 %r29;
+        .reg.u64 %r31;
+        ld.param.u64 %ar1, [%in_ar1];
+                mov.u64 %r26, %ar1;
+                add.u64 %r28, %r26, 8;
+        {
+                .param.u64 %retval_in;
+                .param.u64 %out_arg0;
+                st.param.u64 [%out_arg0], %r28;
+                call (%retval_in), malloc, (%out_arg0);
+        	ld.param.u64    %r29, [%retval_in];
+        }
+                st.u64  [%r29], %r26;
+                add.u64 %r31, %r29, 8;
+                mov.u64 %retval, %r31;
+        	st.param.u64    [%out_retval], %retval;
+        	ret;
+}
Index: libgcc/shared-object.mk
===================================================================
--- libgcc/shared-object.mk.orig
+++ libgcc/shared-object.mk
@@ -24,13 +24,15 @@ $(error Unsupported file type: $o)
 endif
 endif
 
+as_flags-$o := -xassembler$(if $(filter .S,$(suffix $o)),-with-cpp)
+
 $(base)$(objext): $o $(base).vis
-	$(gcc_compile) -c -xassembler-with-cpp -include $*.vis $<
+	$(gcc_compile) -c $(as_flags-$<) -include $*.vis $<
 
 $(base).vis: $(base)_s$(objext)
 	$(gen-hide-list)
 
 $(base)_s$(objext): $o
-	$(gcc_s_compile) -c -xassembler-with-cpp $<
+	$(gcc_s_compile) -c $(as_flags-$<) $<
 
 endif
Index: libgcc/static-object.mk
===================================================================
--- libgcc/static-object.mk.orig
+++ libgcc/static-object.mk
@@ -24,13 +24,15 @@ $(error Unsupported file type: $o)
 endif
 endif
 
+as_flags-$o := -xassembler$(if $(filter .S,$(suffix $o)),-with-cpp)
+
 $(base)$(objext): $o $(base).vis
-	$(gcc_compile) -c -xassembler-with-cpp -include $*.vis $<
+	$(gcc_compile) -c $(as_flags-$<) -include $*.vis $<
 
 $(base).vis: $(base)_s$(objext)
 	$(gen-hide-list)
 
 $(base)_s$(objext): $o
-	$(gcc_s_compile) -c -xassembler-with-cpp $<
+	$(gcc_s_compile) -c $(as_flags-$<) $<
 
 endif
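
(Illustration only, not part of the patch.) The as_flags-$o variable in the two makefile fragments above picks the -x language from the source suffix: .S files still go through the preprocessor, while anything else that reaches these rules (here, .asm) is assembled as-is, so comments survive. A minimal C sketch of that selection follows. The function name assembler_flag is hypothetical and only mirrors the make expression.

```c
#include <assert.h>
#include <string.h>

/* Return the -x option that the as_flags-$o expression would pick for
   PATH: "-xassembler-with-cpp" for a ".S" suffix, plain "-xassembler"
   otherwise.  The name assembler_flag is made up for this sketch.  */
const char *
assembler_flag (const char *path)
{
  assert (path != NULL);

  /* Only look at the file name, not at dots in directory names.  */
  const char *base = strrchr (path, '/');
  base = base ? base + 1 : path;

  /* The comparison is case-sensitive, so ".s" does not match ".S".  */
  const char *dot = strrchr (base, '.');
  if (dot != NULL && strcmp (dot, ".S") == 0)
    return "-xassembler-with-cpp";
  return "-xassembler";
}
```

With this, a path such as config/nvptx/malloc.asm maps to -xassembler, and an existing .S source keeps -xassembler-with-cpp.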

* Re: The nvptx port [7/11+] Inform the port about call arguments
  2014-10-28 14:57             ` Bernd Schmidt
@ 2014-10-29 23:42               ` Jeff Law
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Law @ 2014-10-29 23:42 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/28/14 08:49, Bernd Schmidt wrote:
> On 10/22/2014 08:12 PM, Jeff Law wrote:
>> Yea, let's keep your approach.  Just wanted to explore a bit since the
>> PA seems to have a variety of similar characteristics.
>
> Here's an updated version of the patch. I experimented a little with ptx
> calling conventions and ran into an arg that had to be moved with
> memcpy, which exposed an ordering problem - all call_args were added to
> the memcpy call. So the invocation of the hook had to be moved downwards
> a bit, and the calculation of the return value needs to happen after it
> (since nvptx_function_value needs to know whether we are actually trying
> to construct a call at the moment).
>
> Bootstrapped and tested on x86_64-linux, ok?
OK.

Jeff

* Re: The nvptx port [10/11+] Target files
  2014-10-28 15:10     ` Bernd Schmidt
@ 2014-10-29 23:51       ` Jeff Law
  2014-10-30  2:53         ` Bernd Schmidt
  2014-11-10 16:33         ` Bernd Schmidt
  2014-11-04 16:48       ` The nvptx port [10/11+] Target files Richard Henderson
  1 sibling, 2 replies; 82+ messages in thread
From: Jeff Law @ 2014-10-29 23:51 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/28/14 08:56, Bernd Schmidt wrote:
>
> I have patches that expose all the address spaces to the middle-end
> through a lower-as pass that runs early. The preliminary patches for
> that ran into some resistance and into general brokenness of our address
> space support, so I decided to rip all that out for the moment to get
> the basic port into the next version.
>
> This new version also implements a way of providing realloc that was
> suggested in another thread. Calls to malloc and free are redirected to
> libgcc variants. I'm not a big fan of wasting extra space on every
> allocation (which is why I didn't originally consider this approach
> viable), but it seems we'll have to do it that way. There's a change to
> the libgcc build system: on ptx we need comments in the assembly to
> survive, so we can't use -xassembler-with-cpp. I've not found any files
> named "*.asm", so I've changed that suffix to mean plain assembler.
>
>
> Bernd
>
>
> 010-target.diff
>
>
> 	* configure.ac: Allow configuring lto for nvptx.
> 	* configure: Regenerate.
>
> 	gcc/
> 	* config/nvptx/nvptx.c: New file.
> 	* config/nvptx/nvptx.h: New file.
> 	* config/nvptx/nvptx-protos.h: New file.
> 	* config/nvptx/nvptx.md: New file.
> 	* config/nvptx/t-nvptx: New file.
> 	* config/nvptx/nvptx.opt: New file.
> 	* common/config/nvptx/nvptx-common.c: New file.
> 	* config.gcc: Handle nvptx-*-*.
>
> 	libgcc/
> 	* config.host: Handle nvptx-*-*.
> 	* shared-object.mk (as_flags-$o): Define.
> 	($(base)$(objext), $(base)_s$(objext)): Use it instead of
> 	-xassembler-with-cpp.
> 	* static-object.mk: Identical changes.
> 	* config/nvptx/t-nvptx: New file.
> 	* config/nvptx/crt0.s: New file.
> 	* config/nvptx/free.asm: New file.
> 	* config/nvptx/malloc.asm: New file.
> 	* config/nvptx/realloc.c: New file.
A "nit" -- Richard S. recently removed the need to include the "enum" 
for "enum machine_mode".  I believe he had a script to handle the 
mundane parts of that change.  Please make sure to update the nvptx port 
to conform to that new convention, obviously feel free to use the script 
if you want.

You may need to update with James Greenhalgh's changes to 
MOVE_BY_PIECES_P and friends.

With those two issues addressed as needed, this is OK for the trunk.


FWIW, I'm amazed at how many similarities there are between what needs 
to be done for the PTX tools and what needed to be done to interface 
with the native HPPA tools way-back-when.  Simply amazing.

I notice that you've got some OpenMP bits (write_as_kernel).  Are y'all 
doing any testing with OpenMP or is that an artifact of layering OpenACC 
on top of the OpenMP infrastructure?

Also, I've asked the steering committee to appoint you as the maintainer 
for the nvptx port as well.

jeff
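
For readers following the realloc discussion quoted above, here is a rough host-side sketch of the size-prefix scheme, assuming an 8-byte header in front of each block as in malloc.asm and free.asm. The sketch_* names and the HDR macro are made up for illustration. This is not the libgcc code, and unlike realloc.c it checks the allocation result before copying.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header size; the patch uses a fixed 8-byte prefix.  */
#define HDR 8

/* Allocate SIZE bytes and record SIZE in a header just before the
   pointer handed back to the caller.  */
void *
sketch_malloc (size_t size)
{
  assert (sizeof (size_t) <= HDR);
  if (size > SIZE_MAX - HDR)
    return NULL;
  unsigned char *raw = malloc (size + HDR);
  if (raw == NULL)
    return NULL;
  memcpy (raw, &size, sizeof size);
  return raw + HDR;
}

/* Read back the size stored by sketch_malloc.  */
size_t
sketch_size (const void *ptr)
{
  size_t size;
  memcpy (&size, (const unsigned char *) ptr - HDR, sizeof size);
  return size;
}

/* Release a block obtained from sketch_malloc; NULL is ignored.  */
void
sketch_free (void *ptr)
{
  if (ptr != NULL)
    free ((unsigned char *) ptr - HDR);
}

/* realloc built only from the two wrappers above.  */
void *
sketch_realloc (void *ptr, size_t newsz)
{
  if (newsz == 0)
    {
      sketch_free (ptr);
      return NULL;
    }
  void *newptr = sketch_malloc (newsz);
  if (newptr == NULL)
    return NULL;		/* The old block is left untouched.  */
  if (ptr != NULL)
    {
      size_t oldsz = sketch_size (ptr);
      memcpy (newptr, ptr, oldsz < newsz ? oldsz : newsz);
      sketch_free (ptr);
    }
  return newptr;
}
```

The cost is HDR extra bytes on every allocation, which is the overhead the quoted message complains about.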

* Re: The nvptx port [10/11+] Target files
  2014-10-29 23:51       ` Jeff Law
@ 2014-10-30  2:53         ` Bernd Schmidt
  2014-10-30  3:09           ` Jeff Law
  2014-11-10 16:33         ` Bernd Schmidt
  1 sibling, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-10-30  2:53 UTC (permalink / raw)
  To: Jeff Law, GCC Patches

On 10/30/2014 12:35 AM, Jeff Law wrote:
> A "nit" -- Richard S. recently removed the need to include the "enum"
> for "enum machine_mode".  I believe he had a script to handle the
> mundane parts of that change.  Please make sure to update the nvptx port
> to conform to that new convention, obviously feel free to use the script
> if you want.
>
> You may need to update with James Greenhalgh's changes to
> MOVE_BY_PIECES_P and friends.

Ok, I'll look into those.

> With those two issues addressed as needed, this is OK for the trunk.

Thanks! I've pinged some of the preliminary patches that went unapproved 
up to this point.

One leftover issue, discussed in the [0/11] mail - what amount of 
documentation is appropriate for this, given that we don't want to 
support using this as anything other than an offload compiler? Should I 
still add all the standard invoke.texi/gccint.texi pieces?

> I notice that you've got some OpenMP bits (write_as_kernel).  Are y'all
> doing any testing with OpenMP or is that an artifact of layering OpenACC
> on top of the OpenMP infrastructure?

The distinction between .kernel and .func is really not to do with 
either - only .kernels are callable from the host, and only .funcs are 
callable from within ptx code.


Bernd

* Re: The nvptx port [10/11+] Target files
  2014-10-30  2:53         ` Bernd Schmidt
@ 2014-10-30  3:09           ` Jeff Law
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Law @ 2014-10-30  3:09 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/29/14 17:55, Bernd Schmidt wrote:
> Thanks! I've pinged some of the preliminary patches that went unapproved
> up to this point.
Thanks.


>
> One leftover issue, discussed in the [0/11] mail - what amount of
> documentation is appropriate for this, given that we don't want to
> support using this as anything other than an offload compiler? Should I
> still add all the standard invoke.texi/gccint.texi pieces?
I'm still not sure here.  nvptx is quite a bit different from anything 
we've done in the past, and I'm not sure how much of the traditional 
stuff we want to document versus, at the other end, how much of the 
special stuff we want to document.  I simply don't know.

>> I notice that you've got some OpenMP bits (write_as_kernel).  Are y'all
>> doing any testing with OpenMP or is that an artifact of layering OpenACC
>> on top of the OpenMP infrastructure?
>
> The distinction between .kernel and .func is really not to do with
> either - only .kernels are callable from the host, and only .funcs are
> callable from within ptx code.
Ok.  Thanks for clarifying.

jeff

* Re: The nvptx port [11/11] More tools.
  2014-10-20 14:58 ` The nvptx port [11/11] More tools Bernd Schmidt
  2014-10-21  0:16   ` Joseph S. Myers
  2014-10-22 20:40   ` Jeff Law
@ 2014-10-31 21:04   ` Jeff Law
       [not found]     ` <54542050.6010908@codesourcery.com>
  2 siblings, 1 reply; 82+ messages in thread
From: Jeff Law @ 2014-10-31 21:04 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/20/14 08:48, Bernd Schmidt wrote:
> This is a "bonus" optional patch which adds ar, ranlib, as and ld to the
> ptx port. This is not proper binutils; ar and ranlib are just linked to
> the host versions, and the other two tools have the following functions:
>
> * nvptx-as is required to convert the compiler output to actual valid
>    ptx assembly, primarily by reordering declarations and definitions.
>    Believe me when I say that I've tried to make that work in the
>    compiler itself and it's pretty much impossible without some really
>    invasive changes.
> * nvptx-ld is just a pseudo linker that works by concatenating ptx
>    input files and separating them with nul characters. Actual linking
>    is something that happens later, when calling CUDA library functions,
>    but existing build systems make it useful to have something called
>    "ld" which is able to bundle everything that's needed into a single
>    file, and this seemed to be the simplest way of achieving this.
>
> There's a toplevel configure.ac change necessary to make ar/ranlib
> useable by the libgcc build. Having some tools built like this has some
> precedent in t-vmsnative, but as Thomas noted it does make feature tests
> in gcc's configure somewhat ugly (but everything works well enough to
> build the compiler). The alternative here is to bundle all these files
> into a separate nvptx-tools package which users would have to download -
> something that would be nice to avoid.
>
> These tools currently require GNU extensions - something I probably
> ought to fix if we decide to add them to the gcc build itself.
Pondering this a bit more, I think this is fine in concept.  As you 
note, removing the GNU extensions or at least making them conditional 
would be good since these are going to be built with the host tools.

I'm not going to dig into the implementations...  I'm going to assume 
the nvptx maintainer (that's highly likely to be you :-) will own their 
care and feeding.

jeff
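
As a rough picture of the pseudo linker described in the quoted text, the bundling step could look something like the following C sketch. It assumes a NUL byte is written after every input. The function name bundle_streams is hypothetical, and the real nvptx-ld may well differ in the details.

```c
#include <assert.h>
#include <stdio.h>

/* Copy each of the N input streams to OUT in order, writing a single
   NUL byte after each one.  Returns 0 on success, -1 on an I/O error.
   The name bundle_streams is made up for this sketch.  */
int
bundle_streams (FILE *out, FILE *const *in, size_t n)
{
  char buf[4096];

  assert (out != NULL && in != NULL);
  for (size_t i = 0; i < n; i++)
    {
      size_t got;
      while ((got = fread (buf, 1, sizeof buf, in[i])) > 0)
	if (fwrite (buf, 1, got, out) != got)
	  return -1;
      /* Distinguish a read error from end of file, then emit the
	 separator.  */
      if (ferror (in[i]) || fputc ('\0', out) == EOF)
	return -1;
    }
  return 0;
}
```

Since ptx is plain text, a NUL byte cannot occur inside an input, so a consumer can split the bundle back into its parts unambiguously.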

* Re: The nvptx port [11/11] More tools.
       [not found]     ` <54542050.6010908@codesourcery.com>
@ 2014-11-03 21:49       ` Jeff Law
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Law @ 2014-11-03 21:49 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 10/31/14 17:50, Bernd Schmidt wrote:
> On 10/31/2014 09:56 PM, Jeff Law wrote:
>> Pondering this a bit more, I think this is fine in concept.  As you
>> note, removing the GNU extensions or at least making them conditional
>> would be good since these are going to be built with the host tools.
>>
>> I'm not going to dig into the implementations...  I'm going to assume
>> the nvptx maintainer (that's highly likely to be you :-) will own their
>> care and feeding.
>
> I was beginning to think I'd just make a separate package. That could
> then also include a nvptx-run which would have to link against CUDA
> libraries.
Your call.

jeff

* Re: The nvptx port [1/11+] indirect jumps
  2014-10-20 14:21 ` The nvptx port [1/11+] indirect jumps Bernd Schmidt
  2014-10-21 18:29   ` Jeff Law
@ 2014-11-04 15:35   ` Bernd Schmidt
  2014-11-04 15:43     ` Richard Henderson
  1 sibling, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-11-04 15:35 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jeffrey A Law

On 10/20/2014 04:19 PM, Bernd Schmidt wrote:
> ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
> defined.  Add a sorry.

Looking back through all the mails it turns out this one wasn't approved 
yet. Ping?


Bernd

* Re: The nvptx port [1/11+] indirect jumps
  2014-11-04 15:35   ` Bernd Schmidt
@ 2014-11-04 15:43     ` Richard Henderson
  0 siblings, 0 replies; 82+ messages in thread
From: Richard Henderson @ 2014-11-04 15:43 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches; +Cc: Jeffrey A Law

On 11/04/2014 04:32 PM, Bernd Schmidt wrote:
> On 10/20/2014 04:19 PM, Bernd Schmidt wrote:
>> ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be
>> defined.  Add a sorry.
> 
> Looking back through all the mails it turns out this one wasn't approved yet.
> Ping?

Ok.


r~

* Re: The nvptx port [10/11+] Target files
  2014-10-28 15:10     ` Bernd Schmidt
  2014-10-29 23:51       ` Jeff Law
@ 2014-11-04 16:48       ` Richard Henderson
  2014-11-04 16:55         ` Bernd Schmidt
  1 sibling, 1 reply; 82+ messages in thread
From: Richard Henderson @ 2014-11-04 16:48 UTC (permalink / raw)
  To: Bernd Schmidt, Jeff Law, GCC Patches

On 10/28/2014 03:56 PM, Bernd Schmidt wrote:
> +nvptx_ptx_type_from_mode (enum machine_mode mode, bool promote)
> +{
> +  switch (mode)
> +    {
> +    case BLKmode:
> +      return ".b8";
> +    case BImode:
> +      return ".pred";
> +    case QImode:
> +      if (promote)
> +	return ".u32";
> +      else
> +	return ".u8";
> +    case HImode:
> +      return ".u16";

Promote here too?  Or does this have nothing to do with

> +static enum machine_mode
> +arg_promotion (enum machine_mode mode)
> +{
> +  if (mode == QImode || mode == HImode)
> +    return SImode;
> +  return mode;
> +}


r~

* Re: The nvptx port [10/11+] Target files
  2014-11-04 16:48       ` The nvptx port [10/11+] Target files Richard Henderson
@ 2014-11-04 16:55         ` Bernd Schmidt
  2014-11-05 13:07           ` Bernd Schmidt
  0 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-11-04 16:55 UTC (permalink / raw)
  To: Richard Henderson, Jeff Law, GCC Patches

On 11/04/2014 05:48 PM, Richard Henderson wrote:
> On 10/28/2014 03:56 PM, Bernd Schmidt wrote:
>> +nvptx_ptx_type_from_mode (enum machine_mode mode, bool promote)
>> +{
>> +  switch (mode)
>> +    {
>> +    case BLKmode:
>> +      return ".b8";
>> +    case BImode:
>> +      return ".pred";
>> +    case QImode:
>> +      if (promote)
>> +	return ".u32";
>> +      else
>> +	return ".u8";
>> +    case HImode:
>> +      return ".u16";
>
> Promote here too?  Or does this have nothing to do with
>
>> +static enum machine_mode
>> +arg_promotion (enum machine_mode mode)
>> +{
>> +  if (mode == QImode || mode == HImode)
>> +    return SImode;
>> +  return mode;
>> +}

No, these are different problems - the one in arg promotion is purely 
about K&R C and trying to match untyped function decls with calls, while 
the type_from_mode bit was about some ptx idiosyncrasy. Although I 
forget what the problem was, that code is more than a year old - I'll 
see if I can get rid of this.
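
As a rough illustration of the K&R side of this, here is a standalone sketch
of how an argument's mode is chosen when the callee has no prototype. It
follows the logic of arg_promotion and write_one_arg in the patch, but the
Mode enum and the name unprototyped_arg_mode are hypothetical stand-ins and
not GCC's machine_mode API.

```cpp
// Hypothetical stand-in for the GCC machine modes relevant to this example.
enum class Mode { QI, HI, SI, DI, SF, DF };

// Mirrors arg_promotion from the patch: sub-word integers widen to SImode.
Mode arg_promotion(Mode mode) {
  if (mode == Mode::QI || mode == Mode::HI)
    return Mode::SI;
  return mode;
}

// Mode used for a scalar argument when the callee has no prototype
// (old-style C): float is passed as double, then integers are promoted.
Mode unprototyped_arg_mode(Mode mode) {
  if (mode == Mode::SF)
    mode = Mode::DF;
  return arg_promotion(mode);
}
```

Applying the same rule to both the untyped declaration and the call site is
what keeps the two ptx signatures in agreement.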


Bernd

* Re: The nvptx port [8/11+] Write undefined decls.
  2014-10-22 18:23       ` Jeff Law
@ 2014-11-05 12:05         ` Bernd Schmidt
  2014-11-05 20:05           ` Jeff Law
  0 siblings, 1 reply; 82+ messages in thread
From: Bernd Schmidt @ 2014-11-05 12:05 UTC (permalink / raw)
  To: Jeff Law, GCC Patches

On 10/22/2014 08:11 PM, Jeff Law wrote:
> I'm not going to insist you do this in the same way as the PA.  That was
> a different era -- we had significant motivation to make things work in
> such a way that everything could be buried in the pa specific files.
> That sometimes led to less than optimal approaches to fix certain problems.

So... is this patch approved?


Bernd

* Re: The nvptx port [10/11+] Target files
  2014-11-04 16:55         ` Bernd Schmidt
@ 2014-11-05 13:07           ` Bernd Schmidt
  0 siblings, 0 replies; 82+ messages in thread
From: Bernd Schmidt @ 2014-11-05 13:07 UTC (permalink / raw)
  To: Richard Henderson, Jeff Law, GCC Patches

On 11/04/2014 05:51 PM, Bernd Schmidt wrote:
> On 11/04/2014 05:48 PM, Richard Henderson wrote:
>> On 10/28/2014 03:56 PM, Bernd Schmidt wrote:
>>> +nvptx_ptx_type_from_mode (enum machine_mode mode, bool promote)
>>> +{
>>> +  switch (mode)
>>> +    {
>>> +    case BLKmode:
>>> +      return ".b8";
>>> +    case BImode:
>>> +      return ".pred";
>>> +    case QImode:
>>> +      if (promote)
>>> +    return ".u32";
>>> +      else
>>> +    return ".u8";
>>> +    case HImode:
>>> +      return ".u16";
>>
>> Promote here too?  Or does this have nothing to do with
>>
>>> +static enum machine_mode
>>> +arg_promotion (enum machine_mode mode)
>>> +{
>>> +  if (mode == QImode || mode == HImode)
>>> +    return SImode;
>>> +  return mode;
>>> +}
>
> No, these are different problems - the one in arg promotion is purely
> about K&R C and trying to match untyped function decls with calls, while
> the type_from_mode bit was about some ptx idiosyncrasy. Although I
> forget what the problem was, that code is more than a year old - I'll
> see if I can get rid of this.

Err, no, it's quite necessary. From the manual "The .u8, .s8 and .b8 
instruction types are restricted to ld, st and cvt instructions." This 
means that if the compiler generates reasonable-looking code along the 
lines of

.reg .u8 %r70;
mov.u8 %r70,48;

you get

ptxas 20000211-1.o, line 191; error   : Arguments mismatch for 
instruction 'mov'

Now, one _could_ write cvt.u8.u32 for the load immediate, but then one 
would also have to write cvt.u8.u8 for register-register moves, and 
that's starting to look iffy. I don't really want to rely on the ptx 
assembler to do the right thing for "conversions" from one type to itself.
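
To make the effect of the promote flag concrete, here is a simplified
standalone model. It is a sketch only: the Mode enum, ptx_type and
load_immediate are hypothetical names, and only a few integer modes are
modeled. It is not the code in nvptx.c.

```cpp
#include <string>

// Hypothetical stand-in for the handful of GCC machine modes discussed here.
enum class Mode { QI, HI, SI, DI };

// Simplified model of nvptx_ptx_type_from_mode: with promote set, a QImode
// value gets a 32-bit register type, so instructions other than ld, st and
// cvt (such as mov) can operate on it.
const char *ptx_type(Mode mode, bool promote) {
  switch (mode) {
    case Mode::QI: return promote ? ".u32" : ".u8";
    case Mode::HI: return ".u16";
    case Mode::SI: return ".u32";
    case Mode::DI: return ".u64";
  }
  return "";  // not reached for the modes modeled above
}

// Build the register declaration and load-immediate for pseudo REGNO,
// always asking for the promoted register type.
std::string load_immediate(Mode mode, int regno, int value) {
  std::string type = ptx_type(mode, /*promote=*/true);
  std::string reg = "%r" + std::to_string(regno);
  return ".reg " + type + " " + reg + ";\n" +
         "mov" + type + " " + reg + "," + std::to_string(value) + ";\n";
}
```

With this model, load_immediate (Mode::QI, 70, 48) yields a .u32 register and
a mov.u32, avoiding the mov.u8 form that ptxas rejected above, while memory
accesses can still use the unpromoted .u8 type.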


Bernd

* Re: The nvptx port [8/11+] Write undefined decls.
  2014-11-05 12:05         ` Bernd Schmidt
@ 2014-11-05 20:05           ` Jeff Law
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Law @ 2014-11-05 20:05 UTC (permalink / raw)
  To: Bernd Schmidt, GCC Patches

On 11/05/14 05:01, Bernd Schmidt wrote:
> On 10/22/2014 08:11 PM, Jeff Law wrote:
>> I'm not going to insist you do this in the same way as the PA.  That was
>> a different era -- we had significant motivation to make things work in
>> such a way that everything could be buried in the pa specific files.
>> That sometimes led to less than optimal approaches to fix certain
>> problems.
>
> So... is this patch approved?
Yes, sorry for not being explicit.

Jeff

* Re: The nvptx port [10/11+] Target files
  2014-10-29 23:51       ` Jeff Law
  2014-10-30  2:53         ` Bernd Schmidt
@ 2014-11-10 16:33         ` Bernd Schmidt
  2014-11-10 20:06           ` Jakub Jelinek
                             ` (2 more replies)
  1 sibling, 3 replies; 82+ messages in thread
From: Bernd Schmidt @ 2014-11-10 16:33 UTC (permalink / raw)
  To: Jeff Law, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 914 bytes --]

On 10/30/2014 12:35 AM, Jeff Law wrote:
> A "nit" -- Richard S. recently removed the need to include the "enum"
> for "enum machine_mode".  I believe he had a script to handle the
> mundane parts of that change.  Please make sure to update the nvptx port
> to conform to that new convention, obviously feel free to use the script
> if you want.
>
> You may need to update with James Greenhalgh's changes to
> MOVE_BY_PIECES_P and friends.
>
> With those two issues addressed as needed, this is OK for the trunk.

I've now committed it, in the following form. Other than the enum thing, 
this also adds some atomic instructions.

The scripts (11/11) I've put up on github, along with a hacked up 
newlib. These are at

https://github.com/bernds/nvptx-tools
https://github.com/bernds/nvptx-newlib

They are likely to migrate to MentorEmbedded from bernds, but that had 
some permissions problems last week.


Bernd


[-- Attachment #2: ptx-committed.diff --]
[-- Type: text/x-patch, Size: 132712 bytes --]

commit 659744a99d815b168716b4460e32f6a21593e494
Author: Bernd Schmidt <bernds@codesourcery.com>
Date:   Thu Nov 6 19:03:57 2014 +0100

    Add the nvptx port.
    
    	* configure.ac: Handle nvptx-*-*.
    	* configure: Regenerate.
    
    	gcc/
    	* config/nvptx/nvptx.c: New file.
    	* config/nvptx/nvptx.h: New file.
    	* config/nvptx/nvptx-protos.h: New file.
    	* config/nvptx/nvptx.md: New file.
    	* config/nvptx/t-nvptx: New file.
    	* config/nvptx/nvptx.opt: New file.
    	* common/config/nvptx/nvptx-common.c: New file.
    	* config.gcc: Handle nvptx-*-*.
    
    	libgcc/
    	* config.host: Handle nvptx-*-*.
    	* shared-object.mk (as-flags-$o): Define.
    	($(base)$(objext), $(base)_s$(objext)): Use it instead of
    	-xassembler-with-cpp.
    	* static-object.mk: Identical changes.
    	* config/nvptx/t-nvptx: New file.
    	* config/nvptx/crt0.s: New file.
    	* config/nvptx/free.asm: New file.
    	* config/nvptx/malloc.asm: New file.
    	* config/nvptx/realloc.c: New file.

diff --git a/ChangeLog b/ChangeLog
index fd6172a..e83d1e6 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2014-11-06  Bernd Schmidt  <bernds@codesourcery.com>
+
+	* configure.ac: Handle nvptx-*-*.
+	* configure: Regenerate.
+
 2014-11-06  Prachi Godbole  <prachi.godbole@imgtec.com>
 
 	* MAINTAINERS (Write After Approval): Add myself.
diff --git a/configure b/configure
index d0c760b..0e014a3 100755
--- a/configure
+++ b/configure
@@ -3779,6 +3779,10 @@ case "${target}" in
   mips*-*-*)
     noconfigdirs="$noconfigdirs gprof"
     ;;
+  nvptx*-*-*)
+    # nvptx is just a compiler
+    noconfigdirs="$noconfigdirs target-libssp target-libstdc++-v3 target-libobjc"
+    ;;
   sh-*-* | sh64-*-*)
     case "${target}" in
       sh*-*-elf)
diff --git a/configure.ac b/configure.ac
index 2f0af4a..b1ef069 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1138,6 +1138,10 @@ case "${target}" in
   mips*-*-*)
     noconfigdirs="$noconfigdirs gprof"
     ;;
+  nvptx*-*-*)
+    # nvptx is just a compiler
+    noconfigdirs="$noconfigdirs target-libssp target-libstdc++-v3 target-libobjc"
+    ;;
   sh-*-* | sh64-*-*)
     case "${target}" in
       sh*-*-elf)
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 731a7bc8b..c170e69 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2014-11-10  Bernd Schmidt  <bernds@codesourcery.com>
+
+	* config/nvptx/nvptx.c: New file.
+	* config/nvptx/nvptx.h: New file.
+	* config/nvptx/nvptx-protos.h: New file.
+	* config/nvptx/nvptx.md: New file.
+	* config/nvptx/t-nvptx: New file.
+	* config/nvptx/nvptx.opt: New file.
+	* common/config/nvptx/nvptx-common.c: New file.
+	* config.gcc: Handle nvptx-*-*.
+
 2014-11-10  Richard Biener  <rguenther@suse.de>
 
 	* tree-ssa-operands.c (finalize_ssa_uses): Properly put
diff --git a/gcc/common/config/nvptx/nvptx-common.c b/gcc/common/config/nvptx/nvptx-common.c
new file mode 100644
index 0000000..80ab076
--- /dev/null
+++ b/gcc/common/config/nvptx/nvptx-common.c
@@ -0,0 +1,38 @@
+/* NVPTX common hooks.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic-core.h"
+#include "tm.h"
+#include "tm_p.h"
+#include "common/common-target.h"
+#include "common/common-target-def.h"
+#include "opts.h"
+#include "flags.h"
+
+#undef TARGET_HAVE_NAMED_SECTIONS
+#define TARGET_HAVE_NAMED_SECTIONS false
+
+#undef TARGET_DEFAULT_TARGET_FLAGS
+#define TARGET_DEFAULT_TARGET_FLAGS MASK_ABI64
+
+struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 7afe5a7..2284b9e 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -422,6 +422,9 @@ nios2-*-*)
 	cpu_type=nios2
 	extra_options="${extra_options} g.opt"
 	;;
+nvptx-*-*)
+	cpu_type=nvptx
+	;;
 powerpc*-*-*)
 	cpu_type=rs6000
 	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
@@ -2153,6 +2156,10 @@ nios2-*-*)
 		;;
         esac
 	;;
+nvptx-*)
+	tm_file="${tm_file} newlib-stdint.h"
+	tmake_file="nvptx/t-nvptx"
+	;;
 pdp11-*-*)
 	tm_file="${tm_file} newlib-stdint.h"
 	use_gcc_stdint=wrap
diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
new file mode 100644
index 0000000..bd5a920
--- /dev/null
+++ b/gcc/config/nvptx/nvptx-protos.h
@@ -0,0 +1,46 @@
+/* Prototypes for exported functions defined in nvptx.c.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_NVPTX_PROTOS_H
+#define GCC_NVPTX_PROTOS_H
+
+extern void nvptx_declare_function_name (FILE *, const char *, const_tree decl);
+extern void nvptx_declare_object_name (FILE *file, const char *name,
+				       const_tree decl);
+extern void nvptx_record_needed_fndecl (tree decl);
+extern void nvptx_function_end (FILE *);
+extern void nvptx_output_skip (FILE *, unsigned HOST_WIDE_INT);
+extern void nvptx_output_ascii (FILE *, const char *, unsigned HOST_WIDE_INT);
+extern void nvptx_register_pragmas (void);
+extern const char *nvptx_section_for_decl (const_tree);
+
+#ifdef RTX_CODE
+extern void nvptx_expand_call (rtx, rtx);
+extern rtx nvptx_expand_compare (rtx);
+extern const char *nvptx_ptx_type_from_mode (machine_mode, bool);
+extern const char *nvptx_output_call_insn (rtx_insn *, rtx, rtx);
+extern const char *nvptx_output_return (void);
+extern machine_mode nvptx_underlying_object_mode (rtx);
+extern const char *nvptx_section_from_addr_space (addr_space_t);
+extern bool nvptx_hard_regno_mode_ok (int, machine_mode);
+extern addr_space_t nvptx_addr_space_from_address (rtx);
+extern rtx nvptx_maybe_convert_symbolic_operand (rtx);
+#endif
+#endif
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
new file mode 100644
index 0000000..9382812
--- /dev/null
+++ b/gcc/config/nvptx/nvptx.c
@@ -0,0 +1,2120 @@
+/* Target code for NVPTX.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tree.h"
+#include "insn-flags.h"
+#include "output.h"
+#include "insn-attr.h"
+#include "insn-codes.h"
+#include "expr.h"
+#include "regs.h"
+#include "optabs.h"
+#include "recog.h"
+#include "ggc.h"
+#include "timevar.h"
+#include "tm_p.h"
+#include "tm-preds.h"
+#include "tm-constrs.h"
+#include "function.h"
+#include "langhooks.h"
+#include "dbxout.h"
+#include "target.h"
+#include "target-def.h"
+#include "diagnostic.h"
+#include "predict.h"
+#include "basic-block.h"
+#include "cfgrtl.h"
+#include "stor-layout.h"
+#include "calls.h"
+#include "df.h"
+#include "builtins.h"
+#include "hashtab.h"
+#include <sstream>
+
+/* Record the function decls we've written, and the libfuncs and function
+   decls corresponding to them.  */
+static std::stringstream func_decls;
+static GTY((if_marked ("ggc_marked_p"), param_is (struct rtx_def)))
+  htab_t declared_libfuncs_htab;
+static GTY((if_marked ("ggc_marked_p"), param_is (union tree_node)))
+  htab_t declared_fndecls_htab;
+static GTY((if_marked ("ggc_marked_p"), param_is (union tree_node)))
+  htab_t needed_fndecls_htab;
+
+/* Allocate a new, cleared machine_function structure.  */
+
+static struct machine_function *
+nvptx_init_machine_status (void)
+{
+  struct machine_function *p = ggc_cleared_alloc<machine_function> ();
+  p->ret_reg_mode = VOIDmode;
+  return p;
+}
+
+/* Implement TARGET_OPTION_OVERRIDE.  */
+
+static void
+nvptx_option_override (void)
+{
+  init_machine_status = nvptx_init_machine_status;
+  /* Gives us a predictable order, which we need especially for variables.  */
+  flag_toplevel_reorder = 1;
+  /* Assumes that it will see only hard registers.  */
+  flag_var_tracking = 0;
+  write_symbols = NO_DEBUG;
+  debug_info_level = DINFO_LEVEL_NONE;
+
+  declared_fndecls_htab
+    = htab_create_ggc (17, htab_hash_pointer, htab_eq_pointer, NULL);
+  needed_fndecls_htab
+    = htab_create_ggc (17, htab_hash_pointer, htab_eq_pointer, NULL);
+  declared_libfuncs_htab
+    = htab_create_ggc (17, htab_hash_pointer, htab_eq_pointer, NULL);
+}
+
+/* Return the mode to be used when declaring a ptx object for OBJ.
+   For objects with subparts such as complex modes this is the mode
+   of the subpart.  */
+
+machine_mode
+nvptx_underlying_object_mode (rtx obj)
+{
+  if (GET_CODE (obj) == SUBREG)
+    obj = SUBREG_REG (obj);
+  machine_mode mode = GET_MODE (obj);
+  if (mode == TImode)
+    return DImode;
+  if (COMPLEX_MODE_P (mode))
+    return GET_MODE_INNER (mode);
+  return mode;
+}
+
+/* Return a ptx type for MODE.  If PROMOTE, then use .u32 for QImode to
+   deal with ptx idiosyncrasies.  */
+
+const char *
+nvptx_ptx_type_from_mode (machine_mode mode, bool promote)
+{
+  switch (mode)
+    {
+    case BLKmode:
+      return ".b8";
+    case BImode:
+      return ".pred";
+    case QImode:
+      if (promote)
+	return ".u32";
+      else
+	return ".u8";
+    case HImode:
+      return ".u16";
+    case SImode:
+      return ".u32";
+    case DImode:
+      return ".u64";
+
+    case SFmode:
+      return ".f32";
+    case DFmode:
+      return ".f64";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return the number of pieces to use when dealing with a pseudo of *PMODE.
+   Alter *PMODE if we return a number greater than one.  */
+
+static int
+maybe_split_mode (machine_mode *pmode)
+{
+  machine_mode mode = *pmode;
+
+  if (COMPLEX_MODE_P (mode))
+    {
+      *pmode = GET_MODE_INNER (mode);
+      return 2;
+    }
+  else if (mode == TImode)
+    {
+      *pmode = DImode;
+      return 2;
+    }
+  return 1;
+}
+
+/* Like maybe_split_mode, but only return whether or not the mode
+   needs to be split.  */
+static bool
+nvptx_split_reg_p (machine_mode mode)
+{
+  if (COMPLEX_MODE_P (mode))
+    return true;
+  if (mode == TImode)
+    return true;
+  return false;
+}
+
+#define PASS_IN_REG_P(MODE, TYPE)				\
+  ((GET_MODE_CLASS (MODE) == MODE_INT				\
+    || GET_MODE_CLASS (MODE) == MODE_FLOAT			\
+    || ((GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT		\
+	 || GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT)	\
+	&& !AGGREGATE_TYPE_P (TYPE)))				\
+   && (MODE) != TImode)
+
+#define RETURN_IN_REG_P(MODE)			\
+  ((GET_MODE_CLASS (MODE) == MODE_INT		\
+    || GET_MODE_CLASS (MODE) == MODE_FLOAT)	\
+   && GET_MODE_SIZE (MODE) <= 8)
+\f
+/* Perform a mode promotion for a function argument with MODE.  Return
+   the promoted mode.  */
+
+static machine_mode
+arg_promotion (machine_mode mode)
+{
+  if (mode == QImode || mode == HImode)
+    return SImode;
+  return mode;
+}
+
+/* Write the declaration of a function arg of TYPE to S.  I is the index
+   of the argument, MODE its mode.  NO_ARG_TYPES is true if this is for
+   a decl with zero TYPE_ARG_TYPES, i.e. an old-style C decl.  */
+
+static int
+write_one_arg (std::stringstream &s, tree type, int i, machine_mode mode,
+	       bool no_arg_types)
+{
+  if (!PASS_IN_REG_P (mode, type))
+    mode = Pmode;
+
+  int count = maybe_split_mode (&mode);
+
+  if (count == 2)
+    {
+      write_one_arg (s, NULL_TREE, i, mode, false);
+      write_one_arg (s, NULL_TREE, i + 1, mode, false);
+      return i + 1;
+    }
+
+  if (no_arg_types && !AGGREGATE_TYPE_P (type))
+    {
+      if (mode == SFmode)
+	mode = DFmode;
+      mode = arg_promotion (mode);
+    }
+
+  if (i > 0)
+    s << ", ";
+  s << ".param" << nvptx_ptx_type_from_mode (mode, false) << " %in_ar"
+    << (i + 1) << (mode == QImode || mode == HImode ? "[1]" : "");
+  if (mode == BLKmode)
+    s << "[" << int_size_in_bytes (type) << "]";
+  return i;
+}
+
+/* Look for attributes in ATTRS that would indicate we must write a function
+   as a .entry kernel rather than a .func.  Return true if one is found.  */
+
+static bool
+write_as_kernel (tree attrs)
+{
+  return (lookup_attribute ("kernel", attrs) != NULL_TREE
+	  || lookup_attribute ("omp target entrypoint", attrs) != NULL_TREE);
+}
+
+/* Write a function decl for DECL to S, where NAME is the name to be used.  */
+
+static void
+nvptx_write_function_decl (std::stringstream &s, const char *name, const_tree decl)
+{
+  tree fntype = TREE_TYPE (decl);
+  tree result_type = TREE_TYPE (fntype);
+  tree args = TYPE_ARG_TYPES (fntype);
+  tree attrs = DECL_ATTRIBUTES (decl);
+  bool kernel = write_as_kernel (attrs);
+  bool is_main = strcmp (name, "main") == 0;
+  bool args_from_decl = false;
+
+  /* We get:
+     NULL in TYPE_ARG_TYPES, for old-style functions
+     NULL in DECL_ARGUMENTS, for builtin functions without another
+       declaration.
+     So we have to pick the best one we have.  */
+  if (args == 0)
+    {
+      args = DECL_ARGUMENTS (decl);
+      args_from_decl = true;
+    }
+
+  if (DECL_EXTERNAL (decl))
+    s << ".extern ";
+  else if (TREE_PUBLIC (decl))
+    s << ".visible ";
+
+  if (kernel)
+    s << ".entry ";
+  else
+    s << ".func ";
+
+  /* Declare the result.  */
+  bool return_in_mem = false;
+  if (TYPE_MODE (result_type) != VOIDmode)
+    {
+      machine_mode mode = TYPE_MODE (result_type);
+      if (!RETURN_IN_REG_P (mode))
+	return_in_mem = true;
+      else
+	{
+	  mode = arg_promotion (mode);
+	  s << "(.param" << nvptx_ptx_type_from_mode (mode, false)
+	    << " %out_retval)";
+	}
+    }
+
+  if (name[0] == '*')
+    s << (name + 1);
+  else
+    s << name;
+
+  /* Declare argument types.  */
+  if ((args != NULL_TREE
+       && !(TREE_CODE (args) == TREE_LIST && TREE_VALUE (args) == void_type_node))
+      || is_main
+      || return_in_mem
+      || DECL_STATIC_CHAIN (decl))
+    {
+      s << "(";
+      int i = 0;
+      bool any_args = false;
+      if (return_in_mem)
+	{
+	  s << ".param.u" << GET_MODE_BITSIZE (Pmode) << " %in_ar1";
+	  i++;
+	}
+      while (args != NULL_TREE)
+	{
+	  tree type = args_from_decl ? TREE_TYPE (args) : TREE_VALUE (args);
+	  machine_mode mode = TYPE_MODE (type);
+
+	  if (mode != VOIDmode)
+	    {
+	      i = write_one_arg (s, type, i, mode,
+				 TYPE_ARG_TYPES (fntype) == 0);
+	      any_args = true;
+	      i++;
+	    }
+	  args = TREE_CHAIN (args);
+	}
+      if (stdarg_p (fntype))
+	{
+	  gcc_assert (i > 0);
+	  s << ", .param.u" << GET_MODE_BITSIZE (Pmode) << " %in_argp";
+	}
+      if (DECL_STATIC_CHAIN (decl))
+	{
+	  if (i > 0)
+	    s << ", ";
+	  s << ".reg.u" << GET_MODE_BITSIZE (Pmode)
+	    << reg_names [STATIC_CHAIN_REGNUM];
+	}
+      if (!any_args && is_main)
+	s << ".param.u32 %argc, .param.u" << GET_MODE_BITSIZE (Pmode)
+	  << " %argv";
+      s << ")";
+    }
+}
+
+/* Walk either ARGTYPES or ARGS if the former is null, and write out part of
+   the function header to FILE.  If WRITE_COPY is false, write reg
+   declarations, otherwise write the copy from the incoming argument to that
+   reg.  RETURN_IN_MEM indicates whether to start counting arg numbers at 1
+   instead of 0.  */
+
+static void
+walk_args_for_param (FILE *file, tree argtypes, tree args, bool write_copy,
+		     bool return_in_mem)
+{
+  int i;
+
+  bool args_from_decl = false;
+  if (argtypes == 0)
+    args_from_decl = true;
+  else
+    args = argtypes;
+
+  for (i = return_in_mem ? 1 : 0; args != NULL_TREE; args = TREE_CHAIN (args))
+    {
+      tree type = args_from_decl ? TREE_TYPE (args) : TREE_VALUE (args);
+      machine_mode mode = TYPE_MODE (type);
+
+      if (mode == VOIDmode)
+	break;
+
+      if (!PASS_IN_REG_P (mode, type))
+	mode = Pmode;
+
+      int count = maybe_split_mode (&mode);
+      if (count == 1)
+	{
+	  if (argtypes == NULL && !AGGREGATE_TYPE_P (type))
+	    {
+	      if (mode == SFmode)
+		mode = DFmode;
+
+	    }
+	  mode = arg_promotion (mode);
+	}
+      while (count-- > 0)
+	{
+	  i++;
+	  if (write_copy)
+	    fprintf (file, "\tld.param%s %%ar%d, [%%in_ar%d];\n",
+		     nvptx_ptx_type_from_mode (mode, false), i, i);
+	  else
+	    fprintf (file, "\t.reg%s %%ar%d;\n",
+		     nvptx_ptx_type_from_mode (mode, false), i);
+	}
+    }
+}
+
+/* Write a .func or .entry declaration (not a definition) along with
+   a helper comment for use by ld.  S is the stream to write to, DECL
+   the decl for the function with name NAME.  */
+
+static void
+write_function_decl_and_comment (std::stringstream &s, const char *name, const_tree decl)
+{
+  s << "// BEGIN";
+  if (TREE_PUBLIC (decl))
+    s << " GLOBAL";
+  s << " FUNCTION DECL: ";
+  if (name[0] == '*')
+    s << (name + 1);
+  else
+    s << name;
+  s << "\n";
+  nvptx_write_function_decl (s, name, decl);
+  s << ";\n";
+}
+
+/* Check NAME for special function names and redirect them by returning a
+   replacement.  This applies to malloc, free and realloc, for which we
+   want to use libgcc wrappers, and call, which triggers a bug in ptxas.  */
+
+static const char *
+nvptx_name_replacement (const char *name)
+{
+  if (strcmp (name, "call") == 0)
+    return "__nvptx_call";
+  if (strcmp (name, "malloc") == 0)
+    return "__nvptx_malloc";
+  if (strcmp (name, "free") == 0)
+    return "__nvptx_free";
+  if (strcmp (name, "realloc") == 0)
+    return "__nvptx_realloc";
+  return name;
+}
+
+/* If DECL is a FUNCTION_DECL, check the hash table to see if we
+   already encountered it, and if not, insert it and write a ptx
+   declaration that will be output at the end of compilation.  */
+
+static bool
+nvptx_record_fndecl (tree decl, bool force = false)
+{
+  if (decl == NULL_TREE || TREE_CODE (decl) != FUNCTION_DECL
+      || !DECL_EXTERNAL (decl))
+    return true;
+
+  if (!force && TYPE_ARG_TYPES (TREE_TYPE (decl)) == NULL_TREE)
+    return false;
+
+  void **slot = htab_find_slot (declared_fndecls_htab, decl, INSERT);
+  if (*slot == NULL)
+    {
+      *slot = decl;
+      const char *name = get_fnname_from_decl (decl);
+      name = nvptx_name_replacement (name);
+      write_function_decl_and_comment (func_decls, name, decl);
+    }
+  return true;
+}
+
+/* Record that we need to emit a ptx decl for DECL.  Either do it now, or
+   record it for later in case we have no argument information at this
+   point.  */
+
+void
+nvptx_record_needed_fndecl (tree decl)
+{
+  if (nvptx_record_fndecl (decl))
+    return;
+
+  void **slot = htab_find_slot (needed_fndecls_htab, decl, INSERT);
+  if (*slot == NULL)
+    *slot = decl;
+}
+
+/* Implement ASM_DECLARE_FUNCTION_NAME.  Writes the start of a ptx
+   function, including local var decls and copies from the arguments to
+   local regs.  */
+
+void
+nvptx_declare_function_name (FILE *file, const char *name, const_tree decl)
+{
+  tree fntype = TREE_TYPE (decl);
+  tree result_type = TREE_TYPE (fntype);
+
+  name = nvptx_name_replacement (name);
+
+  std::stringstream s;
+  write_function_decl_and_comment (s, name, decl);
+  s << "// BEGIN";
+  if (TREE_PUBLIC (decl))
+    s << " GLOBAL";
+  s << " FUNCTION DEF: ";
+
+  if (name[0] == '*')
+    s << (name + 1);
+  else
+    s << name;
+  s << "\n";
+
+  nvptx_write_function_decl (s, name, decl);
+  fprintf (file, "%s", s.str().c_str());
+
+  bool return_in_mem = false;
+  if (TYPE_MODE (result_type) != VOIDmode)
+    {
+      machine_mode mode = TYPE_MODE (result_type);
+      if (!RETURN_IN_REG_P (mode))
+	return_in_mem = true;
+    }
+
+  fprintf (file, "\n{\n");
+
+  /* Ensure all arguments that should live in a register have one
+     declared.  We'll emit the copies below.  */
+  walk_args_for_param (file, TYPE_ARG_TYPES (fntype), DECL_ARGUMENTS (decl),
+		       false, return_in_mem);
+  if (return_in_mem)
+    fprintf (file, "\t.reg.u%d %%ar1;\n", GET_MODE_BITSIZE (Pmode));
+  else if (TYPE_MODE (result_type) != VOIDmode)
+    {
+      machine_mode mode = arg_promotion (TYPE_MODE (result_type));
+      fprintf (file, ".reg%s %%retval;\n",
+	       nvptx_ptx_type_from_mode (mode, false));
+    }
+
+  if (stdarg_p (fntype))
+    fprintf (file, "\t.reg.u%d %%argp;\n", GET_MODE_BITSIZE (Pmode));
+
+  fprintf (file, "\t.reg.u%d %s;\n", GET_MODE_BITSIZE (Pmode),
+	   reg_names[OUTGOING_STATIC_CHAIN_REGNUM]);
+
+  /* Declare the pseudos we have as ptx registers.  */
+  int maxregs = max_reg_num ();
+  for (int i = LAST_VIRTUAL_REGISTER + 1; i < maxregs; i++)
+    {
+      if (regno_reg_rtx[i] != const0_rtx)
+	{
+	  machine_mode mode = PSEUDO_REGNO_MODE (i);
+	  int count = maybe_split_mode (&mode);
+	  if (count > 1)
+	    {
+	      while (count-- > 0)
+		fprintf (file, "\t.reg%s %%r%d$%d;\n",
+			 nvptx_ptx_type_from_mode (mode, true),
+			 i, count);
+	    }
+	  else
+	    fprintf (file, "\t.reg%s %%r%d;\n",
+		     nvptx_ptx_type_from_mode (mode, true),
+		     i);
+	}
+    }
+
+  /* The only reason we might be using outgoing args is if we call a stdargs
+     function.  Allocate the space for this.  If we called varargs functions
+     without passing any variadic arguments, we'll see a reference to outargs
+     even with a zero outgoing_args_size.  */
+  HOST_WIDE_INT sz = crtl->outgoing_args_size;
+  if (sz == 0)
+    sz = 1;
+  if (cfun->machine->has_call_with_varargs)
+    fprintf (file, "\t.reg.u%d %%outargs;\n"
+	     "\t.local.align 8 .b8 %%outargs_ar[" HOST_WIDE_INT_PRINT_DEC "];\n",
+	     BITS_PER_WORD, sz);
+  if (cfun->machine->punning_buffer_size > 0)
+    fprintf (file, "\t.reg.u%d %%punbuffer;\n"
+	     "\t.local.align 8 .b8 %%punbuffer_ar[%d];\n",
+	     BITS_PER_WORD, cfun->machine->punning_buffer_size);
+
+  /* Declare a local variable for the frame.  */
+  sz = get_frame_size ();
+  if (sz > 0 || cfun->machine->has_call_with_sc)
+    {
+      fprintf (file, "\t.reg.u%d %%frame;\n"
+	       "\t.local.align 8 .b8 %%farray[" HOST_WIDE_INT_PRINT_DEC "];\n",
+	       BITS_PER_WORD, sz == 0 ? 1 : sz);
+      fprintf (file, "\tcvta.local.u%d %%frame, %%farray;\n",
+	       BITS_PER_WORD);
+    }
+
+  if (cfun->machine->has_call_with_varargs)
+    fprintf (file, "\tcvta.local.u%d %%outargs, %%outargs_ar;\n",
+	     BITS_PER_WORD);
+  if (cfun->machine->punning_buffer_size > 0)
+    fprintf (file, "\tcvta.local.u%d %%punbuffer, %%punbuffer_ar;\n",
+	     BITS_PER_WORD);
+
+  /* Now emit any copies necessary for arguments.  */
+  walk_args_for_param (file, TYPE_ARG_TYPES (fntype), DECL_ARGUMENTS (decl),
+		       true, return_in_mem);
+  if (return_in_mem)
+    fprintf (file, "\tld.param.u%d %%ar1, [%%in_ar1];\n",
+	     GET_MODE_BITSIZE (Pmode));
+  if (stdarg_p (fntype))
+    fprintf (file, "\tld.param.u%d %%argp, [%%in_argp];\n",
+	     GET_MODE_BITSIZE (Pmode));
+}
+
+/* Output a return instruction.  Also copy the return value to its outgoing
+   location.  */
+
+const char *
+nvptx_output_return (void)
+{
+  tree fntype = TREE_TYPE (current_function_decl);
+  tree result_type = TREE_TYPE (fntype);
+  if (TYPE_MODE (result_type) != VOIDmode)
+    {
+      machine_mode mode = TYPE_MODE (result_type);
+      if (RETURN_IN_REG_P (mode))
+	{
+	  mode = arg_promotion (mode);
+	  fprintf (asm_out_file, "\tst.param%s\t[%%out_retval], %%retval;\n",
+		   nvptx_ptx_type_from_mode (mode, false));
+	}
+    }
+
+  return "ret;";
+}
+
+/* Construct a function declaration from a call insn.  This can be
+   necessary for two reasons - either we have an indirect call which
+   requires a .callprototype declaration, or we have a libcall
+   generated by emit_library_call for which no decl exists.  */
+
+static void
+write_func_decl_from_insn (std::stringstream &s, rtx result, rtx pat,
+			   rtx callee)
+{
+  bool callprototype = register_operand (callee, Pmode);
+  const char *name = "_";
+  if (!callprototype)
+    {
+      name = XSTR (callee, 0);
+      name = nvptx_name_replacement (name);
+      s << "// BEGIN GLOBAL FUNCTION DECL: " << name << "\n";
+    }
+  s << (callprototype ? "\t.callprototype\t" : "\t.extern .func ");
+
+  if (result != NULL_RTX)
+    {
+      s << "(.param";
+      s << nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)),
+				     false);
+      s << " ";
+      if (callprototype)
+	s << "_";
+      else
+	s << "%out_retval";
+      s << ")";
+    }
+
+  s << name;
+
+  int nargs = XVECLEN (pat, 0) - 1;
+  if (nargs > 0)
+    {
+      s << " (";
+      for (int i = 0; i < nargs; i++)
+	{
+	  rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+	  machine_mode mode = GET_MODE (t);
+	  int count = maybe_split_mode (&mode);
+
+	  while (count-- > 0)
+	    {
+	      s << ".param";
+	      s << nvptx_ptx_type_from_mode (mode, false);
+	      s << " ";
+	      if (callprototype)
+		s << "_";
+	      else
+		s << "%arg" << i;
+	      if (mode == QImode || mode == HImode)
+		s << "[1]";
+	      if (i + 1 < nargs || count > 0)
+		s << ", ";
+	    }
+	}
+      s << ")";
+    }
+  s << ";\n";
+}
+
+/* Terminate a function by writing a closing brace to FILE.  */
+
+void
+nvptx_function_end (FILE *file)
+{
+  fprintf (file, "\t}\n");
+}
+\f
+/* Decide whether we can make a sibling call to a function.  For ptx, we
+   can't.  */
+
+static bool
+nvptx_function_ok_for_sibcall (tree, tree)
+{
+  return false;
+}
+
+/* Implement the TARGET_CALL_ARGS hook.  Record information about one
+   argument to the next call.  */
+
+static void
+nvptx_call_args (rtx arg, tree funtype)
+{
+  if (cfun->machine->start_call == NULL_RTX)
+    {
+      cfun->machine->call_args = NULL;
+      cfun->machine->funtype = funtype;
+      cfun->machine->start_call = const0_rtx;
+    }
+  if (arg == pc_rtx)
+    return;
+
+  rtx_expr_list *args_so_far = cfun->machine->call_args;
+  if (REG_P (arg))
+    cfun->machine->call_args = alloc_EXPR_LIST (VOIDmode, arg, args_so_far);
+}
+
+/* Implement the corresponding TARGET_END_CALL_ARGS hook.  Clear and free the
+   information we recorded.  */
+
+static void
+nvptx_end_call_args (void)
+{
+  cfun->machine->start_call = NULL_RTX;
+  free_EXPR_LIST_list (&cfun->machine->call_args);
+}
+
+/* Emit the sequence for a call.  */
+
+void
+nvptx_expand_call (rtx retval, rtx address)
+{
+  int nargs;
+  rtx callee = XEXP (address, 0);
+  rtx pat, t;
+  rtvec vec;
+  bool external_decl = false;
+
+  nargs = 0;
+  for (t = cfun->machine->call_args; t; t = XEXP (t, 1))
+    nargs++;
+
+  bool has_varargs = false;
+  tree decl_type = NULL_TREE;
+
+  if (!call_insn_operand (callee, Pmode))
+    {
+      callee = force_reg (Pmode, callee);
+      address = change_address (address, QImode, callee);
+    }
+
+  if (GET_CODE (callee) == SYMBOL_REF)
+    {
+      tree decl = SYMBOL_REF_DECL (callee);
+      if (decl != NULL_TREE)
+	{
+	  decl_type = TREE_TYPE (decl);
+	  if (DECL_STATIC_CHAIN (decl))
+	    cfun->machine->has_call_with_sc = true;
+	  if (DECL_EXTERNAL (decl))
+	    external_decl = true;
+	}
+    }
+  if (cfun->machine->funtype
+      /* It's possible to construct testcases where we call a variable.
+	 See compile/20020129-1.c.  stdarg_p will crash so avoid calling it
+	 in such a case.  */
+      && (TREE_CODE (cfun->machine->funtype) == FUNCTION_TYPE
+	  || TREE_CODE (cfun->machine->funtype) == METHOD_TYPE)
+      && stdarg_p (cfun->machine->funtype))
+    {
+      has_varargs = true;
+      cfun->machine->has_call_with_varargs = true;
+    }
+  vec = rtvec_alloc (nargs + 1 + (has_varargs ? 1 : 0));
+  pat = gen_rtx_PARALLEL (VOIDmode, vec);
+  if (has_varargs)
+    {
+      rtx this_arg = gen_reg_rtx (Pmode);
+      if (Pmode == DImode)
+	emit_move_insn (this_arg, stack_pointer_rtx);
+      else
+	emit_move_insn (this_arg, stack_pointer_rtx);
+      XVECEXP (pat, 0, nargs + 1) = gen_rtx_USE (VOIDmode, this_arg);
+    }
+
+  int i;
+  rtx arg;
+  for (i = 1, arg = cfun->machine->call_args; arg; arg = XEXP (arg, 1), i++)
+    {
+      rtx this_arg = XEXP (arg, 0);
+      XVECEXP (pat, 0, i) = gen_rtx_USE (VOIDmode, this_arg);
+    }
+
+  rtx tmp_retval = retval;
+  t = gen_rtx_CALL (VOIDmode, address, const0_rtx);
+  if (retval != NULL_RTX)
+    {
+      if (!nvptx_register_operand (retval, GET_MODE (retval)))
+	tmp_retval = gen_reg_rtx (GET_MODE (retval));
+      t = gen_rtx_SET (VOIDmode, tmp_retval, t);
+    }
+  XVECEXP (pat, 0, 0) = t;
+  if (!REG_P (callee)
+      && (decl_type == NULL_TREE
+	  || (external_decl && TYPE_ARG_TYPES (decl_type) == NULL_TREE)))
+    {
+      void **slot = htab_find_slot (declared_libfuncs_htab, callee, INSERT);
+      if (*slot == NULL)
+	{
+	  *slot = callee;
+	  write_func_decl_from_insn (func_decls, retval, pat, callee);
+	}
+    }
+  emit_call_insn (pat);
+  if (tmp_retval != retval)
+    emit_move_insn (retval, tmp_retval);
+}
+
+/* Implement TARGET_FUNCTION_ARG.  */
+
+static rtx
+nvptx_function_arg (cumulative_args_t, machine_mode mode,
+		    const_tree, bool named)
+{
+  if (mode == VOIDmode)
+    return NULL_RTX;
+
+  if (named)
+    return gen_reg_rtx (mode);
+  return NULL_RTX;
+}
+
+/* Implement TARGET_FUNCTION_INCOMING_ARG.  */
+
+static rtx
+nvptx_function_incoming_arg (cumulative_args_t cum_v, machine_mode mode,
+			     const_tree, bool named)
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  if (mode == VOIDmode)
+    return NULL_RTX;
+
+  if (!named)
+    return NULL_RTX;
+
+  /* No need to deal with split modes here; the only case that can
+     happen is complex modes and those are dealt with by
+     TARGET_SPLIT_COMPLEX_ARG.  */
+  return gen_rtx_UNSPEC (mode,
+			 gen_rtvec (1, GEN_INT (1 + cum->count)),
+			 UNSPEC_ARG_REG);
+}
+
+/* Implement TARGET_FUNCTION_ARG_ADVANCE.  */
+
+static void
+nvptx_function_arg_advance (cumulative_args_t cum_v, machine_mode mode,
+			    const_tree type ATTRIBUTE_UNUSED,
+			    bool named ATTRIBUTE_UNUSED)
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  if (mode == TImode)
+    cum->count += 2;
+  else
+    cum->count++;
+}
+
+/* Handle the TARGET_STRICT_ARGUMENT_NAMING target hook.
+
+   For nvptx, we know how to handle functions declared as stdarg: by
+   passing an extra pointer to the unnamed arguments.  However, the
+   Fortran frontend can produce a different situation, where a
+   function pointer is declared with no arguments, but the actual
+   function and calls to it take more arguments.  In that case, we
+   want to ensure the call matches the definition of the function.  */
+
+static bool
+nvptx_strict_argument_naming (cumulative_args_t cum_v)
+{
+  CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
+  return cum->fntype == NULL_TREE || stdarg_p (cum->fntype);
+}
+
+/* Implement TARGET_FUNCTION_ARG_BOUNDARY.  */
+
+static unsigned int
+nvptx_function_arg_boundary (machine_mode mode, const_tree type)
+{
+  unsigned int boundary = type ? TYPE_ALIGN (type) : GET_MODE_BITSIZE (mode);
+
+  if (boundary > BITS_PER_WORD)
+    return 2 * BITS_PER_WORD;
+
+  if (mode == BLKmode)
+    {
+      HOST_WIDE_INT size = int_size_in_bytes (type);
+      if (size > 4)
+	return 2 * BITS_PER_WORD;
+      if (boundary < BITS_PER_WORD)
+	{
+	  if (size >= 3)
+	    return BITS_PER_WORD;
+	  if (size >= 2)
+	    return 2 * BITS_PER_UNIT;
+	}
+    }
+  return boundary;
+}
+
+/* TARGET_FUNCTION_VALUE implementation.  Returns an RTX representing the place
+   where function FUNC returns or receives a value of data type TYPE.  */
+
+static rtx
+nvptx_function_value (const_tree type, const_tree func ATTRIBUTE_UNUSED,
+		      bool outgoing)
+{
+  int unsignedp = TYPE_UNSIGNED (type);
+  machine_mode orig_mode = TYPE_MODE (type);
+  machine_mode mode = promote_function_mode (type, orig_mode,
+					     &unsignedp, NULL_TREE, 1);
+  if (outgoing)
+    return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
+  if (cfun->machine->start_call == NULL_RTX)
+    /* Pretend to return in a hard reg for early uses before pseudos can be
+       generated.  */
+    return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
+  return gen_reg_rtx (mode);
+}
+
+/* Implement TARGET_LIBCALL_VALUE.  */
+
+static rtx
+nvptx_libcall_value (machine_mode mode, const_rtx)
+{
+  if (cfun->machine->start_call == NULL_RTX)
+    /* Pretend to return in a hard reg for early uses before pseudos can be
+       generated.  */
+    return gen_rtx_REG (mode, NVPTX_RETURN_REGNUM);
+  return gen_reg_rtx (mode);
+}
+
+/* Implement TARGET_FUNCTION_VALUE_REGNO_P.  */
+
+static bool
+nvptx_function_value_regno_p (const unsigned int regno)
+{
+  return regno == NVPTX_RETURN_REGNUM;
+}
+
+/* Types with a mode other than those supported by the machine are passed by
+   reference in memory.  */
+
+static bool
+nvptx_pass_by_reference (cumulative_args_t, machine_mode mode,
+			 const_tree type, bool)
+{
+  return !PASS_IN_REG_P (mode, type);
+}
+
+/* Implement TARGET_RETURN_IN_MEMORY.  */
+
+static bool
+nvptx_return_in_memory (const_tree type, const_tree)
+{
+  machine_mode mode = TYPE_MODE (type);
+  if (!RETURN_IN_REG_P (mode))
+    return true;
+  return false;
+}
+
+/* Implement TARGET_PROMOTE_FUNCTION_MODE.  */
+
+static machine_mode
+nvptx_promote_function_mode (const_tree type, machine_mode mode,
+			     int *punsignedp,
+			     const_tree funtype, int for_return)
+{
+  if (type == NULL_TREE)
+    return mode;
+  if (for_return)
+    return promote_mode (type, mode, punsignedp);
+  /* For K&R-style functions, try to match the language promotion rules to
+     minimize type mismatches at assembly time.  */
+  if (TYPE_ARG_TYPES (funtype) == NULL_TREE
+      && type != NULL_TREE
+      && !AGGREGATE_TYPE_P (type))
+    {
+      if (mode == SFmode)
+	mode = DFmode;
+      mode = arg_promotion (mode);
+    }
+
+  return mode;
+}
+
+/* Implement TARGET_STATIC_CHAIN.  */
+
+static rtx
+nvptx_static_chain (const_tree fndecl, bool incoming_p)
+{
+  if (!DECL_STATIC_CHAIN (fndecl))
+    return NULL;
+
+  if (incoming_p)
+    return gen_rtx_REG (Pmode, STATIC_CHAIN_REGNUM);
+  else
+    return gen_rtx_REG (Pmode, OUTGOING_STATIC_CHAIN_REGNUM);
+}
+\f
+/* Emit a comparison COMPARE, and return the new test to be used in the
+   jump.  */
+
+rtx
+nvptx_expand_compare (rtx compare)
+{
+  rtx pred = gen_reg_rtx (BImode);
+  rtx cmp = gen_rtx_fmt_ee (GET_CODE (compare), BImode,
+			    XEXP (compare, 0), XEXP (compare, 1));
+  emit_insn (gen_rtx_SET (VOIDmode, pred, cmp));
+  return gen_rtx_NE (BImode, pred, const0_rtx);
+}
+
+/* When loading an operand ORIG_OP, verify whether an address space
+   conversion to generic is required, and if so, perform it.  Also
+   check for SYMBOL_REFs for function decls and call
+   nvptx_record_needed_fndecl as needed.
+   Return either the original operand, or the converted one.  */
+
+rtx
+nvptx_maybe_convert_symbolic_operand (rtx orig_op)
+{
+  if (GET_MODE (orig_op) != Pmode)
+    return orig_op;
+
+  rtx op = orig_op;
+  while (GET_CODE (op) == PLUS || GET_CODE (op) == CONST)
+    op = XEXP (op, 0);
+  if (GET_CODE (op) != SYMBOL_REF)
+    return orig_op;
+
+  tree decl = SYMBOL_REF_DECL (op);
+  if (decl && TREE_CODE (decl) == FUNCTION_DECL)
+    {
+      nvptx_record_needed_fndecl (decl);
+      return orig_op;
+    }
+
+  addr_space_t as = nvptx_addr_space_from_address (op);
+  if (as == ADDR_SPACE_GENERIC)
+    return orig_op;
+
+  enum unspec code;
+  code = (as == ADDR_SPACE_GLOBAL ? UNSPEC_FROM_GLOBAL
+	  : as == ADDR_SPACE_LOCAL ? UNSPEC_FROM_LOCAL
+	  : as == ADDR_SPACE_SHARED ? UNSPEC_FROM_SHARED
+	  : as == ADDR_SPACE_CONST ? UNSPEC_FROM_CONST
+	  : UNSPEC_FROM_PARAM);
+  rtx dest = gen_reg_rtx (Pmode);
+  emit_insn (gen_rtx_SET (VOIDmode, dest,
+			  gen_rtx_UNSPEC (Pmode, gen_rtvec (1, orig_op),
+					  code)));
+  return dest;
+}
+\f
+/* Returns true if X is a valid address for use in a memory reference.  */
+
+static bool
+nvptx_legitimate_address_p (machine_mode, rtx x, bool)
+{
+  enum rtx_code code = GET_CODE (x);
+
+  switch (code)
+    {
+    case REG:
+      return true;
+
+    case PLUS:
+      if (REG_P (XEXP (x, 0)) && CONST_INT_P (XEXP (x, 1)))
+	return true;
+      return false;
+
+    case CONST:
+    case SYMBOL_REF:
+    case LABEL_REF:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
+/* Implement HARD_REGNO_MODE_OK.  We barely use hard regs, but we want
+   to ensure that the return register's mode isn't changed.  */
+
+bool
+nvptx_hard_regno_mode_ok (int regno, machine_mode mode)
+{
+  if (regno != NVPTX_RETURN_REGNUM
+      || cfun == NULL || cfun->machine->ret_reg_mode == VOIDmode)
+    return true;
+  return mode == cfun->machine->ret_reg_mode;
+}
+\f
+/* Convert an address space AS to the corresponding ptx string.  */
+
+const char *
+nvptx_section_from_addr_space (addr_space_t as)
+{
+  switch (as)
+    {
+    case ADDR_SPACE_CONST:
+      return ".const";
+
+    case ADDR_SPACE_GLOBAL:
+      return ".global";
+
+    case ADDR_SPACE_SHARED:
+      return ".shared";
+
+    case ADDR_SPACE_GENERIC:
+      return "";
+
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Determine whether DECL goes into .const or .global.  */
+
+const char *
+nvptx_section_for_decl (const_tree decl)
+{
+  bool is_const = (CONSTANT_CLASS_P (decl)
+		   || TREE_CODE (decl) == CONST_DECL
+		   || TREE_READONLY (decl));
+  if (is_const)
+    return ".const";
+
+  return ".global";
+}
+
+/* Look for a SYMBOL_REF in ADDR and return the address space to be used
+   for the insn referencing this address.  */
+
+addr_space_t
+nvptx_addr_space_from_address (rtx addr)
+{
+  while (GET_CODE (addr) == PLUS || GET_CODE (addr) == CONST)
+    addr = XEXP (addr, 0);
+  if (GET_CODE (addr) != SYMBOL_REF)
+    return ADDR_SPACE_GENERIC;
+
+  tree decl = SYMBOL_REF_DECL (addr);
+  if (decl == NULL_TREE || TREE_CODE (decl) == FUNCTION_DECL)
+    return ADDR_SPACE_GENERIC;
+
+  bool is_const = (CONSTANT_CLASS_P (decl)
+		   || TREE_CODE (decl) == CONST_DECL
+		   || TREE_READONLY (decl));
+  if (is_const)
+    return ADDR_SPACE_CONST;
+
+  return ADDR_SPACE_GLOBAL;
+}
+\f
+/* Machinery to output constant initializers.  */
+
+/* Used when assembling integers to ensure data is emitted in
+   pieces whose size matches the declaration we printed.  */
+static unsigned int decl_chunk_size;
+static machine_mode decl_chunk_mode;
+/* Used in the same situation, to keep track of the byte offset
+   into the initializer.  */
+static unsigned HOST_WIDE_INT decl_offset;
+/* The initializer part we are currently processing.  */
+static HOST_WIDE_INT init_part;
+/* The total size of the object.  */
+static unsigned HOST_WIDE_INT object_size;
+/* True if we found a skip extending to the end of the object.  Used to
+   assert that no data follows.  */
+static bool object_finished;
+
+/* Write the necessary separator string to begin a new initializer value.  */
+
+static void
+begin_decl_field (void)
+{
+  /* We never see decl_offset at zero by the time we get here.  */
+  if (decl_offset == decl_chunk_size)
+    fprintf (asm_out_file, " = { ");
+  else
+    fprintf (asm_out_file, ", ");
+}
+
+/* Output the currently stored chunk as an initializer value.  */
+
+static void
+output_decl_chunk (void)
+{
+  begin_decl_field ();
+  output_address (gen_int_mode (init_part, decl_chunk_mode));
+  init_part = 0;
+}
+
+/* Add value VAL sized SIZE to the data we're emitting, and keep writing
+   out chunks as they fill up.  */
+
+static void
+nvptx_assemble_value (HOST_WIDE_INT val, unsigned int size)
+{
+  unsigned HOST_WIDE_INT chunk_offset = decl_offset % decl_chunk_size;
+  gcc_assert (!object_finished);
+  while (size > 0)
+    {
+      int this_part = size;
+      if (chunk_offset + this_part > decl_chunk_size)
+	this_part = decl_chunk_size - chunk_offset;
+      HOST_WIDE_INT val_part;
+      HOST_WIDE_INT mask = 2;
+      mask <<= this_part * BITS_PER_UNIT - 1;
+      val_part = val & (mask - 1);
+      init_part |= val_part << (BITS_PER_UNIT * chunk_offset);
+      val >>= BITS_PER_UNIT * this_part;
+      size -= this_part;
+      decl_offset += this_part;
+      if (decl_offset % decl_chunk_size == 0)
+	output_decl_chunk ();
+
+      chunk_offset = 0;
+    }
+}
+
+/* Target hook for assembling integer object X of size SIZE.  */
+
+static bool
+nvptx_assemble_integer (rtx x, unsigned int size, int ARG_UNUSED (aligned_p))
+{
+  if (GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == CONST)
+    {
+      gcc_assert (size == decl_chunk_size);
+      if (decl_offset % decl_chunk_size != 0)
+	sorry ("cannot emit unaligned pointers in ptx assembly");
+      decl_offset += size;
+      begin_decl_field ();
+
+      HOST_WIDE_INT off = 0;
+      if (GET_CODE (x) == CONST)
+	x = XEXP (x, 0);
+      if (GET_CODE (x) == PLUS)
+	{
+	  off = INTVAL (XEXP (x, 1));
+	  x = XEXP (x, 0);
+	}
+      if (GET_CODE (x) == SYMBOL_REF)
+	{
+	  nvptx_record_needed_fndecl (SYMBOL_REF_DECL (x));
+	  fprintf (asm_out_file, "generic(");
+	  output_address (x);
+	  fprintf (asm_out_file, ")");
+	}
+      if (off != 0)
+	fprintf (asm_out_file, " + " HOST_WIDE_INT_PRINT_DEC, off);
+      return true;
+    }
+
+  HOST_WIDE_INT val;
+  switch (GET_CODE (x))
+    {
+    case CONST_INT:
+      val = INTVAL (x);
+      break;
+    case CONST_DOUBLE:
+      gcc_unreachable ();
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  nvptx_assemble_value (val, size);
+  return true;
+}
+
+/* Output SIZE zero bytes.  We ignore the FILE argument since the
+   functions we're calling to perform the output just use
+   asm_out_file.  */
+
+void
+nvptx_output_skip (FILE *, unsigned HOST_WIDE_INT size)
+{
+  if (decl_offset + size >= object_size)
+    {
+      if (decl_offset % decl_chunk_size != 0)
+	nvptx_assemble_value (0, decl_chunk_size);
+      object_finished = true;
+      return;
+    }
+
+  while (size > decl_chunk_size)
+    {
+      nvptx_assemble_value (0, decl_chunk_size);
+      size -= decl_chunk_size;
+    }
+  while (size-- > 0)
+    nvptx_assemble_value (0, 1);
+}
+
+/* Output a string STR with length SIZE.  As in nvptx_output_skip we
+   ignore the FILE arg.  */
+
+void
+nvptx_output_ascii (FILE *, const char *str, unsigned HOST_WIDE_INT size)
+{
+  for (unsigned HOST_WIDE_INT i = 0; i < size; i++)
+    nvptx_assemble_value (str[i], 1);
+}
+
+/* Called when the initializer for a decl has been completely output through
+   combinations of the three functions above.  */
+
+static void
+nvptx_assemble_decl_end (void)
+{
+  if (decl_offset != 0)
+    {
+      if (!object_finished && decl_offset % decl_chunk_size != 0)
+	nvptx_assemble_value (0, decl_chunk_size);
+
+      fprintf (asm_out_file, " }");
+    }
+  fprintf (asm_out_file, ";\n");
+}
+
+/* Start a declaration of a variable of TYPE with NAME to
+   FILE.  IS_PUBLIC says whether this will be externally visible.
+   Here we just write the linker hint and decide on the chunk size
+   to use.  */
+
+static void
+init_output_initializer (FILE *file, const char *name, const_tree type,
+			 bool is_public)
+{
+  fprintf (file, "// BEGIN%s VAR DEF: ", is_public ? " GLOBAL" : "");
+  assemble_name_raw (file, name);
+  fputc ('\n', file);
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    type = TREE_TYPE (type);
+  int sz = int_size_in_bytes (type);
+  if ((TREE_CODE (type) != INTEGER_TYPE
+       && TREE_CODE (type) != ENUMERAL_TYPE
+       && TREE_CODE (type) != REAL_TYPE)
+      || sz < 0
+      || sz > HOST_BITS_PER_WIDE_INT)
+    type = ptr_type_node;
+  decl_chunk_size = int_size_in_bytes (type);
+  decl_chunk_mode = int_mode_for_mode (TYPE_MODE (type));
+  decl_offset = 0;
+  init_part = 0;
+  object_finished = false;
+}
+
+/* Implement TARGET_ASM_DECLARE_CONSTANT_NAME.  Begin the process of
+   writing a constant variable EXP with NAME and SIZE and its
+   initializer to FILE.  */
+
+static void
+nvptx_asm_declare_constant_name (FILE *file, const char *name,
+				 const_tree exp, HOST_WIDE_INT size)
+{
+  tree type = TREE_TYPE (exp);
+  init_output_initializer (file, name, type, false);
+  fprintf (file, "\t.const .align %d .u%d ",
+	   TYPE_ALIGN (TREE_TYPE (exp)) / BITS_PER_UNIT,
+	   decl_chunk_size * BITS_PER_UNIT);
+  assemble_name (file, name);
+  fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]",
+	   (size + decl_chunk_size - 1) / decl_chunk_size);
+  object_size = size;
+}
+
+/* Implement the ASM_DECLARE_OBJECT_NAME macro.  Used to start writing
+   a variable DECL with NAME to FILE.  */
+
+void
+nvptx_declare_object_name (FILE *file, const char *name, const_tree decl)
+{
+  if (decl && DECL_SIZE (decl))
+    {
+      tree type = TREE_TYPE (decl);
+      unsigned HOST_WIDE_INT size;
+
+      init_output_initializer (file, name, type, TREE_PUBLIC (decl));
+      size = tree_to_uhwi (DECL_SIZE_UNIT (decl));
+      const char *section = nvptx_section_for_decl (decl);
+      fprintf (file, "\t%s%s .align %d .u%d ",
+	       TREE_PUBLIC (decl) ? " .visible" : "", section,
+	       DECL_ALIGN (decl) / BITS_PER_UNIT,
+	       decl_chunk_size * BITS_PER_UNIT);
+      assemble_name (file, name);
+      if (size > 0)
+	fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]",
+		 (size + decl_chunk_size - 1) / decl_chunk_size);
+      else
+	object_finished = true;
+      object_size = size;
+    }
+}
+
+/* Implement TARGET_ASM_GLOBALIZE_LABEL by doing nothing.  */
+
+static void
+nvptx_globalize_label (FILE *, const char *)
+{
+}
+
+/* Implement TARGET_ASM_ASSEMBLE_UNDEFINED_DECL.  Write an extern
+   declaration only for variable DECL with NAME to FILE.  */
+static void
+nvptx_assemble_undefined_decl (FILE *file, const char *name, const_tree decl)
+{
+  if (TREE_CODE (decl) != VAR_DECL)
+    return;
+  const char *section = nvptx_section_for_decl (decl);
+  fprintf (file, "// BEGIN%s VAR DECL: ", TREE_PUBLIC (decl) ? " GLOBAL" : "");
+  assemble_name_raw (file, name);
+  fputs ("\n", file);
+  HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (decl));
+  fprintf (file, ".extern %s .b8 ", section);
+  assemble_name_raw (file, name);
+  if (size > 0)
+    fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC "]", size);
+  fprintf (file, ";\n\n");
+}
+
+/* Output INSN, which is a call to CALLEE with result RESULT.  For ptx, this
+   involves writing .param declarations and in/out copies into them.  */
+
+const char *
+nvptx_output_call_insn (rtx_insn *insn, rtx result, rtx callee)
+{
+  char buf[256];
+  static int labelno;
+  bool needs_tgt = register_operand (callee, Pmode);
+  rtx pat = PATTERN (insn);
+  int nargs = XVECLEN (pat, 0) - 1;
+  tree decl = NULL_TREE;
+
+  fprintf (asm_out_file, "\t{\n");
+  if (result != NULL)
+    {
+      fprintf (asm_out_file, "\t\t.param%s %%retval_in;\n",
+	       nvptx_ptx_type_from_mode (arg_promotion (GET_MODE (result)),
+					 false));
+    }
+
+  if (GET_CODE (callee) == SYMBOL_REF)
+    {
+      decl = SYMBOL_REF_DECL (callee);
+      if (decl && DECL_EXTERNAL (decl))
+	nvptx_record_fndecl (decl);
+    }
+
+  if (needs_tgt)
+    {
+      ASM_GENERATE_INTERNAL_LABEL (buf, "LCT", labelno);
+      labelno++;
+      ASM_OUTPUT_LABEL (asm_out_file, buf);
+      std::stringstream s;
+      write_func_decl_from_insn (s, result, pat, callee);
+      fputs (s.str().c_str(), asm_out_file);
+    }
+
+  for (int i = 0, argno = 0; i < nargs; i++)
+    {
+      rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+      machine_mode mode = GET_MODE (t);
+      int count = maybe_split_mode (&mode);
+
+      while (count-- > 0)
+	fprintf (asm_out_file, "\t\t.param%s %%out_arg%d%s;\n",
+		 nvptx_ptx_type_from_mode (mode, false), argno++,
+		 mode == QImode || mode == HImode ? "[1]" : "");
+    }
+  for (int i = 0, argno = 0; i < nargs; i++)
+    {
+      rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+      gcc_assert (REG_P (t));
+      machine_mode mode = GET_MODE (t);
+      int count = maybe_split_mode (&mode);
+
+      if (count == 1)
+	fprintf (asm_out_file, "\t\tst.param%s [%%out_arg%d], %%r%d;\n",
+		 nvptx_ptx_type_from_mode (mode, false), argno++,
+		 REGNO (t));
+      else
+	{
+	  int n = 0;
+	  while (count-- > 0)
+	    fprintf (asm_out_file, "\t\tst.param%s [%%out_arg%d], %%r%d$%d;\n",
+		     nvptx_ptx_type_from_mode (mode, false), argno++,
+		     REGNO (t), n++);
+	}
+    }
+
+  fprintf (asm_out_file, "\t\tcall ");
+  if (result != NULL_RTX)
+    fprintf (asm_out_file, "(%%retval_in), ");
+
+  if (decl)
+    {
+      const char *name = get_fnname_from_decl (decl);
+      name = nvptx_name_replacement (name);
+      assemble_name (asm_out_file, name);
+    }
+  else
+    output_address (callee);
+
+  if (nargs > 0 || (decl && DECL_STATIC_CHAIN (decl)))
+    {
+      fprintf (asm_out_file, ", (");
+      int i, argno;
+      for (i = 0, argno = 0; i < nargs; i++)
+	{
+	  rtx t = XEXP (XVECEXP (pat, 0, i + 1), 0);
+	  machine_mode mode = GET_MODE (t);
+	  int count = maybe_split_mode (&mode);
+
+	  while (count-- > 0)
+	    {
+	      fprintf (asm_out_file, "%%out_arg%d", argno++);
+	      if (i + 1 < nargs || count > 0)
+		fprintf (asm_out_file, ", ");
+	    }
+	}
+      if (decl && DECL_STATIC_CHAIN (decl))
+	{
+	  if (i > 0)
+	    fprintf (asm_out_file, ", ");
+	  fprintf (asm_out_file, "%s",
+		   reg_names [OUTGOING_STATIC_CHAIN_REGNUM]);
+	}
+
+      fprintf (asm_out_file, ")");
+    }
+  if (needs_tgt)
+    {
+      fprintf (asm_out_file, ", ");
+      assemble_name (asm_out_file, buf);
+    }
+  fprintf (asm_out_file, ";\n");
+  if (result != NULL_RTX)
+    return "ld.param%t0\t%0, [%%retval_in];\n\t}";
+
+  return "}";
+}
+
+/* Implement TARGET_PRINT_OPERAND_PUNCT_VALID_P.  */
+
+static bool
+nvptx_print_operand_punct_valid_p (unsigned char c)
+{
+  return c == '.' || c == '#';
+}
+
+static void nvptx_print_operand (FILE *, rtx, int);
+
+/* Subroutine of nvptx_print_operand; used to print a memory reference X to FILE.  */
+
+static void
+nvptx_print_address_operand (FILE *file, rtx x, machine_mode)
+{
+  rtx off;
+  if (GET_CODE (x) == CONST)
+    x = XEXP (x, 0);
+  switch (GET_CODE (x))
+    {
+    case PLUS:
+      off = XEXP (x, 1);
+      output_address (XEXP (x, 0));
+      fprintf (file, "+");
+      output_address (off);
+      break;
+
+    case SYMBOL_REF:
+    case LABEL_REF:
+      output_addr_const (file, x);
+      break;
+
+    default:
+      gcc_assert (GET_CODE (x) != MEM);
+      nvptx_print_operand (file, x, 0);
+      break;
+    }
+}
+
+/* Write assembly language output for the address ADDR to FILE.  */
+
+static void
+nvptx_print_operand_address (FILE *file, rtx addr)
+{
+  nvptx_print_address_operand (file, addr, VOIDmode);
+}
+
+/* Print an operand, X, to FILE, with an optional modifier in CODE.
+
+   Meaning of CODE:
+   . -- print the predicate for the instruction or an empty string for an
+        unconditional one.
+   # -- print a rounding mode for the instruction
+
+   A -- print an address space identifier for a MEM
+   c -- print an opcode suffix for a comparison operator, including a type code
+   d -- print a CONST_INT as a vector dimension (x, y, or z)
+   f -- print a full reg even for something that must always be split
+   t -- print a type opcode suffix, promoting QImode to 32 bits
+   T -- print a type size in bits
+   u -- print a type opcode suffix without promotions.  */
+
+static void
+nvptx_print_operand (FILE *file, rtx x, int code)
+{
+  rtx orig_x = x;
+  machine_mode op_mode;
+
+  if (code == '.')
+    {
+      x = current_insn_predicate;
+      if (x)
+	{
+	  unsigned int regno = REGNO (XEXP (x, 0));
+	  fputs ("[", file);
+	  if (GET_CODE (x) == EQ)
+	    fputs ("!", file);
+	  fputs (reg_names [regno], file);
+	  fputs ("]", file);
+	}
+      return;
+    }
+  else if (code == '#')
+    {
+      fputs (".rn", file);
+      return;
+    }
+
+  enum rtx_code x_code = GET_CODE (x);
+
+  switch (code)
+    {
+    case 'A':
+      {
+	addr_space_t as = nvptx_addr_space_from_address (XEXP (x, 0));
+	fputs (nvptx_section_from_addr_space (as), file);
+      }
+      break;
+
+    case 'd':
+      gcc_assert (x_code == CONST_INT);
+      if (INTVAL (x) == 0)
+	fputs (".x", file);
+      else if (INTVAL (x) == 1)
+	fputs (".y", file);
+      else if (INTVAL (x) == 2)
+	fputs (".z", file);
+      else
+	gcc_unreachable ();
+      break;
+
+    case 't':
+      op_mode = nvptx_underlying_object_mode (x);
+      fprintf (file, "%s", nvptx_ptx_type_from_mode (op_mode, true));
+      break;
+
+    case 'u':
+      op_mode = nvptx_underlying_object_mode (x);
+      fprintf (file, "%s", nvptx_ptx_type_from_mode (op_mode, false));
+      break;
+
+    case 'T':
+      fprintf (file, "%d", GET_MODE_BITSIZE (GET_MODE (x)));
+      break;
+
+    case 'j':
+      fprintf (file, "@");
+      goto common;
+
+    case 'J':
+      fprintf (file, "@!");
+      goto common;
+
+    case 'c':
+      op_mode = GET_MODE (XEXP (x, 0));
+      switch (x_code)
+	{
+	case EQ:
+	  fputs (".eq", file);
+	  break;
+	case NE:
+	  if (FLOAT_MODE_P (op_mode))
+	    fputs (".neu", file);
+	  else
+	    fputs (".ne", file);
+	  break;
+	case LE:
+	  fputs (".le", file);
+	  break;
+	case GE:
+	  fputs (".ge", file);
+	  break;
+	case LT:
+	  fputs (".lt", file);
+	  break;
+	case GT:
+	  fputs (".gt", file);
+	  break;
+	case LEU:
+	  fputs (".ls", file);
+	  break;
+	case GEU:
+	  fputs (".hs", file);
+	  break;
+	case LTU:
+	  fputs (".lo", file);
+	  break;
+	case GTU:
+	  fputs (".hi", file);
+	  break;
+	case LTGT:
+	  fputs (".ne", file);
+	  break;
+	case UNEQ:
+	  fputs (".equ", file);
+	  break;
+	case UNLE:
+	  fputs (".leu", file);
+	  break;
+	case UNGE:
+	  fputs (".geu", file);
+	  break;
+	case UNLT:
+	  fputs (".ltu", file);
+	  break;
+	case UNGT:
+	  fputs (".gtu", file);
+	  break;
+	case UNORDERED:
+	  fputs (".nan", file);
+	  break;
+	case ORDERED:
+	  fputs (".num", file);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      if (FLOAT_MODE_P (op_mode)
+	  || x_code == EQ || x_code == NE
+	  || x_code == GEU || x_code == GTU
+	  || x_code == LEU || x_code == LTU)
+	fputs (nvptx_ptx_type_from_mode (op_mode, true), file);
+      else
+	fprintf (file, ".s%d", GET_MODE_BITSIZE (op_mode));
+      break;
+    default:
+    common:
+      switch (x_code)
+	{
+	case SUBREG:
+	  x = SUBREG_REG (x);
+	  /* fall through */
+
+	case REG:
+	  if (HARD_REGISTER_P (x))
+	    fprintf (file, "%s", reg_names[REGNO (x)]);
+	  else
+	    fprintf (file, "%%r%d", REGNO (x));
+	  if (code != 'f' && nvptx_split_reg_p (GET_MODE (x)))
+	    {
+	      gcc_assert (GET_CODE (orig_x) == SUBREG
+			  && !nvptx_split_reg_p (GET_MODE (orig_x)));
+	      fprintf (file, "$%d", SUBREG_BYTE (orig_x) / UNITS_PER_WORD);
+	    }
+	  break;
+
+	case MEM:
+	  fputc ('[', file);
+	  nvptx_print_address_operand (file, XEXP (x, 0), GET_MODE (x));
+	  fputc (']', file);
+	  break;
+
+	case CONST_INT:
+	  output_addr_const (file, x);
+	  break;
+
+	case CONST:
+	case SYMBOL_REF:
+	case LABEL_REF:
+	  /* We could use output_addr_const, but that can print things like
+	     "x-8", which breaks ptxas.  Need to ensure it is output as
+	     "x+-8".  */
+	  nvptx_print_address_operand (file, x, VOIDmode);
+	  break;
+
+	case CONST_DOUBLE:
+	  long vals[2];
+	  REAL_VALUE_TYPE real;
+	  REAL_VALUE_FROM_CONST_DOUBLE (real, x);
+	  real_to_target (vals, &real, GET_MODE (x));
+	  vals[0] &= 0xffffffff;
+	  vals[1] &= 0xffffffff;
+	  if (GET_MODE (x) == SFmode)
+	    fprintf (file, "0f%08lx", vals[0]);
+	  else
+	    fprintf (file, "0d%08lx%08lx", vals[1], vals[0]);
+	  break;
+
+	default:
+	  output_addr_const (file, x);
+	}
+    }
+}
+\f
+/* Record replacement regs used to deal with subreg operands.  */
+struct reg_replace
+{
+  rtx replacement[MAX_RECOG_OPERANDS];
+  machine_mode mode;
+  int n_allocated;
+  int n_in_use;
+};
+
+/* Allocate or reuse a replacement in R and return the rtx.  */
+
+static rtx
+get_replacement (struct reg_replace *r)
+{
+  if (r->n_allocated == r->n_in_use)
+    r->replacement[r->n_allocated++] = gen_reg_rtx (r->mode);
+  return r->replacement[r->n_in_use++];
+}
+
+/* Clean up subreg operands.  In ptx assembly, everything is typed, and
+   the presence of subregs would break the rules for most instructions.
+   Replace them with a suitable new register of the right size, plus
+   conversion copyin/copyout instructions.  */
+
+static void
+nvptx_reorg (void)
+{
+  struct reg_replace qiregs, hiregs, siregs, diregs;
+  rtx_insn *insn, *next;
+
+  /* We are freeing block_for_insn in the toplev to keep compatibility
+     with old MDEP_REORGS that are not CFG based.  Recompute it now.  */
+  compute_bb_for_insn ();
+
+  df_clear_flags (DF_LR_RUN_DCE);
+  df_analyze ();
+
+  thread_prologue_and_epilogue_insns ();
+
+  qiregs.n_allocated = 0;
+  hiregs.n_allocated = 0;
+  siregs.n_allocated = 0;
+  diregs.n_allocated = 0;
+  qiregs.mode = QImode;
+  hiregs.mode = HImode;
+  siregs.mode = SImode;
+  diregs.mode = DImode;
+
+  for (insn = get_insns (); insn; insn = next)
+    {
+      next = NEXT_INSN (insn);
+      if (!NONDEBUG_INSN_P (insn)
+	  || asm_noperands (insn) >= 0
+	  || GET_CODE (PATTERN (insn)) == USE
+	  || GET_CODE (PATTERN (insn)) == CLOBBER)
+	continue;
+      qiregs.n_in_use = 0;
+      hiregs.n_in_use = 0;
+      siregs.n_in_use = 0;
+      diregs.n_in_use = 0;
+      extract_insn (insn);
+      enum attr_subregs_ok s_ok = get_attr_subregs_ok (insn);
+      for (int i = 0; i < recog_data.n_operands; i++)
+	{
+	  rtx op = recog_data.operand[i];
+	  if (GET_CODE (op) != SUBREG)
+	    continue;
+
+	  rtx inner = SUBREG_REG (op);
+
+	  machine_mode outer_mode = GET_MODE (op);
+	  machine_mode inner_mode = GET_MODE (inner);
+	  gcc_assert (s_ok);
+	  if (s_ok
+	      && (GET_MODE_PRECISION (inner_mode)
+		  >= GET_MODE_PRECISION (outer_mode)))
+	    continue;
+	  gcc_assert (SCALAR_INT_MODE_P (outer_mode));
+	  struct reg_replace *r = (outer_mode == QImode ? &qiregs
+				   : outer_mode == HImode ? &hiregs
+				   : outer_mode == SImode ? &siregs
+				   : &diregs);
+	  rtx new_reg = get_replacement (r);
+
+	  if (recog_data.operand_type[i] != OP_OUT)
+	    {
+	      enum rtx_code code;
+	      if (GET_MODE_PRECISION (inner_mode)
+		  < GET_MODE_PRECISION (outer_mode))
+		code = ZERO_EXTEND;
+	      else
+		code = TRUNCATE;
+
+	      rtx pat = gen_rtx_SET (VOIDmode, new_reg,
+				     gen_rtx_fmt_e (code, outer_mode, inner));
+	      emit_insn_before (pat, insn);
+	    }
+
+	  if (recog_data.operand_type[i] != OP_IN)
+	    {
+	      enum rtx_code code;
+	      if (GET_MODE_PRECISION (inner_mode)
+		  < GET_MODE_PRECISION (outer_mode))
+		code = TRUNCATE;
+	      else
+		code = ZERO_EXTEND;
+
+	      rtx pat = gen_rtx_SET (VOIDmode, inner,
+				     gen_rtx_fmt_e (code, inner_mode, new_reg));
+	      emit_insn_after (pat, insn);
+	    }
+	  validate_change (insn, recog_data.operand_loc[i], new_reg, false);
+	}
+    }
+
+  int maxregs = max_reg_num ();
+  regstat_init_n_sets_and_refs ();
+
+  for (int i = LAST_VIRTUAL_REGISTER + 1; i < maxregs; i++)
+    if (REG_N_SETS (i) == 0 && REG_N_REFS (i) == 0)
+      regno_reg_rtx[i] = const0_rtx;
+  regstat_free_n_sets_and_refs ();
+}
+\f
+/* Handle a "kernel" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+nvptx_handle_kernel_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+			       int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  tree decl = *node;
+
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    {
+      error ("%qE attribute only applies to functions", name);
+      *no_add_attrs = true;
+    }
+
+  else if (TREE_TYPE (TREE_TYPE (decl)) != void_type_node)
+    {
+      error ("%qE attribute requires a void return type", name);
+      *no_add_attrs = true;
+    }
+
+  return NULL_TREE;
+}
+
+/* Table of valid machine attributes.  */
+static const struct attribute_spec nvptx_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
+       affects_type_identity } */
+  { "kernel", 0, 0, true, false,  false, nvptx_handle_kernel_attribute, false },
+  { NULL, 0, 0, false, false, false, NULL, false }
+};
+\f
+/* Limit vector alignments to BIGGEST_ALIGNMENT.  */
+
+static HOST_WIDE_INT
+nvptx_vector_alignment (const_tree type)
+{
+  HOST_WIDE_INT align = tree_to_shwi (TYPE_SIZE (type));
+
+  return MIN (align, BIGGEST_ALIGNMENT);
+}
+\f
+/* Implement TARGET_ASM_FILE_START.  Write the kinds of things ptxas expects
+   at the start of a file.  */
+
+static void
+nvptx_file_start (void)
+{
+  fputs ("// BEGIN PREAMBLE\n", asm_out_file);
+  fputs ("\t.version\t3.1\n", asm_out_file);
+  fputs ("\t.target\tsm_30\n", asm_out_file);
+  fprintf (asm_out_file, "\t.address_size %d\n", GET_MODE_BITSIZE (Pmode));
+  fputs ("// END PREAMBLE\n", asm_out_file);
+}
+
+/* Called through htab_traverse; call nvptx_record_fndecl for every
+   SLOT.  */
+
+static int
+write_one_fndecl (void **slot, void *)
+{
+  tree decl = (tree)*slot;
+  nvptx_record_fndecl (decl, true);
+  return 1;
+}
+
+/* Write out the function declarations we've collected.  */
+
+static void
+nvptx_file_end (void)
+{
+  htab_traverse (needed_fndecls_htab,
+		 write_one_fndecl,
+		 NULL);
+  fputs (func_decls.str().c_str(), asm_out_file);
+}
+\f
+#undef TARGET_OPTION_OVERRIDE
+#define TARGET_OPTION_OVERRIDE nvptx_option_override
+
+#undef TARGET_ATTRIBUTE_TABLE
+#define TARGET_ATTRIBUTE_TABLE nvptx_attribute_table
+
+#undef TARGET_LEGITIMATE_ADDRESS_P
+#define TARGET_LEGITIMATE_ADDRESS_P nvptx_legitimate_address_p
+
+#undef  TARGET_PROMOTE_FUNCTION_MODE
+#define TARGET_PROMOTE_FUNCTION_MODE nvptx_promote_function_mode
+
+#undef TARGET_FUNCTION_ARG
+#define TARGET_FUNCTION_ARG nvptx_function_arg
+#undef TARGET_FUNCTION_INCOMING_ARG
+#define TARGET_FUNCTION_INCOMING_ARG nvptx_function_incoming_arg
+#undef TARGET_FUNCTION_ARG_ADVANCE
+#define TARGET_FUNCTION_ARG_ADVANCE nvptx_function_arg_advance
+#undef TARGET_FUNCTION_ARG_BOUNDARY
+#define TARGET_FUNCTION_ARG_BOUNDARY nvptx_function_arg_boundary
+#undef TARGET_FUNCTION_ARG_ROUND_BOUNDARY
+#define TARGET_FUNCTION_ARG_ROUND_BOUNDARY nvptx_function_arg_boundary
+#undef TARGET_PASS_BY_REFERENCE
+#define TARGET_PASS_BY_REFERENCE nvptx_pass_by_reference
+#undef TARGET_FUNCTION_VALUE_REGNO_P
+#define TARGET_FUNCTION_VALUE_REGNO_P nvptx_function_value_regno_p
+#undef TARGET_FUNCTION_VALUE
+#define TARGET_FUNCTION_VALUE nvptx_function_value
+#undef TARGET_LIBCALL_VALUE
+#define TARGET_LIBCALL_VALUE nvptx_libcall_value
+#undef TARGET_FUNCTION_OK_FOR_SIBCALL
+#define TARGET_FUNCTION_OK_FOR_SIBCALL nvptx_function_ok_for_sibcall
+#undef TARGET_SPLIT_COMPLEX_ARG
+#define TARGET_SPLIT_COMPLEX_ARG hook_bool_const_tree_true
+#undef TARGET_RETURN_IN_MEMORY
+#define TARGET_RETURN_IN_MEMORY nvptx_return_in_memory
+#undef TARGET_OMIT_STRUCT_RETURN_REG
+#define TARGET_OMIT_STRUCT_RETURN_REG true
+#undef TARGET_STRICT_ARGUMENT_NAMING
+#define TARGET_STRICT_ARGUMENT_NAMING nvptx_strict_argument_naming
+#undef TARGET_STATIC_CHAIN
+#define TARGET_STATIC_CHAIN nvptx_static_chain
+
+#undef TARGET_CALL_ARGS
+#define TARGET_CALL_ARGS nvptx_call_args
+#undef TARGET_END_CALL_ARGS
+#define TARGET_END_CALL_ARGS nvptx_end_call_args
+
+#undef TARGET_ASM_FILE_START
+#define TARGET_ASM_FILE_START nvptx_file_start
+#undef TARGET_ASM_FILE_END
+#define TARGET_ASM_FILE_END nvptx_file_end
+#undef TARGET_ASM_GLOBALIZE_LABEL
+#define TARGET_ASM_GLOBALIZE_LABEL nvptx_globalize_label
+#undef TARGET_ASM_ASSEMBLE_UNDEFINED_DECL
+#define TARGET_ASM_ASSEMBLE_UNDEFINED_DECL nvptx_assemble_undefined_decl
+#undef  TARGET_PRINT_OPERAND
+#define TARGET_PRINT_OPERAND nvptx_print_operand
+#undef  TARGET_PRINT_OPERAND_ADDRESS
+#define TARGET_PRINT_OPERAND_ADDRESS nvptx_print_operand_address
+#undef  TARGET_PRINT_OPERAND_PUNCT_VALID_P
+#define TARGET_PRINT_OPERAND_PUNCT_VALID_P nvptx_print_operand_punct_valid_p
+#undef TARGET_ASM_INTEGER
+#define TARGET_ASM_INTEGER nvptx_assemble_integer
+#undef TARGET_ASM_DECL_END
+#define TARGET_ASM_DECL_END nvptx_assemble_decl_end
+#undef TARGET_ASM_DECLARE_CONSTANT_NAME
+#define TARGET_ASM_DECLARE_CONSTANT_NAME nvptx_asm_declare_constant_name
+#undef TARGET_USE_BLOCKS_FOR_CONSTANT_P
+#define TARGET_USE_BLOCKS_FOR_CONSTANT_P hook_bool_mode_const_rtx_true
+#undef TARGET_ASM_NEED_VAR_DECL_BEFORE_USE
+#define TARGET_ASM_NEED_VAR_DECL_BEFORE_USE true
+
+#undef TARGET_MACHINE_DEPENDENT_REORG
+#define TARGET_MACHINE_DEPENDENT_REORG nvptx_reorg
+#undef TARGET_NO_REGISTER_ALLOCATION
+#define TARGET_NO_REGISTER_ALLOCATION true
+
+#undef TARGET_VECTOR_ALIGNMENT
+#define TARGET_VECTOR_ALIGNMENT nvptx_vector_alignment
+
+struct gcc_target targetm = TARGET_INITIALIZER;
+
+#include "gt-nvptx.h"
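
The CONST_DOUBLE case of nvptx_print_operand above emits floating-point
constants as PTX hex literals: "0f" plus eight hex digits for SFmode, "0d"
plus sixteen hex digits (high word first) for DFmode.  The following is a
standalone sketch of that formatting, not part of the patch.  The helper
names are hypothetical, and it assumes the host float and double are
IEEE-754 binary32 and binary64, whereas the real code goes through
real_to_target and does not depend on the host format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format F like the SFmode branch, "0f" followed by
   the eight hex digits of its single-precision bit pattern.  */
static void
ptx_float_literal (char *buf, size_t len, float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  snprintf (buf, len, "0f%08lx", (unsigned long) bits);
}

/* Hypothetical helper: format D like the DFmode branch, "0d" followed by
   the high 32 bits and then the low 32 bits of its bit pattern.  */
static void
ptx_double_literal (char *buf, size_t len, double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  snprintf (buf, len, "0d%08lx%08lx",
	    (unsigned long) (bits >> 32),
	    (unsigned long) (bits & 0xffffffffu));
}

/* Spot checks against well-known bit patterns.  */
static void
check_ptx_literals (void)
{
  char buf[32];
  ptx_float_literal (buf, sizeof buf, 1.0f);
  assert (strcmp (buf, "0f3f800000") == 0);
  ptx_double_literal (buf, sizeof buf, -2.0);
  assert (strcmp (buf, "0dc000000000000000") == 0);
}
```

Printing the raw bit pattern rather than a decimal string avoids any
rounding when ptxas reads the constant back.
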
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
new file mode 100644
index 0000000..c222375
--- /dev/null
+++ b/gcc/config/nvptx/nvptx.h
@@ -0,0 +1,356 @@
+/* Target Definitions for NVPTX.
+   Copyright (C) 2014 Free Software Foundation, Inc.
+   Contributed by Bernd Schmidt <bernds@codesourcery.com>
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_NVPTX_H
+#define GCC_NVPTX_H
+
+/* Run-time Target.  */
+
+#define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
+
+#define TARGET_CPU_CPP_BUILTINS()		\
+  do						\
+    {						\
+      builtin_assert ("machine=nvptx");		\
+      builtin_assert ("cpu=nvptx");		\
+      builtin_define ("__nvptx__");		\
+    } while (0)
+
+/* Storage Layout.  */
+
+#define BITS_BIG_ENDIAN 0
+#define BYTES_BIG_ENDIAN 0
+#define WORDS_BIG_ENDIAN 0
+
+/* Chosen such that we won't have to deal with multi-word subregs.  */
+#define UNITS_PER_WORD 8
+
+#define PARM_BOUNDARY 8
+#define STACK_BOUNDARY 64
+#define FUNCTION_BOUNDARY 32
+#define BIGGEST_ALIGNMENT 64
+#define STRICT_ALIGNMENT 1
+
+/* Copied from elf.h and other places.  We'd otherwise use
+   BIGGEST_ALIGNMENT and fail a number of testcases.  */
+#define MAX_OFILE_ALIGNMENT (32768 * 8)
+
+/* Type Layout.  */
+
+#define DEFAULT_SIGNED_CHAR 1
+
+#define SHORT_TYPE_SIZE 16
+#define INT_TYPE_SIZE 32
+#define LONG_TYPE_SIZE (TARGET_ABI64 ? 64 : 32)
+#define LONG_LONG_TYPE_SIZE 64
+#define FLOAT_TYPE_SIZE 32
+#define DOUBLE_TYPE_SIZE 64
+#define LONG_DOUBLE_TYPE_SIZE 64
+
+#undef SIZE_TYPE
+#define SIZE_TYPE (TARGET_ABI64 ? "long unsigned int" : "unsigned int")
+#undef PTRDIFF_TYPE
+#define PTRDIFF_TYPE (TARGET_ABI64 ? "long int" : "int")
+
+#define POINTER_SIZE (TARGET_ABI64 ? 64 : 32)
+
+#define Pmode (TARGET_ABI64 ? DImode : SImode)
+
+/* Registers.  Since ptx is a virtual target, we just define a few
+   hard registers for special purposes and leave pseudos unallocated.  */
+
+#define FIRST_PSEUDO_REGISTER 16
+#define FIXED_REGISTERS					\
+  { 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1 }
+#define CALL_USED_REGISTERS				\
+  { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }
+
+#define HARD_REGNO_NREGS(regno, mode)	1
+#define CANNOT_CHANGE_MODE_CLASS(M1, M2, CLS) ((CLS) == RETURN_REG)
+#define HARD_REGNO_MODE_OK(REG, MODE) nvptx_hard_regno_mode_ok (REG, MODE)
+
+/* Register Classes.  */
+
+enum reg_class
+  {
+    NO_REGS,
+    RETURN_REG,
+    ALL_REGS,
+    LIM_REG_CLASSES
+  };
+
+#define N_REG_CLASSES (int) LIM_REG_CLASSES
+
+#define REG_CLASS_NAMES {	  \
+    "NO_REGS",			  \
+    "RETURN_REG",		  \
+    "ALL_REGS" }
+
+#define REG_CLASS_CONTENTS	\
+{				\
+  /* NO_REGS.  */		\
+  { 0x0000 },			\
+  /* RETURN_REG.  */		\
+  { 0x0010 },			\
+  /* ALL_REGS.  */		\
+  { 0xFFFF },			\
+}
+
+#define GENERAL_REGS ALL_REGS
+
+#define REGNO_REG_CLASS(R) ((R) == 4 ? RETURN_REG : ALL_REGS)
+
+#define BASE_REG_CLASS ALL_REGS
+#define INDEX_REG_CLASS NO_REGS
+
+#define REGNO_OK_FOR_BASE_P(X) true
+#define REGNO_OK_FOR_INDEX_P(X) false
+
+#define CLASS_MAX_NREGS(class, mode) \
+  ((GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)
+
+#define MODES_TIEABLE_P(M1, M2) false
+
+#define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)		\
+  if (GET_MODE_CLASS (MODE) == MODE_INT			\
+      && GET_MODE_SIZE (MODE) < GET_MODE_SIZE (SImode))	\
+    {							\
+      (MODE) = SImode;					\
+    }
+
+/* Address spaces.  */
+#define ADDR_SPACE_GLOBAL 1
+#define ADDR_SPACE_SHARED 3
+#define ADDR_SPACE_CONST 4
+#define ADDR_SPACE_LOCAL 5
+#define ADDR_SPACE_PARAM 101
+
+/* Stack and Calling.  */
+
+#define STARTING_FRAME_OFFSET 0
+#define FRAME_GROWS_DOWNWARD 0
+#define STACK_GROWS_DOWNWARD
+
+#define STACK_POINTER_REGNUM 1
+#define HARD_FRAME_POINTER_REGNUM 2
+#define NVPTX_PUNNING_BUFFER_REGNUM 3
+#define NVPTX_RETURN_REGNUM 4
+#define FRAME_POINTER_REGNUM 15
+#define ARG_POINTER_REGNUM 14
+#define RETURN_ADDR_REGNO 13
+
+#define STATIC_CHAIN_REGNUM 12
+#define OUTGOING_ARG_POINTER_REGNUM 11
+#define OUTGOING_STATIC_CHAIN_REGNUM 10
+
+#define FIRST_PARM_OFFSET(FNDECL) 0
+#define PUSH_ARGS_REVERSED 1
+
+#define ACCUMULATE_OUTGOING_ARGS 1
+
+#ifdef HOST_WIDE_INT
+struct nvptx_args {
+  union tree_node *fntype;
+  /* Number of arguments passed in registers so far.  */
+  int count;
+  /* Offset into the stdarg area so far.  */
+  HOST_WIDE_INT off;
+};
+#endif
+
+#define CUMULATIVE_ARGS struct nvptx_args
+
+#define INIT_CUMULATIVE_ARGS(CUM, FNTYPE, LIBNAME, FNDECL, N_NAMED_ARGS) \
+  do { (CUM).fntype = (FNTYPE); (CUM).count = 0; (CUM).off = 0; } while (0)
+
+#define FUNCTION_ARG_REGNO_P(r) 0
+
+#define DEFAULT_PCC_STRUCT_RETURN 0
+
+#define FUNCTION_PROFILER(file, labelno) \
+  fatal_error ("profiling is not yet implemented for this architecture")
+
+#define TRAMPOLINE_SIZE 32
+#define TRAMPOLINE_ALIGNMENT 256
+\f
+/* We don't run reload, so this isn't actually used, but it still needs to be
+   defined.  Showing an argp->fp elimination also stops
+   expand_builtin_setjmp_receiver from generating invalid insns.  */
+#define ELIMINABLE_REGS					\
+  {							\
+    { FRAME_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM},	\
+    { ARG_POINTER_REGNUM, HARD_FRAME_POINTER_REGNUM}	\
+  }
+
+/* Define the offset between two registers, one to be eliminated, and the other
+   its replacement, at the start of a routine.  */
+
+#define INITIAL_ELIMINATION_OFFSET(FROM, TO, OFFSET) \
+  ((OFFSET) = 0)
+\f
+/* Addressing Modes.  */
+
+#define MAX_REGS_PER_ADDRESS 1
+
+#define LEGITIMATE_PIC_OPERAND_P(X) 1
+\f
+
+struct nvptx_pseudo_info
+{
+  int true_size;
+  int renumber;
+};
+
+#if defined HOST_WIDE_INT
+struct GTY(()) machine_function
+{
+  rtx_expr_list *call_args;
+  rtx start_call;
+  tree funtype;
+  bool has_call_with_varargs;
+  bool has_call_with_sc;
+  struct GTY((skip)) nvptx_pseudo_info *pseudos;
+  HOST_WIDE_INT outgoing_stdarg_size;
+  int ret_reg_mode;
+  int punning_buffer_size;
+};
+#endif
+\f
+/* Costs.  */
+
+#define NO_FUNCTION_CSE 1
+#define SLOW_BYTE_ACCESS 0
+#define BRANCH_COST(speed_p, predictable_p) 6
+\f
+/* Assembler Format.  */
+
+#undef ASM_DECLARE_FUNCTION_NAME
+#define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL)		\
+  nvptx_declare_function_name (FILE, NAME, DECL)
+
+#undef ASM_DECLARE_FUNCTION_SIZE
+#define ASM_DECLARE_FUNCTION_SIZE(STREAM, NAME, DECL) \
+  nvptx_function_end (STREAM)
+
+#define DWARF2_ASM_LINE_DEBUG_INFO 1
+
+#undef ASM_APP_ON
+#define ASM_APP_ON "\t// #APP \n"
+#undef ASM_APP_OFF
+#define ASM_APP_OFF "\t// #NO_APP \n"
+
+#define ASM_OUTPUT_COMMON(stream, name, size, rounded)
+#define ASM_OUTPUT_LOCAL(stream, name, size, rounded)
+
+#define REGISTER_NAMES							\
+  {									\
+    "%hr0", "%outargs", "%hfp", "%punbuffer", "%retval", "%retval_in", "%hr6", "%hr7",	\
+    "%hr8", "%hr9", "%hr10", "%hr11", "%hr12", "%hr13", "%argp", "%frame" \
+  }
+
+#define DBX_REGISTER_NUMBER(N) N
+
+#define TEXT_SECTION_ASM_OP ""
+#define DATA_SECTION_ASM_OP ""
+
+#undef  ASM_GENERATE_INTERNAL_LABEL
+#define ASM_GENERATE_INTERNAL_LABEL(LABEL, PREFIX, NUM)		\
+  do								\
+    {								\
+      char *__p;						\
+      __p = stpcpy (&(LABEL)[1], PREFIX);			\
+      (LABEL)[0] = '$';						\
+      sprint_ul (__p, (unsigned long) (NUM));			\
+    }								\
+  while (0)
+
+#define ASM_OUTPUT_ALIGN(FILE, POWER)
+#define ASM_OUTPUT_SKIP(FILE, N)		\
+  nvptx_output_skip (FILE, N)
+#undef  ASM_OUTPUT_ASCII
+#define ASM_OUTPUT_ASCII(FILE, STR, LENGTH)			\
+  nvptx_output_ascii (FILE, STR, LENGTH)
+
+#define ASM_DECLARE_OBJECT_NAME(FILE, NAME, DECL)	\
+  nvptx_declare_object_name (FILE, NAME, DECL)
+
+#undef  ASM_OUTPUT_ALIGNED_DECL_COMMON
+#define ASM_OUTPUT_ALIGNED_DECL_COMMON(FILE, DECL, NAME, SIZE, ALIGN)	\
+  do									\
+    {									\
+      fprintf (FILE, "// BEGIN%s VAR DEF: ",				\
+	       TREE_PUBLIC (DECL) ? " GLOBAL" : "");			\
+      assemble_name_raw (FILE, NAME);					\
+      fputc ('\n', FILE);						\
+      const char *sec = nvptx_section_for_decl (DECL);			\
+      fprintf (FILE, ".visible%s.align %d .b8 ", sec,			\
+	       (ALIGN) / BITS_PER_UNIT);				\
+      assemble_name ((FILE), (NAME));					\
+      if ((SIZE) > 0)							\
+	fprintf (FILE, "["HOST_WIDE_INT_PRINT_DEC"]", (SIZE));		\
+      fprintf (FILE, ";\n");						\
+    }									\
+  while (0)
+
+#undef  ASM_OUTPUT_ALIGNED_DECL_LOCAL
+#define ASM_OUTPUT_ALIGNED_DECL_LOCAL(FILE, DECL, NAME, SIZE, ALIGN)	\
+  do									\
+    {									\
+      fprintf (FILE, "// BEGIN VAR DEF: ");				\
+      assemble_name_raw (FILE, NAME);					\
+      fputc ('\n', FILE);						\
+      const char *sec = nvptx_section_for_decl (DECL);			\
+      fprintf (FILE, ".visible%s.align %d .b8 ", sec,			\
+	       (ALIGN) / BITS_PER_UNIT);				\
+      assemble_name ((FILE), (NAME));					\
+      if ((SIZE) > 0)							\
+	fprintf (FILE, "["HOST_WIDE_INT_PRINT_DEC"]", (SIZE));		\
+      fprintf (FILE, ";\n");						\
+    }									\
+  while (0)
+
+#define CASE_VECTOR_PC_RELATIVE flag_pic
+#define JUMP_TABLES_IN_TEXT_SECTION flag_pic
+
+#define ADDR_VEC_ALIGN(VEC) (JUMP_TABLES_IN_TEXT_SECTION ? 5 : 2)
+
+/* Misc.  */
+
+#define DWARF2_DEBUGGING_INFO 1
+
+#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_BITSIZE ((MODE)), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_BITSIZE ((MODE)), 2)
+
+#define NO_DOT_IN_LABEL
+#define ASM_COMMENT_START "//"
+
+#define STORE_FLAG_VALUE -1
+#define FLOAT_STORE_FLAG_VALUE(MODE) REAL_VALUE_ATOF("1.0", (MODE))
+
+#define CASE_VECTOR_MODE SImode
+#define MOVE_MAX 4
+#define MOVE_RATIO(SPEED) 4
+#define TRULY_NOOP_TRUNCATION(outprec, inprec) 1
+#define FUNCTION_MODE QImode
+#define HAS_INIT_SECTION 1
+
+#endif /* GCC_NVPTX_H */
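
For reference, ASM_GENERATE_INTERNAL_LABEL in the header above builds a
label from a '$', the prefix, and the number in decimal.  A minimal
standalone sketch of the resulting strings, assuming sprint_ul prints an
unsigned long in decimal; the function name is hypothetical and it uses
snprintf in place of stpcpy and sprint_ul:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for ASM_GENERATE_INTERNAL_LABEL: write '$', then
   PREFIX, then NUM in decimal into LABEL (at most LEN bytes).  */
static void
make_internal_label (char *label, size_t len, const char *prefix,
		     unsigned long num)
{
  snprintf (label, len, "$%s%lu", prefix, num);
}

/* Spot check of the label shape.  */
static void
check_internal_label (void)
{
  char label[32];
  make_internal_label (label, sizeof label, "L", 42);
  assert (strcmp (label, "$L42") == 0);
}
```
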
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
new file mode 100644
index 0000000..966c28b
--- /dev/null
+++ b/gcc/config/nvptx/nvptx.md
@@ -0,0 +1,1376 @@
+;; Machine description for NVPTX.
+;; Copyright (C) 2014 Free Software Foundation, Inc.
+;; Contributed by Bernd Schmidt <bernds@codesourcery.com>
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_c_enum "unspec" [
+   UNSPEC_ARG_REG
+   UNSPEC_FROM_GLOBAL
+   UNSPEC_FROM_LOCAL
+   UNSPEC_FROM_PARAM
+   UNSPEC_FROM_SHARED
+   UNSPEC_FROM_CONST
+   UNSPEC_TO_GLOBAL
+   UNSPEC_TO_LOCAL
+   UNSPEC_TO_PARAM
+   UNSPEC_TO_SHARED
+   UNSPEC_TO_CONST
+
+   UNSPEC_CPLX_LOWPART
+   UNSPEC_CPLX_HIGHPART
+
+   UNSPEC_COPYSIGN
+   UNSPEC_LOG2
+   UNSPEC_EXP2
+   UNSPEC_SIN
+   UNSPEC_COS
+
+   UNSPEC_FPINT_FLOOR
+   UNSPEC_FPINT_BTRUNC
+   UNSPEC_FPINT_CEIL
+   UNSPEC_FPINT_NEARBYINT
+
+   UNSPEC_BITREV
+
+   UNSPEC_ALLOCA
+
+   UNSPEC_NTID
+   UNSPEC_TID
+])
+
+(define_c_enum "unspecv" [
+   UNSPECV_LOCK
+   UNSPECV_CAS
+   UNSPECV_XCHG
+])
+
+(define_attr "subregs_ok" "false,true"
+  (const_string "false"))
+
+(define_predicate "nvptx_register_operand"
+  (match_code "reg,subreg")
+{
+  if (REG_P (op))
+    return !HARD_REGISTER_P (op);
+  if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))
+    return false;
+  if (GET_CODE (op) == SUBREG)
+    return false;
+  return register_operand (op, mode);
+})
+
+(define_predicate "nvptx_reg_or_mem_operand"
+  (match_code "mem,reg,subreg")
+{
+  if (REG_P (op))
+    return !HARD_REGISTER_P (op);
+  if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))
+    return false;
+  if (GET_CODE (op) == SUBREG)
+    return false;
+  return memory_operand (op, mode) || register_operand (op, mode);
+})
+
+;; Allow symbolic constants.
+(define_predicate "symbolic_operand"
+  (match_code "symbol_ref,const"))
+
+;; Allow registers or symbolic constants.  We can allow frame, arg or stack
+;; pointers here since they are actually symbolic constants.
+(define_predicate "nvptx_register_or_symbolic_operand"
+  (match_code "reg,subreg,symbol_ref,const")
+{
+  if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))
+    return false;
+  if (GET_CODE (op) == SUBREG)
+    return false;
+  if (CONSTANT_P (op))
+    return true;
+  return register_operand (op, mode);
+})
+
+;; Registers or constants for normal instructions.  Does not allow symbolic
+;; constants.
+(define_predicate "nvptx_nonmemory_operand"
+  (match_code "reg,subreg,const_int,const_double")
+{
+  if (REG_P (op))
+    return !HARD_REGISTER_P (op);
+  if (GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))
+    return false;
+  if (GET_CODE (op) == SUBREG)
+    return false;
+  return nonmemory_operand (op, mode);
+})
+
+;; A source operand for a move instruction.  This is the only predicate we use
+;; that accepts symbolic constants.
+(define_predicate "nvptx_general_operand"
+  (match_code "reg,subreg,mem,const,symbol_ref,label_ref,const_int,const_double")
+{
+  if (REG_P (op))
+    return !HARD_REGISTER_P (op);
+  return general_operand (op, mode);
+})
+
+;; A destination operand for a move instruction.  This is the only destination
+;; predicate that accepts the return register since it requires special handling.
+(define_predicate "nvptx_nonimmediate_operand"
+  (match_code "reg,subreg,mem")
+{
+  if (REG_P (op))
+    return (op != frame_pointer_rtx
+	    && op != arg_pointer_rtx
+	    && op != stack_pointer_rtx);
+  return nonimmediate_operand (op, mode);
+})
+
+(define_predicate "const_0_operand"
+  (and (match_code "const_int,const_double,const_vector")
+       (match_test "op == CONST0_RTX (GET_MODE (op))")))
+
+(define_predicate "global_mem_operand"
+  (and (match_code "mem")
+       (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_GLOBAL")))
+
+(define_predicate "const_mem_operand"
+  (and (match_code "mem")
+       (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_CONST")))
+
+(define_predicate "param_mem_operand"
+  (and (match_code "mem")
+       (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_PARAM")))
+
+(define_predicate "shared_mem_operand"
+  (and (match_code "mem")
+       (match_test "MEM_ADDR_SPACE (op) == ADDR_SPACE_SHARED")))
+
+(define_predicate "const0_operand"
+  (and (match_code "const_int")
+       (match_test "op == const0_rtx")))
+
+;; True if this operator is valid for predication.
+(define_predicate "predicate_operator"
+  (match_code "eq,ne"))
+
+(define_predicate "ne_operator"
+  (match_code "ne"))
+
+(define_predicate "nvptx_comparison_operator"
+  (match_code "eq,ne,le,ge,lt,gt,leu,geu,ltu,gtu"))
+
+(define_predicate "nvptx_float_comparison_operator"
+  (match_code "eq,ne,le,ge,lt,gt,uneq,unle,unge,unlt,ungt,unordered,ordered"))
+
+;; Test for a valid operand for a call instruction.
+(define_special_predicate "call_insn_operand"
+  (match_code "symbol_ref,reg")
+{
+  if (GET_CODE (op) == SYMBOL_REF)
+    {
+      tree decl = SYMBOL_REF_DECL (op);
+      /* This happens for libcalls.  */
+      if (decl == NULL_TREE)
+        return true;
+      return TREE_CODE (SYMBOL_REF_DECL (op)) == FUNCTION_DECL;
+    }
+  return true;
+})
+
+;; Return true if OP is a call with parallel USEs of the argument
+;; pseudos.
+(define_predicate "call_operation"
+  (match_code "parallel")
+{
+  unsigned i;
+
+  for (i = 1; i < XVECLEN (op, 0); i++)
+    {
+      rtx elt = XVECEXP (op, 0, i);
+      enum machine_mode mode;
+      unsigned regno;
+
+      if (GET_CODE (elt) != USE
+          || GET_CODE (XEXP (elt, 0)) != REG
+          || XEXP (elt, 0) == frame_pointer_rtx
+          || XEXP (elt, 0) == arg_pointer_rtx
+          || XEXP (elt, 0) == stack_pointer_rtx)
+
+        return false;
+    }
+  return true;
+})
+
+(define_constraint "P0"
+  "An integer with the value 0."
+  (and (match_code "const_int")
+       (match_test "ival == 0")))
+
+(define_constraint "P1"
+  "An integer with the value 1."
+  (and (match_code "const_int")
+       (match_test "ival == 1")))
+
+(define_constraint "Pn"
+  "An integer with the value -1."
+  (and (match_code "const_int")
+       (match_test "ival == -1")))
+
+(define_constraint "R"
+  "A pseudo register."
+  (match_code "reg"))
+
+(define_constraint "Ia"
+  "Any integer constant."
+  (and (match_code "const_int") (match_test "true")))
+
+(define_mode_iterator QHSDISDFM [QI HI SI DI SF DF])
+(define_mode_iterator QHSDIM [QI HI SI DI])
+(define_mode_iterator HSDIM [HI SI DI])
+(define_mode_iterator BHSDIM [BI HI SI DI])
+(define_mode_iterator SDIM [SI DI])
+(define_mode_iterator SDISDFM [SI DI SF DF])
+(define_mode_iterator QHIM [QI HI])
+(define_mode_iterator QHSIM [QI HI SI])
+(define_mode_iterator SDFM [SF DF])
+(define_mode_iterator SDCM [SC DC])
+
+;; This mode iterator allows :P to be used for patterns that operate on
+;; pointer-sized quantities.  Exactly one of the two alternatives will match.
+(define_mode_iterator P [(SI "Pmode == SImode") (DI "Pmode == DImode")])
+
+;; We should get away with not defining memory alternatives, since we don't
+;; get variables in this mode and pseudos are never spilled.
+(define_insn "movbi"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R,R,R")
+	(match_operand:BI 1 "nvptx_nonmemory_operand" "R,P0,Pn"))]
+  ""
+  "@
+   %.\\tmov%t0\\t%0, %1;
+   %.\\tsetp.eq.u32\\t%0, 1, 0;
+   %.\\tsetp.eq.u32\\t%0, 1, 1;")
+
+(define_insn "*mov<mode>_insn"
+  [(set (match_operand:QHSDIM 0 "nvptx_nonimmediate_operand" "=R,R,R,m")
+	(match_operand:QHSDIM 1 "general_operand" "n,Ri,m,R"))]
+  "!(MEM_P (operands[0])
+     && (!REG_P (operands[1]) || REGNO (operands[1]) <= LAST_VIRTUAL_REGISTER))"
+{
+  if (which_alternative == 2)
+    return "%.\\tld%A1%u1\\t%0, %1;";
+  if (which_alternative == 3)
+    return "%.\\tst%A0%u0\\t%0, %1;";
+
+  rtx dst = operands[0];
+  rtx src = operands[1];
+
+  enum machine_mode dst_mode = nvptx_underlying_object_mode (dst);
+  enum machine_mode src_mode = nvptx_underlying_object_mode (src);
+  if (GET_CODE (dst) == SUBREG)
+    dst = SUBREG_REG (dst);
+  if (GET_CODE (src) == SUBREG)
+    src = SUBREG_REG (src);
+  if (src_mode == QImode)
+    src_mode = SImode;
+  if (dst_mode == QImode)
+    dst_mode = SImode;
+  if (CONSTANT_P (src))
+    {
+      if (GET_MODE_CLASS (dst_mode) != MODE_INT)
+        return "%.\\tmov.b%T0\\t%0, %1;";
+      else
+        return "%.\\tmov%t0\\t%0, %1;";
+    }
+
+  /* Special handling for the return register; we allow this register to
+     only occur in the destination of a move insn.  */
+  if (REG_P (dst) && REGNO (dst) == NVPTX_RETURN_REGNUM
+      && dst_mode == HImode)
+    dst_mode = SImode;
+  if (dst_mode == src_mode)
+    return "%.\\tmov%t0\\t%0, %1;";
+  /* Mode-punning between floating point and integer.  */
+  if (GET_MODE_SIZE (dst_mode) == GET_MODE_SIZE (src_mode))
+    return "%.\\tmov.b%T0\\t%0, %1;";
+  return "%.\\tcvt%t0%t1\\t%0, %1;";
+}
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "*mov<mode>_insn"
+  [(set (match_operand:SDFM 0 "nvptx_nonimmediate_operand" "=R,R,m")
+	(match_operand:SDFM 1 "general_operand" "RF,m,R"))]
+  "!(MEM_P (operands[0]) && !REG_P (operands[1]))"
+{
+  if (which_alternative == 1)
+    return "%.\\tld%A1%u0\\t%0, %1;";
+  if (which_alternative == 2)
+    return "%.\\tst%A0%u1\\t%0, %1;";
+
+  rtx dst = operands[0];
+  rtx src = operands[1];
+  if (GET_CODE (dst) == SUBREG)
+    dst = SUBREG_REG (dst);
+  if (GET_CODE (src) == SUBREG)
+    src = SUBREG_REG (src);
+  enum machine_mode dst_mode = GET_MODE (dst);
+  enum machine_mode src_mode = GET_MODE (src);
+  if (dst_mode == src_mode)
+    return "%.\\tmov%t0\\t%0, %1;";
+  if (GET_MODE_SIZE (dst_mode) == GET_MODE_SIZE (src_mode))
+    return "%.\\tmov.b%T0\\t%0, %1;";
+  gcc_unreachable ();
+}
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "load_arg_reg<mode>"
+  [(set (match_operand:QHIM 0 "nvptx_register_operand" "=R")
+	(unspec:QHIM [(match_operand 1 "const_int_operand" "i")]
+		     UNSPEC_ARG_REG))]
+  ""
+  "%.\\tcvt%t0.u32\\t%0, %%ar%1;")
+
+(define_insn "load_arg_reg<mode>"
+  [(set (match_operand:SDISDFM 0 "nvptx_register_operand" "=R")
+	(unspec:SDISDFM [(match_operand 1 "const_int_operand" "i")]
+			UNSPEC_ARG_REG))]
+  ""
+  "%.\\tmov%t0\\t%0, %%ar%1;")
+
+(define_expand "mov<mode>"
+  [(set (match_operand:QHSDISDFM 0 "nvptx_nonimmediate_operand" "")
+	(match_operand:QHSDISDFM 1 "general_operand" ""))]
+  ""
+{
+  operands[1] = nvptx_maybe_convert_symbolic_operand (operands[1]);
+  /* Record the mode of the return register so that we can prevent
+     later optimization passes from changing it.  */
+  if (REG_P (operands[0]) && REGNO (operands[0]) == NVPTX_RETURN_REGNUM
+      && cfun)
+    {
+      if (cfun->machine->ret_reg_mode == VOIDmode)
+	cfun->machine->ret_reg_mode = GET_MODE (operands[0]);
+      else
+        gcc_assert (cfun->machine->ret_reg_mode == GET_MODE (operands[0]));
+    }
+
+  /* Hard registers are often actually symbolic operands on this target.
+     Don't allow them when storing to memory.  */
+  if (MEM_P (operands[0])
+      && (!REG_P (operands[1])
+	  || REGNO (operands[1]) <= LAST_VIRTUAL_REGISTER))
+    {
+      rtx tmp = gen_reg_rtx (<MODE>mode);
+      emit_move_insn (tmp, operands[1]);
+      emit_move_insn (operands[0], tmp);
+      DONE;
+    }
+  if (GET_CODE (operands[1]) == SYMBOL_REF)
+    nvptx_record_needed_fndecl (SYMBOL_REF_DECL (operands[1]));
+})
+
+(define_insn "highpartscsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SC 1 "nvptx_register_operand")]
+		   UNSPEC_CPLX_HIGHPART))]
+  ""
+  "%.\\tmov%t0\\t%0, %f1$1;")
+
+(define_insn "set_highpartsfsc2"
+  [(set (match_operand:SC 0 "nvptx_register_operand" "+R")
+	(unspec:SC [(match_dup 0)
+		    (match_operand:SF 1 "nvptx_register_operand")]
+		   UNSPEC_CPLX_HIGHPART))]
+  ""
+  "%.\\tmov%t1\\t%f0$1, %1;")
+
+(define_insn "lowpartscsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SC 1 "nvptx_register_operand")]
+		   UNSPEC_CPLX_LOWPART))]
+  ""
+  "%.\\tmov%t0\\t%0, %f1$0;")
+
+(define_insn "set_lowpartsfsc2"
+  [(set (match_operand:SC 0 "nvptx_register_operand" "+R")
+	(unspec:SC [(match_dup 0)
+		    (match_operand:SF 1 "nvptx_register_operand")]
+		   UNSPEC_CPLX_LOWPART))]
+  ""
+  "%.\\tmov%t1\\t%f0$0, %1;")
+
+(define_expand "mov<mode>"
+  [(set (match_operand:SDCM 0 "nvptx_nonimmediate_operand" "")
+	(match_operand:SDCM 1 "general_operand" ""))]
+  ""
+{
+  enum machine_mode submode = <MODE>mode == SCmode ? SFmode : DFmode;
+  int sz = GET_MODE_SIZE (submode);
+  rtx xops[4];
+  rtx punning_reg = NULL_RTX;
+  rtx copyback = NULL_RTX;
+
+  if (GET_CODE (operands[0]) == SUBREG)
+    {
+      rtx inner = SUBREG_REG (operands[0]);
+      enum machine_mode inner_mode = GET_MODE (inner);
+      int sz2 = GET_MODE_SIZE (inner_mode);
+      gcc_assert (sz2 >= sz);
+      cfun->machine->punning_buffer_size
+        = MAX (cfun->machine->punning_buffer_size, sz2);
+      if (punning_reg == NULL_RTX)
+	punning_reg = gen_rtx_REG (Pmode, NVPTX_PUNNING_BUFFER_REGNUM);
+      copyback = gen_move_insn (inner, gen_rtx_MEM (inner_mode, punning_reg));
+      operands[0] = gen_rtx_MEM (<MODE>mode, punning_reg);
+    }
+  if (GET_CODE (operands[1]) == SUBREG)
+    {
+      rtx inner = SUBREG_REG (operands[1]);
+      enum machine_mode inner_mode = GET_MODE (inner);
+      int sz2 = GET_MODE_SIZE (inner_mode);
+      gcc_assert (sz2 >= sz);
+      cfun->machine->punning_buffer_size
+        = MAX (cfun->machine->punning_buffer_size, sz2);
+      if (punning_reg == NULL_RTX)
+	punning_reg = gen_rtx_REG (Pmode, NVPTX_PUNNING_BUFFER_REGNUM);
+      emit_move_insn (gen_rtx_MEM (inner_mode, punning_reg), inner);
+      operands[1] = gen_rtx_MEM (<MODE>mode, punning_reg);
+    }
+
+  if (REG_P (operands[0]) && submode == SFmode)
+    {
+      xops[0] = gen_reg_rtx (submode);
+      xops[1] = gen_reg_rtx (submode);
+    }
+  else
+    {
+      xops[0] = gen_lowpart (submode, operands[0]);
+      if (MEM_P (operands[0]))
+	xops[1] = adjust_address_nv (operands[0], submode, sz);
+      else
+	xops[1] = gen_highpart (submode, operands[0]);
+    }
+
+  if (REG_P (operands[1]) && submode == SFmode)
+    {
+      xops[2] = gen_reg_rtx (submode);
+      xops[3] = gen_reg_rtx (submode);
+      emit_insn (gen_lowpartscsf2 (xops[2], operands[1]));
+      emit_insn (gen_highpartscsf2 (xops[3], operands[1]));
+    }
+  else
+    {
+      xops[2] = gen_lowpart (submode, operands[1]);
+      if (MEM_P (operands[1]))
+	xops[3] = adjust_address_nv (operands[1], submode, sz);
+      else
+	xops[3] = gen_highpart (submode, operands[1]);
+    }
+
+  emit_move_insn (xops[0], xops[2]);
+  emit_move_insn (xops[1], xops[3]);
+  if (REG_P (operands[0]) && submode == SFmode)
+    {
+      emit_insn (gen_set_lowpartsfsc2 (operands[0], xops[0]));
+      emit_insn (gen_set_highpartsfsc2 (operands[0], xops[1]));
+    }
+  if (copyback)
+    emit_insn (copyback);
+  DONE;
+})
+
+(define_insn "zero_extendqihi2"
+  [(set (match_operand:HI 0 "nvptx_register_operand" "=R,R")
+	(zero_extend:HI (match_operand:QI 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.u16.u%T1\\t%0, %1;
+   %.\\tld%A1.u8\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "zero_extend<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R,R")
+	(zero_extend:SI (match_operand:QHIM 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.u32.u%T1\\t%0, %1;
+   %.\\tld%A1.u%T1\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "zero_extend<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R,R")
+	(zero_extend:DI (match_operand:QHSIM 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.u64.u%T1\\t%0, %1;
+   %.\\tld%A1%u1\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "extend<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R,R")
+	(sign_extend:SI (match_operand:QHIM 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.s32.s%T1\\t%0, %1;
+   %.\\tld%A1.s%T1\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "extend<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R,R")
+	(sign_extend:DI (match_operand:QHSIM 1 "nvptx_reg_or_mem_operand" "R,m")))]
+  ""
+  "@
+   %.\\tcvt.s64.s%T1\\t%0, %1;
+   %.\\tld%A1.s%T1\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "trunchiqi2"
+  [(set (match_operand:QI 0 "nvptx_reg_or_mem_operand" "=R,m")
+	(truncate:QI (match_operand:HI 1 "nvptx_register_operand" "R,R")))]
+  ""
+  "@
+   %.\\tcvt%t0.u16\\t%0, %1;
+   %.\\tst%A0.u8\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "truncsi<mode>2"
+  [(set (match_operand:QHIM 0 "nvptx_reg_or_mem_operand" "=R,m")
+	(truncate:QHIM (match_operand:SI 1 "nvptx_register_operand" "R,R")))]
+  ""
+  "@
+   %.\\tcvt%t0.u32\\t%0, %1;
+   %.\\tst%A0.u%T0\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+(define_insn "truncdi<mode>2"
+  [(set (match_operand:QHSIM 0 "nvptx_reg_or_mem_operand" "=R,m")
+	(truncate:QHSIM (match_operand:DI 1 "nvptx_register_operand" "R,R")))]
+  ""
+  "@
+   %.\\tcvt%t0.u64\\t%0, %1;
+   %.\\tst%A0.u%T0\\t%0, %1;"
+  [(set_attr "subregs_ok" "true")])
+
+;; Pointer address space conversions
+
+(define_int_iterator cvt_code
+  [UNSPEC_FROM_GLOBAL
+   UNSPEC_FROM_LOCAL
+   UNSPEC_FROM_SHARED
+   UNSPEC_FROM_CONST
+   UNSPEC_TO_GLOBAL
+   UNSPEC_TO_LOCAL
+   UNSPEC_TO_SHARED
+   UNSPEC_TO_CONST])
+
+(define_int_attr cvt_name
+  [(UNSPEC_FROM_GLOBAL "from_global")
+   (UNSPEC_FROM_LOCAL "from_local")
+   (UNSPEC_FROM_SHARED "from_shared")
+   (UNSPEC_FROM_CONST "from_const")
+   (UNSPEC_TO_GLOBAL "to_global")
+   (UNSPEC_TO_LOCAL "to_local")
+   (UNSPEC_TO_SHARED "to_shared")
+   (UNSPEC_TO_CONST "to_const")])
+
+(define_int_attr cvt_str
+  [(UNSPEC_FROM_GLOBAL ".global")
+   (UNSPEC_FROM_LOCAL ".local")
+   (UNSPEC_FROM_SHARED ".shared")
+   (UNSPEC_FROM_CONST ".const")
+   (UNSPEC_TO_GLOBAL ".to.global")
+   (UNSPEC_TO_LOCAL ".to.local")
+   (UNSPEC_TO_SHARED ".to.shared")
+   (UNSPEC_TO_CONST ".to.const")])
+
+(define_insn "convaddr_<cvt_name><mode>"
+  [(set (match_operand:P 0 "nvptx_register_operand" "=R")
+	(unspec:P [(match_operand:P 1 "nvptx_register_or_symbolic_operand" "Rs")] cvt_code))]
+  ""
+  "%.\\tcvta<cvt_str>%t0\\t%0, %1;")
+
+;; Integer arithmetic
+
+(define_insn "add<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(plus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tadd%t0\\t%0, %1, %2;")
+
+(define_insn "sub<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(minus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		     (match_operand:HSDIM 2 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tsub%t0\\t%0, %1, %2;")
+
+(define_insn "mul<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(mult:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmul.lo%t0\\t%0, %1, %2;")
+
+(define_insn "*mad<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(plus:HSDIM (mult:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+				(match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri"))
+		    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmad.lo%t0\\t%0, %1, %2, %3;")
+
+(define_insn "div<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(div:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		   (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tdiv.s%T0\\t%0, %1, %2;")
+
+(define_insn "udiv<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(udiv:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		   (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tdiv.u%T0\\t%0, %1, %2;")
+
+(define_insn "mod<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(mod:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		   (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\trem.s%T0\\t%0, %1, %2;")
+
+(define_insn "umod<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(umod:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\trem.u%T0\\t%0, %1, %2;")
+
+(define_insn "smin<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(smin:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmin.s%T0\\t%0, %1, %2;")
+
+(define_insn "umin<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(umin:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmin.u%T0\\t%0, %1, %2;")
+
+(define_insn "smax<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(smax:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmax.s%T0\\t%0, %1, %2;")
+
+(define_insn "umax<mode>3"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(umax:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tmax.u%T0\\t%0, %1, %2;")
+
+(define_insn "abs<mode>2"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(abs:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tabs.s%T0\\t%0, %1;")
+
+(define_insn "neg<mode>2"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(neg:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tneg.s%T0\\t%0, %1;")
+
+(define_insn "one_cmpl<mode>2"
+  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
+	(not:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tnot.b%T0\\t%0, %1;")
+
+(define_insn "bitrev<mode>2"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(unspec:SDIM [(match_operand:SDIM 1 "nvptx_register_operand" "R")]
+		     UNSPEC_BITREV))]
+  ""
+  "%.\\tbrev.b%T0\\t%0, %1;")
+
+(define_insn "clz<mode>2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(clz:SI (match_operand:SDIM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tclz.b%T1\\t%0, %1;")
+
+(define_expand "ctz<mode>2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "")
+	(ctz:SI (match_operand:SDIM 1 "nvptx_register_operand" "")))]
+  ""
+{
+  rtx tmpreg = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_bitrev<mode>2 (tmpreg, operands[1]));
+  emit_insn (gen_clz<mode>2 (operands[0], tmpreg));
+  DONE;
+})
+
+;; Shifts
+
+(define_insn "ashl<mode>3"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(ashift:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")
+		     (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tshl.b%T0\\t%0, %1, %2;")
+
+(define_insn "ashr<mode>3"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(ashiftrt:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")
+		       (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tshr.s%T0\\t%0, %1, %2;")
+
+(define_insn "lshr<mode>3"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(lshiftrt:SDIM (match_operand:SDIM 1 "nvptx_register_operand" "R")
+		       (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tshr.u%T0\\t%0, %1, %2;")
+
+;; Logical operations
+
+(define_insn "and<mode>3"
+  [(set (match_operand:BHSDIM 0 "nvptx_register_operand" "=R")
+	(and:BHSDIM (match_operand:BHSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:BHSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tand.b%T0\\t%0, %1, %2;")
+
+(define_insn "ior<mode>3"
+  [(set (match_operand:BHSDIM 0 "nvptx_register_operand" "=R")
+	(ior:BHSDIM (match_operand:BHSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:BHSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\tor.b%T0\\t%0, %1, %2;")
+
+(define_insn "xor<mode>3"
+  [(set (match_operand:BHSDIM 0 "nvptx_register_operand" "=R")
+	(xor:BHSDIM (match_operand:BHSDIM 1 "nvptx_register_operand" "R")
+		    (match_operand:BHSDIM 2 "nvptx_nonmemory_operand" "Ri")))]
+  ""
+  "%.\\txor.b%T0\\t%0, %1, %2;")
+
+;; Comparisons and branches
+
+(define_insn "*cmp<mode>"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
+	(match_operator:BI 1 "nvptx_comparison_operator"
+	   [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
+	    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
+  ""
+  "%.\\tsetp%c1 %0,%2,%3;")
+
+(define_insn "*cmp<mode>"
+  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
+	(match_operator:BI 1 "nvptx_float_comparison_operator"
+	   [(match_operand:SDFM 2 "nvptx_register_operand" "R")
+	    (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
+  ""
+  "%.\\tsetp%c1 %0,%2,%3;")
+
+(define_insn "jump"
+  [(set (pc)
+	(label_ref (match_operand 0 "" "")))]
+  ""
+  "%.\\tbra\\t%l0;")
+
+(define_insn "br_true"
+  [(set (pc)
+	(if_then_else (ne (match_operand:BI 0 "nvptx_register_operand" "R")
+			  (const_int 0))
+		      (label_ref (match_operand 1 "" ""))
+		      (pc)))]
+  ""
+  "%j0\\tbra\\t%l1;")
+
+(define_insn "br_false"
+  [(set (pc)
+	(if_then_else (eq (match_operand:BI 0 "nvptx_register_operand" "R")
+			  (const_int 0))
+		      (label_ref (match_operand 1 "" ""))
+		      (pc)))]
+  ""
+  "%J0\\tbra\\t%l1;")
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+	(if_then_else (match_operator 0 "nvptx_comparison_operator"
+		       [(match_operand:HSDIM 1 "nvptx_register_operand" "")
+			(match_operand:HSDIM 2 "nvptx_register_operand" "")])
+		      (label_ref (match_operand 3 "" ""))
+		      (pc)))]
+  ""
+{
+  rtx t = nvptx_expand_compare (operands[0]);
+  operands[0] = t;
+  operands[1] = XEXP (t, 0);
+  operands[2] = XEXP (t, 1);
+})
+
+(define_expand "cbranch<mode>4"
+  [(set (pc)
+	(if_then_else (match_operator 0 "nvptx_float_comparison_operator"
+		       [(match_operand:SDFM 1 "nvptx_register_operand" "")
+			(match_operand:SDFM 2 "nvptx_register_operand" "")])
+		      (label_ref (match_operand 3 "" ""))
+		      (pc)))]
+  ""
+{
+  rtx t = nvptx_expand_compare (operands[0]);
+  operands[0] = t;
+  operands[1] = XEXP (t, 0);
+  operands[2] = XEXP (t, 1);
+})
+
+(define_expand "cbranchbi4"
+  [(set (pc)
+	(if_then_else (match_operator 0 "predicate_operator"
+		       [(match_operand:BI 1 "nvptx_register_operand" "")
+			(match_operand:BI 2 "const0_operand" "")])
+		      (label_ref (match_operand 3 "" ""))
+		      (pc)))]
+  ""
+  "")
+
+;; Conditional stores
+
+(define_insn "setcc_from_bi"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(ne:SI (match_operand:BI 1 "nvptx_register_operand" "R")
+	       (const_int 0)))]
+  ""
+  "%.\\tselp%t0 %0,-1,0,%1;")
+
+(define_insn "setcc_int<mode>"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(match_operator:SI 1 "nvptx_comparison_operator"
+			   [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
+			    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_insn "setcc_int<mode>"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(match_operator:SI 1 "nvptx_float_comparison_operator"
+			   [(match_operand:SDFM 2 "nvptx_register_operand" "R")
+			    (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_insn "setcc_float<mode>"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(match_operator:SF 1 "nvptx_comparison_operator"
+			   [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
+			    (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_insn "setcc_float<mode>"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(match_operator:SF 1 "nvptx_float_comparison_operator"
+			   [(match_operand:SDFM 2 "nvptx_register_operand" "R")
+			    (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")]))]
+  ""
+  "%.\\tset%t0%c1 %0,%2,%3;")
+
+(define_expand "cstorebi4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+	(match_operator:SI 1 "ne_operator"
+         [(match_operand:BI 2 "nvptx_register_operand")
+          (match_operand:BI 3 "const0_operand")]))]
+  ""
+  "")
+
+(define_expand "cstore<mode>4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+	(match_operator:SI 1 "nvptx_comparison_operator"
+         [(match_operand:HSDIM 2 "nvptx_register_operand")
+          (match_operand:HSDIM 3 "nvptx_nonmemory_operand")]))]
+  ""
+  "")
+
+(define_expand "cstore<mode>4"
+  [(set (match_operand:SI 0 "nvptx_register_operand")
+	(match_operator:SI 1 "nvptx_float_comparison_operator"
+         [(match_operand:SDFM 2 "nvptx_register_operand")
+          (match_operand:SDFM 3 "nvptx_nonmemory_operand")]))]
+  ""
+  "")
+
+;; Calls
+
+(define_insn "call_insn"
+  [(match_parallel 2 "call_operation"
+    [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "Rs"))
+	   (match_operand 1))])]
+  ""
+{
+  return nvptx_output_call_insn (insn, NULL_RTX, operands[0]);
+})
+
+(define_insn "call_value_insn"
+  [(match_parallel 3 "call_operation"
+    [(set (match_operand 0 "nvptx_register_operand" "=R")
+	  (call (mem:QI (match_operand:SI 1 "call_insn_operand" "Rs"))
+		(match_operand 2)))])]
+  ""
+{
+  return nvptx_output_call_insn (insn, operands[0], operands[1]);
+})
+
+(define_expand "call"
+ [(match_operand 0 "" "")]
+ ""
+{
+  nvptx_expand_call (NULL_RTX, operands[0]);
+  DONE;
+})
+
+(define_expand "call_value"
+  [(match_operand 0 "" "")
+   (match_operand 1 "" "")]
+ ""
+{
+  nvptx_expand_call (operands[0], operands[1]);
+  DONE;
+})
+
+;; Floating point arithmetic.
+
+(define_insn "add<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(plus:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		   (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tadd%t0\\t%0, %1, %2;")
+
+(define_insn "sub<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(minus:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		    (match_operand:SDFM 2 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tsub%t0\\t%0, %1, %2;")
+
+(define_insn "mul<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(mult:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		   (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tmul%t0\\t%0, %1, %2;")
+
+(define_insn "fma<mode>4"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(fma:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		  (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")
+		  (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tfma%#%t0\\t%0, %1, %2, %3;")
+
+(define_insn "div<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(div:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		  (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tdiv%#%t0\\t%0, %1, %2;")
+
+(define_insn "copysign<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(unspec:SDFM [(match_operand:SDFM 1 "nvptx_register_operand" "R")
+		      (match_operand:SDFM 2 "nvptx_register_operand" "R")]
+		      UNSPEC_COPYSIGN))]
+  ""
+  "%.\\tcopysign%t0\\t%0, %2, %1;")
+
+(define_insn "smin<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(smin:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		    (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tmin%t0\\t%0, %1, %2;")
+
+(define_insn "smax<mode>3"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(smax:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")
+		    (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")))]
+  ""
+  "%.\\tmax%t0\\t%0, %1, %2;")
+
+(define_insn "abs<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(abs:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tabs%t0\\t%0, %1;")
+
+(define_insn "neg<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(neg:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tneg%t0\\t%0, %1;")
+
+(define_insn "sqrt<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(sqrt:SDFM (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tsqrt%#%t0\\t%0, %1;")
+
+(define_insn "sinsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_SIN))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tsin.approx%t0\\t%0, %1;")
+
+(define_insn "cossf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_COS))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tcos.approx%t0\\t%0, %1;")
+
+(define_insn "log2sf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_LOG2))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tlg2.approx%t0\\t%0, %1;")
+
+(define_insn "exp2sf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(unspec:SF [(match_operand:SF 1 "nvptx_register_operand" "R")]
+		   UNSPEC_EXP2))]
+  "flag_unsafe_math_optimizations"
+  "%.\\tex2.approx%t0\\t%0, %1;")
+
+;; Conversions involving floating point
+
+(define_insn "extendsfdf2"
+  [(set (match_operand:DF 0 "nvptx_register_operand" "=R")
+	(float_extend:DF (match_operand:SF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%t0%t1\\t%0, %1;")
+
+(define_insn "truncdfsf2"
+  [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(float_truncate:SF (match_operand:DF 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0%t1\\t%0, %1;")
+
+(define_insn "floatunssi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(unsigned_float:SDFM (match_operand:SI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.u%T1\\t%0, %1;")
+
+(define_insn "floatsi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(float:SDFM (match_operand:SI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.s%T1\\t%0, %1;")
+
+(define_insn "floatunsdi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(unsigned_float:SDFM (match_operand:DI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.u%T1\\t%0, %1;")
+
+(define_insn "floatdi<mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(float:SDFM (match_operand:DI 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt%#%t0.s%T1\\t%0, %1;")
+
+(define_insn "fixuns_trunc<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(unsigned_fix:SI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt.rzi.u%T0%t1\\t%0, %1;")
+
+(define_insn "fix_trunc<mode>si2"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(fix:SI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt.rzi.s%T0%t1\\t%0, %1;")
+
+(define_insn "fixuns_trunc<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
+	(unsigned_fix:DI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt.rzi.u%T0%t1\\t%0, %1;")
+
+(define_insn "fix_trunc<mode>di2"
+  [(set (match_operand:DI 0 "nvptx_register_operand" "=R")
+	(fix:DI (match_operand:SDFM 1 "nvptx_register_operand" "R")))]
+  ""
+  "%.\\tcvt.rzi.s%T0%t1\\t%0, %1;")
+
+(define_int_iterator FPINT [UNSPEC_FPINT_FLOOR UNSPEC_FPINT_BTRUNC
+			    UNSPEC_FPINT_CEIL UNSPEC_FPINT_NEARBYINT])
+(define_int_attr fpint_name [(UNSPEC_FPINT_FLOOR "floor")
+			     (UNSPEC_FPINT_BTRUNC "btrunc")
+			     (UNSPEC_FPINT_CEIL "ceil")
+			     (UNSPEC_FPINT_NEARBYINT "nearbyint")])
+(define_int_attr fpint_roundingmode [(UNSPEC_FPINT_FLOOR ".rmi")
+				     (UNSPEC_FPINT_BTRUNC ".rzi")
+				     (UNSPEC_FPINT_CEIL ".rpi")
+				     (UNSPEC_FPINT_NEARBYINT "%#i")])
+
+(define_insn "<FPINT:fpint_name><SDFM:mode>2"
+  [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
+	(unspec:SDFM [(match_operand:SDFM 1 "nvptx_register_operand" "R")]
+		     FPINT))]
+  ""
+  "%.\\tcvt<FPINT:fpint_roundingmode>%t0%t1\\t%0, %1;")
+
+(define_int_iterator FPINT2 [UNSPEC_FPINT_FLOOR UNSPEC_FPINT_CEIL])
+(define_int_attr fpint2_name [(UNSPEC_FPINT_FLOOR "lfloor")
+			     (UNSPEC_FPINT_CEIL "lceil")])
+(define_int_attr fpint2_roundingmode [(UNSPEC_FPINT_FLOOR ".rmi")
+				     (UNSPEC_FPINT_CEIL ".rpi")])
+
+(define_insn "<FPINT2:fpint2_name><SDFM:mode><SDIM:mode>2"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(unspec:SDIM [(match_operand:SDFM 1 "nvptx_register_operand" "R")]
+		     FPINT2))]
+  ""
+  "%.\\tcvt<FPINT2:fpint2_roundingmode>.s%T0%t1\\t%0, %1;")
+
+;; Miscellaneous
+
+(define_insn "nop"
+  [(const_int 0)]
+  ""
+  "")
+
+(define_insn "return"
+  [(return)]
+  ""
+{
+  return nvptx_output_return ();
+})
+
+(define_expand "epilogue"
+  [(clobber (const_int 0))]
+  ""
+{
+  emit_jump_insn (gen_return ());
+  DONE;
+})
+
+(define_expand "nonlocal_goto"
+  [(match_operand 0 "" "")
+   (match_operand 1 "" "")
+   (match_operand 2 "" "")
+   (match_operand 3 "" "")]
+  ""
+{
+  sorry ("target cannot support nonlocal goto");
+  emit_insn (gen_nop ());
+  DONE;
+})
+
+(define_expand "nonlocal_goto_receiver"
+  [(const_int 0)]
+  ""
+{
+  sorry ("target cannot support nonlocal goto");
+})
+
+(define_insn "allocate_stack"
+  [(set (match_operand 0 "nvptx_register_operand" "=R")
+	(unspec [(match_operand 1 "nvptx_register_operand" "R")]
+		  UNSPEC_ALLOCA))]
+  ""
+  "%.\\tcall (%0), %%alloca, (%1);")
+
+(define_expand "restore_stack_block"
+  [(match_operand 0 "register_operand" "")
+   (match_operand 1 "register_operand" "")]
+  ""
+{
+  DONE;
+})
+
+(define_expand "restore_stack_function"
+  [(match_operand 0 "register_operand" "")
+   (match_operand 1 "register_operand" "")]
+  ""
+{
+  DONE;
+})
+
+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 0))]
+  ""
+  "trap;")
+
+(define_insn "trap_if_true"
+  [(trap_if (ne (match_operand:BI 0 "nvptx_register_operand" "R")
+		(const_int 0))
+	    (const_int 0))]
+  ""
+  "%j0 trap;")
+
+(define_insn "trap_if_false"
+  [(trap_if (eq (match_operand:BI 0 "nvptx_register_operand" "R")
+		(const_int 0))
+	    (const_int 0))]
+  ""
+  "%J0 trap;")
+
+(define_expand "ctrap<mode>4"
+  [(trap_if (match_operator 0 "nvptx_comparison_operator"
+			    [(match_operand:SDIM 1 "nvptx_register_operand")
+			     (match_operand:SDIM 2 "nvptx_nonmemory_operand")])
+	    (match_operand 3 "const_0_operand"))]
+  ""
+{
+  rtx t = nvptx_expand_compare (operands[0]);
+  emit_insn (gen_trap_if_true (t));
+  DONE;
+})
+
+(define_insn "*oacc_ntid_insn"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(unspec:SI [(match_operand:SI 1 "const_int_operand" "n")] UNSPEC_NTID))]
+  ""
+  "%.\\tmov.u32 %0, %%ntid%d1;")
+
+(define_expand "oacc_ntid"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "")
+	(unspec:SI [(match_operand:SI 1 "const_int_operand" "")] UNSPEC_NTID))]
+  ""
+{
+  if (INTVAL (operands[1]) < 0 || INTVAL (operands[1]) > 2)
+    FAIL;
+})
+
+(define_insn "*oacc_tid_insn"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
+	(unspec:SI [(match_operand:SI 1 "const_int_operand" "n")] UNSPEC_TID))]
+  ""
+  "%.\\tmov.u32 %0, %%tid%d1;")
+
+(define_expand "oacc_tid"
+  [(set (match_operand:SI 0 "nvptx_register_operand" "")
+	(unspec:SI [(match_operand:SI 1 "const_int_operand" "")] UNSPEC_TID))]
+  ""
+{
+  if (INTVAL (operands[1]) < 0 || INTVAL (operands[1]) > 2)
+    FAIL;
+})
+
+;; Atomic insns.
+
+(define_expand "atomic_compare_and_swap<mode>"
+  [(match_operand:SI 0 "nvptx_register_operand")	;; bool success output
+   (match_operand:SDIM 1 "nvptx_register_operand")	;; oldval output
+   (match_operand:SDIM 2 "memory_operand")		;; memory
+   (match_operand:SDIM 3 "nvptx_register_operand")	;; expected input
+   (match_operand:SDIM 4 "nvptx_register_operand")	;; newval input
+   (match_operand:SI 5 "const_int_operand")		;; is_weak
+   (match_operand:SI 6 "const_int_operand")		;; success model
+   (match_operand:SI 7 "const_int_operand")]		;; failure model
+  ""
+{
+  emit_insn (gen_atomic_compare_and_swap<mode>_1 (operands[1], operands[2], operands[3],
+					          operands[4], operands[6]));
+
+  rtx tmp = gen_reg_rtx (GET_MODE (operands[0]));
+  emit_insn (gen_cstore<mode>4 (tmp,
+				gen_rtx_EQ (SImode, operands[1], operands[3]),
+				operands[1], operands[3]));
+  emit_insn (gen_andsi3 (operands[0], tmp, GEN_INT (1)));
+  DONE;
+})
+
+(define_insn "atomic_compare_and_swap<mode>_1"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(unspec_volatile:SDIM
+	  [(match_operand:SDIM 1 "memory_operand" "+m")
+	   (match_operand:SDIM 2 "nvptx_register_operand" "R")
+	   (match_operand:SDIM 3 "nvptx_register_operand" "R")
+	   (match_operand:SI 4 "const_int_operand")]
+	  UNSPECV_CAS))
+   (set (match_dup 1)
+	(unspec_volatile:SDIM [(const_int 0)] UNSPECV_CAS))]
+  ""
+  "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;")
+
+(define_insn "atomic_exchange<mode>"
+  [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")	;; output
+	(unspec_volatile:SDIM
+	  [(match_operand:SDIM 1 "memory_operand" "+m")		;; memory
+	   (match_operand:SI 3 "const_int_operand")]		;; model
+	  UNSPECV_XCHG))
+   (set (match_dup 1)
+	(match_operand:SDIM 2 "nvptx_register_operand" "R"))]	;; input
+  ""
+  "%.\\tatom%A1.exch.b%T0\\t%0, %1, %2;")
+
+(define_insn "atomic_fetch_add<mode>"
+  [(set (match_operand:SDIM 1 "memory_operand" "+m")
+	(unspec_volatile:SDIM
+	  [(plus:SDIM (match_dup 1)
+		      (match_operand:SDIM 2 "nvptx_nonmemory_operand" "Ri"))
+	   (match_operand:SI 3 "const_int_operand")]		;; model
+	  UNSPECV_LOCK))
+   (set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(match_dup 1))]
+  ""
+  "%.\\tatom%A1.add%t0\\t%0, %1, %2;")
+
+(define_insn "atomic_fetch_addsf"
+  [(set (match_operand:SF 1 "memory_operand" "+m")
+	(unspec_volatile:SF
+	 [(plus:SF (match_dup 1)
+		   (match_operand:SF 2 "nvptx_nonmemory_operand" "RF"))
+	   (match_operand:SI 3 "const_int_operand")]		;; model
+	  UNSPECV_LOCK))
+   (set (match_operand:SF 0 "nvptx_register_operand" "=R")
+	(match_dup 1))]
+  ""
+  "%.\\tatom%A1.add%t0\\t%0, %1, %2;")
+
+(define_code_iterator any_logic [and ior xor])
+(define_code_attr logic [(and "and") (ior "or") (xor "xor")])
+
+;; Currently disabled until we add better subtarget support - requires sm_32.
+(define_insn "atomic_fetch_<logic><mode>"
+  [(set (match_operand:SDIM 1 "memory_operand" "+m")
+	(unspec_volatile:SDIM
+	  [(any_logic:SDIM (match_dup 1)
+			   (match_operand:SDIM 2 "nvptx_nonmemory_operand" "Ri"))
+	   (match_operand:SI 3 "const_int_operand")]		;; model
+	  UNSPECV_LOCK))
+   (set (match_operand:SDIM 0 "nvptx_register_operand" "=R")
+	(match_dup 1))]
+  "0"
+  "%.\\tatom%A1.b%T0.<logic>\\t%0, %1, %2;")
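Editorial aside on the atomic_compare_and_swap<mode> expander above: the
pattern emits the CAS, which leaves the previous memory contents in
operand 1, and then derives the boolean success output by comparing that
old value with the expected one.  The is_weak and failure-model operands
are not forwarded to the insn.  A rough C11 sketch of the same idea
follows.  cas_return_old and cas_with_flag are hypothetical names used
only for illustration, and the host-side <stdatomic.h> call merely stands
in for atom.cas.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CAS primitive that returns the value
   previously held in memory, as PTX atom.cas does.  */
static uint32_t
cas_return_old (_Atomic uint32_t *mem, uint32_t expected, uint32_t newval)
{
  uint32_t old = expected;
  /* On failure C11 stores the value it found into 'old'; on success
     'old' keeps the expected value, which is what memory held.  */
  atomic_compare_exchange_strong (mem, &old, newval);
  return old;
}

/* Mirrors the expander's shape: do the CAS, then compute the success
   flag by comparing the old value with the expected one.  */
int
cas_with_flag (_Atomic uint32_t *mem, uint32_t expected, uint32_t newval,
	       uint32_t *oldval)
{
  assert (mem != NULL && oldval != NULL);
  *oldval = cas_return_old (mem, expected, newval);
  return *oldval == expected;
}
```

Calling cas_with_flag on a cell holding 5 with expected 5 and newval 9
reports success and leaves 9 in the cell; repeating the call with the
same expected value then reports failure and hands back 9 as the old
value.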
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
new file mode 100644
index 0000000..bcdbc8c
--- /dev/null
+++ b/gcc/config/nvptx/nvptx.opt
@@ -0,0 +1,30 @@
+; Options for the NVPTX port
+; Copyright 2014 Free Software Foundation, Inc.
+;
+; This file is part of GCC.
+;
+; GCC is free software; you can redistribute it and/or modify it under
+; the terms of the GNU General Public License as published by the Free
+; Software Foundation; either version 3, or (at your option) any later
+; version.
+;
+; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+; FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+; for more details.
+;
+; You should have received a copy of the GNU General Public License
+; along with GCC; see the file COPYING3.  If not see
+; <http://www.gnu.org/licenses/>.
+
+m64
+Target Report RejectNegative Mask(ABI64)
+Generate code for a 64-bit ABI
+
+m32
+Target Report RejectNegative InverseMask(ABI64)
+Generate code for a 32-bit ABI
+
+mmainkernel
+Target Report RejectNegative
+Link in code for a __main kernel.
diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
new file mode 100644
index 0000000..8fa2136
--- /dev/null
+++ b/gcc/config/nvptx/t-nvptx
@@ -0,0 +1,2 @@
+#
+
diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index 073ed11..17c8bb1 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,16 @@
+2014-11-06  Bernd Schmidt  <bernds@codesourcery.com>
+
+	* config.host: Handle nvptx-*-*.
+	* shared-object.mk (as_flags-$o): Define.
+	($(base)$(objext), $(base)_s$(objext)): Use it instead of
+	-xassembler-with-cpp.
+	* static-object.mk: Identical changes.
+	* config/nvptx/t-nvptx: New file.
+	* config/nvptx/crt0.s: New file.
+	* config/nvptx/free.asm: New file.
+	* config/nvptx/malloc.asm: New file.
+	* config/nvptx/realloc.c: New file.
+
 2014-10-30  Joseph Myers  <joseph@codesourcery.com>
 
 	* Makefile.in (libgcc.map.in): New target.
diff --git a/libgcc/config.host b/libgcc/config.host
index f3cc276..9903d15 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1256,6 +1256,10 @@ mep*-*-*)
 	tmake_file="mep/t-mep t-fdpbit"
 	extra_parts="crtbegin.o crtend.o"
 	;;
+nvptx-*)
+	tmake_file="$tmake_file nvptx/t-nvptx"
+	extra_parts="crt0.o"
+	;;
 *)
 	echo "*** Configuration ${host} not supported" 1>&2
 	exit 1
diff --git a/libgcc/config/nvptx/crt0.s b/libgcc/config/nvptx/crt0.s
new file mode 100644
index 0000000..38327ed
--- /dev/null
+++ b/libgcc/config/nvptx/crt0.s
@@ -0,0 +1,45 @@
+	.version 3.1
+	.target	sm_30
+	.address_size 64
+
+.global .u64 %__exitval;
+// BEGIN GLOBAL FUNCTION DEF: abort
+.visible .func abort
+{
+        .reg .u64 %rd1;
+        ld.global.u64   %rd1,[%__exitval];
+        st.u32   [%rd1], 255;
+        exit;
+}
+// BEGIN GLOBAL FUNCTION DEF: exit
+.visible .func exit (.param .u32 %arg)
+{
+        .reg .u64 %rd1;
+	.reg .u32 %val;
+	ld.param.u32 %val,[%arg];
+        ld.global.u64   %rd1,[%__exitval];
+        st.u32   [%rd1], %val;
+        exit;
+}
+
+.extern .func (.param.u32 retval) main (.param.u32 argc, .param.u64 argv);
+
+.visible .entry __main (.param .u64 __retval, .param.u32 __argc, .param.u64 __argv)
+{
+        .reg .u32 %r<3>;
+        .reg .u64 %rd<3>;
+	.param.u32 %argc;
+	.param.u64 %argp;
+	.param.u32 %mainret;
+        ld.param.u64    %rd0, [__retval];
+        st.global.u64   [%__exitval], %rd0;
+
+	ld.param.u32	%r1, [__argc];
+	ld.param.u64	%rd1, [__argv];
+	st.param.u32	[%argc], %r1;
+	st.param.u64	[%argp], %rd1;
+        call.uni        (%mainret), main, (%argc, %argp);
+	ld.param.u32	%r1,[%mainret];
+        st.s32   [%rd0], %r1;
+        exit;
+}
diff --git a/libgcc/config/nvptx/free.asm b/libgcc/config/nvptx/free.asm
new file mode 100644
index 0000000..c7c56cf
--- /dev/null
+++ b/libgcc/config/nvptx/free.asm
@@ -0,0 +1,50 @@
+// A wrapper around free to enable a realloc implementation.
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+
+// This file is free software; you can redistribute it and/or modify it
+// under the terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option) any
+// later version.
+
+// This file is distributed in the hope that it will be useful, but
+// WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+// General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+        .version        3.1
+        .target sm_30
+        .address_size 64
+
+.extern .func free(.param.u64 %in_ar1);
+
+// BEGIN GLOBAL FUNCTION DEF: __nvptx_free
+.visible .func __nvptx_free(.param.u64 %in_ar1)
+{
+	.reg.u64 %ar1;
+	.reg.u64 %hr10;
+	.reg.u64 %r23;
+	.reg.pred %r25;
+	.reg.u64 %r27;
+	ld.param.u64 %ar1, [%in_ar1];
+		mov.u64	%r23, %ar1;
+		setp.eq.u64 %r25,%r23,0;
+	@%r25	bra	$L1;
+		add.u64	%r27, %r23, -8;
+	{
+		.param.u64 %out_arg0;
+		st.param.u64 [%out_arg0], %r27;
+		call free, (%out_arg0);
+	}
+$L1:
+	ret;
+	}
diff --git a/libgcc/config/nvptx/malloc.asm b/libgcc/config/nvptx/malloc.asm
new file mode 100644
index 0000000..9d9db10
--- /dev/null
+++ b/libgcc/config/nvptx/malloc.asm
@@ -0,0 +1,55 @@
+// A wrapper around malloc to enable a realloc implementation.
+
+// Copyright (C) 2014 Free Software Foundation, Inc.
+
+// This file is free software; you can redistribute it and/or modify it
+// under the terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option) any
+// later version.
+
+// This file is distributed in the hope that it will be useful, but
+// WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+// General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+        .version        3.1
+        .target sm_30
+        .address_size 64
+
+.extern .func (.param.u64 %out_retval) malloc(.param.u64 %in_ar1);
+
+// BEGIN GLOBAL FUNCTION DEF: __nvptx_malloc
+.visible .func (.param.u64 %out_retval) __nvptx_malloc(.param.u64 %in_ar1)
+{
+        .reg.u64 %ar1;
+.reg.u64 %retval;
+        .reg.u64 %hr10;
+        .reg.u64 %r26;
+        .reg.u64 %r28;
+        .reg.u64 %r29;
+        .reg.u64 %r31;
+        ld.param.u64 %ar1, [%in_ar1];
+		mov.u64 %r26, %ar1;
+		add.u64 %r28, %r26, 8;
+        {
+		.param.u64 %retval_in;
+		.param.u64 %out_arg0;
+		st.param.u64 [%out_arg0], %r28;
+		call (%retval_in), malloc, (%out_arg0);
+		ld.param.u64    %r29, [%retval_in];
+        }
+		st.u64  [%r29], %r26;
+		add.u64 %r31, %r29, 8;
+		mov.u64 %retval, %r31;
+		st.param.u64    [%out_retval], %retval;
+		ret;
+}
diff --git a/libgcc/config/nvptx/nvptx-malloc.h b/libgcc/config/nvptx/nvptx-malloc.h
new file mode 100644
index 0000000..137d73c
--- /dev/null
+++ b/libgcc/config/nvptx/nvptx-malloc.h
@@ -0,0 +1,26 @@
+/* Declarations for the malloc wrappers.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+extern void __nvptx_free (void *);
+extern void *__nvptx_malloc (size_t);
+extern void *__nvptx_realloc (void *, size_t);
diff --git a/libgcc/config/nvptx/realloc.c b/libgcc/config/nvptx/realloc.c
new file mode 100644
index 0000000..41cf554
--- /dev/null
+++ b/libgcc/config/nvptx/realloc.c
@@ -0,0 +1,51 @@
+/* Implement realloc with the help of the malloc and free wrappers.
+
+   Copyright (C) 2014 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdlib.h>
+#include <string.h>
+#include "nvptx-malloc.h"
+
+void *
+__nvptx_realloc (void *ptr, size_t newsz)
+{
+  if (newsz == 0)
+    {
+      __nvptx_free (ptr);
+      return NULL;
+    }
+  void *newptr = __nvptx_malloc (newsz);
+
+  size_t oldsz;
+  if (ptr == NULL)
+    oldsz = 0;
+  else
+    {
+      size_t *sp = __extension__ (size_t *)(ptr - 8);
+      oldsz = *sp;
+    }
+  if (oldsz != 0)
+    memcpy (newptr, ptr, oldsz > newsz ? newsz : oldsz);
+
+  __nvptx_free (ptr);
+  return newptr;
+}
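Editorial aside on the malloc/free/realloc wrappers above: __nvptx_malloc
asks for 8 extra bytes, stores the requested size at the start of the
block and returns the address 8 bytes in; __nvptx_free steps back 8 bytes
before calling free; and __nvptx_realloc reads the stored size to know
how much to copy.  A host-side C sketch of that layout is below.  The
sized_* names and HDR are hypothetical, and unlike the code in the patch
the sketch checks the allocation result and keeps the old block when the
new allocation fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name for the fixed 8-byte prefix used by the wrappers.  */
#define HDR 8
_Static_assert (sizeof (size_t) <= HDR, "size must fit in the header");

/* Allocate SIZE bytes plus a header that records SIZE.  */
void *
sized_malloc (size_t size)
{
  unsigned char *raw = malloc (size + HDR);
  if (raw == NULL)
    return NULL;
  memcpy (raw, &size, sizeof size);
  return raw + HDR;
}

/* Release a block obtained from sized_malloc; NULL is a no-op.  */
void
sized_free (void *ptr)
{
  if (ptr != NULL)
    free ((unsigned char *) ptr - HDR);
}

/* Read back the size recorded in front of PTR.  */
size_t
sized_usable (const void *ptr)
{
  size_t size;
  assert (ptr != NULL);
  memcpy (&size, (const unsigned char *) ptr - HDR, sizeof size);
  return size;
}

/* realloc built from the two wrappers: allocate, copy the smaller of
   the old and new sizes, then free the old block.  */
void *
sized_realloc (void *ptr, size_t newsz)
{
  if (newsz == 0)
    {
      sized_free (ptr);
      return NULL;
    }
  void *newptr = sized_malloc (newsz);
  if (newptr == NULL)
    return NULL;		/* Old block is left untouched.  */
  if (ptr != NULL)
    {
      size_t oldsz = sized_usable (ptr);
      memcpy (newptr, ptr, oldsz < newsz ? oldsz : newsz);
      sized_free (ptr);
    }
  return newptr;
}
```

The cost of the scheme is 8 bytes per allocation, in exchange for a
realloc that needs no knowledge of the underlying allocator's metadata.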
diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
new file mode 100644
index 0000000..08d3a67
--- /dev/null
+++ b/libgcc/config/nvptx/t-nvptx
@@ -0,0 +1,9 @@
+LIB2ADD=$(srcdir)/config/nvptx/malloc.asm \
+	$(srcdir)/config/nvptx/free.asm \
+	$(srcdir)/config/nvptx/realloc.c
+
+LIB2ADDEH=
+LIB2FUNCS_EXCLUDE=__main
+
+crt0.o: $(srcdir)/config/nvptx/crt0.s
+	cp $< $@
diff --git a/libgcc/shared-object.mk b/libgcc/shared-object.mk
index d9ee922..efac797 100644
--- a/libgcc/shared-object.mk
+++ b/libgcc/shared-object.mk
@@ -24,13 +24,15 @@ $(error Unsupported file type: $o)
 endif
 endif
 
+as_flags-$o := -xassembler$(if $(filter .S,$(suffix $o)),-with-cpp)
+
 $(base)$(objext): $o $(base).vis
-	$(gcc_compile) -c -xassembler-with-cpp -include $*.vis $<
+	$(gcc_compile) -c $(as_flags-$<) -include $*.vis $<
 
 $(base).vis: $(base)_s$(objext)
 	$(gen-hide-list)
 
 $(base)_s$(objext): $o
-	$(gcc_s_compile) -c -xassembler-with-cpp $<
+	$(gcc_s_compile) -c $(as_flags-$<) $<
 
 endif
diff --git a/libgcc/static-object.mk b/libgcc/static-object.mk
index 4f53636..891787e 100644
--- a/libgcc/static-object.mk
+++ b/libgcc/static-object.mk
@@ -24,13 +24,15 @@ $(error Unsupported file type: $o)
 endif
 endif
 
+as_flags-$o := -xassembler$(if $(filter .S,$(suffix $o)),-with-cpp)
+
 $(base)$(objext): $o $(base).vis
-	$(gcc_compile) -c -xassembler-with-cpp -include $*.vis $<
+	$(gcc_compile) -c $(as_flags-$<) -include $*.vis $<
 
 $(base).vis: $(base)_s$(objext)
 	$(gen-hide-list)
 
 $(base)_s$(objext): $o
-	$(gcc_s_compile) -c -xassembler-with-cpp $<
+	$(gcc_s_compile) -c $(as_flags-$<) $<
 
 endif
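
Editorial aside on the as_flags-$o definition above: $(suffix $o) yields
the file extension and $(filter .S,...) matches it case-sensitively, so
only .S sources get -xassembler-with-cpp, while any other suffix (for
instance .s or .asm) gets plain -xassembler.  The same selection written
as a small C function, where as_flags_for is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the -x flag the make expression would pick for PATH.  */
const char *
as_flags_for (const char *path)
{
  assert (path != NULL);
  /* Only look for the suffix in the last path component.  */
  const char *slash = strrchr (path, '/');
  const char *base = slash != NULL ? slash + 1 : path;
  const char *dot = strrchr (base, '.');
  if (dot != NULL && strcmp (dot, ".S") == 0)
    return "-xassembler-with-cpp";
  return "-xassembler";
}
```

With this, "config/nvptx/malloc.asm" maps to -xassembler and a name
ending in .S maps to -xassembler-with-cpp.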

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: The nvptx port [10/11+] Target files
  2014-11-10 16:33         ` Bernd Schmidt
@ 2014-11-10 20:06           ` Jakub Jelinek
  2014-11-10 20:37             ` H.J. Lu
  2014-11-10 20:40             ` H.J. Lu
  2014-12-12 20:18           ` Thomas Schwinge
  2014-12-23 18:51           ` nvptx-tools and nvptx-newlib (was: The nvptx port [10/11+] Target files) Thomas Schwinge
  2 siblings, 2 replies; 82+ messages in thread
From: Jakub Jelinek @ 2014-11-10 20:06 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Jeff Law, GCC Patches

On Mon, Nov 10, 2014 at 05:19:57PM +0100, Bernd Schmidt wrote:
> commit 659744a99d815b168716b4460e32f6a21593e494
> Author: Bernd Schmidt <bernds@codesourcery.com>
> Date:   Thu Nov 6 19:03:57 2014 +0100

Note, in r217301 you've committed a change to pr35468.c, not mentioned in
the ChangeLog, that uses no_const_addr_space effective target that is never
defined.  Can you please revert or commit a patch that adds support for that
to gcc/testsuite/lib/ ?

+ERROR: gcc.c-torture/compile/pr35468.c   -O0 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O0 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+ERROR: gcc.c-torture/compile/pr35468.c   -O1 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O1 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+ERROR: gcc.c-torture/compile/pr35468.c   -O2 -flto -flto-partition=none : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O2 -flto -flto-partition=none : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+ERROR: gcc.c-torture/compile/pr35468.c   -O2 -flto : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O2 -flto : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+ERROR: gcc.c-torture/compile/pr35468.c   -O2 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O2 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+ERROR: gcc.c-torture/compile/pr35468.c   -O3 -fomit-frame-pointer : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O3 -fomit-frame-pointer : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+ERROR: gcc.c-torture/compile/pr35468.c   -O3 -g : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O3 -g : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+ERROR: gcc.c-torture/compile/pr35468.c   -Os : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+UNRESOLVED: gcc.c-torture/compile/pr35468.c   -Os : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
+ERROR: gcc.dg/pr44194-1.c: syntax error in target selector "target  { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* }  &&  { ! powerpc*-*-linux* } || powerpc_elfv2  &&  ! nvptx-*-*" for " dg-do 1 compile { target { { { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* } } && { { ! powerpc*-*-linux* } || powerpc_elfv2 } && { ! nvptx-*-* } } } "
+UNRESOLVED: gcc.dg/pr44194-1.c: syntax error in target selector "target  { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* }  &&  { ! powerpc*-*-linux* } || powerpc_elfv2  &&  ! nvptx-*-*" for " dg-do 1 compile { target { { { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* } } && { { ! powerpc*-*-linux* } || powerpc_elfv2 } && { ! nvptx-*-* } } } "
+FAIL: gcc.dg/pr45352-1.c (test for excess errors)

	Jakub

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: The nvptx port [10/11+] Target files
  2014-11-10 20:06           ` Jakub Jelinek
@ 2014-11-10 20:37             ` H.J. Lu
  2014-11-10 20:40             ` H.J. Lu
  1 sibling, 0 replies; 82+ messages in thread
From: H.J. Lu @ 2014-11-10 20:37 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Bernd Schmidt, Jeff Law, GCC Patches

On Mon, Nov 10, 2014 at 12:04 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Nov 10, 2014 at 05:19:57PM +0100, Bernd Schmidt wrote:
>> commit 659744a99d815b168716b4460e32f6a21593e494
>> Author: Bernd Schmidt <bernds@codesourcery.com>
>> Date:   Thu Nov 6 19:03:57 2014 +0100
>
> Note, in r217301 you've committed a change to pr35468.c, not mentioned in
> the ChangeLog, that uses no_const_addr_space effective target that is never
> defined.  Can you please revert or commit a patch that adds support for that
> to gcc/testsuite/lib/ ?
>
> +ERROR: gcc.c-torture/compile/pr35468.c   -O0 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O0 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -O1 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O1 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -O2 -flto -flto-partition=none : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O2 -flto -flto-partition=none : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -O2 -flto : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O2 -flto : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -O2 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O2 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -O3 -fomit-frame-pointer : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O3 -fomit-frame-pointer : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -O3 -g : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O3 -g : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -Os : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -Os : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.dg/pr44194-1.c: syntax error in target selector "target  { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* }  &&  { ! powerpc*-*-linux* } || powerpc_elfv2  &&  ! nvptx-*-*" for " dg-do 1 compile { target { { { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* } } && { { ! powerpc*-*-linux* } || powerpc_elfv2 } && { ! nvptx-*-* } } } "
> +UNRESOLVED: gcc.dg/pr44194-1.c: syntax error in target selector "target  { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* }  &&  { ! powerpc*-*-linux* } || powerpc_elfv2  &&  ! nvptx-*-*" for " dg-do 1 compile { target { { { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* } } && { { ! powerpc*-*-linux* } || powerpc_elfv2 } && { ! nvptx-*-* } } } "
> +FAIL: gcc.dg/pr45352-1.c (test for excess errors)
>
>         Jakub



-- 
H.J.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: The nvptx port [10/11+] Target files
  2014-11-10 20:06           ` Jakub Jelinek
  2014-11-10 20:37             ` H.J. Lu
@ 2014-11-10 20:40             ` H.J. Lu
  2014-11-10 20:42               ` Mike Stump
  1 sibling, 1 reply; 82+ messages in thread
From: H.J. Lu @ 2014-11-10 20:40 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Bernd Schmidt, Jeff Law, GCC Patches

On Mon, Nov 10, 2014 at 12:04 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Nov 10, 2014 at 05:19:57PM +0100, Bernd Schmidt wrote:
>> commit 659744a99d815b168716b4460e32f6a21593e494
>> Author: Bernd Schmidt <bernds@codesourcery.com>
>> Date:   Thu Nov 6 19:03:57 2014 +0100
>
> Note, in r217301 you've committed a change to pr35468.c, not mentioned in
> the ChangeLog, that uses no_const_addr_space effective target that is never
> defined.  Can you please revert or commit a patch that adds support for that
> to gcc/testsuite/lib/ ?
>
> +ERROR: gcc.c-torture/compile/pr35468.c   -O0 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O0 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -O1 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O1 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -O2 -flto -flto-partition=none : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O2 -flto -flto-partition=none : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -O2 -flto : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O2 -flto : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -O2 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O2 : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -O3 -fomit-frame-pointer : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O3 -fomit-frame-pointer : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -O3 -g : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -O3 -g : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.c-torture/compile/pr35468.c   -Os : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +UNRESOLVED: gcc.c-torture/compile/pr35468.c   -Os : unknown effective target keyword \`no_const_addr_space' for " dg-require-effective-target 2 no_const_addr_space "
> +ERROR: gcc.dg/pr44194-1.c: syntax error in target selector "target  { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* }  &&  { ! powerpc*-*-linux* } || powerpc_elfv2  &&  ! nvptx-*-*" for " dg-do 1 compile { target { { { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* } } && { { ! powerpc*-*-linux* } || powerpc_elfv2 } && { ! nvptx-*-* } } } "
> +UNRESOLVED: gcc.dg/pr44194-1.c: syntax error in target selector "target  { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* }  &&  { ! powerpc*-*-linux* } || powerpc_elfv2  &&  ! nvptx-*-*" for " dg-do 1 compile { target { { { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* } } && { { ! powerpc*-*-linux* } || powerpc_elfv2 } && { ! nvptx-*-* } } } "
> +FAIL: gcc.dg/pr45352-1.c (test for excess errors)
>
>         Jakub

I reverted the change in gcc.c-torture/compile/pr35468.c.
I also checked in this patch to add missing braces in
gcc.dg/pr44194-1.c.



-- 
H.J.
-----
Index: ChangeLog
===================================================================
--- ChangeLog (revision 217315)
+++ ChangeLog (working copy)
@@ -1,3 +1,7 @@
+2014-11-10  H.J. Lu  <hongjiu.lu@intel.com>
+
+ * gcc.dg/pr44194-1.c (dg-do): Add missing braces.
+
 2014-11-10 Roman Gareev  <gareevroman@gmail.com>

  * gcc.dg/graphite/isl-ast-gen-blocks-1.c: Remove using of
Index: gcc.dg/pr44194-1.c
===================================================================
--- gcc.dg/pr44194-1.c (revision 217315)
+++ gcc.dg/pr44194-1.c (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { { { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* } } && { { ! powerpc*-*-linux* } || powerpc_elfv2 } && { ! nvptx-*-* } } } } */
+/* { dg-do compile { target { { { { { { { { i?86-*-* x86_64-*-* } && x32 } || lp64 } && { ! s390*-*-* } } && { ! hppa*64*-*-* } } && { ! alpha*-*-* } } && { { ! powerpc*-*-linux* } || powerpc_elfv2 } && { ! nvptx-*-* } } } } } */
 /* { dg-options "-O2 -fdump-rtl-dse1 -fdump-rtl-final" } */

 /* Restrict to 64-bit targets since 32-bit targets usually return small

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: The nvptx port [10/11+] Target files
  2014-11-10 20:40             ` H.J. Lu
@ 2014-11-10 20:42               ` Mike Stump
  0 siblings, 0 replies; 82+ messages in thread
From: Mike Stump @ 2014-11-10 20:42 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jakub Jelinek, Bernd Schmidt, Jeff Law, GCC Patches

On Nov 10, 2014, at 12:37 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> I also checked in this patch to add missing braces in
> gcc.dg/pr44194-1.c.

Thanks.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: The nvptx port [0/11+]
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (13 preceding siblings ...)
  2014-10-21  9:17 ` Jakub Jelinek
@ 2014-11-12 12:36 ` Richard Biener
  2014-11-12 21:39   ` Jeff Law
  2015-02-18  7:48 ` nvptx-none: Define empty GOMP_SELF_SPECS (was: The nvptx port [0/11+]) Thomas Schwinge
  2015-02-18  8:01 ` The nvptx port [0/11+] Thomas Schwinge
  16 siblings, 1 reply; 82+ messages in thread
From: Richard Biener @ 2014-11-12 12:36 UTC (permalink / raw)
  To: Bernd Schmidt, David Edelsohn; +Cc: GCC Patches

On Mon, Oct 20, 2014 at 4:17 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> This is a patch kit that adds the nvptx port to gcc. It contains preliminary
> patches to add needed functionality, the target files, and one somewhat
> optional patch with additional target tools. There'll be more patch series,
> one for the testsuite, and one to make the offload functionality work with
> this port. Also required are the previous four rtl patches, two of which
> weren't entirely approved yet.
>
> For the moment, I've stripped out all the address space support that got
> bogged down in review by brokenness in our representation of address spaces.
> The ptx address spaces are of course still defined and used inside the
> backend.
>
> Ptx really isn't a usual target - it is a virtual target which is then
> translated by another compiler (ptxas) to the final code that runs on the
> GPU. There are many restrictions, some imposed by the GPU hardware, and some
> by the fact that not everything you'd want can be represented in ptx. Here
> are some of the highlights:
>  * Everything is typed - variables, functions, registers. This can
>    cause problems with K&R style C or anything else that doesn't
>    have a proper type internally.
>  * Declarations are needed, even for undefined variables.
>  * Can't emit initializers referring to their variable's address since
>    you can't write forward declarations for variables.
>  * Variables can be declared only as scalars or arrays, not
>    structures. Initializers must be in the variable's declared type,
>    which requires some code in the backend, and it means that packed
>    pointer values are not representable.
>  * Since it's a virtual target, we skip register allocation - no good
>    can probably come from doing that twice. This means asm statements
>    aren't fixed up and will fail if they use matching constraints.
>  * No support for indirect jumps, label values, nonlocal gotos.
>  * No alloca - ptx defines it, but it's not implemented.
>  * No trampolines.
>  * No debugging (at all, for now - we may add line number directives).
>  * Limited C library support - I have a hacked up copy of newlib
>    that provides a reasonable subset.
>  * malloc and free are defined by ptx (these appear to be
>    undocumented), but there isn't a realloc. I have one patch for
>    Fortran to use a malloc/memcpy helper function in cases where we
>    know the old size.
>
> All in all, this is not intended to be used as a C (or any other source
> language) compiler. I've gone through a lot of effort to make it work
> reasonably well, but only in order to get sufficient test coverage from the
> testsuites. The intended use for this is only to build it as an offload
> compiler, and use it through OpenACC by way of lto1. That leaves the
> question of how we should document it - does it need the usual constraint
> and option documentation, given that users aren't expected to use any of
> it?
>
> A slightly earlier version of the entire patch kit was bootstrapped and
> tested on x86_64-linux. Ok for trunk?

Now that this has been committed - I notice that there is no entry
in MAINTAINERS for the port.  I propose Bernd.

Thanks,
Richard.

>
> Bernd

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: The nvptx port [0/11+]
  2014-11-12 12:36 ` Richard Biener
@ 2014-11-12 21:39   ` Jeff Law
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff Law @ 2014-11-12 21:39 UTC (permalink / raw)
  To: Richard Biener, Bernd Schmidt, David Edelsohn; +Cc: GCC Patches

On 11/12/14 05:34, Richard Biener wrote:

>
> Now that this has been committed - I notice that there is no entry
> in MAINTAINERS for the port.  I propose Bernd.
Well, ahead of you there.   I proposed Bernd to the steering committee 
as the maintainer a little while ago.  I need to go back and count votes :-)

jeff

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: The nvptx port [10/11+] Target files
  2014-11-10 16:33         ` Bernd Schmidt
  2014-11-10 20:06           ` Jakub Jelinek
@ 2014-12-12 20:18           ` Thomas Schwinge
  2014-12-23 18:51           ` nvptx-tools and nvptx-newlib (was: The nvptx port [10/11+] Target files) Thomas Schwinge
  2 siblings, 0 replies; 82+ messages in thread
From: Thomas Schwinge @ 2014-12-12 20:18 UTC (permalink / raw)
  To: Bernd Schmidt, Jeff Law, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 5116 bytes --]

Hi!

On Mon, 10 Nov 2014 17:19:57 +0100, Bernd Schmidt <bernds@codesourcery.com> wrote:
> I've now committed it, in the following form.

> --- /dev/null
> +++ b/gcc/config/nvptx/nvptx.h
> @@ -0,0 +1,356 @@

> +#define ASM_OUTPUT_ALIGN(FILE, POWER)

Committed to trunk in r218689:

commit 61f8a1bd770ded96fcff88f3cbc426a23c413992
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Dec 12 20:14:10 2014 +0000

    nvptx: Define valid ASM_OUTPUT_ALIGN.
    
    	gcc/
    	* config/nvptx/nvptx.h (ASM_OUTPUT_ALIGN): Define as a C statement.
    
        gcc/doc/tm.texi:@defmac ASM_OUTPUT_ALIGN (@var{stream}, @var{power})
        gcc/doc/tm.texi-A C statement to output to the stdio stream @var{stream} an assembler
        gcc/doc/tm.texi-command to advance the location counter to a multiple of 2 to the
        gcc/doc/tm.texi-@var{power} bytes.  @var{power} will be a C expression of type @code{int}.
        gcc/doc/tm.texi-@end defmac
    
        gcc/config/nvptx/nvptx.h:#define ASM_OUTPUT_ALIGN(FILE, POWER)
    
    "Empty" is not a C statement, and so in code such as:
    
        gcc/dwarf2out.c-              if (lsda_encoding == DW_EH_PE_aligned)
        gcc/dwarf2out.c:                ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (PTR_SIZE));
        gcc/dwarf2out.c-              dw2_asm_output_data (size_of_encoded_value (lsda_encoding), 0,
        gcc/dwarf2out.c-                                   "Language Specific Data Area (none)");
    
        gcc/varasm.c-      if (align > BITS_PER_UNIT)
        gcc/varasm.c:        ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
        gcc/varasm.c-      assemble_variable_contents (decl, name, dont_output_data);
    
        gcc/varasm.c-  if (align > 0)
        gcc/varasm.c:    ASM_OUTPUT_ALIGN (asm_out_file, align);
        gcc/varasm.c-
        gcc/varasm.c-  targetm.asm_out.internal_label (asm_out_file, "LTRAMP", 0);
    
        gcc/varasm.c-      if (align > BITS_PER_UNIT)
        gcc/varasm.c:        ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
        gcc/varasm.c-      assemble_constant_contents (exp, XSTR (symbol, 0), align);
    
    ..., GCC warns:
    
        [...]/source-gcc/gcc/dwarf2out.c: In function 'void output_fde(dw_fde_ref, bool, bool, char*, int, char*, bool, int)':
        [...]/source-gcc/gcc/dwarf2out.c:665:3: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
           ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (PTR_SIZE));
           ^
    
        [...]/source-gcc/gcc/varasm.c: In function 'void assemble_variable(tree, int, int, int)':
        [...]/source-gcc/gcc/varasm.c:2217:2: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
          ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
          ^
        [...]/source-gcc/gcc/varasm.c: In function 'rtx_def* assemble_trampoline_template()':
        [...]/source-gcc/gcc/varasm.c:2603:5: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
             ASM_OUTPUT_ALIGN (asm_out_file, align);
             ^
        [...]/source-gcc/gcc/varasm.c: In function 'void output_constant_def_contents(rtx)':
        [...]/source-gcc/gcc/varasm.c:3413:2: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
          ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT));
          ^
    
    Also, "use" the values, to get rid of that one:
    
        [...]/source-gcc/gcc/final.c: In function 'rtx_insn* final_scan_insn(rtx_insn*, FILE*, int, int, int*)':
        [...]/source-gcc/gcc/final.c:2450:12: warning: variable 'log_align' set but not used [-Wunused-but-set-variable]
                int log_align;
                    ^
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@218689 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog            |  4 ++++
 gcc/config/nvptx/nvptx.h | 10 +++++++++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 689c4fd..e5de2c6 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,7 @@
+2014-12-12  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* config/nvptx/nvptx.h (ASM_OUTPUT_ALIGN): Define as a C statement.
+
 2014-12-12  Vladimir Makarov  <vmakarov@redhat.com>
 
 	PR target/64110
diff --git gcc/config/nvptx/nvptx.h gcc/config/nvptx/nvptx.h
index c222375..5f08ba7 100644
--- gcc/config/nvptx/nvptx.h
+++ gcc/config/nvptx/nvptx.h
@@ -281,9 +281,17 @@ struct GTY(()) machine_function
     }								\
   while (0)
 
-#define ASM_OUTPUT_ALIGN(FILE, POWER)
+#define ASM_OUTPUT_ALIGN(FILE, POWER)		\
+  do						\
+    {						\
+      (void) (FILE);				\
+      (void) (POWER);				\
+    }						\
+  while (0)
+
 #define ASM_OUTPUT_SKIP(FILE, N)		\
   nvptx_output_skip (FILE, N)
+
 #undef  ASM_OUTPUT_ASCII
 #define ASM_OUTPUT_ASCII(FILE, STR, LENGTH)			\
   nvptx_output_ascii (FILE, STR, LENGTH);


Grüße,
 Thomas


^ permalink raw reply	[flat|nested] 82+ messages in thread

* nvptx-tools and nvptx-newlib (was: The nvptx port [10/11+] Target files)
  2014-11-10 16:33         ` Bernd Schmidt
  2014-11-10 20:06           ` Jakub Jelinek
  2014-12-12 20:18           ` Thomas Schwinge
@ 2014-12-23 18:51           ` Thomas Schwinge
  2015-02-02 15:33             ` Thomas Schwinge
  2 siblings, 1 reply; 82+ messages in thread
From: Thomas Schwinge @ 2014-12-23 18:51 UTC (permalink / raw)
  To: GCC Patches; +Cc: Bernd Schmidt

[-- Attachment #1: Type: text/plain, Size: 655 bytes --]

Hi!

On Mon, 10 Nov 2014 17:19:57 +0100, Bernd Schmidt <bernds@codesourcery.com> wrote:
> The scripts (11/11) I've put up on github, along with a hacked up 
> newlib. These are at
> 
> https://github.com/bernds/nvptx-tools
> https://github.com/bernds/nvptx-newlib
> 
> They are likely to migrate to MentorEmbedded from bernds, but that had 
> some permissions problems last week.

That has recently been done:
<https://github.com/MentorEmbedded/nvptx-tools> and
<https://github.com/MentorEmbedded/nvptx-newlib> are now available.

(I'm aware that we still are to write up how to actually build and test
all this.)


Grüße,
 Thomas


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: nvptx-tools and nvptx-newlib (was: The nvptx port [10/11+] Target files)
  2014-12-23 18:51           ` nvptx-tools and nvptx-newlib (was: The nvptx port [10/11+] Target files) Thomas Schwinge
@ 2015-02-02 15:33             ` Thomas Schwinge
  2015-02-04  9:43               ` Jakub Jelinek
  0 siblings, 1 reply; 82+ messages in thread
From: Thomas Schwinge @ 2015-02-02 15:33 UTC (permalink / raw)
  To: GCC Patches, Jakub Jelinek; +Cc: Bernd Schmidt

[-- Attachment #1: Type: text/plain, Size: 4628 bytes --]

Hi!

On Tue, 23 Dec 2014 19:49:35 +0100, I wrote:
> On Mon, 10 Nov 2014 17:19:57 +0100, Bernd Schmidt <bernds@codesourcery.com> wrote:
> > The scripts (11/11) I've put up on github, along with a hacked up 
> > newlib. These are at [...]

> > They are likely to migrate to MentorEmbedded from bernds, but that had 
> > some permissions problems last week.
> 
> That has recently been done:
> <https://github.com/MentorEmbedded/nvptx-tools> and
> <https://github.com/MentorEmbedded/nvptx-newlib> are now available.
> 
> (I'm aware that we still are to write up how to actually build and test
> all this.)

I just updated
<https://gcc.gnu.org/wiki/Offloading?action=diff&rev2=26&rev1=25>.

OK to check in the following to trunk?

commit a0c73cb76d1f13642df7725d64bc618ee0909abc
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Mon Feb 2 16:29:36 2015 +0100

    Begin documenting the nvptx backend.
    
    	gcc/
    	* doc/install.texi (nvptx-*-none): New section.
    	* doc/invoke.texi (Nvidia PTX Options): Likewise.
    	* config/nvptx/nvptx.opt: Update.
---
 gcc/config/nvptx/nvptx.opt | 10 +++++-----
 gcc/doc/install.texi       | 23 +++++++++++++++++++++++
 gcc/doc/invoke.texi        | 26 ++++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 5 deletions(-)

diff --git gcc/config/nvptx/nvptx.opt gcc/config/nvptx/nvptx.opt
index 1448dfc..249a61d 100644
--- gcc/config/nvptx/nvptx.opt
+++ gcc/config/nvptx/nvptx.opt
@@ -17,13 +17,13 @@
 ; along with GCC; see the file COPYING3.  If not see
 ; <http://www.gnu.org/licenses/>.
 
-m64
-Target Report RejectNegative Mask(ABI64)
-Generate code for a 64 bit ABI
-
 m32
 Target Report RejectNegative InverseMask(ABI64)
-Generate code for a 32 bit ABI
+Generate code for a 32-bit ABI
+
+m64
+Target Report RejectNegative Mask(ABI64)
+Generate code for a 64-bit ABI
 
 mmainkernel
 Target Report RejectNegative
diff --git gcc/doc/install.texi gcc/doc/install.texi
index c9e3bf1..b31f9b6 100644
--- gcc/doc/install.texi
+++ gcc/doc/install.texi
@@ -3302,6 +3302,8 @@ information have to.
 @item
 @uref{#nds32be-x-elf,,nds32be-*-elf}
 @item
+@uref{#nvptx-x-none,,nvptx-*-none}
+@item
 @uref{#powerpc-x-x,,powerpc*-*-*}
 @item
 @uref{#powerpc-x-darwin,,powerpc-*-darwin*}
@@ -4269,6 +4271,27 @@ Andes NDS32 target in big endian mode.
 @html
 <hr />
 @end html
+@anchor{nvptx-x-none}
+@heading nvptx-*-none
+Nvidia PTX target.
+
+Instead of GNU binutils, you will need to install
+@uref{https://github.com/MentorEmbedded/nvptx-tools/,,nvptx-tools}.
+Tell GCC where to find it:
+@option{--with-build-time-tools=[install-nvptx-tools]/nvptx-none/bin}.
+
+A nvptx port of newlib is available at
+@uref{https://github.com/MentorEmbedded/nvptx-newlib/,,nvptx-newlib}.
+It can be automatically built together with GCC@.  For this, add a
+symbolic link to nvptx-newlib's @file{newlib} directory to the
+directory containing the GCC sources.
+
+Use the @option{--disable-sjlj-exceptions} and
+@option{--enable-newlib-io-long-long} options when configuring.
+
+@html
+<hr />
+@end html
 @anchor{powerpc-x-x}
 @heading powerpc-*-*
 You can specify a default version for the @option{-mcpu=@var{cpu_type}}
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index ba81ec7..1fb329e 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -840,6 +840,9 @@ Objective-C and Objective-C++ Dialects}.
 -mcustom-fpu-cfg=@var{name} @gol
 -mhal -msmallc -msys-crt0=@var{name} -msys-lib=@var{name}}
 
+@emph{Nvidia PTX Options}
+@gccoptlist{-m32 -m64 -mmainkernel}
+
 @emph{PDP-11 Options}
 @gccoptlist{-mfpu  -msoft-float  -mac0  -mno-ac0  -m40  -m45  -m10 @gol
 -mbcopy  -mbcopy-builtin  -mint32  -mno-int16 @gol
@@ -11967,6 +11970,7 @@ platform.
 * MSP430 Options::
 * NDS32 Options::
 * Nios II Options::
+* Nvidia PTX Options::
 * PDP-11 Options::
 * picoChip Options::
 * PowerPC Options::
@@ -18277,6 +18281,28 @@ This option is typically used to link with a library provided by a HAL BSP.
 
 @end table
 
+@node Nvidia PTX Options
+@subsection Nvidia PTX Options
+@cindex Nvidia PTX options
+@cindex nvptx options
+
+These options are defined for Nvidia PTX:
+
+@table @gcctabopt
+
+@item -m32
+@itemx -m64
+@opindex m32
+@opindex m64
+Generate code for 32-bit or 64-bit ABI.
+
+@item -mmainkernel
+@opindex mmainkernel
+Link in code for a __main kernel.  This is for stand-alone instead of
+offloading execution.
+
+@end table
+
 @node PDP-11 Options
 @subsection PDP-11 Options
 @cindex PDP-11 Options


Grüße,
 Thomas


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: nvptx-tools and nvptx-newlib (was: The nvptx port [10/11+] Target files)
  2015-02-02 15:33             ` Thomas Schwinge
@ 2015-02-04  9:43               ` Jakub Jelinek
  2015-02-18  8:50                 ` Thomas Schwinge
  0 siblings, 1 reply; 82+ messages in thread
From: Jakub Jelinek @ 2015-02-04  9:43 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: GCC Patches, Bernd Schmidt

On Mon, Feb 02, 2015 at 04:32:34PM +0100, Thomas Schwinge wrote:
> Hi!
> 
> On Tue, 23 Dec 2014 19:49:35 +0100, I wrote:
> > On Mon, 10 Nov 2014 17:19:57 +0100, Bernd Schmidt <bernds@codesourcery.com> wrote:
> > > The scripts (11/11) I've put up on github, along with a hacked up 
> > > newlib. These are at [...]
> 
> > > They are likely to migrate to MentorEmbedded from bernds, but that had 
> > > some permissions problems last week.
> > 
> > That has recently been done:
> > <https://github.com/MentorEmbedded/nvptx-tools> and
> > <https://github.com/MentorEmbedded/nvptx-newlib> are now available.
> > 
> > (I'm aware that we still are to write up how to actually build and test
> > all this.)
> 
> I just updated
> <https://gcc.gnu.org/wiki/Offloading?action=diff&rev2=26&rev1=25>.

Can you please update the gmane URLs to corresponding
https://gcc.gnu.org/ml/gcc-patches/ URLs?  We have our own mailing list
archives, no need to use third party ones.
> 
> OK to check in the following to trunk?

> --- gcc/config/nvptx/nvptx.opt
> +++ gcc/config/nvptx/nvptx.opt
> @@ -17,13 +17,13 @@
>  ; along with GCC; see the file COPYING3.  If not see
>  ; <http://www.gnu.org/licenses/>.
>  
> -m64
> -Target Report RejectNegative Mask(ABI64)
> -Generate code for a 64 bit ABI
> -
>  m32
>  Target Report RejectNegative InverseMask(ABI64)
> -Generate code for a 32 bit ABI
> +Generate code for a 32-bit ABI
> +
> +m64
> +Target Report RejectNegative Mask(ABI64)
> +Generate code for a 64-bit ABI

I'd expect you want also Negative(m64) on the m32 option and
Negative(m32) on the m64 option.

> +@table @gcctabopt
> +
> +@item -m32
> +@itemx -m64
> +@opindex m32
> +@opindex m64
> +Generate code for 32-bit or 64-bit ABI.

I guess you should mention which one of those is the default (if it isn't
configure time configurable).

What about multilibs, is newlib built for both -m32 and -m64, or just the
default option?

	Jakub

^ permalink raw reply	[flat|nested] 82+ messages in thread

* nvptx-none: Define empty GOMP_SELF_SPECS (was: The nvptx port [0/11+])
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (14 preceding siblings ...)
  2014-11-12 12:36 ` Richard Biener
@ 2015-02-18  7:48 ` Thomas Schwinge
  2015-02-18  8:01 ` The nvptx port [0/11+] Thomas Schwinge
  16 siblings, 0 replies; 82+ messages in thread
From: Thomas Schwinge @ 2015-02-18  7:48 UTC (permalink / raw)
  To: GCC Patches; +Cc: Bernd Schmidt

[-- Attachment #1: Type: text/plain, Size: 1880 bytes --]

Hi!

On Mon, 20 Oct 2014 16:17:56 +0200, Bernd Schmidt <bernds@codesourcery.com> wrote:
> This is a patch kit that adds the nvptx port to gcc.

I wonder why we haven't been seeing this in our internal development
branch -- maybe because on that branch we're still discarding more
compiler options in the offloading path?

Committed to trunk in r220780:

commit 2fdc66a9fcfbc5b77c1c03d7c34893a0a086e8f8
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Feb 18 07:45:42 2015 +0000

    nvptx-none: Define empty GOMP_SELF_SPECS.
    
    Otherwise, offloading with -fopenacc or -fopenmp active will run into:
    
        x86_64-unknown-linux-gnu-accel-nvptx-none-gcc: error: unrecognized command line option '-pthread'
    
    	gcc/
    	* config/nvptx/nvptx.h (GOMP_SELF_SPECS): Define macro.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@220780 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog            |    4 ++++
 gcc/config/nvptx/nvptx.h |    4 ++++
 2 files changed, 8 insertions(+)

diff --git gcc/ChangeLog gcc/ChangeLog
index 2c75df6..180a605 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,7 @@
+2015-02-18  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* config/nvptx/nvptx.h (GOMP_SELF_SPECS): Define macro.
+
 2015-02-18  Andrew Pinski  <apinski@cavium.com>
 	    Naveen H.S  <Naveen.Hurugalawadi@caviumnetworks.com>
 
diff --git gcc/config/nvptx/nvptx.h gcc/config/nvptx/nvptx.h
index 9a9954b..e74d16f 100644
--- gcc/config/nvptx/nvptx.h
+++ gcc/config/nvptx/nvptx.h
@@ -33,6 +33,10 @@
       builtin_define ("__nvptx__");		\
     } while (0)
 
+/* Avoid the default in ../../gcc.c, which adds "-pthread", which is not
+   supported for nvptx.  */
+#define GOMP_SELF_SPECS ""
+
 /* Storage Layout.  */
 
 #define BITS_BIG_ENDIAN 0


Grüße,
 Thomas


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: The nvptx port [0/11+]
  2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
                   ` (15 preceding siblings ...)
  2015-02-18  7:48 ` nvptx-none: Define empty GOMP_SELF_SPECS (was: The nvptx port [0/11+]) Thomas Schwinge
@ 2015-02-18  8:01 ` Thomas Schwinge
  16 siblings, 0 replies; 82+ messages in thread
From: Thomas Schwinge @ 2015-02-18  8:01 UTC (permalink / raw)
  To: GCC Patches; +Cc: Bernd Schmidt

[-- Attachment #1: Type: text/plain, Size: 3064 bytes --]

Hi!

On Mon, 20 Oct 2014 16:17:56 +0200, Bernd Schmidt <bernds@codesourcery.com> wrote:
> This is a patch kit that adds the nvptx port to gcc.

Committed to trunk in r220781:

commit 0f7695734890f93fe58179e36ac2f41bf4147d78
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Feb 18 08:01:03 2015 +0000

    nvptx-none: Disable the lto-plugin.
    
    	config/
    	* elf.m4 (ACX_ELF_TARGET_IFELSE): nvptx-*-none isn't ELF.
    	/
    	* configure: Regenerate.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@220781 138bc75d-0d04-0410-961f-82ee72b054a4
---
 ChangeLog        |    4 ++++
 config/ChangeLog |    4 ++++
 config/elf.m4    |    7 +++++--
 configure        |    3 ++-
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git ChangeLog ChangeLog
index 0969af5..a9e4437 100644
--- ChangeLog
+++ ChangeLog
@@ -1,3 +1,7 @@
+2015-02-18  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* configure: Regenerate.
+
 2015-02-06  Diego Novillo  <dnovillo@google.com>
 
 	* MAINTAINERS (Global Reviewers, Plugin, LTO, tree-ssa,
diff --git config/ChangeLog config/ChangeLog
index 2cbc885..c9ed121 100644
--- config/ChangeLog
+++ config/ChangeLog
@@ -1,3 +1,7 @@
+2015-02-18  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* elf.m4 (ACX_ELF_TARGET_IFELSE): nvptx-*-none isn't ELF.
+
 2014-11-17  Bob Dunlop  <bob.dunlop@xyzzy.org.uk>
 
 	* mt-ospace (CFLAGS_FOR_TARGET): Append -g -Os rather than
diff --git config/elf.m4 config/elf.m4
index da051cb..1772a44 100644
--- config/elf.m4
+++ config/elf.m4
@@ -1,4 +1,4 @@
-dnl Copyright (C) 2010, 2011 Free Software Foundation, Inc.
+dnl Copyright (C) 2010, 2011, 2015 Free Software Foundation, Inc.
 dnl This file is free software, distributed under the terms of the GNU
 dnl General Public License.  As a special exception to the GNU General
 dnl Public License, this file may be distributed as part of a program
@@ -7,6 +7,8 @@ dnl the same distribution terms as the rest of that program.
 
 dnl From Paolo Bonzini.
 
+dnl Is this an ELF target supporting the LTO plugin?
+
 dnl usage: ACX_ELF_TARGET_IFELSE([if-elf], [if-not-elf])
 AC_DEFUN([ACX_ELF_TARGET_IFELSE], [
 AC_REQUIRE([AC_CANONICAL_TARGET])
@@ -15,7 +17,8 @@ target_elf=no
 case $target in
   *-darwin* | *-aix* | *-cygwin* | *-mingw* | *-aout* | *-*coff* | \
   *-msdosdjgpp* | *-vms* | *-wince* | *-*-pe* | \
-  alpha*-dec-osf* | *-interix* | hppa[[12]]*-*-hpux*)
+  alpha*-dec-osf* | *-interix* | hppa[[12]]*-*-hpux* | \
+  nvptx-*-none)
     target_elf=no
     ;;
   *)
diff --git configure configure
index dd794db..f20a6ab 100755
--- configure
+++ configure
@@ -6047,7 +6047,8 @@ target_elf=no
 case $target in
   *-darwin* | *-aix* | *-cygwin* | *-mingw* | *-aout* | *-*coff* | \
   *-msdosdjgpp* | *-vms* | *-wince* | *-*-pe* | \
-  alpha*-dec-osf* | *-interix* | hppa[12]*-*-hpux*)
+  alpha*-dec-osf* | *-interix* | hppa[12]*-*-hpux* | \
+  nvptx-*-none)
     target_elf=no
     ;;
   *)


Grüße,
 Thomas


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: nvptx-tools and nvptx-newlib (was: The nvptx port [10/11+] Target files)
  2015-02-04  9:43               ` Jakub Jelinek
@ 2015-02-18  8:50                 ` Thomas Schwinge
  2015-02-18  9:03                   ` Jakub Jelinek
  2015-07-08 15:03                   ` [nvptx offloading] Only 64-bit configurations are currently supported (was: nvptx-tools and nvptx-newlib) Thomas Schwinge
  0 siblings, 2 replies; 82+ messages in thread
From: Thomas Schwinge @ 2015-02-18  8:50 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, Bernd Schmidt

[-- Attachment #1: Type: text/plain, Size: 2724 bytes --]

Hi!

On Wed, 4 Feb 2015 10:43:14 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Feb 02, 2015 at 04:32:34PM +0100, Thomas Schwinge wrote:
> > Hi!
> > 
> > On Tue, 23 Dec 2014 19:49:35 +0100, I wrote:
> > > On Mon, 10 Nov 2014 17:19:57 +0100, Bernd Schmidt <bernds@codesourcery.com> wrote:
> > > > The scripts (11/11) I've put up on github, along with a hacked up 
> > > > newlib. These are at [...]
> > 
> > > > They are likely to migrate to MentorEmbedded from bernds, but that had 
> > > > some permissions problems last week.
> > > 
> > > That has recently been done:
> > > <https://github.com/MentorEmbedded/nvptx-tools> and
> > > <https://github.com/MentorEmbedded/nvptx-newlib> are now available.
> > > 
> > > (I'm aware that we still are to write up how to actually build and test
> > > all this.)
> > 
> > I just updated
> > <https://gcc.gnu.org/wiki/Offloading?action=diff&rev2=26&rev1=25>.
> 
> Can you please update the gmane URLs to corresponding
> https://gcc.gnu.org/ml/gcc-patches/ URLs?  We have our own mailing list
> archives, no need to use third party ones.

It's convenient for me (Message-IDs fall out of my mailer automatically,
and Gmane happens to support retrieving messages by Message-ID), and the
sourceware mailing list archive software doesn't interlink articles
between different YYYY-MM, which I find rather limiting.


> > OK to check in the following to trunk?

Committed to trunk in r220783.


> > --- gcc/config/nvptx/nvptx.opt
> > +++ gcc/config/nvptx/nvptx.opt
> > @@ -17,13 +17,13 @@
> >  ; along with GCC; see the file COPYING3.  If not see
> >  ; <http://www.gnu.org/licenses/>.
> >  
> > -m64
> > -Target Report RejectNegative Mask(ABI64)
> > -Generate code for a 64 bit ABI
> > -
> >  m32
> >  Target Report RejectNegative InverseMask(ABI64)
> > -Generate code for a 32 bit ABI
> > +Generate code for a 32-bit ABI
> > +
> > +m64
> > +Target Report RejectNegative Mask(ABI64)
> > +Generate code for a 64-bit ABI
> 
> I'd expect you want also Negative(m64) on the m32 option and
> Negative(m32) on the m64 option.
> 
> > +@table @gcctabopt
> > +
> > +@item -m32
> > +@itemx -m64
> > +@opindex m32
> > +@opindex m64
> > +Generate code for 32-bit or 64-bit ABI.
> 
> I guess you should mention which one of those is the default (if it isn't
> configure time configurable).

Have taken a note to look into these, later.


> What about multilibs, is newlib built for both -m32 and -m64, or just the
> default option?

So far, we have concentrated only on the 64-bit x86_64 configuration;
32-bit has several known issues to be resolved.
<https://gcc.gnu.org/PR65099> filed.


Grüße,
 Thomas


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: nvptx-tools and nvptx-newlib (was: The nvptx port [10/11+] Target files)
  2015-02-18  8:50                 ` Thomas Schwinge
@ 2015-02-18  9:03                   ` Jakub Jelinek
  2015-07-08 15:03                   ` [nvptx offloading] Only 64-bit configurations are currently supported (was: nvptx-tools and nvptx-newlib) Thomas Schwinge
  1 sibling, 0 replies; 82+ messages in thread
From: Jakub Jelinek @ 2015-02-18  9:03 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: GCC Patches, Bernd Schmidt

On Wed, Feb 18, 2015 at 09:50:15AM +0100, Thomas Schwinge wrote:
> > What about multilibs, is newlib built for both -m32 and -m64, or just the
> > default option?
> 
> So far, we have concentrated only on the 64-bit x86_64 configuration;
> 32-bit has several known issues to be resolved.
> <https://gcc.gnu.org/PR65099> filed.

I meant 64-bit and 32-bit PTX.

	Jakub

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [nvptx offloading] Only 64-bit configurations are currently supported (was: nvptx-tools and nvptx-newlib)
  2015-02-18  8:50                 ` Thomas Schwinge
  2015-02-18  9:03                   ` Jakub Jelinek
@ 2015-07-08 15:03                   ` Thomas Schwinge
  2015-07-14 20:10                     ` [nvptx offloading] Only 64-bit configurations are currently supported Thomas Schwinge
  2021-01-14 18:18                     ` [nvptx libgomp plugin] Build only in supported configurations (was: [nvptx offloading] Only 64-bit configurations are currently supported) Thomas Schwinge
  1 sibling, 2 replies; 82+ messages in thread
From: Thomas Schwinge @ 2015-07-08 15:03 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Bernd Schmidt

[-- Attachment #1: Type: text/plain, Size: 7984 bytes --]

Hi!

On Wed, 18 Feb 2015 09:50:15 +0100, I wrote:
> So far, we have concentrated only on the 64-bit x86_64 configuration;
> 32-bit has several known issues to be resolved.
> <https://gcc.gnu.org/PR65099> filed.

I have committed the following patch in r225560.  This gets rid of the
many "expected FAILs" in the 32-bit part of
RUNTESTFLAGS='--target_board=unix\{-m64,-m32\}' testing, for example.

commit fe265ad3c9624da88f43be349137696449148f4f
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Jul 8 14:59:59 2015 +0000

    [nvptx offloading] Only 64-bit configurations are currently supported
    
    	PR libgomp/65099
    	gcc/
    	* config/nvptx/mkoffload.c (main): Create an offload image only in
    	64-bit configurations.
    	libgomp/
    	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Return 0 if not
    	in a 64-bit configuration.
    	* testsuite/libgomp.oacc-c++/c++.exp: Don't attempt nvidia
    	offloading testing if no such device is available.
    	* testsuite/libgomp.oacc-c/c.exp: Likewise.
    	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225560 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog                                      |    6 +++
 gcc/config/nvptx/mkoffload.c                       |   56 +++++++++++---------
 libgomp/ChangeLog                                  |   10 ++++
 libgomp/plugin/plugin-nvptx.c                      |    5 ++
 libgomp/testsuite/libgomp.oacc-c++/c++.exp         |    6 +++
 libgomp/testsuite/libgomp.oacc-c/c.exp             |    6 +++
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |    6 +++
 7 files changed, 70 insertions(+), 25 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 33a2fa0..4c83723 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,9 @@
+2015-07-08  Thomas Schwinge  <thomas@codesourcery.com>
+
+	PR libgomp/65099
+	* config/nvptx/mkoffload.c (main): Create an offload image only in
+	64-bit configurations.
+
 2015-07-08  Martin Liska  <mliska@suse.cz>
 
 	PR bootstrap/66744
diff --git gcc/config/nvptx/mkoffload.c gcc/config/nvptx/mkoffload.c
index 8687154..8bc08bf 100644
--- gcc/config/nvptx/mkoffload.c
+++ gcc/config/nvptx/mkoffload.c
@@ -993,37 +993,43 @@ main (int argc, char **argv)
 	obstack_ptr_grow (&argv_obstack, argv[ix]);
     }
 
-  ptx_name = make_temp_file (".mkoffload");
-  obstack_ptr_grow (&argv_obstack, "-o");
-  obstack_ptr_grow (&argv_obstack, ptx_name);
-  obstack_ptr_grow (&argv_obstack, NULL);
-  const char **new_argv = XOBFINISH (&argv_obstack, const char **);
-
-  char *execpath = getenv ("GCC_EXEC_PREFIX");
-  char *cpath = getenv ("COMPILER_PATH");
-  char *lpath = getenv ("LIBRARY_PATH");
-  unsetenv ("GCC_EXEC_PREFIX");
-  unsetenv ("COMPILER_PATH");
-  unsetenv ("LIBRARY_PATH");
-
-  fork_execute (new_argv[0], CONST_CAST (char **, new_argv), true);
-  obstack_free (&argv_obstack, NULL);
-
-  xputenv (concat ("GCC_EXEC_PREFIX=", execpath, NULL));
-  xputenv (concat ("COMPILER_PATH=", cpath, NULL));
-  xputenv (concat ("LIBRARY_PATH=", lpath, NULL));
-
-  in = fopen (ptx_name, "r");
-  if (!in)
-    fatal_error (input_location, "cannot open intermediate ptx file");
-
   ptx_cfile_name = make_temp_file (".c");
 
   out = fopen (ptx_cfile_name, "w");
   if (!out)
     fatal_error (input_location, "cannot open '%s'", ptx_cfile_name);
 
-  process (in, out);
+  /* PR libgomp/65099: Currently, we only support offloading in 64-bit
+     configurations.  */
+  if (!target_ilp32)
+    {
+      ptx_name = make_temp_file (".mkoffload");
+      obstack_ptr_grow (&argv_obstack, "-o");
+      obstack_ptr_grow (&argv_obstack, ptx_name);
+      obstack_ptr_grow (&argv_obstack, NULL);
+      const char **new_argv = XOBFINISH (&argv_obstack, const char **);
+
+      char *execpath = getenv ("GCC_EXEC_PREFIX");
+      char *cpath = getenv ("COMPILER_PATH");
+      char *lpath = getenv ("LIBRARY_PATH");
+      unsetenv ("GCC_EXEC_PREFIX");
+      unsetenv ("COMPILER_PATH");
+      unsetenv ("LIBRARY_PATH");
+
+      fork_execute (new_argv[0], CONST_CAST (char **, new_argv), true);
+      obstack_free (&argv_obstack, NULL);
+
+      xputenv (concat ("GCC_EXEC_PREFIX=", execpath, NULL));
+      xputenv (concat ("COMPILER_PATH=", cpath, NULL));
+      xputenv (concat ("LIBRARY_PATH=", lpath, NULL));
+
+      in = fopen (ptx_name, "r");
+      if (!in)
+	fatal_error (input_location, "cannot open intermediate ptx file");
+
+      process (in, out);
+    }
+
   fclose (out);
 
   compile_native (ptx_cfile_name, outname, collect_gcc);
diff --git libgomp/ChangeLog libgomp/ChangeLog
index 8839397..34f3a1c 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,13 @@
+2015-07-08  Thomas Schwinge  <thomas@codesourcery.com>
+
+	PR libgomp/65099
+	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Return 0 if not
+	in a 64-bit configuration.
+	* testsuite/libgomp.oacc-c++/c++.exp: Don't attempt nvidia
+	offloading testing if no such device is available.
+	* testsuite/libgomp.oacc-c/c.exp: Likewise.
+	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
+
 2015-07-08  Tom de Vries  <tom@codesourcery.com>
 
 	* testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c (main): Fix
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index ee3a0ae..b67d301 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -777,6 +777,11 @@ nvptx_get_num_devices (void)
   int n;
   CUresult r;
 
+  /* PR libgomp/65099: Currently, we only support offloading in 64-bit
+     configurations.  */
+  if (sizeof (void *) != 8)
+    return 0;
+
   /* This function will be called before the plugin has been initialized in
      order to enumerate available devices, but CUDA API routines can't be used
      until cuInit has been called.  Just call it now (but don't yet do any
diff --git libgomp/testsuite/libgomp.oacc-c++/c++.exp libgomp/testsuite/libgomp.oacc-c++/c++.exp
index 80d1359..3b97024 100644
--- libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -85,6 +85,12 @@ if { $lang_test_file_found } {
 		set acc_mem_shared 0
 	    }
 	    nvidia {
+		if { ![check_effective_target_openacc_nvidia_accel_present] } {
+		    # Don't bother; execution testing is going to FAIL.
+		    untested "$subdir $offload_target_openacc offloading"
+		    continue
+		}
+
 		# Copy ptx file (TEMPORARY)
 		remote_download host $srcdir/libgomp.oacc-c-c++-common/subr.ptx
 
diff --git libgomp/testsuite/libgomp.oacc-c/c.exp libgomp/testsuite/libgomp.oacc-c/c.exp
index c0c70bb..326b988 100644
--- libgomp/testsuite/libgomp.oacc-c/c.exp
+++ libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -48,6 +48,12 @@ foreach offload_target_openacc $offload_targets_s_openacc {
 	    set acc_mem_shared 0
 	}
 	nvidia {
+	    if { ![check_effective_target_openacc_nvidia_accel_present] } {
+		# Don't bother; execution testing is going to FAIL.
+		untested "$subdir $offload_target_openacc offloading"
+		continue
+	    }
+
 	    # Copy ptx file (TEMPORARY)
 	    remote_download host $srcdir/libgomp.oacc-c-c++-common/subr.ptx
 
diff --git libgomp/testsuite/libgomp.oacc-fortran/fortran.exp libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index a8f62e8..a8aaff0 100644
--- libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -77,6 +77,12 @@ if { $lang_test_file_found } {
 		set acc_mem_shared 0
 	    }
 	    nvidia {
+		if { ![check_effective_target_openacc_nvidia_accel_present] } {
+		    # Don't bother; execution testing is going to FAIL.
+		    untested "$subdir $offload_target_openacc offloading"
+		    continue
+		}
+
 		set acc_mem_shared 0
 	    }
 	    default {


Regards,
 Thomas


* Re: [nvptx offloading] Only 64-bit configurations are currently supported
  2015-07-08 15:03                   ` [nvptx offloading] Only 64-bit configurations are currently supported (was: nvptx-tools and nvptx-newlib) Thomas Schwinge
@ 2015-07-14 20:10                     ` Thomas Schwinge
  2015-07-14 20:25                       ` Richard Biener
  2021-01-14 18:18                     ` [nvptx libgomp plugin] Build only in supported configurations (was: [nvptx offloading] Only 64-bit configurations are currently supported) Thomas Schwinge
  1 sibling, 1 reply; 82+ messages in thread
From: Thomas Schwinge @ 2015-07-14 20:10 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

Hi!

OK for gcc-5-branch?

On Wed, 8 Jul 2015 17:03:02 +0200, I wrote:
> On Wed, 18 Feb 2015 09:50:15 +0100, I wrote:
> > So far, we have concentrated only on the 64-bit x86_64 configuration;
> > 32-bit has several known issues to be resolved.
> > <https://gcc.gnu.org/PR65099> filed.
> 
> I have committed the following patch in r225560.  This rids us of the
> many "expected FAILs" in the 32-bit part of
> RUNTESTFLAGS='--target_board=unix\{-m64,-m32\}' testing, for example.
> 
> commit fe265ad3c9624da88f43be349137696449148f4f
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Wed Jul 8 14:59:59 2015 +0000
> 
>     [nvptx offloading] Only 64-bit configurations are currently supported
>     
>     	PR libgomp/65099
>     	gcc/
>     	* config/nvptx/mkoffload.c (main): Create an offload image only in
>     	64-bit configurations.
>     	libgomp/
>     	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Return 0 if not
>     	in a 64-bit configuration.
>     	* testsuite/libgomp.oacc-c++/c++.exp: Don't attempt nvidia
>     	offloading testing if no such device is available.
>     	* testsuite/libgomp.oacc-c/c.exp: Likewise.
>     	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
>     
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225560 138bc75d-0d04-0410-961f-82ee72b054a4
> ---
>  gcc/ChangeLog                                      |    6 +++
>  gcc/config/nvptx/mkoffload.c                       |   56 +++++++++++---------
>  libgomp/ChangeLog                                  |   10 ++++
>  libgomp/plugin/plugin-nvptx.c                      |    5 ++
>  libgomp/testsuite/libgomp.oacc-c++/c++.exp         |    6 +++
>  libgomp/testsuite/libgomp.oacc-c/c.exp             |    6 +++
>  libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |    6 +++
>  7 files changed, 70 insertions(+), 25 deletions(-)
> 
> diff --git gcc/ChangeLog gcc/ChangeLog
> index 33a2fa0..4c83723 100644
> --- gcc/ChangeLog
> +++ gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2015-07-08  Thomas Schwinge  <thomas@codesourcery.com>
> +
> +	PR libgomp/65099
> +	* config/nvptx/mkoffload.c (main): Create an offload image only in
> +	64-bit configurations.
> +
>  2015-07-08  Martin Liska  <mliska@suse.cz>
>  
>  	PR bootstrap/66744
> diff --git gcc/config/nvptx/mkoffload.c gcc/config/nvptx/mkoffload.c
> index 8687154..8bc08bf 100644
> --- gcc/config/nvptx/mkoffload.c
> +++ gcc/config/nvptx/mkoffload.c
> @@ -993,37 +993,43 @@ main (int argc, char **argv)
>  	obstack_ptr_grow (&argv_obstack, argv[ix]);
>      }
>  
> -  ptx_name = make_temp_file (".mkoffload");
> -  obstack_ptr_grow (&argv_obstack, "-o");
> -  obstack_ptr_grow (&argv_obstack, ptx_name);
> -  obstack_ptr_grow (&argv_obstack, NULL);
> -  const char **new_argv = XOBFINISH (&argv_obstack, const char **);
> -
> -  char *execpath = getenv ("GCC_EXEC_PREFIX");
> -  char *cpath = getenv ("COMPILER_PATH");
> -  char *lpath = getenv ("LIBRARY_PATH");
> -  unsetenv ("GCC_EXEC_PREFIX");
> -  unsetenv ("COMPILER_PATH");
> -  unsetenv ("LIBRARY_PATH");
> -
> -  fork_execute (new_argv[0], CONST_CAST (char **, new_argv), true);
> -  obstack_free (&argv_obstack, NULL);
> -
> -  xputenv (concat ("GCC_EXEC_PREFIX=", execpath, NULL));
> -  xputenv (concat ("COMPILER_PATH=", cpath, NULL));
> -  xputenv (concat ("LIBRARY_PATH=", lpath, NULL));
> -
> -  in = fopen (ptx_name, "r");
> -  if (!in)
> -    fatal_error (input_location, "cannot open intermediate ptx file");
> -
>    ptx_cfile_name = make_temp_file (".c");
>  
>    out = fopen (ptx_cfile_name, "w");
>    if (!out)
>      fatal_error (input_location, "cannot open '%s'", ptx_cfile_name);
>  
> -  process (in, out);
> +  /* PR libgomp/65099: Currently, we only support offloading in 64-bit
> +     configurations.  */
> +  if (!target_ilp32)
> +    {
> +      ptx_name = make_temp_file (".mkoffload");
> +      obstack_ptr_grow (&argv_obstack, "-o");
> +      obstack_ptr_grow (&argv_obstack, ptx_name);
> +      obstack_ptr_grow (&argv_obstack, NULL);
> +      const char **new_argv = XOBFINISH (&argv_obstack, const char **);
> +
> +      char *execpath = getenv ("GCC_EXEC_PREFIX");
> +      char *cpath = getenv ("COMPILER_PATH");
> +      char *lpath = getenv ("LIBRARY_PATH");
> +      unsetenv ("GCC_EXEC_PREFIX");
> +      unsetenv ("COMPILER_PATH");
> +      unsetenv ("LIBRARY_PATH");
> +
> +      fork_execute (new_argv[0], CONST_CAST (char **, new_argv), true);
> +      obstack_free (&argv_obstack, NULL);
> +
> +      xputenv (concat ("GCC_EXEC_PREFIX=", execpath, NULL));
> +      xputenv (concat ("COMPILER_PATH=", cpath, NULL));
> +      xputenv (concat ("LIBRARY_PATH=", lpath, NULL));
> +
> +      in = fopen (ptx_name, "r");
> +      if (!in)
> +	fatal_error (input_location, "cannot open intermediate ptx file");
> +
> +      process (in, out);
> +    }
> +
>    fclose (out);
>  
>    compile_native (ptx_cfile_name, outname, collect_gcc);
> diff --git libgomp/ChangeLog libgomp/ChangeLog
> index 8839397..34f3a1c 100644
> --- libgomp/ChangeLog
> +++ libgomp/ChangeLog
> @@ -1,3 +1,13 @@
> +2015-07-08  Thomas Schwinge  <thomas@codesourcery.com>
> +
> +	PR libgomp/65099
> +	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Return 0 if not
> +	in a 64-bit configuration.
> +	* testsuite/libgomp.oacc-c++/c++.exp: Don't attempt nvidia
> +	offloading testing if no such device is available.
> +	* testsuite/libgomp.oacc-c/c.exp: Likewise.
> +	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
> +
>  2015-07-08  Tom de Vries  <tom@codesourcery.com>
>  
>  	* testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c (main): Fix
> diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
> index ee3a0ae..b67d301 100644
> --- libgomp/plugin/plugin-nvptx.c
> +++ libgomp/plugin/plugin-nvptx.c
> @@ -777,6 +777,11 @@ nvptx_get_num_devices (void)
>    int n;
>    CUresult r;
>  
> +  /* PR libgomp/65099: Currently, we only support offloading in 64-bit
> +     configurations.  */
> +  if (sizeof (void *) != 8)
> +    return 0;
> +
>    /* This function will be called before the plugin has been initialized in
>       order to enumerate available devices, but CUDA API routines can't be used
>       until cuInit has been called.  Just call it now (but don't yet do any
> diff --git libgomp/testsuite/libgomp.oacc-c++/c++.exp libgomp/testsuite/libgomp.oacc-c++/c++.exp
> index 80d1359..3b97024 100644
> --- libgomp/testsuite/libgomp.oacc-c++/c++.exp
> +++ libgomp/testsuite/libgomp.oacc-c++/c++.exp
> @@ -85,6 +85,12 @@ if { $lang_test_file_found } {
>  		set acc_mem_shared 0
>  	    }
>  	    nvidia {
> +		if { ![check_effective_target_openacc_nvidia_accel_present] } {
> +		    # Don't bother; execution testing is going to FAIL.
> +		    untested "$subdir $offload_target_openacc offloading"
> +		    continue
> +		}
> +
>  		# Copy ptx file (TEMPORARY)
>  		remote_download host $srcdir/libgomp.oacc-c-c++-common/subr.ptx
>  
> diff --git libgomp/testsuite/libgomp.oacc-c/c.exp libgomp/testsuite/libgomp.oacc-c/c.exp
> index c0c70bb..326b988 100644
> --- libgomp/testsuite/libgomp.oacc-c/c.exp
> +++ libgomp/testsuite/libgomp.oacc-c/c.exp
> @@ -48,6 +48,12 @@ foreach offload_target_openacc $offload_targets_s_openacc {
>  	    set acc_mem_shared 0
>  	}
>  	nvidia {
> +	    if { ![check_effective_target_openacc_nvidia_accel_present] } {
> +		# Don't bother; execution testing is going to FAIL.
> +		untested "$subdir $offload_target_openacc offloading"
> +		continue
> +	    }
> +
>  	    # Copy ptx file (TEMPORARY)
>  	    remote_download host $srcdir/libgomp.oacc-c-c++-common/subr.ptx
>  
> diff --git libgomp/testsuite/libgomp.oacc-fortran/fortran.exp libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
> index a8f62e8..a8aaff0 100644
> --- libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
> +++ libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
> @@ -77,6 +77,12 @@ if { $lang_test_file_found } {
>  		set acc_mem_shared 0
>  	    }
>  	    nvidia {
> +		if { ![check_effective_target_openacc_nvidia_accel_present] } {
> +		    # Don't bother; execution testing is going to FAIL.
> +		    untested "$subdir $offload_target_openacc offloading"
> +		    continue
> +		}
> +
>  		set acc_mem_shared 0
>  	    }
>  	    default {


Regards,
 Thomas


* Re: [nvptx offloading] Only 64-bit configurations are currently supported
  2015-07-14 20:10                     ` [nvptx offloading] Only 64-bit configurations are currently supported Thomas Schwinge
@ 2015-07-14 20:25                       ` Richard Biener
  0 siblings, 0 replies; 82+ messages in thread
From: Richard Biener @ 2015-07-14 20:25 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches

On July 14, 2015 10:05:52 PM GMT+02:00, Thomas Schwinge <thomas@codesourcery.com> wrote:
>Hi!
>
>OK for gcc-5-branch?

OK

Richard



* [nvptx libgomp plugin] Build only in supported configurations (was: [nvptx offloading] Only 64-bit configurations are currently supported)
  2015-07-08 15:03                   ` [nvptx offloading] Only 64-bit configurations are currently supported (was: nvptx-tools and nvptx-newlib) Thomas Schwinge
  2015-07-14 20:10                     ` [nvptx offloading] Only 64-bit configurations are currently supported Thomas Schwinge
@ 2021-01-14 18:18                     ` Thomas Schwinge
  2021-03-04  8:52                       ` [committed] libgomp: Use sizeof(void*) based checks instead of looking through $CC $CFLAGS for -m32/-mx32 Jakub Jelinek
  1 sibling, 1 reply; 82+ messages in thread
From: Thomas Schwinge @ 2021-01-14 18:18 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Tom de Vries

Hi!

On 2015-07-08T17:03:02+0200, I wrote:
> On Wed, 18 Feb 2015 09:50:15 +0100, I wrote:
>> So far, we have concentrated only on the 64-bit x86_64 configuration;
>> 32-bit has several known issues to be resolved.
>> <https://gcc.gnu.org/PR65099> filed.

(This still holds, and is unlikely to ever get addressed.)

> I have committed the following patch in r225560.  This rids us of the
> many "expected FAILs" in the 32-bit part of
> RUNTESTFLAGS='--target_board=unix\{-m64,-m32\}' testing, for example.
>
> commit fe265ad3c9624da88f43be349137696449148f4f
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Wed Jul 8 14:59:59 2015 +0000
>
>     [nvptx offloading] Only 64-bit configurations are currently supported
>
>       PR libgomp/65099
>       gcc/
>       * config/nvptx/mkoffload.c (main): Create an offload image only in
>       64-bit configurations.

(That remains in place.)

>       libgomp/
>       * plugin/plugin-nvptx.c (nvptx_get_num_devices): Return 0 if not
>       in a 64-bit configuration.

That, for reasons given in the commit log, I've just refined: I pushed
"[nvptx libgomp plugin] Build only in supported configurations" to master
branch in commit 6106dfb9f73a33c87108ad5b2dcd4842bdd7828e, and
cherry-picked it into releases/gcc-10 branch in commit
1e56a7c9a6631b217299b2ddcd5c4d497bb3445e, releases/gcc-9 branch in commit
0f1e1069a753e912b058f0d4bf599f0edde28408, and releases/gcc-8 branch in
commit f9267925c648f2ccd9e4680b699e581003125bcf; see attached.
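
To illustrate the approach, the configure-time decision can be sketched as
a standalone shell function.  This is only a simplified sketch, not the
actual configfrag.ac code: the function name nvptx_plugin_supported and its
two arguments are made up for this example, and the CUDA driver probing
that follows in the real script is left out.

```shell
#!/bin/sh
# Illustrative sketch only: nvptx_plugin_supported is a hypothetical helper,
# not part of libgomp's configure machinery.
# Usage: nvptx_plugin_supported TARGET-TRIPLET "CC-AND-CFLAGS"
# Prints 1 if the nvptx plugin would be considered for building, 0 otherwise.
nvptx_plugin_supported ()
{
  case "$1" in
    aarch64*-*-* | powerpc64le-*-* | x86_64-*-*)
      # Pad with spaces so that a flag at either end still matches as a word.
      case " $2 " in
        *" -m32 "* | *" -mx32 "*)
          # PR libgomp/65099: only 64-bit configurations are supported.
          echo 0
          ;;
        *)
          echo 1
          ;;
      esac
      ;;
    *-*-*)
      # Target architecture not supported.
      echo 0
      ;;
  esac
}

nvptx_plugin_supported x86_64-pc-linux-gnu "gcc -O2"
nvptx_plugin_supported x86_64-pc-linux-gnu "gcc -m32 -O2"
nvptx_plugin_supported i686-pc-linux-gnu "gcc -O2"
```

Note that a textual match like this only catches '-m32'/'-mx32' when they
appear as separate words in the compiler command or flags passed in.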


Regards
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

[-- Attachment #2: 0001-nvptx-libgomp-plugin-Build-only-in-supported-configu.patch --]
[-- Type: text/x-diff, Size: 8721 bytes --]

From 6106dfb9f73a33c87108ad5b2dcd4842bdd7828e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 30 Nov 2020 15:15:20 +0100
Subject: [PATCH] [nvptx libgomp plugin] Build only in supported configurations

As recently discussed again in <https://gcc.gnu.org/PR97436> "[nvptx] -m32
support", nvptx offloading other than for 64-bit host has never been
implemented, tested, or supported.  So we simply shouldn't build the nvptx
libgomp plugin in this case.

This avoids build problems if, for example, in a (standard) bi-arch
x86_64-pc-linux-gnu '-m64'/'-m32' build, libcuda is available only in a 64-bit
variant but not in a 32-bit one, which, for example, is the case if you build
GCC against the CUDA toolkit's 'stubs/libcuda.so' (see
<https://stackoverflow.com/a/52784819>).

This amends PR65099 commit a92defdab79a1268f4b9dcf42b937e4002a4cf15 (r225560)
"[nvptx offloading] Only 64-bit configurations are currently supported" to
match the way we're doing this for the HSA/GCN plugins.

	libgomp/
	PR libgomp/65099
	* plugin/configfrag.ac (PLUGIN_NVPTX): Restrict to supported
	configurations.
	* configure: Regenerate.
	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Remove 64-bit
	check.
---
 libgomp/configure             | 86 +++++++++++++++++++-------------
 libgomp/plugin/configfrag.ac  | 92 ++++++++++++++++++++---------------
 libgomp/plugin/plugin-nvptx.c |  9 ----
 3 files changed, 105 insertions(+), 82 deletions(-)

diff --git a/libgomp/configure b/libgomp/configure
index 89c17c571b7..48bf8e4a72c 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15272,21 +15272,30 @@ if test x"$enable_offload_targets" != x; then
 	tgt_plugin=intelmic
 	;;
       nvptx*)
-	tgt_plugin=nvptx
-	PLUGIN_NVPTX=$tgt
-	if test "x$CUDA_DRIVER_LIB" != xno \
-	   && test "x$CUDA_DRIVER_LIB" != xno; then
-	  PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
-	  PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
-	  PLUGIN_NVPTX_LIBS='-lcuda'
-
-	  PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
-	  CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
-	  PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
-	  LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
-	  PLUGIN_NVPTX_save_LIBS=$LIBS
-	  LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
-	  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+	case "${target}" in
+	  aarch64*-*-* | powerpc64le-*-* | x86_64-*-*)
+	    case " ${CC} ${CFLAGS} " in
+	      *" -m32 "* | *" -mx32 "*)
+		# PR libgomp/65099: Currently, we only support offloading in
+		# 64-bit configurations.
+		PLUGIN_NVPTX=0
+		;;
+	      *)
+		tgt_plugin=nvptx
+		PLUGIN_NVPTX=$tgt
+		if test "x$CUDA_DRIVER_LIB" != xno \
+		   && test "x$CUDA_DRIVER_LIB" != xno; then
+		  PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+		  PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+		  PLUGIN_NVPTX_LIBS='-lcuda'
+
+		  PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+		  CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+		  PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+		  LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+		  PLUGIN_NVPTX_save_LIBS=$LIBS
+		  LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+		  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include "cuda.h"
 int
@@ -15302,28 +15311,35 @@ if ac_fn_c_try_link "$LINENO"; then :
 fi
 rm -f core conftest.err conftest.$ac_objext \
     conftest$ac_exeext conftest.$ac_ext
-	  CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
-	  LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
-	  LIBS=$PLUGIN_NVPTX_save_LIBS
-	fi
-	case $PLUGIN_NVPTX in
-	  nvptx*)
-	    if (test "x$CUDA_DRIVER_INCLUDE" = x \
-		|| test "x$CUDA_DRIVER_INCLUDE" = xno) \
-	       && (test "x$CUDA_DRIVER_LIB" = x \
-		   || test "x$CUDA_DRIVER_LIB" = xno); then
-	      PLUGIN_NVPTX=1
-	      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
-	      PLUGIN_NVPTX_LIBS='-ldl'
-	      PLUGIN_NVPTX_DYNAMIC=1
-	    else
-	      PLUGIN_NVPTX=0
-	      as_fn_error $? "CUDA driver package required for nvptx support" "$LINENO" 5
-	    fi
-	  ;;
+		  CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+		  LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+		  LIBS=$PLUGIN_NVPTX_save_LIBS
+		fi
+		case $PLUGIN_NVPTX in
+		  nvptx*)
+		    if (test "x$CUDA_DRIVER_INCLUDE" = x \
+			|| test "x$CUDA_DRIVER_INCLUDE" = xno) \
+		       && (test "x$CUDA_DRIVER_LIB" = x \
+			   || test "x$CUDA_DRIVER_LIB" = xno); then
+		      PLUGIN_NVPTX=1
+		      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
+		      PLUGIN_NVPTX_LIBS='-ldl'
+		      PLUGIN_NVPTX_DYNAMIC=1
+		    else
+		      PLUGIN_NVPTX=0
+		      as_fn_error $? "CUDA driver package required for nvptx support" "$LINENO" 5
+		    fi
+		    ;;
+		esac
+		;;
+	    esac
+	    ;;
+	  *-*-*)
+	    # Target architecture not supported.
+	    PLUGIN_NVPTX=0
+	    ;;
 	esac
 	;;
-
       amdgcn*)
 	case "${target}" in
 	  x86_64-*-*)
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 2e086c017fe..88550982eab 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -158,47 +158,63 @@ if test x"$enable_offload_targets" != x; then
 	tgt_plugin=intelmic
 	;;
       nvptx*)
-	tgt_plugin=nvptx
-	PLUGIN_NVPTX=$tgt
-	if test "x$CUDA_DRIVER_LIB" != xno \
-	   && test "x$CUDA_DRIVER_LIB" != xno; then
-	  PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
-	  PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
-	  PLUGIN_NVPTX_LIBS='-lcuda'
+	case "${target}" in
+	  aarch64*-*-* | powerpc64le-*-* | x86_64-*-*)
+	    case " ${CC} ${CFLAGS} " in
+	      *" -m32 "* | *" -mx32 "*)
+		# PR libgomp/65099: Currently, we only support offloading in
+		# 64-bit configurations.
+		PLUGIN_NVPTX=0
+		;;
+	      *)
+		tgt_plugin=nvptx
+		PLUGIN_NVPTX=$tgt
+		if test "x$CUDA_DRIVER_LIB" != xno \
+		   && test "x$CUDA_DRIVER_LIB" != xno; then
+		  PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+		  PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+		  PLUGIN_NVPTX_LIBS='-lcuda'
 
-	  PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
-	  CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
-	  PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
-	  LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
-	  PLUGIN_NVPTX_save_LIBS=$LIBS
-	  LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
-	  AC_LINK_IFELSE(
-	    [AC_LANG_PROGRAM(
-	      [#include "cuda.h"],
-		[CUresult r = cuCtxPushCurrent (NULL);])],
-	    [PLUGIN_NVPTX=1])
-	  CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
-	  LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
-	  LIBS=$PLUGIN_NVPTX_save_LIBS
-	fi
-	case $PLUGIN_NVPTX in
-	  nvptx*)
-	    if (test "x$CUDA_DRIVER_INCLUDE" = x \
-		|| test "x$CUDA_DRIVER_INCLUDE" = xno) \
-	       && (test "x$CUDA_DRIVER_LIB" = x \
-		   || test "x$CUDA_DRIVER_LIB" = xno); then
-	      PLUGIN_NVPTX=1
-	      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
-	      PLUGIN_NVPTX_LIBS='-ldl'
-	      PLUGIN_NVPTX_DYNAMIC=1
-	    else
-	      PLUGIN_NVPTX=0
-	      AC_MSG_ERROR([CUDA driver package required for nvptx support])
-	    fi
-	  ;;
+		  PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+		  CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+		  PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+		  LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+		  PLUGIN_NVPTX_save_LIBS=$LIBS
+		  LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+		  AC_LINK_IFELSE(
+		    [AC_LANG_PROGRAM(
+		      [#include "cuda.h"],
+			[CUresult r = cuCtxPushCurrent (NULL);])],
+		    [PLUGIN_NVPTX=1])
+		  CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+		  LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+		  LIBS=$PLUGIN_NVPTX_save_LIBS
+		fi
+		case $PLUGIN_NVPTX in
+		  nvptx*)
+		    if (test "x$CUDA_DRIVER_INCLUDE" = x \
+			|| test "x$CUDA_DRIVER_INCLUDE" = xno) \
+		       && (test "x$CUDA_DRIVER_LIB" = x \
+			   || test "x$CUDA_DRIVER_LIB" = xno); then
+		      PLUGIN_NVPTX=1
+		      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
+		      PLUGIN_NVPTX_LIBS='-ldl'
+		      PLUGIN_NVPTX_DYNAMIC=1
+		    else
+		      PLUGIN_NVPTX=0
+		      AC_MSG_ERROR([CUDA driver package required for nvptx support])
+		    fi
+		    ;;
+		esac
+		;;
+	    esac
+	    ;;
+	  *-*-*)
+	    # Target architecture not supported.
+	    PLUGIN_NVPTX=0
+	    ;;
 	esac
 	;;
-
       amdgcn*)
 	case "${target}" in
 	  x86_64-*-*)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 681c344b9c2..1215212d501 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -572,15 +572,6 @@ nvptx_get_num_devices (void)
 {
   int n;
 
-  /* PR libgomp/65099: Currently, we only support offloading in 64-bit
-     configurations.  */
-  if (sizeof (void *) != 8)
-    {
-      GOMP_PLUGIN_debug (0, "Disabling nvptx offloading;"
-			 " only 64-bit configurations are supported\n");
-      return 0;
-    }
-
   /* This function will be called before the plugin has been initialized in
      order to enumerate available devices, but CUDA API routines can't be used
      until cuInit has been called.  Just call it now (but don't yet do any
-- 
2.17.1


[-- Attachment #3: 0001-nvptx-libgomp-plugin-Build-only-in-supported-con.g10.patch --]
[-- Type: text/x-diff, Size: 8705 bytes --]

From 1e56a7c9a6631b217299b2ddcd5c4d497bb3445e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 30 Nov 2020 15:15:20 +0100
Subject: [PATCH] [nvptx libgomp plugin] Build only in supported configurations

As recently again discussed in <https://gcc.gnu.org/PR97436> "[nvptx] -m32
support", nvptx offloading other than for 64-bit host has never been
implemented, tested, or supported.  So we simply shouldn't build the nvptx
libgomp plugin in this case.

This avoids build problems if, for example, in a (standard) bi-arch
x86_64-pc-linux-gnu '-m64'/'-m32' build, libcuda is available only in a 64-bit
variant but not in a 32-bit one, which, for example, is the case if you build
GCC against the CUDA toolkit's 'stubs/libcuda.so' (see
<https://stackoverflow.com/a/52784819>).

This amends PR65099 commit a92defdab79a1268f4b9dcf42b937e4002a4cf15 (r225560)
"[nvptx offloading] Only 64-bit configurations are currently supported" to
match the way we're doing this for the HSA/GCN plugins.

	libgomp/
	PR libgomp/65099
	* plugin/configfrag.ac (PLUGIN_NVPTX): Restrict to supported
	configurations.
	* configure: Regenerate.
	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Remove 64-bit
	check.

(cherry picked from commit 6106dfb9f73a33c87108ad5b2dcd4842bdd7828e)
---
 libgomp/configure             | 85 +++++++++++++++++++-------------
 libgomp/plugin/configfrag.ac  | 91 +++++++++++++++++++++--------------
 libgomp/plugin/plugin-nvptx.c |  9 ----
 3 files changed, 105 insertions(+), 80 deletions(-)

diff --git a/libgomp/configure b/libgomp/configure
index 69f57e31521..73f4a309f55 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15294,21 +15294,30 @@ if test x"$enable_offload_targets" != x; then
 	tgt_plugin=intelmic
 	;;
       nvptx*)
-	tgt_plugin=nvptx
-	PLUGIN_NVPTX=$tgt
-	if test "x$CUDA_DRIVER_LIB" != xno \
-	   && test "x$CUDA_DRIVER_LIB" != xno; then
-	  PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
-	  PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
-	  PLUGIN_NVPTX_LIBS='-lcuda'
-
-	  PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
-	  CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
-	  PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
-	  LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
-	  PLUGIN_NVPTX_save_LIBS=$LIBS
-	  LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
-	  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+	case "${target}" in
+	  aarch64*-*-* | powerpc64le-*-* | x86_64-*-*)
+	    case " ${CC} ${CFLAGS} " in
+	      *" -m32 "* | *" -mx32 "*)
+		# PR libgomp/65099: Currently, we only support offloading in
+		# 64-bit configurations.
+		PLUGIN_NVPTX=0
+		;;
+	      *)
+		tgt_plugin=nvptx
+		PLUGIN_NVPTX=$tgt
+		if test "x$CUDA_DRIVER_LIB" != xno \
+		   && test "x$CUDA_DRIVER_LIB" != xno; then
+		  PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+		  PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+		  PLUGIN_NVPTX_LIBS='-lcuda'
+
+		  PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+		  CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+		  PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+		  LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+		  PLUGIN_NVPTX_save_LIBS=$LIBS
+		  LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+		  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include "cuda.h"
 int
@@ -15324,25 +15333,33 @@ if ac_fn_c_try_link "$LINENO"; then :
 fi
 rm -f core conftest.err conftest.$ac_objext \
     conftest$ac_exeext conftest.$ac_ext
-	  CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
-	  LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
-	  LIBS=$PLUGIN_NVPTX_save_LIBS
-	fi
-	case $PLUGIN_NVPTX in
-	  nvptx*)
-	    if (test "x$CUDA_DRIVER_INCLUDE" = x \
-		|| test "x$CUDA_DRIVER_INCLUDE" = xno) \
-	       && (test "x$CUDA_DRIVER_LIB" = x \
-		   || test "x$CUDA_DRIVER_LIB" = xno); then
-	      PLUGIN_NVPTX=1
-	      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
-	      PLUGIN_NVPTX_LIBS='-ldl'
-	      PLUGIN_NVPTX_DYNAMIC=1
-	    else
-	      PLUGIN_NVPTX=0
-	      as_fn_error $? "CUDA driver package required for nvptx support" "$LINENO" 5
-	    fi
-	  ;;
+		  CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+		  LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+		  LIBS=$PLUGIN_NVPTX_save_LIBS
+		fi
+		case $PLUGIN_NVPTX in
+		  nvptx*)
+		    if (test "x$CUDA_DRIVER_INCLUDE" = x \
+			|| test "x$CUDA_DRIVER_INCLUDE" = xno) \
+		       && (test "x$CUDA_DRIVER_LIB" = x \
+			   || test "x$CUDA_DRIVER_LIB" = xno); then
+		      PLUGIN_NVPTX=1
+		      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
+		      PLUGIN_NVPTX_LIBS='-ldl'
+		      PLUGIN_NVPTX_DYNAMIC=1
+		    else
+		      PLUGIN_NVPTX=0
+		      as_fn_error $? "CUDA driver package required for nvptx support" "$LINENO" 5
+		    fi
+		    ;;
+		esac
+		;;
+	    esac
+	    ;;
+	  *-*-*)
+	    # Target architecture not supported.
+	    PLUGIN_NVPTX=0
+	    ;;
 	esac
 	;;
       hsa*)
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index fc91702a434..7eb137472c2 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -167,44 +167,61 @@ if test x"$enable_offload_targets" != x; then
 	tgt_plugin=intelmic
 	;;
       nvptx*)
-	tgt_plugin=nvptx
-	PLUGIN_NVPTX=$tgt
-	if test "x$CUDA_DRIVER_LIB" != xno \
-	   && test "x$CUDA_DRIVER_LIB" != xno; then
-	  PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
-	  PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
-	  PLUGIN_NVPTX_LIBS='-lcuda'
+	case "${target}" in
+	  aarch64*-*-* | powerpc64le-*-* | x86_64-*-*)
+	    case " ${CC} ${CFLAGS} " in
+	      *" -m32 "* | *" -mx32 "*)
+		# PR libgomp/65099: Currently, we only support offloading in
+		# 64-bit configurations.
+		PLUGIN_NVPTX=0
+		;;
+	      *)
+		tgt_plugin=nvptx
+		PLUGIN_NVPTX=$tgt
+		if test "x$CUDA_DRIVER_LIB" != xno \
+		   && test "x$CUDA_DRIVER_LIB" != xno; then
+		  PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+		  PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+		  PLUGIN_NVPTX_LIBS='-lcuda'
 
-	  PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
-	  CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
-	  PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
-	  LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
-	  PLUGIN_NVPTX_save_LIBS=$LIBS
-	  LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
-	  AC_LINK_IFELSE(
-	    [AC_LANG_PROGRAM(
-	      [#include "cuda.h"],
-		[CUresult r = cuCtxPushCurrent (NULL);])],
-	    [PLUGIN_NVPTX=1])
-	  CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
-	  LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
-	  LIBS=$PLUGIN_NVPTX_save_LIBS
-	fi
-	case $PLUGIN_NVPTX in
-	  nvptx*)
-	    if (test "x$CUDA_DRIVER_INCLUDE" = x \
-		|| test "x$CUDA_DRIVER_INCLUDE" = xno) \
-	       && (test "x$CUDA_DRIVER_LIB" = x \
-		   || test "x$CUDA_DRIVER_LIB" = xno); then
-	      PLUGIN_NVPTX=1
-	      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
-	      PLUGIN_NVPTX_LIBS='-ldl'
-	      PLUGIN_NVPTX_DYNAMIC=1
-	    else
-	      PLUGIN_NVPTX=0
-	      AC_MSG_ERROR([CUDA driver package required for nvptx support])
-	    fi
-	  ;;
+		  PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+		  CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+		  PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+		  LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+		  PLUGIN_NVPTX_save_LIBS=$LIBS
+		  LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+		  AC_LINK_IFELSE(
+		    [AC_LANG_PROGRAM(
+		      [#include "cuda.h"],
+			[CUresult r = cuCtxPushCurrent (NULL);])],
+		    [PLUGIN_NVPTX=1])
+		  CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+		  LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+		  LIBS=$PLUGIN_NVPTX_save_LIBS
+		fi
+		case $PLUGIN_NVPTX in
+		  nvptx*)
+		    if (test "x$CUDA_DRIVER_INCLUDE" = x \
+			|| test "x$CUDA_DRIVER_INCLUDE" = xno) \
+		       && (test "x$CUDA_DRIVER_LIB" = x \
+			   || test "x$CUDA_DRIVER_LIB" = xno); then
+		      PLUGIN_NVPTX=1
+		      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
+		      PLUGIN_NVPTX_LIBS='-ldl'
+		      PLUGIN_NVPTX_DYNAMIC=1
+		    else
+		      PLUGIN_NVPTX=0
+		      AC_MSG_ERROR([CUDA driver package required for nvptx support])
+		    fi
+		    ;;
+		esac
+		;;
+	    esac
+	    ;;
+	  *-*-*)
+	    # Target architecture not supported.
+	    PLUGIN_NVPTX=0
+	    ;;
 	esac
 	;;
       hsa*)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 390804ad1fa..ee4a3ed2264 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -546,15 +546,6 @@ nvptx_get_num_devices (void)
 {
   int n;
 
-  /* PR libgomp/65099: Currently, we only support offloading in 64-bit
-     configurations.  */
-  if (sizeof (void *) != 8)
-    {
-      GOMP_PLUGIN_debug (0, "Disabling nvptx offloading;"
-			 " only 64-bit configurations are supported\n");
-      return 0;
-    }
-
   /* This function will be called before the plugin has been initialized in
      order to enumerate available devices, but CUDA API routines can't be used
      until cuInit has been called.  Just call it now (but don't yet do any
-- 
2.17.1


[-- Attachment #4: 0001-nvptx-libgomp-plugin-Build-only-in-supported-conf.g9.patch --]
[-- Type: text/x-diff, Size: 8705 bytes --]

From 0f1e1069a753e912b058f0d4bf599f0edde28408 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 30 Nov 2020 15:15:20 +0100
Subject: [PATCH] [nvptx libgomp plugin] Build only in supported configurations

As recently again discussed in <https://gcc.gnu.org/PR97436> "[nvptx] -m32
support", nvptx offloading other than for 64-bit host has never been
implemented, tested, or supported.  So we simply shouldn't build the nvptx
libgomp plugin in this case.

This avoids build problems if, for example, in a (standard) bi-arch
x86_64-pc-linux-gnu '-m64'/'-m32' build, libcuda is available only in a 64-bit
variant but not in a 32-bit one, which, for example, is the case if you build
GCC against the CUDA toolkit's 'stubs/libcuda.so' (see
<https://stackoverflow.com/a/52784819>).

This amends PR65099 commit a92defdab79a1268f4b9dcf42b937e4002a4cf15 (r225560)
"[nvptx offloading] Only 64-bit configurations are currently supported" to
match the way we're doing this for the HSA/GCN plugins.

	libgomp/
	PR libgomp/65099
	* plugin/configfrag.ac (PLUGIN_NVPTX): Restrict to supported
	configurations.
	* configure: Regenerate.
	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Remove 64-bit
	check.

(cherry picked from commit 6106dfb9f73a33c87108ad5b2dcd4842bdd7828e)
---
 libgomp/configure             | 85 +++++++++++++++++++-------------
 libgomp/plugin/configfrag.ac  | 91 +++++++++++++++++++++--------------
 libgomp/plugin/plugin-nvptx.c |  9 ----
 3 files changed, 105 insertions(+), 80 deletions(-)

diff --git a/libgomp/configure b/libgomp/configure
index b4bc4f43628..de31f97c2c6 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15641,21 +15641,30 @@ if test x"$enable_offload_targets" != x; then
 	tgt_plugin=intelmic
 	;;
       nvptx*)
-	tgt_plugin=nvptx
-	PLUGIN_NVPTX=$tgt
-	if test "x$CUDA_DRIVER_LIB" != xno \
-	   && test "x$CUDA_DRIVER_LIB" != xno; then
-	  PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
-	  PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
-	  PLUGIN_NVPTX_LIBS='-lcuda'
-
-	  PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
-	  CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
-	  PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
-	  LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
-	  PLUGIN_NVPTX_save_LIBS=$LIBS
-	  LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
-	  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+	case "${target}" in
+	  aarch64*-*-* | powerpc64le-*-* | x86_64-*-*)
+	    case " ${CC} ${CFLAGS} " in
+	      *" -m32 "* | *" -mx32 "*)
+		# PR libgomp/65099: Currently, we only support offloading in
+		# 64-bit configurations.
+		PLUGIN_NVPTX=0
+		;;
+	      *)
+		tgt_plugin=nvptx
+		PLUGIN_NVPTX=$tgt
+		if test "x$CUDA_DRIVER_LIB" != xno \
+		   && test "x$CUDA_DRIVER_LIB" != xno; then
+		  PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+		  PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+		  PLUGIN_NVPTX_LIBS='-lcuda'
+
+		  PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+		  CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+		  PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+		  LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+		  PLUGIN_NVPTX_save_LIBS=$LIBS
+		  LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+		  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include "cuda.h"
 int
@@ -15671,25 +15680,33 @@ if ac_fn_c_try_link "$LINENO"; then :
 fi
 rm -f core conftest.err conftest.$ac_objext \
     conftest$ac_exeext conftest.$ac_ext
-	  CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
-	  LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
-	  LIBS=$PLUGIN_NVPTX_save_LIBS
-	fi
-	case $PLUGIN_NVPTX in
-	  nvptx*)
-	    if (test "x$CUDA_DRIVER_INCLUDE" = x \
-		|| test "x$CUDA_DRIVER_INCLUDE" = xno) \
-	       && (test "x$CUDA_DRIVER_LIB" = x \
-		   || test "x$CUDA_DRIVER_LIB" = xno); then
-	      PLUGIN_NVPTX=1
-	      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
-	      PLUGIN_NVPTX_LIBS='-ldl'
-	      PLUGIN_NVPTX_DYNAMIC=1
-	    else
-	      PLUGIN_NVPTX=0
-	      as_fn_error $? "CUDA driver package required for nvptx support" "$LINENO" 5
-	    fi
-	  ;;
+		  CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+		  LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+		  LIBS=$PLUGIN_NVPTX_save_LIBS
+		fi
+		case $PLUGIN_NVPTX in
+		  nvptx*)
+		    if (test "x$CUDA_DRIVER_INCLUDE" = x \
+			|| test "x$CUDA_DRIVER_INCLUDE" = xno) \
+		       && (test "x$CUDA_DRIVER_LIB" = x \
+			   || test "x$CUDA_DRIVER_LIB" = xno); then
+		      PLUGIN_NVPTX=1
+		      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
+		      PLUGIN_NVPTX_LIBS='-ldl'
+		      PLUGIN_NVPTX_DYNAMIC=1
+		    else
+		      PLUGIN_NVPTX=0
+		      as_fn_error $? "CUDA driver package required for nvptx support" "$LINENO" 5
+		    fi
+		    ;;
+		esac
+		;;
+	    esac
+	    ;;
+	  *-*-*)
+	    # Target architecture not supported.
+	    PLUGIN_NVPTX=0
+	    ;;
 	esac
 	;;
       hsa*)
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 9718ac752e2..77e1cda1a73 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -158,44 +158,61 @@ if test x"$enable_offload_targets" != x; then
 	tgt_plugin=intelmic
 	;;
       nvptx*)
-	tgt_plugin=nvptx
-	PLUGIN_NVPTX=$tgt
-	if test "x$CUDA_DRIVER_LIB" != xno \
-	   && test "x$CUDA_DRIVER_LIB" != xno; then
-	  PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
-	  PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
-	  PLUGIN_NVPTX_LIBS='-lcuda'
+	case "${target}" in
+	  aarch64*-*-* | powerpc64le-*-* | x86_64-*-*)
+	    case " ${CC} ${CFLAGS} " in
+	      *" -m32 "* | *" -mx32 "*)
+		# PR libgomp/65099: Currently, we only support offloading in
+		# 64-bit configurations.
+		PLUGIN_NVPTX=0
+		;;
+	      *)
+		tgt_plugin=nvptx
+		PLUGIN_NVPTX=$tgt
+		if test "x$CUDA_DRIVER_LIB" != xno \
+		   && test "x$CUDA_DRIVER_LIB" != xno; then
+		  PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+		  PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+		  PLUGIN_NVPTX_LIBS='-lcuda'
 
-	  PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
-	  CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
-	  PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
-	  LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
-	  PLUGIN_NVPTX_save_LIBS=$LIBS
-	  LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
-	  AC_LINK_IFELSE(
-	    [AC_LANG_PROGRAM(
-	      [#include "cuda.h"],
-		[CUresult r = cuCtxPushCurrent (NULL);])],
-	    [PLUGIN_NVPTX=1])
-	  CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
-	  LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
-	  LIBS=$PLUGIN_NVPTX_save_LIBS
-	fi
-	case $PLUGIN_NVPTX in
-	  nvptx*)
-	    if (test "x$CUDA_DRIVER_INCLUDE" = x \
-		|| test "x$CUDA_DRIVER_INCLUDE" = xno) \
-	       && (test "x$CUDA_DRIVER_LIB" = x \
-		   || test "x$CUDA_DRIVER_LIB" = xno); then
-	      PLUGIN_NVPTX=1
-	      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
-	      PLUGIN_NVPTX_LIBS='-ldl'
-	      PLUGIN_NVPTX_DYNAMIC=1
-	    else
-	      PLUGIN_NVPTX=0
-	      AC_MSG_ERROR([CUDA driver package required for nvptx support])
-	    fi
-	  ;;
+		  PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+		  CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+		  PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+		  LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+		  PLUGIN_NVPTX_save_LIBS=$LIBS
+		  LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+		  AC_LINK_IFELSE(
+		    [AC_LANG_PROGRAM(
+		      [#include "cuda.h"],
+			[CUresult r = cuCtxPushCurrent (NULL);])],
+		    [PLUGIN_NVPTX=1])
+		  CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+		  LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+		  LIBS=$PLUGIN_NVPTX_save_LIBS
+		fi
+		case $PLUGIN_NVPTX in
+		  nvptx*)
+		    if (test "x$CUDA_DRIVER_INCLUDE" = x \
+			|| test "x$CUDA_DRIVER_INCLUDE" = xno) \
+		       && (test "x$CUDA_DRIVER_LIB" = x \
+			   || test "x$CUDA_DRIVER_LIB" = xno); then
+		      PLUGIN_NVPTX=1
+		      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
+		      PLUGIN_NVPTX_LIBS='-ldl'
+		      PLUGIN_NVPTX_DYNAMIC=1
+		    else
+		      PLUGIN_NVPTX=0
+		      AC_MSG_ERROR([CUDA driver package required for nvptx support])
+		    fi
+		    ;;
+		esac
+		;;
+	    esac
+	    ;;
+	  *-*-*)
+	    # Target architecture not supported.
+	    PLUGIN_NVPTX=0
+	    ;;
 	esac
 	;;
       hsa*)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 387e7cc6dd3..eaa8a956573 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -876,15 +876,6 @@ nvptx_get_num_devices (void)
 {
   int n;
 
-  /* PR libgomp/65099: Currently, we only support offloading in 64-bit
-     configurations.  */
-  if (sizeof (void *) != 8)
-    {
-      GOMP_PLUGIN_debug (0, "Disabling nvptx offloading;"
-			 " only 64-bit configurations are supported\n");
-      return 0;
-    }
-
   /* This function will be called before the plugin has been initialized in
      order to enumerate available devices, but CUDA API routines can't be used
      until cuInit has been called.  Just call it now (but don't yet do any
-- 
2.17.1


[-- Attachment #5: 0001-nvptx-libgomp-plugin-Build-only-in-supported-conf.g8.patch --]
[-- Type: text/x-diff, Size: 7868 bytes --]

From f9267925c648f2ccd9e4680b699e581003125bcf Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 30 Nov 2020 15:15:20 +0100
Subject: [PATCH] [nvptx libgomp plugin] Build only in supported configurations

As recently again discussed in <https://gcc.gnu.org/PR97436> "[nvptx] -m32
support", nvptx offloading other than for 64-bit host has never been
implemented, tested, or supported.  So we simply shouldn't build the nvptx
libgomp plugin in this case.

This avoids build problems if, for example, in a (standard) bi-arch
x86_64-pc-linux-gnu '-m64'/'-m32' build, libcuda is available only in a 64-bit
variant but not in a 32-bit one, which, for example, is the case if you build
GCC against the CUDA toolkit's 'stubs/libcuda.so' (see
<https://stackoverflow.com/a/52784819>).

This amends PR65099 commit a92defdab79a1268f4b9dcf42b937e4002a4cf15 (r225560)
"[nvptx offloading] Only 64-bit configurations are currently supported" to
match the way we're doing this for the HSA/GCN plugins.

	libgomp/
	PR libgomp/65099
	* plugin/configfrag.ac (PLUGIN_NVPTX): Restrict to supported
	configurations.
	* configure: Regenerate.
	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Remove 64-bit
	check.

(cherry picked from commit 6106dfb9f73a33c87108ad5b2dcd4842bdd7828e)
---
 libgomp/configure             | 75 +++++++++++++++++++-------------
 libgomp/plugin/configfrag.ac  | 81 +++++++++++++++++++++--------------
 libgomp/plugin/plugin-nvptx.c |  9 ----
 3 files changed, 95 insertions(+), 70 deletions(-)

diff --git a/libgomp/configure b/libgomp/configure
index ced7606b355..2529a8e0603 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15398,19 +15398,28 @@ if test x"$enable_offload_targets" != x; then
 	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
-	PLUGIN_NVPTX=$tgt
-	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
-	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
-	PLUGIN_NVPTX_LIBS='-lcuda'
-
-	PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
-	CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
-	PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
-	LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
-	PLUGIN_NVPTX_save_LIBS=$LIBS
-	LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
-	cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+	case "${target}" in
+	  aarch64*-*-* | powerpc64le-*-* | x86_64-*-*)
+	    case " ${CC} ${CFLAGS} " in
+	      *" -m32 "* | *" -mx32 "*)
+		# PR libgomp/65099: Currently, we only support offloading in
+		# 64-bit configurations.
+		PLUGIN_NVPTX=0
+		;;
+	      *)
+		tgt_name=nvptx
+		PLUGIN_NVPTX=$tgt
+		PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+		PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+		PLUGIN_NVPTX_LIBS='-lcuda'
+
+		PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+		CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+		PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+		LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+		PLUGIN_NVPTX_save_LIBS=$LIBS
+		LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+		cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
 #include "cuda.h"
 int
@@ -15426,22 +15435,30 @@ if ac_fn_c_try_link "$LINENO"; then :
 fi
 rm -f core conftest.err conftest.$ac_objext \
     conftest$ac_exeext conftest.$ac_ext
-	CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
-	LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
-	LIBS=$PLUGIN_NVPTX_save_LIBS
-	case $PLUGIN_NVPTX in
-	  nvptx*)
-	    if test "x$CUDA_DRIVER_INCLUDE" = x \
-	       && test "x$CUDA_DRIVER_LIB" = x; then
-	      PLUGIN_NVPTX=1
-	      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
-	      PLUGIN_NVPTX_LIBS='-ldl'
-	      PLUGIN_NVPTX_DYNAMIC=1
-	    else
-	      PLUGIN_NVPTX=0
-	      as_fn_error "CUDA driver package required for nvptx support" "$LINENO" 5
-	    fi
-	  ;;
+		CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+		LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+		LIBS=$PLUGIN_NVPTX_save_LIBS
+		case $PLUGIN_NVPTX in
+		  nvptx*)
+		    if test "x$CUDA_DRIVER_INCLUDE" = x \
+		       && test "x$CUDA_DRIVER_LIB" = x; then
+		      PLUGIN_NVPTX=1
+		      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
+		      PLUGIN_NVPTX_LIBS='-ldl'
+		      PLUGIN_NVPTX_DYNAMIC=1
+		    else
+		      PLUGIN_NVPTX=0
+		      as_fn_error "CUDA driver package required for nvptx support" "$LINENO" 5
+		    fi
+		    ;;
+		esac
+		;;
+	    esac
+	    ;;
+	  *-*-*)
+	    # Target architecture not supported.
+	    PLUGIN_NVPTX=0
+	    ;;
 	esac
 	;;
       hsa*)
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index 864817d44d1..d3470f82f8c 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -148,39 +148,56 @@ if test x"$enable_offload_targets" != x; then
 	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
-	PLUGIN_NVPTX=$tgt
-	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
-	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
-	PLUGIN_NVPTX_LIBS='-lcuda'
+	case "${target}" in
+	  aarch64*-*-* | powerpc64le-*-* | x86_64-*-*)
+	    case " ${CC} ${CFLAGS} " in
+	      *" -m32 "* | *" -mx32 "*)
+		# PR libgomp/65099: Currently, we only support offloading in
+		# 64-bit configurations.
+		PLUGIN_NVPTX=0
+		;;
+	      *)
+		tgt_name=nvptx
+		PLUGIN_NVPTX=$tgt
+		PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
+		PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
+		PLUGIN_NVPTX_LIBS='-lcuda'
 
-	PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
-	CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
-	PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
-	LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
-	PLUGIN_NVPTX_save_LIBS=$LIBS
-	LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
-	AC_LINK_IFELSE(
-	  [AC_LANG_PROGRAM(
-	    [#include "cuda.h"],
-	      [CUresult r = cuCtxPushCurrent (NULL);])],
-	  [PLUGIN_NVPTX=1])
-	CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
-	LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
-	LIBS=$PLUGIN_NVPTX_save_LIBS
-	case $PLUGIN_NVPTX in
-	  nvptx*)
-	    if test "x$CUDA_DRIVER_INCLUDE" = x \
-	       && test "x$CUDA_DRIVER_LIB" = x; then
-	      PLUGIN_NVPTX=1
-	      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
-	      PLUGIN_NVPTX_LIBS='-ldl'
-	      PLUGIN_NVPTX_DYNAMIC=1
-	    else
-	      PLUGIN_NVPTX=0
-	      AC_MSG_ERROR([CUDA driver package required for nvptx support])
-	    fi
-	  ;;
+		PLUGIN_NVPTX_save_CPPFLAGS=$CPPFLAGS
+		CPPFLAGS="$PLUGIN_NVPTX_CPPFLAGS $CPPFLAGS"
+		PLUGIN_NVPTX_save_LDFLAGS=$LDFLAGS
+		LDFLAGS="$PLUGIN_NVPTX_LDFLAGS $LDFLAGS"
+		PLUGIN_NVPTX_save_LIBS=$LIBS
+		LIBS="$PLUGIN_NVPTX_LIBS $LIBS"
+		AC_LINK_IFELSE(
+		  [AC_LANG_PROGRAM(
+		    [#include "cuda.h"],
+		      [CUresult r = cuCtxPushCurrent (NULL);])],
+		  [PLUGIN_NVPTX=1])
+		CPPFLAGS=$PLUGIN_NVPTX_save_CPPFLAGS
+		LDFLAGS=$PLUGIN_NVPTX_save_LDFLAGS
+		LIBS=$PLUGIN_NVPTX_save_LIBS
+		case $PLUGIN_NVPTX in
+		  nvptx*)
+		    if test "x$CUDA_DRIVER_INCLUDE" = x \
+		       && test "x$CUDA_DRIVER_LIB" = x; then
+		      PLUGIN_NVPTX=1
+		      PLUGIN_NVPTX_CPPFLAGS='-I$(srcdir)/plugin/cuda'
+		      PLUGIN_NVPTX_LIBS='-ldl'
+		      PLUGIN_NVPTX_DYNAMIC=1
+		    else
+		      PLUGIN_NVPTX=0
+		      AC_MSG_ERROR([CUDA driver package required for nvptx support])
+		    fi
+		    ;;
+		esac
+		;;
+	    esac
+	    ;;
+	  *-*-*)
+	    # Target architecture not supported.
+	    PLUGIN_NVPTX=0
+	    ;;
 	esac
 	;;
       hsa*)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 9ae60953a9a..0a4f4f410bb 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -835,15 +835,6 @@ nvptx_get_num_devices (void)
 {
   int n;
 
-  /* PR libgomp/65099: Currently, we only support offloading in 64-bit
-     configurations.  */
-  if (sizeof (void *) != 8)
-    {
-      GOMP_PLUGIN_debug (0, "Disabling nvptx offloading;"
-			 " only 64-bit configurations are supported\n");
-      return 0;
-    }
-
   /* This function will be called before the plugin has been initialized in
      order to enumerate available devices, but CUDA API routines can't be used
      until cuInit has been called.  Just call it now (but don't yet do any
-- 
2.17.1



* [committed] libgomp: Use sizeof(void*) based checks instead of looking through $CC $CFLAGS for -m32/-mx32
  2021-01-14 18:18                     ` [nvptx libgomp plugin] Build only in supported configurations (was: [nvptx offloading] Only 64-bit configurations are currently supported) Thomas Schwinge
@ 2021-03-04  8:52                       ` Jakub Jelinek
  2021-03-22 11:24                         ` Thomas Schwinge
  0 siblings, 1 reply; 82+ messages in thread
From: Jakub Jelinek @ 2021-03-04  8:52 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches

On Thu, Jan 14, 2021 at 07:18:13PM +0100, Thomas Schwinge wrote:
> 	libgomp/
> 	PR libgomp/65099
> 	* plugin/configfrag.ac (PLUGIN_NVPTX): Restrict to supported
> 	configurations.
> 	* configure: Regenerate.
> 	* plugin/plugin-nvptx.c (nvptx_get_num_devices): Remove 64-bit
> 	check.

Some gcc configurations default to -m32 but support -m64 too, so looking
through $CC $CFLAGS for -m32/-mx32 misses them.  This patch just makes the
ILP32 tests more reliable by following what e.g. the libsanitizer configury
does.  Perhaps we should incrementally also handle
| i?86-*-*
there.  I didn't change the case to an if because that would require
reindenting the whole large block.
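
To illustrate the difference, here is a minimal sketch in plain shell.  The
helper names plugin_by_flags and plugin_by_sizeof are made up for this
example and are not part of libgomp's configure; the printed 0/1 merely
stands in for the PLUGIN_NVPTX decision, and the pointer size is passed in
by hand instead of being measured by AC_CHECK_SIZEOF([void *]).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not libgomp configure code.

# Old approach: scan the compiler command line for -m32/-mx32.
# Prints 0 (plugin disabled) or 1 (plugin may be enabled).
plugin_by_flags () {
  # $1 = CC, $2 = CFLAGS
  case " $1 $2 " in
    *" -m32 "* | *" -mx32 "*) echo 0 ;;
    *) echo 1 ;;
  esac
}

# New approach: decide from the pointer size in bytes, i.e. the kind of
# value configure caches in ac_cv_sizeof_void_p.
plugin_by_sizeof () {
  # $1 = sizeof (void *)
  case "$1" in
    4) echo 0 ;;
    *) echo 1 ;;
  esac
}

# A compiler that defaults to 32-bit code has no -m32 on its command line,
# so the flag scan does not notice it, while the size check does.
plugin_by_flags "gcc" "-O2"        # prints 1
plugin_by_sizeof 4                 # prints 0

# An explicit -m32 is caught by the flag scan as before.
plugin_by_flags "gcc -m32" "-O2"   # prints 0
```

The size-based test depends only on what the compiler actually produces,
not on how the user happened to spell the options.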

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-03-04  Jakub Jelinek  <jakub@redhat.com>

	* configure.ac: Add AC_CHECK_SIZEOF([void *]).
	* plugin/configfrag.ac: Check $ac_cv_sizeof_void_p value instead of
	checking for -m32 or -mx32 options on the command line.
	* config.h.in: Regenerated.
	* configure: Regenerated.

--- libgomp/configure.ac.jj	2020-07-28 15:39:10.148754303 +0200
+++ libgomp/configure.ac	2021-03-03 14:41:21.964355951 +0100
@@ -221,6 +221,8 @@ if test x$libgomp_offloaded_only = xyes;
             [Define to 1 if building libgomp for an accelerator-only target.])
 fi
 
+AC_CHECK_SIZEOF([void *])
+
 m4_include([plugin/configfrag.ac])
 
 # Check for functions needed.
--- libgomp/plugin/configfrag.ac.jj	2021-01-14 19:34:06.164423884 +0100
+++ libgomp/plugin/configfrag.ac	2021-03-03 14:45:45.374070228 +0100
@@ -160,8 +160,8 @@ if test x"$enable_offload_targets" != x;
       nvptx*)
 	case "${target}" in
 	  aarch64*-*-* | powerpc64le-*-* | x86_64-*-*)
-	    case " ${CC} ${CFLAGS} " in
-	      *" -m32 "* | *" -mx32 "*)
+	    case "$ac_cv_sizeof_void_p" in
+	      4)
 		# PR libgomp/65099: Currently, we only support offloading in
 		# 64-bit configurations.
 		PLUGIN_NVPTX=0
@@ -218,8 +218,8 @@ if test x"$enable_offload_targets" != x;
       amdgcn*)
 	case "${target}" in
 	  x86_64-*-*)
-	    case " ${CC} ${CFLAGS} " in
-	      *" -m32 "*|*" -mx32 "*)
+	    case "$ac_cv_sizeof_void_p" in
+	      4)
 		PLUGIN_GCN=0
 		;;
 	      *)
--- libgomp/config.h.in.jj	2020-08-03 22:54:51.483530741 +0200
+++ libgomp/config.h.in	2021-03-03 14:46:07.965788364 +0100
@@ -183,6 +183,9 @@
 /* Define if all infrastructure, needed for plugins, is supported. */
 #undef PLUGIN_SUPPORT
 
+/* The size of `void *', as computed by sizeof. */
+#undef SIZEOF_VOID_P
+
 /* Define to 1 if you have the ANSI C header files. */
 #undef STDC_HEADERS
 
--- libgomp/configure.jj	2021-01-14 19:34:06.140424158 +0100
+++ libgomp/configure	2021-03-03 14:46:05.317821453 +0100
@@ -2058,60 +2058,6 @@ fi
 
 } # ac_fn_c_check_header_mongrel
 
-# ac_fn_c_check_type LINENO TYPE VAR INCLUDES
-# -------------------------------------------
-# Tests whether TYPE exists after having included INCLUDES, setting cache
-# variable VAR accordingly.
-ac_fn_c_check_type ()
-{
-  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
-  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5
-$as_echo_n "checking for $2... " >&6; }
-if eval \${$3+:} false; then :
-  $as_echo_n "(cached) " >&6
-else
-  eval "$3=no"
-  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-$4
-int
-main ()
-{
-if (sizeof ($2))
-	 return 0;
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_compile "$LINENO"; then :
-  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h.  */
-$4
-int
-main ()
-{
-if (sizeof (($2)))
-	    return 0;
-  ;
-  return 0;
-}
-_ACEOF
-if ac_fn_c_try_compile "$LINENO"; then :
-
-else
-  eval "$3=yes"
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
-fi
-rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
-fi
-eval ac_res=\$$3
-	       { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
-$as_echo "$ac_res" >&6; }
-  eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno
-
-} # ac_fn_c_check_type
-
 # ac_fn_c_compute_int LINENO EXPR VAR INCLUDES
 # --------------------------------------------
 # Tries to find the compile-time value of EXPR in a program that includes
@@ -2294,6 +2240,60 @@ rm -f conftest.val
   as_fn_set_status $ac_retval
 
 } # ac_fn_c_compute_int
+
+# ac_fn_c_check_type LINENO TYPE VAR INCLUDES
+# -------------------------------------------
+# Tests whether TYPE exists after having included INCLUDES, setting cache
+# variable VAR accordingly.
+ac_fn_c_check_type ()
+{
+  as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5
+$as_echo_n "checking for $2... " >&6; }
+if eval \${$3+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  eval "$3=no"
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+$4
+int
+main ()
+{
+if (sizeof ($2))
+	 return 0;
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+$4
+int
+main ()
+{
+if (sizeof (($2)))
+	    return 0;
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+
+else
+  eval "$3=yes"
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+fi
+eval ac_res=\$$3
+	       { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5
+$as_echo "$ac_res" >&6; }
+  eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno
+
+} # ac_fn_c_check_type
 cat >config.log <<_ACEOF
 This file contains any messages produced by compilers while
 running configure, to aid debugging if configure makes a mistake.
@@ -11421,7 +11421,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11434 "configure"
+#line 11424 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11527,7 +11527,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11540 "configure"
+#line 11530 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -14251,16 +14251,6 @@ freebsd* | dragonfly*)
   esac
   ;;
 
-gnu*)
-  version_type=linux
-  need_lib_prefix=no
-  need_version=no
-  library_names_spec='${libname}${release}${shared_ext}$versuffix ${libname}${release}${shared_ext}${major} ${libname}${shared_ext}'
-  soname_spec='${libname}${release}${shared_ext}$major'
-  shlibpath_var=LD_LIBRARY_PATH
-  hardcode_into_libs=yes
-  ;;
-
 haiku*)
   version_type=linux
   need_lib_prefix=no
@@ -14382,7 +14372,7 @@ linux*oldld* | linux*aout* | linux*coff*
 # project, but have not yet been accepted: they are GCC-local changes
 # for the time being.  (See
 # https://lists.gnu.org/archive/html/libtool-patches/2018-05/msg00000.html)
-linux* | k*bsd*-gnu | kopensolaris*-gnu | uclinuxfdpiceabi)
+linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu* | uclinuxfdpiceabi)
   version_type=linux
   need_lib_prefix=no
   need_version=no
@@ -15005,9 +14995,43 @@ $as_echo "#define LIBGOMP_OFFLOADED_ONLY
 
 fi
 
+# The cast to long int works around a bug in the HP C Compiler
+# version HP92453-01 B.11.11.23709.GP, which incorrectly rejects
+# declarations like `int a3[[(sizeof (unsigned char)) >= 0]];'.
+# This bug is HP SR number 8606223364.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking size of void *" >&5
+$as_echo_n "checking size of void *... " >&6; }
+if ${ac_cv_sizeof_void_p+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  if ac_fn_c_compute_int "$LINENO" "(long int) (sizeof (void *))" "ac_cv_sizeof_void_p"        "$ac_includes_default"; then :
+
+else
+  if test "$ac_cv_type_void_p" = yes; then
+     { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+as_fn_error 77 "cannot compute sizeof (void *)
+See \`config.log' for more details" "$LINENO" 5; }
+   else
+     ac_cv_sizeof_void_p=0
+   fi
+fi
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_sizeof_void_p" >&5
+$as_echo "$ac_cv_sizeof_void_p" >&6; }
+
+
+
+cat >>confdefs.h <<_ACEOF
+#define SIZEOF_VOID_P $ac_cv_sizeof_void_p
+_ACEOF
+
+
+
 # Plugins for offload execution, configure.ac fragment.  -*- mode: autoconf -*-
 #
-# Copyright (C) 2014-2020 Free Software Foundation, Inc.
+# Copyright (C) 2014-2021 Free Software Foundation, Inc.
 #
 # Contributed by Mentor Embedded.
 #
@@ -15274,8 +15298,8 @@ if test x"$enable_offload_targets" != x;
       nvptx*)
 	case "${target}" in
 	  aarch64*-*-* | powerpc64le-*-* | x86_64-*-*)
-	    case " ${CC} ${CFLAGS} " in
-	      *" -m32 "* | *" -mx32 "*)
+	    case "$ac_cv_sizeof_void_p" in
+	      4)
 		# PR libgomp/65099: Currently, we only support offloading in
 		# 64-bit configurations.
 		PLUGIN_NVPTX=0
@@ -15343,8 +15367,8 @@ rm -f core conftest.err conftest.$ac_obj
       amdgcn*)
 	case "${target}" in
 	  x86_64-*-*)
-	    case " ${CC} ${CFLAGS} " in
-	      *" -m32 "*|*" -mx32 "*)
+	    case "$ac_cv_sizeof_void_p" in
+	      4)
 		PLUGIN_GCN=0
 		;;
 	      *)


	Jakub



* Re: [committed] libgomp: Use sizeof(void*) based checks instead of looking through $CC $CFLAGS for -m32/-mx32
  2021-03-04  8:52                       ` [committed] libgomp: Use sizeof(void*) based checks instead of looking through $CC $CFLAGS for -m32/-mx32 Jakub Jelinek
@ 2021-03-22 11:24                         ` Thomas Schwinge
  0 siblings, 0 replies; 82+ messages in thread
From: Thomas Schwinge @ 2021-03-22 11:24 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

Hi Jakub!

On 2021-03-04T09:52:41+0100, Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> On Thu, Jan 14, 2021 at 07:18:13PM +0100, Thomas Schwinge wrote:
>>      libgomp/
>>      PR libgomp/65099
>>      * plugin/configfrag.ac (PLUGIN_NVPTX): Restrict to supported
>>      configurations.

(For the nvptx offloading plugin, I had copied the approach that had
previously been established for the GCN offloading plugin.)

> Some gcc configurations default to -m32 but support -m64 too.  This patch
> just makes the ILP32 tests more reliable by following what e.g. libsanitizer
> configury does.

ACK, thanks!
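
To illustrate the difference between the two checks, here is a minimal
sketch. The helper names are hypothetical; only the two `case' patterns are
taken from the patch. It assumes a compiler that defaults to ILP32 without
any -m32 on the command line, which is the situation described above.

```sh
# Hypothetical helpers; only the `case' patterns mirror the patch.

# Old check: the plugin is disabled only if -m32/-mx32 is spelled out
# in $CC or $CFLAGS ($1 and $2 here).
plugin_enabled_by_flags () {
  case " $1 $2 " in
    *" -m32 "* | *" -mx32 "*) echo 0 ;;
    *) echo 1 ;;
  esac
}

# New check: the plugin is disabled whenever pointers are 4 bytes wide,
# i.e. when $ac_cv_sizeof_void_p (passed as $1) is 4.
plugin_enabled_by_sizeof () {
  case "$1" in
    4) echo 0 ;;
    *) echo 1 ;;
  esac
}

# A compiler defaulting to -m32: no flag to match, yet sizeof (void *) is 4.
plugin_enabled_by_flags "gcc" "-O2"   # prints 1: plugin wrongly enabled
plugin_enabled_by_sizeof 4            # prints 0: plugin correctly disabled
```

The flag-based check still catches an explicit "gcc -m32", but it cannot
see a 32-bit default; the sizeof-based check covers both cases.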

> Perhaps we should incrementally also handle there
> | i?86-*-*

Yes, I suppose we should.
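
A rough sketch of what that incremental change might look like. This is not
part of the committed patch: the helper name is hypothetical, and the
handling of all other targets is an assumption made for illustration.

```sh
# Hypothetical helper, not from the patch.
# $1: target triplet, $2: value of $ac_cv_sizeof_void_p.
nvptx_plugin_supported () {
  case "$1" in
    aarch64*-*-* | powerpc64le-*-* | x86_64-*-* | i?86-*-*)
      case "$2" in
        4) echo 0 ;;  # ILP32: offloading unsupported (PR libgomp/65099)
        *) echo 1 ;;
      esac
      ;;
    *) echo 0 ;;      # assumed: other targets stay unsupported
  esac
}

nvptx_plugin_supported i686-pc-linux-gnu 8   # prints 1
nvptx_plugin_supported i686-pc-linux-gnu 4   # prints 0
```

With the sizeof-based check in place, an i?86 target whose compiler is
invoked with -m64 would be treated the same as an x86_64 one.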

> Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

Shouldn't this also go onto the release branches?


Regards
 Thomas


> 2021-03-04  Jakub Jelinek  <jakub@redhat.com>
>
>       * configure.ac: Add AC_CHECK_SIZEOF([void *]).
>       * plugin/configfrag.ac: Check $ac_cv_sizeof_void_p value instead of
>       checking of -m32 or -mx32 options on the command line.
>       * config.h.in: Regenerated.
>       * configure: Regenerated.


Thread overview: 82+ messages
2014-10-20 14:19 The nvptx port [0/11+] Bernd Schmidt
2014-10-20 14:21 ` The nvptx port [1/11+] indirect jumps Bernd Schmidt
2014-10-21 18:29   ` Jeff Law
2014-10-21 21:03     ` Bernd Schmidt
2014-10-21 21:30       ` Jakub Jelinek
2014-10-21 21:37         ` Bernd Schmidt
2014-10-22  8:21           ` Richard Biener
2014-10-22  8:34             ` Jakub Jelinek
2014-10-22  8:37             ` Thomas Schwinge
2014-10-22 10:03               ` Richard Biener
2014-10-22 10:32                 ` Jakub Jelinek
2014-11-04 15:35   ` Bernd Schmidt
2014-11-04 15:43     ` Richard Henderson
2014-10-20 14:22 ` The nvptx port [2/11+] No register allocation Bernd Schmidt
2014-10-20 14:24 ` The nvptx port [3/11+] Struct returns Bernd Schmidt
2014-10-21 18:41   ` Jeff Law
2014-10-20 14:24 ` The nvptx port [2/11+] No register allocation Bernd Schmidt
2014-10-21 18:36   ` Jeff Law
2014-10-20 14:27 ` The nvptx port [4/11+] Post-RA pipeline Bernd Schmidt
2014-10-21 18:42   ` Jeff Law
2014-10-20 14:27 ` The nvptx port [5/11+] Variable declarations Bernd Schmidt
2014-10-21 18:44   ` Jeff Law
2014-10-20 14:31 ` The nvptx port [6/11+] Pseudo call args Bernd Schmidt
2014-10-21 18:56   ` Jeff Law
2014-10-20 14:32 ` The nvptx port [7/11+] Inform the port about call arguments Bernd Schmidt
2014-10-21 21:25   ` Jeff Law
2014-10-21 21:33     ` Bernd Schmidt
2014-10-21 21:55       ` Jeff Law
2014-10-21 22:16         ` Bernd Schmidt
2014-10-22 18:23           ` Jeff Law
2014-10-28 14:57             ` Bernd Schmidt
2014-10-29 23:42               ` Jeff Law
2014-10-20 14:32 ` The nvptx port [8/11+] Write undefined decls Bernd Schmidt
2014-10-21 22:07   ` Jeff Law
2014-10-21 22:30     ` Bernd Schmidt
2014-10-22 18:23       ` Jeff Law
2014-11-05 12:05         ` Bernd Schmidt
2014-11-05 20:05           ` Jeff Law
2014-10-20 14:35 ` The nvptx port [9/11+] Epilogues Bernd Schmidt
2014-10-21 22:08   ` Jeff Law
2014-10-20 14:50 ` The nvptx port [10/11+] Target files Bernd Schmidt
2014-10-22 18:12   ` Jeff Law
2014-10-28 15:10     ` Bernd Schmidt
2014-10-29 23:51       ` Jeff Law
2014-10-30  2:53         ` Bernd Schmidt
2014-10-30  3:09           ` Jeff Law
2014-11-10 16:33         ` Bernd Schmidt
2014-11-10 20:06           ` Jakub Jelinek
2014-11-10 20:37             ` H.J. Lu
2014-11-10 20:40             ` H.J. Lu
2014-11-10 20:42               ` Mike Stump
2014-12-12 20:18           ` Thomas Schwinge
2014-12-23 18:51           ` nvptx-tools and nvptx-newlib (was: The nvptx port [10/11+] Target files) Thomas Schwinge
2015-02-02 15:33             ` Thomas Schwinge
2015-02-04  9:43               ` Jakub Jelinek
2015-02-18  8:50                 ` Thomas Schwinge
2015-02-18  9:03                   ` Jakub Jelinek
2015-07-08 15:03                   ` [nvptx offloading] Only 64-bit configurations are currently supported (was: nvptx-tools and nvptx-newlib) Thomas Schwinge
2015-07-14 20:10                     ` [nvptx offloading] Only 64-bit configurations are currently supported Thomas Schwinge
2015-07-14 20:25                       ` Richard Biener
2021-01-14 18:18                     ` [nvptx libgomp plugin] Build only in supported configurations (was: [nvptx offloading] Only 64-bit configurations are currently supported) Thomas Schwinge
2021-03-04  8:52                       ` [committed] libgomp: Use sizeof(void*) based checks instead of looking through $CC $CFLAGS for -m32/-mx32 Jakub Jelinek
2021-03-22 11:24                         ` Thomas Schwinge
2014-11-04 16:48       ` The nvptx port [10/11+] Target files Richard Henderson
2014-11-04 16:55         ` Bernd Schmidt
2014-11-05 13:07           ` Bernd Schmidt
2014-10-20 14:58 ` The nvptx port [11/11] More tools Bernd Schmidt
2014-10-21  0:16   ` Joseph S. Myers
2014-10-22 20:40   ` Jeff Law
2014-10-22 21:16     ` Bernd Schmidt
2014-10-24 19:52       ` Jeff Law
2014-10-31 21:04   ` Jeff Law
     [not found]     ` <54542050.6010908@codesourcery.com>
2014-11-03 21:49       ` Jeff Law
2014-10-21  8:23 ` The nvptx port [0/11+] Richard Biener
2014-10-21 10:57   ` Bernd Schmidt
2014-10-21 11:27     ` Richard Biener
2014-10-21  9:17 ` Jakub Jelinek
2014-10-21 11:19   ` Bernd Schmidt
2014-11-12 12:36 ` Richard Biener
2014-11-12 21:39   ` Jeff Law
2015-02-18  7:48 ` nvptx-none: Define empty GOMP_SELF_SPECS (was: The nvptx port [0/11+]) Thomas Schwinge
2015-02-18  8:01 ` The nvptx port [0/11+] Thomas Schwinge
