public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] AArch64: Improve GOT addressing
@ 2021-05-05 13:17 Wilco Dijkstra
  2021-05-10 12:05 ` Richard Sandiford
  0 siblings, 1 reply; 4+ messages in thread
From: Wilco Dijkstra @ 2021-05-05 13:17 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Sandiford, Kyrylo Tkachov


Improve GOT addressing by emitting the instructions as a pair.  This reduces
register pressure and improves code quality. With -fPIC codesize improves by
0.65% and SPECINT2017 improves by 0.25%.

Passes bootstrap and regress. OK for commit?

ChangeLog:
2021-05-05  Wilco Dijkstra  <wdijkstr@arm.com>

        * config/aarch64/aarch64.md (ldr_got_small_<mode>): Emit ADRP+LDR GOT sequence.
        (ldr_got_small_sidi): Likewise.
        * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Remove tmp_reg.
        (aarch64_print_operand): Correctly print got_lo12 in L specifier.

---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 641c83b479e76cbcc75b299eb7ae5f634d9db7cd..32c5c76d3c001a79d2a69b7f8243f1f1f605f901 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3625,27 +3625,21 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 
 	rtx insn;
 	rtx mem;
-	rtx tmp_reg = dest;
 	machine_mode mode = GET_MODE (dest);
 
-	if (can_create_pseudo_p ())
-	  tmp_reg = gen_reg_rtx (mode);
-
-	emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, imm));
 	if (mode == ptr_mode)
 	  {
 	    if (mode == DImode)
-	      insn = gen_ldr_got_small_di (dest, tmp_reg, imm);
+	      insn = gen_ldr_got_small_di (dest, imm);
 	    else
-	      insn = gen_ldr_got_small_si (dest, tmp_reg, imm);
+	      insn = gen_ldr_got_small_si (dest, imm);
 
 	    mem = XVECEXP (SET_SRC (insn), 0, 0);
 	  }
 	else
 	  {
 	    gcc_assert (mode == Pmode);
-
-	    insn = gen_ldr_got_small_sidi (dest, tmp_reg, imm);
+	    insn = gen_ldr_got_small_sidi (dest, imm);
 	    mem = XVECEXP (XEXP (SET_SRC (insn), 0), 0, 0);
 	  }
 
@@ -11019,7 +11013,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
       switch (aarch64_classify_symbolic_expression (x))
 	{
 	case SYMBOL_SMALL_GOT_4G:
-	  asm_fprintf (asm_out_file, ":lo12:");
+	  asm_fprintf (asm_out_file, ":got_lo12:");
 	  break;
 
 	case SYMBOL_SMALL_TLSGD:
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index abfd84526745d029ad4953eabad6dd17b159a218..36c5c054f86e9cdd1f0945cdbc1beb47aa7ad80a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6705,25 +6705,23 @@ (define_insn "add_losym_<mode>"
 
 (define_insn "ldr_got_small_<mode>"
   [(set (match_operand:PTR 0 "register_operand" "=r")
-	(unspec:PTR [(mem:PTR (lo_sum:PTR
-			      (match_operand:PTR 1 "register_operand" "r")
-			      (match_operand:PTR 2 "aarch64_valid_symref" "S")))]
+	(unspec:PTR [(mem:PTR (match_operand:PTR 1 "aarch64_valid_symref" "S"))]
 		    UNSPEC_GOTSMALLPIC))]
   ""
-  "ldr\\t%<w>0, [%1, #:got_lo12:%c2]"
-  [(set_attr "type" "load_<ldst_sz>")]
+  "adrp\\t%0, %A1\;ldr\\t%<w>0, [%0, %L1]"
+  [(set_attr "type" "load_<ldst_sz>")
+   (set_attr "length" "8")]
 )
 
 (define_insn "ldr_got_small_sidi"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(zero_extend:DI
-	 (unspec:SI [(mem:SI (lo_sum:DI
-			     (match_operand:DI 1 "register_operand" "r")
-			     (match_operand:DI 2 "aarch64_valid_symref" "S")))]
+	 (unspec:SI [(mem:SI (match_operand:DI 1 "aarch64_valid_symref" "S"))]
 		    UNSPEC_GOTSMALLPIC)))]
   "TARGET_ILP32"
-  "ldr\\t%w0, [%1, #:got_lo12:%c2]"
-  [(set_attr "type" "load_4")]
+  "adrp\\t%0, %A1\;ldr\\t%w0, [%0, %L1]"
+  [(set_attr "type" "load_4")
+   (set_attr "length" "8")]
 )
 
 (define_insn "ldr_got_small_28k_<mode>"

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] AArch64: Improve GOT addressing
  2021-05-05 13:17 [PATCH] AArch64: Improve GOT addressing Wilco Dijkstra
@ 2021-05-10 12:05 ` Richard Sandiford
  2021-05-10 14:04   ` Wilco Dijkstra
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Sandiford @ 2021-05-10 12:05 UTC (permalink / raw)
  To: Wilco Dijkstra via Gcc-patches; +Cc: Wilco Dijkstra

Wilco Dijkstra via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Improve GOT addressing by emitting the instructions as a pair.  This reduces
> register pressure and improves code quality. With -fPIC codesize improves by
> 0.65% and SPECINT2017 improves by 0.25%.
>
> Passes bootstrap and regress. OK for commit?

Normally we should only put two instructions in the same define_insn
if there's a specific ABI or architectural reason for not separating
them.  Doing it purely for optimisation reasons is going against the
general direction of travel.  So I think the first question is: why
don't we simply delay the split until after reload instead, since
that's the more normal way of handling this kind of thing?

Also, the patch means that we use RTL of the form:

  (set (reg:PTR R)
       (unspec:PTR [(mem:PTR (symbol_ref:PTR S))]
                   UNSPEC_GOTSMALLPIC))

to represent the move of S into R.  This should just be represented as:

  (set (reg:PTR R) (symbol_ref:PTR S))

and go through the normal move patterns.

Thanks,
Richard

> ChangeLog:
> 2021-05-05  Wilco Dijkstra  <wdijkstr@arm.com>
>
>         * config/aarch64/aarch64.md (ldr_got_small_<mode>): Emit ADRP+LDR GOT sequence.
>         (ldr_got_small_sidi): Likewise.
>         * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Remove tmp_reg.
>         (aarch64_print_operand): Correctly print got_lo12 in L specifier.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 641c83b479e76cbcc75b299eb7ae5f634d9db7cd..32c5c76d3c001a79d2a69b7f8243f1f1f605f901 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3625,27 +3625,21 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
>  
>  	rtx insn;
>  	rtx mem;
> -	rtx tmp_reg = dest;
>  	machine_mode mode = GET_MODE (dest);
>  
> -	if (can_create_pseudo_p ())
> -	  tmp_reg = gen_reg_rtx (mode);
> -
> -	emit_move_insn (tmp_reg, gen_rtx_HIGH (mode, imm));
>  	if (mode == ptr_mode)
>  	  {
>  	    if (mode == DImode)
> -	      insn = gen_ldr_got_small_di (dest, tmp_reg, imm);
> +	      insn = gen_ldr_got_small_di (dest, imm);
>  	    else
> -	      insn = gen_ldr_got_small_si (dest, tmp_reg, imm);
> +	      insn = gen_ldr_got_small_si (dest, imm);
>  
>  	    mem = XVECEXP (SET_SRC (insn), 0, 0);
>  	  }
>  	else
>  	  {
>  	    gcc_assert (mode == Pmode);
> -
> -	    insn = gen_ldr_got_small_sidi (dest, tmp_reg, imm);
> +	    insn = gen_ldr_got_small_sidi (dest, imm);
>  	    mem = XVECEXP (XEXP (SET_SRC (insn), 0), 0, 0);
>  	  }
>  
> @@ -11019,7 +11013,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
>        switch (aarch64_classify_symbolic_expression (x))
>  	{
>  	case SYMBOL_SMALL_GOT_4G:
> -	  asm_fprintf (asm_out_file, ":lo12:");
> +	  asm_fprintf (asm_out_file, ":got_lo12:");
>  	  break;
>  
>  	case SYMBOL_SMALL_TLSGD:
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index abfd84526745d029ad4953eabad6dd17b159a218..36c5c054f86e9cdd1f0945cdbc1beb47aa7ad80a 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -6705,25 +6705,23 @@ (define_insn "add_losym_<mode>"
>  
>  (define_insn "ldr_got_small_<mode>"
>    [(set (match_operand:PTR 0 "register_operand" "=r")
> -	(unspec:PTR [(mem:PTR (lo_sum:PTR
> -			      (match_operand:PTR 1 "register_operand" "r")
> -			      (match_operand:PTR 2 "aarch64_valid_symref" "S")))]
> +	(unspec:PTR [(mem:PTR (match_operand:PTR 1 "aarch64_valid_symref" "S"))]
>  		    UNSPEC_GOTSMALLPIC))]
>    ""
> -  "ldr\\t%<w>0, [%1, #:got_lo12:%c2]"
> -  [(set_attr "type" "load_<ldst_sz>")]
> +  "adrp\\t%0, %A1\;ldr\\t%<w>0, [%0, %L1]"
> +  [(set_attr "type" "load_<ldst_sz>")
> +   (set_attr "length" "8")]
>  )
>  
>  (define_insn "ldr_got_small_sidi"
>    [(set (match_operand:DI 0 "register_operand" "=r")
>  	(zero_extend:DI
> -	 (unspec:SI [(mem:SI (lo_sum:DI
> -			     (match_operand:DI 1 "register_operand" "r")
> -			     (match_operand:DI 2 "aarch64_valid_symref" "S")))]
> +	 (unspec:SI [(mem:SI (match_operand:DI 1 "aarch64_valid_symref" "S"))]
>  		    UNSPEC_GOTSMALLPIC)))]
>    "TARGET_ILP32"
> -  "ldr\\t%w0, [%1, #:got_lo12:%c2]"
> -  [(set_attr "type" "load_4")]
> +  "adrp\\t%0, %A1\;ldr\\t%w0, [%0, %L1]"
> +  [(set_attr "type" "load_4")
> +   (set_attr "length" "8")]
>  )
>  
>  (define_insn "ldr_got_small_28k_<mode>"

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] AArch64: Improve GOT addressing
  2021-05-10 12:05 ` Richard Sandiford
@ 2021-05-10 14:04   ` Wilco Dijkstra
  2021-05-10 14:54     ` Richard Sandiford
  0 siblings, 1 reply; 4+ messages in thread
From: Wilco Dijkstra @ 2021-05-10 14:04 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Wilco Dijkstra via Gcc-patches

Hi Richard,

> Normally we should only put two instructions in the same define_insn
> if there's a specific ABI or architectural reason for not separating
> them.  Doing it purely for optimisation reasons is going against the
> general direction of travel.  So I think the first question is: why
> don't we simply delay the split until after reload instead, since
> that's the more normal way of handling this kind of thing?

Well there are no optimizations that benefit from them being split, and there
is no gain from scheduling them independently. Keeping them together
means the linker could perform relaxations on the pair without adding new
relocations. So if we split after reload we'd still want to keep them together.

> This should just be represented as:
>
>  (set (reg:PTR R) (symbol_ref:PTR S))
>
> and go through the normal move patterns.

Yes that should be feasible. Note the UNSPEC is unnecessary even in the
original pattern - given it uses the standard HIGH operator for GOT accesses,
we can also use LO_SUM for the 2nd part since the indirection is hidden.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] AArch64: Improve GOT addressing
  2021-05-10 14:04   ` Wilco Dijkstra
@ 2021-05-10 14:54     ` Richard Sandiford
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Sandiford @ 2021-05-10 14:54 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: Wilco Dijkstra via Gcc-patches

Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Hi Richard,
>
>> Normally we should only put two instructions in the same define_insn
>> if there's a specific ABI or architectural reason for not separating
>> them.  Doing it purely for optimisation reasons is going against the
>> general direction of travel.  So I think the first question is: why
>> don't we simply delay the split until after reload instead, since
>> that's the more normal way of handling this kind of thing?
>
> Well there are no optimizations that benefit from them being split, and there
> is no gain from scheduling them independently. Keeping them together
> means the linker could perform relaxations on the pair without adding new
> relocations. So if we split after reload we'd still want to keep them together.

The burden of proof is the other way though: there has to be a specific
reason for keeping the instructions together, rather than a specific
reason for splitting them.  How we optimise things after RA changes
with time.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-05-10 14:54 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-05 13:17 [PATCH] AArch64: Improve GOT addressing Wilco Dijkstra
2021-05-10 12:05 ` Richard Sandiford
2021-05-10 14:04   ` Wilco Dijkstra
2021-05-10 14:54     ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).