public inbox for gcc-patches@gcc.gnu.org
* [Patch AArch64] Fix PR 63724 - Improve immediate generation
@ 2014-11-07 12:02 Ramana Radhakrishnan
  2014-11-07 13:36 ` Richard Henderson
  2014-11-11 11:35 ` [Patch AArch64] Fix up BSL expander for floating point types James Greenhalgh
  0 siblings, 2 replies; 7+ messages in thread
From: Ramana Radhakrishnan @ 2014-11-07 12:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: marcus.shawcroft, richard.earnshaw, james.greenhalgh

[-- Attachment #1: Type: text/plain, Size: 1715 bytes --]

Hi,

This patch fixes up immediate generation for the AArch64 backend,
allowing RTL optimizers like CSE and loop hoisting to work more
effectively with immediates. I also took the opportunity to rework this
so it can be used in the cost calculations as well. This patch only
deals with numerical constants; handling symbolic constants will be the
subject of another patch. There has been some talk about restructuring
the immediate generation with a trie, but I'm going to leave that for
another time.
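
To illustrate the kind of code this helps (a hypothetical example, not
the testcase from the PR): a 64-bit constant that needs a multi-insn
mov/movk sequence, used repeatedly inside a loop.

  /* Hypothetical illustration: with the immediate kept as a single
     CONST_INT move until the split, CSE and loop-invariant motion can
     treat the constant as one value and hoist/share the whole build
     sequence rather than materialising it at every use.  */
  unsigned long long
  mask_sum (unsigned long long *p, int n)
  {
    unsigned long long sum = 0;
    for (int i = 0; i < n; i++)
      sum += p[i] & 0x1234567812345678ULL;
    return sum;
  }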

This requires another patch that James is working on to fix up BSL
code generation with float modes, which was discovered to be broken and
caused regressions when testing this code. I've worked around those
failures with a BSL patch that restructures the floating point versions
with an unspec, and found no other issues with this patch.

The output now generated matches the expected output in the PR. I also
looked at the output for a number of other benchmarks and it looks
sensible.

Cross-tested with aarch64-none-elf plus the BSL patch with no
regressions. Bootstrapped and regression tested on
aarch64-none-linux-gnu.

OK for trunk once James's BSL patch is committed?

Ramana

<DATE>  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

	PR target/63724
	* config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Split out
	numerical immediate handling to...
	(aarch64_internal_mov_immediate): ...this. New.
	(aarch64_rtx_costs): Use aarch64_internal_mov_immediate.
	(aarch64_mov_operand_p): Relax predicate.
	* config/aarch64/aarch64.md (mov<mode>:GPI): Do not expand CONST_INTs.
	(*movsi_aarch64): Turn into define_insn_and_split and add a new
	alternative for 'n'.
	(*movdi_aarch64): Likewise.

[-- Attachment #2: movimm-rework.txt --]
[-- Type: text/plain, Size: 17684 bytes --]

commit 34392753bd7f1481eff6ff86e055981618a3d06e
Author: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
Date:   Thu Nov 6 16:08:27 2014 +0000


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 736ad90..20cbb2d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1046,8 +1046,8 @@ aarch64_add_offset (machine_mode mode, rtx temp, rtx reg, HOST_WIDE_INT offset)
   return plus_constant (mode, reg, offset);
 }
 
-void
-aarch64_expand_mov_immediate (rtx dest, rtx imm)
+static int
+aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate)
 {
   machine_mode mode = GET_MODE (dest);
   unsigned HOST_WIDE_INT mask;
@@ -1057,85 +1057,14 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
   bool subtargets;
   rtx subtarget;
   int one_match, zero_match, first_not_ffff_match;
-
-  gcc_assert (mode == SImode || mode == DImode);
-
-  /* Check on what type of symbol it is.  */
-  if (GET_CODE (imm) == SYMBOL_REF
-      || GET_CODE (imm) == LABEL_REF
-      || GET_CODE (imm) == CONST)
-    {
-      rtx mem, base, offset;
-      enum aarch64_symbol_type sty;
-
-      /* If we have (const (plus symbol offset)), separate out the offset
-	 before we start classifying the symbol.  */
-      split_const (imm, &base, &offset);
-
-      sty = aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR);
-      switch (sty)
-	{
-	case SYMBOL_FORCE_TO_MEM:
-	  if (offset != const0_rtx
-	      && targetm.cannot_force_const_mem (mode, imm))
-	    {
-	      gcc_assert (can_create_pseudo_p ());
-	      base = aarch64_force_temporary (mode, dest, base);
-	      base = aarch64_add_offset (mode, NULL, base, INTVAL (offset));
-	      aarch64_emit_move (dest, base);
-	      return;
-	    }
-	  mem = force_const_mem (ptr_mode, imm);
-	  gcc_assert (mem);
-	  if (mode != ptr_mode)
-	    mem = gen_rtx_ZERO_EXTEND (mode, mem);
-	  emit_insn (gen_rtx_SET (VOIDmode, dest, mem));
-	  return;
-
-        case SYMBOL_SMALL_TLSGD:
-        case SYMBOL_SMALL_TLSDESC:
-        case SYMBOL_SMALL_GOTTPREL:
-	case SYMBOL_SMALL_GOT:
-	case SYMBOL_TINY_GOT:
-	  if (offset != const0_rtx)
-	    {
-	      gcc_assert(can_create_pseudo_p ());
-	      base = aarch64_force_temporary (mode, dest, base);
-	      base = aarch64_add_offset (mode, NULL, base, INTVAL (offset));
-	      aarch64_emit_move (dest, base);
-	      return;
-	    }
-	  /* FALLTHRU */
-
-        case SYMBOL_SMALL_TPREL:
-	case SYMBOL_SMALL_ABSOLUTE:
-	case SYMBOL_TINY_ABSOLUTE:
-	  aarch64_load_symref_appropriately (dest, imm, sty);
-	  return;
-
-	default:
-	  gcc_unreachable ();
-	}
-    }
+  int num_insns = 0;
 
   if (CONST_INT_P (imm) && aarch64_move_imm (INTVAL (imm), mode))
     {
-      emit_insn (gen_rtx_SET (VOIDmode, dest, imm));
-      return;
-    }
-
-  if (!CONST_INT_P (imm))
-    {
-      if (GET_CODE (imm) == HIGH)
+      if (generate)
 	emit_insn (gen_rtx_SET (VOIDmode, dest, imm));
-      else
-        {
-	  rtx mem = force_const_mem (mode, imm);
-	  gcc_assert (mem);
-	  emit_insn (gen_rtx_SET (VOIDmode, dest, mem));
-	}
-
-      return;
+      num_insns++;
+      return num_insns;
     }
 
   if (mode == SImode)
@@ -1143,10 +1072,15 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
       /* We know we can't do this in 1 insn, and we must be able to do it
 	 in two; so don't mess around looking for sequences that don't buy
 	 us anything.  */
-      emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (INTVAL (imm) & 0xffff)));
-      emit_insn (gen_insv_immsi (dest, GEN_INT (16),
-				 GEN_INT ((INTVAL (imm) >> 16) & 0xffff)));
-      return;
+      if (generate)
+	{
+	  emit_insn (gen_rtx_SET (VOIDmode, dest,
+				  GEN_INT (INTVAL (imm) & 0xffff)));
+	  emit_insn (gen_insv_immsi (dest, GEN_INT (16),
+				     GEN_INT ((INTVAL (imm) >> 16) & 0xffff)));
+	}
+      num_insns += 2;
+      return num_insns;
     }
 
   /* Remaining cases are all for DImode.  */
@@ -1176,11 +1110,15 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
     {
       /* Set one of the quarters and then insert back into result.  */
       mask = 0xffffll << first_not_ffff_match;
-      emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask)));
-      emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match),
-				 GEN_INT ((val >> first_not_ffff_match)
-					  & 0xffff)));
-      return;
+      if (generate)
+	{
+	  emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask)));
+	  emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match),
+				     GEN_INT ((val >> first_not_ffff_match)
+					      & 0xffff)));
+	}
+      num_insns += 2;
+      return num_insns;
     }
 
   if (zero_match == 2)
@@ -1195,40 +1133,57 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	{
 	  subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
 
-	  emit_insn (gen_rtx_SET (VOIDmode, subtarget, GEN_INT (val & mask)));
-	  emit_insn (gen_adddi3 (dest, subtarget,
-				 GEN_INT (val - (val & mask))));
-	  return;
+	  if (generate)
+	    {
+	      emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+				      GEN_INT (val & mask)));
+	      emit_insn (gen_adddi3 (dest, subtarget,
+				     GEN_INT (val - (val & mask))));
+	    }
+	  num_insns += 2;
+	  return num_insns;
 	}
       else if (aarch64_uimm12_shift (-(val - ((val + comp) & mask))))
 	{
 	  subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
 
-	  emit_insn (gen_rtx_SET (VOIDmode, subtarget,
-				  GEN_INT ((val + comp) & mask)));
-	  emit_insn (gen_adddi3 (dest, subtarget,
-				 GEN_INT (val - ((val + comp) & mask))));
-	  return;
+	  if (generate)
+	    {
+	      emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+				      GEN_INT ((val + comp) & mask)));
+	      emit_insn (gen_adddi3 (dest, subtarget,
+				     GEN_INT (val - ((val + comp) & mask))));
+	    }
+	  num_insns += 2;
+	  return num_insns;
 	}
       else if (aarch64_uimm12_shift (val - ((val - comp) | ~mask)))
 	{
 	  subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
 
-	  emit_insn (gen_rtx_SET (VOIDmode, subtarget,
-				  GEN_INT ((val - comp) | ~mask)));
-	  emit_insn (gen_adddi3 (dest, subtarget,
-				 GEN_INT (val - ((val - comp) | ~mask))));
-	  return;
+	  if (generate)
+	    {
+	      emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+				      GEN_INT ((val - comp) | ~mask)));
+	      emit_insn (gen_adddi3 (dest, subtarget,
+				     GEN_INT (val - ((val - comp) | ~mask))));
+	    }
+	  num_insns += 2;
+	  return num_insns;
 	}
       else if (aarch64_uimm12_shift (-(val - (val | ~mask))))
 	{
 	  subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
 
-	  emit_insn (gen_rtx_SET (VOIDmode, subtarget,
-				  GEN_INT (val | ~mask)));
-	  emit_insn (gen_adddi3 (dest, subtarget,
-				 GEN_INT (val - (val | ~mask))));
-	  return;
+	  if (generate)
+	    {
+	      emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+				      GEN_INT (val | ~mask)));
+	      emit_insn (gen_adddi3 (dest, subtarget,
+				     GEN_INT (val - (val | ~mask))));
+	    }
+	  num_insns += 2;
+	  return num_insns;
 	}
     }
 
@@ -1243,22 +1198,30 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	  || aarch64_uimm12_shift (-val + aarch64_bitmasks[i]))
 	{
 	  subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
-	  emit_insn (gen_rtx_SET (VOIDmode, subtarget,
-				  GEN_INT (aarch64_bitmasks[i])));
-	  emit_insn (gen_adddi3 (dest, subtarget,
-				 GEN_INT (val - aarch64_bitmasks[i])));
-	  return;
+	  if (generate)
+	    {
+	      emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+				      GEN_INT (aarch64_bitmasks[i])));
+	      emit_insn (gen_adddi3 (dest, subtarget,
+				     GEN_INT (val - aarch64_bitmasks[i])));
+	    }
+	  num_insns += 2;
+	  return num_insns;
 	}
 
       for (j = 0; j < 64; j += 16, mask <<= 16)
 	{
 	  if ((aarch64_bitmasks[i] & ~mask) == (val & ~mask))
 	    {
-	      emit_insn (gen_rtx_SET (VOIDmode, dest,
-				      GEN_INT (aarch64_bitmasks[i])));
-	      emit_insn (gen_insv_immdi (dest, GEN_INT (j),
-					 GEN_INT ((val >> j) & 0xffff)));
-	      return;
+	      if (generate)
+		{
+		  emit_insn (gen_rtx_SET (VOIDmode, dest,
+					  GEN_INT (aarch64_bitmasks[i])));
+		  emit_insn (gen_insv_immdi (dest, GEN_INT (j),
+					     GEN_INT ((val >> j) & 0xffff)));
+		}
+	      num_insns += 2;
+	      return num_insns;
 	    }
 	}
     }
@@ -1274,11 +1237,15 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	    if (val == (aarch64_bitmasks[i] | aarch64_bitmasks[j]))
 	      {
 		subtarget = subtargets ? gen_reg_rtx (mode) : dest;
-		emit_insn (gen_rtx_SET (VOIDmode, subtarget,
-					GEN_INT (aarch64_bitmasks[i])));
-		emit_insn (gen_iordi3 (dest, subtarget,
-				       GEN_INT (aarch64_bitmasks[j])));
-		return;
+		if (generate)
+		  {
+		    emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+					    GEN_INT (aarch64_bitmasks[i])));
+		    emit_insn (gen_iordi3 (dest, subtarget,
+					   GEN_INT (aarch64_bitmasks[j])));
+		  }
+		num_insns += 2;
+		return num_insns;
 	      }
 	}
       else if ((val & aarch64_bitmasks[i]) == val)
@@ -1290,11 +1257,15 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	      {
 
 		subtarget = subtargets ? gen_reg_rtx (mode) : dest;
-		emit_insn (gen_rtx_SET (VOIDmode, subtarget,
-					GEN_INT (aarch64_bitmasks[j])));
-		emit_insn (gen_anddi3 (dest, subtarget,
-				       GEN_INT (aarch64_bitmasks[i])));
-		return;
+		if (generate)
+		  {
+		    emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+					    GEN_INT (aarch64_bitmasks[j])));
+		    emit_insn (gen_anddi3 (dest, subtarget,
+					   GEN_INT (aarch64_bitmasks[i])));
+		  }
+		num_insns += 2;
+		return num_insns;
 	      }
 	}
     }
@@ -1303,18 +1274,24 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
     {
       /* Set either first three quarters or all but the third.	 */
       mask = 0xffffll << (16 - first_not_ffff_match);
-      emit_insn (gen_rtx_SET (VOIDmode, dest,
-			      GEN_INT (val | mask | 0xffffffff00000000ull)));
+      if (generate)
+	emit_insn (gen_rtx_SET (VOIDmode, dest,
+				GEN_INT (val | mask | 0xffffffff00000000ull)));
+      num_insns ++;
 
       /* Now insert other two quarters.	 */
       for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1);
 	   i < 64; i += 16, mask <<= 16)
 	{
 	  if ((val & mask) != mask)
-	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
-				       GEN_INT ((val >> i) & 0xffff)));
+	    {
+	      if (generate)
+		emit_insn (gen_insv_immdi (dest, GEN_INT (i),
+					   GEN_INT ((val >> i) & 0xffff)));
+	      num_insns ++;
+	    }
 	}
-      return;
+      return num_insns;
     }
 
  simple_sequence:
@@ -1326,15 +1303,106 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	{
 	  if (first)
 	    {
-	      emit_insn (gen_rtx_SET (VOIDmode, dest,
-				      GEN_INT (val & mask)));
+	      if (generate)
+		emit_insn (gen_rtx_SET (VOIDmode, dest,
+					GEN_INT (val & mask)));
+	      num_insns ++;
 	      first = false;
 	    }
 	  else
-	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
-				       GEN_INT ((val >> i) & 0xffff)));
+	    {
+	      if (generate)
+		emit_insn (gen_insv_immdi (dest, GEN_INT (i),
+					   GEN_INT ((val >> i) & 0xffff)));
+	      num_insns ++;
+	    }
 	}
     }
+
+  return num_insns;
+}
+
+
+void
+aarch64_expand_mov_immediate (rtx dest, rtx imm)
+{
+  machine_mode mode = GET_MODE (dest);
+
+  gcc_assert (mode == SImode || mode == DImode);
+
+  /* Check on what type of symbol it is.  */
+  if (GET_CODE (imm) == SYMBOL_REF
+      || GET_CODE (imm) == LABEL_REF
+      || GET_CODE (imm) == CONST)
+    {
+      rtx mem, base, offset;
+      enum aarch64_symbol_type sty;
+
+      /* If we have (const (plus symbol offset)), separate out the offset
+	 before we start classifying the symbol.  */
+      split_const (imm, &base, &offset);
+
+      sty = aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR);
+      switch (sty)
+	{
+	case SYMBOL_FORCE_TO_MEM:
+	  if (offset != const0_rtx
+	      && targetm.cannot_force_const_mem (mode, imm))
+	    {
+	      gcc_assert (can_create_pseudo_p ());
+	      base = aarch64_force_temporary (mode, dest, base);
+	      base = aarch64_add_offset (mode, NULL, base, INTVAL (offset));
+	      aarch64_emit_move (dest, base);
+	      return;
+	    }
+	  mem = force_const_mem (ptr_mode, imm);
+	  gcc_assert (mem);
+	  if (mode != ptr_mode)
+	    mem = gen_rtx_ZERO_EXTEND (mode, mem);
+	  emit_insn (gen_rtx_SET (VOIDmode, dest, mem));
+	  return;
+
+        case SYMBOL_SMALL_TLSGD:
+        case SYMBOL_SMALL_TLSDESC:
+        case SYMBOL_SMALL_GOTTPREL:
+	case SYMBOL_SMALL_GOT:
+	case SYMBOL_TINY_GOT:
+	  if (offset != const0_rtx)
+	    {
+	      gcc_assert(can_create_pseudo_p ());
+	      base = aarch64_force_temporary (mode, dest, base);
+	      base = aarch64_add_offset (mode, NULL, base, INTVAL (offset));
+	      aarch64_emit_move (dest, base);
+	      return;
+	    }
+	  /* FALLTHRU */
+
+        case SYMBOL_SMALL_TPREL:
+	case SYMBOL_SMALL_ABSOLUTE:
+	case SYMBOL_TINY_ABSOLUTE:
+	  aarch64_load_symref_appropriately (dest, imm, sty);
+	  return;
+
+	default:
+	  gcc_unreachable ();
+	}
+    }
+
+  if (!CONST_INT_P (imm))
+    {
+      if (GET_CODE (imm) == HIGH)
+	emit_insn (gen_rtx_SET (VOIDmode, dest, imm));
+      else
+        {
+	  rtx mem = force_const_mem (mode, imm);
+	  gcc_assert (mem);
+	  emit_insn (gen_rtx_SET (VOIDmode, dest, mem));
+	}
+
+      return;
+    }
+
+  aarch64_internal_mov_immediate (dest, imm, true);
 }
 
 static bool
@@ -5234,9 +5302,8 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	     proportionally expensive to the number of instructions
 	     required to build that constant.  This is true whether we
 	     are compiling for SPEED or otherwise.  */
-	  *cost = COSTS_N_INSNS (aarch64_build_constant (0,
-							 INTVAL (x),
-							 false));
+	  *cost = COSTS_N_INSNS (aarch64_internal_mov_immediate
+				 (gen_rtx_REG (mode, 0), x, false));
 	}
       return true;
 
@@ -8035,7 +8102,7 @@ aarch64_mov_operand_p (rtx x,
       && aarch64_valid_symref (XEXP (x, 0), GET_MODE (XEXP (x, 0))))
     return true;
 
-  if (CONST_INT_P (x) && aarch64_move_imm (INTVAL (x), mode))
+  if (CONST_INT_P (x))
     return true;
 
   if (GET_CODE (x) == SYMBOL_REF && mode == DImode && CONSTANT_ADDRESS_P (x))
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 17570ba..142e8b1 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -746,17 +746,20 @@
     if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx)
       operands[1] = force_reg (<MODE>mode, operands[1]);
 
-    if (CONSTANT_P (operands[1]))
-      {
-	aarch64_expand_mov_immediate (operands[0], operands[1]);
-	DONE;
-      }
+    /* FIXME: RR we still need to fix up what we are doing with
+       symbol_refs and other types of constants.  */
+    if (CONSTANT_P (operands[1])
+        && !CONST_INT_P (operands[1]))
+     {
+       aarch64_expand_mov_immediate (operands[0], operands[1]);
+       DONE;
+     }
   "
 )
 
-(define_insn "*movsi_aarch64"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,*w,m,  m,r,r  ,*w, r,*w")
-	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,m, m,rZ,*w,S,Ush,rZ,*w,*w"))]
+(define_insn_and_split "*movsi_aarch64"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r  ,*w, r,*w")
+	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,m, m,rZ,*w,S,Ush,rZ,*w,*w"))]
   "(register_operand (operands[0], SImode)
     || aarch64_reg_or_zero (operands[1], SImode))"
   "@
@@ -764,6 +767,7 @@
    mov\\t%w0, %w1
    mov\\t%w0, %w1
    mov\\t%w0, %1
+   #
    ldr\\t%w0, %1
    ldr\\t%s0, %1
    str\\t%w1, %0
@@ -773,14 +777,20 @@
    fmov\\t%s0, %w1
    fmov\\t%w0, %s1
    fmov\\t%s0, %s1"
-  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,load1,load1,store1,store1,\
+   "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)"
+   [(const_int 0)]
+   "{
+       aarch64_expand_mov_immediate (operands[0], operands[1]);
+       DONE;
+    }"
+  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,load1,load1,store1,store1,\
                      adr,adr,f_mcr,f_mrc,fmov")
-   (set_attr "fp" "*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes")]
 )
 
-(define_insn "*movdi_aarch64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,*w,m,  m,r,r,  *w, r,*w,w")
-	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,m, m,rZ,*w,S,Ush,rZ,*w,*w,Dd"))]
+(define_insn_and_split "*movdi_aarch64"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r,  *w, r,*w,w")
+	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,n,m, m,rZ,*w,S,Ush,rZ,*w,*w,Dd"))]
   "(register_operand (operands[0], DImode)
     || aarch64_reg_or_zero (operands[1], DImode))"
   "@
@@ -788,6 +798,7 @@
    mov\\t%0, %x1
    mov\\t%x0, %1
    mov\\t%x0, %1
+   #
    ldr\\t%x0, %1
    ldr\\t%d0, %1
    str\\t%x1, %0
@@ -798,10 +809,16 @@
    fmov\\t%x0, %d1
    fmov\\t%d0, %d1
    movi\\t%d0, %1"
-  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,load1,load1,store1,store1,\
+   "(CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), DImode))"
+   [(const_int 0)]
+   "{
+       aarch64_expand_mov_immediate (operands[0], operands[1]);
+       DONE;
+    }"
+  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,load1,load1,store1,store1,\
                      adr,adr,f_mcr,f_mrc,fmov,fmov")
-   (set_attr "fp" "*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
-   (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,yes")]
+   (set_attr "fp" "*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
+   (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")]
 )
 
 (define_insn "insv_imm<mode>"

* Re: [Patch AArch64] Fix PR 63724 - Improve immediate generation
  2014-11-07 12:02 [Patch AArch64] Fix PR 63724 - Improve immediate generation Ramana Radhakrishnan
@ 2014-11-07 13:36 ` Richard Henderson
  2014-11-07 14:09   ` Ramana Radhakrishnan
  2014-11-12 16:53   ` Ramana Radhakrishnan
  2014-11-11 11:35 ` [Patch AArch64] Fix up BSL expander for floating point types James Greenhalgh
  1 sibling, 2 replies; 7+ messages in thread
From: Richard Henderson @ 2014-11-07 13:36 UTC (permalink / raw)
  To: Ramana Radhakrishnan, gcc-patches
  Cc: marcus.shawcroft, richard.earnshaw, james.greenhalgh

On 11/07/2014 01:02 PM, Ramana Radhakrishnan wrote:
> +	  *cost = COSTS_N_INSNS (aarch64_internal_mov_immediate
> +				 (gen_rtx_REG (mode, 0), x, false));
>  	}

Can't you pass NULL for the register when generate is false?


r~

* Re: [Patch AArch64] Fix PR 63724 - Improve immediate generation
  2014-11-07 13:36 ` Richard Henderson
@ 2014-11-07 14:09   ` Ramana Radhakrishnan
  2014-11-12 16:53   ` Ramana Radhakrishnan
  1 sibling, 0 replies; 7+ messages in thread
From: Ramana Radhakrishnan @ 2014-11-07 14:09 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches
  Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh



On 07/11/14 13:36, Richard Henderson wrote:
> On 11/07/2014 01:02 PM, Ramana Radhakrishnan wrote:
>> +	  *cost = COSTS_N_INSNS (aarch64_internal_mov_immediate
>> +				 (gen_rtx_REG (mode, 0), x, false));
>>   	}
>
> Can't you pass NULL for the register when generate is false?

True, that would work as well. Should have thought of that.

regards
Ramana


>
>
> r~
>

* [Patch AArch64] Fix up BSL expander for floating point types
  2014-11-07 12:02 [Patch AArch64] Fix PR 63724 - Improve immediate generation Ramana Radhakrishnan
  2014-11-07 13:36 ` Richard Henderson
@ 2014-11-11 11:35 ` James Greenhalgh
  2014-11-11 16:42   ` Marcus Shawcroft
  1 sibling, 1 reply; 7+ messages in thread
From: James Greenhalgh @ 2014-11-11 11:35 UTC (permalink / raw)
  To: gcc-patches; +Cc: ramana.radhakrishnan, marcus.shawcroft, richard.earnshaw

[-- Attachment #1: Type: text/plain, Size: 1125 bytes --]


Hi,

As Ramana hinted here:
  https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00607.html

There are two issues with the way we've defined our BSL pattern. We pun
types around in a way that is scary and quite likely unsafe, and we
haven't canonicalized the pattern so combine is unlikely to pick it up.

This patch fixes both of these issues and adds testcases to ensure we are
picking up the combine opportunity.
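
For context, the combine opportunity in question is the plain C select
idiom sketched below; it mirrors the new vbslq_u64_1.c test further
down, which expects the idiom to fold to a single BIF.

  #include <arm_neon.h>

  /* (mask & a) | (~mask & b) should be recognised by combine and
     matched against the canonicalized bsl pattern.  */
  uint32x4_t
  select_u32 (uint32x4_t a, uint32x4_t b, uint32x4_t mask)
  {
    return (mask & a) | (~mask & b);
  }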

I've bootstrapped and tested this on aarch64-none-linux-gnu and
cross-tested it for aarch64-none-elf.

OK?

Cheers,
James

---
gcc/

2014-11-11  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64-simd.md
	(aarch64_simd_bsl<mode>_internal): Remove float cases, canonicalize.
	(aarch64_simd_bsl<mode>): Add gen_lowpart expressions where we
	are punning between float vectors and integer vectors.

gcc/testsuite/

2014-11-11  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.target/aarch64/vbslq_f64_1.c: New.
	* gcc.target/aarch64/vbslq_f64_2.c: Likewise.
	* gcc.target/aarch64/vbslq_u64_1.c: Likewise.
	* gcc.target/aarch64/vbslq_u64_2.c: Likewise.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Patch-AArch64-Fix-up-BSL-expander-for-floating-point.patch --]
[-- Type: text/x-patch;  name=0001-Patch-AArch64-Fix-up-BSL-expander-for-floating-point.patch, Size: 5183 bytes --]

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index ef196e4b6fb39c0d2fd9ebfee76abab8369b1e92..f7012ecab07c1b38836e949c2f4e5bd0c7939b5c 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1924,15 +1924,15 @@ (define_insn "aarch64_reduc_<maxmin_uns>
 ;;     bif op0, op1, mask
 
 (define_insn "aarch64_simd_bsl<mode>_internal"
-  [(set (match_operand:VALLDIF 0 "register_operand"		"=w,w,w")
-	(ior:VALLDIF
-	   (and:VALLDIF
-	     (match_operand:<V_cmp_result> 1 "register_operand"	" 0,w,w")
-	     (match_operand:VALLDIF 2 "register_operand"	" w,w,0"))
-	   (and:VALLDIF
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand"		"=w,w,w")
+	(ior:VSDQ_I_DI
+	   (and:VSDQ_I_DI
 	     (not:<V_cmp_result>
-		(match_dup:<V_cmp_result> 1))
-	     (match_operand:VALLDIF 3 "register_operand"	" w,0,w"))
+	       (match_operand:<V_cmp_result> 1 "register_operand"	" 0,w,w"))
+	     (match_operand:VSDQ_I_DI 3 "register_operand"	" w,0,w"))
+	   (and:VSDQ_I_DI
+	     (match_dup:<V_cmp_result> 1)
+	     (match_operand:VSDQ_I_DI 2 "register_operand"	" w,w,0"))
 	))]
   "TARGET_SIMD"
   "@
@@ -1950,9 +1950,21 @@ (define_expand "aarch64_simd_bsl<mode>"
  "TARGET_SIMD"
 {
   /* We can't alias operands together if they have different modes.  */
+  rtx tmp = operands[0];
+  if (FLOAT_MODE_P (<MODE>mode))
+    {
+      operands[2] = gen_lowpart (<V_cmp_result>mode, operands[2]);
+      operands[3] = gen_lowpart (<V_cmp_result>mode, operands[3]);
+      tmp = gen_reg_rtx (<V_cmp_result>mode);
+    }
   operands[1] = gen_lowpart (<V_cmp_result>mode, operands[1]);
-  emit_insn (gen_aarch64_simd_bsl<mode>_internal (operands[0], operands[1],
-						  operands[2], operands[3]));
+  emit_insn (gen_aarch64_simd_bsl<v_cmp_result>_internal (tmp,
+							  operands[1],
+							  operands[2],
+							  operands[3]));
+  if (tmp != operands[0])
+    emit_move_insn (operands[0], gen_lowpart (<MODE>mode, tmp));
+
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/aarch64/vbslq_f64_1.c b/gcc/testsuite/gcc.target/aarch64/vbslq_f64_1.c
new file mode 100644
index 0000000..7b0e8f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vbslq_f64_1.c
@@ -0,0 +1,20 @@
+/* Test vbslq_f64 can be folded.  */
+/* { dg-do assemble } */
+/* { dg-options "--save-temps -O3" } */
+
+#include <arm_neon.h>
+
+/* Folds to ret.  */
+
+float32x4_t
+fold_me (float32x4_t a, float32x4_t b)
+{
+  uint32x4_t mask = {-1, -1, -1, -1};
+  return vbslq_f32 (mask, a, b);
+}
+
+/* { dg-final { scan-assembler-not "bsl\\tv" } } */
+/* { dg-final { scan-assembler-not "bit\\tv" } } */
+/* { dg-final { scan-assembler-not "bif\\tv" } } */
+
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vbslq_f64_2.c b/gcc/testsuite/gcc.target/aarch64/vbslq_f64_2.c
new file mode 100644
index 0000000..1dca90d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vbslq_f64_2.c
@@ -0,0 +1,23 @@
+/* Test vbslq_f64 can be folded.  */
+/* { dg-do assemble } */
+/* { dg-options "--save-temps -O3" } */
+
+#include <arm_neon.h>
+
+/* Should fold out one half of the BSL, leaving just a BIC.  */
+
+float32x4_t
+half_fold_me (uint32x4_t mask)
+{
+  float32x4_t a = {0.0, 0.0, 0.0, 0.0};
+  float32x4_t b = {2.0, 4.0, 8.0, 16.0};
+  return vbslq_f32 (mask, a, b);
+
+}
+
+/* { dg-final { scan-assembler-not "bsl\\tv" } } */
+/* { dg-final { scan-assembler-not "bit\\tv" } } */
+/* { dg-final { scan-assembler-not "bif\\tv" } } */
+/* { dg-final { scan-assembler "bic\\tv" } } */
+
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vbslq_u64_1.c b/gcc/testsuite/gcc.target/aarch64/vbslq_u64_1.c
new file mode 100644
index 0000000..9c61d1a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vbslq_u64_1.c
@@ -0,0 +1,16 @@
+/* Test if a BSL-like instruction can be generated from a C idiom.  */
+/* { dg-do assemble } */
+/* { dg-options "--save-temps -O3" } */
+
+#include <arm_neon.h>
+
+/* Folds to BIF.  */
+
+uint32x4_t
+vbslq_dummy_u32 (uint32x4_t a, uint32x4_t b, uint32x4_t mask)
+{
+  return (mask & a) | (~mask & b);
+}
+
+/* { dg-final { scan-assembler-times "bif\\tv" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vbslq_u64_2.c b/gcc/testsuite/gcc.target/aarch64/vbslq_u64_2.c
new file mode 100644
index 0000000..4540351
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vbslq_u64_2.c
@@ -0,0 +1,21 @@
+/* Test vbslq_u64 can be folded.  */
+/* { dg-do assemble } */
+/* { dg-options "--save-temps -O3" } */
+#include <arm_neon.h>
+
+/* Folds to BIC.  */
+
+int32x4_t
+half_fold_int (uint32x4_t mask)
+{
+  int32x4_t a = {0, 0, 0, 0};
+  int32x4_t b = {2, 4, 8, 16};
+  return vbslq_s32 (mask, a, b);
+}
+
+/* { dg-final { scan-assembler-not "bsl\\tv" } } */
+/* { dg-final { scan-assembler-not "bit\\tv" } } */
+/* { dg-final { scan-assembler-not "bif\\tv" } } */
+/* { dg-final { scan-assembler "bic\\tv" } } */
+
+/* { dg-final { cleanup-saved-temps } } */

* Re: [Patch AArch64] Fix up BSL expander for floating point types
  2014-11-11 11:35 ` [Patch AArch64] Fix up BSL expander for floating point types James Greenhalgh
@ 2014-11-11 16:42   ` Marcus Shawcroft
  0 siblings, 0 replies; 7+ messages in thread
From: Marcus Shawcroft @ 2014-11-11 16:42 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

On 11 November 2014 11:27, James Greenhalgh <james.greenhalgh@arm.com> wrote:

> 2014-11-11  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-simd.md
>         (aarch64_simd_bsl<mode>_internal): Remove float cases, canonicalize.
>         (aarch64_simd_bsl<mode>): Add gen_lowpart expressions where we
>         are punning between float vectors and integer vectors.
>
> gcc/testsuite/
>
> 2014-11-11  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * gcc.target/aarch64/vbslq_f64_1.c: New.
>         * gcc.target/aarch64/vbslq_f64_2.c: Likewise.
>         * gcc.target/aarch64/vbslq_u64_1.c: Likewise.
>         * gcc.target/aarch64/vbslq_u64_2.c: Likewise.

OK /Marcus

* Re: [Patch AArch64] Fix PR 63724 - Improve immediate generation
  2014-11-07 13:36 ` Richard Henderson
  2014-11-07 14:09   ` Ramana Radhakrishnan
@ 2014-11-12 16:53   ` Ramana Radhakrishnan
  2014-11-14  9:34     ` Marcus Shawcroft
  1 sibling, 1 reply; 7+ messages in thread
From: Ramana Radhakrishnan @ 2014-11-12 16:53 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches
  Cc: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh

[-- Attachment #1: Type: text/plain, Size: 1132 bytes --]



On 07/11/14 13:36, Richard Henderson wrote:
> On 11/07/2014 01:02 PM, Ramana Radhakrishnan wrote:
>> +	  *cost = COSTS_N_INSNS (aarch64_internal_mov_immediate
>> +				 (gen_rtx_REG (mode, 0), x, false));
>>   	}
>
> Can't you pass NULL for the register when generate is false?
>
>
> r~
>

v2, based on Richard's suggestion, as well as fixing a bug that I hit
in some more testing at -O1: aarch64_internal_mov_immediate should not
generate a temporary for subtarget when it is not actually "generating"
code.
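
The shape of the change relative to v1, distilled from the patch below
(one of the DImode add cases is shown; the other arms follow the same
pattern):

  if (aarch64_uimm12_shift (val - (val & mask)))
    {
      if (generate)
        {
          /* The subtarget pseudo is now created only on the generate
             path, so costing queries (generate == false) never reach
             gen_reg_rtx.  */
          subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
          emit_insn (gen_rtx_SET (VOIDmode, subtarget,
                                  GEN_INT (val & mask)));
          emit_insn (gen_adddi3 (dest, subtarget,
                                 GEN_INT (val - (val & mask))));
        }
      num_insns += 2;
      return num_insns;
    }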

Tested again on aarch64-none-elf and with a bootstrap and regression
test. OK?


<DATE>  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

	PR target/63724
	* config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Split out
	numerical immediate handling to...
	(aarch64_internal_mov_immediate): ...this. New.
	(aarch64_rtx_costs): Use aarch64_internal_mov_immediate.
	(aarch64_mov_operand_p): Relax predicate.
	* config/aarch64/aarch64.md (mov<mode>:GPI): Do not expand CONST_INTs.
	(*movsi_aarch64): Turn into define_insn_and_split and add a new
	alternative for 'n'.
	(*movdi_aarch64): Likewise.



regards
Ramana

[-- Attachment #2: movimm-rework-latest.txt --]
[-- Type: text/plain, Size: 18305 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0429d96..0244169 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1046,10 +1046,10 @@ aarch64_add_offset (machine_mode mode, rtx temp, rtx reg, HOST_WIDE_INT offset)
   return plus_constant (mode, reg, offset);
 }
 
-void
-aarch64_expand_mov_immediate (rtx dest, rtx imm)
+static int
+aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
+				machine_mode mode)
 {
-  machine_mode mode = GET_MODE (dest);
   unsigned HOST_WIDE_INT mask;
   int i;
   bool first;
@@ -1057,85 +1057,14 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
   bool subtargets;
   rtx subtarget;
   int one_match, zero_match, first_not_ffff_match;
-
-  gcc_assert (mode == SImode || mode == DImode);
-
-  /* Check on what type of symbol it is.  */
-  if (GET_CODE (imm) == SYMBOL_REF
-      || GET_CODE (imm) == LABEL_REF
-      || GET_CODE (imm) == CONST)
-    {
-      rtx mem, base, offset;
-      enum aarch64_symbol_type sty;
-
-      /* If we have (const (plus symbol offset)), separate out the offset
-	 before we start classifying the symbol.  */
-      split_const (imm, &base, &offset);
-
-      sty = aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR);
-      switch (sty)
-	{
-	case SYMBOL_FORCE_TO_MEM:
-	  if (offset != const0_rtx
-	      && targetm.cannot_force_const_mem (mode, imm))
-	    {
-	      gcc_assert (can_create_pseudo_p ());
-	      base = aarch64_force_temporary (mode, dest, base);
-	      base = aarch64_add_offset (mode, NULL, base, INTVAL (offset));
-	      aarch64_emit_move (dest, base);
-	      return;
-	    }
-	  mem = force_const_mem (ptr_mode, imm);
-	  gcc_assert (mem);
-	  if (mode != ptr_mode)
-	    mem = gen_rtx_ZERO_EXTEND (mode, mem);
-	  emit_insn (gen_rtx_SET (VOIDmode, dest, mem));
-	  return;
-
-        case SYMBOL_SMALL_TLSGD:
-        case SYMBOL_SMALL_TLSDESC:
-        case SYMBOL_SMALL_GOTTPREL:
-	case SYMBOL_SMALL_GOT:
-	case SYMBOL_TINY_GOT:
-	  if (offset != const0_rtx)
-	    {
-	      gcc_assert(can_create_pseudo_p ());
-	      base = aarch64_force_temporary (mode, dest, base);
-	      base = aarch64_add_offset (mode, NULL, base, INTVAL (offset));
-	      aarch64_emit_move (dest, base);
-	      return;
-	    }
-	  /* FALLTHRU */
-
-        case SYMBOL_SMALL_TPREL:
-	case SYMBOL_SMALL_ABSOLUTE:
-	case SYMBOL_TINY_ABSOLUTE:
-	  aarch64_load_symref_appropriately (dest, imm, sty);
-	  return;
-
-	default:
-	  gcc_unreachable ();
-	}
-    }
+  int num_insns = 0;
 
   if (CONST_INT_P (imm) && aarch64_move_imm (INTVAL (imm), mode))
     {
-      emit_insn (gen_rtx_SET (VOIDmode, dest, imm));
-      return;
-    }
-
-  if (!CONST_INT_P (imm))
-    {
-      if (GET_CODE (imm) == HIGH)
+      if (generate)
 	emit_insn (gen_rtx_SET (VOIDmode, dest, imm));
-      else
-        {
-	  rtx mem = force_const_mem (mode, imm);
-	  gcc_assert (mem);
-	  emit_insn (gen_rtx_SET (VOIDmode, dest, mem));
-	}
-
-      return;
+      num_insns++;
+      return num_insns;
     }
 
   if (mode == SImode)
@@ -1143,10 +1072,15 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
       /* We know we can't do this in 1 insn, and we must be able to do it
 	 in two; so don't mess around looking for sequences that don't buy
 	 us anything.  */
-      emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (INTVAL (imm) & 0xffff)));
-      emit_insn (gen_insv_immsi (dest, GEN_INT (16),
-				 GEN_INT ((INTVAL (imm) >> 16) & 0xffff)));
-      return;
+      if (generate)
+	{
+	  emit_insn (gen_rtx_SET (VOIDmode, dest,
+				  GEN_INT (INTVAL (imm) & 0xffff)));
+	  emit_insn (gen_insv_immsi (dest, GEN_INT (16),
+				     GEN_INT ((INTVAL (imm) >> 16) & 0xffff)));
+	}
+      num_insns += 2;
+      return num_insns;
     }
 
   /* Remaining cases are all for DImode.  */
@@ -1176,11 +1110,15 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
     {
       /* Set one of the quarters and then insert back into result.  */
       mask = 0xffffll << first_not_ffff_match;
-      emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask)));
-      emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match),
-				 GEN_INT ((val >> first_not_ffff_match)
-					  & 0xffff)));
-      return;
+      if (generate)
+	{
+	  emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask)));
+	  emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match),
+				     GEN_INT ((val >> first_not_ffff_match)
+					      & 0xffff)));
+	}
+      num_insns += 2;
+      return num_insns;
     }
 
   if (zero_match == 2)
@@ -1193,42 +1131,55 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 
       if (aarch64_uimm12_shift (val - (val & mask)))
 	{
-	  subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
-
-	  emit_insn (gen_rtx_SET (VOIDmode, subtarget, GEN_INT (val & mask)));
-	  emit_insn (gen_adddi3 (dest, subtarget,
-				 GEN_INT (val - (val & mask))));
-	  return;
+	  if (generate)
+	    {
+	      subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
+	      emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+				      GEN_INT (val & mask)));
+	      emit_insn (gen_adddi3 (dest, subtarget,
+				     GEN_INT (val - (val & mask))));
+	    }
+	  num_insns += 2;
+	  return num_insns;
 	}
       else if (aarch64_uimm12_shift (-(val - ((val + comp) & mask))))
 	{
-	  subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
-
-	  emit_insn (gen_rtx_SET (VOIDmode, subtarget,
-				  GEN_INT ((val + comp) & mask)));
-	  emit_insn (gen_adddi3 (dest, subtarget,
-				 GEN_INT (val - ((val + comp) & mask))));
-	  return;
+	  if (generate)
+	    {
+	      subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
+	      emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+				      GEN_INT ((val + comp) & mask)));
+	      emit_insn (gen_adddi3 (dest, subtarget,
+				     GEN_INT (val - ((val + comp) & mask))));
+	    }
+	  num_insns += 2;
+	  return num_insns;
 	}
       else if (aarch64_uimm12_shift (val - ((val - comp) | ~mask)))
 	{
-	  subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
-
-	  emit_insn (gen_rtx_SET (VOIDmode, subtarget,
-				  GEN_INT ((val - comp) | ~mask)));
-	  emit_insn (gen_adddi3 (dest, subtarget,
-				 GEN_INT (val - ((val - comp) | ~mask))));
-	  return;
+	  if (generate)
+	    {
+	      subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
+	      emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+				      GEN_INT ((val - comp) | ~mask)));
+	      emit_insn (gen_adddi3 (dest, subtarget,
+				     GEN_INT (val - ((val - comp) | ~mask))));
+	    }
+	  num_insns += 2;
+	  return num_insns;
 	}
       else if (aarch64_uimm12_shift (-(val - (val | ~mask))))
 	{
-	  subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
-
-	  emit_insn (gen_rtx_SET (VOIDmode, subtarget,
-				  GEN_INT (val | ~mask)));
-	  emit_insn (gen_adddi3 (dest, subtarget,
-				 GEN_INT (val - (val | ~mask))));
-	  return;
+	  if (generate)
+	    {
+	      subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
+	      emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+				      GEN_INT (val | ~mask)));
+	      emit_insn (gen_adddi3 (dest, subtarget,
+				     GEN_INT (val - (val | ~mask))));
+	    }
+	  num_insns += 2;
+	  return num_insns;
 	}
     }
 
@@ -1242,23 +1193,31 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
       if (aarch64_uimm12_shift (val - aarch64_bitmasks[i])
 	  || aarch64_uimm12_shift (-val + aarch64_bitmasks[i]))
 	{
-	  subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
-	  emit_insn (gen_rtx_SET (VOIDmode, subtarget,
-				  GEN_INT (aarch64_bitmasks[i])));
-	  emit_insn (gen_adddi3 (dest, subtarget,
-				 GEN_INT (val - aarch64_bitmasks[i])));
-	  return;
+	  if (generate)
+	    {
+	      subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
+	      emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+				      GEN_INT (aarch64_bitmasks[i])));
+	      emit_insn (gen_adddi3 (dest, subtarget,
+				     GEN_INT (val - aarch64_bitmasks[i])));
+	    }
+	  num_insns += 2;
+	  return num_insns;
 	}
 
       for (j = 0; j < 64; j += 16, mask <<= 16)
 	{
 	  if ((aarch64_bitmasks[i] & ~mask) == (val & ~mask))
 	    {
-	      emit_insn (gen_rtx_SET (VOIDmode, dest,
-				      GEN_INT (aarch64_bitmasks[i])));
-	      emit_insn (gen_insv_immdi (dest, GEN_INT (j),
-					 GEN_INT ((val >> j) & 0xffff)));
-	      return;
+	      if (generate)
+		{
+		  emit_insn (gen_rtx_SET (VOIDmode, dest,
+					  GEN_INT (aarch64_bitmasks[i])));
+		  emit_insn (gen_insv_immdi (dest, GEN_INT (j),
+					     GEN_INT ((val >> j) & 0xffff)));
+		}
+	      num_insns += 2;
+	      return num_insns;
 	    }
 	}
     }
@@ -1273,12 +1232,16 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	  for (j = i + 1; j < AARCH64_NUM_BITMASKS; j++)
 	    if (val == (aarch64_bitmasks[i] | aarch64_bitmasks[j]))
 	      {
-		subtarget = subtargets ? gen_reg_rtx (mode) : dest;
-		emit_insn (gen_rtx_SET (VOIDmode, subtarget,
-					GEN_INT (aarch64_bitmasks[i])));
-		emit_insn (gen_iordi3 (dest, subtarget,
-				       GEN_INT (aarch64_bitmasks[j])));
-		return;
+		if (generate)
+		  {
+		    subtarget = subtargets ? gen_reg_rtx (mode) : dest;
+		    emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+					    GEN_INT (aarch64_bitmasks[i])));
+		    emit_insn (gen_iordi3 (dest, subtarget,
+					   GEN_INT (aarch64_bitmasks[j])));
+		  }
+		num_insns += 2;
+		return num_insns;
 	      }
 	}
       else if ((val & aarch64_bitmasks[i]) == val)
@@ -1288,13 +1251,16 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	  for (j = i + 1; j < AARCH64_NUM_BITMASKS; j++)
 	    if (val == (aarch64_bitmasks[j] & aarch64_bitmasks[i]))
 	      {
-
-		subtarget = subtargets ? gen_reg_rtx (mode) : dest;
-		emit_insn (gen_rtx_SET (VOIDmode, subtarget,
-					GEN_INT (aarch64_bitmasks[j])));
-		emit_insn (gen_anddi3 (dest, subtarget,
-				       GEN_INT (aarch64_bitmasks[i])));
-		return;
+		if (generate)
+		  {
+		    subtarget = subtargets ? gen_reg_rtx (mode) : dest;
+		    emit_insn (gen_rtx_SET (VOIDmode, subtarget,
+					    GEN_INT (aarch64_bitmasks[j])));
+		    emit_insn (gen_anddi3 (dest, subtarget,
+					   GEN_INT (aarch64_bitmasks[i])));
+		  }
+		num_insns += 2;
+		return num_insns;
 	      }
 	}
     }
@@ -1303,18 +1269,24 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
     {
       /* Set either first three quarters or all but the third.	 */
       mask = 0xffffll << (16 - first_not_ffff_match);
-      emit_insn (gen_rtx_SET (VOIDmode, dest,
-			      GEN_INT (val | mask | 0xffffffff00000000ull)));
+      if (generate)
+	emit_insn (gen_rtx_SET (VOIDmode, dest,
+				GEN_INT (val | mask | 0xffffffff00000000ull)));
+      num_insns ++;
 
       /* Now insert other two quarters.	 */
       for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1);
 	   i < 64; i += 16, mask <<= 16)
 	{
 	  if ((val & mask) != mask)
-	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
-				       GEN_INT ((val >> i) & 0xffff)));
+	    {
+	      if (generate)
+		emit_insn (gen_insv_immdi (dest, GEN_INT (i),
+					   GEN_INT ((val >> i) & 0xffff)));
+	      num_insns ++;
+	    }
 	}
-      return;
+      return num_insns;
     }
 
  simple_sequence:
@@ -1326,15 +1298,106 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	{
 	  if (first)
 	    {
-	      emit_insn (gen_rtx_SET (VOIDmode, dest,
-				      GEN_INT (val & mask)));
+	      if (generate)
+		emit_insn (gen_rtx_SET (VOIDmode, dest,
+					GEN_INT (val & mask)));
+	      num_insns ++;
 	      first = false;
 	    }
 	  else
-	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
-				       GEN_INT ((val >> i) & 0xffff)));
+	    {
+	      if (generate)
+		emit_insn (gen_insv_immdi (dest, GEN_INT (i),
+					   GEN_INT ((val >> i) & 0xffff)));
+	      num_insns ++;
+	    }
+	}
+    }
+
+  return num_insns;
+}
+
+
+void
+aarch64_expand_mov_immediate (rtx dest, rtx imm)
+{
+  machine_mode mode = GET_MODE (dest);
+
+  gcc_assert (mode == SImode || mode == DImode);
+
+  /* Check on what type of symbol it is.  */
+  if (GET_CODE (imm) == SYMBOL_REF
+      || GET_CODE (imm) == LABEL_REF
+      || GET_CODE (imm) == CONST)
+    {
+      rtx mem, base, offset;
+      enum aarch64_symbol_type sty;
+
+      /* If we have (const (plus symbol offset)), separate out the offset
+	 before we start classifying the symbol.  */
+      split_const (imm, &base, &offset);
+
+      sty = aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR);
+      switch (sty)
+	{
+	case SYMBOL_FORCE_TO_MEM:
+	  if (offset != const0_rtx
+	      && targetm.cannot_force_const_mem (mode, imm))
+	    {
+	      gcc_assert (can_create_pseudo_p ());
+	      base = aarch64_force_temporary (mode, dest, base);
+	      base = aarch64_add_offset (mode, NULL, base, INTVAL (offset));
+	      aarch64_emit_move (dest, base);
+	      return;
+	    }
+	  mem = force_const_mem (ptr_mode, imm);
+	  gcc_assert (mem);
+	  if (mode != ptr_mode)
+	    mem = gen_rtx_ZERO_EXTEND (mode, mem);
+	  emit_insn (gen_rtx_SET (VOIDmode, dest, mem));
+	  return;
+
+        case SYMBOL_SMALL_TLSGD:
+        case SYMBOL_SMALL_TLSDESC:
+        case SYMBOL_SMALL_GOTTPREL:
+	case SYMBOL_SMALL_GOT:
+	case SYMBOL_TINY_GOT:
+	  if (offset != const0_rtx)
+	    {
+	      gcc_assert(can_create_pseudo_p ());
+	      base = aarch64_force_temporary (mode, dest, base);
+	      base = aarch64_add_offset (mode, NULL, base, INTVAL (offset));
+	      aarch64_emit_move (dest, base);
+	      return;
+	    }
+	  /* FALLTHRU */
+
+        case SYMBOL_SMALL_TPREL:
+	case SYMBOL_SMALL_ABSOLUTE:
+	case SYMBOL_TINY_ABSOLUTE:
+	  aarch64_load_symref_appropriately (dest, imm, sty);
+	  return;
+
+	default:
+	  gcc_unreachable ();
+	}
+    }
+
+  if (!CONST_INT_P (imm))
+    {
+      if (GET_CODE (imm) == HIGH)
+	emit_insn (gen_rtx_SET (VOIDmode, dest, imm));
+      else
+        {
+	  rtx mem = force_const_mem (mode, imm);
+	  gcc_assert (mem);
+	  emit_insn (gen_rtx_SET (VOIDmode, dest, mem));
 	}
+
+      return;
     }
+
+  aarch64_internal_mov_immediate (dest, imm, true, GET_MODE (dest));
 }
 
 static bool
@@ -5229,9 +5292,8 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 	     proportionally expensive to the number of instructions
 	     required to build that constant.  This is true whether we
 	     are compiling for SPEED or otherwise.  */
-	  *cost = COSTS_N_INSNS (aarch64_build_constant (0,
-							 INTVAL (x),
-							 false));
+	  *cost = COSTS_N_INSNS (aarch64_internal_mov_immediate
+				 (NULL_RTX, x, false, mode));
 	}
       return true;
 
@@ -8030,7 +8092,7 @@ aarch64_mov_operand_p (rtx x,
       && aarch64_valid_symref (XEXP (x, 0), GET_MODE (XEXP (x, 0))))
     return true;
 
-  if (CONST_INT_P (x) && aarch64_move_imm (INTVAL (x), mode))
+  if (CONST_INT_P (x))
     return true;
 
   if (GET_CODE (x) == SYMBOL_REF && mode == DImode && CONSTANT_ADDRESS_P (x))
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 17570ba..142e8b1 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -746,17 +746,20 @@
     if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx)
       operands[1] = force_reg (<MODE>mode, operands[1]);
 
-    if (CONSTANT_P (operands[1]))
-      {
-	aarch64_expand_mov_immediate (operands[0], operands[1]);
-	DONE;
-      }
+    /* FIXME: RR we still need to fix up what we are doing with
+       symbol_refs and other types of constants.  */
+    if (CONSTANT_P (operands[1])
+        && !CONST_INT_P (operands[1]))
+     {
+       aarch64_expand_mov_immediate (operands[0], operands[1]);
+       DONE;
+     }
   "
 )
 
-(define_insn "*movsi_aarch64"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,*w,m,  m,r,r  ,*w, r,*w")
-	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,m, m,rZ,*w,S,Ush,rZ,*w,*w"))]
+(define_insn_and_split "*movsi_aarch64"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r  ,*w, r,*w")
+	(match_operand:SI 1 "aarch64_mov_operand"  " r,r,k,M,n,m, m,rZ,*w,S,Ush,rZ,*w,*w"))]
   "(register_operand (operands[0], SImode)
     || aarch64_reg_or_zero (operands[1], SImode))"
   "@
@@ -764,6 +767,7 @@
    mov\\t%w0, %w1
    mov\\t%w0, %w1
    mov\\t%w0, %1
+   #
    ldr\\t%w0, %1
    ldr\\t%s0, %1
    str\\t%w1, %0
@@ -773,14 +777,20 @@
    fmov\\t%s0, %w1
    fmov\\t%w0, %s1
    fmov\\t%s0, %s1"
-  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,load1,load1,store1,store1,\
+   "CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), SImode)"
+   [(const_int 0)]
+   "{
+       aarch64_expand_mov_immediate (operands[0], operands[1]);
+       DONE;
+    }"
+  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,load1,load1,store1,store1,\
                      adr,adr,f_mcr,f_mrc,fmov")
-   (set_attr "fp" "*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes")]
 )
 
-(define_insn "*movdi_aarch64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,*w,m,  m,r,r,  *w, r,*w,w")
-	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,m, m,rZ,*w,S,Ush,rZ,*w,*w,Dd"))]
+(define_insn_and_split "*movdi_aarch64"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,r,*w,m,  m,r,r,  *w, r,*w,w")
+	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,n,m, m,rZ,*w,S,Ush,rZ,*w,*w,Dd"))]
   "(register_operand (operands[0], DImode)
     || aarch64_reg_or_zero (operands[1], DImode))"
   "@
@@ -788,6 +798,7 @@
    mov\\t%0, %x1
    mov\\t%x0, %1
    mov\\t%x0, %1
+   #
    ldr\\t%x0, %1
    ldr\\t%d0, %1
    str\\t%x1, %0
@@ -798,10 +809,16 @@
    fmov\\t%x0, %d1
    fmov\\t%d0, %d1
    movi\\t%d0, %1"
-  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,load1,load1,store1,store1,\
+   "(CONST_INT_P (operands[1]) && !aarch64_move_imm (INTVAL (operands[1]), DImode))"
+   [(const_int 0)]
+   "{
+       aarch64_expand_mov_immediate (operands[0], operands[1]);
+       DONE;
+    }"
+  [(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,load1,load1,store1,store1,\
                      adr,adr,f_mcr,f_mrc,fmov,fmov")
-   (set_attr "fp" "*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
-   (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,yes")]
+   (set_attr "fp" "*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
+   (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")]
 )
 
 (define_insn "insv_imm<mode>"

* Re: [Patch AArch64] Fix PR 63724 - Improve immediate generation
  2014-11-12 16:53   ` Ramana Radhakrishnan
@ 2014-11-14  9:34     ` Marcus Shawcroft
  0 siblings, 0 replies; 7+ messages in thread
From: Marcus Shawcroft @ 2014-11-14  9:34 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Richard Henderson, gcc-patches, Marcus Shawcroft,
	Richard Earnshaw, James Greenhalgh

On 12 November 2014 16:46, Ramana Radhakrishnan
<ramana.radhakrishnan@arm.com> wrote:

> v2 , based on Richard's suggestion as well as fixing a bug that I hit in
> some more testing at O1. aarch64_internal_mov_immediate should not generate
> a temporary for subtarget when not actually "generating" code.
>
> Tested again on aarch64-none-elf and with a bootstrap / reg test. Ok ?
>
>
> <DATE>  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
>
>         PR target/63724
>         * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Split out
>            numerical immediate handling to...
>         (aarch64_internal_mov_immediate): ...this. New.
>         (aarch64_rtx_costs): Use aarch64_internal_mov_immediate.
>         (aarch64_mov_operand_p): Relax predicate.
>         * config/aarch64/aarch64.md (mov<mode>:GPI): Do not expand
> CONST_INTs.
>         (*movsi_aarch64): Turn into define_insn_and_split and new
> alternative
>         for 'n'.
>         (*movdi_aarch64): Likewise.


OK.

We should be able to remove aarch64_build_constant now.

/Marcus
