public inbox for gcc-patches@gcc.gnu.org
* [PATCH, ARM] Unaligned accesses for packed structures [1/2]
@ 2011-05-06 12:45 Julian Brown
  2011-05-06 13:07 ` Richard Earnshaw
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Julian Brown @ 2011-05-06 12:45 UTC (permalink / raw)
  To: gcc-patches; +Cc: paul, rearnsha, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 3214 bytes --]

Hi,

This is the first of two patches to add unaligned-access support to the
ARM backend. This is done somewhat differently to Jie Zhang's earlier
patch:

  http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01890.html

The difference is that, with Jie's patch, *any* pointer dereference
would be allowed to access unaligned data. This has the undesirable
side-effect of disallowing instructions which don't support unaligned
accesses (LDRD, LDM etc.) when unaligned accesses are enabled.

Instead, this patch enables only packed-structure accesses to use
ldr/str/ldrh/strh, by taking a hint from the MIPS ldl/ldr
implementation. I figured the unaligned-access ARM case is kind of
similar to those, except that normal loads/stores are used, and the
shifting/merging happens in hardware.
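
As a concrete (made-up) example of the kind of access this is aimed at:

  /* Hypothetical test case, not part of the patch: "val" sits at
     offset 1, so accessing it is a misaligned word access.  */
  struct __attribute__ ((packed)) pkt
  {
    unsigned char tag;
    unsigned int val;
  };

  unsigned int
  get_val (struct pkt *p)
  {
    /* With unaligned-access support enabled this can become a single
       ldr (tagged "@ unaligned" in the generated assembly) rather
       than a byte-by-byte sequence.  */
    return p->val;
  }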

The standard names extv/extzv/insv can take a memory
operand for the source/destination of the extract/insert operation, so
we just expand to unspec'ed versions of the load and store operations
when unaligned-access support is enabled: the benefit of doing that
rather than, say, expanding using the regular movsi pattern is that we
bypass any smartness in the compiler which might replace operations
which work for unaligned accesses (ldr/str/ldrh/strh) with operations
which don't work (ldrd/strd/ldm/stm/vldr/...). The downside is we might
potentially miss out on optimization opportunities (since these things
no longer look like plain memory accesses).

Doing things this way allows us to leave the settings for
STRICT_ALIGNMENT/SLOW_BYTE_ACCESS alone, avoiding the disruption that
changing them might cause.

The most awkward change in the patch is to generic code (expmed.c,
{store,extract}_bit_field_1): in big-endian mode, the existing behaviour
(when inserting/extracting a bitfield to a memory location) is
definitely bogus: "unit" is set to BITS_PER_UNIT for memory locations,
and if bitsize (the size of the field to insert/extract) is greater than
BITS_PER_UNIT (which isn't unusual at all), xbitpos becomes negative.
That can't possibly be intentional; I can only assume that this code
path is not exercised for machines which have memory alternatives for
bitfield insert/extract, and BITS_BIG_ENDIAN of 0 in BYTES_BIG_ENDIAN
mode.
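
(As a concrete instance of the overflow: for a memory operand, unit is
BITS_PER_UNIT, i.e. 8, so inserting a 32-bit field at bit position 0
gives xbitpos = unit - bitsize - xbitpos = 8 - 32 - 0 = -24, which is
meaningless as a bit offset.)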

The logic for choosing when to enable the unaligned-access support (and
the name of the option to override the default behaviour) is lifted from
Jie's patch.
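
For example, using the option spelling added here: with -march=armv7-a
the support is enabled by default, whereas passing -mno-unaligned-access
(or targeting a pre-v6 architecture) should leave packed accesses being
synthesized from byte operations as before.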

Tested with cross to ARM Linux, and (on a branch) in both little &
big-endian mode cross to ARM EABI, with no regressions. OK to apply?

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/arm.c (arm_option_override): Add unaligned_access
    support.
    * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
    (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
    (insv, extzv): Add unaligned-access support.
    (extv): Change to expander. Likewise.
    (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
    (unaligned_storesi, unaligned_storehi): New.
    (*extv_reg): New (previous extv implementation).
    * config/arm/arm.opt (munaligned_access): Add option.
    * expmed.c (store_bit_field_1): Don't tweak bitfield numbering for
    memory locations if BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN.
    (extract_bit_field_1): Likewise.

[-- Attachment #2: arm-unaligned-support-fsf-1.diff --]
[-- Type: text/x-patch, Size: 11144 bytes --]

commit e76508ff702406fd63bc59465d9c7ab70dcb3266
Author: Julian Brown <julian@henry7.codesourcery.com>
Date:   Wed May 4 10:06:25 2011 -0700

    Permit regular ldr/str/ldrh/strh for packed-structure accesses etc.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4f9c2aa..a18aea6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1833,6 +1833,22 @@ arm_option_override (void)
 	fix_cm3_ldrd = 0;
     }
 
+  /* Enable -munaligned-access by default for
+     - all ARMv6 architecture-based processors
+     - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
+
+     Disable -munaligned-access by default for
+     - all pre-ARMv6 architecture-based processors
+     - ARMv6-M architecture-based processors.  */
+
+  if (unaligned_access == 2)
+    {
+      if (arm_arch6 && (arm_arch_notm || arm_arch7))
+	unaligned_access = 1;
+      else
+	unaligned_access = 0;
+    }
+
   if (TARGET_THUMB1 && flag_schedule_insns)
     {
       /* Don't warn since it's on by default in -O2.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 40ebf35..7d37445 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -104,6 +104,10 @@
   UNSPEC_SYMBOL_OFFSET  ; The offset of the start of the symbol from
                         ; another symbolic address.
   UNSPEC_MEMORY_BARRIER ; Represent a memory barrier.
+  UNSPEC_UNALIGNED_LOAD	; Used to represent ldr/ldrh instructions that access
+			; unaligned locations, on architectures which support
+			; that.
+  UNSPEC_UNALIGNED_STORE ; Same for str/strh.
 ])
 
 ;; UNSPEC_VOLATILE Usage:
@@ -2393,7 +2397,7 @@
 ;;; this insv pattern, so this pattern needs to be reevalutated.
 
 (define_expand "insv"
-  [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "")
+  [(set (zero_extract:SI (match_operand:SI 0 "nonimmediate_operand" "")
                          (match_operand:SI 1 "general_operand" "")
                          (match_operand:SI 2 "general_operand" ""))
         (match_operand:SI 3 "reg_or_int_operand" ""))]
@@ -2407,35 +2411,66 @@
 
     if (arm_arch_thumb2)
       {
-	bool use_bfi = TRUE;
-
-	if (GET_CODE (operands[3]) == CONST_INT)
+        if (unaligned_access && MEM_P (operands[0])
+	    && s_register_operand (operands[3], GET_MODE (operands[3]))
+	    && (width == 16 || width == 32) && (start_bit % BITS_PER_UNIT) == 0)
 	  {
-	    HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
-
-	    if (val == 0)
+	    rtx base_addr;
+	    
+	    if (width == 32)
 	      {
-		emit_insn (gen_insv_zero (operands[0], operands[1],
-					  operands[2]));
-		DONE;
+	        base_addr = adjust_address (operands[0], SImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_storesi (base_addr, operands[3]));
 	      }
+	    else
+	      {
+	        rtx tmp = gen_reg_rtx (HImode);
 
-	    /* See if the set can be done with a single orr instruction.  */
-	    if (val == mask && const_ok_for_arm (val << start_bit))
-	      use_bfi = FALSE;
+	        base_addr = adjust_address (operands[0], HImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_move_insn (tmp, gen_lowpart (HImode, operands[3]));
+		emit_insn (gen_unaligned_storehi (base_addr, tmp));
+	      }
+	    DONE;
 	  }
-	  
-	if (use_bfi)
+	else if (s_register_operand (operands[0], GET_MODE (operands[0])))
 	  {
-	    if (GET_CODE (operands[3]) != REG)
-	      operands[3] = force_reg (SImode, operands[3]);
+	    bool use_bfi = TRUE;
 
-	    emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
-				    operands[3]));
-	    DONE;
+	    if (GET_CODE (operands[3]) == CONST_INT)
+	      {
+		HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
+
+		if (val == 0)
+		  {
+		    emit_insn (gen_insv_zero (operands[0], operands[1],
+					      operands[2]));
+		    DONE;
+		  }
+
+		/* See if the set can be done with a single orr instruction.  */
+		if (val == mask && const_ok_for_arm (val << start_bit))
+		  use_bfi = FALSE;
+	      }
+
+	    if (use_bfi)
+	      {
+		if (GET_CODE (operands[3]) != REG)
+		  operands[3] = force_reg (SImode, operands[3]);
+
+		emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
+					operands[3]));
+		DONE;
+	      }
 	  }
+	else
+	  FAIL;
       }
 
+    if (!s_register_operand (operands[0], GET_MODE (operands[0])))
+      FAIL;
+
     target = copy_rtx (operands[0]);
     /* Avoid using a subreg as a subtarget, and avoid writing a paradoxical 
        subreg as the final target.  */
@@ -3628,7 +3663,7 @@
 
 (define_expand "extzv"
   [(set (match_dup 4)
-	(ashift:SI (match_operand:SI   1 "register_operand" "")
+	(ashift:SI (match_operand:SI   1 "nonimmediate_operand" "")
 		   (match_operand:SI   2 "const_int_operand" "")))
    (set (match_operand:SI              0 "register_operand" "")
 	(lshiftrt:SI (match_dup 4)
@@ -3641,10 +3676,53 @@
     
     if (arm_arch_thumb2)
       {
-	emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
-				 operands[3]));
-	DONE;
+	HOST_WIDE_INT width = INTVAL (operands[2]);
+	HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+	if (unaligned_access && MEM_P (operands[1])
+	    && (width == 16 || width == 32) && (bitpos % BITS_PER_UNIT) == 0)
+	  {
+	    rtx base_addr;
+
+	    if (width == 32)
+              {
+		base_addr = adjust_address (operands[1], SImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+              }
+	    else
+              {
+		rtx dest = operands[0];
+		rtx tmp = gen_reg_rtx (SImode);
+
+		/* We may get a paradoxical subreg here.  Strip it off.  */
+		if (GET_CODE (dest) == SUBREG
+		    && GET_MODE (dest) == SImode
+		    && GET_MODE (SUBREG_REG (dest)) == HImode)
+		  dest = SUBREG_REG (dest);
+
+		if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+		  FAIL;
+
+		base_addr = adjust_address (operands[1], HImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadhiu (tmp, base_addr));
+		emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	      }
+	    DONE;
+	  }
+	else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+	  {
+	    emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
+				     operands[3]));
+	    DONE;
+	  }
+	else
+	  FAIL;
       }
+    
+    if (!s_register_operand (operands[1], GET_MODE (operands[1])))
+      FAIL;
 
     operands[3] = GEN_INT (rshift);
     
@@ -3659,7 +3737,103 @@
   }"
 )
 
-(define_insn "extv"
+(define_expand "extv"
+  [(set (match_operand 0 "s_register_operand" "")
+	(sign_extract (match_operand 1 "nonimmediate_operand" "")
+		      (match_operand 2 "const_int_operand" "")
+		      (match_operand 3 "const_int_operand" "")))]
+  "arm_arch_thumb2"
+{
+  HOST_WIDE_INT width = INTVAL (operands[2]);
+  HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+  if (unaligned_access && MEM_P (operands[1]) && (width == 16 || width == 32)
+      && (bitpos % BITS_PER_UNIT) == 0)
+    {
+      rtx base_addr;
+      
+      if (width == 32)
+        {
+	  base_addr = adjust_address (operands[1], SImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+        }
+      else
+        {
+	  rtx dest = operands[0];
+	  rtx tmp = gen_reg_rtx (SImode);
+	  
+	  /* We may get a paradoxical subreg here.  Strip it off.  */
+	  if (GET_CODE (dest) == SUBREG
+	      && GET_MODE (dest) == SImode
+	      && GET_MODE (SUBREG_REG (dest)) == HImode)
+	    dest = SUBREG_REG (dest);
+	  
+	  if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+	    FAIL;
+	  
+	  base_addr = adjust_address (operands[1], HImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadhis (tmp, base_addr));
+	  emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	}
+
+      DONE;
+    }
+  else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+    DONE;
+  else
+    FAIL;
+})
+
+; ARMv6+ unaligned load/store instructions (used for packed structure accesses).
+
+(define_insn "unaligned_loadsi"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")]
+		   UNSPEC_UNALIGNED_LOAD))]
+  "unaligned_access"
+  "ldr%?\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load1")])
+
+(define_insn "unaligned_loadhis"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(sign_extend:SI (unspec:HI [(match_operand:HI 1 "memory_operand" "m")]
+				   UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access"
+  "ldr%(sh%)\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_loadhiu"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(zero_extend:SI (unspec:HI [(match_operand:HI 1 "memory_operand" "m")]
+				   UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access"
+  "ldr%(h%)\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_storesi"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(match_operand:SI 1 "s_register_operand" "r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access"
+  "str%?\t%1, %0\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "unaligned_storehi"
+  [(set (match_operand:HI 0 "memory_operand" "=m")
+	(unspec:HI [(match_operand:HI 1 "s_register_operand" "r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access"
+  "str%(h%)\t%1, %0\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "*extv_reg"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
                          (match_operand:SI 2 "const_int_operand" "M")
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 7d2d84c..3ed9016 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -170,3 +170,7 @@ mfix-cortex-m3-ldrd
 Target Report Var(fix_cm3_ldrd) Init(2)
 Avoid overlapping destination and address registers on LDRD instructions
 that may trigger Cortex-M3 errata.
+
+munaligned-access
+Target Report Var(unaligned_access) Init(2)
+Enable unaligned word and halfword accesses to packed data.
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 7482747..a525184 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -648,7 +648,7 @@ store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
       /* On big-endian machines, we count bits from the most significant.
 	 If the bit field insn does not, we must invert.  */
 
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos = unit - bitsize - xbitpos;
 
       /* We have been counting XBITPOS within UNIT.
@@ -1456,7 +1456,7 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 
       /* On big-endian machines, we count bits from the most significant.
 	 If the bit field insn does not, we must invert.  */
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos = unit - bitsize - xbitpos;
 
       /* Now convert from counting within UNIT to counting in EXT_MODE.  */


* Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
  2011-05-06 12:45 [PATCH, ARM] Unaligned accesses for packed structures [1/2] Julian Brown
@ 2011-05-06 13:07 ` Richard Earnshaw
  2011-05-09 17:50   ` Julian Brown
  2011-07-04 11:44 ` Ulrich Weigand
  2011-09-22  6:47 ` Ye Joey
  2 siblings, 1 reply; 14+ messages in thread
From: Richard Earnshaw @ 2011-05-06 13:07 UTC (permalink / raw)
  To: Julian Brown; +Cc: gcc-patches, paul, Ramana Radhakrishnan


On Fri, 2011-05-06 at 13:34 +0100, Julian Brown wrote:
> Hi,
> 
> This is the first of two patches to add unaligned-access support to the
> ARM backend. This is done somewhat differently to Jie Zhang's earlier
> patch:
> 
>   http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01890.html
> 
> In that with Jie's patch, *any* pointer dereference would be allowed to
> access unaligned data. This has the undesirable side-effect of
> disallowing instructions which don't support unaligned accesses (LDRD,
> LDM etc.) when unaligned accesses are enabled.
> 
> Instead, this patch enables only packed-structure accesses to use
> ldr/str/ldrh/strh, by taking a hint from the MIPS ldl/ldr
> implementation. I figured the unaligned-access ARM case is kind of
> similar to those, except that normal loads/stores are used, and the
> shifting/merging happens in hardware.
> 
> The standard names extv/extzv/insv can take a memory
> operand for the source/destination of the extract/insert operation, so
> we just expand to unspec'ed versions of the load and store operations
> when unaligned-access support is enabled: the benefit of doing that
> rather than, say, expanding using the regular movsi pattern is that we
> bypass any smartness in the compiler which might replace operations
> which work for unaligned accesses (ldr/str/ldrh/strh) with operations
> which don't work (ldrd/strd/ldm/stm/vldr/...). The downside is we might
> potentially miss out on optimization opportunities (since these things
> no longer look like plain memory accesses).
> 
> Doing things this way allows us to leave the settings for
> STRICT_ALIGNMENT/SLOW_BYTE_ACCESS alone, avoiding the disruption that
> changing them might cause.
> 
> The most awkward change in the patch is to generic code (expmed.c,
> {store,extract}_bit_field_1): in big-endian mode, the existing behaviour
> (when inserting/extracting a bitfield to a memory location) is
> definitely bogus: "unit" is set to BITS_PER_UNIT for memory locations,
> and if bitsize (the size of the field to insert/extract) is greater than
> BITS_PER_UNIT (which isn't unusual at all), xbitpos becomes negative.
> That can't possibly be intentional; I can only assume that this code
> path is not exercised for machines which have memory alternatives for
> bitfield insert/extract, and BITS_BIG_ENDIAN of 0 in BYTES_BIG_ENDIAN
> mode.
> 
> The logic for choosing when to enable the unaligned-access support (and
> the name of the option to override the default behaviour) is lifted from
> Jie's patch.
> 
> Tested with cross to ARM Linux, and (on a branch) in both little &
> big-endian mode cross to ARM EABI, with no regressions. OK to apply?
> 
> Thanks,
> 
> Julian
> 
> ChangeLog
> 
>     gcc/
>     * config/arm/arm.c (arm_override_options): Add unaligned_access
>     support.
>     * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
>     (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
>     (insv, extzv): Add unaligned-access support.
>     (extv): Change to expander. Likewise.
>     (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
>     (unaligned_storesi, unaligned_storehi): New.
>     (*extv_reg): New (previous extv implementation).
>     * config/arm/arm.opt (munaligned_access): Add option.
>     * expmed.c (store_bit_field_1): Don't tweak bitfield numbering for
>     memory locations if BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN.
>     (extract_bit_field_1): Likewise.

+  if (unaligned_access == 2)
+    {
+      if (arm_arch6 && (arm_arch_notm || arm_arch7))
+       unaligned_access = 1;
+      else
+       unaligned_access = 0;
+    }
+

The compiler should fault -munaligned-access on cores that don't support
it.

+(define_insn "unaligned_loadsi"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+       (unspec:SI [(match_operand:SI 1 "memory_operand" "m")]
+                  UNSPEC_UNALIGNED_LOAD))]
+  "unaligned_access"
+  "ldr%?\t%0, %1\t@ unaligned"
+  [(set_attr "predicable" "yes")
+   (set_attr "type" "load1")])

I think the final condition should also include TARGET_32BIT, as these
patterns aren't expected to work with Thumb-1.

Secondly, they should be structured to get the length information right
when a 16-bit encoding can be used in Thumb-2 (you'll keep Carrot happy
that way :-) : just add an alternative that's only enabled for Thumb
mode and which matches the requirements for a 16-bit instruction (beware
however of the 16-bit write-back case).

Finally, I don't see anything to put out the correct build attribute
information for unaligned access enabled (Tag_CPU_unaligned_access).
Have I just missed it?

R.




* Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
  2011-05-06 13:07 ` Richard Earnshaw
@ 2011-05-09 17:50   ` Julian Brown
  2011-05-10 14:03     ` Julian Brown
  2011-05-26 17:06     ` Julian Brown
  0 siblings, 2 replies; 14+ messages in thread
From: Julian Brown @ 2011-05-09 17:50 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches, paul, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 2660 bytes --]

On Fri, 06 May 2011 14:04:16 +0100
Richard Earnshaw <rearnsha@arm.com> wrote:

> On Fri, 2011-05-06 at 13:34 +0100, Julian Brown wrote:
> > This is the first of two patches to add unaligned-access support to
> > the ARM backend. [...]

> The compiler should fault -munaligned-access on cores that don't
> support it.

I've implemented this as a warning (which automatically disables the
option).

> +(define_insn "unaligned_loadsi"
> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
> +       (unspec:SI [(match_operand:SI 1 "memory_operand" "m")]
> +                  UNSPEC_UNALIGNED_LOAD))]
> +  "unaligned_access"
> +  "ldr%?\t%0, %1\t@ unaligned"
> +  [(set_attr "predicable" "yes")
> +   (set_attr "type" "load1")])
> 
> I think the final condition should also include TARGET_32BIT, as these
> patterns aren't expected to work with Thumb-1.

Added.

> Secondly, they should be structured to get the length information
> right when a 16-bit encoding can be used in Thumb-2 (you'll keep
> Carrot happy that way :-) : just add an alternative that's only
> enabled for Thumb mode and which matches the requirements for a
> 16-bit instruction (beware however of the 16-bit write-back case).

I've done this, assuming that by "16-bit write-back case" you meant
singleton-register-list ldmia/stmia instructions masquerading as
post-increment ldr/str? I've disallowed that by adding a new
constraint. It's not entirely clear to me whether Thumb-2 (vs. Thumb-1)
will actually ever use 16-bit ldmia/stmia instead of 32-bit writeback
ldr/str though.

> Finally, I don't see anything to put out the correct build attribute
> information for unaligned access enabled (Tag_CPU_unaligned_access).
> Have I just missed it?

No you didn't miss it, it wasn't there :-). Added.

How does this look now? (Re-testing in progress.)

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/arm.c (arm_option_override): Add unaligned_access
    support.
    (arm_file_start): Emit attribute for unaligned access as
    appropriate.
    * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
    (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
    (insv, extzv): Add unaligned-access support.
    (extv): Change to expander. Likewise.
    (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
    (unaligned_storesi, unaligned_storehi): New.
    (*extv_reg): New (previous extv implementation).
    * config/arm/arm.opt (munaligned_access): Add option.
    * config/arm/constraints.md (Uw): New constraint.
    * expmed.c (store_bit_field_1): Don't tweak bitfield numbering for
    memory locations if BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN.
    (extract_bit_field_1): Likewise.

[-- Attachment #2: arm-unaligned-support-fsf-2.diff --]
[-- Type: text/x-patch, Size: 13333 bytes --]

commit bd55df2538dd90e576dd0f69bc9c9d570c8eee08
Author: Julian Brown <julian@henry7.codesourcery.com>
Date:   Wed May 4 10:06:25 2011 -0700

    Permit regular ldr/str/ldrh/strh for packed-structure accesses etc.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4f9c2aa..f0f1a73 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1833,6 +1833,28 @@ arm_option_override (void)
 	fix_cm3_ldrd = 0;
     }
 
+  /* Enable -munaligned-access by default for
+     - all ARMv6 architecture-based processors
+     - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
+
+     Disable -munaligned-access by default for
+     - all pre-ARMv6 architecture-based processors
+     - ARMv6-M architecture-based processors.  */
+
+  if (unaligned_access == 2)
+    {
+      if (arm_arch6 && (arm_arch_notm || arm_arch7))
+	unaligned_access = 1;
+      else
+	unaligned_access = 0;
+    }
+  else if (unaligned_access == 1
+	   && !(arm_arch6 && (arm_arch_notm || arm_arch7)))
+    {
+      warning (0, "target CPU does not support unaligned accesses");
+      unaligned_access = 0;
+    }
+
   if (TARGET_THUMB1 && flag_schedule_insns)
     {
       /* Don't warn since it's on by default in -O2.  */
@@ -21714,6 +21736,10 @@ arm_file_start (void)
 	val = 6;
       asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val);
 
+      /* Tag_CPU_unaligned_access.  */
+      asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n",
+		   unaligned_access);
+
       /* Tag_ABI_FP_16bit_format.  */
       if (arm_fp16_format)
 	asm_fprintf (asm_out_file, "\t.eabi_attribute 38, %d\n",
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 40ebf35..59b9ffb 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -104,6 +104,10 @@
   UNSPEC_SYMBOL_OFFSET  ; The offset of the start of the symbol from
                         ; another symbolic address.
   UNSPEC_MEMORY_BARRIER ; Represent a memory barrier.
+  UNSPEC_UNALIGNED_LOAD	; Used to represent ldr/ldrh instructions that access
+			; unaligned locations, on architectures which support
+			; that.
+  UNSPEC_UNALIGNED_STORE ; Same for str/strh.
 ])
 
 ;; UNSPEC_VOLATILE Usage:
@@ -2393,7 +2397,7 @@
 ;;; this insv pattern, so this pattern needs to be reevalutated.
 
 (define_expand "insv"
-  [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "")
+  [(set (zero_extract:SI (match_operand:SI 0 "nonimmediate_operand" "")
                          (match_operand:SI 1 "general_operand" "")
                          (match_operand:SI 2 "general_operand" ""))
         (match_operand:SI 3 "reg_or_int_operand" ""))]
@@ -2407,35 +2411,66 @@
 
     if (arm_arch_thumb2)
       {
-	bool use_bfi = TRUE;
-
-	if (GET_CODE (operands[3]) == CONST_INT)
+        if (unaligned_access && MEM_P (operands[0])
+	    && s_register_operand (operands[3], GET_MODE (operands[3]))
+	    && (width == 16 || width == 32) && (start_bit % BITS_PER_UNIT) == 0)
 	  {
-	    HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
-
-	    if (val == 0)
+	    rtx base_addr;
+	    
+	    if (width == 32)
 	      {
-		emit_insn (gen_insv_zero (operands[0], operands[1],
-					  operands[2]));
-		DONE;
+	        base_addr = adjust_address (operands[0], SImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_storesi (base_addr, operands[3]));
 	      }
+	    else
+	      {
+	        rtx tmp = gen_reg_rtx (HImode);
 
-	    /* See if the set can be done with a single orr instruction.  */
-	    if (val == mask && const_ok_for_arm (val << start_bit))
-	      use_bfi = FALSE;
+	        base_addr = adjust_address (operands[0], HImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_move_insn (tmp, gen_lowpart (HImode, operands[3]));
+		emit_insn (gen_unaligned_storehi (base_addr, tmp));
+	      }
+	    DONE;
 	  }
-	  
-	if (use_bfi)
+	else if (s_register_operand (operands[0], GET_MODE (operands[0])))
 	  {
-	    if (GET_CODE (operands[3]) != REG)
-	      operands[3] = force_reg (SImode, operands[3]);
+	    bool use_bfi = TRUE;
 
-	    emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
-				    operands[3]));
-	    DONE;
+	    if (GET_CODE (operands[3]) == CONST_INT)
+	      {
+		HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
+
+		if (val == 0)
+		  {
+		    emit_insn (gen_insv_zero (operands[0], operands[1],
+					      operands[2]));
+		    DONE;
+		  }
+
+		/* See if the set can be done with a single orr instruction.  */
+		if (val == mask && const_ok_for_arm (val << start_bit))
+		  use_bfi = FALSE;
+	      }
+
+	    if (use_bfi)
+	      {
+		if (GET_CODE (operands[3]) != REG)
+		  operands[3] = force_reg (SImode, operands[3]);
+
+		emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
+					operands[3]));
+		DONE;
+	      }
 	  }
+	else
+	  FAIL;
       }
 
+    if (!s_register_operand (operands[0], GET_MODE (operands[0])))
+      FAIL;
+
     target = copy_rtx (operands[0]);
     /* Avoid using a subreg as a subtarget, and avoid writing a paradoxical 
        subreg as the final target.  */
@@ -3628,7 +3663,7 @@
 
 (define_expand "extzv"
   [(set (match_dup 4)
-	(ashift:SI (match_operand:SI   1 "register_operand" "")
+	(ashift:SI (match_operand:SI   1 "nonimmediate_operand" "")
 		   (match_operand:SI   2 "const_int_operand" "")))
    (set (match_operand:SI              0 "register_operand" "")
 	(lshiftrt:SI (match_dup 4)
@@ -3641,10 +3676,53 @@
     
     if (arm_arch_thumb2)
       {
-	emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
-				 operands[3]));
-	DONE;
+	HOST_WIDE_INT width = INTVAL (operands[2]);
+	HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+	if (unaligned_access && MEM_P (operands[1])
+	    && (width == 16 || width == 32) && (bitpos % BITS_PER_UNIT) == 0)
+	  {
+	    rtx base_addr;
+
+	    if (width == 32)
+              {
+		base_addr = adjust_address (operands[1], SImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+              }
+	    else
+              {
+		rtx dest = operands[0];
+		rtx tmp = gen_reg_rtx (SImode);
+
+		/* We may get a paradoxical subreg here.  Strip it off.  */
+		if (GET_CODE (dest) == SUBREG
+		    && GET_MODE (dest) == SImode
+		    && GET_MODE (SUBREG_REG (dest)) == HImode)
+		  dest = SUBREG_REG (dest);
+
+		if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+		  FAIL;
+
+		base_addr = adjust_address (operands[1], HImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadhiu (tmp, base_addr));
+		emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	      }
+	    DONE;
+	  }
+	else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+	  {
+	    emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
+				     operands[3]));
+	    DONE;
+	  }
+	else
+	  FAIL;
       }
+    
+    if (!s_register_operand (operands[1], GET_MODE (operands[1])))
+      FAIL;
 
     operands[3] = GEN_INT (rshift);
     
@@ -3659,7 +3737,115 @@
   }"
 )
 
-(define_insn "extv"
+(define_expand "extv"
+  [(set (match_operand 0 "s_register_operand" "")
+	(sign_extract (match_operand 1 "nonimmediate_operand" "")
+		      (match_operand 2 "const_int_operand" "")
+		      (match_operand 3 "const_int_operand" "")))]
+  "arm_arch_thumb2"
+{
+  HOST_WIDE_INT width = INTVAL (operands[2]);
+  HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+  if (unaligned_access && MEM_P (operands[1]) && (width == 16 || width == 32)
+      && (bitpos % BITS_PER_UNIT) == 0)
+    {
+      rtx base_addr;
+      
+      if (width == 32)
+        {
+	  base_addr = adjust_address (operands[1], SImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+        }
+      else
+        {
+	  rtx dest = operands[0];
+	  rtx tmp = gen_reg_rtx (SImode);
+	  
+	  /* We may get a paradoxical subreg here.  Strip it off.  */
+	  if (GET_CODE (dest) == SUBREG
+	      && GET_MODE (dest) == SImode
+	      && GET_MODE (SUBREG_REG (dest)) == HImode)
+	    dest = SUBREG_REG (dest);
+	  
+	  if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+	    FAIL;
+	  
+	  base_addr = adjust_address (operands[1], HImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadhis (tmp, base_addr));
+	  emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	}
+
+      DONE;
+    }
+  else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+    DONE;
+  else
+    FAIL;
+})
+
+; ARMv6+ unaligned load/store instructions (used for packed structure accesses).
+
+(define_insn "unaligned_loadsi"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "Uw,m")]
+		   UNSPEC_UNALIGNED_LOAD))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%?\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load1")])
+
+(define_insn "unaligned_loadhis"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(sign_extend:SI
+	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
+		     UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%(sh%)\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_loadhiu"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(zero_extend:SI
+	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
+		     UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%(h%)\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_storesi"
+  [(set (match_operand:SI 0 "memory_operand" "=Uw,m")
+	(unspec:SI [(match_operand:SI 1 "s_register_operand" "l,r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access && TARGET_32BIT"
+  "str%?\t%1, %0\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "unaligned_storehi"
+  [(set (match_operand:HI 0 "memory_operand" "=Uw,m")
+	(unspec:HI [(match_operand:HI 1 "s_register_operand" "l,r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access && TARGET_32BIT"
+  "str%(h%)\t%1, %0\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "*extv_reg"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
                          (match_operand:SI 2 "const_int_operand" "M")
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 7d2d84c..3ed9016 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -170,3 +170,7 @@ mfix-cortex-m3-ldrd
 Target Report Var(fix_cm3_ldrd) Init(2)
 Avoid overlapping destination and address registers on LDRD instructions
 that may trigger Cortex-M3 errata.
+
+munaligned-access
+Target Report Var(unaligned_access) Init(2)
+Enable unaligned word and halfword accesses to packed data.
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index f5b8521..53c358e 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -36,7 +36,7 @@
 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us
 ;; in ARM state: Uq
-;; in Thumb state: Uu
+;; in Thumb state: Uu, Uw
 
 
 (define_register_constraint "f" "TARGET_ARM ? FPA_REGS : NO_REGS"
@@ -341,6 +341,19 @@
 		   && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
 						   0)")))
 
+; The 16-bit post-increment LDR/STR accepted by thumb1_legitimate_address_p
+; are actually LDM/STM instructions, so cannot be used to access unaligned
+; data.
+(define_memory_constraint "Uw"
+ "@internal
+  In Thumb state an address that is valid in 16bit encoding, and that can be
+  used for unaligned accesses."
+ (and (match_code "mem")
+      (match_test "TARGET_THUMB
+		   && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
+						   0)
+		   && GET_CODE (XEXP (op, 0)) != POST_INC")))
+
 ;; We used to have constraint letters for S and R in ARM state, but
 ;; all uses of these now appear to have been removed.
 
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 7482747..a525184 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -648,7 +648,7 @@ store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
       /* On big-endian machines, we count bits from the most significant.
 	 If the bit field insn does not, we must invert.  */
 
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos = unit - bitsize - xbitpos;
 
       /* We have been counting XBITPOS within UNIT.
@@ -1456,7 +1456,7 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 
       /* On big-endian machines, we count bits from the most significant.
 	 If the bit field insn does not, we must invert.  */
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos = unit - bitsize - xbitpos;
 
       /* Now convert from counting within UNIT to counting in EXT_MODE.  */


* Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
  2011-05-09 17:50   ` Julian Brown
@ 2011-05-10 14:03     ` Julian Brown
  2011-05-26 17:06     ` Julian Brown
  1 sibling, 0 replies; 14+ messages in thread
From: Julian Brown @ 2011-05-10 14:03 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches, paul, Ramana Radhakrishnan

On Mon, 9 May 2011 18:01:12 +0100
Julian Brown <julian@codesourcery.com> wrote:

> How does this look now? (Re-testing in progress.)

Results from re-testing are fine, btw.

Cheers,

Julian


* Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
  2011-05-09 17:50   ` Julian Brown
  2011-05-10 14:03     ` Julian Brown
@ 2011-05-26 17:06     ` Julian Brown
  1 sibling, 0 replies; 14+ messages in thread
From: Julian Brown @ 2011-05-26 17:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, paul, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 1347 bytes --]

On Mon, 9 May 2011 18:01:12 +0100
Julian Brown <julian@codesourcery.com> wrote:

> How does this look now? (Re-testing in progress.)

The previously-posted version contained a bug in the "extv" expander,
which became apparent only when testing together with my fixed-point
support patch. This is a corrected version.

Re-tested (together with the followup unaligned-memcpy patch and
fixed-point patch series), with no regressions in both big &
little-endian mode, cross to ARM EABI.

OK to apply this version?

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/arm.c (arm_option_override): Add unaligned_access
    support.
    (arm_file_start): Emit attribute for unaligned access as
    appropriate.
    * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
    (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
    (insv, extzv): Add unaligned-access support.
    (extv): Change to expander. Likewise.
    (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
    (unaligned_storesi, unaligned_storehi): New.
    (*extv_reg): New (previous extv implementation).
    * config/arm/arm.opt (munaligned_access): Add option.
    * config/arm/constraints.md (Uw): New constraint.
    * expmed.c (store_bit_field_1): Don't tweak bitfield numbering for
    memory locations if BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN.
    (extract_bit_field_1): Likewise.

[-- Attachment #2: arm-unaligned-support-fsf-3.diff --]
[-- Type: text/x-patch, Size: 13292 bytes --]

commit d66c777aa23e6df0d68265767d4ae9f370ffb761
Author: Julian Brown <julian@henry8.codesourcery.com>
Date:   Tue May 24 10:01:48 2011 -0700

    Unaligned support for packed structs

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 47c7a3a..177c9bc 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1674,6 +1674,28 @@ arm_option_override (void)
 	fix_cm3_ldrd = 0;
     }
 
+  /* Enable -munaligned-access by default for
+     - all ARMv6 architecture-based processors
+     - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
+
+     Disable -munaligned-access by default for
+     - all pre-ARMv6 architecture-based processors
+     - ARMv6-M architecture-based processors.  */
+
+  if (unaligned_access == 2)
+    {
+      if (arm_arch6 && (arm_arch_notm || arm_arch7))
+	unaligned_access = 1;
+      else
+	unaligned_access = 0;
+    }
+  else if (unaligned_access == 1
+	   && !(arm_arch6 && (arm_arch_notm || arm_arch7)))
+    {
+      warning (0, "target CPU does not support unaligned accesses");
+      unaligned_access = 0;
+    }
+
   if (TARGET_THUMB1 && flag_schedule_insns)
     {
       /* Don't warn since it's on by default in -O2.  */
@@ -21555,6 +21577,10 @@ arm_file_start (void)
 	val = 6;
       asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val);
 
+      /* Tag_CPU_unaligned_access.  */
+      asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n",
+		   unaligned_access);
+
       /* Tag_ABI_FP_16bit_format.  */
       if (arm_fp16_format)
 	asm_fprintf (asm_out_file, "\t.eabi_attribute 38, %d\n",
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index fad82f0..12fd7ca 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -104,6 +104,10 @@
   UNSPEC_SYMBOL_OFFSET  ; The offset of the start of the symbol from
                         ; another symbolic address.
   UNSPEC_MEMORY_BARRIER ; Represent a memory barrier.
+  UNSPEC_UNALIGNED_LOAD	; Used to represent ldr/ldrh instructions that access
+			; unaligned locations, on architectures which support
+			; that.
+  UNSPEC_UNALIGNED_STORE ; Same for str/strh.
 ])
 
 ;; UNSPEC_VOLATILE Usage:
@@ -2393,7 +2397,7 @@
 ;;; this insv pattern, so this pattern needs to be reevalutated.
 
 (define_expand "insv"
-  [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "")
+  [(set (zero_extract:SI (match_operand:SI 0 "nonimmediate_operand" "")
                          (match_operand:SI 1 "general_operand" "")
                          (match_operand:SI 2 "general_operand" ""))
         (match_operand:SI 3 "reg_or_int_operand" ""))]
@@ -2407,35 +2411,66 @@
 
     if (arm_arch_thumb2)
       {
-	bool use_bfi = TRUE;
-
-	if (GET_CODE (operands[3]) == CONST_INT)
+        if (unaligned_access && MEM_P (operands[0])
+	    && s_register_operand (operands[3], GET_MODE (operands[3]))
+	    && (width == 16 || width == 32) && (start_bit % BITS_PER_UNIT) == 0)
 	  {
-	    HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
-
-	    if (val == 0)
+	    rtx base_addr;
+	    
+	    if (width == 32)
 	      {
-		emit_insn (gen_insv_zero (operands[0], operands[1],
-					  operands[2]));
-		DONE;
+	        base_addr = adjust_address (operands[0], SImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_storesi (base_addr, operands[3]));
 	      }
+	    else
+	      {
+	        rtx tmp = gen_reg_rtx (HImode);
 
-	    /* See if the set can be done with a single orr instruction.  */
-	    if (val == mask && const_ok_for_arm (val << start_bit))
-	      use_bfi = FALSE;
+	        base_addr = adjust_address (operands[0], HImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_move_insn (tmp, gen_lowpart (HImode, operands[3]));
+		emit_insn (gen_unaligned_storehi (base_addr, tmp));
+	      }
+	    DONE;
 	  }
-	  
-	if (use_bfi)
+	else if (s_register_operand (operands[0], GET_MODE (operands[0])))
 	  {
-	    if (GET_CODE (operands[3]) != REG)
-	      operands[3] = force_reg (SImode, operands[3]);
+	    bool use_bfi = TRUE;
 
-	    emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
-				    operands[3]));
-	    DONE;
+	    if (GET_CODE (operands[3]) == CONST_INT)
+	      {
+		HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
+
+		if (val == 0)
+		  {
+		    emit_insn (gen_insv_zero (operands[0], operands[1],
+					      operands[2]));
+		    DONE;
+		  }
+
+		/* See if the set can be done with a single orr instruction.  */
+		if (val == mask && const_ok_for_arm (val << start_bit))
+		  use_bfi = FALSE;
+	      }
+
+	    if (use_bfi)
+	      {
+		if (GET_CODE (operands[3]) != REG)
+		  operands[3] = force_reg (SImode, operands[3]);
+
+		emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
+					operands[3]));
+		DONE;
+	      }
 	  }
+	else
+	  FAIL;
       }
 
+    if (!s_register_operand (operands[0], GET_MODE (operands[0])))
+      FAIL;
+
     target = copy_rtx (operands[0]);
     /* Avoid using a subreg as a subtarget, and avoid writing a paradoxical 
        subreg as the final target.  */
@@ -3628,7 +3663,7 @@
 
 (define_expand "extzv"
   [(set (match_dup 4)
-	(ashift:SI (match_operand:SI   1 "register_operand" "")
+	(ashift:SI (match_operand:SI   1 "nonimmediate_operand" "")
 		   (match_operand:SI   2 "const_int_operand" "")))
    (set (match_operand:SI              0 "register_operand" "")
 	(lshiftrt:SI (match_dup 4)
@@ -3641,10 +3676,53 @@
     
     if (arm_arch_thumb2)
       {
-	emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
-				 operands[3]));
-	DONE;
+	HOST_WIDE_INT width = INTVAL (operands[2]);
+	HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+	if (unaligned_access && MEM_P (operands[1])
+	    && (width == 16 || width == 32) && (bitpos % BITS_PER_UNIT) == 0)
+	  {
+	    rtx base_addr;
+
+	    if (width == 32)
+              {
+		base_addr = adjust_address (operands[1], SImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+              }
+	    else
+              {
+		rtx dest = operands[0];
+		rtx tmp = gen_reg_rtx (SImode);
+
+		/* We may get a paradoxical subreg here.  Strip it off.  */
+		if (GET_CODE (dest) == SUBREG
+		    && GET_MODE (dest) == SImode
+		    && GET_MODE (SUBREG_REG (dest)) == HImode)
+		  dest = SUBREG_REG (dest);
+
+		if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+		  FAIL;
+
+		base_addr = adjust_address (operands[1], HImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadhiu (tmp, base_addr));
+		emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	      }
+	    DONE;
+	  }
+	else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+	  {
+	    emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
+				     operands[3]));
+	    DONE;
+	  }
+	else
+	  FAIL;
       }
+    
+    if (!s_register_operand (operands[1], GET_MODE (operands[1])))
+      FAIL;
 
     operands[3] = GEN_INT (rshift);
     
@@ -3659,7 +3737,113 @@
   }"
 )
 
-(define_insn "extv"
+(define_expand "extv"
+  [(set (match_operand:SI 0 "s_register_operand" "")
+	(sign_extract:SI (match_operand:SI 1 "nonimmediate_operand" "")
+			 (match_operand:SI 2 "const_int_operand" "")
+			 (match_operand:SI 3 "const_int_operand" "")))]
+  "arm_arch_thumb2"
+{
+  HOST_WIDE_INT width = INTVAL (operands[2]);
+  HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+  if (unaligned_access && MEM_P (operands[1]) && (width == 16 || width == 32)
+      && (bitpos % BITS_PER_UNIT) == 0)
+    {
+      rtx base_addr;
+      
+      if (width == 32)
+        {
+	  base_addr = adjust_address (operands[1], SImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+        }
+      else
+        {
+	  rtx dest = operands[0];
+	  rtx tmp = gen_reg_rtx (SImode);
+	  
+	  /* We may get a paradoxical subreg here.  Strip it off.  */
+	  if (GET_CODE (dest) == SUBREG
+	      && GET_MODE (dest) == SImode
+	      && GET_MODE (SUBREG_REG (dest)) == HImode)
+	    dest = SUBREG_REG (dest);
+	  
+	  if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+	    FAIL;
+	  
+	  base_addr = adjust_address (operands[1], HImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadhis (tmp, base_addr));
+	  emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	}
+
+      DONE;
+    }
+  else if (!s_register_operand (operands[1], GET_MODE (operands[1])))
+    FAIL;
+})
+
+; ARMv6+ unaligned load/store instructions (used for packed structure accesses).
+
+(define_insn "unaligned_loadsi"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "Uw,m")]
+		   UNSPEC_UNALIGNED_LOAD))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%?\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load1")])
+
+(define_insn "unaligned_loadhis"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(sign_extend:SI
+	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
+		     UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%(sh%)\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_loadhiu"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(zero_extend:SI
+	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
+		     UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%(h%)\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_storesi"
+  [(set (match_operand:SI 0 "memory_operand" "=Uw,m")
+	(unspec:SI [(match_operand:SI 1 "s_register_operand" "l,r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access && TARGET_32BIT"
+  "str%?\t%1, %0\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "unaligned_storehi"
+  [(set (match_operand:HI 0 "memory_operand" "=Uw,m")
+	(unspec:HI [(match_operand:HI 1 "s_register_operand" "l,r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access && TARGET_32BIT"
+  "str%(h%)\t%1, %0\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "*extv_reg"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
                          (match_operand:SI 2 "const_int_operand" "M")
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index c45bc77..bba4900 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -235,3 +235,7 @@ mfix-cortex-m3-ldrd
 Target Report Var(fix_cm3_ldrd) Init(2)
 Avoid overlapping destination and address registers on LDRD instructions
 that may trigger Cortex-M3 errata.
+
+munaligned-access
+Target Report Var(unaligned_access) Init(2)
+Enable unaligned word and halfword accesses to packed data.
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index f5b8521..53c358e 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -36,7 +36,7 @@
 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us
 ;; in ARM state: Uq
-;; in Thumb state: Uu
+;; in Thumb state: Uu, Uw
 
 
 (define_register_constraint "f" "TARGET_ARM ? FPA_REGS : NO_REGS"
@@ -341,6 +341,19 @@
 		   && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
 						   0)")))
 
+; The 16-bit post-increment LDR/STR accepted by thumb1_legitimate_address_p
+; are actually LDM/STM instructions, so cannot be used to access unaligned
+; data.
+(define_memory_constraint "Uw"
+ "@internal
+  In Thumb state an address that is valid in 16bit encoding, and that can be
+  used for unaligned accesses."
+ (and (match_code "mem")
+      (match_test "TARGET_THUMB
+		   && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
+						   0)
+		   && GET_CODE (XEXP (op, 0)) != POST_INC")))
+
 ;; We used to have constraint letters for S and R in ARM state, but
 ;; all uses of these now appear to have been removed.
 
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 5527c1e..dea11b8 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -648,7 +648,7 @@ store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
       /* On big-endian machines, we count bits from the most significant.
 	 If the bit field insn does not, we must invert.  */
 
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos = unit - bitsize - xbitpos;
 
       /* We have been counting XBITPOS within UNIT.
@@ -1456,7 +1456,7 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 
       /* On big-endian machines, we count bits from the most significant.
 	 If the bit field insn does not, we must invert.  */
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos = unit - bitsize - xbitpos;
 
       /* Now convert from counting within UNIT to counting in EXT_MODE.  */


* Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
  2011-05-06 12:45 [PATCH, ARM] Unaligned accesses for packed structures [1/2] Julian Brown
  2011-05-06 13:07 ` Richard Earnshaw
@ 2011-07-04 11:44 ` Ulrich Weigand
  2011-08-25 16:29   ` Julian Brown
  2011-09-22  6:47 ` Ye Joey
  2 siblings, 1 reply; 14+ messages in thread
From: Ulrich Weigand @ 2011-07-04 11:44 UTC (permalink / raw)
  To: Julian Brown; +Cc: gcc-patches, paul, rearnsha, Ramana Radhakrishnan

Julian Brown wrote:

> The most awkward change in the patch is to generic code (expmed.c,
> {store,extract}_bit_field_1): in big-endian mode, the existing behaviour
> (when inserting/extracting a bitfield to a memory location) is
> definitely bogus: "unit" is set to BITS_PER_UNIT for memory locations,
> and if bitsize (the size of the field to insert/extract) is greater than
> BITS_PER_UNIT (which isn't unusual at all), xbitpos becomes negative.
> That can't possibly be intentional; I can only assume that this code
> path is not exercised for machines which have memory alternatives for
> bitfield insert/extract, and BITS_BIG_ENDIAN of 0 in BYTES_BIG_ENDIAN
> mode.
[snip]
> @@ -648,7 +648,7 @@ store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
>        /* On big-endian machines, we count bits from the most significant.
>  	 If the bit field insn does not, we must invert.  */
>  
> -      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
> +      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN && !MEM_P (xop0))
>  	xbitpos = unit - bitsize - xbitpos;

I agree that the current code cannot possibly be correct.  However, just
disabling the BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN renumbering *completely*
seems wrong to me as well.  

According to the docs, the meaning of the bit position passed to the
extv/insv expanders is determined by BITS_BIG_ENDIAN, for both register
and memory operands.  Therefore, if BITS_BIG_ENDIAN differs from
BYTES_BIG_ENDIAN, we should need a correction for memory operands as
well.  However, this correction needs to be relative to the size of
the access (i.e. the operand to the extv/insv), not just BITS_PER_UNIT.

From looking at the sources, the simplest way to implement that might
be to swap the order of the two corrections, that is, change this:


      /* On big-endian machines, we count bits from the most significant.
         If the bit field insn does not, we must invert.  */

      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
        xbitpos = unit - bitsize - xbitpos;

      /* We have been counting XBITPOS within UNIT.
         Count instead within the size of the register.  */
      if (BITS_BIG_ENDIAN && !MEM_P (xop0))
        xbitpos += GET_MODE_BITSIZE (op_mode) - unit;

      unit = GET_MODE_BITSIZE (op_mode);


to look instead like:

      /* We have been counting XBITPOS within UNIT.
         Count instead within the size of the register.  */
      if (BYTES_BIG_ENDIAN && !MEM_P (xop0))
        xbitpos += GET_MODE_BITSIZE (op_mode) - unit;

      unit = GET_MODE_BITSIZE (op_mode);

      /* On big-endian machines, we count bits from the most significant.
         If the bit field insn does not, we must invert.  */

      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
        xbitpos = unit - bitsize - xbitpos;


(Note that the condition in the first if must then check BYTES_BIG_ENDIAN
instead of BITS_BIG_ENDIAN.)   This change results in unchanged behaviour
for register operands in all cases, and memory operands if BITS_BIG_ENDIAN
== BYTES_BIG_ENDIAN.  For the problematic case of memory operands with
BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN it should result in the appropriate
correction.
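
(For instance, with an SImode access, so GET_MODE_BITSIZE (op_mode) is
32: a 16-bit field at bit position 0 in a memory operand would now end
up with xbitpos = 32 - 16 - 0 = 16, i.e. reflected within the whole
access, where the old code computed 8 - 16 - 0 = -8.)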

Note that with that change, the new code your patch introduces to the
ARM back-end will also need to change.  You currently handle bitpos
like this:

          base_addr = adjust_address (operands[1], HImode,
                                      bitpos / BITS_PER_UNIT);

This implicitly assumes that bitpos counts according to BYTES_BIG_ENDIAN,
not BITS_BIG_ENDIAN -- which exactly cancels out the common code behaviour
introduced by your patch ...

Thoughts?  Am I overlooking something here?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


* Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
  2011-07-04 11:44 ` Ulrich Weigand
@ 2011-08-25 16:29   ` Julian Brown
  2011-08-25 18:19     ` Julian Brown
  2011-09-01 14:46     ` Ulrich Weigand
  0 siblings, 2 replies; 14+ messages in thread
From: Julian Brown @ 2011-08-25 16:29 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc-patches, paul, rearnsha, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 3822 bytes --]

On Mon, 4 Jul 2011 13:43:31 +0200 (CEST)
"Ulrich Weigand" <uweigand@de.ibm.com> wrote:

> Julian Brown wrote:
> 
> > The most awkward change in the patch is to generic code (expmed.c,
> > {store,extract}_bit_field_1): in big-endian mode, the existing
> > behaviour (when inserting/extracting a bitfield to a memory
> > location) is definitely bogus: "unit" is set to BITS_PER_UNIT for
> > memory locations, and if bitsize (the size of the field to
> > insert/extract) is greater than BITS_PER_UNIT (which isn't unusual
> > at all), xbitpos becomes negative. That can't possibly be
> > intentional; I can only assume that this code path is not exercised
> > for machines which have memory alternatives for bitfield
> > insert/extract, and BITS_BIG_ENDIAN of 0 in BYTES_BIG_ENDIAN mode.

> I agree that the current code cannot possibly be correct.  However,
> just disabling the BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN renumbering
> *completely* seems wrong to me as well.  
> 
> According to the docs, the meaning of the bit position passed to the
> extv/insv expanders is determined by BITS_BIG_ENDIAN, both in the case
> of register and memory operands.  Therefore, if BITS_BIG_ENDIAN
> differs from BYTES_BIG_ENDIAN, we should need a correction for memory
> operands as well.  However, this correction needs to be relative to
> the size of the access (i.e. the operand of the extv/insv), not just
> BITS_PER_UNIT.

> Note that with that change, the new code your patch introduces to the
> ARM back-end will also need to change.  You currently handle bitpos
> like this:
> 
>           base_addr = adjust_address (operands[1], HImode,
>                                       bitpos / BITS_PER_UNIT);
> 
> This implicitly assumes that bitpos counts according to
> BYTES_BIG_ENDIAN, not BITS_BIG_ENDIAN -- which exactly cancels out
> the common code behaviour introduced by your patch ...

I've updated the patch to work with current mainline, and implemented
your suggestion along with the change of the interpretation of bitpos in
the insv/extv/extzv expanders in arm.md. It seems to work fine
(testing still in progress), but I'm a bit concerned that the semantics
of bit-positioning for memory operands when BYTES_BIG_ENDIAN
&& !BITS_BIG_ENDIAN are now strange to the point of perversity.

The problem is, if we're using little-endian bit numbering for memory
locations in big-endian-bytes mode, we need to define an origin from
which to count "backwards". For the current implementation, this
will now be (I believe) one word beyond the base address of the access
in question, which is IMO slightly bizarre, to say the least. But I
can't think of any easier way forward than either this patch, or the
previous one which disabled bit-endian switching entirely for memory
operands in this case.
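
To illustrate with a worked example (hypothetical numbers, not code
from the patch): for BYTES_BIG_ENDIAN && !BITS_BIG_ENDIAN, a 32-bit
access and a 16-bit field starting two bytes past the base address,
the common-code renumbering and the arm.md correction are the same
formula applied twice, and the little-endian bit count is effectively
taken from one word past the base:

#include <stdio.h>

int
main (void)
{
  int access_bits = 32, width = 16;
  int mem_bitpos = 16;         /* bits from the base address, memory order */

  /* Common code (reordered correction): renumber little-endian within
     the 32-bit container, i.e. counting backwards from one word past
     the base address.  */
  int le_bitpos = access_bits - width - mem_bitpos;    /* 0 */

  /* arm.md expander: the same formula again recovers a byte offset
     from the base address.  */
  int md_bitpos = access_bits - width - le_bitpos;     /* 16 */

  printf ("byte offset = %d\n", md_bitpos / 8);        /* prints 2 */
  return 0;
}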

So, OK to apply this version, assuming testing comes out OK? (And the
followup patch [2/2], which remains unchanged?)

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/arm.c (arm_override_options): Add unaligned_access
    support.
    (arm_file_start): Emit attribute for unaligned access as
    appropriate.
    * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
    (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
    (insv, extzv): Add unaligned-access support.
    (extv): Change to expander. Likewise.
    (extzv_t1, extv_regsi): Add helpers.
    (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
    (unaligned_storesi, unaligned_storehi): New.
    (*extv_reg): New (previous extv implementation).
    * config/arm/arm.opt (munaligned_access): Add option.
    * config/arm/constraints.md (Uw): New constraint.
    * expmed.c (store_bit_field_1): Adjust bitfield numbering according
    to size of access, not size of unit, when BITS_BIG_ENDIAN !=
    BYTES_BIG_ENDIAN.
    (extract_bit_field_1): Likewise.

[-- Attachment #2: arm-unaligned-support-fsf-9.diff --]
[-- Type: text/x-patch, Size: 16540 bytes --]

commit 7bf9c1c0806ad1ae75e96635cda55fff4c40e7ae
Author: Julian Brown <julian@henry8.codesourcery.com>
Date:   Tue Aug 23 05:46:22 2011 -0700

    Unaligned support for packed structs

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 2353704..dda2718 100644
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3162b30..cc1eb80 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1905,6 +1905,28 @@ arm_option_override (void)
 	fix_cm3_ldrd = 0;
     }
 
+  /* Enable -munaligned-access by default for
+     - all ARMv6 architecture-based processors
+     - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
+
+     Disable -munaligned-access by default for
+     - all pre-ARMv6 architecture-based processors
+     - ARMv6-M architecture-based processors.  */
+
+  if (unaligned_access == 2)
+    {
+      if (arm_arch6 && (arm_arch_notm || arm_arch7))
+	unaligned_access = 1;
+      else
+	unaligned_access = 0;
+    }
+  else if (unaligned_access == 1
+	   && !(arm_arch6 && (arm_arch_notm || arm_arch7)))
+    {
+      warning (0, "target CPU does not support unaligned accesses");
+      unaligned_access = 0;
+    }
+
   if (TARGET_THUMB1 && flag_schedule_insns)
     {
       /* Don't warn since it's on by default in -O2.  */
@@ -22145,6 +22167,10 @@ arm_file_start (void)
 	val = 6;
       asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val);
 
+      /* Tag_CPU_unaligned_access.  */
+      asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n",
+		   unaligned_access);
+
       /* Tag_ABI_FP_16bit_format.  */
       if (arm_fp16_format)
 	asm_fprintf (asm_out_file, "\t.eabi_attribute 38, %d\n",
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0f23400..0ea0f7f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -103,6 +103,10 @@
   UNSPEC_SYMBOL_OFFSET  ; The offset of the start of the symbol from
                         ; another symbolic address.
   UNSPEC_MEMORY_BARRIER ; Represent a memory barrier.
+  UNSPEC_UNALIGNED_LOAD	; Used to represent ldr/ldrh instructions that access
+			; unaligned locations, on architectures which support
+			; that.
+  UNSPEC_UNALIGNED_STORE ; Same for str/strh.
 ])
 
 ;; UNSPEC_VOLATILE Usage:
@@ -2468,10 +2472,10 @@
 ;;; this insv pattern, so this pattern needs to be reevalutated.
 
 (define_expand "insv"
-  [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "")
-                         (match_operand:SI 1 "general_operand" "")
-                         (match_operand:SI 2 "general_operand" ""))
-        (match_operand:SI 3 "reg_or_int_operand" ""))]
+  [(set (zero_extract (match_operand 0 "nonimmediate_operand" "")
+                      (match_operand 1 "general_operand" "")
+                      (match_operand 2 "general_operand" ""))
+        (match_operand 3 "reg_or_int_operand" ""))]
   "TARGET_ARM || arm_arch_thumb2"
   "
   {
@@ -2482,35 +2486,70 @@
 
     if (arm_arch_thumb2)
       {
-	bool use_bfi = TRUE;
-
-	if (GET_CODE (operands[3]) == CONST_INT)
+        if (unaligned_access && MEM_P (operands[0])
+	    && s_register_operand (operands[3], GET_MODE (operands[3]))
+	    && (width == 16 || width == 32) && (start_bit % BITS_PER_UNIT) == 0)
 	  {
-	    HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
+	    rtx base_addr;
+
+	    if (BYTES_BIG_ENDIAN)
+	      start_bit = GET_MODE_BITSIZE (GET_MODE (operands[3])) - width
+			  - start_bit;
 
-	    if (val == 0)
+	    if (width == 32)
 	      {
-		emit_insn (gen_insv_zero (operands[0], operands[1],
-					  operands[2]));
-		DONE;
+	        base_addr = adjust_address (operands[0], SImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_storesi (base_addr, operands[3]));
 	      }
+	    else
+	      {
+	        rtx tmp = gen_reg_rtx (HImode);
 
-	    /* See if the set can be done with a single orr instruction.  */
-	    if (val == mask && const_ok_for_arm (val << start_bit))
-	      use_bfi = FALSE;
+	        base_addr = adjust_address (operands[0], HImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_move_insn (tmp, gen_lowpart (HImode, operands[3]));
+		emit_insn (gen_unaligned_storehi (base_addr, tmp));
+	      }
+	    DONE;
 	  }
-	  
-	if (use_bfi)
+	else if (s_register_operand (operands[0], GET_MODE (operands[0])))
 	  {
-	    if (GET_CODE (operands[3]) != REG)
-	      operands[3] = force_reg (SImode, operands[3]);
+	    bool use_bfi = TRUE;
 
-	    emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
-				    operands[3]));
-	    DONE;
+	    if (GET_CODE (operands[3]) == CONST_INT)
+	      {
+		HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
+
+		if (val == 0)
+		  {
+		    emit_insn (gen_insv_zero (operands[0], operands[1],
+					      operands[2]));
+		    DONE;
+		  }
+
+		/* See if the set can be done with a single orr instruction.  */
+		if (val == mask && const_ok_for_arm (val << start_bit))
+		  use_bfi = FALSE;
+	      }
+
+	    if (use_bfi)
+	      {
+		if (GET_CODE (operands[3]) != REG)
+		  operands[3] = force_reg (SImode, operands[3]);
+
+		emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
+					operands[3]));
+		DONE;
+	      }
 	  }
+	else
+	  FAIL;
       }
 
+    if (!s_register_operand (operands[0], GET_MODE (operands[0])))
+      FAIL;
+
     target = copy_rtx (operands[0]);
     /* Avoid using a subreg as a subtarget, and avoid writing a paradoxical 
        subreg as the final target.  */
@@ -3702,12 +3741,10 @@
 ;; to reduce register pressure later on.
 
 (define_expand "extzv"
-  [(set (match_dup 4)
-	(ashift:SI (match_operand:SI   1 "register_operand" "")
-		   (match_operand:SI   2 "const_int_operand" "")))
-   (set (match_operand:SI              0 "register_operand" "")
-	(lshiftrt:SI (match_dup 4)
-		     (match_operand:SI 3 "const_int_operand" "")))]
+  [(set (match_operand 0 "s_register_operand" "")
+	(zero_extract (match_operand 1 "nonimmediate_operand" "")
+		      (match_operand 2 "const_int_operand" "")
+		      (match_operand 3 "const_int_operand" "")))]
   "TARGET_THUMB1 || arm_arch_thumb2"
   "
   {
@@ -3716,10 +3753,57 @@
     
     if (arm_arch_thumb2)
       {
-	emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
-				 operands[3]));
-	DONE;
+	HOST_WIDE_INT width = INTVAL (operands[2]);
+	HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+	if (unaligned_access && MEM_P (operands[1])
+	    && (width == 16 || width == 32) && (bitpos % BITS_PER_UNIT) == 0)
+	  {
+	    rtx base_addr;
+
+	    if (BYTES_BIG_ENDIAN)
+	      bitpos = GET_MODE_BITSIZE (GET_MODE (operands[0])) - width
+		       - bitpos;
+
+	    if (width == 32)
+              {
+		base_addr = adjust_address (operands[1], SImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+              }
+	    else
+              {
+		rtx dest = operands[0];
+		rtx tmp = gen_reg_rtx (SImode);
+
+		/* We may get a paradoxical subreg here.  Strip it off.  */
+		if (GET_CODE (dest) == SUBREG
+		    && GET_MODE (dest) == SImode
+		    && GET_MODE (SUBREG_REG (dest)) == HImode)
+		  dest = SUBREG_REG (dest);
+
+		if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+		  FAIL;
+
+		base_addr = adjust_address (operands[1], HImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadhiu (tmp, base_addr));
+		emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	      }
+	    DONE;
+	  }
+	else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+	  {
+	    emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
+				     operands[3]));
+	    DONE;
+	  }
+	else
+	  FAIL;
       }
+    
+    if (!s_register_operand (operands[1], GET_MODE (operands[1])))
+      FAIL;
 
     operands[3] = GEN_INT (rshift);
     
@@ -3729,12 +3813,154 @@
         DONE;
       }
       
-    operands[2] = GEN_INT (lshift);
-    operands[4] = gen_reg_rtx (SImode);
+    emit_insn (gen_extzv_t1 (operands[0], operands[1], GEN_INT (lshift),
+			     operands[3], gen_reg_rtx (SImode)));
+    DONE;
   }"
 )
 
-(define_insn "extv"
+;; Helper for extzv, for the Thumb-1 register-shifts case.
+
+(define_expand "extzv_t1"
+  [(set (match_operand:SI 4 "s_register_operand" "")
+	(ashift:SI (match_operand:SI 1 "nonimmediate_operand" "")
+		   (match_operand:SI 2 "const_int_operand" "")))
+   (set (match_operand:SI 0 "s_register_operand" "")
+	(lshiftrt:SI (match_dup 4)
+		     (match_operand:SI 3 "const_int_operand" "")))]
+  "TARGET_THUMB1"
+  "")
+
+(define_expand "extv"
+  [(set (match_operand 0 "s_register_operand" "")
+	(sign_extract (match_operand 1 "nonimmediate_operand" "")
+		      (match_operand 2 "const_int_operand" "")
+		      (match_operand 3 "const_int_operand" "")))]
+  "arm_arch_thumb2"
+{
+  HOST_WIDE_INT width = INTVAL (operands[2]);
+  HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+  if (unaligned_access && MEM_P (operands[1]) && (width == 16 || width == 32)
+      && (bitpos % BITS_PER_UNIT)  == 0)
+    {
+      rtx base_addr;
+      
+      if (BYTES_BIG_ENDIAN)
+	bitpos = GET_MODE_BITSIZE (GET_MODE (operands[0])) - width - bitpos;
+      
+      if (width == 32)
+        {
+	  base_addr = adjust_address (operands[1], SImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+        }
+      else
+        {
+	  rtx dest = operands[0];
+	  rtx tmp = gen_reg_rtx (SImode);
+	  
+	  /* We may get a paradoxical subreg here.  Strip it off.  */
+	  if (GET_CODE (dest) == SUBREG
+	      && GET_MODE (dest) == SImode
+	      && GET_MODE (SUBREG_REG (dest)) == HImode)
+	    dest = SUBREG_REG (dest);
+	  
+	  if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+	    FAIL;
+	  
+	  base_addr = adjust_address (operands[1], HImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadhis (tmp, base_addr));
+	  emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	}
+
+      DONE;
+    }
+  else if (!s_register_operand (operands[1], GET_MODE (operands[1])))
+    FAIL;
+  else if (GET_MODE (operands[0]) == SImode
+	   && GET_MODE (operands[1]) == SImode)
+    {
+      emit_insn (gen_extv_regsi (operands[0], operands[1], operands[2],
+				 operands[3]));
+      DONE;
+    }
+
+  FAIL;
+})
+
+; Helper to expand register forms of extv with the proper modes.
+
+(define_expand "extv_regsi"
+  [(set (match_operand:SI 0 "s_register_operand" "")
+	(sign_extract:SI (match_operand:SI 1 "s_register_operand" "")
+			 (match_operand 2 "const_int_operand" "")
+			 (match_operand 3 "const_int_operand" "")))]
+  ""
+{
+})
+
+; ARMv6+ unaligned load/store instructions (used for packed structure accesses).
+
+(define_insn "unaligned_loadsi"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "Uw,m")]
+		   UNSPEC_UNALIGNED_LOAD))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%?\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load1")])
+
+(define_insn "unaligned_loadhis"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(sign_extend:SI
+	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
+		     UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%(sh%)\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_loadhiu"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(zero_extend:SI
+	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
+		     UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%(h%)\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_storesi"
+  [(set (match_operand:SI 0 "memory_operand" "=Uw,m")
+	(unspec:SI [(match_operand:SI 1 "s_register_operand" "l,r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access && TARGET_32BIT"
+  "str%?\t%1, %0\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "unaligned_storehi"
+  [(set (match_operand:HI 0 "memory_operand" "=Uw,m")
+	(unspec:HI [(match_operand:HI 1 "s_register_operand" "l,r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access && TARGET_32BIT"
+  "str%(h%)\t%1, %0\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "*extv_reg"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
                          (match_operand:SI 2 "const_int_operand" "M")
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index be5fd3c..98eace3 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -249,3 +249,7 @@ mfix-cortex-m3-ldrd
 Target Report Var(fix_cm3_ldrd) Init(2)
 Avoid overlapping destination and address registers on LDRD instructions
 that may trigger Cortex-M3 errata.
+
+munaligned-access
+Target Report Var(unaligned_access) Init(2)
+Enable unaligned word and halfword accesses to packed data.
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index f5b8521..53c358e 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -36,7 +36,7 @@
 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us
 ;; in ARM state: Uq
-;; in Thumb state: Uu
+;; in Thumb state: Uu, Uw
 
 
 (define_register_constraint "f" "TARGET_ARM ? FPA_REGS : NO_REGS"
@@ -341,6 +341,19 @@
 		   && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
 						   0)")))
 
+; The 16-bit post-increment LDR/STR accepted by thumb1_legitimate_address_p
+; are actually LDM/STM instructions, so cannot be used to access unaligned
+; data.
+(define_memory_constraint "Uw"
+ "@internal
+  In Thumb state an address that is valid in 16bit encoding, and that can be
+  used for unaligned accesses."
+ (and (match_code "mem")
+      (match_test "TARGET_THUMB
+		   && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
+						   0)
+		   && GET_CODE (XEXP (op, 0)) != POST_INC")))
+
 ;; We used to have constraint letters for S and R in ARM state, but
 ;; all uses of these now appear to have been removed.
 
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 0047cd0..2a663b3 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -659,19 +659,19 @@ store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 	  copy_back = true;
 	}
 
-      /* On big-endian machines, we count bits from the most significant.
-	 If the bit field insn does not, we must invert.  */
-
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
-	xbitpos = unit - bitsize - xbitpos;
-
       /* We have been counting XBITPOS within UNIT.
 	 Count instead within the size of the register.  */
-      if (BITS_BIG_ENDIAN && !MEM_P (xop0))
+      if (BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos += GET_MODE_BITSIZE (op_mode) - unit;
 
       unit = GET_MODE_BITSIZE (op_mode);
 
+      /* On big-endian machines, we count bits from the most significant.
+	 If the bit field insn does not, we must invert.  */
+
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+	xbitpos = unit - bitsize - xbitpos;
+
       /* Convert VALUE to op_mode (which insv insn wants) in VALUE1.  */
       value1 = value;
       if (GET_MODE (value) != op_mode)
@@ -1528,17 +1528,17 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 	/* Get ref to first byte containing part of the field.  */
 	xop0 = adjust_address (xop0, byte_mode, xoffset);
 
-      /* On big-endian machines, we count bits from the most significant.
-	 If the bit field insn does not, we must invert.  */
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
-	xbitpos = unit - bitsize - xbitpos;
-
       /* Now convert from counting within UNIT to counting in EXT_MODE.  */
-      if (BITS_BIG_ENDIAN && !MEM_P (xop0))
+      if (BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos += GET_MODE_BITSIZE (ext_mode) - unit;
 
       unit = GET_MODE_BITSIZE (ext_mode);
 
+      /* On big-endian machines, we count bits from the most significant.
+	 If the bit field insn does not, we must invert.  */
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+	xbitpos = unit - bitsize - xbitpos;
+
       if (xtarget == 0)
 	xtarget = xspec_target = gen_reg_rtx (tmode);
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
  2011-08-25 16:29   ` Julian Brown
@ 2011-08-25 18:19     ` Julian Brown
  2011-08-25 22:18       ` Ramana Radhakrishnan
  2011-08-26 17:37       ` Julian Brown
  2011-09-01 14:46     ` Ulrich Weigand
  1 sibling, 2 replies; 14+ messages in thread
From: Julian Brown @ 2011-08-25 18:19 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: gcc-patches, paul, rearnsha, Ramana Radhakrishnan

On Thu, 25 Aug 2011 16:46:50 +0100
Julian Brown <julian@codesourcery.com> wrote:

> So, OK to apply this version, assuming testing comes out OK? (And the
> followup patch [2/2], which remains unchanged?)

FWIW, all tests pass, apart from gcc.target/arm/volatile-bitfields-3.c,
which regresses. The output contains:

        ldrh    r0, [r3, #2]    @ unaligned

I believe that, to conform to the ARM EABI, GCC must use an (aligned)
ldr in this case. Is that correct? If so, it looks like the
middle-end bitfield code does not take the setting of
-fstrict-volatile-bitfields into account.

Julian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
  2011-08-25 18:19     ` Julian Brown
@ 2011-08-25 22:18       ` Ramana Radhakrishnan
  2011-08-25 22:23         ` Joseph S. Myers
  2011-08-26 17:37       ` Julian Brown
  1 sibling, 1 reply; 14+ messages in thread
From: Ramana Radhakrishnan @ 2011-08-25 22:18 UTC (permalink / raw)
  To: Julian Brown; +Cc: Ulrich Weigand, gcc-patches, paul, rearnsha

On 25 August 2011 18:31, Julian Brown <julian@codesourcery.com> wrote:
> On Thu, 25 Aug 2011 16:46:50 +0100
> Julian Brown <julian@codesourcery.com> wrote:
>
>> So, OK to apply this version, assuming testing comes out OK? (And the
>> followup patch [2/2], which remains unchanged?)
>
> FWIW, all tests pass, apart from gcc.target/arm/volatile-bitfields-3.c,
> which regresses. The output contains:
>
>        ldrh    r0, [r3, #2]    @ unaligned
>
> I believe that, to conform to the ARM EABI, GCC must use an (aligned)
> ldr in this case. Is that correct?

That is correct by my reading of the ABI Spec.

The relevant section is 7.1.7.5 where it states that :

"When a volatile bitfield is read it's container must be read exactly
once using the access width appropriate to the type of the container.
"

Here the type of the container is a long, and hence the access should
be done with an ldr instruction followed by a shift, as it is today,
irrespective of whether we support unaligned accesses in this case.
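
For reference, a minimal hypothetical reduction of the kind of access
being discussed (this is not the actual
gcc.target/arm/volatile-bitfields-3.c source):

/* Reading `b' must load its whole 32-bit container with a single
   aligned ldr and then shift, rather than using ldrh on the upper
   halfword.  */
struct vbf
{
  volatile unsigned int a : 16;
  volatile unsigned int b : 16;
};

extern struct vbf v;

unsigned int
read_b (void)
{
  return v.b;    /* expected: ldr + lsr, not "ldrh r0, [r3, #2]" */
}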

cheers
Ramana

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
  2011-08-25 22:18       ` Ramana Radhakrishnan
@ 2011-08-25 22:23         ` Joseph S. Myers
  0 siblings, 0 replies; 14+ messages in thread
From: Joseph S. Myers @ 2011-08-25 22:23 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Julian Brown, Ulrich Weigand, gcc-patches, paul, rearnsha

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1609 bytes --]

On Thu, 25 Aug 2011, Ramana Radhakrishnan wrote:

> On 25 August 2011 18:31, Julian Brown <julian@codesourcery.com> wrote:
> > On Thu, 25 Aug 2011 16:46:50 +0100
> > Julian Brown <julian@codesourcery.com> wrote:
> >
> >> So, OK to apply this version, assuming testing comes out OK? (And the
> >> followup patch [2/2], which remains unchanged?)
> >
> > FWIW, all tests pass, apart from gcc.target/arm/volatile-bitfields-3.c,
> > which regresses. The output contains:
> >
> >        ldrh    r0, [r3, #2]    @ unaligned
> >
> > I believe that, to conform to the ARM EABI, GCC must use an (aligned)
> > ldr in this case. Is that correct?
> 
> That is correct by my reading of the ABI Spec.
> 
> The relevant section is 7.1.7.5 where it states that :
> 
> "When a volatile bitfield is read it's container must be read exactly
> once using the access width appropriate to the type of the container.
> "
> 
> Here the type of the container is a long and hence the access should
> be with an ldr instruction followed by a shift as it is today
> irrespective whether we support unaligned accesses in this case.

Except for packed structures, anyway (and there aren't any packed 
structures involved in this particular testcase).  The ABI doesn't cover 
packed structures and I maintain that there the target-independent GNU C 
semantics take precedence - meaning that if the compiler doesn't know that 
a single read with the relevant access width is going to be safe, it must 
use an instruction sequence it does know to be safe.
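
A hypothetical example of the packed case (not from the testsuite): the
32-bit container of `f' starts at offset 1, so a single container-width
read is not known to be safe unless the target supports (and has
enabled) unaligned accesses.

struct __attribute__((packed)) pv
{
  char c;
  volatile unsigned int f : 12;
};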

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
  2011-08-25 18:19     ` Julian Brown
  2011-08-25 22:18       ` Ramana Radhakrishnan
@ 2011-08-26 17:37       ` Julian Brown
  2011-09-08 17:35         ` Richard Earnshaw
  1 sibling, 1 reply; 14+ messages in thread
From: Julian Brown @ 2011-08-26 17:37 UTC (permalink / raw)
  To: gcc-patches; +Cc: Ulrich Weigand, paul, rearnsha, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 1872 bytes --]

On Thu, 25 Aug 2011 18:31:21 +0100
Julian Brown <julian@codesourcery.com> wrote:

> On Thu, 25 Aug 2011 16:46:50 +0100
> Julian Brown <julian@codesourcery.com> wrote:
> 
> > So, OK to apply this version, assuming testing comes out OK? (And
> > the followup patch [2/2], which remains unchanged?)
> 
> FWIW, all tests pass, apart from
> gcc.target/arm/volatile-bitfields-3.c, which regresses. The output
> contains:
> 
>         ldrh    r0, [r3, #2]    @ unaligned
> 
> I believe that, to conform to the ARM EABI, GCC must use an (aligned)
> ldr in this case. Is that correct? If so, it looks like the
> middle-end bitfield code does not take the setting of
> -fstrict-volatile-bitfields into account.

This version fixes the last issue, by adding additional checks for
volatile accesses/-fstrict-volatile-bitfields. Tests now show no
regressions.

OK to apply?

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/arm.c (arm_override_options): Add unaligned_access
    support.
    (arm_file_start): Emit attribute for unaligned access as
    appropriate.
    * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
    (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
    (insv, extzv): Add unaligned-access support.
    (extv): Change to expander. Likewise.
    (extzv_t1, extv_regsi): Add helpers.
    (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
    (unaligned_storesi, unaligned_storehi): New.
    (*extv_reg): New (previous extv implementation).
    * config/arm/arm.opt (munaligned_access): Add option.
    * config/arm/constraints.md (Uw): New constraint.
    * expmed.c (store_bit_field_1): Adjust bitfield numbering according
    to size of access, not size of unit, when BITS_BIG_ENDIAN !=
    BYTES_BIG_ENDIAN. Don't use bitfield accesses for
    volatile accesses when -fstrict-volatile-bitfields is in effect.
    (extract_bit_field_1): Likewise.

[-- Attachment #2: arm-unaligned-support-fsf-10.diff --]
[-- Type: text/x-patch, Size: 17537 bytes --]

commit 645a7c99ff91ea2841c8101fb3c76e3b1fddb2c7
Author: Julian Brown <julian@henry8.codesourcery.com>
Date:   Tue Aug 23 05:46:22 2011 -0700

    Unaligned support for packed structs

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3162b30..cc1eb80 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1905,6 +1905,28 @@ arm_option_override (void)
 	fix_cm3_ldrd = 0;
     }
 
+  /* Enable -munaligned-access by default for
+     - all ARMv6 architecture-based processors
+     - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
+
+     Disable -munaligned-access by default for
+     - all pre-ARMv6 architecture-based processors
+     - ARMv6-M architecture-based processors.  */
+
+  if (unaligned_access == 2)
+    {
+      if (arm_arch6 && (arm_arch_notm || arm_arch7))
+	unaligned_access = 1;
+      else
+	unaligned_access = 0;
+    }
+  else if (unaligned_access == 1
+	   && !(arm_arch6 && (arm_arch_notm || arm_arch7)))
+    {
+      warning (0, "target CPU does not support unaligned accesses");
+      unaligned_access = 0;
+    }
+
   if (TARGET_THUMB1 && flag_schedule_insns)
     {
       /* Don't warn since it's on by default in -O2.  */
@@ -22145,6 +22167,10 @@ arm_file_start (void)
 	val = 6;
       asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val);
 
+      /* Tag_CPU_unaligned_access.  */
+      asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n",
+		   unaligned_access);
+
       /* Tag_ABI_FP_16bit_format.  */
       if (arm_fp16_format)
 	asm_fprintf (asm_out_file, "\t.eabi_attribute 38, %d\n",
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0f23400..0ea0f7f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -103,6 +103,10 @@
   UNSPEC_SYMBOL_OFFSET  ; The offset of the start of the symbol from
                         ; another symbolic address.
   UNSPEC_MEMORY_BARRIER ; Represent a memory barrier.
+  UNSPEC_UNALIGNED_LOAD	; Used to represent ldr/ldrh instructions that access
+			; unaligned locations, on architectures which support
+			; that.
+  UNSPEC_UNALIGNED_STORE ; Same for str/strh.
 ])
 
 ;; UNSPEC_VOLATILE Usage:
@@ -2468,10 +2472,10 @@
 ;;; this insv pattern, so this pattern needs to be reevalutated.
 
 (define_expand "insv"
-  [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "")
-                         (match_operand:SI 1 "general_operand" "")
-                         (match_operand:SI 2 "general_operand" ""))
-        (match_operand:SI 3 "reg_or_int_operand" ""))]
+  [(set (zero_extract (match_operand 0 "nonimmediate_operand" "")
+                      (match_operand 1 "general_operand" "")
+                      (match_operand 2 "general_operand" ""))
+        (match_operand 3 "reg_or_int_operand" ""))]
   "TARGET_ARM || arm_arch_thumb2"
   "
   {
@@ -2482,35 +2486,70 @@
 
     if (arm_arch_thumb2)
       {
-	bool use_bfi = TRUE;
-
-	if (GET_CODE (operands[3]) == CONST_INT)
+        if (unaligned_access && MEM_P (operands[0])
+	    && s_register_operand (operands[3], GET_MODE (operands[3]))
+	    && (width == 16 || width == 32) && (start_bit % BITS_PER_UNIT) == 0)
 	  {
-	    HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
+	    rtx base_addr;
+
+	    if (BYTES_BIG_ENDIAN)
+	      start_bit = GET_MODE_BITSIZE (GET_MODE (operands[3])) - width
+			  - start_bit;
 
-	    if (val == 0)
+	    if (width == 32)
 	      {
-		emit_insn (gen_insv_zero (operands[0], operands[1],
-					  operands[2]));
-		DONE;
+	        base_addr = adjust_address (operands[0], SImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_storesi (base_addr, operands[3]));
 	      }
+	    else
+	      {
+	        rtx tmp = gen_reg_rtx (HImode);
 
-	    /* See if the set can be done with a single orr instruction.  */
-	    if (val == mask && const_ok_for_arm (val << start_bit))
-	      use_bfi = FALSE;
+	        base_addr = adjust_address (operands[0], HImode,
+					    start_bit / BITS_PER_UNIT);
+		emit_move_insn (tmp, gen_lowpart (HImode, operands[3]));
+		emit_insn (gen_unaligned_storehi (base_addr, tmp));
+	      }
+	    DONE;
 	  }
-	  
-	if (use_bfi)
+	else if (s_register_operand (operands[0], GET_MODE (operands[0])))
 	  {
-	    if (GET_CODE (operands[3]) != REG)
-	      operands[3] = force_reg (SImode, operands[3]);
+	    bool use_bfi = TRUE;
 
-	    emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
-				    operands[3]));
-	    DONE;
+	    if (GET_CODE (operands[3]) == CONST_INT)
+	      {
+		HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
+
+		if (val == 0)
+		  {
+		    emit_insn (gen_insv_zero (operands[0], operands[1],
+					      operands[2]));
+		    DONE;
+		  }
+
+		/* See if the set can be done with a single orr instruction.  */
+		if (val == mask && const_ok_for_arm (val << start_bit))
+		  use_bfi = FALSE;
+	      }
+
+	    if (use_bfi)
+	      {
+		if (GET_CODE (operands[3]) != REG)
+		  operands[3] = force_reg (SImode, operands[3]);
+
+		emit_insn (gen_insv_t2 (operands[0], operands[1], operands[2],
+					operands[3]));
+		DONE;
+	      }
 	  }
+	else
+	  FAIL;
       }
 
+    if (!s_register_operand (operands[0], GET_MODE (operands[0])))
+      FAIL;
+
     target = copy_rtx (operands[0]);
     /* Avoid using a subreg as a subtarget, and avoid writing a paradoxical 
        subreg as the final target.  */
@@ -3702,12 +3741,10 @@
 ;; to reduce register pressure later on.
 
 (define_expand "extzv"
-  [(set (match_dup 4)
-	(ashift:SI (match_operand:SI   1 "register_operand" "")
-		   (match_operand:SI   2 "const_int_operand" "")))
-   (set (match_operand:SI              0 "register_operand" "")
-	(lshiftrt:SI (match_dup 4)
-		     (match_operand:SI 3 "const_int_operand" "")))]
+  [(set (match_operand 0 "s_register_operand" "")
+	(zero_extract (match_operand 1 "nonimmediate_operand" "")
+		      (match_operand 2 "const_int_operand" "")
+		      (match_operand 3 "const_int_operand" "")))]
   "TARGET_THUMB1 || arm_arch_thumb2"
   "
   {
@@ -3716,10 +3753,57 @@
     
     if (arm_arch_thumb2)
       {
-	emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
-				 operands[3]));
-	DONE;
+	HOST_WIDE_INT width = INTVAL (operands[2]);
+	HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+	if (unaligned_access && MEM_P (operands[1])
+	    && (width == 16 || width == 32) && (bitpos % BITS_PER_UNIT) == 0)
+	  {
+	    rtx base_addr;
+
+	    if (BYTES_BIG_ENDIAN)
+	      bitpos = GET_MODE_BITSIZE (GET_MODE (operands[0])) - width
+		       - bitpos;
+
+	    if (width == 32)
+              {
+		base_addr = adjust_address (operands[1], SImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+              }
+	    else
+              {
+		rtx dest = operands[0];
+		rtx tmp = gen_reg_rtx (SImode);
+
+		/* We may get a paradoxical subreg here.  Strip it off.  */
+		if (GET_CODE (dest) == SUBREG
+		    && GET_MODE (dest) == SImode
+		    && GET_MODE (SUBREG_REG (dest)) == HImode)
+		  dest = SUBREG_REG (dest);
+
+		if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+		  FAIL;
+
+		base_addr = adjust_address (operands[1], HImode,
+					    bitpos / BITS_PER_UNIT);
+		emit_insn (gen_unaligned_loadhiu (tmp, base_addr));
+		emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	      }
+	    DONE;
+	  }
+	else if (s_register_operand (operands[1], GET_MODE (operands[1])))
+	  {
+	    emit_insn (gen_extzv_t2 (operands[0], operands[1], operands[2],
+				     operands[3]));
+	    DONE;
+	  }
+	else
+	  FAIL;
       }
+    
+    if (!s_register_operand (operands[1], GET_MODE (operands[1])))
+      FAIL;
 
     operands[3] = GEN_INT (rshift);
     
@@ -3729,12 +3813,154 @@
         DONE;
       }
       
-    operands[2] = GEN_INT (lshift);
-    operands[4] = gen_reg_rtx (SImode);
+    emit_insn (gen_extzv_t1 (operands[0], operands[1], GEN_INT (lshift),
+			     operands[3], gen_reg_rtx (SImode)));
+    DONE;
   }"
 )
 
-(define_insn "extv"
+;; Helper for extzv, for the Thumb-1 register-shifts case.
+
+(define_expand "extzv_t1"
+  [(set (match_operand:SI 4 "s_register_operand" "")
+	(ashift:SI (match_operand:SI 1 "nonimmediate_operand" "")
+		   (match_operand:SI 2 "const_int_operand" "")))
+   (set (match_operand:SI 0 "s_register_operand" "")
+	(lshiftrt:SI (match_dup 4)
+		     (match_operand:SI 3 "const_int_operand" "")))]
+  "TARGET_THUMB1"
+  "")
+
+(define_expand "extv"
+  [(set (match_operand 0 "s_register_operand" "")
+	(sign_extract (match_operand 1 "nonimmediate_operand" "")
+		      (match_operand 2 "const_int_operand" "")
+		      (match_operand 3 "const_int_operand" "")))]
+  "arm_arch_thumb2"
+{
+  HOST_WIDE_INT width = INTVAL (operands[2]);
+  HOST_WIDE_INT bitpos = INTVAL (operands[3]);
+
+  if (unaligned_access && MEM_P (operands[1]) && (width == 16 || width == 32)
+      && (bitpos % BITS_PER_UNIT)  == 0)
+    {
+      rtx base_addr;
+      
+      if (BYTES_BIG_ENDIAN)
+	bitpos = GET_MODE_BITSIZE (GET_MODE (operands[0])) - width - bitpos;
+      
+      if (width == 32)
+        {
+	  base_addr = adjust_address (operands[1], SImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadsi (operands[0], base_addr));
+        }
+      else
+        {
+	  rtx dest = operands[0];
+	  rtx tmp = gen_reg_rtx (SImode);
+	  
+	  /* We may get a paradoxical subreg here.  Strip it off.  */
+	  if (GET_CODE (dest) == SUBREG
+	      && GET_MODE (dest) == SImode
+	      && GET_MODE (SUBREG_REG (dest)) == HImode)
+	    dest = SUBREG_REG (dest);
+	  
+	  if (GET_MODE_BITSIZE (GET_MODE (dest)) != width)
+	    FAIL;
+	  
+	  base_addr = adjust_address (operands[1], HImode,
+				      bitpos / BITS_PER_UNIT);
+	  emit_insn (gen_unaligned_loadhis (tmp, base_addr));
+	  emit_move_insn (gen_lowpart (SImode, dest), tmp);
+	}
+
+      DONE;
+    }
+  else if (!s_register_operand (operands[1], GET_MODE (operands[1])))
+    FAIL;
+  else if (GET_MODE (operands[0]) == SImode
+	   && GET_MODE (operands[1]) == SImode)
+    {
+      emit_insn (gen_extv_regsi (operands[0], operands[1], operands[2],
+				 operands[3]));
+      DONE;
+    }
+
+  FAIL;
+})
+
+; Helper to expand register forms of extv with the proper modes.
+
+(define_expand "extv_regsi"
+  [(set (match_operand:SI 0 "s_register_operand" "")
+	(sign_extract:SI (match_operand:SI 1 "s_register_operand" "")
+			 (match_operand 2 "const_int_operand" "")
+			 (match_operand 3 "const_int_operand" "")))]
+  ""
+{
+})
+
+; ARMv6+ unaligned load/store instructions (used for packed structure accesses).
+
+(define_insn "unaligned_loadsi"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "Uw,m")]
+		   UNSPEC_UNALIGNED_LOAD))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%?\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load1")])
+
+(define_insn "unaligned_loadhis"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(sign_extend:SI
+	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
+		     UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%(sh%)\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_loadhiu"
+  [(set (match_operand:SI 0 "s_register_operand" "=l,r")
+	(zero_extend:SI
+	  (unspec:HI [(match_operand:HI 1 "memory_operand" "Uw,m")]
+		     UNSPEC_UNALIGNED_LOAD)))]
+  "unaligned_access && TARGET_32BIT"
+  "ldr%(h%)\t%0, %1\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "load_byte")])
+
+(define_insn "unaligned_storesi"
+  [(set (match_operand:SI 0 "memory_operand" "=Uw,m")
+	(unspec:SI [(match_operand:SI 1 "s_register_operand" "l,r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access && TARGET_32BIT"
+  "str%?\t%1, %0\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "unaligned_storehi"
+  [(set (match_operand:HI 0 "memory_operand" "=Uw,m")
+	(unspec:HI [(match_operand:HI 1 "s_register_operand" "l,r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "unaligned_access && TARGET_32BIT"
+  "str%(h%)\t%1, %0\t@ unaligned"
+  [(set_attr "arch" "t2,any")
+   (set_attr "length" "2,4")
+   (set_attr "predicable" "yes")
+   (set_attr "type" "store1")])
+
+(define_insn "*extv_reg"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
                          (match_operand:SI 2 "const_int_operand" "M")
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index be5fd3c..98eace3 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -249,3 +249,7 @@ mfix-cortex-m3-ldrd
 Target Report Var(fix_cm3_ldrd) Init(2)
 Avoid overlapping destination and address registers on LDRD instructions
 that may trigger Cortex-M3 errata.
+
+munaligned-access
+Target Report Var(unaligned_access) Init(2)
+Enable unaligned word and halfword accesses to packed data.
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index f5b8521..53c358e 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -36,7 +36,7 @@
 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us
 ;; in ARM state: Uq
-;; in Thumb state: Uu
+;; in Thumb state: Uu, Uw
 
 
 (define_register_constraint "f" "TARGET_ARM ? FPA_REGS : NO_REGS"
@@ -341,6 +341,19 @@
 		   && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
 						   0)")))
 
+; The 16-bit post-increment LDR/STR accepted by thumb1_legitimate_address_p
+; are actually LDM/STM instructions, so cannot be used to access unaligned
+; data.
+(define_memory_constraint "Uw"
+ "@internal
+  In Thumb state an address that is valid in 16bit encoding, and that can be
+  used for unaligned accesses."
+ (and (match_code "mem")
+      (match_test "TARGET_THUMB
+		   && thumb1_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
+						   0)
+		   && GET_CODE (XEXP (op, 0)) != POST_INC")))
+
 ;; We used to have constraint letters for S and R in ARM state, but
 ;; all uses of these now appear to have been removed.
 
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 0047cd0..3c63750 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -620,6 +620,10 @@ store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
       && GET_MODE (value) != BLKmode
       && bitsize > 0
       && GET_MODE_BITSIZE (op_mode) >= bitsize
+      /* Do not use insv for volatile bitfields when
+         -fstrict-volatile-bitfields is in effect.  */
+      && !(MEM_P (op0) && MEM_VOLATILE_P (op0)
+	   && flag_strict_volatile_bitfields > 0)
       && ! ((REG_P (op0) || GET_CODE (op0) == SUBREG)
 	    && (bitsize + bitpos > GET_MODE_BITSIZE (op_mode))))
     {
@@ -659,19 +663,19 @@ store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 	  copy_back = true;
 	}
 
-      /* On big-endian machines, we count bits from the most significant.
-	 If the bit field insn does not, we must invert.  */
-
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
-	xbitpos = unit - bitsize - xbitpos;
-
       /* We have been counting XBITPOS within UNIT.
 	 Count instead within the size of the register.  */
-      if (BITS_BIG_ENDIAN && !MEM_P (xop0))
+      if (BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos += GET_MODE_BITSIZE (op_mode) - unit;
 
       unit = GET_MODE_BITSIZE (op_mode);
 
+      /* On big-endian machines, we count bits from the most significant.
+	 If the bit field insn does not, we must invert.  */
+
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+	xbitpos = unit - bitsize - xbitpos;
+
       /* Convert VALUE to op_mode (which insv insn wants) in VALUE1.  */
       value1 = value;
       if (GET_MODE (value) != op_mode)
@@ -1507,6 +1511,10 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
   if (ext_mode != MAX_MACHINE_MODE
       && bitsize > 0
       && GET_MODE_BITSIZE (ext_mode) >= bitsize
+      /* Do not use extv/extzv for volatile bitfields when
+         -fstrict-volatile-bitfields is in effect.  */
+      && !(MEM_P (op0) && MEM_VOLATILE_P (op0)
+	   && flag_strict_volatile_bitfields > 0)
       /* If op0 is a register, we need it in EXT_MODE to make it
 	 acceptable to the format of ext(z)v.  */
       && !(GET_CODE (op0) == SUBREG && GET_MODE (op0) != ext_mode)
@@ -1528,17 +1536,17 @@ extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 	/* Get ref to first byte containing part of the field.  */
 	xop0 = adjust_address (xop0, byte_mode, xoffset);
 
-      /* On big-endian machines, we count bits from the most significant.
-	 If the bit field insn does not, we must invert.  */
-      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
-	xbitpos = unit - bitsize - xbitpos;
-
       /* Now convert from counting within UNIT to counting in EXT_MODE.  */
-      if (BITS_BIG_ENDIAN && !MEM_P (xop0))
+      if (BYTES_BIG_ENDIAN && !MEM_P (xop0))
 	xbitpos += GET_MODE_BITSIZE (ext_mode) - unit;
 
       unit = GET_MODE_BITSIZE (ext_mode);
 
+      /* On big-endian machines, we count bits from the most significant.
+	 If the bit field insn does not, we must invert.  */
+      if (BITS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
+	xbitpos = unit - bitsize - xbitpos;
+
       if (xtarget == 0)
 	xtarget = xspec_target = gen_reg_rtx (tmode);
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
  2011-08-25 16:29   ` Julian Brown
  2011-08-25 18:19     ` Julian Brown
@ 2011-09-01 14:46     ` Ulrich Weigand
  1 sibling, 0 replies; 14+ messages in thread
From: Ulrich Weigand @ 2011-09-01 14:46 UTC (permalink / raw)
  To: Julian Brown; +Cc: gcc-patches, paul, rearnsha, Ramana Radhakrishnan

Julian Brown wrote:

> The problem is, if we're using little-endian bit numbering for memory
> locations in big-endian-bytes mode, we need to define an origin from
> which to count "backwards" from. For the current implementation, this
> will now be (I believe) one word beyond the base address of the access
> in question, which is IMO slightly bizarre, to say the least.

It doesn't seem all that bizarre to me.  Conceptually, the extraction
(etc.) operation works on an operand of integer type (usually word-sized)
in two steps:
- read the operand from its storage location, resulting in a plain
  (conceptual) integer value
- extract certain numbered bits from that integer value

The first step, reading the operand from the storage location,
depends on BYTES_BIG_ENDIAN (as all memory reads do).  It does not
depend on BITS_BIG_ENDIAN at all.

The second step, extracting numbered bits, depends on BITS_BIG_ENDIAN,
which provides the link between bit number and its value.  This step
however does not depend on BYTES_BIG_ENDIAN at all.  [However, if
BITS_BIG_ENDIAN is true, you need to consider the total size of the
operand in order to perform the conversion, since bit 0 then refers
to the most-significant bit, and which bit that is depends on the
size ...  But this still does not depend on *BYTES_BIG_ENDIAN*.]

Thus, the two flags can be considered really independent, and any
combination is quite well-defined.
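
A tiny sketch of that second step (hypothetical helper, not GCC code),
assuming a 32-bit operand: the conversion from bit number to bit value
uses only BITS_BIG_ENDIAN and the operand size.

#include <stdint.h>

/* Extract WIDTH bits starting at bit number POS from a 32-bit VALUE
   that has already been read from its storage location (step 1).
   Under BITS_BIG_ENDIAN, bit 0 names the most significant bit, so the
   bit number is converted using the operand size; BYTES_BIG_ENDIAN
   plays no role in this step.  */
uint32_t
extract_bits (uint32_t value, int pos, int width, int bits_big_endian)
{
  int lsb = bits_big_endian ? 32 - pos - width : pos;
  uint32_t mask = (width < 32) ? ((1u << width) - 1) : ~0u;
  return (value >> lsb) & mask;
}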


When actually implementing the extraction, you will of course deviate
from that conceptual sequence, e.g. by avoiding to read certain bytes
if you know none of the bits in them will end up in the final result.
But while this might result in some computations that may not 
immediately look obvious, in the end this is just an optimization
step and doesn't change that the endian flags remain well-defined ...


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
  2011-08-26 17:37       ` Julian Brown
@ 2011-09-08 17:35         ` Richard Earnshaw
  0 siblings, 0 replies; 14+ messages in thread
From: Richard Earnshaw @ 2011-09-08 17:35 UTC (permalink / raw)
  To: Julian Brown; +Cc: gcc-patches, Ulrich Weigand, paul, Ramana Radhakrishnan

On 26/08/11 17:39, Julian Brown wrote:
> On Thu, 25 Aug 2011 18:31:21 +0100
> Julian Brown <julian@codesourcery.com> wrote:
> 
>> On Thu, 25 Aug 2011 16:46:50 +0100
>> Julian Brown <julian@codesourcery.com> wrote:
>>
>>> So, OK to apply this version, assuming testing comes out OK? (And
>>> the followup patch [2/2], which remains unchanged?)
>>
>> FWIW, all tests pass, apart from
>> gcc.target/arm/volatile-bitfields-3.c, which regresses. The output
>> contains:
>>
>>         ldrh    r0, [r3, #2]    @ unaligned
>>
>> I believe that, to conform to the ARM EABI, GCC must use an (aligned)
>> ldr in this case. Is that correct? If so, it looks like the
>> middle-end bitfield code does not take the setting of
>> -fstrict-volatile-bitfields into account.
> 
> This version fixes the last issue, by adding additional checks for
> volatile accesses/-fstrict-volatile-bitfields. Tests now show no
> regressions.
> 
> OK to apply?
> 

+      /* On big-endian machines, we count bits from the most significant.
+	 If the bit field insn does not, we must invert.  */


It sounds to me like this comment is somewhat out of date now that we
have BITS_BIG_ENDIAN; it would be better re-worded to reflect the code
as it stands now.

Other than that, OK.

R.

> Thanks,
> 
> Julian
> 
> ChangeLog
> 
>     gcc/
>     * config/arm/arm.c (arm_override_options): Add unaligned_access
>     support.
>     (arm_file_start): Emit attribute for unaligned access as
>     appropriate.
>     * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
>     (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
>     (insv, extzv): Add unaligned-access support.
>     (extv): Change to expander. Likewise.
>     (extzv_t1, extv_regsi): Add helpers.
>     (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
>     (unaligned_storesi, unaligned_storehi): New.
>     (*extv_reg): New (previous extv implementation).
>     * config/arm/arm.opt (munaligned_access): Add option.
>     * config/arm/constraints.md (Uw): New constraint.
>     * expmed.c (store_bit_field_1): Adjust bitfield numbering according
>     to size of access, not size of unit, when BITS_BIG_ENDIAN !=
>     BYTES_BIG_ENDIAN. Don't use bitfield accesses for
>     volatile accesses when -fstrict-volatile-bitfields is in effect.
>     (extract_bit_field_1): Likewise.
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
  2011-05-06 12:45 [PATCH, ARM] Unaligned accesses for packed structures [1/2] Julian Brown
  2011-05-06 13:07 ` Richard Earnshaw
  2011-07-04 11:44 ` Ulrich Weigand
@ 2011-09-22  6:47 ` Ye Joey
  2 siblings, 0 replies; 14+ messages in thread
From: Ye Joey @ 2011-09-22  6:47 UTC (permalink / raw)
  To: Julian Brown; +Cc: gcc-patches, paul, rearnsha, Ramana Radhakrishnan

Julian,

This patch works for register ld/st, but doesn't work when storing an
immediate, as shown in the example below.

Would you further improve it to handle immediates?

Thanks - Joey

$ arm-none-eabi-gcc -v 2>&1 | grep version
gcc version 4.7.0 20110922 (experimental) [trunk revision 179074] (GCC)
$ cat u.c
struct packed_str
{
	char f0;
	int  f1;
}__attribute__((packed)) u;

void foo(int a)
{
  u.f1 = a; // works fine
}

void bar()
{
  u.f1 = 1; // doesn't work
}

$ arm-none-eabi-gcc -mthumb -march=armv7 -S -O2 u.c
$ cat u.s
	.syntax unified
	.arch armv7
	.fpu softvfp
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 1
	.eabi_attribute 30, 2
	.eabi_attribute 34, 1
	.eabi_attribute 18, 4
	.thumb
	.file	"u.c"
	.text
	.align	2
	.global	foo
	.thumb
	.thumb_func
	.type	foo, %function
foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r3, #:lower16:u
	movt	r3, #:upper16:u
	str	r0, [r3, #1]	@ unaligned
	bx	lr
	.size	foo, .-foo
	.align	2
	.global	bar
	.thumb
	.thumb_func
	.type	bar, %function
bar:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	movw	r3, #:lower16:u
	movs	r1, #0
	movt	r3, #:upper16:u

@ Still uses aligned str from here
	ldrb	r2, [r3, #0]	@ zero_extendqisi2
	strb	r1, [r3, #4]
	orr	r2, r2, #256
	str	r2, [r3, #0]
	bx	lr
	.size	bar, .-bar
	.comm	u,5,4
	.ident	"GCC: (GNU) 4.7.0 20110922 (experimental) [trunk revision 179074]"

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-09-22  4:54 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-06 12:45 [PATCH, ARM] Unaligned accesses for packed structures [1/2] Julian Brown
2011-05-06 13:07 ` Richard Earnshaw
2011-05-09 17:50   ` Julian Brown
2011-05-10 14:03     ` Julian Brown
2011-05-26 17:06     ` Julian Brown
2011-07-04 11:44 ` Ulrich Weigand
2011-08-25 16:29   ` Julian Brown
2011-08-25 18:19     ` Julian Brown
2011-08-25 22:18       ` Ramana Radhakrishnan
2011-08-25 22:23         ` Joseph S. Myers
2011-08-26 17:37       ` Julian Brown
2011-09-08 17:35         ` Richard Earnshaw
2011-09-01 14:46     ` Ulrich Weigand
2011-09-22  6:47 ` Ye Joey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).