public inbox for gcc-patches@gcc.gnu.org
* [PATCHv2] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
@ 2019-03-10 12:51 Bernd Edlinger
  2019-03-19 14:01 ` [PING] " Bernd Edlinger
  2019-03-21 11:26 ` Richard Biener
  0 siblings, 2 replies; 50+ messages in thread
From: Bernd Edlinger @ 2019-03-10 12:51 UTC (permalink / raw)
  To: gcc-patches, Richard Biener, Richard Earnshaw,
	Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou

[-- Attachment #1: Type: text/plain, Size: 1583 bytes --]

Hi,

This patch is an update to the previous patch, which takes into account that
the middle-end is not supposed to use the unaligned DI value directly which
was passed in an unaligned stack slot due to the AAPCS parameter passing rules.

The patch works by changing use_register_for_decl to return false if the
incoming RTL is not sufficiently aligned on a STRICT_ALIGNMENT target,
as if the address of the parameter was taken (which is TREE_ADDRESSABLE).
So not taking the address of the parameter is a necessary condition
for the wrong-code in PR 89544.

It works together with this check in assign_parm_adjust_stack_rtl:
  /* If we can't trust the parm stack slot to be aligned enough for its
     ultimate type, don't use that slot after entry.  We'll make another
     stack slot, if we need one.  */
  if (stack_parm
      && ((STRICT_ALIGNMENT
           && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
...
    stack_parm = NULL;

This makes assign_parms use assign_parm_setup_stack instead of
assign_parm_setup_reg, and the former does assign a suitably aligned
stack slot, because stack_parm == NULL by now, and uses emit_block_move
which avoids the issue with the back-end.
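
For illustration only (not part of the patch): the condition quoted above
can be modelled as a small predicate.  This is a minimal sketch; the
function name and plain-integer parameters are hypothetical stand-ins for
STRICT_ALIGNMENT, GET_MODE_ALIGNMENT (data->nominal_mode) and
MEM_ALIGN (stack_parm), all alignments being in bits.

```c
#include <stdbool.h>

/* Hypothetical model of the first half of the check in
   assign_parm_adjust_stack_rtl.  Returns true when the incoming stack
   slot cannot be trusted and a fresh, suitably aligned slot is needed.  */
static bool
must_drop_incoming_slot (bool strict_alignment,
			 unsigned int mode_align_bits,
			 unsigned int mem_align_bits)
{
  return strict_alignment && mode_align_bits > mem_align_bits;
}
```

With a DImode parameter (64-bit mode alignment) in a slot known to be only
32-bit aligned, the predicate is true on a strict-alignment target and
false otherwise.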

Additionally, to prevent unnecessary performance regressions,
assign_parm_find_stack_rtl is enhanced to make use of a possible larger
alignment if the parameter was passed in an aligned stack slot without
the ABI requiring it.
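
As a rough sketch of that alignment computation (an assumption-laden
model, not the GCC code): the helper below mirrors the PAD_UPWARD branch
of the hunk, using plain integers in place of poly_int64 and passing
STACK_BOUNDARY as a parameter.  The name slot_align_bits is hypothetical.

```c
#include <stdint.h>

#define BITS_PER_UNIT 8

/* Hypothetical model: BOUNDARY_BITS is the ABI-required alignment of the
   argument, OFFSET_BYTES its constant offset in the argument area, and
   STACK_BOUNDARY_BITS the guaranteed stack alignment.  Returns the
   alignment (in bits) that may be assumed for the slot.  */
static unsigned int
slot_align_bits (unsigned int boundary_bits, int64_t offset_bytes,
		 unsigned int stack_boundary_bits)
{
  uint64_t u = (uint64_t) offset_bytes;
  /* Lowest set bit, i.e. the largest power of two dividing the offset;
     this is 0 when the offset itself is 0.  */
  uint64_t low = u & (~u + 1);
  uint64_t offset_align = low * BITS_PER_UNIT;

  /* An offset of 0, or one more aligned than the stack itself, proves
     no more than the stack boundary.  */
  if (offset_align == 0 || offset_align > stack_boundary_bits)
    offset_align = stack_boundary_bits;

  return boundary_bits > offset_align
	 ? boundary_bits : (unsigned int) offset_align;
}
```

Assuming a 32-bit boundary and a 64-bit stack boundary, an argument at
offset 0 or 8 may be treated as 64-bit aligned, while one at offset 4
stays at 32 bits.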


Bootstrapped and reg-tested with all languages on arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.

[-- Attachment #2: patch-arm-align-abi.diff --]
[-- Type: text/x-patch; name="patch-arm-align-abi.diff", Size: 3581 bytes --]

2019-03-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* function.c (use_register_for_decl): Avoid using unaligned stack
	values on strict alignment targets.
	(assign_parm_find_stack_rtl): Use larger alignment when possible.

testsuite:
2019-03-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* gcc.target/arm/unaligned-argument-1.c: New test.
	* gcc.target/arm/unaligned-argument-2.c: New test.

Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 269264)
+++ gcc/function.c	(working copy)
@@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
   if (DECL_MODE (decl) == BLKmode)
     return false;
 
+  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
+      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
+      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
+	 > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
+    return false;
+
   /* If -ffloat-store specified, don't put explicit float variables
      into registers.  */
   /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
@@ -2698,8 +2704,20 @@ assign_parm_find_stack_rtl (tree parm, struct assi
      intentionally forcing upward padding.  Otherwise we have to come
      up with a guess at the alignment based on OFFSET_RTX.  */
   poly_int64 offset;
-  if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
+  if (data->locate.where_pad == PAD_NONE || data->entry_parm)
     align = boundary;
+  else if (data->locate.where_pad == PAD_UPWARD)
+    {
+      align = boundary;
+      if (poly_int_rtx_p (offset_rtx, &offset)
+	  && STACK_POINTER_OFFSET == 0)
+	{
+	  unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
+	  if (offset_align == 0 || offset_align > STACK_BOUNDARY)
+	    offset_align = STACK_BOUNDARY;
+	  align = MAX (align, offset_align);
+	}
+    }
   else if (poly_int_rtx_p (offset_rtx, &offset))
     {
       align = least_bit_hwi (boundary);
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "stmdb" 0 } } */
+/* { dg-final { scan-assembler-times "ldrd\[^\\n\]*\\\[sp\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "ldrd" 1 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(working copy)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "stmdb" 1 } } */
+/* { dg-final { scan-assembler-times "ldrd\[^\\n\]*\\\[sp\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "ldrd" 1 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */


* [PING] [PATCHv2] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-03-10 12:51 [PATCHv2] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544) Bernd Edlinger
@ 2019-03-19 14:01 ` Bernd Edlinger
  2019-03-21 11:26 ` Richard Biener
  1 sibling, 0 replies; 50+ messages in thread
From: Bernd Edlinger @ 2019-03-19 14:01 UTC (permalink / raw)
  To: gcc-patches, Richard Biener, Richard Earnshaw,
	Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou

Hi,

I'd like to ping for this patch:
https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00438.html

Thanks
Bernd.


On 3/10/19 10:42 AM, Bernd Edlinger wrote:
> Hi,
> 
> This patch is an update to the previous patch, which takes into account that
> the middle-end is not supposed to use the unaligned DI value directly which
> was passed in an unaligned stack slot due to the AAPCS parameter passing rules.
> 
> The patch works by changing use_register_for_decl to return false if the
> incoming RTL is not sufficiently aligned on a STRICT_ALIGNMENT target,
> as if the address of the parameter was taken (which is TREE_ADDRESSABLE).
> So not taking the address of the parameter is a necessary condition
> for the wrong-code in PR 89544.
> 
> It works together with this check in assign_parm_adjust_stack_rtl:
>   /* If we can't trust the parm stack slot to be aligned enough for its
>      ultimate type, don't use that slot after entry.  We'll make another
>      stack slot, if we need one.  */
>   if (stack_parm
>       && ((STRICT_ALIGNMENT
>            && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
> ...
>     stack_parm = NULL;
> 
> This makes assign_parms use assign_parm_setup_stack instead of
> assign_parm_setup_reg, and the former does assign a suitably aligned
> stack slot, because stack_parm == NULL by now, and uses emit_block_move
> which avoids the issue with the back-end.
> 
> Additionally, to prevent unnecessary performance regressions,
> assign_parm_find_stack_rtl is enhanced to make use of a possible larger
> alignment if the parameter was passed in an aligned stack slot without
> the ABI requiring it.
> 
> 
> Bootstrapped and reg-tested with all languages on arm-linux-gnueabihf.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 


* Re: [PATCHv2] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-03-10 12:51 [PATCHv2] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544) Bernd Edlinger
  2019-03-19 14:01 ` [PING] " Bernd Edlinger
@ 2019-03-21 11:26 ` Richard Biener
  2019-03-22 17:47   ` Bernd Edlinger
  1 sibling, 1 reply; 50+ messages in thread
From: Richard Biener @ 2019-03-21 11:26 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou

On Sun, 10 Mar 2019, Bernd Edlinger wrote:

> Hi,
> 
> This patch is an update to the previous patch, which takes into account that
> the middle-end is not supposed to use the unaligned DI value directly which
> was passed in an unaligned stack slot due to the AAPCS parameter passing rules.
> 
> The patch works by changing use_register_for_decl to return false if the
> incoming RTL is not sufficiently aligned on a STRICT_ALIGNMENT target,
> as if the address of the parameter was taken (which is TREE_ADDRESSABLE).
> So not taking the address of the parameter is a necessary condition
> for the wrong-code in PR 89544.
> 
> It works together with this check in assign_parm_adjust_stack_rtl:
>   /* If we can't trust the parm stack slot to be aligned enough for its
>      ultimate type, don't use that slot after entry.  We'll make another
>      stack slot, if we need one.  */
>   if (stack_parm
>       && ((STRICT_ALIGNMENT
>            && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
> ...
>     stack_parm = NULL;
> 
> This makes assign_parms use assign_parm_setup_stack instead of
> assign_parm_setup_reg, and the former does assign a suitably aligned
> stack slot, because stack_parm == NULL by now, and uses emit_block_move
> which avoids the issue with the back-end.
> 
> Additionally, to prevent unnecessary performance regressions,
> assign_parm_find_stack_rtl is enhanced to make use of a possible larger
> alignment if the parameter was passed in an aligned stack slot without
> the ABI requiring it.
> 
> 
> Bootstrapped and reg-tested with all languages on arm-linux-gnueabihf.
> Is it OK for trunk?

I think the assign_parm_find_stack_rtl change is not appropriate at this
stage, and I am also missing an update to the comment of the block you change.
It also changes code I'm not familiar enough with to review...

Finally...

Index: gcc/function.c
===================================================================
--- gcc/function.c      (revision 269264)
+++ gcc/function.c      (working copy)
@@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
   if (DECL_MODE (decl) == BLKmode)
     return false;

+  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
+      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
+      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
+        > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
+    return false;
+
   /* If -ffloat-store specified, don't put explicit float variables
      into registers.  */
   /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa

I wonder if it is necessary to look at DECL_INCOMING_RTL here
and why such RTL may not exist?  That is, iff DECL_INCOMING_RTL
doesn't exist then shouldn't we return false for safety reasons?

Similarly the very same issue should exist on x86_64 which is
!STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
alignment on the caller side.  So the STRICT_ALIGNMENT check is
a wrong one.

Which makes me think that a proper fix is not here, but in
target(hook) code.

Changing use_register_for_decl sounds odd anyways since if we return true
we for the testcase still end up in memory, no?

The hunk obviously misses a comment since the effect that this
will cause a copy to be emitted isn't obvious (and relying on
this is probably fragile).

Richard.


* Re: [PATCHv2] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-03-21 11:26 ` Richard Biener
@ 2019-03-22 17:47   ` Bernd Edlinger
  2019-03-25  9:28     ` Richard Biener
  0 siblings, 1 reply; 50+ messages in thread
From: Bernd Edlinger @ 2019-03-22 17:47 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou

On 3/21/19 12:15 PM, Richard Biener wrote:
> On Sun, 10 Mar 2019, Bernd Edlinger wrote:
> Finally...
> 
> Index: gcc/function.c
> ===================================================================
> --- gcc/function.c      (revision 269264)
> +++ gcc/function.c      (working copy)
> @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
>    if (DECL_MODE (decl) == BLKmode)
>      return false;
> 
> +  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
> +      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
> +      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
> +        > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
> +    return false;
> +
>    /* If -ffloat-store specified, don't put explicit float variables
>       into registers.  */
>    /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
> 
> I wonder if it is necessary to look at DECL_INCOMING_RTL here
> and why such RTL may not exist?  That is, iff DECL_INCOMING_RTL
> doesn't exist then shouldn't we return false for safety reasons?
> 

I think that happens a few times already before the INCOMING_RTL
is assigned.  I thought that might be too pessimistic.

> Similarly the very same issue should exist on x86_64 which is
> !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
> alignment on the caller side.  So the STRICT_ALIGNMENT check is
> a wrong one.
> 

I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
is always 32-bit aligned on the DImode memory.  The x86_64 vector instructions
would look at MEM_ALIGN and do the right thing, yes?

It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
does not even have to look at MEM_ALIGN except in the movmisalign_optab,
right?

The other hunk, where I admit I did not fully understand the comment, only
tries to increase the MEM_ALIGN to 64-bit if the stack slot is
64-bit aligned although the target said it only needs 32-bit alignment,
so that it is no longer necessary to copy the incoming value.


> Which makes me think that a proper fix is not here, but in
> target(hook) code.
> 
> Changing use_register_for_decl sounds odd anyways since if we return true
> we for the testcase still end up in memory, no?
> 

It seems to make us use the incoming register _or_ stack slot if this function
returns true here.

If it returns false here, a new stack slot is allocated, but only if the
original stack slot was not aligned.  This works together with the
other STRICT_ALIGNMENT check in assign_parm_adjust_stack_rtl.
That code also checks TYPE_ALIGN against MEM_ALIGN for !STRICT_ALIGNMENT
targets, but this seems to have an effect only if an address
is taken; in that case I see use_register_for_decl return false
due to TREE_ADDRESSABLE (decl), and whoops, we have an aligned copy
of the unaligned stack slot.

So I believe that there was already a fix for unaligned stack positions,
that relied on the addressability of the parameter, while the target
relied on the 8-byte alignment of the DImode access.

> The hunk obviously misses a comment since the effect that this
> will cause a copy to be emitted isn't obvious (and relying on
> this is probably fragile).
> 

Yes, and it is also important that the copy is done using the movmisalign optab.
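
As a source-level analogy only (this is not what the patch emits, and not
how GCC represents the copy internally): in portable C, a 64-bit value at
an address known to be only 4-byte aligned is usually read with memcpy,
which lets the compiler pick an access sequence that is safe for the
target.  The function name below is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read 8 bytes from P, which is assumed to be 4-byte aligned but not
   necessarily 8-byte aligned.  memcpy makes no alignment assumption, so
   no 8-byte-aligned access is implied at the source level.  */
static uint64_t
load_u64_misaligned (const void *p)
{
  uint64_t v;
  memcpy (&v, p, sizeof v);
  return v;
}
```

Dereferencing a uint64_t pointer to the same address would instead assert
8-byte alignment to the compiler, which is the assumption that goes wrong
in the PR.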


Thanks
Bernd.


* Re: [PATCHv2] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-03-22 17:47   ` Bernd Edlinger
@ 2019-03-25  9:28     ` Richard Biener
  2019-07-30 22:13       ` [PATCHv3] " Bernd Edlinger
  0 siblings, 1 reply; 50+ messages in thread
From: Richard Biener @ 2019-03-25  9:28 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou

[-- Attachment #1: Type: text/plain, Size: 4188 bytes --]

On Fri, 22 Mar 2019, Bernd Edlinger wrote:

> On 3/21/19 12:15 PM, Richard Biener wrote:
> > On Sun, 10 Mar 2019, Bernd Edlinger wrote:
> > Finally...
> > 
> > Index: gcc/function.c
> > ===================================================================
> > --- gcc/function.c      (revision 269264)
> > +++ gcc/function.c      (working copy)
> > @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
> >    if (DECL_MODE (decl) == BLKmode)
> >      return false;
> > 
> > +  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
> > +      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
> > +      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
> > +        > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
> > +    return false;
> > +
> >    /* If -ffloat-store specified, don't put explicit float variables
> >       into registers.  */
> >    /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
> > 
> > I wonder if it is necessary to look at DECL_INCOMING_RTL here
> > and why such RTL may not exist?  That is, iff DECL_INCOMING_RTL
> > doesn't exist then shouldn't we return false for safety reasons?
> > 
> 
> I think that happens a few times already before the INCOMING_RTL
> is assigned.  I thought that might be too pessimistic.
> 
> > Similarly the very same issue should exist on x86_64 which is
> > !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
> > alignment on the caller side.  So the STRICT_ALIGNMENT check is
> > a wrong one.
> > 
> 
> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
> is always 32-bit aligned on the DImode memory.  The x86_64 vector instructions
> would look at MEM_ALIGN and do the right thing, yes?

No, they need to use the movmisalign optab and end up with UNSPECs
for example.

> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
> does not even have to look at MEM_ALIGN except in the movmisalign_optab,
> right?

Yes, I think we never loosened that.  Note that RTL expansion has to
fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
specifies that x86 is strict-align wrt vector modes.

> The other hunk, where I admit I did not fully understand the comment, tries
> only to increase the MEM_ALIGN to 64-bit if the stack slot is
> 64-bit aligned although the target said it only needs 32-bit alignment.
> So that it is no longer necessary to copy the incoming value.
> 
> 
> > Which makes me think that a proper fix is not here, but in
> > target(hook) code.
> > 
> > Changing use_register_for_decl sounds odd anyways since if we return true
> > we for the testcase still end up in memory, no?
> > 
> 
> It seems to make us use the incoming register _or_ stack slot if this function
> returns true here.
>
> If it returns false here, a new stack slot is allocated, but only if the
> original stack slot was not aligned.  This works together with the
> other STRICT_ALIGNMENT check in assign_parm_adjust_stack_rtl.

Yes, I understood this - but then the check should be in that code
deciding whether to copy, not in use_register_for_decl?

> Where also for !STRICT_ALIGNMENT target TYPE_ALIGN and MEM_ALIGN
> are checked, but this seems to have only an effect if an address
> is taken, in that case I see use_register_for_decl return false
> due to TREE_ADDRESSABLE (decl), and whoops, we have an aligned copy
> of the unaligned stack slot.
> 
> So I believe that there was already a fix for unaligned stack positions,
> that relied on the addressability of the parameter, while the target
> relied on the 8-byte alignment of the DImode access.
> 
> > The hunk obviously misses a comment since the effect that this
> > will cause a copy to be emitted isn't obvious (and relying on
> > this is probably fragile).
> > 
> 
> Yes, also that the copy is done using movmisalign optab is important.
> 
> 
> Thanks
> Bernd.
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)


* [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-03-25  9:28     ` Richard Biener
@ 2019-07-30 22:13       ` Bernd Edlinger
  2019-07-31 13:17         ` Richard Earnshaw (lists)
  2019-08-02 13:11         ` Richard Biener
  0 siblings, 2 replies; 50+ messages in thread
From: Bernd Edlinger @ 2019-07-30 22:13 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou

[-- Attachment #1: Type: text/plain, Size: 3464 bytes --]

Hi Richard,

It has been a while, but I had not found time to continue
with this patch until now.

I think I now have a better solution, which properly addresses your
comments below.

On 3/25/19 9:41 AM, Richard Biener wrote:
> On Fri, 22 Mar 2019, Bernd Edlinger wrote:
> 
>> On 3/21/19 12:15 PM, Richard Biener wrote:
>>> On Sun, 10 Mar 2019, Bernd Edlinger wrote:
>>> Finally...
>>>
>>> Index: gcc/function.c
>>> ===================================================================
>>> --- gcc/function.c      (revision 269264)
>>> +++ gcc/function.c      (working copy)
>>> @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
>>>    if (DECL_MODE (decl) == BLKmode)
>>>      return false;
>>>
>>> +  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
>>> +      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
>>> +      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
>>> +        > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
>>> +    return false;
>>> +
>>>    /* If -ffloat-store specified, don't put explicit float variables
>>>       into registers.  */
>>>    /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
>>>
>>> I wonder if it is necessary to look at DECL_INCOMING_RTL here
>>> and why such RTL may not exist?  That is, iff DECL_INCOMING_RTL
>>> doesn't exist then shouldn't we return false for safety reasons?
>>>

You are right, it is not possible to return different results from
use_register_for_decl before vs. after incoming RTL is assigned.
That hits an assertion in set_rtl.

This hunk is gone now; instead I changed assign_parm_setup_reg
to use the movmisalign optab and/or extract_bit_field if a misaligned
entry_parm is to be assigned to a register.

I have no test coverage for the movmisalign optab though, so I
rely on your code review for that part.

>>> Similarly the very same issue should exist on x86_64 which is
>>> !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
>>> alignment on the caller side.  So the STRICT_ALIGNMENT check is
>>> a wrong one.
>>>
>>
>> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
>> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
>> is always 32-bit aligned on the DImode memory.  The x86_64 vector instructions
>> would look at MEM_ALIGN and do the right thing, yes?
> 
> No, they need to use the movmisalign optab and end up with UNSPECs
> for example.
Ah, thanks, now I see.

>> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
>> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
>> does not even have to look at MEM_ALIGN except in the movmisalign_optab,
>> right?
> 
> Yes, I think we never loosened that.  Note that RTL expansion has to
> fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
> specifies that x86 is strict-align wrt vector modes.
> 

Yes, I agree: the code would be incorrect for x86 as well when the
movmisalign optab is not used.  So I invoke the movmisalign optab if
available and, if not, fall back to extract_bit_field.  As in
assign_parm_setup_stack, assign_parm_setup_reg assumes that
data->promoted_mode != data->nominal_mode does not happen with
misaligned stack slots.
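
The dispatch described above can be sketched outside of GCC as follows.
This is a toy model under stated assumptions, not the RTL expander: the
function-pointer argument stands in for optab_handler (movmisalign_optab,
mode), with NULL playing the role of CODE_FOR_nothing, and the byte-wise
loop stands in for extract_bit_field.  All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type of a target-provided misaligned 64-bit load.  */
typedef uint64_t (*misaligned_load_fn) (const unsigned char *);

/* Stand-in for a target that provides a misaligned-move pattern.  */
static uint64_t
load_via_pattern (const unsigned char *p)
{
  uint64_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Generic fallback: assemble the value one byte at a time, so that no
   access wider than a byte is ever made.  */
static uint64_t
load_bytewise (const unsigned char *p)
{
  uint64_t v;
  unsigned char *d = (unsigned char *) &v;
  for (size_t i = 0; i < sizeof v; i++)
    d[i] = p[i];
  return v;
}

/* Use the target's handler if there is one, else the generic fallback.  */
static uint64_t
load_param (misaligned_load_fn handler, const unsigned char *p)
{
  if (handler != NULL)
    return handler (p);
  return load_bytewise (p);
}
```

Both paths must yield the same value; only the access sequence differs.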


Attached is the v3 of my patch.

Bootstrapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.

Is it OK for trunk?


Thanks
Bernd.

[-- Attachment #2: patch-arm-align-abi.diff --]
[-- Type: text/x-patch; name="patch-arm-align-abi.diff", Size: 4900 bytes --]

2019-07-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* function.c (assign_parm_data_one): Remove unused data members.
	(assign_parm_find_stack_rtl): Use larger alignment when possible.
	(assign_parm_adjust_stack_rtl): Revise STRICT_ALIGNMENT check.
	(assign_parm_setup_reg): Handle misaligned stack arguments.

testsuite:
2019-07-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* gcc.target/arm/unaligned-argument-1.c: New test.
	* gcc.target/arm/unaligned-argument-2.c: New test.

Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 273767)
+++ gcc/function.c	(working copy)
@@ -2274,8 +2274,6 @@ struct assign_parm_data_one
   int partial;
   BOOL_BITFIELD named_arg : 1;
   BOOL_BITFIELD passed_pointer : 1;
-  BOOL_BITFIELD on_stack : 1;
-  BOOL_BITFIELD loaded_in_reg : 1;
 };
 
 /* A subroutine of assign_parms.  Initialize ALL.  */
@@ -2699,8 +2697,23 @@ assign_parm_find_stack_rtl (tree parm, struct assi
      intentionally forcing upward padding.  Otherwise we have to come
      up with a guess at the alignment based on OFFSET_RTX.  */
   poly_int64 offset;
-  if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
+  if (data->locate.where_pad == PAD_NONE || data->entry_parm)
     align = boundary;
+  else if (data->locate.where_pad == PAD_UPWARD)
+    {
+      align = boundary;
+      /* If the argument offset is actually more aligned than the nominal
+	 stack slot boundary, take advantage of that excess alignment.
+	 Don't make any assumptions if STACK_POINTER_OFFSET is in use.  */
+      if (poly_int_rtx_p (offset_rtx, &offset)
+	  && STACK_POINTER_OFFSET == 0)
+	{
+	  unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
+	  if (offset_align == 0 || offset_align > STACK_BOUNDARY)
+	    offset_align = STACK_BOUNDARY;
+	  align = MAX (align, offset_align);
+	}
+    }
   else if (poly_int_rtx_p (offset_rtx, &offset))
     {
       align = least_bit_hwi (boundary);
@@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
      ultimate type, don't use that slot after entry.  We'll make another
      stack slot, if we need one.  */
   if (stack_parm
-      && ((STRICT_ALIGNMENT
-	   && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
+      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
+	   && targetm.slow_unaligned_access (data->nominal_mode,
+					     MEM_ALIGN (stack_parm)))
 	  || (data->nominal_type
 	      && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
 	      && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
@@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
 
       did_conversion = true;
     }
+  else if (MEM_P (data->entry_parm)
+	   && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+	      > MEM_ALIGN (data->entry_parm)
+	   && targetm.slow_unaligned_access (promoted_nominal_mode,
+					     MEM_ALIGN (data->entry_parm)))
+    {
+      enum insn_code icode = optab_handler (movmisalign_optab,
+					    promoted_nominal_mode);
+
+      if (icode != CODE_FOR_nothing)
+	emit_insn (GEN_FCN (icode) (parmreg, validated_mem));
+      else
+	rtl = parmreg = extract_bit_field (validated_mem,
+			GET_MODE_BITSIZE (promoted_nominal_mode), 0,
+			unsignedp, parmreg,
+			promoted_nominal_mode, VOIDmode, false, NULL);
+    }
   else
     emit_move_insn (parmreg, validated_mem);
 
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 1 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */
+/* { dg-final { scan-assembler-times "stm" 0 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 0 } } */
+/* { dg-final { scan-assembler-times "strd" 0 } } */
+/* { dg-final { scan-assembler-times "stm" 1 } } */


* Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-07-30 22:13       ` [PATCHv3] " Bernd Edlinger
@ 2019-07-31 13:17         ` Richard Earnshaw (lists)
  2019-08-01 11:19           ` Bernd Edlinger
  2019-08-02 13:11         ` Richard Biener
  1 sibling, 1 reply; 50+ messages in thread
From: Richard Earnshaw (lists) @ 2019-07-31 13:17 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Biener
  Cc: gcc-patches, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou



On 30/07/2019 21:51, Bernd Edlinger wrote:
> +/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */

This isn't going to work as-is, we test many combinations of the 
compiler, either with explicit dejagnu settings or with the compiler 
defaults and the dejagnu settings can't generally be overridden this way.

For -marm you require an effective-target of arm_arm_ok.  For ldrd, it 
should be enough to just require an effective-target of 
arm_ldrd_strd_ok, then you can .

I don't think we really care about any ABIs other than aapcs, so I'd 
just leave that off.  And as for setting the float-abi, I don't see 
anything in the tests that would require that, so that can probably be 
omitted as well.

I think with all this, you can then write something like

/* { dg-require-effective-target arm_arm_ok && arm_ldrd_strd_ok } */
/* { dg-options "-marm -mno-unaligned-access -O3" } */

But I haven't tested that, so you might need to fiddle with it a bit, 
especially the effective-target rule.

R.


* Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-07-31 13:17         ` Richard Earnshaw (lists)
@ 2019-08-01 11:19           ` Bernd Edlinger
  2019-08-02  9:10             ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-01 11:19 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Richard Biener
  Cc: gcc-patches, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou

[-- Attachment #1: Type: text/plain, Size: 1392 bytes --]

On 7/31/19 3:16 PM, Richard Earnshaw (lists) wrote:
> 
> 
> On 30/07/2019 21:51, Bernd Edlinger wrote:
>> +/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
> 
> This isn't going to work as-is, we test many combinations of the compiler, either with explicit dejagnu settings or with the compiler defaults and the dejagnu settings can't generally be overridden this way.
> 
> For -marm you require an effective-target of arm_arm_ok.  For ldrd, it should be enough to just require an effective-target of arm_ldrd_strd_ok, then you can .
> 
> I don't think we really care about any ABIs other than aapcs, so I'd just leave that off.  And as for setting the float-abi, I don't see anything in the tests that would require that, so that can probably be omitted as well.
> 
> I think with all this, you can then write something like
> 
> /* { dg-require-effective-target arm_arm_ok && arm_ldrd_strd_ok } */
> /* { dg-options "-marm -mno-unaligned-access -O3" } */
> 
> But I haven't tested that, so you might need to fiddle with it a bit, especially the effective-target rule.
> 

Okay, it seems we need two dg-require-effective-target rules for this to work,
as in the attached new version of the patch which I am currently boot-strapping.

Is it OK for trunk after successful boot-strap and reg-testing?


Thanks
Bernd.

[-- Attachment #2: patch-arm-align-abi.diff --]
[-- Type: text/x-patch; name="patch-arm-align-abi.diff", Size: 5028 bytes --]

2019-07-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* function.c (assign_parm_data_one): Remove unused data members.
	(assign_parm_find_stack_rtl): Use larger alignment when possible.
	(assign_parm_adjust_stack_rtl): Revise STRICT_ALIGNMENT check.
	(assign_parm_setup_reg): Handle misaligned stack arguments.

testsuite:
2019-07-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* gcc.target/arm/unaligned-argument-1.c: New test.
	* gcc.target/arm/unaligned-argument-2.c: New test.

Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 273767)
+++ gcc/function.c	(working copy)
@@ -2274,8 +2274,6 @@ struct assign_parm_data_one
   int partial;
   BOOL_BITFIELD named_arg : 1;
   BOOL_BITFIELD passed_pointer : 1;
-  BOOL_BITFIELD on_stack : 1;
-  BOOL_BITFIELD loaded_in_reg : 1;
 };
 
 /* A subroutine of assign_parms.  Initialize ALL.  */
@@ -2699,8 +2697,23 @@ assign_parm_find_stack_rtl (tree parm, struct assi
      intentionally forcing upward padding.  Otherwise we have to come
      up with a guess at the alignment based on OFFSET_RTX.  */
   poly_int64 offset;
-  if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
+  if (data->locate.where_pad == PAD_NONE || data->entry_parm)
     align = boundary;
+  else if (data->locate.where_pad == PAD_UPWARD)
+    {
+      align = boundary;
+      /* If the argument offset is actually more aligned than the nominal
+	 stack slot boundary, take advantage of that excess alignment.
+	 Don't make any assumptions if STACK_POINTER_OFFSET is in use.  */
+      if (poly_int_rtx_p (offset_rtx, &offset)
+	  && STACK_POINTER_OFFSET == 0)
+	{
+	  unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
+	  if (offset_align == 0 || offset_align > STACK_BOUNDARY)
+	    offset_align = STACK_BOUNDARY;
+	  align = MAX (align, offset_align);
+	}
+    }
   else if (poly_int_rtx_p (offset_rtx, &offset))
     {
       align = least_bit_hwi (boundary);
@@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
      ultimate type, don't use that slot after entry.  We'll make another
      stack slot, if we need one.  */
   if (stack_parm
-      && ((STRICT_ALIGNMENT
-	   && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
+      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
+	   && targetm.slow_unaligned_access (data->nominal_mode,
+					     MEM_ALIGN (stack_parm)))
 	  || (data->nominal_type
 	      && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
 	      && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
@@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
 
       did_conversion = true;
     }
+  else if (MEM_P (data->entry_parm)
+	   && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+	      > MEM_ALIGN (data->entry_parm)
+	   && targetm.slow_unaligned_access (promoted_nominal_mode,
+					     MEM_ALIGN (data->entry_parm)))
+    {
+      enum insn_code icode = optab_handler (movmisalign_optab,
+					    promoted_nominal_mode);
+
+      if (icode != CODE_FOR_nothing)
+	emit_insn (GEN_FCN (icode) (parmreg, validated_mem));
+      else
+	rtl = parmreg = extract_bit_field (validated_mem,
+			GET_MODE_BITSIZE (promoted_nominal_mode), 0,
+			unsignedp, parmreg,
+			promoted_nominal_mode, VOIDmode, false, NULL);
+    }
   else
     emit_move_insn (parmreg, validated_mem);
 
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 1 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */
+/* { dg-final { scan-assembler-times "stm" 0 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 0 } } */
+/* { dg-final { scan-assembler-times "strd" 0 } } */
+/* { dg-final { scan-assembler-times "stm" 1 } } */
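
For readers comparing the two tests: the only difference is the extra `int e`, which presumably moves `struct s` from stack offset 0 to stack offset 4. The following is a minimal sketch of the alignment arithmetic in the PAD_UPWARD branch of the patch, not GCC code. The constants (32-bit parameter boundary, 64-bit STACK_BOUNDARY, 8 bits per unit) are assumed values for an AAPCS ARM target, and `known_alignment_bytes` and `slot_align_bits` are hypothetical stand-ins for the real helpers.

```c
#include <stdint.h>

/* Assumed target constants for illustration (AAPCS-style ARM).  */
enum { BITS_PER_UNIT = 8, STACK_BOUNDARY = 64, PARM_BOUNDARY = 32 };

/* Hypothetical stand-in for known_alignment on a constant offset:
   the largest power of two dividing OFFSET, or 0 when OFFSET is 0.  */
static unsigned int
known_alignment_bytes (uint64_t offset)
{
  return (unsigned int) (offset & (~offset + 1));
}

/* Mirrors the PAD_UPWARD branch of the patch: start from the ABI
   boundary, then use any excess alignment implied by the offset,
   capped at STACK_BOUNDARY.  All values are in bits.  */
static unsigned int
slot_align_bits (uint64_t offset, unsigned int boundary)
{
  unsigned int align = boundary;
  unsigned int offset_align = known_alignment_bytes (offset) * BITS_PER_UNIT;

  if (offset_align == 0 || offset_align > STACK_BOUNDARY)
    offset_align = STACK_BOUNDARY;
  return align > offset_align ? align : offset_align;
}
```

Under these assumptions, `slot_align_bits (0, PARM_BOUNDARY)` yields 64, so a doubleword load is permissible in the first test, while `slot_align_bits (4, PARM_BOUNDARY)` stays at 32, which matches the second test expecting no ldrd/strd.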


* Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-01 11:19           ` Bernd Edlinger
@ 2019-08-02  9:10             ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 50+ messages in thread
From: Richard Earnshaw (lists) @ 2019-08-02  9:10 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Biener
  Cc: gcc-patches, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou

On 01/08/2019 12:19, Bernd Edlinger wrote:
> On 7/31/19 3:16 PM, Richard Earnshaw (lists) wrote:
>>
>>
>> On 30/07/2019 21:51, Bernd Edlinger wrote:
>>> +/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
>>
>> This isn't going to work as-is, we test many combinations of the compiler, either with explicit dejagnu settings or with the compiler defaults and the dejagnu settings can't generally be overridden this way.
>>
>> For -marm you require an effective-target of arm_arm_ok.  For ldrd, it should be enough to just require an effective-target of arm_ldrd_strd_ok, then you can .
>>
>> I don't think we really care about any ABIs other than aapcs, so I'd just leave that off.  And as for setting the float-abi, I don't see anything in the tests that would require that, so that can probably be omitted as well.
>>
>> I think with all this, you can then write something like
>>
>> /* { dg-require-effective-target arm_arm_ok && arm_ldrd_strd_ok } */
>> /* { dg-options "-marm -mno-unaligned-access -O3" } */
>>
>> But I haven't tested that, so you might need to fiddle with it a bit, especially the effective-target rule.
>>
> 
> Okay, it seems we need two dg-require-effective-target rules for this to work,
> as in the attached new version of the patch which I am currently boot-strapping.
> 
> Is it OK for trunk after successful boot-strap and reg-testing?
> 

The tests are OK.  If the match rules for the stm instruction turn out 
to cause problems I think we can just drop them without materially 
weakening the tests.  But let's wait and see on that.

I'll leave the mid-end bits to Richi, I'm not familiar with that code.

R.

> 
> Thanks
> Bernd.
> 


* Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-07-30 22:13       ` [PATCHv3] " Bernd Edlinger
  2019-07-31 13:17         ` Richard Earnshaw (lists)
@ 2019-08-02 13:11         ` Richard Biener
  2019-08-02 19:01           ` Bernd Edlinger
  1 sibling, 1 reply; 50+ messages in thread
From: Richard Biener @ 2019-08-02 13:11 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou

On Tue, 30 Jul 2019, Bernd Edlinger wrote:

> Hi Richard,
> 
> it is already a while ago, but I had not found time to continue
> with this patch until now.
> 
> I think I have now a better solution, which properly addresses your
> comments below.
> 
> On 3/25/19 9:41 AM, Richard Biener wrote:
> > On Fri, 22 Mar 2019, Bernd Edlinger wrote:
> > 
> >> On 3/21/19 12:15 PM, Richard Biener wrote:
> >>> On Sun, 10 Mar 2019, Bernd Edlinger wrote:
> >>> Finally...
> >>>
> >>> Index: gcc/function.c
> >>> ===================================================================
> >>> --- gcc/function.c      (revision 269264)
> >>> +++ gcc/function.c      (working copy)
> >>> @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
> >>>    if (DECL_MODE (decl) == BLKmode)
> >>>      return false;
> >>>
> >>> +  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
> >>> +      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
> >>> +      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
> >>> +        > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
> >>> +    return false;
> >>> +
> >>>    /* If -ffloat-store specified, don't put explicit float variables
> >>>       into registers.  */
> >>>    /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
> >>>
> >>> I wonder if it is necessary to look at DECL_INCOMING_RTL here
> >>> and why such RTL may not exist?  That is, iff DECL_INCOMING_RTL
> >>> doesn't exist then shouldn't we return false for safety reasons?
> >>>
> 
> You are right, it is not possible to return different results from
> use_register_for_decl before vs. after incoming RTL is assigned.
> That hits an assertion in set_rtl.
> 
> This hunk is gone now, instead I changed assign_parm_setup_reg
> to use movmisalign optab and/or extract_bit_field if misaligned
> entry_parm is to be assigned in a register.
> 
> I have no test coverage for the movmisalign optab though, so I
> rely on your code review for that part.

It looks OK.  I tried to make it trigger on the following on
i?86 with -msse2:

typedef int v4si __attribute__((vector_size (16)));

struct S { v4si v; } __attribute__((packed));

v4si foo (struct S s)
{
  return s.v;
}

but nowadays x86 seems to be happy with regular moves operating on
unaligned memory, using unaligned moves where necessary.

(insn 5 2 8 2 (set (reg:V4SI 82 [ _2 ])
        (mem/c:V4SI (reg/f:SI 16 argp) [2 s.v+0 S16 A32])) "t.c":7:11 1229 {movv4si_internal}
     (nil))

and with GCC 4.8 we ended up with the following expansion which is
also correct.

(insn 2 4 3 2 (set (subreg:V16QI (reg/v:V4SI 61 [ s ]) 0)
        (unspec:V16QI [
                (mem/c:V16QI (reg/f:SI 16 argp) [0 s+0 S16 A32])
            ] UNSPEC_LOADU)) t.c:6 1164 {sse2_loaddqu}
     (nil))

So it seems it has been too long and I don't remember what is
special with arm that it doesn't work...  it possibly simply
trusts GET_MODE_ALIGNMENT, never looking at MEM_ALIGN which
I think is OK-ish?
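
As a source-level analogue of the GET_MODE_ALIGNMENT versus MEM_ALIGN mismatch discussed here, the sketch below compares the alignment the vector type asks for with what the packed wrapper guarantees. It assumes a compiler supporting the GNU `vector_size` and `packed` attributes and C11 `_Alignof`; the helper names `vector_align` and `packed_align` are made up for illustration.

```c
#include <stddef.h>

typedef int v4si __attribute__((vector_size (16)));

struct S { v4si v; } __attribute__((packed));

/* Alignment the vector type asks for on its own, in bytes.  */
static size_t
vector_align (void)
{
  return _Alignof (v4si);
}

/* Alignment actually guaranteed for the packed wrapper, in bytes.  */
static size_t
packed_align (void)
{
  return _Alignof (struct S);
}
```

The mode of `s.v` still wants the full vector alignment, while the memory holding it only promises the smaller one, so an expander that looks at the mode alone can pick an instruction the memory does not justify.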

> >>> Similarly the very same issue should exist on x86_64 which is
> >>> !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
> >>> alignment on the caller side.  So the STRICT_ALIGNMENT check is
> >>> a wrong one.
> >>>
> >>
> >> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
> >> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
> >> is always 32-bit align on the DImode memory.  The x86_64 vector instructions
> >> would look at MEM_ALIGN and do the right thing, yes?
> > 
> > No, they need to use the movmisalign optab and end up with UNSPECs
> > for example.
> Ah, thanks, now I see.
> 
> >> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
> >> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
> >> does not even have to look at MEM_ALIGN except in the mov_misalign_optab,
> >> right?
> > 
> > Yes, I think we never loosened that.  Note that RTL expansion has to
> > fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
> > specifies that x86 is strict-align wrt vector modes.
> > 
> 
> Yes I agree, the code would be incorrect for x86 as well when the movmisalign_optab
> is not used.  So I invoke the movmisalign optab if available and if not fall
> back to extract_bit_field.  As in assign_parm_setup_stack, assign_parm_setup_reg
> assumes that data->promoted_mode != data->nominal_mode does not happen with
> misaligned stack slots.
> 
> 
> Attached is the v3 of my patch.
> 
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
> 
> Is it OK for trunk?

Few comments.

@@ -2274,8 +2274,6 @@ struct assign_parm_data_one
   int partial;
   BOOL_BITFIELD named_arg : 1;
   BOOL_BITFIELD passed_pointer : 1;
-  BOOL_BITFIELD on_stack : 1;
-  BOOL_BITFIELD loaded_in_reg : 1;
 };

 /* A subroutine of assign_parms.  Initialize ALL.  */

independently OK.

@@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
      ultimate type, don't use that slot after entry.  We'll make another
      stack slot, if we need one.  */
   if (stack_parm
-      && ((STRICT_ALIGNMENT
-          && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
+      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
+          && targetm.slow_unaligned_access (data->nominal_mode,
+                                            MEM_ALIGN (stack_parm)))
          || (data->nominal_type
              && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
              && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))

looks like something we should have as a separate commit as well.  It
also looks obvious to me.

@@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all

       did_conversion = true;
     }
+  else if (MEM_P (data->entry_parm)
+          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+             > MEM_ALIGN (data->entry_parm)

we arrive here by-passing

  else if (need_conversion)
    {
      /* We did not have an insn to convert directly, or the sequence
         generated appeared unsafe.  We must first copy the parm to a
         pseudo reg, and save the conversion until after all
         parameters have been moved.  */

      int save_tree_used;
      rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));

      emit_move_insn (tempreg, validated_mem);

but this move instruction is invalid in the same way as the case
you fix, no?  So wouldn't it be better to do

  if (moved)
    /* Nothing to do.  */
    ;
  else
    {
       if (unaligned)
         ...
       else
         emit_move_insn (...);

       if (need_conversion)
 ....
    }

?  Hopefully whatever "moved" things in the if (moved) case did
it correctly.
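
The ordering suggested above can be modelled outside GCC as a small decision function. This is only a toy sketch: `load_kind`, `struct plan` and `plan_parm_load` are hypothetical names, and the real code emits RTL rather than returning a plan.

```c
#include <stdbool.h>

/* Hypothetical labels for how an incoming parameter is fetched;
   none of these names exist in GCC.  */
enum load_kind { LOAD_NONE, LOAD_PLAIN, LOAD_MISALIGNED };

struct plan
{
  enum load_kind load;   /* how the value is fetched from the stack slot */
  bool convert_later;    /* whether a deferred conversion follows */
};

/* Sketch of the proposed ordering: if the direct-conversion path
   already moved the value, do nothing; otherwise choose the load by
   alignment first and only then consider the conversion.  */
static struct plan
plan_parm_load (bool moved, bool unaligned, bool need_conversion)
{
  struct plan p = { LOAD_NONE, false };

  if (moved)
    return p;
  p.load = unaligned ? LOAD_MISALIGNED : LOAD_PLAIN;
  p.convert_later = need_conversion;
  return p;
}
```

The point of the shape is that the alignment decision no longer depends on whether a conversion is needed, so the need_conversion path cannot bypass the misaligned handling.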

Can you check whether your patch does anything to the x86 testcase
posted above?

I'm not very familiar with this code so I'm leaving actual approval
to somebody else.  Still hope the comments were helpful.

Thanks,
Richard.


* Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-02 13:11         ` Richard Biener
@ 2019-08-02 19:01           ` Bernd Edlinger
  2019-08-08 14:20             ` [PATCHv4] " Bernd Edlinger
  2019-08-14 11:56             ` [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544) Richard Biener
  0 siblings, 2 replies; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-02 19:01 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

On 8/2/19 3:11 PM, Richard Biener wrote:
> On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> 
>>
>> I have no test coverage for the movmisalign optab though, so I
>> rely on your code review for that part.
> 
> It looks OK.  I tried to make it trigger on the following on
> i?86 with -msse2:
> 
> typedef int v4si __attribute__((vector_size (16)));
> 
> struct S { v4si v; } __attribute__((packed));
> 
> v4si foo (struct S s)
> {
>   return s.v;
> }
> 

Hmm, the entry_parm needs to be a MEM_P and an unaligned one.
So the test case could be made to trigger it this way:

typedef int v4si __attribute__((vector_size (16)));

struct S { v4si v; } __attribute__((packed));

int t;
v4si foo (struct S a, struct S b, struct S c, struct S d,
          struct S e, struct S f, struct S g, struct S h,
          int i, int j, int k, int l, int m, int n,
          int o, struct S s)
{
  t = o;
  return s.v;
}

However the code path is still not reached, since targetm.slow_unaligned_access
is always FALSE, which is probably a flaw in my patch.

So I think,

+  else if (MEM_P (data->entry_parm)
+          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+             > MEM_ALIGN (data->entry_parm)
+          && targetm.slow_unaligned_access (promoted_nominal_mode,
+                                            MEM_ALIGN (data->entry_parm)))

should probably better be

+  else if (MEM_P (data->entry_parm)
+          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+             > MEM_ALIGN (data->entry_parm)
+        && (((icode = optab_handler (movmisalign_optab, promoted_nominal_mode))
+             != CODE_FOR_nothing)
+            || targetm.slow_unaligned_access (promoted_nominal_mode,
+                                              MEM_ALIGN (data->entry_parm))))

Right?

Then the modified test case would use the movmisalign optab.
However nothing changes in the end, since the i386 back-end is used to working
around the middle end not using the movmisalign optab when it should do so.
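
The difference between the two conditions can be written as a pair of predicates. This is a sketch with hypothetical function names; the boolean parameters stand in for the `optab_handler (movmisalign_optab, ...)` lookup and the `targetm.slow_unaligned_access` hook, and alignments are in bits.

```c
#include <stdbool.h>

/* Condition as posted in v3: the special path is taken only when the
   target reports unaligned access as slow.  */
static bool
takes_misaligned_path_v3 (unsigned int mode_align, unsigned int mem_align,
			  bool has_movmisalign, bool slow_unaligned)
{
  (void) has_movmisalign;	/* Not consulted by the v3 condition.  */
  return mode_align > mem_align && slow_unaligned;
}

/* Revised condition: an available movmisalign pattern is sufficient.  */
static bool
takes_misaligned_path_revised (unsigned int mode_align,
			       unsigned int mem_align,
			       bool has_movmisalign, bool slow_unaligned)
{
  return mode_align > mem_align && (has_movmisalign || slow_unaligned);
}
```

For an x86-like case (128-bit mode, 32-bit MEM_ALIGN, movmisalign available, access not slow) only the revised predicate is true; for an ARM-like DImode case (64 versus 32 bits, no movmisalign, access slow) both are true.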

I wonder if I should try to add a gcc_checking_assert to the mov<mode> expand
patterns that the memory is properly aligned ?


> but nowadays x86 seems to be happy with regular moves operating on
> unaligned memory, using unaligned moves where necessary.
> 
> (insn 5 2 8 2 (set (reg:V4SI 82 [ _2 ])
>         (mem/c:V4SI (reg/f:SI 16 argp) [2 s.v+0 S16 A32])) "t.c":7:11 1229 {movv4si_internal}
>      (nil))
> 
> and with GCC 4.8 we ended up with the following expansion which is
> also correct.
> 
> (insn 2 4 3 2 (set (subreg:V16QI (reg/v:V4SI 61 [ s ]) 0)
>         (unspec:V16QI [
>                 (mem/c:V16QI (reg/f:SI 16 argp) [0 s+0 S16 A32])
>             ] UNSPEC_LOADU)) t.c:6 1164 {sse2_loaddqu}
>      (nil))
> 
> So it seems it has been too long and I don't remember what is
> special with arm that it doesn't work...  it possibly simply
> trusts GET_MODE_ALIGNMENT, never looking at MEM_ALIGN which
> I think is OK-ish?
> 

Yes, that is what Richard said as well.

>>>>> Similarly the very same issue should exist on x86_64 which is
>>>>> !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
>>>>> alignment on the caller side.  So the STRICT_ALIGNMENT check is
>>>>> a wrong one.
>>>>>
>>>>
>>>> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
>>>> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
>>>> is always 32-bit align on the DImode memory.  The x86_64 vector instructions
>>>> would look at MEM_ALIGN and do the right thing, yes?
>>>
>>> No, they need to use the movmisalign optab and end up with UNSPECs
>>> for example.
>> Ah, thanks, now I see.
>>
>>>> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
>>>> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
>>>> does not even have to look at MEM_ALIGN except in the mov_misalign_optab,
>>>> right?
>>>
>>> Yes, I think we never loosened that.  Note that RTL expansion has to
>>> fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
>>> specifies that x86 is strict-align wrt vector modes.
>>>
>>
>> Yes I agree, the code would be incorrect for x86 as well when the movmisalign_optab
>> is not used.  So I invoke the movmisalign optab if available and if not fall
>> back to extract_bit_field.  As in assign_parm_setup_stack, assign_parm_setup_reg
>> assumes that data->promoted_mode != data->nominal_mode does not happen with
>> misaligned stack slots.
>>
>>
>> Attached is the v3 of my patch.
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
>>
>> Is it OK for trunk?
> 
> Few comments.
> 
> @@ -2274,8 +2274,6 @@ struct assign_parm_data_one
>    int partial;
>    BOOL_BITFIELD named_arg : 1;
>    BOOL_BITFIELD passed_pointer : 1;
> -  BOOL_BITFIELD on_stack : 1;
> -  BOOL_BITFIELD loaded_in_reg : 1;
>  };
> 
>  /* A subroutine of assign_parms.  Initialize ALL.  */
> 
> independently OK.
> 
> @@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
>       ultimate type, don't use that slot after entry.  We'll make another
>       stack slot, if we need one.  */
>    if (stack_parm
> -      && ((STRICT_ALIGNMENT
> -          && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
> +      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
> +          && targetm.slow_unaligned_access (data->nominal_mode,
> +                                            MEM_ALIGN (stack_parm)))
>           || (data->nominal_type
>               && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
>               && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
> 
> looks like something we should have as a separate commit as well.  It
> also looks obvious to me.
> 

Okay, committed as two separate commits: r274023 & r274025

> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> 
>        did_conversion = true;
>      }
> +  else if (MEM_P (data->entry_parm)
> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +             > MEM_ALIGN (data->entry_parm)
> 
> we arrive here by-passing
> 
>   else if (need_conversion)
>     {
>       /* We did not have an insn to convert directly, or the sequence
>          generated appeared unsafe.  We must first copy the parm to a
>          pseudo reg, and save the conversion until after all
>          parameters have been moved.  */
> 
>       int save_tree_used;
>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> 
>       emit_move_insn (tempreg, validated_mem);
> 
> but this move instruction is invalid in the same way as the case
> you fix, no?  So wouldn't it be better to do
> 

We could do that, but I supposed that there must be a reason why
assign_parm_setup_stack gets away with that same:

  if (data->promoted_mode != data->nominal_mode)
    {
      /* Conversion is required.  */
      rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));

      emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));


So either some back-ends are too permissive with us,
or there is a reason why promoted_mode != nominal_mode
does not happen together with unaligned entry_parm.
In a way that would be a rather unusual ABI.

>   if (moved)
>     /* Nothing to do.  */
>     ;
>   else
>     {
>        if (unaligned)
>          ...
>        else
>          emit_move_insn (...);
> 
>        if (need_conversion)
>  ....
>     }
> 
> ?  Hopefully whatever "moved" things in the if (moved) case did
> it correctly.
> 

It wouldn't.  It uses gen_extend_insn; would that be expected to
work with unaligned memory?

> Can you check whether your patch does anything to the x86 testcase
> posted above?
> 

Thanks, it might help to have at least a test case where the pattern
is expanded, even if it does not change anything.

> I'm not very familiar with this code so I'm leaving actual approval
> to somebody else.  Still hope the comments were helpful.
> 

Yes they are, thanks a lot.


Thanks
Bernd.


* [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-02 19:01           ` Bernd Edlinger
@ 2019-08-08 14:20             ` Bernd Edlinger
  2019-08-14 10:54               ` [PING] " Bernd Edlinger
  2019-08-14 12:27               ` Richard Biener
  2019-08-14 11:56             ` [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544) Richard Biener
  1 sibling, 2 replies; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-08 14:20 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 8179 bytes --]

On 8/2/19 9:01 PM, Bernd Edlinger wrote:
> On 8/2/19 3:11 PM, Richard Biener wrote:
>> On Tue, 30 Jul 2019, Bernd Edlinger wrote:
>>
>>>
>>> I have no test coverage for the movmisalign optab though, so I
>>> rely on your code review for that part.
>>
>> It looks OK.  I tried to make it trigger on the following on
>> i?86 with -msse2:
>>
>> typedef int v4si __attribute__((vector_size (16)));
>>
>> struct S { v4si v; } __attribute__((packed));
>>
>> v4si foo (struct S s)
>> {
>>   return s.v;
>> }
>>
> 
> Hmm, the entry_parm needs to be a MEM_P and an unaligned one.
> So the test case could be made to trigger it this way:
> 
> typedef int v4si __attribute__((vector_size (16)));
> 
> struct S { v4si v; } __attribute__((packed));
> 
> int t;
> v4si foo (struct S a, struct S b, struct S c, struct S d,
>           struct S e, struct S f, struct S g, struct S h,
>           int i, int j, int k, int l, int m, int n,
>           int o, struct S s)
> {
>   t = o;
>   return s.v;
> }
> 

Ah, I realized that there are already a couple of very similar
test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c,
gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c,
which also manage to execute the movmisalign code with the latest patch
version.  So I thought that it is not necessary to add another one.

> However the code path is still not reached, since targetm.slow_unaligned_access
> is always FALSE, which is probably a flaw in my patch.
> 
> So I think,
> 
> +  else if (MEM_P (data->entry_parm)
> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +             > MEM_ALIGN (data->entry_parm)
> +          && targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                            MEM_ALIGN (data->entry_parm)))
> 
> should probably better be
> 
> +  else if (MEM_P (data->entry_parm)
> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +             > MEM_ALIGN (data->entry_parm)
> +        && (((icode = optab_handler (movmisalign_optab, promoted_nominal_mode))
> +             != CODE_FOR_nothing)
> +            || targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                              MEM_ALIGN (data->entry_parm))))
> 
> Right?
> 
> Then the modified test case would use the movmisalign optab.
> However nothing changes in the end, since the i386 back-end is used to working
> around the middle end not using the movmisalign optab when it should do so.
> 

I prefer the second form of the check, as it offers more test coverage,
and is probably more correct than the former.

Note there are more variations of this misalign check in expr.c,
some of which are somewhat odd, like the expansion of MEM_REF and VIEW_CONVERT_EXPR:

            && mode != BLKmode
            && align < GET_MODE_ALIGNMENT (mode))
          {
            if ((icode = optab_handler (movmisalign_optab, mode))
                != CODE_FOR_nothing)
              [...]
            else if (targetm.slow_unaligned_access (mode, align))
              temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
                                        0, TYPE_UNSIGNED (TREE_TYPE (exp)),
                                        (modifier == EXPAND_STACK_PARM
                                         ? NULL_RTX : target),
                                        mode, mode, false, alt_rtl);

I wonder whether they are correct this way: why shouldn't we use the movmisalign
optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESS?


> I wonder if I should try to add a gcc_checking_assert to the mov<mode> expand
> patterns that the memory is properly aligned ?
>

Wow, that was a really exciting bug-hunt with those assertions around...

>> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
>>
>>        did_conversion = true;
>>      }
>> +  else if (MEM_P (data->entry_parm)
>> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
>> +             > MEM_ALIGN (data->entry_parm)
>>
>> we arrive here by-passing
>>
>>   else if (need_conversion)
>>     {
>>       /* We did not have an insn to convert directly, or the sequence
>>          generated appeared unsafe.  We must first copy the parm to a
>>          pseudo reg, and save the conversion until after all
>>          parameters have been moved.  */
>>
>>       int save_tree_used;
>>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
>>
>>       emit_move_insn (tempreg, validated_mem);
>>
>> but this move instruction is invalid in the same way as the case
>> you fix, no?  So wouldn't it be better to do
>>
> 
> We could do that, but I supposed that there must be a reason why
> assign_parm_setup_stack gets away with that same:
> 
>   if (data->promoted_mode != data->nominal_mode)
>     {
>       /* Conversion is required.  */
>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> 
>       emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));
> 
> 
> So either some back-ends are too permissive with us,
> or there is a reason why promoted_mode != nominal_mode
> does not happen together with unaligned entry_parm.
> In a way that would be a rather unusual ABI.
> 

To find out if that ever happens I added a couple of checking
assertions in the arm mov<mode> expand patterns.

So far the assertions did (almost) always hold, so it is likely not
necessary to fiddle with all those naive move instructions here.

So my gut feeling is, leave those places alone until there is a reason
for changing them.
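
A user-level analogue of such a checking assertion, for illustration only: the sketch below tests whether a pointer meets a required alignment and provides an alignment-agnostic fallback load. `is_aligned` and `load_u64_any` are hypothetical helpers and have nothing to do with the actual arm mov<mode> expanders.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True when P is aligned to ALIGN bytes; ALIGN must be a power of two.  */
static bool
is_aligned (const void *p, size_t align)
{
  return ((uintptr_t) p & (align - 1)) == 0;
}

/* Alignment-agnostic 64-bit load; memcpy lets the compiler choose
   instructions that are valid for any address.  */
static uint64_t
load_u64_any (const void *p)
{
  uint64_t v;

  memcpy (&v, p, sizeof v);
  return v;
}
```

A checked direct load would assert `is_aligned (p, 8)` before dereferencing a `uint64_t` pointer, and any caller that cannot guarantee this would use `load_u64_any` instead, much as the middle end is expected to use movmisalign or extract_bit_field.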

However the assertion in movsi triggered a couple times in the
ada testsuite due to expand_builtin_init_descriptor using a
BLKmode MEM rtx, which is only 8-bit aligned.  So, I set the
ptr_mode alignment there explicitly.

Several struct-layout-1.dg testcases tripped over misaligned
complex_cst constants, fixed by varasm.c (align_variable).
This is likely a wrong-code bug, because misaligned complex
constants are expanded to a misaligned MEM_REF, but the
expansion cannot handle misaligned constants, only packed
structure fields.

Furthermore gcc.dg/Warray-bounds-33.c was fixed by the
change in expr.c (expand_expr_real_1).  Certainly it is invalid
to read memory at a function address, but it should not ICE.
The problem here is that the MEM_REF has no valid MEM_ALIGN: it looks
like A32, so the misaligned code path is not taken, but it is then
set to A8 below, and we hit an ICE if the result is used:

        /* Don't set memory attributes if the base expression is
           SSA_NAME that got expanded as a MEM.  In that case, we should
           just honor its original memory attributes.  */
        if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0))
          set_mem_attributes (op0, exp, 0);


Finally gcc.dg/torture/pr48493.c required the change
in assign_parm_setup_stack.  This is just not using the
correct MEM_ALIGN attribute value, while the memory is
actually aligned.  Note that set_mem_attributes does not
always preserve the MEM_ALIGN of the ref, since:

  /* Default values from pre-existing memory attributes if present.  */
  refattrs = MEM_ATTRS (ref);
  if (refattrs)
    {
      /* ??? Can this ever happen?  Calling this routine on a MEM that
         already carries memory attributes should probably be invalid.  */
      attrs.expr = refattrs->expr;
      attrs.offset_known_p = refattrs->offset_known_p;
      attrs.offset = refattrs->offset;
      attrs.size_known_p = refattrs->size_known_p;
      attrs.size = refattrs->size;
      attrs.align = refattrs->align;
    }

but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT,
the MEM_ATTRS are zero, and a smaller alignment may result.
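
A simplified model of that effect, and of the save/restore done in assign_parm_setup_stack, might look like the sketch below.  Everything in it is hypothetical (`toy_mem`, `toy_set_mem_align`, `toy_set_mem_attributes` are not GCC functions), it tracks only the alignment attribute, and it assumes a strict-alignment target where a MEM without attributes reports the mode alignment and where recomputing attributes for an object starts from 8 bits when no attributes were present.

```c
#include <stddef.h>

/* Toy model only; none of these names exist in GCC.  */
struct toy_attrs
{
  unsigned int align_bits;
};

struct toy_mem
{
  unsigned int mode_align_bits; /* models GET_MODE_ALIGNMENT of the mode */
  struct toy_attrs *attrs;      /* models MEM_ATTRS; NULL = mode defaults */
  struct toy_attrs storage;     /* backing store for non-default attrs */
};

/* Models MEM_ALIGN: without attributes the mode default applies.  */
static unsigned int
toy_mem_align (const struct toy_mem *m)
{
  return m->attrs ? m->attrs->align_bits : m->mode_align_bits;
}

/* Models set_mem_align: attributes equal to the default are dropped.  */
static void
toy_set_mem_align (struct toy_mem *m, unsigned int align_bits)
{
  m->storage.align_bits = align_bits;
  m->attrs = (align_bits == m->mode_align_bits) ? NULL : &m->storage;
}

/* Models set_mem_attributes for an object: keep a pre-existing alignment,
   otherwise start from 8 bits (assumption) and use the object's own.  */
static void
toy_set_mem_attributes (struct toy_mem *m, unsigned int object_align_bits)
{
  unsigned int base = m->attrs ? m->attrs->align_bits : 8;
  toy_set_mem_align (m, base > object_align_bits ? base : object_align_bits);
}
```

With a 64-bit mode, a slot set to exactly 64 bits ends up without attributes, so a later toy_set_mem_attributes with a 32-bit object leaves it at 32 bits; reading the alignment before that call and setting it again afterwards restores 64 bits.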

Well, with those checks in place it should now be a lot harder to generate
invalid code on STRICT_ALIGNMENT targets without running into an ICE.

Attached is the latest version of my arm alignment patch.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.

[-- Attachment #2: patch-arm-align-abi.diff --]
[-- Type: text/x-patch; name="patch-arm-align-abi.diff", Size: 12166 bytes --]

2019-08-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* builtins.c (expand_builtin_init_descriptor): Set memory alignment.
	* expr.c (expand_expr_real_1): Handle FUNCTION_DECL as unaligned.
	* function.c (assign_parm_find_stack_rtl): Use larger alignment
	when possible.
	(assign_parm_setup_reg): Handle misaligned stack arguments.
	(assign_parm_setup_stack): Allocate properly aligned stack slots.
	* varasm.c (align_variable): Align constants of misaligned types.
	* config/arm/arm.md (movdi, movsi, movhi, movhf, movsf, movdf): Check
	strict alignment restrictions on memory addresses.
	* config/arm/neon.md (movti, mov<VSTRUCT>, mov<VH>): Likewise.
	* config/arm/vec-common.md (mov<VALL>): Likewise.

testsuite:
2019-08-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* gcc.target/arm/unaligned-argument-1.c: New test.
	* gcc.target/arm/unaligned-argument-2.c: New test.

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 274168)
+++ gcc/builtins.c	(working copy)
@@ -5756,6 +5756,7 @@ expand_builtin_init_descriptor (tree exp)
   r_descr = expand_normal (t_descr);
   m_descr = gen_rtx_MEM (BLKmode, r_descr);
   MEM_NOTRAP_P (m_descr) = 1;
+  set_mem_align (m_descr, GET_MODE_ALIGNMENT (ptr_mode));
 
   r_func = expand_normal (t_func);
   r_chain = expand_normal (t_chain);
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 274168)
+++ gcc/config/arm/arm.md	(working copy)
@@ -5824,6 +5824,12 @@
 	(match_operand:DI 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (DImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (DImode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
@@ -6000,6 +6006,12 @@
   {
   rtx base, offset, tmp;
 
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (SImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (SImode));
   if (TARGET_32BIT || TARGET_HAVE_MOVT)
     {
       /* Everything except mem = const or mem = mem can be done easily.  */
@@ -6489,6 +6501,12 @@
 	(match_operand:HI 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (HImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (HImode));
   if (TARGET_ARM)
     {
       if (can_create_pseudo_p ())
@@ -6898,6 +6916,12 @@
 	(match_operand:HF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (HFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (HFmode));
   if (TARGET_32BIT)
     {
       if (MEM_P (operands[0]))
@@ -6962,6 +6986,12 @@
 	(match_operand:SF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (SFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (SFmode));
   if (TARGET_32BIT)
     {
       if (MEM_P (operands[0]))
@@ -7057,6 +7087,12 @@
 	(match_operand:DF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (DFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (DFmode));
   if (TARGET_32BIT)
     {
       if (MEM_P (operands[0]))
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 274168)
+++ gcc/config/arm/neon.md	(working copy)
@@ -127,6 +127,12 @@
 	(match_operand:TI 1 "general_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (TImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (TImode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
@@ -139,6 +145,12 @@
 	(match_operand:VSTRUCT 1 "general_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
@@ -151,6 +163,12 @@
 	(match_operand:VH 1 "s_register_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
Index: gcc/config/arm/vec-common.md
===================================================================
--- gcc/config/arm/vec-common.md	(revision 274168)
+++ gcc/config/arm/vec-common.md	(working copy)
@@ -26,6 +26,12 @@
   "TARGET_NEON
    || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 274168)
+++ gcc/expr.c	(working copy)
@@ -10796,6 +10796,14 @@ expand_expr_real_1 (tree exp, rtx target, machine_
 	    MEM_VOLATILE_P (op0) = 1;
 	  }
 
+	if (MEM_P (op0) && TREE_CODE (tem) == FUNCTION_DECL)
+	  {
+	    if (op0 == orig_op0)
+	      op0 = copy_rtx (op0);
+
+	    set_mem_align (op0, BITS_PER_UNIT);
+	  }
+
 	/* In cases where an aligned union has an unaligned object
 	   as a field, we might be extracting a BLKmode value from
 	   an integer-mode (e.g., SImode) object.  Handle this case
Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 274168)
+++ gcc/function.c	(working copy)
@@ -2697,8 +2697,23 @@ assign_parm_find_stack_rtl (tree parm, struct assi
      intentionally forcing upward padding.  Otherwise we have to come
      up with a guess at the alignment based on OFFSET_RTX.  */
   poly_int64 offset;
-  if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
+  if (data->locate.where_pad == PAD_NONE || data->entry_parm)
     align = boundary;
+  else if (data->locate.where_pad == PAD_UPWARD)
+    {
+      align = boundary;
+      /* If the argument offset is actually more aligned than the nominal
+	 stack slot boundary, take advantage of that excess alignment.
+	 Don't make any assumptions if STACK_POINTER_OFFSET is in use.  */
+      if (poly_int_rtx_p (offset_rtx, &offset)
+	  && STACK_POINTER_OFFSET == 0)
+	{
+	  unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
+	  if (offset_align == 0 || offset_align > STACK_BOUNDARY)
+	    offset_align = STACK_BOUNDARY;
+	  align = MAX (align, offset_align);
+	}
+    }
   else if (poly_int_rtx_p (offset_rtx, &offset))
     {
       align = least_bit_hwi (boundary);
@@ -3127,6 +3142,7 @@ assign_parm_setup_reg (struct assign_parm_data_all
   int unsignedp = TYPE_UNSIGNED (TREE_TYPE (parm));
   bool did_conversion = false;
   bool need_conversion, moved;
+  enum insn_code icode;
   rtx rtl;
 
   /* Store the parm in a pseudoregister during the function, but we may
@@ -3188,7 +3204,6 @@ assign_parm_setup_reg (struct assign_parm_data_all
 	 conversion.  We verify that this insn does not clobber any
 	 hard registers.  */
 
-      enum insn_code icode;
       rtx op0, op1;
 
       icode = can_extend_p (promoted_nominal_mode, data->passed_mode,
@@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
 
       did_conversion = true;
     }
+  else if (MEM_P (data->entry_parm)
+	   && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+	      > MEM_ALIGN (data->entry_parm)
+	   && (((icode = optab_handler (movmisalign_optab,
+					promoted_nominal_mode))
+		!= CODE_FOR_nothing)
+	       || targetm.slow_unaligned_access (promoted_nominal_mode,
+						 MEM_ALIGN (data->entry_parm))))
+    {
+      if (icode != CODE_FOR_nothing)
+	emit_insn (GEN_FCN (icode) (parmreg, validated_mem));
+      else
+	rtl = parmreg = extract_bit_field (validated_mem,
+			GET_MODE_BITSIZE (promoted_nominal_mode), 0,
+			unsignedp, parmreg,
+			promoted_nominal_mode, VOIDmode, false, NULL);
+    }
   else
     emit_move_insn (parmreg, validated_mem);
 
@@ -3449,11 +3481,17 @@ assign_parm_setup_stack (struct assign_parm_data_a
 	  int align = STACK_SLOT_ALIGNMENT (data->passed_type,
 					    GET_MODE (data->entry_parm),
 					    TYPE_ALIGN (data->passed_type));
+	  if (align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
+	      && targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
+						align))
+	    align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
 	  data->stack_parm
 	    = assign_stack_local (GET_MODE (data->entry_parm),
 				  GET_MODE_SIZE (GET_MODE (data->entry_parm)),
 				  align);
+	  align = MEM_ALIGN (data->stack_parm);
 	  set_mem_attributes (data->stack_parm, parm, 1);
+	  set_mem_align (data->stack_parm, align);
 	}
 
       dest = validize_mem (copy_rtx (data->stack_parm));
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 1 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */
+/* { dg-final { scan-assembler-times "stm" 0 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 0 } } */
+/* { dg-final { scan-assembler-times "strd" 0 } } */
+/* { dg-final { scan-assembler-times "stm" 1 } } */
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 274168)
+++ gcc/varasm.c	(working copy)
@@ -1085,6 +1085,10 @@ align_variable (tree decl, bool dont_output_data)
 	}
     }
 
+  if (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
+      && targetm.slow_unaligned_access (DECL_MODE (decl), align))
+    align = GET_MODE_ALIGNMENT (DECL_MODE (decl));
+
   /* Reset the alignment in case we have made it tighter, so we can benefit
      from it in get_pointer_alignment.  */
   SET_DECL_ALIGN (decl, align);

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PING] [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-08 14:20             ` [PATCHv4] " Bernd Edlinger
@ 2019-08-14 10:54               ` Bernd Edlinger
  2019-08-14 12:27               ` Richard Biener
  1 sibling, 0 replies; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-14 10:54 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

Hi!

I'd like to ping for this patch:
https://gcc.gnu.org/ml/gcc-patches/2019-08/msg00546.html


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-02 19:01           ` Bernd Edlinger
  2019-08-08 14:20             ` [PATCHv4] " Bernd Edlinger
@ 2019-08-14 11:56             ` Richard Biener
  1 sibling, 0 replies; 50+ messages in thread
From: Richard Biener @ 2019-08-14 11:56 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

On Fri, 2 Aug 2019, Bernd Edlinger wrote:

> On 8/2/19 3:11 PM, Richard Biener wrote:
> > On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> > 
> >>
> >> I have no test coverage for the movmisalign optab though, so I
> >> rely on your code review for that part.
> > 
> > It looks OK.  I tried to make it trigger on the following on
> > i?86 with -msse2:
> > 
> > typedef int v4si __attribute__((vector_size (16)));
> > 
> > struct S { v4si v; } __attribute__((packed));
> > 
> > v4si foo (struct S s)
> > {
> >   return s.v;
> > }
> > 
> 
> Hmm, the entry_parm needs to be a MEM_P and an unaligned one.
> So the test case could be made to trigger it this way:
> 
> typedef int v4si __attribute__((vector_size (16)));
> 
> struct S { v4si v; } __attribute__((packed));
> 
> int t;
> v4si foo (struct S a, struct S b, struct S c, struct S d,
>           struct S e, struct S f, struct S g, struct S h,
>           int i, int j, int k, int l, int m, int n,
>           int o, struct S s)
> {
>   t = o;
>   return s.v;
> }
> 
> However the code path is still not reached, since targetm.slow_unaligned_access
> is always FALSE, which is probably a flaw in my patch.
> 
> So I think,
> 
> +  else if (MEM_P (data->entry_parm)
> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +             > MEM_ALIGN (data->entry_parm)
> +          && targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                            MEM_ALIGN (data->entry_parm)))
> 
> should probably better be
> 
> +  else if (MEM_P (data->entry_parm)
> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +             > MEM_ALIGN (data->entry_parm)
> +        && (((icode = optab_handler (movmisalign_optab, promoted_nominal_mode))
> +             != CODE_FOR_nothing)
> +            || targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                              MEM_ALIGN (data->entry_parm))))
> 
> Right?

Ah, yes.  So it's really the presence of a movmisalign optab that makes it
a must for unaligned moves, and if it is not present then
targetm.slow_unaligned_access tells whether we need to use the bitfield
extraction/insertion code.
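
As a rough sketch of that decision (a toy model, not the middle-end code: `toy_choose_load` and the enum are made-up names, and the two flags stand in for the optab_handler (movmisalign_optab, mode) lookup and the targetm.slow_unaligned_access result):

```c
#include <stdbool.h>

/* Hypothetical labels for the three ways the parameter could be loaded.  */
enum toy_load_kind
{
  TOY_PLAIN_MOVE,      /* models emit_move_insn */
  TOY_MOVMISALIGN,     /* models using the movmisalign optab */
  TOY_EXTRACT_BITFIELD /* models extract_bit_field */
};

/* Pick a load strategy from the mode alignment, the known memory
   alignment (both in bits), and the two target properties.  */
static enum toy_load_kind
toy_choose_load (unsigned int mode_align_bits, unsigned int mem_align_bits,
                 bool has_movmisalign, bool slow_unaligned)
{
  if (mem_align_bits >= mode_align_bits)
    return TOY_PLAIN_MOVE;       /* sufficiently aligned */
  if (has_movmisalign)
    return TOY_MOVMISALIGN;      /* optab present: must be used */
  if (slow_unaligned)
    return TOY_EXTRACT_BITFIELD; /* no optab, unaligned access is slow */
  return TOY_PLAIN_MOVE;         /* target copes with a regular move */
}
```

The ordering is the point of the sketch: the optab is consulted before the slow-unaligned-access hook, so a target that provides movmisalign gets it even when the hook returns false.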

> Then the modified test case would use the movmisalign optab.
> However nothing changes in the end, since the i386 back-end is used to work
> around the middle end not using movmisalign optab when it should do so.

Yeah, in the past it would have failed though.  I wonder if movmisalign
is still needed for x86...

> I wonder if I should try to add a gcc_checking_assert to the mov<mode> expand
> patterns that the memory is properly aligned ?

I suppose gen* could add asserts that there is no movmisalign_optab
that would match when expanding a mov<mode>.  Eventually it's enough
to guard the mov_optab use in emit_move_insn_1 that way?  Or even
try movmisalign there...

> 
> > but nowadays x86 seems to be happy with regular moves operating on
> > unaligned memory, using unaligned moves where necessary.
> > 
> > (insn 5 2 8 2 (set (reg:V4SI 82 [ _2 ])
> >         (mem/c:V4SI (reg/f:SI 16 argp) [2 s.v+0 S16 A32])) "t.c":7:11 1229 {movv4si_internal}
> >      (nil))
> > 
> > and with GCC 4.8 we ended up with the following expansion which is
> > also correct.
> > 
> > (insn 2 4 3 2 (set (subreg:V16QI (reg/v:V4SI 61 [ s ]) 0)
> >         (unspec:V16QI [
> >                 (mem/c:V16QI (reg/f:SI 16 argp) [0 s+0 S16 A32])
> >             ] UNSPEC_LOADU)) t.c:6 1164 {sse2_loaddqu}
> >      (nil))
> > 
> > So it seems it has been too long and I don't remember what is
> > special with arm that it doesn't work...  it possibly simply
> > trusts GET_MODE_ALIGNMENT, never looking at MEM_ALIGN which
> > I think is OK-ish?
> > 
> 
> Yes, that is what Richard said as well.
> 
> >>>>> Similarly the very same issue should exist on x86_64 which is
> >>>>> !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
> >>>>> alignment on the caller side.  So the STRICT_ALIGNMENT check is
> >>>>> a wrong one.
> >>>>>
> >>>>
> >>>> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
> >>>> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
> >>>> is always 32-bit align on the DImode memory.  The x86_64 vector instructions
> >>>> would look at MEM_ALIGN and do the right thing, yes?
> >>>
> >>> No, they need to use the movmisalign optab and end up with UNSPECs
> >>> for example.
> >> Ah, thanks, now I see.
> >>
> >>>> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
> >>>> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
> >>>> does not even have to look at MEM_ALIGN except in the mov_misalign_optab,
> >>>> right?
> >>>
> >>> Yes, I think we never loosened that.  Note that RTL expansion has to
> >>> fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
> >>> specifies that x86 is strict-align wrt vector modes.
> >>>
> >>
> >> Yes I agree, the code would be incorrect for x86 as well when the movmisalign_optab
> >> is not used.  So I invoke the movmisalign optab if available and if not fall
> >> back to extract_bit_field.  As in assign_parm_setup_stack, assign_parm_setup_reg
> >> assumes that data->promoted_mode != data->nominal_mode does not happen with
> >> misaligned stack slots.
> >>
> >>
> >> Attached is the v3 of my patch.
> >>
> >> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
> >>
> >> Is it OK for trunk?
> > 
> > Few comments.
> > 
> > @@ -2274,8 +2274,6 @@ struct assign_parm_data_one
> >    int partial;
> >    BOOL_BITFIELD named_arg : 1;
> >    BOOL_BITFIELD passed_pointer : 1;
> > -  BOOL_BITFIELD on_stack : 1;
> > -  BOOL_BITFIELD loaded_in_reg : 1;
> >  };
> > 
> >  /* A subroutine of assign_parms.  Initialize ALL.  */
> > 
> > independently OK.
> > 
> > @@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
> >       ultimate type, don't use that slot after entry.  We'll make another
> >       stack slot, if we need one.  */
> >    if (stack_parm
> > -      && ((STRICT_ALIGNMENT
> > -          && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
> > +      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
> > +          && targetm.slow_unaligned_access (data->nominal_mode,
> > +                                            MEM_ALIGN (stack_parm)))
> >           || (data->nominal_type
> >               && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
> >               && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
> > 
> > looks like something we should have as a separate commit as well.  It
> > also looks obvious to me.
> > 
> 
> Okay, committed as two separate commits: r274023 & r274025
> 
> > @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> > 
> >        did_conversion = true;
> >      }
> > +  else if (MEM_P (data->entry_parm)
> > +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> > +             > MEM_ALIGN (data->entry_parm)
> > 
> > we arrive here by-passing
> > 
> >   else if (need_conversion)
> >     {
> >       /* We did not have an insn to convert directly, or the sequence
> >          generated appeared unsafe.  We must first copy the parm to a
> >          pseudo reg, and save the conversion until after all
> >          parameters have been moved.  */
> > 
> >       int save_tree_used;
> >       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> > 
> >       emit_move_insn (tempreg, validated_mem);
> > 
> > but this move instruction is invalid in the same way as the case
> > you fix, no?  So wouldn't it be better to do
> > 
> 
> We could do that, but I supposed that there must be a reason why
> assign_parm_setup_stack gets away with that same:
> 
>   if (data->promoted_mode != data->nominal_mode)
>     {
>       /* Conversion is required.  */
>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> 
>       emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));
> 
> 
> So either some back-ends are too permissive with us,
> or there is a reason why promoted_mode != nominal_mode
> does not happen together with unaligned entry_parm.
> In a way that would be a rather unusual ABI.
> 
> >   if (moved)
> >     /* Nothing to do.  */
> >     ;
> >   else
> >     {
> >        if (unaligned)
> >          ...
> >        else
> >          emit_move_insn (...);
> > 
> >        if (need_conversion)
> >  ....
> >     }
> > 
> > ?  Hopefully whatever "moved" things in the if (moved) case did
> > it correctly.
> > 
> 
> It wouldn't.  It uses gen_extend_insn; would that be expected to
> work with unaligned memory?

No idea..

> > Can you check whether your patch does anything to the x86 testcase
> > posted above?
> > 
> 
> Thanks, it might help to have at least a test case where the pattern
> is expanded, even if it does not change anything.
> 
> > I'm not very familiar with this code so I'm leaving actual approval
> > to somebody else.  Still hope the comments were helpful.
> > 
> 
> Yes they are, thanks a lot.

Sorry for the slow response(s).

Richard.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-08 14:20             ` [PATCHv4] " Bernd Edlinger
  2019-08-14 10:54               ` [PING] " Bernd Edlinger
@ 2019-08-14 12:27               ` Richard Biener
  2019-08-14 22:26                 ` Bernd Edlinger
  1 sibling, 1 reply; 50+ messages in thread
From: Richard Biener @ 2019-08-14 12:27 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

On Thu, 8 Aug 2019, Bernd Edlinger wrote:

> On 8/2/19 9:01 PM, Bernd Edlinger wrote:
> > On 8/2/19 3:11 PM, Richard Biener wrote:
> >> On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> >>
> >>>
> >>> I have no test coverage for the movmisalign optab though, so I
> >>> rely on your code review for that part.
> >>
> >> It looks OK.  I tried to make it trigger on the following on
> >> i?86 with -msse2:
> >>
> >> typedef int v4si __attribute__((vector_size (16)));
> >>
> >> struct S { v4si v; } __attribute__((packed));
> >>
> >> v4si foo (struct S s)
> >> {
> >>   return s.v;
> >> }
> >>
> > 
> > Hmm, the entry_parm needs to be a MEM_P and an unaligned one.
> > So the test case could be made to trigger it this way:
> > 
> > typedef int v4si __attribute__((vector_size (16)));
> > 
> > struct S { v4si v; } __attribute__((packed));
> > 
> > int t;
> > v4si foo (struct S a, struct S b, struct S c, struct S d,
> >           struct S e, struct S f, struct S g, struct S h,
> >           int i, int j, int k, int l, int m, int n,
> >           int o, struct S s)
> > {
> >   t = o;
> >   return s.v;
> > }
> > 
> 
> Ah, I realized that there are already a couple of very similar
> test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c,
> gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c,
> which also manage to execute the movmisalign code with the latest patch
> version.  So I thought that it is not necessary to add another one.
> 
> > However the code path is still not reached, since targetm.slow_unaligned_access
> > is always FALSE, which is probably a flaw in my patch.
> > 
> > So I think,
> > 
> > +  else if (MEM_P (data->entry_parm)
> > +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> > +             > MEM_ALIGN (data->entry_parm)
> > +          && targetm.slow_unaligned_access (promoted_nominal_mode,
> > +                                            MEM_ALIGN (data->entry_parm)))
> > 
> > should probably better be
> > 
> > +  else if (MEM_P (data->entry_parm)
> > +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> > +             > MEM_ALIGN (data->entry_parm)
> > +        && (((icode = optab_handler (movmisalign_optab, promoted_nominal_mode))
> > +             != CODE_FOR_nothing)
> > +            || targetm.slow_unaligned_access (promoted_nominal_mode,
> > +                                              MEM_ALIGN (data->entry_parm))))
> > 
> > Right?
> > 
> > Then the modified test case would use the movmisalign optab.
> > However nothing changes in the end, since the i386 back-end is used to work
> > around the middle end not using movmisalign optab when it should do so.
> > 
> 
> I prefer the second form of the check, as it offers more test coverage,
> and is probably more correct than the former.
> 
> Note there are more variations of this misalign check in expr.c,
> some are somehow odd, like expansion of MEM_REF and VIEW_CONVERT_EXPR:
> 
>             && mode != BLKmode
>             && align < GET_MODE_ALIGNMENT (mode))
>           {
>             if ((icode = optab_handler (movmisalign_optab, mode))
>                 != CODE_FOR_nothing)
>               [...]
>             else if (targetm.slow_unaligned_access (mode, align))
>               temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
>                                         0, TYPE_UNSIGNED (TREE_TYPE (exp)),
>                                         (modifier == EXPAND_STACK_PARM
>                                          ? NULL_RTX : target),
>                                         mode, mode, false, alt_rtl);
> 
> I wonder if they are correct this way, why shouldn't we use the movmisalign
> optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESS ?

Doesn't the code do exactly this?  Prefer movmisalign over
extract_bit_field?

> 
> > I wonder if I should try to add a gcc_checking_assert to the mov<mode> expand
> > patterns that the memory is properly aligned ?
> >
> 
> Wow, that was a really exciting bug-hunt with those assertions around...

:)

> >> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> >>
> >>        did_conversion = true;
> >>      }
> >> +  else if (MEM_P (data->entry_parm)
> >> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >> +             > MEM_ALIGN (data->entry_parm)
> >>
> >> we arrive here by-passing
> >>
> >>   else if (need_conversion)
> >>     {
> >>       /* We did not have an insn to convert directly, or the sequence
> >>          generated appeared unsafe.  We must first copy the parm to a
> >>          pseudo reg, and save the conversion until after all
> >>          parameters have been moved.  */
> >>
> >>       int save_tree_used;
> >>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> >>
> >>       emit_move_insn (tempreg, validated_mem);
> >>
> >> but this move instruction is invalid in the same way as the case
> >> you fix, no?  So wouldn't it be better to do
> >>
> > 
> > We could do that, but I supposed that there must be a reason why
> > assign_parm_setup_stack gets away with that same:
> > 
> >   if (data->promoted_mode != data->nominal_mode)
> >     {
> >       /* Conversion is required.  */
> >       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> > 
> >       emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));
> > 
> > 
> > So either some back-ends are too permissive with us,
> > or there is a reason why promoted_mode != nominal_mode
> > does not happen together with unaligned entry_parm.
> > In a way that would be a rather unusual ABI.
> > 
> 
> To find out if that ever happens I added a couple of checking
> assertions in the arm mov<mode> expand patterns.
> 
> So far the assertions did (almost) always hold, so it is likely not
> necessary to fiddle with all those naive move instructions here.
>
> So my gut feeling is, leave those places alone until there is a reason
> for changing them.

Works for me - I wonder if we should add those asserts to generic
code (guarded with flag_checking) though.

> However the assertion in movsi triggered a couple times in the
> ada testsuite due to expand_builtin_init_descriptor using a
> BLKmode MEM rtx, which is only 8-bit aligned.  So, I set the
> ptr_mode alignment there explicitly.

Looks good given we emit ptr_mode moves into it.  Thus OK independently.
 
> Several struct-layout-1.dg testcase tripped over misaligned
> complex_cst constants, fixed by varasm.c (align_variable).
> This is likely a wrong code bug, because misaligned complex
> constants, are expanded to misaligned MEM_REF, but the
> expansion cannot handle misaligned constants, only packed
> structure fields.

Hmm.  So your patch overrides user-alignment here.  Wouldn't it
be better to do that more consciously by

  if (! DECL_USER_ALIGN (decl)
      || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
          && targetm.slow_unaligned_access (DECL_MODE (decl), align)))

?  And why is the movmisalign optab support missing here?

IMHO whatever code later fails to properly use unaligned loads
should be fixed, rather than ignoring the user-requested alignment.

Can you quote a short testcase that explains what exactly goes wrong?
The struct-layout ones are awkward to look at...

> Furthermore gcc.dg/Warray-bounds-33.c was fixed by the
> change in expr.c (expand_expr_real_1).  Certainly is it invalid
> to read memory at a function address, but it should not ICE.
> The problem here, is the MEM_REF has no valid MEM_ALIGN, it looks
> like A32, so the misaligned code execution is not taken, but it is
> set to A8 below, but then we hit an ICE if the result is used:

So the user accessed it as A32.

>         /* Don't set memory attributes if the base expression is
>            SSA_NAME that got expanded as a MEM.  In that case, we should
>            just honor its original memory attributes.  */
>         if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0))
>           set_mem_attributes (op0, exp, 0);

Huh, I don't understand this.  'tem' should never be SSA_NAME.
But set_mem_attributes_minus_bitpos uses get_object_alignment_1
and that has special treatment for FUNCTION_DECLs that is not
covered by

      /* When EXP is an actual memory reference then we can use
         TYPE_ALIGN of a pointer indirection to derive alignment.
         Do so only if get_pointer_alignment_1 did not reveal absolute
         alignment knowledge and if using that alignment would
         improve the situation.  */
      unsigned int talign;
      if (!addr_p && !known_alignment
          && (talign = min_align_of_type (TREE_TYPE (exp)) * BITS_PER_UNIT)
          && talign > align)
        align = talign;

which could be moved out of the if-cascade.

That said, setting A8 should eventually result into appropriate
unaligned expansion, so it seems odd this triggers the assert...

> 
> Finally gcc.dg/torture/pr48493.c required the change
> in assign_parm_setup_stack.  This is just not using the
> correct MEM_ALIGN attribute value, while the memory is
> actually aligned.

But doesn't

          int align = STACK_SLOT_ALIGNMENT (data->passed_type,
                                            GET_MODE (data->entry_parm),
                                            TYPE_ALIGN (data->passed_type));
+         if (align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
+             && targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
+                                               align))
+           align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));

hint that STACK_SLOT_ALIGNMENT is simply bogus for the target?
That is, the target says, for natural alignment 64 the stack slot
alignment can only be guaranteed 32.  You can't then simply up it
but have to use unaligned accesses (or the target/middle-end needs
to do dynamic stack alignment).


>  Note that set_mem_attributes does not
> always preserve the MEM_ALIGN of the ref, since:

set_mem_attributes sets _all_ attributes from an expression or type.

>   /* Default values from pre-existing memory attributes if present.  */
>   refattrs = MEM_ATTRS (ref);
>   if (refattrs)
>     {
>       /* ??? Can this ever happen?  Calling this routine on a MEM that
>          already carries memory attributes should probably be invalid.  */
>       attrs.expr = refattrs->expr;
>       attrs.offset_known_p = refattrs->offset_known_p;
>       attrs.offset = refattrs->offset;
>       attrs.size_known_p = refattrs->size_known_p;
>       attrs.size = refattrs->size;
>       attrs.align = refattrs->align;
>     }
> 
> but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT
> the MEM_ATTRS are zero, and a smaller alignment may result.

Not sure what you are saying here.  That

set_mem_align (MEM:SI A32, 32)

produces a NULL MEM_ATTRS and thus set_mem_attributes not inheriting
the A32 but eventually computing sth lower?  Yeah, that's probably
an interesting "hole" here.  I'm quite sure that if we'd do

refattrs = MEM_ATTRS (ref) ? MEM_ATTRS (ref) : mem_mode_attrs[(int) GET_MODE (ref)];

we run into issues exactly on strict-align targets ...

> Well with those checks in place it should now be a lot harder to generate
> invalid code on STRICT_ALIGNMENT targets, without running into an ICE.
> 
> Attached is the latest version of my arm alignment patch.
> 
> 
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
> Is it OK for trunk?

@@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all

       did_conversion = true;
     }
+  else if (MEM_P (data->entry_parm)
+          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+             > MEM_ALIGN (data->entry_parm)
+          && (((icode = optab_handler (movmisalign_optab,
+                                       promoted_nominal_mode))
+               != CODE_FOR_nothing)
+              || targetm.slow_unaligned_access (promoted_nominal_mode,
+                                                MEM_ALIGN (data->entry_parm))))
+    {
+      if (icode != CODE_FOR_nothing)
+       emit_insn (GEN_FCN (icode) (parmreg, validated_mem));
+      else
+       rtl = parmreg = extract_bit_field (validated_mem,
+                       GET_MODE_BITSIZE (promoted_nominal_mode), 0,
+                       unsignedp, parmreg,
+                       promoted_nominal_mode, VOIDmode, false, NULL);
+    }
   else
     emit_move_insn (parmreg, validated_mem);

This hunk would be obvious to me if we'd use MEM_ALIGN (validated_mem) /
GET_MODE (validated_mem) instead of MEM_ALIGN (data->entry_parm)
and promoted_nominal_mode.

Not sure if it helps on its own.

I'm nervous about the alignment one since I'm not at all familiar
with this code...

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-14 12:27               ` Richard Biener
@ 2019-08-14 22:26                 ` Bernd Edlinger
  2019-08-15  8:58                   ` Richard Biener
  0 siblings, 1 reply; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-14 22:26 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

On 8/14/19 2:00 PM, Richard Biener wrote:
> On Thu, 8 Aug 2019, Bernd Edlinger wrote:
> 
>> On 8/2/19 9:01 PM, Bernd Edlinger wrote:
>>> On 8/2/19 3:11 PM, Richard Biener wrote:
>>>> On Tue, 30 Jul 2019, Bernd Edlinger wrote:
>>>>
>>>>>
>>>>> I have no test coverage for the movmisalign optab though, so I
>>>>> rely on your code review for that part.
>>>>
>>>> It looks OK.  I tried to make it trigger on the following on
>>>> i?86 with -msse2:
>>>>
>>>> typedef int v4si __attribute__((vector_size (16)));
>>>>
>>>> struct S { v4si v; } __attribute__((packed));
>>>>
>>>> v4si foo (struct S s)
>>>> {
>>>>   return s.v;
>>>> }
>>>>
>>>
>>> Hmm, the entry_parm need to be a MEM_P and an unaligned one.
>>> So the test case could be made to trigger it this way:
>>>
>>> typedef int v4si __attribute__((vector_size (16)));
>>>
>>> struct S { v4si v; } __attribute__((packed));
>>>
>>> int t;
>>> v4si foo (struct S a, struct S b, struct S c, struct S d,
>>>           struct S e, struct S f, struct S g, struct S h,
>>>           int i, int j, int k, int l, int m, int n,
>>>           int o, struct S s)
>>> {
>>>   t = o;
>>>   return s.v;
>>> }
>>>
>>
>> Ah, I realized that there are already a couple of very similar
>> test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c,
>> gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c,
>> which also manage to execute the movmisalign code with the latest patch
>> version.  So I thought that it is not necessary to add another one.
>>
>>> However the code path is still not reached, since targetm.slow_unaligned_access
>>> is always FALSE, which is probably a flaw in my patch.
>>>
>>> So I think,
>>>
>>> +  else if (MEM_P (data->entry_parm)
>>> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
>>> +             > MEM_ALIGN (data->entry_parm)
>>> +          && targetm.slow_unaligned_access (promoted_nominal_mode,
>>> +                                            MEM_ALIGN (data->entry_parm)))
>>>
>>> should probably better be
>>>
>>> +  else if (MEM_P (data->entry_parm)
>>> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
>>> +             > MEM_ALIGN (data->entry_parm)
>>> +        && (((icode = optab_handler (movmisalign_optab, promoted_nominal_mode))
>>> +             != CODE_FOR_nothing)
>>> +            || targetm.slow_unaligned_access (promoted_nominal_mode,
>>> +                                              MEM_ALIGN (data->entry_parm))))
>>>
>>> Right?
>>>
>>> Then the modified test case would use the movmisalign optab.
>>> However nothing changes in the end, since the i386 back-end is used to work
>>> around the middle end not using movmisalign optab when it should do so.
>>>
>>
>> I prefer the second form of the check, as it offers more test coverage,
>> and is probably more correct than the former.
>>
>> Note there are more variations of this misalign check in expr.c,
>> some are somehow odd, like expansion of MEM_REF and VIEW_CONVERT_EXPR:
>>
>>             && mode != BLKmode
>>             && align < GET_MODE_ALIGNMENT (mode))
>>           {
>>             if ((icode = optab_handler (movmisalign_optab, mode))
>>                 != CODE_FOR_nothing)
>>               [...]
>>             else if (targetm.slow_unaligned_access (mode, align))
>>               temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
>>                                         0, TYPE_UNSIGNED (TREE_TYPE (exp)),
>>                                         (modifier == EXPAND_STACK_PARM
>>                                          ? NULL_RTX : target),
>>                                         mode, mode, false, alt_rtl);
>>
>> I wonder if they are correct this way, why shouldn't we use the movmisalign
>> optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESS ?
> 
> Doesn't the code do exactly this?  Prefer movmisalign over
> extract_bit_field?
> 

Ah, yes.  How could I miss that.
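
For reference, the order of the checks in that expr.c snippet can be modelled
roughly as below.  This is a toy sketch in plain C, not GCC code: the enum,
the function name and the boolean parameters are made up for illustration,
and the alignments are in bits, as MEM_ALIGN and GET_MODE_ALIGNMENT are.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the three ways the load can be expanded.  */
enum load_strategy
{
  PLAIN_MOVE,
  USE_MOVMISALIGN,
  USE_EXTRACT_BIT_FIELD
};

/* ALIGN is the known alignment of the memory, MODE_ALIGN the natural
   alignment of the mode.  HAVE_MOVMISALIGN models
   optab_handler (movmisalign_optab, mode) != CODE_FOR_nothing, and
   SLOW_UNALIGNED models targetm.slow_unaligned_access (mode, align).  */
enum load_strategy
choose_load_strategy (unsigned align, unsigned mode_align,
                      bool have_movmisalign, bool slow_unaligned)
{
  if (align >= mode_align)
    return PLAIN_MOVE;            /* Sufficiently aligned.  */
  if (have_movmisalign)
    return USE_MOVMISALIGN;       /* Preferred whenever it exists.  */
  if (slow_unaligned)
    return USE_EXTRACT_BIT_FIELD; /* Fallback on strict-align targets.  */
  return PLAIN_MOVE;              /* Unaligned access is cheap enough.  */
}
```

So movmisalign wins over extract_bit_field independently of what
slow_unaligned_access says.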

>>
>>> I wonder if I should try to add a gcc_checking_assert to the mov<mode> expand
>>> patterns that the memory is properly aligned ?
>>>
>>
>> Wow, that was a really exciting bug-hunt with those assertions around...
> 
> :)
> 
>>>> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
>>>>
>>>>        did_conversion = true;
>>>>      }
>>>> +  else if (MEM_P (data->entry_parm)
>>>> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
>>>> +             > MEM_ALIGN (data->entry_parm)
>>>>
>>>> we arrive here by-passing
>>>>
>>>>   else if (need_conversion)
>>>>     {
>>>>       /* We did not have an insn to convert directly, or the sequence
>>>>          generated appeared unsafe.  We must first copy the parm to a
>>>>          pseudo reg, and save the conversion until after all
>>>>          parameters have been moved.  */
>>>>
>>>>       int save_tree_used;
>>>>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
>>>>
>>>>       emit_move_insn (tempreg, validated_mem);
>>>>
>>>> but this move instruction is invalid in the same way as the case
>>>> you fix, no?  So wouldn't it be better to do
>>>>
>>>
>>> We could do that, but I supposed that there must be a reason why
>>> assign_parm_setup_stack gets away with that same:
>>>
>>>   if (data->promoted_mode != data->nominal_mode)
>>>     {
>>>       /* Conversion is required.  */
>>>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
>>>
>>>       emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));
>>>
>>>
>>> So either some back-ends are too permissive with us,
>>> or there is a reason why promoted_mode != nominal_mode
>>> does not happen together with unaligned entry_parm.
>>> In a way that would be a rather unusual ABI.
>>>
>>
>> To find out if that ever happens I added a couple of checking
>> assertions in the arm mov<mode> expand patterns.
>>
>> So far the assertions did (almost) always hold, so it is likely not
>> necessary to fiddle with all those naive move instructions here.
>>
>> So my gut feeling is, leave those places alone until there is a reason
>> for changing them.
> 
> Works for me - I wonder if we should add those asserts to generic
> code (guarded with flag_checking) though.
> 

Well, yes, but I was scared away by the complexity of emit_move_insn_1.

It could be done, but at the moment I would be happy to have these
checks on one major strict-alignment target.  ARM is a good candidate,
since most instructions work even if they are accidentally given
unaligned arguments, so middle-end errors are not always visible in
ordinary tests.  Nevertheless it is a blatant violation of the
contract between middle-end and back-end, which should be avoided.
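
The contract being checked can be pictured with a small model.  This is only
a sketch under assumptions: the struct and function names are hypothetical,
and the real assertions live in the arm mov<mode> expanders and look at the
MEM_ALIGN of the operands rather than at a toy struct.

```c
#include <stdbool.h>

/* Toy operand: either a register or a memory reference whose known
   alignment (the analogue of MEM_ALIGN) is given in bits.  */
struct toy_operand
{
  bool is_mem;
  unsigned align_bits;
};

/* A register is always fine; a memory operand must be at least as
   aligned as the mode of the move requires.  */
static bool
toy_operand_aligned (const struct toy_operand *op, unsigned mode_align_bits)
{
  return !op->is_mem || op->align_bits >= mode_align_bits;
}

/* What a checking assertion in a plain move expander would demand of
   both operands on a strict-alignment target.  */
bool
toy_move_is_valid (const struct toy_operand *dst,
                   const struct toy_operand *src,
                   unsigned mode_align_bits)
{
  return toy_operand_aligned (dst, mode_align_bits)
         && toy_operand_aligned (src, mode_align_bits);
}
```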

>> However the assertion in movsi triggered a couple times in the
>> ada testsuite due to expand_builtin_init_descriptor using a
>> BLKmode MEM rtx, which is only 8-bit aligned.  So, I set the
>> ptr_mode alignment there explicitly.
> 
> Looks good given we emit ptr_mode moves into it.  Thus OK independently.
> 

Thanks, committed as r274487.

>> Several struct-layout-1.dg testcase tripped over misaligned
>> complex_cst constants, fixed by varasm.c (align_variable).
>> This is likely a wrong code bug, because misaligned complex
>> constants, are expanded to misaligned MEM_REF, but the
>> expansion cannot handle misaligned constants, only packed
>> structure fields.
> 
> Hmm.  So your patch overrides user-alignment here.  Wouldn't it
> be better to do that more consciously by
> 
>   if (! DECL_USER_ALIGN (decl)
>       || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>           && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
> 
> ?  And why is the movmisalign optab support missing here?
> 

Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl:

  /* If we can't trust the parm stack slot to be aligned enough for its
     ultimate type, don't use that slot after entry.  We'll make another
     stack slot, if we need one.  */
  if (stack_parm
      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
           && targetm.slow_unaligned_access (data->nominal_mode,
                                             MEM_ALIGN (stack_parm)))

which also makes a variable more aligned than it is declared.
But maybe both should also check the movmisalign optab in
addition to slow_unaligned_access ?

> IMHO whatever code later fails to properly use unaligned loads
> should be fixed instead rather than ignoring user requested alignment.
> 
> Can you quote a short testcase that explains what exactly goes wrong?
> The struct-layout ones are awkward to look at...
> 

Sure,

$ cat test.c
_Complex float __attribute__((aligned(1))) cf;

void foo (void)
{
  cf = 1.0i;
}

$ arm-linux-gnueabihf-gcc -S test.c 
during RTL pass: expand
test.c: In function 'foo':
test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003
    5 |   cf = 1.0i;
      |   ~~~^~~~~~
0x7ba475 gen_movsf(rtx_def*, rtx_def*)
	../../gcc-trunk/gcc/config/arm/arm.md:7003
0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
	../../gcc-trunk/gcc/recog.h:318
0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
	../../gcc-trunk/gcc/expr.c:3695
0xa49914 emit_move_insn(rtx_def*, rtx_def*)
	../../gcc-trunk/gcc/expr.c:3791
0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
	../../gcc-trunk/gcc/expr.c:3490
0xa49914 emit_move_insn(rtx_def*, rtx_def*)
	../../gcc-trunk/gcc/expr.c:3791
0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
	../../gcc-trunk/gcc/expr.c:5855
0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
	../../gcc-trunk/gcc/expr.c:5441
0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
	../../gcc-trunk/gcc/expr.c:4983
0x93396f expand_gimple_stmt_1
	../../gcc-trunk/gcc/cfgexpand.c:3777
0x93396f expand_gimple_stmt
	../../gcc-trunk/gcc/cfgexpand.c:3875
0x9392e1 expand_gimple_basic_block
	../../gcc-trunk/gcc/cfgexpand.c:5915
0x93b046 execute
	../../gcc-trunk/gcc/cfgexpand.c:6538
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Without the hunk in varasm.c of course.

What happens is that expand_expr_real_2 returns an unaligned mem_ref here:

    case COMPLEX_CST:
      /* Handle evaluating a complex constant in a CONCAT target.  */
      if (original_target && GET_CODE (original_target) == CONCAT)
        {
          [... this path not taken ...]
        }

      /* fall through */

    case STRING_CST:
      temp = expand_expr_constant (exp, 1, modifier);

      /* temp contains a constant address.
         On RISC machines where a constant address isn't valid,
         make some insns to get that address into a register.  */
      if (modifier != EXPAND_CONST_ADDRESS
          && modifier != EXPAND_INITIALIZER
          && modifier != EXPAND_SUM
          && ! memory_address_addr_space_p (mode, XEXP (temp, 0),
                                            MEM_ADDR_SPACE (temp)))
        return replace_equiv_address (temp,
                                      copy_rtx (XEXP (temp, 0)));
      return temp;

The result of expand_expr_real(..., EXPAND_NORMAL) ought to be usable
by emit_move_insn, that is expected just *everywhere* and can't be changed.

This could probably be fixed in an ugly way in the COMPLEX_CST handler,
but OTOH I don't see any reason why this constant has to be misaligned
when it can be easily aligned, which avoids the need for a misaligned access.
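
For comparison, this is what a misaligned float access has to look like when
written out at the source level: a copy that assumes nothing about alignment.
It is only an illustration in portable C, unrelated to the GCC internals
above, and the helper names are made up.

```c
#include <string.h>

/* Store VALUE at DST, which may have any alignment.  memcpy makes no
   alignment assumption about its pointer arguments.  */
void
store_float_unaligned (void *dst, float value)
{
  memcpy (dst, &value, sizeof value);
}

/* Load a float from SRC, which may have any alignment.  */
float
load_float_unaligned (const void *src)
{
  float value;
  memcpy (&value, src, sizeof value);
  return value;
}
```

A plain SFmode move, which is what emit_move_insn produces for each half of
the complex constant, is not allowed to make that kind of access on a
strict-alignment target.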

>> Furthermore gcc.dg/Warray-bounds-33.c was fixed by the
>> change in expr.c (expand_expr_real_1).  Certainly is it invalid
>> to read memory at a function address, but it should not ICE.
>> The problem here, is the MEM_REF has no valid MEM_ALIGN, it looks
>> like A32, so the misaligned code execution is not taken, but it is
>> set to A8 below, but then we hit an ICE if the result is used:
> 
> So the user accessed it as A32.
> 
>>         /* Don't set memory attributes if the base expression is
>>            SSA_NAME that got expanded as a MEM.  In that case, we should
>>            just honor its original memory attributes.  */
>>         if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0))
>>           set_mem_attributes (op0, exp, 0);
> 
> Huh, I don't understand this.  'tem' should never be SSA_NAME.

tem is the result of get_inner_reference, why can't that be a SSA_NAME ?

> But set_mem_attributes_minus_bitpos uses get_object_alignment_1
> and that has special treatment for FUNCTION_DECLs that is not
> covered by
> 
>       /* When EXP is an actual memory reference then we can use
>          TYPE_ALIGN of a pointer indirection to derive alignment.
>          Do so only if get_pointer_alignment_1 did not reveal absolute
>          alignment knowledge and if using that alignment would
>          improve the situation.  */
>       unsigned int talign;
>       if (!addr_p && !known_alignment
>           && (talign = min_align_of_type (TREE_TYPE (exp)) * BITS_PER_UNIT)
>           && talign > align)
>         align = talign;
> 
> which could be moved out of the if-cascade.
> 
> That said, setting A8 should eventually result into appropriate
> unaligned expansion, so it seems odd this triggers the assert...
> 

The function pointer is really 32-bit aligned in ARM mode to start
with...

The problem is that the code that handles this misaligned access
is skipped: the mem_rtx initially has no MEM_ATTRS, so MEM_ALIGN == 32
and the unaligned-access path is not taken.  BUT before the mem_rtx is
returned, set_mem_attributes sets it to MEM_ALIGN = 8, and we hit an
assertion, because the result from expand_expr_real(..., EXPAND_NORMAL)
ought to be usable with emit_move_insn.
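
The ordering problem can be reproduced in a toy model.  Again this is a
sketch, not GCC code: the struct and function names are hypothetical, and the
only behaviour taken from the description above is that a MEM without
attributes reports the alignment of its mode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy memory attributes: just the alignment, in bits.  */
struct toy_attrs
{
  unsigned align_bits;
};

/* Toy MEM: the natural alignment of its mode plus optional attributes.  */
struct toy_mem
{
  unsigned mode_align_bits;
  const struct toy_attrs *attrs;
};

/* Without attributes the mode alignment is reported, so a MEM that
   nobody has described yet looks fully aligned.  */
unsigned
toy_mem_align (const struct toy_mem *m)
{
  return m->attrs ? m->attrs->align_bits : m->mode_align_bits;
}

/* The test that decides whether the misaligned-access code runs.  */
bool
toy_takes_misaligned_path (const struct toy_mem *m)
{
  return toy_mem_align (m) < m->mode_align_bits;
}
```

If the test runs while attrs is still NULL it answers "aligned"; attaching
attributes with 8-bit alignment afterwards changes the answer, but by then
the misaligned path has already been skipped.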

>>
>> Finally gcc.dg/torture/pr48493.c required the change
>> in assign_parm_setup_stack.  This is just not using the
>> correct MEM_ALIGN attribute value, while the memory is
>> actually aligned.
> 
> But doesn't
> 
>           int align = STACK_SLOT_ALIGNMENT (data->passed_type,
>                                             GET_MODE (data->entry_parm),
>                                             TYPE_ALIGN (data->passed_type));
> +         if (align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
> +             && targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
> +                                               align))
> +           align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
> 
> hint at that STACK_SLOT_ALIGNMENT is simply bogus for the target?
> That is, the target says, for natural alignment 64 the stack slot
> alignment can only be guaranteed 32.  You can't then simply up it
> but have to use unaligned accesses (or the target/middle-end needs
> to do dynamic stack alignment).
> 
Yes, maybe, but STACK_SLOT_ALIGNMENT is used in a few other places as well,
and none of them have a problem, probably because they use expand_expr,
but here we use emit_move_insn:

      if (MEM_P (src))
        {
          [...]
        }
      else
        {
          if (!REG_P (src))
            src = force_reg (GET_MODE (src), src);
          emit_move_insn (dest, src);
        }

So I could restrict that to

          if (!MEM_P (data->entry_parm)
              && align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
              && ((optab_handler (movmisalign_optab,
				  GET_MODE (data->entry_parm))
                   != CODE_FOR_nothing)
                  || targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
                                                    align)))
            align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));

But OTOH even for arguments arriving in unaligned stack slots where
emit_block_move could handle it, that would just work against the
intention of assign_parm_adjust_stack_rtl.

Of course there are limits how much alignment assign_stack_local
can handle, and that would result in an assertion in the emit_move_insn.
But in the end if that happens it is just an impossible target
configuration.
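
To make the restricted condition easier to read, here it is as a stand-alone
function.  This is a sketch in plain C with made-up names: the booleans stand
in for MEM_P (data->entry_parm), the movmisalign optab lookup and
targetm.slow_unaligned_access, and all alignments are in bits.

```c
#include <stdbool.h>

/* Return the alignment to request for the new stack slot.  ALIGN is
   what STACK_SLOT_ALIGNMENT gave us, MODE_ALIGN the natural alignment
   of the mode of the incoming parameter.  */
unsigned
toy_slot_align (bool entry_is_mem, unsigned align, unsigned mode_align,
                bool have_movmisalign, bool slow_unaligned)
{
  /* Only a value that will be stored with a plain move needs the
     slot to be bumped to the mode alignment.  */
  if (!entry_is_mem
      && align < mode_align
      && (have_movmisalign || slow_unaligned))
    return mode_align;
  return align;
}
```

With this form a parameter that arrives in memory keeps the smaller
alignment.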

> 
>>  Note that set_mem_attributes does not
>> always preserve the MEM_ALIGN of the ref, since:
> 
> set_mem_attributes sets _all_ attributes from an expression or type.
> 

Not really:

  refattrs = MEM_ATTRS (ref);
  if (refattrs)
    {
      /* ??? Can this ever happen?  Calling this routine on a MEM that
         already carries memory attributes should probably be invalid.  */
      [...]
      attrs.align = refattrs->align;
    }
  else
    [...]

  if (objectp || TREE_CODE (t) == INDIRECT_REF)
    attrs.align = MAX (attrs.align, TYPE_ALIGN (type));

>>   /* Default values from pre-existing memory attributes if present.  */
>>   refattrs = MEM_ATTRS (ref);
>>   if (refattrs)
>>     {
>>       /* ??? Can this ever happen?  Calling this routine on a MEM that
>>          already carries memory attributes should probably be invalid.  */
>>       attrs.expr = refattrs->expr;
>>       attrs.offset_known_p = refattrs->offset_known_p;
>>       attrs.offset = refattrs->offset;
>>       attrs.size_known_p = refattrs->size_known_p;
>>       attrs.size = refattrs->size;
>>       attrs.align = refattrs->align;
>>     }
>>
>> but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT
>> the MEM_ATTRS are zero, and a smaller alignment may result.
> 
> Not sure what you are saying here.  That
> 
> set_mem_align (MEM:SI A32, 32)
> 
> produces a NULL MEM_ATTRS and thus set_mem_attributes not inheriting
> the A32 but eventually computing sth lower?  Yeah, that's probably
> an interesting "hole" here.  I'm quite sure that if we'd do
> 
> refattrs = MEM_ATTRS (ref) ? MEM_ATTRS (ref) : mem_mode_attrs[(int) GET_MODE (ref)];
> 
> we run into issues exactly on strict-align targets ...
> 

Yeah, that's scary...

>> Well with those checks in place it should now be a lot harder to generate
>> invalid code on STRICT_ALIGNMENT targets, without running into an ICE.
>>
>> Attached is the latest version of my arm alignment patch.
>>
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
>> Is it OK for trunk?
> 
> @@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> 
>        did_conversion = true;
>      }
> +  else if (MEM_P (data->entry_parm)
> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +             > MEM_ALIGN (data->entry_parm)
> +          && (((icode = optab_handler (movmisalign_optab,
> +                                       promoted_nominal_mode))
> +               != CODE_FOR_nothing)
> +              || targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                                MEM_ALIGN (data->entry_parm))))
> +    {
> +      if (icode != CODE_FOR_nothing)
> +       emit_insn (GEN_FCN (icode) (parmreg, validated_mem));
> +      else
> +       rtl = parmreg = extract_bit_field (validated_mem,
> +                       GET_MODE_BITSIZE (promoted_nominal_mode), 0,
> +                       unsignedp, parmreg,
> +                       promoted_nominal_mode, VOIDmode, false, NULL);
> +    }
>    else
>      emit_move_insn (parmreg, validated_mem);
> 
> This hunk would be obvious to me if we'd use MEM_ALIGN (validated_mem) /
> GET_MODE (validated_mem) instead of MEM_ALIGN (data->entry_parm)
> and promoted_nominal_mode.
> 

Yes, the idea is just to save some cycles, since

parmreg = gen_reg_rtx (promoted_nominal_mode);

we know that parmreg will also have that mode, plus
emit_move_insn (parmreg, validated_mem), which would be called here,
asserts that:

  gcc_assert (mode != BLKmode
              && (GET_MODE (y) == mode || GET_MODE (y) == VOIDmode));

so GET_MODE(validated_mem) == GET_MODE (parmreg) == promoted_nominal_mode

I still like the current version with promoted_nominal_mode slightly
better both because of performance, and the 80-column restriction. :)

> Not sure if it helps on its own.
> 
> I'm nervous about the alignment one since I'm not at all familiar
> with this code...
> 
> Thanks,
> Richard.
> 

I will send a new version of the patch shortly.


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-14 22:26                 ` Bernd Edlinger
@ 2019-08-15  8:58                   ` Richard Biener
  2019-08-15 12:38                     ` Bernd Edlinger
  0 siblings, 1 reply; 50+ messages in thread
From: Richard Biener @ 2019-08-15  8:58 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

On Wed, 14 Aug 2019, Bernd Edlinger wrote:

> On 8/14/19 2:00 PM, Richard Biener wrote:
> > On Thu, 8 Aug 2019, Bernd Edlinger wrote:
> > 
> >> On 8/2/19 9:01 PM, Bernd Edlinger wrote:
> >>> On 8/2/19 3:11 PM, Richard Biener wrote:
> >>>> On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> >>>>
> >>>>>
> >>>>> I have no test coverage for the movmisalign optab though, so I
> >>>>> rely on your code review for that part.
> >>>>
> >>>> It looks OK.  I tried to make it trigger on the following on
> >>>> i?86 with -msse2:
> >>>>
> >>>> typedef int v4si __attribute__((vector_size (16)));
> >>>>
> >>>> struct S { v4si v; } __attribute__((packed));
> >>>>
> >>>> v4si foo (struct S s)
> >>>> {
> >>>>   return s.v;
> >>>> }
> >>>>
> >>>
> >>> Hmm, the entry_parm need to be a MEM_P and an unaligned one.
> >>> So the test case could be made to trigger it this way:
> >>>
> >>> typedef int v4si __attribute__((vector_size (16)));
> >>>
> >>> struct S { v4si v; } __attribute__((packed));
> >>>
> >>> int t;
> >>> v4si foo (struct S a, struct S b, struct S c, struct S d,
> >>>           struct S e, struct S f, struct S g, struct S h,
> >>>           int i, int j, int k, int l, int m, int n,
> >>>           int o, struct S s)
> >>> {
> >>>   t = o;
> >>>   return s.v;
> >>> }
> >>>
> >>
> >> Ah, I realized that there are already a couple of very similar
> >> test cases: gcc.target/i386/pr35767-1.c, gcc.target/i386/pr35767-1d.c,
> >> gcc.target/i386/pr35767-1i.c and gcc.target/i386/pr39445.c,
> >> which also manage to execute the movmisalign code with the latest patch
> >> version.  So I thought that it is not necessary to add another one.
> >>
> >>> However the code path is still not reached, since targetm.slow_unaligned_access
> >>> is always FALSE, which is probably a flaw in my patch.
> >>>
> >>> So I think,
> >>>
> >>> +  else if (MEM_P (data->entry_parm)
> >>> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >>> +             > MEM_ALIGN (data->entry_parm)
> >>> +          && targetm.slow_unaligned_access (promoted_nominal_mode,
> >>> +                                            MEM_ALIGN (data->entry_parm)))
> >>>
> >>> should probably better be
> >>>
> >>> +  else if (MEM_P (data->entry_parm)
> >>> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >>> +             > MEM_ALIGN (data->entry_parm)
> >>> +        && (((icode = optab_handler (movmisalign_optab, promoted_nominal_mode))
> >>> +             != CODE_FOR_nothing)
> >>> +            || targetm.slow_unaligned_access (promoted_nominal_mode,
> >>> +                                              MEM_ALIGN (data->entry_parm))))
> >>>
> >>> Right?
> >>>
> >>> Then the modified test case would use the movmisalign optab.
> >>> However nothing changes in the end, since the i386 back-end is used to work
> >>> around the middle end not using movmisalign optab when it should do so.
> >>>
> >>
> >> I prefer the second form of the check, as it offers more test coverage,
> >> and is probably more correct than the former.
> >>
> >> Note there are more variations of this misalign check in expr.c,
> >> some are somehow odd, like expansion of MEM_REF and VIEW_CONVERT_EXPR:
> >>
> >>             && mode != BLKmode
> >>             && align < GET_MODE_ALIGNMENT (mode))
> >>           {
> >>             if ((icode = optab_handler (movmisalign_optab, mode))
> >>                 != CODE_FOR_nothing)
> >>               [...]
> >>             else if (targetm.slow_unaligned_access (mode, align))
> >>               temp = extract_bit_field (temp, GET_MODE_BITSIZE (mode),
> >>                                         0, TYPE_UNSIGNED (TREE_TYPE (exp)),
> >>                                         (modifier == EXPAND_STACK_PARM
> >>                                          ? NULL_RTX : target),
> >>                                         mode, mode, false, alt_rtl);
> >>
> >> I wonder if they are correct this way, why shouldn't we use the movmisalign
> >> optab if it exists, regardless of TARGET_SLOW_UNALIGNED_ACCESS ?
> > 
> > Doesn't the code do exactly this?  Prefer movmisalign over
> > extract_bit_field?
> > 
> 
> Ah, yes.  How could I miss that.
> 
> >>
> >>> I wonder if I should try to add a gcc_checking_assert to the mov<mode> expand
> >>> patterns that the memory is properly aligned ?
> >>>
> >>
> >> Wow, that was a really exciting bug-hunt with those assertions around...
> > 
> > :)
> > 
> >>>> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> >>>>
> >>>>        did_conversion = true;
> >>>>      }
> >>>> +  else if (MEM_P (data->entry_parm)
> >>>> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >>>> +             > MEM_ALIGN (data->entry_parm)
> >>>>
> >>>> we arrive here by-passing
> >>>>
> >>>>   else if (need_conversion)
> >>>>     {
> >>>>       /* We did not have an insn to convert directly, or the sequence
> >>>>          generated appeared unsafe.  We must first copy the parm to a
> >>>>          pseudo reg, and save the conversion until after all
> >>>>          parameters have been moved.  */
> >>>>
> >>>>       int save_tree_used;
> >>>>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> >>>>
> >>>>       emit_move_insn (tempreg, validated_mem);
> >>>>
> >>>> but this move instruction is invalid in the same way as the case
> >>>> you fix, no?  So wouldn't it be better to do
> >>>>
> >>>
> >>> We could do that, but I supposed that there must be a reason why
> >>> assign_parm_setup_stack gets away with that same:
> >>>
> >>>   if (data->promoted_mode != data->nominal_mode)
> >>>     {
> >>>       /* Conversion is required.  */
> >>>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> >>>
> >>>       emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));
> >>>
> >>>
> >>> So either some back-ends are too permissive with us,
> >>> or there is a reason why promoted_mode != nominal_mode
> >>> does not happen together with unaligned entry_parm.
> >>> In a way that would be a rather unusual ABI.
> >>>
> >>
> >> To find out if that ever happens I added a couple of checking
> >> assertions in the arm mov<mode> expand patterns.
> >>
> >> So far the assertions did (almost) always hold, so it is likely not
> >> necessary to fiddle with all those naive move instructions here.
> >>
> >> So my gut feeling is, leave those places alone until there is a reason
> >> for changing them.
> > 
> > Works for me - I wonder if we should add those asserts to generic
> > code (guarded with flag_checking) though.
> > 
> 
> Well, yes, but I was scared away by the complexity of emit_move_insn_1.
> 
> It could be done, but at the moment I would be happy to have these
> checks on one major strict alignment target.  ARM is a good candidate
> since most instructions work even if they are accidentally
> using unaligned arguments, so middle-end errors are not always
> visible in ordinary tests.  Nevertheless it is a blatant violation of the
> contract between middle-end and back-end, which should be avoided.

Fair enough.

> >> However the assertion in movsi triggered a couple times in the
> >> ada testsuite due to expand_builtin_init_descriptor using a
> >> BLKmode MEM rtx, which is only 8-bit aligned.  So, I set the
> >> ptr_mode alignment there explicitly.
> > 
> > Looks good given we emit ptr_mode moves into it.  Thus OK independently.
> > 
> 
> Thanks, committed as r274487.
> 
> >> Several struct-layout-1.dg testcases tripped over misaligned
> >> complex_cst constants, fixed by varasm.c (align_variable).
> >> This is likely a wrong code bug, because misaligned complex
> >> constants are expanded to misaligned MEM_REF, but the
> >> expansion cannot handle misaligned constants, only packed
> >> structure fields.
> > 
> > Hmm.  So your patch overrides user-alignment here.  Wouldn't it
> > be better to do that more consciously by
> > 
> >   if (! DECL_USER_ALIGN (decl)
> >       || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
> >           && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
> > 
> > ?  And why is the movmisalign optab support missing here?
> > 
> 
> Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl:
> 
>   /* If we can't trust the parm stack slot to be aligned enough for its
>      ultimate type, don't use that slot after entry.  We'll make another
>      stack slot, if we need one.  */
>   if (stack_parm
>       && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
>            && targetm.slow_unaligned_access (data->nominal_mode,
>                                              MEM_ALIGN (stack_parm)))
> 
> which also makes a variable more aligned than it is declared.
> But maybe both should also check the movmisalign optab in
> addition to slow_unaligned_access ?

Quite possible.

> > IMHO whatever code later fails to properly use unaligned loads
> > should be fixed instead rather than ignoring user requested alignment.
> > 
> > Can you quote a short testcase that explains what exactly goes wrong?
> > The struct-layout ones are awkward to look at...
> > 
> 
> Sure,
> 
> $ cat test.c
> _Complex float __attribute__((aligned(1))) cf;
> 
> void foo (void)
> {
>   cf = 1.0i;
> }
> 
> $ arm-linux-gnueabihf-gcc -S test.c 
> during RTL pass: expand
> test.c: In function 'foo':
> test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003
>     5 |   cf = 1.0i;
>       |   ~~~^~~~~~
> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
> 	../../gcc-trunk/gcc/config/arm/arm.md:7003
> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
> 	../../gcc-trunk/gcc/recog.h:318
> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
> 	../../gcc-trunk/gcc/expr.c:3695
> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
> 	../../gcc-trunk/gcc/expr.c:3791
> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
> 	../../gcc-trunk/gcc/expr.c:3490
> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
> 	../../gcc-trunk/gcc/expr.c:3791
> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
> 	../../gcc-trunk/gcc/expr.c:5855
> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
> 	../../gcc-trunk/gcc/expr.c:5441

Huh, so why didn't it trigger

  /* Handle misaligned stores.  */
  mode = TYPE_MODE (TREE_TYPE (to));
  if ((TREE_CODE (to) == MEM_REF
       || TREE_CODE (to) == TARGET_MEM_REF)
      && mode != BLKmode
      && !mem_ref_refers_to_non_mem_p (to)
      && ((align = get_object_alignment (to))
          < GET_MODE_ALIGNMENT (mode))
      && (((icode = optab_handler (movmisalign_optab, mode))
           != CODE_FOR_nothing)
          || targetm.slow_unaligned_access (mode, align)))
    {

?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl 
0x2aaaaaad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.

Ah, 'to' is a plain DECL here so the above handling is incomplete.
IIRC component refs like __real cf = 0.f should be handled fine
again(?).  So, does adding || DECL_P (to) fix the case as well?
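
To make the question concrete, here is a toy model of that guard in plain
C.  It is only a sketch under stated assumptions: the enums, the function
choose_store_path and the accept_decl switch are made up for illustration
and are not GCC code, and the two booleans merely stand in for the
optab_handler and targetm.slow_unaligned_access queries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the tree codes that matter to the guard.  */
enum to_kind { TO_MEM_REF, TO_TARGET_MEM_REF, TO_DECL, TO_OTHER };

/* Which expansion strategy the model picks for the store.  */
enum store_path { STORE_PLAIN_MOVE, STORE_MOVMISALIGN, STORE_BIT_FIELD };

/* Model of the "Handle misaligned stores" condition.  ALIGN and
   MODE_ALIGN are in bits.  ACCEPT_DECL models the proposed
   "|| DECL_P (to)" extension.  */
static enum store_path
choose_store_path (enum to_kind kind, bool accept_decl,
                   unsigned align, unsigned mode_align,
                   bool has_movmisalign, bool slow_unaligned)
{
  assert (align > 0 && mode_align > 0);

  bool kind_ok = (kind == TO_MEM_REF
                  || kind == TO_TARGET_MEM_REF
                  || (accept_decl && kind == TO_DECL));

  if (kind_ok
      && align < mode_align
      && (has_movmisalign || slow_unaligned))
    /* Prefer the movmisalign pattern, else fall back to a bit-field store.  */
    return has_movmisalign ? STORE_MOVMISALIGN : STORE_BIT_FIELD;

  /* Everything else falls through to the ordinary move.  */
  return STORE_PLAIN_MOVE;
}
```

Fed with the cf example (a plain DECL, A8 against a 32-bit mode alignment,
slow unaligned access, and assuming no movmisalign pattern for the mode),
the model picks the plain move without accept_decl and the bit-field path
with it.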

> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
> 	../../gcc-trunk/gcc/expr.c:4983
> 0x93396f expand_gimple_stmt_1
> 	../../gcc-trunk/gcc/cfgexpand.c:3777
> 0x93396f expand_gimple_stmt
> 	../../gcc-trunk/gcc/cfgexpand.c:3875
> 0x9392e1 expand_gimple_basic_block
> 	../../gcc-trunk/gcc/cfgexpand.c:5915
> 0x93b046 execute
> 	../../gcc-trunk/gcc/cfgexpand.c:6538
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
> 
> Without the hunk in varasm.c of course.
> 
> What happens is that expand_expr_real_2 returns an unaligned mem_ref here:
> 
>     case COMPLEX_CST:
>       /* Handle evaluating a complex constant in a CONCAT target.  */
>       if (original_target && GET_CODE (original_target) == CONCAT)
>         {
>           [... this path not taken ...]
>         }
> 
>       /* fall through */
> 
>     case STRING_CST:
>       temp = expand_expr_constant (exp, 1, modifier);
> 
>       /* temp contains a constant address.
>          On RISC machines where a constant address isn't valid,
>          make some insns to get that address into a register.  */
>       if (modifier != EXPAND_CONST_ADDRESS
>           && modifier != EXPAND_INITIALIZER
>           && modifier != EXPAND_SUM
>           && ! memory_address_addr_space_p (mode, XEXP (temp, 0),
>                                             MEM_ADDR_SPACE (temp)))
>         return replace_equiv_address (temp,
>                                       copy_rtx (XEXP (temp, 0)));
>       return temp;
> 
> The result of expand_expr_real(..., EXPAND_NORMAL) ought to be usable
> by emit_move_insn, that is expected just *everywhere* and can't be changed.
> 
> This could probably be fixed in an ugly way in the COMPLEX_CST, handler
> but OTOH, I don't see any reason why this constant has to be misaligned
> when it can be easily aligned, which avoids the need for a misaligned access.

If the COMPLEX_CST happens to end up in unaligned memory then that's
of course a bug (unless the target requests that for all COMPLEX_CSTs).
That is, if the unalignment is triggered because the store is to an
unaligned decl.

But I think the issue is the above one?

> >> Furthermore gcc.dg/Warray-bounds-33.c was fixed by the
> >> change in expr.c (expand_expr_real_1).  Certainly it is invalid
> >> to read memory at a function address, but it should not ICE.
> >> The problem here, is the MEM_REF has no valid MEM_ALIGN, it looks
> >> like A32, so the misaligned code execution is not taken, but it is
> >> set to A8 below, but then we hit an ICE if the result is used:
> > 
> > So the user accessed it as A32.
> > 
> >>         /* Don't set memory attributes if the base expression is
> >>            SSA_NAME that got expanded as a MEM.  In that case, we should
> >>            just honor its original memory attributes.  */
> >>         if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0))
> >>           set_mem_attributes (op0, exp, 0);
> > 
> > Huh, I don't understand this.  'tem' should never be SSA_NAME.
> 
> tem is the result of get_inner_reference, why can't that be a SSA_NAME ?

We can't subset an SSA_NAME.  I have really no idea what this was intended
to do...

> > But set_mem_attributes_minus_bitpos uses get_object_alignment_1
> > and that has special treatment for FUNCTION_DECLs that is not
> > covered by
> > 
> >       /* When EXP is an actual memory reference then we can use
> >          TYPE_ALIGN of a pointer indirection to derive alignment.
> >          Do so only if get_pointer_alignment_1 did not reveal absolute
> >          alignment knowledge and if using that alignment would
> >          improve the situation.  */
> >       unsigned int talign;
> >       if (!addr_p && !known_alignment
> >           && (talign = min_align_of_type (TREE_TYPE (exp)) * 
> > BITS_PER_UNIT)
> >           && talign > align)
> >         align = talign;
> > 
> > which could be moved out of the if-cascade.
> > 
> > That said, setting A8 should eventually result in appropriate
> > unaligned expansion, so it seems odd this triggers the assert...
> > 
> 
> The function pointer is really 32-bit aligned in ARM mode to start
> with...
> 
> The problem is that the code that handles this misaligned access
> is skipped because the mem_rtx has initially no MEM_ATTRS and therefore
> MEM_ALIGN == 32, and therefore the code that handles the unaligned
> access is not taken.  BUT before the mem_rtx is returned it is
> set to MEM_ALIGN = 8 by set_mem_attributes, and we have an assertion,
> because the result from expand_expr_real(..., EXPAND_NORMAL) ought to be
> usable with emit_move_insn.

yes, as said the _access_ determines the address should be aligned
so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according
to the access type/mode.  But we can't trust DECL_ALIGN of
FUNCTION_DECLs but we _can_ trust users writing *(int *)fn
(maybe for actual accesses we _can_ trust DECL_ALIGN, it's just
we may not compute nonzero bits for the actual address because
of function pointer mangling)
(for accessing function code I'd say this would be premature
optimization, but ...)

> >>
> >> Finally gcc.dg/torture/pr48493.c required the change
> >> in assign_parm_setup_stack.  This is just not using the
> >> correct MEM_ALIGN attribute value, while the memory is
> >> actually aligned.
> > 
> > But doesn't
> > 
> >           int align = STACK_SLOT_ALIGNMENT (data->passed_type,
> >                                             GET_MODE (data->entry_parm),
> >                                             TYPE_ALIGN 
> > (data->passed_type));
> > +         if (align < (int)GET_MODE_ALIGNMENT (GET_MODE 
> > (data->entry_parm))
> > +             && targetm.slow_unaligned_access (GET_MODE 
> > (data->entry_parm),
> > +                                               align))
> > +           align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
> > 
> > hint at that STACK_SLOT_ALIGNMENT is simply bogus for the target?
> > That is, the target says, for natural alignment 64 the stack slot
> > alignment can only be guaranteed 32.  You can't then simply up it
> > but have to use unaligned accesses (or the target/middle-end needs
> > to do dynamic stack alignment).
> > 
> Yes, maybe, but STACK_SLOT_ALIGNMENT is used in a few other places as well,
> and none of them have a problem, probably because they use expand_expr,
> but here we use emit_move_insn:
> 
>       if (MEM_P (src))
>         {
>           [...]
>         }
>       else
>         {
>           if (!REG_P (src))
>             src = force_reg (GET_MODE (src), src);
>           emit_move_insn (dest, src);
>         }
> 
> So I could restrict that to
> 
>           if (!MEM_P (data->entry_parm)
>               && align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
>               && ((optab_handler (movmisalign_optab,
> 				  GET_MODE (data->entry_parm))
>                    != CODE_FOR_nothing)
>                   || targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
>                                                     align)))
>             align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
> 
> But OTOH even for arguments arriving in unaligned stack slots where
> emit_block_move could handle it, that would just work against the
> intention of assign_parm_adjust_stack_rtl.
> 
> Of course there are limits how much alignment assign_stack_local
> can handle, and that would result in an assertion in the emit_move_insn.
> But in the end if that happens it is just an impossible target
> configuration.

Still I think you can't simply override STACK_SLOT_ALIGNMENT just because
of the mode of an entry param, can you?  If you can assume a bigger
alignment then STACK_SLOT_ALIGNMENT should return it.

> > 
> >>  Note that set_mem_attributes does not
> >> always preserve the MEM_ALIGN of the ref, since:
> > 
> > set_mem_attributes sets _all_ attributes from an expression or type.
> > 
> 
> Not really:
> 
>   refattrs = MEM_ATTRS (ref);
>   if (refattrs)
>     {
>       /* ??? Can this ever happen?  Calling this routine on a MEM that
>          already carries memory attributes should probably be invalid.  */
>       [...]
>       attrs.align = refattrs->align;
>     }
>   else
>     [...]
> 
>   if (objectp || TREE_CODE (t) == INDIRECT_REF)
>     attrs.align = MAX (attrs.align, TYPE_ALIGN (type));
> 
> >>   /* Default values from pre-existing memory attributes if present.  */
> >>   refattrs = MEM_ATTRS (ref);
> >>   if (refattrs)
> >>     {
> >>       /* ??? Can this ever happen?  Calling this routine on a MEM that
> >>          already carries memory attributes should probably be invalid.  */
> >>       attrs.expr = refattrs->expr;
> >>       attrs.offset_known_p = refattrs->offset_known_p;
> >>       attrs.offset = refattrs->offset;
> >>       attrs.size_known_p = refattrs->size_known_p;
> >>       attrs.size = refattrs->size;
> >>       attrs.align = refattrs->align;
> >>     }
> >>
> >> but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT
> >> the MEM_ATTRS are zero, and a smaller alignment may result.
> > 
> > Not sure what you are saying here.  That
> > 
> > set_mem_align (MEM:SI A32, 32)
> > 
> > produces a NULL MEM_ATTRS and thus set_mem_attributes not inheriting
> > the A32 but eventually computing sth lower?  Yeah, that's probably
> > an interesting "hole" here.  I'm quite sure that if we'd do
> > 
> > refattrs = MEM_ATTRS (ref) ? MEM_ATTRS (ref) : mem_mode_attrs[(int) GET_MODE (ref)];
> > 
> > we run into issues exactly on strict-align targets ...
> > 
> 
> Yeah, that's scary...
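
The hole can be illustrated with a small stand-alone model in C.  This is
a sketch only: the structs and functions below are simplified stand-ins
named after the GCC routines for readability, they are not the real
implementations, and the model assumes a strict-alignment target where a
MEM without attributes reports its mode alignment.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_UNIT 8u

struct mem_attrs { unsigned align; };   /* alignment in bits */

struct mem
{
  unsigned mode_align;                  /* alignment implied by the mode */
  const struct mem_attrs *attrs;        /* NULL means "mode defaults" */
};

/* Model of MEM_ALIGN: without attributes the mode alignment is reported.  */
static unsigned
mem_align (const struct mem *m)
{
  return m->attrs ? m->attrs->align : m->mode_align;
}

/* Model of set_mem_align: asking for exactly the mode default leaves the
   attribute pointer NULL.  STORAGE is caller-provided attribute space.  */
static void
set_mem_align (struct mem *m, struct mem_attrs *storage, unsigned align)
{
  assert (align >= BITS_PER_UNIT);
  if (align == m->mode_align)
    m->attrs = NULL;
  else
    {
      storage->align = align;
      m->attrs = storage;
    }
}

/* Model of set_mem_attributes for an object: inherit the old alignment
   only if attributes exist, otherwise start from one byte, then raise it
   to whatever is known about the object.  */
static void
set_mem_attributes_model (struct mem *m, struct mem_attrs *storage,
                          unsigned object_align)
{
  unsigned align = m->attrs ? m->attrs->align : BITS_PER_UNIT;
  if (object_align > align)
    align = object_align;
  set_mem_align (m, storage, align);
}
```

In this model a 32-bit MEM that was set to A32 carries no attributes, so a
later set_mem_attributes_model call with an object known to be only 8-bit
aligned yields A8, while a MEM that had been set to A64 keeps its A64.
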
> 
> >> Well with those checks in place it should now be a lot harder to generate
> >> invalid code on STRICT_ALIGNMENT targets, without running into an ICE.
> >>
> >> Attached is the latest version of my arm alignment patch.
> >>
> >>
> >> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
> >> Is it OK for trunk?
> > 
> > @@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> > 
> >        did_conversion = true;
> >      }
> > +  else if (MEM_P (data->entry_parm)
> > +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> > +             > MEM_ALIGN (data->entry_parm)
> > +          && (((icode = optab_handler (movmisalign_optab,
> > +                                       promoted_nominal_mode))
> > +               != CODE_FOR_nothing)
> > +              || targetm.slow_unaligned_access (promoted_nominal_mode,
> > +                                                MEM_ALIGN 
> > (data->entry_parm))))
> > +    {
> > +      if (icode != CODE_FOR_nothing)
> > +       emit_insn (GEN_FCN (icode) (parmreg, validated_mem));
> > +      else
> > +       rtl = parmreg = extract_bit_field (validated_mem,
> > +                       GET_MODE_BITSIZE (promoted_nominal_mode), 0,
> > +                       unsignedp, parmreg,
> > +                       promoted_nominal_mode, VOIDmode, false, NULL);
> > +    }
> >    else
> >      emit_move_insn (parmreg, validated_mem);
> > 
> > This hunk would be obvious to me if we'd use MEM_ALIGN (validated_mem) /
> > GET_MODE (validated_mem) instead of MEM_ALIGN (data->entry_parm)
> > and promoted_nominal_mode.
> > 
> 
> Yes, the idea is just to save some cycles, since
> 
> parmreg = gen_reg_rtx (promoted_nominal_mode);
> we know that parmreg will also have that mode, plus
> emit_move_insn (parmreg, validated_mem) which would be called here
> asserts that:
> 
>   gcc_assert (mode != BLKmode
>               && (GET_MODE (y) == mode || GET_MODE (y) == VOIDmode));
> 
> so GET_MODE(validated_mem) == GET_MODE (parmreg) == promoted_nominal_mode
> 
> I still like the current version with promoted_nominal_mode slightly
> better both because of performance, and the 80-column restriction. :)

So if you say they are 1:1 equivalent then go for it (for this hunk,
approved as "obvious").

Richard.


* Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-15  8:58                   ` Richard Biener
@ 2019-08-15 12:38                     ` Bernd Edlinger
  2019-08-15 13:03                       ` Richard Biener
  0 siblings, 1 reply; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-15 12:38 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 21581 bytes --]

On 8/15/19 10:55 AM, Richard Biener wrote:
> On Wed, 14 Aug 2019, Bernd Edlinger wrote:
> 
>> On 8/14/19 2:00 PM, Richard Biener wrote:
>>
>> Well, yes, but I was scared away by the complexity of emit_move_insn_1.
>>
>> It could be done, but at the moment I would be happy to have these
>> checks on one major strict alignment target.  ARM is a good candidate
>> since most instructions work even if they are accidentally
>> using unaligned arguments, so middle-end errors are not always
>> visible in ordinary tests.  Nevertheless it is a blatant violation of the
>> contract between middle-end and back-end, which should be avoided.
> 
> Fair enough.
> 
>>>> Several struct-layout-1.dg testcases tripped over misaligned
>>>> complex_cst constants, fixed by varasm.c (align_variable).
>>>> This is likely a wrong code bug, because misaligned complex
>>>> constants are expanded to misaligned MEM_REF, but the
>>>> expansion cannot handle misaligned constants, only packed
>>>> structure fields.
>>>
>>> Hmm.  So your patch overrides user-alignment here.  Wouldn't it
>>> be better to do that more consciously by
>>>
>>>   if (! DECL_USER_ALIGN (decl)
>>>       || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>>>           && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
>>>

? I don't know why that would be better?
If the value is underaligned no matter why, pretend it was declared as
naturally aligned if that causes wrong code otherwise.
That was the idea here.

>>> ?  And why is the movmisalign optab support missing here?
>>>
>>
>> Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl:
>>
>>   /* If we can't trust the parm stack slot to be aligned enough for its
>>      ultimate type, don't use that slot after entry.  We'll make another
>>      stack slot, if we need one.  */
>>   if (stack_parm
>>       && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
>>            && targetm.slow_unaligned_access (data->nominal_mode,
>>                                              MEM_ALIGN (stack_parm)))
>>
>> which also makes a variable more aligned than it is declared.
>> But maybe both should also check the movmisalign optab in
>> addition to slow_unaligned_access ?
> 
> Quite possible.
> 

Will do, see attached new version of the patch.

>>> IMHO whatever code later fails to properly use unaligned loads
>>> should be fixed instead rather than ignoring user requested alignment.
>>>
>>> Can you quote a short testcase that explains what exactly goes wrong?
>>> The struct-layout ones are awkward to look at...
>>>
>>
>> Sure,
>>
>> $ cat test.c
>> _Complex float __attribute__((aligned(1))) cf;
>>
>> void foo (void)
>> {
>>   cf = 1.0i;
>> }
>>
>> $ arm-linux-gnueabihf-gcc -S test.c 
>> during RTL pass: expand
>> test.c: In function 'foo':
>> test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003
>>     5 |   cf = 1.0i;
>>       |   ~~~^~~~~~
>> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
>> 	../../gcc-trunk/gcc/config/arm/arm.md:7003
>> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>> 	../../gcc-trunk/gcc/recog.h:318
>> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
>> 	../../gcc-trunk/gcc/expr.c:3695
>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>> 	../../gcc-trunk/gcc/expr.c:3791
>> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
>> 	../../gcc-trunk/gcc/expr.c:3490
>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>> 	../../gcc-trunk/gcc/expr.c:3791
>> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
>> 	../../gcc-trunk/gcc/expr.c:5855
>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>> 	../../gcc-trunk/gcc/expr.c:5441
> 
> Huh, so why didn't it trigger
> 
>   /* Handle misaligned stores.  */
>   mode = TYPE_MODE (TREE_TYPE (to));
>   if ((TREE_CODE (to) == MEM_REF
>        || TREE_CODE (to) == TARGET_MEM_REF)
>       && mode != BLKmode
>       && !mem_ref_refers_to_non_mem_p (to)
>       && ((align = get_object_alignment (to))
>           < GET_MODE_ALIGNMENT (mode))
>       && (((icode = optab_handler (movmisalign_optab, mode))
>            != CODE_FOR_nothing)
>           || targetm.slow_unaligned_access (mode, align)))
>     {
> 
> ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
> var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl 
> 0x2aaaaaad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.
> 
> Ah, 'to' is a plain DECL here so the above handling is incomplete.
> IIRC component refs like __real cf = 0.f should be handled fine
> again(?).  So, does adding || DECL_P (to) fix the case as well?
> 

So I tried this instead of the varasm.c change:

Index: expr.c
===================================================================
--- expr.c	(revision 274487)
+++ expr.c	(working copy)
@@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem
   /* Handle misaligned stores.  */
   mode = TYPE_MODE (TREE_TYPE (to));
   if ((TREE_CODE (to) == MEM_REF
-       || TREE_CODE (to) == TARGET_MEM_REF)
+       || TREE_CODE (to) == TARGET_MEM_REF
+       || DECL_P (to))
       && mode != BLKmode
-      && !mem_ref_refers_to_non_mem_p (to)
+      && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
       && ((align = get_object_alignment (to))
 	  < GET_MODE_ALIGNMENT (mode))
       && (((icode = optab_handler (movmisalign_optab, mode))

Result: yes, it fixes this test case,
but when I run all of struct-layout-1.exp there are still cases where we have problems:

In file included from /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:
/home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h: In function 'test2112':
/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:23:10: internal compiler error: in gen_movdf, at config/arm/arm.md:7107
/home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:62:3: note: in definition of macro 'TX'
/home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:1: note: in expansion of macro 'TCI'
/home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:294: note: in expansion of macro 'F'
0x7ba377 gen_movdf(rtx_def*, rtx_def*)
        ../../gcc-trunk/gcc/config/arm/arm.md:7107
0xa494c7 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
        ../../gcc-trunk/gcc/recog.h:318
0xa494c7 emit_move_insn_1(rtx_def*, rtx_def*)
        ../../gcc-trunk/gcc/expr.c:3695
0xa49854 emit_move_insn(rtx_def*, rtx_def*)
        ../../gcc-trunk/gcc/expr.c:3791
0xa49437 emit_move_complex_parts(rtx_def*, rtx_def*)
        ../../gcc-trunk/gcc/expr.c:3490
0xa49854 emit_move_insn(rtx_def*, rtx_def*)
        ../../gcc-trunk/gcc/expr.c:3791
0xa50faf store_expr(tree_node*, rtx_def*, int, bool, bool)
        ../../gcc-trunk/gcc/expr.c:5856
0xa51f34 expand_assignment(tree_node*, tree_node*, bool)
        ../../gcc-trunk/gcc/expr.c:5302
0xa51f34 expand_assignment(tree_node*, tree_node*, bool)
        ../../gcc-trunk/gcc/expr.c:4983
0x9338af expand_gimple_stmt_1
        ../../gcc-trunk/gcc/cfgexpand.c:3777
0x9338af expand_gimple_stmt
        ../../gcc-trunk/gcc/cfgexpand.c:3875
0x939221 expand_gimple_basic_block
        ../../gcc-trunk/gcc/cfgexpand.c:5915
0x93af86 execute
        ../../gcc-trunk/gcc/cfgexpand.c:6538
Please submit a full bug report,

My personal gut feeling is that this will be more fragile than over-aligning the
constants.
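
For reference, a slightly larger reduced test could exercise both the
whole-object store and the component stores on an under-aligned complex
variable.  This is only a sketch: the function names are invented for
illustration, it uses the GNU extensions 1.0i, __real__ and __imag__, and
it is not claimed to reproduce the remaining struct-layout-1 failures.

```c
#include <assert.h>

/* Under-aligned complex variable, as in the short test case.  */
_Complex float __attribute__((aligned(1))) cf;

/* Whole-object store from a complex constant.  */
void store_whole (void)
{
  cf = 1.0i;
}

/* Component stores through __real__ and __imag__.  */
void store_parts (void)
{
  __real__ cf = 2.0f;
  __imag__ cf = 3.0f;
}

/* Run both store forms and check the values read back.  */
void check_stores (void)
{
  store_whole ();
  assert (__real__ cf == 0.0f && __imag__ cf == 1.0f);
  store_parts ();
  assert (__real__ cf == 2.0f && __imag__ cf == 3.0f);
}
```

Whichever fix is chosen, both functions should compile without an ICE on a
strict-alignment target and check_stores should pass when executed.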



>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>> 	../../gcc-trunk/gcc/expr.c:4983
>> 0x93396f expand_gimple_stmt_1
>> 	../../gcc-trunk/gcc/cfgexpand.c:3777
>> 0x93396f expand_gimple_stmt
>> 	../../gcc-trunk/gcc/cfgexpand.c:3875
>> 0x9392e1 expand_gimple_basic_block
>> 	../../gcc-trunk/gcc/cfgexpand.c:5915
>> 0x93b046 execute
>> 	../../gcc-trunk/gcc/cfgexpand.c:6538
>> Please submit a full bug report,
>> with preprocessed source if appropriate.
>> Please include the complete backtrace with any bug report.
>> See <https://gcc.gnu.org/bugs/> for instructions.
>>
>> Without the hunk in varasm.c of course.
>>
>> What happens is that expand_expr_real_2 returns an unaligned mem_ref here:
>>
>>     case COMPLEX_CST:
>>       /* Handle evaluating a complex constant in a CONCAT target.  */
>>       if (original_target && GET_CODE (original_target) == CONCAT)
>>         {
>>           [... this path not taken ...]

BTW: this code block executes when the other ICE happens.
 
>>         }
>>
>>       /* fall through */
>>
>>     case STRING_CST:
>>       temp = expand_expr_constant (exp, 1, modifier);
>>
>>       /* temp contains a constant address.
>>          On RISC machines where a constant address isn't valid,
>>          make some insns to get that address into a register.  */
>>       if (modifier != EXPAND_CONST_ADDRESS
>>           && modifier != EXPAND_INITIALIZER
>>           && modifier != EXPAND_SUM
>>           && ! memory_address_addr_space_p (mode, XEXP (temp, 0),
>>                                             MEM_ADDR_SPACE (temp)))
>>         return replace_equiv_address (temp,
>>                                       copy_rtx (XEXP (temp, 0)));
>>       return temp;
>>
>> The result of expand_expr_real(..., EXPAND_NORMAL) ought to be usable
>> by emit_move_insn, that is expected just *everywhere* and can't be changed.
>>
>> This could probably be fixed in an ugly way in the COMPLEX_CST, handler
>> but OTOH, I don't see any reason why this constant has to be misaligned
>> when it can be easily aligned, which avoids the need for a misaligned access.
> 
> If the COMPLEX_CST happens to end up in unaligned memory then that's
> of course a bug (unless the target requests that for all COMPLEX_CSTs).
> That is, if the unalignment is triggered because the store is to an
> unaligned decl.
> 
> But I think the issue is the above one?
> 

Yes, initially the constant seems to be unaligned.  Then it is expanded,
and there is no special handling for unaligned constants in expand_expr_real,
and then expand_assignment or store_expr are probably not fully prepared for
this either.

>>>> Furthermore gcc.dg/Warray-bounds-33.c was fixed by the
>>>> change in expr.c (expand_expr_real_1).  Certainly is it invalid
>>>> to read memory at a function address, but it should not ICE.
>>>> The problem here, is the MEM_REF has no valid MEM_ALIGN, it looks
>>>> like A32, so the misaligned code execution is not taken, but it is
>>>> set to A8 below, but then we hit an ICE if the result is used:
>>>
>>> So the user accessed it as A32.
>>>
>>>>         /* Don't set memory attributes if the base expression is
>>>>            SSA_NAME that got expanded as a MEM.  In that case, we should
>>>>            just honor its original memory attributes.  */
>>>>         if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0))
>>>>           set_mem_attributes (op0, exp, 0);
>>>
>>> Huh, I don't understand this.  'tem' should never be SSA_NAME.
>>
>> tem is the result of get_inner_reference, why can't that be a SSA_NAME ?
> 
> We can't subset an SSA_NAME.  I have really no idea what this was intended
> to do...
> 

Nice, so would you do a patch to change that to a
gcc_checking_assert (TREE_CODE (tem) != SSA_NAME) ?
maybe with a small explanation?

>>> But set_mem_attributes_minus_bitpos uses get_object_alignment_1
>>> and that has special treatment for FUNCTION_DECLs that is not
>>> covered by
>>>
>>>       /* When EXP is an actual memory reference then we can use
>>>          TYPE_ALIGN of a pointer indirection to derive alignment.
>>>          Do so only if get_pointer_alignment_1 did not reveal absolute
>>>          alignment knowledge and if using that alignment would
>>>          improve the situation.  */
>>>       unsigned int talign;
>>>       if (!addr_p && !known_alignment
>>>           && (talign = min_align_of_type (TREE_TYPE (exp)) * 
>>> BITS_PER_UNIT)
>>>           && talign > align)
>>>         align = talign;
>>>
>>> which could be moved out of the if-cascade.
>>>
>>> That said, setting A8 should eventually result in appropriate
>>> unaligned expansion, so it seems odd this triggers the assert...
>>>
>>
>> The function pointer is really 32-bit aligned in ARM mode to start
>> with...
>>
>> The problem is that the code that handles this misaligned access
>> is skipped because the mem_rtx has initially no MEM_ATTRS and therefore
>> MEM_ALIGN == 32, and therefore the code that handles the unaligned
>> access is not taken.  BUT before the mem_rtx is returned it is
>> set to MEM_ALIGN = 8 by set_mem_attributes, and we have an assertion,
>> because the result from expand_expr_real(..., EXPAND_NORMAL) ought to be
>> usable with emit_move_insn.
> 
> yes, as said the _access_ determines the address should be aligned
> so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according
> to the access type/mode.  But we can't trust DECL_ALIGN of
> FUNCTION_DECLs but we _can_ trust users writing *(int *)fn
> (maybe for actual accesses we _can_ trust DECL_ALIGN, it's just
> we may not compute nonzero bits for the actual address because
> of function pointer mangling)
> (for accessing function code I'd say this would be premature
> optimization, but ...)
> 

Not a very nice solution, but it is not worth spending much effort
on optimizing undefined behavior.  I just want to avoid the ICE
at this time and would not trust the DECL_ALIGN either.

>>>>
>>>> Finally gcc.dg/torture/pr48493.c required the change
>>>> in assign_parm_setup_stack.  This is just not using the
>>>> correct MEM_ALIGN attribute value, while the memory is
>>>> actually aligned.
>>>
>>> But doesn't
>>>
>>>           int align = STACK_SLOT_ALIGNMENT (data->passed_type,
>>>                                             GET_MODE (data->entry_parm),
>>>                                             TYPE_ALIGN (data->passed_type));
>>> +         if (align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
>>> +             && targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
>>> +                                               align))
>>> +           align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
>>>
>>> hint at that STACK_SLOT_ALIGNMENT is simply bogus for the target?
>>> That is, the target says, for natural alignment 64 the stack slot
>>> alignment can only be guaranteed 32.  You can't then simply up it
>>> but have to use unaligned accesses (or the target/middle-end needs
>>> to do dynamic stack alignment).
>>>
>> Yes, maybe, but STACK_SLOT_ALIGNMENT is used in a few other places as well,
>> and none of them have a problem, probably because they use expand_expr,
>> but here we use emit_move_insn:
>>
>>       if (MEM_P (src))
>>         {
>>           [...]
>>         }
>>       else
>>         {
>>           if (!REG_P (src))
>>             src = force_reg (GET_MODE (src), src);
>>           emit_move_insn (dest, src);
>>         }
>>
>> So I could restrict that to
>>
>>           if (!MEM_P (data->entry_parm)
>>               && align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
>>               && ((optab_handler (movmisalign_optab,
>> 				  GET_MODE (data->entry_parm))
>>                    != CODE_FOR_nothing)
>>                   || targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
>>                                                     align)))
>>             align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
>>
>> But OTOH even for arguments arriving in unaligned stack slots where
>> emit_block_move could handle it, that would just work against the
>> intention of assign_parm_adjust_stack_rtl.
>>
>> Of course there are limits how much alignment assign_stack_local
>> can handle, and that would result in an assertion in the emit_move_insn.
>> But in the end if that happens it is just an impossible target
>> configuration.
> 
> Still I think you can't simply override STACK_SLOT_ALIGNMENT just because
> of the mode of an entry param, can you?  If you can assume a bigger
> alignment then STACK_SLOT_ALIGNMENT should return it.
> 

I don't see a real problem here.  All targets except i386 and gcn (whatever that is)
use the default for STACK_SLOT_ALIGNMENT, which simply lets any (large) align value
determine the effective STACK_SLOT_ALIGNMENT.  The user could have simply declared
the local variable with the alignment that results in better code, FWIW.

If the requested stack alignment is too high, it is capped in assign_stack_local:

  /* Ignore alignment if it exceeds MAX_SUPPORTED_STACK_ALIGNMENT.  */
  if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT)
    {
      alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT;
      alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT;
    }

I, for one, would just assume that MAX_SUPPORTED_STACK_ALIGNMENT should
be sufficient for all modes that need movmisalign_optab and friends.
If it is not, an ICE would be just fine.

>>>
>>>>  Note that set_mem_attributes does not
>>>> always preserve the MEM_ALIGN of the ref, since:
>>>
>>> set_mem_attributes sets _all_ attributes from an expression or type.
>>>
>>
>> Not really:
>>
>>   refattrs = MEM_ATTRS (ref);
>>   if (refattrs)
>>     {
>>       /* ??? Can this ever happen?  Calling this routine on a MEM that
>>          already carries memory attributes should probably be invalid.  */
>>       [...]
>>       attrs.align = refattrs->align;
>>     }
>>   else
>>     [...]
>>
>>   if (objectp || TREE_CODE (t) == INDIRECT_REF)
>>     attrs.align = MAX (attrs.align, TYPE_ALIGN (type));
>>
>>>>   /* Default values from pre-existing memory attributes if present.  */
>>>>   refattrs = MEM_ATTRS (ref);
>>>>   if (refattrs)
>>>>     {
>>>>       /* ??? Can this ever happen?  Calling this routine on a MEM that
>>>>          already carries memory attributes should probably be invalid.  */
>>>>       attrs.expr = refattrs->expr;
>>>>       attrs.offset_known_p = refattrs->offset_known_p;
>>>>       attrs.offset = refattrs->offset;
>>>>       attrs.size_known_p = refattrs->size_known_p;
>>>>       attrs.size = refattrs->size;
>>>>       attrs.align = refattrs->align;
>>>>     }
>>>>
>>>> but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT
>>>> the MEM_ATTRS are zero, and a smaller alignment may result.
>>>
>>> Not sure what you are saying here.  That
>>>
>>> set_mem_align (MEM:SI A32, 32)
>>>
>>> produces a NULL MEM_ATTRS and thus set_mem_attributes not inheriting
>>> the A32 but eventually computing sth lower?  Yeah, that's probably
>>> an interesting "hole" here.  I'm quite sure that if we'd do
>>>
>>> refattrs = MEM_ATTRS (ref) ? MEM_ATTRS (ref) : mem_mode_attrs[(int) GET_MODE (ref)];
>>>
>>> we run into issues exactly on strict-align targets ...
>>>
>>
>> Yeah, that's scary...
>>
>>>
>>> @@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
>>>
>>>        did_conversion = true;
>>>      }
>>> +  else if (MEM_P (data->entry_parm)
>>> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
>>> +             > MEM_ALIGN (data->entry_parm)
>>> +          && (((icode = optab_handler (movmisalign_optab,
>>> +                                       promoted_nominal_mode))
>>> +               != CODE_FOR_nothing)
>>> +              || targetm.slow_unaligned_access (promoted_nominal_mode,
>>> +                                                MEM_ALIGN (data->entry_parm))))
>>> +    {
>>> +      if (icode != CODE_FOR_nothing)
>>> +       emit_insn (GEN_FCN (icode) (parmreg, validated_mem));
>>> +      else
>>> +       rtl = parmreg = extract_bit_field (validated_mem,
>>> +                       GET_MODE_BITSIZE (promoted_nominal_mode), 0,
>>> +                       unsignedp, parmreg,
>>> +                       promoted_nominal_mode, VOIDmode, false, NULL);
>>> +    }
>>>    else
>>>      emit_move_insn (parmreg, validated_mem);
>>>
>>> This hunk would be obvious to me if we'd use MEM_ALIGN (validated_mem) /
>>> GET_MODE (validated_mem) instead of MEM_ALIGN (data->entry_parm)
>>> and promoted_nominal_mode.
>>>
>>
>> Yes, the idea is just to save some cycles, since
>>
>> parmreg = gen_reg_rtx (promoted_nominal_mode);
>> we know that parmreg will also have that mode, plus
>> emit_move_insn (parmreg, validated_mem) which would be called here
>> asserts that:
>>
>>   gcc_assert (mode != BLKmode
>>               && (GET_MODE (y) == mode || GET_MODE (y) == VOIDmode));
>>
>> so GET_MODE(validated_mem) == GET_MODE (parmreg) == promoted_nominal_mode
>>
>> I still like the current version with promoted_nominal_mode slightly
>> better both because of performance, and the 80-column restriction. :)
> 
> So if you say they are 1:1 equivalent then go for it (for this hunk,
> approved as "obvious").
> 

Okay.  Thanks, so I committed that hunk as r274531.

Here is what I have right now, boot-strapped and reg-tested on x86_64-pc-linux-gnu
and arm-linux-gnueabihf (still running, but looks good so far).

Is it OK for trunk?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-arm-align-abi.diff --]
[-- Type: text/x-patch; name="patch-arm-align-abi.diff", Size: 11293 bytes --]

2019-08-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* expr.c (expand_expr_real_1): Handle FUNCTION_DECL as unaligned.
	* function.c (assign_parm_find_stack_rtl): Use larger alignment
	when possible.
	(assign_parm_adjust_stack_rtl): Check movmisalign optab too.
	(assign_parm_setup_stack): Allocate properly aligned stack slots.
	* varasm.c (align_variable): Align constants of misaligned types.
	* config/arm/arm.md (movdi, movsi, movhi, movhf, movsf, movdf): Check
	strict alignment restrictions on memory addresses.
	* config/arm/neon.md (movti, mov<VSTRUCT>, mov<VH>): Likewise.
	* config/arm/vec-common.md (mov<VALL>): Likewise.

testsuite:
2019-08-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* gcc.target/arm/unaligned-argument-1.c: New test.
	* gcc.target/arm/unaligned-argument-2.c: New test.

Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 274531)
+++ gcc/config/arm/arm.md	(working copy)
@@ -5838,6 +5838,12 @@
 	(match_operand:DI 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (DImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (DImode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
@@ -6014,6 +6020,12 @@
   {
   rtx base, offset, tmp;
 
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (SImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (SImode));
   if (TARGET_32BIT || TARGET_HAVE_MOVT)
     {
       /* Everything except mem = const or mem = mem can be done easily.  */
@@ -6503,6 +6515,12 @@
 	(match_operand:HI 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (HImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (HImode));
   if (TARGET_ARM)
     {
       if (can_create_pseudo_p ())
@@ -6912,6 +6930,12 @@
 	(match_operand:HF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (HFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (HFmode));
   if (TARGET_32BIT)
     {
       if (MEM_P (operands[0]))
@@ -6976,6 +7000,12 @@
 	(match_operand:SF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (SFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (SFmode));
   if (TARGET_32BIT)
     {
       if (MEM_P (operands[0]))
@@ -7071,6 +7101,12 @@
 	(match_operand:DF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (DFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (DFmode));
   if (TARGET_32BIT)
     {
       if (MEM_P (operands[0]))
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 274531)
+++ gcc/config/arm/neon.md	(working copy)
@@ -127,6 +127,12 @@
 	(match_operand:TI 1 "general_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (TImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (TImode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
@@ -139,6 +145,12 @@
 	(match_operand:VSTRUCT 1 "general_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
@@ -151,6 +163,12 @@
 	(match_operand:VH 1 "s_register_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
Index: gcc/config/arm/vec-common.md
===================================================================
--- gcc/config/arm/vec-common.md	(revision 274531)
+++ gcc/config/arm/vec-common.md	(working copy)
@@ -26,6 +26,12 @@
   "TARGET_NEON
    || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 274531)
+++ gcc/expr.c	(working copy)
@@ -10796,6 +10796,14 @@ expand_expr_real_1 (tree exp, rtx target, machine_
 	    MEM_VOLATILE_P (op0) = 1;
 	  }
 
+	if (MEM_P (op0) && TREE_CODE (tem) == FUNCTION_DECL)
+	  {
+	    if (op0 == orig_op0)
+	      op0 = copy_rtx (op0);
+
+	    set_mem_align (op0, BITS_PER_UNIT);
+	  }
+
 	/* In cases where an aligned union has an unaligned object
 	   as a field, we might be extracting a BLKmode value from
 	   an integer-mode (e.g., SImode) object.  Handle this case
Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 274531)
+++ gcc/function.c	(working copy)
@@ -2697,8 +2697,23 @@ assign_parm_find_stack_rtl (tree parm, struct assi
      intentionally forcing upward padding.  Otherwise we have to come
      up with a guess at the alignment based on OFFSET_RTX.  */
   poly_int64 offset;
-  if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
+  if (data->locate.where_pad == PAD_NONE || data->entry_parm)
     align = boundary;
+  else if (data->locate.where_pad == PAD_UPWARD)
+    {
+      align = boundary;
+      /* If the argument offset is actually more aligned than the nominal
+	 stack slot boundary, take advantage of that excess alignment.
+	 Don't make any assumptions if STACK_POINTER_OFFSET is in use.  */
+      if (poly_int_rtx_p (offset_rtx, &offset)
+	  && STACK_POINTER_OFFSET == 0)
+	{
+	  unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
+	  if (offset_align == 0 || offset_align > STACK_BOUNDARY)
+	    offset_align = STACK_BOUNDARY;
+	  align = MAX (align, offset_align);
+	}
+    }
   else if (poly_int_rtx_p (offset_rtx, &offset))
     {
       align = least_bit_hwi (boundary);
@@ -2812,8 +2827,10 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
      stack slot, if we need one.  */
   if (stack_parm
       && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
-	   && targetm.slow_unaligned_access (data->nominal_mode,
-					     MEM_ALIGN (stack_parm)))
+	   && ((optab_handler (movmisalign_optab, data->nominal_mode)
+		!= CODE_FOR_nothing)
+	       || targetm.slow_unaligned_access (data->nominal_mode,
+						 MEM_ALIGN (stack_parm))))
 	  || (data->nominal_type
 	      && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
 	      && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
@@ -3466,11 +3483,20 @@ assign_parm_setup_stack (struct assign_parm_data_a
 	  int align = STACK_SLOT_ALIGNMENT (data->passed_type,
 					    GET_MODE (data->entry_parm),
 					    TYPE_ALIGN (data->passed_type));
+	  if (align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
+	      && ((optab_handler (movmisalign_optab,
+				  GET_MODE (data->entry_parm))
+		   != CODE_FOR_nothing)
+		  || targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
+						    align)))
+	    align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
 	  data->stack_parm
 	    = assign_stack_local (GET_MODE (data->entry_parm),
 				  GET_MODE_SIZE (GET_MODE (data->entry_parm)),
 				  align);
+	  align = MEM_ALIGN (data->stack_parm);
 	  set_mem_attributes (data->stack_parm, parm, 1);
+	  set_mem_align (data->stack_parm, align);
 	}
 
       dest = validize_mem (copy_rtx (data->stack_parm));
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 1 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */
+/* { dg-final { scan-assembler-times "stm" 0 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 0 } } */
+/* { dg-final { scan-assembler-times "strd" 0 } } */
+/* { dg-final { scan-assembler-times "stm" 1 } } */
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 274531)
+++ gcc/varasm.c	(working copy)
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stmt.h"
 #include "expr.h"
 #include "expmed.h"
+#include "optabs.h"
 #include "output.h"
 #include "langhooks.h"
 #include "debug.h"
@@ -1085,6 +1086,12 @@ align_variable (tree decl, bool dont_output_data)
 	}
     }
 
+  if (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
+      && ((optab_handler (movmisalign_optab, DECL_MODE (decl))
+	   != CODE_FOR_nothing)
+	  || targetm.slow_unaligned_access (DECL_MODE (decl), align)))
+    align = GET_MODE_ALIGNMENT (DECL_MODE (decl));
+
   /* Reset the alignment in case we have made it tighter, so we can benefit
      from it in get_pointer_alignment.  */
   SET_DECL_ALIGN (decl, align);

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-15 12:38                     ` Bernd Edlinger
@ 2019-08-15 13:03                       ` Richard Biener
  2019-08-15 14:33                         ` Richard Biener
  2019-08-15 15:28                         ` Bernd Edlinger
  0 siblings, 2 replies; 50+ messages in thread
From: Richard Biener @ 2019-08-15 13:03 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

On Thu, 15 Aug 2019, Bernd Edlinger wrote:

> On 8/15/19 10:55 AM, Richard Biener wrote:
> > On Wed, 14 Aug 2019, Bernd Edlinger wrote:
> > 
> >> On 8/14/19 2:00 PM, Richard Biener wrote:
> >>
> >> Well, yes, but I was scared away by the complexity of emit_move_insn_1.
> >>
> >> It could be done, but at the moment I would be happy to have these
> >> checks on one major strict-alignment target.  ARM is a good candidate
> >> since most instructions work even if they are accidentally
> >> using unaligned arguments, so middle-end errors are not always
> >> visible in ordinary tests.  Nevertheless it is a blatant violation of the
> >> contract between middle-end and back-end, which should be avoided.
> > 
> > Fair enough.
> > 
> >>>> Several struct-layout-1.dg testcases tripped over misaligned
> >>>> complex_cst constants, fixed by varasm.c (align_variable).
> >>>> This is likely a wrong-code bug, because misaligned complex
> >>>> constants are expanded to misaligned MEM_REFs, but the
> >>>> expansion cannot handle misaligned constants, only packed
> >>>> structure fields.
> >>>
> >>> Hmm.  So your patch overrides user-alignment here.  Wouldn't it
> >>> be better to do that more consciously by
> >>>
> >>>   if (! DECL_USER_ALIGN (decl)
> >>>       || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
> >>>           && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
> >>>
> 
> ? I don't know why that would be better?
> If the value is underaligned no matter why, pretend it was declared as
> naturally aligned if that causes wrong code otherwise.
> That was the idea here.

It would be better because then we ignore it and use what we'd use
by default rather than inventing sth new.  And your patch suggests
it might be needed to up align even w/o DECL_USER_ALIGN.

> >>> ?  And why is the movmisalign optab support missing here?
> >>>
> >>
> >> Yes, I wanted to replicate what we have in assign_parm_adjust_stack_rtl:
> >>
> >>   /* If we can't trust the parm stack slot to be aligned enough for its
> >>      ultimate type, don't use that slot after entry.  We'll make another
> >>      stack slot, if we need one.  */
> >>   if (stack_parm
> >>       && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
> >>            && targetm.slow_unaligned_access (data->nominal_mode,
> >>                                              MEM_ALIGN (stack_parm)))
> >>
> >> which also makes a variable more aligned than it is declared.
> >> But maybe both should also check the movmisalign optab in
> >> addition to slow_unaligned_access ?
> > 
> > Quite possible.
> > 
> 
> Will do, see attached new version of the patch.
> 
> >>> IMHO whatever code later fails to properly use unaligned loads
> >>> should be fixed instead rather than ignoring user requested alignment.
> >>>
> >>> Can you quote a short testcase that explains what exactly goes wrong?
> >>> The struct-layout ones are awkward to look at...
> >>>
> >>
> >> Sure,
> >>
> >> $ cat test.c
> >> _Complex float __attribute__((aligned(1))) cf;
> >>
> >> void foo (void)
> >> {
> >>   cf = 1.0i;
> >> }
> >>
> >> $ arm-linux-gnueabihf-gcc -S test.c 
> >> during RTL pass: expand
> >> test.c: In function 'foo':
> >> test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003
> >>     5 |   cf = 1.0i;
> >>       |   ~~~^~~~~~
> >> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
> >> 	../../gcc-trunk/gcc/config/arm/arm.md:7003
> >> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
> >> 	../../gcc-trunk/gcc/recog.h:318
> >> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
> >> 	../../gcc-trunk/gcc/expr.c:3695
> >> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
> >> 	../../gcc-trunk/gcc/expr.c:3791
> >> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
> >> 	../../gcc-trunk/gcc/expr.c:3490
> >> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
> >> 	../../gcc-trunk/gcc/expr.c:3791
> >> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
> >> 	../../gcc-trunk/gcc/expr.c:5855
> >> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
> >> 	../../gcc-trunk/gcc/expr.c:5441
> > 
> > Huh, so why didn't it trigger
> > 
> >   /* Handle misaligned stores.  */
> >   mode = TYPE_MODE (TREE_TYPE (to));
> >   if ((TREE_CODE (to) == MEM_REF
> >        || TREE_CODE (to) == TARGET_MEM_REF)
> >       && mode != BLKmode
> >       && !mem_ref_refers_to_non_mem_p (to)
> >       && ((align = get_object_alignment (to))
> >           < GET_MODE_ALIGNMENT (mode))
> >       && (((icode = optab_handler (movmisalign_optab, mode))
> >            != CODE_FOR_nothing)
> >           || targetm.slow_unaligned_access (mode, align)))
> >     {
> > 
> > ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
> > var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2]
> > <var_decl 0x2aaaaaad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.)
> > 
> > Ah, 'to' is a plain DECL here so the above handling is incomplete.
> > IIRC component refs like __real cf = 0.f should be handled fine
> > again(?).  So, does adding || DECL_P (to) fix the case as well?
> > 
> 
> So I tried this instead of the varasm.c change:
> 
> Index: expr.c
> ===================================================================
> --- expr.c	(revision 274487)
> +++ expr.c	(working copy)
> @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem
>    /* Handle misaligned stores.  */
>    mode = TYPE_MODE (TREE_TYPE (to));
>    if ((TREE_CODE (to) == MEM_REF
> -       || TREE_CODE (to) == TARGET_MEM_REF)
> +       || TREE_CODE (to) == TARGET_MEM_REF
> +       || DECL_P (to))
>        && mode != BLKmode
> -      && !mem_ref_refers_to_non_mem_p (to)
> +      && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
>        && ((align = get_object_alignment (to))
>  	  < GET_MODE_ALIGNMENT (mode))
>        && (((icode = optab_handler (movmisalign_optab, mode))
> 
> Result: yes, it fixes this test case,
> but when I run all of struct-layout-1.exp there are still cases where we have problems:
> 
> In file included from /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:
> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h: In function 'test2112':
> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:23:10: internal compiler error: in gen_movdf, at config/arm/arm.md:7107
> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:62:3: note: in definition of macro 'TX'
> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:1: note: in expansion of macro 'TCI'
> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:294: note: in expansion of macro 'F'
> 0x7ba377 gen_movdf(rtx_def*, rtx_def*)
>         ../../gcc-trunk/gcc/config/arm/arm.md:7107
> 0xa494c7 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>         ../../gcc-trunk/gcc/recog.h:318
> 0xa494c7 emit_move_insn_1(rtx_def*, rtx_def*)
>         ../../gcc-trunk/gcc/expr.c:3695
> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)
>         ../../gcc-trunk/gcc/expr.c:3791
> 0xa49437 emit_move_complex_parts(rtx_def*, rtx_def*)
>         ../../gcc-trunk/gcc/expr.c:3490
> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)
>         ../../gcc-trunk/gcc/expr.c:3791
> 0xa50faf store_expr(tree_node*, rtx_def*, int, bool, bool)
>         ../../gcc-trunk/gcc/expr.c:5856
> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)
>         ../../gcc-trunk/gcc/expr.c:5302
> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)
>         ../../gcc-trunk/gcc/expr.c:4983
> 0x9338af expand_gimple_stmt_1
>         ../../gcc-trunk/gcc/cfgexpand.c:3777
> 0x9338af expand_gimple_stmt
>         ../../gcc-trunk/gcc/cfgexpand.c:3875
> 0x939221 expand_gimple_basic_block
>         ../../gcc-trunk/gcc/cfgexpand.c:5915
> 0x93af86 execute
>         ../../gcc-trunk/gcc/cfgexpand.c:6538
> Please submit a full bug report,
> 
> My personal gut feeling is that this will be more fragile than over-aligning the
> constants.

As said, the constant shouldn't end up under-aligned; the user cannot
specify the alignment of literal constants.  Not sure what you mean
by "over"-aligning.

> 
> 
> >> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
> >> 	../../gcc-trunk/gcc/expr.c:4983
> >> 0x93396f expand_gimple_stmt_1
> >> 	../../gcc-trunk/gcc/cfgexpand.c:3777
> >> 0x93396f expand_gimple_stmt
> >> 	../../gcc-trunk/gcc/cfgexpand.c:3875
> >> 0x9392e1 expand_gimple_basic_block
> >> 	../../gcc-trunk/gcc/cfgexpand.c:5915
> >> 0x93b046 execute
> >> 	../../gcc-trunk/gcc/cfgexpand.c:6538
> >> Please submit a full bug report,
> >> with preprocessed source if appropriate.
> >> Please include the complete backtrace with any bug report.
> >> See <https://gcc.gnu.org/bugs/> for instructions.
> >>
> >> Without the hunk in varasm.c of course.
> >>
> >> What happens is that expand_expr_real_2 returns an unaligned mem_ref here:
> >>
> >>     case COMPLEX_CST:
> >>       /* Handle evaluating a complex constant in a CONCAT target.  */
> >>       if (original_target && GET_CODE (original_target) == CONCAT)
> >>         {
> >>           [... this path not taken ...]
> 
> BTW: this code block executes when the other ICE happens.
>  
> >>         }
> >>
> >>       /* fall through */
> >>
> >>     case STRING_CST:
> >>       temp = expand_expr_constant (exp, 1, modifier);
> >>
> >>       /* temp contains a constant address.
> >>          On RISC machines where a constant address isn't valid,
> >>          make some insns to get that address into a register.  */
> >>       if (modifier != EXPAND_CONST_ADDRESS
> >>           && modifier != EXPAND_INITIALIZER
> >>           && modifier != EXPAND_SUM
> >>           && ! memory_address_addr_space_p (mode, XEXP (temp, 0),
> >>                                             MEM_ADDR_SPACE (temp)))
> >>         return replace_equiv_address (temp,
> >>                                       copy_rtx (XEXP (temp, 0)));
> >>       return temp;
> >>
> >> The result of expand_expr_real(..., EXPAND_NORMAL) ought to be usable
> >> by emit_move_insn, that is expected just *everywhere* and can't be changed.
> >>
> >> This could probably be fixed in an ugly way in the COMPLEX_CST handler,
> >> but OTOH, I don't see any reason why this constant has to be misaligned
> >> when it can be easily aligned, which avoids the need for a misaligned access.
> > 
> > If the COMPLEX_CST happens to end up in unaligned memory then that's
> > of course a bug (unless the target requests that for all COMPLEX_CSTs).
> > That is, if the unalignment is triggered because the store is to an
> > unaligned decl.
> > 
> > But I think the issue is the above one?
> > 
> 
> Yes, initially the constant seems to be unaligned.  Then it is expanded,
> and there is no special handling for unaligned constants in expand_expr_real,
> and then expand_assignment or store_expr are probably not fully prepared for
> this either.

With a cross compiler I see the constant has a regular aligned _Complex type,
so I'm not sure how it can end up unaligned.

> >>>> Furthermore gcc.dg/Warray-bounds-33.c was fixed by the
> >>>> change in expr.c (expand_expr_real_1).  Certainly it is invalid
> >>>> to read memory at a function address, but it should not ICE.
> >>>> The problem here is that the MEM_REF has no valid MEM_ALIGN; it looks
> >>>> like A32, so the misaligned code path is not taken, but it is
> >>>> set to A8 below, and then we hit an ICE if the result is used:
> >>>
> >>> So the user accessed it as A32.
> >>>
> >>>>         /* Don't set memory attributes if the base expression is
> >>>>            SSA_NAME that got expanded as a MEM.  In that case, we should
> >>>>            just honor its original memory attributes.  */
> >>>>         if (TREE_CODE (tem) != SSA_NAME || !MEM_P (orig_op0))
> >>>>           set_mem_attributes (op0, exp, 0);
> >>>
> >>> Huh, I don't understand this.  'tem' should never be SSA_NAME.
> >>
> >> tem is the result of get_inner_reference, why can't that be a SSA_NAME ?
> > 
> > We can't subset an SSA_NAME.  I have really no idea what this intended
> > to do...
> > 
> 
> Nice, so would you do a patch to change that to a
> gcc_checking_assert (TREE_CODE (tem) != SSA_NAME) ?
> maybe with a small explanation?

I'll try.

> >>> But set_mem_attributes_minus_bitpos uses get_object_alignment_1
> >>> and that has special treatment for FUNCTION_DECLs that is not
> >>> covered by
> >>>
> >>>       /* When EXP is an actual memory reference then we can use
> >>>          TYPE_ALIGN of a pointer indirection to derive alignment.
> >>>          Do so only if get_pointer_alignment_1 did not reveal absolute
> >>>          alignment knowledge and if using that alignment would
> >>>          improve the situation.  */
> >>>       unsigned int talign;
> >>>       if (!addr_p && !known_alignment
> >>>           && (talign = min_align_of_type (TREE_TYPE (exp)) * BITS_PER_UNIT)
> >>>           && talign > align)
> >>>         align = talign;
> >>>
> >>> which could be moved out of the if-cascade.
> >>>
> >>> That said, setting A8 should eventually result into appropriate
> >>> unaligned expansion, so it seems odd this triggers the assert...
> >>>
> >>
> >> The function pointer is really 32-byte aligned in ARM mode to start
> >> with...
> >>
> >> The problem is that the code that handles this misaligned access
> >> is skipped because the mem_rtx has initially no MEM_ATTRS and therefore
> >> MEM_ALIGN == 32, and therefore the code that handles the unaligned
> >> access is not taken.  BUT before the mem_rtx is returned it is
> >> set to MEM_ALIGN = 8 by set_mem_attributes, and we have an assertion,
> >> because the result from expand_expr_real(..., EXPAND_NORMAL) ought to be
> >> usable with emit_move_insn.
> > 
> > yes, as said the _access_ determines the address should be aligned
> > so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according
> > to the access type/mode.  But we can't trust DECL_ALIGN of
> > FUNCTION_DECLs but we _can_ trust users writing *(int *)fn
> > (maybe for actual accesses we _can_ trust DECL_ALIGN, it's just
> > we may not compute nonzero bits for the actual address because
> > of function pointer mangling)
> > (for accessing function code I'd say this would be premature
> > optimization, but ...)
> > 
> 
> Not a very nice solution, but it is not worth to spend much effort
> in optimizing undefined behavior, I just want to avoid the ICE
> at this time and would not trust the DECL_ALIGN either.

So I meant

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c      (revision 274534)
+++ gcc/builtins.c      (working copy)
@@ -255,7 +255,8 @@ get_object_alignment_2 (tree exp, unsign
 
   /* Extract alignment information from the innermost object and
      possibly adjust bitpos and offset.  */
-  if (TREE_CODE (exp) == FUNCTION_DECL)
+  if (TREE_CODE (exp) == FUNCTION_DECL
+      && addr_p)
     {
       /* Function addresses can encode extra information besides their
         alignment.  However, if TARGET_PTRMEMFUNC_VBIT_LOCATION

so we get at DECL_ALIGN of the FUNCTION_DECL (not sure if we
can trust it).

> >>>>
> >>>> Finally gcc.dg/torture/pr48493.c required the change
> >>>> in assign_parm_setup_stack.  This is just not using the
> >>>> correct MEM_ALIGN attribute value, while the memory is
> >>>> actually aligned.
> >>>
> >>> But doesn't
> >>>
> >>>           int align = STACK_SLOT_ALIGNMENT (data->passed_type,
> >>>                                             GET_MODE (data->entry_parm),
> >>>                                             TYPE_ALIGN (data->passed_type));
> >>> +         if (align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
> >>> +             && targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
> >>> +                                               align))
> >>> +           align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
> >>>
> >>> hint at that STACK_SLOT_ALIGNMENT is simply bogus for the target?
> >>> That is, the target says, for natural alignment 64 the stack slot
> >>> alignment can only be guaranteed 32.  You can't then simply up it
> >>> but have to use unaligned accesses (or the target/middle-end needs
> >>> to do dynamic stack alignment).
> >>>
> >> Yes, maybe, but STACK_SLOT_ALIGNMENT is used in a few other places as well,
> >> and none of them have a problem, probably because they use expand_expr,
> >> but here we use emit_move_insn:
> >>
> >>       if (MEM_P (src))
> >>         {
> >>           [...]
> >>         }
> >>       else
> >>         {
> >>           if (!REG_P (src))
> >>             src = force_reg (GET_MODE (src), src);
> >>           emit_move_insn (dest, src);
> >>         }
> >>
> >> So I could restrict that to
> >>
> >>           if (!MEM_P (data->entry_parm)
> >>               && align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
> >>               && ((optab_handler (movmisalign_optab,
> >> 				  GET_MODE (data->entry_parm))
> >>                    != CODE_FOR_nothing)
> >>                   || targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
> >>                                                     align)))
> >>             align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
> >>
> >> But OTOH even for arguments arriving in unaligned stack slots where
> >> emit_block_move could handle it, that would just work against the
> >> intention of assign_parm_adjust_stack_rtl.
> >>
> >> Of course there are limits how much alignment assign_stack_local
> >> can handle, and that would result in an assertion in the emit_move_insn.
> >> But in the end if that happens it is just an impossible target
> >> configuration.
> > 
> > Still I think you can't simply override STACK_SLOT_ALIGNMENT just because
> > of the mode of an entry param, can you?  If you can assume a bigger
> > alignment then STACK_SLOT_ALIGNMENT should return it.
> > 
> 
> I don't see a real problem here.  All targets except i386 and gcn (whatever that is)
> use the default for STACK_SLOT_ALIGNMENT which simply allows any (large) align value
> to rule the effective STACK_SLOT_ALIGNMENT.  The user could have simply declared
> the local variable with the alignment that results in better code FWIW.
> 
> If the stack alignment is too high that is capped in assign_stack_local:
> 
>   /* Ignore alignment if it exceeds MAX_SUPPORTED_STACK_ALIGNMENT.  */
>   if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT)
>     {
>       alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT;
>       alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT;
>     }
> 
> I for one, would just assume that MAX_SUPPORTED_STACK_ALIGNMENT should
> be sufficient for all modes that need movmisalign_optab and friends.
> If it is not, an ICE would be just fine.

Hmm.  In some way we could better communicate with the user then
and not allow under-aligning automatic vars?  But then you
still have packed structs with BLKmode where the actual field
accesses will carry SImode even when not aligned(?)
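
To make the packed case concrete, here is a minimal standalone sketch
(not from the testsuite; the struct and function names are made up, and
GCC attribute syntax is assumed).  The field is 4 bytes wide but sits at
offset 1, and since the access goes through the packed type the compiler
is told that it may be misaligned:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example type.  'packed' puts 'value' at offset 1, so it
   is never naturally aligned.  */
struct __attribute__((packed)) rec
{
  char tag;
  int32_t value;
};

static_assert (offsetof (struct rec, value) == 1,
               "value is expected to sit at a misaligned offset");

/* The access goes through the packed struct type, so the compiler knows
   the field may be misaligned and has to pick a suitable access
   sequence for the target.  */
int32_t
read_value (const struct rec *r)
{
  return r->value;
}

void
write_value (struct rec *r, int32_t v)
{
  r->value = v;
}
```

Storing a value with write_value and reading it back with read_value
should round-trip on any target, whatever instructions get chosen.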

> >>>
> >>>>  Note that set_mem_attributes does not
> >>>> always preserve the MEM_ALIGN of the ref, since:
> >>>
> >>> set_mem_attributes sets _all_ attributes from an expression or type.
> >>>
> >>
> >> Not really:
> >>
> >>   refattrs = MEM_ATTRS (ref);
> >>   if (refattrs)
> >>     {
> >>       /* ??? Can this ever happen?  Calling this routine on a MEM that
> >>          already carries memory attributes should probably be invalid.  */
> >>       [...]
> >>       attrs.align = refattrs->align;
> >>     }
> >>   else
> >>     [...]
> >>
> >>   if (objectp || TREE_CODE (t) == INDIRECT_REF)
> >>     attrs.align = MAX (attrs.align, TYPE_ALIGN (type));
> >>
> >>>>   /* Default values from pre-existing memory attributes if present.  */
> >>>>   refattrs = MEM_ATTRS (ref);
> >>>>   if (refattrs)
> >>>>     {
> >>>>       /* ??? Can this ever happen?  Calling this routine on a MEM that
> >>>>          already carries memory attributes should probably be invalid.  */
> >>>>       attrs.expr = refattrs->expr;
> >>>>       attrs.offset_known_p = refattrs->offset_known_p;
> >>>>       attrs.offset = refattrs->offset;
> >>>>       attrs.size_known_p = refattrs->size_known_p;
> >>>>       attrs.size = refattrs->size;
> >>>>       attrs.align = refattrs->align;
> >>>>     }
> >>>>
> >>>> but if we happen to set_mem_align to _exactly_ the MODE_ALIGNMENT
> >>>> the MEM_ATTRS are zero, and a smaller alignment may result.
> >>>
> >>> Not sure what you are saying here.  That
> >>>
> >>> set_mem_align (MEM:SI A32, 32)
> >>>
> >>> produces a NULL MEM_ATTRS and thus set_mem_attributes not inheriting
> >>> the A32 but eventually computing sth lower?  Yeah, that's probably
> >>> an interesting "hole" here.  I'm quite sure that if we'd do
> >>>
> >>> refattrs = MEM_ATTRS (ref) ? MEM_ATTRS (ref) : mem_mode_attrs[(int) GET_MODE (ref)];
> >>>
> >>> we run into issues exactly on strict-align targets ...
> >>>
> >>
> >> Yeah, that's scary...
> >>
> >>>
> >>> @@ -3291,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> >>>
> >>>        did_conversion = true;
> >>>      }
> >>> +  else if (MEM_P (data->entry_parm)
> >>> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> >>> +             > MEM_ALIGN (data->entry_parm)
> >>> +          && (((icode = optab_handler (movmisalign_optab,
> >>> +                                       promoted_nominal_mode))
> >>> +               != CODE_FOR_nothing)
> >>> +              || targetm.slow_unaligned_access (promoted_nominal_mode,
> >>> +                                                MEM_ALIGN (data->entry_parm))))
> >>> +    {
> >>> +      if (icode != CODE_FOR_nothing)
> >>> +       emit_insn (GEN_FCN (icode) (parmreg, validated_mem));
> >>> +      else
> >>> +       rtl = parmreg = extract_bit_field (validated_mem,
> >>> +                       GET_MODE_BITSIZE (promoted_nominal_mode), 0,
> >>> +                       unsignedp, parmreg,
> >>> +                       promoted_nominal_mode, VOIDmode, false, NULL);
> >>> +    }
> >>>    else
> >>>      emit_move_insn (parmreg, validated_mem);
> >>>
> >>> This hunk would be obvious to me if we'd use MEM_ALIGN (validated_mem) /
> >>> GET_MODE (validated_mem) instead of MEM_ALIGN (data->entry_parm)
> >>> and promoted_nominal_mode.
> >>>
> >>
> >> Yes, the idea is just to save some cycles, since
> >>
> >> parmreg = gen_reg_rtx (promoted_nominal_mode);
> >> we know that parmreg will also have that mode, plus
> >> emit_move_insn (parmreg, validated_mem) which would be called here
> >> asserts that:
> >>
> >>   gcc_assert (mode != BLKmode
> >>               && (GET_MODE (y) == mode || GET_MODE (y) == VOIDmode));
> >>
> >> so GET_MODE(validated_mem) == GET_MODE (parmreg) == promoted_nominal_mode
> >>
> >> I still like the current version with promoted_nominal_mode slightly
> >> better both because of performance, and the 80-column restriction. :)
> > 
> > So if you say they are 1:1 equivalent then go for it (for this hunk,
> > approved as "obvious").
> > 
> 
> Okay.  Thanks, so I committed that hunk as r274531.
> 
> Here is what I have right now, boot-strapped and reg-tested on x86_64-pc-linux-gnu
> and arm-linux-gnueabihf (still running, but looks good so far).
> 
> Is it OK for trunk?

Please split it into the parts for the PR and parts making the
asserts not trigger.

The PR is already fixed, right?  The assign_parm_find_stack_rtl hunk
is merely an optimization?

Richard.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-15 13:03                       ` Richard Biener
@ 2019-08-15 14:33                         ` Richard Biener
  2019-08-15 15:28                         ` Bernd Edlinger
  1 sibling, 0 replies; 50+ messages in thread
From: Richard Biener @ 2019-08-15 14:33 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

On Thu, 15 Aug 2019, Richard Biener wrote:

> On Thu, 15 Aug 2019, Bernd Edlinger wrote:
> > > 
> > > We can't subset an SSA_NAME.  I have really no idea what this intended
> > > to do...
> > > 
> > 
> > Nice, so would you do a patch to change that to a
> > gcc_checking_assert (TREE_CODE (tem) != SSA_NAME) ?
> > maybe with a small explanation?
> 
> I'll try.

So actually we can subset an SSA_NAME, via BIT_FIELD_REF<_1, ...>, and
that _1 can end up being expanded in memory.  See r233656 which brought
this in.

Richard.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-15 13:03                       ` Richard Biener
  2019-08-15 14:33                         ` Richard Biener
@ 2019-08-15 15:28                         ` Bernd Edlinger
  2019-08-15 17:42                           ` Richard Biener
  1 sibling, 1 reply; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-15 15:28 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

On 8/15/19 2:54 PM, Richard Biener wrote:
> On Thu, 15 Aug 2019, Bernd Edlinger wrote:
> 
>>>>>
>>>>> Hmm.  So your patch overrides user-alignment here.  Wouldn't it
>>>>> be better to do that more consciously by
>>>>>
>>>>>   if (! DECL_USER_ALIGN (decl)
>>>>>       || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>>>>>           && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
>>>>>
>>
>> ? I don't know why that would be better?
>> If the value is underaligned no matter why, pretend it was declared as
>> naturally aligned if that causes wrong code otherwise.
>> That was the idea here.
> 
> It would be better because then we ignore it and use what we'd use
> by default rather than inventing sth new.  And your patch suggests
> it might be needed to up align even w/o DECL_USER_ALIGN.
> 

Hmmm, you mean the constant 1.0i should not have DECL_USER_ALIGN set?
But it inherits the alignment from the destination variable, apparently.

Did you mean
if (! DECL_USER_ALIGN (decl)
    && align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
    && ...
?

I can give it a try.

>>>>> IMHO whatever code later fails to properly use unaligned loads
>>>>> should be fixed instead rather than ignoring user requested alignment.
>>>>>
>>>>> Can you quote a short testcase that explains what exactly goes wrong?
>>>>> The struct-layout ones are awkward to look at...
>>>>>
>>>>
>>>> Sure,
>>>>
>>>> $ cat test.c
>>>> _Complex float __attribute__((aligned(1))) cf;
>>>>
>>>> void foo (void)
>>>> {
>>>>   cf = 1.0i;
>>>> }
>>>>
>>>> $ arm-linux-gnueabihf-gcc -S test.c 
>>>> during RTL pass: expand
>>>> test.c: In function 'foo':
>>>> test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003
>>>>     5 |   cf = 1.0i;
>>>>       |   ~~~^~~~~~
>>>> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
>>>> 	../../gcc-trunk/gcc/config/arm/arm.md:7003
>>>> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>>> 	../../gcc-trunk/gcc/recog.h:318
>>>> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
>>>> 	../../gcc-trunk/gcc/expr.c:3695
>>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>>>> 	../../gcc-trunk/gcc/expr.c:3791
>>>> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
>>>> 	../../gcc-trunk/gcc/expr.c:3490
>>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>>>> 	../../gcc-trunk/gcc/expr.c:3791
>>>> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
>>>> 	../../gcc-trunk/gcc/expr.c:5855
>>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>>>> 	../../gcc-trunk/gcc/expr.c:5441
>>>
>>> Huh, so why didn't it trigger
>>>
>>>   /* Handle misaligned stores.  */
>>>   mode = TYPE_MODE (TREE_TYPE (to));
>>>   if ((TREE_CODE (to) == MEM_REF
>>>        || TREE_CODE (to) == TARGET_MEM_REF)
>>>       && mode != BLKmode
>>>       && !mem_ref_refers_to_non_mem_p (to)
>>>       && ((align = get_object_alignment (to))
>>>           < GET_MODE_ALIGNMENT (mode))
>>>       && (((icode = optab_handler (movmisalign_optab, mode))
>>>            != CODE_FOR_nothing)
>>>           || targetm.slow_unaligned_access (mode, align)))
>>>     {
>>>
>>> ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
>>> var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl 
>>> 0x2aaaaaad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.
>>>
>>> Ah, 'to' is a plain DECL here so the above handling is incomplete.
>>> IIRC component refs like __real cf = 0.f should be handled fine
>>> again(?).  So, does adding || DECL_P (to) fix the case as well?
>>>
>>
>> So I tried this instead of the varasm.c change:
>>
>> Index: expr.c
>> ===================================================================
>> --- expr.c	(revision 274487)
>> +++ expr.c	(working copy)
>> @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem
>>    /* Handle misaligned stores.  */
>>    mode = TYPE_MODE (TREE_TYPE (to));
>>    if ((TREE_CODE (to) == MEM_REF
>> -       || TREE_CODE (to) == TARGET_MEM_REF)
>> +       || TREE_CODE (to) == TARGET_MEM_REF
>> +       || DECL_P (to))
>>        && mode != BLKmode
>> -      && !mem_ref_refers_to_non_mem_p (to)
>> +      && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
>>        && ((align = get_object_alignment (to))
>>  	  < GET_MODE_ALIGNMENT (mode))
>>        && (((icode = optab_handler (movmisalign_optab, mode))
>>
>> Result, yes, it fixes this test case
>> but when I run all of struct-layout-1.exp there are still cases where we have problems:
>>
>> In file included from /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:
>> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h: In function 'test2112':
>> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:23:10: internal compiler error: in gen_movdf, at config/arm/arm.md:7107
>> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:62:3: note: in definition of macro 'TX'
>> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:1: note: in expansion of macro 'TCI'
>> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:294: note: in expansion of macro 'F'
>> 0x7ba377 gen_movdf(rtx_def*, rtx_def*)
>>         ../../gcc-trunk/gcc/config/arm/arm.md:7107
>> 0xa494c7 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>         ../../gcc-trunk/gcc/recog.h:318
>> 0xa494c7 emit_move_insn_1(rtx_def*, rtx_def*)
>>         ../../gcc-trunk/gcc/expr.c:3695
>> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)
>>         ../../gcc-trunk/gcc/expr.c:3791
>> 0xa49437 emit_move_complex_parts(rtx_def*, rtx_def*)
>>         ../../gcc-trunk/gcc/expr.c:3490
>> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)
>>         ../../gcc-trunk/gcc/expr.c:3791
>> 0xa50faf store_expr(tree_node*, rtx_def*, int, bool, bool)
>>         ../../gcc-trunk/gcc/expr.c:5856
>> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)
>>         ../../gcc-trunk/gcc/expr.c:5302
>> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)
>>         ../../gcc-trunk/gcc/expr.c:4983
>> 0x9338af expand_gimple_stmt_1
>>         ../../gcc-trunk/gcc/cfgexpand.c:3777
>> 0x9338af expand_gimple_stmt
>>         ../../gcc-trunk/gcc/cfgexpand.c:3875
>> 0x939221 expand_gimple_basic_block
>>         ../../gcc-trunk/gcc/cfgexpand.c:5915
>> 0x93af86 execute
>>         ../../gcc-trunk/gcc/cfgexpand.c:6538
>> Please submit a full bug report,
>>
>> My personal gut feeling this will be more fragile than over-aligning the
>> constants.
> 
> As said the constant shouldn't end up under-aligned, the user cannot
> specify alignment of literal constants.  Not sure what you mean
> with "over"-aligning.
> 


Hmm wait a moment, I actually wanted _only_ to change the DECL_ARTIFICIAL
that is built by build_constant_desc.  It uses align_variable of course,
but I totally missed that this also controls the alignment of normal
variables, sorry about the confusion here.

I mean we should align the constant for the unaligned complex with
the natural alignment of the type-mode.  That wrong fix made
the variables ignore the alignment, which was of course not intended,
and instead I would need:

Index: expr.c
===================================================================
--- expr.c	(revision 274531)
+++ expr.c	(working copy)
@@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem
   /* Handle misaligned stores.  */
   mode = TYPE_MODE (TREE_TYPE (to));
   if ((TREE_CODE (to) == MEM_REF
-       || TREE_CODE (to) == TARGET_MEM_REF)
+       || TREE_CODE (to) == TARGET_MEM_REF
+       || DECL_P (to))
       && mode != BLKmode
-      && !mem_ref_refers_to_non_mem_p (to)
+      && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
       && ((align = get_object_alignment (to))
 	  < GET_MODE_ALIGNMENT (mode))
       && (((icode = optab_handler (movmisalign_optab, mode))

Index: varasm.c
===================================================================
--- varasm.c	(revision 274531)
+++ varasm.c	(working copy)
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stmt.h"
 #include "expr.h"
 #include "expmed.h"
+#include "optabs.h"
 #include "output.h"
 #include "langhooks.h"
 #include "debug.h"
@@ -3386,7 +3387,15 @@ build_constant_desc (tree exp)
   if (TREE_CODE (exp) == STRING_CST)
     SET_DECL_ALIGN (decl, targetm.constant_alignment (exp, DECL_ALIGN (decl)));
   else
-    align_variable (decl, 0);
+    {
+      align_variable (decl, 0);
+      if (DECL_ALIGN (decl) < GET_MODE_ALIGNMENT (DECL_MODE (decl))
+	  && ((optab_handler (movmisalign_optab, DECL_MODE (decl))
+		!= CODE_FOR_nothing)
+	      || targetm.slow_unaligned_access (DECL_MODE (decl),
+						DECL_ALIGN (decl))))
+	SET_DECL_ALIGN (decl, GET_MODE_ALIGNMENT (DECL_MODE (decl)));
+    }
 
   /* Now construct the SYMBOL_REF and the MEM.  */
   if (use_object_blocks_p ())
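
Boiled down, the varasm.c check above amounts to the following
standalone model.  This is only a sketch and not GCC code: the function
name and the two boolean parameters are hypothetical stand-ins, where
has_movmisalign models optab_handler (movmisalign_optab, mode)
!= CODE_FOR_nothing and slow_unaligned models
targetm.slow_unaligned_access (mode, align).  Alignments are in bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the alignment chosen for a constant-pool decl.
   DECL_ALIGN is what align_variable computed, MODE_ALIGN is the natural
   alignment of the decl's mode.  The decl is only bumped up to the
   natural alignment when a plain move could not handle the
   under-aligned MEM.  */
unsigned int
constant_align (unsigned int decl_align, unsigned int mode_align,
                bool has_movmisalign, bool slow_unaligned)
{
  /* Alignments of zero bits make no sense in this model.  */
  assert (decl_align > 0 && mode_align > 0);

  if (decl_align < mode_align && (has_movmisalign || slow_unaligned))
    return mode_align;

  /* Already sufficiently aligned, or unaligned access is cheap and
     needs no special pattern: keep what align_variable chose.  */
  return decl_align;
}
```

For the _Complex float case this would mean constant_align (8, 32,
false, true) yields 32, while a target with fast unaligned access and
no movmisalign pattern keeps the 8.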

>>
>>
>>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>>>> 	../../gcc-trunk/gcc/expr.c:4983
>>>> 0x93396f expand_gimple_stmt_1
>>>> 	../../gcc-trunk/gcc/cfgexpand.c:3777
>>>> 0x93396f expand_gimple_stmt
>>>> 	../../gcc-trunk/gcc/cfgexpand.c:3875
>>>> 0x9392e1 expand_gimple_basic_block
>>>> 	../../gcc-trunk/gcc/cfgexpand.c:5915
>>>> 0x93b046 execute
>>>> 	../../gcc-trunk/gcc/cfgexpand.c:6538
>>>> Please submit a full bug report,
>>>> with preprocessed source if appropriate.
>>>> Please include the complete backtrace with any bug report.
>>>> See <https://gcc.gnu.org/bugs/> for instructions.
>>>>
>>>> Without the hunk in varasm.c of course.
>>>>
>>>> What happens is that expand_expr_real_2 returns a unaligned mem_ref here:
>>>>
>>>>     case COMPLEX_CST:
>>>>       /* Handle evaluating a complex constant in a CONCAT target.  */
>>>>       if (original_target && GET_CODE (original_target) == CONCAT)
>>>>         {
>>>>           [... this path not taken ...]
>>
>> BTW: this code block executes when the other ICE happens.
>>  
>>>>         }
>>>>
>>>>       /* fall through */
>>>>
>>>>     case STRING_CST:
>>>>       temp = expand_expr_constant (exp, 1, modifier);
>>>>
>>>>       /* temp contains a constant address.
>>>>          On RISC machines where a constant address isn't valid,
>>>>          make some insns to get that address into a register.  */
>>>>       if (modifier != EXPAND_CONST_ADDRESS
>>>>           && modifier != EXPAND_INITIALIZER
>>>>           && modifier != EXPAND_SUM
>>>>           && ! memory_address_addr_space_p (mode, XEXP (temp, 0),
>>>>                                             MEM_ADDR_SPACE (temp)))
>>>>         return replace_equiv_address (temp,
>>>>                                       copy_rtx (XEXP (temp, 0)));
>>>>       return temp;
>>>>
>>>> The result of expand_expr_real(..., EXPAND_NORMAL) ought to be usable
>>>> by emit_move_insn, that is expected just *everywhere* and can't be changed.
>>>>
>>>> This could probably be fixed in an ugly way in the COMPLEX_CST, handler
>>>> but OTOH, I don't see any reason why this constant has to be misaligned
>>>> when it can be easily aligned, which avoids the need for a misaligned access.
>>>
>>> If the COMPLEX_CST happens to end up in unaligned memory then that's
>>> of course a bug (unless the target requests that for all COMPLEX_CSTs).
>>> That is, if the unalignment is triggered because the store is to an
>>> unaligned decl.
>>>
>>> But I think the issue is the above one?
>>>
>>
>> yes initially the constant seems to be unaligned. then it is expanded,
>> and there is no special handling for unaligned constants in expand_expr_real,
>> and then probably expand_assignment or store_expr seem not fully prepared for
>> this either.
> 
> With a cross compiler I see the constant has a regularly aligned _Complex
> type, so I am not sure how it can end up unaligned.
> 

Maybe a target configuration issue.
Not sure, I have configured mine this way:

../gcc-trunk/configure --prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64 --target=arm-linux-gnueabihf --enable-languages=all --with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard

However it appears now there are two different errors, one is in expand_assignment
which you found (I start to wonder if I should add you to the authors section
of this patch), and a different one, which I have not yet simplified,
but you can easily try that for yourself:

make check-gcc-c RUNTESTFLAGS="struct-layout-1.exp=*"

It is okay if a test fails to execute, but there should be no internal compiler errors.


>>>>
>>>> The problem is that the code that handles this misaligned access
>>>> is skipped because the mem_rtx has initially no MEM_ATTRS and therefore
>>>> MEM_ALIGN == 32, and therefore the code that handles the unaligned
>>>> access is not taken.  BUT before the mem_rtx is returned it is
>>>> set to MEM_ALIGN = 8 by set_mem_attributes, and we have an assertion,
>>>> because the result from expand_expr_real(..., EXPAND_NORMAL) ought to be
>>>> usable with emit_move_insn.
>>>
>>> yes, as said the _access_ determines the address should be aligned
>>> so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according
>>> to the access type/mode.  But we can't trust DECL_ALIGN of
>>> FUNCTION_DECLs but we _can_ trust users writing *(int *)fn
>>> (maybe for actual accesses we _can_ trust DECL_ALIGN, it's just
>>> we may not compute nonzero bits for the actual address because
>>> of function pointer mangling)
>>> (for accessing function code I'd say this would be premature
>>> optimization, but ...)
>>>
>>
>> Not a very nice solution, but it is not worth to spend much effort
>> in optimizing undefined behavior, I just want to avoid the ICE
>> at this time and would not trust the DECL_ALIGN either.
> 
> So I meant
> 
> Index: gcc/builtins.c
> ===================================================================
> --- gcc/builtins.c      (revision 274534)
> +++ gcc/builtins.c      (working copy)
> @@ -255,7 +255,8 @@ get_object_alignment_2 (tree exp, unsign
>  
>    /* Extract alignment information from the innermost object and
>       possibly adjust bitpos and offset.  */
> -  if (TREE_CODE (exp) == FUNCTION_DECL)
> +  if (TREE_CODE (exp) == FUNCTION_DECL
> +      && addr_p)
>      {
>        /* Function addresses can encode extra information besides their
>          alignment.  However, if TARGET_PTRMEMFUNC_VBIT_LOCATION
> 
> so we get at DECL_ALIGN of the FUNCTION_DECL (not sure if we
> can trust it).
> 
>>>
>>> Still I think you can't simply override STACK_SLOT_ALIGNMENT just because
>>> of the mode of an entry param, can you?  If you can assume a bigger
>>> alignment then STACK_SLOT_ALIGNMENT should return it.
>>>
>>
>> I don't see a real problem here.  All targets except i386 and gcn (whatever that is)
>> use the default for STACK_SLOT_ALIGNMENT which simply allows any (large) align value
>> to rule the effective STACK_SLOT_ALIGNMENT.  The user could have simply declared
>> the local variable with the alignment that results in better code FWIW.
>>
>> If the stack alignment is too high that is capped in assign_stack_local:
>>
>>   /* Ignore alignment if it exceeds MAX_SUPPORTED_STACK_ALIGNMENT.  */
>>   if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT)
>>     {
>>       alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT;
>>       alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT;
>>     }
>>
>> I for one, would just assume that MAX_SUPPORTED_STACK_ALIGNMENT should
>> be sufficient for all modes that need movmisalign_optab and friends.
>> If it is not, an ICE would be just fine.
> 
> Hmm.  In some way we could better communicate with the user then
> and not allow under-aligning automatic vars?  But then you
> still have packed structs with BLKmode where the actual field
> accesses will carry SImode even when not aligned(?)
> 

Yes, that works also when unaligned.

> 
> Please split it into the parts for the PR and parts making the
> asserts not trigger.
> 

Yes, will do.

> The PR is already fixed, right?  The assign_parm_find_stack_rtl hunk
> is merely an optimization?
> 

Hmmmm...  You are right, I should have added that to the commit message...

Of course the test cases try to verify the optimization.


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCHv4] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-15 15:28                         ` Bernd Edlinger
@ 2019-08-15 17:42                           ` Richard Biener
  2019-08-15 21:19                             ` [PATCHv5] " Bernd Edlinger
  2019-08-15 21:27                             ` [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment Bernd Edlinger
  0 siblings, 2 replies; 50+ messages in thread
From: Richard Biener @ 2019-08-15 17:42 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

On August 15, 2019 4:52:24 PM GMT+02:00, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>On 8/15/19 2:54 PM, Richard Biener wrote:
>> On Thu, 15 Aug 2019, Bernd Edlinger wrote:
>> 
>>>>>>
>>>>>> Hmm.  So your patch overrides user-alignment here.  Wouldn't it
>>>>>> be better to do that more consciously by
>>>>>>
>>>>>>   if (! DECL_USER_ALIGN (decl)
>>>>>>       || (align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>>>>>>           && targetm.slow_unaligned_access (DECL_MODE (decl), align)))
>>>>>>
>>>
>>> ? I don't know why that would be better?
>>> If the value is underaligned no matter why, pretend it was declared as
>>> naturally aligned if that causes wrong code otherwise.
>>> That was the idea here.
>> 
>> It would be better because then we ignore it and use what we'd use
>> by default rather than inventing sth new.  And your patch suggests
>> it might be needed to up align even w/o DECL_USER_ALIGN.
>> 
>
>Hmmm, you mean the constant 1.0i should not have DECL_USER_ALIGN set?
>But it inherits the alignment from the destination variable,
>apparently. 

Yes. I think it shouldn't inherit the alignment unless we are assembling a static initializer. 

>
>did you mean
>if (! DECL_USER_ALIGN (decl)
>    && align < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>    && ...
>?
>
>I can give it a try.

No, I meant ||, thus ignore DECL_USER_ALIGN if it is something we would have to satisfy with unaligned loads.
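
To spell out the difference between the two readings, a minimal
standalone sketch (hypothetical function names; plain parameters stand
in for DECL_USER_ALIGN, the current alignment, GET_MODE_ALIGNMENT and
targetm.slow_unaligned_access, and "take the default path" just means
the guarded branch is entered):

```c
#include <assert.h>
#include <stdbool.h>

/* The '||' reading: enter the branch when there is no user alignment,
   or when the user alignment is below the mode alignment on a target
   where unaligned access is slow.  */
bool
take_default_or (bool user_align, unsigned int align,
                 unsigned int mode_align, bool slow_unaligned)
{
  return !user_align || (align < mode_align && slow_unaligned);
}

/* The '&&' reading: enter the branch only when there is no user
   alignment and the decl is under-aligned on a slow target.  */
bool
take_default_and (bool user_align, unsigned int align,
                  unsigned int mode_align, bool slow_unaligned)
{
  return !user_align && align < mode_align && slow_unaligned;
}

/* The two readings disagree on the cases that matter here.  */
void
check_readings (void)
{
  /* aligned(1) on a 32-bit mode, slow unaligned access.  */
  assert (take_default_or (true, 8, 32, true));
  assert (!take_default_and (true, 8, 32, true));

  /* No user alignment, already naturally aligned.  */
  assert (take_default_or (false, 32, 32, false));
  assert (!take_default_and (false, 32, 32, false));
}
```

So with '||' a user-requested aligned(1) is overridden only where it
would otherwise need unaligned accesses, and decls without user
alignment behave as before.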
>
>>>>>> IMHO whatever code later fails to properly use unaligned loads
>>>>>> should be fixed instead rather than ignoring user requested alignment.
>>>>>>
>>>>>> Can you quote a short testcase that explains what exactly goes wrong?
>>>>>> The struct-layout ones are awkward to look at...
>>>>>>
>>>>>
>>>>> Sure,
>>>>>
>>>>> $ cat test.c
>>>>> _Complex float __attribute__((aligned(1))) cf;
>>>>>
>>>>> void foo (void)
>>>>> {
>>>>>   cf = 1.0i;
>>>>> }
>>>>>
>>>>> $ arm-linux-gnueabihf-gcc -S test.c 
>>>>> during RTL pass: expand
>>>>> test.c: In function 'foo':
>>>>> test.c:5:6: internal compiler error: in gen_movsf, at config/arm/arm.md:7003
>>>>>     5 |   cf = 1.0i;
>>>>>       |   ~~~^~~~~~
>>>>> 0x7ba475 gen_movsf(rtx_def*, rtx_def*)
>>>>> 	../../gcc-trunk/gcc/config/arm/arm.md:7003
>>>>> 0xa49587 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>>>> 	../../gcc-trunk/gcc/recog.h:318
>>>>> 0xa49587 emit_move_insn_1(rtx_def*, rtx_def*)
>>>>> 	../../gcc-trunk/gcc/expr.c:3695
>>>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>>>>> 	../../gcc-trunk/gcc/expr.c:3791
>>>>> 0xa494f7 emit_move_complex_parts(rtx_def*, rtx_def*)
>>>>> 	../../gcc-trunk/gcc/expr.c:3490
>>>>> 0xa49914 emit_move_insn(rtx_def*, rtx_def*)
>>>>> 	../../gcc-trunk/gcc/expr.c:3791
>>>>> 0xa5106f store_expr(tree_node*, rtx_def*, int, bool, bool)
>>>>> 	../../gcc-trunk/gcc/expr.c:5855
>>>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>>>>> 	../../gcc-trunk/gcc/expr.c:5441
>>>>
>>>> Huh, so why didn't it trigger
>>>>
>>>>   /* Handle misaligned stores.  */
>>>>   mode = TYPE_MODE (TREE_TYPE (to));
>>>>   if ((TREE_CODE (to) == MEM_REF
>>>>        || TREE_CODE (to) == TARGET_MEM_REF)
>>>>       && mode != BLKmode
>>>>       && !mem_ref_refers_to_non_mem_p (to)
>>>>       && ((align = get_object_alignment (to))
>>>>           < GET_MODE_ALIGNMENT (mode))
>>>>       && (((icode = optab_handler (movmisalign_optab, mode))
>>>>            != CODE_FOR_nothing)
>>>>           || targetm.slow_unaligned_access (mode, align)))
>>>>     {
>>>>
>>>> ?  (_Complex float is 32bit aligned it seems, the DECL_RTL for the
>>>> var is (mem/c:SC (symbol_ref:SI ("cf") [flags 0x2] <var_decl 
>>>> 0x2aaaaaad1240 cf>) [1 cf+0 S8 A8]), SCmode is 32bit aligned.
>>>>
>>>> Ah, 'to' is a plain DECL here so the above handling is incomplete.
>>>> IIRC component refs like __real cf = 0.f should be handled fine
>>>> again(?).  So, does adding || DECL_P (to) fix the case as well?
>>>>
>>>
>>> So I tried this instead of the varasm.c change:
>>>
>>> Index: expr.c
>>> ===================================================================
>>> --- expr.c	(revision 274487)
>>> +++ expr.c	(working copy)
>>> @@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem
>>>    /* Handle misaligned stores.  */
>>>    mode = TYPE_MODE (TREE_TYPE (to));
>>>    if ((TREE_CODE (to) == MEM_REF
>>> -       || TREE_CODE (to) == TARGET_MEM_REF)
>>> +       || TREE_CODE (to) == TARGET_MEM_REF
>>> +       || DECL_P (to))
>>>        && mode != BLKmode
>>> -      && !mem_ref_refers_to_non_mem_p (to)
>>> +      && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
>>>        && ((align = get_object_alignment (to))
>>>  	  < GET_MODE_ALIGNMENT (mode))
>>>        && (((icode = optab_handler (movmisalign_optab, mode))
>>>
>>> Result, yes, it fixes this test case
>>> but when I run all of struct-layout-1.exp there are still cases where
>>> we have problems:
>>>
>>> In file included from /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_x.c:8:
>>> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h: In function 'test2112':
>>> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:23:10: internal compiler error: in gen_movdf, at config/arm/arm.md:7107
>>> /home/ed/gnu/gcc-trunk/gcc/testsuite/gcc.dg/compat/struct-layout-1_x1.h:62:3: note: in definition of macro 'TX'
>>> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:1: note: in expansion of macro 'TCI'
>>> /home/ed/gnu/gcc-build-arm-linux-gnueabihf-linux64/gcc/testsuite/gcc/gcc.dg-struct-layout-1//t024_test.h:113:294: note: in expansion of macro 'F'
>>> 0x7ba377 gen_movdf(rtx_def*, rtx_def*)
>>>         ../../gcc-trunk/gcc/config/arm/arm.md:7107
>>> 0xa494c7 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>>         ../../gcc-trunk/gcc/recog.h:318
>>> 0xa494c7 emit_move_insn_1(rtx_def*, rtx_def*)
>>>         ../../gcc-trunk/gcc/expr.c:3695
>>> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)
>>>         ../../gcc-trunk/gcc/expr.c:3791
>>> 0xa49437 emit_move_complex_parts(rtx_def*, rtx_def*)
>>>         ../../gcc-trunk/gcc/expr.c:3490
>>> 0xa49854 emit_move_insn(rtx_def*, rtx_def*)
>>>         ../../gcc-trunk/gcc/expr.c:3791
>>> 0xa50faf store_expr(tree_node*, rtx_def*, int, bool, bool)
>>>         ../../gcc-trunk/gcc/expr.c:5856
>>> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)
>>>         ../../gcc-trunk/gcc/expr.c:5302
>>> 0xa51f34 expand_assignment(tree_node*, tree_node*, bool)
>>>         ../../gcc-trunk/gcc/expr.c:4983
>>> 0x9338af expand_gimple_stmt_1
>>>         ../../gcc-trunk/gcc/cfgexpand.c:3777
>>> 0x9338af expand_gimple_stmt
>>>         ../../gcc-trunk/gcc/cfgexpand.c:3875
>>> 0x939221 expand_gimple_basic_block
>>>         ../../gcc-trunk/gcc/cfgexpand.c:5915
>>> 0x93af86 execute
>>>         ../../gcc-trunk/gcc/cfgexpand.c:6538
>>> Please submit a full bug report,
>>>
>>> My personal gut feeling is that this will be more fragile than over-aligning the
>>> constants.
>> 
>> As said the constant shouldn't end up under-aligned, the user cannot
>> specify alignment of literal constants.  Not sure what you mean
>> with "over"-aligning.
>> 
>
>
>Hmm wait a moment, I actually wanted _only_ to change the DECL_ARTIFICIAL
>that is built by build_constant_desc.  It uses align_variable of course,
>but I totally missed that this also controls the alignment of normal
>variables, sorry about the confusion here.
>
>I mean we should align the constant for the unaligned complex with
>the natural alignment of the type-mode. 

Agreed. 

>That wrong fix made
>the variables ignore the alignment, which was of course not intended,
>and instead I would need:
>
>Index: expr.c
>===================================================================
>--- expr.c	(revision 274531)
>+++ expr.c	(working copy)
>@@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem
>   /* Handle misaligned stores.  */
>   mode = TYPE_MODE (TREE_TYPE (to));
>   if ((TREE_CODE (to) == MEM_REF
>-       || TREE_CODE (to) == TARGET_MEM_REF)
>+       || TREE_CODE (to) == TARGET_MEM_REF
>+       || DECL_P (to))
>       && mode != BLKmode
>-      && !mem_ref_refers_to_non_mem_p (to)
>+      && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
>       && ((align = get_object_alignment (to))
> 	  < GET_MODE_ALIGNMENT (mode))
>       && (((icode = optab_handler (movmisalign_optab, mode))
>
>Index: varasm.c
>===================================================================
>--- varasm.c	(revision 274531)
>+++ varasm.c	(working copy)
>@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "stmt.h"
> #include "expr.h"
> #include "expmed.h"
>+#include "optabs.h"
> #include "output.h"
> #include "langhooks.h"
> #include "debug.h"
>@@ -3386,7 +3387,15 @@ build_constant_desc (tree exp)
>   if (TREE_CODE (exp) == STRING_CST)
>     SET_DECL_ALIGN (decl, targetm.constant_alignment (exp, DECL_ALIGN (decl)));
>   else
>-    align_variable (decl, 0);
>+    {
>+      align_variable (decl, 0);
>+      if (DECL_ALIGN (decl) < GET_MODE_ALIGNMENT (DECL_MODE (decl))
>+	  && ((optab_handler (movmisalign_optab, DECL_MODE (decl))
>+		!= CODE_FOR_nothing)
>+	      || targetm.slow_unaligned_access (DECL_MODE (decl),
>+						DECL_ALIGN (decl))))
>+	SET_DECL_ALIGN (decl, GET_MODE_ALIGNMENT (DECL_MODE (decl)));
>+    }
> 
>   /* Now construct the SYMBOL_REF and the MEM.  */
>   if (use_object_blocks_p ())
>
>>>
>>>
>>>>> 0xa51cc0 expand_assignment(tree_node*, tree_node*, bool)
>>>>> 	../../gcc-trunk/gcc/expr.c:4983
>>>>> 0x93396f expand_gimple_stmt_1
>>>>> 	../../gcc-trunk/gcc/cfgexpand.c:3777
>>>>> 0x93396f expand_gimple_stmt
>>>>> 	../../gcc-trunk/gcc/cfgexpand.c:3875
>>>>> 0x9392e1 expand_gimple_basic_block
>>>>> 	../../gcc-trunk/gcc/cfgexpand.c:5915
>>>>> 0x93b046 execute
>>>>> 	../../gcc-trunk/gcc/cfgexpand.c:6538
>>>>> Please submit a full bug report,
>>>>> with preprocessed source if appropriate.
>>>>> Please include the complete backtrace with any bug report.
>>>>> See <https://gcc.gnu.org/bugs/> for instructions.
>>>>>
>>>>> Without the hunk in varasm.c of course.
>>>>>
>>>>> What happens is that expand_expr_real_2 returns an unaligned mem_ref here:
>>>>>
>>>>>     case COMPLEX_CST:
>>>>>       /* Handle evaluating a complex constant in a CONCAT target.  */
>>>>>       if (original_target && GET_CODE (original_target) == CONCAT)
>>>>>         {
>>>>>           [... this path not taken ...]
>>>
>>> BTW: this code block executes when the other ICE happens.
>>>  
>>>>>         }
>>>>>
>>>>>       /* fall through */
>>>>>
>>>>>     case STRING_CST:
>>>>>       temp = expand_expr_constant (exp, 1, modifier);
>>>>>
>>>>>       /* temp contains a constant address.
>>>>>          On RISC machines where a constant address isn't valid,
>>>>>          make some insns to get that address into a register.  */
>>>>>       if (modifier != EXPAND_CONST_ADDRESS
>>>>>           && modifier != EXPAND_INITIALIZER
>>>>>           && modifier != EXPAND_SUM
>>>>>           && ! memory_address_addr_space_p (mode, XEXP (temp, 0),
>>>>>                                             MEM_ADDR_SPACE (temp)))
>>>>>         return replace_equiv_address (temp,
>>>>>                                       copy_rtx (XEXP (temp, 0)));
>>>>>       return temp;
>>>>>
>>>>> The result of expand_expr_real(..., EXPAND_NORMAL) ought to be usable
>>>>> by emit_move_insn, that is expected just *everywhere* and can't be changed.
>>>>>
>>>>> This could probably be fixed in an ugly way in the COMPLEX_CST handler,
>>>>> but OTOH, I don't see any reason why this constant has to be misaligned
>>>>> when it can be easily aligned, which avoids the need for a misaligned access.
>>>>
>>>> If the COMPLEX_CST happens to end up in unaligned memory then that's
>>>> of course a bug (unless the target requests that for all COMPLEX_CSTs).
>>>> That is, if the unalignment is triggered because the store is to an
>>>> unaligned decl.
>>>>
>>>> But I think the issue is the above one?
>>>>
>>>
>>> Yes, initially the constant seems to be unaligned.  Then it is expanded,
>>> and there is no special handling for unaligned constants in expand_expr_real,
>>> and then probably expand_assignment or store_expr seem not fully prepared for
>>> this either.
>> 
>> With a cross I see the constant has regular aligned _Complex type
>> so not sure how it can end up unaligned.
>> 
>
>Maybe a target configuration issue.
>Not sure, I have configured mine this way:
>
>../gcc-trunk/configure
>--prefix=/home/ed/gnu/arm-linux-gnueabihf-linux64
>--target=arm-linux-gnueabihf --enable-languages=all --with-arch=armv7-a
>--with-tune=cortex-a9 --with-fpu=vfpv3-d16 --with-float=hard
>
>However it appears now there are two different errors, one is in expand_assignment
>which you found (I start to wonder if I should add you to the authors section
>of this patch), and a different one, which I have not yet simplified,
>but you can easily try that for yourself:
>
>make check-gcc-c RUNTESTFLAGS="struct-layout-1.exp=*"
>
>it is okay when the test fails to execute but there should be no internal
>compiler errors.
>
>
>>>>>
>>>>> The problem is that the code that handles this misaligned access
>>>>> is skipped because the mem_rtx has initially no MEM_ATTRS and therefore
>>>>> MEM_ALIGN == 32, and therefore the code that handles the unaligned
>>>>> access is not taken.  BUT before the mem_rtx is returned it is
>>>>> set to MEM_ALIGN = 8 by set_mem_attributes, and we have an assertion,
>>>>> because the result from expand_expr_real(..., EXPAND_NORMAL) ought to be
>>>>> usable with emit_move_insn.
>>>>
>>>> yes, as said the _access_ determines the address should be aligned
>>>> so we shouldn't end up setting MEM_ALIGN to 8 but to 32 according
>>>> to the access type/mode.  But we can't trust DECL_ALIGN of
>>>> FUNCTION_DECLs but we _can_ trust users writing *(int *)fn
>>>> (maybe for actual accesses we _can_ trust DECL_ALIGN, it's just
>>>> we may not compute nonzero bits for the actual address because
>>>> of function pointer mangling)
>>>> (for accessing function code I'd say this would be premature
>>>> optimization, but ...)
>>>>
>>>
>>> Not a very nice solution, but it is not worth to spend much effort
>>> in optimizing undefined behavior, I just want to avoid the ICE
>>> at this time and would not trust the DECL_ALIGN either.
>> 
>> So I meant
>> 
>> Index: gcc/builtins.c
>> ===================================================================
>> --- gcc/builtins.c      (revision 274534)
>> +++ gcc/builtins.c      (working copy)
>> @@ -255,7 +255,8 @@ get_object_alignment_2 (tree exp, unsign
>>  
>>    /* Extract alignment information from the innermost object and
>>       possibly adjust bitpos and offset.  */
>> -  if (TREE_CODE (exp) == FUNCTION_DECL)
>> +  if (TREE_CODE (exp) == FUNCTION_DECL
>> +      && addr_p)
>>      {
>>        /* Function addresses can encode extra information besides their
>>          alignment.  However, if TARGET_PTRMEMFUNC_VBIT_LOCATION
>> 
>> so we get at DECL_ALIGN of the FUNCTION_DECL (not sure if we
>> can trust it).
>> 
>>>>
>>>> Still I think you can't simply override STACK_SLOT_ALIGNMENT just because
>>>> of the mode of an entry param, can you?  If you can assume a bigger
>>>> alignment then STACK_SLOT_ALIGNMENT should return it.
>>>>
>>>
>>> I don't see a real problem here.  All targets except i386 and gcn (whatever that is)
>>> use the default for STACK_SLOT_ALIGNMENT which simply allows any (large) align value
>>> to rule the effective STACK_SLOT_ALIGNMENT.  The user could have simply declared
>>> the local variable with the alignment that results in better code FWIW.
>>>
>>> If the stack alignment is too high that is capped in assign_stack_local:
>>>
>>>   /* Ignore alignment if it exceeds MAX_SUPPORTED_STACK_ALIGNMENT.  */
>>>   if (alignment_in_bits > MAX_SUPPORTED_STACK_ALIGNMENT)
>>>     {
>>>       alignment_in_bits = MAX_SUPPORTED_STACK_ALIGNMENT;
>>>       alignment = MAX_SUPPORTED_STACK_ALIGNMENT / BITS_PER_UNIT;
>>>     }
>>>
>>> I for one, would just assume that MAX_SUPPORTED_STACK_ALIGNMENT should
>>> be sufficient for all modes that need movmisalign_optab and friends.
>>> If it is not, an ICE would be just fine.
>> 
>> Hmm.  In some way we could better communicate with the user then
>> and do not allow under-aligning automatic vars?  But then you
>> still have packed structs with BLKmode where the actual field
>> accesses will carry SImode even when not aligned(?)
>> 
>
>Yes, that works also when unaligned.
>
>> 
>> Please split it into the parts for the PR and parts making the
>> asserts not trigger.
>> 
>
>Yes, will do.
>
>> The PR is already fixed, right?  The assign_parm_find_stack_rtl hunk
>> is merely an optimization?
>> 
>
>Hmmmm...  You are right, I should have added that to the commit
>message...
>
>Of course the test cases try to verify the optimization.
>
>
>Thanks
>Bernd.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-15 17:42                           ` Richard Biener
@ 2019-08-15 21:19                             ` Bernd Edlinger
  2019-08-20  5:38                               ` Jeff Law
                                                 ` (2 more replies)
  2019-08-15 21:27                             ` [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment Bernd Edlinger
  1 sibling, 3 replies; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-15 21:19 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 455 bytes --]

On 8/15/19 6:29 PM, Richard Biener wrote:
>>>
>>> Please split it into the parts for the PR and parts making the
>>> asserts not trigger.
>>>
>>
>> Yes, will do.
>>

Okay, here is the rest of the PR 89544 fix,
actually just an optimization, making the larger stack alignment
known to the middle-end, and the test cases.
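
For readers following along, here is a minimal sketch of the alignment
computation the assign_parm_find_stack_rtl hunk performs for the PAD_UPWARD
case.  It is a standalone illustration, not GCC code: the function names are
hypothetical, offsets are plain integers rather than poly_int64, and the
values 32 (parameter boundary) and 64 (STACK_BOUNDARY) used in the note below
are assumptions chosen to resemble an AAPCS-like target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the largest power of two (in bytes) dividing OFFSET,
   or 0 when OFFSET is 0.  This mimics the role known_alignment plays for a
   constant offset in the patch.  */
static uint64_t
offset_alignment_bytes (int64_t offset)
{
  uint64_t u = (uint64_t) offset;
  return u & (~u + 1);		/* isolate the lowest set bit */
}

/* Hypothetical helper: alignment in bits that may be assumed for a stack
   parameter slot, given the nominal boundary, the stack boundary and the
   constant byte offset of the slot.  */
static unsigned int
param_slot_align_bits (unsigned int boundary_bits,
		       unsigned int stack_boundary_bits,
		       int64_t offset_bytes)
{
  assert (boundary_bits > 0 && stack_boundary_bits > 0);

  uint64_t offset_align = offset_alignment_bytes (offset_bytes) * 8;

  /* Offset 0 (or anything above the stack boundary) tells us no more
     than the stack boundary itself.  */
  if (offset_align == 0 || offset_align > stack_boundary_bits)
    offset_align = stack_boundary_bits;

  /* Never go below the nominal boundary; take excess alignment if any.  */
  return boundary_bits > offset_align
	 ? boundary_bits : (unsigned int) offset_align;
}
```

Under those assumed values, a slot at offset 0 or 8 would be treated as
64-bit aligned, while a slot at offset 4 stays at 32 bits.  As far as I can
tell that is roughly the difference the two new test cases exercise, but
treat that reading as an assumption.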


Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-arm-align-abi.diff --]
[-- Type: text/x-patch; name="patch-arm-align-abi.diff", Size: 3154 bytes --]

2019-08-15  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* function.c (assign_parm_find_stack_rtl): Use larger alignment
	when possible.

testsuite:
2019-08-15  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* gcc.target/arm/unaligned-argument-1.c: New test.
	* gcc.target/arm/unaligned-argument-2.c: New test.

Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 274531)
+++ gcc/function.c	(working copy)
@@ -2697,8 +2697,23 @@ assign_parm_find_stack_rtl (tree parm, struct assi
      intentionally forcing upward padding.  Otherwise we have to come
      up with a guess at the alignment based on OFFSET_RTX.  */
   poly_int64 offset;
-  if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
+  if (data->locate.where_pad == PAD_NONE || data->entry_parm)
     align = boundary;
+  else if (data->locate.where_pad == PAD_UPWARD)
+    {
+      align = boundary;
+      /* If the argument offset is actually more aligned than the nominal
+	 stack slot boundary, take advantage of that excess alignment.
+	 Don't make any assumptions if STACK_POINTER_OFFSET is in use.  */
+      if (poly_int_rtx_p (offset_rtx, &offset)
+	  && STACK_POINTER_OFFSET == 0)
+	{
+	  unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
+	  if (offset_align == 0 || offset_align > STACK_BOUNDARY)
+	    offset_align = STACK_BOUNDARY;
+	  align = MAX (align, offset_align);
+	}
+    }
   else if (poly_int_rtx_p (offset_rtx, &offset))
     {
       align = least_bit_hwi (boundary);
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 1 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */
+/* { dg-final { scan-assembler-times "stm" 0 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 0 } } */
+/* { dg-final { scan-assembler-times "strd" 0 } } */
+/* { dg-final { scan-assembler-times "stm" 1 } } */

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-15 17:42                           ` Richard Biener
  2019-08-15 21:19                             ` [PATCHv5] " Bernd Edlinger
@ 2019-08-15 21:27                             ` Bernd Edlinger
  2019-08-17 10:11                               ` Bernd Edlinger
                                                 ` (2 more replies)
  1 sibling, 3 replies; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-15 21:27 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 892 bytes --]

Hi,

this is the split out part from the "Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)"
which is sanitizing the middle-end interface to the back-end for strict alignment,
and a couple of bug-fixes that are necessary to survive boot-strap.
It is intended to be applied after the PR 89544 fix.

I think it would be possible to change the default implementation of STACK_SLOT_ALIGNMENT
to make all stack variables always naturally aligned instead of doing that only
in assign_parm_setup_stack, but I would still like to avoid changing too many things
that do not seem to have a problem, since this would affect many targets and more
kinds of variables that probably do not have a strict alignment problem.
But I am ready to take your advice.
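
As a rough illustration of the check that recurs in this patch (in
assign_parm_setup_stack and build_constant_desc), the sketch below raises an
alignment to the mode's natural alignment only when the target cannot do a
plain move at the lower alignment.  The struct and function names are
hypothetical stand-ins: the two booleans model the results of
optab_handler (movmisalign_optab, mode) != CODE_FOR_nothing and
targetm.slow_unaligned_access (mode, align), which in GCC depend on the mode
and the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the two target queries used in the patch.  */
struct target_info
{
  bool has_movmisalign;		/* a movmisalign pattern exists for the mode */
  bool slow_unaligned_access;	/* unaligned access is slow or unsupported */
};

/* Hypothetical helper: return the alignment (in bits) to use for an object
   of a mode whose natural alignment is MODE_ALIGN_BITS, when ALIGN_BITS
   was requested.  */
static unsigned int
effective_align_bits (unsigned int align_bits,
		      unsigned int mode_align_bits,
		      const struct target_info *target)
{
  assert (target != NULL);

  /* Only bump the alignment when a plain move at the requested alignment
     would not be usable by the back-end.  */
  if (align_bits < mode_align_bits
      && (target->has_movmisalign || target->slow_unaligned_access))
    return mode_align_bits;

  return align_bits;
}
```

On a target where neither condition holds the requested alignment is kept
as is, which is why the change is not expected to affect targets without a
strict alignment problem.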


Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf
Is it OK for trunk?


Thanks
Bernd.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-strict-align.diff --]
[-- Type: text/x-patch; name="patch-strict-align.diff", Size: 9187 bytes --]

2019-08-15  Bernd Edlinger  <bernd.edlinger@hotmail.de>
	    Richard Biener  <rguenther@suse.de>

	* expr.c (expand_assignment): Handle misaligned DECLs.
	(expand_expr_real_1): Handle FUNCTION_DECL as unaligned.
	* function.c (assign_parm_adjust_stack_rtl): Check movmisalign optab
	too.
	(assign_parm_setup_stack): Allocate properly aligned stack slots.
	* varasm.c (build_constant_desc): Align constants of misaligned types.
	* config/arm/arm.md (movdi, movsi, movhi, movhf, movsf, movdf): Check
	strict alignment restrictions on memory addresses.
	* config/arm/neon.md (movti, mov<VSTRUCT>, mov<VH>): Likewise.
	* config/arm/vec-common.md (mov<VALL>): Likewise.

Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 274531)
+++ gcc/config/arm/arm.md	(working copy)
@@ -5838,6 +5838,12 @@
 	(match_operand:DI 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (DImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (DImode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
@@ -6014,6 +6020,12 @@
   {
   rtx base, offset, tmp;
 
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (SImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (SImode));
   if (TARGET_32BIT || TARGET_HAVE_MOVT)
     {
       /* Everything except mem = const or mem = mem can be done easily.  */
@@ -6503,6 +6515,12 @@
 	(match_operand:HI 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (HImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (HImode));
   if (TARGET_ARM)
     {
       if (can_create_pseudo_p ())
@@ -6912,6 +6930,12 @@
 	(match_operand:HF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (HFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (HFmode));
   if (TARGET_32BIT)
     {
       if (MEM_P (operands[0]))
@@ -6976,6 +7000,12 @@
 	(match_operand:SF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (SFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (SFmode));
   if (TARGET_32BIT)
     {
       if (MEM_P (operands[0]))
@@ -7071,6 +7101,12 @@
 	(match_operand:DF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (DFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (DFmode));
   if (TARGET_32BIT)
     {
       if (MEM_P (operands[0]))
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 274531)
+++ gcc/config/arm/neon.md	(working copy)
@@ -127,6 +127,12 @@
 	(match_operand:TI 1 "general_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (TImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (TImode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
@@ -139,6 +145,12 @@
 	(match_operand:VSTRUCT 1 "general_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
@@ -151,6 +163,12 @@
 	(match_operand:VH 1 "s_register_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
Index: gcc/config/arm/vec-common.md
===================================================================
--- gcc/config/arm/vec-common.md	(revision 274531)
+++ gcc/config/arm/vec-common.md	(working copy)
@@ -26,6 +26,12 @@
   "TARGET_NEON
    || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
 {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 274531)
+++ gcc/expr.c	(working copy)
@@ -5002,9 +5002,10 @@ expand_assignment (tree to, tree from, bool nontem
   /* Handle misaligned stores.  */
   mode = TYPE_MODE (TREE_TYPE (to));
   if ((TREE_CODE (to) == MEM_REF
-       || TREE_CODE (to) == TARGET_MEM_REF)
+       || TREE_CODE (to) == TARGET_MEM_REF
+       || DECL_P (to))
       && mode != BLKmode
-      && !mem_ref_refers_to_non_mem_p (to)
+      && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
       && ((align = get_object_alignment (to))
 	  < GET_MODE_ALIGNMENT (mode))
       && (((icode = optab_handler (movmisalign_optab, mode))
@@ -10796,6 +10797,14 @@ expand_expr_real_1 (tree exp, rtx target, machine_
 	    MEM_VOLATILE_P (op0) = 1;
 	  }
 
+	if (MEM_P (op0) && TREE_CODE (tem) == FUNCTION_DECL)
+	  {
+	    if (op0 == orig_op0)
+	      op0 = copy_rtx (op0);
+
+	    set_mem_align (op0, BITS_PER_UNIT);
+	  }
+
 	/* In cases where an aligned union has an unaligned object
 	   as a field, we might be extracting a BLKmode value from
 	   an integer-mode (e.g., SImode) object.  Handle this case
Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 274531)
+++ gcc/function.c	(working copy)
@@ -2812,8 +2827,10 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
      stack slot, if we need one.  */
   if (stack_parm
       && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
-	   && targetm.slow_unaligned_access (data->nominal_mode,
-					     MEM_ALIGN (stack_parm)))
+	   && ((optab_handler (movmisalign_optab, data->nominal_mode)
+		!= CODE_FOR_nothing)
+	       || targetm.slow_unaligned_access (data->nominal_mode,
+						 MEM_ALIGN (stack_parm))))
 	  || (data->nominal_type
 	      && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
 	      && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
@@ -3466,11 +3483,20 @@ assign_parm_setup_stack (struct assign_parm_data_a
 	  int align = STACK_SLOT_ALIGNMENT (data->passed_type,
 					    GET_MODE (data->entry_parm),
 					    TYPE_ALIGN (data->passed_type));
+	  if (align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
+	      && ((optab_handler (movmisalign_optab,
+				  GET_MODE (data->entry_parm))
+		   != CODE_FOR_nothing)
+		  || targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
+						    align)))
+	    align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
 	  data->stack_parm
 	    = assign_stack_local (GET_MODE (data->entry_parm),
 				  GET_MODE_SIZE (GET_MODE (data->entry_parm)),
 				  align);
+	  align = MEM_ALIGN (data->stack_parm);
 	  set_mem_attributes (data->stack_parm, parm, 1);
+	  set_mem_align (data->stack_parm, align);
 	}
 
       dest = validize_mem (copy_rtx (data->stack_parm));
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 274531)
+++ gcc/varasm.c	(working copy)
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stmt.h"
 #include "expr.h"
 #include "expmed.h"
+#include "optabs.h"
 #include "output.h"
 #include "langhooks.h"
 #include "debug.h"
@@ -3386,7 +3387,15 @@ build_constant_desc (tree exp)
   if (TREE_CODE (exp) == STRING_CST)
     SET_DECL_ALIGN (decl, targetm.constant_alignment (exp, DECL_ALIGN (decl)));
   else
-    align_variable (decl, 0);
+    {
+      align_variable (decl, 0);
+      if (DECL_ALIGN (decl) < GET_MODE_ALIGNMENT (DECL_MODE (decl))
+	  && ((optab_handler (movmisalign_optab, DECL_MODE (decl))
+	       != CODE_FOR_nothing)
+	      || targetm.slow_unaligned_access (DECL_MODE (decl),
+						DECL_ALIGN (decl))))
+	SET_DECL_ALIGN (decl, GET_MODE_ALIGNMENT (DECL_MODE (decl)));
+    }
 
   /* Now construct the SYMBOL_REF and the MEM.  */
   if (use_object_blocks_p ())

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-15 21:27                             ` [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment Bernd Edlinger
@ 2019-08-17 10:11                               ` Bernd Edlinger
  2019-08-23  0:01                                 ` Jeff Law
  2019-08-23  0:05                               ` Jeff Law
  2019-08-27 10:07                               ` Kyrill Tkachov
  2 siblings, 1 reply; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-17 10:11 UTC (permalink / raw)
  To: Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2152 bytes --]

On 8/15/19 9:47 PM, Bernd Edlinger wrote:
> Hi,
> 
> this is the split out part from the "Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)"
> which is sanitizing the middle-end interface to the back-end for strict alignment,
> and a couple of bug-fixes that are necessary to survive boot-strap.
> It is intended to be applied after the PR 89544 fix.
> 
> I think it would be possible to change the default implementation of STACK_SLOT_ALIGNMENT
> to make all stack variables always naturally aligned instead of doing that only
> in assign_parm_setup_stack, but would still like to avoid changing too many things
> that do not seem to have a problem.  Since this would affect many targets, and more
> kinds of variables that may probably not have a strict alignment problem.
> But I am ready to take your advice though.
> 
> 
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf
> Is it OK for trunk?
> 
> 

Hmm, actually the hunk in assign_parm_setup_stack does not just fix a
failing assertion, it fixes a wrong-code bug:

I have now found a test case for which we silently generate wrong code,
and which is fixed by this patch.

$ cat unaligned-argument-3.c 
/* { dg-do compile } */
/* { dg-require-effective-target arm_arm_ok } */
/* { dg-options "-marm -mno-unaligned-access -O3" } */

typedef int __attribute__((aligned(1))) s;

void x(char*, s*);
void f(char a, s f)
{
  x(&a, &f);
}

/* { dg-final { scan-assembler-times "str\t\[^\\n\]*\\\[sp\\\]" 1 } } */
/* { dg-final { scan-assembler-times "str\t\[^\\n\]*\\\[sp, #3\\\]" 0 } } */

currently with -marm -mno-unaligned-access -O3 we generate:

f:
	@ args = 0, pretend = 0, frame = 8
	@ frame_needed = 0, uses_anonymous_args = 0
	str	lr, [sp, #-4]!
	sub	sp, sp, #12
	mov	r3, r0
	str	r1, [sp, #3]  <- may trap
	add	r0, sp, #7
	add	r1, sp, #3
	strb	r3, [sp, #7]
	bl	x
	add	sp, sp, #12
	@ sp needed
	ldr	pc, [sp], #4
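
To spell out why the marked store is a problem, here is a minimal sketch
(C++, not part of the patch) of the natural-alignment check that the store
at [sp, #3] fails.  It assumes that sp itself is at least word aligned
(4 bytes) at that point; the helper name store_is_naturally_aligned is
made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only: given the known alignment of
// the base register in bytes, a constant offset and the access size in
// bytes, report whether the access is naturally aligned.
constexpr bool store_is_naturally_aligned(std::uint32_t base_align,
                                          std::int32_t offset,
                                          std::uint32_t size)
{
  // The base must be at least as aligned as the access, and the offset
  // must be a multiple of the access size.
  return base_align % size == 0
         && offset % static_cast<std::int32_t>(size) == 0;
}

// str r1, [sp, #3]: a word store at offset 3 is misaligned.
static_assert(!store_is_naturally_aligned(4, 3, 4), "word store at sp+3");
// str r1, [sp]: the form the first dg-final pattern looks for.
static_assert(store_is_naturally_aligned(4, 0, 4), "word store at sp+0");
// strb r3, [sp, #7]: a byte store is fine at any offset.
static_assert(store_is_naturally_aligned(4, 7, 1), "byte store at sp+7");
```

With -mno-unaligned-access the word store must not be emitted at an offset
that fails this check, which is what the two dg-final patterns test for.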


So I would like to add a test case to the patch as attached.

Tested with a cross compiler: both dg-final checks currently fail, and
both pass with the other patches applied.

Is it OK for trunk?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-strict-align-1.diff --]
[-- Type: text/x-patch; name="patch-strict-align-1.diff", Size: 809 bytes --]

2019-08-17  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* gcc.target/arm/unaligned-argument-3.c: New test.

Index: gcc/testsuite/gcc.target/arm/unaligned-argument-3.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-3.c	(working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+typedef int __attribute__((aligned(1))) s;
+
+void x(char*, s*);
+void f(char a, s f)
+{
+  x(&a, &f);
+}
+
+/* { dg-final { scan-assembler-times "str\t\[^\\n\]*\\\[sp\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "str\t\[^\\n\]*\\\[sp, #3\\\]" 0 } } */

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-15 21:19                             ` [PATCHv5] " Bernd Edlinger
@ 2019-08-20  5:38                               ` Jeff Law
  2019-08-20 15:04                               ` John David Anglin
  2019-09-04 12:53                               ` Richard Earnshaw (lists)
  2 siblings, 0 replies; 50+ messages in thread
From: Jeff Law @ 2019-08-20  5:38 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jakub Jelinek

On 8/15/19 1:47 PM, Bernd Edlinger wrote:
> On 8/15/19 6:29 PM, Richard Biener wrote:
>>>> Please split it into the parts for the PR and parts making the
>>>> asserts not trigger.
>>>>
>>> Yes, will do.
>>>
> Okay, here is the rest of the PR 89544 fix,
> actually just an optimization, making the larger stack alignment
> known to the middle-end, and the test cases.
> 
> 
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 
> 
> patch-arm-align-abi.diff
> 
> 2019-08-15  Bernd Edlinger  <bernd.edlinger@hotmail.de>
> 
> 	PR middle-end/89544
> 	* function.c (assign_parm_find_stack_rtl): Use larger alignment
> 	when possible.
> 
> testsuite:
> 2019-08-15  Bernd Edlinger  <bernd.edlinger@hotmail.de>
> 
> 	PR middle-end/89544
> 	* gcc.target/arm/unaligned-argument-1.c: New test.
> 	* gcc.target/arm/unaligned-argument-2.c: New test.
OK.

Given the sensitivity of this code, let's give the tester a chance to
run with this patch applied before we add the next one for sanitizing
the middle end interface.

jeff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-15 21:19                             ` [PATCHv5] " Bernd Edlinger
  2019-08-20  5:38                               ` Jeff Law
@ 2019-08-20 15:04                               ` John David Anglin
       [not found]                                 ` <0d39b64f-67d9-7857-cf4e-36f09c0dc15e@bell.net>
  2019-09-04 12:53                               ` Richard Earnshaw (lists)
  2 siblings, 1 reply; 50+ messages in thread
From: John David Anglin @ 2019-08-20 15:04 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jeff Law, Jakub Jelinek

On 2019-08-15 3:47 p.m., Bernd Edlinger wrote:
> 2019-08-15  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>
> 	PR middle-end/89544
> 	* function.c (assign_parm_find_stack_rtl): Use larger alignment
> 	when possible.
This patch breaks build on hppa-unknown-linux-gnu:
https://buildd.debian.org/status/fetch.php?pkg=gcc-snapshot&arch=hppa&ver=1%3A20190820-1&stamp=1566307455&raw=0

hppa-linux-gnu-g++-9 -std=gnu++98 -fno-PIE -c   -g -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../src/gcc -I../../src/gcc/. -I../../src/gcc/../include -I../../src/gcc/../libcpp/include  -I../../src/gcc/../libdecnumber -I../../src/gcc/../libdecnumber/dpd -I../libdecnumber -I../../src/gcc/../libbacktrace   -o function.o -MT function.o -MMD -MP -MF ./.deps/function.TPo ../../src/gcc/function.c
hppa-linux-gnu-g++-9 -std=gnu++98 -fno-PIE -c   -g -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../src/gcc -I../../src/gcc/. -I../../src/gcc/../include -I../../src/gcc/../libcpp/include  -I../../src/gcc/../libdecnumber -I../../src/gcc/../libdecnumber/dpd -I../libdecnumber -I../../src/gcc/../libbacktrace   -o function-tests.o -MT function-tests.o -MMD -MP -MF ./.deps/function-tests.TPo ../../src/gcc/function-tests.c
hppa-linux-gnu-g++-9 -std=gnu++98 -fno-PIE -c   -g -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../src/gcc -I../../src/gcc/. -I../../src/gcc/../include -I../../src/gcc/../libcpp/include  -I../../src/gcc/../libdecnumber -I../../src/gcc/../libdecnumber/dpd -I../libdecnumber -I../../src/gcc/../libbacktrace   -o fwprop.o -MT fwprop.o -MMD -MP -MF ./.deps/fwprop.TPo ../../src/gcc/fwprop.c
../../src/gcc/function.c: In function 'void assign_parm_find_stack_rtl(tree, assign_parm_data_one*)':
../../src/gcc/function.c:2690:28: error: no match for 'operator==' (operand types are 'poly_int<1, long long int>' and 'int')
 2690 |    && STACK_POINTER_OFFSET == 0)
      |                            ^~ ~
      |                               |
      |                               int
In file included from ../../src/gcc/coretypes.h:415,
                 from ../../src/gcc/function.c:36:
../../src/gcc/wide-int.h:3287:19: note: candidate: 'template<class T1, class T2> typename wi::binary_traits<T1, T2>::predicate_result operator==(const T1&, const T2&)'
 3287 | BINARY_PREDICATE (operator ==, eq_p)
      |                   ^~~~~~~~
../../src/gcc/wide-int.h:3264:3: note: in definition of macro 'BINARY_PREDICATE'
 3264 |   OP (const T1 &x, const T2 &y) \
      |   ^~
../../src/gcc/wide-int.h:3287:19: note:   template argument deduction/substitution failed:
 3287 | BINARY_PREDICATE (operator ==, eq_p)
      |                   ^~~~~~~~
../../src/gcc/wide-int.h:3264:3: note: in definition of macro 'BINARY_PREDICATE'
 3264 |   OP (const T1 &x, const T2 &y) \
      |   ^~
../../src/gcc/wide-int.h: In substitution of 'template<class T1, class T2> typename wi::binary_traits<T1, T2>::predicate_result operator==(const T1&, const T2&) [with T1 = poly_int<1, long long int>; T2 = int]':
../../src/gcc/function.c:2690:31:   required from here
../../src/gcc/wide-int.h:3287:19: error: incomplete type 'wi::int_traits<poly_int<1, long long int> >' used in nested name specifier
 3287 | BINARY_PREDICATE (operator ==, eq_p)
      |                   ^~~~~~~~
../../src/gcc/wide-int.h:3264:3: note: in definition of macro 'BINARY_PREDICATE'
 3264 |   OP (const T1 &x, const T2 &y) \
      |   ^~
make[5]: *** [Makefile:1118: function.o] Error 1
make[5]: *** Waiting for unfinished jobs....

We have the following define for STACK_POINTER_OFFSET:

#define STACK_POINTER_OFFSET \
  (TARGET_64BIT ? -(crtl->outgoing_args_size + 48) : poly_int64 (-32))
 
Dave

-- 
John David Anglin  dave.anglin@bell.net

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Fwd: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
       [not found]                                 ` <0d39b64f-67d9-7857-cf4e-36f09c0dc15e@bell.net>
@ 2019-08-20 16:03                                   ` Bernd Edlinger
  0 siblings, 0 replies; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-20 16:03 UTC (permalink / raw)
  To: John David Anglin, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4875 bytes --]

Ah, yes that was unexpected...
Sorry for the breakage.

So this needs to be known_eq (STACK_POINTER_OFFSET, 0)
instead of STACK_POINTER_OFFSET == 0 obviously.
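
For readers who have not met poly_int before, here is a toy sketch of why
a plain == is not offered for such values.  This is not the code from
GCC's poly-int.h; toy_poly, toy_known_eq and toy_maybe_eq are made-up
names, and the model assumes a value of the form c0 + c1 * x with an
unknown runtime x >= 0.  Once a value may depend on x, "equal" splits into
"known to be equal for every x" and "may be equal for some x", so the
caller has to say which one is meant.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a polynomial integer: value = c0 + c1 * x, where x is
// an unknown runtime quantity with x >= 0.
struct toy_poly
{
  std::int64_t c0;
  std::int64_t c1;
};

// True only if the value equals B for every possible x.
inline bool toy_known_eq(const toy_poly &a, std::int64_t b)
{
  return a.c1 == 0 && a.c0 == b;
}

// True if the value equals B for at least one x >= 0.
inline bool toy_maybe_eq(const toy_poly &a, std::int64_t b)
{
  if (a.c1 == 0)
    return a.c0 == b;
  const std::int64_t d = b - a.c0;
  return d % a.c1 == 0 && d / a.c1 >= 0;
}
```

In this model a constant like the -32 on hppa is toy_poly{-32, 0}, for
which toy_known_eq (..., 0) is simply false, so the excess-alignment path
would be skipped there.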

Should be fixed by this patch, which I am going to commit
as "obvious" in a moment unless someone objects.


Thanks
Bernd.


On 8/20/19 4:39 PM, John David Anglin wrote:
> On 2019-08-15 3:47 p.m., Bernd Edlinger wrote:
>> 2019-08-15  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>
>> 	PR middle-end/89544
>> 	* function.c (assign_parm_find_stack_rtl): Use larger alignment
>> 	when possible.
> This patch breaks build on hppa-unknown-linux-gnu:
> https://buildd.debian.org/status/fetch.php?pkg=gcc-snapshot&arch=hppa&ver=1%3A20190820-1&stamp=1566307455&raw=0
> 
> hppa-linux-gnu-g++-9 -std=gnu++98 -fno-PIE -c   -g -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../src/gcc -I../../src/gcc/. -I../../src/gcc/../include -I../../src/gcc/../libcpp/include  -I../../src/gcc/../libdecnumber -I../../src/gcc/../libdecnumber/dpd -I../libdecnumber -I../../src/gcc/../libbacktrace   -o function.o -MT function.o -MMD -MP -MF ./.deps/function.TPo ../../src/gcc/function.c
> hppa-linux-gnu-g++-9 -std=gnu++98 -fno-PIE -c   -g -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../src/gcc -I../../src/gcc/. -I../../src/gcc/../include -I../../src/gcc/../libcpp/include  -I../../src/gcc/../libdecnumber -I../../src/gcc/../libdecnumber/dpd -I../libdecnumber -I../../src/gcc/../libbacktrace   -o function-tests.o -MT function-tests.o -MMD -MP -MF ./.deps/function-tests.TPo ../../src/gcc/function-tests.c
> hppa-linux-gnu-g++-9 -std=gnu++98 -fno-PIE -c   -g -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../src/gcc -I../../src/gcc/. -I../../src/gcc/../include -I../../src/gcc/../libcpp/include  -I../../src/gcc/../libdecnumber -I../../src/gcc/../libdecnumber/dpd -I../libdecnumber -I../../src/gcc/../libbacktrace   -o fwprop.o -MT fwprop.o -MMD -MP -MF ./.deps/fwprop.TPo ../../src/gcc/fwprop.c
> ../../src/gcc/function.c: In function 'void assign_parm_find_stack_rtl(tree, assign_parm_data_one*)':
> ../../src/gcc/function.c:2690:28: error: no match for 'operator==' (operand types are 'poly_int<1, long long int>' and 'int')
>  2690 |    && STACK_POINTER_OFFSET == 0)
>       |                            ^~ ~
>       |                               |
>       |                               int
> In file included from ../../src/gcc/coretypes.h:415,
>                  from ../../src/gcc/function.c:36:
> ../../src/gcc/wide-int.h:3287:19: note: candidate: 'template<class T1, class T2> typename wi::binary_traits<T1, T2>::predicate_result operator==(const T1&, const T2&)'
>  3287 | BINARY_PREDICATE (operator ==, eq_p)
>       |                   ^~~~~~~~
> ../../src/gcc/wide-int.h:3264:3: note: in definition of macro 'BINARY_PREDICATE'
>  3264 |   OP (const T1 &x, const T2 &y) \
>       |   ^~
> ../../src/gcc/wide-int.h:3287:19: note:   template argument deduction/substitution failed:
>  3287 | BINARY_PREDICATE (operator ==, eq_p)
>       |                   ^~~~~~~~
> ../../src/gcc/wide-int.h:3264:3: note: in definition of macro 'BINARY_PREDICATE'
>  3264 |   OP (const T1 &x, const T2 &y) \
>       |   ^~
> ../../src/gcc/wide-int.h: In substitution of 'template<class T1, class T2> typename wi::binary_traits<T1, T2>::predicate_result operator==(const T1&, const T2&) [with T1 = poly_int<1, long long int>; T2 = int]':
> ../../src/gcc/function.c:2690:31:   required from here
> ../../src/gcc/wide-int.h:3287:19: error: incomplete type 'wi::int_traits<poly_int<1, long long int> >' used in nested name specifier
>  3287 | BINARY_PREDICATE (operator ==, eq_p)
>       |                   ^~~~~~~~
> ../../src/gcc/wide-int.h:3264:3: note: in definition of macro 'BINARY_PREDICATE'
>  3264 |   OP (const T1 &x, const T2 &y) \
>       |   ^~
> make[5]: *** [Makefile:1118: function.o] Error 1
> make[5]: *** Waiting for unfinished jobs....
> 
> We have the following define for STACK_POINTER_OFFSET:
> 
> #define STACK_POINTER_OFFSET \
>   (TARGET_64BIT ? -(crtl->outgoing_args_size + 48) : poly_int64 (-32))
>  
> Dave
> 

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-function.diff --]
[-- Type: text/x-patch; name="patch-function.diff", Size: 761 bytes --]

2019-08-20  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* function.c (assign_parm_find_stack_rtl): Use known_eq instead of ==.

Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 274691)
+++ gcc/function.c	(working copy)
@@ -2706,7 +2706,7 @@ assign_parm_find_stack_rtl (tree parm, struct assi
 	 stack slot boundary, take advantage of that excess alignment.
 	 Don't make any assumptions if STACK_POINTER_OFFSET is in use.  */
       if (poly_int_rtx_p (offset_rtx, &offset)
-	  && STACK_POINTER_OFFSET == 0)
+	  && known_eq (STACK_POINTER_OFFSET, 0))
 	{
 	  unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
 	  if (offset_align == 0 || offset_align > STACK_BOUNDARY)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-17 10:11                               ` Bernd Edlinger
@ 2019-08-23  0:01                                 ` Jeff Law
  0 siblings, 0 replies; 50+ messages in thread
From: Jeff Law @ 2019-08-23  0:01 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jakub Jelinek

On 8/17/19 1:44 AM, Bernd Edlinger wrote:
> On 8/15/19 9:47 PM, Bernd Edlinger wrote:
>> Hi,
>>
>> this is the split out part from the "Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)"
>> which is sanitizing the middle-end interface to the back-end for strict alignment,
>> and a couple of bug-fixes that are necessary to survive boot-strap.
>> It is intended to be applied after the PR 89544 fix.
>>
>> I think it would be possible to change the default implementation of STACK_SLOT_ALIGNMENT
>> to make all stack variables always naturally aligned instead of doing that only
>> in assign_parm_setup_stack, but would still like to avoid changing too many things
>> that do not seem to have a problem.  Since this would affect many targets, and more
>> kinds of variables that may probably not have a strict alignment problem.
>> But I am ready to take your advice though.
>>
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf
>> Is it OK for trunk?
>>
>>
> 
> Hmm, actually the hunk in assign_parm_setup_stack is not only failing
> an assertion, but rather a wrong code bug:
> 
> I found now a test case that generates silently wrong code and is fixed
> by this patch.
> 
> $ cat unaligned-argument-3.c 
> /* { dg-do compile } */
> /* { dg-require-effective-target arm_arm_ok } */
> /* { dg-options "-marm -mno-unaligned-access -O3" } */
> 
> typedef int __attribute__((aligned(1))) s;
> 
> void x(char*, s*);
> void f(char a, s f)
> {
>   x(&a, &f);
> }
> 
> /* { dg-final { scan-assembler-times "str\t\[^\\n\]*\\\[sp\\\]" 1 } } */
> /* { dg-final { scan-assembler-times "str\t\[^\\n\]*\\\[sp, #3\\\]" 0 } } */
> 
> currently with -marm -mno-unaligned-access -O3 we generate:
> 
> f:
> 	@ args = 0, pretend = 0, frame = 8
> 	@ frame_needed = 0, uses_anonymous_args = 0
> 	str	lr, [sp, #-4]!
> 	sub	sp, sp, #12
> 	mov	r3, r0
> 	str	r1, [sp, #3]  <- may trap
> 	add	r0, sp, #7
> 	add	r1, sp, #3
> 	strb	r3, [sp, #7]
> 	bl	x
> 	add	sp, sp, #12
> 	@ sp needed
> 	ldr	pc, [sp], #4
> 
> 
> So I would like to add a test case to the patch as attached.
> 
> Tested with a cross, that both dg-final fail currently and are fixed
> with the other patches applied.
> 
> Is it OK for trunk?
OK when the patch that fixes this is ACK'd.

jeff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-15 21:27                             ` [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment Bernd Edlinger
  2019-08-17 10:11                               ` Bernd Edlinger
@ 2019-08-23  0:05                               ` Jeff Law
  2019-08-23 15:15                                 ` [PING] " Bernd Edlinger
  2019-08-27 10:07                               ` Kyrill Tkachov
  2 siblings, 1 reply; 50+ messages in thread
From: Jeff Law @ 2019-08-23  0:05 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jakub Jelinek

On 8/15/19 1:47 PM, Bernd Edlinger wrote:
> Hi,
> 
> this is the split out part from the "Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)"
> which is sanitizing the middle-end interface to the back-end for strict alignment,
> and a couple of bug-fixes that are necessary to survive boot-strap.
> It is intended to be applied after the PR 89544 fix.
> 
> I think it would be possible to change the default implementation of STACK_SLOT_ALIGNMENT
> to make all stack variables always naturally aligned instead of doing that only
> in assign_parm_setup_stack, but would still like to avoid changing too many things
> that do not seem to have a problem.  Since this would affect many targets, and more
> kinds of variables that may probably not have a strict alignment problem.
> But I am ready to take your advice though.
> 
> 
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 
> 
> patch-strict-align.diff
> 
> 2019-08-15  Bernd Edlinger  <bernd.edlinger@hotmail.de>
> 	    Richard Biener  <rguenther@suse.de>
> 
> 	* expr.c (expand_assignment): Handle misaligned DECLs.
> 	(expand_expr_real_1): Handle FUNCTION_DECL as unaligned.
> 	* function.c (assign_parm_adjust_stack_rtl): Check movmisalign optab
> 	too.
> 	(assign_parm_setup_stack): Allocate properly aligned stack slots.
> 	* varasm.c (build_constant_desc): Align constants of misaligned types.
> 	* config/arm/arm.md (movdi, movsi, movhi, movhf, movsf, movdf): Check
> 	strict alignment restrictions on memory addresses.
> 	* config/arm/neon.md (movti, mov<VSTRUCT>, mov<VH>): Likewise.
> 	* config/arm/vec-common.md (mov<VALL>): Likewise.
I'll ack the generic bits.  I have no clue if the ARM maintainers want
the asserts or not.

jeff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PING] [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-23  0:05                               ` Jeff Law
@ 2019-08-23 15:15                                 ` Bernd Edlinger
  0 siblings, 0 replies; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-23 15:15 UTC (permalink / raw)
  To: Jeff Law, Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Kyrill Tkachov, Eric Botcazou, Jakub Jelinek

On 8/23/19 12:57 AM, Jeff Law wrote:
> On 8/15/19 1:47 PM, Bernd Edlinger wrote:
>> Hi,
>>
>> this is the split out part from the "Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)"
>> which is sanitizing the middle-end interface to the back-end for strict alignment,
>> and a couple of bug-fixes that are necessary to survive boot-strap.
>> It is intended to be applied after the PR 89544 fix.
>>
>> I think it would be possible to change the default implementation of STACK_SLOT_ALIGNMENT
>> to make all stack variables always naturally aligned instead of doing that only
>> in assign_parm_setup_stack, but would still like to avoid changing too many things
>> that do not seem to have a problem.  Since this would affect many targets, and more
>> kinds of variables that may probably not have a strict alignment problem.
>> But I am ready to take your advice though.
>>
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
>>
>> patch-strict-align.diff
>>
>> 2019-08-15  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>> 	    Richard Biener  <rguenther@suse.de>
>>
>> 	* expr.c (expand_assignment): Handle misaligned DECLs.
>> 	(expand_expr_real_1): Handle FUNCTION_DECL as unaligned.
>> 	* function.c (assign_parm_adjust_stack_rtl): Check movmisalign optab
>> 	too.
>> 	(assign_parm_setup_stack): Allocate properly aligned stack slots.
>> 	* varasm.c (build_constant_desc): Align constants of misaligned types.
>> 	* config/arm/arm.md (movdi, movsi, movhi, movhf, movsf, movdf): Check
>> 	strict alignment restrictions on memory addresses.
>> 	* config/arm/neon.md (movti, mov<VSTRUCT>, mov<VH>): Likewise.
>> 	* config/arm/vec-common.md (mov<VALL>): Likewise.
> I'll ack the generic bits.  I have no clue if the ARM maintainers want
> the asserts or not.
> 

Okay, thanks Jeff, and Richi.

So I would like to ping on the ARM platform bits.
Just a couple of gcc_checking_asserts.
The wrong code will be fixed by the middle-end changes alone,
but the assertions would help prevent further wrong-code issues
going unnoticed.

Is it OK for trunk?


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-15 21:27                             ` [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment Bernd Edlinger
  2019-08-17 10:11                               ` Bernd Edlinger
  2019-08-23  0:05                               ` Jeff Law
@ 2019-08-27 10:07                               ` Kyrill Tkachov
  2019-08-28 11:50                                 ` Bernd Edlinger
  2 siblings, 1 reply; 50+ messages in thread
From: Kyrill Tkachov @ 2019-08-27 10:07 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Eric Botcazou, Jeff Law, Jakub Jelinek

Hi Bernd,

On 8/15/19 8:47 PM, Bernd Edlinger wrote:
> Hi,
>
> this is the split out part from the "Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)"
> which is sanitizing the middle-end interface to the back-end for strict alignment,
> and a couple of bug-fixes that are necessary to survive boot-strap.
> It is intended to be applied after the PR 89544 fix.
>
> I think it would be possible to change the default implementation of STACK_SLOT_ALIGNMENT
> to make all stack variables always naturally aligned instead of doing that only
> in assign_parm_setup_stack, but would still like to avoid changing too many things
> that do not seem to have a problem.  Since this would affect many targets, and more
> kinds of variables that may probably not have a strict alignment problem.
> But I am ready to take your advice though.
>
>
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf
> Is it OK for trunk?

I'm not opposed to the checks but...


>
> Thanks
> Bernd.
>

Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 274531)
+++ gcc/config/arm/arm.md	(working copy)
@@ -5838,6 +5838,12 @@
  	(match_operand:DI 1 "general_operand"))]
    "TARGET_EITHER"
    "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (DImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (DImode));
    if (can_create_pseudo_p ())
      {
        if (!REG_P (operands[0]))
@@ -6014,6 +6020,12 @@
    {
    rtx base, offset, tmp;
  
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (SImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (SImode));
    if (TARGET_32BIT || TARGET_HAVE_MOVT)
      {
        /* Everything except mem = const or mem = mem can be done easily.  */
@@ -6503,6 +6515,12 @@
  	(match_operand:HI 1 "general_operand"))]
    "TARGET_EITHER"
    "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (HImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (HImode));
    if (TARGET_ARM)
      {
        if (can_create_pseudo_p ())
@@ -6912,6 +6930,12 @@
  	(match_operand:HF 1 "general_operand"))]
    "TARGET_EITHER"
    "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (HFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (HFmode));
    if (TARGET_32BIT)
      {
        if (MEM_P (operands[0]))
@@ -6976,6 +7000,12 @@
  	(match_operand:SF 1 "general_operand"))]
    "TARGET_EITHER"
    "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (SFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (SFmode));
    if (TARGET_32BIT)
      {
        if (MEM_P (operands[0]))
@@ -7071,6 +7101,12 @@
  	(match_operand:DF 1 "general_operand"))]
    "TARGET_EITHER"
    "
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (DFmode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (DFmode));
    if (TARGET_32BIT)
      {
        if (MEM_P (operands[0]))
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 274531)
+++ gcc/config/arm/neon.md	(working copy)
@@ -127,6 +127,12 @@
  	(match_operand:TI 1 "general_operand"))]
    "TARGET_NEON"
  {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (TImode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (TImode));
    if (can_create_pseudo_p ())
      {
        if (!REG_P (operands[0]))
@@ -139,6 +145,12 @@
  	(match_operand:VSTRUCT 1 "general_operand"))]
    "TARGET_NEON"
  {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
    if (can_create_pseudo_p ())
      {
        if (!REG_P (operands[0]))
@@ -151,6 +163,12 @@
  	(match_operand:VH 1 "s_register_operand"))]
    "TARGET_NEON"
  {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
    if (can_create_pseudo_p ())
      {
        if (!REG_P (operands[0]))
Index: gcc/config/arm/vec-common.md
===================================================================
--- gcc/config/arm/vec-common.md	(revision 274531)
+++ gcc/config/arm/vec-common.md	(working copy)
@@ -26,6 +26,12 @@
    "TARGET_NEON
     || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
  {
+  gcc_checking_assert (!MEM_P (operands[0])
+		       || MEM_ALIGN (operands[0])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
+  gcc_checking_assert (!MEM_P (operands[1])
+		       || MEM_ALIGN (operands[1])
+			  >= GET_MODE_ALIGNMENT (<MODE>mode));
    if (can_create_pseudo_p ())
      {
        if (!REG_P (operands[0]))

... can we please factor the (!MEM_P (operands[0]) || MEM_ALIGN (operands[0]) >= GET_MODE_ALIGNMENT (<MODE>mode)) checks into a common function and use that?

Thanks,
Kyrill

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-27 10:07                               ` Kyrill Tkachov
@ 2019-08-28 11:50                                 ` Bernd Edlinger
  2019-08-28 12:01                                   ` Kyrill Tkachov
  0 siblings, 1 reply; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-28 11:50 UTC (permalink / raw)
  To: Kyrill Tkachov, Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Eric Botcazou, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2245 bytes --]

On 8/27/19 11:25 AM, Kyrill Tkachov wrote:
> Hi Bernd,
> 
> On 8/15/19 8:47 PM, Bernd Edlinger wrote:
>> Hi,
>>
>> this is the split out part from the "Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)"
>> which is sanitizing the middle-end interface to the back-end for strict alignment,
>> and a couple of bug-fixes that are necessary to survive boot-strap.
>> It is intended to be applied after the PR 89544 fix.
>>
>> I think it would be possible to change the default implementation of STACK_SLOT_ALIGNMENT
>> to make all stack variables always naturally aligned instead of doing that only
>> in assign_parm_setup_stack, but would still like to avoid changing too many things
>> that do not seem to have a problem.  Since this would affect many targets, and more
>> kinds of variables that may probably not have a strict alignment problem.
>> But I am ready to take your advice though.
>>
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf
>> Is it OK for trunk?
> 
> I'm not opposed to the checks but...
> 
> 
>>
>> Thanks
>> Bernd.
>>
> 
> Index: gcc/config/arm/vec-common.md
> ===================================================================
> --- gcc/config/arm/vec-common.md    (revision 274531)
> +++ gcc/config/arm/vec-common.md    (working copy)
> @@ -26,6 +26,12 @@
>    "TARGET_NEON
>     || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
>  {
> +  gcc_checking_assert (!MEM_P (operands[0])
> +               || MEM_ALIGN (operands[0])
> +              >= GET_MODE_ALIGNMENT (<MODE>mode));
> +  gcc_checking_assert (!MEM_P (operands[1])
> +               || MEM_ALIGN (operands[1])
> +              >= GET_MODE_ALIGNMENT (<MODE>mode));
>    if (can_create_pseudo_p ())
>      {
>        if (!REG_P (operands[0]))
> 
> ... can we please factor the (!MEM_P (operands[0]) || MEM_ALIGN (operands[0]) >= GET_MODE_ALIGNMENT (<MODE>mode)) checks into a common function and use that?
> 

Sure, good idea.  How about converting it to a predicate?
This generates code that is 1:1 equivalent to the open-coded assertions.

Is it OK for trunk?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-strict-align.diff --]
[-- Type: text/x-patch; name="patch-strict-align.diff", Size: 8624 bytes --]

2019-08-15  Bernd Edlinger  <bernd.edlinger@hotmail.de>
	    Richard Biener  <rguenther@suse.de>

	* expr.c (expand_assignment): Handle misaligned DECLs.
	(expand_expr_real_1): Handle FUNCTION_DECL as unaligned.
	* function.c (assign_parm_adjust_stack_rtl): Check movmisalign optab
	too.
	(assign_parm_setup_stack): Allocate properly aligned stack slots.
	* varasm.c (build_constant_desc): Align constants of misaligned types.
	* config/arm/predicates.md (aligned_operand): New predicate.
	* config/arm/arm.md (movdi, movsi, movhi, movhf, movsf, movdf): Use
	sligned_operand to check restrictions on memory addresses.
	* config/arm/neon.md (movti, mov<VSTRUCT>, mov<VH>): Likewise.
	* config/arm/vec-common.md (mov<VALL>): Likewise.

Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 274946)
+++ gcc/config/arm/arm.md	(working copy)
@@ -5231,6 +5231,8 @@
 	(match_operand:DI 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (aligned_operand (operands[0], DImode));
+  gcc_checking_assert (aligned_operand (operands[1], DImode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
@@ -5407,6 +5409,8 @@
   {
   rtx base, offset, tmp;
 
+  gcc_checking_assert (aligned_operand (operands[0], SImode));
+  gcc_checking_assert (aligned_operand (operands[1], SImode));
   if (TARGET_32BIT || TARGET_HAVE_MOVT)
     {
       /* Everything except mem = const or mem = mem can be done easily.  */
@@ -5896,6 +5900,8 @@
 	(match_operand:HI 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (aligned_operand (operands[0], HImode));
+  gcc_checking_assert (aligned_operand (operands[1], HImode));
   if (TARGET_ARM)
     {
       if (can_create_pseudo_p ())
@@ -6305,6 +6311,8 @@
 	(match_operand:HF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (aligned_operand (operands[0], HFmode));
+  gcc_checking_assert (aligned_operand (operands[1], HFmode));
   if (TARGET_32BIT)
     {
       if (MEM_P (operands[0]))
@@ -6369,6 +6377,8 @@
 	(match_operand:SF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (aligned_operand (operands[0], SFmode));
+  gcc_checking_assert (aligned_operand (operands[1], SFmode));
   if (TARGET_32BIT)
     {
       if (MEM_P (operands[0]))
@@ -6464,6 +6474,8 @@
 	(match_operand:DF 1 "general_operand"))]
   "TARGET_EITHER"
   "
+  gcc_checking_assert (aligned_operand (operands[0], DFmode));
+  gcc_checking_assert (aligned_operand (operands[1], DFmode));
   if (TARGET_32BIT)
     {
       if (MEM_P (operands[0]))
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 274946)
+++ gcc/config/arm/neon.md	(working copy)
@@ -127,6 +127,8 @@
 	(match_operand:TI 1 "general_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (aligned_operand (operands[0], TImode));
+  gcc_checking_assert (aligned_operand (operands[1], TImode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
@@ -139,6 +141,8 @@
 	(match_operand:VSTRUCT 1 "general_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
+  gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
@@ -151,6 +155,8 @@
 	(match_operand:VH 1 "s_register_operand"))]
   "TARGET_NEON"
 {
+  gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
+  gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
Index: gcc/config/arm/predicates.md
===================================================================
--- gcc/config/arm/predicates.md	(revision 274946)
+++ gcc/config/arm/predicates.md	(working copy)
@@ -697,3 +697,7 @@
   (ior (and (match_code "symbol_ref")
 	    (match_test "!arm_is_long_call_p (SYMBOL_REF_DECL (op))"))
        (match_operand 0 "s_register_operand")))
+
+(define_special_predicate "aligned_operand"
+  (ior (not (match_code "mem"))
+       (match_test "MEM_ALIGN (op) >= GET_MODE_ALIGNMENT (mode)")))
Index: gcc/config/arm/vec-common.md
===================================================================
--- gcc/config/arm/vec-common.md	(revision 274946)
+++ gcc/config/arm/vec-common.md	(working copy)
@@ -26,6 +26,8 @@
   "TARGET_NEON
    || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
 {
+  gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
+  gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 274946)
+++ gcc/expr.c	(working copy)
@@ -5001,9 +5001,10 @@ expand_assignment (tree to, tree from, bool nontem
   /* Handle misaligned stores.  */
   mode = TYPE_MODE (TREE_TYPE (to));
   if ((TREE_CODE (to) == MEM_REF
-       || TREE_CODE (to) == TARGET_MEM_REF)
+       || TREE_CODE (to) == TARGET_MEM_REF
+       || DECL_P (to))
       && mode != BLKmode
-      && !mem_ref_refers_to_non_mem_p (to)
+      && (DECL_P (to) || !mem_ref_refers_to_non_mem_p (to))
       && ((align = get_object_alignment (to))
 	  < GET_MODE_ALIGNMENT (mode))
       && (((icode = optab_handler (movmisalign_optab, mode))
@@ -10795,6 +10796,14 @@ expand_expr_real_1 (tree exp, rtx target, machine_
 	    MEM_VOLATILE_P (op0) = 1;
 	  }
 
+	if (MEM_P (op0) && TREE_CODE (tem) == FUNCTION_DECL)
+	  {
+	    if (op0 == orig_op0)
+	      op0 = copy_rtx (op0);
+
+	    set_mem_align (op0, BITS_PER_UNIT);
+	  }
+
 	/* In cases where an aligned union has an unaligned object
 	   as a field, we might be extracting a BLKmode value from
 	   an integer-mode (e.g., SImode) object.  Handle this case
Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 274946)
+++ gcc/function.c	(working copy)
@@ -2807,8 +2807,10 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
      stack slot, if we need one.  */
   if (stack_parm
       && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
-	   && targetm.slow_unaligned_access (data->nominal_mode,
-					     MEM_ALIGN (stack_parm)))
+	   && ((optab_handler (movmisalign_optab, data->nominal_mode)
+		!= CODE_FOR_nothing)
+	       || targetm.slow_unaligned_access (data->nominal_mode,
+						 MEM_ALIGN (stack_parm))))
 	  || (data->nominal_type
 	      && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
 	      && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
@@ -3461,11 +3463,20 @@ assign_parm_setup_stack (struct assign_parm_data_a
 	  int align = STACK_SLOT_ALIGNMENT (data->arg.type,
 					    GET_MODE (data->entry_parm),
 					    TYPE_ALIGN (data->arg.type));
+	  if (align < (int)GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm))
+	      && ((optab_handler (movmisalign_optab,
+				  GET_MODE (data->entry_parm))
+		   != CODE_FOR_nothing)
+		  || targetm.slow_unaligned_access (GET_MODE (data->entry_parm),
+						    align)))
+	    align = GET_MODE_ALIGNMENT (GET_MODE (data->entry_parm));
 	  data->stack_parm
 	    = assign_stack_local (GET_MODE (data->entry_parm),
 				  GET_MODE_SIZE (GET_MODE (data->entry_parm)),
 				  align);
+	  align = MEM_ALIGN (data->stack_parm);
 	  set_mem_attributes (data->stack_parm, parm, 1);
+	  set_mem_align (data->stack_parm, align);
 	}
 
       dest = validize_mem (copy_rtx (data->stack_parm));
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	(revision 274946)
+++ gcc/varasm.c	(working copy)
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stmt.h"
 #include "expr.h"
 #include "expmed.h"
+#include "optabs.h"
 #include "output.h"
 #include "langhooks.h"
 #include "debug.h"
@@ -3386,7 +3387,15 @@ build_constant_desc (tree exp)
   if (TREE_CODE (exp) == STRING_CST)
     SET_DECL_ALIGN (decl, targetm.constant_alignment (exp, DECL_ALIGN (decl)));
   else
-    align_variable (decl, 0);
+    {
+      align_variable (decl, 0);
+      if (DECL_ALIGN (decl) < GET_MODE_ALIGNMENT (DECL_MODE (decl))
+	  && ((optab_handler (movmisalign_optab, DECL_MODE (decl))
+	       != CODE_FOR_nothing)
+	      || targetm.slow_unaligned_access (DECL_MODE (decl),
+						DECL_ALIGN (decl))))
+	SET_DECL_ALIGN (decl, GET_MODE_ALIGNMENT (DECL_MODE (decl)));
+    }
 
   /* Now construct the SYMBOL_REF and the MEM.  */
   if (use_object_blocks_p ())

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-28 11:50                                 ` Bernd Edlinger
@ 2019-08-28 12:01                                   ` Kyrill Tkachov
  2019-08-28 13:54                                     ` Christophe Lyon
  0 siblings, 1 reply; 50+ messages in thread
From: Kyrill Tkachov @ 2019-08-28 12:01 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Biener
  Cc: gcc-patches, Richard Earnshaw, Ramana Radhakrishnan,
	Eric Botcazou, Jeff Law, Jakub Jelinek


On 8/28/19 10:38 AM, Bernd Edlinger wrote:
> On 8/27/19 11:25 AM, Kyrill Tkachov wrote:
>> Hi Bernd,
>>
>> On 8/15/19 8:47 PM, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this is the split out part from the "Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)"
>>> which is sanitizing the middle-end interface to the back-end for strict alignment,
>>> and a couple of bug-fixes that are necessary to survive boot-strap.
>>> It is intended to be applied after the PR 89544 fix.
>>>
>>> I think it would be possible to change the default implementation of STACK_SLOT_ALIGNMENT
>>> to make all stack variables always naturally aligned instead of doing that only
>>> in assign_parm_setup_stack, but would still like to avoid changing too many things
>>> that do not seem to have a problem.  Since this would affect many targets, and more
>>> kinds of variables that may probably not have a strict alignment problem.
>>> But I am ready to take your advice though.
>>>
>>>
>>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf
>>> Is it OK for trunk?
>> I'm not opposed to the checks but...
>>
>>
>>> Thanks
>>> Bernd.
>>>
>> Index: gcc/config/arm/vec-common.md
>> ===================================================================
>> --- gcc/config/arm/vec-common.md    (Revision 274531)
>> +++ gcc/config/arm/vec-common.md    (Arbeitskopie)
>> @@ -26,6 +26,12 @@
>>     "TARGET_NEON
>>      || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
>>   {
>> +  gcc_checking_assert (!MEM_P (operands[0])
>> +               || MEM_ALIGN (operands[0])
>> +              >= GET_MODE_ALIGNMENT (<MODE>mode));
>> +  gcc_checking_assert (!MEM_P (operands[1])
>> +               || MEM_ALIGN (operands[1])
>> +              >= GET_MODE_ALIGNMENT (<MODE>mode));
>>     if (can_create_pseudo_p ())
>>       {
>>         if (!REG_P (operands[0]))
>>
>> ... can we please factor the (!MEM_P (operands[0]) || MEM_ALIGN (operands[0]) >= GET_MODE_ALIGNMENT (<MODE>mode)) checks into a common function and use that?
>>
> Sure, good idea.  How about converting it to a predicate?
> This creates 1:1 equivalent code to the open coded assertions.
>
> Is it OK for trunk?
>
>
> Thanks
> Bernd.


patch-strict-align.diff

2019-08-15  Bernd Edlinger<bernd.edlinger@hotmail.de>
	    Richard Biener<rguenther@suse.de>

	* expr.c (expand_assignment): Handle misaligned DECLs.
	(expand_expr_real_1): Handle FUNCTION_DECL as unaligned.
	* function.c (assign_parm_adjust_stack_rtl): Check movmisalign optab
	too.
	(assign_parm_setup_stack): Allocate properly aligned stack slots.
	* varasm.c (build_constant_desc): Align constants of misaligned types.
	* config/arm/predicates.md (aligned_operand): New predicate.
	* config/arm/arm.md (movdi, movsi, movhi, movhf, movsf, movdf): Use
	sligned_operand to check restrictions on memory addresses.

typo in "aligned_operand"

         * config/arm/neon.md (movti, mov<VSTRUCT>, mov<VH>): Likewise.
	* config/arm/vec-common.md (mov<VALL>): Likewise.


Looks good now.

Ok, thanks!

Kyrill

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-28 12:01                                   ` Kyrill Tkachov
@ 2019-08-28 13:54                                     ` Christophe Lyon
  2019-08-28 21:48                                       ` Bernd Edlinger
  0 siblings, 1 reply; 50+ messages in thread
From: Christophe Lyon @ 2019-08-28 13:54 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: Bernd Edlinger, Richard Biener, gcc-patches, Richard Earnshaw,
	Ramana Radhakrishnan, Eric Botcazou, Jeff Law, Jakub Jelinek

On Wed, 28 Aug 2019 at 11:42, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
>
> On 8/28/19 10:38 AM, Bernd Edlinger wrote:
> > On 8/27/19 11:25 AM, Kyrill Tkachov wrote:
> >> Hi Bernd,
> >>
> >> On 8/15/19 8:47 PM, Bernd Edlinger wrote:
> >>> Hi,
> >>>
> >>> this is the split out part from the "Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)"
> >>> which is sanitizing the middle-end interface to the back-end for strict alignment,
> >>> and a couple of bug-fixes that are necessary to survive boot-strap.
> >>> It is intended to be applied after the PR 89544 fix.
> >>>
> >>> I think it would be possible to change the default implementation of STACK_SLOT_ALIGNMENT
> >>> to make all stack variables always naturally aligned instead of doing that only
> >>> in assign_parm_setup_stack, but would still like to avoid changing too many things
> >>> that do not seem to have a problem.  Since this would affect many targets, and more
> >>> kinds of variables that may probably not have a strict alignment problem.
> >>> But I am ready to take your advice though.
> >>>
> >>>
> >>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf
> >>> Is it OK for trunk?
> >> I'm not opposed to the checks but...
> >>
> >>
> >>> Thanks
> >>> Bernd.
> >>>
> >> Index: gcc/config/arm/vec-common.md
> >> ===================================================================
> >> --- gcc/config/arm/vec-common.md    (Revision 274531)
> >> +++ gcc/config/arm/vec-common.md    (Arbeitskopie)
> >> @@ -26,6 +26,12 @@
> >>     "TARGET_NEON
> >>      || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
> >>   {
> >> +  gcc_checking_assert (!MEM_P (operands[0])
> >> +               || MEM_ALIGN (operands[0])
> >> +              >= GET_MODE_ALIGNMENT (<MODE>mode));
> >> +  gcc_checking_assert (!MEM_P (operands[1])
> >> +               || MEM_ALIGN (operands[1])
> >> +              >= GET_MODE_ALIGNMENT (<MODE>mode));
> >>     if (can_create_pseudo_p ())
> >>       {
> >>         if (!REG_P (operands[0]))
> >>
> >> ... can we please factor the (!MEM_P (operands[0]) || MEM_ALIGN (operands[0]) >= GET_MODE_ALIGNMENT (<MODE>mode)) checks into a common function and use that?
> >>
> > Sure, good idea.  How about converting it to a predicate?
> > This creates 1:1 equivalent code to the open coded assertions.
> >
> > Is it OK for trunk?
> >
> >
> > Thanks
> > Bernd.
>
>
> patch-strict-align.diff
>
> 2019-08-15  Bernd Edlinger<bernd.edlinger@hotmail.de>
>             Richard Biener<rguenther@suse.de>
>
>         * expr.c (expand_assignment): Handle misaligned DECLs.
>         (expand_expr_real_1): Handle FUNCTION_DECL as unaligned.
>         * function.c (assign_parm_adjust_stack_rtl): Check movmisalign optab
>         too.
>         (assign_parm_setup_stack): Allocate properly aligned stack slots.
>         * varasm.c (build_constant_desc): Align constants of misaligned types.
>         * config/arm/predicates.md (aligned_operand): New predicate.
>         * config/arm/arm.md (movdi, movsi, movhi, movhf, movsf, movdf): Use
>         sligned_operand to check restrictions on memory addresses.
>
> typo in "aligned_operand"
>
>          * config/arm/neon.md (movti, mov<VSTRUCT>, mov<VH>): Likewise.
>         * config/arm/vec-common.md (mov<VALL>): Likewise.
>
>
> Looks good now.
>

Hi,

This patch causes an ICE when building libgcc's unwind-arm.o
with GCC configured as follows:
--target arm-none-linux-gnueabihf --with-mode thumb --with-cpu
cortex-a15 --with-fpu neon-vfpv4

The build works for the same target with --with-mode arm --with-cpu
cortex-a9 --with-fpu vfp.

In file included from
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/config/arm/unwind-arm.c:144:
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
In function 'get_eit_entry':
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:245:29:
warning: cast discards 'const' qualifier from pointer target type
[-Wcast-qual]
  245 |       ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *)&eitp->content;
      |                             ^
during RTL pass: expand
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
In function 'unwind_phase2_forced':
/tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:319:18:
internal compiler error: in gen_movdi, at config/arm/arm.md:5235
  319 |   saved_vrs.core = entry_vrs->core;
      |   ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
0x126530f gen_movdi(rtx_def*, rtx_def*)
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:5235
0x896d92 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/recog.h:318
0x896d92 emit_move_insn_1(rtx_def*, rtx_def*)
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3694
0x897083 emit_move_insn(rtx_def*, rtx_def*)
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3790
0xfc25d6 gen_cpymem_ldrd_strd(rtx_def**)
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:14582
0x126a1f1 gen_cpymemqi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:6688
0xb0bc08 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/optabs.c:7440
0x89ba1e emit_block_move_via_cpymem
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1808
0x89ba1e emit_block_move_hints(rtx_def*, rtx_def*, rtx_def*,
block_op_methods, unsigned int, long, unsigned long, unsigned long,
unsigned long, bool, bool*)
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1627
0x89c383 emit_block_move(rtx_def*, rtx_def*, rtx_def*, block_op_methods)
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1667
0x89fb4e store_expr(tree_node*, rtx_def*, int, bool, bool)
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5845
0x88c1f9 store_field
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:7149
0x8a0c22 expand_assignment(tree_node*, tree_node*, bool)
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5304
0x761964 expand_gimple_stmt_1
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3779
0x761964 expand_gimple_stmt
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3875
0x768583 expand_gimple_basic_block
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5915
0x76abc6 execute
        /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6538

Christophe

> Ok, thanks!
>
> Kyrill
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-28 13:54                                     ` Christophe Lyon
@ 2019-08-28 21:48                                       ` Bernd Edlinger
  2019-08-29  9:09                                         ` Kyrill Tkachov
  0 siblings, 1 reply; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-28 21:48 UTC (permalink / raw)
  To: Christophe Lyon, Kyrill Tkachov
  Cc: Richard Biener, gcc-patches, Richard Earnshaw,
	Ramana Radhakrishnan, Eric Botcazou, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 4117 bytes --]

On 8/28/19 2:07 PM, Christophe Lyon wrote:
> Hi,
> 
> This patch causes an ICE when building libgcc's unwind-arm.o
> when configuring GCC:
> --target  arm-none-linux-gnueabihf --with-mode thumb --with-cpu
> cortex-a15 --with-fpu neon-vfpv4:
> 
> The build works for the same target, but --with-mode arm --with-cpu
> cortex a9 --with-fpu vfp
> 
> In file included from
> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/config/arm/unwind-arm.c:144:
> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
> In function 'get_eit_entry':
> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:245:29:
> warning: cast discards 'const' qualifier from pointer target type
> [-Wcast-qual]
>   245 |       ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *)&eitp->content;
>       |                             ^
> during RTL pass: expand
> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
> In function 'unwind_phase2_forced':
> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:319:18:
> internal compiler error: in gen_movdi, at config/arm/arm.md:5235
>   319 |   saved_vrs.core = entry_vrs->core;
>       |   ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
> 0x126530f gen_movdi(rtx_def*, rtx_def*)
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:5235
> 0x896d92 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/recog.h:318
> 0x896d92 emit_move_insn_1(rtx_def*, rtx_def*)
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3694
> 0x897083 emit_move_insn(rtx_def*, rtx_def*)
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3790
> 0xfc25d6 gen_cpymem_ldrd_strd(rtx_def**)
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:14582
> 0x126a1f1 gen_cpymemqi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:6688
> 0xb0bc08 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/optabs.c:7440
> 0x89ba1e emit_block_move_via_cpymem
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1808
> 0x89ba1e emit_block_move_hints(rtx_def*, rtx_def*, rtx_def*,
> block_op_methods, unsigned int, long, unsigned long, unsigned long,
> unsigned long, bool, bool*)
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1627
> 0x89c383 emit_block_move(rtx_def*, rtx_def*, rtx_def*, block_op_methods)
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1667
> 0x89fb4e store_expr(tree_node*, rtx_def*, int, bool, bool)
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5845
> 0x88c1f9 store_field
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:7149
> 0x8a0c22 expand_assignment(tree_node*, tree_node*, bool)
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5304
> 0x761964 expand_gimple_stmt_1
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3779
> 0x761964 expand_gimple_stmt
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3875
> 0x768583 expand_gimple_basic_block
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5915
> 0x76abc6 execute
>         /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6538
> 
> Christophe
> 

Okay, sorry for the breakage.

What gen_cpymem_ldrd_strd does here is of course against the rules:

It uses emit_move_insn on DI-mode memory operands that are only 4-byte aligned.

I have a patch for this which fixes the libgcc build on a cross compiler, but I have
no way to bootstrap the affected target.

Could you please help?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-cpymem-fix.diff --]
[-- Type: text/x-patch; name="patch-cpymem-fix.diff", Size: 2685 bytes --]

2019-08-28  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* config/arm/arm.md (unaligned_loaddi,
	unaligned_storedi): New unspec insn patterns.
	* config/arm/arm.c (gen_cpymem_ldrd_strd): Use unaligned_loaddi
	and unaligned_storedi for 4-byte aligned memory.

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 274987)
+++ gcc/config/arm/arm.c	(working copy)
@@ -14578,8 +14578,10 @@ gen_cpymem_ldrd_strd (rtx *operands)
 	  low_reg = gen_lowpart (SImode, reg0);
 	  hi_reg = gen_highpart_mode (SImode, DImode, reg0);
 	}
-      if (src_aligned)
-        emit_move_insn (reg0, src);
+      if (MEM_ALIGN (src) >= 2 * BITS_PER_WORD)
+	emit_move_insn (reg0, src);
+      else if (src_aligned)
+	emit_insn (gen_unaligned_loaddi (reg0, src));
       else
 	{
 	  emit_insn (gen_unaligned_loadsi (low_reg, src));
@@ -14587,8 +14589,10 @@ gen_cpymem_ldrd_strd (rtx *operands)
 	  emit_insn (gen_unaligned_loadsi (hi_reg, src));
 	}
 
-      if (dst_aligned)
-        emit_move_insn (dst, reg0);
+      if (MEM_ALIGN (dst) >= 2 * BITS_PER_WORD)
+	emit_move_insn (dst, reg0);
+      else if (dst_aligned)
+	emit_insn (gen_unaligned_storedi (dst, reg0));
       else
 	{
 	  emit_insn (gen_unaligned_storesi (dst, low_reg));
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 274987)
+++ gcc/config/arm/arm.md	(working copy)
@@ -3963,6 +3963,17 @@
 
 ; ARMv6+ unaligned load/store instructions (used for packed structure accesses).
 
+(define_insn "unaligned_loaddi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r")
+	(unspec:DI [(match_operand:DI 1 "memory_operand" "m")]
+		   UNSPEC_UNALIGNED_LOAD))]
+  "TARGET_32BIT && TARGET_LDRD"
+  "*
+  return output_move_double (operands, true, NULL);
+  "
+  [(set_attr "length" "8")
+   (set_attr "type" "load_8")])
+
 (define_insn "unaligned_loadsi"
   [(set (match_operand:SI 0 "s_register_operand" "=l,l,r")
 	(unspec:SI [(match_operand:SI 1 "memory_operand" "m,Uw,m")]
@@ -4008,6 +4019,17 @@
    (set_attr "predicable_short_it" "no,yes,no")
    (set_attr "type" "load_byte")])
 
+(define_insn "unaligned_storedi"
+  [(set (match_operand:DI 0 "memory_operand" "=m")
+	(unspec:DI [(match_operand:DI 1 "s_register_operand" "r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "TARGET_32BIT && TARGET_LDRD"
+  "*
+  return output_move_double (operands, true, NULL);
+  "
+  [(set_attr "length" "8")
+   (set_attr "type" "store_8")])
+
 (define_insn "unaligned_storesi"
   [(set (match_operand:SI 0 "memory_operand" "=m,Uw,m")
 	(unspec:SI [(match_operand:SI 1 "s_register_operand" "l,l,r")]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-28 21:48                                       ` Bernd Edlinger
@ 2019-08-29  9:09                                         ` Kyrill Tkachov
  2019-08-29 10:00                                           ` Christophe Lyon
  0 siblings, 1 reply; 50+ messages in thread
From: Kyrill Tkachov @ 2019-08-29  9:09 UTC (permalink / raw)
  To: Bernd Edlinger, Christophe Lyon
  Cc: Richard Biener, gcc-patches, Richard Earnshaw,
	Ramana Radhakrishnan, Eric Botcazou, Jeff Law, Jakub Jelinek

Hi Bernd,

On 8/28/19 10:36 PM, Bernd Edlinger wrote:
> On 8/28/19 2:07 PM, Christophe Lyon wrote:
>> Hi,
>>
>> This patch causes an ICE when building libgcc's unwind-arm.o
>> when configuring GCC:
>> --target  arm-none-linux-gnueabihf --with-mode thumb --with-cpu
>> cortex-a15 --with-fpu neon-vfpv4:
>>
>> The build works for the same target, but --with-mode arm --with-cpu
>> cortex a9 --with-fpu vfp
>>
>> In file included from
>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/config/arm/unwind-arm.c:144:
>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
>> In function 'get_eit_entry':
>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:245:29:
>> warning: cast discards 'const' qualifier from pointer target type
>> [-Wcast-qual]
>>    245 |       ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *)&eitp->content;
>>        |                             ^
>> during RTL pass: expand
>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
>> In function 'unwind_phase2_forced':
>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:319:18:
>> internal compiler error: in gen_movdi, at config/arm/arm.md:5235
>>    319 |   saved_vrs.core = entry_vrs->core;
>>        |   ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
>> 0x126530f gen_movdi(rtx_def*, rtx_def*)
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:5235
>> 0x896d92 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/recog.h:318
>> 0x896d92 emit_move_insn_1(rtx_def*, rtx_def*)
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3694
>> 0x897083 emit_move_insn(rtx_def*, rtx_def*)
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3790
>> 0xfc25d6 gen_cpymem_ldrd_strd(rtx_def**)
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:14582
>> 0x126a1f1 gen_cpymemqi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:6688
>> 0xb0bc08 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/optabs.c:7440
>> 0x89ba1e emit_block_move_via_cpymem
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1808
>> 0x89ba1e emit_block_move_hints(rtx_def*, rtx_def*, rtx_def*,
>> block_op_methods, unsigned int, long, unsigned long, unsigned long,
>> unsigned long, bool, bool*)
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1627
>> 0x89c383 emit_block_move(rtx_def*, rtx_def*, rtx_def*, block_op_methods)
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1667
>> 0x89fb4e store_expr(tree_node*, rtx_def*, int, bool, bool)
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5845
>> 0x88c1f9 store_field
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:7149
>> 0x8a0c22 expand_assignment(tree_node*, tree_node*, bool)
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5304
>> 0x761964 expand_gimple_stmt_1
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3779
>> 0x761964 expand_gimple_stmt
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3875
>> 0x768583 expand_gimple_basic_block
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5915
>> 0x76abc6 execute
>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6538
>>
>> Christophe
>>
> Okay, sorry for the breakage.
>
> What is happening in gen_cpymem_ldrd_strd is of course against the rules:
>
> It uses emit_move_insn on only 4-byte aligned DI-mode memory operands.
>
> I have a patch for this, which is able to fix the libgcc build on a cross, but have no
> possibility to bootstrap the affected target.
>
> Could you please help?

Well it's good that the sanitisation is catching the bugs!

Bootstrapping this patch I get another assert with the backtrace:

$BUILD/arm-none-linux-gnueabihf/libstdc++-v3/include/ext/concurrence.h: 
In function '(static initializers for 
$SRC/libstdc++-v3/libsupc++/eh_alloc.cc)':
$BUILD/arm-none-linux-gnueabihf/libstdc++-v3/include/ext/concurrence.h:129:5: 
internal compiler error: in gen_movv8qi, at config/arm/vec-common.md:29
   129 |     {
       |     ^
0x14155cb gen_movv8qi(rtx_def*, rtx_def*)
         $SRC/gcc/config/arm/vec-common.md:29
0x96bb89 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
         $SRC/gcc/recog.h:318
0x94bc95 emit_move_insn_1(rtx_def*, rtx_def*)
         $SRC/gcc/expr.c:3694
0x94c05b emit_move_insn(rtx_def*, rtx_def*)
         $SRC/gcc/expr.c:3790
0x10d5ee5 arm_block_set_aligned_vect
         $SRC/gcc/config/arm/arm.c:30204
0x10d6b37 arm_block_set_vect
         $SRC/gcc/config/arm/arm.c:30428
0x10d6caf arm_gen_setmem(rtx_def**)
         $SRC/gcc/config/arm/arm.c:30458
0x140d7ed gen_setmemsi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
         $SRC/gcc/config/arm/arm.md:6687
0xbf0e87 insn_gen_fn::operator()(rtx_def*, rtx_def*, rtx_def*, rtx_def*) 
const
         $SRC/gcc/recog.h:320
0xbf0999 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
         $SRC/gcc/optabs.c:7409
0xbf0b87 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
         $SRC/gcc/optabs.c:7440
0x94a709 set_storage_via_setmem(rtx_def*, rtx_def*, rtx_def*, unsigned 
int, unsigned int, long long, unsigned long long, unsigned long long, 
unsigned long long)
         $SRC/gcc/expr.c:3168
0x94a059 clear_storage_hints(rtx_def*, rtx_def*, block_op_methods, 
unsigned int, long long, unsigned long long, unsigned long long, 
unsigned long long)
         $SRC/gcc/expr.c:3037
0x94a137 clear_storage(rtx_def*, rtx_def*, block_op_methods)
         $SRC/gcc/expr.c:3058
0x9537c5 store_constructor
         $SRC/gcc/expr.c:6333
0x957227 store_field
         $SRC/gcc/expr.c:7145
0x94fde1 expand_assignment(tree_node*, tree_node*, bool)
         $SRC/gcc/expr.c:5301
0x815e25 expand_gimple_stmt_1
         $SRC/gcc/cfgexpand.c:3777
0x81611d expand_gimple_stmt
         $SRC/gcc/cfgexpand.c:3875
0x81cd61 expand_gimple_basic_block
         $SRC/gcc/cfgexpand.c:5915

Looks to me like arm_gen_setmem needs similar fixes to gen_cpymem_ldrd_strd?

Thanks,

Kyrill


>
>
> Thanks
> Bernd.


* Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-29  9:09                                         ` Kyrill Tkachov
@ 2019-08-29 10:00                                           ` Christophe Lyon
  2019-08-29 22:57                                             ` Bernd Edlinger
  0 siblings, 1 reply; 50+ messages in thread
From: Christophe Lyon @ 2019-08-29 10:00 UTC (permalink / raw)
  To: Kyrill Tkachov
  Cc: Bernd Edlinger, Richard Biener, gcc-patches, Richard Earnshaw,
	Ramana Radhakrishnan, Eric Botcazou, Jeff Law, Jakub Jelinek

On Thu, 29 Aug 2019 at 10:58, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> Hi Bernd,
>
> On 8/28/19 10:36 PM, Bernd Edlinger wrote:
> > On 8/28/19 2:07 PM, Christophe Lyon wrote:
> >> Hi,
> >>
> >> This patch causes an ICE when building libgcc's unwind-arm.o
> >> when configuring GCC:
> >> --target  arm-none-linux-gnueabihf --with-mode thumb --with-cpu
> >> cortex-a15 --with-fpu neon-vfpv4:
> >>
> >> The build works for the same target, but --with-mode arm --with-cpu
> >> cortex a9 --with-fpu vfp
> >>
> >> In file included from
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/config/arm/unwind-arm.c:144:
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
> >> In function 'get_eit_entry':
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:245:29:
> >> warning: cast discards 'const' qualifier from pointer target type
> >> [-Wcast-qual]
> >>    245 |       ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *)&eitp->content;
> >>        |                             ^
> >> during RTL pass: expand
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
> >> In function 'unwind_phase2_forced':
> >> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:319:18:
> >> internal compiler error: in gen_movdi, at config/arm/arm.md:5235
> >>    319 |   saved_vrs.core = entry_vrs->core;
> >>        |   ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
> >> 0x126530f gen_movdi(rtx_def*, rtx_def*)
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:5235
> >> 0x896d92 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/recog.h:318
> >> 0x896d92 emit_move_insn_1(rtx_def*, rtx_def*)
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3694
> >> 0x897083 emit_move_insn(rtx_def*, rtx_def*)
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3790
> >> 0xfc25d6 gen_cpymem_ldrd_strd(rtx_def**)
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:14582
> >> 0x126a1f1 gen_cpymemqi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:6688
> >> 0xb0bc08 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/optabs.c:7440
> >> 0x89ba1e emit_block_move_via_cpymem
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1808
> >> 0x89ba1e emit_block_move_hints(rtx_def*, rtx_def*, rtx_def*,
> >> block_op_methods, unsigned int, long, unsigned long, unsigned long,
> >> unsigned long, bool, bool*)
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1627
> >> 0x89c383 emit_block_move(rtx_def*, rtx_def*, rtx_def*, block_op_methods)
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1667
> >> 0x89fb4e store_expr(tree_node*, rtx_def*, int, bool, bool)
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5845
> >> 0x88c1f9 store_field
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:7149
> >> 0x8a0c22 expand_assignment(tree_node*, tree_node*, bool)
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5304
> >> 0x761964 expand_gimple_stmt_1
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3779
> >> 0x761964 expand_gimple_stmt
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3875
> >> 0x768583 expand_gimple_basic_block
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5915
> >> 0x76abc6 execute
> >>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6538
> >>
> >> Christophe
> >>
> > Okay, sorry for the breakage.
> >
> > What is happening in gen_cpymem_ldrd_strd is of course against the rules:
> >
> > It uses emit_move_insn on only 4-byte aligned DI-mode memory operands.
> >
> > I have a patch for this, which is able to fix the libgcc build on a cross, but have no
> > possibility to bootstrap the affected target.
> >
> > Could you please help?
>
> Well it's good that the sanitisation is catching the bugs!
>
> Bootstrapping this patch I get another assert with the backtrace:

Thanks for the additional testing, Kyrill!

FWIW, my original report was with a failure to just build GCC for
cortex-a15. I later got the reports of testing cross-toolchains, and
saw other problems on cortex-a9 for instance.
But I guess, you have noticed them with your bootstrap?
on arm-linux-gnueabi
gcc.target/arm/aapcs/align4.c (internal compiler error)
gcc.target/arm/aapcs/align_rec4.c (internal compiler error)

(with -march=armv5t: gcc.dg/pr83930.c (internal compiler error))

on arm-linux-gnueabihf, in addition to align4/align_rec4:
--with-cpu cortex-a9
--with-fpu neon-fp16
    gcc.c-torture/execute/pr37573.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
compiler error)
    gcc.c-torture/execute/pr37573.c   -O3 -g  (internal compiler error)
    gcc.dg/vect/fast-math-pr35982.c (internal compiler error)
    gcc.dg/vect/pr55857-1.c (internal compiler error)
    gcc.dg/vect/pr55857-1.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/pr55857-2.c (internal compiler error)
    gcc.dg/vect/pr55857-2.c -flto -ffat-lto-objects (internal compiler error)
    gcc.dg/vect/pr57558-2.c (internal compiler error)
    gcc.dg/vect/pr57558-2.c -flto -ffat-lto-objects (internal compiler error)

and even more with other configs
(http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/274986/report-build-info.html
may help)

Christophe

>
> $BUILD/arm-none-linux-gnueabihf/libstdc++-v3/include/ext/concurrence.h:
> In function '(static initializers for
> $SRC/libstdc++-v3/libsupc++/eh_alloc.cc)':
> $BUILD/arm-none-linux-gnueabihf/libstdc++-v3/include/ext/concurrence.h:129:5:
> internal compiler error: in gen_movv8qi, at config/arm/vec-common.md:29
>    129 |     {
>        |     ^
> 0x14155cb gen_movv8qi(rtx_def*, rtx_def*)
>          $SRC/gcc/config/arm/vec-common.md:29
> 0x96bb89 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>          $SRC/gcc/recog.h:318
> 0x94bc95 emit_move_insn_1(rtx_def*, rtx_def*)
>          $SRC/gcc/expr.c:3694
> 0x94c05b emit_move_insn(rtx_def*, rtx_def*)
>          $SRC/gcc/expr.c:3790
> 0x10d5ee5 arm_block_set_aligned_vect
>          $SRC/gcc/config/arm/arm.c:30204
> 0x10d6b37 arm_block_set_vect
>          $SRC/gcc/config/arm/arm.c:30428
> 0x10d6caf arm_gen_setmem(rtx_def**)
>          $SRC/gcc/config/arm/arm.c:30458
> 0x140d7ed gen_setmemsi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
>          $SRC/gcc/config/arm/arm.md:6687
> 0xbf0e87 insn_gen_fn::operator()(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
> const
>          $SRC/gcc/recog.h:320
> 0xbf0999 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
>          $SRC/gcc/optabs.c:7409
> 0xbf0b87 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
>          $SRC/gcc/optabs.c:7440
> 0x94a709 set_storage_via_setmem(rtx_def*, rtx_def*, rtx_def*, unsigned
> int, unsigned int, long long, unsigned long long, unsigned long long,
> unsigned long long)
>          $SRC/gcc/expr.c:3168
> 0x94a059 clear_storage_hints(rtx_def*, rtx_def*, block_op_methods,
> unsigned int, long long, unsigned long long, unsigned long long,
> unsigned long long)
>          $SRC/gcc/expr.c:3037
> 0x94a137 clear_storage(rtx_def*, rtx_def*, block_op_methods)
>          $SRC/gcc/expr.c:3058
> 0x9537c5 store_constructor
>          $SRC/gcc/expr.c:6333
> 0x957227 store_field
>          $SRC/gcc/expr.c:7145
> 0x94fde1 expand_assignment(tree_node*, tree_node*, bool)
>          $SRC/gcc/expr.c:5301
> 0x815e25 expand_gimple_stmt_1
>          $SRC/gcc/cfgexpand.c:3777
> 0x81611d expand_gimple_stmt
>          $SRC/gcc/cfgexpand.c:3875
> 0x81cd61 expand_gimple_basic_block
>          $SRC/gcc/cfgexpand.c:5915
>
> Looks to me like arm_gen_setmem needs similar fixes to gen_cpymem_ldrd_strd?
>
> Thanks,
>
> Kyrill
>
>
> >
> >
> > Thanks
> > Bernd.


* Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-29 10:00                                           ` Christophe Lyon
@ 2019-08-29 22:57                                             ` Bernd Edlinger
  2019-08-30 10:07                                               ` Kyrill Tkachov
  2019-08-30 15:22                                               ` Christophe Lyon
  0 siblings, 2 replies; 50+ messages in thread
From: Bernd Edlinger @ 2019-08-29 22:57 UTC (permalink / raw)
  To: Christophe Lyon, Kyrill Tkachov
  Cc: Richard Biener, gcc-patches, Richard Earnshaw,
	Ramana Radhakrishnan, Eric Botcazou, Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 11010 bytes --]

On 8/29/19 11:08 AM, Christophe Lyon wrote:
> On Thu, 29 Aug 2019 at 10:58, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>>
>> Hi Bernd,
>>
>> On 8/28/19 10:36 PM, Bernd Edlinger wrote:
>>> On 8/28/19 2:07 PM, Christophe Lyon wrote:
>>>> Hi,
>>>>
>>>> This patch causes an ICE when building libgcc's unwind-arm.o
>>>> when configuring GCC:
>>>> --target  arm-none-linux-gnueabihf --with-mode thumb --with-cpu
>>>> cortex-a15 --with-fpu neon-vfpv4:
>>>>
>>>> The build works for the same target, but --with-mode arm --with-cpu
>>>> cortex a9 --with-fpu vfp
>>>>
>>>> In file included from
>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/config/arm/unwind-arm.c:144:
>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
>>>> In function 'get_eit_entry':
>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:245:29:
>>>> warning: cast discards 'const' qualifier from pointer target type
>>>> [-Wcast-qual]
>>>>    245 |       ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *)&eitp->content;
>>>>        |                             ^
>>>> during RTL pass: expand
>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
>>>> In function 'unwind_phase2_forced':
>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:319:18:
>>>> internal compiler error: in gen_movdi, at config/arm/arm.md:5235
>>>>    319 |   saved_vrs.core = entry_vrs->core;
>>>>        |   ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
>>>> 0x126530f gen_movdi(rtx_def*, rtx_def*)
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:5235
>>>> 0x896d92 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/recog.h:318
>>>> 0x896d92 emit_move_insn_1(rtx_def*, rtx_def*)
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3694
>>>> 0x897083 emit_move_insn(rtx_def*, rtx_def*)
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3790
>>>> 0xfc25d6 gen_cpymem_ldrd_strd(rtx_def**)
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:14582
>>>> 0x126a1f1 gen_cpymemqi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:6688
>>>> 0xb0bc08 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/optabs.c:7440
>>>> 0x89ba1e emit_block_move_via_cpymem
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1808
>>>> 0x89ba1e emit_block_move_hints(rtx_def*, rtx_def*, rtx_def*,
>>>> block_op_methods, unsigned int, long, unsigned long, unsigned long,
>>>> unsigned long, bool, bool*)
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1627
>>>> 0x89c383 emit_block_move(rtx_def*, rtx_def*, rtx_def*, block_op_methods)
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1667
>>>> 0x89fb4e store_expr(tree_node*, rtx_def*, int, bool, bool)
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5845
>>>> 0x88c1f9 store_field
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:7149
>>>> 0x8a0c22 expand_assignment(tree_node*, tree_node*, bool)
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5304
>>>> 0x761964 expand_gimple_stmt_1
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3779
>>>> 0x761964 expand_gimple_stmt
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3875
>>>> 0x768583 expand_gimple_basic_block
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5915
>>>> 0x76abc6 execute
>>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6538
>>>>
>>>> Christophe
>>>>
>>> Okay, sorry for the breakage.
>>>
>>> What is happening in gen_cpymem_ldrd_strd is of course against the rules:
>>>
>>> It uses emit_move_insn on only 4-byte aligned DI-mode memory operands.
>>>
>>> I have a patch for this, which is able to fix the libgcc build on a cross, but have no
>>> possibility to bootstrap the affected target.
>>>
>>> Could you please help?
>>
>> Well it's good that the sanitisation is catching the bugs!
>>

Yes, more than expected, though ;)

>> Bootstrapping this patch I get another assert with the backtrace:
> 
> Thanks for the additional testing, Kyrill!
> 
> FWIW, my original report was with a failure to just build GCC for
> cortex-a15. I later got the reports of testing cross-toolchains, and
> saw other problems on cortex-a9 for instance.
> But I guess, you have noticed them with your bootstrap?
> on arm-linux-gnueabi
> gcc.target/arm/aapcs/align4.c (internal compiler error)
> gcc.target/arm/aapcs/align_rec4.c (internal compiler error)
> 

This appears to be an as yet unknown middle-end bug (not fixed by the current patch):

$ arm-linux-gnueabihf-gcc align4.c 
during RTL pass: expand
In file included from align4.c:22:
align4.c: In function 'testfunc':
abitest.h:73:42: internal compiler error: in gen_movv2si, at config/arm/vec-common.md:30
   73 | #define LAST_ARG(type,val,offset) { type __x = val; if (memcmp(&__x, stack+offset, sizeof(type)) != 0) abort(); }
      |                                          ^~~
abitest.h:74:30: note: in expansion of macro 'LAST_ARG'
   74 | #define ARG(type,val,offset) LAST_ARG(type, val, offset)
      |                              ^~~~~~~~
align4.c:26:3: note: in expansion of macro 'ARG'
   26 |   ARG (unalignedvec, a, R2)
      |   ^~~
0x7bb33c gen_movv2si(rtx_def*, rtx_def*)
	../../gcc-trunk/gcc/config/arm/vec-common.md:30
0xa4a807 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
	../../gcc-trunk/gcc/recog.h:318
0xa4a807 emit_move_insn_1(rtx_def*, rtx_def*)
	../../gcc-trunk/gcc/expr.c:3694
0xa4ab94 emit_move_insn(rtx_def*, rtx_def*)
	../../gcc-trunk/gcc/expr.c:3790
0xa522bf store_expr(tree_node*, rtx_def*, int, bool, bool)
	../../gcc-trunk/gcc/expr.c:5855
0xa52bfd expand_assignment(tree_node*, tree_node*, bool)
	../../gcc-trunk/gcc/expr.c:5441
0xa52bfd expand_assignment(tree_node*, tree_node*, bool)
	../../gcc-trunk/gcc/expr.c:4982
0x934adf expand_gimple_stmt_1
	../../gcc-trunk/gcc/cfgexpand.c:3777
0x934adf expand_gimple_stmt
	../../gcc-trunk/gcc/cfgexpand.c:3875
0x93a451 expand_gimple_basic_block
	../../gcc-trunk/gcc/cfgexpand.c:5915
0x93c1b6 execute
	../../gcc-trunk/gcc/cfgexpand.c:6538
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


> (with -march=armv5t: gcc.dg/pr83930.c (internal compiler error))
> 

Possibly fixed by the latest patch.

> on arm-linux-gnueabihf, in addition to align4/align_rec4:
> --with-cpu cortex-a9
> --with-fpu neon-fp16
>     gcc.c-torture/execute/pr37573.c   -O3 -fomit-frame-pointer
> -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
> compiler error)
>     gcc.c-torture/execute/pr37573.c   -O3 -g  (internal compiler error)
>     gcc.dg/vect/fast-math-pr35982.c (internal compiler error)
>     gcc.dg/vect/pr55857-1.c (internal compiler error)
>     gcc.dg/vect/pr55857-1.c -flto -ffat-lto-objects (internal compiler error)
>     gcc.dg/vect/pr55857-2.c (internal compiler error)
>     gcc.dg/vect/pr55857-2.c -flto -ffat-lto-objects (internal compiler error)
>     gcc.dg/vect/pr57558-2.c (internal compiler error)
>     gcc.dg/vect/pr57558-2.c -flto -ffat-lto-objects (internal compiler error)
> 
> and even more with other configs
> (http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/274986/report-build-info.html
> may help)
> 
> Christophe
> 
>>
>> $BUILD/arm-none-linux-gnueabihf/libstdc++-v3/include/ext/concurrence.h:
>> In function '(static initializers for
>> $SRC/libstdc++-v3/libsupc++/eh_alloc.cc)':
>> $BUILD/arm-none-linux-gnueabihf/libstdc++-v3/include/ext/concurrence.h:129:5:
>> internal compiler error: in gen_movv8qi, at config/arm/vec-common.md:29
>>    129 |     {
>>        |     ^
>> 0x14155cb gen_movv8qi(rtx_def*, rtx_def*)
>>          $SRC/gcc/config/arm/vec-common.md:29
>> 0x96bb89 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>          $SRC/gcc/recog.h:318
>> 0x94bc95 emit_move_insn_1(rtx_def*, rtx_def*)
>>          $SRC/gcc/expr.c:3694
>> 0x94c05b emit_move_insn(rtx_def*, rtx_def*)
>>          $SRC/gcc/expr.c:3790
>> 0x10d5ee5 arm_block_set_aligned_vect
>>          $SRC/gcc/config/arm/arm.c:30204
>> 0x10d6b37 arm_block_set_vect
>>          $SRC/gcc/config/arm/arm.c:30428
>> 0x10d6caf arm_gen_setmem(rtx_def**)
>>          $SRC/gcc/config/arm/arm.c:30458
>> 0x140d7ed gen_setmemsi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
>>          $SRC/gcc/config/arm/arm.md:6687
>> 0xbf0e87 insn_gen_fn::operator()(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
>> const
>>          $SRC/gcc/recog.h:320
>> 0xbf0999 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
>>          $SRC/gcc/optabs.c:7409
>> 0xbf0b87 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
>>          $SRC/gcc/optabs.c:7440
>> 0x94a709 set_storage_via_setmem(rtx_def*, rtx_def*, rtx_def*, unsigned
>> int, unsigned int, long long, unsigned long long, unsigned long long,
>> unsigned long long)
>>          $SRC/gcc/expr.c:3168
>> 0x94a059 clear_storage_hints(rtx_def*, rtx_def*, block_op_methods,
>> unsigned int, long long, unsigned long long, unsigned long long,
>> unsigned long long)
>>          $SRC/gcc/expr.c:3037
>> 0x94a137 clear_storage(rtx_def*, rtx_def*, block_op_methods)
>>          $SRC/gcc/expr.c:3058
>> 0x9537c5 store_constructor
>>          $SRC/gcc/expr.c:6333
>> 0x957227 store_field
>>          $SRC/gcc/expr.c:7145
>> 0x94fde1 expand_assignment(tree_node*, tree_node*, bool)
>>          $SRC/gcc/expr.c:5301
>> 0x815e25 expand_gimple_stmt_1
>>          $SRC/gcc/cfgexpand.c:3777
>> 0x81611d expand_gimple_stmt
>>          $SRC/gcc/cfgexpand.c:3875
>> 0x81cd61 expand_gimple_basic_block
>>          $SRC/gcc/cfgexpand.c:5915
>>
>> Looks to me like arm_gen_setmem needs similar fixes to gen_cpymem_ldrd_strd?
>>

Yes, indeed, see attached patch.
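
To illustrate the idea, here is a minimal stand-alone sketch of the
three-way choice the patch makes in gen_cpymem_ldrd_strd. It is not GCC
code: pick_access, the enum names and BITS_PER_WORD_SKETCH are hypothetical
stand-ins, and it assumes that src_aligned/dst_aligned mean "at least
word (4-byte) aligned", as the description above suggests.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's BITS_PER_WORD on 32-bit ARM.  */
enum { BITS_PER_WORD_SKETCH = 32 };

/* The three ways a DI-mode (8-byte) access can be emitted.  */
enum access_kind
{
  ACCESS_MOVDI,            /* emit_move_insn: needs 8-byte alignment.  */
  ACCESS_UNALIGNED_DI,     /* unaligned_loaddi / unaligned_storedi.  */
  ACCESS_TWO_UNALIGNED_SI  /* two unaligned_loadsi / unaligned_storesi.  */
};

/* Pick the access kind from the known alignment of the memory operand,
   given in bits, like MEM_ALIGN.  */
enum access_kind
pick_access (unsigned int mem_align_bits)
{
  /* Alignment is a power of two and at least one byte.  */
  assert (mem_align_bits >= 8
	  && (mem_align_bits & (mem_align_bits - 1)) == 0);

  if (mem_align_bits >= 2 * BITS_PER_WORD_SKETCH)
    return ACCESS_MOVDI;
  if (mem_align_bits >= BITS_PER_WORD_SKETCH)
    return ACCESS_UNALIGNED_DI;
  return ACCESS_TWO_UNALIGNED_SI;
}
```

For example, pick_access (32) selects the new unaligned DI pattern, while
pick_access (64) keeps the plain move. The point is that the plain DI-mode
move is only used when the operand is known to be 8-byte aligned.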

This seems to fix the bootstrap, but at least one other error remains.
I think the remaining errors should not break the bootstrap, though, and
can be fixed with follow-up patches.

Christophe, could you please track the remaining regressions? That would
be very helpful.

Attached is an updated version of the patch, which should fix the bootstrap issues.
Is it OK for trunk?



Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch-cpymem-fix.diff --]
[-- Type: text/x-patch; name="patch-cpymem-fix.diff", Size: 4401 bytes --]

2019-08-29  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* config/arm/arm.md (unaligned_loaddi,
	unaligned_storedi): New unspec insn patterns.
	* config/arm/neon.md (unaligned_storev8qi): Likewise.
	* config/arm/arm.c (gen_cpymem_ldrd_strd): Use unaligned_loaddi
	and unaligned_storedi for 4-byte aligned memory.
	(arm_block_set_aligned_vect): Use unaligned_storev8qi for
	4-byte aligned memory.

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 274987)
+++ gcc/config/arm/arm.c	(working copy)
@@ -14578,8 +14578,10 @@ gen_cpymem_ldrd_strd (rtx *operands)
 	  low_reg = gen_lowpart (SImode, reg0);
 	  hi_reg = gen_highpart_mode (SImode, DImode, reg0);
 	}
-      if (src_aligned)
-        emit_move_insn (reg0, src);
+      if (MEM_ALIGN (src) >= 2 * BITS_PER_WORD)
+	emit_move_insn (reg0, src);
+      else if (src_aligned)
+	emit_insn (gen_unaligned_loaddi (reg0, src));
       else
 	{
 	  emit_insn (gen_unaligned_loadsi (low_reg, src));
@@ -14587,8 +14589,10 @@ gen_cpymem_ldrd_strd (rtx *operands)
 	  emit_insn (gen_unaligned_loadsi (hi_reg, src));
 	}
 
-      if (dst_aligned)
-        emit_move_insn (dst, reg0);
+      if (MEM_ALIGN (dst) >= 2 * BITS_PER_WORD)
+	emit_move_insn (dst, reg0);
+      else if (dst_aligned)
+	emit_insn (gen_unaligned_storedi (dst, reg0));
       else
 	{
 	  emit_insn (gen_unaligned_storesi (dst, low_reg));
@@ -30197,7 +30201,10 @@ arm_block_set_aligned_vect (rtx dstbase,
     {
       addr = plus_constant (Pmode, dst, i);
       mem = adjust_automodify_address (dstbase, mode, addr, offset + i);
-      emit_move_insn (mem, reg);
+      if (MEM_ALIGN (mem) >= 2 * BITS_PER_WORD)
+	emit_move_insn (mem, reg);
+      else
+	emit_insn (gen_unaligned_storev8qi (mem, reg));
     }
 
   /* Handle single word leftover by shifting 4 bytes back.  We can
@@ -30211,7 +30218,7 @@ arm_block_set_aligned_vect (rtx dstbase,
       if (align > UNITS_PER_WORD)
 	set_mem_align (mem, BITS_PER_UNIT * UNITS_PER_WORD);
 
-      emit_move_insn (mem, reg);
+      emit_insn (gen_unaligned_storev8qi (mem, reg));
     }
   /* Handle (0, 4), (4, 8) bytes leftover by shifting bytes back.
      We have to use unaligned access for this case.  */
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 274987)
+++ gcc/config/arm/arm.md	(working copy)
@@ -3963,6 +3963,17 @@
 
 ; ARMv6+ unaligned load/store instructions (used for packed structure accesses).
 
+(define_insn "unaligned_loaddi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r")
+	(unspec:DI [(match_operand:DI 1 "memory_operand" "m")]
+		   UNSPEC_UNALIGNED_LOAD))]
+  "TARGET_32BIT && TARGET_LDRD"
+  "*
+  return output_move_double (operands, true, NULL);
+  "
+  [(set_attr "length" "8")
+   (set_attr "type" "load_8")])
+
 (define_insn "unaligned_loadsi"
   [(set (match_operand:SI 0 "s_register_operand" "=l,l,r")
 	(unspec:SI [(match_operand:SI 1 "memory_operand" "m,Uw,m")]
@@ -4008,6 +4019,17 @@
    (set_attr "predicable_short_it" "no,yes,no")
    (set_attr "type" "load_byte")])
 
+(define_insn "unaligned_storedi"
+  [(set (match_operand:DI 0 "memory_operand" "=m")
+	(unspec:DI [(match_operand:DI 1 "s_register_operand" "r")]
+		   UNSPEC_UNALIGNED_STORE))]
+  "TARGET_32BIT && TARGET_LDRD"
+  "*
+  return output_move_double (operands, true, NULL);
+  "
+  [(set_attr "length" "8")
+   (set_attr "type" "store_8")])
+
 (define_insn "unaligned_storesi"
   [(set (match_operand:SI 0 "memory_operand" "=m,Uw,m")
 	(unspec:SI [(match_operand:SI 1 "s_register_operand" "l,l,r")]
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 274987)
+++ gcc/config/arm/neon.md	(working copy)
@@ -23,6 +23,17 @@
 ;; type attribute definitions.
 (define_attr "vqh_mnem" "vadd,vmin,vmax" (const_string "vadd"))
 
+(define_insn "unaligned_storev8qi"
+  [(set (match_operand:V8QI 0 "memory_operand" "=Un")
+	(unspec:V8QI [(match_operand:V8QI 1 "s_register_operand" "w")]
+		     UNSPEC_UNALIGNED_STORE))]
+  "TARGET_NEON"
+  "*
+  return output_move_neon (operands);
+  "
+  [(set_attr "length" "4")
+   (set_attr "type" "neon_store1_1reg")])
+
 (define_insn "*neon_mov<mode>"
   [(set (match_operand:VDX 0 "nonimmediate_operand"
 	  "=w,Un,w, w, w,  ?r,?w,?r, ?Us,*r")


* Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-29 22:57                                             ` Bernd Edlinger
@ 2019-08-30 10:07                                               ` Kyrill Tkachov
  2019-08-30 15:22                                               ` Christophe Lyon
  1 sibling, 0 replies; 50+ messages in thread
From: Kyrill Tkachov @ 2019-08-30 10:07 UTC (permalink / raw)
  To: Bernd Edlinger, Christophe Lyon
  Cc: Richard Biener, gcc-patches, Richard Earnshaw,
	Ramana Radhakrishnan, Eric Botcazou, Jeff Law, Jakub Jelinek

Hi Bernd,

On 8/29/19 10:26 PM, Bernd Edlinger wrote:
> On 8/29/19 11:08 AM, Christophe Lyon wrote:
>> On Thu, 29 Aug 2019 at 10:58, Kyrill Tkachov
>> <kyrylo.tkachov@foss.arm.com> wrote:
>>> Hi Bernd,
>>>
>>> On 8/28/19 10:36 PM, Bernd Edlinger wrote:
>>>> On 8/28/19 2:07 PM, Christophe Lyon wrote:
>>>>> Hi,
>>>>>
>>>>> This patch causes an ICE when building libgcc's unwind-arm.o
>>>>> when configuring GCC:
>>>>> --target  arm-none-linux-gnueabihf --with-mode thumb --with-cpu
>>>>> cortex-a15 --with-fpu neon-vfpv4:
>>>>>
>>>>> The build works for the same target, but --with-mode arm --with-cpu
>>>>> cortex a9 --with-fpu vfp
>>>>>
>>>>> In file included from
>>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/config/arm/unwind-arm.c:144:
>>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
>>>>> In function 'get_eit_entry':
>>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:245:29:
>>>>> warning: cast discards 'const' qualifier from pointer target type
>>>>> [-Wcast-qual]
>>>>>     245 |       ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *)&eitp->content;
>>>>>         |                             ^
>>>>> during RTL pass: expand
>>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
>>>>> In function 'unwind_phase2_forced':
>>>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:319:18:
>>>>> internal compiler error: in gen_movdi, at config/arm/arm.md:5235
>>>>>     319 |   saved_vrs.core = entry_vrs->core;
>>>>>         |   ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
>>>>> 0x126530f gen_movdi(rtx_def*, rtx_def*)
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:5235
>>>>> 0x896d92 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/recog.h:318
>>>>> 0x896d92 emit_move_insn_1(rtx_def*, rtx_def*)
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3694
>>>>> 0x897083 emit_move_insn(rtx_def*, rtx_def*)
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3790
>>>>> 0xfc25d6 gen_cpymem_ldrd_strd(rtx_def**)
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:14582
>>>>> 0x126a1f1 gen_cpymemqi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:6688
>>>>> 0xb0bc08 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/optabs.c:7440
>>>>> 0x89ba1e emit_block_move_via_cpymem
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1808
>>>>> 0x89ba1e emit_block_move_hints(rtx_def*, rtx_def*, rtx_def*,
>>>>> block_op_methods, unsigned int, long, unsigned long, unsigned long,
>>>>> unsigned long, bool, bool*)
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1627
>>>>> 0x89c383 emit_block_move(rtx_def*, rtx_def*, rtx_def*, block_op_methods)
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1667
>>>>> 0x89fb4e store_expr(tree_node*, rtx_def*, int, bool, bool)
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5845
>>>>> 0x88c1f9 store_field
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:7149
>>>>> 0x8a0c22 expand_assignment(tree_node*, tree_node*, bool)
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5304
>>>>> 0x761964 expand_gimple_stmt_1
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3779
>>>>> 0x761964 expand_gimple_stmt
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3875
>>>>> 0x768583 expand_gimple_basic_block
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5915
>>>>> 0x76abc6 execute
>>>>>           /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6538
>>>>>
>>>>> Christophe
>>>>>
>>>> Okay, sorry for the breakage.
>>>>
>>>> What is happening in gen_cpymem_ldrd_strd is of course against the rules:
>>>>
>>>> It uses emit_move_insn on only 4-byte aligned DI-mode memory operands.
>>>>
>>>> I have a patch for this, which is able to fix the libgcc build on a cross, but have no
>>>> possibility to bootstrap the affected target.
>>>>
>>>> Could you please help?
>>> Well it's good that the sanitisation is catching the bugs!
>>>
> Yes, more than expected, though ;)
>
>>> Bootstrapping this patch I get another assert with the backtrace:
>> Thanks for the additional testing, Kyrill!
>>
>> FWIW, my original report was with a failure to just build GCC for
>> cortex-a15. I later got the reports of testing cross-toolchains, and
>> saw other problems on cortex-a9 for instance.
>> But I guess, you have noticed them with your bootstrap?
>> on arm-linux-gnueabi
>> gcc.target/arm/aapcs/align4.c (internal compiler error)
>> gcc.target/arm/aapcs/align_rec4.c (internal compiler error)
>>
> This appears to be yet unknown middle-end bug (not fixed by current patch)
>
> $ arm-linux-gnueabihf-gcc align4.c
> during RTL pass: expand
> In file included from align4.c:22:
> align4.c: In function 'testfunc':
> abitest.h:73:42: internal compiler error: in gen_movv2si, at config/arm/vec-common.md:30
>     73 | #define LAST_ARG(type,val,offset) { type __x = val; if (memcmp(&__x, stack+offset, sizeof(type)) != 0) abort(); }
>        |                                          ^~~
> abitest.h:74:30: note: in expansion of macro 'LAST_ARG'
>     74 | #define ARG(type,val,offset) LAST_ARG(type, val, offset)
>        |                              ^~~~~~~~
> align4.c:26:3: note: in expansion of macro 'ARG'
>     26 |   ARG (unalignedvec, a, R2)
>        |   ^~~
> 0x7bb33c gen_movv2si(rtx_def*, rtx_def*)
> 	../../gcc-trunk/gcc/config/arm/vec-common.md:30
> 0xa4a807 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
> 	../../gcc-trunk/gcc/recog.h:318
> 0xa4a807 emit_move_insn_1(rtx_def*, rtx_def*)
> 	../../gcc-trunk/gcc/expr.c:3694
> 0xa4ab94 emit_move_insn(rtx_def*, rtx_def*)
> 	../../gcc-trunk/gcc/expr.c:3790
> 0xa522bf store_expr(tree_node*, rtx_def*, int, bool, bool)
> 	../../gcc-trunk/gcc/expr.c:5855
> 0xa52bfd expand_assignment(tree_node*, tree_node*, bool)
> 	../../gcc-trunk/gcc/expr.c:5441
> 0xa52bfd expand_assignment(tree_node*, tree_node*, bool)
> 	../../gcc-trunk/gcc/expr.c:4982
> 0x934adf expand_gimple_stmt_1
> 	../../gcc-trunk/gcc/cfgexpand.c:3777
> 0x934adf expand_gimple_stmt
> 	../../gcc-trunk/gcc/cfgexpand.c:3875
> 0x93a451 expand_gimple_basic_block
> 	../../gcc-trunk/gcc/cfgexpand.c:5915
> 0x93c1b6 execute
> 	../../gcc-trunk/gcc/cfgexpand.c:6538
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
>
>
>> (with -march=armv5t: gcc.dg/pr83930.c (internal compiler error))
>>
> possibly fixed by latest patch.
>
>> on arm-linux-gnueabihf, in addition to align4/align_rec4:
>> --with-cpu cortex-a9
>> --with-fpu neon-fp16
>>      gcc.c-torture/execute/pr37573.c   -O3 -fomit-frame-pointer
>> -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
>> compiler error)
>>      gcc.c-torture/execute/pr37573.c   -O3 -g  (internal compiler error)
>>      gcc.dg/vect/fast-math-pr35982.c (internal compiler error)
>>      gcc.dg/vect/pr55857-1.c (internal compiler error)
>>      gcc.dg/vect/pr55857-1.c -flto -ffat-lto-objects (internal compiler error)
>>      gcc.dg/vect/pr55857-2.c (internal compiler error)
>>      gcc.dg/vect/pr55857-2.c -flto -ffat-lto-objects (internal compiler error)
>>      gcc.dg/vect/pr57558-2.c (internal compiler error)
>>      gcc.dg/vect/pr57558-2.c -flto -ffat-lto-objects (internal compiler error)
>>
>> and even more with other configs
>> (http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/274986/report-build-info.html
>> may help)
>>
>> Christophe
>>
>>> $BUILD/arm-none-linux-gnueabihf/libstdc++-v3/include/ext/concurrence.h:
>>> In function '(static initializers for
>>> $SRC/libstdc++-v3/libsupc++/eh_alloc.cc)':
>>> $BUILD/arm-none-linux-gnueabihf/libstdc++-v3/include/ext/concurrence.h:129:5:
>>> internal compiler error: in gen_movv8qi, at config/arm/vec-common.md:29
>>>     129 |     {
>>>         |     ^
>>> 0x14155cb gen_movv8qi(rtx_def*, rtx_def*)
>>>           $SRC/gcc/config/arm/vec-common.md:29
>>> 0x96bb89 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>>>           $SRC/gcc/recog.h:318
>>> 0x94bc95 emit_move_insn_1(rtx_def*, rtx_def*)
>>>           $SRC/gcc/expr.c:3694
>>> 0x94c05b emit_move_insn(rtx_def*, rtx_def*)
>>>           $SRC/gcc/expr.c:3790
>>> 0x10d5ee5 arm_block_set_aligned_vect
>>>           $SRC/gcc/config/arm/arm.c:30204
>>> 0x10d6b37 arm_block_set_vect
>>>           $SRC/gcc/config/arm/arm.c:30428
>>> 0x10d6caf arm_gen_setmem(rtx_def**)
>>>           $SRC/gcc/config/arm/arm.c:30458
>>> 0x140d7ed gen_setmemsi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
>>>           $SRC/gcc/config/arm/arm.md:6687
>>> 0xbf0e87 insn_gen_fn::operator()(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
>>> const
>>>           $SRC/gcc/recog.h:320
>>> 0xbf0999 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
>>>           $SRC/gcc/optabs.c:7409
>>> 0xbf0b87 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
>>>           $SRC/gcc/optabs.c:7440
>>> 0x94a709 set_storage_via_setmem(rtx_def*, rtx_def*, rtx_def*, unsigned
>>> int, unsigned int, long long, unsigned long long, unsigned long long,
>>> unsigned long long)
>>>           $SRC/gcc/expr.c:3168
>>> 0x94a059 clear_storage_hints(rtx_def*, rtx_def*, block_op_methods,
>>> unsigned int, long long, unsigned long long, unsigned long long,
>>> unsigned long long)
>>>           $SRC/gcc/expr.c:3037
>>> 0x94a137 clear_storage(rtx_def*, rtx_def*, block_op_methods)
>>>           $SRC/gcc/expr.c:3058
>>> 0x9537c5 store_constructor
>>>           $SRC/gcc/expr.c:6333
>>> 0x957227 store_field
>>>           $SRC/gcc/expr.c:7145
>>> 0x94fde1 expand_assignment(tree_node*, tree_node*, bool)
>>>           $SRC/gcc/expr.c:5301
>>> 0x815e25 expand_gimple_stmt_1
>>>           $SRC/gcc/cfgexpand.c:3777
>>> 0x81611d expand_gimple_stmt
>>>           $SRC/gcc/cfgexpand.c:3875
>>> 0x81cd61 expand_gimple_basic_block
>>>           $SRC/gcc/cfgexpand.c:5915
>>>
>>> Looks to me like arm_gen_setmem needs similar fixes to gen_cpymem_ldrd_strd?
>>>
> Yes, indeed, see attached patch.
>
> This seems to fix the bootstrap, but at least one other error remains,
> however I think those do hopefully not break the boot-strap and can be
> fixed with follow-up patches.
>
> Christophe can you please track the remaining regressions, that would be
> very helpful.
>
> Attached is an updated patch version which should un-break the bootstrap issues.
> Is it OK for trunk?
>
Yes, that fixes the bootstrap and testing looks ok, modulo the 
regressions Christophe listed.

Ok with one change...

+(define_insn "unaligned_storev8qi"
+  [(set (match_operand:V8QI 0 "memory_operand" "=Un")
+	(unspec:V8QI [(match_operand:V8QI 1 "s_register_operand" "w")]
+		     UNSPEC_UNALIGNED_STORE))]
+  "TARGET_NEON"
+  "*
+  return output_move_neon (operands);
+  "
+  [(set_attr "length" "4")
+   (set_attr "type" "neon_store1_1reg")])

No need to specify the "length" here as it's 4 by default.

Thanks,

Kyrill

>
> Thanks
> Bernd.


* Re: [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment
  2019-08-29 22:57                                             ` Bernd Edlinger
  2019-08-30 10:07                                               ` Kyrill Tkachov
@ 2019-08-30 15:22                                               ` Christophe Lyon
  1 sibling, 0 replies; 50+ messages in thread
From: Christophe Lyon @ 2019-08-30 15:22 UTC (permalink / raw)
  To: Bernd Edlinger
  Cc: Kyrill Tkachov, Richard Biener, gcc-patches, Richard Earnshaw,
	Ramana Radhakrishnan, Eric Botcazou, Jeff Law, Jakub Jelinek

On Thu, 29 Aug 2019 at 23:26, Bernd Edlinger <bernd.edlinger@hotmail.de> wrote:
>
> On 8/29/19 11:08 AM, Christophe Lyon wrote:
> > On Thu, 29 Aug 2019 at 10:58, Kyrill Tkachov
> > <kyrylo.tkachov@foss.arm.com> wrote:
> >>
> >> Hi Bernd,
> >>
> >> On 8/28/19 10:36 PM, Bernd Edlinger wrote:
> >>> On 8/28/19 2:07 PM, Christophe Lyon wrote:
> >>>> Hi,
> >>>>
> >>>> This patch causes an ICE when building libgcc's unwind-arm.o
> >>>> when configuring GCC:
> >>>> --target  arm-none-linux-gnueabihf --with-mode thumb --with-cpu
> >>>> cortex-a15 --with-fpu neon-vfpv4:
> >>>>
> >>>> The build works for the same target, but --with-mode arm --with-cpu
> >>>> cortex a9 --with-fpu vfp
> >>>>
> >>>> In file included from
> >>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/config/arm/unwind-arm.c:144:
> >>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
> >>>> In function 'get_eit_entry':
> >>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:245:29:
> >>>> warning: cast discards 'const' qualifier from pointer target type
> >>>> [-Wcast-qual]
> >>>>    245 |       ucbp->pr_cache.ehtp = (_Unwind_EHT_Header *)&eitp->content;
> >>>>        |                             ^
> >>>> during RTL pass: expand
> >>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:
> >>>> In function 'unwind_phase2_forced':
> >>>> /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgcc/unwind-arm-common.inc:319:18:
> >>>> internal compiler error: in gen_movdi, at config/arm/arm.md:5235
> >>>>    319 |   saved_vrs.core = entry_vrs->core;
> >>>>        |   ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
> >>>> 0x126530f gen_movdi(rtx_def*, rtx_def*)
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:5235
> >>>> 0x896d92 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/recog.h:318
> >>>> 0x896d92 emit_move_insn_1(rtx_def*, rtx_def*)
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3694
> >>>> 0x897083 emit_move_insn(rtx_def*, rtx_def*)
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:3790
> >>>> 0xfc25d6 gen_cpymem_ldrd_strd(rtx_def**)
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.c:14582
> >>>> 0x126a1f1 gen_cpymemqi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/config/arm/arm.md:6688
> >>>> 0xb0bc08 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/optabs.c:7440
> >>>> 0x89ba1e emit_block_move_via_cpymem
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1808
> >>>> 0x89ba1e emit_block_move_hints(rtx_def*, rtx_def*, rtx_def*,
> >>>> block_op_methods, unsigned int, long, unsigned long, unsigned long,
> >>>> unsigned long, bool, bool*)
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1627
> >>>> 0x89c383 emit_block_move(rtx_def*, rtx_def*, rtx_def*, block_op_methods)
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:1667
> >>>> 0x89fb4e store_expr(tree_node*, rtx_def*, int, bool, bool)
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5845
> >>>> 0x88c1f9 store_field
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:7149
> >>>> 0x8a0c22 expand_assignment(tree_node*, tree_node*, bool)
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/expr.c:5304
> >>>> 0x761964 expand_gimple_stmt_1
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3779
> >>>> 0x761964 expand_gimple_stmt
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:3875
> >>>> 0x768583 expand_gimple_basic_block
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:5915
> >>>> 0x76abc6 execute
> >>>>          /tmp/6852788_4.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/cfgexpand.c:6538
> >>>>
> >>>> Christophe
> >>>>
> >>> Okay, sorry for the breakage.
> >>>
> >>> What is happening in gen_cpymem_ldrd_strd is of course against the rules:
> >>>
> >>> It uses emit_move_insn on only 4-byte aligned DI-mode memory operands.
> >>>
> >>> I have a patch for this, which is able to fix the libgcc build on a cross, but have no
> >>> possibility to bootstrap the affected target.
> >>>
> >>> Could you please help?
> >>
> >> Well it's good that the sanitisation is catching the bugs!
> >>
>
> Yes, more than expected, though ;)
>
> >> Bootstrapping this patch I get another assert with the backtrace:
> >
> > Thanks for the additional testing, Kyrill!
> >
> > FWIW, my original report was with a failure to just build GCC for
> > cortex-a15. I later got the reports of testing cross-toolchains, and
> > saw other problems on cortex-a9 for instance.
> > But I guess, you have noticed them with your bootstrap?
> > on arm-linux-gnueabi
> > gcc.target/arm/aapcs/align4.c (internal compiler error)
> > gcc.target/arm/aapcs/align_rec4.c (internal compiler error)
> >
>
> This appears to be yet unknown middle-end bug (not fixed by current patch)
>
> $ arm-linux-gnueabihf-gcc align4.c
> during RTL pass: expand
> In file included from align4.c:22:
> align4.c: In function 'testfunc':
> abitest.h:73:42: internal compiler error: in gen_movv2si, at config/arm/vec-common.md:30
>    73 | #define LAST_ARG(type,val,offset) { type __x = val; if (memcmp(&__x, stack+offset, sizeof(type)) != 0) abort(); }
>       |                                          ^~~
> abitest.h:74:30: note: in expansion of macro 'LAST_ARG'
>    74 | #define ARG(type,val,offset) LAST_ARG(type, val, offset)
>       |                              ^~~~~~~~
> align4.c:26:3: note: in expansion of macro 'ARG'
>    26 |   ARG (unalignedvec, a, R2)
>       |   ^~~
> 0x7bb33c gen_movv2si(rtx_def*, rtx_def*)
>         ../../gcc-trunk/gcc/config/arm/vec-common.md:30
> 0xa4a807 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
>         ../../gcc-trunk/gcc/recog.h:318
> 0xa4a807 emit_move_insn_1(rtx_def*, rtx_def*)
>         ../../gcc-trunk/gcc/expr.c:3694
> 0xa4ab94 emit_move_insn(rtx_def*, rtx_def*)
>         ../../gcc-trunk/gcc/expr.c:3790
> 0xa522bf store_expr(tree_node*, rtx_def*, int, bool, bool)
>         ../../gcc-trunk/gcc/expr.c:5855
> 0xa52bfd expand_assignment(tree_node*, tree_node*, bool)
>         ../../gcc-trunk/gcc/expr.c:5441
> 0xa52bfd expand_assignment(tree_node*, tree_node*, bool)
>         ../../gcc-trunk/gcc/expr.c:4982
> 0x934adf expand_gimple_stmt_1
>         ../../gcc-trunk/gcc/cfgexpand.c:3777
> 0x934adf expand_gimple_stmt
>         ../../gcc-trunk/gcc/cfgexpand.c:3875
> 0x93a451 expand_gimple_basic_block
>         ../../gcc-trunk/gcc/cfgexpand.c:5915
> 0x93c1b6 execute
>         ../../gcc-trunk/gcc/cfgexpand.c:6538
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See <https://gcc.gnu.org/bugs/> for instructions.
>
>
> > (with -march=armv5t: gcc.dg/pr83930.c (internal compiler error))
> >
>
> possibly fixed by latest patch.
>
> > on arm-linux-gnueabihf, in addition to align4/align_rec4:
> > --with-cpu cortex-a9
> > --with-fpu neon-fp16
> >     gcc.c-torture/execute/pr37573.c   -O3 -fomit-frame-pointer
> > -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
> > compiler error)
> >     gcc.c-torture/execute/pr37573.c   -O3 -g  (internal compiler error)
> >     gcc.dg/vect/fast-math-pr35982.c (internal compiler error)
> >     gcc.dg/vect/pr55857-1.c (internal compiler error)
> >     gcc.dg/vect/pr55857-1.c -flto -ffat-lto-objects (internal compiler error)
> >     gcc.dg/vect/pr55857-2.c (internal compiler error)
> >     gcc.dg/vect/pr55857-2.c -flto -ffat-lto-objects (internal compiler error)
> >     gcc.dg/vect/pr57558-2.c (internal compiler error)
> >     gcc.dg/vect/pr57558-2.c -flto -ffat-lto-objects (internal compiler error)
> >
> > and even more with other configs
> > (http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/274986/report-build-info.html
> > may help)
> >
> > Christophe
> >
> >>
> >> $BUILD/arm-none-linux-gnueabihf/libstdc++-v3/include/ext/concurrence.h:
> >> In function '(static initializers for
> >> $SRC/libstdc++-v3/libsupc++/eh_alloc.cc)':
> >> $BUILD/arm-none-linux-gnueabihf/libstdc++-v3/include/ext/concurrence.h:129:5:
> >> internal compiler error: in gen_movv8qi, at config/arm/vec-common.md:29
> >>    129 |     {
> >>        |     ^
> >> 0x14155cb gen_movv8qi(rtx_def*, rtx_def*)
> >>          $SRC/gcc/config/arm/vec-common.md:29
> >> 0x96bb89 insn_gen_fn::operator()(rtx_def*, rtx_def*) const
> >>          $SRC/gcc/recog.h:318
> >> 0x94bc95 emit_move_insn_1(rtx_def*, rtx_def*)
> >>          $SRC/gcc/expr.c:3694
> >> 0x94c05b emit_move_insn(rtx_def*, rtx_def*)
> >>          $SRC/gcc/expr.c:3790
> >> 0x10d5ee5 arm_block_set_aligned_vect
> >>          $SRC/gcc/config/arm/arm.c:30204
> >> 0x10d6b37 arm_block_set_vect
> >>          $SRC/gcc/config/arm/arm.c:30428
> >> 0x10d6caf arm_gen_setmem(rtx_def**)
> >>          $SRC/gcc/config/arm/arm.c:30458
> >> 0x140d7ed gen_setmemsi(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
> >>          $SRC/gcc/config/arm/arm.md:6687
> >> 0xbf0e87 insn_gen_fn::operator()(rtx_def*, rtx_def*, rtx_def*, rtx_def*)
> >> const
> >>          $SRC/gcc/recog.h:320
> >> 0xbf0999 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
> >>          $SRC/gcc/optabs.c:7409
> >> 0xbf0b87 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
> >>          $SRC/gcc/optabs.c:7440
> >> 0x94a709 set_storage_via_setmem(rtx_def*, rtx_def*, rtx_def*, unsigned
> >> int, unsigned int, long long, unsigned long long, unsigned long long,
> >> unsigned long long)
> >>          $SRC/gcc/expr.c:3168
> >> 0x94a059 clear_storage_hints(rtx_def*, rtx_def*, block_op_methods,
> >> unsigned int, long long, unsigned long long, unsigned long long,
> >> unsigned long long)
> >>          $SRC/gcc/expr.c:3037
> >> 0x94a137 clear_storage(rtx_def*, rtx_def*, block_op_methods)
> >>          $SRC/gcc/expr.c:3058
> >> 0x9537c5 store_constructor
> >>          $SRC/gcc/expr.c:6333
> >> 0x957227 store_field
> >>          $SRC/gcc/expr.c:7145
> >> 0x94fde1 expand_assignment(tree_node*, tree_node*, bool)
> >>          $SRC/gcc/expr.c:5301
> >> 0x815e25 expand_gimple_stmt_1
> >>          $SRC/gcc/cfgexpand.c:3777
> >> 0x81611d expand_gimple_stmt
> >>          $SRC/gcc/cfgexpand.c:3875
> >> 0x81cd61 expand_gimple_basic_block
> >>          $SRC/gcc/cfgexpand.c:5915
> >>
> >> Looks to me like arm_gen_setmem needs similar fixes to gen_cpymem_ldrd_strd?
> >>
>
> Yes, indeed, see attached patch.
>
> This seems to fix the bootstrap, but at least one other error remains,
> however I think those do hopefully not break the boot-strap and can be
> fixed with follow-up patches.
>
> Christophe can you please track the remaining regressions, that would be
> very helpful.
>
> Attached is an updated patch version which should un-break the bootstrap issues.
> Is it OK for trunk?
>

I've run validations comparing r274985 (rev before your 1st patch)
with r274986 and the patch from this thread.
I think you've committed it while the validations were running.

I've filed several PRs for the different ICEs and regressions I've noticed:
91612 91613 91614 91615

HTH

Thanks,

Christophe


>
> Thanks
> Bernd.


* Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-08-15 21:19                             ` [PATCHv5] " Bernd Edlinger
  2019-08-20  5:38                               ` Jeff Law
  2019-08-20 15:04                               ` John David Anglin
@ 2019-09-04 12:53                               ` Richard Earnshaw (lists)
  2019-09-04 13:29                                 ` Bernd Edlinger
  2019-09-06 10:15                                 ` Bernd Edlinger
  2 siblings, 2 replies; 50+ messages in thread
From: Richard Earnshaw (lists) @ 2019-09-04 12:53 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Biener
  Cc: gcc-patches, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou,
	Jeff Law, Jakub Jelinek

On 15/08/2019 20:47, Bernd Edlinger wrote:
> On 8/15/19 6:29 PM, Richard Biener wrote:
>>>>
>>>> Please split it into the parts for the PR and parts making the
>>>> asserts not trigger.
>>>>
>>>
>>> Yes, will do.
>>>
> 
> Okay, here is the rest of the PR 89544 fix,
> actually just an optimization, making the larger stack alignment
> known to the middle-end, and the test cases.
> 
> 
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 

Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(Revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(Arbeitskopie)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
+/* { dg-options "-marm -mno-unaligned-access -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 0 } } */
+/* { dg-final { scan-assembler-times "strd" 0 } } */
+/* { dg-final { scan-assembler-times "stm" 1 } } */

I don't think this test is right.  While we can't use an LDRD to load 
the argument off the stack, there's nothing wrong with using an STRD to 
then store the value to f0 (as that is 8-byte aligned).  So the second 
and third scan-assembler tests are meaningless.
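
To make the alignment argument concrete, here is a minimal sketch in C.
The helper names is_aligned and f0_store_is_aligned are made up for
illustration and are not part of the test case. It only checks, on
whatever host it is compiled for, that the aligned(8) attribute gives f0
an 8-byte aligned address, which is what would make an STRD to f0
legitimate. It says nothing about the stack slot the argument arrives in.

```c
#include <assert.h>
#include <stdint.h>

/* Same type as in the test case: two ints, over-aligned to 8 bytes.  */
struct s {
  int a, b;
} __attribute__((aligned(8)));

struct s f0;

/* Hypothetical helper: nonzero if P is a multiple of ALIGN bytes.  */
static int is_aligned(const void *p, uintptr_t align)
{
  assert(align != 0);
  return ((uintptr_t)p % align) == 0;
}

/* Hypothetical helper: nonzero if a doubleword store to f0 would be
   naturally aligned, i.e. the type and the object are both 8-byte
   aligned.  */
int f0_store_is_aligned(void)
{
  return _Alignof(struct s) == 8 && is_aligned(&f0, 8);
}
```

The incoming argument, by contrast, may sit in a stack slot that is only
4-byte aligned, so the same check on its address is not guaranteed to hold.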

R.

(sorry, just noticed this).


* Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-09-04 12:53                               ` Richard Earnshaw (lists)
@ 2019-09-04 13:29                                 ` Bernd Edlinger
  2019-09-04 14:14                                   ` Richard Earnshaw (lists)
  2019-09-06 10:15                                 ` Bernd Edlinger
  1 sibling, 1 reply; 50+ messages in thread
From: Bernd Edlinger @ 2019-09-04 13:29 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Richard Biener
  Cc: gcc-patches, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou,
	Jeff Law, Jakub Jelinek

On 9/4/19 2:53 PM, Richard Earnshaw (lists) wrote:
> On 15/08/2019 20:47, Bernd Edlinger wrote:
>> On 8/15/19 6:29 PM, Richard Biener wrote:
>>>>>
>>>>> Please split it into the parts for the PR and parts making the
>>>>> asserts not trigger.
>>>>>
>>>>
>>>> Yes, will do.
>>>>
>>
>> Okay, here is the rest of the PR 89544 fix,
>> actually just an optimization, making the larger stack alignment
>> known to the middle-end, and the test cases.
>>
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
> 
> Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
> ===================================================================
> --- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Revision 0)
> +++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Arbeitskopie)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_arm_ok } */
> +/* { dg-require-effective-target arm_ldrd_strd_ok } */
> +/* { dg-options "-marm -mno-unaligned-access -O3" } */
> +
> +struct s {
> +  int a, b;
> +} __attribute__((aligned(8)));
> +
> +struct s f0;
> +
> +void f(int a, int b, int c, int d, int e, struct s f)
> +{
> +  f0 = f;
> +}
> +
> +/* { dg-final { scan-assembler-times "ldrd" 0 } } */
> +/* { dg-final { scan-assembler-times "strd" 0 } } */
> +/* { dg-final { scan-assembler-times "stm" 1 } } */
> 
> I don't think this test is right.  While we can't use an LDRD to load the argument off the stack, there's nothing wrong with using an STRD to then store the value to f0 (as that is 8-byte aligned).  So the second and third scan-assembler tests are meaningless.
> 

Ah, that is very similar to the unaligned-memcpy-2/3.c,
see https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00157.html

Initially that is a movdi;
in subreg1 it is split into two movsi,
which are then re-assembled as an ldm.
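
Roughly, at the C level, the split looks like the sketch below. This is
only an analogy for what happens to the RTL, not GCC code, and
copy_di_as_two_si is a hypothetical name chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Copy one 64-bit quantity as two independent 32-bit word moves.
   Each access only needs 4-byte alignment, so a later pass is free
   to combine the two word moves into a single multi-register
   load/store.  */
void copy_di_as_two_si(uint32_t *dst, const uint32_t *src)
{
  assert(dst != 0 && src != 0);
  dst[0] = src[0];   /* first word  */
  dst[1] = src[1];   /* second word */
}
```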


Not sure whether that is intended.


Bernd.


* Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-09-04 13:29                                 ` Bernd Edlinger
@ 2019-09-04 14:14                                   ` Richard Earnshaw (lists)
  2019-09-04 15:00                                     ` Bernd Edlinger
  0 siblings, 1 reply; 50+ messages in thread
From: Richard Earnshaw (lists) @ 2019-09-04 14:14 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Biener
  Cc: gcc-patches, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou,
	Jeff Law, Jakub Jelinek

On 04/09/2019 14:28, Bernd Edlinger wrote:
> On 9/4/19 2:53 PM, Richard Earnshaw (lists) wrote:
>> On 15/08/2019 20:47, Bernd Edlinger wrote:
>>> On 8/15/19 6:29 PM, Richard Biener wrote:
>>>>>>
>>>>>> Please split it into the parts for the PR and parts making the
>>>>>> asserts not trigger.
>>>>>>
>>>>>
>>>>> Yes, will do.
>>>>>
>>>
>>> Okay, here is the rest of the PR 89544 fix,
>>> actually just an optimization, making the larger stack alignment
>>> known to the middle-end, and the test cases.
>>>
>>>
>>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
>>> Is it OK for trunk?
>>>
>>>
>>> Thanks
>>> Bernd.
>>>
>>
>> Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
>> ===================================================================
>> --- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Revision 0)
>> +++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Arbeitskopie)
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm_arm_ok } */
>> +/* { dg-require-effective-target arm_ldrd_strd_ok } */
>> +/* { dg-options "-marm -mno-unaligned-access -O3" } */
>> +
>> +struct s {
>> +  int a, b;
>> +} __attribute__((aligned(8)));
>> +
>> +struct s f0;
>> +
>> +void f(int a, int b, int c, int d, int e, struct s f)
>> +{
>> +  f0 = f;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times "ldrd" 0 } } */
>> +/* { dg-final { scan-assembler-times "strd" 0 } } */
>> +/* { dg-final { scan-assembler-times "stm" 1 } } */
>>
>> I don't think this test is right.  While we can't use an LDRD to load the argument off the stack, there's nothing wrong with using an STRD to then store the value to f0 (as that is 8-byte aligned).  So the second and third scan-assembler tests are meaningless.
>>
> 
> Ah, that is very similar to the unaligned-memcpy-2/3.c,
> see https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00157.html
> 
> initially that is a movdi,
> then in subreg1 it is split in two movsi
> which is then re-assembled as ldm
> 
> 
> Not sure if that is intended in that way.
> 
> 

Yeah, these are causing me some problems too, but that's because with 
some changes I'm working on I now see the compiler using r4 and r5, 
which leads to prologue and epilogue stores that distort the results.

Tests like this are generally fragile - I hate 'em!!!!

R.
> Bernd.
> 


* Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-09-04 14:14                                   ` Richard Earnshaw (lists)
@ 2019-09-04 15:00                                     ` Bernd Edlinger
  2019-09-04 15:48                                       ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 50+ messages in thread
From: Bernd Edlinger @ 2019-09-04 15:00 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Richard Biener
  Cc: gcc-patches, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou,
	Jeff Law, Jakub Jelinek

On 9/4/19 4:14 PM, Richard Earnshaw (lists) wrote:
> On 04/09/2019 14:28, Bernd Edlinger wrote:
>> On 9/4/19 2:53 PM, Richard Earnshaw (lists) wrote:
>>> Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
>>> ===================================================================
>>> --- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Revision 0)
>>> +++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (Arbeitskopie)
>>> @@ -0,0 +1,19 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target arm_arm_ok } */
>>> +/* { dg-require-effective-target arm_ldrd_strd_ok } */
>>> +/* { dg-options "-marm -mno-unaligned-access -O3" } */
>>> +
>>> +struct s {
>>> +  int a, b;
>>> +} __attribute__((aligned(8)));
>>> +
>>> +struct s f0;
>>> +
>>> +void f(int a, int b, int c, int d, int e, struct s f)
>>> +{
>>> +  f0 = f;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times "ldrd" 0 } } */
>>> +/* { dg-final { scan-assembler-times "strd" 0 } } */
>>> +/* { dg-final { scan-assembler-times "stm" 1 } } */
>>>
>>> I don't think this test is right.  While we can't use an LDRD to load the argument off the stack, there's nothing wrong with using an STRD to then store the value to f0 (as that is 8-byte aligned).  So the second and third scan-assembler tests are meaningless.
>>>
>>
>> Ah, that is very similar to the unaligned-memcpy-2/3.c,
>> see https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00157.html
>>
>> initially that is a movdi,
>> then in subreg1 it is split in two movsi
>> which is then re-assembled as ldm
>>
>>
>> Not sure if that is intended in that way.
>>
>>
> 
> Yeah, these are causing me some problems too, but that's because with some changes I'm working on I now see the compiler using r4 and r5, which leads to prologue and epilogue stores that distort the results.
> 
> Tests like this are generally fragile - I hate 'em!!!!
> 

Yeah, that changed when r275063 introduced unaligned_loaddi/unaligned_storedi:

r275063 | edlinger | 2019-08-30 12:38:37 +0200 (Fri, 30 Aug 2019) | 10 lines
Changed paths:
   M /trunk/gcc/ChangeLog
   M /trunk/gcc/config/arm/arm.c
   M /trunk/gcc/config/arm/arm.md
   M /trunk/gcc/config/arm/neon.md

2019-08-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>

        * config/arm/arm.md (unaligned_loaddi,
        unaligned_storedi): New unspec insn patterns.
        * config/arm/neon.md (unaligned_storev8qi): Likewise.
        * config/arm/arm.c (gen_cpymem_ldrd_strd): Use unaligned_loaddi
        and unaligned_storedi for 4-byte aligned memory.
        (arm_block_set_aligned_vect): Use unaligned_storev8qi for
        4-byte aligned memory.

Unlike the movdi, they are not split up but stay as ldrd/strd.
But for some unknown reason ira assigns r4-r5 to them, although
r1-r2 would also be available. :-(


Bernd.

* Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-09-04 15:00                                     ` Bernd Edlinger
@ 2019-09-04 15:48                                       ` Richard Earnshaw (lists)
  2019-09-05  9:21                                         ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 50+ messages in thread
From: Richard Earnshaw (lists) @ 2019-09-04 15:48 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Biener
  Cc: gcc-patches, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou,
	Jeff Law, Jakub Jelinek

On 04/09/2019 16:00, Bernd Edlinger wrote:
> On 9/4/19 4:14 PM, Richard Earnshaw (lists) wrote:
>> On 04/09/2019 14:28, Bernd Edlinger wrote:
>>> On 9/4/19 2:53 PM, Richard Earnshaw (lists) wrote:
>>>> Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
>>>> ===================================================================
>>>> --- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (revision 0)
>>>> +++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (working copy)
>>>> @@ -0,0 +1,19 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-require-effective-target arm_arm_ok } */
>>>> +/* { dg-require-effective-target arm_ldrd_strd_ok } */
>>>> +/* { dg-options "-marm -mno-unaligned-access -O3" } */
>>>> +
>>>> +struct s {
>>>> +  int a, b;
>>>> +} __attribute__((aligned(8)));
>>>> +
>>>> +struct s f0;
>>>> +
>>>> +void f(int a, int b, int c, int d, int e, struct s f)
>>>> +{
>>>> +  f0 = f;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-assembler-times "ldrd" 0 } } */
>>>> +/* { dg-final { scan-assembler-times "strd" 0 } } */
>>>> +/* { dg-final { scan-assembler-times "stm" 1 } } */
>>>>
>>>> I don't think this test is right.  While we can't use an LDRD to load the argument off the stack, there's nothing wrong with using an STRD to then store the value to f0 (as that is 8-byte aligned).  So the second and third scan-assembler tests are meaningless.
>>>>
>>>
>>> Ah, that is very similar to the unaligned-memcpy-2/3.c,
>>> see https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00157.html
>>>
>>> Initially that is a movdi;
>>> then in subreg1 it is split into two movsi,
>>> which are then re-assembled as an ldm.
>>>
>>>
>>> Not sure whether that is intended.
>>>
>>>
>>
>> Yeah, these are causing me some problems too, but that's because with some changes I'm working on I now see the compiler using r4 and r5, which leads to prologue and epilogue stores that distort the results.
>>
>> Tests like this are generally fragile - I hate 'em!!!!
>>
> 
> Yeah, that changed when r275063 introduced the unaligned_loaddi/unaligned_storedi patterns:
> 
> r275063 | edlinger | 2019-08-30 12:38:37 +0200 (Fri, 30 Aug 2019) | 10 lines
> Changed paths:
>     M /trunk/gcc/ChangeLog
>     M /trunk/gcc/config/arm/arm.c
>     M /trunk/gcc/config/arm/arm.md
>     M /trunk/gcc/config/arm/neon.md
> 
> 2019-08-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>
> 
>          * config/arm/arm.md (unaligned_loaddi,
>          unaligned_storedi): New unspec insn patterns.
>          * config/arm/neon.md (unaligned_storev8qi): Likewise.
>          * config/arm/arm.c (gen_cpymem_ldrd_strd): Use unaligned_loaddi
>          and unaligned_storedi for 4-byte aligned memory.
>          (arm_block_set_aligned_vect): Use unaligned_storev8qi for
>          4-byte aligned memory.
> 
> Unlike the movdi, they are not split up but stay as ldrd/strd.
> But for some unknown reason ira assigns r4-r5 to them, although
> r1-r2 would also be available. :-(
> 

r1-r2 can't be used in Arm state as the register pair has to start on an
even-numbered register.  But ira has already used r3 for the address of the
store (it could have picked r1) and now r4-r5 is the next even-numbered pair.
So we end up needing to save some callee-saved regs.

R.
> 
> Bernd.
> 
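
The pairing rule mentioned in this reply can be written down as a small check. This is only a sketch: the helper name `arm_state_ldrd_pair_ok` is hypothetical and is not a GCC function. It encodes the architectural rule that, in Arm (A32) state, LDRD/STRD need an even-numbered first register other than r14, with the second register being the next one up.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of GCC: returns true if the registers
   r<first>, r<second> form a pair that an Arm-state (A32) LDRD/STRD
   can encode.  The first register must be even-numbered and must not
   be r14; the second register must be the one immediately after it.  */
bool arm_state_ldrd_pair_ok(int first, int second)
{
  return first >= 0
         && first % 2 == 0
         && first != 14
         && second == first + 1;
}
```

For the case in the thread, `arm_state_ldrd_pair_ok(1, 2)` is false while `arm_state_ldrd_pair_ok(4, 5)` is true.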

* Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-09-04 15:48                                       ` Richard Earnshaw (lists)
@ 2019-09-05  9:21                                         ` Richard Earnshaw (lists)
  2019-09-05  9:35                                           ` Bernd Edlinger
  0 siblings, 1 reply; 50+ messages in thread
From: Richard Earnshaw (lists) @ 2019-09-05  9:21 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Biener
  Cc: gcc-patches, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou,
	Jeff Law, Jakub Jelinek

On 04/09/2019 16:48, Richard Earnshaw (lists) wrote:
> On 04/09/2019 16:00, Bernd Edlinger wrote:
>> On 9/4/19 4:14 PM, Richard Earnshaw (lists) wrote:
>>> On 04/09/2019 14:28, Bernd Edlinger wrote:
>>>> On 9/4/19 2:53 PM, Richard Earnshaw (lists) wrote:
>>>>> Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
>>>>> ===================================================================
>>>>> --- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (revision 0)
>>>>> +++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (working copy)
>>>>> @@ -0,0 +1,19 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-require-effective-target arm_arm_ok } */
>>>>> +/* { dg-require-effective-target arm_ldrd_strd_ok } */
>>>>> +/* { dg-options "-marm -mno-unaligned-access -O3" } */
>>>>> +
>>>>> +struct s {
>>>>> +  int a, b;
>>>>> +} __attribute__((aligned(8)));
>>>>> +
>>>>> +struct s f0;
>>>>> +
>>>>> +void f(int a, int b, int c, int d, int e, struct s f)
>>>>> +{
>>>>> +  f0 = f;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times "ldrd" 0 } } */
>>>>> +/* { dg-final { scan-assembler-times "strd" 0 } } */
>>>>> +/* { dg-final { scan-assembler-times "stm" 1 } } */
>>>>>
>>>>> I don't think this test is right.  While we can't use an LDRD to 
>>>>> load the argument off the stack, there's nothing wrong with using 
>>>>> an STRD to then store the value to f0 (as that is 8-byte aligned).  
>>>>> So the second and third scan-assembler tests are meaningless.
>>>>>
>>>>
>>>> Ah, that is very similar to the unaligned-memcpy-2/3.c,
>>>> see https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00157.html
>>>>
>>>> Initially that is a movdi;
>>>> then in subreg1 it is split into two movsi,
>>>> which are then re-assembled as an ldm.
>>>>
>>>>
>>>> Not sure whether that is intended.
>>>>
>>>>
>>>
>>> Yeah, these are causing me some problems too, but that's because with 
>>> some changes I'm working on I now see the compiler using r4 and r5, 
>>> which leads to prologue and epilogue stores that distort the results.
>>>
>>> Tests like this are generally fragile - I hate 'em!!!!
>>>
>>
>> Yeah, that changed when r275063 introduced the unaligned_loaddi/unaligned_storedi patterns:
>>
>> r275063 | edlinger | 2019-08-30 12:38:37 +0200 (Fri, 30 Aug 2019) | 10 lines
>> Changed paths:
>>     M /trunk/gcc/ChangeLog
>>     M /trunk/gcc/config/arm/arm.c
>>     M /trunk/gcc/config/arm/arm.md
>>     M /trunk/gcc/config/arm/neon.md
>>
>> 2019-08-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>
>>          * config/arm/arm.md (unaligned_loaddi,
>>          unaligned_storedi): New unspec insn patterns.
>>          * config/arm/neon.md (unaligned_storev8qi): Likewise.
>>          * config/arm/arm.c (gen_cpymem_ldrd_strd): Use unaligned_loaddi
>>          and unaligned_storedi for 4-byte aligned memory.
>>          (arm_block_set_aligned_vect): Use unaligned_storev8qi for
>>          4-byte aligned memory.
>>
>> Unlike the movdi, they are not split up but stay as ldrd/strd.
>> But for some unknown reason ira assigns r4-r5 to them, although
>> r1-r2 would also be available. :-(
>>
> 
> r1-r2 can't be used in Arm state as the register pair has to start on an
> even-numbered register.  But ira has already used r3 for the address of the
> store (it could have picked r1) and now r4-r5 is the next even-numbered pair.
> So we end up needing to save some callee-saved regs.
> 
> R.
>>
>> Bernd.
>>
> 

One possible trick to stabilize the test is to insert an asm that 
clobbers r4 and r5 and forces the prologue/epilogue code to always save 
and restore them.  Then we can account for those prologue/epilogue 
consistently (at least, modulo the arm_prefer_ldrd_strd condition).

R.
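
A minimal sketch of the clobber trick described above, applied to the function from unaligned-argument-2.c. The empty asm statement and the `__arm__` guard are assumptions added for illustration (the guard only keeps the snippet compilable on other hosts); neither is taken from a committed test.

```c
struct s {
  int a, b;
} __attribute__((aligned(8)));

struct s f0;

void f(int a, int b, int c, int d, int e, struct s f)
{
#if defined(__arm__)
  /* Empty asm that declares r4 and r5 clobbered.  The compiler then has
     to save and restore both registers in the prologue/epilogue no
     matter which registers the allocator picks for the copy below, so
     the prologue/epilogue instructions stay the same from run to run.  */
  __asm__ volatile ("" : : : "r4", "r5");
#endif
  f0 = f;
}
```

With the saves forced, any scan-assembler counts would have to include the prologue/epilogue instructions, which may still differ depending on the arm_prefer_ldrd_strd condition mentioned above.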

* Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-09-05  9:21                                         ` Richard Earnshaw (lists)
@ 2019-09-05  9:35                                           ` Bernd Edlinger
  0 siblings, 0 replies; 50+ messages in thread
From: Bernd Edlinger @ 2019-09-05  9:35 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Richard Biener
  Cc: gcc-patches, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou,
	Jeff Law, Jakub Jelinek

On 9/5/19 11:21 AM, Richard Earnshaw (lists) wrote:
> On 04/09/2019 16:48, Richard Earnshaw (lists) wrote:
>> On 04/09/2019 16:00, Bernd Edlinger wrote:
>>> On 9/4/19 4:14 PM, Richard Earnshaw (lists) wrote:
>>>> On 04/09/2019 14:28, Bernd Edlinger wrote:
>>>>> On 9/4/19 2:53 PM, Richard Earnshaw (lists) wrote:
>>>>>> Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
>>>>>> ===================================================================
>>>>>> --- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (revision 0)
>>>>>> +++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (working copy)
>>>>>> @@ -0,0 +1,19 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-require-effective-target arm_arm_ok } */
>>>>>> +/* { dg-require-effective-target arm_ldrd_strd_ok } */
>>>>>> +/* { dg-options "-marm -mno-unaligned-access -O3" } */
>>>>>> +
>>>>>> +struct s {
>>>>>> +  int a, b;
>>>>>> +} __attribute__((aligned(8)));
>>>>>> +
>>>>>> +struct s f0;
>>>>>> +
>>>>>> +void f(int a, int b, int c, int d, int e, struct s f)
>>>>>> +{
>>>>>> +  f0 = f;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-assembler-times "ldrd" 0 } } */
>>>>>> +/* { dg-final { scan-assembler-times "strd" 0 } } */
>>>>>> +/* { dg-final { scan-assembler-times "stm" 1 } } */
>>>>>>
>>>>>> I don't think this test is right.  While we can't use an LDRD to load the argument off the stack, there's nothing wrong with using an STRD to then store the value to f0 (as that is 8-byte aligned).  So the second and third scan-assembler tests are meaningless.
>>>>>>
>>>>>
>>>>> Ah, that is very similar to the unaligned-memcpy-2/3.c,
>>>>> see https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00157.html
>>>>>
>>>>> Initially that is a movdi;
>>>>> then in subreg1 it is split into two movsi,
>>>>> which are then re-assembled as an ldm.
>>>>>
>>>>>
>>>>> Not sure whether that is intended.
>>>>>
>>>>>
>>>>
>>>> Yeah, these are causing me some problems too, but that's because with some changes I'm working on I now see the compiler using r4 and r5, which leads to prologue and epilogue stores that distort the results.
>>>>
>>>> Tests like this are generally fragile - I hate 'em!!!!
>>>>
>>>
>>> Yeah, that changed when r275063 introduced the unaligned_loaddi/unaligned_storedi patterns:
>>>
>>> r275063 | edlinger | 2019-08-30 12:38:37 +0200 (Fri, 30 Aug 2019) | 10 lines
>>> Changed paths:
>>>     M /trunk/gcc/ChangeLog
>>>     M /trunk/gcc/config/arm/arm.c
>>>     M /trunk/gcc/config/arm/arm.md
>>>     M /trunk/gcc/config/arm/neon.md
>>>
>>> 2019-08-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>
>>>          * config/arm/arm.md (unaligned_loaddi,
>>>          unaligned_storedi): New unspec insn patterns.
>>>          * config/arm/neon.md (unaligned_storev8qi): Likewise.
>>>          * config/arm/arm.c (gen_cpymem_ldrd_strd): Use unaligned_loaddi
>>>          and unaligned_storedi for 4-byte aligned memory.
>>>          (arm_block_set_aligned_vect): Use unaligned_storev8qi for
>>>          4-byte aligned memory.
>>>
>>> Unlike the movdi, they are not split up but stay as ldrd/strd.
>>> But for some unknown reason ira assigns r4-r5 to them, although
>>> r1-r2 would also be available. :-(
>>>
>>
>> r1-r2 can't be used in Arm state as the register pair has to start on an even-numbered register.  But ira has already used r3 for the address of the store (it could have picked r1) and now r4-r5 is the next even-numbered pair.  So we end up needing to save some callee-saved regs.
>>
>> R.
>>>
>>> Bernd.
>>>
>>
> 
> One possible trick to stabilize the test is to insert an asm that clobbers r4 and r5 and forces the prologue/epilogue code to always save and restore them.  Then we can account for those prologue/epilogue consistently (at least, modulo the arm_prefer_ldrd_strd condition).
> 

Yes, or add -fno-omit-frame-pointer.

BTW: have you seen this negative lookahead in my patch
[PATCH] [ARM] Adjust test expectations of unaligned-memcpy-2/3.c (PR 91614)
https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00157.html

/* { dg-final { scan-assembler-times "ldrd\(?!\[^\\n\]*sp\)" 0 } } */

It makes the test work for all combinations with
RUNTESTFLAGS="--target_board=unix\{-mcpu=cortex-a15,-mcpu=cortex-a57,-mcpu=cortex-a9,-mcpu=cortex-a8,-mcpu=cortex-a7\}\{-fno-omit-frame-pointer,\}"


Cool, isn't it?

Bernd.
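
To make the intent of the negative lookahead concrete: once the Tcl escaping is removed, the pattern should read `ldrd(?![^\n]*sp)`, i.e. it counts each "ldrd" that is not followed by "sp" later on the same line. The C sketch below hand-rolls that counting rule on a string of assembler text. It is an illustration only; the function names are hypothetical and DejaGnu itself uses Tcl regular expressions, not this code.

```c
#include <stddef.h>
#include <string.h>

/* Return nonzero if NEEDLE occurs entirely within [begin, end).  */
static int contains_in_range(const char *begin, const char *end,
                             const char *needle)
{
  size_t n = strlen(needle);
  for (const char *p = begin; (size_t)(end - p) >= n; p++)
    if (memcmp(p, needle, n) == 0)
      return 1;
  return 0;
}

/* Hypothetical helper: count occurrences of "ldrd" that are NOT followed
   by "sp" before the end of the same line.  This mirrors what the
   negative lookahead ldrd(?![^\n]*sp) accepts.  */
int count_ldrd_not_using_sp(const char *text)
{
  int count = 0;
  const char *p = text;

  while ((p = strstr(p, "ldrd")) != NULL)
    {
      const char *rest = p + 4;              /* text after "ldrd" */
      const char *eol = strchr(rest, '\n');  /* end of this line */
      if (eol == NULL)
        eol = rest + strlen(rest);           /* last line, no newline */
      if (!contains_in_range(rest, eol, "sp"))
        count++;
      p = rest;
    }
  return count;
}
```

An ldrd that restores registers from the stack in the epilogue is skipped by this rule, while an ldrd through any other base register is still counted, which is why the expectation of 0 survives prologue/epilogue changes.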

* Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-09-04 12:53                               ` Richard Earnshaw (lists)
  2019-09-04 13:29                                 ` Bernd Edlinger
@ 2019-09-06 10:15                                 ` Bernd Edlinger
  2019-09-06 10:18                                   ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 50+ messages in thread
From: Bernd Edlinger @ 2019-09-06 10:15 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Richard Biener
  Cc: gcc-patches, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou,
	Jeff Law, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1370 bytes --]

On 9/4/19 2:53 PM, Richard Earnshaw (lists) wrote:
> Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
> ===================================================================
> --- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (revision 0)
> +++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (working copy)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_arm_ok } */
> +/* { dg-require-effective-target arm_ldrd_strd_ok } */
> +/* { dg-options "-marm -mno-unaligned-access -O3" } */
> +
> +struct s {
> +  int a, b;
> +} __attribute__((aligned(8)));
> +
> +struct s f0;
> +
> +void f(int a, int b, int c, int d, int e, struct s f)
> +{
> +  f0 = f;
> +}
> +
> +/* { dg-final { scan-assembler-times "ldrd" 0 } } */
> +/* { dg-final { scan-assembler-times "strd" 0 } } */
> +/* { dg-final { scan-assembler-times "stm" 1 } } */
> 
> I don't think this test is right.  While we can't use an LDRD to load the argument off the stack, there's nothing wrong with using an STRD to then store the value to f0 (as that is 8-byte aligned).  So the second and third scan-assembler tests are meaningless.
> 
> R.
> 
> (sorry, just noticed this).

Agreed, that output is quite likely to change,
so I would just remove those two checks, as attached.

Is that OK for trunk?


Thanks
Bernd.

[-- Attachment #2: patch-pr89544-fixup.diff --]
[-- Type: text/x-patch; name="patch-pr89544-fixup.diff", Size: 632 bytes --]

2019-09-06  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* gcc.target/arm/unaligned-argument-2.c: Remove bogus test cases.

Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(revision 275409)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(working copy)
@@ -15,5 +15,3 @@ void f(int a, int b, int c, int d, int e, struct s
 }
 
 /* { dg-final { scan-assembler-times "ldrd" 0 } } */
-/* { dg-final { scan-assembler-times "strd" 0 } } */
-/* { dg-final { scan-assembler-times "stm" 1 } } */

* Re: [PATCHv5] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)
  2019-09-06 10:15                                 ` Bernd Edlinger
@ 2019-09-06 10:18                                   ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 50+ messages in thread
From: Richard Earnshaw (lists) @ 2019-09-06 10:18 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Biener
  Cc: gcc-patches, Ramana Radhakrishnan, Kyrill Tkachov, Eric Botcazou,
	Jeff Law, Jakub Jelinek

On 06/09/2019 11:15, Bernd Edlinger wrote:
> On 9/4/19 2:53 PM, Richard Earnshaw (lists) wrote:
>> Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
>> ===================================================================
>> --- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (revision 0)
>> +++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c    (working copy)
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm_arm_ok } */
>> +/* { dg-require-effective-target arm_ldrd_strd_ok } */
>> +/* { dg-options "-marm -mno-unaligned-access -O3" } */
>> +
>> +struct s {
>> +  int a, b;
>> +} __attribute__((aligned(8)));
>> +
>> +struct s f0;
>> +
>> +void f(int a, int b, int c, int d, int e, struct s f)
>> +{
>> +  f0 = f;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times "ldrd" 0 } } */
>> +/* { dg-final { scan-assembler-times "strd" 0 } } */
>> +/* { dg-final { scan-assembler-times "stm" 1 } } */
>>
>> I don't think this test is right.  While we can't use an LDRD to load the argument off the stack, there's nothing wrong with using an STRD to then store the value to f0 (as that is 8-byte aligned).  So the second and third scan-assembler tests are meaningless.
>>
>> R.
>>
>> (sorry, just noticed this).
> 
> Agreed, that output is quite likely to change,
> so I would just remove those two checks, as attached.
> 
> Is that OK for trunk?
> 
> 
> Thanks
> Bernd.
> 

OK.

R.

end of thread, other threads:[~2019-09-06 10:18 UTC | newest]

Thread overview: 50+ messages
2019-03-10 12:51 [PATCHv2] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544) Bernd Edlinger
2019-03-19 14:01 ` [PING] " Bernd Edlinger
2019-03-21 11:26 ` Richard Biener
2019-03-22 17:47   ` Bernd Edlinger
2019-03-25  9:28     ` Richard Biener
2019-07-30 22:13       ` [PATCHv3] " Bernd Edlinger
2019-07-31 13:17         ` Richard Earnshaw (lists)
2019-08-01 11:19           ` Bernd Edlinger
2019-08-02  9:10             ` Richard Earnshaw (lists)
2019-08-02 13:11         ` Richard Biener
2019-08-02 19:01           ` Bernd Edlinger
2019-08-08 14:20             ` [PATCHv4] " Bernd Edlinger
2019-08-14 10:54               ` [PING] " Bernd Edlinger
2019-08-14 12:27               ` Richard Biener
2019-08-14 22:26                 ` Bernd Edlinger
2019-08-15  8:58                   ` Richard Biener
2019-08-15 12:38                     ` Bernd Edlinger
2019-08-15 13:03                       ` Richard Biener
2019-08-15 14:33                         ` Richard Biener
2019-08-15 15:28                         ` Bernd Edlinger
2019-08-15 17:42                           ` Richard Biener
2019-08-15 21:19                             ` [PATCHv5] " Bernd Edlinger
2019-08-20  5:38                               ` Jeff Law
2019-08-20 15:04                               ` John David Anglin
     [not found]                                 ` <0d39b64f-67d9-7857-cf4e-36f09c0dc15e@bell.net>
2019-08-20 16:03                                   ` Fwd: " Bernd Edlinger
2019-09-04 12:53                               ` Richard Earnshaw (lists)
2019-09-04 13:29                                 ` Bernd Edlinger
2019-09-04 14:14                                   ` Richard Earnshaw (lists)
2019-09-04 15:00                                     ` Bernd Edlinger
2019-09-04 15:48                                       ` Richard Earnshaw (lists)
2019-09-05  9:21                                         ` Richard Earnshaw (lists)
2019-09-05  9:35                                           ` Bernd Edlinger
2019-09-06 10:15                                 ` Bernd Edlinger
2019-09-06 10:18                                   ` Richard Earnshaw (lists)
2019-08-15 21:27                             ` [PATCH] Sanitizing the middle-end interface to the back-end for strict alignment Bernd Edlinger
2019-08-17 10:11                               ` Bernd Edlinger
2019-08-23  0:01                                 ` Jeff Law
2019-08-23  0:05                               ` Jeff Law
2019-08-23 15:15                                 ` [PING] " Bernd Edlinger
2019-08-27 10:07                               ` Kyrill Tkachov
2019-08-28 11:50                                 ` Bernd Edlinger
2019-08-28 12:01                                   ` Kyrill Tkachov
2019-08-28 13:54                                     ` Christophe Lyon
2019-08-28 21:48                                       ` Bernd Edlinger
2019-08-29  9:09                                         ` Kyrill Tkachov
2019-08-29 10:00                                           ` Christophe Lyon
2019-08-29 22:57                                             ` Bernd Edlinger
2019-08-30 10:07                                               ` Kyrill Tkachov
2019-08-30 15:22                                               ` Christophe Lyon
2019-08-14 11:56             ` [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544) Richard Biener
