public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr
@ 2016-01-08 13:29 Christian Bruel
  2016-01-14 13:09 ` ping:[PATCH][ARM,AARCH64] " Christian Bruel
  2016-01-18 11:36 ` [PATCH][ARM,AARCH64] " Richard Biener
  0 siblings, 2 replies; 11+ messages in thread
From: Christian Bruel @ 2016-01-08 13:29 UTC (permalink / raw)
  To: kyrylo.tkachov, Richard.Earnshaw, ramana.radhakrishnan,
	richard.guenther, bschmidt, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1496 bytes --]

When compiling code with attribute targets on arm or aarch64, 
vector_type_mode returns different results (eg Vmode or BLKmode) 
depending on the current simd flags that are not set between functions.

for example the following code:

#include <arm_neon.h>

extern int8x8_t a;
extern int8x8_t b;

int16x8_t
__attribute__ ((target("fpu=neon")))
foo(void)
{
    return vaddl_s8 (a, b);
}

Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins 
, because the mismatch and DECL_MODE current's TYPE_MODE used in 
expand_builtin for global variables.

but the best explanation is in the vector_type_mode:
/* Vector types need to re-check the target flags each time we report
     the machine mode.  We need to do this because attribute target can
     change the result of vector_mode_supported_p and have_regs_of_mode
     on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
     change on a per-function basis.  */

I first tried to hack the 2 machine descriptions to insert 
convert_to_mode or relayout_decls here and there, but I found this very 
fragile. Instead a more central relayout the of type while expanding 
gave good results, as proposed here.

bootstraped and tested with no regression for arm, aarch64 and i586.

Does this look to be the right approach ?

nb: for testing this patch is complementary with

https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html

thanks for your comments.








[-- Attachment #2: pr68674.patch --]
[-- Type: text/x-patch, Size: 2108 bytes --]

2016-01-06  Christian Bruel  <christian.bruel@st.com>

	PR target/68674
	* expr.c (expand_expr_real_1): Relayout VECTOR_TYPE expression.

2016-01-06  Christian Bruel  <christian.bruel@st.com>

	PR target/68674
	* gcc.target/arm/pr68674.c
	* gcc.target/aarch64/pr68674.c

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 232158)
+++ gcc/expr.c	(working copy)
@@ -9602,8 +9602,17 @@ expand_expr_real_1 (tree exp, rtx target
       exp = SSA_NAME_VAR (ssa_name);
       goto expand_decl_rtl;
 
-    case PARM_DECL:
     case VAR_DECL:
+      /* Vector types need to re-check the target flags,
+	 since DECL_MODE might change with attribute target.  */
+      if (TREE_CODE (type) == VECTOR_TYPE
+	  && DECL_MODE (exp) != TYPE_MODE (type)
+	  && (TREE_STATIC (exp) || DECL_EXTERNAL (exp)))
+	relayout_decl (exp);
+
+      /* ... fall through ...  */
+
+    case PARM_DECL:
       /* If a static var's type was incomplete when the decl was written,
 	 but the type is complete now, lay out the decl now.  */
       if (DECL_SIZE (exp) == 0
Index: gcc/testsuite/gcc.target/arm/pr68674.c
===================================================================
--- gcc/testsuite/gcc.target/arm/pr68674.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/pr68674.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -mfloat-abi=softfp" } */
+
+#include <arm_neon.h>
+
+int8x8_t a, b;
+int16x8_t e;
+
+void
+__attribute__ ((target("fpu=neon")))
+foo(void)
+{
+  e = (int16x8_t) vaddl_s8(a, b);
+}
Index: gcc/testsuite/gcc.target/aarch64/pr68674.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/pr68674.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/pr68674.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -march=armv8-a+nosimd" } */
+
+#include <arm_neon.h>
+
+int8x8_t a, b;
+int16x8_t e;
+
+void
+__attribute__ ((target("+simd")))
+foo(void)
+{
+  e = (int16x8_t) vaddl_s8(a, b);
+}
+


^ permalink raw reply	[flat|nested] 11+ messages in thread

* ping:[PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr
  2016-01-08 13:29 [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr Christian Bruel
@ 2016-01-14 13:09 ` Christian Bruel
  2016-01-18 11:36 ` [PATCH][ARM,AARCH64] " Richard Biener
  1 sibling, 0 replies; 11+ messages in thread
From: Christian Bruel @ 2016-01-14 13:09 UTC (permalink / raw)
  To: richard.guenther, bschmidt
  Cc: kyrylo.tkachov, Richard.Earnshaw, ramana.radhakrishnan, gcc-patches

https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00415.html

thanks


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr
  2016-01-08 13:29 [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr Christian Bruel
  2016-01-14 13:09 ` ping:[PATCH][ARM,AARCH64] " Christian Bruel
@ 2016-01-18 11:36 ` Richard Biener
  2016-01-19 15:01   ` Christian Bruel
  1 sibling, 1 reply; 11+ messages in thread
From: Richard Biener @ 2016-01-18 11:36 UTC (permalink / raw)
  To: Christian Bruel
  Cc: kyrylo.tkachov, Richard Earnshaw, ramana.radhakrishnan, bschmidt,
	GCC Patches

On Fri, Jan 8, 2016 at 2:29 PM, Christian Bruel <christian.bruel@st.com> wrote:
> When compiling code with attribute targets on arm or aarch64,
> vector_type_mode returns different results (eg Vmode or BLKmode) depending
> on the current simd flags that are not set between functions.
>
> for example the following code:
>
> #include <arm_neon.h>
>
> extern int8x8_t a;
> extern int8x8_t b;
>
> int16x8_t
> __attribute__ ((target("fpu=neon")))
> foo(void)
> {
>    return vaddl_s8 (a, b);
> }
>
> Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins ,
> because the mismatch and DECL_MODE current's TYPE_MODE used in
> expand_builtin for global variables.
>
> but the best explanation is in the vector_type_mode:
> /* Vector types need to re-check the target flags each time we report
>     the machine mode.  We need to do this because attribute target can
>     change the result of vector_mode_supported_p and have_regs_of_mode
>     on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
>     change on a per-function basis.  */
>
> I first tried to hack the 2 machine descriptions to insert convert_to_mode
> or relayout_decls here and there, but I found this very fragile. Instead a
> more central relayout the of type while expanding gave good results, as
> proposed here.
>
> bootstraped and tested with no regression for arm, aarch64 and i586.
>
> Does this look to be the right approach ?
>
> nb: for testing this patch is complementary with
>
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html
>
> thanks for your comments.

A x86 specific testcase that ICEs as well:

typedef int v8si __attribute__((vector_size(32)));
v8si a;
v8si __attribute__((target("avx"))) foo()
{
  return a;
}

in your patch not using the shared DECL_RTL of the global var
"fixes" this so I think a conceptually better fix would be to
"adjust" DECL_RTL from globals via a adjust_address (or so).

Also given that we do

      /* ... fall through ...  */

    case FUNCTION_DECL:
    case RESULT_DECL:
      decl_rtl = DECL_RTL (exp);
    expand_decl_rtl:
      gcc_assert (decl_rtl);
      decl_rtl = copy_rtx (decl_rtl);

thus always "unshare" DECL_RTL anyway it might be not so
bad to simply do

     decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);

instead of that to avoid one copy.

Index: expr.c
===================================================================
--- expr.c      (revision 232496)
+++ expr.c      (working copy)
@@ -9597,7 +9597,10 @@ expand_expr_real_1 (tree exp, rtx target
       decl_rtl = DECL_RTL (exp);
     expand_decl_rtl:
       gcc_assert (decl_rtl);
-      decl_rtl = copy_rtx (decl_rtl);
+      if (MEM_P (decl_rtl))
+       decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
+      else
+       decl_rtl = copy_rtx (decl_rtl);
       /* Record writes to register variables.  */
       if (modifier == EXPAND_WRITE
          && REG_P (decl_rtl)

untested apart from on the x86_64 testcase (which it fixes).  One could guard
this further to only apply on vector typed decls with mismatched mode of course.

I think that re-layouting globals is not very good design.

Richard.

>
>
>
>
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr
  2016-01-18 11:36 ` [PATCH][ARM,AARCH64] " Richard Biener
@ 2016-01-19 15:01   ` Christian Bruel
  2016-01-19 15:13     ` Christian Bruel
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Bruel @ 2016-01-19 15:01 UTC (permalink / raw)
  To: Richard Biener
  Cc: kyrylo.tkachov, Richard Earnshaw, ramana.radhakrishnan, bschmidt,
	GCC Patches

Hi Richard,

thanks for your input,

On 01/18/2016 12:36 PM, Richard Biener wrote:
> On Fri, Jan 8, 2016 at 2:29 PM, Christian Bruel <christian.bruel@st.com> wrote:
>> When compiling code with attribute targets on arm or aarch64,
>> vector_type_mode returns different results (eg Vmode or BLKmode) depending
>> on the current simd flags that are not set between functions.
>>
>> for example the following code:
>>
>> #include <arm_neon.h>
>>
>> extern int8x8_t a;
>> extern int8x8_t b;
>>
>> int16x8_t
>> __attribute__ ((target("fpu=neon")))
>> foo(void)
>> {
>>     return vaddl_s8 (a, b);
>> }
>>
>> Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins ,
>> because the mismatch and DECL_MODE current's TYPE_MODE used in
>> expand_builtin for global variables.
>>
>> but the best explanation is in the vector_type_mode:
>> /* Vector types need to re-check the target flags each time we report
>>      the machine mode.  We need to do this because attribute target can
>>      change the result of vector_mode_supported_p and have_regs_of_mode
>>      on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
>>      change on a per-function basis.  */
>>
>> I first tried to hack the 2 machine descriptions to insert convert_to_mode
>> or relayout_decls here and there, but I found this very fragile. Instead a
>> more central relayout the of type while expanding gave good results, as
>> proposed here.
>>
>> bootstraped and tested with no regression for arm, aarch64 and i586.
>>
>> Does this look to be the right approach ?
>>
>> nb: for testing this patch is complementary with
>>
>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html
>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html
>>
>> thanks for your comments.
> A x86 specific testcase that ICEs as well:
>
> typedef int v8si __attribute__((vector_size(32)));
> v8si a;
> v8si __attribute__((target("avx"))) foo()
> {
>    return a;
> }
>
> in your patch not using the shared DECL_RTL of the global var
> "fixes" this so I think a conceptually better fix would be to
> "adjust" DECL_RTL from globals via a adjust_address (or so).
>
> Also given that we do
>
>        /* ... fall through ...  */
>
>      case FUNCTION_DECL:
>      case RESULT_DECL:
>        decl_rtl = DECL_RTL (exp);
>      expand_decl_rtl:
>        gcc_assert (decl_rtl);
>        decl_rtl = copy_rtx (decl_rtl);
>
> thus always "unshare" DECL_RTL anyway it might be not so
> bad to simply do
>
>       decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
>
> instead of that to avoid one copy.
>
> Index: expr.c
> ===================================================================
> --- expr.c      (revision 232496)
> +++ expr.c      (working copy)
> @@ -9597,7 +9597,10 @@ expand_expr_real_1 (tree exp, rtx target
>         decl_rtl = DECL_RTL (exp);
>       expand_decl_rtl:
>         gcc_assert (decl_rtl);
> -      decl_rtl = copy_rtx (decl_rtl);
> +      if (MEM_P (decl_rtl))
> +       decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
> +      else
> +       decl_rtl = copy_rtx (decl_rtl);
>         /* Record writes to register variables.  */
>         if (modifier == EXPAND_WRITE
>            && REG_P (decl_rtl)
>
> untested apart from on the x86_64 testcase (which it fixes).  One could guard
> this further to only apply on vector typed decls with mismatched mode of course.
>
> I think that re-layouting globals is not very good design.
>
> Richard.

A few other ICEs with this implementation, for instance if the context 
is not in a function, such as

typedef __simd64_int8_t int8x8_t;

extern int8x8_t b;
int8x8_t *a = &b;

So, to avoid a var re-layout and a copy_rtx (implied by adjust_address 
btw). What about just calling 'change_address' ? like: (very lightly tested)

Index: expr.c
===================================================================
--- expr.c    (revision 232564)
+++ expr.c    (working copy)
@@ -9392,7 +9392,8 @@
              enum expand_modifier modifier, rtx *alt_rtl,
              bool inner_reference_p)
  {
-  rtx op0, op1, temp, decl_rtl;
+  rtx op0, op1, temp;
+  rtx decl_rtl = NULL_RTX;
    tree type;
    int unsignedp;
    machine_mode mode, dmode;
@@ -9590,11 +9591,22 @@
        && (TREE_STATIC (exp) || DECL_EXTERNAL (exp)))
      layout_decl (exp, 0);

+      decl_rtl = DECL_RTL (exp);
+
+      if (MEM_P (decl_rtl)
+      && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode))
+    {
+      if (current_function_decl
+          && (! reload_completed && !reload_in_progress))
+        decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0);
+    }
+
        /* ... fall through ...  */

      case FUNCTION_DECL:
      case RESULT_DECL:
-      decl_rtl = DECL_RTL (exp);
+      if (! decl_rtl)
+    decl_rtl = DECL_RTL (exp);
      expand_decl_rtl:
        gcc_assert (decl_rtl);
        decl_rtl = copy_rtx (decl_rtl);

I'm not sure that moving the code in the 'expand_decl_rtl' label is 
best, as we'd need to test for exp and the case should only happen for 
global vars (not functions or results)

thanks,

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr
  2016-01-19 15:01   ` Christian Bruel
@ 2016-01-19 15:13     ` Christian Bruel
  2016-01-19 15:18       ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Bruel @ 2016-01-19 15:13 UTC (permalink / raw)
  To: Richard Biener
  Cc: kyrylo.tkachov, Richard Earnshaw, ramana.radhakrishnan, bschmidt,
	GCC Patches



On 01/19/2016 04:01 PM, Christian Bruel wrote:
> Hi Richard,
>
> thanks for your input,
>
> On 01/18/2016 12:36 PM, Richard Biener wrote:
>> On Fri, Jan 8, 2016 at 2:29 PM, Christian Bruel <christian.bruel@st.com> wrote:
>>> When compiling code with attribute targets on arm or aarch64,
>>> vector_type_mode returns different results (eg Vmode or BLKmode) depending
>>> on the current simd flags that are not set between functions.
>>>
>>> for example the following code:
>>>
>>> #include <arm_neon.h>
>>>
>>> extern int8x8_t a;
>>> extern int8x8_t b;
>>>
>>> int16x8_t
>>> __attribute__ ((target("fpu=neon")))
>>> foo(void)
>>> {
>>>      return vaddl_s8 (a, b);
>>> }
>>>
>>> Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins ,
>>> because the mismatch and DECL_MODE current's TYPE_MODE used in
>>> expand_builtin for global variables.
>>>
>>> but the best explanation is in the vector_type_mode:
>>> /* Vector types need to re-check the target flags each time we report
>>>       the machine mode.  We need to do this because attribute target can
>>>       change the result of vector_mode_supported_p and have_regs_of_mode
>>>       on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
>>>       change on a per-function basis.  */
>>>
>>> I first tried to hack the 2 machine descriptions to insert convert_to_mode
>>> or relayout_decls here and there, but I found this very fragile. Instead a
>>> more central relayout the of type while expanding gave good results, as
>>> proposed here.
>>>
>>> bootstraped and tested with no regression for arm, aarch64 and i586.
>>>
>>> Does this look to be the right approach ?
>>>
>>> nb: for testing this patch is complementary with
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html
>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html
>>>
>>> thanks for your comments.
>> A x86 specific testcase that ICEs as well:
>>
>> typedef int v8si __attribute__((vector_size(32)));
>> v8si a;
>> v8si __attribute__((target("avx"))) foo()
>> {
>>     return a;
>> }
>>
>> in your patch not using the shared DECL_RTL of the global var
>> "fixes" this so I think a conceptually better fix would be to
>> "adjust" DECL_RTL from globals via a adjust_address (or so).
>>
>> Also given that we do
>>
>>         /* ... fall through ...  */
>>
>>       case FUNCTION_DECL:
>>       case RESULT_DECL:
>>         decl_rtl = DECL_RTL (exp);
>>       expand_decl_rtl:
>>         gcc_assert (decl_rtl);
>>         decl_rtl = copy_rtx (decl_rtl);
>>
>> thus always "unshare" DECL_RTL anyway it might be not so
>> bad to simply do
>>
>>        decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
>>
>> instead of that to avoid one copy.
>>
>> Index: expr.c
>> ===================================================================
>> --- expr.c      (revision 232496)
>> +++ expr.c      (working copy)
>> @@ -9597,7 +9597,10 @@ expand_expr_real_1 (tree exp, rtx target
>>          decl_rtl = DECL_RTL (exp);
>>        expand_decl_rtl:
>>          gcc_assert (decl_rtl);
>> -      decl_rtl = copy_rtx (decl_rtl);
>> +      if (MEM_P (decl_rtl))
>> +       decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
>> +      else
>> +       decl_rtl = copy_rtx (decl_rtl);
>>          /* Record writes to register variables.  */
>>          if (modifier == EXPAND_WRITE
>>             && REG_P (decl_rtl)
>>
>> untested apart from on the x86_64 testcase (which it fixes).  One could guard
>> this further to only apply on vector typed decls with mismatched mode of course.
>>
>> I think that re-layouting globals is not very good design.
>>
>> Richard.
> A few other ICEs with this implementation, for instance if the context
> is not in a function, such as
>
> typedef __simd64_int8_t int8x8_t;
>
> extern int8x8_t b;
> int8x8_t *a = &b;
>
> So, to avoid a var re-layout and a copy_rtx (implied by adjust_address
> btw). What about just calling 'change_address' ? like: (very lightly tested)
>
> Index: expr.c
> ===================================================================
> --- expr.c    (revision 232564)
> +++ expr.c    (working copy)
> @@ -9392,7 +9392,8 @@
>                enum expand_modifier modifier, rtx *alt_rtl,
>                bool inner_reference_p)
>    {
> -  rtx op0, op1, temp, decl_rtl;
> +  rtx op0, op1, temp;
> +  rtx decl_rtl = NULL_RTX;
>      tree type;
>      int unsignedp;
>      machine_mode mode, dmode;
> @@ -9590,11 +9591,22 @@
>          && (TREE_STATIC (exp) || DECL_EXTERNAL (exp)))
>        layout_decl (exp, 0);
>
> +      decl_rtl = DECL_RTL (exp);
> +
> +      if (MEM_P (decl_rtl)
> +      && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode))
> +    {
> +      if (current_function_decl
> +          && (! reload_completed && !reload_in_progress))
> +        decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0);
> +    }
> +
>          /* ... fall through ...  */
>
>        case FUNCTION_DECL:
>        case RESULT_DECL:
> -      decl_rtl = DECL_RTL (exp);
> +      if (! decl_rtl)
> +    decl_rtl = DECL_RTL (exp);
>        expand_decl_rtl:
>          gcc_assert (decl_rtl);
>          decl_rtl = copy_rtx (decl_rtl);
>
> I'm not sure that moving the code in the 'expand_decl_rtl' label is
> best, as we'd need to test for exp and the case should only happen for
> global vars (not functions or results)

Here is the alternative implementation, shorter after all. testing in 
progress.

Index: expr.c
===================================================================
--- expr.c    (revision 232570)
+++ expr.c    (working copy)
@@ -9597,6 +9597,15 @@
        decl_rtl = DECL_RTL (exp);
      expand_decl_rtl:
        gcc_assert (decl_rtl);
+
+      if (exp && code == VAR_DECL && MEM_P (decl_rtl)
+      && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode))
+    {
+      if (current_function_decl
+          && (! reload_completed && !reload_in_progress))
+        decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0);
+    }
+
        decl_rtl = copy_rtx (decl_rtl);
        /* Record writes to register variables.  */
        if (modifier == EXPAND_WRITE

>
> thanks,
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr
  2016-01-19 15:13     ` Christian Bruel
@ 2016-01-19 15:18       ` Richard Biener
  2016-01-22 11:41         ` Christian Bruel
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2016-01-19 15:18 UTC (permalink / raw)
  To: Christian Bruel
  Cc: kyrylo.tkachov, Richard Earnshaw, ramana.radhakrishnan, bschmidt,
	GCC Patches

On Tue, Jan 19, 2016 at 4:13 PM, Christian Bruel <christian.bruel@st.com> wrote:
>
>
> On 01/19/2016 04:01 PM, Christian Bruel wrote:
>>
>> Hi Richard,
>>
>> thanks for your input,
>>
>> On 01/18/2016 12:36 PM, Richard Biener wrote:
>>>
>>> On Fri, Jan 8, 2016 at 2:29 PM, Christian Bruel <christian.bruel@st.com>
>>> wrote:
>>>>
>>>> When compiling code with attribute targets on arm or aarch64,
>>>> vector_type_mode returns different results (eg Vmode or BLKmode)
>>>> depending
>>>> on the current simd flags that are not set between functions.
>>>>
>>>> for example the following code:
>>>>
>>>> #include <arm_neon.h>
>>>>
>>>> extern int8x8_t a;
>>>> extern int8x8_t b;
>>>>
>>>> int16x8_t
>>>> __attribute__ ((target("fpu=neon")))
>>>> foo(void)
>>>> {
>>>>      return vaddl_s8 (a, b);
>>>> }
>>>>
>>>> Triggers gcc_asserts in copy_to_mode_regs while expanding NEON builtins
>>>> ,
>>>> because the mismatch and DECL_MODE current's TYPE_MODE used in
>>>> expand_builtin for global variables.
>>>>
>>>> but the best explanation is in the vector_type_mode:
>>>> /* Vector types need to re-check the target flags each time we report
>>>>       the machine mode.  We need to do this because attribute target can
>>>>       change the result of vector_mode_supported_p and have_regs_of_mode
>>>>       on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
>>>>       change on a per-function basis.  */
>>>>
>>>> I first tried to hack the 2 machine descriptions to insert
>>>> convert_to_mode
>>>> or relayout_decls here and there, but I found this very fragile. Instead
>>>> a
>>>> more central relayout the of type while expanding gave good results, as
>>>> proposed here.
>>>>
>>>> bootstraped and tested with no regression for arm, aarch64 and i586.
>>>>
>>>> Does this look to be the right approach ?
>>>>
>>>> nb: for testing this patch is complementary with
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00332.html
>>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00248.html
>>>>
>>>> thanks for your comments.
>>>
>>> A x86 specific testcase that ICEs as well:
>>>
>>> typedef int v8si __attribute__((vector_size(32)));
>>> v8si a;
>>> v8si __attribute__((target("avx"))) foo()
>>> {
>>>     return a;
>>> }
>>>
>>> in your patch not using the shared DECL_RTL of the global var
>>> "fixes" this so I think a conceptually better fix would be to
>>> "adjust" DECL_RTL from globals via a adjust_address (or so).
>>>
>>> Also given that we do
>>>
>>>         /* ... fall through ...  */
>>>
>>>       case FUNCTION_DECL:
>>>       case RESULT_DECL:
>>>         decl_rtl = DECL_RTL (exp);
>>>       expand_decl_rtl:
>>>         gcc_assert (decl_rtl);
>>>         decl_rtl = copy_rtx (decl_rtl);
>>>
>>> thus always "unshare" DECL_RTL anyway it might be not so
>>> bad to simply do
>>>
>>>        decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
>>>
>>> instead of that to avoid one copy.
>>>
>>> Index: expr.c
>>> ===================================================================
>>> --- expr.c      (revision 232496)
>>> +++ expr.c      (working copy)
>>> @@ -9597,7 +9597,10 @@ expand_expr_real_1 (tree exp, rtx target
>>>          decl_rtl = DECL_RTL (exp);
>>>        expand_decl_rtl:
>>>          gcc_assert (decl_rtl);
>>> -      decl_rtl = copy_rtx (decl_rtl);
>>> +      if (MEM_P (decl_rtl))
>>> +       decl_rtl = adjust_address (decl_rtl, TYPE_MODE (type), 0);
>>> +      else
>>> +       decl_rtl = copy_rtx (decl_rtl);
>>>          /* Record writes to register variables.  */
>>>          if (modifier == EXPAND_WRITE
>>>             && REG_P (decl_rtl)
>>>
>>> untested apart from on the x86_64 testcase (which it fixes).  One could
>>> guard
>>> this further to only apply on vector typed decls with mismatched mode of
>>> course.
>>>
>>> I think that re-layouting globals is not very good design.
>>>
>>> Richard.
>>
>> A few other ICEs with this implementation, for instance if the context
>> is not in a function, such as
>>
>> typedef __simd64_int8_t int8x8_t;
>>
>> extern int8x8_t b;
>> int8x8_t *a = &b;
>>
>> So, to avoid a var re-layout and a copy_rtx (implied by adjust_address
>> btw). What about just calling 'change_address' ? like: (very lightly
>> tested)
>>
>> Index: expr.c
>> ===================================================================
>> --- expr.c    (revision 232564)
>> +++ expr.c    (working copy)
>> @@ -9392,7 +9392,8 @@
>>                enum expand_modifier modifier, rtx *alt_rtl,
>>                bool inner_reference_p)
>>    {
>> -  rtx op0, op1, temp, decl_rtl;
>> +  rtx op0, op1, temp;
>> +  rtx decl_rtl = NULL_RTX;
>>      tree type;
>>      int unsignedp;
>>      machine_mode mode, dmode;
>> @@ -9590,11 +9591,22 @@
>>          && (TREE_STATIC (exp) || DECL_EXTERNAL (exp)))
>>        layout_decl (exp, 0);
>>
>> +      decl_rtl = DECL_RTL (exp);
>> +
>> +      if (MEM_P (decl_rtl)
>> +      && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode))
>> +    {
>> +      if (current_function_decl
>> +          && (! reload_completed && !reload_in_progress))
>> +        decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0);
>> +    }
>> +
>>          /* ... fall through ...  */
>>
>>        case FUNCTION_DECL:
>>        case RESULT_DECL:
>> -      decl_rtl = DECL_RTL (exp);
>> +      if (! decl_rtl)
>> +    decl_rtl = DECL_RTL (exp);
>>        expand_decl_rtl:
>>          gcc_assert (decl_rtl);
>>          decl_rtl = copy_rtx (decl_rtl);
>>
>> I'm not sure that moving the code in the 'expand_decl_rtl' label is
>> best, as we'd need to test for exp and the case should only happen for
>> global vars (not functions or results)
>
>
> Here is the alternative implementation, shorter after all. testing in
> progress.
>
> Index: expr.c
> ===================================================================
> --- expr.c    (revision 232570)
> +++ expr.c    (working copy)
> @@ -9597,6 +9597,15 @@
>        decl_rtl = DECL_RTL (exp);
>      expand_decl_rtl:
>        gcc_assert (decl_rtl);
> +
> +      if (exp && code == VAR_DECL && MEM_P (decl_rtl)
> +      && (VECTOR_TYPE_P (type) && DECL_MODE (exp) != mode))
> +    {
> +      if (current_function_decl
> +          && (! reload_completed && !reload_in_progress))

maybe just if (currently_expanding_to_rtl)?

But yes, this looks like a safe variant of the fix.

Richard.

> +        decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0);
> +    }
> +
>        decl_rtl = copy_rtx (decl_rtl);
>        /* Record writes to register variables.  */
>        if (modifier == EXPAND_WRITE
>
>>
>> thanks,
>>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr
  2016-01-19 15:18       ` Richard Biener
@ 2016-01-22 11:41         ` Christian Bruel
  2016-01-22 11:56           ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Bruel @ 2016-01-22 11:41 UTC (permalink / raw)
  To: Richard Biener
  Cc: kyrylo.tkachov, Richard Earnshaw, ramana.radhakrishnan, bschmidt,
	GCC Patches

[-- Attachment #1: Type: text/plain, Size: 440 bytes --]



On 01/19/2016 04:18 PM, Richard Biener wrote:
> maybe just if (currently_expanding_to_rtl)?
>
> But yes, this looks like a safe variant of the fix.
>
> Richard.
>
thanks, currently_expanding_to_rtl works perfectly. So the final version.
I added a test for each target.

bootstrapped / tested for :
     unix/-m32/-march=i586
     unix

     arm-qemu/
     arm-qemu//-mfpu=neon
     arm-qemu//-mfpu=neon-fp-armv8

     aarch64-qemu








[-- Attachment #2: pr68674.patch --]
[-- Type: text/x-patch, Size: 2950 bytes --]

2016-01-21  Christian Bruel  <christian.bruel@st.com>

	PR target/68674
	* expr.c (expand_expr_real_1): Reset DECL_MODE if VECTOR_TYPE_P changed.

2016-01-21  Christian Bruel  <christian.bruel@st.com>

	PR target/68674
	* gcc.target/i386/pr68674.c
	* gcc.target/aarch64/pr68674.c
	* gcc.target/arm/pr68674.c

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 232724)
+++ gcc/expr.c	(working copy)
@@ -9597,7 +9597,16 @@ expand_expr_real_1 (tree exp, rtx target, machine_
       decl_rtl = DECL_RTL (exp);
     expand_decl_rtl:
       gcc_assert (decl_rtl);
-      decl_rtl = copy_rtx (decl_rtl);
+
+      /* DECL_MODE might change when TYPE_MODE depends on attribute target
+	 settings for VECTOR_TYPE_P that might switch for the function.  */
+      if (currently_expanding_to_rtl
+	  && code == VAR_DECL && MEM_P (decl_rtl)
+	  && VECTOR_TYPE_P (type) && exp && DECL_MODE (exp) != mode)
+	decl_rtl = change_address (decl_rtl, TYPE_MODE (type), 0);
+      else
+	decl_rtl = copy_rtx (decl_rtl);
+
       /* Record writes to register variables.  */
       if (modifier == EXPAND_WRITE
 	  && REG_P (decl_rtl)
Index: gcc/testsuite/gcc.target/aarch64/pr68674.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/pr68674.c	(revision 0)
+++ gcc/testsuite/gcc.target/aarch64/pr68674.c	(working copy)
@@ -0,0 +1,22 @@
+/* PR target/68674 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=generic+nosimd" } */
+
+#include <arm_neon.h>
+
+int8x8_t a;
+extern int8x8_t b;
+int16x8_t e;
+
+void __attribute__((target("+simd")))
+foo1(void)
+{
+  e = (int16x8_t) vaddl_s8(a, b);
+}
+
+int8x8_t __attribute__((target("+simd")))
+foo2(void)
+{
+  return a;
+}
+
Index: gcc/testsuite/gcc.target/arm/pr68674.c
===================================================================
--- gcc/testsuite/gcc.target/arm/pr68674.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/pr68674.c	(working copy)
@@ -0,0 +1,26 @@
+/* PR target/68674 */
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -mfloat-abi=softfp" } */
+
+#pragma GCC target ("fpu=vfp")
+
+#include <arm_neon.h>
+
+int8x8_t a;
+extern int8x8_t b;
+int16x8_t e;
+
+void __attribute__((target("fpu=neon")))
+foo1(void)
+{
+  e = (int16x8_t) vaddl_s8(a, b);
+}
+
+int8x8_t __attribute__((target("fpu=neon")))
+foo2(void)
+{
+  return b;
+}
+
+
Index: gcc/testsuite/gcc.target/i386/pr68674.c
===================================================================
--- gcc/testsuite/gcc.target/i386/pr68674.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/pr68674.c	(working copy)
@@ -0,0 +1,15 @@
+/* PR target/68674 */
+/* { dg-do compile } */
+/* { dg-require-effective-target avx } */
+/* { dg-options "-O2" } */
+
+typedef int v8si __attribute__((vector_size(32)));
+
+v8si a;
+
+ __attribute__((target("avx")))
+v8si
+foo()
+{
+    return a;
+}

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr
  2016-01-22 11:41         ` Christian Bruel
@ 2016-01-22 11:56           ` Richard Biener
  2016-01-25 19:06             ` Christophe Lyon
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2016-01-22 11:56 UTC (permalink / raw)
  To: Christian Bruel
  Cc: kyrylo.tkachov, Richard Earnshaw, ramana.radhakrishnan, bschmidt,
	GCC Patches

On Fri, Jan 22, 2016 at 12:41 PM, Christian Bruel
<christian.bruel@st.com> wrote:
>
>
> On 01/19/2016 04:18 PM, Richard Biener wrote:
>>
>> maybe just if (currently_expanding_to_rtl)?
>>
>> But yes, this looks like a safe variant of the fix.
>>
>> Richard.
>>
> thanks, currently_expanding_to_rtl works perfectly. So the final version.
> I added a test for each target.

Ok.

Thanks,
Richard.

> bootstrapped / tested for :
>     unix/-m32/-march=i586
>     unix
>
>     arm-qemu/
>     arm-qemu//-mfpu=neon
>     arm-qemu//-mfpu=neon-fp-armv8
>
>     aarch64-qemu
>
>
>
>
>
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr
  2016-01-22 11:56           ` Richard Biener
@ 2016-01-25 19:06             ` Christophe Lyon
  2016-01-26  8:17               ` Christian Bruel
  2016-01-26  9:27               ` Kyrill Tkachov
  0 siblings, 2 replies; 11+ messages in thread
From: Christophe Lyon @ 2016-01-25 19:06 UTC (permalink / raw)
  To: Richard Biener
  Cc: Christian Bruel, kyrylo.tkachov, Richard Earnshaw,
	ramana.radhakrishnan, bschmidt, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1012 bytes --]

On 22 January 2016 at 12:56, Richard Biener <richard.guenther@gmail.com> wrote:
> On Fri, Jan 22, 2016 at 12:41 PM, Christian Bruel
> <christian.bruel@st.com> wrote:
>>
>>
>> On 01/19/2016 04:18 PM, Richard Biener wrote:
>>>
>>> maybe just if (currently_expanding_to_rtl)?
>>>
>>> But yes, this looks like a safe variant of the fix.
>>>
>>> Richard.
>>>
>> thanks, currently_expanding_to_rtl works perfectly. So the final version.
>> I added a test for each target.
>
> Ok.
>

Hi,

This small patch is needed to make the new test pass on arm hard-float
targets (eg. arm-none-linux-gnueabihf).

I'm not sure it counts as obvious, so here it is.
OK?

Christophe.

DATE  Christophe Lyon  <christophe.lyon@linaro.org>

    * gcc.target/arm/pr68674.c: Check and use arm_fp effective target.


> Thanks,
> Richard.
>
>> bootstrapped / tested for :
>>     unix/-m32/-march=i586
>>     unix
>>
>>     arm-qemu/
>>     arm-qemu//-mfpu=neon
>>     arm-qemu//-mfpu=neon-fp-armv8
>>
>>     aarch64-qemu
>>
>>
>>
>>
>>
>>
>>

[-- Attachment #2: pr68674-test.patch.txt --]
[-- Type: text/plain, Size: 521 bytes --]

diff --git a/gcc/testsuite/gcc.target/arm/pr68674.c b/gcc/testsuite/gcc.target/arm/pr68674.c
index a31a88a..0b32374 100644
--- a/gcc/testsuite/gcc.target/arm/pr68674.c
+++ b/gcc/testsuite/gcc.target/arm/pr68674.c
@@ -1,7 +1,9 @@
 /* PR target/68674 */
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_neon_ok } */
-/* { dg-options "-O2 -mfloat-abi=softfp" } */
+/* { dg-require-effective-target arm_fp_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_fp } */
 
 #pragma GCC target ("fpu=vfp")
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr
  2016-01-25 19:06             ` Christophe Lyon
@ 2016-01-26  8:17               ` Christian Bruel
  2016-01-26  9:27               ` Kyrill Tkachov
  1 sibling, 0 replies; 11+ messages in thread
From: Christian Bruel @ 2016-01-26  8:17 UTC (permalink / raw)
  To: Christophe Lyon, Richard Biener
  Cc: kyrylo.tkachov, Richard Earnshaw, ramana.radhakrishnan, bschmidt,
	GCC Patches



On 01/25/2016 08:06 PM, Christophe Lyon wrote:
> On 22 January 2016 at 12:56, Richard Biener <richard.guenther@gmail.com> wrote:
>> On Fri, Jan 22, 2016 at 12:41 PM, Christian Bruel
>> <christian.bruel@st.com> wrote:
>>>
>>> On 01/19/2016 04:18 PM, Richard Biener wrote:
>>>> maybe just if (currently_expanding_to_rtl)?
>>>>
>>>> But yes, this looks like a safe variant of the fix.
>>>>
>>>> Richard.
>>>>
>>> thanks, currently_expanding_to_rtl works perfectly. So the final version.
>>> I added a test for each target.
>> Ok.
>>
> Hi,
>
> This small patch is needed to make the new test pass on arm hard-float
> targets (eg. arm-none-linux-gnueabihf).
>
> I'm not sure it counts as obvious, so here it is.
> OK?
>
> Christophe.
>
> DATE  Christophe Lyon  <christophe.lyon@linaro.org>
>
>      * gcc.target/arm/pr68674.c: Check and use arm_fp effective target.

At least that's fine with me.

Just for the story, arm_neon.h ends up to include stubs.h from the 
sysroot. Depending on the glibc version it might assert that we cannot 
use softfp abi with a sysroot built with hardfp.

thanks for spotting this Christophe,

>
>> Thanks,
>> Richard.
>>
>>> bootstrapped / tested for :
>>>      unix/-m32/-march=i586
>>>      unix
>>>
>>>      arm-qemu/
>>>      arm-qemu//-mfpu=neon
>>>      arm-qemu//-mfpu=neon-fp-armv8
>>>
>>>      aarch64-qemu
>>>
>>>
>>>
>>>
>>>
>>>
>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr
  2016-01-25 19:06             ` Christophe Lyon
  2016-01-26  8:17               ` Christian Bruel
@ 2016-01-26  9:27               ` Kyrill Tkachov
  1 sibling, 0 replies; 11+ messages in thread
From: Kyrill Tkachov @ 2016-01-26  9:27 UTC (permalink / raw)
  To: Christophe Lyon, Richard Biener
  Cc: Christian Bruel, Richard Earnshaw, ramana.radhakrishnan,
	bschmidt, GCC Patches


On 25/01/16 19:06, Christophe Lyon wrote:
> On 22 January 2016 at 12:56, Richard Biener <richard.guenther@gmail.com> wrote:
>> On Fri, Jan 22, 2016 at 12:41 PM, Christian Bruel
>> <christian.bruel@st.com> wrote:
>>>
>>> On 01/19/2016 04:18 PM, Richard Biener wrote:
>>>> maybe just if (currently_expanding_to_rtl)?
>>>>
>>>> But yes, this looks like a safe variant of the fix.
>>>>
>>>> Richard.
>>>>
>>> thanks, currently_expanding_to_rtl works perfectly. So the final version.
>>> I added a test for each target.
>> Ok.
>>
> Hi,
>
> This small patch is needed to make the new test pass on arm hard-float
> targets (eg. arm-none-linux-gnueabihf).
>
> I'm not sure it counts as obvious, so here it is.
> OK?

Ok.

Thanks,
Kyrill

> Christophe.
>
> DATE  Christophe Lyon  <christophe.lyon@linaro.org>
>
>      * gcc.target/arm/pr68674.c: Check and use arm_fp effective target.
>
>
>> Thanks,
>> Richard.
>>
>>> bootstrapped / tested for :
>>>      unix/-m32/-march=i586
>>>      unix
>>>
>>>      arm-qemu/
>>>      arm-qemu//-mfpu=neon
>>>      arm-qemu//-mfpu=neon-fp-armv8
>>>
>>>      aarch64-qemu
>>>
>>>
>>>
>>>
>>>
>>>
>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-01-26  9:27 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-08 13:29 [PATCH][ARM,AARCH64] target/PR68674: relayout vector_types in expand_expr Christian Bruel
2016-01-14 13:09 ` ping:[PATCH][ARM,AARCH64] " Christian Bruel
2016-01-18 11:36 ` [PATCH][ARM,AARCH64] " Richard Biener
2016-01-19 15:01   ` Christian Bruel
2016-01-19 15:13     ` Christian Bruel
2016-01-19 15:18       ` Richard Biener
2016-01-22 11:41         ` Christian Bruel
2016-01-22 11:56           ` Richard Biener
2016-01-25 19:06             ` Christophe Lyon
2016-01-26  8:17               ` Christian Bruel
2016-01-26  9:27               ` Kyrill Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).