public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
@ 2015-12-08 12:53 Christian Bruel
  2015-12-08 13:01 ` Ramana Radhakrishnan
  2015-12-09 17:32 ` [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching Kyrill Tkachov
  0 siblings, 2 replies; 14+ messages in thread
From: Christian Bruel @ 2015-12-08 12:53 UTC (permalink / raw)
  To: ramana.radhakrishnan, kyrylo.tkachov, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1898 bytes --]

Hi,

The order of the NEON builtins construction has led to complications 
since the attribute target support. This was not a problem when driven 
from the command line, but was causing various issues when the builtins 
was mixed between fpu configurations or when used with LTO.

Firstly the builtin functions was not initialized before the parsing of 
functions, leading to wrong type initializations.

Then error catching code when a builtin was used without the proper fpu 
flags was incomprehensible for the user, for instance

#include "arm_neon.h"

int8x8_t a, b;
int16x8_t e;

void
main()
{
   e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
}

compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave 
pages of

/arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name 
'__simd64_int8_t'
  typedef __simd64_int8_t int8x8_t;
...
...
arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t 
{aka __vector(4) int}' to type 'int' which has different size
    return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a, 
(int64x2_t) __b, __c);
    ^~~~~~
...
... and one for each arm_neon.h lines..

by postponing the check into arm_expand_builtin, we now emit something 
more useful:

testo.c: In function 'main':
testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not 
supported in this configuration.
    e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One small side effect to note: The total memory allocated is 370k bigger 
when neon is not used, so this support will have a follow-up to make 
their initialization lazy. But I'd like first to stabilize the stuff for 
stage3 (or get it pre-approved if the memory is an issue)

tested without new failures with {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
(a few tests that was fail are now unsupported)

OK for trunk ?









[-- Attachment #2: lto-neon.patch --]
[-- Type: text/x-patch, Size: 6967 bytes --]

2015-12-07  Christian Bruel  <christian.bruel@st.com>

	* config/arm/arm-builtins.c (ARM_BUILTIN_CRYPTO_BASE): New enum tag.
	(arm_init_neon_builtins_internal): Rename arm_init_neon_builtins,
	(arm_init_crypto_builtins_internal): Rename arm_init_crypto_builtins.
	use add_builtin_function_ext_scope instead of add_builtin_function.
	(neon_set_p, neon_crypto_set_p): Remove.
	(arm_init_builtins): Always call arm_init_neon_builtins and
	arm_init_crypto_builtins.
	(arm_expand_builtin): Check ARM_BUILTIN_NEON_BASE and
	ARM_BUILTIN_CRYPTO_BASE.
	* config/arm/arm-protos.h (arm_init_neon_builtins): Remove proto.
	* config/arm/arm.c (arm_can_inline_p): Return OK for builtins.
	(arm_valid_target_attribute_tree) : Remove arm_init_neon_builtins call.

2015-12-07  Christian Bruel  <christian.bruel@st.com>

	PR target/pr68784
	PR target/pr65837
	* gcc.target/arm/pr68784.c: New test.
	* gcc.target/arm/lto/pr65837_0_attr.c: New test.
	* gcc.target/arm/lto/pr65837_0.c: Force float-abi.

Index: gcc/config/arm/arm-builtins.c
===================================================================
--- gcc/config/arm/arm-builtins.c	(revision 231363)
+++ gcc/config/arm/arm-builtins.c	(working copy)
@@ -526,6 +526,8 @@ enum arm_builtins
 #define CRYPTO3(L, U, M1, M2, M3, M4) \
   ARM_BUILTIN_CRYPTO_##U,
 
+  ARM_BUILTIN_CRYPTO_BASE,
+
 #include "crypto.def"
 
 #undef CRYPTO1
@@ -894,7 +896,7 @@ arm_init_simd_builtin_scalar_types (void
 }
 
 static void
-arm_init_neon_builtins_internal (void)
+arm_init_neon_builtins (void)
 {
   unsigned int i, fcode = ARM_BUILTIN_NEON_PATTERN_START;
 
@@ -1018,7 +1020,7 @@ arm_init_neon_builtins_internal (void)
 }
 
 static void
-arm_init_crypto_builtins_internal (void)
+arm_init_crypto_builtins (void)
 {
   tree V16UQI_type_node
     = arm_simd_builtin_type (V16QImode, true, false);
@@ -1098,25 +1100,6 @@ arm_init_crypto_builtins_internal (void)
   #undef FT3
 }
 
-static bool neon_set_p = false;
-static bool neon_crypto_set_p = false;
-
-void
-arm_init_neon_builtins (void)
-{
-  if (! neon_set_p)
-    {
-      neon_set_p = true;
-      arm_init_neon_builtins_internal ();
-    }
-
-  if (! neon_crypto_set_p && TARGET_CRYPTO && TARGET_HARD_FLOAT)
-    {
-      neon_crypto_set_p = true;
-      arm_init_crypto_builtins_internal ();
-    }
-}
-
 #undef NUM_DREG_TYPES
 #undef NUM_QREG_TYPES
 
@@ -1777,8 +1760,9 @@ arm_init_builtins (void)
      arm_init_neon_builtins which uses it.  */
   arm_init_fp16_builtins ();
 
-  if (TARGET_NEON)
-    arm_init_neon_builtins ();
+  arm_init_neon_builtins ();
+
+  arm_init_crypto_builtins ();
 
   if (TARGET_CRC32)
     arm_init_crc32_builtins ();
@@ -2332,9 +2316,26 @@ arm_expand_builtin (tree exp,
   int mask;
   int imm;
 
+  /* Check in the context of the function making the call whether the
+     builtin is supported.  */
+  if (fcode >= ARM_BUILTIN_NEON_BASE && !TARGET_NEON)
+    {
+      error ("%qE neon builtin is not supported in this configuration.",
+	     fndecl);
+      return const0_rtx;
+    }
+
   if (fcode >= ARM_BUILTIN_NEON_BASE)
     return arm_expand_neon_builtin (fcode, exp, target);
 
+  if (fcode >= ARM_BUILTIN_CRYPTO_BASE
+      && (!TARGET_CRYPTO || !TARGET_HARD_FLOAT))
+    {
+      error ("%qE crypto builtin is not supported in this configuration.",
+	     fndecl);
+      return const0_rtx;
+    }
+
   switch (fcode)
     {
     case ARM_BUILTIN_GET_FPSCR:
Index: gcc/config/arm/arm-protos.h
===================================================================
--- gcc/config/arm/arm-protos.h	(revision 231363)
+++ gcc/config/arm/arm-protos.h	(working copy)
@@ -213,7 +213,6 @@ extern void arm_mark_dllimport (tree);
 extern bool arm_change_mode_p (tree);
 #endif
 
-extern void arm_init_neon_builtins (void);
 extern tree arm_valid_target_attribute_tree (tree, struct gcc_options *,
 					     struct gcc_options *);
 extern void arm_pr_long_calls (struct cpp_reader *);
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 231363)
+++ gcc/config/arm/arm.c	(working copy)
@@ -26542,16 +26542,10 @@ thumb_set_return_address (rtx source, rt
 bool
 arm_vector_mode_supported_p (machine_mode mode)
 {
-  /* Neon also supports V2SImode, etc. listed in the clause below.  */
-  if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
+  if (mode == V2SFmode || mode == V4SImode || mode == V8HImode
       || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
-      || mode == V2DImode || mode == V8HFmode))
-    return true;
-
-  if ((TARGET_NEON || TARGET_IWMMXT)
-      && ((mode == V2SImode)
-	  || (mode == V4HImode)
-	  || (mode == V8QImode)))
+      || mode == V2DImode || mode == V8HFmode
+      || mode == V2SImode || mode == V4HImode || mode == V8QImode)
     return true;
 
   if (TARGET_INT_SIMD && (mode == V4UQQmode || mode == V4QQmode
@@ -29926,9 +29920,6 @@ arm_valid_target_attribute_tree (tree ar
   /* Do any overrides, such as global options arch=xxx.  */
   arm_option_override_internal (opts, opts_set);
 
-  if (TARGET_NEON)
-    arm_init_neon_builtins ();
-
   return build_target_option_node (opts);
 }
 
Index: gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
===================================================================
--- gcc/testsuite/gcc.target/arm/lto/pr65837_0.c	(revision 231363)
+++ gcc/testsuite/gcc.target/arm/lto/pr65837_0.c	(working copy)
@@ -1,5 +1,7 @@
 /* { dg-lto-do run } */
-/* { dg-lto-options {{-flto -mfpu=neon}} } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-skip-if "need hardfp ABI" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+/* { dg-lto-options {{-flto -mfpu=neon -mfloat-abi=hard}} } */
 /* { dg-suppress-ld-options {-mfpu=neon} } */
 
 #include "arm_neon.h"
Index: gcc/testsuite/gcc.target/arm/lto/pr65837_0_attr.c
===================================================================
--- gcc/testsuite/gcc.target/arm/lto/pr65837_0_attr.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/lto/pr65837_0_attr.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-lto-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-skip-if "need hardfp ABI" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+/* { dg-lto-options {{-flto -mfloat-abi=hard}} } */
+
+#include "arm_neon.h"
+
+float32x2_t a, b, c, e;
+
+int __attribute__ ((target("fpu=neon")))
+main()
+{
+  e = __builtin_neon_vmls_lanev2sf (a, b, c, 0);
+  return 0;
+}
+
Index: gcc/testsuite/gcc.target/arm/pr68784.c
===================================================================
--- gcc/testsuite/gcc.target/arm/pr68784.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/pr68784.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -mfloat-abi=softfp" } */
+
+#include "arm_neon.h"
+
+int8x8_t a, b;
+int16x8_t e;
+
+void
+__attribute__ ((target("fpu=neon")))
+foo(void)
+{
+  e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
+}
+

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
  2015-12-08 12:53 [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching Christian Bruel
@ 2015-12-08 13:01 ` Ramana Radhakrishnan
  2015-12-08 13:29   ` Christian Bruel
  2015-12-17 16:21   ` [PATCH, ARM] PR65835 Fix LTO support for neon builtins Christian Bruel
  2015-12-09 17:32 ` [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching Kyrill Tkachov
  1 sibling, 2 replies; 14+ messages in thread
From: Ramana Radhakrishnan @ 2015-12-08 13:01 UTC (permalink / raw)
  To: Christian Bruel; +Cc: Ramana Radhakrishnan, Kyrylo Tkachov, gcc-patches

On Tue, Dec 8, 2015 at 12:53 PM, Christian Bruel <christian.bruel@st.com> wrote:
> Hi,
>
> The order of the NEON builtins construction has led to complications since
> the attribute target support. This was not a problem when driven from the
> command line, but was causing various issues when the builtins was mixed
> between fpu configurations or when used with LTO.
>
> Firstly the builtin functions was not initialized before the parsing of
> functions, leading to wrong type initializations.
>
> Then error catching code when a builtin was used without the proper fpu
> flags was incomprehensible for the user, for instance
>
> #include "arm_neon.h"
>
> int8x8_t a, b;
> int16x8_t e;
>
> void
> main()
> {
>   e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
> }

I'm not sure what problem you are trying to solve here - The user
should never be using __builtin_neon_vaddlsv8qi (a, b) here. What
happens with vaddl_s16 intrinsic ?

They really have to only use the vaddl_s8 intrinsic.


Ramana

>
> compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave
> pages of
>
> /arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name
> '__simd64_int8_t'
>  typedef __simd64_int8_t int8x8_t;
> ...
> ...
> arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka
> __vector(4) int}' to type 'int' which has different size
>    return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a,
> (int64x2_t) __b, __c);
>    ^~~~~~
> ...
> ... and one for each arm_neon.h lines..
>
> by postponing the check into arm_expand_builtin, we now emit something more
> useful:
>
> testo.c: In function 'main':
> testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not
> supported in this configuration.
>    e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> One small side effect to note: The total memory allocated is 370k bigger
> when neon is not used, so this support will have a follow-up to make their
> initialization lazy. But I'd like first to stabilize the stuff for stage3
> (or get it pre-approved if the memory is an issue)
>
> tested without new failures with {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
> (a few tests that was fail are now unsupported)
>
> OK for trunk ?
>
>
>
>
>
>
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
  2015-12-08 13:01 ` Ramana Radhakrishnan
@ 2015-12-08 13:29   ` Christian Bruel
  2015-12-08 13:36     ` Ramana Radhakrishnan
  2015-12-17 16:21   ` [PATCH, ARM] PR65835 Fix LTO support for neon builtins Christian Bruel
  1 sibling, 1 reply; 14+ messages in thread
From: Christian Bruel @ 2015-12-08 13:29 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: Ramana Radhakrishnan, Kyrylo Tkachov, gcc-patches

Hello Ramana,

On 12/08/2015 02:01 PM, Ramana Radhakrishnan wrote:
> On Tue, Dec 8, 2015 at 12:53 PM, Christian Bruel <christian.bruel@st.com> wrote:
>> Hi,
>>
>> The order of the NEON builtins construction has led to complications since
>> the attribute target support. This was not a problem when driven from the
>> command line, but was causing various issues when the builtins was mixed
>> between fpu configurations or when used with LTO.
>>
>> Firstly the builtin functions was not initialized before the parsing of
>> functions, leading to wrong type initializations.
>>
>> Then error catching code when a builtin was used without the proper fpu
>> flags was incomprehensible for the user, for instance
>>
>> #include "arm_neon.h"
>>
>> int8x8_t a, b;
>> int16x8_t e;
>>
>> void
>> main()
>> {
>>    e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>> }
>
> I'm not sure what problem you are trying to solve here - The user
> should never be using __builtin_neon_vaddlsv8qi (a, b) here. What
> happens with vaddl_s16 intrinsic ?
>
> They really have to only use the vaddl_s8 intrinsic.


Sure, that's not the problem, replace _builtin_neon_vaddlsv8qi by 
vaddl_s8. The tests (part of the patch) equivalently fails.

But anyway, users do use the __builtin directly, see for instance the 
Bug 65837


>
>
> Ramana
>
>>
>> compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave
>> pages of
>>
>> /arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name
>> '__simd64_int8_t'
>>   typedef __simd64_int8_t int8x8_t;
>> ...
>> ...
>> arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka
>> __vector(4) int}' to type 'int' which has different size
>>     return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a,
>> (int64x2_t) __b, __c);
>>     ^~~~~~
>> ...
>> ... and one for each arm_neon.h lines..
>>
>> by postponing the check into arm_expand_builtin, we now emit something more
>> useful:
>>
>> testo.c: In function 'main':
>> testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not
>> supported in this configuration.
>>     e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>>         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> One small side effect to note: The total memory allocated is 370k bigger
>> when neon is not used, so this support will have a follow-up to make their
>> initialization lazy. But I'd like first to stabilize the stuff for stage3
>> (or get it pre-approved if the memory is an issue)
>>
>> tested without new failures with {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
>> (a few tests that was fail are now unsupported)
>>
>> OK for trunk ?
>>
>>
>>
>>
>>
>>
>>
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
  2015-12-08 13:29   ` Christian Bruel
@ 2015-12-08 13:36     ` Ramana Radhakrishnan
  2015-12-08 13:53       ` Christian Bruel
  0 siblings, 1 reply; 14+ messages in thread
From: Ramana Radhakrishnan @ 2015-12-08 13:36 UTC (permalink / raw)
  To: Christian Bruel; +Cc: Ramana Radhakrishnan, Kyrylo Tkachov, gcc-patches

On Tue, Dec 8, 2015 at 1:29 PM, Christian Bruel <christian.bruel@st.com> wrote:
> Hello Ramana,
>
> On 12/08/2015 02:01 PM, Ramana Radhakrishnan wrote:
>>
>> On Tue, Dec 8, 2015 at 12:53 PM, Christian Bruel <christian.bruel@st.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> The order of the NEON builtins construction has led to complications
>>> since
>>> the attribute target support. This was not a problem when driven from the
>>> command line, but was causing various issues when the builtins was mixed
>>> between fpu configurations or when used with LTO.
>>>
>>> Firstly the builtin functions was not initialized before the parsing of
>>> functions, leading to wrong type initializations.
>>>
>>> Then error catching code when a builtin was used without the proper fpu
>>> flags was incomprehensible for the user, for instance
>>>
>>> #include "arm_neon.h"
>>>
>>> int8x8_t a, b;
>>> int16x8_t e;
>>>
>>> void
>>> main()
>>> {
>>>    e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>>> }
>>
>>
>> I'm not sure what problem you are trying to solve here - The user
>> should never be using __builtin_neon_vaddlsv8qi (a, b) here. What
>> happens with vaddl_s16 intrinsic ?
>>
>> They really have to only use the vaddl_s8 intrinsic.
>
>
>
> Sure, that's not the problem, replace _builtin_neon_vaddlsv8qi by vaddl_s8.
> The tests (part of the patch) equivalently fails.
>
> But anyway, users do use the __builtin directly, see for instance the Bug
> 65837

I think that's just a reduced testcase from the issue to illustrate
the problem from Prathamesh who was trying to build chromium with LTO.

The __builtin_neon* aren't published anywhere and people really
shouldn't be using that directly in source code and only use the
interface in arm_neon.h which implements pretty much all the Neon
intrinsics in the ACLE document.

regards
Ramana


>
>
>
>>
>>
>> Ramana
>>
>>>
>>> compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave
>>> pages of
>>>
>>> /arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name
>>> '__simd64_int8_t'
>>>   typedef __simd64_int8_t int8x8_t;
>>> ...
>>> ...
>>> arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka
>>> __vector(4) int}' to type 'int' which has different size
>>>     return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a,
>>> (int64x2_t) __b, __c);
>>>     ^~~~~~
>>> ...
>>> ... and one for each arm_neon.h lines..
>>>
>>> by postponing the check into arm_expand_builtin, we now emit something
>>> more
>>> useful:
>>>
>>> testo.c: In function 'main':
>>> testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not
>>> supported in this configuration.
>>>     e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>>>         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> One small side effect to note: The total memory allocated is 370k bigger
>>> when neon is not used, so this support will have a follow-up to make
>>> their
>>> initialization lazy. But I'd like first to stabilize the stuff for stage3
>>> (or get it pre-approved if the memory is an issue)
>>>
>>> tested without new failures with
>>> {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
>>> (a few tests that was fail are now unsupported)
>>>
>>> OK for trunk ?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
  2015-12-08 13:36     ` Ramana Radhakrishnan
@ 2015-12-08 13:53       ` Christian Bruel
  2015-12-08 15:31         ` Christian Bruel
  2015-12-08 20:45         ` Ramana Radhakrishnan
  0 siblings, 2 replies; 14+ messages in thread
From: Christian Bruel @ 2015-12-08 13:53 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: Ramana Radhakrishnan, Kyrylo Tkachov, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 491 bytes --]


>
> The __builtin_neon* aren't published anywhere and people really
> shouldn't be using that directly in source code and only use the
> interface in arm_neon.h which implements pretty much all the Neon
> intrinsics in the ACLE document.
>

yes, I see. I wanted to reduce the problem as well, not to confuse 
anything by exposing those. sorry about this.

Here is the amended patch that use the arm_neon.h interface instead of 
the builtins. Still fixes the same issues

Thanks

Christian


[-- Attachment #2: lto-neon.patch --]
[-- Type: text/x-patch, Size: 8121 bytes --]

2015-12-07  Christian Bruel  <christian.bruel@st.com>

	* config/arm/arm-builtins.c (ARM_BUILTIN_CRYPTO_BASE): New enum tag.
	(arm_init_neon_builtins_internal): Rename arm_init_neon_builtins,
	(arm_init_crypto_builtins_internal): Rename arm_init_crypto_builtins.
	use add_builtin_function_ext_scope instead of add_builtin_function.
	(neon_set_p, neon_crypto_set_p): Remove.
	(arm_init_builtins): Always call arm_init_neon_builtins and
	arm_init_crypto_builtins.
	(arm_expand_builtin): Check ARM_BUILTIN_NEON_BASE and
	ARM_BUILTIN_CRYPTO_BASE.
	* config/arm/arm-protos.h (arm_init_neon_builtins): Remove proto.
	* config/arm/arm.c (arm_can_inline_p): Return OK for builtins.
	(arm_valid_target_attribute_tree) : Remove arm_init_neon_builtins call.

2015-12-07  Christian Bruel  <christian.bruel@st.com>

	PR target/pr68784
	PR target/pr65837
	* gcc.target/arm/pr68784.c: New test.
	* gcc.target/arm/lto/pr65837_0_attr.c: New test.
	* gcc.target/arm/lto/pr65837_0.c: Force float-abi.

Index: gcc/config/arm/arm-builtins.c
===================================================================
--- gcc/config/arm/arm-builtins.c	(revision 231363)
+++ gcc/config/arm/arm-builtins.c	(working copy)
@@ -526,6 +526,8 @@ enum arm_builtins
 #define CRYPTO3(L, U, M1, M2, M3, M4) \
   ARM_BUILTIN_CRYPTO_##U,
 
+  ARM_BUILTIN_CRYPTO_BASE,
+
 #include "crypto.def"
 
 #undef CRYPTO1
@@ -894,7 +896,7 @@ arm_init_simd_builtin_scalar_types (void
 }
 
 static void
-arm_init_neon_builtins_internal (void)
+arm_init_neon_builtins (void)
 {
   unsigned int i, fcode = ARM_BUILTIN_NEON_PATTERN_START;
 
@@ -1018,7 +1020,7 @@ arm_init_neon_builtins_internal (void)
 }
 
 static void
-arm_init_crypto_builtins_internal (void)
+arm_init_crypto_builtins (void)
 {
   tree V16UQI_type_node
     = arm_simd_builtin_type (V16QImode, true, false);
@@ -1098,25 +1100,6 @@ arm_init_crypto_builtins_internal (void)
   #undef FT3
 }
 
-static bool neon_set_p = false;
-static bool neon_crypto_set_p = false;
-
-void
-arm_init_neon_builtins (void)
-{
-  if (! neon_set_p)
-    {
-      neon_set_p = true;
-      arm_init_neon_builtins_internal ();
-    }
-
-  if (! neon_crypto_set_p && TARGET_CRYPTO && TARGET_HARD_FLOAT)
-    {
-      neon_crypto_set_p = true;
-      arm_init_crypto_builtins_internal ();
-    }
-}
-
 #undef NUM_DREG_TYPES
 #undef NUM_QREG_TYPES
 
@@ -1777,8 +1760,9 @@ arm_init_builtins (void)
      arm_init_neon_builtins which uses it.  */
   arm_init_fp16_builtins ();
 
-  if (TARGET_NEON)
-    arm_init_neon_builtins ();
+  arm_init_neon_builtins ();
+
+  arm_init_crypto_builtins ();
 
   if (TARGET_CRC32)
     arm_init_crc32_builtins ();
@@ -2332,9 +2316,26 @@ arm_expand_builtin (tree exp,
   int mask;
   int imm;
 
+  /* Check in the context of the function making the call whether the
+     builtin is supported.  */
+  if (fcode >= ARM_BUILTIN_NEON_BASE && !TARGET_NEON)
+    {
+      error ("%qE neon builtin is not supported in this configuration.",
+	     fndecl);
+      return const0_rtx;
+    }
+
   if (fcode >= ARM_BUILTIN_NEON_BASE)
     return arm_expand_neon_builtin (fcode, exp, target);
 
+  if (fcode >= ARM_BUILTIN_CRYPTO_BASE
+      && (!TARGET_CRYPTO || !TARGET_HARD_FLOAT))
+    {
+      error ("%qE crypto builtin is not supported in this configuration.",
+	     fndecl);
+      return const0_rtx;
+    }
+
   switch (fcode)
     {
     case ARM_BUILTIN_GET_FPSCR:
Index: gcc/config/arm/arm-protos.h
===================================================================
--- gcc/config/arm/arm-protos.h	(revision 231363)
+++ gcc/config/arm/arm-protos.h	(working copy)
@@ -213,7 +213,6 @@ extern void arm_mark_dllimport (tree);
 extern bool arm_change_mode_p (tree);
 #endif
 
-extern void arm_init_neon_builtins (void);
 extern tree arm_valid_target_attribute_tree (tree, struct gcc_options *,
 					     struct gcc_options *);
 extern void arm_pr_long_calls (struct cpp_reader *);
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 231363)
+++ gcc/config/arm/arm.c	(working copy)
@@ -26542,16 +26542,10 @@ thumb_set_return_address (rtx source, rt
 bool
 arm_vector_mode_supported_p (machine_mode mode)
 {
-  /* Neon also supports V2SImode, etc. listed in the clause below.  */
-  if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
+  if (mode == V2SFmode || mode == V4SImode || mode == V8HImode
       || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
-      || mode == V2DImode || mode == V8HFmode))
-    return true;
-
-  if ((TARGET_NEON || TARGET_IWMMXT)
-      && ((mode == V2SImode)
-	  || (mode == V4HImode)
-	  || (mode == V8QImode)))
+      || mode == V2DImode || mode == V8HFmode
+      || mode == V2SImode || mode == V4HImode || mode == V8QImode)
     return true;
 
   if (TARGET_INT_SIMD && (mode == V4UQQmode || mode == V4QQmode
@@ -29926,9 +29920,6 @@ arm_valid_target_attribute_tree (tree ar
   /* Do any overrides, such as global options arch=xxx.  */
   arm_option_override_internal (opts, opts_set);
 
-  if (TARGET_NEON)
-    arm_init_neon_builtins ();
-
   return build_target_option_node (opts);
 }
 
Index: gcc/testsuite/gcc.target/arm/attr-crypto.c
===================================================================
--- gcc/testsuite/gcc.target/arm/attr-crypto.c	(revision 231363)
+++ gcc/testsuite/gcc.target/arm/attr-crypto.c	(working copy)
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_crypto_ok } */
-/* { dg-options "-O2 -mfloat-abi=softfp" } */
+/* { dg-additional-options "-mfpu=vfp -mfloat-abi=softfp" } */
 
 #pragma GCC target ("fpu=crypto-neon-fp-armv8")
 
Index: gcc/testsuite/gcc.target/arm/attr_thumb-static.c
===================================================================
--- gcc/testsuite/gcc.target/arm/attr_thumb-static.c	(revision 231363)
+++ gcc/testsuite/gcc.target/arm/attr_thumb-static.c	(working copy)
@@ -1,5 +1,6 @@
 /* Check that a change mode to a static function is correctly handled. */
 /* { dg-do run } */
+/* { dg-skip-if "Need thumb support" { ! { arm_thumb1_ok || arm_thumb2_ok } } { "*" } { "" } } */
 
 static void
  __attribute__((__noinline__)) 
Index: gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
===================================================================
--- gcc/testsuite/gcc.target/arm/lto/pr65837_0.c	(revision 231363)
+++ gcc/testsuite/gcc.target/arm/lto/pr65837_0.c	(working copy)
@@ -1,5 +1,7 @@
 /* { dg-lto-do run } */
-/* { dg-lto-options {{-flto -mfpu=neon}} } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-skip-if "need hardfp ABI" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+/* { dg-lto-options {{-flto -mfpu=neon -mfloat-abi=hard}} } */
 /* { dg-suppress-ld-options {-mfpu=neon} } */
 
 #include "arm_neon.h"
@@ -8,7 +10,7 @@ float32x2_t a, b, c, e;
 
 int main()
 {
-  e = __builtin_neon_vmls_lanev2sf (a, b, c, 0);
+  e = vmls_lane_f32 (a, b, c, 0);
   return 0;
 }
 
Index: gcc/testsuite/gcc.target/arm/lto/pr65837_0_attr.c
===================================================================
--- gcc/testsuite/gcc.target/arm/lto/pr65837_0_attr.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/lto/pr65837_0_attr.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-lto-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-skip-if "need hardfp ABI" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+/* { dg-lto-options {{-flto -mfloat-abi=hard}} } */
+
+#include "arm_neon.h"
+
+float32x2_t a, b, c, e;
+
+int __attribute__ ((target("fpu=neon")))
+main()
+{
+  e = vmls_lane_f32 (a, b, c, 0);
+  return 0;
+}
+
Index: gcc/testsuite/gcc.target/arm/pr68784.c
===================================================================
--- gcc/testsuite/gcc.target/arm/pr68784.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/pr68784.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -mfloat-abi=softfp" } */
+
+#include "arm_neon.h"
+
+int8x8_t a, b;
+int16x8_t e;
+
+void
+__attribute__ ((target("fpu=neon")))
+foo(void)
+{
+  e = (int16x8_t) vaddl_s8(a, b);
+}
+

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
  2015-12-08 13:53       ` Christian Bruel
@ 2015-12-08 15:31         ` Christian Bruel
  2015-12-08 20:45         ` Ramana Radhakrishnan
  1 sibling, 0 replies; 14+ messages in thread
From: Christian Bruel @ 2015-12-08 15:31 UTC (permalink / raw)
  To: ramana.gcc, kyrylo.tkachov; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 114 bytes --]

there was a few chunks in the testsuite that should not be part of the 
previous patch, Here is it again.









[-- Attachment #2: lto-neon.patch --]
[-- Type: text/x-patch, Size: 7156 bytes --]

2015-12-07  Christian Bruel  <christian.bruel@st.com>

	* config/arm/arm-builtins.c (ARM_BUILTIN_CRYPTO_BASE): New enum tag.
	(arm_init_neon_builtins_internal): Rename arm_init_neon_builtins,
	(arm_init_crypto_builtins_internal): Rename arm_init_crypto_builtins.
	use add_builtin_function_ext_scope instead of add_builtin_function.
	(neon_set_p, neon_crypto_set_p): Remove.
	(arm_init_builtins): Always call arm_init_neon_builtins and
	arm_init_crypto_builtins.
	(arm_expand_builtin): Check ARM_BUILTIN_NEON_BASE and
	ARM_BUILTIN_CRYPTO_BASE.
	* config/arm/arm-protos.h (arm_init_neon_builtins): Remove proto.
	* config/arm/arm.c (arm_can_inline_p): Return OK for builtins.
	(arm_valid_target_attribute_tree) : Remove arm_init_neon_builtins call.

2015-12-07  Christian Bruel  <christian.bruel@st.com>

	PR target/pr68784
	PR target/pr65837
	* gcc.target/arm/pr68784.c: New test.
	* gcc.target/arm/lto/pr65837_0_attr.c: New test.
	* gcc.target/arm/lto/pr65837_0.c: Force float-abi.
	Use vmls_lane_f32 instead of __builtin_neon_vmls_lanev2sf.

Index: gcc/config/arm/arm-builtins.c
===================================================================
--- gcc/config/arm/arm-builtins.c	(revision 231363)
+++ gcc/config/arm/arm-builtins.c	(working copy)
@@ -526,6 +526,8 @@ enum arm_builtins
 #define CRYPTO3(L, U, M1, M2, M3, M4) \
   ARM_BUILTIN_CRYPTO_##U,
 
+  ARM_BUILTIN_CRYPTO_BASE,
+
 #include "crypto.def"
 
 #undef CRYPTO1
@@ -894,7 +896,7 @@ arm_init_simd_builtin_scalar_types (void
 }
 
 static void
-arm_init_neon_builtins_internal (void)
+arm_init_neon_builtins (void)
 {
   unsigned int i, fcode = ARM_BUILTIN_NEON_PATTERN_START;
 
@@ -1018,7 +1020,7 @@ arm_init_neon_builtins_internal (void)
 }
 
 static void
-arm_init_crypto_builtins_internal (void)
+arm_init_crypto_builtins (void)
 {
   tree V16UQI_type_node
     = arm_simd_builtin_type (V16QImode, true, false);
@@ -1098,25 +1100,6 @@ arm_init_crypto_builtins_internal (void)
   #undef FT3
 }
 
-static bool neon_set_p = false;
-static bool neon_crypto_set_p = false;
-
-void
-arm_init_neon_builtins (void)
-{
-  if (! neon_set_p)
-    {
-      neon_set_p = true;
-      arm_init_neon_builtins_internal ();
-    }
-
-  if (! neon_crypto_set_p && TARGET_CRYPTO && TARGET_HARD_FLOAT)
-    {
-      neon_crypto_set_p = true;
-      arm_init_crypto_builtins_internal ();
-    }
-}
-
 #undef NUM_DREG_TYPES
 #undef NUM_QREG_TYPES
 
@@ -1777,8 +1760,9 @@ arm_init_builtins (void)
      arm_init_neon_builtins which uses it.  */
   arm_init_fp16_builtins ();
 
-  if (TARGET_NEON)
-    arm_init_neon_builtins ();
+  arm_init_neon_builtins ();
+
+  arm_init_crypto_builtins ();
 
   if (TARGET_CRC32)
     arm_init_crc32_builtins ();
@@ -2332,9 +2316,26 @@ arm_expand_builtin (tree exp,
   int mask;
   int imm;
 
+  /* Check in the context of the function making the call whether the
+     builtin is supported.  */
+  if (fcode >= ARM_BUILTIN_NEON_BASE && !TARGET_NEON)
+    {
+      error ("%qE neon builtin is not supported in this configuration.",
+	     fndecl);
+      return const0_rtx;
+    }
+
   if (fcode >= ARM_BUILTIN_NEON_BASE)
     return arm_expand_neon_builtin (fcode, exp, target);
 
+  if (fcode >= ARM_BUILTIN_CRYPTO_BASE
+      && (!TARGET_CRYPTO || !TARGET_HARD_FLOAT))
+    {
+      error ("%qE crypto builtin is not supported in this configuration.",
+	     fndecl);
+      return const0_rtx;
+    }
+
   switch (fcode)
     {
     case ARM_BUILTIN_GET_FPSCR:
Index: gcc/config/arm/arm-protos.h
===================================================================
--- gcc/config/arm/arm-protos.h	(revision 231363)
+++ gcc/config/arm/arm-protos.h	(working copy)
@@ -213,7 +213,6 @@ extern void arm_mark_dllimport (tree);
 extern bool arm_change_mode_p (tree);
 #endif
 
-extern void arm_init_neon_builtins (void);
 extern tree arm_valid_target_attribute_tree (tree, struct gcc_options *,
 					     struct gcc_options *);
 extern void arm_pr_long_calls (struct cpp_reader *);
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 231363)
+++ gcc/config/arm/arm.c	(working copy)
@@ -26542,16 +26542,10 @@ thumb_set_return_address (rtx source, rt
 bool
 arm_vector_mode_supported_p (machine_mode mode)
 {
-  /* Neon also supports V2SImode, etc. listed in the clause below.  */
-  if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
+  if (mode == V2SFmode || mode == V4SImode || mode == V8HImode
       || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
-      || mode == V2DImode || mode == V8HFmode))
-    return true;
-
-  if ((TARGET_NEON || TARGET_IWMMXT)
-      && ((mode == V2SImode)
-	  || (mode == V4HImode)
-	  || (mode == V8QImode)))
+      || mode == V2DImode || mode == V8HFmode
+      || mode == V2SImode || mode == V4HImode || mode == V8QImode)
     return true;
 
   if (TARGET_INT_SIMD && (mode == V4UQQmode || mode == V4QQmode
@@ -29926,9 +29920,6 @@ arm_valid_target_attribute_tree (tree ar
   /* Do any overrides, such as global options arch=xxx.  */
   arm_option_override_internal (opts, opts_set);
 
-  if (TARGET_NEON)
-    arm_init_neon_builtins ();
-
   return build_target_option_node (opts);
 }
 
Index: gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
===================================================================
--- gcc/testsuite/gcc.target/arm/lto/pr65837_0.c	(revision 231363)
+++ gcc/testsuite/gcc.target/arm/lto/pr65837_0.c	(working copy)
@@ -1,5 +1,7 @@
 /* { dg-lto-do run } */
-/* { dg-lto-options {{-flto -mfpu=neon}} } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-skip-if "need hardfp ABI" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+/* { dg-lto-options {{-flto -mfpu=neon -mfloat-abi=hard}} } */
 /* { dg-suppress-ld-options {-mfpu=neon} } */
 
 #include "arm_neon.h"
@@ -8,7 +10,7 @@ float32x2_t a, b, c, e;
 
 int main()
 {
-  e = __builtin_neon_vmls_lanev2sf (a, b, c, 0);
+  e = vmls_lane_f32 (a, b, c, 0);
   return 0;
 }
 
Index: gcc/testsuite/gcc.target/arm/lto/pr65837_0_attr.c
===================================================================
--- gcc/testsuite/gcc.target/arm/lto/pr65837_0_attr.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/lto/pr65837_0_attr.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-lto-do run } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-skip-if "need hardfp ABI" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+/* { dg-lto-options {{-flto -mfloat-abi=hard}} } */
+
+#include "arm_neon.h"
+
+float32x2_t a, b, c, e;
+
+int __attribute__ ((target("fpu=neon")))
+main()
+{
+  e = vmls_lane_f32 (a, b, c, 0);
+  return 0;
+}
+
Index: gcc/testsuite/gcc.target/arm/pr68784.c
===================================================================
--- gcc/testsuite/gcc.target/arm/pr68784.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/pr68784.c	(working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -mfloat-abi=softfp" } */
+
+#include "arm_neon.h"
+
+int8x8_t a, b;
+int16x8_t e;
+
+void
+__attribute__ ((target("fpu=neon")))
+foo(void)
+{
+  e = (int16x8_t) vaddl_s8(a, b);
+}
+

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
  2015-12-08 13:53       ` Christian Bruel
  2015-12-08 15:31         ` Christian Bruel
@ 2015-12-08 20:45         ` Ramana Radhakrishnan
  2015-12-09 16:08           ` Christian Bruel
  1 sibling, 1 reply; 14+ messages in thread
From: Ramana Radhakrishnan @ 2015-12-08 20:45 UTC (permalink / raw)
  To: Christian Bruel, Ramana Radhakrishnan; +Cc: Kyrylo Tkachov, gcc-patches



On 08/12/15 13:53, Christian Bruel wrote:
> 
>>
>> The __builtin_neon* aren't published anywhere and people really
>> shouldn't be using that directly in source code and only use the
>> interface in arm_neon.h which implements pretty much all the Neon
>> intrinsics in the ACLE document.
>>
> 
> yes, I see. I wanted to reduce the problem as well, not to confuse anything by exposing those. sorry about this.
> 
> Here is the amended patch that use the arm_neon.h interface instead of the builtins. Still fixes the same issues
> 
> Thanks
> 
> Christian
> 

> lto-neon.patch
> 
> 2015-12-07  Christian Bruel  <christian.bruel@st.com>
> 
> 	* config/arm/arm-builtins.c (ARM_BUILTIN_CRYPTO_BASE): New enum tag.
> 	(arm_init_neon_builtins_internal): Rename arm_init_neon_builtins,
> 	(arm_init_crypto_builtins_internal): Rename arm_init_crypto_builtins.
> 	use add_builtin_function_ext_scope instead of add_builtin_function.
> 	(neon_set_p, neon_crypto_set_p): Remove.
> 	(arm_init_builtins): Always call arm_init_neon_builtins and
> 	arm_init_crypto_builtins.
> 	(arm_expand_builtin): Check ARM_BUILTIN_NEON_BASE and
> 	ARM_BUILTIN_CRYPTO_BASE.
> 	* config/arm/arm-protos.h (arm_init_neon_builtins): Remove proto.
> 	* config/arm/arm.c (arm_can_inline_p): Return OK for builtins.
> 	(arm_valid_target_attribute_tree) : Remove arm_init_neon_builtins call.
> 
> 2015-12-07  Christian Bruel  <christian.bruel@st.com>
> 
> 	PR target/pr68784
> 	PR target/pr65837
> 	* gcc.target/arm/pr68784.c: New test.
> 	* gcc.target/arm/lto/pr65837_0_attr.c: New test.
> 	* gcc.target/arm/lto/pr65837_0.c: Force float-abi.
> 
> Index: gcc/config/arm/arm-builtins.c
> ===================================================================
> --- gcc/config/arm/arm-builtins.c	(revision 231363)
> +++ gcc/config/arm/arm-builtins.c	(working copy)
> @@ -526,6 +526,8 @@ enum arm_builtins
>  #define CRYPTO3(L, U, M1, M2, M3, M4) \
>    ARM_BUILTIN_CRYPTO_##U,
>  
> +  ARM_BUILTIN_CRYPTO_BASE,
> +
>  #include "crypto.def"
>  
>  #undef CRYPTO1
> @@ -894,7 +896,7 @@ arm_init_simd_builtin_scalar_types (void
>  }
>  
>  static void
> -arm_init_neon_builtins_internal (void)
> +arm_init_neon_builtins (void)
>  {
>    unsigned int i, fcode = ARM_BUILTIN_NEON_PATTERN_START;
>  
> @@ -1018,7 +1020,7 @@ arm_init_neon_builtins_internal (void)
>  }
>  
>  static void
> -arm_init_crypto_builtins_internal (void)
> +arm_init_crypto_builtins (void)
>  {
>    tree V16UQI_type_node
>      = arm_simd_builtin_type (V16QImode, true, false);
> @@ -1098,25 +1100,6 @@ arm_init_crypto_builtins_internal (void)
>    #undef FT3
>  }
>  
> -static bool neon_set_p = false;
> -static bool neon_crypto_set_p = false;
> -
> -void
> -arm_init_neon_builtins (void)
> -{
> -  if (! neon_set_p)
> -    {
> -      neon_set_p = true;
> -      arm_init_neon_builtins_internal ();
> -    }
> -
> -  if (! neon_crypto_set_p && TARGET_CRYPTO && TARGET_HARD_FLOAT)
> -    {
> -      neon_crypto_set_p = true;
> -      arm_init_crypto_builtins_internal ();
> -    }
> -}
> -
>  #undef NUM_DREG_TYPES
>  #undef NUM_QREG_TYPES
>  
> @@ -1777,8 +1760,9 @@ arm_init_builtins (void)
>       arm_init_neon_builtins which uses it.  */
>    arm_init_fp16_builtins ();
>  
> -  if (TARGET_NEON)
> -    arm_init_neon_builtins ();
> +  arm_init_neon_builtins ();
> +
> +  arm_init_crypto_builtins ();
>  
>    if (TARGET_CRC32)
>      arm_init_crc32_builtins ();
> @@ -2332,9 +2316,26 @@ arm_expand_builtin (tree exp,
>    int mask;
>    int imm;
>  
> +  /* Check in the context of the function making the call whether the
> +     builtin is supported.  */
> +  if (fcode >= ARM_BUILTIN_NEON_BASE && !TARGET_NEON)
> +    {
> +      error ("%qE neon builtin is not supported in this configuration.",
> +	     fndecl);
> +      return const0_rtx;
> +    }

Can we make this error message more user friendly.

"You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) to use these intrinsics"

> +
>    if (fcode >= ARM_BUILTIN_NEON_BASE)
>      return arm_expand_neon_builtin (fcode, exp, target);
>  
> +  if (fcode >= ARM_BUILTIN_CRYPTO_BASE
> +      && (!TARGET_CRYPTO || !TARGET_HARD_FLOAT))
> +    {
> +      error ("%qE crypto builtin is not supported in this configuration.",
> +	     fndecl);
> +      return const0_rtx;
> +    }

"You must enable crypto intrinsics (e.g. -mfloat-abi=softfp -mfpu=crypto-neon...) to use these intrinsics" 

I'm still playing with this patch.

regards
Ramana





> +




>    switch (fcode)
>      {
>      case ARM_BUILTIN_GET_FPSCR:
> Index: gcc/config/arm/arm-protos.h
> ===================================================================
> --- gcc/config/arm/arm-protos.h	(revision 231363)
> +++ gcc/config/arm/arm-protos.h	(working copy)
> @@ -213,7 +213,6 @@ extern void arm_mark_dllimport (tree);
>  extern bool arm_change_mode_p (tree);
>  #endif
>  
> -extern void arm_init_neon_builtins (void);
>  extern tree arm_valid_target_attribute_tree (tree, struct gcc_options *,
>  					     struct gcc_options *);
>  extern void arm_pr_long_calls (struct cpp_reader *);
> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c	(revision 231363)
> +++ gcc/config/arm/arm.c	(working copy)
> @@ -26542,16 +26542,10 @@ thumb_set_return_address (rtx source, rt
>  bool
>  arm_vector_mode_supported_p (machine_mode mode)
>  {
> -  /* Neon also supports V2SImode, etc. listed in the clause below.  */
> -  if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
> +  if (mode == V2SFmode || mode == V4SImode || mode == V8HImode
>        || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
> -      || mode == V2DImode || mode == V8HFmode))
> -    return true;
> -
> -  if ((TARGET_NEON || TARGET_IWMMXT)
> -      && ((mode == V2SImode)
> -	  || (mode == V4HImode)
> -	  || (mode == V8QImode)))
> +      || mode == V2DImode || mode == V8HFmode
> +      || mode == V2SImode || mode == V4HImode || mode == V8QImode)
>      return true;
>  
>    if (TARGET_INT_SIMD && (mode == V4UQQmode || mode == V4QQmode
> @@ -29926,9 +29920,6 @@ arm_valid_target_attribute_tree (tree ar
>    /* Do any overrides, such as global options arch=xxx.  */
>    arm_option_override_internal (opts, opts_set);
>  
> -  if (TARGET_NEON)
> -    arm_init_neon_builtins ();
> -
>    return build_target_option_node (opts);
>  }
>  
> Index: gcc/testsuite/gcc.target/arm/attr-crypto.c
> ===================================================================
> --- gcc/testsuite/gcc.target/arm/attr-crypto.c	(revision 231363)
> +++ gcc/testsuite/gcc.target/arm/attr-crypto.c	(working copy)
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target arm_crypto_ok } */
> -/* { dg-options "-O2 -mfloat-abi=softfp" } */
> +/* { dg-additional-options "-mfpu=vfp -mfloat-abi=softfp" } */
>  
>  #pragma GCC target ("fpu=crypto-neon-fp-armv8")
>  
> Index: gcc/testsuite/gcc.target/arm/attr_thumb-static.c
> ===================================================================
> --- gcc/testsuite/gcc.target/arm/attr_thumb-static.c	(revision 231363)
> +++ gcc/testsuite/gcc.target/arm/attr_thumb-static.c	(working copy)
> @@ -1,5 +1,6 @@
>  /* Check that a change mode to a static function is correctly handled. */
>  /* { dg-do run } */
> +/* { dg-skip-if "Need thumb support" { ! { arm_thumb1_ok || arm_thumb2_ok } } { "*" } { "" } } */
>  
>  static void
>   __attribute__((__noinline__)) 
> Index: gcc/testsuite/gcc.target/arm/lto/pr65837_0.c
> ===================================================================
> --- gcc/testsuite/gcc.target/arm/lto/pr65837_0.c	(revision 231363)
> +++ gcc/testsuite/gcc.target/arm/lto/pr65837_0.c	(working copy)
> @@ -1,5 +1,7 @@
>  /* { dg-lto-do run } */
> -/* { dg-lto-options {{-flto -mfpu=neon}} } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-skip-if "need hardfp ABI" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
> +/* { dg-lto-options {{-flto -mfpu=neon -mfloat-abi=hard}} } */
>  /* { dg-suppress-ld-options {-mfpu=neon} } */
>  
>  #include "arm_neon.h"
> @@ -8,7 +10,7 @@ float32x2_t a, b, c, e;
>  
>  int main()
>  {
> -  e = __builtin_neon_vmls_lanev2sf (a, b, c, 0);
> +  e = vmls_lane_f32 (a, b, c, 0);
>    return 0;
>  }
>  
> Index: gcc/testsuite/gcc.target/arm/lto/pr65837_0_attr.c
> ===================================================================
> --- gcc/testsuite/gcc.target/arm/lto/pr65837_0_attr.c	(revision 0)
> +++ gcc/testsuite/gcc.target/arm/lto/pr65837_0_attr.c	(working copy)
> @@ -0,0 +1,16 @@
> +/* { dg-lto-do run } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-skip-if "need hardfp ABI" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
> +/* { dg-lto-options {{-flto -mfloat-abi=hard}} } */
> +
> +#include "arm_neon.h"
> +
> +float32x2_t a, b, c, e;
> +
> +int __attribute__ ((target("fpu=neon")))
> +main()
> +{
> +  e = vmls_lane_f32 (a, b, c, 0);
> +  return 0;
> +}
> +
> Index: gcc/testsuite/gcc.target/arm/pr68784.c
> ===================================================================
> --- gcc/testsuite/gcc.target/arm/pr68784.c	(revision 0)
> +++ gcc/testsuite/gcc.target/arm/pr68784.c	(working copy)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-options "-O2 -mfloat-abi=softfp" } */
> +
> +#include "arm_neon.h"
> +
> +int8x8_t a, b;
> +int16x8_t e;
> +
> +void
> +__attribute__ ((target("fpu=neon")))
> +foo(void)
> +{
> +  e = (int16x8_t) vaddl_s8(a, b);
> +}
> +

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
  2015-12-08 20:45         ` Ramana Radhakrishnan
@ 2015-12-09 16:08           ` Christian Bruel
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Bruel @ 2015-12-09 16:08 UTC (permalink / raw)
  To: Ramana Radhakrishnan, Ramana Radhakrishnan; +Cc: Kyrylo Tkachov, gcc-patches



>> +  /* Check in the context of the function making the call whether the
>> +     builtin is supported.  */
>> +  if (fcode >= ARM_BUILTIN_NEON_BASE && !TARGET_NEON)
>> +    {
>> +      error ("%qE neon builtin is not supported in this configuration.",
>> +	     fndecl);
>> +      return const0_rtx;
>> +    }
>
> Can we make this error message more user friendly.
>
> "You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) to use these intrinsics"

yes, maybe also mention here arm_neon.h since this error is a sanity 
catch in case of direct __builtin calls.

what about something like:

"You must enable NEON instructions (e.g. -mfloat-abi=softfp -mfpu=neon) 
and use the functions from arm_neon.h"

?

thanks to the #pragma, calls from arm_neon.h should never result into 
this error (eventually a target specific option mismatch caught from 
arm_can_inline_p in case of mismatch with the caller)

>
>> +
>>     if (fcode >= ARM_BUILTIN_NEON_BASE)
>>       return arm_expand_neon_builtin (fcode, exp, target);
>>
>> +  if (fcode >= ARM_BUILTIN_CRYPTO_BASE
>> +      && (!TARGET_CRYPTO || !TARGET_HARD_FLOAT))
>> +    {
>> +      error ("%qE crypto builtin is not supported in this configuration.",
>> +	     fndecl);
>> +      return const0_rtx;
>> +    }
>
> "You must enable crypto intrinsics (e.g. -mfloat-abi=softfp -mfpu=crypto-neon...) to use these intrinsics"

yes, same as above

>
> I'm still playing with this patch.

thanks,

Regards

Christian

>
> regards
> Ramana
>
>
>
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
  2015-12-08 12:53 [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching Christian Bruel
  2015-12-08 13:01 ` Ramana Radhakrishnan
@ 2015-12-09 17:32 ` Kyrill Tkachov
  2015-12-10  9:26   ` Christian Bruel
  1 sibling, 1 reply; 14+ messages in thread
From: Kyrill Tkachov @ 2015-12-09 17:32 UTC (permalink / raw)
  To: Christian Bruel, ramana.radhakrishnan, gcc-patches

Hi Christian,

On 08/12/15 12:53, Christian Bruel wrote:
> Hi,
>
> The order of the NEON builtins construction has led to complications since the attribute target support. This was not a problem when driven from the command line, but was causing various issues when the builtins was mixed between fpu 
> configurations or when used with LTO.
>
> Firstly the builtin functions was not initialized before the parsing of functions, leading to wrong type initializations.
>
> Then error catching code when a builtin was used without the proper fpu flags was incomprehensible for the user, for instance
>
> #include "arm_neon.h"
>
> int8x8_t a, b;
> int16x8_t e;
>
> void
> main()
> {
>   e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
> }
>
> compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave pages of
>
> /arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name '__simd64_int8_t'
>  typedef __simd64_int8_t int8x8_t;
> ...
> ...
> arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka __vector(4) int}' to type 'int' which has different size
>    return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c);
>    ^~~~~~
> ...
> ... and one for each arm_neon.h lines..
>
> by postponing the check into arm_expand_builtin, we now emit something more useful:
>
> testo.c: In function 'main':
> testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not supported in this configuration.
>    e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> One small side effect to note: The total memory allocated is 370k bigger when neon is not used, so this support will have a follow-up to make their initialization lazy. But I'd like first to stabilize the stuff for stage3 (or get it 
> pre-approved if the memory is an issue)
>
> tested without new failures with {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
> (a few tests that was fail are now unsupported)
>

I agree, the vector types (re)initialisation is a tricky part.
I've seen similar issues in the aarch64 work for target attributes

  bool
  arm_vector_mode_supported_p (machine_mode mode)
  {
-  /* Neon also supports V2SImode, etc. listed in the clause below.  */
-  if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
+  if (mode == V2SFmode || mode == V4SImode || mode == V8HImode
        || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
-      || mode == V2DImode || mode == V8HFmode))
-    return true;
-
-  if ((TARGET_NEON || TARGET_IWMMXT)
-      && ((mode == V2SImode)
-	  || (mode == V4HImode)
-	  || (mode == V8QImode)))
+      || mode == V2DImode || mode == V8HFmode
+      || mode == V2SImode || mode == V4HImode || mode == V8QImode)
      return true;
  

So this allows vector modes unconditionally for all targets/fpu configurations?
I was tempted to do that in aarch64 when I was encountering similar issues.
In the end what worked for me was re-laying out the vector types in SET_CURRENT_FUNCTION
if necessary (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01084.html)

Kyrill

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
  2015-12-09 17:32 ` [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching Kyrill Tkachov
@ 2015-12-10  9:26   ` Christian Bruel
  2015-12-10  9:59     ` Kyrill Tkachov
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Bruel @ 2015-12-10  9:26 UTC (permalink / raw)
  To: Kyrill Tkachov, ramana.radhakrishnan, gcc-patches

Hi Kyrill,

On 12/09/2015 06:32 PM, Kyrill Tkachov wrote:
> Hi Christian,
>
> On 08/12/15 12:53, Christian Bruel wrote:
>> Hi,
>>
>> The order of the NEON builtins construction has led to complications since the attribute target support. This was not a problem when driven from the command line, but was causing various issues when the builtins was mixed between fpu
>> configurations or when used with LTO.
>>
>> Firstly the builtin functions was not initialized before the parsing of functions, leading to wrong type initializations.
>>
>> Then error catching code when a builtin was used without the proper fpu flags was incomprehensible for the user, for instance
>>
>> #include "arm_neon.h"
>>
>> int8x8_t a, b;
>> int16x8_t e;
>>
>> void
>> main()
>> {
>>    e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>> }
>>
>> compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave pages of
>>
>> /arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name '__simd64_int8_t'
>>   typedef __simd64_int8_t int8x8_t;
>> ...
>> ...
>> arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka __vector(4) int}' to type 'int' which has different size
>>     return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c);
>>     ^~~~~~
>> ...
>> ... and one for each arm_neon.h lines..
>>
>> by postponing the check into arm_expand_builtin, we now emit something more useful:
>>
>> testo.c: In function 'main':
>> testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not supported in this configuration.
>>     e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>>         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> One small side effect to note: The total memory allocated is 370k bigger when neon is not used, so this support will have a follow-up to make their initialization lazy. But I'd like first to stabilize the stuff for stage3 (or get it
>> pre-approved if the memory is an issue)
>>
>> tested without new failures with {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
>> (a few tests that was fail are now unsupported)
>>
>
> I agree, the vector types (re)initialisation is a tricky part.
> I've seen similar issues in the aarch64 work for target attributes
>
>    bool
>    arm_vector_mode_supported_p (machine_mode mode)
>    {
> -  /* Neon also supports V2SImode, etc. listed in the clause below.  */
> -  if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
> +  if (mode == V2SFmode || mode == V4SImode || mode == V8HImode
>          || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
> -      || mode == V2DImode || mode == V8HFmode))
> -    return true;
> -
> -  if ((TARGET_NEON || TARGET_IWMMXT)
> -      && ((mode == V2SImode)
> -	  || (mode == V4HImode)
> -	  || (mode == V8QImode)))
> +      || mode == V2DImode || mode == V8HFmode
> +      || mode == V2SImode || mode == V4HImode || mode == V8QImode)
>        return true;
>
>
> So this allows vector modes unconditionally for all targets/fpu configurations?
> I was tempted to do that in aarch64 when I was encountering similar issues.
> In the end what worked for me was re-laying out the vector types in SET_CURRENT_FUNCTION
> if necessary (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01084.html)

yes my assumption was that arm_init_neon_builtins () is now called for 
all targets, since the check is done at expand time and that the 
builtins need to be known by lto, with the vector type initialization, 
before they are expanded. However at that time, lto streaming-in have 
not yet processed the attributes and TARGET_NEON is not set for the 
function.

I had a look at your re-layout, but I'm not sure. it feels like a hack. 
I think this should be solved first place during the builtin 
construction. Also set_current_function is too late, builtin_expand that 
will explode because of the unknown modes.

But raise the point. In fact I was not really happy with this 
arm_vector_mode_supported_p neither as I was not sure about other 
contexts it can be called from and I cannot clearly claim that this 
change is always correct.

I'd like to think about other way to set the vector modes from 
arm_init_neon_builtins before the target flags are known. I'm thinking 
about the lazy initialization at expand time, or using a contextual 
boolean flags. how does that sound ?

many thanks,

Christian


>
> Kyrill
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
  2015-12-10  9:26   ` Christian Bruel
@ 2015-12-10  9:59     ` Kyrill Tkachov
  2015-12-10 10:11       ` Christian Bruel
  0 siblings, 1 reply; 14+ messages in thread
From: Kyrill Tkachov @ 2015-12-10  9:59 UTC (permalink / raw)
  To: Christian Bruel, ramana.radhakrishnan, gcc-patches


On 10/12/15 09:26, Christian Bruel wrote:
> Hi Kyrill,
>
> On 12/09/2015 06:32 PM, Kyrill Tkachov wrote:
>> Hi Christian,
>>
>> On 08/12/15 12:53, Christian Bruel wrote:
>>> Hi,
>>>
>>> The order of the NEON builtins construction has led to complications since the attribute target support. This was not a problem when driven from the command line, but was causing various issues when the builtins was mixed between fpu
>>> configurations or when used with LTO.
>>>
>>> Firstly the builtin functions was not initialized before the parsing of functions, leading to wrong type initializations.
>>>
>>> Then error catching code when a builtin was used without the proper fpu flags was incomprehensible for the user, for instance
>>>
>>> #include "arm_neon.h"
>>>
>>> int8x8_t a, b;
>>> int16x8_t e;
>>>
>>> void
>>> main()
>>> {
>>>    e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>>> }
>>>
>>> compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave pages of
>>>
>>> /arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name '__simd64_int8_t'
>>>   typedef __simd64_int8_t int8x8_t;
>>> ...
>>> ...
>>> arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka __vector(4) int}' to type 'int' which has different size
>>>     return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c);
>>>     ^~~~~~
>>> ...
>>> ... and one for each arm_neon.h lines..
>>>
>>> by postponing the check into arm_expand_builtin, we now emit something more useful:
>>>
>>> testo.c: In function 'main':
>>> testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not supported in this configuration.
>>>     e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>>>         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>> One small side effect to note: The total memory allocated is 370k bigger when neon is not used, so this support will have a follow-up to make their initialization lazy. But I'd like first to stabilize the stuff for stage3 (or get it
>>> pre-approved if the memory is an issue)
>>>
>>> tested without new failures with {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
>>> (a few tests that was fail are now unsupported)
>>>
>>
>> I agree, the vector types (re)initialisation is a tricky part.
>> I've seen similar issues in the aarch64 work for target attributes
>>
>>    bool
>>    arm_vector_mode_supported_p (machine_mode mode)
>>    {
>> -  /* Neon also supports V2SImode, etc. listed in the clause below.  */
>> -  if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
>> +  if (mode == V2SFmode || mode == V4SImode || mode == V8HImode
>>          || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
>> -      || mode == V2DImode || mode == V8HFmode))
>> -    return true;
>> -
>> -  if ((TARGET_NEON || TARGET_IWMMXT)
>> -      && ((mode == V2SImode)
>> -      || (mode == V4HImode)
>> -      || (mode == V8QImode)))
>> +      || mode == V2DImode || mode == V8HFmode
>> +      || mode == V2SImode || mode == V4HImode || mode == V8QImode)
>>        return true;
>>
>>
>> So this allows vector modes unconditionally for all targets/fpu configurations?
>> I was tempted to do that in aarch64 when I was encountering similar issues.
>> In the end what worked for me was re-laying out the vector types in SET_CURRENT_FUNCTION
>> if necessary (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01084.html)
>
> yes my assumption was that arm_init_neon_builtins () is now called for all targets, since the check is done at expand time and that the builtins need to be known by lto, with the vector type initialization, before they are expanded. 
> However at that time, lto streaming-in have not yet processed the attributes and TARGET_NEON is not set for the function.
>
> I had a look at your re-layout, but I'm not sure. it feels like a hack. I think this should be solved first place during the builtin construction. Also set_current_function is too late, builtin_expand that will explode because of the 
> unknown modes.
>
> But raise the point. In fact I was not really happy with this arm_vector_mode_supported_p neither as I was not sure about other contexts it can be called from and I cannot clearly claim that this change is always correct.
>

So the main usage of targetm.vector_mode_supported_p is in stor-layout.c and vector_type_mode in particular seems
to have a relevant comment:
  /* Vector types need to re-check the target flags each time we report
     the machine mode.  We need to do this because attribute target can
     change the result of vector_mode_supported_p and have_regs_of_mode
     on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
     change on a per-function basis.  */

I think that implies that it expects targetm.vector_mode_supported_p to reject vector modes in
contexts that don't support NEON...

> I'd like to think about other way to set the vector modes from arm_init_neon_builtins before the target flags are known. I'm thinking about the lazy initialization at expand time, or using a contextual boolean flags. how does that sound ?
>

Laying out the vector types during arm_init_neon_builtins sounds more promising to me.
Changing layout of types during expand is risky, from what I remember.

In principle, the types and builtins created in arm_init_neon_builtins are only ever supposed to be used in
a NEON context, so I thought that just turning on NEON upon entry into arm_init_neon_builtins and resetting
it back upon exit would work. However, this won't work because we construct our builtin types by copying existing
type nodes (e.g. intQI_type_node) that have been laid out earlier by the midend (frontend?) assuming no NEON.

I wonder if we can explicitly layout these global types in the arm_init_neon_builtins context...

Thanks,
Kyrill

> many thanks,
>
> Christian
>
>
>>
>> Kyrill
>>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
  2015-12-10  9:59     ` Kyrill Tkachov
@ 2015-12-10 10:11       ` Christian Bruel
  2015-12-10 10:19         ` Kyrill Tkachov
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Bruel @ 2015-12-10 10:11 UTC (permalink / raw)
  To: Kyrill Tkachov, ramana.radhakrishnan, gcc-patches



On 12/10/2015 10:59 AM, Kyrill Tkachov wrote:
>
> On 10/12/15 09:26, Christian Bruel wrote:
>> Hi Kyrill,
>>
>> On 12/09/2015 06:32 PM, Kyrill Tkachov wrote:
>>> Hi Christian,
>>>
>>> On 08/12/15 12:53, Christian Bruel wrote:
>>>> Hi,
>>>>
>>>> The order of the NEON builtins construction has led to complications since the attribute target support. This was not a problem when driven from the command line, but was causing various issues when the builtins was mixed between fpu
>>>> configurations or when used with LTO.
>>>>
>>>> Firstly the builtin functions was not initialized before the parsing of functions, leading to wrong type initializations.
>>>>
>>>> Then error catching code when a builtin was used without the proper fpu flags was incomprehensible for the user, for instance
>>>>
>>>> #include "arm_neon.h"
>>>>
>>>> int8x8_t a, b;
>>>> int16x8_t e;
>>>>
>>>> void
>>>> main()
>>>> {
>>>>     e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>>>> }
>>>>
>>>> compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave pages of
>>>>
>>>> /arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name '__simd64_int8_t'
>>>>    typedef __simd64_int8_t int8x8_t;
>>>> ...
>>>> ...
>>>> arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka __vector(4) int}' to type 'int' which has different size
>>>>      return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c);
>>>>      ^~~~~~
>>>> ...
>>>> ... and one for each arm_neon.h lines..
>>>>
>>>> by postponing the check into arm_expand_builtin, we now emit something more useful:
>>>>
>>>> testo.c: In function 'main':
>>>> testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not supported in this configuration.
>>>>      e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>>>>          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>
>>>> One small side effect to note: The total memory allocated is 370k bigger when neon is not used, so this support will have a follow-up to make their initialization lazy. But I'd like first to stabilize the stuff for stage3 (or get it
>>>> pre-approved if the memory is an issue)
>>>>
>>>> tested without new failures with {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
>>>> (a few tests that was fail are now unsupported)
>>>>
>>>
>>> I agree, the vector types (re)initialisation is a tricky part.
>>> I've seen similar issues in the aarch64 work for target attributes
>>>
>>>     bool
>>>     arm_vector_mode_supported_p (machine_mode mode)
>>>     {
>>> -  /* Neon also supports V2SImode, etc. listed in the clause below.  */
>>> -  if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
>>> +  if (mode == V2SFmode || mode == V4SImode || mode == V8HImode
>>>           || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
>>> -      || mode == V2DImode || mode == V8HFmode))
>>> -    return true;
>>> -
>>> -  if ((TARGET_NEON || TARGET_IWMMXT)
>>> -      && ((mode == V2SImode)
>>> -      || (mode == V4HImode)
>>> -      || (mode == V8QImode)))
>>> +      || mode == V2DImode || mode == V8HFmode
>>> +      || mode == V2SImode || mode == V4HImode || mode == V8QImode)
>>>         return true;
>>>
>>>
>>> So this allows vector modes unconditionally for all targets/fpu configurations?
>>> I was tempted to do that in aarch64 when I was encountering similar issues.
>>> In the end what worked for me was re-laying out the vector types in SET_CURRENT_FUNCTION
>>> if necessary (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01084.html)
>>
>> yes my assumption was that arm_init_neon_builtins () is now called for all targets, since the check is done at expand time and that the builtins need to be known by lto, with the vector type initialization, before they are expanded.
>> However at that time, lto streaming-in have not yet processed the attributes and TARGET_NEON is not set for the function.
>>
>> I had a look at your re-layout, but I'm not sure. it feels like a hack. I think this should be solved first place during the builtin construction. Also set_current_function is too late, builtin_expand that will explode because of the
>> unknown modes.
>>
>> But raise the point. In fact I was not really happy with this arm_vector_mode_supported_p neither as I was not sure about other contexts it can be called from and I cannot clearly claim that this change is always correct.
>>
>
> So the main usage of targetm.vector_mode_supported_p is in stor-layout.c and vector_type_mode in particular seems
> to have a relevant comment:
>    /* Vector types need to re-check the target flags each time we report
>       the machine mode.  We need to do this because attribute target can
>       change the result of vector_mode_supported_p and have_regs_of_mode
>       on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
>       change on a per-function basis.  */
>
> I think that implies that it expects targetm.vector_mode_supported_p to reject vector modes in
> contexts that don't support NEON...

yes, thanks for this clarification, that settles it. this part of my 
patch is rubbish :-)

>
>> I'd like to think about other way to set the vector modes from arm_init_neon_builtins before the target flags are known. I'm thinking about the lazy initialization at expand time, or using a contextual boolean flags. how does that sound ?
>>
>
> Laying out the vector types during arm_init_neon_builtins sounds more promising to me.
> Changing layout of types during expand is risky, from what I remember.

I am thinking about the arm_builtin_decl hook, not expand. There is a 
bool initialize_p flag that seems perfect for the need. (apparently it's 
always true and never used by any other target)



>
> In principle, the types and builtins created in arm_init_neon_builtins are only ever supposed to be used in
> a NEON context, so I thought that just turning on NEON upon entry into arm_init_neon_builtins and resetting
> it back upon exit would work. However, this won't work because we construct our builtin types by copying existing
> type nodes (e.g. intQI_type_node) that have been laid out earlier by the midend (frontend?) assuming no NEON.
>
> I wonder if we can explicitly layout these global types in the arm_init_neon_builtins context...
>
> Thanks,
> Kyrill
>
>> many thanks,
>>
>> Christian
>>
>>
>>>
>>> Kyrill
>>>
>>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching
  2015-12-10 10:11       ` Christian Bruel
@ 2015-12-10 10:19         ` Kyrill Tkachov
  0 siblings, 0 replies; 14+ messages in thread
From: Kyrill Tkachov @ 2015-12-10 10:19 UTC (permalink / raw)
  To: Christian Bruel, ramana.radhakrishnan, gcc-patches


On 10/12/15 10:11, Christian Bruel wrote:
>
>
> On 12/10/2015 10:59 AM, Kyrill Tkachov wrote:
>>
>> On 10/12/15 09:26, Christian Bruel wrote:
>>> Hi Kyrill,
>>>
>>> On 12/09/2015 06:32 PM, Kyrill Tkachov wrote:
>>>> Hi Christian,
>>>>
>>>> On 08/12/15 12:53, Christian Bruel wrote:
>>>>> Hi,
>>>>>
>>>>> The order of the NEON builtins construction has led to complications since the attribute target support. This was not a problem when driven from the command line, but was causing various issues when the builtins was mixed between fpu
>>>>> configurations or when used with LTO.
>>>>>
>>>>> Firstly the builtin functions was not initialized before the parsing of functions, leading to wrong type initializations.
>>>>>
>>>>> Then error catching code when a builtin was used without the proper fpu flags was incomprehensible for the user, for instance
>>>>>
>>>>> #include "arm_neon.h"
>>>>>
>>>>> int8x8_t a, b;
>>>>> int16x8_t e;
>>>>>
>>>>> void
>>>>> main()
>>>>> {
>>>>>     e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>>>>> }
>>>>>
>>>>> compiled with default options (without -mfpu=neon -mfloat-abi=hard) gave pages of
>>>>>
>>>>> /arm-none-eabi/6.0.0/include/arm_neon.h:39:9: error: unknown type name '__simd64_int8_t'
>>>>>    typedef __simd64_int8_t int8x8_t;
>>>>> ...
>>>>> ...
>>>>> arm_neon.h:4724:3: error: can't convert a vector of type 'poly64x2_t {aka __vector(4) int}' to type 'int' which has different size
>>>>>      return (poly64x2_t)__builtin_neon_vsli_nv2di ((int64x2_t) __a, (int64x2_t) __b, __c);
>>>>>      ^~~~~~
>>>>> ...
>>>>> ... and one for each arm_neon.h lines..
>>>>>
>>>>> by postponing the check into arm_expand_builtin, we now emit something more useful:
>>>>>
>>>>> testo.c: In function 'main':
>>>>> testo.c:9:7: error: '__builtin_neon_vaddlsv8qi' neon builtin is not supported in this configuration.
>>>>>      e = (int16x8_t)__builtin_neon_vaddlsv8qi (a, b);
>>>>>          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>
>>>>> One small side effect to note: The total memory allocated is 370k bigger when neon is not used, so this support will have a follow-up to make their initialization lazy. But I'd like first to stabilize the stuff for stage3 (or get it
>>>>> pre-approved if the memory is an issue)
>>>>>
>>>>> tested without new failures with {,-mfpu=vfp,-mfpu=neon}{,-march=armv7-a\}
>>>>> (a few tests that was fail are now unsupported)
>>>>>
>>>>
>>>> I agree, the vector types (re)initialisation is a tricky part.
>>>> I've seen similar issues in the aarch64 work for target attributes
>>>>
>>>>     bool
>>>>     arm_vector_mode_supported_p (machine_mode mode)
>>>>     {
>>>> -  /* Neon also supports V2SImode, etc. listed in the clause below.  */
>>>> -  if (TARGET_NEON && (mode == V2SFmode || mode == V4SImode || mode == V8HImode
>>>> +  if (mode == V2SFmode || mode == V4SImode || mode == V8HImode
>>>>           || mode == V4HFmode || mode == V16QImode || mode == V4SFmode
>>>> -      || mode == V2DImode || mode == V8HFmode))
>>>> -    return true;
>>>> -
>>>> -  if ((TARGET_NEON || TARGET_IWMMXT)
>>>> -      && ((mode == V2SImode)
>>>> -      || (mode == V4HImode)
>>>> -      || (mode == V8QImode)))
>>>> +      || mode == V2DImode || mode == V8HFmode
>>>> +      || mode == V2SImode || mode == V4HImode || mode == V8QImode)
>>>>         return true;
>>>>
>>>>
>>>> So this allows vector modes unconditionally for all targets/fpu configurations?
>>>> I was tempted to do that in aarch64 when I was encountering similar issues.
>>>> In the end what worked for me was re-laying out the vector types in SET_CURRENT_FUNCTION
>>>> if necessary (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01084.html)
>>>
>>> yes my assumption was that arm_init_neon_builtins () is now called for all targets, since the check is done at expand time and that the builtins need to be known by lto, with the vector type initialization, before they are expanded.
>>> However at that time, lto streaming-in have not yet processed the attributes and TARGET_NEON is not set for the function.
>>>
>>> I had a look at your re-layout, but I'm not sure. it feels like a hack. I think this should be solved first place during the builtin construction. Also set_current_function is too late, builtin_expand that will explode because of the
>>> unknown modes.
>>>
>>> But raise the point. In fact I was not really happy with this arm_vector_mode_supported_p neither as I was not sure about other contexts it can be called from and I cannot clearly claim that this change is always correct.
>>>
>>
>> So the main usage of targetm.vector_mode_supported_p is in stor-layout.c and vector_type_mode in particular seems
>> to have a relevant comment:
>>    /* Vector types need to re-check the target flags each time we report
>>       the machine mode.  We need to do this because attribute target can
>>       change the result of vector_mode_supported_p and have_regs_of_mode
>>       on a per-function basis.  Thus the TYPE_MODE of a VECTOR_TYPE can
>>       change on a per-function basis.  */
>>
>> I think that implies that it expects targetm.vector_mode_supported_p to reject vector modes in
>> contexts that don't support NEON...
>
> yes, thanks for this clarification, that settles it. this part of my patch is rubbish :-)
>
>>
>>> I'd like to think about other way to set the vector modes from arm_init_neon_builtins before the target flags are known. I'm thinking about the lazy initialization at expand time, or using a contextual boolean flags. how does that sound ?
>>>
>>
>> Laying out the vector types during arm_init_neon_builtins sounds more promising to me.
>> Changing layout of types during expand is risky, from what I remember.
>
> I am thinking about the arm_builtin_decl hook, not expand. There is a bool initialize_p flag that seems perfect for the need. (apparently it's always true and never used by any other target)
>

Sounds promising. I'm not familiar with the callsites of targetm.builtin_decl, but if it does what we want
maybe it's worth pursuing.

Kyrill

>
>
>>
>> In principle, the types and builtins created in arm_init_neon_builtins are only ever supposed to be used in
>> a NEON context, so I thought that just turning on NEON upon entry into arm_init_neon_builtins and resetting
>> it back upon exit would work. However, this won't work because we construct our builtin types by copying existing
>> type nodes (e.g. intQI_type_node) that have been laid out earlier by the midend (frontend?) assuming no NEON.
>>
>> I wonder if we can explicitly layout these global types in the arm_init_neon_builtins context...
>>
>> Thanks,
>> Kyrill
>>
>>> many thanks,
>>>
>>> Christian
>>>
>>>
>>>>
>>>> Kyrill
>>>>
>>>
>>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, ARM] PR65835 Fix LTO support for neon builtins
  2015-12-08 13:01 ` Ramana Radhakrishnan
  2015-12-08 13:29   ` Christian Bruel
@ 2015-12-17 16:21   ` Christian Bruel
  1 sibling, 0 replies; 14+ messages in thread
From: Christian Bruel @ 2015-12-17 16:21 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: Ramana Radhakrishnan, Kyrylo Tkachov, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 390 bytes --]


This is the new version of the patch that removes the "lto1 target 
specific builtin not available" error message happening in the LTO 
streamer-in while loading the NEON intrinsic in the global context, 
without arm_fpu_index being set with the current FPU mode.

nb: Needs https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01714.html to pass

Also robustified the associated test.

Thanks,


[-- Attachment #2: lto-neon.patch --]
[-- Type: text/x-patch, Size: 3692 bytes --]

2015-12-17  Christian Bruel  <christian.bruel@st.com>

	PR target/65837
	* arm-builtins.c (ARM_BUILTIN_CRYPTO_BASE): New enum.
	(arm_init_neon_builtins): Move neon_set_p and neon_crypto_set_p...
	(arm_init_neon_builtins_internal, arm_init_crypto_builtins_internal):
	Here.
	* arm_builtin_decl (arm_init_neon_builtins_internal)
	(arm_init_crypto_builtins_internal): Call if needed.

2015-12-17  Christian Bruel  <christian.bruel@st.com>

	PR target/65837
	* gcc.target/arm/lto/pr65837_0.c: Robustify dg tests.
	Use intrinsinc name from arm_neon.h.

Index: config/arm/arm-builtins.c
===================================================================
--- config/arm/arm-builtins.c	(revision 231774)
+++ config/arm/arm-builtins.c	(working copy)
@@ -519,6 +519,8 @@ enum arm_builtins
 #undef CRYPTO2
 #undef CRYPTO3
 
+  ARM_BUILTIN_CRYPTO_BASE,
+
 #define CRYPTO1(L, U, M1, M2) \
   ARM_BUILTIN_CRYPTO_##U,
 #define CRYPTO2(L, U, M1, M2, M3) \
@@ -893,11 +895,19 @@ arm_init_simd_builtin_scalar_types (void
 					     "__builtin_neon_uti");
 }
 
+static bool neon_set_p = false;
+static bool neon_crypto_set_p = false;
+
 static void
 arm_init_neon_builtins_internal (void)
 {
   unsigned int i, fcode = ARM_BUILTIN_NEON_PATTERN_START;
 
+  if (neon_set_p)
+    return;
+
+  neon_set_p = true;
+
   arm_init_simd_builtin_types ();
 
   /* Strong-typing hasn't been implemented for all AdvSIMD builtin intrinsics.
@@ -1020,6 +1030,11 @@ arm_init_neon_builtins_internal (void)
 static void
 arm_init_crypto_builtins_internal (void)
 {
+  if (neon_crypto_set_p)
+    return;
+
+  neon_crypto_set_p = true;
+
   tree V16UQI_type_node
     = arm_simd_builtin_type (V16QImode, true, false);
 
@@ -1098,23 +1113,13 @@ arm_init_crypto_builtins_internal (void)
   #undef FT3
 }
 
-static bool neon_set_p = false;
-static bool neon_crypto_set_p = false;
-
 void
 arm_init_neon_builtins (void)
 {
-  if (! neon_set_p)
-    {
-      neon_set_p = true;
-      arm_init_neon_builtins_internal ();
-    }
+  arm_init_neon_builtins_internal ();
 
-  if (! neon_crypto_set_p && TARGET_CRYPTO && TARGET_HARD_FLOAT)
-    {
-      neon_crypto_set_p = true;
-      arm_init_crypto_builtins_internal ();
-    }
+  if (TARGET_CRYPTO && TARGET_HARD_FLOAT)
+    arm_init_crypto_builtins_internal ();
 }
 
 #undef NUM_DREG_TYPES
@@ -1802,11 +1807,22 @@ arm_init_builtins (void)
 /* Return the ARM builtin for CODE.  */
 
 tree
-arm_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
+arm_builtin_decl (unsigned code, bool initialize_p)
 {
   if (code >= ARM_BUILTIN_MAX)
     return error_mark_node;
 
+  if (! arm_builtin_decls[code] && initialize_p)
+    {
+      /* arm_fpu_index is not set to test global features here.  */
+      if (code >= ARM_BUILTIN_CRYPTO_BASE)
+	{
+	  arm_init_neon_builtins_internal ();
+	  if (code < ARM_BUILTIN_NEON_BASE)
+	    arm_init_crypto_builtins_internal ();
+	}
+    }
+
   return arm_builtin_decls[code];
 }
 
Index: testsuite/gcc.target/arm/lto/pr65837_0.c
===================================================================
--- testsuite/gcc.target/arm/lto/pr65837_0.c	(revision 231774)
+++ testsuite/gcc.target/arm/lto/pr65837_0.c	(working copy)
@@ -1,5 +1,7 @@
 /* { dg-lto-do run } */
-/* { dg-lto-options {{-flto -mfpu=neon}} } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-skip-if "need hardfp ABI" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
+/* { dg-lto-options {{-flto -mfpu=neon -mfloat-abi=hard}} } */
 /* { dg-suppress-ld-options {-mfpu=neon} } */
 
 #include "arm_neon.h"
@@ -8,7 +10,8 @@ float32x2_t a, b, c, e;
 
 int main()
 {
-  e = __builtin_neon_vmls_lanev2sf (a, b, c, 0);
+  e = vmls_lane_f32 (a, b, c, 0);
   return 0;
 }
 


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-12-17 16:21 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-08 12:53 [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching Christian Bruel
2015-12-08 13:01 ` Ramana Radhakrishnan
2015-12-08 13:29   ` Christian Bruel
2015-12-08 13:36     ` Ramana Radhakrishnan
2015-12-08 13:53       ` Christian Bruel
2015-12-08 15:31         ` Christian Bruel
2015-12-08 20:45         ` Ramana Radhakrishnan
2015-12-09 16:08           ` Christian Bruel
2015-12-17 16:21   ` [PATCH, ARM] PR65835 Fix LTO support for neon builtins Christian Bruel
2015-12-09 17:32 ` [PATCH, ARM] PR68674 Fix LTO support for neon builtins and error catching Kyrill Tkachov
2015-12-10  9:26   ` Christian Bruel
2015-12-10  9:59     ` Kyrill Tkachov
2015-12-10 10:11       ` Christian Bruel
2015-12-10 10:19         ` Kyrill Tkachov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).