Hi Kyrill,

Thanks for your comments. Answers interleaved and the new patch attached.

On 09/18/2015 11:04 AM, Kyrill Tkachov wrote:
>
> On 15/09/15 11:47, Christian Bruel wrote:
>>
>> On 09/14/2015 04:30 PM, Christian Bruel wrote:
>>> Finally, the final part of the patch set does the attribute target
>>> parsing and checking, redefines the preprocessor macros and implements
>>> the inlining rules.
>>>
>>> testcases and documentation included.
>>>
>> new version to remove a shadowed remnant piece of code.
>>
>>
>>    > thanks
>>    >
>>    > Christian
>>    >
>
> +  /* OK to inline between different modes.
> +     Function with mode specific instructions, e.g using asm,
> +     must be explicitely protected with noinline.  */
>
> s/explicitely/explicitly/
>

thanks

>
> +  const struct arm_fpu_desc *fpu_desc1
> +    = &all_fpus[caller_opts->x_arm_fpu_index];
> +  const struct arm_fpu_desc *fpu_desc2
> +    = &all_fpus[callee_opts->x_arm_fpu_index];
>
> Please call these caller_fpu and callee_fpu, it's much easier to reason about the inlining rules that way

ok

>
> +
> +  /* Can't inline NEON extension if the caller doesn't support it.  */
> +  if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_NEON)
> +      && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_NEON))
> +    return false;
> +
> +  /* Can't inline CRYPTO extension if the caller doesn't support it.  */
> +  if (ARM_FPU_FSET_HAS (fpu_desc2->features, FPU_FL_CRYPTO)
> +      && ! ARM_FPU_FSET_HAS (fpu_desc1->features, FPU_FL_CRYPTO))
> +    return false;
> +
>
> We also need to take into account FPU_FL_FP16...
> In general what we want is for the callee FPU features to be
> a subset of the callers features, similar to the way we handle
> the x_aarch64_isa_flags handling in aarch64_can_inline_p from the
> aarch64 port. I think that's the way to go here rather than explicitly
> writing down a check for each feature.

ok, with FL_FP16 now,

>
> @@ -242,6 +239,8 @@
>
>          /* Update macros.  */
>          gcc_assert (cur_opt->x_target_flags == target_flags);
> +      /* This one can be redefined by the pragma without warning.  */
> +      cpp_undef (parse_in, "__ARM_FP");
>          arm_cpu_builtins (parse_in);
>
> Could you elaborate why the cpp_undef here?
> If you want to undefine __ARM_FP so you can redefine it to a new value
> in arm_cpu_builtins then I think you should just undefine it in that function.

This is to avoid a warning: "__ARM_FP" redefined when creating a new 
pragma scope. (See the test attr-crypto.c).

We cannot call the cpp_undef inside arm_cpu_builtins, because it is also 
used for the TARGET_CPU_CPP_BUILTINS hook and then would prevent real 
illegitimate redefinitions.

Alternatively, I thought to reset the warn_builtin_macro_redefined flag, 
but that doesn't work as the macro is not NODE_BUILTIN (see the 
definition of warn_of_redefinition in libcpp).
We might need to change this later : should target macros be marked as 
NOTE_BUILTIN ? We can discuss this separately (I can open a defect) as 
we have the cpp_undep solution for now, if you agree.

>
>
> diff -ruN gnu_trunk.p3/gcc/gcc/doc/invoke.texi gnu_trunk.p4/gcc/gcc/doc/invoke.texi
> --- gnu_trunk.p3/gcc/gcc/doc/invoke.texi	2015-09-10 12:21:00.698911244 +0200
> +++ gnu_trunk.p4/gcc/gcc/doc/invoke.texi	2015-09-14 10:27:20.281932581 +0200
> @@ -13360,6 +13363,8 @@
>    floating-point arithmetic (in particular denormal values are treated as
>    zero), so the use of NEON instructions may lead to a loss of precision.
>
> +You can also set the fpu name at function level by using the @code{target("mfpu=")} function attributes (@pxref{ARM Function Attributes}) or pragmas (@pxref{Function Specific Option Pragmas}).
> +
>
> s/"mfpu="/"fpu="
>

thanks

>
> --- gnu_trunk.p3/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	1970-01-01 01:00:00.000000000 +0100
> +++ gnu_trunk.p4/gcc/gcc/testsuite/gcc.target/arm/attr-neon.c	2015-09-14 16:12:08.449698268 +0200
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_neon_ok } */
> +/* { dg-options "-O3 -mfloat-abi=softfp -ftree-vectorize" } */
> +
> +void
> +f3(int n, int x[], int y[]) {
> +  int i;
> +  for (i = 0; i < n; ++i)
> +    y[i] = x[i] << 3;
> +}
> +
>
> What if GCC has been configured with --with-fpu=neon?
> Then f3 will be compiled assuming NEON. You should add a -mfpu=vfp to the dg-options.

Ah yes. I've added ((target("fpu=vfp")) instead, since we are testing 
the attribute.