On 23 January 2015 at 14:44, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> On 23 January 2015 at 12:42, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>> On 23 January 2015 at 11:18, Tejas Belagod <tejas.belagod@arm.com> wrote:
>>> On 22/01/15 21:31, Christophe Lyon wrote:
>>>>
>>>> On 22 January 2015 at 16:22, Tejas Belagod <tejas.belagod@arm.com> wrote:
>>>>>
>>>>> On 22/01/15 14:28, Christophe Lyon wrote:
>>>>>>
>>>>>>
>>>>>> On 22 January 2015 at 12:19, Tejas Belagod <tejas.belagod@arm.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 21/01/15 15:07, Christophe Lyon wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 19 January 2015 at 17:54, Marcus Shawcroft
>>>>>>>> <marcus.shawcroft@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 19 January 2015 at 15:43, Christophe Lyon
>>>>>>>>> <christophe.lyon@linaro.org>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 19 January 2015 at 14:29, Marcus Shawcroft
>>>>>>>>>> <marcus.shawcroft@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 16 January 2015 at 17:52, Christophe Lyon
>>>>>>>>>>> <christophe.lyon@linaro.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>> OK provided, as per the previous couple, that we don;t regression
>>>>>>>>>>>>> or
>>>>>>>>>>>>> introduce new fails on aarch64[_be] or aarch32.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This patch shows failures on aarch64 and aarch64_be for vmax and
>>>>>>>>>>>> vmin
>>>>>>>>>>>> when the input is -NaN.
>>>>>>>>>>>> It's a corner case, and my reading of the ARM ARM is that the
>>>>>>>>>>>> result
>>>>>>>>>>>> should the same as on aarch32.
>>>>>>>>>>>> I haven't had time to look at it in more details though.
>>>>>>>>>>>> So, not OK?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> They should have the same behaviour in aarch32 and aarch64. Did you
>>>>>>>>>>> test on HW or a model?
>>>>>>>>>>>
>>>>>>>>>> I ran the tests on qemu for aarch32 and aarch64-linux, and on the
>>>>>>>>>> foundation model for aarch64*-elf.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Leave this one out until we understand why it fails. /Marcus
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I've looked at this a bit more.
>>>>>>>> We have
>>>>>>>> fmax    v0.4s, v0.4s, v1.4s
>>>>>>>> where v0 is a vector of -NaN (0xffc00000) and v1 is a vector of 1.
>>>>>>>>
>>>>>>>> The output is still -NaN (0xffc00000), while the test expects
>>>>>>>> defaultNaN (0x7fc00000).
>>>>>>>>
>>>>>>>
>>>>>>> In the AArch32 execution state, Advanced SIMD FP arithmetic always uses
>>>>>>> the
>>>>>>> DefaultNaN setting regardless of the DN-bit value in the FPSCR. In
>>>>>>> AArch64
>>>>>>> execution state, result of Advanced SIMD FP arithmetic operations
>>>>>>> depend
>>>>>>> on
>>>>>>> the value of the DN-bit i.e. either propagate the input NaN or generate
>>>>>>> DefaultNaN depending on the value of DN.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Maybe I'm using an outdated doc. On page 2282 of ARMv8 ARM rev C, I
>>>>>> can see only the latter (no diff between aarch32 and aarch64 in
>>>>>> FPProcessNan pseudo-code)
>>>>>>
>>>>>
>>>>> If you see pg. 4005 in the same doc(rev C), you'll see the FPSCR spec -
>>>>> under DN:
>>>>>
>>>>> "The value of this bit only controls scalar floating-point arithmetic.
>>>>> Advanced SIMD arithmetic always uses the Default NaN setting, regardless
>>>>> of
>>>>> the value of the DN bit."
>>>>>
>>>>> Also on page 3180 for the description of VMAX(vector FP), it says:
>>>>> "
>>>>> *  max(+0.0, -0.0) = +0.0
>>>>> * If any input is a NaN, the corresponding result element is the default
>>>>> NaN.
>>>>> "
>>>>>
>>>> Oops I was looking at FMAX (vector) pg 936.
>>>>
>>>>> The pseudocode for FPMax () on pg. 3180 passes StandardFPSCRValue() to
>>>>> FPMax() which is on pg. 2285
>>>>>
>>>>> // StandardFPSCRValue()
>>>>> // ====================
>>>>> FPCRType StandardFPSCRValue()
>>>>> return ‘00000’ : FPSCR.AHP : ‘11000000000000000000000000’
>>>>>
>>>>> Here bit-25(FPSCR.DN) is set to 1.
>>>>>
>>>>
>>>> So, we should get defaultNaN too on aarch64, and no need to try to
>>>> force DN to 1 in gdb?
>>>>
>>>> What can be wrong?
>>>>
>>>
>>> On pg 3180, I see VMAX(FPSIMD) for A32/T32, not A64. I hope we're reading
>>> the same document.
>>>
>>> Regardless of the page number, if you see the pseudocode for VMAX(FPSIMD)
>>> for AArch32, StandardFPSCRValue() (i.e. DN = 1) is passed to FPMax() which
>>> means generate DefaultNaN() regardless.
>>>
>>> OTOH, on pg 936, you have FMAX(vector) for A64 where FPMax() in the
>>> pseudocode gets just FPCR.
>>>
>>>
>> Ok, that was my initial understanding but our discussion confused me.
>>
>> And that's why I tried to force DN = 1 in gdb before single-stepping over
>> fmax    v0.4s, v0.4s, v1.4s
>>
>> but it changed nothing :-(
>> Hence my question about a gdb possible bug or misuse.
>
> Hmm... user error, I missed one bit
> set $fpcr=0x2000000
> works under gdb.
>
>> I'll try modifying the test to have it force DN=1.
>>
> Forcing DN=1 in the test makes it pass.
>
> I am going to look at adding that cleanly to my test, and resubmit it.
>
> Thanks, and sorry for the noise.
>
Here is the updated version:
- Now I set DN=1 on AArch64 in clean_results, as it is the main
initialization function.
- I removed the double negative :-)
- I removed the useless [u]int64 and poly variants

Christophe.

2015-01-25  Christophe Lyon  <christophe.lyon@linaro.org>

* gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
(_ARM_FPSRC): Add DN and AHP fields.
(clean_results): Force DN=1 on AArch64.
* gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc: New file.
* gcc.target/aarch64/advsimd-intrinsics/vhadd.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vhsub.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vmax.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vmin.c: New file.
* gcc.target/aarch64/advsimd-intrinsics/vrhadd.c: New file.

>>> Thanks,
>>> Tejas.
>>>
>>>
>>>>> Thanks,
>>>>> Tejas.
>>>>>
>>>>>
>>>>>>> If you're running your test in the AArch64 execution state, you'd want
>>>>>>> to
>>>>>>> define the DN bit and modify the expected results accordingly or have
>>>>>>> the
>>>>>>> test poll at runtime what the DN-bit is set to and check expected
>>>>>>> results
>>>>>>> dynamically.
>>>>>>
>>>>>>
>>>>>> Makes sense, I hadn't noticed the different aarch64 spec here.
>>>>>>
>>>>>>> I think the test already has expected behaviour for AArch32 execution
>>>>>>> state
>>>>>>> by expecting DefaultNaN regardless.
>>>>>>
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>>> I have executed the test under GDB on AArch64 HW, and noticed that
>>>>>>>> fpcr
>>>>>>>> was 0.
>>>>>>>> I forced it to have DN==1:
>>>>>>>> set $fpcr=0x1000000
>>>>>>>> but this didn't change the result.
>>>>>>>>
>>>>>>>> Does setting fpcr.dn under gdb actually work?
>>>>>>>>
>>>>>>>
>>>>>>> It should. Possibly a bug, patches welcome :-).
>>>>>>>
>>>>>> :-)
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>