* [AArch64] Fix integer vabs intrinsics
@ 2014-05-02 8:48 James Greenhalgh
2014-05-02 9:00 ` Andrew Pinski
0 siblings, 1 reply; 10+ messages in thread
From: James Greenhalgh @ 2014-05-02 8:48 UTC (permalink / raw)
To: gcc-patches; +Cc: marcus.shawcroft
Hi,
Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
undefined/impossible, the neon vabs intrinsics should behave as the
hardware does. That is to say, the pseudo-code sequence:
a = vabs_s8 (vdup_n_s8 (-128));
assert (a >= 0);
does not hold, because in hardware
abs (-128) == -128
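(For completeness, a minimal self-contained reproducer - only a sketch,
assuming arm_neon.h and an AArch64 target - looks something like:

  #include <arm_neon.h>

  int
  main (void)
  {
    int8x8_t a = vabs_s8 (vdup_n_s8 (-128));
    /* In hardware ABS wraps, so lane 0 is -128 and the branch below is
       not taken.  If the intrinsic is folded to ABS_EXPR, GCC may fold
       the comparison to true and the program aborts.  */
    if (vget_lane_s8 (a, 0) >= 0)
      __builtin_abort ();
    return 0;
  }

whether the bogus fold actually fires will of course depend on the
optimization level.)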
Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid
it. In fact, we have to be even more careful than that, and keep the integer
vabs intrinsics as an unspec in the back end.
We keep the standard pattern name around for the benefit of
auto-vectorization.
Tested on aarch64-none-elf with no issues.
This will also be a bug on 4.9 (ugh), OK for trunk and gcc-4_9-branch?
Thanks,
James
---
2014-05-02 James Greenhalgh <james.greenhalgh@arm.com>
* config/aarch64/aarch64-builtins.c (aarch64_fold_builtin): Don't
fold integer abs builtins.
* config/aarch64/aarch64-simd-builtins.def (abs): Split by integer
and floating point variants.
* config/aarch64/aarch64-simd.md (aarch64_abs<mode>): New.
* config/aarch64/iterators.md (unspec): Add UNSPEC_ABS.
[-- Attachment #2: 0001-AArch64-Fix-integer-vabs-intrinsics.patch --]
[-- Type: text/x-patch; name=0001-AArch64-Fix-integer-vabs-intrinsics.patch, Size: 2539 bytes --]
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index a301982..6d47c0b 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1153,7 +1153,7 @@ aarch64_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *args,
switch (fcode)
{
- BUILTIN_VALLDI (UNOP, abs, 2)
+ BUILTIN_VDQF (UNOP, abs, 2)
return fold_build1 (ABS_EXPR, type, args[0]);
break;
BUILTIN_VALLDI (BINOP, cmge, 0)
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 339e8f8..e2d1078 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -365,7 +365,8 @@
BUILTIN_VDQF (UNOP, frecpe, 0)
BUILTIN_VDQF (BINOP, frecps, 0)
- BUILTIN_VALLDI (UNOP, abs, 2)
+ BUILTIN_VDQ (UNOP, abs, 0)
+ BUILTIN_VDQF (UNOP, abs, 2)
VAR1 (UNOP, vec_unpacks_hi_, 10, v4sf)
VAR1 (BINOP, float_truncate_hi_, 0, v4sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 108bc8d88931e67e6c7eeb77774a01bb391a1ced..acb75f5bd0c732d8e11d4a7b6b61f8b1e81d1960 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -390,6 +390,18 @@ (define_insn "aba<mode>_3"
[(set_attr "type" "neon_arith_acc<q>")]
)
+;; To mirror the behaviour of hardware, as required for arm_neon.h, we must
+;; show an abundance of caution around the abs instruction.
+
+(define_insn "aarch64_abs<mode>"
+ [(set (match_operand:VDQ 0 "register_operand" "=w")
+ (unspec:VDQ [(match_operand:VDQ 1 "register_operand" "w")]
+ UNSPEC_ABS))]
+ "TARGET_SIMD"
+ "abs\t%0.<Vtype>, %1.<Vtype>"
+ [(set_attr "type" "neon_abs<q>")]
+)
+
(define_insn "fabd<mode>_3"
[(set (match_operand:VDQF 0 "register_operand" "=w")
(abs:VDQF (minus:VDQF
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index c537c3780eea95fa315c82bb36ac7f91f0f920fd..e45a1a11991a71ad37a8d5bb7c4ff81627671384 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -197,6 +197,7 @@ (define_c_enum "unspec"
[
UNSPEC_ASHIFT_SIGNED ; Used in aarch-simd.md.
UNSPEC_ASHIFT_UNSIGNED ; Used in aarch64-simd.md.
+ UNSPEC_ABS ; Used in aarch64-simd.md.
UNSPEC_FMAX ; Used in aarch64-simd.md.
UNSPEC_FMAXNMV ; Used in aarch64-simd.md.
UNSPEC_FMAXV ; Used in aarch64-simd.md.
* Re: [AArch64] Fix integer vabs intrinsics
2014-05-02 8:48 [AArch64] Fix integer vabs intrinsics James Greenhalgh
@ 2014-05-02 9:00 ` Andrew Pinski
2014-05-02 9:21 ` James Greenhalgh
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Pinski @ 2014-05-02 9:00 UTC (permalink / raw)
To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft
On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
> undefined/impossible, the neon intrinsics vabs intrinsics should behave as
> the hardware. That is to say, the pseudo-code sequence:
Only for signed integer types. You should be able to use an unsigned
integer type here instead.
>
> a = vabs_s8 (vdup_n_s8 (-128));
> assert (a >= 0);
>
> does not hold. As in hardware
>
> abs (-128) == -128
>
> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid
> it. In fact, we have to be even more careful than that, and keep the integer
> vabs intrinsics as an unspec in the back end.
No it is not. The mistake is to use signed integer types here. Just
add a conversion to an unsigned integer vector and it will work
correctly.
In fact the ABS rtl code is not undefined for the overflow.
Thanks,
Andrew Pinski
>
> We keep the standard pattern name around for the benefit of
> auto-vectorization.
>
> Tested on aarch64-none-elf with no issues.
>
> This will also be a bug on 4.9 (ugh), OK for trunk and gcc-4_9-branch?
>
> Thanks,
> James
>
> ---
> 2014-05-02 James Greenhalgh <james.greenhalgh@arm.com>
>
> * config/aarch64/aarch64-builtins.c (aarch64_fold_builtin): Don't
> fold integer abs builtins.
> * config/aarch64/aarch64-simd-builtins.def (abs): Split by integer
> and floating point variants.
> * config/aarch64/aarch64-simd.md (aarch64_abs<mode>): New.
> * config/aarch64/iterators.md (unspec): Add UNSPEC_ABS.
* Re: [AArch64] Fix integer vabs intrinsics
2014-05-02 9:00 ` Andrew Pinski
@ 2014-05-02 9:21 ` James Greenhalgh
2014-05-02 9:29 ` pinskia
0 siblings, 1 reply; 10+ messages in thread
From: James Greenhalgh @ 2014-05-02 9:21 UTC (permalink / raw)
To: Andrew Pinski; +Cc: GCC Patches, Marcus Shawcroft
On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
> >
> > Hi,
> >
> > Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
> > undefined/impossible, the neon intrinsics vabs intrinsics should behave as
> > the hardware. That is to say, the pseudo-code sequence:
>
>
> Only for signed integer types. You should be able to use an unsigned
> integer type here instead.
If anything, I think that puts us in a worse position. The issue that
inspires this patch is that GCC will happily fold:
t1 = ABS_EXPR (x)
t2 = GE_EXPR (t1, 0)
to
t2 = TRUE
Surely an unsigned integer type is going to suffer the same fate? Certainly I
can imagine somewhere in the compiler there being a fold path for:
(unsigned >= 0) == TRUE
> >
> > a = vabs_s8 (vdup_n_s8 (-128));
> > assert (a >= 0);
> >
> > does not hold. As in hardware
> >
> > abs (-128) == -128
> >
> > Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid
> > it. In fact, we have to be even more careful than that, and keep the integer
> > vabs intrinsics as an unspec in the back end.
>
> No it is not. The mistake is to use signed integer types here. Just
> add a conversion to an unsigned integer vector and it will work
> correctly.
> In fact the ABS rtl code is not undefined for the overflow.
Here we are covering ourselves against a separate issue. For auto-vectorized
code we want the SABD combine patterns to kick in whenever sensible. For
intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
So in this case, the combine would be erroneous. Likewise SABA.
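(To make that concrete with purely illustrative values: take
x = vdup_n_s8 (100) and y = vdup_n_s8 (-100).  vsub_s8 (x, y) wraps to
-56 in each lane, so vabs_s8 (vsub_s8 (x, y)) gives 56; SABD, if I read
its semantics right, takes the absolute difference before truncating,
so vabd_s8 (x, y) gives the bit pattern 0xc8, i.e. -56.  Combining the
former into the latter would therefore change the result.)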
Thanks,
James
* Re: [AArch64] Fix integer vabs intrinsics
2014-05-02 9:21 ` James Greenhalgh
@ 2014-05-02 9:29 ` pinskia
2014-05-02 10:29 ` James Greenhalgh
0 siblings, 1 reply; 10+ messages in thread
From: pinskia @ 2014-05-02 9:29 UTC (permalink / raw)
To: James Greenhalgh; +Cc: GCC Patches, Marcus Shawcroft
> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
>> <james.greenhalgh@arm.com> wrote:
>>>
>>> Hi,
>>>
>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
>>> undefined/impossible, the neon intrinsics vabs intrinsics should behave as
>>> the hardware. That is to say, the pseudo-code sequence:
>>
>>
>> Only for signed integer types. You should be able to use an unsigned
>> integer type here instead.
>
> If anything, I think that puts us in a worse position.
Not if you cast it back.
> The issue that
> inspires this patch is that GCC will happily fold:
>
> t1 = ABS_EXPR (x)
> t2 = GE_EXPR (t1, 0)
>
> to
>
> t2 = TRUE
>
> Surely an unsigned integer type is going to suffer the same fate? Certainly I
> can imagine somewhere in the compiler there being a fold path for:
Yes, but if you add a cast from the unsigned type to the signed type, gcc does not optimize that. If it does, it is a bug, since the overflow is defined there.
>
> (unsigned >= 0) == TRUE
>
>>>
>>> a = vabs_s8 (vdup_n_s8 (-128));
>>> assert (a >= 0);
>>>
>>> does not hold. As in hardware
>>>
>>> abs (-128) == -128
>>>
>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid
>>> it. In fact, we have to be even more careful than that, and keep the integer
>>> vabs intrinsics as an unspec in the back end.
>>
>> No it is not. The mistake is to use signed integer types here. Just
>> add a conversion to an unsigned integer vector and it will work
>> correctly.
>> In fact the ABS rtl code is not undefined for the overflow.
>
> Here we are covering ourselves against a seperate issue. For auto-vectorized
> code we want the SABD combine patterns to kick in whenever sensible. For
> intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
>
> vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
>
> So in this case, the combine would be erroneous. Likewise SABA.
This sounds like it would be problematic for unsigned types and not just for vabs_s8 with vsub_s8. So I think you should be using an unspec for vabd_s8 instead, since in RTL overflow and underflow are defined to be wrapping.
Thanks,
Andrew Pinski
>
> Thanks,
> James
* Re: [AArch64] Fix integer vabs intrinsics
2014-05-02 9:29 ` pinskia
@ 2014-05-02 10:29 ` James Greenhalgh
2014-05-02 10:39 ` Richard Earnshaw
0 siblings, 1 reply; 10+ messages in thread
From: James Greenhalgh @ 2014-05-02 10:29 UTC (permalink / raw)
To: pinskia; +Cc: GCC Patches, Marcus Shawcroft
On Fri, May 02, 2014 at 10:29:06AM +0100, pinskia@gmail.com wrote:
>
>
> > On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> >
> >> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
> >> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
> >> <james.greenhalgh@arm.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
> >>> undefined/impossible, the neon intrinsics vabs intrinsics should behave as
> >>> the hardware. That is to say, the pseudo-code sequence:
> >>
> >>
> >> Only for signed integer types. You should be able to use an unsigned
> >> integer type here instead.
> >
> > If anything, I think that puts us in a worse position.
>
> Not if you cast it back.
>
>
> > The issue that
> > inspires this patch is that GCC will happily fold:
> >
> > t1 = ABS_EXPR (x)
> > t2 = GE_EXPR (t1, 0)
> >
> > to
> >
> > t2 = TRUE
> >
> > Surely an unsigned integer type is going to suffer the same fate? Certainly I
> > can imagine somewhere in the compiler there being a fold path for:
>
> Yes but if add a cast from the unsigned type to the signed type gcc does not
> optimize that. If it does it is a bug since the overflow is defined there.
I'm not sure I understand; are you saying I want to fold to:
t1 = VIEW_CONVERT_EXPR (x, unsigned)
t2 = ABS_EXPR (t1)
t3 = VIEW_CONVERT_EXPR (t2, signed)
Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
other out leading to an overall NOP? It might just be Friday morning and a
lack of coffee talking, but I think I need you to spell this one out to
me in big letters!
> >
> > (unsigned >= 0) == TRUE
> >
> >>>
> >>> a = vabs_s8 (vdup_n_s8 (-128));
> >>> assert (a >= 0);
> >>>
> >>> does not hold. As in hardware
> >>>
> >>> abs (-128) == -128
> >>>
> >>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid
> >>> it. In fact, we have to be even more careful than that, and keep the integer
> >>> vabs intrinsics as an unspec in the back end.
> >>
> >> No it is not. The mistake is to use signed integer types here. Just
> >> add a conversion to an unsigned integer vector and it will work
> >> correctly.
> >> In fact the ABS rtl code is not undefined for the overflow.
> >
> > Here we are covering ourselves against a seperate issue. For auto-vectorized
> > code we want the SABD combine patterns to kick in whenever sensible. For
> > intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
> >
> > vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
> >
> > So in this case, the combine would be erroneous. Likewise SABA.
>
> This sounds like it would problematic for unsigned types and not just for
> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8
> instead. Since in rtl overflow and underflow is defined to be wrapping.
There are no vabs_u8/vabd_u8 intrinsics, so I don't see how we can reach this
point with unsigned types. Further, I have never thought of RTL as having
signed and unsigned types; it's just a bag of bits. We'll want to use an
unspec for the
intrinsic version of vabd_s8 - but we'll want to specify the
(abs (minus (reg) (reg)))
behaviour so that auto-vectorized code can pick it up.
So in the end we'll have these patterns:
(abs
(abs (reg)))
(intrinsic_abs
(unspec [(reg)] UNSPEC_ABS))
(abd
(abs (minus (reg) (reg))))
(intrinsic_abd
(unspec [(reg) (reg)] UNSPEC_ABD))
(aba
(plus (abs (minus (reg) (reg))) (reg)))
(intrinsic_aba
(plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg)))
which should give us reasonable auto-vectorized code without triggering any
of the issues mapping the semantics of the instructions to intrinsics.
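(Purely as a sketch of the shape of the intrinsic ABD pattern - the name
and the UNSPEC here are illustrative, nothing from an actual patch - it
would mirror the aarch64_abs<mode> pattern above:

  (define_insn "aarch64_abd<mode>"
    [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
          (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 "register_operand" "w")
                            (match_operand:VDQ_BHSI 2 "register_operand" "w")]
                           UNSPEC_ABD))]
    "TARGET_SIMD"
    "sabd\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
    [(set_attr "type" "neon_abd<q>")]
  )

while the 'abd' pattern for the vectorizer would keep the
(abs (minus ...)) form.)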
Thanks,
James
>
> Thanks,
> Andrew Pinski
>
> >
> > Thanks,
> > James
>
* Re: [AArch64] Fix integer vabs intrinsics
2014-05-02 10:29 ` James Greenhalgh
@ 2014-05-02 10:39 ` Richard Earnshaw
2014-05-05 8:04 ` Richard Biener
0 siblings, 1 reply; 10+ messages in thread
From: Richard Earnshaw @ 2014-05-02 10:39 UTC (permalink / raw)
To: James Greenhalgh; +Cc: pinskia, GCC Patches, Marcus Shawcroft
On 02/05/14 11:28, James Greenhalgh wrote:
> On Fri, May 02, 2014 at 10:29:06AM +0100, pinskia@gmail.com wrote:
>>
>>
>>> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>>>
>>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
>>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
>>>> <james.greenhalgh@arm.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
>>>>> undefined/impossible, the neon intrinsics vabs intrinsics should behave as
>>>>> the hardware. That is to say, the pseudo-code sequence:
>>>>
>>>>
>>>> Only for signed integer types. You should be able to use an unsigned
>>>> integer type here instead.
>>>
>>> If anything, I think that puts us in a worse position.
>>
>> Not if you cast it back.
>>
>>
>>> The issue that
>>> inspires this patch is that GCC will happily fold:
>>>
>>> t1 = ABS_EXPR (x)
>>> t2 = GE_EXPR (t1, 0)
>>>
>>> to
>>>
>>> t2 = TRUE
>>>
>>> Surely an unsigned integer type is going to suffer the same fate? Certainly I
>>> can imagine somewhere in the compiler there being a fold path for:
>>
>> Yes but if add a cast from the unsigned type to the signed type gcc does not
>> optimize that. If it does it is a bug since the overflow is defined there.
>
> I'm not sure I understand, are you saying I want to fold to:
>
> t1 = VIEW_CONVERT_EXPR (x, unsigned)
> t2 = ABS_EXPR (t1)
> t3 = VIEW_CONVERT_EXPR (t2, signed)
>
> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
> other out leading to an overall NOP? It might just be Friday morning and a
> lack of coffee talking, but I think I need you to spell this one out to
> me in big letters!
>
I agree. I think what you need is a type widening so that you get
t1 = VEC_WIDEN (x)
t2 = ABS_EXPR (t1)
t3 = VEC_NARROW (t2)
This then guarantees that the ABS expression cannot be undefined. I'm
less sure, however, about the narrow causing a change in 'sign'. Has it
just punted the problem? Maybe you need
t1 = VEC_WIDEN (x)
t2 = ABS_EXPR (t1)
t3 = VIEW_CONVERT_EXPR (x, unsigned)
t4 = VEC_NARROW (t3)
t5 = VIEW_CONVERT_EXPR (t4, signed)
!!!
How you capture this into RTL during expand, though, is another thing.
R.
>>>
>>> (unsigned >= 0) == TRUE
>>>
>>>>>
>>>>> a = vabs_s8 (vdup_n_s8 (-128));
>>>>> assert (a >= 0);
>>>>>
>>>>> does not hold. As in hardware
>>>>>
>>>>> abs (-128) == -128
>>>>>
>>>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid
>>>>> it. In fact, we have to be even more careful than that, and keep the integer
>>>>> vabs intrinsics as an unspec in the back end.
>>>>
>>>> No it is not. The mistake is to use signed integer types here. Just
>>>> add a conversion to an unsigned integer vector and it will work
>>>> correctly.
>>>> In fact the ABS rtl code is not undefined for the overflow.
>>>
>>> Here we are covering ourselves against a seperate issue. For auto-vectorized
>>> code we want the SABD combine patterns to kick in whenever sensible. For
>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
>>>
>>> vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
>>>
>>> So in this case, the combine would be erroneous. Likewise SABA.
>>
>> This sounds like it would problematic for unsigned types and not just for
>> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8
>> instead. Since in rtl overflow and underflow is defined to be wrapping.
>
> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point
> with unsigned types. Further, I have never thought of RTL having signed
> and unsigned types, just a bag of bits. We'll want to use unspec for the
> intrinsic version of vabd_s8 - but we'll want to specify the
>
> (abs (minus (reg) (reg)))
>
> behaviour so that auto-vectorized code can pick it up.
>
> So in the end we'll have these patterns:
>
> (abs
> (abs (reg)))
>
> (intrinsic_abs
> (unspec [(reg)] UNSPEC_ABS))
>
> (abd
> (abs (minus (reg) (reg))))
>
> (intrinsic_abd
> (unspec [(reg) (reg)] UNSPEC_ABD))
>
> (aba
> (plus (abs (minus (reg) (reg))) (reg)))
>
> (intrinsic_aba
> (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg)))
>
> which should give us reasonable auto-vectorized code without triggering any
> of the issues mapping the semantics of the instructions to intrinsics.
>
> Thanks,
> James
>
>>
>> Thanks,
>> Andrew Pinski
>>
>>>
>>> Thanks,
>>> James
>>
>
* Re: [AArch64] Fix integer vabs intrinsics
2014-05-02 10:39 ` Richard Earnshaw
@ 2014-05-05 8:04 ` Richard Biener
2014-05-07 10:30 ` Richard Earnshaw
0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2014-05-05 8:04 UTC (permalink / raw)
To: Richard Earnshaw; +Cc: James Greenhalgh, pinskia, GCC Patches, Marcus Shawcroft
On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
> On 02/05/14 11:28, James Greenhalgh wrote:
>> On Fri, May 02, 2014 at 10:29:06AM +0100, pinskia@gmail.com wrote:
>>>
>>>
>>>> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>>>>
>>>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
>>>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
>>>>> <james.greenhalgh@arm.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
>>>>>> undefined/impossible, the neon intrinsics vabs intrinsics should behave as
>>>>>> the hardware. That is to say, the pseudo-code sequence:
>>>>>
>>>>>
>>>>> Only for signed integer types. You should be able to use an unsigned
>>>>> integer type here instead.
>>>>
>>>> If anything, I think that puts us in a worse position.
>>>
>>> Not if you cast it back.
>>>
>>>
>>>> The issue that
>>>> inspires this patch is that GCC will happily fold:
>>>>
>>>> t1 = ABS_EXPR (x)
>>>> t2 = GE_EXPR (t1, 0)
>>>>
>>>> to
>>>>
>>>> t2 = TRUE
>>>>
>>>> Surely an unsigned integer type is going to suffer the same fate? Certainly I
>>>> can imagine somewhere in the compiler there being a fold path for:
>>>
>>> Yes but if add a cast from the unsigned type to the signed type gcc does not
>>> optimize that. If it does it is a bug since the overflow is defined there.
>>
>> I'm not sure I understand, are you saying I want to fold to:
>>
>> t1 = VIEW_CONVERT_EXPR (x, unsigned)
>> t2 = ABS_EXPR (t1)
>> t3 = VIEW_CONVERT_EXPR (t2, signed)
>>
>> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
>> other out leading to an overall NOP? It might just be Friday morning and a
>> lack of coffee talking, but I think I need you to spell this one out to
>> me in big letters!
>>
>
> I agree. I think what you need is a type widening so that you get
>
> t1 = VEC_WIDEN (x)
> t2 = ABS_EXPR (t1)
> t3 = VEC_NARROW (t2)
>
> This then guarantees that the ABS expression cannot be undefined. I'm
> less sure, however about the narrow causing a change in 'sign'. Has it
> just punted the problem? Maybe you need
Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED
result type, thus do abs(int) -> unsigned (what we have as absu_hwi).
That is, have an ABS_EXPR that doesn't have the undefined issue
(at the expense of optimization in case the result is immediately cast
back to signed).
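(For a 32-bit int that gives, for example, ABSU_EXPR (INT_MIN) ==
0x80000000u, which is well defined; casting it back to int then recovers
the INT_MIN result the hardware instruction produces.)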
Richard.
>
> t1 = VEC_WIDEN (x)
> t2 = ABS_EXPR (t1)
> t3 = VIEW_CONVERT_EXPR (x, unsigned)
> t4 = VEC_NARROW (t3)
> t5 = VIEW_CONVERT_EXPR (t4, signed)
>
> !!!
>
> How you capture this into RTL during expand, though, is another thing.
>
> R.
>
>>>>
>>>> (unsigned >= 0) == TRUE
>>>>
>>>>>>
>>>>>> a = vabs_s8 (vdup_n_s8 (-128));
>>>>>> assert (a >= 0);
>>>>>>
>>>>>> does not hold. As in hardware
>>>>>>
>>>>>> abs (-128) == -128
>>>>>>
>>>>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid
>>>>>> it. In fact, we have to be even more careful than that, and keep the integer
>>>>>> vabs intrinsics as an unspec in the back end.
>>>>>
>>>>> No it is not. The mistake is to use signed integer types here. Just
>>>>> add a conversion to an unsigned integer vector and it will work
>>>>> correctly.
>>>>> In fact the ABS rtl code is not undefined for the overflow.
>>>>
>>>> Here we are covering ourselves against a seperate issue. For auto-vectorized
>>>> code we want the SABD combine patterns to kick in whenever sensible. For
>>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
>>>>
>>>> vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
>>>>
>>>> So in this case, the combine would be erroneous. Likewise SABA.
>>>
>>> This sounds like it would problematic for unsigned types and not just for
>>> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8
>>> instead. Since in rtl overflow and underflow is defined to be wrapping.
>>
>> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point
>> with unsigned types. Further, I have never thought of RTL having signed
>> and unsigned types, just a bag of bits. We'll want to use unspec for the
>> intrinsic version of vabd_s8 - but we'll want to specify the
>>
>> (abs (minus (reg) (reg)))
>>
>> behaviour so that auto-vectorized code can pick it up.
>>
>> So in the end we'll have these patterns:
>>
>> (abs
>> (abs (reg)))
>>
>> (intrinsic_abs
>> (unspec [(reg)] UNSPEC_ABS))
>>
>> (abd
>> (abs (minus (reg) (reg))))
>>
>> (intrinsic_abd
>> (unspec [(reg) (reg)] UNSPEC_ABD))
>>
>> (aba
>> (plus (abs (minus (reg) (reg))) (reg)))
>>
>> (intrinsic_aba
>> (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg)))
>>
>> which should give us reasonable auto-vectorized code without triggering any
>> of the issues mapping the semantics of the instructions to intrinsics.
>>
>> Thanks,
>> James
>>
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>>>
>>>> Thanks,
>>>> James
>>>
>>
>
>
* Re: [AArch64] Fix integer vabs intrinsics
2014-05-05 8:04 ` Richard Biener
@ 2014-05-07 10:30 ` Richard Earnshaw
2014-05-07 10:32 ` Richard Biener
0 siblings, 1 reply; 10+ messages in thread
From: Richard Earnshaw @ 2014-05-07 10:30 UTC (permalink / raw)
To: Richard Biener; +Cc: James Greenhalgh, pinskia, GCC Patches, Marcus Shawcroft
On 05/05/14 09:04, Richard Biener wrote:
> On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>> On 02/05/14 11:28, James Greenhalgh wrote:
>>> On Fri, May 02, 2014 at 10:29:06AM +0100, pinskia@gmail.com wrote:
>>>>
>>>>
>>>>> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>>>>>
>>>>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
>>>>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
>>>>>> <james.greenhalgh@arm.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
>>>>>>> undefined/impossible, the neon intrinsics vabs intrinsics should behave as
>>>>>>> the hardware. That is to say, the pseudo-code sequence:
>>>>>>
>>>>>>
>>>>>> Only for signed integer types. You should be able to use an unsigned
>>>>>> integer type here instead.
>>>>>
>>>>> If anything, I think that puts us in a worse position.
>>>>
>>>> Not if you cast it back.
>>>>
>>>>
>>>>> The issue that
>>>>> inspires this patch is that GCC will happily fold:
>>>>>
>>>>> t1 = ABS_EXPR (x)
>>>>> t2 = GE_EXPR (t1, 0)
>>>>>
>>>>> to
>>>>>
>>>>> t2 = TRUE
>>>>>
>>>>> Surely an unsigned integer type is going to suffer the same fate? Certainly I
>>>>> can imagine somewhere in the compiler there being a fold path for:
>>>>
>>>> Yes but if add a cast from the unsigned type to the signed type gcc does not
>>>> optimize that. If it does it is a bug since the overflow is defined there.
>>>
>>> I'm not sure I understand, are you saying I want to fold to:
>>>
>>> t1 = VIEW_CONVERT_EXPR (x, unsigned)
>>> t2 = ABS_EXPR (t1)
>>> t3 = VIEW_CONVERT_EXPR (t2, signed)
>>>
>>> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
>>> other out leading to an overall NOP? It might just be Friday morning and a
>>> lack of coffee talking, but I think I need you to spell this one out to
>>> me in big letters!
>>>
>>
>> I agree. I think what you need is a type widening so that you get
>>
>> t1 = VEC_WIDEN (x)
>> t2 = ABS_EXPR (t1)
>> t3 = VEC_NARROW (t2)
>>
>> This then guarantees that the ABS expression cannot be undefined. I'm
>> less sure, however about the narrow causing a change in 'sign'. Has it
>> just punted the problem? Maybe you need
>
> Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED
> result type, thus do abs(int) -> unsigned (what we have as absu_hwi).
> That is, have an ABS_EXPR that doesn't have the undefined issue
> (at expense of optimization in case the result is immediately casted
> back to signed)
>
Yes, that would make more sense, and is, in effect, what the ARM VABS
instruction is doing (producing an unsigned result with no undefined
behaviour).
I'm not sure I understand your 'at the expense of optimization' comment,
though. Surely a cast back to signed is essentially a no-op, since
there's no representational change in the value (at least, not on 2's
complement machines)?
> Richard.
>
>>
>> t1 = VEC_WIDEN (x)
>> t2 = ABS_EXPR (t1)
>> t3 = VIEW_CONVERT_EXPR (x, unsigned)
>> t4 = VEC_NARROW (t3)
>> t5 = VIEW_CONVERT_EXPR (t4, signed)
>>
>> !!!
>>
>> How you capture this into RTL during expand, though, is another thing.
>>
>> R.
>>
>>>>>
>>>>> (unsigned >= 0) == TRUE
>>>>>
>>>>>>>
>>>>>>> a = vabs_s8 (vdup_n_s8 (-128));
>>>>>>> assert (a >= 0);
>>>>>>>
>>>>>>> does not hold. As in hardware
>>>>>>>
>>>>>>> abs (-128) == -128
>>>>>>>
>>>>>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid
>>>>>>> it. In fact, we have to be even more careful than that, and keep the integer
>>>>>>> vabs intrinsics as an unspec in the back end.
>>>>>>
>>>>>> No it is not. The mistake is to use signed integer types here. Just
>>>>>> add a conversion to an unsigned integer vector and it will work
>>>>>> correctly.
>>>>>> In fact the ABS rtl code is not undefined for the overflow.
>>>>>
>>>>> Here we are covering ourselves against a seperate issue. For auto-vectorized
>>>>> code we want the SABD combine patterns to kick in whenever sensible. For
>>>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
>>>>>
>>>>> vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
>>>>>
>>>>> So in this case, the combine would be erroneous. Likewise SABA.
>>>>
>>>> This sounds like it would problematic for unsigned types and not just for
>>>> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8
>>>> instead. Since in rtl overflow and underflow is defined to be wrapping.
>>>
>>> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point
>>> with unsigned types. Further, I have never thought of RTL having signed
>>> and unsigned types, just a bag of bits. We'll want to use unspec for the
>>> intrinsic version of vabd_s8 - but we'll want to specify the
>>>
>>> (abs (minus (reg) (reg)))
>>>
>>> behaviour so that auto-vectorized code can pick it up.
>>>
>>> So in the end we'll have these patterns:
>>>
>>> (abs
>>> (abs (reg)))
>>>
>>> (intrinsic_abs
>>> (unspec [(reg)] UNSPEC_ABS))
>>>
>>> (abd
>>> (abs (minus (reg) (reg))))
>>>
>>> (intrinsic_abd
>>> (unspec [(reg) (reg)] UNSPEC_ABD))
>>>
>>> (aba
>>> (plus (abs (minus (reg) (reg))) (reg)))
>>>
>>> (intrinsic_aba
>>> (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg)))
>>>
>>> which should give us reasonable auto-vectorized code without triggering any
>>> of the issues mapping the semantics of the instructions to intrinsics.
>>>
>>> Thanks,
>>> James
>>>
>>>>
>>>> Thanks,
>>>> Andrew Pinski
>>>>
>>>>>
>>>>> Thanks,
>>>>> James
>>>>
>>>
>>
>>
>
* Re: [AArch64] Fix integer vabs intrinsics
2014-05-07 10:30 ` Richard Earnshaw
@ 2014-05-07 10:32 ` Richard Biener
2014-05-07 10:40 ` Richard Earnshaw
0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2014-05-07 10:32 UTC (permalink / raw)
To: Richard Earnshaw; +Cc: James Greenhalgh, pinskia, GCC Patches, Marcus Shawcroft
On Wed, May 7, 2014 at 12:30 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
> On 05/05/14 09:04, Richard Biener wrote:
>> On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>>> On 02/05/14 11:28, James Greenhalgh wrote:
>>>> On Fri, May 02, 2014 at 10:29:06AM +0100, pinskia@gmail.com wrote:
>>>>>
>>>>>
>>>>>> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>>>>>>
>>>>>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
>>>>>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
>>>>>>> <james.greenhalgh@arm.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
>>>>>>>> undefined/impossible, the neon intrinsics vabs intrinsics should behave as
>>>>>>>> the hardware. That is to say, the pseudo-code sequence:
>>>>>>>
>>>>>>>
>>>>>>> Only for signed integer types. You should be able to use an unsigned
>>>>>>> integer type here instead.
>>>>>>
>>>>>> If anything, I think that puts us in a worse position.
>>>>>
>>>>> Not if you cast it back.
>>>>>
>>>>>
>>>>>> The issue that
>>>>>> inspires this patch is that GCC will happily fold:
>>>>>>
>>>>>> t1 = ABS_EXPR (x)
>>>>>> t2 = GE_EXPR (t1, 0)
>>>>>>
>>>>>> to
>>>>>>
>>>>>> t2 = TRUE
>>>>>>
>>>>>> Surely an unsigned integer type is going to suffer the same fate? Certainly I
>>>>>> can imagine somewhere in the compiler there being a fold path for:
>>>>>
>>>>> Yes but if add a cast from the unsigned type to the signed type gcc does not
>>>>> optimize that. If it does it is a bug since the overflow is defined there.
>>>>
>>>> I'm not sure I understand, are you saying I want to fold to:
>>>>
>>>> t1 = VIEW_CONVERT_EXPR (x, unsigned)
>>>> t2 = ABS_EXPR (t1)
>>>> t3 = VIEW_CONVERT_EXPR (t2, signed)
>>>>
>>>> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
>>>> other out leading to an overall NOP? It might just be Friday morning and a
>>>> lack of coffee talking, but I think I need you to spell this one out to
>>>> me in big letters!
>>>>
>>>
>>> I agree. I think what you need is a type widening so that you get
>>>
>>> t1 = VEC_WIDEN (x)
>>> t2 = ABS_EXPR (t1)
>>> t3 = VEC_NARROW (t2)
>>>
>>> This then guarantees that the ABS expression cannot be undefined. I'm
>>> less sure, however about the narrow causing a change in 'sign'. Has it
>>> just punted the problem? Maybe you need
>>
>> Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED
>> result type, thus do abs(int) -> unsigned (what we have as absu_hwi).
>> That is, have an ABS_EXPR that doesn't have the undefined issue
>> (at expense of optimization in case the result is immediately casted
>> back to signed)
>>
>
> Yes, that would make more sense, and is, in effect, what the ARM VABS
> instruction is doing (producing an unsigned result with no undefined
> behaviour).
>
> I'm not sure I understand your 'at expense of optimization' comment,
> though. Surely a cast back to signed is essentially a no-op, since
> there's no representational change in the value (at least, not on 2's
> complement machines)?
We can't derive a value range of [0, INT_MAX] for the (int)ABSU_EXPR.
Richard.
>
>> Richard.
>>
>>>
>>> t1 = VEC_WIDEN (x)
>>> t2 = ABS_EXPR (t1)
>>> t3 = VIEW_CONVERT_EXPR (x, unsigned)
>>> t4 = VEC_NARROW (t3)
>>> t5 = VIEW_CONVERT_EXPR (t4, signed)
>>>
>>> !!!
>>>
>>> How you capture this into RTL during expand, though, is another thing.
>>>
>>> R.
>>>
>>>>>>
>>>>>> (unsigned >= 0) == TRUE
>>>>>>
>>>>>>>>
>>>>>>>> a = vabs_s8 (vdup_n_s8 (-128));
>>>>>>>> assert (a >= 0);
>>>>>>>>
>>>>>>>> does not hold. As in hardware
>>>>>>>>
>>>>>>>> abs (-128) == -128
>>>>>>>>
>>>>>>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid
>>>>>>>> it. In fact, we have to be even more careful than that, and keep the integer
>>>>>>>> vabs intrinsics as an unspec in the back end.
>>>>>>>
>>>>>>> No it is not. The mistake is to use signed integer types here. Just
>>>>>>> add a conversion to an unsigned integer vector and it will work
>>>>>>> correctly.
>>>>>>> In fact the ABS rtl code is not undefined for the overflow.
>>>>>>
>>>>>> Here we are covering ourselves against a seperate issue. For auto-vectorized
>>>>>> code we want the SABD combine patterns to kick in whenever sensible. For
>>>>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
>>>>>>
>>>>>> vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
>>>>>>
>>>>>> So in this case, the combine would be erroneous. Likewise SABA.
>>>>>
>>>>> This sounds like it would problematic for unsigned types and not just for
>>>>> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8
>>>>> instead. Since in rtl overflow and underflow is defined to be wrapping.
>>>>
>>>> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point
>>>> with unsigned types. Further, I have never thought of RTL having signed
>>>> and unsigned types, just a bag of bits. We'll want to use unspec for the
>>>> intrinsic version of vabd_s8 - but we'll want to specify the
>>>>
>>>> (abs (minus (reg) (reg)))
>>>>
>>>> behaviour so that auto-vectorized code can pick it up.
>>>>
>>>> So in the end we'll have these patterns:
>>>>
>>>> (abs
>>>> (abs (reg)))
>>>>
>>>> (intrinsic_abs
>>>> (unspec [(reg)] UNSPEC_ABS))
>>>>
>>>> (abd
>>>> (abs (minus (reg) (reg))))
>>>>
>>>> (intrinsic_abd
>>>> (unspec [(reg) (reg)] UNSPEC_ABD))
>>>>
>>>> (aba
>>>> (plus (abs (minus (reg) (reg))) (reg)))
>>>>
>>>> (intrinsic_aba
>>>> (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg)))
>>>>
>>>> which should give us reasonable auto-vectorized code without triggering any
>>>> of the issues mapping the semantics of the instructions to intrinsics.
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>>>
>>>>> Thanks,
>>>>> Andrew Pinski
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> James
>>>>>
>>>>
>>>
>>>
>>
>
>
* Re: [AArch64] Fix integer vabs intrinsics
2014-05-07 10:32 ` Richard Biener
@ 2014-05-07 10:40 ` Richard Earnshaw
0 siblings, 0 replies; 10+ messages in thread
From: Richard Earnshaw @ 2014-05-07 10:40 UTC (permalink / raw)
To: Richard Biener; +Cc: James Greenhalgh, pinskia, GCC Patches, Marcus Shawcroft
On 07/05/14 11:32, Richard Biener wrote:
> On Wed, May 7, 2014 at 12:30 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>> On 05/05/14 09:04, Richard Biener wrote:
>>> On Fri, May 2, 2014 at 12:39 PM, Richard Earnshaw <rearnsha@arm.com> wrote:
>>>> On 02/05/14 11:28, James Greenhalgh wrote:
>>>>> On Fri, May 02, 2014 at 10:29:06AM +0100, pinskia@gmail.com wrote:
>>>>>>
>>>>>>
>>>>>>> On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>>>>>>>
>>>>>>>> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
>>>>>>>> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
>>>>>>>> <james.greenhalgh@arm.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
>>>>>>>>> undefined/impossible, the neon intrinsics vabs intrinsics should behave as
>>>>>>>>> the hardware. That is to say, the pseudo-code sequence:
>>>>>>>>
>>>>>>>>
>>>>>>>> Only for signed integer types. You should be able to use an unsigned
>>>>>>>> integer type here instead.
>>>>>>>
>>>>>>> If anything, I think that puts us in a worse position.
>>>>>>
>>>>>> Not if you cast it back.
>>>>>>
>>>>>>
>>>>>>> The issue that
>>>>>>> inspires this patch is that GCC will happily fold:
>>>>>>>
>>>>>>> t1 = ABS_EXPR (x)
>>>>>>> t2 = GE_EXPR (t1, 0)
>>>>>>>
>>>>>>> to
>>>>>>>
>>>>>>> t2 = TRUE
>>>>>>>
>>>>>>> Surely an unsigned integer type is going to suffer the same fate? Certainly I
>>>>>>> can imagine somewhere in the compiler there being a fold path for:
>>>>>>
>>>>>> Yes but if add a cast from the unsigned type to the signed type gcc does not
>>>>>> optimize that. If it does it is a bug since the overflow is defined there.
>>>>>
>>>>> I'm not sure I understand, are you saying I want to fold to:
>>>>>
>>>>> t1 = VIEW_CONVERT_EXPR (x, unsigned)
>>>>> t2 = ABS_EXPR (t1)
>>>>> t3 = VIEW_CONVERT_EXPR (t2, signed)
>>>>>
>>>>> Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
>>>>> other out leading to an overall NOP? It might just be Friday morning and a
>>>>> lack of coffee talking, but I think I need you to spell this one out to
>>>>> me in big letters!
>>>>>
>>>>
>>>> I agree. I think what you need is a type widening so that you get
>>>>
>>>> t1 = VEC_WIDEN (x)
>>>> t2 = ABS_EXPR (t1)
>>>> t3 = VEC_NARROW (t2)
>>>>
>>>> This then guarantees that the ABS expression cannot be undefined. I'm
>>>> less sure, however about the narrow causing a change in 'sign'. Has it
>>>> just punted the problem? Maybe you need
>>>
>>> Another option is to allow ABS_EXPR to have a TYPE_UNSIGNED
>>> result type, thus do abs(int) -> unsigned (what we have as absu_hwi).
>>> That is, have an ABS_EXPR that doesn't have the undefined issue
>>> (at expense of optimization in case the result is immediately casted
>>> back to signed)
>>>
>>
>> Yes, that would make more sense, and is, in effect, what the ARM VABS
>> instruction is doing (producing an unsigned result with no undefined
>> behaviour).
>>
>> I'm not sure I understand your 'at expense of optimization' comment,
>> though. Surely a cast back to signed is essentially a no-op, since
>> there's no representational change in the value (at least, not on 2's
>> complement machines)?
>
> We can't derive a value range of [0, INT_MAX] for the (int)ABSU_EXPR.
>
Unless you're assuming that ABS_EXPR(INT_MIN) will always trap, then if
you can derive it for ABS_EXPR (which really returns [0,
INT_MAX]+UNSPECIFIED), I don't really see why you can't derive it for
(int)ABSU_EXPR, which returns [0, INT_MAX]+INT_MIN, since the latter is
a subset of the former.
R.
> Richard.
>
>>
>>> Richard.
>>>
>>>>
>>>> t1 = VEC_WIDEN (x)
>>>> t2 = ABS_EXPR (t1)
>>>> t3 = VIEW_CONVERT_EXPR (x, unsigned)
>>>> t4 = VEC_NARROW (t3)
>>>> t5 = VIEW_CONVERT_EXPR (t4, signed)
>>>>
>>>> !!!
>>>>
>>>> How you capture this into RTL during expand, though, is another thing.
>>>>
>>>> R.
>>>>
>>>>>>>
>>>>>>> (unsigned >= 0) == TRUE
>>>>>>>
>>>>>>>>>
>>>>>>>>> a = vabs_s8 (vdup_n_s8 (-128));
>>>>>>>>> assert (a >= 0);
>>>>>>>>>
>>>>>>>>> does not hold. As in hardware
>>>>>>>>>
>>>>>>>>> abs (-128) == -128
>>>>>>>>>
>>>>>>>>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should avoid
>>>>>>>>> it. In fact, we have to be even more careful than that, and keep the integer
>>>>>>>>> vabs intrinsics as an unspec in the back end.
>>>>>>>>
>>>>>>>> No it is not. The mistake is to use signed integer types here. Just
>>>>>>>> add a conversion to an unsigned integer vector and it will work
>>>>>>>> correctly.
>>>>>>>> In fact the ABS rtl code is not undefined for the overflow.
>>>>>>>
>>>>>>> Here we are covering ourselves against a seperate issue. For auto-vectorized
>>>>>>> code we want the SABD combine patterns to kick in whenever sensible. For
>>>>>>> intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
>>>>>>>
>>>>>>> vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
>>>>>>>
>>>>>>> So in this case, the combine would be erroneous. Likewise SABA.
>>>>>>
>>>>>> This sounds like it would problematic for unsigned types and not just for
>>>>>> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8
>>>>>> instead. Since in rtl overflow and underflow is defined to be wrapping.
>>>>>
>>>>> There are no vabs_u8/vabd_u8 so I don't see how we can reach this point
>>>>> with unsigned types. Further, I have never thought of RTL having signed
>>>>> and unsigned types, just a bag of bits. We'll want to use unspec for the
>>>>> intrinsic version of vabd_s8 - but we'll want to specify the
>>>>>
>>>>> (abs (minus (reg) (reg)))
>>>>>
>>>>> behaviour so that auto-vectorized code can pick it up.
>>>>>
>>>>> So in the end we'll have these patterns:
>>>>>
>>>>> (abs
>>>>> (abs (reg)))
>>>>>
>>>>> (intrinsic_abs
>>>>> (unspec [(reg)] UNSPEC_ABS))
>>>>>
>>>>> (abd
>>>>> (abs (minus (reg) (reg))))
>>>>>
>>>>> (intrinsic_abd
>>>>> (unspec [(reg) (reg)] UNSPEC_ABD))
>>>>>
>>>>> (aba
>>>>> (plus (abs (minus (reg) (reg))) (reg)))
>>>>>
>>>>> (intrinsic_aba
>>>>> (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg)))
>>>>>
>>>>> which should give us reasonable auto-vectorized code without triggering any
>>>>> of the issues mapping the semantics of the instructions to intrinsics.
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Andrew Pinski
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> James
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>