* [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
@ 2015-01-31 8:53 Jeff Law
2015-02-01 0:47 ` Joseph Myers
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Law @ 2015-01-31 8:53 UTC (permalink / raw)
To: gcc-patches
So I've gone round and round with BZ39726.
The basic idea here is we have an operation on widened operands that
feeds a narrow comparison. ie, the two inputs are QImode, extended to
SImode, subtracted, then we want to conditionally branch on the result
of sign bit of QImode.
If we could just do the subtraction in the native mode of the operands
(QImode) and test that result, we eliminate two extensions and the test
itself can be eliminated as we'll get the QImode sign bit from the
QImode subtraction.
I first started poking at combine to see if I could exploit the newer 4
insn combination capability to see enough INSNs to prove we could narrow
the operations *and* arrange for the resulting insns to be well suited
for redundant cmp/tst removal.
This was going to require some extensions to the simplification code,
the code to determine good split points, the when to use 4 insn
combination heuristics and possibly other stuff. I was able to see all
the insns I needed but the RTL I was going to have to detect and rewrite
was getting pretty fugly.
I changed course and started looking at twiddling the m68k backend. I
built a proof of concept pair of splitters that would rewrite the key 4
insns and it clearly improved the code. But the result was not well
suited for removing the redundant tst. To also arrange for result to be
suitable for redundant tst/cmp removal, the two proof of concept
splitters would have to merely be bridge patterns to an even larger
splitter which would give us access to the conditional branch so that we
could rewrite the compare and conditional branch too in such a way to
make it obvious the result of the subtraction was sufficient to remove
the tst/cmp instruction.
I then wondered if I could use the new match.pd code to describe what I
wanted at a higher level, possibly getting to simpler code while still
working on generic/gimple. Richi speculated that the new match.pd
approach could be used for type demotions -- I was skeptical, but what
the hell.
+/* If we are testing a single bit resulting from a binary
+ operation in precision P1 where the operands were widened
+ precision P2 and the tested bit is the sign bit for
+ precision P2. Rewrite so the binary operation is in
+ precision P2. */
+(for inner_op (minus plus mult bit_and bit_ior bit_xor)
+ (simplify
+ (bit_and (inner_op (convert @0) (convert @1)) INTEGER_CST@3)
+ (if (GENERIC
+ && exact_log2 (TREE_INT_CST_LOW (@3)) + 1 == TYPE_PRECISION
(TREE_TYPE (@0))
+ && TREE_TYPE (@0) == TREE_TYPE (@1)
+ && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
+ /* This is a bit hackish, but I didn't see a good way to change
+ the type of the result. Note that changing the result like
+ this implies that we are not operating on gimple. */
+ (with {type = TREE_TYPE (@0); }
+ (convert (bit_and (inner_op @0 @1) (convert @3)))))))
The exact_log2 (blah) + 1 == TYPE_PRECISION (blah2) stuff ensures that
we're looking at just the sign bit in the same mode/precision as the two
operands of the binary operation.
We're changing the type of the entire expression to match the native
type of the operands. So we have to reassign "type" to the new type we
want to use. This also implies we're working on generic rather than
gimple. It's a bit of a hack, but I couldn't convince the code to use a
different type for the final result.
I guess it could be made to work on gimple as well by expanding it to
include the conditional branch this pattern feeds -- that way we could
avoid the type mismatch.
With that match to match.pd, I get the desired code on the m68k for the
test in the testcase as well as several of my own. I've also verified
this makes _no_ difference to the resulting runtime libraries for x86
and x86_64.
I'll spin a m68k bootstrap over the weekend, but wanted to get some
feedback on the match.pd pattern and perhaps a cleaner way to deal with
changing the resultant type.
Thoughts, comments?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-01-31 8:53 [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing Jeff Law
@ 2015-02-01 0:47 ` Joseph Myers
2015-02-01 5:46 ` Jeff Law
0 siblings, 1 reply; 19+ messages in thread
From: Joseph Myers @ 2015-02-01 0:47 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
On Fri, 30 Jan 2015, Jeff Law wrote:
> +/* If we are testing a single bit resulting from a binary
> + operation in precision P1 where the operands were widened
> + precision P2 and the tested bit is the sign bit for
> + precision P2. Rewrite so the binary operation is in
> + precision P2. */
To avoid introducing undefined behavior, if the operation is arithmetic
rather than bitwise and the original type with precision P2 is signed then
you need to convert the operands to the corresponding unsigned type.
(I'm not sure how you avoid needing to convert the final result back to
the original type of the expression to avoid type mismatches in the
containing expression, but such a conversion back to the original type
would need to be a zero-extension not a sign-extension and so for that I'd
suppose the inner type should be unsigned even for bitwise operations.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-01 0:47 ` Joseph Myers
@ 2015-02-01 5:46 ` Jeff Law
2015-02-02 8:57 ` Richard Biener
2015-02-02 16:59 ` Joseph Myers
0 siblings, 2 replies; 19+ messages in thread
From: Jeff Law @ 2015-02-01 5:46 UTC (permalink / raw)
To: Joseph Myers; +Cc: gcc-patches
On 01/31/15 17:47, Joseph Myers wrote:
> On Fri, 30 Jan 2015, Jeff Law wrote:
>
>> +/* If we are testing a single bit resulting from a binary
>> + operation in precision P1 where the operands were widened
>> + precision P2 and the tested bit is the sign bit for
>> + precision P2. Rewrite so the binary operation is in
>> + precision P2. */
>
> To avoid introducing undefined behavior, if the operation is arithmetic
> rather than bitwise and the original type with precision P2 is signed then
> you need to convert the operands to the corresponding unsigned type.
Yea, probably so.
>
> (I'm not sure how you avoid needing to convert the final result back to
> the original type of the expression to avoid type mismatches in the
> containing expression, but such a conversion back to the original type
> would need to be a zero-extension not a sign-extension and so for that I'd
> suppose the inner type should be unsigned even for bitwise operations.
So I think the way to go to solve both issues is to wrap the result
inside a convert. Right now by working on generic, we're relying on
implicit type conversions, which feels wrong.
The nice thing about wrapping the result inside a convert is the types
for the inner operations will propagate from the type of the inner
operands, which is exactly what we want. We then remove the hack
assigning type and instead the original type will be used for the
outermost convert.
That seems to DTRT in some initial sniff testing.
And FWIW, there's no reason to restrict the pattern to just masking off
the sign bit. That's what the PR complains about, but we can do
considerably better here. That's part of the reason why I put in the
iterators -- to generalize this to more cases.
jeff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-01 5:46 ` Jeff Law
@ 2015-02-02 8:57 ` Richard Biener
2015-02-02 18:32 ` Jeff Law
2015-02-02 16:59 ` Joseph Myers
1 sibling, 1 reply; 19+ messages in thread
From: Richard Biener @ 2015-02-02 8:57 UTC (permalink / raw)
To: Jeff Law; +Cc: Joseph Myers, GCC Patches
On Sun, Feb 1, 2015 at 6:46 AM, Jeff Law <law@redhat.com> wrote:
> On 01/31/15 17:47, Joseph Myers wrote:
>>
>> On Fri, 30 Jan 2015, Jeff Law wrote:
>>
>>> +/* If we are testing a single bit resulting from a binary
>>> + operation in precision P1 where the operands were widened
>>> + precision P2 and the tested bit is the sign bit for
>>> + precision P2. Rewrite so the binary operation is in
>>> + precision P2. */
>>
>>
>> To avoid introducing undefined behavior, if the operation is arithmetic
>> rather than bitwise and the original type with precision P2 is signed then
>> you need to convert the operands to the corresponding unsigned type.
>
> Yea, probably so.
>
>
>>
>> (I'm not sure how you avoid needing to convert the final result back to
>> the original type of the expression to avoid type mismatches in the
>> containing expression, but such a conversion back to the original type
>> would need to be a zero-extension not a sign-extension and so for that I'd
>> suppose the inner type should be unsigned even for bitwise operations.
>
> So I think the way to go to solve both issues is to wrap the result inside a
> convert. Right now by working on generic, we're relying on implicit type
> conversions, which feels wrong.
>
> The nice thing about wrapping the result inside a convert is the types for
> the inner operations will propagate from the type of the inner operands,
> which is exactly what we want. We then remove the hack assigning type and
> instead the original type will be used for the outermost convert.
It's not even a hack but wrong ;) Correct supported syntax is
+ (with { tree type0 = TREE_TYPE (@0); }
+ (convert:type0 (bit_and (inner_op @0 @1) (convert @3)))))))
Thus whenever the generator cannot auto-guess a type (or would guess
the wrong one) you can explicitely specify a type to convert to.
Why do you restrict this to GENERIC? On GIMPLE you'd eventually
want to impose some single-use constraints as the result with all
the conversions won't really be unconditionally "better"?
(reminds me of thinking of a nicer way for all this single-use stuff
for next stage1)
> That seems to DTRT in some initial sniff testing.
>
> And FWIW, there's no reason to restrict the pattern to just masking off the
> sign bit. That's what the PR complains about, but we can do considerably
> better here. That's part of the reason why I put in the iterators -- to
> generalize this to more cases.
Indeed.
Richard.
> jeff
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-02 8:57 ` Richard Biener
@ 2015-02-02 18:32 ` Jeff Law
2015-02-03 11:39 ` Richard Biener
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Law @ 2015-02-02 18:32 UTC (permalink / raw)
To: Richard Biener; +Cc: Joseph Myers, GCC Patches
On 02/02/15 01:57, Richard Biener wrote:
>>
>> The nice thing about wrapping the result inside a convert is the types for
>> the inner operations will propagate from the type of the inner operands,
>> which is exactly what we want. We then remove the hack assigning type and
>> instead the original type will be used for the outermost convert.
>
> It's not even a hack but wrong ;) Correct supported syntax is
>
> + (with { tree type0 = TREE_TYPE (@0); }
> + (convert:type0 (bit_and (inner_op @0 @1) (convert @3)))))))
>
> Thus whenever the generator cannot auto-guess a type (or would guess
> the wrong one) you can explicitely specify a type to convert to.
I found that explicit types were ignored in some cases. It was
frustrating to say the least. But I think I've got this part doing what
I want without the hack.
>
> Why do you restrict this to GENERIC? On GIMPLE you'd eventually
> want to impose some single-use constraints as the result with all
> the conversions won't really be unconditionally "better"?
That was strictly because of the mismatch between the resulting type and
how it was later used. That restriction shouldn't be needed anymore.
Jeff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-02 18:32 ` Jeff Law
@ 2015-02-03 11:39 ` Richard Biener
2015-02-09 7:15 ` Jeff Law
0 siblings, 1 reply; 19+ messages in thread
From: Richard Biener @ 2015-02-03 11:39 UTC (permalink / raw)
To: Jeff Law; +Cc: Joseph Myers, GCC Patches
On February 2, 2015 7:32:15 PM CET, Jeff Law <law@redhat.com> wrote:
>On 02/02/15 01:57, Richard Biener wrote:
>>>
>>> The nice thing about wrapping the result inside a convert is the
>types for
>>> the inner operations will propagate from the type of the inner
>operands,
>>> which is exactly what we want. We then remove the hack assigning
>type and
>>> instead the original type will be used for the outermost convert.
>>
>> It's not even a hack but wrong ;) Correct supported syntax is
>>
>> + (with { tree type0 = TREE_TYPE (@0); }
>> + (convert:type0 (bit_and (inner_op @0 @1) (convert @3)))))))
>>
>> Thus whenever the generator cannot auto-guess a type (or would guess
>> the wrong one) you can explicitely specify a type to convert to.
>I found that explicit types were ignored in some cases. It was
>frustrating to say the least.
Huh, that would be a bug. Do you have a pattern where that happens?
Richard.
But I think I've got this part doing
>what
>I want without the hack.
>
>>
>> Why do you restrict this to GENERIC? On GIMPLE you'd eventually
>> want to impose some single-use constraints as the result with all
>> the conversions won't really be unconditionally "better"?
>That was strictly because of the mismatch between the resulting type
>and
>how it was later used. That restriction shouldn't be needed anymore.
>
>Jeff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-03 11:39 ` Richard Biener
@ 2015-02-09 7:15 ` Jeff Law
2015-02-09 13:42 ` Richard Biener
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Law @ 2015-02-09 7:15 UTC (permalink / raw)
To: Richard Biener; +Cc: Joseph Myers, GCC Patches
On 02/03/15 04:39, Richard Biener wrote:
>> I found that explicit types were ignored in some cases. It was
>> frustrating to say the least.
>
> Huh, that would be a bug. Do you have a pattern where that happens?
I'll have to recreate them. In the mean time consider something else
I'm playing with that causes an odd error from genmatch...
/* If we have a narrowing conversion of an arithmetic or logical
operation where both are operands widening conversions from the
same type as the outer narrowing conversion. Then convert the
innermost operands to a suitable unsigned type (to avoid introducing
undefined behaviour), perform the operation and convert the result to
the desired type. */
(simplify
(convert (plus (convert@2 @0) (convert @1)))
(if (TREE_TYPE (@0) == TREE_TYPE (@1)
&& TREE_TYPE (@0) == type
&& INTEGRAL_TYPE_P (type)
&& TYPE_PRECISION (TREE_TYPE (@2)) >= TYPE_PRECISION
(TREE_TYPE (@0)))
(with { tree utype = unsigned_type_for (TREE_TYPE (@0));}
(convert (plus (convert:utype @0) (convert:utype @1)))))))
So given two narrow operands that get widened, added, and the final
result narrowed back down to the original operand types. Replace with
convert the operands to an unsigned type (of same size as the operand),
operate on them and convert to the final desired type.
This happens to fix 47477 (P2 regression). Works perfectly for the
testcase.
Of course we'd like to extend that to other operators... So, adding the
obvious for iterator...
(for op (plus minus)
(simplify
(convert (op (convert@2 @0) (convert @1)))
(if (TREE_TYPE (@0) == TREE_TYPE (@1)
&& TREE_TYPE (@0) == type
&& INTEGRAL_TYPE_P (type)
&& TYPE_PRECISION (TREE_TYPE (@2)) >= TYPE_PRECISION
(TREE_TYPE (@0)))
(with { tree utype = unsigned_type_for (TREE_TYPE (@0));}
(convert (op (convert:utype @0) (convert:utype @1)))))))
Which causes genmatch to barf:
build/genmatch --gimple /home/gcc/GIT-2/gcc/gcc/match.pd \
> tmp-gimple-match.c
genmatch: two conversions in a row
Not only does genmatch barf, it doesn't give any indication what part of
the .pd file it found objectionable.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-09 7:15 ` Jeff Law
@ 2015-02-09 13:42 ` Richard Biener
2015-02-09 18:33 ` Jeff Law
0 siblings, 1 reply; 19+ messages in thread
From: Richard Biener @ 2015-02-09 13:42 UTC (permalink / raw)
To: Jeff Law; +Cc: Joseph Myers, GCC Patches
On Mon, Feb 9, 2015 at 8:15 AM, Jeff Law <law@redhat.com> wrote:
> On 02/03/15 04:39, Richard Biener wrote:
>>>
>>> I found that explicit types were ignored in some cases. It was
>>> frustrating to say the least.
>>
>>
>> Huh, that would be a bug. Do you have a pattern where that happens?
>
> I'll have to recreate them. In the mean time consider something else I'm
> playing with that causes an odd error from genmatch...
>
> /* If we have a narrowing conversion of an arithmetic or logical
> operation where both are operands widening conversions from the
> same type as the outer narrowing conversion. Then convert the
> innermost operands to a suitable unsigned type (to avoid introducing
> undefined behaviour), perform the operation and convert the result to
> the desired type. */
> (simplify
> (convert (plus (convert@2 @0) (convert @1)))
> (if (TREE_TYPE (@0) == TREE_TYPE (@1)
> && TREE_TYPE (@0) == type
> && INTEGRAL_TYPE_P (type)
> && TYPE_PRECISION (TREE_TYPE (@2)) >= TYPE_PRECISION (TREE_TYPE
> (@0)))
> (with { tree utype = unsigned_type_for (TREE_TYPE (@0));}
> (convert (plus (convert:utype @0) (convert:utype @1)))))))
>
> So given two narrow operands that get widened, added, and the final result
> narrowed back down to the original operand types. Replace with convert the
> operands to an unsigned type (of same size as the operand), operate on them
> and convert to the final desired type.
>
> This happens to fix 47477 (P2 regression). Works perfectly for the
> testcase.
>
>
> Of course we'd like to extend that to other operators... So, adding the
> obvious for iterator...
>
> (for op (plus minus)
> (simplify
> (convert (op (convert@2 @0) (convert @1)))
> (if (TREE_TYPE (@0) == TREE_TYPE (@1)
> && TREE_TYPE (@0) == type
> && INTEGRAL_TYPE_P (type)
> && TYPE_PRECISION (TREE_TYPE (@2)) >= TYPE_PRECISION (TREE_TYPE
> (@0)))
> (with { tree utype = unsigned_type_for (TREE_TYPE (@0));}
> (convert (op (convert:utype @0) (convert:utype @1)))))))
>
>
> Which causes genmatch to barf:
>
> build/genmatch --gimple /home/gcc/GIT-2/gcc/gcc/match.pd \
> > tmp-gimple-match.c
> genmatch: two conversions in a row
>
>
> Not only does genmatch barf, it doesn't give any indication what part of the
> .pd file it found objectionable.
Yeah, I'll have to assign locations to more places at some point.
But the following fixes your testcase, committed to trunk as obvious.
Richard.
2015-02-09 Richard Biener <rguenther@suse.de>
* genmatch.c (replace_id): Copy expr_type.
Index: gcc/genmatch.c
===================================================================
--- gcc/genmatch.c (revision 220540)
+++ gcc/genmatch.c (working copy)
@@ -982,6 +982,7 @@ replace_id (operand *o, user_id *id, id_
{
expr *ne = new expr (e->operation == id ? with : e->operation,
e->is_commutative);
+ ne->expr_type = e->expr_type;
for (unsigned i = 0; i < e->ops.length (); ++i)
ne->append_op (replace_id (e->ops[i], id, with));
return ne;
>
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-09 13:42 ` Richard Biener
@ 2015-02-09 18:33 ` Jeff Law
0 siblings, 0 replies; 19+ messages in thread
From: Jeff Law @ 2015-02-09 18:33 UTC (permalink / raw)
To: Richard Biener; +Cc: Joseph Myers, GCC Patches
On 02/09/15 06:42, Richard Biener wrote:
> On Mon, Feb 9, 2015 at 8:15 AM, Jeff Law <law@redhat.com> wrote:
>> On 02/03/15 04:39, Richard Biener wrote:
>>>>
>>>> I found that explicit types were ignored in some cases. It was
>>>> frustrating to say the least.
>>>
>>>
>>> Huh, that would be a bug. Do you have a pattern where that happens?
>>
>> I'll have to recreate them. In the mean time consider something else I'm
>> playing with that causes an odd error from genmatch...
>>
>> /* If we have a narrowing conversion of an arithmetic or logical
>> operation where both are operands widening conversions from the
>> same type as the outer narrowing conversion. Then convert the
>> innermost operands to a suitable unsigned type (to avoid introducing
>> undefined behaviour), perform the operation and convert the result to
>> the desired type. */
>> (simplify
>> (convert (plus (convert@2 @0) (convert @1)))
>> (if (TREE_TYPE (@0) == TREE_TYPE (@1)
>> && TREE_TYPE (@0) == type
>> && INTEGRAL_TYPE_P (type)
>> && TYPE_PRECISION (TREE_TYPE (@2)) >= TYPE_PRECISION (TREE_TYPE
>> (@0)))
>> (with { tree utype = unsigned_type_for (TREE_TYPE (@0));}
>> (convert (plus (convert:utype @0) (convert:utype @1)))))))
>>
>> So given two narrow operands that get widened, added, and the final result
>> narrowed back down to the original operand types. Replace with convert the
>> operands to an unsigned type (of same size as the operand), operate on them
>> and convert to the final desired type.
>>
>> This happens to fix 47477 (P2 regression). Works perfectly for the
>> testcase.
>>
>>
>> Of course we'd like to extend that to other operators... So, adding the
>> obvious for iterator...
>>
>> (for op (plus minus)
>> (simplify
>> (convert (op (convert@2 @0) (convert @1)))
>> (if (TREE_TYPE (@0) == TREE_TYPE (@1)
>> && TREE_TYPE (@0) == type
>> && INTEGRAL_TYPE_P (type)
>> && TYPE_PRECISION (TREE_TYPE (@2)) >= TYPE_PRECISION (TREE_TYPE
>> (@0)))
>> (with { tree utype = unsigned_type_for (TREE_TYPE (@0));}
>> (convert (op (convert:utype @0) (convert:utype @1)))))))
>>
>>
>> Which causes genmatch to barf:
>>
>> build/genmatch --gimple /home/gcc/GIT-2/gcc/gcc/match.pd \
>> > tmp-gimple-match.c
>> genmatch: two conversions in a row
>>
>>
>> Not only does genmatch barf, it doesn't give any indication what part of the
>> .pd file it found objectionable.
>
> Yeah, I'll have to assign locations to more places at some point.
>
> But the following fixes your testcase, committed to trunk as obvious.
>
> Richard.
>
> 2015-02-09 Richard Biener <rguenther@suse.de>
>
> * genmatch.c (replace_id): Copy expr_type.
Cool, thanks!
jeff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-01 5:46 ` Jeff Law
2015-02-02 8:57 ` Richard Biener
@ 2015-02-02 16:59 ` Joseph Myers
2015-02-02 18:04 ` Jeff Law
1 sibling, 1 reply; 19+ messages in thread
From: Joseph Myers @ 2015-02-02 16:59 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
On Sat, 31 Jan 2015, Jeff Law wrote:
> The nice thing about wrapping the result inside a convert is the types for the
> inner operations will propagate from the type of the inner operands, which is
> exactly what we want. We then remove the hack assigning type and instead the
> original type will be used for the outermost convert.
Those inner operands still need converting to unsigned for arithmetic.
> And FWIW, there's no reason to restrict the pattern to just masking off the
> sign bit. That's what the PR complains about, but we can do considerably
> better here. That's part of the reason why I put in the iterators -- to
> generalize this to more cases.
Well, we want to move shorten_binary_op and shorten_compare to the new
mechanism.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-02 16:59 ` Joseph Myers
@ 2015-02-02 18:04 ` Jeff Law
2015-02-03 7:20 ` Jeff Law
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Law @ 2015-02-02 18:04 UTC (permalink / raw)
To: Joseph Myers; +Cc: gcc-patches
On 02/02/15 09:59, Joseph Myers wrote:
> On Sat, 31 Jan 2015, Jeff Law wrote:
>
>> The nice thing about wrapping the result inside a convert is the types for the
>> inner operations will propagate from the type of the inner operands, which is
>> exactly what we want. We then remove the hack assigning type and instead the
>> original type will be used for the outermost convert.
>
> Those inner operands still need converting to unsigned for arithmetic.
Yes.
>
>> And FWIW, there's no reason to restrict the pattern to just masking off the
>> sign bit. That's what the PR complains about, but we can do considerably
>> better here. That's part of the reason why I put in the iterators -- to
>> generalize this to more cases.
>
> Well, we want to move shorten_binary_op and shorten_compare to the new
> mechanism.
Absolutely. If we could have match.pd cover those cases, that'd be a
significant validation of match.pd for this class of problems. Seems
like gcc-6 stuff to me though.
I haven't looked at those routines in a long time, but reviewing them
seems wise both in the immediate term WRT this bug and ensuring we're
doing the right thing for the various corner cases.
Jeff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-02 18:04 ` Jeff Law
@ 2015-02-03 7:20 ` Jeff Law
2015-02-03 12:23 ` Joseph Myers
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Law @ 2015-02-03 7:20 UTC (permalink / raw)
To: Joseph Myers; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 2375 bytes --]
On 02/02/15 11:04, Jeff Law wrote:
> On 02/02/15 09:59, Joseph Myers wrote:
>> On Sat, 31 Jan 2015, Jeff Law wrote:
>>
>>> The nice thing about wrapping the result inside a convert is the
>>> types for the
>>> inner operations will propagate from the type of the inner operands,
>>> which is
>>> exactly what we want. We then remove the hack assigning type and
>>> instead the
>>> original type will be used for the outermost convert.
>>
>> Those inner operands still need converting to unsigned for arithmetic.
> Yes.
So it's actually painful to try to get those inner operands converted to
unsigned. So at least for this iteration, it's probably best to punt
for signed arithmetic and focus on the logicals and unsigned arithmetic
>
> I haven't looked at those routines in a long time, but reviewing them
> seems wise both in the immediate term WRT this bug and ensuring we're
> doing the right thing for the various corner cases.
So shorten_binary_op is the closest to what the code in match.pd tries
to do right now. They use slightly different means to know when its
safe to narrow a binary operation.
shorten_binary_op is passed in a resulting type and it assumes that no
bits outside that type are needed.
My match.pd code looks for an explicit mask to know when the bits
outside the binary operation's type are not needed. I'd think it ought
to be possible to extend the match.pd code to handle other mechansisms
where we know some set of high bits aren't needed. Hard to justify that
extension in stage4 though.
The latest match.pd code requires the types of op0 and op1 to have the
same precision and signedness. simplify_binary_op is a bit looser than
that, but that's most likely a historical quirk since it pre-dates gimple.
I'm now using two match.pd patterns, one for logicals, one for unsigned
arithmetic which also simplifies things a bit. Finally the match.pd
code does not try to handle signed inner arith operands which helps too.
It would certainly be interesting to instrument shorten_binary_op in
stage1 and catch the cases where it's triggering and then looking to see
how those cases can be done in match.pd.
Anyway, here's the two patterns now. They bootstrap and don't cause any
code generation changes for x86_64, but they do fix 39726 and
considerably improve a variety of related testcases on the m68k.
Jeff
[-- Attachment #2: P --]
[-- Type: text/plain, Size: 1777 bytes --]
diff --git a/gcc/match.pd b/gcc/match.pd
index 81c4ee6..d55fccd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1018,3 +1018,31 @@ along with GCC; see the file COPYING3. If not see
(logs (pows @0 @1))
(mult @1 (logs @0)))))
+/* Given a bit-wise operation performed in mode P1 on operands
+ in some narrower type P2 that feeds an outer masking operation.
+ See if the mask turns off all the bits outside P2, and if so
+ perform the all the operations in P2 and just convert the final
+ result from P1 to P2. */
+(for inner_op (bit_and bit_ior bit_xor)
+ (simplify
+ (bit_and (inner_op (convert @0) (convert @1)) INTEGER_CST@3)
+ (if ((TREE_INT_CST_LOW (@3) & ~GET_MODE_MASK (TYPE_MODE (TREE_TYPE (@0)))) == 0
+ && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
+ && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
+ && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
+ (convert (bit_and (inner_op @0 @1) (convert @3))))))
+
+/* Similarly, but for unsigned arithmetic operations.
+
+ It would be nice to handle signed arithmetic, but that runs the
+ risk of introducing undefined behaviour where none existed before. */
+(for inner_op (minus plus mult)
+ (simplify
+ (bit_and (inner_op (convert @0) (convert @1)) INTEGER_CST@3)
+ (if ((TREE_INT_CST_LOW (@3) & ~GET_MODE_MASK (TYPE_MODE (TREE_TYPE (@0)))) == 0
+ && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
+ && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
+ /* Restricted to unsigned inner arithmetic for now. */
+ && TYPE_UNSIGNED (TREE_TYPE (@0))
+ && TYPE_PRECISION (TREE_TYPE (@3)) > TYPE_PRECISION (TREE_TYPE (@0)))
+ (convert (bit_and (inner_op @0 @1) (convert @3))))))
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-03 7:20 ` Jeff Law
@ 2015-02-03 12:23 ` Joseph Myers
2015-02-08 7:43 ` Jeff Law
2015-02-11 6:43 ` Jeff Law
0 siblings, 2 replies; 19+ messages in thread
From: Joseph Myers @ 2015-02-03 12:23 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
On Tue, 3 Feb 2015, Jeff Law wrote:
> +/* Given a bit-wise operation performed in mode P1 on operands
> + in some narrower type P2 that feeds an outer masking operation.
> + See if the mask turns off all the bits outside P2, and if so
> + perform the all the operations in P2 and just convert the final
> + result from P1 to P2. */
> +(for inner_op (bit_and bit_ior bit_xor)
> + (simplify
> + (bit_and (inner_op (convert @0) (convert @1)) INTEGER_CST@3)
> + (if ((TREE_INT_CST_LOW (@3) & ~GET_MODE_MASK (TYPE_MODE (TREE_TYPE (@0)))) == 0
Are you sure about checking TREE_INT_CST_LOW here? What if the inner type
is wider than HOST_WIDE_INT? (That could occur with bit-fields of type
__int128, for example.) I think the check for what bits are set needs to
be written in a wide-int-safe way - maybe something like tree_int_cst_sgn
(@3) > 0 && tree_int_cst_min_precision (@3, UNSIGNED) <= TYPE_PRECISION
(TREE_TYPE (@1)).
> + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
> + && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
> + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
> + (convert (bit_and (inner_op @0 @1) (convert @3))))))
I still don't think this is safe. Suppose @0 and @1 are -128 in type
int8_t and @3 is 128 in a wider type and the operation is AND. Then the
original expression has result 128. But if you convert @3 to int8_t you
get -128 and this would result in -128 from the simplified expression.
If the inner values are signed and the mask includes the sign bit of the
inner type, you have to zero-extend to the wider type (e.g. convert the
inner values to unsigned), not sign-extend.
(If the inner values are signed, it's *also* valid to optimize with a mask
where both the sign bit of the inner type and all higher bits are set,
such as a mask of -128 above; in that case, you do need to sign-extend.
If the inner values are unsigned, no check on the mask value is needed at
all as all higher bits in the mask can just be discarded. Both of these
statements only apply for bitwise operations, not arithmetic.)
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-03 12:23 ` Joseph Myers
@ 2015-02-08 7:43 ` Jeff Law
2015-02-11 6:43 ` Jeff Law
1 sibling, 0 replies; 19+ messages in thread
From: Jeff Law @ 2015-02-08 7:43 UTC (permalink / raw)
To: Joseph Myers; +Cc: gcc-patches
On 02/03/15 05:23, Joseph Myers wrote:
> On Tue, 3 Feb 2015, Jeff Law wrote:
>
>> +/* Given a bit-wise operation performed in mode P1 on operands
>> + in some narrower type P2 that feeds an outer masking operation.
>> + See if the mask turns off all the bits outside P2, and if so
>> + perform the all the operations in P2 and just convert the final
>> + result from P1 to P2. */
>> +(for inner_op (bit_and bit_ior bit_xor)
>> + (simplify
>> + (bit_and (inner_op (convert @0) (convert @1)) INTEGER_CST@3)
>> + (if ((TREE_INT_CST_LOW (@3) & ~GET_MODE_MASK (TYPE_MODE (TREE_TYPE (@0)))) == 0
>
> Are you sure about checking TREE_INT_CST_LOW here? What if the inner type
> is wider than HOST_WIDE_INT? (That could occur with bit-fields of type
> __int128, for example.) I think the check for what bits are set needs to
> be written in a wide-int-safe way - maybe something like tree_int_cst_sgn
> (@3) > 0 && tree_int_cst_min_precision (@3, UNSIGNED) <= TYPE_PRECISION
> (TREE_TYPE (@1)).
It's a WIP :-)
>
>> + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
>> + && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
>> + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
>> + (convert (bit_and (inner_op @0 @1) (convert @3))))))
>
> I still don't think this is safe. Suppose @0 and @1 are -128 in type
> int8_t and @3 is 128 in a wider type and the operation is AND. Then the
> original expression has result 128. But if you convert @3 to int8_t you
> get -128 and this would result in -128 from the simplified expression.
Agreed.
>
> If the inner values are signed and the mask includes the sign bit of the
> inner type, you have to zero-extend to the wider type (e.g. convert the
> inner values to unsigned), not sign-extend.
FWIW I did figure out how to get at an expression's type (you can
capture them just like individual operands). This is important because
if we introduce that signed->unsigned conversions, we have to avoid
infinite recursion of the pattern and the easiest way is actually to
look at the various types.
>
> (If the inner values are signed, it's *also* valid to optimize with a mask
> where both the sign bit of the inner type and all higher bits are set,
> such as a mask of -128 above; in that case, you do need to sign-extend.
> If the inner values are unsigned, no check on the mask value is needed at
> all as all higher bits in the mask can just be discarded. Both of these
> statements only apply for bitwise operations, not arithmetic.)
Noted.
Thanks. I'm going to go through another iteration on this stuff. I'm
less concerned about this particular PR, but the knowledge gained here
looks to be helpful for 45397 which looks like it might be solveable
with match.pd patterns too.
jeff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-03 12:23 ` Joseph Myers
2015-02-08 7:43 ` Jeff Law
@ 2015-02-11 6:43 ` Jeff Law
2015-02-11 11:16 ` Richard Biener
2015-02-11 17:13 ` Joseph Myers
1 sibling, 2 replies; 19+ messages in thread
From: Jeff Law @ 2015-02-11 6:43 UTC (permalink / raw)
To: Joseph Myers; +Cc: gcc-patches
On 02/03/15 05:23, Joseph Myers wrote:
>
>> + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1))
>> + && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE (@1))
>> + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
>> + (convert (bit_and (inner_op @0 @1) (convert @3))))))
>
> I still don't think this is safe. Suppose @0 and @1 are -128 in type
> int8_t and @3 is 128 in a wider type and the operation is AND. Then the
> original expression has result 128. But if you convert @3 to int8_t you
> get -128 and this would result in -128 from the simplified expression.
Looking at this again, @3 is being converted to the wrong type, plain
and simple. I should never write code while heavily medicated. I
suspect that combined with trying to work around the genmatch bug Richi
just fixed sent me down a totally wrong path.
I think the way to go is to always convert the inner operands to an
unsigned type. In fact, everything except the outer convert should be
using an unsigned type of the same size/precision as @0's type. The
outer convert should, of course, be the final type.
That avoids all the concerns about sign bit propagation from the narrow
type into the wider final type.
Application of this pattern (and the one I posted for 47477) is a
concern for targets that don't do sub-word arithmetic/logicals. But I
just did a sniff test of one such target (v850-elf because it was handy)
and I couldn't spot a change in the end code using both the 47477 patch
and my WIP patch for this BZ.
jeff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-11 6:43 ` Jeff Law
@ 2015-02-11 11:16 ` Richard Biener
2015-02-11 15:56 ` Jeff Law
2015-02-11 17:13 ` Joseph Myers
1 sibling, 1 reply; 19+ messages in thread
From: Richard Biener @ 2015-02-11 11:16 UTC (permalink / raw)
To: Jeff Law; +Cc: Joseph Myers, GCC Patches
On Wed, Feb 11, 2015 at 7:43 AM, Jeff Law <law@redhat.com> wrote:
> On 02/03/15 05:23, Joseph Myers wrote:
>>
>>
>>> + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
>>> (@1))
>>> + && TYPE_UNSIGNED (TREE_TYPE (@0)) == TYPE_UNSIGNED (TREE_TYPE
>>> (@1))
>>> + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
>>> + (convert (bit_and (inner_op @0 @1) (convert @3))))))
>>
>>
>> I still don't think this is safe. Suppose @0 and @1 are -128 in type
>> int8_t and @3 is 128 in a wider type and the operation is AND. Then the
>> original expression has result 128. But if you convert @3 to int8_t you
>> get -128 and this would result in -128 from the simplified expression.
>
> Looking at this again, @3 is being converted to the wrong type, plain and
> simple. I should never write code while heavily medicated. I suspect that
> combined with trying to work around the genmatch bug Richi just fixed sent
> me down a totally wrong path.
>
> I think the way to go is to always convert the inner operands to an unsigned
> type. In fact, everything except the outer convert should be using an
> unsigned type of the same size/precision as @0's type. The outer convert
> should, of course, be the final type.
>
> That avoids all the concerns about sign bit propagation from the narrow type
> into the wider final type.
>
> Application of this pattern (and the one I posted for 47477) is a concern
> for targets that don't do sub-word arithmetic/logicals. But I just did a
> sniff test of one such target (v850-elf because it was handy) and I couldn't
> spot a change in the end code using both the 47477 patch and my WIP patch
> for this BZ.
The c-family frontends perform this kind of narrowing already anyway
(via the shorten_* stuff which is misplaced there and should be done
elsewhere for all frontends - thus in match.pd, thanks for starting that).
Richard.
> jeff
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-11 11:16 ` Richard Biener
@ 2015-02-11 15:56 ` Jeff Law
2015-02-11 16:06 ` Jakub Jelinek
0 siblings, 1 reply; 19+ messages in thread
From: Jeff Law @ 2015-02-11 15:56 UTC (permalink / raw)
To: Richard Biener; +Cc: Joseph Myers, GCC Patches
On 02/11/15 04:16, Richard Biener wrote:
>>
>> Application of this pattern (and the one I posted for 47477) is a concern
>> for targets that don't do sub-word arithmetic/logicals. But I just did a
>> sniff test of one such target (v850-elf because it was handy) and I couldn't
>> spot a change in the end code using both the 47477 patch and my WIP patch
>> for this BZ.
>
> The c-family frontends perform this kind of narrowing already anyway
> (via the shorten_* stuff which is misplaced there and should be done
> elsewhere for all frontends - thus in match.pd, thanks for starting that).
True, but I wanted to see if there was any impact, but thankfully there
isn't.
The fact that the C/C++ front-ends are doing most of the shortening now
probably explains why the fix for 47477 only affected code generation
for the Java front-end.
Jeff
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-11 15:56 ` Jeff Law
@ 2015-02-11 16:06 ` Jakub Jelinek
0 siblings, 0 replies; 19+ messages in thread
From: Jakub Jelinek @ 2015-02-11 16:06 UTC (permalink / raw)
To: Jeff Law; +Cc: Richard Biener, Joseph Myers, GCC Patches
On Wed, Feb 11, 2015 at 08:56:20AM -0700, Jeff Law wrote:
> On 02/11/15 04:16, Richard Biener wrote:
> >>
> >>Application of this pattern (and the one I posted for 47477) is a concern
> >>for targets that don't do sub-word arithmetic/logicals. But I just did a
> >>sniff test of one such target (v850-elf because it was handy) and I couldn't
> >>spot a change in the end code using both the 47477 patch and my WIP patch
> >>for this BZ.
> >
> >The c-family frontends perform this kind of narrowing already anyway
> >(via the shorten_* stuff which is misplaced there and should be done
> >elsewhere for all frontends - thus in match.pd, thanks for starting that).
> True, but I wanted to see if there was any impact, but thankfully there
> isn't.
>
> The fact that the C/C++ front-ends are doing most of the shortening now
> probably explains why the fix for 47477 only affected code generation for
> the Java front-end.
The C/C++ FEs are doing it only if it appears on a single statement though,
something fold can see at once.
If you rewrite it so that there are multiple vars holding the intermediate
values, they of course can't do that and it is a task for match.pd or
specialized pass to handle those.
Jakub
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing
2015-02-11 6:43 ` Jeff Law
2015-02-11 11:16 ` Richard Biener
@ 2015-02-11 17:13 ` Joseph Myers
1 sibling, 0 replies; 19+ messages in thread
From: Joseph Myers @ 2015-02-11 17:13 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
On Tue, 10 Feb 2015, Jeff Law wrote:
> I think the way to go is to always convert the inner operands to an unsigned
> type. In fact, everything except the outer convert should be using an
> unsigned type of the same size/precision as @0's type. The outer convert
> should, of course, be the final type.
>
> That avoids all the concerns about sign bit propagation from the narrow type
> into the wider final type.
Subject of course to allowing for the cases where the original expression
uses sign-extension (signed narrow type), the mask is consistent with this
(sign bit of narrow type and all higher bits set, e.g. -128), and the
operation is bitwise not arithmetic, if you want to optimize those as
well.
--
Joseph S. Myers
joseph@codesourcery.com
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2015-02-11 17:13 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-31 8:53 [RFC][PR target/39726 P4 regression] match.pd pattern to do type narrowing Jeff Law
2015-02-01 0:47 ` Joseph Myers
2015-02-01 5:46 ` Jeff Law
2015-02-02 8:57 ` Richard Biener
2015-02-02 18:32 ` Jeff Law
2015-02-03 11:39 ` Richard Biener
2015-02-09 7:15 ` Jeff Law
2015-02-09 13:42 ` Richard Biener
2015-02-09 18:33 ` Jeff Law
2015-02-02 16:59 ` Joseph Myers
2015-02-02 18:04 ` Jeff Law
2015-02-03 7:20 ` Jeff Law
2015-02-03 12:23 ` Joseph Myers
2015-02-08 7:43 ` Jeff Law
2015-02-11 6:43 ` Jeff Law
2015-02-11 11:16 ` Richard Biener
2015-02-11 15:56 ` Jeff Law
2015-02-11 16:06 ` Jakub Jelinek
2015-02-11 17:13 ` Joseph Myers
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).