From: Kyrill Tkachov <kyrylo.tkachov@arm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
Date: Thu, 05 Nov 2015 12:01:00 -0000 [thread overview]
Message-ID: <563B4516.5090001@arm.com> (raw)
In-Reply-To: <20151104235015.GA13203@gate.crashing.org>
Hi Segher,
On 04/11/15 23:50, Segher Boessenkool wrote:
> Hi Kyrill,
>
> On Mon, Nov 02, 2015 at 02:15:27PM +0000, Kyrill Tkachov wrote:
>> This patch attempts to restrict combine from transforming ZERO_EXTEND and
>> SIGN_EXTEND operations into and-bitmask
>> and weird SUBREG expressions when they appear inside MULT expressions. This
>> is because a MULT rtx containing these
>> extend operations is usually a well understood widening multiply operation.
> Right. I have often wanted for combine to try plain substitution as well
> as substitution and simplification: simplification (which uses nonzero_bits
> etc.) often makes a mess of even moderately complex instructions.
>
> But since we do not have that yet, and your 24% number is nicely impressive,
> let's try to do no simplification for widening mults, as in your patch.
>
>> With this patch on aarch64 I saw increased generation of smaddl and umaddl
>> instructions where
>> before the combination of smull and add instructions were rejected because
>> the extend operands
>> were being transformed into these weird equivalent expressions.
>>
>> Overall, I see 24% more [su]maddl instructions being generated for SPEC2006
>> and with no performance regressions.
>>
>> The testcase in the patch is the most minimal one I could get that
>> demonstrates the issue I'm trying to solve.
>>
>> Does this approach look ok?
> In broad lines it does. Could you try this patch instead? Not tested
> etc. (other than building an aarch64 cross and your test case); I'll do
> that tomorrow if you like the result.
Thanks, that looks less intrusive. I did try it out on arm and aarch64.
It does work on the aarch64 testcase. However, there's also a correctness
regression, I'll try to explain inline....
>
>
> Segher
>
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index c3db2e0..3bf7ffb 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -5284,6 +5284,15 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
> || GET_CODE (SET_DEST (x)) == PC))
> fmt = "ie";
>
> + /* Substituting into the operands of a widening MULT is not likely
> + to create RTL matching a machine insn. */
> + if (code == MULT
> + && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
> + || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
> + && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
> + || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
> + return x;
I think we should also add:
&& REG_P (XEXP (XEXP (x, 0), 0))
&& REG_P (XEXP (XEXP (x, 1), 0))
to the condition. Otherwise I've seen regressions in the arm testsuite, the
gcc.target/arm/smlatb-1.s test in particular that tries to match the pattern
(define_insn "*maddhisi4tb"
[(set (match_operand:SI 0 "s_register_operand" "=r")
(plus:SI (mult:SI (ashiftrt:SI
(match_operand:SI 1 "s_register_operand" "r")
(const_int 16))
(sign_extend:SI
(match_operand:HI 2 "s_register_operand" "r")))
(match_operand:SI 3 "s_register_operand" "r")))]
There we have a sign_extend of a shift that we want to convert to the form
that the pattern expects. So adding the checks for REG_P fixes that for me.
For the correctness issue I saw on aarch64 the shortest case I could reduce is:
short int a[16], b[16];
void
f5 (void)
{
a[0] = b[0] / 14;
}
We synthesise the division and lose one of the intermediate immediates.
The good codegen is:
f5:
adrp x1, b
mov w0, 9363 // <--- This gets lost!
movk w0, 0x9249, lsl 16 // <--- This gets lost!
adrp x2, a
ldrsh w1, [x1, #:lo12:b]
smull x0, w1, w0
lsr x0, x0, 32
add w0, w1, w0
asr w0, w0, 3
sub w0, w0, w1, asr 31
strh w0, [x2, #:lo12:a]
ret
The bad one with this patch is:
adrp x1, b
adrp x2, a
ldrsh w1, [x1, #:lo12:b]
smull x0, w1, w0
lsr x0, x0, 32
add w0, w1, w0
asr w0, w0, 3
sub w0, w0, w1, asr 31
strh w0, [x2, #:lo12:a]
ret
The problematic hunk there is the subst hunk.
In the expression 'x':
(mult:DI (sign_extend:DI (reg:SI 80 [ b ]))
(sign_extend:DI (reg:SI 82)))
it tries to substitute 'from': (reg:SI 82)
with 'to': (const_int -1840700269 [0xffffffff92492493])
since we now return just 'x' combine thinks that the substitution succeeded
and eliminates the immediate move.
Is there a way that subst can signal some kind of "failed to substitute" result?
If not, I got it to work by using that check to set the in_dest variable to the
subsequent recursive call to subst, in a similar way to my original patch, but I was
hoping to avoid overloading the meaning of in_dest.
Thanks,
Kyrill
> +
> /* Get the mode of operand 0 in case X is now a SIGN_EXTEND of a
> constant. */
> if (fmt[0] == 'e')
> @@ -8455,6 +8464,15 @@ force_to_mode (rtx x, machine_mode mode, unsigned HOST_WIDE_INT mask,
> /* ... fall through ... */
>
> case MULT:
> + /* Substituting into the operands of a widening MULT is not likely to
> + create RTL matching a machine insn. */
> + if (code == MULT
> + && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
> + || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
> + && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
> + || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
> + return gen_lowpart_or_truncate (mode, x);
> +
> /* For PLUS, MINUS and MULT, we need any bits less significant than the
> most significant bit in MASK since carries from those bits will
> affect the bits we are interested in. */
next prev parent reply other threads:[~2015-11-05 12:01 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-02 14:15 Kyrill Tkachov
2015-11-02 22:31 ` Jeff Law
2015-11-04 11:37 ` Kyrill Tkachov
2015-11-04 13:33 ` Kyrill Tkachov
2015-11-04 23:50 ` Segher Boessenkool
2015-11-05 12:01 ` Kyrill Tkachov [this message]
2015-11-06 0:56 ` Segher Boessenkool
2015-11-06 14:19 ` Kyrill Tkachov
2015-11-06 21:14 ` Jeff Law
2015-11-06 22:00 ` Segher Boessenkool
2015-11-08 20:58 ` Segher Boessenkool
2015-11-09 7:52 ` Uros Bizjak
2015-11-09 9:51 ` Segher Boessenkool
2015-11-10 19:53 ` Segher Boessenkool
2015-11-13 10:10 ` Kyrill Tkachov
2015-11-13 10:17 ` Uros Bizjak
2015-11-13 15:01 ` Segher Boessenkool
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=563B4516.5090001@arm.com \
--to=kyrylo.tkachov@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=segher@kernel.crashing.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).