public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Kyrill Tkachov <kyrylo.tkachov@arm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: gcc Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults
Date: Thu, 05 Nov 2015 12:01:00 -0000	[thread overview]
Message-ID: <563B4516.5090001@arm.com> (raw)
In-Reply-To: <20151104235015.GA13203@gate.crashing.org>

Hi Segher,

On 04/11/15 23:50, Segher Boessenkool wrote:
> Hi Kyrill,
>
> On Mon, Nov 02, 2015 at 02:15:27PM +0000, Kyrill Tkachov wrote:
>> This patch attempts to restrict combine from transforming ZERO_EXTEND and
>> SIGN_EXTEND operations into and-bitmask
>> and weird SUBREG expressions when they appear inside MULT expressions. This
>> is because a MULT rtx containing these
>> extend operations is usually a well understood widening multiply operation.
> Right.  I have often wanted for combine to try plain substitution as well
> as substitution and simplification: simplification (which uses nonzero_bits
> etc.) often makes a mess of even moderately complex instructions.
>
> But since we do not have that yet, and your 24% number is nicely impressive,
> let's try to do no simplification for widening mults, as in your patch.
>
>> With this patch on aarch64 I saw increased generation of smaddl and umaddl
>> instructions where
>> before the combination of smull and add instructions were rejected because
>> the extend operands
>> were being transformed into these weird equivalent expressions.
>>
>> Overall, I see 24% more [su]maddl instructions being generated for SPEC2006
>> and with no performance regressions.
>>
>> The testcase in the patch is the most minimal one I could get that
>> demonstrates the issue I'm trying to solve.
>>
>> Does this approach look ok?
> In broad lines it does.  Could you try this patch instead?  Not tested
> etc. (other than building an aarch64 cross and your test case); I'll do
> that tomorrow if you like the result.

Thanks, that looks less intrusive. I did try it out on arm and aarch64.
It does work on the aarch64 testcase. However, there's also a correctness
regression, I'll try to explain inline....

>
>
> Segher
>
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index c3db2e0..3bf7ffb 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -5284,6 +5284,15 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
>   	      || GET_CODE (SET_DEST (x)) == PC))
>   	fmt = "ie";
>   
> +      /* Substituting into the operands of a widening MULT is not likely
> +	 to create RTL matching a machine insn.  */
> +      if (code == MULT
> +	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
> +	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
> +	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
> +	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
> +	return x;

I think we should also add:
       && REG_P (XEXP (XEXP (x, 0), 0))
       && REG_P (XEXP (XEXP (x, 1), 0))

to the condition. Otherwise I've seen regressions in the arm testsuite, the
gcc.target/arm/smlatb-1.s test in particular that tries to match the pattern
(define_insn "*maddhisi4tb"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
     (plus:SI (mult:SI (ashiftrt:SI
                (match_operand:SI 1 "s_register_operand" "r")
                (const_int 16))
               (sign_extend:SI
                (match_operand:HI 2 "s_register_operand" "r")))
          (match_operand:SI 3 "s_register_operand" "r")))]


There we have a sign_extend of a shift that we want to convert to the form
that the pattern expects. So adding the checks for REG_P fixes that for me.

For the correctness issue I saw on aarch64 the shortest case I could reduce is:
short int a[16], b[16];
void
f5 (void)
{
   a[0] = b[0] / 14;
}

We synthesise the division and lose one of the intermediate immediates.
The good codegen is:
f5:
     adrp    x1, b
     mov    w0, 9363            // <--- This gets lost!
     movk    w0, 0x9249, lsl 16 // <--- This gets lost!
     adrp    x2, a
     ldrsh    w1, [x1, #:lo12:b]
     smull    x0, w1, w0
     lsr    x0, x0, 32
     add    w0, w1, w0
     asr    w0, w0, 3
     sub    w0, w0, w1, asr 31
     strh    w0, [x2, #:lo12:a]
     ret

The bad one with this patch is:
     adrp    x1, b
     adrp    x2, a
     ldrsh    w1, [x1, #:lo12:b]
     smull    x0, w1, w0
     lsr    x0, x0, 32
     add    w0, w1, w0
     asr    w0, w0, 3
     sub    w0, w0, w1, asr 31
     strh    w0, [x2, #:lo12:a]
     ret


The problematic hunk there is the subst hunk.
In the expression 'x':
(mult:DI (sign_extend:DI (reg:SI 80 [ b ]))
     (sign_extend:DI (reg:SI 82)))

it tries to substitute 'from': (reg:SI 82)
with 'to': (const_int -1840700269 [0xffffffff92492493])

since we now return just 'x' combine thinks that the substitution succeeded
and eliminates the immediate move.
Is there a way that subst can signal some kind of "failed to substitute" result?
If not, I got it to work by using that check to set the in_dest variable to the
subsequent recursive call to subst, in a similar way to my original patch, but I was
hoping to avoid overloading the meaning of in_dest.

Thanks,
Kyrill

> +
>         /* Get the mode of operand 0 in case X is now a SIGN_EXTEND of a
>   	 constant.  */
>         if (fmt[0] == 'e')
> @@ -8455,6 +8464,15 @@ force_to_mode (rtx x, machine_mode mode, unsigned HOST_WIDE_INT mask,
>         /* ... fall through ...  */
>   
>       case MULT:
> +      /* Substituting into the operands of a widening MULT is not likely to
> +	 create RTL matching a machine insn.  */
> +      if (code == MULT
> +	  && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
> +	      || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
> +	  && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
> +	      || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
> +	return gen_lowpart_or_truncate (mode, x);
> +
>         /* For PLUS, MINUS and MULT, we need any bits less significant than the
>   	 most significant bit in MASK since carries from those bits will
>   	 affect the bits we are interested in.  */

  reply	other threads:[~2015-11-05 12:01 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-02 14:15 Kyrill Tkachov
2015-11-02 22:31 ` Jeff Law
2015-11-04 11:37   ` Kyrill Tkachov
2015-11-04 13:33     ` Kyrill Tkachov
2015-11-04 23:50 ` Segher Boessenkool
2015-11-05 12:01   ` Kyrill Tkachov [this message]
2015-11-06  0:56     ` Segher Boessenkool
2015-11-06 14:19       ` Kyrill Tkachov
2015-11-06 21:14         ` Jeff Law
2015-11-06 22:00           ` Segher Boessenkool
2015-11-08 20:58             ` Segher Boessenkool
2015-11-09  7:52               ` Uros Bizjak
2015-11-09  9:51                 ` Segher Boessenkool
2015-11-10 19:53                   ` Segher Boessenkool
2015-11-13 10:10                     ` Kyrill Tkachov
2015-11-13 10:17                       ` Uros Bizjak
2015-11-13 15:01                         ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=563B4516.5090001@arm.com \
    --to=kyrylo.tkachov@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).