From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 6/8] aarch64: Tweak MLA vector costs
Date: Wed, 04 Aug 2021 13:19:42 +0100 [thread overview]
Message-ID: <mptr1f95rmp.fsf@arm.com> (raw)
In-Reply-To: <mpty29h5rvd.fsf@arm.com> (Richard Sandiford via Gcc-patches's message of "Wed, 04 Aug 2021 13:14:30 +0100")
Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Tue, Aug 3, 2021 at 2:10 PM Richard Sandiford via Gcc-patches
>> <gcc-patches@gcc.gnu.org> wrote:
>>>
>>> The issue-based vector costs currently assume that a multiply-add
>>> sequence can be implemented using a single instruction. This is
>>> generally true for scalars (which have a 4-operand instruction)
>>> and SVE (which allows the output to be tied to any input).
>>> However, for Advanced SIMD, multiplying two values and adding
>>> an invariant will end up being a move and an MLA.
>>>
>>> The only target to use the issue-based vector costs is Neoverse V1,
>>> which would generally prefer SVE in this case anyway. I therefore
>>> don't have a self-contained testcase. However, the distinction
>>> becomes more important with a later patch.
>>
>> But we do cost any invariants separately (for the prologue), so they
>> should be available in a register. How doesn't that work?
>
> Yeah, that works, and the prologue part is costed correctly. But the
> result of an Advanced SIMD FMLA is tied to the accumulator input, so if
> the accumulator input is an invariant, we need a register move (in the
> loop body) before the FMLA.
>
> E.g. for:
>
> void
> f (float *restrict x, float *restrict y, float *restrict z, float a)
> {
>   for (int i = 0; i < 100; ++i)
>     x[i] = y[i] * z[i] + 1.0f;
> }
>
> (the "+ 1.0f" was missing from the example in my previous message;
> not sure where that went)
>
> the scalar code is:
>
> .L2:
>         ldr     s1, [x1, x3]
>         ldr     s2, [x2, x3]
>         fmadd   s1, s1, s2, s0
>         str     s1, [x0, x3]
>         add     x3, x3, 4
>         cmp     x3, 400
>         bne     .L2
>
> the SVE code is:
>
> .L2:
>         ld1w    z1.s, p0/z, [x1, x3, lsl 2]
>         ld1w    z0.s, p0/z, [x2, x3, lsl 2]
>         fmad    z0.s, p1/m, z1.s, z2.s
>         st1w    z0.s, p0, [x0, x3, lsl 2]
>         add     x3, x3, x5
>         whilelo p0.s, w3, w4
>         b.any   .L2
>
> but the Advanced SIMD code is:
>
> .L2:
>         mov     v0.16b, v3.16b   // <-------- boo
>         ldr     q2, [x2, x3]
>         ldr     q1, [x1, x3]
>         fmla    v0.4s, v2.4s, v1.4s
>         str     q0, [x0, x3]
>         add     x3, x3, 16
>         cmp     x3, 400
>         bne     .L2
>
> Thanks,
> Richard
>
>
>>
>>> gcc/
>>> 	* config/aarch64/aarch64.c (aarch64_multiply_add_p): Add a vec_flags
>>> 	parameter.  Detect cases in which an Advanced SIMD MLA would almost
>>> 	certainly require a MOV.
>>> 	(aarch64_count_ops): Update accordingly.
>>> ---
>>> gcc/config/aarch64/aarch64.c | 25 ++++++++++++++++++++++---
>>> 1 file changed, 22 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 084f8caa0da..19045ef6944 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -14767,9 +14767,12 @@ aarch64_integer_truncation_p (stmt_vec_info stmt_info)
>>>
>>>  /* Return true if STMT_INFO is the second part of a two-statement multiply-add
>>>     or multiply-subtract sequence that might be suitable for fusing into a
>>> -   single instruction.  */
>>> +   single instruction.  If VEC_FLAGS is zero, analyze the operation as
>>> +   a scalar one, otherwise analyze it as an operation on vectors with those
>>> +   VEC_* flags.  */
>>>  static bool
>>> -aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info)
>>> +aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
>>> +			unsigned int vec_flags)
>>>  {
>>>    gassign *assign = dyn_cast<gassign *> (stmt_info->stmt);
>>>    if (!assign)
>>> @@ -14797,6 +14800,22 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info)
>>>        if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
>>>  	continue;
>>>
>>> +      if (vec_flags & VEC_ADVSIMD)
>>> +	{
>>> +	  /* Scalar and SVE code can tie the result to any FMLA input (or none,
>>> +	     although that requires a MOVPRFX for SVE).  However, Advanced SIMD
>>> +	     only supports MLA forms, so will require a move if the result
>>> +	     cannot be tied to the accumulator.  The most important case in
>>> +	     which this is true is when the accumulator input is invariant.  */
>>> +	  rhs = gimple_op (assign, 3 - i);
>>> +	  if (TREE_CODE (rhs) != SSA_NAME)
>>> +	    return false;
>>> +	  def_stmt_info = vinfo->lookup_def (rhs);
>>> +	  if (!def_stmt_info
>>> +	      || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def)
>>> +	    return false;
>>> +	}
>>> +
>>>        return true;
>>>      }
>>>    return false;
>>> @@ -15232,7 +15251,7 @@ aarch64_count_ops (class vec_info *vinfo, aarch64_vector_costs *costs,
>>>      }
>>>
>>>    /* Assume that multiply-adds will become a single operation.  */
>>> -  if (stmt_info && aarch64_multiply_add_p (vinfo, stmt_info))
>>> +  if (stmt_info && aarch64_multiply_add_p (vinfo, stmt_info, vec_flags))
>>>      return;
>>>
>>>    /* When costing scalar statements in vector code, the count already