Re: [PATCH] tree-optimization/94864 - vector insert of vector extract simplification

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] tree-optimization/94864 - vector insert of vector extract simplification
Date: Wed, 12 Jul 2023 15:15:32 +0100	[thread overview]
Message-ID: <mptjzv5rzaz.fsf@arm.com> (raw)
In-Reply-To: <77b5514c-dab1-4803-b906-0c490e59f838@DBAEUR03FT058.eop-EUR03.prod.protection.outlook.com> (Richard Biener's message of "Wed, 12 Jul 2023 13:36:56 +0000 (UTC)")

Richard Biener <rguenther@suse.de> writes:
> The PRs ask for optimizing of
>
>   _1 = BIT_FIELD_REF <b_3(D), 64, 64>;
>   result_4 = BIT_INSERT_EXPR <a_2(D), _1, 64>;
>
> to a vector permutation.  The following implements this as
> match.pd pattern, improving code generation on x86_64.
>
> On the RTL level we face the issue that backend patterns inconsistently
> use vec_merge and vec_select of vec_concat to represent permutes.

Yeah, the current RTL codes probably overlap a bit too much.

Maybe we should have a rule that a vec_merge with a constant
third operand should be canonicalised to a vec_select?  And maybe
change the first operand of vec_select to be an rtvec, so that
no separate vec_concat (and thus wider mode) is needed for two-input
permutes?  Would be a lot of work though...

> I think using a (supported) permute is almost always better
> than an extract plus insert, maybe excluding the case we extract
> element zero and that's aliased to a register that can be used
> directly for insertion (not sure how to query that).

Yeah, extraction of the low element (0 for LE, N-1 for BE) is special
in RTL, in that it is now folded to a subreg.  But IMO it's reasonable
for even that case to through TARGET_VECTORIZE_VEC_PERM_CONST,
maybe with a target-independent helper function to match permute
vectors that are equivalent to extract-and-insert.

On AArch64, extract-and-insert is a single operation for other
elements too, e.g.:

	ins	v0.s[2], v1.s[1]

is a thing.  But if the helper returns the index of the extracted
elements, targets can decide for themselves whether the index is
supported or not.

Agree that this is the right thing for gimple to do FWIW.

Thanks,
Richard

> But this regresses for example gcc.target/i386/pr54855-8.c because PRE
> now realizes that
>
>   _1 = BIT_FIELD_REF <x_3(D), 64, 0>;
>   if (_1 > a_4(D))
>     goto <bb 3>; [50.00%]
>   else
>     goto <bb 4>; [50.00%]
>
>   <bb 3> [local count: 536870913]:
>
>   <bb 4> [local count: 1073741824]:
>   # iftmp.0_2 = PHI <_1(3), a_4(D)(2)>
>   x_5 = BIT_INSERT_EXPR <x_3(D), iftmp.0_2, 0>;
>
> is equal to
>
>   <bb 2> [local count: 1073741824]:
>   _1 = BIT_FIELD_REF <x_3(D), 64, 0>;
>   if (_1 > a_4(D))
>     goto <bb 4>; [50.00%]
>   else
>     goto <bb 3>; [50.00%]
>
>   <bb 3> [local count: 536870912]:
>   _7 = BIT_INSERT_EXPR <x_3(D), a_4(D), 0>;
>
>   <bb 4> [local count: 1073741824]:
>   # prephitmp_8 = PHI <x_3(D)(2), _7(3)>
>
> and that no longer produces the desired maxsd operation at the RTL
> level (we fail to match .FMAX at the GIMPLE level earlier).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu with regressions:
>
> FAIL: gcc.target/i386/pr54855-13.c scan-assembler-times vmaxsh[ \\\\t] 1
> FAIL: gcc.target/i386/pr54855-13.c scan-assembler-not vcomish[ \\\\t]
> FAIL: gcc.target/i386/pr54855-8.c scan-assembler-times maxsd 1
> FAIL: gcc.target/i386/pr54855-8.c scan-assembler-not movsd
> FAIL: gcc.target/i386/pr54855-9.c scan-assembler-times minss 1
> FAIL: gcc.target/i386/pr54855-9.c scan-assembler-not movss
>
> I think this is also PR88540 (the lack of min/max detection, not
> sure if the SSE min/max are suitable here)
>
> 	PR tree-optimization/94864
> 	PR tree-optimization/94865
> 	* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
> 	for vector insertion from vector extraction.
>
> 	* gcc.target/i386/pr94864.c: New testcase.
> 	* gcc.target/i386/pr94865.c: Likewise.
> ---
>  gcc/match.pd                            | 25 +++++++++++++++++++++++++
>  gcc/testsuite/gcc.target/i386/pr94864.c | 13 +++++++++++++
>  gcc/testsuite/gcc.target/i386/pr94865.c | 13 +++++++++++++
>  3 files changed, 51 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94864.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr94865.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 8543f777a28..8cc106049c4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7770,6 +7770,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  		      wi::to_wide (@ipos) + isize))
>      (BIT_FIELD_REF @0 @rsize @rpos)))))
>  
> +/* Simplify vector inserts of other vector extracts to a permute.  */
> +(simplify
> + (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos)
> + (if (VECTOR_TYPE_P (type)
> +      && types_match (@0, @1)
> +      && types_match (TREE_TYPE (TREE_TYPE (@0)), TREE_TYPE (@2))
> +      && TYPE_VECTOR_SUBPARTS (type).is_constant ())
> +  (with
> +   {
> +     unsigned HOST_WIDE_INT elsz
> +       = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (@1))));
> +     poly_uint64 relt = exact_div (tree_to_poly_uint64 (@rpos), elsz);
> +     poly_uint64 ielt = exact_div (tree_to_poly_uint64 (@ipos), elsz);
> +     unsigned nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
> +     vec_perm_builder builder;
> +     builder.new_vector (nunits, nunits, 1);
> +     for (unsigned i = 0; i < nunits; ++i)
> +       builder.quick_push (known_eq (ielt, i) ? nunits + relt : i);
> +     vec_perm_indices sel (builder, 2, nunits);
> +   }
> +   (if (!VECTOR_MODE_P (TYPE_MODE (type))
> +	|| can_vec_perm_const_p (TYPE_MODE (type), TYPE_MODE (type), sel, false))
> +    (vec_perm @0 @1 { vec_perm_indices_to_tree
> +                        (build_vector_type (ssizetype, nunits), sel); })))))
> +
>  (if (canonicalize_math_after_vectorization_p ())
>   (for fmas (FMA)
>    (simplify
> diff --git a/gcc/testsuite/gcc.target/i386/pr94864.c b/gcc/testsuite/gcc.target/i386/pr94864.c
> new file mode 100644
> index 00000000000..69cb481fcfe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr94864.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-avx" } */
> +
> +typedef double v2df __attribute__((vector_size(16)));
> +
> +v2df move_sd(v2df a, v2df b)
> +{
> +    v2df result = a;
> +    result[0] = b[1];
> +    return result;
> +}
> +
> +/* { dg-final { scan-assembler "unpckhpd\[\\t \]%xmm0, %xmm1" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr94865.c b/gcc/testsuite/gcc.target/i386/pr94865.c
> new file mode 100644
> index 00000000000..84065ac2467
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr94865.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-avx" } */
> +
> +typedef double v2df __attribute__((vector_size(16)));
> +
> +v2df move_sd(v2df a, v2df b)
> +{
> +    v2df result = a;
> +    result[1] = b[1];
> +    return result;
> +}
> +
> +/* { dg-final { scan-assembler "shufpd\[\\t \]*.2, %xmm1, %xmm0" } } */

next      parent reply	other threads:[~2023-07-12 14:15 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <77b5514c-dab1-4803-b906-0c490e59f838@DBAEUR03FT058.eop-EUR03.prod.protection.outlook.com>
2023-07-12 14:15 ` Richard Sandiford [this message]
2023-07-13  6:39   ` Richard Biener
     [not found] <20230822090522.F0D7B385C6D9@sourceware.org>
2023-08-22  9:21 ` Hongtao Liu
2023-08-22  9:05 Richard Biener
     [not found] <20230712133739.1E4E23858C62@sourceware.org>
2023-07-15 17:31 ` Andrew Pinski
2023-07-15 17:34   ` Andrew Pinski
     [not found] <20230712133715.EDB563858C62@sourceware.org>
2023-07-13  2:47 ` Hongtao Liu
2023-07-13  2:57   ` Hongtao Liu
2023-07-13  6:32     ` Richard Biener
2023-07-13  8:29       ` Hongtao Liu
     [not found] <20230712133711.DEF46385842C@sourceware.org>
2023-07-12 14:05 ` Jeff Law
  -- strict thread matches above, loose matches on Subject: below --
2023-07-12 13:36 Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptjzv5rzaz.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).