[PATCH/RFC 08/10] aarch64: Don't use CEIL for vector_store in aarch64_stp_sequence_cost

From: Kewen Lin <linkw@linux.ibm.com>
To: gcc-patches@gcc.gnu.org
Cc: richard.guenther@gmail.com, richard.sandiford@arm.com
Subject: [PATCH/RFC 08/10] aarch64: Don't use CEIL for vector_store in aarch64_stp_sequence_cost
Date: Wed, 13 Sep 2023 22:11:57 -0500	[thread overview]
Message-ID: <bc85799abb2616dcac511424a1b50b57e48c2556.1694657494.git.linkw@linux.ibm.com> (raw)
In-Reply-To: <cover.1694657494.git.linkw@linux.ibm.com>

This costing adjustment patch series exposes an issue in the
aarch64-specific costing adjustment for STP sequences.  It
causes the test cases below to fail:

  - gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c
  - gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
  - gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c
  - gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c

Take the function below, extracted from ldp_stp_15.c, as an
example:

void
dup_8_int32_t (int32_t *x, int32_t val)
{
    for (int i = 0; i < 8; ++i)
          x[i] = val;
}

Without my patch series, during slp1 it gets:

  val_8(D) 2 times unaligned_store (misalign -1) costs 2 in body
  node 0x10008c85e38 1 times scalar_to_vec costs 1 in prologue

then the final vector cost is 3.

With my patch series, during slp1 it gets:

  val_8(D) 1 times unaligned_store (misalign -1) costs 1 in body
  val_8(D) 1 times unaligned_store (misalign -1) costs 1 in body
  node 0x10004cc5d88 1 times scalar_to_vec costs 1 in prologue

but the final vector cost is 17.  The unaligned_store count is
actually unchanged, yet the final vector costs differ because
of the aarch64 special handling below:

  /* Apply the heuristic described above m_stp_sequence_cost.  */
  if (m_stp_sequence_cost != ~0U)
    {
      uint64_t cost = aarch64_stp_sequence_cost (count, kind,
						 stmt_info, vectype);
      m_stp_sequence_cost = MIN (m_stp_sequence_cost + cost, ~0U);
    }

For the former, since the count is 2, function
aarch64_stp_sequence_cost returns 2, computed as
"CEIL (count, 2) * 2".  For the latter, the costing is split
into two calls with count 1; aarch64_stp_sequence_cost returns
2 each time, so the total is 4.

In this case, the stmt with scalar_to_vec also contributes
4 to m_stp_sequence_cost, so the final m_stp_sequence_cost
is 6 (2+4) vs. 8 (4+4).
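
The arithmetic above can be reproduced with a small standalone
sketch.  It is only a simplified model, not the GCC code: the
function names are hypothetical, CEIL is written out locally,
and only the final "CEIL (count, 2) * 2" return of the
unaligned_store case is modelled.

```c
/* Round-up division, written out so the sketch is standalone.  */
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* Simplified model of the final return of the unaligned_store
   case before this patch (hypothetical name).  */
unsigned int
store_cost_with_ceil (unsigned int count)
{
  return CEIL (count, 2) * 2;
}

/* Contribution of the scalar_to_vec stmt in this example, as
   stated in the message above.  */
#define SCALAR_TO_VEC_STP_COST 4

/* Without the series: one call with count 2.  */
unsigned int
stp_total_one_call (void)
{
  return store_cost_with_ceil (2) + SCALAR_TO_VEC_STP_COST;
}

/* With the series: two calls with count 1 each.  */
unsigned int
stp_total_two_calls (void)
{
  return store_cost_with_ceil (1) + store_cost_with_ceil (1)
	 + SCALAR_TO_VEC_STP_COST;
}
```

Rounding up per call is what makes the two sums differ: each
count-1 call is rounded up to 2 on its own.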

Now consider that scalar_costs->m_stp_sequence_cost is 8,
together with the check and re-assignment below:

  else if (m_stp_sequence_cost >= scalar_costs->m_stp_sequence_cost)
    m_costs[vect_body] = 2 * scalar_costs->total_cost ();

For the former, the vector body cost is unchanged; for the
latter, the vector body cost is set to double the scalar
cost, which is 8 in this case, so it becomes 16, which is
bigger than what we expect.
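
As a rough illustration of how that check turns 6 vs. 8 into
final costs of 3 vs. 17, here is a simplified model using the
numbers from this example.  The function names and constants
are hypothetical, and only the quoted branch is modelled; any
earlier conditions in the real code are ignored.

```c
/* Numbers taken from the dup_8_int32_t example in this message.  */
enum
{
  SCALAR_STP_COST = 8,	 /* scalar_costs->m_stp_sequence_cost */
  SCALAR_TOTAL_COST = 8, /* scalar_costs->total_cost () */
  VECTOR_BODY_COST = 2,	 /* two unaligned_store, cost 1 each */
  VECTOR_PROLOGUE_COST = 1 /* one scalar_to_vec */
};

/* Model of the quoted branch: penalize the vector body when its
   STP sequence cost is not below the scalar one.  */
unsigned int
vector_body_cost (unsigned int vector_stp_cost)
{
  if (vector_stp_cost >= SCALAR_STP_COST)
    return 2 * SCALAR_TOTAL_COST;
  return VECTOR_BODY_COST;
}

/* Body plus prologue, as in the slp1 dumps above.  */
unsigned int
final_vector_cost (unsigned int vector_stp_cost)
{
  return vector_body_cost (vector_stp_cost) + VECTOR_PROLOGUE_COST;
}
```

With an STP sequence cost of 6 the body cost stays at 2 and the
total is 3; with 8 the body cost becomes 16 and the total is 17.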

I'm not sure why CEIL is used for the return value of the
unaligned_store case in function aarch64_stp_sequence_cost,
but I tried changing it to "return count;" (which restores
the previous cost), and regression testing exposed no
failures.  I expect that if the previous unaligned_store
count is even, this adjustment changes nothing; if it is odd,
the adjustment may reduce the cost by one, but I'd guess such
cases are few.  Besides, according to the comments for
m_stp_sequence_cost, the current handling seems temporary, so
maybe a tweak like this can be accepted.  I'm posting this
RFC/PATCH to request comments on whether this one-line change
is acceptable.
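
The even/odd expectation can be checked with a tiny sketch.
This is a standalone model with a hypothetical function name,
comparing the old "CEIL (count, 2) * 2" against the proposed
"return count;" for a single call.

```c
/* Round-up division, written out so the sketch is standalone.  */
#define CEIL(x, y) (((x) + (y) - 1) / (y))

/* How much the proposed "return count;" lowers the cost of a
   single call compared with "CEIL (count, 2) * 2".  */
unsigned int
cost_reduction (unsigned int count)
{
  return CEIL (count, 2) * 2 - count;
}
```

For even counts the reduction is 0, and for odd counts it is 1.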

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_stp_sequence_cost): Return
	count directly instead of the adjusted value computed with CEIL.
---
 gcc/config/aarch64/aarch64.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 37d414021ca..9fb4fbd883d 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -17051,7 +17051,7 @@ aarch64_stp_sequence_cost (unsigned int count, vect_cost_for_stmt kind,
 	  if (!aarch64_aligned_constant_offset_p (stmt_info, size))
 	    return count * 2;
 	}
-      return CEIL (count, 2) * 2;
+      return count;

     case scalar_store:
       if (stmt_info && STMT_VINFO_DATA_REF (stmt_info))
-- 
2.31.1
