From: Richard Biener <rguenther@suse.de>
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com
Subject: [PATCH] tree-optimization/115252 - enhance peeling for gaps avoidance
Date: Tue, 28 May 2024 14:17:42 +0200 (CEST)
Message-ID: <20240528121742.QFBn6V_xwdAHfrUHFZI28dLnTOdIl_s_8Rc6gkiw2rc@z>
Code generation for contiguous load vectorization can already deal
with generalized avoidance of loading from a gap.  The following
extends the detection of when peeling for gaps is required with that,
gets rid of the old special-casing of a half load, and makes sure
that when we do access the gap we have peeling for gaps enabled.
Bootstrapped and tested on x86_64-unknown-linux-gnu.
This is the first patch in a series to improve peeling for gaps;
it turned into an improvement to the generated code rather than just
the (delayed from stage3) removal of the "old" half-vector code path.
I'll wait for the pre-commit CI testing before pushing, so you also
have time for some comments.
Richard.
PR tree-optimization/115252
* tree-vect-stmts.cc (get_group_load_store_type): Enhance
detection of the cases where we can avoid accessing a gap
during code generation.
(vectorizable_load): Remove old half-vector peeling for gap
avoidance which is now redundant. Add gap-aligned case where
it's OK to access the gap. Add assert that we have peeling for
gaps enabled when we access a gap.
* gcc.dg/vect/slp-gap-1.c: New testcase.
---
gcc/testsuite/gcc.dg/vect/slp-gap-1.c | 18 +++++++++
gcc/tree-vect-stmts.cc | 58 +++++++++++++--------------
2 files changed, 46 insertions(+), 30 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/slp-gap-1.c
diff --git a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
new file mode 100644
index 00000000000..36463ca22c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+typedef unsigned char uint8_t;
+typedef short int16_t;
+void pixel_sub_wxh(int16_t * __restrict diff, uint8_t *pix1, uint8_t *pix2) {
+ for (int y = 0; y < 4; y++) {
+ for (int x = 0; x < 4; x++)
+ diff[x + y * 4] = pix1[x] - pix2[x];
+ pix1 += 16;
+ pix2 += 32;
+ }
+}
+
+/* We can vectorize this without peeling for gaps and thus without epilogue,
+ but the only thing we can reliably scan is the zero-padding trick for the
+ partial loads. */
+/* { dg-final { scan-tree-dump-times "\{_\[0-9\]\+, 0" 6 "vect" { target vect64 } } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a01099d3456..b26cc74f417 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2072,16 +2072,22 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
dr_alignment_support alss;
int misalign = dr_misalignment (first_dr_info, vectype);
tree half_vtype;
+ poly_uint64 remain;
+ unsigned HOST_WIDE_INT tem, num;
if (overrun_p
&& !masked_p
&& (((alss = vect_supportable_dr_alignment (vinfo, first_dr_info,
vectype, misalign)))
== dr_aligned
|| alss == dr_unaligned_supported)
- && known_eq (nunits, (group_size - gap) * 2)
- && known_eq (nunits, group_size)
- && (vector_vector_composition_type (vectype, 2, &half_vtype)
- != NULL_TREE))
+ && can_div_trunc_p (group_size
+ * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - gap,
+ nunits, &tem, &remain)
+ && (known_eq (remain, 0u)
+ || (constant_multiple_p (nunits, remain, &num)
+ && (vector_vector_composition_type (vectype, num,
+ &half_vtype)
+ != NULL_TREE))))
overrun_p = false;
if (overrun_p && !can_overrun_p)
@@ -11533,33 +11539,14 @@ vectorizable_load (vec_info *vinfo,
unsigned HOST_WIDE_INT gap = DR_GROUP_GAP (first_stmt_info);
unsigned int vect_align
= vect_known_alignment_in_bytes (first_dr_info, vectype);
- unsigned int scalar_dr_size
- = vect_get_scalar_dr_size (first_dr_info);
- /* If there's no peeling for gaps but we have a gap
- with slp loads then load the lower half of the
- vector only. See get_group_load_store_type for
- when we apply this optimization. */
- if (slp
- && loop_vinfo
- && !LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && gap != 0
- && known_eq (nunits, (group_size - gap) * 2)
- && known_eq (nunits, group_size)
- && gap >= (vect_align / scalar_dr_size))
- {
- tree half_vtype;
- new_vtype
- = vector_vector_composition_type (vectype, 2,
- &half_vtype);
- if (new_vtype != NULL_TREE)
- ltype = half_vtype;
- }
/* Try to use a single smaller load when we are about
to load excess elements compared to the unrolled
- scalar loop.
- ??? This should cover the above case as well. */
- else if (known_gt ((vec_num * j + i + 1) * nunits,
+ scalar loop. */
+ if (known_gt ((vec_num * j + i + 1) * nunits,
(group_size * vf - gap)))
{
+ poly_uint64 remain = ((group_size * vf - gap)
+ - (vec_num * j + i) * nunits);
if (known_ge ((vec_num * j + i + 1) * nunits
- (group_size * vf - gap), nunits))
/* DR will be unused. */
@@ -11571,11 +11558,15 @@ vectorizable_load (vec_info *vinfo,
at least one element is accessed in the
scalar loop. */
;
+ else if (known_gt (vect_align,
+ ((nunits - remain)
+ * vect_get_scalar_dr_size
+ (first_dr_info))))
+ /* Aligned access to the gap area when there's
+ at least one element in it is OK. */
+ ;
else
{
- auto remain
- = ((group_size * vf - gap)
- - (vec_num * j + i) * nunits);
/* remain should now be > 0 and < nunits. */
unsigned num;
if (constant_multiple_p (nunits, remain, &num))
@@ -11589,6 +11580,13 @@ vectorizable_load (vec_info *vinfo,
ltype = ptype;
}
/* Else use multiple loads or a masked load? */
+ /* For loop vectorization we now should have
+ an alternate type or LOOP_VINFO_PEELING_FOR_GAPS
+ set. */
+ if (loop_vinfo)
+ gcc_assert (new_vtype
+ || LOOP_VINFO_PEELING_FOR_GAPS
+ (loop_vinfo));
}
}
tree offset
--
2.35.3
Thread overview: 4+ messages
2024-05-28 12:17 Richard Biener [this message]
[not found] <b44942a2-0ba2-47b9-9dfe-5f75ffad7be5@DB5PEPF00014B9C.eurprd02.prod.outlook.com>
2024-05-29 9:25 ` Richard Sandiford
2024-05-29 9:50 ` Richard Biener
2024-05-29 11:04 ` Richard Biener