From: Andre Vieira <andre.simoesdiasvieira@arm.com>
To: gcc-patches@gcc.gnu.org
Cc: stam.markianos-wright@arm.com, richard.earnshaw@arm.com,
Andre Vieira <andre.simoesdiasvieira@arm.com>
Subject: [PATCH v5 0/5] arm: Add support for MVE Tail-Predicated Low Overhead Loops
Date: Thu, 22 Feb 2024 17:37:58 +0000
Message-ID: <20240222173803.20989-1-andre.simoesdiasvieira@arm.com>
Hi,
This is a reworked patch series from. The main differences are a further split
of patches, where:
[1/5] is arm specific and has been approved before,
[2/5] is target agnostic and has had no substantial changes from v3,
[3/5] is a new arm specific patch that is split from the original last patch
and annotates across-lane instructions that are safe for tail predication if
their tail predicated operands are zeroed,
[4/5] is a new arm specific patch that could be committed independently of the
series to fix an obvious issue and remove unused unspecs & iterators,
[5/5] is the reworked last patch: (v3-v4) refactored the implicit predication
and some other validity checks, (v4-v5) removed the expectation that vctp
instructions are always zero extended, after this was fixed on trunk.
Original cover letter:
This patch adds support for Arm's MVE Tail Predicated Low Overhead Loop
feature.
The M-class Arm-ARM:
https://developer.arm.com/documentation/ddi0553/bu/?lang=en
Section B5.5.1 "Loop tail predication" describes the feature
we are adding support for with this patch (although
we only add codegen for DLSTP/LETP instruction loops).
Previously with commit d2ed233cb94 we'd added support for
non-MVE DLS/LE loops through the loop-doloop pass, which, given
a standard MVE loop like:
```
void __attribute__ ((noinline)) test (int16_t *a, int16_t *b, int16_t *c, int n)
{
  while (n > 0)
    {
      mve_pred16_t p = vctp16q (n);
      int16x8_t va = vldrhq_z_s16 (a, p);
      int16x8_t vb = vldrhq_z_s16 (b, p);
      int16x8_t vc = vaddq_x_s16 (va, vb, p);
      vstrhq_p_s16 (c, vc, p);
      c+=8;
      a+=8;
      b+=8;
      n-=8;
    }
}
```
.. would output:
```
<pre-calculate the number of iterations and place it into lr>
        dls     lr, lr
.L3:
        vctp.16 r3
        vmrs    ip, P0  @ movhi
        sxth    ip, ip
        vmsr    P0, ip  @ movhi
        mov     r4, r0
        vpst
        vldrht.16       q2, [r4]
        mov     r4, r1
        vmov    q3, q0
        vpst
        vldrht.16       q1, [r4]
        mov     r4, r2
        vpst
        vaddt.i16       q3, q2, q1
        subs    r3, r3, #8
        vpst
        vstrht.16       q3, [r4]
        adds    r0, r0, #16
        adds    r1, r1, #16
        adds    r2, r2, #16
        le      lr, .L3
```
where the LE instruction will decrement LR by 1, compare and
branch if needed.
(There are also other inefficiencies in the above code, like the
pointless vmrs/sxth/vmsr on the VPR, the adds not being merged
into the vldrht/vstrht as #16 offsets, and some random movs!
But those are different problems...)
The MVE version is similar, except that:
* Instead of DLS/LE the instructions are DLSTP/LETP.
* Instead of pre-calculating the number of iterations of the
loop, we place the number of elements to be processed by the
loop into LR.
* Instead of decrementing the LR by one, LETP will decrement it
by the number of elements being processed in each iteration, as
determined by FPSCR.LTPSIZE: 16 for 8-bit elements, 8 for 16-bit
elements, etc.
* On the final iteration, automatic Loop Tail Predication is
performed, as if the instructions within the loop had been VPT
predicated with a VCTP generating the VPR predicate in every
loop iteration.
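The difference between the two counting schemes can be modelled in plain C.
This is only a sketch for illustration, not code from the series: the helper
names (lanes_per_vector, dls_iteration_count, letp_active_lanes,
simulate_letp_loop) are made up here, and the model assumes 128-bit MVE
vectors and ignores everything except the LR bookkeeping.
```c
#include <assert.h>

/* Hypothetical helper: number of lanes in a 128-bit MVE vector for a
   given element size in bits.  */
static int lanes_per_vector (int elem_bits)
{
  assert (elem_bits == 8 || elem_bits == 16
          || elem_bits == 32 || elem_bits == 64);
  return 128 / elem_bits;
}

/* DLS/LE style: LR holds a pre-calculated iteration count, i.e. the
   element count divided by the lanes per vector, rounded up.  */
static int dls_iteration_count (int n, int elem_bits)
{
  int lanes = lanes_per_vector (elem_bits);
  return n <= 0 ? 0 : (n + lanes - 1) / lanes;
}

/* DLSTP/LETP style: LR holds the number of elements still to process.
   Returns how many lanes are active for an iteration entered with this
   LR value; fewer than a full vector only on the final iteration.  */
static int letp_active_lanes (int lr, int elem_bits)
{
  int lanes = lanes_per_vector (elem_bits);
  if (lr <= 0)
    return 0;
  return lr >= lanes ? lanes : lr;
}

/* Walk a modelled DLSTP/LETP loop: LR starts at n and drops by the
   lane count each iteration.  Returns the iteration count and stores
   the number of lanes active in the last iteration.  */
static int simulate_letp_loop (int n, int elem_bits, int *last_active)
{
  int lanes = lanes_per_vector (elem_bits);
  int lr = n;
  int iterations = 0;
  *last_active = 0;
  while (lr > 0)
    {
      *last_active = letp_active_lanes (lr, elem_bits);
      lr -= lanes;
      iterations++;
    }
  return iterations;
}
```
For n = 20 and 16-bit elements both schemes run three iterations, but in the
modelled DLSTP/LETP loop LR goes 20, 12, 4 and only four lanes are active in
the last iteration.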
The dlstp/letp loop now looks like:
```
<place n into r3>
        dlstp.16        lr, r3
.L14:
        mov     r3, r0
        vldrh.16        q3, [r3]
        mov     r3, r1
        vldrh.16        q2, [r3]
        mov     r3, r2
        vadd.i16        q3, q3, q2
        adds    r0, r0, #16
        vstrh.16        q3, [r3]
        adds    r1, r1, #16
        adds    r2, r2, #16
        letp    lr, .L14
```
Since the loop tail predication is automatic, we have eliminated
the VCTP that had been specified by the user in the intrinsic
and converted the VPT-predicated instructions into their
unpredicated equivalents (which also saves us from VPST insns).
The LETP instruction here decrements LR by 8 in each iteration.
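As a way to reason about what both the explicitly predicated loop and the
tail-predicated loop must compute, here is a portable scalar model of the
test function. It is a sketch only: test_model is a made-up name, it uses no
MVE intrinsics, and it assumes the usual GCC behaviour of wrapping when the
sum is converted back to int16_t.
```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the intrinsic loop: each outer iteration stands for
   one 8-lane vector iteration, and 'active' plays the role of the
   predicate that vctp16q (n) would produce.  */
static void test_model (const int16_t *a, const int16_t *b, int16_t *c, int n)
{
  assert (a && b && c);
  while (n > 0)
    {
      /* All 8 lanes are active except possibly on the final iteration.  */
      int active = n < 8 ? n : 8;
      for (int lane = 0; lane < active; lane++)
        /* Inactive lanes are neither loaded nor stored.  */
        c[lane] = (int16_t) (a[lane] + b[lane]);
      c += 8;
      a += 8;
      b += 8;
      n -= 8;
    }
}
```
With n = 10, the model writes c[0] to c[9] and leaves c[10] onwards untouched,
which is the behaviour the tail predication has to preserve once the explicit
VCTP is removed.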
Stam Markianos-Wright (1):
arm: Add define_attr to to create a mapping between MVE predicated and
unpredicated insns
Andre Vieira (4):
doloop: Add support for predicated vectorized loops
arm: Annotate instructions with mve_safe_imp_xlane_pred
arm: Fix a wrong attribute use and remove unused unspecs and iterators
arm: Add support for MVE Tail-Predicated Low Overhead Loops
--
2.17.1