From: Yun Hsiang <yun.hsiang@sifive.com>
To: libc-alpha@sourceware.org
Subject: Re: [PATCH 2/2] riscv: vectorised mem* and str* functions
Date: Wed, 3 May 2023 10:11:11 +0800
Message-ID: <ZFHCv8JpkDa6WsBY@Yuns-MacBook-Pro.local>
In-Reply-To: <20230201095232.15942-2-slewis@rivosinc.com>
On Wed, Feb 01, 2023 at 09:52:32AM +0000, Sergei Lewis wrote:
>
> Initial implementations of memchr, memcmp, memcpy, memmove, memset, strchr,
> strcmp, strcpy, strlen, strncmp, strncpy, strnlen, strrchr, strspn
> targeting the riscv "V" extension, version 1.0
>
> The vectorised implementations assume VLENB of at least 128 and at least 32
> registers (as mandated by the "V" extension spec). They also assume that
> VLENB is a power of two which is no larger than the page size, and (as
> vectorised code in glibc for other platforms does) that it is safe to read
> past null terminators / buffer ends provided one does not cross a page
> boundary.
I tried applying these patches to run the tests and benchtests, but I
hit runtime errors while running string/test-*. Do these patches
depend on any others?
> /* ignore */
In the strnlen implementation:
> +#include <sysdep.h>
> +
> +.globl __strnlen
> +.type __strnlen,@function
> +
> +/* vector optimized strnlen
> + * assume it's safe to read to the end of the page
> + * containing either a null terminator or the last byte of the count or both,
> + * but not past it
> + * assume page size >= vlenb*2
> + */
> +
> +.align 2
> +__strnlen:
> + mv t4, a0 /* stash a copy of start for later */
> + beqz a1, .LzeroCount
> +
> + csrr t1, vlenb /* find vlenb*2 */
> + add t1, t1, t1
> + addi t2, t1, -1 /* mask off unaligned part of ptr */
> + and t2, a1, a0
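Should this line be `and t2, t2, a0`? As written, it ANDs the pointer
with the count (a1) rather than with the alignment mask just computed
in t2.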
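To make the difference concrete, here is a small scalar C sketch of the
two computations. It is only an illustration, not part of the patch; the
function names are hypothetical, and it assumes vlenb is a power of two,
as the patch description does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset of ptr within its 2*vlenb block.
   Assumes vlenb is a power of two. */
size_t misalignment_intended(uintptr_t ptr, size_t vlenb)
{
    size_t block = 2 * vlenb;      /* t1 = vlenb * 2 */
    size_t mask = block - 1;       /* addi t2, t1, -1 */
    return (size_t)(ptr & mask);   /* and t2, t2, a0 */
}

/* What the line as posted computes: the pointer ANDed with maxlen. */
size_t misalignment_as_posted(uintptr_t ptr, size_t maxlen)
{
    return (size_t)(ptr & maxlen); /* and t2, a1, a0 */
}
```

For example, with ptr = 0x1005 and vlenb = 16 the intended form gives 5,
while the posted form with maxlen = 64 gives 0, so an unaligned pointer
would be treated as aligned.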
> + beqz t2, .Laligned
> +
> + sub t2, t1, t2 /* search to align pointer to t1 */
> + bgeu t2, a1, 2f /* check it's safe */
> + mv t2, a1 /* it's not! look as far as permitted */
> +2: vsetvli t2, t2, e8, m2, ta, ma
> + vle8.v v2, (a0)
> + vmseq.vx v0, v2, zero
> + vfirst.m t3, v0
> + bgez t3, .Lfound
> + add a0, a0, t2
> + sub a1, a1, t2
> + bltu a1, t1, .LreachedCount
> +
> +.Laligned:
> + vsetvli zero, t1, e8, m2, ta, ma /* do 2*vlenb bytes per pass */
> +
> +1: vle8.v v2, (a0)
> + sub a1, a1, t1
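If a1 (maxlen) is smaller than t1 (vlenb*2) on the first pass through
this loop, which can happen when the pointer is already aligned and we
branch straight to .Laligned, the subtraction wraps around and a1
becomes a huge unsigned value (negative if read as signed).
strnlen can then return the wrong result.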
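A scalar C model of the aligned path may help show the problem. This is
my own sketch under simplifying assumptions (the pointer is already
aligned, and a byte loop stands in for the vector load and vfirst.m);
the function names are hypothetical and the guarded variant is just one
possible fix, not necessarily the one the patch should adopt.

```c
#include <stddef.h>

/* Scan n bytes at s for a NUL; return its index, or n if absent.
   Stands in for vle8.v + vmseq.vx + vfirst.m. */
static size_t first_nul(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == '\0')
            return i;
    return n;
}

/* Aligned path as posted: the count is decremented before checking
   that a full block is still permitted. */
size_t strnlen_aligned_as_posted(const char *s, size_t maxlen, size_t block)
{
    size_t pos = 0, left = maxlen;
    if (left == 0)
        return 0;                          /* .LzeroCount */
    do {
        left -= block;                     /* sub a1, a1, t1: may wrap */
        size_t i = first_nul(s + pos, block);
        if (i < block)
            return pos + i;                /* .Lfound */
        pos += block;
    } while (left >= block);               /* bgeu a1, t1, 1b */
    /* .LreachedCount: check the remaining 0 <= left < block bytes */
    return pos + first_nul(s + pos, left);
}

/* One possible fix: only take a full block while the count allows it. */
size_t strnlen_aligned_guarded(const char *s, size_t maxlen, size_t block)
{
    size_t pos = 0, left = maxlen;
    while (left >= block) {
        size_t i = first_nul(s + pos, block);
        if (i < block)
            return pos + i;
        pos += block;
        left -= block;
    }
    return pos + first_nul(s + pos, left);
}
```

With a 10-character string in a zero-filled 64-byte buffer, maxlen = 4
and block = 32, the as-posted model returns 10, whereas strnlen must
return 4; the guarded model returns 4.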
> + vmseq.vx v0, v2, zero
> + vfirst.m t3, v0
> + bgez t3, .Lfound
> + add a0, a0, t1
> + bgeu a1, t1, 1b
> +.LreachedCount:
> + mv t2, a1 /* in case 0 < a1 < t1 */
> + bnez a1, 2b /* if so, still t2 bytes to check, all safe */
> +.LzeroCount:
> + sub a0, a0, t4
> + ret
> +
> +.Lfound: /* found the 0; subtract buffer start from current pointer */
> + add a0, a0, t3 /* and add offset into fetched data */
> + sub a0, a0, t4
> + ret
> +
> +.size __strnlen, .-__strnlen
> +weak_alias (__strnlen, strnlen)
> +libc_hidden_builtin_def (__strnlen)
> +libc_hidden_builtin_def (strnlen)