public inbox for libc-alpha@sourceware.org
From: Yun Hsiang <yun.hsiang@sifive.com>
To: libc-alpha@sourceware.org
Subject: Re: [PATCH 2/2] riscv: vectorised mem* and str* functions
Date: Wed, 3 May 2023 10:11:11 +0800
Message-ID: <ZFHCv8JpkDa6WsBY@Yuns-MacBook-Pro.local>
In-Reply-To: <20230201095232.15942-2-slewis@rivosinc.com>

On Wed, Feb 01, 2023 at 09:52:32AM +0000, Sergei Lewis wrote:
> 
> Initial implementations of memchr, memcmp, memcpy, memmove, memset, strchr,
> strcmp, strcpy, strlen, strncmp, strncpy, strnlen, strrchr, strspn
> targeting the riscv "V" extension, version 1.0
> 
> The vectorised implementations assume VLENB of at least 128 and at least 32
> registers (as mandated by the "V" extension spec). They also assume that
> VLENB is a power of two which is no larger than the page size, and (as
> vectorised code in glibc for other platforms does) that it is safe to read
> past null terminators / buffer ends provided one does not cross a page
> boundary.
I tried to apply these patches to run the benchtests and tests, but I
encountered errors at runtime when running string/test-*. Do
these patches depend on any others?

> /* ignore */

In the strnlen implementation:
> +#include <sysdep.h>
> +
> +.globl  __strnlen
> +.type   __strnlen,@function
> +
> +/* vector optimized strnlen
> + * assume it's safe to read to the end of the page
> + * containing either a null terminator or the last byte of the count or both,
> + * but not past it
> + * assume page size >= vlenb*2
> + */
> +
> +.align    2
> +__strnlen:
> +    mv          t4, a0               /* stash a copy of start for later */
> +    beqz        a1, .LzeroCount
> +
> +    csrr        t1, vlenb            /* find vlenb*2 */
> +    add         t1, t1, t1
> +    addi        t2, t1, -1           /* mask off unaligned part of ptr */
> +    and         t2, a1, a0
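Should this line be `and t2, t2, a0`? As written, it discards the mask
just computed in t2 and ANDs the pointer (a0) with the count (a1).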
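
To illustrate the difference, here is a minimal C sketch of the two
computations. The function names are hypothetical and the block size is
passed in as a parameter rather than read from vlenb. It models only the
scalar arithmetic, not the vector code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset of ptr within a block of `block` bytes.
 * Assumes block is a power of two (vlenb*2 in the patch).
 * Mirrors:  addi t2, t1, -1 ; and t2, t2, a0  */
static size_t misalignment(uintptr_t ptr, size_t block)
{
    size_t mask = block - 1;
    return (size_t)(ptr & mask);
}

/* What the line as posted computes: the mask is never used and the
 * pointer is ANDed with the count instead.
 * Mirrors:  and t2, a1, a0  */
static size_t misalignment_as_posted(uintptr_t ptr, size_t count)
{
    return (size_t)(ptr & count);
}
```

For a pointer ending in 0x...05 and a count of 64, the second function
returns 0, so the unaligned pointer would be treated as aligned.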
> +    beqz        t2, .Laligned
> +
> +    sub         t2, t1, t2           /* search to align pointer to t1 */
> +    bgeu        t2, a1, 2f           /* check it's safe */
> +    mv          t2, a1               /* it's not! look as far as permitted */
> +2:  vsetvli     t2, t2, e8, m2, ta, ma
> +    vle8.v      v2, (a0)
> +    vmseq.vx    v0, v2, zero
> +    vfirst.m    t3, v0
> +    bgez        t3, .Lfound
> +    add         a0, a0, t2
> +    sub         a1, a1, t2
> +    bltu        a1, t1, .LreachedCount
> +
> +.Laligned:
> +    vsetvli     zero, t1, e8, m2, ta, ma    /* do 2*vlenb bytes per pass */
> +
> +1:  vle8.v      v2, (a0)
> +    sub         a1, a1, t1
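If a1 (maxlen) is smaller than t1 (vlenb*2) on the first pass through
this loop, the subtraction wraps and a1 becomes negative, which the
unsigned comparison in `bgeu a1, t1, 1b` below treats as a very large
count. strnlen may then return the wrong result.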
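
The wraparound can be sketched in C as follows. This is only a model of
the count bookkeeping under the assumption that the count is an unsigned
size_t; the function names are hypothetical, and the guarded variant is
one possible ordering, not necessarily the fix the patch should use.

```c
#include <stddef.h>

/* Subtract first, as the posted loop does (sub a1, a1, t1).
 * When remaining < step the result wraps modulo 2^N, so a later
 * unsigned "remaining >= step" test still succeeds. */
static size_t remaining_after_first_pass(size_t remaining, size_t step)
{
    return remaining - step;
}

/* Guarded ordering: compare before subtracting, so the count can
 * never wrap. Returns how many full passes of `step` bytes fit. */
static size_t full_passes_guarded(size_t remaining, size_t step)
{
    size_t passes = 0;
    while (remaining >= step) {
        remaining -= step;
        passes++;
    }
    return passes;
}
```

With remaining = 10 and step = 32, the first function yields a value far
larger than 32, while the guarded loop makes no full pass at all.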
> +    vmseq.vx    v0, v2, zero
> +    vfirst.m    t3, v0
> +    bgez        t3, .Lfound
> +    add         a0, a0, t1
> +    bgeu        a1, t1, 1b
> +.LreachedCount:
> +    mv          t2, a1    /* in case 0 < a1 < t1 */
> +    bnez        a1, 2b    /* if so, still t2 bytes to check, all safe */
> +.LzeroCount:
> +    sub         a0, a0, t4
> +    ret
> +
> +.Lfound:        /* found the 0; subtract buffer start from current pointer */
> +    add         a0, a0, t3 /* and add offset into fetched data */
> +    sub         a0, a0, t4
> +    ret
> +
> +.size   __strnlen, .-__strnlen
> +weak_alias (__strnlen, strnlen)
> +libc_hidden_builtin_def (__strnlen)
> +libc_hidden_builtin_def (strnlen)


Thread overview: 22+ messages
2023-02-01  9:52 [PATCH 1/2] riscv: sysdeps support for vectorised functions Sergei Lewis
2023-02-01  9:52 ` [PATCH 2/2] riscv: vectorised mem* and str* functions Sergei Lewis
2023-02-01 15:33   ` Jeff Law
2023-02-01 16:42     ` Florian Weimer
2023-02-01 17:07       ` Jeff Law
2023-02-02  9:34         ` Sergei Lewis
2023-02-06 12:49         ` Sergei Lewis
2023-02-01 17:17     ` Adhemerval Zanella Netto
2023-02-01 17:38   ` Adhemerval Zanella Netto
2023-02-01 18:13     ` Noah Goldstein
2023-02-02 10:02     ` Sergei Lewis
2023-02-02 14:26       ` Adhemerval Zanella Netto
2023-02-02 15:20         ` Sergei Lewis
2023-02-02 15:35           ` Sergei Lewis
2023-02-03 11:35           ` Adhemerval Zanella Netto
2023-02-03 14:04             ` Sergei Lewis
2023-02-01 18:11   ` Noah Goldstein
2023-02-01 18:13   ` Andrew Waterman
2023-02-01 19:03   ` Andrew Waterman
2023-02-03  0:13     ` Vineet Gupta
2023-02-03  0:51       ` Andrew Waterman
2023-05-03  2:11   ` Yun Hsiang [this message]
