Date: Wed, 3 May 2023 10:11:11 +0800
From: Yun Hsiang
To: libc-alpha@sourceware.org
Subject: Re: [PATCH 2/2] riscv: vectorised mem* and str* functions
In-Reply-To: <20230201095232.15942-2-slewis@rivosinc.com>

On Wed, Feb 01, 2023 at 09:52:32AM +0000, Sergei Lewis wrote:
>
> Initial implementations of memchr, memcmp, memcpy, memmove, memset, strchr,
> strcmp, strcpy, strlen, strncmp, strncpy, strnlen, strrchr, strspn
> targeting the riscv "V" extension, version 1.0
>
> The vectorised implementations assume VLENB of at least 128 and at least 32
> registers (as mandated by the "V" extension spec).
> They also assume that VLENB is a power of two which is no larger than the
> page size, and (as vectorised code in glibc for other platforms does) that
> it is safe to read past null terminators / buffer ends provided one does
> not cross a page boundary.

I've tried to apply these patches to run the benchtests and tests, but I
encountered some errors at runtime while running string/test-*. Do these
patches depend on others?

> /* ignore */

In the strnlen implementation:

> +#include
> +
> +.globl __strnlen
> +.type __strnlen,@function
> +
> +/* vector optimized strnlen
> + * assume it's safe to read to the end of the page
> + * containing either a null terminator or the last byte of the count or both,
> + * but not past it
> + * assume page size >= vlenb*2
> + */
> +
> +.align 2
> +__strnlen:
> +    mv t4, a0               /* stash a copy of start for later */
> +    beqz a1, .LzeroCount
> +
> +    csrr t1, vlenb          /* find vlenb*2 */
> +    add t1, t1, t1
> +    addi t2, t1, -1         /* mask off unaligned part of ptr */
> +    and t2, a1, a0

Should this line be `and t2, t2, a0`? As written, it ANDs maxlen with the
pointer and discards the mask just computed in t2.

> +    beqz t2, .Laligned
> +
> +    sub t2, t1, t2          /* search to align pointer to t1 */
> +    bgeu t2, a1, 2f         /* check it's safe */
> +    mv t2, a1               /* it's not! look as far as permitted */
> +2:  vsetvli t2, t2, e8, m2, ta, ma
> +    vle8.v v2, (a0)
> +    vmseq.vx v0, v2, zero
> +    vfirst.m t3, v0
> +    bgez t3, .Lfound
> +    add a0, a0, t2
> +    sub a1, a1, t2
> +    bltu a1, t1, .LreachedCount
> +
> +.Laligned:
> +    vsetvli zero, t1, e8, m2, ta, ma   /* do 2*vlenb bytes per pass */
> +
> +1:  vle8.v v2, (a0)
> +    sub a1, a1, t1

If a1 (maxlen) is smaller than t1 (vlenb*2) on the first pass through this
loop, a1 will become a negative value. This can happen when the pointer is
already aligned, because `beqz t2, .Laligned` jumps here without going
through the `bltu a1, t1, .LreachedCount` check. Since `bgeu a1, t1, 1b` is
an unsigned compare, the wrapped a1 keeps the loop running past maxlen.
A null found beyond maxlen is then returned as the length, so strnlen can
return a value larger than maxlen.
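Both points come down to a few lines of register arithmetic. Below is a minimal C sketch that mirrors that arithmetic with unsigned `uintptr_t`/`size_t` values, since `bgeu`/`bltu` are unsigned compares. The helper names are hypothetical, t1 stands for 2*vlenb, and a1 stands for the remaining count:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers mirroring the register arithmetic in the quoted
 * __strnlen; t1 stands for 2*vlenb, a1 for the remaining count. */

/* Offset of ptr within its t1-aligned block: what
 * "addi t2, t1, -1" followed by "and t2, t2, a0" computes. */
static size_t misalignment(uintptr_t ptr, size_t t1)
{
    assert(t1 != 0 && (t1 & (t1 - 1)) == 0); /* mask needs a power of two */
    return (size_t)(ptr & (t1 - 1));
}

/* What the quoted "and t2, a1, a0" computes instead: maxlen & ptr. */
static size_t quoted_and(uintptr_t ptr, size_t maxlen)
{
    return (size_t)(ptr & maxlen);
}

/* a1 after one pass of the .Laligned loop ("sub a1, a1, t1"), which does
 * not check a1 >= t1 first; unsigned subtraction wraps when a1 < t1. */
static size_t remaining_after_pass(size_t a1, size_t t1)
{
    return a1 - t1;
}

/* The loop's continue condition, "bgeu a1, t1, 1b" (unsigned compare). */
static int loop_continues(size_t a1, size_t t1)
{
    return a1 >= t1;
}
```

Take ptr = 0x1005, t1 = 32 (an assumed vlenb of 16) and maxlen = 0x100. `misalignment` gives 5, while `quoted_and` gives 0. The quoted AND would therefore send a misaligned pointer down the `.Laligned` path. Likewise, `remaining_after_pass(5, 32)` is `SIZE_MAX - 26`, which still satisfies the `bgeu a1, t1, 1b` condition.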
> +    vmseq.vx v0, v2, zero
> +    vfirst.m t3, v0
> +    bgez t3, .Lfound
> +    add a0, a0, t1
> +    bgeu a1, t1, 1b
> +.LreachedCount:
> +    mv t2, a1               /* in case 0 < a1 < t1 */
> +    bnez a1, 2b             /* if so, still t2 bytes to check, all safe */
> +.LzeroCount:
> +    sub a0, a0, t4
> +    ret
> +
> +.Lfound:                    /* found the 0; subtract buffer start from current pointer */
> +    add a0, a0, t3          /* and add offset into fetched data */
> +    sub a0, a0, t4
> +    ret
> +
> +.size __strnlen, .-__strnlen
> +weak_alias (__strnlen, strnlen)
> +libc_hidden_builtin_def (__strnlen)
> +libc_hidden_builtin_def (strnlen)
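For comparison, here is a scalar C model of the control flow the patch's comments describe, with every chunk clamped to the remaining count before it is scanned. It is a sketch, not the patch's code:

- `CHUNK` stands in for t1 = 2*vlenb and is assumed to be 32.
- `scan_chunk` (hypothetical) plays the role of one vle8.v / vmseq.vx / vfirst.m step.
- Unlike the vector code, the model never reads past s + maxlen.

The head step scans min(distance to the next boundary, maxlen) bytes, which is what the "check it's safe" / "look as far as permitted" comments describe. As quoted, `bgeu t2, a1, 2f` branches when t2 >= a1, so the operand order there may also be worth checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stands in for t1 = 2*vlenb; assumes vlenb = 16. Must be a power of two. */
#define CHUNK ((size_t)32)

/* Index of the first zero byte in s[0..n), or n if there is none.
 * Plays the role of one vle8.v / vmseq.vx / vfirst.m step. */
static size_t scan_chunk(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == '\0')
            return i;
    return n;
}

/* Hypothetical scalar model of a chunked strnlen. Every chunk is clamped
 * to the remaining count, so (p - s) + left == maxlen holds throughout
 * and left never wraps. */
static size_t model_strnlen(const char *s, size_t maxlen)
{
    const char *p = s;
    size_t left = maxlen;

    assert((CHUNK & (CHUNK - 1)) == 0);

    /* Head: scan up to the next CHUNK boundary, but never past maxlen. */
    size_t mis = (size_t)((uintptr_t)p & (CHUNK - 1));
    if (mis != 0 && left != 0) {
        size_t n = CHUNK - mis;
        if (n > left)
            n = left;
        size_t i = scan_chunk(p, n);
        if (i < n)
            return (size_t)(p - s) + i;
        p += n;
        left -= n;
    }

    /* Body: whole chunks, only while a whole chunk is still permitted. */
    while (left >= CHUNK) {
        size_t i = scan_chunk(p, CHUNK);
        if (i < CHUNK)
            return (size_t)(p - s) + i;
        p += CHUNK;
        left -= CHUNK;
    }

    /* Tail: fewer than CHUNK bytes remain (possibly none). */
    return (size_t)(p - s) + scan_chunk(p, left);
}
```

Each subtraction from `left` follows a comparison that guarantees it cannot wrap. That guarantee is exactly what the first pass of the quoted `.Laligned` loop lacks.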