From: Noah Goldstein
Date: Fri, 3 Feb 2023 17:26:07 -0600
Subject: Re: [PATCH v12 09/31] string: Improve generic stpcpy
To: Adhemerval Zanella
Cc: libc-alpha@sourceware.org, Richard Henderson, Jeff Law, Xi Ruoyao
References: <20230202181149.2181553-1-adhemerval.zanella@linaro.org> <20230202181149.2181553-10-adhemerval.zanella@linaro.org>
In-Reply-To: <20230202181149.2181553-10-adhemerval.zanella@linaro.org>

On Thu, Feb 2, 2023 at 12:12 PM Adhemerval Zanella wrote:
>
> It follows the strategy:
>
>   - Align the destination on word boundary using byte operations.
>
>   - If source is also word aligned, read a word per time, check for
>     null (using has_zero from string-fzb.h), and write the remaining
>     bytes.
>
>   - If source is not word aligned, loop by aligning the source, and
>     merging the result of two reads. Similar to aligned case,
>     check for null with has_zero, and write the remaining bytes if
>     null is found.
>
> Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu,
> and powerpc-linux-gnu by removing the arch-specific assembly
> implementation and disabling multi-arch (it covers both LE and BE
> for 64 and 32 bits).
>
> Reviewed-by: Richard Henderson
> ---
>  string/stpcpy.c | 92 +++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 86 insertions(+), 6 deletions(-)
>
> diff --git a/string/stpcpy.c b/string/stpcpy.c
> index 8df5065cfe..dd0fef12ef 100644
> --- a/string/stpcpy.c
> +++ b/string/stpcpy.c
> @@ -15,12 +15,12 @@
>     License along with the GNU C Library; if not, see
>     . */
>
> -#ifdef HAVE_CONFIG_H
> -# include
> -#endif
> -
>  #define NO_MEMPCPY_STPCPY_REDIRECT
>  #include
> +#include
> +#include
> +#include
> +#include
>
>  #undef __stpcpy
>  #undef stpcpy
> @@ -29,12 +29,92 @@
>  # define STPCPY __stpcpy
>  #endif
>
> +static __always_inline char *
> +write_byte_from_word (op_t *dest, op_t word)
> +{
> +  char *d = (char *) dest;
> +  for (size_t i = 0; i < OPSIZ; i++, ++d)
> +    {
> +      char c = extractbyte (word, i);
> +      *d = c;
> +      if (c == '\0')
> +        break;
> +    }
> +  return d;
> +}
> +
> +static __always_inline char *
> +stpcpy_aligned_loop (op_t *restrict dst, const op_t *restrict src)
> +{
> +  op_t word;
> +  while (1)
> +    {
> +      word = *src++;
> +      if (has_zero (word))
> +        break;
> +      *dst++ = word;
> +    }
> +
> +  return write_byte_from_word (dst, word);
> +}
> +
> +static __always_inline char *
> +stpcpy_unaligned_loop (op_t *restrict dst, const op_t *restrict src,
> +                       uintptr_t ofs)
> +{
> +  op_t w2a = *src++;
> +  uintptr_t sh_1 = ofs * CHAR_BIT;
> +  uintptr_t sh_2 = OPSIZ * CHAR_BIT - sh_1;
> +
> +  op_t w2 = MERGE (w2a, sh_1, (op_t)-1, sh_2);
> +  if (!has_zero (w2))
> +    {
> +      op_t w2b;
> +
> +      /* Unaligned loop. The invariant is that W2B, which is "ahead" of W1,
> +         does not contain end-of-string. Therefore it is safe (and necessary)
> +         to read another word from each while we do not have a difference. */
> +      while (1)
> +        {
> +          w2b = *src++;
> +          w2 = MERGE (w2a, sh_1, w2b, sh_2);
> +          /* Check if there is zero on w2a. */
> +          if (has_zero (w2))
> +            goto out;
> +          *dst++ = w2;
> +          if (has_zero (w2b))
> +            break;
> +          w2a = w2b;
> +        }
> +
> +      /* Align the final partial of P2. */
> +      w2 = MERGE (w2b, sh_1, 0, sh_2);
> +    }
> +
> +out:
> +  return write_byte_from_word (dst, w2);
> +}
> +
> +
>  /* Copy SRC to DEST, returning the address of the terminating '\0' in DEST. */
>  char *
>  STPCPY (char *dest, const char *src)
>  {
> -  size_t len = strlen (src);
> -  return memcpy (dest, src, len + 1) + len;
> +  /* Copy just a few bytes to make DEST aligned. */
> +  size_t len = (-(uintptr_t) dest) % OPSIZ;
> +  for (; len != 0; len--, ++dest)
> +    {
> +      char c = *src++;
> +      *dest = c;
> +      if (c == '\0')
> +        return dest;
> +    }
> +
> +  /* DEST is now aligned to op_t, SRC may or may not be. */
> +  uintptr_t ofs = (uintptr_t) src % OPSIZ;
> +  return ofs == 0 ? stpcpy_aligned_loop ((op_t*) dest, (const op_t *) src)
> +                  : stpcpy_unaligned_loop ((op_t*) dest,
> +                                           (const op_t *) (src - ofs) , ofs);
>  }
>  weak_alias (__stpcpy, stpcpy)
>  libc_hidden_def (__stpcpy)
> --
> 2.34.1
>

LGTM.

Reviewed-by: Noah Goldstein
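
For anyone reading the thread without the glibc headers at hand, the
aligned-source step of the strategy in the commit message can be sketched
in portable C roughly as below. This is only an illustration, not the code
from the patch: word_t, sketch_has_zero, sketch_write_tail and
sketch_stpcpy_aligned are hypothetical names standing in for op_t,
has_zero, write_byte_from_word and stpcpy_aligned_loop. The zero-byte test
uses the well-known subtract-and-mask trick, which may differ from what
string-fzb.h does on a given architecture. The sketch assumes
CHAR_BIT == 8 and that both buffers are arrays of whole words, so reading
the word that contains the terminator stays inside the source object.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for glibc's op_t.  */
typedef unsigned long word_t;

/* Return nonzero iff at least one byte of W is zero.
   Hypothetical stand-in for has_zero; assumes CHAR_BIT == 8.  */
static int
sketch_has_zero (word_t w)
{
  const word_t ones = (word_t) -1 / 0xff;       /* 0x0101...01 */
  const word_t highs = ones << (CHAR_BIT - 1);  /* 0x8080...80 */
  return ((w - ones) & ~w & highs) != 0;
}

/* Store the bytes of W at DEST in memory order, stopping after the first
   NUL.  Return a pointer to that NUL.  Going through a byte array keeps
   the sketch independent of endianness.  */
static char *
sketch_write_tail (word_t *dest, word_t w)
{
  unsigned char bytes[sizeof w];
  memcpy (bytes, &w, sizeof w);

  char *d = (char *) dest;
  for (size_t i = 0; i < sizeof w; i++, ++d)
    {
      *d = (char) bytes[i];
      if (bytes[i] == 0)
        break;
    }
  return d;
}

/* Copy whole words until one contains a NUL, then finish byte by byte.
   Both pointers are word aligned by construction of their types.  */
static char *
sketch_stpcpy_aligned (word_t *dst, const word_t *src)
{
  assert (dst != NULL && src != NULL);

  word_t w;
  for (;;)
    {
      w = *src++;
      if (sketch_has_zero (w))
        break;
      *dst++ = w;
    }
  return sketch_write_tail (dst, w);
}
```

Calling sketch_stpcpy_aligned on a word array that holds a NUL-terminated
string returns a pointer to the terminator in the destination, which is
the stpcpy contract. The unaligned-source case additionally needs the
two-read merge from the patch, which is left out here because it depends
on the endian-aware MERGE macro.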