From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Xi Ruoyao <xry111@xry111.site>,
libc-alpha@sourceware.org,
Richard Henderson <richard.henderson@linaro.org>,
Noah Goldstein <goldstein.w.n@gmail.com>,
Jeff Law <jeffreyalaw@gmail.com>
Cc: caiyinyu <caiyinyu@loongson.cn>
Subject: Re: [PATCH v11 10/29] string: Improve generic stpcpy
Date: Thu, 2 Feb 2023 10:32:43 -0300
Message-ID: <ad38b4db-7d4c-1ffe-861e-39270e9a65d4@linaro.org>
In-Reply-To: <744ec5a52e67829d38d072a5b8fe1c219f8c0e7c.camel@xry111.site>
On 01/02/23 14:29, Xi Ruoyao wrote:
> On Wed, 2023-02-01 at 14:03 -0300, Adhemerval Zanella wrote:
>> +static __always_inline char *
>> +stpcpy_unaligned_loop (op_t *restrict dst, const op_t *restrict src,
>> +                       uintptr_t ofs)
>> +{
>> +  op_t w2a = *src++;
>> +  uintptr_t sh_1 = ofs * CHAR_BIT;
>> +  uintptr_t sh_2 = OPSIZ * CHAR_BIT - sh_1;
>
> Hmm, on 64-bit LoongArch if we "clone" the function 7 times to
> stpcpy_unaligned_loop_{1..7} and call them with a switch (ofs) { ... }
> construction, we'd be able to use bytepick.d instruction for MERGE,
> saving 2 instructions in the iteration. But maybe this is going too
> far. I'm not sure if this "optimization" applies for other
> architectures.
I think it should be feasible, and I might get back to optimizing the unaligned
loop with this strategy. But I will need to check whether the compiler will
indeed exploit the fact that the shifts are now constants to optimize the merge.
It also increases the code size slightly: on x86_64 the text size went from 850
to 1993 and on loongarch from 864 to 2200. That might be something to consider
as well: assuming that unaligned strings will have an equal probability of
happening, icache pressure would be important.
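To make the idea concrete, here is a rough sketch of the "clone per offset"
approach, not the code from the patch. It assumes a 64-bit little-endian
target; word_t, merge_loop and merge_words are hypothetical names used only
for illustration, and the loop merges plain words without any end-of-string
check. The switch sits outside the loop, so each case inlines a copy of the
loop in which the shift amounts are compile-time constants. Whether a given
compiler then picks an instruction such as bytepick.d is exactly what would
still need checking.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word_t;	/* Hypothetical stand-in for op_t.  */

/* Merge N output words from N + 1 input words.  OFS is the byte
   misalignment and must be in the range 1..7.  Forcing the function
   inline lets each caller below see OFS as a constant.  */
static inline __attribute__ ((always_inline)) void
merge_loop (word_t *dst, const word_t *src, size_t n, unsigned int ofs)
{
  assert (ofs > 0 && ofs < sizeof (word_t));
  const unsigned int sh_1 = ofs * CHAR_BIT;
  const unsigned int sh_2 = sizeof (word_t) * CHAR_BIT - sh_1;
  word_t a = *src++;
  for (size_t i = 0; i < n; i++)
    {
      word_t b = *src++;
      /* Little-endian merge: low bytes come from A, high bytes from B.  */
      dst[i] = (a >> sh_1) | (b << sh_2);
      a = b;
    }
}

/* Dispatch once on the offset; every case is a specialized clone.  */
void
merge_words (word_t *dst, const word_t *src, size_t n, unsigned int ofs)
{
  switch (ofs)
    {
    case 1: merge_loop (dst, src, n, 1); break;
    case 2: merge_loop (dst, src, n, 2); break;
    case 3: merge_loop (dst, src, n, 3); break;
    case 4: merge_loop (dst, src, n, 4); break;
    case 5: merge_loop (dst, src, n, 5); break;
    case 6: merge_loop (dst, src, n, 6); break;
    case 7: merge_loop (dst, src, n, 7); break;
    default: break;		/* Aligned case is assumed handled elsewhere.  */
    }
}
```

Seven inlined copies of the loop body are also where the text size growth
comes from.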
>
>> +  op_t w2 = MERGE (w2a, sh_1, (op_t)-1, sh_2);
>> +  if (!has_zero (w2))
>> +    {
>> +      op_t w2b;
>> +
>> +      /* Unaligned loop. The invariant is that W2B, which is "ahead" of W1,
>> +         does not contain end-of-string. Therefore it is safe (and necessary)
>> +         to read another word from each while we do not have a difference. */
>> +      while (1)
>> +        {
>> +          w2b = *src++;
>> +          w2 = MERGE (w2a, sh_1, w2b, sh_2);
>> +          /* Check if there is zero on w2a. */
>> +          if (has_zero (w2))
>> +            goto out;
>> +          *dst++ = w2;
>> +          if (has_zero (w2b))
>> +            break;
>> +          w2a = w2b;
>> +        }
>> +
>> +      /* Align the final partial of P2. */
>> +      w2 = MERGE (w2b, sh_1, 0, sh_2);
>> +    }
>> +
>> +out:
>> +  return write_byte_from_word (dst, w2);
>> +}
>> +
>
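For readers following the loop above, this is a minimal sketch of the kind of
zero-byte test that has_zero stands for. It is the classic word-at-a-time bit
trick, shown here as an assumption for illustration; the patch's actual helper
may be implemented differently per architecture. word_t and has_zero_sketch
are hypothetical names.

```c
#include <stdint.h>

typedef uint64_t word_t;	/* Hypothetical stand-in for op_t.  */

/* Return nonzero if any byte of X is zero.  Subtracting 1 from each
   byte borrows into bit 7 only for bytes that were zero (or above a
   zero byte); masking with ~X discards bytes whose bit 7 was already
   set, so the result is nonzero exactly when X has a zero byte.  */
static inline int
has_zero_sketch (word_t x)
{
  const word_t lsb = 0x0101010101010101ULL;
  const word_t msb = 0x8080808080808080ULL;
  return ((x - lsb) & ~x & msb) != 0;
}
```

This only answers whether a terminator is present somewhere in the word;
locating it and storing the trailing bytes is the job of a separate step such
as write_byte_from_word in the quoted code.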