public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: dengjianbo <dengjianbo@loongson.cn>
To: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>,
	caiyinyu <caiyinyu@loongson.cn>,
	libc-alpha@sourceware.org
Cc: xry111@xry111.site, xuchenghua@loongson.cn, huangpei@loongson.cn
Subject: Re: [PATCH 2/2] Loongarch: Add ifunc support and add different versions of strlen
Date: Thu, 3 Aug 2023 21:27:20 +0800	[thread overview]
Message-ID: <c5e923cd-f014-636d-441e-5fd06400fdc0@loongson.cn> (raw)
In-Reply-To: <60b7843c-a32c-a1b5-ffc1-1ad769fb2d57@linaro.org>

[-- Attachment #1: Type: text/plain, Size: 3001 bytes --]


On 2023-08-02 20:59, Adhemerval Zanella Netto wrote:
>>>>> On 2023-08-02 10:31, Adhemerval Zanella Netto wrote:
>>>>> +#if IS_IN (libc)
>>>>> +# define STRLEN __strlen_aligned
>>>>> +#else
>>>>> +# define STRLEN strlen
>>>>> +#endif
>>>> Is this really an improvement over the generic implementation? It seems to
>>>> use a quite similar strategy.
>> Comparing with the code generated by compiler, the assembly code does an 16bytes loop
>> unrolling, and handles ascii data and non-ascii data separately which could take less
>> instructions to calculate the length of  ascii data. besides, the assembly code using
>> fewer instructions to start the loop. I think the performance improvement benefits from
>> this. Please kindly check bench result also from:
>> https://github.com/jiadengx/glibc_test/blob/main/strlen/bench-strlen.out
> From the summarized results [1], it seems that the initial start to mask
> off unaligned inputs are slight better.  The __strlen_aligned onl seems
> better to sizes larger than 32 (the 16 lenght results seems strange).
> Maybe you coult improve shift_find/find_zero_all/index_first on loongarch.
>
> Does it improve by explicit instructing compiler to unroll the loop?
As you know, the assembly versions of strlen uses the same strategy to
calculate string length, if assembly code only calculate 8 bytes in the
loop and don't separate ascii and non-ascii data, the code of loop and
loop end part should be the same as the compiler generated code base on
generic strlen. Loongarch doesn't provide instructions like alpha
cmpbge, so there is no much optimizations could be done on
find_zero_all/index_first/has_zero except we can remove some BIG_ENDIAN
codes.
 
Refer to the latest test results in the chart: The assembly
implementation vs. generic strlen implementation(compiled by using
CFLAGS-strlen.c += -funroll-all-loops --param
max-variable-expandsions-in-unroller=2) the performance
improvement of the assembly implementation is evident(30% ~ 40%),
especially in cases when the length is greater than 64 bytes.
Please kindly see the results via:
https://github.com/jiadengx/glibc_test/blob/main/strlen2/bench1/generic_strlen_with_loop_unrolling.png
>>>> This implementation fails to assembler with binutils 2.40.0.20230525:
>>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S: Assembler messages:
>>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S:30: Error: no match insn: vld  $vr0,$r4,0
>>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S:31: Error: no match insn: vld  $vr1,$r4,16
>>>>
>> Sorry, it's my mistake for the wrong version of binutils. Could you please try the latest release
>> version 2.41?
> Although it should work, it is unexpected that depending of the assembler used
> some optimized routines are not enabled. 

In patch v2, an new configuration variable has been added to control
whether the LASX/LSX will be compiled according to assembler support
LASX/LSX or not, so it can be compiled with old versions of binutils.




  reply	other threads:[~2023-08-03 13:27 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-01  7:09 [PATCH 0/2] Add ifunc support and " dengjianbo
2023-08-01  7:09 ` [PATCH 1/2] LoongArch: Redefine macro LEAF/ENTRY dengjianbo
2023-08-01  7:09 ` [PATCH 2/2] Loongarch: Add ifunc support and add different versions of strlen dengjianbo
2023-08-01 14:31   ` Adhemerval Zanella Netto
2023-08-02  1:25     ` caiyinyu
2023-08-02 12:25       ` dengjianbo
2023-08-02 12:59         ` Adhemerval Zanella Netto
2023-08-03 13:27           ` dengjianbo [this message]
2023-08-03 13:48             ` Adhemerval Zanella Netto
2023-08-03 14:53               ` Xi Ruoyao
2023-08-03 14:59                 ` Xi Ruoyao
2023-08-03 16:29                   ` Adhemerval Zanella Netto
2023-08-04  1:50                 ` caiyinyu
2023-08-04 10:00               ` dengjianbo
2023-08-01 14:44   ` Xi Ruoyao
2023-08-02 12:47     ` dengjianbo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c5e923cd-f014-636d-441e-5fd06400fdc0@loongson.cn \
    --to=dengjianbo@loongson.cn \
    --cc=adhemerval.zanella@linaro.org \
    --cc=caiyinyu@loongson.cn \
    --cc=huangpei@loongson.cn \
    --cc=libc-alpha@sourceware.org \
    --cc=xry111@xry111.site \
    --cc=xuchenghua@loongson.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).