From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: dengjianbo <dengjianbo@loongson.cn>,
caiyinyu <caiyinyu@loongson.cn>,
libc-alpha@sourceware.org
Cc: xry111@xry111.site, xuchenghua@loongson.cn, huangpei@loongson.cn
Subject: Re: [PATCH 2/2] Loongarch: Add ifunc support and add different versions of strlen
Date: Thu, 3 Aug 2023 10:48:43 -0300
Message-ID: <2aed2087-c44e-8fed-83d1-5e60343c8f47@linaro.org>
In-Reply-To: <c5e923cd-f014-636d-441e-5fd06400fdc0@loongson.cn>
On 03/08/23 10:27, dengjianbo wrote:
>
> On 2023-08-02 20:59, Adhemerval Zanella Netto wrote:
>>>>>> On 2023-08-02 10:31, Adhemerval Zanella Netto wrote:
>>>>>> +#if IS_IN (libc)
>>>>>> +# define STRLEN __strlen_aligned
>>>>>> +#else
>>>>>> +# define STRLEN strlen
>>>>>> +#endif
>>>>> Is this really an improvement over the generic implementation? It seems to
>>>>> use a quite similar strategy.
>>> Compared with the compiler-generated code, the assembly code unrolls the loop to
>>> 16 bytes and handles ASCII and non-ASCII data separately, which takes fewer
>>> instructions to calculate the length of ASCII data. Besides, the assembly code uses
>>> fewer instructions to start the loop. I think the performance improvement comes from
>>> this. Please kindly check the bench results at:
>>> https://github.com/jiadengx/glibc_test/blob/main/strlen/bench-strlen.out
>> From the summarized results [1], it seems that the initial step to mask
>> off unaligned inputs is slightly better. The __strlen_aligned only seems
>> better for sizes larger than 32 (the length-16 results seem strange).
>> Maybe you could improve shift_find/find_zero_all/index_first on loongarch.
>>
>> Does it improve by explicitly instructing the compiler to unroll the loop?
> As you know, the assembly versions of strlen use the same strategy to
> calculate the string length. If the assembly code only processed 8 bytes per
> loop iteration and did not separate ASCII and non-ASCII data, the loop and
> loop-end code should be the same as the compiler-generated code based on
> the generic strlen. LoongArch doesn't provide instructions like alpha
> cmpbge, so there are not many optimizations that could be done on
> find_zero_all/index_first/has_zero, except that we can remove some BIG_ENDIAN
> code.
>
> Refer to the latest test results in the chart: comparing the assembly
> implementation with the generic strlen implementation (compiled using
> CFLAGS-strlen.c += -funroll-all-loops --param
> max-variable-expansions-in-unroller=2), the performance
> improvement of the assembly implementation is evident (30% ~ 40%),
> especially when the length is greater than 64 bytes.
> Please kindly see the results via:
> https://github.com/jiadengx/glibc_test/blob/main/strlen2/bench1/generic_strlen_with_loop_unrolling.png
So maybe use the generic implementation plus the compiler flags for loop
unrolling instead of the asm optimization?
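
To make the comparison concrete, here is a rough C sketch of the word-at-a-time
strategy discussed above, with the loop manually handling two 8-byte words
(16 bytes) per iteration. It is only an illustration under my own assumptions,
not the glibc generic code: the names sketch_strlen, load_word, and
has_zero_byte are hypothetical, and the final byte scan stands in for
index_first so that the sketch stays endian-neutral.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero iff at least one byte of x is zero.  */
static inline int has_zero_byte(uint64_t x)
{
    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    return ((x - ones) & ~x & highs) != 0;
}

/* Hypothetical helper: load 8 bytes without violating strict aliasing.  */
static inline uint64_t load_word(const char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Sketch only.  Like any word-at-a-time strlen, it may read bytes past
   the terminator inside the last aligned 8-byte word; aligned loads never
   cross a page boundary, which is what such routines rely on.  */
size_t sketch_strlen(const char *s)
{
    const char *p = s;

    /* Byte-wise prologue until p is 8-byte aligned.  */
    while (((uintptr_t)p & 7) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Main loop: check two aligned words (16 bytes) per iteration.  */
    for (;;) {
        if (has_zero_byte(load_word(p)))
            break;
        if (has_zero_byte(load_word(p + 8))) {
            p += 8;
            break;
        }
        p += 16;
    }

    /* The word at p contains the terminator; locate it byte by byte.  */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The second load is only issued once the first word is known to contain no zero
byte, so it always covers at least one byte that belongs to the string.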
>>>>> This implementation fails to assemble with binutils 2.40.0.20230525:
>>>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S: Assembler messages:
>>>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S:30: Error: no match insn: vld $vr0,$r4,0
>>>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S:31: Error: no match insn: vld $vr1,$r4,16
>>>>>
>>> Sorry, that was my mistake: I used the wrong version of binutils. Could you please try
>>> the latest release, version 2.41?
>> Although it should work, it is unexpected that, depending on the assembler used,
>> some optimized routines are not enabled.
>
> In patch v2, a new configuration variable has been added to control
> whether the LASX/LSX versions are compiled, according to whether the assembler
> supports LASX/LSX, so it can be built with old versions of binutils.
Yes, I am aware, and this seems odd, albeit not really wrong. It means that
you will get less code coverage and fewer optimizations depending on the
binutils used.
I would advise following what other architectures did to provide arch-specific
optimizations, which is to either set up a minimum gcc/binutils version (for
instance aarch64 libmvec), or encode the instructions in a binutils-neutral
way (as in the powerpc implementation I pointed out).