Re: [PATCH 2/2] Loongarch: Add ifunc support and add different versions of strlen

public inbox for libc-alpha@sourceware.org

From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: dengjianbo <dengjianbo@loongson.cn>,
	caiyinyu <caiyinyu@loongson.cn>,
	libc-alpha@sourceware.org
Cc: xry111@xry111.site, xuchenghua@loongson.cn, huangpei@loongson.cn
Subject: Re: [PATCH 2/2] Loongarch: Add ifunc support and add different versions of strlen
Date: Thu, 3 Aug 2023 10:48:43 -0300
Message-ID: <2aed2087-c44e-8fed-83d1-5e60343c8f47@linaro.org>
In-Reply-To: <c5e923cd-f014-636d-441e-5fd06400fdc0@loongson.cn>



On 03/08/23 10:27, dengjianbo wrote:
> 
> On 2023-08-02 20:59, Adhemerval Zanella Netto wrote:
>>>>>> On 2023-08-02 10:31, Adhemerval Zanella Netto wrote:
>>>>>> +#if IS_IN (libc)
>>>>>> +# define STRLEN __strlen_aligned
>>>>>> +#else
>>>>>> +# define STRLEN strlen
>>>>>> +#endif
>>>>> Is this really an improvement over the generic implementation? It seems to
>>>>> use a quite similar strategy.
>>> Compared with the code generated by the compiler, the assembly code unrolls the
>>> loop to process 16 bytes per iteration and handles ASCII and non-ASCII data
>>> separately, which can take fewer instructions to calculate the length of ASCII data.
>>> Besides, the assembly code uses fewer instructions to set up the loop. I think the
>>> performance improvement comes from this. Please also check the benchmark results at:
>>> https://github.com/jiadengx/glibc_test/blob/main/strlen/bench-strlen.out
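As a rough illustration of what "handling ASCII and non-ASCII data separately" can mean for a word-at-a-time strlen, here is a C sketch; it is not the patch's assembly. It assumes a little-endian 64-bit target (as LoongArch is), and the names maybe_zero, exact_zero and strlen_sketch are hypothetical. The cheap test (x - 0x01..01) & 0x80..80 costs two operations and never misses a zero byte, but without a zero byte it fires only for bytes >= 0x81, i.e. on non-ASCII data; only then is the exact test (AND with ~x) needed. Like glibc's generic strlen, it reads whole aligned words and may touch bytes after the terminator within the final word.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#if __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
# error "sketch assumes a little-endian target (as on LoongArch)"
#endif

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Cheap test (two operations): never misses a zero byte, but may also
   fire when the word contains a non-ASCII byte (>= 0x81).  */
static inline uint64_t maybe_zero (uint64_t x)
{
  return (x - ONES) & HIGHS;
}

/* Exact refinement: nonzero iff x contains a zero byte; its lowest set
   bit marks the first zero byte.  */
static inline uint64_t exact_zero (uint64_t cheap, uint64_t x)
{
  return cheap & ~x;
}

/* Little-endian: byte offset of the first zero byte flagged in MASK.  */
static inline size_t index_first (uint64_t mask)
{
  return (size_t) __builtin_ctzll (mask) / 8;
}

/* Hypothetical word-at-a-time strlen with an ASCII fast path.  */
static size_t strlen_sketch (const char *s)
{
  const char *p = s;

  /* Scan byte by byte until P is 8-byte aligned.  */
  while (((uintptr_t) p & 7) != 0)
    {
      if (*p == '\0')
        return (size_t) (p - s);
      p++;
    }

  for (;; p += 8)
    {
      uint64_t x;
      /* Aligned load; may read bytes past the NUL within this word.  */
      memcpy (&x, p, sizeof x);
      uint64_t cheap = maybe_zero (x);
      if (cheap == 0)
        continue;               /* common case for ASCII text */
      uint64_t exact = exact_zero (cheap, x);
      if (exact != 0)
        return (size_t) (p - s) + index_first (exact);
      /* False alarm caused by non-ASCII bytes: keep scanning.  */
    }
}
```

On pure ASCII text the loop body is the load, the two-operation cheap test and one branch; the extra AND with ~x only runs when the cheap test fires.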
>> From the summarized results [1], it seems that the initial step that masks
>> off unaligned inputs performs slightly better. __strlen_aligned only seems
>> better for sizes larger than 32 (the results for length 16 seem strange).
>> Maybe you could improve shift_find/find_zero_all/index_first on LoongArch.
>>
>> Does it improve if you explicitly instruct the compiler to unroll the loop?
> As you know, the assembly versions of strlen use the same strategy to
> calculate the string length. If the assembly code only processed 8 bytes
> per loop iteration and did not handle ASCII and non-ASCII data separately,
> the loop and loop-end code would be the same as the compiler-generated code
> based on the generic strlen. LoongArch does not provide instructions like
> Alpha's cmpbge, so there is not much optimization that can be done in
> find_zero_all/index_first/has_zero, except removing some BIG_ENDIAN
> code.
>  
> Please refer to the latest test results in the chart, which compares the
> assembly implementation against the generic strlen implementation (compiled
> with CFLAGS-strlen.c += -funroll-all-loops --param
> max-variable-expansions-in-unroller=2): the performance improvement of
> the assembly implementation is evident (30% to 40%), especially when the
> length is greater than 64 bytes.
> Please see the results at:
> https://github.com/jiadengx/glibc_test/blob/main/strlen2/bench1/generic_strlen_with_loop_unrolling.png

So maybe use the generic implementation plus the compiler flags for loop
unrolling instead of an asm optimization?
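For comparison, the 16-byte unrolling discussed above can also be written directly in generic-style C. The sketch below (hypothetical names has_zero, index_first and strlen_unrolled16; little-endian assumed) loads two 8-byte words per iteration and branches once per 16 bytes by OR-ing the two zero-byte masks. It is only meant to show roughly the loop shape the unrolling flags ask the compiler for; its helpers play the roles of the generic helpers named in the thread but are not copies of glibc's code, and whether GCC emits exactly this on LoongArch is not established here.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#if __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
# error "sketch assumes a little-endian target (as on LoongArch)"
#endif

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Exact zero-byte test: nonzero iff X holds a zero byte; the lowest set
   bit marks the first one.  */
static inline uint64_t has_zero (uint64_t x)
{
  return (x - ONES) & ~x & HIGHS;
}

/* Little-endian: byte offset of the first zero byte flagged in MASK.  */
static inline size_t index_first (uint64_t mask)
{
  return (size_t) __builtin_ctzll (mask) >> 3;
}

/* Hypothetical strlen processing 16 bytes (two words) per iteration.  */
static size_t strlen_unrolled16 (const char *s)
{
  const char *p = s;

  /* Scan byte by byte until P is 16-byte aligned, so the two loads
     below never cross a 16-byte (and hence page) boundary.  */
  while (((uintptr_t) p & 15) != 0)
    {
      if (*p == '\0')
        return (size_t) (p - s);
      p++;
    }

  for (;; p += 16)
    {
      uint64_t lo, hi;
      memcpy (&lo, p, 8);
      memcpy (&hi, p + 8, 8);
      uint64_t mlo = has_zero (lo);
      uint64_t mhi = has_zero (hi);
      if ((mlo | mhi) == 0)     /* one branch per 16 bytes */
        continue;
      if (mlo != 0)
        return (size_t) (p - s) + index_first (mlo);
      return (size_t) (p - s) + 8 + index_first (mhi);
    }
}
```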

>>>>> This implementation fails to assemble with binutils 2.40.0.20230525:
>>>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S: Assembler messages:
>>>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S:30: Error: no match insn: vld  $vr0,$r4,0
>>>>> ../sysdeps/loongarch/lp64/multiarch/strlen-lsx.S:31: Error: no match insn: vld  $vr1,$r4,16
>>>>>
>>> Sorry, my mistake regarding the binutils version. Could you please try the latest
>>> release, version 2.41?
>> Although it should work, it is unexpected that, depending on the assembler used,
>> some optimized routines are not enabled.
> 
> In patch v2, a new configuration variable has been added to control
> whether the LASX/LSX code is compiled, depending on whether the assembler
> supports LASX/LSX, so it can be built with older versions of binutils.

Yes, I am aware, and this seems odd, albeit not really wrong. It means that
you will get less code coverage and fewer optimizations depending on the
binutils used.
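To make the coverage concern concrete, here is a hypothetical sketch of how a build-time assembler check interacts with run-time selection. The hwcap bit values, the enum and select_strlen are stand-ins, not glibc's or the kernel's definitions, and the build-time result is passed as a parameter only so both configurations can be exercised (in a real build it would be a preprocessor condition). It shows that the same LASX-capable CPU ends up on the non-vector variant when the library was built with an assembler lacking LSX/LASX support.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the LSX/LASX hardware-capability bits.  */
#define HWCAP_LSX_SKETCH  (UINT64_C(1) << 0)
#define HWCAP_LASX_SKETCH (UINT64_C(1) << 1)

enum strlen_variant { STRLEN_ALIGNED, STRLEN_LSX, STRLEN_LASX };

/* VEC_ASM_BUILT models the configure-time check: if the assembler could
   not handle LSX/LASX, the vector variants were never compiled and cannot
   be selected, whatever the CPU supports.  */
static enum strlen_variant
select_strlen (uint64_t hwcap, bool vec_asm_built)
{
  if (vec_asm_built)
    {
      if (hwcap & HWCAP_LASX_SKETCH)
        return STRLEN_LASX;
      if (hwcap & HWCAP_LSX_SKETCH)
        return STRLEN_LSX;
    }
  return STRLEN_ALIGNED;
}
```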

I would advise following what other architectures have done to provide
arch-specific optimizations: either set a minimum gcc/binutils version (for
instance, aarch64 libmvec), or encode the instructions in a binutils-neutral
way (as in the powerpc implementation I pointed out).


Thread overview: 16+ messages
2023-08-01  7:09 [PATCH 0/2] Add ifunc support and " dengjianbo
2023-08-01  7:09 ` [PATCH 1/2] LoongArch: Redefine macro LEAF/ENTRY dengjianbo
2023-08-01  7:09 ` [PATCH 2/2] Loongarch: Add ifunc support and add different versions of strlen dengjianbo
2023-08-01 14:31   ` Adhemerval Zanella Netto
2023-08-02  1:25     ` caiyinyu
2023-08-02 12:25       ` dengjianbo
2023-08-02 12:59         ` Adhemerval Zanella Netto
2023-08-03 13:27           ` dengjianbo
2023-08-03 13:48             ` Adhemerval Zanella Netto [this message]
2023-08-03 14:53               ` Xi Ruoyao
2023-08-03 14:59                 ` Xi Ruoyao
2023-08-03 16:29                   ` Adhemerval Zanella Netto
2023-08-04  1:50                 ` caiyinyu
2023-08-04 10:00               ` dengjianbo
2023-08-01 14:44   ` Xi Ruoyao
2023-08-02 12:47     ` dengjianbo
