public inbox for libc-alpha@sourceware.org
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: "naohirot@fujitsu.com" <naohirot@fujitsu.com>
Cc: 'GNU C Library' <libc-alpha@sourceware.org>,
	Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Subject: Re: [PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX
Date: Thu, 29 Apr 2021 15:13:51 +0000
Message-ID: <VE1PR08MB55993A255FE21BA012D010C3835F9@VE1PR08MB5599.eurprd08.prod.outlook.com>
In-Reply-To: <TYAPR01MB60257AB96B99DA855B1B711DDF419@TYAPR01MB6025.jpnprd01.prod.outlook.com>

Hi Naohiro,

> I believe that I've answered all of your comments so far.
> Please let me know if I missed something.
> If there is no further comments to the first version of this patch,
> I'd like to proceed with the preparation of the second version after
> the consecutive National holidays, Apr. 29th - May. 5th, in Japan.

I've only looked at memcpy so far. My comments on memcpy:

(1) Improve the tail code in unroll4/2/1/last to do the reverse of
    shortcut_for_small_size - basically there is no need for loops or lots of branches.

(2) Rather than start with L2, check for n > L2_SIZE && vector_length == 64 and
    start with the vl_agnostic case. Copies > L2_SIZE will be very rare so it's best to
    handle the common case first.

(3) The alignment code can be significantly simplified. Why not just process
    4 vectors unconditionally and then align the pointers? That avoids all the
    complex code and is much faster.

(4) Is there a benefit of aligning src or dst to vector size in the vl_agnostic case?
    If so, it would be easy to align to a vector first and then if n > L2_SIZE do the
    remaining 3 vectors to align to a full cacheline.

(5) I'm not sure I understand the reason for src_notag/dest_notag. However if
    you want to ignore tags, just change the mov src_ptr, src into AND that
    clears the tag. There is no reason to both clear the tag and also keep the
    original pointer and tag.
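
To illustrate points (1) and (5), here is a minimal C sketch. It is not code
from the patch and uses plain byte buffers rather than SVE; the helper names
copy_16_to_32 and clear_top_byte_tag are hypothetical. The first helper shows
one common loop-free way to handle a tail: copy a fixed-size block from the
start and another ending at the end, letting the two overlap. The second
assumes the tag lives in the top byte (bits 63:56) of the address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the patch: copy n bytes for
   16 <= n <= 32 with two fixed-size block copies and no loop.
   Both blocks are loaded before either is stored, and they
   overlap whenever n < 32.  */
static void
copy_16_to_32 (unsigned char *dst, const unsigned char *src, size_t n)
{
  unsigned char head[16], tail[16];

  memcpy (head, src, 16);            /* first 16 bytes */
  memcpy (tail, src + n - 16, 16);   /* last 16 bytes */
  memcpy (dst, head, 16);
  memcpy (dst + n - 16, tail, 16);
}

/* Hypothetical helper: clear an address tag, assuming the tag
   occupies the top byte (bits 63:56) of a 64-bit address.  This
   corresponds to a single AND when the pointer is first copied.  */
static uint64_t
clear_top_byte_tag (uint64_t addr)
{
  return addr & UINT64_C (0x00ffffffffffffff);
}
```

The same overlapping idea should extend to vector-sized blocks, so a tail of
up to a few vectors needs only a couple of size comparisons.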

For memmove I would suggest merging it with memcpy to save ~100 instructions.
I don't understand the complexity of the L(dispatch) code - you just need a simple
3-instruction overlap check that branches to bwd_unroll8.
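
A minimal C sketch of such an overlap check, under the assumption that it is
the usual unsigned-difference test (in assembly, roughly a subtract, a compare
and a conditional branch). The names needs_backward_copy and move_bytes are
hypothetical, and the byte loops only stand in for the real forward and
backward copy paths.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when dst lies inside [src, src + n), so a
   forward copy would overwrite source bytes before reading them.
   The unsigned subtraction wraps to a huge value when dst < src.  */
static int
needs_backward_copy (uintptr_t dst, uintptr_t src, size_t n)
{
  return dst - src < n;
}

/* Byte-wise stand-in for a merged memcpy/memmove entry point.  */
static void *
move_bytes (void *dstp, const void *srcp, size_t n)
{
  unsigned char *d = dstp;
  const unsigned char *s = srcp;

  if (needs_backward_copy ((uintptr_t) d, (uintptr_t) s, n))
    {
      /* Backward path: copy from the last byte down to the first.  */
      while (n--)
        d[n] = s[n];
    }
  else
    {
      /* Forward path, shared with memcpy.  */
      for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    }
  return dstp;
}
```

Note that dst == src also takes the backward path here, which is harmless.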

I haven't looked at memset, but pretty much all the improvements apply there too.

>> I think the best option for now is to change BTI_C into NOP if AARCH64_HAVE_BTI
>> is not set. This avoids creating alignment issues in existing code (which is written
>> to assume the hint is present) and works for all string functions.
>
> I updated sysdeps/aarch64/sysdep.h following your advice [1].
> 
> [1] https://github.com/NaohiroTamura/glibc/commit/c582917071e76cfed84fafb0c82cb70339294386

I meant using an actual NOP in the #else case so that existing string functions
won't change. Also note the #defines in the #if and #else need to be indented.
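
A minimal sketch of the shape I have in mind, written so it can be compiled
as C for checking. The macro name AARCH64_HAVE_BTI is taken from the text
above and may not match the real configuration macro; the default of 0 and
the bti_c_text helper are assumptions added only for illustration.

```c
#include <string.h>

/* Assumption: the build defines AARCH64_HAVE_BTI as 0 or 1.  Default to
   0 here so the sketch is self-contained.  */
#ifndef AARCH64_HAVE_BTI
# define AARCH64_HAVE_BTI 0
#endif

/* Either way BTI_C expands to exactly one instruction, so code that was
   laid out assuming the hint is present keeps its alignment.  */
#if AARCH64_HAVE_BTI
# define BTI_C	hint	34
#else
# define BTI_C	nop
#endif

/* Illustration only: expose the macro expansion as a string.  */
#define BTI_STR_(x) #x
#define BTI_STR(x) BTI_STR_ (x)

static const char *
bti_c_text (void)
{
  return BTI_STR (BTI_C);
}
```

Note the single space after each '#' inside the #if/#else, which is the
indentation referred to above.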

Cheers,
Wilco
