public inbox for libc-alpha@sourceware.org
From: "naohirot@fujitsu.com" <naohirot@fujitsu.com>
To: 'Wilco Dijkstra' <Wilco.Dijkstra@arm.com>
Cc: 'GNU C Library' <libc-alpha@sourceware.org>,
	Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Subject: RE: [PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX
Date: Tue, 20 Apr 2021 03:31:05 +0000
Message-ID: <TYAPR01MB6025DC6336174F35AF570C44DF489@TYAPR01MB6025.jpnprd01.prod.outlook.com>
In-Reply-To: <VE1PR08MB5599AFAEFDA55471AF1C648C834E9@VE1PR08MB5599.eurprd08.prod.outlook.com>

Hi Wilco-san,

Let me focus on DC_ZVA and L1/L2 prefetch in this mail.

> From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>

> > Without DC_ZVA and L2 prefetch, memcpy and memset performance degraded
> > over 4MB.
> >
> > DC_ZVA and L2 prefetch have to be a pair; only DC_ZVA or only L2 prefetch
> > doesn't get any improvement.
> 
> That seems odd. Was that using the L1 prefetch with the L2 distance? It seems to
> me one of the L1 or L2 prefetches is unnecessary. 

I tested the following four cases.
Case 4 was the best.
Cases 2 and 3 were almost the same as Case 1.
For sizes over 4MB, Case 4 [1] improved performance from the 7.5-10 GB/sec
of Case 1 [2] to 10-10.5 GB/sec [3].

Case 1: DC_ZVA + L1 prefetch + L2 prefetch [2]
Case 2: DC_ZVA + L1 prefetch
Case 3: DC_ZVA + L2 prefetch
Case 4: DC_ZVA only [3]

[1] https://github.com/NaohiroTamura/glibc/commit/d57bed764a45383dfea8265d6a384646f4f07eed
[2] https://drive.google.com/file/d/1ws3lTLzMFK3lLrrwxVFvriERrs-IKdP9/view
[3] https://drive.google.com/file/d/1g7nuFOtkFw3b5INcAfuuv2lVODmASm-G/view
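
A minimal sketch of how GB/sec figures like the ones above could be
measured, assuming a POSIX system with clock_gettime and a GCC or Clang
compiler. The helper names gb_per_sec and memcpy_bandwidth are hypothetical;
this is not the glibc benchtest harness that produced [2] and [3].

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Convert bytes moved and elapsed seconds into GB/sec (1 GB = 1e9 bytes).
   Returns 0 if the elapsed time is not positive. */
static double gb_per_sec(double bytes, double seconds)
{
    return seconds > 0.0 ? bytes / seconds / 1e9 : 0.0;
}

/* Hypothetical helper: time `iters` memcpy calls of `size` bytes and
   return the average GB/sec, or -1 if allocation fails. */
static double memcpy_bandwidth(size_t size, int iters)
{
    assert(size > 0 && iters > 0);
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    /* Touch both buffers first so page faults are not part of the timing. */
    memset(src, 1, size);
    memset(dst, 0, size);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        memcpy(dst, src, size);
        /* GCC/Clang compiler barrier so the copies are not optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(src);
    free(dst);
    return gb_per_sec((double)size * iters, secs);
}
```

For the range discussed here, a call such as memcpy_bandwidth(8u << 20, 100)
would time 8MB copies; the numbers it reports depend on the machine and are
not comparable to the graphs without the same setup.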


>                                                Also why would the DC_ZVA
> need to be done so early? It seems to me that cleaning the cacheline just before
> you write it works best since that avoids accidentally replacing it.
> 

Yes, I moved it closer; please look at the change [1].
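
To illustrate the idea of issuing DC ZVA just before the stores to the same
block, here is a rough C sketch. It is not the assembly in [1]; the function
names are hypothetical, and it assumes GCC or Clang inline assembly. It reads
DCZID_EL0 (bits [3:0] give log2 of the block size in 4-byte words, bit 4
set means DC ZVA is prohibited) and falls back to plain memcpy when DC ZVA
is unavailable or the build is not AArch64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a DCZID_EL0 value: DC ZVA block size in bytes, or 0 when the
   DZP bit (bit 4) says DC ZVA is prohibited. */
static size_t zva_block_size(uint64_t dczid)
{
    if (dczid & 0x10)
        return 0;
    return (size_t)4 << (dczid & 0xf);
}

static uint64_t read_dczid(void)
{
#if defined(__aarch64__)
    uint64_t v;
    __asm__ volatile("mrs %0, dczid_el0" : "=r"(v));
    return v;
#else
    return 0x10; /* non-AArch64 build: report DC ZVA as unavailable */
#endif
}

/* Copy n bytes from src to dst; the buffers must not overlap, because
   DC ZVA zeroes the destination block before it is written. */
static void copy_with_zva(void *dst, const void *src, size_t n)
{
    size_t blk = zva_block_size(read_dczid());
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (blk == 0) {
        memcpy(d, s, n);
        return;
    }
    assert((blk & (blk - 1)) == 0); /* block size is a power of two */

    /* Head: copy up to the first block-aligned destination address. */
    size_t head = (size_t)(-(uintptr_t)d & (blk - 1));
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head; s += head; n -= head;

    /* Body: zero each full destination block right before writing it. */
    while (n >= blk) {
#if defined(__aarch64__)
        __asm__ volatile("dc zva, %0" : : "r"(d) : "memory");
#endif
        memcpy(d, s, blk);
        d += blk; s += blk; n -= blk;
    }

    /* Tail: fewer than blk bytes remain, so no DC ZVA here. */
    memcpy(d, s, n);
}
```

Keeping the DC ZVA adjacent to the stores means the zeroed line is written
almost immediately after it is allocated, which leaves little time for it
to be replaced.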

> > Without DC_ZVA and L2 prefetch, memmove didn't degrade over 4MB.
> >
> > The reason why I didn't implement DC_ZVA and L2 prefetch is that
> > memmove calls memcpy in most cases, and memmove code only handles
> > backward copy.
> > Maybe most of memmove-large benchtest cases are backward copy, I need to
> > check.
> 
> Most of the memmove tests do indeed overlap (so DC_ZVA does not work).
> However it also shows that it performs well across the L2 cache size range
> without any prefetch or DC_ZVA.

That's right, I confirmed that only DC_ZVA was necessary [1].

Next, I'll remove redundant instructions.

Thanks.
Naohiro


