From: Florian Weimer <fw@deneb.enyo.de>
To: Wilco Dijkstra via Libc-alpha
Cc: naohirot@fujitsu.com, Wilco Dijkstra, Szabolcs Nagy
Subject: Re: [PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX
Date: Mon, 12 Apr 2021 20:53:54 +0200
Message-ID: <87tuobibb1.fsf@mid.deneb.enyo.de>
In-Reply-To: (Wilco Dijkstra via Libc-alpha's message of "Mon, 12 Apr 2021 12:52:05 +0000")

* Wilco Dijkstra via Libc-alpha:

> 5. Odd prefetches
>
> I have a hard time believing that first prefetching the data to be
> written, then clearing it using DC ZVA (???), then prefetching the
> same data a second time, before finally writing the loaded data,
> helps performance...  Generally, hardware prefetchers are able to do
> exactly the right thing, since memcpy is trivial to prefetch.  So
> what is the performance gain of each prefetch/clear step?  What is
> the difference between memcpy and memmove performance (given that
> memmove doesn't do any of this)?

Another downside is the exposure of latent concurrency bugs:

  G1: Phantom zeros in cardtable

I guess the CPU's heritage is shining through here. 8-)
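
To make the concurrency point concrete, here is a minimal,
self-contained sketch (my own illustration, not code from the A64FX
patch; the copy_with_pre_zero helper and the card variable are made up
for the example).  It shows why clearing the destination before storing
the copied data, which is what a DC ZVA inside memcpy amounts to, can
let a racing reader observe a value that the program never stored:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Stand-in for a card-table entry that another thread polls.  The old
   value is 0xff, the value being "copied" in is 0xaa.  */
static _Atomic unsigned char card = 0xff;

/* Hypothetical helper mimicking the zero-then-write pattern: the
   destination is cleared first (the analogue of DC ZVA on the
   destination cache line) and the real data arrives only afterwards.  */
static void copy_with_pre_zero(_Atomic unsigned char *dst, unsigned char src)
{
    atomic_store_explicit(dst, 0, memory_order_relaxed);   /* "DC ZVA" */
    atomic_store_explicit(dst, src, memory_order_relaxed); /* copied data */
}

static void *reader(void *arg)
{
    (void) arg;
    for (;;) {
        unsigned char v = atomic_load_explicit(&card, memory_order_relaxed);
        if (v == 0xaa)          /* saw the new value, as expected */
            return NULL;
        if (v != 0xff) {        /* neither old nor new: a phantom zero */
            printf("observed phantom byte 0x%02x\n", v);
            return NULL;
        }
    }
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, reader, NULL);
    copy_with_pre_zero(&card, 0xaa);
    pthread_join(t, NULL);
    return 0;
}

Whether the phantom zero is actually observed depends on timing, but
code like the card-table scanner only expects to see either the old or
the new byte, so an interior zero written by the string routine breaks
that assumption.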