From mboxrd@z Thu Jan 1 00:00:00 1970
From: "naohirot@fujitsu.com"
To: Wilco Dijkstra
CC: 'GNU C Library', Szabolcs Nagy
Subject: RE: [PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX
Date: Fri, 23 Apr 2021 13:22:48 +0000
List-Id: Libc-alpha mailing list

Hi Wilco-san,

Let me re-evaluate the loop unrolling/software pipelining of L(vl_agnostic)
for sizes from 512B to 4MB, using the latest source code [2]. All graphs are
in [3]. The earlier evaluation was reported in mail [1], but the full set of
graphs was not provided there.

[1] https://sourceware.org/pipermail/libc-alpha/2021-April/125002.html
[2] https://github.com/NaohiroTamura/glibc/commit/cbcb80e69325c16c6697c42627a6ca12c3245a86
[3] https://docs.google.com/spreadsheets/d/1leFhCAirelDezb0OFC7cr7v4uMUMveaN1iAxL410D2c/edit?usp=sharing

> From: Wilco Dijkstra

> Yes that is a good idea - you could also check whether the software pipelining
> actually helps on an OoO core (it shouldn't) since that contributes a lot to the
> complexity and the amount of code and unrolling required.

I compared the unroll levels by commenting out the labels above the target
label. For example, if the target label is L(unroll4) of memset, L(unroll32)
and L(unroll8) are commented out, and L(unroll4), L(unroll2), and L(unroll1)
are executed.
For memcpy/memmove, I compared L(unroll8), L(unroll4), L(unroll2), and
L(unroll1).
For memset, I compared L(unroll32), L(unroll8), L(unroll4), L(unroll2), and
L(unroll1).

The result was that 8x unrolling/pipelining for memcpy/memmove and 32x
unrolling/pipelining for memset are still effective for sizes between 512B
and 64KB, as shown in the graphs in the Google Sheet [3].
In conclusion, it seems the loop unrolling/software pipelining technique
still works on A64FX. I believe this may be a peculiar characteristic of
A64FX.

Thanks.
Naohiro
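
For readers unfamiliar with the cascade being measured, the experiment
described above (disabling the upper unroll levels so that execution falls
through to the lower ones) can be pictured roughly as follows in C. This is
only a sketch under stated assumptions and not the glibc code, which is SVE
assembly: the function name copy_cascade, the max_unroll parameter, and the
64-byte BLOCK size are hypothetical, and memcpy stands in for the unrolled
vector loads and stores. It models which levels run, not software pipelining
or performance.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the bytes moved by one vector load/store pair. */
enum { BLOCK = 64 };

/*
 * Copy n bytes using unroll levels 8, 4, 2, 1 in that order.
 * Levels above max_unroll are skipped, which mimics commenting out the
 * upper L(unrollN) labels so that only the lower levels execute.
 */
static void *copy_cascade(void *dst, const void *src, size_t n,
                          unsigned max_unroll)
{
    static const unsigned levels[] = { 8, 4, 2, 1 };
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);

    for (size_t i = 0; i < sizeof levels / sizeof levels[0]; i++) {
        size_t step = (size_t)levels[i] * BLOCK;

        if (levels[i] > max_unroll)
            continue;               /* this level is "commented out" */

        while (n >= step) {
            /* Stand-in for levels[i] back-to-back vector copies. */
            memcpy(d, s, step);
            d += step;
            s += step;
            n -= step;
        }
    }

    memcpy(d, s, n);                /* tail shorter than one BLOCK */
    return dst;
}
```

Calling copy_cascade(dst, src, n, 4) corresponds to the case where the
8x level is disabled and only the 4x, 2x, and 1x levels run. The result is
the same for every max_unroll value; only the mix of loop iterations
changes, which is what the timing comparison isolates.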