From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id DB9933861826 for ; Thu, 1 Feb 2024 17:32:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DB9933861826 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org DB9933861826 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706808745; cv=none; b=k2TwypnmKob3maxLKhLeNl5jfCMfkhfZq44oq4lfaJfZvIkQK7jUH0D/VSjv3xtTS2HQdtXrYXABNdxtg8YqIoWKYyWYS3+kmk2leF5GtsQ6KpApfJN0Cwauen/UiH8YY7OMGIWXg+ezyZIlpBmSW+1D4npqEQQ/DDtOkqPBMF4= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706808745; c=relaxed/simple; bh=+YIDYzw0FIW4GGwfBu65exugDxxYH2QaMVP3lSez69Y=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=agG/a+5AMR1ztB+fZzKEP9V+jwMum7zuHlUZLcRHiUdiie2ibvE6A9DlDI9P8Ngb/pBSiRWt4oAiJpqirvgWilxSI7bKY/vv2Mpm/afOTycWr0Z9UIbYXKSHPLcjnWa8oGbyfXeTwQPeGQlb9nCdUI8k5nDw/yxo1F1+5nl4v78= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3B24FDA7; Thu, 1 Feb 2024 09:33:06 -0800 (PST) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C58AD3F738; Thu, 1 Feb 2024 09:32:22 -0800 (PST) From: Richard Sandiford To: Wilco Dijkstra Mail-Followup-To: Wilco Dijkstra ,Richard Earnshaw , Kyrylo Tkachov , GCC Patches , Richard Earnshaw , richard.sandiford@arm.com Cc: Richard Earnshaw , Kyrylo Tkachov , GCC Patches , Richard Earnshaw Subject: Re: [PATCH v4] AArch64: Cleanup memset expansion References: <372b9689-24b5-41f4-a990-5aee0226e15f@foss.arm.com> <61c6e268-188c-4b35-956d-bd8927d705f2@foss.arm.com> Date: Thu, 01 Feb 2024 17:32:21 +0000 In-Reply-To: (Wilco Dijkstra's message of "Tue, 30 Jan 2024 15:51:18 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-15.1 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Wilco Dijkstra writes: > Hi Richard, > >>> That tune is only used by an obsolete core. I ran the memcpy and memset >>> benchmarks from Optimized Routines on xgene-1 with and without LDP/STP. >>> There is no measurable penalty for using LDP/STP. I'm not sure why it was >>> ever added given it does not do anything useful. I'll post a separate patch >>> to remove it to reduce the maintenance overhead. > > Patch: https://gcc.gnu.org/pipermail/gcc-patches/2024-January/644442.html > >> Is that enough to justify removing it though? It sounds from: >> >> https://gcc.gnu.org/pipermail/gcc-patches/2018-June/500017.html >> >> like the problem was in more balanced code, rather than memory-limited >> things like memset/memcpy. >> >> But yeah, I'm not sure if the intuition was supported by numbers >> in the end. If SPEC also shows no change then we can probably drop it >> (unless someone objects). > > SPECINT didn't show any difference either, so LDP doesn't have a measurable > penalty. It doesn't look like the original commit was ever backed up by benchmarks... > >> Let's leave this patch until that's resolved though, since I think as it >> stands the patch does leave -Os -mtune=xgene1 worse off (bigger code). >> Handling the tune in the meantime would also be OK. > > Note it was incorrectly handling -Os, it should still form LDP in that case > and take advantage of longer and faster inlined memcpy/memset instead of > calling a library function. Yeah. FWIW, I'd made the same point earlier in the review. Now we have the LDP/STP policy tuning instead, but since the block memory routines don't adjust for that yet, it's probably easier to handle any such adjustment as a follow-on (for anyone who wants to do it). >> /* Default the maximum to 256-bytes when considering only libcall vs >> SIMD broadcast sequence. */ > >> ...this comment should be deleted along with the code it's describing. >> Don't respin just for that though :) > > I've fixed that locally. Thanks. The patch is OK for GCC 15 if there are no objections to the AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS patch. Richard