From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id 37F57385840D; Thu, 4 Apr 2024 10:36:39 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 37F57385840D
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1712226999;
	bh=CtZuBKW3uSY9jIXND/PCuqJzFKxpNfdUcAeS+0apuJM=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=Ei+Ro60giyIVHCpyPZeOA3LW+4zvjYHTp+R2cSKy2lF7R4gn/2B1yEi2fYYjIW1sb
	 s9joXjJ1GqZaveUYRUDPzVvbtPIy8zLbHqu23L2zFUHf3rvi9GzCGb4MSPNAHzaqec
	 4QHDtwhSDp0g58tSyXpJ4+4rZZbYZ1wK7yqAKCwo=
From: "cvs-commit at gcc dot gnu.org"
To: glibc-bugs@sourceware.org
Subject: [Bug string/30994] REP MOVSB performance suffers from page aliasing on Zen 4
Date: Thu, 04 Apr 2024 10:36:35 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: string
X-Bugzilla-Version: 2.38
X-Bugzilla-Keywords:
X-Bugzilla-Severity: minor
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: security-
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://sourceware.org/bugzilla/show_bug.cgi?id=30994

--- Comment #14 from Sourceware Commits ---
The release/2.39/master branch has been updated by Arjun Shankar:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=aa4249266e9906c4bc833e4847f4d8feef59504f

commit aa4249266e9906c4bc833e4847f4d8feef59504f
Author: Adhemerval Zanella
Date:   Thu Feb 8 10:08:38 2024 -0300

    x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)

    The use of REP MOVSB in memcpy/memmove does not show much performance
    improvement on Zen3/Zen4 cores compared to the vectorized loops. Also,
    as reported in BZ 30994, if the source is aligned and the destination
    is not, the performance can be 20x slower.

    The performance difference is most noticeable with small buffer sizes,
    close to the lower bound at which memcpy/memmove starts to use ERMS.
    The performance of REP MOVSB is similar to that of the vectorized
    instructions at the size limit (the L2 cache). Also, there is no
    drawback to multiple cores sharing the cache.

    Checked on x86_64-linux-gnu on Zen3.
    Reviewed-by: H.J. Lu

    (cherry picked from commit 0c0d39fe4aeb0f69b26e76337c5dfd5530d5d44e)
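
The commit message describes a size window in which memcpy/memmove uses ERMS (REP MOVSB): it starts at a lower bound and ends at a size limit tied to the L2 cache. The following is a minimal sketch of that kind of size-based selection, not the glibc implementation and not the patch itself. The names copy_strategy, copy_tunables, choose_copy_strategy and disable_erms_window are hypothetical, and the idea of closing the window by raising the lower bound to the upper bound is an assumption made for illustration; the actual change may work differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strategy names, for illustration only; these are not
   glibc identifiers. */
enum copy_strategy {
    COPY_VECTOR_LOOP,   /* vectorized load/store loop */
    COPY_REP_MOVSB      /* ERMS: a single REP MOVSB */
};

/* Hypothetical tunables: REP MOVSB is chosen for lengths in the
   half-open range [rep_movsb_threshold, rep_movsb_stop_threshold). */
struct copy_tunables {
    size_t rep_movsb_threshold;       /* lower bound of the ERMS window */
    size_t rep_movsb_stop_threshold;  /* upper bound, e.g. L2 cache size */
};

/* Pick a strategy purely from the copy length. */
static enum copy_strategy
choose_copy_strategy(size_t len, const struct copy_tunables *t)
{
    assert(t != NULL);
    if (len >= t->rep_movsb_threshold && len < t->rep_movsb_stop_threshold)
        return COPY_REP_MOVSB;
    return COPY_VECTOR_LOOP;
}

/* Assumed policy for cores where REP MOVSB brings little benefit:
   raise the lower bound to the upper bound so the window is empty and
   every length falls through to the vectorized loop. */
static struct copy_tunables
disable_erms_window(struct copy_tunables t)
{
    t.rep_movsb_threshold = t.rep_movsb_stop_threshold;
    return t;
}
```

With illustrative bounds of 2048 bytes and 1 MiB (values chosen for the example, not taken from the commit), a 2048-byte copy selects COPY_REP_MOVSB; after disable_erms_window() the same copy selects COPY_VECTOR_LOOP, which also sidesteps the slow aligned-source, misaligned-destination case near the lower bound.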