From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 91C443858039; Thu, 7 Oct 2021 12:39:28 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 91C443858039
From: "jon@solid-run.com"
To: glibc-bugs@sourceware.org
Subject: [Bug libc/28432] New: Aarch64 memcpy used on device-memory
Date: Thu, 07 Oct 2021 12:39:28 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: libc
X-Bugzilla-Version: 2.32
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jon@solid-run.com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter cc target_milestone attachments.created
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: glibc-bugs@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Glibc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 07 Oct 2021 12:39:28 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=28432

            Bug ID: 28432
           Summary: Aarch64 memcpy used on device-memory
           Product: glibc
           Version: 2.32
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: jon@solid-run.com
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

Created attachment 13713
  --> https://sourceware.org/bugzilla/attachment.cgi?id=13713&action=edit
patch to memcpy memmove for better device-memory compatibility

It was first reported 4 years ago with the Macchiatobin writing to a memory mapped framebuffer of
a PCIe device. The error was narrowed down to overlapping stp instructions causing device memory to be zeroed out or not written at all.

There were many discussions about whether it is valid to use the mem* functions on device memory mapped as uncached / write-combined. Recently I tracked down a rendering problem on the HoneyComb LX2K to a similar failure. Since the only similarity between the 3 SoCs is the Cortex-A72 cores (they all have different combinations of CCNs and PCIe IP), I started looking a bit more into possible causes. I came across this documentation on how the Cortex-A72 performs ACE transfers:
https://developer.arm.com/documentation/100095/0001/way1381846851421

Because I had already narrowed the failure down to unaligned memcpy calls of 97-110 bytes, I realized that it was always the last 2 stps of the copy96 routine. Since the ordering should not matter, I moved the backwards copy to happen first. From what I understand of the document above, this allows the 4 forward-progressing stps to be sent as a single 4x128-bit WRAP write.

This fixes the issue I was trying to solve, both in my specific test case and in a few real-world rendering failures. Since we are only reordering stps, there should be no functional or performance regressions, and all the glibc tests pass.
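
As a rough illustration of the reordering described above, the following C sketch models a medium-sized copy built from 16-byte chunks, where the two end-relative (overlapping) stores are issued before the four forward stores. It is not the attached patch and not glibc's assembly: the function name copy_tail_first, the chunk16 type, the load16/store16 helpers, and the 65..96 byte size range are all assumptions picked to keep the example short. A C compiler is also free to reorder or merge these stores, so the real change has to be made in the assembly routine itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 16-byte unit standing in for the register pair
   written by one stp. */
typedef struct { unsigned char b[16]; } chunk16;

static chunk16 load16(const unsigned char *p)
{
    chunk16 c;
    memcpy(c.b, p, sizeof c.b);
    return c;
}

static void store16(unsigned char *p, chunk16 c)
{
    memcpy(p, c.b, sizeof c.b);
}

/* Hypothetical helper: copy n bytes, 64 < n <= 96 (assumed range). */
void copy_tail_first(void *dst, const void *src, size_t n)
{
    assert(n > 64 && n <= 96);

    const unsigned char *s = src;
    unsigned char *d = dst;

    /* Load everything before storing anything. */
    chunk16 f0 = load16(s);
    chunk16 f1 = load16(s + 16);
    chunk16 f2 = load16(s + 32);
    chunk16 f3 = load16(s + 48);
    chunk16 t0 = load16(s + n - 32);
    chunk16 t1 = load16(s + n - 16);

    /* End-relative stores first. They may overlap bytes 32..63,
       which the forward stores below rewrite with the same data. */
    store16(d + n - 32, t0);
    store16(d + n - 16, t1);

    /* Four forward stores, now issued back to back. */
    store16(d, f0);
    store16(d + 16, f1);
    store16(d + 32, f2);
    store16(d + 48, f3);
}
```

Because every overlapping byte is written with the same value by both the tail and the forward stores, the order of the two groups does not change the final contents of ordinary memory; the sketch only shows why swapping them is functionally neutral there.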