public inbox for glibc-bugs@sourceware.org
* [Bug libc/28432] New: Aarch64 memcpy used on device-memory
@ 2021-10-07 12:39 jon@solid-run.com
  2021-10-07 12:59 ` [Bug libc/28432] " adhemerval.zanella at linaro dot org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: jon@solid-run.com @ 2021-10-07 12:39 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=28432

            Bug ID: 28432
           Summary: Aarch64 memcpy used on device-memory
           Product: glibc
           Version: 2.32
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: jon@solid-run.com
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

Created attachment 13713
  --> https://sourceware.org/bugzilla/attachment.cgi?id=13713&action=edit
patch to memcpy memmove for better device-memory compatibility

The problem was first reported four years ago on the Macchiatobin, when writing
to a memory-mapped framebuffer of a PCIe device. The failure was narrowed down
to overlapping stps, which caused device memory to be zeroed out or not written
at all. There were many discussions about whether it is valid to use mem*
functions on device memory mapped as uncached / write-combined. Recently I
tracked down a rendering problem on the HoneyComb LX2K to a similar failure.
Since the only thing the three SoCs have in common is the Cortex-A72 cores
(they all have different combinations of CCNs and PCIe IP), I started looking
more closely at possible causes. I came across this documentation on how the
Cortex-A72 performs ACE transfers:
https://developer.arm.com/documentation/100095/0001/way1381846851421

Because I had already narrowed the failure down to unaligned memcpy calls of
97-110 bytes, I realized that it was always the last two stps of the copy96
routine. Since the order of the stores should not matter, I instead moved the
backwards copy to happen first. From my reading of the document above, this
should allow the four forward-progressing stps to be sent as a single
4x128-bit WRAP write.

This fixes the issue both in my specific test case and in a few real-world
rendering failures. Since we are only reordering stps, there should be no
functional or performance regressions, and all the glibc tests pass.




Thread overview: 10+ messages
2021-10-07 12:39 [Bug libc/28432] New: Aarch64 memcpy used on device-memory jon@solid-run.com
2021-10-07 12:59 ` [Bug libc/28432] " adhemerval.zanella at linaro dot org
2021-10-13 12:36 ` wdijkstr at arm dot com
2021-10-13 16:36 ` jon@solid-run.com
2021-10-13 18:28 ` nsz at gcc dot gnu.org
2021-10-14  4:15 ` jon@solid-run.com
2021-10-14  8:53 ` david at qore dot org
2021-10-25 16:46 ` nsz at gcc dot gnu.org
2021-10-26  7:01 ` jon@solid-run.com
2021-11-29 16:18 ` sam at gentoo dot org
