public inbox for libc-stable@sourceware.org
From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: "libc-stable@sourceware.org" <libc-stable@sourceware.org>
Cc: nd <nd@arm.com>
Subject: [2.26 COMMITTED][AArch64] Backport strcmp improvements
Date: Tue, 01 Jan 2019 00:00:00 -0000
Message-ID: <VI1PR0801MB2127BD7F5638E4119177FB2883BA0@VI1PR0801MB2127.eurprd08.prod.outlook.com>

commit 01de24dbca4374665fb2a439be39c05427c0a24a
Author: Siddhesh Poyarekar <siddhesh@sourceware.org>
Date:   Thu Feb 22 23:48:13 2018 +0530

    aarch64/strcmp: fix misaligned loop jump target
    
    I accidentally set the loop jump-back label to misaligned8 instead of
    do_misaligned.  The typo is harmless, but every trip around the byte
    loop needlessly re-executes the two instructions at misaligned8.
    
        * sysdeps/aarch64/strcmp.S (do_misaligned): Jump back to
        do_misaligned, not misaligned8.
    
    (cherry picked from commit 6ca24c43481e2c93a6eec362b04c3e77a35b28e3)
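
To make the fix concrete, here is a hypothetical C model of the byte-at-a-time prologue; the C rendering and the name compare_until_aligned are illustrative, not glibc code. It follows the final control flow in the diff: the alignment test at L(misaligned8) runs once on entry, and the back-edge lands on L(do_misaligned), so each iteration does a single tst/b.ne. The cmp/ccmp pair is modelled as one condition: `cmp data1w, #1` sets the carry flag only when the SRC1 byte is non-zero, and `ccmp ..., #0, cs` compares the two bytes only in that case, otherwise forcing NZCV to 0b0000 (Z clear). A single b.ne therefore catches both a NUL and a mismatch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of L(misaligned8)/L(do_misaligned).
   Returns 1 and stores the byte difference in *result if a NUL or a
   mismatch is found before SRC1 becomes 8-byte aligned; otherwise returns 0
   with both pointers advanced past the bytes already compared. */
static int compare_until_aligned(const unsigned char **src1,
                                 const unsigned char **src2, int *result)
{
    /* L(misaligned8): tst src1, #7; b.eq L(loop_misaligned) */
    if (((uintptr_t)*src1 & 7) == 0)
        return 0;

    do {                                    /* L(do_misaligned) */
        unsigned c1 = *(*src1)++;           /* ldrb data1w, [src1], #1 */
        unsigned c2 = *(*src2)++;           /* ldrb data2w, [src2], #1 */
        /* cmp/ccmp: flags read "equal" only when c1 != 0 && c1 == c2. */
        if (c1 == 0 || c1 != c2) {
            *result = (int)c1 - (int)c2;    /* L(done): sub result, ... */
            return 1;
        }
        /* Fixed back-edge: re-test alignment and branch straight to the
           loop body, skipping the entry check at L(misaligned8). */
    } while (((uintptr_t)*src1 & 7) != 0);  /* tst src1, #7; b.ne */

    return 0;
}
```

If the loop returns 0, SRC1 is 8-byte aligned and the caller continues with the dword loop.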

commit 4e75091d6ce3f7ac8b1750ca6135bc37d6707caf
Author: Siddhesh Poyarekar <siddhesh@sourceware.org>
Date:   Wed Dec 13 18:50:27 2017 +0530

    aarch64: Improve strcmp unaligned performance
    
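    Replace the simple byte-wise compare in the misaligned case with a
    dword compare, with page boundary checks in place.  For simplicity I've
    chosen a 4K page boundary so that we don't have to query the actual
    page size on the system; since 4K is the smallest page size on AArch64,
    every larger page boundary is also a 4K boundary, so the check is safe
    regardless of the configured page size.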
    
    This makes the unaligned case up to 3x faster on falkor and about
    2.5x faster on mustang, as measured with bench-strcmp.
    
        * sysdeps/aarch64/strcmp.S (misaligned8): Compare dword at a
        time whenever possible.
    
    (cherry picked from commit 2bce01ebbaf8db52ba4a5635eb5744f989cdbf69)
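
The page check in L(loop_misaligned) can be modelled in C as below. This is a sketch that assumes, as the commit message argues, that only 4 KiB granularity matters: a load can fault only where it runs into a different page, and every larger AArch64 page boundary is also a 4 KiB boundary. The helper names are hypothetical. The second function shows that the check is slightly conservative: it also diverts page offset 0xff8, where an 8-byte load would still fit in the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors "and tmp1, src2, #0xff8; eor tmp1, tmp1, #0xff8; cbz tmp1, ...":
   true when ADDR lies in the last 8 bytes of a 4 KiB page (page offset
   0xff8..0xfff), where the code falls back to byte compares. */
static bool near_4k_page_end(uintptr_t addr)
{
    return ((addr & 0xff8) ^ 0xff8) == 0;
}

/* True if an 8-byte load starting at ADDR would span two 4 KiB pages. */
static bool load8_crosses_4k_page(uintptr_t addr)
{
    return (addr & 0xfff) > 0x1000 - 8;
}
```

Every address for which load8_crosses_4k_page is true is also caught by near_4k_page_end, so the dword loop never issues a page-crossing load from SRC2.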


diff --git a/ChangeLog b/ChangeLog
index 18a01ed..29f9e1b 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,15 @@
 2019-09-06  Siddhesh Poyarekar  <siddhesh@sourceware.org>
 
+       * sysdeps/aarch64/strcmp.S (do_misaligned): Jump back to
+       do_misaligned, not misaligned8.
+
+2019-09-06  Siddhesh Poyarekar  <siddhesh@sourceware.org>
+
+       * sysdeps/aarch64/strcmp.S (misaligned8): Compare dword at a
+       time whenever possible.
+
+2019-09-06  Siddhesh Poyarekar  <siddhesh@sourceware.org>
+
        * sysdeps/aarch64/memcmp.S (more16): Fix loop16 branch target.
 
        * sysdeps/aarch64/memcmp.S: Widen comparison to 16 bytes at a
diff --git a/sysdeps/aarch64/strcmp.S b/sysdeps/aarch64/strcmp.S
index e99d662..7eed82c 100644
--- a/sysdeps/aarch64/strcmp.S
+++ b/sysdeps/aarch64/strcmp.S
@@ -72,6 +72,7 @@ L(start_realigned):
        cbz     syndrome, L(loop_aligned)
        /* End of performance-critical section  -- one 64B cache line.  */
 
+L(end):
 #ifndef        __AARCH64EB__
        rev     syndrome, syndrome
        rev     data1, data1
@@ -145,12 +146,38 @@ L(mutual_align):
        b       L(start_realigned)
 
 L(misaligned8):
-       /* We can do better than this.  */
+       /* Align SRC1 to 8 bytes and then compare 8 bytes at a time, always
+          checking to make sure that we don't access beyond a page boundary
+          in SRC2.  */
+       tst     src1, #7
+       b.eq    L(loop_misaligned)
+L(do_misaligned):
        ldrb    data1w, [src1], #1
        ldrb    data2w, [src2], #1
        cmp     data1w, #1
        ccmp    data1w, data2w, #0, cs  /* NZCV = 0b0000.  */
-       b.eq    L(misaligned8)
+       b.ne    L(done)
+       tst     src1, #7
+       b.ne    L(do_misaligned)
+
+L(loop_misaligned):
+       /* Test if we are within the last dword of a 4K page.  If yes, jump
+          back to the misaligned loop to compare a byte at a time.  */
+       and     tmp1, src2, #0xff8
+       eor     tmp1, tmp1, #0xff8
+       cbz     tmp1, L(do_misaligned)
+       ldr     data1, [src1], #8
+       ldr     data2, [src2], #8
+
+       sub     tmp1, data1, zeroones
+       orr     tmp2, data1, #REP8_7f
+       eor     diff, data1, data2      /* Non-zero if differences found.  */
+       bic     has_nul, tmp1, tmp2     /* Non-zero if NUL terminator.  */
+       orr     syndrome, diff, has_nul
+       cbz     syndrome, L(loop_misaligned)
+       b       L(end)
+
+L(done):
        sub     result, data1, data2
        RET
 END(strcmp)
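
The dword test in L(loop_misaligned) uses the classic "has a zero byte" bit trick. The C sketch below reproduces it, assuming (as its use here suggests) that zeroones holds REP8_01 = 0x0101010101010101, and it assembles little-endian words explicitly so it behaves the same on any host. The code after L(end), only partly visible in this hunk, byte-reverses the words with rev on little-endian so that the first string byte becomes the most significant, and presumably then locates the first flagged byte; the hypothetical strcmp_result below reaches that byte by scanning the syndrome from its low end instead. The function names are illustrative, not glibc's.

```c
#include <assert.h>
#include <stdint.h>

#define REP8_01 0x0101010101010101ULL
#define REP8_7f 0x7f7f7f7f7f7f7f7fULL

/* Assemble 8 bytes with the first byte in the least significant position,
   i.e. what ldr produces on little-endian AArch64. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Same operations as the body of L(loop_misaligned). */
static uint64_t strcmp_syndrome(uint64_t data1, uint64_t data2)
{
    uint64_t has_nul = (data1 - REP8_01) & ~(data1 | REP8_7f); /* sub, orr, bic */
    uint64_t diff = data1 ^ data2;                             /* eor */
    return diff | has_nul;                                     /* orr */
}

/* For a non-zero little-endian syndrome, return the difference of the first
   (lowest-addressed) flagged byte pair.  The lowest flagged byte is always
   genuine; borrows can only set spurious NUL flags above a real NUL. */
static int strcmp_result(uint64_t data1, uint64_t data2, uint64_t syndrome)
{
    unsigned shift = 0;
    while (((syndrome >> shift) & 0xff) == 0)
        shift += 8;
    return (int)((data1 >> shift) & 0xff) - (int)((data2 >> shift) & 0xff);
}
```

A zero syndrome means all eight bytes match and none is NUL, which is exactly when the cbz branches back to L(loop_misaligned).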
