From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org
Subject: [PATCH v3 0/7] Use introsort for qsort
Date: Fri, 3 Sep 2021 14:11:37 -0300 [thread overview]
Message-ID: <20210903171144.952737-1-adhemerval.zanella@linaro.org> (raw)
This is respin from my previous attempt to fix and improve qsort() [1].
The main motivation to use introsort is to make it fully AS-Safe and
AC-Safe, with a limited stack size requirement, to remove the use
of malloc (along with the system memory checks), and keeping Worst-case
performance on O(n*ln(n)) (instead of potentially quadradic as for the
quicksort).
The implementation does not aim to be the state-of-the-art sort
algorithm, rather I used a well understood and simple on ((introsort,
used on libstdc++) and leverage the current quicksort implementation
along with a heapsort one from Linux kernel.
The resulting implementation has bounded stack requirement (about 1.8k
on x86_64), and results in an simplified code and binary:
# master
$ size stdlib/qsort.o stdlib/msort.o
text data bss dec hex filename
1316 0 0 1316 524 stdlib/qsort.o
2045 0 12 2057 809 stdlib/msort.o
# patched
$ size stdlib/qsort.os
text data bss dec hex filename
2373 0 0 2373 945 stdlib/qsort.os
The performance difference with the provided benchmark clearly show the
tradeoff of using a instrosort over a mergesort. The results are on
a Ryzen 9 with gcc 11 ran with --nelem 49152:786432:8388608 (to simulate
the L1/L2/L3 of this CPU used):
SORTED
elements= 49152 (size= 4): %52.47% (master=3654381.0 patches=5571774.0)
elements= 786432 (size= 4): %57.39% (master=73067211.0 patches=115003964.0)
elements= 8388608 (size= 4): %49.71% (master=1005357374.0 patches=1505144332.0)
elements= 49152 (size= 8): %48.98% (master=3796489.0 patches=5656023.0)
elements= 786432 (size= 8): %52.34% (master=75809561.0 patches=115491584.0)
elements= 8388608 (size= 8): %49.97% (master=1004201712.0 patches=1506017512.0)
elements= 49152 (size=32): %-2.69% (master=11191649.0 patches=10891084.0)
elements= 786432 (size=32): %-12.16% (master=238870746.0 patches=209822358.0)
elements= 8388608 (size=32): %-23.33% (master=3323248933.0 patches=2547848399.0)
MOSTLYSORTED
elements= 49152 (size= 4): %66.47% (master=6915979.0 patches=11512945.0)
elements= 786432 (size= 4): %66.48% (master=143188170.0 patches=238379284.0)
elements= 8388608 (size= 4): %70.90% (master=1805703842.0 patches=3086011963.0)
elements= 49152 (size= 8): %78.11% (master=6836231.0 patches=12176330.0)
elements= 786432 (size= 8): %77.79% (master=138851879.0 patches=246867806.0)
elements= 8388608 (size= 8): %77.70% (master=1773884617.0 patches=3152112348.0)
elements= 49152 (size=32): %-6.44% (master=13766992.0 patches=12880888.0)
elements= 786432 (size=32): %-15.19% (master=287590899.0 patches=243919593.0)
elements= 8388608 (size=32): %-25.63% (master=3852042869.0 patches=2864641676.0)
RANDOM
elements= 49152 (size= 4): %9.54% (master=15078260.0 patches=16516457.0)
elements= 786432 (size= 4): %5.64% (master=305805587.0 patches=323043450.0)
elements= 8388608 (size= 4): %2.78% (master=3838340458.0 patches=3945202718.0)
elements= 49152 (size= 8): %13.75% (master=14676586.0 patches=16694499.0)
elements= 786432 (size= 8): %10.71% (master=295583560.0 patches=327252646.0)
elements= 8388608 (size= 8): %7.67% (master=3722946692.0 patches=4008563091.0)
elements= 49152 (size=32): %-8.34% (master=20944462.0 patches=19196696.0)
elements= 786432 (size=32): %-14.31% (master=425822838.0 patches=364896701.0)
elements= 8388608 (size=32): %-19.86% (master=5534217212.0 patches=4435276362.0)
REPEATED
elements= 49152 (size= 4): %9.18% (master=14051610.0 patches=15341610.0)
elements= 786432 (size= 4): %6.01% (master=282787457.0 patches=299777700.0)
elements= 8388608 (size= 4): %4.19% (master=3539279291.0 patches=3687517777.0)
elements= 49152 (size= 8): %12.37% (master=13659340.0 patches=15349336.0)
elements= 786432 (size= 8): %11.11% (master=273070595.0 patches=303413116.0)
elements= 8388608 (size= 8): %8.46% (master=3424739041.0 patches=3714521094.0)
elements= 49152 (size=32): %-11.21% (master=19826230.0 patches=17603795.0)
elements= 786432 (size=32): %-16.73% (master=405322605.0 patches=337501658.0)
elements= 8388608 (size=32): %-21.93% (master=5261140329.0 patches=4107591456.0)
BITONIC
elements= 49152 (size= 4): %405.43% (master=4632011.0 patches=23411611.0)
elements= 786432 (size= 4): %493.28% (master=92649410.0 patches=549674361.0)
elements= 8388608 (size= 4): %609.03% (master=1095591416.0 patches=7768102251.0)
elements= 49152 (size= 8): %402.05% (master=4672695.0 patches=23459484.0)
elements= 786432 (size= 8): %480.60% (master=94419035.0 patches=548197066.0)
elements= 8388608 (size= 8): %596.41% (master=1116631167.0 patches=7776333333.0)
elements= 49152 (size=32): %-6.92% (master=11545720.0 patches=10747100.0)
elements= 786432 (size=32): %-14.97% (master=245037969.0 patches=208357373.0)
elements= 8388608 (size=32): %-24.14% (master=3357712042.0 patches=2547221807.0)
With the bitonic and sorted showing the biggest hit. Even though, I
think the tradeoff is acceptable. The improvements over sizes larger
than 8 is mostly due the optimization done on swap operations.
[1] https://sourceware.org/pipermail/libc-alpha/2018-August/096981.html
Adhemerval Zanella (7):
benchtests: Add bench-qsort
support: Fix getopt_long with CMDLINE_OPTIONS
stdlib: Optimization qsort{_r} swap implementation (BZ #19305)
stdlib: Move insertion sort out qsort
stdlib: qsort: Move some macros to inline function
stdlib: Implement introsort with qsort
stdlib: Remove use of mergesort on qsort (BZ #21719)
benchtests/Makefile | 2 +-
benchtests/bench-qsort.c | 195 +++++++++++++++++++++
manual/argp.texi | 2 +-
manual/locale.texi | 3 +-
manual/search.texi | 7 +-
stdlib/Makefile | 3 +-
stdlib/msort.c | 310 ---------------------------------
stdlib/qsort.c | 333 ++++++++++++++++++++++++++++--------
support/support_test_main.c | 1 -
support/test-driver.h | 1 +
10 files changed, 460 insertions(+), 397 deletions(-)
create mode 100644 benchtests/bench-qsort.c
delete mode 100644 stdlib/msort.c
--
2.30.2
next reply other threads:[~2021-09-03 17:11 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-03 17:11 Adhemerval Zanella [this message]
2021-09-03 17:11 ` [PATCH v3 1/7] benchtests: Add bench-qsort Adhemerval Zanella
2021-09-04 9:09 ` Alexander Monakov
2021-09-06 18:30 ` Adhemerval Zanella
2021-10-13 3:19 ` Noah Goldstein
2021-10-15 12:52 ` Adhemerval Zanella
2021-10-15 16:39 ` Noah Goldstein
2021-10-15 17:19 ` Adhemerval Zanella
2021-09-03 17:11 ` [PATCH v3 2/7] support: Fix getopt_long with CMDLINE_OPTIONS Adhemerval Zanella
2021-09-03 17:11 ` [PATCH v3 3/7] stdlib: Optimization qsort{_r} swap implementation (BZ #19305) Adhemerval Zanella
2021-10-13 3:29 ` Noah Goldstein
2021-10-13 3:39 ` Noah Goldstein
2021-10-15 13:29 ` Adhemerval Zanella
2021-10-15 17:17 ` Noah Goldstein
2021-10-15 17:34 ` Adhemerval Zanella
2021-10-15 17:45 ` Noah Goldstein
2021-10-15 17:56 ` Adhemerval Zanella
2021-10-15 13:12 ` Adhemerval Zanella
2021-10-15 16:45 ` Noah Goldstein
2021-10-15 17:21 ` Adhemerval Zanella
2021-09-03 17:11 ` [PATCH v3 4/7] stdlib: Move insertion sort out qsort Adhemerval Zanella
2021-09-06 20:35 ` Fangrui Song
2021-09-06 20:48 ` Fangrui Song
2021-09-03 17:11 ` [PATCH v3 5/7] stdlib: qsort: Move some macros to inline function Adhemerval Zanella
2021-09-03 17:11 ` [PATCH v3 6/7] stdlib: Implement introsort with qsort Adhemerval Zanella
2021-09-04 9:17 ` Alexander Monakov
2021-09-06 18:43 ` Adhemerval Zanella
2021-09-06 20:23 ` Fangrui Song
2021-10-13 3:53 ` Noah Goldstein
2021-09-03 17:11 ` [PATCH v3 7/7] stdlib: Remove use of mergesort on qsort (BZ #21719) Adhemerval Zanella
2021-09-03 19:18 ` [PATCH v3 0/7] Use introsort for qsort Paul Eggert
2021-09-06 14:13 ` Carlos O'Donell
2021-09-06 17:03 ` Zack Weinberg
2021-09-06 18:19 ` Adhemerval Zanella
2021-09-07 0:14 ` Paul Eggert
2021-09-07 14:32 ` Adhemerval Zanella
2021-09-07 17:39 ` Paul Eggert
2021-09-07 18:07 ` Adhemerval Zanella
2021-09-07 19:28 ` Paul Eggert
2021-09-08 11:56 ` Adhemerval Zanella
2021-09-09 0:39 ` Paul Eggert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210903171144.952737-1-adhemerval.zanella@linaro.org \
--to=adhemerval.zanella@linaro.org \
--cc=libc-alpha@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).