Hi! On Sun, May 06, 2007 at 06:56:39AM +0100, belazougui djamel wrote: > I am writing again about glibc qsort algorithms and its performance. I > don't know exactly what to do to attract attention on this subject, but > speed of GNU qsort could greatly improved. At least in my experiments, > i can get quite large improvements. One of the reasons that could make > qsort so slow is the use of memcpy function in the inner loop. this is > inefficient because memcpy is slow when copied buffers are of variable > lengths.One of the solutions i have experimented was to use 4 different > sort functions, each one implemented for a class of operands lengths: > 1-sort function for operands of size 4. > 2-1-sort function for operands of size 8. > 3-sort function for operands of size 12,20,28,,8n+4. > 4-sort function for operands of size 16,24,32,,8n. > > for each of those functions i wrote my own memory copy function > optimized for size class and this made the algorithm run faster. Another > solution that made even larger speedup is to use indirect merge sorting. Here is a patch which implements that. For code size reasons it actually doesn't use different functions, as that made msort.os much bigger, just a switch. Below are some benchmark numbers from x86_64-linux, system glibc doesn't have that patch, while libc build tree in cwd does. ~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 2048 Strip out best and worst realtime result minimum: 0.286614861 sec real / 0.000026731 sec CPU maximum: 0.296511410 sec real / 0.000049066 sec CPU average: 0.292100834 sec real / 0.000042651 sec CPU stdev : 0.001431638 sec real / 0.000003603 sec CPU ~/timing stdlib/tst-qsort2 100000 2048 Strip out best and worst realtime result minimum: 2.032786138 sec real / 0.000042467 sec CPU maximum: 2.039871554 sec real / 0.000055063 sec CPU average: 2.037650687 sec real / 0.000046376 sec CPU stdev : 0.001668754 sec real / 0.000001625 sec CPU ~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 256 Strip out best and worst realtime result minimum: 0.079751795 sec real / 0.000037594 sec CPU maximum: 0.086948837 sec real / 0.000047276 sec CPU average: 0.080501879 sec real / 0.000041131 sec CPU stdev : 0.001015890 sec real / 0.000002226 sec CPU ~/timing stdlib/tst-qsort2 100000 256 Strip out best and worst realtime result minimum: 0.219361708 sec real / 0.000037121 sec CPU maximum: 0.221073085 sec real / 0.000048177 sec CPU average: 0.220079997 sec real / 0.000040878 sec CPU stdev : 0.000306212 sec real / 0.000002211 sec CPU ~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 32 Strip out best and worst realtime result minimum: 0.032166783 sec real / 0.000036703 sec CPU maximum: 0.042934657 sec real / 0.000044877 sec CPU average: 0.040185916 sec real / 0.000041135 sec CPU stdev : 0.002841348 sec real / 0.000002156 sec CPU ~/timing stdlib/tst-qsort2 100000 32 Strip out best and worst realtime result minimum: 0.036521886 sec real / 0.000027210 sec CPU maximum: 0.049771413 sec real / 0.000050816 sec CPU average: 0.045404896 sec real / 0.000040574 sec CPU stdev : 0.003352199 sec real / 0.000003306 sec CPU ~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 16 Strip out best and worst realtime result minimum: 0.022185376 sec real / 0.000034847 sec CPU maximum: 0.036350224 sec real / 0.000050040 sec CPU average: 0.032392322 sec real / 0.000041894 sec CPU stdev : 0.003810001 sec real / 0.000002991 sec CPU ~/timing stdlib/tst-qsort2 100000 16 Strip out best and worst realtime result minimum: 0.024453106 sec real / 0.000035883 sec CPU maximum: 0.038619840 sec real / 0.000044240 sec CPU average: 0.035602517 sec real / 0.000040279 sec CPU stdev : 0.002733362 sec real / 0.000001737 sec CPU ~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 8 Strip out best and worst realtime result minimum: 0.021042132 sec real / 0.000021910 sec CPU maximum: 0.024142820 sec real / 0.000038728 sec CPU average: 0.023244031 sec real / 0.000030432 sec CPU stdev : 0.001035261 sec real / 0.000002934 sec CPU ~/timing stdlib/tst-qsort2 100000 8 Strip out best and worst realtime result minimum: 0.014727083 sec real / 0.000021330 sec CPU maximum: 0.024788724 sec real / 0.000039112 sec CPU average: 0.023562170 sec real / 0.000030172 sec CPU stdev : 0.001994661 sec real / 0.000003953 sec CPU ~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 4 Strip out best and worst realtime result minimum: 0.020438105 sec real / 0.000018788 sec CPU maximum: 0.023082930 sec real / 0.000039773 sec CPU average: 0.022272783 sec real / 0.000027106 sec CPU stdev : 0.000949993 sec real / 0.000004309 sec CPU ~/timing stdlib/tst-qsort2 100000 4 Strip out best and worst realtime result minimum: 0.020050736 sec real / 0.000016737 sec CPU maximum: 0.033928475 sec real / 0.000034287 sec CPU average: 0.027609688 sec real / 0.000023064 sec CPU stdev : 0.004832571 sec real / 0.000004405 sec CPU ~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 2 Strip out best and worst realtime result minimum: 0.026925996 sec real / 0.000021625 sec CPU maximum: 0.032363087 sec real / 0.000032854 sec CPU average: 0.029911853 sec real / 0.000023378 sec CPU stdev : 0.002086839 sec real / 0.000001389 sec CPU ~/timing stdlib/tst-qsort2 100000 2 Strip out best and worst realtime result minimum: 0.026910302 sec real / 0.000020686 sec CPU maximum: 0.032150177 sec real / 0.000040594 sec CPU average: 0.029878779 sec real / 0.000023112 sec CPU stdev : 0.002115564 sec real / 0.000001469 sec CPU Jakub