public inbox for libc-alpha@sourceware.org
From: "Zhangxuelei (Derek)" <zhangxuelei4@huawei.com>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
	"libc-alpha@sourceware.org" <libc-alpha@sourceware.org>,
	nd <nd@arm.com>, "siddhesh@gotplt.org" <siddhesh@gotplt.org>,
	jiangyikun <jiangyikun@huawei.com>,
	"yikunkero@gmail.com" <yikunkero@gmail.com>
Cc: nd <nd@arm.com>
Subject: Re: [PATCH v2 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor
Date: Sat, 26 Oct 2019 13:46:00 -0000
Message-ID: <8DC571DDDE171B4094D3D33E9685917BD931DE@dggemi509-mbs.china.huawei.com>

Hi Wilco, sorry for the delayed reply; we ran a lot of tests after modifying memmove.

> In order to select the right memmove implementation,
> multiarch/memmove.c needs similar changes as multiarch/memcpy.c.

That's true, we missed this change and will include it in the next patch.

> Also since the memmove entry sequence does both check for medium
> and large cases, the full overlap check should be done in both.
> Currently only sizes 96-512 benefit, not the move_long case:

Yes, we will add the full overlap check to the move_long case in the next patch.
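
To make the overlap check under discussion concrete, here is a minimal C sketch of the idea. It is not the patch's assembly: forward_copy_unsafe and sketch_memmove are hypothetical names, and the byte loops stand in for the real medium and large copy paths. The point is only that one unsigned comparison of (dst - src) against the length decides, for any size, whether the forward memcpy-style path is safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not glibc code: returns nonzero when a forward
   (memcpy-style) copy of n bytes from src to dst would overwrite source
   bytes before they are read, i.e. when dst lies inside [src, src + n).  */
static int
forward_copy_unsafe (const void *dst, const void *src, size_t n)
{
  /* Unsigned subtraction wraps to a huge value when dst < src, so a
     single comparison covers both "dst below src" and "dst at or
     beyond src + n".  */
  uintptr_t diff = (uintptr_t) dst - (uintptr_t) src;
  return diff < n;
}

/* Sketch of a memmove that applies the same check for every size,
   rather than only for one size range.  */
static void *
sketch_memmove (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  assert (n == 0 || (dst != NULL && src != NULL));

  if (!forward_copy_unsafe (dst, src, n))
    {
      /* No harmful overlap: copy forward, as memcpy would.  */
      for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    }
  else
    {
      /* dst overlaps the tail of src: copy backward.  */
      for (size_t i = n; i > 0; i--)
        d[i - 1] = s[i - 1];
    }
  return dst;
}
```

In this sketch dst == src is classified as unsafe and takes the backward loop, which is still correct; a real implementation may prefer to return early in that case.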

What confuses us now is this: we removed the dst_unaligned code from memcpy following the previous comments, and our tests showed no effect on performance in the memcpy cases. But when memmove is called and takes the memcpy path, the unaligned cases are significantly slower than the aligned cases, according to the first half of the memmove-walk results shown at the bottom. So do you think we should still remove the dst_unaligned code?

Our analysis is that the extra conditional checks at the start of memmove may weaken the processor's ability to handle this case, which would explain why dst_unaligned makes a difference.
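
For anyone reproducing the aligned versus unaligned comparison, a minimal sketch of how a destination pointer can be classified. The 16-byte granularity (ASSUMED_ALIGN) is an assumption chosen because it is the width of an AArch64 Q register; it is not taken from the patch, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment granularity; must be a power of two.  */
#define ASSUMED_ALIGN 16u

/* Offset of dst past the previous ASSUMED_ALIGN boundary
   (0 means dst is aligned).  */
static size_t
dst_misalignment (const void *dst)
{
  assert ((ASSUMED_ALIGN & (ASSUMED_ALIGN - 1)) == 0);
  return (size_t) ((uintptr_t) dst & (ASSUMED_ALIGN - 1));
}

/* Number of leading bytes to copy before dst reaches the next
   ASSUMED_ALIGN boundary (0 if it is already aligned).  */
static size_t
bytes_to_aligned (const void *dst)
{
  return (ASSUMED_ALIGN - dst_misalignment (dst)) & (ASSUMED_ALIGN - 1);
}
```

A dst_unaligned-style prologue would copy bytes_to_aligned (dst) bytes first so that the main loop stores to aligned addresses; whether that pays off on a given core is exactly the question raised above.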

> Well it looks the dst_unaligned code (which deals with a specific
> issue on ThunderX2) ...

I remember you mentioned the specific issue on ThunderX2 before; could you tell us more about it?

Function: memmove
Variant: walk
                    __memmove_thunderx	__memmove_thunderx2	__memmove_falkor	__memmove_kunpeng2	__memmove_generic
========================================================================================================================
      length=128:        33.99 (-73.69%)	       18.65 (  4.67%)	       17.75 (  9.29%)	       19.21 (  1.80%)	       19.57	
      length=129:        35.41 (  2.08%)	       37.43 ( -3.51%)	       35.87 (  0.79%)	       34.71 (  4.01%)	       36.16	
      length=256:        45.55 (-37.95%)	       32.61 (  1.23%)	       35.59 ( -7.79%)	       32.95 (  0.20%)	       33.02	
      length=257:        66.36 (  4.20%)	       69.50 ( -0.33%)	       68.03 (  1.80%)	       68.53 (  1.08%)	       69.27	
      length=512:        82.77 (-34.10%)	       65.67 ( -6.41%)	       65.61 ( -6.30%)	       60.13 (  2.57%)	       61.72	
      length=513:       146.19 (  3.90%)	      132.98 ( 12.59%)	      132.28 ( 13.05%)	      151.50 (  0.41%)	      152.12	
     length=1024:       155.75 (-26.13%)	      142.74 (-15.60%)	      126.97 ( -2.83%)	      121.58 (  1.53%)	      123.48	
     length=1025:       289.15 (  4.72%)	      318.71 ( -5.02%)	      262.97 ( 13.35%)	      307.00 ( -1.16%)	      303.48	
     length=2048:       298.85 (-22.16%)	      233.98 (  4.35%)	      249.71 ( -2.08%)	      245.37 ( -0.30%)	      244.63	
     length=2049:       409.46 ( 14.62%)	      399.08 ( 16.78%)	      508.64 ( -6.07%)	      465.79 (  2.87%)	      479.54	
     length=4096:       543.10 (-11.30%)	      445.35 (  8.73%)	      491.40 ( -0.71%)	      435.61 ( 10.73%)	      487.95	
     length=4097:       680.95 ( 18.96%)	      593.99 ( 29.31%)	      990.52 (-17.89%)	      882.91 ( -5.08%)	      840.23	
     length=8192:      1047.46 ( -8.01%)	      867.03 ( 10.59%)	      977.80 ( -0.83%)	      850.57 ( 12.29%)	      969.74	
     length=8193:      1224.46 ( 21.97%)	      979.34 ( 37.59%)	     1981.71 (-26.29%)	     1714.96 ( -9.29%)	     1569.12	
    length=16384:      2055.73 ( -5.42%)	     1701.01 ( 12.77%)	     1944.38 (  0.29%)	     1683.51 ( 13.67%)	     1950.11	
    length=16385:      2314.62 ( 23.38%)	     1774.44 ( 41.26%)	     3967.45 (-31.34%)	     3385.52 (-12.07%)	     3020.82	
    length=32768:      5153.99 (-32.25%)	     3426.50 ( 12.08%)	     3875.16 (  0.56%)	     3338.91 ( 14.32%)	     3897.16	
    length=32769:      5343.41 (  9.64%)	     3375.50 ( 42.92%)	     7925.06 (-34.01%)	     6716.28 (-13.57%)	     5913.72	
    length=65536:     10361.70 (-35.90%)	     6768.32 ( 11.23%)	     7759.75 ( -1.78%)	     6658.73 ( 12.66%)	     7624.32	
    length=65537:     10284.00 ( 12.00%)	     6528.85 ( 44.13%)	    15844.40 (-35.58%)	    13437.90 (-14.98%)	    11686.80	
   length=131072:     20539.30 (-34.71%)	    13672.50 ( 10.33%)	    15567.10 ( -2.10%)	    13325.60 ( 12.60%)	    15247.50	
   length=131073:     20868.20 ( 10.97%)	    12807.80 ( 45.36%)	    31605.90 (-34.83%)	    26788.20 (-14.28%)	    23440.70	
   length=262144:     41304.50 (-35.25%)	    26883.30 ( 11.97%)	    31038.70 ( -1.63%)	    26533.40 ( 13.12%)	    30539.40	
   length=262145:     41157.90 ( 12.84%)	    25568.20 ( 45.85%)	    63229.00 (-33.90%)	    53525.00 (-13.35%)	    47220.50	
   length=524288:     81777.00 (-32.88%)	    54133.00 ( 12.04%)	    61853.30 ( -0.51%)	    52869.40 ( 14.09%)	    61542.20	
   length=524289:     81986.90 ( 14.71%)	    50562.00 ( 47.40%)	   126255.00 (-31.33%)	   105969.00 (-10.23%)	    96132.70	
  length=1048576:    163628.00 (-33.00%)	   107776.00 ( 12.00%)	   123819.00 ( -1.00%)	   105831.00 ( 14.00%)	   123170.00	
  length=1048577:    177503.00 ( 12.00%)	    98680.60 ( 51.09%)	   253068.00 (-26.00%)	   211155.00 ( -5.00%)	   201763.00	
  length=2097152:    336756.00 (-34.00%)	   224097.00 ( 11.00%)	   254575.00 ( -1.00%)	   219864.00 ( 13.00%)	   253124.00	
  length=2097153:    373590.00 (  9.00%)	   214822.00 ( 48.00%)	   506479.00 (-23.00%)	   426299.00 ( -3.00%)	   414899.00	
  length=4194304:    662606.00 (-35.00%)	   437195.00 ( 11.00%)	   497288.00 ( -2.00%)	   427614.00 ( 13.00%)	   491729.00	
  length=4194305:    697910.00 (  9.00%)	   417656.00 ( 45.00%)	  1020670.00 (-32.62%)	   856051.00 (-12.00%)	   769599.00	
  length=8388608:   1307990.00 (-34.88%)	   852030.00 ( 12.00%)	   983092.00 ( -2.00%)	   834918.00 ( 13.00%)	   969712.00	
  length=8388609:   1416420.00 (  8.70%)	   821262.00 ( 47.06%)	  2030660.00 (-30.89%)	  1708360.00 (-10.11%)	  1551450.00	
 length=16777216:   2586380.00 (-33.02%)	  1702120.00 ( 12.46%)	  1970000.00 ( -1.32%)	  1676900.00 ( 13.76%)	  1944360.00	
 length=16777217:   2796060.00 ( 13.29%)	  1627720.00 ( 49.52%)	  4079100.00 (-26.51%)	  3410640.00 ( -5.77%)	  3224440.00	
 length=33554432:   5241680.00 (-33.96%)	  3488860.00 ( 10.84%)	  4890730.00 (-24.99%)	  3474630.00 ( 11.20%)	  3912900.00	
 length=33554433:   5666550.00 ( 14.71%)	  3357520.00 ( 49.46%)	  8039630.00 (-21.01%)	  6824230.00 ( -2.72%)	  6643780.00
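
A note on reading the table: the percentages in parentheses appear to be each implementation's timing relative to the __memmove_generic column, with positive values meaning faster than generic. This reading is inferred from the numbers themselves, not stated in the output, and improvement_percent is a hypothetical helper name. A minimal sketch of that calculation:

```c
#include <assert.h>

/* Relative improvement of candidate over baseline, in percent.
   Positive: candidate took less time than baseline.
   Negative: candidate took more time than baseline.  */
static double
improvement_percent (double baseline, double candidate)
{
  assert (baseline > 0.0);
  return (baseline - candidate) / baseline * 100.0;
}
```

For example, improvement_percent (19.57, 33.99) gives about -73.7, which agrees with the -73.69% shown for __memmove_thunderx at length=128 up to rounding of the printed timings.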
