From: "Zhangxuelei (Derek)"
To: Wilco Dijkstra, "libc-alpha@sourceware.org", nd, "siddhesh@gotplt.org", jiangyikun, "yikunkero@gmail.com"
CC: nd
Subject: Re: [PATCH v2 2/2] aarch64: Optimized memcpy and memmove for Kunpeng processor
Date: Sat, 26 Oct 2019 13:46:00 -0000
Message-ID: <8DC571DDDE171B4094D3D33E9685917BD931DE@dggemi509-mbs.china.huawei.com>

Hi Wilco,

Sorry for the delay in replying; we ran a lot of tests after modifying memmove.

> In order to select the right memmove implementation,
> multiarch/memmove.c needs similar changes as multiarch/memcpy.c.

That's true. We missed this change and will include it in the next patch.

> Also since the memmove entry sequence does both check for medium
> and large cases, the full overlap check should be done in both.
> Currently only sizes 96-512 benefit, not the move_long case:

Yes, we will add the full overlap check to the move_long case in the next patch.

What confuses us now is the following. We removed the dst_unaligned code from memcpy as suggested in the previous comments, and our testing showed no performance impact in the memcpy cases.
However, when memmove is called and takes the memcpy path, the unaligned cases are significantly slower than the aligned cases, according to the first half of the memmove-walk results shown at the bottom of this mail. Do you think we should still remove the dst_unaligned code?

Our analysis is that the additional condition checks at the start of memmove may weaken the processor's ability to handle this case, so the dst_unaligned code makes a difference.

> Well it looks the dst_unaligned code (which deals with a specific
> issue on ThunderX2) ...

I remember you mentioned the specific issue on ThunderX2 before. Could you tell us more about it?

Function: memmove
Variant: walk
          __memmove_thunderx   __memmove_thunderx2      __memmove_falkor    __memmove_kunpeng2     __memmove_generic
====================================================================================================================
length=128:            33.99 (-73.69%)       18.65 (  4.67%)       17.75 (  9.29%)       19.21 (  1.80%)       19.57
length=129:            35.41 (  2.08%)       37.43 ( -3.51%)       35.87 (  0.79%)       34.71 (  4.01%)       36.16
length=256:            45.55 (-37.95%)       32.61 (  1.23%)       35.59 ( -7.79%)       32.95 (  0.20%)       33.02
length=257:            66.36 (  4.20%)       69.50 ( -0.33%)       68.03 (  1.80%)       68.53 (  1.08%)       69.27
length=512:            82.77 (-34.10%)       65.67 ( -6.41%)       65.61 ( -6.30%)       60.13 (  2.57%)       61.72
length=513:           146.19 (  3.90%)      132.98 ( 12.59%)      132.28 ( 13.05%)      151.50 (  0.41%)      152.12
length=1024:          155.75 (-26.13%)      142.74 (-15.60%)      126.97 ( -2.83%)      121.58 (  1.53%)      123.48
length=1025:          289.15 (  4.72%)      318.71 ( -5.02%)      262.97 ( 13.35%)      307.00 ( -1.16%)      303.48
length=2048:          298.85 (-22.16%)      233.98 (  4.35%)      249.71 ( -2.08%)      245.37 ( -0.30%)      244.63
length=2049:          409.46 ( 14.62%)      399.08 ( 16.78%)      508.64 ( -6.07%)      465.79 (  2.87%)      479.54
length=4096:          543.10 (-11.30%)      445.35 (  8.73%)      491.40 ( -0.71%)      435.61 ( 10.73%)      487.95
length=4097:          680.95 ( 18.96%)      593.99 ( 29.31%)      990.52 (-17.89%)      882.91 ( -5.08%)      840.23
length=8192:         1047.46 ( -8.01%)      867.03 ( 10.59%)      977.80 ( -0.83%)      850.57 ( 12.29%)      969.74
length=8193:         1224.46 ( 21.97%)      979.34 ( 37.59%)     1981.71 (-26.29%)     1714.96 ( -9.29%)     1569.12
length=16384:        2055.73 ( -5.42%)     1701.01 ( 12.77%)     1944.38 (  0.29%)     1683.51 ( 13.67%)     1950.11
length=16385:        2314.62 ( 23.38%)     1774.44 ( 41.26%)     3967.45 (-31.34%)     3385.52 (-12.07%)     3020.82
length=32768:        5153.99 (-32.25%)     3426.50 ( 12.08%)     3875.16 (  0.56%)     3338.91 ( 14.32%)     3897.16
length=32769:        5343.41 (  9.64%)     3375.50 ( 42.92%)     7925.06 (-34.01%)     6716.28 (-13.57%)     5913.72
length=65536:       10361.70 (-35.90%)     6768.32 ( 11.23%)     7759.75 ( -1.78%)     6658.73 ( 12.66%)     7624.32
length=65537:       10284.00 ( 12.00%)     6528.85 ( 44.13%)    15844.40 (-35.58%)    13437.90 (-14.98%)    11686.80
length=131072:      20539.30 (-34.71%)    13672.50 ( 10.33%)    15567.10 ( -2.10%)    13325.60 ( 12.60%)    15247.50
length=131073:      20868.20 ( 10.97%)    12807.80 ( 45.36%)    31605.90 (-34.83%)    26788.20 (-14.28%)    23440.70
length=262144:      41304.50 (-35.25%)    26883.30 ( 11.97%)    31038.70 ( -1.63%)    26533.40 ( 13.12%)    30539.40
length=262145:      41157.90 ( 12.84%)    25568.20 ( 45.85%)    63229.00 (-33.90%)    53525.00 (-13.35%)    47220.50
length=524288:      81777.00 (-32.88%)    54133.00 ( 12.04%)    61853.30 ( -0.51%)    52869.40 ( 14.09%)    61542.20
length=524289:      81986.90 ( 14.71%)    50562.00 ( 47.40%)   126255.00 (-31.33%)   105969.00 (-10.23%)    96132.70
length=1048576:    163628.00 (-33.00%)   107776.00 ( 12.00%)   123819.00 ( -1.00%)   105831.00 ( 14.00%)   123170.00
length=1048577:    177503.00 ( 12.00%)    98680.60 ( 51.09%)   253068.00 (-26.00%)   211155.00 ( -5.00%)   201763.00
length=2097152:    336756.00 (-34.00%)   224097.00 ( 11.00%)   254575.00 ( -1.00%)   219864.00 ( 13.00%)   253124.00
length=2097153:    373590.00 (  9.00%)   214822.00 ( 48.00%)   506479.00 (-23.00%)   426299.00 ( -3.00%)   414899.00
length=4194304:    662606.00 (-35.00%)   437195.00 ( 11.00%)   497288.00 ( -2.00%)   427614.00 ( 13.00%)   491729.00
length=4194305:    697910.00 (  9.00%)   417656.00 ( 45.00%)  1020670.00 (-32.62%)   856051.00 (-12.00%)   769599.00
length=8388608:   1307990.00 (-34.88%)   852030.00 ( 12.00%)   983092.00 ( -2.00%)   834918.00 ( 13.00%)   969712.00
length=8388609:   1416420.00 (  8.70%)   821262.00 ( 47.06%)  2030660.00 (-30.89%)  1708360.00 (-10.11%)  1551450.00
length=16777216:  2586380.00 (-33.02%)  1702120.00 ( 12.46%)  1970000.00 ( -1.32%)  1676900.00 ( 13.76%)  1944360.00
length=16777217:  2796060.00 ( 13.29%)  1627720.00 ( 49.52%)  4079100.00 (-26.51%)  3410640.00 ( -5.77%)  3224440.00
length=33554432:  5241680.00 (-33.96%)  3488860.00 ( 10.84%)  4890730.00 (-24.99%)  3474630.00 ( 11.20%)  3912900.00
length=33554433:  5666550.00 ( 14.71%)  3357520.00 ( 49.46%)  8039630.00 (-21.01%)  6824230.00 ( -2.72%)  6643780.00
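
For reference, the "full overlap check" discussed earlier in this mail can be sketched in C. This is only an illustration of the general technique, not the Kunpeng assembly from the patch: the name memmove_sketch and the byte-wise loops are assumptions made for the example. The point is that a single unsigned comparison of (dst - src) against the length decides whether the forward, memcpy-style path is safe.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only; not the code from the patch. */
static void *memmove_sketch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Unsigned difference: when dst is below src the subtraction wraps
       to a very large value, so one compare covers both "dst before src"
       and "dst at least n bytes after src". */
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        /* A forward copy never overwrites source bytes it still needs. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* dst starts inside [src, src + n): copy backward from the end. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst;
}
```

Applying the same comparison on every size path (medium and move_long alike) means a call whose buffers do not overlap in the harmful direction can always take the forward path.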