From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk1-x743.google.com (mail-qk1-x743.google.com [IPv6:2607:f8b0:4864:20::743]) by sourceware.org (Postfix) with ESMTPS id A6E593858D37 for ; Tue, 25 Aug 2020 12:15:42 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org A6E593858D37 Received: by mail-qk1-x743.google.com with SMTP id p25so10661312qkp.2 for ; Tue, 25 Aug 2020 05:15:42 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:autocrypt:message-id :date:user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=4h2R0pt0LHifxsYJqs5r6ly/vd0pSx+Pkdr7+oIBdWE=; b=MxynOH4S7UD4GMHYlDLx+gKVckbuUt8RALCi6PVYkZGH6wMPOr9eZdsf6ctVqbKSFt mVQi1U8BOqU9K/BaTP28gHWdotHyhq9h2WywojJiDfVGg8pBiRcp5PGcMPq3TKWZ5lcI WjFlmyJxULV+A0w/O6ub/2zx/ssYuRHH8EuPM+iZkCgu9Ab7mzGgsg5JulZHdOWhRzO/ MncFSGSHpgU2ZcFu/IES92FLYKszwanuqkIUdDWyKeiTM2XODy40RDDzE7mWC23RkKvk Uvl2rSUvIglYhbBlxHfb8UdRvYXYzXdlkOgI4kfo1P5k/4aSvre9bqD31OH/gIV6hG5+ 0QLw== X-Gm-Message-State: AOAM530OlVOeyCr8B045yhFhRd7TnQizEXZ8mAQXG9hw5SQnPwGIsu05 5K/IFb7knbZlksPFPmQ3gdjWDTbBk4WsrYmf X-Google-Smtp-Source: ABdhPJzzPESwwhjbVK5/xRP8qh3+vd3Wx8gPGTe65O0NFTzHN1INTNEenhK8TEvydNbyOmkHXqItOg== X-Received: by 2002:a37:e315:: with SMTP id y21mr9015398qki.129.1598357741622; Tue, 25 Aug 2020 05:15:41 -0700 (PDT) Received: from [192.168.1.4] ([177.194.48.209]) by smtp.googlemail.com with ESMTPSA id v202sm7028226qka.5.2020.08.25.05.15.40 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 25 Aug 2020 05:15:40 -0700 (PDT) Subject: Re: [PATCH] AArch64: Improve backwards memmove performance To: libc-alpha@sourceware.org References: From: Adhemerval Zanella Autocrypt: addr=adhemerval.zanella@linaro.org; prefer-encrypt=mutual; keydata= mQINBFcVGkoBEADiQU2x/cBBmAVf5C2d1xgz6zCnlCefbqaflUBw4hB/bEME40QsrVzWZ5Nq 8kxkEczZzAOKkkvv4pRVLlLn/zDtFXhlcvQRJ3yFMGqzBjofucOrmdYkOGo0uCaoJKPT186L NWp53SACXguFJpnw4ODI64ziInzXQs/rUJqrFoVIlrPDmNv/LUv1OVPKz20ETjgfpg8MNwG6 iMizMefCl+RbtXbIEZ3TE/IaDT/jcOirjv96lBKrc/pAL0h/O71Kwbbp43fimW80GhjiaN2y WGByepnkAVP7FyNarhdDpJhoDmUk9yfwNuIuESaCQtfd3vgKKuo6grcKZ8bHy7IXX1XJj2X/ BgRVhVgMHAnDPFIkXtP+SiarkUaLjGzCz7XkUn4XAGDskBNfbizFqYUQCaL2FdbW3DeZqNIa nSzKAZK7Dm9+0VVSRZXP89w71Y7JUV56xL/PlOE+YKKFdEw+gQjQi0e+DZILAtFjJLoCrkEX w4LluMhYX/X8XP6/C3xW0yOZhvHYyn72sV4yJ1uyc/qz3OY32CRy+bwPzAMAkhdwcORA3JPb kPTlimhQqVgvca8m+MQ/JFZ6D+K7QPyvEv7bQ7M+IzFmTkOCwCJ3xqOD6GjX3aphk8Sr0dq3 4Awlf5xFDAG8dn8Uuutb7naGBd/fEv6t8dfkNyzj6yvc4jpVxwARAQABtElBZGhlbWVydmFs IFphbmVsbGEgTmV0dG8gKExpbmFybyBWUE4gS2V5KSA8YWRoZW1lcnZhbC56YW5lbGxhQGxp bmFyby5vcmc+iQI3BBMBCAAhBQJXFRpKAhsDBQsJCAcDBRUKCQgLBRYCAwEAAh4BAheAAAoJ EKqx7BSnlIjv0e8P/1YOYoNkvJ+AJcNUaM5a2SA9oAKjSJ/M/EN4Id5Ow41ZJS4lUA0apSXW NjQg3VeVc2RiHab2LIB4MxdJhaWTuzfLkYnBeoy4u6njYcaoSwf3g9dSsvsl3mhtuzm6aXFH /Qsauav77enJh99tI4T+58rp0EuLhDsQbnBic/ukYNv7sQV8dy9KxA54yLnYUFqH6pfH8Lly sTVAMyi5Fg5O5/hVV+Z0Kpr+ZocC1YFJkTsNLAW5EIYSP9ftniqaVsim7MNmodv/zqK0IyDB GLLH1kjhvb5+6ySGlWbMTomt/or/uvMgulz0bRS+LUyOmlfXDdT+t38VPKBBVwFMarNuREU2 69M3a3jdTfScboDd2ck1u7l+QbaGoHZQ8ZNUrzgObltjohiIsazqkgYDQzXIMrD9H19E+8fw kCNUlXxjEgH/Kg8DlpoYJXSJCX0fjMWfXywL6ZXc2xyG/hbl5hvsLNmqDpLpc1CfKcA0BkK+ k8R57fr91mTCppSwwKJYO9T+8J+o4ho/CJnK/jBy1pWKMYJPvvrpdBCWq3MfzVpXYdahRKHI ypk8m4QlRlbOXWJ3TDd/SKNfSSrWgwRSg7XCjSlR7PNzNFXTULLB34sZhjrN6Q8NQZsZnMNs TX8nlGOVrKolnQPjKCLwCyu8PhllU8OwbSMKskcD1PSkG6h3r0AquQINBFcVGkoBEACgAdbR Ck+fsfOVwT8zowMiL3l9a2DP3Eeak23ifdZG+8Avb/SImpv0UMSbRfnw/N81IWwlbjkjbGTu oT37iZHLRwYUFmA8fZX0wNDNKQUUTjN6XalJmvhdz9l71H3WnE0wneEM5ahu5V1L1utUWTyh VUwzX1lwJeV3vyrNgI1kYOaeuNVvq7npNR6t6XxEpqPsNc6O77I12XELic2+36YibyqlTJIQ V1SZEbIy26AbC2zH9WqaKyGyQnr/IPbTJ2Lv0dM3RaXoVf+CeK7gB2B+w1hZummD21c1Laua +VIMPCUQ+EM8W9EtX+0iJXxI+wsztLT6vltQcm+5Q7tY+HFUucizJkAOAz98YFucwKefbkTp eKvCfCwiM1bGatZEFFKIlvJ2QNMQNiUrqJBlW9nZp/k7pbG3oStOjvawD9ZbP9e0fnlWJIsj 6c7pX354Yi7kxIk/6gREidHLLqEb/otuwt1aoMPg97iUgDV5mlNef77lWE8vxmlY0FBWIXuZ yv0XYxf1WF6dRizwFFbxvUZzIJp3spAao7jLsQj1DbD2s5+S1BW09A0mI/1DjB6EhNN+4bDB SJCOv/ReK3tFJXuj/HbyDrOdoMt8aIFbe7YFLEExHpSk+HgN05Lg5TyTro8oW7TSMTk+8a5M kzaH4UGXTTBDP/g5cfL3RFPl79ubXwARAQABiQIfBBgBCAAJBQJXFRpKAhsMAAoJEKqx7BSn lIjvI/8P/jg0jl4Tbvg3B5kT6PxJOXHYu9OoyaHLcay6Cd+ZrOd1VQQCbOcgLFbf4Yr+rE9l mYsY67AUgq2QKmVVbn9pjvGsEaz8UmfDnz5epUhDxC6yRRvY4hreMXZhPZ1pbMa6A0a/WOSt AgFj5V6Z4dXGTM/lNManr0HjXxbUYv2WfbNt3/07Db9T+GZkpUotC6iknsTA4rJi6u2ls0W9 1UIvW4o01vb4nZRCj4rni0g6eWoQCGoVDk/xFfy7ZliR5B+3Z3EWRJcQskip/QAHjbLa3pml xAZ484fVxgeESOoaeC9TiBIp0NfH8akWOI0HpBCiBD5xaCTvR7ujUWMvhsX2n881r/hNlR9g fcE6q00qHSPAEgGr1bnFv74/1vbKtjeXLCcRKk3Ulw0bY1OoDxWQr86T2fZGJ/HIZuVVBf3+ gaYJF92GXFynHnea14nFFuFgOni0Mi1zDxYH/8yGGBXvo14KWd8JOW0NJPaCDFJkdS5hu0VY 7vJwKcyHJGxsCLU+Et0mryX8qZwqibJIzu7kUJQdQDljbRPDFd/xmGUFCQiQAncSilYOcxNU EMVCXPAQTteqkvA+gNqSaK1NM9tY0eQ4iJpo+aoX8HAcn4sZzt2pfUB9vQMTBJ2d4+m/qO6+ cFTAceXmIoFsN8+gFN3i8Is3u12u8xGudcBPvpoy4OoG Message-ID: <16d04393-df89-cfb2-a97c-91a44fa4888c@linaro.org> Date: Tue, 25 Aug 2020 09:15:38 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-15.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 25 Aug 2020 12:15:44 -0000 On 20/08/2020 08:46, Wilco Dijkstra wrote: > On some microarchitectures performance of the backwards memmove improves if > the stores use STR with decreasing addresses. So change the memmove loop > in memcpy_advsimd.S to use 2x STR rather than STP. > > Passes GLIBC regression test, OK for commit? LGTM, thanks. Does it make any difference to use the same strategy on the last iteration at L(copy64_from_start) as well? Reviewed-by: Adhemerval Zanella > > --- > diff --git a/sysdeps/aarch64/multiarch/memcpy_advsimd.S b/sysdeps/aarch64/multiarch/memcpy_advsimd.S > index d4ba74777744c8bb5a83e43ab2d63ad8dab35203..48bb6d7ca425197907eaef2307fb3939e69baa15 100644 > --- a/sysdeps/aarch64/multiarch/memcpy_advsimd.S > +++ b/sysdeps/aarch64/multiarch/memcpy_advsimd.S > @@ -223,12 +223,13 @@ L(copy_long_backwards): > b.ls L(copy64_from_start) > > L(loop64_backwards): > - stp A_q, B_q, [dstend, -32] > + str B_q, [dstend, -16] > + str A_q, [dstend, -32] > ldp A_q, B_q, [srcend, -96] > - stp C_q, D_q, [dstend, -64] > + str D_q, [dstend, -48] > + str C_q, [dstend, -64]! > ldp C_q, D_q, [srcend, -128] > sub srcend, srcend, 64 > - sub dstend, dstend, 64 > subs count, count, 64 > b.hi L(loop64_backwards) > >