From mboxrd@z Thu Jan 1 00:00:00 1970
To: Raphael M Zinsly, libc-alpha@sourceware.org
References: <20200929152103.18564-1-rzinsly@linux.ibm.com> <20200929152103.18564-2-rzinsly@linux.ibm.com> <37dd785c-60ec-f064-bfeb-7c5ec5483936@linaro.org> <8c436fe7-cdf4-a1fa-6777-f641f3b8a59c@linux.ibm.com>
From: Adhemerval Zanella
Subject: Re: [PATCH v3 2/2] powerpc: Add optimized stpncpy for POWER9
Date: Wed, 30 Sep 2020 11:46:30 -0300
In-Reply-To: <8c436fe7-cdf4-a1fa-6777-f641f3b8a59c@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
X-List-Received-Date: Wed, 30
 Sep 2020 14:46:35 -0000

On 30/09/2020 11:21, Raphael M Zinsly wrote:
> Hi Adhemerval,
>
> On 30/09/2020 10:42, Adhemerval Zanella wrote:
>>
>>
>> On 29/09/2020 12:21, Raphael Moreira Zinsly via Libc-alpha wrote:
>>> Add stpncpy support into the POWER9 strncpy.
>>
>> The benchmark numbers you provided [1] seem to show it is slightly worse than
>> the generic_strncpy, which uses the same strategy as string/strncpy.c
>> (which would use VSX instructions through memset/memcpy).
>
> My implementation is always better than the generic_strncpy, almost three times better on average. And it calls memset as well.
>
> Are you talking about __strncpy_ppc? For some reason it is using strnlen_ppc instead of the strnlen_power8, but I didn't touch it.
>
>> Did you compare this
>> optimization against an implementation that just calls power8/9 memset/memcpy
>> instead?
>>
>
> Not sure if I understand, isn't that generic_strncpy and strncpy_ppc?

Right, I misread the benchmark. I also tested my own suggestion on the power9
machine from the gcc farm, and although it is slightly faster than the power7
variant, it does not really beat power8 (as expected, since it calls strnlen
and then memcpy/memset, and so accesses the input twice).

I do not really oppose it, and it is up to the arch maintainer, but I still
think these micro-optimizations tend to just add extra maintenance burden
and icache pressure that the microbenchmark does not really catch.

>
>
>> It should result in a smaller implementation, which reduces i-cache usage, and
>> the code is much simpler and more maintainable. The same applies for stpncpy.
>>
>> I tried to persuade Intel developers that such micro-optimizations are not
>> really a gain and that instead we should optimize only a handful of string
>> operations (memcpy/memset/etc.) and use composable implementations instead
>> (as generic strncpy). It still resulted in 1a153e47fcc, but I think we
>> might do better for powerpc.
>>
>> [1] https://sourceware.org/pipermail/libc-alpha/2020-September/118049.html
>>
>
> Best Regards,
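
For readers following the thread, the composable strategy discussed above (strnlen followed by memcpy/memset, so the input is read twice) can be sketched in C roughly as below. This is only an illustrative sketch under stated assumptions: the names composed_strncpy and composed_stpncpy are hypothetical, and the code is neither the glibc source nor the POWER9 patch under review.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strncpy composed from strnlen/memset/memcpy.
   Any per-CPU tuning would come only from the memset/memcpy/strnlen
   implementations the C library selects, not from this function.  */
static char *
composed_strncpy (char *dst, const char *src, size_t n)
{
  /* First pass over the input: find the length, capped at n.  */
  size_t len = strnlen (src, n);

  /* Zero-fill the tail when the source is shorter than n.  */
  if (len != n)
    memset (dst + len, '\0', n - len);

  /* Second pass over the input: copy the characters.  */
  memcpy (dst, src, len);
  return dst;
}

/* Hypothetical helper: same strategy for stpncpy, which returns a
   pointer to the first NUL written, or dst + n if none was written.  */
static char *
composed_stpncpy (char *dst, const char *src, size_t n)
{
  size_t len = strnlen (src, n);

  memcpy (dst, src, len);
  if (len == n)
    return dst + n;

  memset (dst + len, '\0', n - len);
  return dst + len;
}
```

The two passes over src (one in strnlen, one in memcpy) are the cost mentioned in the reply; the trade-off is that the routine itself stays small and has no architecture-specific code of its own.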