Message-ID: <2400cf77-8258-1bdb-a3a0-3c0e65949b01@gmail.com>
Date: Fri, 31 Mar 2023 08:30:59 -0600
From: Jeff Law
Subject: Re: [RFC PATCH 16/19] riscv: Add accelerated strcmp routines
To: Adhemerval Zanella Netto, Christoph Müllner, Xi Ruoyao
Cc: libc-alpha@sourceware.org, Palmer Dabbelt, Darius Rad, Andrew Waterman, DJ Delorie, Vineet Gupta, Kito Cheng, Philipp Tomsich, Heiko Stuebner
References: <20230207001618.458947-1-christoph.muellner@vrull.eu> <20230207001618.458947-17-christoph.muellner@vrull.eu> <03c55d9f-1c36-22e0-ea19-f60fa2cf4263@gmail.com>

On 3/31/23 06:31, Adhemerval Zanella Netto wrote:
>> Jeff
>
> Is this implementation really better than the new generic one [1]? With a target
> with zbb support, the generic word comparison should use the orc.b instruction [2].
> And the final comparison, once the last word or the mismatching word is found,
> should use the clz/ctz instructions [3] (which also results in branchless code, albeit
> I have not checked whether it is better than the snippet this implementation uses).

I haven't done any comparisons against the updated generic bits. I nearly suggested that Christoph do that evaluation, but when I wandered around sysdeps I saw that we still had multiple custom strcmp implementations and set that suggestion aside.

>
> The generic implementation also has the advantage of using word instructions
> in the unaligned case, where this implementation does a naive byte-by-byte
> check.

Yeah, but in my digging this just didn't happen terribly often. I don't think there's a lot of value there. Along the same lines, my investigation didn't show any significant value in the realign cases, and I nearly suggested dropping them to avoid the branch in the hot path, but I wasn't confident enough in the breadth of my investigations to push it.

>
> So maybe a better option would be to further optimize the generic implementation.
> One option might be to parametrize final_cmp so you can use the branchless
> trick (if it is indeed better than the generic code). Another option that the
> generic implementation does not explore is manual loop unrolling, as done by
> multiple assembly implementations.

I could certainly support that. I was on the fence about pushing to use the generic bits; a little nudge could easily push me to that side.

jeff
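
For readers following the thread, here is a minimal C sketch of the word-at-a-time idea discussed above: test a whole word for a NUL byte, then locate the deciding byte with a count-trailing-zeros step instead of a byte loop. It is not the glibc generic code nor the proposed RISC-V assembly. The names has_zero_byte, final_cmp_sketch and strcmp_words_sketch are hypothetical, and the sketch assumes a little-endian target, the GCC/Clang builtin __builtin_ctzll, and input buffers that are readable in whole 8-byte chunks up to and including the chunk holding the terminator. The portable bit trick in has_zero_byte stands in for the zero-byte test that orc.b makes cheap on Zbb.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff some byte of w is zero.  The lowest set bit of the result
   is bit 7 of the first (lowest-addressed, on little-endian) zero byte.
   Portable stand-in for the test that Zbb's orc.b is meant to speed up. */
static inline uint64_t has_zero_byte(uint64_t w)
{
    return (w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL;
}

/* Hypothetical branchless final comparison.  Precondition: w1 != w2 or
   w1 contains a zero byte, so mask below is never 0.  Little-endian only;
   a big-endian variant would count leading zeros instead. */
static inline int final_cmp_sketch(uint64_t w1, uint64_t w2)
{
    uint64_t mask = (w1 ^ w2) | has_zero_byte(w1);
    /* Round the bit index down to the start of its byte. */
    unsigned shift = (unsigned)__builtin_ctzll(mask) & ~7u;
    unsigned char c1 = (unsigned char)(w1 >> shift);
    unsigned char c2 = (unsigned char)(w2 >> shift);
    return (int)c1 - (int)c2;
}

/* Word-at-a-time compare.  ASSUMPTION: both buffers are padded so that
   every 8-byte chunk up to the terminator's chunk may be read.  Real
   implementations handle alignment and page boundaries instead. */
int strcmp_words_sketch(const char *s1, const char *s2)
{
    for (;;) {
        uint64_t w1, w2;
        memcpy(&w1, s1, sizeof w1);   /* alignment-safe word loads */
        memcpy(&w2, s2, sizeof w2);
        if (w1 != w2 || has_zero_byte(w1))
            return final_cmp_sketch(w1, w2);
        s1 += sizeof w1;
        s2 += sizeof w2;
    }
}
```

Whether this shape actually beats the snippet in the patch is exactly the open question in the thread; the sketch only shows what "branchless final compare via ctz" means.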