Date: Fri, 07 Apr 2023 08:36:31 -0700 (PDT)
Subject: Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
In-Reply-To: <61b659b7-7c08-028a-1e7a-2e1b19df080a@gmail.com>
CC: adhemerval.zanella@linaro.org, Evan Green, libc-alpha@sourceware.org, slewis@rivosinc.com, Vineet Gupta
From: Palmer Dabbelt
To: jeffreyalaw@gmail.com

On Fri, 31 Mar 2023 15:10:24 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
>
> On 3/31/23 15:38, Palmer Dabbelt wrote:
>> On Fri, 31 Mar 2023 14:35:36 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>>
>>>
>>> On 3/31/23 15:03, Palmer Dabbelt wrote:
>>>> On Fri, 31 Mar 2023 13:19:19 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>>>
>>>> [just snipping the rest so we can focus on Jeff's ask, the other stuff
>>>> is interesting but a longer reply and we'd probably want to fork the
>>>> thread anyway...]
>>>>
>>>>> So perhaps we can narrow down the scope right now even further.  Can we
>>>>> agree to try and settle on a base implementation with no ISA extensions
>>>>> and no uarch variants?  ISTM if we can settle on those implementations
>>>>> that it should be usable immediately by the RV community at large and
>>>>> doesn't depend on the kernel->glibc interface work.
>>>>
>>>> That base includes V and ZBB?  In that case we'd be dropping support for
>>>> all existing hardware, which I would be very much against.
>>> No, it would not include V or ZBB.  It would be something that could
>>> work on any risc-v hardware.  Sorry if I wasn't clear about that.
>>
>> I'm still kind of confused then, maybe it's just too abstract?  Is there
>> something you could propose as being the base?
> So right now we use the generic (architecture independent) routines for
> str* and mem*.
>
> If we look at (for example) strcmp there's hand written variants out
> there that are purported to have better performance than the generic code
> in glibc.
>
> Note that any such performance claims likely predate the work from
> Adhemerval and others earlier this year to reduce the reliance on
> hand-coded assembly.
>
> So the first step is to answer the question, for any str* or mem* where
> we've received a patch submission of a hand coded assembly variant
> (which isn't using ZBB or V), does that hand coded assembly variant
> significantly outperform the generic code currently in glibc.  If yes
> and the generic code can't be significantly improved, then we should
> declare that hand written variant as the standard baseline for risc-v in
> glibc.  Review, adjust, commit and move on.
>
> My hope would be that many (most, all?) of the base architecture hand
> coded assembly variants no longer provide any significant benefit over
> the current generic versions.
>
> That's my minimal proposal for now.  It's not meant to solve everything
> in this space, but at least carve out a chunk of the work and get it
> resolved one way or the other.
>
> Does that help clarify what I'm suggesting?

Sorry for being slow here, this fell off the queue.

I think this proposal is in theory what we've done, it's just that
nobody's posted patches like that -- unless I missed something?
Certainly the original port had some assembly routines and we tossed
those because we didn't care enough to justify them.

If someone's got code then I'm happy to look, but we'd also need some
benchmarks (on real HW that's publicly available) and that's usually the
sticking point.  That said, I'd guess that anyone trying to ship real
product is going to need at least V (or some other explicitly data
parallel instructions) before the performance of these routines matters.