Date: Fri, 31 Mar 2023 11:34:02 -0700 (PDT)
Subject: Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
In-Reply-To: <343d75ee-69f9-94b0-aa35-d5cd645ba0d3@gmail.com>
CC: adhemerval.zanella@linaro.org, Evan Green, libc-alpha@sourceware.org, slewis@rivosinc.com, Vineet Gupta
From: Palmer Dabbelt
To: jeffreyalaw@gmail.com

On Fri, 31 Mar 2023 11:07:02 PDT (-0700), jeffreyalaw@gmail.com wrote:
>
>
> On 3/30/23 13:38, Adhemerval Zanella Netto wrote:
>>
>>
>> On 30/03/23 03:20, Jeff Law wrote:
>>>
>>>
>>> On 3/29/23 13:45, Palmer Dabbelt wrote:
>>>
>>>> It's not in for-next yet, but various patch sets / proposals have been on the lists for a few months and it seems like discussion on the kernel side has pretty much died down.  That's why I was pinging the glibc side of things, if anyone here has comments on the interface then it's time to chime in.
>>>> If there's no comments then we're likely to end up with this in the next release (so queue into for-next soon, Linus' master in a month or so).
>>> Right.  And I've suggested that we at least try to settle on the various mem* and str* implementations independently of the kernel->glibc interface question.
>>>
>>> I don't much care how we break down the problem of selecting implementations, just that we get started.   That can and probably should be happening in parallel with the kernel->glibc API work.
>>>
>>> I've got some performance testing to do in this space (primarily of the VRULL implementations).  It's just going to take a long time to get the data.  And that implementation probably needs some revamping after all the work on the mem* and str* infrastructure that landed earlier this year.
>>>
>>
>> I don't think glibc is the right place for code dump, specially for implementations
>> that does not have representative performance numbers in real hardware and might
>> require further tuning. It can be even tricky if you require different build config
>> to testing as used to have for some ABI (for instance on powerpc with --with-cpu),
>> at least for ifunc we have some mechanism to test multiple variants assuming the
>> chips at least support (which should be case for unaligned).
> It's not meant to be "code dump". It's "these are the recommended
> implementation and we're just waiting for the final ifunc wiring to use
> them automatically."
>
> But I understand your point. Even if we just agree on the
> implementations without committing until the ifunc interface is settled
> is a major step forward.
>
> My larger point is that we need to work through the str* and mem*
> implementations and settle on those implementations and that can happen
> in independently of the interface discussion with the kernel team. If
> we've settled on specific implementations, why not go ahead and put them
> into the repo with the expectation that we can trivially wire them into
> the ifunc resolver once the abi interface is sorted out.

IMO that's fine: we've got a bunch of other infrastructure around these optimized routines that will need to get built (glibc_hwcaps, for example) so it's not like just having hwprobe means we're done.

The only issue I see with having these in tree is that we'll end up with glibc binaries that have vendor-specific tunings, but no way to provide those with generic binaries. That means vendors will end up shipping these non-portable binaries. We've historically tried to avoid that wherever possible, but it's probably time to call that a pipe dream -- the only base we could really have is rv64gc, and that's going to be so slow it's essentially useless for any real systems.

So if you guys have actual performance gain numbers to talk about, then I'm happy taking the optimized glibc routines (or at least whatever bits of them are in RISC-V land) for that hardware -- even if it means there's a build-time configuration that results in Ventana-specific binaries.

I think we do want to keep pushing on the dynamic flavors of stuff, just so we can try to dig out of this hole at some point, but we're going to have a mess until the ISA gets sorted out. My guess is that will take years, and blocking the optimizations until then is just going to lead to a bunch of out-of-tree ports from vendors and an even bigger mess.

>> So for experimental routines, where you expect to have frequent tuning based on
>> once you have tested and benchmarks on different chips; an external project
>> might a better idea; and sync with glibc once the routines are tested and validate.
>> And these RISCV does seemed to be still very experimental, where performance numbers
>> are still synthetic ones from emulators.
> I think we're actually a lot closer than you might think :-) My goal
> would be that we're not doing frequent tuning and avoid uarch specific
> versions if we at all can. There's a reasonable chance we can do that
> if we have good baseline, zbb and vector versions. I'm not including

Unfortunately there's going to be very wide variation in performance between vendors for the vector extension; we're going to have at least 3 flavors of anything there (plus whatever Allwinner/T-Head ends up needing, but that's a whole can of worms). So I think at this point we'd be better off just calling these vendor-specific routines; if there's some commonality between them we can sort it out later.

> cboz memory clear right now -- there's already evidence that uarch
> considerations around cboz may be significant.

Yep, again there's at least 3 ways of implementing CBOZ that I've seen floating around so we're going to have a vendor-specific mess there.

>> Another possibility might to improve the generic implementation, as we have done
>> recently where RISCV bitmanip was a matter to add just 2 files and 4 functions
>> to optimize multiple string functions [2]. I have some WIP patches to add support
>> for unaligned memcpy/memmove with a very simple strategy.
> As I noted elsewhere. I was on the fence with pushing for improvements
> to the generic strcmp bits, but could be easily swayed to that position.
>
> jeff