From: Adhemerval Zanella Netto
Organization: Linaro
Date: Thu, 30 Mar 2023 16:43:34 -0300
Subject: Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
To: Evan Green
Cc: Palmer Dabbelt, libc-alpha@sourceware.org, slewis@rivosinc.com, Vineet Gupta, Arnd Bergmann

On 30/03/23 15:31, Evan Green wrote:
> Hi Adhemerval,
>
> On Wed, Mar 29, 2023 at 1:13 PM Adhemerval Zanella Netto wrote:
>>
>> On 29/03/23 16:45, Palmer Dabbelt wrote:
>>> On Wed, 29 Mar 2023 12:16:39 PDT (-0700), adhemerval.zanella@linaro.org wrote:
>>>>
>>>> On 28/03/23 21:01, Palmer Dabbelt wrote:
>>>>> On Tue, 28 Mar 2023 16:41:10 PDT (-0700), adhemerval.zanella@linaro.org wrote:
>>>>>>
>>>>>> On 28/03/23 19:54, Palmer Dabbelt wrote:
>>>>>>> On Tue, 21 Feb 2023 11:15:34 PST (-0800), Evan Green wrote:
>>>>>>>>
>>>>>>>> This series illustrates the use of a proposed Linux syscall that
>>>>>>>> enumerates architectural information about the RISC-V cores the
>>>>>>>> system is running on. In this series we expose a small wrapper
>>>>>>>> function around the syscall. An ifunc selector for memcpy queries
>>>>>>>> it to see if unaligned access is "fast" on this hardware. If it
>>>>>>>> is, it selects a newly provided implementation of memcpy that
>>>>>>>> doesn't work hard at aligning the src and destination buffers.
>>>>>>>>
>>>>>>>> This is somewhat of a proof of concept for the syscall itself,
>>>>>>>> but I do find that in my goofy memcpy test [1], the unaligned
>>>>>>>> memcpy performed at least as well as the generic C version. This
>>>>>>>> is, however, on QEMU on an M1 Mac, so not a test of any real
>>>>>>>> hardware (more a smoke test that the implementation isn't silly).
>>>>>>>
>>>>>>> QEMU isn't a good enough benchmark to justify a new memcpy routine
>>>>>>> in glibc. Evan has a D1, which does support misaligned access and
>>>>>>> runs some simple benchmarks faster. There have also been some minor
>>>>>>> changes to the Linux side of things that warrant a v3 anyway, so
>>>>>>> he'll just post some benchmarks on HW along with that.
>>>>>>>
>>>>>>> Aside from those comments,
>>>>>>>
>>>>>>> Reviewed-by: Palmer Dabbelt
>>>>>>>
>>>>>>> There's a lot more stuff to probe for, but I think we've got enough
>>>>>>> of a proof of concept for the hwprobe stuff that we can move forward
>>>>>>> with the core interface bits in Linux/glibc and then unleash the
>>>>>>> chaos...
>>>>>>>
>>>>>>> Unless anyone else has comments?
>>>>>>
>>>>>> Until riscv_hwprobe is in Linus' tree as an official Linux ABI, this
>>>>>> patchset cannot be installed. We failed to enforce this on some
>>>>>> occasions (like Intel CET) and it turned into a complete mess after
>>>>>> some years...
>>>>>
>>>>> Sorry if that wasn't clear, I was asking if there were any more
>>>>> comments from the glibc side of things before merging the Linux code.
>>>>
>>>> Right, so is this already settled to be the de facto ABI to query for
>>>> system information on RISC-V? Or is it still being discussed? Is it in
>>>> a -next branch already, and/or has it been tested with a patched glibc?
>>>
>>> It's not in for-next yet, but various patch sets / proposals have been
>>> on the lists for a few months and it seems like discussion on the kernel
>>> side has pretty much died down. That's why I was pinging the glibc side
>>> of things: if anyone here has comments on the interface, then it's time
>>> to chime in. If there are no comments, then we're likely to end up with
>>> this in the next release (so queued into for-next soon, Linus' master in
>>> a month or so).
>>>
>>> IIUC Evan's been testing the kernel+glibc stuff on QEMU, but he should
>>> be able to ack that explicitly (it's a little vague in the cover
>>> letter). There's also a glibc-independent kselftest as part of the
>>> kernel patch set:
>>> https://lore.kernel.org/all/20230327163203.2918455-6-evan@rivosinc.com/ .
>>
>> I am not sure if this is the latest thread, but it seems from the cover
>> letter link that Arnd has raised some concerns about the interface [1]
>> that have not been fully addressed.
>
> I've replied to that thread.
>
>>
>> From the libc perspective, the need to specify the query key on
>> riscv_hwprobe should not be a problem (libc must know what to handle;
>> unknown tags are of no use) and it simplifies the buffer management
>> (there is no need to query for an unknown set of keys or allocate a
>> large buffer to handle multiple non-required pairs).
>>
>> However, I agree with Arnd that there should be no need to optimize for
>> hardware that has an asymmetric set of features and, at least for glibc
>> usage and most runtime feature selection, it does not make sense to
>> query per-cpu information (unless you do some very specific programming,
>> like pinning the process to specific cores and enabling core-specific
>> code).
>
> I pushed back on that in my reply upstream, feel free to jump in
> there. I think you're right that glibc probably wouldn't ever use the
> cpuset aspect of the interface, but the gist of my reply upstream is
> that more specialized apps may.

Well, I still think providing userland with an asymmetric set of features
is a complexity that does not pay off, but at least the interface does
allow returning a concise view of the supported features.

>
>>
>> I also wonder how hotplug or cpusets would play with the vDSO support,
>> and how the kernel would synchronize updates, if any, to the private
>> vDSO data.
>
> The good news is that the cached data in the vDSO is not ABI, it's
> hidden behind the vDSO function. So as things like hotplug start
> evolving and interacting with the vDSO cache data, we can update what
> data we cache and when we fall back to the syscall.

Right, I was just curious how one would synchronize the vDSO code with
concurrent updates from the kernel. Some time ago, I was working with
another kernel developer on a vDSO getrandom, and it required a lot of
boilerplate; even then, we did not come up with a good interface for
concurrent access to a structure that the kernel might change
concurrently.
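For concreteness, the selection the cover letter describes might look
roughly like the sketch below. This is a minimal illustration, not the
series' actual code: the struct layout, key, and values follow the
proposed kernel patches and may still change before the ABI is final, and
__riscv_hwprobe stands in for the wrapper function the series adds.

    /* Minimal sketch of hwprobe-driven ifunc selection for memcpy.
       All constants and helper names here are assumed from the proposed
       kernel/glibc patches, not a final ABI.  */
    #include <stddef.h>
    #include <stdint.h>

    struct riscv_hwprobe
    {
      int64_t key;
      uint64_t value;
    };

    /* Key and values as proposed in the kernel series.  */
    #define RISCV_HWPROBE_KEY_CPUPERF_0   5
    #define RISCV_HWPROBE_MISALIGNED_MASK 0x7
    #define RISCV_HWPROBE_MISALIGNED_FAST 0x3

    extern int __riscv_hwprobe (struct riscv_hwprobe *pairs,
                                size_t pair_count, size_t cpu_count,
                                unsigned long *cpus, unsigned int flags);

    extern void *__memcpy_generic (void *, const void *, size_t);
    extern void *__memcpy_noalignment (void *, const void *, size_t);

    typedef void *(*memcpy_t) (void *, const void *, size_t);

    /* Ifunc selector: ask whether misaligned access is "fast" across all
       online CPUs (cpu_count == 0, cpus == NULL) and pick an
       implementation once, at relocation time.  */
    static memcpy_t
    select_memcpy (void)
    {
      struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_CPUPERF_0 };

      if (__riscv_hwprobe (&pair, 1, 0, NULL, 0) == 0
          && (pair.value & RISCV_HWPROBE_MISALIGNED_MASK)
             == RISCV_HWPROBE_MISALIGNED_FAST)
        return __memcpy_noalignment;
      return __memcpy_generic;
    }

    void *memcpy (void *, const void *, size_t)
         __attribute__ ((ifunc ("select_memcpy")));

Note how the caller names each key it wants: a single pair on the stack is
all the buffer management needed, which is the simplification Adhemerval
points out above, since unknown keys are simply never queried.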
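On the synchronization question, the usual answer for vDSO data that the
kernel updates behind userspace's back is a seqcount: the kernel makes a
sequence counter odd before writing and even again afterwards, and the
reader retries if the counter was odd or changed across its read. A sketch
of the read side follows; every name in it is illustrative and none of it
is part of the proposed hwprobe ABI.

    /* Illustrative seqcount read loop, the pattern the kernel's vDSO
       time functions use for concurrently updated data.  */
    #include <stdatomic.h>
    #include <stdint.h>

    struct vdso_hwprobe_data
    {
      atomic_uint seq;     /* Kernel keeps this odd while updating.  */
      uint64_t cpuperf_0;  /* Example of a cached hwprobe value.  */
    };

    static uint64_t
    read_cached_value (struct vdso_hwprobe_data *data)
    {
      unsigned int seq;
      uint64_t value;

      do
        {
          /* Spin until no update is in progress (sequence is even).  */
          while (((seq = atomic_load_explicit (&data->seq,
                                               memory_order_acquire))
                  & 1) != 0)
            ;
          value = data->cpuperf_0;
          /* Order the data read before the sequence re-check.  */
          atomic_thread_fence (memory_order_acquire);
        }
      /* If the kernel wrote in between, the sequence changed: retry.  */
      while (atomic_load_explicit (&data->seq, memory_order_relaxed)
             != seq);

      return value;
    }

The boilerplate Adhemerval mentions for the vDSO getrandom work is largely
in getting these ordering details right, and in deciding when the reader
should give up and fall back to the syscall instead of retrying.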