From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qv1-xf29.google.com (mail-qv1-xf29.google.com [IPv6:2607:f8b0:4864:20::f29]) by sourceware.org (Postfix) with ESMTPS id D78563858C41 for ; Mon, 7 Aug 2023 22:48:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D78563858C41 Authentication-Results: sourceware.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=google.com Received: by mail-qv1-xf29.google.com with SMTP id 6a1803df08f44-63d23473ed5so28618756d6.1 for ; Mon, 07 Aug 2023 15:48:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1691448503; x=1692053303; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=MgyQROasdVc+7q4FMZZP1+kyPIsPCVaQQsORcD+Wdvc=; b=c+qzK6xa4fhro6W5Q6ls6lgTvJeSumXXFbYZcXzheVd4DtaM739fviFUhj2N2pLaG2 tV9h7+zxpyrJ68j7ojQ8mn3OVEdgkBE9jUrcWfzCN53FBY1aVYDOIEizRjUXOT7iODDa uVAxWiA3rmpV/bDVZxIRiS/3EtMCKWTFsq8nuHMxNLUwS9j7OgaU4S/1O5v2ftKQVi+T IaBAlSXV/x3zINy/ZVEKfLW24sYo45OGMCZQ+Lq0O5VJXiV7LkrA8KfLZ39fhOMsTTMb 2SEmI0x9PyAYvZPUlCZwrHXbUJYxcJxYPcA739n91OP6/4Zc+w00/62tMohkudKt3EVJ Wk3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691448503; x=1692053303; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=MgyQROasdVc+7q4FMZZP1+kyPIsPCVaQQsORcD+Wdvc=; b=iAo+sMI36FlVmwHi1OZWhWtoW9MBJn1zN9fmconHajl5zQWhSlbWCvxpv/J+LEOXOu GSueo8AQPdoEP6BqJyj8VJi+d4sAkjepgIIMjFG895nNGGIR1FpHQR0MVDIP2m4EQOaM dh1k7hH1hIdmEC2HpMW7trT6v9B5VBwAu/xZT94cTkBJTp91TgRqer8S6whHKSfQ7Xei 2uZlmU0OcWdYsKjGN1HXq35CXdY8n2sbMenhS4qwMejKlTSjzPa8zXd4dr3RMB6pzwE1 uudZlAg7aF8OBHyhOiB9pWr46bhDgnPS/+0DVu3wPOOp6K57DzPF6BHHgPLBBWCcGxGL Ym6Q== X-Gm-Message-State: AOJu0YyGiZ3pZvnSpC5TxhKQrS+s0Wd9psOH+0CjRE33OFLr4Dv6aF49 U6mzRtfZwfAEmXiLryWEv+GGbMMXCdnZhR+WBe6Gsw== X-Google-Smtp-Source: AGHT+IFjJYIB6RgB656g9VjyQDWPbqNmQsVbFS37ib/MvBdKB0Pljg3zXUTLf0WquA8g1Qy5jCN3u6qV0C8Nlfbb/IQ= X-Received: by 2002:a05:6214:ac9:b0:625:99ee:3ad8 with SMTP id g9-20020a0562140ac900b0062599ee3ad8mr15503475qvi.31.1691448503549; Mon, 07 Aug 2023 15:48:23 -0700 (PDT) MIME-Version: 1.0 References: <20230802155903.2552780-1-evan@rivosinc.com> <20230802155903.2552780-6-evan@rivosinc.com> <87il9w37vi.fsf@oldenburg.str.redhat.com> <6f0911c6-b24b-444c-4b4b-a62e49a51734@linaro.org> <548fc7d5-6225-69e7-f4a7-47669d2fdbd5@linaro.org> In-Reply-To: From: enh Date: Mon, 7 Aug 2023 15:48:11 -0700 Message-ID: Subject: Re: [PATCH v6 5/5] riscv: Add and use alignment-ignorant memcpy To: Evan Green Cc: Richard Henderson , Florian Weimer , libc-alpha@sourceware.org, slewis@rivosinc.com, palmer@rivosinc.com, vineetg@rivosinc.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-17.9 required=5.0 tests=BAYES_00,DKIMWL_WL_MED,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, Aug 7, 2023 at 3:11=E2=80=AFPM Evan Green wrote= : > > On Thu, Aug 3, 2023 at 3:30=E2=80=AFPM Richard Henderson > wrote: > > > > On 8/3/23 11:42, Evan Green wrote: > > > On Thu, Aug 3, 2023 at 10:50=E2=80=AFAM Richard Henderson > > > wrote: > > >> Outside libc something is required. > > >> > > >> An extra parameter to ifunc is surprising though, and clearly not id= eal per the extra > > >> hoops above. I would hope for something with hidden visibility in l= ibc_nonshared.a that > > >> could always be called directly. > > > > > > My previous spin took that approach, defining a > > > __riscv_hwprobe_early() in libc_nonshared that could route to the rea= l > > > function if available, or make the syscall directly if not. But that > > > approach had the drawback that ifunc users couldn't take advantage of > > > the vDSO, and then all users had to comprehend the difference between > > > __riscv_hwprobe() and __riscv_hwprobe_early(). > > > > I would define __riscv_hwprobe such that it could take advantage of the= vDSO once > > initialization reaches a certain point, but cope with being run earlier= than that point by > > falling back to the syscall. > > > > That constrains the implementation, I guess, in that it can't set errno= , but just > > returning the negative errno from the syscall seems fine. > > > > It might be tricky to get a reference to GLRO(dl_vdso_riscv_hwprobe) ve= ry early, but I > > would hope that some application of __attribute__((weak)) might correct= ly get you a NULL > > prior to full relocations being complete. > > Right, this is what we had in the previous iteration of this series, > and it did work ok. But it wasn't as good since it meant ifunc > selectors always got stuck in the null/fallback case and were forced > to make the syscall. With this mechanism they get to take advantage of > the vDSO. > > > > > > > > In contrast, IMO this approach is much nicer. Ifunc writers are > > > already used to getting hwcap info via a parameter. Adding this secon= d > > > parameter, which also provides hwcap-like things, seems like a natura= l > > > extension. I didn't quite follow what you meant by the "extra hoops > > > above". > > > > The check for null function pointer, for sure. But also consider how _= _riscv_hwprobe is > > going to be used. > > > > It might be worth defining some helper functions for probing a single k= ey or a single > > field. E.g. > > > > uint64_t __riscv_hwprobe_one_key(int64_t key, unsigned int flags) > > { > > struct riscv_hwprobe pair =3D { .key =3D key }; > > int err =3D __riscv_hwprobe(&pair, 1, 0, NULL, flags); > > if (err) > > return err; > > if (pair.key =3D=3D -1) > > return -ENOENT; > > return pair.value; > > } > > > > This implementation requires that no future hwprobe key define a value = which as a valid > > value in the errno range (or better, bit 63 unused). Alternately, or a= dditionally: > > > > bool __riscv_hwprobe_one_mask(int64_t key, uint64_t mask, uint64_t val,= int flags) > > { > > struct riscv_hwprobe pair =3D { .key =3D key }; > > return (__riscv_hwprobe(&pair, 1, 0, NULL, flags) =3D=3D 0 > > && pair.key !=3D -1 > > && (pair.value & mask) =3D=3D val); > > } > > > > These yield either > > > > int64_t v =3D __riscv_hwprobe_one_key(CPUPERF_0, 0); > > if (v >=3D 0 && (v & MISALIGNED_MASK) =3D=3D MISALIGNED_FAST) > > return __memcpy_noalignment; > > return __memcpy_generic; > > > > or > > > > if (__riscv_hwprobe_one_mask(CPUPERF_0, MISALIGNED_MASK, MISALIGNE= D_FAST, 0)) > > return __memcpy_noalignment; > > return __memcpy_generic; > > > > which to my mind looks much better for a pattern you'll be replicating = so very many times > > across all of the ifunc implementations in the system. > > Ah, I see. I could make a static inline function in the header that > looks something like this (mangled by gmail, sorry): > > /* Helper function usable from ifunc selectors that probes a single key. = */ > static inline int __riscv_hwprobe_one(__riscv_hwprobe_t hwprobe_func, > signed long long int key, > unsigned long long int *value) > { > struct riscv_hwprobe pair; > int rc; > > if (!hwprobe_func) > return -ENOSYS; > > pair.key =3D key; > rc =3D hwprobe_func(&pair, 1, 0, NULL, 0); > if (rc) { > return rc; > } > > if (pair.key < 0) { > return -ENOENT; > } > > *value =3D pair.value; > return 0; > } > > The ifunc selector would then be significantly cleaned up, looking > something like: > > if (__riscv_hwprobe_one(hwprobe_func, RISCV_HWPROBE_KEY_CPUPERF_0, &value= )) > return __memcpy_generic; > > if (value & RISCV_HWPROBE_MISALIGNED_MASK) =3D=3D RISCV_HWPROBE_MISALIGNE= D_FAST) > return __memcpy_noalignment; (Android's libc maintainer here, having joined the list just to talk about risc-v ifuncs :-) ) has anyone thought about calling ifunc resolvers more like this... --same part of the dynamic loader that caches the two getauxval()s for arm6= 4-- static struct riscv_hwprobe probes[] =3D { {.value =3D RISCV_HWPROBE_KEY_MVENDORID}, {.value =3D RISCV_HWPROBE_KEY_MARCHID}, {.value =3D RISCV_HWPROBE_KEY_MIMPID}, {.value =3D RISCV_HWPROBE_KEY_BASE_BEHAVIOR}, {.value =3D RISCV_HWPROBE_KEY_IMA_EXT}, {.value =3D RISCV_HWPROBE_KEY_CPUPERF_0}, ... // every time a new key is added to the kernel, we add it here }; __riscv_hwprobe(...); // called once --part of the dynamic loader that calls ifunc resolvers-- (*ifunc_resolver)(sizeof(probes)/sizeof(probes[0]), probes); this is similar to what we already have for arm64 (where there's a getauxval(AT_HWCAP) and a pointer to a struct for AT_HWCAP2 and potentially others), but more uniform, and avoiding the source (in)compatibility issues of adding new fields to a struct [even if it does have a size_t to "version" it like the arm64 ifunc struct]. yes, it means everyone pays to get all the hwprobes, but that gets amortized. and lookup in the ifunc resolver is simple and quick. if we know that the keys will be kept dense, we can even have code in ifunc resolvers like if (probes[RISCV_HWPROBE_BASE_BEHAVIOR_IMA].value & RISCV_HWPROBE_IMA_V) ..= . though personally for the "big ticket items" that get a letter to themselves like V, i'd be tempted to pass `(getauxval(AT_HWCAP), probe_count, probes_ptr)` to the resolver, but i hear that's controversial :-) > -Evan