From: Sergei Lewis
Date: Fri, 3 Feb 2023 14:04:49 +0000
Subject: Re: [PATCH 2/2] riscv: vectorised mem* and str* functions
To: Adhemerval Zanella Netto
Cc: Andrew Waterman, libc-alpha@sourceware.org
References: <20230201095232.15942-1-slewis@rivosinc.com>
 <20230201095232.15942-2-slewis@rivosinc.com>
 <87479d1a-abf3-b564-8613-2a48d26527b5@linaro.org>
 <10c3e62f-e5a3-8c3f-7a5d-509b696aa12c@linaro.org>

> Is there a meaningful performance difference on always using the
> __riscv_strict_align code path

Minimum VLEN is 16 bytes, so for random operations we will average 8
iterations of a bytewise loop vs 2 of a word-oriented one.

That said, I'm currently working on a v2 patch that removes the scalar
fallback entirely - over a suite of random operation sizes, dropping the
word-based loop is more expensive than just not having the fallback at all.
So these patterns will all go away and the code will look much more like
what's in the ISA manual.

> The VLEN arbitrary upper bound and page size limit is also worrisome, as
> Andrew has pointed out. I would prefer to either have a generic
> implementation that works without such limits

I realise I have not responded there yet - I'm certainly not ignoring this,
but investigating options before I commit. For example, another option might
be to gate the affected code behind a compile-time VLEN check, and to use
fault-only-first loads, as Andrew suggests, where there was not enough
information provided at compile time to prove the approach is safe.

These decisions will all become much more straightforward with ifunc
support. A generic version for the most common situation, plus runtime
selection of more specific versions, would resolve all these issues. It would
also open the gates for people working on widely different implementations
to easily provide their own versions of as many or as few of these functions
as needed - and I do expect there will be a number of these, since the
architecture is super flexible and the ecosystem is already looking quite
fragmented.

Accordingly, I am also investigating what will be involved in getting ifunc
support in place.
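
As a rough illustration of the "generic version plus runtime selection" idea above, here is a minimal sketch in plain C. It is not the v2 patch and does not use glibc's actual ifunc machinery: it approximates the same shape with a function pointer resolved on first use. The names my_strlen, strlen_generic, strlen_vector_stub, cpu_has_vector and resolve_strlen are all hypothetical, and the capability probe is hard-wired so the sketch runs on any machine.

```c
#include <assert.h>
#include <stddef.h>

/* Generic byte-at-a-time implementation: makes no assumption about
   VLEN or page size, so it is safe on any implementation. */
static size_t strlen_generic(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Placeholder for an implementation-specific variant (hypothetical).
   A real one would use vector instructions; this stub forwards to the
   generic code so the sketch stays portable. */
static size_t strlen_vector_stub(const char *s)
{
    return strlen_generic(s);
}

/* Hypothetical capability probe. A real resolver would query the
   platform for vector support; here it always reports "no". */
static int cpu_has_vector(void)
{
    return 0;
}

typedef size_t (*strlen_fn)(const char *);

/* Plays the role of an ifunc resolver: pick one implementation based
   on what the running hardware supports. */
static strlen_fn resolve_strlen(void)
{
    return cpu_has_vector() ? strlen_vector_stub : strlen_generic;
}

/* Cached choice. Real ifuncs are resolved by the dynamic loader at
   relocation time; this lazy initialisation is only an approximation
   and is not thread-safe as written. */
static strlen_fn selected_strlen;

size_t my_strlen(const char *s)
{
    assert(s != NULL);
    if (selected_strlen == NULL)
        selected_strlen = resolve_strlen();
    return selected_strlen(s);
}
```

With this shape, someone targeting a particular implementation only has to supply another variant and extend the resolver. The generic path stays the default wherever nothing more specific can be proven safe.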