Message-ID: <0f4ea30b-a040-2fb8-ce4b-285d0c7c1a4e@gmail.com>
Date: Thu, 6 Jul 2023 16:20:40 -0600
Subject: Re: [PATCH v4 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
To: Palmer Dabbelt, Evan Green, Jeff Law
Cc: libc-alpha@sourceware.org, slewis@rivosinc.com, Vineet Gupta, fweimer@redhat.com
From: Jeff Law

On 7/6/23 14:11, Palmer Dabbelt wrote:
>
> Thanks.  Given that it has a meaningful performance increase on the
> T-Head hardware it seems reasonable to take it for the next release.  I
> don't remember if I've looked super closely at the implementation, I'll
> do so before testing and merging it -- certainly not this week, though,
> as the merge window will probably eat all my spare cycles.
>
> The only issue on my end is the assembly memcpy routine, which we were
> generally trying to avoid.  +Jeff, as IIUC the Ventana folks were
> interested in memcpy on fast-misaligned systems.  Do you guys happen to
> have one lying around for the C implementation?  It'd be nice to see if
> we're getting any real performance benefit from the assembly.

It's just an assembly version from the VRULL team.

It's a fairly typical decision tree based on the amount of data being
copied.  Each of the variants tries to avoid loops by unrolling them in
a sensible way.

What's never been 100% clear to me is whether or not the full decision
tree is actually that profitable in practice.

With that in mind, I wouldn't object to Evan's implementation.  It's a
bit simplistic, but I'm OK with that until someone proves additional
complexity is really needed.  And I suspect we'll be using "V" based
copiers soon anyway.

Jeff
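
For readers unfamiliar with the "decision tree based on the amount of data being copied" shape mentioned above, here is a minimal C sketch of the general idea. It is not the VRULL assembly routine and not Evan's implementation; the name sketch_copy, the helper names, and the 8/16/32-byte thresholds are all assumptions chosen for illustration. It assumes a system where misaligned word accesses are cheap, and expresses them portably through fixed-size memcpy calls, which compilers typically lower to single loads and stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: fixed-size memcpy is the portable way to
   write a possibly misaligned load or store in C. */
static inline uint64_t ld8(const unsigned char *p) { uint64_t v; memcpy(&v, p, 8); return v; }
static inline uint32_t ld4(const unsigned char *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static inline uint16_t ld2(const unsigned char *p) { uint16_t v; memcpy(&v, p, 2); return v; }
static inline void st8(unsigned char *p, uint64_t v) { memcpy(p, &v, 8); }
static inline void st4(unsigned char *p, uint32_t v) { memcpy(p, &v, 4); }
static inline void st2(unsigned char *p, uint16_t v) { memcpy(p, &v, 2); }

/* Size-based decision tree (thresholds are illustrative assumptions).
   Small sizes use overlapping head/tail accesses instead of a loop. */
void *sketch_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n < 8) {
        if (n >= 4) {            /* 4..7 bytes: two overlapping words */
            uint32_t a = ld4(s), b = ld4(s + n - 4);
            st4(d, a); st4(d + n - 4, b);
        } else if (n >= 2) {     /* 2..3 bytes: two overlapping halfwords */
            uint16_t a = ld2(s), b = ld2(s + n - 2);
            st2(d, a); st2(d + n - 2, b);
        } else if (n == 1) {
            d[0] = s[0];
        }
        return dst;
    }
    if (n <= 16) {               /* 8..16 bytes: two overlapping doublewords */
        uint64_t a = ld8(s), b = ld8(s + n - 8);
        st8(d, a); st8(d + n - 8, b);
        return dst;
    }
    if (n <= 32) {               /* 17..32 bytes: head 16 plus tail 16 */
        uint64_t a = ld8(s), b = ld8(s + 8);
        uint64_t c = ld8(s + n - 16), e = ld8(s + n - 8);
        st8(d, a); st8(d + 8, b);
        st8(d + n - 16, c); st8(d + n - 8, e);
        return dst;
    }
    /* More than 32 bytes: unrolled 32-byte blocks, then one overlapping
       32-byte tail so no byte-granular cleanup loop is needed. */
    size_t i = 0;
    for (; i + 32 < n; i += 32) {
        uint64_t a = ld8(s + i), b = ld8(s + i + 8);
        uint64_t c = ld8(s + i + 16), e = ld8(s + i + 24);
        st8(d + i, a); st8(d + i + 8, b);
        st8(d + i + 16, c); st8(d + i + 24, e);
    }
    const unsigned char *ts = s + n - 32;
    unsigned char *td = d + n - 32;
    uint64_t a = ld8(ts), b = ld8(ts + 8), c = ld8(ts + 16), e = ld8(ts + 24);
    st8(td, a); st8(td + 8, b); st8(td + 16, c); st8(td + 24, e);
    return dst;
}

/* Checks every size 0..128 at eight source/destination offsets,
   including guard bytes on both sides; returns the number of cases. */
int sketch_copy_selfcheck(void)
{
    unsigned char src[200], dst[200];
    int cases = 0;
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);
    for (size_t n = 0; n <= 128; n++) {
        for (size_t off = 0; off < 8; off++) {
            memset(dst, 0xAA, sizeof dst);
            sketch_copy(dst + 16 + off, src + off, n);
            assert(memcmp(dst + 16 + off, src + off, n) == 0);
            assert(dst[16 + off - 1] == 0xAA && dst[16 + off + n] == 0xAA);
            cases++;
        }
    }
    return cases;
}
```

Whether the extra branches pay for themselves is exactly the open question raised above; a sketch like this only makes it easier to benchmark the tree against a simpler loop on a given core.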