Date: Mon, 6 Feb 2023 12:05:59 -1000
From: Richard Henderson
To: Evan Green, libc-alpha@sourceware.org
Cc: slewis@rivosinc.com, vineetg@rivosinc.com, palmer@rivosinc.com
Subject: Re: [PATCH 2/2] riscv: Add and use alignment-ignorant memcpy

On 2/6/23 09:48, Evan Green wrote:
> +	/* Remainder is smaller than a page, compute native word count */
> +	beqz	a2, 6f
> +	andi	a5, a2, ~(SZREG-1)
> +	andi	a2, a2, (SZREG-1)
> +	add	a3, a1, a5
> +	/* Jump directly to byte copy if no words. */
> +	beqz	a5, 4f
> +
> +3:
> +	/* Use single native register copy */
> +	REG_L	a4, 0(a1)
> +	addi	a1, a1, SZREG
> +	REG_S	a4, 0(t6)
> +	addi	t6, t6, SZREG
> +	bltu	a1, a3, 3b
> +
> +	/* Jump directly out if no more bytes */
> +	beqz	a2, 6f
> +
> +4:
> +	/* Copy the last few individual bytes */
> +	add	a3, a1, a2
> +5:
> +	lb	a4, 0(a1)
> +	addi	a1, a1, 1
> +	sb	a4, 0(t6)
> +	addi	t6, t6, 1
> +	bltu	a1, a3, 5b
> +6:
> +	ret

If you know there are at least SZREG bytes in the range, you can avoid the byte loop entirely by finishing with one unaligned word copy that ends exactly at the end of the buffer. That may copy some bytes twice, but that's ok: memcpy's buffers can't overlap, so the rewritten bytes just get the same values again.

Similarly, you can redundantly copy a few bytes at the beginning to align the destination (there's usually some cost for unaligned stores, even if they're generally "fast").

For memcpy < SZREG, you don't need a loop at all; just test the low few bits of len and do at most one copy of each power-of-two size below SZREG.

Have a look at the tricks in sysdeps/x86_64/multiarch/memmove-ssse3.S for ideas.


r~
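A minimal C sketch of the overlapping-tail idea, assuming SZREG is 8 (as on RV64) and len >= SZREG. The names copy_overlap_tail, load_word and store_word are hypothetical. memcpy into a local word is only the portable C spelling of a single unaligned load or store; a real implementation (in assembly, like the patch) would use REG_L/REG_S directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SZREG 8 /* Assumption: native register width in bytes, as on RV64.  */

typedef uint64_t word_t;

/* Unaligned word load, expressed portably via memcpy.  */
static inline word_t load_word(const unsigned char *p)
{
  word_t w;
  memcpy(&w, p, sizeof w);
  return w;
}

/* Unaligned word store, expressed portably via memcpy.  */
static inline void store_word(unsigned char *p, word_t w)
{
  memcpy(p, &w, sizeof w);
}

/* Precondition: len >= SZREG and the buffers do not overlap.  */
void *copy_overlap_tail(void *dst, const void *src, size_t len)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  size_t i = 0;

  /* Copy whole words while at least SZREG bytes remain.  */
  for (; i + SZREG <= len; i += SZREG)
    store_word(d + i, load_word(s + i));

  /* Instead of a byte loop, copy the last SZREG bytes as one word.
     It overlaps bytes already copied, rewriting them with the same
     values.  */
  if (i < len)
    store_word(d + len - SZREG, load_word(s + len - SZREG));

  return dst;
}
```

The precondition matters: with len < SZREG, `d + len - SZREG` would point before the buffer, which is why the short case needs separate handling.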
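The destination-alignment trick can be sketched the same way. This is again hypothetical C with SZREG fixed at 8 and len >= SZREG. One unaligned store covers the head. The pointers then advance to the next SZREG-aligned destination address, so every store in the main loop is aligned (loads may still be unaligned). A single overlapping store handles the tail. The name copy_align_dst is illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SZREG 8 /* Assumption: native register width in bytes, as on RV64.  */

typedef uint64_t word_t;

static inline word_t load_word(const unsigned char *p)
{
  word_t w;
  memcpy(&w, p, sizeof w); /* Portable unaligned load.  */
  return w;
}

static inline void store_word(unsigned char *p, word_t w)
{
  memcpy(p, &w, sizeof w); /* Portable unaligned store.  */
}

/* Precondition: len >= SZREG and the buffers do not overlap.  */
void *copy_align_dst(void *dst, const void *src, size_t len)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  unsigned char *d_end = d + len;
  const unsigned char *s_end = s + len;

  /* Head: one unaligned store covers the first SZREG bytes.  */
  store_word(d, load_word(s));

  /* Step to the next SZREG-aligned destination address (1..SZREG bytes).
     The skipped bytes were already written by the head store; a later
     store may rewrite some of them with the same values.  */
  size_t skip = SZREG - ((uintptr_t) d & (SZREG - 1));
  d += skip;
  s += skip;

  /* Body: every store is now aligned; loads may still be unaligned.  */
  while ((size_t) (d_end - d) >= SZREG)
    {
      store_word(d, load_word(s));
      d += SZREG;
      s += SZREG;
    }

  /* Tail: one overlapping word ending exactly at the end of the buffer.  */
  if (d < d_end)
    store_word(d_end - SZREG, load_word(s_end - SZREG));

  return dst;
}
```

Because skip is at most SZREG and len >= SZREG, `d + skip` never passes the end of the destination.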
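For len < SZREG, testing the low bits of len replaces the byte loop with at most three moves. A sketch assuming SZREG is 8, so only bits 2, 1 and 0 can be set. The name copy_small is hypothetical, and memcpy again stands in for a single unaligned access.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len < 8 bytes (SZREG on RV64, assumed here) without a loop:
   bit 2 of len selects a 4-byte move, bit 1 a 2-byte move and bit 0 a
   single byte, so the moves add up to exactly len bytes.  */
void *copy_small(void *dst, const void *src, size_t len)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  if (len & 4)
    {
      uint32_t w;
      memcpy(&w, s, sizeof w); /* Unaligned 4-byte load.  */
      memcpy(d, &w, sizeof w); /* Unaligned 4-byte store.  */
      d += 4;
      s += 4;
    }
  if (len & 2)
    {
      uint16_t h;
      memcpy(&h, s, sizeof h); /* Unaligned 2-byte load.  */
      memcpy(d, &h, sizeof h); /* Unaligned 2-byte store.  */
      d += 2;
      s += 2;
    }
  if (len & 1)
    *d = *s;

  return dst;
}
```

On RV32 (SZREG of 4) the same scheme needs only the 2-byte and 1-byte tests.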