From: Evan Green
Date: Thu, 9 Feb 2023 13:04:04 -0800
Subject: Re: [PATCH 2/2] riscv: Add and use alignment-ignorant memcpy
To: Richard Henderson
Cc: libc-alpha@sourceware.org, slewis@rivosinc.com, vineetg@rivosinc.com, palmer@rivosinc.com
References: <20230206194819.1679472-1-evan@rivosinc.com> <20230206194819.1679472-3-evan@rivosinc.com> <085bfca3-cacd-ea57-bde3-1d84f07aaeda@linaro.org>
In-Reply-To: <085bfca3-cacd-ea57-bde3-1d84f07aaeda@linaro.org>

On Mon, Feb 6, 2023 at 2:06 PM Richard Henderson wrote:
>
> On 2/6/23 09:48, Evan Green wrote:
> > +    /* Remainder is smaller than a page, compute native word count */
> > +    beqz a2, 6f
> > +    andi a5, a2, ~(SZREG-1)
> > +    andi a2, a2, (SZREG-1)
> > +    add a3, a1, a5
> > +    /* Jump directly to byte copy if no words. */
> > +    beqz a5, 4f
> > +
> > +3:
> > +    /* Use single native register copy */
> > +    REG_L a4, 0(a1)
> > +    addi a1, a1, SZREG
> > +    REG_S a4, 0(t6)
> > +    addi t6, t6, SZREG
> > +    bltu a1, a3, 3b
> > +
> > +    /* Jump directly out if no more bytes */
> > +    beqz a2, 6f
> > +
> > +4:
> > +    /* Copy the last few individual bytes */
> > +    add a3, a1, a2
> > +5:
> > +    lb a4, 0(a1)
> > +    addi a1, a1, 1
> > +    sb a4, 0(t6)
> > +    addi t6, t6, 1
> > +    bltu a1, a3, 5b
> > +6:
> > +    ret
>
> If you know there are at least SZREG bytes in the range, you can avoid the byte loop by
> copying the last word unaligned.
> That may copy some bytes twice, but that's ok too.
>
> Similarly, you can redundantly copy a few bytes at the beginning to align the destination
> (there's usually some cost for unaligned stores, even if it's generally "fast").
>
> For memcpy < SZREG, you don't need a loop; just test the final few bits of len.
> Have a look at the tricks in sysdeps/x86_64/multiarch/memmove-ssse3.S for ideas.

Thanks! I haven't gone too deeply into the fine-tuning of this routine,
but I think you're right that there are probably tweaks to be made for
optimal gains. These are good suggestions, though I might save them for
a subsequent patch.

-Evan
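
For reference, here is a rough C sketch of the three ideas Richard
describes above: an overlapping unaligned copy of the last word, a
redundant head copy that aligns the destination, and bit tests on len
for copies shorter than a word. It is not code from the patch or from
glibc. The names sketch_memcpy, copy_small, load_word, store_word,
word_t and WORD_SIZE are made up for illustration, with WORD_SIZE
standing in for SZREG, and it assumes the target handles unaligned
word accesses reasonably well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; word_t stands in for one SZREG-sized register. */
typedef uintptr_t word_t;
#define WORD_SIZE sizeof(word_t)

/* Load or store one word at any alignment. A fixed-size memcpy is the
   portable way to say this in C; compilers typically lower it to a single
   load or store when the target allows unaligned access. */
static inline word_t load_word(const unsigned char *p)
{
    word_t w;
    memcpy(&w, p, WORD_SIZE);
    return w;
}

static inline void store_word(unsigned char *p, word_t w)
{
    memcpy(p, &w, WORD_SIZE);
}

/* len < WORD_SIZE: no loop, just test the low bits of len. */
static void copy_small(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(len < WORD_SIZE);
    if (len & 4) { memcpy(dst, src, 4); dst += 4; src += 4; }
    if (len & 2) { memcpy(dst, src, 2); dst += 2; src += 2; }
    if (len & 1) { *dst = *src; }
}

void *sketch_memcpy(void *dstv, const void *srcv, size_t len)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;

    if (len < WORD_SIZE) {
        copy_small(dst, src, len);
        return dstv;
    }

    /* Head: one possibly unaligned word, then skip ahead to the next
       word-aligned destination address (skip is in 1..WORD_SIZE, so some
       head bytes may be copied twice). */
    store_word(dst, load_word(src));
    size_t i = WORD_SIZE - ((uintptr_t)dst & (WORD_SIZE - 1));

    /* Body: whole words with an aligned destination. */
    for (; i + WORD_SIZE <= len; i += WORD_SIZE)
        store_word(dst + i, load_word(src + i));

    /* Tail: copy the last word unaligned instead of a byte loop. It may
       overlap bytes the body already wrote, which is harmless. */
    store_word(dst + len - WORD_SIZE, load_word(src + len - WORD_SIZE));
    return dstv;
}
```

The head and tail stores never touch bytes outside [dst, dst + len)
because both are only reached when len >= WORD_SIZE. Whether the
overlapping stores actually beat the byte loop would need measuring on
real hardware.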