Date: Fri, 7 Jul 2023 09:25:56 -0600
Subject: Re: [PATCH v4 3/3] riscv: Add and use alignment-ignorant memcpy
From: Jeff Law
To: Richard Henderson, Evan Green, libc-alpha@sourceware.org
Cc: palmer@rivosinc.com, slewis@rivosinc.com, vineetg@rivosinc.com, Florian Weimer
References: <20230706192947.1566767-1-evan@rivosinc.com> <20230706192947.1566767-4-evan@rivosinc.com>

On 7/7/23 03:22, Richard Henderson via Libc-alpha wrote:
> On 7/6/23 20:29, Evan Green wrote:
>> +    /* Copy the last few individual bytes */
>> +    add a3, a1, a2
>> +5:
>> +    lb a4, 0(a1)
>> +    addi a1, a1, 1
>> +    sb a4, 0(t6)
>> +    addi t6, t6, 1
>> +    bltu a1, a3, 5b
>> +6:
>> +    ret
>
> The only time you should be copying individual bytes is when the copy is
> smaller than SZREG.  Otherwise the tail can be handled like
>
>     add    srcend, a1, a2
>     add    dstend, a0, a2
>     REG_L    tmp, -SZREG(srcend)
>     REG_S    tmp, -SZREG(dstend)
>
> There are other tricks that can be used to reduce the number of branches
> -- please examine the x86 code.  See e.g. the copy_0_15 block in
> sysdeps/x86_64/multiarch/memmove-ssse3.S.

The bits we've got here from VRULL use this trick.  Evan, I'm happy to
pass those bits along if you want to take a look.

I have no strong opinions if this should be fixed before integration or
as a follow-up.

jeff
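
For readers following the thread, here is a rough C sketch of the tail
handling Richard describes above.  It is only an illustration, not the
VRULL code or anything from the patch: the function name
copy_overlapping_tail and the word_t typedef are made up for this
example, word_t stands in for an SZREG-sized register, and the
fixed-size memcpy calls model the possibly misaligned REG_L/REG_S
accesses.  Like memcpy, it assumes the source and destination buffers
do not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an SZREG-sized register. */
typedef uintptr_t word_t;

/* Hypothetical helper: copy n bytes, finishing with one overlapping
   word instead of a byte loop.  src and dst must not overlap. */
static void *copy_overlapping_tail(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    word_t tmp;

    if (n < sizeof(word_t)) {
        /* Only copies smaller than one word go byte by byte. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
        return dst;
    }

    /* Copy whole words from the front.  The fixed-size memcpy models
       a word load/store that tolerates misaligned addresses. */
    for (size_t i = 0; i + sizeof(word_t) <= n; i += sizeof(word_t)) {
        memcpy(&tmp, s + i, sizeof tmp);
        memcpy(d + i, &tmp, sizeof tmp);
    }

    /* Tail: one word that ends exactly at src + n and dst + n.  It may
       re-copy bytes the loop already handled, which is harmless. */
    memcpy(&tmp, s + n - sizeof tmp, sizeof tmp);
    memcpy(d + n - sizeof tmp, &tmp, sizeof tmp);
    return dst;
}
```

The final load/store pair replaces the byte loop with a fixed amount of
work and no extra branch, at the cost of possibly writing a few bytes
twice.  That only pays off where misaligned word accesses are cheap,
which is the premise of the alignment-ignorant variant discussed in
this series.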