From: Adhemerval Zanella Netto
Organization: Linaro
To: Xi Ruoyao, libc-alpha@sourceware.org, Richard Henderson, Noah Goldstein, Jeff Law
Cc: caiyinyu
Date: Thu, 2 Feb 2023 10:32:43 -0300
Subject: Re: [PATCH v11 10/29] string: Improve generic stpcpy

On 01/02/23 14:29, Xi Ruoyao wrote:
> On Wed, 2023-02-01 at 14:03 -0300, Adhemerval Zanella wrote:
>> +static __always_inline char *
>> +stpcpy_unaligned_loop (op_t *restrict dst, const op_t *restrict src,
>> +                      uintptr_t ofs)
>> +{
>> +  op_t w2a = *src++;
>> +  uintptr_t sh_1 = ofs * CHAR_BIT;
>> +  uintptr_t sh_2 = OPSIZ * CHAR_BIT - sh_1;
>
> Hmm, on 64-bit LoongArch if we "clone" the function 7 times to
> stpcpy_unaligned_loop_{1..7} and call them with a switch (ofs) { ... }
> construction, we'd be able to use bytepick.d instruction for MERGE,
> saving 2 instructions in the iteration.  But maybe this is going too
> far.  I'm not sure if this "optimization" applies for other
> architectures.

I think it should be feasible, and I might come back to optimize the
unaligned loop with this strategy.  However, I will need to check whether
the compiler actually exploits the now-constant shifts to optimize the
merge.  It also increases code size: on x86_64 the text size went from
850 to 1993 bytes, and on LoongArch from 864 to 2200.  That is something
to weigh as well, since if unaligned strings are equally likely to occur,
the added icache pressure would matter.

>
>> +  op_t w2 = MERGE (w2a, sh_1, (op_t)-1, sh_2);
>> +  if (!has_zero (w2))
>> +    {
>> +      op_t w2b;
>> +
>> +      /* Unaligned loop.  The invariant is that W2B, which is "ahead" of W1,
>> +        does not contain end-of-string.  Therefore it is safe (and necessary)
>> +        to read another word from each while we do not have a difference.  */
>> +      while (1)
>> +       {
>> +         w2b = *src++;
>> +         w2 = MERGE (w2a, sh_1, w2b, sh_2);
>> +         /* Check if there is zero on w2a.  */
>> +         if (has_zero (w2))
>> +           goto out;
>> +         *dst++ = w2;
>> +         if (has_zero (w2b))
>> +           break;
>> +         w2a = w2b;
>> +       }
>> +
>> +      /* Align the final partial of P2.  */
>> +      w2 = MERGE (w2b, sh_1, 0, sh_2);
>> +    }
>> +
>> +out:
>> +  return write_byte_from_word (dst, w2);
>> +}
>> +
>
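As a rough illustration of the cloning idea discussed above, here is a minimal, self-contained C sketch. It is not the patch's code: `op_t`, `OPSIZ` and `MERGE` are local stand-ins modeled on glibc's generic helpers (assuming 64-bit words and the little-endian form of MERGE), and `load_le`, `merge_loop` and `merge_loop_dispatch` are hypothetical names. It only shows the shape of the transformation: the offset is switched on once, outside the loop, so each inlined copy of the loop body sees constant shift amounts.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for glibc's op_t/OPSIZ (assumption: 64-bit words).  */
typedef uint64_t op_t;
#define OPSIZ (sizeof (op_t))
_Static_assert (OPSIZ == 8, "the switch below assumes 8-byte words");

/* Modeled on the little-endian form of glibc's MERGE: the tail of W0
   (from byte SH_1 / CHAR_BIT onward) followed by the head of W1.  */
#define MERGE(w0, sh_1, w1, sh_2) (((w0) >> (sh_1)) | ((w1) << (sh_2)))

/* Read OPSIZ bytes as a little-endian word regardless of host byte
   order, so the example behaves the same on any machine.  */
static op_t
load_le (const unsigned char *p)
{
  op_t w = 0;
  for (size_t i = 0; i < OPSIZ; i++)
    w |= (op_t) p[i] << (i * CHAR_BIT);
  return w;
}

/* Generic loop: OFS (1 .. OPSIZ - 1) is only known at run time, so both
   shift amounts are variables.  Reads N + 1 aligned words from SRC and
   writes N realigned words to DST.  */
static inline void
merge_loop (op_t *dst, const op_t *src, size_t n, unsigned ofs)
{
  const unsigned sh_1 = ofs * CHAR_BIT;
  const unsigned sh_2 = OPSIZ * CHAR_BIT - sh_1;
  op_t w2a = *src++;
  for (size_t i = 0; i < n; i++)
    {
      op_t w2b = *src++;
      *dst++ = MERGE (w2a, sh_1, w2b, sh_2);
      w2a = w2b;
    }
}

/* Hypothetical dispatcher: switch on OFS once.  Each case passes a
   literal offset, so after inlining merge_loop the compiler sees
   constant shifts and may pick a byte-extract instruction (such as
   bytepick.d on LoongArch) for MERGE.  Whether it does so is
   compiler-dependent.  */
static void
merge_loop_dispatch (op_t *dst, const op_t *src, size_t n, unsigned ofs)
{
  switch (ofs)
    {
    case 1: merge_loop (dst, src, n, 1); break;
    case 2: merge_loop (dst, src, n, 2); break;
    case 3: merge_loop (dst, src, n, 3); break;
    case 4: merge_loop (dst, src, n, 4); break;
    case 5: merge_loop (dst, src, n, 5); break;
    case 6: merge_loop (dst, src, n, 6); break;
    case 7: merge_loop (dst, src, n, 7); break;
    default: break; /* ofs == 0 takes the aligned path instead.  */
    }
}
```

Every case carries its own copy of the loop, which is the source of the text-size growth measured above; the gain depends entirely on the compiler turning each constant-shift MERGE into fewer instructions.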