Message-ID: <1fec4245-9eb4-108d-722e-ba36a1df0023@linaro.org>
Date: Thu, 22 Sep 2022 15:05:24 -0300
Subject: Re: [PATCH 0/2] LoongArch: Add optimized functions.
To: Xi Ruoyao, "dengjianbo@loongson.cn"
Cc: libc-alpha, caiyinyu, xuchenghua, "i.swmail", joseph
References: <403f78f0-55d9-48cf-c62a-4a0462a76987@loongson.cn>
 <2022091910031722091613@loongson.cn>
 <0172d70e-e939-31d4-bcd8-b47f274f97d9@linaro.org>
 <9cbcd3541c903aaba8038237befee5e3720d144e.camel@xry111.site>
From: Adhemerval Zanella Netto
Organization: Linaro
In-Reply-To: <9cbcd3541c903aaba8038237befee5e3720d144e.camel@xry111.site>

On 20/09/22 06:54, Xi Ruoyao wrote:
> On Mon, 2022-09-19 at 17:16 -0300, Adhemerval Zanella Netto via Libc-alpha wrote:
>> Do you have any breakdown if either loop unrolling or missing
>> string-fzi.h/string-fza.h is what is making difference in string routines?
>
> It looks like there are some difficulties... LoongArch does not have a
> dedicated instruction for finding a zero byte among the 8 bytes in a
> register (I guess the LoongArch SIMD eXtension will provide such an
> instruction, but the full LSX manual is not published yet and some
> LoongArch processors may lack LSX). So the assembly code submitted by
> dengjianbo relies on a register to cache the bit pattern
> 0x0101010101010101. We can't just rematerialize it (with 3
> instructions) in has_zero or has_eq etc. or the performance will be
> likely horribly bad.

The 0x0101010101010101 constant is already created in find_zero_low
(lsb), so creating it again in another static inline function should
give the compiler enough information to avoid materializing it twice.
So maybe adding a LoongArch-specific index_first_zero_eq should
suffice.

Maybe we can parametrize strchr with an extra function to do what the
final step does:

  op_t found = index_first_zero_eq (word, repeated_c);
  if (extractbyte (word, found) == c)
    return (char *) (word_ptr) + found;
  return NULL;

So LoongArch can reimplement it with a better strategy as well. The
idea of this generic implementation is exactly to find the spots where
C code cannot produce the best instructions, and to parametrize them
in a way that allows each architecture to reimplement them in the best
way.

>
>> Checking on last iteration [1], it seems that strchr is issuing 2 loads
>> on each loop iteration and using bit-manipulation instruction that I am
>> not sure compiler could emit with generic code. Maybe we can tune the
>> generic implementation to get similar performance, as Richard has done
>> for alpha, hppa, sh, and powerpc?
>>
>> I am asking because from the brief description of the algorithm, the
>> general idea is essentially what my generic code aims to do (mask-off
>> initial bytes, use word-aligned load and vectorized compares, extract
>> final bytes), and I am hoping that architecture would provide
>> string-fz{i,a}.h to get better code generation instead of pushing
>> for more and more hand-write assembly routines.
>
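
To make the final step discussed above concrete, here is a rough,
portable C sketch. It is only an illustration under stated assumptions,
not the glibc or LoongArch code: op_t is assumed to be a 64-bit word,
the helpers repeat_byte, zero_mask, load_le and strchr_final_step are
hypothetical names, and index_first_zero_eq is written with the
semantics assumed here (index of the first byte that is either zero or
equal to the searched character). __builtin_ctzll is a GCC/Clang
builtin. Both byte-repeated constants live in small static inline
functions, so after inlining the compiler is free to materialize each
one once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t op_t; /* assumption: one 64-bit machine word */

/* Replicate one byte into every byte of a word (0x01 -> 0x0101...01). */
static inline op_t repeat_byte(unsigned char c)
{
    return (op_t)0x0101010101010101ULL * c;
}

/* Load 8 bytes as a little-endian word so that byte i of memory is
   byte i of the word regardless of host endianness (hypothetical helper). */
static inline op_t load_le(const char *p)
{
    op_t w = 0;
    for (unsigned i = 0; i < sizeof w; i++)
        w |= (op_t)(unsigned char)p[i] << (8 * i);
    return w;
}

/* Nonzero iff x has a zero byte.  The lowest set bit always marks the
   first zero byte; higher bytes may carry false positives from borrows. */
static inline op_t zero_mask(op_t x)
{
    const op_t lsb = repeat_byte(0x01);
    const op_t msb = repeat_byte(0x80);
    return (x - lsb) & ~x & msb;
}

/* Index of the first byte of WORD that is zero or equal to the byte
   replicated in REPEATED_C.  Precondition: such a byte exists. */
static inline unsigned index_first_zero_eq(op_t word, op_t repeated_c)
{
    op_t mask = zero_mask(word) | zero_mask(word ^ repeated_c);
    assert(mask != 0);
    return (unsigned)__builtin_ctzll(mask) / 8;
}

/* Extract byte IDX of WORD (little-endian numbering). */
static inline unsigned char extractbyte(op_t word, unsigned idx)
{
    assert(idx < sizeof word);
    return (unsigned char)(word >> (8 * idx));
}

/* Final step of a word-at-a-time strchr: WORD was loaded from WORD_PTR
   and is known to contain a zero byte or a byte equal to C. */
static inline const char *strchr_final_step(const char *word_ptr,
                                            op_t word, int c)
{
    op_t repeated_c = repeat_byte((unsigned char)c);
    unsigned found = index_first_zero_eq(word, repeated_c);
    if (extractbyte(word, found) == (unsigned char)c)
        return word_ptr + found;
    return NULL;
}
```

An architecture with a better way to locate the byte would only need
to replace index_first_zero_eq in this sketch; the caller stays the
same.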