From: Adhemerval Zanella Netto
Organization: Linaro
To: Wilco Dijkstra, 'GNU C Library'
Date: Mon, 19 Sep 2022 10:59:01 -0300
Subject: Re: [PATCH 04/17] Add string vectorized find and detection functions

On 03/09/22 10:13, Wilco Dijkstra wrote:
> Hi Adhemerval,
>
> +static inline unsigned int
> +__clz (op_t x)
> +{
> +#if !HAVE_BUILTIN_CLZ
> +  unsigned r;
> +  op_t i;
> +
> +  x |= x >> 1;
> +  x |= x >> 2;
> +  x |= x >> 4;
> +  x |= x >> 8;
> +  x |= x >> 16;
> +# if __WORDSIZE == 64
> +  x |= x >> 32;
> +  i = x * 0x03F79D71B4CB0A89ull >> 58;
> +# else
> +  i = x * 0x07C4ACDDU >> 27;
> +# endif
> +  r = index_access (i);
> +  return r ^ (sizeof (op_t) * CHAR_BIT - 1);
> +#else
> +  if (sizeof (op_t) == sizeof (long int))
> +    return __builtin_clzl (x);
> +  else
> +    return __builtin_clzll (x);
> +#endif
> +}
>
> This is a really bad idea. Firstly it is incorrect - sizeof (op_t) != __WORDSIZE due to
> the odd way it is defined (it can be 64 bits on 32-bit targets). That in itself is
> problematic since it isn't clear that using 64 bits operations extensively is efficient
> on 32-bit targets (using 64-bit multiplies in GMP is different from using 64-bit
> load/store in memcpy/memset which is different from 64-bit logical operations and
> shifts, so all of these should be decoupled rather than forced together).
>
> Secondly, there are already several ways to use count leading zeroes in GLIBC.
> One is use the builtin unconditionally (done in lots of places, eg. by math code),
> another is count_leading_zeros defined in longlong.h. This would add the third way.
> It's not clear how much gain inlining gives over using the libgcc implementation,
> but if it is significant then we could provide a generic inline clzl/clzll that can be
> used throughout GLIBC (replacing existing builtin_clz and count_leading_zeros).
>

Fair enough. I can't really recall why I added another count-bits routine instead
of using the ones already provided in longlong.h. longlong.h already takes care of
avoiding the libcall, so I adjusted the patch to use them instead.

> Finally, emulating a full clz is inefficient. If you have already called find_zero_low
> then there are at most 4 bits set on a 32-bit LE target, so you can trivially get the
> index of the first zero byte via:
>
> x = x & -x;
> x = (x >> 15) + (x >> 22) + 3 * (x >> 31);
>
> This is many times faster. There may be similar sequences for big-endian, but
> you could just do a multiply with a magic word that gives the correct result
> without needing a lookup table.

I take it that on recent architectures it would be faster, assuming the existence of a
clz instruction, and this specific code is only used in the memrchr tail.
So I think we can optimize it further in a subsequent patch.