Date: Thu, 22 Sep 2022 14:31:16 -0300
Subject: Re: [PATCH v5 03/17] Add string-maskoff.h generic header
To: Carlos O'Donell
Cc: libc-alpha@sourceware.org
References: <20220919195920.956393-1-adhemerval.zanella@linaro.org> <20220919195920.956393-4-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella Netto
Organization: Linaro

On 20/09/22 08:43, Carlos O'Donell wrote:
> On Mon, Sep 19, 2022 at 04:59:06PM -0300, Adhemerval Zanella via Libc-alpha wrote:
>> Macros to operate on unaligned access for string operations:
>>
>> - create_mask: create a mask based on pointer alignment to sets up
>>   non-zero bytes before the beginning of the word so a following
>>   operation (such as find zero) might ignore these bytes.
>>
>> - highbit_mask: create a mask with high bit of each byte being 1,
>>   and the low 7 bits being all the opposite of the input.
>
> I really appreciate the effort you've put into documenting the purpose
> of each function! It's really awesome to reach such nice comments. Thank
> you for that. I've gone through this to review the implementation and
> the descriptions. I think it needs a little more tweaking.
>
>> These macros are meant to be used on optimized vectorized string
>> implementations.
>> ---
>>  sysdeps/generic/string-maskoff.h | 73 ++++++++++++++++++++++++++++++++
>>  1 file changed, 73 insertions(+)
>>  create mode 100644 sysdeps/generic/string-maskoff.h
>>
>> diff --git a/sysdeps/generic/string-maskoff.h b/sysdeps/generic/string-maskoff.h
>> new file mode 100644
>> index 0000000000..831647bda6
>> --- /dev/null
>> +++ b/sysdeps/generic/string-maskoff.h
>> @@ -0,0 +1,73 @@
>> +/* Mask off bits. Generic C version.
>> +   Copyright (C) 2022 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public
>> +   License as published by the Free Software Foundation; either
>> +   version 2.1 of the License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; if not, see
>> +   . */
>> +
>> +#ifndef _STRING_MASKOFF_H
>> +#define _STRING_MASKOFF_H 1
>> +
>> +#include
>> +#include
>> +#include
>> +#include
>> +
>> +/* Provide a mask based on the pointer alignment that sets up non-zero
>> +   bytes before the beginning of the word. It is used to mask off
>> +   undesirable bits from an aligned read from an unaligned pointer.
>> +   For instance, on a 64 bits machine with a pointer alignment of
>
> s/bits/-bit/g
>
> While it is technically correct English to say "A 64-bits machine", this
> is not the normative usage.
>
> I suggest we use the normative "64-bit machine." We can talk about 64
> bits, and alignment as bits etc.

Alright.

>
>> +   3 the function returns 0x0000000000ffffff for LE and 0xffffff0000000000
>> +   (meaning to mask off the initial 3 bytes). */
>
> Missing "for BE" ?

Ack.

>
>> +static inline op_t
>> +create_mask (uintptr_t i)
>> +{
>> +  i = i % sizeof (op_t);
>
> OK. Wrap the value.
>
>> +  if (__BYTE_ORDER == __LITTLE_ENDIAN)
>> +    return ~(((op_t)-1) << (i * CHAR_BIT));
>> +  else
>> +    return ~(((op_t)-1) >> (i * CHAR_BIT));
>
> OK. Shift.
>
>> +}
>> +
>> +/* Setup an word with each byte being c_in. For instance, on a 64 bits
>
> s/an/a/g
> s/bits/-bit/g

Ack.

>
>> +   machine with input as 0xce the functions returns 0xcececececececece. */
>> +static inline op_t
>> +repeat_bytes (unsigned char c_in)
>> +{
>> +  return ((op_t)-1 / 0xff) * c_in;
>> +}
>
> How does the compiler do here on the various architectures to produce
> the deposit/expand instructions that could be used for this operation?
>
> aarch64 gcc trunk:
>         ldrb    r3, [r7, #7]    @ zero_extendqisi2
>         mov     r2, #16843009
>         mul     r3, r2, r3
>
> x86_64 gcc trunk:
>         movzx   eax, BYTE PTR [rbp-4]
>         imul    eax, eax, 16843009
>
> s390x gcc12:
>         ic      %r1,167(%r11)
>         lhi     %r2,255
>         nr      %r1,%r2
>         ms      %r1,.L4-.L3(%r5)
>         llgfr   %r1,%r1
>
> Looks OK, and the static inline will get optimized with the rest of
> the operations.
>
>> +
>> +/* Based on mask created by 'create_mask', mask off the high bit of each
>
> s/on/on a/g

Ack.

>
>> +   byte in the mask. It is used to mask off undesirable bits from an
>> +   aligned read from an unaligned pointer, and also taking care to avoid
>
> s/and/while/g

Ack.

>
>> +   match possible bytes meant to be matched. For instance, on a 64 bits
>
> Suggest:
> matching possible bytes not meant to be matched.
>
> s/bits/-bits/g

Ack.

>
>> +   machine with a mask created from a pointer with an alignment of 3
>> +   (0x0000000000ffffff) the function returns 0x7f7f7f0000000000 for BE
>> +   and 0x00000000007f7f7f for LE. */
>> +static inline op_t
>> +highbit_mask (op_t m)
>> +{
>> +  return m & repeat_bytes (0x7f);
>
> OK.
>
>> +}
>> +
>> +/* Return the address of the op_t word containing the address P. For
>> +   instance on address 0x0011223344556677 and op_t with size of 8,
>> +   it returns 0x0011223344556670. */
>
> Could you expand on this a bit more? It's a bit opaque what we might use
> this for (I have some ideas).

Maybe:

/* Return the word-aligned address containing the address P. For instance,
   for the address 0x0011223344556677 and an op_t of size 8, it returns
   0x0011223344556670. */

>
>> +static inline op_t *
>> +word_containing (char const *p)
>> +{
>> +  return (op_t *) (p - (uintptr_t) p % sizeof (op_t));
>> +}
>> +
>> +#endif /* _STRING_MASKOFF_H */
>> --
>> 2.34.1
>>
>