From: "H.J. Lu"
Date: Fri, 14 Oct 2022 16:22:27 -0700
Subject: Re: [PATCH v5 2/3] x86: Add macros for GPRs / mask insn based on VEC_SIZE
To: Noah Goldstein
Cc: libc-alpha@sourceware.org, carlos@systemhalted.org
References: <20221014164008.1325863-1-goldstein.w.n@gmail.com> <20221014211501.524094-1-goldstein.w.n@gmail.com> <20221014211501.524094-2-goldstein.w.n@gmail.com>

On Fri, Oct 14, 2022 at 4:15 PM Noah Goldstein wrote:
>
> On Fri, Oct 14, 2022 at 5:41 PM H.J. Lu wrote:
> >
> > On Fri, Oct 14, 2022 at 3:27 PM Noah Goldstein wrote:
> > >
> > > On Fri, Oct 14, 2022 at 5:06 PM H.J. Lu wrote:
> > > >
> > > > On Fri, Oct 14, 2022 at 3:01 PM Noah Goldstein wrote:
> > > > >
> > > > > On Fri, Oct 14, 2022 at 4:28 PM H.J.
Lu wrote:
> > > > > >
> > > > > > On Fri, Oct 14, 2022 at 2:15 PM Noah Goldstein wrote:
> > > > > > >
> > > > > > > This is to make it easier to do things like:
> > > > > > > ```
> > > > > > > vpcmpb %VEC(0), %VEC(1), %k0
> > > > > > > kmov{d|q} %k0, %{eax|rax}
> > > > > > > test %{eax|rax}
> > > > > > > ```
> > > > > > >
> > > > > > > It adds macros such that any GPR can get the proper width with:
> > > > > > >     `V{upper_case_GPR_name}`
> > > > > > >
> > > > > > > and any mask insn can get the proper width with:
> > > > > > >     `{mask_insn_without_postfix}V`
> > > > > > >
> > > > > > > This commit does not change libc.so
> > > > > > >
> > > > > > > Tested build on x86-64
> > > > > > > ---
> > > > > > >  sysdeps/x86_64/multiarch/reg-macros.h | 166 ++++++++++++++++++
> > > > > > >  .../multiarch/scripts/gen-reg-macros.py | 123 +++++++++++++
> > > > > > >  2 files changed, 289 insertions(+)
> > > > > > >  create mode 100644 sysdeps/x86_64/multiarch/reg-macros.h
> > > > > > >  create mode 100644 sysdeps/x86_64/multiarch/scripts/gen-reg-macros.py
> > > > > > >
> > > > > > > diff --git a/sysdeps/x86_64/multiarch/reg-macros.h b/sysdeps/x86_64/multiarch/reg-macros.h
> > > > > > > new file mode 100644
> > > > > > > index 0000000000..16168b6fda
> > > > > > > --- /dev/null
> > > > > > > +++ b/sysdeps/x86_64/multiarch/reg-macros.h
> > > > > > > @@ -0,0 +1,166 @@
> > > > > > > +/* This file was generated by: gen-reg-macros.py.
> > > > > > > +
> > > > > > > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > > > > > > +   This file is part of the GNU C Library.
> > > > > > > +
> > > > > > > +   The GNU C Library is free software; you can redistribute it and/or
> > > > > > > +   modify it under the terms of the GNU Lesser General Public
> > > > > > > +   License as published by the Free Software Foundation; either
> > > > > > > +   version 2.1 of the License, or (at your option) any later version.
> > > > > > > + > > > > > > > + The GNU C Library is distributed in the hope that it will be useful, > > > > > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > > > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > > > > > + Lesser General Public License for more details. > > > > > > > + > > > > > > > + You should have received a copy of the GNU Lesser General Public > > > > > > > + License along with the GNU C Library; if not, see > > > > > > > + . */ > > > > > > > + > > > > > > > +#ifndef _REG_MACROS_H > > > > > > > +#define _REG_MACROS_H 1 > > > > > > > + > > > > > > > +#define rax_8 al > > > > > > > +#define rax_16 ax > > > > > > > +#define rax_32 eax > > > > > > > +#define rax_64 rax > > > > > > > +#define rbx_8 bl > > > > > > > +#define rbx_16 bx > > > > > > > +#define rbx_32 ebx > > > > > > > +#define rbx_64 rbx > > > > > > > +#define rcx_8 cl > > > > > > > +#define rcx_16 cx > > > > > > > +#define rcx_32 ecx > > > > > > > +#define rcx_64 rcx > > > > > > > +#define rdx_8 dl > > > > > > > +#define rdx_16 dx > > > > > > > +#define rdx_32 edx > > > > > > > +#define rdx_64 rdx > > > > > > > +#define rbp_8 bpl > > > > > > > +#define rbp_16 bp > > > > > > > +#define rbp_32 ebp > > > > > > > +#define rbp_64 rbp > > > > > > > +#define rsp_8 spl > > > > > > > +#define rsp_16 sp > > > > > > > +#define rsp_32 esp > > > > > > > +#define rsp_64 rsp > > > > > > > +#define rsi_8 sil > > > > > > > +#define rsi_16 si > > > > > > > +#define rsi_32 esi > > > > > > > +#define rsi_64 rsi > > > > > > > +#define rdi_8 dil > > > > > > > +#define rdi_16 di > > > > > > > +#define rdi_32 edi > > > > > > > +#define rdi_64 rdi > > > > > > > +#define r8_8 r8b > > > > > > > +#define r8_16 r8w > > > > > > > +#define r8_32 r8d > > > > > > > +#define r8_64 r8 > > > > > > > +#define r9_8 r9b > > > > > > > +#define r9_16 r9w > > > > > > > +#define r9_32 r9d > > > > > > > +#define r9_64 r9 > > > > > > > +#define r10_8 r10b > > > > > > > 
+#define r10_16 r10w > > > > > > > +#define r10_32 r10d > > > > > > > +#define r10_64 r10 > > > > > > > +#define r11_8 r11b > > > > > > > +#define r11_16 r11w > > > > > > > +#define r11_32 r11d > > > > > > > +#define r11_64 r11 > > > > > > > +#define r12_8 r12b > > > > > > > +#define r12_16 r12w > > > > > > > +#define r12_32 r12d > > > > > > > +#define r12_64 r12 > > > > > > > +#define r13_8 r13b > > > > > > > +#define r13_16 r13w > > > > > > > +#define r13_32 r13d > > > > > > > +#define r13_64 r13 > > > > > > > +#define r14_8 r14b > > > > > > > +#define r14_16 r14w > > > > > > > +#define r14_32 r14d > > > > > > > +#define r14_64 r14 > > > > > > > +#define r15_8 r15b > > > > > > > +#define r15_16 r15w > > > > > > > +#define r15_32 r15d > > > > > > > +#define r15_64 r15 > > > > > > > + > > > > > > > +#define kmov_8 kmovb > > > > > > > +#define kmov_16 kmovw > > > > > > > +#define kmov_32 kmovd > > > > > > > +#define kmov_64 kmovq > > > > > > > +#define kortest_8 kortestb > > > > > > > +#define kortest_16 kortestw > > > > > > > +#define kortest_32 kortestd > > > > > > > +#define kortest_64 kortestq > > > > > > > +#define kor_8 korb > > > > > > > +#define kor_16 korw > > > > > > > +#define kor_32 kord > > > > > > > +#define kor_64 korq > > > > > > > +#define ktest_8 ktestb > > > > > > > +#define ktest_16 ktestw > > > > > > > +#define ktest_32 ktestd > > > > > > > +#define ktest_64 ktestq > > > > > > > +#define kand_8 kandb > > > > > > > +#define kand_16 kandw > > > > > > > +#define kand_32 kandd > > > > > > > +#define kand_64 kandq > > > > > > > +#define kxor_8 kxorb > > > > > > > +#define kxor_16 kxorw > > > > > > > +#define kxor_32 kxord > > > > > > > +#define kxor_64 kxorq > > > > > > > +#define knot_8 knotb > > > > > > > +#define knot_16 knotw > > > > > > > +#define knot_32 knotd > > > > > > > +#define knot_64 knotq > > > > > > > +#define kxnor_8 kxnorb > > > > > > > +#define kxnor_16 kxnorw > > > > > > > +#define kxnor_32 kxnord > > > > > > > +#define kxnor_64 
kxnorq
> > > > > > > +#define kunpack_8 kunpackbw
> > > > > > > +#define kunpack_16 kunpackwd
> > > > > > > +#define kunpack_32 kunpackdq
> > > > > > > +
> > > > > > > +/* Common API for accessing proper width GPR is V{upcase_GPR_name}. */
> > > > > > > +#define VRAX VGPR(rax)
> > > > > > > +#define VRBX VGPR(rbx)
> > > > > > > +#define VRCX VGPR(rcx)
> > > > > > > +#define VRDX VGPR(rdx)
> > > > > > > +#define VRBP VGPR(rbp)
> > > > > > > +#define VRSP VGPR(rsp)
> > > > > > > +#define VRSI VGPR(rsi)
> > > > > > > +#define VRDI VGPR(rdi)
> > > > > > > +#define VR8 VGPR(r8)
> > > > > > > +#define VR9 VGPR(r9)
> > > > > > > +#define VR10 VGPR(r10)
> > > > > > > +#define VR11 VGPR(r11)
> > > > > > > +#define VR12 VGPR(r12)
> > > > > > > +#define VR13 VGPR(r13)
> > > > > > > +#define VR14 VGPR(r14)
> > > > > > > +#define VR15 VGPR(r15)
> > > > > > > +
> > > > > > > +/* Common API for accessing proper width mask insn is {upcase_mask_insn}. */
> > > > > > > +#define KMOV VKINSN(kmov)
> > > > > > > +#define KORTEST VKINSN(kortest)
> > > > > > > +#define KOR VKINSN(kor)
> > > > > > > +#define KTEST VKINSN(ktest)
> > > > > > > +#define KAND VKINSN(kand)
> > > > > > > +#define KXOR VKINSN(kxor)
> > > > > > > +#define KNOT VKINSN(knot)
> > > > > > > +#define KXNOR VKINSN(kxnor)
> > > > > > > +#define KUNPACK VKINSN(kunpack)
> > > > > > > +
> > > > > > > +#ifndef REG_WIDTH
> > > > > > > +# define REG_WIDTH VEC_SIZE
> > > > > > > +#endif
> > > > > >
> > > > > > Which files will define REG_WIDTH? What values will it be for
> > > > > > YMM and ZMM vectors?
> > > > >
> > > > > for non-wide char evex or avx2/sse2 REG_WIDTH = VEC_SIZE
> > > > > so for YMM REG_WIDTH = 32, for ZMM REG_WIDTH = 64.
> > > > >
> > > > > For wchar impls REG_WIDTH will often be 32 regardless of YMM/ZMM.
> > > >
> > > > Then we should have
> > > >
> > > > #ifdef USE_WIDE_CHAR
> > > > # define REG_WIDTH 32
> > > > #else
> > > > # define REG_WIDTH VEC_SIZE
> > > > #endif
> > > >
> > >
> > > It may not be universal. It may be that some wide-char impls will want
> > > REG_WIDTH == 8/16 if they rely heavily on `inc` to do zero test or
> >
> > I think we can define a macro for it if needed.
>
> We can but don't you think just REG_WIDTH is more direct?

It is very likely that 8-bit/16-bit registers will be used only for
specific operations. The majority of operations will be 32-bit.
Things like

#ifndef REG_WIDTH
# define REG_WIDTH VEC_SIZE
#endif

may lead to questions.

> >
> > > for some reason or another uses the full VEC_SIZE (as wcslen-evex512
> > > currently does).
> >
> > Will REG_WIDTH == 32 work for wcslen-evex512?
> >
>
> I believe so but am trying to make these patches zero-effect. I think a separate
> patch to actually make substantive changes makes more sense.

USE_WIDE_CHAR is undefined currently. There is no impact.

> > > Also don't really see what it saves to give up the granularity.
> > > Either way to specify a separate reg width the wchar impl will
> > > need to define something else. Seems reasonable for that
> > > something else to just be REG_WIDTH directly as opposed to
> > > USE_WIDE_CHAR.
> > >
> > > What do you think?
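To make the two options concrete, here is a rough Python model of how the header resolves a name once REG_WIDTH is known. It is only a sketch and not part of the patch: `effective_reg_width`, `resolve_vgpr` and `resolve_kinsn` are hypothetical helper names, and the table lists just three of the sixteen GPRs.

```python
# Hypothetical model of the reg-macros.h lookup. The register names and
# suffixes mirror the patch; the helper functions are illustrative only.
GPR_BY_WIDTH = {
    "rax": {8: "al", 16: "ax", 32: "eax", 64: "rax"},
    "rcx": {8: "cl", 16: "cx", 32: "ecx", 64: "rcx"},
    "r8": {8: "r8b", 16: "r8w", 32: "r8d", 64: "r8"},
}
MASK_SUFFIX = {8: "b", 16: "w", 32: "d", 64: "q"}


def effective_reg_width(vec_size, reg_width=None):
    """Mimic `#ifndef REG_WIDTH / # define REG_WIDTH VEC_SIZE / #endif`."""
    return vec_size if reg_width is None else reg_width


def resolve_vgpr(reg_name, vec_size, reg_width=None):
    """Model VGPR(reg_name): pick the REG_WIDTH-bit name of a GPR."""
    return GPR_BY_WIDTH[reg_name][effective_reg_width(vec_size, reg_width)]


def resolve_kinsn(insn, vec_size, reg_width=None):
    """Model VKINSN(insn): append the width suffix to a mask insn."""
    return insn + MASK_SUFFIX[effective_reg_width(vec_size, reg_width)]
```

With VEC_SIZE == 64 and no override, `resolve_vgpr("rax", 64)` gives the 64-bit name and `resolve_kinsn("kmov", 64)` gives `kmovq`. A wide-char file that sets REG_WIDTH to 32 before including the header corresponds to `resolve_vgpr("rax", 64, reg_width=32)`, which gives `eax`.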
> > > > > > > > > > > > > +#define VPASTER(x, y) x##_##y > > > > > > > +#define VEVALUATOR(x, y) VPASTER(x, y) > > > > > > > + > > > > > > > +#define VGPR_SZ(reg_name, reg_size) VEVALUATOR(reg_name, reg_size) > > > > > > > +#define VKINSN_SZ(insn, reg_size) VEVALUATOR(insn, reg_size) > > > > > > > + > > > > > > > +#define VGPR(reg_name) VGPR_SZ(reg_name, REG_WIDTH) > > > > > > > +#define VKINSN(mask_insn) VKINSN_SZ(mask_insn, REG_WIDTH) > > > > > > > + > > > > > > > +#endif > > > > > > > diff --git a/sysdeps/x86_64/multiarch/scripts/gen-reg-macros.py b/sysdeps/x86_64/multiarch/scripts/gen-reg-macros.py > > > > > > > new file mode 100644 > > > > > > > index 0000000000..c7296a8104 > > > > > > > --- /dev/null > > > > > > > +++ b/sysdeps/x86_64/multiarch/scripts/gen-reg-macros.py > > > > > > > @@ -0,0 +1,123 @@ > > > > > > > +#!/usr/bin/python3 > > > > > > > +# Copyright (C) 2022 Free Software Foundation, Inc. > > > > > > > +# This file is part of the GNU C Library. > > > > > > > +# > > > > > > > +# The GNU C Library is free software; you can redistribute it and/or > > > > > > > +# modify it under the terms of the GNU Lesser General Public > > > > > > > +# License as published by the Free Software Foundation; either > > > > > > > +# version 2.1 of the License, or (at your option) any later version. > > > > > > > +# > > > > > > > +# The GNU C Library is distributed in the hope that it will be useful, > > > > > > > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > > > > > > > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > > > > > +# Lesser General Public License for more details. > > > > > > > +# > > > > > > > +# You should have received a copy of the GNU Lesser General Public > > > > > > > +# License along with the GNU C Library; if not, see > > > > > > > +# . 
> > > > > > > +"""Generate macros for getting GPR name of a certain size
> > > > > > > +
> > > > > > > +Inputs: None
> > > > > > > +Output: Prints header file to stdout
> > > > > > > +
> > > > > > > +API:
> > > > > > > +    VGPR(reg_name)
> > > > > > > +        - Get register name REG_WIDTH (default VEC_SIZE) component of `reg_name`
> > > > > > > +    VGPR_SZ(reg_name, reg_size)
> > > > > > > +        - Get register name `reg_size` component of `reg_name`
> > > > > > > +"""
> > > > > > > +
> > > > > > > +import sys
> > > > > > > +import os
> > > > > > > +from datetime import datetime
> > > > > > > +
> > > > > > > +registers = [["rax", "eax", "ax", "al"], ["rbx", "ebx", "bx", "bl"],
> > > > > > > +             ["rcx", "ecx", "cx", "cl"], ["rdx", "edx", "dx", "dl"],
> > > > > > > +             ["rbp", "ebp", "bp", "bpl"], ["rsp", "esp", "sp", "spl"],
> > > > > > > +             ["rsi", "esi", "si", "sil"], ["rdi", "edi", "di", "dil"],
> > > > > > > +             ["r8", "r8d", "r8w", "r8b"], ["r9", "r9d", "r9w", "r9b"],
> > > > > > > +             ["r10", "r10d", "r10w", "r10b"], ["r11", "r11d", "r11w", "r11b"],
> > > > > > > +             ["r12", "r12d", "r12w", "r12b"], ["r13", "r13d", "r13w", "r13b"],
> > > > > > > +             ["r14", "r14d", "r14w", "r14b"], ["r15", "r15d", "r15w", "r15b"]]
> > > > > > > +
> > > > > > > +mask_insns = [
> > > > > > > +    "kmov",
> > > > > > > +    "kortest",
> > > > > > > +    "kor",
> > > > > > > +    "ktest",
> > > > > > > +    "kand",
> > > > > > > +    "kxor",
> > > > > > > +    "knot",
> > > > > > > +    "kxnor",
> > > > > > > +]
> > > > > > > +mask_insns_ext = ["b", "w", "d", "q"]
> > > > > > > +
> > > > > > > +cr = """
> > > > > > > +   Copyright (C) {} Free Software Foundation, Inc.
> > > > > > > +   This file is part of the GNU C Library.
> > > > > > > +
> > > > > > > +   The GNU C Library is free software; you can redistribute it and/or
> > > > > > > +   modify it under the terms of the GNU Lesser General Public
> > > > > > > +   License as published by the Free Software Foundation; either
> > > > > > > +   version 2.1 of the License, or (at your option) any later version.
> > > > > > > + > > > > > > > + The GNU C Library is distributed in the hope that it will be useful, > > > > > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > > > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > > > > > + Lesser General Public License for more details. > > > > > > > + > > > > > > > + You should have received a copy of the GNU Lesser General Public > > > > > > > + License along with the GNU C Library; if not, see > > > > > > > + . */ > > > > > > > +""" > > > > > > > + > > > > > > > +print("/* This file was generated by: {}.".format(os.path.basename( > > > > > > > + sys.argv[0]))) > > > > > > > +print(cr.format(datetime.today().year)) > > > > > > > + > > > > > > > +print("#ifndef _REG_MACROS_H") > > > > > > > +print("#define _REG_MACROS_H\t1") > > > > > > > +print("") > > > > > > > +for reg in registers: > > > > > > > + for i in range(0, 4): > > > > > > > + print("#define {}_{}\t{}".format(reg[0], 8 << i, reg[3 - i])) > > > > > > > + > > > > > > > +print("") > > > > > > > +for mask_insn in mask_insns: > > > > > > > + for i in range(0, 4): > > > > > > > + print("#define {}_{}\t{}{}".format(mask_insn, 8 << i, mask_insn, > > > > > > > + mask_insns_ext[i])) > > > > > > > +for i in range(0, 3): > > > > > > > + print("#define kunpack_{}\tkunpack{}{}".format(8 << i, mask_insns_ext[i], > > > > > > > + mask_insns_ext[i + 1])) > > > > > > > +mask_insns.append("kunpack") > > > > > > > + > > > > > > > +print("") > > > > > > > +print( > > > > > > > + "/* Common API for accessing proper width GPR is V{upcase_GPR_name}. */") > > > > > > > +for reg in registers: > > > > > > > + print("#define V{}\tVGPR({})".format(reg[0].upper(), reg[0])) > > > > > > > + > > > > > > > +print("") > > > > > > > + > > > > > > > +print( > > > > > > > + "/* Common API for accessing proper width mask insn is {upcase_mask_insn}. 
*/" > > > > > > > +) > > > > > > > +for mask_insn in mask_insns: > > > > > > > + print("#define {} \tVKINSN({})".format(mask_insn.upper(), mask_insn)) > > > > > > > +print("") > > > > > > > + > > > > > > > +print("#ifndef REG_WIDTH") > > > > > > > +print("# define REG_WIDTH VEC_SIZE") > > > > > > > +print("#endif") > > > > > > > +print("") > > > > > > > +print("#define VPASTER(x, y)\tx##_##y") > > > > > > > +print("#define VEVALUATOR(x, y)\tVPASTER(x, y)") > > > > > > > +print("") > > > > > > > +print("#define VGPR_SZ(reg_name, reg_size)\tVEVALUATOR(reg_name, reg_size)") > > > > > > > +print("#define VKINSN_SZ(insn, reg_size)\tVEVALUATOR(insn, reg_size)") > > > > > > > +print("") > > > > > > > +print("#define VGPR(reg_name)\tVGPR_SZ(reg_name, REG_WIDTH)") > > > > > > > +print("#define VKINSN(mask_insn)\tVKINSN_SZ(mask_insn, REG_WIDTH)") > > > > > > > + > > > > > > > +print("\n#endif") > > > > > > > -- > > > > > > > 2.34.1 > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > H.J. > > > > > > > > > > > > > > > > -- > > > > H.J. > > > > > > > > -- > > H.J. -- H.J.
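
For anyone reading the generator above, the `8 << i` / `reg[3 - i]` indexing can be checked in isolation with a small sketch. It is not part of the patch: `gpr_defines` and `mask_defines` are hypothetical names, and the functions return the lines instead of printing them so that they are easy to inspect.

```python
# Illustrative restatement of the two main loops in gen-reg-macros.py.
# The function names are hypothetical; the format strings follow the script.
MASK_INSNS_EXT = ["b", "w", "d", "q"]


def gpr_defines(reg):
    """Build '#define' lines for one register.

    `reg` lists the names from widest to narrowest (64, 32, 16, 8 bits),
    so width `8 << i` maps to index `3 - i`.
    """
    return ["#define {}_{}\t{}".format(reg[0], 8 << i, reg[3 - i])
            for i in range(4)]


def mask_defines(mask_insn):
    """Build '#define' lines mapping insn_{8,16,32,64} to insn{b,w,d,q}."""
    return ["#define {}_{}\t{}{}".format(mask_insn, 8 << i, mask_insn,
                                         MASK_INSNS_EXT[i])
            for i in range(4)]
```

Calling `gpr_defines(["rax", "eax", "ax", "al"])` yields the four `rax_8` through `rax_64` lines in the same order as the generated header, and `mask_defines("kmov")` yields the `kmov_8` through `kmov_64` lines.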