From: Ken Matsui
Date: Thu, 14 Sep 2023 14:44:06 -0700
Subject: Re: [PATCH v11 16/40] c, c++: Use 16 bits for all use of enum rid for more keyword space
To: Joseph Myers
Cc: Ken Matsui, gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org
References: <20230914064949.29787-1-kmatsui@gcc.gnu.org> <20230914064949.29787-17-kmatsui@gcc.gnu.org> <308093c9-bc0-49b-36ce-8687612ffd88@codesourcery.com>
In-Reply-To: <308093c9-bc0-49b-36ce-8687612ffd88@codesourcery.com>

On Thu, Sep 14, 2023 at 10:54 AM Joseph Myers wrote:
>
> On Wed, 13 Sep 2023, Ken Matsui via Gcc-patches wrote:
>
> > diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h
> > index 545f0f4d9eb..eed6deaf0f8 100644
> > --- a/gcc/c/c-parser.h
> > +++ b/gcc/c/c-parser.h
> > @@ -51,14 +51,14 @@ enum c_id_kind {
> >  /* A single C token after string literal concatenation and conversion
> >     of preprocessing tokens to tokens.  */
> >  struct GTY (()) c_token {
> > +  /* If this token is a keyword, this value indicates which keyword.
> > +     Otherwise, this value is RID_MAX.  */
> > +  ENUM_BITFIELD (rid) keyword : 16;
> >    /* The kind of token.  */
> >    ENUM_BITFIELD (cpp_ttype) type : 8;
> >    /* If this token is a CPP_NAME, this value indicates whether also
> >       declared as some kind of type.  Otherwise, it is C_ID_NONE.
> >       */
> >    ENUM_BITFIELD (c_id_kind) id_kind : 8;
> > -  /* If this token is a keyword, this value indicates which keyword.
> > -     Otherwise, this value is RID_MAX.  */
> > -  ENUM_BITFIELD (rid) keyword : 8;
> >    /* If this token is a CPP_PRAGMA, this indicates the pragma that
> >       was seen.  Otherwise it is PRAGMA_NONE.  */
> >    ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
>
> If you want to optimize layout, I'd expect flags to move so it can share
> the same 32-bit unit as the pragma_kind bit-field (not sure if any changes
> should be made to the declaration of flags to maximise the chance of such
> sharing across different host bit-field ABIs).
>

Thank you for your review! I did not make this change aggressively,
but we can do the following to minimize the fragmentation:

struct GTY (()) c_token {
  tree value; /* pointer, depends, but 4 or 8 bytes as usual */
  location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */
  ENUM_BITFIELD (rid) keyword : 16; /* 2 bytes */
  ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
  ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
  ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
  unsigned char flags; /* 1 byte */
};

Assuming a pointer is 8 bytes and int is 4 bytes, the struct size
would be 24 bytes. The internal fragmentation (padding between
members) would be 0 bytes, and the external fragmentation (padding
after the last member) is 6 bytes since the overall struct alignment
requirement is $K_{max} = 8$ from the pointer.

Here is the original struct before making keyword 16-bit. The overall
struct alignment requirement is $K_{max} = 8$ from the pointer. This
struct size would be 24 bytes: the four 8-bit bit-fields fill the
first 4 bytes and location fills the next 4, so value is already
8-byte aligned and there is no internal fragmentation; the external
fragmentation is 7 bytes (after flags).

struct GTY (()) c_token {
  ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
  ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
  ENUM_BITFIELD (rid) keyword : 8; /* 1 byte */
  ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
  location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */
  tree value; /* pointer, depends, but 4 or 8 bytes as usual */
  unsigned char flags; /* 1 byte */
};

If we keep the original order with the 16-bit keyword, the struct size
would be 32 bytes (this is also what my current implementation does; I
will update this patch).

struct GTY (()) c_token {
  ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
  ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
  ENUM_BITFIELD (rid) keyword : 16; /* 2 bytes */
  ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
  location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */
  tree value; /* pointer, depends, but 4 or 8 bytes as usual */
  unsigned char flags; /* 1 byte */
};

Likewise, the overall struct alignment requirement is $K_{max} = 8$
from the pointer. The internal fragmentation would be 7 bytes (3 bytes
after pragma_kind + 4 bytes after location), and the external
fragmentation would be 7 bytes.

I think optimizing the size is worth doing unless this breaks GCC.

> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> > index 6cbb9a8e031..3c3c482c6ce 100644
> > --- a/gcc/cp/parser.h
> > +++ b/gcc/cp/parser.h
> > @@ -40,11 +40,11 @@ struct GTY(()) tree_check {
> >  /* A C++ token.  */
> >
> >  struct GTY (()) cp_token {
> > -  /* The kind of token.  */
> > -  enum cpp_ttype type : 8;
> >    /* If this token is a keyword, this value indicates which keyword.
> >       Otherwise, this value is RID_MAX.  */
> > -  enum rid keyword : 8;
> > +  enum rid keyword : 16;
> > +  /* The kind of token.  */
> > +  enum cpp_ttype type : 8;
> >    /* Token flags.
> >       */
> >    unsigned char flags;
> >    /* True if this token is from a context where it is implicitly extern "C" */
>
> You're missing an update to the "3 unused bits." comment further down.
>
> > @@ -988,7 +988,7 @@ struct GTY(()) cpp_hashnode {
> >    unsigned int directive_index : 7;    /* If is_directive,
> >                                            then index into directive table.
> >                                            Otherwise, a NODE_OPERATOR.  */
> > -  unsigned int rid_code : 8;            /* Rid code - for front ends.  */
> > +  unsigned int rid_code : 16;           /* Rid code - for front ends.  */
> >    unsigned int flags : 9;               /* CPP flags.  */
> >    ENUM_BITFIELD(node_type) type : 2;    /* CPP node type.  */
>
> You're missing an update to the "5 bits spare." comment further down.
>

Thank you!

> Do you have any figures for the effects on compilation time or memory
> usage from the increase in size of these structures?
>

Regarding only c_token, we will have the same size if we optimize the
size. Although I did not calculate the size of other structs, we might
not see any significant performance change? I am taking benchmarks and
will let you know once it is done.

> --
> Joseph S. Myers
> joseph@codesourcery.com
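
For reference, the size and padding arithmetic for the three c_token
orderings discussed in this message can be checked with a small
stand-alone C sketch. This is not GCC code. The struct names
(tok_orig, tok_wide, tok_packed), the TAIL_PAD macro and the
report_layouts helper are hypothetical and exist only for
illustration; tree is stood in for by void *, location_t by unsigned
int, and ENUM_BITFIELD (x) by unsigned int. Bit-field layout is
ABI-dependent, so the printed numbers apply only to the host the
sketch is built on. On a typical LP64 host with GCC one would expect
sizes of 24, 32 and 24 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed stand-ins: void * for tree, unsigned int for location_t and
   for every ENUM_BITFIELD member.  */

/* Original member order, 8-bit keyword.  */
struct tok_orig {
  unsigned int type : 8;
  unsigned int id_kind : 8;
  unsigned int keyword : 8;
  unsigned int pragma_kind : 8;
  unsigned int location;
  void *value;
  unsigned char flags;
};

/* Original member order, keyword widened to 16 bits.  */
struct tok_wide {
  unsigned int type : 8;
  unsigned int id_kind : 8;
  unsigned int keyword : 16;
  unsigned int pragma_kind : 8;
  unsigned int location;
  void *value;
  unsigned char flags;
};

/* Reordered: widest members first, 16-bit keyword.  */
struct tok_packed {
  void *value;
  unsigned int location;
  unsigned int keyword : 16;
  unsigned int type : 8;
  unsigned int id_kind : 8;
  unsigned int pragma_kind : 8;
  unsigned char flags;
};

/* Padding after the last member.  Valid here only because flags is the
   final, one-byte member of all three structs.  */
#define TAIL_PAD(T) (sizeof (T) - (offsetof (T, flags) + 1))

/* Print size, selected member offsets and tail padding per layout.  */
#define REPORT(T)                                                     \
  printf ("%-18s size=%zu location@%zu value@%zu flags@%zu tail=%zu\n", \
          #T, sizeof (T), offsetof (T, location), offsetof (T, value), \
          offsetof (T, flags), TAIL_PAD (T))

void
report_layouts (void)
{
  /* The expectations in the surrounding text assume 8-byte pointers
     and 4-byte unsigned int.  */
  assert (sizeof (void *) == 8 && sizeof (unsigned int) == 4);
  REPORT (struct tok_orig);
  REPORT (struct tok_wide);
  REPORT (struct tok_packed);
}
```

Calling report_layouts () from main prints one line per layout. The
offsets of location and value show where padding is inserted between
members, and the tail value shows the padding after flags. Bit-field
members cannot be passed to offsetof, which is why only the three
ordinary members are reported.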