From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <kmatsui@gcc.gnu.org>
Received: from mx0b-00641c01.pphosted.com (mx0b-00641c01.pphosted.com [205.220.177.146])
	by sourceware.org (Postfix) with ESMTPS id A650A38FCD09;
	Fri, 15 Sep 2023 02:36:45 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A650A38FCD09
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=gcc.gnu.org
Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=gcc.gnu.org
Received: from pps.filterd (m0247480.ppops.net [127.0.0.1])
	by mx0a-00641c01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 38F2UJ1K019207;
	Fri, 15 Sep 2023 02:36:45 GMT
Received: from mxout24.cac.washington.edu (mxout24.cac.washington.edu [140.142.234.158])
	by mx0a-00641c01.pphosted.com (PPS) with ESMTPS id 3t47de39pc-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
	Fri, 15 Sep 2023 02:36:44 +0000
Received: from smtp.washington.edu (smtp.washington.edu [128.208.60.54])
	by mxout24.cac.washington.edu (8.14.4+UW20.07/8.14.4+UW22.04) with ESMTP id 38F2agII009807
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
	Thu, 14 Sep 2023 19:36:43 -0700
X-Auth-Received: from localhost.localdomain ([10.154.75.179])
	(authenticated authid=kmatsui)
	by smtp.washington.edu (8.16.1+UW21.10/8.14.4+UW19.10) with ESMTPSA id 38F2ago8001192
	(version=TLSv1.2 cipher=DHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT);
	Thu, 14 Sep 2023 19:36:42 -0700
X-UW-Orig-Sender: kmatsui@smtp.washington.edu
From: Ken Matsui <kmatsui@gcc.gnu.org>
To: gcc-patches@gcc.gnu.org
Cc: libstdc++@gcc.gnu.org, Ken Matsui <kmatsui@gcc.gnu.org>
Subject: [PATCH v13 16/40] c, c++: Use 16 bits for all use of enum rid for more keyword space
Date: Thu, 14 Sep 2023 19:34:56 -0700
Message-ID: <20230915023640.75216-17-kmatsui@gcc.gnu.org>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20230915023640.75216-1-kmatsui@gcc.gnu.org>
References: <20230915022305.74083-1-kmatsui@gcc.gnu.org>
 <20230915023640.75216-1-kmatsui@gcc.gnu.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Proofpoint-GUID: ICUK7qtFIBHzNAtj04Pfe3MJs1OBFQEX
X-Proofpoint-ORIG-GUID: ICUK7qtFIBHzNAtj04Pfe3MJs1OBFQEX
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.267,Aquarius:18.0.980,Hydra:6.0.601,FMLib:17.11.176.26
 definitions=2023-09-15_02,2023-09-14_01,2023-05-22_02
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0
 priorityscore=1501 lowpriorityscore=0 suspectscore=0 phishscore=0
 mlxlogscore=999 spamscore=0 impostorscore=0 malwarescore=0 clxscore=1034
 mlxscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2308100000 definitions=main-2309150021
X-Spam-Status: No, score=-13.0 required=5.0 tests=BAYES_00,GIT_PATCH_0,JMQ_SPF_NEUTRAL,KAM_DMARC_STATUS,RCVD_IN_DNSWL_LOW,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NEUTRAL,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <libstdc++.gcc.gnu.org>

Now that RID_MAX has reached 255, the largest value an 8-bit field can hold,
widen every bit-field that stores an enum rid from 8 to 16 bits to make room
for more keywords.
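
To illustrate why 8 bits are no longer enough, here is a minimal sketch.  The
struct and function names are hypothetical stand-ins, not GCC code, and the
keyword is modeled as a plain unsigned int bit-field.

```cpp
#include <cassert>

// Hypothetical stand-ins for a token with an 8-bit and a 16-bit keyword.
struct narrow_token { unsigned int keyword : 8; };
struct wide_token { unsigned int keyword : 16; };

// Largest value representable in BITS bits (valid for BITS < 32).
constexpr unsigned int max_for_bits (unsigned int bits)
{
  return (1u << bits) - 1u;
}

static_assert (max_for_bits (8) == 255, "8 bits hold codes 0..255");
static_assert (max_for_bits (16) == 65535, "16 bits hold codes 0..65535");

// Store a keyword code in the 8-bit field and read it back.  Unsigned
// bit-fields wrap modulo 2^8, so codes above 255 come back changed.
inline unsigned int store_narrow (unsigned int code)
{
  narrow_token t = {};
  t.keyword = code;
  return t.keyword;
}

// Same round trip through the 16-bit field.
inline unsigned int store_wide (unsigned int code)
{
  wide_token t = {};
  t.keyword = code;
  return t.keyword;
}
```

A keyword code of 300 does not survive store_narrow but does survive
store_wide, which is the situation the updated assertion in c_parse_init
guards against at the 16-bit limit.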

For struct token_indent_info, the extra 8 bits do not change the overall
struct size because they only consume 1 of the 2 bytes of external
fragmentation (trailing padding).  Since reordering the fields would only
turn 1 byte of internal fragmentation into 1 byte of external fragmentation,
I keep the original field order.
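
The size claim can be checked with a small stand-alone sketch.  It assumes
location_t is a 32-bit unsigned int and models the enum bit-fields as
unsigned int bit-fields; the struct names are made up for the comparison.

```cpp
#include <cassert>

// Stand-in for GCC's location_t (assumption: a 32-bit unsigned integer).
typedef unsigned int location_t;

// Layout before the patch: 8-bit keyword.
struct indent_info_8 {
  location_t location;
  unsigned int type : 8;
  unsigned int keyword : 8;
};

// Layout after the patch: 16-bit keyword, same field order.
struct indent_info_16 {
  location_t location;
  unsigned int type : 8;
  unsigned int keyword : 16;
};

// The wider field is absorbed by tail padding, so the size must not change.
static_assert (sizeof (indent_info_16) == sizeof (indent_info_8),
               "widening keyword should not grow the struct");

// Run-time check: does RID_VALUE survive a round trip through the field?
inline bool keyword_round_trips (unsigned int rid_value)
{
  indent_info_16 info = {};
  info.keyword = rid_value;
  return info.keyword == rid_value;
}
```

With a 4-byte unsigned int I would expect both structs to be 8 bytes; other
ABIs may lay out bit-fields differently.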

For struct c_token, widening the field in place would increase the overall
struct size from 24 bytes to 32 bytes.  The original struct has 4 bytes of
internal fragmentation (after the location field) and 3 bytes of external
fragmentation.  Keeping the original order with the wider field gives
7 bytes of internal fragmentation (3 bytes after the pragma_kind field + 4
bytes after the location field) and 7 bytes of external fragmentation.
Because the original field order was not optimal, reordering the fields keeps
the overall size at the original 24 bytes, so I changed the field order to
the most compact one.
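
The three layouts can be compared with the sketch below.  It is only a model
under stated assumptions: an LP64 target (8-byte pointers, 4-byte unsigned
int), tree replaced by void *, location_t by unsigned int, and the enum
bit-fields by unsigned int bit-fields.  The struct names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned int location_t;  // stand-in for GCC's location_t
typedef void *tree;               // stand-in for GCC's tree pointer

// Original order with the 8-bit keyword.
struct c_token_before {
  unsigned int type : 8;
  unsigned int id_kind : 8;
  unsigned int keyword : 8;
  unsigned int pragma_kind : 8;
  location_t location;
  tree value;
  unsigned char flags;
};

// Original order with the keyword widened in place.
struct c_token_widened {
  unsigned int type : 8;
  unsigned int id_kind : 8;
  unsigned int keyword : 16;
  unsigned int pragma_kind : 8;
  location_t location;
  tree value;
  unsigned char flags;
};

// Field order used by the patch.
struct c_token_reordered {
  tree value;
  location_t location;
  unsigned int keyword : 16;
  unsigned int type : 8;
  unsigned int id_kind : 8;
  unsigned int pragma_kind : 8;
  unsigned char flags;
};

// Reordering must never be worse than widening in place.
static_assert (sizeof (c_token_reordered) <= sizeof (c_token_widened),
               "reordered layout should not be larger");
```

On x86-64 GNU/Linux I would expect sizeof to report 24, 32 and 24 bytes for
the three structs, in that order.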

For struct cp_token, the original struct size was 16 bytes with 3 bits of
internal fragmentation.  With the 16-bit keyword, the overall size becomes
24 bytes, with no external fragmentation and 7 bytes + 3 bits of internal
fragmentation.  Reordering the fields could only reduce internal
fragmentation; it cannot reduce the overall struct size, so I keep the
original field order.
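
One way to see that no field order helps is to compare the measured size
with a lower bound computed from the payload bits alone.  The sketch below
assumes an LP64 target; the union member types, the struct names and the
min_padded_size helper are stand-ins introduced here, not GCC code.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned int location_t;  // stand-in for GCC's location_t

struct cp_token_before {
  unsigned int type : 8;
  unsigned int keyword : 8;
  unsigned char flags;
  bool implicit_extern_c : 1;
  bool error_reported : 1;
  bool purged_p : 1;
  bool tree_check_p : 1;
  bool main_source_p : 1;
  location_t location;
  union { void *tree_check_value; void *value; } u;
};

struct cp_token_after {
  unsigned int type : 8;
  unsigned int keyword : 16;
  unsigned char flags;
  bool implicit_extern_c : 1;
  bool error_reported : 1;
  bool purged_p : 1;
  bool tree_check_p : 1;
  bool main_source_p : 1;
  location_t location;
  union { void *tree_check_value; void *value; } u;
};

// Smallest size any field order could reach: round the payload up to whole
// bytes, then up to a multiple of the struct alignment.
constexpr std::size_t
min_padded_size (std::size_t payload_bits, std::size_t align_bytes)
{
  return ((payload_bits + 7) / 8 + align_bytes - 1)
         / align_bytes * align_bytes;
}

// type + keyword + flags + 5 bools + location + pointer-sized union.
constexpr std::size_t payload_before = 8 + 8 + 8 + 5 + 32 + 64;   // 125 bits
constexpr std::size_t payload_after = 8 + 16 + 8 + 5 + 32 + 64;   // 133 bits
```

With 8-byte alignment, 133 payload bits need at least 17 bytes, which rounds
up to 24, so the measured size already matches the best any reordering could
achieve.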

Suppose a pointer takes 8 bytes and an int takes 4 bytes.  Then struct
ht_identifier takes 16 bytes, and union _cpp_hashnode_value takes 8 bytes.
For struct cpp_hashnode, the wider rid_code needs 1 more byte, so the fields
occupy 33 bytes excluding padding, up from 32 bytes before the change.  With
padding, the overall struct size becomes 40 bytes.  Since there is no
external fragmentation and the remaining 7 bytes + 5 bits are all internal
fragmentation (3 bytes + 5 bits of it before the deferred field), reordering
the fields does not reduce the overall size.  I keep the original field
order.
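
The same measurement for cpp_hashnode, as a sketch under the assumptions
stated above (8-byte pointers, 4-byte int).  The *_sketch types are
simplified stand-ins for ht_identifier and _cpp_hashnode_value, and deferred
is modeled as an unsigned int.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for struct ht_identifier: one pointer and two ints.
struct ht_identifier_sketch {
  const unsigned char *str;
  unsigned int len;
  unsigned int hash_value;
};

// Simplified stand-in for union _cpp_hashnode_value: pointer-sized.
union hashnode_value_sketch { void *ptr; };

struct cpp_hashnode_before {
  ht_identifier_sketch ident;
  unsigned int is_directive : 1;
  unsigned int directive_index : 7;
  unsigned int rid_code : 8;
  unsigned int flags : 9;
  unsigned int type : 2;
  unsigned int deferred;
  hashnode_value_sketch value;
};

struct cpp_hashnode_after {
  ht_identifier_sketch ident;
  unsigned int is_directive : 1;
  unsigned int directive_index : 7;
  unsigned int rid_code : 16;
  unsigned int flags : 9;
  unsigned int type : 2;
  unsigned int deferred;
  hashnode_value_sketch value;
};

// Widening a field can never shrink the struct.
static_assert (sizeof (cpp_hashnode_after) >= sizeof (cpp_hashnode_before),
               "widening rid_code should not shrink the struct");
```

On x86-64 GNU/Linux I would expect 32 bytes before and 40 bytes after, with
deferred moving from offset 20 to offset 24.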

gcc/c-family/ChangeLog:

	* c-indentation.h (struct token_indent_info): Make keyword 16 bits.

gcc/c/ChangeLog:

	* c-parser.cc (c_parse_init): Assert that RID_MAX does not exceed the
	maximum 16-bit value.
	* c-parser.h (struct c_token): Make keyword 16 bits. Reorder the
	fields to minimize memory fragmentation.

gcc/cp/ChangeLog:

	* parser.h (struct cp_token): Make keyword 16 bits.
	(struct cp_lexer): Make saved_keyword 16 bits.

libcpp/ChangeLog:

	* include/cpplib.h (struct cpp_hashnode): Make rid_code 16 bits.

Signed-off-by: Ken Matsui <kmatsui@gcc.gnu.org>
---
 gcc/c-family/c-indentation.h |  2 +-
 gcc/c/c-parser.cc            |  6 +++---
 gcc/c/c-parser.h             | 14 +++++++-------
 gcc/cp/parser.h              |  8 +++++---
 libcpp/include/cpplib.h      |  7 +++++--
 5 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/gcc/c-family/c-indentation.h b/gcc/c-family/c-indentation.h
index c0e07bf49f1..6d2b88f01a3 100644
--- a/gcc/c-family/c-indentation.h
+++ b/gcc/c-family/c-indentation.h
@@ -26,7 +26,7 @@ struct token_indent_info
 {
   location_t location;
   ENUM_BITFIELD (cpp_ttype) type : 8;
-  ENUM_BITFIELD (rid) keyword : 8;
+  ENUM_BITFIELD (rid) keyword : 16;
 };
 
 /* Extract token information from TOKEN, which ought to either be a
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index b9a1b75ca43..2086f253923 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -115,9 +115,9 @@ c_parse_init (void)
   tree id;
   int mask = 0;
 
-  /* Make sure RID_MAX hasn't grown past the 8 bits used to hold the keyword in
-     the c_token structure.  */
-  gcc_assert (RID_MAX <= 255);
+  /* Make sure RID_MAX hasn't grown past the 16 bits used to hold the keyword
+     in the c_token structure.  */
+  gcc_assert (RID_MAX <= 65535);
 
   mask |= D_CXXONLY;
   if (!flag_isoc99)
diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h
index 545f0f4d9eb..6a9bd22a793 100644
--- a/gcc/c/c-parser.h
+++ b/gcc/c/c-parser.h
@@ -51,21 +51,21 @@ enum c_id_kind {
 /* A single C token after string literal concatenation and conversion
    of preprocessing tokens to tokens.  */
 struct GTY (()) c_token {
+  /* The value associated with this token, if any.  */
+  tree value;
+  /* The location at which this token was found.  */
+  location_t location;
+  /* If this token is a keyword, this value indicates which keyword.
+     Otherwise, this value is RID_MAX.  */
+  ENUM_BITFIELD (rid) keyword : 16;
   /* The kind of token.  */
   ENUM_BITFIELD (cpp_ttype) type : 8;
   /* If this token is a CPP_NAME, this value indicates whether also
      declared as some kind of type.  Otherwise, it is C_ID_NONE.  */
   ENUM_BITFIELD (c_id_kind) id_kind : 8;
-  /* If this token is a keyword, this value indicates which keyword.
-     Otherwise, this value is RID_MAX.  */
-  ENUM_BITFIELD (rid) keyword : 8;
   /* If this token is a CPP_PRAGMA, this indicates the pragma that
      was seen.  Otherwise it is PRAGMA_NONE.  */
   ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
-  /* The location at which this token was found.  */
-  location_t location;
-  /* The value associated with this token, if any.  */
-  tree value;
   /* Token flags.  */
   unsigned char flags;
 
diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
index 6cbb9a8e031..7aa251d11b1 100644
--- a/gcc/cp/parser.h
+++ b/gcc/cp/parser.h
@@ -44,7 +44,7 @@ struct GTY (()) cp_token {
   enum cpp_ttype type : 8;
   /* If this token is a keyword, this value indicates which keyword.
      Otherwise, this value is RID_MAX.  */
-  enum rid keyword : 8;
+  enum rid keyword : 16;
   /* Token flags.  */
   unsigned char flags;
   /* True if this token is from a context where it is implicitly extern "C" */
@@ -59,7 +59,9 @@ struct GTY (()) cp_token {
   bool purged_p : 1;
   bool tree_check_p : 1;
   bool main_source_p : 1;
-  /* 3 unused bits.  */
+  /* These booleans use 5 bits within 1 byte, resulting in 3 unused bits.
+     Since there would be 3 bytes of padding before the location field,
+     the total unused bits would be 27 (= 3 + 24).  */
 
   /* The location at which this token was found.  */
   location_t location;
@@ -102,7 +104,7 @@ struct GTY (()) cp_lexer {
 
   /* Saved pieces of end token we replaced with the eof token.  */
   enum cpp_ttype saved_type : 8;
-  enum rid saved_keyword : 8;
+  enum rid saved_keyword : 16;
 
   /* The next lexer in a linked list of lexers.  */
   struct cp_lexer *next;
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index fcdaf082b09..7c37b861a77 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -988,11 +988,14 @@ struct GTY(()) cpp_hashnode {
   unsigned int directive_index : 7;	/* If is_directive,
 					   then index into directive table.
 					   Otherwise, a NODE_OPERATOR.  */
-  unsigned int rid_code : 8;		/* Rid code - for front ends.  */
+  unsigned int rid_code : 16;		/* Rid code - for front ends.  */
   unsigned int flags : 9;		/* CPP flags.  */
   ENUM_BITFIELD(node_type) type : 2;	/* CPP node type.  */
 
-  /* 5 bits spare.  */
+  /* These bitfields use 35 bits (= 1 + 7 + 16 + 9 + 2), which is 3 bits
+     more than 4 bytes and leaves 5 unused bits in the fifth byte.  Since
+     there would be 3 bytes of padding before the deferred field, the total
+     unused bits would be 29 (= 5 + 24).  */
 
   /* The deferred cookie is applicable to NT_USER_MACRO or NT_VOID.
      The latter for when a macro had a prevailing undef.
-- 
2.42.0