From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20230914064949.29787-1-kmatsui@gcc.gnu.org> <20230914064949.29787-17-kmatsui@gcc.gnu.org>
 <308093c9-bc0-49b-36ce-8687612ffd88@codesourcery.com>
In-Reply-To: <308093c9-bc0-49b-36ce-8687612ffd88@codesourcery.com>
From: Ken Matsui <kmatsui@cs.washington.edu>
Date: Thu, 14 Sep 2023 14:44:06 -0700
Message-ID: <CAML+3pU0OwkxJQBdpgOfpLd-OgWfD_W5cY3vQPYA=Uvwrtj6sg@mail.gmail.com>
Subject: Re: [PATCH v11 16/40] c, c++: Use 16 bits for all use of enum rid for
 more keyword space
To: Joseph Myers <joseph@codesourcery.com>
Cc: Ken Matsui <kmatsui@gcc.gnu.org>, gcc-patches@gcc.gnu.org, libstdc++@gcc.gnu.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
List-Id: <gcc-patches.gcc.gnu.org>

On Thu, Sep 14, 2023 at 10:54 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Wed, 13 Sep 2023, Ken Matsui via Gcc-patches wrote:
>
> > diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h
> > index 545f0f4d9eb..eed6deaf0f8 100644
> > --- a/gcc/c/c-parser.h
> > +++ b/gcc/c/c-parser.h
> > @@ -51,14 +51,14 @@ enum c_id_kind {
> >  /* A single C token after string literal concatenation and conversion
> >     of preprocessing tokens to tokens.  */
> >  struct GTY (()) c_token {
> > +  /* If this token is a keyword, this value indicates which keyword.
> > +     Otherwise, this value is RID_MAX.  */
> > +  ENUM_BITFIELD (rid) keyword : 16;
> >    /* The kind of token.  */
> >    ENUM_BITFIELD (cpp_ttype) type : 8;
> >    /* If this token is a CPP_NAME, this value indicates whether also
> >       declared as some kind of type.  Otherwise, it is C_ID_NONE.  */
> >    ENUM_BITFIELD (c_id_kind) id_kind : 8;
> > -  /* If this token is a keyword, this value indicates which keyword.
> > -     Otherwise, this value is RID_MAX.  */
> > -  ENUM_BITFIELD (rid) keyword : 8;
> >    /* If this token is a CPP_PRAGMA, this indicates the pragma that
> >       was seen.  Otherwise it is PRAGMA_NONE.  */
> >    ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
>
> If you want to optimize layout, I'd expect flags to move so it can share
> the same 32-bit unit as the pragma_kind bit-field (not sure if any changes
> should be made to the declaration of flags to maximise the chance of such
> sharing across different host bit-field ABIs).
>

Thank you for your review!

I was not aggressive about reordering in this patch, but we could lay
the struct out as follows to minimize fragmentation:

struct GTY (()) c_token {
  tree value; /* pointer: usually 4 or 8 bytes */
  location_t location; /* unsigned int: at least 2 bytes, usually 4 */
  ENUM_BITFIELD (rid) keyword : 16; /* 2 bytes */
  ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
  ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
  ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
  unsigned char flags; /* 1 byte */
};

Assuming 8-byte pointers and 4-byte ints, the struct size would be 24
bytes. The internal fragmentation would be 0 bytes, and the external
fragmentation (tail padding) would be 6 bytes, since the overall struct
alignment requirement is $K_{max} = 8$ from the pointer.

Here is the original struct before making keyword 16-bit. The overall
struct alignment requirement is $K_{max} = 8$ from the pointer. This
struct size would be 24 bytes: the four 8-bit fields fill one 4-byte
unit, so location and value are already aligned and there is no
internal fragmentation, and the external fragmentation is 7 bytes
(after flags).

struct GTY (()) c_token {
  ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
  ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
  ENUM_BITFIELD (rid) keyword : 8; /* 1 byte */
  ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
  location_t location; /* unsigned int: at least 2 bytes, usually 4 */
  tree value; /* pointer: usually 4 or 8 bytes */
  unsigned char flags; /* 1 byte */
};

If we keep the original order with the 16-bit keyword, the struct size
would be 32 bytes. My current patch also ends up at 32 bytes, so I will
update it.

struct GTY (()) c_token {
  ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
  ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
  ENUM_BITFIELD (rid) keyword : 16; /* 2 bytes */
  ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
  location_t location; /* unsigned int: at least 2 bytes, usually 4 */
  tree value; /* pointer: usually 4 or 8 bytes */
  unsigned char flags; /* 1 byte */
};

Likewise, the overall struct alignment requirement is $K_{max} = 8$
from the pointer. The internal fragmentation would be 7 bytes (3 bytes
after pragma_kind + 4 bytes after location), and the external
fragmentation would be 7 bytes.

I think optimizing the size is worth doing, as long as it does not
break GCC.

> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> > index 6cbb9a8e031..3c3c482c6ce 100644
> > --- a/gcc/cp/parser.h
> > +++ b/gcc/cp/parser.h
> > @@ -40,11 +40,11 @@ struct GTY(()) tree_check {
> >  /* A C++ token.  */
> >
> >  struct GTY (()) cp_token {
> > -  /* The kind of token.  */
> > -  enum cpp_ttype type : 8;
> >    /* If this token is a keyword, this value indicates which keyword.
> >       Otherwise, this value is RID_MAX.  */
> > -  enum rid keyword : 8;
> > +  enum rid keyword : 16;
> > +  /* The kind of token.  */
> > +  enum cpp_ttype type : 8;
> >    /* Token flags.  */
> >    unsigned char flags;
> >    /* True if this token is from a context where it is implicitly extern "C" */
>
> You're missing an update to the "3 unused bits." comment further down.
>
> > @@ -988,7 +988,7 @@ struct GTY(()) cpp_hashnode {
> >    unsigned int directive_index : 7;  /* If is_directive,
> >                                          then index into directive table.
> >                                          Otherwise, a NODE_OPERATOR.  */
> > -  unsigned int rid_code : 8;         /* Rid code - for front ends.  */
> > +  unsigned int rid_code : 16;                /* Rid code - for front ends.  */
> >    unsigned int flags : 9;            /* CPP flags.  */
> >    ENUM_BITFIELD(node_type) type : 2; /* CPP node type.  */
>
> You're missing an update to the "5 bits spare." comment further down.
>

Thank you!

> Do you have any figures for the effects on compilation time or memory
> usage from the increase in size of these structures?
>

For c_token alone, the size stays the same if we reorder the members as
above. I have not calculated the sizes of the other structs, but I do
not expect a significant performance change. I am running benchmarks
and will let you know once they are done.


> --
> Joseph S. Myers
> joseph@codesourcery.com