public inbox for gcc-patches@gcc.gnu.org
* [RFC] More compact (100x) -g3 .debug_macinfo
@ 2011-07-13 17:12 Jakub Jelinek
  2011-07-13 19:59 ` Tom Tromey
  2011-07-15 15:52 ` [RFC] More compact (100x) -g3 .debug_macinfo (take 2) Jakub Jelinek
  0 siblings, 2 replies; 25+ messages in thread
From: Jakub Jelinek @ 2011-07-13 17:12 UTC
  To: Jason Merrill, Richard Henderson, Tom Tromey, Jan Kratochvil,
	Roland McGrath, Cary Coutant, Mark Wielaard
  Cc: gcc-patches

Hi!

Currently .debug_macinfo is prohibitively large, because it doesn't
allow for any kind of merging of duplicate debug information.

This patch is an RFC for extensions that bring it down to manageable
levels.  The idea for the first reduction came, I think, from Jason
and/or Roland last year, and is similar to the introduction of
DW_FORM_strp to replace DW_FORM_string in some cases.
In particular, if the string in DW_MACINFO_define or DW_MACINFO_undef is
longer than 4 bytes including the terminating '\0' and there is a chance
the string might occur more than once, an offset into .debug_str is used
instead.  The usual .debug_str string merging then kicks in and removes
the duplicates.
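
As a rough illustration of the size trade-off described above, here is a
minimal sketch that compares the inline and indirect encodings of one
define/undef entry.  The helper names (uleb128_size, inline_entry_size,
indirect_entry_size, prefer_indirect) are hypothetical and not part of the
patch; only the "string with its NUL is longer than the offset size" test
mirrors the condition described here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes a value occupies as ULEB128.  */
static size_t uleb128_size(unsigned long value)
{
    size_t n = 0;
    do {
        value >>= 7;
        n++;
    } while (value != 0);
    return n;
}

/* Inline form: opcode byte, ULEB128 line number, NUL-terminated string.  */
static size_t inline_entry_size(unsigned long lineno, const char *macro)
{
    return 1 + uleb128_size(lineno) + strlen(macro) + 1;
}

/* Indirect form: opcode byte, ULEB128 line number, offset into .debug_str.  */
static size_t indirect_entry_size(unsigned long lineno, size_t offset_size)
{
    return 1 + uleb128_size(lineno) + offset_size;
}

/* Nonzero when the string, including its NUL, is longer than the offset
   that would replace it.  Whether .debug_str is mergeable on the target is
   a separate check that this sketch does not model.  */
static int prefer_indirect(const char *macro, size_t offset_size)
{
    assert(macro != NULL);
    return strlen(macro) + 1 > offset_size;
}
```

Note that a single indirect entry is not smaller overall, since the string
still has to live in .debug_str; the gain only appears once the linker
merges identical strings coming from many CUs.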

The second saving comes from merging identical sequences of
DW_MACINFO_define/undef ops.  Usually, when you include some header,
the macros it defines/undefines are the same.  Unfortunately it is hard
to merge whole headers, because:
1) DW_MACINFO_start_file uses .debug_line references, which prevent merging
   - different CUs have different .debug_line content
2) multiple inclusion of headers with single inclusion guards is quite
   common and makes such merging less than satisfactory: if some header
   includes <stdio.h> and you include that header in one source file
   without prior inclusion of stdio.h and in a different one after
   #include <stdio.h>, the .debug_macinfo sequence for that header
   differs, because in the second case the already included headers
   contribute nothing when they are included transitively
Unfortunately, as defined in DWARF{2,3,4}, .debug_macinfo does not really
allow extensions.  DW_MACINFO_vendor_ext doesn't count, because its
argument is a string, which cannot contain the embedded zeros needed
for offsets into other sections or other portions of the same section.

The following approach just grabs a range of .debug_macinfo opcodes for
vendor use, provided the DWARF committee gives such an approach a green light.
.debug_macinfo has 256 possible opcodes and defines just 5 (plus 1 for
termination); the remaining 250 are unused.
Another alternative would be to come up with a .debug_gnu_macinfo section or
similar and define a new DW_AT_GNU_macro_info attribute that would be
used instead of DW_AT_macro_info, but I'd prefer to stay with
.debug_macinfo.

The newly added opcodes:
DW_MACINFO_GNU_define_indirect4		0xe0
	This opcode has two arguments, one is uleb128 lineno and the
	other is 4 byte offset into .debug_str.  Except for the
	encoding of the string it is similar to DW_MACINFO_define.
DW_MACINFO_GNU_undef_indirect4		0xe1
	This opcode has two arguments, one is uleb128 lineno and the
	other is 4 byte offset into .debug_str.  Except for the
	encoding of the string it is similar to DW_MACINFO_undef.
DW_MACINFO_GNU_transparent_include4	0xe2
	This opcode has a single argument, a 4 byte offset into
	.debug_macinfo.  It instructs the debug info consumer that
	this opcode during reading should be replaced with the sequence
	of .debug_macinfo opcodes from the mentioned offset, up to
	a terminating 0 opcode (not including that 0).
DW_MACINFO_GNU_define_opcode		0xe3
	This is an opcode for future extensibility through which
	a debugger could skip unknown opcodes.  It has 3 arguments:
	1 byte opcode number, uleb128 count of arguments and
	an array of count bytes, each a DW_FORM_* code describing how
	the corresponding argument is encoded.
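
To illustrate how a consumer might handle the first three opcodes above,
here is a minimal sketch of a reader that walks a .debug_macinfo byte
buffer and follows DW_MACINFO_GNU_transparent_include4 by recursing to the
referenced offset.  It is not taken from any debugger: count_macro_ops and
the OP_* names are hypothetical, the buffer is assumed to be little-endian
and well-formed (no bounds checking), and it merely counts define/undef
entries instead of recording them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    OP_END = 0, OP_DEFINE = 1, OP_UNDEF = 2,
    OP_START_FILE = 3, OP_END_FILE = 4,
    OP_GNU_DEFINE_INDIRECT4 = 0xe0,
    OP_GNU_UNDEF_INDIRECT4 = 0xe1,
    OP_GNU_TRANSPARENT_INCLUDE4 = 0xe2
};

static uint64_t read_uleb128(const uint8_t *buf, size_t *pos)
{
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = buf[(*pos)++];
        result |= (uint64_t) (byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    return result;
}

/* Assumes a little-endian section; a real reader would use the
   byte order of the object file.  */
static uint32_t read_u32le(const uint8_t *buf, size_t *pos)
{
    uint32_t v = (uint32_t) buf[*pos]
                 | (uint32_t) buf[*pos + 1] << 8
                 | (uint32_t) buf[*pos + 2] << 16
                 | (uint32_t) buf[*pos + 3] << 24;
    *pos += 4;
    return v;
}

/* Count define/undef entries starting at POS, up to the terminating 0
   opcode, expanding transparent includes in place.  Returns -1 on an
   opcode this sketch does not know.  */
static int count_macro_ops(const uint8_t *sec, size_t size, size_t pos,
                           int depth)
{
    int count = 0;

    assert(depth < 16);   /* arbitrary guard against reference cycles */
    while (pos < size && sec[pos] != OP_END) {
        uint8_t op = sec[pos++];
        switch (op) {
        case OP_DEFINE:
        case OP_UNDEF:
            read_uleb128(sec, &pos);                        /* lineno */
            pos += strlen((const char *) sec + pos) + 1;    /* string */
            count++;
            break;
        case OP_GNU_DEFINE_INDIRECT4:
        case OP_GNU_UNDEF_INDIRECT4:
            read_uleb128(sec, &pos);    /* lineno */
            read_u32le(sec, &pos);      /* .debug_str offset */
            count++;
            break;
        case OP_START_FILE:
            read_uleb128(sec, &pos);    /* lineno */
            read_uleb128(sec, &pos);    /* file index */
            break;
        case OP_END_FILE:
            break;
        case OP_GNU_TRANSPARENT_INCLUDE4: {
            uint32_t target = read_u32le(sec, &pos);
            int nested = count_macro_ops(sec, size, target, depth + 1);
            if (nested < 0)
                return -1;
            count += nested;
            break;
        }
        default:
            return -1;
        }
    }
    return count;
}
```

The recursion stops at the 0 opcode of the included chunk and then resumes
right after the 4-byte offset in the including sequence, which matches the
"replace this opcode with the referenced sequence" wording above.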
The debug info producers have to ensure that opcodes in
DW_MACINFO_GNU_transparent_include4 chains reference the right sections
for any .debug_macinfo that includes them (which essentially means
that DW_MACINFO_start_file can't be used in the transparent_include4
chain).  Perhaps it would be cleaner not to encode the offset size in
the opcode values/names and instead have DW_MACINFO_GNU_define_indirect
and DW_MACINFO_GNU_undef_indirect whose arguments would be
DW_FORM_udata and DW_FORM_strp (i.e. offset size).  The producers
would then need to ensure that .debug_macinfo chains with different
assumed offset sizes aren't merged together, which could be done
e.g. by using wm4.[<filename>.]<lineno>.<md5> and wm8.* comdat
groups instead of the current wm.*.  DW_MACINFO_GNU_transparent_include4
would then have a single DW_FORM_sec_offset argument and
DW_MACINFO_GNU_define_opcode would have DW_FORM_data1 and DW_FORM_block
arguments, and the implicit opcode definitions assumed at the start
of every .debug_macinfo would be:
DW_MACINFO_GNU_define_opcode <0, 0 []>
DW_MACINFO_GNU_define_opcode <DW_MACINFO_define, 2 [DW_FORM_udata, DW_FORM_string]>
DW_MACINFO_GNU_define_opcode <DW_MACINFO_undef, 2 [DW_FORM_udata, DW_FORM_string]>
DW_MACINFO_GNU_define_opcode <DW_MACINFO_start_file, 2 [DW_FORM_udata, DW_FORM_udata]>
DW_MACINFO_GNU_define_opcode <DW_MACINFO_end_file, 1 [DW_FORM_udata]>
DW_MACINFO_GNU_define_opcode <DW_MACINFO_GNU_define_indirect, 2 [DW_FORM_udata, DW_FORM_strp]>
DW_MACINFO_GNU_define_opcode <DW_MACINFO_GNU_undef_indirect, 2 [DW_FORM_udata, DW_FORM_strp]>
DW_MACINFO_GNU_define_opcode <DW_MACINFO_GNU_define_opcode, 2 [DW_FORM_data1, DW_FORM_block]>
DW_MACINFO_GNU_define_opcode <DW_MACINFO_vendor_ext, 1 [DW_FORM_string]>
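
A consumer could use such definitions to step over entries it does not
understand.  The following is a minimal sketch under that assumption:
struct opcode_desc, skip_operand and skip_entry are hypothetical names,
only the forms listed above are handled, and no bounds checking is done.
The FORM_* values are the standard DW_FORM_* encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    FORM_STRING = 0x08, FORM_BLOCK = 0x09, FORM_DATA1 = 0x0b,
    FORM_STRP = 0x0e, FORM_UDATA = 0x0f, FORM_SEC_OFFSET = 0x17
};

#define MAX_OPERANDS 4

/* One table slot per opcode, filled from define_opcode entries.  */
struct opcode_desc {
    int known;
    unsigned nargs;
    uint8_t forms[MAX_OPERANDS];
};

/* Skip a ULEB128 starting at POS; optionally return its value.  */
static size_t skip_uleb128(const uint8_t *buf, size_t pos, uint64_t *value)
{
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = buf[pos++];
        result |= (uint64_t) (byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (value)
        *value = result;
    return pos;
}

/* Return the position just past one operand encoded with FORM.  */
static size_t skip_operand(uint8_t form, const uint8_t *buf, size_t pos,
                           size_t offset_size)
{
    uint64_t len;

    switch (form) {
    case FORM_UDATA:
        return skip_uleb128(buf, pos, NULL);
    case FORM_STRING:
        return pos + strlen((const char *) buf + pos) + 1;
    case FORM_DATA1:
        return pos + 1;
    case FORM_STRP:
    case FORM_SEC_OFFSET:
        return pos + offset_size;
    case FORM_BLOCK:
        pos = skip_uleb128(buf, pos, &len);
        return pos + (size_t) len;
    default:
        assert(!"form not handled in this sketch");
        return pos;
    }
}

/* POS points just past the opcode byte OP; returns the position of the
   next opcode.  */
static size_t skip_entry(const struct opcode_desc *table, uint8_t op,
                         const uint8_t *buf, size_t pos, size_t offset_size)
{
    unsigned i;

    assert(table[op].known);
    for (i = 0; i < table[op].nargs; i++)
        pos = skip_operand(table[op].forms[i], buf, pos, offset_size);
    return pos;
}
```

With a table like this, the meaning of a new opcode can stay unknown to an
older debugger while its length is still computable, which is all that is
needed to keep reading the rest of the section.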

This approach doesn't need any linker changes.  The slight disadvantage is
a small increase in the size of -g3 built object files (e.g. on i686-linux
the -g3 -O2 gcc/*.o files were together 461.3MB before this patch and
518.6MB with it, i.e. about 12% more), but the size of cc1plus shrank
significantly, from 428.9MB down to 92.6MB.  Previously, the .debug_macinfo
section occupied 339MB in cc1plus and .debug_str 1MB; with the patch
.debug_macinfo has 1MB and .debug_str 2.5MB.  .debug_str wasn't used
for macinfo before, so macinfo now takes 2.5MB in total, compared to
339MB before.

2011-07-13  Jakub Jelinek  <jakub@redhat.com>

	* dwarf2.h (DW_MACINFO_lo_user, DW_MACINFO_hi_user): Add.
	(DW_MACINFO_GNU_define_indirect4, DW_MACINFO_GNU_undef_indirect4,
	DW_MACINFO_GNU_transparent_include4, DW_MACINFO_GNU_define_opcode):
	Add.

	* dwarf2out.c (dwarf2out_undef): Remove redundant semicolon.
	(htab_macinfo_hash, htab_macinfo_eq, output_macinfo_op): New
	functions.
	(output_macinfo): Use them.  If !dwarf_strict and .debug_str is
	mergeable, optimize longer strings using
	DW_MACINFO_GNU_{define,undef}_indirect4 and if HAVE_COMDAT and ELF,
	optimize longer sequences of define/undef ops from headers
	using DW_MACINFO_GNU_transparent_include4.

--- include/dwarf2.h.jj	2011-06-23 10:14:06.000000000 +0200
+++ include/dwarf2.h	2011-07-13 11:39:49.000000000 +0200
@@ -877,7 +877,13 @@ enum dwarf_macinfo_record_type
     DW_MACINFO_undef = 2,
     DW_MACINFO_start_file = 3,
     DW_MACINFO_end_file = 4,
-    DW_MACINFO_vendor_ext = 255
+    DW_MACINFO_lo_user = 0xe0,
+    DW_MACINFO_GNU_define_indirect4 = 0xe0,
+    DW_MACINFO_GNU_undef_indirect4 = 0xe1,
+    DW_MACINFO_GNU_transparent_include4 = 0xe2,
+    DW_MACINFO_GNU_define_opcode = 0xe3,
+    DW_MACINFO_hi_user = 0xfe,
+    DW_MACINFO_vendor_ext = 0xff
   };
 \f
 /* @@@ For use with GNU frame unwind information.  */
--- gcc/dwarf2out.c.jj	2011-07-12 17:59:01.000000000 +0200
+++ gcc/dwarf2out.c	2011-07-13 17:04:17.000000000 +0200
@@ -20383,17 +20383,118 @@ dwarf2out_undef (unsigned int lineno ATT
       macinfo_entry e;
       e.code = DW_MACINFO_undef;
       e.lineno = lineno;
-      e.info = xstrdup (buffer);;
+      e.info = xstrdup (buffer);
       VEC_safe_push (macinfo_entry, gc, macinfo_table, &e);
     }
 }
 
+/* Routines to manipulate hash table of macinfo entries, keyed by info.  */
+static hashval_t
+htab_macinfo_hash (const void *of)
+{
+  const macinfo_entry *const entry =
+    (const macinfo_entry *) of;
+
+  return htab_hash_string (entry->info);
+}
+
+static int
+htab_macinfo_eq (const void *of1, const void *of2)
+{
+  const macinfo_entry *const entry1 = (const macinfo_entry *) of1;
+  const macinfo_entry *const entry2 = (const macinfo_entry *) of2;
+
+  return !strcmp (entry1->info, entry2->info);
+}
+
+/* Output a single .debug_macinfo entry.  */
+
+static void
+output_macinfo_op (macinfo_entry *ref)
+{
+  int file_num;
+  size_t len;
+  struct indirect_string_node *node;
+  char label[MAX_ARTIFICIAL_LABEL_BYTES];
+
+  switch (ref->code)
+    {
+    case DW_MACINFO_start_file:
+      file_num = maybe_emit_file (lookup_filename (ref->info));
+      dw2_asm_output_data (1, DW_MACINFO_start_file, "Start new file");
+      dw2_asm_output_data_uleb128 (ref->lineno,
+				   "Included from line number %lu", 
+				   (unsigned long) ref->lineno);
+      dw2_asm_output_data_uleb128 (file_num, "file %s", ref->info);
+      break;
+    case DW_MACINFO_end_file:
+      dw2_asm_output_data (1, DW_MACINFO_end_file, "End file");
+      break;
+    case DW_MACINFO_define:
+    case DW_MACINFO_undef:
+      len = strlen (ref->info) + 1;
+      if (!dwarf_strict
+	  && len > DWARF_OFFSET_SIZE
+	  && DWARF_OFFSET_SIZE == 4
+	  && !DWARF2_INDIRECT_STRING_SUPPORT_MISSING_ON_TARGET
+	  && (debug_str_section->common.flags & SECTION_MERGE) != 0)
+	{
+	  ref->code = ref->code == DW_MACINFO_define
+		      ? DW_MACINFO_GNU_define_indirect4
+		      : DW_MACINFO_GNU_undef_indirect4;
+	  output_macinfo_op (ref);
+	  return;
+	}
+      dw2_asm_output_data (1, ref->code,
+			   ref->code == DW_MACINFO_define
+			   ? "Define macro" : "Undefine macro");
+      dw2_asm_output_data_uleb128 (ref->lineno, "At line number %lu", 
+				   (unsigned long) ref->lineno);
+      dw2_asm_output_nstring (ref->info, -1, "The macro");
+      break;
+    case DW_MACINFO_GNU_define_indirect4:
+    case DW_MACINFO_GNU_undef_indirect4:
+      node = find_AT_string (ref->info);
+      if (node->form != DW_FORM_strp)
+	{
+	  char label[32];
+	  ASM_GENERATE_INTERNAL_LABEL (label, "LASF", dw2_string_counter);
+	  ++dw2_string_counter;
+	  node->label = xstrdup (label);
+	  node->form = DW_FORM_strp;
+	}
+      dw2_asm_output_data (1, ref->code,
+			   ref->code == DW_MACINFO_GNU_define_indirect4
+			   ? "Define macro indirect4"
+			   : "Undefine macro indirect4");
+      dw2_asm_output_data_uleb128 (ref->lineno, "At line number %lu",
+				   (unsigned long) ref->lineno);
+      dw2_asm_output_offset (DWARF_OFFSET_SIZE, node->label,
+			     debug_str_section, "The macro: \"%s\"",
+			     ref->info);
+      break;
+    case DW_MACINFO_GNU_transparent_include4:
+      dw2_asm_output_data (1, ref->code, "Transparent include4");
+      ASM_GENERATE_INTERNAL_LABEL (label,
+				   DEBUG_MACINFO_SECTION_LABEL, ref->lineno);
+      dw2_asm_output_offset (DWARF_OFFSET_SIZE, label, NULL, NULL);
+      break;
+    default:
+      fprintf (asm_out_file, "%s unrecognized macinfo code %lu\n",
+	       ASM_COMMENT_START, (unsigned long) ref->code);
+      break;
+    }
+}
+
 static void
 output_macinfo (void)
 {
   unsigned i;
   unsigned long length = VEC_length (macinfo_entry, macinfo_table);
-  macinfo_entry *ref;
+  macinfo_entry *ref, *ref2;
+  VEC (macinfo_entry, gc) *files = NULL;
+  unsigned long transparent_includes = 0;
+  htab_t macinfo_htab = NULL;
 
   if (! length)
     return;
@@ -20402,37 +20503,185 @@ output_macinfo (void)
     {
       switch (ref->code)
 	{
-	  case DW_MACINFO_start_file:
+	case DW_MACINFO_start_file:
+	  VEC_safe_push (macinfo_entry, gc, files, ref);
+	  break;
+	case DW_MACINFO_end_file:
+	  if (!VEC_empty (macinfo_entry, files))
 	    {
-	      int file_num = maybe_emit_file (lookup_filename (ref->info));
-	      dw2_asm_output_data (1, DW_MACINFO_start_file, "Start new file");
-	      dw2_asm_output_data_uleb128 
-			(ref->lineno, "Included from line number %lu", 
-			 			(unsigned long)ref->lineno);
-	      dw2_asm_output_data_uleb128 (file_num, "file %s", ref->info);
+	      ref2 = VEC_last (macinfo_entry, files);
+	      free (CONST_CAST (char *, ref2->info));
+	      VEC_pop (macinfo_entry, files);
 	    }
-	    break;
-	  case DW_MACINFO_end_file:
-	    dw2_asm_output_data (1, DW_MACINFO_end_file, "End file");
-	    break;
-	  case DW_MACINFO_define:
-	    dw2_asm_output_data (1, DW_MACINFO_define, "Define macro");
-	    dw2_asm_output_data_uleb128 (ref->lineno, "At line number %lu", 
-			 			(unsigned long)ref->lineno);
-	    dw2_asm_output_nstring (ref->info, -1, "The macro");
-	    break;
-	  case DW_MACINFO_undef:
-	    dw2_asm_output_data (1, DW_MACINFO_undef, "Undefine macro");
-	    dw2_asm_output_data_uleb128 (ref->lineno, "At line number %lu",
-			 			(unsigned long)ref->lineno);
-	    dw2_asm_output_nstring (ref->info, -1, "The macro");
-	    break;
-	  default:
-	   fprintf (asm_out_file, "%s unrecognized macinfo code %lu\n",
-	     ASM_COMMENT_START, (unsigned long)ref->code);
+	  break;
+	case DW_MACINFO_define:
+	case DW_MACINFO_undef:
+#ifdef OBJECT_FORMAT_ELF
+	  if (!dwarf_strict
+	      && HAVE_COMDAT_GROUP
+	      && DWARF_OFFSET_SIZE == 4
+	      && VEC_length (macinfo_entry, files) != 1
+	      && i > 0
+	      && i + 1 < length
+	      && VEC_index (macinfo_entry, macinfo_table, i - 1)->code == 0)
+	    {
+	      char linebuf[sizeof (HOST_WIDE_INT) * 3 + 1];
+	      unsigned char checksum[16];
+	      struct md5_ctx ctx;
+	      char *tmp, *tail;
+	      const char *base;
+	      unsigned int j = i, k, l;
+	      void **slot;
+
+	      ref2 = VEC_index (macinfo_entry, macinfo_table, i + 1);
+	      if (ref2->code != DW_MACINFO_define
+		  && ref2->code != DW_MACINFO_undef)
+		break;
+
+	      if (VEC_empty (macinfo_entry, files))
+		{
+		  if (ref->lineno != 0 || ref2->lineno != 0)
+		    break;
+		}
+	      else if (ref->lineno == 0)
+		break;
+	      md5_init_ctx (&ctx);
+	      for (; VEC_iterate (macinfo_entry, macinfo_table, j, ref2); j++)
+		if (ref2->code != DW_MACINFO_define
+		    && ref2->code != DW_MACINFO_undef)
+		  break;
+		else if (ref->lineno == 0 && ref2->lineno != 0)
+		  break;
+		else
+		  {
+		    unsigned char code = ref2->code;
+		    md5_process_bytes (&code, 1, &ctx);
+		    checksum_uleb128 (ref2->lineno, &ctx);
+		    md5_process_bytes (ref2->info, strlen (ref2->info) + 1,
+				       &ctx);
+		  }
+	      md5_finish_ctx (&ctx, checksum);
+	      if (ref->lineno == 0)
+		base = "";
+	      else
+		base = lbasename (VEC_last (macinfo_entry, files)->info);
+	      for (l = 0, k = 0; base[k]; k++)
+		if (ISIDNUM (base[k]) || base[k] == '.')
+		  l++;
+	      if (l)
+		l++;
+	      sprintf (linebuf, HOST_WIDE_INT_PRINT_UNSIGNED,
+		       VEC_index (macinfo_entry, macinfo_table, i)->lineno);
+	      tmp = XNEWVEC (char, 3 + l + strlen (linebuf) + 1 + 16 * 2 + 1);
+	      strcpy (tmp, "wm.");
+	      tail = tmp + 3;
+	      if (l)
+		{
+		  for (k = 0; base[k]; k++)
+		    if (ISIDNUM (base[k]) || base[k] == '.')
+		      *tail++ = base[k];
+		  *tail++ = '.';
+		}
+	      l = strlen (linebuf);
+	      memcpy (tail, linebuf, l);
+	      tail += l;
+	      *tail++ = '.';
+	      for (k = 0; k < 16; k++)
+		sprintf (tail + k * 2, "%02x", checksum[k] & 0xff);
+	      ref2 = VEC_index (macinfo_entry, macinfo_table, i - 1);
+	      ref2->code = DW_MACINFO_GNU_transparent_include4;
+	      ref2->lineno = 0;
+	      ref2->info = tmp;
+	      if (macinfo_htab == NULL)
+		macinfo_htab = htab_create (10, htab_macinfo_hash,
+					    htab_macinfo_eq, NULL);
+	      slot = htab_find_slot (macinfo_htab, ref2, INSERT);
+	      if (*slot != NULL)
+		{
+		  free (CONST_CAST (char *, ref2->info));
+		  ref2->code = 0;
+		  ref2->info = NULL;
+		  ref2 = (macinfo_entry *) *slot;
+		  output_macinfo_op (ref2);
+		  for (j = i;
+		       VEC_iterate (macinfo_entry, macinfo_table, j, ref2);
+		       j++)
+		    if (ref2->code != DW_MACINFO_define
+			&& ref2->code != DW_MACINFO_undef)
+		      break;
+		    else if (ref->lineno == 0 && ref2->lineno != 0)
+		      break;
+		    else
+		      {
+			ref2->code = 0;
+			free (CONST_CAST (char *, ref2->info));
+			ref2->info = NULL;
+		      }
+		}
+	      else
+		{
+		  *slot = ref2;
+		  ref2->lineno = ++transparent_includes;
+		  output_macinfo_op (ref2);
+		}
+	      i = j - 1;
+	      continue;
+	    }
+#endif
+	  break;
+	default:
 	  break;
 	}
+      output_macinfo_op (ref);
+      /* For DW_MACINFO_start_file ref->info has been copied into files
+	 vector.  */
+      if (ref->code != DW_MACINFO_start_file)
+	free (CONST_CAST (char *, ref->info));
+      ref->info = NULL;
+      ref->code = 0;
     }
+
+  if (!transparent_includes)
+    return;
+
+  htab_delete (macinfo_htab);
+
+#ifdef OBJECT_FORMAT_ELF
+  for (i = 0; VEC_iterate (macinfo_entry, macinfo_table, i, ref); i++)
+    switch (ref->code)
+      {
+      case 0:
+	continue;
+      case DW_MACINFO_GNU_transparent_include4:
+	{
+	  char label[MAX_ARTIFICIAL_LABEL_BYTES];
+	  tree comdat_key = get_identifier (ref->info);
+	  /* Terminate the previous .debug_macinfo section.  */
+	  dw2_asm_output_data (1, 0, "End compilation unit");
+	  targetm.asm_out.named_section (DEBUG_MACINFO_SECTION,
+					 SECTION_DEBUG
+					 | SECTION_LINKONCE,
+					 comdat_key);
+	  ASM_GENERATE_INTERNAL_LABEL (label,
+				       DEBUG_MACINFO_SECTION_LABEL,
+				       ref->lineno);
+	  ASM_OUTPUT_LABEL (asm_out_file, label);
+	  ref->code = 0;
+	  free (CONST_CAST (char *, ref->info));
+	  ref->info = NULL;
+	}
+	break;
+      case DW_MACINFO_define:
+      case DW_MACINFO_undef:
+	output_macinfo_op (ref);
+	ref->code = 0;
+	free (CONST_CAST (char *, ref->info));
+	ref->info = NULL;
+	break;
+      default:
+	gcc_unreachable ();
+      }
+#endif
 }
 
 /* Set up for Dwarf output at the start of compilation.  */

	Jakub


end of thread, other threads:[~2011-07-26  5:17 UTC | newest]

Thread overview: 25+ messages
2011-07-13 17:12 [RFC] More compact (100x) -g3 .debug_macinfo Jakub Jelinek
2011-07-13 19:59 ` Tom Tromey
2011-07-13 20:37   ` Jakub Jelinek
2011-07-18 15:42     ` Tom Tromey
2011-07-15 15:52 ` [RFC] More compact (100x) -g3 .debug_macinfo (take 2) Jakub Jelinek
2011-07-15 17:19   ` Richard Henderson
2011-07-15 21:18     ` [RFC] More compact (100x) -g3 .debug_macinfo (take 3) Jakub Jelinek
2011-07-18 15:09       ` Tom Tromey
2011-07-20  1:17       ` Richard Henderson
2011-07-21 11:38         ` [RFC] More compact (100x) -g3 .debug_gnu_macro (take 4) Jakub Jelinek
2011-07-21 17:25           ` Richard Henderson
2011-07-21 18:13             ` Jakub Jelinek
2011-07-22 13:49             ` [RFC] More compact (100x) -g3 .debug_gnu_macro (take 5) Jakub Jelinek
2011-07-22 15:34               ` Tom Tromey
2011-07-22 17:24               ` Richard Henderson
2011-07-22 20:33             ` [RFC] More compact (100x) -g3 .debug_gnu_macro (take 4) Michael Eager
2011-07-22 21:50               ` Richard Henderson
2011-07-22 21:51                 ` Michael Eager
2011-07-22 22:10                   ` Richard Henderson
2011-07-23  0:32                     ` Michael Eager
2011-07-23  0:36                       ` Richard Henderson
2011-07-26  7:34                         ` Jason Merrill
2011-07-15 18:28   ` [RFC] More compact (100x) -g3 .debug_macinfo (take 2) Tom Tromey
2011-07-15 19:21     ` Jakub Jelinek
2011-07-15 19:30       ` Tom Tromey
