From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 2103)
        id 027813849AFF; Fri, 19 Apr 2024 15:51:54 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 027813849AFF
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
        s=default; t=1713541915;
        bh=aiG3WoOqYfyvB+Pa1oS2ta3panLz/i5CO4HAwDJrg/8=;
        h=From:To:Subject:Date:From;
        b=ZIUGCxL93QeFpyN8A+40EAdldlBOTEi5sUK1dTEPxb0QNKjeY9sreTspbYEIdOjB7
         NWFceMFu55+5InQa2w1aI9sSYjalvCLFdfL+FWU0CNCgv7GdwOh3X5+LGOxp9LPE12
         N9yJtiisEAJiOUlOCSpAGnMRq8MmvD0YfUGJTcS4=
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
From: Nick Alcock
To: binutils-cvs@sourceware.org
Subject: [binutils-gdb] libctf: rethink strtab writeout
X-Act-Checkin: binutils-gdb
X-Git-Author: Nick Alcock
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 149ce5c263616e657ff8d108419d2eca54532b5a
X-Git-Newrev: cf9da3b0b6a6ae9d71ba36898a5e39710150f85e
Message-Id: <20240419155155.027813849AFF@sourceware.org>
Date: Fri, 19 Apr 2024 15:51:54 +0000 (GMT)
List-Id:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=cf9da3b0b6a6ae9d71ba36898a5e39710150f85e

commit cf9da3b0b6a6ae9d71ba36898a5e39710150f85e
Author: Nick Alcock
Date: Mon Mar 25 19:07:43 2024 +0000

    libctf: rethink strtab writeout

    This commit finally adjusts strtab writeout so that repeated writeouts,
    or writeouts of a dict that was read in earlier, only sort the portion
    of the strtab that was newly added.
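The core idea of the new writeout is that the existing strtab bytes are copied unchanged to the front of the new strtab, so every offset already handed out stays valid, and only the newly added strings are sorted and appended after it. The following is a minimal sketch of that idea, not libctf's implementation: `strtab_t` and `strtab_append_sorted` are hypothetical names, and a plain array of new strings stands in for the provisional strtab and atoms table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a writable strtab: consecutive NUL-terminated
   strings, with offset 0 holding the empty (null) string.  */
typedef struct
{
  char *strs;
  size_t len;
} strtab_t;

/* qsort comparator over an array of const char *.  */
static int
cmp_strp (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Build OUT from OLD plus N new strings.  OLD's bytes are copied unchanged,
   so offsets already handed out stay valid; only the new strings are sorted
   (NEW_STRS is sorted in place) and appended.  OFFSETS[i] receives the
   offset of NEW_STRS[i] after sorting.  Returns 0 on success, -1 on OOM.  */
static int
strtab_append_sorted (const strtab_t *old, const char **new_strs, size_t n,
                      uint32_t *offsets, strtab_t *out)
{
  size_t len = old->len > 0 ? old->len : 1;   /* Room for "" if empty.  */
  size_t cur, i;

  for (i = 0; i < n; i++)
    len += strlen (new_strs[i]) + 1;

  if ((out->strs = malloc (len)) == NULL)
    return -1;
  out->len = len;

  if (old->len == 0)
    {
      out->strs[0] = '\0';      /* Fresh table: start with the null string.  */
      cur = 1;
    }
  else
    {
      memcpy (out->strs, old->strs, old->len);  /* Keep the old prefix.  */
      cur = old->len;
    }

  if (n > 0)
    qsort (new_strs, n, sizeof (const char *), cmp_strp);

  for (i = 0; i < n; i++)
    {
      offsets[i] = (uint32_t) cur;
      strcpy (&out->strs[cur], new_strs[i]);
      cur += strlen (new_strs[i]) + 1;
    }

  assert (cur == len);          /* Every byte is accounted for.  */
  return 0;
}
```

Because the old prefix is never re-sorted, the result is only sorted chunk by chunk; the commit notes that this gives nearly the same compression benefit as sorting the whole table.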

    There are three intertwined changes here:

     - pull the contents of strtabs from newly ctf_bufopened dicts into the
       atoms table, so that future additions will reuse the existing offset
       etc. rather than adding new identical strings
     - allow the internal ctf_bufopen done by serialization to contribute its
       existing atoms table, so that existing atoms can be used for the
       remainder of the open process (like name table construction): this
       atoms table currently gets thrown away in the mass reassignment done
       later in ctf_serialize in any case, but it needs to be there during
       the open.
     - rewrite ctf_str_write_strtab so that a) it uses iterators rather than
       ctf_*_iter, reducing pointless structures which serve no other purpose
       than to implement ordinary variable scope, but more clunkily, and b)
       retains the existing strtab on the front of the new one, with its sort
       retained, rather than resorting, so all existing already-written
       strtab offsets remain valid across the call.

    This latter change finally permits repeated serializations, and
    reserializations of ctf_open()ed dicts, to work, but for now we keep the
    code that prevents that because serialization is about to change again
    in a way that will make it more obvious that doing such things is safe,
    and we can take it out then.

    (There are also some smaller changes like moving the purge of the refs
    table into ctf_str_write_strtab(), since that's where the changes happen
    that invalidate it, rather than doing it in ctf_serialize().  We also
    prohibit something that has never worked, opening a dict and then
    reporting symbols to it via ctf_link_add_strtab() et al: you must do that
    to newly-created dicts which have had stuff ctf_link()ed into them.  This
    is very unlikely ever to be a problem in practice: linkers just don't do
    that sort of thing.)

    libctf/

        * ctf-create.c (ctf_create): Add (temporary) atoms arg.
        * ctf-impl.h (struct ctf_dict.ctf_dynstrtab): New.
        (ctf_str_create_atoms): Adjust.
        (ctf_str_write_strtab): Likewise.
        (ctf_simple_open_internal): Likewise.
        * ctf-open.c (ctf_simple_open_internal): Add atoms arg.
        (ctf_bufopen): Likewise.
        (ctf_bufopen_internal): Initialize just enough of an atoms table:
        pre-init from the atoms arg if supplied.
        (ctf_simple_open): Adjust.
        * ctf-serialize.c (ctf_serialize): Constify the strtab.  Move
        ref list purging into ctf_str_write_strtab.  Initialize the new
        dict with the old dict's atoms table.  Accept the new strtab from
        ctf_str_write_strtab.  Adjust for addition of ctf_dynstrtab.
        * ctf-string.c (ctf_strraw_explicit): Improve comments.
        (ctf_str_create_atoms): Prepopulate from an existing atoms table,
        or alternatively pull in all strings from the strtab and turn
        them into atoms.
        (ctf_str_free_atoms): Free the dynstrtab and its strtab.
        (struct ctf_strtab_write_state): Remove.
        (ctf_str_count_strtab): Fold this...
        (ctf_str_populate_sorttab): ... and this...
        (ctf_str_write_strtab): ... into this.  Prepend existing strings
        to the strtab rather than resorting them (and wrecking their
        offsets).  Keep the dynstrtab updated.  Update refs for all
        atoms with refs, whether or not they are strings newly added
        to the strtab.
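The first change listed in the commit message pulls every string of a freshly opened dict's strtab into the atoms table, recording each string's offset so that later additions of the same string reuse it. The loop the patch adds to `ctf_str_create_atoms` steps from one NUL-terminated string to the next and skips empty ones. A minimal sketch of just that walk, assuming a hypothetical `str_entry_t` array in place of libctf's atoms hash:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an atom: a string plus its strtab offset.  */
typedef struct
{
  const char *str;
  size_t offset;
} str_entry_t;

/* Walk a strtab of consecutive NUL-terminated strings, recording each
   non-empty string and its offset in OUT (at most MAX entries).  Empty
   strings, including the null string at offset 0, are skipped.
   Returns the number of entries recorded.  */
static size_t
strtab_walk (const char *strs, size_t len, str_entry_t *out, size_t max)
{
  size_t i, n = 0;

  /* The walk relies on the last string in the table being terminated.  */
  assert (len == 0 || strs[len - 1] == '\0');

  for (i = 0; i < len; i += strlen (&strs[i]) + 1)
    {
      if (strs[i] == '\0')
        continue;               /* Empty string: nothing to record.  */
      if (n == max)
        break;
      out[n].str = &strs[i];
      out[n].offset = i;
      n++;
    }
  return n;
}
```

In the patch, each recorded string becomes an atom whose `csa_offset` is set to the walk position, which is what lets a later writeout keep the string at that offset.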

Diff:
---
 libctf/ctf-create.c    |   2 +-
 libctf/ctf-impl.h      |   9 +-
 libctf/ctf-open.c      |  20 ++-
 libctf/ctf-serialize.c |  22 +--
 libctf/ctf-string.c    | 389 +++++++++++++++++++++++++++++++++----------------
 5 files changed, 292 insertions(+), 150 deletions(-)

diff --git a/libctf/ctf-create.c b/libctf/ctf-create.c
index e0558d28233..78fb0305c20 100644
--- a/libctf/ctf-create.c
+++ b/libctf/ctf-create.c
@@ -133,7 +133,7 @@ ctf_create (int *errp)
   cts.cts_size = sizeof (hdr);
   cts.cts_entsize = 1;

-  if ((fp = ctf_bufopen_internal (&cts, NULL, NULL, NULL, errp)) == NULL)
+  if ((fp = ctf_bufopen_internal (&cts, NULL, NULL, NULL, NULL, errp)) == NULL)
     goto err;

   /* These hashes will have been initialized with a starting size of zero,
diff --git a/libctf/ctf-impl.h b/libctf/ctf-impl.h
index f4611316f50..3eef232bea0 100644
--- a/libctf/ctf-impl.h
+++ b/libctf/ctf-impl.h
@@ -396,6 +396,7 @@ struct ctf_dict
   ctf_dynhash_t *ctf_names; /* Hash table of remaining type names. */
   ctf_lookup_t ctf_lookups[5]; /* Pointers to nametabs for name lookup. */
   ctf_strs_t ctf_str[2]; /* Array of string table base and bounds. */
+  ctf_strs_writable_t *ctf_dynstrtab; /* Dynamically allocated string table, if any. */
   ctf_dynhash_t *ctf_str_atoms; /* Hash table of ctf_str_atoms_t. */
   ctf_dynhash_t *ctf_str_movable_refs; /* Hash table of void * -> ctf_str_atom_ref_t. */
   uint32_t ctf_str_prov_offset; /* Latest provisional offset assigned so far. */
@@ -734,7 +735,7 @@ extern const char *ctf_strraw (ctf_dict_t *, uint32_t);
 extern const char *ctf_strraw_explicit (ctf_dict_t *, uint32_t,
                                         ctf_strs_t *);
 extern const char *ctf_strptr_validate (ctf_dict_t *, uint32_t);
-extern int ctf_str_create_atoms (ctf_dict_t *);
+extern int ctf_str_create_atoms (ctf_dict_t *, ctf_dynhash_t *atoms);
 extern void ctf_str_free_atoms (ctf_dict_t *);
 extern uint32_t ctf_str_add (ctf_dict_t *, const char *);
 extern uint32_t ctf_str_add_ref (ctf_dict_t *, const char *, uint32_t *ref);
@@ -745,7 +746,7 @@ extern int ctf_str_add_external (ctf_dict_t *, const char *, uint32_t offset);
 extern void ctf_str_remove_ref (ctf_dict_t *, const char *, uint32_t *ref);
 extern void ctf_str_rollback (ctf_dict_t *, ctf_snapshot_id_t);
 extern void ctf_str_purge_refs (ctf_dict_t *);
-extern ctf_strs_writable_t ctf_str_write_strtab (ctf_dict_t *);
+extern const ctf_strs_writable_t *ctf_str_write_strtab (ctf_dict_t *);

 extern struct ctf_archive_internal *
 ctf_new_archive_internal (int is_archive, int unmap_on_close,
@@ -762,10 +763,10 @@ extern int ctf_flip (ctf_dict_t *, ctf_header_t *, unsigned char *, int);
 extern ctf_dict_t *ctf_simple_open_internal (const char *, size_t, const char *,
                                              size_t, size_t,
                                              const char *, size_t,
-                                             ctf_dynhash_t *, int *);
+                                             ctf_dynhash_t *, ctf_dynhash_t *, int *);
 extern ctf_dict_t *ctf_bufopen_internal (const ctf_sect_t *, const ctf_sect_t *,
                                          const ctf_sect_t *, ctf_dynhash_t *,
-                                         int *);
+                                         ctf_dynhash_t *, int *);
 extern int ctf_import_unref (ctf_dict_t *fp, ctf_dict_t *pfp);
 extern int ctf_serialize (ctf_dict_t *);

diff --git a/libctf/ctf-open.c b/libctf/ctf-open.c
index 22475465fa8..6d7a276f2cd 100644
--- a/libctf/ctf-open.c
+++ b/libctf/ctf-open.c
@@ -1290,7 +1290,7 @@ ctf_dict_t *ctf_simple_open (const char *ctfsect, size_t ctfsect_size,
 {
   return ctf_simple_open_internal (ctfsect, ctfsect_size, symsect, symsect_size,
                                    symsect_entsize, strsect, strsect_size, NULL,
-                                   errp);
+                                   NULL, errp);
 }

 /* Open a CTF file, mocking up a suitable ctf_sect and overriding the external
@@ -1300,7 +1300,8 @@ ctf_dict_t *ctf_simple_open_internal (const char *ctfsect, size_t ctfsect_size,
                                              const char *symsect, size_t symsect_size,
                                              size_t symsect_entsize,
                                              const char *strsect, size_t strsect_size,
-                                             ctf_dynhash_t *syn_strtab, int *errp)
+                                             ctf_dynhash_t *syn_strtab,
+                                             ctf_dynhash_t *atoms, int *errp)
 {
   ctf_sect_t skeleton;

@@ -1338,7 +1339,7 @@ ctf_dict_t *ctf_simple_open_internal (const char *ctfsect, size_t ctfsect_size,
     }

   return ctf_bufopen_internal (ctfsectp, symsectp, strsectp, syn_strtab,
-                               errp);
+                               atoms, errp);
 }

 /* Decode the specified CTF buffer and optional symbol table, and create a new
@@ -1350,7 +1351,7 @@ ctf_dict_t *
 ctf_bufopen (const ctf_sect_t *ctfsect, const ctf_sect_t *symsect,
              const ctf_sect_t *strsect, int *errp)
 {
-  return ctf_bufopen_internal (ctfsect, symsect, strsect, NULL, errp);
+  return ctf_bufopen_internal (ctfsect, symsect, strsect, NULL, NULL, errp);
 }

 /* Like ctf_bufopen, but overriding the external strtab with a synthetic one. */
@@ -1358,7 +1359,7 @@ ctf_bufopen (const ctf_sect_t *ctfsect, const ctf_sect_t *symsect,
 ctf_dict_t *
 ctf_bufopen_internal (const ctf_sect_t *ctfsect, const ctf_sect_t *symsect,
                       const ctf_sect_t *strsect, ctf_dynhash_t *syn_strtab,
-                      int *errp)
+                      ctf_dynhash_t *atoms, int *errp)
 {
   const ctf_preamble_t *pp;
   size_t hdrsz = sizeof (ctf_header_t);
@@ -1615,7 +1616,14 @@ ctf_bufopen_internal (const ctf_sect_t *ctfsect, const ctf_sect_t *symsect,
      ctf_set_base(). */

   ctf_set_version (fp, hp, hp->cth_version);
-  if (ctf_str_create_atoms (fp) < 0)
+
+  /* Temporary assignment, just enough to be able to initialize
+     the atoms table. */
+
+  fp->ctf_str[CTF_STRTAB_0].cts_strs = (const char *) fp->ctf_buf
+    + hp->cth_stroff;
+  fp->ctf_str[CTF_STRTAB_0].cts_len = hp->cth_strlen;
+  if (ctf_str_create_atoms (fp, atoms) < 0)
     {
       err = ENOMEM;
       goto bad;
diff --git a/libctf/ctf-serialize.c b/libctf/ctf-serialize.c
index 6355d4225eb..82e5b7d705b 100644
--- a/libctf/ctf-serialize.c
+++ b/libctf/ctf-serialize.c
@@ -955,7 +955,7 @@ ctf_serialize (ctf_dict_t *fp)
   ctf_header_t hdr, *hdrp;
   ctf_dvdef_t *dvd;
   ctf_varent_t *dvarents;
-  ctf_strs_writable_t strtab;
+  const ctf_strs_writable_t *strtab;
   int err;
   int sym_functions = 0;

@@ -1090,36 +1090,34 @@ ctf_serialize (ctf_dict_t *fp)
   assert (t == (unsigned char *) buf + sizeof (ctf_header_t) + hdr.cth_stroff);

   /* Construct the final string table and fill out all the string refs with the
-     final offsets. Then purge the refs list, because we're about to move this
-     strtab onto the end of the buf, invalidating all the offsets. */
+     final offsets. */
+
   strtab = ctf_str_write_strtab (fp);
-  ctf_str_purge_refs (fp);

-  if (strtab.cts_strs == NULL)
+  if (strtab == NULL)
     goto oom;

   /* Now the string table is constructed, we can sort the buffer of
      ctf_varent_t's. */
-  ctf_sort_var_arg_cb_t sort_var_arg = { fp, (ctf_strs_t *) &strtab };
+  ctf_sort_var_arg_cb_t sort_var_arg = { fp, (ctf_strs_t *) strtab };
   ctf_qsort_r (dvarents, nvars, sizeof (ctf_varent_t), ctf_sort_var,
                &sort_var_arg);

-  if ((newbuf = realloc (buf, buf_size + strtab.cts_len)) == NULL)
+  if ((newbuf = realloc (buf, buf_size + strtab->cts_len)) == NULL)
     goto oom;

   buf = newbuf;
-  memcpy (buf + buf_size, strtab.cts_strs, strtab.cts_len);
+  memcpy (buf + buf_size, strtab->cts_strs, strtab->cts_len);
   hdrp = (ctf_header_t *) buf;
-  hdrp->cth_strlen = strtab.cts_len;
+  hdrp->cth_strlen = strtab->cts_len;
   buf_size += hdrp->cth_strlen;
-  free (strtab.cts_strs);

   /* Finally, we are ready to ctf_simple_open() the new dict. If this is
      successful, we then switch nfp and fp and free the old dict. */

   if ((nfp = ctf_simple_open_internal ((char *) buf, buf_size, NULL, 0,
                                        0, NULL, 0, fp->ctf_syn_ext_strtab,
-                                       &err)) == NULL)
+                                       fp->ctf_str_atoms, &err)) == NULL)
     {
       free (buf);
       return (ctf_set_errno (fp, err));
@@ -1189,9 +1187,11 @@ ctf_serialize (ctf_dict_t *fp)
   ctf_str_free_atoms (nfp);
   nfp->ctf_str_atoms = fp->ctf_str_atoms;
   nfp->ctf_prov_strtab = fp->ctf_prov_strtab;
+  nfp->ctf_dynstrtab = fp->ctf_dynstrtab;
   nfp->ctf_str_movable_refs = fp->ctf_str_movable_refs;
   fp->ctf_str_atoms = NULL;
   fp->ctf_prov_strtab = NULL;
+  fp->ctf_dynstrtab = NULL;
   fp->ctf_str_movable_refs = NULL;
   memset (&fp->ctf_dtdefs, 0, sizeof (ctf_list_t));
   memset (&fp->ctf_errs_warnings, 0, sizeof (ctf_list_t));
diff --git a/libctf/ctf-string.c b/libctf/ctf-string.c
index dcb8bf0fee1..46b984bd0de 100644
--- a/libctf/ctf-string.c
+++ b/libctf/ctf-string.c
@@ -20,10 +20,14 @@
 #include
 #include
 #include
-#include

-/* Convert an encoded CTF string name into a pointer to a C string, using an
-   explicit internal strtab rather than the fp-based one. */
+static ctf_str_atom_t *
+ctf_str_add_ref_internal (ctf_dict_t *fp, const char *str,
+                          int flags, uint32_t *ref);
+
+/* Convert an encoded CTF string name into a pointer to a C string, possibly
+   using an explicit internal provisional strtab rather than the fp-based
+   one. */
 const char *
 ctf_strraw_explicit (ctf_dict_t *fp, uint32_t name, ctf_strs_t *strtab)
 {
@@ -32,18 +36,20 @@ ctf_strraw_explicit (ctf_dict_t *fp, uint32_t name, ctf_strs_t *strtab)
   if ((CTF_NAME_STID (name) == CTF_STRTAB_0) && (strtab != NULL))
     ctsp = strtab;

-  /* If this name is in the external strtab, and there is a synthetic strtab,
-     use it in preference. */
+  /* If this name is in the external strtab, and there is a synthetic
+     strtab, use it in preference. (This is used to add the set of strings
+     -- symbol names, etc -- the linker knows about before the strtab is
+     written out.) */

   if (CTF_NAME_STID (name) == CTF_STRTAB_1
       && fp->ctf_syn_ext_strtab != NULL)
     return ctf_dynhash_lookup (fp->ctf_syn_ext_strtab,
                                (void *) (uintptr_t) name);

-  /* If the name is in the internal strtab, and the offset is beyond the end of
-     the ctsp->cts_len but below the ctf_str_prov_offset, this is a provisional
-     string added by ctf_str_add*() but not yet built into a real strtab: get
-     the value out of the ctf_prov_strtab. */
+  /* If the name is in the internal strtab, and the name offset is beyond
+     the end of the ctsp->cts_len but below the ctf_str_prov_offset, this is
+     a provisional string added by ctf_str_add*() but not yet built into a
+     real strtab: get the value out of the ctf_prov_strtab. */

   if (CTF_NAME_STID (name) == CTF_STRTAB_0
       && name >= ctsp->cts_len && name < fp->ctf_str_prov_offset)
@@ -134,13 +140,25 @@ ctf_str_free_atom (void *a)
 }

 /* Create the atoms table. There is always at least one atom in it, the null
-   string. */
+   string: but also pull in atoms from the internal strtab. (We rely on
+   calls to ctf_str_add_external to populate external strtab entries, since
+   these are often not quite the same as what appears in any external
+   strtab, and the external strtab is often huge and best not aggressively
+   pulled in.)
+
+   Alternatively, if passed, populate atoms from the passed-in table, but do
+   not propagate their flags or refs: they are all non-freeable and
+   non-movable. (This is used when serializing a dict: this entire atoms
+   table will be thrown away shortly, so it is important that we not create
+   any new strings.) */
 int
-ctf_str_create_atoms (ctf_dict_t *fp)
+ctf_str_create_atoms (ctf_dict_t *fp, ctf_dynhash_t *atoms)
 {
+  size_t i;
+
   fp->ctf_str_atoms = ctf_dynhash_create (ctf_hash_string, ctf_hash_eq_string,
-                                          free, ctf_str_free_atom);
-  if (fp->ctf_str_atoms == NULL)
+                                          NULL, ctf_str_free_atom);
+  if (!fp->ctf_str_atoms)
     return -ENOMEM;

   if (!fp->ctf_prov_strtab)
@@ -161,6 +179,63 @@ ctf_str_create_atoms (ctf_dict_t *fp)
   if (errno == ENOMEM)
     goto oom_str_add;

+  /* Serializing. We have existing strings in an existing atoms table with
+     possibly-live pointers to them which must be used unchanged. Import
+     them into this atoms table. */
+
+  if (atoms)
+    {
+      ctf_next_t *it = NULL;
+      void *k, *v;
+      int err;
+
+      while ((err = ctf_dynhash_next (atoms, &it, &k, &v)) == 0)
+        {
+          ctf_str_atom_t *existing = v;
+          ctf_str_atom_t *atom;
+
+          if (existing->csa_str[0] == 0)
+            continue;
+
+          if ((atom = malloc (sizeof (struct ctf_str_atom))) == NULL)
+            goto oom_str_add;
+          memcpy (atom, existing, sizeof (struct ctf_str_atom));
+          memset (&atom->csa_refs, 0, sizeof(ctf_list_t));
+          atom->csa_flags = 0;
+
+          if (ctf_dynhash_insert (fp->ctf_str_atoms, atom->csa_str, atom) < 0)
+            {
+              free (atom);
+              goto oom_str_add;
+            }
+        }
+    }
+  else
+    {
+      /* Not serializing. Pull in all the strings in the strtab as new
+         atoms. The provisional strtab must be empty at this point, so
+         there is no need to populate atoms from it as well. Types in this
+         subset are frozen and readonly, so the refs list and movable refs
+         list need not be populated. */
+
+      for (i = 0; i < fp->ctf_str[CTF_STRTAB_0].cts_len;
+           i += strlen (&fp->ctf_str[CTF_STRTAB_0].cts_strs[i]) + 1)
+        {
+          ctf_str_atom_t *atom;
+
+          if (fp->ctf_str[CTF_STRTAB_0].cts_strs[i] == 0)
+            continue;
+
+          atom = ctf_str_add_ref_internal (fp, &fp->ctf_str[CTF_STRTAB_0].cts_strs[i],
+                                           0, 0);
+
+          if (!atom)
+            goto oom_str_add;
+
+          atom->csa_offset = i;
+        }
+    }
+
   return 0;

  oom_str_add:
@@ -182,6 +257,11 @@ ctf_str_free_atoms (ctf_dict_t *fp)
   ctf_dynhash_destroy (fp->ctf_prov_strtab);
   ctf_dynhash_destroy (fp->ctf_str_atoms);
   ctf_dynhash_destroy (fp->ctf_str_movable_refs);
+  if (fp->ctf_dynstrtab)
+    {
+      free (fp->ctf_dynstrtab->cts_strs);
+      free (fp->ctf_dynstrtab);
+    }
 }

 #define CTF_STR_ADD_REF 0x1
@@ -538,69 +618,6 @@ ctf_str_update_refs (ctf_str_atom_t *refs, uint32_t value)
     *(ref->caf_ref) = value;
 }

-/* State shared across the strtab write process. */
-typedef struct ctf_strtab_write_state
-{
-  /* Strtab we are writing, and the number of strings in it. */
-  ctf_strs_writable_t *strtab;
-  size_t strtab_count;
-
-  /* Pointers to (existing) atoms in the atoms table, for qsorting. */
-  ctf_str_atom_t **sorttab;
-
-  /* Loop counter for sorttab population. */
-  size_t i;
-
-  /* The null-string atom (skipped during population). */
-  ctf_str_atom_t *nullstr;
-} ctf_strtab_write_state_t;
-
-/* Count the number of entries in the strtab, and its length. */
-static void
-ctf_str_count_strtab (void *key _libctf_unused_, void *value,
-                      void *arg)
-{
-  ctf_str_atom_t *atom = (ctf_str_atom_t *) value;
-  ctf_strtab_write_state_t *s = (ctf_strtab_write_state_t *) arg;
-
-  /* We only factor in the length of items that have no offset and have refs:
-     other items are in the external strtab, or will simply not be written out
-     at all. They still contribute to the total count, though, because we still
-     have to sort them. We add in the null string's length explicitly, outside
-     this function, since it is explicitly written out even if it has no refs at
-     all. */
-
-  if (s->nullstr == atom)
-    {
-      s->strtab_count++;
-      return;
-    }
-
-  if (!ctf_list_empty_p (&atom->csa_refs))
-    {
-      if (!atom->csa_external_offset)
-        s->strtab->cts_len += strlen (atom->csa_str) + 1;
-      s->strtab_count++;
-    }
-}
-
-/* Populate the sorttab with pointers to the strtab atoms. */
-static void
-ctf_str_populate_sorttab (void *key _libctf_unused_, void *value,
-                          void *arg)
-{
-  ctf_str_atom_t *atom = (ctf_str_atom_t *) value;
-  ctf_strtab_write_state_t *s = (ctf_strtab_write_state_t *) arg;
-
-  /* Skip the null string. */
-  if (s->nullstr == atom)
-    return;
-
-  /* Skip atoms with no refs. */
-  if (!ctf_list_empty_p (&atom->csa_refs))
-    s->sorttab[s->i++] = atom;
-}
-
 /* Sort the strtab. */
 static int
 ctf_str_sort_strtab (const void *a, const void *b)
@@ -612,79 +629,182 @@ ctf_str_sort_strtab (const void *a, const void *b)
 }

 /* Write out and return a strtab containing all strings with recorded refs,
-   adjusting the refs to refer to the corresponding string. The returned strtab
-   may be NULL on error. Also populate the synthetic strtab with mappings from
-   external strtab offsets to names, so we can look them up with ctf_strptr().
-   Only external strtab offsets with references are added. */
-ctf_strs_writable_t
+   adjusting the refs to refer to the corresponding string. The returned
+   strtab is already assigned to strtab 0 in this dict, is owned by this
+   dict, and may be NULL on error. Also populate the synthetic strtab with
+   mappings from external strtab offsets to names, so we can look them up
+   with ctf_strptr(). Only external strtab offsets with references are
+   added.
+
+   As a side effect, replaces the strtab of the current dict with the newly-
+   generated strtab. This is an exception to the general rule that
+   serialization does not change the dict passed in, because the alternative
+   is to copy the entire atoms table on every reserialization just to avoid
+   modifying the original, which is excessively costly for minimal gain.
+
+   We use the lazy man's approach and double memory costs by always storing
+   atoms as individually allocated entities whenever they come from anywhere
+   but a freshly-opened, mmapped dict, even though after serialization there
+   is another copy in the strtab; this ensures that ctf_strptr()-returned
+   pointers to them remain valid for the lifetime of the dict.
+
+   This is all rendered more complex because if a dict is ctf_open()ed it
+   will have a bunch of strings in its strtab already, and their strtab
+   offsets can never change (without piles of complexity to rescan the
+   entire dict just to get all the offsets to all of them into the atoms
+   table). Entries below the existing strtab limit are just copied into the
+   new dict: entries above it are new, and are are sorted first, then
+   appended to it. The sorting is purely a compression-efficiency
+   improvement, and we get nearly as good an improvement from sorting big
+   chunks like this as we would from sorting the whole thing. */
+
+const ctf_strs_writable_t *
 ctf_str_write_strtab (ctf_dict_t *fp)
 {
-  ctf_strs_writable_t strtab;
-  ctf_str_atom_t *nullstr;
+  ctf_strs_writable_t *strtab;
+  size_t strtab_count = 0;
   uint32_t cur_stroff = 0;
-  ctf_strtab_write_state_t s;
   ctf_str_atom_t **sorttab;
+  ctf_next_t *it = NULL;
   size_t i;
+  void *v;
+  int err;
+  int new_strtab = 0;
   int any_external = 0;

-  memset (&strtab, 0, sizeof (struct ctf_strs_writable));
-  memset (&s, 0, sizeof (struct ctf_strtab_write_state));
-  s.strtab = &strtab;
+  strtab = calloc (1, sizeof (ctf_strs_writable_t));
+  if (!strtab)
+    return NULL;
+
+  /* The strtab contains the existing string table at its start: figure out
+     how many new strings we need to add. We only need to add new strings
+     that have no external offset, that have refs, and that are found in the
+     provisional strtab. If the existing strtab is empty we also need to
+     add the null string at its start. */
+
+  strtab->cts_len = fp->ctf_str[CTF_STRTAB_0].cts_len;

-  nullstr = ctf_dynhash_lookup (fp->ctf_str_atoms, "");
-  if (!nullstr)
+  if (strtab->cts_len == 0)
     {
-      ctf_err_warn (fp, 0, ECTF_INTERNAL, _("null string not found in strtab"));
-      strtab.cts_strs = NULL;
-      return strtab;
+      new_strtab = 1;
+      strtab->cts_len++; /* For the \0. */
     }

-  s.nullstr = nullstr;
-  ctf_dynhash_iter (fp->ctf_str_atoms, ctf_str_count_strtab, &s);
-  strtab.cts_len++; /* For the null string. */
+  /* Count new entries in the strtab: i.e. entries in the provisional
+     strtab. Ignore any entry for \0, entries which ended up in the
+     external strtab, and unreferenced entries. */

-  ctf_dprintf ("%lu bytes of strings in strtab.\n",
-               (unsigned long) strtab.cts_len);
+  while ((err = ctf_dynhash_next (fp->ctf_prov_strtab, &it, NULL, &v)) == 0)
+    {
+      const char *str = (const char *) v;
+      ctf_str_atom_t *atom;
+
+      atom = ctf_dynhash_lookup (fp->ctf_str_atoms, str);
+      if (!ctf_assert (fp, atom))
+        goto err_strtab;
+
+      if (atom->csa_str[0] == 0 || ctf_list_empty_p (&atom->csa_refs) ||
+          atom->csa_external_offset)
+        continue;
+
+      strtab->cts_len += strlen (atom->csa_str) + 1;
+      strtab_count++;
+    }
+  if (err != ECTF_NEXT_END)
+    {
+      ctf_dprintf ("ctf_str_write_strtab: error counting strtab entries: %s\n",
+                   ctf_errmsg (err));
+      goto err_strtab;
+    }

-  /* Sort the strtab. Force the null string to be first. */
-  sorttab = calloc (s.strtab_count, sizeof (ctf_str_atom_t *));
+  ctf_dprintf ("%lu bytes of strings in strtab: %lu pre-existing.\n",
+               (unsigned long) strtab->cts_len,
+               (unsigned long) fp->ctf_str[CTF_STRTAB_0].cts_len);
+
+  /* Sort the new part of the strtab. */
+
+  sorttab = calloc (strtab_count, sizeof (ctf_str_atom_t *));
   if (!sorttab)
-    goto oom;
+    {
+      ctf_set_errno (fp, ENOMEM);
+      goto err_strtab;
+    }

-  sorttab[0] = nullstr;
-  s.i = 1;
-  s.sorttab = sorttab;
-  ctf_dynhash_iter (fp->ctf_str_atoms, ctf_str_populate_sorttab, &s);
+  i = 0;
+  while ((err = ctf_dynhash_next (fp->ctf_prov_strtab, &it, NULL, &v)) == 0)
+    {
+      ctf_str_atom_t *atom;

-  qsort (&sorttab[1], s.strtab_count - 1, sizeof (ctf_str_atom_t *),
+      atom = ctf_dynhash_lookup (fp->ctf_str_atoms, v);
+      if (!ctf_assert (fp, atom))
+        goto err_sorttab;
+
+      if (atom->csa_str[0] == 0 || ctf_list_empty_p (&atom->csa_refs) ||
+          atom->csa_external_offset)
+        continue;
+
+      sorttab[i++] = atom;
+    }
+
+  qsort (sorttab, strtab_count, sizeof (ctf_str_atom_t *),
          ctf_str_sort_strtab);

-  if ((strtab.cts_strs = malloc (strtab.cts_len)) == NULL)
-    goto oom_sorttab;
+  if ((strtab->cts_strs = malloc (strtab->cts_len)) == NULL)
+    goto err_sorttab;
+
+  cur_stroff = fp->ctf_str[CTF_STRTAB_0].cts_len;

-  /* Update all refs: also update the strtab appropriately. */
-  for (i = 0; i < s.strtab_count; i++)
+  if (new_strtab)
     {
-      if (sorttab[i]->csa_external_offset)
-        {
-          /* External strtab entry. */
+      strtab->cts_strs[0] = 0;
+      cur_stroff++;
+    }
+  else
+    memcpy (strtab->cts_strs, fp->ctf_str[CTF_STRTAB_0].cts_strs,
+            fp->ctf_str[CTF_STRTAB_0].cts_len);
+
+  /* Work over the sorttab, add its strings to the strtab, and remember
+     where they are in the csa_offset for the appropriate atom. No ref
+     updating is done at this point, because refs might well relate to
+     already-existing strings, or external strings, which do not need adding
+     to the strtab and may not be in the sorttab. */
+
+  for (i = 0; i < strtab_count; i++)
+    {
+      sorttab[i]->csa_offset = cur_stroff;
+      strcpy (&strtab->cts_strs[cur_stroff], sorttab[i]->csa_str);
+      cur_stroff += strlen (sorttab[i]->csa_str) + 1;
+    }
+  free (sorttab);
+  sorttab = NULL;

+  /* Update all refs, then purge them as no longer necessary: also update
+     the strtab appropriately. */
+
+  while ((err = ctf_dynhash_next (fp->ctf_str_atoms, &it, NULL, &v)) == 0)
+    {
+      ctf_str_atom_t *atom = (ctf_str_atom_t *) v;
+      uint32_t offset;
+
+      if (ctf_list_empty_p (&atom->csa_refs))
+        continue;
+
+      if (atom->csa_external_offset)
+        {
           any_external = 1;
-          ctf_str_update_refs (sorttab[i], sorttab[i]->csa_external_offset);
-          sorttab[i]->csa_offset = sorttab[i]->csa_external_offset;
+          offset = atom->csa_external_offset;
         }
       else
-        {
-          /* Internal strtab entry with refs: actually add to the string
-             table. */
-
-          ctf_str_update_refs (sorttab[i], cur_stroff);
-          sorttab[i]->csa_offset = cur_stroff;
-          strcpy (&strtab.cts_strs[cur_stroff], sorttab[i]->csa_str);
-          cur_stroff += strlen (sorttab[i]->csa_str) + 1;
-        }
+        offset = atom->csa_offset;
+      ctf_str_update_refs (atom, offset);
     }
-  free (sorttab);
+  if (err != ECTF_NEXT_END)
+    {
+      ctf_dprintf ("ctf_str_write_strtab: error iterating over atoms while updating refs: %s\n",
+                   ctf_errmsg (err));
+      goto err_strtab;
+    }
+  ctf_str_purge_refs (fp);

   if (!any_external)
     {
@@ -692,16 +812,29 @@ ctf_str_write_strtab (ctf_dict_t *fp)
       fp->ctf_syn_ext_strtab = NULL;
     }

+  /* Replace the old strtab with the new one in this dict. */
+
+  if (fp->ctf_dynstrtab)
+    {
+      free (fp->ctf_dynstrtab->cts_strs);
+      free (fp->ctf_dynstrtab);
+    }
+
+  fp->ctf_dynstrtab = strtab;
+  fp->ctf_str[CTF_STRTAB_0].cts_strs = strtab->cts_strs;
+  fp->ctf_str[CTF_STRTAB_0].cts_len = strtab->cts_len;
+
   /* All the provisional strtab entries are now real strtab entries, and
      ctf_strptr() will find them there. The provisional offset now starts right
      beyond the new end of the strtab. */

   ctf_dynhash_empty (fp->ctf_prov_strtab);
-  fp->ctf_str_prov_offset = strtab.cts_len + 1;
+  fp->ctf_str_prov_offset = strtab->cts_len + 1;
   return strtab;

- oom_sorttab:
+ err_sorttab:
   free (sorttab);
- oom:
-  return strtab;
+ err_strtab:
+  free (strtab);
+  return NULL;
 }
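The last loop of the new `ctf_str_write_strtab` runs only after every new string has been placed, so it can patch refs to pre-existing, newly added and external strings uniformly: each ref gets the atom's external offset if it has one, otherwise its `csa_offset`, and the refs are then purged. A minimal sketch of that pass, assuming a hypothetical flattened `atom_t` with an array of ref pointers in place of libctf's atoms hash and `csa_refs` lists:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an atom: its internal strtab offset, an
   optional external-strtab offset, and the locations (refs) that must be
   patched with whichever offset applies.  */
typedef struct
{
  uint32_t offset;              /* Offset in the internal strtab.  */
  uint32_t external_offset;     /* Nonzero if the string is external.  */
  uint32_t **refs;              /* Locations to patch.  */
  size_t nrefs;
} atom_t;

/* Patch every ref of every atom with its final offset, preferring the
   external offset when it is set, then drop the refs (the "purge").
   Returns nonzero if any referenced atom was external.  */
static int
update_and_purge_refs (atom_t *atoms, size_t n)
{
  int any_external = 0;

  assert (atoms != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      uint32_t value;

      if (atoms[i].nrefs == 0)
        continue;               /* Unreferenced: nothing to patch.  */

      if (atoms[i].external_offset)
        {
          any_external = 1;
          value = atoms[i].external_offset;
        }
      else
        value = atoms[i].offset;

      for (size_t j = 0; j < atoms[i].nrefs; j++)
        *atoms[i].refs[j] = value;

      atoms[i].nrefs = 0;       /* Refs are now stale: purge them.  */
    }
  return any_external;
}
```

Moving the purge into the same function as the patching mirrors the commit's rationale: this is the point where the refs stop being meaningful, so this is where they are discarded.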