From: Rong Xu <xur@google.com>
To: Xinliang David Li <davidxl@google.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [google][gcc-4_9] encode and compress cc1 option strings in gcov_module_info
Date: Tue, 06 Oct 2015 18:30:00 -0000 [thread overview]
Message-ID: <CAF1bQ=TPcRvEpR-dad6Z4+VFkY1-m4yF79UmZtMHBsvTn+Sk2Q@mail.gmail.com> (raw)
In-Reply-To: <CAAkRFZ+m3YGOX=q7AZNtntN1NEX4NpnnC3er=Ds5myqSV1EYpA@mail.gmail.com>
It's 1:3 to 1:4 in the programs I tested. But it really depends on how
the options are used. I think your idea of using combined strlen works
better.
I just make the code a little clumsy but it does not cause any
performance issue.
On Tue, Oct 6, 2015 at 10:21 AM, Xinliang David Li <davidxl@google.com> wrote:
> On Tue, Oct 6, 2015 at 9:26 AM, Rong Xu <xur@google.com> wrote:
>> On Mon, Oct 5, 2015 at 5:33 PM, Xinliang David Li <davidxl@google.com> wrote:
>>> unsigned ggc_memory = gcov_read_unsigned ();
>>> + unsigned marker = 0, len = 0, k;
>>> + char **string_array, *saved_cc1_strings;
>>> +
>>> for (unsigned j = 0; j < 7; j++)
>>>
>>>
>>> Do not use hard coded number. Use the enum defined in coverage.c.
>>
>> OK.
>>
>>>
>>>
>>> + string_array[j] = xstrdup (gcov_read_string ());
>>> +
>>> + k = 0;
>>> + for (unsigned j = 1; j < 7; j++)
>>>
>>> Do not use hard coded number.
>>
>> OK.
>>
>>>
>>>
>>> + {
>>> + if (num_array[j] == 0)
>>> + continue;
>>> + marker += num_array[j];
>>>
>>> It is better to read if the name of variable 'marker' is changed to
>>> 'j_end' or something similar
>>>
>>> For all the substrings of 'j' kind, there should be just one marker,
>>> right? It looks like here you introduce one marker per string, not one
>>> marker per string kind.
>>
>> I don't understand what you meant here. "marker" is fixed for each j
>> substring (one option kind) -- it the end index of the sub-string
>> array. k-loop is for each string.
>>
>
> That was a wrong comment from me. Discard it.
>
>>>
>>> + len += 3; /* [[<FLAG> */
>>>
>>> Same here for hard coded value.
>>>
>>> + for (; k < marker; k++)
>>> + len += strlen (string_array[k]) + 1; /* 1 for delimter of ']' */
>>>
>>> Why do we need one ']' per string?
>>
>> This is because the options strings can contain space ' '. I cannot
>> use space as the delimiter, neither is \0 as it is the end of the
>> string of the encoded string.
>
> Ok -- this allows you to avoid string copy during parsing.
>>
>>>
>>>
>>> + }
>>> + saved_cc1_strings = (char *) xmalloc (len + 1);
>>> + saved_cc1_strings[0] = 0;
>>> +
>>> + marker = 0;
>>> + k = 0;
>>> + for (unsigned j = 1; j < 7; j++)
>>>
>>> Same here for 7.
>>
>> will fix in the new patch.
>>
>>>
>>> + {
>>> + static const char lipo_string_flags[6] = {'Q', 'B', 'S',
>>> 'D','I', 'C'};
>>> + if (num_array[j] == 0)
>>> + continue;
>>> + marker += num_array[j];
>>>
>>> Suggest changing marker to j_end
>> OK.
>>
>>>
>>> + sprintf (saved_cc1_strings, "%s[[%c", saved_cc1_strings,
>>> + lipo_string_flags[j - 1]);
>>> + for (; k < marker; k++)
>>> + {
>>> + sprintf (saved_cc1_strings, "%s%s]", saved_cc1_strings,
>>> + string_array[k]);
>>>
>>> +#define DELIMTER "[["
>>>
>>> Why double '[' ?
>> I will change to single '['.
>>
>>>
>>> +#define DELIMTER2 "]"
>>> +#define QUOTE_PATH_FLAG 'Q'
>>> +#define BRACKET_PATH_FLAG 'B'
>>> +#define SYSTEM_PATH_FLAG 'S'
>>> +#define D_U_OPTION_FLAG 'D'
>>> +#define INCLUDE_OPTION_FLAG 'I'
>>> +#define COMMAND_ARG_FLAG 'C'
>>> +
>>> +enum lipo_cc1_string_kind {
>>> + k_quote_paths = 0,
>>> + k_bracket_paths,
>>> + k_system_paths,
>>> + k_cpp_defines,
>>> + k_cpp_includes,
>>> + k_lipo_cl_args,
>>> + num_lipo_cc1_string_kind
>>> +};
>>> +
>>> +struct lipo_parsed_cc1_string {
>>> + const char* source_filename;
>>> + unsigned num[num_lipo_cc1_string_kind];
>>> + char **strings[num_lipo_cc1_string_kind];
>>> +};
>>> +
>>> +struct lipo_parsed_cc1_string *
>>> +lipo_parse_saved_cc1_string (const char *src, char *str,
>>> + bool parse_cl_args_only);
>>> +void free_parsed_string (struct lipo_parsed_cc1_string *string);
>>> +
>>>
>>> Declare above in a header file.
>>
>> OK.
>>
>>>
>>>
>>> /* Returns true if the command-line arguments stored in the given module-infos
>>> are incompatible. */
>>> bool
>>> -incompatible_cl_args (struct gcov_module_info* mod_info1,
>>> - struct gcov_module_info* mod_info2)
>>> +incompatible_cl_args (struct lipo_parsed_cc1_string* mod_info1,
>>> + struct lipo_parsed_cc1_string* mod_info2)
>>>
>>> Fix formating.
>> OK.
>>>
>>> {
>>> {
>>> @@ -1647,7 +1679,7 @@ build_var (tree fn_decl, tree type, int counter)
>>> /* Creates the gcov_fn_info RECORD_TYPE. */
>>>
>>> NULL_TREE, get_gcov_unsigned_t ());
>>> DECL_CHAIN (field) = fields;
>>> fields = field;
>>>
>>> - /* Num bracket paths */
>>> + /* cc1_uncompressed_strlen field */
>>> field = build_decl (BUILTINS_LOCATION, FIELD_DECL,
>>> NULL_TREE, get_gcov_unsigned_t ());
>>> DECL_CHAIN (field) = fields;
>>> fields = field;
>>>
>>>
>>> Why do we need to store uncompressed string length? If there is need
>>> to do that, I suggest combine uncompressed length and compressed
>>> length into one 32bit integer. (16/16, or 17/15 split)
>>
>> In theory, I don't need the uncompressed length, But I would need to
>> guess the uncompressed length to allocate the buffer. If the
>> decompressing fails with insufficient space, I need to have another
>> guess and do it again. I think it's easier just save the uncompressed
>> size to the struct. I think combine the two sizes into one
>> gcov_unsigned_t is a good idea. We don't expect the string size are
>> very big.
>
> What is the usual compression ratio? If you guess it right most of the
> time, just get rid of the uncompressed length.
>
> David
>
>>>
>>>
>>> David
>>>
>>> On Mon, Oct 5, 2015 at 3:51 PM, Rong Xu <xur@google.com> wrote:
>>>> Hi,
>>>>
>>>> This patch is for google branch only.
>>>>
>>>> It encodes and compresses various cc1 option strings in
>>>> gcov_module_info to reduce the lipo instrumented object size. The
>>>> savings are from string compression and the reduced number of
>>>> relocations.
>>>>
>>>> More specifically, we replace the following fields in gcov_module_info
>>>> gcov_unsigned_t num_quote_paths;
>>>> gcov_unsigned_t num_bracket_paths;
>>>> gcov_unsigned_t num_system_paths;
>>>> gcov_unsigned_t num_cpp_defines;
>>>> gcov_unsigned_t num_cpp_includes;
>>>> gcov_unsigned_t num_cl_args;
>>>> char *string_array[1];
>>>> with
>>>> gcov_unsigned_t cc1_strlen;
>>>> gcov_unsigned_t cc1_uncompressed_strlen;
>>>> char *saved_cc1_strings;
>>>>
>>>> The new saved_cc1_strings are zlib compressed string.
>>>>
>>>> Tested with google internal benchmarks.
>>>>
>>>> Thanks,
>>>>
>>>> -Rong
next prev parent reply other threads:[~2015-10-06 18:30 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-05 22:51 Rong Xu
2015-10-06 0:33 ` Xinliang David Li
2015-10-06 16:26 ` Rong Xu
2015-10-06 17:21 ` Xinliang David Li
2015-10-06 18:30 ` Rong Xu [this message]
2015-10-06 18:40 ` Rong Xu
2015-10-06 23:37 ` Xinliang David Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAF1bQ=TPcRvEpR-dad6Z4+VFkY1-m4yF79UmZtMHBsvTn+Sk2Q@mail.gmail.com' \
--to=xur@google.com \
--cc=davidxl@google.com \
--cc=gcc-patches@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).