From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 128880 invoked by alias); 13 Apr 2017 04:33:08 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 126245 invoked by uid 89); 13 Apr 2017 04:31:25 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 13 Apr 2017 04:31:21 +0000 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 43F6885376 for ; Thu, 13 Apr 2017 04:31:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 43F6885376 Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=sergiodj@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 43F6885376 Received: from localhost (unused-10-15-17-193.yyz.redhat.com [10.15.17.193]) by smtp.corp.redhat.com (Postfix) with ESMTPS id EB47984D8F; Thu, 13 Apr 2017 04:31:21 +0000 (UTC) From: Sergio Durigan Junior To: Pedro Alves Cc: GDB Patches Subject: Re: [PATCH] fork-child.c: Avoid unnecessary heap-allocation / string copying (Re: [PATCH v5 3/5] C++-fy and prepare for sharing fork_inferior) References: <1482464361-4068-1-git-send-email-sergiodj@redhat.com> <20170330014915.4894-1-sergiodj@redhat.com> <20170330014915.4894-4-sergiodj@redhat.com> <877f2q3223.fsf@redhat.com> <87o9w2z068.fsf@redhat.com> <36f28afa-1174-df6a-cb54-fdcf129fa816@redhat.com> <878tn5w9d7.fsf@redhat.com> <01d7a597-ac60-9764-142b-bcaefd269271@redhat.com> Date: Thu, 13 Apr 2017 04:33:00 -0000 In-Reply-To: <01d7a597-ac60-9764-142b-bcaefd269271@redhat.com> (Pedro Alves's message of "Thu, 13 Apr 2017 04:42:38 +0100") Message-ID: <87y3v5szc6.fsf@redhat.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-IsSubscribed: yes X-SW-Source: 2017-04/txt/msg00390.txt.bz2 On Wednesday, April 12 2017, Pedro Alves wrote: > On 04/12/2017 11:26 PM, Sergio Durigan Junior wrote: >> On Wednesday, April 12 2017, Pedro Alves wrote: >> >>> On 04/12/2017 06:04 AM, Sergio Durigan Junior wrote: >>> >>>>>>> static void >>>>>>> -breakup_args (char *scratch, char **argv) >>>>>>> +breakup_args (const std::string &scratch, std::vector &argv) >>>>>>> { >>>>>> >>>>>> ... >>>>>> >>>>>>> + >>>>>>> + std::string arg = scratch.substr (cur_pos, next_sep - cur_pos); >>>>>>> + >>>>>> >>>>>> This creates a temporary string (heap allocates) ... >>>>>> >>>>>>> + argv.push_back (xstrdup (arg.c_str ())); >>>>>> >>>>>> ... and here you create yet another copy. >>>>>> >>>>>> You should be able to avoid it by using e.g., savestring: >>>>>> >>>>>> char *arg = savestring (scratch.c_str () + cur_pos, next_sep - cur_pos); >>>>>> argv.push_back (arg); >>>>> >>>>> Fair enough. I had my mind on "C++-only mode" when writing this code. >>> >>> Yup, in C++, it's good to keep unnecessary temporaries >>> and hidden heap allocations in mind. Actually, now that I look a bit >>> deeper, I think we can avoid a "premature pessimization" here, keeping >>> the same level of clarity. >>> >>> I think it'd be good to push this patch below. WDYT? >> >> Hm, I don't know. I mean, I understand where you're coming from and >> what you're trying to achieve, but somehow I thought the old code >> (especially the part that modified the argument string in place, using >> \0's as separators) was somewhat hacky. Even though the new code uses >> more memory, it it also more readable, at least IMNSHO. And also more >> const-correct, since we can make breakup_args_for_exec receive a const >> as the first argument. > > There's nothing wrong with writing to a string in place. A std::string > is just a _container_ of chars with convenience methods. It's > been designed to support embedded \0s. And it's not a problem for a > function to take a copy of a value if it needs to modify it! > const-correct refers to when functions that _don't need to > modify an argument nevertheless declare non-const reference/pointer. Sorry, maybe I should have made myself clearer. I did not mean to say that there was anything actually wrong with modifying a string in-place; I just meant that it seems somewhat hacky and more error-prone to me than separate the string into substrings. I totally understand that a std::string is just a container, and that it is OK to modify it if needed. But we're not talking about a generic string manipulation here; we're talking about inserting \0's on the string in order to split it. Anyway, I totally understand this is valid code. About the const-correctness comment: you're right, I should have said that the original function is better IMHO because it doesn't modify its arguments. > Effectively you can't avoid _some_ creating _some_ new string(s). > But instead of taking one single copy (O(1)), the current code is making > many (N) tiny copies. Trips to the generic allocator / malloc should > be one of the first things to avoid if easy to avoid, as a principle. Yep, I am aware that the code is more efficient using less strings, but I thought "OK, how many args will we be dealing with here, usually?", and I decided that it wasn't worth. > The patch I had posted is exactly equivalent to the one below, except > the state/storage in this one is moved to a class. Maybe you'd > find this clearer? (It'd need some cleaning up for comments > further at least, but it's getting quite late here...) I liked your patch. There are some things I'd like to comment, but overall it is good and I think it does a nice job of separating things on fork_inferior. As I said before, I won't really make a fuss about this approach if you think it is the right way to go; my comments were solely based on my personal taste. > From 689a881839f9d4aeb36695676856db2eaf8e97ab Mon Sep 17 00:00:00 2001 > From: Pedro Alves > Date: Thu, 13 Apr 2017 04:13:22 +0100 > Subject: [PATCH] fork-child.c: Avoid unnecessary heap-allocation / string > copying > > The previous patch converted the argv building from an > alloca-allocated array of non-owning arg pointers, to a std::vector of > owning pointers, which results in N string dups, with N being the > number of arguments in the vector, and then requires manually > releasing the pointers owned by the vector. > > This patch makes the vector hold non-owning pointers, and avoids the > string dups, by doing one single string copy of the arguments upfront, > and replacing separators with NULL terminators in place, like we used > to. > > With this, there's no need to remember to call free_vector_argv either. > > gdb/ChangeLog: > yyyy-mm-dd Pedro Alves > > * fork-child.c (breakup_args): Make 'scratch' non-const. > Replace separators with NULL terminators in place. Change > type of vector. (fork_inferior): The argument vector now > holds non-owning pointers. Don't strdup strings into the > vector. Remove free_vector_argv call. > --- > gdb/fork-child.c | 278 ++++++++++++++++++++++++++++++++++--------------------- > 1 file changed, 172 insertions(+), 106 deletions(-) > > diff --git a/gdb/fork-child.c b/gdb/fork-child.c > index 6b7386e..91986f7 100644 > --- a/gdb/fork-child.c > +++ b/gdb/fork-child.c > @@ -43,38 +43,96 @@ extern char **environ; > > static char *exec_wrapper; > > -/* Break up SCRATCH into an argument vector suitable for passing to > - execvp and store it in ARGV. E.g., on "run a b c d" this routine > +/* Build the argument vector for execv. */ > + > +class execv_argv_builder > +{ > +public: > + /* EXEC_FILE is the file to run. ALLARGS is a string containing the > + arguments to the program. If starting with a shell, SHELL_FILE > + is the shell to run. Otherwise, SHELL_FILE is NULL. */ > + execv_argv_builder (const char *exec_file, > + const std::string &allargs, > + const char *shell_file); > + > + /* Return a pointer to the built argv, in the type expected by > + execv. */ > + char **argv () > + { > + /* It is guaranteed that the exec functions do not modify the > + arguments, but they nevertheless expect "char **", so that's > + the type we return. */ > + return const_cast (&m_argv[0]); > + } > + > +private: > + void build_no_shell (const char *exec_file, > + const std::string &allargs); > + > + void build_shell (const char *exec_file, > + const std::string &allargs, > + const char *shell_file); Missing comments on the functions. > + > + /* The argument vector built. Holds non-owning pointers. */ > + std::vector m_argv; OOC, I notice you're prefixing a lot (if not all) of your class variables with "m_". I understand this is a code convention to signal when a variable is a member of a class, but are we following this officially? Because (again!) I'm not really fond of this :-). > + /* Storage. Arguments in C_ARGV point inside these. */ > + > + /* For the !shell case. */ > + std::string m_broke_up_args; If we're going this way, then I'd like to see this comment expanded to explain how the args are split, i.e., using \0 as the delimiter. > + > + /* For the shell case. */ > + std::string m_shell_command; > +}; > + > +/* Break up allargs into an argument vector suitable for passing to > + execvp and store it in M_ARGV. E.g., on "run a b c d" this routine > would get as input the string "a b c d", and as output it would > - fill in ARGV with the four arguments "a", "b", "c", "d". */ > + fill in M_ARGV with the four arguments "a", "b", "c", "d". */ > > -static void > -breakup_args (const std::string &scratch, std::vector &argv) > +void > +execv_argv_builder::build_no_shell (const char *exec_file, > + const std::string &allargs) > { > - for (size_t cur_pos = 0; cur_pos < scratch.size ();) > + /* We're going to call execvp. Create argument vector. */ > + > + /* Save/work with a copy. The pointers pushed to M_ARGV point > + directly into M_BROKE_UP_ARGS, which is modified in place with > + the necessary NULL terminators. */ > + m_broke_up_args = allargs; > + > + m_argv.push_back (exec_file); > + > + for (size_t cur_pos = 0; cur_pos < m_broke_up_args.size ();) > { > /* Skip whitespace-like chars. */ > - std::size_t pos = scratch.find_first_not_of (" \t\n", cur_pos); > + std::size_t pos = m_broke_up_args.find_first_not_of (" \t\n", cur_pos); > > if (pos != std::string::npos) > cur_pos = pos; > > /* Find the position of the next separator. */ > - std::size_t next_sep = scratch.find_first_of (" \t\n", cur_pos); > + std::size_t next_sep = m_broke_up_args.find_first_of (" \t\n", cur_pos); > > - /* No separator found, which means this is the last > - argument. */ > if (next_sep == std::string::npos) > - next_sep = scratch.size (); > + { > + /* No separator found, which means this is the last > + argument. */ > + next_sep = m_broke_up_args.size (); > + } > + else > + { > + /* Replace the separator with a terminator. */ > + m_broke_up_args[next_sep++] = '\0'; > + } > > - char *arg = savestring (scratch.c_str () + cur_pos, next_sep - cur_pos); > - argv.push_back (arg); > + m_argv.push_back (&m_broke_up_args[cur_pos]); > > cur_pos = next_sep; > } > > /* NULL-terminate the vector. */ > - argv.push_back (NULL); > + m_argv.push_back (NULL); > } > > /* When executing a command under the given shell, return non-zero if > @@ -101,6 +159,101 @@ escape_bang_in_quoted_argument (const char *shell_file) > return 0; > } Missing a comment here explaining the function below (or pointing to the right place). > +execv_argv_builder::execv_argv_builder (const char *exec_file, > + const std::string &allargs, > + const char *shell_file) > +{ > + if (shell_file == NULL) > + build_no_shell (exec_file, allargs); > + else > + build_shell (exec_file, allargs, shell_file); > +} And here. > + > +void > +execv_argv_builder::build_shell (const char *exec_file, > + const std::string &allargs, > + const char *shell_file) > +{ > + /* We're going to call a shell. */ > + const char *p; > + int need_to_quote; > + const int escape_bang = escape_bang_in_quoted_argument (shell_file); > + > + m_shell_command = "exec "; > + > + /* Add any exec wrapper. That may be a program name with arguments, > + so the user must handle quoting. */ > + if (exec_wrapper) > + { > + m_shell_command += exec_wrapper; > + m_shell_command += ' '; > + } > + > + /* Now add exec_file, quoting as necessary. */ > + > + /* Quoting in this style is said to work with all shells. But csh > + on IRIX 4.0.1 can't deal with it. So we only quote it if we need > + to. */ > + p = exec_file; > + while (1) > + { > + switch (*p) > + { > + case '\'': > + case '!': > + case '"': > + case '(': > + case ')': > + case '$': > + case '&': > + case ';': > + case '<': > + case '>': > + case ' ': > + case '\n': > + case '\t': > + need_to_quote = 1; > + goto end_scan; > + > + case '\0': > + need_to_quote = 0; > + goto end_scan; > + > + default: > + break; > + } > + ++p; > + } > + end_scan: > + if (need_to_quote) > + { > + m_shell_command += '\''; > + for (p = exec_file; *p != '\0'; ++p) > + { > + if (*p == '\'') > + m_shell_command += "'\\''"; > + else if (*p == '!' && escape_bang) > + m_shell_command += "\\!"; > + else > + m_shell_command += *p; > + } > + m_shell_command += '\''; > + } > + else > + m_shell_command += exec_file; > + > + m_shell_command += " " + allargs; > + > + /* If we decided above to start up with a shell, we exec the shell, > + "-c" says to interpret the next arg as a shell command to > + execute, and this command is "exec ". */ > + m_argv.reserve (4); > + m_argv.push_back (shell_file); > + m_argv.push_back ("-c"); > + m_argv.push_back (m_shell_command.c_str ()); > + m_argv.push_back (NULL); > +} > + > /* See inferior.h. */ > > void > @@ -155,7 +308,6 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs, > static const char *exec_file; > char **save_our_env; > int shell = 0; > - std::vector argv; I believe you can also remove the 'static' keyword from the 'shell_file' declaration. > const char *inferior_io_terminal = get_inferior_io_terminal (); > struct inferior *inf; > int i; > @@ -183,95 +335,9 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs, > shell = 1; > } > > - if (!shell) > - { > - /* We're going to call execvp. Create argument vector. */ > - argv.push_back (xstrdup (exec_file)); > - breakup_args (allargs, argv); > - } > - else > - { > - /* We're going to call a shell. */ > - std::string shell_command; > - const char *p; > - int need_to_quote; > - const int escape_bang = escape_bang_in_quoted_argument (shell_file); > - > - shell_command = std::string ("exec "); > - > - /* Add any exec wrapper. That may be a program name with arguments, so > - the user must handle quoting. */ > - if (exec_wrapper) > - { > - shell_command += exec_wrapper; > - shell_command += ' '; > - } > - > - /* Now add exec_file, quoting as necessary. */ > - > - /* Quoting in this style is said to work with all shells. But > - csh on IRIX 4.0.1 can't deal with it. So we only quote it if > - we need to. */ > - p = exec_file; > - while (1) > - { > - switch (*p) > - { > - case '\'': > - case '!': > - case '"': > - case '(': > - case ')': > - case '$': > - case '&': > - case ';': > - case '<': > - case '>': > - case ' ': > - case '\n': > - case '\t': > - need_to_quote = 1; > - goto end_scan; > - > - case '\0': > - need_to_quote = 0; > - goto end_scan; > - > - default: > - break; > - } > - ++p; > - } > - end_scan: > - if (need_to_quote) > - { > - shell_command += '\''; > - for (p = exec_file; *p != '\0'; ++p) > - { > - if (*p == '\'') > - shell_command += "'\\''"; > - else if (*p == '!' && escape_bang) > - shell_command += "\\!"; > - else > - shell_command += *p; > - } > - shell_command += '\''; > - } > - else > - shell_command += exec_file; > - > - shell_command += " " + allargs; > - > - /* If we decided above to start up with a shell, we exec the > - shell, "-c" says to interpret the next arg as a shell command > - to execute, and this command is "exec > - ". We xstrdup all the strings here because they will > - be free'd later in the code. */ > - argv.push_back (xstrdup (shell_file)); > - argv.push_back (xstrdup ("-c")); > - argv.push_back (xstrdup (shell_command.c_str ())); > - argv.push_back (NULL); > - } > + /* Build the argument vector. */ > + execv_argv_builder argv_builder (exec_file, allargs, > + shell ? shell_file : NULL); Good code cleanup! BTW, it is possible to simplify this part and remove the "shell" variable completely, using only "shell_file". > > /* Retain a copy of our environment variables, since the child will > replace the value of environ and if we're vforked, we have to > @@ -376,6 +442,8 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs, > path to find $SHELL. Rich Pixley says so, and I agree. */ > environ = env; > > + char **argv = argv_builder.argv (); > + > if (exec_fun != NULL) > (*exec_fun) (argv[0], &argv[0], env); > else > @@ -393,8 +461,6 @@ fork_inferior (const char *exec_file_arg, const std::string &allargs, > _exit (0177); > } > > - free_vector_argv (argv); > - > /* Restore our environment in case a vforked child clob'd it. */ > environ = save_our_env; > > -- > 2.5.5 Otherwise, LGTM. Thanks for doing this, -- Sergio GPG key ID: 237A 54B1 0287 28BF 00EF 31F4 D0EB 7628 65FC 5E36 Please send encrypted e-mail if possible http://sergiodj.net/