Re: [RFA] C++-ify parse_format_string

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

From: Simon Marchi <simark@simark.ca>
To: Pedro Alves <palves@redhat.com>
Cc: Simon Marchi <simon.marchi@ericsson.com>,
	Tom Tromey <tom@tromey.com>,
	gdb-patches@sourceware.org
Subject: Re: [RFA] C++-ify parse_format_string
Date: Fri, 24 Nov 2017 16:26:00 -0000	[thread overview]
Message-ID: <ef022d403759b0acfeede05540dc39ac@simark.ca> (raw)
In-Reply-To: <b2308402-c29b-5e7e-c721-660c83b0d591@redhat.com>

On 2017-11-24 07:54, Pedro Alves wrote:
> On 11/24/2017 03:17 AM, Simon Marchi wrote:
>> On 2017-11-23 05:40 PM, Pedro Alves wrote:
>>> On 11/23/2017 09:13 PM, Simon Marchi wrote:
>>> 
>>>> I have a patch in some branch that does essentially the same thing, 
>>>> so
>>>> I was able to compare our approaches.  In my version, I removed the
>>>> big allocation that is shared among pieces, and made each piece have
>>>> its own std::string.  Unless we want to keep the current allocation
>>>> scheme for performance/memory usage reasons, I think that using
>>>> std::strings simplifies things in the parse_format_string function.
>>>> The format_pieces structure is replaced with an std::vector of
>>>> format_piece.
>>> 
>>> Sounds like a step backwards to me.  If it simplifies things, then
>>> it sounds like it might be because we're missing some utility,
>>> like string_view or something like that.
>> 
>> I am not sure I understand, can you expand a little bit about what you
>> have in mind?
> 
> I call it a step backwards because simplification is done at the
> cost of efficiency (individual heap allocations, changing the
> ownership model), when the original code avoided that.  It's well known 
> that
> standard C++ is missing basic string manipulation utilities, but that's
> no reason to force ourselves to consider std::string the ultimate tool, 
> IMO.
> There's C++ the library, and then there's the C++ the language.  We 
> can't
> do anything about the latter, but we can extend the former.  Later 
> revisions
> of the standard are adding more facilities, like std::string_view in 
> C++17, and
> now things like std::array_view / std::span (from gsl::span) have a 
> very
> good chance of making their way into C++20.  Many C++ projects end up
> reinventing something like those types, hence the attempts at
> standardization, making those types lingua franca for non-owning use 
> cases.

Thanks for the detailed reply!

In my original reply, I said "Unless we want to keep the current 
allocation scheme for performance/memory usage reasons", removing the 
common allocation was done consciously.  My thought was that the single 
allocation scheme had not been chosen for performance reason, but for 
simplicity.  Computing how much to malloc for every piece would have 
been very annoying.  Making each piece own its string made the code 
easier to reason about, but if the performance hit is not acceptable, I 
understand.

>> What I meant is that is changes things like this:
>> 
>> 	    strncpy (current_substring, percent_loc, length_before_ll);
>> 	    strcpy (current_substring + length_before_ll, "I64");
>> 	    current_substring[length_before_ll + 3] =
>> 	      percent_loc[length_before_ll + lcount];
>> 	    current_substring += length_before_ll + 4;
>> 
>> into this
>> 
>> 	    piece_string.assign (percent_loc, length_before_ll);
>> 	    piece_string += "I64";
>> 	    piece_string += percent_loc[length_before_ll + lcount];
>> 
>> Less magical number, less playing with offsets, etc.
> 
> I meant that you don't need for each piece_string to own its
> own deep copy of the string to get rid of the magical numbers.
> You can get that even with a couple C-like free-functions, like:
> 
>  // memcpy, null-terminate, and return advanced pointer.
>  char *append (char *p, const char *str);
>  char *append (char *p, const char *str, size_t len);
> 
> and then:
> 
>  char *p = current_substring;
> 
>  p = append (p, percent_loc, length_before_ll);
>  p = append (p, "I64");
>  p = append (p, percent_loc[length_before_ll + lcount]);
> 
>  current_substring = p;
> 
> You could wrap that in a class, like:
> 
>  /* Helper to build an null-terminated C string.  */
>  struct zstring_builder
>  {
>     /* POS is a pointer to the position in buffer to work on.  */
>     explicit zstring_builder (char *pos)
>       : m_pos (pos)
>     {}
> 
>     // memcpy, advance m_pos, null-terminate.
>     zstring_builder &append (const char *s, size_type count);
>     zstring_builder &append (const char* s);
> 
>     char *pos () { return m_pos; }
> 
>  private:
>     char *m_pos;
>  };
> 
> Or:
> 
>  /* Helper to build a null-terminated C string on top
>     of a vector.  */
>  struct zstring_builder
>  {
>     /* POS is a pointer to the position in buffer to work on.  */
>     explicit zstring_builder (std::vector<char> &buf, size_t pos);
> 
>     // memcpy, advance m_pos, null-terminate.
>     zstring_builder &append (const char *s, size_type count);
>     zstring_builder &append (const char* s);
> 
>     char *pos () { return &m_vector[m_pos]; }
> 
>  private:
>     std::vector &m_buf;
>     size_t m_pos;
>  };

I agree that something like this is good if we want to keep the current 
allocation scheme but make the code clearer.  Since "append" continues 
the same string piece, I think we would need some kind of "finish" 
method (a bit like obstack) to be able to start a new string:

   zstring_builder builder (buf);
   builder.append ("Hello ");
   builder.append ("world!");
   const char *english = builder.finish ();
   builder.append ("Bonjour ");
   builder.append ("monde!");
   const char *french = builder.finish ();

giving something like this:

   Hello world!\0Bonjour monde!\0

> or something else along those lines, if you now make piece_string
> a zstring_builder, you can get similar, and just as readable code:
> 
>             zstring_builder piece_string (current_substring);
> 
>             piece_string.append (percent_loc, length_before_ll);
>  	    piece_string.append ("I64");
>  	    piece_string.append (percent_loc[length_before_ll + lcount]);
> 
>             current_substring = piece_string.pos ();
> 
> Alternatively, we could have something like a zstring_ref
> class [1], as a non-owning mutable view of a C string (that never
> reallocates), guaranteed to be null-terminated, and then the code
> you shown would be just exactly the same.  Open choices could be 
> whether
> to save the string size inside zstring_ref (like string_view), or
> not saving a size, making it a really-thin wrapper around a
> pointer instead.
> 
> std::string is great as an _owning_ SSO-enabled (which is sometimes a
> curse when embedded in structures...) container of characters.  But
> when we don't want an owning container, the right tool / refactor can
> still make the code just as readable.
> 
> [1] - I've thought several times because that something like a 
> zstring_view
> (like string_view, but guarantees null-termination) would be handy in
> a lot of places.  string_view isn't mutable, though, hence the
> alternative name instead.  could be mut_string_view too, for example.

It's all performance/complexity tradeoffs, as usual :)

Simon

next prev parent reply	other threads:[~2017-11-24 16:26 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-23 16:46 Tom Tromey
2017-11-23 21:14 ` Simon Marchi
2017-11-23 22:40   ` Pedro Alves
2017-11-24  3:17     ` Simon Marchi
2017-11-24 12:54       ` Pedro Alves
2017-11-24 16:26         ` Simon Marchi [this message]
2017-11-25 21:25         ` Tom Tromey
2017-12-02 20:31           ` [PATCH] " Simon Marchi
2017-12-03 14:12             ` Pedro Alves
2017-12-03 17:50               ` Simon Marchi
2017-12-08 16:22                 ` Tom Tromey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ef022d403759b0acfeede05540dc39ac@simark.ca \
    --to=simark@simark.ca \
    --cc=gdb-patches@sourceware.org \
    --cc=palves@redhat.com \
    --cc=simon.marchi@ericsson.com \
    --cc=tom@tromey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).