public inbox for libabigail@sourceware.org
 help / color / mirror / Atom feed
* [Bug default/19427] Intern the strings used in Libabigail
  2016-01-01  0:00 [Bug default/19427] New: Intern the strings used in Libabigail dodji at redhat dot com
  2016-01-01  0:00 ` [Bug default/19427] " dodji at seketeli dot org
  2016-01-01  0:00 ` dodji at redhat dot com
@ 2016-01-01  0:00 ` dodji at redhat dot com
  2016-01-01  0:00 ` michi.henning at canonical dot com
  2016-01-01  0:00 ` dodji at redhat dot com
  4 siblings, 0 replies; 7+ messages in thread
From: dodji at redhat dot com @ 2016-01-01  0:00 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=19427

dodji at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #1 from dodji at redhat dot com ---
So I started to work on this and I do have a working branch (named
'str-intern') with the necessary changes undocumented (yet).  You can browse it
at
https://sourceware.org/git/gitweb.cgi?p=libabigail.git;a=shortlog;h=refs/heads/dodji/str-intern.

I am going to post time and memory consumption comparison using the code base
that is built with optimization (-O2).

So here is the resource usage of abidw --abidiff on r300_dri.so, for the master
branch:

real => 5:03.31
user => 300.24
sys => 2.86
max mem => 4959036KB

And the resource usage for the str-intern branch:

real => 4:56.98
user => 294.12
sys => 2.65
max mem => 4617328KB

So, as you can see, it slightly improves the speed of this test (by 6 seconds),
and significantly improves memory usage (saving more than 300 mega bytes of
memory).

The problem is that, of smaller tests and tests that don't involve emitting
abixml, things are a little bit slower, actually.  In other words, abidiff, for
instance, becomes slightly slower.  The memory consumption savings are still
there though.

That is, the cost of looking up strings in a hash table to ensure that each
string exists in only one copy in the environment (this is string interning)
makes the loading of abi corpora slower.  But then, comparing *strings* later
becomes faster as comparing two strings amounts to just comparing two pointers.
 But we need to compare a lot of strings to make up for the cost of interning
them in the first place.  And the place where we compare strings the most at
the moment is when we emit abixml (i.e, in abidw).

During decls comparisons it turns out we don't compare strings that much
because we compare their types first.  And thanks to type canonicalization,
comparing two types is very fast.  And as the majority of comparisons yield a
negative result, we don't even get to compare the names of the decls.

So I am still not sure if I am going to incorporate this optimization in the
end.  I *am* inclined to merge it, because it makes the library consume less
memory, and it speeds up abixml writing, especially for big libraries.  In
other words, it makes libabigail scale more.  But then it slows it slightly on
small workloads (which are quite fast anyway).

I'll give this a little bit more thought.

But in the mean time, if you have some thoughts, please share them :-)

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug default/19427] Intern the strings used in Libabigail
  2016-01-01  0:00 [Bug default/19427] New: Intern the strings used in Libabigail dodji at redhat dot com
                   ` (2 preceding siblings ...)
  2016-01-01  0:00 ` dodji at redhat dot com
@ 2016-01-01  0:00 ` michi.henning at canonical dot com
  2016-01-01  0:00   ` Dodji Seketeli
  2016-01-01  0:00 ` dodji at redhat dot com
  4 siblings, 1 reply; 7+ messages in thread
From: michi.henning at canonical dot com @ 2016-01-01  0:00 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=19427

Michi Henning <michi.henning at canonical dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michi.henning at canonical dot com

--- Comment #3 from Michi Henning <michi.henning at canonical dot com> ---
Not that this is any of my business :-)

The 2% speed difference is insignificant. No-one will care.

The memory savings are nice, but not *that* huge (around 7%). I'd be inclined
to make this kind of change only if the impact on code complexity and
maintainability is low.

I've worked with code bases that create their own string implementation in the
past and, usually, it creates more problems than it solves. Often, the problem
ends up being that the replacement cannot be easily exchanged with std::string
and vice-versa, and then ends up costing more than it saves. So, whenever I see
a string replacement class, I tend to look twice.

I don't know whether you can use boost in your code. If so, boost::flyweight
might be a useful approach.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug default/19427] New: Intern the strings used in Libabigail
@ 2016-01-01  0:00 dodji at redhat dot com
  2016-01-01  0:00 ` [Bug default/19427] " dodji at seketeli dot org
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: dodji at redhat dot com @ 2016-01-01  0:00 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=19427

            Bug ID: 19427
           Summary: Intern the strings used in Libabigail
           Product: libabigail
           Version: unspecified
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: default
          Assignee: dodji at redhat dot com
          Reporter: dodji at redhat dot com
                CC: libabigail at sourceware dot org
  Target Milestone: ---

From the performance work related to bug #19355, it appears that we could
significantly improve the performance of emitting the XML ABI representation of
a binary by interning the strings
(https://en.wikipedia.org/wiki/String_interning) used to represent the names
and qualified names of types throughout the internal representation of the
Library.

So the idea is to introduce a new abigail::intern_string type, make it
inter-operable with the std::string used elsewhere in the library, provide a
string intern pool that is accessible from the current instance of
abigail::environment, and use that pool to create the interned strings for a
given run of the library that uses a given environment.

Further profiling would tell us if other spots in the library would benefit
from using string interning.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug default/19427] Intern the strings used in Libabigail
  2016-01-01  0:00 [Bug default/19427] New: Intern the strings used in Libabigail dodji at redhat dot com
                   ` (3 preceding siblings ...)
  2016-01-01  0:00 ` michi.henning at canonical dot com
@ 2016-01-01  0:00 ` dodji at redhat dot com
  4 siblings, 0 replies; 7+ messages in thread
From: dodji at redhat dot com @ 2016-01-01  0:00 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=19427

dodji at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from dodji at redhat dot com ---
I have finally merged the patch that performs this optimization.

It's commit
https://sourceware.org/git/gitweb.cgi?p=libabigail.git;a=commit;h=cf8eba68c363c1d4159e8a9e4920b14d0d54bb9c
in the master branch.

This should be available in the 1.0.rc3 release.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug default/19427] Intern the strings used in Libabigail
  2016-01-01  0:00 [Bug default/19427] New: Intern the strings used in Libabigail dodji at redhat dot com
@ 2016-01-01  0:00 ` dodji at seketeli dot org
  2016-01-01  0:00 ` dodji at redhat dot com
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: dodji at seketeli dot org @ 2016-01-01  0:00 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=19427

--- Comment #4 from dodji at seketeli dot org ---
> Not that this is any of my business :-)

It is, otherwise, this would not be an open project ;-)

> The 2% speed difference is insignificant. No-one will care.
>
> The memory savings are nice, but not *that* huge (around 7%). I'd be inclined
> to make this kind of change only if the impact on code complexity and
> maintainability is low.

Actually, someone brought to my attention a test case where we have 30%
speed increase ;-)

This is on the nss package, from mozilla.  That is why I merged it in.
In other words, it makes the library scale a lot more on packages with
deep aggregate type hierarchies where each aggregate type has a lot of data
members, and where sub-types reference each other.  If you also have a
lot of incomplete types in the mix, then you end up doing a *lot* of
type name comparisons.  And this is where interned strings help.

> I've worked with code bases that create their own string implementation in the
> past and, usually, it creates more problems than it solves. Often, the problem
> ends up being that the replacement cannot be easily exchanged with std::string
> and vice-versa, and then ends up costing more than it saves. So, whenever I see
> a string replacement class, I tend to look twice.

Right, I totally agree.

This is why I tried very hard to make abigail::interned_string to be
just a pointer to std::string, in essence.  It's not a new string.  The
idea is just that comparing two instances or interned_string amounts to
comparing the values of the pointers to std::string, as there is only
one copy of each distinct std::string in the entire environment now.

> I don't know whether you can use boost in your code.

As this thing should work on systems that are not updated that much, for
instance, el6, we don't use boost, no.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [Bug default/19427] Intern the strings used in Libabigail
  2016-01-01  0:00 ` michi.henning at canonical dot com
@ 2016-01-01  0:00   ` Dodji Seketeli
  0 siblings, 0 replies; 7+ messages in thread
From: Dodji Seketeli @ 2016-01-01  0:00 UTC (permalink / raw)
  To: michi.henning at canonical dot com; +Cc: libabigail


> Not that this is any of my business :-)

It is, otherwise, this would not be an open project ;-)

> The 2% speed difference is insignificant. No-one will care.
>
> The memory savings are nice, but not *that* huge (around 7%). I'd be inclined
> to make this kind of change only if the impact on code complexity and
> maintainability is low.

Actually, someone brought to my attention a test case where we have 30%
speed increase ;-)

This is on the nss package, from mozilla.  That is why I merged it in.
In other words, it makes the library scale a lot more on packages with
deep aggregate type hierarchies where each aggregate type has a lot of data
members, and where sub-types reference each other.  If you also have a
lot of incomplete types in the mix, then you end up doing a *lot* of
type name comparisons.  And this is where interned strings help.

> I've worked with code bases that create their own string implementation in the
> past and, usually, it creates more problems than it solves. Often, the problem
> ends up being that the replacement cannot be easily exchanged with std::string
> and vice-versa, and then ends up costing more than it saves. So, whenever I see
> a string replacement class, I tend to look twice.

Right, I totally agree.

This is why I tried very hard to make abigail::interned_string to be
just a pointer to std::string, in essence.  It's not a new string.  The
idea is just that comparing two instances or interned_string amounts to
comparing the values of the pointers to std::string, as there is only
one copy of each distinct std::string in the entire environment now.

> I don't know whether you can use boost in your code.

As this thing should work on systems that are not updated that much, for
instance, el6, we don't use boost, no.

-- 
		Dodji

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug default/19427] Intern the strings used in Libabigail
  2016-01-01  0:00 [Bug default/19427] New: Intern the strings used in Libabigail dodji at redhat dot com
  2016-01-01  0:00 ` [Bug default/19427] " dodji at seketeli dot org
@ 2016-01-01  0:00 ` dodji at redhat dot com
  2016-01-01  0:00 ` dodji at redhat dot com
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: dodji at redhat dot com @ 2016-01-01  0:00 UTC (permalink / raw)
  To: libabigail

https://sourceware.org/bugzilla/show_bug.cgi?id=19427

dodji at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |19355


Referenced Bugs:

https://sourceware.org/bugzilla/show_bug.cgi?id=19355
[Bug 19355] [PERF] Libabigail slow on the r300_dri.so binary
-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-02-24 23:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-01  0:00 [Bug default/19427] New: Intern the strings used in Libabigail dodji at redhat dot com
2016-01-01  0:00 ` [Bug default/19427] " dodji at seketeli dot org
2016-01-01  0:00 ` dodji at redhat dot com
2016-01-01  0:00 ` dodji at redhat dot com
2016-01-01  0:00 ` michi.henning at canonical dot com
2016-01-01  0:00   ` Dodji Seketeli
2016-01-01  0:00 ` dodji at redhat dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).