public inbox for libabigail@sourceware.org
From: "dodji at redhat dot com" <sourceware-bugzilla@sourceware.org>
To: libabigail@sourceware.org
Subject: [Bug default/19427] Intern the strings used in Libabigail
Date: Fri, 01 Jan 2016 00:00:00 -0000
Message-ID: <bug-19427-9487-3tJNNE7fpn@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-19427-9487@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=19427

dodji at redhat dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #1 from dodji at redhat dot com ---
So I started to work on this and I have a working branch (named
'str-intern') with the necessary changes, though they are not documented yet.
You can browse it at
https://sourceware.org/git/gitweb.cgi?p=libabigail.git;a=shortlog;h=refs/heads/dodji/str-intern.

Below is a comparison of time and memory consumption, using a code base
that is built with optimization (-O2).

So here is the resource usage of abidw --abidiff on r300_dri.so, for the master
branch:

real => 5:03.31
user => 300.24
sys => 2.86
max mem => 4959036KB

And the resource usage for the str-intern branch:

real => 4:56.98
user => 294.12
sys => 2.65
max mem => 4617328KB

So, as you can see, it slightly improves the speed of this test (by about 6
seconds of wall-clock time), and significantly improves memory usage (saving
more than 300 megabytes of memory).
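
For anyone who wants to check the arithmetic, here is a small sketch that
recomputes the savings from the two runs above.  The struct and function names
are made up for this example; only the numbers come from the measurements.

```cpp
// Figures copied from the two runs above; names are hypothetical.
struct run_sketch
{
  double real_seconds; // wall-clock time, in seconds
  long max_mem_kb;     // peak memory, in KB
};

constexpr run_sketch master_run = {5 * 60 + 3.31, 4959036};
constexpr run_sketch str_intern_run = {4 * 60 + 56.98, 4617328};

// Wall-clock seconds saved going from BEFORE to AFTER.
constexpr double
seconds_saved(const run_sketch& before, const run_sketch& after)
{return before.real_seconds - after.real_seconds;}

// Peak memory saved, in KB, going from BEFORE to AFTER.
constexpr long
kilobytes_saved(const run_sketch& before, const run_sketch& after)
{return before.max_mem_kb - after.max_mem_kb;}
```

With these inputs, kilobytes_saved(master_run, str_intern_run) is 341708 KB,
which is where the "more than 300 megabytes" figure comes from.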

The problem is that, on smaller tests and on tests that don't involve emitting
abixml, things are actually a little bit slower.  In other words, abidiff, for
instance, becomes slightly slower.  The memory consumption savings are still
there though.

That is, the cost of looking up strings in a hash table to ensure that each
string exists in only one copy in the environment (this is string interning)
makes the loading of ABI corpora slower.  But then, comparing *strings* later
becomes faster, as comparing two strings amounts to just comparing two pointers.
But we need to compare a lot of strings to make up for the cost of interning
them in the first place.  And the place where we compare strings the most at
the moment is when we emit abixml (i.e., in abidw).
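
To illustrate the trade-off, here is a minimal sketch of string interning.  It
is not the code from the str-intern branch; the names interned_string_pool and
same_string are hypothetical, and a std::unordered_set is assumed as the hash
table.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical sketch, not libabigail's actual API.
class interned_string_pool
{
  // Each distinct string is stored exactly once.  Pointers to elements
  // of an unordered_set stay valid when the table is rehashed.
  std::unordered_set<std::string> strings_;

public:
  // Return a pointer to the unique copy of S held by the pool,
  // inserting S first if it is not there yet.  This hash lookup is the
  // up-front cost paid for every string when a corpus is loaded.
  const std::string*
  intern(const std::string& s)
  {return &*strings_.insert(s).first;}

  // Number of distinct strings held by the pool.
  std::size_t
  size() const
  {return strings_.size();}
};

// Two strings interned in the same pool have the same content if and
// only if their pointers are equal, so no character comparison is needed.
inline bool
same_string(const std::string* a, const std::string* b)
{return a == b;}
```

In this sketch, interning "foo" twice yields the same pointer and the pool
holds a single copy, which is where the memory saving comes from.  The lookup
in intern() only pays off if same_string() is then called often enough.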

During decl comparisons it turns out we don't compare strings that much,
because we compare the types of the decls first.  And thanks to type
canonicalization, comparing two types is very fast.  And as the majority of
comparisons yield a negative result, we don't even get to compare the names of
the decls.
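
Here is a rough sketch of why the name comparison is rarely reached.  The types
type_sketch and decl_sketch and the function decls_equal are hypothetical
stand-ins, not libabigail's classes; the sketch assumes each type carries a
pointer to its canonical type, and it counts name comparisons only to make the
effect visible.

```cpp
#include <string>

// Hypothetical stand-ins, not libabigail's actual classes.
struct type_sketch
{
  const type_sketch* canonical; // equal types share one canonical type
};

struct decl_sketch
{
  const type_sketch* type;
  std::string name;
};

// Compare two decls: types first (a pointer comparison thanks to
// canonicalization), names only if the types match.
// NAME_COMPARISONS is incremented each time the names are compared.
inline bool
decls_equal(const decl_sketch& l, const decl_sketch& r,
            unsigned& name_comparisons)
{
  if (l.type->canonical != r.type->canonical)
    return false; // most comparisons stop here
  ++name_comparisons;
  return l.name == r.name;
}
```

When the types differ, the function returns before touching the names, so
making the name comparison cheaper through interning does not help that path.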

So I am still not sure whether I am going to incorporate this optimization in
the end.  I *am* inclined to merge it, because it makes the library consume
less memory, and it speeds up abixml writing, especially for big libraries.  In
other words, it makes libabigail scale better.  But then it slows things down
slightly on small workloads (which are quite fast anyway).

I'll give this a little bit more thought.

But in the meantime, if you have some thoughts, please share them :-)

-- 
You are receiving this mail because:
You are on the CC list for the bug.
