public inbox for gsl-discuss@sourceware.org
 help / color / mirror / Atom feed
From: Patrick Alken <patrick.alken@Colorado.EDU>
To: gsl-discuss@sourceware.org
Subject: Re: switched sparse format to binary trees
Date: Tue, 29 Apr 2014 01:13:00 -0000	[thread overview]
Message-ID: <535EFCA5.8050807@colorado.edu> (raw)
In-Reply-To: <535ED7A5.7020309@lanl.gov>

On 04/28/2014 04:35 PM, Gerard Jungman wrote:
> 
> On 04/27/2014 07:32 PM, Patrick Alken wrote:
> 
>> This actually makes a huge speed improvement in duplicate detection and
>> element retrieval for large sparse matrices. There may however be a
>> performance hit since a new tree node must be allocated each time an
>> element is added to the matrix. But I believe the tradeoff is worth it.
> 
> 
> For what it's worth, the right thing to do is probably
> to expose the avl allocation strategy to the GSL client
> in some way. It could be as simple as an extra argument
> to gsl_spmatrix_alloc(), which could be a pointer to a
> generic allocator strategy struct (similar or identical
> to 'struct libavl_allocator'). This is the most general
> solution, which requires that clients who care be able
> to write their own allocator.

Interesting idea...I'll think a bit more on it. Unfortunately I don't
think there is any easy way to preallocate a binary tree. Even if the
user knows ahead of time how many elements they want to add to the
sparse matrix, its not so easy to preallocate the whole tree, since
after each insertion, the tree rebalances itself to minimize its height.

I did do some tests on my own machine, inserting 2.5M elements into a
sparse matrix and profiling the code. I found that the avl allocation of
a new node was insignificant, compared to searching the tree each time
to check for duplicate elements.

I also found a study online where people benchmarked various sparse
matrix storage strategies. The other main contender is to use a hash
table, which is what matlab does for its sparse matrices. For that
approach you need to develop a strategy to deal with hash collisions,
which can happen frequently for huge matrices.

Anyway the study found that AVL trees were much superior to hash tables
in terms of speed. The study is here:

http://link.springer.com/chapter/10.1007%2F978-3-540-25944-2_135

Anyway, its a good idea to give the user more control over the
allocation, so I will think some more on how to do this elegantly.

Patrick

  reply	other threads:[~2014-04-29  1:13 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-28  1:32 Patrick Alken
2014-04-28 22:35 ` Gerard Jungman
2014-04-29  1:13   ` Patrick Alken [this message]
2014-04-29  8:40     ` Frank Reininghaus
2014-04-29 19:15       ` Gerard Jungman
2014-04-29 19:52         ` Patrick Alken
2014-04-29 22:07           ` Gerard Jungman
2014-05-01 18:46             ` Patrick Alken
2014-05-01 23:21               ` Gerard Jungman
2014-05-02  1:48                 ` GSL containers (switched sparse format to binary trees) Gerard Jungman
2014-05-02  1:56                   ` Gerard Jungman
2014-05-02  8:29                     ` Rhys Ulerich
2014-05-02  8:33                       ` Rhys Ulerich
2014-05-02  8:52                         ` Rhys Ulerich
2014-05-02 15:02                     ` Patrick Alken
2014-05-13 21:59                       ` Gerard Jungman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=535EFCA5.8050807@colorado.edu \
    --to=patrick.alken@colorado.edu \
    --cc=gsl-discuss@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).