From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 18425 invoked by alias); 29 Apr 2014 01:13:14 -0000 Mailing-List: contact gsl-discuss-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gsl-discuss-owner@sourceware.org Received: (qmail 18410 invoked by uid 89); 29 Apr 2014 01:13:12 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 X-HELO: ipmx5.colorado.edu Received: from ipmx5.colorado.edu (HELO ipmx5.colorado.edu) (128.138.128.235) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 29 Apr 2014 01:13:11 +0000 From: Patrick Alken Received: from 174-29-218-253.hlrn.qwest.net (HELO [192.168.0.226]) ([174.29.218.253]) by smtp.colorado.edu with ESMTP/TLS/DHE-RSA-AES128-SHA; 28 Apr 2014 19:13:09 -0600 Message-ID: <535EFCA5.8050807@colorado.edu> Date: Tue, 29 Apr 2014 01:13:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: gsl-discuss@sourceware.org Subject: Re: switched sparse format to binary trees References: <535DAFC0.8090402@colorado.edu> <535ED7A5.7020309@lanl.gov> In-Reply-To: <535ED7A5.7020309@lanl.gov> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-SW-Source: 2014-q2/txt/msg00015.txt.bz2 On 04/28/2014 04:35 PM, Gerard Jungman wrote: > > On 04/27/2014 07:32 PM, Patrick Alken wrote: > >> This actually makes a huge speed improvement in duplicate detection and >> element retrieval for large sparse matrices. There may however be a >> performance hit since a new tree node must be allocated each time an >> element is added to the matrix. But I believe the tradeoff is worth it. > > > For what it's worth, the right thing to do is probably > to expose the avl allocation strategy to the GSL client > in some way. It could be as simple as an extra argument > to gsl_spmatrix_alloc(), which could be a pointer to a > generic allocator strategy struct (similar or identical > to 'struct libavl_allocator'). This is the most general > solution, which requires that clients who care be able > to write their own allocator. Interesting idea...I'll think a bit more on it. Unfortunately I don't think there is any easy way to preallocate a binary tree. Even if the user knows ahead of time how many elements they want to add to the sparse matrix, its not so easy to preallocate the whole tree, since after each insertion, the tree rebalances itself to minimize its height. I did do some tests on my own machine, inserting 2.5M elements into a sparse matrix and profiling the code. I found that the avl allocation of a new node was insignificant, compared to searching the tree each time to check for duplicate elements. I also found a study online where people benchmarked various sparse matrix storage strategies. The other main contender is to use a hash table, which is what matlab does for its sparse matrices. For that approach you need to develop a strategy to deal with hash collisions, which can happen frequently for huge matrices. Anyway the study found that AVL trees were much superior to hash tables in terms of speed. The study is here: http://link.springer.com/chapter/10.1007%2F978-3-540-25944-2_135 Anyway, its a good idea to give the user more control over the allocation, so I will think some more on how to do this elegantly. Patrick