From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 97553 invoked by alias); 7 Feb 2016 19:32:31 -0000 Mailing-List: contact gsl-discuss-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gsl-discuss-owner@sourceware.org Received: (qmail 97532 invoked by uid 89); 7 Feb 2016 19:32:30 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 spammy=dox X-HELO: mail-ig0-f171.google.com Received: from mail-ig0-f171.google.com (HELO mail-ig0-f171.google.com) (209.85.213.171) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Sun, 07 Feb 2016 19:32:28 +0000 Received: by mail-ig0-f171.google.com with SMTP id 5so45929215igt.0 for ; Sun, 07 Feb 2016 11:32:27 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=flga1D4n5OKlDU3VKZfJdMfkkIbQ6hHUf/7/jy9f6FM=; b=FPN1tPbcwAk4WoYrhckTjQFC00qW2dTN7A1gcZLYcYZM9TExPuFYr51Szi60YOJ3PY YbZgM0x/w80tG1UkpZ4A70NX5V1av9r+51mB6q5C+HPAYY7J5nazZHALkjK4UGLlOVrQ UK2iO0oOmxMFmhf8dCJ3qszjGw69LR/5u2pflS9wvKwIeB+nuD5yBxorMjnSJlX6g1yi DKHOLiBJRMtReRo/8wtLJfErUPB21mh8a3Vxyb89uwNIqNValJIWH+w3lB3zFZOTJFS2 1FGLK1S4IBzYaMayLr4WcWq0I7Bszm1iRd5ZVvau8KT6mnXLyfxt99e0uxYu8Uhpzzcn eG2Q== X-Gm-Message-State: AG10YOQiOfcDXLARHeoooP1Y74YZcY7c/5MNk67mkxhL9rQktldSEquL+9VDzzoYCG+n3TdHc+mdnwomn9ixjQ== MIME-Version: 1.0 X-Received: by 10.50.6.104 with SMTP id z8mr20673263igz.58.1454873546136; Sun, 07 Feb 2016 11:32:26 -0800 (PST) Received: by 10.79.118.141 with HTTP; Sun, 7 Feb 2016 11:32:26 -0800 (PST) In-Reply-To: <56B77E13.1000306@colorado.edu> References: <569E6C33.1090505@colorado.edu> <569EA1A9.2080101@colorado.edu> <56B689B1.5090005@colorado.edu> <56B77E13.1000306@colorado.edu> Date: Sun, 07 Feb 2016 19:32:00 -0000 Message-ID: Subject: Re: Sparse matrix extension From: Alexis Tantet To: Patrick Alken Cc: "gsl-discuss@sourceware.org" Content-Type: text/plain; charset=UTF-8 X-SW-Source: 2016-q1/txt/msg00008.txt.bz2 Hi Patrick, On Sun, Feb 7, 2016 at 6:25 PM, Patrick Alken wrote: > On 02/06/2016 06:03 PM, Alexis Tantet wrote: >> Hi Patrick, >> >> Thanks for working on it. Following, some comments: >> >> On Sun, Feb 7, 2016 at 1:02 AM, Patrick Alken wrote: >>> Hello, >>> >>> I've been working my way through your patch. There is a new branch on >>> the git called 'sparse' with all the current changes. You've done a lot >>> of work on your patch so its taking me a while to get through all of it. >>> >>> The main functionality of CRS is in there, along with basic stuff like >>> memcpy, transpose_memcpy, transpose. Also dgemv works (and hence also >>> the GMRES linear solver). >> Since the GMRES is already implemented, do you think it would be worth >> using the Arnoldi method to find eigenvalues and eigenvectors of >> sparse non-hermitian matrices? > > If you think you can make a nice implementation, go for it. We could add > it to the gsl_eigen interface. > The basic elements of the Arnoldi iteration and the eigenvalue solver for dense non-hermitian matrices are there. The question is then which restarting method to use, e.g. the implicite restart of ARPACK. I'll see what I can do. >> >>> I did some initial work on the fprintf/fscanf >>> for the triplet case - for the compressed matrices, I am wondering if we >>> should stick to a standard format such as Harwell-Boeing, I'm not sure >>> at the moment. We might be able to avoid this issue by making a binary >>> fwrite/fread interface which just directly writes the 3 arrays to disk. >> I agree that a standard format would be best but I am not aware of any >> dominant format. >> Is Harwell-Boeing one of them? >> It seems important to me to have both binary and formatted input/output >> (for coherence with gsl_matrix and because formatted files are more >> handy to share). >> Also, it is important to have some header indicating how many elements >> should be read >> and what is the dimension of the matrix. > > I'm not aware of a standard format either, I'll try and look into this > further to come up with a good solution. > >> >>> Things I still need to add for the CRS storage: >>> >>> 1. spmatrix_add >>> 2. spblas_dgemm >>> 3. fprintf/fscanf >>> 4. transpose - for now I'll keep it the way you coded it, but I think it >>> would be better to keep the matrix in the same storage format during a >>> transpose, ie: the user might want to pass A^T to an external solver >>> library which requires CCS and they might not expect the transpose >>> routine to change it to CRS. I did a quick search for in-place transpose >>> algorithms in CCS/CRS and they do appear to exist though I haven't had >>> time to implement it yet. >> Fine, but then I think it is important to add a separate function, say >> "transpose_switch_major", >> for which the transposition is performed by just changing the major >> (i.e. row->col or col->row) >> since this is the cheapest way to transpose while some libraries may >> work equally well with CRS or CCS. > > Ok I like this idea too since it is very efficient. > >> >>> I removed the inner/outer idx stuff from gsl_spmatrix - its not >>> necessary and also I'd like to avoid modifying the gsl_spmatrix struct >>> since any changes would break binary compatibility for the next release. >> That makes perfect sense, but then you need to test each time for the >> major to know what is the size of the pointer? > > I am just making separate routines for CCS and CRS (ie: gsl_spmatrix_ccs > and gsl_spmatrix_crs to do triplet -> compressed). I think its simpler > this way. > >> >>> Also your modification to gsl_spmatrix_set - I see what you're trying to >>> do but I think its better to add a new function: >>> >>> double *gsl_spmatrix_ptr(gsl_spmatrix *m, size_t i, size_t j) >>> >>> which returns a pointer to the data array for element (i,j), and the >>> user can then add whatever value they want (this is how its done with >>> gsl_matrix and gsl_vector). I think it should be straightforward to add >>> this function, possibly even for the compressed storage too. >> Then I would also add a compressed matrix constructor (or an option) in which >> gsl_spmatrix_ptr is used to sum duplicate entries instead of gsl_spmatrix_set. >> This possibility is offered by some other libraries such as Eigen >> and is for example useful to build matrices counting (e.g. transitions). > > I'm not sure what you mean. I've added a new function gsl_spmatrix_ptr > to the git, which as far as I can tell does exactly what your > sum_duplicate flag does. It searches the matrix for an (i,j) element, > and if found returns a pointer. If not found a null pointer is returned. > This makes it easy for the user to modify A(i,j) after it has been added > to the matrix. Are you thinking of something else? Can you point me to > the Eigen routine? > What I meant is to have the equivalent of gsl_spmatrix_compress, with the difference that gsl_spmatrix_ptr is used instead of gsl_spmatrix_set, so has to build the compressed matrix from triplets, summing the duplicates, instead of replacing them. This is what is done here : The http://eigen.tuxfamily.org/dox/classEigen_1_1SparseMatrix.html#a5bcf3187e372ff7cea1e8f61152ae49b Best, Alexis >> >>> Anyway, just wanted to let you know things are progressing (though a >>> little slowly). >> Thanks a lot! Would you like me to work on something in particular? >> >> Best, >> Alexis >> >>> Patrick >>> >>> On 01/20/2016 11:31 AM, Alexis Tantet wrote: >>>> Hi Patrick, >>>> >>>> I've cloned gsl.git, made my modifications to spmatrix/ and spblas/ >>>> and pushed them to master at: >>>> https://github.com/atantet/gsl >>>> >>>> This time, I've tried to modify the code as little as possible. I've >>>> usually added comments labelled "AT" with each modification. >>>> I also have a few questions about the code which I've labelled "?AT". >>>> Finally, I've added the corresponding tests to both test.c (successful >>>> on my machine). >>>> >>>> I see that duplicates are handled for the triplet format by replacing. >>>> Thus, I removed the function taking care of sorting and summing >>>> duplicates that I had added to _compress (the trick was to transpose >>>> with copy to reorder first, transpose back and remove the duplicates >>>> then, taken from Eigen C++). Instead, I have added an option to sum >>>> the duplicates (useful e.g. when counting links of a graph or >>>> transitions of a Markov chain). One may need the third option of >>>> repeating the triplets, but then avl_insert should also be modified. >>>> >>>> However, if I've understood well, no sorting of the inner indices is >>>> performed right now. This could be problematic in the future (e.g. in >>>> functions such as _equal). >>>> >>>> For _spdgemm for CRS matrices, I've used the trick (A B)^t = B^t A^t >>>> to convert the problem to CSC. I've tested it successfully but you may >>>> not like the coding style. >>>> >>>> I hope that helps, >>>> Best, >>>> Alexis >>>> >>>> On Tue, Jan 19, 2016 at 9:50 PM, Patrick Alken wrote: >>>>> On 01/19/2016 12:55 PM, Alexis Tantet wrote: >>>>>> Hi Patrick, >>>>>> >>>>>> Thanks for the quick reply! I wanted to be sure that this contribution >>>>>> is useful before to spend time on the merging with the latest version. >>>>>> I will create the gsl.git repository and work on it during the week. >>>>>> >>>>>> I had already had a look at the documentation but did not know about >>>>>> the iterative solvers (a link between each modules would be useful). >>>>>> My contribution indeed fits in the sparse matrix module + the update >>>>>> of the dgemm and dgemv functions to support CRS (an update may also be >>>>>> needed for the solvers). >>>>> The solvers only call dgemv (as far as I remember) so they shouldn't need an >>>>> update once dgemv is updated. >>>>> >>>>>> I have also developed a simple C++ object allowing to use gsl_spmatrix >>>>>> as a user-defined matrix in ARPACK++ (a maintained fork of ARPACK++ >>>>>> can be found at https://github.com/m-reuter/arpackpp), allowing to >>>>>> avoid having to use other libraries such as superLU. It could be >>>>>> useful to others, maybe as an extension. Now that I think about it, >>>>>> the iterative solvers could also be used to support the shift and >>>>>> invert modes (see ARPACK++ documentation). What do you think (I could >>>>>> work on it)? >>>>> I've never used ARPACK, but if you want to make an extension to interface >>>>> GSL/ARPACK its fine with me - such an extension would never be added to the >>>>> main GSL code since GSL tries to be as standalone as possible. >>>>> >>>>> >>>>>> If you have major comments, the sooner the better, so that I can work >>>>>> on them while merging. >>>>>> >>>>>> Thank you for your interest, >>>>>> Alexis >>>>>> >>>>>> On Tue, Jan 19, 2016 at 6:02 PM, Patrick Alken wrote: >>>>>>> Hi Alexis, >>>>>>> >>>>>>> This looks like very good work! Adding compressed row storage has been >>>>>>> on >>>>>>> my todo list for a while. The 'gslsp' extension is unfortunately very out >>>>>>> of >>>>>>> date, and the current git contains newer code (including a GMRES >>>>>>> iterative >>>>>>> linear solver). I removed the gslsp extension from the web page a while >>>>>>> back >>>>>>> to reflect this. You can browse the latest manual to see the current >>>>>>> sparse >>>>>>> matrix capabilities (http://www.gnu.org/software/gsl/manual/gsl-ref.pdf) >>>>>>> - >>>>>>> there are 3 chapters: sparse matrices, sparse blas and sparse linear >>>>>>> algebra >>>>>>> - it looks like your contributions will fit into the sparse matrices >>>>>>> chapter. >>>>>>> >>>>>>> Would you be able to verify that your changes are compatible with the >>>>>>> current gsl.git repository? This will make it much easier for me to merge >>>>>>> everything into the git when ready. It would be best if you made a new >>>>>>> branch of gsl.git, and add your changes so I can then pull them from >>>>>>> github >>>>>>> or somewhere. I will try to find some time in the next few days to look >>>>>>> over >>>>>>> your code. >>>>>>> >>>>>>> Thanks again, >>>>>>> Patrick >>>>>>> >>>>>>> >>>>>>> On 01/19/2016 09:43 AM, Alexis Tantet wrote: >>>>>>>> Dear GSLers, >>>>>>>> >>>>>>>> As a scientist rather than a developer, I have developed an extension >>>>>>>> of the sparse matrix module (CRS, I/O, manipulation, see below), which >>>>>>>> I have tested. These modifications conserve the structure of the >>>>>>>> original module and be useful for a large number of sparse matrices >>>>>>>> users. >>>>>>>> >>>>>>>> I'm not familiar with the contributing process here. My repository can >>>>>>>> be found there: >>>>>>>> https://github.com/atantet/gslsp >>>>>>>> Unfortunately, I did not know of the gsl.git repository and I forked it >>>>>>>> froml: >>>>>>>> https://github.com/drjerry/gslsp , >>>>>>>> which seems to be a bit older than gsl.git. >>>>>>>> >>>>>>>> How can I push/merge to gsl.git ? Should it be as an update or another >>>>>>>> extension? Is it necessary to adapt to the newest version of the code >>>>>>>> ? >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Alexis Tantet >>>>>>>> >>>>>>>> CHANGES.md: >>>>>>>> >>>>>>>> Extension of the sparse matrix module of GSL >>>>>>>> >>>>>>>> =================================== >>>>>>>> >>>>>>>> Introduction >>>>>>>> ------------ >>>>>>>> >>>>>>>> Usages of sparse matrices are numerous in scientific computing. >>>>>>>> When numerical linear algebra problems become large, sparse >>>>>>>> matrices become necessary to avoid memory overload and unnecessary >>>>>>>> computations, at the cost of element access and matrix construction. >>>>>>>> As a result, most large scale linear solvers or eigen solvers perform >>>>>>>> on sparse matrices. >>>>>>>> >>>>>>>> Fortunately, a very useful sparse matrix module has recently been >>>>>>>> introduced to GSL. >>>>>>>> However, important features are still lacking, such has >>>>>>>> Compressed Row Storage (CRS) matrices, input/output functions and >>>>>>>> other matrix properties and manipulation functions. >>>>>>>> This new version attempts to address this, conserving the original >>>>>>>> structure of the module and conventions. >>>>>>>> >>>>>>>> Major changes >>>>>>>> ------------- >>>>>>>> >>>>>>>> * Add CRS format and update functions manipulating compressed matrices : >>>>>>>> - additional flag GSL_SPMATRIX_CRS and macro GSLSP_ISMATRIX ( >>>>>>>> gsl_spmatrix.h ) >>>>>>>> - additional members innerSize and outerSize used to iterate >>>>>>>> matrix elements ( gsl_spmatrix.h ) >>>>>>>> - rename some variables for coherence ( gsl_spmatrix.h , *.c ) >>>>>>>> - update all functions on compressed matrices ( *.c ) >>>>>>>> * Allow to sum duplicate elements when compressing ( spcompress.c ) : >>>>>>>> - modify gsl_spmatrix_compress >>>>>>>> - add gsl_spmatrix_sum_duplicate >>>>>>>> * CCS <-> CRS and fast transpose inplace in spswap.c : >>>>>>>> - add gsl_spmatrix_switch_major >>>>>>>> - add gsl_spmatrix_transpose >>>>>>>> * Add printing and scanning functions in spio.c : >>>>>>>> - add gsl_spmatrix_fprintf >>>>>>>> - add gsl_spmatrix_fscanf >>>>>>>> * Add manipulation functions in spmanip.c (particularly useful for >>>>>>>> Markov chain transition matrices) : >>>>>>>> - add gsl_spmatrix_get_rowsum : get vector of sum over row >>>>>>>> elements >>>>>>>> - add gsl_spmatrix_get_colsum : get vector of sum over column >>>>>>>> elements >>>>>>>> - add gsl_spmatrix_div_rows : divide all elements of each row >>>>>>>> by a vector element >>>>>>>> - add gsl_spmatrix_div_cols : divide all elements of each >>>>>>>> column by a vector element >>>>>>>> * Add test functions in atprop.c : >>>>>>>> - add gsl_spmatrix_gt_elements : greater than test for each >>>>>>>> matrix >>>>>>>> element >>>>>>>> - add gsl_spmatrix_ge_elements : greater or equal than test for >>>>>>>> each matrix element >>>>>>>> - add gsl_spmatrix_lt_elements : lower than test for each matrix >>>>>>>> element >>>>>>>> - add gsl_spmatrix_le_elements : lower or equal than test for >>>>>>>> each matrix element >>>>>>>> - add gsl_spmatrix_any : test if any non-zero element in >>>>>>>> matrix >>>>>>>> >>>>>>>> Other minor changes have been made, such as error tests. >>>>>>>> test.c has also been updated to test new features. >>>> >> >> > -- Alexis Tantet