From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <kawa-return-5910-listarch-kawa=sources.redhat.com@sourceware.org>
Received: (qmail 18669 invoked by alias); 17 Aug 2015 22:48:06 -0000
Mailing-List: contact kawa-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <kawa.sourceware.org>
List-Subscribe: <mailto:kawa-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/kawa/>
List-Post: <mailto:kawa@sourceware.org>
List-Help: <mailto:kawa-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: kawa-owner@sourceware.org
Received: (qmail 18352 invoked by uid 89); 17 Aug 2015 22:48:04 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.5 required=5.0 tests=AWL,BAYES_50,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2
X-HELO: aibo.runbox.com
Received: from aibo.runbox.com (HELO aibo.runbox.com) (91.220.196.211) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Mon, 17 Aug 2015 22:47:49 +0000
Received: from [10.9.9.206] (helo=mailfront02.runbox.com)	by bars.runbox.com with esmtp (Exim 4.71)	(envelope-from <per@bothner.com>)	id 1ZRTBw-0004ix-2m	for kawa@sourceware.org; Tue, 18 Aug 2015 00:47:44 +0200
Received: from 70-36-239-58.dsl.dynamic.fusionbroadband.com ([70.36.239.58] helo=toshie.bothner.com)	by mailfront02.runbox.com with esmtpsa  (uid:757155 )  (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16)	(Exim 4.76)	id 1ZRTBv-0003L5-Im	for kawa@sourceware.org; Tue, 18 Aug 2015 00:47:43 +0200
Subject: Re: APL-style array indexing in Kawa - a sketch
To: kawa@sourceware.org
References: <55CD4B6A.8010606@bothner.com> <4E24D268-9062-465B-BF2B-966806394A39@theptrgroup.com>
From: Per Bothner <per@bothner.com>
Message-ID: <55D2648B.2040203@bothner.com>
Date: Mon, 17 Aug 2015 22:48:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0
MIME-Version: 1.0
In-Reply-To: <4E24D268-9062-465B-BF2B-966806394A39@theptrgroup.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2015-q3/txt/msg00027.txt.bz2


On 08/17/2015 01:24 PM, Jamison Hope wrote:
> On Aug 13, 2015, at 9:59 PM, Per Bothner <per@bothner.com> wrote:
>
>>
>> I recently changed the various vector classes to use an index mapper:
>> Every vector has two fields: a raw Java array buffer, as well as
>> an IntSequence mapper which maps vector indexes to buffer indexes.
>> This allows slices, rotations, permutations, etc, using the
>> same same buffer:
>>   (VEC I) => index I in VEC ; function-call notation for indexeing
>>   (set! (VEC i) VALUE) - update value
>>   (VEC [i <: J]) => slice of VEC ; same buffer, new index mapper
>>   (VEC [i0 i1 ... in]) ==> [(VEC i0) (VEC i1) ... (VEC in)]
>>       - but re-using the VEC buffer with a new index mapper
>>   (set! (VEC [i <: J]) VALUE) - replace elements (possible re-size)
>
> In this last example, VALUE is a sequence and not an atom, right?  So:
>
> (define v (make-vector 5 0)) ;; -> #(0 0 0 0 0)
>
> (set! (v [2 <: 4]) 1) ;; error!
>
> (set! (v [2 <: 4]) #(1 1)) ;; -> v is now #(0 0 1 1 0)
>
> (set! (v [2 <: 4]) #(1)) ;; -> v is now #(0 0 1 0)

Right.

I'd like to have some "fill" syntax or idiom:  Repeat this item or sequences
as necessary.  One option is to define infinite cyclic arrays as a
useful-in-itself data type, and then add the rule that if the VALUE
is infinite we truncate it to the size of the selected left-hand slice.

>> Kawa also supports multi-dimensional arrays, based on SRFI-25, and you
>> can use the same function-call notation:
>>   (ARR X Y)
>>   (set! (ARR X Y) VALUE)
>> SRFI-25 (and Kawa) has functions for slices, transpose, rotations, etc.
>> However, I'd like to have APL-style array-as-index for general arrays:
>>   (ARR [A <: B] Y [c <: D])
>>   (ARR [5 2 6] [2 3])
>
> So the way to read this is that it constructs a new 3x2 2D array which
> shares backing store with ARR, and its six elements in row-major order
> are (ARR 5 2), (ARR 5 3), (ARR 2 2), (ARR 2 3), (ARR 6 2), and (ARR 6 3),
> right?

Correct.

> Is there some clever notation for a range meaning "from here to the end,
> wherever that is"?  (VEC [i <: ]) doesn't work;

That is indeed the clever notation, and indeed it doesn't work for silly
reasons involving int vs integer.  I'll fix it.

> (VEC [i <: (length VEC)]) works, but seems verbose.

Agreed.

> Matlab (which uses 1-based indexing) lets you do things like A(2:end) to
> extract all-but-the-first element.  Also, for dimensions for which you
> want the whole thing, instead of writing "1:end" you can use ":" as
> shorthand:  A(:,3) means the whole third column (regardless of the
> number of rows), and A(3,:) means the whole third row (regardless of the
> number of columns).

We could define [<:] as a supported syntax (see $bracket-list$ in syntax.scm).
It would be relative easy to define 0 as the default lower bound, but that
doesn't quite do what you want.  We could define [<;] as a special magic value
that gets coerced to a specific range depending on context.

>> * Is this a feature you'd find exciting and/or use it?
>
> I can definitely think of math-ish things to do with something like this.
>
> If I have a 4x4 matrix T representing a spatial transformation, then
> (T [0 <: 3] [0 <: 3]) would be the upper-left 3x3 submatrix, which would
> be a rotation matrix that magically stays in sync with the original
> transform data.  That could be handy.
>
> The values at [(T 0 3) (T 1 3) (T 2 3)] are the translation vector.
> Using this new notation, that's *almost* the same as (T [0 <: 3] [3]),
> except that the latter gives me a 3x1 2D matrix rather than a 3-element
> 1D vector.
>
> Will there be some simple way to flatten the result to reduce its rank
> so I can omit the index that would always be 0?

Why wouldn't you just use: (T [0 <: 3] 3) ?

We will have functions that reshape and transform arrays.
I believe this particular use-case can be handled by SRFI-25's share-array.
We do have the APL family of languages for inspiration ...

>> * I know computers and data structures.  I don't know matrixes or math, much.
>>   Would this approach cause problems if trying to to do high-performace
>>   array/matrix manipulations, or interfacing to other libraries?
>
> It's kind of hard to say.  It depends whether the added indirection of
> the index mapper introduces more overhead than the gain from not
> allocating a new data store and copying over the values at array
> creation time, right?  It certainly *seems* like it would be a win, at
> least for arrays larger than some size N (wherever the size of the data
> starts to outgrow the size of the index mapper object).

Spacewise, the difference is trivial.  In the proposal look at the ArrayMapper.java
combined with (say) F32Array, compared with the old GeneralArray combined with
a SimpleVector. Basically, it's a wash.  Of course you could write a simple Matrix class
that just allocates a data array and some bounds, and that would take slightly
less space and be slightly faster.  Speed is more of a factor, since we may
be adding some extra indirection and lookups.

Note an array-mapper is just an interface, and can be quite complex.
For example Range.IntRange just has 3 int fields; we can think of it
as an optimized version of the ArrayMapper class for rank=1.

> What would be the idiomatic way to do a deep(?) copy, to break the
> connection with the original data buffer?

It seems reasonable to offer a 1-argument array-copy function.
In addition to copying (necessary parts of) the data array,
it would simplify the IntArray interface to an ArrayMapper.

Note that array-copy is a special case of:
   (make-array-from-function SHAPE FUNCTION)
since a rank-N array can be treated as N-ary function.

>One situation I can think of
> where the sharing could cause trouble would be passing the sub-vector to
> something like an in-place sort function, where we might not actually
> want to overwrite the parent array's data.

We definitely need copying as well as sharing.

-- 
	--Per Bothner
per@bothner.com   http://per.bothner.com/