public inbox for gcc@gcc.gnu.org
* [RFC] zip_vector: in-memory block compression of integer arrays
@ 2022-08-17  6:34 Michael Clark
  2022-08-17  7:10 ` Richard Biener
  0 siblings, 1 reply; 3+ messages in thread
From: Michael Clark @ 2022-08-17  6:34 UTC (permalink / raw)
  To: GCC Mailing List

Hi Folks,

This is an edited version of a message posted on the LLVM Discourse.

I want to share what I have been working on as I feel it may be of 
interest to the GCC compiler developers, specifically concerning alias 
analysis and optimizations for iteration of sparse block-based 
multi-arrays. I also have questions about optimization related to this 
implementation, specifically the observability of alias analysis 
pessimization and memory to register optimizations.

I have been working on _zip_vector_. _zip_vector_ is a compressed 
variable length array that uses vectorized block codecs to compress and 
decompress integers using dense variable bit width deltas as well as 
compressing constant values and sequences. _zip_vector_ employs integer 
block codecs optimized for vector instruction sets using the Google 
Highway C++ library for portable SIMD/vector intrinsics.

The high-level class supports 32-bit and 64-bit compressed integer arrays:

  - `zip_vector<i32>`
    - { 8, 16, 24 } bit signed and unsigned fixed-width values.
    - { 8, 16, 24 } bit signed deltas and per block IV.
    - constants and sequences using per block IV and delta.
  - `zip_vector<i64>`
    - { 8, 16, 24, 32, 48 } bit signed and unsigned fixed-width values.
    - { 8, 16, 24, 32, 48 } bit signed deltas with per block IV.
    - constants and sequences using per block IV and delta.

Here is a link to the implementation:

- https://github.com/metaparadigm/zvec/

The README has a background on the delta encoding scheme. If you read 
the source, "zvec_codecs.h" contains the low-level vectorized block 
codecs while "zvec_block.h" contains a high-level interface to the block 
codecs using cpuid-based dynamic dispatch. The high-level sparse integer 
vector class leveraging the block codecs is in "zip_vector.h". It has 
been tested with GCC and LLVM on x86-64 using SSE3, AVX, and AVX-512.

The principle of operation is to employ simplified block codecs that are 
dedicated solely to compressing fixed-width integers and are thus 
extremely fast, unlike typical compression algorithms: they run _on the 
order of 30-150 GiB/sec_ on a single core when operating within the L1 
cache on Skylake AVX-512. zip_vector achieves its performance by 
reducing global memory bandwidth: it fetches and stores compressed data 
to and from RAM and then uses extremely fast vector codecs to pack and 
unpack compressed blocks within the L1 cache. In this respect it is 
similar to texture compression codecs, but the specific use case is 
closer to storage for index arrays because the block codecs are lossless 
integer codecs. The performance is striking in that in-order read-only 
traversal can be faster than for a regular array, while the primary goal 
is footprint reduction.

The design use case is an offsets array that might contain 64-bit values 
but usually contains smaller values. From this perspective, we wanted 
the convenience of simply using `zip_vector<i64>` or `zip_vector<i32>` 
while benefiting from the space advantages of storing data using 8, 16, 
24, 32, and 48-bit deltas.

Q. Why is it specifically of interest to GCC developers?

I think the best way to answer this is with a question: how can we model 
a block-based iterator for a sparse array that is amenable to vectorization?

There are aspects of the zip_vector iterator design that are *not done 
yet* in its current implementation. Iteration has two phases. There is 
an inter-block phase at the boundary of each block (the logic inside 
`switch_page`) that scans and compresses the previously active block, 
updates the page index, and decompresses the next block. Then there is a 
_broad phase_ for intra-block accesses, which is amenable to 
vectorization due to the use of fixed-size blocks.
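The two-phase structure can be sketched as follows (a simplified stand-in, not the zip_vector code: `toy_vector`, its `storage` member, and this trivial `switch_page` are hypothetical, with no real compression behind them):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Simplified stand-in for the two-phase iteration: fixed-size blocks,
// an inter-block phase (switch_page) and an intra-block broad phase.
// The real zip_vector compresses/decompresses in switch_page; here we
// only track which page is resident.
struct toy_vector {
    static constexpr size_t page_interval = 512;
    std::vector<int64_t> storage;   // stands in for the slab
    size_t active_page = SIZE_MAX;  // index of the resident block

    // Inter-block phase: make page y resident (the real implementation
    // would compress the old block and decompress the new one here).
    int64_t* switch_page(size_t y) {
        active_page = y;
        return storage.data() + y * page_interval;
    }

    int64_t sum() {
        int64_t total = 0;
        size_t pages = storage.size() / page_interval;
        for (size_t y = 0; y < pages; y++) {
            int64_t *cur = switch_page(y), *end = cur + page_interval;
            // Broad phase: a plain pointer walk over one fixed-size
            // block, amenable to auto-vectorization.
            while (cur != end) total += *cur++;
        }
        return total;
    }
};
```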

*Making 1D iteration as fast as 2D iteration*

Firstly, there is a lot of analysis of the iterator's optimization that 
I would like to discuss. There is the issue of hoisting the inter-block 
boundary test out of the fast path, so that at each block crossing the 
next block ending is calculated in advance and the broad phase only 
requires a pointer increment and a comparison against addresses held in 
registers.

The challenge is getting past compiler alias analysis. Alias analysis 
seems to prevent the sum of the slab base address and the active-area 
offset from being cached in a register; instead, accesses are demoted to 
memory loads. These member variables hold the location of the slab and 
the offset to the uncompressed page, both of which are on the critical 
path. When these values live in memory, _it adds 4 or more cycles of 
latency for base-address calculation on every access_. There is also the 
possibility of hoisting and folding the active-page check, as we can 
make constructive proofs concerning changes to that value.

Benchmarks compare the performance of 1D- and 2D-style iterators. At 
times the compiler would hoist the base and offset pointers from 
member-variable accesses into registers in the 1D version, making a 
noticeable difference in performance. From the perspective of 
single-threaded code, the only place the pointer to the active region 
can change is inside `switch_page(size_t y)`.
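One conventional workaround (a sketch, not the zip_vector code; `slab_data` and `active_area` here are hypothetical members mirroring the ones discussed) is to copy the members into locals before the hot loop, so the compiler need not prove on every iteration that stores elsewhere cannot modify them:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of the member-caching workaround. 'slab_data' and 'active_area'
// are stand-ins for the members discussed above.
struct area {
    int64_t* slab_data = nullptr;
    size_t active_area = 0;

    int64_t sum_member_loads(size_t n) const {
        int64_t total = 0;
        // Each iteration may reload slab_data/active_area from memory:
        // alias analysis cannot always prove they are loop-invariant.
        for (size_t i = 0; i < n; i++)
            total += *(slab_data + active_area + i);
        return total;
    }

    int64_t sum_cached_locals(size_t n) const {
        // Copy the members into a local once; a local cannot alias
        // anything, so the base address stays in a register.
        int64_t* base = slab_data + active_area;
        int64_t total = 0;
        for (size_t i = 0; i < n; i++) total += base[i];
        return total;
    }
};
```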

The potential payoff is huge, because one may be able to access data at 
roughly _0.9x to 3.5x_ the speed of simply accessing integers in RAM 
when combining the reduction in global memory bandwidth with 
auto-vectorization. The challenge is generating safe code for the 
simpler 1D iteration case that is as efficient as explicit 2D iteration.

1D iteration:

     for (auto x : vec) x2 += x;

2D iteration:

     for (size_t i = 0; i < n; i += decltype(vec)::page_interval) {
         i64 *cur = &vec[i], *end = cur + decltype(vec)::page_interval;
         while(cur != end) x2 += *cur++;
     }

Note: in this example I avoid a differently sized loop tail, but that is 
also a consideration.
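For completeness, the tail can be handled with a second scalar loop. A sketch over a plain array (the `page_interval` constant and `sum_with_tail` function are illustrative, not from the library):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of 2D iteration with a separate loop tail when n is not a
// multiple of the block size; iterates a plain array, not a zip_vector.
constexpr size_t page_interval = 512;

inline int64_t sum_with_tail(const int64_t* data, size_t n) {
    int64_t x2 = 0;
    size_t body = n - n % page_interval;
    // Full blocks: fixed trip count, friendly to vectorization.
    for (size_t i = 0; i < body; i += page_interval) {
        const int64_t *cur = data + i, *end = cur + page_interval;
        while (cur != end) x2 += *cur++;
    }
    // Tail: fewer than page_interval elements remain.
    for (size_t i = body; i < n; i++) x2 += data[i];
    return x2;
}
```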

I trialled several techniques using a simplified version of the 
`zip_vector` class in which `switch_page` was substituted with simple 
logic, so that it was possible to get the compiler to coalesce the slab 
base pointer and active-area offset into a single calculation on page 
crossings. There is also hoisting of the `active_page` check (the 
_y-parameter_) so that it only occurs on block crossings. I found that 
when the `switch_page` implementation became more complex, e.g. after 
adding an extern call to `malloc`, the compiler would fall back to more 
conservatively fetching through a pointer to a member variable for the 
base pointer and offset calculation. See here:

https://github.com/metaparadigm/zvec/blob/756e583472028fcc36e94c0519926978094dbb00/src/zip_vector.h#L491-L496

So I got to the point where I thought it would help to get input from 
compiler developers to figure out how to observe which internal 
constraints are violated by `switch_page`, preventing the base pointer 
and offset address calculation from being cached in registers. 
`slab_data` and `active_area` are neither volatile nor atomic, so 
threads should not expect their updates to be atomic or to go through 
memory.

I tried a large number of small experiments: e.g. collapsing `slab_data` 
and `active_area` into one pointer at the end of `switch_page` so that 
there is only one pointer to access. Also, the `active_page` test does 
not necessarily need to be in the broad phase. I attempted to manually 
hoist these variables by modifying the iterators, but found it was 
necessary to keep them where they were to avoid introducing stateful 
invariants in the iterators that could become invalidated by read 
accesses.

Stack/register-based coroutines could help due to the two distinct 
states in the iterator.
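Short of full coroutines, one way to give the compiler a similar separation (a sketch, not part of zip_vector; `for_each_block` and `sum_blocks` are illustrative names) is to hand each resident block to a callback, so the intra-block state lives entirely in locals inside the caller's scope:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Sketch: instead of a coroutine, hand each resident block to a
// callback. The block-crossing logic lives in for_each_block; the
// intra-block state (cur/end) lives in locals inside the lambda,
// where nothing can alias it.
template <typename F>
inline void for_each_block(std::vector<int64_t>& storage, size_t block,
                           F&& visit) {
    for (size_t i = 0; i + block <= storage.size(); i += block)
        visit(storage.data() + i, block);  // inter-block phase per call
}

inline int64_t sum_blocks(std::vector<int64_t>& storage, size_t block) {
    int64_t total = 0;
    for_each_block(storage, block, [&](int64_t* cur, size_t n) {
        int64_t* end = cur + n;
        while (cur != end) total += *cur++;  // intra-block broad phase
    });
    return total;
}
```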

It is not as simple as one might think on the surface. I tried several 
approaches to coalesce address calculations and move them into the 
page-switch logic, all leading to a performance fall-off, almost as if 
the compiler was carrying some pessimization that forced touched member 
variables to be accessed via memory instead of registers. At one stage 
the 1D form was fast with GCC, but after adding support for 
`zip_vector<i32>` and `zip_vector<i64>` the performance fell off. So I 
would like to observe exactly which code causes pessimization of 
accesses to member variables, preventing them from being held in 
registers and causing accesses to go to memory instead. It seems it 
should be possible to make 1D iteration faster than _std::vector_, as I 
did witness this with GCC, but the optimization does not seem to be 
stable.

So that's what I would like help with...

Regarding the license: zip_vector and its block codecs are released 
under "PLEASE LICENSE", a permissive ISC-derived license with a notice 
about implied copyright. The license removes the ISC requirement that 
all copies include the copyright message, so while it is still 
copyrighted material, i.e. not public domain, it is compatible with the 
Apache Software License.

Please have a look at the benchmarks.

Regards,

Michael J. Clark
Twitter: @larkmjc


* Re: [RFC] zip_vector: in-memory block compression of integer arrays
  2022-08-17  6:34 [RFC] zip_vector: in-memory block compression of integer arrays Michael Clark
@ 2022-08-17  7:10 ` Richard Biener
  2022-08-17  7:49   ` Michael Clark
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Biener @ 2022-08-17  7:10 UTC (permalink / raw)
  To: Michael Clark; +Cc: GCC Mailing List

On Wed, Aug 17, 2022 at 8:35 AM Michael Clark via Gcc <gcc@gcc.gnu.org> wrote:
> [...]
>
> So that's what I would like help with...

It's difficult to sort through a maze of C++ abstraction; it also means
that it's probably difficult, if not impossible, to place a strategic
__restrict somewhere (GCC only honors it on function arguments).

TBAA is difficult to deal with, especially if you have integer data
and integer iterator state.  There's currently no way to make
primitive types with different alias sets (typedefs and friends do
not help).
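As an illustration of the point about function arguments (a generic sketch, not zvec code; the function and parameter names are hypothetical), GCC and Clang accept `__restrict` on pointer parameters, promising that the two pointers do not alias:

```cpp
#include <cstddef>
#include <cstdint>

// Sketch: __restrict is honored on function parameters, so the compiler
// may keep *state in a register across the loop instead of reloading it
// after every store through 'out'.
inline void fill_from_counter(int64_t* __restrict out,
                              int64_t* __restrict state, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = (*state)++;
}
```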


> Regarding the license, zip vector and its block codecs are released
> under "PLEASE LICENSE", a permissive ISC-derived license with a notice
> about implied copyright. The license removes the ISC restriction that
> all copies must include the copyright message, so while it is still
> copyright material i.e. it is not public domain, it is, in fact,
> compatible with the Apache Software License.
>
> Please have a look at the benchmarks.
>
> Regards,
>
> Michael J. Clark
> Twitter: @larkmjc


* Re: [RFC] zip_vector: in-memory block compression of integer arrays
  2022-08-17  7:10 ` Richard Biener
@ 2022-08-17  7:49   ` Michael Clark
  0 siblings, 0 replies; 3+ messages in thread
From: Michael Clark @ 2022-08-17  7:49 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Mailing List

On 17/08/22 7:10 pm, Richard Biener wrote:

>> [...]
>>
>> So that's what I would like help with...
> 
> It's difficult to sort through a maze of C++ abstraction; it also means
> that it's probably difficult, if not impossible, to place a strategic
> __restrict somewhere (GCC only honors it on function arguments).
> 
> TBAA is difficult to deal with, especially if you have integer data
> and integer iterator state.  There's currently no way to make
> primitive types with different alias sets (typedefs and friends do
> not help).

I didn't expect it to be an easy problem to solve, but I think it is a 
useful one. I had considered that with a coroutine the compiler would 
have clear blocks in which it can more easily analyze what can 
potentially change, where the "block-crossing logic" has a clean cut 
from the "intra-block logic". A coroutine could cut the control flow so 
as to hold the intra-block state where we just increment a pointer.

Google's Highway library does indeed generate somewhat of a maze of code 
that is possibly not the easiest to analyze compared to, say, using the 
non-portable AVX intrinsics or one of the fixed-width SIMD libraries.

However, if you look at the zvec code, I had some awareness of this and 
made some cuts: I expose the codecs through a simple set of high-level 
interfaces that do use __restrict, and I don't think they are 
necessarily the problem. In any case, there is a distinct cut around the 
combinatorial expansion of the low-level compression codecs, because we 
access them via function pointers so that we can do cpuid-based 
dispatch. It wouldn't be too hard to make a purely C API for them.

If you look in zvec_block.h, we have the high-level block codec API 
(which could be made C in a pinch); its headers do indeed use 
__restrict. I feel like there might be a problem with realloc or 
something else.

/*
  * high-level interface to scan, encode, and decode blocks
  *
  * - scan block for statistics with codec
  *   zvec_stats<T> zvec_block_scan(T * in, size_t n, zvec_codec codec);
  *
  * - select block codec and size from statistics
  *   zvec_format zvec_block_format(zvec_stats<T> s);
  *
  * - gather block iv and delta from statistics
  *   zvec_meta<T> zvec_block_metadata(zvec_stats<T> s);
  *
  * - block size from format
  *   size_t zvec_block_size(zvec_format fmt, size_t n);
  *
  * - block alignment from format
  *   size_t zvec_block_align(zvec_format fmt, size_t n);
  *
  * - encode block using metadata
  *   void zvec_block_encode(T * in, void * comp, size_t n,
  *                          zvec_format fmt, zvec_meta<T> meta);
  *
  * - decode block using metadata
  *   void zvec_block_decode(T * out, void * comp, size_t n,
  *                          zvec_format fmt, zvec_meta<T> meta);
  */

