public inbox for libstdc++@gcc.gnu.org
* Re: Looking for status of SIMD implementation
       [not found]   ` <CAHFci29G7XQNV2R2EQdWX2XuAGqshDC_1srJoZTJ3hJOB2514w@mail.gmail.com>
@ 2021-06-24  8:50     ` Matthias Kretz
  2021-06-24  8:52       ` Matthias Kretz
  2021-06-24  9:09       ` Bin.Cheng
  0 siblings, 2 replies; 6+ messages in thread
From: Matthias Kretz @ 2021-06-24  8:50 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: libstdc++, Richard Sandiford

CC'ing libstdc++, to discuss stdx::simd design in public.

I discussed variable length vectors at length in the C++ committee and so far 
I've not seen any useful progress that could make it work for vector *types*. 
The main issue is that C++ itself would require support for runtime-sized 
types throughout the language. E.g.

template <class T>
struct Point {
  T x, y, z;
};

Now:
sizeof(Point<float>) == 3 * sizeof(float)
sizeof(Point<simd<float>>) == 3 * sizeof(simd<float>)
should both hold. And the expectation is that it holds at compile time.

If sizeof(simd<float>) is not a constant expression this breaks a lot of 
assumptions compilers currently make when constructing types. Suddenly 
offsetof(Point<simd<float>>, y) also isn't a constant expression. This affects 
how members are accessed: everything must be computed at runtime or possibly 
cached somewhere.
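
A minimal sketch of the layout argument above. It assumes a hypothetical fixed-width stand-in (fixed_simd, not the stdx::simd implementation) so that it compiles without <experimental/simd>; the names fixed_simd, V, PointV and second_lane_of_y are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fixed-width simd<T>: N lanes whose total size
// is known at compile time. This is not how stdx::simd is implemented.
template <class T, std::size_t N>
struct fixed_simd {
  T lanes[N];
};

template <class T>
struct Point {
  T x, y, z;
};

using V = fixed_simd<float, 4>;
using PointV = Point<V>;  // alias avoids a comma inside the offsetof macro

// All three checks are evaluated by the compiler, not at run time.
static_assert(sizeof(Point<float>) == 3 * sizeof(float), "scalar layout");
static_assert(sizeof(PointV) == 3 * sizeof(V), "vector layout");
static_assert(offsetof(PointV, y) == sizeof(V), "member offset is constant");

// Member access can use a fixed offset because the layout is constant.
inline float second_lane_of_y(const PointV& p) { return p.y.lanes[1]; }
```

With a runtime-sized V, none of the three static_asserts could be written, which is the breakage described above.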

Since the simd<T> type's strength, compared to loop vectorization, is the 
ability to define data structures that depend on sizeof(simd<T>) - like 
Point<simd<float>> - it's a non-starter to propose that runtime sized simd<T> 
cannot be used to compose data structures anymore. In a way std::valarray is 
that type which works in C++: if you need something with a size that's only 
known at runtime you have to malloc the memory, you can't use the stack.
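
As a small illustration of the std::valarray comparison (a sketch only; scaled_ramp and Holder are made-up names): the element count is a runtime value, the elements live in dynamically allocated storage, and the object that sits in a struct or on the stack is just a fixed-size handle.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// The element count n is only known at run time, so the elements are
// allocated dynamically by std::valarray.
inline std::valarray<float> scaled_ramp(std::size_t n, float factor) {
  std::valarray<float> v(n);  // n value-initialised elements
  for (std::size_t i = 0; i < n; ++i) v[i] = static_cast<float>(i);
  v *= factor;                // element-wise multiply
  return v;
}

// Composing it into a struct works because sizeof(std::valarray<float>) is a
// compile-time constant: only the handle is stored inline.
struct Holder {
  std::valarray<float> data;
};
static_assert(sizeof(Holder) == sizeof(std::valarray<float>),
              "the handle has a fixed size regardless of element count");
```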

So unless C++ solves the runtime sized types issue in general (basically 
support VLAs in C++), I don't see runtime sized std::simd happening.

My plan for SVE (and RISC-V V) was that in the first iteration you have to 
decide at compile time what vector width you want to compile for. My 
understanding, where SVE will be usable, is that you typically have a 
homogeneous cluster or a single machine where you can make this work. In 
engineering and research settings it is often possible to compile for the 
machine you'll work on. The problem becomes harder once a company wants to 
release binaries that are supposed to work on a wide range of customer setups. 
With this model you'd have one binary per supported SVE vector width or it 
would require "fat" binaries that contain a combination of many vector widths.
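
One way to picture the "fat" binary option is a kernel instantiated once per supported width plus a runtime selector. This is a hedged sketch in portable C++: sum_kernel, select_sum_kernel and the runtime_bits parameter are hypothetical, and how the hardware width is actually queried is left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel, instantiated once per supported vector width.
// Lanes is a compile-time constant inside each instantiation.
template <std::size_t Lanes>
float sum_kernel(const float* data, std::size_t n) {
  float partial[Lanes] = {};
  std::size_t i = 0;
  for (; i + Lanes <= n; i += Lanes)
    for (std::size_t l = 0; l < Lanes; ++l) partial[l] += data[i + l];
  float total = 0.0f;
  for (std::size_t l = 0; l < Lanes; ++l) total += partial[l];
  for (; i < n; ++i) total += data[i];  // scalar tail
  return total;
}

using kernel_fn = float (*)(const float*, std::size_t);

// Picks the instantiation matching the vector width found at run time.
// runtime_bits is an assumed input; querying it is outside this sketch.
inline kernel_fn select_sum_kernel(unsigned runtime_bits) {
  switch (runtime_bits) {
    case 512: return &sum_kernel<16>;
    case 256: return &sum_kernel<8>;
    default:  return &sum_kernel<4>;  // 128-bit baseline
  }
}
```

Every supported width adds another copy of each kernel to the binary, which is where the size cost of this model comes from.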

But there's more we could do if the library and compiler work together. Like 
compile SVE code in such a way that data structures are "compiled" for many 
different vector widths while algorithms that work on the data structures are 
compiled in a vector width agnostic way. The novel (AFAIK) problem to solve is 
how to dispatch and combine these. Probably the best way to make progress is 
to compile everything for multiple vector widths, taking care to not use the 
knowledge about vector widths wherever possible, and then try to shrink the 
resulting "fat" binary by eliminating the resulting duplicated machine code 
regions. How this can move up to higher abstraction levels in the compiler, I 
have no idea...

So -msve-vector-bits=<some number> might be a prerequisite for the first 
stdx::simd implementation.
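
A rough sketch of what compiling for one fixed width could look like at the source level. It relies on the ACLE macro __ARM_FEATURE_SVE_BITS and the arm_sve_vector_bits attribute, which are available when -msve-vector-bits=<N> is given; the 128-bit fallback and the names fixed_f32, vector_bytes, float_lanes and PointBlock are assumptions made so the example also builds on other targets.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_FEATURE_SVE_BITS) && __ARM_FEATURE_SVE_BITS > 0
#include <arm_sve.h>
// With -msve-vector-bits=N, the attribute turns the sizeless SVE type into a
// sized type that can be a struct member and an operand of sizeof.
typedef svfloat32_t fixed_f32
    __attribute__((arm_sve_vector_bits(__ARM_FEATURE_SVE_BITS)));
constexpr std::size_t vector_bytes = sizeof(fixed_f32);
#else
// Assumption for this sketch only: pretend the width is 128 bits elsewhere.
constexpr std::size_t vector_bytes = 16;
#endif

constexpr std::size_t float_lanes = vector_bytes / sizeof(float);

// A data structure whose layout depends on the width chosen at compile time.
struct PointBlock {
  float x[float_lanes];
  float y[float_lanes];
  float z[float_lanes];
};
static_assert(sizeof(PointBlock) == 3 * vector_bytes,
              "layout is fixed once the width is fixed");
```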

But I'd be happy for better ideas and/or research.

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────





* Re: Looking for status of SIMD implementation
  2021-06-24  8:50     ` Looking for status of SIMD implementation Matthias Kretz
@ 2021-06-24  8:52       ` Matthias Kretz
  2021-06-24  9:09       ` Bin.Cheng
  1 sibling, 0 replies; 6+ messages in thread
From: Matthias Kretz @ 2021-06-24  8:52 UTC (permalink / raw)
  To: libstdc++; +Cc: Richard Sandiford

Well great, now I accidentally sent the mail without context... so here's what 
I answered :)

On Thursday, 24 June 2021 07:17:46 CEST you wrote:
> Hi Matthias,
> Sorry for the late reply.
> 
> We are investigating SVE support in SIMD (and the RISC-V V extension
> in the future). As you may know, it's a difficult task because of the
> VLA programming model.  One decision that needs to be made is
> whether/how we implement SVE support in the VLA way or the fixed
> vector size way.
> 
> ARM (Richard Sandiford) proposed a sizeless type extension
> [http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1101r0.html],
> but it seems it won't be incorporated in the near future.  Without this
> extension, I don't see any practical way to support SVE/VLA in SIMD,
> so we are left with the fixed vector size way.  Even with a fixed vector
> size, there are quite a lot of difficulties that need to be resolved.
> For example, SVE has very limited capacity for manipulating predicate
> registers, while std::bitset and SIMD::_BitMask are implemented as
> packed bits.
> 
> I am still working on the prototype supporting SVE using the fixed
> vector size method.  I am wondering if you have any plans for SVE/RVV,
> or any comments about the idea.
> 
> Thanks,
> bin



* Re: Looking for status of SIMD implementation
  2021-06-24  8:50     ` Looking for status of SIMD implementation Matthias Kretz
  2021-06-24  8:52       ` Matthias Kretz
@ 2021-06-24  9:09       ` Bin.Cheng
  2021-06-29 13:57         ` Richard Sandiford
  1 sibling, 1 reply; 6+ messages in thread
From: Bin.Cheng @ 2021-06-24  9:09 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: libstdc++, Richard Sandiford

On Thu, Jun 24, 2021 at 4:50 PM Matthias Kretz <m.kretz@gsi.de> wrote:
>
> CC'ing libstdc++, to discuss stdx::simd design in public.
>
> I discussed variable length vectors at length in the C++ committee and so far
> I've not seen any useful progress that could make it work for vector *types*.
> The main issue is that C++ itself would require support for runtime-sized
> types throughout the language. E.g.
>
> template <class T>
> struct Point {
>   T x, y, z;
> };
>
> Now:
> sizeof(Point<float>) == 3 * sizeof(float)
> sizeof(Point<simd<float>>) == 3 * sizeof(simd<float>)
> should both hold. And the expectation is that it holds at compile time.
>
> If sizeof(simd<float>) is not a constant expression this breaks a lot of
> assumptions compilers currently make when constructing types. Suddenly
> offsetof(Point<simd<float>>, y) also isn't a constant expression. This affects
> how members are accessed: everything must be computed at runtime or possibly
> cached somewhere.
>
> Since the simd<T> type's strength, compared to loop vectorization, is the
> ability to define data structures that depend on sizeof(simd<T>) - like
> Point<simd<float>> - it's a non-starter to propose that runtime sized simd<T>
> cannot be used to compose data structures anymore. In a way std::valarray is
> that type which works in C++: if you need something with a size that's only
> known at runtime you have to malloc the memory, you can't use the stack.
>
> So unless C++ solves the runtime sized types issue in general (basically
> support VLAs in C++), I don't see runtime sized std::simd happening.
>
> My plan for SVE (and RISC-V V) was that in the first iteration you have to
> decide at compile time what vector width you want to compile for. My
> understanding, where SVE will be usable, is that you typically have a
> homogeneous cluster or a single machine where you can make this work. In
> engineering and research settings it is often possible to compile for the
> machine you'll work on. The problem becomes harder once a company wants to
> release binaries that are supposed to work on a wide range of customer setups.
> With this model you'd have one binary per supported SVE vector width or it
> would require "fat" binaries that contain a combination of many vector widths.
>
> But there's more we could do if the library and compiler work together. Like
> compile SVE code in such a way that data structures are "compiled" for many
> different vector widths while algorithms that work on the data structures are
> compiled in a vector width agnostic way. The novel (AFAIK) problem to solve is
> how to dispatch and combine these. Probably the best way to make progress is
> to compile everything for multiple vector widths, taking care to not use the
> knowledge about vector widths wherever possible, and then try to shrink the
> resulting "fat" binary by eliminating the resulting duplicated machine code
> regions. How this can move up to higher abstraction levels in the compiler, I
> have no idea...
>
> So -msve-vector-bits=<some number> might be a prerequisite for the first
> stdx::simd implementation.
Hi Matthias,
Thanks for the explanation. Actually, -msve-vector-bits=128*N is the
method I am using in my prototype implementation.  Considering that the
SIMD library itself guarantees source-level portability, it might be
fine to do this in the first version of the implementation, although
here we are breaking the sub-target binary portability which SVE tries
to achieve.

I will let Richard give more comments on this before discussing
details about the -msve-vector-bits method.

Thanks,
bin


* Re: Looking for status of SIMD implementation
  2021-06-24  9:09       ` Bin.Cheng
@ 2021-06-29 13:57         ` Richard Sandiford
  2021-06-30  6:30           ` Bin.Cheng
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Sandiford @ 2021-06-29 13:57 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Matthias Kretz, libstdc++

"Bin.Cheng" <amker.cheng@gmail.com> writes:
> On Thu, Jun 24, 2021 at 4:50 PM Matthias Kretz <m.kretz@gsi.de> wrote:
>>
>> CC'ing libstdc++, to discuss stdx::simd design in public.
>>
>> I discussed variable length vectors at length in the C++ committee and so far
>> I've not seen any useful progress that could make it work for vector *types*.
>> The main issue is that C++ itself would require support for runtime-sized
>> types throughout the language. E.g.
>>
>> template <class T>
>> struct Point {
>>   T x, y, z;
>> };
>>
>> Now:
>> sizeof(Point<float>) == 3 * sizeof(float)
>> sizeof(Point<simd<float>>) == 3 * sizeof(simd<float>)
>> should both hold. And the expectation is that it holds at compile time.
>>
>> If sizeof(simd<float>) is not a constant expression this breaks a lot of
>> assumptions compilers currently make when constructing types. Suddenly
>> offsetof(Point<simd<float>>, y) also isn't a constant expression. This affects
>> how members are accessed: everything must be computed at runtime or possibly
>> cached somewhere.
>>
>> Since the simd<T> type's strength, compared to loop vectorization, is the
>> ability to define data structures that depend on sizeof(simd<T>) - like
>> Point<simd<float>> - it's a non-starter to propose that runtime sized simd<T>
>> cannot be used to compose data structures anymore. In a way std::valarray is
>> that type which works in C++: if you need something with a size that's only
>> known at runtime you have to malloc the memory, you can't use the stack.
>>
>> So unless C++ solves the runtime sized types issue in general (basically
>> support VLAs in C++), I don't see runtime sized std::simd happening.

I agree with this FWIW.  But I think there's a risk that this strength
could be counterproductive in some cases.

Taking the easy case first: if an algorithm needs to operate on
N-element vectors for some fixed N (e.g. because it's processing data
that occurs in fixed-size chunks) then being able to compose those types
in the same way as (say) a std::array<T, N> is clearly useful.  It also
seems reasonable that the N-element types might occur in general data
structures.  That part seems uncontroversial.

But AIUI stdx::simd (and particularly stdx::native_simd) is also
designed to be used by code that is agnostic about the length of the
vectors.  Such code can adapt to whatever vectors the target provides.
That's the case I want to talk about below (let's call it case B).

It might just be a factor of who I've talked to, but I get the
impression that people who want to use case B are accelerating
the internals of an algorithm, rather than using vectors as part
of the main interface or using vectors for long-term storage.

For case B we generally have an algorithm that is conceptually agnostic
about the size of stdx::(native_)simd.  When using SVE, we also have a
target architecture that is agnostic about the runtime length of the
vectors.  This means that the length-agnostic algorithm could in
principle be realised by a single piece of length-agnostic SVE code.
However, the type system explicitly prevents this by forcing a length
to be chosen at compile time, even though neither the algorithm nor the
object code require that.

If an algorithm is sufficiently general that it can cope with any
vector length, then IMO it's counterproductive to write it in a
way that explicitly divides the problem up into constant-size chunks.
Case B in general requires a high degree of data parallelism and IMO
it would be better to describe that parallelism directly.
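
To make the contrast concrete, here is a hedged sketch in plain C++ (axpy_chunked and axpy_direct are illustrative names, not stdx::simd API). The first form bakes a chunk width into the code; the second only states that each element is independent and leaves the vector length to the compiler.

```cpp
#include <cassert>
#include <cstddef>

// Chunked form: the chunk width W is fixed in the source and object code,
// much like an algorithm written against a fixed-size vector type.
template <std::size_t W>
void axpy_chunked(float a, const float* x, float* y, std::size_t n) {
  std::size_t i = 0;
  for (; i + W <= n; i += W)
    for (std::size_t l = 0; l < W; ++l) y[i + l] += a * x[i + l];
  for (; i < n; ++i) y[i] += a * x[i];  // remainder elements
}

// Direct form: expresses only that every element is independent, so a
// vectoriser may choose any vector length, or stay agnostic about it.
inline void axpy_direct(float a, const float* x, float* y, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) y[i] += a * x[i];
}
```

Both functions compute the same result; the difference is only in how much the source commits to a particular width.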

>> My plan for SVE (and RISC-V V) was that in the first iteration you have to
>> decide at compile time what vector width you want to compile for. My
>> understanding, where SVE will be usable, is that you typically have a
>> homogeneous cluster or a single machine where you can make this work. In
>> engineering and research settings it is often possible to compile for the
>> machine you'll work on. The problem becomes harder once a company wants to
>> release binaries that are supposed to work on a wide range of customer setups.
>> With this model you'd have one binary per supported SVE vector width or it
>> would require "fat" binaries that contain a combination of many vector widths.

I think using fat binaries for this use case is a non-starter though.
It would increase the size of binaries fivefold.  It also feels like the
tail wagging the dog: as mentioned above, the architecture itself is
designed around having one piece of length-agnostic code, and it's the
type system (rather than the architecture) that is preventing that from
happening.

To put it another way: if the stdx::simd code was written as normal
scalar code, there would be no need to use fat binaries in this way.
So I think using stdx::simd would in that sense be a regression
for this use case.

Having the option to use -msve-vector-bits=N is useful in principle,
but we shouldn't require it.  In an OS distro setting,
-msve-vector-bits= is going to be very much the exception rather
than the rule.  I think we should treat it as a niche option for
power users only.

>> But there's more we could do if the library and compiler work together. Like
>> compile SVE code in such a way that data structures are "compiled" for many
>> different vector widths while algorithms that work on the data structures are
>> compiled in a vector width agnostic way. The novel (AFAIK) problem to solve is
>> how to dispatch and combine these. Probably the best way to make progress is
>> to compile everything for multiple vector widths, taking care to not use the
>> knowledge about vector widths wherever possible, and then try to shrink the
>> resulting "fat" binary by eliminating the resulting duplicated machine code
>> regions. How this can move up to higher abstraction levels in the compiler, I
>> have no idea...

I'm not sure that's feasible though.  Like you say, sizeof(std::simd<T>)
is just a constant like any other.  It can in principle leak anywhere,
including into the size of static data (which is harder to version than
functions).
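
A small sketch of how the size can leak (vec, scratch and floats_per_vector are hypothetical names; Bytes plays the role of sizeof(simd<T>)):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-width vector; Bytes stands in for sizeof(simd<T>).
template <std::size_t Bytes>
struct vec {
  unsigned char storage[Bytes];
};

// The vector size leaks into the size of a static object ...
template <class V>
struct scratch {
  static unsigned char buffer[8 * sizeof(V)];
};
template <class V>
unsigned char scratch<V>::buffer[8 * sizeof(V)];

// ... and into ordinary constants that other code may store or compare.
template <class V>
constexpr std::size_t floats_per_vector() {
  return sizeof(V) / sizeof(float);
}
```

Each width yields a differently sized static buffer, so versioning functions alone would not be enough.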

>> So -msve-vector-bits=<some number> might be a prerequisite for the first
>> stdx::simd implementation.
> Hi Matthias,
> Thanks for the explanation, actually -msve-vector-bits=128*N is the
> method I am using in my prototype implementation.  Considering SIMD
> library itself guarantees source code level portability, it might be
> fine to do this in the first version implementation.  Although here we
> are breaking sub-target binary portability which SVE tries to achieve.

Yeah, the last bit is my concern too.

I think the way to handle stdx::simd for SVE is to implement stdx::simd
for arm_neon.h vector types and use SVE to accelerate 64-bit and 128-bit
operations that arm_neon.h can't do as efficiently.  Any SVE implementation
can operate on 64-bit and 128-bit vectors by using an appropriate predicate.
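
As a rough illustration of the predicate idea, assuming the ACLE SVE intrinsics from arm_sve.h: svptrue_pat_b32(SV_VL4) activates exactly the first four 32-bit lanes, i.e. 128 bits, whatever the hardware vector length. The function name add4 and the scalar fallback for non-SVE targets are assumptions of this sketch.

```cpp
#include <cassert>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
// Adds four floats using SVE with only the first four 32-bit lanes active.
inline void add4(const float* a, const float* b, float* out) {
  const svbool_t pg = svptrue_pat_b32(SV_VL4);  // 4 x 32 bits = 128 bits
  const svfloat32_t va = svld1_f32(pg, a);      // inactive lanes not loaded
  const svfloat32_t vb = svld1_f32(pg, b);
  svst1_f32(pg, out, svadd_f32_x(pg, va, vb));  // only 4 lanes are stored
}
#else
// Fallback so the sketch builds on targets without SVE.
inline void add4(const float* a, const float* b, float* out) {
  for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];
}
#endif
```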

Then we should try to get the compiler to revectorise stdx::simd
algorithms to take advantage of length-agnosticism where possible.
We could do this by scalarising the stdx::simd code and then vectorising
it in the same way as “normal” scalar code.  Alternatively, we could try
to vectorise the existing vector code directly: convert operations on
single 128-bit vectors into operations on multiple 128-bit vectors.

My worry is that this might be harder to do than it would be on the
equivalent scalar code.

Thanks,
Richard


* Re: Looking for status of SIMD implementation
  2021-06-29 13:57         ` Richard Sandiford
@ 2021-06-30  6:30           ` Bin.Cheng
  2021-06-30  7:22             ` Matthias Kretz
  0 siblings, 1 reply; 6+ messages in thread
From: Bin.Cheng @ 2021-06-30  6:30 UTC (permalink / raw)
  To: Bin.Cheng, Matthias Kretz, libstdc++, Richard Sandiford, JunMa

On Tue, Jun 29, 2021 at 9:57 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> "Bin.Cheng" <amker.cheng@gmail.com> writes:
> > On Thu, Jun 24, 2021 at 4:50 PM Matthias Kretz <m.kretz@gsi.de> wrote:
> >>
> >> CC'ing libstdc++, to discuss stdx::simd design in public.
> >>
> >> I discussed variable length vectors at length in the C++ committee and so far
> >> I've not seen any useful progress that could make it work for vector *types*.
> >> The main issue is that C++ itself would require support for runtime-sized
> >> types throughout the language. E.g.
> >>
> >> template <class T>
> >> struct Point {
> >>   T x, y, z;
> >> };
> >>
> >> Now:
> >> sizeof(Point<float>) == 3 * sizeof(float)
> >> sizeof(Point<simd<float>>) == 3 * sizeof(simd<float>)
> >> should both hold. And the expectation is that it holds at compile time.
> >>
> >> If sizeof(simd<float>) is not a constant expression this breaks a lot of
> >> assumptions compilers currently make when constructing types. Suddenly
> >> offsetof(Point<simd<float>>, y) also isn't a constant expression. This affects
> >> how members are accessed: everything must be computed at runtime or possibly
> >> cached somewhere.
> >>
> >> Since the simd<T> type's strength, compared to loop vectorization, is the
> >> ability to define data structures that depend on sizeof(simd<T>) - like
> >> Point<simd<float>> - it's a non-starter to propose that runtime sized simd<T>
> >> cannot be used to compose data structures anymore. In a way std::valarray is
> >> that type which works in C++: if you need something with a size that's only
> >> known at runtime you have to malloc the memory, you can't use the stack.
> >>
> >> So unless C++ solves the runtime sized types issue in general (basically
> >> support VLAs in C++), I don't see runtime sized std::simd happening.
>
> I agree with this FWIW.  But I think there's a risk that this strength
> could be counterproductive in some cases.
>
> Taking the easy case first: if an algorithm needs to operate on
> N-element vectors for some fixed N (e.g. because it's processing data
> that occurs in fixed-size chunks) then being able to compose those types
> in the same way as (say) a std::array<T, N> is clearly useful.  It also
> seems reasonable that the N-element types might occur in general data
> structures.  That part seems uncontroversial.
>
> But AIUI stdx::simd (and particularly stdx::native_simd) is also
> designed to be used by code that is agnostic about the length of the
> vectors.  Such code can adapt to whatever vectors the target provides.
> That's the case I want to talk about below (let's call it case B).
>
> It might just be a factor of who I've talked to, but I get the
> impression that people who want to use case B are accelerating
> the internals of an algorithm, rather than using vectors as part
> of the main interface or using vectors for long-term storage.
I can see possible issues in "case B" usage.  It puts a burden on end
programmers to better understand the parallelism in their
program/algorithm.  It could end up with various small vectorized loops
for different operations on "long-term" storage/data structures, which
requires quite powerful fusion capability from compilers.
From the name "SIMD", I tend to think its original motivation is to
provide a higher-level, more portable SIMD feature than
intrinsics/assembly (case A?).
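
A hedged sketch of the fusion concern in plain C++ (function names are illustrative): written as separate passes, each operation is its own loop with a temporary in memory; a compiler would have to fuse them to get the single-pass form.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused: each operation is its own pass, with a temporary in memory.
inline std::vector<float> scale_then_offset(const std::vector<float>& in,
                                            float s, float o) {
  std::vector<float> tmp(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) tmp[i] = in[i] * s;
  std::vector<float> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = tmp[i] + o;
  return out;
}

// Fused: one pass over the data and no temporary.
inline std::vector<float> scale_offset_fused(const std::vector<float>& in,
                                             float s, float o) {
  std::vector<float> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] * s + o;
  return out;
}
```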

>
> For case B we generally have an algorithm that is conceptually agnostic
> about the size of stdx::(native_)simd.  When using SVE, we also have a
> target architecture that is agnostic about the runtime length of the
> vectors.  This means that the length-agnostic algorithm could in
> principle be realised by a single piece of length-agnostic SVE code.
> However, the type system explicitly prevents this by forcing a length
> to be chosen at compile time, even though neither the algorithm nor the
> object code require that.
>
> If an algorithm is sufficiently general that it can cope with any
> vector length, then IMO it's counterproductive to write it in a
> way that explicitly divides the problem up into constant-size chunks.
> Case B in general requires a high degree of data parallelism and IMO
> it would be better to describe that parallelism directly.
>
> >> My plan for SVE (and RISC-V V) was that in the first iteration you have to
> >> decide at compile time what vector width you want to compile for. My
> >> understanding, where SVE will be usable, is that you typically have a
> >> homogeneous cluster or a single machine where you can make this work. In
> >> engineering and research settings it is often possible to compile for the
> >> machine you'll work on. The problem becomes harder once a company wants to
> >> release binaries that are supposed to work on a wide range of customer setups.
> >> With this model you'd have one binary per supported SVE vector width or it
> >> would require "fat" binaries that contain a combination of many vector widths.
>
> I think using fat binaries for this use case is a non-starter though.
> It would increase the size of binaries fivefold.  It also feels like the
> tail wagging the dog: as mentioned above, the architecture itself is
> designed around having one piece of length-agnostic code, and it's the
> type system (rather than the architecture) that is preventing that from
> happening.
>
> To put it another way: if the stdx::simd code was written as normal
> scalar code, there would be no need to use fat binaries in this way.
> So I think using stdx::simd would in that sense be a regression
> for this use case.
>
> Having the option to use -msve-vector-bits=N is useful in principle,
> but we shouldn't require it.  In an OS distro setting,
> -msve-vector-bits= is going to be very much the exception rather
> than the rule.  I think we should treat it as a niche option for
> power users only.
>
> >> But there's more we could do if the library and compiler work together. Like
> >> compile SVE code in such a way that data structures are "compiled" for many
> >> different vector widths while algorithms that work on the data structures are
> >> compiled in a vector width agnostic way. The novel (AFAIK) problem to solve is
> >> how to dispatch and combine these. Probably the best way to make progress is
> >> to compile everything for multiple vector widths, taking care to not use the
> >> knowledge about vector widths wherever possible, and then try to shrink the
> >> resulting "fat" binary by eliminating the resulting duplicated machine code
> >> regions. How this can move up to higher abstraction levels in the compiler, I
> >> have no idea...
>
> I'm not sure that's feasible though.  Like you say, sizeof(std::simd<T>)
> is just a constant like any other.  It can in principle leak anywhere,
> including into the size of static data (which is harder to version than
> functions).
>
> >> So -msve-vector-bits=<some number> might be a prerequisite for the first
> >> stdx::simd implementation.
> > Hi Matthias,
> > Thanks for the explanation, actually -msve-vector-bits=128*N is the
> > method I am using in my prototype implementation.  Considering SIMD
> > library itself guarantees source code level portability, it might be
> > fine to do this in the first version implementation.  Although here we
> > are breaking sub-target binary portability which SVE tries to achieve.
>
> Yeah, the last bit is my concern too.
>
> I think the way to handle stdx::simd for SVE is to implement stdx::simd
> for arm_neon.h vector types and use SVE to accelerate 64-bit and 128-bit
> operations that arm_neon.h can't do as efficiently.  Any SVE implementation
> can operate on 64-bit and 128-bit vectors by using an appropriate predicate.
>
> Then we should try to get the compiler to revectorise stdx::simd
> algorithms to take advantage of length-agnosticism where possible.
> We could do this by scalarising the stdx::simd code and then vectorising
> it in the same way as “normal” scalar code.  Alternatively, we could try
> to vectorise the existing vector code directly: convert operations on
> single 128-bit vectors into operations on multiple 128-bit vectors.
Hmm, it seems to me this will introduce a lot of work for compilers
(and not just one), even if it's practical.  Doesn't the fusion problem
also exist here?
Overall, we are not sure whether a high-level abstraction of parallelism
is stdx::simd's job; it looks more like a job for a DSL or high-level
programming language features (JunMa referred to something like Halide).
We are looking for comments on more possible use cases.

Thanks,
bin


* Re: Looking for status of SIMD implementation
  2021-06-30  6:30           ` Bin.Cheng
@ 2021-06-30  7:22             ` Matthias Kretz
  0 siblings, 0 replies; 6+ messages in thread
From: Matthias Kretz @ 2021-06-30  7:22 UTC (permalink / raw)
  To: Bin.Cheng, libstdc++, Richard Sandiford, JunMa

Thank you Richard for your insight. There's a lot to discuss but I'll just 
start with the question how general the expression of data-parallelism is 
(supposed to be) for stdx::simd. Most of my answer is written up in
https://wg21.link/p0349, "Assumptions about the size of datapar".

Note that we (SG1) called it datapar instead of simd, because the intent is to 
express data-parallelism and not necessarily SIMD hardware (i.e. an abstract 
view vs. a hardware-centric view). Oh, and I wrote this paper when SVE had not 
been announced yet, but you may notice that I knew what was coming. ;)

Oh, and also relevant for what stdx::simd wants to be:
https://wg21.link/p0851, "simd<T> is neither a product type nor a container 
type".

I'll try to answer more directly later.


