public inbox for libstdc++@gcc.gnu.org
From: Matthias Kretz <m.kretz@gsi.de>
To: <gcc-patches@gcc.gnu.org>
Cc: Srinivas Yadav <vasusrinivas.vasu14@gmail.com>,
	<gcc-patches@gcc.gnu.org>, <libstdc++@gcc.gnu.org>,
	<richard.sandiford@arm.com>, Andrew Pinski <pinskia@gmail.com>
Subject: Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd
Date: Thu, 18 Jan 2024 09:40:51 +0100	[thread overview]
Message-ID: <4123346.6PsWsQAL7t@centauriprime> (raw)
In-Reply-To: <CA+=Sn1nTL8LLLFNmj2797a5p-3WLCVuEXukoPXe=GykNGep2Vw@mail.gmail.com>

On Thursday, 18 January 2024 08:40:48 CET Andrew Pinski wrote:
> On Wed, Jan 17, 2024 at 11:28 PM Matthias Kretz <m.kretz@gsi.de> wrote:
> > template <typename T>
> > struct Point
> > {
> >   T x, y, z;
> >   
> >   T distance_to_origin() {
> >     return sqrt(x * x + y * y + z * z);
> >   }
> > };
> > 
> > Point<float> is one point in 3D space, Point<simd<float>> stores multiple
> > points in 3D space and can work on them in parallel.
> > 
> > This implies that simd<T> must have a sizeof. C++ is unlikely to get
> > sizeless types (the discussions were long, there were many papers, ...).
> > Should sizeless types in C++ ever happen, then composition is likely going
> > to be constrained to the last data member.
> 
> Even this is a bad design in general for simd. It means the code needs
> to know the size.

Yes and no. The developer writes size-agnostic code. The person compiling the 
code chooses the size (via -m flags) and thus the compiler sees fixed-size 
code.
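
To make this concrete, here is a minimal sketch (assuming the
<experimental/simd> header as shipped in libstdc++; the width of
native_simd<float> is whatever the -m flags select, nothing in the source
spells it out):

#include <experimental/simd>
#include <cmath>
namespace stdx = std::experimental;

template <typename T>
struct Point
{
  T x, y, z;

  T distance_to_origin() {
    using std::sqrt;                       // for scalar T
    return sqrt(x * x + y * y + z * z);    // ADL finds stdx::sqrt for simd T
  }
};

using ScalarPoint = Point<float>;                     // one point
using VectorPoint = Point<stdx::native_simd<float>>;  // 4 points with SSE2,
                                                      // 8 with AVX2, 16 with AVX-512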

> Also AoS vs SoA is always an interesting point here. In some cases you
> want an array of structs
> for speed and Point<simd<float>> does not work there at all. I guess
> this is all water under the bridge with how folks design code.
> You are basically pushing the AoSoA idea here, which is a much worse idea
> than before.

I like to call it "array of vectorized struct" (AoVS) instead of AoSoA to 
emphasize the compiler-flags-dependent memory layout.
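
As a hedged sketch of what I mean (the PointV name is just for
illustration):

#include <experimental/simd>
#include <vector>
namespace stdx = std::experimental;

using floatv = stdx::native_simd<float>;

struct PointV { floatv x, y, z; };  // one vectorized struct = floatv::size() points

std::vector<PointV> points;         // AoVS: contiguous chunks of vectorized points

The chunk size - and with it the exact memory layout - follows from the
compiler flags, not from the source: e.g. 4 points per PointV with SSE2, 16
with AVX-512.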

I've been doing a lot of heterogeneous SIMD programming since 2009, starting 
with outer-loop vectorization across many TUs of a high-energy-physics code 
targeting Intel Larrabee (pre-AVX-512 ISA) and SSE2 from a single source. In 
all these years my experience has been that, if the problem allows, AoVS is 
best in terms of performance, code generality, and readability. I'd be 
interested to learn why you think differently.

> That being said, sometimes it is not one vector of N elements you want to
> work on but rather 1/2/3 vectors of N elements. Seems like this is
> just pushing the idea of one vector of one type of element, which
> again is the wrong push.

I might have misunderstood. You're saying that sometimes I want a <float, 8> 
even though my target CPU only has <float, 4> registers? Yes! The 
std::experimental::simd spec and implementation aren't good enough in that 
area yet, but the C++26 paper(s) and my prototype implementation provide 
perfect SIMD + ILP translation of the expressed data-parallelism.
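
A small sketch of what I mean (fixed_size_simd is part of the TS; how
efficiently it is translated today is exactly the "not good enough yet"
part):

#include <experimental/simd>
namespace stdx = std::experimental;

using float8 = stdx::fixed_size_simd<float, 8>;  // 8 lanes, independent of the ISA

float8 axpy(float8 x, float8 y, float a)
{
  // On a 4-wide target this ideally lowers to two vector mul/add (or FMA)
  // operations: SIMD + ILP.
  return float8(a) * x + y;
}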

> Moreover, I guess pushing one idea of SIMD is worse than pushing
> any idea of SIMD. For mathematical code, it is better for the compiler
> to do the vectorization than for the user to try to be semi-portable
> between different targets.

I guess I agree with that statement. But I wouldn't, in general, call the use 
of simd<T> "the user trying to be semi-portable". In my experience working on 
physics code - a lot of math - using simd<T> (as intended) is better in terms 
of performance and performance portability. As always, abuse is possible 
...

> This is what was learned from Fortran, but I guess
> some folks in the C++ world like to expose the underlying HW instead of
> thinking high-level here.

The C++ approach is to "leave no room for a lower-level language" while 
designing for high-level abstractions / usage.

> > With the above as our design constraints, SVE at first seems to be a bad
> > fit for implementing std::simd. However, if (at least initially) we accept
> > the need for different binaries for different SVE implementations, then
> > you
> > can look at the "scalable" part of SVE as an efficient way of reducing the
> > number of opcodes necessary for supporting all kinds of different vector
> > lengths. But otherwise you can treat it as fixed-size registers - which it
> > is for a given CPU. In the case of a multi-CPU shared-memory system (e.g.
> > RDMA between different ARM implementations) all you need is a different
> > name for incompatible types. So std::simd<float> on SVE256 must have a
> > different name than on SVE512. Same for std::simd<float, 8> (which is
> > currently not the case with Srinivas's patch, I think, and needs to be
> > resolved).
> 
> For SVE that is a bad design. It means the code is not portable at all.

When you say "code" you mean "source code", not binaries, right? I don't see 
how that follows.
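
To illustrate with a hedged example (assuming the patch under review makes
native_simd<float> use SVE registers): compiling

#include <experimental/simd>
namespace stdx = std::experimental;

float sum(const stdx::native_simd<float>& v)
{
  return stdx::reduce(v);   // the lane count is baked in at compile time
}

once with -march=armv8-a+sve -msve-vector-bits=256 and once with
-msve-vector-bits=512 gives 8 lanes in the first build and 16 in the second.
The same source produces two incompatible types; mangling them differently
turns a silent ABI mismatch into a link/load error.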

- Matthias

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research               https://gsi.de
 std::simd
──────────────────────────────────────────────────────────────────────────





Thread overview: 21+ messages
2023-11-24 15:59 Srinivas Yadav
2023-12-10 13:29 ` Richard Sandiford
2023-12-11 11:02   ` Richard Sandiford
2024-01-04  7:42   ` Srinivas Yadav
2024-01-04  9:10     ` Andrew Pinski
2024-01-18  7:27       ` Matthias Kretz
2024-01-18  7:40         ` Andrew Pinski
2024-01-18  8:40           ` Matthias Kretz [this message]
2024-01-18  6:54   ` Matthias Kretz
2024-01-23 20:57     ` Richard Sandiford
2024-03-27 11:53       ` Matthias Kretz
2024-03-27 13:34         ` Richard Sandiford
2024-03-28 14:48           ` Matthias Kretz
2024-02-09 14:28   ` [PATCH v2] " Srinivas Yadav Singanaboina
2024-03-08  9:57     ` Matthias Kretz
2024-03-27  9:50       ` Jonathan Wakely
2024-03-27 10:07         ` Richard Sandiford
2024-03-27 10:30           ` Matthias Kretz
2024-03-27 12:13             ` Richard Sandiford
2024-03-27 12:47               ` Jonathan Wakely
2024-03-27 14:18         ` Matthias Kretz
