Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd

public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Sandiford <richard.sandiford@arm.com>
To: Matthias Kretz <m.kretz@gsi.de>
Cc: Srinivas Yadav <vasusrinivas.vasu14@gmail.com>,
	 <libstdc++@gcc.gnu.org>,  <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd
Date: Wed, 27 Mar 2024 13:34:52 +0000	[thread overview]
Message-ID: <mpt4jcr4x37.fsf@arm.com> (raw)
In-Reply-To: <4306399.Mh6RI2rZIc@excalibur> (Matthias Kretz's message of "Wed, 27 Mar 2024 12:53:00 +0100")

Matthias Kretz <m.kretz@gsi.de> writes:
> Hi Richard,
>
> sorry for not answering sooner. I took action on your mail but failed to also 
> give feedback. Now in light of your veto of Srinivas patch I wanted to use the 
> opportunity to pick this up again.
>
> On Dienstag, 23. Januar 2024 21:57:23 CET Richard Sandiford wrote:
>> However, we also support different vector lengths for streaming SVE
>> (running in "streaming" mode on SME) and non-streaming SVE (running
>> in "non-streaming" mode on the core).  Having two different lengths is
>> expected to be the common case, rather than a theoretical curiosity.
>
> I read up on this after you mentioned this for the first time. As a WG21 
> member I find the approach troublesome - but that's a bit off-topic for this 
> thread.
>
> The big issue here is that, IIUC, a user (and the simd library) cannot do the 
> right thing at the moment. There simply isn't enough context information 
> available when parsing the <experimental/simd> header. I.e. on definition of 
> the class template there's no facility to take target_clones or SME 
> "streaming" mode into account. Consequently, if we want the library to be fit 
> for SME, then we need more language extension(s) to make it work.

Yeah.  I think the same applies to plain SVE.  It seems reasonable to
have functions whose implementation is specialised for a specific SVE
length, with that function being selected at runtime where appropriate.
Those functions needn't (in principle) be in separate TUs.  The “best”
definition of native<float> then becomes a per-function property rather
than a per-TU property.

As you note later, I think the same thing would apply to x86_64.

> I guess I'm looking for a way to declare types that are different depending on 
> whether they are used in streaming mode or non-streaming mode (making them 
> ill-formed to use in functions marked arm_streaming_compatible).
>
> From reading through https://arm-software.github.io/acle/main/
> acle.html#controlling-the-use-of-streaming-mode I don't see any discussion of 
> member functions or ctor/dtor, static and non-static data members, etc.
>
> The big issue I see here is that currently all of std::* is declared without a 
> arm_streaming or arm_streaming_compatible. Thus, IIUC, you can't use anything 
> from the standard library in streaming mode. Since that also applies to 
> std::experimental::simd, we're not creating a new footgun, only missing out on 
> potential users?

Kind-of.  However, we can inline a non-streaming function into a streaming
function if that doesn't change defined behaviour.  And that's important
in practice for C++, since most trivial inline functions will not be
marked streaming-compatible despite being so in practice.

It's UB to pass and return SVE vectors across streaming/non-streaming
boundaries unless the two VLs are equal.  It's therefore valid to inline
such functions into streaming functions *unless* the callee uses
non-streaming-only instructions such as gather loads.

Because of that, someone trying to use std::experimenal::simd in SME
functions is likely to succeed, at least in simple cases.

> Some more thoughts on target_clones/streaming SVE language extension 
> evolution:
>
>   void nonstreaming_fn(void) {
>     constexpr int width = __arm_sve_bits(); // e.g. 512
>     constexpr int width2 = __builtin_vector_size(); // e.g. 64 (the
>       // vector_size attribute works with bytes, not bits)
>   }
>
>   __attribute__((arm_locally_streaming))
>   void streaming_fn(void) {
>     constexpr int width = __arm_sve_bits(); // e.g. 128
>     constexpr int width2 = __builtin_vector_size(); // e.g. 16
>   }
>
>   __attribute__((target_clones("sse4.2,avx2")))
>   void streaming_fn(void) {
>     constexpr int width = __builtin_vector_size(); // 16 in the sse4.2 clone
>       // and 32 in the avx2 clone
>   }
>
> ... as a starting point for exploration. Given this, I'd still have to resort 
> to a macro to define a "native" simd type:
>
> #define NATIVE_SIMD(T) std::experimental::simd<T, _SveAbi<__arm_sve_bits() / 
> CHAR_BITS, __arm_sve_bits() / CHAR_BITS>>
>
> Getting rid of the macro seems to be even harder.

Yeah.  The constexprs in the AArch64 functions would only be compile-time
constants in to-be-defined circumstances, using some mechanism to specify
the streaming and non-streaming vector lengths at compile time.  But that's
a premise of the whole discussion, just noting it for the record in case
anyone reading this later jumps in at this point.

> A declaration of an alias like
>
> template <typename T>
> using SveSimd = std::experimental::simd<T, _SveAbi<__arm_sve_bits() / 
> CHAR_BITS, __arm_sve_bits() / CHAR_BITS>>;
>
> would have to delay "invoking" __arm_sve_bits() until it knows its context:
>
>   void nonstreaming_fn(void) {
>     static_assert(sizeof(SveSimd<float>) == 64);
>   }
>
>   __attribute__((arm_locally_streaming))
>   void streaming_fn(void) {
>     static_assert(sizeof(SveSimd<float>) == 16);
>     nonstreaming_fn(); // fine
>   }
>
> This gets even worse for target_clones, where
>
>   void f() {
>     sizeof(std::simd<float>) == ?
>   }
>
>   __attribute__((target_clones("sse4.2,avx2")))
>   void g() {
>     f();
>   }
>
> the compiler *must* virally apply target_clones to all functions it calls. And 
> member functions must either also get cloned as functions, or the whole type 
> must be cloned (as in the std::simd case, where the sizeof needs to change). 😳

Yeah, tricky :)

It's also not just about vector widths.  The target-clones case also has
the problem that you cannot detect at include time which features are
available.  E.g. “do I have SVE2-specific instructions?” becomes a
contextual question rather than a global question.

Fortunately, this should just be a missed optimisation.  But it would be
nice if uses of std::simd in SVE2 clones could take advantage of SVE2-only
instructions, even if SVE2 wasn't enabled at include time.

Thanks for the other (snipped) clarifications.  They were really helpful,
but I didn't have anything to add.

Richard

next prev parent reply	other threads:[~2024-03-27 13:34 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-24 15:59 Srinivas Yadav
2023-12-10 13:29 ` Richard Sandiford
2023-12-11 11:02   ` Richard Sandiford
2024-01-04  7:42   ` Srinivas Yadav
2024-01-04  9:10     ` Andrew Pinski
2024-01-18  7:27       ` Matthias Kretz
2024-01-18  7:40         ` Andrew Pinski
2024-01-18  8:40           ` Matthias Kretz
2024-01-18  6:54   ` Matthias Kretz
2024-01-23 20:57     ` Richard Sandiford
2024-03-27 11:53       ` Matthias Kretz
2024-03-27 13:34         ` Richard Sandiford [this message]
2024-03-28 14:48           ` Matthias Kretz
2024-02-09 14:28   ` [PATCH v2] " Srinivas Yadav Singanaboina
2024-03-08  9:57     ` Matthias Kretz
2024-03-27  9:50       ` Jonathan Wakely
2024-03-27 10:07         ` Richard Sandiford
2024-03-27 10:30           ` Matthias Kretz
2024-03-27 12:13             ` Richard Sandiford
2024-03-27 12:47               ` Jonathan Wakely
2024-03-27 14:18         ` Matthias Kretz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mpt4jcr4x37.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=libstdc++@gcc.gnu.org \
    --cc=m.kretz@gsi.de \
    --cc=vasusrinivas.vasu14@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).