From: Richard Sandiford <richard.sandiford@arm.com>
To: Matthias Kretz <m.kretz@gsi.de>
Cc: Srinivas Yadav <vasusrinivas.vasu14@gmail.com>,
<libstdc++@gcc.gnu.org>, <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd
Date: Wed, 27 Mar 2024 13:34:52 +0000 [thread overview]
Message-ID: <mpt4jcr4x37.fsf@arm.com> (raw)
In-Reply-To: <4306399.Mh6RI2rZIc@excalibur> (Matthias Kretz's message of "Wed, 27 Mar 2024 12:53:00 +0100")
Matthias Kretz <m.kretz@gsi.de> writes:
> Hi Richard,
>
> sorry for not answering sooner. I took action on your mail but failed to also
> give feedback. Now in light of your veto of Srinivas patch I wanted to use the
> opportunity to pick this up again.
>
> On Dienstag, 23. Januar 2024 21:57:23 CET Richard Sandiford wrote:
>> However, we also support different vector lengths for streaming SVE
>> (running in "streaming" mode on SME) and non-streaming SVE (running
>> in "non-streaming" mode on the core). Having two different lengths is
>> expected to be the common case, rather than a theoretical curiosity.
>
> I read up on this after you mentioned this for the first time. As a WG21
> member I find the approach troublesome - but that's a bit off-topic for this
> thread.
>
> The big issue here is that, IIUC, a user (and the simd library) cannot do the
> right thing at the moment. There simply isn't enough context information
> available when parsing the <experimental/simd> header. I.e. on definition of
> the class template there's no facility to take target_clones or SME
> "streaming" mode into account. Consequently, if we want the library to be fit
> for SME, then we need more language extension(s) to make it work.
Yeah. I think the same applies to plain SVE. It seems reasonable to
have functions whose implementation is specialised for a specific SVE
length, with that function being selected at runtime where appropriate.
Those functions needn't (in principle) be in separate TUs. The “best”
definition of native<float> then becomes a per-function property rather
than a per-TU property.
As you note later, I think the same thing would apply to x86_64.
> I guess I'm looking for a way to declare types that are different depending on
> whether they are used in streaming mode or non-streaming mode (making them
> ill-formed to use in functions marked arm_streaming_compatible).
>
> From reading through https://arm-software.github.io/acle/main/
> acle.html#controlling-the-use-of-streaming-mode I don't see any discussion of
> member functions or ctor/dtor, static and non-static data members, etc.
>
> The big issue I see here is that currently all of std::* is declared without a
> arm_streaming or arm_streaming_compatible. Thus, IIUC, you can't use anything
> from the standard library in streaming mode. Since that also applies to
> std::experimental::simd, we're not creating a new footgun, only missing out on
> potential users?
Kind-of. However, we can inline a non-streaming function into a streaming
function if that doesn't change defined behaviour. And that's important
in practice for C++, since most trivial inline functions will not be
marked streaming-compatible despite being so in practice.
It's UB to pass and return SVE vectors across streaming/non-streaming
boundaries unless the two VLs are equal. It's therefore valid to inline
such functions into streaming functions *unless* the callee uses
non-streaming-only instructions such as gather loads.
Because of that, someone trying to use std::experimenal::simd in SME
functions is likely to succeed, at least in simple cases.
> Some more thoughts on target_clones/streaming SVE language extension
> evolution:
>
> void nonstreaming_fn(void) {
> constexpr int width = __arm_sve_bits(); // e.g. 512
> constexpr int width2 = __builtin_vector_size(); // e.g. 64 (the
> // vector_size attribute works with bytes, not bits)
> }
>
> __attribute__((arm_locally_streaming))
> void streaming_fn(void) {
> constexpr int width = __arm_sve_bits(); // e.g. 128
> constexpr int width2 = __builtin_vector_size(); // e.g. 16
> }
>
> __attribute__((target_clones("sse4.2,avx2")))
> void streaming_fn(void) {
> constexpr int width = __builtin_vector_size(); // 16 in the sse4.2 clone
> // and 32 in the avx2 clone
> }
>
> ... as a starting point for exploration. Given this, I'd still have to resort
> to a macro to define a "native" simd type:
>
> #define NATIVE_SIMD(T) std::experimental::simd<T, _SveAbi<__arm_sve_bits() /
> CHAR_BITS, __arm_sve_bits() / CHAR_BITS>>
>
> Getting rid of the macro seems to be even harder.
Yeah. The constexprs in the AArch64 functions would only be compile-time
constants in to-be-defined circumstances, using some mechanism to specify
the streaming and non-streaming vector lengths at compile time. But that's
a premise of the whole discussion, just noting it for the record in case
anyone reading this later jumps in at this point.
> A declaration of an alias like
>
> template <typename T>
> using SveSimd = std::experimental::simd<T, _SveAbi<__arm_sve_bits() /
> CHAR_BITS, __arm_sve_bits() / CHAR_BITS>>;
>
> would have to delay "invoking" __arm_sve_bits() until it knows its context:
>
> void nonstreaming_fn(void) {
> static_assert(sizeof(SveSimd<float>) == 64);
> }
>
> __attribute__((arm_locally_streaming))
> void streaming_fn(void) {
> static_assert(sizeof(SveSimd<float>) == 16);
> nonstreaming_fn(); // fine
> }
>
> This gets even worse for target_clones, where
>
> void f() {
> sizeof(std::simd<float>) == ?
> }
>
> __attribute__((target_clones("sse4.2,avx2")))
> void g() {
> f();
> }
>
> the compiler *must* virally apply target_clones to all functions it calls. And
> member functions must either also get cloned as functions, or the whole type
> must be cloned (as in the std::simd case, where the sizeof needs to change). 😳
Yeah, tricky :)
It's also not just about vector widths. The target-clones case also has
the problem that you cannot detect at include time which features are
available. E.g. “do I have SVE2-specific instructions?” becomes a
contextual question rather than a global question.
Fortunately, this should just be a missed optimisation. But it would be
nice if uses of std::simd in SVE2 clones could take advantage of SVE2-only
instructions, even if SVE2 wasn't enabled at include time.
Thanks for the other (snipped) clarifications. They were really helpful,
but I didn't have anything to add.
Richard
next prev parent reply other threads:[~2024-03-27 13:34 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-24 15:59 Srinivas Yadav
2023-12-10 13:29 ` Richard Sandiford
2023-12-11 11:02 ` Richard Sandiford
2024-01-04 7:42 ` Srinivas Yadav
2024-01-04 9:10 ` Andrew Pinski
2024-01-18 7:27 ` Matthias Kretz
2024-01-18 7:40 ` Andrew Pinski
2024-01-18 8:40 ` Matthias Kretz
2024-01-18 6:54 ` Matthias Kretz
2024-01-23 20:57 ` Richard Sandiford
2024-03-27 11:53 ` Matthias Kretz
2024-03-27 13:34 ` Richard Sandiford [this message]
2024-03-28 14:48 ` Matthias Kretz
2024-02-09 14:28 ` [PATCH v2] " Srinivas Yadav Singanaboina
2024-03-08 9:57 ` Matthias Kretz
2024-03-27 9:50 ` Jonathan Wakely
2024-03-27 10:07 ` Richard Sandiford
2024-03-27 10:30 ` Matthias Kretz
2024-03-27 12:13 ` Richard Sandiford
2024-03-27 12:47 ` Jonathan Wakely
2024-03-27 14:18 ` Matthias Kretz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mpt4jcr4x37.fsf@arm.com \
--to=richard.sandiford@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=libstdc++@gcc.gnu.org \
--cc=m.kretz@gsi.de \
--cc=vasusrinivas.vasu14@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).