From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=m8Ns=I4=gsi.de=M.Kretz@sourceware.org>
Received: from lxmtout1.gsi.de (lxmtout1.gsi.de [140.181.3.111])
	by sourceware.org (Postfix) with ESMTPS id ADFC13858C56;
	Thu, 18 Jan 2024 07:28:09 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org ADFC13858C56
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gsi.de
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gsi.de
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org ADFC13858C56
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=140.181.3.111
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705562891; cv=none;
	b=HROFIy+BULHqhkcisnNyWiQNJETlqizU7wX606U5hkHSeTER8RNp5cxsyD0Z8Clr1hXmFH8zfvFI8gy8cVnC/uZferFldBe0mYa+rya02fs7TVA9KPboiPB+kD3eSTigVyt0Zb2R/INOBj26B4vTCQnJkRUxJKpvXs7rRMZFXT8=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1705562891; c=relaxed/simple;
	bh=mwsryTfH1TU7xQl7qaQe5XhFhmfUe1kuo/Oq2BBnjFs=;
	h=From:To:Subject:Date:Message-ID:MIME-Version; b=Lu2WpbT3hAtpv9tfvjSMitgzEpbTdj4tNADvAyrm/QCqb2yHlGINf5glzfo2EsT/8BhINc7So1TtrWVme2KKhVaSRy475BBCrkHFLlBle8Z4m6gDEmSPSBbXLzTRw2viEAbPkBxB4ORiGPK7qzAYQh60cG7lJdkhVR5ZNTUuP/4=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: from localhost (localhost [127.0.0.1])
	by lxmtout1.gsi.de (Postfix) with ESMTP id D77612051049;
	Thu, 18 Jan 2024 08:28:08 +0100 (CET)
X-Virus-Scanned: Debian amavisd-new at lxmtout1.gsi.de
Received: from lxmtout1.gsi.de ([127.0.0.1])
	by localhost (lxmtout1.gsi.de [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id 2nV1xDBVSsI4; Thu, 18 Jan 2024 08:28:08 +0100 (CET)
Received: from srvEX6.campus.gsi.de (unknown [10.10.4.96])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by lxmtout1.gsi.de (Postfix) with ESMTPS id BD8A82051040;
	Thu, 18 Jan 2024 08:28:08 +0100 (CET)
Received: from centauriprime.localnet (140.181.3.12) by srvEX6.campus.gsi.de
 (10.10.4.96) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.40; Thu, 18 Jan
 2024 08:28:08 +0100
From: Matthias Kretz <m.kretz@gsi.de>
To: Srinivas Yadav <vasusrinivas.vasu14@gmail.com>, <libstdc++@gcc.gnu.org>,
	<gcc-patches@gcc.gnu.org>, <richard.sandiford@arm.com>
Subject: Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd
Date: Thu, 18 Jan 2024 07:54:32 +0100
Message-ID: <3296207.VqM8IeB0Os@centauriprime>
Organization: GSI Helmholtz Centre for Heavy Ion Research
In-Reply-To: <mpt8r62kwxi.fsf@arm.com>
References: <CAADZLq108u8Td2Fc==ToN24wKE7+bdACUPfi8cVVVwv=uwYxXQ@mail.gmail.com>
 <mpt8r62kwxi.fsf@arm.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="UTF-8"
X-Originating-IP: [140.181.3.12]
X-ClientProxiedBy: srvEX8.Campus.gsi.de (10.10.4.160) To srvEX6.campus.gsi.de
 (10.10.4.96)
X-Spam-Status: No, score=0.1 required=5.0 tests=BAYES_00,BODY_8BITS,INDUSTRIAL_BODY,KAM_DMARC_STATUS,SPF_HELO_PASS,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,URIBL_SBL_A autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <libstdc++.gcc.gnu.org>

On Sunday, 10 December 2023 14:29:45 CET Richard Sandiford wrote:
> Thanks for the patch and sorry for the slow review.

Sorry for my slow reaction. I needed a long vacation. For now I'll focus on
the design question wrt. multi-arch compilation.

> I can only comment on the usage of SVE, rather than on the scaffolding
> around it.  Hopefully Jonathan or others can comment on the rest.

That's very useful!

> The main thing that worries me is:
>
> #if _GLIBCXX_SIMD_HAVE_SVE
> constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS/8;
> #else
> constexpr inline int __sve_vectorized_size_bytes = 0;
> #endif
>
> Although -msve-vector-bits is currently a per-TU setting, that isn't
> necessarily going to be the case in future.

This is a topic that I care about a lot... as a simd user, implementer, and
WG21 proposal author. Are you thinking of a plan to implement the
target_clones function attribute for different SVE lengths? Or does it go
further than that? PR83875 raises the same issue and solution ideas for
x86. If I understand your concern correctly, then the issue you're raising
exists in the same form for x86.

If anyone is interested in working on a "translation phase 7 replacement"
for compiler-flag macros, I'd be happy to give some input on what I believe
is necessary to make target_clones work with std(x)::simd. This seems to be
about constant expressions that return compiler-internal state, probably
similar to how static reflection needs to work.

For a sketch of a direction: what I'm already doing in
std::experimental::simd is to tag all non-always_inline function names with
a bitflag representing a relevant subset of -f and -m flags. That way, I'm
guarding against surprises when linking TUs compiled with different flags.
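
To make the tagging idea concrete, here is a minimal sketch. It is not the
actual libstdc++ code: machine_flags, flags_tag, odr_tag and add_one are
hypothetical names, and the selection of macros is only an assumption about
what a "relevant subset" could look like.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: fold a few predefined target/FP macros into one integer.
// The set of macros checked here is illustrative, not exhaustive.
constexpr std::uint64_t machine_flags()
{
  std::uint64_t bits = 0;
#ifdef __SSE2__
  bits |= std::uint64_t{1} << 0;
#endif
#ifdef __AVX__
  bits |= std::uint64_t{1} << 1;
#endif
#ifdef __AVX2__
  bits |= std::uint64_t{1} << 2;
#endif
#ifdef __AVX512F__
  bits |= std::uint64_t{1} << 3;
#endif
#ifdef __ARM_NEON
  bits |= std::uint64_t{1} << 4;
#endif
#ifdef __ARM_FEATURE_SVE
  bits |= std::uint64_t{1} << 5;
#endif
#ifdef __FAST_MATH__
  bits |= std::uint64_t{1} << 6;
#endif
  return bits;
}

// A distinct type per flag combination.
template <std::uint64_t Flags>
struct flags_tag {};

// The tag for the flags this TU is compiled with.
using odr_tag = flags_tag<machine_flags()>;

// Different flag sets yield different types, hence different mangled names.
static_assert(!std::is_same<flags_tag<0>, flags_tag<1>>::value,
              "tags for different flag sets must be distinct types");

// The defaulted template parameter is part of the function's signature, so
// two TUs built with different flags instantiate differently named symbols.
template <typename T, typename = odr_tag>
T add_one(T x)
{
  return x + 1;
}
```

A call such as add_one(41) picks up odr_tag implicitly, so callers do not see
the tag; only the symbol name changes with the flags.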

> Ideally it would be
> possible to define different implementations of a function with
> different (fixed) vector lengths within the same TU.  The value at
> the point that the header file is included is then somewhat arbitrary.
>
> So rather than have:
> >  using __neon128 = _Neon<16>;
> >  using __neon64 = _Neon<8>;
> >
> > +using __sve = _Sve<>;
>
> would it be possible instead to have:
>
>   using __sve128 = _Sve<128>;
>   using __sve256 = _Sve<256>;
>   ...etc...
>
> ?  Code specialised for 128-bit vectors could then use __sve128 and
> code specialised for 256-bit vectors could use __sve256.

Hmm, as things stand we'd need two numbers, IIUC:
_Sve<NumberOfUsedBytes, SizeofRegister>

On x86, "NumberOfUsedBytes" is sufficient, because 33-64 implies zmm
registers (and -mavx512f), 17-32 implies ymm, and <=16 implies xmm (except
where it doesn't ;) ).
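
A rough sketch of that difference, ignoring the exceptions just mentioned.
The names x86_reg, x86_register_for and sve_abi are hypothetical and only
illustrate why one number suffices on x86 while SVE would need two:

```cpp
#include <cstddef>

// Hypothetical x86 register classes.
enum class x86_reg { xmm, ymm, zmm };

// On x86 the number of used bytes alone selects the register class.
constexpr x86_reg x86_register_for(std::size_t used_bytes)
{
  return used_bytes <= 16 ? x86_reg::xmm
       : used_bytes <= 32 ? x86_reg::ymm
                          : x86_reg::zmm;
}

// For SVE the register size is not implied by the used bytes, so a
// hypothetical ABI tag would have to carry both numbers.
template <std::size_t UsedBytes, std::size_t RegisterBytes>
struct sve_abi
{
  static_assert(UsedBytes <= RegisterBytes,
                "cannot use more bytes than the register holds");
  static constexpr std::size_t used_bytes = UsedBytes;
  static constexpr std::size_t register_bytes = RegisterBytes;
};
```

With this shape, sve_abi<16, 32> and sve_abi<16, 64> are different types even
though both use 16 bytes.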

> Perhaps that's not possible as things stand, but it'd be interesting
> to know the exact failure mode if so.  Either way, it would be good to
> remove direct uses of __ARM_FEATURE_SVE_BITS from simd_sve.h if possible,
> and instead rely on information derived from template parameters.

The TS spec requires std::experimental::native_simd<int> to basically give
you the largest, most efficient, full SIMD register. (And it can't be a
sizeless type because they don't exist in C++). So how would you do that
without looking at __ARM_FEATURE_SVE_BITS in the simd implementation?
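
To illustrate the constraint: the lane count of a native type has to be a
constant expression fixed when the header is parsed. The following is a
simplified sketch, not the libstdc++ implementation. native_register_bytes
and native_lanes are hypothetical names, the fallback widths are rough
assumptions, and I'm assuming __ARM_FEATURE_SVE_BITS is either undefined or
0 when the vector length is not fixed.

```cpp
// Pick a register width from predefined macros at the point of inclusion.
#if defined(__ARM_FEATURE_SVE_BITS) && __ARM_FEATURE_SVE_BITS > 0
constexpr int native_register_bytes = __ARM_FEATURE_SVE_BITS / 8;
#elif defined(__AVX512F__)
constexpr int native_register_bytes = 64;
#elif defined(__AVX2__)
constexpr int native_register_bytes = 32;
#elif defined(__SSE2__) || defined(__ARM_NEON)
constexpr int native_register_bytes = 16;
#else
constexpr int native_register_bytes = 0;  // no known SIMD: scalar fallback
#endif

// Number of elements of type T in one "native" vector; at least 1.
template <typename T>
constexpr int native_lanes()
{
  return native_register_bytes >= static_cast<int>(sizeof(T))
           ? native_register_bytes / static_cast<int>(sizeof(T))
           : 1;
}

// The result must be usable where a constant is required, e.g. an array bound.
struct native_int_storage
{
  int data[native_lanes<int>()];
};
```

Whatever replaces the macro would still have to produce such a constant for
each function being compiled, which is the point of the question above.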


> It should be possible to use SVE to optimise some of the __neon*
> implementations, which has the advantage of working even for VLA SVE.
> That's obviously a separate patch, though.  Just saying for the record.

I learned that NVidia Grace CPUs alias NEON and SVE registers. But I must
assume that other SVE implementations (especially those with
__ARM_FEATURE_SVE_BITS > 128) don't do that and might incur a significant
latency when going from a NEON register to an SVE register and back (which
each requires a store-load, IIUC). So are you thinking of implementing
everything via SVE? That would break ABI, no?

- Matthias

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research               https://gsi.de
 std::simd
──────────────────────────────────────────────────────────────────────────