From: Matthias Kretz
CC: Srinivas Yadav, Andrew Pinski
Subject: Re: [PATCH] libstdc++: add ARM SVE support to std::experimental::simd
Date: Thu, 18 Jan 2024 09:40:51 +0100
Message-ID: <4123346.6PsWsQAL7t@centauriprime>
Organization: GSI Helmholtz Centre for Heavy Ion Research
References: <12731700.VsHLxoZxqI@centauriprime>

On Thursday, 18 January 2024 08:40:48 CET Andrew Pinski wrote:
> On Wed, Jan 17, 2024 at 11:28 PM Matthias Kretz wrote:
> > template <typename T>
> > struct Point
> > {
> >   T x, y, z;
> >
> >   T distance_to_origin() {
> >     return sqrt(x * x + y * y + z * z);
> >   }
> > };
> >
> > Point<float> is one point in 3D space, Point<simd<float>> stores multiple
> > points in 3D space and can work on them in parallel.
> >
> > This implies that simd must have a sizeof. C++ is unlikely to get
> > sizeless types (the discussions were long, there were many papers, ...).
> > Should sizeless types in C++ ever happen, then composition is likely going
> > to be constrained to the last data member.
>
> Even this is a bad design in general for simd. It means the code needs
> to know the size.

Yes and no. The developer writes size-agnostic code. The person compiling the
code chooses the size (via -m flags) and thus the compiler sees fixed-size
code.

> Also AoS vs SoA is always an interesting point here.
> In some cases you
> want an array of structs
> for speed and Point<simd<float>> does not work there at all. I guess
> This is all water under the bridge with how folks design code.
> You are basically pushing AoSoA idea here which is much worse idea than
> before.

I like to call it "array of vectorized struct" (AoVS) instead of AoSoA to
emphasize the compiler-flag-dependent memory layout.

I've been doing a lot of heterogeneous SIMD programming since 2009, starting
with an outer-loop vectorization across many TUs of a high-energy physics code
targeting Intel Larrabee (pre-AVX512 ISA) and SSE2 with one source. In all
these years my experience has been that, if the problem allows, AoVS is best
in terms of performance and code generality & readability. I'd be interested
to learn why you think differently.

> That being said sometimes it is not a vector of N elements you want to
> work on but rather 1/2/3 vector of N elements. Seems like this is
> just pushing the idea one of one vector of one type of element which
> again is wrong push.

I might have misunderstood. You're saying that sometimes I want a simd that
is wider than the registers my target CPU has? Yes! The
std::experimental::simd spec and implementation aren't good enough in that area
yet, but the C++26 paper(s) and my prototype implementation provide perfect
SIMD + ILP translation of the expressed data-parallelism.

> Also more over, I guess pushing one idea of SIMD is worse than pushing
> any idea of SIMD. For Mathematical code, it is better for the compiler
> to do the vectorization than the user try to be semi-portable between
> different targets.

I guess I agree with that statement. But I wouldn't, in general, call the use
of simd "the user try[ing] to be semi-portable". In my experience, working
on physics code - a lot of math - using simd (as intended) is better in
terms of performance and performance portability. As always, abuse is possible
...

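To make the AoVS idea concrete, here is a minimal sketch of the Point
template quoted at the top of this mail, instantiated once with float and
once with a vector type. Note the assumptions: Vec is a hypothetical
fixed-width stand-in written only for this example so that it compiles
without any library. It is not std::experimental::simd, and the width of 4,
the PointV alias and the total_distance helper are arbitrary choices for
illustration.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SIMD type: N lanes of T with element-wise ops.
// A real simd type would map these loops onto vector registers.
template <typename T, std::size_t N>
struct Vec {
    std::array<T, N> lane{};

    friend Vec operator+(Vec a, const Vec& b) {
        for (std::size_t i = 0; i < N; ++i) a.lane[i] += b.lane[i];
        return a;
    }
    friend Vec operator*(Vec a, const Vec& b) {
        for (std::size_t i = 0; i < N; ++i) a.lane[i] *= b.lane[i];
        return a;
    }
    friend Vec sqrt(Vec a) {
        for (std::size_t i = 0; i < N; ++i) a.lane[i] = std::sqrt(a.lane[i]);
        return a;
    }
};

// Size-agnostic user code: works for T = float and for T = Vec<float, N>.
template <typename T>
struct Point {
    T x, y, z;

    T distance_to_origin() const {
        using std::sqrt;  // scalar case; the Vec overload is found via ADL
        return sqrt(x * x + y * y + z * z);
    }
};

// "Array of vectorized struct": each element holds 4 points, with all x
// values adjacent in memory, then all y values, then all z values.
using PointV = Point<Vec<float, 4>>;

// Sums the distances to the origin over all points in an AoVS container.
inline float total_distance(const std::vector<PointV>& points) {
    float sum = 0.0f;
    for (const PointV& p : points) {
        const Vec<float, 4> d = p.distance_to_origin();
        for (float lane : d.lane) sum += lane;
    }
    return sum;
}
```

The body of Point is identical for both instantiations; only the type
argument decides whether one point or several are processed per call.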
> This is what was learned on Fortran but I guess
> some folks in the C++ likes to expose the underlying HW instead of
> thinking high level here.

The C++ approach is to "leave no room for a lower-level language" while
designing for high-level abstractions / usage.

> > With the above as our design constraints, SVE at first seems to be a bad
> > fit for implementing std::simd. However, if (at least initially) we accept
> > the need for different binaries for different SVE implementations, then
> > you
> > can look at the "scalable" part of SVE as an efficient way of reducing the
> > number of opcodes necessary for supporting all kinds of different vector
> > lengths. But otherwise you can treat it as fixed-size registers - which it
> > is for a given CPU. In the case of a multi-CPU shared-memory system (e.g.
> > RDMA between different ARM implementations) all you need is a different
> > name for incompatible types. So std::simd on SVE256 must have a
> > different name on SVE512. Same for std::simd (which is currently
> > not the case with Sriniva's patch, I think, and needs to be resolved).
>
> For SVE that is a bad design. It means The code is not portable at all.

When you say "code" you mean "source code", not binaries, right? I don't see
how that follows.

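As a rough illustration of the "different name for incompatible types" point
in the quote above, the following sketch encodes the register width in the
type itself. FixedVec and the two aliases are hypothetical names made up for
this example. They are not the ABI tags used by std::experimental::simd or by
the patch under review.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in: the register width in bits is a template argument,
// so it becomes part of the type name (and thus of any mangled symbol).
template <typename T, std::size_t Bits>
struct FixedVec {
    static constexpr std::size_t size() { return Bits / (8 * sizeof(T)); }
    T lane[Bits / (8 * sizeof(T))];
};

using Sve256Float = FixedVec<float, 256>;  // assumed 256-bit implementation
using Sve512Float = FixedVec<float, 512>;  // assumed 512-bit implementation

// Within one binary, each type has an ordinary compile-time sizeof.
static_assert(sizeof(Sve256Float) == 32, "8 floats of 4 bytes each");
static_assert(sizeof(Sve512Float) == 64, "16 floats of 4 bytes each");

// The two widths are distinct types, so they cannot be mixed up silently
// across translation units or across machines sharing memory.
static_assert(!std::is_same<Sve256Float, Sve512Float>::value,
              "different widths must be different types");
```

The source code using such a type stays the same for every width. Only the
type that the compiler sees, and therefore the binary, differs.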
- Matthias

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Center for Heavy Ion Research               https://gsi.de
 std::simd
──────────────────────────────────────────────────────────────────────────