From: Thomas Schwinge
To: gcc@gcc.gnu.org, Jakub Jelinek
CC: Tom de Vries
Subject: Re: OpenMP auto-simd
Date: Tue, 8 Mar 2022 15:29:49 +0100
Message-ID: <8735jssejm.fsf@euler.schwinge.homeip.net>
In-Reply-To: <2213264c2c5c467fb491f71051173873@svr-ies-mbx-01.mgc.mentorg.com>
Hi!

... with the usual caveat that I know much more about OpenACC than OpenMP, and I know (at least a bit) more about nvptx than GCN... ;-)

On 2022-03-02T15:12:30+0000, "Stubbs, Andrew" wrote:
> Has anyone ever considered having GCC add the "simd" clause to offload (or regular) loop nests automatically?
>
> For example, something like "-fomp-auto-simd" would transform "distribute parallel" to "distribute parallel simd" automatically. Loop nests that already contain "simd" clauses or directives would remain unchanged, most likely.
>
> The reason I ask is that other toolchains have chosen to use a "SIMT" model for GPUs, which means that OpenMP threads map to individual vector lanes and are therefore strictly scalar. The result is that the "simd" directive is irrelevant and lots of code out there isn't using it at all (so I'm told). Meanwhile, in GCC we map OpenMP threads to Nvidia warps and AMD GCN wavefronts, so it is impossible to get full performance without explicitly specifying the "simd" directive. We therefore suffer in direct comparisons.
>
> I'm of the opinion that GCC is the one implementing OpenMP as intended

I'm curious: how does one arrive at this conclusion?

For example, in addition to intra-warp thread parallelism, nvptx also has a few SIMD instructions: data transfer (combining two adjacent 32-bit transfers into one 64-bit transfer), and also some basic arithmetic; I'd have to look up the details. It's not much, but it's something that GCC's SLP vectorizer can use. (Tom worked on that, years ago.) Using that to implement OpenMP's SIMD (quite likely via default-(SLP-)auto-vectorization), you'd then indeed get for actual OpenMP threads what you described as the "SIMT" model above.
Why not change GCC to do the same, if that's the common understanding of how OpenMP for GPUs should be done, as implemented by other compilers?

Grüße
 Thomas

> but all the same I need to explore our options here, figure out what the consequences would be, and plan a project to do what we can.
>
> I've thought of simply enabling "-ftree-vectorize" on AMD GCN (this doesn't help NVPTX), but I think that is sub-optimal because things like the OpenMP scheduler really need to be aware of the vector size, and there are probably other ways in which parallel regions could be better formed with regard to the vectorizer. If these features don't exist right now, then I have an opportunity to include them in our upcoming project.
>
> Any info/suggestions/advice would be appreciated.
>
> Thanks
>
> Andrew
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955