From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcgf-fortran@m.gmane-mx.org>
Received: from ciao.gmane.io (ciao.gmane.io [116.202.254.214])
	by sourceware.org (Postfix) with ESMTPS id 6AF303858CDA
	for <fortran@gcc.gnu.org>; Mon, 31 Oct 2022 21:19:26 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6AF303858CDA
Authentication-Results: sourceware.org; dmarc=fail (p=none dis=none) header.from=manchester.ac.uk
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=m.gmane-mx.org
Received: from list by ciao.gmane.io with local (Exim 4.92)
	(envelope-from <gcgf-fortran@m.gmane-mx.org>)
	id 1opcBp-000A19-4r
	for fortran@gcc.gnu.org; Mon, 31 Oct 2022 22:19:25 +0100
X-Injected-Via-Gmane: http://gmane.org/
To: fortran@gcc.gnu.org
From: Dave Love <dave.love@manchester.ac.uk>
Subject: Re: adding attributes
Date: Mon, 31 Oct 2022 21:19:18 +0000
Message-ID: <87edund73d.fsf@manchester.ac.uk>
References: <87pmecdni6.fsf@manchester.ac.uk> <20221030084839.118ef0c8@nbbrfq>
Mime-Version: 1.0
Content-Type: text/plain
User-Agent: secret agent
Cancel-Lock: sha1:N1IjsyHb9ts5bsadYg4EJer+kCw=
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <fortran.gcc.gnu.org>

Bernhard Reutner-Fischer via Fortran <fortran@gcc.gnu.org> writes:

> Well we already have
> !GCC$ ATTRIBUTES attribute-list :: var-name [, var-name] ...
>
> See https://gcc.gnu.org/onlinedocs/gfortran/ATTRIBUTES-directive.html

Yes, that's what I was hoping was simple to extend.  Sorry I didn't say
explicitly.

> For target_clones you would most likely need a slightly different parser
> for you need the user to specify the actual target_clones somehow. You
> would probably make a suggestion and discuss the proposal here.
> Ideally the syntax would be the same as in C.

Right.  I hoped it would be possible to lift machinery easily from C.
It wasn't obvious you could, but I didn't spend much time when I looked
at it a while ago.
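
For reference, the C form that a Fortran directive would presumably mirror
looks roughly like the sketch below.  The function name `saxpy`, the `CLONES`
macro, and the particular target list are illustrative assumptions, not
anything proposed in this thread; the guard assumes GCC on x86_64 GNU/Linux,
since target_clones relies on ifunc support.

```c
#include <stddef.h>

/* Illustrative guard (an assumption, not from the thread): only request
   clones where GCC's target_clones and ifunc are expected to work. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__linux__)
#define CLONES __attribute__((target_clones("avx512f", "avx2", "default")))
#else
#define CLONES
#endif

/* With the attribute, GCC emits one version of the function per listed
   target plus a resolver that selects one for the running CPU at load
   time.  The source of the loop itself stays target-independent. */
CLONES
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Callers just call `saxpy` as usual; the dispatch is invisible at the call
site, which is what makes the attribute attractive for existing code.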

> ---8<---
> In general, I prefer to stick to standard methods
> (which are portable) and think that those user knobs often make things
> slower rather than faster (as they tend to stay for years, even after the
> hardware has moved on - or they are even inserted blindly).
> ---8<---

There's no standard method for this sort of portable performance
engineering as far as I can tell.  The best I could see was specifying a
SIMD length statically in OpenMP.  I'm interested in things that
potentially make the difference between, say, vectorization for AVX2 or
full-width AVX512 versus SSE2 for profiled hot spots.  I fully agree
about measurement and not doing things blindly, and I prize
maintainability.  However, target_clones is clearly better than the
existing facility for explicit, target-independent unrolling, for instance.
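
To make the OpenMP point concrete, here is a minimal sketch in C of a SIMD
length fixed in the source (the same clause can be written on `!$omp simd`
in Fortran).  The function name `dot` and the value 8 are illustrative
assumptions.

```c
#include <stddef.h>

/* simdlen(8) states a preferred vector length at compile time; it does
   not adapt to the CPU the binary ends up running on.  GCC honours the
   pragma with -fopenmp or -fopenmp-simd and ignores it otherwise. */
double dot(size_t n, const double *a, const double *b)
{
    double s = 0.0;
#pragma omp simd simdlen(8) reduction(+:s)
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}
```

The length is a property of the source rather than of the machine, so it
says nothing about which instruction set is actually used at run time.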

> In former times, you would compile your library multiple times
> and provide a distinct, optimized version for each of the CPUs.
> Maybe that would work for you equally well, without target_clones?

"Former times" to me means, say, GEC 4000 v. IBM 370 and the aftermath
of "all the world's a VAX", rather than different x86
micro-architectures...  I do now work on both x86_64 and POWER.

Multiple compilation isn't a good solution.  I haven't followed the
current state of hardware capability support, but relevant systems don't
have it on x86_64, at least.  That wouldn't help kernels of your
simulation code that aren't abstracted into a library or set up for
dynamic dispatch anyway.  I don't have a specific instance in mind, but
consider OS packaging, which I do; that currently has to be built for
base x86_64 (SSE2) for EPEL, at least, and so could miss a factor of
several in performance from vectorization.
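
As an illustration of what "set up for dynamic dispatch" involves when done
by hand, here is a rough C sketch.  The names `scale`, `scale_generic` and
`scale_avx2` are hypothetical, and the x86 path assumes a GCC-compatible
compiler that provides `__builtin_cpu_supports`.

```c
#include <stddef.h>

/* Baseline version, compiled for whatever the build flags say
   (plain x86_64/SSE2 in a distribution package). */
static void scale_generic(size_t n, double a, double *x)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= a;
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Same loop, but this one function may be vectorized for AVX2. */
__attribute__((target("avx2")))
static void scale_avx2(size_t n, double a, double *x)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= a;
}
#endif

/* Hand-written dispatcher: check the CPU at run time and pick a variant. */
void scale(size_t n, double a, double *x)
{
#if defined(__GNUC__) && defined(__x86_64__)
    if (__builtin_cpu_supports("avx2")) {
        scale_avx2(n, a, x);
        return;
    }
#endif
    scale_generic(n, a, x);
}
```

Every kernel needs its own duplicated body and dispatcher in this style,
which is the boilerplate a single attribute on the routine would avoid.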

> HTH

Thanks.  Definitely a more helpful response than when I asked about
doing something previously!  (I don't know if I'll actually be able to
work on it in the end, at least on work time.)