To: fortran@gcc.gnu.org
From: Dave Love
Subject: Re: adding attributes
Date: Mon, 31 Oct 2022 21:19:18 +0000
Message-ID: <87edund73d.fsf@manchester.ac.uk>
References: <87pmecdni6.fsf@manchester.ac.uk> <20221030084839.118ef0c8@nbbrfq>

Bernhard Reutner-Fischer via Fortran writes:

> Well we already have
> !GCC$ ATTRIBUTES attribute-list :: var-name [, var-name] ...
>
> See https://gcc.gnu.org/onlinedocs/gfortran/ATTRIBUTES-directive.html

Yes, that's what I was hoping was simple to extend.  Sorry I didn't say
explicitly.

> For target_clones you would most likely need a slightly different parser
> for you need the user to specify the actual target_clones somehow. You
> would probably make a suggestion and discuss the proposal here.
> Ideally the syntax would be the same as in C.

Right.  I hoped it would be possible to lift machinery easily from C.  It
wasn't obvious you could, but I didn't spend much time when I looked at
it a while ago.

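For reference, the C form of the attribute under discussion looks roughly
like the sketch below.  The kernel name axpy and the macro CLONE_FOR_SIMD
are made up for illustration, and the target list is only an example.  It
assumes GCC on x86_64 GNU/Linux, where target_clones relies on ifunc
support; elsewhere the macro expands to nothing.  What a Fortran spelling
under !GCC$ ATTRIBUTES would look like is not settled in this thread.

```c
#include <stddef.h>

/* Assumption: GCC on x86_64 with glibc, where ifunc is available.
   On any other toolchain the attribute is compiled out and only the
   plain version of the function is built. */
#if defined(__GNUC__) && !defined(__clang__) \
    && defined(__x86_64__) && defined(__gnu_linux__)
#  define CLONE_FOR_SIMD \
     __attribute__((target_clones("avx512f", "avx2", "default")))
#else
#  define CLONE_FOR_SIMD
#endif

/* Hypothetical kernel: y[i] += a * x[i].
   With the attribute active, the compiler emits one clone per listed
   target plus a resolver that selects a clone when the program loads. */
CLONE_FOR_SIMD
void axpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Callers use axpy(n, a, x, y) as an ordinary function; no dispatch code is
written by hand.
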
> ---8<---
> In general, I prefer to stick to standard methods
> (which are portable) and think that those user knobs often make things
> slower than faster (as they tend to stay for years, even after the
> hardware has moved on - or they are even inserted blindly).
> ---8<---

There's no standard method for this sort of portable performance
engineering as far as I can tell.  The best I could see was specifying a
SIMD length statically in OpenMP.  I'm interested in things that
potentially make the difference between, say, vectorization for AVX2 or
full-width AVX512 versus SSE2 for profiled hot spots.  I fully agree
about measurement and not doing things blindly, and I prize
maintainability.  However, target_clones is clearly better than the
existing facility for explicit, target-independent unrolling, for
instance.

> In former times, you would compile your library multiple times
> and provide a distinct, optimized version for each of the CPUs.
> Maybe that would work for you equally well, without target_clones?

"Former times" to me means, say, GEC 4000 v. IBM 370 and the aftermath
of "all the world's a VAX", rather than different x86
micro-architectures...  I do now work on both x86_64 and POWER.
Multiple compilation isn't a good solution.  I haven't followed the
current state of hardware capability support, but relevant systems don't
have it on x86_64, at least.  That wouldn't help kernels of your
simulation code that aren't abstracted into a library or set up for
dynamic dispatch anyway.  I don't have a specific instance in mind, but
consider OS packaging, which I do; that currently has to be built for
base x86_64 (SSE2) for EPEL, at least, and so could miss a factor of
several in performance from vectorization.

> HTH

Thanks.  Definitely a more helpful response than when I asked about
doing something previously!  (I don't know if I'll actually be able to
work on it in the end, at least on work time.)