Date: Thu, 3 Nov 2022 00:19:26 +0100
From: Bernhard Reutner-Fischer
To: Dave Love via Fortran
Cc: rep.dot.nop@gmail.com, Dave Love
Subject: Re: adding attributes
Message-ID: <20221103001926.725fd9bf@nbbrfq>
In-Reply-To: <87edund73d.fsf@manchester.ac.uk>
References: <87pmecdni6.fsf@manchester.ac.uk> <20221030084839.118ef0c8@nbbrfq> <87edund73d.fsf@manchester.ac.uk>

On Mon, 31 Oct 2022 21:19:18 +0000
Dave Love via Fortran wrote:

> Bernhard Reutner-Fischer via Fortran writes:

> > Ideally the syntax would be the same as in C.
>
> Right.  I hoped it would be possible to lift machinery easily from C.

Lifting that won't work easily, no.

> There's no standard method for this sort of portable performance
> engineering as far as I can tell.  The best I could see was specifying a
> SIMD length statically in OpenMP.  I'm interested in things that
> potentially make the difference between, say, vectorization for AVX2 or
> full-width AVX512 versus SSE2 for profiled host-spots.  I fully agree

I see. So target_clones is one thing.
What other attributes would be important?

> about measurement and not doing things blindly, and I prize
> maintainability.  However, target_clones is clearly better than the
> existing facility for explicit, target-independent unrolling, for instance.

Yes. Unroll is certainly only applicable in a few places, sure.

> > In former times, you would compile your library multiple times
> > and provide a distinct, optimized version for each of the CPUs.
> > Maybe that would work for you equally well, without target_clones?
>
> "Former times" to me means, say, GEC 4000 v. IBM 370 and the aftermath
> of "all the world's a VAX", rather than different x86
> micro-architectures...  I do now work on both x86_64 and POWER.

In your job script you would use cpuid(1) to determine a properly tuned
binary for the parts of the cluster you run on. Or the installed
binaries are tuned for the host they are installed on and are located in
a uniform place per application.

> Multiple compilation isn't a good solution.  I haven't followed the

It might not be good, but it's cheap and easy if you only have a small
set of different arches and subarches each. In a controlled environment,
with a batch scheduler. Won't work in the wild of course.

> current state of hardware capability support, but relevant systems don't
> have it on x86_64, at least.  That wouldn't help kernels of your
> simulation code that aren't abstracted into a library or set up for
> dynamic dispatch anyway.  I don't have a specific instance in mind, but
> consider OS packaging, which I do; that currently has to be built for
> base x86_64 (SSE2) for EPEL, at least, and so could miss a factor of
> several performance from vectorized.

For packaging for global use that won't work all that well indeed.
But since you cannot mix target_clones across arch-boundaries,
supporting those for a distro will probably be rather ugly anyway.
I think that's what Gentoo et al. are for, or your privately rebuilt
Debian repo; provide a tuned world for everybody, individually ;)
But as you mentioned EPEL, I never said that :)

> > HTH
>
> Thanks.  Definitely a more helpful response than when I asked about
> doing something previously!  (I don't know if I'll actually be able to
> work on it in the end, at least on work time.)

heh, me neither. Luckily yesterday was a holiday, so what I ended up
with was the following, fya.

Consider:

$ grep -v "^\!\!" /scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/attr_target_clones-1.f90;echo EOF
! { dg-do compile }
! { dg-options "-O1 -fdump-tree-optimized" }
!
! Test __attribute__ ((target_clones ("foo", "bar")))
!
module m
  implicit none
contains
  subroutine sub1()
!GCC$ ATTRIBUTES target_clones("avx", "sse","default") :: sub1
    print *, 4321
  end
end module m
! { dg-final { scan-tree-dump-times {void * __m_MOD_sub1.resolver ()} "optimized" 1 } }
! { dg-final { scan-tree-dump-times {void __m_MOD_sub1.avx ()} "optimized" 1 } }
! { dg-final { scan-tree-dump-times {void __m_MOD_sub1.sse ()} "optimized" 1 } }
!!! { dg-final { scan-tree-dump-times {XXX something sub1.default ()} "optimized" 1 } }
! { dg-final { scan-tree-dump-not {void sub1 ()} "optimized" } }
EOF

Which gives
$ ./gfortran -B. \
  -o /tmp/out.o -c /scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/attr_target_clones-1.f90 -O2 -fdump-tree-original -fdump-tree-optimized
/tmp/ccxpGd9Y.s: Assembler messages:
/tmp/ccxpGd9Y.s:118: Error: symbol `__m_MOD_sub1' is already defined

That's because that ends up as
$ nl -ba /tmp/out.s | grep __m_MOD_sub1
    12      .type   __m_MOD_sub1, @function
    13  __m_MOD_sub1:
    35      .size   __m_MOD_sub1, .-__m_MOD_sub1
    36      .type   __m_MOD_sub1.avx, @function
    37  __m_MOD_sub1.avx:
    59      .size   __m_MOD_sub1.avx, .-__m_MOD_sub1.avx
    60      .type   __m_MOD_sub1.sse, @function
    61  __m_MOD_sub1.sse:
    83      .size   __m_MOD_sub1.sse, .-__m_MOD_sub1.sse
    84      .section        .text.__m_MOD_sub1.resolver,"axG",@progbits,__m_MOD_sub1.resolver,comdat
    85      .weak   __m_MOD_sub1.resolver
    86      .type   __m_MOD_sub1.resolver, @function
    87  __m_MOD_sub1.resolver:
    95      movl    $__m_MOD_sub1.avx, %eax
   104      movl    $__m_MOD_sub1, %eax
   105      movl    $__m_MOD_sub1.sse, %edx
   110      .size   __m_MOD_sub1.resolver, .-__m_MOD_sub1.resolver
   111      .globl  __m_MOD_sub1
   112      .type   __m_MOD_sub1, @gnu_indirect_function
   113      .set    __m_MOD_sub1,__m_MOD_sub1.resolver

where 13 and 111 probably don't work out too well: the plain
__m_MOD_sub1 is defined once as the default body and once more as the
indirect function.
The C frontend uses sub1.default as version for the (former) plain sub1:
     4      .type   sub1.default, @function
     5  sub1.default:
...
   103      .section        .text.sub1.resolver,"axG",@progbits,sub1.resolver,comdat
   105      .weak   sub1.resolver
   106      .type   sub1.resolver, @function
   107  sub1.resolver:
...
   162      leaq    sub1.default(%rip), %rax
   167      .size   sub1.resolver, .-sub1.resolver
   168      .globl  sub1
   169      .type   sub1, @gnu_indirect_function
   170      .set    sub1,sub1.resolver

If I mark the module fndecl as DECL_FUNCTION_VERSIONED, then it's
pointed out that I seem to have to provide the default by hand:

   10 |   subroutine sub1()
      |   1
internal compiler error: in ix86_mangle_function_version_assembler_name, at config/i386/i386-features.cc:3165
0x806780 ix86_mangle_function_version_assembler_name

That's the check that there is
  /* target attribute string cannot be NULL.
     */
  gcc_assert (version_attr != NULL_TREE);

So while target and target_clones seem to be mutually exclusive (from
the C FE checking), the versioning wants the default in a target attr or
something like that.

And on top of all that, gfc_match_gcc_attributes has the following
comment:

   TODO: We should support all GCC attributes using the same syntax for
   the attribute list, i.e. the list in C
      __attributes(( attribute-list ))
   matches then
      !GCC$ ATTRIBUTES attribute-list ::
   Cf. c-parser.cc's c_parser_attributes; the data can then directly be
   saved into a TREE.

When we do that, we can get rid of ext_attr_list[] because that would be
generated right from the start.

I've added a
   /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
   unsigned ext_attr:EXT_ATTR_NUM;
+  tree ext_attr_args;
to struct symbol_attribute where I can prepare the tree_list for the
attrs right from the start. The lowering is then rather simple and
uniform, just chainon the prepared attributes and be done.

One could get rid of ext_attr altogether, with the caveat that this
would change the module format. We'd have to save the attrs in a
different way, breaking module compat again, of course.

target_clones does not require a bump in the module format, I'd say,
because the main entry point does not change. I will have to check that
the clones do not end up being emitted in the module; they shouldn't be.
Other attributes _may_ require a change in the module format though.
These would need checking on a per-case basis.

That said, one cannot import all attribute handling from the C FE into
the Fortran FE seamlessly. There is always a bit of massaging required.
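
For comparison, here is a minimal C sketch of the same test case, i.e.
the behaviour the Fortran side is being measured against. It assumes
GCC 6 or later on x86-64 Linux with glibc, since target_clones needs
ifunc support. The SUB1_CLONES macro and the fallback branch are
illustrative additions to keep the file compilable elsewhere; they are
not part of any patch. With the attribute active, the default version
is the one that shows up as sub1.default next to sub1.resolver in the
C listing quoted earlier.

```c
/* Sketch only: function multiversioning in C via GCC's target_clones.
   Assumption: GCC >= 6, x86-64, glibc (ifunc available).  On any other
   toolchain the attribute is compiled out and one plain sub1 is built.  */
#include <stdio.h>   /* printf; also makes __GLIBC__ visible on glibc */

#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 6 \
    && defined(__x86_64__) && defined(__GLIBC__)
/* Same version list as the !GCC$ ATTRIBUTES line in the Fortran test.  */
# define SUB1_CLONES __attribute__ ((target_clones ("avx", "sse", "default")))
#else
/* Hypothetical fallback: no multiversioning, a single generic version.  */
# define SUB1_CLONES
#endif

/* The compiler emits one body per listed target plus a resolver that
   selects among them; callers keep using the plain name sub1.  */
SUB1_CLONES
int
sub1 (void)
{
  printf ("%d\n", 4321);
  return 4321;
}
```

A caller simply invokes sub1 (); with the attribute active the call goes
through the indirect function that the resolver binds at load time.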