From: Bernhard Reutner-Fischer <rep.dot.nop@gmail.com>
To: Dave Love via Fortran <fortran@gcc.gnu.org>
Cc: rep.dot.nop@gmail.com, Dave Love <dave.love@manchester.ac.uk>
Subject: Re: adding attributes
Date: Thu, 3 Nov 2022 00:19:26 +0100
Message-ID: <20221103001926.725fd9bf@nbbrfq>
In-Reply-To: <87edund73d.fsf@manchester.ac.uk>
On Mon, 31 Oct 2022 21:19:18 +0000
Dave Love via Fortran <fortran@gcc.gnu.org> wrote:
> Bernhard Reutner-Fischer via Fortran <fortran@gcc.gnu.org> writes:
> > Ideally the syntax would be the same as in C.
>
> Right. I hoped it would be possible to lift machinery easily from C.
Lifting that won't work easily, no.
> There's no standard method for this sort of portable performance
> engineering as far as I can tell. The best I could see was specifying a
> SIMD length statically in OpenMP. I'm interested in things that
> potentially make the difference between, say, vectorization for AVX2 or
> full-width AVX512 versus SSE2 for profiled hot-spots. I fully agree
I see.
So target_clones is one thing. What other attributes would be important?
> about measurement and not doing things blindly, and I prize
> maintainability. However, target_clones is clearly better than the
> existing facility for explicit, target-independent unrolling, for instance.
Yes. Unrolling is certainly applicable only in a few places.
>
> > In former times, you would compile your library multiple times
> > and provide a distinct, optimized version for each of the CPUs.
> > Maybe that would work for you equally well, without target_clones?
>
> "Former times" to me means, say, GEC 4000 v. IBM 370 and the aftermath
> of "all the world's a VAX", rather than different x86
> micro-architectures... I do now work on both x86_64 and POWER.
In your job script you would use cpuid(1) to pick the properly tuned
binary for the part of the cluster you run on. Alternatively, the installed
binaries are tuned for the host they are installed on and live in
a uniform place per application.
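For illustration only, the same per-node choice can also be made in a small C launcher instead of a shell script. This is a sketch assuming GCC or Clang on x86_64, where __builtin_cpu_init and __builtin_cpu_supports exist; the variant suffixes and the <base>.<variant> naming scheme are hypothetical, not something the thread prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical naming scheme: one binary per ISA level, installed side by
   side as <base>.avx512, <base>.avx2, <base>.sse2 (or <base>.generic). */
static const char *pick_variant(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* Strictly required only before constructors have run; harmless here. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return "avx512";
    if (__builtin_cpu_supports("avx2"))
        return "avx2";
    return "sse2"; /* the x86_64 baseline that distro packages target */
#else
    return "generic"; /* e.g. POWER: this sketch does no feature checks */
#endif
}

/* Writes "<base>.<variant>" into buf and returns buf, or returns NULL
   if the result would not fit (instead of silently truncating). */
static char *variant_path(char *buf, size_t n, const char *base)
{
    assert(buf != NULL && base != NULL);
    const char *v = pick_variant();
    if (strlen(base) + 1 + strlen(v) + 1 > n)
        return NULL;
    snprintf(buf, n, "%s.%s", base, v);
    return buf;
}
```

A launcher would then exec() the returned path; on non-x86 hosts the sketch simply falls back to a generic build.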
>
> Multiple compilation isn't a good solution. I haven't followed the
It might not be good, but it's cheap and easy if you only have a small
set of different arches and subarches, in a controlled environment with
a batch scheduler. It won't work in the wild, of course.
> current state of hardware capability support, but relevant systems don't
> have it on x86_64, at least. That wouldn't help kernels of your
> simulation code that aren't abstracted into a library or set up for
> dynamic dispatch anyway. I don't have a specific instance in mind, but
> consider OS packaging, which I do; that currently has to be built for
> base x86_64 (SSE2) for EPEL, at least, and so could miss a factor of
> several in performance from vectorization.
For packaging for global use that won't work all that well, indeed.
But since you cannot mix target_clones across arch boundaries,
supporting them in a distro will probably be rather ugly anyway.
I think that's what Gentoo et al. are for, or your privately rebuilt
Debian repo: provide a tuned world for everybody, individually ;) But as
you mentioned EPEL, I never said that :)
>
> > HTH
>
> Thanks. Definitely a more helpful response than when I asked about
> doing something previously! (I don't know if I'll actually be able to
> work on it in the end, at least on work time.)
Heh, me neither. Luckily yesterday was a holiday, so here is what I ended
up with, fya.
Consider:
$ grep -v "^\!\!" /scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/attr_target_clones-1.f90;echo EOF
! { dg-do compile }
! { dg-options "-O1 -fdump-tree-optimized" }
!
! Test __attribute__ ((target_clones ("foo", "bar")))
!
module m
implicit none
contains
subroutine sub1()
!GCC$ ATTRIBUTES target_clones("avx", "sse","default") :: sub1
print *, 4321
end
end module m
! { dg-final { scan-tree-dump-times {void * __m_MOD_sub1.resolver ()} "optimized" 1 } }
! { dg-final { scan-tree-dump-times {void __m_MOD_sub1.avx ()} "optimized" 1 } }
! { dg-final { scan-tree-dump-times {void __m_MOD_sub1.sse ()} "optimized" 1 } }
!!! { dg-final { scan-tree-dump-times {XXX something sub1.default ()} "optimized" 1 } }
! { dg-final { scan-tree-dump-not {void sub1 ()} "optimized" } }
EOF
Which gives
$ ./gfortran -B. -o /tmp/out.o -c /scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/attr_target_clones-1.f90 -O2 -fdump-tree-original -fdump-tree-optimized
/tmp/ccxpGd9Y.s: Assembler messages:
/tmp/ccxpGd9Y.s:118: Error: symbol `__m_MOD_sub1' is already defined
That's because that ends up as
$ nl -ba /tmp/out.s | grep __m_MOD_sub1
12 .type __m_MOD_sub1, @function
13 __m_MOD_sub1:
35 .size __m_MOD_sub1, .-__m_MOD_sub1
36 .type __m_MOD_sub1.avx, @function
37 __m_MOD_sub1.avx:
59 .size __m_MOD_sub1.avx, .-__m_MOD_sub1.avx
60 .type __m_MOD_sub1.sse, @function
61 __m_MOD_sub1.sse:
83 .size __m_MOD_sub1.sse, .-__m_MOD_sub1.sse
84 .section .text.__m_MOD_sub1.resolver,"axG",@progbits,__m_MOD_sub1.resolver,comdat
85 .weak __m_MOD_sub1.resolver
86 .type __m_MOD_sub1.resolver, @function
87 __m_MOD_sub1.resolver:
95 movl $__m_MOD_sub1.avx, %eax
104 movl $__m_MOD_sub1, %eax
105 movl $__m_MOD_sub1.sse, %edx
110 .size __m_MOD_sub1.resolver, .-__m_MOD_sub1.resolver
111 .globl __m_MOD_sub1
112 .type __m_MOD_sub1, @gnu_indirect_function
113 .set __m_MOD_sub1,__m_MOD_sub1.resolver
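where line 13 (the plain definition of __m_MOD_sub1) and lines 111-113
(which redefine __m_MOD_sub1 as the ifunc bound to the resolver) clash.
The resolver even hands out __m_MOD_sub1 itself as the default (line 104).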
The C front end instead emits the default version as sub1.default, in
place of the (former) plain sub1:
4 .type sub1.default, @function
5 sub1.default:
...
103 .section .text.sub1.resolver,"axG",@progbits,sub1.resolver,comdat
105 .weak sub1.resolver
106 .type sub1.resolver, @function
107 sub1.resolver:
...
162 leaq sub1.default(%rip), %rax
167 .size sub1.resolver, .-sub1.resolver
168 .globl sub1
169 .type sub1, @gnu_indirect_function
170 .set sub1,sub1.resolver
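For comparison, a C counterpart of the Fortran testcase, of the kind that yields a listing like the one above (the exact C source used there is not shown). It is a sketch assuming x86_64 GNU/Linux with glibc, because target_clones relies on ifunc support; elsewhere the macro expands to nothing and sub1 is an ordinary function. Returning 4321 instead of only printing it is an addition so the call can be checked.

```c
#include <assert.h>
#include <stdio.h>

/* target_clones needs GNU ifunc, so only enable it on x86_64 glibc Linux.
   __GLIBC__ is available here because <stdio.h> was included above. */
#if defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__)
#define SUB1_CLONES __attribute__((target_clones("avx", "sse", "default")))
#else
#define SUB1_CLONES /* no clones: plain function */
#endif

/* The compiler emits one body per listed target plus a resolver, and binds
   the symbol sub1 to that resolver as a @gnu_indirect_function. */
SUB1_CLONES
int sub1(void)
{
    printf("%d\n", 4321);
    return 4321;
}
```

Callers just call sub1(); which clone runs is decided once, when the dynamic loader invokes the resolver.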
If I mark the module fndecl as DECL_FUNCTION_VERSIONED, the compiler
points out that I apparently have to provide the default by hand:
10 | subroutine sub1()
| 1
internal compiler error: in ix86_mangle_function_version_assembler_name, at config/i386/i386-features.cc:3165
0x806780 ix86_mangle_function_version_assembler_name
That's this check:
/* target attribute string cannot be NULL. */
gcc_assert (version_attr != NULL_TREE);
So while target and target_clones seem to be mutually exclusive (judging
from the C FE's checks), the versioning code wants the default version to
carry a target attribute, or something like that.
And on top of all that, gfc_match_gcc_attributes has the following
comment:
TODO: We should support all GCC attributes using the same syntax for
the attribute list, i.e. the list in C
__attributes(( attribute-list ))
matches then
!GCC$ ATTRIBUTES attribute-list ::
Cf. c-parser.cc's c_parser_attributes; the data can then directly be
saved into a TREE.
When we do that, we can get rid of ext_attr_list[] because that would
be generated right from the start.
I've added a
/* Attributes set by compiler extensions (!GCC$ ATTRIBUTES). */
unsigned ext_attr:EXT_ATTR_NUM;
+ tree ext_attr_args;
to struct symbol_attribute, where I can prepare the TREE_LIST for the
attrs right from the start. The lowering is then simple and uniform:
just chainon the prepared attributes and be done.
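To make "prepare the list up front, then just chain it on" concrete, here is a toy model in plain C. struct attr, attr_chainon and attr_lookup are hypothetical stand-ins for a TREE_LIST node, chainon and lookup_attribute; this is not GCC's tree API, and in the real patch ext_attr_args holds actual trees.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a TREE_LIST node: name ~ TREE_PURPOSE,
   args ~ TREE_VALUE, chain ~ TREE_CHAIN. */
struct attr {
    const char *name;        /* e.g. "target_clones" */
    const char *const *args; /* NULL-terminated, or NULL for no arguments */
    struct attr *chain;      /* next attribute in the list */
};

/* Like chainon: append list b to the end of list a and return the head. */
static struct attr *attr_chainon(struct attr *a, struct attr *b)
{
    if (a == NULL)
        return b;
    struct attr *t = a;
    while (t->chain != NULL) {
        assert(t != b); /* b already in a: appending would make a cycle */
        t = t->chain;
    }
    assert(t != b);
    t->chain = b;
    return a;
}

/* Like lookup_attribute: first node with the given name, or NULL. */
static const struct attr *attr_lookup(const char *name, const struct attr *list)
{
    for (; list != NULL; list = list->chain)
        if (strcmp(list->name, name) == 0)
            return list;
    return NULL;
}
```

Lowering a procedure then reduces to one attr_chainon of the symbol's prepared list onto the decl's existing attributes, with no per-attribute switch.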
One could get rid of ext_attr altogether, with the caveat that this
would change the module format. We'd have to save the attrs in a
different way, breaking module compat again, of course.
I'd say target_clones does not require a bump in the module format,
because the main entry point does not change. I will have to check that
the clones do not end up being emitted in the module; they shouldn't be.
Other attributes _may_ require a change in the module format though.
These would need checking on a per case basis.
That said, one cannot import all attribute handling from the C FE into
the Fortran FE seamlessly. There is always a bit of massaging required.