* adding attributes
From: Dave Love @ 2022-10-28 14:35 UTC
To: fortran

Assuming there's no technical reason not to, can someone say what would
be involved in adding relevant attributes (at least function ones) like
those for C?  I'm thinking particularly of target_clones, but others
probably make sense.

I don't know my way around, but I had a quick look, and it at least
wouldn't be as straightforward as I hoped.
* Re: adding attributes
From: Bernhard Reutner-Fischer @ 2022-10-30 7:48 UTC
To: Dave Love via Fortran; +Cc: rep.dot.nop, Dave Love

On Fri, 28 Oct 2022 15:35:45 +0100
Dave Love via Fortran <fortran@gcc.gnu.org> wrote:

> Assuming there's no technical reason not to, can someone say what would
> be involved in adding relevant attributes (at least function ones) like
> those for C?  I'm thinking particularly of target_clones, but others
> probably make sense.

Well we already have
!GCC$ ATTRIBUTES attribute-list :: var-name [, var-name] ...

See https://gcc.gnu.org/onlinedocs/gfortran/ATTRIBUTES-directive.html

> I don't know my way around, but I had a quick look, and it at least
> wouldn't be as straightforward as I hoped.

See gcc/fortran/decl.cc gfc_match_gcc_*

For example the CDECL attribute probably comes close to target_clones:

      subroutine mysub1()
      !GCC$ ATTRIBUTES CDECL :: mysub1
      ! body of mysub1 here

For target_clones you would most likely need a slightly different parser
for you need the user to specify the actual target_clones somehow. You
would probably make a suggestion and discuss the proposal here.
Ideally the syntax would be the same as in C.

Finally you would have to lower the target_clones, not sure offhand who
creates the ifunc dispatcher, the frontend or the middle-end.
There is expand_target_clones in gcc/multiple_target.cc that probably
comes into play. So yes, that seems to create the ifuncs if i read that
correctly. Hence the implementation in the frontend should not be all too
complicated, it seems.

To your initial remark about a technical reason not to support it i
would point you to what Tobias said to me some time ago and which i
certainly told other people more than once, too:
https://patchwork.ozlabs.org/comment/958570/

---8<---
In general, I prefer to stick to standard methods
(which are portable) and think that those user knobs often make things
slower than faster (as they tend to stay for years, even after the
hardware has moved on - or they are even inserted blindly).
---8<---

In former times, you would compile your library multiple times
and provide a distinct, optimized version for each of the CPUs.
Maybe that would work for you equally well, without target_clones?

HTH
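For reference, the C spelling that a Fortran directive would ideally mirror looks roughly like the following. This is a minimal sketch, not taken from the thread: the kernel `sum_sq`, the macro `SUM_SQ_CLONES`, and the particular target list are hypothetical, and the preprocessor guard (GCC, x86-64, glibc) is an assumption about where ifunc-based dispatch is available. Elsewhere the function is simply compiled once.

```c
#include <stddef.h>
#include <stdlib.h>   /* on glibc systems this also defines __GLIBC__ */

/* Assumption: enable the attribute only where ifunc dispatch is expected
   to work (GCC proper, x86-64, glibc). Otherwise expand to nothing. */
#if defined(__GNUC__) && !defined(__clang__) && \
    defined(__x86_64__) && defined(__GLIBC__)
#define SUM_SQ_CLONES \
    __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define SUM_SQ_CLONES
#endif

/* Hypothetical hot kernel: sum of squares of n doubles.
   With the attribute active, GCC emits one clone per listed target plus
   a resolver; callers keep calling plain sum_sq(). */
SUM_SQ_CLONES
double sum_sq(const double *x, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * x[i];
    return acc;
}
```

The point of the guard macro is that the target strings are architecture-specific, so portable sources need some form of preprocessing around the attribute.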
* Re: adding attributes
From: Dave Love @ 2022-10-31 21:19 UTC
To: fortran

Bernhard Reutner-Fischer via Fortran <fortran@gcc.gnu.org> writes:

> Well we already have
> !GCC$ ATTRIBUTES attribute-list :: var-name [, var-name] ...
>
> See https://gcc.gnu.org/onlinedocs/gfortran/ATTRIBUTES-directive.html

Yes, that's what I was hoping was simple to extend.  Sorry I didn't say
explicitly.

> For target_clones you would most likely need a slightly different parser
> for you need the user to specify the actual target_clones somehow. You
> would probably make a suggestion and discuss the proposal here.
> Ideally the syntax would be the same as in C.

Right.  I hoped it would be possible to lift machinery easily from C.
It wasn't obvious you could, but I didn't spend much time when I looked
at it a while ago.

> ---8<---
> In general, I prefer to stick to standard methods
> (which are portable) and think that those user knobs often make things
> slower than faster (as they tend to stay for years, even after the
> hardware has moved on - or they are even inserted blindly).
> ---8<---

There's no standard method for this sort of portable performance
engineering as far as I can tell.  The best I could see was specifying a
SIMD length statically in OpenMP.  I'm interested in things that
potentially make the difference between, say, vectorization for AVX2 or
full-width AVX512 versus SSE2 for profiled hot-spots.  I fully agree
about measurement and not doing things blindly, and I prize
maintainability.  However, target_clones is clearly better than the
existing facility for explicit, target-independent unrolling, for instance.

> In former times, you would compile your library multiple times
> and provide a distinct, optimized version for each of the CPUs.
> Maybe that would work for you equally well, without target_clones?

"Former times" to me means, say, GEC 4000 v. IBM 370 and the aftermath
of "all the world's a VAX", rather than different x86
micro-architectures...  I do now work on both x86_64 and POWER.

Multiple compilation isn't a good solution.  I haven't followed the
current state of hardware capability support, but relevant systems don't
have it on x86_64, at least.  That wouldn't help kernels of your
simulation code that aren't abstracted into a library or set up for
dynamic dispatch anyway.  I don't have a specific instance in mind, but
consider OS packaging, which I do; that currently has to be built for
base x86_64 (SSE2) for EPEL, at least, and so could miss a factor of
several performance from vectorized.

> HTH

Thanks.  Definitely a more helpful response than when I asked about
doing something previously!  (I don't know if I'll actually be able to
work on it in the end, at least on work time.)
* Re: adding attributes
From: Bernhard Reutner-Fischer @ 2022-11-02 23:19 UTC
To: Dave Love via Fortran; +Cc: rep.dot.nop, Dave Love

On Mon, 31 Oct 2022 21:19:18 +0000
Dave Love via Fortran <fortran@gcc.gnu.org> wrote:

> Bernhard Reutner-Fischer via Fortran <fortran@gcc.gnu.org> writes:

> > Ideally the syntax would be the same as in C.
>
> Right.  I hoped it would be possible to lift machinery easily from C.

Lifting that won't work easily, no.

> There's no standard method for this sort of portable performance
> engineering as far as I can tell.  The best I could see was specifying a
> SIMD length statically in OpenMP.  I'm interested in things that
> potentially make the difference between, say, vectorization for AVX2 or
> full-width AVX512 versus SSE2 for profiled hot-spots.  I fully agree

I see.
So target_clones is one thing. What other attributes would be important?

> about measurement and not doing things blindly, and I prize
> maintainability.  However, target_clones is clearly better than the
> existing facility for explicit, target-independent unrolling, for instance.

Yes. Unroll is certainly only applicable in a few places, sure.

> > In former times, you would compile your library multiple times
> > and provide a distinct, optimized version for each of the CPUs.
> > Maybe that would work for you equally well, without target_clones?
>
> "Former times" to me means, say, GEC 4000 v. IBM 370 and the aftermath
> of "all the world's a VAX", rather than different x86
> micro-architectures...  I do now work on both x86_64 and POWER.

In your job script you would use cpuid(1) to determine a properly tuned
binary for the parts of the cluster you run on. Or the installed
binaries are tuned for the host they are installed on and are located in
a uniform place per application.

> Multiple compilation isn't a good solution.  I haven't followed the

It might not be good, but it's cheap and easy if you only have a small
set of different arches and subarches each. In a controlled
environment, with a batch scheduler. Won't work in the wild of course.

> current state of hardware capability support, but relevant systems don't
> have it on x86_64, at least.  That wouldn't help kernels of your
> simulation code that aren't abstracted into a library or set up for
> dynamic dispatch anyway.  I don't have a specific instance in mind, but
> consider OS packaging, which I do; that currently has to be built for
> base x86_64 (SSE2) for EPEL, at least, and so could miss a factor of
> several performance from vectorized.

For packaging for global use that won't work all that well indeed.
But since you cannot mix target_clones across arch-boundaries,
supporting those for a distro will probably be rather ugly anyway.
I think that's what gentoo et al. are for, or your privately rebuilt
debian repo; provide a tuned world for everybody, individually ;)
But as you mentioned EPEL i never said that :)

> > HTH
>
> Thanks.  Definitely a more helpful response than when I asked about
> doing something previously!  (I don't know if I'll actually be able to
> work on it in the end, at least on work time.)

heh, me neither. Luckily yesterday was a holiday, so what i ended up
with was the following, fya.
Consider:

$ grep -v "^\!\!" /scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/attr_target_clones-1.f90;echo EOF
! { dg-do compile }
! { dg-options "-O1 -fdump-tree-optimized" }
!
! Test __attribute__ ((target_clones ("foo", "bar")))
!
module m
  implicit none
contains
  subroutine sub1()
!GCC$ ATTRIBUTES target_clones("avx", "sse","default") :: sub1
    print *, 4321
  end
end module m
! { dg-final { scan-tree-dump-times {void * __m_MOD_sub1.resolver ()} "optimized" 1 } }
! { dg-final { scan-tree-dump-times {void __m_MOD_sub1.avx ()} "optimized" 1 } }
! { dg-final { scan-tree-dump-times {void __m_MOD_sub1.sse ()} "optimized" 1 } }
!!! { dg-final { scan-tree-dump-times {XXX something sub1.default ()} "optimized" 1 } }
! { dg-final { scan-tree-dump-not {void sub1 ()} "optimized" } }
EOF

Which gives

$ ./gfortran -B. -o /tmp/out.o -c /scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/attr_target_clones-1.f90 -O2 -fdump-tree-original -fdump-tree-optimized
/tmp/ccxpGd9Y.s: Assembler messages:
/tmp/ccxpGd9Y.s:118: Error: symbol `__m_MOD_sub1' is already defined

That's because that ends up as

$ nl -ba /tmp/out.s | grep __m_MOD_sub1
    12  .type __m_MOD_sub1, @function
    13  __m_MOD_sub1:
    35  .size __m_MOD_sub1, .-__m_MOD_sub1
    36  .type __m_MOD_sub1.avx, @function
    37  __m_MOD_sub1.avx:
    59  .size __m_MOD_sub1.avx, .-__m_MOD_sub1.avx
    60  .type __m_MOD_sub1.sse, @function
    61  __m_MOD_sub1.sse:
    83  .size __m_MOD_sub1.sse, .-__m_MOD_sub1.sse
    84  .section .text.__m_MOD_sub1.resolver,"axG",@progbits,__m_MOD_sub1.resolver,comdat
    85  .weak __m_MOD_sub1.resolver
    86  .type __m_MOD_sub1.resolver, @function
    87  __m_MOD_sub1.resolver:
    95  movl $__m_MOD_sub1.avx, %eax
   104  movl $__m_MOD_sub1, %eax
   105  movl $__m_MOD_sub1.sse, %edx
   110  .size __m_MOD_sub1.resolver, .-__m_MOD_sub1.resolver
   111  .globl __m_MOD_sub1
   112  .type __m_MOD_sub1, @gnu_indirect_function
   113  .set __m_MOD_sub1,__m_MOD_sub1.resolver

where 13 and 111 probably don't work out too well.

The C frontend uses sub1.default as version for the (former) plain sub1:

     4  .type sub1.default, @function
     5  sub1.default:
...
   103  .section .text.sub1.resolver,"axG",@progbits,sub1.resolver,comdat
   105  .weak sub1.resolver
   106  .type sub1.resolver, @function
   107  sub1.resolver:
...
   162  leaq sub1.default(%rip), %rax
   167  .size sub1.resolver, .-sub1.resolver
   168  .globl sub1
   169  .type sub1, @gnu_indirect_function
   170  .set sub1,sub1.resolver

If i mark the module fndecl as DECL_FUNCTION_VERSIONED, then it's
pointed out that i seem to have to provide the default by hand:

   10 | subroutine sub1()
      | 1
internal compiler error: in ix86_mangle_function_version_assembler_name, at config/i386/i386-features.cc:3165
0x806780 ix86_mangle_function_version_assembler_name

That's the check that there is

  /* target attribute string cannot be NULL.  */
  gcc_assert (version_attr != NULL_TREE);

So while target and target_clones seem to be mutually exclusive (from
the C FE checking), the versioning wants the default in a target attr or
something like that.

And on top of all that, gfc_match_gcc_attributes has the following comment:

   TODO: We should support all GCC attributes using the same syntax for
   the attribute list, i.e. the list in C
      __attributes(( attribute-list ))
   matches then
      !GCC$ ATTRIBUTES attribute-list ::
   Cf. c-parser.cc's c_parser_attributes; the data can then directly be
   saved into a TREE.

   When we do that, we can get rid of ext_attr_list[] because that would
   be generated right from the start.

I've added a

   /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
   unsigned ext_attr:EXT_ATTR_NUM;
+  tree ext_attr_args;

to struct symbol_attribute where i can prepare the tree_list for the
attrs right from the start. The lowering is then rather simple and
uniform, just chainon the prepared attributes and be done.

One could get rid of ext_attr altogether, with the caveat that this
would change the module format. We'd have to save the attrs in a
different way, breaking module compat again, of course.

target_clones does not require a bump in the module format, i'd say,
because the main entry point does not change. Will have to check if
the clones do not end up being emitted in the module, they shouldn't be.
Other attributes _may_ require a change in the module format though.
These would need checking on a per case basis.

That said, one cannot import all attributes handling from the C FE
into the fortran FE seamlessly. There is always a bit of massaging
required.
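The "cheap and easy" alternative discussed above, choosing a separately compiled build per host, can also be sketched in C rather than in a job script. This is only an illustration under stated assumptions: `tuned_variant`, `build_path`, the variant names, and the `/opt/<app>/<variant>/bin/<app>` layout are all hypothetical; only `__builtin_cpu_init` and `__builtin_cpu_supports` are real GCC/Clang x86 built-ins.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: name of the build variant suited to this host.
   On non-x86 hosts, or with other compilers, it reports "baseline". */
const char *tuned_variant(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                 /* populate the CPU feature data */
    if (__builtin_cpu_supports("avx512f"))
        return "avx512";
    if (__builtin_cpu_supports("avx2"))
        return "avx2";
#endif
    return "baseline";
}

/* Hypothetical layout: one install tree per variant, same binary name.
   Returns the snprintf result (length of the formatted path). */
int build_path(char *buf, size_t len, const char *app)
{
    return snprintf(buf, len, "/opt/%s/%s/bin/%s", app, tuned_variant(), app);
}
```

This works in a controlled environment with a known set of variants, which is the limitation the message itself points out.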
* Re: adding attributes
From: Bernhard Reutner-Fischer @ 2022-11-04 20:59 UTC
To: Dave Love via Fortran; +Cc: rep.dot.nop, Dave Love

On Thu, 3 Nov 2022 00:19:26 +0100
Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:

> So target_clones is one thing. What other attributes would be important?

> > doing something previously!  (I don't know if I'll actually be able to
> > work on it in the end, at least on work time.)
>
> heh, me neither. Luckily yesterday was a holiday, so what i ended up
> with was the following, fya.
> Consider:

$ cat gcc/testsuite/gfortran.dg/attr_target_clones-1.F90; echo EOF
! { dg-require-ifunc "" }
! { dg-options "-O1" }
! { dg-additional-options "-fdump-tree-optimized" }
! It seems arch defines are not defined?!
! See fortran.cpp FIXME: Pandora's Box
! Ok, so enterprise-level bugfix:
! { dg-additional-options "-D__i386__=1" { target { i?86-*-* x86_64-*-* } } }
! { dg-additional-options "-D__powerpc__=1" { target { powerpc*-*-* } } }
! { dg-skip-if "test not yet implemented for target" { ! {i?86-*-* x86_64-*-* powerpc*-*-*} } }
!! { dg- skip-if "needs optimize" { *-*-* } { "*" } { " -O0 " } }
! Test __attribute__ ((target_clones ("foo", "bar")))
!
module m
  implicit none
contains
  subroutine sub1()
#if defined __i386__ || defined __x86_64__
!GCC$ ATTRIBUTES target_clones("avx", "sse","default") :: sub1
#elif defined __powerpc__
!GCC$ ATTRIBUTES target_clones("power10", "power9","default") :: sub1
#endif
    print *, 4321
  end
end module m
! { dg-final { scan-tree-dump-times {(?n)void \* __m_MOD_sub1\.resolver \(\)} 1 "optimized" } }
! { dg-final { scan-tree-dump-times {(?n)void __m_MOD_sub1\.(?:avx|power10) \(\)} 1 "optimized" } }
! { dg-final { scan-tree-dump-times {(?n)void __m_MOD_sub1\.(?:sse|power9) \(\)} 1 "optimized" } }
! { dg-final { scan-tree-dump-times {(?n)void sub1 \(\)} 1 "optimized" } }
!! and a non-assembly hint on the ifunc
! { dg-final { scan-tree-dump-times {Function sub1 \(__m_MOD_sub1\.default,} 1 "optimized" } }
EOF

2 patches:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605081.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604981.html
(the testcase mentioned in the latter is superseded by the blurb above)

One would have to clean up the parser (see "XXX: Rephrase this in a sane,
understandable manner..") and add some more testcases, for several
malformed attribute strings. Maybe i'll get to it during the weekend or
some evening.

Not sure about the usefulness though. And not sure if fellow
gfortraners would accept this attribute target_clones in there in the
first place..

cheers,
* Re: adding attributes
From: Thomas Koenig @ 2022-11-05 7:40 UTC
To: Bernhard Reutner-Fischer, Dave Love via Fortran; +Cc: Dave Love

On 04.11.22 21:59, Bernhard Reutner-Fischer via Fortran wrote:
> And not sure if fellow gfortraners would accept this attribute
> target_clones in there in the first place..

It might actually be useful.  Is there any change about
the calling sequence or anything else that should be visible
in a Fortran module or the calling sequence?

Thomas
* Re: adding attributes
From: Bernhard Reutner-Fischer @ 2022-11-05 10:54 UTC
To: Thomas Koenig; +Cc: rep.dot.nop, Dave Love via Fortran, Dave Love

[-- Attachment #1: Type: text/plain, Size: 1673 bytes --]

On Sat, 5 Nov 2022 08:40:06 +0100
Thomas Koenig <tkoenig@netcologne.de> wrote:

> On 04.11.22 21:59, Bernhard Reutner-Fischer via Fortran wrote:
> > And not sure if fellow gfortraners would accept this attribute
> > target_clones in there in the first place..
>
> It might actually be useful.  Is there any change about
> the calling sequence or anything else that should be visible
> in a Fortran module or the calling sequence?

The module interface remains the same. And the call sequence
remains the same, too. For a user nothing changes.

An example:

module m
  implicit none
contains
  subroutine sub1()
!GCC$ ATTRIBUTES target_clones("avx", "sse","default") :: sub1
    print *, 4321
  end
end module m

This used to compile to:

$ nm /tmp/pristine.o
                 U _gfortran_st_write
                 U _gfortran_st_write_done
                 U _gfortran_transfer_integer_write
0000000000000000 T __m_MOD_sub1

And now compiles to:

$ nm /tmp/new.o
                 U __cpu_indicator_init
                 U __cpu_model
                 U _gfortran_st_write
                 U _gfortran_st_write_done
                 U _gfortran_transfer_integer_write
0000000000000000 i __m_MOD_sub1
000000000000006e t __m_MOD_sub1.avx
0000000000000000 t __m_MOD_sub1.default
0000000000000000 W __m_MOD_sub1.resolver
00000000000000dc t __m_MOD_sub1.sse

I.e. the caller still calls __m_MOD_sub1
But this is now an ifunc, which looks at the cpu bits and dispatches to
the appropriate ISA version.

I'm attaching the assembler input for reference.

If you think that we want to add support for that attribute, i can
submit a proper patch. Just let me know please.

thanks,

[-- Attachment #2: new.s --]
[-- Type: text/plain, Size: 3040 bytes --]

        .file   "attr_target_clones-1.F90"
        .text
        .section        .rodata
        .align 8
.LC0:
        .string "/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/attr_target_clones-1.F90"
        .align 4
.LC1:
        .long   4321
        .text
        .type   __m_MOD_sub1.default, @function
__m_MOD_sub1.default:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $528, %rsp
        movq    $.LC0, -520(%rbp)
        movl    $21, -512(%rbp)
        movl    $128, -528(%rbp)
        movl    $6, -524(%rbp)
        leaq    -528(%rbp), %rax
        movq    %rax, %rdi
        call    _gfortran_st_write
        leaq    -528(%rbp), %rax
        movl    $4, %edx
        movl    $.LC1, %esi
        movq    %rax, %rdi
        call    _gfortran_transfer_integer_write
        leaq    -528(%rbp), %rax
        movq    %rax, %rdi
        call    _gfortran_st_write_done
        nop
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   __m_MOD_sub1.default, .-__m_MOD_sub1.default
        .type   __m_MOD_sub1.avx, @function
__m_MOD_sub1.avx:
.LFB1:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $528, %rsp
        movq    $.LC0, -520(%rbp)
        movl    $21, -512(%rbp)
        movl    $128, -528(%rbp)
        movl    $6, -524(%rbp)
        leaq    -528(%rbp), %rax
        movq    %rax, %rdi
        call    _gfortran_st_write
        leaq    -528(%rbp), %rax
        movl    $4, %edx
        movl    $.LC1, %esi
        movq    %rax, %rdi
        call    _gfortran_transfer_integer_write
        leaq    -528(%rbp), %rax
        movq    %rax, %rdi
        call    _gfortran_st_write_done
        nop
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE1:
        .size   __m_MOD_sub1.avx, .-__m_MOD_sub1.avx
        .type   __m_MOD_sub1.sse, @function
__m_MOD_sub1.sse:
.LFB2:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $528, %rsp
        movq    $.LC0, -520(%rbp)
        movl    $21, -512(%rbp)
        movl    $128, -528(%rbp)
        movl    $6, -524(%rbp)
        leaq    -528(%rbp), %rax
        movq    %rax, %rdi
        call    _gfortran_st_write
        leaq    -528(%rbp), %rax
        movl    $4, %edx
        movl    $.LC1, %esi
        movq    %rax, %rdi
        call    _gfortran_transfer_integer_write
        leaq    -528(%rbp), %rax
        movq    %rax, %rdi
        call    _gfortran_st_write_done
        nop
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE2:
        .size   __m_MOD_sub1.sse, .-__m_MOD_sub1.sse
        .section        .text.__m_MOD_sub1.resolver,"axG",@progbits,__m_MOD_sub1.resolver,comdat
        .weak   __m_MOD_sub1.resolver
        .type   __m_MOD_sub1.resolver, @function
__m_MOD_sub1.resolver:
.LFB4:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        call    __cpu_indicator_init
        movl    __cpu_model+12(%rip), %eax
        andl    $512, %eax
        testl   %eax, %eax
        jle     .L5
        movl    $__m_MOD_sub1.avx, %eax
        jmp     .L4
.L5:
        movl    __cpu_model+12(%rip), %eax
        andl    $8, %eax
        testl   %eax, %eax
        jle     .L6
        movl    $__m_MOD_sub1.sse, %eax
        jmp     .L4
.L6:
        movl    $__m_MOD_sub1.default, %eax
.L4:
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE4:
        .size   __m_MOD_sub1.resolver, .-__m_MOD_sub1.resolver
        .globl  __m_MOD_sub1
        .type   __m_MOD_sub1, @gnu_indirect_function
        .set    __m_MOD_sub1,__m_MOD_sub1.resolver
        .ident  "GCC: (GNU) 13.0.0 20220916 (experimental) [master r13-2694-g3e8c4b925a9]"
        .section        .note.GNU-stack,"",@progbits
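The resolver in the attached assembly checks the CPU feature bits and returns the address of the `avx`, `sse`, or `default` clone, in that order. A hand-written C analogue of that selection logic might look like the sketch below. It is an illustration only: the names `sub1_avx`, `sub1_sse`, `sub1_default`, `resolve_sub1`, and `call_sub1` are hypothetical, the clones here just return a tag instead of doing real work, and the pointer is cached on first call, whereas a real ifunc is resolved by the dynamic loader when the symbol is bound.

```c
#include <stddef.h>

typedef int (*sub1_fn)(void);

/* Hypothetical stand-ins for the per-ISA clones; each returns a tag so
   the caller can see which one was selected. */
int sub1_default(void) { return 0; }
int sub1_sse(void)     { return 1; }
int sub1_avx(void)     { return 2; }

/* Same priority order as the generated resolver: avx, then sse, then
   default. The feature checks are GCC/Clang x86 built-ins; on other
   targets the default version is always chosen. */
sub1_fn resolve_sub1(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return sub1_avx;
    if (__builtin_cpu_supports("sse"))
        return sub1_sse;
#endif
    return sub1_default;
}

/* Single entry point: callers never name a clone directly. */
int call_sub1(void)
{
    static sub1_fn impl = NULL;   /* resolved once, then reused */
    if (impl == NULL)
        impl = resolve_sub1();
    return impl();
}
```

As in the `nm` output above, the caller-visible name stays the same; only what it resolves to depends on the host CPU.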
* Re: adding attributes
From: Thomas Koenig @ 2022-11-06 13:44 UTC
To: Bernhard Reutner-Fischer; +Cc: Dave Love via Fortran, Dave Love

Hi Bernhard,

> If you think that we want to add support for that attribute, i can
> submit a proper patch. Just let me know please.

I think this attribute makes sense, especially if people want to
compile once and then port to different architectures.

So, yes please submit a patch, if you would be so kind :-)

Best regards

	Thomas
* Re: adding attributes
From: Dave Love @ 2022-11-07 11:06 UTC
To: fortran

Thomas Koenig via Fortran <fortran@gcc.gnu.org> writes:

> Hi Bernhard,
>
>> If you think that we want to add support for that attribute, i can
>> submit a proper patch. Just let me know please.
>
> I think this attribute makes sense, especially if people want to
> compile once and then port to different architectures.
>
> So, yes please submit a patch, if you would be so kind :-)
>
> Best regards
>
> 	Thomas

I'll make a list of others I think would be useful for consideration.  I
should probably have looked at what similar facilities proprietary
compilers provide.

I should say the intent of optimization-related attributes would be to
encourage adding them to known performance-sensitive kernels, and
potentially avoid having to write C.  However, I doubt I can persuade
the people who are convinced of the great superiority of ifort, and the
desirability of always adding -axCORE-AVX512, without measuring or
understanding GCC options.

I think an equivalent of "#pragma GCC optimize" could be useful too.  I
don't have a specific case in mind, but an example might be where you
need localized standard-conforming arithmetic, but other parts of the
program benefit from -ffast-math.
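In C, the per-function counterpart of "#pragma GCC optimize" is the `optimize` function attribute. The sketch below shows the "localized standard-conforming arithmetic" case: a compensated (Kahan) summation, whose correction term can be optimized away under -ffast-math. The function name `kahan_sum` and the macro `STRICT_FP` are hypothetical, and the guard assumes GCC proper, since other compilers may not accept the attribute. Note also that the GCC manual recommends the `optimize` attribute for debugging rather than production use, so treat this as an illustration of the idea, not a recommendation.

```c
#include <stddef.h>

/* Assumption: only GCC proper understands this attribute string. */
#if defined(__GNUC__) && !defined(__clang__)
#define STRICT_FP __attribute__((optimize("no-fast-math")))
#else
#define STRICT_FP
#endif

/* Kahan summation: c carries the rounding error of each addition.
   Under value-unsafe FP optimizations (t - sum) - y may be folded to
   zero, which is why this one function asks for strict arithmetic. */
STRICT_FP
double kahan_sum(const double *x, size_t n)
{
    double sum = 0.0;
    double c = 0.0;               /* running compensation */
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;      /* apply the previous correction */
        double t = sum + y;       /* low-order bits of y may be lost here */
        c = (t - sum) - y;        /* recover what was lost */
        sum = t;
    }
    return sum;
}
```

A Fortran equivalent would presumably attach the same option string to a procedure through the ATTRIBUTES directive; no such syntax is confirmed in this thread.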
* Re: adding attributes
From: Dave Love @ 2023-02-24 12:24 UTC
To: Bernhard Reutner-Fischer via Fortran
Cc: Thomas Koenig, Bernhard Reutner-Fischer

I noticed other attributes have been added now.  What happened to
target_clones, in particular?  (Thanks for working on it.)
* Re: adding attributes
From: Dave Love @ 2022-11-07 11:04 UTC
To: fortran

Bernhard Reutner-Fischer via Fortran <fortran@gcc.gnu.org> writes:

> I see.
> So target_clones is one thing. What other attributes would be important?

At least optimization-related ones could be useful, and possibly others.
I haven't made a list, but could do.

> In your job script you would use cpuid(1) to determine a properly tuned
> binary for the parts of the cluster you run on. Or the installed
> binaries are tuned for the host they are installed on and are located in
> a uniform place per application.
>
>>
>> Multiple compilation isn't a good solution.  I haven't followed the
>
> It might not be good, but it's cheap and easy if you only have a small
> set of different arches and subarches each. In a controlled
> environment, with a batch scheduler. Won't work in the wild of course.

I know all that you can do, but my opinions are from extensive
experience managing rather heterogeneous HPC clusters and working on
dynamic dispatch in libraries.  (The worst thing about gfortran for
system management is the lack of backwards-compatibility in module
formats and libgfortran.)

> But since you cannot mix target_clones across arch-boundaries,
> supporting those for a distro will probably be rather ugly anyway.

Yes, you need simple pre-processing, as you do for the attributes in C,
unless there was some extra guard facility added.

> heh, me neither. Luckily yesterday was a holiday, so what i ended up
> with was the following, fya.

Gosh; I thought it would take a while even if you knew your way around.
I didn't want to spoil a holiday!  (I'd aim to do such things on work
time.)

> I've added a
>    /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
>    unsigned ext_attr:EXT_ATTR_NUM;
> +  tree ext_attr_args;
>
> to struct symbol_attribute where i can prepare the tree_list for the
> attrs right from the start. The lowering is then rather simple and
> uniform, just chainon the prepared attributes and be done.

If I understand correctly, I could go through and add ones that look
useful (for debate).  I have experience of using several in C (at least
once even for g77 runtime).

> target_clones does not require a bump in the module format, i'd say,
> because the main entry point does not change. Will have to check if
> the clones do not end up being emitted in the module, they shouldn't be.
> Other attributes _may_ require a change in the module format though.
> These would need checking on a per case basis.

I don't understand the module format, but I wouldn't have expected
relevant attributes to change interfaces.

Anyway, thanks!
* Re: adding attributes
From: Bernhard Reutner-Fischer @ 2022-11-10 12:25 UTC
To: Dave Love via Fortran; +Cc: rep.dot.nop, Dave Love

Hi!

On Mon, 07 Nov 2022 11:04:17 +0000
Dave Love via Fortran <fortran@gcc.gnu.org> wrote:

> Bernhard Reutner-Fischer via Fortran <fortran@gcc.gnu.org> writes:
>
> > I see.
> > So target_clones is one thing. What other attributes would be important?
>
> At least optimization-related ones could be useful, and possibly others.
> I haven't made a list, but could do.

Please do. And yes, i can see that __attribute__((__optimize__(...)))
would be useful.

> dynamic dispatch in libraries.  (The worst thing about gfortran for
> system management is the lack of backwards-compatibility in module
> formats and libgfortran.)

yea. IIRC there was discussion a couple of years back what we could do
about the module format. Nowadays i'd probably just use JSON, but i
did not think too much about it. Needless to say that rewriting the mio
(module I/O) would take more than one evening :)
I remember some wrinkles there when i played around with the
fortran-fe-stringpool idea (which reminds me i should pick up again,
maybe).

> > But since you cannot mix target_clones across arch-boundaries,
> > supporting those for a distro will probably be rather ugly anyway.
>
> Yes, you need simple pre-processing, as you do for the attributes in C,
> unless there was some extra guard facility added.

Yes indeed. But this would be much easier to handle if we'd have
actual arch defines. Until we have, you'd have to run this through cpp
manually which is doable but not all that convenient IMHO.

Hm, didn't we have a syntax for arch conditions in the math vec?
We could hijack that, but it's probably still better to just fix the
arch defines as that's generally useful.

> > heh, me neither. Luckily yesterday was a holiday, so what i ended up
> > with was the following, fya.
>
> Gosh; I thought it would take a while even if you knew your way around.
> I didn't want to spoil a holiday!  (I'd aim to do such things on work
> time.)

No problem, it was just for fun. I spent most of the time to scratch my
head why the attribute didn't work for i had wrapped it in an arch
ifdef for the testsuite to cover both i386 and ppc. And of course i
only noticed very, very late what was really going on ;)

> > I've added a
> >    /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
> >    unsigned ext_attr:EXT_ATTR_NUM;
> > +  tree ext_attr_args;
> >
> > to struct symbol_attribute where i can prepare the tree_list for the
> > attrs right from the start. The lowering is then rather simple and
> > uniform, just chainon the prepared attributes and be done.
>
> If I understand correctly, I could go through and add ones that look
> useful (for debate).  I have experience of using several in C (at least
> once even for g77 runtime).

Yes please, that'd be interesting.

> > target_clones does not require a bump in the module format, i'd say,
> > because the main entry point does not change. Will have to check if
> > the clones do not end up being emitted in the module, they shouldn't be.
> > Other attributes _may_ require a change in the module format though.
> > These would need checking on a per case basis.
>
> I don't understand the module format, but I wouldn't have expected
> relevant attributes to change interfaces.

Well that should probably not be needed indeed for most attributes, yes.
But then, i do think we stream out the ext_attr, at least for certain
attributes like "cdecl", the dll{im,ex}port, {std,fast}call et al.
See module.cc, mio_symbol_attribute.
So the bits that change the calling convention have to be brought to the
attention of the module consumer probably. Think regparm or sseregparm
for example i guess.

That said, if i comment out the invalid cases of the test, i get with
gcc-12:

$ gfortran -c -o /tmp/out0.o /scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90
/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90:33:15:

   33 | cdecl => sub2
      | 1
Warning: ‘cdecl’ attribute ignored [-Wattributes]
/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90:33:15: Warning: ‘cdecl’ attribute ignored [-Wattributes]
/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90:34:17:

   34 | stdcall => sub3
      | 1
Warning: ‘stdcall’ attribute ignored [-Wattributes]
/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90:34:17: Warning: ‘stdcall’ attribute ignored [-Wattributes]
/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90:35:18:

   35 | fastcall => sub4
      | 1
Warning: ‘fastcall’ attribute ignored [-Wattributes]
/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90:35:18: Warning: ‘fastcall’ attribute ignored [-Wattributes]

so i'm not sure what these attributes do or are supposed to do really,
but i didn't look.
And in addition there is typo s/consitency/consistency/ in the test.

> Anyway, thanks!

You're welcome.

PS: You might have seen the patch to add "flatten". As can be seen, it
should now be rather straightforward to add simple attributes for
procedures.

cheers,
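For readers unfamiliar with the "flatten" attribute mentioned in the PS: in C it asks the compiler to inline, where possible, every call made inside the marked function. A minimal sketch follows; the functions `square`, `cube`, and `poly` and the macro `FLATTEN` are hypothetical, and the guard is an assumption so that the code still compiles with compilers lacking GNU attributes.

```c
/* Assumption: GNU-style attributes are available only with __GNUC__. */
#if defined(__GNUC__)
#define FLATTEN __attribute__((flatten))
#else
#define FLATTEN
#endif

/* Small helpers that would normally be separate calls. */
static double square(double v) { return v * v; }
static double cube(double v)   { return square(v) * v; }

/* With flatten, calls to cube() and square() in this body are inlined
   where the compiler can do so; the result is unchanged either way. */
FLATTEN
double poly(double v)
{
    return cube(v) + square(v) + 1.0;
}
```

Because an attribute like this affects only code generation inside the procedure and takes no arguments, it is the kind of "simple attribute" that should not alter the interface seen by callers.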