public inbox for fortran@gcc.gnu.org
* adding attributes
@ 2022-10-28 14:35 Dave Love
  2022-10-30  7:48 ` Bernhard Reutner-Fischer
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Love @ 2022-10-28 14:35 UTC (permalink / raw)
  To: fortran

Assuming there's no technical reason not to, can someone say what would
be involved in adding relevant attributes (at least function ones) like
those for C?  I'm thinking particularly of target_clones, but others
probably make sense.

I don't know my way around, but I had a quick look, and it at least
wouldn't be as straightforward as I hoped.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: adding attributes
  2022-10-28 14:35 adding attributes Dave Love
@ 2022-10-30  7:48 ` Bernhard Reutner-Fischer
  2022-10-31 21:19   ` Dave Love
  0 siblings, 1 reply; 12+ messages in thread
From: Bernhard Reutner-Fischer @ 2022-10-30  7:48 UTC (permalink / raw)
  To: Dave Love via Fortran; +Cc: rep.dot.nop, Dave Love

On Fri, 28 Oct 2022 15:35:45 +0100
Dave Love via Fortran <fortran@gcc.gnu.org> wrote:

> Assuming there's no technical reason not to, can someone say what would
> be involved in adding relevant attributes (at least function ones) like
> those for C?  I'm thinking particularly of target_clones, but others
> probably make sense.

Well we already have
!GCC$ ATTRIBUTES attribute-list :: var-name [, var-name] ...

See https://gcc.gnu.org/onlinedocs/gfortran/ATTRIBUTES-directive.html

> 
> I don't know my way around, but I had a quick look, and it at least
> wouldn't be as straightforward as I hoped.

See gcc/fortran/decl.cc gfc_match_gcc_* 

For example the CDECL attribute probably comes close to target_clones:
subroutine mysub1()
!GCC$ ATTRIBUTES CDECL :: mysub1
! body of mysub1 here
end subroutine mysub1

For target_clones you would most likely need a slightly different parser
for you need the user to specify the actual target_clones somehow. You
would probably make a suggestion and discuss the proposal here.
Ideally the syntax would be the same as in C.

Finally you would have to lower the target_clones, not sure offhand who
creates the ifunc dispatcher, the frontend or the middle-end. There is
expand_target_clones in gcc/multiple_target.cc that probably comes into play.
So yes, that seems to create the ifuncs if i read that correctly.
Hence the implementation in the frontend should not be all too
complicated, it seems.
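
For comparison, the C spelling that a Fortran directive would presumably
mirror can be sketched as follows. This is only an illustration, assuming
GCC on x86-64 GNU/Linux with ifunc support; the names dot and CLONES and the
chosen ISA list are made up here and do not come from any patch. On other
platforms the macro expands to nothing and a single plain version is built.

```c
#include <stddef.h>
#include <stdio.h>   /* pulls in the libc feature macros (__GLIBC__) */

/* Assumption: only enable the clones where GCC can emit an ifunc. */
#if defined(__GNUC__) && !defined(__clang__) \
    && defined(__x86_64__) && defined(__GLIBC__)
# define CLONES __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
# define CLONES /* no multiversioning: one ordinary definition */
#endif

/* Hypothetical hot kernel; GCC builds one copy per listed target plus a
   resolver, and callers keep calling plain dot().  */
CLONES
double dot(const double *a, const double *b, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++)
    s += a[i] * b[i];
  return s;
}
```

Callers need no changes: the dynamic loader runs the compiler-generated
resolver once and binds the symbol to the best clone for the running CPU.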

To your initial remark about a technical reason not to support it i
would point you to what Tobias said to me some time ago and which i
certainly told other people more than once, too:
https://patchwork.ozlabs.org/comment/958570/
---8<---
In general, I prefer to stick to standard methods
(which are portable) and think that those user knobs often make things
slower than faster (as they tend to stay for years, even after the
hardware has moved on - or they are even inserted blindly).
---8<---

In former times, you would compile your library multiple times
and provide a distinct, optimized version for each of the CPUs.
Maybe that would work for you equally well, without target_clones?

HTH


* Re: adding attributes
  2022-10-30  7:48 ` Bernhard Reutner-Fischer
@ 2022-10-31 21:19   ` Dave Love
  2022-11-02 23:19     ` Bernhard Reutner-Fischer
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Love @ 2022-10-31 21:19 UTC (permalink / raw)
  To: fortran

Bernhard Reutner-Fischer via Fortran <fortran@gcc.gnu.org> writes:

> Well we already have
> !GCC$ ATTRIBUTES attribute-list :: var-name [, var-name] ...
>
> See https://gcc.gnu.org/onlinedocs/gfortran/ATTRIBUTES-directive.html

Yes, that's what I was hoping was simple to extend.  Sorry I didn't say
explicitly.

> For target_clones you would most likely need a slightly different parser
> for you need the user to specify the actual target_clones somehow. You
> would probably make a suggestion and discuss the proposal here.
> Ideally the syntax would be the same as in C.

Right.  I hoped it would be possible to lift machinery easily from C.
It wasn't obvious you could, but I didn't spend much time when I looked
at it a while ago.

> ---8<---
> In general, I prefer to stick to standard methods
> (which are portable) and think that those user knobs often make things
> slower than faster (as they tend to stay for years, even after the
> hardware has moved on - or they are even inserted blindly).
> ---8<---

There's no standard method for this sort of portable performance
engineering as far as I can tell.  The best I could see was specifying a
SIMD length statically in OpenMP.  I'm interested in things that
potentially make the difference between, say, vectorization for AVX2 or
full-width AVX512 versus SSE2 for profiled hot spots.  I fully agree
about measurement and not doing things blindly, and I prize
maintainability.  However, target_clones is clearly better than the
existing facility for explicit, target-independent unrolling, for instance.

> In former times, you would compile your library multiple times
> and provide a distinct, optimized version for each of the CPUs.
> Maybe that would work for you equally well, without target_clones?

"Former times" to me means, say, GEC 4000 v. IBM 370 and the aftermath
of "all the world's a VAX", rather than different x86
micro-architectures...  I do now work on both x86_64 and POWER.

Multiple compilation isn't a good solution.  I haven't followed the
current state of hardware capability support, but relevant systems don't
have it on x86_64, at least.  That wouldn't help kernels of your
simulation code that aren't abstracted into a library or set up for
dynamic dispatch anyway.  I don't have a specific instance in mind, but
consider OS packaging, which I do; that currently has to be built for
base x86_64 (SSE2) for EPEL, at least, and so could miss a factor of
several in performance from vectorization.

> HTH

Thanks.  Definitely a more helpful response than when I asked about
doing something previously!  (I don't know if I'll actually be able to
work on it in the end, at least on work time.)



* Re: adding attributes
  2022-10-31 21:19   ` Dave Love
@ 2022-11-02 23:19     ` Bernhard Reutner-Fischer
  2022-11-04 20:59       ` Bernhard Reutner-Fischer
  2022-11-07 11:04       ` Dave Love
  0 siblings, 2 replies; 12+ messages in thread
From: Bernhard Reutner-Fischer @ 2022-11-02 23:19 UTC (permalink / raw)
  To: Dave Love via Fortran; +Cc: rep.dot.nop, Dave Love

On Mon, 31 Oct 2022 21:19:18 +0000
Dave Love via Fortran <fortran@gcc.gnu.org> wrote:

> Bernhard Reutner-Fischer via Fortran <fortran@gcc.gnu.org> writes:

> > Ideally the syntax would be the same as in C.  
> 
> Right.  I hoped it would be possible to lift machinery easily from C.

Lifting that won't work easily, no.

> There's no standard method for this sort of portable performance
> engineering as far as I can tell.  The best I could see was specifying a
> SIMD length statically in OpenMP.  I'm interested in things that
> potentially make the difference between, say, vectorization for AVX2 or
> full-width AVX512 versus SSE2 for profiled hot spots.  I fully agree

I see.
So target_clones is one thing. What other attributes would be important?

> about measurement and not doing things blindly, and I prize
> maintainability.  However, target_clones is clearly better than the
> existing facility for explicit, target-independent unrolling, for instance.

Yes. Unroll is certainly only applicable in a few places, sure.
> 
> > In former times, you would compile your library multiple times
> > and provide a distinct, optimized version for each of the CPUs.
> > Maybe that would work for you equally well, without target_clones?  
> 
> "Former times" to me means, say, GEC 4000 v. IBM 370 and the aftermath
> of "all the world's a VAX", rather than different x86
> micro-architectures...  I do now work on both x86_64 and POWER.

In your job script you would use cpuid(1) to determine a properly tuned
binary for the parts of the cluster you run on. Or the installed
binaries are tuned for the host they are installed on and are located in
a uniform place per application.

> 
> Multiple compilation isn't a good solution.  I haven't followed the

It might not be good, but it's cheap and easy if you only have a small
set of different arches and subarches each. In a controlled
environment, with a batch scheduler. Won't work in the wild of course.

> current state of hardware capability support, but relevant systems don't
> have it on x86_64, at least.  That wouldn't help kernels of your
> simulation code that aren't abstracted into a library or set up for
> dynamic dispatch anyway.  I don't have a specific instance in mind, but
> consider OS packaging, which I do; that currently has to be built for
> base x86_64 (SSE2) for EPEL, at least, and so could miss a factor of
> several in performance from vectorization.

For packaging for global use that won't work all that well indeed.

But since you cannot mix target_clones across arch-boundaries,
supporting those for a distro will probably be rather ugly anyway.
I think that's what Gentoo et al. are for, or your privately rebuilt
Debian repo; provide a tuned world for everybody, individually ;) But as
you mentioned EPEL i never said that :)

> 
> > HTH  
> 
> Thanks.  Definitely a more helpful response than when I asked about
> doing something previously!  (I don't know if I'll actually be able to
> work on it in the end, at least on work time.)

heh, me neither. Luckily yesterday was a holiday, so what i ended up
with was the following, fya.
Consider:
$ grep -v "^\!\!" /scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/attr_target_clones-1.f90;echo EOF
! { dg-do compile }
! { dg-options "-O1 -fdump-tree-optimized" }
!
! Test __attribute__ ((target_clones ("foo", "bar")))
!
module m
  implicit none
contains
  subroutine sub1()
!GCC$ ATTRIBUTES target_clones("avx", "sse","default") :: sub1
    print *, 4321
  end
end module m
! { dg-final { scan-tree-dump-times {void * __m_MOD_sub1.resolver ()} "optimized" 1 } }
! { dg-final { scan-tree-dump-times {void __m_MOD_sub1.avx ()} "optimized" 1 } }
! { dg-final { scan-tree-dump-times {void __m_MOD_sub1.sse ()} "optimized" 1 } }
!!! { dg-final { scan-tree-dump-times {XXX something sub1.default ()} "optimized" 1 } }
! { dg-final { scan-tree-dump-not {void sub1 ()} "optimized" } }
EOF
Which gives
$ ./gfortran -B. -o /tmp/out.o -c /scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/attr_target_clones-1.f90 -O2 -fdump-tree-original -fdump-tree-optimized
/tmp/ccxpGd9Y.s: Assembler messages:
/tmp/ccxpGd9Y.s:118: Error: symbol `__m_MOD_sub1' is already defined

That's because that ends up as
$ nl -ba /tmp/out.s | grep __m_MOD_sub1
    12		.type	__m_MOD_sub1, @function
    13	__m_MOD_sub1:
    35		.size	__m_MOD_sub1, .-__m_MOD_sub1
    36		.type	__m_MOD_sub1.avx, @function
    37	__m_MOD_sub1.avx:
    59		.size	__m_MOD_sub1.avx, .-__m_MOD_sub1.avx
    60		.type	__m_MOD_sub1.sse, @function
    61	__m_MOD_sub1.sse:
    83		.size	__m_MOD_sub1.sse, .-__m_MOD_sub1.sse
    84		.section	.text.__m_MOD_sub1.resolver,"axG",@progbits,__m_MOD_sub1.resolver,comdat
    85		.weak	__m_MOD_sub1.resolver
    86		.type	__m_MOD_sub1.resolver, @function
    87	__m_MOD_sub1.resolver:
    95		movl	$__m_MOD_sub1.avx, %eax
   104		movl	$__m_MOD_sub1, %eax
   105		movl	$__m_MOD_sub1.sse, %edx
   110		.size	__m_MOD_sub1.resolver, .-__m_MOD_sub1.resolver
   111		.globl	__m_MOD_sub1
   112		.type	__m_MOD_sub1, @gnu_indirect_function
   113		.set	__m_MOD_sub1,__m_MOD_sub1.resolver

where 13 and 111 probably don't work out too well.
The C frontend uses sub1.default as version for the (former) plain sub1:
     4		.type	sub1.default, @function
     5	sub1.default:
...
   103		.section	.text.sub1.resolver,"axG",@progbits,sub1.resolver,comdat
   105		.weak	sub1.resolver
   106		.type	sub1.resolver, @function
   107	sub1.resolver:
...
   162		leaq	sub1.default(%rip), %rax
   167		.size	sub1.resolver, .-sub1.resolver
   168		.globl	sub1
   169		.type	sub1, @gnu_indirect_function
   170		.set	sub1,sub1.resolver

If i mark the module fndecl as DECL_FUNCTION_VERSIONED, then it's
pointed out that i seem to have to provide the default by hand:
   10 |   subroutine sub1()
      |                 1
internal compiler error: in ix86_mangle_function_version_assembler_name, at config/i386/i386-features.cc:3165
0x806780 ix86_mangle_function_version_assembler_name

That's the check that there is
  /* target attribute string cannot be NULL.  */
  gcc_assert (version_attr != NULL_TREE);
So while target and target_clones seem to be mutually exclusive (from
the C FE checking), the versioning wants the default in a target attr
or something like that.

And on top of all that, gfc_match_gcc_attributes has the following
comment:
   TODO: We should support all GCC attributes using the same syntax for
   the attribute list, i.e. the list in C
      __attributes(( attribute-list ))
   matches then
      !GCC$ ATTRIBUTES attribute-list ::
   Cf. c-parser.cc's c_parser_attributes; the data can then directly be
   saved into a TREE.

When we do that, we can get rid of ext_attr_list[] because that would
be generated right from the start.

I've added a
   /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
   unsigned ext_attr:EXT_ATTR_NUM;
+  tree ext_attr_args;

to struct symbol_attribute where i can prepare the tree_list for the
attrs right from the start. The lowering is then rather simple and
uniform, just chainon the prepared attributes and be done.

One could get rid of ext_attr altogether, with the caveat that this
would change the module format. We'd have to save the attrs in a
different way, breaking module compat again, of course.

target_clones does not require a bump in the module format, i'd say,
because the main entry point does not change. Will have to check if
the clones do not end up being emitted in the module, they shouldn't be.
Other attributes _may_ require a change in the module format though.
These would need checking on a per case basis.

That said, one cannot import all attributes handling from the C FE into
the fortran FE seamlessly. There is always a bit of massaging required.


* Re: adding attributes
  2022-11-02 23:19     ` Bernhard Reutner-Fischer
@ 2022-11-04 20:59       ` Bernhard Reutner-Fischer
  2022-11-05  7:40         ` Thomas Koenig
  2022-11-07 11:04       ` Dave Love
  1 sibling, 1 reply; 12+ messages in thread
From: Bernhard Reutner-Fischer @ 2022-11-04 20:59 UTC (permalink / raw)
  To: Dave Love via Fortran; +Cc: rep.dot.nop, Dave Love

On Thu, 3 Nov 2022 00:19:26 +0100
Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:

> So target_clones is one thing. What other attributes would be important?

> > doing something previously!  (I don't know if I'll actually be able to
> > work on it in the end, at least on work time.)  
> 
> heh, me neither. Luckily yesterday was a holiday, so what i ended up
> with was the following, fya.
> Consider:

$ cat gcc/testsuite/gfortran.dg/attr_target_clones-1.F90; echo EOF
! { dg-require-ifunc "" }
! { dg-options "-O1" }
! { dg-additional-options "-fdump-tree-optimized" }
! It seems arch defines are not defined?!
! See fortran.cpp  FIXME: Pandora's Box
! Ok, so enterprise-level bugfix:
! { dg-additional-options "-D__i386__=1" { target { i?86-*-* x86_64-*-* } } }
! { dg-additional-options "-D__powerpc__=1" { target { powerpc*-*-* } } }
! { dg-skip-if "test not yet implemented for target" { ! {i?86-*-* x86_64-*-* powerpc*-*-*} } }
!! { dg- skip-if "needs optimize" { *-*-* } { "*" } { " -O0 " } }
! Test __attribute__ ((target_clones ("foo", "bar")))
!
module m
  implicit none
contains
  subroutine sub1()
#if defined __i386__ || defined __x86_64__
!GCC$ ATTRIBUTES target_clones("avx", "sse","default") :: sub1
#elif defined __powerpc__
!GCC$ ATTRIBUTES target_clones("power10", "power9","default") :: sub1
#endif
    print *, 4321
  end
end module m
! { dg-final { scan-tree-dump-times {(?n)void \* __m_MOD_sub1\.resolver \(\)} 1 "optimized" } }
! { dg-final { scan-tree-dump-times {(?n)void __m_MOD_sub1\.(?:avx|power10) \(\)} 1 "optimized" } }
! { dg-final { scan-tree-dump-times {(?n)void __m_MOD_sub1\.(?:sse|power9) \(\)} 1 "optimized" } }
! { dg-final { scan-tree-dump-times {(?n)void sub1 \(\)} 1 "optimized" } }
!! and a non-assembly hint on the ifunc
! { dg-final { scan-tree-dump-times {Function sub1 \(__m_MOD_sub1\.default,} 1 "optimized" } }
EOF

2 patches:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605081.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604981.html
(the testcase mentioned in the latter is superseded by the blurb above)

One would have to cleanup the parser (see "XXX: Rephrase this in a
sane, understandable manner..") and add some more testcases, for several
malformed attribute strings. Maybe i'll get to it during the weekend or
some evening.

Not sure about the usefulness though.
And not sure if fellow gfortraners would accept this attribute
target_clones in there in the first place..

cheers,


* Re: adding attributes
  2022-11-04 20:59       ` Bernhard Reutner-Fischer
@ 2022-11-05  7:40         ` Thomas Koenig
  2022-11-05 10:54           ` Bernhard Reutner-Fischer
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Koenig @ 2022-11-05  7:40 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer, Dave Love via Fortran; +Cc: Dave Love

On 04.11.22 21:59, Bernhard Reutner-Fischer via Fortran wrote:
> And not sure if fellow gfortraners would accept this attribute
> target_clones in there in the first place..

It might actually be useful.  Is there any change about
the calling sequence or anything else that should be visible
in a Fortran module or the calling sequence?

	Thomas


* Re: adding attributes
  2022-11-05  7:40         ` Thomas Koenig
@ 2022-11-05 10:54           ` Bernhard Reutner-Fischer
  2022-11-06 13:44             ` Thomas Koenig
  2023-02-24 12:24             ` Dave Love
  0 siblings, 2 replies; 12+ messages in thread
From: Bernhard Reutner-Fischer @ 2022-11-05 10:54 UTC (permalink / raw)
  To: Thomas Koenig; +Cc: rep.dot.nop, Dave Love via Fortran, Dave Love

[-- Attachment #1: Type: text/plain, Size: 1673 bytes --]

On Sat, 5 Nov 2022 08:40:06 +0100
Thomas Koenig <tkoenig@netcologne.de> wrote:

> On 04.11.22 21:59, Bernhard Reutner-Fischer via Fortran wrote:
> > And not sure if fellow gfortraners would accept this attribute
> > target_clones in there in the first place..  
> 
> It might actually be useful.  Is there any change about
> the calling sequence or anything else that should be visible
> in a Fortran module or the calling sequence?

The module interface remains the same.
And the call sequence remains the same, too.
For a user nothing changes.

An example:
module m
  implicit none
contains
  subroutine sub1()
!GCC$ ATTRIBUTES target_clones("avx", "sse","default") :: sub1
    print *, 4321
  end
end module m

This used to compile to:
$ nm /tmp/pristine.o 
                 U _gfortran_st_write
                 U _gfortran_st_write_done
                 U _gfortran_transfer_integer_write
0000000000000000 T __m_MOD_sub1

And now compiles to:
$ nm /tmp/new.o 
                 U __cpu_indicator_init
                 U __cpu_model
                 U _gfortran_st_write
                 U _gfortran_st_write_done
                 U _gfortran_transfer_integer_write
0000000000000000 i __m_MOD_sub1
000000000000006e t __m_MOD_sub1.avx
0000000000000000 t __m_MOD_sub1.default
0000000000000000 W __m_MOD_sub1.resolver
00000000000000dc t __m_MOD_sub1.sse

I.e. the caller still calls __m_MOD_sub1
But this is now an ifunc, which looks at the cpu bits and dispatches to
the appropriate ISA version.
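
What the generated resolver does can be approximated by hand in C. This is a
rough sketch, not the code GCC emits: impl_avx, impl_sse, impl_default,
resolve and sub1_variant are hypothetical names, and the real ifunc resolver
runs once at relocation time rather than lazily on the first call.
__builtin_cpu_init and __builtin_cpu_supports are GCC builtins on x86;
on other targets the sketch simply falls back to the default.

```c
#include <stddef.h>

/* Stand-ins for the per-ISA clones; each just reports which one it is. */
static const char *impl_avx(void)     { return "avx"; }
static const char *impl_sse(void)     { return "sse"; }
static const char *impl_default(void) { return "default"; }

typedef const char *(*impl_fn)(void);

/* Pick a variant for the running CPU, highest ISA level first, in the
   same order as the tests in the attached resolver.  */
static impl_fn resolve(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx"))
    return impl_avx;
  if (__builtin_cpu_supports("sse"))
    return impl_sse;
#endif
  return impl_default;
}

/* Public entry point: resolve on first use, then reuse the cached choice.
   Not thread-safe; a real ifunc avoids this by resolving at load time.  */
const char *sub1_variant(void)
{
  static impl_fn chosen = NULL;
  if (chosen == NULL)
    chosen = resolve();
  return chosen();
}
```

With target_clones the compiler and the dynamic loader take over both the
resolve step and the caching, which is why the caller-visible symbol stays
the same.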

I'm attaching the assembler input for reference.

If you think that we want to add support for that attribute, i can
submit a proper patch. Just let me know please.

thanks,

[-- Attachment #2: new.s --]
[-- Type: text/plain, Size: 3040 bytes --]

	.file	"attr_target_clones-1.F90"
	.text
	.section	.rodata
	.align 8
.LC0:
	.string	"/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/attr_target_clones-1.F90"
	.align 4
.LC1:
	.long	4321
	.text
	.type	__m_MOD_sub1.default, @function
__m_MOD_sub1.default:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$528, %rsp
	movq	$.LC0, -520(%rbp)
	movl	$21, -512(%rbp)
	movl	$128, -528(%rbp)
	movl	$6, -524(%rbp)
	leaq	-528(%rbp), %rax
	movq	%rax, %rdi
	call	_gfortran_st_write
	leaq	-528(%rbp), %rax
	movl	$4, %edx
	movl	$.LC1, %esi
	movq	%rax, %rdi
	call	_gfortran_transfer_integer_write
	leaq	-528(%rbp), %rax
	movq	%rax, %rdi
	call	_gfortran_st_write_done
	nop
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	__m_MOD_sub1.default, .-__m_MOD_sub1.default
	.type	__m_MOD_sub1.avx, @function
__m_MOD_sub1.avx:
.LFB1:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$528, %rsp
	movq	$.LC0, -520(%rbp)
	movl	$21, -512(%rbp)
	movl	$128, -528(%rbp)
	movl	$6, -524(%rbp)
	leaq	-528(%rbp), %rax
	movq	%rax, %rdi
	call	_gfortran_st_write
	leaq	-528(%rbp), %rax
	movl	$4, %edx
	movl	$.LC1, %esi
	movq	%rax, %rdi
	call	_gfortran_transfer_integer_write
	leaq	-528(%rbp), %rax
	movq	%rax, %rdi
	call	_gfortran_st_write_done
	nop
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE1:
	.size	__m_MOD_sub1.avx, .-__m_MOD_sub1.avx
	.type	__m_MOD_sub1.sse, @function
__m_MOD_sub1.sse:
.LFB2:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$528, %rsp
	movq	$.LC0, -520(%rbp)
	movl	$21, -512(%rbp)
	movl	$128, -528(%rbp)
	movl	$6, -524(%rbp)
	leaq	-528(%rbp), %rax
	movq	%rax, %rdi
	call	_gfortran_st_write
	leaq	-528(%rbp), %rax
	movl	$4, %edx
	movl	$.LC1, %esi
	movq	%rax, %rdi
	call	_gfortran_transfer_integer_write
	leaq	-528(%rbp), %rax
	movq	%rax, %rdi
	call	_gfortran_st_write_done
	nop
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE2:
	.size	__m_MOD_sub1.sse, .-__m_MOD_sub1.sse
	.section	.text.__m_MOD_sub1.resolver,"axG",@progbits,__m_MOD_sub1.resolver,comdat
	.weak	__m_MOD_sub1.resolver
	.type	__m_MOD_sub1.resolver, @function
__m_MOD_sub1.resolver:
.LFB4:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	call	__cpu_indicator_init
	movl	__cpu_model+12(%rip), %eax
	andl	$512, %eax
	testl	%eax, %eax
	jle	.L5
	movl	$__m_MOD_sub1.avx, %eax
	jmp	.L4
.L5:
	movl	__cpu_model+12(%rip), %eax
	andl	$8, %eax
	testl	%eax, %eax
	jle	.L6
	movl	$__m_MOD_sub1.sse, %eax
	jmp	.L4
.L6:
	movl	$__m_MOD_sub1.default, %eax
.L4:
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE4:
	.size	__m_MOD_sub1.resolver, .-__m_MOD_sub1.resolver
	.globl	__m_MOD_sub1
	.type	__m_MOD_sub1, @gnu_indirect_function
	.set	__m_MOD_sub1,__m_MOD_sub1.resolver
	.ident	"GCC: (GNU) 13.0.0 20220916 (experimental) [master r13-2694-g3e8c4b925a9]"
	.section	.note.GNU-stack,"",@progbits


* Re: adding attributes
  2022-11-05 10:54           ` Bernhard Reutner-Fischer
@ 2022-11-06 13:44             ` Thomas Koenig
  2022-11-07 11:06               ` Dave Love
  2023-02-24 12:24             ` Dave Love
  1 sibling, 1 reply; 12+ messages in thread
From: Thomas Koenig @ 2022-11-06 13:44 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer; +Cc: Dave Love via Fortran, Dave Love

Hi Bernhard,

> If you think that we want to add support for that attribute, i can
> submit a proper patch. Just let me know please.

I think this attribute makes sense, especially if people want to
compile once and then port to different architectures.

So, yes please submit a patch, if you would be so kind :-)

Best regards

	Thomas


* Re: adding attributes
  2022-11-02 23:19     ` Bernhard Reutner-Fischer
  2022-11-04 20:59       ` Bernhard Reutner-Fischer
@ 2022-11-07 11:04       ` Dave Love
  2022-11-10 12:25         ` Bernhard Reutner-Fischer
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Love @ 2022-11-07 11:04 UTC (permalink / raw)
  To: fortran

Bernhard Reutner-Fischer via Fortran <fortran@gcc.gnu.org> writes:

> I see.
> So target_clones is one thing. What other attributes would be important?

At least optimization-related ones could be useful, and possibly others.
I haven't made a list, but could do.

> In your job script you would use cpuid(1) to determine a properly tuned
> binary for the parts of the cluster you run on. Or the installed
> binaries are tuned for the host they are installed on and are located in
> a uniform place per application.
>
>> 
>> Multiple compilation isn't a good solution.  I haven't followed the
>
> It might not be good, but it's cheap and easy if you only have a small
> set of different arches and subarches each. In a controlled
> environment, with a batch scheduler. Won't work in the wild of course.

I know all that you can do, but my opinions are from extensive
experience managing rather heterogeneous HPC clusters and working on
dynamic dispatch in libraries.  (The worst thing about gfortran for
system management is the lack of backwards-compatibility in module
formats and libgfortran.)

> But since you cannot mix target_clones across arch-boundaries,
> supporting those for a distro will probably be rather ugly anyway.

Yes, you need simple pre-processing, as you do for the attributes in C,
unless there was some extra guard facility added.

> heh, me neither. Luckily yesterday was a holiday, so what i ended up
> with was the following, fya.

Gosh; I thought it would take a while even if you knew your way around.
I didn't want to spoil a holiday!  (I'd aim to do such things on work
time.)

> I've added a
>    /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
>    unsigned ext_attr:EXT_ATTR_NUM;
> +  tree ext_attr_args;
>
> to struct symbol_attribute where i can prepare the tree_list for the
> attrs right from the start. The lowering is then rather simple and
> uniform, just chainon the prepared attributes and be done.

If I understand correctly, I could go through and add ones that look
useful (for debate).  I have experience of using several in C (at least
once even for g77 runtime).

> target_clones does not require a bump in the module format, i'd say,
> because the main entry point does not change. Will have to check if
> the clones do not end up being emitted in the module, they shouldn't be.
> Other attributes _may_ require a change in the module format though.
> These would need checking on a per case basis.

I don't understand the module format, but I wouldn't have expected
relevant attributes to change interfaces.

Anyway, thanks!



* Re: adding attributes
  2022-11-06 13:44             ` Thomas Koenig
@ 2022-11-07 11:06               ` Dave Love
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Love @ 2022-11-07 11:06 UTC (permalink / raw)
  To: fortran

Thomas Koenig via Fortran <fortran@gcc.gnu.org> writes:

> Hi Bernhard,
>
>> If you think that we want to add support for that attribute, i can
>> submit a proper patch. Just let me know please.
>
> I think this attribute makes sense, especially if people want to
> compile once and then port to different architectures.
>
> So, yes please submit a patch, if you would be so kind :-)
>
> Best regards
>
> 	Thomas

I'll make a list of others I think would be useful for consideration.  I
should probably have looked at what similar facilities proprietary
compilers provide.

I should say the intent of optimization-related attributes would be to
encourage adding them to known performance-sensitive kernels, and
potentially avoid having to write C.  However, I doubt I can persuade
the people who are convinced of the great superiority of ifort, and the
desirability of always adding -axCORE-AVX512, without measuring or
understanding GCC options.

I think an equivalent of "#pragma GCC optimize" could be useful too.  I
don't have a specific case in mind, but an example might be where you
need localized standard-conforming arithmetic, but other parts of the
program benefit from -ffast-math.
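
In C, the localized case might look like the sketch below, which tries to
keep a compensated (Kahan) summation out of reach of -ffast-math while the
rest of the file may be built with it. Assumptions: the function name
kahan_sum and the macro STRICT_FP are invented for illustration, and the GCC
manual itself cautions that the optimize attribute is meant for debugging
rather than production use, so the effect should be checked on the compiler
at hand.

```c
#include <stddef.h>

/* Assumption: GCC prefixes "no-fast-math" with -f, i.e. -fno-fast-math,
   for this one function only.  Other compilers get no attribute.  */
#if defined(__GNUC__) && !defined(__clang__)
# define STRICT_FP __attribute__((optimize("no-fast-math")))
#else
# define STRICT_FP
#endif

/* Kahan summation: relies on (t - sum) - y NOT being simplified to zero,
   which reassociation under -ffast-math is allowed to do.  */
STRICT_FP
double kahan_sum(const double *x, size_t n)
{
  double sum = 0.0;
  double c = 0.0;               /* running compensation for lost bits */
  for (size_t i = 0; i < n; i++) {
    double y = x[i] - c;        /* apply the previous correction */
    double t = sum + y;         /* low-order bits of y may be lost here */
    c = (t - sum) - y;          /* recover what was lost */
    sum = t;
  }
  return sum;
}
```

A Fortran equivalent would let such a routine stay in Fortran instead of
being moved to C just to get per-function control.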



* Re: adding attributes
  2022-11-07 11:04       ` Dave Love
@ 2022-11-10 12:25         ` Bernhard Reutner-Fischer
  0 siblings, 0 replies; 12+ messages in thread
From: Bernhard Reutner-Fischer @ 2022-11-10 12:25 UTC (permalink / raw)
  To: Dave Love via Fortran; +Cc: rep.dot.nop, Dave Love

Hi!

On Mon, 07 Nov 2022 11:04:17 +0000
Dave Love via Fortran <fortran@gcc.gnu.org> wrote:

> Bernhard Reutner-Fischer via Fortran <fortran@gcc.gnu.org> writes:
> 
> > I see.
> > So target_clones is one thing. What other attributes would be important?  
> 
> At least optimization-related ones could be useful, and possibly others.
> I haven't made a list, but could do.

Please do.
And yes, i can see that __attribute__((__optimize__(...))) would be
useful.

> dynamic dispatch in libraries.  (The worst thing about gfortran for
> system management is the lack of backwards-compatibility in module
> formats and libgfortran.)

yea. IIRC there was discussion a couple of years back what we could do
about the module format. Nowadays i'd probably just use JSON, but i did
not think too much about it. Needless to say that rewriting the mio
(module I/O) would take more than one evening :) I remember some
wrinkles there when i played around with the fortran-fe-stringpool idea
(which reminds me, i should pick that up again, maybe).

> > But since you cannot mix target_clones across arch-boundaries,
> > supporting those for a distro will probably be rather ugly anyway.  
> 
> Yes, you need simple pre-processing, as you do for the attributes in C,
> unless there was some extra guard facility added.

Yes indeed. But this would be much easier to handle if we'd have actual
arch defines. Until we have, you'd have to run this through cpp
manually which is doable but not all that convenient IMHO.

Hm, didn't we have a syntax for arch conditions in the math vec?
We could hijack that, but it's probably still better to just fix the
arch defines as that's generally useful.

> > heh, me neither. Luckily yesterday was a holiday, so what i ended up
> > with was the following, fya.  
> 
> Gosh; I thought it would take a while even if you knew your way around.
> I didn't want to spoil a holiday!  (I'd aim to do such things on work
> time.)

No problem, it was just for fun.
I spent most of the time scratching my head over why the attribute didn't
work, for i had wrapped it in an arch ifdef for the testsuite to cover
both i386 and ppc. And of course i only noticed very, very late what was
really going on ;)

> > I've added a
> >    /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
> >    unsigned ext_attr:EXT_ATTR_NUM;
> > +  tree ext_attr_args;
> >
> > to struct symbol_attribute where i can prepare the tree_list for the
> > attrs right from the start. The lowering is then rather simple and
> > uniform, just chainon the prepared attributes and be done.  
> 
> If I understand correctly, I could go through and add ones that look
> useful (for debate).  I have experience of using several in C (at least
> once even for g77 runtime).

Yes please, that'd be interesting.

> > target_clones does not require a bump in the module format, i'd say,
> > because the main entry point does not change. Will have to check if
> > the clones do not end up being emitted in the module, they shouldn't be.
> > Other attributes _may_ require a change in the module format though.
> > These would need checking on a per case basis.  
> 
> I don't understand the module format, but I wouldn't have expected
> relevant attributes to change interfaces.

Well that should probably not be needed indeed for most attributes, yes.

But then, i do think we stream out the ext_attr, at least for certain
attributes like "cdecl", the dll{im,ex}port, {std,fast}call et al.
See module.cc, mio_symbol_attribute. So the bits that change the
calling convention have to be brought to the attention of the module
consumer probably. Think regparm or sseregparm for example i guess.

That said, if i comment out the invalid cases of the test, i get
with gcc-12:
$ gfortran -c -o /tmp/out0.o /scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90 
/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90:33:15:

   33 |   cdecl => sub2
      |               1
Warning: ‘cdecl’ attribute ignored [-Wattributes]
/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90:33:15: Warning: ‘cdecl’ attribute ignored [-Wattributes]
/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90:34:17:

   34 |   stdcall => sub3
      |                 1
Warning: ‘stdcall’ attribute ignored [-Wattributes]
/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90:34:17: Warning: ‘stdcall’ attribute ignored [-Wattributes]
/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90:35:18:

   35 |   fastcall => sub4
      |                  1
Warning: ‘fastcall’ attribute ignored [-Wattributes]
/scratch/src/gcc-13.mine/gcc/testsuite/gfortran.dg/compiler-directive_1.f90:35:18: Warning: ‘fastcall’ attribute ignored [-Wattributes]

so i'm not sure what these attributes do or are supposed to do really,
but i didn't look. And in addition there is a typo s/consitency/consistency/
in the test.

> 
> Anyway, thanks!

You're welcome.

PS: You might have seen the patch to add "flatten". As can be seen, it
should now be rather straightforward to add simple attributes for
procedures.

cheers,


* Re: adding attributes
  2022-11-05 10:54           ` Bernhard Reutner-Fischer
  2022-11-06 13:44             ` Thomas Koenig
@ 2023-02-24 12:24             ` Dave Love
  1 sibling, 0 replies; 12+ messages in thread
From: Dave Love @ 2023-02-24 12:24 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer via Fortran
  Cc: Thomas Koenig, Bernhard Reutner-Fischer

I noticed other attributes have been added now.  What happened to
target_clones, in particular?  (Thanks for working on it.)

