* Tests of gcc development beyond its testsuite (in this case, for gfortran)
From: Toon Moene @ 2024-05-06 21:27 UTC
To: gcc mailing list, fortran
I have now, for some time, run LAPACK's test programs on my gcc/gfortran
builds on both the x86_64-linux-gnu architecture and the
aarch64-linux-gnu one (see, e.g.,
http://moene.org/~toon/lapack-amd64-gfortran13-O3).
The results are rather alarming - this is r15-202 for aarch64 vs r15-204
for x86_64 (compiled with -O3):
diff lapack-amd64-gfortran15-O3 lapack-aarch64-gfortran15-O3
3892,3895c3928,3931
< REAL                  1327023      0 (0.000%)      0 (0.000%)
< DOUBLE PRECISION      1300917      6 (0.000%)      0 (0.000%)
< COMPLEX                786775      0 (0.000%)      0 (0.000%)
< COMPLEX16              787842      0 (0.000%)      0 (0.000%)
---
> REAL                  1317063     71 (0.005%)      0 (0.000%)
> DOUBLE PRECISION      1318331     54 (0.004%)      4 (0.000%)
> COMPLEX                767023    390 (0.051%)      0 (0.000%)
> COMPLEX16              772338    305 (0.039%)      0 (0.000%)
3897c3933
< --> ALL PRECISIONS    4202557      6 (0.000%)      0 (0.000%)
---
> --> ALL PRECISIONS    4174755    820 (0.020%)      4 (0.000%)
Note how many more tests exceed the error threshold on the aarch64
side (>).
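[Editorial sketch, not part of the original message.] The per-precision rows in the aarch64 half of the diff can be checked against the "ALL PRECISIONS" line with a few lines of Python. The row pattern and the names `ROW`, `parse_summary` and `totals` are assumptions made for this illustration; they are not part of LAPACK's own summary script.

```python
import re

# Summary rows copied from the aarch64 side (">") of the diff.
AARCH64_SUMMARY = """\
REAL 1317063 71 (0.005%) 0 (0.000%)
DOUBLE PRECISION 1318331 54 (0.004%) 4 (0.000%)
COMPLEX 767023 390 (0.051%) 0 (0.000%)
COMPLEX16 772338 305 (0.039%) 0 (0.000%)
"""

# Assumed row layout: name, tests run, numerical errors (pct), other errors (pct).
ROW = re.compile(
    r"^(?P<name>[A-Z0-9 ]+?)\s+(?P<run>\d+)"
    r"\s+(?P<num>\d+)\s+\([\d.]+%\)"
    r"\s+(?P<other>\d+)\s+\([\d.]+%\)\s*$"
)

def parse_summary(text):
    """Map each precision name to (tests run, numerical errors, other errors)."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows[m["name"]] = (int(m["run"]), int(m["num"]), int(m["other"]))
    return rows

def totals(rows):
    """Column-wise sums over all precisions."""
    return tuple(sum(col) for col in zip(*rows.values()))

print(totals(parse_summary(AARCH64_SUMMARY)))  # (4174755, 820, 4)
```

The sums agree with the aarch64 "ALL PRECISIONS" line (4174755 tests, 820 numerical errors, 4 other errors). Feeding it the x86_64 rows works the same way.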
Of course, this is only an excerpt of the full log file - there is more
information in it to zoom in on the errors on the aarch64 side (note
that the x86_64 side is not faultless).
Is there a way to pass this information to our websites, so that we do
not "forget" this - or, alternatively, follow the progress in
solving this?
Kind regards,
--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
* Re: Tests of gcc development beyond its testsuite (in this case, for gfortran)
From: Andrew Pinski @ 2024-05-06 21:32 UTC
To: Toon Moene; +Cc: gcc mailing list, fortran
On Mon, May 6, 2024 at 2:27 PM Toon Moene <toon@moene.org> wrote:
>
> I have now, for some time, run LAPACK's test programs on my gcc/gfortran
> builds on both the x86_64-linux-gnu architecture and the
> aarch64-linux-gnu one (see, e.g.,
> http://moene.org/~toon/lapack-amd64-gfortran13-O3).
>
> The results are rather alarming - this is r15-202 for aarch64 vs r15-204
> for x86_64 (compiled with -O3):
Did you test x86_64 with -march=native (or with -mfma) or just -O3?
The reason I am asking is that aarch64 includes FMA by default while
x86_64 does not.
Most recent x86_64 processors include FMA instructions, but since the base ISA
does not include them, they are not enabled by default.
I suspect the excess threshold failures on aarch64 are all caused by
the greater use of FMA rather than anything else.
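[Editorial sketch, not part of the original message.] Why FMA can change results: a fused multiply-add rounds a*b + c once, while the separate multiply and add round twice. The Python below emulates the fused result with exact rational arithmetic followed by a single rounding; it is an illustration for ordinary finite inputs, not how the hardware or GCC implements it, and the function names are made up for this example.

```python
from fractions import Fraction

def unfused(a, b, c):
    # Two roundings: the product is rounded to double, then the sum is.
    return a * b + c

def fused(a, b, c):
    # One rounding: form a*b + c exactly as a rational, then round once.
    # For ordinary finite inputs this matches what an FMA instruction returns.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

# The exact product is 1 - 2**-54, which rounds to 1.0 in double precision.
print(unfused(a, b, c))  # 0.0
print(fused(a, b, c))    # -5.551115123125783e-17, i.e. -2**-54
```

Neither answer is a compiler bug; the two instruction sequences simply round differently. A test suite whose thresholds were tuned against one sequence can therefore report failures under the other.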
Thanks,
Andrew Pinski
* Re: Tests of gcc development beyond its testsuite (in this case, for gfortran)
From: Toon Moene @ 2024-05-06 21:35 UTC
To: Andrew Pinski; +Cc: gcc mailing list, fortran
On 5/6/24 23:32, Andrew Pinski wrote:
> Did you test x86_64 with -march=native (or with -mfma) or just -O3?
> The reason I am asking is that aarch64 includes FMA by default while
> x86_64 does not.
> Most recent x86_64 processors include FMA instructions, but since the base ISA
> does not include them, they are not enabled by default.
> I suspect the excess threshold failures on aarch64 are all caused by
> the greater use of FMA rather than anything else.
Aah, I forgot to include that tidbit, because it's readily apparent from
the full logs - I compiled with *just* -O3.
Thanks,
--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
* Re: Tests of gcc development beyond its testsuite (in this case, for gfortran)
From: Toon Moene @ 2024-05-06 22:02 UTC
To: fortran, gcc mailing list; +Cc: Andrew Pinski
On 5/6/24 23:35, Toon Moene wrote:
> Aah, I forgot to include that tidbit, because it's readily apparent from
> the full logs - I compiled with *just* -O3.
OK, perhaps on the aarch64 I need the following option to make the
comparison fair:
‘rdma’
Enable Round Double Multiply Accumulate instructions. This is on by
default for -march=armv8.1-a.
I.e., -mno-rdma
(I hope that's correct - I'll try that when the Sun rises again and
I have some power to run the AArch64 machine ...).
I must say I didn't expect this - the discussion on the "Intel" side
was always that, because fused multiply-add instructions don't
express the "real computations" written in the program, they
were evil and therefore had to be hidden behind some special compiler
option that made it very clear that those instructions were evil.
Again, thanks for pointing me to the difference (in philosophy, if not math)
between the two continents (i.e., the Americas and Europe's - before
Brexit - England :-)
Kind regards,
--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
* Re: Tests of gcc development beyond its testsuite (in this case, for gfortran)
From: Toon Moene @ 2024-05-07 18:30 UTC
To: fortran, gcc mailing list; +Cc: Andrew Pinski
On 5/7/24 00:02, Toon Moene wrote:
> OK, perhaps on the aarch64 I need the following option to make the
> comparison fair:
>
> ‘rdma’
>
> Enable Round Double Multiply Accumulate instructions. This is on by
> default for -march=armv8.1-a.
>
> I.e., -mno-rdma
>
> (I hope that's correct - I'll try that when the Sun rises again and
> I have some power to run the AArch64 machine ...).
Well, I did two independent runs with gfortran-13.2 and the following
options:
-O3 -march=armv8.1-a+rdma
and
-O3 -march=armv8.1-a+nordma
No difference in the number of error runs exceeding the prescribed
thresholds.
So, unless I made a mistake in the option specification (or the compiler
silently ignored them because they were not applicable to my machine -
ugh), the cause of the problem lies elsewhere.
Kind regards,
--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
* Re: Tests of gcc development beyond its testsuite (in this case, for gfortran)
From: Andrew Pinski @ 2024-05-07 18:35 UTC
To: Toon Moene; +Cc: fortran, gcc mailing list
On Tue, May 7, 2024 at 11:31 AM Toon Moene <toon@moene.org> wrote:
>
> Well, I did two independent runs with gfortran-13.2 and the following
> options:
>
> -O3 -march=armv8.1-a+rdma
>
> and
>
> -O3 -march=armv8.1-a+nordma
>
> No difference in the number of error runs exceeding the prescribed
> thresholds.
>
> So, unless I made a mistake in the option specification (or the compiler
> silently ignored them because they were not applicable to my machine -
> ugh), the cause of the problem lies elsewhere.
AARCH64 armv8-a has FMA as part of its base ISA.
So you want to try with `-ffp-contract=off` instead.
The rdma extension only turns on/off instructions which are not (yet)
used by the auto-vectorizer and are only reachable through their
intrinsics (if I read the code correctly).
Thanks,
Andrew Pinski
* Re: Tests of gcc development beyond its testsuite (in this case, for gfortran)
From: Toon Moene @ 2024-05-07 18:44 UTC
To: Andrew Pinski; +Cc: fortran, gcc mailing list
On 5/7/24 20:35, Andrew Pinski wrote:
> AARCH64 armv8-a has FMA as part of its base ISA.
> So you want to try with `-ffp-contract=off` instead.
> The rdma extension only turns on/off instructions which are not (yet)
> used by the auto-vectorizer and are only reachable through their
> intrinsics (if I read the code correctly).
Ah, thanks - I'll try that tomorrow.
--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
* Re: Tests of gcc development beyond its testsuite (in this case, for gfortran)
From: Toon Moene @ 2024-05-08 12:43 UTC
To: Andrew Pinski; +Cc: fortran, gcc mailing list
On 5/7/24 20:44, Toon Moene wrote:
> On 5/7/24 20:35, Andrew Pinski wrote:
>
>> AARCH64 armv8-a has FMA as part of its base ISA.
>> So you want to try with `-ffp-contract=off` instead.
>> The rdma extension only turns on/off instructions which are not (yet)
>> used by the auto-vectorizer and are only reachable through their
>> intrinsics (if I read the code correctly).
>
> Ah, thanks - I'll try that tomorrow.
Yep, that did it:
--> LAPACK TESTING SUMMARY <--
Processing LAPACK Testing output found in the TESTING directory
SUMMARY               nb test run    numerical error    other error
================      ===========    ===============    ===========
REAL                  1327023        0 (0.000%)         0 (0.000%)
DOUBLE PRECISION      1327845        0 (0.000%)         0 (0.000%)
COMPLEX               786775         0 (0.000%)         0 (0.000%)
COMPLEX16             787842         0 (0.000%)         0 (0.000%)
--> ALL PRECISIONS    4229485        0 (0.000%)         0 (0.000%)
So, obviously, the threshold values for these tests were derived on a
machine without fused multiply-add instructions, or without using them
if present.
This is perhaps not surprising, as the default build-and-test setup
(make.inc.example) of the LAPACK package as distributed from netlib.org
lists as the compiler choice:
FC = gfortran
FFLAGS = -O2 -frecursive
FFLAGS_DRV = $(FFLAGS)
FFLAGS_NOOPT = -O0 -frecursive
which means that the choice of architecture on x86-64 would be "generic"
and wouldn't include FMA instructions. If the authors had used that
setup in deriving the thresholds, it is not surprising that you need
-ffp-contract=off on architectures that include FMA instructions by default.
Thanks for helping me out with this!
--
Toon Moene - e-mail: toon@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands