Fwd: [patch, libfortran] AMD-specific versions of library matmul

public inbox for gcc-patches@gcc.gnu.org

* Fwd: [patch, libfortran] AMD-specific versions of library matmul
       [not found] <96c89d30-d311-22e6-e80d-8b87c68d3c60@netcologne.de>
@ 2017-05-25 11:19 ` Thomas Koenig
  2017-05-25 14:11 ` Jerry DeLisle
  2017-05-25 17:58 ` Janne Blomqvist
  2 siblings, 0 replies; 9+ messages in thread
From: Thomas Koenig @ 2017-05-25 11:19 UTC (permalink / raw)
  To: gcc-patches

Hi,

patch is at https://gcc.gnu.org/ml/fortran/2017-05/msg00133.html
(didn't go through to gcc-patches due to size limitations).

Regards

	Thomas


-------- Forwarded Message --------
Subject: [patch, libfortran] AMD-specific versions of library matmul
Date: Thu, 25 May 2017 12:45:46 +0200
From: Thomas Koenig <tkoenig@netcologne.de>
To: fortran@gcc.gnu.org <fortran@gcc.gnu.org>, gcc-patches <gcc-patches@gcc.gnu.org>

Hello world,

the attached patch speeds up the library version of matmul for AMD chips
by selecting AVX128 instructions and, depending on which instructions
are supported, either FMA3 (aka FMA) or FMA4.
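
[Editorial sketch, not part of the patch.] The selection described above can be pictured roughly as follows. This is a minimal sketch under stated assumptions: the enum labels, struct cpu_caps, choose_matmul_variant and probe_cpu are hypothetical names invented for illustration, and the fallback when an AMD chip has AVX but neither FMA flavour is a guess. Only the GCC builtins __builtin_cpu_init, __builtin_cpu_is and __builtin_cpu_supports are real; the actual libgfortran dispatch may well be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the code paths; the real libgfortran symbols
   are generated per type and are not reproduced here.  */
enum matmul_variant
{
  MATMUL_VANILLA,       /* generic code, no special instructions */
  MATMUL_AVX,           /* 256-bit AVX path for non-AMD chips */
  MATMUL_AVX128_FMA3,   /* 128-bit AVX with FMA3 (aka FMA) */
  MATMUL_AVX128_FMA4    /* 128-bit AVX with FMA4 */
};

/* Hypothetical summary of the CPU properties the choice depends on.  */
struct cpu_caps
{
  bool is_amd;
  bool avx;
  bool fma3;
  bool fma4;
};

/* Pure decision function: AMD chips with AVX get a 128-bit variant,
   preferring FMA3 over FMA4 when both are available.  */
enum matmul_variant
choose_matmul_variant (const struct cpu_caps *c)
{
  assert (c != NULL);
  if (!c->avx)
    return MATMUL_VANILLA;
  if (c->is_amd)
    {
      if (c->fma3)
        return MATMUL_AVX128_FMA3;
      if (c->fma4)
        return MATMUL_AVX128_FMA4;
      return MATMUL_VANILLA;   /* assumption: no FMA, use generic code */
    }
  return MATMUL_AVX;
}

/* Fill in the capabilities from the running CPU.  On non-x86 targets
   everything stays false, so the generic code is chosen.  */
struct cpu_caps
probe_cpu (void)
{
  struct cpu_caps c = { false, false, false, false };
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init ();
  c.is_amd = __builtin_cpu_is ("amd") != 0;
  c.avx = __builtin_cpu_supports ("avx") != 0;
  c.fma3 = __builtin_cpu_supports ("fma") != 0;
  c.fma4 = __builtin_cpu_supports ("fma4") != 0;
#endif
  return c;
}
```

A caller would do something like "struct cpu_caps c = probe_cpu ();" followed by a switch on choose_matmul_variant (&c). Keeping the decision separate from the probing makes the policy easy to check without the hardware in question.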

Jerry tested this on his AMD systems, and found a speedup vs. the
current code of around 10%.

I have been unable to test this on a Ryzen system (the new compile farm
machines won't accept my login yet).  From the benchmarks I have read,
this method should also work fairly well on a Ryzen.

So, OK for trunk?

Regards

	Thomas

2017-05-25  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR libfortran/78379
	* Makefile.am: Add generated/matmulavx128_*.c files.
	Handle them for compiling and setting the right flags.
	* acinclude.m4: Add tests for FMA3, FMA4 and AVX128.
	* configure.ac: Call them.
	* Makefile.in: Regenerated.
	* config.h.in: Regenerated.
	* configure: Regenerated.
	* m4/matmul.m4:  Handle AMD chips by calling 128-bit AVX
	versions which use FMA3 or FMA4.
	* m4/matmulavx128.m4: New file.
	* generated/matmul_c10.c: Regenerated.
	* generated/matmul_c16.c: Regenerated.
	* generated/matmul_c4.c: Regenerated.
	* generated/matmul_c8.c: Regenerated.
	* generated/matmul_i1.c: Regenerated.
	* generated/matmul_i16.c: Regenerated.
	* generated/matmul_i2.c: Regenerated.
	* generated/matmul_i4.c: Regenerated.
	* generated/matmul_i8.c: Regenerated.
	* generated/matmul_r10.c: Regenerated.
	* generated/matmul_r16.c: Regenerated.
	* generated/matmul_r4.c: Regenerated.
	* generated/matmul_r8.c: Regenerated.
	* generated/matmulavx128_c10.c: New file.
	* generated/matmulavx128_c16.c: New file.
	* generated/matmulavx128_c4.c: New file.
	* generated/matmulavx128_c8.c: New file.
	* generated/matmulavx128_i1.c: New file.
	* generated/matmulavx128_i16.c: New file.
	* generated/matmulavx128_i2.c: New file.
	* generated/matmulavx128_i4.c: New file.
	* generated/matmulavx128_i8.c: New file.
	* generated/matmulavx128_r10.c: New file.
	* generated/matmulavx128_r16.c: New file.
	* generated/matmulavx128_r4.c: New file.
	* generated/matmulavx128_r8.c: New file.


* Re: [patch, libfortran] AMD-specific versions of library matmul
       [not found] <96c89d30-d311-22e6-e80d-8b87c68d3c60@netcologne.de>
  2017-05-25 11:19 ` Fwd: [patch, libfortran] AMD-specific versions of library matmul Thomas Koenig
@ 2017-05-25 14:11 ` Jerry DeLisle
  2017-05-25 14:52   ` Thomas Koenig
  2017-05-25 17:58 ` Janne Blomqvist
  2 siblings, 1 reply; 9+ messages in thread
From: Jerry DeLisle @ 2017-05-25 14:11 UTC (permalink / raw)
  To: Thomas Koenig, fortran, gcc-patches

On 05/25/2017 03:45 AM, Thomas Koenig wrote:
> Hello world,
> 
> the attached patch speeds up the library version of matmul for AMD chips
> by selecting AVX128 instructions and, depending on which instructions
> are supported, either FMA3 (aka FMA) or FMA4.
> 
> Jerry tested this on his AMD systems, and found a speedup vs. the
> current code of around 10%.
> 
> I have been unable to test this on a Ryzen system (the new compile farm
> machines won't accept my login yet).  From the benchmarks I have read,
> this method should also work fairly well on a Ryzen.
> 
> So, OK for trunk?

Yes, OK.  Maybe test Ryzen first?

I just confirmed access to the Ryzen machines so I plan to get set up and test 
there.

Time to start looking under the hood.

cat /proc/cpuinfo gives for flags:

flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 
clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 
constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni 
pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c 
rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 
3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext 
perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap 
clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv 
svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter 
pfthreshold avic overflow_recov succor smca
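
[Editorial sketch, not part of the message.] When reading a flags line like the one above, "fma" has to be matched as a whole word, otherwise "fma4" or "avx2" would give false positives for "fma" or "avx". A minimal sketch of such a check, assuming a hypothetical helper named has_cpu_flag that takes the text after "flags :" as a string:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if NAME occurs in FLAGS as a complete
   whitespace-separated token.  has_cpu_flag is a hypothetical helper,
   not something taken from libgfortran.  */
bool
has_cpu_flag (const char *flags, const char *name)
{
  assert (flags != NULL && name != NULL);

  size_t len = strlen (name);
  if (len == 0)
    return false;

  const char *p = flags;
  while (*p != '\0')
    {
      /* Skip the separators in front of the next token.  */
      p += strspn (p, " \t\n");

      /* Length of the token that starts at p (0 at end of string).  */
      size_t tok = strcspn (p, " \t\n");

      /* Require equal length so that "fma" does not match "fma4".  */
      if (tok == len && strncmp (p, name, len) == 0)
        return true;

      p += tok;
    }
  return false;
}
```

For the Ryzen flags quoted above, has_cpu_flag (flags, "fma") would be true and has_cpu_flag (flags, "fma4") false, since the list contains the token "fma" but no "fma4".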


* Re: [patch, libfortran] AMD-specific versions of library matmul
  2017-05-25 14:11 ` Jerry DeLisle
@ 2017-05-25 14:52   ` Thomas Koenig
  0 siblings, 0 replies; 9+ messages in thread
From: Thomas Koenig @ 2017-05-25 14:52 UTC (permalink / raw)
  To: Jerry DeLisle, fortran, gcc-patches

Hi Jerry,

> Yes, OK.  Maybe test Ryzen first?

Sure, I can wait for a bit :-)

> I just confirmed access to the Ryzen machines so I plan to get set up 
> and test there.

The gcc compile farm machines?  My ssh key does not work there...

I have based the choice of FMA(3) over FMA4 when both are available
on a short remark in a benchmark that FMA3 is faster... it might
be interesting to see if that is actually true.

Regards

	Thomas


* Re: [patch, libfortran] AMD-specific versions of library matmul
       [not found] <96c89d30-d311-22e6-e80d-8b87c68d3c60@netcologne.de>
  2017-05-25 11:19 ` Fwd: [patch, libfortran] AMD-specific versions of library matmul Thomas Koenig
  2017-05-25 14:11 ` Jerry DeLisle
@ 2017-05-25 17:58 ` Janne Blomqvist
  2017-05-25 20:31   ` Jerry DeLisle
  2 siblings, 1 reply; 9+ messages in thread
From: Janne Blomqvist @ 2017-05-25 17:58 UTC (permalink / raw)
  To: Thomas Koenig; +Cc: fortran, gcc-patches

On Thu, May 25, 2017 at 1:45 PM, Thomas Koenig <tkoenig@netcologne.de> wrote:
> Hello world,
>
> the attached patch speeds up the library version of matmul for AMD chips
> by selecting AVX128 instructions and, depending on which instructions
> are supported, either FMA3 (aka FMA) or FMA4.
>
> Jerry tested this on his AMD systems, and found a speedup vs. the
> current code of around 10%.
>
> I have been unable to test this on a Ryzen system (the new compile farm
> machines won't accept my login yet).  From the benchmarks I have read,
> this method should also work fairly well on a Ryzen.
>
> So, OK for trunk?

In some comments, you have -mprefer=avx128 whereas the option that gcc
understands is -mprefer-avx128. Also, have you verified that e.g.
contemporary Intel processors still use the avx256 codepath and don't
accidentally end up with avx128?

As for FMA4, are there sufficient numbers of processors supporting
FMA4 but not FMA3 around to justify bloating the library to support
them? I understood that this is only a single AMD CPU generation
("bulldozer" in 2011), the next one ("piledriver" in 2012) added FMA3
in addition to FMA4. And in the new Zen core (Ryzen, Epyc, etc.) AMD
has dropped support for FMA4 although there are reports that it will
still execute FMA4 for backward compatibility although it's no longer
advertised in CPUID, but in any case AMD seems to consider it a legacy
instruction that should not be used anymore (Intel never supported
it).
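
[Editorial sketch, not part of the message.] Whether a CPU advertises FMA3 or FMA4 can be read from CPUID: FMA3 is bit 12 of ECX in leaf 1, and FMA4 is bit 16 of ECX in extended leaf 0x80000001. The sketch below assumes hypothetical names (struct fma_support, decode_fma_bits, query_fma_support); only __get_cpuid from GCC's <cpuid.h> is a real interface. It reports what is advertised, so a chip that executes FMA4 without advertising it would show fma4 as false. A complete check would also confirm operating-system support for the AVX register state, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit positions as documented for CPUID.  */
#define CPUID_1_ECX_FMA        (1u << 12)   /* leaf 1, ECX: FMA3 */
#define CPUID_EXT1_ECX_FMA4    (1u << 16)   /* leaf 0x80000001, ECX: FMA4 */

/* Hypothetical result type.  */
struct fma_support
{
  bool fma3;
  bool fma4;
};

/* Decode the two ECX values; kept separate from the CPUID call so the
   logic can be exercised with made-up register contents.  */
void
decode_fma_bits (unsigned int ecx_leaf1, unsigned int ecx_ext1,
                 struct fma_support *out)
{
  assert (out != NULL);
  out->fma3 = (ecx_leaf1 & CPUID_1_ECX_FMA) != 0;
  out->fma4 = (ecx_ext1 & CPUID_EXT1_ECX_FMA4) != 0;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Query the running CPU.  Returns false if leaf 1 is unavailable.  */
bool
query_fma_support (struct fma_support *out)
{
  unsigned int eax = 0, ebx = 0, edx = 0;
  unsigned int ecx1 = 0, ecx_ext = 0;

  if (!__get_cpuid (1, &eax, &ebx, &ecx1, &edx))
    return false;

  /* A missing extended leaf simply means that FMA4 is not advertised.  */
  if (!__get_cpuid (0x80000001, &eax, &ebx, &ecx_ext, &edx))
    ecx_ext = 0;

  decode_fma_bits (ecx1, ecx_ext, out);
  return true;
}
#endif
```

On a system that advertises only FMA, query_fma_support would be expected to set fma3 and leave fma4 clear.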

-- 
Janne Blomqvist


* Re: [patch, libfortran] AMD-specific versions of library matmul
  2017-05-25 17:58 ` Janne Blomqvist
@ 2017-05-25 20:31   ` Jerry DeLisle
  2017-05-25 23:44     ` Thomas Koenig
  0 siblings, 1 reply; 9+ messages in thread
From: Jerry DeLisle @ 2017-05-25 20:31 UTC (permalink / raw)
  To: Janne Blomqvist, Thomas Koenig; +Cc: fortran, gcc-patches

On 05/25/2017 10:20 AM, Janne Blomqvist wrote:
> On Thu, May 25, 2017 at 1:45 PM, Thomas Koenig <tkoenig@netcologne.de> wrote:
>> Hello world,
>>
>> the attached patch speeds up the library version of matmul for AMD chips
>> by selecting AVX128 instructions and, depending on which instructions
>> are supported, either FMA3 (aka FMA) or FMA4.
>>
>> Jerry tested this on his AMD systems, and found a speedup vs. the
>> current code of around 10%.
>>
>> I have been unable to test this on a Ryzen system (the new compile farm
>> machines won't accept my login yet).  From the benchmarks I have read,
>> this method should also work fairly well on a Ryzen.
>>
>> So, OK for trunk?
> 
> In some comments, you have -mprefer=avx128 whereas the option that gcc
> understands is -mprefer-avx128. Also, have you verified that e.g.
> contemporary Intel processors still use the avx256 codepath and don't
> accidentally end up with avx128?
> 
> As for FMA4, are there sufficient numbers of processors supporting
> FMA4 but not FMA3 around to justify bloating the library to support
> them? I understood that this is only a single AMD CPU generation
> ("bulldozer" in 2011), the next one ("piledriver" in 2012) added FMA3
> in addition to FMA4. And in the new Zen core (Ryzen, Epyc, etc.) AMD
> has dropped support for FMA4 although there are reports that it will
> still execute FMA4 for backward compatibility although it's no longer
> advertised in CPUID, but in any case AMD seems to consider it a legacy
> instruction that should not be used anymore (Intel never supported
> it).
> 

Good questions. I am testing this on Ryzen now. It does work as advertised. The 
cpu flags only advertise FMA.

So I will test the older AMD machine, which advertises both FMA4 and FMA, with 
just the FMA flag, and likewise test the Ryzen with both FMA4 and FMA.

I want to see if there is any breakage between the two generations of AMD I can 
access.

Also Ryzen with and without -mprefer-avx128 will be tested.

I do not have an Intel box to test.

Regards,

Jerry



* Re: [patch, libfortran] AMD-specific versions of library matmul
  2017-05-25 20:31   ` Jerry DeLisle
@ 2017-05-25 23:44     ` Thomas Koenig
  2017-05-26  5:41       ` Jerry DeLisle
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Koenig @ 2017-05-25 23:44 UTC (permalink / raw)
  To: Jerry DeLisle, Janne Blomqvist; +Cc: fortran, gcc-patches

Hi everybody,

I have committed the patch (with the corrections for the name)
as rev 248472.

The infrastructure is in place, so we will be able to make
any fine-tuning easily.

Regards

	Thomas


* Re: [patch, libfortran] AMD-specific versions of library matmul
  2017-05-25 23:44     ` Thomas Koenig
@ 2017-05-26  5:41       ` Jerry DeLisle
  2017-05-26  6:21         ` Andrew Pinski
  0 siblings, 1 reply; 9+ messages in thread
From: Jerry DeLisle @ 2017-05-26  5:41 UTC (permalink / raw)
  To: Thomas Koenig, Janne Blomqvist; +Cc: fortran, gcc-patches

On 05/25/2017 02:57 PM, Thomas Koenig wrote:
> Hi everybody,
> 
> I have committed the patch (with the corrections for the name)
> as rev 248472.
> 
> The infrastructure is in place, so we will be able to make
> any fine-tuning easily.
> 
> Regards
> 
>      Thomas

Based on my testing I think it is close enough as is.

Thanks Thomas

Jerry


* Re: [patch, libfortran] AMD-specific versions of library matmul
  2017-05-26  5:41       ` Jerry DeLisle
@ 2017-05-26  6:21         ` Andrew Pinski
  2017-05-26 15:33           ` Bill Seurer
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Pinski @ 2017-05-26  6:21 UTC (permalink / raw)
  To: Jerry DeLisle; +Cc: Thomas Koenig, Janne Blomqvist, fortran, gcc-patches

On Thu, May 25, 2017 at 6:43 PM, Jerry DeLisle <jvdelisle@charter.net> wrote:
> On 05/25/2017 02:57 PM, Thomas Koenig wrote:
>>
>> Hi everybody,
>>
>> I have committed the patch (with the corrections for the name)
>> as rev 248472.
>>
>> The infrastructure is in place, so we will be able to make
>> any fine-tuning easily.
>>
>> Regards
>>
>>      Thomas
>
>
> Based on my testing I think it is close enough as is.

This patch most likely broke all non-x86 targets:
configure: error: conditional "HAVE_AVX128" was never defined.
Usually this means the macro was only invoked conditionally.
Makefile:19843: recipe for target 'configure-target-libgfortran' failed
make[1]: *** [configure-target-libgfortran] Error 1
make[1]: *** Waiting for unfinished jobs....


Thanks,
Andrew

>
> Thanks Thomas
>
> Jerry


* Re: [patch, libfortran] AMD-specific versions of library matmul
  2017-05-26  6:21         ` Andrew Pinski
@ 2017-05-26 15:33           ` Bill Seurer
  0 siblings, 0 replies; 9+ messages in thread
From: Bill Seurer @ 2017-05-26 15:33 UTC (permalink / raw)
  To: Andrew Pinski, Jerry DeLisle
  Cc: Thomas Koenig, Janne Blomqvist, fortran, gcc-patches

On 05/26/2017 12:41 AM, Andrew Pinski wrote:
> On Thu, May 25, 2017 at 6:43 PM, Jerry DeLisle <jvdelisle@charter.net> wrote:
>> On 05/25/2017 02:57 PM, Thomas Koenig wrote:
>>>
>>> Hi everybody,
>>>
>>> I have committed the patch (with the corrections for the name)
>>> as rev 248472.
>>>
>>> The infrastructure is in place, so we will be able to make
>>> any fine-tuning easily.
>>>
>>> Regards
>>>
>>>      Thomas
>>
>>
>> Based on my testing I think it is close enough as is.
>
> This patch most likely broke all non-x86 targets:
> configure: error: conditional "HAVE_AVX128" was never defined.
> Usually this means the macro was only invoked conditionally.
> Makefile:19843: recipe for target 'configure-target-libgfortran' failed
> make[1]: *** [configure-target-libgfortran] Error 1
> make[1]: *** Waiting for unfinished jobs....

Yup, this is definitely what broke (almost) everything.  Revision 248471 
worked fine, and the above error started with 248472.
-- 

-Bill Seurer
