public inbox for gcc-help@gcc.gnu.org
* Re: [AArch64][Spec2017]Question about mlow-precision-div optimization.
@ 2020-02-25 15:27 Wilco Dijkstra
  2020-02-26 13:01 ` Bu Le
  0 siblings, 1 reply; 7+ messages in thread
From: Wilco Dijkstra @ 2020-02-25 15:27 UTC
  To: gcc-help, cityubule

Hi,

> I found that the -mlow-precision-div option uses a fixed number of Newton iterations,
> which is 2 for float and 3 for double.
>
> I noticed that changing the number of Newton iterations as follows gives
> faster performance in the SPEC2017 fpspeed tests on AArch64, with lower but
> acceptable precision.
 
Which CPU did you try this on? Those results look suspicious - lbm hardly does any
divisions for example, so either the computation has gone wrong due to the lower
accuracy or your CPU has a really slow divide...

On modern cores it is faster to do a division than to use the division approximation
instructions. E.g. on Neoverse N1, a float division takes at most 10 cycles while the
reduced approximation takes 13 cycles (and needs 3 extra instructions which take up
decode and issue slots).

Cheers,
Wilco

* [AArch64][Spec2017]Question about mlow-precision-div optimization.
@ 2020-02-23  4:06 Bu Le
  0 siblings, 0 replies; 7+ messages in thread
From: Bu Le @ 2020-02-23  4:06 UTC
  To: gcc-help

Hello world,

I found that the -mlow-precision-div option uses a fixed number of Newton iterations: 2 for float and 3 for double.

I noticed that changing the number of Newton iterations as follows gives faster performance in the SPEC2017 fpspeed tests on AArch64, with lower but acceptable precision.
 
Before change:

frecpe  s2, s8
frecps  s4, s2, s8
fmul    s2, s2, s4
frecps  s4, s2, s8
fmul    s2, s2, s4
fmul    s10, s2

After change:

frecpe  s2, s8
frecps  s4, s2, s8
fmul    s2, s2, s4
fmul    s10, s2
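
For reference, each frecps/fmul pair in the listings above is one Newton-Raphson step for the reciprocal, x' = x * (2 - d * x), which roughly squares the relative error of the estimate. The sketch below models that in Python with ordinary double arithmetic. It is not what GCC emits: rough_estimate is a hypothetical stand-in for frecpe, and the 8 bits of initial accuracy is an assumption based on Arm's description of the estimate instructions, not something stated in this thread.

```python
def rough_estimate(d, bits=8):
    """Hypothetical stand-in for frecpe: 1/d with a relative error of 2**-bits.

    The default of 8 bits is an assumption, not taken from this thread.
    """
    return (1.0 / d) * (1.0 + 2.0 ** -bits)


def newton_reciprocal(d, estimate, iterations):
    """Refine an estimate of 1/d with the given number of Newton-Raphson steps."""
    x = estimate
    for _ in range(iterations):
        step = 2.0 - d * x  # the correction factor that frecps computes
        x = x * step        # the fmul that follows each frecps
    return x


def relative_error(d, x):
    """Relative error of x as an approximation of 1/d, i.e. |x*d - 1|."""
    return abs(x * d - 1.0)


if __name__ == "__main__":
    d = 3.0
    x0 = rough_estimate(d)
    for n in range(4):
        err = relative_error(d, newton_reciprocal(d, x0, n))
        print(f"{n} iteration(s): relative error {err:.3e}")
```

Under that assumption the error falls from about 2^-8 to about 2^-16 after one step and about 2^-32 after two. That would be consistent with 2 steps being chosen for float (24-bit significand) and 3 for double (53-bit significand), and it suggests that dropping one step leaves roughly 16 and 32 good bits respectively.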
 
 
 
The details of the improvement are shown below (with the number of Newton iterations changed to 1 for float and 2 for double):
     
Test case          Improvement
603.bwaves_s       7.92%
607.cactuBSSN_s    Output miscompare
619.lbm_s          32.34%
621.wrf_s          Output miscompare
627.cam4_s         Output miscompare
628.pop2_s         Output miscompare
638.imagick_s      -0.97%
644.nab_s          9.09%
649.fotonik3d_s    Output miscompare
654.roms_s         -3.45%
  
This may benefit test cases that do not need high precision.

Given that the precision of division is already below the IEEE standard when this option is on, why is the iteration count fixed at the magic numbers 2 and 3?

Should we provide a parameter so that users can adjust this value according to their needs?


Thread overview: 7+ messages
2020-02-25 15:27 [AArch64][Spec2017]Question about mlow-precision-div optimization Wilco Dijkstra
2020-02-26 13:01 ` Bu Le
2020-02-27 22:08   ` Wilco Dijkstra
2020-03-03 16:22     ` Richard Sandiford
2020-03-04 13:26       ` Wilco Dijkstra
     [not found]       ` <tencent_5C7FA4816F6BB9D3236327A73C9BA5A39105@qq.com>
2020-03-06 15:24         ` Re: " Richard Sandiford
  -- strict thread matches above, loose matches on Subject: below --
2020-02-23  4:06 Bu Le
