public inbox for gcc-help@gcc.gnu.org
From: "Kewen.Lin" <linkw@linux.ibm.com>
To: Richard Biener <rguenther@suse.de>
Cc: bin.cheng@linux.alibaba.com,
	Segher Boessenkool <segher@kernel.crashing.org>,
	172060045@hdu.edu.cn, gcc-help <gcc-help@gcc.gnu.org>,
	Jan Hubicka <hubicka@ucw.cz>,
	Richard Sandiford <richard.sandiford@arm.com>
Subject: Re: Why vectorization didn't turn on by -O2
Date: Mon, 16 Aug 2021 11:22:33 +0800
Message-ID: <e3b776f5-14d2-882a-b647-0ad4a55bcd46@linux.ibm.com>
In-Reply-To: <nycvar.YFH.7.76.2108041028300.11781@zhemvz.fhfr.qr>

On 2021/8/4 4:31 PM, Richard Biener wrote:
> On Wed, 4 Aug 2021, Richard Sandiford wrote:
> 
>> Hongtao Liu <crazylht@gmail.com> writes:
>>> On Tue, May 18, 2021 at 4:27 AM Richard Sandiford via Gcc-help
>>> <gcc-help@gcc.gnu.org> wrote:
>>>>
>>>> Jan Hubicka <hubicka@ucw.cz> writes:
>>>>> Hi,
>>>>> here are updated scores.
>>>>> https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&all_elf_detail_stats=on&min_percentage_change=0.001&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
>>>>> compares
>>>>>   base:  mainline
>>>>>   1st column: mainline with very cheap vectorization at -O2 and -O3
>>>>>   2nd column: mainline with cheap vectorization at -O2 and -O3.
>>>>>
>>>>> The short story is:
>>>>>
>>>>> 1) -O2 generic performance
>>>>>     kabylake (Intel):
>>>>>                               very    cheap
>>>>>         SPEC/SPEC2006/FP/total        ~       8.32%
>>>>>       SPEC/SPEC2006/total     -0.38%  4.74%
>>>>>       SPEC/SPEC2006/INT/total -0.91%  -0.14%
>>>>>
>>>>>       SPEC/SPEC2017/INT/total 4.71%   7.11%
>>>>>       SPEC/SPEC2017/total     2.22%   6.52%
>>>>>       SPEC/SPEC2017/FP/total  0.34%   6.06%
>>>>>     zen
>>>>>         SPEC/SPEC2006/FP/total        0.61%   10.23%
>>>>>       SPEC/SPEC2006/total     0.26%   6.27%
>>>>>       SPEC/SPEC2006/INT/total 34.006  -0.24%  0.90%
>>>>>
>>>>>         SPEC/SPEC2017/INT/total       3.937   5.34%   7.80%
>>>>>       SPEC/SPEC2017/total     3.02%   6.55%
>>>>>       SPEC/SPEC2017/FP/total  1.26%   5.60%
>>>>>
>>>>>  2) -O2 size:
>>>>>      -0.78% (very cheap) 6.51% (cheap) for spec2k2006
>>>>>      -0.32% (very cheap) 6.75% (cheap) for spec2k2017
>>>>>  3) build times:
>>>>>      0%, 0.16%, 0.71%, 0.93% (very cheap) 6.05% 4.80% 6.75% 7.15% (cheap) for spec2k2006
>>>>>      0.39% 0.57% 0.71%       (very cheap) 5.40% 6.23% 8.44%       (cheap) for spec2k2017
>>>>>     here I simply copied data from different configurations
>>>>>
>>>>> So for SPEC I would say that most of the compile time cost is derived
>>>>> from code size growth, which is a problem with the cheap model but not with
>>>>> very cheap.  Very cheap indeed results in code size improvements, and
>>>>> the compile time impact is probably somewhere around 0.5%.
>>>>>
>>>>> So from these scores alone it would seem to me that vectorization makes
>>>>> sense at -O2 with the very cheap model (I am sure we have other
>>>>> optimizations with worse benefit-to-compile-time tradeoffs).
>>>>
>>>> Thanks for running these.
>>>>
>>>> The biggest issue I know of for enabling very-cheap at -O2 is:
>>>>
>>>>    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
>>>>
>>>> Perhaps we could get around that by (hopefully temporarily) disabling
>>>> BB SLP within loop vectorisation for the very-cheap model.  This would
>>>> purely be a workaround and we should remove it once the PR is fixed.
>>>> (It would even be a compile-time win in the meantime :-))
>>>>
>>>> Thanks,
>>>> Richard
>>>>
>>>>> However there are usual arguments against:
>>>>>
>>>>>   1) Vectorizer being tuned for SPEC.  I think the only way to overcome
>>>>>      that argument is to enable it by default :)
>>>>>   2) Workloads improved are more of -Ofast type workloads
>>>>>
>>>>> Here are non-spec benchmarks we track:
>>>>> https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=9388fc7bf0da61a8104e8501e5965120e9159e12%2Cea21f32198432a490dd490696322838d94b3d3b2%2C4f5431c5768bbba81a422f6fed6a6e2454c700ee%2C&include_user_branches=on
>>>>>
>>>>> I also tried to run Firefox some time ago. Results are not surprising -
>>>>> vectorization helps the rendering benchmarks, which are the ones compiled with
>>>>> aggressive flags anyway.
>>>>>
>>>>> Honza
>>>
>>> Hi:
>>>   I would like to ask if we can turn on O2 vectorization now?
>>
>> I think we still need to deal with the PR100089 issue that I mentioned above.
>> Like I say, “dealing with” it could be as simple as disabling:
>>
>>       /* If we applied if-conversion then try to vectorize the
>> 	 BB of innermost loops.
>> 	 ???  Ideally BB vectorization would learn to vectorize
>> 	 control flow by applying if-conversion on-the-fly, the
>> 	 following retains the if-converted loop body even when
>> 	 only non-if-converted parts took part in BB vectorization.  */
>>       if (flag_tree_slp_vectorize != 0
>> 	  && loop_vectorized_call
>> 	  && ! loop->inner)
>>
>> for the very-cheap vector cost model until the PR is fixed properly.
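
For reference, an untested sketch of that stop-gap (assuming the option
variable is flag_vect_cost_model and the enum value is
VECT_COST_MODEL_VERY_CHEAP, as in flag-types.h) would just add one more
condition to the check quoted above:

      /* Temporary workaround for PR100089: do not try BB vectorization
         of the if-converted loop body under the very-cheap cost model.  */
      if (flag_tree_slp_vectorize != 0
          && flag_vect_cost_model != VECT_COST_MODEL_VERY_CHEAP
          && loop_vectorized_call
          && ! loop->inner)

i.e. purely a temporary gate, with no behaviour change for the other cost
models.
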
> 
> Alternatively only enable loop vectorization at -O2 (the above checks
> flag_tree_slp_vectorize as well).  At least the cost model kind
> does not have any influence on BB vectorization, that is, we get the
> same pros and cons as we do for -O3.
> 
> Did anyone benchmark -O2 -ftree-{loop,slp}-vectorize separately yet?
> 


Here are the measured performance speedups for -O2 vectorization with the
very-cheap cost model on both Power8 and Power9.

INT: -O2 -mcpu=power{8,9} -ftree-{,loop-,slp-}vectorize -fvect-cost-model=very-cheap
FP: INT + -ffast-math

Column titles are:

<bmks>  <both loop and slp>  <loop only>  <slp only> (+:improvement, -:degradation)
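
Spelled out (just my shorthand above expanded, matching the three columns;
the FP set adds -ffast-math):

  both loop and slp:  -O2 -mcpu=power{8,9} -ftree-vectorize -fvect-cost-model=very-cheap
  loop only:          -O2 -mcpu=power{8,9} -ftree-loop-vectorize -fvect-cost-model=very-cheap
  slp only:           -O2 -mcpu=power{8,9} -ftree-slp-vectorize -fvect-cost-model=very-cheap

(-ftree-vectorize itself just enables both -ftree-loop-vectorize and
-ftree-slp-vectorize.)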

Power8:
500.perlbench_r 	0.00% 0.00% 0.00%
502.gcc_r 		0.39% 0.78% 0.00%
505.mcf_r 		0.00% 0.00% 0.00%
520.omnetpp_r 		1.21% 0.30% 0.00%
523.xalancbmk_r 	0.00% 0.00% -0.57%
525.x264_r  		41.84%  42.55%  0.00%
531.deepsjeng_r 	0.00% -0.63%  0.00%
541.leela_r 		-3.44%  -2.75%  0.00%
548.exchange2_r 	1.66% 1.66% 0.00%
557.xz_r  		1.39% 1.04% 0.00%
Geomean 		3.67% 3.64% -0.06%

503.bwaves_r  		0.00% 0.00% 0.00%
507.cactuBSSN_r 	0.00% 0.29% 0.44%
508.namd_r  		0.00% 0.29% 0.00%
510.parest_r  		0.00% -0.36%  -0.54%
511.povray_r  		0.63% 0.31% 0.94%
519.lbm_r 		2.71% 2.71% 0.00%
521.wrf_r 		1.04% 1.04% 0.00%
526.blender_r 		-1.31%  -0.78%  0.00%
527.cam4_r  		-0.62%  -0.31%  -0.62%
538.imagick_r 		0.21% 0.21% -0.21%
544.nab_r 		0.00% 0.00% 0.00%
549.fotonik3d_r 	0.00% 0.00% 0.00%
554.roms_r  		0.30% 0.00% 0.00%
Geomean 		0.22% 0.26% 0.00%

Power9:

500.perlbench_r 	0.62% 0.62% -1.54%
502.gcc_r 		-0.60%  -0.60%  -0.81%
505.mcf_r 		2.05% 2.05% 0.00%
520.omnetpp_r 		-2.41%  -0.30%  -0.60%
523.xalancbmk_r 	-1.44%  -2.30%  -1.44%
525.x264_r  		24.26%  23.93%  -0.33%
531.deepsjeng_r 	0.32% 0.32% 0.00%
541.leela_r 		0.39% 1.18% -0.39%
548.exchange2_r 	0.76% 0.76% 0.00%
557.xz_r  		0.36% 0.36% -0.36%
Geomean 		2.19% 2.38% -0.55%

503.bwaves_r  		0.00% 0.36% 0.00%
507.cactuBSSN_r 	0.00% 0.00% 0.00%
508.namd_r  		-3.73%  -0.31%  -3.73%
510.parest_r  		-0.21%  -0.42%  -0.42%
511.povray_r  		-0.96%  -1.59%  0.64%
519.lbm_r 		2.31% 2.31% 0.17%
521.wrf_r 		2.66% 2.66% 0.00%
526.blender_r 		-1.96%  -1.68%  1.40%
527.cam4_r  		0.00% 0.91% 1.81%
538.imagick_r 		0.39% -0.19%  -10.29%  // known noise, imagick_r can have big jitter on P9 box sometimes.
544.nab_r 		0.25% 0.00% 0.00%
549.fotonik3d_r 	0.94% 0.94% 0.00%
554.roms_r  		0.00% 0.00% -1.05%
Geomean 		-0.03%  0.22% -0.93%


As shown above, the gains come mainly from loop vectorization.
By the way, the Power8 data may be more representative, since some benchmarks
can show jitter on our P9 performance box.
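
In case anyone wants a quick sanity check on a given target, a toy kernel
(made up here, not one of the SPEC loops) such as

  /* reduc.c: the iteration count is a compile-time multiple of the vector
     length and restrict rules out aliasing, so even the very-cheap model
     needs no epilogue, peeling or runtime checks.  */
  void
  saxpy1024 (float a, float *restrict x, float *restrict y)
  {
    for (int i = 0; i < 1024; i++)
      y[i] = a * x[i] + y[i];
  }

should be reported as vectorized by something like

  gcc -O2 -mcpu=power9 -ftree-loop-vectorize -fvect-cost-model=very-cheap \
      -fopt-info-vec -c reduc.c

while dropping -ftree-loop-vectorize (with a compiler that does not yet enable
vectorization at -O2) makes the "loop vectorized" note disappear again, which
is an easy way to see the loop vs. SLP split in the numbers above.
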

BR,
Kewen
