From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-306348-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 18155 invoked by alias); 2 Nov 2011 20:37:03 -0000
Received: (qmail 18105 invoked by uid 22791); 2 Nov 2011 20:37:02 -0000
X-SWARE-Spam-Status: No, hits=-2.2 required=5.0	tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,TW_AV
X-Spam-Check-By: sourceware.org
Received: from mail-ey0-f175.google.com (HELO mail-ey0-f175.google.com) (209.85.215.175)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 02 Nov 2011 20:36:47 +0000
Received: by eyd9 with SMTP id 9so585718eyd.20        for <gcc-patches@gcc.gnu.org>; Wed, 02 Nov 2011 13:36:45 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.182.232.38 with SMTP id tl6mr1227394obc.22.1320266204969; Wed, 02 Nov 2011 13:36:44 -0700 (PDT)
Received: by 10.182.17.232 with HTTP; Wed, 2 Nov 2011 13:36:44 -0700 (PDT)
In-Reply-To: <63EE40A00BA43F49B85FACBB03F078B6090AC041A7@sausexmbp02.amd.com>
References: <20110712212201.23194.45716.sendpatchset@gccpike4.amd.com>	<4E1CC32B.3060004@redhat.com>	<63EE40A00BA43F49B85FACBB03F078B60821086630@sausexmbp02.amd.com>	<63EE40A00BA43F49B85FACBB03F078B6090A9D90F2@sausexmbp02.amd.com>	<CAFiYyc3kdDNqpbg7yEFVX7wX0LZtt2puZQnSBt92aebwfRmmeQ@mail.gmail.com>	<63EE40A00BA43F49B85FACBB03F078B6090AC041A7@sausexmbp02.amd.com>
Date: Wed, 02 Nov 2011 20:50:00 -0000
Message-ID: <CAFiYyc2MrgHoARWTzV3EBpHFfLY6QnyrMQc1bQN4YT01L78e8Q@mail.gmail.com>
Subject: Re: AVX generic mode tuning discussion.
From: Richard Guenther <richard.guenther@gmail.com>
To: "Jagasia, Harsha" <harsha.jagasia@amd.com>
Cc: Richard Henderson <rth@redhat.com>, "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, 	"hubicka@ucw.cz" <hubicka@ucw.cz>, "ubizjak@gmail.com" <ubizjak@gmail.com>, 	"hjl.tools@gmail.com" <hjl.tools@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2011-11/txt/msg00238.txt.bz2

On Wed, Nov 2, 2011 at 5:57 PM, Jagasia, Harsha <harsha.jagasia@amd.com> wrote:
>> >> > > We would like to propose changing AVX generic mode tuning to
>> >> > > generate 128-bit AVX instead of 256-bit AVX.
>> >> >
>> >> > You indicate a 3% reduction on bulldozer with avx256.
>> >> > How does avx128 compare to -mno-avx -msse4.2?
>> >>
>> >> We see these % differences going from SSE42 to AVX128 to AVX256 on
>> >> Bulldozer with "-mtune=generic -Ofast".
>> >> (Positive is improvement, negative is degradation)
>> >>
>> >> Bulldozer:
>> >>                   AVX128/SSE42    AVX256/AVX-128
>> >> 410.bwaves           -1.4%            -1.4%
>> >> 416.gamess           -1.1%             0.0%
>> >> 433.milc              0.5%            -2.4%
>> >> 434.zeusmp            9.7%            -2.1%
>> >> 435.gromacs           5.1%             0.5%
>> >> 436.cactusADM         8.2%           -23.8%
>> >> 437.leslie3d          8.1%             0.4%
>> >> 444.namd              3.6%             0.0%
>> >> 447.dealII           -1.4%            -0.4%
>> >> 450.soplex           -0.4%            -0.4%
>> >> 453.povray            0.0%            -1.5%
>> >> 454.calculix         15.7%            -8.3%
>> >> 459.GemsFDTD          4.9%             1.4%
>> >> 465.tonto             1.3%            -0.6%
>> >> 470.lbm               0.9%             0.3%
>> >> 481.wrf               7.3%            -3.6%
>> >> 482.sphinx3           5.0%            -9.8%
>> >> SPECFP                3.8%            -3.2%
>> >>
>> >> > Will the next AMD generation have a usable avx256?
>> >> > I'm not keen on the idea of generic mode being tuned
>> >> > for a single processor revision that maybe shouldn't
>> >> > actually be using avx at all.
>> >>
>> >> We see a substantial gain in several SPECFP benchmarks going from
>> >> SSE42 to AVX128 on Bulldozer.
>> >> IMHO, accomplishing even a 5% gain in an individual benchmark takes
>> >> a hardware company several man months.
>> >> The loss with AVX256 for Bulldozer is much more significant than the
>> >> gain for SandyBridge.
>> >> While the general trend in the industry is a move toward AVX256, for
>> >> now we would be disadvantaging Bulldozer with this choice.
>> >>
>> >> We have several customers who use -mtune=generic, and it is the
>> >> default unless a user explicitly overrides it with -mtune=native.
>> >> They are the ones who want to experiment with the latest ISA using
>> >> gcc, but want to keep their ISA selection and tuning agnostic on
>> >> x86/64. IMHO, it is with these customers in mind that generic was
>> >> introduced in the first place.
>> >
>> > Since stage 1 closure is around the corner, just wanted to ping to
>> > see if the maintainers have made up their mind on this one.
>> > AVX-128 is an improvement over SSE42 for Bulldozer and AVX-256 wipes
>> > out pretty much all of that gain in generic mode.
>> > Until there is a convergence on AVX-256 for x86/64, we would like to
>> > propose having generic generate avx-128 by default and have a user
>> > override to avx-256 manually when known to benefit performance.
>>
>> Did somebody spend the time analyzing why CactusADM shows so much of a
>> difference?
>> With the recent improvements in vectorizing for AVX, did you re-do the
>> measurements with a recent trunk?
>>
>> I don't think disabling avx-256 by default is a good idea until we
>> understand why these numbers happen and are convinced we cannot fix
>> this by proper cost modeling.
>
> We have observed cases where AVX-256 bit code is slower than AVX-128 bit
> code on Bulldozer. This is because internally the front end, data paths
> etc for Bulldozer are designed for optimal AVX 128-bit. Throwing densely
> packed 256-bit code at the pipeline can congest the front end causing
> stalls and hence slowdowns. We expect the behavior of cactus, calculix
> and sphinx, which are the 3 benchmarks with the biggest avx-256 gaps, to
> be in the same vein. In general, the hardware design engineers recommend
> running AVX 128-bit code on Bulldozer. Given the underlying hardware
> design, software tuning can't really change the results here. Any further
> analysis of cactus would be a cycle sink at our end and we may not even
> be able to discuss the details on a public mailing list. x86/64 has not
> yet converged on avx-256 and generic mode should reflect that.

Well, generic hasn't converged on AVX at all.  Cost modeling can deal
with code density just fine - are there any differences between the
code density issues of, say, loads vs. stores vs. arithmetic?  I
specifically ask about analysis because AVX-256 has instruction set
issues for certain patterns the vectorizer generates, and the cost
model currently does not reflect these at all.

Richard.

> Posting the re-measurements on trunk for cactus, calculix and sphinx on
> Bulldozer:
>                   AVX128/SSE42    AVX256/AVX-128
> 436.cactusADM        10%             -30%
> 454.calculix         14.7%            -6%
> 482.sphinx3           7%              -9%
>
> All positive % above are improvements, all negative % are degradations.
>
> I will post re-measurements for all of Spec with latest trunk as soon as
> I have them.
>
> Thoughts?
>
> Thanks,
> Harsha
>
>
>
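
For reference, the two columns in the tables quoted in this thread are
successive ratios (AVX128 relative to SSE42, then AVX256 relative to
AVX128), so the net AVX256-versus-SSE42 change is their product rather
than their sum. The sketch below works that out for the rows discussed
most. It assumes the percentages are relative changes in SPEC score, which
the thread does not state explicitly; the helper names are hypothetical
and only the input figures come from the messages above.

```python
# Hypothetical helpers (not from the thread): compound two successive
# relative changes. ASSUMPTION: each percentage is a relative change in
# SPEC score, so successive changes multiply.

def net_change_percent(first_pct, second_pct):
    """Combined change, in percent, of applying first_pct and then second_pct."""
    return ((1 + first_pct / 100.0) * (1 + second_pct / 100.0) - 1) * 100.0


# (AVX128 vs SSE42, AVX256 vs AVX128) pairs as quoted in the thread.
ORIGINAL = {
    "436.cactusADM": (8.2, -23.8),
    "454.calculix": (15.7, -8.3),
    "482.sphinx3": (5.0, -9.8),
    "SPECFP": (3.8, -3.2),
}
REMEASURED = {
    "436.cactusADM": (10.0, -30.0),
    "454.calculix": (14.7, -6.0),
    "482.sphinx3": (7.0, -9.0),
}


def net_table(rows):
    """Map each benchmark to its net AVX256-vs-SSE42 change, rounded to 0.1%."""
    return {name: round(net_change_percent(a, b), 1) for name, (a, b) in rows.items()}


if __name__ == "__main__":
    for label, rows in (("original", ORIGINAL), ("re-measured", REMEASURED)):
        print(label)
        for name, net in net_table(rows).items():
            print(f"  {name:<14}{net:+.1f}%")
```

Under that assumption the SPECFP geomean row compounds to roughly +0.5%
for AVX256 over SSE42, which matches the statement that AVX-256 wipes out
pretty much all of the AVX-128 gain in generic mode.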