From: Richard Guenther <rguenther@suse.de>
To: "William J. Schmidt" <wschmidt@linux.vnet.ibm.com>
Cc: Richard Guenther <richard.guenther@gmail.com>,
gcc-patches@gcc.gnu.org, bergner@vnet.ibm.com
Subject: Re: [PATCH] Correct cost model for strided loads
Date: Mon, 11 Jun 2012 14:13:00 -0000 [thread overview]
Message-ID: <Pine.LNX.4.64.1206111559220.29541@jbgna.fhfr.qr> (raw)
In-Reply-To: <1339422849.3043.52.camel@gnopaine>
[-- Attachment #1: Type: TEXT/PLAIN, Size: 6718 bytes --]
On Mon, 11 Jun 2012, William J. Schmidt wrote:
>
>
> On Mon, 2012-06-11 at 11:15 +0200, Richard Guenther wrote:
> > On Sun, Jun 10, 2012 at 5:58 PM, William J. Schmidt
> > <wschmidt@linux.vnet.ibm.com> wrote:
> > > The fix for PR53331 caused a degradation to 187.facerec on
> > > powerpc64-unknown-linux-gnu. The following simple patch reverses the
> > > degradation without otherwise affecting SPEC cpu2000 or cpu2006.
> > > Bootstrapped and regtested on that platform with no new regressions. Ok
> > > for trunk?
> >
> > Well, would the real cost not be subparts * scalar_to_vec plus
> > subparts * vec_perm?
> > At least vec_perm isn't the cost for building up a vector from N scalar elements
> > either (it might be close enough if subparts == 2). What's the case
> > with facerec
> > here? Does it have subparts == 2?
>
> In this case, subparts == 4 (32-bit floats, 128-bit vec reg). On
> PowerPC, this requires two merge instructions and a permute instruction
> to get the four 32-bit quantities into the right place in a 128-bit
> register. Currently this is modeled as a vec_perm in other parts of the
> vectorizer per Ira's earlier patches, so I naively changed this to do
> the same thing.
I see.
> The types of vectorizer instructions aren't documented, and I can't
> infer much from the i386.c cost model, so I need a little education.
> What semantics are represented by scalar_to_vec?
It's a vector splat, thus x -> { x, x, x, ... }. You can create
{ x, y, z, ... } by N such splats plus N - 1 permutes (if a permute,
as VEC_PERM_EXPR, takes two input vectors). That's by far not
the most efficient way to build up such a vector of course (with AVX
you could do one splat plus N - 1 inserts for example). The cost
is of course dependent on the number of vector elements, so a
simple new enum vect_cost_for_stmt kind does not cover it but
the target would have to look at the vector type passed and do
some reasonable guess.
> On PowerPC, we have this mapping of the floating-point registers and
> vector float registers where they overlap (the low-order half of each of
> the first 32 vector float regs corresponds to a scalar float reg). So
> in this case we have four scalar loads that place things in the bottom
> half of four vector registers, two vector merge instructions that
> collapse the four registers into two vector registers, and a vector
> permute that gets things in the right order.(*) I wonder if what we
> refer to as a merge instruction is similar to scalar_to_vec.
Looks similar to x86 SSE then.
> If so, then maybe we need something like
>
> subparts = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
> inside_cost += vect_get_stmt_cost (scalar_load) * ncopies * subparts;
> inside_cost += ncopies * vect_get_stmt_cost (scalar_to_vec) * subparts / 2;
> inside_cost += ncopies * vect_get_stmt_cost (vec_perm);
>
> But then we'd have to change how vec_perm is modeled elsewhere for
> PowerPC based on Ira's earlier patches. As I said, it's difficult for
> me to figure out all the intent of cost model decisions that have been
> made in the past, using current documentation.
Heh, usually the intent was to make the changes simple, not to compute
a proper cost.
I think we simply need a new scalars_to_vec cost kind.
> > I really wanted to pessimize this case
> > for say AVX and char elements, thus building up a vector from 32 scalars which
> > certainly does not cost a mere vec_perm. So, maybe special-case the
> > subparts == 2 case and assume vec_perm would match the cost only in that
> > case.
>
> (I'm a little confused by this as what you have at the moment is a
> single scalar_to_vec per copy, which has a cost of 1 on most i386
> targets (occasionally 2). The subparts multiplier is only applied to
> the loads. So changing this to vec_perm seemed to be a no-op for i386.)
Oh, I somehow read the patch as you were removing the multiplication
by TYPE_VECTOR_SUBPARTS. But yes, the cost is way off and I'd wanted
to reflect it with N scalar loads plus N splats plus N - 1 permutes
originally. You could also model it with N scalar loads plus N
inserts (but we don't have a vec_insert cost either). I think adding
a scalars_to_vec or vec_init or however we want to call it - basically
what the cost of a vector CONSTRUCTOR would be - is best.
> (*) There are actually a couple more instructions here to convert 64-bit
> values to 32-bit values, since on PowerPC 32-bit loads are converted to
> 64-bit values in scalar float registers and they have to be coerced back
> to 32-bit. Very ugly. The cost model currently doesn't represent this
> at all, which I'll have to look at fixing at some point in some way that
> isn't too nasty for the other targets. The cost model for PowerPC seems
> to need a lot of TLC.
Maybe the above would be a possible way to do it. On x86 for example
a vector of two doubles is extremely cheap for generic SSE2 to construct,
we can directly load into the low/high part, thus when we use it as
N scalar loads plus the vector-build the vector-build would be free.
Thanks,
Richard.
> Thanks,
> Bill
>
> >
> > Thanks,
> > Richard.
> >
> > > Thanks,
> > > Bill
> > >
> > >
> > > 2012-06-10 Bill Schmidt <wschmidt@linux.ibm.com>
> > >
> > > * tree-vect-stmts.c (vect_model_load_cost): Change cost model
> > > for strided loads.
> > >
> > >
> > > Index: gcc/tree-vect-stmts.c
> > > ===================================================================
> > > --- gcc/tree-vect-stmts.c (revision 188341)
> > > +++ gcc/tree-vect-stmts.c (working copy)
> > > @@ -1031,11 +1031,10 @@ vect_model_load_cost (stmt_vec_info stmt_info, int
> > > /* The loads themselves. */
> > > if (STMT_VINFO_STRIDE_LOAD_P (stmt_info))
> > > {
> > > - /* N scalar loads plus gathering them into a vector.
> > > - ??? scalar_to_vec isn't the cost for that. */
> > > + /* N scalar loads plus gathering them into a vector. */
> > > inside_cost += (vect_get_stmt_cost (scalar_load) * ncopies
> > > * TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info)));
> > > - inside_cost += ncopies * vect_get_stmt_cost (scalar_to_vec);
> > > + inside_cost += ncopies * vect_get_stmt_cost (vec_perm);
> > > }
> > > else
> > > vect_get_load_cost (first_dr, ncopies,
> > >
> > >
> >
>
>
--
Richard Guenther <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
next prev parent reply other threads:[~2012-06-11 14:10 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-10 16:32 William J. Schmidt
2012-06-11 9:22 ` Richard Guenther
2012-06-11 14:10 ` William J. Schmidt
2012-06-11 14:13 ` Richard Guenther [this message]
2012-06-11 15:09 ` William J. Schmidt
2012-06-11 18:38 ` Richard Guenther
2012-06-12 11:04 ` Richard Guenther
2012-06-12 12:13 ` William J. Schmidt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.1206111559220.29541@jbgna.fhfr.qr \
--to=rguenther@suse.de \
--cc=bergner@vnet.ibm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=richard.guenther@gmail.com \
--cc=wschmidt@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).