Re: [RFC, vectorizer] Allow single element vector types for vector reduction operations

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <rguenther@suse.de>
To: Andrew Pinski <pinskia@gmail.com>
Cc: Tamar Christina <Tamar.Christina@arm.com>,
	    Andreas Schwab <schwab@linux-m68k.org>,
	Jon Beniston <jon@beniston.com>,
	    "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>
Subject: Re: [RFC, vectorizer] Allow single element vector types for vector reduction operations
Date: Tue, 05 Sep 2017 09:37:00 -0000	[thread overview]
Message-ID: <alpine.LSU.2.20.1709051027370.14191@zhemvz.fhfr.qr> (raw)
In-Reply-To: <CA+=Sn1kQ5tJfhzLY01EO+9bN7tgAYMdPB_-81izEV2oY_wjwyA@mail.gmail.com>

On Mon, 4 Sep 2017, Andrew Pinski wrote:

> On Mon, Sep 4, 2017 at 7:28 AM, Tamar Christina <Tamar.Christina@arm.com> wrote:
> >> >   vect__5.25_58 = VIEW_CONVERT_EXPR<vector(1) long unsigned
> >> intD.11>(vect__4.21_65);
> >> >   vect__5.25_57 = VIEW_CONVERT_EXPR<vector(1) long unsigned
> >> intD.11>(vect__4.22_63);
> >> >   vect__5.25_56 = VIEW_CONVERT_EXPR<vector(1) long unsigned
> >> intD.11>(vect__4.23_61);
> >> >   vect__5.25_55 = VIEW_CONVERT_EXPR<vector(1) long unsigned
> >> > intD.11>(vect__4.24_59);
> >> >
> >> > I suspect this patch will be quite bad for us performance wise as it
> >> > thinks it's as cheap to do all our integer operations on the vector side with
> >> vectors of 1 element. But I'm still waiting for the perf numbers to confirm.
> >>
> >> Looks like the backend advertises that it can do POPCOUNT on V1DI.  So SLP
> >> vectorization decides it can vectorize this without unrolling.
> >
> > We don't, POPCOUNT is only defined for vector modes V8QI and V16QI, we also don't define support
> > For V1DI anywhere in the backend, we do however say we support V1DF, but removing
> > That doesn't cause the ICE to go away.
> >
> >> Vectorization with V2DI is rejected:
> >>
> >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: function is not
> >> vectorizable.
> >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: not vectorized:
> >> relevant stmt not supported: _8 = __builtin_popcountl (_5); ...
> >> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: ***** Re-trying
> >> analysis with vector size 8
> >>
> >> and that now succeeds (it probably didn't succeed before the patch).
> >
> > In the .optimized file, I see it vectorised it to
> >
> >   vect__5.25_58 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.21_65);
> >   vect__5.25_57 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.22_63);
> >   vect__5.25_56 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.23_61);
> >   vect__5.25_55 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.24_59);
> >   _54 = POPCOUNT (vect__5.25_58);
> >   _53 = POPCOUNT (vect__5.25_57);
> >
> > Which is something we just don't have a pattern for. Before this patch, it was rejecting "long unsigned int"
> > With this patch is somehow thinks we support an integer vector of 1 element, even though 1) we don't have an optab
> > Defined for this operation for POPCOUNT (or at all in aarch64 as far as I can tell), and 2) we don't have it in our supported list of vector modes.
> 
> Here are the two popcount optab aarch64 has:
> (define_mode_iterator GPI [SI DI])
> (define_expand "popcount<mode>2"
>   [(match_operand:GPI 0 "register_operand")
>    (match_operand:GPI 1 "register_operand")]
> 
> 
> (define_insn "popcount<mode>2"
> (define_mode_iterator VB [V8QI V16QI])
>   [(set (match_operand:VB 0 "register_operand" "=w")
>         (popcount:VB (match_operand:VB 1 "register_operand" "w")))]
> 
> As you can see we only define popcount optab for SI, DI, V8QI and
> V16QI.  (note SI and DI uses the V8QI and V16QI during the expansion
> but that is a different story).
> 
> Maybe somehow the vectorizer is thinking V1DI and DI are
> interchangeable in some places.

We ask

Breakpoint 5, vectorizable_internal_function (cfn=CFN_BUILT_IN_POPCOUNTL, 
    fndecl=<function_decl 0x7ffff694b500 __builtin_popcountl>, 
    vectype_out=<vector_type 0x7ffff69789d8>, 
    vectype_in=<vector_type 0x7ffff697a690>)
    at /tmp/trunk/gcc/tree-vect-stmts.c:1666
1666      if (internal_fn_p (cfn))
(gdb) p debug_generic_expr (vectype_out)
vector(2) int
$10 = void
(gdb) p debug_generic_expr (vectype_in)
vector(1) long unsigned int
$11 = void
(gdb) fin
Run till exit from #0  vectorizable_internal_function (
    cfn=CFN_BUILT_IN_POPCOUNTL, 
    fndecl=<function_decl 0x7ffff694b500 __builtin_popcountl>, 
    vectype_out=<vector_type 0x7ffff69789d8>, 
    vectype_in=<vector_type 0x7ffff697a690>)
    at /tmp/trunk/gcc/tree-vect-stmts.c:1666
0x0000000001206afc in vectorizable_call (gs=<gimple_call 0x7ffff7fee7e0>, 
    gsi=0x0, vec_stmt=0x0, slp_node=0x2451290)
    at /tmp/trunk/gcc/tree-vect-stmts.c:2762
2762        ifn = vectorizable_internal_function (cfn, callee, 
vectype_out,
Value returned is $12 = IFN_POPCOUNT

so somehow direct_internal_fn_supported_p says true for a POPCOUNTL
V1DI -> V2SI.  I'd argue the question is odd already but the
ultimative answer is cleary wrong ;)

We have

(gdb) p info
$22 = (const direct_internal_fn_info &) @0x19b8e0c: {type0 = 0, type1 = 0, 
  vectorizable = 1}

so require vectype_in as vectype_out which means we ask for V1DI -> V1DI
(and the vectorizer compensates for the demotion).  But the
mode of V1DI is actually DI (because the target doesn't support V1DI).
Thus we end up with asking for popcount with DImode which is
available.

Note that this V1DI vector type having DImode is desirable for Jons
case as far as I understand.  DImode gets used because aarch64
advertises generic vectorization support with word_mode.

Things may get tricky later when we look for VEC_PACK_TRUNC
of V1DI, V1DI -> V2SI but the vec_pack_trunc_optab advertises
DImode support via the VDN iterator (maybe expecting this only
in case no V2SI support is available).

So in the end the vectorizer works as expected ;)

What we don't seem to handle is a single-element vector typed, DImode
constructor with a single DImode element.

We call store_bit_field with fieldmode DI but a value with V2SI,
I guess things go downhill from there.  The SSA exp we store
was DImode.  Ah.

 <ssa_name 0x7ffff66c0438
    type <integer_type 0x7ffff68a07e0 long unsigned int public unsigned DI
        size <integer_cst 0x7ffff6891a98 constant 64>
        unit-size <integer_cst 0x7ffff6891ab0 constant 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type 
0x7ffff68a07e0 precision:64 min <integer_cst 0x7ffff6891d68 0> max 
<integer_cst 0x7ffff6893540 18446744073709551615>
        pointer_to_this <pointer_type 0x7ffff68b1c78>>
    visited
    def_stmt _62 = VEC_PACK_TRUNC_EXPR <_67, _64>;

I suppose _that_'s already somewhat bogus.  vector lowering creates it
for some reason:

-  vect__8.26_52 = VEC_PACK_TRUNC_EXPR <_54, _53>;
+  _67 = BIT_FIELD_REF <_54, 64, 0>;
+  _64 = BIT_FIELD_REF <_53, 64, 0>;
+  _62 = VEC_PACK_TRUNC_EXPR <_67, _64>;
+  _60 = {_62};
+  _48 = VIEW_CONVERT_EXPR<vector(2) int>(_60);
+  vect__8.26_52 = _48;

which happens because

static tree
get_compute_type (enum tree_code code, optab op, tree type)
{
  /* For very wide vectors, try using a smaller vector mode.  */
  tree compute_type = type;
  if (op
      && (!VECTOR_MODE_P (TYPE_MODE (type))
          || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing))

and/or

  if (compute_type == NULL_TREE)
    compute_type = get_compute_type (code, op, type);
  if (compute_type == type)
    return; 

note that lowering sth like VEC_PACK_TRUNC_EXPR into scalars is
pointless - at least in the way the lowering code does.

Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	(revision 251642)
+++ gcc/tree-vect-generic.c	(working copy)
@@ -1439,7 +1439,7 @@ get_compute_type (enum tree_code code, o
   /* For very wide vectors, try using a smaller vector mode.  */
   tree compute_type = type;
   if (op
-      && (!VECTOR_MODE_P (TYPE_MODE (type))
+      && (TYPE_MODE (type) == BLKmode
 	  || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing))
     {
       tree vector_compute_type
@@ -1459,7 +1459,7 @@ get_compute_type (enum tree_code code, o
   if (compute_type == type)
     {
       machine_mode compute_mode = TYPE_MODE (compute_type);
-      if (VECTOR_MODE_P (compute_mode))
+      if (compute_mode != BLKmode)
 	{
 	  if (op && optab_handler (op, compute_mode) != CODE_FOR_nothing)
 	    return compute_type;

fixes this and we pass the testcase again.  Note we vectorize
that way even with the cost model enabled so it might show us
that the cost model fails to account for some costs:

/tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: Cost model 
analysis:
  Vector inside of loop cost: 8
  Vector prologue cost: 2
  Vector epilogue cost: 0
  Scalar iteration cost: 16
  Scalar outside cost: 0
  Vector outside cost: 2
  prologue iterations: 0
  epilogue iterations: 0
  Calculated minimum iters for profitability: 1

SLP costing seems to fail to account for promotion/demotion costs
it seems accounting for that still doesn't push it over the edge.

Richard.

> Thanks,
> Andrew
> 
> 
> >
> > So I don't quite understand how we end up with this expression.
> >
> > Regards,
> > Tamar
> >
> >>
> >> Richard.
> >>
> >> > ________________________________________
> >> > From: gcc-patches-owner@gcc.gnu.org <gcc-patches-owner@gcc.gnu.org>
> >> on
> >> > behalf of Andreas Schwab <schwab@linux-m68k.org>
> >> > Sent: Saturday, September 2, 2017 10:58 PM
> >> > To: Jon Beniston
> >> > Cc: 'Richard Biener'; gcc-patches@gcc.gnu.org
> >> > Subject: Re: [RFC, vectorizer] Allow single element vector types for
> >> > vector reduction operations
> >> >
> >> > On Aug 30 2017, "Jon Beniston" <jon@beniston.com> wrote:
> >> >
> >> > > gcc/
> >> > > 2017-08-30  Jon Beniston  <jon@beniston.com>
> >> > >             Richard Biener  <rguenther@suse.de>
> >> > >
> >> > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> >> > > index cfdb72c..5ebeac2 100644
> >> > > --- a/gcc/tree-vect-patterns.c
> >> > > +++ b/gcc/tree-vect-patterns.c
> >> > > @@ -4150,7 +4150,7 @@ vect_pattern_recog_1 (vect_recog_func
> >> *recog_func,
> >> > >    loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> >> > >
> >> > >    if (VECTOR_BOOLEAN_TYPE_P (type_in)
> >> > > -      || VECTOR_MODE_P (TYPE_MODE (type_in)))
> >> > > +      || VECTOR_TYPE_P (type_in))
> >> > >      {
> >> > >        /* No need to check target support (already checked by the pattern
> >> > >           recognition function).  */ diff --git
> >> > > a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index
> >> > > 013fb1f..fc62efb 100644
> >> > > --- a/gcc/tree-vect-stmts.c
> >> > > +++ b/gcc/tree-vect-stmts.c
> >> > > @@ -9098,7 +9098,8 @@ get_vectype_for_scalar_type_and_size (tree
> >> > > scalar_type, unsigned size)
> >> > >    else
> >> > >      simd_mode = mode_for_vector (inner_mode, size / nbytes);
> >> > >    nunits = GET_MODE_SIZE (simd_mode) / nbytes;
> >> > > -  if (nunits <= 1)
> >> > > +  /* NOTE: nunits == 1 is allowed to support single element vector types.
> >> > > */
> >> > > +  if (nunits < 1)
> >> > >      return NULL_TREE;
> >> > >
> >> > >    vectype = build_vector_type (scalar_type, nunits);
> >> > >
> >> > >
> >> > >
> >> >
> >> > That breaks vect/pr68577.c on aarch64.
> >> >
> >> > during RTL pass: expand
> >> > /opt/gcc/gcc-20170902/gcc/testsuite/gcc.dg/vect/pr68577.c: In function
> >> 'slp_test':
> >> > /opt/gcc/gcc-20170902/gcc/testsuite/gcc.dg/vect/pr68577.c:20:12:
> >> > internal compiler error: in simplify_subreg, at simplify-rtx.c:6050
> >> > 0xb55983 simplify_subreg(machine_mode, rtx_def*, machine_mode,
> >> unsigned int)
> >> >         ../../gcc/simplify-rtx.c:6049
> >> > 0xb595f7 simplify_gen_subreg(machine_mode, rtx_def*, machine_mode,
> >> unsigned int)
> >> >         ../../gcc/simplify-rtx.c:6278
> >> > 0x81d277 store_bit_field_1
> >> >         ../../gcc/expmed.c:798
> >> > 0x81d55f store_bit_field(rtx_def*, unsigned long, unsigned long, unsigned
> >> long, unsigned long, machine_mode, rtx_def*, bool)
> >> >         ../../gcc/expmed.c:1133
> >> > 0x840bf7 store_field
> >> >         ../../gcc/expr.c:6950
> >> > 0x84792f store_constructor_field
> >> >         ../../gcc/expr.c:6142
> >> > 0x83edbf store_constructor
> >> >         ../../gcc/expr.c:6726
> >> > 0x840443 expand_constructor
> >> >         ../../gcc/expr.c:8027
> >> > 0x82d5bf expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> >> expand_modifier, rtx_def**, bool)
> >> >         ../../gcc/expr.c:10133
> >> > 0x82eaeb expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> >> expand_modifier, rtx_def**, bool)
> >> >         ../../gcc/expr.c:9819
> >> > 0x82dadf expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> >> expand_modifier, rtx_def**, bool)
> >> >         ../../gcc/expr.c:10942
> >> > 0x82eaeb expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> >> expand_modifier, rtx_def**, bool)
> >> >         ../../gcc/expr.c:9819
> >> > 0x83d197 expand_expr
> >> >         ../../gcc/expr.h:276
> >> > 0x83d197 expand_assignment(tree_node*, tree_node*, bool)
> >> >         ../../gcc/expr.c:4971
> >> > 0x71e2f3 expand_gimple_stmt_1
> >> >         ../../gcc/cfgexpand.c:3653
> >> > 0x71e2f3 expand_gimple_stmt
> >> >         ../../gcc/cfgexpand.c:3751
> >> > 0x721cdb expand_gimple_basic_block
> >> >         ../../gcc/cfgexpand.c:5750
> >> > 0x726b07 execute
> >> >         ../../gcc/cfgexpand.c:6357
> >> >
> >> > Andreas.
> >> >
> >> > --
> >> > Andreas Schwab, schwab@linux-m68k.org
> >> > GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276
> >> > 4ED5 "And now for something completely different."
> >> >
> >> >
> >>
> >> --
> >> Richard Biener <rguenther@suse.de>
> >> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
> >> HRB 21284 (AG Nuernberg)
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

next prev parent reply	other threads:[~2017-09-05  9:37 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-28  8:22 Jon Beniston
2017-08-28  8:29 ` Richard Biener
2017-08-28  9:04   ` Jon Beniston
2017-08-28 11:41     ` Richard Biener
2017-08-30 11:13       ` Jon Beniston
2017-08-30 12:28         ` Richard Biener
2017-08-30 15:52           ` Jon Beniston
2017-08-30 16:32             ` Richard Biener
2017-08-30 20:42             ` Richard Sandiford
2017-09-02 21:58             ` Andreas Schwab
2017-09-04  8:58               ` Tamar Christina
2017-09-04  9:19                 ` Richard Biener
2017-09-04 14:28                   ` Tamar Christina
2017-09-04 16:48                     ` Andrew Pinski
2017-09-05  9:37                       ` Richard Biener [this message]
2017-09-05 10:24                         ` Tamar Christina
2017-09-05 10:41                           ` Richard Biener
2017-09-05 12:50                             ` Richard Biener
2017-09-05 13:06                               ` Tamar Christina
2017-09-05 13:12                                 ` Richard Biener
2017-09-05 14:00                                   ` Tamar Christina
2017-09-11 15:39                                   ` Vidya Praveen
2017-09-12  7:13                                     ` Richard Biener
2017-09-12  8:45                                       ` Tamar Christina
2017-09-12  9:03                                         ` Richard Biener
2017-09-12 10:50                                           ` Richard Biener
2017-09-12 16:26                                             ` Andreas Schwab
2017-09-12 16:36                                               ` Richard Biener
2017-09-12 17:13                                                 ` Tamar Christina
2017-09-04 15:04             ` Christophe Lyon
2017-09-07 11:08 ` Bernd Schmidt
2017-09-07 11:20   ` Jon Beniston
2017-09-07 14:42     ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.20.1709051027370.14191@zhemvz.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=Tamar.Christina@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jon@beniston.com \
    --cc=nd@arm.com \
    --cc=pinskia@gmail.com \
    --cc=schwab@linux-m68k.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).