public inbox for gcc-patches@gcc.gnu.org
From: Tamar Christina <Tamar.Christina@arm.com>
To: Richard Sandiford <Richard.Sandiford@arm.com>
Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>, "rguenther@suse.de" <rguenther@suse.de>,
	"jlaw@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: RE: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]
Date: Fri, 10 Feb 2023 16:09:58 +0000	[thread overview]
Message-ID: <VI1PR08MB53258F0AB4D9FB5EF8B0D066FFDE9@VI1PR08MB5325.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <mpth6vt345g.fsf@arm.com>

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Friday, February 10, 2023 3:57 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; nd
> <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
> by using new optabs [PR108583]
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> > a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index
> >> > 6934aebc69f231af24668f0a1c3d140e97f55487..e39d7e6b362ef44eb2fc467f3369de2afea139d6 100644
> >> > --- a/gcc/tree-vect-patterns.cc
> >> > +++ b/gcc/tree-vect-patterns.cc
> >> > @@ -3914,12 +3914,82 @@ vect_recog_divmod_pattern (vec_info *vinfo,
> >> >        return pattern_stmt;
> >> >      }
> >> >    else if ((cst = uniform_integer_cst_p (oprnd1))
> >> > -	   && targetm.vectorize.can_special_div_by_const (rhs_code, vectype,
> >> > -							  wi::to_wide (cst),
> >> > -							  NULL, NULL_RTX,
> >> > -							  NULL_RTX))
> >> > +	   && TYPE_UNSIGNED (itype)
> >> > +	   && rhs_code == TRUNC_DIV_EXPR
> >> > +	   && vectype
> >> > +	   && direct_internal_fn_supported_p (IFN_ADDH, vectype,
> >> > +					      OPTIMIZE_FOR_SPEED))
> >> >      {
> >> > -      return NULL;
> >> > +      /* div optimizations using narrowings
> >> > +       we can do the division e.g. shorts by 255 faster by calculating it as
> >> > +       (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
> >> > +       double the precision of x.
> >> > +
> >> > +       If we imagine a short as being composed of two blocks of bytes then
> >> > +       adding 257 or 0b0000_0001_0000_0001 to the number is equivalent to
> >> > +       adding 1 to each sub component:
> >> > +
> >> > +	    short value of 16-bits
> >> > +       ┌──────────────┬────────────────┐
> >> > +       │              │                │
> >> > +       └──────────────┴────────────────┘
> >> > +	 8-bit part1 ▲  8-bit part2   ▲
> >> > +		     │                │
> >> > +		     │                │
> >> > +		    +1               +1
> >> > +
> >> > +       after the first addition, we have to shift right by 8, and narrow the
> >> > +       results back to a byte.  Remember that the addition must be done in
> >> > +       double the precision of the input.  However if we know that the addition
> >> > +       `x + 257` does not overflow then we can do the operation in the current
> >> > +       precision.  In which case we don't need the pack and unpacks.  */
> >> > +      auto wcst = wi::to_wide (cst);
> >> > +      int pow = wi::exact_log2 (wcst + 1);
> >> > +      if (pow == (int) (element_precision (vectype) / 2))
> >> > +	{
> >> > +	  wide_int min,max;
> >> > +	  /* If we're in a pattern we need to find the original definition.  */
> >> > +	  tree op0 = oprnd0;
> >> > +	  gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
> >> > +	  stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
> >> > +	  if (is_pattern_stmt_p (stmt_info))
> >> > +	    {
> >> > +	      auto orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
> >> > +	      if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
> >> > +		op0 = gimple_assign_lhs (STMT_VINFO_STMT (orig_stmt));
> >> > +	    }
> >>
> >> If this is generally safe (I'm skipping thinking about it in the
> >> interests of a quick review :-)), then I think it should be done in
> >> vect_get_range_info instead.  Using gimple_get_lhs would be more
> >> general than handling just assignments.
> >>
> >> > +
> >> > +	  /* Check that no overflow will occur.  If we don't have range
> >> > +	     information we can't perform the optimization.  */
> >> > +	  if (vect_get_range_info (op0, &min, &max))
> >> > +	    {
> >> > +	      wide_int one = wi::to_wide (build_one_cst (itype));
> >> > +	      wide_int adder = wi::add (one, wi::lshift (one, pow));
> >> > +	      wi::overflow_type ovf;
> >> > +	      /* We need adder and max in the same precision.  */
> >> > +	      wide_int zadder
> >> > +		= wide_int_storage::from (adder, wi::get_precision (max),
> >> > +					  UNSIGNED);
> >> > +	      wi::add (max, zadder, UNSIGNED, &ovf);
> >>
> >> Could you explain this a bit more?  When do we have mismatched
> >> precisions?
> >
> > C promotion rules will promote e.g.
> >
> > void fun2(uint8_t* restrict pixel, uint8_t level, int n) {
> >   for (int i = 0; i < n; i+=1)
> >     pixel[i] = (pixel[i] + level) / 0xff; }
> >
> > And have the addition be done as a 32 bit integer.  The vectorizer
> > will demote this down to a short, but range information is not stored
> > for patterns.  So in the above the range will correctly be 0x1fe but
> > the precision will be that of the original expression, so 32.  This
> > will be a mismatch with itype which is derived from the size the vectorizer
> > will perform the operation in.
> 
> Gah, missed this first time round, sorry.
> 
> Richi would know better than me, but I think it's dangerous to rely on the
> orig/pattern link for range information.  The end result of a pattern
> (vect_stmt_to_vectorize) has to have the same type as the lhs of the original
> statement.  But the other statements in the pattern sequence can do
> arbitrary things.  Their range isn't predictable from the range of the original
> statement result.
> 
> IIRC, the addition above is converted to:
> 
>   a' = (uint16_t) pixel[i]
>   b' = (uint16_t) level
>   sum' = a' + b'
>   sum = (int) sum'
> 
> where sum is the direct replacement of "pixel[i] + level", with the same type
> and range.  The division then uses sum' instead of sum.
> 
> But the fact that sum' is part of the same pattern as sum doesn't guarantee
> that sum' has the same range as sum.  E.g. the pattern statements added by
> the division optimisation wouldn't have this property.

So my assumption is that no pattern would replace a statement with something
that has a higher precision than the C statement.  The pattern above is demoted
by the vectorizer based on range information already.  My assumption was that
the precision can only ever be smaller, because otherwise the pattern would have
violated the semantics of the C code, which would be dangerous if e.g. the
expression escapes?
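
To make that concrete, here is a minimal scalar sketch of the identity the
pattern uses, once with the additions done in double the precision of x and
once entirely in the original precision (purely illustrative; the function
names are made up and this is not what the vectorizer emits):

  #include <stdint.h>

  /* x / 0xff computed as (x + ((x + 257) >> 8)) >> 8.  Doing the
     additions in 32 bits is always safe for a 16-bit x.  */
  static inline uint16_t
  div255_wide (uint16_t x)
  {
    uint32_t t = (uint32_t) x + 257;
    return (uint16_t) (((uint32_t) x + (t >> 8)) >> 8);
  }

  /* Same identity with every intermediate kept in 16 bits, i.e. in the
     element precision the vectorizer wants to operate in.  This is only
     valid when range information proves that x + 257 cannot overflow.  */
  static inline uint16_t
  div255_narrow (uint16_t x)
  {
    uint16_t t = (uint16_t) (x + 257);
    uint16_t s = (uint16_t) (x + (t >> 8));
    return (uint16_t) (s >> 8);
  }

For the pixel example the operand's range is [0, 0x1fe], so both additions
fit comfortably in 16 bits and the narrow form gives the exact quotient.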

> 
> Is it possible to tell ranger to compute the range of expressions that haven't
> been added to the IL?  (Genuine question, haven't looked.
> It seems pretty powerful though.)

I don't know either.  I guess for things it has explicit knowledge about it's OK,
so +w or *w would be fine, but with a random IFN_ it'll likely have to punt as
varying.

I guess that while theoretically possible, I don't see a case where the vectorizer
would introduce a higher precision, as this would reduce your VF.

The only place I can think of where this would be unsafe is if the division is
introduced as part of another pattern, but in that case the pattern won't have a
related statement, so we'll punt.
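
Just to sanity-check the arithmetic with a throwaway scalar model (not the
patch itself): for the example above ranger gives [0, 0x1fe] for the operand,
the adder is 1 + (1 << 8) = 0x101, and 0x1fe + 0x101 = 0x2ff, which fits in
16 bits, so the addition cannot overflow at the element precision.  A
brute-force check of the narrow form over that proven range:

  #include <assert.h>
  #include <stdint.h>

  int
  main (void)
  {
    /* [0, 0x1fe] is the range proven for pixel[i] + level in the example.  */
    for (uint32_t x = 0; x <= 0x1fe; x++)
      {
        uint16_t t = (uint16_t) (x + 257);               /* x + adder, no overflow */
        uint16_t q = (uint16_t) ((uint16_t) (x + (t >> 8)) >> 8);
        assert (q == x / 0xff);                          /* matches the real division */
      }
    return 0;
  }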

Regards,
Tamar

> 
> Thanks,
> Richard


Thread overview: 47+ messages
2023-02-09 17:16 Tamar Christina
2023-02-09 17:22 ` [PATCH 2/2]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583] Tamar Christina
2023-02-10 10:35   ` Tamar Christina
2023-02-10 14:10   ` Richard Sandiford
2023-02-10 10:34 ` [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583] Tamar Christina
2023-02-10 13:13 ` Richard Biener
2023-02-10 13:36 ` Richard Sandiford
2023-02-10 13:52   ` Richard Biener
2023-02-10 14:13   ` Tamar Christina
2023-02-10 14:30     ` Richard Sandiford
2023-02-10 14:54       ` Tamar Christina
2023-02-27 11:09       ` Tamar Christina
2023-02-27 12:11         ` Richard Sandiford
2023-02-27 12:14           ` Tamar Christina
2023-02-27 21:33             ` Richard Sandiford
2023-02-27 22:10               ` Tamar Christina
2023-02-28 11:08                 ` Richard Sandiford
2023-02-28 11:12                   ` Tamar Christina
2023-02-28 12:03                     ` Richard Sandiford
2023-03-01 11:30                       ` Richard Biener
2023-02-10 15:56     ` Richard Sandiford
2023-02-10 16:09       ` Tamar Christina [this message]
2023-02-10 16:25         ` Richard Sandiford
2023-02-10 16:33           ` Tamar Christina
2023-02-10 16:57             ` Richard Sandiford
2023-02-10 17:01               ` Richard Sandiford
2023-02-10 17:14               ` Tamar Christina
2023-02-10 18:12                 ` Richard Sandiford
2023-02-10 18:34                   ` Richard Biener
2023-02-10 20:58                     ` Andrew MacLeod
2023-02-13  9:54                       ` Tamar Christina
2023-02-15 12:51                         ` Tamar Christina
2023-02-15 16:05                           ` Andrew MacLeod
2023-02-15 17:13                             ` Tamar Christina
2023-02-15 17:50                               ` Andrew MacLeod
2023-02-15 18:42                                 ` Andrew MacLeod
2023-02-22 12:51                                   ` Tamar Christina
2023-02-22 16:41                                   ` Andrew MacLeod
2023-02-22 18:03                                     ` Tamar Christina
2023-02-22 18:33                                       ` Andrew MacLeod
2023-02-23  8:36                                         ` Tamar Christina
2023-02-23 16:39                                           ` Andrew MacLeod
2023-02-23 16:56                                             ` Tamar Christina
2023-03-01 16:57                                             ` Andrew Carlotti
2023-03-01 18:16                                               ` Tamar Christina
2023-02-22 13:06                                 ` Tamar Christina
2023-02-22 15:19                                   ` Andrew MacLeod
