Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>,
	 nd <nd@arm.com>,  "rguenther\@suse.de" <rguenther@suse.de>,
	 "jlaw\@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]
Date: Fri, 10 Feb 2023 15:56:43 +0000	[thread overview]
Message-ID: <mpth6vt345g.fsf@arm.com> (raw)
In-Reply-To: <VI1PR08MB53257CBBF08FD81509AED8ACFFDE9@VI1PR08MB5325.eurprd08.prod.outlook.com> (Tamar Christina's message of "Fri, 10 Feb 2023 14:13:30 +0000")

Tamar Christina <Tamar.Christina@arm.com> writes:
>> > a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index
>> >
>> 6934aebc69f231af24668f0a1c3d140e97f55487..e39d7e6b362ef44eb2fc467f33
>> 69
>> > de2afea139d6 100644
>> > --- a/gcc/tree-vect-patterns.cc
>> > +++ b/gcc/tree-vect-patterns.cc
>> > @@ -3914,12 +3914,82 @@ vect_recog_divmod_pattern (vec_info *vinfo,
>> >        return pattern_stmt;
>> >      }
>> >    else if ((cst = uniform_integer_cst_p (oprnd1))
>> > -	   && targetm.vectorize.can_special_div_by_const (rhs_code,
>> vectype,
>> > -							  wi::to_wide (cst),
>> > -							  NULL, NULL_RTX,
>> > -							  NULL_RTX))
>> > +	   && TYPE_UNSIGNED (itype)
>> > +	   && rhs_code == TRUNC_DIV_EXPR
>> > +	   && vectype
>> > +	   && direct_internal_fn_supported_p (IFN_ADDH, vectype,
>> > +					      OPTIMIZE_FOR_SPEED))
>> >      {
>> > -      return NULL;
>> > +      /* div optimizations using narrowings
>> > +       we can do the division e.g. shorts by 255 faster by calculating it as
>> > +       (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
>> > +       double the precision of x.
>> > +
>> > +       If we imagine a short as being composed of two blocks of bytes then
>> > +       adding 257 or 0b0000_0001_0000_0001 to the number is equivalent to
>> > +       adding 1 to each sub component:
>> > +
>> > +	    short value of 16-bits
>> > +       ┌──────────────┬────────────────┐
>> > +       │              │                │
>> > +       └──────────────┴────────────────┘
>> > +	 8-bit part1 ▲  8-bit part2   ▲
>> > +		     │                │
>> > +		     │                │
>> > +		    +1               +1
>> > +
>> > +       after the first addition, we have to shift right by 8, and narrow the
>> > +       results back to a byte.  Remember that the addition must be done in
>> > +       double the precision of the input.  However if we know that the
>> addition
>> > +       `x + 257` does not overflow then we can do the operation in the
>> current
>> > +       precision.  In which case we don't need the pack and unpacks.  */
>> > +      auto wcst = wi::to_wide (cst);
>> > +      int pow = wi::exact_log2 (wcst + 1);
>> > +      if (pow == (int) (element_precision (vectype) / 2))
>> > +	{
>> > +	  wide_int min,max;
>> > +	  /* If we're in a pattern we need to find the orginal definition.  */
>> > +	  tree op0 = oprnd0;
>> > +	  gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
>> > +	  stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
>> > +	  if (is_pattern_stmt_p (stmt_info))
>> > +	    {
>> > +	      auto orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
>> > +	      if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
>> > +		op0 = gimple_assign_lhs (STMT_VINFO_STMT (orig_stmt));
>> > +	    }
>> 
>> If this is generally safe (I'm skipping thinking about it in the interests of a
>> quick review :-)), then I think it should be done in vect_get_range_info
>> instead.  Using gimple_get_lhs would be more general than handling just
>> assignments.
>> 
>> > +
>> > +	  /* Check that no overflow will occur.  If we don't have range
>> > +	     information we can't perform the optimization.  */
>> > +	  if (vect_get_range_info (op0, &min, &max))
>> > +	    {
>> > +	      wide_int one = wi::to_wide (build_one_cst (itype));
>> > +	      wide_int adder = wi::add (one, wi::lshift (one, pow));
>> > +	      wi::overflow_type ovf;
>> > +	      /* We need adder and max in the same precision.  */
>> > +	      wide_int zadder
>> > +		= wide_int_storage::from (adder, wi::get_precision (max),
>> > +					  UNSIGNED);
>> > +	      wi::add (max, zadder, UNSIGNED, &ovf);
>> 
>> Could you explain this a bit more?  When do we have mismatched
>> precisions?
>
> C promotion rules will promote e.g.
>
> void fun2(uint8_t* restrict pixel, uint8_t level, int n)
> {
>   for (int i = 0; i < n; i+=1)
>     pixel[i] = (pixel[i] + level) / 0xff;
> }
>
> And have the addition be done as a 32 bit integer.  The vectorizer will demote this down
> to a short, but range information is not stored for patterns.  So In the above the range will
> correctly be 0x1fe but the precision will be that of the original expression, so 32.  This will
> be a mismatch with itype which is derived from the size the vectorizer will perform the
> operation in.

Gah, missed this first time round, sorry.

Richi would know better than me, but I think it's dangerous to rely on
the orig/pattern link for range information.  The end result of a pattern
(vect_stmt_to_vectorize) has to have the same type as the lhs of the
original statement.  But the other statements in the pattern sequence
can do arbitrary things.  Their range isn't predictable from the range
of the original statement result.

IIRC, the addition above is converted to:

  a' = (uint16_t) pixel[i]
  b' = (uint16_t) level
  sum' = a' + b'
  sum = (int) sum'

where sum is the direct replacement of "pixel[i] + level", with the
same type and range.  The division then uses sum' instead of sum.

But the fact that sum' is part of the same pattern as sum doesn't
guarantee that sum' has the same range as sum.  E.g. the pattern
statements added by the division optimisation wouldn't have this
property.

Is it possible to tell ranger to compute the range of expressions that
haven't been added to the IL?  (Genuine question, haven't looked.
It seems pretty powerful though.)

Thanks,
Richard

next prev parent reply	other threads:[~2023-02-10 15:56 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-09 17:16 Tamar Christina
2023-02-09 17:22 ` [PATCH 2/2]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583] Tamar Christina
2023-02-10 10:35   ` Tamar Christina
2023-02-10 14:10   ` Richard Sandiford
2023-02-10 10:34 ` [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583] Tamar Christina
2023-02-10 13:13 ` Richard Biener
2023-02-10 13:36 ` Richard Sandiford
2023-02-10 13:52   ` Richard Biener
2023-02-10 14:13   ` Tamar Christina
2023-02-10 14:30     ` Richard Sandiford
2023-02-10 14:54       ` Tamar Christina
2023-02-27 11:09       ` Tamar Christina
2023-02-27 12:11         ` Richard Sandiford
2023-02-27 12:14           ` Tamar Christina
2023-02-27 21:33             ` Richard Sandiford
2023-02-27 22:10               ` Tamar Christina
2023-02-28 11:08                 ` Richard Sandiford
2023-02-28 11:12                   ` Tamar Christina
2023-02-28 12:03                     ` Richard Sandiford
2023-03-01 11:30                       ` Richard Biener
2023-02-10 15:56     ` Richard Sandiford [this message]
2023-02-10 16:09       ` Tamar Christina
2023-02-10 16:25         ` Richard Sandiford
2023-02-10 16:33           ` Tamar Christina
2023-02-10 16:57             ` Richard Sandiford
2023-02-10 17:01               ` Richard Sandiford
2023-02-10 17:14               ` Tamar Christina
2023-02-10 18:12                 ` Richard Sandiford
2023-02-10 18:34                   ` Richard Biener
2023-02-10 20:58                     ` Andrew MacLeod
2023-02-13  9:54                       ` Tamar Christina
2023-02-15 12:51                         ` Tamar Christina
2023-02-15 16:05                           ` Andrew MacLeod
2023-02-15 17:13                             ` Tamar Christina
2023-02-15 17:50                               ` Andrew MacLeod
2023-02-15 18:42                                 ` Andrew MacLeod
2023-02-22 12:51                                   ` Tamar Christina
2023-02-22 16:41                                   ` Andrew MacLeod
2023-02-22 18:03                                     ` Tamar Christina
2023-02-22 18:33                                       ` Andrew MacLeod
2023-02-23  8:36                                         ` Tamar Christina
2023-02-23 16:39                                           ` Andrew MacLeod
2023-02-23 16:56                                             ` Tamar Christina
2023-03-01 16:57                                             ` Andrew Carlotti
2023-03-01 18:16                                               ` Tamar Christina
2023-02-22 13:06                                 ` Tamar Christina
2023-02-22 15:19                                   ` Andrew MacLeod

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mpth6vt345g.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=Tamar.Christina@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jlaw@ventanamicro.com \
    --cc=nd@arm.com \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).