From: Richard Biener <rguenther@suse.de>
To: Jakub Jelinek <jakub@redhat.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end: __builtin_mul_overflow expansion improvements [PR95862]
Date: Wed, 25 Nov 2020 14:40:14 +0000 (UTC)
Message-ID: <nycvar.YFH.7.77.849.2011251440041.7048@jbgna.fhfr.qr>
In-Reply-To: <20201125100511.GY3788@tucnak>

On Wed, 25 Nov 2020, Jakub Jelinek wrote:

> Hi!
> 
> The following patch adds some improvements for __builtin_mul_overflow
> expansion.
> One optimization is for the u1 * u2 -> sr case; as documented, we normally
> do:
>      u1 * u2 -> sr
>         res = (S) (u1 * u2)
>         ovf = res < 0 || main_ovf (true)
> where main_ovf (true) stands for jump on unsigned multiplication overflow.
> If we know that the most significant bits of both operands are clear (such
> as when they are zero extended from something smaller), we can
> emit better code by handling it like s1 * s2 -> sr, i.e. just jump on
> overflow after signed multiplication.
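> 
> For illustration (this mirrors f3 in the new testcase; the function name is
> just made up for the example), a case where both unsigned operands are zero
> extended into a wider signed result and so have their MSB known clear:
> 
>   int
>   u1_u2_sr (unsigned int a, unsigned int b)
>   {
>     long long res;
>     /* a and b are zero extended to the wider result precision, so their
>        MSB is known zero and the expansion can take the s1 * s2 -> sr
>        path with a single jump on signed multiplication overflow.  */
>     return __builtin_mul_overflow (a, b, &res);
>   }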
> 
> Another two cases are s1 * s2 -> ur and s1 * u2 -> ur.  If we know that the
> sum of the minimum precisions needed to encode all values of the two
> arguments is smaller than or equal to the destination precision (such as
> when the two arguments are sign (or zero) extended from half precision
> types), we know overflow happens iff one argument is negative and the
> other argument is positive (not zero), because even if both have their
> maximum possible values, the maximum is still representable (e.g. for
> char * char -> unsigned short 0x7f * 0x7f = 0x3f01 and for
> char * unsigned char -> unsigned short 0x7f * 0xffU = 0x7e81), and as the
> result is unsigned, all negative results do overflow, but they are also
> representable if we consider the result signed - all of them have the MSB
> set.  So it is more efficient to just do the normal multiplication in that
> case and compare the result, considered as a signed value, against 0; if
> it is smaller, overflow happened.
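> 
> For illustration (cf. f9 in the new testcase; the function name is just
> made up for the example), the s1 * s2 -> ur case with 8-bit operands and
> a 16-bit result:
> 
>   int
>   s1_s2_ur (signed char a, signed char b)
>   {
>     unsigned short res;
>     /* The minimum precisions (8 + 8) fit into the 16-bit result, so the
>        expansion can do a single multiplication and just test whether the
>        result treated as signed is negative.  */
>     return __builtin_mul_overflow (a, b, &res);
>   }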
> 
> And the get_min_precision change is there to improve the char to short
> handling.  For a char or unsigned char arg the C promotions leave in the IL
>   _2 = (int) arg_1(D);
> and the caller adds a NOP_EXPR cast to short or unsigned short.
> get_min_precision used to punt on that narrowing cast, as it handled only
> widening casts, but we can handle narrowing casts fine too, by recursing on
> the narrowing cast's operand and using its precision only if it is in the
> end smaller, which means the sign bits (or zero bits) are duplicated both
> into the bits above the narrowing conversion and into at least one bit
> below it.
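> 
> For example (cf. f9 in the new testcase, assuming the usual C integer
> promotions), for
>   signed char a, b; unsigned short c;
>   __builtin_mul_overflow (a, b, &c);
> each argument appears in the IL as _2 = (int) arg_1(D); and the caller
> casts that back to short; recursing through the narrowing cast lets
> get_min_precision see the 8-bit minimum precision of the original
> argument instead of punting at the narrowing cast.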
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2020-11-25  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR rtl-optimization/95862
> 	* internal-fn.c (get_min_precision): For narrowing conversion, recurse
> 	on the operand and if the operand precision is smaller than the
> 	current one, return that smaller precision.
> 	(expand_mul_overflow): For s1 * u2 -> ur and s1 * s2 -> ur cases
> 	if the sum of minimum precisions of both operands is smaller or equal
> 	to the result precision, just perform normal multiplication and
> 	set overflow to the sign bit of the multiplication result.  For
> 	u1 * u2 -> sr if both arguments have the MSB known zero, use
> 	normal s1 * s2 -> sr expansion.
> 
> 	* gcc.dg/builtin-artih-overflow-5.c: New test.
> 
> --- gcc/internal-fn.c.jj	2020-10-13 09:25:28.444909391 +0200
> +++ gcc/internal-fn.c	2020-11-24 15:54:32.684762538 +0100
> @@ -553,6 +553,16 @@ get_min_precision (tree arg, signop sign
>        if (++cnt > 30)
>  	return prec + (orig_sign != sign);
>      }
> +  if (CONVERT_EXPR_P (arg)
> +      && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (arg, 0)))
> +      && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) > prec)
> +    {
> +      /* We have e.g. (unsigned short) y_2 where int y_2 = (int) x_1(D);
> +	 If y_2's min precision is smaller than prec, return that.  */
> +      int oprec = get_min_precision (TREE_OPERAND (arg, 0), sign);
> +      if (oprec < prec)
> +	return oprec + (orig_sign != sign);
> +    }
>    if (TREE_CODE (arg) != SSA_NAME)
>      return prec + (orig_sign != sign);
>    wide_int arg_min, arg_max;
> @@ -1357,6 +1367,37 @@ expand_mul_overflow (location_t loc, tre
>  				   NULL, done_label, profile_probability::very_likely ());
>  	  goto do_error_label;
>  	case 3:
> +	  if (get_min_precision (arg1, UNSIGNED)
> +	      + get_min_precision (arg0, SIGNED) <= GET_MODE_PRECISION (mode))
> +	    {
> +	      /* If the first operand is sign extended from narrower type, the
> +		 second operand is zero extended from narrower type and
> +		 the sum of the two precisions is smaller or equal to the
> +		 result precision: if the first argument is at runtime
> +		 non-negative, maximum result will be 0x7e81 or 0x7f..fe80..01
> +		 and there will be no overflow, if the first argument is
> +		 negative and the second argument zero, the result will be
> +		 0 and there will be no overflow, if the first argument is
> +		 negative and the second argument positive, the result when
> +		 treated as signed will be negative (minimum -0x7f80 or
> +		 -0x7f..f80..0) and there will always be overflow.  So, do
> +		 res = (U) (s1 * u2)
> +		 ovf = (S) res < 0  */
> +	      struct separate_ops ops;
> +	      ops.code = MULT_EXPR;
> +	      ops.type
> +		= build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
> +						  1);
> +	      ops.op0 = make_tree (ops.type, op0);
> +	      ops.op1 = make_tree (ops.type, op1);
> +	      ops.op2 = NULL_TREE;
> +	      ops.location = loc;
> +	      res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
> +	      do_compare_rtx_and_jump (res, const0_rtx, GE, false,
> +				       mode, NULL_RTX, NULL, done_label,
> +				       profile_probability::very_likely ());
> +	      goto do_error_label;
> +	    }
>  	  rtx_code_label *do_main_label;
>  	  do_main_label = gen_label_rtx ();
>  	  do_compare_rtx_and_jump (op0, const0_rtx, GE, false, mode, NULL_RTX,
> @@ -1374,7 +1415,16 @@ expand_mul_overflow (location_t loc, tre
>    /* u1 * u2 -> sr  */
>    if (uns0_p && uns1_p && !unsr_p)
>      {
> -      uns = true;
> +      if ((pos_neg0 | pos_neg1) == 1)
> +	{
> +	  /* If both arguments are zero extended from narrower types,
> +	     the MSB will be clear on both and so we can pretend it is
> +	     a normal s1 * s2 -> sr multiplication.  */
> +	  uns0_p = false;
> +	  uns1_p = false;
> +	}
> +      else
> +	uns = true;
>        /* Rest of handling of this case after res is computed.  */
>        goto do_main;
>      }
> @@ -1455,6 +1505,37 @@ expand_mul_overflow (location_t loc, tre
>  				       profile_probability::very_likely ());
>  	      goto do_error_label;
>  	    }
> +	  if (get_min_precision (arg0, SIGNED)
> +	      + get_min_precision (arg1, SIGNED) <= GET_MODE_PRECISION (mode))
> +	    {
> +	      /* If both operands are sign extended from narrower types and
> +		 the sum of the two precisions is smaller or equal to the
> +		 result precision: if both arguments are at runtime
> +		 non-negative, maximum result will be 0x3f01 or 0x3f..f0..01
> +		 and there will be no overflow, if both arguments are negative,
> +		 maximum result will be 0x40..00 and there will be no overflow
> +		 either, if one argument is positive and the other argument
> +		 negative, the result when treated as signed will be negative
> +		 and there will always be overflow, and if one argument is
> +		 zero and the other negative the result will be zero and no
> +		 overflow.  So, do
> +		 res = (U) (s1 * s2)
> +		 ovf = (S) res < 0  */
> +	      struct separate_ops ops;
> +	      ops.code = MULT_EXPR;
> +	      ops.type
> +		= build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
> +						  1);
> +	      ops.op0 = make_tree (ops.type, op0);
> +	      ops.op1 = make_tree (ops.type, op1);
> +	      ops.op2 = NULL_TREE;
> +	      ops.location = loc;
> +	      res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
> +	      do_compare_rtx_and_jump (res, const0_rtx, GE, false,
> +				       mode, NULL_RTX, NULL, done_label,
> +				       profile_probability::very_likely ());
> +	      goto do_error_label;
> +	    }
>  	  /* The general case, do all the needed comparisons at runtime.  */
>  	  rtx_code_label *do_main_label, *after_negate_label;
>  	  rtx rop0, rop1;
> --- gcc/testsuite/gcc.dg/builtin-artih-overflow-5.c.jj	2020-11-24 16:13:49.482793354 +0100
> +++ gcc/testsuite/gcc.dg/builtin-artih-overflow-5.c	2020-11-24 16:13:44.980843796 +0100
> @@ -0,0 +1,87 @@
> +/* PR rtl-optimization/95862 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +f1 (int a, int b)
> +{
> +  unsigned long long c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +int
> +f2 (int a, unsigned b)
> +{
> +  unsigned long long c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +int
> +f3 (unsigned a, unsigned b)
> +{
> +  long long c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +int
> +f4 (int a, unsigned b)
> +{
> +  long long c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f5 (short a, short b)
> +{
> +  unsigned c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f6 (short a, unsigned short b)
> +{
> +  unsigned c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f7 (unsigned short a, unsigned short b)
> +{
> +  int c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f8 (short a, unsigned short b)
> +{
> +  int c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f9 (signed char a, signed char b)
> +{
> +  unsigned short c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f10 (signed char a, unsigned char b)
> +{
> +  unsigned short c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f11 (unsigned char a, unsigned char b)
> +{
> +  short c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f12 (signed char a, unsigned char b)
> +{
> +  short c;
> +  return __builtin_mul_overflow (a, b, &c);
> +}
> 
> 	Jakub
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer
