From: Richard Biener <rguenther@suse.de>
To: Jakub Jelinek <jakub@redhat.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] middle-end: __builtin_mul_overflow expansion improvements [PR95862]
Date: Wed, 25 Nov 2020 14:40:14 +0000 (UTC) [thread overview]
Message-ID: <nycvar.YFH.7.77.849.2011251440041.7048@jbgna.fhfr.qr> (raw)
In-Reply-To: <20201125100511.GY3788@tucnak>
On Wed, 25 Nov 2020, Jakub Jelinek wrote:
> Hi!
>
> The following patch adds some improvements for __builtin_mul_overflow
> expansion.
> One optimization is for the u1 * u2 -> sr case, as documented we normally
> do:
> u1 * u2 -> sr
> res = (S) (u1 * u2)
> ovf = res < 0 || main_ovf (true)
> where main_ovf (true) stands for jump on unsigned multiplication overflow.
> If we know that the most significant bits of both operands are clear (such
> as when they are zero extended from something smaller), we can
> emit better coe by handling it like s1 * s2 -> sr, i.e. just jump on
> overflow after signed multiplication.
>
> Another two cases are s1 * s2 -> ur or s1 * u2 -> ur, if we know the minimum
> precision needed to encode all values of both arguments summed together
> is smaller or equal to destination precision (such as when the two arguments
> are sign (or zero) extended from half precision types, we know the overflows
> happen only iff one argument is negative and the other argument is positive
> (not zero), because even if both have maximum possible values, the maximum
> is still representable (e.g. for char * char -> unsigned short
> 0x7f * 0x7f = 0x3f01 and for char * unsigned char -> unsigned short
> 0x7f * 0xffU = 0x7e81) and as the result is unsigned, all negative results
> do overflow, but are also representable if we consider the result signed
> - all of them have the MSB set. So, it is more efficient to just
> do the normal multiplication in that case and compare the result considered
> as signed value against 0, if it is smaller, overflow happened.
>
> And the get_min_precision change is to improve the char to short handling,
> we have there in the IL
> _2 = (int) arg_1(D);
> promotion from C promotions from char or unsigned char arg, and the caller
> adds a NOP_EXPR cast to short or unsigned short. get_min_precision punts
> on the narrowing cast though, it handled only widening casts, but we can
> handle narrowing casts fine too, by recursing on the narrowing cast operands
> and using it only if it has in the end smaller minimal precision, which
> would duplicate the sign bits (or zero bits) to both the bits above the
> narrowing conversion and also at least one below that.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.
Thanks,
Richard.
> 2020-10-25 Jakub Jelinek <jakub@redhat.com>
>
> PR rtl-optimization/95862
> * internal-fn.c (get_min_precision): For narrowing conversion, recurse
> on the operand and if the operand precision is smaller than the
> current one, return that smaller precision.
> (expand_mul_overflow): For s1 * u2 -> ur and s1 * s2 -> ur cases
> if the sum of minimum precisions of both operands is smaller or equal
> to the result precision, just perform normal multiplication and
> set overflow to the sign bit of the multiplication result. For
> u1 * u2 -> sr if both arguments have the MSB known zero, use
> normal s1 * s2 -> sr expansion.
>
> * gcc.dg/builtin-artih-overflow-5.c: New test.
>
> --- gcc/internal-fn.c.jj 2020-10-13 09:25:28.444909391 +0200
> +++ gcc/internal-fn.c 2020-11-24 15:54:32.684762538 +0100
> @@ -553,6 +553,16 @@ get_min_precision (tree arg, signop sign
> if (++cnt > 30)
> return prec + (orig_sign != sign);
> }
> + if (CONVERT_EXPR_P (arg)
> + && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (arg, 0)))
> + && TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (arg, 0))) > prec)
> + {
> + /* We have e.g. (unsigned short) y_2 where int y_2 = (int) x_1(D);
> + If y_2's min precision is smaller than prec, return that. */
> + int oprec = get_min_precision (TREE_OPERAND (arg, 0), sign);
> + if (oprec < prec)
> + return oprec + (orig_sign != sign);
> + }
> if (TREE_CODE (arg) != SSA_NAME)
> return prec + (orig_sign != sign);
> wide_int arg_min, arg_max;
> @@ -1357,6 +1367,37 @@ expand_mul_overflow (location_t loc, tre
> NULL, done_label, profile_probability::very_likely ());
> goto do_error_label;
> case 3:
> + if (get_min_precision (arg1, UNSIGNED)
> + + get_min_precision (arg0, SIGNED) <= GET_MODE_PRECISION (mode))
> + {
> + /* If the first operand is sign extended from narrower type, the
> + second operand is zero extended from narrower type and
> + the sum of the two precisions is smaller or equal to the
> + result precision: if the first argument is at runtime
> + non-negative, maximum result will be 0x7e81 or 0x7f..fe80..01
> + and there will be no overflow, if the first argument is
> + negative and the second argument zero, the result will be
> + 0 and there will be no overflow, if the first argument is
> + negative and the second argument positive, the result when
> + treated as signed will be negative (minimum -0x7f80 or
> + -0x7f..f80..0) there there will be always overflow. So, do
> + res = (U) (s1 * u2)
> + ovf = (S) res < 0 */
> + struct separate_ops ops;
> + ops.code = MULT_EXPR;
> + ops.type
> + = build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
> + 1);
> + ops.op0 = make_tree (ops.type, op0);
> + ops.op1 = make_tree (ops.type, op1);
> + ops.op2 = NULL_TREE;
> + ops.location = loc;
> + res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
> + do_compare_rtx_and_jump (res, const0_rtx, GE, false,
> + mode, NULL_RTX, NULL, done_label,
> + profile_probability::very_likely ());
> + goto do_error_label;
> + }
> rtx_code_label *do_main_label;
> do_main_label = gen_label_rtx ();
> do_compare_rtx_and_jump (op0, const0_rtx, GE, false, mode, NULL_RTX,
> @@ -1374,7 +1415,16 @@ expand_mul_overflow (location_t loc, tre
> /* u1 * u2 -> sr */
> if (uns0_p && uns1_p && !unsr_p)
> {
> - uns = true;
> + if ((pos_neg0 | pos_neg1) == 1)
> + {
> + /* If both arguments are zero extended from narrower types,
> + the MSB will be clear on both and so we can pretend it is
> + a normal s1 * s2 -> sr multiplication. */
> + uns0_p = false;
> + uns1_p = false;
> + }
> + else
> + uns = true;
> /* Rest of handling of this case after res is computed. */
> goto do_main;
> }
> @@ -1455,6 +1505,37 @@ expand_mul_overflow (location_t loc, tre
> profile_probability::very_likely ());
> goto do_error_label;
> }
> + if (get_min_precision (arg0, SIGNED)
> + + get_min_precision (arg1, SIGNED) <= GET_MODE_PRECISION (mode))
> + {
> + /* If both operands are sign extended from narrower types and
> + the sum of the two precisions is smaller or equal to the
> + result precision: if both arguments are at runtime
> + non-negative, maximum result will be 0x3f01 or 0x3f..f0..01
> + and there will be no overflow, if both arguments are negative,
> + maximum result will be 0x40..00 and there will be no overflow
> + either, if one argument is positive and the other argument
> + negative, the result when treated as signed will be negative
> + and there will be always overflow, and if one argument is
> + zero and the other negative the result will be zero and no
> + overflow. So, do
> + res = (U) (s1 * s2)
> + ovf = (S) res < 0 */
> + struct separate_ops ops;
> + ops.code = MULT_EXPR;
> + ops.type
> + = build_nonstandard_integer_type (GET_MODE_PRECISION (mode),
> + 1);
> + ops.op0 = make_tree (ops.type, op0);
> + ops.op1 = make_tree (ops.type, op1);
> + ops.op2 = NULL_TREE;
> + ops.location = loc;
> + res = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
> + do_compare_rtx_and_jump (res, const0_rtx, GE, false,
> + mode, NULL_RTX, NULL, done_label,
> + profile_probability::very_likely ());
> + goto do_error_label;
> + }
> /* The general case, do all the needed comparisons at runtime. */
> rtx_code_label *do_main_label, *after_negate_label;
> rtx rop0, rop1;
> --- gcc/testsuite/gcc.dg/builtin-artih-overflow-5.c.jj 2020-11-24 16:13:49.482793354 +0100
> +++ gcc/testsuite/gcc.dg/builtin-artih-overflow-5.c 2020-11-24 16:13:44.980843796 +0100
> @@ -0,0 +1,87 @@
> +/* PR rtl-optimization/95862 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +f1 (int a, int b)
> +{
> + unsigned long long c;
> + return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +int
> +f2 (int a, unsigned b)
> +{
> + unsigned long long c;
> + return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +int
> +f3 (unsigned a, unsigned b)
> +{
> + long long c;
> + return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +int
> +f4 (int a, unsigned b)
> +{
> + long long c;
> + return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f5 (short a, short b)
> +{
> + unsigned c;
> + return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f6 (short a, unsigned short b)
> +{
> + unsigned c;
> + return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f7 (unsigned short a, unsigned short b)
> +{
> + int c;
> + return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +short
> +f8 (short a, unsigned short b)
> +{
> + int c;
> + return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f9 (signed char a, signed char b)
> +{
> + unsigned short c;
> + return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f10 (signed char a, unsigned char b)
> +{
> + unsigned short c;
> + return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f11 (unsigned char a, unsigned char b)
> +{
> + short c;
> + return __builtin_mul_overflow (a, b, &c);
> +}
> +
> +signed char
> +f12 (signed char a, unsigned char b)
> +{
> + short c;
> + return __builtin_mul_overflow (a, b, &c);
> +}
>
> Jakub
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imend
prev parent reply other threads:[~2020-11-25 14:40 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-25 10:05 Jakub Jelinek
2020-11-25 14:40 ` Richard Biener [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=nycvar.YFH.7.77.849.2011251440041.7048@jbgna.fhfr.qr \
--to=rguenther@suse.de \
--cc=gcc-patches@gcc.gnu.org \
--cc=jakub@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).