* [Bug rtl-optimization/97282] division done twice for modulo and divsion for 128-bit integers
2020-10-03 16:37 [Bug rtl-optimization/97282] New: division done twice for modulo and divsion for 128-bit integers tkoenig at gcc dot gnu.org
@ 2020-10-03 16:37 ` tkoenig at gcc dot gnu.org
2020-10-03 23:18 ` tkoenig at gcc dot gnu.org
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-10-03 16:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97282
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://gcc.gnu.org/bugzill
| |a/show_bug.cgi?id=89256
Keywords| |missed-optimization
Severity|normal |enhancement
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/97282] division done twice for modulo and divsion for 128-bit integers
2020-10-03 16:37 [Bug rtl-optimization/97282] New: division done twice for modulo and divsion for 128-bit integers tkoenig at gcc dot gnu.org
2020-10-03 16:37 ` [Bug rtl-optimization/97282] " tkoenig at gcc dot gnu.org
@ 2020-10-03 23:18 ` tkoenig at gcc dot gnu.org
2020-10-05 11:00 ` jakub at gcc dot gnu.org
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-10-03 23:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97282
--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
And here is a version which uses two 64-bit numbers for calculation of he
sum of decimal digits as a benchmark for the division and modulo:
unsigned long digsum3 (myint x)
{
unsigned long ret;
__uint64_t high, low;
const unsigned long int rem_high[10] = {0,6,2,8,4,0,6,2,8,4};
const unsigned long int foo_high[10] =
{0x0000000000000000, 0x1999999999999999, 0x3333333333333333,
0x4CCCCCCCCCCCCCCC,
0x6666666666666666, 0x8000000000000000, 0x9999999999999999,
0xB333333333333333,
0xCCCCCCCCCCCCCCCC, 0xE666666666666666 };
high = x >> 64;
low = x;
ret = 0;
while (low > 0 || high > 0)
{
unsigned long r_high, r_low, r_sum, r_carry;
r_high = high % 10;
r_carry = rem_high[r_high];
high = high / 10;
r_low = low % 10;
low = low / 10;
low = low + foo_high[r_high];
r_sum = r_low + r_carry;
if (r_sum >= 10)
{
r_sum = r_sum - 10;
low ++;
}
ret = ret + r_sum;
}
return ret;
}
It is _much_ faster, taking around 250 to 260 cycles per
calculation, a speedup of a factor of around 8 versus the
original code.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/97282] division done twice for modulo and divsion for 128-bit integers
2020-10-03 16:37 [Bug rtl-optimization/97282] New: division done twice for modulo and divsion for 128-bit integers tkoenig at gcc dot gnu.org
2020-10-03 16:37 ` [Bug rtl-optimization/97282] " tkoenig at gcc dot gnu.org
2020-10-03 23:18 ` tkoenig at gcc dot gnu.org
@ 2020-10-05 11:00 ` jakub at gcc dot gnu.org
2020-10-05 13:37 ` jakub at gcc dot gnu.org
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-05 11:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97282
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
We have the divmod discovery in tree-ssa-math-opts.c and it will handle this
case if the division/modulo is by the same variable. But for constants it
punts:
/* Disable the transform if either is a constant, since division-by-constant
may have specialized expansion. */
if (CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2))
return false;
So, perhaps we want to get it through if CONSTANT_CLASS_P (op2) (but not op1)
and
some further conditions are met, e.g. that op2 is not a power of two, and
either the type's precision is > HOST_BITS_PER_WIDE_INT (then expand_divmod
punts on it), as choose_multiplier etc. work on HOST_WIDE_INTs), or compute
choose_multiplier for the constant divisor and if pre or post shift needs to be
BITS_PER_WORD or more bits.
Or optimize into DIVMOD always even for constant op2 and when trying to expand
DIVMOD ifn with constant op2, see if the target can expand division by that
constant using just shifts and +/- and if so, emit it as that division +
subtraction, otherwise throw away the division expansion and emit a divmod.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/97282] division done twice for modulo and divsion for 128-bit integers
2020-10-03 16:37 [Bug rtl-optimization/97282] New: division done twice for modulo and divsion for 128-bit integers tkoenig at gcc dot gnu.org
` (2 preceding siblings ...)
2020-10-05 11:00 ` jakub at gcc dot gnu.org
@ 2020-10-05 13:37 ` jakub at gcc dot gnu.org
2020-10-06 8:33 ` cvs-commit at gcc dot gnu.org
2021-08-15 11:23 ` pinskia at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-05 13:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97282
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org
Status|UNCONFIRMED |ASSIGNED
Last reconfirmed| |2020-10-05
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 49307
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49307&action=edit
gcc11-pr97282.patch
Untested fix.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/97282] division done twice for modulo and divsion for 128-bit integers
2020-10-03 16:37 [Bug rtl-optimization/97282] New: division done twice for modulo and divsion for 128-bit integers tkoenig at gcc dot gnu.org
` (3 preceding siblings ...)
2020-10-05 13:37 ` jakub at gcc dot gnu.org
@ 2020-10-06 8:33 ` cvs-commit at gcc dot gnu.org
2021-08-15 11:23 ` pinskia at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-10-06 8:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97282
--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:bf510679bb3f9bfd6019666065016bb26a5b5466
commit r11-3671-gbf510679bb3f9bfd6019666065016bb26a5b5466
Author: Jakub Jelinek <jakub@redhat.com>
Date: Tue Oct 6 10:32:22 2020 +0200
divmod: Match and expand DIVMOD even in some cases of constant divisor
[PR97282]
As written in the comment, tree-ssa-math-opts.c wouldn't create a DIVMOD
ifn call for division + modulo by constant for the fear that during
expansion we could generate better code for those cases.
If the divisoris a power of two, that is certainly the case always,
but otherwise expand_divmod can punt in many cases, e.g. if the division
type's precision is above HOST_BITS_PER_WIDE_INT, we don't even call
choose_multiplier, because it works on HOST_WIDE_INTs (true, something
we should fix eventually now that we have wide_ints), or if pre/post shift
is larger than BITS_PER_WORD.
So, the following patch recognizes DIVMOD with constant last argument even
when it is unclear if expand_divmod will be able to optimize it, and then
during DIVMOD expansion if the divisor is constant attempts to expand it as
division + modulo and if they actually don't contain any libcalls or
division/modulo, they are kept as is, otherwise that sequence is thrown
away
and divmod optab or libcall is used.
2020-10-06 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/97282
* tree-ssa-math-opts.c (divmod_candidate_p): Don't return false for
constant op2 if it is not a power of two and the type has precision
larger than HOST_BITS_PER_WIDE_INT or BITS_PER_WORD.
* internal-fn.c (contains_call_div_mod): New function.
(expand_DIVMOD): If last argument is a constant, try to expand it
as
TRUNC_DIV_EXPR followed by TRUNC_MOD_EXPR, but if the sequence
contains any calls or {,U}{DIV,MOD} rtxes, throw it away and use
divmod optab or divmod libfunc.
* gcc.target/i386/pr97282.c: New test.
^ permalink raw reply [flat|nested] 7+ messages in thread
* [Bug rtl-optimization/97282] division done twice for modulo and divsion for 128-bit integers
2020-10-03 16:37 [Bug rtl-optimization/97282] New: division done twice for modulo and divsion for 128-bit integers tkoenig at gcc dot gnu.org
` (4 preceding siblings ...)
2020-10-06 8:33 ` cvs-commit at gcc dot gnu.org
@ 2021-08-15 11:23 ` pinskia at gcc dot gnu.org
5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-15 11:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97282
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
Target Milestone|--- |11.0
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
All fixed for GCC 11 by the patches for PR 97459 .
^ permalink raw reply [flat|nested] 7+ messages in thread