public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/113105] New: Missing optimization: fold `div(v, a) * b + rem(v, a)` to `div(v, a) * (b - a) + v`
@ 2023-12-21  8:50 xxs_chy at outlook dot com
  2023-12-21 16:06 ` [Bug tree-optimization/113105] " pinskia at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: xxs_chy at outlook dot com @ 2023-12-21  8:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113105

            Bug ID: 113105
           Summary: Missing optimization: fold `div(v, a) * b + rem(v, a)`
                    to `div(v, a) * (b - a) + v`
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: xxs_chy at outlook dot com
  Target Milestone: ---

Godbolt example: https://godbolt.org/z/b5va37Tzx

For example:

unsigned char _bin2bcd(unsigned val)
{
        return ((val / 10) << 4) + val % 10;
}

can be folded to:

unsigned char new_bin2bcd(unsigned val)
{
        return val / 10 * 6 + val;
}

This C snippet is extracted from
https://github.com/torvalds/linux/blob/master/lib/bcd.c

Both GCC and LLVM missed it.
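
The equivalence claimed above can be checked mechanically. The following is a minimal sketch, assuming nothing beyond the two function bodies quoted in the report; the names bin2bcd_orig, bin2bcd_folded and count_mismatches are hypothetical and chosen only for this illustration. Because all arithmetic is done in unsigned before the result is truncated to unsigned char, the identity (val / 10) * 16 + val % 10 == (val / 10) * 6 + val holds modulo 2^N for every input.

```c
#include <assert.h>

/* Original form from the report: tens digit shifted into the high nibble. */
static unsigned char bin2bcd_orig(unsigned val)
{
    return ((val / 10) << 4) + val % 10;
}

/* Proposed folded form: (val / 10) * (16 - 10) + val. */
static unsigned char bin2bcd_folded(unsigned val)
{
    return val / 10 * 6 + val;
}

/* Count the inputs in [0, limit] on which the two forms disagree.
   (Hypothetical helper, not part of the report.) */
static unsigned long count_mismatches(unsigned limit)
{
    unsigned long bad = 0;
    for (unsigned v = 0; ; v++) {
        if (bin2bcd_orig(v) != bin2bcd_folded(v))
            bad++;
        if (v == limit)   /* stop here so v never wraps around */
            break;
    }
    return bad;
}
```

Calling count_mismatches with any limit should return 0 if the fold is valid for unsigned operands.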

From: pinskia at gcc dot gnu.org @ 2023-12-21 16:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113105

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Severity|normal                      |enhancement

From: jakub at gcc dot gnu.org @ 2023-12-21 16:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113105

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
When it is signed v / a * b + v % a, I think the fold can introduce UB which
wasn't there originally, e.g. for v = 0, a = INT_MIN and b = 3, where b - a
overflows.  So, if it isn't done just for unsigned types, parts of it need to
be done in unsigned.
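
A small sketch of why that input is a problem, assuming a typical two's-complement int. The helper names sub_would_overflow and original_form are hypothetical. The original expression is well defined for v = 0, a = INT_MIN, b = 3, whereas the folded form would first have to compute b - a in int, which overflows. The check below detects that overflow without actually performing the overflowing subtraction.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True if the signed subtraction b - a would overflow int.
   The comparison is arranged so that no step can itself overflow. */
static bool sub_would_overflow(int b, int a)
{
    if (a < 0)
        return b > INT_MAX + a;   /* b - a would exceed INT_MAX */
    return b < INT_MIN + a;       /* b - a would fall below INT_MIN */
}

/* The original expression: 0 / INT_MIN == 0 and 0 % INT_MIN == 0,
   so nothing overflows for v = 0, a = INT_MIN, b = 3. */
static int original_form(int v, int a, int b)
{
    return v / a * b + v % a;
}
```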

From: xxs_chy at outlook dot com @ 2023-12-21 17:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113105

--- Comment #2 from XChy <xxs_chy at outlook dot com> ---
(In reply to Jakub Jelinek from comment #1)
> When it is signed v / a * b + v % a, I think it can introduce UB which
> wasn't there originally.
> E.g. for v = 0, a = INT_MIN and b = 3.  So, if it isn't done just for
> unsigned types,
> parts of it need to be done in unsigned.

Yes, this fold is valid when there is no no-overflow/no-wrap constraint. For
operations that carry such a constraint, it remains unclear to me when to fold.

For your reference, LLVM expands "v % a" to "v - (v / a) * a", and then
reassociates "(v / a) * b - (v / a) * a + v" to "(v / a) * (b - a) + v" to
solve this issue.
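
The two rewriting steps described above can be written out for unsigned operands, where wrap-around is well defined. This is only an illustration of the algebra, not LLVM's actual implementation; the function names step0, step1, step2 and steps_agree are hypothetical.

```c
#include <assert.h>

/* Step 0: the expression as written in the source. */
static unsigned step0(unsigned v, unsigned a, unsigned b)
{
    return v / a * b + v % a;
}

/* Step 1: expand v % a into v - (v / a) * a. */
static unsigned step1(unsigned v, unsigned a, unsigned b)
{
    return v / a * b + (v - v / a * a);
}

/* Step 2: reassociate and factor out v / a. */
static unsigned step2(unsigned v, unsigned a, unsigned b)
{
    return v / a * (b - a) + v;
}

/* Nonzero if all three forms agree for the given operands (a != 0). */
static int steps_agree(unsigned v, unsigned a, unsigned b)
{
    unsigned r = step0(v, a, b);
    return r == step1(v, a, b) && r == step2(v, a, b);
}
```

Note that step2 stays correct even when b < a: b - a wraps, but the wrapped product plus v still equals the original value modulo 2^N.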

From: jakub at gcc dot gnu.org @ 2023-12-21 17:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113105

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think with int v, a, b; ... v / a * b + v % a can be simplified into
(int) (v / a * ((unsigned) b - a) + v), i.e. perform just the division in
signed and everything else in corresponding unsigned type.
Also, a question is if this is a useful optimization on targets where one
instruction can compute both v / a and v % a together, because then the
original has roughly one divmod insn, one multiplication and one addition,
compared to the divmod insn from which only division is used, subtraction,
multiplication and addition.
Of course, if b - a can fold into a constant, it is different (but
multiplication by a constant is often done using shifts and additions, and
multiplication by b might be cheaper than by b - a).
When v % a needs to be computed separately, and especially when it is expensive,
the fold can obviously be a win.
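
A sketch of the signed variant suggested at the start of this comment: only the division is done in int, and everything else in unsigned. The names signed_original, signed_folded and count_signed_mismatches are hypothetical. The final conversion back to int is implementation-defined for values above INT_MAX; the sketch assumes the modulo behaviour that GCC documents.

```c
#include <assert.h>
#include <limits.h>

/* Original signed expression (callers must avoid a == 0 and INT_MIN / -1). */
static int signed_original(int v, int a, int b)
{
    return v / a * b + v % a;
}

/* Folded form: signed division, then unsigned subtract/multiply/add. */
static int signed_folded(int v, int a, int b)
{
    unsigned q = (unsigned)(v / a);   /* only the division is signed */
    /* Assumes unsigned-to-int conversion wraps modulo 2^N (as in GCC). */
    return (int)(q * ((unsigned)b - (unsigned)a) + (unsigned)v);
}

/* Compare both forms on a small grid where the original cannot overflow. */
static int count_signed_mismatches(void)
{
    int bad = 0;
    for (int v = -50; v <= 50; v++)
        for (int a = -7; a <= 7; a++) {
            if (a == 0)
                continue;             /* division by zero is undefined */
            for (int b = -9; b <= 9; b++)
                if (signed_original(v, a, b) != signed_folded(v, a, b))
                    bad++;
        }
    return bad;
}
```

With this form, the input v = 0, a = INT_MIN, b = 3 from comment #1 no longer involves a signed overflow, since the subtraction happens in unsigned.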

From the usual GIMPLE IL rules, both forms are 4 statements so equally good,
but for the case where casts are needed, the replacement is more expensive.
So, perhaps this shouldn't be done in match.pd, but during expansion or
immediately before expansion, expanding to RTL both forms and comparing the
costs.

From: jakub at gcc dot gnu.org @ 2023-12-21 22:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113105

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, e.g. on x86_64,
unsigned int
f1 (unsigned val)
{
  return val / 10 * 16 + val % 10;
}

unsigned int
f2 (unsigned val)
{
  return val / 10 * 6 + val;
}

unsigned int
f3 (unsigned val, unsigned a, unsigned b)
{
  return val / a * b + val % a;
}

unsigned int
f4 (unsigned val, unsigned a, unsigned b)
{
  return val / a * (b - a) + val;
}

unsigned int
f5 (unsigned val)
{
  return val / 93 * 127 + val % 93;
}

unsigned int
f6 (unsigned val)
{
  return val / 93 * (127 - 93) + val;
}

f2, f3 and f5 are shorter than f1, f4 and f6 at -O2.
With -Os, f3 is shorter than f4, while f1/f2 and f5/f6 are the same size (and
also the same number of insns there; perhaps f1 is better than f2, as it uses a
shift rather than imul).
So this really needs to take the machine-specific expansion etc. into account;
it isn't a clear winner all the time.
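
To illustrate the shift-versus-imul remark for f1 and f2: multiplying by 16 is a single shift, while multiplying by 6 needs a multiply or a short shift-and-add sequence. The sketch below only spells out the arithmetic. What a compiler actually emits depends on the target and its cost model, and the names mul16 and mul6 are hypothetical.

```c
#include <assert.h>

/* q * 16 is a single left shift by 4. */
static unsigned mul16(unsigned q)
{
    return q << 4;
}

/* q * 6 == q * 4 + q * 2: two shifts and one addition. */
static unsigned mul6(unsigned q)
{
    return (q << 2) + (q << 1);
}
```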

From: xxs_chy at outlook dot com @ 2023-12-23 10:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113105

--- Comment #5 from XChy <xxs_chy at outlook dot com> ---
(In reply to Jakub Jelinek from comment #4)
> So, e.g. on x86_64,
> unsigned int
> f1 (unsigned val)
> {
>   return val / 10 * 16 + val % 10;
> }
> 
> unsigned int
> f2 (unsigned val)
> {
>   return val / 10 * 6 + val;
> }
> 
> unsigned int
> f3 (unsigned val, unsigned a, unsigned b)
> {
>   return val / a * b + val % a;
> }
> 
> unsigned int
> f4 (unsigned val, unsigned a, unsigned b)
> {
>   return val / a * (b - a) + val;
> }
> 
> unsigned int
> f5 (unsigned val)
> {
>   return val / 93 * 127 + val % 93;
> }
> 
> unsigned int
> f6 (unsigned val)
> {
>   return val / 93 * (127 - 93) + val;
> }
> 
> f2, f3 and f5 are shorter than f1, f4 and f6 at -O2.
> With -Os, f3 is shorter than f4, while f1/f2 and f5/f6 are the same size
> (and also the same number of insns there; perhaps f1 is better than f2, as it
> uses a shift rather than imul).
> So this really needs to take the machine-specific expansion etc. into
> account; it isn't a clear winner all the time.

Thanks for your explanations! It's a good fold for targets where "v % a" is
expensive, but not for those where it is cheap. I'm not a GCC developer; do you
think I should report this against rtl-optimization instead?

Also, it seems that f6 is smaller than f5 at -O2 in your example:
https://godbolt.org/z/PEWKfj1je

From: pinskia at gcc dot gnu.org @ 2024-05-30  4:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113105

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 115287 has been marked as a duplicate of this bug. ***
