[Bug tree-optimization/64308] New: Missed optimization: 64-bit divide used when 32-bit divide would work

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug tree-optimization/64308] New: Missed optimization: 64-bit divide used when 32-bit divide would work
@ 2014-12-14 18:39 tavianator at gmail dot com
  2014-12-15 10:34 ` [Bug tree-optimization/64308] " glisse at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: tavianator at gmail dot com @ 2014-12-14 18:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64308

            Bug ID: 64308
           Summary: Missed optimization: 64-bit divide used when 32-bit
                    divide would work
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tavianator at gmail dot com

Created attachment 34280
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34280&action=edit
Test case

The following is a fairly typical implementation of exponentiation modulo m:

$ cat ipowm.c
// Computes (b**e) % m
unsigned int
ipowm(unsigned int b, unsigned int e, unsigned int m)
{
  unsigned int ret;
  b %= m;
  for (ret = m > 1; e; e >>= 1) {
    if ((e & 1) == 1) {
      ret = (unsigned long long)ret * b % m;
    }
    b = (unsigned long long)b * b % m;
  }
  return ret;
}

Unfortunately, GCC emits a 64-bit multiply and divide for both "... * b % m"
expressions on x86 and x86-64, where a 32-bit multiply and divide would be
equivalent and faster.

$ gcc -std=c11 -O3 -Wall -S -save-temps ipowm.c
$ cat ipowm.s
...
    imulq    %rdi, %rax
    divq    %rcx
...
    imulq    %rdi, %rax
    divq    %rcx
...

The pattern

    mull    %edi
    divl    %ecx

would be much faster.  They're equivalent because b is always reduced mod m, so
b < m and therefore (for any unsigned int x), x * b / m <= x * m / m == x, thus
the quotient will always fit in 32 bits.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug tree-optimization/64308] Missed optimization: 64-bit divide used when 32-bit divide would work
  2014-12-14 18:39 [Bug tree-optimization/64308] New: Missed optimization: 64-bit divide used when 32-bit divide would work tavianator at gmail dot com
@ 2014-12-15 10:34 ` glisse at gcc dot gnu.org
  2014-12-20  1:29 ` olegendo at gcc dot gnu.org
  2022-01-28  0:52 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: glisse at gcc dot gnu.org @ 2014-12-15 10:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64308

--- Comment #2 from Marc Glisse <glisse at gcc dot gnu.org> ---
You would need symbolic ranges, b and ret are in [0,m-1]. And then you are
using that very specific x86 instruction that divides 64 bits by 32 bits but
only works if the result fits in 32 bits. It works here because (m-1)*(m-1)/m<m
is small enough, but that's very hard for the compiler to prove, and I don't
know of another architecture with a similar instruction.

(Related: PR58897, PR53100)


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug tree-optimization/64308] Missed optimization: 64-bit divide used when 32-bit divide would work
  2014-12-14 18:39 [Bug tree-optimization/64308] New: Missed optimization: 64-bit divide used when 32-bit divide would work tavianator at gmail dot com
  2014-12-15 10:34 ` [Bug tree-optimization/64308] " glisse at gcc dot gnu.org
@ 2014-12-20  1:29 ` olegendo at gcc dot gnu.org
  2022-01-28  0:52 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: olegendo at gcc dot gnu.org @ 2014-12-20  1:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64308

Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |olegendo at gcc dot gnu.org

--- Comment #4 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Marc Glisse from comment #2)
> You would need symbolic ranges, b and ret are in [0,m-1]. And then you are
> using that very specific x86 instruction that divides 64 bits by 32 bits but
> only works if the result fits in 32 bits. It works here because
> (m-1)*(m-1)/m<m is small enough, but that's very hard for the compiler to
> prove, and I don't know of another architecture with a similar instruction.

On SH a 64/32 -> 32 bit unsigned division can be done by repeating rotcl,div1
32 times (once for each result bit).  It's one of the examples for the 1-step
division insn 'div1' in the manuals.  Effectively it's the same as the x86 divl
insn.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug tree-optimization/64308] Missed optimization: 64-bit divide used when 32-bit divide would work
  2014-12-14 18:39 [Bug tree-optimization/64308] New: Missed optimization: 64-bit divide used when 32-bit divide would work tavianator at gmail dot com
  2014-12-15 10:34 ` [Bug tree-optimization/64308] " glisse at gcc dot gnu.org
  2014-12-20  1:29 ` olegendo at gcc dot gnu.org
@ 2022-01-28  0:52 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-01-28  0:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64308

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-01-28  0:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-14 18:39 [Bug tree-optimization/64308] New: Missed optimization: 64-bit divide used when 32-bit divide would work tavianator at gmail dot com
2014-12-15 10:34 ` [Bug tree-optimization/64308] " glisse at gcc dot gnu.org
2014-12-20  1:29 ` olegendo at gcc dot gnu.org
2022-01-28  0:52 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).