public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug rtl-optimization/97282] New: division done twice for modulo and divsion for 128-bit integers
@ 2020-10-03 16:37 tkoenig at gcc dot gnu.org
  2020-10-03 16:37 ` [Bug rtl-optimization/97282] " tkoenig at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-10-03 16:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97282

            Bug ID: 97282
           Summary: division done twice for modulo and divsion for 128-bit
                    integers
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tkoenig at gcc dot gnu.org
  Target Milestone: ---

Currently, gcc calls the (long and slow) division routines for 128-bit
integers twice when both the residual and the value is needed.  For
the other integer types, this is optimized.

Take this test case:

$ cat digsum.c
#include <stdio.h>
#include <x86intrin.h>

typedef __uint128_t myint;

unsigned long digsum1 (myint x)
{
  unsigned long ret;

  if (x == 0)
    return 0; 

  ret = 0;
  while (x > 0)
    {
      ret = ret + x % 10;
      x = x / 10;
    }
  return ret;
}

unsigned long digsum2 (myint x)
{
  unsigned long ret;
  myint tmp;

  if (x == 0)
    return 0; 

  ret = 0;
  while (x > 0)
    {
      tmp = x / 10;
      ret = ret + (x - tmp * 10);
      x = tmp;
    }
  return ret;
}

#define NUM 1000000
int main()
{
  myint x;
  unsigned long sum;
  long int t1, t2;
  __uint128_t from, to;

  from = 1;
  from = (from << 93) - NUM/2;
  to = from + NUM;
  sum = 0;

  t1 = __rdtsc();
  for (x=from; x<to; x++)
    sum = sum + digsum1(x);

  t2 = __rdtsc();
  printf ("digsum1:\nsum = %lu\n", sum);
  printf ("cycles per sum = %.2f\n\n", (double) (t2-t1)/NUM);

  sum = 0;
  t1 = __rdtsc();
  for (x=from; x<to; x++)
    sum = sum + digsum2(x);

  t2 = __rdtsc();
  printf ("digsum2:\nsum = %lu\n", sum);
  printf ("cycles per sum = %.2f\n", (double) (t2-t1)/NUM);

  return 0;
}

"As is", this gives on my machine

$ gcc -O3 digsum.c
$ ./a.out
digsum1:
sum = 113493792
cycles per sum = 2021.68

digsum2:
sum = 113493792
cycles per sum = 1025.47

(similar timings if a signed type is used).

This also affects Fortran I/O.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-08-15 11:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-03 16:37 [Bug rtl-optimization/97282] New: division done twice for modulo and divsion for 128-bit integers tkoenig at gcc dot gnu.org
2020-10-03 16:37 ` [Bug rtl-optimization/97282] " tkoenig at gcc dot gnu.org
2020-10-03 23:18 ` tkoenig at gcc dot gnu.org
2020-10-05 11:00 ` jakub at gcc dot gnu.org
2020-10-05 13:37 ` jakub at gcc dot gnu.org
2020-10-06  8:33 ` cvs-commit at gcc dot gnu.org
2021-08-15 11:23 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).