From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id ACD45385702E; Sat,  3 Oct 2020 23:18:42 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org ACD45385702E
From: "tkoenig at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/97282] division done twice for modulo and
 divsion for 128-bit integers
Date: Sat, 03 Oct 2020 23:18:42 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: tkoenig at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-97282-4-mvzZsLHGiF@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-97282-4@http.gcc.gnu.org/bugzilla/>
References: <bug-97282-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Sat, 03 Oct 2020 23:18:42 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97282
--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
And here is a version which uses two 64-bit numbers to calculate the
sum of decimal digits, as a benchmark for the division and modulo:

unsigned long digsum3 (myint x)
{
  unsigned long ret;
  __uint64_t high, low;
  /* rem_high[r] is (r * 2^64) % 10.  */
  const unsigned long int rem_high[10] = {0,6,2,8,4,0,6,2,8,4};
  /* foo_high[r] is (r * 2^64) / 10.  */
  const unsigned long int foo_high[10] =
    {0x0000000000000000, 0x1999999999999999, 0x3333333333333333,
     0x4CCCCCCCCCCCCCCC, 0x6666666666666666, 0x8000000000000000,
     0x9999999999999999, 0xB333333333333333, 0xCCCCCCCCCCCCCCCC,
     0xE666666666666666 };

  high = x >> 64;
  low = x;
  ret = 0;
  while (low > 0 || high > 0)
    {
      unsigned long r_high, r_low, r_sum, r_carry;
      r_high = high % 10;
      r_carry = rem_high[r_high];
      high = high / 10;
      r_low = low % 10;
      low = low / 10;
      /* Fold the remainder of the high half into the low half.  */
      low = low + foo_high[r_high];
      r_sum = r_low + r_carry;
      if (r_sum >= 10)
        {
          r_sum = r_sum - 10;
          low++;
        }
      ret = ret + r_sum;
    }
    }
  return ret;
}

It is _much_ faster, taking around 250 to 260 cycles per
calculation, a speedup by a factor of around 8 over the
original code.
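
For anybody who wants to check the two tables, here is a rough sketch
(not part of the benchmark) which derives each entry from a 128-bit
division and compares the split version against a plain 128-bit
loop.  The names quot_entry, rem_entry, divmod10_step, digsum_split,
digsum_ref and check_split are made up for this illustration, and
myint is assumed to be the 128-bit unsigned type.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: myint is the 128-bit unsigned type.  */
typedef unsigned __int128 myint;

/* Reference: digit sum using plain 128-bit division and modulo.  */
unsigned long digsum_ref (myint x)
{
  unsigned long ret = 0;
  while (x > 0)
    {
      ret += (unsigned long) (x % 10);
      x /= 10;
    }
  return ret;
}

/* Table entries computed instead of hard-coded:
   quot_entry (r) is (r * 2^64) / 10, rem_entry (r) is (r * 2^64) % 10.  */
uint64_t quot_entry (unsigned r)
{
  return (uint64_t) (((myint) r << 64) / 10);
}

uint64_t rem_entry (unsigned r)
{
  return (uint64_t) (((myint) r << 64) % 10);
}

/* Divide the pair high:low by 10 in place and return the last digit.  */
unsigned divmod10_step (uint64_t *high, uint64_t *low)
{
  unsigned r_high = (unsigned) (*high % 10);
  unsigned r_sum = (unsigned) (*low % 10) + (unsigned) rem_entry (r_high);
  *high /= 10;
  *low = *low / 10 + quot_entry (r_high);
  if (r_sum >= 10)
    {
      r_sum -= 10;
      (*low)++;
    }
  return r_sum;
}

/* Digit sum using only the split 64-bit steps.  */
unsigned long digsum_split (myint x)
{
  uint64_t high = (uint64_t) (x >> 64);
  uint64_t low = (uint64_t) x;
  unsigned long ret = 0;
  while (low > 0 || high > 0)
    ret += divmod10_step (&high, &low);
  return ret;
}

/* Abort if the two versions disagree for x.  */
void check_split (myint x)
{
  assert (digsum_split (x) == digsum_ref (x));
}
```

Calling check_split on a few values, for example (myint) 1 << 64 and
~(myint) 0, should be enough to catch a wrong table entry.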