From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 38746 invoked by alias); 10 Jul 2018 11:09:15 -0000 Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-owner@gcc.gnu.org Received: (qmail 38725 invoked by uid 89); 10 Jul 2018 11:09:14 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-1.6 required=5.0 tests=BAYES_00,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=no version=3.3.2 spammy=average, warren, Warren, acknowledge X-HELO: mail-lf0-f51.google.com Received: from mail-lf0-f51.google.com (HELO mail-lf0-f51.google.com) (209.85.215.51) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 10 Jul 2018 11:09:10 +0000 Received: by mail-lf0-f51.google.com with SMTP id f18-v6so1827958lfc.2 for ; Tue, 10 Jul 2018 04:09:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=vXPVNsL15v6TQcxaMGTm1VJiPA66rnHlpogKQroxJKQ=; b=Z8oLoFFjHygA7UYvVjdh8NErl9vBWBVwrP4KnS/3FIT6ahOvNOXiCoycq9LQFVCQjz aZUi97q9bA1+8SPGkI3tJJvqBdM5EfNMjz+57YnPuuxdne+1/g/wPBtUDqN3yfTKea1C ekX4itlH9pKELYp1qCCRZGDohn8YI0zM39nOkUy6HgGt3+F2paJqAgwomn3ypbFz/7bf rJTlgy6jzOxscIhdau7D8NQFa2R+m1ZlOyS2BpVNq/uSDE6HD0JPTmpKbr/q699uFd9V M8n9bYv2/blRWLHHk9fMMYdZJjknGXofeQNhd8L4xl79Ki+uE1xWLQA9OqRyqJX+XkdN d3Rg== MIME-Version: 1.0 Received: by 2002:a19:5510:0:0:0:0:0 with HTTP; Tue, 10 Jul 2018 04:09:06 -0700 (PDT) In-Reply-To: References: From: "colinb2 ." Date: Tue, 10 Jul 2018 11:09:00 -0000 Message-ID: Subject: Making GNU GCC choose_multiplier in expmed.c significantly faster To: gcc@gcc.gnu.org Content-Type: multipart/mixed; boundary="000000000000a25d5f0570a32815" X-SW-Source: 2018-07/txt/msg00159.txt.bz2 --000000000000a25d5f0570a32815 Content-Type: text/plain; charset="UTF-8" Content-length: 5909 Feel free to copy this email and attachment to anyone who might be interested. I'm very happy to answer any questions anyone has. The program can be compiled and run like this on Linux with GNU GCC: gcc -O2 -o expmed2.exe expmed2.c ./expmed2.exe This email deals with making part of the GNU GCC compiler - integer division by a constant divisor - faster. (The calculation of the parameters for the machine code will be faster; compiled programs won't run faster.) Further down I mention inequality (1) which can be used to make the LLVM compiler somewhat faster, because that currently uses code based on (2). I don't know what - if anything - the Java JVM uses for this, or how other compilers do this, but these ideas may be useful there. By significantly faster I mean I have benchmarked alternative versions of choose_multiplier which on my low specification netbook can take maybe less than half the time the current version takes. Time saved in compilation is much less important than time saved in running compiled programs, but the code for the faster versions is about the same length as the code for the current version, and is only a bit more complicated, so is worth considering? A short summary of the following is that choose_multiplier currently uses an elegant algorithm due to Granlund & Montgomery, but which as implemented seems rather slow. We can make it faster while retaining the basic structure, and using a different, mostly equivalent, algorithm, may be a bit faster than that. Licencing: in Fractint people's words "Don't want money, want recognition!" The version choose_multiplier_v2 is based - but improves - on what's in the GCC choose_multiplier function in file expmed.c, so the GCC licence. The version choose_multiplier_v4 is based - but improves - on magicu2 in "Hacker's Delight", so the licence is you needn't acknowledge the source but it would be nice to credit the code as from magicu2 in "Hacker's Delight" by Henry S Warren http://hackersdelight.org with improvements by Colin Bartlett . This latter also applies to choose_multiplier_power2_divrem because that is also an obvious (?) idea from magicu2 in "Hacker's Delight". */ The idea is using "wide int" seems slow compared to "unsigned HOST_WIDE_INT", so it makes sense to avoid using "wide int" as much as possible. We can easily rewrite choose_multiplier to only use "wide int" to calculate the initial mlow; this is choose_multiplier_v2. An alternative for choose_multiplier_v2 completely avoids using "wide int" by iterating upwards for the initial mlow, but if that benchmarks as faster than using "wide int" even once (I suspect it might) then just iterating upwards may even be a bit faster; this is choose_multiplier_v4. The attachment is self-contained, and I have checked that the values produced agree with a "Hacker's Delight" table of M/2^s for small d and n=precision=32. What follows is a short summary of the theory, applying it to choose_multiplier. Problem: find M/2^s so that for 0<=i<=iMax>=d we have floor(i/d)=floor(i*M/2^s). Let qc=floor((iMax+1)/d); nc=qc*d-1; lgup=ceiling(log2(d)). Given s let M=floor(2^s/d)+1; delta=M*d-2^s. For GCC choose_multiplier: * equivalent necessary & sufficient conditions: (1) 0