From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-369970-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 9891 invoked by alias); 11 Jun 2014 09:30:25 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 9878 invoked by uid 89); 11 Jun 2014 09:30:24 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-la0-f50.google.com
Received: from mail-la0-f50.google.com (HELO mail-la0-f50.google.com) (209.85.215.50) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Wed, 11 Jun 2014 09:30:22 +0000
Received: by mail-la0-f50.google.com with SMTP id b8so4607435lan.23        for <gcc-patches@gcc.gnu.org>; Wed, 11 Jun 2014 02:30:18 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20130820;        h=x-gm-message-state:mime-version:in-reply-to:references:date         :message-id:subject:from:to:cc:content-type;        bh=a5p20yUk/4GSWTWPcpN6umFSklNCJC8dRjw6nIVxrZY=;        b=N8Hcv2uQOjeyFG5BqvVMLKaWF7RFYPmFJYXxrMHJGDDwgzP58B4tHABrQbgPJnC1FR         ++Ge6CcWi8Gj2GDtUO9DeQDQ4BF4fI5196fpTiHkd2tX3uToQ+vTsquxJWlFd9RY3maG         S17Iu2RaTrI38fQQ/rXr2ZxQH3932bGwiL8SowcaC+p6/rXc/upXnGD8sdLYGbJLTgCC         YnzlKigJx1pgE4Wq7BAm9YmVECHPmrxSqbs5OpH8tIMU++/JtkMU8N3FpltWRBWvcBcD         Id+6gTB/vXRD8SDhIqf4RLtG5ZJWbRQSusVn5PIuoNGfWlKdc7ZRF5FtDJD0LvdGHdtn         UwMg==
X-Gm-Message-State: ALoCoQmTtT05kSTh3d7UvcLpOdJIj9J7XkdJWH9bLvLoQok5yz696zu/NH+Yh+6Mghly5xhSlTFs
MIME-Version: 1.0
X-Received: by 10.152.203.236 with SMTP id kt12mr25676465lac.8.1402479018739; Wed, 11 Jun 2014 02:30:18 -0700 (PDT)
Received: by 10.112.142.227 with HTTP; Wed, 11 Jun 2014 02:30:18 -0700 (PDT)
In-Reply-To: <CADnVucB50UQqXTqn+ODx0QQ-5rFY15tM_iSart4nWXbbrW2zGQ@mail.gmail.com>
References: <CADnVucB50UQqXTqn+ODx0QQ-5rFY15tM_iSart4nWXbbrW2zGQ@mail.gmail.com>
Date: Wed, 11 Jun 2014 09:30:00 -0000
Message-ID: <CADnVucCDOcVq7h_b_T5gCtyCHFNXFAjXU11Yf0o6m+4LDLshBw@mail.gmail.com>
Subject: Re: [PATCH, ARM, v2] Improve 64 bit division performance
From: Charles Baylis <charles.baylis@linaro.org>
To: Richard Earnshaw <rearnsha@arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>, 	Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2014-06/txt/msg00890.txt.bz2

ping?

On 22 May 2014 11:08, Charles Baylis <charles.baylis@linaro.org> wrote:
> On 1 May 2014 16:41, Richard Earnshaw <rearnsha@arm.com> wrote:
>> I think really, you've got three independent changes here:
>
> Version 2 of this patch series is now a 9 patch series which addresses
> most of the following. Exceptions discussed below.
>
>> 1) Optimize the prologue/epilogue sequences when ldrd is available.
>> 2) Replace the call to __gnu_ldivmod_helper with __udivmoddi4
>
> I assume you mean __gnu_uldivmod_helper here, as __gnu_ldivmod_helper
> performs signed division and can't be directly replaced with the
> unsigned division performed by __udivmoddi4.
>
>> 3) Optimize the code to __aeabi_ldivmod.
>
> Converting to call __udivmoddi4, fixing up signedness of operands and
> results and optimisation are all one change.
>
>> Ideally, therefore, this is a three patch series, but it's then missing
>> a few bits.
>>
>> 4) Step 2 can also be trivially applied to bpabi-v6m.S as well, since
>> it's a direct swap of one function for another (unless I've misread the
>> changes, I think the ABI of the two helper functions are the same).
>
> For __aeabi_uldivmod this is true. For __aeabi_ldivmod this is not
> trivial as the signedness fix-ups must be written.
>
>> 5) Step 4 then makes __gnu_ldivmod_helper in bpabi.c a dead function
>> which can be deleted.  This is good because currently pulling in either
>> 64-bit division function causes both these helper functions to be pulled
>> in and thus the whole of the 64-bit div-mod code for both signed and
>> unsigned values.   That's particularly unfortunate for ARMv6m class
>> devices as that's potentially a lot of redundant code.
>
> Similarly, __gnu_uldivmod_helper not __gnu_ldivmod_helper.
>
> I've included two patches which do the trivial steps for the unsigned case.
>
>>
>> Finally, I know this was the original code, but the complete lack of
>> comments in this code made reviewing even the trivial parts a complete
>> nightmare -- it took me half an hour before I remembered that
>> __udivmoddi4 took three parameters, the third of which was on the stack:
>> thus the messing around with sp/ip in the prologue wasn't just trivial
>> padding but a necessary part of the function.  Please could you add, at
>> least some short comments clarifying the register disposition on input
>> and what that prologue code is up to...
>
> Done.
>
>> Finally, how was this code tested?
>
> It has been built and "make check" has been run with no regressions on:
> arm-unknown-linux-gnueabihf --with-mode=thumb --with-arch=armv7-a
> arm-unknown-linux-gnueabihf --with-mode=arm --with-arch=armv7-a
> arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv5te
> arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv4t
>
> I have also run a simple test harness which checks the result of
> several 64 bit division operations where gcc has been built with the
> above configurations.
>
> I am not currently set up with a way to test v6M, so those parts aren't tested.
>
>> Anyway, some additional comments below:
>>
>> Don't repeat the function name for multiple tweaks to the same function;
>> as mentioned above, if these are really separate changes they should be
>> in separate submissions.  Mixing unrelated changes just makes the
>> reviewing step that much harder.
>
> Done.
>
>
>>> +     strd ip,lr, [sp, #-16]!
>>
>> Space after comma.
>
> Done
>
>> Also, since you've essentially rewritten the entire function, can you
>> please also reformat them to follow the coding style of the rest of the
>> file: namely "<tab>OP<tab>operands".
>
> Done
>
>>>  #else
>>> +     sub sp, sp, #8
>>>       do_push {sp, lr}
>>>  #endif
>>
>> Please add a comment that the value at *sp is the address of the the
>> slot for the remainder.
>
> Done
>>> +#if defined(__thumb2__) && CAN_USE_LDRD
>>> +     sub ip, sp, #8
>>> +     strd ip,lr, [sp, #-16]!
>>
>> Space after comma.
>
> Done
>
>>>  #else
>>> +     sub sp, sp, #8
>>>       do_push {sp, lr}
>>>  #endif
>>> +     cmp xxh, #0
>>> +     blt 1f
>>> +     cmp yyh, #0
>>> +     blt 2f
>>> +
>>> +98:  cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
>>
>> The CFI push should really precede your conditional tests, it relates to
>> the do_push expression.
>
> Done.
>
>>> +     bl SYM(__udivmoddi4) __PLT__
>>> +     ldr lr, [sp, #4]
>>> +#if CAN_USE_LDRD
>>> +     ldrd r2, r3, [sp, #8]
>>> +     add sp, sp, #16
>>> +#else
>>> +     add sp, sp, #8
>>> +     do_pop {r2, r3}
>>> +#endif
>>
>> You're missing a CFI pop, which is needed when the values on the stack
>> go out of scope.
>
> The existing code doesn't do this. Since there are multiple exit
> points from the optimised function the existing cfi_* macros aren't
> sufficient (there is no cfi_save_state/cfi_restore_state), so I have
> included a patch which uses the gas .cfi_* directives. This may be
> interesting on non-DWARF or non-ELF platforms, if any are still
> supported .
>
>>> +     RET
>>> +1: /* xxh:xxl is negative */
>>> +     rsbs xxl, xxl, #0
>>
>> We're using unified syntax, so NEGS is preferable.
>
> Done
>
>>> +     sbc xxh, xxh, xxh, lsl #1
>>
>> Worthy of a comment, Thumb2 has no RSC instruction, so use X - 2X.
>
> Done
>
>>> +     cmp yyh, #0
>>> +     blt 3f
>>> +98:  cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
>>
>> This CFI push looks wrong.  You've already pushed things earlier.  On
>> the other hand, you should save the state before the CFI pop above, so
>> that you can restore the state again for the next (ie this) block of code.
>
> Done (see above)
>
>>> +98:  cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
>>> +     bl SYM(__udivmoddi4) __PLT__
>>> +     ldr lr, [sp, #4]
>>> +#if CAN_USE_LDRD
>>> +     ldrd r2, r3, [sp, #8]
>>> +     add sp, sp, #16
>>> +#else
>>> +     add sp, sp, #8
>>> +     do_pop {r2, r3}
>>> +#endif
>>> +     rsbs yyl, yyl, #0
>>> +     sbc yyh, yyh, yyh, lsl #1
>>> +     RET
>>>
>>>  #endif /* L_aeabi_ldivmod */
>>>
>>
>> You use the LDRD vs do_pop sequence identically several times.  To avoid
>> a lot of ifdefs, it might be worth considering a macro for this idiom to
>> reduce the overall amount of conditionalized code.
>
> Done.
>
>
> The updated patch series is attached. Hopefully, patches 1 through 6
> are now ready. Patches 7 through 9 can be dropped if necessary.
>
>
>
>
> 0001-Whitespace.patch
>
> 2014-05-22  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/bpabi.S (__aeabi_uldivmod): Fix whitespace.
>         (__aeabi_ldivmod): Fix whitespace.
>
>
>
> 0002-Add-comments.patch
>
> 2014-05-22  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/bpabi.S (__aeabi_uldivmod, __aeabi_ldivmod): Add comment
>         describing register usage on function entry and exit.
>
>
>
> 0003-Optimise-__aeabi_uldivmod-stack-manipulation.patch
>
> 2014-05-22  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/bpabi.S (__aeabi_uldivmod): Optimise stack pointer
>         manipulation.
>
>
>
> 0004-Optimise-__aeabi_uldivmod.patch
>
> 2014-05-22  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/bpabi.S (__aeabi_uldivmod): Perform division using call
>         to __udivmoddi4.
>
>
>
> 0005-Optimise-__aeabi_ldivmod-stack-manipulation.patch
>
> 2014-05-22  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/bpabi.S (__aeabi_ldivmod): Optimise stack manipulation.
>
>
>
> 0006-Optimise-__aeabi_ldivmod.patch
>
> 2014-05-22  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/bpabi.S (__aeabi_ldivmod): Perform division using
>         __udivmoddi4, and fixups for negative operands.
>
>
>
> 0007-Fix-cfi-annotations.patch
>
> 2014-05-22  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/bpabi.S (__aeabi_ldivmod, __aeabi_uldivmod,
>         push_for_divide, pop_for_divide): Use .cfi_* directives for DWARF
>         annotations. Fix DWARF information.
>
>
>
> 0008-Use-__udivmoddi4-for-v6M-aeabi_uldivmod.patch
>
> 2014-05-22  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/bpabi-v6m.S (__aeabi_uldivmod): Perform division using
>         __udivmoddi4.
>
>
>
> 0009-Remove-__gnu_uldivmod_helper.patch
>
> 2014-05-22  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/bpabi.c (__gnu_uldivmod_helper): Remove.