From: Richard Earnshaw <rearnsha@arm.com>
To: Charles Baylis <charles.baylis@linaro.org>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
Subject: Re: [PATCH, ARM, v2] Improve 64 bit division performance
Date: Wed, 11 Jun 2014 09:33:00 -0000 [thread overview]
Message-ID: <5398226C.5060905@arm.com> (raw)
In-Reply-To: <CADnVucCDOcVq7h_b_T5gCtyCHFNXFAjXU11Yf0o6m+4LDLshBw@mail.gmail.com>
On 11/06/14 10:30, Charles Baylis wrote:
> ping?
>
Sorry, can you resend this as a series of mails, with one patch per mail.
R.
> On 22 May 2014 11:08, Charles Baylis <charles.baylis@linaro.org> wrote:
>> On 1 May 2014 16:41, Richard Earnshaw <rearnsha@arm.com> wrote:
>>> I think really, you've got three independent changes here:
>>
>> Version 2 of this patch series is now a 9 patch series which addresses
>> most of the following. Exceptions discussed below.
>>
>>> 1) Optimize the prologue/epilogue sequences when ldrd is available.
>>> 2) Replace the call to __gnu_ldivmod_helper with __udivmoddi4
>>
>> I assume you mean __gnu_uldivmod_helper here, as __gnu_ldivmod_helper
>> performs signed division and can't be directly replaced with the
>> unsigned division performed by __udivmoddi4.
>>
>>> 3) Optimize the code to __aeabi_ldivmod.
>>
>> Converting to call __udivmoddi4, fixing up signedness of operands and
>> results and optimisation are all one change.
>>
>>> Ideally, therefore, this is a three patch series, but it's then missing
>>> a few bits.
>>>
>>> 4) Step 2 can also be trivially applied to bpabi-v6m.S as well, since
>>> it's a direct swap of one function for another (unless I've misread the
>>> changes, I think the ABI of the two helper functions are the same).
>>
>> For __aeabi_uldivmod this is true. For __aeabi_ldivmod this is not
>> trivial as the signedness fix-ups must be written.
>>
>>> 5) Step 4 then makes __gnu_ldivmod_helper in bpabi.c a dead function
>>> which can be deleted. This is good because currently pulling in either
>>> 64-bit division function causes both these helper functions to be pulled
>>> in and thus the whole of the 64-bit div-mod code for both signed and
>>> unsigned values. That's particularly unfortunate for ARMv6m class
>>> devices as that's potentially a lot of redundant code.
>>
>> Similarly, __gnu_uldivmod_helper not __gnu_ldivmod_helper.
>>
>> I've included two patches which do the trivial steps for the unsigned case.
>>
>>>
>>> Finally, I know this was the original code, but the complete lack of
>>> comments in this code made reviewing even the trivial parts a complete
>>> nightmare -- it took me half an hour before I remembered that
>>> __udivmoddi4 took three parameters, the third of which was on the stack:
>>> thus the messing around with sp/ip in the prologue wasn't just trivial
>>> padding but a necessary part of the function. Please could you add, at
>>> least some short comments clarifying the register disposition on input
>>> and what that prologue code is up to...
>>
>> Done.
>>
>>> Finally, how was this code tested?
>>
>> It has been built and "make check" has been run with no regressions on:
>> arm-unknown-linux-gnueabihf --with-mode=thumb --with-arch=armv7-a
>> arm-unknown-linux-gnueabihf --with-mode=arm --with-arch=armv7-a
>> arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv5te
>> arm-unknown-linux-gnueabi --with-mode=arm --with-arch=armv4t
>>
>> I have also run a simple test harness which checks the result of
>> several 64 bit division operations where gcc has been built with the
>> above configurations.
>>
>> I am not currently set up with a way to test v6M, so those parts aren't tested.
>>
>>> Anyway, some additional comments below:
>>>
>>> Don't repeat the function name for multiple tweaks to the same function;
>>> as mentioned above, if these are really separate changes they should be
>>> in separate submissions. Mixing unrelated changes just makes the
>>> reviewing step that much harder.
>>
>> Done.
>>
>>
>>>> + strd ip,lr, [sp, #-16]!
>>>
>>> Space after comma.
>>
>> Done
>>
>>> Also, since you've essentially rewritten the entire function, can you
>>> please also reformat them to follow the coding style of the rest of the
>>> file: namely "<tab>OP<tab>operands".
>>
>> Done
>>
>>>> #else
>>>> + sub sp, sp, #8
>>>> do_push {sp, lr}
>>>> #endif
>>>
>>> Please add a comment that the value at *sp is the address of the the
>>> slot for the remainder.
>>
>> Done
>>>> +#if defined(__thumb2__) && CAN_USE_LDRD
>>>> + sub ip, sp, #8
>>>> + strd ip,lr, [sp, #-16]!
>>>
>>> Space after comma.
>>
>> Done
>>
>>>> #else
>>>> + sub sp, sp, #8
>>>> do_push {sp, lr}
>>>> #endif
>>>> + cmp xxh, #0
>>>> + blt 1f
>>>> + cmp yyh, #0
>>>> + blt 2f
>>>> +
>>>> +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
>>>
>>> The CFI push should really precede your conditional tests, it relates to
>>> the do_push expression.
>>
>> Done.
>>
>>>> + bl SYM(__udivmoddi4) __PLT__
>>>> + ldr lr, [sp, #4]
>>>> +#if CAN_USE_LDRD
>>>> + ldrd r2, r3, [sp, #8]
>>>> + add sp, sp, #16
>>>> +#else
>>>> + add sp, sp, #8
>>>> + do_pop {r2, r3}
>>>> +#endif
>>>
>>> You're missing a CFI pop, which is needed when the values on the stack
>>> go out of scope.
>>
>> The existing code doesn't do this. Since there are multiple exit
>> points from the optimised function the existing cfi_* macros aren't
>> sufficient (there is no cfi_save_state/cfi_restore_state), so I have
>> included a patch which uses the gas .cfi_* directives. This may be
>> interesting on non-DWARF or non-ELF platforms, if any are still
>> supported .
>>
>>>> + RET
>>>> +1: /* xxh:xxl is negative */
>>>> + rsbs xxl, xxl, #0
>>>
>>> We're using unified syntax, so NEGS is preferable.
>>
>> Done
>>
>>>> + sbc xxh, xxh, xxh, lsl #1
>>>
>>> Worthy of a comment, Thumb2 has no RSC instruction, so use X - 2X.
>>
>> Done
>>
>>>> + cmp yyh, #0
>>>> + blt 3f
>>>> +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
>>>
>>> This CFI push looks wrong. You've already pushed things earlier. On
>>> the other hand, you should save the state before the CFI pop above, so
>>> that you can restore the state again for the next (ie this) block of code.
>>
>> Done (see above)
>>
>>>> +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10
>>>> + bl SYM(__udivmoddi4) __PLT__
>>>> + ldr lr, [sp, #4]
>>>> +#if CAN_USE_LDRD
>>>> + ldrd r2, r3, [sp, #8]
>>>> + add sp, sp, #16
>>>> +#else
>>>> + add sp, sp, #8
>>>> + do_pop {r2, r3}
>>>> +#endif
>>>> + rsbs yyl, yyl, #0
>>>> + sbc yyh, yyh, yyh, lsl #1
>>>> + RET
>>>>
>>>> #endif /* L_aeabi_ldivmod */
>>>>
>>>
>>> You use the LDRD vs do_pop sequence identically several times. To avoid
>>> a lot of ifdefs, it might be worth considering a macro for this idiom to
>>> reduce the overall amount of conditionalized code.
>>
>> Done.
>>
>>
>> The updated patch series is attached. Hopefully, patches 1 through 6
>> are now ready. Patches 7 through 9 can be dropped if necessary.
>>
>>
>>
>>
>> 0001-Whitespace.patch
>>
>> 2014-05-22 Charles Baylis <charles.baylis@linaro.org>
>>
>> * config/arm/bpabi.S (__aeabi_uldivmod): Fix whitespace.
>> (__aeabi_ldivmod): Fix whitespace.
>>
>>
>>
>> 0002-Add-comments.patch
>>
>> 2014-05-22 Charles Baylis <charles.baylis@linaro.org>
>>
>> * config/arm/bpabi.S (__aeabi_uldivmod, __aeabi_ldivmod): Add comment
>> describing register usage on function entry and exit.
>>
>>
>>
>> 0003-Optimise-__aeabi_uldivmod-stack-manipulation.patch
>>
>> 2014-05-22 Charles Baylis <charles.baylis@linaro.org>
>>
>> * config/arm/bpabi.S (__aeabi_uldivmod): Optimise stack pointer
>> manipulation.
>>
>>
>>
>> 0004-Optimise-__aeabi_uldivmod.patch
>>
>> 2014-05-22 Charles Baylis <charles.baylis@linaro.org>
>>
>> * config/arm/bpabi.S (__aeabi_uldivmod): Perform division using call
>> to __udivmoddi4.
>>
>>
>>
>> 0005-Optimise-__aeabi_ldivmod-stack-manipulation.patch
>>
>> 2014-05-22 Charles Baylis <charles.baylis@linaro.org>
>>
>> * config/arm/bpabi.S (__aeabi_ldivmod): Optimise stack manipulation.
>>
>>
>>
>> 0006-Optimise-__aeabi_ldivmod.patch
>>
>> 2014-05-22 Charles Baylis <charles.baylis@linaro.org>
>>
>> * config/arm/bpabi.S (__aeabi_ldivmod): Perform division using
>> __udivmoddi4, and fixups for negative operands.
>>
>>
>>
>> 0007-Fix-cfi-annotations.patch
>>
>> 2014-05-22 Charles Baylis <charles.baylis@linaro.org>
>>
>> * config/arm/bpabi.S (__aeabi_ldivmod, __aeabi_uldivmod,
>> push_for_divide, pop_for_divide): Use .cfi_* directives for DWARF
>> annotations. Fix DWARF information.
>>
>>
>>
>> 0008-Use-__udivmoddi4-for-v6M-aeabi_uldivmod.patch
>>
>> 2014-05-22 Charles Baylis <charles.baylis@linaro.org>
>>
>> * config/arm/bpabi-v6m.S (__aeabi_uldivmod): Perform division using
>> __udivmoddi4.
>>
>>
>>
>> 0009-Remove-__gnu_uldivmod_helper.patch
>>
>> 2014-05-22 Charles Baylis <charles.baylis@linaro.org>
>>
>> * config/arm/bpabi.c (__gnu_uldivmod_helper): Remove.
>
next prev parent reply other threads:[~2014-06-11 9:33 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-22 10:08 Charles Baylis
2014-06-11 9:30 ` Charles Baylis
2014-06-11 9:33 ` Richard Earnshaw [this message]
2014-06-11 10:20 ` [PATCH 1/9] Whitespace Charles Baylis
2014-06-11 10:20 ` [PATCH 4/9] Optimise __aeabi_uldivmod Charles Baylis
2014-06-18 13:53 ` Richard Earnshaw
2014-06-11 10:20 ` [PATCH 9/9] Remove __gnu_uldivmod_helper Charles Baylis
2014-06-18 14:09 ` Richard Earnshaw
2014-06-11 10:20 ` [PATCH 3/9] Optimise __aeabi_uldivmod (stack manipulation) Charles Baylis
2014-06-18 13:52 ` Richard Earnshaw
2014-06-11 10:20 ` [PATCH 7/9] Fix cfi annotations Charles Baylis
2014-06-18 14:04 ` Richard Earnshaw
2014-06-11 10:20 ` [PATCH 2/9] Add comments Charles Baylis
2014-06-11 12:55 ` Richard Earnshaw
2014-06-11 10:20 ` [PATCH 6/9] Optimise __aeabi_ldivmod Charles Baylis
2014-06-18 14:03 ` Richard Earnshaw
2014-06-11 10:20 ` [PATCH 5/9] Optimise __aeabi_ldivmod (stack manipulation) Charles Baylis
2014-06-18 13:53 ` Richard Earnshaw
2014-06-11 10:20 ` [PATCH 8/9] Use __udivmoddi4 for v6M aeabi_uldivmod Charles Baylis
2014-06-18 14:04 ` Richard Earnshaw
2014-06-11 12:55 ` [PATCH 1/9] Whitespace Richard Earnshaw
2014-06-18 15:55 ` Charles Baylis
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5398226C.5060905@arm.com \
--to=rearnsha@arm.com \
--cc=Ramana.Radhakrishnan@arm.com \
--cc=charles.baylis@linaro.org \
--cc=gcc-patches@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).