From: Claudiu Zissulescu Ianculescu <claziss@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [ARC PATCH] Improved SImode shifts and rotates on !TARGET_BARREL_SHIFTER.
Date: Tue, 24 Oct 2023 10:30:48 +0300 [thread overview]
Message-ID: <CAL0iMy2H2De5_b+wMZjxD45e89q=gQdKvdfrPuoXm+cFqB0S7Q@mail.gmail.com> (raw)
In-Reply-To: <002701d9fa1a$a55dec70$f019c550$@nextmovesoftware.com>
Hi Roger,
Your patch doesn't introduce new regressions. However, before pushing
to the mainline you need to fix some issues:
1. Please fix the trailing spaces and blocks of 8 spaces which should
be replaced with tabs. You can use check_GNU_style.py script to spot
them.
2. Please use capital letters for code iterators (i.e., any_shift_rotate).
Once the above issues are fixed, please proceed with your commit.
Thank you for your contribution,
Claudiu
On Sun, Oct 8, 2023 at 10:07 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch completes the ARC back-end's transition to using pre-reload
> splitters for SImode shifts and rotates on targets without a barrel
> shifter. The core part is that the shift_si3 define_insn is no longer
> needed, as shifts and rotates that don't require a loop are split
> before reload, and then because shift_si3_loop is the only caller
> of output_shift, both can be significantly cleaned up and simplified.
> The output_shift function (Claudiu's "the elephant in the room") is
> renamed output_shift_loop, which handles just the four instruction
> zero-overhead loop implementations.
>
> Aside from the clean-ups, the user visible changes are much improved
> implementations of SImode shifts and rotates on affected targets.
>
> For the function:
> unsigned int rotr_1 (unsigned int x) { return (x >> 1) | (x << 31); }
>
> GCC with -O2 -mcpu=em would previously generate:
>
> rotr_1: lsr_s r2,r0
> bmsk_s r0,r0,0
> ror r0,r0
> j_s.d [blink]
> or_s r0,r0,r2
>
> with this patch, we now generate:
>
> j_s.d [blink]
> ror r0,r0
>
> For the function:
> unsigned int rotr_31 (unsigned int x) { return (x >> 31) | (x << 1); }
>
> GCC with -O2 -mcpu=em would previously generate:
>
> rotr_31:
> mov_s r2,r0 ;4
> asl_s r0,r0
> add.f 0,r2,r2
> rlc r2,0
> j_s.d [blink]
> or_s r0,r0,r2
>
> with this patch we now generate an add.f followed by an adc:
>
> rotr_31:
> add.f r0,r0,r0
> j_s.d [blink]
> add.cs r0,r0,1
>
>
> Shifts by constants requiring a loop have been improved for even counts
> by performing two operations in each iteration:
>
> int shl10(int x) { return x >> 10; }
>
> Previously looked like:
>
> shl10: mov.f lp_count, 10
> lpnz 2f
> asr r0,r0
> nop
> 2: # end single insn loop
> j_s [blink]
>
>
> And now becomes:
>
> shl10:
> mov lp_count,5
> lp 2f
> asr r0,r0
> asr r0,r0
> 2: # end single insn loop
> j_s [blink]
>
>
> So emulating ARC's SWAP on architectures that don't have it:
>
> unsigned int rotr_16 (unsigned int x) { return (x >> 16) | (x << 16); }
>
> previously required 10 instructions and ~70 cycles:
>
> rotr_16:
> mov_s r2,r0 ;4
> mov.f lp_count, 16
> lpnz 2f
> add r0,r0,r0
> nop
> 2: # end single insn loop
> mov.f lp_count, 16
> lpnz 2f
> lsr r2,r2
> nop
> 2: # end single insn loop
> j_s.d [blink]
> or_s r0,r0,r2
>
> now becomes just 4 instructions and ~18 cycles:
>
> rotr_16:
> mov lp_count,8
> lp 2f
> ror r0,r0
> ror r0,r0
> 2: # end single insn loop
> j_s [blink]
>
>
> This patch has been tested with a cross-compiler to arc-linux hosted
> on x86_64-pc-linux-gnu and (partially) tested with the compile-only
> portions of the testsuite with no regressions. Ok for mainline, if
> your own testing shows no issues?
>
>
> 2023-10-07 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> * config/arc/arc-protos.h (output_shift): Rename to...
> (output_shift_loop): Tweak API to take an explicit rtx_code.
> (arc_split_ashl): Prototype new function here.
> (arc_split_ashr): Likewise.
> (arc_split_lshr): Likewise.
> (arc_split_rotl): Likewise.
> (arc_split_rotr): Likewise.
> * config/arc/arc.cc (output_shift): Delete local prototype. Rename.
> (output_shift_loop): New function replacing output_shift to output
> a zero overheap loop for SImode shifts and rotates on ARC targets
> without barrel shifter (i.e. no hardware support for these insns).
> (arc_split_ashl): New helper function to split *ashlsi3_nobs.
> (arc_split_ashr): New helper function to split *ashrsi3_nobs.
> (arc_split_lshr): New helper function to split *lshrsi3_nobs.
> (arc_split_rotl): New helper function to split *rotlsi3_nobs.
> (arc_split_rotr): New helper function to split *rotrsi3_nobs.
> * config/arc/arc.md (any_shift_rotate): New define_code_iterator.
> (define_code_attr insn): New code attribute to map to pattern name.
> (<any_shift_rotate>si3): New expander unifying previous ashlsi3,
> ashrsi3 and lshrsi3 define_expands. Adds rotlsi3 and rotrsi3.
> (*<any_shift_rotate>si3_nobs): New define_insn_and_split that
> unifies the previous *ashlsi3_nobs, *ashrsi3_nobs and *lshrsi3_nobs.
> We now call arc_split_<insn> in arc.cc to implement each split.
> (shift_si3): Delete define_insn, all shifts/rotates are now split.
> (shift_si3_loop): Rename to...
> (<insn>si3_loop): define_insn to handle loop implementations of
> SImode shifts and rotates, calling ouput_shift_loop for template.
> (rotrsi3): Rename to...
> (*rotrsi3_insn): define_insn for TARGET_BARREL_SHIFTER's ror.
> (*rotlsi3): New define_insn_and_split to transform left rotates
> into right rotates before reload.
> (rotlsi3_cnt1): New define_insn_and_split to implement a left
> rotate by one bit using an add.f followed by an adc.
> * config/arc/predicates.md (shiftr4_operator): Delete.
>
>
> Thanks in advance,
> Roger
> --
>
prev parent reply other threads:[~2023-10-24 7:31 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-08 19:07 Roger Sayle
2023-10-24 7:30 ` Claudiu Zissulescu Ianculescu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAL0iMy2H2De5_b+wMZjxD45e89q=gQdKvdfrPuoXm+cFqB0S7Q@mail.gmail.com' \
--to=claziss@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=roger@nextmovesoftware.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).