Re: [ARC PATCH] Improved SImode shifts and rotates on !TARGET_BARREL_SHIFTER.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Claudiu Zissulescu Ianculescu <claziss@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [ARC PATCH] Improved SImode shifts and rotates on !TARGET_BARREL_SHIFTER.
Date: Tue, 24 Oct 2023 10:30:48 +0300	[thread overview]
Message-ID: <CAL0iMy2H2De5_b+wMZjxD45e89q=gQdKvdfrPuoXm+cFqB0S7Q@mail.gmail.com> (raw)
In-Reply-To: <002701d9fa1a$a55dec70$f019c550$@nextmovesoftware.com>

Hi Roger,

Your patch doesn't introduce new regressions. However, before pushing
to the mainline you need to fix some issues:
1. Please fix the trailing spaces and blocks of 8 spaces which should
be replaced with tabs. You can use check_GNU_style.py script to spot
them.
2. Please use capital letters for code iterators (i.e., any_shift_rotate).

Once the above issues are fixed, please proceed with your commit.

Thank you for your contribution,
Claudiu

On Sun, Oct 8, 2023 at 10:07 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch completes the ARC back-end's transition to using pre-reload
> splitters for SImode shifts and rotates on targets without a barrel
> shifter.  The core part is that the shift_si3 define_insn is no longer
> needed, as shifts and rotates that don't require a loop are split
> before reload, and then because shift_si3_loop is the only caller
> of output_shift, both can be significantly cleaned up and simplified.
> The output_shift function (Claudiu's "the elephant in the room") is
> renamed output_shift_loop, which handles just the four instruction
> zero-overhead loop implementations.
>
> Aside from the clean-ups, the user visible changes are much improved
> implementations of SImode shifts and rotates on affected targets.
>
> For the function:
> unsigned int rotr_1 (unsigned int x) { return (x >> 1) | (x << 31); }
>
> GCC with -O2 -mcpu=em would previously generate:
>
> rotr_1: lsr_s r2,r0
>         bmsk_s r0,r0,0
>         ror     r0,r0
>         j_s.d   [blink]
>         or_s    r0,r0,r2
>
> with this patch, we now generate:
>
>         j_s.d   [blink]
>         ror     r0,r0
>
> For the function:
> unsigned int rotr_31 (unsigned int x) { return (x >> 31) | (x << 1); }
>
> GCC with -O2 -mcpu=em would previously generate:
>
> rotr_31:
>         mov_s   r2,r0   ;4
>         asl_s r0,r0
>         add.f 0,r2,r2
>         rlc r2,0
>         j_s.d   [blink]
>         or_s    r0,r0,r2
>
> with this patch we now generate an add.f followed by an adc:
>
> rotr_31:
>         add.f   r0,r0,r0
>         j_s.d   [blink]
>         add.cs  r0,r0,1
>
>
> Shifts by constants requiring a loop have been improved for even counts
> by performing two operations in each iteration:
>
> int shl10(int x) { return x >> 10; }
>
> Previously looked like:
>
> shl10:  mov.f lp_count, 10
>         lpnz    2f
>         asr r0,r0
>         nop
> 2:      # end single insn loop
>         j_s     [blink]
>
>
> And now becomes:
>
> shl10:
>         mov     lp_count,5
>         lp      2f
>         asr     r0,r0
>         asr     r0,r0
> 2:      # end single insn loop
>         j_s     [blink]
>
>
> So emulating ARC's SWAP on architectures that don't have it:
>
> unsigned int rotr_16 (unsigned int x) { return (x >> 16) | (x << 16); }
>
> previously required 10 instructions and ~70 cycles:
>
> rotr_16:
>         mov_s   r2,r0   ;4
>         mov.f lp_count, 16
>         lpnz    2f
>         add r0,r0,r0
>         nop
> 2:      # end single insn loop
>         mov.f lp_count, 16
>         lpnz    2f
>         lsr r2,r2
>         nop
> 2:      # end single insn loop
>         j_s.d   [blink]
>         or_s    r0,r0,r2
>
> now becomes just 4 instructions and ~18 cycles:
>
> rotr_16:
>         mov     lp_count,8
>         lp      2f
>         ror     r0,r0
>         ror     r0,r0
> 2:      # end single insn loop
>         j_s     [blink]
>
>
> This patch has been tested with a cross-compiler to arc-linux hosted
> on x86_64-pc-linux-gnu and (partially) tested with the compile-only
> portions of the testsuite with no regressions.  Ok for mainline, if
> your own testing shows no issues?
>
>
> 2023-10-07  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/arc/arc-protos.h (output_shift): Rename to...
>         (output_shift_loop): Tweak API to take an explicit rtx_code.
>         (arc_split_ashl): Prototype new function here.
>         (arc_split_ashr): Likewise.
>         (arc_split_lshr): Likewise.
>         (arc_split_rotl): Likewise.
>         (arc_split_rotr): Likewise.
>         * config/arc/arc.cc (output_shift): Delete local prototype.  Rename.
>         (output_shift_loop): New function replacing output_shift to output
>         a zero overheap loop for SImode shifts and rotates on ARC targets
>         without barrel shifter (i.e. no hardware support for these insns).
>         (arc_split_ashl): New helper function to split *ashlsi3_nobs.
>         (arc_split_ashr): New helper function to split *ashrsi3_nobs.
>         (arc_split_lshr): New helper function to split *lshrsi3_nobs.
>         (arc_split_rotl): New helper function to split *rotlsi3_nobs.
>         (arc_split_rotr): New helper function to split *rotrsi3_nobs.
>         * config/arc/arc.md (any_shift_rotate): New define_code_iterator.
>         (define_code_attr insn): New code attribute to map to pattern name.
>         (<any_shift_rotate>si3): New expander unifying previous ashlsi3,
>         ashrsi3 and lshrsi3 define_expands.  Adds rotlsi3 and rotrsi3.
>         (*<any_shift_rotate>si3_nobs): New define_insn_and_split that
>         unifies the previous *ashlsi3_nobs, *ashrsi3_nobs and *lshrsi3_nobs.
>         We now call arc_split_<insn> in arc.cc to implement each split.
>         (shift_si3): Delete define_insn, all shifts/rotates are now split.
>         (shift_si3_loop): Rename to...
>         (<insn>si3_loop): define_insn to handle loop implementations of
>         SImode shifts and rotates, calling ouput_shift_loop for template.
>         (rotrsi3): Rename to...
>         (*rotrsi3_insn): define_insn for TARGET_BARREL_SHIFTER's ror.
>         (*rotlsi3): New define_insn_and_split to transform left rotates
>         into right rotates before reload.
>         (rotlsi3_cnt1): New define_insn_and_split to implement a left
>         rotate by one bit using an add.f followed by an adc.
>         * config/arc/predicates.md (shiftr4_operator): Delete.
>
>
> Thanks in advance,
> Roger
> --
>

     prev parent reply	other threads:[~2023-10-24  7:31 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-08 19:07 Roger Sayle
2023-10-24  7:30 ` Claudiu Zissulescu Ianculescu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAL0iMy2H2De5_b+wMZjxD45e89q=gQdKvdfrPuoXm+cFqB0S7Q@mail.gmail.com' \
    --to=claziss@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=roger@nextmovesoftware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).