Re: [RFC v2] RISCV: Combine Pass Clobber Ops

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Kito Cheng <kito.cheng@gmail.com>
To: "Patrick O'Neill" <patrick@rivosinc.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
	Palmer Dabbelt <palmer@dabbelt.com>, Jim Wilson <jimw@sifive.com>
Subject: Re: [RFC v2] RISCV: Combine Pass Clobber Ops
Date: Fri, 11 Mar 2022 16:07:27 +0800	[thread overview]
Message-ID: <CA+yXCZB-W+P7t1r1y9DGTJDCNO=Ny2BuPzhg19u9KJ=LUydXdA@mail.gmail.com> (raw)
In-Reply-To: <20220310175455.3178485-1-patrick@rivosinc.com>

Hi Patrick:

There is few direction in my mind:

1. Model the C extension right in riscv.md
2. Write peephole2 pattern.
3. Implement a RISC-V specific register renaming pass.

1. Model the C extension right in riscv.md

Currently we rely the GNU as to compress the instruction to C
extension, and actually we can model that in our md file,

Using addsi3 as example (I didn't handle the rv64 case carefully since
it's not matter to demo the idea):

(define_insn "addsi3"
 [(set (match_operand:SI          0 "register_operand" "=r,r,l,r")
       (plus:SI (match_operand:SI 1 "register_operand" " r,0,sp,r")
                (match_operand:SI 2 "arith_operand"    " r,r,sp_imm,I")))]
 ""
"@
 add %0, %1, %2
 c.add %0, %2
 c.addi4spn %0, %2
 addi %0, %1, %2
"
 [(set_attr "type" "arith")
  (set_attr "mode" "SI")])

That could give GCC more knowledge about the C-extension, I've done
that before, but not upstream, and I guess I lost those changes...
The reason why I didn't upstream before is because there seems to be
no code size reduction by this work.
I guess we should have more tweaks to make it more useful, but I never
found enough time to do that.

2. Write peephole2/split pattern.

OK, that's pretty straightforward, just write down what you did in the
combine pass (this patch), and let peephole pass to do that.


(define_peephole2
 [(set (match_operand:SI 0 "register_operand" "")
      (match_operator:SI 1 "any_operator_has_compressed_form"
                          [(match_operand:SI 2 "register_operand" "")
                           (match_operand:SI 3 "register_operand" "")]))
  (set (match_operand:SI 4 "register_operand" "")
       (match_operator:SI 5 "any_binary_operator"
                           [(match_dup 0)
                            (match_operand:SI 6 "arith_operand" "")]))
 "TARGET_RVC && peep2_reg_dead_p (1, operands[2])
  && /* More check*/"
 [(set (match_dup 2) (match_op_dup 1 [(match_dup 2) (match_dup 3)]))
  (set (match_dup 4) (match_op_dup 5 [(match_dup 2) (match_dup 6)]))]
)

3. Implement a RISC-V specific register renaming pass.

Last approach I've done in my previous job, but never contribute back
(and I've no chance to access that now),
It has some code size reduction, but I don't remember the accurate
number of the results,
This approach can give you having better global view of whole register usage,
So I believe this approach can have better results than other approaches.

You can find lots of useful util functions in regrename.cc that could
be used to build a RISC-V specific register renaming pass.

On Fri, Mar 11, 2022 at 1:56 AM Patrick O'Neill <patrick@rivosinc.com> wrote:
>
> RISC-V's C-extension describes 2-byte instructions with special
> constraints. One of those constraints is that one of the sources/dest
> registers are equal (op will clobber one of it's operands). This patch
> adds support for combining simple sequences:
>
> r1 = r2 + r3 (4 bytes)
> r2 DEAD
> r4 = r1 + r5 (4 bytes)
> r1 DEAD
>
> Combine pass now generates:
>
> r2 = r2 + r3 (2 bytes)
> r4 = r2 + r5 (4 bytes)
> r2 DEAD
>
> This change results in a ~150 Byte decrease in the linux kernel's
> compiled size (text: 5327254 Bytes -> 5327102 Bytes).
>
> I added this enforcement during the combine pass since it looks at the
> cost of certian expressions and can rely on the target to tell the
> pass that clobber-ops are cheaper than regular ops.
>
> The main thing holding this RFC back is the combine pass's behavior for
> sequences like this:
> b = a << 5;
> c = b + 2;
>
> Normally the combine pass modifies the RTL to be:
> c = (a << 5) + 2
> before expanding it back to the original statement.
>
> With my changes, the RTL is prevented from being combined like that and
> instead results in RTL like this:
> c = 2
> which is clearly wrong.
>
> I think that the next step would be to figure out where this
> re-expansion takes place and implement the same-register constraint
> there. However, I'm opening the RFC for any input:
> 1. Are there better ways to enforce same-register constraints during the
>    combine pass other than declaring the source/dest register to be the
>    same in RTL? Specifically, I'm concerned that this addition may
>    restrict subsequent RTL pass optimizations.
> 2. Are there other concerns with implementing source-dest constraints
>    within the combine pass?
> 3. Any other thoughts/input you have is welcome!
>
> 2022-03-10 Patrick O'Neill <patrick@rivosinc.com>
>
>         * combine.cc: Add register equality replacement.
>         * riscv.cc (riscv_insn_cost): Add in order to tell combine pass
>           that clobber-ops are cheaper.
>         * riscv.h: Add c extension argument macros.
>
> Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
> ---
> Changelog:
> v2:
>  - Fix whitespace
>  - Rearrange conditionals to break long lines
> ---
>  gcc/combine.cc            | 78 +++++++++++++++++++++++++++++++++++++++
>  gcc/config/riscv/riscv.cc | 42 +++++++++++++++++++++
>  gcc/config/riscv/riscv.h  |  7 ++++
>  3 files changed, 127 insertions(+)
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 8f06ee0e54f..1b14e802166 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -3280,6 +3280,84 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>        i2_is_used = n_occurrences;
>      }
>
> +/* Attempt to replace ops with clobber-ops.
> +     If the target implements clobber ops (set r1 (plus (r1)(r2))) as cheaper,
> +     this pattern allows the combine pass to optimize with that in mind.
> +     NOTE: This conditional is not triggered in most cases. Ideally we would be
> +     able to move it above the if (i2_is_used == 0), but that breaks the
> +     testsuite.
> +     See RFC blurb for more info.  */
> +  if (!i0 && !i1 && i2 && i3 && GET_CODE(PATTERN(i2)) == SET
> +      && GET_CODE(SET_DEST(PATTERN(i2))) == REG
> +      && GET_CODE(PATTERN(i3)) == SET
> +      && (GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_BIN_ARITH
> +         || GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_COMM_ARITH
> +         || GET_RTX_CLASS(GET_CODE(SET_SRC(PATTERN(i3)))) == RTX_UNARY)
> +      && GET_CODE(SET_DEST(PATTERN(i3))) == REG
> +      && REGNO(XEXP(SET_SRC(PATTERN(i3)), 0)) != REGNO(SET_DEST(PATTERN(i3)))) {
> +
> +    rtx_code insn_class = GET_CODE(SET_SRC (PATTERN(i2)));
> +
> +    if (GET_RTX_CLASS(insn_class) == RTX_BIN_ARITH
> +       || GET_RTX_CLASS(insn_class) == RTX_COMM_ARITH
> +       || GET_RTX_CLASS(insn_class) == RTX_UNARY) {
> +
> +      rtx operand1 = XEXP (SET_SRC (PATTERN (i2)), 0);
> +      rtx prior_reg = SET_DEST (PATTERN (i2));
> +
> +      if (GET_CODE(operand1) == REG
> +         && GET_MODE(operand1) == GET_MODE(prior_reg)
> +         && find_reg_note (i2, REG_DEAD, operand1)
> +         && regno_use_in (REGNO(prior_reg), PATTERN(i3))
> +         && find_reg_note (i3, REG_DEAD, SET_DEST (PATTERN(i2)))) {
> +
> +       // Now we have a dead operand register, and we know where the dest dies.
> +
> +       // Remove the note declaring the register as dead
> +       rtx note = find_reg_note (i2, REG_DEAD, operand1);
> +       remove_note (i2, note);
> +
> +       // Overwrite i2 dest with operand1
> +       rtx i2_dest = copy_rtx(operand1);
> +       SUBST (SET_DEST (PATTERN (i2)), i2_dest);
> +
> +       // Replace the previous i2 dest register with operand1 in i3
> +       rtx op1_copy = copy_rtx(operand1);
> +       rtx new_src = simplify_replace_rtx(SET_SRC (PATTERN (i3)),
> +                                                   prior_reg, op1_copy);
> +       SUBST (SET_SRC (PATTERN (i3)), new_src);
> +
> +       // Move the dest dead note to the new register
> +       note = find_reg_note (i3, REG_DEAD, prior_reg);
> +       if (note) {
> +         remove_note (i3, note);
> +         //add_reg_note (i3, REG_DEAD, op1_copy);
> +       }
> +
> +       newi2pat = PATTERN (i2);
> +       newpat = PATTERN (i3);
> +
> +       subst_insn = i3;
> +       subst_low_luid = DF_INSN_LUID (i2);
> +       added_sets_2 = added_sets_1 = added_sets_0 = 0;
> +       i2dest = i2_dest;
> +       i2dest_killed = dead_or_set_p (i2, i2dest);
> +
> +       changed_i3_dest = false;
> +
> +       i2_code_number = recog_for_combine (&newi2pat, i2, &new_i2_notes);
> +
> +       if (i2_code_number >= 0)
> +         insn_code_number = recog_for_combine (&newpat, i3, &new_i3_notes);
> +
> +       if (insn_code_number >= 0)
> +         swap_i2i3 = true;
> +
> +       goto validate_replacement;
> +      }
> +    }
> +  }
> +
>    /* If we already got a failure, don't try to do more.  Otherwise, try to
>       substitute I1 if we have it.  */
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 7da9d377120..9f0bd59ac41 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -2160,6 +2160,46 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
>      }
>  }
>
> +/* Helper for INSN_COST.
> +
> +   Copied from arc.cc
> +
> +   Per Segher Boessenkool: rtx_costs computes the cost for any rtx (an
> +   insn, a set, a set source, any random piece of one).  set_src_cost,
> +   set_rtx_cost, etc. are helper functions that use that.
> +
> +   Those functions do not work for parallels.  Also, costs are not
> +   additive like this simplified model assumes.  Also, more complex
> +   backends tend to miss many cases in their rtx_costs function.
> +
> +   Many passes that want costs want to know the cost of a full insn.  Like
> +   combine.  That's why I created insn_cost: it solves all of the above
> +   problems.  */
> +
> +static int
> +riscv_insn_cost (rtx_insn *insn, bool speed)
> +{
> +  rtx pat = PATTERN (insn);
> +
> +  if (TARGET_RVC && !speed) {
> +    if (GET_CODE(pat) == SET && GET_CODE(SET_DEST(pat)) == REG) {
> +      rtx src = SET_SRC(pat);
> +      rtx dest = SET_DEST(pat);
> +      if (GET_CODE(src) == PLUS && GET_CODE(XEXP(src, 0)) == REG && REGNO(XEXP(src, 0)) == REGNO(dest)) {
> +       if (GET_CODE(XEXP(src, 1)) == REG)
> +         return 2;
> +       else if (GET_CODE(XEXP(src, 1)) == CONST_INT && CMPRESD_OPERAND(INTVAL(XEXP(src, 1))))
> +         return 2;
> +      } else if (GET_CODE(src) == ASHIFT && GET_CODE(XEXP(src, 0)) == REG && REGNO(dest) == REGNO(XEXP(src, 0))) {
> +       if (GET_CODE(XEXP(src, 1)) == CONST_INT && UNSIGNED_CMPRESD_OPERAND(INTVAL(XEXP(src, 1))))
> +         return 2;
> +      }
> +    }
> +  }
> +
> +  return pattern_cost (pat, speed);
> +}
> +
>  /* Implement TARGET_ADDRESS_COST.  */
>
>  static int
> @@ -5618,6 +5658,8 @@ riscv_asan_shadow_offset (void)
>  #define TARGET_RTX_COSTS riscv_rtx_costs
>  #undef TARGET_ADDRESS_COST
>  #define TARGET_ADDRESS_COST riscv_address_cost
> +#undef TARGET_INSN_COST
> +#define TARGET_INSN_COST riscv_insn_cost
>
>  #undef TARGET_ASM_FILE_START
>  #define TARGET_ASM_FILE_START riscv_file_start
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 8a4d2cf7f85..be5e77b1394 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -522,6 +522,13 @@ enum reg_class
>  #define SMALL_OPERAND(VALUE) \
>    ((unsigned HOST_WIDE_INT) (VALUE) + IMM_REACH/2 < IMM_REACH)
>
> +#define CMPRESD_OPERAND(VALUE) \
> +  (VALUE < 32 && VALUE >= -32)
> +
> +/* True if VALUE is an unsigned 5-bit number. */
> +#define UNSIGNED_CMPRESD_OPERAND(VALUE) \
> +  (VALUE < 64 && VALUE >= 0)
> +
>  /* True if VALUE can be loaded into a register using LUI.  */
>
>  #define LUI_OPERAND(VALUE)                                             \
> --
> 2.25.1
>

next prev parent reply	other threads:[~2022-03-11  8:07 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-10 17:35 [RFC] " Patrick O'Neill
2022-03-10 17:42 ` H.J. Lu
2022-03-10 17:54   ` [RFC v2] " Patrick O'Neill
2022-03-11  8:07     ` Kito Cheng [this message]
2022-03-11 14:18     ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+yXCZB-W+P7t1r1y9DGTJDCNO=Ny2BuPzhg19u9KJ=LUydXdA@mail.gmail.com' \
    --to=kito.cheng@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jimw@sifive.com \
    --cc=palmer@dabbelt.com \
    --cc=patrick@rivosinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).