From: Ajit Agarwal <aagarwa1@linux.ibm.com>
To: Alex Coplan <alex.coplan@arm.com>,
"Kewen.Lin" <linkw@linux.ibm.com>,
Segher Boessenkool <segher@kernel.crashing.org>,
Michael Meissner <meissner@linux.ibm.com>,
Peter Bergner <bergner@linux.ibm.com>,
David Edelsohn <dje.gcc@gmail.com>,
gcc-patches <gcc-patches@gcc.gnu.org>,
richard.sandiford@arm.com
Subject: Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion
Date: Fri, 31 May 2024 19:24:25 +0530 [thread overview]
Message-ID: <cc31b3fb-552b-43bd-9029-5b84f6e6c437@linux.ibm.com> (raw)
In-Reply-To: <mpt34py71e8.fsf@arm.com>
Hello Richard:
On 31/05/24 3:23 pm, Richard Sandiford wrote:
> Ajit Agarwal <aagarwa1@linux.ibm.com> writes:
>> Hello All:
>>
>> Common infrastructure using generic code for pair mem fusion of different
>> targets.
>>
>> rs6000 target specific specific code implements virtual functions defined
>> by generic code.
>>
>> Code is implemented with pure virtual functions to interface with target
>> code.
>>
>> Target specific code are added in rs6000-mem-fusion.cc and additional virtual
>> function implementation required for rs6000 are added in aarch64-ldp-fusion.cc.
>>
>> Bootstrapped and regtested for aarch64-linux-gnu and powerpc64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>>
>> aarch64, rs6000, middle-end: Add implementation for different targets for pair mem fusion
>>
>> Common infrastructure using generic code for pair mem fusion of different
>> targets.
>>
>> rs6000 target specific specific code implements virtual functions defined
>> by generic code.
>>
>> Code is implemented with pure virtual functions to interface with target
>> code.
>>
>> Target specific code are added in rs6000-mem-fusion.cc and additional virtual
>> function implementation required for rs6000 are added in aarch64-ldp-fusion.cc.
>>
>> 2024-05-31 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com>
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64-ldp-fusion.cc: Add target specific
>> implementation of additional virtual functions added in pair_fusion
>> struct.
>> * config/rs6000/rs6000-passes.def: New mem fusion pass
>> before pass_early_remat.
>> * config/rs6000/rs6000-mem-fusion.cc: Add new pass.
>> Add target specific implementation using pure virtual
>> functions.
>> * config.gcc: Add new object file.
>> * config/rs6000/rs6000-protos.h: Add new prototype for mem
>> fusion pass.
>> * config/rs6000/t-rs6000: Add new rule.
>> * rtl-ssa/accesses.h: Moved set_is_live_out_use as public
>> from private.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * g++.target/powerpc/me-fusion.C: New test.
>> * g++.target/powerpc/mem-fusion-1.C: New test.
>> * gcc.target/powerpc/mma-builtin-1.c: Modify test.
>> ---
>
> This isn't a complete review, just some initial questions & comments
> about selected parts.
>
>> [...]
>> +/* Check whether load can be fusable or not.
>> + Return true if dependent use is UNSPEC otherwise false. */
>> +bool
>> +rs6000_pair_fusion::fuseable_load_p (insn_info *info)
>> +{
>> + rtx_insn *insn = info->rtl ();
>> +
>> + for (rtx note = REG_NOTES (insn); note; note = XEXP (note, 1))
>> + if (REG_NOTE_KIND (note) == REG_EQUAL
>> + || REG_NOTE_KIND (note) == REG_EQUIV)
>> + return false;
>
> It's unusual to punt on an optimisation because of a REG_EQUAL/EQUIV
> note. What's the reason for doing this? Are you trying to avoid
> fusing pairs before reload that are equivalent to a MEM (i.e. have
> a natural spill slot)? I think Alex hit a similar situation.
>
We have used the above check because of some SPEC benchmarks failing with
with MEM pairs having REG_EQUAL/EQUIV notes.
By adding the checks the benchmarks passes and also it improves the
performance.
This checks were added during initial implementation of pair fusion
pass.
I will investigate further if this check is still required or not.
Sorry for the inconvenience caused.
>> +
>> + for (auto def : info->defs ())
>> + {
>> + auto set = dyn_cast<set_info *> (def);
>> + if (set && set->has_any_uses ())
>> + {
>> + for (auto use : set->all_uses())
>
> Nit: has_any_uses isn't necessary: the inner loop will simply do nothing
> in that case. Also, we can/should restrict the scan to non-debug uses.
>
> This can then be:
>
> for (auto def : info->defs ())
> if (auto set = dyn_cast<set_info *> (def))
> for (auto use : set->nondebug_insn_uses())
>
Sure. I will change as above.
>> + {
>> + if (use->insn ()->is_artificial ())
>> + return false;
>> +
>> + insn_info *info = use->insn ();
>> +
>> + if (info
>> + && info->rtl ()
>
> This test shouldn't be necessary.
>
Sure I will remove this check.
>> + && info->is_real ())
>> + {
>> + rtx_insn *rtl_insn = info->rtl ();
>> + rtx set = single_set (rtl_insn);
>> +
>> + if (set == NULL_RTX)
>> + return false;
>> +
>> + rtx op0 = SET_SRC (set);
>> + if (GET_CODE (op0) != UNSPEC)
>> + return false;
>
> What's the motivation for rejecting unspecs? It's unsual to treat
> all unspecs as a distinct group.
>
> Also, using single_set means that the function still lets through
> parallels of two sets in which the sources are unspecs. Is that
> intentional?
>
> The reasons behind things like the REG_EQUAL/EQUIV and UNSPEC decisions
> need to be described in comments, so that other people coming to this
> code later can understand the motivation. The same thing applies to
> other decisions in the patch.
>
Adjacent load pair fusion with 256 bit OOmode is seen and valid with use of load
in UNSPEC. Thats why this check is added.
>> + }
>> + }
>> + }
>> + }
>> + return true;
>> +}
>> [...]
>> diff --git a/gcc/pair-fusion.cc b/gcc/pair-fusion.cc
>> index 9f897ac04e2..2dbe9f854ef 100644
>> --- a/gcc/pair-fusion.cc
>> +++ b/gcc/pair-fusion.cc
>> @@ -312,7 +312,7 @@ static int
>> encode_lfs (lfs_fields fields)
>> {
>> int size_log2 = exact_log2 (fields.size);
>> - gcc_checking_assert (size_log2 >= 2 && size_log2 <= 4);
>> + gcc_checking_assert (size_log2 >= 2 && size_log2 <= 6);
>> return ((int)fields.load_p << 3)
>> | ((int)fields.fpsimd_p << 2)
>> | (size_log2 - 2);
>
> The point of the assert is to make sure that "size_log2 - 2"
> fits within 2 bits, and so does not interfere with the fpsimd_p
> and load_p parts of the encoding. If rs6000 needs size_log2 == 6,
> the shift amounts should be increased by 1 to compensate.
>
> If we do bump the shifts by 1, the new range can be:
>
> gcc_checking_assert (size_log2 >= 2 && size_log2 <= 9);
>
Sure I will make this change.
>> [...]
>> + // Given insn_info pair I1 and I2, return true if offsets are in order.
>> + virtual bool should_handle_unordered_insns (rtl_ssa::insn_info *i1,
>> + rtl_ssa::insn_info *i2) = 0;
>> +
>
> This name seems a bit misleading. The function is used in:
>
> @@ -2401,6 +2405,9 @@ pair_fusion_bb_info::try_fuse_pair (bool load_p, unsigned access_size,
> reversed = true;
> }
>
> + if (!m_pass->should_handle_unordered_insns (i1, i2))
> + return false;
> +
> rtx cand_mems[2];
> rtx reg_ops[2];
> rtx pats[2];
>
> and so it acts as a general opt-out. The insns aren't known to be unordered.
>
> It looks like the rs6000 override requires the original insns to be
> in offset order. Could you say why that's necessary? (Both in email
> and as a comment in the code.)
>
Yes rs6000 requires the original load insns to be in offset order.
Some regression tests like vect-outer-4f fails if we do load pair
fusion with load offsets are not in offset order as this breaks lxvp
semantics.
> If we do need a general opt-out like this, it should probably go at the
> very start of try_fuse_pair, before even the dump message. (Alternatively,
> it could go after the dump message, but then the "return false" would need
> a dump message of its own to explain the failure.)
>
Sure.
> Thanks,
> Richard
Thanks & Regards
Ajit
next prev parent reply other threads:[~2024-05-31 13:54 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-30 19:51 Ajit Agarwal
2024-05-30 21:34 ` Segher Boessenkool
2024-05-31 8:14 ` Richard Sandiford
2024-05-31 14:19 ` Segher Boessenkool
2024-05-31 9:53 ` Richard Sandiford
2024-05-31 10:28 ` Richard Sandiford
2024-05-31 13:54 ` Ajit Agarwal [this message]
2024-05-31 14:38 ` Richard Sandiford
2024-05-31 16:59 ` Ajit Agarwal
2024-06-02 5:52 ` Ajit Agarwal
2024-06-03 8:37 ` Richard Sandiford
2024-06-03 11:05 ` Ajit Agarwal
2024-06-03 11:33 ` Richard Sandiford
2024-06-03 13:47 ` Ajit Agarwal
2024-06-03 14:17 ` Richard Sandiford
2024-06-03 14:34 ` Ajit Agarwal
2024-06-03 14:54 ` Richard Sandiford
2024-06-03 15:58 ` Ajit Agarwal
2024-06-04 16:00 ` Ajit Agarwal
2024-06-02 13:16 ` Ajit Agarwal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cc31b3fb-552b-43bd-9029-5b84f6e6c437@linux.ibm.com \
--to=aagarwa1@linux.ibm.com \
--cc=alex.coplan@arm.com \
--cc=bergner@linux.ibm.com \
--cc=dje.gcc@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=linkw@linux.ibm.com \
--cc=meissner@linux.ibm.com \
--cc=richard.sandiford@arm.com \
--cc=segher@kernel.crashing.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).