From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id A32033858D35 for ; Thu, 22 Jun 2023 15:55:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A32033858D35 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 30903C14; Thu, 22 Jun 2023 08:55:52 -0700 (PDT) Received: from [10.1.30.150] (E121495.arm.com [10.1.30.150]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 748013F663; Thu, 22 Jun 2023 08:55:07 -0700 (PDT) Message-ID: <6dba374d-0796-7957-5426-764177115135@arm.com> Date: Thu, 22 Jun 2023 16:54:57 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0 Subject: Re: [PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops Content-Language: en-US To: Stamatis Markianos-Wright , "gcc-patches@gcc.gnu.org" Cc: Kyrylo Tkachov , Richard Earnshaw , ramana.gcc@gmail.com, "nickc@redhat.com" References: From: "Andre Vieira (lists)" In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-8.4 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,NICE_REPLY_A,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Some comments below, all quite minor. I'll continue to review tomorrow, I need a fresher brain for arm_mve_check_df_chain_back_for_implic_predic ;) +static int +arm_mve_get_vctp_lanes (rtx x) +{ + if (GET_CODE (x) == SET && GET_CODE (XEXP (x, 1)) == UNSPEC + && (XINT (XEXP (x, 1), 1) == VCTP || XINT (XEXP (x, 1), 1) == VCTP_M)) + { + switch (GET_MODE (XEXP (x, 1))) + { + case V16BImode: + return 16; + case V8BImode: + return 8; + case V4BImode: + return 4; + case V2QImode: + return 2; + default: + break; + } + } + return 0; +} I think you can replace the switch with something along the lines of: machine_mode mode = GET_MODE (XEXP (x, 1)); return VECTOR_MODE_P (mode) ? GET_MODE_NUNITS (mode) : 0; +/* Check if an insn requires the use of the VPR_REG, if it does, return the + sub-rtx of the VPR_REG. The `type` argument controls whether + this function should: + * For type == 0, check all operands, including the OUT operands, + and return the first occurance of the VPR_REG. s/occurance/occurrence/ + bool requires_vpr; + extract_constrain_insn (insn); indent of requires_vpr is off. + if (type == 1 && (recog_data.operand_type[op] == OP_OUT + || recog_data.operand_type[op] == OP_INOUT)) + continue; + else if (type == 2 && (recog_data.operand_type[op] == OP_IN + || recog_data.operand_type[op] == OP_INOUT)) + continue; Why skip INOUT? I guess this will become clear when I see the uses, but I'm wondering whether 'only check the input operands.' is clear enough. Maybe 'check operands that are input only.' would be more accurate? + /* Fetch the reg_class for each entry and check it against the + * VPR_REG reg_class. */ Remove leading * on the second line. + +/* Wrapper function of arm_get_required_vpr_reg with type == 1, so return + something only if the VPR reg is an input operand to the insn. */ When talking about a function parameter in comments capitalize (INSN) the name. Same for: +/* Wrapper function of arm_get_required_vpr_reg with type == 2, so return + something only if the VPR reg is the retrurn value, an output of, or is + clobbered by the insn. */ +/* Return true if an insn is an MVE instruction that VPT-predicable, but in + its unpredicated form, or if it is predicated, but on a predicate other + than vpr_reg. */ In this one also 'is a MVE instruction that is VPT-predicable' would be better I think. On 15/06/2023 12:47, Stamatis Markianos-Wright via Gcc-patches wrote: > Hi all, > > This is the 2/2 patch that contains the functional changes needed > for MVE Tail Predicated Low Overhead Loops. See my previous email > for a general introduction of MVE LOLs. > > This support is added through the already existing loop-doloop > mechanisms that are used for non-MVE dls/le looping. > > Mid-end changes are: > > 1) Relax the loop-doloop mechanism in the mid-end to allow for > decrement numbers other that -1 and for `count` to be an > rtx containing a simple REG (which in this case will contain > the number of elements to be processed), rather > than an expression for calculating the number of iterations. > 2) Added a new df utility function: `df_bb_regno_only_def_find` that > will return the DEF of a REG only if it is DEF-ed once within the > basic block. > > And many things in the backend to implement the above optimisation: > > 3) Implement the `arm_predict_doloop_p` target hook to instruct the > mid-end about Low Overhead Loops (MVE or not), as well as > `arm_loop_unroll_adjust` which will prevent unrolling of any loops > that are valid for becoming MVE Tail_Predicated Low Overhead Loops > (unrolling can transform a loop in ways that invalidate the dlstp/ > letp tranformation logic and the benefit of the dlstp/letp loop > would be considerably higher than that of unrolling) > 4) Appropriate changes to the define_expand of doloop_end, new > patterns for dlstp and letp, new iterators, unspecs, etc. > 5) `arm_mve_loop_valid_for_dlstp` and a number of checking functions: > * `arm_mve_dlstp_check_dec_counter` > * `arm_mve_dlstp_check_inc_counter` > * `arm_mve_check_reg_origin_is_num_elems` > * `arm_mve_check_df_chain_back_for_implic_predic` > * `arm_mve_check_df_chain_fwd_for_implic_predic_impact` > This all, in smoe way or another, are running checks on the loop > structure in order to determine if the loop is valid for dlstp/letp > transformation. > 6) `arm_attempt_dlstp_transform`: (called from the define_expand of > doloop_end) this function re-checks for the loop's suitability for > dlstp/letp transformation and then implements it, if possible. > 7) Various utility functions: > *`arm_mve_get_vctp_lanes` to map > from vctp unspecs to number of lanes, and > `arm_get_required_vpr_reg` > to check an insn to see if it requires the VPR or not. > * `arm_mve_get_loop_vctp` > * `arm_mve_get_vctp_lanes` > * `arm_emit_mve_unpredicated_insn_to_seq` > * `arm_get_required_vpr_reg` > * `arm_get_required_vpr_reg_param` > * `arm_get_required_vpr_reg_ret_val` > * `arm_mve_vec_insn_is_predicated_with_this_predicate` > * `arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate` > > No regressions on arm-none-eabi with various targets and on > aarch64-none-elf. Thoughts on getting this into trunk? > > Thank you, > Stam Markianos-Wright > > gcc/ChangeLog: > > * config/arm/arm-protos.h (arm_target_insn_ok_for_lob): > Rename to... > (arm_target_bb_ok_for_lob): ...this > (arm_attempt_dlstp_transform): New. > * config/arm/arm.cc (TARGET_LOOP_UNROLL_ADJUST): New. > (TARGET_PREDICT_DOLOOP_P): New. > (arm_block_set_vect): > (arm_target_insn_ok_for_lob): Rename from > arm_target_insn_ok_for_lob. > (arm_target_bb_ok_for_lob): New. > (arm_mve_get_vctp_lanes): New. > (arm_get_required_vpr_reg): New. > (arm_get_required_vpr_reg_param): New. > (arm_get_required_vpr_reg_ret_val): New. > (arm_mve_get_loop_vctp): New. > (arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate): New. > (arm_mve_vec_insn_is_predicated_with_this_predicate): New. > (arm_mve_check_df_chain_back_for_implic_predic): New. > (arm_mve_check_df_chain_fwd_for_implic_predic_impact): New. > (arm_mve_check_reg_origin_is_num_elems): New. > (arm_mve_dlstp_check_inc_counter): New. > (arm_mve_dlstp_check_dec_counter): New. > (arm_mve_loop_valid_for_dlstp): New. > (arm_predict_doloop_p): New. > (arm_loop_unroll_adjust): New. > (arm_emit_mve_unpredicated_insn_to_seq): New. > (arm_attempt_dlstp_transform): New. > * config/arm/iterators.md (DLSTP): New. > (mode1): Add DLSTP mappings. > * config/arm/mve.md (*predicated_doloop_end_internal): New. > (dlstp_insn): New. > * config/arm/thumb2.md (doloop_end): Update for MVE LOLs. > * config/arm/unspecs.md: New unspecs. > * df-core.cc (df_bb_regno_only_def_find): New. > * df.h (df_bb_regno_only_def_find): New. > * loop-doloop.cc (doloop_condition_get): Relax conditions. > (doloop_optimize): Add support for elementwise LoLs. > > gcc/testsuite/ChangeLog: > > * gcc.target/arm/lob.h: Update framework. > * gcc.target/arm/lob1.c: Likewise. > * gcc.target/arm/lob6.c: Likewise. > * gcc.target/arm/mve/dlstp-compile-asm.c: New test. > * gcc.target/arm/mve/dlstp-int16x8.c: New test. > * gcc.target/arm/mve/dlstp-int32x4.c: New test. > * gcc.target/arm/mve/dlstp-int64x2.c: New test. > * gcc.target/arm/mve/dlstp-int8x16.c: New test. > * gcc.target/arm/mve/dlstp-invalid-asm.c: New test.