Message-ID: <77c96a76-34e1-5394-d7ac-31de790dd007@gmail.com>
Date: Mon, 19 Dec 2022 08:44:17 -0700
From: Jeff Law
Subject: Re: [PATCH] RISC-V: Support VSETVL PASS for RVV support
To: juzhe.zhong@rivai.ai, gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, palmer@dabbelt.com
References: <20221214071345.153278-1-juzhe.zhong@rivai.ai>
In-Reply-To: <20221214071345.153278-1-juzhe.zhong@rivai.ai>
List-Id: gcc-patches@gcc.gnu.org

I believe Kito already approved.  There's nothing here that is critical, just minor cleanups, and I'm fine with them being cleaned up as a follow-up patch given Kito has already approved this patch.

On 12/14/22 00:13, juzhe.zhong@rivai.ai wrote:
> From: Ju-Zhe Zhong
>
> This patch is to support VSETVL PASS for RVV support.
> 1.The optimization and performance is guaranteed LCM (Lazy code motion).
> 2.Base on RTL_SSA framework to gain better optimization chances.
> 3.Also we do VL/VTYPE, demand information backward propagation across
>   blocks by RTL_SSA reverse order in CFG.
> 4.It has been well and fully tested by about 200+ testcases for VLMAX
>   AVL situation (Only for VLMAX since we don't have an intrinsics to
>   test non-VLMAX).
> 5.Will support AVL model in the next patch
>
> gcc/ChangeLog:
>
> 	* config.gcc: Add riscv-vsetvl.o.
> 	* config/riscv/riscv-passes.def (INSERT_PASS_BEFORE): Add VSETVL PASS location.
> 	* config/riscv/riscv-protos.h (make_pass_vsetvl): New function.
> 	(enum avl_type): New enum.
> 	(get_ta): New function.
> 	(get_ma): Ditto.
> 	(get_avl_type): Ditto.
> 	(calculate_ratio): Ditto.
> 	(enum tail_policy): New enum.
> 	(enum mask_policy): Ditto.
> 	* config/riscv/riscv-v.cc (calculate_ratio): New function.
> 	(emit_pred_op): change the VLMAX mov codgen.
> 	(get_ta): New function.
> 	(get_ma): Ditto.
> 	(enum tail_policy): Change enum.
> 	(get_prefer_tail_policy): New function.
> 	(enum mask_policy): Change enum.
> 	(get_prefer_mask_policy): New function.
> 	* config/riscv/t-riscv: Add riscv-vsetvl.o
> 	* config/riscv/vector.md (): Adjust attribute and pattern for VSETVL PASS.
> 	(@vlmax_avl): Ditto.
> 	(@vsetvl_no_side_effects): Delete.
> 	(vsetvl_vtype_change_only): New MD pattern.
> 	(@vsetvl_discard_result): Ditto.
> 	* config/riscv/riscv-vsetvl.cc: New file.
> 	* config/riscv/riscv-vsetvl.h: New file.

So a high level note.  Once you've inserted your vsetvl instructions, you can't have further code motion, correct?  So doesn't this potentially have a poor interaction with something like speculative code motion as performed by sched?  ISTM that if you want to run before sched2, then you'd need to introduce dependencies between the vsetvl instructions and the vector instructions that utilize those settings?
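[Editorial note: to make the scheduling concern concrete, here is a hand-written sketch of the hazard; the mnemonics are standard RVV, but the fragment itself is hypothetical and not from the patch.]

```asm
# Hypothetical RVV fragment (not from the patch).  Each vector
# instruction implicitly consumes the VL/VTYPE state configured by the
# most recent vsetvli.

	vsetvli	t0, a0, e32, m1, ta, ma   # configure: 32-bit elements
	vle32.v	v1, (a1)                  # executes under e32
	vsetvli	t0, a0, e8, m1, ta, ma    # reconfigure: 8-bit elements
	vle8.v	v2, (a2)                  # executes under e8

# VL/VTYPE live in CSRs rather than in explicit operands, so unless the
# pass adds dependencies on those CSRs, the scheduler sees no reason why
# vle8.v could not be hoisted above the second vsetvli -- which would
# make it run under the stale e32 configuration.
```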
I can envision wanting to schedule the vsetvl instructions so that they bubble up slightly from their insertion points to avoid stalls or to allow the vector units to start executing earlier.  Is that what's driving the current pass placement?  If not, would it make more sense to use the late prologue/epilogue hooks that Richard Sandiford posted recently (I'm not sure they're committed yet)?

> +
> +static bool
> +loop_basic_block_p (const basic_block cfg_bb)
> +{
> +  return JUMP_P (BB_END (cfg_bb)) && any_condjump_p (BB_END (cfg_bb));
> +}

The name seems poor here -- AFAICT this has nothing to do with loops.  It's just a test that the end of a block is a conditional jump.  I'm pretty sure we could extract BB_END (cfg_bb) and use an existing routine instead of writing our own.  I'd suggest peeking at jump.cc to see if there's something already suitable.

> +
> +/* Return true if it is vsetvldi or vsetvlsi.  */
> +static bool
> +vsetvl_insn_p (rtx_insn *rinsn)
> +{
> +  return INSN_CODE (rinsn) == CODE_FOR_vsetvldi
> +	 || INSN_CODE (rinsn) == CODE_FOR_vsetvlsi;

Formatting note.  For a multi-line conditional, go ahead and use an open paren and the usual indentation style:

  return (INSN_CODE (rinsn) == CODE_FOR_vsetvldi
	  || INSN_CODE (rinsn) == CODE_FOR_vsetvlsi);

There are other examples in the new file.

> +
> +/* An "anticipatable occurrence" is one that is the first occurrence in the
> +   basic block, the operands are not modified in the basic block prior
> +   to the occurrence and the output is not used between the start of
> +   the block and the occurrence.  */
> +static bool
> +anticipatable_occurrence_p (const insn_info *insn, const vector_insn_info dem)
> +{
> +  /* The only possible operand we care of VSETVL is AVL.  */
> +  if (dem.has_avl_reg ())
> +    {
> +      /* The operands shoule not be modified in the basic block prior

s/shoule/should/

> +
> +/* Return true if the branch probability is dominate.
> +   */
> +static bool
> +dominate_probability_p (edge e)

The function comment needs some work.  "is dominate" doesn't really mean anything without more context.  It looks like you're really testing whether the edge probability is greater than 50%.

> +
> +  /* There is a obvious case that is not worthwhile and meaningless
> +     to propagate the demand information:
> +		 local_dem
> +		__________
> +	   ____|____      |
> +	  |         |     |
> +	  |________ |     |
> +	       |_________|
> +	     reaching_out
> +     Header is incompatible with reaching_out and the block is loop itself,
> +     we don't backward propagete the local_dem since we can't avoid emit
> +     vsetvl for the local_dem.  */

s/propagete/propagate/

> +static bool
> +can_backward_propagate_p (const function_info *ssa, const basic_block cfg_bb,
> +			  const vector_insn_info prop)
> +{
> +  insn_info *insn = prop.get_insn ();
> +
> +  /* TODO: We don't backward propagate the explict VSETVL here
> +     since we will change vsetvl and vsetvlmax intrinsiscs into
> +     no side effects which can be optimized into optimzal location
> +     by GCC internal PASSes.  We only need to support these backward
> +     propagation if vsetvl instrinsics have side effects.  */

s/optimzal/optimal/
s/PASSes/passes/
s/intrinsiscs/intrinsics/
s/instrinsics/intrinsics/

> +
> +  /* If the definition is in the current block, we can't propagate it
> +     acrocss blocks.  */

s/acrocss/across/

> +
> +static void
> +eliminate_insn (rtx_insn *rinsn)
> +{
> +  if (dump_file)
> +    {
> +      fprintf (dump_file, "\nEliminate insn %d:\n", INSN_UID (rinsn));
> +      print_rtl_single (dump_file, rinsn);
> +    }
> +  if (in_sequence_p ())
> +    remove_insn (rinsn);
> +  else
> +    delete_insn (rinsn);
> +}

Hmm, presumably you can be in a sequence because you're generating code and you might want to remove an insn in the generated sequence?

> +
> +/* ???
> +   If there was a jump optimization pass after gcse and before loop,
> +   then we would not need to do this here, because jump would add the
> +   necessary REG_LABEL_OPERAND and REG_LABEL_TARGET notes.  */
> +
> +static void
> +add_label_notes (rtx x, rtx_insn *insn)

It'd probably be better to move this into rtl.cc with a prototype in rtl.h rather than have duplicate definitions in gcse.c and the RISC-V backend.  I'm not even entirely sure why we really need it here.

> +
> +   This is used by both the PRE and code hoisting.  */
> +
> +static void
> +insert_insn_end_basic_block (rtx_insn *rinsn, basic_block cfg_bb)

Similarly.  Though rather than rtl.cc, maybe we should just export it out of gcse or wherever it's currently defined.

> +
> +static void
> +change_insn (rtx_insn *rinsn, rtx new_pat)

These need function comments.  What isn't clear to me is why we don't just call validate_change?  Is it just so we get the dump info?

Jeff