From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTP id CD035394D837 for ; Wed, 30 Jun 2021 15:25:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org CD035394D837 Received: from mail-ot1-f71.google.com (mail-ot1-f71.google.com [209.85.210.71]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-26-bxtZbdxEOKqQwCtLR-PYZQ-1; Wed, 30 Jun 2021 11:25:05 -0400 X-MC-Unique: bxtZbdxEOKqQwCtLR-PYZQ-1 Received: by mail-ot1-f71.google.com with SMTP id c3-20020a9d61430000b0290474c23d2dbcso812712otk.15 for ; Wed, 30 Jun 2021 08:25:05 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=JUdhVsd73qDhD4fmEv4v59se+XDiQfZGyrlZKznn6WI=; b=CLGEDpb3LVz6Q0GpxnqNbVCno4s4izs/41RFVQy82xiPUVblccHyE5Tb9yV8luMJym v5WgS3FCveIxwZgZ4tDT7Zv2ZkGBfRNLD6FNNKy8VXI+SzGRsY8Y8SE50bvE0y5tZAc8 dC+RMM/qcdPXIioq2A7CmxpJbt5zJF4nn9Dao2rMGIxi5zlEAE9Vxxfn4fO9J9i5Kb/I LdGmyUb31WAE5UxgRisu4iFryGEVHdwTBYZy19QXntsqdH+VCM0YVw+5+E8hSi8b2b92 hgyTxkmswHpRa6dj2Eda/vBfM77Fj3pEz1ZaSRvbgLP5cl/mi8YAHms5lzk59Gj0CD5f xJiA== X-Gm-Message-State: AOAM532F4YrTk+ZKE0k7EFtn4I0IYgYaSDGG+WP2B+rOIV6NnIqdr184 tBMq2vuJ3cO0zMlneEnvPDftC6tjivSESDGp1RokZEGgjgD6k/65mMENqGS8rbSX6zA1slyNsT/ bz1n6FQadusUxuJJm3Q== X-Received: by 2002:a4a:6f0e:: with SMTP id h14mr8751553ooc.9.1625066704518; Wed, 30 Jun 2021 08:25:04 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxcszbzdEacK4y3H72WBl3VbJNLPyRdFMNnJ5spVkJsnL4jiMYAJshJsJDroji+wc52bi8vgw== X-Received: by 2002:a4a:6f0e:: with SMTP id h14mr8751532ooc.9.1625066704336; Wed, 30 Jun 2021 08:25:04 -0700 (PDT) Received: from [192.168.1.113] (24-52-230-208.cable.teksavvy.com. [24.52.230.208]) by smtp.gmail.com with ESMTPSA id x29sm3231306ooj.10.2021.06.30.08.24.59 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 30 Jun 2021 08:25:03 -0700 (PDT) Subject: Re: [RFC/PATCH v3] ira: Support more matching constraint forms with param [PR100328] To: "Kewen.Lin" , GCC Patches Cc: bergner@linux.ibm.com, Bill Schmidt , Segher Boessenkool , Richard Sandiford , crazylht@gmail.com References: <8a5fd52a-1cc9-6563-ee6c-f345b489654c@linux.ibm.com> From: Vladimir Makarov Message-ID: <46f79a40-6927-35f3-6eda-f49d1faccbc9@redhat.com> Date: Wed, 30 Jun 2021 11:24:57 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1 MIME-Version: 1.0 In-Reply-To: <8a5fd52a-1cc9-6563-ee6c-f345b489654c@linux.ibm.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-Spam-Status: No, score=-5.4 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 30 Jun 2021 15:25:09 -0000 On 2021-06-28 2:26 a.m., Kewen.Lin wrote: > Hi! > > on 2021/6/9 下午1:18, Kewen.Lin via Gcc-patches wrote: >> Hi, >> >> PR100328 has some details about this issue, I am trying to >> brief it here. In the hottest function LBM_performStreamCollideTRT >> of SPEC2017 bmk 519.lbm_r, there are many FMA style expressions >> (27 FMA, 19 FMS, 11 FNMA). On rs6000, this kind of FMA style >> insn has two flavors: FLOAT_REG and VSX_REG, the VSX_REG reg >> class have 64 registers whose foregoing 32 ones make up the >> whole FLOAT_REG. There are some differences for these two >> flavors, taking "*fma4_fpr" as example: >> >> (define_insn "*fma4_fpr" >> [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa,wa") >> (fma:SFDF >> (match_operand:SFDF 1 "gpc_reg_operand" "%,wa,wa") >> (match_operand:SFDF 2 "gpc_reg_operand" ",wa,0") >> (match_operand:SFDF 3 "gpc_reg_operand" ",0,wa")))] >> >> // wa => A VSX register (VSR), vs0…vs63, aka. VSX_REG. >> // (f/d) => A floating point register, aka. FLOAT_REG. >> >> So for VSX_REG, we only have the destructive form, when VSX_REG >> alternative being used, the operand 2 or operand 3 is required >> to be the same as operand 0. reload has to take care of this >> constraint and create some non-free register copies if required. >> >> Assuming one fma insn looks like: >> op0 = FMA (op1, op2, op3) >> >> The best regclass of them are VSX_REG, when op1,op2,op3 are all dead, >> IRA simply creates three shuffle copies for them (here the operand >> order matters, since with the same freq, the one with smaller number >> takes preference), but IMO both op2 and op3 should take higher priority >> in copy queue due to the matching constraint. >> >> I noticed that there is one function ira_get_dup_out_num, which meant >> to create this kind of constraint copy, but the below code looks to >> refuse to create if there is an alternative which has valid regclass >> without spilled need. >> >> default: >> { >> enum constraint_num cn = lookup_constraint (str); >> enum reg_class cl = reg_class_for_constraint (cn); >> if (cl != NO_REGS >> && !targetm.class_likely_spilled_p (cl)) >> goto fail >> >> ... >> >> I cooked one patch attached to make ira respect this kind of matching >> constraint guarded with one parameter. As I stated in the PR, I was >> not sure this is on the right track. The RFC patch is to check the >> matching constraint in all alternatives, if there is one alternative >> with matching constraint and matches the current preferred regclass >> (or best of allocno?), it will record the output operand number and >> further create one constraint copy for it. Normally it can get the >> priority against shuffle copies and the matching constraint will get >> satisfied with higher possibility, reload doesn't create extra copies >> to meet the matching constraint or the desirable register class when >> it has to. >> >> For FMA A,B,C,D, I think ideally copies A/B, A/C, A/D can firstly stay >> as shuffle copies, and later any of A,B,C,D gets assigned by one >> hardware register which is a VSX register (VSX_REG) but not a FP >> register (FLOAT_REG), which means it has to pay costs once we can NOT >> go with VSX alternatives, so at that time it's important to respect >> the matching constraint then we can increase the freq for the remaining >> copies related to this (A/B, A/C, A/D). This idea requires some side >> tables to record some information and seems a bit complicated in the >> current framework, so the proposed patch aggressively emphasizes the >> matching constraint at the time of creating copies. >> > Comparing with the original patch (v1), this patch v3 has > considered: (this should be v2 for this mail list, but bump > it to be consistent as PR's). > > - Excluding the case where for one preferred register class > there can be two or more alternatives, one of them has the > matching constraint, while another doesn't have. So for > the given operand, even if it's assigned by a hardware reg > which doesn't meet the matching constraint, it can simply > use the alternative which doesn't have matching constraint > so no register move is needed. One typical case is > define_insn *mov_internal2 on rs6000. So we > shouldn't create constraint copy for it. > > - The possible free register move in the same register class, > disable this if so since the register move to meet the > constraint is considered as free. > > - Making it on by default, suggested by Segher & Vladimir, we > hope to get rid of the parameter if the benchmarking result > looks good on major targets. > > - Tweaking cost when either of matching constraint two sides > is hardware register. Before this patch, the constraint > copy is simply taken as a real move insn for pref and > conflict cost with one hardware register, after this patch, > it's allowed that there are several input operands > respecting the same matching constraint (but in different > alternatives), so we should take it to be like shuffle copy > for some cases to avoid over preferring/disparaging. > > Please check the PR comments for more details. > > This patch can be bootstrapped & regtested on > powerpc64le-linux-gnu P9 and x86_64-redhat-linux, but have some > "XFAIL->XPASS" failures on aarch64-linux-gnu. The failure list > was attached in the PR and thought the new assembly looks > improved (expected). > > With option Ofast unroll, this patch can help to improve SPEC2017 > bmk 508.namd_r +2.42% and 519.lbm_r +2.43% on Power8 while > 508.namd_r +3.02% and 519.lbm_r +3.85% on Power9 without any > remarkable degradations. > > Since this patch likely benefits x86_64 and aarch64, but I don't > have performance machines with these arches at hand, could > someone kindly help to benchmark it if possible? > > Many thanks in advance! > > btw, you can simply ignore the part about parameter > ira-consider-dup-in-all-alts (its name/description), it's sort of > stale, I let it be for now as we will likely get rid of it. Kewen, thank you for addressing remarks for the previous version of the patch.  The patch is ok to commit with some minor changes: o In a comment for function ira_get_dup_out_num there is no mention of effect of the param on the function returned value and returned value of single_input_op_has_cstr_p and this imho creates wrong function interface description. o It would be still nice to change name op_no to op_regno in ira_get_dup_out_num. It is ok to commit the patch to the mainline with condition that you submit the patch switching off the parameter for x86-64 right after that as Hongtao Liu has shown its negative effect on x86-64 SPEC2017. Thank you again for working on this issue. > gcc/ChangeLog: > > * doc/invoke.texi (ira-consider-dup-in-all-alts): Document new > parameter. > * ira.c (ira_get_dup_out_num): Adjust as parameter > param_ira_consider_dup_in_all_alts. > * params.opt (ira-consider-dup-in-all-alts): New. > * ira-conflicts.c (process_regs_for_copy): Add one parameter > single_input_op_has_cstr_p. > (get_freq_for_shuffle_copy): New function. > (add_insn_allocno_copies): Adjust as single_input_op_has_cstr_p. > * ira-int.h (ira_get_dup_out_num): Add one bool parameter.