Message-ID: <4c3c9a1c-e182-30a9-342d-525adfb8cffd@gmail.com>
Date: Mon, 28 Aug 2023 17:22:35 -0600
Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA
To: Di Zhao OS, "gcc-patches@gcc.gnu.org"
From: Jeff Law

On 8/28/23 02:17, Di Zhao OS via Gcc-patches wrote:
> This patch tries to fix the 2% regression in 510.parest_r on
> ampere1 in the tracker. (Previous discussion is here:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)
>
> 1. Add testcases for the problem. For an op list in the form of
> "acc = a * b + c * d + acc", currently reassociation doesn't
> Swap the operands so that more FMAs can be generated.
> After widening_mul the result looks like:
>
>     _1 = .FMA(a, b, acc_0);
>     acc_1 = .FMA(c, d, _1);
>
> While previously (before the "Handle FMA friendly..." patch),
> widening_mul's result was like:
>
>     _1 = a * b;
>     _2 = .FMA (c, d, _1);
>     acc_1 = acc_0 + _2;
>
> If the code fragment is in a loop, some architecture can execute
> the latter in parallel, so the performance can be much faster than
> the former. For the small testcase, the performance gap is over
> 10% on both ampere1 and neoverse-n1. So the point here is to avoid
> turning the last statement into FMA, and keep it a PLUS_EXPR as
> much as possible. (If we are rewriting the op list into parallel,
> no special treatment is needed, since the last statement after
> rewrite_expr_tree_parallel will be PLUS_EXPR anyway.)
>
> 2. Function result_feeds_back_from_phi_p is to check for cross
> backedge dependency. Added new enum fma_state to describe the
> state of FMA candidates.
>
> With this patch, there's a 3% improvement in 510.parest_r 1-copy
> run on ampere1. The compile options are:
> "-Ofast -mcpu=ampere1 -flto --param avoid-fma-max-bits=512".
>
> Best regards,
> Di Zhao
>
> ----
>
> PR tree-optimization/110279
>
> gcc/ChangeLog:
>
>     * tree-ssa-reassoc.cc (enum fma_state): New enum to
>     describe the state of FMA candidates for an op list.
>     (rewrite_expr_tree_parallel): Changed boolean
>     parameter to enum type.
>     (result_feeds_back_from_phi_p): New function to check
>     for cross backedge dependency.
>     (rank_ops_for_fma): Return enum fma_state. Added new
>     parameter.
>     (reassociate_bb): If there's backedge dependency in an
>     op list, swap the operands before rewrite_expr_tree.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.dg/pr110279.c: New test.

Not a review, but more of a question -- isn't this transformation's profitability uarch sensitive?  I.e., just because it's bad for a set of aarch64 uarches doesn't mean it's bad everywhere.
And in general we shy away from trying to adjust gimple code based on uarch preferences.  It seems the right place to do this is gimple->rtl expansion.

Jeff
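
For readers following the thread, the two statement shapes in the quoted description can be sketched at the C source level. This is only an illustration under assumptions: the function names dot2_chained and dot2_final_add are made up here and are not taken from the patch or from gcc.dg/pr110279.c, and whether GCC actually forms .FMA calls for the marked statements depends on the target and on options such as -ffp-contract. The first loop chains both multiply-adds through acc; the second keeps the loop-carried update as a single add.

```c
#include <stddef.h>

/* Shape 1: both multiply-adds depend on the loop-carried value, so each
   iteration has two dependent operations on the path through acc.  */
double dot2_chained(const double *a, const double *b,
                    const double *c, const double *d, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double t = a[i] * b[i] + acc;   /* candidate for .FMA(a, b, acc_0) */
        acc = c[i] * d[i] + t;          /* candidate for .FMA(c, d, _1)    */
    }
    return acc;
}

/* Shape 2: the products are combined without touching acc, so only the
   final add sits on the path that crosses the loop backedge.  */
double dot2_final_add(const double *a, const double *b,
                      const double *c, const double *d, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double t = a[i] * b[i];
        double u = c[i] * d[i] + t;     /* candidate for .FMA(c, d, _1) */
        acc = acc + u;                  /* stays a plain addition       */
    }
    return acc;
}
```

Both functions compute the same sum for inputs where no rounding occurs; in general the results can differ in the last bits because the additions are associated differently, which is why reassociation of this kind is tied to options like -Ofast.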