From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4c3c9a1c-e182-30a9-342d-525adfb8cffd@gmail.com>
Date: Mon, 28 Aug 2023 17:22:35 -0600
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.13.0
Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to
 reduce cross backedge FMA
Content-Language: en-US
To: Di Zhao OS <dizhao@os.amperecomputing.com>,
 "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
References: <SN6PR01MB4240A22F29D390F5B96FD057E8E0A@SN6PR01MB4240.prod.exchangelabs.com>
From: Jeff Law <jeffreyalaw@gmail.com>
In-Reply-To: <SN6PR01MB4240A22F29D390F5B96FD057E8E0A@SN6PR01MB4240.prod.exchangelabs.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: <gcc-patches.gcc.gnu.org>


On 8/28/23 02:17, Di Zhao OS via Gcc-patches wrote:
> This patch tries to fix the 2% regression in 510.parest_r on
> ampere1 in the tracker. (Previous discussion is here:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)
> 
> 1. Add testcases for the problem. For an op list in the form of
> "acc = a * b + c * d + acc", currently reassociation doesn't
> swap the operands so that more FMAs can be generated.
> After widening_mul the result looks like:
> 
>     _1 = .FMA(a, b, acc_0);
>     acc_1 = .FMA(c, d, _1);
> 
> While previously (before the "Handle FMA friendly..." patch),
> widening_mul's result was like:
> 
>     _1 = a * b;
>     _2 = .FMA (c, d, _1);
>     acc_1 = acc_0 + _2;
> 
> If the code fragment is in a loop, some architectures can execute
> the latter in parallel, so it can be much faster than
> the former. For the small testcase, the performance gap is over
> 10% on both ampere1 and neoverse-n1. So the point here is to avoid
> turning the last statement into FMA, and keep it a PLUS_EXPR as
> much as possible. (If we are rewriting the op list into parallel,
> no special treatment is needed, since the last statement after
> rewrite_expr_tree_parallel will be PLUS_EXPR anyway.)
> 
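For illustration only (this is not the testcase from the patch), the two shapes quoted above can be sketched in C with plain multiplies and adds, so that the association is visible. The function names are hypothetical, and whether a compiler actually contracts either form into FMAs depends on the target and flags. In the first form the accumulator is carried through both multiply-adds of an iteration; in the second only the final add depends on the previous iteration.

```c
/* Shape 1: acc feeds the first multiply-add, whose result feeds the
   second, mirroring
       _1 = .FMA (a, b, acc_0);  acc_1 = .FMA (c, d, _1);
   so the loop-carried chain is two dependent operations long.  */
static double acc_chained(const double *a, const double *b,
                          const double *c, const double *d, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++) {
        double t = a[i] * b[i] + acc;
        acc = c[i] * d[i] + t;
    }
    return acc;
}

/* Shape 2: the products are combined first and acc is added last,
   mirroring
       _1 = a * b;  _2 = .FMA (c, d, _1);  acc_1 = acc_0 + _2;
   so only the final add sits on the loop-carried chain.  */
static double acc_plus_last(const double *a, const double *b,
                            const double *c, const double *d, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++) {
        double t1 = a[i] * b[i];
        double t2 = c[i] * d[i] + t1;
        acc = acc + t2;
    }
    return acc;
}
```

Both functions compute the same sum when every intermediate value is exactly representable; in general the different association can change floating-point rounding, which is why this kind of reassociation is normally only done under options such as -Ofast.
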
> 2. The new function result_feeds_back_from_phi_p checks for a
> cross-backedge dependency. A new enum, fma_state, describes the
> state of FMA candidates.
> 
> With this patch, there's a 3% improvement in 510.parest_r 1-copy
> run on ampere1. The compile options are:
> "-Ofast -mcpu=ampere1 -flto --param avoid-fma-max-bits=512".
> 
> Best regards,
> Di Zhao
> 
> ----
> 
>          PR tree-optimization/110279
> 
> gcc/ChangeLog:
> 
>          * tree-ssa-reassoc.cc (enum fma_state): New enum to
>          describe the state of FMA candidates for an op list.
>          (rewrite_expr_tree_parallel): Changed boolean
>          parameter to enum type.
>          (result_feeds_back_from_phi_p): New function to check
>          for cross backedge dependency.
>          (rank_ops_for_fma): Return enum fma_state. Added new
>          parameter.
>          (reassociate_bb): If there's backedge dependency in an
>          op list, swap the operands before rewrite_expr_tree.
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.dg/pr110279.c: New test.
Not a review, but more of a question -- isn't this transformation's 
profitability uarch sensitive?  I.e., just because it's bad for a set of 
aarch64 uarches doesn't mean it's bad everywhere.

And in general we shy away from trying to adjust gimple code based on 
uarch preferences.

It seems the right place to do this is gimple->rtl expansion.

Jeff