From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x236.google.com (mail-lj1-x236.google.com [IPv6:2a00:1450:4864:20::236]) by sourceware.org (Postfix) with ESMTPS id B30B03858413 for ; Wed, 19 Jul 2023 09:05:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B30B03858413 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lj1-x236.google.com with SMTP id 38308e7fff4ca-2b741cf99f8so102606671fa.0 for ; Wed, 19 Jul 2023 02:05:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1689757556; x=1692349556; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=rAt5rtYaMfNDGjbSqHePxSKp4tmDWeb0sfPbQxkb7kA=; b=CoyWZNI85zDsqnWG5/R5paOjOMxg56jzyyPUP9LteTjx1WrpJXh3XhvTj27tgHdbAV iuUOrKepzUMFX5ZEuOG53pjkSFcrzi6R2FDHQhTAJT45Uxxkppg+erAgAFm6uQ7VWhuG NiDJqxbGNU7EmTtMQTSM1aTDMxLeeF0g3jxcihCRni4GqPexePyKVx+iT1uhGg4xz9xg c9FG4QiVEDmqTGo5ntcJFmpNN0kaXWX+UoYCZ0x8XEJo9FlNVWbU5cuFwnG//dqmMUcg yccyEBwIhXwSjeCX7b6B6batAPo9RkZjFj0Vil6e26JpaUxyHrxi6OespWk7JMdsNVj5 ausA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689757556; x=1692349556; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=rAt5rtYaMfNDGjbSqHePxSKp4tmDWeb0sfPbQxkb7kA=; b=Saf0r8rvPHKMQjIRa7FW+I344SgH/cXaMNw/0UHfLe2kxRDCq6C5Bd4zr4iYdq4b4i YtinkabZr7ellC2V0lOtjuHY8i2LCvuFLzNnVJi0L1MWRAPnVdZ1X0bJzle/4M7ELn+w HWJxmhadlxErpKSQpDXie7v6TUpa3orWpZJ24yLFUiw+9BWBijd/ZiYMKO1vkChlTyH0 SCwH49xyDNmuV/hwUBae5MK+XzDB7CqO5K6jipuFK7IUF90XTnboDY/P9lBAsfxfJMs0 ESJXnchms00AD1/snKe1dV4N66r+A9oDj503+ejyGIUlkIUjOyiPxwsAE3D1gHlQQonM e8Vg== X-Gm-Message-State: ABy/qLZPeMWXC5iVY5kG4jxZ2zmK8hk9W0EYSQlTRXro5VbYrnqXIHZK E8xOsBvLIGt3IQsTmb8kSaikGeDml5a08gXOCTs= X-Google-Smtp-Source: APBJJlHCb4BoX8bPHD9MR6CcZh02kjTvoi2K8OFEtgXjBDXisqp5ZBLPWvy/cqC3d/mPz6E1NXtcCAjcRcmT31WzEC0= X-Received: by 2002:a2e:731a:0:b0:2b6:e2aa:8fbc with SMTP id o26-20020a2e731a000000b002b6e2aa8fbcmr1401141ljc.8.1689757556075; Wed, 19 Jul 2023 02:05:56 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Wed, 19 Jul 2023 11:05:22 +0200 Message-ID: Subject: Re: [PATCH v2] tree-optimization/110279- Check for nested FMA chains in reassoc To: Di Zhao OS , Martin Jambor Cc: "gcc-patches@gcc.gnu.org" , Philipp Tomsich Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, Jul 11, 2023 at 5:00=E2=80=AFAM Di Zhao OS via Gcc-patches wrote: > > Attached is an updated version of the patch. > > Based on Philipp's review, some changes: > > 1. Defined new enum fma_state to describe the state of FMA candidates > for a list of operands. (Since the tests seems simple after the > change, I didn't add predicates on it.) > 2. Changed return type of convert_mult_to_fma_1 and convert_mult_to_fma > to tree, to remove the in/out parameter. > 3. Added description of return value values of rank_ops_for_fma. I'll note that rank_ops_for_fma works on the single-use addition chain only= so there might be FMA ops "elsewhere" in the dependence chain that are not in 'ops'. You are using defer_p =3D false for the fma_deferring_state so I wonder why you need this machinery at all? Wouldn't the restriction only apply when we'd actually deferred a FMA generation? I'm CCing Martin who did this wor= k. But what I'm curious about is that if any of the new conditions trigger the= n you leave the ops chain alone - but it _could_ already be in "bad" shape, is there anything we could do to improve ordering? Also if (ops_mult.length () >=3D 2 && ops_mult.length () !=3D ops_length) { + if (maybe_le (tree_to_poly_int64 (TYPE_SIZE (type)), + param_avoid_fma_max_bits)) + /* Avoid re-arrange to produce less FMA chains that can be slow. *= / + return FMA_STATE_MULTIPLE; + /* Put no-mult ops and mult ops alternately at the end of the queue, which is conducive to generating more FMA and reducing the loss of FMA when breaking the chain. */ @@ -6829,9 +6909,9 @@ rank_ops_for_fma (vec *ops) if (opindex > 0) opindex--; } - return true; + return FMA_STATE_MULTIPLE; so we end up parallel rewriting in any case and just avoid "optimizing" the ops list here. >From the PR it looks like without the rewriting we are lucky that the FMA generation heuristic to avoid cross backedge FMA dependences doesn't trigger without associating (but parallel rewrite is still good)? For the NESTED case we avoid parallel rewriting completely, independent on param_avoid_fma_max_bits - it seems from the PR we want more FMAs here and the parallel rewriting makes it worse? But I don't see exactly how. I think these are all a bit fragile heuristics also give that reassoc works on single-use chains only. The more we interwind FMA creation in widen_mult and associat= ing for FMA in reassoc the more I think that reassoc itself should form the FMA= s? The passes are unfortunately quite a bit separated. Can you produce testcases for the two problematical cases in the PR? That said, the added heuristics are not looking universally good to me with= out better motivation. > --- > gcc/ChangeLog: > Missing PR tree-optimization/110279 so bugzilla picks up the commit. > * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Added new parame= ter > check_only_p. Changed return type to tree. > (struct fma_transformation_info): Moved to header. > (class fma_deferring_state): Moved to header. > (convert_mult_to_fma): Added new parameter check_only_p. Changed > return type to tree. > * tree-ssa-math-opts.h (struct fma_transformation_info): Moved fr= om .cc. > (class fma_deferring_state): Moved from .cc. > (convert_mult_to_fma): Add function decl. > * tree-ssa-reassoc.cc (enum fma_state): Defined new enum to descr= ibe > the state of FMA candidates for a list of operands. > (rewrite_expr_tree_parallel): Changed boolean parameter to enum t= ype. > (rank_ops_for_fma): Return enum fma_state. > (reassociate_bb): Avoid rewriting to parallel if nested FMAs are = found. > > Thanks, > Di Zhao > >