From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x22b.google.com (mail-lj1-x22b.google.com [IPv6:2a00:1450:4864:20::22b]) by sourceware.org (Postfix) with ESMTPS id 914393858C27 for ; Tue, 22 Nov 2022 08:28:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 914393858C27 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lj1-x22b.google.com with SMTP id u2so17141320ljl.3 for ; Tue, 22 Nov 2022 00:28:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=vbvbK5YQ3iH3udlyHw6ysgTthTprKr03PcrCGczcXko=; b=e1TEnfkrgiRlfuk1icW207GYEfyclQRC8wJuU5cV2OYUQ4EGBb9d6MEWkjcVRNw6bK NbXfTdhWCRrckG945q9CF2wSJRE1hpclkabWskKBSaOuHqy8LMxBLWXFjIpAz/b19p4i ZqKCuJ4gRzR4GVFx+OED22t2hHFCNHg8Rp/eprLychr7pOdVEnhh+RJmQoDAbyOgzahA WC/j+pdMsA9WiKFip1ctW4WZ2opJ19iv4s/TkifH8zGQAoD+n8z0jKqk7GvNa/CXDzXJ ivYxTbc11RZWH03+xtykSv4am17tLBW+x7d1Wo1LtCb7rDOWpB9H5FODcsjiotq2wnEg dtMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=vbvbK5YQ3iH3udlyHw6ysgTthTprKr03PcrCGczcXko=; b=YIvHgEqKegzhvpmmPoHer8PrPfHZIa6jDDLen6UByIiTYhqK3QtOlJe/vK4j3HbSBp 7RU9QUat68EOpl/oA3lH3rypOlUqv94HpvYn/DLvqwQ2HrS8aepxz65V1Unln9pKncwp gehlDC9F3yQA0ieiebISGdRt3VD5mmQwsJRe2K+IGpFa6XWJVpUfRnFW4VzKqL4gaSVV 3UCgufdtlmb0H0OJ2BCtht+8ZEVgl4bJmLLhB6Rba2dX0HTavwy3v94ZdeqwRYw44Prl P7XVAhl3XVBI7BrPNLmPrsoXizPdvRLkksvoH7kbge1tkwbAT4GHr+eGiOUv3CnrW0F1 PTWg== X-Gm-Message-State: ANoB5pkzMjHME5WbiuJN4A7Q0ysywVownMQai+UAo4xQlzkxJVg/HLbh SEi1809WOZ77z/I8OYjWvmwcKldLRG36nb5V4PsftAmw X-Google-Smtp-Source: AA0mqf556yDuUNNpjJg2+B9xFufjsuJPWcANJmDa6j2909aRtpMvEoiY/AoWHPLRbqs+dFS004jIUSm+qhgGTs65fE0= X-Received: by 2002:a2e:9dcf:0:b0:279:70ee:571e with SMTP id x15-20020a2e9dcf000000b0027970ee571emr328245ljj.249.1669105727968; Tue, 22 Nov 2022 00:28:47 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Tue, 22 Nov 2022 09:28:35 +0100 Message-ID: Subject: Re: [PATCH] AArch64: Add fma_reassoc_width [PR107413] To: Richard Sandiford , Wilco Dijkstra , GCC Patches Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, Nov 22, 2022 at 8:59 AM Richard Sandiford via Gcc-patches wrote: > > Wilco Dijkstra writes: > > Add a reassocation width for FMAs in per-CPU tuning structures. Keep the > > existing setting for cores with 2 FMA pipes, and use 4 for cores with 4 > > FMA pipes. This improves SPECFP2017 on Neoverse V1 by ~1.5%. > > > > Passes regress/bootstrap, OK for commit? > > > > gcc/ > > PR 107413 > > * config/aarch64/aarch64.cc (struct tune_params): Add > > fma_reassoc_width to all CPU tuning structures. > > * config/aarch64/aarch64-protos.h (struct tune_params): Add > > fma_reassoc_width. > > > > --- > > > > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h > > index a73bfa20acb9b92ae0475794c3f11c67d22feb97..71365a446007c26b906b61ca8b2a68ee06c83037 100644 > > --- a/gcc/config/aarch64/aarch64-protos.h > > +++ b/gcc/config/aarch64/aarch64-protos.h > > @@ -540,6 +540,7 @@ struct tune_params > > const char *loop_align; > > int int_reassoc_width; > > int fp_reassoc_width; > > + int fma_reassoc_width; > > int vec_reassoc_width; > > int min_div_recip_mul_sf; > > int min_div_recip_mul_df; > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > > index 798363bcc449c414de5bbb4f26b8e1c64a0cf71a..643162cdecd6a8fe5587164cb2d0d62b709a491d 100644 > > --- a/gcc/config/aarch64/aarch64.cc > > +++ b/gcc/config/aarch64/aarch64.cc > > @@ -1346,6 +1346,7 @@ static const struct tune_params generic_tunings = > > "8", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1382,6 +1383,7 @@ static const struct tune_params cortexa35_tunings = > > "8", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1415,6 +1417,7 @@ static const struct tune_params cortexa53_tunings = > > "8", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1448,6 +1451,7 @@ static const struct tune_params cortexa57_tunings = > > "8", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1481,6 +1485,7 @@ static const struct tune_params cortexa72_tunings = > > "8", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1514,6 +1519,7 @@ static const struct tune_params cortexa73_tunings = > > "8", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1548,6 +1554,7 @@ static const struct tune_params exynosm1_tunings = > > "4", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1580,6 +1587,7 @@ static const struct tune_params thunderxt88_tunings = > > "8", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1612,6 +1620,7 @@ static const struct tune_params thunderx_tunings = > > "8", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1646,6 +1655,7 @@ static const struct tune_params tsv110_tunings = > > "8", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1678,6 +1688,7 @@ static const struct tune_params xgene1_tunings = > > "16", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1710,6 +1721,7 @@ static const struct tune_params emag_tunings = > > "16", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1743,6 +1755,7 @@ static const struct tune_params qdf24xx_tunings = > > "16", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1778,6 +1791,7 @@ static const struct tune_params saphira_tunings = > > "16", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 1, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1811,6 +1825,7 @@ static const struct tune_params thunderx2t99_tunings = > > "16", /* loop_align. */ > > 3, /* int_reassoc_width. */ > > 2, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 2, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1844,6 +1859,7 @@ static const struct tune_params thunderx3t110_tunings = > > "16", /* loop_align. */ > > 3, /* int_reassoc_width. */ > > 2, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 2, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1876,6 +1892,7 @@ static const struct tune_params neoversen1_tunings = > > "32:16", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 2, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -1912,6 +1929,7 @@ static const struct tune_params ampere1_tunings = > > "32:16", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 2, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -2089,6 +2107,7 @@ static const struct tune_params neoversev1_tunings = > > "32:16", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 4, /* fma_reassoc_width. */ > > 2, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -2226,6 +2245,7 @@ static const struct tune_params neoverse512tvb_tunings = > > "32:16", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 4, /* fma_reassoc_width. */ > > 2, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -2414,6 +2434,7 @@ static const struct tune_params neoversen2_tunings = > > "32:16", /* loop_align. */ > > 2, /* int_reassoc_width. */ > > 4, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 2, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -2603,6 +2624,7 @@ static const struct tune_params neoversev2_tunings = > > "32:16", /* loop_align. */ > > 3, /* int_reassoc_width. */ > > 6, /* fp_reassoc_width. */ > > + 4, /* fma_reassoc_width. */ > > 3, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -2638,6 +2660,7 @@ static const struct tune_params a64fx_tunings = > > "32", /* loop_align. */ > > 4, /* int_reassoc_width. */ > > 2, /* fp_reassoc_width. */ > > + 1, /* fma_reassoc_width. */ > > 2, /* vec_reassoc_width. */ > > 2, /* min_div_recip_mul_sf. */ > > 2, /* min_div_recip_mul_df. */ > > @@ -3350,9 +3373,10 @@ aarch64_reassociation_width (unsigned opc, machine_mode mode) > > return aarch64_tune_params.vec_reassoc_width; > > if (INTEGRAL_MODE_P (mode)) > > return aarch64_tune_params.int_reassoc_width; > > - /* Avoid reassociating floating point addition so we emit more FMAs. */ > > - if (FLOAT_MODE_P (mode) && opc != PLUS_EXPR) > > - return aarch64_tune_params.fp_reassoc_width; > > + /* FMA's can have a different reassociation width. */ > > + if (FLOAT_MODE_P (mode)) > > + return opc == PLUS_EXPR ? aarch64_tune_params.fma_reassoc_width > > + : aarch64_tune_params.fp_reassoc_width; > > return 1; > > } > > I guess an obvious question is: if 1 (rather than 2) was the right value > for cores with 2 FMA pipes, why is 4 the right value for cores with 4 FMA > pipes? It would be good to clarify how, conceptually, the core property > should map to the fma_reassoc_width value. > > It sounds from the existing comment like the main motivation for returning 1 > was to encourage more FMAs to be formed, rather than to prevent FMAs from > being reassociated. Is that no longer an issue? Or is the point that, > with more FMA pipes, lower FMA formation is a price worth paying for > the better parallelism we get when FMAs can be formed? Would integrating FMA formation with reassoc when reassoc_width > 1 make sense? I think we should be able to add a special oplist optimization phase to more optimally distribute FMA pairs in the formed tree? Richard. > Does this code ever see opc == FMA? > > Thanks, > Richard