From: Richard Biener
Subject: Re: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width
Date: Fri, 22 Dec 2023 16:39:31 +0100
Message-Id: <8078F55F-D53C-43AF-817D-76E2C5C8BF79@gmail.com>
To: Di Zhao OS
Cc: Thomas Schwinge, gcc-patches@gcc.gnu.org

> On 22.12.2023 at 16:05, Di Zhao OS wrote:
>
> Updated the fix in attachment.
>
> Is it OK for trunk?

Ok

> Tested on aarch64-unknown-linux-gnu and x86_64-pc-linux-gnu.
>
> Thanks,
> Di Zhao
>
>> -----Original Message-----
>> From: Di Zhao OS
>> Sent: Sunday, December 17, 2023 8:31 PM
>> To: Thomas Schwinge; gcc-patches@gcc.gnu.org
>> Cc: Richard Biener
>> Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width
>>
>> Hello Thomas,
>>
>>> -----Original Message-----
>>> From: Thomas Schwinge
>>> Sent: Friday, December 15, 2023 5:46 PM
>>> To: Di Zhao OS; gcc-patches@gcc.gnu.org
>>> Cc: Richard Biener
>>> Subject: RE: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width
>>>
>>> Hi!
>>>
>>> On 2023-12-13T08:14:28+0000, Di Zhao OS wrote:
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/pr110279-2.c
>>>> @@ -0,0 +1,41 @@
>>>> +/* PR tree-optimization/110279 */
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-Ofast --param tree-reassoc-width=4 --param fully-pipelined-fma=1 -fdump-tree-reassoc2-details -fdump-tree-optimized" } */
>>>> +/* { dg-additional-options "-march=armv8.2-a" { target aarch64-*-* } } */
>>>> +
>>>> +#define LOOP_COUNT 800000000
>>>> +typedef double data_e;
>>>> +
>>>> +#include
>>>> +
>>>> +__attribute_noinline__ data_e
>>>> +foo (data_e in)
>>>
>>> Pushed to master branch commit 91e9e8faea4086b3b8aef2355fc12c1559d425f6
>>> "Fix 'gcc.dg/pr110279-2.c' syntax error due to '__attribute_noinline__'",
>>> see attached.
>>>
>>> However:
>>>
>>>> +{
>>>> +  data_e a1, a2, a3, a4;
>>>> +  data_e tmp, result = 0;
>>>> +  a1 = in + 0.1;
>>>> +  a2 = in * 0.1;
>>>> +  a3 = in + 0.01;
>>>> +  a4 = in * 0.59;
>>>> +
>>>> +  data_e result2 = 0;
>>>> +
>>>> +  for (int ic = 0; ic < LOOP_COUNT; ic++)
>>>> +    {
>>>> +      /* Test that a complete FMA chain with length=4 is not broken.  */
>>>> +      tmp = a1 + a2 * a2 + a3 * a3 + a4 * a4 ;
>>>> +      result += tmp - ic;
>>>> +      result2 = result2 / 2 - tmp;
>>>> +
>>>> +      a1 += 0.91;
>>>> +      a2 += 0.1;
>>>> +      a3 -= 0.01;
>>>> +      a4 -= 0.89;
>>>> +
>>>> +    }
>>>> +
>>>> +  return result + result2;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-tree-dump-not "was chosen for reassociation" "reassoc2"} } */
>>>> +/* { dg-final { scan-tree-dump-times {\.FMA } 3 "optimized"} } */
>>
>> Thank you for the fix.
>>
>>> ..., I still see these latter two tree dump scans FAIL, for GCN:
>>>
>>>     $ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
>>>       2 *: a3_40
>>>       2 *: a2_39
>>>     Width = 4 was chosen for reassociation
>>>     Transforming _15 = powmult_1 + powmult_3;
>>>      into _63 = powmult_1 + a1_38;
>>>     $ grep -F .FMA pr110279-2.c.265t.optimized
>>>       _63 = .FMA (a2_39, a2_39, a1_38);
>>>       _64 = .FMA (a3_40, a3_40, powmult_5);
>>>
>>> ..., nvptx:
>>>
>>>     $ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
>>>       2 *: a3_40
>>>       2 *: a2_39
>>>     Width = 4 was chosen for reassociation
>>>     Transforming _15 = powmult_1 + powmult_3;
>>>      into _63 = powmult_1 + a1_38;
>>>     $ grep -F .FMA pr110279-2.c.265t.optimized
>>>       _63 = .FMA (a2_39, a2_39, a1_38);
>>>       _64 = .FMA (a3_40, a3_40, powmult_5);
>>
>> For these 2 targets, the reassoc_width for FMUL is 1 (the default value),
>> while the testcase assumes it to be 4. The bug was introduced when I
>> updated the patch but forgot to update the testcase.
>>
>>> ..., but also x86_64-pc-linux-gnu:
>>>
>>>     $ grep -C2 'was chosen for reassociation' pr110279-2.c.197t.reassoc2
>>>       2 *: a3_40
>>>       2 *: a2_39
>>>     Width = 2 was chosen for reassociation
>>>     Transforming _15 = powmult_1 + powmult_3;
>>>      into _63 = powmult_1 + powmult_3;
>>>     $ grep -cF .FMA pr110279-2.c.265t.optimized
>>>     0
>>
>> For x86_64 this needs "-mfma". Sorry, the compile options missed that.
>> Can the change below fix these issues? I moved the tests into
>> testsuite/gcc.target/aarch64, since they rely on tunings.
>>
>> Tested on aarch64-unknown-linux-gnu.
>>
>>>
>>> Regards
>>>  Thomas
>>>
>>>
>>> -----------------
>>> Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
>>> München; limited liability company; Managing Directors: Thomas
>>> Heurung, Frank Thürauf; Registered office: München; Court of registration:
>>> München, HRB 106955
>>
>> Thanks,
>> Di Zhao
>>
>> ---
>>  gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-1.c | 3 +--
>>  gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-2.c | 3 +--
>>  2 files changed, 2 insertions(+), 4 deletions(-)
>>  rename gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-1.c (83%)
>>  rename gcc/testsuite/{gcc.dg => gcc.target/aarch64}/pr110279-2.c (78%)
>>
>> diff --git a/gcc/testsuite/gcc.dg/pr110279-1.c b/gcc/testsuite/gcc.target/aarch64/pr110279-1.c
>> similarity index 83%
>> rename from gcc/testsuite/gcc.dg/pr110279-1.c
>> rename to gcc/testsuite/gcc.target/aarch64/pr110279-1.c
>> index f25b6aec967..97d693f56a5 100644
>> --- a/gcc/testsuite/gcc.dg/pr110279-1.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/pr110279-1.c
>> @@ -1,6 +1,5 @@
>>  /* { dg-do compile } */
>> -/* { dg-options "-Ofast --param avoid-fma-max-bits=512 --param tree-reassoc-width=4 -fdump-tree-widening_mul-details" } */
>> -/* { dg-additional-options "-march=armv8.2-a" { target aarch64-*-* } } */
>> +/* { dg-options "-Ofast -mcpu=generic --param avoid-fma-max-bits=512 --param tree-reassoc-width=4 -fdump-tree-widening_mul-details" } */
>>
>>  #define LOOP_COUNT 800000000
>>  typedef double data_e;
>> diff --git a/gcc/testsuite/gcc.dg/pr110279-2.c b/gcc/testsuite/gcc.target/aarch64/pr110279-2.c
>> similarity index 78%
>> rename from gcc/testsuite/gcc.dg/pr110279-2.c
>> rename to gcc/testsuite/gcc.target/aarch64/pr110279-2.c
>> index b6b69969c6b..a88cb361fdc 100644
>> --- a/gcc/testsuite/gcc.dg/pr110279-2.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/pr110279-2.c
>> @@ -1,7 +1,6 @@
>>  /* PR tree-optimization/110279 */
>>  /* { dg-do compile } */
>> -/* { dg-options "-Ofast --param tree-reassoc-width=4 --param fully-pipelined-fma=1 -fdump-tree-reassoc2-details -fdump-tree-optimized" } */
>> -/* { dg-additional-options "-march=armv8.2-a" { target aarch64-*-* } } */
>> +/* { dg-options "-Ofast -mcpu=generic --param tree-reassoc-width=4 --param fully-pipelined-fma=1 -fdump-tree-reassoc2-details -fdump-tree-optimized" } */
>>
>>  #define LOOP_COUNT 800000000
>>  typedef double data_e;
>> --
>> 2.25.1
> <0001-Fix-compile-options-of-pr110279-1.c-and-pr110279-2.c.patch>
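
For readers following the thread, here is a small C sketch of why the `scan-tree-dump-times {\.FMA } 3` check is sensitive to the chosen reassociation width. It is an illustration only, not part of the patch: the function names `chain_form` and `split_form` are made up here, and the mapping to the quoted GCN/nvptx dumps is one reading of those dumps. Whether a compiler actually emits FMAs for either shape depends on the target and flags.

```c
typedef double data_e;

/* Serial shape of a1 + a2*a2 + a3*a3 + a4*a4: every step is a
   multiply feeding an add, so each of the three steps is a candidate
   for one fused multiply-add (three .FMA calls in total).  */
data_e
chain_form (data_e a1, data_e a2, data_e a3, data_e a4)
{
  data_e t = a2 * a2 + a1;   /* FMA candidate 1 */
  t = a3 * a3 + t;           /* FMA candidate 2 */
  t = a4 * a4 + t;           /* FMA candidate 3 */
  return t;
}

/* Shape after splitting the sum into two independent halves, as a
   wider reassociation would: a4*a4 is now a standalone multiply and
   the halves are joined by a plain add, leaving only two FMA
   candidates.  */
data_e
split_form (data_e a1, data_e a2, data_e a3, data_e a4)
{
  data_e lo = a2 * a2 + a1;  /* FMA candidate 1 */
  data_e p = a4 * a4;        /* plain multiply */
  data_e hi = a3 * a3 + p;   /* FMA candidate 2 */
  return lo + hi;            /* plain add */
}
```

Both functions compute the same sum (for example, both return 30.0 for the arguments 1.0, 2.0, 3.0, 4.0). The second shape resembles the two `.FMA` lines in the GCN and nvptx dumps quoted above, which would explain why a scan expecting three matches fails there.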