From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hongtao Liu
Date: Mon, 8 Jan 2024 11:16:50 +0800
Subject: Re: Disable FMADD in chains for Zen4 and generic
To: Jan Hubicka
Cc: gcc-patches@gcc.gnu.org, hongtao.liu@intel.com, hongjiu.lu@intel.com, "Zhang, Annita"

On Thu, Dec 14, 2023 at 12:03 AM Jan Hubicka
wrote:
>
> > > The difference is that Cores understand the fact that fmadd does not need
> > > all three parameters to start computation, while Zen cores don't.
> > >
> > > Since this seems a noticeable win on zen and not a loss on Core it seems like a good
> > > default for generic.
> > >
> > > I plan to commit the patch next week if there are no complaints.
> > The generic part LGTM. (It's exactly what we proposed in [1])
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637721.html
>
> Thanks. I wonder if we can think of other generic changes that would make
> sense to do?
> Concerning zen4 and FMA, it is not really a win with AVX512 enabled
> (which is what I was benchmarking for znver4 tuning), but indeed it is
> a win with AVX256 where the extra latency is not hidden by the parallelism
> exposed by doing everything twice.
>
> I re-benchmarked zen4 and it behaves similarly to zen3 with avx256, so
> for x86-64-v3 this makes sense.
>
> Honza
> > >
> > > Honza
> > >
> > > #include <stdio.h>
> > > #include <time.h>
> > >
> > > #define SIZE 1000
> > >
> > > float a[SIZE][SIZE];
> > > float b[SIZE][SIZE];
> > > float c[SIZE][SIZE];
> > >
> > > void init(void)
> > > {
> > >   int i, j, k;
> > >   for(i=0; i<SIZE; i++)
> > >   {
> > >     for(j=0; j<SIZE; j++)
> > >     {
> > >       a[i][j] = (float)i + j;
> > >       b[i][j] = (float)i - j;
> > >       c[i][j] = 0.0f;
> > >     }
> > >   }
> > > }
> > >
> > > void mult(void)
> > > {
> > >   int i, j, k;
> > >
> > >   for(i=0; i<SIZE; i++)
> > >   {
> > >     for(j=0; j<SIZE; j++)
> > >     {
> > >       for(k=0; k<SIZE; k++)
> > >       {
> > >         c[i][j] += a[i][k] * b[k][j];
> > >       }
> > >     }
> > >   }
> > > }
> > >
> > > int main(void)
> > > {
> > >   clock_t s, e;
> > >
> > >   init();
> > >   s=clock();
> > >   mult();
> > >   e=clock();
> > >   printf(" mult took %10d clocks\n", (int)(e-s));
> > >
> > >   return 0;
> > >
> > > }
> > >
> > >         * config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS, X86_TUNE_AVOID_256FMA_CHAINS)
> > >         Enable for znver4 and Core.
> > >
> > > diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
> > > index 43fa9e8fd6d..74b03cbcc60 100644
> > > --- a/gcc/config/i386/x86-tune.def
> > > +++ b/gcc/config/i386/x86-tune.def
> > > @@ -515,13 +515,13 @@ DEF_TUNE (X86_TUNE_USE_SCATTER_8PARTS, "use_scatter_8parts",
> > >
> > >  /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
> > >     smaller FMA chain.  */
> > > -DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | m_ZNVER2 | m_ZNVER3
> > > -          | m_YONGFENG)
> > > +DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1 | m_ZNVER2 | m_ZNVER3 | m_ZNVER4
> > > +          | m_YONGFENG | m_GENERIC)
> > >
> > >  /* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
> > >     smaller FMA chain.  */
> > > -DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 | m_ZNVER3
> > > -          | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM)
> > > +DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2 | m_ZNVER3 | m_ZNVER4
> > > +          | m_CORE_HYBRID | m_SAPPHIRERAPIDS | m_CORE_ATOM | m_GENERIC)
Can we backport the patch (at least the generic part) to the
GCC11/GCC12/GCC13 release branches?
> > >
> > >  /* X86_TUNE_AVOID_512FMA_CHAINS: Avoid creating loops with tight 512bit or
> > >     smaller FMA chain.  */
> > >
> > >
> > --
> > BR,
> > Hongtao

-- 
BR,
Hongtao
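
P.S. For readers of the archive, a minimal sketch of the two loop shapes
these tunings choose between, reduced from the inner k loop of the test
program. It is not part of the patch: the function names dot_fused_form
and dot_split_form are made up for illustration, and what the compiler
actually emits depends on the target, the tuning and the FP contraction
settings (with FMA available, GCC may fuse either form).

```c
#include <assert.h>
#include <stddef.h>

/* Shape 1: multiply and accumulate in one expression.  When contracted to
   a single FMA, the accumulator is an FMA input, so each iteration's FMA
   must wait for the previous one: the whole FMA latency sits on the
   loop-carried dependency chain.  */
float dot_fused_form(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float acc = 0.0f;
    for (size_t k = 0; k < n; k++)
        acc = a[k] * b[k] + acc;
    return acc;
}

/* Shape 2: the multiply is kept as its own operation.  The product does
   not depend on acc, so it can be computed ahead of time and only the add
   remains on the loop-carried chain.  This is the shape the avoid-FMA-chain
   tunings aim to preserve.  */
float dot_split_form(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);
    float acc = 0.0f;
    for (size_t k = 0; k < n; k++) {
        float prod = a[k] * b[k];  /* independent of acc */
        acc += prod;               /* only the add is serialized */
    }
    return acc;
}
```

Both functions compute the same reduction; for small exactly representable
inputs such as {1, 2, 3, 4} and {5, 6, 7, 8} both return 70. The only
intended difference is which operation ends up on the critical path, and
that is best confirmed by inspecting the generated assembly.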