From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-x532.google.com (mail-ed1-x532.google.com [IPv6:2a00:1450:4864:20::532]) by sourceware.org (Postfix) with ESMTPS id B2DEC3858C50 for ; Mon, 10 Jun 2024 18:27:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B2DEC3858C50 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=vrull.eu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=vrull.eu ARC-Filter: OpenARC Filter v1.0.0 sourceware.org B2DEC3858C50 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a00:1450:4864:20::532 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1718044037; cv=none; b=C0iFMWEdFGjmRG5Jug7mjWUqjAZL7lK/8HoYn31Nf9V10vMPBuQ+PghiWHhiJEQBUPOaxoViXFrqfvewWBqS6DNAmEUGfznH16pPt2eJCsAXIPW3dFQPmTmxloLYWm4QBWTgSzjO0xCf9n75YjcAx3/cpGRRaumq7bUN9N+a9LE= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1718044037; c=relaxed/simple; bh=IAlBVqmVC/Rz6uo2pA2allB0NEEOwqgZ4++xanjUwX4=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=bPEMs4PfFRS4Uvvp0I4Au1GUXgzsqGOwBPVPsNlUymfTB2sHyekuqTjahYZbF7Mc1QSElkIxOo0rq+33C6lsjDTAxh8lgUEMqNTlfDBN/1IPXgq26is9n6/y55kWqXlbdfjuXgNmbsyeZ7F5bKPgJhwv5nYTXLAlv9nJdhO12cw= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-ed1-x532.google.com with SMTP id 4fb4d7f45d1cf-579fa270e53so244087a12.3 for ; Mon, 10 Jun 2024 11:27:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vrull.eu; s=google; t=1718044034; x=1718648834; darn=gcc.gnu.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=EcjKFOeXqe164zLNFxCU//FT3GdG5Qq4HLB4jkVFew8=; b=nmIXHVGV4lxWzMWo7jAExqFNk+S7VuKKxZNc324FelRi+QhYviUc2dcRhjguAqYyJw qX6gWmVh5e8cV2Az4zDVRG9bRvJjA0DMeTw68qBUDTh8AsGWnPt9eVO5juENgDc/zb4h syeEjQlDD0u8U3MdIqAzk2YgngRKsd5Ueu/CH0IyqCm3fCbZpndYq2BZobA5K2P7GVew DAF2WA3jolX7HCN0BVk8fFl8sdlYQWH3DLTf+jjG637ENpSwRfX2+/yjomciIfZKJ/eu UXHk17ZA+dlEeJq6bgsGAkYB1zaoQY7HYuGAkU/ZtfqoElKXLovFuQUYVi0TSSt1oCkT TJaw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718044034; x=1718648834; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=EcjKFOeXqe164zLNFxCU//FT3GdG5Qq4HLB4jkVFew8=; b=Us9hti5MZLgUo0UCIfTr7JVXOr95vunL4G9pNIULmw0QaXP+NTgpgL6Xx1PTexbkMc WDfooSuKbFi4tU6Y9S9k1HcXWhO3KYFVEAH2sn4SEvrjtJ3rPJNUV81rLWowIMrzpBT+ V7wV0j3hvUlZWhkQ+JR6HBFZJ/r0qO8s4CPra9vyvgZM5LgaJxEDyYB8Ta7EuKDD2yok Mxn3RyMrYJWYCVvxC/gpiL4l8qzSXo1Fu3NWocUteRuvhhIv3k6dZtyK0nHiUCNTT84v w/0gH1Lf/HgoFVjfTC4fVIdzq5C4wvi9wkYFfHYITzA51A+wmfd7H4NTnisrnOCQfNBI FTrg== X-Forwarded-Encrypted: i=1; AJvYcCXlfrmo43+TkKucMa3v3HlG66efup6/fH9M9jdWtWAJG3cno8PTj2JIF8H3jIMkJWOyVgBN9bgTUWvPi5EQUTSFbloxSQWNjQ== X-Gm-Message-State: AOJu0YwBEZiLO1ImQ8/q/1DQV7HfUEAQAC1tO/2firMzMGYJbIlh+OEQ 57mj3zzaM8Jx+XiwAGms7O3oK3z9CFBhk9p0qqy5iS0mIwE0/BuWFlnFrkCyqud30pO7aL8mf8l MrJYQRxnwiEk5km+Ej5enYwWDadXUJq4xoeoIHg== X-Google-Smtp-Source: AGHT+IFPLE67jtGb2EA9uhBDD67/4pYrbr1i+PCRN6pzFHDaRJMAriy+0hrtQfUbKe7yuBfCRLPHnpQJo/12xvW4aTI= X-Received: by 2002:a50:d69c:0:b0:57c:5f7e:d0f1 with SMTP id 4fb4d7f45d1cf-57c5f7ed498mr6259825a12.38.1718044033955; Mon, 10 Jun 2024 11:27:13 -0700 (PDT) MIME-Version: 1.0 References: <20240606101043.3682477-1-manolis.tsamis@vrull.eu> <264e248d-cb54-4d3d-860d-193fd7be1049@gmail.com> In-Reply-To: <264e248d-cb54-4d3d-860d-193fd7be1049@gmail.com> From: Philipp Tomsich Date: Mon, 10 Jun 2024 20:27:02 +0200 Message-ID: Subject: Re: [PATCH v2] Target-independent store forwarding avoidance. To: Jeff Law Cc: Manolis Tsamis , gcc-patches@gcc.gnu.org, Richard Biener , =?UTF-8?Q?Christoph_M=C3=BCllner?= , Jiangning Liu , Jakub Jelinek , Andrew Pinski Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, 10 Jun 2024 at 20:03, Jeff Law wrote: > > > > On 6/10/24 1:55 AM, Manolis Tsamis wrote: > > >> > > There was an older submission of a load-pair specific pass but this is > > a complete reimplementation and indeed significantly more general. > > Apart from being target independant, it addresses a number of > > important restrictions and can handle multiple store forwardings per > > load. > > It should be noted that it cannot handle the load-pair cases as these > > need special handling, but that's something we're planning to do in > > the future by reusing this infrastructure. > ACK. Thanks for the additional background. > > > > > >> > >>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > >>> index 4e8967fd8ab..c769744d178 100644 > >>> --- a/gcc/doc/invoke.texi > >>> +++ b/gcc/doc/invoke.texi > >>> @@ -12657,6 +12657,15 @@ loop unrolling. > >>> This option is enabled by default at optimization levels @option{-O1}, > >>> @option{-O2}, @option{-O3}, @option{-Os}. > >>> > >>> +@opindex favoid-store-forwarding > >>> +@item -favoid-store-forwarding > >>> +@itemx -fno-avoid-store-forwarding > >>> +Many CPUs will stall for many cycles when a load partially depends on previous > >>> +smaller stores. This pass tries to detect such cases and avoid the penalty by > >>> +changing the order of the load and store and then fixing up the loaded value. > >>> + > >>> +Disabled by default. > >> Is there any particular reason why this would be off by default at -O1 > >> or higher? It would seem to me that on modern cores that this > >> transformation should easily be a win. Even on an old in-order core, > >> avoiding the load with the bit insert is likely profitable, just not as > >> much so. > >> > > I don't have a strong opinion for that but I believe Richard's > > suggestion to decide this on a per-target basis also makes a lot of > > sense. > > Deciding whether the transformation is profitable is tightly tied to > > the architecture in question (i.e. how large the stall is and what > > sort of bit-insert instructions are available). > > In order to make this more widely applicable, I think we'll need a > > target hook that decides in which case the forwarded stores incur a > > penalty and thus the transformation makes sense. > You and Richi are probably right. I'm not a big fan of passes being > enabled/disabled on particular targets, but it may make sense here. > > > > > Afaik, for each CPU there may be cases that store forwarding is > > handled efficiently. > Absolutely. But forwarding from a smaller store to a wider load is > painful from a hardware standpoint and if we can avoid it from a codegen > standpoint, we should. This change is what I briefly hinted as "the complete solution" that we had on the drawing board when we briefly talked last November in Santa Clara. > Did y'all look at spec2017 at all for this patch? I've got our hardware > guys to expose a signal for this case so that we can (in a month or so) > get some hard data on how often it's happening in spec2017 and evaluate > how this patch helps the most affected workloads. But if y'all already > have some data we can use it as a starting point. We have looked at all of SPEC2017, especially for coverage (i.e., making sure we see a significant number of uses of the transformation) and correctness. The gcc_r and parest_r components triggered in a number of "interesting" ways (e.g., motivating the case of load-elimination). If it helps, we could share the statistics for how often the pass triggers on compiling each of the SPEC2017 components? Philipp.