Message-ID: <264e248d-cb54-4d3d-860d-193fd7be1049@gmail.com>
Date: Mon, 10 Jun 2024 12:03:14 -0600
Subject: Re: [PATCH v2] Target-independent store forwarding avoidance.
To: Manolis Tsamis
Cc: gcc-patches@gcc.gnu.org, Richard Biener, Philipp Tomsich, Christoph Müllner, Jiangning Liu, Jakub Jelinek, Andrew Pinski
References: <20240606101043.3682477-1-manolis.tsamis@vrull.eu>
From: Jeff Law

On 6/10/24 1:55 AM, Manolis Tsamis wrote:
>>
> There was an older submission of a load-pair specific pass but this is
> a complete reimplementation and indeed significantly more general.
> Apart from being target independent, it addresses a number of
> important restrictions and can handle multiple store forwardings per
> load.
> It should be noted that it cannot handle the load-pair cases as these
> need special handling, but that's something we're planning to do in
> the future by reusing this infrastructure.
ACK. Thanks for the additional background.

>
>>
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index 4e8967fd8ab..c769744d178 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -12657,6 +12657,15 @@ loop unrolling.
>>>  This option is enabled by default at optimization levels @option{-O1},
>>>  @option{-O2}, @option{-O3}, @option{-Os}.
>>>
>>> +@opindex favoid-store-forwarding
>>> +@item -favoid-store-forwarding
>>> +@itemx -fno-avoid-store-forwarding
>>> +Many CPUs will stall for many cycles when a load partially depends on previous
>>> +smaller stores. This pass tries to detect such cases and avoid the penalty by
>>> +changing the order of the load and store and then fixing up the loaded value.
>>> +
>>> +Disabled by default.
>> Is there any particular reason why this would be off by default at -O1
>> or higher? It would seem to me that on modern cores this
>> transformation should easily be a win. Even on an old in-order core,
>> avoiding the load with the bit insert is likely profitable, just not as
>> much so.
>>
> I don't have a strong opinion on that, but I believe Richard's
> suggestion to decide this on a per-target basis also makes a lot of
> sense.
> Deciding whether the transformation is profitable is tightly tied to
> the architecture in question (i.e. how large the stall is and what
> sort of bit-insert instructions are available).
> In order to make this more widely applicable, I think we'll need a
> target hook that decides in which cases the forwarded stores incur a
> penalty and thus the transformation makes sense.
You and Richi are probably right. I'm not a big fan of passes being
enabled/disabled on particular targets, but it may make sense here.

> AFAIK, for each CPU there may be cases where store forwarding is
> handled efficiently.
Absolutely. But forwarding from a smaller store to a wider load is
painful from a hardware standpoint, and if we can avoid it from a codegen
standpoint, we should.

Did y'all look at spec2017 at all for this patch? I've got our hardware
guys to expose a signal for this case so that we can (in a month or so)
get some hard data on how often it's happening in spec2017 and evaluate
how this patch helps the most affected workloads. But if y'all already
have some data, we can use it as a starting point.

jeff
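For readers less familiar with the pattern discussed in this thread, the following is a minimal source-level C sketch of what the quoted invoke.texi text describes: a narrow store followed by a wider, overlapping load, and the reordered form where the wide load is issued first and the stored byte is then merged in with a bit insert. It is only an illustration under stated assumptions: the function names (`store_then_load`, `load_then_fixup`, `host_is_little_endian`) are hypothetical, the real pass works on RTL inside GCC and none of its code is shown here, the buffer is assumed to be 8 bytes with `off < 8`, and the host is assumed to be plain big- or little-endian. A compiler may also optimize either function differently, so this says nothing about actual performance.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns nonzero on a little-endian host.
   Assumes the host is either little- or big-endian (no mixed orders). */
static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Original pattern: a 1-byte store followed by an overlapping 8-byte load.
   The load partially depends on the smaller store, which is the case many
   cores cannot forward cheaply. Assumes buf has 8 bytes and off < 8. */
static uint64_t store_then_load(uint8_t *buf, size_t off, uint8_t val) {
    uint64_t wide;
    buf[off] = val;                   /* narrow store */
    memcpy(&wide, buf, sizeof wide);  /* wide load overlapping the store */
    return wide;
}

/* Reordered pattern (sketch): issue the wide load before the store, keep
   the store so memory ends up identical, then insert the stored byte into
   the loaded value so the result matches store_then_load. */
static uint64_t load_then_fixup(uint8_t *buf, size_t off, uint8_t val) {
    uint64_t wide;
    memcpy(&wide, buf, sizeof wide);  /* wide load no longer depends on the store */
    buf[off] = val;                   /* the store itself still happens */

    /* Bit position of byte 'off' inside the loaded integer depends on
       the host byte order. */
    unsigned shift = host_is_little_endian()
                         ? (unsigned)(8 * off)
                         : (unsigned)(8 * (sizeof wide - 1 - off));
    uint64_t mask = (uint64_t)0xff << shift;
    return (wide & ~mask) | ((uint64_t)val << shift);  /* bit insert */
}
```

Both functions leave the buffer in the same state and return the same value; the difference is only in the order of the memory operations. Whether trading the forwarding stall for the extra mask-and-insert is a win depends on the core, which is exactly the per-target profitability question raised above.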