From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 4106A3858038; Tue, 1 Mar 2022 09:33:46 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4106A3858038
From: "crazylht at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/101908] [12 regression] cray regression with -O2 -ftree-slp-vectorize compared to -O2
Date: Tue, 01 Mar 2022 09:33:45 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: crazylht at gmail dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 01 Mar 2022 09:33:46 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101908

--- Comment #29 from Hongtao.liu ---
From Agner Fog's excellent optimization manuals (https://www.agner.org/optimize/microarchitecture.pdf), for ICX/TGL:

An aligned write of 128 bits or more followed by a read of one or both of the two halves or the four quarters, etc., has little or no penalty. A partial read that does not fit into the halves or quarters fails to forward. The write-to-read latency is 19-20 clock cycles when forwarding fails. A read that is bigger than the write, or a read that covers both written and unwritten bytes, fails to forward. The write-to-read latency is 19-20 clock cycles.

And from the Intel software optimization guide:

There are several cases in which data is passed through memory, and the store may need to be separated from the load:
• Spills, save and restore registers in a stack frame.
• Parameter passing.
• Global and volatile variables.
• Type conversion between integer and floating-point.
• When compilers do not analyze code that is inlined, forcing variables that are involved in the interface with inlined code to be in memory, creating more memory variables and preventing the elimination of redundant loads.
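The ICX/TGL forwarding rules quoted from Agner Fog's manual can be summarized as a small decision function. This is a hypothetical model written only for illustration, not code from GCC, Agner Fog or Intel: the names `predict_forwarding`, `mem_access` and `fwd_result` are made up, "aligned" is assumed to mean naturally aligned, and stores smaller than 16 bytes are reported as unknown because the quoted text does not cover them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model (not from GCC, Agner Fog or Intel) of the ICX/TGL
   store-to-load forwarding rules quoted above. Offsets and sizes are
   in bytes. */
typedef enum { FWD_OK, FWD_FAIL, FWD_UNKNOWN } fwd_result;

typedef struct {
    unsigned offset; /* address of the first byte accessed */
    unsigned size;   /* number of bytes accessed */
} mem_access;

static bool is_pow2(unsigned x) { return x != 0 && (x & (x - 1)) == 0; }

static fwd_result predict_forwarding(mem_access store, mem_access load)
{
    assert(store.size > 0 && load.size > 0);

    /* "A read that is bigger than the write, or a read that covers both
       written and unwritten bytes, fails to forward." */
    if (load.offset < store.offset ||
        load.offset + load.size > store.offset + store.size)
        return FWD_FAIL;

    /* The half/quarter rule is only quoted for aligned writes of 128 bits
       (16 bytes) or more. Assumption: "aligned" means naturally aligned.
       Outside that case this sketch makes no prediction. */
    if (store.size < 16 || !is_pow2(store.size) || store.offset % store.size != 0)
        return FWD_UNKNOWN;

    /* The read must be the whole write or exactly one half, quarter,
       eighth, ... of it, i.e. a power-of-two size starting at a multiple
       of its own size within the write. */
    unsigned rel = load.offset - store.offset;
    if (is_pow2(load.size) && rel % load.size == 0)
        return FWD_OK;

    /* "A partial read that does not fit into the halves or quarters fails
       to forward"; the quoted penalty is then 19-20 clock cycles. */
    return FWD_FAIL;
}
```

For example, two separate 8-byte scalar stores followed by one 16-byte vector load of the same bytes give a read that is bigger than each write, so under this model neither store can forward to the load, which is the kind of pattern vectorized code can create after scalar stores.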