From: "lili.cui at intel dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/104271] [12 Regression] 538.imagick_r run-time at -Ofast -march=native regressed by 26% on Intel Cascade Lake server CPU
Date: Fri, 15 Apr 2022 11:04:42 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Target-Milestone: 12.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104271

--- Comment #9 from cuilili ---
Thank you very much for your reply. I debugged the SRA pass with the small
testcase and found that SRA does not handle this situation. SRA cannot split
the callee's first parameter because of this rule: "Do not decompose
non-BLKmode parameters in a way that would create a BLKmode parameter.
Especially for pass-by-reference (hence, pointer type parameters), it's not
worth it."

Before inline:

For caller
  store-1 : 128-bit store of struct "a" (it is an implicit store during the
            IPA pass; the store can only be found after a certain pass)
For callee
  load-1  : 128-bit load of struct "a" for the operation "c->a=(*a)"
  store-2 : 128-bit store of struct "c->a" for the operation "c->a=(*a)"
  load-2  : 4 * 32-bit loads for c->a.f1, c->a.f2, c->a.f3 and c->a.f4
            (because store-2 uses a vector register for the store, we cannot
            use the register directly here)

After inline:

For caller
  None.
For callee
  store-2 : 128-bit store of struct c->a for the operation "c->a=(*a)"

--------------------------------------------------------
int callee (struct A *a, struct C *c)
{
  c->a=(*a);

  if ((c->b + 7) & 17)
    {
      c->a.f1 = c->a.f2 + c->a.f3;
      c->a.f2 = c->a.f2 - c->a.f3;
      c->a.f3 = c->a.f2 + c->a.f3;
      c->a.f4 = c->a.f2 - c->a.f3;
      c->b = c->a.f2 + c->a.f4;
      return 0;
    }
  return 1;
}

int caller (int d, struct C *c)
{
  struct A a;
  a.f1 = 1 + d;
  a.f2 = 2;
  a.f3 = 12 + d;
  a.f4 = 68 + d;
  if (d > 0)
    return callee (&a, c);
  else
    return 1;
}
-------------------------------------------------

In 538.imagick_r (c_ray has similar code), the structure that is redundantly
stored and loaded is 256 bits (4 elements of 64 bits each). Inlining the hot
function there eliminates one 256-bit store, one 256-bit load, and four
64-bit loads.

Can I do it like this: compute the total size of all callee arguments for
which inlining can eliminate redundant loads and stores?

Thanks!
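
To make the question concrete, here is a rough sketch of the computation in plain C. It is only a model of the idea and does not use GCC internals: struct arg_info, its fields, and redundant_arg_bits are hypothetical names made up for illustration, and the two conditions used to decide whether an argument qualifies are my assumptions about what "can eliminate redundant loads and stores" would mean.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-argument summary; NOT a GCC data structure.  */
struct arg_info
{
  unsigned size_bits;        /* size of the aggregate behind the argument */
  bool caller_local;         /* address of an aggregate built in the caller */
  bool callee_copies_whole;  /* callee reads it back with a full-width load */
};

/* Sum the sizes of all arguments whose store/load round trip is
   assumed to disappear once the call is inlined.  */
unsigned
redundant_arg_bits (const struct arg_info *args, size_t n)
{
  unsigned total = 0;

  for (size_t i = 0; i < n; i++)
    if (args[i].caller_local && args[i].callee_copies_whole)
      total += args[i].size_bits;

  return total;
}
```

For the testcase above, "&a" would count as 128 bits and "c" would contribute nothing, since it does not point to a caller-local aggregate. For the hot function in 538.imagick_r the same sum would be 256 bits.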