From mboxrd@z Thu Jan  1 00:00:00 1970
From: "liuhongt at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/112325] Missed vectorization of reduction after unrolling
Date: Tue, 27 Feb 2024 07:26:07 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: liuhongt at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
Content-Type: text/plain; charset="UTF-8"
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112325

--- Comment #11 from Hongtao Liu ---
>   Loop body is likely going to simplify further, this is difficult
>   to guess, we just decrease the result by 1/3.  */
>

This was introduced by r0-68074-g91a01f21abfe19:

/* Estimate number of insns of completely unrolled loop.  We assume
+   that the size of the unrolled loop is decreased in the
+   following way (the numbers of insns are based on what
+   estimate_num_insns returns for appropriate statements):
+
+   1) exit condition gets removed (2 insns)
+   2) increment of the control variable gets removed (2 insns)
+   3) All remaining statements are likely to get simplified
+      due to constant propagation.  Hard to estimate; just
+      as a heuristics we decrease the rest by 1/3.
+
+   NINSNS is the number of insns in the loop before unrolling.
+   NUNROLL is the number of times the loop is unrolled.  */
+
+static unsigned HOST_WIDE_INT
+estimated_unrolled_size (unsigned HOST_WIDE_INT ninsns,
+                         unsigned HOST_WIDE_INT nunroll)
+{
+  HOST_WIDE_INT unr_insns = 2 * ((HOST_WIDE_INT) ninsns - 4) / 3;
+  if (unr_insns <= 0)
+    unr_insns = 1;
+  unr_insns *= (nunroll + 1);
+
+  return unr_insns;
+}

r0-93444-g08f1af2ed022e0 then tries to do this more accurately by marking
likely-eliminated statements and subtracting them from the total insn count,
but the 2/3 factor is still kept:

+/* Estimate number of insns of completely unrolled loop.
+   It is (NUNROLL + 1) * size of loop body with taking into account
+   the fact that in last copy everything after exit conditional
+   is dead and that some instructions will be eliminated after
+   peeling.

-   NINSNS is the number of insns in the loop before unrolling.
-   NUNROLL is the number of times the loop is unrolled.  */
+   Loop body is likely going to simplify futher, this is difficult
+   to guess, we just decrease the result by 1/3.  */

 static unsigned HOST_WIDE_INT
-estimated_unrolled_size (unsigned HOST_WIDE_INT ninsns,
+estimated_unrolled_size (struct loop_size *size,
                          unsigned HOST_WIDE_INT nunroll)
 {
-  HOST_WIDE_INT unr_insns = 2 * ((HOST_WIDE_INT) ninsns - 4) / 3;
+  HOST_WIDE_INT unr_insns = ((nunroll)
+                             * (HOST_WIDE_INT) (size->overall
+                                                - size->eliminated_by_peeling));
+  if (!nunroll)
+    unr_insns = 0;
+  unr_insns += size->last_iteration - size->last_iteration_eliminated_by_peeling;
+
+  unr_insns = unr_insns * 2 / 3;
   if (unr_insns <= 0)
     unr_insns = 1;
-  unr_insns *= (nunroll + 1);

It looks to me like 1/3 overestimates the instructions that can be optimised
away, especially if we've subtracted eliminated_by_peeling