From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 6DBAB3858D37; Sat, 21 Jan 2023 13:32:53 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6DBAB3858D37
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1674307973; bh=DgnM4WL5bGOkqkSKGRObyZOnp4BEdDAIoy+xBKCsWKY=; h=From:To:Subject:Date:In-Reply-To:References:From; b=L4N+fP5Ls0UOTA5MAlQKRJWhmDzBQA5b9r0UjXZo3nAe2ZOld1oCjhj9uo73iKUt0 V/Pqa69DJPNPzDQXqp7zk2lq5UVYM1031hxFjhat0H0m3qJtB12+4/z03RLZ6GQmKT msO/z1IiNW5cH9fozMmkzMmFwuGN7vLfeG8wppAU=
From: "Mark_B53 at yahoo dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
Date: Sat, 21 Jan 2023 13:32:53 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 12.2.0
X-Bugzilla-Keywords: needs-bisection
X-Bugzilla-Severity: normal
X-Bugzilla-Who: Mark_B53 at yahoo dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

--- Comment #2 from MARK BOURGEAULT ---
>> For fn1, assembly of the inner loop should be identical, so I think the 20% you were seeing may result from different loop alignment with respect to 32b fetch boundary

Yes, it does appear that this is the explanation for the difference.
Here are the full results:

original code
* gcc 10.3 -std=c++20 -O3 => fn1 = ~2000ms, fn2 = ~1000ms
* gcc 10.3 -std=c++20 -O3 -falign-loops=32 => fn1 = ~2000ms, fn2 = ~1000ms
* gcc 12.2 -std=c++20 -O3 => fn1 = ~2500ms, fn2 = ~32000ms
* gcc 12.2 -std=c++20 -O3 -falign-loops=32 => fn1 = ~2000ms, fn2 = ~32000ms

fn1 only
* gcc 10.3 -std=c++20 -O3 => fn1 = ~2500ms
* gcc 10.3 -std=c++20 -O3 -falign-loops=32 => fn1 = ~2000ms
* gcc 12.2 -std=c++20 -O3 => fn1 = ~2000ms
* gcc 12.2 -std=c++20 -O3 -falign-loops=32 => fn1 = ~2000ms

>> Also please note that cloud instances backing godbolt.org have different CPUs, so timing results from different runs are not directly comparable.

Yes, I know. I really only used godbolt to reach the conclusion that the issue still exists on trunk.
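
For anyone who wants to repeat this kind of comparison locally rather than on godbolt, here is a minimal sketch of a timing harness. It is not the test case from this report: fill_with_loop, fill_from_iota and time_ms are hypothetical names, and which of the two (if either) corresponds to fn1 or fn2 is an assumption. The pre-C++20 branch exists only so the sketch still builds without -std=c++20; it does not exercise iota_view.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <ranges>
#include <vector>

// Hypothetical variant A: populate the vector with an explicit loop.
std::vector<int> fill_with_loop(int n) {
    std::vector<int> v;
    v.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) v.push_back(i);
    return v;
}

// Hypothetical variant B: populate the vector from an iota_view.
std::vector<int> fill_from_iota(int n) {
#if defined(__cpp_lib_ranges)
    auto r = std::views::iota(0, n);
    return std::vector<int>(r.begin(), r.end());
#else
    // Fallback for pre-C++20 builds only; this path does not use iota_view.
    std::vector<int> v(static_cast<std::size_t>(n));
    std::iota(v.begin(), v.end(), 0);
    return v;
#endif
}

// Calls make() `repeats` times and returns the elapsed wall time in ms.
template <class F>
long long time_ms(F&& make, int repeats) {
    assert(repeats >= 0);
    volatile std::size_t sink = 0;  // keeps the results observable
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) sink = sink + make().size();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Built with the same flags as in the list above (-std=c++20 -O3, with and without -falign-loops=32), one would call something like time_ms([] { return fill_from_iota(1000); }, 100000) for each variant and compare the numbers on a single machine only.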