From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
	id C772C3858D20; Sat, 21 Jan 2023 11:48:07 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C772C3858D20
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1674301687;
	bh=Fyeqt/ET8GVMUrL0CVj0mcq5LfLpdeJDhFnb1wQZ7GU=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=OUKMlirXgCpPzenhO24F6njdhMGbAgTQ0eq8Vb+kEI4pwAOdQsUuwFxl/UEzxdCZf
	 AFDnibVcpEcGpKzYDJLxmvDqSuBJlDe6D3KaIRMnuf3Gt9Jch5AhEpWj3WXvkfNADp
	 6F97InikpWqTTAKtXAecH1ROtlyCIy+joqr80OL8=
From: "amonakov at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view
Date: Sat, 21 Jan 2023 11:48:06 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 12.2.0
X-Bugzilla-Keywords: needs-bisection
X-Bugzilla-Severity: normal
X-Bugzilla-Who: amonakov at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: component keywords short_desc cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487

Alexander Monakov changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|rtl-optimization            |tree-optimization
           Keywords|                            |needs-bisection
            Summary|~20-30x slowdown in         |[10/11/12/13 Regression]
                   |populating std::vector from |~20-30x slowdown in
                   |std::ranges::iota_view      |populating std::vector from
                   |                            |std::ranges::iota_view
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov ---
Regarding fn1, would you mind re-running the test on your Xeon CPU with fn2
removed from the source code and -falign-loops=32 added to the gcc command
line? For fn1, the assembly of the inner loop should be identical, so I think
the 20% you were seeing may result from different loop alignment with respect
to the 32-byte fetch boundary.

Also please note that the cloud instances backing godbolt.org have different
CPUs, so timing results from different runs are not directly comparable.

Regarding fn2, this may partially be a library issue: compiling preprocessed
source from gcc-10.4 using gcc-10.2 also exhibits the problem. The inner loop
becomes significantly more complicated. Bisecting should be helpful.