From mboxrd@z Thu Jan  1 00:00:00 1970
From: "rsandifo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/113104] Suboptimal loop-based slp node splicing across iterations
Date: Sat, 30 Dec 2023 12:35:49 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rsandifo at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rsandifo at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_status cf_reconfirmed_on everconfirmed cc assigned_to
Content-Type: text/plain; charset="UTF-8"
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113104

Richard Sandiford changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2023-12-30
     Ever confirmed|0                             |1
                 CC|                              |rsandifo at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org

--- Comment #4 from Richard Sandiford ---
FWIW, we do get the desired code with
-march=armv8-a+sve (even though the test doesn't use SVE).  This is because of:

  /* Consider enabling VECT_COMPARE_COSTS for SVE, both so that we
     can compare SVE against Advanced SIMD and so that we can compare
     multiple SVE vectorization approaches against each other.  There's
     not really any point doing this for Advanced SIMD only, since the
     first mode that works should always be the best.  */
  if (TARGET_SVE && aarch64_sve_compare_costs)
    flags |= VECT_COMPARE_COSTS;

The testcase in this PR is a counterexample to the claim in the final sentence
of that comment.  I think the comment might predate significant support for
mixed-sized Advanced SIMD vectorisation.

If we enable SVE (or comment out the "if" line, so that the flag is set
unconditionally), the costs are 13 units per vector iteration for 128-bit
vectors and 4 units per vector iteration for 64-bit vectors (so 8 units per
128 bits on a parity basis).  The 64-bit version is therefore seen as
significantly cheaper and is chosen ahead of the 128-bit version.

I think this PR is enough proof that we should enable VECT_COMPARE_COSTS even
without SVE.  Assigning to myself for that.
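
For anyone following along, the parity comparison above can be sketched as a few lines of C.  This is only an illustration of the arithmetic, not how the vectoriser's cost comparison is actually written; cost_per_bits and cheaper_width are made-up names, and the sketch assumes the only thing being compared is cost per iteration scaled to the same amount of data.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: scale a per-iteration cost to a
   common amount of data (REFERENCE_BITS) so that loops using different
   vector widths can be compared on a parity basis.  */
unsigned int
cost_per_bits (unsigned int cost_per_iter, unsigned int vector_bits,
               unsigned int reference_bits)
{
  /* Assumption: the reference width is a whole multiple of the vector
     width, as it is for 64-bit and 128-bit Advanced SIMD vectors.  */
  assert (vector_bits > 0 && reference_bits % vector_bits == 0);
  return cost_per_iter * (reference_bits / vector_bits);
}

/* Hypothetical helper: return the vector width in bits (64 or 128) whose
   cost per 128 bits of data is lower, preferring 128 on a tie.  */
unsigned int
cheaper_width (unsigned int cost_128, unsigned int cost_64)
{
  unsigned int c128 = cost_per_bits (cost_128, 128, 128);
  unsigned int c64 = cost_per_bits (cost_64, 64, 128);
  return c64 < c128 ? 64 : 128;
}
```

With the numbers quoted above, cost_per_bits (4, 64, 128) gives 8 against 13 for the 128-bit loop, so cheaper_width (13, 4) picks the 64-bit version.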