From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/115385] Peeling for gaps can be optimized more or needs to peel more than one iteration
Date: Thu, 13 Jun 2024 06:22:51 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115385

--- Comment #2 from GCC Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:6669dc51515313dd1e60c493596dbc90429fc362

commit r15-1239-g6669dc51515313dd1e60c493596dbc90429fc362
Author: Richard Biener
Date:   Fri Jun 7 14:47:12 2024 +0200

tree-optimization/115385 - handle more gaps with peeling of a single iteration

The following makes peeling of a single scalar iteration handle more gaps, including
non-power-of-two cases. This can be done by rounding up the remaining
access to the next power of two, which ensures that the next scalar
iteration will access at least the excess elements we load.

I've added a correctness testcase and an x86-specific testcase that
scans for the optimization.

	PR tree-optimization/115385
	* tree-vect-stmts.cc (get_group_load_store_type): Peeling
	of a single scalar iteration is sufficient if we can narrow
	the access to the next power of two of the bits in the last
	access.
	(vectorizable_load): Ensure that the last access is narrowed.

	* gcc.dg/vect/pr115385.c: New testcase.
	* gcc.target/i386/vect-pr115385.c: Likewise.
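The rounding rule can be illustrated with a small C sketch of the arithmetic. This is a simplified model, not GCC's code: it assumes a single interleaved group with only trailing gaps, constant vector sizes, and that the target can actually perform the narrowed power-of-two load (GCC separately checks target support for that). All function and parameter names below are hypothetical. Relative to the start of the last vector access, the peeled scalar iteration touches elements up to index remain + group_size - 1. An access of `width` elements is therefore safe when width <= remain + group_size.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: smallest power of two that is >= n (n >= 1).  */
static unsigned next_pow2(unsigned n)
{
  assert(n >= 1);
  unsigned p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

/* Hypothetical model: number of needed elements in the last, partial
   vector access, or 0 if the needed elements fill whole vectors.
   group_size: elements per scalar iteration, including the gap
   gap:        unused trailing elements of each group
   vf:         scalar iterations per vector iteration
   nunits:     elements per full vector load (a power of two)  */
static unsigned last_access_remain(unsigned group_size, unsigned gap,
                                   unsigned vf, unsigned nunits)
{
  assert(gap < group_size && vf >= 1);
  assert(nunits > 0 && (nunits & (nunits - 1)) == 0);
  return (group_size * vf - gap) % nunits;
}

/* The peeled scalar iteration starts at index remain + gap and touches
   group_size - gap elements, so it reaches index remain + group_size - 1.
   An access of WIDTH elements stays inside that range iff
   width <= remain + group_size.  */
static bool covered_by_one_peel(unsigned width, unsigned remain,
                                unsigned group_size)
{
  return width <= remain + group_size;
}

/* Rule without narrowing: the last access loads a full vector.  */
static bool single_peel_full_width(unsigned group_size, unsigned gap,
                                   unsigned vf, unsigned nunits)
{
  unsigned remain = last_access_remain(group_size, gap, vf, nunits);
  return remain == 0 || covered_by_one_peel(nunits, remain, group_size);
}

/* Rule with narrowing: the last access is rounded up to the next power
   of two of the elements it actually needs.  */
static bool single_peel_narrowed(unsigned group_size, unsigned gap,
                                 unsigned vf, unsigned nunits)
{
  unsigned remain = last_access_remain(group_size, gap, vf, nunits);
  return remain == 0
         || covered_by_one_peel(next_pow2(remain), remain, group_size);
}
```

Take the hypothetical values group_size = 3, gap = 1, vf = 2 and nunits = 16. The last access then needs 5 elements. A full 16-element load would overrun the peeled iteration, which only reaches index 7. Rounding up to an 8-element load stays within it, so `single_peel_full_width(3, 1, 2, 16)` is false while `single_peel_narrowed(3, 1, 2, 16)` is true.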