Date: Tue, 21 Jun 2016 22:34:00 -0000
From: Segher Boessenkool
To: Bill Schmidt
Cc: GCC Patches, David Edelsohn
Subject: Re: [PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available
Message-ID: <20160621223440.GA18969@gate.crashing.org>
In-Reply-To: <73436B77-BC85-4919-975E-915A6D93F585@linux.vnet.ibm.com>

On Tue, Jun 21, 2016 at 03:14:51PM -0500, Bill Schmidt wrote:
> I discovered recently that, with -mcpu=power9, an attempt to generate a vspltish instruction resulted instead in an xxspltib followed by a vupkhsb. This is semantically correct but the extra instruction is not optimal.
> I found that there was some logic in xxspltib_constant_p to do special casing for const_vector with small constants, but not for vec_duplicate with small constants. This patch duplicates that logic so we can generate the single instruction when possible.

This part is okay.

> When I did this, I ran into a problem with an existing test case. We end up matching the *vsx_splat_v4si_internal pattern instead of falling back to the altivec_vspltisw pattern. The constraints don't match for constant input. To avoid this, I added a pattern ahead of this one that will match for VMX output registers and produce the vspltisw as desired. This corrected the failing test and produces the expected code.

Why does the predicate allow constant input, while the constraints do not?

> I've added a test case to demonstrate the code works properly now in the usual case.

Thanks :-)


Segher
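
For illustration only: the quoted description says that xxspltib followed by vupkhsb is semantically equivalent to a single vspltish for small constants. That equivalence can be modelled in a few lines of Python. This is a rough sketch of the instruction semantics as I understand them, not GCC code. All function names are hypothetical, and the selection rule in splat_v8hi is an assumption made for the example rather than the logic in xxspltib_constant_p.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def vspltish(simm):
    """Model of vspltish: splat a 5-bit signed immediate into 8 halfwords."""
    if not -16 <= simm <= 15:
        raise ValueError("vspltish immediate must fit in 5 signed bits")
    return [simm] * 8


def xxspltib(imm8):
    """Model of xxspltib: splat an 8-bit immediate into 16 bytes."""
    return [imm8 & 0xFF] * 16


def vupkhsb(byte_vec):
    """Model of vupkhsb: sign-extend 8 of the 16 bytes to halfwords.

    Which half counts as "high" depends on endianness; for a splat
    every byte is the same, so the choice does not matter here.
    """
    return [sign_extend(b, 8) for b in byte_vec[:8]]


def splat_v8hi(value):
    """Hypothetical selector: return (instructions, resulting vector).

    Assumption for this sketch: prefer the single instruction whenever
    the constant fits in the 5-bit immediate.
    """
    if -16 <= value <= 15:
        return ["vspltish"], vspltish(value)
    if -128 <= value <= 127:
        return ["xxspltib", "vupkhsb"], vupkhsb(xxspltib(value))
    raise ValueError("constant not representable by either sequence")


if __name__ == "__main__":
    for c in (-16, -3, 0, 15):
        # Both sequences give the same vector for small constants.
        assert vupkhsb(xxspltib(c)) == vspltish(c)
    print(splat_v8hi(-3))
    print(splat_v8hi(100))
```

Under these assumptions, a constant in the range -16..15 needs only the one instruction, while a constant such as 100 still needs the two-instruction sequence.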