From: Richard Henderson
Date: Thu, 08 Jul 2010 20:10:00 -0000
To: Ira Rosen
CC: gcc-patches@gcc.gnu.org
Subject: Re: [RFC] [patch] Support vectorization of min/max location pattern

On 07/08/2010 11:19 AM, Ira Rosen wrote:
> It's minloc pattern, i.e., a loop that finds the location of the minimum:
>
>   float arr[N];
>
>   for (i = 0; i < N; i++)
>     if (arr[i] < limit)
>       {
>         pos = i + 1;
>         limit = arr[i];
>       }
>
> Vectorizer's input code:
>
>   # pos_22 = PHI
>   # limit_24 = PHI
>   ...
>   pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22; // location
>   limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24; // min

Ok, I get it now.

So your thinking was that you needed the builtin to replace the
comparison portion of the VEC_COND_EXPR?  Or, looking again I see
that you don't actually use VEC_COND_EXPR, you use ...

> +  /* Create: VEC_DEST = (VEC_OPRND1 & MASK) | (VEC_OPRND2 & !MASK).  */

... explicit masking.  I.e. you assume that the return value of
the builtin is a bit mask of the full width, and that there's no
better way to implement the VEC_COND.

I wonder if it wouldn't be better to extend the definition of
VEC_COND_EXPR so that the comparison values can be of a different
type than the data operands (with the caveat that the number of
elements should be the same -- i.e. 4-wide compare must match
4-wide data movement).

I can think of 2 portability problems with your current solution:

(1) SSE4.1 would prefer to use BLEND instructions, which perform
    that entire (X & M) | (Y & ~M) operation in one insn.

(2) The mips C.cond.PS instruction does *not* produce a bitmask
    like altivec or sse do.  Instead it sets multiple condition
    codes.  One then uses MOV[TF].PS to merge the elements based
    on the individual condition codes.  While there's no direct
    corresponding instruction that will operate on integers, I
    don't think it would be too difficult to use MOV[TF].G or
    BC1AND2[FT] instructions to emulate it.

    In any case, this is again a case where you don't want to
    expose any part of the VEC_COND at the gimple level.


r~
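
For concreteness, here is a minimal scalar C sketch of what the explicit-masking form of the minloc loop computes, lane by lane. It is an illustration only, not the code from the patch: VF, select_u32, minloc_scalar, minloc_masked and the final cross-lane reduction are all hypothetical names and choices made for this example, and NaN inputs are ignored. Each "lane" keeps its own running limit and position, and both are updated with the same full-width mask via (X & M) | (Y & ~M).

```c
#include <stdint.h>
#include <string.h>

#define VF 4  /* hypothetical vectorization factor (lanes per vector) */

/* Bitwise select, (x & m) | (y & ~m); m must be all-ones or all-zeros. */
static uint32_t select_u32(uint32_t m, uint32_t x, uint32_t y)
{
    return (x & m) | (y & ~m);
}

/* Reinterpret float <-> 32-bit pattern without violating aliasing rules. */
static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Reference: the scalar loop quoted above; returns 0 if nothing is below limit. */
static int minloc_scalar(const float *arr, int n, float limit)
{
    int pos = 0;
    for (int i = 0; i < n; i++)
        if (arr[i] < limit) {
            pos = i + 1;
            limit = arr[i];
        }
    return pos;
}

/* Lane-wise emulation: one compare mask drives both the float min
   and the integer position select, as in the explicit-masking form. */
static int minloc_masked(const float *arr, int n, float limit)
{
    float vlimit[VF];
    uint32_t vpos[VF];
    int i, l;

    for (l = 0; l < VF; l++) {
        vlimit[l] = limit;
        vpos[l] = 0;
    }

    for (i = 0; i + VF <= n; i += VF)
        for (l = 0; l < VF; l++) {
            /* Full-width mask, as an altivec/sse style compare would produce. */
            uint32_t m = arr[i + l] < vlimit[l] ? 0xFFFFFFFFu : 0u;
            vpos[l] = select_u32(m, (uint32_t)(i + l + 1), vpos[l]);
            vlimit[l] = bits_float(select_u32(m, float_bits(arr[i + l]),
                                              float_bits(vlimit[l])));
        }

    /* Assumed epilogue: reduce across lanes, preferring the smallest value
       and, on ties, the earliest position, to match the scalar loop. */
    float best = limit;
    uint32_t pos = 0;
    for (l = 0; l < VF; l++)
        if (vpos[l] != 0
            && (vlimit[l] < best || (vlimit[l] == best && vpos[l] < pos))) {
            best = vlimit[l];
            pos = vpos[l];
        }

    /* Scalar tail for the elements that do not fill a whole vector. */
    for (; i < n; i++)
        if (arr[i] < best) {
            best = arr[i];
            pos = (uint32_t)(i + 1);
        }
    return (int)pos;
}
```

Note that the mask here has to be reinterpreted as both a float-sized and an integer-sized bit pattern, which is exactly the assumption that a BLEND instruction or a condition-code based MOV[TF].PS would not need to make.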