From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-267717-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 16065 invoked by alias); 8 Jul 2010 20:10:45 -0000
Received: (qmail 16043 invoked by uid 22791); 8 Jul 2010 20:10:44 -0000
X-SWARE-Spam-Status: No, hits=-5.2 required=5.0	tests=AWL,BAYES_00,RCVD_IN_DNSWL_HI,SPF_HELO_PASS,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 08 Jul 2010 20:10:39 +0000
Received: from int-mx01.intmail.prod.int.phx2.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11])	by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id o68KAbMk013171	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);	Thu, 8 Jul 2010 16:10:37 -0400
Received: from anchor.twiddle.home (vpn-227-19.phx2.redhat.com [10.3.227.19])	by int-mx01.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id o68KAbdo020864;	Thu, 8 Jul 2010 16:10:37 -0400
Message-ID: <4C3630BD.3040807@redhat.com>
Date: Thu, 08 Jul 2010 20:10:00 -0000
From: Richard Henderson <rth@redhat.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100621 Fedora/3.0.5-1.fc13 Thunderbird/3.0.5
MIME-Version: 1.0
To: Ira Rosen <IRAR@il.ibm.com>
CC: gcc-patches@gcc.gnu.org
Subject: Re: [RFC] [patch] Support vectorization of min/max location pattern
References: <OFEBD40E44.8D85D407-ONC225765A.002CB10C-C2257753.002C059F@il.ibm.com> <OF5EE44A9E.17266140-ONC2257758.00270A88-C2257758.0027CC05@il.ibm.com> <4C34E6CF.4030608@redhat.com> <OF04EBA496.BF7446CC-ONC225775A.0021908B-C225775A.00298E54@il.ibm.com> <4C3607AD.50406@redhat.com> <OF8091E3DC.D51DEA78-ONC225775A.0063AA0D-C225775A.0064AF5D@il.ibm.com>
In-Reply-To: <OF8091E3DC.D51DEA78-ONC225775A.0063AA0D-C225775A.0064AF5D@il.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2010-07/txt/msg00713.txt.bz2

On 07/08/2010 11:19 AM, Ira Rosen wrote:
> It's a minloc pattern, i.e., a loop that finds the location of the minimum:
> 
>   float  arr[N];
> 
>   for (i = 0; i < N; i++)
>     if (arr[i] < limit)
>       {
>         pos = i + 1;
>         limit = arr[i];
>       }
> 
> Vectorizer's input code:
> 
>   # pos_22 = PHI <pos_1(4), 1(2)>
>   # limit_24 = PHI <limit_4(4), 0(2)>
>   ...
>   pos_1 = [cond_expr] limit_9 < limit_24 ? pos_10 : pos_22;       // location
>   limit_4 = [cond_expr] limit_9 < limit_24 ? limit_9 : limit_24;  // min

Ok, I get it now.

So your thinking was that you needed the builtin to replace the
comparison portion of the VEC_COND_EXPR?  Or, looking again I see
that you don't actually use VEC_COND_EXPR, you use ...

> +  /* Create: VEC_DEST = (VEC_OPRND1 & MASK) | (VEC_OPRND2 & !MASK).  */ 

... explicit masking.  I.e. you assume that the return value of
the builtin is a bit mask of the full width, and that there's no
better way to implement the VEC_COND.
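
To make that assumption concrete, here is a minimal sketch in plain C
(not vector code) of what explicit masking amounts to per element.  The
helper names and the fixed 4-lane width are made up for illustration and
are not taken from the patch:

```c
#include <stdint.h>

#define LANES 4

/* Comparison in the AltiVec/SSE style: each lane of the result is
   all-ones where a[i] < b[i] and all-zeros elsewhere.  */
static void cmp_lt_mask(const float *a, const float *b, uint32_t *mask)
{
    for (int i = 0; i < LANES; i++)
        mask[i] = (a[i] < b[i]) ? UINT32_MAX : 0u;
}

/* dst = (x & mask) | (y & ~mask), applied to the raw 32-bit lanes.
   This only works because the mask is as wide as the data lanes.  */
static void mask_select(const uint32_t *mask, const uint32_t *x,
                        const uint32_t *y, uint32_t *dst)
{
    for (int i = 0; i < LANES; i++)
        dst[i] = (x[i] & mask[i]) | (y[i] & ~mask[i]);
}
```

Note that the select is three logical operations on top of the compare,
and it bakes in the full-width bitmask representation.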

I wonder if it wouldn't be better to extend the definition
of VEC_COND_EXPR so that the comparison values can be of a 
different type than the data operands (with the caveat that the
number of elements should be the same -- i.e. 4-wide compare must
match 4-wide data movement).
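
As a rough model of what I mean, the sketch below writes the minloc loop
lane by lane in plain C, with the location update expressed as a select
whose comparison operands are float and whose data operands are int.
Everything here is an assumption for illustration: the function names,
the 4-lane width, n being a multiple of 4, no NaNs in the input, and the
final cross-lane reduction (which the quoted code does not show):

```c
#include <stdint.h>

#define LANES 4

/* Scalar reference: the loop from the quoted example.  */
static int32_t minloc_scalar(const float *arr, int n, float limit, int32_t pos)
{
    for (int i = 0; i < n; i++)
        if (arr[i] < limit) {
            pos = i + 1;
            limit = arr[i];
        }
    return pos;
}

/* Assumed per-element meaning of the extended VEC_COND_EXPR:
   float comparison, int data, same number of elements.  */
static void vec_cond_lt(const float *a, const float *b,
                        const int32_t *x, const int32_t *y, int32_t *dst)
{
    for (int j = 0; j < LANES; j++)
        dst[j] = (a[j] < b[j]) ? x[j] : y[j];
}

/* Lane-wise minloc; n is assumed to be a multiple of LANES.  */
static int32_t minloc_lanes(const float *arr, int n, float limit0, int32_t pos0)
{
    float limit[LANES];
    int32_t pos[LANES], idx[LANES];

    for (int j = 0; j < LANES; j++) {
        limit[j] = limit0;
        pos[j] = pos0;
    }
    for (int i = 0; i < n; i += LANES) {
        const float *v = &arr[i];
        for (int j = 0; j < LANES; j++)
            idx[j] = i + j + 1;
        /* location: float compare selecting between int vectors */
        vec_cond_lt(v, limit, idx, pos, pos);
        /* min: the same compare, this time with float data */
        for (int j = 0; j < LANES; j++)
            limit[j] = (v[j] < limit[j]) ? v[j] : limit[j];
    }
    /* Reduce across lanes: smallest limit wins, ties go to the
       smallest position so the first occurrence is reported.  */
    float best = limit[0];
    int32_t best_pos = pos[0];
    for (int j = 1; j < LANES; j++)
        if (limit[j] < best || (limit[j] == best && pos[j] < best_pos)) {
            best = limit[j];
            best_pos = pos[j];
        }
    return best_pos;
}
```

Nothing in this form says how the comparison result is represented, which
is the point: the target is free to pick a mask, a blend, or something
else entirely.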

I can think of 2 portability problems with your current solution:

(1) SSE4.1 would prefer to use BLEND instructions, which perform
    that entire (X & M) | (Y & ~M) operation in one insn.

(2) The MIPS C.cond.PS instruction does *not* produce a bitmask
    like AltiVec or SSE do.  Instead it sets multiple condition
    codes.  One then uses MOV[TF].PS to merge the elements based
    on the individual condition codes.  While there's no direct
    corresponding instruction that will operate on integers, I
    don't think it would be too difficult to use MOV[TF].G or
    BC1ANY2[FT] instructions to emulate it.  In any case, this
    is again a case where you don't want to expose any part of
    the VEC_COND at the gimple level.
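
To illustrate why the representation should stay hidden, here is a loose
C model of the same select lowered two ways: once through a full-width
bitmask, and once through one condition flag per lane followed by
conditional moves.  This is only a sketch; the names are invented, the
2-lane width is chosen to resemble paired-single, and it does not model
the actual MIPS instruction semantics:

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 2

/* Lowering 1: the compare yields an all-ones/all-zeros mask per lane,
   and the select is (x & m) | (y & ~m).  */
static void select_via_mask(const float *a, const float *b,
                            const uint32_t *x, const uint32_t *y,
                            uint32_t *dst)
{
    for (int i = 0; i < LANES; i++) {
        uint32_t m = (a[i] < b[i]) ? UINT32_MAX : 0u;
        dst[i] = (x[i] & m) | (y[i] & ~m);
    }
}

/* Lowering 2: the compare sets one condition flag per lane; a
   conditional move per lane then merges the elements.  No mask
   value ever exists.  */
static void select_via_flags(const float *a, const float *b,
                             const uint32_t *x, const uint32_t *y,
                             uint32_t *dst)
{
    bool cc[LANES];

    for (int i = 0; i < LANES; i++)
        cc[i] = a[i] < b[i];
    for (int i = 0; i < LANES; i++) {
        dst[i] = y[i];
        if (cc[i])
            dst[i] = x[i];
    }
}
```

Both produce the same result for a[i] < b[i] ? x[i] : y[i], but only the
first has anything that could be written as an AND/OR sequence.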


r~