From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <luoxhu@linux.ibm.com>
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com
 [148.163.156.1])
 by sourceware.org (Postfix) with ESMTPS id 32FA93851C07;
 Wed,  2 Sep 2020 09:26:59 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 32FA93851C07
Received: from pps.filterd (m0098404.ppops.net [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id
 08292HG9089416; Wed, 2 Sep 2020 05:26:57 -0400
Received: from pps.reinject (localhost [127.0.0.1])
 by mx0a-001b2d01.pphosted.com with ESMTP id 33a7pa21qc-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Wed, 02 Sep 2020 05:26:57 -0400
Received: from m0098404.ppops.net (m0098404.ppops.net [127.0.0.1])
 by pps.reinject (8.16.0.36/8.16.0.36) with SMTP id 08292EPw088944;
 Wed, 2 Sep 2020 05:26:56 -0400
Received: from ppma05fra.de.ibm.com (6c.4a.5195.ip4.static.sl-reverse.com
 [149.81.74.108])
 by mx0a-001b2d01.pphosted.com with ESMTP id 33a7pa21ph-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Wed, 02 Sep 2020 05:26:56 -0400
Received: from pps.filterd (ppma05fra.de.ibm.com [127.0.0.1])
 by ppma05fra.de.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 0829MX4e003511;
 Wed, 2 Sep 2020 09:26:54 GMT
Received: from b06cxnps4074.portsmouth.uk.ibm.com
 (d06relay11.portsmouth.uk.ibm.com [9.149.109.196])
 by ppma05fra.de.ibm.com with ESMTP id 337en82scp-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Wed, 02 Sep 2020 09:26:54 +0000
Received: from d06av25.portsmouth.uk.ibm.com (d06av25.portsmouth.uk.ibm.com
 [9.149.105.61])
 by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 0829QpWE23331160
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Wed, 2 Sep 2020 09:26:51 GMT
Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id B8AC811C05B;
 Wed,  2 Sep 2020 09:26:51 +0000 (GMT)
Received: from d06av25.portsmouth.uk.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 4878C11C064;
 Wed,  2 Sep 2020 09:26:49 +0000 (GMT)
Received: from luoxhus-MacBook-Pro.local (unknown [9.197.240.238])
 by d06av25.portsmouth.uk.ibm.com (Postfix) with ESMTP;
 Wed,  2 Sep 2020 09:26:48 +0000 (GMT)
Subject: Re: [PATCH] rs6000: Expand vec_insert in expander instead of gimple
 [PR79251]
To: Richard Biener <richard.guenther@gmail.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
 Bill Schmidt <wschmidt@linux.ibm.com>, GCC Patches
 <gcc-patches@gcc.gnu.org>, linkw@gcc.gnu.org,
 David Edelsohn <dje.gcc@gmail.com>
References: <20200831090647.152432-1-luoxhu@linux.ibm.com>
 <20200831170414.GJ28786@gate.crashing.org>
 <3aec8b6f-8cf5-bd48-eb4e-c7b82e88dcd7@linux.ibm.com>
 <CAFiYyc21gbUsjXjeJtpUZB0uz_Xunj64LVW+cpYPStoYhnu8Tw@mail.gmail.com>
From: luoxhu <luoxhu@linux.ibm.com>
Message-ID: <53017f5c-d81b-6fe2-6c5d-1496249313f1@linux.ibm.com>
Date: Wed, 2 Sep 2020 17:26:47 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0)
 Gecko/20100101 Thunderbird/68.12.0
MIME-Version: 1.0
In-Reply-To: <CAFiYyc21gbUsjXjeJtpUZB0uz_Xunj64LVW+cpYPStoYhnu8Tw@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-TM-AS-GCONF: 00
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.235, 18.0.687
 definitions=2020-09-02_03:2020-09-02,
 2020-09-02 signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 bulkscore=0 mlxlogscore=999
 spamscore=0 phishscore=0 suspectscore=0 malwarescore=0 priorityscore=1501
 mlxscore=0 clxscore=1015 impostorscore=0 adultscore=0 lowpriorityscore=0
 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000
 definitions=main-2009020079
X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Sep 2020 09:27:00 -0000

Hi,

On 2020/9/1 21:07, Richard Biener wrote:
> On Tue, Sep 1, 2020 at 10:11 AM luoxhu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Hi,
>>
>> On 2020/9/1 01:04, Segher Boessenkool wrote:
>>> Hi!
>>>
>>> On Mon, Aug 31, 2020 at 04:06:47AM -0500, Xiong Hu Luo wrote:
>>>> vec_insert accepts 3 arguments, arg0 is input vector, arg1 is the value
>>>> to be insert, arg2 is the place to insert arg1 to arg0.  This patch adds
>>>> __builtin_vec_insert_v4si[v4sf,v2di,v2df,v8hi,v16qi] for vec_insert to
>>>> not expand too early in gimple stage if arg2 is variable, to avoid generate
>>>> store hit load instructions.
>>>>
>>>> For Power9 V4SI:
>>>>       addi 9,1,-16
>>>>       rldic 6,6,2,60
>>>>       stxv 34,-16(1)
>>>>       stwx 5,9,6
>>>>       lxv 34,-16(1)
>>>> =>
>>>>       addis 9,2,.LC0@toc@ha
>>>>       addi 9,9,.LC0@toc@l
>>>>       mtvsrwz 33,5
>>>>       lxv 32,0(9)
>>>>       sradi 9,6,2
>>>>       addze 9,9
>>>>       sldi 9,9,2
>>>>       subf 9,9,6
>>>>       subfic 9,9,3
>>>>       sldi 9,9,2
>>>>       subfic 9,9,20
>>>>       lvsl 13,0,9
>>>>       xxperm 33,33,45
>>>>       xxperm 32,32,45
>>>>       xxsel 34,34,33,32
>>>
>>> For v a V4SI, x a SI, j some int,  what do we generate for
>>>     v[j&3] = x;
>>> ?
>>> This should be exactly the same as we generate for
>>>     vec_insert(x, v, j);
>>> (the builtin does a modulo 4 automatically).
>>
>> No, even with my patch "stxv 34,-16(1);stwx 5,9,6;lxv 34,-16(1)" generated currently.
>> Is it feasible and acceptable to expand some kind of pattern in expander directly without
>> builtin transition?
>>
>> I borrowed some of implementation from vec_extract.  For vec_extract, the issue also exists:
>>
>>             source:                             gimple:                             expand:                                        asm:
>> 1) i = vec_extract (v, n);  =>  i = __builtin_vec_ext_v4si (v, n);   => {r120:SI=unspec[r118:V4SI,r119:DI] 134;...}     => slwi 9,6,2   vextuwrx 3,9,2
>> 2) i = vec_extract (v, 3);  =>  i = __builtin_vec_ext_v4si (v, 3);   => {r120:SI=vec_select(r118:V4SI,parallel)...}     =>  li 9,12      vextuwrx 3,9,2
>> 3) i = v[n%4];   =>  _1 = n & 3;  i = VIEW_CONVERT_EXPR<int[4]>(v)[_1];  =>        ...        =>     stxv 34,-16(1);addi 9,1,-16; rldic 5,5,2,60; lwax 3,9,5
>> 4) i = v[3];     =>          i = BIT_FIELD_REF <v, 32, 96>;              =>  {r120:SI=vec_select(r118:V4SI,parallel)...} => li 9,12;   vextuwrx 3,9,2
> 
> Why are 1) and 2) handled differently than 3)/4)?

1) and 2) are calling builtin function vec_extract, it is defined to
 __builtin_vec_extract and will be resolved to ALTIVEC_BUILTIN_VEC_EXTRACT
 by resolve_overloaded_builtin, to generate a call __builtin_vec_ext_v4si
 to be expanded only in RTL. 
3) is access variable v as array type with opcode VIEW_CONVERT_EXPR, I
 guess we should also generate builtin call instead of calling
 convert_vector_to_array_for_subscript to generate VIEW_CONVERT_EXPR
 expression for such kind of usage.
4) is translated to BIT_FIELD_REF with constant bitstart and bitsize,
variable v could also be accessed by register instead of stack, so optabs
could match the rs6000_expand_vector_insert to generate expected instruction
through extract_bit_field.

> 
>> Case 3) also couldn't handle the similar usage, and case 4) doesn't generate builtin as expected,
>> it just expand to vec_select by coincidence.  So does this mean both vec_insert and vec_extract
>> and all other similar vector builtins should use IFN as suggested by Richard Biener, to match the
>> pattern in gimple and expand both constant and variable index in expander?  Will this also be
>> beneficial for other targets except power?  Or we should do that gradually after this patch
>> approved as it seems another independent issue?  Thanks:)
> 
> If the code generated for 3)/4) isn't optimal you have to figure why
> by tracing the RTL
> expansion code and looking for missing optabs.
> 
> Consider the amount of backend code you need to write if ever using
> those in constexpr
> context ...

It seems too complicated to expand the "i = VIEW_CONVERT_EXPR<int[4]>(v)[_1];"
or "VIEW_CONVERT_EXPR<int[4]>(v1)[_1] = i_6(D);" to
rs6000_expand_vector_insert/rs6000_expand_vector_extract in RTL, as:
1) Vector v is stored to stack with array type; need extra load and store operation.
2) Requires amount of code to decompose VIEW_CONVERT_EXPR to extract the vector and
index then call rs6000_expand_vector_insert/rs6000_expand_vector_extract.

which means replace following instructions #9~#12 to new instruction sequences:
    1: NOTE_INSN_DELETED
    6: NOTE_INSN_BASIC_BLOCK 2
    2: r119:V4SI=%2:V4SI
    3: r120:DI=%5:DI
    4: r121:DI=%6:DI
    5: NOTE_INSN_FUNCTION_BEG
    8: [r112:DI]=r119:V4SI

    9: r122:DI=r121:DI&0x3
   10: r123:DI=r122:DI<<0x2
   11: r124:DI=r112:DI+r123:DI
   12: [r124:DI]=r120:DI#0

   13: r118:V4SI=[r112:DI]
   17: %2:V4SI=r118:V4SI
   18: use %2:V4SI

=>

    1: NOTE_INSN_DELETED
    6: NOTE_INSN_BASIC_BLOCK 2
    2: r119:V4SI=%2:V4SI
    3: r120:DI=%5:DI
    4: r121:DI=%6:DI
    5: NOTE_INSN_FUNCTION_BEG
    8: [r112:DI]=r119:V4SI

r130:V4SI=[r112:DI]
rs6000_expand_vector_insert (r130,  r121:DI&0x3, r120:DI#0)
[r112:DI]=r130:V4SI

   13: r118:V4SI=[r112:DI]
   17: %2:V4SI=r118:V4SI
   18: use %2:V4SI

so maybe bypass convert_vector_to_array_for_subscript for special circumstance
like "i = v[n%4]" or "v[n&3]=i" to generate vec_extract or vec_insert builtin 
call a relative simpler method?

Xionghu