From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <prvs=1831e939ca=christophe.lyon@foss.st.com>
Received: from mx07-00178001.pphosted.com (mx07-00178001.pphosted.com
 [185.132.182.106])
 by sourceware.org (Postfix) with ESMTPS id 5D6AC385F01C
 for <gcc-patches@gcc.gnu.org>; Fri, 16 Jul 2021 14:32:37 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5D6AC385F01C
Received: from pps.filterd (m0241204.ppops.net [127.0.0.1])
 by mx07-00178001.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id
 16GEQVY8002240; Fri, 16 Jul 2021 16:32:34 +0200
Received: from beta.dmz-eu.st.com (beta.dmz-eu.st.com [164.129.1.35])
 by mx07-00178001.pphosted.com with ESMTP id 39tw1gvbh6-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 16 Jul 2021 16:32:34 +0200
Received: from euls16034.sgp.st.com (euls16034.sgp.st.com [10.75.44.20])
 by beta.dmz-eu.st.com (STMicroelectronics) with ESMTP id 95CFE10002A;
 Fri, 16 Jul 2021 16:32:33 +0200 (CEST)
Received: from Webmail-eu.st.com (sfhdag2node3.st.com [10.75.127.6])
 by euls16034.sgp.st.com (STMicroelectronics) with ESMTP id 6A8B722F7D1;
 Fri, 16 Jul 2021 16:32:33 +0200 (CEST)
Received: from [10.131.138.188] (10.75.127.44) by SFHDAG2NODE3.st.com
 (10.75.127.6) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 16 Jul
 2021 16:32:32 +0200
Subject: Re: [PATCH 1/2] arm: Fix vcond_mask expander for MVE (PR
 target/100757)
To: gcc Patches <gcc-patches@gcc.gnu.org>, <richard.sandiford@arm.com>
References: <20210608213844.6218-1-christophe.lyon@linaro.org>
 <mpta6nzm650.fsf@arm.com>
 <CAKdteOY8w7r2=2RFLjYO8axH0isBBUac0bGNxrE9aoX-ifPqTw@mail.gmail.com>
 <mptfswe4aki.fsf@arm.com>
From: Christophe LYON <christophe.lyon@foss.st.com>
Message-ID: <f6816340-9cc8-a2fe-f9c9-7408bfe66452@foss.st.com>
Date: Fri, 16 Jul 2021 16:32:32 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
 Thunderbird/78.11.0
MIME-Version: 1.0
In-Reply-To: <mptfswe4aki.fsf@arm.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
X-Originating-IP: [10.75.127.44]
X-ClientProxiedBy: SFHDAG1NODE3.st.com (10.75.127.3) To SFHDAG2NODE3.st.com
 (10.75.127.6)
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790
 definitions=2021-07-16_05:2021-07-16,
 2021-07-16 signatures=0
X-Spam-Status: No, score=-8.7 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_DNSWL_LOW,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 16 Jul 2021 14:32:39 -0000


On 16/07/2021 16:06, Richard Sandiford via Gcc-patches wrote:
> Hi,
>
> Sorry for the slow review.  I'd initially held off from reviewing
> because it sounded like you were trying to treat predicates as
> MODE_VECTOR_BOOL instead.  Is that right?  If so, how did that go?


Yes, that's part of PR 101325. It's still WIP as I wrote in the PR, I'm 
not sure I got it right yet. At the moment it seems it would imply a lot 
of changes, I'll have to look at AArch64's implementation in more details.

I hoped this fix could be merged before switching to MODE_VECTOR_BOOL.


> It does feel like the right long-term direction.  Treating masks as
> integers for AVX seems to make some things more difficult than they
> should be.  Also, RTL like:

OK, I see, good to know.


>
>> +(define_expand "vec_cmp<mode>hi"
>> +  [(set (match_operand:HI 0 "s_register_operand")
>> +	(match_operator:HI 1 "comparison_operator"
>> +	  [(match_operand:MVE_VLD_ST 2 "s_register_operand")
>> +	   (match_operand:MVE_VLD_ST 3 "reg_or_zero_operand")]))]
>> +  "TARGET_HAVE_MVE
>> +   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
>> +{
>> +  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
>> +			     operands[2], operands[3], false, false);
>> +  DONE;
>> +})
> seems kind-of suspect, since (as I think you said in a PR),
> (eq:HI X Y) would normally be a single boolean.  Having MODE_VECTOR_BOOL
> means that we can represent the comparisons “properly”, even in
> define_insns.
>
> But since this usage is confined to define_expands, I guess it
> doesn't matter much for the purposes of this patch.  Any switch
> to MODE_VECTOR_BOOL would leave most of the patch in tact (contrary
> to what I'd initially assumed).
>
>> @@ -31061,13 +31065,7 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1,
>>   
>>   	  /* If we are not expanding a vcond, build the result here.  */
>>   	  if (!vcond_mve)
>> -	    {
>> -	      rtx zero = gen_reg_rtx (cmp_result_mode);
>> -	      rtx one = gen_reg_rtx (cmp_result_mode);
>> -	      emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
>> -	      emit_move_insn (one, CONST1_RTX (cmp_result_mode));
>> -	      emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
>> -	    }
>> +	    emit_move_insn (target, vpr_p0);
> The code above this is:
>
> 	  if (vcond_mve)
> 	    vpr_p0 = target;
> 	  else
> 	    vpr_p0 = gen_reg_rtx (HImode);
>
> so couldn't we simply use vpr_p0 = target unconditionally (or drop
> vpr_p0 altogether)?  Same for the other cases.
Probably, I'll check that.
>> @@ -31178,20 +31164,21 @@ arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode)
>>   			    mask, operands[1], operands[2]));
>>     else
>>       {
>> -      machine_mode cmp_mode = GET_MODE (operands[4]);
>> +      machine_mode cmp_mode = GET_MODE (operands[0]);
>>         rtx vpr_p0 = mask;
>> -      rtx zero = gen_reg_rtx (cmp_mode);
>> -      rtx one = gen_reg_rtx (cmp_mode);
>> -      emit_move_insn (zero, CONST0_RTX (cmp_mode));
>> -      emit_move_insn (one, CONST1_RTX (cmp_mode));
>> +
>>         switch (GET_MODE_CLASS (cmp_mode))
>>   	{
>>   	case MODE_VECTOR_INT:
>> -	  emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0));
>> +	  emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_mode, operands[0],
>> +				     operands[1], operands[2], vpr_p0));
>>   	  break;
>>   	case MODE_VECTOR_FLOAT:
>>   	  if (TARGET_HAVE_MVE_FLOAT)
>> -	    emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0));
>> +	    emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0],
>> +					 operands[1], operands[2], vpr_p0));
>> +	  else
>> +	    gcc_unreachable ();
>>   	  break;
>>   	default:
>>   	  gcc_unreachable ();
> Here too vpr_p0 feels a bit redundant now.

Indeed, but it seemed clearer to me :-)

>> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
>> index e393518ea88..a9840408bdd 100644
>> --- a/gcc/config/arm/mve.md
>> +++ b/gcc/config/arm/mve.md
>> @@ -10516,3 +10516,58 @@ (define_insn "*movmisalign<mode>_mve_load"
>>     "vldr<V_sz_elem1>.<V_sz_elem>\t%q0, %E1"
>>     [(set_attr "type" "mve_load")]
>>   )
>> +
>> +;; Expanders for vec_cmp and vcond
>> +
>> +(define_expand "vec_cmp<mode>hi"
>> +  [(set (match_operand:HI 0 "s_register_operand")
>> +	(match_operator:HI 1 "comparison_operator"
>> +	  [(match_operand:MVE_VLD_ST 2 "s_register_operand")
>> +	   (match_operand:MVE_VLD_ST 3 "reg_or_zero_operand")]))]
>> +  "TARGET_HAVE_MVE
>> +   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
> Is flag_unsafe_math_optimizations needed for MVE?  For Neon we had
> it because of flush to zero (at least, I think that was the only reason).

Right, I inherited this from the vec_cmp in neon.md. I do not have the 
ARM ARM at hand right now to check.

However, your question makes me wonder about the other vec_cmp pattern 
in vec-common.md, which is common to Neon and MVE. It might need to be 
adjusted too.

>> +{
>> +  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
>> +			     operands[2], operands[3], false, false);
>> +  DONE;
>> +})
> The (snipped) tests look good, but if we do support
> !flag_unsafe_math_optimizations, it would be good to have some tests
> for unordered comparisons too.  E.g.:
>
> #define ordered(A, B) (!__builtin_isunordered (A, B))
> #define unordered(A, B) (__builtin_isunordered (A, B))
> #define ueq(A, B) (!__builtin_islessgreater (A, B))
> #define ult(A, B) (__builtin_isless (A, B))
> #define ule(A, B) (__builtin_islessequal (A, B))
> #define uge(A, B) (__builtin_isgreaterequal (A, B))
> #define ugt(A, B) (__builtin_isgreater (A, B))
> #define nueq(A, B) (__builtin_islessgreater (A, B))
> #define nult(A, B) (!__builtin_isless (A, B))
> #define nule(A, B) (!__builtin_islessequal (A, B))
> #define nuge(A, B) (!__builtin_isgreaterequal (A, B))
> #define nugt(A, B) (!__builtin_isgreater (A, B))
>
> The main thing is just to check that they don't ICE.  It might be
> difficult to check the assembly for correctness.

Thanks I'll check that in August, when I'm back from holidays.

Christophe


>
> Thanks,
> Richard