From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <andre.simoesdiasvieira@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id E1EE03857810
 for <gcc-patches@gcc.gnu.org>; Thu, 25 Nov 2021 10:40:13 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E1EE03857810
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6FF1B1042;
 Thu, 25 Nov 2021 02:40:13 -0800 (PST)
Received: from [10.1.29.157] (E121495.Arm.com [10.1.29.157])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id F0AD73F73B;
 Thu, 25 Nov 2021 02:40:12 -0800 (PST)
Message-ID: <21e3500d-6cf5-ed46-6f95-1f554c5dbc50@arm.com>
Date: Thu, 25 Nov 2021 10:40:14 +0000
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.3.1
Subject: Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling
Content-Language: en-US
To: Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
 richard.sandiford@arm.com
References: <4a2e6dde-cc5c-97fe-7a43-bd59d542c2ce@arm.com>
 <27777876-4201-5e86-bf9a-063143d38641@arm.com>
 <q2n8qs0-p9s-nq31-6sp-79n7o35730ss@fhfr.qr>
 <e01f9b48-918b-6379-8f0f-5ab8f5402b5a@arm.com>
 <p4442748-3rq6-4013-4546-51q6r553s756@fhfr.qr>
 <fb6e6a2f-646a-f338-6f55-4669e593e9c2@arm.com>
 <4272814n-8538-p793-157q-5n6q16r48n51@fhfr.qr>
 <623fbfd9-b97c-8c6e-0348-07d6c4496592@arm.com>
 <pp44o5pq-rp4o-rn56-1070-15ns3n93n4o2@fhfr.qr>
 <5c887c48-7f7e-c02b-2998-7a7c41b11af8@arm.com>
 <r938qonn-53n-68o5-o3o6-7n1s773p92nq@fhfr.qr> <mptmtn1xt0e.fsf@arm.com>
 <33cb143e-bb2e-e214-cd5f-66fd2d1bd20b@arm.com>
 <5op15ns-4sq8-2sn3-41qs-49q44417sp6@fhfr.qr>
 <b73e1c6f-0c8c-ecae-7244-7c62489db306@arm.com>
 <99qs2o2p-pn87-n164-q8n9-9p814r6n75r1@fhfr.qr>
 <475fae98-9541-5dca-2e60-eaff172ff787@arm.com>
 <8p72o15s-5894-4or0-409r-oo4p74o238r1@fhfr.qr>
From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
In-Reply-To: <8p72o15s-5894-4or0-409r-oo4p74o238r1@fhfr.qr>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-9.0 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 NICE_REPLY_A, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 25 Nov 2021 10:40:16 -0000


On 24/11/2021 11:00, Richard Biener wrote:
> On Wed, 24 Nov 2021, Andre Vieira (lists) wrote:
>
>> On 22/11/2021 12:39, Richard Biener wrote:
>>> +  if (first_loop_vinfo->suggested_unroll_factor > 1)
>>> +    {
>>> +      if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
>>> +       {
>>> +         if (dump_enabled_p ())
>>> +           dump_printf_loc (MSG_NOTE, vect_location,
>>> +                            "***** Re-trying analysis with first vector mode"
>>> +                            " %s for epilogue with partial vectors of"
>>> +                            " unrolled first loop.\n",
>>> +                            GET_MODE_NAME (vector_modes[0]));
>>> +         mode_i = 0;
>>>
>>> and the later done check for bigger VF than main loop - why would
>>> we re-start at 0 rather than at the old mode?  Maybe we want to
>>> remember the iterator value we started at when arriving at the
>>> main loop mode?  So if we analyzed successfully with mode_i == 2,
>>> then successfully at mode_i == 4 which suggested an unroll of 2,
>>> re-start at the mode_i we continued after the mode_i == 2
>>> successful analysis?  To just consider the "simple" case of
>>> AVX vs SSE it IMHO doesn't make much sense to succeed with
>>> AVX V4DF, succeed with SSE V2DF and figure it's better than V4DF AVX
>>> but get a suggestion of 2 times unroll and then re-try AVX V4DF
>>> just to re-compute that yes, it's worse than SSE V2DF?  You
>>> are probably thinking of SVE vs ADVSIMD here but do we need to
>>> start at 0?  Adding a comment to the code would be nice.
>>>
>>> Thanks,
>> I was indeed thinking of SVE vs Advanced SIMD, where we end up having to
>> compare different vectorization strategies with different costs.
>> The hypothetical case (I don't think I've actually come across one) is where
>> we decide to vectorize the main loop for V8QI and unroll 2x, yielding a VF of
>> 16, and may then want to use a predicated VNx16QI epilogue.
> But this isn't the epilogue handling ...
Am I misunderstanding the code here? To me it looks like this is picking
which mode_i to start from in the 'while (1)' loop that does the loop
analysis for the epilogues?
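
To make the hypothetical case concrete, here is a toy model of the question
being discussed. It is not the vectorizer code: Mode, unrolled_vf and
epilogue_candidates are made-up names, the mode order is an assumption, and
"lanes" is simplified to the minimum element count of each mode. It only
shows how the starting index changes which epilogue modes get visited once
the main loop's VF has been doubled by unrolling.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the target's vector mode list.
struct Mode {
  std::string name;
  unsigned lanes;  // minimum number of elements per vector (simplified)
};

// VF of the main loop once the suggested unroll factor is applied.
unsigned unrolled_vf(const Mode &main_mode, unsigned unroll_factor) {
  return main_mode.lanes * unroll_factor;
}

// Names of the modes an epilogue search would consider when it starts at
// index `start` and skips any mode with more lanes than the main loop's VF.
std::vector<std::string> epilogue_candidates(const std::vector<Mode> &modes,
                                             std::size_t start,
                                             unsigned main_vf) {
  assert(start <= modes.size());
  std::vector<std::string> out;
  for (std::size_t i = start; i < modes.size(); ++i)
    if (modes[i].lanes <= main_vf)
      out.push_back(modes[i].name);
  return out;
}
```

With an assumed mode list of { VNx16QI, V16QI, V8QI }, a V8QI main loop
unrolled 2x has a VF of 16. Starting the search at index 0 keeps VNx16QI as
a candidate, whereas starting at V8QI's own index never visits it.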