Date: Fri, 11 Oct 2013 15:05:00 -0000
From: Jakub Jelinek
To: Vidya Praveen
Cc: Richard Biener, "gcc@gcc.gnu.org", "ook@ucw.cz", marc.glisse@inria.fr
Subject: Re: [RFC] Vectorization of indexed elements
Message-ID: <20131011150524.GX30970@tucnak.zalov.cz>
In-Reply-To: <20131011145408.GB23850@e103625-lin.cambridge.arm.com>

On Fri, Oct 11, 2013 at 03:54:08PM +0100, Vidya Praveen wrote:
> Here's a compilable example:
>
> void
> foo (int *__restrict__ a,
>      int *__restrict__ b,
>      int *__restrict__ c)
> {
>   int i;
>
>   for (i = 0; i < 8; i++)
>     a[i] = b[i] * c[2];
> }
>
> This is vectorized by duplicating c[2] now. But I'm trying to take advantage
> of target instructions that can take a vector register as second argument but
> use only one element (by using the same value for all the lanes) of the
> vector register.
>
> Eg. mul , , [index]
>     mla , , [index] // multiply and add
>
> But for a loop like the one in the C example given, I will have to load the
> c[2] in one element of the vector register (leaving the remaining unused)
> rather. This is why I was proposing to load just one element in a vector
> register (what I meant as "lane specific load"). The benefit of doing this is
> that we avoid explicit duplication, however such a simplification can only
> be done where such support is available - the reason why I was thinking in
> terms of optional standard pattern name. Another benefit is we will also be
> able to support scalars in the expression like in the following example:
>
> void
> foo (int *__restrict__ a,
>      int *__restrict__ b,
>      int c)
> {
>   int i;
>
>   for (i = 0; i < 8; i++)
>     a[i] = b[i] * c;
> }

So just let the broadcast operation be combined with the arithmetic
during combine?  The Intel AVX512 ISA has a similar feature; I'm not sure
what exactly they are doing for this.
That said, the broadcast is likely going to be hoisted before the loop,
and in that case is it really cheaper to keep the value unbroadcast in a
vector register than to broadcast it before the loop and just use it
there?

	Jakub
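
For illustration, here is a minimal sketch of the hoisting point, assuming
GCC's generic vector extension (vector_size) as a stand-in for what the
vectorizer might emit.  The names foo_scalar, foo_hoisted and v4si are
hypothetical and not from the thread, and this is not the code GCC actually
generates.  It only shows that once the broadcast of c[2] sits before the
loop, the loop body is left with a plain vector multiply.

```c
#include <string.h>

/* Four 32-bit ints per vector (GCC/Clang generic vector extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Scalar reference: the loop from the quoted example.  */
void
foo_scalar (int *__restrict__ a, const int *__restrict__ b,
            const int *__restrict__ c)
{
  for (int i = 0; i < 8; i++)
    a[i] = b[i] * c[2];
}

/* Hand-written approximation of the vectorized loop with the
   broadcast of c[2] hoisted out of the loop (an assumption about
   the shape of the output, not compiler-generated code).  */
void
foo_hoisted (int *__restrict__ a, const int *__restrict__ b,
             const int *__restrict__ c)
{
  const int s = c[2];
  const v4si dup = { s, s, s, s };      /* broadcast once, before the loop */

  for (int i = 0; i < 8; i += 4)
    {
      v4si vb, va;
      memcpy (&vb, b + i, sizeof vb);   /* unaligned vector load */
      va = vb * dup;                    /* element-wise multiply */
      memcpy (a + i, &va, sizeof va);   /* unaligned vector store */
    }
}
```

Calling both functions on the same b and c should fill a with identical
values.  The one-time broadcast is then the only cost a by-element multiply
could save, and that cost is paid outside the loop.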