Date: Thu, 10 Jan 2019 08:19:00 -0000
From: Richard Biener
To: Jakub Jelinek
cc: David Malcolm, Jonathan Wakely, Andrew Haley, Kyrill Tkachov, "Kay F. Jahnke", "gcc@gcc.gnu.org"
Subject: Re: autovectorization in gcc
In-Reply-To: <20190109162509.GQ30353@tucnak>
References: <41ea83cd-0ce8-4f25-35e5-888513d69c7b@gmail.com> <5C35C2C2.1050106@foss.arm.com> <2721bb39-ee4b-0202-d81d-e0b36d2059fa@redhat.com> <1547050225.7788.129.camel@redhat.com> <20190109162509.GQ30353@tucnak>

On Wed, 9 Jan 2019, Jakub Jelinek wrote:

> On Wed, Jan 09, 2019 at 11:10:25AM -0500, David Malcolm wrote:
> > extern void vf1()
> > {
> >    #pragma vectorize enable
> >    for ( int i = 0 ; i < 32768 ; i++ )
> >      data [ i ] = std::sqrt ( data [ i ] ) ;
> > }
> >
> > Compiling on this x86_64 box with -fopt-info-vec-missed shows the
> >
> >   _7 = .SQRT (_1);
> >   if (_1 u>= 0.0)
> >     goto ; [99.95%]
> >   else
> >     goto ; [0.05%]
> >
> >   [local count: 1062472912]:
> >   goto ; [100.00%]
> >
> >   [local count: 531495]:
> >   __builtin_sqrtf (_1);
> >
> > I'm not sure where that control flow came from: it isn't in
> >   sqrt-test.cc.104t.stdarg
> > but is in
> >   sqrt-test.cc.105t.cdce
> > so I think it's coming from the argument-range code in cdce.
> >
> > Arguably the location on the statement is wrong: it's on the loop
> > header, when it presumably should be on the std::sqrt call.
>
> See my other mail, it is the result of the -fmath-errno default,
> the inline emitted sqrt doesn't handle errno setting and we emit
> essentially
>   x = sqrt (arg); if (__builtin_expect (arg < 0.0, 0)) sqrt (arg);
> where the former sqrt is inline using HW instructions and the latter
> is the library call.
>
> With some extra work we could vectorize it; e.g. if we make it handle
> OpenMP #pragma omp ordered simd efficiently, it would be the same thing
> - allow non-vectorizable portions of vectorized loops by doing there a
> scalar loop from 0 to vf-1 doing the non-vectorizable stuff + drop the
> limitation that the vectorized loop is a single bb.  Essentially, in
> this case it would be
>   vec1 = vec_load (data + i);
>   vec2 = vec_sqrt (vec1);
>   if (__builtin_expect (any (vec2 < 0.0)))
>     {
>       for (int i = 0; i < vf; i++)
>         sqrt (vec2[i]);
>     }
>   vec_store (data + i, vec2);
> If that would turn out to be way too hard, we could for the
> vectorization purposes hide that into the .SQRT internal fn, say add a
> fndecl argument to it if it should treat the exceptional cases some way
> so that the control flow isn't visible in the vectorized loop.

If we decide it's worth the trouble I'd rather do that in the epilogue
and thus make the any (vec2 < 0.0) a reduction.  Like

  smallest = min (smallest, vec1);

and after the loop do the errno thing on the smallest element.

That said, this is a transform that is probably worthwhile even on
scalar code, possibly easiest to code-gen right from the start in the
call-dce pass.

Richard.
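
For illustration only, here is a source-level sketch of the reduction idea above, written by hand rather than taken from GCC. The function name sqrt_all_deferred_errno is hypothetical, and setting errno to EDOM directly stands in for "the errno thing" (the actual proposal would presumably call the library sqrt on the smallest element). The loop body is assumed to compile to a bare hardware sqrt, e.g. with -fno-math-errno; with the default -fmath-errno the compiler may still emit the per-element check discussed in the thread.

```cpp
#include <cerrno>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical helper, not part of GCC: take the square root of every
// element and defer the errno handling to a single check after the loop.
void sqrt_all_deferred_errno(float *data, std::size_t n)
{
    // Running minimum of the *inputs*; this is the reduction that
    // replaces the per-element "arg < 0.0" branch.
    float smallest = std::numeric_limits<float>::infinity();

    for (std::size_t i = 0; i < n; ++i) {
        float x = data[i];
        // NaN inputs compare false and are skipped, matching the
        // unordered-or-greater-equal test (u>=) in the dump.
        smallest = x < smallest ? x : smallest;
        data[i] = std::sqrt(x);  // single-precision overload
    }

    // Epilogue: one check covers the whole loop. Assumption: writing
    // EDOM directly models what the library call would do.
    if (smallest < 0.0f)
        errno = EDOM;
}
```

With this shape the loop body has no control flow that depends on the data, so the only thing left in the loop besides the load, sqrt and store is a min reduction. A caller would clear errno, run the function over its array, and test errno once afterwards.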