Date: Wed, 09 Jan 2019 16:25:00 -0000
From: Jakub Jelinek
To: David Malcolm, Richard Biener
Cc: Jonathan Wakely, Andrew Haley, Kyrill Tkachov, "Kay F. Jahnke", gcc@gcc.gnu.org
Subject: Re: autovectorization in gcc

On Wed, Jan 09, 2019 at 11:10:25AM -0500, David Malcolm wrote:
> extern void vf1()
> {
> #pragma vectorize enable
>   for ( int i = 0 ; i < 32768 ; i++ )
>     data [ i ] = std::sqrt ( data [ i ] ) ;
> }
>
> Compiling on this x86_64 box with -fopt-info-vec-missed shows the
>   _7 = .SQRT (_1);
>   if (_1 u>= 0.0)
>     goto ; [99.95%]
>   else
>     goto ; [0.05%]
>
>    [local count: 1062472912]:
>   goto ; [100.00%]
>
>    [local count: 531495]:
>   __builtin_sqrtf (_1);
>
> I'm not sure where that control flow came from: it isn't in
>   sqrt-test.cc.104t.stdarg
> but is in
>   sqrt-test.cc.105t.cdce
> so I think it's coming from the argument-range code in cdce.
>
> Arguably the location on the statement is wrong: it's on the loop
> header, when it presumably should be on the std::sqrt call.

See my other mail: it is the result of the -fmath-errno default. The
inline-emitted sqrt doesn't handle errno setting, so we emit essentially

  x = sqrt (arg);
  if (__builtin_expect (arg < 0.0, 0))
    sqrt (arg);

where the former sqrt is inline, using HW instructions, and the latter is
the library call.

With some extra work we could vectorize it; e.g. if we make it handle
OpenMP #pragma omp ordered simd efficiently, it would be the same thing:
allow non-vectorizable portions of vectorized loops by doing there a
scalar loop from 0 to vf-1 doing the non-vectorizable stuff, and drop the
limitation that the vectorized loop is a single bb.
Essentially, in this case it would be

  vec1 = vec_load (data + i);
  vec2 = vec_sqrt (vec1);
  if (__builtin_expect (any (vec1 < 0.0), 0))
    {
      for (int j = 0; j < vf; j++)
        sqrt (vec1[j]);
    }
  vec_store (data + i, vec2);

If that turns out to be way too hard, we could, for vectorization
purposes, hide this in the .SQRT internal fn, say by adding a fndecl
argument to it when it should treat the exceptional cases some way, so
that the control flow isn't visible in the vectorized loop.

	Jakub