Message-ID: <1547051201.7788.132.camel@redhat.com>
Subject: Re: autovectorization in gcc
From: David Malcolm
To: Jonathan Wakely, Andrew Haley
Cc: Kyrill Tkachov, "Kay F.
 Jahnke", "gcc@gcc.gnu.org"
Date: Wed, 09 Jan 2019 16:26:00 -0000
In-Reply-To: <1547050225.7788.129.camel@redhat.com>
References: <41ea83cd-0ce8-4f25-35e5-888513d69c7b@gmail.com>
 <5C35C2C2.1050106@foss.arm.com>
 <2721bb39-ee4b-0202-d81d-e0b36d2059fa@redhat.com>
 <1547050225.7788.129.camel@redhat.com>

On Wed, 2019-01-09 at 11:10 -0500, David Malcolm wrote:
> On Wed, 2019-01-09 at 09:56 +0000, Jonathan Wakely wrote:
> > On Wed, 9 Jan 2019 at 09:50, Andrew Haley wrote:
> > > I don't agree. Sometimes vectorization is critical. It would be
> > > nice to have a warning which would fire if vectorization failed.
> > > That would surely help the OP.
> >
> > Dave Malcolm has been working on something like that:
> > https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01749.html
>
> Yes: this code is in trunk for gcc 9, but it doesn't help much for
> the case given elsewhere in this thread:
>
> #include <cmath>
>
> extern float data [ 32768 ] ;
>
> extern void vf1()
> {
>   #pragma vectorize enable
>   for ( int i = 0 ; i < 32768 ; i++ )
>     data [ i ] = std::sqrt ( data [ i ] ) ;
> }
>
> Compiling on this x86_64 box with -fopt-info-vec-missed shows the
> rather cryptic:
>
> g++ -c /tmp/sqrt-test.cc -O3 -mavx2 -fopt-info-vec-missed
> /tmp/sqrt-test.cc:8:24: missed: couldn't vectorize loop
> /tmp/sqrt-test.cc:8:24: missed: not vectorized: control flow in loop.
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: missed: statement clobbers memory: __builtin_sqrtf (_1);
>
> and with -fopt-info-vec-all-internals shows:
>
> g++ -c /tmp/sqrt-test.cc -O3 -mavx2 -fopt-info-vec-all-internals
>
> Analyzing loop at /tmp/sqrt-test.cc:8
> /tmp/sqrt-test.cc:8:24: note: === analyze_loop_nest ===
> /tmp/sqrt-test.cc:8:24: note: === vect_analyze_loop_form ===
> /tmp/sqrt-test.cc:8:24: missed: not vectorized: control flow in loop.
> /tmp/sqrt-test.cc:8:24: missed: bad loop form.
> /tmp/sqrt-test.cc:8:24: missed: couldn't vectorize loop
> /tmp/sqrt-test.cc:8:24: missed: not vectorized: control flow in loop.
> /tmp/sqrt-test.cc:5:13: note: vectorized 0 loops in function.
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: note: === vect_slp_analyze_bb ===
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: note: === vect_analyze_data_refs ===
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: note: got vectype for stmt: _1 = data[i_12];
> vector(8) float
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: missed: not vectorized: not enough data-refs in basic block.
> /home/david/coding/gcc-python/gcc-svn-trunk/install-dogfood/include/c++/9.0.0/cmath:464:27: missed: statement clobbers memory: __builtin_sqrtf (_1);
> /tmp/sqrt-test.cc:8:24: note: === vect_slp_analyze_bb ===
> /tmp/sqrt-test.cc:8:24: note: === vect_analyze_data_refs ===
> /tmp/sqrt-test.cc:8:24: note: got vectype for stmt: data[i_12] = _7;
> vector(8) float
> /tmp/sqrt-test.cc:8:24: missed: not vectorized: not enough data-refs in basic block.
> /tmp/sqrt-test.cc:10:1: note: === vect_slp_analyze_bb ===
> /tmp/sqrt-test.cc:10:1: note: === vect_analyze_data_refs ===
> /tmp/sqrt-test.cc:10:1: missed: not vectorized: not enough data-refs in basic block.
>
> I had to turn on -fdump-tree-all to try to figure out what that
> "control flow in loop" was; it seems to be a guard against the input
> value being negative:
>
> [local count: 1063004407]:
> # i_12 = PHI <0(2), i_6(7)>
> # ivtmp_10 = PHI <32768(2), ivtmp_2(7)>
> # DEBUG i => i_12
> # DEBUG BEGIN_STMT
> _1 = data[i_12];
> # DEBUG __x => _1
> # DEBUG BEGIN_STMT
> _7 = .SQRT (_1);
> if (_1 u>= 0.0)
>   goto ; [99.95%]
> else
>   goto ; [0.05%]
>
> [local count: 1062472912]:
> goto ; [100.00%]
>
> [local count: 531495]:
> __builtin_sqrtf (_1);
>
> I'm not sure where that control flow came from: it isn't in
>   sqrt-test.cc.104t.stdarg
> but is in
>   sqrt-test.cc.105t.cdce
> so I think it's coming from the argument-range code in cdce.
>
> Arguably the location on the statement is wrong: it's on the loop
> header, when it presumably should be on the std::sqrt call.
>
> Shall I file a bugzilla about this?

...and -fno-tree-builtin-call-dce eliminates the control flow, but it
still doesn't vectorize the loop; on godbolt.org with:

  -O3 -mavx2 -fopt-info-vec-all -fno-tree-builtin-call-dce

gcc trunk x86_64 gives:

:8:24: missed: couldn't vectorize loop
/opt/compiler-explorer/gcc-trunk-20190109/include/c++/9.0.0/cmath:464:27: missed: statement clobbers memory: _7 = __builtin_sqrtf (_1);
:5:13: note: vectorized 0 loops in function.
/opt/compiler-explorer/gcc-trunk-20190109/include/c++/9.0.0/cmath:464:27: missed: statement clobbers memory: _7 = __builtin_sqrtf (_1);
Compiler returned: 0

...so presumably it doesn't know how to vectorize that builtin call.

Dave
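
A possible follow-up experiment, not something tried above: the
"statement clobbers memory" message would be consistent with sqrtf being
allowed to set errno on a negative input, which is also what the cdce
guard appears to be protecting. Below is a minimal sketch of the same
test case, under the assumption that nothing in the program reads errno
after std::sqrt, so that -fno-math-errno is acceptable. The constant
name N is made up for this sketch, and data is defined here rather than
declared extern so that the file can be linked on its own.

```cpp
// sqrt-test.cc, reworked as a sketch.
// Assumption: no caller inspects errno after std::sqrt, so the file can be
// built with -fno-math-errno.
#include <cmath>
#include <cstddef>

// Same element count as the original test case (N is a name chosen here).
constexpr std::size_t N = 32768;

// Defined in this file (the original declared it extern) so the sketch
// stands alone; static storage means it starts out zero-initialized.
float data[N];

// Replace every element of data with its square root, in place.
void vf1()
{
  for (std::size_t i = 0; i < N; i++)
    data[i] = std::sqrt(data[i]);
}
```

Compiling this with something like

  g++ -c sqrt-test.cc -O3 -mavx2 -fno-math-errno -fopt-info-vec

should show whether the vectorizer accepts the loop once the errno side
effect is out of the picture; the result is worth checking against the
-fopt-info output for the GCC version in use rather than assumed.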