From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 16700 invoked by alias); 22 Oct 2012 06:44:36 -0000
Received: (qmail 16652 invoked by uid 48); 22 Oct 2012 06:44:18 -0000
From: "vincenzo.innocente at cern dot ch"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/55016] New: request for specific builtins for rcp and rsqrt
Date: Mon, 22 Oct 2012 06:44:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords:
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: vincenzo.innocente at cern dot ch
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-10/txt/msg01913.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55016

             Bug #: 55016
           Summary: request for specific builtins for rcp and rsqrt
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: vincenzo.innocente@cern.ch


There are cases where approximate rcp and rsqrt suffice.
I wonder whether it would be possible to introduce "generic" builtins for
"rcp" and "rsqrt" that produce the proper instruction for the target
architecture (SSE, AVX, etc.) and that can eventually be turned into vector
instructions inside a loop.

At the moment anything like this is target-specific, inefficient, and does
not vectorize, as in this example:
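
#include <xmmintrin.h>  // SSE intrinsics: _mm_set_ss, _mm_rsqrt_ss, _mm_cvtss_f32

float v0[1024];
float v1[1024];

// Approximate 1/sqrt(x) using the scalar SSE rsqrtss instruction.
inline float rsqrtf( float x ) {
  return _mm_cvtss_f32( _mm_rsqrt_ss( _mm_set_ss( x ) ) );
}

void v() {
  for(int i=0; i!=1024; ++i)
    v0[i] = rsqrtf(v1[i]);
}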
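
For comparison, here is a rough sketch of the packed code one would hope a generic builtin could produce for such a loop. It is written by hand with the SSE `_mm_rsqrt_ps` intrinsic and is only an illustration: the name `rsqrt_array` is hypothetical and is not an existing or proposed GCC builtin, and the non-SSE fallback simply computes the exact `1.0f / sqrt(x)`.

```cpp
#include <cmath>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // _mm_loadu_ps, _mm_rsqrt_ps, _mm_storeu_ps, ...
#endif

// Hypothetical helper (not a GCC builtin): approximate 1/sqrt(x) per element.
void rsqrt_array(float* out, const float* in, std::size_t n) {
  std::size_t i = 0;
#if defined(__SSE__)
  // Main loop: four floats per iteration with the packed rsqrtps instruction.
  for (; i + 4 <= n; i += 4)
    _mm_storeu_ps(out + i, _mm_rsqrt_ps(_mm_loadu_ps(in + i)));
  // Tail: remaining elements with the scalar rsqrtss instruction.
  for (; i < n; ++i)
    out[i] = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(in[i])));
#else
  // Assumed fallback for targets without SSE: exact result, no approximation.
  for (; i < n; ++i)
    out[i] = 1.0f / std::sqrt(in[i]);
#endif
}
```

With the arrays from the report this would be called as `rsqrt_array(v0, v1, 1024);`. The results are approximate (roughly 12 bits of precision on SSE hardware), which is the trade-off the request is about.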