To: gcc@gcc.gnu.org
From: "Kay F. Jahnke"
Subject: autovectorization in gcc
Date: Wed, 09 Jan 2019 08:29:00 -0000
Message-ID: <41ea83cd-0ce8-4f25-35e5-888513d69c7b@gmail.com>

Hi there!

I am developing software which tries to deliberately exploit the compiler's autovectorization facilities by feeding data in autovectorization-friendly loops. I'm currently using both g++ and clang++ to see how well this approach works. Using simple arithmetic, I often get good results. To widen the scope of my work, I was looking for documentation on which constructs would be recognized by the autovectorization stage, and found

https://www.gnu.org/software/gcc/projects/tree-ssa/vectorization.html

By the looks of it, this document has not seen any changes for several years. Has development on the autovectorization stage stopped, or is there simply no documentation?

In my experience, vectorization is essential to speed up arithmetic on the CPU, and reliable recognition of vectorization opportunities by the compiler can provide vectorization to programs which don't bother to code it explicitly. I feel the topic is being neglected - at least the documentation I found suggests this. To demonstrate what I mean, I have two concrete scenarios which I'd like to be handled by the autovectorization stage:

- gather/scatter with arbitrary indexes

  In C, this would be loops like

    // gather from B to A using gather indexes
    for ( int i = 0 ; i < vsz ; i++ )
      A [ i ] = B [ indexes [ i ] ] ;

  From the AVX2 ISA onwards, there are hardware gather/scatter operations, which can speed things up a good deal.

- repeated use of vectorizable functions

    for ( int i = 0 ; i < vsz ; i++ )
      A [ i ] = sqrt ( B [ i ] ) ;

  Here, replacing the repeated call of sqrt with the vectorized equivalent gives a dramatic speedup (ca. 4X)

If the compiler were to provide the autovectorization facilities, and if the patterns it recognizes were well-documented, users could rely on certain code patterns being recognized and autovectorized - sort of a contract between the user and the compiler. With a well-chosen spectrum of patterns, this would make it unnecessary to have to rely on explicit vectorization in many cases. My hope is that such an interface would help vectorization to become more frequently used - as I understand the status quo, this is still a niche topic, even though many processors provide suitable hardware nowadays.

Can you point me to where 'the action is' in this regard?

With regards

Kay F. Jahnke