Re: autovectorization in gcc

From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
To: "Kay F. Jahnke" <kfjahnke@gmail.com>,
	 "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
Subject: Re: autovectorization in gcc
Date: Wed, 09 Jan 2019 09:46:00 -0000
Message-ID: <5C35C2C2.1050106@foss.arm.com>
In-Reply-To: <41ea83cd-0ce8-4f25-35e5-888513d69c7b@gmail.com>

Hi Kay,

On 09/01/19 08:29, Kay F. Jahnke wrote:
> Hi there!
>
> I am developing software which tries to deliberately exploit the
> compiler's autovectorization facilities by feeding data in
> autovectorization-friendly loops. I'm currently using both g++ and
> clang++ to see how well this approach works. Using simple arithmetic, I
> often get good results. To widen the scope of my work, I was looking for
> documentation on which constructs would be recognized by the
> autovectorization stage, and found
>
> https://www.gnu.org/software/gcc/projects/tree-ssa/vectorization.html
>

Yeah, that page hasn't been updated in ages AFAIK.

> By the looks of it, this document has not seen any changes for several
> years. Has development on the autovectorization stage stopped, or is
> there simply no documentation?
>

There's plenty of work being done on auto-vectorisation in GCC.
Auto-vectorisation is a performance optimisation and as such is not really
a user-visible feature that absolutely requires user documentation.

> In my experience, vectorization is essential to speed up arithmetic on
> the CPU, and reliable recognition of vectorization opportunities by the
> compiler can provide vectorization to programs which don't bother to
> code it explicitly. I feel the topic is being neglected - at least the
> documentation I found suggests this. To demonstrate what I mean, I have
> two concrete scenarios which I'd like to be handled by the
> autovectorization stage:
>
> - gather/scatter with arbitrary indexes
>
> In C, this would be loops like
>
> // gather from B to A using gather indexes
>
> for ( int i = 0 ; i < vsz ; i++ )
>    A [ i ] = B [ indexes [ i ] ] ;
>
> From the AVX2 ISA onwards, there are hardware gather operations (scatter
> instructions arrived with AVX-512), which can speed things up a good deal.
>
> - repeated use of vectorizable functions
>
> for ( int i = 0 ; i < vsz ; i++ )
>    A [ i ] = sqrt ( B [ i ] ) ;
>
> Here, replacing the repeated call of sqrt with the vectorized equivalent
> gives a dramatic speedup (ca. 4X)
>

I believe GCC will do some of that already, given a high enough optimisation level
and suitable floating-point constraints.
Do you have examples where it doesn't? Testcases with self-contained source code
and compiler flags would be useful to analyse.

> If the compiler were to provide the autovectorization facilities, and if
> the patterns it recognizes were well-documented, users could rely on
> certain code patterns being recognized and autovectorized - sort of a
> contract between the user and the compiler. With a well-chosen spectrum
> of patterns, this would make it unnecessary to have to rely on explicit
> vectorization in many cases. My hope is that such an interface would
> help vectorization to become more frequently used - as I understand the
> status quo, this is still a niche topic, even though many processors
> provide suitable hardware nowadays.
>

I wouldn't say it's a niche topic :)
From my monitoring of GCC development over the last few years, there have been lots
of improvements in auto-vectorisation in compilers (at least in GCC).

The thing is, auto-vectorisation is not always profitable for performance.
Sometimes the runtime loop iteration count is so low that setting up the vectorised loop
(alignment checks, loads/permutes) is slower than just doing the scalar form,
especially since SIMD performance varies from CPU to CPU.
So we would want the compiler to have the freedom to make its own judgement on when
to auto-vectorise rather than enforce a "contract". If the user really only wants
vector code, they should use one of the explicit programming paradigms.
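
As one possible illustration of an explicit paradigm, the sketch below uses the
GCC vector extensions (also accepted by Clang). The function name scale_add,
the four-float vector width and the arithmetic are hypothetical choices made for
the example. The main loop always works on whole vectors, and a scalar tail
handles the remaining elements, so the decision to use vector code is made by
the programmer rather than by the compiler's cost model.

```c
#include <stddef.h>
#include <string.h>

/* GCC vector extension: a vector of four floats (16 bytes). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical example: a[i] = b[i] * k + b[i] for i in [0, n). */
void scale_add(float *restrict a, const float *restrict b, float k, size_t n)
{
    size_t i = 0;
    v4sf vk = {k, k, k, k};

    /* Vector body: four elements per iteration. memcpy is used for the
     * loads and stores so that no alignment is assumed. */
    for (; i + 4 <= n; i += 4) {
        v4sf vb;
        memcpy(&vb, b + i, sizeof vb);
        v4sf va = vb * vk + vb;
        memcpy(a + i, &va, sizeof va);
    }

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; i++)
        a[i] = b[i] * k + b[i];
}
```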

HTH,
Kyrill

> Can you point me to where 'the action is' in this regard?
>
> With regards
>
> Kay F. Jahnke
>
>
