public inbox for gcc@gcc.gnu.org
From: Richard Biener <richard.guenther@gmail.com>
To: Hanke Zhang <hkzhang455@gmail.com>
Cc: gcc@gcc.gnu.org
Subject: Re: How to make parallelizing loops and vectorization work at the same time?
Date: Fri, 15 Sep 2023 13:59:30 +0200
Message-ID: <CAFiYyc33W1V09dAOx-uLy1EAy+Ym=QnfAHU7LxYWe7ZgLXjfQw@mail.gmail.com>
In-Reply-To: <CAM_DAs_cejW=sLEmo4tzkcjh6_AqxtKuem5Sv7aKtM3e=DozvQ@mail.gmail.com>

On Fri, Sep 15, 2023 at 1:21 PM Hanke Zhang via Gcc <gcc@gcc.gnu.org> wrote:
>
> Hi, I'm trying to accelerate my program with -ftree-vectorize and
> -ftree-parallelize-loops.
>
> Here are my test results with different options (using GCC 10.3.0
> on an i9-12900KF):
> gcc-10 test.c -O3 -flto
> > time: 29000 ms
> gcc-10 test.c -O3 -flto -mavx2 -ftree-vectorize
> > time: 17000 ms
> gcc-10 test.c -O3 -flto -ftree-parallelize-loops=24
> > time: 5000 ms
> gcc-10 test.c -O3 -flto -ftree-parallelize-loops=24 -mavx2 -ftree-vectorize
> > time: 5000 ms
>

First of all, -O3 already enables -ftree-vectorize; adding -mavx2 is what
brings the first gain.  So adding -ftree-vectorize to the last command line
is not expected to change anything.  Instead, you can add -fno-tree-vectorize
to the second-to-last one.  Doing that, I get 111s vs. 41s, so doing both helps.
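
For anyone reproducing this comparison, a minimal timing sketch follows.
It is not part of the original test program: `now_ms`, `toffoli_kernel`
and `time_kernel_ms` are hypothetical names, and the kernel is a flattened
restatement of the quoted `quantum_toffoli` loop.  It measures wall-clock
time because CPU time from clock() typically sums over all threads of a
parallelized loop and would hide the gain.

```c
/* Feature-test macro so clock_gettime() is declared under strict -std=c99. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif

#include <stddef.h>
#include <time.h>

/* Wall-clock milliseconds from a monotonic clock (POSIX). */
static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Hypothetical flattened form of the quantum_toffoli loop: flip the
 * target bit of every state word that has both control bits set.
 * The masks are computed once, outside the loop. */
void toffoli_kernel(unsigned long long *state, size_t n,
                    int control1, int control2, int target) {
    const unsigned long long need = (1ULL << control1) | (1ULL << control2);
    const unsigned long long flip = 1ULL << target;
    for (size_t i = 0; i < n; i++) {
        if ((state[i] & need) == need)
            state[i] ^= flip;
    }
}

/* Run `reps` passes over the array and return the elapsed wall-clock
 * time in milliseconds.  Bit positions cycle through 0..63. */
double time_kernel_ms(unsigned long long *state, size_t n, int reps) {
    double start = now_ms();
    for (int r = 0; r < reps; r++)
        toffoli_kernel(state, n, r % 64, (r + 1) % 64, (r + 2) % 64);
    return now_ms() - start;
}
```

Building the same file once with `-O3 -mavx2 -ftree-parallelize-loops=24`
and once with `-fno-tree-vectorize` added, then comparing the values
returned by `time_kernel_ms`, isolates what the vectorizer contributes on
top of parallelization.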

Note that auto-parallelization (-ftree-parallelize-loops) hasn't seen any
development in recent years.

Richard.

> I found that these two options do not work at the same time. If I use
> `-ftree-vectorize` alone, it brings a big efficiency gain compared to
> doing nothing, and if I use `-ftree-parallelize-loops` alone, it also
> brings a big efficiency gain. But if I use both options, vectorization
> fails: I get only the benefit of parallelizing loops, not the benefit
> of vectorization.
>
> I know the reason may be that vectorization cannot be performed after
> the loop has been parallelized, but is there any way I can reap the
> benefits of both optimizations?
>
> Here is my example program, adapted from 462.libquantum in SPEC CPU2006:
>
> ```
> #include <stdio.h>
> #include <stdlib.h>
> #include <time.h>
>
> #define MAX_UNSIGNED unsigned long long
>
> struct quantum_reg_node_struct {
>     float _Complex *amplitude; /* alpha_j */
>     MAX_UNSIGNED *state;       /* j */
> };
>
> typedef struct quantum_reg_node_struct quantum_reg_node;
>
> struct quantum_reg_struct {
>     int width; /* number of qubits in the qureg */
>     int size;  /* number of non-zero vectors */
>     int hashw; /* width of the hash array */
>     quantum_reg_node *node;
>     int *hash;
> };
>
> typedef struct quantum_reg_struct quantum_reg;
>
> void quantum_toffoli(int control1, int control2, int target, quantum_reg *reg) {
>     for (int i = 0; i < reg->size; i++) {
>         if (reg->node->state[i] & ((MAX_UNSIGNED)1 << control1)) {
>             if (reg->node->state[i] & ((MAX_UNSIGNED)1 << control2))  {
>                 reg->node->state[i] ^= ((MAX_UNSIGNED)1 << target);
>             }
>         }
>     }
> }
>
> int get_random() {
>     return rand() % 64;
> }
>
> void init(quantum_reg *reg) {
>     reg->size = 2097152;
>     for (int i = 0; i < reg->size; i++)  {
>         reg->node = (quantum_reg_node *)malloc(sizeof(quantum_reg_node));
>         reg->node->state = (MAX_UNSIGNED *)malloc(sizeof(MAX_UNSIGNED) * reg->size);
>         reg->node->amplitude = (float _Complex *)malloc(sizeof(float _Complex) * reg->size);
>         if (i >= 1) break;
>     }
>     for (int i = 0; i < reg->size; i++)  {
>         reg->node->amplitude[i] = 0;
>         reg->node->state[i] = 0;
>     }
> }
>
> int main() {
>     quantum_reg reg;
>     init(&reg);
>     for (int i = 0; i < 65000; i++) {
>         quantum_toffoli(get_random(), get_random(), get_random(), &reg);
>     }
> }
> ```
>
> Thanks so much.
