From: Hanke Zhang
Date: Fri, 15 Sep 2023 19:20:35 +0800
Subject: How to make parallelizing loops and vectorization work at the same time?
To: gcc@gcc.gnu.org

Hi,

I'm trying to speed up my program with -ftree-vectorize and -ftree-parallelize-loops. Here are my timings with the different options (GCC 10.3.0 on an i9-12900KF):

gcc-10 test.c -O3 -flto
> time: 29000 ms

gcc-10 test.c -O3 -flto -mavx2 -ftree-vectorize
> time: 17000 ms

gcc-10 test.c -O3 -flto -ftree-parallelize-loops=24
> time: 5000 ms

gcc-10 test.c -O3 -flto -ftree-parallelize-loops=24 -mavx2 -ftree-vectorize
> time: 5000 ms

The two optimizations do not seem to combine. Vectorization alone (`-mavx2 -ftree-vectorize`) gives a big speedup over the baseline, and `-ftree-parallelize-loops` alone gives an even bigger one. With both options, however, the vectorization gain disappears: I only get the benefit of the parallelized loop. (Since -O3 already enables -ftree-vectorize, the gain in the second run presumably comes from -mavx2 allowing 256-bit AVX2 vectors.)

I suspect the reason is that the loop can no longer be vectorized once it has been parallelized, but is there any way to get the benefits of both optimizations?
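One possible way to request both transformations explicitly, rather than relying on -ftree-parallelize-loops, is OpenMP's combined `parallel for simd` construct: it splits the iterations across threads and asks the compiler to vectorize each thread's chunk. This is a swapped-in technique (OpenMP, built with -fopenmp), not something -ftree-parallelize-loops does. The names `toffoli_omp` and `u64` are hypothetical, the sketch works on a plain array instead of `quantum_reg`, and whether GCC 10 actually vectorizes each chunk is an assumption to check with -fopt-info-vec.

```c
/* u64 is a hypothetical alias for MAX_UNSIGNED (unsigned long long). */
typedef unsigned long long u64;

/* Hypothetical helper: the quantum_toffoli loop over a plain state array,
 * annotated for OpenMP. Assumed build: gcc-10 -O3 -mavx2 -fopenmp.
 * Without -fopenmp the pragma is ignored and the loop runs serially. */
void toffoli_omp(u64 *state, int size, int control1, int control2, int target)
{
    const u64 c1 = (u64)1 << control1;
    const u64 c2 = (u64)1 << control2;
    const u64 t  = (u64)1 << target;

    /* "parallel for" distributes iterations across threads;
       "simd" asks for each thread's chunk to be vectorized. */
#pragma omp parallel for simd
    for (int i = 0; i < size; i++) {
        /* Flip the target bit when both control bits are set. */
        if ((state[i] & c1) && (state[i] & c2))
            state[i] ^= t;
    }
}
```

The thread count can then be set at run time through the `OMP_NUM_THREADS` environment variable (for example 24), instead of being fixed at compile time.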
Here is my example program, adapted from 462.libquantum in SPEC CPU2006:

```
#include <stdlib.h> /* malloc, rand */

#define MAX_UNSIGNED unsigned long long

struct quantum_reg_node_struct {
  float _Complex *amplitude; /* alpha_j */
  MAX_UNSIGNED *state;       /* j */
};
typedef struct quantum_reg_node_struct quantum_reg_node;

struct quantum_reg_struct {
  int width; /* number of qubits in the qureg */
  int size;  /* number of non-zero vectors */
  int hashw; /* width of the hash array */
  quantum_reg_node *node;
  int *hash;
};
typedef struct quantum_reg_struct quantum_reg;

/* Hot loop: flip the target bit of every basis state whose
   control1 and control2 bits are both set. */
void quantum_toffoli(int control1, int control2, int target, quantum_reg *reg) {
  for (int i = 0; i < reg->size; i++) {
    if (reg->node->state[i] & ((MAX_UNSIGNED)1 << control1)) {
      if (reg->node->state[i] & ((MAX_UNSIGNED)1 << control2)) {
        reg->node->state[i] ^= ((MAX_UNSIGNED)1 << target);
      }
    }
  }
}

/* Random bit index in [0, 63]. */
int get_random() { return rand() % 64; }

void init(quantum_reg *reg) {
  reg->size = 2097152;
  for (int i = 0; i < reg->size; i++) {
    reg->node = (quantum_reg_node *)malloc(sizeof(quantum_reg_node));
    reg->node->state = (MAX_UNSIGNED *)malloc(sizeof(MAX_UNSIGNED) * reg->size);
    reg->node->amplitude = (float _Complex *)malloc(sizeof(float _Complex) * reg->size);
    if (i >= 1)
      break;
  }
  for (int i = 0; i < reg->size; i++) {
    reg->node->amplitude[i] = 0;
    reg->node->state[i] = 0;
  }
}

int main() {
  quantum_reg reg;
  init(&reg);
  /* Apply 65000 Toffoli gates with random control and target bits. */
  for (int i = 0; i < 65000; i++) {
    quantum_toffoli(get_random(), get_random(), get_random(), &reg);
  }
}
```

Thanks so much.
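For comparison, the nested `if`s in `quantum_toffoli` end in a conditional store, which the vectorizer has to if-convert (for example into masked stores). A branch-free form with the same semantics stores on every iteration and may be simpler for the vectorizer to handle, including after the loop is parallelized; that benefit is an untested assumption, best checked by comparing -fopt-info-vec output. The names `toffoli_branchless` and `u64` are hypothetical.

```c
/* u64 is a hypothetical alias for MAX_UNSIGNED (unsigned long long). */
typedef unsigned long long u64;

/* Hypothetical branch-free variant of the quantum_toffoli loop. */
void toffoli_branchless(u64 *state, int size, int control1, int control2, int target)
{
    /* Both control bits in one mask; if control1 == control2 this is a
       single bit, which matches the original nested ifs. */
    const u64 controls = ((u64)1 << control1) | ((u64)1 << control2);
    const u64 flip = (u64)1 << target;

    for (int i = 0; i < size; i++) {
        u64 s = state[i];
        /* XOR with flip only when every control bit is set, otherwise
           with 0; the store is unconditional, so no branch remains. */
        state[i] = s ^ (((s & controls) == controls) ? flip : 0);
    }
}
```

Because the test reads the old value `s` before the XOR, the result is also identical to the original when `target` equals one of the control bits.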