public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/102054] New: slightly worse code as PRE on some code got disabled for loop vectorization
@ 2021-08-25 7:14 linkw at gcc dot gnu.org
2021-08-25 7:18 ` [Bug tree-optimization/102054] " linkw at gcc dot gnu.org
2021-09-13 6:20 ` linkw at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: linkw at gcc dot gnu.org @ 2021-08-25 7:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102054
Bug ID: 102054
Summary: slightly worse code as PRE on some code got disabled
for loop vectorization
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: linkw at gcc dot gnu.org
Target Milestone: ---
This test case is reduced from FastBoard.cpp in the SPEC2017 benchmark
541.leela_r. I found it while investigating the O2 vectorization degradation in
SPEC2017 runs. The issue is similar to PR100794, but that one only applies at O2
and is fixed by re-running pcom at O2, whereas this one applies to O3
vectorization as well.
TEST CASE:
class FastBoard {
public:
  static const int NBR_SHIFT = 4;
  static const int MAXBOARDSIZE = 19;
  static const int MAXSQ = ((MAXBOARDSIZE + 2) * (MAXBOARDSIZE + 2));
  enum square_t {
    BLACK = 0, WHITE = 1, EMPTY = 2, INVAL = 3
  };
  bool self_atari(int color, int vertex);
protected:
  int m_dirs[4];
  square_t m_square[MAXSQ];
  int nbr_libs[20];
};

bool FastBoard::self_atari(int color, int vertex) {
  int nbr_libs_cnt = 0;
  nbr_libs[nbr_libs_cnt++] = vertex;
  for (int k = 0; k < 20; k++) {
    int ai = vertex + m_dirs[k];
    if (m_square[ai] == FastBoard::EMPTY) {
      bool found = false;
      for (int i = 0; i < nbr_libs_cnt; i++) {
        if (nbr_libs[i] == ai) {
          found = true;
          break;
        }
      }
      if (!found) {
        if (nbr_libs_cnt > 1)
          return false;
        nbr_libs[nbr_libs_cnt++] = ai;
      }
    }
  }
  return true;
}
Options: -mcpu=power9 -Ofast (or -O2 -ftree-vectorize) etc.
With -fno-tree-loop-vectorize, PRE passes vertex_11 down as the value of
nbr_libs[0], so the first comparison in the inner loop needs no load:
  <bb 3> [local count: 1014686026]:
  # prephitmp_26 = PHI <pretmp_28(5), vertex_11(D)(10)>
  # ivtmp.17_27 = PHI <ivtmp.17_3(5), ivtmp.17_8(10)>
  if (ai_15 == prephitmp_26)
    goto <bb 8>; [5.50%]
  else
    goto <bb 4>; [94.50%]

  <bb 4> [local count: 958878295]:
  if (ivtmp.17_27 != _31)
    goto <bb 5>; [93.84%]
  else
    goto <bb 11>; [6.16%]

  <bb 5> [local count: 899822494]:
  ivtmp.17_3 = ivtmp.17_27 + 4;
  _21 = (void *) ivtmp.17_3;
  pretmp_28 = MEM[(int *)_21];
  goto <bb 3>; [100.00%]
Without -fno-tree-loop-vectorize, the IR looks like the following instead, and
the load is always done before the comparison with ai:
  <bb 4> [local count: 1014686026]:
  # ivtmp.12_27 = PHI <ivtmp.12_28(5), ivtmp.12_26(3)>
  ivtmp.12_28 = ivtmp.12_27 + 4;
  _22 = (void *) ivtmp.12_28;
  _3 = MEM[(int *)_22];
  if (_3 == ai_15)
    goto <bb 8>; [5.50%]
  else
    goto <bb 5>; [94.50%]

  <bb 5> [local count: 958878295]:
  if (ivtmp.12_28 != _30)
    goto <bb 4>; [93.84%]
  else
    goto <bb 10>; [6.16%]
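
As a rough source-level sketch of the difference between the two dumps (this is
not GCC output; both function names are hypothetical, and the second form
assumes the list holds at least one element, which is true in the test case
because nbr_libs_cnt starts at 1):

```cpp
#include <cassert>

// Hypothetical helper mirroring the dump with vectorization enabled:
// every iteration, including the first, loads libs[i] before the compare.
static bool found_with_reload(const int *libs, int cnt, int ai) {
    for (int i = 0; i < cnt; i++) {
        int v = libs[i];  // load precedes the comparison
        if (v == ai)
            return true;
    }
    return false;
}

// Hypothetical helper mirroring the -fno-tree-loop-vectorize dump:
// the caller already knows libs[0] (it has just stored vertex there),
// so that value is passed in as `first` and the load moves to the latch.
// Assumption: cnt >= 1.
static bool found_with_forwarding(const int *libs, int cnt, int first, int ai) {
    int cur = first;  // known value, no load needed for the first compare
    for (int i = 0;;) {
        if (cur == ai)
            return true;
        if (++i == cnt)
            return false;
        cur = libs[i];  // load only when another iteration will run
    }
}
```

Both helpers return the same answer as long as `first` equals `libs[0]`; the
second simply avoids the load on the first trip through the loop.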
* [Bug tree-optimization/102054] slightly worse code as PRE on some code got disabled for loop vectorization
From: linkw at gcc dot gnu.org @ 2021-08-25 7:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102054
Kewen Lin <linkw at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com,
                   |                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org,
                   |                            |segher at gcc dot gnu.org,
                   |                            |wschmidt at gcc dot gnu.org
           Keywords|                            |missed-optimization
--- Comment #1 from Kewen Lin <linkw at gcc dot gnu.org> ---
I forgot to mention that this only affects 541.leela_r by 0.3%, so I guess it
is low priority.
* [Bug tree-optimization/102054] slightly worse code as PRE on some code got disabled for loop vectorization
From: linkw at gcc dot gnu.org @ 2021-09-13 6:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102054
--- Comment #2 from Kewen Lin <linkw at gcc dot gnu.org> ---
Yet another reduced test case from 526.blender_r.
#include <math.h>

typedef struct QMCSampler {
  struct QMCSampler *next, *prev;
  int type;
  int tot;
  int used;
  double *samp2d;
  double offs[1][2];
} QMCSampler;

float BLI_thread_frand(int thread);

static void halton_sample(double *ht_invprimes, double *ht_nums, double *v) {
  unsigned int i;
  for (i = 0; i < 2; i++) {
    double r = fabs((1.0 - ht_nums[i]) - 1e-10);
    if (ht_invprimes[i] >= r) {
      double lasth;
      double h = ht_invprimes[i];
      do {
        lasth = h;
        h *= ht_invprimes[i];
      } while (h >= r);
      ht_nums[i] += ((lasth + h) - 1.0);
    } else
      ht_nums[i] += ht_invprimes[i];
    v[i] = (float)ht_nums[i];
  }
}

void QMC_initPixel(QMCSampler *qsa, int thread) {
  if (qsa->type == 2) {
    qsa->offs[thread][0] = 0.5f * BLI_thread_frand(thread);
    qsa->offs[thread][1] = 0.5f * BLI_thread_frand(thread);
  } else {
    double ht_invprimes[2], ht_nums[2];
    double r[2];
    int i;
    ht_nums[0] = BLI_thread_frand(thread);
    ht_nums[1] = BLI_thread_frand(thread);
    ht_invprimes[0] = 0.5;
    ht_invprimes[1] = 1.0 / 3.0;
    for (i = 0; i < qsa->tot; i++) {
      halton_sample(ht_invprimes, ht_nums, r);
      qsa->samp2d[2 * i + 0] = r[0];
      qsa->samp2d[2 * i + 1] = r[1];
    }
  }
}
Without loop vectorization, unrestricted PRE leaves the loop in a form that
cunroll can handle, and the loop is completely unrolled. The affected
percentage is also small, about 0.7%.
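
To illustrate what complete unrolling means for the two-iteration loop in
halton_sample, here is a hand-written sketch. It is only a source-level
analogy, not what cunroll emits; halton_step, halton_sample_rolled and
halton_sample_unrolled are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the body of one iteration of the loop in
// halton_sample, factored out so both variants below can share it.
static double halton_step(double invprime, double &num) {
    double r = std::fabs((1.0 - num) - 1e-10);
    if (invprime >= r) {
        double lasth;
        double h = invprime;
        do {
            lasth = h;
            h *= invprime;
        } while (h >= r);
        num += ((lasth + h) - 1.0);
    } else {
        num += invprime;
    }
    return (float)num;  // same float rounding as the test case
}

// Rolled form: the i = 0..1 loop as written in the test case.
static void halton_sample_rolled(const double *inv, double *nums, double *v) {
    for (unsigned int i = 0; i < 2; i++)
        v[i] = halton_step(inv[i], nums[i]);
}

// Completely unrolled form: no induction variable and no loop branch,
// only the two iterations written out one after the other.
static void halton_sample_unrolled(const double *inv, double *nums, double *v) {
    v[0] = halton_step(inv[0], nums[0]);
    v[1] = halton_step(inv[1], nums[1]);
}
```

The two variants compute the same values; the unrolled one just drops the
counter and the back edge of the outer loop.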