* [x86 PATCH] Some additional ix86_rtx_costs clean-ups: NEG, AND and pandn.
@ 2022-05-18 17:30 Roger Sayle
2022-05-19 6:29 ` Uros Bizjak
0 siblings, 1 reply; 3+ messages in thread
From: Roger Sayle @ 2022-05-18 17:30 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1446 bytes --]
Hi Uros,
Very many thanks for the speedy review and approval of my ix86_rtx_costs
patch to correct the cost of multi-word multiplications. In the same
spirit, this patch tidies up a few additional nits I noticed while there:
Multi-word NOT requires two operations, but multi-word NEG requires
three operations. Using SSE, vector NOT requires a pxor with -1, but
AND of NOT is cheap thanks to the existence of pandn. There's also some
legacy (aka incorrect) logic explicitly testing for DImode [independently
of TARGET_64BIT] in determining the cost of logic operations that's not
required.
There should be no behavioural changes from this patch (as confirmed by
testing) but hopefully vectorization and other middle-end passes can now
rely on more sensible "cost" estimates.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}, with
no new failures. Ok for mainline?
2022-05-18 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.cc (ix86_rtx_costs): Split AND from XOR/IOR.
Multi-word binary logic operations require two instructions.
For vector integer modes, AND with a NOT operand requires only
a single instruction (pandn).
[NOT]: Vector integer NOT requires more than 1 instruction (pxor).
[NEG]: Multi-word negation requires 3 instructions.
Thanks in advance,
Roger
--
[-- Attachment #2: patchmc2.txt --]
[-- Type: text/plain, Size: 3123 bytes --]
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 86752a6..1701244 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20744,26 +20744,67 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
}
/* FALLTHRU */
- case AND:
case IOR:
case XOR:
if (GET_MODE_CLASS (mode) == MODE_INT
&& GET_MODE_SIZE (mode) > UNITS_PER_WORD)
{
- *total = (cost->add * 2
- + (rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
- << (GET_MODE (XEXP (x, 0)) != DImode))
- + (rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed)
- << (GET_MODE (XEXP (x, 1)) != DImode)));
+ *total = cost->add * 2
+ + rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
+ + rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed);
+ return true;
+ }
+ else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ *total = ix86_vec_cost (mode, cost->sse_op);
+ else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+ *total = cost->add * 2;
+ else
+ *total = cost->add;
+ return false;
+
+ case AND:
+ if (GET_MODE_CLASS (mode) == MODE_INT
+ && GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+ {
+ *total = cost->add * 2
+ + rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
+ + rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed);
return true;
}
- else if (code == AND
- && address_no_seg_operand (x, mode))
+ else if (address_no_seg_operand (x, mode))
{
*total = cost->lea;
return true;
}
- /* FALLTHRU */
+ else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ {
+ /* pandn is a single instruction. */
+ if (GET_CODE (XEXP (x, 0)) == NOT)
+ {
+ *total = ix86_vec_cost (mode, cost->sse_op)
+ + rtx_cost (XEXP (XEXP (x, 0), 0), mode,
+ outer_code, opno, speed)
+ + rtx_cost (XEXP (x, 1), mode,
+ outer_code, opno, speed);
+ return true;
+ }
+ *total = ix86_vec_cost (mode, cost->sse_op);
+ }
+ else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+ *total = cost->add * 2;
+ else
+ *total = cost->add;
+ return false;
+
+ case NOT:
+ if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ // vnot is pxor -1.
+ *total = ix86_vec_cost (mode, cost->sse_op) + 1;
+ else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+ *total = cost->add * 2;
+ else
+ *total = cost->add;
+ return false;
case NEG:
if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
@@ -20781,13 +20822,10 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
*total = ix86_vec_cost (mode, cost->sse_op);
return false;
}
- /* FALLTHRU */
-
- case NOT:
- if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
*total = ix86_vec_cost (mode, cost->sse_op);
else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
- *total = cost->add * 2;
+ *total = cost->add * 3;
else
*total = cost->add;
return false;
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [x86 PATCH] Some additional ix86_rtx_costs clean-ups: NEG, AND and pandn.
2022-05-18 17:30 [x86 PATCH] Some additional ix86_rtx_costs clean-ups: NEG, AND and pandn Roger Sayle
@ 2022-05-19 6:29 ` Uros Bizjak
2022-05-20 18:36 ` Roger Sayle
0 siblings, 1 reply; 3+ messages in thread
From: Uros Bizjak @ 2022-05-19 6:29 UTC (permalink / raw)
To: Roger Sayle; +Cc: gcc-patches
On Wed, May 18, 2022 at 7:30 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> Very many thanks for the speedy review and approval of my ix86_rtx_costs
> patch to correct the cost of multi-word multiplications. In the same
> spirit, this patch tidies up a few additional nits I noticed while there:
> Multi-word NOT requires two operations, but multi-word NEG requires
> three operations. Using SSE, vector NOT requires a pxor with -1, but
> AND of NOT is cheap thanks to the existence of pandn. There's also some
> legacy (aka incorrect) logic explicitly testing for DImode [independently
> of TARGET_64BIT] in determining the cost of logic operations that's not
> required.
>
> There should be no behavioural changes from this patch (as confirmed by
> testing) but hopefully vectorization and other middle-end passes can now
> rely on more sensible "cost" estimates.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}, with
> no new failures. Ok for mainline?
>
>
> 2022-05-18 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> * config/i386/i386.cc (ix86_rtx_costs): Split AND from XOR/IOR.
> Multi-word binary logic operations require two instructions.
> For vector integer modes, AND with a NOT operand requires only
> a single instruction (pandn).
> [NOT]: Vector integer NOT requires more than 1 instruction (pxor).
> [NEG]: Multi-word negation requires 3 instructions.
- case AND:
case IOR:
case XOR:
if (GET_MODE_CLASS (mode) == MODE_INT
&& GET_MODE_SIZE (mode) > UNITS_PER_WORD)
{
- *total = (cost->add * 2
- + (rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
- << (GET_MODE (XEXP (x, 0)) != DImode))
- + (rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed)
- << (GET_MODE (XEXP (x, 1)) != DImode)));
+ *total = cost->add * 2
+ + rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
+ + rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed);
+ return true;
+ }
+ else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ *total = ix86_vec_cost (mode, cost->sse_op);
+ else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+ *total = cost->add * 2;
+ else
+ *total = cost->add;
+ return false;
Shouldn't this be just:
if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
*total = ix86_vec_cost (mode, cost->sse_op);
else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
*total = cost->add * 2;
else
*total = cost->add;
return false;
When returning false, subexpressions will be scanned automatically.
+ else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ {
+ /* pandn is a single instruction. */
+ if (GET_CODE (XEXP (x, 0)) == NOT)
pandn is also a single instruction with BMI. IMO, this should be also
reflected in RTX costs.
Uros.
^ permalink raw reply [flat|nested] 3+ messages in thread
* RE: [x86 PATCH] Some additional ix86_rtx_costs clean-ups: NEG, AND and pandn.
2022-05-19 6:29 ` Uros Bizjak
@ 2022-05-20 18:36 ` Roger Sayle
0 siblings, 0 replies; 3+ messages in thread
From: Roger Sayle @ 2022-05-20 18:36 UTC (permalink / raw)
To: 'Uros Bizjak'; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 4605 bytes --]
Hi Uros,
Many thanks for the review. As requested here is a revised version of the
patch that's slightly more aggressive in its clean-ups (duplicating 6 lines
of code allowed me to eliminate 9 lines of code), but most significantly
also includes support for the andn for TARGET_BMI, and allows the NOT
operand of andn/pandn to appear as either the first or second operand
(combine/reload/output will swap them as appropriate).
Once again this patch has been tested on x86_64-pc-linux-gnu with
make bootstrap and make -k check, with and without -m32, with no
new failures. Ok for mainline?
2022-05-20 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.cc (ix86_rtx_costs): Split AND from XOR/IOR.
Multi-word binary logic operations require two instructions.
For vector integer modes, AND with a NOT operand requires only
a single instruction (pandn). Likewise, integer andn with -mbmi.
[NOT]: Vector integer NOT requires more than 1 instruction (pxor).
[NEG]: Multi-word negation requires 3 instructions.
Thanks in advance,
Roger
> -----Original Message-----
> From: Uros Bizjak <ubizjak@gmail.com>
> Sent: 19 May 2022 07:29
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] Some additional ix86_rtx_costs clean-ups: NEG, AND
> and pandn.
>
> On Wed, May 18, 2022 at 7:30 PM Roger Sayle
> <roger@nextmovesoftware.com> wrote:
> >
> >
> > Hi Uros,
> > Very many thanks for the speedy review and approval of my
> > ix86_rtx_costs patch to correct the cost of multi-word
> > multiplications. In the same spirit, this patch tidies up a few additional nits I
> noticed while there:
> > Multi-word NOT requires two operations, but multi-word NEG requires
> > three operations. Using SSE, vector NOT requires a pxor with -1, but
> > AND of NOT is cheap thanks to the existence of pandn. There's also
> > some legacy (aka incorrect) logic explicitly testing for DImode
> > [independently of TARGET_64BIT] in determining the cost of logic
> > operations that's not required.
> >
> > There should be no behavioural changes from this patch (as confirmed
> > by
> > testing) but hopefully vectorization and other middle-end passes can
> > now rely on more sensible "cost" estimates.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures. Ok for mainline?
> >
> >
> > 2022-05-18 Roger Sayle <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> > * config/i386/i386.cc (ix86_rtx_costs): Split AND from XOR/IOR.
> > Multi-word binary logic operations require two instructions.
> > For vector integer modes, AND with a NOT operand requires only
> > a single instruction (pandn).
> > [NOT]: Vector integer NOT requires more than 1 instruction (pxor).
> > [NEG]: Multi-word negation requires 3 instructions.
>
> - case AND:
> case IOR:
> case XOR:
> if (GET_MODE_CLASS (mode) == MODE_INT
> && GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> {
> - *total = (cost->add * 2
> - + (rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
> - << (GET_MODE (XEXP (x, 0)) != DImode))
> - + (rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed)
> - << (GET_MODE (XEXP (x, 1)) != DImode)));
> + *total = cost->add * 2
> + + rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
> + + rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed);
> + return true;
> + }
> + else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> + *total = ix86_vec_cost (mode, cost->sse_op);
> + else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> + *total = cost->add * 2;
> + else
> + *total = cost->add;
> + return false;
>
> Shouldn't this be just:
>
> if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> *total = ix86_vec_cost (mode, cost->sse_op); else if (GET_MODE_SIZE (mode)
> > UNITS_PER_WORD)
> *total = cost->add * 2;
> else
> *total = cost->add;
> return false;
>
> When returning false, subexpressions will be scanned automatically.
>
> + else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> + {
> + /* pandn is a single instruction. */
> + if (GET_CODE (XEXP (x, 0)) == NOT)
>
> pandn is also a single instruction with BMI. IMO, this should be also reflected in
> RTX costs.
>
> Uros.
[-- Attachment #2: patchmc3.txt --]
[-- Type: text/plain, Size: 5122 bytes --]
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 30a9cd0..daa60ac 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20738,70 +20738,125 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
}
if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
- {
- *total = cost->addss;
- return false;
- }
+ *total = cost->addss;
else if (X87_FLOAT_MODE_P (mode))
- {
- *total = cost->fadd;
- return false;
- }
+ *total = cost->fadd;
else if (FLOAT_MODE_P (mode))
- {
- *total = ix86_vec_cost (mode, cost->addss);
- return false;
- }
- /* FALLTHRU */
+ *total = ix86_vec_cost (mode, cost->addss);
+ else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ *total = ix86_vec_cost (mode, cost->sse_op);
+ else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+ *total = cost->add * 2;
+ else
+ *total = cost->add;
+ return false;
- case AND:
case IOR:
case XOR:
- if (GET_MODE_CLASS (mode) == MODE_INT
- && GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+ if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ *total = ix86_vec_cost (mode, cost->sse_op);
+ else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+ *total = cost->add * 2;
+ else
+ *total = cost->add;
+ return false;
+
+ case AND:
+ if (address_no_seg_operand (x, mode))
{
- *total = (cost->add * 2
- + (rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
- << (GET_MODE (XEXP (x, 0)) != DImode))
- + (rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed)
- << (GET_MODE (XEXP (x, 1)) != DImode)));
+ *total = cost->lea;
return true;
}
- else if (code == AND
- && address_no_seg_operand (x, mode))
+ else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
{
- *total = cost->lea;
- return true;
+ /* pandn is a single instruction. */
+ if (GET_CODE (XEXP (x, 0)) == NOT)
+ {
+ *total = ix86_vec_cost (mode, cost->sse_op)
+ + rtx_cost (XEXP (XEXP (x, 0), 0), mode,
+ outer_code, opno, speed)
+ + rtx_cost (XEXP (x, 1), mode,
+ outer_code, opno, speed);
+ return true;
+ }
+ else if (GET_CODE (XEXP (x, 1)) == NOT)
+ {
+ *total = ix86_vec_cost (mode, cost->sse_op)
+ + rtx_cost (XEXP (x, 0), mode,
+ outer_code, opno, speed)
+ + rtx_cost (XEXP (XEXP (x, 1), 0), mode,
+ outer_code, opno, speed);
+ return true;
+ }
+ *total = ix86_vec_cost (mode, cost->sse_op);
}
- /* FALLTHRU */
-
- case NEG:
- if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
+ else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
{
- *total = cost->sse_op;
- return false;
+ if (TARGET_BMI && GET_CODE (XEXP (x,0)) == NOT)
+ {
+ *total = cost->add * 2
+ + rtx_cost (XEXP (XEXP (x, 0), 0), mode,
+ outer_code, opno, speed)
+ + rtx_cost (XEXP (x, 1), mode,
+ outer_code, opno, speed);
+ return true;
+ }
+ else if (TARGET_BMI && GET_CODE (XEXP (x, 1)) == NOT)
+ {
+ *total = cost->add * 2
+ + rtx_cost (XEXP (x, 0), mode,
+ outer_code, opno, speed)
+ + rtx_cost (XEXP (XEXP (x, 1), 0), mode,
+ outer_code, opno, speed);
+ return true;
+ }
+ *total = cost->add * 2;
}
- else if (X87_FLOAT_MODE_P (mode))
+ else if (TARGET_BMI && GET_CODE (XEXP (x,0)) == NOT)
{
- *total = cost->fchs;
- return false;
+ *total = cost->add
+ + rtx_cost (XEXP (XEXP (x, 0), 0), mode,
+ outer_code, opno, speed)
+ + rtx_cost (XEXP (x, 1), mode, outer_code, opno, speed);
+ return true;
}
- else if (FLOAT_MODE_P (mode))
+ else if (TARGET_BMI && GET_CODE (XEXP (x,1)) == NOT)
{
- *total = ix86_vec_cost (mode, cost->sse_op);
- return false;
+ *total = cost->add
+ + rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed)
+ + rtx_cost (XEXP (XEXP (x, 1), 0), mode,
+ outer_code, opno, speed);
+ return true;
}
- /* FALLTHRU */
+ else
+ *total = cost->add;
+ return false;
case NOT:
if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
- *total = ix86_vec_cost (mode, cost->sse_op);
+ // vnot is pxor -1.
+ *total = ix86_vec_cost (mode, cost->sse_op) + 1;
else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
*total = cost->add * 2;
else
*total = cost->add;
return false;
+ case NEG:
+ if (SSE_FLOAT_MODE_SSEMATH_OR_HF_P (mode))
+ *total = cost->sse_op;
+ else if (X87_FLOAT_MODE_P (mode))
+ *total = cost->fchs;
+ else if (FLOAT_MODE_P (mode))
+ *total = ix86_vec_cost (mode, cost->sse_op);
+ else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ *total = ix86_vec_cost (mode, cost->sse_op);
+ else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+ *total = cost->add * 3;
+ else
+ *total = cost->add;
+ return false;
+
case COMPARE:
rtx op0, op1;
op0 = XEXP (x, 0);
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-05-20 18:36 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-18 17:30 [x86 PATCH] Some additional ix86_rtx_costs clean-ups: NEG, AND and pandn Roger Sayle
2022-05-19 6:29 ` Uros Bizjak
2022-05-20 18:36 ` Roger Sayle
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).