* Re: [PATCH, Atom] Improve AGU stalls avoidance optimization
@ 2011-09-03 14:25 Uros Bizjak
2011-09-06 12:26 ` Ilya Enkovich
0 siblings, 1 reply; 6+ messages in thread
From: Uros Bizjak @ 2011-09-03 14:25 UTC (permalink / raw)
To: gcc-patches; +Cc: Ilya Enkovich
... Sent again, with correct Cc and subject line ...
Hello!
> Here is a patch which adds few more splits for AGU stalls avoidance on
> Atom. It also fixes cost model and detects AGU stalls more
> efficiently.
>
> Bootstrapped and checked on x86_64-linux.
>
> 2011-09-02 Enkovich Ilya <ilya.enkovich@intel.com>
>
> * config/i386/i386-protos.h (ix86_lea_outperforms): New.
> (ix86_avoid_lea_for_add): Likewise.
> (ix86_avoid_lea_for_addr): Likewise.
> (ix86_split_lea_for_addr): Likewise.
>
> * config/i386/i386.c (LEA_MAX_STALL): New.
> (increase_distance): Likewise.
> (insn_defines_reg): Likewise.
> (insn_uses_reg_mem): Likewise.
> (distance_non_agu_define_in_bb): Likewise.
> (distance_agu_use_in_bb): Likewise.
> (ix86_lea_outperforms): Likewise.
> (ix86_ok_to_clobber_flags): Likewise.
> (ix86_avoid_lea_for_add): Likewise.
> (ix86_avoid_lea_for_addr): Likewise.
> (ix86_split_lea_for_addr): Likewise.
> (distance_non_agu_define): Search in pred BBs added.
> (distance_agu_use): Search in succ BBs added.
> (IX86_LEA_PRIORITY): Value changed from 2 to 0.
> (LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
> (ix86_lea_for_add_ok): Use ix86_lea_outperforms to make decision.
>
> * config/i386/i386.md: Splits added to transform lea into a
> sequence of instructions.
Did you also test on x32 ? H.J.'s x32 page [1] currently says that
Atom LEA optimization is disabled on x32 for some reason.
The patch looks OK to me, with a few nits below.
[1] https://sites.google.com/site/x32abi/
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
+bool
+ix86_avoid_lea_for_addr (rtx insn, rtx operands[])
+{
+ unsigned int regno0 = true_regnum (operands[0]) ;
+ unsigned int regno1 = -1;
+ unsigned int regno2 = -1;
Use INVALID_REGNUM here.
+extern void
+ix86_split_lea_for_addr (rtx operands[], enum machine_mode mode)
+{
Missing comment.
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -5777,6 +5777,41 @@
(const_string "none")))
(set_attr "mode" "QI")])
+;; Split non destructive adds if we cannot use lea.
+(define_split
+ [(set (match_operand:SWI48 0 "register_operand" "")
+ (plus:SWI48 (match_operand:SWI48 1 "register_operand" "")
+ (match_operand:SWI48 2 "nonmemory_operand" "")))
+ (clobber (reg:CC FLAGS_REG))]
+ "reload_completed && ix86_avoid_lea_for_add (insn, operands)"
+ [(set (match_dup 0) (match_dup 1))
+ (parallel [(set (match_dup 0) (plus:<MODE> (match_dup 0) (match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])
+ ]
+)
Put all closing braces on one line:
(clobber (reg:CC FLAGS_REG))])])
+;; Split lea into one or more ALU instructions if profitable.
+(define_split
+ [(set (match_operand:SI 0 "register_operand" "")
+ (subreg:SI (match_operand:DI 1 "lea_address_operand" "") 0))]
+ "reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
+ [(const_int 0)]
+{
+ ix86_split_lea_for_addr (operands, SImode);
+ DONE;
+})
This is valid only for TARGET_64BIT.
Please note that x32 adds quite some different LEA patterns (see
i386.md, line 5466+). I suggest you merge your splitters with these
define_insn patterns into define_insn_and_split, adding "&&
reload_completed && ix86_avoid_lea_for_addr (insn, operands)" as a
split condition.
Uros.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH, Atom] Improve AGU stalls avoidance optimization
2011-09-03 14:25 [PATCH, Atom] Improve AGU stalls avoidance optimization Uros Bizjak
@ 2011-09-06 12:26 ` Ilya Enkovich
2011-09-06 15:43 ` Uros Bizjak
0 siblings, 1 reply; 6+ messages in thread
From: Ilya Enkovich @ 2011-09-06 12:26 UTC (permalink / raw)
To: Uros Bizjak; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 3227 bytes --]
Hello,
Thanks for review!
2011/9/3 Uros Bizjak <ubizjak@gmail.com>:
> Did you also test on x32 ? H.J.'s x32 page [1] currently says that
> Atom LEA optimization is disabled on x32 for some reason.
No. I did not try to cover x32. It will be a separate work.
> +bool
> +ix86_avoid_lea_for_addr (rtx insn, rtx operands[])
> +{
> + unsigned int regno0 = true_regnum (operands[0]) ;
> + unsigned int regno1 = -1;
> + unsigned int regno2 = -1;
>
> Use INVALID_REGNUM here.
Fixed. Also used INVALID_REGNUM in other places where -1 was used as
invalid register number.
>
> +extern void
> +ix86_split_lea_for_addr (rtx operands[], enum machine_mode mode)
> +{
>
> Missing comment.
Fixed
> +;; Split non destructive adds if we cannot use lea.
> +(define_split
> + [(set (match_operand:SWI48 0 "register_operand" "")
> + (plus:SWI48 (match_operand:SWI48 1 "register_operand" "")
> + (match_operand:SWI48 2 "nonmemory_operand" "")))
> + (clobber (reg:CC FLAGS_REG))]
> + "reload_completed && ix86_avoid_lea_for_add (insn, operands)"
> + [(set (match_dup 0) (match_dup 1))
> + (parallel [(set (match_dup 0) (plus:<MODE> (match_dup 0) (match_dup 2)))
> + (clobber (reg:CC FLAGS_REG))])
> + ]
> +)
>
> Put all closing braces on one line:
Fixed.
> +;; Split lea into one or more ALU instructions if profitable.
> +(define_split
> + [(set (match_operand:SI 0 "register_operand" "")
> + (subreg:SI (match_operand:DI 1 "lea_address_operand" "") 0))]
> + "reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
> + [(const_int 0)]
> +{
> + ix86_split_lea_for_addr (operands, SImode);
> + DONE;
> +})
>
> This is valid only for TARGET_64BIT.
Fixed.
> Please note that x32 adds quite some different LEA patterns (see
> i386.md, line 5466+). I suggest you merge your splitters with these
> define_insn patterns into define_insn_and_split, adding "&&
> reload_completed && ix86_avoid_lea_for_addr (insn, operands)" as a
> split condition.
Thanks for the note. I'll look at new patterns when we enable lea
optimization for x32.
>
> Uros.
>
Is fixed version OK?
Thanks,
Ilya
---
gcc/
2011-09-06 Enkovich Ilya <ilya.enkovich@intel.com>
* config/i386/i386-protos.h (ix86_lea_outperforms): New.
(ix86_avoid_lea_for_add): Likewise.
(ix86_avoid_lea_for_addr): Likewise.
(ix86_split_lea_for_addr): Likewise.
* config/i386/i386.c (LEA_MAX_STALL): New.
(increase_distance): Likewise.
(insn_defines_reg): Likewise.
(insn_uses_reg_mem): Likewise.
(distance_non_agu_define_in_bb): Likewise.
(distance_agu_use_in_bb): Likewise.
(ix86_lea_outperforms): Likewise.
(ix86_ok_to_clobber_flags): Likewise.
(ix86_avoid_lea_for_add): Likewise.
(ix86_avoid_lea_for_addr): Likewise.
(ix86_split_lea_for_addr): Likewise.
(distance_non_agu_define): Search in pred BBs added.
(distance_agu_use): Search in succ BBs added.
(IX86_LEA_PRIORITY): Value changed from 2 to 0.
(LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
(ix86_lea_for_add_ok): Use ix86_lea_outperforms to make decision.
* config/i386/i386.md: Splits added to transform lea into a
sequence of instructions.
[-- Attachment #2: lea.diff --]
[-- Type: application/octet-stream, Size: 24378 bytes --]
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 7deeae7..900d1c5 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -90,6 +90,11 @@ extern void ix86_fixup_binary_operands_no_copy (enum rtx_code,
extern void ix86_expand_binary_operator (enum rtx_code,
enum machine_mode, rtx[]);
extern bool ix86_binary_operator_ok (enum rtx_code, enum machine_mode, rtx[]);
+extern bool ix86_lea_outperforms (rtx, unsigned int, unsigned int,
+ unsigned int, unsigned int);
+extern bool ix86_avoid_lea_for_add (rtx, rtx[]);
+extern bool ix86_avoid_lea_for_addr (rtx, rtx[]);
+extern void ix86_split_lea_for_addr (rtx[], enum machine_mode);
extern bool ix86_lea_for_add_ok (rtx, rtx[]);
extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a9c0aa7..660e8c0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15955,12 +15955,125 @@ ix86_split_idivmod (enum machine_mode mode, rtx operands[],
emit_label (end_label);
}
-#define LEA_SEARCH_THRESHOLD 12
+#define LEA_MAX_STALL (3)
+#define LEA_SEARCH_THRESHOLD (LEA_MAX_STALL << 1)
+
+/* Increase given DISTANCE in half-cycles according to
+ dependencies between PREV and NEXT instructions.
+ Add 1 half-cycle if there is no dependency and
+ go to next cycle if there is some dependecy. */
+
+static unsigned int
+increase_distance (rtx prev, rtx next, unsigned int distance)
+{
+ df_ref *use_rec;
+ df_ref *def_rec;
+
+ if (!prev || !next)
+ return distance + (distance & 1) + 2;
+
+ if (!DF_INSN_USES (next) || !DF_INSN_DEFS (prev))
+ return distance + 1;
+
+ for (use_rec = DF_INSN_USES (next); *use_rec; use_rec++)
+ for (def_rec = DF_INSN_DEFS (prev); *def_rec; def_rec++)
+ if (!DF_REF_IS_ARTIFICIAL (*def_rec)
+ && DF_REF_REGNO (*use_rec) == DF_REF_REGNO (*def_rec))
+ return distance + (distance & 1) + 2;
+
+ return distance + 1;
+}
+
+/* Function checks if instruction INSN defines register number
+ REGNO1 or REGNO2. */
+
+static bool
+insn_defines_reg (unsigned int regno1, unsigned int regno2,
+ rtx insn)
+{
+ df_ref *def_rec;
+
+ for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+ if (DF_REF_REG_DEF_P (*def_rec)
+ && !DF_REF_IS_ARTIFICIAL (*def_rec)
+ && (regno1 == DF_REF_REGNO (*def_rec)
+ || regno2 == DF_REF_REGNO (*def_rec)))
+ {
+ return true;
+ }
+
+ return false;
+}
+
+/* Function checks if instruction INSN uses register number
+ REGNO as a part of address expression. */
+
+static bool
+insn_uses_reg_mem (unsigned int regno, rtx insn)
+{
+ df_ref *use_rec;
+
+ for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
+ if (DF_REF_REG_MEM_P (*use_rec) && regno == DF_REF_REGNO (*use_rec))
+ return true;
+
+ return false;
+}
+
+/* Search backward for non-agu definition of register number REGNO1
+ or register number REGNO2 in basic block starting from instruction
+ START up to head of basic block or instruction INSN.
+
+ Function puts true value into *FOUND var if definition was found
+ and false otherwise.
+
+ Distance in half-cycles between START and found instruction or head
+ of BB is added to DISTANCE and returned. */
+
+static int
+distance_non_agu_define_in_bb (unsigned int regno1, unsigned int regno2,
+ rtx insn, int distance,
+ rtx start, bool *found)
+{
+ basic_block bb = start ? BLOCK_FOR_INSN (start) : NULL;
+ rtx prev = start;
+ rtx next = NULL;
+ enum attr_type insn_type;
+
+ *found = false;
+
+ while (prev
+ && prev != insn
+ && distance < LEA_SEARCH_THRESHOLD)
+ {
+ if (NONDEBUG_INSN_P (prev) && NONJUMP_INSN_P (prev))
+ {
+ distance = increase_distance (prev, next, distance);
+ if (insn_defines_reg (regno1, regno2, prev))
+ {
+ insn_type = get_attr_type (prev);
+ if (insn_type != TYPE_LEA)
+ {
+ *found = true;
+ return distance;
+ }
+ }
+
+ next = prev;
+ }
+ if (prev == BB_HEAD (bb))
+ break;
+
+ prev = PREV_INSN (prev);
+ }
+
+ return distance;
+}
/* Search backward for non-agu definition of register number REGNO1
or register number REGNO2 in INSN's basic block until
1. Pass LEA_SEARCH_THRESHOLD instructions, or
- 2. Reach BB boundary, or
+ 2. Reach neighbour BBs boundary, or
3. Reach agu definition.
Returns the distance between the non-agu definition point and INSN.
If no definition point, returns -1. */
@@ -15971,35 +16084,14 @@ distance_non_agu_define (unsigned int regno1, unsigned int regno2,
{
basic_block bb = BLOCK_FOR_INSN (insn);
int distance = 0;
- df_ref *def_rec;
- enum attr_type insn_type;
+ bool found = false;
if (insn != BB_HEAD (bb))
- {
- rtx prev = PREV_INSN (insn);
- while (prev && distance < LEA_SEARCH_THRESHOLD)
- {
- if (NONDEBUG_INSN_P (prev))
- {
- distance++;
- for (def_rec = DF_INSN_DEFS (prev); *def_rec; def_rec++)
- if (DF_REF_TYPE (*def_rec) == DF_REF_REG_DEF
- && !DF_REF_IS_ARTIFICIAL (*def_rec)
- && (regno1 == DF_REF_REGNO (*def_rec)
- || regno2 == DF_REF_REGNO (*def_rec)))
- {
- insn_type = get_attr_type (prev);
- if (insn_type != TYPE_LEA)
- goto done;
- }
- }
- if (prev == BB_HEAD (bb))
- break;
- prev = PREV_INSN (prev);
- }
- }
+ distance = distance_non_agu_define_in_bb (regno1, regno2, insn,
+ distance, PREV_INSN (insn),
+ &found);
- if (distance < LEA_SEARCH_THRESHOLD)
+ if (!found && distance < LEA_SEARCH_THRESHOLD)
{
edge e;
edge_iterator ei;
@@ -16013,88 +16105,121 @@ distance_non_agu_define (unsigned int regno1, unsigned int regno2,
}
if (simple_loop)
+ distance = distance_non_agu_define_in_bb (regno1, regno2,
+ insn, distance,
+ BB_END (bb), &found);
+ else
{
- rtx prev = BB_END (bb);
- while (prev
- && prev != insn
- && distance < LEA_SEARCH_THRESHOLD)
+ int shortest_dist = -1;
+ bool found_in_bb = false;
+
+ FOR_EACH_EDGE (e, ei, bb->preds)
{
- if (NONDEBUG_INSN_P (prev))
+ int bb_dist = distance_non_agu_define_in_bb (regno1, regno2,
+ insn, distance,
+ BB_END (e->src),
+ &found_in_bb);
+ if (found_in_bb)
{
- distance++;
- for (def_rec = DF_INSN_DEFS (prev); *def_rec; def_rec++)
- if (DF_REF_TYPE (*def_rec) == DF_REF_REG_DEF
- && !DF_REF_IS_ARTIFICIAL (*def_rec)
- && (regno1 == DF_REF_REGNO (*def_rec)
- || regno2 == DF_REF_REGNO (*def_rec)))
- {
- insn_type = get_attr_type (prev);
- if (insn_type != TYPE_LEA)
- goto done;
- }
+ if (shortest_dist < 0)
+ shortest_dist = bb_dist;
+ else if (bb_dist > 0)
+ shortest_dist = MIN (bb_dist, shortest_dist);
}
- prev = PREV_INSN (prev);
+
+ found = found || found_in_bb;
}
+
+ distance = shortest_dist;
}
}
- distance = -1;
-
-done:
/* get_attr_type may modify recog data. We want to make sure
that recog data is valid for instruction INSN, on which
distance_non_agu_define is called. INSN is unchanged here. */
extract_insn_cached (insn);
+
+ if (!found)
+ distance = -1;
+ else
+ distance = distance >> 1;
+
return distance;
}
-/* Return the distance between INSN and the next insn that uses
- register number REGNO0 in memory address. Return -1 if no such
- a use is found within LEA_SEARCH_THRESHOLD or REGNO0 is set. */
+/* Return the distance in half-cycles between INSN and the next
+ insn that uses register number REGNO in memory address added
+ to DISTANCE. Return -1 if REGNO0 is set.
+
+ Put true value into *FOUND if register usage was found and
+ false otherwise.
+ Put true value into *REDEFINED if register redefinition was
+ found and false otherwise. */
static int
-distance_agu_use (unsigned int regno0, rtx insn)
+distance_agu_use_in_bb(unsigned int regno,
+ rtx insn, int distance, rtx start,
+ bool *found, bool *redefined)
{
- basic_block bb = BLOCK_FOR_INSN (insn);
- int distance = 0;
- df_ref *def_rec;
- df_ref *use_rec;
+ basic_block bb = start ? BLOCK_FOR_INSN (start) : NULL;
+ rtx next = start;
+ rtx prev = NULL;
- if (insn != BB_END (bb))
+ *found = false;
+ *redefined = false;
+
+ while (next
+ && next != insn
+ && distance < LEA_SEARCH_THRESHOLD)
{
- rtx next = NEXT_INSN (insn);
- while (next && distance < LEA_SEARCH_THRESHOLD)
+ if (NONDEBUG_INSN_P (next) && NONJUMP_INSN_P (next))
{
- if (NONDEBUG_INSN_P (next))
+ distance = increase_distance(prev, next, distance);
+ if (insn_uses_reg_mem (regno, next))
{
- distance++;
-
- for (use_rec = DF_INSN_USES (next); *use_rec; use_rec++)
- if ((DF_REF_TYPE (*use_rec) == DF_REF_REG_MEM_LOAD
- || DF_REF_TYPE (*use_rec) == DF_REF_REG_MEM_STORE)
- && regno0 == DF_REF_REGNO (*use_rec))
- {
- /* Return DISTANCE if OP0 is used in memory
- address in NEXT. */
- return distance;
- }
+ /* Return DISTANCE if OP0 is used in memory
+ address in NEXT. */
+ *found = true;
+ return distance;
+ }
- for (def_rec = DF_INSN_DEFS (next); *def_rec; def_rec++)
- if (DF_REF_TYPE (*def_rec) == DF_REF_REG_DEF
- && !DF_REF_IS_ARTIFICIAL (*def_rec)
- && regno0 == DF_REF_REGNO (*def_rec))
- {
- /* Return -1 if OP0 is set in NEXT. */
- return -1;
- }
+ if (insn_defines_reg (regno, INVALID_REGNUM, next))
+ {
+ /* Return -1 if OP0 is set in NEXT. */
+ *redefined = true;
+ return -1;
}
- if (next == BB_END (bb))
- break;
- next = NEXT_INSN (next);
+
+ prev = next;
}
+
+ if (next == BB_END (bb))
+ break;
+
+ next = NEXT_INSN (next);
}
- if (distance < LEA_SEARCH_THRESHOLD)
+ return distance;
+}
+
+/* Return the distance between INSN and the next insn that uses
+ register number REGNO0 in memory address. Return -1 if no such
+ a use is found within LEA_SEARCH_THRESHOLD or REGNO0 is set. */
+
+static int
+distance_agu_use (unsigned int regno0, rtx insn)
+{
+ basic_block bb = BLOCK_FOR_INSN (insn);
+ int distance = 0;
+ bool found = false;
+ bool redefined = false;
+
+ if (insn != BB_END (bb))
+ distance = distance_agu_use_in_bb (regno0, insn, distance,
+ NEXT_INSN (insn),
+ &found, &redefined);
+
+ if (!found && !redefined && distance < LEA_SEARCH_THRESHOLD)
{
edge e;
edge_iterator ei;
@@ -16108,42 +16233,41 @@ distance_agu_use (unsigned int regno0, rtx insn)
}
if (simple_loop)
+ distance = distance_agu_use_in_bb (regno0, insn,
+ distance, BB_HEAD (bb),
+ &found, &redefined);
+ else
{
- rtx next = BB_HEAD (bb);
- while (next
- && next != insn
- && distance < LEA_SEARCH_THRESHOLD)
+ int shortest_dist = -1;
+ bool found_in_bb = false;
+ bool redefined_in_bb = false;
+
+ FOR_EACH_EDGE (e, ei, bb->succs)
{
- if (NONDEBUG_INSN_P (next))
+ int bb_dist = distance_agu_use_in_bb (regno0, insn,
+ distance, BB_HEAD (e->dest),
+ &found_in_bb, &redefined_in_bb);
+ if (found_in_bb)
{
- distance++;
-
- for (use_rec = DF_INSN_USES (next); *use_rec; use_rec++)
- if ((DF_REF_TYPE (*use_rec) == DF_REF_REG_MEM_LOAD
- || DF_REF_TYPE (*use_rec) == DF_REF_REG_MEM_STORE)
- && regno0 == DF_REF_REGNO (*use_rec))
- {
- /* Return DISTANCE if OP0 is used in memory
- address in NEXT. */
- return distance;
- }
-
- for (def_rec = DF_INSN_DEFS (next); *def_rec; def_rec++)
- if (DF_REF_TYPE (*def_rec) == DF_REF_REG_DEF
- && !DF_REF_IS_ARTIFICIAL (*def_rec)
- && regno0 == DF_REF_REGNO (*def_rec))
- {
- /* Return -1 if OP0 is set in NEXT. */
- return -1;
- }
-
+ if (shortest_dist < 0)
+ shortest_dist = bb_dist;
+ else if (bb_dist > 0)
+ shortest_dist = MIN (bb_dist, shortest_dist);
}
- next = NEXT_INSN (next);
+
+ found = found || found_in_bb;
}
+
+ distance = shortest_dist;
}
}
- return -1;
+ if (!found || redefined)
+ distance = -1;
+ else
+ distance = distance >> 1;
+
+ return distance;
}
/* Define this macro to tune LEA priority vs ADD, it take effect when
@@ -16151,7 +16275,309 @@ distance_agu_use (unsigned int regno0, rtx insn)
Negative value: ADD is more preferred than LEA
Zero: Netrual
Positive value: LEA is more preferred than ADD*/
-#define IX86_LEA_PRIORITY 2
+#define IX86_LEA_PRIORITY 0
+
+/* Return true if usage of lea INSN has performance advantage
+ over a sequence of instructions. Instructions sequence has
+ SPLIT_COST cycles higher latency than lea latency. */
+
+bool
+ix86_lea_outperforms (rtx insn, unsigned int regno0, unsigned int regno1,
+ unsigned int regno2, unsigned int split_cost)
+{
+ int dist_define, dist_use;
+
+ dist_define = distance_non_agu_define (regno1, regno2, insn);
+ dist_use = distance_agu_use (regno0, insn);
+
+ if (dist_define < 0 || dist_define >= LEA_MAX_STALL)
+ {
+ /* If there is no non AGU operand definition, no AGU
+ operand usage and split cost is 0 then both lea
+ and non lea variants have same priority. Currently
+ we prefer lea for 64 bit code and non lea on 32 bit
+ code. */
+ if (dist_use < 0 && split_cost == 0)
+ return TARGET_64BIT || IX86_LEA_PRIORITY;
+ else
+ return true;
+ }
+
+ /* With longer definitions distance lea is more preferable.
+ Here we change it to take into account splitting cost and
+ lea priority. */
+ dist_define += split_cost + IX86_LEA_PRIORITY;
+
+ /* If there is no use in memory addess then we just check
+ that split cost does not exceed AGU stall. */
+ if (dist_use < 0)
+ return dist_define >= LEA_MAX_STALL;
+
+ /* If this insn has both backward non-agu dependence and forward
+ agu dependence, the one with short distance takes effect. */
+ return dist_define >= dist_use;
+}
+
+/* Return true if it is legal to clobber flags by INSN and
+ false otherwise. */
+
+static bool
+ix86_ok_to_clobber_flags(rtx insn)
+{
+ basic_block bb = BLOCK_FOR_INSN (insn);
+ df_ref *use;
+ bitmap live;
+
+ while (insn)
+ {
+ if (NONDEBUG_INSN_P (insn))
+ {
+ for (use = DF_INSN_USES (insn); *use; use++)
+ if (DF_REF_REG_USE_P (*use) && DF_REF_REGNO (*use) == FLAGS_REG)
+ return false;
+
+ if (insn_defines_reg (FLAGS_REG, INVALID_REGNUM, insn))
+ return true;
+ }
+
+ if (insn == BB_END (bb))
+ break;
+
+ insn = NEXT_INSN (insn);
+ }
+
+ live = df_get_live_out(bb);
+ return !REGNO_REG_SET_P (live, FLAGS_REG);
+}
+
+/* Return true if we need to split op0 = op1 + op2 into a sequence of
+ move and add to avoid AGU stalls. */
+
+bool
+ix86_avoid_lea_for_add (rtx insn, rtx operands[])
+{
+ unsigned int regno0 = true_regnum (operands[0]);
+ unsigned int regno1 = true_regnum (operands[1]);
+ unsigned int regno2 = true_regnum (operands[2]);
+
+ /* Check if we need to optimize. */
+ if (!TARGET_OPT_AGU || optimize_function_for_size_p (cfun))
+ return false;
+
+ /* Check it is correct to split here. */
+ if (!ix86_ok_to_clobber_flags(insn))
+ return false;
+
+ /* We need to split only adds with non destructive
+ destination operand. */
+ if (regno0 == regno1 || regno0 == regno2)
+ return false;
+ else
+ return !ix86_lea_outperforms (insn, regno0, regno1, regno2, 1);
+}
+
+/* Return true if we need to split lea into a sequence of
+ instructions to avoid AGU stalls. */
+
+bool
+ix86_avoid_lea_for_addr (rtx insn, rtx operands[])
+{
+ unsigned int regno0 = true_regnum (operands[0]) ;
+ unsigned int regno1 = -1;
+ unsigned int regno2 = -1;
+ unsigned int split_cost = 0;
+ struct ix86_address parts;
+ int ok;
+
+ /* Check we need to optimize. */
+ if (!TARGET_OPT_AGU || optimize_function_for_size_p (cfun))
+ return false;
+
+ /* Check it is correct to split here. */
+ if (!ix86_ok_to_clobber_flags(insn))
+ return false;
+
+ ok = ix86_decompose_address (operands[1], &parts);
+ gcc_assert (ok);
+
+ /* We should not split into add if non legitimate pic
+ operand is used as displacement. */
+ if (parts.disp && flag_pic && !LEGITIMATE_PIC_OPERAND_P (parts.disp))
+ return false;
+
+ if (parts.base)
+ regno1 = true_regnum (parts.base);
+ if (parts.index)
+ regno2 = true_regnum (parts.index);
+
+ /* Compute how many cycles we will add to execution time
+ if split lea into a sequence of instructions. */
+ if (parts.base || parts.index)
+ {
+ /* Have to use mov instruction if non desctructive
+ destination form is used. */
+ if (regno1 != regno0 && regno2 != regno0)
+ split_cost += 1;
+
+ /* Have to add index to base if both exist. */
+ if (parts.base && parts.index)
+ split_cost += 1;
+
+ /* Have to use shift and adds if scale is 2 or greater. */
+ if (parts.scale > 1)
+ {
+ if (regno0 != regno1)
+ split_cost += 1;
+ else if (regno2 == regno0)
+ split_cost += 4;
+ else
+ split_cost += parts.scale;
+ }
+
+ /* Have to use add instruction with immediate if
+ disp is non zero. */
+ if (parts.disp && parts.disp != const0_rtx)
+ split_cost += 1;
+
+ /* Subtract the price of lea. */
+ split_cost -= 1;
+ }
+
+ return !ix86_lea_outperforms (insn, regno0, regno1, regno2, split_cost);
+}
+
+/* Split lea instructions into a sequence of instructions
+ which are executed on ALU to avoid AGU stalls.
+ It is assumed that it is allowed to clobber flags register
+ at lea position. */
+
+extern void
+ix86_split_lea_for_addr (rtx operands[], enum machine_mode mode)
+{
+ unsigned int regno0 = true_regnum (operands[0]) ;
+ unsigned int regno1 = INVALID_REGNUM;
+ unsigned int regno2 = INVALID_REGNUM;
+ struct ix86_address parts;
+ rtx tmp, clob;
+ rtvec par;
+ int ok, adds;
+
+ ok = ix86_decompose_address (operands[1], &parts);
+ gcc_assert (ok);
+
+ if (parts.base)
+ {
+ if (GET_MODE (parts.base) != mode)
+ parts.base = gen_rtx_SUBREG (mode, parts.base, 0);
+ regno1 = true_regnum (parts.base);
+ }
+
+ if (parts.index)
+ {
+ if (GET_MODE (parts.index) != mode)
+ parts.index = gen_rtx_SUBREG (mode, parts.index, 0);
+ regno2 = true_regnum (parts.index);
+ }
+
+ if (parts.scale > 1)
+ {
+ /* Case r1 = r1 + ... */
+ if (regno1 == regno0)
+ {
+ /* If we have a case r1 = r1 + C * r1 then we
+ should use multiplication which is very
+ expensive. Assume cost model is wrong if we
+ have such case here. */
+ gcc_assert (regno2 != regno0);
+
+ for (adds = parts.scale; adds > 0; adds--)
+ {
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.index);
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode,
+ gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+ }
+ else
+ {
+ /* r1 = r2 + r3 * C case. Need to move r3 into r1. */
+ if (regno0 != regno2)
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.index));
+
+ /* Use shift for scaling. */
+ tmp = gen_rtx_ASHIFT (mode, operands[0],
+ GEN_INT (exact_log2 (parts.scale)));
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+
+ if (parts.base)
+ {
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.base);
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+
+ if (parts.disp && parts.disp != const0_rtx)
+ {
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.disp);
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+ }
+ }
+ else if (!parts.base && !parts.index)
+ {
+ gcc_assert(parts.disp);
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.disp));
+ }
+ else
+ {
+ if (!parts.base)
+ {
+ if (regno0 != regno2)
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.index));
+ }
+ else if (!parts.index)
+ {
+ if (regno0 != regno1)
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.base));
+ }
+ else
+ {
+ if (regno0 == regno1)
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.index);
+ else if (regno0 == regno2)
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.base);
+ else
+ {
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.base));
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.index);
+ }
+
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+
+ if (parts.disp && parts.disp != const0_rtx)
+ {
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.disp);
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+ }
+}
/* Return true if it is ok to optimize an ADD operation to LEA
operation to avoid flag register consumation. For most processors,
@@ -16172,26 +16598,8 @@ ix86_lea_for_add_ok (rtx insn, rtx operands[])
if (!TARGET_OPT_AGU || optimize_function_for_size_p (cfun))
return false;
- else
- {
- int dist_define, dist_use;
- /* Return false if REGNO0 isn't used in memory address. */
- dist_use = distance_agu_use (regno0, insn);
- if (dist_use <= 0)
- return false;
-
- dist_define = distance_non_agu_define (regno1, regno2, insn);
- if (dist_define <= 0)
- return true;
-
- /* If this insn has both backward non-agu dependence and forward
- agu dependence, the one with short distance take effect. */
- if ((dist_define + IX86_LEA_PRIORITY) < dist_use)
- return false;
-
- return true;
- }
+ return ix86_lea_outperforms (insn, regno0, regno1, regno2, 0);
}
/* Return true if destination reg of SET_BODY is shift count of
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 37491ae..1fa8ed5 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -5777,6 +5777,39 @@
(const_string "none")))
(set_attr "mode" "QI")])
+;; Split non destructive adds if we cannot use lea.
+(define_split
+ [(set (match_operand:SWI48 0 "register_operand" "")
+ (plus:SWI48 (match_operand:SWI48 1 "register_operand" "")
+ (match_operand:SWI48 2 "nonmemory_operand" "")))
+ (clobber (reg:CC FLAGS_REG))]
+ "reload_completed && ix86_avoid_lea_for_add (insn, operands)"
+ [(set (match_dup 0) (match_dup 1))
+ (parallel [(set (match_dup 0) (plus:<MODE> (match_dup 0) (match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])])
+
+;; Split lea into one or more ALU instructions if profitable.
+(define_split
+ [(set (match_operand:SWI48 0 "register_operand" "")
+ (match_operand:SWI48 1 "lea_address_operand" ""))]
+ "reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
+ [(const_int 0)]
+{
+ ix86_split_lea_for_addr (operands, <MODE>mode);
+ DONE;
+})
+
+;; Split lea into one or more ALU instructions if profitable.
+(define_split
+ [(set (match_operand:SI 0 "register_operand" "")
+ (subreg:SI (match_operand:DI 1 "lea_address_operand" "") 0))]
+ "TARGET_64BIT && reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
+ [(const_int 0)]
+{
+ ix86_split_lea_for_addr (operands, SImode);
+ DONE;
+})
+
;; Convert add to the lea pattern to avoid flags dependency.
(define_split
[(set (match_operand:SWI 0 "register_operand" "")
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH, Atom] Improve AGU stalls avoidance optimization
2011-09-06 12:26 ` Ilya Enkovich
@ 2011-09-06 15:43 ` Uros Bizjak
2011-09-06 17:55 ` Ilya Enkovich
0 siblings, 1 reply; 6+ messages in thread
From: Uros Bizjak @ 2011-09-06 15:43 UTC (permalink / raw)
To: Ilya Enkovich; +Cc: gcc-patches
On Tue, Sep 6, 2011 at 2:26 PM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> Is fixed version OK?
>
> Thanks,
> Ilya
> ---
> gcc/
>
> 2011-09-06 Enkovich Ilya <ilya.enkovich@intel.com>
>
> * config/i386/i386-protos.h (ix86_lea_outperforms): New.
> (ix86_avoid_lea_for_add): Likewise.
> (ix86_avoid_lea_for_addr): Likewise.
> (ix86_split_lea_for_addr): Likewise.
>
> * config/i386/i386.c (LEA_MAX_STALL): New.
> (increase_distance): Likewise.
> (insn_defines_reg): Likewise.
> (insn_uses_reg_mem): Likewise.
> (distance_non_agu_define_in_bb): Likewise.
> (distance_agu_use_in_bb): Likewise.
> (ix86_lea_outperforms): Likewise.
> (ix86_ok_to_clobber_flags): Likewise.
> (ix86_avoid_lea_for_add): Likewise.
> (ix86_avoid_lea_for_addr): Likewise.
> (ix86_split_lea_for_addr): Likewise.
> (distance_non_agu_define): Search in pred BBs added.
> (distance_agu_use): Search in succ BBs added.
> (IX86_LEA_PRIORITY): Value changed from 2 to 0.
> (LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
> (ix86_lea_for_add_ok): Use ix86_lea_outperforms to make decision.
>
> * config/i386/i386.md: Splits added to transform lea into a
> sequence of instructions.
Please merge your new splitters with corresponding LEA patterns.
OK with this change.
Thanks,
Uros.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH, Atom] Improve AGU stalls avoidance optimization
2011-09-06 15:43 ` Uros Bizjak
@ 2011-09-06 17:55 ` Ilya Enkovich
2011-09-08 13:56 ` H.J. Lu
0 siblings, 1 reply; 6+ messages in thread
From: Ilya Enkovich @ 2011-09-06 17:55 UTC (permalink / raw)
To: Uros Bizjak; +Cc: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]
2011/9/6 Uros Bizjak <ubizjak@gmail.com>:
>
> Please merge your new splitters with corresponding LEA patterns.
>
> OK with this change.
>
> Thanks,
> Uros.
>
Fixed. Could please someone check it in if it's OK now?
Thanks,
Ilya
---
gcc/
2011-09-06 Enkovich Ilya <ilya.enkovich@intel.com>
* config/i386/i386-protos.h (ix86_lea_outperforms): New.
(ix86_avoid_lea_for_add): Likewise.
(ix86_avoid_lea_for_addr): Likewise.
(ix86_split_lea_for_addr): Likewise.
* config/i386/i386.c (LEA_MAX_STALL): New.
(increase_distance): Likewise.
(insn_defines_reg): Likewise.
(insn_uses_reg_mem): Likewise.
(distance_non_agu_define_in_bb): Likewise.
(distance_agu_use_in_bb): Likewise.
(ix86_lea_outperforms): Likewise.
(ix86_ok_to_clobber_flags): Likewise.
(ix86_avoid_lea_for_add): Likewise.
(ix86_avoid_lea_for_addr): Likewise.
(ix86_split_lea_for_addr): Likewise.
(distance_non_agu_define): Search in pred BBs added.
(distance_agu_use): Search in succ BBs added.
(IX86_LEA_PRIORITY): Value changed from 2 to 0.
(LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
(ix86_lea_for_add_ok): Use ix86_lea_outperforms to make decision.
* config/i386/i386.md: Split added to transform non destructive
add into move and add.
(lea_1): transformed into insn_and_split to avoid AGU stalls.
(lea<mode>_2): Likewise.
[-- Attachment #2: lea.diff --]
[-- Type: application/octet-stream, Size: 24636 bytes --]
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 7deeae7..900d1c5 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -90,6 +90,11 @@ extern void ix86_fixup_binary_operands_no_copy (enum rtx_code,
extern void ix86_expand_binary_operator (enum rtx_code,
enum machine_mode, rtx[]);
extern bool ix86_binary_operator_ok (enum rtx_code, enum machine_mode, rtx[]);
+extern bool ix86_lea_outperforms (rtx, unsigned int, unsigned int,
+ unsigned int, unsigned int);
+extern bool ix86_avoid_lea_for_add (rtx, rtx[]);
+extern bool ix86_avoid_lea_for_addr (rtx, rtx[]);
+extern void ix86_split_lea_for_addr (rtx[], enum machine_mode);
extern bool ix86_lea_for_add_ok (rtx, rtx[]);
extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a9c0aa7..660e8c0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15955,12 +15955,125 @@ ix86_split_idivmod (enum machine_mode mode, rtx operands[],
emit_label (end_label);
}
-#define LEA_SEARCH_THRESHOLD 12
+#define LEA_MAX_STALL (3)
+#define LEA_SEARCH_THRESHOLD (LEA_MAX_STALL << 1)
+
+/* Increase given DISTANCE in half-cycles according to
+ dependencies between PREV and NEXT instructions.
+ Add 1 half-cycle if there is no dependency and
+ go to next cycle if there is some dependecy. */
+
+static unsigned int
+increase_distance (rtx prev, rtx next, unsigned int distance)
+{
+ df_ref *use_rec;
+ df_ref *def_rec;
+
+ if (!prev || !next)
+ return distance + (distance & 1) + 2;
+
+ if (!DF_INSN_USES (next) || !DF_INSN_DEFS (prev))
+ return distance + 1;
+
+ for (use_rec = DF_INSN_USES (next); *use_rec; use_rec++)
+ for (def_rec = DF_INSN_DEFS (prev); *def_rec; def_rec++)
+ if (!DF_REF_IS_ARTIFICIAL (*def_rec)
+ && DF_REF_REGNO (*use_rec) == DF_REF_REGNO (*def_rec))
+ return distance + (distance & 1) + 2;
+
+ return distance + 1;
+}
+
+/* Function checks if instruction INSN defines register number
+ REGNO1 or REGNO2. */
+
+static bool
+insn_defines_reg (unsigned int regno1, unsigned int regno2,
+ rtx insn)
+{
+ df_ref *def_rec;
+
+ for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+ if (DF_REF_REG_DEF_P (*def_rec)
+ && !DF_REF_IS_ARTIFICIAL (*def_rec)
+ && (regno1 == DF_REF_REGNO (*def_rec)
+ || regno2 == DF_REF_REGNO (*def_rec)))
+ {
+ return true;
+ }
+
+ return false;
+}
+
+/* Function checks if instruction INSN uses register number
+ REGNO as a part of address expression. */
+
+static bool
+insn_uses_reg_mem (unsigned int regno, rtx insn)
+{
+ df_ref *use_rec;
+
+ for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
+ if (DF_REF_REG_MEM_P (*use_rec) && regno == DF_REF_REGNO (*use_rec))
+ return true;
+
+ return false;
+}
+
+/* Search backward for non-agu definition of register number REGNO1
+ or register number REGNO2 in basic block starting from instruction
+ START up to head of basic block or instruction INSN.
+
+ Function puts true value into *FOUND var if definition was found
+ and false otherwise.
+
+ Distance in half-cycles between START and found instruction or head
+ of BB is added to DISTANCE and returned. */
+
+static int
+distance_non_agu_define_in_bb (unsigned int regno1, unsigned int regno2,
+ rtx insn, int distance,
+ rtx start, bool *found)
+{
+ basic_block bb = start ? BLOCK_FOR_INSN (start) : NULL;
+ rtx prev = start;
+ rtx next = NULL;
+ enum attr_type insn_type;
+
+ *found = false;
+
+ while (prev
+ && prev != insn
+ && distance < LEA_SEARCH_THRESHOLD)
+ {
+ if (NONDEBUG_INSN_P (prev) && NONJUMP_INSN_P (prev))
+ {
+ distance = increase_distance (prev, next, distance);
+ if (insn_defines_reg (regno1, regno2, prev))
+ {
+ insn_type = get_attr_type (prev);
+ if (insn_type != TYPE_LEA)
+ {
+ *found = true;
+ return distance;
+ }
+ }
+
+ next = prev;
+ }
+ if (prev == BB_HEAD (bb))
+ break;
+
+ prev = PREV_INSN (prev);
+ }
+
+ return distance;
+}
/* Search backward for non-agu definition of register number REGNO1
or register number REGNO2 in INSN's basic block until
1. Pass LEA_SEARCH_THRESHOLD instructions, or
- 2. Reach BB boundary, or
+ 2. Reach neighbour BBs boundary, or
3. Reach agu definition.
Returns the distance between the non-agu definition point and INSN.
If no definition point, returns -1. */
@@ -15971,35 +16084,14 @@ distance_non_agu_define (unsigned int regno1, unsigned int regno2,
{
basic_block bb = BLOCK_FOR_INSN (insn);
int distance = 0;
- df_ref *def_rec;
- enum attr_type insn_type;
+ bool found = false;
if (insn != BB_HEAD (bb))
- {
- rtx prev = PREV_INSN (insn);
- while (prev && distance < LEA_SEARCH_THRESHOLD)
- {
- if (NONDEBUG_INSN_P (prev))
- {
- distance++;
- for (def_rec = DF_INSN_DEFS (prev); *def_rec; def_rec++)
- if (DF_REF_TYPE (*def_rec) == DF_REF_REG_DEF
- && !DF_REF_IS_ARTIFICIAL (*def_rec)
- && (regno1 == DF_REF_REGNO (*def_rec)
- || regno2 == DF_REF_REGNO (*def_rec)))
- {
- insn_type = get_attr_type (prev);
- if (insn_type != TYPE_LEA)
- goto done;
- }
- }
- if (prev == BB_HEAD (bb))
- break;
- prev = PREV_INSN (prev);
- }
- }
+ distance = distance_non_agu_define_in_bb (regno1, regno2, insn,
+ distance, PREV_INSN (insn),
+ &found);
- if (distance < LEA_SEARCH_THRESHOLD)
+ if (!found && distance < LEA_SEARCH_THRESHOLD)
{
edge e;
edge_iterator ei;
@@ -16013,88 +16105,121 @@ distance_non_agu_define (unsigned int regno1, unsigned int regno2,
}
if (simple_loop)
+ distance = distance_non_agu_define_in_bb (regno1, regno2,
+ insn, distance,
+ BB_END (bb), &found);
+ else
{
- rtx prev = BB_END (bb);
- while (prev
- && prev != insn
- && distance < LEA_SEARCH_THRESHOLD)
+ int shortest_dist = -1;
+ bool found_in_bb = false;
+
+ FOR_EACH_EDGE (e, ei, bb->preds)
{
- if (NONDEBUG_INSN_P (prev))
+ int bb_dist = distance_non_agu_define_in_bb (regno1, regno2,
+ insn, distance,
+ BB_END (e->src),
+ &found_in_bb);
+ if (found_in_bb)
{
- distance++;
- for (def_rec = DF_INSN_DEFS (prev); *def_rec; def_rec++)
- if (DF_REF_TYPE (*def_rec) == DF_REF_REG_DEF
- && !DF_REF_IS_ARTIFICIAL (*def_rec)
- && (regno1 == DF_REF_REGNO (*def_rec)
- || regno2 == DF_REF_REGNO (*def_rec)))
- {
- insn_type = get_attr_type (prev);
- if (insn_type != TYPE_LEA)
- goto done;
- }
+ if (shortest_dist < 0)
+ shortest_dist = bb_dist;
+ else if (bb_dist > 0)
+ shortest_dist = MIN (bb_dist, shortest_dist);
}
- prev = PREV_INSN (prev);
+
+ found = found || found_in_bb;
}
+
+ distance = shortest_dist;
}
}
- distance = -1;
-
-done:
/* get_attr_type may modify recog data. We want to make sure
that recog data is valid for instruction INSN, on which
distance_non_agu_define is called. INSN is unchanged here. */
extract_insn_cached (insn);
+
+ if (!found)
+ distance = -1;
+ else
+ distance = distance >> 1;
+
return distance;
}
-/* Return the distance between INSN and the next insn that uses
- register number REGNO0 in memory address. Return -1 if no such
- a use is found within LEA_SEARCH_THRESHOLD or REGNO0 is set. */
+/* Return the distance in half-cycles between INSN and the next
+ insn that uses register number REGNO in memory address added
+ to DISTANCE. Return -1 if REGNO0 is set.
+
+ Put true value into *FOUND if register usage was found and
+ false otherwise.
+ Put true value into *REDEFINED if register redefinition was
+ found and false otherwise. */
static int
-distance_agu_use (unsigned int regno0, rtx insn)
+distance_agu_use_in_bb(unsigned int regno,
+ rtx insn, int distance, rtx start,
+ bool *found, bool *redefined)
{
- basic_block bb = BLOCK_FOR_INSN (insn);
- int distance = 0;
- df_ref *def_rec;
- df_ref *use_rec;
+ basic_block bb = start ? BLOCK_FOR_INSN (start) : NULL;
+ rtx next = start;
+ rtx prev = NULL;
- if (insn != BB_END (bb))
+ *found = false;
+ *redefined = false;
+
+ while (next
+ && next != insn
+ && distance < LEA_SEARCH_THRESHOLD)
{
- rtx next = NEXT_INSN (insn);
- while (next && distance < LEA_SEARCH_THRESHOLD)
+ if (NONDEBUG_INSN_P (next) && NONJUMP_INSN_P (next))
{
- if (NONDEBUG_INSN_P (next))
+ distance = increase_distance(prev, next, distance);
+ if (insn_uses_reg_mem (regno, next))
{
- distance++;
-
- for (use_rec = DF_INSN_USES (next); *use_rec; use_rec++)
- if ((DF_REF_TYPE (*use_rec) == DF_REF_REG_MEM_LOAD
- || DF_REF_TYPE (*use_rec) == DF_REF_REG_MEM_STORE)
- && regno0 == DF_REF_REGNO (*use_rec))
- {
- /* Return DISTANCE if OP0 is used in memory
- address in NEXT. */
- return distance;
- }
+ /* Return DISTANCE if OP0 is used in memory
+ address in NEXT. */
+ *found = true;
+ return distance;
+ }
- for (def_rec = DF_INSN_DEFS (next); *def_rec; def_rec++)
- if (DF_REF_TYPE (*def_rec) == DF_REF_REG_DEF
- && !DF_REF_IS_ARTIFICIAL (*def_rec)
- && regno0 == DF_REF_REGNO (*def_rec))
- {
- /* Return -1 if OP0 is set in NEXT. */
- return -1;
- }
+ if (insn_defines_reg (regno, INVALID_REGNUM, next))
+ {
+ /* Return -1 if OP0 is set in NEXT. */
+ *redefined = true;
+ return -1;
}
- if (next == BB_END (bb))
- break;
- next = NEXT_INSN (next);
+
+ prev = next;
}
+
+ if (next == BB_END (bb))
+ break;
+
+ next = NEXT_INSN (next);
}
- if (distance < LEA_SEARCH_THRESHOLD)
+ return distance;
+}
+
+/* Return the distance between INSN and the next insn that uses
+ register number REGNO0 in memory address. Return -1 if no such
+ a use is found within LEA_SEARCH_THRESHOLD or REGNO0 is set. */
+
+static int
+distance_agu_use (unsigned int regno0, rtx insn)
+{
+ basic_block bb = BLOCK_FOR_INSN (insn);
+ int distance = 0;
+ bool found = false;
+ bool redefined = false;
+
+ if (insn != BB_END (bb))
+ distance = distance_agu_use_in_bb (regno0, insn, distance,
+ NEXT_INSN (insn),
+ &found, &redefined);
+
+ if (!found && !redefined && distance < LEA_SEARCH_THRESHOLD)
{
edge e;
edge_iterator ei;
@@ -16108,42 +16233,41 @@ distance_agu_use (unsigned int regno0, rtx insn)
}
if (simple_loop)
+ distance = distance_agu_use_in_bb (regno0, insn,
+ distance, BB_HEAD (bb),
+ &found, &redefined);
+ else
{
- rtx next = BB_HEAD (bb);
- while (next
- && next != insn
- && distance < LEA_SEARCH_THRESHOLD)
+ int shortest_dist = -1;
+ bool found_in_bb = false;
+ bool redefined_in_bb = false;
+
+ FOR_EACH_EDGE (e, ei, bb->succs)
{
- if (NONDEBUG_INSN_P (next))
+ int bb_dist = distance_agu_use_in_bb (regno0, insn,
+ distance, BB_HEAD (e->dest),
+ &found_in_bb, &redefined_in_bb);
+ if (found_in_bb)
{
- distance++;
-
- for (use_rec = DF_INSN_USES (next); *use_rec; use_rec++)
- if ((DF_REF_TYPE (*use_rec) == DF_REF_REG_MEM_LOAD
- || DF_REF_TYPE (*use_rec) == DF_REF_REG_MEM_STORE)
- && regno0 == DF_REF_REGNO (*use_rec))
- {
- /* Return DISTANCE if OP0 is used in memory
- address in NEXT. */
- return distance;
- }
-
- for (def_rec = DF_INSN_DEFS (next); *def_rec; def_rec++)
- if (DF_REF_TYPE (*def_rec) == DF_REF_REG_DEF
- && !DF_REF_IS_ARTIFICIAL (*def_rec)
- && regno0 == DF_REF_REGNO (*def_rec))
- {
- /* Return -1 if OP0 is set in NEXT. */
- return -1;
- }
-
+ if (shortest_dist < 0)
+ shortest_dist = bb_dist;
+ else if (bb_dist > 0)
+ shortest_dist = MIN (bb_dist, shortest_dist);
}
- next = NEXT_INSN (next);
+
+ found = found || found_in_bb;
}
+
+ distance = shortest_dist;
}
}
- return -1;
+ if (!found || redefined)
+ distance = -1;
+ else
+ distance = distance >> 1;
+
+ return distance;
}
/* Define this macro to tune LEA priority vs ADD, it take effect when
@@ -16151,7 +16275,309 @@ distance_agu_use (unsigned int regno0, rtx insn)
Negative value: ADD is more preferred than LEA
Zero: Netrual
Positive value: LEA is more preferred than ADD*/
-#define IX86_LEA_PRIORITY 2
+#define IX86_LEA_PRIORITY 0
+
+/* Return true if usage of lea INSN has performance advantage
+ over a sequence of instructions. Instructions sequence has
+ SPLIT_COST cycles higher latency than lea latency. */
+
+bool
+ix86_lea_outperforms (rtx insn, unsigned int regno0, unsigned int regno1,
+ unsigned int regno2, unsigned int split_cost)
+{
+ int dist_define, dist_use;
+
+ dist_define = distance_non_agu_define (regno1, regno2, insn);
+ dist_use = distance_agu_use (regno0, insn);
+
+ if (dist_define < 0 || dist_define >= LEA_MAX_STALL)
+ {
+ /* If there is no non AGU operand definition, no AGU
+ operand usage and split cost is 0 then both lea
+ and non lea variants have same priority. Currently
+ we prefer lea for 64 bit code and non lea on 32 bit
+ code. */
+ if (dist_use < 0 && split_cost == 0)
+ return TARGET_64BIT || IX86_LEA_PRIORITY;
+ else
+ return true;
+ }
+
+ /* With longer definitions distance lea is more preferable.
+ Here we change it to take into account splitting cost and
+ lea priority. */
+ dist_define += split_cost + IX86_LEA_PRIORITY;
+
+ /* If there is no use in memory addess then we just check
+ that split cost does not exceed AGU stall. */
+ if (dist_use < 0)
+ return dist_define >= LEA_MAX_STALL;
+
+ /* If this insn has both backward non-agu dependence and forward
+ agu dependence, the one with short distance takes effect. */
+ return dist_define >= dist_use;
+}
+
+/* Return true if it is legal to clobber flags by INSN and
+ false otherwise. */
+
+static bool
+ix86_ok_to_clobber_flags(rtx insn)
+{
+ basic_block bb = BLOCK_FOR_INSN (insn);
+ df_ref *use;
+ bitmap live;
+
+ while (insn)
+ {
+ if (NONDEBUG_INSN_P (insn))
+ {
+ for (use = DF_INSN_USES (insn); *use; use++)
+ if (DF_REF_REG_USE_P (*use) && DF_REF_REGNO (*use) == FLAGS_REG)
+ return false;
+
+ if (insn_defines_reg (FLAGS_REG, INVALID_REGNUM, insn))
+ return true;
+ }
+
+ if (insn == BB_END (bb))
+ break;
+
+ insn = NEXT_INSN (insn);
+ }
+
+ live = df_get_live_out(bb);
+ return !REGNO_REG_SET_P (live, FLAGS_REG);
+}
+
+/* Return true if we need to split op0 = op1 + op2 into a sequence of
+ move and add to avoid AGU stalls. */
+
+bool
+ix86_avoid_lea_for_add (rtx insn, rtx operands[])
+{
+ unsigned int regno0 = true_regnum (operands[0]);
+ unsigned int regno1 = true_regnum (operands[1]);
+ unsigned int regno2 = true_regnum (operands[2]);
+
+ /* Check if we need to optimize. */
+ if (!TARGET_OPT_AGU || optimize_function_for_size_p (cfun))
+ return false;
+
+ /* Check it is correct to split here. */
+ if (!ix86_ok_to_clobber_flags(insn))
+ return false;
+
+ /* We need to split only adds with non destructive
+ destination operand. */
+ if (regno0 == regno1 || regno0 == regno2)
+ return false;
+ else
+ return !ix86_lea_outperforms (insn, regno0, regno1, regno2, 1);
+}
+
+/* Return true if we need to split lea into a sequence of
+ instructions to avoid AGU stalls. */
+
+bool
+ix86_avoid_lea_for_addr (rtx insn, rtx operands[])
+{
+ unsigned int regno0 = true_regnum (operands[0]) ;
+ unsigned int regno1 = -1;
+ unsigned int regno2 = -1;
+ unsigned int split_cost = 0;
+ struct ix86_address parts;
+ int ok;
+
+ /* Check we need to optimize. */
+ if (!TARGET_OPT_AGU || optimize_function_for_size_p (cfun))
+ return false;
+
+ /* Check it is correct to split here. */
+ if (!ix86_ok_to_clobber_flags(insn))
+ return false;
+
+ ok = ix86_decompose_address (operands[1], &parts);
+ gcc_assert (ok);
+
+ /* We should not split into add if non legitimate pic
+ operand is used as displacement. */
+ if (parts.disp && flag_pic && !LEGITIMATE_PIC_OPERAND_P (parts.disp))
+ return false;
+
+ if (parts.base)
+ regno1 = true_regnum (parts.base);
+ if (parts.index)
+ regno2 = true_regnum (parts.index);
+
+ /* Compute how many cycles we will add to execution time
+ if split lea into a sequence of instructions. */
+ if (parts.base || parts.index)
+ {
+ /* Have to use mov instruction if non desctructive
+ destination form is used. */
+ if (regno1 != regno0 && regno2 != regno0)
+ split_cost += 1;
+
+ /* Have to add index to base if both exist. */
+ if (parts.base && parts.index)
+ split_cost += 1;
+
+ /* Have to use shift and adds if scale is 2 or greater. */
+ if (parts.scale > 1)
+ {
+ if (regno0 != regno1)
+ split_cost += 1;
+ else if (regno2 == regno0)
+ split_cost += 4;
+ else
+ split_cost += parts.scale;
+ }
+
+ /* Have to use add instruction with immediate if
+ disp is non zero. */
+ if (parts.disp && parts.disp != const0_rtx)
+ split_cost += 1;
+
+ /* Subtract the price of lea. */
+ split_cost -= 1;
+ }
+
+ return !ix86_lea_outperforms (insn, regno0, regno1, regno2, split_cost);
+}
+
+/* Split lea instructions into a sequence of instructions
+ which are executed on ALU to avoid AGU stalls.
+ It is assumed that it is allowed to clobber flags register
+ at lea position. */
+
+extern void
+ix86_split_lea_for_addr (rtx operands[], enum machine_mode mode)
+{
+ unsigned int regno0 = true_regnum (operands[0]) ;
+ unsigned int regno1 = INVALID_REGNUM;
+ unsigned int regno2 = INVALID_REGNUM;
+ struct ix86_address parts;
+ rtx tmp, clob;
+ rtvec par;
+ int ok, adds;
+
+ ok = ix86_decompose_address (operands[1], &parts);
+ gcc_assert (ok);
+
+ if (parts.base)
+ {
+ if (GET_MODE (parts.base) != mode)
+ parts.base = gen_rtx_SUBREG (mode, parts.base, 0);
+ regno1 = true_regnum (parts.base);
+ }
+
+ if (parts.index)
+ {
+ if (GET_MODE (parts.index) != mode)
+ parts.index = gen_rtx_SUBREG (mode, parts.index, 0);
+ regno2 = true_regnum (parts.index);
+ }
+
+ if (parts.scale > 1)
+ {
+ /* Case r1 = r1 + ... */
+ if (regno1 == regno0)
+ {
+ /* If we have a case r1 = r1 + C * r1 then we
+ should use multiplication which is very
+ expensive. Assume cost model is wrong if we
+ have such case here. */
+ gcc_assert (regno2 != regno0);
+
+ for (adds = parts.scale; adds > 0; adds--)
+ {
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.index);
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode,
+ gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+ }
+ else
+ {
+ /* r1 = r2 + r3 * C case. Need to move r3 into r1. */
+ if (regno0 != regno2)
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.index));
+
+ /* Use shift for scaling. */
+ tmp = gen_rtx_ASHIFT (mode, operands[0],
+ GEN_INT (exact_log2 (parts.scale)));
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+
+ if (parts.base)
+ {
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.base);
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+
+ if (parts.disp && parts.disp != const0_rtx)
+ {
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.disp);
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+ }
+ }
+ else if (!parts.base && !parts.index)
+ {
+ gcc_assert(parts.disp);
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.disp));
+ }
+ else
+ {
+ if (!parts.base)
+ {
+ if (regno0 != regno2)
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.index));
+ }
+ else if (!parts.index)
+ {
+ if (regno0 != regno1)
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.base));
+ }
+ else
+ {
+ if (regno0 == regno1)
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.index);
+ else if (regno0 == regno2)
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.base);
+ else
+ {
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.base));
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.index);
+ }
+
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+
+ if (parts.disp && parts.disp != const0_rtx)
+ {
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.disp);
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+ }
+}
/* Return true if it is ok to optimize an ADD operation to LEA
operation to avoid flag register consumation. For most processors,
@@ -16172,26 +16598,8 @@ ix86_lea_for_add_ok (rtx insn, rtx operands[])
if (!TARGET_OPT_AGU || optimize_function_for_size_p (cfun))
return false;
- else
- {
- int dist_define, dist_use;
- /* Return false if REGNO0 isn't used in memory address. */
- dist_use = distance_agu_use (regno0, insn);
- if (dist_use <= 0)
- return false;
-
- dist_define = distance_non_agu_define (regno1, regno2, insn);
- if (dist_define <= 0)
- return true;
-
- /* If this insn has both backward non-agu dependence and forward
- agu dependence, the one with short distance take effect. */
- if ((dist_define + IX86_LEA_PRIORITY) < dist_use)
- return false;
-
- return true;
- }
+ return ix86_lea_outperforms (insn, regno0, regno1, regno2, 0);
}
/* Return true if destination reg of SET_BODY is shift count of
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 37491ae..52c57fa 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -5463,19 +5463,31 @@
[(set_attr "type" "alu")
(set_attr "mode" "QI")])
-(define_insn "*lea_1"
+(define_insn_and_split "*lea_1"
[(set (match_operand:SI 0 "register_operand" "=r")
(subreg:SI (match_operand:DI 1 "lea_address_operand" "p") 0))]
"TARGET_64BIT"
"lea{l}\t{%a1, %0|%0, %a1}"
+ "&& reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
+ [(const_int 0)]
+{
+ ix86_split_lea_for_addr (operands, SImode);
+ DONE;
+}
[(set_attr "type" "lea")
(set_attr "mode" "SI")])
-(define_insn "*lea<mode>_2"
+(define_insn_and_split "*lea<mode>_2"
[(set (match_operand:SWI48 0 "register_operand" "=r")
(match_operand:SWI48 1 "lea_address_operand" "p"))]
""
"lea{<imodesuffix>}\t{%a1, %0|%0, %a1}"
+ "reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
+ [(const_int 0)]
+{
+ ix86_split_lea_for_addr (operands, <MODE>mode);
+ DONE;
+}
[(set_attr "type" "lea")
(set_attr "mode" "<MODE>")])
@@ -5777,6 +5789,17 @@
(const_string "none")))
(set_attr "mode" "QI")])
+;; Split non destructive adds if we cannot use lea.
+(define_split
+ [(set (match_operand:SWI48 0 "register_operand" "")
+ (plus:SWI48 (match_operand:SWI48 1 "register_operand" "")
+ (match_operand:SWI48 2 "nonmemory_operand" "")))
+ (clobber (reg:CC FLAGS_REG))]
+ "reload_completed && ix86_avoid_lea_for_add (insn, operands)"
+ [(set (match_dup 0) (match_dup 1))
+ (parallel [(set (match_dup 0) (plus:<MODE> (match_dup 0) (match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])])
+
;; Convert add to the lea pattern to avoid flags dependency.
(define_split
[(set (match_operand:SWI 0 "register_operand" "")
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH, Atom] Improve AGU stalls avoidance optimization
2011-09-06 17:55 ` Ilya Enkovich
@ 2011-09-08 13:56 ` H.J. Lu
0 siblings, 0 replies; 6+ messages in thread
From: H.J. Lu @ 2011-09-08 13:56 UTC (permalink / raw)
To: Ilya Enkovich; +Cc: Uros Bizjak, gcc-patches
On Tue, Sep 6, 2011 at 10:54 AM, Ilya Enkovich <enkovich.gnu@gmail.com> wrote:
> 2011/9/6 Uros Bizjak <ubizjak@gmail.com>:
>>
>> Please merge your new splitters with corresponding LEA patterns.
>>
>> OK with this change.
>>
>> Thanks,
>> Uros.
>>
>
> Fixed. Could please someone check it in if it's OK now?
>
> Thanks,
> Ilya
> ---
> gcc/
>
> 2011-09-06 Enkovich Ilya <ilya.enkovich@intel.com>
>
> * config/i386/i386-protos.h (ix86_lea_outperforms): New.
> (ix86_avoid_lea_for_add): Likewise.
> (ix86_avoid_lea_for_addr): Likewise.
> (ix86_split_lea_for_addr): Likewise.
>
> * config/i386/i386.c (LEA_MAX_STALL): New.
> (increase_distance): Likewise.
> (insn_defines_reg): Likewise.
> (insn_uses_reg_mem): Likewise.
> (distance_non_agu_define_in_bb): Likewise.
> (distance_agu_use_in_bb): Likewise.
> (ix86_lea_outperforms): Likewise.
> (ix86_ok_to_clobber_flags): Likewise.
> (ix86_avoid_lea_for_add): Likewise.
> (ix86_avoid_lea_for_addr): Likewise.
> (ix86_split_lea_for_addr): Likewise.
> (distance_non_agu_define): Search in pred BBs added.
> (distance_agu_use): Search in succ BBs added.
> (IX86_LEA_PRIORITY): Value changed from 2 to 0.
> (LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
> (ix86_lea_for_add_ok): Use ix86_lea_outperforms to make decision.
>
> * config/i386/i386.md: Split added to transform non destructive
> add into move and add.
> (lea_1): transformed into insn_and_split to avoid AGU stalls.
> (lea<mode>_2): Likewise.
>
I checked it into trunk for you.
Thanks.
--
H.J.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH, Atom] Improve AGU stalls avoidance optimization
@ 2011-09-02 11:03 Ilya Enkovich
0 siblings, 0 replies; 6+ messages in thread
From: Ilya Enkovich @ 2011-09-02 11:03 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1212 bytes --]
Hello,
Here is a patch which adds few more splits for AGU stalls avoidance on
Atom. It also fixes cost model and detects AGU stalls more
efficiently.
Bootstrapped and checked on x86_64-linux.
Thanks,
Ilya
---
gcc/
2011-09-02 Enkovich Ilya <ilya.enkovich@intel.com>
* config/i386/i386-protos.h (ix86_lea_outperforms): New.
(ix86_avoid_lea_for_add): Likewise.
(ix86_avoid_lea_for_addr): Likewise.
(ix86_split_lea_for_addr): Likewise.
* config/i386/i386.c (LEA_MAX_STALL): New.
(increase_distance): Likewise.
(insn_defines_reg): Likewise.
(insn_uses_reg_mem): Likewise.
(distance_non_agu_define_in_bb): Likewise.
(distance_agu_use_in_bb): Likewise.
(ix86_lea_outperforms): Likewise.
(ix86_ok_to_clobber_flags): Likewise.
(ix86_avoid_lea_for_add): Likewise.
(ix86_avoid_lea_for_addr): Likewise.
(ix86_split_lea_for_addr): Likewise.
(distance_non_agu_define): Search in pred BBs added.
(distance_agu_use): Search in succ BBs added.
(IX86_LEA_PRIORITY): Value changed from 2 to 0.
(LEA_SEARCH_THRESHOLD): Now depends on LEA_MAX_STALL.
(ix86_lea_for_add_ok): Use ix86_lea_outperforms to make decision.
* config/i386/i386.md: Splits added to transform lea into a
sequence of instructions.
[-- Attachment #2: lea.diff --]
[-- Type: application/octet-stream, Size: 24120 bytes --]
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 7deeae7..900d1c5 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -90,6 +90,11 @@ extern void ix86_fixup_binary_operands_no_copy (enum rtx_code,
extern void ix86_expand_binary_operator (enum rtx_code,
enum machine_mode, rtx[]);
extern bool ix86_binary_operator_ok (enum rtx_code, enum machine_mode, rtx[]);
+extern bool ix86_lea_outperforms (rtx, unsigned int, unsigned int,
+ unsigned int, unsigned int);
+extern bool ix86_avoid_lea_for_add (rtx, rtx[]);
+extern bool ix86_avoid_lea_for_addr (rtx, rtx[]);
+extern void ix86_split_lea_for_addr (rtx[], enum machine_mode);
extern bool ix86_lea_for_add_ok (rtx, rtx[]);
extern bool ix86_vec_interleave_v2df_operator_ok (rtx operands[3], bool high);
extern bool ix86_dep_by_shift_count (const_rtx set_insn, const_rtx use_insn);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d0e1be5..3792dee 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -15955,12 +15955,125 @@ ix86_split_idivmod (enum machine_mode mode, rtx operands[],
emit_label (end_label);
}
-#define LEA_SEARCH_THRESHOLD 12
+#define LEA_MAX_STALL (3)
+#define LEA_SEARCH_THRESHOLD (LEA_MAX_STALL << 1)
+
+/* Increase given DISTANCE in half-cycles according to
+ dependencies between PREV and NEXT instructions.
+ Add 1 half-cycle if there is no dependency and
+ go to next cycle if there is some dependecy. */
+
+static unsigned int
+increase_distance (rtx prev, rtx next, unsigned int distance)
+{
+ df_ref *use_rec;
+ df_ref *def_rec;
+
+ if (!prev || !next)
+ return distance + (distance & 1) + 2;
+
+ if (!DF_INSN_USES (next) || !DF_INSN_DEFS (prev))
+ return distance + 1;
+
+ for (use_rec = DF_INSN_USES (next); *use_rec; use_rec++)
+ for (def_rec = DF_INSN_DEFS (prev); *def_rec; def_rec++)
+ if (!DF_REF_IS_ARTIFICIAL (*def_rec)
+ && DF_REF_REGNO (*use_rec) == DF_REF_REGNO (*def_rec))
+ return distance + (distance & 1) + 2;
+
+ return distance + 1;
+}
+
+/* Function checks if instruction INSN defines register number
+ REGNO1 or REGNO2. */
+
+static bool
+insn_defines_reg (unsigned int regno1, unsigned int regno2,
+ rtx insn)
+{
+ df_ref *def_rec;
+
+ for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+ if (DF_REF_REG_DEF_P (*def_rec)
+ && !DF_REF_IS_ARTIFICIAL (*def_rec)
+ && (regno1 == DF_REF_REGNO (*def_rec)
+ || regno2 == DF_REF_REGNO (*def_rec)))
+ {
+ return true;
+ }
+
+ return false;
+}
+
+/* Function checks if instruction INSN uses register number
+ REGNO as a part of address expression. */
+
+static bool
+insn_uses_reg_mem (unsigned int regno, rtx insn)
+{
+ df_ref *use_rec;
+
+ for (use_rec = DF_INSN_USES (insn); *use_rec; use_rec++)
+ if (DF_REF_REG_MEM_P (*use_rec) && regno == DF_REF_REGNO (*use_rec))
+ return true;
+
+ return false;
+}
+
+/* Search backward for non-agu definition of register number REGNO1
+ or register number REGNO2 in basic block starting from instruction
+ START up to head of basic block or instruction INSN.
+
+ Function puts true value into *FOUND var if definition was found
+ and false otherwise.
+
+ Distance in half-cycles between START and found instruction or head
+ of BB is added to DISTANCE and returned. */
+
+static int
+distance_non_agu_define_in_bb (unsigned int regno1, unsigned int regno2,
+ rtx insn, int distance,
+ rtx start, bool *found)
+{
+ basic_block bb = start ? BLOCK_FOR_INSN (start) : NULL;
+ rtx prev = start;
+ rtx next = NULL;
+ enum attr_type insn_type;
+
+ *found = false;
+
+ while (prev
+ && prev != insn
+ && distance < LEA_SEARCH_THRESHOLD)
+ {
+ if (NONDEBUG_INSN_P (prev) && NONJUMP_INSN_P (prev))
+ {
+ distance = increase_distance (prev, next, distance);
+ if (insn_defines_reg (regno1, regno2, prev))
+ {
+ insn_type = get_attr_type (prev);
+ if (insn_type != TYPE_LEA)
+ {
+ *found = true;
+ return distance;
+ }
+ }
+
+ next = prev;
+ }
+ if (prev == BB_HEAD (bb))
+ break;
+
+ prev = PREV_INSN (prev);
+ }
+
+ return distance;
+}
/* Search backward for non-agu definition of register number REGNO1
or register number REGNO2 in INSN's basic block until
1. Pass LEA_SEARCH_THRESHOLD instructions, or
- 2. Reach BB boundary, or
+ 2. Reach neighbour BBs boundary, or
3. Reach agu definition.
Returns the distance between the non-agu definition point and INSN.
If no definition point, returns -1. */
@@ -15971,35 +16084,14 @@ distance_non_agu_define (unsigned int regno1, unsigned int regno2,
{
basic_block bb = BLOCK_FOR_INSN (insn);
int distance = 0;
- df_ref *def_rec;
- enum attr_type insn_type;
+ bool found = false;
if (insn != BB_HEAD (bb))
- {
- rtx prev = PREV_INSN (insn);
- while (prev && distance < LEA_SEARCH_THRESHOLD)
- {
- if (NONDEBUG_INSN_P (prev))
- {
- distance++;
- for (def_rec = DF_INSN_DEFS (prev); *def_rec; def_rec++)
- if (DF_REF_TYPE (*def_rec) == DF_REF_REG_DEF
- && !DF_REF_IS_ARTIFICIAL (*def_rec)
- && (regno1 == DF_REF_REGNO (*def_rec)
- || regno2 == DF_REF_REGNO (*def_rec)))
- {
- insn_type = get_attr_type (prev);
- if (insn_type != TYPE_LEA)
- goto done;
- }
- }
- if (prev == BB_HEAD (bb))
- break;
- prev = PREV_INSN (prev);
- }
- }
+ distance = distance_non_agu_define_in_bb (regno1, regno2, insn,
+ distance, PREV_INSN (insn),
+ &found);
- if (distance < LEA_SEARCH_THRESHOLD)
+ if (!found && distance < LEA_SEARCH_THRESHOLD)
{
edge e;
edge_iterator ei;
@@ -16013,88 +16105,121 @@ distance_non_agu_define (unsigned int regno1, unsigned int regno2,
}
if (simple_loop)
+ distance = distance_non_agu_define_in_bb (regno1, regno2,
+ insn, distance,
+ BB_END (bb), &found);
+ else
{
- rtx prev = BB_END (bb);
- while (prev
- && prev != insn
- && distance < LEA_SEARCH_THRESHOLD)
+ int shortest_dist = -1;
+ bool found_in_bb = false;
+
+ FOR_EACH_EDGE (e, ei, bb->preds)
{
- if (NONDEBUG_INSN_P (prev))
+ int bb_dist = distance_non_agu_define_in_bb (regno1, regno2,
+ insn, distance,
+ BB_END (e->src),
+ &found_in_bb);
+ if (found_in_bb)
{
- distance++;
- for (def_rec = DF_INSN_DEFS (prev); *def_rec; def_rec++)
- if (DF_REF_TYPE (*def_rec) == DF_REF_REG_DEF
- && !DF_REF_IS_ARTIFICIAL (*def_rec)
- && (regno1 == DF_REF_REGNO (*def_rec)
- || regno2 == DF_REF_REGNO (*def_rec)))
- {
- insn_type = get_attr_type (prev);
- if (insn_type != TYPE_LEA)
- goto done;
- }
+ if (shortest_dist < 0)
+ shortest_dist = bb_dist;
+ else if (bb_dist > 0)
+ shortest_dist = MIN (bb_dist, shortest_dist);
}
- prev = PREV_INSN (prev);
+
+ found = found || found_in_bb;
}
+
+ distance = shortest_dist;
}
}
- distance = -1;
-
-done:
/* get_attr_type may modify recog data. We want to make sure
that recog data is valid for instruction INSN, on which
distance_non_agu_define is called. INSN is unchanged here. */
extract_insn_cached (insn);
+
+ if (!found)
+ distance = -1;
+ else
+ distance = distance >> 1;
+
return distance;
}
-/* Return the distance between INSN and the next insn that uses
- register number REGNO0 in memory address. Return -1 if no such
- a use is found within LEA_SEARCH_THRESHOLD or REGNO0 is set. */
+/* Return the distance in half-cycles between INSN and the next
+ insn that uses register number REGNO in memory address added
+ to DISTANCE. Return -1 if REGNO0 is set.
+
+ Put true value into *FOUND if register usage was found and
+ false otherwise.
+ Put true value into *REDEFINED if register redefinition was
+ found and false otherwise. */
static int
-distance_agu_use (unsigned int regno0, rtx insn)
+distance_agu_use_in_bb(unsigned int regno,
+ rtx insn, int distance, rtx start,
+ bool *found, bool *redefined)
{
- basic_block bb = BLOCK_FOR_INSN (insn);
- int distance = 0;
- df_ref *def_rec;
- df_ref *use_rec;
+ basic_block bb = start ? BLOCK_FOR_INSN (start) : NULL;
+ rtx next = start;
+ rtx prev = NULL;
- if (insn != BB_END (bb))
+ *found = false;
+ *redefined = false;
+
+ while (next
+ && next != insn
+ && distance < LEA_SEARCH_THRESHOLD)
{
- rtx next = NEXT_INSN (insn);
- while (next && distance < LEA_SEARCH_THRESHOLD)
+ if (NONDEBUG_INSN_P (next) && NONJUMP_INSN_P (next))
{
- if (NONDEBUG_INSN_P (next))
+ distance = increase_distance(prev, next, distance);
+ if (insn_uses_reg_mem (regno, next))
{
- distance++;
-
- for (use_rec = DF_INSN_USES (next); *use_rec; use_rec++)
- if ((DF_REF_TYPE (*use_rec) == DF_REF_REG_MEM_LOAD
- || DF_REF_TYPE (*use_rec) == DF_REF_REG_MEM_STORE)
- && regno0 == DF_REF_REGNO (*use_rec))
- {
- /* Return DISTANCE if OP0 is used in memory
- address in NEXT. */
- return distance;
- }
+ /* Return DISTANCE if OP0 is used in memory
+ address in NEXT. */
+ *found = true;
+ return distance;
+ }
- for (def_rec = DF_INSN_DEFS (next); *def_rec; def_rec++)
- if (DF_REF_TYPE (*def_rec) == DF_REF_REG_DEF
- && !DF_REF_IS_ARTIFICIAL (*def_rec)
- && regno0 == DF_REF_REGNO (*def_rec))
- {
- /* Return -1 if OP0 is set in NEXT. */
- return -1;
- }
+ if (insn_defines_reg (regno, -1, next))
+ {
+ /* Return -1 if OP0 is set in NEXT. */
+ *redefined = true;
+ return -1;
}
- if (next == BB_END (bb))
- break;
- next = NEXT_INSN (next);
+
+ prev = next;
}
+
+ if (next == BB_END (bb))
+ break;
+
+ next = NEXT_INSN (next);
}
- if (distance < LEA_SEARCH_THRESHOLD)
+ return distance;
+}
+
+/* Return the distance between INSN and the next insn that uses
+ register number REGNO0 in memory address. Return -1 if no such
+ a use is found within LEA_SEARCH_THRESHOLD or REGNO0 is set. */
+
+static int
+distance_agu_use (unsigned int regno0, rtx insn)
+{
+ basic_block bb = BLOCK_FOR_INSN (insn);
+ int distance = 0;
+ bool found = false;
+ bool redefined = false;
+
+ if (insn != BB_END (bb))
+ distance = distance_agu_use_in_bb (regno0, insn, distance,
+ NEXT_INSN (insn),
+ &found, &redefined);
+
+ if (!found && !redefined && distance < LEA_SEARCH_THRESHOLD)
{
edge e;
edge_iterator ei;
@@ -16108,42 +16233,41 @@ distance_agu_use (unsigned int regno0, rtx insn)
}
if (simple_loop)
+ distance = distance_agu_use_in_bb (regno0, insn,
+ distance, BB_HEAD (bb),
+ &found, &redefined);
+ else
{
- rtx next = BB_HEAD (bb);
- while (next
- && next != insn
- && distance < LEA_SEARCH_THRESHOLD)
+ int shortest_dist = -1;
+ bool found_in_bb = false;
+ bool redefined_in_bb = false;
+
+ FOR_EACH_EDGE (e, ei, bb->succs)
{
- if (NONDEBUG_INSN_P (next))
+ int bb_dist = distance_agu_use_in_bb (regno0, insn,
+ distance, BB_HEAD (e->dest),
+ &found_in_bb, &redefined_in_bb);
+ if (found_in_bb)
{
- distance++;
-
- for (use_rec = DF_INSN_USES (next); *use_rec; use_rec++)
- if ((DF_REF_TYPE (*use_rec) == DF_REF_REG_MEM_LOAD
- || DF_REF_TYPE (*use_rec) == DF_REF_REG_MEM_STORE)
- && regno0 == DF_REF_REGNO (*use_rec))
- {
- /* Return DISTANCE if OP0 is used in memory
- address in NEXT. */
- return distance;
- }
-
- for (def_rec = DF_INSN_DEFS (next); *def_rec; def_rec++)
- if (DF_REF_TYPE (*def_rec) == DF_REF_REG_DEF
- && !DF_REF_IS_ARTIFICIAL (*def_rec)
- && regno0 == DF_REF_REGNO (*def_rec))
- {
- /* Return -1 if OP0 is set in NEXT. */
- return -1;
- }
-
+ if (shortest_dist < 0)
+ shortest_dist = bb_dist;
+ else if (bb_dist > 0)
+ shortest_dist = MIN (bb_dist, shortest_dist);
}
- next = NEXT_INSN (next);
+
+ found = found || found_in_bb;
}
+
+ distance = shortest_dist;
}
}
- return -1;
+ if (!found || redefined)
+ distance = -1;
+ else
+ distance = distance >> 1;
+
+ return distance;
}
/* Define this macro to tune LEA priority vs ADD, it take effect when
@@ -16151,7 +16275,304 @@ distance_agu_use (unsigned int regno0, rtx insn)
Negative value: ADD is more preferred than LEA
Zero: Netrual
Positive value: LEA is more preferred than ADD*/
-#define IX86_LEA_PRIORITY 2
+#define IX86_LEA_PRIORITY 0
+
+/* Return true if usage of lea INSN has performance advantage
+ over a sequence of instructions. Instructions sequence has
+ SPLIT_COST cycles higher latency than lea latency. */
+
+bool
+ix86_lea_outperforms (rtx insn, unsigned int regno0, unsigned int regno1,
+ unsigned int regno2, unsigned int split_cost)
+{
+ int dist_define, dist_use;
+
+ dist_define = distance_non_agu_define (regno1, regno2, insn);
+ dist_use = distance_agu_use (regno0, insn);
+
+ if (dist_define < 0 || dist_define >= LEA_MAX_STALL)
+ {
+ /* If there is no non AGU operand definition, no AGU
+ operand usage and split cost is 0 then both lea
+ and non lea variants have same priority. Currently
+ we prefer lea for 64 bit code and non lea on 32 bit
+ code. */
+ if (dist_use < 0 && split_cost == 0)
+ return TARGET_64BIT || IX86_LEA_PRIORITY;
+ else
+ return true;
+ }
+
+ /* With longer definitions distance lea is more preferable.
+ Here we change it to take into account splitting cost and
+ lea priority. */
+ dist_define += split_cost + IX86_LEA_PRIORITY;
+
+ /* If there is no use in memory addess then we just check
+ that split cost does not exceed AGU stall. */
+ if (dist_use < 0)
+ return dist_define >= LEA_MAX_STALL;
+
+ /* If this insn has both backward non-agu dependence and forward
+ agu dependence, the one with short distance takes effect. */
+ return dist_define >= dist_use;
+}
+
+/* Return true if it is legal to clobber flags by INSN and
+ false otherwise. */
+
+static bool
+ix86_ok_to_clobber_flags(rtx insn)
+{
+ basic_block bb = BLOCK_FOR_INSN (insn);
+ df_ref *use;
+ bitmap live;
+
+ while (insn)
+ {
+ if (NONDEBUG_INSN_P (insn))
+ {
+ for (use = DF_INSN_USES (insn); *use; use++)
+ if (DF_REF_REG_USE_P (*use) && DF_REF_REGNO (*use) == FLAGS_REG)
+ return false;
+
+ if (insn_defines_reg (FLAGS_REG, -1, insn))
+ return true;
+ }
+
+ if (insn == BB_END (bb))
+ break;
+
+ insn = NEXT_INSN (insn);
+ }
+
+ live = df_get_live_out(bb);
+ return !REGNO_REG_SET_P (live, FLAGS_REG);
+}
+
+/* Return true if we need to split op0 = op1 + op2 into a sequence of
+ move and add to avoid AGU stalls. */
+
+bool
+ix86_avoid_lea_for_add (rtx insn, rtx operands[])
+{
+ unsigned int regno0 = true_regnum (operands[0]);
+ unsigned int regno1 = true_regnum (operands[1]);
+ unsigned int regno2 = true_regnum (operands[2]);
+
+ /* Check if we need to optimize. */
+ if (!TARGET_OPT_AGU || optimize_function_for_size_p (cfun))
+ return false;
+
+ /* Check it is correct to split here. */
+ if (!ix86_ok_to_clobber_flags(insn))
+ return false;
+
+ /* We need to split only adds with non destructive
+ destination operand. */
+ if (regno0 == regno1 || regno0 == regno2)
+ return false;
+ else
+ return !ix86_lea_outperforms (insn, regno0, regno1, regno2, 1);
+}
+
+/* Return true if we need to split lea into a sequence of
+ instructions to avoid AGU stalls. */
+
+bool
+ix86_avoid_lea_for_addr (rtx insn, rtx operands[])
+{
+ unsigned int regno0 = true_regnum (operands[0]) ;
+ unsigned int regno1 = -1;
+ unsigned int regno2 = -1;
+ unsigned int split_cost = 0;
+ struct ix86_address parts;
+ int ok;
+
+ /* Check we need to optimize. */
+ if (!TARGET_OPT_AGU || optimize_function_for_size_p (cfun))
+ return false;
+
+ /* Check it is correct to split here. */
+ if (!ix86_ok_to_clobber_flags(insn))
+ return false;
+
+ ok = ix86_decompose_address (operands[1], &parts);
+ gcc_assert (ok);
+
+ /* We should not split into add if non legitimate pic
+ operand is used as displacement. */
+ if (parts.disp && flag_pic && !LEGITIMATE_PIC_OPERAND_P (parts.disp))
+ return false;
+
+ if (parts.base)
+ regno1 = true_regnum (parts.base);
+ if (parts.index)
+ regno2 = true_regnum (parts.index);
+
+ /* Compute how many cycles we will add to execution time
+ if split lea into a sequence of instructions. */
+ if (parts.base || parts.index)
+ {
+ /* Have to use mov instruction if non desctructive
+ destination form is used. */
+ if (regno1 != regno0 && regno2 != regno0)
+ split_cost += 1;
+
+ /* Have to add index to base if both exist. */
+ if (parts.base && parts.index)
+ split_cost += 1;
+
+ /* Have to use shift and adds if scale is 2 or greater. */
+ if (parts.scale > 1)
+ {
+ if (regno0 != regno1)
+ split_cost += 1;
+ else if (regno2 == regno0)
+ split_cost += 4;
+ else
+ split_cost += parts.scale;
+ }
+
+ /* Have to use add instruction with immediate if
+ disp is non zero. */
+ if (parts.disp && parts.disp != const0_rtx)
+ split_cost += 1;
+
+ /* Subtract the price of lea. */
+ split_cost -= 1;
+ }
+
+ return !ix86_lea_outperforms (insn, regno0, regno1, regno2, split_cost);
+}
+
+extern void
+ix86_split_lea_for_addr (rtx operands[], enum machine_mode mode)
+{
+ unsigned int regno0 = true_regnum (operands[0]) ;
+ unsigned int regno1 = -1;
+ unsigned int regno2 = -1;
+ struct ix86_address parts;
+ rtx tmp, clob;
+ rtvec par;
+ int ok, adds;
+
+ ok = ix86_decompose_address (operands[1], &parts);
+ gcc_assert (ok);
+
+ if (parts.base)
+ {
+ if (GET_MODE (parts.base) != mode)
+ parts.base = gen_rtx_SUBREG (mode, parts.base, 0);
+ regno1 = true_regnum (parts.base);
+ }
+
+ if (parts.index)
+ {
+ if (GET_MODE (parts.index) != mode)
+ parts.index = gen_rtx_SUBREG (mode, parts.index, 0);
+ regno2 = true_regnum (parts.index);
+ }
+
+ if (parts.scale > 1)
+ {
+ /* Case r1 = r1 + ... */
+ if (regno1 == regno0)
+ {
+ /* If we have a case r1 = r1 + C * r1 then we
+ should use multiplication which is very
+ expensive. Assume cost model is wrong if we
+ have such case here. */
+ gcc_assert (regno2 != regno0);
+
+ for (adds = parts.scale; adds > 0; adds--)
+ {
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.index);
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode,
+ gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+ }
+ else
+ {
+ /* r1 = r2 + r3 * C case. Need to move r3 into r1. */
+ if (regno0 != regno2)
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.index));
+
+ /* Use shift for scaling. */
+ tmp = gen_rtx_ASHIFT (mode, operands[0],
+ GEN_INT (exact_log2 (parts.scale)));
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+
+ if (parts.base)
+ {
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.base);
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+
+ if (parts.disp && parts.disp != const0_rtx)
+ {
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.disp);
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+ }
+ }
+ else if (!parts.base && !parts.index)
+ {
+ gcc_assert(parts.disp);
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.disp));
+ }
+ else
+ {
+ if (!parts.base)
+ {
+ if (regno0 != regno2)
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.index));
+ }
+ else if (!parts.index)
+ {
+ if (regno0 != regno1)
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.base));
+ }
+ else
+ {
+ if (regno0 == regno1)
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.index);
+ else if (regno0 == regno2)
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.base);
+ else
+ {
+ emit_insn (gen_rtx_SET (VOIDmode, operands[0], parts.base));
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.index);
+ }
+
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+
+ if (parts.disp && parts.disp != const0_rtx)
+ {
+ tmp = gen_rtx_PLUS (mode, operands[0], parts.disp);
+ tmp = gen_rtx_SET (VOIDmode, operands[0], tmp);
+ clob = gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (CCmode, FLAGS_REG));
+ par = gen_rtvec (2, tmp, clob);
+ emit_insn (gen_rtx_PARALLEL (VOIDmode, par));
+ }
+ }
+}
/* Return true if it is ok to optimize an ADD operation to LEA
operation to avoid flag register consumation. For most processors,
@@ -16172,26 +16593,8 @@ ix86_lea_for_add_ok (rtx insn, rtx operands[])
if (!TARGET_OPT_AGU || optimize_function_for_size_p (cfun))
return false;
- else
- {
- int dist_define, dist_use;
- /* Return false if REGNO0 isn't used in memory address. */
- dist_use = distance_agu_use (regno0, insn);
- if (dist_use <= 0)
- return false;
-
- dist_define = distance_non_agu_define (regno1, regno2, insn);
- if (dist_define <= 0)
- return true;
-
- /* If this insn has both backward non-agu dependence and forward
- agu dependence, the one with short distance take effect. */
- if ((dist_define + IX86_LEA_PRIORITY) < dist_use)
- return false;
-
- return true;
- }
+ return ix86_lea_outperforms (insn, regno0, regno1, regno2, 0);
}
/* Return true if destination reg of SET_BODY is shift count of
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 37491ae..100e3ff 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -5777,6 +5777,41 @@
(const_string "none")))
(set_attr "mode" "QI")])
+;; Split non destructive adds if we cannot use lea.
+(define_split
+ [(set (match_operand:SWI48 0 "register_operand" "")
+ (plus:SWI48 (match_operand:SWI48 1 "register_operand" "")
+ (match_operand:SWI48 2 "nonmemory_operand" "")))
+ (clobber (reg:CC FLAGS_REG))]
+ "reload_completed && ix86_avoid_lea_for_add (insn, operands)"
+ [(set (match_dup 0) (match_dup 1))
+ (parallel [(set (match_dup 0) (plus:<MODE> (match_dup 0) (match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])
+ ]
+)
+
+;; Split lea into one or more ALU instructions if profitable.
+(define_split
+ [(set (match_operand:SWI48 0 "register_operand" "")
+ (match_operand:SWI48 1 "lea_address_operand" ""))]
+ "reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
+ [(const_int 0)]
+{
+ ix86_split_lea_for_addr (operands, <MODE>mode);
+ DONE;
+})
+
+;; Split lea into one or more ALU instructions if profitable.
+(define_split
+ [(set (match_operand:SI 0 "register_operand" "")
+ (subreg:SI (match_operand:DI 1 "lea_address_operand" "") 0))]
+ "reload_completed && ix86_avoid_lea_for_addr (insn, operands)"
+ [(const_int 0)]
+{
+ ix86_split_lea_for_addr (operands, SImode);
+ DONE;
+})
+
;; Convert add to the lea pattern to avoid flags dependency.
(define_split
[(set (match_operand:SWI 0 "register_operand" "")
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2011-09-08 13:42 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-03 14:25 [PATCH, Atom] Improve AGU stalls avoidance optimization Uros Bizjak
2011-09-06 12:26 ` Ilya Enkovich
2011-09-06 15:43 ` Uros Bizjak
2011-09-06 17:55 ` Ilya Enkovich
2011-09-08 13:56 ` H.J. Lu
-- strict thread matches above, loose matches on Subject: below --
2011-09-02 11:03 Ilya Enkovich
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).