Hi, PR52272 reported a performance regression in spec2006/410.bwaves once GCC is prevented from representing address of one memory object using address of another memory object. Also as I commented in that PR, we have two possible fixes for this: 1) Improve how TMR.base is deduced, so that we can represent addr of mem obj using another one, while not breaking PR50955. 2) Add iv candidates with base object stripped. In this way, we use the common base-stripped part to represent all address expressions, in the form of [base_1 + common], [base_2 + common], ..., [base_n + common]. In terms of code generation, method 2) is at least as good as 1), actually better in my opinion. The problem of 2) is we need to tell when iv candidates should be added for the common part and when shouldn't. This issue can be generalized and described as: We know IVO tries to add candidates by deriving from iv uses. One disadvantage is that candidates are derived from iv use independently. It doesn't take common sub expression among different iv uses into consideration. As a result, candidate for common sub expression is not added, while many useless candidates are added. As a matter of fact, candidate derived from iv use is useful only if it's common enough and could be shared among different uses. A candidate is most likely useless if it's derived from a single use and could not be shared by others. This patch works in this way by firstly recording all kinds candidates derived from iv uses, then adding candidates for common ones. The patch improves 410.bwaves by 3-4% on x86_64. I also saw regression for 400.perlbench and small regression for 401.bzip on x86_64, but I can confirm they are false alarms caused by align issues. For aarch64, fp cases are obviously improved for both spec2000 and spec2006. Also the patch causes 2-3% regression for 459.GemsFDTD, which I think is another irrelevant issue caused by heuristic candidate selecting algorithm. Unfortunately, I don't have fix to it currently. This patch may add more candidates in some cases, but generally candidates number is smaller because we don't need to add useless candidates now. Statistic data shows there are quite fewer loops with more than 30 candidates when building spec2k6 on x86_64 using this patch. Bootstrap and test on x86_64. I will re-test it against latest trunk on AArch64. Is it OK? Thanks, bin 2015-11-03 Bin Cheng PR tree-optimization/52272 * tree-ssa-loop-ivopts.c (struct iv_common_cand): New struct. (struct iv_common_cand_hasher): New struct. (iv_common_cand_hasher::hash): New function. (iv_common_cand_hasher::equal): New function. (struct ivopts_data): New fields, iv_common_cand_tab and iv_common_cands. (tree_ssa_iv_optimize_init): Initialize above fields. (record_common_cand, common_cand_cmp): New functions. (add_iv_candidate_derived_from_uses): New function. (add_iv_candidate_for_use): Record iv_common_cands derived from iv use in hash table, instead of adding candidates directly. (add_iv_candidate_for_uses): Call add_iv_candidate_derived_from_uses. (record_important_candidates): Add important candidates to iv uses' related_cands. Always keep related_cands for future use. (try_add_cand_for): Use iv uses' related_cands. (free_loop_data, tree_ssa_iv_optimize_finalize): Release new fields in struct ivopts_data, iv_common_cand_tab and iv_common_cands.