Hi, On 14 February 2018 at 09:47, Kugan Vivekanandarajah wrote: > Hi Kyrill, > > On 13 February 2018 at 20:47, Kyrill Tkachov > wrote: >> Hi Kugan, >> >> On 12/02/18 23:58, Kugan Vivekanandarajah wrote: >>> >>> Implements a machine reorg pass for aarch64/Falkor to handle >>> prefetcher tag collision. This is strictly not part of the loop >>> unroller but for Falkor, unrolling can make h/w prefetcher performing >>> badly if there are too much tag collisions based on the discussions in >>> https://gcc.gnu.org/ml/gcc/2017-10/msg00178.html. >>> >> >> Could you expand a bit more on what transformation exactly this pass does? > > This is similar to what LLVM does in https://reviews.llvm.org/D35366. > > Falkor hardware prefetcher works well when signature of the prefetches > (or tags as computed in the patch - similar to LLVM) are different for > different memory streams. If different memory streams have the same > signature, it can result in bad performance. This machine reorg pass > tries to change the signature of memory loads by changing the base > register with a free register. > >> From my understanding the loads that use the same base >> register and offset and have the same destination register >> are considered part of the same stream by the hardware prefetcher, so for >> example: >> ldr x0, [x1, 16] (load1) >> ... (set x1 to something else) >> ldr x0, [x1, 16] (load2) >> >> will cause the prefetcher to think that both loads are part of the same >> stream, >> so this pass tries to rewrite the sequence into: >> ldr x0, [x1, 16] >> ... (set x1 to something else) >> mov tmp, x1 >> ldr x0, [tmp, 16] >> >> Where the tag/signature is the combination of destination x0, base x1 and >> offset 16. >> Is this a fair description? > > This is precisely what is happening. > >> >> I've got some comments on the patch itself >> >>> gcc/ChangeLog: >>> >>> 2018-02-12 Kugan Vivekanandarajah >>> >>> * config/aarch64/aarch64.c (iv_p): New. >>> (strided_load_p): Likwise. >>> (make_tag): Likesie. >>> (get_load_info): Likewise. >>> (aarch64_reorg): Likewise. >>> (TARGET_MACHINE_DEPENDENT_REORG): Implement new target hook. >> >> >> New functions need function comments describing the arguments at least. >> Functions like make_tag, get_load_info etc can get tricky to maintain >> without >> some documentation on what they are supposed to accept and return. > > I wil add the comments. > >> >> I think the pass should be enabled at certain optimisation levels, say -O2? >> I don't think it would be desirable at -Os since it creates extra moves that >> increase code size. > > Ok, I will change this. > >> >> That being said, I would recommend you implement this as an aarch64-specific >> pass, >> in a similar way to cortex-a57-fma-steering.c. That way you can register it >> in >> aarch64-passes.def and have flexibility as to when exactly the pass gets to >> run >> (i.e. you wouldn't be limited by when machine_reorg gets run). >> >> Also, I suggest you don't use the "if (aarch64_tune != falkor) return;" way >> of >> gating this pass. Do it in a similar way to the FMA steering pass that is, >> define a new flag in aarch64-tuning-flags.def and use it in the tune_flags >> field >> of the falkor tuning struct. > > Ok, I will revise the patch. Here is the revised patch. Thanks, Kugan gcc/ChangeLog: 2018-02-15 Kugan Vivekanandarajah * config.gcc: Add falkor-tag-collision-avoidance.o to extra_objs for aarch64-*-*. * config/aarch64/aarch64-protos.h (make_pass_tag_collision_avoidance): Declare. * config/aarch64/aarch64-passes.def: Insert tag collision avoidance pass. * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_AVOID_PREFETCH_TAG_COLLISION): Define. * config/aarch64/aarch64.c (qdf24xx_tunings): Add AARCH64_EXTRA_TUNE_AVOID_PREFETCH_TAG_COLLISION. * config/aarch64/falkor-tag-collision-avoidance.c: New file. * config/aarch64/t-aarch64: Add falkor-tag-collision-avoidance.o. > > > Thanks, > Kugan > >> >> Hope this helps, >> Kyrill