This patch improves the MIPS assembly implementations of memcpy. It adds two optimizations: prefetching of data for subsequent iterations of the memcpy loop, and a pipelined expansion of the unaligned memcpy loop. Together these speed up MIPS memcpy by about 10%.

The prefetching part is straightforward: it prefetches a cache line (32 bytes) for the +1 iteration in the unaligned case and the +2 iteration in the aligned case. The rationale is that a prefetch takes about the same time to fetch its data as 1 iteration of the unaligned loop or 2 iterations of the aligned loop. These parameters were tuned on a modern MIPS processor.

The pipelined expansion of the unaligned loop is implemented in the same fashion as the existing expansion of the aligned loop. The assembly is tricky, but it works.

These changes are almost 3 years old and have been thoroughly tested in CodeSourcery MIPS toolchains. Retested against current trunk with no regressions for the n32, n64 and o32 ABIs.

OK to apply?

--
Maxim Kuvyrkov
Mentor Graphics
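As an aside for reviewers less familiar with software prefetching: the prefetch-ahead scheme described above is roughly equivalent to the following C sketch. It uses GCC's `__builtin_prefetch`; the copy routine, the `CACHE_LINE` constant, and the prefetch distance of 2 lines are illustrative stand-ins, not the actual assembly or the tuned values from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only: prefetch a cache line that a later
   iteration will read, analogous to the +1/+2 iteration prefetch
   distance in the assembly.  The real implementation is hand-written
   MIPS assembly, not a C loop over memcpy.  */
#define CACHE_LINE 32

static void copy_with_prefetch(char *dst, const char *src, size_t n)
{
    size_t i = 0;

    for (; i + CACHE_LINE <= n; i += CACHE_LINE) {
        /* Hint the hardware to start fetching the source line two
           iterations ahead (second arg 0 = read access).  By the time
           the loop reaches it, the data should already be in cache.  */
        __builtin_prefetch(src + i + 2 * CACHE_LINE, 0, 0);
        memcpy(dst + i, src + i, CACHE_LINE);
    }

    /* Copy the remaining tail bytes, if any.  */
    if (i < n)
        memcpy(dst + i, src + i, n - i);
}
```

Note that a prefetch past the end of the buffer is harmless here: `__builtin_prefetch` is only a hint and does not fault on invalid addresses.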