From: Shuo Wang
Subject: x86-64: memcpy performance reduce when running in virtual mechine
Date: Mon, 11 Jan 2021 16:38:48 +0800
Message-ID: <20210111083848.22496-1-wangshuo47@huawei.com>
List-Id: Libc-alpha mailing list

memcpy performance drops when running in a virtual machine compared with
the host. These are the test results:

-----------------------
|        | host |  vm  |
| cycle: |  78  | 1503 |
-----------------------

From perf, we believe that the host and the VM take the same branch:

[host]
  78.61%  libc-2.28.so     [.] __memmove_sse2_unaligned_erms
  12.85%  [kernel]         [k] nmi
   6.38%  hot_host_memcpy  [.] main

[virtual machine]
  98.64%  libc-2.28.so     [.] __memmove_sse2_unaligned_erms
   0.17%  hot_vm_memcpy    [.] main

This is our demo program:

    /* The header names were stripped from the archived copy of this
       message; these are the ones the code below requires. */
    #include <stdio.h>   /* printf */
    #include <stdlib.h>  /* atoi, valloc */
    #include <string.h>  /* memset, memcpy */

    static __inline__ unsigned long long rdtsc(void)
    {
        unsigned hi, lo;
        __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
        return ((unsigned long long)lo) | (((unsigned long long)hi) << 32);
    }

    int main(int argc, char **argv)
    {
        int i, defs, lm_optb;

        if (argc == 3) {
            defs = atoi(argv[1]);     /* buffer size in bytes */
            lm_optb = atoi(argv[2]);  /* number of memcpy calls */
        } else {
            printf("error input!\n");
            return 1;
        }

        char *src = (char *)valloc(defs);
        char *dest = (char *)valloc(defs);
        int opts = defs;

        memset(src, 1, defs);
        memset(dest, 1, defs);

        unsigned long long begin, end;

        begin = rdtsc();
        //while (1) {
        for (i = 0; i < lm_optb; i++) {
            (void) memcpy(dest, src, opts);
        }
        //}
        end = rdtsc();

        printf("all cycle = %llu, percall = %llu\n",
               end - begin, (end - begin) / lm_optb);
        return (0);
    }

This is the test log:

    # taskset -c 2 ./host_memcpy 1024 1024000
    all cycle = 80149652, percall = 78
    # taskset -c 2 ./host_memcpy 1024 1024000
    all cycle = 93075200, percall = 90
    # taskset -c 2 ./vm_memcpy 1024 1024000
    all cycle = 1539990968, percall = 1503
    # taskset -c 2 ./vm_memcpy 1024 1024000
    all cycle = 1541243316, percall = 1505

We build it with:

    # gcc -g -O0 memcpy.c -o host_memcpy
    # gcc -g -O0 memcpy.c -o vm_memcpy

The environment information is as follows:

[host]
- kernel version: 4.18.0
- glibc version: 2.28
- gcc version: 8.3.1
- qemu version: 2.12.0
- libvirtd version: 4.5.0

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              60
On-line CPU(s) list: 0-59
Thread(s) per core:  2
Core(s) per socket:  15
Socket(s):           8
NUMA node(s):        8
Vendor ID:           GenuineIntel
CPU family:          6
Model:               62
Model name:          Intel(R) Xeon(R) CPU E7-8870 v2 @ 2.30GHz
Stepping:            7
CPU MHz:             2294.529
CPU max MHz:         2300.0000
CPU min MHz:         1200.0000
BogoMIPS:            4589.07
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            30720K
NUMA node0 CPU(s):   0-14,30-44
NUMA node1 CPU(s):   15-29,45-59
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts md_clear flush_l1d

[virtual machine]
- kernel version: 4.18.0
- glibc version: 2.28
- gcc version: 8.3.1
- qemu version: 2.12.0
- libvirtd version: 4.5.0

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           4
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               62
Model name:          Intel(R) Xeon(R) CPU E7-8870 v2 @ 2.30GHz
Stepping:            7
CPU MHz:             2294.468
BogoMIPS:            4588.93
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            4096K
L3 cache:            16384K
NUMA node0 CPU(s):   0-3
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust smep erms xsaveopt arat umip md_clear arch_capabilities