public inbox for libc-alpha@sourceware.org
From: Shuo Wang <wangshuo47@huawei.com>
To: <hjl.tools@gmail.com>, <libc-alpha@sourceware.org>
Cc: <hushiyuan@huawei.com>, <liqingqing3@huawei.com>
Subject: x86-64: memcpy performance reduction when running in a virtual machine
Date: Mon, 11 Jan 2021 16:41:57 +0800
Message-ID: <20210111084157.15188-1-wangshuo47@huawei.com>

There is also a performance drop when memcpy enters __memmove_avx_unaligned_erms
in the VM compared with the host.
>memcpy performance drops when running in a virtual machine compared with the host.
>This is the test result:
>---------------------------------
>|                 | host |  vm  |
>| cycles per call |  78  | 1503 |
>---------------------------------
>
From perf, we believe that the host and the VM take the same branch:
>[host]
>  78.61%  libc-2.28.so     [.] __memmove_sse2_unaligned_erms
>  12.85%  [kernel]         [k] nmi
>   6.38%  hot_host_memcpy  [.] main
>   
>[virtual machine]
>  98.64%  libc-2.28.so   [.] __memmove_sse2_unaligned_erms
>   0.17%  hot_vm_memcpy  [.] main
>   
>This is our demo:
>#include <unistd.h>
>#include <stdlib.h>
>#include <stdio.h>
>#include <string.h>
>
>static __inline__ unsigned long long rdtsc(void)
>{
>  unsigned hi, lo;
>  __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
>  return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
>}
>
>int main(int argc, char **argv)
>{
>    int i, defs, lm_optb;
>    if (argc == 3) {
>        defs = atoi(argv[1]);
>        lm_optb = atoi(argv[2]);
>    } else {
>        printf("error input!\n");
>        return 1;
>    }
>    char *src = (char *)valloc(defs);
>    char *dest = (char *)valloc(defs);
>    int opts = defs;
>
>    memset(src, 1, defs);
>    memset(dest, 1, defs);
>
>    unsigned long long begin, end;
>    begin = rdtsc();
>
>//while (1) {
>    for (i = 0; i < lm_optb; i++) {
>        (void) memcpy(dest, src, opts);
>    }
>//}
>
>    end = rdtsc();
>    printf("all cycle = %llu, percall = %llu\n", end - begin, (end - begin) / lm_optb);
>
>    return (0);
>}
>
>This is the test log:
># taskset -c 2 ./host_memcpy 1024 1024000
>all cycle = 80149652, percall = 78
># taskset -c 2 ./host_memcpy 1024 1024000
>all cycle = 93075200, percall = 90
>
># taskset -c 2 ./vm_memcpy 1024 1024000
>all cycle = 1539990968, percall = 1503
># taskset -c 2 ./vm_memcpy 1024 1024000
>all cycle = 1541243316, percall = 1505
>
>We build it with:
># gcc -g -O0 memcpy.c -o host_memcpy
># gcc -g -O0 memcpy.c -o vm_memcpy
>
>
>The environment information is as follows:
>[host]
>- kernel version: 4.18.0
>- glibc version: 2.28
>- gcc version: 8.3.1
>- qemu version: 2.12.0
>- libvirtd version: 4.5.0
>
># lscpu
>Architecture:        x86_64
>CPU op-mode(s):      32-bit, 64-bit
>Byte Order:          Little Endian
>CPU(s):              60
>On-line CPU(s) list: 0-59
>Thread(s) per core:  2
>Core(s) per socket:  15
>Socket(s):           8
>NUMA node(s):        8
>Vendor ID:           GenuineIntel
>CPU family:          6
>Model:               62
>Model name:          Intel(R) Xeon(R) CPU E7-8870 v2 @ 2.30GHz
>Stepping:            7
>CPU MHz:             2294.529
>CPU max MHz:         2300.0000
>CPU min MHz:         1200.0000
>BogoMIPS:            4589.07
>Virtualization:      VT-x
>L1d cache:           32K
>L1i cache:           32K
>L2 cache:            256K
>L3 cache:            30720K
>NUMA node0 CPU(s):   0-14,30-44
>NUMA node1 CPU(s):   15-29,45-59
>Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm arat pln pts md_clear flush_l1d
>
>[virtual machine]
>- kernel version: 4.18.0
>- glibc version: 2.28
>- gcc version: 8.3.1
>- qemu version: 2.12.0
>- libvirtd version: 4.5.0
>
># lscpu
>Architecture:        x86_64
>CPU op-mode(s):      32-bit, 64-bit
>Byte Order:          Little Endian
>CPU(s):              4
>On-line CPU(s) list: 0-3
>Thread(s) per core:  1
>Core(s) per socket:  1
>Socket(s):           4
>NUMA node(s):        1
>Vendor ID:           GenuineIntel
>CPU family:          6
>Model:               62
>Model name:          Intel(R) Xeon(R) CPU E7-8870 v2 @ 2.30GHz
>Stepping:            7
>CPU MHz:             2294.468
>BogoMIPS:            4588.93
>Hypervisor vendor:   KVM
>Virtualization type: full
>L1d cache:           32K
>L1i cache:           32K
>L2 cache:            4096K
>L3 cache:            16384K
>NUMA node0 CPU(s):   0-3
>Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust smep erms xsaveopt arat umip md_clear arch_capabilities
>

