From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 7640 invoked by alias); 25 Jan 2013 13:38:33 -0000
Received: (qmail 7435 invoked by uid 22791); 25 Jan 2013 13:38:30 -0000
X-SWARE-Spam-Status: No, hits=-4.3 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,KHOP_RCVD_TRUST,RCVD_IN_DNSWL_LOW,RCVD_IN_HOSTKARMA_YE
X-Spam-Check-By: sourceware.org
Received: from mail-ob0-f173.google.com (HELO mail-ob0-f173.google.com) (209.85.214.173) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 25 Jan 2013 13:37:56 +0000
Received: by mail-ob0-f173.google.com with SMTP id dn14so364987obc.32 for ; Fri, 25 Jan 2013 05:37:56 -0800 (PST)
MIME-Version: 1.0
X-Received: by 10.60.30.231 with SMTP id v7mr4583871oeh.22.1359121076177; Fri, 25 Jan 2013 05:37:56 -0800 (PST)
Received: by 10.182.49.68 with HTTP; Fri, 25 Jan 2013 05:37:56 -0800 (PST)
Date: Fri, 25 Jan 2013 13:38:00 -0000
Message-ID:
Subject: Re: [PATCH] Add faster HTM fastpath for libitm TSX v2
From: Uros Bizjak
To: gcc-patches@gcc.gnu.org
Cc: Andi Kleen, Torvald Riegel, Richard Henderson
Content-Type: multipart/mixed; boundary=e89a8ff1c2c2f3cb8004d41d0841
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2013-01/txt/msg01235.txt.bz2

--e89a8ff1c2c2f3cb8004d41d0841
Content-Type: text/plain; charset=ISO-8859-1
Content-length: 1492

Hello!

> The libitm TSX hardware transaction fast path currently does quite a bit of
> unnecessary work (saving registers etc.) before even trying to start
> a hardware transaction. This patch moves the initial attempt at a
> transaction early into the assembler stub. Complicated work like retries
> is still done in C. So this is essentially a fast path for the fast
> path.
>
> The assembler code doesn't understand the layout of "serial_lock", but
> it needs to check that serial_lock is free.
> We export just the lock variable as a separate pointer for this.

The attached (RFC) patch may be useful in this case. It allows
specifying the label for xbegin, so it is possible to write code like
the following (nonsensical) example:

--cut here--
extern int lock1;
extern int lock2;

int
test ()
{
  register unsigned int res asm ("eax");

  __builtin_ia32_xbegin_label (&&__txn_abort);

  lock1 = 0;

__txn_abort:
  if (res & 0x10)
    lock2 = 1;

  return 0;
}
--cut here--

gcc -O2 -mrtm:

test:
        xbegin  .L2
        movl    $0, lock1(%rip)
.L2:
        testb   $16, %al
        je      .L3
        movl    $1, lock2(%rip)
.L3:
        xorl    %eax, %eax
        ret

Please note that the edge from __builtin_ia32_xbegin_label is not
known to the tree optimizers, so someone familiar with this part of the
compiler should enhance/fix the patch.

With the (fixed) patch, you can implement your assembly using plain
C++, probably also using ifunc relocations.

Uros.

--e89a8ff1c2c2f3cb8004d41d0841
Content-Type: text/plain; charset=US-ASCII; name="p.diff.txt"
Content-Disposition: attachment; filename="p.diff.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_hcddkwvp0
Content-length: 2725

SW5kZXg6IGkzODYubWQKPT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PQotLS0gaTM4 Ni5tZAkocmV2aXNpb24gMTk1NDYwKQorKysgaTM4Ni5tZAkod29ya2luZyBj b3B5KQpAQCAtMTgwOTMsNiArMTgwOTMsMTcgQEAKICAgRE9ORTsKIH0pCiAK KyhkZWZpbmVfZXhwYW5kICJ4YmVnaW5fbGFiZWwiCisgIFsocGFyYWxsZWwK KyAgICBbKHNldCAocGMpCisJICAoaWZfdGhlbl9lbHNlIChuZSAodW5zcGVj IFsoY29uc3RfaW50IDApXSBVTlNQRUNfWEJFR0lOX0FCT1JUKQorCQkJICAg IChjb25zdF9pbnQgMCkpCisJCQkobWF0Y2hfb3BlcmFuZCAxKQorCQkJKHBj KSkpCisgICAgIChzZXQgKG1hdGNoX29wZXJhbmQ6U0kgMCAicmVnaXN0ZXJf b3BlcmFuZCIpCisJICAodW5zcGVjX3ZvbGF0aWxlOlNJIFsobWF0Y2hfZHVw IDApXSBVTlNQRUNWX1hCRUdJTikpXSldCisgICJUQVJHRVRfUlRNIikKKwog KGRlZmluZV9pbnNuICJ4YmVnaW5fMSIKICAgWyhzZXQgKHBjKQogCShpZl90 aGVuX2Vsc2UgKG5lICh1bnNwZWMgWyhjb25zdF9pbnQgMCldIFVOU1BFQ19Y QkVHSU5fQUJPUlQpCkluZGV4OiBpMzg2LmMKPT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09 PT09PQotLS0gaTM4Ni5jCShyZXZpc2lvbiAxOTU0NjApCisrKyBpMzg2LmMJ KHdvcmtpbmcgY29weSkKQEAgLTI2NTcwLDYgKzI2NTcwLDcgQEAgZW51bSBp eDg2X2J1aWx0aW5zCiAKICAgLyogUlRNICovCiAgIElYODZfQlVJTFRJTl9Y QkVHSU4sCisgIElYODZfQlVJTFRJTl9YQkVHSU5fTEFCRUwsCiAgIElYODZf QlVJTFRJTl9YRU5ELAogICBJWDg2X0JVSUxUSU5fWEFCT1JULAogICBJWDg2 X0JVSUxUSU5fWFRFU1QsCkBAIC0yNjk0Miw2ICsyNjk0Myw3IEBAIHN0YXRp YyBjb25zdCBzdHJ1Y3QgYnVpbHRpbl9kZXNjcmlwdGlvbiBiZGVzY19zcGVj CiAKICAgLyogUlRNICovCiAgIHsgT1BUSU9OX01BU0tfSVNBX1JUTSwgQ09E RV9GT1JfeGJlZ2luLCAiX19idWlsdGluX2lhMzJfeGJlZ2luIiwgSVg4Nl9C VUlMVElOX1hCRUdJTiwgVU5LTk9XTiwgKGludCkgVU5TSUdORURfRlRZUEVf Vk9JRCB9LAorICB7IE9QVElPTl9NQVNLX0lTQV9SVE0sIENPREVfRk9SX3hi ZWdpbl9sYWJlbCwgIl9fYnVpbHRpbl9pYTMyX3hiZWdpbl9sYWJlbCIsIElY ODZfQlVJTFRJTl9YQkVHSU5fTEFCRUwsIFVOS05PV04sIChpbnQpIFZPSURf RlRZUEVfUENWT0lEIH0sCiAgIHsgT1BUSU9OX01BU0tfSVNBX1JUTSwgQ09E RV9GT1JfeGVuZCwgIl9fYnVpbHRpbl9pYTMyX3hlbmQiLCBJWDg2X0JVSUxU SU5fWEVORCwgVU5LTk9XTiwgKGludCkgVk9JRF9GVFlQRV9WT0lEIH0sCiAg IHsgT1BUSU9OX01BU0tfSVNBX1JUTSwgQ09ERV9GT1JfeHRlc3QsICJfX2J1 aWx0aW5faWEzMl94dGVzdCIsIElYODZfQlVJTFRJTl9YVEVTVCwgVU5LTk9X TiwgKGludCkgSU5UX0ZUWVBFX1ZPSUQgfSwKIH07CkBAIC0zMjIzOSw2ICsz MjI0MSwxNyBAQCBhZGRjYXJyeXg6CiAKICAgICAgIHJldHVybiB0YXJnZXQ7 CiAKKyAgICBjYXNlIElYODZfQlVJTFRJTl9YQkVHSU5fTEFCRUw6CisgICAg ICBhcmcwID0gQ0FMTF9FWFBSX0FSRyAoZXhwLCAwKTsKKyAgICAgIG9wMCA9 IGV4cGFuZF9ub3JtYWwgKGFyZzApOworICAgICAgaWYgKEdFVF9DT0RFIChv cDApICE9IExBQkVMX1JFRikKKwl7CisJICBlcnJvciAoInRoZSB4YmVnaW4n cyBhcmd1bWVudCBtdXN0IGJlIGEgbGFiZWwiKTsKKwkgIHJldHVybiBjb25z dDBfcnR4OworCX0KKyAgICAgIGVtaXRfanVtcF9pbnNuIChnZW5feGJlZ2lu X2xhYmVsIChnZW5fcnR4X1JFRyAoU0ltb2RlLCBBWF9SRUcpLCBvcDApKTsK KyAgICAgIHJldHVybiAwOworCiAgICAgY2FzZSBJWDg2X0JVSUxUSU5fWEFC T1JUOgogICAgICAgaWNvZGUgPSBDT0RFX0ZPUl94YWJvcnQ7CiAgICAgICBh cmcwID0gQ0FMTF9FWFBSX0FSRyAoZXhwLCAwKTsK --e89a8ff1c2c2f3cb8004d41d0841--
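
The example in the message tests `res & 0x10` on the abort path, where
`res` is the abort status that xbegin leaves in %eax. As a reading aid,
here is a minimal sketch of how that status word can be decoded in plain
C. The bit values follow the _XABORT_* macros in GCC's <immintrin.h>;
the TXN_* names and the helper functions are hypothetical and are not
part of the patch or of libitm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Abort-status bits found in EAX after an RTM transaction aborts.
   The values mirror the _XABORT_* macros in GCC's <immintrin.h>; they
   are redefined here under hypothetical TXN_* names so that the sketch
   builds without -mrtm and on non-x86 hosts.  */
enum
{
  TXN_ABORT_EXPLICIT = 1 << 0,  /* aborted by an XABORT instruction */
  TXN_ABORT_RETRY    = 1 << 1,  /* a retry might succeed */
  TXN_ABORT_CONFLICT = 1 << 2,  /* memory conflict with another thread */
  TXN_ABORT_CAPACITY = 1 << 3,  /* internal buffer overflowed */
  TXN_ABORT_DEBUG    = 1 << 4,  /* debug breakpoint hit (the 0x10 above) */
  TXN_ABORT_NESTED   = 1 << 5   /* abort happened in a nested transaction */
};

/* 8-bit immediate passed to XABORT, stored in bits 31:24 of the status.
   Only meaningful when TXN_ABORT_EXPLICIT is set.  */
static inline unsigned
txn_abort_code (uint32_t status)
{
  return (status >> 24) & 0xffu;
}

/* True if the hardware hints that retrying the transaction may succeed.  */
static inline bool
txn_may_retry (uint32_t status)
{
  return (status & TXN_ABORT_RETRY) != 0;
}

/* The condition tested by `res & 0x10` in the example from the message.  */
static inline bool
txn_hit_debug_trap (uint32_t status)
{
  return (status & TXN_ABORT_DEBUG) != 0;
}
```

With helpers like these, the abort path of the example could read
`if (txn_hit_debug_trap (res)) lock2 = 1;` instead of using the raw
constant.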