From: Noah Goldstein
Date: Thu, 20 Jan 2022 16:56:59 -0600
Subject: Add new ABIs '__strcmpeq', '__strncmpeq', '__wcscmpeq' and '__wcsncmpeq' to libc
To: libc-coord@lists.openwall.com
Cc: GNU C Library, Richard Biener via Gcc

Hi All,

This is a proposal for four new interfaces to be supported by libc.

It is essentially the same proposal as '__memcmpeq()':

    https://sourceware.org/pipermail/libc-alpha/2021-September/131099.html

applied to the character string and wide-character string comparison
functions.


#### Interfaces ####

    int __strcmpeq(const void * s1, const void * s2)
    int __strncmpeq(const void * s1, const void * s2, size_t n)
    int __wcscmpeq(const wchar_t * ws1, const wchar_t * ws2)
    int __wcsncmpeq(const wchar_t * ws1, const wchar_t * ws2, size_t n)


#### Descriptions ####

- '__strcmpeq()'
    - The '__strcmpeq()' function shall compare the string pointed to by
      's1' to the string pointed to by 's2'. If the two strings are
      equal, the return value will be zero. Otherwise the return value
      will be some non-zero value. 'strcmp()' is a valid implementation
      of '__strcmpeq()'.

- '__strncmpeq()'
    - The '__strncmpeq()' function shall compare not more than 'n' bytes
      (bytes that follow a null byte are not compared) from the array
      pointed to by 's1' to the array pointed to by 's2'. If the two
      strings are equal, or are equal in their first 'n' characters, the
      return value will be zero. Otherwise the return value will be some
      non-zero value. 'strncmp()' is a valid implementation of
      '__strncmpeq()'.

- '__wcscmpeq()'
    - The '__wcscmpeq()' function shall compare the wide-character
      string pointed to by 'ws1' to the wide-character string pointed to
      by 'ws2'. If the two wide-character strings are equal, the return
      value will be zero. Otherwise the return value will be some
      non-zero value. 'wcscmp()' is a valid implementation of
      '__wcscmpeq()'.

- '__wcsncmpeq()'
    - The '__wcsncmpeq()' function shall compare not more than 'n'
      wide-character codes (wide-character codes that follow a null
      wide-character code are not compared) from the array pointed to by
      'ws1' to the array pointed to by 'ws2'. If the two wide-character
      strings are equal, or are equal in their first 'n' wide
      characters, the return value will be zero. Otherwise the return
      value will be some non-zero value. 'wcsncmp()' is a valid
      implementation of '__wcsncmpeq()'.


#### Use Case ####

The goal is for compilers to be able to use the new interfaces as an
optimization whenever a program uses the return value of the non "eq"
variant only as a boolean.

For example:

    #include <stdio.h>
    #include <string.h>
    #include <wchar.h>

    void foo (const char *s1, const char *s2,
              const wchar_t *ws1, const wchar_t *ws2, size_t n)
    {
        /* Each result below is only tested against zero.  */
        if (!strcmp (s1, s2))
            printf ("strcmp can be optimized to __strcmpeq in this use case\n");

        if (strncmp (s1, s2, n))
            printf ("strncmp can be optimized to __strncmpeq in this use case\n");

        if (wcscmp (ws1, ws2))
            printf ("wcscmp can be optimized to __wcscmpeq in this use case\n");

        if (!wcsncmp (ws1, ws2, n))
            printf ("wcsncmp can be optimized to __wcsncmpeq in this use case\n");
    }


#### Argument Specifications ####

- '__strcmpeq()' has the exact same argument specifications as 'strcmp()'
    - 's1' is a null-terminated character string.
    - 's2' is a null-terminated character string.

- '__strncmpeq()' has the exact same argument specifications as 'strncmp()'
    - 's1' is a character sequence terminated either by a null
      character or after 'n' characters.
    - 's2' is a character sequence terminated either by a null
      character or after 'n' characters.
    - 'n' is the maximum number of characters to compare.

- '__wcscmpeq()' has the exact same argument specifications as 'wcscmp()'
    - 'ws1' is a null-terminated wide-character string.
    - 'ws2' is a null-terminated wide-character string.

- '__wcsncmpeq()' has the exact same argument specifications as 'wcsncmp()'
    - 'ws1' is a wide-character sequence terminated either by a null
      wide character or after 'n' wide characters.
    - 'ws2' is a wide-character sequence terminated either by a null
      wide character or after 'n' wide characters.
    - 'n' is the maximum number of wide characters to compare.

For each of these functions, if any of the input constraints are not
met, the result is undefined.


#### Return Value Specification ####

- '__strcmpeq()'
    - If 's1' and 's2' are equal, the return value is zero. Otherwise
      the return value is any non-zero value. 'strcmp()', '!!strcmp()',
      or '-strcmp()' are all valid implementations of '__strcmpeq()'.

- '__strncmpeq()'
    - If 's1' and 's2' are equal up to the first 'n' characters or up to
      and including the first null character, the return value is zero.
      Otherwise the return value is any non-zero value. 'strncmp()',
      '!!strncmp()', or '-strncmp()' are all valid implementations of
      '__strncmpeq()'.

- '__wcscmpeq()'
    - If 'ws1' and 'ws2' are equal, the return value is zero. Otherwise
      the return value is any non-zero value. 'wcscmp()', '!!wcscmp()',
      or '-wcscmp()' are all valid implementations of '__wcscmpeq()'.

- '__wcsncmpeq()'
    - If 'ws1' and 'ws2' are equal up to the first 'n' wide characters
      or up to and including the first null wide character, the return
      value is zero. Otherwise the return value is any non-zero value.
      'wcsncmp()', '!!wcsncmp()', or '-wcsncmp()' are all valid
      implementations of '__wcsncmpeq()'.


#### Notes ####

These interfaces are intentionally designed so that the non "eq" variant
of each function is a valid implementation of the corresponding "eq"
variant.


#### ABI vs API ####

This proposal is for '__strcmpeq()', '__strncmpeq()', '__wcscmpeq()',
and '__wcsncmpeq()' as new ABIs. As ABIs, the interfaces have value as
an optimization that compilers can apply to the idiomatic boolean use of
the return value of the existing comparison functions.

Best,
Noah
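
P.S. To make the return-value rules above concrete, here is a minimal
sketch in C. It is only an illustration of the specified semantics, not a
proposed implementation. The 'sketch_' names are hypothetical (the
'__'-prefixed names are reserved for libc), and the 'const char *'
parameter types are an assumption chosen to match 'strcmp()'. What it
shows: since only zero versus non-zero matters, each function can return
at the first mismatch without working out which input orders first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for '__strcmpeq()': 0 if equal, non-zero otherwise.  */
static int sketch_strcmpeq(const char *s1, const char *s2)
{
    for (size_t i = 0;; i++) {
        if (s1[i] != s2[i])
            return 1;           /* any non-zero value is allowed */
        if (s1[i] == '\0')
            return 0;           /* both strings ended together */
    }
}

/* Hypothetical stand-in for '__strncmpeq()': compares at most n characters.  */
static int sketch_strncmpeq(const char *s1, const char *s2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s1[i] != s2[i])
            return 1;
        if (s1[i] == '\0')
            return 0;           /* characters after a null are not compared */
    }
    return 0;                   /* first n characters were equal */
}

/* Hypothetical stand-in for '__wcscmpeq()'.  */
static int sketch_wcscmpeq(const wchar_t *ws1, const wchar_t *ws2)
{
    for (size_t i = 0;; i++) {
        if (ws1[i] != ws2[i])
            return 1;
        if (ws1[i] == L'\0')
            return 0;
    }
}

/* Hypothetical stand-in for '__wcsncmpeq()'.  */
static int sketch_wcsncmpeq(const wchar_t *ws1, const wchar_t *ws2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ws1[i] != ws2[i])
            return 1;
        if (ws1[i] == L'\0')
            return 0;
    }
    return 0;
}

/* Checks that the sketches agree with the standard functions when the
   result is used only as a boolean.  */
static void sketch_selfcheck(void)
{
    assert(sketch_strcmpeq("abc", "abc") == 0);
    assert(sketch_strcmpeq("abc", "abd") != 0);
    assert(!sketch_strcmpeq("a", "b") == !strcmp("a", "b"));

    assert(sketch_strncmpeq("abcX", "abcY", 3) == 0);
    assert(sketch_strncmpeq("abcX", "abcY", 4) != 0);
    assert(sketch_strncmpeq("ab\0X", "ab\0Y", 4) == 0);

    assert(sketch_wcscmpeq(L"abc", L"abc") == 0);
    assert(sketch_wcscmpeq(L"abc", L"ab") != 0);
    assert(!sketch_wcscmpeq(L"a", L"b") == !wcscmp(L"a", L"b"));

    assert(sketch_wcsncmpeq(L"abcX", L"abcY", 3) == 0);
    assert(sketch_wcsncmpeq(L"abcX", L"abcY", 4) != 0);
}
```

Calling 'sketch_selfcheck()' from a test program exercises all four
functions. A real libc implementation would presumably be vectorized per
architecture; the loops here only pin down the zero / non-zero contract.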