From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk1-x730.google.com (mail-qk1-x730.google.com [IPv6:2607:f8b0:4864:20::730]) by sourceware.org (Postfix) with ESMTPS id 6A0083858C3A for ; Tue, 12 Oct 2021 19:44:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6A0083858C3A Received: by mail-qk1-x730.google.com with SMTP id bi9so116627qkb.11 for ; Tue, 12 Oct 2021 12:44:21 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=27weKXIiItbG/JhfcyszFPVDxLZN1Ib1dZM57lInmaA=; b=mfEElOyeff/3LARQQUTw/eBCPrLeWXRYVyOoBl8HiAFhpfIN1hBohU1t/eImT/L4OD 8Nvo2s1ib1Lkv4oWZgn+ubz9NmgWHyLqyyZjhqtLo9ISkFU/Y7wEzewXXzAVQ4nI4Hqp 0dhj2xy0uj/CY3sZT7dEeGdbmVJGYoSHoDuh2YEgKVDQxx2Hgr+TUbSnFGApXOqd9urj 9sl/NDQum8rPumPFMuqffVqr+kIiRx9/pT3J5iQ/hPZ1TRjlUf2IZRGzBG1PjtEaiGJV mMl+/wZrqM4EOWNYdWbRY5ndau5CboYmSG9VEFRvZf2DZQ50lTpkWhxCQmlYpNuAkH9W VrUA== X-Gm-Message-State: AOAM530N6wsnzIkuqx38T6fL0SRJFs/NQT2Iy5UZg6tLq3y2t/rTFqDX WOn1toaQiH2PITV32ObBkbQ= X-Google-Smtp-Source: ABdhPJzKvE9EryCwSYV3rmIMousrpQKZbot9N/5+SNgdQMRgZ761/UsgrMT5D05xqKOfCOL81SBQQg== X-Received: by 2002:a05:620a:31a2:: with SMTP id bi34mr15182402qkb.331.1634067860112; Tue, 12 Oct 2021 12:44:20 -0700 (PDT) Received: from [192.168.0.41] (184-96-250-116.hlrn.qwest.net. [184.96.250.116]) by smtp.gmail.com with ESMTPSA id r23sm6780453qtm.80.2021.10.12.12.44.19 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Oct 2021 12:44:19 -0700 (PDT) Subject: [PATCH v2] detect out-of-bounds stores by atomic functions [PR102453] To: Richard Biener Cc: gcc-patches , Jan Hubicka References: <72c637b5-f6cc-22d6-af62-65166f5ab069@gmail.com> From: Martin Sebor Message-ID: <923ccb56-fd9c-96ea-6f9d-619668fd789b@gmail.com> Date: Tue, 12 Oct 2021 13:44:18 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2021 19:44:23 -0000 On 10/12/21 12:52 AM, Richard Biener wrote: > On Mon, Oct 11, 2021 at 11:25 PM Martin Sebor wrote: >> >> The attached change extends GCC's warnings for out-of-bounds >> stores to cover atomic (and __sync) built-ins. >> >> Rather than hardcoding the properties of these built-ins just >> for the sake of the out-of-bounds detection, on the assumption >> that it might be useful for future optimizations as well, I took >> the approach of extending class attr_fnspec to express their >> special property that they encode the size of the access in their >> name. >> >> I also took the liberty of making attr_fnspec assignable (something >> the rest of my patch relies on), and updating some comments for >> the characters the class uses to encode function properties, based >> on my understanding of their purpose. >> >> Tested on x86_64-linux. > > Hmm, so you place 'A' at an odd place (where the return value is specified), > but you do not actually specify the behavior on the return value. Shoudln't > > + 'A' specifies that the function atomically accesses a constant > + 1 << N bytes where N is indicated by character 3+2i > > maybe read > > 'A' specifies that the function returns the memory pointed to > by argument > one of size 1 << N bytes where N is indicated by > character 3 +2i accessed atomically > > ? I didn't think the return value would be interesting because in general (parallel accesses) it's not related (in an observable way) to the value of the dereferenced operand. Not all the built-ins also return a value (e.g., atomic_store), and whether or not one does return the argument would need to be encoded somehow because it cannot be determined from the return type (__atomic_compare_exchange and __atomic_test_and_set return bool that's not necessarily the value of the operand). Also, since the functions return the operand value either before or after the update, we'd need another letter to describe that. (This alone could be dealt with simply by using 'A' and 'a', but that's not enough for the other cases.) So with all these possibilities I don't think encoding the return value at this point is worthwhile. If/when this enhancement turns out to be used for optimization and we think encoding the return value would be helpful, I'd say let's revisit it then. The accessor APIs should make it a fairly straightforward exercise. > I also wonder if it's necessary to constrain this to 'atomic' accesses > for the purpose of the patch and whether that detail could be omitted to > eventually make more use of it? I pondered the same question but I couldn't think of any other built-ins with similar semantics (read-write-modify, return a result either pre- or post-modification), so I opted for simplicity. I am open to generalizing it if/when there is a function I could test it with, although I'm not sure the current encoding scheme has enough letters and letter positions to describe the effects in their full generality. > > Likewise > > + '0'...'9' specifies the size of value written/read is given either > + by the specified argument, or for atomic functions, by > + 2 ^ N where N is the constant value denoted by the character > > should mention (excluding '0') for the argument position. Sure, I'll update the comment if you think this change is worth pursuing. > > /* length of the fn spec string. */ > - const unsigned len; > + unsigned len; > > why that? The const member is what prevents the struct from being assignable, which is what the rest of the patch depends on. > > + /* Return true of the function is an __atomic or __sync built-in. */ > > you didn't specify that for 'A' ... > > + bool > + atomic_p () const > + { > + return str[0] == 'A'; > + } > > +attr_fnspec > +atomic_builtin_fnspec (tree callee) > +{ > + switch (DECL_FUNCTION_CODE (callee)) > + { > +#define BUILTIN_ACCESS_SIZE_FNSPEC(N, lgsz) \ > + BUILT_IN_ATOMIC_LOAD_ ## N: \ > + return "Ap" "R" lgsz; > > note that doing this for atomics makes those no longer a compiler barrier > for (aliased) loads and stores which means they are no longer a reliable > way to implement locks. That's a reason why I never pushed a > PTA/alias patch I have to open-code this. > > Thus, do we really want to do this? That's my question to you :) If you don't think this attr_fnspec extension would be useful for optimization I'll drop this part of the patch and open-code it separately only for the out-of-bounds diagnostics. I'd started out that way but the fnspec class made the code cleaner and if it could be used elsewhere so much the better. Please let me know. Martin