谈谈assume

王很水

发布于 2024-09-12 16:48:16

1230

发布于 2024-09-12 16:48:16

本期文章是特约供稿，感谢@mwish

我们在之前介绍过 Strict Alias，也介绍过 __restrict__ 对自动向量化的影响。可以看到，在编译阶段，如果手动告诉编译器相关的知识，它也能更好的指导你的代码的生成。实际上，在写一些函数的时候，如果你能保证输入符合需求的话，那么「ub is good」，比如在 [1] 的例子中，我们有如下的函数：

   // Bitmask selecting the k-th bit in a byte
   static constexpr uint8_t kBitmask[] = {1, 2, 4, 8, 16, 32, 64, 128};

   // Gets the i-th bit from a byte. Should only be used with i <= 7.
   static constexpr bool GetBitFromByte1(uint8_t byte, uint8_t i) {
     return byte & GetBitMask(i);
   }

   template <typename T>
   static constexpr uint8_t GetBitMask(T index) {
     // DCHECK(index >= 0 && index <= 7);
     ARROW_COMPILER_ASSUME(index >= 0);
     return static_cast<uint8_t>(1) << index;
   }

   // Gets the i-th bit from a byte. Should only be used with i <= 7.
   static constexpr bool GetBitFromByte2(uint8_t byte, uint8_t i) {
     return byte & GetBitMask(i);
   }

   void SetBit2(uint8_t* bits, int64_t i) { bits[i / 8] |= GetBitmask2(i % 8); }

   void SetBit2NNeg(uint8_t* bits, int64_t i) {
     ASSUME(i >= 0);
     bits[i / 8] |= GetBitmask2(i % 8);
     }

可以看到，GetBitFromByte2 中，GetBitMask 实际上可以告诉编译器「这里 index 只会，只应该是 [0,7] 中的数字」，在 GetBitFromByte1 中，这个操作也是一份 UB。在这种情况下，如果调用的地方没被编译器找出来 index 的范围，直接优化掉的话，可能会有多余的 checking 代码，相关例子见：[2]

   SetBit2(unsigned char*, long):
           mov     rcx, rsi
           lea     rax, [rsi + 7]
           test    rsi, rsi
           cmovns  rax, rsi
           mov     edx, eax
           and     edx, 248
           sub     ecx, edx
           mov     edx, 1
           shl     edx, cl
           sar     rax, 3
           or      byte ptr [rdi + rax], dl
           ret

   SetBit2NNeg(unsigned char*, long):
           mov     ecx, esi
           and     cl, 7
           mov     al, 1
           shl     al, cl
           shr     rsi, 3
           or      byte ptr [rdi + rsi], al
           ret

在这里，下面这个就是个明显的负数检验

           test    rsi, rsi
           cmovns  rax, rsi

有的时候编译器能发现这些，有的时候则像我们写 __restrict__ 一样，需要我们手动的一些帮助。这也是 C++23 引入的 C++ attribute: assume [3] 的作用：Portable Compiler Assumption

Assume 的使用

这里你在 [3] 会看到，assume 的语法是 [[assume(expression)]]。在 C++23 之前的版本里面，也有类似的操作，Arrow [4] 和 Folly[5] 都有库的形式来提供 Compiler 的 Intrinsics. 我们可以简单摘抄一份 Folly 和 Arrow 的实现：

Folly

   FOLLY_ALWAYS_INLINE void compiler_may_unsafely_assume(bool cond) {
     FOLLY_SAFE_DCHECK(cond, "compiler-hint assumption fails at runtime");
   #if defined(__clang__)
     __builtin_assume(cond);
   #elif defined(__GNUC__)
     if (!cond) {
       __builtin_unreachable();
     }
   #elif defined(_MSC_VER)
     __assume(cond);
   #else
     while (!cond)
       ;
   #endif
   }

Arrow:

   #  if defined(__clang__)  // clang-specific
   #    define ARROW_COMPILER_ASSUME(expr) __builtin_assume(expr)
   #  else  // GCC-specific
   #    if __GNUC__ >= 13
   #      define ARROW_COMPILER_ASSUME(expr) __attribute__((assume(expr)))
   #    else
   // GCC does not have a built-in assume intrinsic before GCC 13, so we use an
   // if statement and __builtin_unreachable() to achieve the same effect [2].
   // Unlike clang's __builtin_assume and C++23's [[assume(expr)]], using this
   // on GCC won't warn about side-effects in the expression, so make sure expr
   // is side-effect free when working with GCC versions before 13 (Jan-2024),
   // otherwise clang/MSVC builds will fail in CI.
   #      define ARROW_COMPILER_ASSUME(expr) \
           if (expr) {                       \
           } else {                          \
             __builtin_unreachable();        \
           }
   #    endif  // __GNUC__ >= 13
   #  endif

这里我们注意到，folly 提供的是一个 boolean 参数，Arrow 以宏的形式提供，而 C++23 以 attribute 形式提供。这个在 p1774r4 [6] 也有指出：

• 不选择函数的形式，是因为 compiler assume 里面的东西不会被调用。而 assert 则不一样。所以，这里提供了 attribute 的形式。具体的使用可以参考 [3] 和 Arrow Folly 中的例子当然说到这里，我们就会好奇 Assume / Assert 的关系，以及...如何减少写搓了带来的问题？

assume & assert

我们以一篇博客作为开头，即 "Assertions Are Pessimistic, Assumptions Are Optimistic" [7] 。我们将在这里讨论 assume 的语义学和它和我们的老朋友 assert 的区别。

我们复习一下 assert 的语义 [8]:

• 如果定义了 NDEBUG 宏，那么它什么都不做
• 否则，进行断言，检查 condition 的结果总的来看，这里在 !NDEBUG 的环境下，代码部分还是被执行了，文章认为它是「悲观的」，而 assume 假设一切都不会为 false，有就是 ub，里面的内容不会被执行。这个地方很有趣，某种意义上说，assume 在代码里引入了新的 UB. 我们注意到 [9][10] 的文章，即「Undefined behavior can result in time travel 」，这里的意思是，下面的 assume_or_ub 可以影响到之前的代码。就像春秋蝉一样，这个 assume 能够对 assume 之前的代码施加影响。

   T function(..) {
     // code part 1
     ...
     assume_or_ub(...); // assume_or_ub
   }

那么，我们怎么安全的 assume 呢？我们回到 Folly 的答案:

   FOLLY_ALWAYS_INLINE void compiler_may_unsafely_assume(bool cond) {
     FOLLY_SAFE_DCHECK(cond, "compiler-hint assumption fails at runtime");
     // assume cond 为真
   }

可以看到，这里进行了一轮 DCHECK。下面的 assume 会影响 DCHECK 吗导致它不执行吗，春秋蝉会成功吗？

答案是不会的，这里如果在 !NDEBUG 的情况下，代码会类似：

   if (cond) {
       assertion and break
   }
   // cond 已经被满足
   assume(cond);

实际上，这里可以理解为「!NDEBUG 的情况下，assert 某种程度上（包括 if 也有这个功能）也有 assume 后面不会 failed 的含义，即后续 assume 为 true」。

在 LLVM 中，也有类似的代码，见 [11]

那么 assume 和 assert 能互相推导吗？p2064r0 [12] 中 Herb Sutter 大段论述了他们的区别。简而言之，实践上，我个人感觉用户可以在自己的库里实现 Folly 类似的 Checking，就不要先想着这两个互相推导了。

真的有优化吗

无论如何，assume 仍然是个危险的操作。一般认为要试过了有优化才能加上，我们可以举出一个反例 [13]. 在这个例子中。assume 会阻止优化，当然这是 LLVM 实现的问题。笔者在 Arrow 社区也遇到过有趣的问题[14]. 这里是 Assume 不能够进行预期的优化。

笔者想说的是：

• 编译器比我们想象的聪明，但可能很多地方还是可能遇到问题，当然有问题报修就是了
• Assume 是危险的，如果没有可衡量的优化，那就别开吧！有优化的话就小心前进吧！

References

1. https://github.com/apache/arrow/pull/41690#discussion_r1603723000
2. https://godbolt.org/z/Ez974vE3d
3. https://en.cppreference.com/w/cpp/language/attributes/assume
4. Arrow: Introduce portable compiler assumptions https://github.com/apache/arrow/pull/41021
5. Folly Assume https://github.com/facebook/folly/blob/ce5edfb9b08ead9e78cb46879e7b9499861f7cd2/folly/lang/Assume.h
6. Portable assumptions https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1774r4.pdf
7. "Assertions Are Pessimistic, Assumptions Are Optimistic" https://blog.regehr.org/archives/1096
8. assert https://en.cppreference.com/w/cpp/error/assert
9. 浅谈 C++ Undefined Behavior - Lancern的文章 - 知乎 https://zhuanlan.zhihu.com/p/391088391
10. Undefined behavior can result in time travel (among other things, but time travel is the funkiest) https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633
11. https://github.com/llvm/llvm-project/blob/23a26e7120df474f37f7369e8e06fd90f21a58b5/libcxx/include/__assert#L76
12. p2064r0 Assumptions: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2064r0.pdf
13. llvm.assume blocks optimization: https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609/14
14. https://github.com/llvm/llvm-project/issues/106895

本文参与腾讯云自媒体同步曝光计划，分享自微信公众号。

原始发表：2024-09-11，如有侵权请联系 cloudcommunity@tencent.com 删除

编译器