
下面这段 C++ 代码,数组排序后,执行速率快了近 6 倍。
#include <algorithm>
#include <ctime>
#include <iostream>
int main()
{
// Generate data
const unsigned arraySize = 32768;
int data[arraySize];
for (unsigned c = 0; c < arraySize; ++c)
data = std::rand() % 256;
// !!! With this, the next loop runs faster
std::sort(data, data + arraySize);
// Test
clock_t start = clock();
long long sum = 0;
for (unsigned i = 0; i < 100000; ++i)
{
// Primary loop
for (unsigned c = 0; c < arraySize; ++c)
{
if (data >= 128)
sum += data;
}
}
double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;
std::cout << elapsedTime << std::endl;
std::cout << "sum = " << sum << std::endl;
}std::sort(data, data + arraySize)的话,时间大概为 11.54 秒。一开始我认为可能是语言或者编译器搞的鬼,所以又用 Java 试了下。
import java.util.Arrays;
import java.util.Random;
public class Main
{
public static void main(String[] args)
{
// Generate data
int arraySize = 32768;
int data[] = new int[arraySize];
Random rnd = new Random(0);
for (int c = 0; c < arraySize; ++c)
data = rnd.nextInt() % 256;
// !!! With this, the next loop runs faster
Arrays.sort(data);
// Test
long start = System.nanoTime();
long sum = 0;
for (int i = 0; i < 100000; ++i)
{
// Primary loop
for (int c = 0; c < arraySize; ++c)
{
if (data >= 128)
sum += data;
}
}
System.out.println((System.nanoTime() - start) / 1000000000.0);
System.out.println("sum = " + sum);
}
}但结果也差不多。
按道理说,也不应该是缓存造成的。仔细看一下这些代码,做的无非就是判断,加法这些很平常的运算。到底是什么导致了这样的差异呢?
其实这是由分支预测(Branch Prediction)造成的。
分支预测的专业解释可以参考下维基上的 分支预测器。我这里简单解释下,就是让 CPU 找到一个规律,可以猜到下一条要执行的是哪一条指令,然后直接跳过去,这样速度就变快了。
就以上面的代码为例,如果已经排过序了,那么就会出现下面的情况,在if (data >= 128)上分支预测器很容易处理,
T = branch taken
N = branch not taken
data[] = 0, 1, 2, 3, 4, ... 126, 127, 128, 129, 130, ... 250, 251, 252, ...
branch = N N N N N ... N N T T T ... T T T ...
= NNNNNNNNNNNN ... NNNNNNNTTTTTTTTT ... TTTTTTTTTT (easy to predict)但是如果数据是无序的,分支预测器就没啥用了,
data[] = 226, 185, 125, 158, 198, 144, 217, 79, 202, 118, 14, 150, 177, 182, 133, ...
branch = T, T, N, T, T, T, T, N, T, N, N, T, T, T, N ...
= TTNTTTTNTNNTTTN ... (completely random - hard to predict)如果你想进一步证实到底是不是分支预测影响的,你可以这么做:
替换:
if (data >= 128)
sum += data;为:
int t = (data - 128) >> 31;
sum += ~t & data;这样就没分支预测了(两个语句做的事情其实是等同的,就是用位运算来替换 if 语句而已)。
测试环境:Core i7 920 @ 3.5 GHz
C++ – Visual Studio 2010 – x64 Release
// Branch - Random
seconds = 11.777
// Branch - Sorted
seconds = 2.352
// Branchless - Random
seconds = 2.564
// Branchless - Sorted
seconds = 2.587Java – NetBeans 7.1.1 JDK 7 – x64
// Branch - Random
seconds = 10.93293813
// Branch - Sorted
seconds = 5.643797077
// Branchless - Random
seconds = 3.113581453
// Branchless - Sorted
seconds = 3.186068823所以基本上可以得出结论: