如何在 Solidity 中对数组进行去重

天地一小儒

发布于 2024-08-29 08:05:19

1040

发布于 2024-08-29 08:05:19

文章被收录于专栏：信且诚心之动

一、引言

Solidity 是一种面向以太坊平台的智能合约编程语言，具有类似 JavaScript 和 C++ 的语法结构。它是专门为在区块链上编写自执行合约而设计的，支持复杂的业务逻辑和去中心化应用（dApps）的开发。随着区块链技术的快速发展，Solidity 已成为构建去中心化金融（DeFi）、NFT 市场以及其他区块链应用的首选语言（其实主要是以太坊用户多罢了）。

在区块链开发中，处理数据的效率至关重要，特别是在智能合约中，数组的高效操作往往决定了合约的性能和 gas 成本。由于以太坊网络上的每一笔交易都会产生费用，减少不必要的计算和存储操作变得尤为关键。对数组进行去重就是这样一种常见的数据操作需求：我们可能需要从一个用户列表中移除重复地址，或从一个交易列表中提取唯一的交易 ID。这些操作不仅涉及数据的正确性，还直接影响到合约的执行成本。

那么，在 Solidity 中，如何高效地对数组进行去重？这是一个值得深入探讨的话题。本文将介绍几种常见的去重方法，并分析它们的优缺点，帮助你在实际开发中选择最合适的策略。

二、Solidity 中的数组操作基础

在 Solidity 中，数组是最常用的数据结构之一，允许开发者存储和操作一系列相同类型的元素。根据数组的长度是否固定，Solidity 中的数组可以分为静态数组和动态数组。

2.1 Solidity 中数组的基本使用方法

在 Solidity 中，定义和使用数组的方法非常直观。数组类型由元素类型和方括号组成，例如 uint256[] 表示一个动态数组，uint256[5] 表示一个包含 5 个 uint256 元素的静态数组。

以下是一些基本的数组操作示例：

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract ArrayExample {
    // 动态数组
    uint256[] public dynamicArray;

    // 静态数组
    uint256[5] public staticArray;

    // 添加元素到动态数组
    function addElement(uint256 element) public {
        dynamicArray.push(element);
    }

    // 修改静态数组中的元素
    function setElement(uint256 index, uint256 element) public {
        require(index < staticArray.length, "Index out of bounds");
        staticArray[index] = element;
    }

    // 获取动态数组的长度
    function getDynamicArrayLength() public view returns (uint256) {
        return dynamicArray.length;
    }
}

2.2 动态数组与静态数组的区别

静态数组：静态数组的长度在编译时已确定，并且在合约生命周期中无法更改。使用静态数组的优点是它们的存储和操作成本相对较低，因为它们不需要动态调整大小。静态数组常用于合约中需要处理固定数量数据的场景，例如固定数量的参与者或预定义的常量值。

// 定义一个包含 3 个元素的静态数组
uint256[3] public staticArray = [1, 2, 3];

动态数组：动态数组的长度在合约的生命周期内是可变的，开发者可以使用 push 方法向数组中添加元素。虽然动态数组提供了灵活性，但它们也带来了更高的 gas 成本，尤其是在添加和删除元素时。动态数组适用于需要处理可变数量数据的场景，例如用户地址列表或交易记录等。

// 定义一个动态数组
uint256[] public dynamicArray;

// 向动态数组中添加元素
dynamicArray.push(4);

2.3 数组操作的 gas 成本及其在智能合约中的重要性

在智能合约中，每次数组操作都会消耗一定的 gas，这是因为操作涉及对以太坊虚拟机（EVM）中存储的读取和写入。尤其是在以太坊主网上，gas 成本直接影响到交易费用，因此对数组的操作效率显得尤为重要。

读操作：在数组中读取数据的 gas 成本相对较低，通常只需要访问存储器。
写操作：写操作的 gas 成本较高，因为它涉及到将数据写入以太坊的持久化存储。
动态调整大小：对于动态数组，每次 push 操作不仅需要写入新元素，还可能涉及数组大小调整的操作，这会增加额外的 gas 成本。

优化数组操作是 Solidity 开发中的一个关键点。为了减少不必要的 gas 消耗，开发者通常会在合约逻辑中慎重考虑数组的使用方式和操作方法。例如，尽量避免在循环中进行多次写操作，或者在不必要的情况下使用动态数组。

总之，理解数组的基本操作及其 gas 成本，可以帮助开发者编写更高效的智能合约，避免不必要的资源浪费，提升合约的整体性能。

三、Solidity 中的去重挑战

在智能合约开发中，Solidity 的局限性往往会影响开发者实现特定功能的方式。一个显著的限制是，Solidity 不直接支持像 JavaScript 中的 Set 这样的动态数据结构。这使得在 Solidity 中处理集合操作（如去重）变得更加复杂和昂贵。

由于 Solidity 的局限性和区块链环境的特性，开发者必须在实现去重的同时，尽可能减少存储和计算操作，以节省 gas 和降低存储成本。这需要精心设计合约逻辑，选择最合适的数据结构和算法来优化性能和成本。在智能合约开发中，这种权衡和优化是不可避免的，也是开发者需要不断学习和改进的关键点。

3.1 Solidity 的局限性

缺乏高级数据结构：Solidity 目前只支持基本的数据结构，如数组和映射（mapping）。这些数据结构虽然足以满足许多简单需求，但在处理更复杂的数据操作时，如自动去重或排序，它们显得力不从心。与 JavaScript 不同，Solidity 没有原生的 Set 类型，这意味着没有直接的方式来存储唯一值。
存储操作成本高：Solidity 中的任何写入操作，特别是涉及到永久存储（storage）时，都会消耗大量的 gas。存储的数据越多，操作越复杂，消耗的 gas 就越高。因此，构建一个复杂的数据结构或进行多次数据写入操作，会显著增加合约的部署和执行成本。
没有原生的集合操作：Solidity 缺乏对集合操作的原生支持。像在 JavaScript 中使用 Set 的 add 方法自动去重，或使用 has 方法快速查找元素，这些在 Solidity 中都需要手动实现。通常，这需要编写额外的逻辑和循环，进一步增加了合约的复杂性和执行成本。

3.2 在 Solidity 中实现去重的难度

在 Solidity 中去重的主要难点在于如何在保证数据唯一性的同时控制 gas 成本。以下是实现去重的一些挑战：

高昂的 gas 成本：为了实现去重，开发者需要遍历数组中的所有元素，并且通常需要在遍历过程中检查每个元素是否已经存在。使用映射（mapping）或集合（set）来跟踪已经见过的元素是一种常见的解决方案，但每次访问和修改映射都会消耗 gas，尤其是在数据量较大时，成本可能会非常高。
存储空间的浪费：为实现去重，我们需要额外的数据结构来跟踪已经处理的元素。例如，使用映射来记录一个元素是否已出现过，虽然这种方式可以使查找操作的时间复杂度为 O(1)，但是映射本身需要额外的存储空间，这会增加合约的总体存储成本。更糟的是，存储在区块链上的数据是永久存在的，这意味着这些额外的存储消耗将会是长期的。
Gas Limit 约束：以太坊网络对单个交易执行的 gas 数量有上限（即 gas limit）。如果一个合约函数在执行时消耗的 gas 超过了这个限制，交易将被回滚，合约不会执行成功。去重操作的复杂性可能导致 gas 消耗迅速增加，特别是在处理大型数组或在复杂逻辑中嵌套多次去重操作时。

四、方法一：使用集合（或映射）进行去重

下面是一个使用 openzepplin 的 EnumerableSet 库来快速去重空投地址的智能合约示例：

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// version >= v3.3.0
import {EnumerableSet} from "@openzeppelin/contracts/utils/structs/EnumerableSet.sol";

contract AirdropUniqueAddresses {
    // Using the EnumerableSet library for AddressSet
    using EnumerableSet for EnumerableSet.AddressSet;

    // Declare a state variable of type EnumerableSet.AddressSet to store unique addresses
    EnumerableSet.AddressSet private seen;

    // Function to add an address to the set if it is not already present
    function addAirdropAddress(address _address) public returns (bool) {
        // Add the address to the set
        // Returns true if the address was not already in the set, false otherwise
        return seen.add(_address);
    }

    // Function to check if an address is already in the set
    function isAirdropAddressSeen(address _address) public view returns (bool) {
        // Check if the address is in the set
        return seen.contains(_address);
    }

    // Function to get the total number of unique addresses in the set
    function getTotalUniqueAddresses() public view returns (uint256) {
        // Return the number of unique addresses in the set
        return seen.length();
    }

    // Function to retrieve an address by index in the set
    function getAirdropAddressAtIndex(uint256 index) public view returns (address) {
        // Ensure the index is within the bounds of the set
        require(index < seen.length(), "Index out of bounds");
        // Retrieve the address at the specified index
        return seen.at(index);
    }

    // Function to remove an address from the set
    function removeAirdropAddress(address _address) public returns (bool) {
        // Remove the address from the set
        // Returns true if the address was in the set and removed, false otherwise
        return seen.remove(_address);
    }
}

优点：
- 高效性：映射在 Solidity 中提供了 O(1) 的查找时间。
- 易于实现：逻辑简单，易于理解。
缺点：
- 需要额外的存储空间，可能会增加 gas 成本。
- 不能动态创建映射，需要预先定义数据结构：类似这种代码在编译中会报错Uninitialized mapping. Mappings cannot be created dynamically, you have to assign them from a state variable.

mapping(uint256 => bool) memory seen;
uint256[] memory input = [1, 2, 2, 3, 4, 4];
for (uint256 i = 0; i < input.length; i++) {
    if (!seen[input[i]]) {
        seen[input[i]] = true;
    }
}

五、方法二：双重循环去重

以下是一个使用双重循环来去重空投地址的智能合约示例：

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

contract AirdropUniqueAddresses {
    // Function to get the unique addresses after removing duplicates
    function getUniqueAirdropAddresses(address[] airdropAddresses;) public view returns (address[] memory) {
        // Calculate the length of the airdropAddresses array
        uint256 length = airdropAddresses.length;

        // Create a dynamic array to store unique addresses
        address[] memory uniqueAddresses = new address[](length);
        uint256 uniqueCount = 0;

        // Outer loop to iterate through each address in the airdropAddresses array
        for (uint256 i = 0; i < length; i++) {
            bool isDuplicate = false;

            // Inner loop to check if the address is already in the uniqueAddresses array
            for (uint256 j = 0; j < uniqueCount; j++) {
                if (airdropAddresses[i] == uniqueAddresses[j]) {
                    isDuplicate = true;
                    break;
                }
            }

            // If the address is not a duplicate, add it to the uniqueAddresses array
            if (!isDuplicate) {
                uniqueAddresses[uniqueCount] = airdropAddresses[i];
                uniqueCount++;
            }
        }

        // Create a new array with the size of uniqueCount to store the final unique addresses
        address[] memory finalUniqueAddresses = new address[](uniqueCount);

        // Copy the unique addresses to the final array
        for (uint256 k = 0; k < uniqueCount; k++) {
            finalUniqueAddresses[k] = uniqueAddresses[k];
        }

        return finalUniqueAddresses;
    }
}