Python-VBA Function Journey - the set Function

码农世界 · 2024-05-31 · Backend

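I. Common use cases of the set function

II. Notes on using the set function

III. How to make good use of the set function

1. The set function

1-1. Python

1-2. VBA

2. Recommended reading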

Personal homepage: https://blog.csdn.net/ygb_1024?spm=1010.2135.3001.5421

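I. Common use cases of the set function:

In Python, the set() function creates an unordered collection that contains no duplicate elements. Because of these properties, set() is useful in many practical scenarios; common ones include: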

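1. Deduplication: when you have a list or other iterable and want to remove its duplicate elements quickly, you can use set().

2. Fast lookup: because set is implemented on top of a hash table, membership tests (the `in` operator) take O(1) time on average, which is very useful when you need to check quickly whether an element is present.

3. Set operations: sets support union (|), intersection (&), difference (-), and symmetric difference (^), which are useful in data analysis, algorithm implementation, and similar fields.

4. Filtering and selection: combined with list comprehensions, sets can be used to filter and select data.

5. Deduplicating dictionary keys: because set elements are unique, a set is often used to deduplicate a list of candidate keys before a dictionary is built from them.

6. Performance optimization: when you need to look up elements or test whether they are present, a set is usually faster than a list, because its hash-table implementation makes lookups and insertions fast, while a list has to be scanned element by element.

7. Graph and network algorithms: sets are often used to represent collections of nodes or edges and to perform operations such as finding connected nodes and computing common neighbors.

8. Data cleaning and preprocessing: during cleaning and preprocessing, sets can quickly identify and remove duplicate data items.

9. Caching and state management: when you need to track the state or attributes of an object or a group of objects, a set makes it easy to store that state and to check or update it when needed.

10. Computing the power set: the power set of a set is the set of all its possible subsets (including the empty set and the original set itself).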
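Use case 7 mentions finding connected nodes and common neighbors. The sketch below shows how a set typically serves as the "visited" record in a breadth-first search, and how set intersection yields common neighbors. The graph, the node labels, and the function names reachable_from and common_neighbours are hypothetical illustrations, not code from the original article; the graph is assumed to map each node to a set of its neighbors.

```python
from collections import deque

def reachable_from(graph, start):
    """Return the set of nodes reachable from start (breadth-first search).

    Assumes graph maps each node to a set of neighbour nodes.
    """
    visited = {start}          # set membership tests are O(1) on average
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, set()):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return visited

def common_neighbours(graph, a, b):
    """Neighbours shared by a and b, computed with set intersection."""
    return graph[a] & graph[b]

# Hypothetical graph with two separate components: A-B-C-D and X-Y
graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D'},
    'C': {'A', 'D'},
    'D': {'B', 'C'},
    'X': {'Y'},
    'Y': {'X'},
}
print(sorted(reachable_from(graph, 'A')))          # ['A', 'B', 'C', 'D']
print(sorted(common_neighbours(graph, 'B', 'C')))  # ['A', 'D']
```

Sorting before printing keeps the output stable, since the iteration order of a set of strings can change between runs.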

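II. Notes on using the set function

In Python, set is a built-in data type that stores an unordered collection of unique elements. Strictly speaking, set() is not a function: set is a class, and calling it creates a set object. Understanding how it is used and what to watch out for is still essential for working with sets effectively. Keep the following in mind when using set() (or sets in general):

1. Unordered: sets are unordered, so you cannot rely on elements keeping their insertion order.
2. Hashable elements: set elements must be hashable, which in practice means immutable types such as integers, floats, strings, and tuples (a tuple qualifies only if all of its items are hashable). The set itself, however, is mutable.
3. No duplicates: a set removes duplicates automatically; adding an element that is already present has no effect.
4. No indexing: because sets are unordered, they support neither indexing nor slicing.
5. Operator support: sets support union (|), intersection (&), difference (-), and symmetric difference (^).
6. Iteration: you can iterate over all the elements of a set.
7. Adding and removing: use add() to add an element and remove() or discard() to delete one. If the element is not in the set, remove() raises KeyError, whereas discard() does nothing.
8. Performance: lookup, insertion, and deletion take O(1) time on average, because sets are implemented as hash tables.
9. Conversion: you can convert other iterables (lists, tuples, strings, and so on) to a set to remove their duplicates, but the original order is lost.
10. Creating an empty set: {} creates an empty dictionary, not an empty set; the only way to create an empty set is set().
11. Subsets and supersets: use <= and >= to test whether one set is a subset or superset of another, and < and > to test for a proper subset or proper superset (one that is not equal to the other set).
12. Unhashable types: a set cannot contain lists or other mutable types, because they are unhashable; trying to add one raises TypeError.
13. Frozen sets: if you need an immutable set (one whose elements cannot change after creation), use frozenset() to create a frozen set.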
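Several of the notes above are easy to check interactively. The following sketch exercises notes 2, 7, 10, 11, 12, and 13; the variable names and the helper can_be_element are illustrative only and not part of the original article.

```python
# Note 10: {} is an empty dict; set() is the only way to write an empty set
empty_types = (type({}).__name__, type(set()).__name__)
print(empty_types)                      # ('dict', 'set')

# Note 7: discard() ignores a missing element, remove() raises KeyError
s = {1, 2, 3}
s.discard(99)                           # no error; s is unchanged
try:
    s.remove(99)
    remove_raised = False
except KeyError:
    remove_raised = True
print(s, remove_raised)                 # {1, 2, 3} True

# Notes 2 and 12: elements must be hashable (a list is not)
def can_be_element(obj):
    """Return True if obj can be stored in a set."""
    try:
        {obj}                           # building a one-element set hashes obj
        return True
    except TypeError:
        return False

print(can_be_element((1, 2)), can_be_element([1, 2]), can_be_element((1, [2])))
# True False False  (a tuple is hashable only if all of its items are)

# Note 11: <= tests "subset", < tests "proper subset"
a = {1, 2}
subset_checks = (a <= {1, 2}, a < {1, 2}, a < {1, 2, 3})
print(subset_checks)                    # (True, False, True)

# Note 13: a frozenset is hashable, so it can be a set element or a dict key
pairs = {frozenset({1, 2}), frozenset({2, 1})}
print(len(pairs))                       # 1 (equal frozensets collapse into one)
```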

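III. How to make good use of the set function?

In Python, set is not a function but a built-in data type used to create set objects; a set is an unordered collection that contains no duplicate elements. To make full use of Python sets, follow these suggestions:

1. Deduplication: a major advantage of sets is that they remove duplicate elements automatically. To remove duplicates from a list or other iterable, convert it to a set.

2. Set operations: sets support union, intersection, difference, and symmetric difference.

3. Membership tests: use the `in` operator to check whether an element is in a set.

4. Adding and removing elements: use add() to add elements, and remove() or discard() to delete them.

5. Intersection update: to update a set in place so that it keeps only the elements it shares with another set, use intersection_update().

6. Difference update: similarly, difference_update() updates a set in place by removing every element that also appears in another set.

7. Subset and superset tests: use <, <=, > and >= to check whether one set is a (proper) subset or superset of another.

8. Immutable sets: if you need a set whose contents cannot change after creation, use frozenset(). Because frozensets are hashable, they are useful as dictionary keys or anywhere else an immutable type is required.

9. Mind performance: lookup, insertion, and deletion take O(1) time on average because sets are implemented with hash tables, which makes sets especially effective for large data sets.

10. Beware of mutable elements: set elements must be hashable (in practice, immutable); adding a mutable object such as a list raises TypeError.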
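Suggestions 5 and 6 mention the in-place update methods. The sketch below puts intersection_update(), difference_update(), and symmetric_difference_update() side by side, and shows one practical difference between the named methods and the operators: the methods accept any iterable, while the binary operators need a set on both sides. The sample values are arbitrary.

```python
# In-place update methods (sample values are arbitrary)
a = {1, 2, 3, 4}
a.intersection_update({3, 4, 5})        # keep only elements also found in the argument
print(a)                                # {3, 4}

b = {1, 2, 3, 4}
b.difference_update([2, 4])             # remove every element found in the argument
print(b)                                # {1, 3}

c = {1, 2, 3}
c.symmetric_difference_update({3, 4})   # keep elements that are in exactly one of the two
print(c)                                # {1, 2, 4}

# Augmented assignment is the operator form of the same in-place update
d = {1, 2, 3, 4}
d &= {3, 4, 5}                          # equivalent to d.intersection_update({3, 4, 5})
print(d == a)                           # True

# Named methods accept any iterable; binary operators need a set on both sides
print({1, 2}.union([2, 3]))             # {1, 2, 3}
try:
    {1, 2} | [2, 3]
    operator_raised = False
except TypeError:
    operator_raised = True
print(operator_raised)                  # True
```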

1. The set function:

1-1. Python:

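# 1. Function: set
# 2. Purpose: converts an iterable into an unordered, mutable set with no duplicate elements
# 3. Syntax: set([iterable])
# 4. Parameter: iterable, the iterable to convert into a set; it can be a list, tuple, range object, string, etc.
# 5. Return value:
# 5-1. Without an argument: returns a new empty set
# 5-2. With an argument: returns a new set containing the unique elements of iterable
# 6. Notes:
# 7. Examples:
# Use dir() to list the attributes and methods of set
# (the output below is from Python 3.11 or later; the exact list varies slightly between versions)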
print(dir(set))
# ['__and__', '__class__', '__class_getitem__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__',
# '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__iand__', '__init__', '__init_subclass__',
# '__ior__', '__isub__', '__iter__', '__ixor__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__or__', '__rand__',
# '__reduce__', '__reduce_ex__', '__repr__', '__ror__', '__rsub__', '__rxor__', '__setattr__', '__sizeof__', '__str__',
# '__sub__', '__subclasshook__', '__xor__', 'add', 'clear', 'copy', 'difference', 'difference_update', 'discard', 'intersection',
# 'intersection_update', 'isdisjoint', 'issubset', 'issuperset', 'pop', 'remove', 'symmetric_difference', 'symmetric_difference_update', 'union', 'update']
# Use help() to display the documentation for set
help(set)
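# Application 1: deduplication
# Example 1: remove duplicates from a list
# The original list contains duplicates
original_list = [1, 2, 2, 3, 4, 4, 5, 5, 5]
# Remove duplicates with set(), then convert back to a list
unique_list = list(set(original_list))
# Note: sets are unordered, so the order may change after converting back to a list
print(unique_list)  # May print [1, 2, 3, 4, 5], but the order is not guaranteed
# To keep the original order, loop over the list and track the items already seen in a set
seen = set()
unique_list_ordered = []
for item in original_list:
    if item not in seen:  # O(1) average membership test
        seen.add(item)
        unique_list_ordered.append(item)
print(unique_list_ordered)  # Keeps the original order
# [1, 2, 3, 4, 5]
# [1, 2, 3, 4, 5]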
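# Example 2: remove duplicate characters from a string
# The original string contains duplicate characters
original_string = "Hello, Python!"
# set() removes duplicate characters, but the result is a set, not a string
unique_chars = set(original_string)
# To turn the result back into a string, sort the characters so the output is deterministic (sets are unordered)
unique_string = ''.join(sorted(unique_chars))
print(unique_string)
# ' !,HPehlnoty'  (the output starts with a space character)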
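# Example 3: remove duplicate sublists from a nested list (note: lists cannot be set elements directly)
# The original nested list contains duplicate sublists
original_nested_list = [[1, 2], [3, 4], [1, 2], [5, 6]]
# Convert each sublist to a tuple with tuple(), then remove duplicates with set()
unique_nested_set = set(map(tuple, original_nested_list))
# Convert back to a list (the sublists are now tuples)
unique_nested_list_tuples = list(unique_nested_set)
# If the sublists need to be lists rather than tuples, convert them again
unique_nested_list = [list(item) for item in unique_nested_list_tuples]
print(unique_nested_list)
# [[1, 2], [3, 4], [5, 6]]  (order not guaranteed)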
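# Application 2: fast lookup
# Example 1: check whether an element is in a set
# Create a set
my_set = {3, 5, 6, 8, 10, 11, 24}
# Look up an element
element_to_find = 10
if element_to_find in my_set:
    print(f"{element_to_find} is in the set")
else:
    print(f"{element_to_find} is not in the set")
# Look up an element that is not present
element_not_found = 7
if element_not_found in my_set:
    print(f"{element_not_found} is in the set")
else:
    print(f"{element_not_found} is not in the set")
# 10 is in the set
# 7 is not in the set
# Example 2: use a set for fast deduplication and lookup
# The original list contains duplicate elements
original_list = [1, 2, 2, 3, 4, 4, 5, 5, 5]
# Deduplicate with a set
unique_set = set(original_list)
# Look up an element
element_to_find = 4
if element_to_find in unique_set:
    print(f"{element_to_find} is in the deduplicated set")
else:
    print(f"{element_to_find} is not in the deduplicated set")
# To convert the result back to a list (order not guaranteed)
unique_list = list(unique_set)
print(unique_list)
# 4 is in the deduplicated set
# [1, 2, 3, 4, 5]
# Example 3: find the intersection of two sets
# Create two sets
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}
# Compute the intersection
intersection = set1 & set2
print(intersection)
# Check whether an element is in the intersection of the two sets
element_to_check = 4
if element_to_check in intersection:
    print(f"{element_to_check} is in the intersection of the two sets")
else:
    print(f"{element_to_check} is not in the intersection of the two sets")
# {4, 5}
# 4 is in the intersection of the two sets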
# Application 3: set operations
# Example 1: union
# Create two sets
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}
# Compute the union with the | operator
union_set = set1 | set2
print(union_set)
# {1, 2, 3, 4, 5, 6}
# Example 2: intersection
# Create two sets
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}
# Compute the intersection with the & operator
intersection_set = set1 & set2
print(intersection_set)
# {3, 4}
# Example 3: difference
# Create two sets
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}
# Compute the difference with the - operator (elements in set1 but not in set2)
difference_set = set1 - set2
print(difference_set)
# The reverse difference (elements in set2 but not in set1)
difference_set_reverse = set2 - set1
print(difference_set_reverse)
# {1, 2}
# {5, 6}
# Example 4: symmetric difference
# Create two sets
set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}
# Compute the symmetric difference with the ^ operator (elements in exactly one of the two sets)
symmetric_difference_set = set1 ^ set2
print(symmetric_difference_set)
# {1, 2, 5, 6}
# Example 5: check whether one set is a subset of another
# Create two sets
set1 = {1, 2, 3}
set2 = {1, 2, 3, 4, 5}
# Use the <= operator to check whether set1 is a subset of set2
is_subset = set1 <= set2
print(is_subset)
# Conversely, check whether set2 is a subset of set1
is_subset_reverse = set2 <= set1
print(is_subset_reverse)
# True
# False
# Example 6: check whether two sets have any elements in common
# Create two sets
set1 = {1, 2, 3}
set2 = {3, 4, 5}
# Convert the result of & to bool: a non-empty intersection is True
# (set1.isdisjoint(set2) answers the opposite question without building the intersection)
has_intersection = bool(set1 & set2)
print(has_intersection)
# When there is no intersection
set3 = {6, 7, 8}
has_intersection_no = bool(set1 & set3)
print(has_intersection_no)
# True
# False
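# Application 4: filtering and selection
# Example 1: use a set to filter duplicates out of a list
# The original list contains duplicates
original_list = [1, 2, 2, 3, 4, 4, 5, 5, 5]
# Deduplicate with a set, then convert back to a list
filtered_list = list(set(original_list))
# Note: sets are unordered, so the order may change after converting back to a list
print(filtered_list)
# [1, 2, 3, 4, 5]
# Example 2: use a set to select the elements two lists have in common
# Two lists
list1 = [1, 2, 3, 4, 5]
list2 = [4, 5, 6, 7, 8]
# Convert the lists to sets
set1 = set(list1)
set2 = set(list2)
# Select the common elements with set intersection
common_elements = set1 & set2
# If the result needs to be a list
common_elements_list = list(common_elements)
print(common_elements_list)
# [4, 5]
# Example 3: filter a list so that only elements in a given set remain
# The original list
original_list = [1, 2, 3, 4, 5, 6]
# The set of elements we want to keep
selected_elements = {2, 4, 6}
# Filter with a list comprehension, keeping only elements in selected_elements (this also preserves order)
filtered_list = [element for element in original_list if element in selected_elements]
print(filtered_list)
# [2, 4, 6]
# Example 4: filter a list with set difference
# The original list
original_list = [1, 2, 3, 4, 5]
# The set of elements we want to remove from the list
elements_to_remove = {2, 4}
# Convert the list to a set
set_original = set(original_list)
# Remove the elements with set difference
filtered_set = set_original - elements_to_remove
# If the result needs to be a list (order not guaranteed)
filtered_list = list(filtered_set)
print(filtered_list)
# [1, 3, 5]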
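# Application 5: deduplicating dictionary keys
# Suppose we have a list of keys that may contain duplicates
keys_with_duplicates = ['a', 'b', 'a', 'c', 'b', 'd']
# Remove duplicate keys with a set; sort the result, because the iteration order of a set
# of strings is arbitrary and can differ between runs (string hashing is randomized)
unique_keys = sorted(set(keys_with_duplicates))
# Build a dictionary from the unique keys, using a simple placeholder value (None here)
# To give each key a specific value, you need a list of values that corresponds to the keys
dict_with_unique_keys = {key: None for key in unique_keys}
# Print the result
print(dict_with_unique_keys)
values = [1, 2, 3, 4]  # Note: this list should be as long as unique_keys
# zip() pairs keys with values; because unique_keys is sorted, the pairing is deterministic
# (in real code, pair keys and values according to your own logic)
dict_with_paired_values = dict(zip(unique_keys, values))
# Print the result (note: zip() stops at the shorter input, so if values is too short, some keys are dropped)
print(dict_with_paired_values)
# {'a': None, 'b': None, 'c': None, 'd': None}
# {'a': 1, 'b': 2, 'c': 3, 'd': 4}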
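# Application 6: performance optimization
# Example 1: efficient membership tests with a set
import time
# Membership test on a list
lst = list(range(1000000))  # A list with one million elements
element = 999999
start_time = time.time()
if element in lst:
    print("Element found in list.")
print(f"List membership check took {time.time() - start_time} seconds.")
# Membership test on a set
set_lst = set(lst)  # Convert the list to a set (the conversion itself is O(n), so it pays off over many lookups)
start_time = time.time()
if element in set_lst:
    print("Element found in set.")
print(f"Set membership check took {time.time() - start_time} seconds.")
# The set membership test is much faster. time.time() has limited resolution, which is why the set
# check can report 0.0 seconds; time.perf_counter() is better suited to timing very short intervals.
# Element found in list.
# List membership check took 0.013991117477416992 seconds.
# Element found in set.
# Set membership check took 0.0 seconds.
# Example 2: efficient deduplication with a set
# Deduplicate with a for loop and an if check against a list (slower: each `in` scans the list)
lst = [1, 2, 2, 3, 4, 4, 5, 5, 5]
unique_lst = []
for item in lst:
    if item not in unique_lst:
        unique_lst.append(item)
# Deduplicate with a set (faster, but does not preserve order)
set_lst = set(lst)
unique_lst_from_set = list(set_lst)
# Set-based deduplication is both more concise and more efficient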
# Example 3: efficient intersection, union, and difference with sets
# Create two sets
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}
# Intersection
intersection = set1 & set2
print(intersection)  # Output: {4, 5}
# Union
union = set1 | set2
print(union)  # Output: {1, 2, 3, 4, 5, 6, 7, 8}
# Difference (elements in set1 but not in set2)
difference = set1 - set2
print(difference)  # Output: {1, 2, 3}
# These set operations are typically more efficient than equivalent list-based code, which needs nested scans
# {4, 5}
# {1, 2, 3, 4, 5, 6, 7, 8}
# {1, 2, 3}
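# Application 7: graph and network algorithms
# Example 1: use sets to represent adjacency in an undirected graph
# Suppose we have an undirected graph whose adjacency is stored in a dictionary
graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F'},
    'D': {'B'},
    'E': {'B', 'F'},
    'F': {'C', 'E'},
}
# Sets make it fast to look up all neighbors of a vertex
def get_neighbors(graph, vertex):
    return graph[vertex]
# Example: find all neighbors of vertex 'B'
print(get_neighbors(graph, 'B'))  # Output: {'A', 'D', 'E'} (order may vary)
# Check whether two vertices are joined by an edge (directly adjacent)
def are_adjacent(graph, vertex1, vertex2):
    return vertex2 in graph[vertex1]
# Example: check whether vertices 'A' and 'B' are adjacent
print(are_adjacent(graph, 'A', 'B'))  # Output: True
# {'A', 'D', 'E'}
# True
# Example 2: use sets to analyze communities in a network (finding edges between known communities)
# Suppose we already know some communities
communities = {
    'community1': {'A', 'B', 'C'},
    'community2': {'D', 'E', 'F'},
}
# Graph definition: a mapping from each node to its neighbors
graph = {
    'A': {'B', 'D'},
    'B': {'A', 'C', 'E'},
    'C': {'B'},
    'D': {'A', 'E', 'F'},
    'E': {'B', 'D', 'F'},
    'F': {'D', 'E'},
}
# Set operations can find edges that cross communities, or support other analyses
def find_inter_community_edges(graph, communities):
    inter_edges = set()
    for community in communities.values():
        for vertex in community:
            for neighbor in graph[vertex]:
                if neighbor not in community:
                    # The neighbor belongs to another community, so record this inter-community edge
                    inter_edges.add((vertex, neighbor))
    return inter_edges
# Example: find the inter-community edges
print(find_inter_community_edges(graph, communities))  # Each undirected edge appears once per direction
# {('D', 'A'), ('B', 'E'), ('A', 'D'), ('E', 'B')}  (order may vary)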
# Application 8: data cleaning and preprocessing
# Example 1: remove duplicates from a list
# The original list contains duplicates
original_list = [1, 2, 3, 2, 4, 4, 5, 5, 6]
# Remove duplicates with set(), then convert back to a list
cleaned_list = list(set(original_list))
# Note: set() discards the original order; if the order matters, use another method,
# e.g. dict.fromkeys(), which keeps the first occurrence of each item in order (Python 3.7+)
cleaned_list_ordered = list(dict.fromkeys(original_list))
print("Original list:", original_list)
print("Deduplicated with set():", cleaned_list)
print("Order-preserving list:", cleaned_list_ordered)
# Original list: [1, 2, 3, 2, 4, 4, 5, 5, 6]
# Deduplicated with set(): [1, 2, 3, 4, 5, 6]
# Order-preserving list: [1, 2, 3, 4, 5, 6]
# Example 2: find the intersection of two lists
# list1
list1 = [1, 2, 3, 4, 5]
# list2
list2 = [4, 5, 6, 7, 8]
# Find the intersection with set()
intersection = set(list1).intersection(set(list2))
print("Intersection of list1 and list2:", intersection)
# Intersection of list1 and list2: {4, 5}
# Example 3: find the union of two lists
# list1
list1 = [1, 2, 3, 4, 5]
# list2
list2 = [4, 5, 6, 7, 8]
# Find the union with set(); the result is a set, so it has no duplicates and no guaranteed order
union_without_duplicates = set(list1).union(set(list2))
# To get the union as a list in first-seen order, deduplicate the concatenation with dict.fromkeys()
union_ordered = list(dict.fromkeys(list1 + list2))
print("Union of list1 and list2 (set):", union_without_duplicates)
print("Union of list1 and list2 (ordered list):", union_ordered)
# Union of list1 and list2 (set): {1, 2, 3, 4, 5, 6, 7, 8}
# Union of list1 and list2 (ordered list): [1, 2, 3, 4, 5, 6, 7, 8]
# Example 4: find the difference of two lists (elements of list1 that are not in list2)
# list1
list1 = [1, 2, 3, 4, 5]
# list2
list2 = [4, 5, 6, 7, 8]
# Find the difference with set()
difference = set(list1).difference(set(list2))
print("Elements of list1 not in list2:", difference)
# Elements of list1 not in list2: {1, 2, 3}
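# Application 9: caching and state management
# Example 1: caching sets of results
# The cache itself is a dict (number -> set of factors); it lives at module level so that it
# persists between calls (a cache created inside the function would be rebuilt on every call)
import math
_factor_cache = {}
def find_factors(n):
    """
    Find all factors of n and return them as a set.
    Results are stored in _factor_cache so the same number is not factored twice.
    """
    if n in _factor_cache:
        return _factor_cache[n]
    # Collect the divisors up to sqrt(n); math.isqrt avoids floating-point rounding errors
    factors = {i for i in range(1, math.isqrt(n) + 1) if n % i == 0}
    factors.update({n // i for i in factors})  # Add the paired divisors; the set drops the duplicate when i == n // i
    _factor_cache[n] = factors
    return factors
# Example
print(find_factors(12))  # Output: {1, 2, 3, 4, 6, 12}
print(find_factors(12))  # Second call: the result now comes from the cache
# {1, 2, 3, 4, 6, 12}
# {1, 2, 3, 4, 6, 12}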
# Example 2: state management with a set
# Initialize the set of users who have completed the task
completed_users = set()
def mark_task_completed(user_id):
    """
    Mark the user as having completed the task
    """
    completed_users.add(user_id)
    print(f"User {user_id} has completed the task.")
def check_task_status(user_id):
    """
    Check whether the user has completed the task
    """
    if user_id in completed_users:
        return "Completed"
    else:
        return "Not Completed"
# Example
mark_task_completed(123)  # Output: User 123 has completed the task.
print(check_task_status(123))  # Output: Completed
print(check_task_status(456))  # Output: Not Completed
# User 123 has completed the task.
# Completed
# Not Completed
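# Application 10: computing the power set
def powerset(s):
    """
    Generate the powerset of a given set.
    :param s: The input set.
    :return: The powerset of the input set.
    """
    if not s:
        return [set()]  # The only subset of the empty set is the empty set
    first = next(iter(s))  # Pick any element of s
    subsets_without_current = powerset(s - {first})  # Subsets that do not contain first
    subsets_with_current = [s_ | {first} for s_ in subsets_without_current]  # The same subsets plus first
    return subsets_without_current + subsets_with_current
# Example
input_set = {3, 5, 6, 8}
power_set = powerset(input_set)
for subset in power_set:
    print(subset)
# Output (2**4 = 16 subsets; their order follows the set's iteration order in CPython):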
# set()
# {6}
# {5}
# {5, 6}
# {3}
# {3, 6}
# {3, 5}
# {3, 5, 6}
# {8}
# {8, 6}
# {8, 5}
# {8, 5, 6}
# {8, 3}
# {8, 3, 6}
# {8, 3, 5}
# {8, 3, 5, 6}
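As an alternative to the recursive version above (not the author's approach), the power set can also be built iteratively with itertools.combinations, taking all combinations of size 0, 1, ..., n and chaining them together; this is the same idea as the powerset recipe in the itertools documentation. The function name powerset_iter is hypothetical. It yields the same 2**n subsets, ordered from smallest to largest instead of in the recursive order.

```python
from itertools import chain, combinations

def powerset_iter(s):
    """Return all subsets of s as a list of sets, from smallest to largest."""
    items = list(s)
    # combinations(items, r) yields every r-element subset as a tuple
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [set(c) for c in subsets]

result = powerset_iter({3, 5, 6, 8})
print(len(result))    # 16 == 2 ** 4
print(result[0])      # set()
```

Because it avoids recursion, this version does not copy the input set once per level, although both approaches still produce 2**n subsets, so neither is practical for large inputs.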