Q: Find DataFrame rows where specific cells match specific regular expressions

Stack Overflow user

Asked 2020-05-04 19:43:03

Answers: 1, Views: 39, Followers: 0, Votes: 2

Suppose I have a pandas DataFrame:

data_df:

color          shape          number

green 5        triangle       3
green 1056     rectangle      2
blue           square         1
blue           circle         5
blue           square         4

I also have a second DataFrame that holds the filter parameters:

filter_df:

color          shape          number

green .*       ANY            ANY
blue           square         1

After filtering, the result should be:

filtered_data_df:

color          shape          number

blue           circle         5
blue           square         4

My (flawed) approach was to build one regular expression each for color, shape, and number, like this:

color_regex = 'green .*|blue'
shape_regex = '.*|square'  # I would replace ANY with '.*'
number_regex = '.*|1'

After that, I would simply use:

filtered_data_df = data_df.drop(
           data_df[data_df['color'].str.match(color_regex , case=False)].index &
           data_df[data_df['shape'].str.match(shape_regex , case=False)].index &
           data_df[data_df['number'].astype(str).str.match(number_regex, case=False)].index,
           axis=0)

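But of course, because both shape_regex and number_regex contain '.*', every row gets filtered out. What I want is for every shape and number to be filtered out only for green, while for blue only the blue/square/1 combination is filtered out.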
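To see why the per-column approach cannot work, the sketch below rebuilds the question's data_df by hand and applies the three alternation regexes. Because each column is tested independently, the link between the cells of one filter row is lost, and the wildcard alternatives match every row. The names `combined` and `rows_dropped` are illustrative only, and the boolean masks are combined directly rather than through index intersection.

```python
import pandas as pd

# The question's data_df, rebuilt by hand.
data_df = pd.DataFrame({
    "color": ["green 5", "green 1056", "blue", "blue", "blue"],
    "shape": ["triangle", "rectangle", "square", "circle", "square"],
    "number": [3, 2, 1, 5, 4],
})

# One alternation per column, as in the question.
color_regex = "green .*|blue"
shape_regex = ".*|square"
number_regex = ".*|1"

# Each column is checked on its own, so the '.*' alternatives
# let every shape and every number through for every color.
combined = (
    data_df["color"].str.match(color_regex, case=False)
    & data_df["shape"].str.match(shape_regex, case=False)
    & data_df["number"].astype(str).str.match(number_regex, case=False)
)

rows_dropped = int(combined.sum())
print(rows_dropped, "of", len(data_df), "rows would be dropped")
```

The filter rows therefore have to be evaluated one at a time (all columns of a filter row together), and only then combined across filter rows.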

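I could probably write something myself, but it would involve some kind of for loop, and I assume that with pandas I should be able to avoid one.

In my real case, data_df can have up to 5000 rows, and filter_df can have around 100 rows (3 columns) of filter-parameter combinations, with the potential to keep growing (row-wise).

My question is: how can I handle this kind of filtering efficiently?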

python-3.x

regex

pandas

1 Answer

Stack Overflow user

Accepted answer

Posted 2020-05-04 20:15:37

IIUC, you can use a list comprehension. Note that df is your data_df and df1 is your filter_df.

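import numpy as np

# Replace the ANY wildcard with a regex that matches anything
df1 = df1.replace('ANY', '.*')
# Pair up the three pattern columns row by row
z = zip(df1['color'], df1['shape'], df1['number'])

# For each filter row, build a mask of the data rows that do NOT match it
# (str.contains treats each pattern as a regex and searches anywhere in the cell)
l = [~((df['color'].str.contains(str(c))) &
       (df['shape'].str.contains(str(s))) &
       (df['number'].astype(str).str.contains(str(n)))).values for c, s, n in z]

# Keep the rows that survive every filter row, then use boolean indexing.
# (The original np.equal(*l) only works with exactly two filter rows.)
df[np.logical_and.reduce(l)]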

  color   shape  number
3  blue  circle       5
4  blue  square       4
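One caveat worth checking before relying on `str.contains` for this kind of filter: it searches for the pattern anywhere in the cell, so a filter value of `1` would also hit `10` or `21`. The sketch below compares it with `str.match` (anchored at the start of the cell) and `str.fullmatch` (whole cell; assumed available, which means pandas 1.1 or later). The `numbers` series is made-up illustration data, not taken from the question.

```python
import pandas as pd

# Hypothetical values for the "number" column, compared as strings.
numbers = pd.Series([1, 10, 21, 5]).astype(str)

# Pattern found anywhere in the cell.
contains_hits = numbers.str.contains("1").tolist()
# Pattern must match at the start of the cell.
match_hits = numbers.str.match("1").tolist()
# Pattern must match the entire cell.
fullmatch_hits = numbers.str.fullmatch("1").tolist()

print(contains_hits)
print(match_hits)
print(fullmatch_hits)
```

If a filter value is meant to identify one exact cell value, whole-cell matching is probably the safer choice; `'.*'` still matches everything under all three methods.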

Here is the sample data:

s = """color,shape,number
green 5,triangle,3
green 1056,rectangle,2
blue,square,1
blue,circle,5
blue,square,4"""

s1 = """color,shape,number
green .*,ANY,ANY
blue,square,1"""

from io import StringIO
import pandas as pd

df = pd.read_csv(StringIO(s))
df1 = pd.read_csv(StringIO(s1))
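For a filter table that keeps growing, the same idea can be wrapped in a helper that works for any number of filter rows and columns. This is a minimal sketch, not the accepted answer's code: the function name `drop_filtered_rows` and its `wildcard` parameter are hypothetical, it assumes every filter cell is a valid regex, and it uses `str.fullmatch` (pandas 1.1 or later) so that a pattern has to match the whole cell. It still loops over the filter rows in Python, but each step is vectorized over the data rows.

```python
from io import StringIO

import numpy as np
import pandas as pd


def drop_filtered_rows(data, filters, wildcard="ANY"):
    """Return the rows of `data` that match none of the rows in `filters`."""
    # Treat every filter cell as a regex; the wildcard matches anything.
    patterns = filters.astype(str).replace(wildcard, ".*")
    text = data.astype(str)  # compare every column as strings

    masks = []
    for _, pattern_row in patterns.iterrows():
        # A data row matches a filter row only if every column matches.
        column_hits = [
            text[col].str.fullmatch(pattern_row[col], case=False).to_numpy()
            for col in patterns.columns
        ]
        masks.append(np.logical_and.reduce(column_hits))

    # A data row is dropped if it matches any filter row.
    if masks:
        matched = np.logical_or.reduce(masks)
    else:
        matched = np.zeros(len(data), dtype=bool)
    return data[~matched]


# Sample data from the answer above.
data_df = pd.read_csv(StringIO("""color,shape,number
green 5,triangle,3
green 1056,rectangle,2
blue,square,1
blue,circle,5
blue,square,4"""))

filter_df = pd.read_csv(StringIO("""color,shape,number
green .*,ANY,ANY
blue,square,1"""))

result = drop_filtered_rows(data_df, filter_df)
# result holds rows 3 and 4: blue/circle/5 and blue/square/4
print(result)
```

With about 100 filter rows and 3 columns this amounts to roughly 300 vectorized regex passes over the data, which should be manageable for 5000 rows, though that is an estimate rather than a measurement.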

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/61600283
