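I have a pandas DataFrame that records the absolute coordinates of all four corner points of each box: TL (top left), TR (top right), BL (bottom left), and BR (bottom right). In practice, the boxes seem to follow a row-like pattern, with distinct groups forming "rows", as shown in the figure:

The data looks like this: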
   tl_x  tl_y  tr_x  tr_y  br_x  br_y  bl_x  bl_y  ht   wd
0  1567   136  1707   136  1707   153  1567   153  17  140
1  1360   154  1548   154  1548   175  1360   175  21  188
2  1567   154  1747   154  1747   174  1567   174  20  180
3  1311   175  1548   175  1548   196  1311   196  21  237
4  1565   174  1741   174  1741   199  1565   199  25  176
5  1566   196  1753   196  1753   220  1566   220  24  187
...

I need to cluster these objects along the bl_y or br_y column (the bottom Y coordinate) to produce a 2D list of "rows", like:

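As you can see, the objects in each "row" may have slightly different Y coordinates (they are not exactly equal within each cluster). I basically need some function that adds a separate column to the DataFrame, e.g. clustered_y, so that I can then sort by that column.
What is the simplest way to do this?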
Posted on 2022-05-22 09:42:24
Given the data you provided:
import pandas as pd

df = pd.DataFrame(
    {
        "tl_x": {0: 1567, 1: 1360, 2: 1567, 3: 1311, 4: 1565, 5: 1566},
        "tl_y": {0: 136, 1: 154, 2: 154, 3: 175, 4: 174, 5: 196},
        "tr_x": {0: 1707, 1: 1548, 2: 1747, 3: 1548, 4: 1741, 5: 1753},
        "tr_y": {0: 136, 1: 154, 2: 154, 3: 175, 4: 174, 5: 196},
        "br_x": {0: 1707, 1: 1548, 2: 1747, 3: 1548, 4: 1741, 5: 1753},
        "br_y": {0: 153, 1: 175, 2: 174, 3: 196, 4: 199, 5: 220},
        "bl_x": {0: 1567, 1: 1360, 2: 1567, 3: 1311, 4: 1565, 5: 1566},
        "bl_y": {0: 153, 1: 175, 2: 174, 3: 196, 4: 199, 5: 220},
        "ht": {0: 17, 1: 21, 2: 20, 3: 21, 4: 25, 5: 24},
        "wd": {0: 140, 1: 188, 2: 180, 3: 237, 4: 176, 5: 187},
    }
)

Here is one way to do it:
# Calculate the distance between consecutive "br_y" values
df = df.sort_values(by="br_y")
# The first row has no predecessor, so backfill it (its distance becomes 0)
df["previous"] = df["br_y"].shift(1).bfill()
df["distance"] = df["br_y"] - df["previous"]

# Start a new group where distance > 5% of the "br_y" mean (arbitrarily chosen)
clusters = df.loc[df["distance"] > 0.05 * df["br_y"].mean()].copy()
clusters["clustered_br_y"] = [f"row{i}" for i in range(clusters.shape[0])]

# Add clusters back to dataframe and cleanup
df = (
    pd.merge(
        how="left",
        left=df,
        right=clusters["clustered_br_y"],
        left_index=True,
        right_index=True,
    )
    .ffill()
    .bfill()
    .drop(columns=["previous", "distance"])
    .reset_index(drop=True)
)

   tl_x  tl_y  tr_x  tr_y  br_x  br_y  bl_x  bl_y  ht   wd clustered_br_y
0  1567   136  1707   136  1707   153  1567   153  17  140           row0
1  1567   154  1747   154  1747   174  1567   174  20  180           row0
2  1360   154  1548   154  1548   175  1360   175  21  188           row0
3  1311   175  1548   175  1548   196  1311   196  21  237           row1
4  1565   174  1741   174  1741   199  1565   199  25  176           row1
5  1566   196  1753   196  1753   220  1566   220  24  187           row2

https://stackoverflow.com/questions/72299545
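
For comparison, here is a minimal sketch of a different way to turn the same gaps into labels, using diff and cumsum instead of the merge and fill steps. It is not the approach from the answer above, and `cluster_rows`, `rel_threshold` and the `clustered_y` column name are hypothetical names chosen for illustration. It keeps the same arbitrary 5% threshold. One difference worth noting: here every gap above the threshold starts a new cluster, so the first box (br_y 153, a gap of 21 from the next value) ends up in its own row instead of sharing row0.

```python
import pandas as pd

# Only the bottom Y coordinate from the sample data is needed for this sketch.
df = pd.DataFrame({"br_y": [153, 175, 174, 196, 199, 220]})


def cluster_rows(frame, column="br_y", rel_threshold=0.05):
    """Hypothetical helper: label rows by large gaps in `column`."""
    ordered = frame.sort_values(by=column)
    # Gap to the previous (smaller) value; the first row has no predecessor.
    gaps = ordered[column].diff().fillna(0)
    # Same arbitrary rule as above: a gap larger than 5% of the mean is "large".
    threshold = rel_threshold * ordered[column].mean()
    # Each large gap starts a new cluster; cumsum numbers the clusters 0, 1, 2, ...
    return ordered.assign(clustered_y=(gaps > threshold).cumsum())


clustered = cluster_rows(df)

# 2D list of original row indices, one inner list per cluster.
rows = [group.index.tolist() for _, group in clustered.groupby("clustered_y")]
# rows == [[0], [2, 1], [3, 4], [5]]
```

Because clustered_y is an integer, sorting by it (for example `clustered.sort_values(["clustered_y", "br_y"])`) keeps the rows in top-to-bottom order. Whether the first box belongs in its own row depends on the layout, so the threshold may need tuning for real data.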