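I have two DataFrames related to stocks and their prices, and I am trying to cross-match the data between them.
df1 = a table of the stocks each user has selected: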
Username Stock 1 Stock 2
0 JB3004 TSLA MSFT
1 JM3009 SHOP SPOT
2 DB0208 TWTR MSFT
3 AB3011 TWTR PTON
4 CB3004 MSFT TSLA
df2 = today's closing price for each stock:
TWTR SPOT PTON SHOP MSFT TSLA
Date Adj Close Adj Close Adj Close Adj Close Adj Close Adj Close
2020-12-11 51.44 341.22 117.1 1057.87 213.26 609.99
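I am trying to match each user's stocks in df1 with the adjusted closing prices in df2, so that I can print a df3 showing the correct closing price for every stock a user selected.
How can I do this? Nothing I have tried has come close, so I would appreciate some help!
Answered 2020-12-13 12:11:17
I ran into a similar problem and worked out a solution, which I am sharing here. I hope it helps you find your answer. My full solution is also available on GitHub.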
Create df1
import pandas as pd

data1 = {"Username" : ["JB3004", "JM3009", "DB0208", "AB3011", "CB3004"],
"Stock_1" : ["TSLA", "SHOP", "TWTR", "TWTR", "MSFT"],
"Stock_2" : ["MSFT", "SPOT", "MSFT", "PTON", "TSLA"]}
df1 = pd.DataFrame(data=data1)
df1.head()
Username Stock_1 Stock_2
0 JB3004 TSLA MSFT
1 JM3009 SHOP SPOT
2 DB0208 TWTR MSFT
3 AB3011 TWTR PTON
4 CB3004 MSFT TSLA
Convert the wide-format data to long format
df1_1 = pd.wide_to_long(df1, stubnames='Stock_', i='Username', j='Stock_num')
df1_1.reset_index(inplace=True)
df1_1
Username Stock_num Stock_
0 JB3004 1 TSLA
1 JM3009 1 SHOP
2 DB0208 1 TWTR
3 AB3011 1 TWTR
4 CB3004 1 MSFT
5 JB3004 2 MSFT
6 JM3009 2 SPOT
7 DB0208 2 MSFT
8 AB3011 2 PTON
9 CB3004 2 TSLA
Rename the column Stock_ to Stocks
df1_1.rename(columns={"Stock_": "Stocks"}, inplace=True)
df1_1
Create a df2 that matches the df2 in the question
The file closing_price.csv contains the closing-price data
# closing_price.csv
,TWTR,SPOT,PTON,SHOP,MSFT,TSLA
Date,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close
2020-12-11,51.44,341.22,117.1,1057.87,213.26,609.99
Load df2
df2 = pd.read_csv("closing_price.csv", index_col=None)
df2.head()
Unnamed: 0 TWTR SPOT PTON SHOP MSFT TSLA
0 Date Adj Close Adj Close Adj Close Adj Close Adj Close Adj Close
1 2020-12-11 51.44 341.22 117.1 1057.87 213.26 609.99
Data cleaning and transformation
df2.set_index("Unnamed: 0", inplace = True)
df2.index.name = "Date"
df2.reset_index(inplace=True)
df2.drop([0], inplace=True)
df2.head()
Date TWTR SPOT PTON SHOP MSFT TSLA
1 2020-12-11 51.44 341.22 117.1 1057.87 213.26 609.99
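One side effect of the cleaning steps above is that the "Adj Close" label row is first parsed as data, so the price columns are read as strings rather than numbers. As a hedged alternative, the sketch below skips that row while reading, which lets pandas infer float columns. The CSV text is inlined with `io.StringIO` only so the example runs on its own, and the name `prices_wide` is made up for illustration.

```python
import io
import pandas as pd

# Same content as closing_price.csv, inlined so the sketch is self-contained.
csv_text = """,TWTR,SPOT,PTON,SHOP,MSFT,TSLA
Date,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close
2020-12-11,51.44,341.22,117.1,1057.87,213.26,609.99
"""

# skiprows=[1] drops the "Date,Adj Close,..." line (0-based line 1) before parsing,
# so every price column contains only numbers and is inferred as float.
prices_wide = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

# The first header cell is empty, so pandas names that column "Unnamed: 0".
prices_wide = prices_wide.rename(columns={"Unnamed: 0": "Date"})
```

With a real file, `io.StringIO(csv_text)` would simply be replaced by the file path.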
Convert the wide-format data to long format
# Convert wide format to long format data
df2_1 = pd.melt(df2, id_vars=['Date'], value_vars=["TWTR", "SPOT", "PTON", "SHOP", "MSFT", "TSLA"], var_name="Stocks", value_name="Adj Close")
df2_1
Date Stocks Adj Close
0 2020-12-11 TWTR 51.44
1 2020-12-11 SPOT 341.22
2 2020-12-11 PTON 117.1
3 2020-12-11 SHOP 1057.87
4 2020-12-11 MSFT 213.26
5 2020-12-11 TSLA 609.99
Now df1_1 and df2_1 look like this:
df1_1
Username Stock_num Stocks
0 JB3004 1 TSLA
1 JM3009 1 SHOP
2 DB0208 1 TWTR
3 AB3011 1 TWTR
4 CB3004 1 MSFT
5 JB3004 2 MSFT
6 JM3009 2 SPOT
7 DB0208 2 MSFT
8 AB3011 2 PTON
9 CB3004 2 TSLA
df2_1
Date Stocks Adj Close
0 2020-12-11 TWTR 51.44
1 2020-12-11 SPOT 341.22
2 2020-12-11 PTON 117.1
3 2020-12-11 SHOP 1057.87
4 2020-12-11 MSFT 213.26
5 2020-12-11 TSLA 609.99
Merge df1_1 and df2_1 on the "Stocks" column
# Merge df1_1 and df2_1 on column "Stocks"
df3 = pd.merge(df1_1, df2_1, on='Stocks')
df3
Username Stock_num Stocks Date Adj Close
0 JB3004 1 TSLA 2020-12-11 609.99
1 CB3004 2 TSLA 2020-12-11 609.99
2 JM3009 1 SHOP 2020-12-11 1057.87
3 DB0208 1 TWTR 2020-12-11 51.44
4 AB3011 1 TWTR 2020-12-11 51.44
5 CB3004 1 MSFT 2020-12-11 213.26
6 JB3004 2 MSFT 2020-12-11 213.26
7 DB0208 2 MSFT 2020-12-11 213.26
8 JM3009 2 SPOT 2020-12-11 341.22
9 AB3011 2 PTON 2020-12-11 117.1
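The merge above uses the default inner join, so a user's pick would silently disappear if its ticker had no row in the price table. A possible safeguard, sketched here on toy data, is a left join with `validate` and `indicator`. The frames `picks` and `prices` are stand-ins for df1_1 and df2_1, and "ZZZZ" and "XX0000" are made-up values used only to show an unmatched row.

```python
import pandas as pd

# Toy stand-ins for df1_1 and df2_1; "ZZZZ" is a made-up ticker with no price.
picks = pd.DataFrame({
    "Username": ["JB3004", "JB3004", "XX0000"],
    "Stocks": ["TSLA", "MSFT", "ZZZZ"],
})
prices = pd.DataFrame({
    "Date": ["2020-12-11", "2020-12-11"],
    "Stocks": ["TSLA", "MSFT"],
    "Adj Close": [609.99, 213.26],
})

# how="left" keeps every pick; validate raises if a ticker appears twice in prices;
# indicator=True adds a "_merge" column saying where each row came from.
merged = pd.merge(picks, prices, on="Stocks", how="left",
                  validate="many_to_one", indicator=True)

# Tickers that had no matching price row.
unmatched = merged.loc[merged["_merge"] == "left_only", "Stocks"].tolist()
```

If the price table later holds several dates per ticker, `validate="many_to_one"` would raise, which is a hint that the join key should then include the date as well.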
Rearrange the columns
# Rearrange columns
df3.set_index(["Date"], inplace=True)
df3.reset_index(inplace=True)
df3
Date Username Stock_num Stocks Adj Close
0 2020-12-11 JB3004 1 TSLA 609.99
1 2020-12-11 CB3004 2 TSLA 609.99
2 2020-12-11 JM3009 1 SHOP 1057.87
3 2020-12-11 DB0208 1 TWTR 51.44
4 2020-12-11 AB3011 1 TWTR 51.44
5 2020-12-11 CB3004 1 MSFT 213.26
6 2020-12-11 JB3004 2 MSFT 213.26
7 2020-12-11 DB0208 2 MSFT 213.26
8 2020-12-11 JM3009 2 SPOT 341.22
9 2020-12-11 AB3011 2 PTON 117.1
# Reshaping or pivoting data based on column values
df = df3.pivot(index="Username", columns="Stock_num", values=["Stocks", "Adj Close"])
df
Stocks Adj Close
Stock_num 1 2 1 2
Username
AB3011 TWTR PTON 51.44 117.1
CB3004 MSFT TSLA 213.26 609.99
DB0208 TWTR MSFT 51.44 213.26
JB3004 TSLA MSFT 609.99 213.26
JM3009 SHOP SPOT 1057.87 341.22
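When there is only one date of prices, a shorter route than reshape, merge, and pivot may be enough: treat the prices as a ticker-to-price lookup and apply `Series.map` to each stock column. This is a different technique from the steps above, not a restatement of them. The sketch assumes the prices are already numeric, and the names `close` and `result` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Username": ["JB3004", "JM3009", "DB0208", "AB3011", "CB3004"],
    "Stock_1": ["TSLA", "SHOP", "TWTR", "TWTR", "MSFT"],
    "Stock_2": ["MSFT", "SPOT", "MSFT", "PTON", "TSLA"],
})

# Ticker -> adjusted close for 2020-12-11, values taken from the question.
close = pd.Series({"TWTR": 51.44, "SPOT": 341.22, "PTON": 117.1,
                   "SHOP": 1057.87, "MSFT": 213.26, "TSLA": 609.99})

result = df1.copy()
for col in ["Stock_1", "Stock_2"]:
    # Series.map looks each ticker up in `close`; unknown tickers become NaN.
    result[col + "_close"] = result[col].map(close)
```

The result keeps one row per user, with `Stock_1_close` and `Stock_2_close` next to the original columns.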
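Answered 2020-12-13 13:05:05
I just saw this and thought I would give it a try.
Use pandas.DataFrame.stack() on df2 to put everything into a shape that lines up with df1. Rename some fields if needed.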
df2t = df2.stack().reset_index().rename(
columns={
"level_0":"date",
"level_1":"stock",
0:"closing_price",
},
)
df2t = df2t.loc[df2t["date"] != "Date", :]
Output:
date stock closing_price
6 2020-12-11 TWTR 51.44
7 2020-12-11 SPOT 341.22
8 2020-12-11 PTON 117.1
9 2020-12-11 SHOP 1057.87
10 2020-12-11 MSFT 213.26
11 2020-12-11 TSLA 609.99
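The stack step above does not show how df2 was loaded. The output (rows 6 to 11 survive, and rows whose date equals "Date" are filtered out) is consistent with the first CSV column being used as the index, so the sketch below assumes `index_col=0`. That is an assumption, not something the answer states. The CSV is inlined only to keep the example self-contained, and the conversion with `pd.to_numeric` is an extra step not present in the original.

```python
import io
import pandas as pd

csv_text = """,TWTR,SPOT,PTON,SHOP,MSFT,TSLA
Date,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close
2020-12-11,51.44,341.22,117.1,1057.87,213.26,609.99
"""

# Assumption: the first column ("Date", "2020-12-11") is used as the index.
df2 = pd.read_csv(io.StringIO(csv_text), index_col=0)

# stack() turns the ticker columns into a second index level: one row per (date, ticker).
stacked = df2.stack().reset_index()
stacked.columns = ["date", "stock", "closing_price"]

# Drop the rows that came from the "Adj Close" label line, then make prices numeric.
stacked = stacked.loc[stacked["date"] != "Date"].copy()
stacked["closing_price"] = pd.to_numeric(stacked["closing_price"])
```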
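Apply pandas.melt() to df1
# Assumes the ID column of df1 is named "username" (lowercase); the question's df1 shows "Username".
df1m = pd.melt(df1, id_vars=["username"], value_vars=["Stock 1", "Stock 2"])
Output: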
username variable value
0 JB3004 Stock 1 TSLA
1 JM3009 Stock 1 SHOP
2 DB0208 Stock 1 TWTR
3 AB3011 Stock 1 TWTR
4 CB3004 Stock 1 MSFT
5 JB3004 Stock 2 MSFT
6 JM3009 Stock 2 SPOT
7 DB0208 Stock 2 MSFT
8 AB3011 Stock 2 PTON
9 CB3004 Stock 2 TSLA
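The default melt output uses the generic column names `variable` and `value`, which then have to be renamed or dropped later. As a small variation, `pd.melt` accepts `var_name` and `value_name` to set those names up front. The sketch assumes df1 has the lowercase `username` column and the space-separated `Stock 1` / `Stock 2` columns shown in this output.

```python
import pandas as pd

df1 = pd.DataFrame({
    "username": ["JB3004", "JM3009", "DB0208", "AB3011", "CB3004"],
    "Stock 1": ["TSLA", "SHOP", "TWTR", "TWTR", "MSFT"],
    "Stock 2": ["MSFT", "SPOT", "MSFT", "PTON", "TSLA"],
})

# var_name / value_name replace the default "variable" / "value" column names.
df1m = pd.melt(
    df1,
    id_vars=["username"],
    value_vars=["Stock 1", "Stock 2"],
    var_name="holding_id",
    value_name="stock",
)
```

With these names, a later merge against a price table that also has a `stock` column can use `on="stock"` directly, and no duplicate ticker column needs to be dropped.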
Merge the DataFrames.
df = pd.merge(df1m, df2t, left_on="value", right_on="stock", sort=False)
Output:
username variable value date stock closing_price
0 JB3004 Stock 1 TSLA 2020-12-11 TSLA 609.99
1 CB3004 Stock 2 TSLA 2020-12-11 TSLA 609.99
2 JM3009 Stock 1 SHOP 2020-12-11 SHOP 1057.87
3 DB0208 Stock 1 TWTR 2020-12-11 TWTR 51.44
4 AB3011 Stock 1 TWTR 2020-12-11 TWTR 51.44
5 CB3004 Stock 1 MSFT 2020-12-11 MSFT 213.26
6 JB3004 Stock 2 MSFT 2020-12-11 MSFT 213.26
7 DB0208 Stock 2 MSFT 2020-12-11 MSFT 213.26
8 JM3009 Stock 2 SPOT 2020-12-11 SPOT 341.22
9 AB3011 Stock 2 PTON 2020-12-11 PTON 117.1
Do some cleanup, then pivot to get a usable result
df = df.drop("value", axis=1).rename(columns={"variable": "holding_id"})
df = df.pivot(index="username", columns="holding_id", values=["stock", "closing_price"]).rename(columns=lambda x: x.strip())
Output:
stock closing_price
holding_id Stock 1 Stock 2 Stock 1 Stock 2
username
AB3011 TWTR PTON 51.44 117.1
CB3004 MSFT TSLA 213.26 609.99
DB0208 TWTR MSFT 51.44 213.26
JB3004 TSLA MSFT 609.99 213.26
JM3009 SHOP SPOT 1057.87 341.22
Selecting data through the MultiIndex is straightforward
df.loc[:,"stock"]["Stock 1"]
Output:
username
AB3011 TWTR
CB3004 MSFT
DB0208 TWTR
JB3004 TSLA
JM3009 SHOP
Name: Stock 1, dtype: object
Or include a username to narrow the selection to one user:
df.loc["AB3011","stock"]["Stock 1"]
Output:
'TWTR'
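The selections above chain two indexing steps (`.loc[...]` followed by `[...]`). An equivalent form, sketched below, passes the full column key as a tuple in a single `.loc` call, which is generally the safer habit if the selection is ever used for assignment. The small frame here is hand-built to mimic the two-level column layout of the pivoted result, with only two users to keep it short.

```python
import pandas as pd

# Small frame with the same two-level column layout as the pivoted result.
columns = pd.MultiIndex.from_product(
    [["stock", "closing_price"], ["Stock 1", "Stock 2"]],
    names=[None, "holding_id"],
)
df = pd.DataFrame(
    [["TWTR", "PTON", 51.44, 117.1],
     ["MSFT", "TSLA", 213.26, 609.99]],
    index=pd.Index(["AB3011", "CB3004"], name="username"),
    columns=columns,
)

# One .loc call with a (level 0, level 1) tuple instead of chained indexing.
first_picks = df.loc[:, ("stock", "Stock 1")]       # Series: first pick per user
one_pick = df.loc["AB3011", ("stock", "Stock 1")]   # scalar for a single user

# Selecting only the top level returns both closing prices for that user.
ab_close = df.loc["AB3011", "closing_price"]
```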
https://stackoverflow.com/questions/65267138