pandas中有很多的操作技巧需要我们去挖掘,本文中介绍pandas
中的3
种操作奇技淫巧,让pandas
的操作骚动起来。
DF
型数据columns
strings
转成numbers
import pandas as pd
import numpy as np
df1 = pd.read_excel("salesfunnel.xlsx")
df1
.dataframe tbody tr th:only-of-type { vertical-align: middle; } <pre><code>.dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </code></pre>
Account | Name | Rep | Manager | Product | Quantity | Price | Status | |
---|---|---|---|---|---|---|---|---|
0 | 714466 | Trantow-Barrows | Craig Booker | Debra Henley | CPU | 1 | 30000 | presented |
1 | 714466 | Trantow-Barrows | Craig Booker | Debra Henley | Software | 1 | 10000 | presented |
2 | 714466 | Trantow-Barrows | Craig Booker | Debra Henley | Maintenance | 2 | 5000 | pending |
3 | 737550 | Fritsch, Russel and Anderson | Craig Booker | Debra Henley | CPU | 1 | 35000 | declined |
4 | 146832 | Kiehn-Spinka | Daniel Hilton | Debra Henley | CPU | 2 | 65000 | won |
5 | 218895 | Kulas Inc | Daniel Hilton | Debra Henley | CPU | 2 | 40000 | pending |
6 | 218895 | Kulas Inc | Daniel Hilton | Debra Henley | Software | 1 | 10000 | presented |
7 | 412290 | Jerde-Hilpert | John Smith | Debra Henley | Maintenance | 2 | 5000 | pending |
8 | 740150 | Barton LLC | John Smith | Debra Henley | CPU | 1 | 35000 | declined |
9 | 141962 | Herman LLC | Cedric Moss | Fred Anderson | CPU | 2 | 65000 | won |
10 | 163416 | Purdy-Kunde | Cedric Moss | Fred Anderson | CPU | 1 | 30000 | presented |
11 | 239344 | Stokes LLC | Cedric Moss | Fred Anderson | Maintenance | 1 | 5000 | pending |
12 | 239344 | Stokes LLC | Cedric Moss | Fred Anderson | Software | 1 | 10000 | presented |
13 | 307599 | Kassulke, Ondricka and Metz | Wendy Yule | Fred Anderson | Maintenance | 3 | 7000 | won |
14 | 688981 | Keeling LLC | Wendy Yule | Fred Anderson | CPU | 5 | 100000 | won |
15 | 729833 | Koepp Ltd | Wendy Yule | Fred Anderson | CPU | 2 | 65000 | declined |
16 | 729833 | Koepp Ltd | Wendy Yule | Fred Anderson | Monitor | 2 | 5000 | presented |
这个方法是将我们剪贴板中的内容直接变成DataFrame型数据,不需要其他转换。
我们需要事先在表格中剪贴好数据,然后直接运行下面的代码:
# 现在剪贴板中进行赋值,再执行下面的语句
df2 = pd.read_clipboard()
df2
.dataframe tbody tr th:only-of-type { vertical-align: middle; } <pre><code>.dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </code></pre>
Account | Name | Rep | Manager | Product | Quantity | Price | Status | |
---|---|---|---|---|---|---|---|---|
0 | 714466 | Trantow-Barrows | Craig Booker | Debra Henley | CPU | 1 | 30000 | presented |
1 | 714466 | Trantow-Barrows | Craig Booker | Debra Henley | Software | 1 | 10000 | presented |
2 | 714466 | Trantow-Barrows | Craig Booker | Debra Henley | Maintenance | 2 | 5000 | pending |
3 | 737550 | Fritsch, Russel and Anderson | Craig Booker | Debra Henley | CPU | 1 | 35000 | declined |
4 | 146832 | Kiehn-Spinka | Daniel Hilton | Debra Henley | CPU | 2 | 65000 | won |
5 | 218895 | Kulas Inc | Daniel Hilton | Debra Henley | CPU | 2 | 40000 | pending |
6 | 218895 | Kulas Inc | Daniel Hilton | Debra Henley | Software | 1 | 10000 | presented |
7 | 412290 | Jerde-Hilpert | John Smith | Debra Henley | Maintenance | 2 | 5000 | pending |
8 | 740150 | Barton LLC | John Smith | Debra Henley | CPU | 1 | 35000 | declined |
9 | 141962 | Herman LLC | Cedric Moss | Fred Anderson | CPU | 2 | 65000 | won |
10 | 163416 | Purdy-Kunde | Cedric Moss | Fred Anderson | CPU | 1 | 30000 | presented |
11 | 239344 | Stokes LLC | Cedric Moss | Fred Anderson | Maintenance | 1 | 5000 | pending |
12 | 239344 | Stokes LLC | Cedric Moss | Fred Anderson | Software | 1 | 10000 | presented |
13 | 307599 | Kassulke, Ondricka and Metz | Wendy Yule | Fred Anderson | Maintenance | 3 | 7000 | won |
14 | 688981 | Keeling LLC | Wendy Yule | Fred Anderson | CPU | 5 | 100000 | won |
15 | 729833 | Koepp Ltd | Wendy Yule | Fred Anderson | CPU | 2 | 65000 | declined |
16 | 729833 | Koepp Ltd | Wendy Yule | Fred Anderson | Monitor | 2 | 5000 | presented |
df3 = pd.DataFrame({'name':['xiaoming','zhansan','lisi'],
'age':[22,28,25],
'address':['shenzhen','guangzhou','changsha'],
'height':[178,180,176]
})
df3
.dataframe tbody tr th:only-of-type { vertical-align: middle; } <pre><code>.dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </code></pre>
name | age | address | height | |
---|---|---|---|---|
0 | xiaoming | 22 | shenzhen | 178 |
1 | zhansan | 28 | guangzhou | 180 |
2 | lisi | 25 | changsha | 176 |
df3.dtypes
name object
age int64
address object
height int64
dtype: object
# 我们直接选择int64型的数据
df3.select_dtypes(include='int64')
.dataframe tbody tr th:only-of-type { vertical-align: middle; } <pre><code>.dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </code></pre>
age | height | |
---|---|---|
0 | 22 | 178 |
1 | 28 | 180 |
2 | 25 | 176 |
# 选择除了int64之外的数据
df3.select_dtypes(exclude='int64')
.dataframe tbody tr th:only-of-type { vertical-align: middle; } <pre><code>.dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </code></pre>
name | address | |
---|---|---|
0 | xiaoming | shenzhen |
1 | zhansan | guangzhou |
2 | lisi | changsha |
# 同时选择多种数据类型
df3.select_dtypes(include=['int64','object'])
.dataframe tbody tr th:only-of-type { vertical-align: middle; } <pre><code>.dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </code></pre>
name | age | address | height | |
---|---|---|---|---|
0 | xiaoming | 22 | shenzhen | 178 |
1 | zhansan | 28 | guangzhou | 180 |
2 | lisi | 25 | changsha | 176 |
两种方法将字符串改成数值型数据
df4 = pd.DataFrame({'goods':['A','B','C'],
'price':['30','20','60'],
'sales':['50','-','40']}
)
df4
.dataframe tbody tr th:only-of-type { vertical-align: middle; } <pre><code>.dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </code></pre>
goods | price | sales | |
---|---|---|---|
0 | A | 30 | 50 |
1 | B | 20 | - |
2 | C | 60 | 40 |
df4.dtypes
goods object
price object
sales object
dtype: object
# 将price转成int型
# 等价: df4 = df4.astype({'price': 'int'})
df4['price'] = df4['price'].astype('int64')
df4
.dataframe tbody tr th:only-of-type { vertical-align: middle; } <pre><code>.dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </code></pre>
goods | price | sales | |
---|---|---|---|
0 | A | 30 | 50 |
1 | B | 20 | - |
2 | C | 60 | 40 |
df4.dtypes
goods object
price int64
sales object
dtype: object
使用同样的方法转化sales列数据则会报错:
df4['sales'] = pd.to_numeric(df4['sales'], errors='coerce')
df4
.dataframe tbody tr th:only-of-type { vertical-align: middle; } <pre><code>.dataframe tbody tr th { vertical-align: top; } .dataframe thead th { text-align: right; } </code></pre>
goods | price | sales | |
---|---|---|---|
0 | A | 30 | 50.0 |
1 | B | 20 | NaN |
2 | C | 60 | 40.0 |
df4.dtypes
goods object
price int64
sales float64
dtype: object