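The observation data I received this year is in nc (NetCDF) format. To keep last year's scripts working, one option is to convert the observations to a CSV table first. The NC file's header information is as follows: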
dimensions:
time = 1
station = 3956 // unlimited
strlen = 30
variables:
character stid ( station, strlen )
float lon ( station )
units : degrees_east
longname : longitude
float lat ( station )
units : degrees_east
longname : latitude
float elev ( station )
units : meter
longname : elevation
integer wd10a ( station, time )
units : degree
longname : Wind Direction,10 minute average value
float ws10a ( station, time )
units : m/s
longname : Wind speed,10 minute average value
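The script uses three libraries: netCDF4 to read the file, numpy to decode the station IDs, and pandas to write the CSV.
Example script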
import netCDF4 as nc
import numpy as np
import pandas as pd

filename = "20210301100000.nc"  # input NetCDF file
fout = "test.csv"               # output CSV file

fn = nc.Dataset(filename, "r")

# stid is a 2-D character array (station, strlen); join each row into one string
stid = fn.variables['stid']
stid = np.apply_along_axis(lambda x: x.tobytes().decode("utf-8"), 1, stid[:].data)

lon = fn.variables['lon']
lat = fn.variables['lat']
elev = fn.variables['elev']
wd10a = fn.variables['wd10a']  # Wind Direction,10 minute average value
ws10a = fn.variables['ws10a']  # Wind speed,10 minute average value

df = pd.DataFrame({
    'stid': stid,
    'lon': lon[:],
    'lat': lat[:],
    'elev': elev[:],
    'wd10a': wd10a[:, 0],  # must be 1-D, so take the only time step
    'ws10a': ws10a[:, 0],
})
df.to_csv(fout, index=False)
fn.close()
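One thing the script leaves implicit is how missing observations end up in the CSV. netCDF4 typically returns masked arrays, and wd10a is stored as an integer, which cannot hold NaN. The sketch below, assuming that behaviour, shows one way to turn a (station, time) variable with a single time step into a plain 1-D float column with NaN for masked entries. The helper name to_column and the sample values are made up for illustration and are not part of the original script.

```python
import numpy as np

def to_column(values):
    """Return a 1-D float array; masked entries become NaN (hypothetical helper)."""
    arr = np.ma.asarray(values)
    # A (station, time) variable with a single time step collapses to (station,).
    if arr.ndim == 2 and arr.shape[1] == 1:
        arr = arr[:, 0]
    if arr.ndim != 1:
        raise ValueError(f"expected one value per station, got shape {arr.shape}")
    # Cast to float first so integer variables such as wd10a can hold NaN.
    return arr.astype(float).filled(np.nan)

# Made-up stand-in for fn.variables['wd10a'][:]: 3 stations, 1 time step,
# with the second station's reading missing.
wd10a_demo = np.ma.masked_array([[350], [0], [90]],
                                mask=[[False], [True], [False]],
                                dtype=np.int32)
wd10a_col = to_column(wd10a_demo)
print(wd10a_col)
```

In the script, a column could then be written as 'wd10a': to_column(wd10a[:]). With its default settings, pandas writes NaN as an empty field in the CSV.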
Also note how stid is handled. The raw content of the stid variable looks like this:
[[b'5' b'4' b'3' ... b'' b'' b'']
[b'5' b'4' b'4' ... b'' b'' b'']
[b'5' b'4' b'4' ... b'' b'' b'']
...
[b'C' b'S' b'2' ... b'' b'' b'']
[b'C' b'S' b'2' ... b'' b'' b'']
[b'C' b'S' b'2' ... b'' b'' b'']]
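This is a 2-D character variable: dimension 0 indexes the stations, and dimension 1 holds each station's ID, one character per element. np.apply_along_axis applies the anonymous function lambda x: x.tobytes().decode("utf-8") to every row, which joins the raw bytes of that row into a single byte string and decodes it from UTF-8 into a regular string.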
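The row-wise decoding can be tried without the original file. The following is a minimal sketch that builds a small character array shaped like stid and decodes it with a plain list comprehension instead of np.apply_along_axis. The station IDs, the strlen of 8, and the helper names make_char_array and decode_ids are all invented for the demonstration.

```python
import numpy as np

def make_char_array(ids, strlen):
    """Build a (station, strlen) array of single bytes, NUL-padded like stid."""
    rows = [np.frombuffer(s.encode("ascii").ljust(strlen, b"\x00"), dtype="S1")
            for s in ids]
    return np.array(rows)

def decode_ids(char_array):
    """Join each row into one string and drop the trailing NUL/space padding."""
    return [row.tobytes().decode("utf-8").rstrip("\x00 ") for row in char_array]

# Hypothetical stand-in for fn.variables['stid'][:].data
raw = make_char_array(["54399", "CS2001"], strlen=8)
ids = decode_ids(raw)
print(ids)
```

The list comprehension is used here because np.apply_along_axis sizes its output from the first result, so stripping the padding inside the lambda could truncate later, longer IDs. netCDF4 also provides a chartostring function for this kind of conversion, which may be worth trying as an alternative.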