High-Quality Coding: Querying Data Stored in Date-Named Files with Pandas

MiaoGIS

Last modified: 2019-07-29 18:54:15

Included in the column: Python in AI-IOT

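Consider the following scenario: data is stored in one folder per day, and inside each folder the data is saved as one CSV file per minute.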
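To make the layout concrete, here is a minimal sketch that builds a tiny tree of this shape in a temporary directory. The column names `name`, `value1` and `value2` match the query used later in this article; the helper name `make_sample_tree` and the zero values it writes are assumptions for illustration only, not the author's data.

```python
import csv
import os
import tempfile
from datetime import datetime

# Formats assumed to match the folder and file naming described above.
SUB_DIR_FORMAT = "%Y-%m-%d"
FILE_FORMAT = "%Y-%m-%d_%H-%M.csv"


def make_sample_tree(base_dir, timestamps):
    """Write one placeholder CSV per timestamp as base_dir/<day>/<day>_<HH-MM>.csv."""
    paths = []
    for ts in timestamps:
        day_dir = os.path.join(base_dir, ts.strftime(SUB_DIR_FORMAT))
        os.makedirs(day_dir, exist_ok=True)
        path = os.path.join(day_dir, ts.strftime(FILE_FORMAT))
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["name", "value1", "value2"])
            writer.writerow([12, 0, 0])  # dummy values, not real measurements
        paths.append(path)
    return paths


base_dir = tempfile.mkdtemp()
sample_paths = make_sample_tree(base_dir, [
    datetime(2019, 7, 28, 15, 25),
    datetime(2019, 7, 29, 12, 45),
])
for p in sample_paths:
    print(os.path.relpath(p, base_dir))
```

Each minute-level file lands in the folder of its own day, so the timestamp of every row can be recovered from the file name alone.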

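The files in the 2019-07-28 and 2019-07-29 folders are as follows:

The code is shown below. subDirTimeFormat, fileTimeFormat and requestTimeFormat specify the format used to parse folder names, the format used to parse file names, and the format used to parse the date arguments of a query, respectively: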

import os
from datetime import datetime, timedelta

import pandas as pd

# pd.datetime has been removed from recent pandas versions,
# so the standard library datetime is used instead.
onedayDelta = timedelta(days=1)
baseDir = "D:/Data"
subDirTimeFormat = "%Y-%m-%d"            # folder name format
fileTimeFormat = "%Y-%m-%d_%H-%M.csv"    # file name format
requestTimeFormat = "%Y-%m-%d %H:%M"     # format of the query time arguments


def getData(startTime, endTime, featureID, featureField, resultFields):
    startTime = datetime.strptime(startTime, requestTimeFormat)
    endTime = datetime.strptime(endTime, requestTimeFormat)
    days = (endTime.date() - startTime.date()).days

    # Day folders covered by the request; keep only those that actually exist.
    # The folders are listed on every call so that newly created days are found.
    subDirs = os.listdir(baseDir)
    dates = [(startTime + onedayDelta * x).strftime(subDirTimeFormat) for x in range(days + 1)]
    dates = [d for d in dates if d in subDirs]

    # Full paths of all CSV files in the selected folders, as one flat list.
    # (Flattening through a DataFrame pads ragged days with None and breaks read_csv.)
    files = [os.path.join(baseDir, d, f)
             for d in dates
             for f in sorted(os.listdir(os.path.join(baseDir, d)))]
    if len(files) == 0:
        return {}

    dfs = []
    for path in files:
        # The timestamp of every row in a file is encoded in the file name.
        fileTime = datetime.strptime(os.path.basename(path), fileTimeFormat)
        df = pd.read_csv(path)
        df['datetime'] = fileTime
        df['datetimeTxt'] = fileTime.strftime(requestTimeFormat)
        dfs.append(df)

    df = pd.concat(dfs)
    df = df.set_index([featureField, 'datetime']).sort_index()
    # Index.contains has been removed from recent pandas versions;
    # test membership in the first index level instead.
    if featureID not in df.index.get_level_values(featureField):
        return {}
    result = df.loc[featureID].loc[startTime:endTime].reset_index()[['datetimeTxt'] + resultFields].to_json()
    return result


if __name__ == '__main__':
    result = getData('2019-07-28 05:29', '2019-07-29 17:29', 12, "name", ["value1", "value2"])
    print(result)
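The two parsing steps that the function relies on can be tried in isolation. The following is a minimal sketch using only the standard library; the helper names `day_folders` and `file_time` are hypothetical and exist only for this illustration, while the three format strings are the ones defined in the article's code.

```python
from datetime import datetime, timedelta

SUB_DIR_FORMAT = "%Y-%m-%d"
FILE_FORMAT = "%Y-%m-%d_%H-%M.csv"
REQUEST_FORMAT = "%Y-%m-%d %H:%M"


def day_folders(start_text, end_text):
    """Names of all day folders touched by the requested time range."""
    start = datetime.strptime(start_text, REQUEST_FORMAT)
    end = datetime.strptime(end_text, REQUEST_FORMAT)
    days = (end.date() - start.date()).days
    return [(start + timedelta(days=i)).strftime(SUB_DIR_FORMAT) for i in range(days + 1)]


def file_time(file_name):
    """Timestamp encoded in a file name such as 2019-07-28_15-25.csv."""
    return datetime.strptime(file_name, FILE_FORMAT)


folders = day_folders("2019-07-28 05:29", "2019-07-29 17:29")
print(folders)                          # ['2019-07-28', '2019-07-29']

stamp = file_time("2019-07-28_15-25.csv")
print(stamp.strftime(REQUEST_FORMAT))   # 2019-07-28 15:25
```

Because the `.csv` suffix is part of the format string, a file whose name does not follow the pattern raises `ValueError`, so the data folders are assumed to contain only such files.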

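Let's query the data whose name is 12 between 2019-07-28 05:29 and 2019-07-29 17:29, returning only the value1 and value2 columns. Here is the result of the call:

Comparing the output with the CSV files confirms that the returned result is consistent with the source data.

The rows with name 12 in the individual CSV files are: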

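At 2019-07-28 15:25, value1 and value2 are 12 and 2507, respectively.

At 2019-07-28 16:25, value1 and value2 are 181 and 1425, respectively.

At 2019-07-29 12:45, value1 and value2 are 104 and 2724, respectively.

At 2019-07-29 12:55, value1 and value2 are 147 and 2416, respectively.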
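The selection step can be reproduced on an in-memory frame built from the four rows listed above. This is only a sketch of the indexing technique, not the author's test: the extra row with name 13 is a hypothetical filler added so that the index holds more than one key, and the narrower end time is chosen here to show the time slice excluding a row.

```python
from datetime import datetime

import pandas as pd

rows = [
    (12, datetime(2019, 7, 28, 15, 25), 12, 2507),
    (12, datetime(2019, 7, 28, 16, 25), 181, 1425),
    (12, datetime(2019, 7, 29, 12, 45), 104, 2724),
    (12, datetime(2019, 7, 29, 12, 55), 147, 2416),
    (13, datetime(2019, 7, 28, 15, 25), 0, 0),  # hypothetical filler row
]
df = pd.DataFrame(rows, columns=["name", "datetime", "value1", "value2"])

# Index by (key field, timestamp) and sort so that time slicing works.
df = df.set_index(["name", "datetime"]).sort_index()

# Membership test on the first index level.
has_12 = 12 in df.index.get_level_values("name")
has_99 = 99 in df.index.get_level_values("name")

# Pick one key, then slice its rows by time (both ends inclusive).
start = datetime(2019, 7, 28, 5, 29)
end = datetime(2019, 7, 29, 12, 50)
selected = df.loc[12].loc[start:end].reset_index()
print(selected[["datetime", "value1", "value2"]])
```

With the end time set to 12:50, the 12:55 row falls outside the slice and three rows remain.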

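Because the function takes the key field and the returned columns as parameters, it is more general and easier to extend.

Finally, wrap it in a web API with Tornado. The code is as follows: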

# -*- coding: utf-8 -*-
import os
import json
from datetime import datetime

import tornado.httpserver
import tornado.ioloop
import tornado.options
import tornado.web
from tornado.options import define, options

from DataTools import getData, onedayDelta, requestTimeFormat

define('port', default=8091, help='run on the given port', type=int)


class csvAPIHandler(tornado.web.RequestHandler):
    def get(self):
        # By default, query the last 24 hours.
        now = datetime.now()
        startTime = self.get_argument('start', (now - onedayDelta).strftime(requestTimeFormat))
        endTime = self.get_argument('end', now.strftime(requestTimeFormat))
        # Numeric ids are converted to int so that they match integer key columns.
        rowID = self.get_argument('id', '9999')
        if rowID.isdigit():
            rowID = int(rowID)
        rowKey = self.get_argument('key', 'name')
        # 'cols' is a JSON array holding the names of the columns to return.
        resultCols = json.loads(self.get_argument('cols', '["value1","value2"]'))

        result = getData(startTime, endTime, rowID, rowKey, resultCols)
        # getData returns either a JSON string or an empty dict; send both as JSON.
        self.set_header('Content-Type', 'application/json')
        self.write(result)


if __name__ == '__main__':
    tornado.options.parse_command_line()
    app = tornado.web.Application(
        handlers=[
            (r'/csvAPI', csvAPIHandler),
        ],
        template_path=os.path.join(os.path.curdir, 'templates'),
        static_path=os.path.join(os.path.curdir, 'static'),
        cookie_secret='miaojiyue',
        debug=True,
    )
    http_server = tornado.httpserver.HTTPServer(app)
    http_server.listen(options.port)
    tornado.ioloop.IOLoop.current().start()
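A client has to URL-encode the time arguments (they contain spaces and colons) and pass `cols` as a JSON array. The following sketch shows one way to build such a request URL with the standard library; the helper name `build_query_url` and the host `localhost` are assumptions, while the path `/csvAPI`, the default port 8091 and the parameter names come from the handler above.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(host, start, end, feature_id, key, cols):
    """Build a GET URL for the /csvAPI endpoint (host is an assumption)."""
    params = {
        "start": start,
        "end": end,
        "id": feature_id,
        "key": key,
        "cols": json.dumps(cols),  # the handler parses this with json.loads
    }
    return "http://%s/csvAPI?%s" % (host, urlencode(params))


url = build_query_url(
    "localhost:8091",
    "2019-07-28 05:29", "2019-07-29 17:29",
    12, "name", ["value1", "value2"],
)
print(url)

# Round trip: decode the query string the way a server would see it.
query = parse_qs(urlsplit(url).query)
print(query["start"][0], json.loads(query["cols"][0]))
```

The resulting URL can be opened in a browser or fetched with any HTTP client once the Tornado server is running.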