
DataFrame is empty in Python?

Stack Overflow user
Asked 2015-12-02 02:52:53
Answers: 1 · Views: 565 · Followers: 0 · Votes: 1

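I am trying to load log files into a DataFrame using pandas. I have two files that I am trying to merge into one. What happens is that the DataFrame ends up empty, which is strange, because the same code works with other log files of the same type.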

Here is the output I get:

rows of df1 146299.000000
columns of df1 6.000000
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame

It reports the correct number of rows and columns, but shows no data inside. What is going on? The code and a data sample are below.

Code:

# Python 2 (print statements), matching the output and traceback shown below.
import os
import pandas as pd

trace_path = '/Users/ramapriyasridharan/Documents/new_exp/new_trace/m3xlarge/01'

client_path = os.path.join(trace_path, 'client')
middleware_path = os.path.join(trace_path, 'middleware')
columns = ['timestamp', 'type', 'wait_at_db_queue', 'db_response_time',
           'wait_server_queue', 'server_response_time']
# Accumulator for the rows of every server log file.
df = pd.DataFrame(columns=columns)
for root, _, files in os.walk(middleware_path):
    for f in files:
        # Only the server worker logs are of interest.
        if 'server' not in f:
            continue
        print 'current file name %s:' % f

        # Join with root so files in subdirectories resolve correctly.
        f1 = os.path.join(root, f)
        df1 = pd.read_csv(f1, header=None, sep=',')
        df1.columns = columns
        print ' rows of df1 %f' % df1.shape[0]
        print 'columns of df1 %f' % df1.shape[1]
        print 'len of df1 %f' % len(df1)
        # refine() is defined elsewhere in the script (not shown here).
        df1 = refine(df1)
        print df1
        if df.shape[0] == 0:
            df = df1
            print df
        else:
            df = pd.concat([df, df1], axis=0)
            print df
print df
print ' rows of df %f' % df.shape[0]
print 'columns of df %f' % df.shape[1]

Full output:

 python find_service_time.py 
current file name rsridhar-serverworker-1448992797827.log:
 rows of df1 146299.000000
columns of df1 6.000000
len of df1 146299.000000
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
current file name rsridhar-serverworker-1448992805710.log:
 rows of df1 194827.000000
columns of df1 6.000000
len of df1 194827.000000
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
Empty DataFrame
Columns: [timestamp, type, wait_at_db_queue, db_response_time, wait_server_queue, server_response_time]
Index: []
 rows of df 0.000000
columns of df 6.000000
 len of refined df 0.000000
min timestamp : nan
done
Traceback (most recent call last):
  File "find_service_time.py", line 170, in <module>
    main()
  File "find_service_time.py", line 94, in main
    t_per_sec = map(lambda x: len(df[df['timestamp']==x]), range(1,int(np.max(df['timestamp']))))
ValueError: cannot convert float NaN to integer

Sample data:

1448992805978,GET_QUEUE,1,2,0,2
1448992805978,SEND_MSG,18,147,1,157
1448992805978,SEND_MSG,26,153,0,159
1448992805979,SEND_MSG,20,149,1,163
1448992805979,GET_QUEUE,1,3,1,4
1448992805980,GET_QUEUE,1,3,0,3
1448992805981,GET_QUEUE,2,3,1,4
1448992805981,GET_QUEUE,1,3,1,4
1448992805982,SEND_MSG,5,129,0,133
1448992805983,GET_QUEUE,1,8,0,8
1448992805983,GET_QUEUE,3,5,1,6
1448992805983,GET_QUEUE,0,1,5,6
1448992805984,GET_QUEUE,3,5,2,7
1448992805984,GET_QUEUE,2,5,1,7
1448992805985,GET_QUEUE,0,5,3,8
1448992805985,GET_QUEUE,5,10,0,10
1448992805986,GET_QUEUE,4,9,1,10
1448992805986,GET_QUEUE,9,10,0,10
1448992805987,GET_QUEUE,0,7,3,10
1448992805987,GET_QUEUE,4,5,5,10
1448992805988,GET_QUEUE,5,6,5,11
1448992805989,GET_QUEUE,2,6,6,12
1448992805990,GET_QUEUE,1,4,7,11
1448992805990,GET_QUEUE,0,2,8,10
1448992805991,GET_QUEUE,5,10,4,14
1448992805991,GET_QUEUE,2,4,8,12
1448992805991,GET_QUEUE,0,6,7,13
1448992805992,GET_QUEUE,11,16,0,16
1448992805992,GET_QUEUE,0,4,9,13
1448992805993,GET_QUEUE,4,6,8,14
1448992805992,GET_QUEUE,8,15,0,15
1448992805993,GET_QUEUE,1,7,8,15
1448992805993,GET_QUEUE,1,7,8,15
1448992805993,GET_QUEUE,0,10,6,16
1448992805993,GET_QUEUE,6,9,7,16
1448992805994,GET_QUEUE,1,6,8,14
1448992805994,GET_LATEST_MSG_DELETE,1,8,7,15
1448992805995,GET_QUEUE,2,7,9,16
1448992805995,GET_QUEUE,4,6,6,12
1448992805996,GET_QUEUE,10,20,0,20
1448992805996,GET_QUEUE,12,13,6,19

Any suggestions are welcome; this is only a small part of the code.


1 Answer

Stack Overflow user

Answered 2015-12-02 04:00:11

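refine() is not removing some of the rows from the DataFrame; it is removing all of them. You have a print df1 right after calling it, and the output shows Empty DataFrame every time. The most immediate problem appears to be whatever filtering you are doing in there.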
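One way to confirm this is to report the row count before and after each filtering step. The sketch below is written for Python 3 and makes several assumptions: the body of refine() here is a hypothetical stand-in (the real one is not shown in the question) that deliberately drops every row, checked() is an illustrative helper name, and the three input rows are copied from the sample data.

```python
import io

import numpy as np
import pandas as pd

COLUMNS = ['timestamp', 'type', 'wait_at_db_queue', 'db_response_time',
           'wait_server_queue', 'server_response_time']

# A few rows copied from the sample data in the question.
SAMPLE = """\
1448992805978,GET_QUEUE,1,2,0,2
1448992805978,SEND_MSG,18,147,1,157
1448992805979,GET_QUEUE,1,3,1,4
"""


def refine(df):
    # HYPOTHETICAL stand-in: the real refine() is not shown in the question.
    # This condition matches no row, which reproduces the symptom.
    return df[df['timestamp'] < 0]


def checked(step, df):
    """Apply one filtering step and report how many rows it kept."""
    out = step(df)
    print('%s: %d rows in, %d rows out' % (step.__name__, len(df), len(out)))
    return out


df1 = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS)
refined = checked(refine, df1)

# The traceback in the question comes from int(np.max(...)) on an empty
# column, so stop early when nothing survived the filtering.
if refined.empty:
    print('refine() removed every row; inspect its filter conditions')
else:
    print('max timestamp:', int(np.max(refined['timestamp'])))
```

If the "rows out" count is 0 for a file that had rows going in, the conditions inside refine() are the place to look. The same emptiness check before the per-second counting would also avoid the ValueError at the end of the full output.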

Votes: 1
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/34027891
