Question: Python file operations

Stack Overflow user

Asked on 2009-11-09 07:19:51

3 answers, 1.9K views, 0 followers, 1 vote

Suppose I have a folder structure like this:

  rootfolder
      | 
     / \ \
    01 02 03 ....
    |
  13_itemname.xml

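Under my root folder, each directory represents a month (01, 02, 03, ...). Inside each month directory there are items named by creation hour and item name, such as 16_item1.xml, 24_item1.xml and so on. As you may guess, there are several items, and an XML file is created for each of them every hour.

Now I want to do two things:

I need to generate the list of item names for a month (say 01), for example item1, item2 and item3.
I need to filter by item. For item1, say, I want to read from 01_item1.xml to 24_item1.xml.

How can I do this in Python in a simple way?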

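3 answers

Stack Overflow user

Accepted answer

Posted on 2009-11-09 08:06:24

Here are two ways to do what you asked for (if I understood you correctly). One with a regex, one without. Pick whichever you like ;)

One bit that may look like magic is "setdefault". See the docs for an explanation. I leave it as an "exercise for the reader" to understand how it works ;)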
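
For readers who have not met setdefault before, here is a minimal sketch of what it does. The file names are made up for illustration and no real directory is read. The functions below use the same call to collect file names per item.

```python
# Group file names by item name using dict.setdefault.
# The names in this list are illustrative, not taken from a real folder.
files = ["01_item1.xml", "02_item1.xml", "01_item2.xml"]

items = {}
for file_name in files:
    # "01_item1.xml" -> "item1"
    item = file_name.split("_", 1)[1][:-4]
    # setdefault returns items[item] if the key already exists; otherwise
    # it stores the default (an empty set) under that key and returns it.
    items.setdefault(item, set()).add(file_name)

print(sorted(items))
print(sorted(items["item1"]))
```

Without setdefault you would need an explicit "if item not in items" check before each add.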

from os import listdir
from os.path import join

DATA_ROOT = "testdata"

def folder_items_no_regex(month_name):

   # dict holding the items (assuming ordering is irrelevant)
   items = {}

   # 1. Loop through all filenames in said folder
   for file in listdir( join( DATA_ROOT, month_name ) ):
      # partition never raises, unlike unpacking the result of split
      date, sep, name = file.partition( "_" )

      # skip files that were not possible to split on "_"
      if not sep or not date or not name:
         continue

      # ignore non-.xml files
      if not name.endswith(".xml"):
         continue

      # cut off the ".xml" extension
      name = name[0:-4]

      # keep a set of filenames per item
      items.setdefault( name, set() ).add( file )

   return items

def folder_items_regex(month_name):

   import re

   # The pattern:
   # 1. match the beginning of line "^"
   # 2. capture 1 or more digits ( \d+ )
   # 3. match the "_"
   # 4. capture any character (as few as possible ): (.*?)
   # 5. match ".xml"
   # 6. match the end of line "$"
   pattern = re.compile( r"^(\d+)_(.*?)\.xml$" )

   # dict holding the items (assuming ordering is irrelevant)
   items = {}

   # 1. Loop through all filenames in said folder
   for file in listdir( join( DATA_ROOT, month_name ) ):

      match = pattern.match( file )
      if not match:
         continue

      date, name = match.groups()

      # keep a set of filenames per item
      items.setdefault( name, set() ).add( file )

   return items

if __name__ == "__main__":
   from pprint import pprint

   # run both variants on the same month folder
   for folder_items in ( folder_items_no_regex, folder_items_regex ):
      data = folder_items( "02" )

      print( "--- The dict ---------------" )
      pprint( data )

      print( "--- The items --------------" )
      pprint( sorted( data.keys() ) )

      print( "--- The files for item1 ---- " )
      pprint( sorted( data["item1"] ) )
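
If you want to try the regex idea without preparing a "testdata" folder, the following is a self-contained sketch. It builds a throwaway directory with made-up file names. The helper name group_items and the use of collections.defaultdict in place of setdefault are my own choices for this illustration. It also sorts by the numeric prefix, which matters only if the prefixes are not zero-padded.

```python
import os
import re
import tempfile
from collections import defaultdict

PATTERN = re.compile(r"^(\d+)_(.*?)\.xml$")

def group_items(folder):
    """Map item name -> list of file names, sorted by numeric prefix."""
    groups = defaultdict(list)
    for file_name in os.listdir(folder):
        match = PATTERN.match(file_name)
        if match:
            hour, item = match.groups()
            groups[item].append((int(hour), file_name))
    # sort each group by the integer prefix, then drop the prefix again
    return {item: [name for _, name in sorted(pairs)]
            for item, pairs in groups.items()}

# Build a temporary month folder (hypothetical file names) and try it out.
with tempfile.TemporaryDirectory() as root:
    month = os.path.join(root, "01")
    os.mkdir(month)
    for name in ["2_item1.xml", "10_item1.xml", "13_item2.xml", "notes.txt"]:
        open(os.path.join(month, name), "w").close()
    result = group_items(month)

print(sorted(result))
print(result["item1"])
```

A plain string sort would put "10_item1.xml" before "2_item1.xml", so the integer sort is the safer choice when the padding is not guaranteed.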

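Votes: 5

Stack Overflow user

Posted on 2009-11-09 08:17:09

Assuming the file names have a fixed-length prefix and suffix (for example a 3-character prefix such as '01_' and the 4-character suffix '.xml'), you can solve the first part of the problem like this: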

import os

names = set(name[3:-4] for name in os.listdir('01') if name.endswith('.xml'))

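This gives you the set of unique item names.

To filter by item, just look for the files whose names end with that item's name, and sort them if needed.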

item_suffix = '_item2.xml'
filtered = sorted(name for name in os.listdir('01') if name.endswith(item_suffix))
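
If the fixed-length assumption does not hold (say some prefixes have one digit), a possible variation is to cut at the first underscore instead of at a fixed offset. This is only a sketch. The helper item_name and the sample names are hypothetical.

```python
def item_name(file_name):
    """Return the item part of '<prefix>_<item>.xml', or None if it does not fit."""
    prefix, sep, rest = file_name.partition("_")
    if not sep or not rest.endswith(".xml"):
        return None
    return rest[:-len(".xml")]

# Sample names standing in for os.listdir('01'); not from a real folder.
file_names = ["01_item1.xml", "7_item2.xml", "24_item1.xml", "readme.txt"]

# Collect the unique item names, dropping files that did not match.
names = {item_name(f) for f in file_names} - {None}
print(sorted(names))
```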

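Votes: 0

Stack Overflow user

Posted on 2009-11-09 08:23:44

Not sure exactly what you are trying to do, but here are some hints that may be useful.

Creating the names ("%02d" means pad with zeros on the left to two digits):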

foldernames = ["%02d"%i for i in range(1,13)]

# range(1,25) covers 01 through 24, the prefixes mentioned in the question
filenames = ["%02d"%i for i in range(1,25)]

Use os.path.join to build paths instead of string concatenation:

os.path.join(foldername,filename)

Use os.path.exists to check first whether a file exists:

if os.path.exists(newname):
    print("file already exists")

To list directory contents, use glob:

from glob import glob
xmlfiles = glob("*.xml")

Use shutil for higher-level operations such as copying directory trees and moving or renaming files:

shutil.move(oldname,newname)

Use os.path.basename to get the file name from a full path:

filename = os.path.basename(fullpath)
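
Putting a few of these hints together, here is a rough sketch that lists all files for one item in one month folder. It creates a temporary folder with made-up file names so it can run anywhere. The layout and names are assumptions based on the question, not a real data set.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as root:
    # Create a hypothetical month folder "01" with a few empty files.
    month = os.path.join(root, "01")
    os.mkdir(month)
    for name in ["01_item1.xml", "02_item1.xml", "01_item2.xml"]:
        open(os.path.join(month, name), "w").close()

    # All files for item1 in month 01, as full paths in sorted order.
    matches = sorted(glob(os.path.join(month, "*_item1.xml")))
    # Strip the directory part to get just the file names.
    base_names = [os.path.basename(path) for path in matches]

print(base_names)
```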

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/1699552
