
Problem parsing certain elements with xml.etree.ElementTree

Stack Overflow user
Asked 2021-09-12 17:48:36

I hope you are doing well. I'm running into some difficulties with the parser. My dataset looks like this:
<?xml version="1.0"?>

<bugrepository name="AspectJ">
  <bug id="28974" opendate="2003-1-3 10:28:00" fixdate="2003-1-14 14:30:00">
    <buginformation>
      <summary>"Compiler error when introducing a ""final"" field"</summary>
      <description>The aspecs the problem...</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/AjcMemberMaker.java</file>
    </fixedFiles>
  </bug>

  <bug id="28919" opendate="2002-12-30 16:40:00" fixdate="2003-1-14 15:06:00">
    <buginformation>
      <summary>waever tries to weave into native methods ...</summary>
      <description>If youat org.aspectj.ajdt.internal.core.burce</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/bcel/LazyMethodGen.java</file>
    </fixedFiles>
  </bug>
  
  <bug id="29186" opendate="2003-1-8 21:22:00" fixdate="2003-1-14 16:43:00">
    <buginformation>
      <summary>ajc -emacssym chokes on pointcut that includes an intertype method</summary>
      <description>This ;void Foo.ajc$before$Foo</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/Lint.java</file>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/Shadow.java</file>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/bcel/BcelWeaver.java</file>
    </fixedFiles>
  </bug>
  
  <bug id="29769" opendate="2003-1-19 11:42:00" fixdate="2003-1-24 21:17:00">
    <buginformation>
      <summary>Ajde does not support new AspectJ 1.1 compiler options</summary>
      <description>The org.aspectj.ajpiler. This enhancement is needed byort.</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/ajde/testdata/examples/figures-coverage/figures/Figure.java</file>
      <file>org.aspectj/modules/ajde/testsrc/org/aspectj/ajde/AjdeTests.java</file>
      <file>org.aspectj/modules/ajde/testsrc/org/aspectj/ajde/ui/StructureViewManagerTest.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/ajc/BuildArgParser.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/core/builder/AjBuildConfig.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/testsrc/org/aspectj/ajdt/ajc/BuildArgParserTestCase.java</file>
    </fixedFiles>
  </bug>
  <bug id="29959" opendate="2003-1-22 7:10:00" fixdate="2003-2-13 16:00:00">
    <buginformation>
      <summary>super call in intertype method declaration body causes VerifyError</summary>
      <description>AspectJ Compiler 1.1 showstopper</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/compiler/ast/InterTypeConstructorDeclaration.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/compiler/ast/SuperFixerVisitor.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/compiler/lookup/InterTypeMethodBinding.java</file>
      <file>org.aspectj/modules/tests/bugs/SuperToIntro.java</file>
    </fixedFiles>
  </bug>
</bugrepository>

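I would like to extract certain elements of the dataset so that I can use them in a pandas DataFrame.

The first problem is getting all the child elements of a tag as a list.

At the moment my code either retrieves only the first element and ignores the others, or retrieves all of them but not in the structure I want. In this first version, the column contains only empty ([]) lists with no content.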

Code:
import pandas as pd 
from xml.etree.ElementTree import parse

document = parse('dataset.xml')
summary = []
description = []
fixedfile = []

for item in document.iterfind('bug'):
    summary.append(item.findtext('buginformation/summary'))
    description.append(item.findtext('buginformation/description'))
    fixedfile.append(item.findall('fixedFiles/file'))  # list of Element objects, not their text
    
#df = pd.DataFrame({'summary':summary, 'description':description, 'fixed_files':fixedfile})
df = pd.DataFrame({'fixed_files': fixedfile})
df
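
Why the cells look empty: `findall()` returns a list of `Element` objects, not strings. A `<file>` element has no child elements, so its `len()` is 0, and pandas most likely renders such iterable objects as empty lists. A minimal check, using a trimmed inline sample as an assumed stand-in for `dataset.xml`:

```python
import xml.etree.ElementTree as ET

# Trimmed sample standing in for dataset.xml (assumption for illustration).
XML = """<bugrepository name="AspectJ">
  <bug id="29186">
    <fixedFiles>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/Lint.java</file>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/Shadow.java</file>
    </fixedFiles>
  </bug>
</bugrepository>"""

root = ET.fromstring(XML)
bug = root.find("bug")

elements = bug.findall("fixedFiles/file")  # list of Element objects
texts = [el.text for el in elements]       # list of strings

print(type(elements[0]).__name__)  # Element
print(len(elements[0]))            # 0: a <file> has no child elements
print(texts)
```

Taking `.text` of each element is what turns the list into something pandas can display meaningfully.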

Here only the first element is retrieved.

Code:
import pandas as pd 
from xml.etree.ElementTree import parse

document = parse('dataset.xml')
summary = []
description = []
fixedfile = []

for item in document.iterfind('bug'):
    summary.append(item.findtext('buginformation/summary'))
    description.append(item.findtext('buginformation/description'))
    fixedfile.append(item.findtext('fixedFiles/file'))  # findtext() returns only the first match
    
#df = pd.DataFrame({'summary':summary, 'description':description, 'fixed_files':fixedfile})
df = pd.DataFrame({'fixed_files': fixedfile})
df
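
The difference between the two attempts can be isolated: `findtext()` returns the text of the first matching element only, while iterating over `iterfind()` (or `findall()`) visits every match. A small sketch with abbreviated file names (an assumption, shortened from the dataset):

```python
import xml.etree.ElementTree as ET

# Trimmed sample (assumption): one bug with three fixed files, names abbreviated.
XML = """<bug id="29186">
  <fixedFiles>
    <file>Lint.java</file>
    <file>Shadow.java</file>
    <file>BcelWeaver.java</file>
  </fixedFiles>
</bug>"""

bug = ET.fromstring(XML)

first_only = bug.findtext("fixedFiles/file")                    # text of the FIRST match
all_files = [f.text for f in bug.iterfind("fixedFiles/file")]  # text of EVERY match

print(first_only)  # Lint.java
print(all_files)   # ['Lint.java', 'Shadow.java', 'BcelWeaver.java']
```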

I found a solution to "Problem traversing an xml.etree.ElementTree tree with Python" that fits my case. It works, but not the way I want (a list for each element): I can load all the elements, but only individually.

Code:
import xml.etree.ElementTree as ET
import pandas as pd

xmldoc = ET.parse('dataset.xml')
root = xmldoc.getroot()
summary = []
description = []
fixedfile = []

for bug in xmldoc.iter(tag='bug'):

    #for item in document.iterfind('bug'):
    #summary.append(item.findtext('buginformation/summary'))
    #description.append(item.findtext('buginformation/description'))

    for file in bug.iterfind('./fixedFiles/file'):
        fixedfile.append([file.text])  # one entry per <file>, not one per <bug>

fixedfile
#df = pd.DataFrame({'summary':summary, 'description':description, 'fixed_files':fixedfile})
df = pd.DataFrame({'fixed_files': fixedfile})
df

When I also try to fill the other columns of my data (summary, description), I get the following error: ValueError: All arrays must be of the same length
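
The error follows from the third attempt appending one entry per `<file>`, while `summary` and `description` get one entry per `<bug>`. For the five bugs above that is 5 summaries against 1 + 1 + 3 + 6 + 4 = 15 file entries, and `pd.DataFrame` rejects columns of different lengths. A stdlib-only sketch of the counting on a trimmed two-bug sample (an assumption):

```python
import xml.etree.ElementTree as ET

# Trimmed two-bug sample (assumption): the first bug has 1 file, the second has 3.
XML = """<bugrepository>
  <bug><buginformation><summary>A</summary></buginformation>
    <fixedFiles><file>a.java</file></fixedFiles></bug>
  <bug><buginformation><summary>B</summary></buginformation>
    <fixedFiles><file>b1.java</file><file>b2.java</file><file>b3.java</file></fixedFiles></bug>
</bugrepository>"""

root = ET.fromstring(XML)
summary, flat_files, grouped_files = [], [], []

for bug in root.iter("bug"):
    summary.append(bug.findtext("buginformation/summary"))
    files = [f.text for f in bug.iterfind("fixedFiles/file")]
    flat_files.extend([name] for name in files)  # one entry per <file>, as in the third attempt
    grouped_files.append(files)                  # one entry per <bug>

print(len(summary), len(flat_files), len(grouped_files))  # 2 4 2
```

Only the per-bug grouping has the same length as `summary`, so it is the one that can share a DataFrame with it.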

The second problem is being able to select, for example, all the tags that have 2 or 3 child elements.

Best regards,

2 Answers

Stack Overflow user

Accepted answer

Answered 2021-09-12 18:04:45

To keep the files in a list associated with each description and summary, collect them into a new list for each bug.

Try:

import pandas as pd
from xml.etree.ElementTree import parse

document = parse('dataset.xml')
summary = []
description = []
fixedfile = []

for item in document.iterfind('bug'):
    summary.append(item.findtext('buginformation/summary'))
    description.append(item.findtext('buginformation/description'))
    fixedfile.append([elt.text for elt in item.findall('fixedFiles/file')])

df = pd.DataFrame({'summary': summary,
                   'description': description,
                   'fixed_files': fixedfile})
df

For the second part, this keeps only the bugs that have two or more files:

newdf = df[df.fixed_files.str.len() >= 2]

If you want only the bugs with 2 or 3 files, then:

newdf = df[(df.fixed_files.str.len() == 2) | (df.fixed_files.str.len() == 3)]
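
`Series.str.len()` works here because the `.str` accessor also accepts list values and returns each list's length. The same 2-or-3 filter can be written more compactly with `Series.between` (inclusive on both ends by default) or `Series.isin`. A small sketch on a hand-made DataFrame whose file names are placeholders:

```python
import pandas as pd

# Hand-made frame (placeholder names) mimicking the fixed_files column built above.
df = pd.DataFrame({
    "summary": ["one file", "three files", "six files", "two files"],
    "fixed_files": [["a"], ["b", "c", "d"], list("efghij"), ["k", "l"]],
})

n_files = df["fixed_files"].str.len()  # length of each list

two_or_three = df[n_files.between(2, 3)]   # 2 <= n <= 3
same_with_isin = df[n_files.isin([2, 3])]  # equivalent for integer counts

print(two_or_three["summary"].tolist())  # ['three files', 'two files']
```

`between` reads better for a range, while `isin` is handy when the allowed counts are not contiguous.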

Stack Overflow user

Answered 2021-09-12 18:10:02

The code below collects the data. The idea is to find all the bug elements and iterate over them; for each bug, look up the required child elements.

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<?xml version="1.0"?>

<bugrepository name="AspectJ">
  <bug id="28974" opendate="2003-1-3 10:28:00" fixdate="2003-1-14 14:30:00">
    <buginformation>
      <summary>"Compiler error when introducing a ""final"" field"</summary>
      <description>The aspecs the problem...</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/AjcMemberMaker.java</file>
    </fixedFiles>
  </bug>

  <bug id="28919" opendate="2002-12-30 16:40:00" fixdate="2003-1-14 15:06:00">
    <buginformation>
      <summary>waever tries to weave into native methods ...</summary>
      <description>If youat org.aspectj.ajdt.internal.core.burce</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/bcel/LazyMethodGen.java</file>
    </fixedFiles>
  </bug>
  
  <bug id="29186" opendate="2003-1-8 21:22:00" fixdate="2003-1-14 16:43:00">
    <buginformation>
      <summary>ajc -emacssym chokes on pointcut that includes an intertype method</summary>
      <description>This ;void Foo.ajc$before$Foo</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/Lint.java</file>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/Shadow.java</file>
      <file>org.aspectj/modules/weaver/src/org/aspectj/weaver/bcel/BcelWeaver.java</file>
    </fixedFiles>
  </bug>
  
  <bug id="29769" opendate="2003-1-19 11:42:00" fixdate="2003-1-24 21:17:00">
    <buginformation>
      <summary>Ajde does not support new AspectJ 1.1 compiler options</summary>
      <description>The org.aspectj.ajpiler. This enhancement is needed byort.</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/ajde/testdata/examples/figures-coverage/figures/Figure.java</file>
      <file>org.aspectj/modules/ajde/testsrc/org/aspectj/ajde/AjdeTests.java</file>
      <file>org.aspectj/modules/ajde/testsrc/org/aspectj/ajde/ui/StructureViewManagerTest.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/ajc/BuildArgParser.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/core/builder/AjBuildConfig.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/testsrc/org/aspectj/ajdt/ajc/BuildArgParserTestCase.java</file>
    </fixedFiles>
  </bug>
  <bug id="29959" opendate="2003-1-22 7:10:00" fixdate="2003-2-13 16:00:00">
    <buginformation>
      <summary>super call in intertype method declaration body causes VerifyError</summary>
      <description>AspectJ Compiler 1.1 showstopper</description>
    </buginformation>
    <fixedFiles>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/compiler/ast/InterTypeConstructorDeclaration.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/compiler/ast/SuperFixerVisitor.java</file>
      <file>org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/compiler/lookup/InterTypeMethodBinding.java</file>
      <file>org.aspectj/modules/tests/bugs/SuperToIntro.java</file>
    </fixedFiles>
  </bug>
  </bugrepository>'''

data = []
root = ET.fromstring(xml)
for bug in root.findall('.//bug'):
    bug_info = bug.find('buginformation')
    fixed_files = bug.find('fixedFiles')
    entry = {'summary': bug_info.find('summary').text,
             'description': bug_info.find('description').text,
             'fixedFiles': [x.text for x in list(fixed_files)]}
    data.append(entry)
for entry in data:
    print(entry)
df = pd.DataFrame(data)

Output

{'summary': '"Compiler error when introducing a ""final"" field"', 'description': 'The aspecs the problem...', 'fixedFiles': ['org.aspectj/modules/weaver/src/org/aspectj/weaver/AjcMemberMaker.java']}
{'summary': 'waever tries to weave into native methods ...', 'description': 'If youat org.aspectj.ajdt.internal.core.burce', 'fixedFiles': ['org.aspectj/modules/weaver/src/org/aspectj/weaver/bcel/LazyMethodGen.java']}
{'summary': 'ajc -emacssym chokes on pointcut that includes an intertype method', 'description': 'This ;void Foo.ajc$before$Foo', 'fixedFiles': ['org.aspectj/modules/weaver/src/org/aspectj/weaver/Lint.java', 'org.aspectj/modules/weaver/src/org/aspectj/weaver/Shadow.java', 'org.aspectj/modules/weaver/src/org/aspectj/weaver/bcel/BcelWeaver.java']}
{'summary': 'Ajde does not support new AspectJ 1.1 compiler options', 'description': 'The org.aspectj.ajpiler. This enhancement is needed byort.', 'fixedFiles': ['org.aspectj/modules/ajde/testdata/examples/figures-coverage/figures/Figure.java', 'org.aspectj/modules/ajde/testsrc/org/aspectj/ajde/AjdeTests.java', 'org.aspectj/modules/ajde/testsrc/org/aspectj/ajde/ui/StructureViewManagerTest.java', 'org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/ajc/BuildArgParser.java', 'org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/core/builder/AjBuildConfig.java', 'org.aspectj/modules/org.aspectj.ajdt.core/testsrc/org/aspectj/ajdt/ajc/BuildArgParserTestCase.java']}
{'summary': 'super call in intertype method declaration body causes VerifyError', 'description': 'AspectJ Compiler 1.1 showstopper', 'fixedFiles': ['org.aspectj/modules/org.aspectj.ajdt.core/src/org/compiler/ast/InterTypeConstructorDeclaration.java', 'org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/compiler/ast/SuperFixerVisitor.java', 'org.aspectj/modules/org.aspectj.ajdt.core/src/org/aspectj/ajdt/internal/compiler/lookup/InterTypeMethodBinding.java', 'org.aspectj/modules/tests/bugs/SuperToIntro.java']}
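
Note that `bug_info.find('summary').text` raises `AttributeError` if a bug has no `<summary>`. A slightly more defensive sketch of the same loop uses `findtext`, which returns `None` (or a supplied default) when the element is missing, and also handles the second part of the question on the tree itself by counting `<file>` children. The helper names `bug_rows` and `bugs_with_file_count` and the trimmed sample are illustrative assumptions:

```python
import xml.etree.ElementTree as ET

# Trimmed sample (assumption): the second bug has no <description>.
XML = """<bugrepository name="AspectJ">
  <bug id="28974">
    <buginformation><summary>S1</summary><description>D1</description></buginformation>
    <fixedFiles><file>A.java</file></fixedFiles>
  </bug>
  <bug id="29186">
    <buginformation><summary>S2</summary></buginformation>
    <fixedFiles><file>B.java</file><file>C.java</file><file>D.java</file></fixedFiles>
  </bug>
</bugrepository>"""


def bug_rows(root):
    """Hypothetical helper: yield one dict per <bug>; missing text fields become None."""
    for bug in root.iter("bug"):
        yield {
            "id": bug.get("id"),
            "summary": bug.findtext("buginformation/summary"),
            "description": bug.findtext("buginformation/description"),
            "fixedFiles": [f.text for f in bug.iterfind("fixedFiles/file")],
        }


def bugs_with_file_count(root, low, high):
    """Hypothetical helper: ids of bugs whose number of <file> children is in [low, high]."""
    return [bug.get("id") for bug in root.iter("bug")
            if low <= len(bug.findall("fixedFiles/file")) <= high]


root = ET.fromstring(XML)
rows = list(bug_rows(root))
print(rows[1]["description"])            # None
print(bugs_with_file_count(root, 2, 3))  # ['29186']
```

The resulting `rows` list can be passed straight to `pd.DataFrame(rows)`.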
Source: Stack Overflow, https://stackoverflow.com/questions/69153935
