一尘不染

Java: How to parse XML using a SAX parser

java

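It works well, but I want it to return an array containing all the strings, instead of a single string holding only the last element.

Any ideas how to do this?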


2020-03-02

1 answer

一尘不染

So you want to build an XML parser to parse an RSS feed like this one.

<rss version="0.92">
<channel>
    <title>MyTitle</title>
    <link>http://myurl.com</link>
    <description>MyDescription</description>
    <lastBuildDate>SomeDate</lastBuildDate>
    <docs>http://someurl.com</docs>
    <language>SomeLanguage</language>

    <item>
        <title>TitleOne</title>
        <description><![CDATA[Some text.]]></description>
        <link>http://linktoarticle.com</link>
    </item>

    <item>
        <title>TitleTwo</title>
        <description><![CDATA[Some other text.]]></description>
        <link>http://linktoanotherarticle.com</link>
    </item>

</channel>
</rss>

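Now, there are two SAX implementations you can work with: org.xml.sax or android.sax. I will explain the pros and cons of both after posting a short example of each.

android.sax implementation

Let's start with the android.sax implementation.

You first have to define the XML structure using RootElement and Element objects.

Either way, I would use POJOs (Plain Old Java Objects) to hold your data. Here are the POJOs needed.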

Channel.java

import java.io.Serializable;

public class Channel implements Serializable {

    private Items items;
    private String title;
    private String link;
    private String description;
    private String lastBuildDate;
    private String docs;
    private String language;

    public Channel() {
        setItems(null);
        setTitle(null);
        // set every field to null in the constructor
    }

    public void setItems(Items items) {
        this.items = items;
    }

    public Items getItems() {
        return items;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getTitle() {
        return title;
    }
    // rest of the class looks similar so just setters and getters
}

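This class implements the Serializable interface, so you can put it into a Bundle and do something with it.

Now we need a class to hold our items. In this case I am just going to extend the ArrayList class.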

Items.java

import java.util.ArrayList;

public class Items extends ArrayList<Item> {

    public Items() {
        super();
    }

}

That's it for our items container. Now we need a class to hold the data of each individual item.

Item.java

import java.io.Serializable;

public class Item implements Serializable {

    private String title;
    private String description;
    private String link;

    public Item() {
        setTitle(null);
        setDescription(null);
        setLink(null);
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getTitle() {
        return title;
    }

    // setters and getters for description and link follow the same pattern.

}

Example:

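import java.io.IOException;
import java.io.InputStream;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import android.sax.Element;
import android.sax.EndElementListener;
import android.sax.EndTextElementListener;
import android.sax.RootElement;
import android.sax.StartElementListener;
import android.util.Xml;

public class Example extends DefaultHandler {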

    private Channel channel;
    private Items items;
    private Item item;

    public Example() {
        items = new Items();
    }

    public Channel parse(InputStream is) {
        RootElement root = new RootElement("rss");
        Element chanElement = root.getChild("channel");
        Element chanTitle = chanElement.getChild("title");
        Element chanLink = chanElement.getChild("link");
        Element chanDescription = chanElement.getChild("description");
        Element chanLastBuildDate = chanElement.getChild("lastBuildDate");
        Element chanDocs = chanElement.getChild("docs");
        Element chanLanguage = chanElement.getChild("language");

        Element chanItem = chanElement.getChild("item");
        Element itemTitle = chanItem.getChild("title");
        Element itemDescription = chanItem.getChild("description");
        Element itemLink = chanItem.getChild("link");

        chanElement.setStartElementListener(new StartElementListener() {
            public void start(Attributes attributes) {
                channel = new Channel();
            }
        });

        // Listen for the end of a text element and set the text as our
        // channel's title.
        chanTitle.setEndTextElementListener(new EndTextElementListener() {
            public void end(String body) {
                channel.setTitle(body);
            }
        });

        // The other channel elements (link, description, ...) are handled the same way.

        // On every <item> tag occurrence we create a new Item object.
        chanItem.setStartElementListener(new StartElementListener() {
            public void start(Attributes attributes) {
                item = new Item();
            }
        });

        // On every </item> tag occurrence we add the current Item object
        // to the Items container.
        chanItem.setEndElementListener(new EndElementListener() {
            public void end() {
                items.add(item);
            }
        });

        itemTitle.setEndTextElementListener(new EndTextElementListener() {
            public void end(String body) {
                item.setTitle(body);
            }
        });

        // and so on

        // here we actually parse the InputStream and return the resulting
        // Channel object.
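        try {
            Xml.parse(is, Xml.Encoding.UTF_8, root.getContentHandler());
            // Attach the collected items; otherwise the returned Channel
            // would never receive them.
            if (channel != null) {
                channel.setItems(items);
            }
            return channel;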
        } catch (SAXException e) {
            // handle the exception
        } catch (IOException e) {
            // handle the exception
        }

        return null;
    }

}

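As you can see, this is a very simple example. The major advantage of the android.sax SAX implementation is that you can define the structure of the XML you have to parse and then just add event listeners to the appropriate elements. The disadvantage is that the code gets quite repetitive and bloated.

org.xml.sax implementation

The org.xml.sax SAX handler implementation is a bit different.

Here you don't specify or declare the XML structure; you just listen for events. The most widely used events are the following: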

  • Document Start
  • Document End
  • Element Start
  • Element End
  • Characters between Element Start and Element End
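To make the order of these events concrete, here is a minimal sketch that records each callback as it fires. It assumes a desktop JVM with the standard javax.xml.parsers API rather than Android; the class name SaxEventTrace and its trace method are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records every SAX callback as a readable string.
class SaxEventTrace extends DefaultHandler {

    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks; whitespace-only chunks are skipped.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("characters " + text);
        }
    }

    // Parses the given XML string and returns the recorded events in order.
    static List<String> trace(String xml) throws Exception {
        SaxEventTrace handler = new SaxEventTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<item><title>TitleOne</title></item>")) {
            System.out.println(event);
        }
    }
}
```

The callbacks arrive in document order, so the start of an element is always reported before its text and its end. Note that a parser is allowed to deliver one piece of text through several characters calls, which is why handlers usually append to a buffer instead of assigning.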

An example handler implementation using the Channel object from above looks like this.

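import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ExampleHandler extends DefaultHandler {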

    private Channel channel;
    private Items items;
    private Item item;
    private boolean inItem = false;

    private StringBuilder content;

    public ExampleHandler() {
        items = new Items();
        content = new StringBuilder();
    }

    public void startElement(String uri, String localName, String qName, 
            Attributes atts) throws SAXException {
        content = new StringBuilder();
        if(localName.equalsIgnoreCase("channel")) {
            channel = new Channel();
        } else if(localName.equalsIgnoreCase("item")) {
            inItem = true;
            item = new Item();
        }
    }

    public void endElement(String uri, String localName, String qName) 
            throws SAXException {
        if(localName.equalsIgnoreCase("title")) {
            if(inItem) {
                item.setTitle(content.toString());
            } else {
                channel.setTitle(content.toString());
            }
        } else if(localName.equalsIgnoreCase("link")) {
            if(inItem) {
                item.setLink(content.toString());
            } else {
                channel.setLink(content.toString());
            }
        } else if(localName.equalsIgnoreCase("description")) {
            if(inItem) {
                item.setDescription(content.toString());
            } else {
                channel.setDescription(content.toString());
            }
        } else if(localName.equalsIgnoreCase("lastBuildDate")) {
            channel.setLastBuildDate(content.toString());
        } else if(localName.equalsIgnoreCase("docs")) {
            channel.setDocs(content.toString());
        } else if(localName.equalsIgnoreCase("language")) {
            channel.setLanguage(content.toString());
        } else if(localName.equalsIgnoreCase("item")) {
            inItem = false;
            items.add(item);
        } else if(localName.equalsIgnoreCase("channel")) {
            channel.setItems(items);
        }
    }

    public void characters(char[] ch, int start, int length) 
            throws SAXException {
        content.append(ch, start, length);
    }

    public void endDocument() throws SAXException {
        // you can do something here for example send
        // the Channel object somewhere or whatever.
    }

}
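Coming back to the original question (getting all strings instead of only the last one), the same pattern can be reduced to a small sketch that appends every item title to a list. This assumes a desktop JVM with javax.xml.parsers; the class ItemTitleCollector and the method parseTitles are hypothetical names, not part of the code above. Namespace awareness is switched on because the handler compares localName, which a parser that is not namespace-aware may leave empty.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the <title> of every <item> into a list.
class ItemTitleCollector extends DefaultHandler {

    private final List<String> titles = new ArrayList<>();
    private final StringBuilder content = new StringBuilder();
    private boolean inItem = false;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        content.setLength(0); // start collecting text for the new element
        if (localName.equals("item")) {
            inItem = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (localName.equals("title") && inItem) {
            titles.add(content.toString()); // append, never overwrite
        } else if (localName.equals("item")) {
            inItem = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    // Parses the stream and returns the titles of all items in document order.
    static List<String> parseTitles(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // ensures localName is populated
        ItemTitleCollector handler = new ItemTitleCollector();
        factory.newSAXParser().parse(in, handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss version=\"0.92\"><channel><title>MyTitle</title>"
                + "<item><title>TitleOne</title></item>"
                + "<item><title>TitleTwo</title></item>"
                + "</channel></rss>";
        InputStream in = new ByteArrayInputStream(
                xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parseTitles(in)); // prints [TitleOne, TitleTwo]
    }
}
```

The key point is that the result is added to a collection in endElement. A handler that assigns each value to a single String field will only ever hold the last element seen.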

To be honest, I can't really name any real advantage of this handler implementation over the android.sax one. I can, however, tell you the disadvantage, which should be pretty obvious by now. Take a look at the else if statement in the startElement method. Because the tags title, link and description occur both in the channel and in each item, we have to track where we currently are in the XML structure. That is, when we encounter an <item> start tag we set the inItem flag to true to make sure the data is mapped to the correct object, and in the endElement method we set the flag back to false when we encounter an </item> tag, to signal that we are done with that item.

In this example that is fairly easy to manage, but parsing a more complex structure with repeating tags at different levels becomes tricky. There you would have to use, for example, enums to hold the current state plus a lot of switch/case statements to check where you are, or, as a more elegant solution, some kind of tag tracker built on a tag stack.
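As a rough sketch of the tag stack idea, the handler below pushes each element name on a stack when it opens and pops it when it closes, so the current position is always available as a path. It assumes a desktop JVM with javax.xml.parsers; the class PathTrackingHandler, its fields and the path strings are hypothetical and only illustrate the technique.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical tag tracker: the stack of open elements replaces the inItem flag.
class PathTrackingHandler extends DefaultHandler {

    private final Deque<String> openTags = new ArrayDeque<>();
    private final StringBuilder content = new StringBuilder();

    String channelTitle;
    final List<String> itemTitles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        openTags.addLast(qName); // push the element that just opened
        content.setLength(0);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Join the stack from root to leaf, e.g. "rss/channel/item/title".
        String path = String.join("/", openTags);
        String text = content.toString();
        switch (path) {
            case "rss/channel/title":
                channelTitle = text;
                break;
            case "rss/channel/item/title":
                itemTitles.add(text);
                break;
            default:
                break; // other elements are ignored in this sketch
        }
        openTags.removeLast(); // pop the element that just closed
        content.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    // Parses the given XML string and returns the populated handler.
    static PathTrackingHandler parse(String xml) throws Exception {
        PathTrackingHandler handler = new PathTrackingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        PathTrackingHandler h = parse("<rss><channel><title>MyTitle</title>"
                + "<item><title>TitleOne</title></item></channel></rss>");
        System.out.println(h.channelTitle + " " + h.itemTitles); // prints MyTitle [TitleOne]
    }
}
```

Because the full path distinguishes rss/channel/title from rss/channel/item/title, no extra flag is needed per nesting level, and a deeper structure only adds more case labels.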