Parsing RSS-XML using JSTL tag library

RSS, or Really Simple Syndication, is a method of sharing and broadcasting content such as news from a website. Using XML, items such as news articles can be automatically downloaded into a news reader or published onto another website. There are two ways of using RSS: to share your data with others, or to harvest others’ data for your site. The first version of RSS, called RDF Site Summary, was created by Ramanathan V. Guha at Netscape in March 1999. This version became known as RSS 0.9. Version 0.91 was launched by Dan Libby of Netscape in July 1999; he also renamed RSS “Rich Site Summary”.

Introduction to JSTL

The JavaServer Pages Standard Tag Library (JSTL) is a component of the Java Enterprise Edition web application development platform (J2EE 1.4 SDK), released by Sun Microsystems. JSTL provides an effective way to embed logic in a JSP page without using Java code directly. Using a standardized tag set, rather than breaking in and out of Java code, leads to more maintainable code and enables a separation of concerns between developing the application code and the user interface.

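Commonly used JAR files

To use JSTL, the JSTL JAR files must be placed in the WEB-INF/lib folder of your web application (in an Eclipse Dynamic Web Project, WebContent/WEB-INF/lib).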

Parsing XML using JSTL

The JSTL XML tags, used with the x prefix, can parse XML documents. Let us keep the XML content in a file called data.xml with the following content:
<!-- Filename: data.xml -->
<persons>
	<student>
		<name>Mary</name>
		<age>12</age>
		<class>5</class>
	</student>
	<student>
		<name>John</name>
		<age>13</age>
		<class>6</class>
	</student>
</persons>
Assume that this file is placed in the same directory as the JSP page that parses it. The following JSP page, ShowStudents.jsp, parses data.xml and displays each student as a row in a table:
<!-- Filename: ShowStudents.jsp -->
<%@taglib uri="http://java.sun.com/jsp/jstl/xml" prefix="x"%>
<%@taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c"%>
<HTML>
<HEAD>
	<TITLE>Display Student Information</TITLE>
	<STYLE>
		table {border: 1px solid blue; width: 300px;}
	</style>
</HEAD>
<BODY>
	
	<c:import var="xmlDoc" url="data.xml"/>
	
	<x:parse var="parsedDocument" xml="${xmlDoc}"/>
	<table>
		<tr>
			<th>Name</th>
			<th>Age</th>
			<th>Class</th>
		</tr>
		<x:forEach select="$parsedDocument/persons/student">
		<tr>
			<td> <x:out select="name" /> </td>
			<td> <x:out select="age" /> </td>
			<td> <x:out select="class" /> </td>
		</tr>
		</x:forEach>
	</table>
</BODY>
</HTML>
Understanding the code

To parse an XML document, we first import it using the <c:import> tag, which reads the content of the given URL into a variable. In the example above, the content of data.xml is copied into the variable xmlDoc. The imported XML content is then parsed with the <x:parse> tag, so the variable parsedDocument holds the parsed XML document and can be used to access its child elements and attributes. The <x:forEach> tag iterates over the nodes selected by an XPath expression; here we iterate over the <student> elements with <x:forEach select="$parsedDocument/persons/student">. Inside the loop, child elements of the current <student> are accessed with relative XPath expressions, so <x:out select="name" /> prints the value of that student's <name> element.
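The x tags work by evaluating XPath expressions against the parsed document. To make that concrete, here is a plain-Java sketch of the same steps using the JDK's JAXP APIs (DocumentBuilder for parsing, XPath for selection). It only illustrates the idea and is not how the JSTL implementation is written; the class StudentTable, its method readStudents and the inlined DATA_XML string (a copy of data.xml) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class StudentTable {
    // Hypothetical inline copy of data.xml so the sketch runs on its own.
    static final String DATA_XML =
            "<persons>"
            + "<student><name>Mary</name><age>12</age><class>5</class></student>"
            + "<student><name>John</name><age>13</age><class>6</class></student>"
            + "</persons>";

    /** Returns one {name, age, class} row per student, like the rows of the JSP table. */
    static List<String[]> readStudents(String xml) throws Exception {
        // Comparable to <x:parse>: turn the imported text into a DOM document.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Comparable to <x:forEach select="$parsedDocument/persons/student">.
        NodeList students = (NodeList) xpath.evaluate(
                "/persons/student", doc, XPathConstants.NODESET);

        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < students.getLength(); i++) {
            Node student = students.item(i);
            // Relative paths are evaluated against the current <student>,
            // comparable to <x:out select="name" /> inside the loop.
            rows.add(new String[] {
                xpath.evaluate("name", student),
                xpath.evaluate("age", student),
                xpath.evaluate("class", student)
            });
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        for (String[] row : readStudents(DATA_XML)) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

Relative expressions such as "name" are the key point: once a node is selected as the context, its children can be addressed without repeating the full path, which is exactly what the JSP relies on inside <x:forEach>.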

Understanding structure of RSS 2.0

Since an RSS 2.0 feed must be a well-formed XML document, it normally begins with the XML declaration.
<?xml version="1.0"?>
The root element of an RSS 2.0 document is <rss>, which contains a single <channel> element. All of the feed content goes inside these two elements.
<rss version="2.0">
  <channel>
Next comes information about the feed itself, such as its title, link, description, language and dates:
<title>Google News Feed</title>
<link>http://news.google.com</link>
<description>Google News Feed</description>
<language>en-us</language>
<pubDate>Tue, 10 Jun 2008 04:00:00 GMT</pubDate>
<lastBuildDate>Tue, 10 Jun 2008 09:41:01 GMT</lastBuildDate>
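The pubDate and lastBuildDate values above use the RFC 822 date format that RSS 2.0 specifies. If you need them as real dates rather than display strings, a minimal Java sketch could rely on the JDK's DateTimeFormatter.RFC_1123_DATE_TIME; the class RssDates and the method parseRssDate are hypothetical names, and the formatter covers only the common four-digit-year form.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class RssDates {
    /**
     * Parses an RSS date such as "Tue, 10 Jun 2008 04:00:00 GMT".
     * Assumption: feeds use four-digit years and "GMT" or a numeric offset like "+0200".
     * RFC_1123_DATE_TIME rejects two-digit years and zone names such as "EST",
     * which RFC 822 also allows.
     */
    static ZonedDateTime parseRssDate(String text) {
        return ZonedDateTime.parse(text.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    public static void main(String[] args) {
        ZonedDateTime published = parseRssDate("Tue, 10 Jun 2008 04:00:00 GMT");
        ZonedDateTime lastBuilt = parseRssDate("Tue, 10 Jun 2008 09:41:01 GMT");
        // Time between the channel's publication date and its last rebuild.
        System.out.println(Duration.between(published, lastBuilt)); // PT5H41M1S
    }
}
```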
A channel may contain any number of <item>s. An item may represent a “story”, much like a story in a newspaper or magazine; if so, its description is a synopsis of the story and the link points to the full story. An item may also be complete in itself, in which case the description contains the text (entity-encoded HTML is allowed) and the link and title may be omitted. All elements of an item are optional; however, at least one of title or description must be present. The following example item has a title, link, description, publication date and guid:
<item>
  <title>Star City</title>
  <link>http://liftoff.msfc.nasa.gov/news/2003/news-starcity.asp</link>
  <description>
     How do Americans get ready to work with Russians aboard the International Space Station?
  </description>
  <pubDate>Tue, 03 Jun 2008 09:39:21 GMT</pubDate>
  <guid>http://liftoff.msfc.nasa.gov/2008/06/03.html</guid>
</item>

Elements of <channel>

  • title: The name of the channel.
  • link: The URL of the HTML website corresponding to the channel.
  • description: A phrase or sentence describing the channel.
  • language: The language the channel is written in.
  • copyright: Copyright notice for content in the channel.
  • managingEditor: Email address of the person responsible for editorial content.
  • webMaster: Email address of the person responsible for technical issues relating to the channel.
  • pubDate: The publication date for the content in the channel.
  • category: Specifies one or more categories that the channel belongs to.
  • generator: A string indicating the program used to generate the channel.
  • image: Specifies a GIF, JPEG or PNG image that can be displayed with the channel.

Elements of <item>

  • title: The title of the item.
  • link: The URL of the item.
  • description: The item synopsis.
  • author: Email address of the author of the item.
  • category: Includes the item in one or more categories.
  • comments: URL of a page for comments relating to the item.
  • enclosure: Describes a media object that is attached to the item.
  • guid: A string that uniquely identifies the item.
  • pubDate: Indicates when the item was published.
  • source: The RSS channel that the item came from.
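The rule that every item needs at least one of title or description can be checked with a single XPath predicate. A minimal Java sketch, assuming the feed is already available as a string; the class ItemCheck and the method countInvalidItems are hypothetical names, and the sample feed is made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ItemCheck {
    /** Counts <item> elements that have neither a <title> nor a <description> child. */
    static int countInvalidItems(String rssXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(rssXml)));
        // The predicate keeps only items missing both required-alternative elements.
        NodeList bad = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "/rss/channel/item[not(title) and not(description)]",
                doc, XPathConstants.NODESET);
        return bad.getLength();
    }

    public static void main(String[] args) throws Exception {
        String feed = "<rss version=\"2.0\"><channel><title>Demo</title>"
                + "<item><title>Star City</title></item>"
                + "<item><description>Text only</description></item>"
                + "<item><link>http://example.com/no-title</link></item>"
                + "</channel></rss>";
        System.out.println(countInvalidItems(feed)); // 1: the item that has only a <link>
    }
}
```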

Parsing RSS using JSTL

We will now create a JSP page that parses a given RSS feed and displays it using JSTL. The page has a text box in which the user enters the feed URL. The URL is submitted to the same JSP page, which imports the feed with the <c:import> tag and parses it with the <x:parse> tag.
<!-- Filename: FeedReader.jsp -->
<%@taglib uri="http://java.sun.com/jsp/jstl/xml" prefix="x"%>
<%@taglib uri="http://java.sun.com/jsp/jstl/core" prefix="c"%>
<HTML>
<HEAD>
	<TITLE>Feed Reader in JSTL</TITLE>
	<STYLE>
		table 
             {border: 2px ridge ; width: 500px}
		#feed .title
             {font-family: Arial; font-weight: bold; font-size: 18px}
		#feed .label
             {font-family: Tahoma; font-weight: bold; font-size: 11px}
		#feed td 
             {font-family: Tahoma; font-size: 11px}
	</style>
</HEAD>
<BODY>
	<form>
		<input type="text" name="feedURL" 
			value="http://news.google.com/?output=rss" /> 
		<input type="submit" value="Display"/>
	</form>

	<c:if test="${not empty param.feedURL}">
		
	Feed URL: <c:out value="${param.feedURL}"/>

  	<c:import var="xmlContent" url="${param.feedURL}"/>
	
	<x:parse var="doc" xml="${xmlContent}"/>
    	
    	
    <table class="content-table" id="feed">
    <tr class="profile_odd">
    	<td align="center" colspan="2">  
    		<span class="title">
               <x:out select="$doc/rss/channel/title" />
    		</span> 
    	</td>
    </tr>
     <x:forEach var="story" 
                select="$doc/rss/channel/item" varStatus="status">
     	<tr>
     		<td colspan="2"> <hr/> </td>
     	</tr>
        <tr class="profile_even">
          <td class="label">Topic</td>
          <td> <x:out select="title" /> </td>	
        </tr>
        <tr class="profile_even">
          <td class="label">Published Date</td>
          <td> <x:out select="pubDate" /> </td>	
        </tr>
        <tr class="profile_even">
          <td class="label">Category</td>
          <td> <x:out select="category" /> </td>	
        </tr>
        <tr class="" valign="top">
        	<td class="label">Description</td>
        	<td><x:out select="description" escapeXml="false"/></td>
        </tr>
      </x:forEach>
    </table>
	</c:if>
</BODY>
</HTML>
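For comparison, the steps of FeedReader.jsp (parse the feed, read the channel title, then loop over each item) can be sketched in plain Java with JAXP, reusing the JSP's XPath expressions. The class FeedSummary, its Story holder and its helper methods are hypothetical, and the sample feed string stands in for the content that <c:import> would fetch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FeedSummary {
    /** One row of the JSP's table: Topic, Published Date, Category. */
    static final class Story {
        final String title, pubDate, category;
        Story(String title, String pubDate, String category) {
            this.title = title;
            this.pubDate = pubDate;
            this.category = category;
        }
    }

    static Document parse(InputSource source) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    /** Same expression as <x:out select="$doc/rss/channel/title" />. */
    static String channelTitle(Document doc) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate("/rss/channel/title", doc);
    }

    /** Same selection as <x:forEach select="$doc/rss/channel/item">. */
    static List<Story> stories(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList items = (NodeList) xpath.evaluate(
                "/rss/channel/item", doc, XPathConstants.NODESET);
        List<Story> result = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Node item = items.item(i);
            // A missing child element (e.g. no <category>) evaluates to an empty string.
            result.add(new Story(
                    xpath.evaluate("title", item),
                    xpath.evaluate("pubDate", item),
                    xpath.evaluate("category", item)));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample feed standing in for the imported URL content.
        String sample = "<rss version=\"2.0\"><channel><title>Google News Feed</title>"
                + "<item><title>Star City</title>"
                + "<pubDate>Tue, 03 Jun 2008 09:39:21 GMT</pubDate>"
                + "<category>Space</category></item>"
                + "</channel></rss>";
        Document doc = parse(new InputSource(new StringReader(sample)));
        System.out.println(channelTitle(doc));
        for (Story story : stories(doc)) {
            System.out.println(story.title + " | " + story.pubDate + " | " + story.category);
        }
    }
}
```

To read a live feed instead of the sample string, you could pass new InputSource(feedUrl), with the URL as the system ID, to parse(); the parser then fetches the document itself, much as <c:import> does in the JSP.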


Further Reading

This tutorial on parsing RSS uses Java technologies (JSP and JSTL) and focuses on the RSS 2.0 format. Other free, open source tools do similar work for different feed formats such as RSS 0.90, RSS 0.91 and Atom. Readers interested in these formats or tools can refer to the following links:
  • RSS 1.0 Specification (http://web.resource.org/rss/1.0).
  • Project Rome: Open Source Java tools for parsing, generating and publishing RSS and Atom feeds. (https://rome.dev.java.net/).
  • Parsing RSS in .NET (http://www.rssdotnet.com/).
  • Parsing RSS in Ajax/Javascript (http://www.xml.com/pub/a/2006/09/13/rss-and-ajax-a-simple-news-reader.html).

15 Comments

  1. Akmed says:

    Thanks for this. http://viralpatel.net/ is now in my feed reader; I’ll keep an eye out for your next story. I like the layout of your site, nice and clean and easy to read. Thanks.

  2. Viral says:

    Thanks Akmed for the comment :)
    Watch this space for more fun on tech side.

  3. Koolz says:

    Hi Viral,
    Thanks a lot for that info. Since I am totally new to web development, I have some queries. What are the prerequisites for running the JSTL RSS feed reader that you have created above? Can I simply run the above JSP page from my desktop and see if Google News is getting parsed? If not, what are the steps to be followed? At present, when I try running it using Mozilla, the taglib portion is displayed as it is and no parsing occurs. Can you please help out? Thanks a lot

  4. Viral Patel says:

    Hi Koolz,
    As I learned from your comment, you are probably new to web development. I think you first have to set up a basic environment for running JSP/JSTL on your machine. For that you may want to install a servlet container like Tomcat 5.5, then add this project to it and run it. Try to search for some basic information about JSP on the internet. This much info may not be sufficient for a newbie to start JSP development, but believe me, it is pretty easy. Just stick with it and try to read some books about JSP/Servlets etc.
    Hope this will help..

  5. Amit says:

    I am getting the following error in the code you provided. Any clues about this error?
    org.apache.jasper.JasperException: Exception in JSP: /rssparse.jsp:37

    • Viral Patel says:

      It seems that your code contains html tags and thus is truncated. Can you wrap your code in a <pre> tag?

  6. Amit says:

    Hi Viral,

    I am not getting any data from the RSS parser. Here is my code. I have confirmed that the rss is loading in the xml variable. But after that I don’t get any result just an empty page. Please help me.

  7. Amit says:

    I am not able to post any code.

    • Viral Patel says:

      @amit: wrap your code inside <pre> tag.

  8. Ami says:

    Hi Viral,
    I have to parse a Medline Citation XML using JAXB. Can u plz help me?
    I have already given my full week on R and D…
    Thankx in advance

  9. starssswin says:

    Thanks, been looking for this.

  10. Jerez says:

    Hi Viral,
    Thanks a lot for this one.
    If I want to parse the tag like
    How can I do?

  11. sumit says:

    hey viral, can u please update the lib links (jar files) to be included? Those posted above are not working.
    And please tell me where exactly I should put those jar files.

    • Viral Patel says:

      Hi, I fixed the broken links and added correct JARs. In your Dynamic web project, you must put these jars in WebContent/WEB-INF/lib folder.

  12. ashish jain says:

    Nice article. Sometimes simple solution works best.
