
Class Webcrawler.Crawler.URLNode

java.lang.Object
   |
   +----Webcrawler.Crawler.URLNode

public class URLNode
extends Object
The superclass of all node classes. A URLNode itself cannot be loaded (e.g. a mail link) and, unlike an HTMLNode, has no descendants. Every node has a URL, sits at a certain depth in the tree, has a father (except the root), has a nodeState in the Crawler cycle (waiting, reading, ...) and can carry an explanatory infoText. So far there are two further node classes: LoadableNode and HTMLNode.

See Also:
URLTree, LoadableNode, HTMLNode

Variable Index

 o addressInParentFile
The address of this page as it was referenced in the parent file.
 o depth
The depth of this node in the tree (0=root)
 o done
 o father
The father of this node (the root doesn't have one)
 o infoText
Contains, for example, info about errors
 o myURL
The URL of this node.
 o nodeState
The state of this node in the Crawler cycle; one of the static final constants notRead ... done (default: notRead)
 o notRead
 o parsing
 o reading
 o waitingToBeParsed
 o waitingToBeRead

Constructor Index

 o URLNode()
Use this Constructor for malformed URLs.
 o URLNode(String)
Creates a new URLNode with the specified url.
 o URLNode(URL, String)
Use this Constructor for resolving relative URLs.

Method Index

 o copy(URLNode)
When a node is detected to be recursive (i.e. its URL has been downloaded before), the info of the already loaded node can simply be copied into the new node, which then does not need to be loaded again.
 o getAddressInParentFile()
 o getDepth()
 o getFather()
 o getInfo()
 o getNodeState()
 o getURL()
 o toString()

Variables

 o notRead
 public static final int notRead
 o waitingToBeRead
 public static final int waitingToBeRead
 o reading
 public static final int reading
 o waitingToBeParsed
 public static final int waitingToBeParsed
 o parsing
 public static final int parsing
 o done
 public static final int done
 o addressInParentFile
 protected String addressInParentFile
The address of this page as it was referenced in the parent file, e.g. addressInParentFile="support.html" while myURL="http://www.x.com/support.html". myURL contains the full URL (relative URLs are resolved).

 o myURL
 protected URL myURL
The URL of this node. Contains the full URL (relative URLs are resolved)

 o depth
 protected int depth
The depth of this node in the tree (0=root)

 o father
 protected URLNode father
The father of this node (the root doesn't have one)

 o nodeState
 protected int nodeState
The state of this node in the Crawler cycle; one of the static final constants above (default: notRead)
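
A minimal sketch of how a nodeState value might be turned into a readable name, for example when logging the Crawler cycle. The constant names are taken from this class; the numeric values and the describe helper are assumptions, since this page does not list the actual values.

```java
// Hypothetical stand-in for the URLNode state constants. The names come from
// this documentation; the numeric values are ASSUMED, they are not documented.
class NodeStateSketch {
    static final int notRead = 0;
    static final int waitingToBeRead = 1;
    static final int reading = 2;
    static final int waitingToBeParsed = 3;
    static final int parsing = 4;
    static final int done = 5;

    // Hypothetical helper: maps a nodeState value to the name of its constant.
    static String describe(int nodeState) {
        switch (nodeState) {
            case notRead:           return "notRead";
            case waitingToBeRead:   return "waitingToBeRead";
            case reading:           return "reading";
            case waitingToBeParsed: return "waitingToBeParsed";
            case parsing:           return "parsing";
            case done:              return "done";
            default:                return "unknown (" + nodeState + ")";
        }
    }

    public static void main(String[] args) {
        int nodeState = notRead; // the documented default
        System.out.println(describe(nodeState));
        System.out.println(describe(parsing));
    }
}
```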

 o infoText
 protected String infoText
Contains, for example, info about errors

Constructors

 o URLNode
 public URLNode()
Use this Constructor for malformed URLs. Set the infoText accordingly.

 o URLNode
 public URLNode(String url) throws MalformedURLException
Creates a new URLNode with the specified url. If that URL is malformed, a MalformedURLException is thrown. It should be caught; the URLNode() constructor should then be used instead and the infoText set accordingly.
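
The catch-and-fall-back pattern described here could look roughly like the following sketch. NodeSketch is a hypothetical stand-in for URLNode that keeps only the two fields involved, and the create method and the wording of the infoText are assumptions, not part of this class.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical stand-in for URLNode, reduced to the fields this pattern touches.
class NodeSketch {
    protected URL myURL;        // stays null when the address is malformed
    protected String infoText;  // carries the error description

    // Mirrors URLNode(): a node without a valid URL.
    NodeSketch() { }

    // Mirrors URLNode(String). URL(String) is deprecated since Java 20
    // but still available.
    @SuppressWarnings("deprecation")
    NodeSketch(String url) throws MalformedURLException {
        myURL = new URL(url);
    }

    // Hypothetical factory: try the String constructor first and fall back
    // to the empty constructor, recording the problem in infoText.
    static NodeSketch create(String address) {
        try {
            return new NodeSketch(address);
        } catch (MalformedURLException e) {
            NodeSketch broken = new NodeSketch();
            broken.infoText = "Malformed URL: " + address;
            return broken;
        }
    }

    public static void main(String[] args) {
        NodeSketch ok = create("http://www.x.com/support.html");
        NodeSketch bad = create("not a url");
        System.out.println(ok.myURL + " / " + ok.infoText);
        System.out.println(bad.myURL + " / " + bad.infoText);
    }
}
```

Since infoText is protected, code like this has to live in the same package as the node class or in a subclass.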

 o URLNode
 public URLNode(URL context,
                String spec) throws MalformedURLException
Use this Constructor for resolving relative URLs. For more information about resolving relative URLs, see the documentation of the class java.net.URL.
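
The sketch below shows the java.net.URL(URL context, String spec) resolution that this constructor presumably mirrors, given its matching signature. It uses java.net.URL directly rather than URLNode; the resolve helper and the second host name are made up for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Demonstrates relative URL resolution with java.net.URL only.
class RelativeUrlDemo {

    // Hypothetical helper: resolves a link, exactly as written in the parent
    // page, against that page's URL. The URL constructors are deprecated
    // since Java 20 but still available.
    @SuppressWarnings("deprecation")
    static URL resolve(String parentPage, String addressInParentFile)
            throws MalformedURLException {
        URL context = new URL(parentPage);
        return new URL(context, addressInParentFile);
    }

    public static void main(String[] args) throws MalformedURLException {
        // A relative link replaces the last path segment of the parent page.
        System.out.println(resolve("http://www.x.com/index.html", "support.html"));
        // An absolute link ignores the context entirely.
        System.out.println(resolve("http://www.x.com/index.html", "http://www.y.org/a.html"));
    }
}
```

Here "support.html" plays the role of addressInParentFile and the resolved result the role of myURL.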

Methods

 o copy
 public void copy(URLNode from)
When a node is detected to be recursive (i.e. its URL has been downloaded before), the info of the already loaded node can simply be copied into the new node, which then does not need to be loaded again. The URLTree class uses this method to copy the relevant info. In the URLNode case, none of the URLNode fields need to be copied.

Parameters:
from - the node from which info should be copied into this node
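
A minimal sketch of how a subclass might extend copy while the base implementation copies nothing. All class names and the content field are hypothetical stand-ins; the real LoadableNode and HTMLNode fields are not documented on this page.

```java
// Hypothetical demo; only the copy(...) signature is modeled on URLNode.
class CopyDemo {
    public static void main(String[] args) {
        LoadableNodeSketch loaded = new LoadableNodeSketch();
        loaded.content = "<html>...</html>";

        // A second node for the same URL reuses the data instead of reloading.
        LoadableNodeSketch duplicate = new LoadableNodeSketch();
        duplicate.copy(loaded);
        System.out.println(duplicate.content);
    }
}

// Stand-in for URLNode: nothing in the base class needs to be copied.
class BaseNodeSketch {
    public void copy(BaseNodeSketch from) { }
}

// Stand-in for a loadable subclass with one assumed field.
class LoadableNodeSketch extends BaseNodeSketch {
    protected String content; // ASSUMED: data obtained by loading the node

    @Override
    public void copy(BaseNodeSketch from) {
        super.copy(from);
        // Copy subclass data only if the source node actually carries it.
        if (from instanceof LoadableNodeSketch) {
            content = ((LoadableNodeSketch) from).content;
        }
    }
}
```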
 o getAddressInParentFile
 public String getAddressInParentFile()
Returns:
the address of this page as it was referenced in the parent file
 o getURL
 public URL getURL()
Returns:
the URL of this node
 o getDepth
 public int getDepth()
Returns:
the depth of this node in the tree (0=root)
 o getFather
 public URLNode getFather()
Returns:
the father node, if there is one
 o getNodeState
 public int getNodeState()
Returns:
the nodeState of this node, i.e. where it currently is in the Crawler cycle
 o getInfo
 public String getInfo()
Returns:
the infoText of this node
 o toString
 public String toString()
Returns:
a String representation of this node, or "no URL" if myURL is not set.
Overrides:
toString in class Object
