Class Webcrawler.Crawler.URLNode
java.lang.Object
|
+----Webcrawler.Crawler.URLNode
- public class URLNode
- extends Object
The superclass of all node classes. A plain URLNode cannot be loaded (e.g. a mail link)
and has no descendants (as an HTMLNode does). Every node has a URL, sits at a certain depth in the tree,
has a father (except the root), has a nodeState in the Crawler cycle (waiting, reading, ...), and
can carry an explanatory infoText.
So far there are two more node classes: LoadableNode and HTMLNode.
- See Also:
- URLTree, LoadableNode, HTMLNode
-
addressInParentFile
- The address of this page as it was referenced in the parent file.
-
depth
- The depth of this node in the tree (0=root)
-
done
-
-
father
- The father of this node (the root doesn't have one)
-
infoText
- Contains, for example, info about errors
-
myURL
- The URL of this node.
-
nodeState
- One of the static final state constants (default: notRead)
-
notRead
-
-
parsing
-
-
reading
-
-
waitingToBeParsed
-
-
waitingToBeRead
-
-
URLNode()
- Use this constructor for malformed URLs.
-
URLNode(String)
- Creates a new URLNode with the specified url.
-
URLNode(URL, String)
- Use this constructor for resolving relative URLs.
-
copy(URLNode)
-
When a node is detected to be recursive (= it has been downloaded before), the info
of the already loaded node can simply be copied into the new node, which then does
not need to be loaded again.
-
getAddressInParentFile()
-
-
getDepth()
-
-
getFather()
-
-
getInfo()
-
-
getNodeState()
-
-
getURL()
-
-
toString()
-
notRead
public static final int notRead
waitingToBeRead
public static final int waitingToBeRead
reading
public static final int reading
waitingToBeParsed
public static final int waitingToBeParsed
parsing
public static final int parsing
done
public static final int done
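The page does not list the integer values behind these constants. As a rough sketch only, assuming (hypothetically) that they are numbered 0 to 5 in the order shown above, a caller could turn a nodeState into a readable label as follows. The class `NodeStateLabels` and its `label` method are illustrative names and are not part of this package.

```java
// Hypothetical sketch: the constant values below are ASSUMED,
// they are not taken from the real URLNode class.
class NodeStateLabels {
    static final int notRead = 0;
    static final int waitingToBeRead = 1;
    static final int reading = 2;
    static final int waitingToBeParsed = 3;
    static final int parsing = 4;
    static final int done = 5;

    // Returns a readable name for a state value, or "unknown" for any other value.
    static String label(int nodeState) {
        switch (nodeState) {
            case notRead:           return "notRead";
            case waitingToBeRead:   return "waitingToBeRead";
            case reading:           return "reading";
            case waitingToBeParsed: return "waitingToBeParsed";
            case parsing:           return "parsing";
            case done:              return "done";
            default:                return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(label(reading));
    }
}
```

In real code the constants would be referenced as `URLNode.notRead`, `URLNode.reading`, and so on, so that the actual values never need to be known.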
addressInParentFile
protected String addressInParentFile
- The address of this page as it was referenced in the parent file,
e.g. addressInParentFile="support.html" but myURL="http://www.x.com/support.html".
myURL contains the full URL (relative ones are resolved).
myURL
protected URL myURL
- The URL of this node. Contains the full URL (relative URLs are resolved)
depth
protected int depth
- The depth of this node in the tree (0=root)
father
protected URLNode father
- The father of this node (the root doesn't have one)
nodeState
protected int nodeState
- One of the static final state constants above (default: notRead)
infoText
protected String infoText
- Contains, for example, info about errors
URLNode
public URLNode()
- Use this constructor for malformed URLs. Set the infoText accordingly.
URLNode
public URLNode(String url) throws MalformedURLException
- Creates a new URLNode with the specified url. If that URL is malformed,
a MalformedURLException is thrown. The caller should catch it, create the node with the
URLNode() constructor instead, and set the infoText accordingly.
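The fallback described here might look roughly like the sketch below. Because the real class is not reproduced on this page, `SketchNode` is a hypothetical stand-in that only mimics the two documented constructors and the infoText field. The `create` helper and the wording of the error message are assumptions as well.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical stand-in for URLNode, reduced to what this sketch needs.
class SketchNode {
    protected URL myURL;
    protected String infoText;

    // Counterpart of URLNode(): used when no valid URL is available.
    SketchNode() { }

    // Counterpart of URLNode(String): fails on a malformed URL.
    SketchNode(String url) throws MalformedURLException {
        myURL = new URL(url);
    }

    URL getURL() { return myURL; }
    String getInfo() { return infoText; }

    // Assumed helper (not part of the documented API): try the String
    // constructor first, fall back to the empty constructor on failure.
    static SketchNode create(String url) {
        try {
            return new SketchNode(url);
        } catch (MalformedURLException e) {
            SketchNode node = new SketchNode();
            node.infoText = "Malformed URL: " + url;
            return node;
        }
    }

    public static void main(String[] args) {
        System.out.println(create("http://www.x.com/support.html").getURL());
        System.out.println(create("not a url").getInfo());
    }
}
```

Keeping the fallback in one place means the rest of the crawler always receives a node, and can check the infoText to find out why a link was not followed.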
URLNode
public URLNode(URL context,
String spec) throws MalformedURLException
- Use this constructor for resolving relative URLs. For more information about resolving
relative URLs, see the documentation of the class java.net.URL.
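As an illustration of the resolution this constructor relies on, the sketch below calls the two-argument java.net.URL constructor directly. The class name `RelativeUrlDemo`, the `resolve` helper, and the page names `index.html` and `docs/` are made up for the example. Newer JDKs mark the URL constructors as deprecated, but they are still available.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RelativeUrlDemo {
    // Resolves spec against context using java.net.URL(URL, String).
    static String resolve(String context, String spec) throws MalformedURLException {
        URL base = new URL(context);
        URL resolved = new URL(base, spec);
        return resolved.toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        // A link "support.html" found in http://www.x.com/index.html
        System.out.println(resolve("http://www.x.com/index.html", "support.html"));
        // A link that climbs one directory
        System.out.println(resolve("http://www.x.com/docs/index.html", "../support.html"));
    }
}
```

In terms of this class, `spec` corresponds to addressInParentFile and the resolved result to myURL.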
copy
public void copy(URLNode from)
- When a node is detected to be recursive (= it has been downloaded before), the info
of the already loaded node can simply be copied into the new node, which then does
not need to be loaded again. The URLTree class uses this method to copy the relevant info.
In the URLNode case, none of the URLNode fields need to be copied.
- Parameters:
- from - the node from which info should be copied into this node
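How a subclass might build on this method can be sketched as follows. Both classes are hypothetical stand-ins, and the `content` field is an assumption made for the example. The page does not say which fields LoadableNode or HTMLNode actually copy.

```java
// Hypothetical stand-in for URLNode: the base class copies nothing,
// matching the behaviour documented above.
class SketchBaseNode {
    public void copy(SketchBaseNode from) { }
}

// Hypothetical subclass with one ASSUMED field holding downloaded data.
class SketchLoadedNode extends SketchBaseNode {
    protected String content;

    @Override
    public void copy(SketchBaseNode from) {
        super.copy(from);
        // Only take over the data if the source node actually has it.
        if (from instanceof SketchLoadedNode) {
            content = ((SketchLoadedNode) from).content;
        }
    }
}
```

With this shape, a tree that finds an already downloaded node can call `newNode.copy(oldNode)` without knowing the concrete node class.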
getAddressInParentFile
public String getAddressInParentFile()
- Returns:
- the address of this page as it was referenced in the parent file
getURL
public URL getURL()
- Returns:
- the URL of this node
getDepth
public int getDepth()
- Returns:
- the depth of this node in the tree (0=root)
getFather
public URLNode getFather()
- Returns:
- the father-node (if existing)
getNodeState
public int getNodeState()
- Returns:
- the nodeState of this node (= where this node is in the Crawler cycle)
getInfo
public String getInfo()
- Returns:
- the infoText of this node
toString
public String toString()
- Returns:
- a String representation of this node, or "no URL" if myURL is not set.
- Overrides:
- toString in class Object