Class Webcrawler.Crawler.URLTree

java.lang.Object
   |
   +----Webcrawler.Crawler.URLTree

public class URLTree
extends Object
Organizes a tree of URLNodes. When a new node is being loaded from the network, the loadedNode(n) method must be called. The checkLoaded(n) method can then check whether a specified URL was already loaded, which prevents the same document from being loaded repeatedly. (The implementation uses a Hashtable.)

The reference part (#) of an HTTP address only points to a location within an HTML file, so it is unnecessary to load the file with the reference if the file without the reference has already been loaded. For this reason, loadedNode and checkLoaded use only the first part of the URL, without the reference.
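The reference handling described above can be sketched as follows. This is only an illustration, not the actual URLTree code: the class RefKeySketch and the helper keyWithoutRef are hypothetical names, and the sketch assumes the Hashtable is keyed by the URL text with the reference removed.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Hashtable;

public class RefKeySketch {
    // Hypothetical helper (not part of URLTree): returns the URL text
    // without its "#reference" part, for use as a Hashtable key.
    static String keyWithoutRef(String spec) {
        try {
            URL url = new URL(spec);
            String full = url.toExternalForm();
            String ref = url.getRef();            // null when there is no '#'
            if (ref == null) {
                return full;
            }
            // Drop the trailing "#" + ref from the external form.
            return full.substring(0, full.length() - ref.length() - 1);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        // Assumed layout: one entry per document, keyed without the reference.
        Hashtable<String, Boolean> loaded = new Hashtable<>();
        loaded.put(keyWithoutRef("http://example.com/page.html"), Boolean.TRUE);

        // The same file with a reference maps to the same key.
        String key = keyWithoutRef("http://example.com/page.html#intro");
        System.out.println(loaded.containsKey(key)); // prints "true"
    }
}
```

With a key built this way, page.html and page.html#intro count as one document, so the second request does not trigger another download.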


Variable Index

 o loaded
 o rootNode

Constructor Index

 o URLTree(String)
Creates a new HTMLNode as the root of this tree.

Method Index

 o checkLoaded(LoadableNode)
Reports whether the specified URL (checkme) was already loaded from the network.
 o getRootNode()
Returns the reference to the root of the tree.
 o loadedNode(LoadableNode)
Registers the URL of the node n as already loaded from the network.

Variables

 o rootNode
 protected HTMLNode rootNode
 o loaded
 protected Hashtable loaded

Constructors

 o URLTree
 public URLTree(String url) throws MalformedURLException
Creates a new HTMLNode as the root of this tree.

Methods

 o loadedNode
 protected void loadedNode(LoadableNode n)
Registers the URL of the node n as already loaded from the network. Only the URL without its reference part is registered.

Parameters:
n - the node to be registered as loaded
 o checkLoaded
 protected boolean checkLoaded(LoadableNode checkme)
Reports whether the specified URL (checkme) was already loaded from the network. Always call this method before loading a URL to check that it has not been downloaded before; this prevents unnecessary downloads. If the URL of checkme has been loaded before, this method calls checkme.copy(theloadednode) and sets checkme.URLType to recursive. URLs are compared without their reference part.

Parameters:
checkme - the node to be checked if it was already loaded
Returns:
true if the URL has been loaded before, false otherwise
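The check-before-load protocol described for loadedNode and checkLoaded can be sketched as below. This is a minimal sketch under stated assumptions, not the Webcrawler source: Node is a hypothetical stand-in for LoadableNode, the boolean field recursive stands in for setting URLType to recursive, and visit, downloads and the placeholder content are invented for illustration.

```java
import java.util.Hashtable;

public class CheckLoadedSketch {
    // Hypothetical stand-in for LoadableNode; the real class is not shown here.
    static class Node {
        final String url;
        String content;
        boolean recursive;   // stands in for "URLType set to recursive"

        Node(String url) { this.url = url; }

        // Assumed behaviour of copy: take over the data of the loaded node.
        void copy(Node other) { this.content = other.content; }
    }

    static final Hashtable<String, Node> loaded = new Hashtable<>();
    static int downloads = 0;

    // Key is the URL text without the reference part.
    static String key(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    static void loadedNode(Node n) {
        loaded.put(key(n.url), n);
    }

    static boolean checkLoaded(Node checkme) {
        Node earlier = loaded.get(key(checkme.url));
        if (earlier == null) {
            return false;
        }
        checkme.copy(earlier);
        checkme.recursive = true;
        return true;
    }

    // Hypothetical caller: check first, register while loading.
    static void visit(Node n) {
        if (checkLoaded(n)) {
            return;                          // already loaded, no download
        }
        loadedNode(n);                       // register the node being loaded
        n.content = "<html>placeholder</html>"; // pretend network download
        downloads++;
    }
}
```

Calling visit on a.html and then on a.html#top performs a single simulated download; the second node is marked recursive and receives a copy of the first node's data.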
 o getRootNode
 public HTMLNode getRootNode()
Returns the reference to the root of the tree.