Class Webcrawler.Crawler.HTMLNode
java.lang.Object
   |
   +----Webcrawler.Crawler.URLNode
           |
           +----Webcrawler.Crawler.LoadableNode
                   |
                   +----Webcrawler.Crawler.HTMLNode
- public class HTMLNode
- extends LoadableNode
This class is derived from LoadableNode and implements a node class for
storing HTML file data such as the title and the links in the file.
For easy access to the links (the sons), this class provides an Enumeration;
alternatively, the links Vector can be retrieved directly.
To find HTML-page-specific information such as the TITLE, use the
findHTMLPageInfos() method, which parses the local file and sets
the fields of this object according to the information found.
An object of this class can call the findSons() method, which uses a Parser
to find the links and connect them to the node as sons.
- See Also:
- URLNode, LoadableNode, Parser
-
links
- Sons of this node.
-
title
- Title of HTML-Page (if existing)
-
willSonsBeLoaded
- Whether the sons of this node will be loaded in the future.
-
HTMLNode()
-
-
HTMLNode(String)
-
-
HTMLNode(URL, String)
-
-
check_Connect(String)
- Determines the type of the specified URL (HTML, ...) and connects a new node
of the corresponding node type (LoadableNode, HTMLNode, URLNode) to this node.
-
ConnectDeadSon(String)
- Connects a new son to this node, whose URLType is set to dead.
-
ConnectMalformedSon(String)
- Creates a new LoadableNode, sets its URLType to malformed and its
infoText to "url couldn't be resolved....".
-
ConnectSon(URLNode)
- Connects the given node to this node.
-
copy(HTMLNode)
- Copies the title, but not the links, because if a URL is recursive
it is also a leaf in the tree, so it doesn't need any links.
-
findHTMLPageInfos()
- Parses through localfile and sets the HTML-page specific fields like TITLE.
-
getLinks()
-
-
getNoOfSons()
-
-
getSonEnumeration()
- Access to the sons of the node via an Enumeration
-
getTitle()
-
-
getWillSonsBeLoaded()
-
title
protected String title
- Title of HTML-Page (if existing)
links
protected Vector links
- Sons of this node.
willSonsBeLoaded
protected boolean willSonsBeLoaded
- Whether the sons of this node will be loaded in the future.
This information is important for the Parsers, because they do not need to perform the
+"/index.html" check for every son if the sons will not be loaded.
HTMLNode
public HTMLNode()
- See Also:
- URLNode, LoadableNode
HTMLNode
public HTMLNode(String url) throws MalformedURLException
- See Also:
- URLNode, LoadableNode
HTMLNode
public HTMLNode(URL context,
String spec) throws MalformedURLException
- See Also:
- URLNode, LoadableNode
copy
public void copy(HTMLNode from)
- Copies the title, but not the links, because if a URL is recursive
it is also a leaf in the tree, so it doesn't need any links.
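The copy behaviour described above can be sketched as follows. CopySketchNode is a hypothetical stand-in for HTMLNode that models only the title and links fields; the real class inherits further state from LoadableNode that is not shown here.

```java
import java.util.Vector;

// Hypothetical stand-in for HTMLNode; only the members needed for this sketch.
class CopySketchNode {
    protected String title;
    protected Vector<CopySketchNode> links = new Vector<>();

    CopySketchNode(String title) {
        this.title = title;
    }

    // Copies the title only. The links of this node are left untouched,
    // so a node that repeats an already visited URL stays a leaf.
    void copy(CopySketchNode from) {
        this.title = from.title;
    }
}

class CopySketchDemo {
    public static void main(String[] args) {
        CopySketchNode original = new CopySketchNode("Home");
        original.links.addElement(new CopySketchNode("About"));

        CopySketchNode recursive = new CopySketchNode(null);
        recursive.copy(original);

        System.out.println("title: " + recursive.title);
        System.out.println("sons:  " + recursive.links.size());
    }
}
```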
getTitle
public String getTitle()
- Returns:
- The Title of this HTML-page
getNoOfSons
public int getNoOfSons()
- Returns:
- The number of sons connected to this node
getSonEnumeration
public Enumeration getSonEnumeration()
- Access to the sons of the node via an Enumeration
- Returns:
- An Enumeration over all the sons
- See Also:
- Enumeration
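A minimal sketch of walking the sons through an Enumeration is given here. EnumSketchNode and its url field are hypothetical stand-ins, and generics are used for type safety although the documented signature returns a raw Enumeration.

```java
import java.util.Enumeration;
import java.util.Vector;

// Hypothetical stand-in for HTMLNode with just enough state to enumerate sons.
class EnumSketchNode {
    final String url;
    protected final Vector<EnumSketchNode> links = new Vector<>();

    EnumSketchNode(String url) {
        this.url = url;
    }

    // Mirrors getSonEnumeration(): an Enumeration over the links Vector.
    Enumeration<EnumSketchNode> getSonEnumeration() {
        return links.elements();
    }

    // Mirrors getNoOfSons(): the number of connected sons.
    int getNoOfSons() {
        return links.size();
    }
}

class EnumSketchDemo {
    // Collects the URLs of all sons in connection order, separated by ", ".
    static String joinSonUrls(EnumSketchNode node) {
        StringBuilder sb = new StringBuilder();
        Enumeration<EnumSketchNode> sons = node.getSonEnumeration();
        while (sons.hasMoreElements()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(sons.nextElement().url);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        EnumSketchNode root = new EnumSketchNode("http://example.com/");
        root.links.addElement(new EnumSketchNode("http://example.com/a.html"));
        root.links.addElement(new EnumSketchNode("http://example.com/b.html"));
        System.out.println(root.getNoOfSons() + " sons: " + joinSonUrls(root));
    }
}
```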
getLinks
public Vector getLinks()
- Returns:
- the Vector links stored in this node
getWillSonsBeLoaded
public boolean getWillSonsBeLoaded()
- Returns:
- true if this node's sons will be loaded in the future
- See Also:
- Parsers
ConnectSon
public URLNode ConnectSon(URLNode n)
- Connects the given node to this node. Sets depth and father fields of node
n before connecting.
- Returns:
- the connected node
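The bookkeeping done before connecting might look like the sketch below. ConnectSketchNode is a hypothetical stand-in, and the rule that a son's depth is its father's depth plus one is an assumption; the documentation only states that the depth and father fields are set.

```java
import java.util.Vector;

// Hypothetical stand-in for HTMLNode; depth and father mimic the fields
// that ConnectSon is documented to set on the new son.
class ConnectSketchNode {
    final String url;
    int depth;                 // assumption: the root node has depth 0
    ConnectSketchNode father;  // null for the root node
    final Vector<ConnectSketchNode> links = new Vector<>();

    ConnectSketchNode(String url) {
        this.url = url;
    }

    // Sets depth and father on n, appends it to the links, and returns it.
    ConnectSketchNode connectSon(ConnectSketchNode n) {
        n.depth = this.depth + 1; // assumption: one level below the father
        n.father = this;
        links.addElement(n);
        return n;
    }
}

class ConnectSketchDemo {
    public static void main(String[] args) {
        ConnectSketchNode root = new ConnectSketchNode("http://example.com/");
        ConnectSketchNode son = root.connectSon(new ConnectSketchNode("http://example.com/a.html"));
        ConnectSketchNode grandson = son.connectSon(new ConnectSketchNode("http://example.com/b.html"));
        System.out.println(grandson.url + " at depth " + grandson.depth);
    }
}
```

Returning the connected node lets callers chain calls, as the demo does when it connects the grandson.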
ConnectMalformedSon
public URLNode ConnectMalformedSon(String url)
- Creates a new LoadableNode, sets its URLType to malformed and its
infoText to "url couldn't be resolved....". Connects that son to
this node.
- Returns:
- the connected node
ConnectDeadSon
public URLNode ConnectDeadSon(String url)
- Connects a new son to this node, whose URLType is set to dead.
- Returns:
- the connected node
check_Connect
public URLNode check_Connect(String url)
- Determines the type of the specified URL (HTML, ...) and connects a new node
of the corresponding node type (LoadableNode, HTMLNode, URLNode) to this node.
- Returns:
- the connected node (null if an error occurred)
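How the URL type is determined is not documented here. The sketch below shows one possible approach under stated assumptions: a URL that cannot be parsed counts as malformed, and everything else is classified by the file extension of its path. TypeSketch, Kind and classify are hypothetical names, and the actual method may use a different test entirely.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

class TypeSketch {
    // Hypothetical type tags; the real URLType values are not shown in this page.
    enum Kind { HTML, LOADABLE, MALFORMED }

    // Assumption: classification by path suffix only, without any network access.
    @SuppressWarnings("deprecation") // URL(String) is deprecated in recent JDKs
    static Kind classify(String spec) {
        URL url;
        try {
            url = new URL(spec);
        } catch (MalformedURLException e) {
            return Kind.MALFORMED; // corresponds to the ConnectMalformedSon case
        }
        String path = url.getPath().toLowerCase(Locale.ROOT);
        if (path.isEmpty() || path.endsWith("/")
                || path.endsWith(".html") || path.endsWith(".htm")) {
            return Kind.HTML; // would become an HTMLNode
        }
        return Kind.LOADABLE; // would become a LoadableNode
    }

    public static void main(String[] args) {
        System.out.println(classify("http://example.com/index.html"));
        System.out.println(classify("http://example.com/logo.gif"));
        System.out.println(classify("not a url"));
    }
}
```

A caller would then pick the node class from the returned tag and pass the new node to ConnectSon.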
findHTMLPageInfos
public void findHTMLPageInfos()
- Parses through localfile and sets the HTML-page specific fields like TITLE.
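As a rough illustration of extracting the TITLE, the sketch below searches HTML text with a regular expression. It is not the parser used by this class: TitleSketch and findTitle are hypothetical names, and the input is a String rather than the node's local file, to keep the example self-contained.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleSketch {
    // Matches <title>...</title> in any letter case, across line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed title text, or null if the page has no title.
    static String findTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String page = "<HTML><HEAD><TITLE> My Page </TITLE></HEAD><BODY></BODY></HTML>";
        System.out.println(findTitle(page));
    }
}
```

Returning null for a missing title matches the "(if existing)" note on the title field, although the value the real class stores in that case is not documented.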