Ideas for Coding a Web Crawler in Java

Asked By 20 points N/A Posted on -

Is it feasible to write a web crawler in Java? I know some web crawlers are written in languages such as PHP, but I am not sure whether one can be written in Java. So my question is: can you write a web crawler in Java and deploy it on the web to search for information? If it is possible, how efficient would such a program written in Java be?

 

Answered By 590495 points N/A #189544


At first, I thought it was not possible, because I assumed most web spiders are not written in Java. But after a little digging, it turns out that there are even tutorials online that teach you how to create your own Java web crawler. First, of course, you need a solid knowledge of Java, because that’s the foundation.

A typical spider works in the following steps: first, parse the root web page (for example, mit.edu) and gather all the links on it; second, fetch and parse each of the URLs collected in the first step, repeating the process for the links found on those pages; third, keep track of every page so that each web page gets processed only once.
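
The three steps above can be sketched roughly as follows. This is only a minimal illustration, not a production crawler: the class name SimpleCrawler, the maxPages limit, and the fetcher parameter (a function that returns the HTML for a URL, or null on failure) are all assumptions made for the example, and links are found with a simple regular expression rather than a real HTML parser.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; all names here are illustrative.
class SimpleCrawler {
    // Naive matcher for href="..." values; fragments (#...) are cut off.
    // A real crawler would use a proper HTML parser instead.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"#]+)", Pattern.CASE_INSENSITIVE);

    /** Step 1: gather all http(s) links on a page, resolved against the page URL. */
    static List<String> extractLinks(String pageUrl, String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI resolved = URI.create(pageUrl).resolve(m.group(1).trim());
                String scheme = resolved.getScheme();
                if ("http".equals(scheme) || "https".equals(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // Malformed link: skip it and keep going.
            }
        }
        return links;
    }

    /**
     * Steps 2 and 3: visit the collected URLs breadth-first, processing
     * each page only once. Returns the URLs in the order they were processed.
     */
    static Set<String> crawl(String rootUrl, Function<String, String> fetcher, int maxPages) {
        Set<String> visited = new LinkedHashSet<>(); // step 3: the crawl history
        Queue<String> frontier = new ArrayDeque<>(); // URLs waiting to be parsed
        frontier.add(rootUrl);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.remove();
            if (!visited.add(url)) {
                continue; // already processed, skip
            }
            String html = fetcher.apply(url);
            if (html == null) {
                continue; // fetch failed or page unknown
            }
            for (String link : extractLinks(url, html)) {
                if (!visited.contains(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }
}
```

Passing the fetcher in as a function keeps the crawl logic separate from the networking, so you can try it against a small in-memory map of pages first. For real pages, the fetcher could be backed by something like java.net.http.HttpClient (Java 11 and later), and you would also want to respect robots.txt and add delays between requests.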

The third step usually calls for a database to record which pages have been visited. But if you don’t want to use a database, you can also use a file to track the history of the crawl. If you want to know how it is done, visit Web Crawler Out Of Java.
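
As a rough idea of the file-based option, the sketch below keeps one visited URL per line in a plain text file and reloads that file on startup, so a restarted crawl does not process the same pages again. The class name CrawlHistory and its method names are made up for this example, and it assumes a single crawler process writing to the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: tracks visited URLs in a text file, one URL per line.
class CrawlHistory {
    private final Path file;
    private final Set<String> seen = new HashSet<>();

    /** Loads any history left behind by a previous crawl. */
    CrawlHistory(Path file) throws IOException {
        this.file = file;
        if (Files.exists(file)) {
            seen.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
    }

    /** Returns true if the URL was already recorded. */
    boolean isVisited(String url) {
        return seen.contains(url);
    }

    /**
     * Records the URL and returns true the first time it is seen;
     * returns false (and writes nothing) if it was seen before.
     */
    boolean markVisited(String url) throws IOException {
        if (!seen.add(url)) {
            return false;
        }
        byte[] line = (url + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        // Append to the file, creating it on first use.
        Files.write(file, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }
}
```

A crawler would call markVisited(url) before processing each page and skip the page when it returns false. For a large crawl, a database would likely scale better than one flat file, since here the whole history is also held in memory.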
