2.2 Passive Crawling & Spidering

Crawling and spidering are techniques used to automatically browse and index web content, typically performed by search engines. Here's a breakdown of each:

Crawling

  • Crawling is the process of systematically browsing the web to discover and index web pages.

  • A crawler, also known as a web spider or web robot, starts with a list of seed URLs and then follows hyperlinks from one page to another, recursively.

  • Crawlers retrieve web pages, extract links from them, and add new URLs to their queue for further crawling.

  • They often adhere to a set of rules specified in a file called robots.txt to determine which parts of a website they are allowed to crawl and index (a minimal crawler sketch follows this list).
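
As a rough sketch of this loop (a seed URL, link extraction, a crawl queue, and a robots.txt check), assuming the third-party requests and BeautifulSoup libraries and a hypothetical seed URL:

```python
import urllib.robotparser
from collections import deque
from urllib.parse import urljoin, urlparse

import requests                  # pip install requests
from bs4 import BeautifulSoup    # pip install beautifulsoup4

seed = 'http://example.com/'     # hypothetical seed URL
queue = deque([seed])            # crawl frontier
seen = {seed}

# Honour the rules in robots.txt before fetching anything.
robots = urllib.robotparser.RobotFileParser(urljoin(seed, '/robots.txt'))
robots.read()

while queue and len(seen) < 100:  # cap the crawl for the example
    url = queue.popleft()
    if not robots.can_fetch('*', url):
        continue
    page = requests.get(url, timeout=5)
    # Extract hyperlinks and enqueue any same-site URL not seen yet.
    for a in BeautifulSoup(page.text, 'html.parser').find_all('a', href=True):
        link = urljoin(url, a['href'])
        if urlparse(link).netloc == urlparse(seed).netloc and link not in seen:
            seen.add(link)
            queue.append(link)

print(f'Discovered {len(seen)} URLs')
```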

Spidering

  • Spidering is a term often used interchangeably with crawling, referring to the automated process of fetching web pages and following links.

  • It derives from the analogy of a spider weaving a web by moving from one location to another and creating connections.

  • Spidering involves systematically traversing the web, exploring web pages and their links to index content for search engines or other purposes.

  • Spidering algorithms may vary based on the specific objectives, such as optimizing for depth-first or breadth-first traversal, or prioritizing certain types of content (see the sketch after this list).
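
The traversal order comes down to how the URL frontier is consumed. As a minimal illustration with a hypothetical frontier, taking from the front of the queue gives breadth-first order, while taking from the back gives depth-first:

```python
from collections import deque

frontier = deque(['https://example.com/a', 'https://example.com/b'])

# Breadth-first: treat the frontier as a FIFO queue and explore
# every link at the current depth before going deeper.
next_url = frontier.popleft()

# Depth-first: treat the same frontier as a LIFO stack and follow
# one chain of links as far down as it goes.
next_url = frontier.pop()
```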

Spidering with OWASP ZAP

  • Enable FoxyProxy in the browser (Firefox)

  • Refresh the webpage so the site appears in ZAP's Sites tree

  • Click on Tools -> Spider

  • Select the URL and enable the Recurse option

  • Start Scan (the same steps can also be scripted, as shown below)
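
The same workflow can be driven through ZAP's local API. A sketch using the python-owasp-zap-v2.4 client, assuming ZAP is listening on its default proxy address (127.0.0.1:8080); the API key and target URL are placeholders:

```python
import time
from zapv2 import ZAPv2  # pip install python-owasp-zap-v2.4

# Placeholder values: use your own API key and ZAP proxy address.
zap = ZAPv2(apikey='changeme',
            proxies={'http': 'http://127.0.0.1:8080',
                     'https': 'http://127.0.0.1:8080'})

target = 'http://example.com'                     # URL from the Sites tree
scan_id = zap.spider.scan(target, recurse=True)   # Tools -> Spider, Recurse on

# Poll until the spider reports 100% completion.
while int(zap.spider.status(scan_id)) < 100:
    time.sleep(2)

urls = zap.spider.results(scan_id)                # URIs found by the spider
print(f'{len(urls)} URIs discovered')
```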

We obtain additional URIs, which we can export in .csv format for easier manipulation.
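
Continuing the script above, the same .csv export can be produced with Python's csv module:

```python
import csv

# Dump the URIs discovered by the spider to a CSV file.
with open('spider_results.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['URI'])
    writer.writerows([u] for u in urls)
```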
