# 2.2 Passive Crawling & Spidering

**Crawling** and **spidering** are techniques used to automatically browse and index web content, typically performed by search engines or dedicated web crawlers. Here's a breakdown of each:

## **Crawling**

* Crawling is the process of systematically browsing the web to discover and index web pages.
* A crawler, also known as a web spider or web robot, starts with a list of seed URLs and then follows hyperlinks from one page to another, recursively.
* Crawlers retrieve web pages, extract links from them, and add new URLs to their queue for further crawling.
* They often adhere to a set of rules specified in a file called robots.txt to determine which parts of a website they are allowed to crawl and index.
* We can use [**Burp Suite**](https://dev-angelist.gitbook.io/practical-ethical-hacker-ceh-tools/practical-ethical-hacker-notes/tools/burp-suite)'s passive crawler to map out the web app and understand its structure: browse the website manually while Burp records the site map.
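The crawling loop described above (seed URLs, link extraction, a queue of pending pages, and a robots.txt check) can be sketched with the Python standard library. This is a minimal illustration, not a production crawler; the URLs and limits are placeholders.

```python
import urllib.robotparser
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag seen on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for every hyperlink found in `html`."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(seed_url, max_pages=10):
    """Breadth-first crawl from a seed URL, honouring robots.txt."""
    robots = urllib.robotparser.RobotFileParser()
    root = "{0.scheme}://{0.netloc}".format(urlparse(seed_url))
    robots.set_url(root + "/robots.txt")
    robots.read()

    queue, seen = deque([seed_url]), set()
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen or not robots.can_fetch("*", url):
            continue  # skip already-visited or disallowed pages
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue  # unreachable page: move on to the next queued URL
        for link in extract_links(html, url):
            if link not in seen:
                queue.append(link)
    return seen
```

Each fetched page feeds new URLs back into the queue, which is exactly the recursive link-following behaviour the bullets describe.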

## **Spidering**

* Spidering is a term often used interchangeably with crawling, referring to the automated process of fetching web pages and following links.
* It derives from the analogy of a spider weaving a web by moving from one location to another and creating connections.
* Spidering involves systematically traversing the web, exploring web pages and their links to index content for search engines or other purposes.
* Spidering algorithms may vary based on the specific objectives, such as optimizing for depth-first or breadth-first traversal, or prioritizing certain types of content.
* We can use Burp Suite Pro's crawler or [**OWASP ZAP**](https://dev-angelist.gitbook.io/practical-ethical-hacker-ceh-tools/practical-ethical-hacker-notes/tools/zap)'s Spider to automate this process and map out the structure of a web application.
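The depth-first versus breadth-first distinction mentioned above is easiest to see on a small in-memory link graph. The site map below is hypothetical; the two functions only differ in whether pending pages are kept in a queue (FIFO) or a stack (LIFO).

```python
from collections import deque

# Hypothetical site map: each page maps to the pages it links to.
SITE = {
    "/": ["/blog", "/shop"],
    "/blog": ["/blog/post-1"],
    "/shop": ["/shop/cart"],
    "/blog/post-1": [],
    "/shop/cart": [],
}


def bfs(start, graph):
    """Breadth-first: visit every page at one depth before going deeper."""
    order, queue, seen = [], deque([start]), {start}
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in graph[page]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def dfs(start, graph):
    """Depth-first: follow one chain of links to the end, then backtrack."""
    order, stack, seen = [], [start], {start}
    while stack:
        page = stack.pop()
        order.append(page)
        # reversed() so links are explored in their original order
        for link in reversed(graph[page]):
            if link not in seen:
                seen.add(link)
                stack.append(link)
    return order
```

On this graph, BFS visits both top-level sections before any deeper page, while DFS exhausts the `/blog` branch before touching `/shop`.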

### Spidering with OWASP ZAP

* Enable FoxyProxy in the browser (Firefox)
* Refresh the webpage so the site appears under Sites in ZAP
* Click Tools -> Spider
* Select the URL and enable the Recurse option
* Start the scan
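The same spider scan can also be started through ZAP's REST API instead of the GUI. The sketch below assumes a ZAP instance listening on its default address (`localhost:8080`) with an API key configured; adjust both to your setup.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

ZAP_API = "http://localhost:8080"  # assumption: default ZAP API address


def spider_scan_url(target, apikey, recurse=True, zap_api=ZAP_API):
    """Build the ZAP API request that starts a spider scan of `target`."""
    params = urlencode({
        "url": target,
        "recurse": str(recurse).lower(),  # ZAP expects "true"/"false"
        "apikey": apikey,
    })
    return f"{zap_api}/JSON/spider/action/scan/?{params}"


def start_spider(target, apikey):
    """Kick off the scan (requires a running ZAP instance)."""
    return urlopen(spider_scan_url(target, apikey)).read()
```

`start_spider` only works with ZAP running; building the URL separately makes the request easy to inspect before sending it.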

<figure><img src="/files/1uswZ1LXojo0PNhqoxMR" alt=""><figcaption></figcaption></figure>

We obtained more URIs, which we can save in .csv format for easier export and manipulation.
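If you prefer to script the export instead of using ZAP's save dialog, the discovered URIs can be written to a CSV file with the standard library. The function name and column header here are illustrative.

```python
import csv


def save_uris(uris, path):
    """Write spidered URIs to a CSV file, one row per URI."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["uri"])  # header row
        for uri in sorted(uris):
            writer.writerow([uri])
```

A sorted, one-column CSV diffs cleanly between spider runs, which makes it easy to spot newly discovered endpoints.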

<figure><img src="/files/rDheqgG0V8ZDkDv3Rgjn" alt=""><figcaption></figcaption></figure>

