
Automation of the reconnaissance phase during Web Application Penetration Testing II

by Karol Mazurek


This article is a continuation of the previous one, available at this link.
After the first phase of reconnaissance, which was subdomain enumeration, you should have a lot of information about the company you are attacking.

The next step is to select one subdomain and perform a detailed reconnaissance strictly on it. In this article, you will learn about path and query enumeration tools, how to use them, and how to automate the entire process. The described research is based on the OWASP methodology and the methodology contained in the book “HackTricks” written by Carlos Polop.

Generally speaking, if the penetration test coverage is limited to one subdomain, we will be interested in the following resources:

  1. Protocols (scheme)
  2. Host (ip && ports)
  3. Paths (directories && files)
  4. Queries (parameter names && values)


Before starting work, launch a new project in Burp Suite and turn off interception in the Proxy tab (“Intercept is off”).

Then prepare the directories to which the results of the reconnaissance will be saved (most of the output from the tools used will be saved in the “recon.txt” file). Resolve the IP addresses of the targeted domain and set the variables we will operate on. Set your target domain in the first line, in place of the “$domain” variable, as shown below.

Tools used:

  • dig

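A minimal sketch of this setup, assuming one working directory per target (“ip.txt” is an illustrative file name):

    domain="example.com"            # put your target subdomain here
    mkdir -p exp all_source_code    # directories used by the later steps
    dig +short "$domain" A | tee ip.txt >> recon.txt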

The tester’s first task is to check for open ports on the resolved IP addresses of the targeted subdomain. Then perform OS detection, version detection, script scanning, and traceroute. Restrict the scan to open ports only.
Save the results in a txt format and in a greppable format — this will be useful for other tools like BruteSpray.

Tools used:

  • nmap

You can automate this process using bash.
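
A minimal sketch, assuming the resolved addresses are in “ip.txt” (exact flags and file names may differ from the original script):

    # Find the open TCP ports first, then run OS/version/script/traceroute detection (-A) on those ports only
    sudo nmap -p- --open -iL ip.txt -oG nmap_all_ports.gnmap
    ports=$(grep -oE '[0-9]+/open' nmap_all_ports.gnmap | cut -d/ -f1 | sort -un | paste -sd, -)
    # Normal output for reading, greppable output for tools like BruteSpray
    sudo nmap -A --open -p "$ports" -iL ip.txt -oN nmap.txt -oG nmap.gnmap
    cat nmap.txt >> recon.txt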

Then check for the existence of the “robots.txt” and “sitemap.xml” files and gather the URL addresses contained in them. To extract the contents of the “robots.txt” file, just use “curl”, while the contents of the “sitemap.xml” file need to be parsed. For this purpose, I recommend the script created by “yuriyyakym”, which is available at this link.

Tools used:

  • sitemap-urls

You can automate this process using bash.
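
A minimal sketch; the sitemap-urls script is assumed to read the sitemap XML on standard input, so adjust the call to its actual interface:

    # robots.txt can be read directly; sitemap.xml is parsed into plain URLs
    curl -sk "https://$domain/robots.txt" | tee -a recon.txt
    curl -sk "https://$domain/sitemap.xml" | sitemap-urls | tee -a urls.txt >> recon.txt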

The next step is to identify the technologies used on the website and check whether any WAF (Web Application Firewall) is in use.

Tools used:

  • wafw00f
  • webtech

You can automate this process using bash.
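
A minimal sketch, assuming webtech’s -u option for a single URL:

    wafw00f "https://$domain" | tee -a recon.txt     # WAF detection
    webtech -u "https://$domain" | tee -a recon.txt  # technology fingerprinting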

Then perform a brute-force attack to uncover known and potentially dangerous scripts on the web server.

Tools used:

  • nikto

You can automate this process using bash.
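
For example, a single pass against the HTTPS service (tune the options to your scope):

    nikto -h "https://$domain" | tee -a recon.txt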

After that, use “Wapiti” to scan the web pages of the deployed web app, looking for misconfigurations in cookie flags, Content Security Policy, security headers, and the .htaccess file. Additionally, check for Shellshock and CRLF injection vulnerabilities.

Tools used:

  • wapiti

You can automate this process using bash.
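
A minimal sketch; the module names follow Wapiti 3.x, so verify them with wapiti --list-modules:

    # Run only the configuration and injection modules mentioned above
    wapiti -u "https://$domain/" -m "cookieflags,csp,http_headers,htaccess,shellshock,crlf" -o wapiti_report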

The next step is to search the web for URLs and parameters belonging to the targeted domain.

Tools used:

  • gospider
  • paramspider
  • gau
  • waybackurls
  • hakrawler
  • galer

You can automate this process using bash.
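
A minimal sketch; flag names differ between versions of these tools, so check each one’s --help (output handling is simplified here):

    # Crawl the live site and pull historical URLs from public archives
    gospider -s "https://$domain" -q | tee -a urls.txt
    echo "https://$domain" | hakrawler | tee -a urls.txt
    galer -u "https://$domain" | tee -a urls.txt
    echo "$domain" | waybackurls | tee -a urls.txt
    echo "$domain" | gau | tee -a urls.txt
    paramspider -d "$domain"    # recent releases write their output under results/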

After web crawling, merge the results from the previous reconnaissance phase gathered in “../urls.txt” (but only those related to the targeted subdomain) with the current crawler output in “urls.txt”.

Tools used:

  • anew
  • qsreplace

You can automate this process using bash.
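
A minimal sketch (“params.txt” is an illustrative name for the list of parameterised URLs):

    # Merge this subdomain's URLs from the earlier phase into the crawler output, without duplicates
    grep -i "$domain" ../urls.txt | anew urls.txt > /dev/null
    # Collapse URLs that differ only in their parameter values
    grep '=' urls.txt | qsreplace '' | sort -u > params.txt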

Now check the status codes of all the URLs containing a query string and proxy them to “Burp Suite”.

Tools used:

  • wfuzz
  • Burp Suite

You can automate this process using bash.
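
A minimal sketch, assuming Burp listens on its default proxy address 127.0.0.1:8080:

    # Each URL from the list becomes the FUZZ payload, so every request also lands in Burp's history
    wfuzz -z file,params.txt -p 127.0.0.1:8080 -f status_params.txt -u FUZZ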

Then extract all paths from the “urls.txt” file and add them to the wordlist that will be used for directory brute-forcing.

Tools used:

  • unfurl

You can automate this process using bash.
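
A minimal sketch (“wordlist.txt” is an illustrative name):

    # Every path seen so far becomes a candidate for the directory brute force
    cat urls.txt | unfurl paths | sort -u >> wordlist.txt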

Now it is time for directory brute-forcing in order to discover new paths.

Tools used:

  • ffuf

You can automate this process using bash.
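
A minimal sketch; in practice you would combine “wordlist.txt” with a general-purpose wordlist:

    # Brute-force paths on the web root; CSV output keeps the status codes for the next filtering step
    ffuf -w wordlist.txt -u "https://${domain}FUZZ" -mc all -of csv -o status_ffuf.txt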

The next step is to filter the results to remove potentially insignificant responses returned by the server. For this purpose, I have created my own script called “clever_ffuf” (it can be downloaded from this link). However, you should not rely on it alone; check the “status_ffuf.txt” file manually as well.

Tools used:

  • clever_ffuf

You can automate this process using bash.
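
clever_ffuf’s own usage is documented in its repository; as a rough illustration of the same idea, obviously uninteresting rows can be trimmed from the CSV output like this:

    # Crudely drop rows containing a 400/404 field (normally the status code), then review the rest by hand
    grep -Ev ',(400|404),' status_ffuf.txt > ffuf_filtered.txt    # ffuf_filtered.txt is an illustrative name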

Now download the source code of all the enumerated valid URLs and store it in the “all_source_code/” directory. Name each file after its URL. This way, if you find a leaked API key in a later stage of this recon, you can easily tell which website it was leaked on.

Tools used:

  • curl

You can automate this process using bash.
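
A minimal sketch, assuming the valid URLs were saved to a “live_urls.txt” list (an illustrative name):

    # Store each response body under a file name derived from its URL
    mkdir -p all_source_code
    while read -r url; do
        curl -sk "$url" -o "all_source_code/$(echo "$url" | tr '/:?&=' '_')"
    done < live_urls.txt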

Afterwards, use another web crawler to gather more JavaScript URL addresses, and extract links to JavaScript files from web crawling (“urls.txt”) and brute-forcing (“ffuf.txt”). Then check if they are valid.

Tools used:

  • getJS
  • anew
  • httpx

You can automate this process using bash.
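
A minimal sketch, assuming getJS’s --url/--complete options and httpx from projectdiscovery (“js.txt” and “live_js.txt” are illustrative names):

    # Collect JavaScript links from the crawler and from the earlier results, then keep only the live ones
    getJS --url "https://$domain" --complete | anew js.txt > /dev/null
    grep -hEio 'https?://[^ "]+\.js' urls.txt ffuf.txt | anew js.txt > /dev/null
    cat js.txt | httpx -silent -mc 200 > live_js.txt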

Then gather all source code again, but this time from live JavaScript links.

Tools used:

  • curl

You can automate this process using bash.
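
The same loop as before, now over the live JavaScript links:

    while read -r url; do
        curl -sk "$url" -o "all_source_code/$(echo "$url" | tr '/:?&=' '_')"
    done < live_js.txt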

The next step is to extract new paths and leaked API keys from the gathered source code.

Tools used:

  • zile
  • unfurl

You can automate this process using bash.
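
A sketch of the idea; zile’s exact command line is described in its repository, and “new_endpoints.txt” is an illustrative name:

    # zile scans the saved sources for keys and endpoints (check its README for the exact invocation)
    cat all_source_code/* | python3 zile.py | tee zile.txt >> recon.txt
    # Absolute URLs found in the sources contribute new paths for another brute-force round
    grep -hEio 'https?://[^ "]+' all_source_code/* | unfurl paths | sort -u > new_endpoints.txt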

After that, make sure there are no duplicates among the new endpoints extracted from the JavaScript files and check their status codes.

Tools used:

  • wfuzz

You can automate this process using bash.
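
A minimal sketch (“new_endpoints.txt” comes from the previous step):

    # De-duplicate the endpoints extracted from the JavaScript files and record how the server responds
    sort -u new_endpoints.txt -o new_endpoints.txt
    wfuzz -z file,new_endpoints.txt -f status_new_endpoints.txt -u "https://${domain}FUZZ"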

The next step is to drop all responses from the server with a 400 or 404 status code. Then merge the lists from brute-forcing into one file called “ffuf.txt” and proxy the results to Burp Suite for further processing.

Tools used:

  • wfuzz
  • anew
  • Burp Suite

You can automate this process using bash.
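
A minimal sketch; the first CSV column of ffuf’s output is assumed to hold the brute-forced path:

    # Merge both brute-force rounds into ffuf.txt
    awk -F',' 'NR>1 {print $1}' status_ffuf.txt | anew ffuf.txt > /dev/null
    cat new_endpoints.txt | anew ffuf.txt > /dev/null
    # Re-request everything, hiding 400/404 responses, and proxy the traffic to Burp
    wfuzz -z file,ffuf.txt --hc 400,404 -p 127.0.0.1:8080 -f status_dir.txt -u "https://${domain}FUZZ"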

Check for backup extensions on the enumerated valid URLs. For this purpose, I use my own program called “crimson_backuper”, which is available for download here.

Tools used:

  • crimson_backuper

You can automate this process using bash.
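
crimson_backuper’s options are described in its repository; the underlying idea can be illustrated with a simple loop (the extensions below are just examples):

    # Probe a few common backup extensions for every discovered path
    for ext in .bak .old .zip .tar.gz '~'; do
        while read -r path; do
            code=$(curl -sk -o /dev/null -w '%{http_code}' "https://${domain}${path}${ext}")
            [ "$code" = "200" ] && echo "https://${domain}${path}${ext}" >> backups.txt
        done < ffuf.txt
    done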

In the next step, extract the unique queries from the previous enumeration stored in “status_params.txt” and save them in “exp/params.txt”. Also prepare the directory wordlist by adding a “/” at the end of each path that does not already have one.

Tools used:

  • qsreplace

You can automate this process using bash.
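
A simplified sketch that builds the two wordlists from the URL and path lists gathered earlier (the original script parses “status_params.txt” instead):

    # Parameterised URLs, with every value normalised by qsreplace
    grep '=' urls.txt | qsreplace 'FUZZ' | sort -u > exp/params.txt
    # Make sure every directory entry ends with a trailing slash
    sed 's#[^/]$#&/#' ffuf.txt | sort -u > exp/dirs.txt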

Check for CORS (Cross-Origin Resource Sharing) misconfigurations and search for leaked API keys inside the “all_source_code/” directory.

Tools used:

  • CorsMe

You can automate this process using bash.
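
A minimal sketch; CorsMe is assumed to read URLs on standard input, and a simple grep stands in here for the API-key search:

    # CORS misconfiguration check over the collected URLs
    cat urls.txt | CorsMe | tee -a recon.txt
    # Rough pattern match for key material in the downloaded sources
    grep -rHoEi "(api[_-]?key|secret|token)[\"':= ]+[A-Za-z0-9_-]{8,}" all_source_code/ | tee -a recon.txt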

At the very end, perform a brute-force attack on parameter names. This is very time-consuming, but it can help uncover hidden parameters that are responsible for critical functions.

Tools used:

  • Arjun

You can automate this process using bash.
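
For example, against a single endpoint (the output flag follows Arjun 2.x; repeat this for every interesting URL):

    # Brute-force parameter names; the results feed the wordlist used by the next module
    arjun -u "https://$domain/" -oT exp/arjun.txt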

After all of these steps check the following text files and directories:

  1. recon.txt — output from most of the tools used in the whole process
  2. urls.txt — scraped URLs
  3. status_params.txt — status codes of the gathered parameters
  4. zile.txt — leaked API keys / endpoints
  5. status_ffuf.txt — status codes from the first directory brute-forcing
  6. status_new_endpoints.txt — status codes from the second directory brute-forcing
  7. ffuf.txt — paths gathered during brute-forcing
  8. status_dir.txt — status codes of all URLs gathered in ffuf.txt
  9. exp/params.txt — wordlist with parameters prepared for the next module
  10. exp/dirs.txt — wordlist with directories prepared for the next module
  11. exp/arjun.txt — brute-forced parameters
  12. backups.txt — potential backup files
The above process has been automated in one script called “crimson_target”. This is one of the three modules of the “crimson” tool that I am constantly developing and sharing on GitHub.

Now you should have a lot of information about the subdomain you are targeting. The next step is to look for bugs in all the found files and parameters. This process is automated in my third module, called “crimson_exploit”, which will be described in the next article.

References:

  1. https://github.com/Karmaz95/crimson
  2. https://owasp.org/www-project-web-security-testing-guide/latest/4-Web_Application_Security_Testing/
  3. https://book.hacktricks.xyz/pentesting/pentesting-web
  4. https://book.hacktricks.xyz/external-recon-methodology
  5. https://github.com/punishell/bbtips

About the Author:

Karol Mazurek - Penetration Tester, Security Researcher and Bug Bounty Hunter.


The article was originally published at: https://karol-mazurek95.medium.com/automation-of-the-reconnaissance-phase-during-web-application-penetration-testing-ii-4336bd4ca73b

April 12, 2021