Hello and welcome to my first tutorial on Information Gathering.
In this tutorial we will gather information about our target website using some freely available online services.
We’ll be using the following websites in this tutorial to capture information about our target.
The last thing we are going to use is the robots.txt file, to view the paths that the web admin wants to hide from bots and keep out of public view. All of this information can often give your testing a head start. I will explain each one with an example.
You can also watch the video tutorial for a better understanding.
This website gives us detailed information about the web hosting and the server: what is running on it, the IP address, WHOIS records, server-side technologies, and so on. All of this information should be saved in your report, so you can use it to choose the right tests and define the attack surface, which is the most important part of a pentest.
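Much of this fingerprinting information also shows up directly in a target's HTTP response headers. Here is a minimal sketch of the idea; the header values below are invented for illustration, and in a real test you would capture them from the target's responses:

```python
# Sample response headers, made up for illustration only.
SAMPLE_HEADERS = {
    "Server": "Apache/2.4.41 (Ubuntu)",
    "X-Powered-By": "PHP/7.4.3",
    "Content-Type": "text/html; charset=UTF-8",
}

# Headers commonly worth recording in a recon report.
INTERESTING = ("Server", "X-Powered-By", "Via", "X-AspNet-Version")

def fingerprint(headers):
    """Return only the headers worth noting for fingerprinting."""
    return {k: v for k, v in headers.items() if k in INTERESTING}

print(fingerprint(SAMPLE_HEADERS))
```

Here the invented sample suggests an Apache server running PHP, exactly the kind of detail that belongs in your report.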
Often the particular domain you are targeting is not very vulnerable, or you cannot find the right attack surface. In that case you can perform a reverse IP domain lookup and find other domains on the same server that may be vulnerable and allow you to get onto the server.
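The core idea of a reverse IP lookup is simply that several domains resolving to the same address share a server. This sketch uses an invented DNS mapping to stay self-contained; a real run would resolve domains with `socket.gethostbyname` or query an online reverse-IP lookup service:

```python
# Invented resolver data for illustration; real lookups would use DNS
# or a reverse-IP lookup service.
FAKE_DNS = {
    "target.example": "203.0.113.10",
    "blog.example": "203.0.113.10",
    "shop.example": "203.0.113.25",
}

def neighbours(domain, dns):
    """Other domains hosted on the same IP as `domain`."""
    ip = dns[domain]
    return sorted(d for d, addr in dns.items() if addr == ip and d != domain)

print(neighbours("target.example", FAKE_DNS))  # ['blog.example']
```

In this invented example, blog.example shares the target's server, so a weakness there could still get you onto the same machine.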
From there you can work your way into the website.
Archive.org is a website that maintains the history of many websites on the internet. Often you can find information that is no longer displayed on the site because of a security issue, but something related to it can still be found there.
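Archive.org exposes a Wayback Machine availability API at `https://archive.org/wayback/available` that you can query for old snapshots of a page. This sketch only builds the query URL (the domain and date are placeholders); fetching it would return JSON describing the closest archived snapshot:

```python
from urllib.parse import urlencode

def wayback_query(url, timestamp=None):
    """Build a Wayback Machine availability API query URL.

    `timestamp` is an optional YYYYMMDD string asking for the snapshot
    closest to that date.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)

# Placeholder domain and date, for illustration only.
print(wayback_query("example.com", "20100101"))
```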
Robots.txt is a file that websites use to tell crawlers not to crawl some of their sensitive data or admin panels. It can be viewed publicly, so finding that data could be useful to us later on.
After all this we can move to our target domain and view the robots.txt file, which web admins or some web applications use to hide private content from web bots. Viewing it gives you the paths to all that content, and later we can visit those pages and find hidden material, which may even be left wide open because of a careless web admin.
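Pulling those paths out of robots.txt is straightforward. This sketch parses the Disallow lines from a sample file; the file content below is invented, and in practice you would fetch the real one from `http://target/robots.txt`:

```python
# Invented robots.txt content for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
"""

def disallowed_paths(text):
    """Collect the paths the site asks crawlers to stay away from."""
    paths = []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

print(disallowed_paths(SAMPLE))  # ['/admin/', '/private/']
```

Each of those paths is a page the admin did not want indexed, which makes them exactly the places worth visiting next.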