HTML source code can contain numerous juicy tidbits of information.
HTML Comments The most obvious place attackers look is in HTML comments, special sections of source code where the authors often place informal remarks that can be quite revealing. The <!-- characters mark the start of every basic HTML comment.
HTML comments are a hit-or-miss prospect. They may be pervasive and uninformative, or they may be rare and contain descriptions of a database table for a subsequent SQL query, or worse yet, user passwords.
The next example shows how our getit.sh script can obtain the index.html file for a site, and then pipe it through the UNIX/Linux grep command to find HTML comments (you can use the Windows findstr command similarly to the grep command).
The ! character has special meaning on the Unix/Linux command line and will need to be escaped using "\" in grep searches.
[root@meddle ]# getit.sh www.victim.com /index.html | grep "<\!--"
www.victim.com [192.168.189.113] 80 (http) open
<!-- $Id: index.shtml,v 1.155 2002/01/25 04:06:15 hpa Exp $ -->
sent 17, rcvd 16417: NOTSOCK
At the very least, this example shows us that the index.html file is actually a link to index.shtml. The .shtml extension implies that parts of the page were created with Server Side Includes. Induction plays an important role when profiling the application, which is why it’s important to familiarize yourself with several types of web technologies.
Pop quiz: What type of program could be responsible for the information in the $Id shown in the previous example?
You can use this method (using our getit script or the automated web crawling tool of your choice) to dump the comments from the entire site into one file and then review that file for any interesting items. If you find something that looks promising, you can search the site for that comment to find the page it’s from and then carefully study that page to understand the context of the comment. This process can reveal even more interesting information, including:
• Filename-like comments You will typically see plenty of comments with template filenames tucked in them. Download them and review the template code. You never know what you might find.
• Old code Look for links that might be commented out. They could point to an old portion of the web site that could contain security holes. Or maybe the link points to a file that once worked, but now, when you attempt to access it, a very revealing error message is displayed.
• Auto-generated comments A lot of comments that you might see are automatically generated by web content software. Take the comment to a search engine and see what other sites turn up those same comments. Hopefully, you’ll discover what software generated the comments and learn useful information.
• The obvious We’ve seen things like entire SQL statements, database passwords, and actual notes left for other developers in files such as IRC chat logs within comments.
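The comment-dumping step described above can also be sketched in a few lines of Python rather than grep. This is only an illustration under stated assumptions: the class and function names are our own, and the sample_pages dictionary stands in for pages you have already mirrored to disk. It relies on the standard library's html.parser module, which calls handle_comment for every <!-- ... --> block it meets.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of every HTML comment fed to the parser."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between <!-- and -->, without the delimiters.
        self.comments.append(data.strip())


def extract_comments(pages):
    """Return (path, comment) pairs for a dict mapping page path -> HTML source."""
    found = []
    for path, html in pages.items():
        collector = CommentCollector()
        collector.feed(html)
        collector.close()
        found.extend((path, comment) for comment in collector.comments)
    return found


# Hypothetical mirrored pages; in practice these would be read from disk.
sample_pages = {
    "/index.html": "<html><!-- $Id: index.shtml,v 1.155 2002/01/25 04:06:15 hpa Exp $ --><body>hi</body></html>",
    "/login.html": "<form><!-- illustrative developer note --></form>",
}

for path, comment in extract_comments(sample_pages):
    print(f"{path}: {comment}")
```

Keeping the page path next to each comment saves the second search the text mentions, since you already know which page to study for context.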
Other HTML Source Nuggets Don’t stop at comment separators. HTML source has all kinds of hidden treasures. Try searching for a few of these strings:
SQL Select Insert #include #exec
Password Database Connect //
If you find SQL strings, thank the web hacking gods—the application may soon fall (although you still have to wait for Chapter 8 to find out why). The search for specific strings is always fruitful, but in the end, you will have to just open the file in Notepad or vi to get the whole picture.
When using the grep command, play around with the -i flag (ignore case), -AN flag (show N lines after the matching line), and -BN flag (show N lines before the matching line).
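If grep is not at hand, the same case-insensitive search with surrounding context can be approximated in Python. The sketch below is an assumption-laden stand-in, not a replacement for grep: grep_context and its before/after parameters are names we made up to mirror the -B and -A flags, and matching is plain substring matching rather than regular expressions.

```python
# Strings suggested in the text; "//" is noisy because it also matches every URL.
KEYWORDS = ["sql", "select", "insert", "#include", "#exec",
            "password", "database", "connect", "//"]


def grep_context(text, keywords, before=0, after=0):
    """Return (line_number, context_lines) for each line containing a keyword.

    Matching ignores case, like grep -i. `before` and `after` play the role
    of grep's -B and -A flags.
    """
    lines = text.splitlines()
    lowered = [keyword.lower() for keyword in keywords]
    hits = []
    for index, line in enumerate(lines):
        candidate = line.lower()
        if any(keyword in candidate for keyword in lowered):
            start = max(0, index - before)
            end = min(len(lines), index + after + 1)
            hits.append((index + 1, lines[start:end]))
    return hits


# Hypothetical page source used only to exercise the function.
source = "<html>\n<body>\n<!-- SELECT * FROM users -->\n</body>\n</html>"
for number, context in grep_context(source, KEYWORDS, before=1, after=1):
    print(number, context)
```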
Once in a while, syntax errors creep into dynamic pages. Incorrect syntax may cause a file to execute partially, which could leave raw code snippets in the HTML source. Here is a snippet of code (from a web site) that suffered from a misplaced PHP tag:
Go to forum!\n"; $file = "http://www.victim.com/$subdir/list2.php?f=$num";
if (readfile($file) == 0) { echo "(0 messages so far)"; } ?>
Other interesting things to search for in HTML are the markers that denote server-side execution, such as <? and ?> for PHP, and <% and %> and the runat=server attribute for ASP pages. These can reveal interesting tidbits that the site developer never intended the public to see.
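A quick way to spot leaked server-side code across many pages is to scan each line for those markers. The following is a minimal sketch; the marker table and the function name find_server_side_markers are our own choices, and the patterns are deliberately crude, so expect false positives (an XML declaration such as <?xml also matches the PHP open tag).

```python
import re

# Illustrative patterns only; extend or tighten them for the site at hand.
SERVER_SIDE_MARKERS = {
    "PHP open tag": re.compile(r"<\?"),
    "PHP close tag": re.compile(r"\?>"),
    "ASP open tag": re.compile(r"<%"),
    "ASP close tag": re.compile(r"%>"),
    "runat=server": re.compile(r"runat\s*=\s*[\"']?server", re.IGNORECASE),
}


def find_server_side_markers(html):
    """Return (line_number, marker_label, stripped_line) for each marker found."""
    hits = []
    for number, line in enumerate(html.splitlines(), start=1):
        for label, pattern in SERVER_SIDE_MARKERS.items():
            if pattern.search(line):
                hits.append((number, label, line.strip()))
    return hits


# Hypothetical page containing a stray closing PHP tag, as in the snippet above.
page = '<p>Forum</p>\nif (readfile($file) == 0) { echo "(0 messages so far)"; } ?>'
for number, label, line in find_server_side_markers(page):
    print(f"line {number}: {label}: {line}")
```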
HTML source information can also provide useful information when combined with the power of Internet search engines like Google. For example, you might find developer names and e-mail addresses in comments. This bit of information by itself may not be that interesting, but what if you search on Google and identify that the developer posted multiple questions related to the development of his or her application? Now you suddenly have nice insight into how the application was developed. You could also assume that same information could be a username for one of the authenticated portions of the site and try brute-forcing passwords against that username.
In one instance, a Google search on a username that turned up in HTML comments identified several other applications that the developer had written that were downloadable from his web site. Looking through the code, we learned that his application uses configuration data on the developer’s own web site! With a bit more effort, we found a DES-encrypted administrator password file within this configuration data. We downloaded this file and ran a password-cracking tool against it. Within an hour, we got the password and logged in as the administrator. All of this success came thanks to a single comment and a very helpful developer’s homepage.
Some final thoughts on HTML source-sifting: the rule of thumb is to look for anything that might contain information that you don’t yet know. When you see some weird-looking string of random numbers within comments on every page of the site, look into it. Those random numbers could belong to a media management application that might have a web-accessible interface. The tiniest amount of information in web assessments can bring the biggest breakthroughs. So don’t let anything slide by you, no matter how insignificant it may seem at first.
Forms
Forms are the backbone of any web application. How many times have you unchecked the box that says, “Do not uncheck this box to not receive SPAM!” every time you create an account on a web site? Even English majors’ in-boxes become filled with unsolicited e-mail due to confusing opt-out (or is it opt-in?) verification. Of course, there are more important, security-related parts of the form. You need to have this information, though, because the majority of input validation attacks are executed against form information.
When manually inspecting an application, note every page with an input field. You can find most of the forms by a click-through of the site. However, visual confirmation is not enough. Once again, you need to go to the source. For our command-line friends who like to mirror the entire site and use grep, start by looking for the simplest indicator of a form, its tag. Remember to escape the < character since it has special meaning on the command line:
[root@meddle]# getit.sh www.victim.com /index.html | \
> grep -i \<form
www.victim.com [192.168.33.101] 80 (http) open
sent 27, rcvd 2683: NOTSOCK
<form name=gs method=GET action=/search>
Now you have the name of the form, gs; you know that it uses GET instead of POST; and it calls a script called “search” in the web root directory. Going back to the search for helper files, the next few files we might look for are search.inc, search.js, gs.inc, and gs.js.
A lucky guess never hurts. Remember to download the HTML source of the /search file, if possible.
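The guessing step above is easy to automate once the form tag has been parsed. Here is a rough Python sketch using the standard html.parser module; FormFinder, helper_candidates, and the default extension list (.inc and .js, taken from the filenames mentioned above) are illustrative names and choices of ours, not part of any tool.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Records the attributes of every <form> tag in the page source."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.forms.append(dict(attrs))


def helper_candidates(form, extensions=(".inc", ".js")):
    """Guess helper filenames from a form's action script and name."""
    action = (form.get("action") or "").split("?", 1)[0]
    script = action.rstrip("/").rsplit("/", 1)[-1]
    stems = []
    for stem in (script, form.get("name")):
        if stem and stem not in stems:
            stems.append(stem)
    return [stem + extension for stem in stems for extension in extensions]


finder = FormFinder()
finder.feed("<form name=gs method=GET action=/search>")
for form in finder.forms:
    print(form, helper_candidates(form))
```

Each candidate is only a guess to request from the server; most will not exist.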
Next, find out what fields the form contains. Source-sifting is required at this stage, but we’ll compromise with grep to make things easy:
[root@meddle]# getit.sh www.victim.com /index.html | \
> grep -i "input type"
www.victim.com [192.168.238.26] 80 (http) open
<input type="text" name="name" size="10" maxlength="15">
<input type="password" name="passwd" size="10" maxlength="15">
<input type=hidden name=vote value="websites">
<input type="submit" name="Submit" value="Login">
This form shows three items: a login field, a password field, and the submit button with the text, “Login.” Both the username and password must be 15 characters or less (or so the application would like to believe). The HTML source reveals a fourth, hidden field called “vote.” An application may use hidden fields for several purposes, most of which seriously undermine the site’s security. Session handling, user identification, passwords, item costs, and other sensitive information tend to be put in hidden fields. We know you’re champing at the bit to actually try some input validation, but be patient. We have to finish gathering all we can about the site.
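Because hidden fields never show up in the rendered page, it helps to pull them out of the source mechanically. The sketch below does this with Python's standard html.parser module; InputCollector and hidden_fields are hypothetical names, and the sample markup simply reuses the four input tags from the grep output above.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Records the attributes of every <input> tag in the page source."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.inputs.append(dict(attrs))


def hidden_fields(html):
    """Return a name -> value mapping for every hidden input in the source."""
    collector = InputCollector()
    collector.feed(html)
    return {
        field.get("name"): field.get("value")
        for field in collector.inputs
        if (field.get("type") or "").lower() == "hidden"
    }


page = """
<input type="text" name="name" size="10" maxlength="15">
<input type="password" name="passwd" size="10" maxlength="15">
<input type=hidden name=vote value="websites">
<input type="submit" name="Submit" value="Login">
"""
print(hidden_fields(page))  # {'vote': 'websites'}
```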
If you’re trying to create a brute-force script to perform FORM logins, you’ll want to enumerate all of the password fields (you might have to omit the \" characters):
[root@meddle]# getit.sh www.victim.com /index.html | \
> grep -i "type=\"password\""
www.victim.com [192.168.238.26] 80 (http) open
<input type="password" name="passwd" size="10" maxlength="15">
Tricky programmers might not use the password input type or have the words “password” or “passwd” or “pwd” in the form. You can search for a different string, although its hit rate might be lower. Newer web browsers support an autocomplete function that saves users from entering the same information every time they visit a web site. For example, the browser might save the user’s address. Then, every time the browser detects an address field (i.e., it searches for “address” in the form), it will supply the user’s information automatically. However, the autocomplete function is usually set to “off” for password fields:
[root@meddle]# getit.sh www.victim.com /login.html | \
> grep -i autocomplete
www.victim.com [192.168.106.34] 80 (http) open
<input type=text name="val2" size="12" autocomplete=off>
This might indicate that "val2" is a password field. At the very least, it appears to contain sensitive information that the programmers explicitly did not want the browser to store. In this instance, the fact that type="password" is not being used is a security issue, as the password will not be masked when a user enters her data into the field. So when inspecting a page’s form, make notes about all of its aspects:
• Method Does it use GET or POST to submit data? GET requests are easier to manipulate on the URL.
• Action What script does the form call? What scripting language was used (.pl, .sh, .asp)? If you ever see a form call a script with a .sh extension (shell script), mark it. Shell scripts are notoriously insecure on web servers.
• Maxlength Are input restrictions applied to the input field? Length restrictions are trivial to bypass.
• Hidden Was the field supposed to be hidden from the user? What is the value of the hidden field? These fields are trivial to modify.
• Autocomplete Is the autocomplete attribute applied? Why? Does the input field ask for sensitive information?
• Password Is it a password field? What is the corresponding login field?
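The checklist above lends itself to a small note-taking script. What follows is one possible sketch in Python, using only the standard html.parser module. FormAuditor, audit_notes, and the wording of each note are our own inventions; the defaults (GET when no method is given, text when no input type is given) follow standard HTML behavior.

```python
from html.parser import HTMLParser


class FormAuditor(HTMLParser):
    """Groups <input> tags under the <form> they appear in."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "method": (attrs.get("method") or "GET").upper(),
                "action": attrs.get("action") or "",
                "fields": [],
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            self._current["fields"].append(attrs)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def audit_notes(form):
    """Turn one parsed form into checklist-style notes."""
    notes = [f"Method: {form['method']}", f"Action: {form['action']}"]
    if form["action"].split("?", 1)[0].lower().endswith(".sh"):
        notes.append("Action is a shell script: mark it")
    for field in form["fields"]:
        name = field.get("name") or "(unnamed)"
        kind = (field.get("type") or "text").lower()
        if kind == "hidden":
            notes.append(f"Hidden: {name}={field.get('value')}")
        if kind == "password":
            notes.append(f"Password: {name}")
        if "maxlength" in field:
            notes.append(f"Maxlength: {name} limited to {field['maxlength']}")
        if (field.get("autocomplete") or "").lower() == "off":
            notes.append(f"Autocomplete off: {name} ({kind})")
    return notes


# Hypothetical login page assembled from the input tags shown earlier.
page = ('<form method=post action=/login.asp>'
        '<input type=text name="val2" size="12" autocomplete=off>'
        '<input type=hidden name=vote value="websites">'
        '</form>')
auditor = FormAuditor()
auditor.feed(page)
for form in auditor.forms:
    print("\n".join(audit_notes(form)))
```

A text field with autocomplete turned off, such as val2 here, is the kind of entry worth a second look, for the reasons given above.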