CONTRIBUTION - Lightweight Cross-Site Request Forgery Protection by Isolating User-Created Content

CHAPTER 1 INTRODUCTION

1.3 CONTRIBUTION

Because web page vulnerabilities are prevalent, websites need an effective and flexible protection mechanism. This paper presents a labeling mechanism: a server-side CSRF protection approach with a light-weight access control mechanism.

Based on the concept of user-centered design and inspired by [34][37][45], the server affixes a label to each HTTP request invoked by user-created content. The administrator can monitor HTTP requests at the server side and decide whether these requests are malicious. Most importantly, the labels do not touch the user-created content itself.

We observe that existing websites struggle with filtering or rewriting user-created content (UCC) because of unpredictable or unexpected parsing behavior. Therefore, we propose a novel protection approach without sanitization. We ensure that UCC is isolated correctly and that the server side knows exactly which HTTP requests come from UCC. To remain compatible with the original JavaScript defined by ECMAScript [18], the labeling mechanism allows JavaScript syntax with few limitations instead of filtering or rewriting it. Thanks to fine-grained protection policies, the labeling mechanism can also block a CSRF attack accurately, deferring the block until sensitive data are accessed. We also formalize the labeling mechanism with RBAC (Role-Based Access Control) [22] to demonstrate its integrity.

The labeling mechanism helps both administrators and members make the most of the website's resources.

Chapter 2 Background

2.1 Cross-Site Scripting

Cross-site scripting (XSS) is the most common vulnerability in web applications. It exploits flaws in web pages that let attackers inject malicious scripts into pages served by a trusted web server. When a victim visits such a page, the browser automatically executes the malicious scripts without the victim’s permission. Since the script has the same privileges as the user, attackers can acquire sensitive data from victims. XSS vulnerabilities are classified into two categories.

The first is called “Reflected XSS.” This vulnerability is “non-persistent” because it depends on the user to trigger the attack. For example, the content of a dynamic web page depends on user input. Attackers inject the code they want into an input field and lure users into activating it. Once a user clicks the link or button, the malicious code is reflected back to the user and executes immediately. The most common scenario is that attackers send victims messages containing dangerous links or content.

The second is called “Stored XSS.” The malicious script is stored in the database and executes whenever users request the polluted data. This kind of attack can be more damaging than reflected XSS because the malicious script is rendered more than once and every visitor who browses the page becomes a victim. Most importantly, the attacker does not need to contact victims directly. For example, an attacker posts a message containing malicious scripts on a forum. As long as the malicious content has not been removed, every visitor executes the scripts when viewing the page.

2.2 Same Origin Policy

To prevent XSS, the most common method is to filter user input, which is effective for traditional websites. Netscape also introduced the “Same Origin Policy” (SOP) to strengthen resistance to XSS. This policy restricts a script to accessing only the attributes and methods of the same site. It prevents information from flowing from one origin to another, but it places no limitation within the same origin.

An origin is composed of the application layer protocol, the domain name, and the TCP port. The same origin policy requires these three values to be identical; otherwise, a browser-side programming language such as JavaScript cannot access the methods or properties of the website.
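The comparison described above can be sketched in a few lines of Python. This is only an illustration of the three-component check, not how any particular browser implements it; the helper names `origin` and `same_origin` and the default-port table are assumptions introduced here.

```python
from urllib.parse import urlsplit

# Assumed default ports, used when a URL does not spell the port out.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (protocol, domain name, port) triple that identifies an origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)

def same_origin(url_a, url_b):
    """Two URLs share an origin only if all three components are identical."""
    return origin(url_a) == origin(url_b)

print(same_origin("http://example.com/a", "http://example.com:80/b"))  # True
print(same_origin("http://example.com/a", "https://example.com/a"))    # False
```

Note that the path plays no role in the comparison, which is why the policy places no limitation on scripts inside the same origin.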

2.3 HyperText Transfer Protocol

HyperText Transfer Protocol (HTTP) is an application layer protocol between the browser and the web server. It follows a request-response model. Whenever the browser wants to transmit data to the web server or obtain information from it, the browser issues an HTTP request and the web server returns an HTTP response.

In an HTTP request, two methods are commonly used for transferring data in modern web applications. The first is GET, which requests a specific resource on the web server. If GET is used for operations with side effects, it can cause problems such as information leakage and cross-site request forgery. Attackers can use HTML to send HTTP requests with GET. Because HTML is not restricted by any policy, the browser sends the requests as the attacker expects. The second method is POST, which submits data to update a resource on the web server. In HTML, an HTTP request with POST is harder to forge than one with GET, since it can be generated only by the HTML form tag. With AJAX, however, a script can easily generate an HTTP request with either POST or GET.

2.4 Document Object Model (DOM)

The Document Object Model (DOM) is an important structure of HTML, XHTML, and XML. As the W3C states [4],

“The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents.”

In other words, all objects in a web page are DOM elements. For convenience, JavaScript can access the DOM freely, including events, HTML tags, and CSS (Cascading Style Sheets). Moreover, JavaScript can turn a static web page into a dynamic one by modifying the DOM on the fly.

Chapter 3 Related Work

In recent years, many solutions have been proposed to prevent CSRF attacks [33][39]. CSRF is an attack that tricks victims into browsing a web page containing malicious scripts that forge HTTP requests. From the server’s point of view, establishing a well-defined access control mechanism for each service is the most important thing. Following this concept, existing solutions can be classified into the following categories:

3.1 No Script Policy

Websites usually do not trust scripts composed by users. Therefore, UCC is considered suspicious and administrators do not allow users to upload scripts. Users are only allowed to write plain text on the web page. Whenever JavaScript is about to be uploaded, the server eliminates or blocks it by applying a sanitization method to strings [15]. Under this approach, any client-side scripting, including HTML, JavaScript, and AJAX, is blocked. Hence, it is effective against any client-side script attack.
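As a rough illustration of string-level sanitization, the sketch below neutralizes markup by escaping it with Python's standard `html.escape`. This is an assumed stand-in for the sanitization method of [15], which may remove or reject input rather than escape it; the function name `neutralize` is hypothetical.

```python
import html

def neutralize(user_text):
    """Escape HTML metacharacters so the browser renders user input as plain text."""
    return html.escape(user_text, quote=True)

print(neutralize("<script>alert(1)</script>"))
# &lt;script&gt;alert(1)&lt;/script&gt;
```

After escaping, the browser displays the text instead of parsing it as a tag, so no user-supplied script can run.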

Figure 1, the difference between Web 1.0 and Web 2.0 [25]

In Web 2.0, however, user interaction and experience are very important to a social network website. With too many limitations on client-side scripting, users cannot enrich their web pages, and the whole website falls back to Web 1.0: everything in the web page is static and lifeless.

3.2 HTTP Header Modification

Because CSRF is by nature “cross-site”, the attack is launched from a malicious website against another website. Hence, the solution is to detect “cross-site” behavior. Without modifying the client or the server, the server can rely on the HTTP Referer header, which gives the URL of the page the browser visited previously.

The potential problem is that the HTTP Referer header might leak sensitive information and infringe on users’ privacy when a request crosses websites. A URL sometimes contains GET parameters, for example:

 http://www.google.com.tw/search?hl=zh-TW&q=secret

From the URL, we can tell that the user visited http://www.google.com.tw/ and searched for the keyword “secret.”

Figure 2, custom HTTP Header

Some papers propose a custom HTTP header to solve this problem [32][38]. For example, the “Origin Header” prevents cross-site request forgery by modifying the browser. This header appears only when an HTTP request is sent with the POST method. The Origin header contains no sensitive data such as GET parameters or the path after the domain name. The server can use this header to verify whether the source URL is dangerous.
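To make the contrast with the Referer header concrete, the following sketch derives an Origin-style value from a full URL and checks it against a server-side allow list. The function names and the `allowed_origins` set are assumptions for illustration; the exact header format is defined by the proposals in [32][38].

```python
from urllib.parse import urlsplit

def origin_header_value(url):
    """Keep only protocol and host (with port, if any); drop path and GET parameters."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"

def is_cross_site(origin_value, allowed_origins):
    """A POST whose origin is not on the server's allow list is treated as cross-site."""
    return origin_value not in allowed_origins

referer = "http://www.google.com.tw/search?hl=zh-TW&q=secret"
print(origin_header_value(referer))  # http://www.google.com.tw
```

The keyword “secret” is visible in the Referer value but absent from the derived origin, which is the privacy benefit described above.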

This approach does not apply to attacks launched within the same website. Because the Origin header carries only domain-level information, the server cannot recognize who created the scripts or decide whether they should be blocked or executed. Furthermore, it is inconvenient for users to install an extra add-on or plug-in in order to browse a specific website.

3.3 Secret Validation Token

Instead of modifying the HTTP request header, some studies implement a secret token mechanism to verify the legitimacy of HTTP requests, such as NoForge [5], CSRFx [6], and CSRFGuard [7]. When a client connects to the server, this mechanism generates a secret token and shares it between the two, so that both server and client hold the same session key. Every time the client communicates with the server, the server verifies the client-side secret token. The secret token is useful for blocking forged HTTP requests, but where the token is placed is a serious problem. On the client side, the browser embeds the secret token in HTML tags such as the following:

• <a>
• <form>
• <iframe>
• <button>
• <meta http-equiv="refresh">

Modifying these tags causes several problems. First, it is hard to make the mechanism work with DHTML. Dynamic HTML means that the content of a web page can change at runtime. If an HTML form is generated at runtime, the server cannot generate a token and send it to the browser in time. As a result, the secrets on the client and the server fall out of sync. Second, the secret token is embedded into the “src” attribute, which is equivalent to a URL, but URLs are not well protected. An attacker can learn the full URL by setting up a malicious website or by using document.URL in JavaScript. Lastly, the most common problem is that all of these tags are DOM elements. In DHTML, JavaScript can access the whole DOM tree. Thus, JavaScript must be prohibited whenever a secret token exists in the same page.
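The basic issue-and-verify cycle of a secret validation token can be sketched as follows. This is a generic illustration, not the implementation of NoForge, CSRFx, or CSRFGuard; the in-memory `_tokens` store and the function names are assumptions.

```python
import hmac
import secrets

# Hypothetical in-memory store mapping a session identifier to its current token.
_tokens = {}

def issue_token(session_id):
    """Generate an unpredictable token and remember it for this session."""
    token = secrets.token_urlsafe(32)
    _tokens[session_id] = token
    return token

def verify_token(session_id, submitted):
    """Accept a request only if it carries the token issued to its session."""
    expected = _tokens.get(session_id)
    if expected is None or submitted is None:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(expected.encode(), submitted.encode())

token = issue_token("session-1")
print(verify_token("session-1", token))     # True
print(verify_token("session-1", "forged"))  # False
```

The sketch also shows why placement matters: whatever the server returns from `issue_token` has to be written into the page, where any script with DOM access can read it.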

3.4 JavaScript Restriction

As mentioned above, JavaScript is the biggest problem. Therefore, some researchers aim to build a better JavaScript. These approaches depend on filtering or rewriting the dangerous functions or properties of the original JavaScript [36].

Figure 3, a safer subset of JavaScript

Doing so defines a much safer subset of the original JavaScript. But filtering and rewriting have disadvantages. Users must rely on functions provided by the server’s API and cannot create new functions as they wish. As a result, the freedom of JavaScript is restricted. As for security, although these papers usually provide a formalization of their safer JavaScript, it can sometimes be broken. In [8], the authors find vulnerabilities in FBJS [9] and ADSafe [10].

Rewriting is hard work because of the properties of JavaScript. The most severe problem is the overhead of translation. In client-server communication, time is a critical factor for users, yet filtering and rewriting are time-consuming. Although the defense is very effective, users always hope for faster processing.

Chapter 4 Proposed Scheme

To prevent CSRF attacks, we establish a labeling function and construct a UCC quarantine policy. The function isolates user-created content, and the policy propagates labels along with the requests issued from user-created content. A CSRF attack is detected when a request labeled untrusted tries to access critical services that contain sensitive data or private information about users. To prove the integrity of the proposed scheme, we use RBAC (Role-Based Access Control) to formalize and verify it.

4.1 Main Idea

The browser always trusts the contents of a web page delivered by the web server, even if the authors of those contents are not trusted by the viewer. Once an attacker can inject malicious scripts into a web page, the browser executes them automatically without the viewer’s permission. This behavior causes severe problems, including CSRF attacks.

To prevent CSRF attacks, we want to distinguish untrusted contents from trusted contents and prohibit untrusted contents from accessing web services that contain sensitive data, as shown in Figure 4.

Figure 4, main idea

4.2 Observation

While a user surfs the Internet, the browser plays an important role between the user and the server. It has to manage the session key, which is called a “cookie”, and ensure that the same origin policy is applied correctly. The server decides what content to display based on the cookie. For example, if users want to see a web page with private information, they must provide the corresponding session key with sufficient rights. To avoid confusion over rights, the browser uses one cookie at a time for a website.

Figure 5, browser and cookie

For convenience, the browser embeds the cookie into HTTP requests automatically, as in Figure 5. The web server receives the HTTP request and generates a response that depends on the cookie. During this communication, the information the web server acquires comes from the HTTP request rather than from the web page. Therefore, the web server cannot know the exact state of the web page in the browser, and an attacker only needs to build a flawless HTTP request rather than a whole web page.

Another observation concerns the features of a CSRF attack. The most important feature is session riding: a CSRF attack needs a session in order to forge a request with the victim’s identity. If the victim has not logged in to the website and the web server does not know the victim’s identity, a CSRF attack is meaningless and cannot succeed.

4.3 Labeling

In a web page, contents can be classified into two types: those created by the administrator of the web server, and those created by users. Contents created by users, called “User-Created Content” (UCC), are the source of CSRF attacks. But part of UCC is harmless, such as contents created by the viewer or contents that do not contain malicious scripts. This kind of UCC should be classified as trusted content.

The reason is that a CSRF attack is meaningless when the attacker and the victim have the same identity. The website can determine whether contents belong to the current user from the session, for example through cookies. According to the session, we divide contents into two categories:

• Trusted: contents created by the administrator of the website or by the current viewer
• Untrusted: contents created by other users
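The two categories above amount to a simple decision on the author of each piece of content. A minimal sketch, assuming the server can resolve the content's author, the current viewer (from the session), and the set of administrators; the names `label_content`, `TRUSTED`, and `UNTRUSTED` are introduced here for illustration.

```python
TRUSTED = "trusted"
UNTRUSTED = "untrusted"

def label_content(author, viewer, administrators):
    """Label content by comparing its author with the current viewer's identity."""
    if author in administrators or author == viewer:
        return TRUSTED
    return UNTRUSTED

admins = {"admin"}
print(label_content("admin", "alice", admins))  # trusted
print(label_content("alice", "alice", admins))  # trusted
print(label_content("bob", "alice", admins))    # untrusted
```

The same post can therefore carry different labels for different viewers: Bob's content is trusted when Bob views it and untrusted when Alice does.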

Therefore, a labeling function is used to separate contents and to ensure that labels cannot be disrupted. The goal of the labeling function is that every HTTP request issued from the browser is labeled as well. However, non-labeled HTTP requests may appear in some situations, such as user login, opening a new window, or a cross-site request. In these situations the HTTP request header sometimes carries no session, so the web server should help the browser obtain cookies. But if a session does exist in the header of a non-labeled request, the request can be considered a cross-site attack. In [11], the author proposed a simple and effective method for preventing cross-site attacks: serving an entry page that takes no user input before any normal web page is accessed. We extend this idea to block unknown HTTP requests. Once a non-labeled HTTP request appears, the server should redirect it to a non-UCC web page and discard its parameters (GET or POST). By doing this, the page will not suffer a CSRF attack from unpredictable user input, and the server can assign a new cookie and label to the browser. A cookie might contain the following objects:

• User information: verifies the identity of the user
• Secret token: a temporary session key used to validate freshness
• Session key: the common key used to keep the login status

For different labels, cookies should not be the same.
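The handling of non-labeled requests described above can be sketched as a small server-side decision function. The `Request` and `Decision` types, the `SAFE_PAGE` path, and the field names are hypothetical; assigning the new cookie and label on the redirected page is left out.

```python
from dataclasses import dataclass, field
from typing import Optional

SAFE_PAGE = "/welcome"  # hypothetical non-UCC entry page

@dataclass
class Request:
    path: str
    params: dict = field(default_factory=dict)
    label: Optional[str] = None   # "trusted", "untrusted", or None if non-labeled
    has_session: bool = False     # whether the request header carries a session

@dataclass
class Decision:
    action: str                   # "serve" or "redirect"
    path: str
    params: dict
    suspected_cross_site: bool = False

def handle(request):
    """Serve labeled requests; send non-labeled ones to a parameter-free safe page."""
    if request.label is not None:
        return Decision("serve", request.path, request.params)
    # Non-labeled request: drop all GET/POST parameters and redirect.
    # If it nevertheless carries a session, treat it as a possible cross-site attack.
    return Decision("redirect", SAFE_PAGE, {}, suspected_cross_site=request.has_session)

forged = Request("/transfer", {"to": "attacker"}, label=None, has_session=True)
print(handle(forged).action)  # redirect
```

Because the parameters are discarded before the redirect, attacker-chosen input never reaches the target service.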

Figure 6, a web page with labels

In addition to labeling the contents of a web page, we also have to establish access rules between labels:

• Contents with the trusted label can access contents with the trusted or the untrusted label freely.
• Contents with the untrusted label can only access contents with the untrusted label.
• Once contents with the trusted label are polluted by contents with the untrusted label, their label becomes untrusted, too.
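The three rules above can be expressed as two small functions, one for the access check and one for label propagation. This is a sketch of the rules as stated, with hypothetical function names.

```python
TRUSTED = "trusted"
UNTRUSTED = "untrusted"

def can_access(subject_label, object_label):
    """Trusted may access anything; untrusted may access only untrusted contents."""
    if subject_label == TRUSTED:
        return True
    return object_label == UNTRUSTED

def propagate(content_label, writer_label):
    """Contents touched by an untrusted writer become untrusted themselves."""
    if writer_label == UNTRUSTED:
        return UNTRUSTED
    return content_label

print(can_access(UNTRUSTED, TRUSTED))  # False
print(propagate(TRUSTED, UNTRUSTED))   # untrusted
```

The propagation rule is one-way: nothing in these rules ever turns an untrusted label back into a trusted one.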

Although the labeling function helps the server distinguish UCC from trusted contents, its implementation poses many challenges. Because we modify only the server side, we must make proper use of JavaScript and built-in functionality to enforce the restrictions and to dispatch a different secret token to each label. We discuss these challenges in terms of HTTP requests. Because a multi-stage service requires a sequence of HTTP requests, as shown in Figure 7, the label should stay the same throughout the process of accessing the service. The process of accessing a web service is called a transition path. Labeled transition paths help the server determine whether a series of requests is launching an attack or is harmless.

Figure 7, transition path of a service

HTTP defines many useful headers and methods that reveal information about the browser’s state. The Referer header and the GET and POST methods in an HTTP request tell the web server exactly which URL the browser is currently on and which URL it is going to next.

• Referer header: the URL of the previous web page, which issued this request
• GET method: retrieves information from a URL
• POST method: uploads a resource to a URL

If two URLs are session-dependent, they should be considered a minimum transition path. Session-dependent means that the two pages share the same session. To build a complete transition path, the web server connects these minimum transition paths according to their sessions. In this way, we can catalog requests to build all transition paths, as in Figure 8.
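One way to read this construction is sketched below: every request contributes a minimum transition path (previous URL, requested URL), grouped by session, and consecutive pairs are then chained into complete paths. The log format, the function names, and the chaining rule (an edge extends a path when its source equals the path's last URL) are assumptions made for illustration.

```python
from collections import defaultdict

def minimum_paths(log):
    """Group (referer, url) pairs by session; each pair is a minimum transition path."""
    edges = defaultdict(list)
    for session_id, referer, url in log:
        edges[session_id].append((referer, url))
    return dict(edges)

def chain(edges):
    """Connect consecutive minimum paths of one session into complete paths."""
    paths = []
    for src, dst in edges:
        if paths and paths[-1][-1] == src:
            paths[-1].append(dst)      # continues the current path
        else:
            paths.append([src, dst])   # starts a new path
    return paths

# Hypothetical request log: (session, Referer URL, requested URL) in arrival order.
log = [
    ("s1", "/home", "/transfer"),
    ("s1", "/transfer", "/confirm"),
    ("s2", "/home", "/profile"),
]
for session_id, edges in minimum_paths(log).items():
    print(session_id, chain(edges))
# s1 [['/home', '/transfer', '/confirm']]
# s2 [['/home', '/profile']]
```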

Figure 8, all transition paths of website

Once we have all transition paths, the next challenge is to decide which of them are multi-stage CSRF attacks. By observation, attackers inject malicious scripts into UCC and wait for victims to browse it. Therefore, a transition path invoked by an HTTP request with the untrusted label deserves attention, although some such paths are harmless.

To defeat CSRF attacks accurately, the administrator of the website should define the subset of web services that access sensitive data or private information about users. Whenever an HTTP request is issued, the web server should check the label of the request to guarantee the integrity of these critical services.

4.5 Determine CSRF Attack

As mentioned above, the attacker injects malicious scripts into an honest website and waits for victims to browse the vulnerable web page. The labeling mechanism quarantines UCC and separates the untrusted part of the contents. Once the malicious scripts launch an attack, all of their requests are tagged with the untrusted label. Furthermore, the administrator creates protection policies for critical services and stores them in a database. Therefore, we can block a CSRF attack effectively without modifying UCC: the attack is recognized when an HTTP request tagged with the untrusted label tries to access a critical service.
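The decision rule stated above reduces to a conjunction of two conditions. A minimal sketch, assuming the administrator's protection policy is available as a set of critical service paths; the paths and names used here are hypothetical, and a real deployment would load the policy from the database.

```python
# Hypothetical protection policy: services that touch sensitive data.
CRITICAL_SERVICES = {"/account/transfer", "/account/password"}

def is_csrf_attack(request_label, service, critical_services=CRITICAL_SERVICES):
    """Flag a request only if it is untrusted AND targets a critical service."""
    return request_label == "untrusted" and service in critical_services

print(is_csrf_attack("untrusted", "/account/transfer"))  # True
print(is_csrf_attack("untrusted", "/forum/post"))        # False
print(is_csrf_attack("trusted", "/account/transfer"))    # False
```

Untrusted requests to non-critical services pass through, which is what lets UCC keep running scripts without being filtered or rewritten.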

attack is simpler than system-level access control. We focus on separating contents by trusted and untrusted authors instead of treating content as a single unit. Hence we choose a role-based access control model.

First, we introduce the notation of RBAC and then connect these symbols with the labeling mechanism. We discuss the information security properties of confidentiality, integrity, and availability in RBAC. Finally, we formalize the CSRF attack and our proposed scheme with RBAC.

5.1 Notation of RBAC

RBAC (Role-Based Access Control) is a well-known access control model [31] that ensures the integrity and confidentiality of a system. In this section, we introduce the symbols of RBAC in Table 1.

There are three basic components in RBAC: subjects, roles, and objects. Subjects represent a user, a program, or another basic unit, even a networked computer. In our scheme, subjects are the contents of a web page, and contents can be classified by the identity of their author. A role is a collection of job functions and dispatches permissions to each subject. We divide contents into two roles, trusted and untrusted, by authors’ and
