4 Application Design
4.2 Generation of Snort Rules
4.2.1 Process of Rule Generation
The process consists of three steps. First, the extractor reads the private data sources and extracts a keyword from each file or from each entry of a data source. Then the converter generates Snort rules according to the data type of each keyword. Finally, the writer writes the rules into categorized rule files. We discuss these components in detail in the following sections.
4.2.1.1 Keyword Extractor
Figure 10: The process of generating Snort rules
First, the user defines his private data through the interface of PrivacyGuardian. To select personal files, PrivacyGuardian provides a file manager with which the user can browse the directories and select the files he considers private. Users are usually unfamiliar with the Linux operating system, and users and applications commonly access and store their data only on the SD card, so we restrict the user to choosing private files under the /sdcard directory. To select other system resources as private data, such as contacts, cellphone information, or location information, the user checks the corresponding checkbox in the PrivacyGuardian interface. Since PrivacyGuardian needs to extract keywords from the private data, it must declare the corresponding permissions in its manifest file.
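The /sdcard restriction could be checked along the following lines. This is only an illustrative sketch in Python, not PrivacyGuardian's code: the function name `is_selectable` and the constant `SDCARD_ROOT` are hypothetical, and symbolic links are not resolved.

```python
import posixpath

# Root directory named in the text; the constant name is an assumption.
SDCARD_ROOT = "/sdcard"


def is_selectable(path: str) -> bool:
    """Return True if `path` lies strictly inside SDCARD_ROOT after normalisation."""
    # normpath collapses "." and ".." segments, so "/sdcard/../data" is rejected.
    normalized = posixpath.normpath(path)
    # Require the trailing separator so that "/sdcardX/file" is not accepted.
    return normalized.startswith(SDCARD_ROOT + "/")
```

A file manager could call such a check before adding a selected path to the list of private files.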
After the user’s private data is defined, the extractor accesses the data and extracts a string from each item as its keyword. For a private file, the extractor opens the file, reads its input stream, and outputs a string as the keyword that represents the file. For private system resources, the extractor obtains a content resolver and queries the resources through it. If a resource contains two or more entries, as contacts do, the extractor queries all the entries and outputs one keyword string for each entry.
There are several ways to select a string from a file, such as random selection or a text mining approach. To ensure reliable experimental results, we choose the simplest one: the first few bytes of a file are taken as its keyword. For private system resources, we choose the most critical part of each entry as the keyword of that entry. Table 1 shows which part of the private data is extracted for each data type.
Table 1: Matching of the data type and the extracted keyword
Data type                      Extracted keyword
Contacts                       Phone number
Short Message Service (SMS)    The message content or the phone number
Location information           The latitude and longitude
IMEI                           The number itself
Files                          The first few bytes of the file
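The file case in Table 1 can be sketched as follows. The text only says "the first few bytes", so the default of 16 bytes and the function name `extract_file_keyword` are assumptions made for illustration.

```python
def extract_file_keyword(path: str, num_bytes: int = 16) -> bytes:
    """Return the first `num_bytes` bytes of the file as its keyword.

    `num_bytes` is an assumed value; the source does not state the length.
    """
    # Read in binary mode so that media files are handled like text files.
    with open(path, "rb") as handle:
        return handle.read(num_bytes)
```

A file shorter than `num_bytes` yields a correspondingly shorter keyword, which a real implementation would probably want to handle explicitly.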
4.2.1.2 Rule Converter
Once the keywords have been extracted, the converter generates complete Snort rules from them. To form a rule, the converter adds a prefix and a suffix to the keyword so that the result matches the Snort rule format. The added strings include the rule action, protocol, source and destination IP addresses, source and destination port numbers, and rule options. To increase the detection rate and reduce system overhead, we can set different options for each class of data. Table 2 shows the data types and the Snort rule options that might help detection.
Table 2: Recommended rule options for each data type
Types of private data     Options
Contacts and SMS          content:"keyword"; depth:n;
Video, audio, images      content:"keyword"; dsize:>n;
Location information      content:"123"; content:"456"; within:n;
General                   content:"key";
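A minimal sketch of how a converter might wrap a keyword in a rule header and options is shown below. The header `alert tcp any any -> any any`, the message text, and the function names are assumptions for illustration; the sid values start at 1000000, the range conventionally used for local rules. The characters `;`, `\` and `"` are escaped because Snort treats them specially inside a content option.

```python
def escape_content(keyword: str) -> str:
    """Escape the characters that are special inside content:"..."."""
    escaped = []
    for ch in keyword:
        if ch in ('"', ";", "\\"):
            escaped.append("\\" + ch)
        else:
            escaped.append(ch)
    return "".join(escaped)


def build_rule(keyword: str, sid: int, extra_options: str = "") -> str:
    """Prefix and suffix a keyword so that it forms one Snort rule."""
    # Assumed generic header: action, protocol, source and destination.
    header = "alert tcp any any -> any any"
    options = f'msg:"Private data leak"; content:"{escape_content(keyword)}"; '
    if extra_options:
        # Type-specific options from Table 2, for example "depth:64;".
        options += extra_options.strip() + " "
    options += f"sid:{sid}; rev:1;"
    return f"{header} ({options})"
```

For example, `build_rule("0912345678", 1000001, "depth:64;")` produces a contact rule whose search is limited to the first 64 payload bytes; the phone number and the depth value are made up for the example.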
The “content” option is one of the most important features of Snort. It allows the user to write rules that search for specific content in the packet payload and trigger a response based on that data. To detect leakage, we set the extracted keywords as the value of the content option and optimize each rule by adding other options according to the type of private data.
For the rules on contact and SMS keywords, since we believe that a packet sending a phone number is unlikely to be long, we set the “depth” option to constrain the search area. The “depth” option allows the rule writer to specify how far into a packet Snort should search for the specified pattern. A depth of n tells Snort to look for the pattern only within the first n bytes of the payload.
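The effect of “depth” can be modelled in a few lines. This is a simplified model of the matching behaviour described above, not Snort's implementation, and the function name is hypothetical.

```python
def matches_within_depth(payload: bytes, pattern: bytes, depth: int) -> bool:
    """Return True if `pattern` occurs entirely within the first `depth` bytes."""
    # Slicing limits the search area, which is what makes the rule cheaper.
    return pattern in payload[:depth]
```

With the made-up payload `b"tel=0912345678&x=1"`, the pattern `b"0912345678"` ends at byte 14, so it matches with a depth of 14 but not with a depth of 13.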
To detect private files such as video, audio, and image files, we use the “dsize” option, which tests the packet payload size and can be used to check for abnormally sized packets. In our case, since media files are usually large, the “dsize” option may help us filter out some small files and increase the efficiency of the IPS.
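As a sketch, the options for a media file could be assembled as follows. The first bytes of a media file are usually binary, so the keyword is written in Snort's hexadecimal notation between pipe characters. The function names and the threshold are assumptions; since dsize tests the payload of each packet, the threshold is a per-packet size rather than a file size.

```python
def hex_content(data: bytes) -> str:
    """Render bytes in Snort's binary notation, e.g. |FF D8 FF E0|."""
    return "|" + " ".join(f"{byte:02X}" for byte in data) + "|"


def media_options(keyword: bytes, min_payload: int) -> str:
    """Build the content and dsize options for a media-file keyword."""
    # dsize:>n matches only packets whose payload is larger than n bytes.
    return f'content:"{hex_content(keyword)}"; dsize:>{min_payload};'
```

For a file that starts with the bytes FF D8 FF E0, `media_options(b"\xff\xd8\xff\xe0", 1000)` restricts the rule to packets that carry more than 1000 bytes of payload.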
The keyword of location information is composed of two strings: the latitude and the longitude. The two strings are unlikely to be far apart in the payload. To express this characteristic, we use the “within” option, a content modifier that makes sure that at most n bytes lie between pattern matches made with the content option.
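The two-keyword rule for location data can be sketched as below, together with an approximate model of how “within” constrains the second match. The model assumes the latitude precedes the longitude and that the second pattern must lie entirely inside the n bytes that follow the first match; the function names and coordinates are illustrative.

```python
def location_options(latitude: str, longitude: str, within: int) -> str:
    """Build two content options tied together by `within`."""
    return f'content:"{latitude}"; content:"{longitude}"; within:{within};'


def matches_location(payload: bytes, first: bytes, second: bytes, within: int) -> bool:
    """Approximate check: `second` must appear in the `within` bytes after `first`."""
    start = payload.find(first)
    while start != -1:
        end = start + len(first)
        # Only the window right after the first match is searched.
        if second in payload[end:end + within]:
            return True
        # Try the next occurrence of the first pattern, if any.
        start = payload.find(first, start + 1)
    return False
```

In the made-up payload `b"lat=25.0330&lon=121.5654"`, the longitude ends 13 bytes after the latitude, so the pair matches with `within` set to 13 but not 12.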
Besides the options mentioned above, there are other ways to increase the efficiency of detection. For example, we inspect only outgoing packets by specifying the direction operator from the IP address of the local host to any foreign IP address.
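A rule header restricted to outgoing traffic might be built as in the following sketch. The function name, the default protocol, and the example address are assumptions; the `->` operator makes the rule apply only to traffic flowing from the left-hand side to the right-hand side.

```python
import ipaddress


def outgoing_header(local_ip: str, protocol: str = "tcp") -> str:
    """Build a rule header that matches only packets sent from `local_ip`."""
    # Raises ValueError if `local_ip` is not a valid IP address.
    ipaddress.ip_address(local_ip)
    return f"alert {protocol} {local_ip} any -> any any"
```

For instance, `outgoing_header("192.168.1.23")` yields a header that ignores packets arriving at the device.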
Because data may be encoded in different ways when it is transferred over the Internet, we encode each keyword with a few common encodings, such as ASCII, UTF-8…, and generate a new rule for each encoded keyword. Hence, the rule converter may output several rules from a single original keyword.
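One way to derive several content patterns from a single keyword is sketched below. ASCII and UTF-8 come from the text; UTF-16LE is added here purely as an assumed example of a further encoding. Bytes that are not ASCII letters or digits are written in Snort's hexadecimal notation between pipe characters, and identical results are emitted only once.

```python
from typing import Iterable, List


def to_snort_content(data: bytes) -> str:
    """Render bytes as a content pattern: alphanumerics literal, the rest as hex."""
    parts: List[str] = []
    hex_run: List[str] = []
    for byte in data:
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            if hex_run:
                parts.append("|" + " ".join(hex_run) + "|")
                hex_run = []
            parts.append(ch)
        else:
            hex_run.append(f"{byte:02X}")
    if hex_run:
        parts.append("|" + " ".join(hex_run) + "|")
    return "".join(parts)


def encoded_contents(
    keyword: str,
    encodings: Iterable[str] = ("ascii", "utf-8", "utf-16-le"),
) -> List[str]:
    """Return one content pattern per distinct encoding of `keyword`."""
    patterns: List[str] = []
    for encoding in encodings:
        try:
            data = keyword.encode(encoding)
        except UnicodeEncodeError:
            continue  # The keyword cannot be represented in this encoding.
        pattern = to_snort_content(data)
        if pattern not in patterns:
            patterns.append(pattern)
    return patterns
```

For a plain ASCII keyword the ASCII and UTF-8 encodings coincide, so only the UTF-16LE variant adds a second rule.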
4.2.1.3 Rule Writer
After the rules for each selected item of private data are generated, the rule writer writes the Snort rules into configuration files. Snort rules can be written directly in the configuration file “Snort.conf”, or in another file that is included in Snort.conf by adding an include option to the main configuration file. To make the system more flexible, the rule writer writes the Snort rules to different “.rules” files according to the type of private data the rules belong to. These “.rules” files are all included in the main “Snort.conf”. There are currently five “.rules” files: “Files.rules” for rules related to the user’s private files, “Message.rules” for SMS rules, “Contact.rules” for contact rules, “Location.rules” for location information, and “Phone.rules” for rules related to cellphone information such as the IMEI (International Mobile Equipment Identity number) and the IMSI (International Mobile Subscriber Identity). The number of “.rules” files can be adjusted according to the number of private data types or the design of the IPS.
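The writer step could look roughly like the sketch below, which writes each category to its own “.rules” file and appends a matching include line to the main configuration file if it is not already there. The function name and parameters are hypothetical, and the directory layout is an assumption.

```python
import os
from typing import Dict, List


def write_rules(rules_by_file: Dict[str, List[str]], rules_dir: str, conf_path: str) -> None:
    """Write each rule list to its own file and include it from the main config."""
    os.makedirs(rules_dir, exist_ok=True)
    for filename, rules in rules_by_file.items():
        # One rule per line; the file is rewritten on every call.
        with open(os.path.join(rules_dir, filename), "w", encoding="utf-8") as fh:
            fh.write("\n".join(rules) + "\n")

    existing = ""
    if os.path.exists(conf_path):
        with open(conf_path, encoding="utf-8") as fh:
            existing = fh.read()
    existing_lines = {line.strip() for line in existing.splitlines()}

    with open(conf_path, "a", encoding="utf-8") as fh:
        if existing and not existing.endswith("\n"):
            fh.write("\n")
        for filename in rules_by_file:
            include_line = f"include {os.path.join(rules_dir, filename)}"
            # Avoid duplicate include lines when the writer runs again.
            if include_line not in existing_lines:
                fh.write(include_line + "\n")
```

Calling it with a mapping such as `{"Contact.rules": [...], "Location.rules": [...]}` keeps the categories separate, so one category can be regenerated without touching the others.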