BIG DATA GENERATED FROM LOCATION BASED SERVICES

CHAPTER 2 LITERATURE REVIEW

National Chengchi University

3. Use it in car or pedestrian mode.

Information on the user's location is of substantial commercial value to the operator of a search service such as Google or Foursquare. Such a service provides the user with information that is beyond his or her sensory perception, and the location can be used to target content precisely to places in geographic proximity to the consumer. This precision is valuable to enterprises, which are willing to adopt the search service and phase in commercial applications.

The GPS system was created by the US Department of Defense for military navigation in any part of the world under any circumstances. The system is now used for many other purposes and has proved to be a revolutionary technology. GPS makes navigation easy, since it gives the direction for each turn needed to reach a destination. GPS currently offers several advantages (Virrantaus, et al., 2002).

• GPS works at any time, 24 hours a day.

• GPS works in all weather conditions, so climate is not a concern.

• The operating software is widely available on most mobile devices.

• The cost is very low, or even free.

• The signal protocol is standardized, so there are no competing standards.

• Coverage extends over the whole planet.

• The signal is very stable and advanced.

2.2 Big data generated from location based services

As the previous section 2.1 showed, location based services generate a growing amount of data.

This section is divided into three parts that describe how to manage these data: the features of the data, the technologies used, and the management effort involved.

2.2.1 Special features of data collected from LBS

Location data typically come from mobile users with a wireless device such as a PDA, mobile phone, or tablet PC. Wireless networks enable new forms of mobile services. Location Based Services (LBS) are services for mobile users that take the user's current position into account when performing their task. Map information and GIS services and infrastructures are crucial helper services. Many wireless localization techniques can be used to obtain the location of a mobile device; the most popular ones let a device determine its own position. GPS (satellite-based positioning) offers meter-level accuracy almost everywhere on the planet. This type of positioning is passive: mobile devices determine their own position, and the satellites cannot determine the location of mobile devices on Earth (Zipf, 2008).

So far, we have seen where location data come from and how they are generated. The next question is what features these data have. The six items below were collected by the research team.

• Routing path: For social scientists interested in understanding human behavior in space-time and its complex relationship with the urban environment, the possibility of collecting and using data derived from LBS offers new opportunities and poses many challenges at the same time (Kwan, 2001).

• Non-structured data: LBS data are unstructured and irregular, and requests for the data are dynamic. These characteristics raise new challenges for data analysis and administration (Huang et al., 2012).

• Imprecision and varying precision: Imprecision is a fundamental aspect of location data. User locations are sampled according to some specific protocol. The sample imprecision depends on the positioning technology used and on the circumstances under which that technology is used (Morten, 2005).

• Digital data: Global Positioning System data can also be collected and then imported into a GIS. A current trend in data collection lets users work with field computers that can edit live data over wireless connections or in disconnected editing sessions (Geographic information system, 2014).

• Real time: After data have been collected, they should be transmitted, received, and stored in a database that can hold large amounts of data and execute queries quickly. To provide real-time 'push' and 'pull' services, the information must be sent to the end user as soon as the data arrive and match the requested service. To do so, the database should be able to store large amounts of data and execute queries at almost the same time as new data are being inserted (Ekkebus et al., 2004).

• Social network: Facebook and Twitter were pioneers of social networking, and with mobile devices they have started extending their reach to include geo-social marketing. Geo-social networking allows users to interact relative to their current locations, so a user can search for people in his or her network who are nearby, or search by venue. For businesses this means potential group messaging and ad targeting. Users can share likes and perhaps meet at a specified location. At every step of their interaction there is potential for mobile marketing and advertising (WebMapSolution, 2012).
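The "search for users who are nearby" operation described under geo-social networking can be illustrated with a great-circle distance filter. The following is only a minimal sketch, not taken from any of the cited systems: the function names, the user names, and the coordinates are hypothetical, and a production service would normally use a spatial index rather than a linear scan.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_users(me, others, radius_km):
    """Return the names of users within radius_km of `me`, nearest first."""
    hits = []
    for name, (lat, lon) in others.items():
        d = haversine_km(me[0], me[1], lat, lon)
        if d <= radius_km:
            hits.append((d, name))
    return [name for _, name in sorted(hits)]


# Hypothetical (latitude, longitude) positions, for illustration only.
me = (25.0330, 121.5654)
friends = {
    "alice": (25.0340, 121.5645),  # well under 1 km away
    "bob": (25.0478, 121.5170),    # roughly 5 km away
    "carol": (24.1477, 120.6736),  # more than 100 km away
}

print(nearby_users(me, friends, radius_km=1.0))
print(nearby_users(me, friends, radius_km=10.0))
```

Because GPS samples are imprecise, as noted in the list above, a real service would likely treat the radius as approximate rather than as a hard boundary.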

2.2.2 Technology involved in managing big data from LBS

New technologies make it possible to realize value from Big Data. A new wave of economic opportunity is emerging, and businesses should get ready to exploit this trend. Establishing a Big Data strategy has become a mainstream activity for company survival in the current environment, and location-based services are central to it.

Traditionally, a company managed its data through systems such as MRP, ERP, and CRM, and fed these data into a data warehouse for analysis and reporting. That is no longer good enough for a modern enterprise, which is entering the age of BI (Business Intelligence). A business needs to know what its customers need, where they are, why they buy, and when they buy. Fortunately, the answers already lie in the data warehouse; what the business needs to do is dig into and mine the database with algorithms (Stepney, 2014).

Big Data relates to data collection, storage, querying, and analysis that is characterized in terms of volume, variety, and velocity. It is also a term used to refer to massive and complex datasets made up of a variety of data structures, including structured, semi-structured, and unstructured data. Businesses are aware that this huge volume of data can be used to generate new opportunities and process improvements through processing and analysis.

The six technologies involved in managing Big Data from location-based services are listed below (Rodrigues, 2012).

• Schema-less databases: The data captured from users can be of any type. They are unstructured, which means they do not arrive in a consolidated form, and some fields may be lost or empty. If such data were stored in a conventional database, the insert could fail with an error (Cassandra, 2014).

• Cloud computing: Cloud computing refers to a model of network computing in which a program or application runs on a connected server or servers rather than on a local computing device such as a PC, tablet, or smartphone (Huang et al., 2012).

• Storage technologies: One of the key characteristics of big data applications is that they demand real-time or near real-time responses. If a police officer stops a car, the officer needs data on that car and its occupants as quickly as possible. Data volumes are growing very quickly, especially for unstructured data. This growth is only likely to increase as data are augmented by growing numbers and types of machine sensors as well as by mobile data, social media, and so on (Adshead & Dubash, 2014).

• MapReduce: A MapReduce program is composed of a Map procedure that performs filtering and sorting and a Reduce procedure that performs a summary operation. The "MapReduce System" orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance (Rodrigues, 2012).

• Hadoop: The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failures (Rodrigues, 2012).

• Hive and Pig: The Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time, this language allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL. Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets (Rodrigues, 2012).
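The Map and Reduce procedures described in the list above, applied to schema-less records, can be sketched in a single process as follows. This is an illustration of the programming model only, under stated assumptions: it does not use Hadoop or any distributed runtime, and the record fields and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical check-in records. Being schema-less, records may omit fields
# or carry extra ones, so the code must not assume a fixed set of columns.
records = [
    {"user": "u1", "venue": "cafe", "lat": 25.03, "lon": 121.56},
    {"user": "u2", "venue": "station"},                 # no coordinates
    {"user": "u3", "lat": 25.04, "lon": 121.52},        # no venue
    {"user": "u1", "venue": "cafe"},
    {"user": "u4", "venue": "station", "mode": "car"},  # extra field
    {"user": "u2", "venue": "cafe"},
]


def map_phase(record):
    """Map: filter out records without a venue and emit (venue, 1) pairs."""
    venue = record.get("venue")
    if venue is not None:
        yield venue, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: summarise all values of one key, here by summing them."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce sequentially in one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = run_job(records)
print(counts)  # {'cafe': 3, 'station': 2}
```

In a real cluster the map calls would run in parallel on different machines and the framework would perform the grouping step; only the two user-supplied functions would stay roughly the same.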

2.2.3 Management effort involved in exploiting big data from LBS

In traditional industry, many routing companies predict routing paths by mathematical computation. This is changing, because computation alone is no longer enough to satisfy users' needs, and some companies have started to adopt Big Data management. Users are looking for stability and distribution-system expertise more than the latest algorithms. However, one challenge that all of these companies will face is the ability to process data to re-optimize routes.

This section discusses the management effort in Big Data in three parts: 1) database management, 2) clustering data, and 3) recommendation systems, which represent the 3Vs (volume, variety, velocity) of Big Data (Beyer & Laney, 2012).

• Database management: This concerns the database adopted for big data, a large data repository that integrates data from several sources into structures expressly designed for analytical purposes. Such databases typically employ a multidimensional model for organizing data. This type of model categorizes data either as business facts with associated measures, which are numerical in nature, or as dimensions, which characterize the facts and are mostly textual. Each dimension is organized into a hierarchical structure of levels, which enables the aggregation of facts to the desired level of granularity. LBS, by contrast, are supported by non-conventional databases characterized by spatial and temporal dimensions, i.e., spatiotemporal databases. Because of this, the data involved in LBS have not really been examined in depth. Consequently, LBS data semantics are not captured properly, LBS data models do not fully accommodate application requirements, and the final system does not always meet user needs (Jensen, et al., 2003).

• Clustering data: Clustering organizes a collection of objects into a classification or a hierarchy. It is not feasible to label a large collection of objects by hand, and there is no prior knowledge of the number and nature of the groups in the data. Clusters may evolve over different domains, and they support efficient querying, search, storage, and organization of data. Clustering is an exploratory technique and an essential methodology, since collected data are used in every scientific field, and the choice of clustering algorithm and its factors depends on the data. Clustering is essential for solving Big Data problems. The K-means method provides a good trade-off between data size and accuracy. The challenges include scalability, a huge number of clusters, varied data, data solidity, and validity. More and more data are collected from multiple sources or represented by multiple views, where different views describe distinct perspectives of the data. Although each view could be used individually for finding patterns by clustering, clustering could be more accurate when it explores the information across multiple views. Several multi-view clustering methods have been proposed to integrate different views of data in an unsupervised way. However, they are graph-based approaches built on spectral clustering, so they cannot handle large-scale data. How to combine these heterogeneous features for unsupervised clustering of large-scale data has become a challenging problem (Cai et al., 2013).

• Recommendation system: A real-time, location-based recommendation method involves building a dataset from each user's usage data. The usage data are collected by capturing device location data, which determine the travelling patterns of the individual and are sent to a server in real time. Such a system is a valuable but unique application in location-based social networking services, in terms of what a recommendation is and where a recommendation is to be made. A recommender system can record people's routes by taking advantage of the category information in a user's location history, always taking into account the usage context that matches the user's personal interests within a geospatial range. Machine learning algorithms can be used effectively to identify regular routes, weekend vacations, most frequented routes, and other travel patterns that can be used to build the user profile (Shamah, 2014).
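The K-means method mentioned under clustering data can be sketched with plain Lloyd's iterations on location points. This is a minimal illustration under simplifying assumptions, not the multi-view method of Cai et al.: the function name `kmeans`, its parameters, and the sample coordinates are hypothetical, and squared Euclidean distance on raw latitude/longitude degrees is used for brevity, which is only reasonable over small areas.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's-algorithm K-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical (latitude, longitude) samples around two distinct areas.
points = [
    (25.030, 121.560), (25.040, 121.570), (25.035, 121.565),
    (24.150, 120.670), (24.140, 120.680), (24.145, 120.675),
]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Each resulting cluster could then serve as a candidate "frequented place" when building a user profile for recommendation, although deciding the number of clusters in advance remains an open choice.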

Source: Research on Location-Based Service Business Models: Value Creation from Big Data, NCCU Academic Hub (pp. 15-21)