
Dempster-Shafer 理論於交通資料整合技術之應用 (The Application of Dempster-Shafer Theory on Traffic Information Integration)


(1) 國立交通大學 運輸科技與管理學系 碩士論文 (National Chiao Tung University, Department of Transportation Technology and Management, Master's Thesis). Dempster-Shafer 理論於交通資料整合技術之應用 The Application of Dempster-Shafer Theory on Traffic Information Integration. 研究生:曾治維 (Student: Chih-Wei Tseng) 指導教授:王晉元 (Advisor: Jin-Yuan Wang). 中華民國九十三年六月 (June 2004).

(2) Dempster-Shafer 理論於交通資料整合技術之應用 The Application of Dempster-Shafer Theory on Traffic Information Integration. 研究生:曾治維 Student: Chih-Wei Tseng. 指導教授:王晉元 Advisor: Jin-Yuan Wang. 國立交通大學 運輸科技與管理學系 碩士論文. A Thesis Submitted to the Department of Transportation Technology and Management, College of Management, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Transportation Technology and Management. June 2004, Hsinchu, Taiwan, Republic of China. 中華民國九十三年六月.

(3) Dempster-Shafer 理論於交通資料整合技術之應用 (The Application of Dempster-Shafer Theory on Traffic Information Integration). Student: 曾治維 (Chih-Wei Tseng). Advisor: 王晉元 (Jin-Yuan Wang). Department of Transportation Technology and Management, National Chiao Tung University. 摘要 (Abstract). With the rapid development of Intelligent Transportation Systems (ITS), more and more traffic data can be collected on the road. For a traffic control center, these data come from different sources, such as probe vehicles, roadside detectors and CCTV, and because the sources differ, their formats, accuracy and updating frequencies also differ, which makes them difficult for the control center to process. How to integrate information from different sources and provide it to road users therefore becomes an important issue; this kind of data integration technique is usually called data fusion. Data fusion techniques originated in the late 1980s and were at first applied mostly in the military domain. Only in recent years have they been applied in ITS-related industries, mainly in Advanced Traveler Information Systems (ATIS) and Advanced Traffic Management Systems (ATMS). This study applies Dempster-Shafer theory to propose a new model that assigns different weights to different data sources and uses these weights to integrate raw traffic data from multiple sources. Because the degree of traffic congestion is not easy for road users to interpret, this study converts speed information into intervals and then, through our model, transforms the data into road service levels that the control center can provide to road users. To evaluate the accuracy and reasonableness of the model, in addition to tests with real data, we also use simulated data to model different scenarios. The test results confirm that the data fusion method proposed in this study is feasible in practice. Keywords: Traveler Information, Data Integration, Dempster-Shafer Theory, Intelligent Transportation Systems (ITS).

(4) The Application of Dempster-Shafer Theory on Traffic Information Integration. Student: Chih-Wei Tseng. Advisor: Jin-Yuan Wang. Department of Transportation Technology and Management, National Chiao Tung University. Abstract. With the wide implementation of Intelligent Transportation Systems (ITS), a large amount of raw traffic data is collected by various devices. In a traffic information center, the traffic data may come from different sources, such as probe vehicles, CCTV and loop detectors, with different formats, accuracy and updating frequencies. An important issue for the traffic information center is to integrate the data from multiple sources into a single piece of information and broadcast it to users. This data integration is usually called data fusion. Data fusion technology started in the late 1980s and was at first used mostly for military surveillance. In recent years, the ITS industry has started to use data fusion techniques, especially in Advanced Traveler Information Systems (ATIS) and Advanced Traffic Management Systems (ATMS). This study proposes a new model based on Dempster-Shafer theory that combines raw traffic data from multiple sensors and assigns different weights to distinguish among the sensors. Because the level of traffic congestion perceived by drivers is hard to quantify, we develop a method to categorize speed data into different intervals. We then develop an efficient data processing method in order to provide real-time road service levels to the traffic center. To evaluate the accuracy and rationality of our model, besides testing with real data, we generate simulated data for scenario simulation. The testing results show that the proposed data fusion technique is suitable in practice. Keywords: Traveler Information, Data Fusion, Dempster-Shafer Theory, Intelligent Transportation Systems (ITS).

(5) Acknowledgments. This thesis could not have been completed without the guidance of my advisor, Professor 王晉元 (Jin-Yuan Wang). Beyond his training over the past four years, he always offered timely advice during the research so that the work proceeded smoothly without losing direction, and he spent particular effort on the English writing, with which I was not yet comfortable. Under his urging, my ability to solve practical problems also improved, and his lessons about everyday life, with Murphy's Law repeatedly proving itself around me, kept me from being under-prepared. There is far too much to thank him for. I also thank Professor 魏建宏 of National Cheng Kung University and Section Chief 吳玉珍 of the Institute of Transportation for their comments and suggestions at the oral defense, which made this thesis more mature. For data collection, I especially thank 紀舜 of the Institute for Information Industry and 東凌 of the Institute of Transportation for their guidance and help with methods and equipment, and 彥佑 and his girlfriend, together with my juniors 信誌 and 日錦, for supporting the field data collection. Thanks to everyone, this thesis could be completed smoothly. My six years at NCTU come to a close with this thesis. I thank the seniors and juniors of the departmental basketball team for giving me a stage to strive on outside the classroom; my long-time roommates 小偉 and 阿信, partners in coursework, daily life and basketball; and the seniors and juniors of the lab who shared the good times and the hard times there, whose guidance and support I gratefully acknowledge. I hope our most senior labmate earns the lab's first doctorate soon, that 宗成 finishes his degree in the United States, that 紀舜, 家盛 and 駿逸 do well in their careers, that 小名 has a smooth military service, that 猴子 stays healthy, and that 嘉龍 finds happiness in love. Having finally stepped off the pirate ship, I am grateful for everything the lab and my advisor gave me, and I hope hoho, 彥佑, 怡君, 思文, 信翔, 嘉英, 威豪 and 瑞豐 will carry the lab's banner and graduate smoothly. Beyond these friends, I must thank my family: without my father's hard work and my mother's attentive care, I could not have made it through my studies and devoted myself fully to them. Finally, I thank my girlfriend 欣潔, who has been by my side in both life and research; sharing the little things of everyday life with you has been a real joy, and as I leave school for the working world I hope we will keep each other company. There are truly too many people to thank for the completion of this thesis; I hope to share the joy of its completion with all of you here. Thank you, everyone. 曾治維, National Chiao Tung University, Hsinchu, 2004/7/04.

(6) Content
Chapter 1 Introduction ..... 1
  1.1 Motivation ..... 1
  1.2 Objective ..... 2
  1.3 Scope ..... 2
  1.4 Study Flowchart ..... 3
Chapter 2 Literature Review ..... 5
  2.1 Review of Data Fusion ..... 5
    2.1.1 Introduction of Data Fusion ..... 5
    2.1.2 Advantage of Data Fusion ..... 5
    2.1.3 A Framework for data fusion ..... 6
  2.2 Review of Data Fusion Algorithms ..... 8
    2.2.1 Reviews of Neural Networks ..... 8
    2.2.2 Dempster-Shafer Evidential Reasoning ..... 10
Chapter 3 Model Building ..... 13
  3.1 The concept of data adjusting ..... 13
  3.2 The concept of data clustering ..... 14
  3.3 The concept of the Dempster-Shafer rule of combination ..... 16
Chapter 4 Model Testing ..... 19
  4.1 The data collection and generation ..... 19
    4.1.1 Probe vehicle data collection and analysis ..... 19
    4.1.2 VD (Vehicle detector) data collection and analysis ..... 22
    4.1.3 Simulated data generation ..... 23
  4.2 The model testing ..... 25
    4.2.1 Real data testing ..... 25
    4.2.2 Simulated data testing ..... 29
  4.3 Summary of fusion result ..... 35
Chapter 5 Conclusions and suggestions ..... 36
  5.1 Conclusions ..... 36
  5.2 Suggestions ..... 37
References ..... 38
-I-

(7) List of Figures
Figure 1-1 Study Flowchart ..... 3
Figure 2-1 Framework for data fusion ..... 6
Figure 2-2 The architecture of the neural network ..... 10
Figure 3-1 Model Framework ..... 13
Figure 4-1 The raw data of buses ..... 20
Figure 4-2 The digital map ..... 21
Figure 4-3 Detecting range of vehicle detector ..... 22
Figure 4-4 The collected VD data ..... 22
Figure 4-5 The simulation process ..... 24
-II-

(8) List of Tables
Table 2-1 Fusion level ..... 8
Table 3-1 Service Level Standard ..... 17
Table 4-1 The results of single data source ..... 25
Table 4-2 The results of weights ..... 26
Table 4-3 The results of clustering ..... 26
Table 4-4 Road level Ⅱ ..... 27
Table 4-5 The outputs of fusion model ..... 28
Table 4-6 The scenario ..... 30
Table 4-7 The data adjusting ..... 31
Table 4-8 The data clustering ..... 31
Table 4-9 The data fusion ..... 33
-III-

(9) Chapter 1 Introduction. 1.1 Motivation. Traffic congestion is a major concern for modern countries. The construction of new roads is not fast enough to satisfy transportation demand, so traffic management is becoming more and more important. To make traffic management effective, traffic information such as vehicle speeds, traffic flows and travel times is indispensable. Besides, traffic information is necessary for travelers: they want to know where traffic congestion is and when they can arrive at their destinations. However, how to collect real-time traffic information is an issue we have to address. With the wide implementation of Intelligent Transportation Systems (ITS), a large amount of raw traffic data is collected by various devices, for example probe vehicles, loop detectors, CCTV and other sensors. However, raw traffic data by itself is not useful to users. In a traffic information center, the traffic data may come from different sources, such as probe vehicles, CCTV and loop detectors, with different formats, accuracy and updating frequencies. An important issue for the traffic information center is to integrate the data from multiple sources into a single one and broadcast it to users. This data integration is usually called data fusion. Data fusion is a technique which combines multi-source data through a centralized data processor to provide accurate information. Through data fusion techniques, we can reduce ambiguity, increase confidence and obtain useful information from raw data. Data fusion technology started in the late 1980s, mostly for military -1-.

(10) surveillance purposes. In recent years, the ITS industry has started to use data fusion techniques, especially in Advanced Traveler Information Systems (ATIS) and Advanced Traffic Management Systems (ATMS). Therefore, data fusion is considered a significant technique in the development and advancement of ITS applications. 1.2 Objective. The objective of this study is to propose a new method to combine raw traffic data from multiple sensors into a single piece of traffic information. Because the level of traffic congestion is hard to quantify, we also develop a method to categorize speed data into different intervals. 1.3 Scope. In this research, we use two types of data sources: loop detectors and bus probe vehicles, but our model can be applied to any traffic data source. Traffic speed is one of the most desirable kinds of information for traffic management and traveler information systems, because it is a good measure of system effectiveness, so we only consider speed data in this research. Besides, because our probe vehicles are mostly urban buses, we assume the traffic data are collected in urban areas, although the model is applicable to both urban and rural areas. In this research, we only fuse data from different traffic data sources; the processing of data within a single source and the timing and amount of data collection are not discussed. -2-.

(11) 1.4 Study Flowchart. [Figure 1-1 Study Flowchart: Problem Definition → Literature Review → Data Fusion Method Developing → Model Testing → Output Analysis → if modification is needed, return to method developing; otherwise → Conclusions] -3-.

(12) As shown in Figure 1-1, the flow of this study is as follows:
1. Problem definition: according to the objective and scope, we define the problem.
2. Literature review: after the problem definition, we review the relevant literature on data fusion. After reviewing it, we determine the major methodology adopted in this research.
3. Data fusion method developing: we then develop our own data fusion technique.
4. Model testing: we collect real-world data and perform tests. If the real-world data are not sufficient, we use simulation to generate testing data for evaluation purposes.
5. Output analysis: after model testing, we modify our model and solution technique based on the testing results, and keep modifying until we are satisfied.
6. Conclusions: finally, we draw conclusions and give suggestions.
-4-.

(13) Chapter 2 Literature Review. 2.1 Review of Data Fusion. 2.1.1 Introduction of Data Fusion. Multi-sensor data fusion is a technique by which data from several sensors are combined through a centralized data processor to provide comprehensive and accurate information. It offers a synergistic process that consolidates individual data sources into a combined resource with a productive value greater than the sum of its parts. Data fusion started in the late 1980s and was used in the military at first. The U.S. Department of Defense conducted much of the early research on this technology and explored its usefulness in military surveillance and land-based battle management systems. The application of data fusion technology to commercial endeavors and non-military projects is also growing rapidly. Data fusion has been given much attention in the engineering literature, yet relatively few articles discuss its potential usefulness for transportation management or Intelligent Transportation Systems (ITS) [1]. 2.1.2 Advantage of Data Fusion. The advantages of data fusion for a traffic information system are: 1. Increased confidence: sensors can confirm each other's inferences, thereby increasing confidence in the final system inference. Also, some inferences can be ruled out to generate a reduced set of feasible options, thereby reducing the -5-.

(14) effort required to search for the best solution. 2. Reduced ambiguity: joint information from multiple sensors reduces the set of hypotheses about the target. 3. Improved detection: integration of multiple measurements of the same target improves the signal-to-noise ratio, which increases the assurance of detection. 4. Increased robustness: one sensor can contribute information where others are unavailable, inoperative, or ineffective. 5. Enhanced spatial and temporal coverage: one sensor can work when or where another sensor cannot. 6. Decreased costs: a suite of "average" sensors can achieve the same level of performance as a single, highly reliable sensor at a significantly lower cost. 7. Shorter response time: since more data are collected by multiple sensors, a prescribed level of performance can be attained in a shorter time [1][2]. 2.1.3 A Framework for data fusion. Figure 2-1 Framework for data fusion [2]. The framework of data fusion is shown in Figure 2-1, and each block is briefly described -6-.

(15) next. 1. Information sources: These sources include sensors that may be situated at the fusion site or may be distributed, and other a priori information available from humans or a database. 2. Levels of processing: The fusion levels are differentiated according to the amount of information they provide. (1) Level one: The most basic level involves the fusion of multi-sensor data to determine the position, velocity, and identity of a target. At this level, however, only raw, uncorrelated data are provided to the user. (2) Level two: Level two data fusion provides a higher level of inference and delivers additional interpretive meaning suggested from the raw data. (3) Level three: Level three data fusion is designed to make assessments and provide recommendations to the user, much as occurs in knowledge-based expert systems (KBES). Thus, each jump between data fusion levels represents a corresponding leap in technological complexity to produce increasingly valuable informational detail. 3. Process refinement: This is a meta-process concerned with the other processes. 4. Database management: This is a key component of a successful fusion system. Required functions are data retrieval, storage, archiving, compression, relational queries and data protection. 5. Human-computer interface: This interface provides a means for human-computer interaction [1][2][4]. -7-.

(16) 2.2 Review of Data Fusion Algorithms. According to Linn and Hall's 1991 taxonomy of data fusion algorithms [14], five general, goal-oriented data fusion methods are in use today: data association, positional estimation, identity fusion, pattern recognition, and artificial intelligence. Within these five general categories, ten discrete data fusion techniques can be identified; see Table 2-1 [1].

Table 2-1 Fusion level
Fusion Level | General Method | Specific Technique
Level one | Data association | Figure of merit (FOM); Gating techniques
Level two | Positional estimation | Kalman filters
Level two | Identity fusion | Bayesian decision theory; Dempster-Shafer evidential reasoning (DSER); Adaptive neural networks
Level three | Pattern recognition | Cluster methods
Level three | Artificial intelligence | Expert systems; Blackboard architecture; Fuzzy logic

In this review, we find that fusion information can be obtained from raw data at level two, and most of the main fusion algorithms belong to level two, so we focus on level two data fusion. 2.2.1 Reviews of Neural Networks. Artificial neural systems (ANSs), also known as neural networks, are -8-.

(17) information-processing structures that attempt to replicate the process of learning and decision making observed in the human brain. A neural network uses many simple elements called neurons (or processing nodes) to collect and correlate information. These neurons are connected by synapses that ascribe a weight to each neuron's output and then forward it, in a unidirectional path, to the next set of neurons. A neuron may have many inputs, but it has only a single output. In summary, the three defining elements of a neural network are the following: • The neuron's characteristics - the equations that define what a neuron will do. • The learning rule - the guide as to how the weights between various neurons will change according to the stimuli they receive. • The network topology - the manner in which the neurons are connected. Neural networks always require a "learning" period in order to fully establish and test the specific patterns or rules that will guide the system. The learning process employed in a typical multi-layer neural network is simple error feedback. During this process, the network must be run through its paces so that each neuron can be "taught" the proper association between diverse data inputs and the assimilated output. This knowledge can be obtained through the observations of a human teacher, who repeatedly programs the desired weights given to each neuron until a known pattern is fully duplicated. The architecture of a neural network is shown in Figure 2-2 [1][3]: -9-.

(18) Figure 2-2 The architecture of the neural network [1]. 2.2.2 Dempster-Shafer Evidential Reasoning. Among multi-sensor data fusion techniques, Bayesian theory is the common and traditional method, but it is limited in its ability to handle uncertainty in sensor data. Therefore, the Dempster-Shafer Evidential Reasoning (DSER) method is being explored. For the rest of this review, we briefly describe DSER. Let Θ be a finite set of mutually exclusive alternatives, called the frame of discernment. Beliefs are assigned over Θ through an assignment function called the "basic probability assignment", a function m(·) from the power set of Θ to [0,1]. Therefore, m(∅) is 0 and the sum of m(·) over all subsets of Θ is 1. We then introduce two other important functions of DSER. Given the assignment function m(·), let E_k be the evidence of sensor S and let A represent one type of situation. The belief function accounts for all evidence E_k that -10-.

(19) supports the situation A:

Belief(A) = Σ_{E_k ⊆ A} m(E_k)

Another function, the plausibility function, accounts for all evidence E_k whose intersection with the situation A is not empty:

Plausibility(A) = Σ_{E_k ∩ A ≠ ∅} m(E_k)

Assume P(A) is the probability of situation A; then the relation between belief and plausibility is:

Belief(A) ≤ P(A) ≤ Plausibility(A)

For this reason, this relation can be called the "confidence interval", and Belief(A) and Plausibility(A) are called the lower and upper bounds of the confidence interval. The above functions indicate the confidence interval of the sensor S about the situation A, but the most important problem in data fusion is combining the data from different kinds of data sources. For example, suppose there are two data sources S1 and S2. They have different assignment functions m1(·) and m2(·) and different evidences E_k and E'_k, but the same situation A. We should combine these two assignment functions into one assignment function, so DSER proposes a combination rule, called Dempster's rule of combination:

m12(A) = [ Σ_{E_k ∩ E'_k = A} m1(E_k) m2(E'_k) ] / (1 − K)

K = Σ_{E_k ∩ E'_k = ∅} m1(E_k) m2(E'_k)

where K is called the conflict. With this new assignment function m12(A), we can calculate the new confidence interval using the data from two different kinds of data sources. In this way, we can enhance the accuracy and the reliability of the data [5][6][7][8]. -11-.
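To make the combination rule concrete, the following is a minimal sketch of Dempster's rule together with the belief and plausibility functions. It is an illustration rather than an implementation from this thesis: focal elements are represented as Python frozensets over a small frame of discernment, and the function names (combine, belief, plausibility) are assumptions of the sketch.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two basic probability assignments.
    m1, m2 map frozenset focal elements to masses in [0, 1] summing to 1."""
    fused, conflict = {}, 0.0
    for (e1, w1), (e2, w2) in product(m1.items(), m2.items()):
        inter = e1 & e2
        if inter:
            fused[inter] = fused.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2                 # this is K in the text
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict (K = 1)")
    return {a: w / (1.0 - conflict) for a, w in fused.items()}

def belief(m, a):
    """Belief(A): total mass of focal elements contained in A."""
    return sum(w for e, w in m.items() if e <= a)

def plausibility(m, a):
    """Plausibility(A): total mass of focal elements intersecting A."""
    return sum(w for e, w in m.items() if e & a)

# Small illustration over a three-element frame of discernment.
frame = frozenset({"A", "B", "C"})
m1 = {frozenset({"A"}): 0.6, frozenset({"A", "B"}): 0.3, frame: 0.1}
m2 = {frozenset({"A"}): 0.4, frozenset({"B"}): 0.5, frame: 0.1}
m12 = combine(m1, m2)
print(m12)
print(belief(m12, frozenset({"A"})), plausibility(m12, frozenset({"A"})))
```

In this small example the two assignments partially conflict (K = 0.3); after normalization the combined mass concentrates on {A}, and the belief and plausibility of {A} bracket its probability as in the confidence-interval relation above.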

(20) We use the Dempster-Shafer theory in our research for the following reasons. 1. DS theory is simpler than other theories, so we can program it without relying on any additional software packages. 2. The computation time is short enough to support real-time information. 3. We want to provide travelers with road service level information, and DS theory is suitable for level-type information. -12-.

(21) Chapter 3 Model Building. In this chapter, we describe the proposed data fusion model. The model involves data adjusting, data clustering and the Dempster-Shafer rule of combination. Data adjusting reduces the ambiguity between multiple data sources. Data clustering is the process of grouping a set into classes of similar subsets; it transforms the speed data from discrete values into intervals. The Dempster-Shafer rule combines the data from multiple sources, and that is the main idea of this research. Data adjusting is described in Section 3.1, data clustering in Section 3.2 and the Dempster-Shafer rule of combination in Section 3.3. The model framework is shown in Figure 3-1. [Figure 3-1 Model Framework: Sensor 1 → Data Adjusting → Data Clustering and Sensor 2 → Data Adjusting → Data Clustering, followed by a common Combination step.] 3.1 The concept of data adjusting. Before data clustering, we should adjust the raw data, since the Dempster-Shafer rule (DS rule) has several limitations. The problem with the DS rule was originally pointed out by Lotfi Zadeh [9][15], who provided a compelling example of erroneous results. That example showed that the DS rule behaves ambiguously when the raw data vary significantly between sources. -13-.
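For concreteness, a commonly cited paraphrase of Zadeh's example (the numbers are illustrative and not taken from this thesis) runs as follows. Suppose two sources report m1(A) = 0.99, m1(B) = 0.01 and m2(C) = 0.99, m2(B) = 0.01 over a frame {A, B, C}. The conflict is K = 0.99·0.99 + 0.99·0.01 + 0.01·0.99 = 0.9999, so Dempster's rule gives m12(B) = (0.01·0.01)/(1 − K) = 0.0001/0.0001 = 1: the combination assigns full belief to the one alternative that both sources considered almost impossible. This is the kind of ambiguity that the data adjusting step below is intended to reduce before the combination rule is applied.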

(22) The method that we use to adjust the raw data is shifting, without changing the data distribution. We shift the raw data by the following steps (a small end-to-end sketch of this adjusting, the clustering of Section 3.2 and the combination of Section 3.3 is given at the end of this chapter). 1. Deciding the shifting target: first, we need to decide the shifting target, which is one of the data sources. The weight w_i of each data source must be calculated in advance; w_i can be computed as follows [10]:

w_i = n_i / s_i²

where s_i is the standard deviation of the data from the i-th source and n_i is the sample size of the i-th source. A larger w_i means a more reliable source. Therefore, the source that has the largest weight is the shifting target. 2. Shifting: after deciding the shifting target, we shift the raw data of the other sources by adding a constant so that they have the same mean as the target. For example, suppose the raw data come from two data sources, A and B, and A is the shifting target since it has better accuracy. Let the elements of B be b_1, ..., b_m, the mean of A be Ā, and the mean of B be B̄. The data adjusting is:

D_AB = Ā − B̄
b*_k = b_k + D_AB,  k = 1, ..., m

where b*_k are the new elements of B after the data adjusting. 3.2 The concept of data clustering. In the DS rule, we need to transform the speed data from discrete values into intervals. These intervals are generated by the data clustering based on the principle of -14-.

(23) maximizing the intraclass similarity and minimizing the interclass similarity [11]. There are many methods of data clustering; in this model, we use the standard deviation to cluster the data. Let the elements of data source A be a_1, ..., a_n (a_1 ≤ a_2 ≤ ... ≤ a_n) and let the standard deviation be S_A. Then:

Cluster 1: a_1, ..., a_m with a_m ≤ a_1 + S_A
Cluster 2: a_{m+1}, ..., a_j with a_j ≤ a_{m+1} + S_A
Cluster 3: a_{j+1}, ..., a_k with a_k ≤ a_{j+1} + S_A
...

For example, suppose the data are 18, 20, 22, 24, 25, 29, 35, 38, 40, 40, 40, with standard deviation equal to 8.711. Take Cluster 1: a_1 = 18 and S_A = 8.711, so the interval of Cluster 1 is 18 to 26 (18+8). This cluster includes five data points: 18, 20, 22, 24 and 25. Similarly, the interval of Cluster 2 is 29 to 37 (29+8), which includes two data points, 29 and 35, and the interval of Cluster 3 is 38 to 46 (38+8), which includes four data points, 38, 40, 40 and 40. After data clustering, the data are represented by intervals. In the DS rule, we need to generate the basic probability assignment m to calculate the basic probability number m(A). The basic probability assignment m is a function that assigns a value in [0,1] to every subset A. We calculate m(A) based on the number of data points falling in each interval. Let m(A_i) be the basic probability of Cluster i:

m(A_i) = n_i / N
-15-.

(24) where n_i is the number of data points in Cluster i and N is the total number of data points. In the above example, the basic probability number m(A_1) of Cluster 1 is:

m(A_1) = n_1 / N = 5/11 = 0.454545

That is, this method satisfies the principles of the DS rule:

m(∅) = 0,  Σ m(A) = 1

After generating the basic probabilities, we combine the data from multiple sources with the Dempster-Shafer rule of combination. 3.3 The concept of the Dempster-Shafer rule of combination. The Dempster-Shafer rule of combination is central to the original conception of Dempster-Shafer theory. It combines multiple belief functions through their basic probability assignments (m). The combination (m12) is computed from the aggregation of two basic probability assignments m1 and m2 in the following manner [5]:

m12(A) = [ Σ_{E_k ∩ E'_k = A} m1(E_k) m2(E'_k) ] / (1 − K)    (1)

K = Σ_{E_k ∩ E'_k = ∅} m1(E_k) m2(E'_k)

where K is the conflict and E_k, E'_k are the intervals of the different data sources. In this research, the basic probabilities of m12 represent the state of traffic -16-.

(25) congestion. For example, m12(A) is the basic probability of service level A after combination, m12(B) is the basic probability of service level B after combination, and so on. m1 and m2 are the basic probability assignments of two different data sources, and E_k and E'_k are the data intervals of the clusters generated by these two data sources. In this method, we categorize speed data into different intervals to represent different levels of traffic congestion. We refer to the service level standard in the Highway Capacity Manual (HCM), shown in Table 3-1 [12].

Table 3-1 Service Level Standard (average travel speed, kph)
Road Level (Free Flow Speed, kph): Ⅰ (55) | Ⅱ (45) | Ⅲ (40)
Service Level A: ~51 | ~43 | ~33
Service Level B: 51~39 | 43~32 | 33~25
Service Level C: 39~34 | 32~27 | 25~20
Service Level D: 34~29 | 27~23 | 20~16
Service Level E: 29~21 | 23~17 | 16~10
Service Level F: 21~ | 17~ | 10~

While computing the combination, we still have one more problem to take care of: uncertainty due to the interval classification. For example, the intersection of E_k and E'_k may belong to level A and level B simultaneously. If we let it belong to levels A and B at the same time, m1(E_k)m2(E'_k) will be counted twice and Σ m12(A) > 1, which violates the DS rule. So when this situation happens, we propose the following two steps to solve the issue: -17-.

(26) Step 1: we compute it with a ratio. For a service level A, let

R_A = |E_k ∩ E'_k ∩ A| / |E_k ∩ E'_k|

where |·| denotes the length of an interval, so R_A is the fraction of the intersection that falls in level A. Equation (1) then becomes

m12(A) = [ Σ_{E_k ∩ E'_k overlapping A} R_A · m1(E_k) m2(E'_k) ] / (1 − K)    (2)

That is, when E_k ∩ E'_k spans levels A and B, R_A · m1(E_k)m2(E'_k) is counted in the sum for level A and R_B · m1(E_k)m2(E'_k) is counted in the sum for level B. After computing m12 in this way, we have m12(A), m12(B), m12(C) and so on. Step 2: we then refine these basic probabilities of m12. If m12(A) > m12(B), then m1(E_k)m2(E'_k) for an intersection that spans levels A and B is counted only in the sum for level A, and not in the sum for level B. After these two steps, the largest basic probability of m12 becomes more significant, so the new basic probability assignment is more convenient for deciding the traffic situation. We then take the maximum m12(·). The data from two different sources can be combined by following the above procedures. If there are more than two data sources, we can extend the Dempster-Shafer rule of (1) as follows [13]:

m(∅) = 0
m(A) = [ Σ_{B_1 ∩ ... ∩ B_p = A} Π_{1 ≤ i ≤ p} m_i(B_i) ] / (1 − K),  if K ≠ 1    (3)
K = Σ_{B_1 ∩ ... ∩ B_p = ∅} Π_{1 ≤ i ≤ p} m_i(B_i)

where p is the number of data sources. -18-.
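To keep the whole chapter concrete, the following is a minimal, self-contained sketch of the adjusting, clustering and combination pipeline of Sections 3.1-3.3. It is an illustration under stated assumptions rather than the implementation used in this thesis: the function names and data layout are assumptions of the sketch, the service-level bands are the road level Ⅱ values of Table 3-1 with wide outer caps added for levels A and F, and the Step 2 reassignment is omitted for brevity.

```python
import statistics
from itertools import product

# Service-level speed bands (km/h) for a road level II segment, following
# Table 3-1.  The outer bounds of levels A and F are open-ended in the HCM
# table, so wide caps (200 and 0) are used here purely for illustration.
LEVELS_II = {"A": (43.0, 200.0), "B": (32.0, 43.0), "C": (27.0, 32.0),
             "D": (23.0, 27.0), "E": (17.0, 23.0), "F": (0.0, 17.0)}

def weight(speeds):
    """Reliability weight w = n / s^2 from Section 3.1 (sample std dev)."""
    s = statistics.stdev(speeds)
    return len(speeds) / (s * s)

def adjust(sources):
    """Shift every source so its mean matches the highest-weight source."""
    target = max(sources, key=lambda name: weight(sources[name]))
    target_mean = statistics.mean(sources[target])
    return {name: [v + (target_mean - statistics.mean(vals)) for v in vals]
            for name, vals in sources.items()}

def cluster(speeds):
    """Group sorted speeds into intervals one standard deviation wide and
    return a basic probability assignment {(low, high): probability}."""
    speeds = sorted(speeds)
    width = statistics.stdev(speeds)
    bpa, start = {}, 0
    while start < len(speeds):
        lo, end = speeds[start], start
        while end + 1 < len(speeds) and speeds[end + 1] <= lo + width:
            end += 1
        bpa[(lo, lo + width)] = (end - start + 1) / len(speeds)
        start = end + 1
    return bpa

def combine_to_levels(bpa1, bpa2, levels=LEVELS_II):
    """Dempster combination of two interval bpa's, splitting mass that
    straddles several service levels in proportion to the overlap length
    (Step 1 of Section 3.3).  The Step 2 reassignment is omitted."""
    fused = {lvl: 0.0 for lvl in levels}
    conflict = 0.0
    for (e1, w1), (e2, w2) in product(bpa1.items(), bpa2.items()):
        lo, hi = max(e1[0], e2[0]), min(e1[1], e2[1])
        if hi <= lo:                       # empty intersection -> conflict K
            conflict += w1 * w2
            continue
        for lvl, (blo, bhi) in levels.items():
            overlap = max(0.0, min(hi, bhi) - max(lo, blo))
            fused[lvl] += (overlap / (hi - lo)) * w1 * w2   # ratio R_lvl
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict (K = 1)")
    return {lvl: v / (1.0 - conflict) for lvl, v in fused.items()}

# Hypothetical speed samples (km/h) from two sources.
sources = {"probe": [18, 20, 22, 24, 25, 29, 35, 38, 40, 40, 40],
           "vd":    [21, 23, 24, 26, 27, 30, 33]}
adjusted = adjust(sources)
result = combine_to_levels(cluster(adjusted["probe"]), cluster(adjusted["vd"]))
print(result, "->", max(result, key=result.get))
```

Running the sketch on the two hypothetical speed lists at the bottom prints the fused service-level probabilities and the level with the largest mass; with real feeds, the speed lists would simply be replaced by the filtered probe and detector speeds.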

(27) Chapter 4 Model Testing. In this chapter, we test the data fusion model described in Chapter 3. First, we need to collect speed data for model testing. We use two kinds of data: real data and simulated data. Real data testing can evaluate how well our model works in the real world; however, real data covering both detectors and probe vehicles are rare, so we also need simulated data for a more comprehensive test. Therefore, we test the model with real data first, then generate simulated data based on the real data distribution, and finally analyze the testing results and summarize our findings. 4.1 The data collection and generation. 4.1.1 Probe vehicle data collection and analysis. There are in total 375 probe vehicles in Taichung (Taiwan), including 250 buses and 125 taxis. The GPS raw data of the probe vehicles are provided by the website of the Taichung City Government. The data include the ID of the on-board unit, route, next station, terminal station, longitude, latitude, speed and direction of the bus, as shown in Figure 4-1. Except for route, next station and terminal station, the data format of the taxis is the same as that of the buses. -19-.
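As a small illustration of the record layout just described, the sketch below shows how one GPS report might be represented after parsing. The field names, types and example values are assumptions for illustration only; they are not the actual format of the Taichung feed.

```python
from dataclasses import dataclass

@dataclass
class ProbeRecord:
    """One parsed GPS report from a probe vehicle (hypothetical field names)."""
    obu_id: str            # ID of the on-board unit
    route: str             # bus route (empty for taxis)
    next_station: str      # next stop (empty for taxis)
    terminal_station: str  # terminal station (empty for taxis)
    longitude: float       # degrees
    latitude: float        # degrees
    speed_kph: float       # reported speed, km/h
    direction_deg: float   # travel direction angle, degrees

# Example: one hypothetical report.
rec = ProbeRecord("BUS-0042", "Route 88", "Station A", "Station Z",
                  120.68, 24.14, 23.5, 135.0)
print(rec.speed_kph)
```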

(28) Figure 4-1 The raw data of buses. In order to identify the real road segment where the bus is located, the bus traveling direction, and the usable speed data of the probe vehicles, we need to do some manipulation of the raw data. We describe these manipulation procedures as follows. 1. The real road segment where the bus is located: in most cases, the GPS point does not fall exactly on any road segment, so we have to determine which road segment the GPS data are related to. Based on the digital map shown in Figure 4-2, we can find the closest road segment to the bus. We set a search range first, and within this range we find the road segment with the shortest direct distance to the bus; that bus is then taken to be traveling along this road segment. -20-.

(29) Figure 4-2 The digital map. 2. The bus traveling direction: there are two possible directions on each road segment, and we need to know in which direction the bus is traveling. We have the direction angle of the bus from the raw data, and we get the starting point and the end point of every road segment from the GIS database. Based on the starting point and the end point, we can obtain the direction of every road segment, and we then compare this direction with the direction angle of the bus to determine the direction of the bus. 3. Usable speed data of probe vehicles: the bus is the major source of our real-world testing data. However, buses need to pick up and drop off passengers frequently, so we have to filter out these data in order to have a more accurate estimation. We found several filtering methods in the literature, such as the Kalman filter, the mean filter, etc. However, these methods usually require a large amount of available data; with only 375 probe vehicles at hand, they are not suitable for us. So we use a simple filtering method: we set a threshold to filter out the unusual data. In this -21-.

(30) research, we set the threshold to 10 km/hr; that is, speed data below 10 km/hr are deleted. 4.1.2 VD (vehicle detector) data collection and analysis. There are two VDs installed on two road segments in Taichung. These VDs can detect multiple lanes at the same time; Figure 4-3 shows a VD installed at a specific intersection. The collected data include the VD number, time, collection frequency, lane number, total volume, average speed (averaged over 5 minutes), average occupancy, volume of small cars, volume of medium cars and volume of large cars. The data are shown in Figure 4-4. Figure 4-3 Detecting range of vehicle detector. Figure 4-4 The collected VD data. In most cases, VDs are installed close to the intersection, so the VD data may be -22-.

(31) influenced by the traffic signals. Thus, just like the probe vehicle data, we need to filter the speed data of the VDs. The filtering method is the same as for the probe vehicle data: speed data below 10 km/hr are deleted. 4.1.3 Simulated data generation. The data from both probe vehicles and VDs are not sufficient for a comprehensive test, so we generate simulated data to enrich our testing samples. These simulated data need to reflect the real-world data distribution, so we first use a lidar speed gun to collect real speed data and then apply a goodness-of-fit test, with α = 0.01, to find the distribution of the real data; this verifies that the real-world speed data follow a normal distribution. We assume the length of the target road is 360 meters and that bus arrivals follow a Poisson distribution. We assume the arrival rate of buses is 0.5 per minute and the frequency of data sending is 4 times per minute. The speed follows a normal distribution whose average speed and standard deviation are adjustable, and the simulation lasts for 2 hours. The simulation process for the bus data is shown in Figure 4-5. In the flowchart, tj is the time when the probe vehicle sends its GPS data to the center; Total distance is the length of the target road, with an initial value of 360 meters; T is the simulation time; Vj is the average speed during the time period tj−1 to tj. The first step generates buses until the tj of a generated bus is greater than T. We then use a similar process to simulate the data obtained from the VD, with the additional assumption that the arrival rate of vehicles is 10 per minute. -23-.

(32) [Figure 4-5 The simulation process: generate the arrival time t0 of a bus from a Poisson process with rate 0.5 per minute; set Total distance = Total distance − (t0 × average speed) and j = 1; then repeatedly set tj = tj−1 + 15, stop simulating if tj > T, stop simulating this bus if Total distance ≤ 0, otherwise generate the speed Vj, set Total distance = Total distance − (15 × Vj) and increase j.] We can adjust the arrival rate and the frequency of data sending to simulate different data sources, and we can also adjust the average speed and standard deviation to simulate different data sources or different service levels of the road. With these simulated data, we can perform a more comprehensive test of our model. -24-.
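The following is a small sketch of this simulation in Python. It is a simplified rendering of the flowchart rather than the thesis code: buses arrive as a Poisson process, report a speed every 15 seconds, and leave the sample once they have covered the 360 m link; the unit conversions and the handling of the first report interval are choices of the sketch, and the VD-like source is approximated simply by raising the arrival rate to 10 per minute.

```python
import random

def simulate_probe_speeds(mean_kph, sd_kph, sim_minutes=120,
                          arrival_rate_per_min=0.5, report_period_s=15,
                          link_length_m=360, seed=0):
    """Simulated speed reports for one link: vehicles arrive as a Poisson
    process, report a speed every `report_period_s` seconds, and leave the
    sample once they have covered `link_length_m` or the simulation ends."""
    rng = random.Random(seed)
    samples = []
    horizon = sim_minutes * 60          # simulation horizon in seconds
    t = 0.0                             # arrival clock in seconds
    while True:
        t += rng.expovariate(arrival_rate_per_min / 60.0)   # next arrival
        if t > horizon:
            break
        remaining, report_time = link_length_m, t
        while remaining > 0 and report_time <= horizon:
            v = max(0.0, rng.gauss(mean_kph, sd_kph))        # speed, km/h
            samples.append(v)
            remaining -= v / 3.6 * report_period_s           # metres covered
            report_time += report_period_s
    return samples

# A probe-like source (0.5 arrivals/min) and a denser VD-like source.
probe_speeds = simulate_probe_speeds(mean_kph=34, sd_kph=9)
vd_like_speeds = simulate_probe_speeds(mean_kph=34, sd_kph=3,
                                       arrival_rate_per_min=10)
print(len(probe_speeds), len(vd_like_speeds))
```

The mean and standard deviation arguments play the role of the adjustable scenario parameters listed later in Table 4-6.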

(33) 4.2 The model testing. 4.2.1 Real data testing. Our data sources are probe vehicles (buses and taxis) and VDs on a specific road segment in Taichung. The real data are described as follows. 1. Collection time: 16:00-19:00 (peak hours), November 14, 2003. 2. Volume of collected data: probe vehicles: 130 (after filtering); VD: 33. Before testing, we show the road service level when there is only one data source.

Table 4-1 The results of single data source
Level | Probe vehicles | VD
A | 0 | 0
B | 0.1308 | 0.2121
C | 0.1231 | 0.0303
D | 0.2385 | 0.1515
E | 0.2769 | 0.6061
F | 0.2307 | 0

Then, we follow the fusion model described in Chapter 3 step by step to compute the fused information. Step 1: Data adjusting. We compute the weights of the probe vehicles and VDs first; the resulting weights are shown in Table 4-2. -25-.

(34) Table 4-2 The results of weights
Source | Standard deviation | Volume | Weight
Probe vehicle | 7.036 | 130 | 2.626
VD | 5.462 | 33 | 1.106

According to the weights, we choose the probe vehicles as our shifting target. The average speed of the probe vehicles is 22.6 and the average speed of the VD is 23.9, so we shift the VD speed data by subtracting 1.3 from each value. Step 2: Data clustering. We use the standard deviation to cluster the data, which means that the interval width in the probe vehicle clusters is 7 and in the VD clusters is 5. The results of the clustering are shown in Table 4-3.

Table 4-3 The results of clustering
Probe vehicle cluster | Probability | VD cluster | Probability
11-18 | 0.3154 | 17-22 | 0.6061
20-27 | 0.4539 | 24-29 | 0.1818
29-36 | 0.1923 | 32-37 | 0.2121
37-44 | 0.0385 | |

Step 3: Data fusion. We adopt the definitions of service levels specified in the Highway Capacity Manual (HCM), shown in Table 3-1. In this test, the road level of this specific road segment is Ⅱ; the service-level speeds for road level Ⅱ are shown in Table 4-4. -26-.

(35) Table 4-4 Road level Ⅱ
Road Level: Ⅱ (Free Flow Speed 45 kph)
Service Level | Avg Travel Speed (kph)
A | ~43
B | 43~32
C | 32~27
D | 27~23
E | 23~17
F | 17~

Using Equation (1) in Section 3.3, m1 is the basic probability assignment of the probe vehicles, so m1(E1) = 0.3154, m1(E2) = 0.4539, etc., and m2 is the basic probability assignment of the VDs, so m2(E1) = 0.6061, m2(E2) = 0.1818, etc. Take m12(B) as an example:

m12(B) = [ Σ_{E_k ∩ E'_k = B} m1(E_k) m2(E'_k) ] / (1 − K)

K = Σ_{E_k ∩ E'_k = ∅} m1(E_k) m2(E'_k)

The interval of m12(B) lies between 32 and 43 (see Table 4-4). This means that if the intersection of E_k and E'_k lies between 32 and 43, as for E3 of the probe vehicles and E3 of the VD, we add their product to the numerator; and if the intersection of E_k and E'_k is the empty set, as for E3 of the probe vehicles and E1 of the VD, we add their product to K. In Section 3.3, there are two fusion steps: one computes with a ratio and the other refines the output of Step 1. The outputs are shown in Table 4-5. -27-.

(36) Table 4-5 The outputs of fusion model
Level | Step 1 probability | Step 2 probability
A | 0 | 0
B | 0.0774 | 0.0774
C | 0.0879 | 0.0553
D | 0.0978 | 0.1304
E | 0.7369 | 0.7369
F | 0 | 0

We take the maximum probability, so the service level in this test is E. Comparing Table 4-5 with Table 4-1: when there is only the probe vehicle source, the maximum probability is 0.2769, and it is 0.6061 for the VD alone, whereas our fusion result is 0.7369. This means that our fusion model can produce more accurate information than a single data source does. In order to evaluate the testing result, we use the lidar speed gun to collect speed data. The average speed of these data is 27.4 and the standard deviation is 7.1. The average speed from the lidar speed gun, about 27 km/hr, differs from that of the probe vehicles and VDs; this difference may result from stopped or stopping vehicles being ignored by the data collectors. Since the difference (about 4 km/hr) is smaller than the standard deviation of the lidar speed data, the fusion result is acceptable. -28-.
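As a quick check of the Chapter 3 sketch against this real-data case, the Table 4-3 intervals can be fed in directly as the two basic probability assignments (this assumes the combine_to_levels function from the sketch at the end of Chapter 3 is in scope). With that simplified sketch, level E again receives by far the largest mass, although the exact probabilities differ somewhat from Table 4-5 because the interval boundary handling and the Step 2 reassignment are not reproduced.

```python
# Interval bpa's taken from Table 4-3 (after adjusting and clustering).
probe_bpa = {(11, 18): 0.3154, (20, 27): 0.4539,
             (29, 36): 0.1923, (37, 44): 0.0385}
vd_bpa = {(17, 22): 0.6061, (24, 29): 0.1818, (32, 37): 0.2121}

fused = combine_to_levels(probe_bpa, vd_bpa)  # sketch from end of Chapter 3
print(fused)
print("service level:", max(fused, key=fused.get))  # level E dominates here
```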

(37) 4.2.2 Simulated data testing. In this section, we design different scenarios for testing. In each scenario, we have two simulated data sources (three in Scenario 6), and we can adjust the mean, standard deviation and collection frequency of the simulated data sources. We describe the scenarios and their results as follows. 1. The scenarios. Scenario 1: the means of source 1 and source 2 are the same and are close to the real average speed; the standard deviations of source 1 and source 2 are small. Scenario 2: the means of source 1 and source 2 are the same and are close to the real average speed; the standard deviation of source 1 is large and the standard deviation of source 2 is small. Scenario 3: the means of source 1 and source 2 are the same and are close to the real average speed; the standard deviations of source 1 and source 2 are large. Scenario 4: the mean of source 1 is smaller than the real average speed, and the mean of source 2 is near the real average speed; the standard deviation of source 1 is large and the standard deviation of source 2 is small. Scenario 5: the mean of source 2 is smaller than the real average speed, and the mean of source 1 is near the real average speed; the standard deviation of source 1 is small and the standard deviation of source 2 is large. Scenario 6: -29-.

(38) There are three data sources in this scenario. The means of two of the sources are near the real average speed with small standard deviations, while the remaining source has a smaller mean and a large standard deviation (see Table 4-6). In this scenario, we use Equation (3) in Section 3.3. The means and standard deviations of the sources in these scenarios are shown in Table 4-6.

Table 4-6 The scenario
Scenario | Source | Mean | Standard deviation | Real average speed
1 | Source1 | 45 | 3 | 45 (A)
1 | Source2 | 45 | 3 |
2 | Source1 | 34 | 9 | 35 (B)
2 | Source2 | 34 | 3 |
3 | Source1 | 29 | 9 | 30 (C)
3 | Source2 | 29 | 9 |
4 | Source1 | 22 | 9 | 35 (B)
4 | Source2 | 35 | 3 |
5 | Source1 | 25 | 3 | 25 (D)
5 | Source2 | 18 | 9 |
6 | Source1 | 25 | 9 | 35 (B)
6 | Source2 | 34 | 3 |
6 | Source3 | 35 | 3 |

2. Data adjusting -30-.

(39) Table 4-7 The data adjusting
Scenario | Source | Volume | Standard deviation | Weight | Shift
1 | Source1 | 115 | 3.158 | 11.53 | 0
1 | Source2 | 2401 | 3.113 | 771.282 | 0
2 | Source1 | 153 | 9.148 | 1.828 | 0
2 | Source2 | 2368 | 3.018 | 259.982 | 0
3 | Source1 | 194 | 7.811 | 3.18 | +1
3 | Source2 | 2329 | 8.589 | 31.572 | 0
4 | Source1 | 201 | 7.861 | 3.253 | +11
4 | Source2 | 2398 | 3.017 | 263.45 | 0
5 | Source1 | 245 | 3.11 | 25.331 | -4
5 | Source2 | 2002 | 6.929 | 41.699 | 0
6 | Source1 | 174 | 7.564 | 25.644 | +9
6 | Source2 | 181 | 3.179 | 33.707 | +1
6 | Source3 | 2431 | 3.018 | 34.996 | 0

3. Data clustering

Table 4-8 The data clustering (cluster interval and its probability for each source)
Scenario 1 - Source1: 36-39 (0.0348), 40-43 (0.3217), 44-47 (0.4696), 48-51 (0.1739); Source2: 34-37 (0.0058), 38-41 (0.1274), 42-45 (0.4344), 46-49 (0.3561), 50-53 (0.0741), 54-57 (0.0021)
Scenario 2 - Source1: 10-19 (0.0784), 20-29 (0.2614), 30-39 (0.4248), 40-49 (0.1961), 50-59 (0.0392); Source2: 23-26 (0.0059), 27-30 (0.1068), 31-34 (0.4434), 35-38 (0.3737), 39-42 (0.0676), 43-46 (0.0025)
Scenario 3 - Source1: 12-19 (0.1237), 20-27 (0.2577), 28-35 (0.3918), 36-43 (0.1959), 44-51 (0.0258), 52-59 (0.0052); Source2: 10-18 (0.1095), 19-27 (0.3104), 28-36 (0.3714), 37-45 (0.1795), 46-54 (0.0253), 55-63 (0.0034), 64-72 (0.0004)
Scenario 4 - Source1: 21-28 (0.2091), 29-36 (0.3409), 37-44 (0.25), 45-52 (0.0955), 53-60 (0.0182); Source2: 23-26 (0.0033), 27-30 (0.0646), 31-34 (0.3674), 35-38 (0.4450), 39-42 (0.1134), 43-46 (0.0063)
Scenario 5 - Source1: 10-13 (0.0041), 14-17 (0.1633), 18-21 (0.3918), 22-25 (0.3878), 26-29 (0.0531); Source2: 10-16 (0.2992), 17-23 (0.3626), 24-30 (0.2493), 31-37 (0.0674), 38-44 (0.0205), 45-51 (0.0001)
Scenario 6 - Source1: 19-26 (0.1609), 27-34 (0.2931), 35-42 (0.3736), 43-50 (0.1724); Source2: 26-29 (0.0552), 30-33 (0.3094), 34-37 (0.4309), 38-41 (0.1989), 43-46 (0.0055); Source3: 25-28 (0.0144), 29-32 (0.1892), 33-36 (0.4949), 37-40 (0.2694), 41-44 (0.0313), 45-48 (0.0008)

4. Data fusion

Table 4-9 The data fusion (basic probability of each service level, Step 1 / Step 2)
Scenario 1: A 0.8170 / 0.9277; B 0.1830 / 0.0723; C 0 / 0; D 0 / 0; E 0 / 0; F 0 / 0
Scenario 2: A 0 / 0; B 0.828 / 0.9675; C 0.172 / 0.0324; D 0.0001 / 0.0001; E 0 / 0; F 0 / 0
Scenario 3: A 0.0267 / 0.0139; B 0.4494 / 0.6484; C 0.2118 / 0.2047; D 0.1024 / 0; E 0.185 / 0.33737; F 0.03 / 0
Scenario 4: A 0.004 / 0.004; B 0.7117 / 0.7695; C 0.1235 / 0.0657; D 0.0013 / 0.0013; E 0 / 0; F 0 / 0
Scenario 5: A 0 / 0; B 0 / 0; C 0.02 / 0; D 0.3393 / 0.2189; E 0.5412 / 0.6813; F 0.1 / 0.1
Scenario 6: A 0.0001 / 0.0001; B 0.9455 / 0.9873; C 0.0539 / 0.0121; D 0.0005 / 0.0005; E 0 / 0; F 0 / 0

4.3 Summary of fusion result. According to the results of the tests in Section 4.2, we summarize the findings as follows: 1. The fusion model can process the real speed data, and the outputs of the fusion model are better than the results of a single data source. 2. The fusion model is effective in Scenarios 1, 2 and 4. 3. The data adjusting is effective in Scenario 4 but not as good in Scenario 5, where the amount of data from the biased data source is much greater than that from the other source; although the standard deviation of the biased data source is larger, it still obtains the larger weight. 4. Step 2 of the data fusion has the most significant impact in every scenario, as it reinforces the biggest basic probability of m12. 5. When the mean is very close to the real speed data, the impact of the standard deviations is small and acceptable. -35-.

(44) Chapter 5 Conclusions and suggestions. 5.1 Conclusions. 1. Our proposed model can process real speed data and provide road service level information. 2. The outputs of the fusion model are better than the results from a single data source, which means that an information center can provide more accurate information after using our fusion model. 3. In most cases, the data used in Dempster-Shafer theory are discrete values; the data sources in this research are interval data, and we find that our fusion model is still suitable in this situation. 4. Our proposed fusion model is also suitable for more than two data sources; however, more data sources usually require more computation time. 5. Before the data shifting, we choose the data source with the higher weight as our shifting target. If the amount of data from a biased source is large, the biased data source will have the larger weight, which causes a worse result. 6. When the standard deviations of the data are large, the data are distributed sparsely. In this case, our fusion model performs somewhat poorly; for example, when the real speed corresponds to level C, we may identify it as level B or D. -36-.

(45) 5.2 Suggestions. 1. When we compute the weights of different data sources, we only consider the amount and standard deviation of the data. Historical data could be used to evaluate the accuracy of each data source, so that the weights can be computed in a more accurate way. 2. We find that the speed of buses is below that of general vehicles. If we use buses as probe vehicles, we need to adjust the speed data to reflect the real situation. 3. Because the level of traffic congestion is hard to quantify, other theories, such as fuzzy theory, could be adopted to set the levels. 4. Because we assume that the simulated speed data follow a normal distribution, most data are centered around the mean within one standard deviation; we therefore use one standard deviation to cluster the data, which fits our fusion model. However, we did not test other clustering methods, and there may be clustering methods that suit our fusion model better. -37-.

(46) References
1. D. J. Dailey, P. Harn, and P. Lin, "The Final Research Report of ITS Data Fusion", Washington State Transportation Center and Washington State Department of Transportation, April 1996.
2. P. K. Varshney, "Multisensor Data Fusion", Electronics & Communication Engineering Journal, December 1997.
3. John N. Ivan and Vaneet Sethi, "Data Fusion of Fixed Detector and Probe Vehicle Data for Incident Detection", Computer-Aided Civil and Infrastructure Engineering, 13:329-337, 1998.
4. D. J. Dailey, H. Xu and M. P. Haselkorn, "Data Fusion for Multimodel Traveler Information in a Wireless Environment", the Third World Congress on Intelligent Transport Systems, October 14-18, 1996.
5. S. C. Byun, D. B. Choi, B. H. Ahn and Hanseok Ko, "Traffic Incident Detection Using Evidential Reasoning based Data Fusion", 6th World Congress on Intelligent Transport Systems, Toronto, Canada, November 1999.
6. H. Wu, M. Siegel, R. Stiefelhagen and J. Yang, "Sensor Fusion Using Dempster-Shafer Theory", IEEE Instrumentation and Measurement Technology Conference, Anchorage, AK, USA, May 21-23, 2002.
7. Jeffrey A. Barnett, "Calculating Dempster-Shafer Plausibility", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 6, June 1991.
8. Huadong Wu, Mel Siegel, Rainer Stiefelhagen and Jie Yang, "Sensor Fusion Using Dempster-Shafer Theory", IEEE Instrumentation and Measurement Technology Conference, Anchorage, AK, USA, May 21-23, 2002.
9. Kari Sentz and Scott Ferson, "Combination of Evidence in Dempster-Shafer Theory", SAND 2002-0835, Unlimited Release, April 2002.
10. Keechoo Choi and YounShik Chung, "A Data Fusion Algorithm for Estimating Link Travel Time", Intelligent Transportation Systems, 7:235-260, 2002.
11. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.
12. Ministry of Transportation and Communications, Taiwan, Highway Capacity Manual, 1991.
13. Sylvie Le Hegarat-Mascle, Isabelle Bloch and D. Vidal-Madjar, "Application of Dempster-Shafer Evidence Theory to Unsupervised Classification in Multisource Remote Sensing", IEEE Transactions on Geoscience and Remote Sensing, Vol. 35, No. 4, July 1997.
14. R. J. Linn and D. L. Hall, "A Survey of Multi-sensor Data Fusion Systems", Proceedings of the SPIE - The International Society for Optical Engineering, Vol. 1470, pp. 13-29, 1-2 April 1991.
15. L. Zhang, "Representation, independence, and combination of evidence in the Dempster-Shafer theory", in Advances in the Dempster-Shafer Theory of Evidence, R. R. Yager, J. Kacprzyk and M. Fedrizzi (eds.), John Wiley & Sons, New York, pp. 51-69, 1994.
-39-.

