Application of Aerial Video for Traffic Flow Monitoring
and Management
Researchers at the University of Arizona are investigating
the potential of remotely sensed data, using aerial video, to
enhance existing data sources and therefore to improve traffic
management. This research has developed new methods for both
data collection and image processing of remotely sensed data.
While the use of remotely sensed data to monitor traffic flows
is not new, this research is examining the integration of digital
video, global positioning systems (GPS), and automated image
processing to improve the accuracy and cost-effectiveness of
data collection and reduction. Several different aerial platforms
are under investigation for data collection. Experiments in which
aerial video and GPS data were collected from these platforms over
a major arterial and a freeway are described. A technique has been
developed to process the
GPS data and the digital imagery, in near real time, to estimate
vehicle speeds directly from the video images. The algorithm
is capable of detecting vehicles and measuring their speeds,
using image registration and edge detection techniques. When
vehicles are matched between two frames, their speeds can be
estimated directly. In addition, similar techniques are being
used to derive other traffic flow parameters, such as densities,
travel times, delays, turning counts, queue lengths, and measures
of platoon dispersion. The automated image processing significantly
reduces the amount of time needed to generate these traffic
data.
Pitu Mirchandani, Mark Hickman, Alejandro Angel, and Dinesh
Chandnani
Traffic flows in many large urban areas are monitored and managed
using data from a variety of sensors. The traditional technologies
for traffic sensing, including inductive loop detectors and
video cameras, are positioned at fixed locations in the transportation
network. While these detectors do provide useful information
and data on the traffic flows at particular points, they generally
do not provide useful data for traffic flows over space. It
is not possible to move the detectors, and they are not capable
of providing data on vehicle trajectories and paths through
the network. It is also difficult to use these sensors to identify
traffic flow characteristics (speeds, acceleration/deceleration,
and routing information) of individual vehicles.
The use of aerial platforms for traffic data collection has appeal
as a supplement to the traditional traffic sensors. Aerial photography
and video have two useful properties: they are mobile, so they may
be deployed wherever traffic data are needed, and they capture the
movements of individual vehicles in both time and space. When aerial
data are used in conjunction with traditional ground-based sensors,
more comprehensive data can be collected for traffic monitoring
and management.
With this in mind, the US Department of Transportation, with
the support of NASA, has initiated the National Consortia on
Remote Sensing in Transportation (NCRST). One of the four consortia
focuses on traffic flows (NCRST-F). The goal of the initial
four-year research effort of the NCRST-F consortium is to develop
and demonstrate methods to use satellite and airborne sensing
platforms to enhance traffic flow monitoring and management.
With advances in digital sensing platforms, image processing,
and computational speed, there are significant opportunities
to automate traffic data collection. The research described
in this article has been supported by the NCRST-F, and represents
a particular investigation of the use of airborne platforms
for traffic data collection. Within this scope, we are investigating
both off-line and on-line applications of remotely sensed traffic
data.
This article describes our current research in this
area. While the use of remotely sensed data to monitor
traffic flows is not new, this research is examining
the integration of digital video, GPS, and automated image processing
to improve the accuracy and cost-effectiveness of data collection
and reduction. To this end, the second section of the article
gives a description of the data collection methods we are using.
In the third section, an algorithm for image processing is described,
in which video images are processed to track individual vehicles
and to derive their speeds. Some results using data from several
helicopter flights in Tucson, Arizona are given in the fourth
section. The article finally offers some conclusions on the
applicability of this technique for traffic monitoring and management.
Data Collection
The experiments to date have used digital video imagery, taken
from an airborne platform, to observe traffic behavior. In these
experiments, a consumer-grade digital video camera was mounted
vertically on a skid of a helicopter, to minimize visual distortion
of the imagery. At the same time, the altitude and position
of the helicopter were recorded using a GPS receiver. To register
the video frames to known geographic locations, ground control
points were used in an initial experiment (May 2001). A subsequent
experiment (May 2002) used an inertial measurement unit (IMU)
to collect data on the roll, pitch, and yaw of the helicopter.
High-precision GPS and IMU data are then used to automatically
register the imagery, providing an accurate means of geo-referencing.
The video camera captures imagery on a 720x480 pixel sensor with
three (RGB) color bands. The video imagery is collected
at 30 frames per second, but previous experiments suggest that
the imagery can be sampled at rates of one frame per second
or less with minimal loss of traffic information. Also, with
a 35 mm equivalent camera focal length of 42 mm, a field of
view of approximately 220 m (720 ft) was obtained at an altitude
of approximately 260 m (850 ft) above ground. This field of
view allows a considerable number of vehicles to be included
in the image, such as a large “platoon” of vehicles or the full
queues on all approaches to an intersection. For our vehicle
detection technique, the scale of the frames is typically between
three and four pixels/m.
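To make the relationship among focal length, altitude, field of
view, and image scale concrete, the following sketch (in Java, the
language used for our image-processing code) reproduces this
calculation. The 36 mm reference film width assumed for the 35 mm
equivalent conversion, and the class and method names, are
illustrative assumptions of the sketch rather than part of the
production software.

```java
// Sketch: ground footprint and image scale for a nadir-pointing camera,
// using the pinhole model:  groundWidth / altitude = filmWidth / focalLength.
// Assumes a 36 mm reference film width for the 35 mm-equivalent focal length.
public class FrameScale {

    // Ground width (m) covered across the image at the given altitude.
    static double groundWidthMeters(double focalLengthMm, double altitudeM) {
        final double FILM_WIDTH_MM = 36.0;   // 35 mm-equivalent reference width
        return altitudeM * FILM_WIDTH_MM / focalLengthMm;
    }

    // Image scale in pixels per meter across the frame width.
    static double pixelsPerMeter(int imageWidthPx, double focalLengthMm, double altitudeM) {
        return imageWidthPx / groundWidthMeters(focalLengthMm, altitudeM);
    }

    public static void main(String[] args) {
        double ground = groundWidthMeters(42.0, 260.0);    // roughly 220 m field of view
        double scale = pixelsPerMeter(720, 42.0, 260.0);   // roughly 3.2 pixels/m
        System.out.printf("field of view: %.0f m, scale: %.1f px/m%n", ground, scale);
    }
}
```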
In the course of this research, a series of data collection
efforts took place, using aerial video from a helicopter as
the primary source of traffic data. The primary objectives were
to study the video image properties for different types of transportation
facilities, and to characterize the quality of the data for
traffic measurements. Video of representative arterials and
freeways in the Tucson area (Figure 1), namely Speedway Boulevard
and I-10, was obtained using a handheld digital video camera
(Figure 2). From these flights, the data were used to develop
and enhance the video image processing algorithms.
Image Processing
Vehicle Recognition and Image
Registration
The analysis of video can be greatly enhanced through automated
image processing. Many researchers have studied this problem,
primarily from fixed camera locations. In contrast, the use of
a mobile camera platform introduces additional complications to
the image-processing task, as the motion of both the vehicles
and the camera platform must be considered. In addition, the
objective of this research has been to develop an algorithm
that can run close to “real time”—in this case, the image processing
must be completed in seconds (e.g., two-to-three seconds).
The algorithm presented below for vehicle detection and tracking
is still being developed and tested. It uses many well-established
image processing techniques for fixed cameras. However, the
main contribution of our technique is to compensate for the
motion of the camera platform. To do this, the idea is to detect
individual vehicles in a frame, to register consecutive frames,
and then to employ image transformation, image subtraction,
and object matching to track individual vehicles. The process we
are using is illustrated in Figure 3 and described below. The common
elements of the image processing technique are included for
completeness.
1. Two frames, one or two seconds apart, are extracted from
the video.
2. To accelerate vehicle identification, the frames are pre-processed.
Pre-processing operations include converting the frames to 8-bit
grayscale and trimming the areas of the frame that do not contain
the road (e.g., the top and bottom). A geo-referenced roadway
mask is used to define the roadway edges. [An example of the
result of pre-processing is illustrated in Figures 6 and 7 later
in the article.]
3. Standard edge detection is performed on the frames.
4. The frames are parsed to search for concentrations of white
pixels, which outline both moving and stationary objects. [Vehicle
identification is also shown in Figures 6 and 7 later in the
article.]
5. The coordinates of the center of each image are found from
the GPS and IMU data. The precise scale of each frame is calculated
from the camera’s focal length and the elevation of the helicopter
with respect to the road.
6. The second frame is registered to the first one using the
location, orientation and scale information from step 5.
7. The two frames are overlapped and subtracted. The overlap
between the two frames is usually, by intention, between 65%
and 85%, depending on the frame sampling interval, the image
scale, and the speed of the helicopter.
8. After subtraction, only moving objects (vehicles) should have
pixel values different from zero in the overlapping portion of the
two frames. However, in many cases stationary objects may appear
as moving objects (pixel values different from zero) because of
limitations in geo-referencing accuracy. This is solved by setting
a minimum speed threshold for moving objects (e.g., only objects
with a displacement of more than 20 pixels are retained). The
vehicles are thus identified. [A code sketch of this registration,
subtraction, and thresholding step follows the list.]
9. Matching the cars in the two frames is done using a minimum
cost matching algorithm. This algorithm is described in
more detail in the next section.
10. After matching the vehicles, vehicle trajectories can then
be derived from the time and position information in order to
obtain traffic parameters such as speed, density, spacing, etc.
This procedure is then repeated for all the frames.
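As an illustration of steps 6 through 8, the sketch below (in Java)
builds an affine transform from the GPS/IMU-derived shift, heading
difference, and scale ratio, subtracts the registered grayscale
frames, and applies the minimum-displacement test. The parameter
names and the array representation are assumptions of this sketch,
not the production code.

```java
import java.awt.geom.AffineTransform;
import java.awt.image.AffineTransformOp;
import java.awt.image.BufferedImage;

// Sketch of steps 6-8: register the second frame to the first, subtract the
// registered frames, and keep only objects whose displacement exceeds a
// minimum-movement threshold, so that residual geo-referencing error is not
// mistaken for vehicle motion.
public class FrameRegistration {

    // Step 6: translate, rotate, and scale frame 2 into the coordinate system of frame 1.
    static BufferedImage register(BufferedImage frame2,
                                  double shiftXPx, double shiftYPx,   // translation from GPS
                                  double headingDiffRad,              // rotation from IMU yaw
                                  double scaleRatio) {                // from relative altitude
        AffineTransform t = new AffineTransform();
        t.translate(shiftXPx, shiftYPx);
        t.rotate(headingDiffRad, frame2.getWidth() / 2.0, frame2.getHeight() / 2.0);
        t.scale(scaleRatio, scaleRatio);
        AffineTransformOp op = new AffineTransformOp(t, AffineTransformOp.TYPE_BILINEAR);
        return op.filter(frame2, null);
    }

    // Step 7: absolute difference of two registered 8-bit grayscale frames over their overlap.
    static int[][] subtract(int[][] g1, int[][] g2) {
        int rows = Math.min(g1.length, g2.length);
        int cols = Math.min(g1[0].length, g2[0].length);
        int[][] diff = new int[rows][cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                diff[r][c] = Math.abs(g1[r][c] - g2[r][c]);
        return diff;
    }

    // Step 8: treat an object as moving only if its displacement between the
    // frames exceeds the threshold (e.g., 20 pixels).
    static boolean isMoving(double x1, double y1, double x2, double y2, double minDisplacementPx) {
        return Math.hypot(x2 - x1, y2 - y1) > minDisplacementPx;
    }
}
```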
Vehicle Matching and Speed Estimation
Steps 9 and 10 of the algorithm presented above determine the
motion of cars and the speeds of the various cars detected.
There are two main algorithms that were used for the matching
of the cars in consecutive frames:
-- Greedy algorithm
-- Minimum cost matching for a weighted bipartite graph
(the assignment problem)
The cars detected in the first frame are to be matched with
the cars in the second frame. A greedy algorithm is implemented
for matching, as follows:
1. Take a car in the first frame if it has not yet been selected
and initialize a variable minimum_distance to a very high value.
2. Assume the speed at which this car should travel and determine
the predicted location of this car in the second frame. The
algorithm is fairly robust to the assumed speed, when the speeds
are consistent across vehicles (e.g., on a freeway). One might
expect performance to be degraded for less consistent speeds
(e.g., on arterial roadways with signalized intersections).
3. Take each car of Frame 2 in turn and determine the Euclidean
distance between the predicted location of the car in the first
frame and the current car in the second frame.
4. If this distance is below a certain threshold limit, and
it is less than minimum_distance, then record that value and
the car number as the possible match for the car in Frame 1.
Update the value of minimum_distance to this new value.
5. After all the cars in Frame 2 have been checked, and if a match
is found for the car in Frame 1, make this tentative match permanent.
6. Repeat the above steps until all the cars in Frame 1 have been
evaluated, whether or not a match was found for each.
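A minimal sketch of this greedy matching loop, in Java, is given
below. The Car class, the assumed travel direction (along the image
x axis), and the distance threshold are illustrative placeholders
rather than the exact production implementation.

```java
import java.util.List;

// Sketch of the greedy matching of cars between two frames (steps 1-6 above).
public class GreedyMatcher {

    static class Car {
        double x, y;   // position in geo-registered frame coordinates (pixels)
        Car match;     // matched car in the other frame, if any
        Car(double x, double y) { this.x = x; this.y = y; }
    }

    /**
     * @param frame1          cars detected in the first frame
     * @param frame2          cars detected in the second frame
     * @param predictedShift  displacement (pixels) implied by the assumed speed and frame interval
     * @param maxDistancePx   threshold beyond which a candidate is not accepted as a match
     */
    static void match(List<Car> frame1, List<Car> frame2,
                      double predictedShift, double maxDistancePx) {
        for (Car c1 : frame1) {
            double px = c1.x + predictedShift;      // step 2: predicted location in Frame 2
            double py = c1.y;                       // assumes travel along the image x axis
            double minDistance = Double.MAX_VALUE;  // step 1: initialize to a very high value
            Car best = null;
            for (Car c2 : frame2) {                 // steps 3-4: scan the cars in Frame 2
                double d = Math.hypot(c2.x - px, c2.y - py);
                if (d < maxDistancePx && d < minDistance) {
                    minDistance = d;
                    best = c2;
                }
            }
            c1.match = best;                        // step 5: make the tentative match permanent
        }
    }
}
```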
This matching algorithm was carried out on various images taken
at different intervals. The various resolutions and time intervals
along with the performance of this algorithm are given below:
-- Resolution 720 x 480 pixels, time interval 2 seconds: the greedy
algorithm worked very well, with about 90-100% correct matching
relative to the actual matching of the cars. (The actual matching
was done manually and compared with the results obtained by the
greedy algorithm.)
-- Resolution 720 x 480 pixels, time interval 5 seconds: the greedy
algorithm again worked very well, with about 90-100% correct matching.
-- Resolution 2265 x 300 pixels, time interval 10 seconds: the greedy
algorithm did not work well, giving only about 50-60% correct matching.
A second method using minimum cost matching was also investigated.
To match the cars in the consecutive frames, a weighted bipartite
network was generated; this network was then used for matching
the cars. The algorithm to accomplish this is as follows:
1. Take a car in the first frame if it has not yet been selected.
2. Assume the speed at which this car should travel and determine
the possible location of this car.
3. Build a bipartite graph that has arcs from the predicted
location of the cars in Frame 1 to all of the cars in Frame
2. Take each car of Frame 2 in turn and make the weight assigned
to this arc a function of the distance between the predicted
location and the observed location (explained below).
4. Repeat the above steps for each of the cars of Frame 1.
5. Using the network so formed, perform the successive shortest
path algorithm to do the minimum cost matching of this weighted
bipartite graph.
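The sketch below illustrates the minimum cost matching step in
Java. The research implementation uses a successive shortest path
algorithm on the bipartite network; as a compact stand-in that
produces the same minimum-cost assignment for the dozen or so
vehicles detected in one frame pair, this sketch uses a dynamic
program over subsets of the Frame 2 cars, with an explicit penalty
that allows a car to remain unmatched. The cost matrix and penalty
would be supplied by the arc-weight methods described next.

```java
import java.util.Arrays;

// Stand-in for the minimum cost bipartite matching: a dynamic program over
// subsets of Frame-2 cars. cost[i][j] is the arc weight from car i in Frame 1
// to car j in Frame 2; unmatchedPenalty is charged when a car is left unmatched.
// Suitable for the small vehicle counts (roughly a dozen) seen in one frame pair.
public class MinCostMatcher {

    static int[] assign(double[][] cost, double unmatchedPenalty) {
        int n = cost.length;                       // cars in Frame 1
        int m = (n == 0) ? 0 : cost[0].length;     // cars in Frame 2
        double[] prev = new double[1 << m];
        Arrays.fill(prev, Double.POSITIVE_INFINITY);
        prev[0] = 0.0;                             // no Frame-2 cars used yet
        int[][] choice = new int[n + 1][1 << m];
        for (int i = 0; i < n; i++) {
            double[] next = new double[1 << m];
            Arrays.fill(next, Double.POSITIVE_INFINITY);
            for (int mask = 0; mask < (1 << m); mask++) {
                if (prev[mask] == Double.POSITIVE_INFINITY) continue;
                // Option 1: leave car i unmatched.
                if (prev[mask] + unmatchedPenalty < next[mask]) {
                    next[mask] = prev[mask] + unmatchedPenalty;
                    choice[i + 1][mask] = -1;
                }
                // Option 2: match car i to an unused car j in Frame 2.
                for (int j = 0; j < m; j++) {
                    if ((mask & (1 << j)) != 0) continue;
                    double c = prev[mask] + cost[i][j];
                    if (c < next[mask | (1 << j)]) {
                        next[mask | (1 << j)] = c;
                        choice[i + 1][mask | (1 << j)] = j;
                    }
                }
            }
            prev = next;
        }
        // Pick the cheapest final state and trace back the assignment.
        int bestMask = 0;
        for (int mask = 1; mask < (1 << m); mask++)
            if (prev[mask] < prev[bestMask]) bestMask = mask;
        int[] matchOf = new int[n];
        for (int i = n, mask = bestMask; i >= 1; i--) {
            int j = choice[i][mask];
            matchOf[i - 1] = j;                    // index in Frame 2, or -1 if unmatched
            if (j >= 0) mask &= ~(1 << j);
        }
        return matchOf;
    }
}
```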
The determination of the arc weights was carried out by two
different approaches, and the results of each of these were
compared. The two techniques are explained below.
-- Single ellipse method: Assume that the predicted location of
the car from Frame 1 is determined using some assumed speed at which
the car should travel. For a given car in Frame 2, the problem is
now to assign a weight to the link that joins these two locations.
In this method, we consider that it is more likely for a car to
gain or drop speed and remain in the same lane than to change lanes.
Thus we consider that around each car there are equal-likelihood
ellipses that define the likelihood of the car's predicted location.
These likelihood ellipses are oriented along the direction of motion
of the car; each ellipse is narrower across the direction of travel
than along it. This is demonstrated in Figure 4.
In Figure 4, it is seen that in this method, it is more likely
for the car in Frame 1 to be matched with car a of Frame 2 than
with car b of Frame 2, because the method assigns more cost to the
link incident on car b than to the link incident on car a. This
algorithm was tested on the same set of images as the greedy
algorithm, with the following results. For the two-second images,
the algorithm worked very well, with about 90-100% correct matching
relative to the actual matching of the cars. For the five-second
images, the algorithm also performed very well, again with about
90-100% correct matching. Finally, with the high-resolution 10-second
images, the algorithm achieved about 70-90% correct matching.
-- Two ellipse method: In this method, we assume, from the earlier
images, that we know whether the vehicles are slowing (platoon
contraction) or increasing speed (platoon dispersion). For example,
for platoon dispersion, we consider that it is more likely for
a car to gain speed rather than drop speed and also remain in
the same lane rather than change lanes. Thus we consider that
around each car there are two different ellipses that define
the likelihood function for the location of the car. One ellipse
is in the forward direction, while the other is in the backward
direction. The ratio of the major axis to the minor axis for
the ellipse in the backward direction is smaller than that in the
forward direction. This is demonstrated in Figure 5.
As shown in Figure 5, it is more likely in this method for the
car in Frame 1 to be matched with car a of Frame 2 than with car
b of Frame 2, because the method assigns more cost to the link
incident on car b than to the link incident on car a. Performance
with these arc weights was similar to that of the single ellipse
method.
In all of the matching techniques, the method uses an initial
estimate of the speed to determine a probability density function
for the vehicle location. The values from the density function
are used as the “costs” in the matching algorithm. In the tests
described above, this algorithm seems to perform reasonably
well on freeway segments, where the speeds are generally uniform.
For arterials, large speed variations take place, and it is
considerably more difficult to match vehicles by speed.
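The elliptical arc weights themselves can be sketched as follows
(in Java). Deviations across the direction of travel are penalized
more heavily than deviations along it, and in the two ellipse
variant, falling behind the predicted location is penalized more
heavily than overshooting it. The axis lengths in the example are
illustrative, not calibrated values from the research.

```java
// Sketch of the elliptical arc weights used in the bipartite matching.
public class EllipticalCost {

    /**
     * Arc weight between a predicted Frame-1 location and an observed Frame-2 location.
     *
     * @param dxAlong   signed deviation along the direction of travel (pixels);
     *                  positive means the observed car is ahead of the prediction
     * @param dyAcross  deviation across the direction of travel (pixels)
     * @param aForward  semi-axis along the travel direction when the car is ahead
     * @param aBackward semi-axis along the travel direction when the car is behind
     *                  (set equal to aForward to recover the single ellipse method)
     * @param bAcross   semi-axis across the travel direction (smaller than the others,
     *                  because a lane change is less likely than a speed change)
     */
    static double cost(double dxAlong, double dyAcross,
                       double aForward, double aBackward, double bAcross) {
        double a = (dxAlong >= 0) ? aForward : aBackward;
        double u = dxAlong / a;
        double v = dyAcross / bAcross;
        // Contours of equal cost are ellipses elongated along the direction of travel.
        return Math.sqrt(u * u + v * v);
    }

    public static void main(String[] args) {
        // Single ellipse: same along-track axis in both directions.
        System.out.println(cost(30, 0, 40, 40, 10));   // low cost: ahead, same lane
        System.out.println(cost(0, 30, 40, 40, 10));   // high cost: one lane over
        // Two ellipse (platoon dispersion): falling behind is penalized more.
        System.out.println(cost(-30, 0, 40, 20, 10));  // higher cost than 30 px ahead
    }
}
```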
Once the vehicles are matched, the speed of each vehicle is
computed by taking the distance between matched vehicles (i.e.,
pixels multiplied by the image scale) and dividing by the time
interval between the two images. An example of this process
is described in the following section.
Example Results
Two frames taken from a video captured in May 2002 are shown
in Figure 6 and Figure 7. These frames were taken two seconds
apart, as the helicopter moved along the freeway. For the purposes
of faster image processing, the images were converted to grayscale.
With these images, edge detection and feature (vehicle) identification
were performed, resulting in the blobs identified with bounding
boxes in Figures 6 and 7. One may note that not all vehicles
were recognized in each frame, particularly in Frame 2. Note also
that the precise location of each bounding box is not directly over
the vehicle, due to long shadows that appeared
in the imagery — the images were taken around 5 PM, with the
evening sun from the west (i.e., from the tops of the images).
Image registration and image subtraction were then performed on
these frames. The result of this process, for the two frames, is
shown in Figure 8. In this example, the
single ellipse method with the minimum cost-matching algorithm
was used to match vehicles. The results of the matching are
shown in Table 1. In this case, a total of five vehicles were
matched between the two frames. One might note from Figures
6 and 7 that Vehicle 8 in Frame 1 should have been matched with
Vehicle 10 from Frame 2; also, Vehicle 9 in Frame 1 should not
have been matched with Vehicle 10 in Frame 2. It appears that
Vehicle 9 in Frame 1 was not re-identified in Frame 2, due to
a failure of the edge detection/feature identification process.
Also, because of the movement of the helicopter, some vehicles
in the first frame were not observed in the second frame, and
vice versa.
The estimated speeds of the matched vehicles were calculated
using the image scale and the time between frames. Vehicle speeds
were fairly well clustered around 48-53 mph, as one might expect
along a freeway. The lone exception was Vehicle 9 (from Frame
1), which had an observed speed of over 70 mph. This anomalous
speed was due to the erroneous matching of Vehicle 9 with Vehicle
10 in Frame 2. Such an erroneous speed could be discovered by
a simple filter.
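The speed computation and such a plausibility filter can be sketched
as follows (in Java). The conversion factor from m/s to mph is exact;
the sample displacement, speeds, and median-deviation tolerance are
illustrative values, not those from the experiment.

```java
import java.util.Arrays;

// Sketch: convert a matched vehicle's pixel displacement to a speed in mph,
// and flag estimates that deviate strongly from the median of the matched set.
public class SpeedEstimator {

    // Pixel displacement between registered frames -> speed in mph.
    static double speedMph(double displacementPx, double pixelsPerMeter, double intervalSeconds) {
        double metersPerSecond = (displacementPx / pixelsPerMeter) / intervalSeconds;
        return metersPerSecond * 2.23694;   // m/s -> mph
    }

    // Simple filter: a speed far from the median of the matched vehicles is a likely mismatch.
    static boolean isPlausible(double speed, double[] allSpeeds, double toleranceMph) {
        double[] sorted = allSpeeds.clone();
        Arrays.sort(sorted);
        double median = sorted[sorted.length / 2];
        return Math.abs(speed - median) <= toleranceMph;
    }

    public static void main(String[] args) {
        // e.g., a 150 px displacement at 3.3 px/m over a 2 s interval is about 51 mph
        System.out.printf("%.1f mph%n", speedMph(150, 3.3, 2.0));
        double[] speeds = {48, 50, 51, 53, 72};            // illustrative matched speeds
        System.out.println(isPlausible(72, speeds, 15));   // false: likely a mismatch
    }
}
```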
Finally, to demonstrate the real-time nature of the proposed
algorithm, the processing time for each of the major steps is shown
in Figure 9. One run of the algorithm (i.e., to match two frames,
detect cars in the two frames, match the cars, and determine their
speeds) requires the following operations:
-- Displaying the picture twice (this step is not necessary)
-- Rotating the picture once
-- Edge detection once
-- Scaling the picture once
-- Translation once
-- Detecting cars once
-- Image subtraction once
Note that we need to perform each operation only once, since
under normal conditions one might need to process only one additional
image in a series of images. The algorithm was coded in Java
and run on a 1.8 GHz Pentium 4 personal computer with 256 MB
of RAM. The total CPU time for a 500x500 pixel (trimmed) image
is about 1.2 seconds. This meets the initial requirements for
processing in near real time for sample images (taken every
2 seconds). Further tests suggest that the CPU time is approximately
linear in the number of pixels, meaning that the full image
from the video camera (720x540) could be processed in real
time. To achieve similar results for higher resolution imagery,
frames would need to be sampled less frequently; or else some
additional processing power is needed.
Conclusions
In this article a simple technique for vehicle identification
and speed estimation is given. This technique, which we are
continuing to refine, shows promise that it may be run in real
time in the near future, allowing one to estimate individual
vehicle speeds from aerial imagery on a more or less continuous
basis. Testing and refinement are ongoing, to explore the performance
of the algorithms in terms of:
-- CPU (image processing) times as a function of image resolution,
scale, and field of view;
-- Communication requirements to send the results to a ground
station;
-- Detection error rates; and
-- Tracking error rates.
Acknowledgments
This research was supported by the National Consortium
on Remote Sensing in Transportation—Flows. The support
of the Research and Special Programs Administration at the US
Department of Transportation, and of NASA, is gratefully acknowledged.
About the Authors
Pitu Mirchandani is Professor, Department of Systems and
Industrial Engineering, ATLAS Research Center, University
of Arizona.
Mark Hickman is Assistant Professor, Department of Civil Engineering,
ATLAS Research Center, University of Arizona.
Alejandro Angel is a Graduate Researcher, ATLAS Research Center,
University of Arizona.
Dinesh Chandnani is a Graduate Researcher, ATLAS Research Center,
University of Arizona.