Prof. Christian S. Jensen. List of Publications


This page contains a list of research publications with abstracts and, generally, links to full paper versions.

Due to the copyright restrictions of some publishers, not all documents are available online on this page. If you cannot access a file you are interested in, please feel free to contact me.


2016

Xie, X., P. Jin, M.-L. Yiu, J. Du, M. Yuan, C. S. Jensen, "Enabling Scalable Geographic Service Sharing with Weighted Imprecise Voronoi Cells," in IEEE Transactions on Knowledge and Data Engineering, 28(2): 439–453, February 2016.

Publication

Online at IEEE Xplore Digital Library
We provide techniques that enable a scalable so-called Volunteered Geographic Services (VGS) system. This system targets the increasing populations of online mobile users, e.g., smartphone users, enabling such users to provide location-based services to each other, thus enabling citizen-reporter or citizen-as-a-sensor scenarios. More specifically, the system allows users to register as service volunteers, or micro-service providers, by accepting service descriptions and periodically updated locations from such volunteers; and the system allows users to subscribe to notifications of available, nearby relevant services by accepting subscriptions, formalized as continuous queries, that take service preferences and user locations as arguments and return relevant services. Services are ranked according to their relevance and distance to a query, and the highest ranked services are returned. The key challenge addressed is that of scalably providing up-to-date results to queries when the query locations change continuously. This is achieved by the proposal of a new so-called safe-zone model. With safe zones, query results are accompanied by safe zones with the property that a query result remains the same for all locations in its safe zone. Then, query users need only notify the system when they exit their current safe zone. Existing safe-zone models fall short in the paper's setting. The new model is enabled by (i) weighted and (ii) set weighted imprecise Voronoi cells. The paper covers underlying concepts, properties, and algorithms, and it covers applications in VGS tracking and presents findings of empirical performance studies.
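The safe-zone idea described in this abstract can be illustrated with a minimal sketch. It assumes a plain nearest-neighbor query over point sites, for which the safe zone is the ordinary (unweighted) Voronoi cell of the current result; it does not implement the weighted or set weighted imprecise Voronoi cells of the paper. The names `nearest`, `in_safe_zone`, and `Client` are hypothetical.

```python
import math

def nearest(sites, p):
    """Return the site closest to point p (ties broken by list order)."""
    return min(sites, key=lambda s: math.dist(s, p))

def in_safe_zone(sites, current_nn, p):
    """True if p lies in the Voronoi cell of current_nn, i.e. no site is closer."""
    d = math.dist(current_nn, p)
    return all(d <= math.dist(s, p) for s in sites)

class Client:
    """Hypothetical client that contacts the server only after leaving its safe zone.

    For brevity the cell test uses the full site list; a real system would
    send only the cell geometry to the client.
    """
    def __init__(self, sites, start):
        self.sites = sites
        self.result = nearest(sites, start)   # initial query
        self.server_contacts = 1

    def move(self, p):
        if not in_safe_zone(self.sites, self.result, p):
            self.result = nearest(self.sites, p)  # re-query on exit
            self.server_contacts += 1
        return self.result
```

In this sketch, a client moving within the cell of its current nearest site never contacts the server; only crossing a cell boundary triggers a new query.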
Lu, H., C. Guo, B. Yang, C. S. Jensen, "Finding Frequently Visited Indoor POIs Using Symbolic Indoor Tracking Data," in Proceedings of the 19th International Conference on Extending Database Technology, Bordeaux, France, pp. 449–460, March 15–18, 2016.

Publication

Online at OpenProceedings
Indoor tracking data is being amassed due to the deployment of indoor positioning technologies. Analysing such data discloses useful insights that are otherwise hard to obtain. For example, by studying tracking data from an airport, we can identify the shops and restaurants that are most popular among passengers. In this paper, we study two query types for finding frequently visited Points of Interest (POIs) from symbolic indoor tracking data. The snapshot query finds those POIs that were most frequently visited at a given time point, whereas the interval query finds such POIs for a given time interval. A typical example of symbolic tracking is RFID-based tracking, where an object with an RFID tag is detected by an RFID reader when the object is in the reader’s detection range. A symbolic indoor tracking system deploys a limited number of proximity detection devices, like RFID readers, at preselected locations, covering only part of the host indoor space. Consequently, symbolic tracking data is inherently uncertain and only enables the discrete capture of the trajectories of indoor moving objects in terms of coarse regions. We provide uncertainty analyses of the data in relation to the two kinds of queries. The outcomes of the analyses enable us to design processing algorithms for both query types. An experimental evaluation with both real and synthetic data suggests that the framework and algorithms enable efficient and scalable query processing.
2015

Kaul, M., R. C.-W. Wong, C. S. Jensen, "New Lower and Upper Bounds for Shortest Distance Queries on Terrains," in Proceedings of the VLDB Endowment, 9(3): 168–179, November 2015.

Publication [not publicly available]

Online at VLDB
The increasing availability of massive and accurate laser data enables the processing of spatial queries on terrains. As shortest-path computation, an integral element of query processing, is inherently expensive on terrains, a key approach to enabling efficient query processing is to reduce the need for exact shortest-path computation in query processing. We develop new lower and upper bounds on terrain shortest distances that are provably tighter than any existing bounds. Unlike existing bounds, the new bounds do not rely on the quality of the triangulation. We show how use of the new bounds speeds up query processing by reducing the need for exact distance computations. Speedups of nearly an order of magnitude are demonstrated empirically for well-known spatial queries.
Chen, L., Y. Gao, Z. Xing, C. S. Jensen, G. Chen, "I2RS: A Distributed Geo-Textual Image Retrieval and Recommendation System," in Proceedings of the VLDB Endowment, 8(12): 1884–1887, (demo paper), August 2015.

Publication [not publicly available]

Online at VLDB
Massive amounts of geo-tagged and textually annotated images are provided by online photo services such as Flickr and Zommr. However, most existing image retrieval engines only consider text annotations. We present I2RS, a system that allows users to view geo-textual images on Google Maps, find hot topics within a specific geographic region and time period, retrieve images similar to a query image, and receive recommended images that they might be interested in. I2RS is a distributed geo-textual image retrieval and recommendation system that employs SPB-trees to index geo-textual images, and that utilizes metric similarity queries, including top-m spatio-temporal range and k nearest neighbor queries, to support geo-textual image retrieval and recommendation. The system adopts the browser-server model, and the server is deployed in a distributed environment that enables efficiency and scalability to huge amounts of data and requests. A rich set of 100 million geo-textual images crawled from Flickr is used to demonstrate that I2RS can return high-quality answers in an interactive way and support efficient updates for high image arrival rates.
Jin, P., X. Xie, C. S. Jensen, Y. Jin, L. Yue, "HAG: An Energy-Proportional Data Storage Scheme for Disk Array Systems," Journal of Computer Science and Technology, 30(4): 679–695, July 2015.

Publication [not publicly available]

Online at SpringerLink
Energy consumption has been a critical issue for data storage systems, especially for modern data centers. A recent survey has shown that power costs amount to about 50% of the total cost of ownership in a typical data center, with about 27% of the system power being consumed by storage systems. This paper proposes a new approach to reducing the energy consumed by disk storage systems. Differing from previous approaches, we adopt two new designs. 1) We introduce a hotness-aware and group-based system model (HAG) to organize the disks, in which all disks are partitioned into a hot group and a cold group. We migrate files only between the two groups and avoid migration within a single group, which reduces the total cost of file migration. 2) We use an on-demand approach to reorganize files among the disks that is based on workload change as well as the change of data hotness. We conduct trace-driven experiments involving two real and nine synthetic traces and we make detailed comparisons between our method and competitor methods according to different metrics. The results show that our method can dynamically select hot files and disks when the workload changes and that it is able to reduce energy consumption for all the traces. Furthermore, its time performance is comparable to that of the compared algorithms. In general, our method exhibits the best energy efficiency in all experiments, and it is capable of maintaining an improved trade-off between performance and energy consumption.
Skovsgaard, A., C. S. Jensen, "Finding top-k relevant groups of spatial web objects," in The VLDB Journal, 24(4): 537–555, June 2015.

Publication [not publicly available]

Online at SpringerLink
The web is increasingly being accessed from geo-positioned devices such as smartphones, and rapidly increasing volumes of web content are geo-tagged. In addition, studies show that a substantial fraction of all web queries has local intent. This development motivates the study of advanced spatial keyword-based querying of web content. Previous research has primarily focused on the retrieval of the top-k individual spatial web objects that best satisfy a query specifying a location and a set of keywords. This paper proposes a new type of query functionality that returns top-k groups of objects while taking into account aspects such as group density, distance to the query, and relevance to the query keywords. To enable efficient processing, novel indexing and query processing techniques for single and multiple keyword queries are proposed. Empirical performance studies with an implementation of the techniques and real data suggest that the proposals are viable in practical settings.
Guo, C., B. Yang, O. Andersen, C. S. Jensen, K. Torp, "EcoMark 2.0: empowering eco-routing with vehicular environmental models and actual vehicle fuel consumption data," in GeoInformatica, 19(3): 567–599, July 2015.

Publication [not publicly available]

Online at SpringerLink
Eco-routing is a simple yet effective approach to substantially reducing the environmental impact, e.g., fuel consumption and greenhouse gas (GHG) emissions, of vehicular transportation. Eco-routing relies on the ability to reliably quantify the environmental impact of vehicles as they travel in a spatial network. The procedure of quantifying such vehicular impact for road segments of a spatial network is called eco-weight assignment. EcoMark 2.0 proposes a general framework for eco-weight assignment to enable eco-routing. It studies the abilities of six instantaneous and five aggregated models to estimate vehicular environmental impact. In doing so, it utilizes travel information derived from GPS trajectories (i.e., velocities and accelerations) and actual fuel consumption data obtained from vehicles. The framework covers analyses of actual fuel consumption, impact model calibration, and experiments for assessing the utility of the impact models in assigning eco-weights. The application of EcoMark 2.0 indicates that the instantaneous model EMIT and the aggregated model SIDRA-Running are suitable for assigning eco-weights under varying circumstances. In contrast, other instantaneous models should not be used for assigning eco-weights, and other aggregated models can be used for assigning eco-weights under certain circumstances.
Cao, X., G. Cong, C. S. Jensen, "Efficient Processing of Spatial Group Keyword Queries," in ACM Transactions on Database Systems, 40(2), Article 13, 48 pages, June 2015.

Publication [not publicly available]

ACM Author-Izer
With the proliferation of geo-positioning and geo-tagging techniques, spatio-textual objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group together satisfy a query.
We define the problem of retrieving a group of spatio-textual objects such that the group's keywords cover the query's keywords and such that the objects are nearest to the query location and have the smallest inter-object distances. Specifically, we study three instantiations of this problem, all of which are NP-hard. We devise exact solutions as well as approximate solutions with provable approximation bounds to the problems. In addition, we solve the problems of retrieving top-k groups of three instantiations, and study a weighted version of the problem that incorporates object weights. We present empirical studies that offer insight into the efficiency of the solutions, as well as the accuracy of the approximate solutions.
Shang, S., K. Zheng, C. S. Jensen, B. Yang, P. Kalnis, G. Li, J.-R. Wen, "Discovery of Path Nearby Clusters in Spatial Networks," in IEEE Transactions on Knowledge and Data Engineering, 27(6): 1505–1518, June 2015.

Publication

Online at IEEE
The discovery of regions of interest in large cities is an important challenge. We propose and investigate a novel query called the path nearby cluster (PNC) query that finds regions of potential interest (e.g., sightseeing places and commercial districts) with respect to a user-specified travel route. Given a set of spatial objects O (e.g., POIs, geo-tagged photos, or geo-tagged tweets) and a query route q, if a cluster c has high spatial-object density and is spatially close to q, it is returned by the query (a cluster is a circular region defined by a center and a radius). This query aims to bring important benefits to users in popular applications such as trip planning and location recommendation. Efficient computation of the PNC query faces two challenges: how to prune the search space during query processing, and how to identify clusters with high density effectively. To address these challenges, a novel collective search algorithm is developed. Conceptually, the search process is conducted in the spatial and density domains concurrently. In the spatial domain, network expansion is adopted, and a set of vertices are selected from the query route as expansion centers. In the density domain, clusters are sorted according to their density distributions and they are scanned from the maximum to the minimum. A pair of upper and lower bounds are defined to prune the search space in the two domains globally. The performance of the PNC query is studied in extensive experiments based on real and synthetic spatial data.
Yang, B., C. Guo, Y. Ma, C. S. Jensen, "Towards Personalized, Context-Aware Routing," in The VLDB Journal, 24(2): 297–318, April 2015.

Publication [not publicly available]

Online at SpringerLink
A driver’s choice of a route to a destination may depend on the route’s length and travel time, but a multitude of other, possibly hard-to-formalize aspects may also factor into the driver’s decision. There is evidence that a driver’s choice of route is context dependent, e.g., varies across time, and that route choice also varies from driver to driver. In contrast, conventional routing services support little in the way of context dependence, and they deliver the same routes to all drivers. We study how to identify context-aware driving preferences for individual drivers from historical trajectories, and thus how to provide foundations for personalized navigation, but also professional driver education and traffic planning. We provide techniques that are able to capture time-dependent and uncertain properties of dynamic travel costs, such as travel time and fuel consumption, from trajectories, and we provide techniques capable of capturing the driving behaviors of different drivers in terms of multiple dynamic travel costs. Further, we propose techniques that are able to identify a driver’s contexts and then to identify driving preferences for each context using historical trajectories from the driver. Empirical studies with a large trajectory data set offer insight into the design properties of the proposed techniques and suggest that they are effective.
Wu, D., B. Choi, J. Xu, C. S. Jensen, "Authentication of Moving Top-k Spatial Keyword Queries," in IEEE Transactions on Knowledge and Data Engineering, 27(4): 922–935, April 2015.

Publication

Online at IEEE
A moving top-k spatial keyword (MkSK) query, which takes into account a continuously moving query location, enables a mobile client to be continuously aware of the top-k spatial web objects that best match a query with respect to location and text relevance. The increasing mobile use of the web and the proliferation of geo-positioning render it of interest to consider a scenario where spatial keyword search is outsourced to a separate service provider capable of handling the voluminous spatial web objects available from various sources. A key challenge is that the service provider may return inaccurate or incorrect query results (intentionally or not), e.g., due to cost considerations or intrusion by hackers. Therefore, it is attractive to be able to authenticate the query results at the client side. Existing authentication techniques are either inefficient or inapplicable for the kind of query we consider. We propose new authentication data structures, the MIR-tree and MIR*-tree, that enable the authentication of MkSK queries at low computation and communication costs. We design a verification object for authenticating MkSK queries, and we provide algorithms for constructing verification objects and using these for verifying query results. A thorough experimental study on real data shows that the proposed techniques are capable of outperforming two baseline algorithms by orders of magnitude.
Keles, I., S. Saltenis, C. S. Jensen, "Synthesis of Partial Rankings of Points of Interest Using Crowdsourcing," in Proceedings of the Ninth Workshop on Geographic Information Retrieval, Paris, France, article 15, 10 pages, November 26–27, 2015.

Publication [not publicly available]

ACM Author-Izer
The web is increasingly being accessed from mobile devices, and studies suggest that a large fraction of keyword-based search engine queries have local intent, meaning that users are interested in local content and that the underlying ranking function should take into account both relevance to the query keywords and the query location. A key challenge in being able to make progress on the design of ranking functions is to be able to assess the quality of the results returned by ranking functions. We propose a model that synthesizes a ranking of points of interest from answers to crowdsourced pairwise relevance questions. To evaluate the model, we propose an innovative methodology that enables evaluation of the quality of synthesized rankings in a simulated setting. We report on an experimental evaluation based on the methodology that shows that the proposed model produces promising results in pertinent settings and that it is capable of outperforming an approach based on majority voting.
Silvestri, C., F. Lettich, S. Orlando, C. S. Jensen, "A wait-free output data structure for GPU-based streaming query processing," in Proceedings of the 23rd Italian Symposium on Advanced Database Systems, Gaeta, Italy, pp. 232–239, June 14–17, 2015.

Publication
The performance of GPU-based algorithms can be reduced significantly by contention among memory accesses and by locking. We focus on high-volume output in GPU-based algorithms for streaming query processing: a very large number of cores process input streams and simultaneously produce a sustained output stream whose volume is sometimes orders of magnitude larger than that of the input streams. In this context, several cores can produce results simultaneously that must be written in the output buffer according to some order and without conflicts with other writers. To enable this behavior, we propose a wait-free bitmap-based data structure and a usage pattern that combine to obviate the use of locks and atomic operations. In our experiments, where the GPU-based algorithm considered is otherwise unchanged, the introduction of the new wait-free data structure entails a performance improvement of one order of magnitude.
Čeikute, V., C. S. Jensen, "Vehicle Routing With User-Generated Trajectory Data," in Proceedings of the Sixteenth IEEE International Conference on Mobile Data Management - Volume I, Pittsburgh, PA, pp. 14–23, June 15–18, 2015.

Publication

Online at IEEE
Rapidly increasing volumes of GPS data collected from vehicles provide new and increasingly comprehensive insight into the routes that drivers prefer. While routing services generally compute shortest or fastest routes, recent studies suggest that local drivers often prefer routes that are neither shortest nor fastest, indicating that drivers value route properties that are diverse and hard to quantify or even identify. We propose a routing service that uses an existing routing service while exploiting the availability of historical route usage data from local drivers. Given a source and destination, the service recommends a corresponding route that is most preferred by local drivers. It uses a route preference function that takes into account the number of distinct drivers and the number of trips associated with a route, as well as temporal aspects of the trips. The paper provides empirical studies with real route usage data and an existing online routing service.
Chen, L., Y. Gao, C. S. Jensen, X. Li, B. Zheng, G. Chen, "Indexing Metric Uncertain Data for Range Queries," in Proceedings of the 2015 ACM SIGMOD International Conference on the Management of Data, Melbourne, Vic., Australia, pp. 951–965, May 31–June 4, 2015.

Publication [not publicly available]

ACM Author-Izer
Range queries in metric spaces have applications in many areas such as multimedia retrieval, computational biology, and location-based services, where metric uncertain data exists in different forms, resulting from equipment limitations, high-throughput sequencing technologies, privacy preservation, or others. In this paper, we represent metric uncertain data by using an object-level model and a bi-level model, respectively. Two novel indexes, the uncertain pivot B+-tree (UPB-tree) and the uncertain pivot B+-forest (UPB-forest), are proposed accordingly in order to support probabilistic range queries w.r.t. a wide range of uncertain data types and similarity metrics. Both index structures use a small set of effective pivots chosen based on a newly defined criterion, and employ the B+-tree(s) as the underlying index. By design, they are easy to integrate into any existing DBMS. In addition, we present efficient metric probabilistic range query algorithms, which utilize the validation and pruning techniques based on our derived probability lower and upper bounds. Extensive experiments with both real and synthetic data sets demonstrate that, compared against existing state-of-the-art indexes for metric uncertain data, the UPB-tree and UPB-forest incur much lower construction costs, consume smaller storage spaces, and can support more efficient metric probabilistic range queries.
Guo, C., B. Yang, O. Andersen, C. S. Jensen, K. Torp, "EcoSky: An Eco Routing System for Reducing the Vehicular Environmental Impact," in Proceedings of the 31st IEEE International Conference on Data Engineering, Seoul, South Korea, pp. 1412–1415, April 13–17, 2015.

Publication

Online at IEEE
Reduction in greenhouse gas emissions from transportation attracts increasing interest from governments, fleet managers, and individual drivers. Eco-routing, which enables drivers to use eco-friendly routes, is a simple and effective approach to reducing emissions from transportation. We present EcoSky, a system that annotates edges of a road network with time-dependent and uncertain eco-weights using GPS data and that supports different types of eco-routing. Basic eco-routing returns the most eco-friendly routes; skyline eco-routing takes into account not only fuel consumption but also travel time and distance when computing eco-routes; and personalized eco-routing considers each driver's past behavior and accordingly suggests different routes to different drivers.
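The notion of a skyline over several route costs, mentioned in this abstract, can be sketched as a Pareto filter. This is a minimal illustration that assumes each candidate route already has fixed, deterministic costs (fuel, travel time, distance); it does not model the time-dependent and uncertain eco-weights of EcoSky. The function names and sample values are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every cost and strictly better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def skyline(routes):
    """routes maps a route name to a (fuel, time, distance) tuple.

    Returns the sorted names of routes not dominated by any other route.
    """
    return sorted(
        name for name, cost in routes.items()
        if not any(dominates(other, cost)
                   for other_name, other in routes.items()
                   if other_name != name)
    )

# Hypothetical candidates: (fuel in liters, time in minutes, distance in km)
candidates = {"A": (5.0, 30, 20), "B": (6.0, 25, 22), "C": (6.5, 31, 23)}
print(skyline(candidates))
```

Route C is worse than route A on all three costs, so only A and B remain as skyline routes.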
Aljubayrin, S., J. Qi, C. S. Jensen, R. Zhang, Z. He, Z. Wen, "The Safest Path via Safe Zones," in Proceedings of the 31st IEEE International Conference on Data Engineering, Seoul, South Korea, pp. 531–542, April 13–17, 2015.

Publication

Online at IEEE
We define and study Euclidean and spatial network variants of a new path finding problem: given a set of safe zones, find paths that minimize the distance traveled outside the safe zones. In this problem, the entire space with the exception of the safe zones is unsafe, but passable, and it differs from problems that involve unsafe regions to be strictly avoided. As a result, existing algorithms are not effective solutions to the new problem. To solve the Euclidean variant, we devise a transformation of the continuous data space with safe zones into a discrete graph upon which shortest path algorithms apply. A naive transformation yields a very large graph that is expensive to search. In contrast, our transformation exploits properties of hyperbolas in the Euclidean space to safely eliminate graph edges, thus improving performance without affecting the shortest path results. To solve the spatial network variant, we propose a different graph-to-graph transformation that identifies critical points that serve the same purpose as do the hyperbolas, thus avoiding the creation of extraneous edges. This transformation can be extended to support a weighted version of the problem, where travel in safe zones has non-zero cost. We conduct extensive experiments using both real and synthetic data. The results show that our approaches outperform baseline approaches by more than an order of magnitude in graph construction time, storage space and query response time.
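The naive graph transformation mentioned in this abstract can be sketched for a simplified Euclidean case. The sketch assumes circular safe zones given as (x, y, radius), treats the source and target as zero-radius zones, connects every pair of nodes, and weights each edge by the unsafe gap between the two circles. It does not include the hyperbola-based edge elimination of the paper, and all names are hypothetical.

```python
import heapq
import math

def gap(a, b):
    """Unsafe distance between circles a and b, each given as (x, y, r)."""
    return max(0.0, math.dist(a[:2], b[:2]) - a[2] - b[2])

def safest_cost(source, target, zones):
    """Minimum distance traveled outside the zones from source to target.

    Runs Dijkstra on the complete graph over source, target, and zones.
    """
    nodes = [(*source, 0.0), (*target, 0.0)] + list(zones)
    best = [math.inf] * len(nodes)
    best[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue            # stale queue entry
        if u == 1:
            return d            # target reached
        for v in range(len(nodes)):
            if v != u:
                nd = d + gap(nodes[u], nodes[v])
                if nd < best[v]:
                    best[v] = nd
                    heapq.heappush(heap, (nd, v))
    return best[1]
```

With one zone of radius 2 centered midway between points 10 units apart, the unsafe distance drops from 10 to 6. The complete graph has a quadratic number of edges, which is the cost the paper's transformation aims to avoid.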
Chen, L., Y. Gao, X. Li, C. S. Jensen, G. Chen, "Efficient Metric Indexing for Similarity Search," in Proceedings of the 31st IEEE International Conference on Data Engineering, Seoul, South Korea, pp. 591–602, April 13–17, 2015.

Publication

Online at IEEE
The goal in similarity search is to find objects similar to a specified query object given a certain similarity criterion. Although useful in many areas, such as multimedia retrieval, pattern recognition, and computational biology, to name but a few, similarity search is not yet supported well by commercial DBMSs. This may be due to the complex data types involved and the need for flexible similarity criteria seen in real applications. We propose an efficient disk-based metric access method, the Space-filling curve and Pivot-based B+-tree (SPB-tree), to support a wide range of data types and similarity metrics. The SPB-tree uses a small set of so-called pivots to reduce significantly the number of distance computations, uses a space-filling curve to cluster the data into compact regions, thus improving storage efficiency, and utilizes a B+-tree with minimum bounding box information as the underlying index. The SPB-tree also employs a separate random access file to efficiently manage large and complex data. By design, it is easy to integrate the SPB-tree into an existing DBMS. We present efficient similarity search algorithms and corresponding cost models based on the SPB-tree. Extensive experiments using real and synthetic data show that the SPB-tree has much lower construction cost, smaller storage size, and can support more efficient similarity queries with high accuracy cost models than is the case for competing techniques. Moreover, the SPB-tree scales sublinearly with growing dataset size.
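How pivots reduce the number of distance computations can be illustrated with the standard triangle-inequality filter used by pivot-based metric indexes. This is a minimal in-memory sketch, not the SPB-tree itself: it omits the space-filling curve, the B+-tree, and the random access file, and the function names are hypothetical.

```python
import math

def build_table(objects, pivots, dist):
    """Precompute the distance from every object to every pivot."""
    return [[dist(o, p) for p in pivots] for o in objects]

def range_query(objects, pivots, table, dist, q, radius):
    """Return (results, computed): objects within radius of q, and the
    number of exact object distances that had to be computed."""
    q_to_pivots = [dist(q, p) for p in pivots]
    results, computed = [], 0
    for o, row in zip(objects, table):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for every pivot p,
        # so the largest such difference is a lower bound on d(q, o).
        lower = max(abs(qp - op) for qp, op in zip(q_to_pivots, row))
        if lower > radius:
            continue  # pruned without computing d(q, o)
        computed += 1
        if dist(q, o) <= radius:
            results.append(o)
    return results, computed
```

For example, with `math.dist` as the metric, a single pivot at the origin, and a query near the origin, distant objects are discarded using only the precomputed pivot distances.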
Chen, L., X. Lin, H. Hu, C. S. Jensen, J. Xu, "Answering Why-Not Questions on Spatial Keyword Top-k Queries," in Proceedings of the 31st IEEE International Conference on Data Engineering, Seoul, South Korea, pp. 279–290, April 13–17, 2015.

Publication

Online at IEEE
Large volumes of geo-tagged text objects are available on the web. Spatial keyword top-k queries retrieve k such objects with the best score according to a ranking function that takes into account a query location and query keywords. In this setting, users may wonder why some known object is unexpectedly missing from a result; and understanding why may aid users in retrieving better results. While spatial keyword querying has been studied intensively, no proposals exist for how to offer users explanations of why such expected objects are missing from results. We provide techniques that allow the revision of spatial keyword queries such that their results include one or more desired, but missing objects. In doing so, we adopt a query refinement approach to provide a basic algorithm that reduces the problem to a two-dimensional geometrical problem. To improve performance, we propose an index-based ranking estimation algorithm that prunes candidate results early. Extensive experimental results offer insight into design properties of the proposed techniques and suggest that they are efficient in terms of both running time and I/O cost.
Jensen, C. S., C. Jermaine, X. Zhou, editors, "Special Section on the International Conference on Data Engineering," IEEE Transactions on Knowledge and Data Engineering, 27(7), 99 pages, July 2015.

Publication

Online at IEEE
Jensen, C. S., X. Xie, V. I. Zadorozhny, S. Madria, E. Pitoura, B. Zheng, C.-Y. Chow, editors, Proceedings of the Sixteenth International Conference on Mobile Data Management - Volume I, Pittsburgh, PA, USA, 332+xxvii pages, June 15–18, 2015.

Online at IEEE
Jensen, C. S., X. Xie, V. I. Zadorozhny, S. Madria, E. Pitoura, B. Zheng, C.-Y. Chow, editors, Proceedings of the Sixteenth International Conference on Mobile Data Management - Volume II, Pittsburgh, PA, USA, 130+xiv pages, June 15–18, 2015.

Online at IEEE
Candan, K. S., C. S. Jensen, M. Parashar, K. D. Ryu, H. Yeom, editors, Proceedings of the 2015 IEEE International Conference on Cloud Engineering, Tempe, AZ, USA, 514+xxix pages, March 9–13, 2015.

Online at IEEE
Jensen, C. S., "Keyword-Based Querying of Geo-Tagged Web Content," in Proceedings of the Fifth International Conference on Model & Data Engineering, Rhodes, Greece, p. XIII, September 26–28, 2015.

Publication [not publicly available]

Online at SpringerLink
The web is being accessed increasingly by users for whom an accurate geo-location is available, and increasing volumes of geo-tagged content are available on the web, including web pages, points of interest, and microblog posts. Studies suggest that each week, several billions of keyword-based queries are issued that have some form of local intent and that target geo-tagged web content with textual descriptions. This state of affairs gives prominence to spatial web data management, and it opens up a research area full of new and exciting opportunities and challenges. A prototypical spatial web query takes a user location and user-supplied keywords as arguments, and it returns content that is spatially and textually relevant to these arguments. Due perhaps to the rich semantics of geographical space and its importance to our daily lives, many different kinds of relevant spatial web query functionality may be envisioned. Based on recent and ongoing work by the speaker and his colleagues, the talk presents key functionality, concepts, and techniques relating to spatial web querying; it presents functionality that addresses different kinds of user intent; and it outlines directions for the future development of keyword-based spatial web querying.
Jensen, C. S., "Querying of Geo-Textual Web Content: Concepts and Techniques," in Proceedings of the Sixteenth IEEE International Conference on Mobile Data Management - Volume II, Pittsburgh, PA, pp. 1–2, June 15–18, 2015.

Publication

Online at IEEE
Qu, Q., C. Chen, C. S. Jensen, A. Skovsgaard, "Space-Time Aware Behavioral Topic Modeling for Microblog Posts," in X. Zhou (ed.): Special Issue on Location-based Social Media Analysis, IEEE Data Engineering Bulletin, 38(2): 58–67, Invited paper, June 2015.

Publication
How can we automatically identify the topics of microblog posts? This question has received substantial attention in the research community and has led to the development of different topic models, which are mathematically well-founded statistical models that enable the discovery of topics in document collections. Such models can be used for topic analyses according to the interests of user groups, time, geographical locations, or social behavior patterns. The increasing availability of microblog posts with associated users, textual content, timestamps, geo-locations, and user behaviors offers an opportunity to study space-time dependent behavioral topics. Such a topic is described by a set of words, the distribution of which varies according to the time, geo-location, and behaviors (that capture how a user interacts with other users by using functionality such as reply or re-tweet) of users. This study jointly models user topic interest and behaviors considering both space and time at a fine granularity. We focus on the modeling of microblog posts like Twitter tweets, where the textual content is short, but where associated information in the form of timestamps, geo-locations, and user interactions is available. The model aims to have applications in location inference, link prediction, online social profiling, etc. We report on experiments with tweets that offer insight into the design properties of the paper's proposal.
Candan, K. S., C. S. Jensen, M. Parashar, K. D. Ryu, H. Yeom, "Guest Editors’ Introduction: Cloud Engineering," IEEE Cloud Computing, 2(5): 6–8, September/October 2015.

Publication

Online at IEEE
Cloud engineering leverages innovations from a diverse spectrum of disciplines, from computer science and engineering to business informatics, toward the holistic treatment of key technical and business issues related to clouds.
Donald, K., A. Ailamaki, M. Balazinska, K. S. Candan, Y. Diao, C. Dyreson, Y. Ioannidis, C. S. Jensen, T. Milo, F. Spinola, "Letter from the SIGMOD Executive Committee," ACM SIGMOD Record, 44(3): 5–6, September 2015.

Online at SIGMOD
Jensen, C. S., C. Jermaine, X. Zhou, "Guest Editorial: Special Section on the International Conference on Data Engineering," IEEE Transactions on Knowledge and Data Engineering, 27(7): 1739–1740, July 2015.

Publication
Jensen, C. S., "Editorial: The Best of Two Worlds – Present Your TODS Paper at SIGMOD," ACM Transactions on Database Systems, 40(2), Article 7, 2 pages, June 2015.

Publication [not publicly available]

ACM Author-Izer
Jensen, C. S., X. Xie, V. I. Zadorozhny, "Message from the General Co-chairs," in Proceedings of the Sixteenth International Conference on Mobile Data Management - Volume 1, Pittsburgh, PA, USA, pp. xi–xii, June 15–18, 2015.

Publication

Online at IEEE
Jensen, C. S., "Changes to the TODS Editorial Board," ACM SIGMOD Record, 44(1): 5, March 2015.

Publication [not publicly available]

ACM Author-Izer
Jensen, C. S., M. Parashar, H. Yeom, "IC2E 2015: Message from the Program Chairs," in Proceedings of the 2015 IEEE International Conference on Cloud Engineering, Tempe, AZ, USA, p. xiv, March 9–13, 2015.

Publication

Online at IEEE
Jensen, C. S., "Editorial: Updates to the Editorial Board," ACM Transactions on Database Systems, 40(1), article 1e, 1 page, March 2015.

Publication [not publicly available]

Online at ACM Digital Library
Dai, J., B. Yang, C. Guo, C. S. Jensen, "Efficient and Accurate Path Cost Estimation Using Trajectory Data," Technical Report, October 2015, 16 pages. arXiv:1510.02886 [cs.DB], 10 Oct 2015.

Online at Cornell University Library
Using the growing volumes of vehicle trajectory data, it becomes increasingly possible to capture time-varying and uncertain travel costs in a road network, including travel time and fuel consumption. The current paradigm represents a road network as a graph, assigns weights to the graph's edges by fragmenting trajectories into small pieces that fit the underlying edges, and then applies a routing algorithm to the resulting graph. We propose a new paradigm that targets more accurate and more efficient estimation of the costs of paths by associating weights with sub-paths in the road network. The paper provides a solution to a foundational problem in this paradigm, namely that of computing the time-varying cost distribution of a path. The solution consists of several steps. We first learn a set of random variables that capture the joint distributions of sub-paths that are covered by sufficient trajectories. Then, given a departure time and a path, we select an optimal subset of learned random variables such that the random variables' corresponding paths together cover the path. This enables accurate joint distribution estimation of the path, and by transferring the joint distribution into a marginal distribution, the travel cost distribution of the path is obtained. The use of multiple learned random variables contends with data sparseness, and the use of multi-dimensional histograms enables compact representation of arbitrary joint distributions that fully capture the travel cost dependencies among the edges in paths. Empirical studies with substantial trajectory data from two different cities offer insight into the design properties of the proposed solution and suggest that the solution is effective in real-world settings.
2014 top Guo, C., C. S. Jensen, B. Yang, "Towards Total Traffic Awareness," in ACM SIGMOD Record, 43(3): 18–23, September 2014.

Publication [not publicly available]

ACM Author-Izer
A combination of factors renders the transportation sector a highly desirable area for data management research. The transportation sector receives substantial investments and is of high societal interest across the globe. Since there is limited room for new roads, smarter use of the existing infrastructure is of the essence. The combination of the continued proliferation of sensors and mobile devices with the drive towards open data will result in rapidly increasing volumes of data becoming available. The data management community is well positioned to contribute to building a smarter transportation infrastructure. We believe that efficient management and effective analysis of big transportation data will enable us to extract transportation knowledge, which will bring significant and diverse benefits to society. We describe the data, present key challenges related to the extraction of thorough, timely, and trustworthy traffic knowledge to achieve total traffic awareness, and we outline services that may be enabled. It is thus our hope that the paper will inspire data management researchers to address some of the many challenges in the transportation area.
Šidlauskas, D., C. S. Jensen, "Spatial Joins in Main Memory: Implementation Matters!," in Proceedings of the VLDB Endowment, 8(1): 97–100, (Experiment and Analysis Paper), September 2014.

Publication [not publicly available]

Online at ACM Digital Library
A recent PVLDB paper reports on experimental analyses of ten spatial join techniques in main memory. We build on this comprehensive study to raise awareness of the fact that empirical running time performance findings in main-memory settings are results of not only the algorithms and data structures employed, but also their implementation, which complicates the interpretation of the results.
In particular, we re-implement the worst performing technique without changing the underlying high-level algorithm, and we then offer evidence that the resulting re-implementation is capable of outperforming all the other techniques. This study demonstrates that in main memory, where no time-consuming I/O can mask variations in implementation, implementation details are very important; and it offers a concrete illustration of how it is difficult to make conclusions from empirical running time performance findings in main-memory settings about data structures and algorithms studied.
Šidlauskas, D., S. Šaltenis, C. S. Jensen, "Processing of Extreme Moving-Object Update and Query Workloads in Main Memory," in The VLDB Journal, 23(5): 817–841, (Extended version of [146].), October 2014.

Publication [not publicly available]

Online at SpringerLink
The efficient processing of workloads that interleave moving-object updates and queries is challenging. In addition to the conflicting needs for update-efficient versus query-efficient data structures, the increasing parallel capabilities of multi-core processors yield challenges. To prevent concurrency anomalies and to ensure correct system behavior, conflicting update and query operations must be serialized. In this setting, it is a key concern to avoid that operations are blocked, which leaves processing cores idle. To enable efficient processing, we first examine concurrency degrees from traditional transaction processing in the context of our target domain and propose new semantics that enable a high degree of parallelism and ensure up-to-date query results. We define the new semantics for range and k-nearest neighbor queries. Then, we present a main-memory indexing technique called parallel grid that implements the proposed semantics as well as two other variants supporting different semantics. This enables us to quantify the effects that different degrees of consistency have on performance. We also present an alternative time-partitioning approach. Empirical studies with the above and three existing proposals conducted on modern processors show that our proposals scale near-linearly with the number of hardware threads and thus are able to benefit from increasing on-chip parallelism.
Yang, B., M. Kaul, C. S. Jensen, "Using Incomplete Information for Complete Weight Annotation of Road Networks," in IEEE Transactions on Knowledge and Data Engineering, 26(5): 1267–1279, May 2014.

Publication

Online at IEEE
We are witnessing increasing interest in the effective use of road networks. For example, to enable effective vehicle routing, weighted-graph models of transportation networks are used, where the weight of an edge captures some cost associated with traversing the edge, e.g., greenhouse gas (GHG) emissions or travel time. It is a precondition to using a graph model for routing that all edges have weights. Weights that capture travel times and GHG emissions can be extracted from GPS trajectory data collected from the network. However, GPS trajectory data typically lack the coverage needed to assign weights to all edges. This paper formulates and addresses the problem of annotating all edges in a road network with travel cost based weights from a set of trips in the network that cover only a small fraction of the edges, each with an associated ground-truth travel cost. A general framework is proposed to solve the problem. Specifically, the problem is modeled as a regression problem and solved by minimizing a judiciously designed objective function that takes into account the topology of the road network. In particular, the use of weighted PageRank values of edges is explored for assigning appropriate weights to all edges, and the property of directional adjacency of edges is also taken into account to assign weights. Empirical studies with weights capturing travel time and GHG emissions on two road networks (Skagen, Denmark, and North Jutland, Denmark) offer insight into the design properties of the proposed techniques and offer evidence that the techniques are effective.
Shang, S., R. Ding, K. Zheng, C. S. Jensen, P. Kalnis, X. Zhou, "Personalized Trajectory Matching in Spatial Networks," in The VLDB Journal, 23(3): 449–468, June 2014.

Publication [not publicly available]

Online at SpringerLink
With the increasing availability of moving-object tracking data, trajectory search and matching is increasingly important. We propose and investigate a novel problem called personalized trajectory matching (PTM). In contrast to conventional trajectory similarity search by spatial distance only, PTM takes into account the significance of each sample point in a query trajectory. A PTM query takes a trajectory with user-specified weights for each sample point in the trajectory as its argument. It returns the trajectory in an argument data set with the highest similarity to the query trajectory. We believe that this type of query may bring significant benefits to users in many popular applications such as route planning, carpooling, friend recommendation, traffic analysis, urban computing, and location-based services in general. PTM query processing faces two challenges: how to prune the search space during the query processing and how to schedule multiple so-called expansion centers effectively. To address these challenges, a novel two-phase search algorithm is proposed that carefully selects a set of expansion centers from the query trajectory and exploits upper and lower bounds to prune the search space in the spatial and temporal domains. An efficiency study reveals that the algorithm explores the minimum search space in both domains. Second, a heuristic search strategy based on priority ranking is developed to schedule the multiple expansion centers, which can further prune the search space and enhance the query efficiency. The performance of the PTM query is studied in extensive experiments based on real and synthetic trajectory data sets.
Cao, X., G. Cong, C. S. Jensen, M. L. Yiu, "Retrieving Regions of Interest for User Exploration," in Proceedings of the VLDB Endowment, 7(9): 733–744, May 2014.

Publication [not publicly available]

Online at VLDB
We consider an application scenario where points of interest (PoIs) each have a web presence and where a web user wants to identify a region that contains PoIs that are relevant to a set of keywords, e.g., in preparation for deciding where to go to conveniently explore the PoIs. Motivated by this, we propose the length-constrained maximum-sum region (LCMSR) query that returns a spatial-network region that is located within a general region of interest, that does not exceed a given size constraint, and that best matches query keywords. Such a query maximizes the total weight of the PoIs in it w.r.t. the query keywords. We show that it is NP-hard to answer this query. We develop an approximation algorithm with a (5 + ε) approximation ratio utilizing a technique that scales node weights into integers. We also propose a more efficient heuristic algorithm and a greedy algorithm. Empirical studies on real data offer detailed insight into the accuracy of the proposed algorithms and show that the proposed algorithms are capable of computing results efficiently and effectively.
Skovsgaard, A., C. S. Jensen, "Top-k Point of Interest Retrieval Using Standard Indexes," in Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, TX, USA, pp. 172–182, November 4–7, 2014.

Publication [not publicly available]

ACM Author-Izer
With the proliferation of Internet-connected, location-aware mobile devices, such as smartphones, we are also witnessing a proliferation and increased use of map-based services that serve information about relevant Points of Interest (PoIs) to their users.
We provide an efficient and practical foundation for the processing of queries that take a keyword and a spatial region as arguments and return the k most relevant PoIs that belong to the region, which may be the part of the map covered by the user's screen. The paper proposes a novel technique that encodes the spatio-textual part of a PoI as a compact bit string. This technique extends an existing spatial encoding to also encode the textual aspect of a PoI in compressed form. The resulting bit strings may then be indexed using index structures such as B-trees or hashing that are standard in DBMSs and key-value stores. As a result, it is straightforward to support the proposed functionality using existing data management systems. The paper also proposes a novel top-k query algorithm that merges partial results while providing an exact result.
An empirical study with real-world data indicates that the proposed techniques enable excellent indexing and query execution performance on a standard DBMS.
Rishede, J., M. L. Yiu, C. S. Jensen, "Concise Caching of Driving Instructions," in Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Dallas, TX, USA, pp. 23–32, November 4–7, 2014.

Publication [not publicly available]

ACM Author-Izer
Online driving direction services offer fundamental functionality to mobile users, and such services see substantial and increasing loads as mobile access continues to proliferate. Cache servers can be deployed in order to reduce the resulting network traffic. We define so-called concise shortest paths that are equivalent to driving instructions. A concise shortest path occupies much less space than a shortest path; yet it provides sufficient navigation information to mobile users. Then we propose techniques that enable the caching of concise shortest paths in order to improve the cache hit ratio. Interestingly, the use of concise shortest paths in caching has two opposite effects on the cache hit ratio. The cache can accommodate a larger number of concise paths, but each individual concise path contains fewer nodes and so may answer fewer shortest path queries. The challenge is to strike a balance between these two effects in order to maximize the overall cache hit ratio. In this paper, we revisit two classes of caching methods and develop effective caching techniques for concise paths. Empirical results on real trajectory-induced workloads confirm the effectiveness of the proposed techniques.
Qu, Q., S. Liu, C. S. Jensen, F. Zhu, C. Faloutsos, "Interestingness-Driven Diffusion Process Summarization in Dynamic Networks," in Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Part II, LNCS 8725, Nancy, France, pp. 597–613, September 15–19, 2014.

Publication [not publicly available]

Online at SpringerLink
The widespread use of social networks enables the rapid diffusion of information, e.g., news, among users in very large communities. It is a substantial challenge to be able to observe and understand such diffusion processes, which may be modeled as networks that are both large and dynamic. A key tool in this regard is data summarization. However, few existing studies aim to summarize graphs/networks for dynamics. Dynamic networks raise new challenges not found in static settings, including time sensitivity and the needs for online interestingness evaluation and summary traceability, which render existing techniques inapplicable. We study the topic of dynamic network summarization: how to summarize dynamic networks with millions of nodes by only capturing the few most interesting nodes or edges over time, and we address the problem by finding interestingness-driven diffusion processes. Based on the concepts of diffusion radius and scope, we define interestingness measures for dynamic networks, and we propose OSNet, an online summarization framework for dynamic networks. We report on extensive experiments with both synthetic and real-life data. The study offers insight into the effectiveness and design properties of OSNet.
Skovsgaard, A., D. Šidlauskas, C. S. Jensen, "A Clustering Approach to the Discovery of Points of Interest from Geo-Tagged Microblog Posts," in Proceedings of the Fifteenth IEEE International Conference on Mobile Data Management, Brisbane, Australia, pp. 178–189, July 14–18, 2014.

Publication

Online at IEEE
Points of interest (PoI) data serves an important role as a foundation for a wide variety of location-based services. Such data is typically obtained from an authoritative source or from users through crowd sourcing. It can be costly to maintain an up-to-date authoritative source, and data obtained from users can vary greatly in coverage and quality. We are also witnessing a proliferation of both GPS-enabled mobile devices and geo-tagged content generated by users of such devices. This state of affairs motivates the paper's proposal of techniques for the automatic discovery of PoI data from geo-tagged microblog posts. Specifically, the paper proposes a new clustering technique that takes into account both the spatial and textual attributes of microblog posts to obtain clusters that represent PoIs. The technique expands clusters based on a proposed quality function that enables clusters of arbitrary shape and density. An empirical study with a large database of real geo-tagged microblog posts offers insight into the properties of the proposed techniques and suggests that they are effective at discovering real-world points of interest.
Qu, Q., S. Liu, B. Yang, C. S. Jensen, "Efficient Top-k Spatial Locality Search for Co-located Spatial Web Objects," in Proceedings of the Fifteenth IEEE International Conference on Mobile Data Management, Brisbane, Australia, pp. 269–278, July 14–18, 2014.

Publication

Online at IEEE
In step with the web being used widely by mobile users, user location is becoming an essential signal in services, including local intent search. Given a large set of spatial web objects consisting of a geographical location and a textual description (e.g., online business directory entries of restaurants, bars, and shops), how can we find sets of objects that are both spatially and textually relevant to a query? Most existing studies solve the problem by requiring that all query keywords are covered by the returned objects and then rank the sets by spatial proximity. The need to identify sets with more textually relevant objects renders these studies inapplicable. We propose locality Search, a query that returns top-k sets of spatial web objects and integrates spatial distance and textual relevance in one ranking function. We show that computing the query is NP-hard, and we present two efficient exact algorithms and one generic approximate algorithm based on greedy strategies for computing the query. We report on findings from an empirical study with three real-life datasets. The study offers insight into the efficiency and effectiveness of the proposed algorithms.
Qu, Q., S. Liu, B. Yang, C. S. Jensen, "Integrating Non-Spatial Preferences into Spatial Location Queries," in Proceedings of the 26th International Conference on Scientific and Statistical Database Management, Aalborg, Denmark, article no. 8, 12 pages, June 30–July 2, 2014.

Publication [not publicly available]

ACM Author-Izer
Increasing volumes of geo-referenced data are becoming available. This data includes so-called points of interest that describe businesses, tourist attractions, etc. by means of a geo-location and properties such as a textual description or ratings. We propose and study the efficient implementation of a new kind of query on points of interest that takes into account both the locations and properties of the points of interest. The query takes a result cardinality, a spatial range, and property-related preferences as parameters, and it returns a compact set of points of interest with the given cardinality and in the given range that satisfies the preferences. Specifically, the points of interest in the result set cover so-called allying preferences and are located far from points of interest that possess so-called alienating preferences. A unified result rating function integrates the two kinds of preferences with spatial distance to achieve this functionality. We provide efficient exact algorithms for this kind of query. To enable queries on large datasets, we also provide an approximate algorithm that utilizes a nearest-neighbor property to achieve scalable performance. We develop and apply lower and upper bounds that enable search-space pruning and thus improve performance. Finally, we provide a generalization of the above query and also extend the algorithms to support the generalization. We report on an experimental evaluation of the proposed algorithms using real point of interest data from Google Places for Business that offers insight into the performance of the proposed solutions.
Ma, Y., B. Yang, C. S. Jensen, "Enabling Time-Dependent Uncertain Eco-Weights For Road Networks," in Proceedings of the 2014 Workshop on Managing and Mining Enriched Geo-Spatial Data, Snowbird, UT, USA, 6 pages, June 27, 2014.

Publication [not publicly available]

ACM Author-Izer
Reduction of greenhouse gas (GHG) emissions from transportation is an essential part of the efforts to prevent global warming and climate change. Eco-routing, which enables drivers to use the most environmentally friendly routes, is able to substantially reduce GHG emissions from vehicular transportation. The foundation of eco-routing is a weighted-graph representation of a road network in which road segments, or edges, are associated with eco-weights that capture the GHG emissions caused by traversing the edges. Due to the dynamics of traffic, the eco-weights are typically time dependent and uncertain. We formalize the problem of assigning a time-dependent, uncertain eco-weight to each edge in a road network. In particular, a sequence of histograms is employed to describe the uncertain eco-weight during different time intervals for each edge. Various compression techniques, including histogram merging and bucket reduction, are proposed to maintain compact histograms while achieving good accuracy. Histogram aggregation methods are proposed that use these to accurately estimate GHG emissions for routes. A comprehensive empirical study is conducted based on two years of GPS data from vehicles in order to gain insight into the effectiveness and efficiency of the proposed approach.
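As a rough illustration of the histogram operations mentioned in the abstract above, the following minimal sketch aggregates two edge histograms into a route histogram and then reduces the number of buckets. It is not the paper's method: the dictionary representation, the independence assumption between edges, the rule of merging the adjacent pair with the smallest combined probability, and the function names `aggregate` and `reduce_buckets` are all assumptions made for this example.

```python
from collections import defaultdict


def aggregate(h1, h2):
    """Combine two edge histograms {(lo, hi): prob} into a route histogram.

    Assumes (for illustration only) that the two edge costs are independent,
    so bucket bounds add and probabilities multiply.
    """
    out = defaultdict(float)
    for (lo1, hi1), p1 in h1.items():
        for (lo2, hi2), p2 in h2.items():
            out[(lo1 + lo2, hi1 + hi2)] += p1 * p2
    return dict(out)


def reduce_buckets(hist, max_buckets):
    """Merge adjacent buckets until at most max_buckets remain.

    Hypothetical rule: repeatedly merge the adjacent pair (in order of
    lower bound) whose combined probability is smallest.
    """
    buckets = sorted(hist.items())
    while len(buckets) > max_buckets:
        i = min(range(len(buckets) - 1),
                key=lambda j: buckets[j][1] + buckets[j + 1][1])
        (lo1, hi1), p1 = buckets[i]
        (lo2, hi2), p2 = buckets[i + 1]
        buckets[i:i + 2] = [((min(lo1, lo2), max(hi1, hi2)), p1 + p2)]
    return dict(buckets)


# Example: two edges whose emissions (in arbitrary units) are equally
# likely to fall in [0, 1) or [1, 2).
edge = {(0, 1): 0.5, (1, 2): 0.5}
route = aggregate(edge, edge)
compact = reduce_buckets(route, 2)
```

A time-dependent variant would keep one such histogram per time interval and per edge, as the abstract describes; that bookkeeping is omitted here.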
Radaelli, L., Y. Moses, C. S. Jensen, "Using Cameras to Improve Wi-Fi Based Indoor Positioning," in Proceedings of the Thirteenth International Symposium on Web and Wireless Geographical Information Systems, Seoul, South Korea, pp. 166–183, May 29–30, 2014.

Publication [not publicly available]

Online at SpringerLink
Indoor positioning systems are increasingly being deployed to enable indoor navigation and other indoor location-based services. Systems based on Wi-Fi and video cameras rely on different technologies and techniques and have so far been developed independently by different research communities; we show that integrating information provided by a video system into a Wi-Fi based system increases its maintainability and avoids drops in accuracy over time. Specifically, we consider a Wi-Fi system that uses fingerprint measurements collected in the space for positioning. We improve the system’s room-level accuracy by means of automatic, video-driven collection of fingerprints. Our method is able to relate a Wi-Fi user to unidentified movements detected by cameras by exploiting the existing Wi-Fi system, thus generating fingerprints automatically. This use of video for fingerprint collection reduces the need for manual collection and allows online updating of fingerprints, hence increasing system accuracy. We report on an empirical study that shows that automatic fingerprinting induces only a few false positives and yields a substantial accuracy improvement.
Skovsgaard, A., D. Šidlauskas, C. S. Jensen, "Scalable Top-k Spatio-Temporal Term Querying," in Proceedings of the 30th IEEE International Conference on Data Engineering, Chicago, IL, USA, pp. 148–159, March 31–April 4, 2014.

Publication

Online at IEEE
With the rapidly increasing deployment of Internet-connected, location-aware mobile devices, very large and increasing amounts of geo-tagged and timestamped user-generated content, such as microblog posts, are being generated. We present indexing, update, and query processing techniques that are capable of providing the top-k terms seen in posts in a user-specified spatio-temporal range. The techniques enable interactive response times in the millisecond range in a realistic setting where the arrival rate of posts exceeds today's average tweet arrival rate by a factor of 4-10. The techniques adaptively maintain the most frequent items at various spatial and temporal granularities. They extend existing frequent item counting techniques to maintain exact counts rather than approximations. An extensive empirical study with a large collection of geo-tagged tweets shows that the proposed techniques enable online aggregation and query processing at scale in realistic settings.
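To make the query type in the abstract above concrete, here is a minimal baseline sketch of top-k term retrieval over a spatio-temporal range. It is not the paper's technique: it keeps exact per-term counts in a single fixed grid of (cell, time slot) buckets and answers a query by summing the buckets overlapped by the range, whereas the paper maintains frequent items adaptively at several granularities. The class name `TermGrid` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict


class TermGrid:
    """Exact term counts per (x-cell, y-cell, time-slot) bucket."""

    def __init__(self, cell_size, slot_seconds):
        self.cell_size = cell_size
        self.slot_seconds = slot_seconds
        self.counts = defaultdict(Counter)

    def _key(self, x, y, t):
        # Map a point in space and time to its bucket.
        return (int(x // self.cell_size),
                int(y // self.cell_size),
                int(t // self.slot_seconds))

    def add_post(self, x, y, t, terms):
        # Update: increment the counts of the post's terms in one bucket.
        self.counts[self._key(x, y, t)].update(terms)

    def top_k(self, k, x_min, x_max, y_min, y_max, t_min, t_max):
        # Query: sum counts over all buckets overlapped by the range.
        # The answer is bucket-aligned, so it may include posts slightly
        # outside the exact query range.
        lo = self._key(x_min, y_min, t_min)
        hi = self._key(x_max, y_max, t_max)
        total = Counter()
        for key, counter in self.counts.items():
            if all(l <= v <= h for l, v, h in zip(lo, key, hi)):
                total.update(counter)
        return total.most_common(k)


grid = TermGrid(cell_size=10, slot_seconds=60)
grid.add_post(1, 1, 10, ["coffee", "rain"])
grid.add_post(2, 3, 20, ["coffee"])
grid.add_post(50, 50, 10, ["concert"])
top = grid.top_k(1, 0, 9, 0, 9, 0, 59)
```

This baseline scans every bucket per query, so it only illustrates the semantics of the query, not the interactive response times reported in the abstract.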
Yang, B., C. Guo, C. S. Jensen, M. Kaul, S. Shang, "Stochastic Skyline Route Planning under Time-Varying Uncertainty," in Proceedings of the 30th IEEE International Conference on Data Engineering, Chicago, IL, USA, pp. 136–147, March 31–April 4, 2014.

Publication

Online at IEEE
Different uses of a road network call for the consideration of different travel costs: in route planning, travel time and distance are typically considered, and greenhouse gas (GHG) emissions are increasingly being considered. Further, travel costs such as travel time and GHG emissions are time-dependent and uncertain. To support such uses, we propose techniques that enable the construction of a multi-cost, time-dependent, uncertain graph (MTUG) model of a road network based on GPS data from vehicles that traversed the road network. Based on the MTUG, we define stochastic skyline routes that consider multiple costs and time-dependent uncertainty, and we propose efficient algorithms to retrieve stochastic skyline routes for a given source-destination pair and a start time. Empirical studies with three road networks in Denmark and a substantial GPS data set offer insight into the design properties of the MTUG and the efficiency of the stochastic skyline routing algorithms.
Silvestri, C., F. Lettich, S. Orlando, C. S. Jensen, "GPU-based Computing of Repeated Range Queries over Moving Objects," in Proceedings of the 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Turin, Italy, pp. 640–647, March 31–April 4, 2014.

Publication

Online at IEEE
In this paper we investigate the use of GPUs to solve a data-intensive problem that involves huge numbers of moving objects. The scenario on which we focus concerns objects that move continuously in a 2D space, where a large percentage of them also issue range queries. The processing of these queries entails a large quantity of objects falling into the range queries to be returned. In order to solve this problem while maintaining a suitable throughput, we partition time into ticks, and defer the parallel processing of all the object events (location updates and range queries) occurring in a given tick to the next tick, thus slightly delaying the overall computation. We process in parallel all the events of each tick by adopting a hybrid approach, based on the combined use of CPU and GPU, and show the suitability of the method by discussing performance results. The exploitation of a GPU allows us to achieve a speedup of more than 20× on several datasets with respect to the best sequential algorithm solving the same problem. More importantly, we show that the adoption of a new bitmap-based intermediate data structure that we propose to avoid memory access contention entails a 10× speedup with respect to naive GPU based solutions.
Jensen, C. S., H. Lu, T. B. Pedersen, C. Thomsen, K. Torp, editors, Proceedings of the 26th International Conference on Scientific and Statistical Database Management, Aalborg, Denmark, June 30–July 2, 2014.
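The tick-based deferral described in the abstract above can be sketched sequentially as follows. This is only an illustration of the batching idea under stated assumptions: events buffered during one tick are processed together at the next tick, with all location updates applied before any range query is answered. The GPU parallelism and the bitmap-based data structure from the paper are not shown, and the function name `process_tick` and the event tuple layout are hypothetical.

```python
def process_tick(positions, events):
    """Process all events buffered during the previous tick.

    positions: dict mapping object id -> (x, y), updated in place.
    events: list of (kind, object_id, payload) tuples, where kind is
            "update" with payload (x, y), or "query" with payload
            (x_min, y_min, x_max, y_max).
    Returns a dict mapping the querying object's id to the sorted ids of
    the objects inside its range.
    """
    # Phase 1: apply every location update of the tick.
    for kind, obj_id, payload in events:
        if kind == "update":
            positions[obj_id] = payload

    # Phase 2: answer every range query against the updated snapshot.
    results = {}
    for kind, obj_id, payload in events:
        if kind == "query":
            x_min, y_min, x_max, y_max = payload
            results[obj_id] = sorted(
                other for other, (x, y) in positions.items()
                if x_min <= x <= x_max and y_min <= y <= y_max
            )
    return results


positions = {}
tick_events = [
    ("update", "a", (1, 1)),
    ("update", "b", (5, 5)),
    ("query", "a", (0, 0, 2, 2)),
]
answers = process_tick(positions, tick_events)
```

Because both phases consist of independent per-event work over a fixed snapshot, each phase is a natural candidate for the data-parallel execution that the abstract describes.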

Online at ACM Digital Library
Bhowmick, S., C. E. Dyreson, C. S. Jensen, M. L. Lee, A. Muliantara, B. Thalheim, editors, Proceedings of the 19th International Conference on Database Systems for Advanced Applications, Part I, LNCS 8421, Bali, Indonesia, 514+xxv pages, April 21–24, 2014.

Online at SpringerLink
Bhowmick, S., C. E. Dyreson, C. S. Jensen, M. L. Lee, A. Muliantara, B. Thalheim, editors, Proceedings of the 19th International Conference on Database Systems for Advanced Applications, Part II, LNCS 8422, Bali, Indonesia, 558+ xxvi pages, April 21–24, 2014.

Online at SpringerLink
Jensen, C. S., A. Friis-Christensen, T. B. Pedersen, D. Pfoser, S. Šaltenis, N. Tryfona, "Location-Based Services—A Database Perspective," Chapter 6, pp. 82–93 in Breaking New Ground - Dedicated to Finn Kjærsdam, edited by L. Dirckinck-Holmfeld, N.-H. Gylstorff, H. K. Krogstrup, L. Lange, E. H. Nielsen, E. Toft, and R. Ærø, Aalborg University Press, Reprint of [421] with a foreword, 2014.

Publication
We are heading rapidly towards a global computing and information infrastructure that will contain billions of wirelessly connected devices, many of which will offer so-called location-based services to their mobile users always and everywhere. Indeed, users will soon take ubiquitous wireless access to information and services for granted. This scenario is made possible by the rapid advances in the underlying hardware technologies, which continue to follow variants of Moore’s Law.
Examples of location-based services include tracking, way-finding, traffic management, safety-related services, and mixed-reality games, to name but a few. This paper outlines a general usage scenario for location-based services, in which data warehousing plays a central role, and it describes central challenges to be met by the involved software technologies in order for them to reach their full potential for usage in location-based services.
Jensen, C. S., "Foreword to Invited Paper Issue," ACM Transactions on Database Systems, 39(4), article 26, 2 pages, December 2014.

Publication [not publicly available]

Online at ACM Digital Library
Sheng, Q. Z., J. He, G. Wang, C. S. Jensen, "Guest editorial: Web technologies and applications," World Wide Web, 17(4): 455–456, July 2014.

Publication [not publicly available]

Online at SpringerLink
Jensen, C. S., "Foreword," ACM Transactions on Database Systems, 39(3), article 18, 1 page, September 2014.

Publication [not publicly available]

Online at ACM Digital Library
Jensen, C. S., H. Lu, T. B. Pedersen, C. Thomsen, K. Torp, "Foreword," in Proceedings of the 26th International Conference on Scientific and Statistical Database Management, Aalborg, Denmark, 2 pages, June 30–July 2, 2014.

Publication [not publicly available]

Online at ACM Digital Library
The International Conference on Scientific and Statistical Database Management (SSDBM) brings together scientific domain experts, database researchers, practitioners, and developers for the presentation and exchange of current research results on concepts, tools, and techniques for scientific and statistical database applications. This year, the 26th SSDBM takes place in Aalborg, Denmark, from June 30 to July 2, 2014.
Bhowmick, S., C. E. Dyreson, C. S. Jensen, "Preface," in Proceedings of the 19th International Conference on Database Systems for Advanced Applications, Parts I and II, LNCS 8421, Bali, Indonesia, pp. v–vii, April 21–24, 2014.

Publication [not publicly available]

Online at SpringerLink
Lettich, F., S. Orlando, C. Silvestri, C. S. Jensen, "Manycore processing of repeated range queries over massive moving objects observations," Technical Report, 36 pages. arXiv:1411.3212v1 [cs.DB] 12 Nov 2014, November 2014.

Online at Cornell University Library
The ability to process significant amounts of continuously updated spatial data in a timely manner is mandatory for an increasing number of applications. Parallelism enables such applications to face this data-intensive challenge and allows the devised systems to feature low latency and high scalability. In this paper we focus on a specific data-intensive problem, concerning the repeated processing of huge amounts of range queries over massive sets of moving objects, where the spatial extents of queries and objects are continuously modified over time. To tackle this problem and significantly accelerate query processing we devise a hybrid CPU/GPU pipeline that compresses data output and saves query processing work. The devised system relies on an ad-hoc spatial index leading to a problem decomposition that results in a set of independent data-parallel tasks. The index is based on a point-region quadtree space decomposition and allows us to tackle effectively a broad range of spatial object distributions, even very skewed ones. Also, to deal with the architectural peculiarities and limitations of GPUs, we adopt non-trivial GPU data structures that avoid the need for locked memory accesses and favour coalesced memory accesses, thus enhancing the overall memory throughput. To the best of our knowledge this is the first work that exploits GPUs to efficiently solve repeated range queries over massive sets of continuously moving objects, characterized by highly skewed spatial distributions. In comparison with state-of-the-art CPU-based implementations, our method achieves significant speedups in the order of 14×–20×, depending on the datasets, even when considering very cheap GPUs.
2013
Li, X., V. Ceikute, C. S. Jensen, K.-L. Tan, "Effective Online Group Discovery in Trajectory Databases," in IEEE Transactions on Knowledge and Data Engineering, 25(12): 2752–2766, December 2013.

Publication

Online at IEEE
GPS-enabled devices are pervasive nowadays. Finding movement patterns in trajectory data streams is gaining in importance. We propose a group discovery framework that aims to efficiently support the online discovery of moving objects that travel together. The framework adopts a sampling-independent approach that makes no assumptions about when positions are sampled, gives no special importance to sampling points, and naturally supports the use of approximate trajectories. The framework's algorithms exploit state-of-the-art, density-based clustering (DBScan) to identify groups. The groups are scored based on their cardinality and duration, and the top-k groups are returned. To avoid returning similar subgroups in a result, notions of domination and similarity are introduced that enable the pruning of low-interest groups. Empirical studies on real and synthetic data sets offer insight into the effectiveness and efficiency of the proposed framework.
Kaul, M., R. C.-W. Wong, B. Yang, C. S. Jensen, "Finding Shortest Paths on Terrains by Killing Two Birds with One Stone," in Proceedings of the VLDB Endowment, 7(1): 73–84, September 2013.

Publication

Online at VLDB
With the increasing availability of terrain data, e.g., from aerial laser scans, the management of such data is attracting increasing attention in both industry and academia. In particular, spatial queries, e.g., k-nearest neighbor and reverse nearest neighbor queries, in Euclidean and spatial network spaces are being extended to terrains. Such queries all rely on an important operation, that of finding shortest surface distances. However, shortest surface distance computation is very time consuming. We propose techniques that enable efficient computation of lower and upper bounds of the shortest surface distance, which enable faster query processing by eliminating expensive distance computations. Empirical studies show that our bounds are much tighter than the best-known bounds in many cases and that they enable speedups of up to 43 times for some well-known spatial queries.
Bøgh, K. S., A. Skovsgaard, C. S. Jensen, "GroupFinder: A New Approach to Top-K Point-of-Interest Group Retrieval," in Proceedings of the VLDB Endowment, 6(12): 1226–1229, August 2013.

Publication [not publicly available]

Online at VLDB
The notion of point-of-interest (PoI) has existed since paper road maps began to include markings of useful places such as gas stations, hotels, and tourist attractions. With the introduction of geopositioned mobile devices such as smartphones and mapping services such as Google Maps, the retrieval of PoIs relevant to a user’s intent has become a problem of automated spatio-textual information retrieval. Over the last several years, substantial research has gone into the invention of functionality and efficient implementations for retrieving nearby PoIs. However, with a couple of exceptions, existing proposals retrieve results at single-PoI granularity. We assume that a mobile device user issues queries consisting of keywords and an automatically supplied geo-position, and we target the common case where the user wishes to find nearby groups of PoIs that are relevant to the keywords. Such groups are relevant to users who wish to conveniently explore several options before making a decision such as to purchase a specific product. Specifically, we demonstrate a practical proposal for finding top-k PoI groups in response to a query. We show how problem parameter settings can be mapped to options that are meaningful to users. Further, although this kind of functionality is prone to combinatorial explosion, we will demonstrate that the functionality can be supported efficiently in practical settings.
Yang, B., C. Guo, C. S. Jensen, "Travel Cost Inference from Sparse, Spatio-Temporally Correlated Time Series Using Markov Models," in Proceedings of the VLDB Endowment, 6(9): 769–780, July 2013.

Publication [not publicly available]

Online at VLDB
The monitoring of a system can yield a set of measurements that can be modeled as a collection of time series. These time series are often sparse, due to missing measurements, and spatio-temporally correlated, meaning that spatially close time series exhibit temporal correlation. The analysis of such time series offers insight into the underlying system and enables prediction of system behavior. While the techniques presented in the paper apply more generally, we consider the case of transportation systems and aim to predict travel cost from GPS tracking data from probe vehicles. Specifically, each road segment has an associated travel-cost time series, which is derived from GPS data. We use spatio-temporal hidden Markov models (STHMM) to model correlations among different traffic time series. We provide algorithms that are able to learn the parameters of an STHMM while contending with the sparsity, spatio-temporal correlation, and heterogeneity of the time series. Using the resulting STHMM, near future travel costs in the transportation network, e.g., travel time or greenhouse gas emissions, can be inferred, enabling a variety of routing services, e.g., eco-routing. Empirical studies with a substantial GPS data set offer insight into the design properties of the proposed framework and algorithms, demonstrating the effectiveness and efficiency of travel cost inferencing.
Wu, D., M. L. Yiu, C. S. Jensen, "Moving Spatial Keyword Queries: Formulation, Methods, and Analysis," in ACM Transactions on Database Systems, 38(1), 45 pages, (Extended version of [153].), March 2013.

Publication [not publicly available]

ACM Author-Izer
Web users and content are increasingly being geo-positioned. This development gives prominence to spatial keyword queries, which involve both the locations and textual descriptions of content. We study the efficient processing of continuously moving top-k spatial keyword (MkSK) queries over spatial text data. State-of-the-art solutions for moving queries employ safe zones that guarantee the validity of reported results as long as the user remains within the safe zone associated with a result. However, existing safe-zone methods focus solely on spatial locations and ignore text relevancy.
We propose two algorithms for computing safe zones that guarantee correct results at any time and that aim to optimize the server-side computation as well as the communication between the server and the client. We exploit tight and conservative approximations of safe zones and aggressive computational space pruning. We present techniques that aim to compute the next safe zone efficiently, and we present two types of conservative safe zones that aim to reduce the communication cost. Empirical studies with real data suggest that the proposals are efficient.
To understand the effectiveness of the proposed safe zones, we study analytically the expected area of a safe zone, which indicates on average for how long a safe zone remains valid, and we study the expected number of influence objects needed to define a safe zone, which gives an estimate of the average communication cost. The analytical modeling is validated through empirical studies.
Chen, L., G. Cong, C. S. Jensen, D. Wu, "Spatial Keyword Query Processing: An Experimental Evaluation," in Proceedings of the VLDB Endowment, 6(3): 217–228, January 2013.

Publication [not publicly available]

Online at VLDB
Geo-textual indices play an important role in spatial keyword querying. The existing geo-textual indices have not been compared systematically under the same experimental framework. This makes it difficult to determine which indexing technique best supports specific functionality. We provide an all-around survey of 12 state-of-the-art geo-textual indices. We propose a benchmark that enables the comparison of the spatial keyword query performance. We also report on the findings obtained when applying the benchmark to the indices, thus uncovering new insights that may guide index selection as well as further research.
Tzoumas, K., A. Deshpande, C. S. Jensen, "Efficiently Adapting Graphical Models for Cardinality Estimation," in The VLDB Journal, 22(1): 3–27, (Special issue on best papers of VLDB 2011. Extended version of [38].), February 2013.

Publication [not publicly available]

Online at ACM Digital Library
Query optimizers rely on statistical models that succinctly describe the underlying data. Models are used to derive cardinality estimates for intermediate relations, which in turn guide the optimizer to choose the best query execution plan. The quality of the resulting plan is highly dependent on the accuracy of the statistical model that represents the data. It is well known that small errors in the model estimates propagate exponentially through joins, and may result in the choice of a highly sub-optimal query execution plan. Most commercial query optimizers make the attribute value independence assumption: all attributes are assumed to be statistically independent. This reduces the statistical model of the data to a collection of one-dimensional synopses (typically in the form of histograms), and it permits the optimizer to estimate the selectivity of a predicate conjunction as the product of the selectivities of the constituent predicates. However, this independence assumption is more often than not wrong, and is considered to be the most common cause of sub-optimal query execution plans chosen by modern query optimizers. We take a step towards a principled and practical approach to performing cardinality estimation without making the independence assumption. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution over all the attributes in the database into small, usually two-dimensional distributions, without a significant loss in estimation accuracy. We show how to efficiently construct such a graphical model from the database using only two-way join queries, and we show how to perform selectivity estimation in a highly efficient manner. We integrate our algorithms into the PostgreSQL DBMS. Experimental results indicate that estimation errors can be greatly reduced, leading to orders of magnitude more efficient query execution plans in many cases. 
Optimization time is kept in the range of tens of milliseconds, making this a practical approach for industrial-strength query optimizers.
Radaelli, L., C. S. Jensen, "Towards Fully Organic Indoor Positioning," in Proceedings of the Fifth ACM SIGSPATIAL International Workshop on Indoor Spatial Awareness, Orlando, FL, USA, pp. 16–20, November 5, 2013.

Publication [not publicly available]

ACM Author-Izer
Indoor positioning systems based on fingerprinting techniques generally require costly initialization and maintenance by trained surveyors. Organic positioning systems aim to eliminate these deficiencies by managing their own accuracy and obtaining input from users and other sources. Such systems introduce new challenges, e.g., detection and filtering of erroneous user input, estimation of the positioning accuracy, and means of obtaining user input when necessary.
We envision a fully organic indoor positioning system, where all available sources of information are exploited in order to provide room-level accuracy with no active intervention of users. For example, such systems can exploit pre-installed cameras to associate a user's location with a Wi-Fi fingerprint from the user's phone, and they can use a calendar to determine whether a user is in the room reported by the positioning system. Numerous possibilities for integration exist that may provide better indoor positioning.
Li, X., V. Čeikute, C. S. Jensen, K.-L. Tan, "Trajectory Based Optimal Segment Computation in Road Network Databases," in Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Orlando, FL, USA, pp. 386–389, November 5–8, 2013.

Publication [not publicly available]

ACM Author-Izer
Finding a location for a new facility such that the facility attracts the maximal number of customers is a challenging problem. Existing studies either model customers as static sites and thus do not consider customer movement, or they focus on theoretical aspects and do not provide solutions that are shown empirically to be scalable. Given a road network, a set of existing facilities, and a collection of customer route traversals, an optimal segment query returns the optimal road network segment(s) for a new facility. We propose a practical framework for computing this query, where each route traversal is assigned a score that is distributed among the road segments covered by the route according to a score distribution model. We propose two algorithms that adopt different approaches to computing the query. Empirical studies with real data sets demonstrate that the algorithms are capable of offering high performance in realistic settings.
Kjærgaard, M. B., M. V. Krarup, A. Stisen, T. S. Prentow, H. Blunck, K. Grønbæk, C. S. Jensen, "Indoor Positioning using Wi-Fi—How Well Is the Problem Understood?," in Proceedings of the 2013 International Conference on Indoor Positioning and Indoor Navigation, Montbéliard-Belfort, France, 6 pages, October 28–31, 2013.

Publication

Online at Scholar
The past decade has witnessed substantial research on methods for indoor Wi-Fi positioning. While much effort has gone into achieving high positioning accuracy and easing fingerprint collection, it is our contention that the general problem is not sufficiently well understood, thus preventing deployments and their usage by applications from becoming more widespread. Based on our own and published experiences on indoor Wi-Fi positioning deployments, we hypothesize the following: Current indoor Wi-Fi positioning systems and their utilization in applications are hampered by the lack of understanding of the requirements present in real-world deployments. In this paper, we report findings from qualitatively studying organisational requirements for indoor Wi-Fi positioning. The studied cases and deployments cover both company and public-sector settings and the deployment and evaluation of several types of indoor Wi-Fi positioning systems over durations of up to several years. The findings suggest, among other things, a need for supporting all case-specific user groups, providing software platform independence and low maintenance, and allowing positioning of all user devices, regardless of platform and form factor. Furthermore, the findings vary significantly across organisations, for instance in terms of need for coverage, which motivates the design of orthogonal solutions.
Brucato, M., L. Derczynski, H. Llorens, K. Bontcheva, C. S. Jensen, "Recognising and Interpreting Named Temporal Expressions," in Proceedings of Recent Advances in Natural Language Processing, Hissar, Bulgaria, pp. 113–121, September 9–11, 2013.

Publication

Online at RANLP
This paper introduces a new class of temporal expression – named temporal expressions – and methods for recognising and interpreting its members. The commonest temporal expressions typically contain date and time words, like April or hours. Research into recognising and interpreting these typical expressions is mature in many languages. However, there is a class of expressions that are less typical, very varied, and difficult to automatically interpret. These indicate dates and times, but are harder to detect because they often do not contain time words and are not used frequently enough to appear in conventional temporally-annotated corpora – for example Michaelmas or Vasant Panchami.
Using Wikipedia and linked data, we automatically construct a resource of English named temporal expressions, and use it to extract training examples from a large corpus. These examples are then used to train and evaluate a named temporal expression recogniser. We also introduce and evaluate rules for automatically interpreting these expressions, and we observe that use of the rules improves temporal annotation performance over existing corpora.
Čeikute, V., C. S. Jensen, "Routing Service Quality—Local Driver Behavior Versus Routing Services," in Proceedings of the Fourteenth International Conference on Mobile Data Management, Milan, Italy, pp. 97–106, June 3–6, 2013.

Publication

Online at IEEE
Mobile location-based services is a very successful class of services that are being used frequently by users with GPS-enabled mobile devices such as smartphones. This paper presents a study of how to exploit GPS trajectory data, which is available in increasing volumes, for the assessment of the quality of one kind of location-based service, namely routing services. Specifically, the paper presents a framework that enables the comparison of the routes provided by routing services with the actual driving behaviors of local drivers. Comparisons include route length, travel time, and also route popularity, which are enabled by common driving behaviors found in available trajectory data. The ability to evaluate the quality of routing services enables service providers to improve the quality of their services and enables users to identify the services that best serve their needs. The paper covers experiments with real vehicle trajectory data and an existing online navigation service. It is found that the availability of information about previous trips enables better prediction of route travel time and makes it possible to provide the users with more popular routes than does a conventional navigation service.
Kaul, M., B. Yang, C. S. Jensen, "Building Accurate 3D Spatial Networks to Enable Next Generation Intelligent Transportation Systems," in Proceedings of the Fourteenth International Conference on Mobile Data Management, Milan, Italy, pp. 137–146, June 3–6, 2013.

Publication

Online at IEEE
The use of accurate 3D spatial network models can enable substantial improvements in vehicle routing. Notably, such models enable eco-routing, which reduces the environmental impact of transportation. We propose a novel filtering and lifting framework that augments a standard 2D spatial network model with elevation information extracted from massive aerial laser scan data and thus yields an accurate 3D model. We present a filtering technique that is capable of pruning irrelevant laser scan points in a single pass, but assumes that the 2D network fits in internal memory and that the points are appropriately sorted. We also provide an external-memory filtering technique that makes no such assumptions. During lifting, a triangulated irregular network (TIN) surface is constructed from the remaining points. The 2D network is projected onto the TIN, and a 3D network is constructed by means of interpolation. We report on a large-scale empirical study that offers insight into the accuracy, efficiency, and scalability properties of the framework.
Radaelli, L., D. Sabonis, H. Lu, C. S. Jensen, "Identifying Typical Movements Among Indoor Objects—Concepts and Empirical Study," in Proceedings of the Fourteenth International Conference on Mobile Data Management, Milan, Italy, pp. 197–206, June 3–6, 2013.

Publication

Online at IEEE
With the proliferation of mobile computing, positioning systems are becoming available that enable indoor location-based services. As a result, indoor tracking data is also becoming available. This paper puts focus on one use of such data, namely the identification of typical movement patterns among indoor moving objects. Specifically, the paper presents a method for the identification of movement patterns. Leveraging concepts from sequential pattern mining, the method takes into account the specifics of spatial movement and, in particular, the specifics of tracking data that captures indoor movement. For example, the paper's proposal supports spatial aggregation and utilizes the topology of indoor spaces to achieve better performance. The paper reports on empirical studies with real and synthetic data that offer insights into the functional and computational aspects of its proposal.
Baniukevic, A., C. S. Jensen, H. Lu, "Hybrid Indoor Positioning With Wi-Fi and Bluetooth: Architecture and Performance," in Proceedings of the Fourteenth International Conference on Mobile Data Management, Milan, Italy, pp. 207–216, June 3–6, 2013.

Publication

Online at IEEE
Reliable indoor positioning is an important foundation for emerging indoor location based services. Most existing indoor positioning proposals rely on a single wireless technology, e.g., Wi-Fi, Bluetooth, or RFID. A hybrid positioning system combines such technologies and achieves better positioning accuracy by exploiting the different capabilities of the different technologies. In a hybrid system based on Wi-Fi and Bluetooth, the former works as the main infrastructure to enable fingerprint based positioning, while the latter (via hotspot devices) partitions the indoor space as well as a large Wi-Fi radio map. As a result, the Wi-Fi based online position estimation is improved in a divide-and-conquer manner. We study three aspects of such a hybrid indoor positioning system. First, to avoid large positioning errors caused by similar reference positions that are hard to distinguish, we design a deployment algorithm that identifies and separates such positions into different smaller radio maps by deploying Bluetooth hotspots at particular positions. Second, we design methods that improve the partition switching that occurs when a user leaves the detection range of a Bluetooth hotspot. Third, we propose three architectural options for placement of the computation workload. We evaluate all proposals using both simulation and walkthrough experiments in two indoor environments of different sizes. The results show that our proposals are effective and efficient in achieving very good indoor positioning performance.
Andersen, O., C. S. Jensen, K. Torp, B. Yang, "EcoTour: Reducing the Environmental Footprint of Vehicles Using Eco-Routes," in Proceedings of the Fourteenth International Conference on Mobile Data Management, Milan, Italy, pp. 338–340, June 3–6, 2013.

Publication

Online at IEEE
Reduction in greenhouse gas emissions from transportation is essential in combating global warming and climate change. Eco-routing enables drivers to use the most eco-friendly routes and is effective in reducing vehicle emissions. The EcoTour system assigns eco-weights to a road network based on GPS and fuel consumption data collected from vehicles to enable eco-routing. Given an arbitrary source-destination pair in Denmark, EcoTour returns the shortest route, the fastest route, and the eco-route, along with statistics for the three routes. EcoTour also serves as a testbed for exploring advanced solutions to a range of challenges related to eco-routing.
Derczynski, L. R. A., B. Yang, C. S. Jensen, "Towards Context-Aware Search and Analysis on Social Media Data," in Proceedings of the 16th International Conference on Extending Database Technology, Genoa, Italy, pp. 137–142, March 18–22, 2013.

Publication [not publicly available]

ACM Author-Izer
Social media has changed the way we communicate. Social media data capture our social interactions and utterances in machine-readable format. Searching and analysing massive and frequently updated social media data brings significant and diverse rewards across many different application domains, from politics and business to social science and epidemiology. A notable proportion of social media data comes with explicit or implicit spatial annotations, and almost all social media data has temporal metadata. We view social media data as a constant stream of data points, each containing text with spatial and temporal contexts. We identify challenges relevant to each context, which we intend to subject to context-aware querying and analysis, specifically including longitudinal analyses on social media archives, spatial keyword search, local intent search, and spatio-temporal intent search. Finally, for each context, emerging applications and further avenues for investigation are discussed.
Yang, B., N. Fantini, C. S. Jensen, "iPark: Identifying Parking Spaces from Trajectories," in Proceedings of the 16th International Conference on Extending Database Technology, Genoa, Italy, pp. 705–708, March 18–22, 2013.

Publication [not publicly available]

ACM Author-Izer
A wide variety of desktop and mobile Web applications involve geo-tagged content, e.g., photos and (micro-) blog postings. Such content, often called User Generated Geo-Content (UGGC), plays an increasingly important role in many applications. However, a great demand also exists for "core" UGGC where the geo-spatial aspect is not just a tag on other content, but is the primary content, e.g., a city street map with up-to-date road construction data. Along these lines, the iPark system aims to turn volumes of GPS data obtained from vehicles into information about the locations of parking spaces, thus enabling effective parking search applications. In particular, we demonstrate how iPark helps ordinary users annotate an existing digital map with two types of parking, on-street parking and parking zones, based on vehicular tracking data.
Jensen, C. S., C. Jermaine, X. Zhou, editors, Proceedings of the 29th IEEE International Conference on Data Engineering, Brisbane, QLD, Australia, April 8–11, 2013.

Online at IEEE
Hector, G., P. Venetis, C. S. Jensen, A. Y. Halevy, co-inventors, "Directions-based ranking of places returned by local search queries," United States Patent No. 8538973 B1, (filed June 4, 2010), September 17, 2013.

Online at Google Inc.
A system and a method for ranking search results of local search queries. A local search query and a current location of a user are received. Next, two or more places that satisfy the local search query are identified, and for each respective place a corresponding distance from the current location of the user to the respective place is also identified. The two or more places are then ranked in accordance with scores that are based, at least in part, on popularity of the two or more places and the corresponding distances from the current location of the user, to produce a set of ranked places. The ranked set of places is then provided to the user.
Jensen, C. S., "Spatial Keyword Querying of Geo-Tagged Web Content," in Proceedings of the Seventh International Workshop on Ranking in Databases, Riva del Garda, Italy, article no. 1, 4 pages, invited paper, August 30, 2013.

Publication [not publicly available]

ACM Author-Izer
The web is increasingly being used by mobile users, and it is increasingly possible to accurately geo-position mobile users. In addition, increasing volumes of geo-tagged web content are becoming available. Further, indications are that a substantial fraction of web keyword queries target local content. When combined, these observations suggest that spatial keyword querying is important and indeed gaining in importance. A prototypical spatial keyword query takes a user location and user-supplied keywords as parameters and returns web content that is spatially and textually relevant to these parameters. The paper reviews key concepts related to spatial keyword querying and reviews recent proposals by the author and his colleagues for spatial keyword querying functionality that is easy to use, relevant to users, and can be supported efficiently.
Moreira, J., C. S. Jensen, P. Dias, P. Mesquita, "Creating data representations for moving objects with extent from images," presented at the COST MOVE Workshop on Moving Objects at Sea, Brest, France, 4 pages, June 28–29, 2013.

Publication
Atzeni, P., C. S. Jensen, G. Orsi, S. Ram, L. Tanca, R. Torlone, "The relational model is dead, SQL is dead, and I don’t feel so good myself," ACM SIGMOD Record, 42(2):64–68, June 2013.

Publication [not publicly available]

Online at ACM Digital Library
We report the opinions expressed by well-known database researchers on the future of the relational model and SQL during a panel at the International Workshop on Non-Conventional Data Access (NoCoDa 2012), held in Florence, Italy in October 2012 in conjunction with the 31st International Conference on Conceptual Modeling. The panelists include: Paolo Atzeni (Università Roma Tre, Italy), Umeshwar Dayal (HP Labs, USA), Christian S. Jensen (Aarhus University, Denmark), and Sudha Ram (University of Arizona, USA). Quotations from movies are used as a playful though effective way to convey the dramatic changes that database technology and research are currently undergoing.
Jensen, C. S., "Querying the Web with Local Intent," in Proceedings of the Fourteenth International Conference on Mobile Data Management, Milan, Italy, p. 1. Invited abstract, June 3–6, 2013.

Publication

Online at IEEE
In step with the rapid proliferation of mobile devices with Internet access, the Web is increasingly being accessed by mobile-device users on the move. Further, it is increasingly possible to accurately geo-position mobile devices, and increasing volumes of geo-positioned content, e.g., Web pages, business directory entries, and microblog posts, are becoming available on the Web. In short, an increasingly mobile and spatial Web is fast emerging. This development enables Web queries with local intent, i.e., keyword-based queries issued by users who are looking for Web content near them. In addition, it implies an increasing demand for query functionality that supports local intent.
Hu, H., C. S. Jensen, D. Wu, "Message from the LBS n.0 Workshop Organizers," in Proceedings of the 14th International Conference on Mobile Data Management, Milan, Italy, Volume 2, p. xiii, June 3–6, 2013.

Publication

Online at IEEE
Jensen, C. S., C. Jermaine, R. Kotagiri, B. C. Ooi, "Message from the ICDE 2013 Program Committee and General Chairs," in Proceedings of the 29th IEEE International Conference on Data Engineering, Brisbane, QLD, Australia, pp. i–ii, April 8–11, 2013.

Publication

Online at IEEE
Yang, B., M. Kaul, C. S. Jensen, "Using Incomplete Information for Complete Weight Annotation of Road Networks—Extended Version," Technical Report, 17 pages. CoRR cs.DB/1308.0484 (2013), (Extended version of [22].), August 2013.

Publication

Online at Cornell University Library
We are witnessing increasing interest in the effective use of road networks. For example, to enable effective vehicle routing, weighted-graph models of transportation networks are used, where the weight of an edge captures some cost associated with traversing the edge, e.g., greenhouse gas (GHG) emissions or travel time. It is a precondition to using a graph model for routing that all edges have weights. Weights that capture travel times and GHG emissions can be extracted from GPS trajectory data collected from the network. However, GPS trajectory data typically lack the coverage needed to assign weights to all edges. This paper formulates and addresses the problem of annotating all edges in a road network with travel-cost-based weights from a set of trips in the network that cover only a small fraction of the edges, each with an associated ground-truth travel cost. A general framework is proposed to solve the problem. Specifically, the problem is modeled as a regression problem and solved by minimizing a judiciously designed objective function that takes into account the topology of the road network. In particular, the use of weighted PageRank values of edges is explored for assigning appropriate weights to all edges, and the property of directional adjacency of edges is also taken into account to assign weights. Empirical studies with weights capturing travel time and GHG emissions on two road networks (Skagen, Denmark, and North Jutland, Denmark) offer insight into the design properties of the proposed techniques and offer evidence that the techniques are effective.
Li, X., V. Čeikutė, C. S. Jensen, K.-L. Tan, "Trajectory Based Optimal Segment Computation in Road Network Databases," Technical Report, 28 pages. CoRR cs.DB/1303.2310 (2013), (Extended version of [102].), March 2013.

Publication [not publicly available]

Online at Cornell University Library
Finding a location for a new facility such that the facility attracts the maximal number of customers is a challenging problem. Existing studies either model customers as static sites and thus do not consider customer movement, or they focus on theoretical aspects and do not provide solutions that are shown empirically to be scalable. Given a road network, a set of existing facilities, and a collection of customer route traversals, an optimal segment query returns the optimal road network segment(s) for a new facility. We propose a practical framework for computing this query, where each route traversal is assigned a score that is distributed among the road segments covered by the route according to a score distribution model. We propose two algorithms that adopt different approaches to computing the query. Empirical studies with real data sets demonstrate that the algorithms are capable of offering high performance in realistic settings.
2012 top Cao, X., G. Cong, C. S. Jensen, J. J. Ng, B. C. Ooi, N.-T. Phan, D. Wu, "SWORS: A System for the Efficient Retrieval of Relevant Spatial Web Objects," Proceedings of the VLDB Endowment, 5(12): 1914–1917, August 2012.

Publication [not publicly available]

Online at VLDB
Spatial web objects that possess both a geographical location and a textual description are gaining in prevalence. This gives prominence to spatial keyword queries that exploit both location and textual arguments. Such queries are used in many web services such as yellow pages and maps services. We present SWORS, the Spatial Web Object Retrieval System, that is capable of efficiently retrieving spatial web objects that satisfy spatial keyword queries. Specifically, SWORS supports two types of queries: a) the location-aware top-k text retrieval (LkT) query that retrieves k individual spatial web objects taking into account query location proximity and text relevancy; b) the spatial keyword group (SKG) query that retrieves a group of objects that cover the query keywords and are nearest to the query location and have the shortest inter-object distances. SWORS provides browser-based interfaces for desktop and laptop computers and provides a client application for mobile devices. The interfaces and the client enable users to formulate queries and view the query results on a map. The server side stores the data and processes the queries. We use three real-life data sets to demonstrate the functionality and performance of SWORS.
Guo, C., Y. Ma, B. Yang, C. S. Jensen, M. Kaul, "EcoMark: Evaluating Models of Vehicular Environmental Impact," in Proceedings of the 20th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Redondo Beach, CA, USA, pp. 269–278, November 6–9, 2012.

Publication [not publicly available]

ACM Author-Izer
The reduction of greenhouse gas (GHG) emissions from transportation is essential for achieving politically agreed upon emissions reduction targets that aim to combat global climate change. So-called eco-routing and eco-driving are able to substantially reduce GHG emissions caused by vehicular transportation. To enable these, it is necessary to be able to reliably quantify the emissions of vehicles as they travel in a spatial network. Thus, a number of models have been proposed that aim to quantify the emissions of a vehicle based on GPS data from the vehicle and a 3D model of the spatial network the vehicle travels in. We develop an evaluation framework, called EcoMark, for such environmental impact models. In addition, we survey all eleven state-of-the-art impact models known to us. To gain insight into the capabilities of the models and to understand the effectiveness of EcoMark, we apply the framework to all models.
Sheng, Q. Z., G. Wang, C. S. Jensen, G. Xu, editors, Proceedings of the 14th Asia-Pacific Web Conference, Kunming, China, 799+xix pages, April 11–13, 2012.

Online at SpringerLink
Cao, X., L. Chen, G. Cong, C. S. Jensen, Q. Qu, A. Skovsgaard, D. Wu, M. L. Yiu, "Spatial Keyword Querying," in Proceedings of the 31st International Conference on Conceptual Modeling, Florence, Italy, pp. 16–29. Invited paper, October 15–18, 2012.

Publication [not publicly available]

Online at SpringerLink
The web is increasingly being used by mobile users. In addition, it is increasingly becoming possible to accurately geo-position mobile users and web content. This development gives prominence to spatial web data management. Specifically, a spatial keyword query takes a user location and user-supplied keywords as arguments and returns web objects that are spatially and textually relevant to these arguments. This paper reviews recent results by the authors that aim to achieve spatial keyword querying functionality that is easy to use, relevant to users, and can be supported efficiently. The paper covers different kinds of functionality as well as the ideas underlying their definition.
Jensen, C. S., "Data management on the Spatial Web," Proceedings of the VLDB Endowment, 5(12): 1696, Invited abstract, August 2012.

Publication [not publicly available]

Online at ACM Digital Library
Due in part to the increasing mobile use of the web and the proliferation of geo-positioning, the web is fast acquiring a significant spatial aspect. Content and users are being augmented with locations that are used increasingly by location-based services. Studies suggest that each week, several billion web queries are issued that have local intent and target spatial web objects. These are points of interest with a web presence, and they thus have locations as well as textual descriptions. This development has given prominence to spatial web data management, an area ripe with new and exciting opportunities and challenges. The research community has embarked on inventing and supporting new query functionality for the spatial web. Different kinds of spatial web queries return objects that are near a location argument and are relevant to a text argument. To support such queries, it is important to be able to rank objects according to their relevance to a query. And it is important to be able to process the queries with low latency. The talk offers an overview of key aspects of the spatial web. Based on recent results obtained by the speaker and his colleagues, the talk explores new query functionality enabled by the setting. Further, the talk offers insight into the data management techniques capable of supporting such functionality.
Jensen, C. S., "Internettet – nu med en geografisk dimension," in Årsskrift 2011, Villum Fonden and Velux Fonden, pp. 38–41. Invited article, January 2012.

Publication

Publication in Danish
The volume of data in electronic form is currently growing exponentially. At the same time, the IT infrastructure that we use daily, including the Internet, is developing rapidly. For example, at one end of the infrastructure, smartphones are spreading rapidly while mobile bandwidth keeps growing. At the other end, so-called data centers are springing up. These are buildings with large numbers of processors and hard disks that make it possible to handle enormous volumes of data as cheaply as possible. This development continually creates new challenges and opportunities. Christian S. Jensen has received the Villum Kann Rasmussen Annual Award in Technical and Scientific Research for, among other things, his contributions to the efficient storage of, and search in, spatio-temporal data, i.e., data that involve time and place. Some of these contributions aim to give the Internet a geographical dimension. According to Christian S. Jensen, the award of DKK 2,500,000 will be used to enable further research into the foundations of the future Internet.
Jensen, C. S., "Foreword," in The Knowledge Grid—in Cyber-Physical Society, by H. Zhuge, 2nd Edition, World Scientific, p. vii, September 2012.

Publication [not publicly available]
Bernstein, P. A., C. S. Jensen, K. L. Tan, "A Call for Surveys," ACM SIGMOD Record, 41(2): 47, June 2012.

Publication [not publicly available]

Online at ACM Digital Library
Sheng, Q. Z., G. Wang, C. S. Jensen, "Message from the Program Chairs," Proceedings of the 14th Asia-Pacific Web Conference, Kunming, China, p. vii, April 11–13, 2012.

Publication [not publicly available]

Online at SpringerLink
Jensen, C. S., E. Ofek, E. Tanin, "Highlights from ACM SIGSPATIAL GIS 2011—The 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (Chicago, Illinois - November 1–4, 2011)," in The SIGSPATIAL Special, 4(1): 2–4, March 2012.

Publication [not publicly available]

ACM Author-Izer
ACM SIGSPATIAL GIS 2011 was the 19th gathering of the premier event on spatial information and Geographic Information Systems (GIS). It was also the fourth year that the conference was held under the auspices of ACM's most recent special interest group, SIGSPATIAL. Since its start in 1993, the conference has targeted researchers, developers, and users whose work relates to spatial information and GIS, and it has a tradition of interdisciplinary discussions and presentations. It provides a forum for original research contributions that cover conceptual, design, and implementation aspects of spatial information systems and GIS.
Sidlauskas, D., C. S. Jensen, S. Šaltenis, "A comparison of the use of virtual versus physical snapshots for supporting update-intensive workloads," in DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware, May 2012.

Publication [not publicly available]

ACM Author-Izer
Deployments of networked sensors fuel online applications that feed on real-time sensor data. This scenario calls for techniques that support the management of workloads that contain queries as well as very frequent updates. This paper compares two well-chosen approaches to exploiting the parallelism offered by modern processors for supporting such workloads. A general approach to avoiding contention among parallel hardware threads and thus exploiting the parallelism available in processors is to maintain two copies, or snapshots, of the data: one for the relatively long-duration queries and one for the frequent and very localized updates. The snapshot that receives the updates is frequently made available to queries, so that queries see up-to-date data. The snapshots may be physical or virtual. Physical snapshots are created using the C library memcpy function. Virtual snapshots are created by the fork system call, which creates a new process that initially has the same data snapshot as the process it was forked from. When the new process carries out updates, this triggers the actual memory copying in a copy-on-write manner at memory page granularity. This paper characterizes the circumstances under which each technique is preferable. The use of physical snapshots is surprisingly efficient.
Cao, X., G. Cong, B. Cui, C. S. Jensen, Q. Yuan, "Approaches to Exploring Category Information for Question Retrieval in Community Question-Answer Archives," in ACM Transactions on Information Systems, 34 pages, May 2012.

Publication [not publicly available]

ACM Author-Izer
Community Question Answering (CQA) is a popular type of service where users ask questions and where answers are obtained from other users or from historical question-answer pairs. CQA archives contain large volumes of questions organized into a hierarchy of categories. As an essential function of CQA services, question retrieval in a CQA archive aims to retrieve historical question-answer pairs that are relevant to a query question. This article presents several new approaches to exploiting the category information of questions for improving the performance of question retrieval, and it applies these approaches to existing question retrieval models, including a state-of-the-art question retrieval model. Experiments conducted on real CQA data demonstrate that the proposed techniques are effective and efficient and are capable of outperforming a variety of baseline methods significantly.
Wu, D., G. Cong, C. S. Jensen, "A Framework for Efficient Spatial Web Object Retrieval," in The VLDB Journal, 25 pages, March 2012.

Publication

Online at SpringerLink
The conventional Internet is acquiring a geospatial dimension. Web documents are being geo-tagged and geo-referenced objects such as points of interest are being associated with descriptive text documents. The resulting fusion of geo-location and documents enables new kinds of queries that take into account both location proximity and text relevancy. This paper proposes a new indexing framework for top-k spatial text retrieval. The framework leverages the inverted file for text retrieval and the R-tree for spatial proximity querying. Several indexing approaches are explored within this framework. The framework encompasses algorithms that utilize the proposed indexes for computing location-aware as well as region-aware top-k text retrieval queries, thus taking into account both text relevancy and spatial proximity to prune the search space. Results of empirical studies with an implementation of the framework demonstrate that the paper’s proposal is capable of excellent performance.
Li, X., P. Karras, L. Shi, K.-L. Tan, C. S. Jensen, "Cooperative Scalable Moving Continuous Query Processing," in Proceedings of the Thirteenth International Conference on Mobile Data Management, Bengaluru, India, July 2012.

Rishede, J., M. L. Yiu, C. S. Jensen, "Effective Caching of Shortest Paths for Location-Based Services," in Proceedings of the 2012 ACM SIGMOD International Conference on the Management of Data, Scottsdale, AZ, USA, pp. 313–324, May 2012.

Publication [not publicly available]

ACM Author-Izer
Web search is ubiquitous in our daily lives. Caching has been extensively used to reduce the computation time of the search engine and reduce the network traffic beyond a proxy server. Another form of web search, known as online shortest path search, is popular due to advances in geo-positioning. However, existing caching techniques are ineffective for shortest path queries. This is due to several crucial differences between web search results and shortest path results, in relation to query matching, cache item overlapping, and query cost variation. Motivated by this, we identify several properties that are essential to the success of effective caching for shortest path search. Our cache exploits the optimal subpath property, which allows a cached shortest path to answer any query with source and target nodes on the path. We utilize statistics from query logs to estimate the benefit of caching a specific shortest path, and we employ a greedy algorithm for placing beneficial paths in the cache. Also, we design a compact cache structure that supports efficient query matching at runtime. Empirical results on real datasets confirm the effectiveness of our proposed techniques.
Sidlauskas, D., S. Šaltenis, C. S. Jensen, "Parallel Main-Memory Indexing for Moving-Object Query and Update Workloads," in Proceedings of the 2012 ACM SIGMOD International Conference on the Management of Data, Scottsdale, AZ, USA, pp. 37–48, May 2012.

Publication [not publicly available]

ACM Author-Izer
We are witnessing a proliferation of Internet-worked, geo-positioned mobile devices such as smartphones and personal navigation devices. Likewise, location-related services that target the users of such devices are proliferating. Consequently, server-side infrastructures are needed that are capable of supporting the location-related query and update workloads generated by very large populations of such moving objects.
This paper presents a main-memory indexing technique that aims to support such workloads. The technique, called PGrid, uses a grid structure that is capable of exploiting the parallelism offered by modern processors. Unlike earlier proposals that maintain separate structures for updates and queries, PGrid allows both long-running queries and rapid updates to operate on a single data structure and thus offers up-to-date query results. Because PGrid does not rely on creating snapshots, it avoids the stop-the-world problem that occurs when workload processing is interrupted to perform such snapshotting. Its concurrency control mechanism relies instead on hardware-assisted atomic updates as well as object-level copying, and it treats updates as non-divisible operations rather than as combinations of deletions and insertions; thus, the query semantics guarantee that no objects are missed in query results.
Empirical studies demonstrate that PGrid scales near-linearly with the number of hardware threads on four modern multi-core processors. Since both updates and queries are processed on the same current data-store state, PGrid outperforms snapshot-based techniques in terms of both query freshness and CPU cycle-wise efficiency.
Lu, H., X. Cao, C. S. Jensen, "A Foundation for Efficient Indoor Distance-Aware Query Processing," in Proceedings of the 28th IEEE International Conference on Data Engineering, 12 pages, April 2012.

Indoor spaces accommodate large numbers of spatial objects, e.g., points of interest (POIs), and moving populations. A variety of services, e.g., location-based services and security control, are relevant to indoor spaces. Such services can be improved substantially if they are capable of utilizing indoor distances. However, existing indoor space models do not account well for indoor distances. To address this shortcoming, we propose a data management infrastructure that captures indoor distance and facilitates distance-aware query processing. In particular, we propose a distance-aware indoor space model that integrates indoor distance seamlessly. To enable the use of the model as a foundation for query processing, we develop accompanying, efficient algorithms that compute indoor distances for different indoor entities like doors as well as locations. We also propose an indexing framework that accommodates indoor distances that are pre-computed using the proposed algorithms. On top of this foundation, we develop efficient algorithms for typical indoor, distance-aware queries. The results of an extensive experimental evaluation demonstrate the efficacy of the proposals.
Lu, H., C. S. Jensen, "Upgrading Uncompetitive Products Economically," in Proceedings of the 28th IEEE International Conference on Data Engineering, Washington, DC, USA, April 2012.

The skyline of a multidimensional point set consists of the points that are not dominated by other points. In a scenario where product features are represented by multidimensional points, the skyline points may be viewed as representing competitive products. A product provider may wish to upgrade uncompetitive products to become competitive, but wants to take into account the upgrading cost. We study the top-k product upgrading problem. Given a set P of competitor products, a set T of products that are candidates for upgrade, and an upgrading cost function f that applies to T, the problem is to return the k products in T that can be upgraded to not be dominated by any products in P at the lowest cost. This problem is nontrivial due not only to the large data set sizes, but also to the many possibilities for upgrading a product. We identify and provide solutions for the different options for upgrading an uncompetitive product, and combine the solutions into a single solution. We also propose a spatial join-based solution that assumes P and T are indexed by an R-tree. Given a set of products in the same R-tree node, we derive three lower bounds on their upgrading costs. These bounds are employed by the join approach to prune upgrade candidates with uncompetitive upgrade costs. Empirical studies with synthetic and real data show that the join approach is efficient and scalable.
2011 top Jensen, C. S., K.-J. Li, S. Winter, "ISA 2010 Workshop Report: The Other 87%: A Report on the Second International Workshop on Indoor Spatial Awareness (San Jose, California - November 2, 2010)," March 2011.

Publication [not publicly available]

ACM Author-Izer
With the increasing deployment of location-based services, geographic information systems, and ubiquitous computing, technologies and services that target indoor spaces are receiving increasing attention. This development is quite understandable because, as a paper presented at ISA 2010 points out, studies show that we lead most of our lives, 87% to be specific, in indoor settings. Those 87% are the focus of ISA 2010.
Jeung, H., M. L. Yiu, C. S. Jensen, "Trajectory Pattern Mining," Computing with Spatial Trajectories, pp. 143-177, 2011.

Online at SpringerLink
In step with the rapidly growing volumes of available moving-object trajectory data, there is also an increasing need for techniques that enable the analysis of trajectories. Such functionality may benefit a range of application areas and services, including transportation, the sciences, sports, and prediction-based and social services, to name but a few. The chapter first provides an overview of trajectory patterns and a categorization of trajectory patterns from the literature. Next, it examines relative motion patterns, which serve as fundamental background for the chapter's subsequent discussions. Relative patterns enable the specification of patterns to be identified in the data that refer to the relationships of motion attributes among moving objects. The chapter then studies disc-based and density-based patterns, which address some of the limitations of relative motion patterns. The chapter also reviews indexing structures and algorithms for trajectory pattern mining.
Lin, D., C. S. Jensen, R. Zhang, L. Xiao, J. Lu, "A Moving-Object Index for Efficient Query Processing with Peer-Wise Location Privacy," in Proceedings of the VLDB Endowment, 5(1), pp. 37–48, September 2011.

Publication

Online at VLDB
With the growing use of location-based services, location privacy attracts increasing attention from users, industry, and the research community. While considerable effort has been devoted to inventing techniques that prevent service providers from knowing a user's exact location, relatively little attention has been paid to enabling so-called peer-wise privacy, i.e., the protection of a user's location from unauthorized peer users. This paper identifies an important efficiency problem in existing peer-privacy approaches that simply apply a filtering step to identify users that are located in a query range, but that do not want to disclose their location to the querying peer. To solve this problem, we propose a novel, privacy-policy enabled index called the PEB-tree that seamlessly integrates location proximity and policy compatibility. We propose efficient algorithms that use the PEB-tree for processing privacy-aware range and kNN queries. Extensive experiments suggest that the PEB-tree enables efficient query processing.
Sidlauskas, D., K. A. Ross, C. S. Jensen, S. Šaltenis, "Thread-Level Parallel Indexing of Update Intensive Moving-Object Workloads," in Proceedings of the Twelfth International Symposium on Spatial and Temporal Databases, Minneapolis, MN, pp. 186–204, August 24–26, 2011.

Publication [not publicly available]

Online at SpringerLink
Modern processors consist of multiple cores that each support parallel processing by multiple physical threads, and they offer ample main-memory storage. This paper studies the use of such processors for the processing of update-intensive moving-object workloads that contain very frequent updates as well as queries. The non-trivial challenge addressed is that of avoiding contention between long-running queries and frequent updates. Specifically, the paper proposes a grid-based indexing technique. A static grid indexes a near up-to-date snapshot of the data to support queries, while a live grid supports updates. An efficient cloning technique that exploits the memcpy library function is used to maintain the static grid. An empirical study conducted with three modern processors finds that very frequent cloning, on the order of tens of milliseconds, is feasible, that the proposal scales linearly with the number of hardware threads, and that it significantly outperforms the previous state-of-the-art approach in terms of update throughput and query freshness.
Cao, X., G. Cong, C. S. Jensen, B. C. Ooi, "Collective Spatial Keyword Querying," in Proceedings of the 2011 ACM SIGMOD International Conference on the Management of Data, Athens, Greece, pp. 373–384, June 12–16, 2011.

Publication [not publicly available]

ACM Author-Izer
With the proliferation of geo-positioning and geo-tagging, spatial web objects that possess both a geographical location and a textual description are gaining in prevalence, and spatial keyword queries that exploit both location and textual description are gaining in prominence. However, the queries studied so far generally focus on finding individual objects that each satisfy a query rather than finding groups of objects where the objects in a group collectively satisfy a query. We define the problem of retrieving a group of spatial web objects such that the group's keywords cover the query's keywords and such that objects are nearest to the query location and have the lowest inter-object distances. Specifically, we study two variants of this problem, both of which are NP-complete. We devise exact solutions as well as approximate solutions with provable approximation bounds to the problems. We present empirical studies that offer insight into the efficiency and accuracy of the solutions.
Banukievic, A., D. Sabonis, C. S. Jensen, H. Lu, "Improving Wi-Fi Based Indoor Positioning Using Bluetooth Add-Ons," in Proceedings of the Twelfth International Conference on Mobile Data Management, Luleå, Sweden, pp. 246–255, June 6–9, 2011.

Publication [not publicly available]
Location-Based Services (LBSs) constitute one of the most popular classes of mobile services. However, while current LBSs typically target outdoor settings, we lead large parts of our lives indoors. The availability of easy-to-use and low-cost indoor positioning services is essential in also enabling indoor LBSs.
Existing indoor positioning services typically use a single technology such as Wi-Fi, RFID or Bluetooth. Wi-Fi based indoor positioning is relatively easy to deploy, but often does not offer good positioning accuracy. In contrast, the use of RFID or Bluetooth for positioning requires considerable investments in equipment in order to ensure good positioning accuracy. Motivated by these observations, we propose a hybrid approach to indoor positioning. In particular, we introduce Bluetooth hotspots into an indoor space with an existing Wi-Fi infrastructure such that better positioning is achieved than what can be achieved by each technology in isolation. We design a flexible and extensible system architecture with an effective online position estimation algorithm for the hybrid system. The system is evaluated empirically in the building of our department. The results show that the hybrid approach improves positioning accuracy markedly.
Vicente, C. R., I. Assent, C. S. Jensen, "Effective Privacy-Preserving Online Route Planning," in Proceedings of the Twelfth International Conference on Mobile Data Management, Luleå, Sweden, pp. 119-128, June 6-9, 2011.

Publication [not publicly available]
An online Route Planning Service (RPS) computes a route from one location to another. Current RPSs such as Google Maps require the use of precise locations. However, some users may not want to disclose their source and destination locations due to privacy concerns. An approach that supplies fake locations to an existing service incurs a substantial loss of quality of service, and the service may well return a result that is not helpful to the user.
We propose a solution that is able to return accurate route planning results when source and destination regions are used in order to achieve privacy. The solution re-uses a standard online RPS rather than replicating this functionality, and it needs no trusted third party. The solution is able to compute the exact results without leaking the exact locations to the RPS or untrusted parties. In addition, we provide heuristics that reduce the number of times that the RPS needs to be queried, and we also describe how the accuracy and privacy requirements can be relaxed to achieve better performance. An empirical study offers insight into key properties of the approach.
Wu, D., M. L. Yiu, G. Cong, C. S. Jensen, "Joint Top-K Spatial Keyword Query Processing," IEEE Transactions on Knowledge and Data Engineering, 16 pages, to appear.

Publication
Web users and content are increasingly being geo-positioned, and increased focus is being given to serving local content in response to web queries. This development calls for spatial keyword queries that take into account both the locations and textual descriptions of content. We study the efficient, joint processing of multiple top-k spatial keyword queries. Such joint processing is attractive during high query loads and also occurs when multiple queries are used to obfuscate a user’s true query. We propose a novel algorithm and index structure for the joint processing of top-k spatial keyword queries. Empirical studies show that the proposed solution is efficient on real datasets. We also offer analytical studies on synthetic datasets to demonstrate the efficiency of the proposed solution.
Tzoumas, K., A. Deshpande, C. S. Jensen, "Lightweight Graphical Models for Selectivity Estimation Without Independence Assumptions," in Proceedings of the VLDB Endowment, 4(7), 12 pages, to appear.

Publication
As a result of decades of research and industrial development, modern query optimizers are complex software artifacts. However, the quality of the query plan chosen by an optimizer is largely determined by the quality of the underlying statistical summaries. Small selectivity estimation errors, propagated exponentially, can lead to severely sub-optimal plans. Modern optimizers typically maintain one-dimensional statistical summaries and make the attribute value independence and join uniformity assumptions for efficiently estimating selectivities. Therefore, selectivity estimation errors in today’s optimizers are frequently caused by missed correlations between attributes. We present a selectivity estimation approach that does not make the independence assumptions. By carefully using concepts from the field of graphical models, we are able to factor the joint probability distribution of all the attributes in the database into small, usually two-dimensional distributions. We describe several optimizations that can make selectivity estimation highly efficient, and we present a complete implementation inside PostgreSQL’s query optimizer. Experimental results indicate an order of magnitude better selectivity estimates, while keeping optimization time in the range of tens of milliseconds.
Lu, H., C. S. Jensen, Z. Zhang, "Flexible and Efficient Resolution of Skyline Query Size Constraints," in IEEE Transactions on Knowledge and Data Engineering, 23(7): 991-1005, July 2011.

Publication

Online at IEEE
Given a set of multidimensional points, a skyline query returns the interesting points that are not dominated by other points. It has been observed that the actual cardinality (s) of a skyline query result may differ substantially from the desired result cardinality (k), which has prompted studies on how to reduce s for the case where k<s. This paper goes further by addressing the general case where the relationship between k and s is not known beforehand. Due to their complexity, the existing pointwise ranking and set-wide maximization techniques are not well suited for this problem. Moreover, the former often incurs too many ties in its ranking, and the latter is inapplicable for k>s. Based on these observations, the paper proposes a new approach, called skyline ordering, that forms a skyline-based partitioning of a given data set such that an order exists among the partitions. Then, set-wide maximization techniques may be applied within each partition. Efficient algorithms are developed for skyline ordering and for resolving size constraints using the skyline order. The results of extensive experiments show that skyline ordering yields a flexible framework for the efficient and scalable resolution of arbitrary size constraints on skyline queries.
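The skyline notion used in this abstract can be made concrete with a minimal Python sketch. It is not the paper's skyline ordering algorithm. It assumes the common convention that smaller values are better in every dimension and that a point p dominates a point q when p is no worse in all dimensions and strictly better in at least one. The names `dominates` and `skyline` and the sample data are illustrative only.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere.

    Assumption: smaller is better in every dimension.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach) for five hotels.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (90, 9)]
result = skyline(hotels)
print(result)       # the skyline points
print(len(result))  # the actual cardinality s
```

In this toy data set the skyline holds three points, so s = 3. A user who wants k = 2 or k = 5 results faces exactly the kind of size constraint that the abstract describes.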
Yiu, M. L., C. S. Jensen, J. Møller, H. Lu, "Design and Analysis of a Ranking Approach to Private Location-Based Services," ACM Transactions on Database Systems, 36(2), article 10, 43 pages, May 2011.

Publication [not publicly available]

ACM Author-Izer
Users of mobile services wish to retrieve nearby points of interest without disclosing their locations to the services. This article addresses the challenge of optimizing the query performance while satisfying given location privacy and query accuracy requirements. The article's proposal, SpaceTwist, aims to offer location privacy for k nearest neighbor (kNN) queries at low communication cost without requiring a trusted anonymizer. The solution can be used with a conventional DBMS as well as with a server optimized for location-based services. In particular, we believe that this is the first solution that expresses the server-side functionality in a single SQL statement. In its basic form, SpaceTwist utilizes well-known incremental NN query processing on the server. When augmented with a server-side granular search technique, SpaceTwist is capable of exploiting relaxed query accuracy guarantees for obtaining better performance. We extend SpaceTwist with so-called ring ranking, which improves the communication cost, delayed termination, which improves the privacy afforded the user, and the ability to function in spatial networks in addition to Euclidean space. We report on analytical and empirical studies that offer insight into the properties of SpaceTwist and suggest that our proposal is indeed capable of offering privacy with very good performance in realistic settings.
Vicente, C. R., D. Freni, C. Bettini, C. S. Jensen, "Location-Related Privacy in Geo-Social Networks," IEEE Internet Computing, 15(3): 20-27, May/June 2011.

Publication [not publicly available]

Online at IEEE
Geo-social networks (GeoSNs) provide context-aware services that help associate location with users and content. The proliferation of GeoSNs indicates that they're rapidly attracting users. GeoSNs currently offer different types of services, including photo sharing, friend tracking, and "check-ins." However, this ability to reveal users' locations causes new privacy threats, which in turn call for new privacy-protection methods. The authors study four privacy aspects central to these social networks - location, absence, co-location, and identity privacy - and describe possible means of protecting privacy in these circumstances.
Venetis, P., H. Gonzalez, C. S. Jensen, A. Halevy, "Hyper-Local, Directions-Based Ranking of Places," in Proceedings of the VLDB Endowment, 4(5): 290-301, February 2011.

Publication

Online at VLDB
Studies find that at least 20% of web queries have local intent; and the fraction of queries with local intent that originate from mobile properties may be twice as high. The emergence of standardized support for location providers in web browsers, as well as of providers of accurate locations, enables so-called hyper-local web querying where the location of a user is accurate at a much finer granularity than with IP-based positioning. This paper addresses the problem of determining the importance of points of interest, or places, in local-search results. In doing so, the paper proposes techniques that exploit logged directions queries. A query that asks for directions from a location a to a location b is taken to suggest that a user is interested in traveling to b and thus is a vote that location b is interesting. Such user-generated directions queries are particularly interesting because they are numerous and contain precise locations. Specifically, the paper proposes a framework that takes a user location and a collection of nearby places as arguments, producing a ranking of the places. The framework enables a range of aspects of directions queries to be exploited for the ranking of places, including the frequency with which places have been referred to in directions queries. Next, the paper proposes an algorithm and accompanying data structures capable of ranking places in response to hyper-local web queries. Finally, an empirical study with very large directions query logs offers insight into the potential of directions queries for the ranking of places and suggests that the proposed algorithm is suitable for use in real web search engines.
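The idea of treating a directions query as a vote for its destination can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the paper's framework: the log format (pairs of origin and destination place identifiers), the function name `rank_places`, and the use of raw destination frequency as the only ranking signal are all hypothetical simplifications.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_places(directions_log: Iterable[Tuple[str, str]],
                nearby_places: List[str]) -> List[str]:
    """Rank nearby places by how often they appear as a directions destination.

    Each (origin, destination) entry counts as one vote for the destination.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    votes = Counter(dest for _origin, dest in directions_log)
    return sorted(nearby_places, key=lambda place: (-votes[place], place))


# Hypothetical log of directions queries.
log = [
    ("home", "cafe"), ("office", "cafe"), ("home", "museum"),
    ("hotel", "cafe"), ("hotel", "museum"), ("home", "gym"),
]
print(rank_places(log, ["gym", "museum", "cafe", "library"]))
```

A place that never appears as a destination, such as "library" here, receives zero votes and ranks last. The abstract indicates that the actual framework exploits further aspects of directions queries beyond this frequency count.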
Olsen, M. G., D. Susar, A. Nietzio, M. Snaprud, C. S. Jensen, "Global Web Accessibility Analysis of National Government Portals and Ministry Web Sites," Journal of Information Technology and Politics, 8(1): 41-67, 2011.

Publication [not publicly available]

Online at Informaworld
Equal access to public information and services for all is an essential part of the United Nations Declaration of Human Rights. Today, the Web plays an important role in providing information and services to citizens. Unfortunately, many government Web sites are poorly designed and have accessibility barriers that prevent people with disabilities from using them.
This paper combines current Web accessibility benchmarking methodologies with a sound strategy for comparing Web accessibility among countries and continents. Furthermore, the paper presents the first global analysis of the Web accessibility of 192 United Nations member states made publicly available. The paper also identifies common properties of member states that have accessible and inaccessible Web sites and shows that implementing anti-disability discrimination laws is highly beneficial for the accessibility of Web sites, while signing the United Nations Rights and Dignity of Persons with Disabilities has had no such effect yet.
The paper demonstrates that, despite the commonly held assumption to the contrary, mature high-quality Web sites are more accessible than lower quality ones. Moreover, Web accessibility conformance claims by Web site owners are generally exaggerated.
Yiu, M. L., I. Assent, C. S. Jensen, P. Kalnis, "Outsourced Similarity Search on Metric Data Assets," in IEEE Transactions on Knowledge and Data Engineering, 24(2): 338-352, February 2012.

Publication

Online at IEEE
This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider. The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the most similar data objects to a query example. Outsourcing offers the data owner scalability and a low initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.
2010
Tzoumas, K., A. Deshpande, C. S. Jensen, "Sharing-Aware Horizontal Partitioning for Exploiting Correlations during Query Processing," in Proceedings of the VLDB Endowment, 3(1): 542-554, September 2010.

Publication

Online at VLDB
Optimization of join queries based on average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each partition having substantially different statistical characteristics. It is very compelling to discover such data partitions during query optimization and create multiple plans for a given query, one plan being optimal for a particular combination of data partitions. This scenario calls for the sharing of state among plans, so that common intermediate results are not recomputed. We study this problem in a setting with a routing-based query execution engine based on eddies [1]. Eddies naturally encapsulate horizontal partitioning and maximal state sharing across multiple plans. We define the notion of a conditional join plan, a novel representation of the search space that enables us to address the problem in a principled way. We present a low-overhead greedy algorithm that uses statistical summaries based on graphical models. Experimental results suggest an order of magnitude faster execution time over traditional optimization for high correlations, while maintaining the same performance for low correlations.

[1] R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD, pp. 261-272, 2000.
Cao, X., G. Cong, C. S. Jensen, "Retrieving Top-k Prestige-Based Relevant Spatial Web Objects," in Proceedings of the VLDB Endowment, 3(1): 373-384, September 2010.

Publication

Online at VLDB
The location-aware keyword query returns ranked objects that are near a query location and that have textual descriptions that match query keywords. This query occurs inherently in many types of mobile and traditional web services and applications, e.g., Yellow Pages and Maps services. Previous work considers the potential results of such a query as being independent when ranking them. However, a relevant result object with nearby objects that are also relevant to the query is likely to be preferable over a relevant object without relevant nearby objects.
The paper proposes the concept of prestige-based relevance to capture both the textual relevance of an object to a query and the effects of nearby objects. Based on this, a new type of query, the Location-aware top-k Prestige-based Text retrieval (LkPT) query, is proposed that retrieves the top-k spatial web objects ranked according to both prestige-based relevance and location proximity.
We propose two algorithms that compute LkPT queries. Empirical studies with real-world spatial data demonstrate that LkPT queries are more effective in retrieving web objects than a previous approach that does not consider the effects of nearby objects; and they show that the proposed algorithms are scalable and outperform a baseline approach significantly.
Cao, X., G. Cong, C. S. Jensen, "Mining Significant Semantic Locations From GPS Data," in Proceedings of the VLDB Endowment, 3(1): 1009-1020, September 2010.

Publication

Online at VLDB
With the increasing deployment and use of GPS-enabled devices, massive amounts of GPS data are becoming available. We propose a general framework for the mining of semantically meaningful, significant locations, e.g., shopping malls and restaurants, from such data.
We present techniques capable of extracting semantic locations from GPS data. We capture the relationships between locations and between locations and users with a graph. Significance is then assigned to locations using random walks over the graph that propagate significance among the locations. In doing so, mutual reinforcement between location significance and user authority is exploited for determining significance, as are aspects such as the number of visits to a location, the durations of the visits, and the distances users travel to reach locations.
Studies using up to 100 million GPS records from a confined spatio-temporal region demonstrate that the proposal is effective and is capable of outperforming baseline methods and an extension of an existing proposal.
Jeung, H., M. L. Yiu, X. Zhou, C. S. Jensen, "Path Prediction and Predictive Range Querying in Road Network Databases," in The VLDB Journal, 19(4): 585-602, August 2010.

Publication

Online at SpringerLink
In automotive applications, movement-path prediction enables the delivery of predictive and relevant services to drivers, e.g., reporting traffic conditions and gas stations along the route ahead. Path prediction also enables better results of predictive range queries and reduces the location update frequency in vehicle tracking while preserving accuracy. Existing moving-object location prediction techniques in spatial-network settings largely target short-term prediction that does not extend beyond the next road junction. To go beyond short-term prediction, we formulate a network mobility model that offers a concise representation of mobility statistics extracted from massive collections of historical object trajectories. The model aims to capture the turning patterns at junctions and the travel speeds on road segments at the level of individual objects. Based on the mobility model, we present a maximum likelihood and a greedy algorithm for predicting the travel path of an object (for a time duration h into the future). We also present a novel and efficient server-side indexing scheme that supports predictive range queries on the mobility statistics of the objects. Empirical studies with real data suggest that our proposals are effective and efficient.
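A greedy path predictor of the general kind mentioned in this abstract might be sketched as follows in Python. This is an illustrative assumption rather than the paper's algorithm: the mobility model is reduced to two hypothetical dictionaries, `turn_prob` (the probability of moving from one road segment to each successor segment) and `travel_time` (the expected number of seconds needed to traverse a segment), and the predictor simply follows the most probable turn until the time horizon h is used up.

```python
from typing import Dict, List


def greedy_path(start: str,
                turn_prob: Dict[str, Dict[str, float]],
                travel_time: Dict[str, float],
                h: float) -> List[str]:
    """Predict the segments an object enters within h seconds.

    At each junction the most probable successor segment is chosen.
    Prediction stops when the horizon is reached, when no turning
    statistics exist, or when the path would revisit a segment.
    """
    path = [start]
    elapsed = travel_time[start]  # time needed to leave the start segment
    current = start
    while elapsed < h:
        successors = turn_prob.get(current)
        if not successors:
            break  # no statistics for this junction
        nxt = max(successors, key=successors.get)
        if nxt in path:
            break  # avoid looping forever on cyclic statistics
        path.append(nxt)
        elapsed += travel_time[nxt]
        current = nxt
    return path


# Hypothetical per-object statistics.
turn_prob = {
    "s1": {"s2": 0.7, "s3": 0.3},
    "s2": {"s4": 0.6, "s5": 0.4},
    "s4": {"s6": 1.0},
}
travel_time = {"s1": 30, "s2": 40, "s3": 20, "s4": 50, "s5": 10, "s6": 60}
print(greedy_path("s1", turn_prob, travel_time, h=100))
```

With a horizon of 100 seconds the sketch predicts that the object traverses s1 and s2 and enters s4. A maximum likelihood predictor, the other option named in the abstract, would instead compare the overall probabilities of whole candidate paths.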
Freni, D., C. Bettini, C. R. Vicente, S. Mascetti, C. S. Jensen, "Preserving Location and Absence Privacy in Geo-Social Networks," in Proceedings of the 19th ACM Conference on Information and Knowledge Management, Toronto, Canada, pp. 309-318, October 26-30, 2010.

Publication [not publicly available]

ACM Author-Izer
Online social networks often involve very large numbers of users who share very large volumes of content. This content is increasingly being tagged with geo-spatial and temporal coordinates that may then be used in services. For example, a service may retrieve photos taken in a certain region. The resulting geo-aware social networks (GeoSNs) pose privacy threats beyond those found in location-based services. Content published in a GeoSN is often associated with references to multiple users, without the publisher being aware of the privacy preferences of those users. Moreover, this content is often accessible to multiple users. This renders it difficult for GeoSN users to control which information about them is available and to whom it is available.
This paper addresses two privacy threats that occur in GeoSNs: location privacy and absence privacy. The former concerns the availability of information about the presence of users in specific locations at given times, while the latter concerns the availability of information about the absence of an individual from specific locations during given periods of time. The challenge addressed is that of supporting privacy while still enabling useful services. We believe this is the first paper to formalize these two notions of privacy and to propose techniques for enforcing them. The techniques offer privacy guarantees, and the paper reports on empirical performance studies of the techniques.
Hansen, R., R. Wind, C. S. Jensen, B. Thomsen, "Algorithmic Strategies for Adapting to Environmental Changes in 802.11 Location Fingerprinting," in Proceedings of the 2010 International Conference on Indoor Positioning and Indoor Navigation, Zurich, Switzerland, 10 pages, September 15-17, 2010.

Publication

Online at IEEE
This paper studies novel algorithmic strategies that enable 802.11 location fingerprinting to adapt to environmental changes. A long-standing challenge in location fingerprinting has been that dynamic changes, such as the presence of people, the opening and closing of doors, or changing humidity levels, may influence the 802.11 signal strengths to an extent where a static radio map is rendered useless. To counter this effect, related research efforts propose to install additional sensors in order to adapt a previously built radio map to the circumstances at a given time. Although effective, this is not a viable solution for ubiquitous positioning where localization is required in many different buildings. Instead, we propose algorithmic strategies for dealing with changing environmental dynamics. We have performed an evaluation of our algorithms on signal strength data collected over a two-month period at Aalborg University. The results show a vast improvement over using traditional static radio maps.
Gonzalez, H., A. Halevy, C. S. Jensen, A. Langen, J. Madhavan, R. Shapely, W. Shen, J. Goldberg-Kidon, "Google Fusion Tables: Web-Centered Data Management and Collaboration," in Proceedings of the 2010 ACM Symposium on Cloud Computing, Indianapolis, IN, USA, pp. 175-180, June 10-11, 2010.

Publication [not publicly available]

ACM Author-Izer
It has long been observed that database management systems focus on traditional business applications, and that few people use a database management system outside their workplace. Many have wondered what it will take to enable the use of data management technology by a broader class of users and for a much wider range of applications. Google Fusion Tables represents an initial answer to the question of how data management functionality that focuses on enabling new users and applications would look in today's computing environment. This paper characterizes such users and applications and highlights the resulting principles, such as seamless Web integration, emphasis on ease of use, and incentives for data sharing, that underlie the design of Fusion Tables. We describe key novel features, such as the support for data acquisition, collaboration, visualization, and web-publishing.
Goldberg-Kidon, J., H. Gonzalez, A. Halevy, C. S. Jensen, A. Langen, J. Madhavan, R. Shapely, "Google Fusion Tables: Data Management, Integration and Collaboration in the Cloud," in Proceedings of the 2010 ACM SIGMOD International Conference on the Management of Data, Indianapolis, IN, USA, pp. 1061-1066, June 6-11, 2010.

Publication [not publicly available]

ACM Author-Izer
Google Fusion Tables is a cloud-based service for data management and integration. Fusion Tables enables users to upload tabular data files (spreadsheets, CSV, KML), currently of up to 100MB. The system provides several ways of visualizing the data (e.g., charts, maps, and timelines) and the ability to filter and aggregate the data. It supports the integration of data from multiple sources by performing joins across tables that may belong to different users. Users can keep the data private, share it with a select set of collaborators, or make it public and thus crawlable by search engines. The discussion feature of Fusion Tables allows collaborators to conduct detailed discussions of the data at the level of tables and individual rows, columns, and cells. This paper describes the inner workings of Fusion Tables, including the storage of data in the system and the tight integration with the Google Maps infrastructure.
Jensen, C. S., H. Lu, B. Yang, "Indoor - A New Data Management Frontier," in M. Mokbel (ed.): Special Issue on New Frontiers in Spatial and Spatio-temporal Database Systems, IEEE Data Engineering Bulletin, 33(2): 12-17, June 2010.

Publication

Online
Much research has been conducted on the management of outdoor moving objects. In contrast, relatively little research has been conducted on indoor moving objects. The indoor setting differs from outdoor settings in important ways, including the following two. First, indoor spaces exhibit complex topologies. They are composed of entities that are unique to indoor settings, e.g., rooms and hallways that are connected by doors. As a result, conventional Euclidean distance and spatial network distance are inapplicable in indoor spaces. Second, accurate, GPS-like positioning is typically unavailable in indoor spaces. Rather, positioning is achieved through the use of technologies such as Bluetooth, Infrared, RFID, or Wi-Fi. This typically results in much less reliable and accurate positioning.
This paper covers some preliminary research that explicitly targets an indoor setting. Specifically, we describe a graph-based model that enables the effective and efficient tracking of indoor objects using proximity-based positioning technologies like RFID and Bluetooth. Furthermore, we categorize objects according to their position-related states, present an on-line hash-based object indexing scheme, and conduct an uncertainty analysis for indoor objects. We end by identifying several interesting and important directions for future research.
Jensen, C. S., S. Madria, "Message from the General Chairs," in Proceedings of the Eleventh International Conference on Mobile Data Management, Kansas City, MO, USA, p. xii, May 23-26, 2010.

Publication [not publicly available]
Jensen, C. S., H. Lu, "The Great Indoors: A Data Management Frontier," in Proceedings of the Second Workshop on Research Directions in Situational-aware Self-managed Proactive Computing in Wireless Adhoc Networks, Kansas City, MO, USA, 3 pages, May 2010.

Publication [not publicly available]
Much of the research on data management for moving objects has assumed an outdoor setting in which objects move in Euclidean space (possibly constrained) or some form of spatial network and in which GPS or GPS-like positioning is assumed explicitly or implicitly. That body of research provides part of an enabling foundation for the growing Location-Based Services industry.
However, we lead large parts of our lives in indoor spaces: homes, office buildings, shopping and leisure facilities, and collective transportation infrastructures. The latter may be large: For example, each day in 2009, London Heathrow Airport, UK had on average 180,000 passengers, and the Tokyo Subway (Tokyo Metro and Toei Subway), Japan delivered a daily average of 8.7 million passenger rides in 2008. Tokyo's Shinjuku Station alone was used by an average of 3.64 million passengers per day in 2007.
Indoor differs from outdoor in important ways and thus calls for new research. The remainder of this paper covers selected differences between indoor and outdoor and discusses the implications for research.
Cao, X., G. Cong, B. Cui, C. S. Jensen, "A Generalized Framework of Exploring Category Information for Question Retrieval in Community Question Answer Archives," in Proceedings of the Nineteenth International World Wide Web Conference, Raleigh, NC, USA, pp. 201-210, April 26-30, 2010.

Publication [not publicly available]

ACM Author-Izer
Community Question Answering (CQA) has emerged as a popular type of service where users ask and answer questions and access historical question-answer pairs. CQA archives contain very large volumes of questions organized into a hierarchy of categories. As an essential function of CQA services, question retrieval in a CQA archive aims to retrieve historical question-answer pairs that are relevant to a query question. In this paper, we present a new approach to exploiting category information of questions for improving the performance of question retrieval, and we apply the approach to existing question retrieval models, including a state-of-the-art question retrieval model. Experiments conducted on real CQA data demonstrate that the proposed techniques are capable of outperforming a variety of baseline methods significantly.
Yang, B., H. Lu, C. S. Jensen, "Probabilistic Threshold k Nearest Neighbor Queries over Moving Objects in Symbolic Indoor Space," in Proceedings of the Thirteenth International Conference on Extending Database Technology, Lausanne, Switzerland, pp. 335-346, March 22-26, 2010.

Publication [not publicly available]

ACM Author-Izer
The availability of indoor positioning renders it possible to deploy location-based services in indoor spaces. Many such services will benefit from the efficient support for k nearest neighbor (kNN) queries over large populations of indoor moving objects. However, existing kNN techniques fall short in indoor spaces because these differ from Euclidean and spatial network spaces and because of the limited capabilities of indoor positioning technologies.
To contend with indoor settings, we propose the new concept of minimal indoor walking distance (MIWD) along with algorithms and data structures for distance computing and storage; and we differentiate the states of indoor moving objects based on a positioning device deployment graph, utilize these states in effective object indexing structures, and capture the uncertainty of object locations. On these foundations, we study the probabilistic threshold kNN (PTkNN) query. Given a query location q and a probability threshold T, this query returns all subsets of k objects that have probability larger than T of containing the kNN query result of q. We propose a combination of three techniques for processing this query. The first uses the MIWD metric to prune objects that are too far away. The second uses fast probability estimates to prune unqualified objects and candidate result subsets. The third uses efficient probability evaluation for computing the final result on the remaining candidate subsets. An empirical study using both synthetic and real data shows that the techniques are efficient.
Jensen, C. S., "Foreword," Moving Objects Management: Models, Techniques and Applications, by X. Meng and J. Chen, Springer Verlag, February 2010.

Publication [not publicly available]

Online at SpringerLink
Lu, H., C. S. Jensen, Z. Zhang, "Skyline Ordering: A Flexible Framework for Efficient Resolution of Size Constraints on Skyline Queries," DB Technical Report TR-27, Department of Computer Science, Aalborg University, 28 pages, January 2010.

Publication
Given a set of multi-dimensional points, a skyline query returns the interesting points that are not dominated by other points. It has been observed that the actual cardinality (s) of a skyline query result may differ substantially from the desired result cardinality (k), which has prompted studies on how to reduce s for the case where k < s.
This paper goes further by addressing the general case where the relationship between k and s is not known beforehand. Due to their complexity, the existing pointwise ranking and set-wide maximization techniques are not well suited for this problem. Moreover, the former often incurs too many ties in its ranking, and the latter is inapplicable for k > s. Based on these observations, the paper proposes a new approach, called skyline ordering, that forms a skyline-based partitioning of a given data set, such that an order exists among the partitions. Then set-wide maximization techniques may be applied within each partition. Efficient algorithms are developed for skyline ordering and for resolving size constraints using the skyline order. The results of extensive experiments show that skyline ordering yields a flexible framework for the efficient and scalable resolution of arbitrary size constraints on skyline queries.
Yiu, M.L., G. Ghinita, C. S. Jensen, P. Kalnis, "Enabling Search Services on Outsourced Private Spatial Data," in The VLDB Journal, 19(3): 363-384, 2010.

Publication

Online at SpringerLink
Cloud computing services enable organizations and individuals to outsource the management of their data to a service provider in order to save on hardware investments and reduce maintenance costs. Only authorized users are allowed to access the data. Nobody else, including the service provider, should be able to view the data. For instance, a real-estate company that owns a large database of properties wants to allow its paying customers to query for houses according to location. On the other hand, the untrusted service provider should not be able to learn the property locations and, e.g., sell the information to a competitor. To tackle the problem, we propose to transform the location datasets before uploading them to the service provider. The paper develops a spatial transformation that re-distributes the locations in space, and it also proposes a cryptographic-based transformation. The data owner selects the transformation key and shares it with authorized users. Without the key, it is infeasible to reconstruct the original data points from the transformed points. The proposed transformations present distinct trade-offs between query efficiency and data confidentiality. In addition, we describe attack models for studying the security properties of the transformations. Empirical studies demonstrate that the proposed methods are efficient and applicable in practice.
Ruxanda, M. M., A. Nanopoulos, C. S. Jensen, "Flexible Fusion of Relevance and Importance in Music Ranking," in Journal of New Music Research, 39(1): 35-45, 2010.

Publication [not publicly available]

Online at informaworld
Due to the proliferation of audio files on the Web and in large digital music collections, the ranking of the retrieved music becomes an important issue in Music Information Retrieval. This paper proposes a music-ranking strategy that can identify and flexibly fuse music audio, ranging from similar and potentially serendipitous music to authoritative or mainstream music. The notions of similar and authoritative music are identified in a twofold manner, based on user preference data and on acoustic features extracted from the audio. The music ranking employs kernel functions that users can control through intuitive parameter tuning. A research prototype system that incorporates the ranking mechanism is developed, and a user study is conducted on its use. The results of the survey indicate user satisfaction with the proposed music-ranking strategy and its real-world applicability.
2009
Tiesyte, D., C. S. Jensen, "Assessing the Predictability of Scheduled-Vehicle Travel Times," in Proceedings of the Seventeenth ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, pp. 416-419, November 4-6, 2009.

Publication [not publicly available]

ACM Author-Izer
One of the most desired and challenging services in collective transport systems is the real-time prediction of the near-future travel times of scheduled vehicles, especially public buses, thus improving the experience of the transportation users, who may be able to better schedule their travel, and also enabling system operators to perform real-time monitoring. While travel-time prediction has been researched extensively during the past decade, the accuracies of existing techniques fall short of what is desired, and proposed mathematical prediction models are often not transferable to other systems because the properties of the travel-time-related data of vehicles are highly context-dependent, making the models difficult to fit. We propose a framework for evaluating various predictability types of the data independently of the model, and we also compare predictability analysis results of travel times with the actual prediction errors for real bus trajectories. We have applied the proposed framework to real-time data collected from buses operating in Copenhagen, Denmark.
Yang, B., H. Lu, C. S. Jensen, "Scalable Continuous Range Monitoring of Moving Objects in Symbolic Indoor Space," in Proceedings of the 18th ACM Conference on Information and Knowledge Management, Hong Kong, pp. 671-680, November 2-6, 2009.

Publication [not publicly available]

ACM Author-Izer
Indoor spaces accommodate large populations of individuals. The continuous range monitoring of such objects can be used as a foundation for a wide variety of applications, e.g., space planning, way finding, and security. Indoor space differs from outdoor space in that symbolic locations, e.g., rooms, rather than Euclidean positions or spatial network locations are important. In addition, positioning based on presence sensing devices, rather than, e.g., GPS, is assumed. Such devices report the objects in their activation ranges. We propose an incremental, query-aware continuous range query processing technique for objects moving in this setting. A set of critical devices is determined for each query, and only the observations from those devices are used to continuously maintain the query result. Due to the limitations of the positioning devices, queries contain certain and uncertain results. A maximum-speed constraint on object movement is used to refine the latter results. A comprehensive experimental study with both synthetic and real data suggests that our proposal is efficient and scalable.
Cao, X., G. Cong, B. Cui, C. S. Jensen, C. Zhang, "The use of categorization information in language models for question retrieval," in Proceedings of the 18th ACM Conference on Information and Knowledge Management, Hong Kong, pp. 265-274, November 2-6, 2009.

Publication [not publicly available]

ACM Author-Izer
Community Question Answering (CQA) has emerged as a popular type of service meeting a wide range of information needs. Such services enable users to ask and answer questions and to access existing question-answer pairs. CQA archives contain very large volumes of valuable user-generated content and have become important information resources on the Web. To make the body of knowledge accumulated in CQA archives accessible, effective and efficient question search is required. Question search in a CQA archive aims to retrieve historical questions that are relevant to new questions posed by users. This paper proposes a category-based framework for search in CQA archives. The framework embodies several new techniques that use language models to exploit categories of questions for improving question-answer search. Experiments conducted on real data from Yahoo! Answers demonstrate that the proposed techniques are effective and efficient and are capable of outperforming baseline methods significantly.
Jensen, C. S., H. Lu, B. Yang, "Indexing the Trajectories of Moving Objects in Symbolic Indoor Space," in Proceedings of the Eleventh International Symposium on Spatial and Temporal Databases, Aalborg, Denmark, pp. 208-227, July 8-10, 2009.

Publication [not publicly available]

Online at SpringerLink
Indoor spaces accommodate large populations of individuals. With appropriate indoor positioning, e.g., Bluetooth and RFID, in place, large amounts of trajectory data result that may serve as a foundation for a wide variety of applications, e.g., space planning, way finding, and security. This scenario calls for the indexing of indoor trajectories. Based on an appropriate notion of indoor trajectory and definitions of pertinent types of queries, the paper proposes two R-tree based structures for indexing object trajectories in symbolic indoor space. The RTR-tree represents a trajectory as a set of line segments in a space spanned by positioning readers and time. The TP2R-tree applies a data transformation that yields a representation of trajectories as points with extension along the time dimension. The paper details the structure, node organization strategies, and query processing algorithms for each index. An empirical performance study suggests that the two indexes are effective, efficient, and robust. The study also elicits the circumstances under which our proposals perform the best.
Hansen, R., R. Wind, C. S. Jensen, B. Thomsen, "Pretty Easy Pervasive Positioning," in Proceedings of the Eleventh International Symposium on Spatial and Temporal Databases, Aalborg, Denmark, pp. 417-421, July 8-10, 2009.

Publication [not publicly available]

Online at SpringerLink
With the increasing availability of positioning based on GPS, Wi-Fi, and cellular technologies and the proliferation of mobile devices with GPS, Wi-Fi and cellular connectivity, ubiquitous positioning is becoming a reality. While offerings by companies such as Google, Skyhook, and Spotigo render positioning possible in outdoor settings, including urban environments with limited GPS coverage, they remain unable to offer accurate indoor positioning. We will demonstrate a software infrastructure that makes it easy for anybody to build support for accurate Wi-Fi based positioning in buildings. All that is needed is a building with Wi-Fi coverage, access to the building, a floor plan of the building, and a Wi-Fi enabled device. Specifically, we will explain the software infrastructure and the steps that must be completed to obtain support for positioning. And we will demonstrate the positioning obtained, including how it interoperates with outdoor GPS positioning.
Vicente, C. R., M. Kirkpatrick, G. Ghinita, E. Bertino, C. S. Jensen, "Towards location-based access control in healthcare emergency response," in Proceedings of the Second SIGSPATIAL ACM GIS International Workshop on Security and Privacy in GIS and LBS, Seattle, WA, USA, pp. 22-26, November 3, 2009.

Publication [not publicly available]

ACM Author-Izer
Recent advances in positioning and tracking technologies have led to the emergence of novel location-based applications that allow participants to access information relevant to their spatio-temporal context. Traditional access control models, such as role-based access control (RBAC), are not sufficient to address the new challenges introduced by these location-based applications. Several recent research efforts have enhanced RBAC with spatio-temporal features. Nevertheless, the state-of-the-art does not deal with mobility of both subjects and objects, and does not support complex access control decisions based on spatio-temporal relationships among subjects and objects. Furthermore, such relationships change frequently in dynamic environments, requiring efficient mechanisms to monitor and re-evaluate access control decisions. In this position paper, we present a healthcare emergency response scenario which highlights the novel challenges that arise when enforcing access control in an environment with moving subjects and objects. To address a realistic application scenario, we consider movement on road networks, and we identify complex access control decisions relevant to such settings. We overview the main technical issues to be addressed, and we describe the architecture for policy decision and enforcement points.
Tzoumas, K., M. L. Yiu, C. S. Jensen, "Workload-Aware Indexing of Continuously Moving Objects," in Proceedings of the VLDB Endowment, Vol. 2, No. 1-2, pp. 1186-1197, 2009.

Publication

Online at VLDB
The increased deployment of sensors and data communication networks yields data management workloads with update loads that are intense, skewed, and highly bursty. Query loads resulting from location-based services are expected to exhibit similar characteristics. In such environments, index structures can easily become performance bottlenecks. We address the need for indexing that is adaptive to the workload characteristics, called workload-aware, in order to cover the space in between maintaining an accurate index, and having no index at all. Our proposal, QU-Trade, extends R-tree type indexing and achieves workload-awareness by controlling the underlying index's filtering quality. QU-Trade safely drops index updates, increasing the overlap in the index when the workload is update-intensive, and it restores the filtering capabilities of the index when the workload becomes query-intensive. This is done in a non-uniform way in space so that the quality of the index remains high in frequently queried regions, while it deteriorates in frequently updated regions. The adaptation occurs online, without the need for a learning phase. We apply QU-Trade to the R-tree and the TPR-tree, and we offer analytical and empirical studies. In the presence of substantial workload skew, QU-Trade can achieve index update costs close to zero and can also achieve virtually the same query cost as the underlying index.
Cong, G., C. S. Jensen, D. Wu, "Efficient Retrieval of the Top-k Most Relevant Spatial Web Objects," in Proceedings of the VLDB Endowment, Vol. 2, No. 1-2, pp. 337-348, 2009.

Publication

Online at VLDB
The conventional Internet is acquiring a geo-spatial dimension. Web documents are being geo-tagged, and geo-referenced objects such as points of interest are being associated with descriptive text documents. The resulting fusion of geo-location and documents enables a new kind of top-k query that takes into account both location proximity and text relevancy. To our knowledge, only naive techniques exist that are capable of computing a general web information retrieval query while also taking location into account. This paper proposes a new indexing framework for location-aware top-k text retrieval. The framework leverages the inverted file for text retrieval and the R-tree for spatial proximity querying. Several indexing approaches are explored within the framework. The framework encompasses algorithms that utilize the proposed indexes for computing the top-k query, thus taking into account both text relevancy and location proximity to prune the search space. Results of empirical studies with an implementation of the framework demonstrate that the paper's proposal offers scalability and is capable of excellent performance.
Zhang, M., S. Chen, C. S. Jensen, B. C. Ooi, Z. Zhang, "Effectively Indexing Uncertain Moving Objects for Predictive Queries," in Proceedings of the VLDB Endowment, Vol. 2, No. 1-2, pp. 1198-1209, 2009.

Publication

Online at VLDB
Moving object indexing and query processing is a well-studied research topic, with applications in areas such as intelligent transport systems and location-based services. While much existing work explicitly or implicitly assumes a deterministic object movement model, real-world objects often move in more complex and stochastic ways. This paper investigates the possibility of a marriage between moving-object indexing and probabilistic object modeling. Given the distributions of the current locations and velocities of moving objects, we devise an efficient inference method for the prediction of future locations. We demonstrate that such prediction can be seamlessly integrated into existing index structures designed for moving objects, thus improving the meaningfulness of range and nearest neighbor query results in highly dynamic and uncertain environments. The paper reports on extensive experiments on the Bx-tree that offer insights into the properties of the paper's proposal.
Jensen, C. S., H. Lu, B. Yang, "Graph Model Based Indoor Tracking," in Proceedings of the Tenth International Conference on Mobile Data Management, Taipei, Taiwan, pp. 122-131, May 18-21, 2009.

Publication

Online at IEEE
The tracking of the locations of moving objects in large indoor spaces is important, as it enables a range of applications related to, e.g., security and indoor navigation and guidance. This paper presents a graph model based approach to indoor tracking that offers a uniform data management infrastructure for different symbolic positioning technologies, e.g., Bluetooth and RFID. More specifically, the paper proposes a model of indoor space that comprises a base graph and mappings that represent the topology of indoor space at different levels. The resulting model can be used for one or several indoor positioning technologies. Focusing on RFID-based positioning, an RFID specific reader deployment graph model is built from the base graph model. This model is then used in several algorithms for constructing and refining trajectories from raw RFID readings. Empirical studies with implementations of the models and algorithms suggest that the paper's proposals are effective and efficient.
Pedersen, T. B., A. Shoshani, J. Gu, C. S. Jensen, "Object-Extended OLAP Querying," in Data and Knowledge Engineering, Vol. 68, No. 5, pp. 453-480, May 2009.

Publication [not publicly available]

Online by Elsevier at ScienceDirect
On-line analytical processing (OLAP) systems based on a dimensional view of data have found widespread use in business applications and are being used increasingly in non-standard applications. These systems provide good performance and ease of use. However, the complex structures and relationships inherent in data in non-standard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well. This paper presents the concepts and techniques underlying a flexible, "multi-model" federated system that enables OLAP users to exploit simultaneously the features of OLAP and object systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for dimensional data and object database systems for more complex, general data. This allows data analysis on the OLAP data to be significantly enriched by the use of additional object data. Additionally, physical integration of the OLAP and the object data can be avoided. As a vehicle for demonstrating the capabilities of the system, a prototypical OLAP language is defined and extended to naturally support queries that involve data in object databases. The language permits selection criteria that reference object data, queries that return combinations of OLAP and object data, and queries that group dimensional data according to object data. The system is designed to be aggregation-safe, in the sense that it exploits the aggregation semantics of the data to prevent incorrect or meaningless query results. These capabilities may also be integrated into existing languages. It is shown how to integrate relational and XML data using the technology. A prototype implementation of the system is reported, along with performance measurements that show that the approach is a viable alternative to a physically integrated data warehouse.
Ruxanda, M. M., B. Y. Chua, A. Nanopoulos, C. S. Jensen, "Emotion-based Music Retrieval on a Well-reduced Audio Feature Space," in Proceedings of the 34th International Conference on Acoustics, Speech, and Signal Processing, Taipei, Taiwan, pp. 181-184, April 19-24, 2009.

Publication

Online at IEEE
Music expresses emotion. A number of features extracted from audio influence the perceived emotional expression of music. These audio features generate a high-dimensional space on which music similarity retrieval can be performed effectively with respect to human perception of the music emotion. However, real-time systems that retrieve music from large music databases can achieve an order-of-magnitude performance increase by applying multidimensional indexing over a dimensionally reduced audio feature space. To achieve this performance, this paper conducts extensive studies of a number of dimensionality reduction algorithms, including both classic and novel approaches. The paper identifies which dimensionality reduction techniques, on the considered audio feature space, preserve on average the accuracy of emotion-based music retrieval.
Zhou, Y., G. Cong, B. Cui, C. S. Jensen, J. Yao, "Routing Questions to the Right Users in Online Communities," in Proceedings of the 25th International Conference on Data Engineering, Shanghai, China, pp. 700-711, March 29 - April 4, 2009.

Publication

Online at IEEE
Online forums contain huge amounts of valuable user-generated content. In current forum systems, users have to passively wait for other users to visit the forum systems and read/answer their questions. The user experience for question answering suffers from this arrangement. In this paper, we address the problem of "pushing" the right questions to the right persons, the objective being to obtain quick, high-quality answers, thus improving user satisfaction. We propose a framework for the efficient and effective routing of a given question to the top-k potential experts (users) in a forum, by utilizing both the content and structures of the forum system. First, we compute the expertise of users according to the content of the forum system. This is to estimate the probability of a user being an expert for a given question based on the previous question answering of the user. Specifically, we design three models for this task, including a profile-based model, a thread-based model, and a cluster-based model. Second, we re-rank the user expertise measured in probability by utilizing the structural relations among users in a forum system. The results of the two steps can be integrated naturally in a probabilistic model that computes a final ranking score for each user. Experimental results show that the proposals are very promising.
Yiu, M. L., G. Ghinita, C. S. Jensen, P. Kalnis, "Outsourcing Search Services on Private Spatial Data," in Proceedings of the 25th International Conference on Data Engineering, Shanghai, China, pp. 1140-1143, March 29 - April 4, 2009.

Publication

Online at IEEE
Social networking and content sharing service providers, e.g., Facebook and Google Maps, enable their users to upload and share a variety of user-generated content, including location data such as points of interest. Users wish to share location data through an (untrusted) service provider such that trusted friends can perform spatial queries on the data. We solve the problem by transforming the location data before uploading them. We contribute spatial transformations that redistribute locations in space and a transformation that employs cryptographic techniques. The data owner selects transformation keys and shares them with the trusted friends. Without the keys, it is infeasible for an attacker to reconstruct the exact original data points from the transformed points. These transformations achieve different tradeoffs between query efficiency and data security. In addition, we describe an attack model for studying the security properties of the transformations. Empirical studies suggest that the proposed methods are secure and efficient.
Tradišauskas, N., J. Juhl, H. Lahrmann, C. S. Jensen, "Map Matching for Intelligent Speed Adaptation," in IET Intelligent Transport Systems, Vol. 3, No. 1, pp. 57-66, March 2009.

Publication [not publicly available]

Online at IET Digital Library
The availability of Global Navigation Satellite Systems (GNSS) enables sophisticated vehicle guidance and advisory systems such as Intelligent Speed Adaptation (ISA) systems. In ISA systems, it is essential to be able to position vehicles within a road network. Because digital road networks as well as GNSS positioning are often inaccurate, a technique known as map matching is needed that aims to use this inaccurate data for determining a vehicle's real road-network position. Then, knowing this position, an ISA system can compare the vehicle's speed with the speed limit in effect and take measures against speeding. An on-line map-matching algorithm is presented with an extensive number of weighting parameters that allow better determination of a vehicle's road-network position. The algorithm uses a certainty value to express its belief in the correctness of its results. The algorithm was designed and implemented to be used in the large-scale ISA project "Spar på farten". Using test data and data collected from project participants, the algorithm's performance is evaluated. It is shown that the algorithm performs correctly 95% of the time and is capable of handling GNSS positioning errors in a conservative manner.
Hansen, R., R. Wind, C. S. Jensen, B. Thomsen, "Seamless Indoor/Outdoor Positioning Handover for Location-Based Services in Streamspin," in Proceedings of the Tenth International Conference on Mobile Data Management, Taipei, Taiwan, pp. 267-272, May 18-21, 2009.

Publication

Online at IEEE
This paper presents the implementation of a novel seamless indoor/outdoor positioning service for mobile users. The service is being made available in the Streamspin system (www.streamspin.com), an open platform for the creation and delivery of location-based services. Streamspin seeks to enable the delivery of truly ubiquitous location-based services by integrating GPS and Wi-Fi location fingerprinting. The paper focuses on key aspects of the seamless handover between outdoor and indoor positioning. Several different handover solutions are presented, and their applicability is evaluated with respect to positioning accuracy and battery consumption of the mobile device.
Jensen, C. S., "Data Management Infrastructure for the Mobile Web," in Proceedings of the Fifth International Conference on Semantics, Knowledge and Grid, Zhuhai, China, p. 1. Invited paper, October 12-14, 2009.

Publication [not publicly available]
The Internet is going mobile, and indications are that the mobile Internet will be "bigger" than the conventional Internet. Due to aspects such as user mobility, much more diverse use situations, and the form factor of mobile devices, context awareness is important on the mobile Internet. Focusing on geo-spatial context awareness, this talk covers research that aims to build infrastructure for mobile data management.
Shestakov, N. A., C. S. Jensen, "Extending Mobile Service Context with User Routes in the Streamspin Platform," Bulletin of the Tomsk Polytechnic University, Vol. 314, No. 5, pp. 170-175 (in Russian), May 2009.

Publication [not publicly available]
Jensen, C. S., R. T. Snodgrass, "Absolute Time," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 1, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Applicability Period," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 98-99, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Bitemporal Interval," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 243, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Bitemporal Relation," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 243-244, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Calendar," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 304-305, 2009.

Online at SpringerLink
Dyreson, C., C. S. Jensen, R. T. Snodgrass, "Calendric System," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 305, 2009.

Online at SpringerLink
Böhlen, M. H., C. S. Jensen, R. T. Snodgrass, "Current Semantics," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 544-545, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Event in Temporal Databases," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 1045-1046, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Fixed Time Span," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 1141, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Forever," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 1161, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "History in Temporal Databases," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 1319, 2009.

Online at SpringerLink
Šaltenis, S., C. S. Jensen, "Indexing of the Current and Near-Future Positions of Moving Objects," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 1458-1463, 2009.

Online at SpringerLink
Jensen, C. S., "Lifespan," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 1612-1613, 2009.

Online at SpringerLink
Jensen, C. S., N. Tradišauskas, "Map Matching," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 1692-1696, 2009.

Online at SpringerLink
Böhlen, M. H., C. S. Jensen, R. T. Snodgrass, "Nonsequenced Semantics," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 1913-1915, 2009.

Online at SpringerLink
Dyreson, C., C. S. Jensen, R. T. Snodgrass, "Now in Temporal Databases," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 1920-1924, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Relative Time," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2376-2377, 2009.

Online at SpringerLink
Böhlen, M. H., C. S. Jensen, "Sequenced Semantics," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2619-2621, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Snapshot Equivalence," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 2659, 2009.

Online at SpringerLink
Böhlen, M. H., J. Gamper, C. S. Jensen, R. T. Snodgrass, "SQL-Based Temporal Query Languages," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2762-2768, 2009.

Online at SpringerLink
Böhlen, M. H., J. Gamper, C. S. Jensen, "Temporal Aggregation," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2924-2929, 2009.

Online at SpringerLink
Böhlen, M. H., C. S. Jensen, R. T. Snodgrass, "Temporal Compatibility," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2936-2939, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Temporal Database," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2957-2960, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Temporal Data Models," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2952-2957, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Temporal Element," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 2966, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Temporal Expression," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 2967, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Temporal Generalization," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 2967-2968, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Temporal Homogeneity," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 2973, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Temporal Projection," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 3008, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Temporal Query Languages," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3009-3012, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Temporal Specialization," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3017-3018, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Time Instant," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 3112, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Time Interval," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3112-3113, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Time Span," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 3119, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Timeslice Operator," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3120-3121, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Transaction Time," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3162-3163, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "User-Defined Time," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 3252, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Valid Time," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3253-3254, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Variable Time Span," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, p. 3259, 2009.

Online at SpringerLink
Jensen, C. S., R. T. Snodgrass, "Weak Equivalence," in Encyclopedia of Database Systems, edited by L. Liu and M. T. Özsu, Springer Verlag, pp. 3455-3456, 2009.

Online at SpringerLink
Jensen, C. S., H. Lu, M. L. Yiu, "Location Privacy Techniques in Client-Server Architectures," Chapter 2 in Privacy in Location-Based Applications, edited by C. Bettini, S. Jajodia, P. Samarati, and X. Sean Wang, Lecture Notes in Computer Science Vol. 5599, Springer Verlag, pp. 31-58, 2009.

Publication

Online at SpringerLink
A typical location-based service returns nearby points of interest in response to a user location. As such services are becoming increasingly available and popular, location privacy emerges as an important issue. In a system that does not offer location privacy, users must disclose their exact locations in order to receive the desired services. We view location privacy as an enabling technology that may lead to increased use of location-based services.
In this chapter, we consider location privacy techniques that work in traditional client-server architectures without any trusted components other than the client's mobile device. Such techniques have important advantages. First, they are relatively easy to implement because they do not rely on any trusted third-party components. Second, they have potential for wide application, as the client-server architecture remains dominant for web services. Third, their effectiveness is independent of the distribution of other users, unlike the k-anonymity approach.
The chapter characterizes the privacy models assumed by existing techniques and categorizes these according to their approach. The techniques are then covered in turn according to their category. Techniques in the first category enlarge the client's position into a region before it is sent to the server. Next, dummy-based techniques hide the user's true location among fake locations, called dummies. In progressive retrieval, candidate results are retrieved iteratively from the server, without disclosing the exact user location. Finally, transformation-based techniques employ cryptographic transformations so that the service provider is unable to decipher the exact user locations. We end by pointing out promising directions and open problems.
2008

Redoutey, M., E. Scotti, C. S. Jensen, C. Ray, C. Claramunt, "Efficient Vessel Tracking with Accuracy Guarantees," in Proceedings of the Eighth International Symposium on Web and Wireless Geographical Information Systems, Shanghai, China, pp. 140-151, December 11-12, 2008.

Publication

Online at SpringerLink
Safety and security are top concerns in maritime navigation, particularly as maritime traffic continues to grow and as crew sizes are reduced. The Automatic Identification System (AIS) plays a key role in regard to these concerns. This system, whose objective is in part to identify and locate vessels, transmits location-related information from vessels to ground stations that are part of a so-called Vessel Traffic Service (VTS), thus enabling these to track the movements of the vessels. This paper presents techniques that improve the existing AIS by offering better and guaranteed tracking accuracies at lower communication costs. The techniques employ movement predictions that are shared between vessels and the VTS. Empirical studies with a prototype implementation and real vessel data demonstrate that the techniques are capable of significantly improving the AIS.
Jensen, C. S., C. R. Vicente, R. Wind, "User-Generated Content: The Case for Mobile Services," IEEE Computer, Vol. 41, No. 12, pp. 116-118, December 2008.

Publication

Online at IEEE
Enabling user-generated services could help fuel the mobile revolution. Web sites that enable the sharing of user-generated content such as photos and videos are immensely popular, and their use is on the rise. Technologies that enable Web sites to support the creation, sharing, and deployment of user-generated mobile services could be key factors in the spread of the mobile Internet.
Tiesyte, D., C. S. Jensen, "Similarity-Based Prediction of Travel Times for Vehicles Traveling on Known Routes," in Proceedings of the Sixteenth ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Irvine, CA, USA, pp. 105-114, November 5-7, 2008.

Publication [not publicly available]

ACM Author-Izer
The use of centralized, real-time position tracking is proliferating in the areas of logistics and public transportation. Real-time positions can be used to provide up-to-date information to a variety of users, and they can also be accumulated for uses in subsequent data analyses. In particular, historical data in combination with real-time data may be used to predict the future travel times of vehicles more accurately, thus improving the experience of the users who rely on such information. We propose a Nearest-Neighbor Trajectory (NNT) technique that identifies the historical trajectory that is the most similar to the current, partial trajectory of a vehicle. The historical trajectory is then used for predicting the future movement of the vehicle. The paper's specific contributions are two-fold. First, we define distance measures and a notion of nearest neighbor that are specific to trajectories of vehicles that travel along known routes. In empirical studies with real data from buses, we evaluate how well the proposed distance functions are capable of predicting future vehicle movements. Second, we propose a main-memory index structure that enables incremental similarity search and that is capable of supporting varying-length nearest neighbor queries.
Combi, C., S. Degani, C. S. Jensen, "Capturing Temporal Constraints in Temporal ER Models," in Proceedings of the 27th International Conference on Conceptual Modeling, Barcelona, Spain, pp. 397-411, October 20-23, 2008.

Publication

Online at SpringerLink
A wide range of database applications manage information that varies over time. The conceptual modeling of databases is frequently based on one of the several versions of the ER model. As this model does not provide built-in means for capturing temporal aspects of data, the resulting diagrams are unnecessarily obscure and inadequate for documentation purposes. The TimeER model extends the ER model with suitable constructs for modeling time-varying information, easing the design process, and leading to easy-to-understand diagrams. In a temporal ER model, support for the specification of advanced temporal constraints would be desirable, allowing the designer to specify, e.g., that the value of an attribute must not change over time. This paper extends the TimeER model by introducing the notation, and the associated semantics, for the specification of new temporal constraints.
Ruxanda, M. M., C. S. Jensen, "Flexible Query Framework for Music Data and Playlist Manipulation," in Proceedings of the Third International Workshop on Flexible Database and Information System Technology, Turin, Italy, pp. 693-697, September 1, 2008.

Publication

Online at IEEE
Motivated by the explosion of digital music on the Web and the increasing popularity of music recommender systems, this paper presents a relational query framework for flexible music retrieval and effective playlist manipulation. A generic song representation model is introduced, which captures heterogeneous categories of musical information and serves as a foundation for query operators that offer a practical solution to playlist management. A formal definition of the proposed query operators is provided, together with real usage scenarios and a prototype implementation.
Diao, Y., C. S. Jensen, editors, in Proceedings of the Fifth International Workshop on Data Management for Sensor Networks, Auckland, New Zealand, 55+viii pages, August 24, 2008.

Online at ACM Digital Library

Jeung, H., M. L. Yiu, X. Zhou, C. S. Jensen, H. T. Shen, "Discovery of Convoys in Trajectory Databases," in Proceedings of the VLDB Endowment, Auckland, New Zealand, Vol. 1, No. 1, pp. 1068-1080, August 2008.

Publication [not publicly available]

Online at ACM Digital Library
As mobile devices with positioning capabilities continue to proliferate, data management for so-called trajectory databases that capture the historical movements of populations of moving objects becomes important. This paper considers the querying of such databases for convoys, a convoy being a group of objects that have traveled together for some time.
More specifically, this paper formalizes the concept of a convoy query using density-based notions, in order to capture groups of arbitrary extents and shapes. Convoy discovery is relevant for real-life applications in throughput planning of trucks and carpooling of vehicles. Although there has been extensive research on trajectories in the literature, none of it can be applied to retrieve exact convoy result sets correctly. Motivated by this, we develop three efficient algorithms for convoy discovery that adopt the well-known filter-refinement framework. In the filter step, we apply line-simplification techniques on the trajectories and establish distance bounds between the simplified trajectories. This permits efficient convoy discovery over the simplified trajectories without missing any actual convoys. In the refinement step, the candidate convoys are further processed to obtain the actual convoys. Our comprehensive empirical study offers insight into the properties of the paper's proposals and demonstrates that the proposals are effective and efficient on real-world trajectory data.
Chen, S., C. S. Jensen, D. Lin, "A Benchmark for Evaluating Moving Object Indexes," in Proceedings of the VLDB Endowment, Auckland, New Zealand, Vol. 1, No. 2, pp. 1574-1585, August 2008.

Publication [not publicly available]

Online at ACM Digital Library
Progress in science and engineering relies on the ability to measure, reliably and in detail, pertinent properties of artifacts under design. Progress in the area of database-index design thus relies on empirical studies based on prototype implementations of indexes. This paper proposes a benchmark that targets techniques for the indexing of the current and near-future positions of moving objects. This benchmark enables the comparison of existing and future indexing techniques. It covers important aspects of such indexes that have not previously been covered by any benchmark. Notable aspects covered include update efficiency, query efficiency, concurrency control, and storage requirements. Next, the paper applies the benchmark to half a dozen notable moving-object indexes, thus demonstrating the viability of the benchmark and offering new insight into the performance properties of the indexes.
Hansen, R., C. S. Jensen, B. Thomsen, R. Wind, "Seamless Indoor/Outdoor Positioning with Streamspin," in Proceedings of the Fifth Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, Dublin, Ireland, 2 pages, July 21-25, 2008.

Publication [not publicly available]
This paper presents the implementation of a novel seamless indoor/outdoor positioning service for mobile users. When users are not within GPS range, the service exploits the Wi-Fi access point infrastructure for positioning. A central server stores Wi-Fi radio maps and map images that are then sent to user terminals based on the MAC addresses of nearby access points. The positioning service is available in Streamspin (www.streamspin.com), which is an open and scalable platform for the creation and delivery of location-based services. With this new service, the system enables the easy creation and deployment of mobile services that rely on seamless indoor/outdoor positioning.
Jensen, C. S., D. Tiesyte, "TransDB: GPS Data Management with Applications in Collective Transport," in Proceedings of the First International Workshop on Computational Transportation Science, Dublin, Ireland, 6 pages, July 21, 2008.

Publication [not publicly available]
Recent and continuing advances in geo-positioning, mobile communications, and computing electronics combine to offer opportunities for advanced and affordable collective transport services. As the roads in many parts of the world are facing increasing congestion, it becomes increasingly important to establish collective transport solutions, such as bus services, that are competitive in comparison to the use of private cars. One important ingredient in the provisioning of such solutions is an information system that is always aware of the current location and expected future locations of each bus and that is capable of utilizing this information in real time as well as off-line, e.g., for offering the users accurate arrival information and for creating safe, realistic, and environmentally friendly bus schedules. This paper introduces an ongoing project that explores the advanced data management techniques needed to create an efficient, accurate, and yet inexpensive information system for collective transport monitoring. Focus is on bus travel time prediction and the communication between the vehicles and their surrounding infrastructure.
Böhlen, M. H., J. Gamper, C. S. Jensen, "Towards General Temporal Aggregation," in Proceedings of the Twentyfifth British National Conference on Databases, Lecture Notes in Computer Science 5071, Cardiff, Wales, UK, pp. 257-269, July 7-10, 2008.

Publication

Online at SpringerLink
Most database applications manage time-referenced, or temporal, data. Temporal data management is difficult when using conventional database technology, and many contributions have been made for how to better model, store, and query temporal data. Temporal aggregation illustrates well the problems associated with the management of temporal data. Indeed, temporal aggregation is complex and among the most difficult, and thus interesting, temporal functionality to support. This paper presents a general framework for temporal aggregation that accommodates existing kinds of aggregation, and it identifies open challenges within temporal aggregation.
Tzoumas, K., T. Sellis, C. S. Jensen, "A Reinforcement Learning Approach for Adaptive Query Processing," DB Technical Report TR-22, Department of Computer Science, Aalborg University, 27 pages, June 27, 2008.

Publication
In adaptive query processing, query plans are improved at runtime by means of feedback. In the very flexible approach based on so-called eddies, query execution is treated as a process of routing tuples to the query operators that combine to compute a query. This makes it possible to alter query plans at the granularity of tuples. Further, the complex task of searching the query plan space for a suitable plan now resides in the routing policies used. These policies must adapt to the changing execution environment and must converge at a near-optimal plan when the environment stabilizes.
This paper advances adaptive query processing in two respects. First, it proposes a general framework for the routing problem that may serve the same role for adaptive query processing as does the framework of search in query plan space for conventional query processing. It thus offers an improved foundation for research in adaptive query processing. The framework leverages reinforcement learning theory and formalizes a tuple routing policy as a mapping from a state space to an action space, capturing query semantics as well as routing constraints. In effect, the framework transforms query optimization from a search problem in query plan space to an unsupervised learning problem with quantitative rewards that is tightly coupled with the query execution. The framework covers selection queries as well as joins that use all proposed join execution mechanisms (SHJs, SteMs, STAIRs). Second, in addition to showing how existing routing policies can fit into the framework, the paper demonstrates new routing policies that build on advances in reinforcement learning. By means of empirical studies, it is shown that the proposed policies embody the desired adaptivity and convergence characteristics, and that they are capable of clearly outperforming existing policies.
Speicys, L., C. S. Jensen, "Enabling Location-Based Services - Multi-Graph Representation of Transportation Networks," GeoInformatica, Vol. 12, No. 2, pp. 219-253, June 2008.

Publication [not publicly available]

Online at SpringerLink
Advances in wireless communications, positioning technologies, and consumer electronics combine to enable a range of applications that use a mobile user's geo-spatial location to deliver on-line, location-enhanced services, often referred to as location-based services. This paper assumes that the service users are constrained to a transportation network, and it delves into the modeling of such networks, points of interest, and the service users with the objective of supporting location-based services. In particular, the paper presents a framework that encompasses two interrelated models: a two-dimensional, spatial representation and a multi-graph representation. The former, high-fidelity model may be used for the positioning of content and users in the infrastructure (e.g., using map matching). The latter type of model is recognized as an ideal basis for a variety of query processing tasks, e.g., route and distance computations. Together, the two models capture central aspects of the problem domain needed in order to support the different types of queries that underlie location-based services. Notably, the framework is capable of capturing roads with lanes, lane shift and u-turn regulations, and turn restrictions. As part of the framework, the paper constructively demonstrates how it is possible to map instances of the semantically rich two-dimensional model to instances of the graph model that preserve the topology of the two-dimensional model instances. In doing so, the paper demonstrates how a wealth of previously proposed query processing techniques based on graphs are applicable even in the context of complex transportation networks. The paper also presents means of compacting graphs while preserving aspects of the graphs that are important for the intended applications.
Ruxanda, M. M., A. Nanopoulos, C. S. Jensen, Y. Manolopoulos, "Ranking Music Data by Relevance and Importance," in Proceedings of the IEEE International Conference on Multimedia and Expo, Hannover, Germany, 4 pages, June 23-26, 2008.

Publication

Online at IEEE
Due to the rapidly increasing availability of audio files on the Web, it is relevant to augment search engines with advanced audio search functionality. In this context, the ranking of the retrieved music is an important issue. This paper proposes a music ranking method capable of flexibly fusing the music based on its relevance and importance. The fusion is controlled by a single parameter, which can be intuitively tuned by the user. The notion of authoritative music among relevant music is introduced, and social media mined from the Web is used in an innovative manner to determine both the relevance and importance of music. The proposed method may support users with diverse needs when searching for music.
Demri, S., C. S. Jensen, editors, in Proceedings of the Fifteenth International Symposium on Temporal Representation and Reasoning, Montreal, Canada, 174+x pages, June 16-18, 2008.

Online at IEEE
Lu, H., C. S. Jensen, M. L. Yiu, "PAD: Privacy-Area Aware, Dummy-Based Location Privacy in Mobile Services," in Proceedings of the Seventh ACM International Workshop on Data Engineering for Wireless and Mobile Access, Vancouver, Canada, pp. 16-23, June 13, 2008.

Publication [not publicly available]

ACM Author-Izer
Location privacy in mobile services has the potential to become a serious concern for service providers and users. Existing privacy protection techniques that use k-anonymity convert an original query into an anonymous query that contains the locations of multiple users. Such techniques, however, generally fail in offering guaranteed large privacy regions at reasonable query processing costs. In this paper, we propose the PAD approach that is capable of offering privacy-region guarantees. To achieve this, PAD uses so-called dummy locations that are deliberately generated according to either a virtual grid or circle. These cover a user's actual location, and their spatial extents are controlled by the generation algorithms. The PAD approach only requires a lightweight server-side front-end in order for it to be integrated into an existing client/server mobile service system. In addition, query results are organized according to a compact format on the server, which not only reduces communication cost, but also facilitates the result refinement on the client side. An empirical study shows that our proposal is effective in terms of offering location privacy, and efficient in terms of computation and communication costs.
Jensen, C. S., R. T. Snodgrass, editors, "Temporal Database Entries for the Springer Encyclopedia of Database Systems," TimeCenter Technical Report TR-90, 337+v pages, May 2008.

Publication
Tiesyte, D., C. S. Jensen, "Efficient Cost-Based Tracking of Scheduled Vehicle Journeys," in Proceedings of the Ninth International Conference on Mobile Data Management, Beijing, China, pp. 9-16, April 27-30, 2008.

Publication

Online at IEEE
Applications in areas such as logistics, cargo delivery, and collective transport involve the management of fleets of vehicles that are expected to travel along known routes according to schedules. There is a fundamental need by the infrastructure surrounding the vehicles to know the actual status of the vehicles. Since the vehicles deviate from their schedules due to road construction, accidents, and other unexpected conditions, it is necessary for the vehicles to communicate with the infrastructure. Frequent updates introduce high communication costs, and server-side updates easily become a bottleneck. This paper presents techniques that enable the tracking of vehicle positions and arrival times at scheduled stops with little communication, while still offering the desired accuracy to the infrastructure of the status of the vehicles. Experimental results with real GPS data from buses show that the proposed techniques are capable of reducing the number of updates significantly compared to a state-of-the-art approach where vehicles issue updates at pre-defined positions along their routes.
Yiu, M. L., C. S. Jensen, X. Huang, H. Lu, "SpaceTwist: Managing the Trade-Offs Among Location Privacy, Query Performance, and Query Accuracy in Mobile Services," in Proceedings of the Twentyfourth IEEE International Conference on Data Engineering, Cancun, Mexico, pp. 366-375, April 7-12, 2008.

Publication

Online at IEEE
In a mobile service scenario, users query a server for nearby points of interest but they may not want to disclose their locations to the service. Intuitively, location privacy may be obtained at the cost of query performance and query accuracy. The challenge addressed is how to obtain the best possible performance, subject to given requirements for location privacy and query accuracy. Existing privacy solutions that use spatial cloaking employ complex server query processing techniques and entail the transmission of large quantities of intermediate results. Solutions that use transformation-based matching generally fall short in offering practical query accuracy guarantees. Our proposed framework, called SpaceTwist, rectifies these shortcomings for k nearest neighbor (kNN) queries. Starting with a location different from the user's actual location, nearest neighbors are retrieved incrementally until the query is answered correctly by the mobile terminal. This approach is flexible, needs no trusted middleware, and requires only well-known incremental NN query processing on the server. The framework also includes a server-side granular search technique that exploits relaxed query accuracy guarantees for obtaining better performance. The paper reports on empirical studies that elicit key properties of SpaceTwist and suggest that the framework offers very good performance and high privacy, at low communication cost.
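The incremental retrieval idea in the SpaceTwist abstract above may be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the in-memory "server", and the stopping test are assumptions made for illustration. The stopping test used here follows from the triangle inequality: once the distance from the anchor to the last retrieved point is at least the distance between user and anchor plus the distance to the current k-th best candidate, no unseen point can be closer to the user.

```python
import heapq
import math

def incremental_nn(points, anchor):
    """Hypothetical server side: yield points in ascending distance from the anchor.
    The server only ever sees the anchor, never the user's true location."""
    for p in sorted(points, key=lambda p: math.dist(p, anchor)):
        yield p

def spacetwist_knn(points, user, anchor, k):
    """Hypothetical client side: return the k points nearest to `user` (k >= 1)."""
    best = []                          # max-heap on distance to user (negated keys)
    offset = math.dist(user, anchor)   # known only to the client
    for p in incremental_nn(points, anchor):
        tau = math.dist(p, anchor)     # radius around the anchor retrieved so far
        d = math.dist(p, user)
        if len(best) < k:
            heapq.heappush(best, (-d, p))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, p))
        # Assumed stopping rule: any unseen point x has dist(x, anchor) >= tau,
        # hence dist(x, user) >= tau - offset >= current k-th best distance.
        if len(best) == k and -best[0][0] + offset <= tau:
            break
    return [p for _, p in sorted(best, key=lambda t: -t[0])]

pts = [(0, 0), (1, 1), (5, 5), (2, 0), (9, 9)]
print(spacetwist_knn(pts, user=(0.5, 0), anchor=(4, 4), k=2))  # [(0, 0), (1, 1)]
```

In this sketch, an anchor farther from the user reveals less about the true location but forces more points to be retrieved before the stopping test holds, which is one way to see the privacy versus performance trade-off the abstract mentions.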
Skyt, J., C. S. Jensen, T. B. Pedersen, "Specification-Based Data Reduction in Dimensional Data Warehouses," Information Systems, Vol. 33, No. 1, pp. 36-63, March 2008.

Publication [not publicly available]

Online by Elsevier at ScienceDirect
Many data warehouses contain massive amounts of data, accumulated over long periods of time. In some cases, it is necessary or desirable to either delete "old" data or to maintain the data at an aggregate level. This may be due to privacy concerns, in which case the data are aggregated to levels that ensure anonymity. Another reason is the desire to maintain a balance between the uses of data that change as the data age and the size of the data, thus avoiding overly large data warehouses. This paper presents effective techniques for data reduction that enable the gradual aggregation of detailed data as the data age. With these techniques, data may be aggregated to higher levels as they age, enabling the maintenance of more compact, consolidated data and the compliance with privacy requirements. Special care is taken to avoid semantic problems in the aggregation process. The paper also describes the querying of the resulting data warehouses and an implementation strategy based on current database technology.
Jensen, C. S., D. Lin, B. C. Ooi, "Indexing of Moving Objects, B+-Tree," in Encyclopedia of GIS, edited by S. Shekhar and H. Xiong, Springer Verlag, pp. 512-518, 2008.

Jensen, C. S., D. Lin, B. C. Ooi, "Maximum Update Interval in Moving Objects Databases," in Encyclopedia of GIS, edited by S. Shekhar and H. Xiong, Springer Verlag, p. 651, 2008.

Jensen, C. S., L. Speicys, "Road Network Data Models," in Encyclopedia of GIS, edited by S. Shekhar and H. Xiong, Springer Verlag, pp. 972-976, 2008.

Tryfona, N., C. S. Jensen, "Spatio-temporal Database Modeling with an Extended Entity-Relationship Model," in Encyclopedia of GIS, edited by S. Shekhar and H. Xiong, Springer Verlag, pp. 1115-1121, 2008.

2007

Jensen, C. S., D. Lin, B. C. Ooi, "Continuous Clustering of Moving Objects," IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 9, pp. 1161-1174, September 2007.

Publication

Online at IEEE
This paper considers the problem of efficiently maintaining a clustering of a dynamic set of data points that move continuously in two-dimensional Euclidean space. This problem has received little attention and introduces new challenges to clustering. The paper proposes a new scheme that is capable of incrementally clustering moving objects. This proposal employs a notion of object dissimilarity that considers object movement across a period of time, and it employs clustering features that can be maintained efficiently in incremental fashion. In the proposed scheme, a quality measure for incremental clusters is used for identifying clusters that are not compact enough after certain insertions and deletions. An extensive experimental study shows that the new scheme performs significantly faster than traditional ones that frequently rebuild clusters. The study also shows that the new scheme is effective in preserving the quality of moving-object clusters.
Urgun, B., C. E. Dyreson, N. Kline, J. K. Miller, R. T. Snodgrass, M. D. Soo, C. S. Jensen, "Integrating Multiple Calendars using tZaman," Software: Practice and Experience, Vol. 37, No. 3, pp. 267-308, March 2007.

Publication [not publicly available]

Online at Wiley InterScience
Programmers world-wide are interested in developing applications that can be used internationally. Part of the internationalization effort is the ability to engineer applications to use dates and times that conform to local calendars yet can inter-operate with dates and times in other calendars, for instance between the Gregorian and Islamic calendars. tZAMAN is a system that provides a natural-language- and calendar-independent framework for integrating multiple calendars. tZAMAN performs "runtime-binding" of calendars and language support. A running tZAMAN system dynamically loads calendars and language support tables from XML-formatted files. Loading a calendar integrates it with other, already loaded calendars, enabling users of tZAMAN to add, compare, and convert times between multiple calendars. tZAMAN also provides a flexible, calendar-independent framework for parsing temporal literals. Literals can be input and output in XML or plain text, using user-defined formats, and in different languages and character sets. Finally, tZAMAN is a client/server system, enabling shared access to calendar servers spread throughout the web. This paper describes the architecture of tZAMAN and experimentally quantifies the cost of using a calendar server to translate and manipulate dates.
Brilingaite, A., C. S. Jensen, "Enabling Routes of Road Network Constrained Movements as Mobile Service Context," GeoInformatica, Vol. 11, No. 1, pp. 55-102, March 2007.

Publication [not publicly available]

Online at SpringerLink
With the continuing advances in wireless communications, geo-positioning, and portable electronics, an infrastructure is emerging that enables the delivery of on-line, location-enabled services to very large numbers of mobile users. A typical usage situation for mobile services is one characterized by a small screen and no keyboard, and by the service being only a secondary focus of the user. Under such circumstances, it is particularly important to deliver the "right" information and service at the right time, with as little user interaction as possible. This may be achieved by making services context aware. Mobile users frequently follow the same route to a destination as they did during previous trips to the destination, and the route and destination constitute important aspects of the context for a range of services. This paper presents key concepts underlying a software component that identifies and accumulates the routes of a user along with their usage patterns and that makes the routes available to services. The problems associated with route recording are analyzed, and algorithms that solve the problems are presented. Experiences from using the component on logs of GPS positions acquired from vehicles traveling within a real road network are reported.
Biveinis, L., S. Šaltenis, C. S. Jensen, "Main-Memory Operation Buffering for Efficient R-Tree Update," in Proceedings of the Thirtythird International Conference on Very Large Data Bases, Vienna, Austria, pp. 591-602, September 23-28, 2007.

Publication

Online at VLDB
Emerging communication and sensor technologies enable new applications of database technology that require database systems to efficiently support very high rates of spatial-index updates. Previous works in this area require the availability of large amounts of main memory, do not exploit all the main memory that is indeed available, or do not support some of the standard index operations.
Assuming a setting where the index updates need not be written to disk immediately, we propose an R-tree-based indexing technique that does not exhibit any of these drawbacks. This technique exploits the buffering of update operations in main memory as well as the grouping of operations to reduce disk I/O. In particular, operations are performed in bulk so that multiple operations are able to share I/O. The paper presents an analytical cost model that is shown to be accurate by empirical studies. The studies also show that, in terms of update I/O performance, the proposed technique improves on the state of the art in settings with frequent updates.
Jensen, C. S., S. Pakalnis, "TRAX - Real-World Tracking of Moving Objects," in Proceedings of the Thirty-third International Conference on Very Large Data Bases, Vienna, Austria, pp. 1362-1365, September 23-28, 2007.

Publication

Online at VLDB
A range of mobile services rely on knowing the current positions of populations of so-called moving objects. In the ideal setting, the positions of all objects are known always and exactly. While this is not possible in practice, it is possible to know each object's position with a certain guaranteed accuracy.
This paper presents the TRAX tracking system that supports several techniques capable of tracking the current positions of moving objects with guaranteed accuracies at low update and communication costs in real-world settings. The techniques are readily relevant for practical applications, but they also have implications for continued research. The tracking techniques offer a realistic setting for existing query processing techniques that assume that it is possible to always know the exact positions of moving objects. The techniques enable studies of trade-offs between querying and update, and the accuracy guarantees they offer may be exploited by query processing techniques to offer perfect recall.
Tradišauskas, N., J. Juhl, H. Lahrmann, C. S. Jensen, "Map Matching Algorithm for the "Spar På Farten" Intelligent Speed Adaptation Project," 2007 Annual Transport Conference at Aalborg University, Aalborg, Denmark, 10 pages, August 27-28, 2007.

Publication [not publicly available]
The availability of Global Navigation Satellite Systems (GNSS) enables sophisticated vehicle guidance and advisory systems such as Intelligent Speed Adaptation (ISA) systems. In ISA systems, it is essential to be able to position vehicles within a road network. Because digital road networks as well as GNSS positioning are often inaccurate, a technique known as map matching is needed that aims to use this inaccurate data for determining a vehicle's real road-network position. Then, knowing this position, an ISA system can compare the vehicle's speed with the speed limit in effect and take measures against speeding.
This paper presents an on-line map matching algorithm with an extensive number of weighting parameters that allow better determination of a vehicle's road network position. The algorithm uses a certainty value to express its belief in the correctness of its results. The algorithm was designed and implemented to be used in the large-scale ISA project "Spar på farten". Using test data and data collected from project participants, the algorithm's performance is evaluated. It is shown that the algorithm performs correctly 95% of the time and is capable of handling GNSS positioning errors in a conservative manner.
Huang, X., C. S. Jensen, H. Lu, S. Šaltenis, "S-GRID: A Versatile Approach to Efficient Query Processing in Spatial Networks," in Proceedings of the Tenth International Symposium on Spatial and Temporal Databases, Boston, MA, USA, pp. 93-111, July 16-18, 2007.

Publication

Online at SpringerLink
Mobile services are emerging as an important application area for spatio-temporal database management technologies. Service users are often constrained to a spatial network, e.g., a road network, through which points of interest, termed data points, are accessible. Queries that implement services will often concern data points of some specific type, e.g., Thai restaurants or art museums. As a result, relatively few data points are relevant to a query in comparison to the number of network edges, meaning that queries, e.g., k nearest-neighbor queries, must access large portions of the network.
Existing query processing techniques pre-compute distances between data points and network vertices for improving the performance. However, pre-computation becomes problematic when the network or data points must be updated, possibly concurrently with the querying; and if the data points are moving, the existing techniques are inapplicable. In addition, multiple pre-computed structures must be maintained - one for each type of data point. We propose a versatile pre-computation approach for spatial network data. This approach uses a grid for pre-computing a simplified network. The above-mentioned shortcomings are avoided by making the pre-computed data independent of the data points. Empirical performance studies show that the structure is competitive with respect to the existing, more specialized techniques.
Lu, H., Z. Huang, C. S. Jensen, L. Xu, "Distributed, Concurrent Range Monitoring of Spatial-Network Constrained Mobile Objects," in Proceedings of the Tenth International Symposium on Spatial and Temporal Databases, Boston, MA, USA, pp. 366-384, July 16-18, 2007.

Publication

Online at SpringerLink
The ability to continuously monitor the positions of mobile objects is important in many applications. While most past work has been set in Euclidean spaces, the mobile objects relevant in many applications are constrained to spatial networks. This paper addresses the problem of range monitoring of mobile objects in this setting, where the relevant distance is network distance. An architecture is proposed where the mobile clients and a central server share computation, the objective being to obtain scalability by utilizing the capabilities of the clients. The clients issue location reports to the server, which is in charge of data storing and query processing. The server associates each range monitoring query with the network-edge portions it covers. This enables incremental maintenance of each query, and it also enables shared maintenance of concurrent queries by identifying the overlaps among such queries. The mobile clients contribute to the query processing by encapsulating their host edge portion identifiers in their reports to the server. Extensive empirical studies indicate that the paper's proposal is efficient and scalable, in terms of both query load and moving-object load.
Huang, Z., C. S. Jensen, H. Lu, B. C. Ooi, "Collaborative Spatial Data Sharing Among Mobile Lightweight Devices," in Proceedings of the Tenth International Symposium on Spatial and Temporal Databases, Boston, MA, USA, pp. 403-422, July 16-18, 2007.

Publication

Online at SpringerLink
Mobile devices are increasingly being equipped with wireless peer-to-peer (P2P) networking interfaces, rendering the sharing of data among mobile devices feasible and beneficial. In comparison to the traditional client/server wireless channel, the P2P channels have considerably higher bandwidth. Motivated by these observations, we propose a collaborative spatial data sharing scheme that exploits the P2P capabilities of mobile devices. Using carefully maintained routing tables, this scheme enables mobile devices not only to use their local storage for query processing, but also to collaborate with nearby mobile peers to exploit their data. This scheme is capable of reducing the cost of the communication between mobile clients and the server as well as the query response time. The paper details the design of the data sharing scheme, including its routing table maintenance, query processing and update handling. An analytical cost model sensitive to user mobility is proposed to guide the storage content replacement and routing table maintenance. The results of extensive simulation studies based on an implementation of the scheme demonstrate that the scheme is efficient in processing location-dependent queries and is robust to data updates.
Tradišauskas, N., J. Juhl, H. Lahrmann, C. S. Jensen, "Map Matching for Intelligent Speed Adaptation," in Proceedings of the Sixth European Congress and Exhibition on Intelligent Transport Systems and Services, Aalborg, Denmark, 12 pages, June 18-20, 2007.

Publication [not publicly available]
The availability of Global Navigation Satellite Systems enables sophisticated vehicle guidance and advisory systems such as Intelligent Speed Adaptation (ISA) systems. In ISA systems, it is essential to be able to position vehicles within a road network. Because digital road networks as well as GPS positioning are often inaccurate, a technique known as map matching is needed that aims to use this inaccurate data for determining a vehicle's real road-network position. Then, knowing this position, an ISA system can compare the vehicle's speed with the speed limit in effect and react appropriately.
This paper presents an on-line map matching algorithm with an extensive number of weighting parameters that allow better determination of a vehicle's road network position. The algorithm uses a certainty value to express its belief in the correctness of its results. The algorithm was designed and implemented for use in the large-scale ISA project "Spar på farten." Using test data and data collected from project participants, the algorithm's performance is evaluated. It is shown that the algorithm performs correctly 95% of the time and is capable of handling GPS/DR errors in a conservative manner.
Tiesyte, D., C. S. Jensen, "Recovery of Vehicle Trajectories from Tracking Data for Analysis Purposes," in Proceedings of the Sixth European Congress and Exhibition on Intelligent Transport Systems and Services, Aalborg, Denmark, 12 pages, June 18-20, 2007.

Publication [not publicly available]
A number of transportation-related applications involve the accumulation of position data from vehicles. Examples include real-time tracking applications that relate to taxis, police and emergency vehicles, and collective transportation. Further, position data from historical vehicle trajectories may be utilized for improving many of these services. For example, better travel time prediction and scheduling algorithms can be developed. However, the position data being obtained from vehicles are often relatively infrequent and do not capture the vehicle trajectories accurately and systematically. This paper proposes techniques that systematically recover close-to-actual vehicle trajectories from the positions obtained during the tracking of the vehicles. The focus is on scenarios where the vehicles traverse known routes.
Wind, R., C. S. Jensen, K. Torp, "An Open Platform for the Creation and Deployment of Transport-Related Mobile Data Services," in Proceedings of the Sixth European Congress and Exhibition on Intelligent Transport Systems and Services, Aalborg, Denmark, 8 pages, June 18-20, 2007.

Publication [not publicly available]

Online at ITS Sweden
Advanced mobile computing devices with wireless communication and geo-positioning capabilities are finding increasingly widespread use in Europe and beyond. Example devices include smart phones, PDA phones, and navigation systems. It is thus becoming increasingly relevant and attractive to utilize these devices and the related communication infrastructure for the deployment of transport-related mobile services. This paper describes the architecture of a service platform that enables users to create their own mobile services. The platform is based on standard hardware and software technologies and offers integration of transportation-related services with other services.
Wind, R., C. S. Jensen, K. H. Pedersen, K. Torp, "A Testbed for the Exploration of Novel Concepts in Mobile Service Delivery," in Proceedings of the Eighth International Conference on Mobile Data Management, Mannheim, Germany, pp. 218-220, May 7-11, 2007.

Publication

Online at IEEE
This paper describes an open, extendable, and scalable system that supports the delivery of context-dependent content to mobile users. The system enables users to receive content from multiple content providers that matches their demographic data, active profiles, and context such as location and time. The system also allows users to subscribe to specific services. In addition, it allows users to provide their own content and services, by either using the system's publicly available interface or by filling out one of the service-configuration templates.
Tiesyte, D., C. S. Jensen, "Challenges in the Tracking and Prediction of Scheduled-Vehicle Journeys," in Proceedings of the First International Workshop on Pervasive Transportation Systems, White Plains, NY, USA, 6 pages, March 19, 2007.

Publication

Online at IEEE
A number of applications in areas such as logistics, cargo delivery, and collective transport involve the management of fleets of vehicles that are expected to travel along known routes according to fixed schedules. Due to road construction, accidents, and other unanticipated conditions, the vehicles deviate from their schedules. At the same time, there is a need for the infrastructure surrounding the vehicles to continually know the actual status of the vehicles. For example, anticipated arrival times of buses may have to be displayed at bus stops. It is a fundamental challenge to maintain this type of knowledge with minimal cost.
This paper characterizes the problem of real-time vehicle tracking using wireless communication, and of predicting the future status of the vehicles when their movements are restricted to given routes and when they follow schedules with a best effort. The paper discusses challenges related to tracking, to the prediction of future travel times, and to historical data analysis. It also suggests approaches to addressing the challenges.
Jensen, C. S., "When the Internet Hits the Road," in Proceedings of the Twelfth GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web, Aachen, Germany, pp. 2-16, March 7-9, 2007.

Publication [not publicly available]

Online at BTW
The Internet has recovered from the dot-com crash of the early 2000s and now features an abundance of new, innovative technologies and services. We are also witnessing the emergence of a communication and computing infrastructure that encompasses millions of people with mobile devices, such as mobile phones, with Internet connectivity. This infrastructure will soon enable the Internet to go mobile. This paper describes the background and aspirations of a new research project that is concerned with data management aspects of innovative mobile Internet services. It is argued that mobile services will be context aware, and the project devotes particular attention to geographical context awareness.
The project will adopt a prototyping approach where services are built and exposed to users, and where data management challenges are identified and addressed. The paper describes the evolving service platform that supports the approach chosen, it describes some of the data management techniques being integrated into the service platform, and it describes research guidelines that the project aims to follow.
Jensen, C. S., "Sensor Networks - the Case of Intelligent Transport Systems," in Proceedings of the NSF Workshop on Data Management for Mobile Sensor Networks, Pittsburgh, PA, USA, 2 pages, January 16-17, 2007.

Publication [not publicly available]

Online at the workshop site
Civilis, A., C. S. Jensen, S. Pakalnis, "Tracking of Moving Objects With Accuracy Guarantees," Chapter 13, pp. 285-309 in Spatial Data on the Web - Modeling and Management, edited by A. Belussi, B. Catania, E. Clementini, and E. Ferrari, Springer Verlag, 2007.

Publication [not publicly available]

Online at SpringerLink
Wind, R., C. S. Jensen, K. Torp, "Windows Mobile Programming," Chapter 8, pp. 207-235 in Mobile Phone Programming and its Application to Wireless Networks, edited by F. H. P. Fitzek and F. Reichert, Springer Verlag, 2007.

Publication [not publicly available]

Online at SpringerLink
Becker, C., C. S. Jensen, D. Nicklas, J. Su, editors, in Proceedings of the Eighth International Conference on Mobile Data Management, Mannheim, Germany, 232+viii pages, May 7-11, 2007.

Online at IEEE
Haas, L. M., C. S. Jensen, M. L. Kersten, editors, "Special issue: best papers of VLDB 2005," The VLDB Journal, Vol. 16, No. 1, 164 pages, January 2007.

Online at SpringerLink
Haas, L. M., C. S. Jensen, M. L. Kersten, "Special issue: best papers of VLDB 2005," The VLDB Journal, Vol. 16, No. 1, pp. 1-3, January 2007.

Publication

Online at SpringerLink
Huang, X., C. S. Jensen, "A Streams-Based Framework for Defining Location-Based Queries," DB Technical Report, TR-19, 19 pages, April 2007.

Publication
An infrastructure is emerging that supports the delivery of on-line, location-enabled services to mobile users. Such services involve novel database queries, and the database research community is quite active in proposing techniques for the efficient processing of such queries. In parallel to this, the management of data streams has become an active area of research.
While most research in mobile services concerns performance issues, this paper aims to establish a formal framework for defining the semantics of queries encountered in mobile services, most notably the so-called continuous queries that are particularly relevant in this context. Rather than inventing an entirely new framework, the paper proposes a framework that builds on concepts from data streams and temporal databases. Definitions of example queries demonstrate how the framework enables clear formulation of query semantics and the comparison of queries. The paper also proposes a categorization of location-based queries.
2006
Pelanis, M., S. Šaltenis, C. S. Jensen, "Indexing the Past, Present and Anticipated Future Positions of Moving Objects," ACM Transactions on Database Systems, Vol. 31, No. 1, pp. 255-298, March 2006.

Publication [not publicly available]

ACM Author-Izer
With the proliferation of wireless communications and geo-positioning, e-services are envisioned that exploit the positions of a set of continuously moving users to provide context-aware functionality to each individual user. Because advances in disk capacities continue to outperform Moore's Law, it becomes increasingly feasible to store on-line all the position information obtained from the moving e-service users. With the much slower advances in I/O speeds and many concurrent users, indexing techniques are essential in this scenario.
Existing indexing techniques come in two forms. Some techniques capture the position of an object up until the time of the most recent position sample, while other techniques represent an object's position as a constant or linear function of time and capture the position from the current time and into the (near) future. This paper offers an indexing technique capable of capturing the positions of moving objects at all points in time. The index substantially extends partial persistence techniques, which support transaction time, to support valid time for monitoring applications. The performance of a timeslice query is independent of the number of past position samples stored for an object. No existing indices have these characteristics.
Benetis, R., C. S. Jensen, G. Karčiauskas, S. Šaltenis, "Nearest and Reverse Nearest Neighbor Queries for Moving Objects," The VLDB Journal, Vol. 15, No. 3, pp. 229-249, September 2006.

Publication

Online at SpringerLink
With the continued proliferation of wireless communications and advances in positioning technologies, algorithms for efficiently answering queries about large populations of moving objects are gaining interest. This paper proposes algorithms for k nearest and reverse k nearest neighbor queries on the current and anticipated future positions of points moving continuously in the plane. The former type of query returns k objects nearest to a query object for each time point during a time interval, while the latter returns the objects that have a specified query object as one of their k closest neighbors, again for each time point during a time interval. In addition, algorithms for so-called persistent and continuous variants of these queries are provided. The algorithms are based on the indexing of object positions represented as linear functions of time. The results of empirical performance experiments are reported.
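The two query types in the abstract above can be illustrated with a brute-force sketch. It is not the paper's index-based algorithm; it only shows the semantics at a single time point. The names `position`, `knn`, and `reverse_knn`, and the tuple layout of an object (start point, velocity, reference time), are assumptions made for illustration.

```python
import math


def position(obj, t):
    """Evaluate an object's position at time t from its linear motion function."""
    (x0, y0), (vx, vy), t_ref = obj  # assumed layout: start point, velocity, reference time
    return (x0 + vx * (t - t_ref), y0 + vy * (t - t_ref))


def knn(objects, query_id, k, t):
    """Return the ids of the k objects nearest to the query object at time t."""
    q = position(objects[query_id], t)
    ranked = sorted(
        (math.dist(q, position(obj, t)), oid)
        for oid, obj in objects.items()
        if oid != query_id
    )
    return [oid for _, oid in ranked[:k]]


def reverse_knn(objects, query_id, k, t):
    """Return the ids of objects that have the query object among their k nearest at time t."""
    return sorted(
        oid
        for oid in objects
        if oid != query_id and query_id in knn(objects, oid, k, t)
    )
```

Evaluating these functions at successive time points gives the per-time-point answers the abstract describes; the paper's algorithms avoid this exhaustive scan by using an index.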
Böhlen, M. H., J. Gamper, C. S. Jensen, "An Algebraic Framework for Temporal Attribute Characteristics," Annals of Mathematics and Artificial Intelligence, Vol. 46, No. 3, pp. 349-374, March 2006.

Publication

Online at SpringerLink
Most real-world database applications manage temporal data, i.e., data with associated time references that capture a temporal aspect of the data, typically either when the data is valid or when the data is known. Such applications abound in, e.g., the financial, medical, and scientific domains. In contrast to this, current database management systems offer precious little built-in query language support for temporal data management. This situation persists although an active temporal database research community has demonstrated that application development can be simplified substantially by built-in temporal support.
This paper's contribution is motivated by the observation that existing temporal data models and query languages generally make the same rigid assumption about the semantics of the association of data and time, namely that if a subset of the time domain is associated with some data then this implies the association of any further subset with the data. This paper offers a comprehensive, general framework where alternative semantics may co-exist and that supports so-called malleable and atomic temporal associations, in addition to the conventional ones mentioned above, which are termed constant. To demonstrate the utility of the framework, the paper defines a characteristics-enabled temporal algebra, termed CETA, which defines the traditional relational operators in the new framework. This contribution demonstrates that it is possible to provide built-in temporal support while making less rigid assumptions about the data, without jeopardizing the degree of the support. This may move temporal support closer to practical applications.
Jensen, C. S., K. Torp, "GPS baseret tracking af mobile objekter," Geoforum Perspektiv, Vol. 9, pp. 21-26, February 2006.

Publication

Online at Geoforum Danmark
This article describes how existing technology, including the Global Positioning System and General Packet Radio Service, can be used to efficiently track mobile objects, such as vehicles, with a guaranteed accuracy. First, the technological platform is described. Next, three different techniques for tracking mobile objects are described. The techniques become progressively more advanced. The three techniques are evaluated, and the cost of tracking a mobile object with an accuracy of approximately 150 meters is estimated at less than 1 DKK per day, based on prices from a trial conducted in 2004.
Ruxanda, M. M., C. S. Jensen, "Efficient Similarity Retrieval In Music Databases," in Proceedings of The Thirteenth International Conference on Management of Data, Delhi, India, pp. 56-67, December 14-16, 2006.

Publication
Audio music is increasingly becoming available in digital form, and the digital music collections of individuals continue to grow. Addressing the need for effective means of retrieving music from such collections, this paper proposes new techniques for content-based similarity search. Each music object is modeled as a time sequence of high-dimensional feature vectors, and dynamic time warping (DTW) is used as the similarity measure. To accomplish this, the paper extends techniques for time-series-length reduction and lower bounding of DTW distance to the multi-dimensional case. Further, the Vector Approximation file is adapted to the indexing of time sequences and to use a lower bound on the DTW distance. Using these techniques, the paper exploits the lack of a ground truth for queries to efficiently compute query results that differ only slightly from results that may be more accurate but also are much more expensive to compute. In particular, the paper demonstrates that aggressive use of time-series-length reduction together with query expansion results in significant performance improvements while providing good, approximate query results.
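For readers unfamiliar with the similarity measure named above, a minimal sketch of the textbook DTW recurrence over sequences of multi-dimensional vectors follows. It does not include the paper's length reduction, lower bounding, or indexing; the function name `dtw_distance` and the choice of Euclidean distance between feature vectors are assumptions.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of equal-dimension vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the DTW distance between the first i vectors of a and the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # assumed local distance: Euclidean
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched to the same b vector as a[i-2]
                cost[i][j - 1],      # b[j-1] is matched to the same a vector as b[j-2]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

The quadratic cost of this computation per pair of sequences is what makes the lower bounds and length reduction discussed in the abstract attractive.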
Jensen, C. S., D. Lin, B. C. Ooi, R. Zhang, "Effective Density Queries of Continuously Moving Objects," in Proceedings of the Twenty-second International Conference on Data Engineering, Atlanta, GA, USA, 11 pages, April 3-7, 2006.

Publication

Online at IEEE
This paper assumes a setting where a population of objects moves continuously in the Euclidean plane. The position of each object, modeled as a linear function from time to points, is assumed known. In this setting, the paper studies the querying for dense regions. In particular, the paper defines a particular type of density query with desirable properties and then proceeds to propose an algorithm for the efficient computation of density queries. While the algorithm may exploit any existing index for the current and near-future positions of moving objects, the Bx-tree is used. The paper reports on an extensive empirical study, which elicits the performance properties of the algorithm.
Huang, Z., C. S. Jensen, H. Lu, B. C. Ooi, "Skyline Queries Against Mobile Lightweight Devices in MANETs," in Proceedings of the Twenty-second International Conference on Data Engineering, Atlanta, GA, USA, 11 pages, April 3-7, 2006.

Publication

Online at IEEE
Skyline queries are well suited when retrieving data according to multiple criteria. While most previous work has assumed a centralized setting, this paper considers skyline querying in a mobile and distributed setting, where each mobile device is capable of holding only a portion of the whole dataset; where devices communicate through mobile ad hoc networks; and where a query issued by a mobile user is interested only in the user's local area, although a query generally involves data stored on many mobile devices due to the storage limitations. We present techniques that aim to reduce the costs of communication among mobile devices and reduce the execution time on each single mobile device. For the former, skyline query requests are forwarded among mobile devices in a deliberate way, such that the amount of data to be transferred is reduced. For the latter, specific optimization measures are proposed for resource-constrained mobile devices. We conduct extensive experiments to show that our proposal performs efficiently on real mobile devices and in simulated wireless ad hoc networks.
Schmidt, A., C. S. Jensen, S. Šaltenis, "Expiration Times for Data Management," in Proceedings of the Twenty-second International Conference on Data Engineering, Atlanta, GA, USA, 11 pages, April 3-7, 2006.

Publication

Online at IEEE
This paper describes an approach to incorporating the notion of expiration time into data management based on the relational model. Expiration times indicate when tuples cease to be current in a database. The paper presents a formal data model and a query algebra that handle expiration times transparently and declaratively. In particular, expiration times are exposed to users only on insertion and update, and when triggers fire due to the expiration of a tuple; for queries, they are handled behind the scenes and do not concern the user. Notably, tuples are removed automatically from (materialised) query results as they expire in the (base) relations.
For application developers, the benefits of using expiration times are leaner application code, lower transaction volume, smaller databases, and higher consistency for replicated data with lower overhead. Expiration times turn out to be especially useful in open architectures and loosely-coupled systems, which abound on the World Wide Web as well as in mobile networks, be it as Web Services or as ad hoc and intermittent networks of mobile devices.
Böhlen, M. H., J. Gamper, C. S. Jensen, "Multi-Dimensional Aggregation for Temporal Data," in Proceedings of the Tenth International Conference on Extending Database Technology, Lecture Notes in Computer Science 3896, Munich, Germany, pp. 257-275, March 26-30, 2006.

Publication

Online at SpringerLink
Business Intelligence solutions, encompassing technologies such as multi-dimensional data modeling and aggregate query processing, are being applied increasingly to non-traditional data. This paper extends multi-dimensional aggregation to apply to data with associated interval values that capture when the data hold. In temporal databases, intervals typically capture the states of reality that the data apply to, or capture when the data are, or were, part of the current database state. This paper proposes a new aggregation operator that addresses several challenges posed by interval data. First, the intervals to be associated with the result tuples may not be known in advance, but depend on the actual data. Such unknown intervals are accommodated by allowing result groups that are specified only partially. Second, the operator contends with the case where an interval associated with data expresses that the data hold for each point in the interval, as well as the case where the data hold only for the entire interval, but must be adjusted to apply to sub-intervals. The paper reports on an implementation of the new operator and on an empirical study that indicates that the operator scales to large data sets and is competitive with respect to other temporal aggregation algorithms.
Jensen, C. S., D. Tiesyte, N. Tradišauskas, "The COST Benchmark - Comparison and Evaluation of Spatio-Temporal Indexes," in Proceedings of the Eleventh International Conference on Database Systems for Advanced Applications, Lecture Notes in Computer Science 3882, Singapore, pp. 125-140, April 12-15, 2006.

Publication

Online at SpringerLink
An infrastructure is emerging that enables the positioning of populations of on-line, mobile service users. In step with this, research in the management of moving objects has attracted substantial attention. In particular, quite a few proposals now exist for the indexing of moving objects, and more are underway. As a result, there is an increasing need for an independent benchmark for spatio-temporal indexes.
This paper characterizes the spatio-temporal indexing problem and proposes a benchmark for the performance evaluation and comparison of spatio-temporal indexes. Notably, the benchmark takes into account that the available positions of the moving objects are inaccurate, an aspect largely ignored in previous indexing research. The concepts of data and query enlargement are introduced for addressing inaccuracy. As a proof of concept of the benchmark, the paper covers the application of the benchmark to three spatio-temporal indexes - the TPR-, TPR*-, and Bx-trees. Representative experimental results and consequent guidelines for the usage of these indexes are reported.
Schmidt, A., C. S. Jensen, "Efficient Maintenance of Ephemeral Data," in Proceedings of the Eleventh International Conference on Database Systems for Advanced Applications, Lecture Notes in Computer Science 3882, Singapore, pp. 141-155, April 12-15, 2006.

Publication

Online at SpringerLink
Motivated by the increasing prominence of loosely-coupled systems, such as mobile and sensor networks, the characteristics of which include intermittent connectivity and volatile data, we study the tagging of data with so-called expiration times. More specifically, when data are inserted into a database, they may be stamped with time values indicating when they expire, i.e., when they are regarded as stale or invalid and thus are no longer considered part of the database. In a number of applications, expiration times are known and can be assigned at insertion time. We present data structures and algorithms for online management of data stamped with expiration times. The algorithms are based on fully functional treaps, which are a combination of binary search trees with respect to a primary attribute and heaps with respect to a secondary attribute. The primary attribute implements primary keys, and the secondary attribute stores expiration times in a minimum heap, thus keeping a priority queue of tuples to expire. A detailed and comprehensive experimental study demonstrates the well-behavedness and scalability of the approach as well as its efficiency with respect to a number of competitors.
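The combination described above, a binary search tree on the primary key and a minimum heap on the expiration time, can be illustrated with a minimal sketch. This is not the paper's implementation: the names `Node`, `insert`, `split`, `merge`, `expire`, and `lookup` are hypothetical, keys are assumed to be unique, and nodes are immutable so that every operation returns a new tree and leaves the old one intact.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: int                  # primary attribute, ordered as a binary search tree
    expires: float            # secondary attribute, ordered as a minimum heap
    left: Optional["Node"]
    right: Optional["Node"]


def split(t, key):
    """Split t into (keys < key, keys >= key) without modifying t."""
    if t is None:
        return None, None
    if t.key < key:
        lo, hi = split(t.right, key)
        return t._replace(right=lo), hi
    lo, hi = split(t.left, key)
    return lo, t._replace(left=hi)


def merge(a, b):
    """Merge two treaps where every key in a is smaller than every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.expires <= b.expires:
        return a._replace(right=merge(a.right, b))
    return b._replace(left=merge(a, b.left))


def insert(t, key, expires):
    """Return a new treap that also holds (key, expires); assumes key is not present."""
    if t is None:
        return Node(key, expires, None, None)
    if expires < t.expires:
        # The new tuple expires first, so it becomes the root of this subtree.
        lo, hi = split(t, key)
        return Node(key, expires, lo, hi)
    if key < t.key:
        return t._replace(left=insert(t.left, key, expires))
    return t._replace(right=insert(t.right, key, expires))


def expire(t, now):
    """Remove all tuples with expires <= now; return (new treap, expired keys)."""
    expired = []
    while t is not None and t.expires <= now:
        expired.append(t.key)          # the root always holds the earliest expiration
        t = merge(t.left, t.right)
    return t, expired


def lookup(t, key):
    """Return the expiration time stored for key, or None if key is absent."""
    while t is not None:
        if key == t.key:
            return t.expires
        t = t.left if key < t.key else t.right
    return None
```

Because the tuple that expires first is always at the root, expiring data never requires a scan: the loop in `expire` stops as soon as the root is still valid.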
Jensen, C. S., "Geo-Enabled, Mobile Services - A Tale of Routes, Detours, and Dead Ends," in Proceedings of the Eleventh International Conference on Database Systems for Advanced Applications, Lecture Notes in Computer Science 3882, Singapore, pp. 6-19, April 12-15, 2006.

Publication

Online at SpringerLink
We are witnessing the emergence of a global infrastructure that enables the widespread deployment of geo-enabled, mobile services in practice. At the same time, the research community has also paid increasing attention to data management aspects of mobile services. This paper offers me an opportunity to characterize this research area and to describe some of its challenges and pitfalls, and it affords me an opportunity to look back and reflect upon some of the general challenges we face as researchers. I hope that my views and experiences as expressed in this paper may enable others to challenge their own views about this exciting research area and about how to best carry out their research in their own unique contexts.
Brilingaite, A., C. S. Jensen, "Online Route Prediction for Automotive Applications," in Proceedings of the Thirteenth World Congress and Exhibition on Intelligent Transport Systems and Services, London, UK, 8 pages, October 8-12, 2006.

Publication
An information and communication technology infrastructure is rapidly emerging that enables the delivery of location-based services to vast numbers of mobile users. Services will benefit from being aware of not only the user's location, but also the user's current destination and route towards the destination. This paper describes a component that enables the use of geo-context. Using GPS data, the component gathers a driver's routes and associates them with usage meta-data. Other services may then provide the component with a driver ID, the time of the day, and a location, in return obtaining the likely routes for the driver.
Huang, X., C. S. Jensen, S. Šaltenis, "Multiple k Nearest Neighbor Query Processing in Spatial Network Databases," in Proceedings of the Tenth East-European Conference on Advances In Databases and Information Systems, Lecture Notes in Computer Science 4152, Thessaloniki, Greece, pp. 266-281, September 3-7, 2006.

Publication

Online at SpringerLink
This paper concerns the efficient processing of multiple k nearest neighbor queries in a road-network setting. The assumed setting covers a range of scenarios such as the one where a large population of mobile service users that are constrained to a road network issue nearest-neighbor queries for points of interest that are accessible via the road network. Given multiple k nearest neighbor queries, the paper proposes progressive techniques that selectively cache query results in main memory and subsequently reuse these for query processing. The paper initially proposes techniques for the case where an upper bound on k is known a priori and then extends the techniques to the case where this is not so. Based on empirical studies with real-world data, the paper offers insight into the circumstances under which the different proposed techniques can be used with advantage for multiple k nearest neighbor query processing.
Jensen, C. S., D. Tiesyte, N. Tradišauskas, "Robust B+-Tree-Based Indexing of Moving Objects," in Proceedings of the Seventh International Conference on Mobile Data Management, Nara, Japan, 9 pages, May 9-12, 2006.

Publication

Online at IEEE
With the emergence of an infrastructure that enables the geo-positioning of on-line, mobile users, the management of so-called moving objects has emerged as an active area of research. Among the indexing techniques for efficiently answering predictive queries on moving-object positions, the recent Bx-tree is based on the B+-tree and is relatively easy to integrate into an existing DBMS. However, the Bx-tree is sensitive to data skew. This paper proposes a new query processing algorithm for the Bx-tree that fully exploits the available data statistics to reduce the query enlargement that is needed to guarantee perfect recall, thus significantly improving robustness. The new technique is empirically evaluated and compared with four other approaches and with the TPR-tree, a competitor that is based on the R*-tree. The results indicate that the new index is indeed more robust than its predecessor - it significantly reduces the number of I/O operations per query for the workloads considered. In many settings, the TPR-tree is outperformed as well.
Chrysanthis, P. K., C. S. Jensen, V. Kumar, A. Labrinidis, editors, in Proceedings of the Fifth ACM International Workshop on Data Engineering for Wireless and Mobile Access, Chicago, Illinois, USA, 92+viii pages, June 25, 2006.

Online at ACM Digital Library
Chrysanthis, P. K., C. S. Jensen, V. Kumar, A. Labrinidis, "Foreword," in Proceedings of the Fifth ACM International Workshop on Data Engineering for Wireless and Mobile Access, Chicago, Illinois, USA, p. iii, June 25, 2006.

Publication
Alonso, G., C. S. Jensen, B. Mitschang, editors, "Data Always and Everywhere - Management of Mobile, Ubiquitous, Pervasive, and Sensor Data," Dagstuhl Seminar Proceedings 05421, Dagstuhl, Germany, 37 pages, January 2006.

Online at Dagstuhl
Alonso, G., C. S. Jensen, B. Mitschang, "05421 Abstracts Collection - Data Always and Everywhere," in Dagstuhl Seminar Proceedings 05421, Dagstuhl, Germany, 19 pages, January 2006.

Publication

Online at Dagstuhl
From 16.10.05 to 21.10.05, the Dagstuhl Seminar 05421, Data Always and Everywhere - Management of Mobile, Ubiquitous, Pervasive, and Sensor Data, was held in the International Conference and Research Center, Schloss Dagstuhl. During the seminar, all participants were given the opportunity to present their current research, and ongoing activities and open problems were discussed. This document is a collection of the abstracts of the presentations given during the seminar. Some abstracts offer links to extended abstracts, full papers, and other supporting documents. A separate companion document summarizes the seminar.
The authors wish to acknowledge Victor Teixeira de Almeida, who served as collector for the seminar and thus played a key role in collecting materials from the seminar participants.
Alonso, G., C. S. Jensen, B. Mitschang, "05421 Executive Summary - Data Always and Everywhere - Management of Mobile, Ubiquitous, Pervasive, and Sensor Data," Dagstuhl Seminar Proceedings 05421, Dagstuhl, Germany, 6 pages, January 2006.

Publication

Online at Dagstuhl
This report summarizes the important aspects of the workshop on "Management of Mobile, Ubiquitous, Pervasive, and Sensor Data," which took place from October 16th to October 21st, 2005. Thirty-seven participants from thirteen countries met during that week and discussed a broad range of topics related to the management of data in relation to mobile, ubiquitous, and pervasive applications of information technology. The wealth of the contributions is available at the seminar page at the Dagstuhl server. Here, we provide a short overview.
Böhlen, M. H., J. Gamper, C. S. Jensen, "How Would You Like to Aggregate Your Temporal Data?," in Proceedings of the Thirteenth International Symposium on Temporal Representation and Reasoning, Budapest, Hungary, pp. 121-136, June 15-17, 2006.

Publication

Online at IEEE
Real-world data management applications generally manage temporal data, i.e., they manage multiple states of time-varying data. Many contributions have been made by the research community for how to better model, store, and query temporal data. In particular, several dozen temporal data models and query languages have been proposed. Motivated in part by the emergence of non-traditional data management applications and the increasing proliferation of temporal data, this paper puts focus on the aggregation of temporal data. In particular, it provides a general framework of temporal aggregation concepts, and it discusses the abilities of five approaches to the design of temporal query languages with respect to temporal aggregation. Rather than providing focused, polished results, the paper's aim is to explore the inherent support for temporal aggregation in an informal manner that may serve as a foundation for further exploration.
Pedersen, T. B., C. S. Jensen, C. Dyreson, "Method and System for Making OLAP Hierarchies Summarisable," United States Patent No. 7,133,865 B1, 27 pages, November 7, 2006.

Online at WIPO
A method, a computer system and a computer program product for a computer system for transforming general On-line Analytical Processing (OLAP) hierarchies into summarizable hierarchies whereby pre-aggregation is disclosed, by which fast query response times for aggregation queries without excessive storage use are made possible even when the hierarchies originally are irregular. Pre-aggregation is essential for ensuring adequate response time during data analysis. Most OLAP systems adopt the practical pre-aggregation approach, as opposed to full pre-aggregation, of materializing only select combinations of aggregates and then re-using these for efficiently computing other aggregates. However, this re-use of aggregates is contingent on the dimension hierarchies and the relationships between facts and dimensions satisfying stringent constraints. The present invention significantly extends the scope of practical pre-aggregation by transforming irregular dimension hierarchies and fact-dimension relationships into well-behaved structures that enable practical pre-aggregation.
Huang, X., C. S. Jensen, S. Šaltenis, "The Islands Approach to Nearest Neighbor Querying in Spatial Networks," DB Technical Report, TR-16, 23 pages, October 2006.

Publication
Much research has recently been devoted to the data management foundations of location-based mobile services. In one important scenario, the service users are constrained to a transportation network. As a result, query processing in spatial road networks is of interest. In this paper, we propose a versatile approach to k nearest neighbor computation in spatial networks, termed the Islands approach. By offering flexible yet simple means of balancing re-computation and pre-computation, this approach is able to manage the trade-off between query and update performance, and it offers better overall query and update performance than do its predecessors. The result is a single, efficient, and versatile approach to k nearest neighbor computation that obviates the need for using several k nearest neighbor approaches for supporting a single service scenario. The experimental comparison with the existing techniques uses real-world road network data and considers both I/O and CPU performance, for both queries and updates.
2005
Böhm, K., C. S. Jensen, L. M. Haas, M. L. Kersten, P.-Å. Larson, B. C. Ooi, editors, in Proceedings of the Thirty-first International Conference on Very Large Data Bases, Trondheim, Norway, 1372+xxiv pages, August 30-September 2, 2005.

Online at DBLP
Bernstein, P. A., S. Chaudhuri, D. DeWitt, A. Heuer, Z. Ives, C. S. Jensen, H. Meyer, M. T. Özsu, R. T. Snodgrass, K. Y. Whang, J. Widom, "Database Publication Practices," in Proceedings of the Thirty-first International Conference on Very Large Data Bases, Trondheim, Norway, pp. 1241-1246, August 30-September 2, 2005.

Publication [not publicly available]

Online at VLDB
There has been a growing interest in improving the publication processes for database research papers. This panel reports on recent changes in those processes and presents an initial cut at historical data for the VLDB Journal and ACM Transactions on Database Systems.
Bernstein, P. A., E. Bertino, A. Heuer, C. S. Jensen, H. Meyer, M. T. Özsu, R. T. Snodgrass, K. Y. Whang, "An Apples-to-Apples Comparison of Two Database Journals," ACM SIGMOD Record, Vol. 34, No. 4, pp. 61-64, December 2005.

Publication [not publicly available]

ACM Author-Izer
This paper defines a collection of metrics on manuscript reviewing and presents historical data for ACM Transactions on Database Systems and The VLDB Journal.
Huang, X., C. S. Jensen, S. Šaltenis, "The Islands Approach to Nearest Neighbor Querying in Spatial Networks," in Proceedings of the Ninth International Symposium on Spatial and Temporal Databases, Angra, Brazil, published as Lecture Notes in Computer Science, Volume 3633, pp. 73-90, August 22-24, 2005.

Publication [not publicly available]

Online at SpringerLink
Much research has recently been devoted to the data management foundations of location-based mobile services. In one important scenario, the service users are constrained to a transportation network. As a result, query processing in spatial road networks is of interest. We propose a versatile approach to k nearest neighbor computation in spatial networks, termed the Islands approach. By offering flexible yet simple means of balancing re-computation and pre-computation, this approach is able to manage the trade-off between query and update performance. The result is a single, efficient, and versatile approach to k nearest neighbor computation that obviates the need for using several k nearest neighbor approaches for supporting a single service scenario. The experimental comparison with the existing techniques uses real-world road network data and considers both I/O and CPU performance, for both queries and updates.
Schmidt, A., C. S. Jensen, "Efficient Management of Short-Lived Data," TimeCenter Technical Report TR-82 and CoRR cs.DB/0505038 (2005), 24 pages, May 2005.

Publication

Publication at CoRR
Motivated by the increasing prominence of loosely-coupled systems, such as mobile and sensor networks, which are characterised by intermittent connectivity and volatile data, we study the tagging of data with so-called expiration times. More specifically, when data are inserted into a database, they may be tagged with time values indicating when they expire, i.e., when they are regarded as stale or invalid and thus are no longer considered part of the database. In a number of applications, expiration times are known and can be assigned at insertion time. We present data structures and algorithms for online management of data tagged with expiration times. The algorithms are based on fully functional, persistent treaps, which are a combination of binary search trees with respect to a primary attribute and heaps with respect to a secondary attribute. The primary attribute implements primary keys, and the secondary attribute stores expiration times in a minimum heap, thus keeping a priority queue of tuples to expire. A detailed and comprehensive experimental study demonstrates the well-behavedness and scalability of the approach as well as its efficiency with respect to a number of competitors.
Civilis, A., C. S. Jensen, S. Pakalnis, "Techniques for Efficient Tracking of Road-Network-Based Moving Objects," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 5, pp. 698-712, May 2005.

Publication

Online at IEEE
With the continued advances in wireless communications, geo-positioning, and consumer electronics, an infrastructure is emerging that enables location-based services that rely on the tracking of the continuously changing positions of entire populations of service users, termed moving objects. This scenario is characterized by large volumes of updates, for which reason location update technologies become important. A setting is assumed in which a central database stores a representation of each moving object's current position. This position is to be maintained so that it deviates from the user's real position by at most a given threshold. To do so, each moving object stores locally the central representation of its position. Then an object updates the database whenever the deviation between its actual position (as obtained from a GPS device) and the database position exceeds the threshold. The main issue considered is how to represent the location of a moving object in a database so that tracking can be done with as few updates as possible. The paper proposes to use the road network within which the objects are assumed to move for predicting their future positions. The paper presents algorithms that modify an initial road-network representation, so that it works better as a basis for predicting an object's position; it proposes to use known movement patterns of the object, in the form of routes; and it proposes to use acceleration profiles together with the routes. Using real GPS-data and a corresponding real road network, the paper offers empirical evaluations and comparisons that include three existing approaches and all the proposed approaches.
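The threshold-based update policy in this setting can be sketched as follows. The sketch covers only the simplest case, in which the database position stays fixed at the last reported position; the road-network, route, and acceleration-profile predictions proposed in the paper are not modeled. The function name `track` and the tuple representation of positions are assumptions made for illustration.

```python
import math


def track(gps_positions, threshold):
    """Return the indices of the GPS samples at which the object updates the database.

    The object keeps a local copy of the position stored in the database and
    sends an update only when its actual position deviates from that copy by
    more than the threshold.
    """
    updates = []
    db_position = None  # local copy of the central representation
    for i, actual in enumerate(gps_positions):
        if db_position is None or math.dist(actual, db_position) > threshold:
            db_position = actual
            updates.append(i)
    return updates
```

Replacing the fixed `db_position` with a prediction of where the object will be at each sample time is what lets a better representation reduce the number of updates.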
Pfoser, D., C. S. Jensen, "Trajectory Indexing Using Movement Constraints," Geoinformatica, Vol. 9, No. 2, pp. 93-115, June 2005.

Publication [not publicly available]

Online at SpringerLink
With the proliferation of mobile computing, the ability to index efficiently the movements of mobile objects becomes important. Objects are typically seen as moving in two-dimensional (x,y) space, which means that their movements across time may be embedded in the three-dimensional (x,y,t) space. Further, the movements are typically represented as trajectories, sequences of connected line segments. In certain cases, movement is restricted; specifically, in this paper, we aim at exploiting that movements occur in transportation networks to reduce the dimensionality of the data. Briefly, the idea is to reduce movements to occur in one spatial dimension. As a consequence, the movement occurs in two-dimensional (x,t) space. The advantages of considering such lower-dimensional trajectories are that the overall size of the data is reduced and that lower-dimensional data is to be indexed. Since off-the-shelf database management systems typically do not offer higher-dimensional indexing, this reduction in dimensionality allows us to use existing DBMSes to store and index trajectories. Moreover, we argue that, given the right circumstances, indexing these dimensionality-reduced trajectories can be more efficient than using a three-dimensional index. A decisive factor here is the fractal dimension of the network: the lower it is, the more efficient the proposed approach is. This hypothesis is verified by an experimental study that incorporates trajectories stemming from real and synthetic road networks.
Pfoser, D., N. Tryfona, C. S. Jensen, "Indeterminacy and Spatiotemporal Data: Basic Definitions and Case Study," Geoinformatica, Vol. 9, No. 3, pp. 211-236, September 2005.

Publication [not publicly available]

Online at SpringerLink
For some spatiotemporal applications, it can be assumed that the modeled world is precise and bounded, and that our record of it is precise. While these simplifying assumptions are sufficient in applications like a land information system, they are unnecessarily crude for many other applications that manage data with spatial and/or temporal extents, such as navigational applications. This work explores fuzziness and uncertainty, subsumed under the term indeterminacy, in the spatiotemporal context. To better illustrate the basic spatiotemporal concepts of change or evolution, it is shown how the fundamental modeling concepts of spatial objects, attributes, and relationships, and of time points and periods, are influenced by indeterminacy, and how they can be combined. In particular, the focus is on the change of spatial objects and their geometries across time. Four change scenarios are outlined, which concern discrete versus continuous change and asynchronous versus synchronous measurement, and it is shown how to model indeterminacy for each. A case study illustrates the applicability of the paper's general proposal by describing the uncertainty related to the management of the movements of point objects, such as the management of vehicle positions in a fleet management system.
Friis-Christensen, A., C. S. Jensen, J. P. Nytun, D. Skogan, "A Conceptual Schema Language for the Management of Multiple Representations of Geographic Entities," Transactions in GIS, Vol. 9, No. 3, pp. 345-380, June 2005.

Publication [not publicly available]

Online at Blackwell Synergy
Multiple representation of geographic information occurs when a real-world entity is represented more than once in the same or different databases. This occurs frequently in practice, and it invariably results in the occurrence of inconsistencies among the different representations of the same entity. In this paper, we propose an approach to the modeling of multiply represented entities, which is based on the relationships among the entities and their representations. Central to our approach is the Multiple Representation Schema Language that, by intuitive and declarative means, is used to specify rules that match objects representing the same entity, maintain consistency among these representations, and restore consistency if necessary. The rules configure a Multiple Representation Management System, the aim of which is to manage multiple representations over a number of autonomous federated databases. We present a graphical and a lexical binding to the schema language. The graphical binding is built on an extension to the Unified Modeling Language and the Object Constraint Language. We demonstrate that it is possible to implement the constructs of the schema language in the object-relational model of a commercial RDBMS.
Gao, D., C. S. Jensen, R. T. Snodgrass, M. D. Soo, "Join Operations in Temporal Databases," The VLDB Journal, Vol. 14, No. 1, pp. 2-29, March 2005.

Publication

Online at SpringerLink
Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the evaluation of joins with equality predicates rather than the inequality predicates prevalent in valid-time queries. Second, the presence of temporally varying data dramatically increases the size of a database. These factors indicate that specialized techniques are needed to efficiently evaluate temporal joins.
We address this need for efficient join evaluation in temporal databases. Our purpose is twofold. We first survey all previously proposed temporal join operators. While many temporal join operators have been defined in previous work, this work has been done largely in isolation from competing proposals, with little, if any, comparison of the various operators. We then address evaluation algorithms, comparing the applicability of various algorithms to the temporal join operators and describing a performance study involving algorithms for one important operator, the temporal equijoin. Our focus, with respect to implementation, is on non-index-based join algorithms. Such algorithms do not rely on auxiliary access paths but may exploit sort orderings to achieve efficiency.
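To make the temporal equijoin concrete, the following minimal sketch pairs tuples that agree on the join attribute and whose valid-time intervals overlap, and stamps each result with the intersection of the two intervals. It is a simple hash-based illustration, not one of the algorithms evaluated in the paper. The tuple layout `(key, value, start, end)`, the half-open interval convention, and the function name `temporal_equijoin` are assumptions.

```python
from collections import defaultdict


def temporal_equijoin(r, s):
    """Join two lists of (key, value, start, end) tuples with half-open intervals [start, end)."""
    by_key = defaultdict(list)
    for key, s_val, s_start, s_end in s:
        by_key[key].append((s_val, s_start, s_end))

    result = []
    for key, r_val, r_start, r_end in r:
        for s_val, s_start, s_end in by_key.get(key, []):
            # The result is valid only while both input tuples are valid.
            start, end = max(r_start, s_start), min(r_end, s_end)
            if start < end:
                result.append((key, r_val, s_val, start, end))
    return result
```

The overlap test is an inequality predicate on the timestamps, which is why techniques designed purely for equality predicates do not carry over directly.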
Lomet, D., R. T. Snodgrass, C. S. Jensen, "Using the Lock Manager to Choose Timestamps," in Proceedings of the Ninth International Database Engineering and Applications Symposium, Montreal, Canada, pp. 357-368, July 25-27, 2005.

Publication

Online at IEEE
Our goal is to support transaction-time functionality that enables the coexistence of ordinary, non-temporal tables with transaction-time tables. In such a system, each transaction updating a transaction-time or snapshot table must include a timestamp for its updated data that correctly reflects the serialization order of the transactions, including transactions on ordinary tables. A serious issue is coping with SQL CURRENT_TIME functions, which should return a time consistent with a transaction's timestamp and serialization order. Prior timestamping techniques cannot support such functions with this desired semantics. We show how to compatibly extend conventional database functionality for transaction-time support by exploiting the database system lock manager and by utilizing a spectrum of optimizations.
Frank, L., C. Frank, C. S. Jensen, T. B. Pedersen, "Dimensional Modeling By Using a New Response to Slowly Changing Dimensions," in Proceedings of the Second International Conference on Information Technology, Amman, Jordan, pp. 7-10, May 3-5, 2005.

Publication [not publicly available]
Dimensions are defined as dynamic or slowly changing if the attributes or relationships of a dimension can be updated. Aggregations to dynamic dimensions might be misleading if the measures are aggregated without regarding the changes of the dimensions. Kimball et al. have described three classic solutions/responses to handling the aggregation problems caused by slowly changing dimensions. In this paper, we will describe a fourth solution. A special aspect of our new response is that it should be used before the other responses, as it will change the design of the data warehouse. Afterwards, it may be necessary to use the classic responses to improve the design further.
Jensen, C. S., K.-J. Lee, S. Pakalnis, S. Šaltenis, "Advanced Tracking of Vehicles," in Proceedings of the Fifth European Congress and Exhibition on Intelligent Transport Systems, Hannover, Germany, 12 pages, June 1-3, 2005.

Publication [not publicly available]
With the continued advances in wireless communications, geo-location technologies, and consumer electronics, it is becoming possible to accurately track the time-varying location of each vehicle in a population of vehicles.
This paper reports on ongoing research that has as its objective to develop efficient tracking techniques. More specifically, while almost all commercially available tracking solutions simply offer time-based sampling of positions, this paper's techniques aim to offer a guaranteed tracking accuracy for each vehicle at the lowest possible costs, in terms of network traffic and server-side updates. This is achieved by designing, prototyping, and testing novel tracking techniques that exploit knowledge of the road network and past movement.
These resulting tracking techniques are to support mobile services that rely on the existence of a central server that continuously tracks the current positions of vehicles.
Lin, D., C. S. Jensen, B. C. Ooi, S. Šaltenis, "Efficient Indexing of the Historical, Present, and Future Positions of Moving Objects," in Proceedings of the Sixth International Conference on Mobile Data Management, Ayia Napa, Cyprus, pp. 59-66, May 9-13, 2005.

Publication [not publicly available]

ACM Author-Izer
Although significant effort has been put into the development of efficient spatio-temporal indexing techniques for moving objects, little attention has been given to the development of techniques that efficiently support queries about the past, present, and future positions of objects. The provisioning of such techniques is challenging, both because of the nature of the data, which reflects continuous movement, and because of the types of queries to be supported.
This paper proposes the BBx-index structure, which indexes the positions of moving objects, given as linear functions of time, at any time. The index stores linearized moving-object locations in a forest of B+-trees. The index supports queries that select objects based on temporal and spatial constraints, such as queries that retrieve all objects whose positions fall within a spatial range during a set of time intervals. Empirical experiments are reported that offer insight into the query and update performance of the proposed technique.
Damsgaard, J., J. Hørlück, C. S. Jensen, "IT-Driven Customer Service or Customer-Driven IT Service: Does IT Matter?," teaching case, 24 pages, European Case Clearing House reference number: 905-002-1, January 2005.

Publication
At the end of 2004, the Nykredit Group was doing well. In accordance with the overall plan of diversifying Nykredit from a mortgage bank to a retail financial institution, Nykredit had just successfully acquired another mortgage bank. The company portfolio of Nykredit was now close to being complete. Mortgage banking, retail banking, an insurance company, a real estate brokerage chain, and a real estate investor company comprised the Nykredit Group, making it a modern financial supermarket. The deregulation of the Danish banking industry in the 1990s caused a lot of turmoil within the entire industry and had forced Nykredit into a radical reorientation of the company. From this Nykredit emerged not only as a survivor, but also as a clear winner. The remarkable competence of the IT staff of the Nykredit Group in maintaining, integrating and developing its multi-faceted portfolio of IT systems across the various constituent companies into a modern multi-channelled and multi-tiered IT infrastructure had accentuated the success of Nykredit's strategy. In 2004, the financial industry competition was again concentrated on gaining competitive advantage through differentiation and cost reduction.
Everybody agreed that IT was the answer to achieving both cost reduction and differentiation, but how could Nykredit be sure they would always have the right IT? Some argued that the customer side should drive the IT development in order to ensure that Nykredit would have the most relevant IT systems. Others had the opinion that radical business innovation leading to a competitive advantage could only be achieved through in-depth knowledge of what new and emerging IT could do and of how this could be linked with existing IT systems.
Acknowledging the importance of both sides, Nykredit had combined IT service and customer service into one powerful business development department. The creation of a central department had been extremely successful during the radical changes that Nykredit had been forced into during the 1990s. But could the success of the past be extended into the future?
The public debate on the value of IT as a competitive weapon had further stimulated this discussion.
Damsgaard, J., J. Hørlück, C. S. Jensen, "IT-Driven Customer Service or Customer-Driven IT Service: Does IT Matter?," teaching case, 8 pages, European Case Clearing House reference number: 905-002-8, January 2005.

Publication [not publicly available]
This case deals with a large European financial institution that has built an extensive IT infrastructure to serve its multi-channel approach to its customers while changing into a modern financial supermarket with a portfolio covering almost all financial services. Experience has shown that in this industry, IT does matter. For example, a few days after the takeover of a competitor, the competitor's previous owners, 105 small banks, sold Nykredit's products through their 1150 branches.
The case can thus be used in a discussion of Nicholas Carr's article "IT Doesn't Matter" (Carr, Nicholas G. (2003). "IT Doesn't Matter." Harvard Business Review (May): 41-49).
Traditionally, mortgage banking would mean either building an extensive branch network backed by central staff functions or joining forces with an existing retail financial institution. However, the Internet made online presence and a call center equipped with the latest CRM tools an inevitable alternative. But this was not viable if the existing IT infrastructure could not be transformed into a modern, streamlined, multi-tiered infrastructure accessible from the Internet. The IT infrastructure was, for historical reasons, based on a variety of systems encompassing both systems developed in house and acquired best-of-suite and best-of-breed systems. In order to implement a multi-channel customer approach, the financial institution was engaged both in rearranging the old IT systems and in building new Internet-ready systems.
This case is open-ended and does not have a set solution. The business perspective is to discuss alternatives to a financial institution based on branches and especially what this requires in terms of IT support. It is designed to encourage discussion on issues such as physical distribution network versus a strong net presence; the changing role of the IT department, from being supplier of back office systems to delivering the storefront; the challenge of transforming several hundred existing legacy systems to a coherent and multi-layered, Internet-ready IT infrastructure; and modern software development and project management.
Huang, X., C. S. Jensen, "In-Route Skyline Querying for Location-Based Services," Workshop on Web and Wireless Geographic Information Systems, post-workshop proceedings, published as Lecture Notes in Computer Science, Volume 3428, pp. 120-135, April 2005.

Publication

Online at SpringerLink
With the emergence of an infrastructure for location-aware mobile services, the processing of advanced, location-based queries that are expected to underlie such services is gaining in relevance. While much work has assumed that users move in Euclidean space, this paper assumes that movement is constrained to a road network and that points of interest can be reached via the network. More specifically, the paper assumes that the queries are issued by users moving along routes towards destinations.
The paper defines in-route nearest-neighbor skyline queries in this setting and considers their efficient computation. The queries take into account several spatial preferences, and they intuitively return a set of most interesting results for each result returned by the corresponding non-skyline queries. The paper also covers a performance study of the proposed techniques based on real point-of-interest and road network data.
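The skyline notion used in the abstract above can be illustrated with a minimal sketch. It assumes two hypothetical spatial criteria per point of interest, the detour from the route and the remaining network distance, both to be minimized. The names and data are assumptions for illustration only; the paper's own algorithms work on a road network and are not reproduced here.

```python
from typing import List, Tuple

# Each candidate is (name, detour_km, distance_km); both criteria are
# hypothetical stand-ins for the spatial preferences mentioned above.
Candidate = Tuple[str, float, float]

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if a is no worse on every criterion and better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    better = a[1] < b[1] or a[2] < b[2]
    return no_worse and better

def skyline(cands: List[Candidate]) -> List[Candidate]:
    """Return the candidates that are not dominated by any other candidate."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

pois = [("A", 1.0, 5.0), ("B", 2.0, 3.0), ("C", 2.5, 4.0), ("D", 0.5, 6.0)]
result = skyline(pois)  # "C" is dominated by "B" and drops out
```

The quadratic scan is chosen for readability; it says nothing about how the paper computes such queries efficiently.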
Civilis, A., C. S. Jensen, S. Pakalnis, "Techniques for Efficient Tracking of Road-Network-Based Moving Objects," DB Technical Report, TR-10, 37 pages, March 2005.

Publication
With the continued advances in wireless communications, geo-positioning, and consumer electronics, an infrastructure is emerging that enables location-based services that rely on the tracking of the continuously changing positions of entire populations of service users, termed moving objects. This scenario is characterized by large volumes of updates, for which reason location update technologies become important.
A setting is assumed in which a central database stores a representation of each moving object's current position. This position is to be maintained so that it deviates from the user's real position by at most a given threshold. To do so, each moving object stores locally the central representation of its position. Then an object updates the database whenever the deviation between its actual position (as obtained from a GPS device) and the database position exceeds the threshold.
The main issue considered is how to represent the location of a moving object in a database so that tracking can be done with as few updates as possible. The paper proposes to use the road network within which the objects are assumed to move for predicting their future positions.
The paper presents algorithms that modify an initial road-network representation, so that it works better as a basis for predicting an object's position; it proposes to use known movement patterns of the object, in the form of routes; and it proposes to use acceleration profiles together with the routes.
Using real GPS-data and a corresponding real road network, the paper offers empirical evaluations and comparisons that include three existing approaches and all the proposed approaches.
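The threshold-based update scheme described above can be sketched as follows. The sketch assumes the simplest possible representation, in which the database position stays constant between updates; it does not implement the road-network, route, or acceleration-profile predictions that the report proposes. The function name and sample data are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def track(gps_fixes: List[Point], threshold: float) -> List[Point]:
    """Return the positions that the moving object actually sends to the database.

    The object keeps a local copy of the database position and sends an
    update only when its measured position deviates from that copy by
    more than the threshold.
    """
    updates: List[Point] = []
    db_pos: Optional[Point] = None  # local copy of the database position
    for fix in gps_fixes:
        if db_pos is None or math.dist(fix, db_pos) > threshold:
            db_pos = fix
            updates.append(fix)
    return updates

fixes = [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0), (11.0, 0.0), (12.0, 0.0), (25.0, 0.0)]
sent = track(fixes, threshold=10.0)  # three of the six fixes trigger an update
```

A better predictor of the object's movement keeps the deviation below the threshold for longer, which is why the choice of representation determines the number of updates.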
Brilingaite, A., C. S. Jensen, N. Zokaite, "Enabling Routes as Context in Mobile Services," DB Technical Report, TR-9, 42 pages, April 2005.

Publication

ACM Author-Izer
With the continuing advances in wireless communications, geo-positioning, and portable electronics, an infrastructure is emerging that enables the delivery of on-line, location-enabled services to very large numbers of mobile users. A typical usage situation for mobile services is one characterized by a small screen and no keyboard, and by the service being only a secondary focus of the user. Under such circumstances, it is particularly important to deliver the "right" information and service at the right time, with as little user interaction as possible. This may be achieved by making services context aware.
Mobile users frequently follow the same route to a destination as they did during previous trips to the destination, and the route and destination constitute important aspects of the context for a range of services. This paper presents key concepts underlying a software component that identifies and accumulates the routes of a user along with their usage patterns and that makes the routes available to services. Experiences from using the component on logs of GPS positions acquired from vehicles traveling within a real road network are reported.
2004
Breunig, M., C. S. Jensen, M. Klein, A. Zeitz, G. Koloniari, J. Grünbauer, P. J. Marrón, C. Panayiotou, S. Boll, S. Šaltenis, K.-U. Sattler, M. Hauswirth, W. Lehner, O. Wolfson, "Research Issues in Mobile Querying," in Proceedings of the Dagstuhl Seminar 04441 on Mobile Information Management, Schloss Dagstuhl, Wadern, Germany, 6 pages, October 24-29, 2004.

Publication [not publicly available]

Online at Dagstuhl
This document reports on key aspects of the discussions conducted within the working group. In particular, the document aims to offer a structured and somewhat digested summary of the group's discussions. The document first offers concepts that enable characterization of "mobile queries" as well as the types of systems that enable such queries. It explores the notion of context in mobile queries. The document ends with a few observations, mainly regarding challenges.
Boll, S., M. Breunig, N. Davies, C. S. Jensen, B. König-Ries, R. Malaka, F. Matthes, C. Panayiotou, S. Šaltenis, T. Schwarz, "Towards a Handbook for User-Centred Mobile Application Design," in Proceedings of the Dagstuhl Seminar 04441 on Mobile Information Management, Schloss Dagstuhl, Wadern, Germany, 8 pages, October 24-29, 2004.

Publication [not publicly available]

Online at Dagstuhl
Why do we have difficulties designing mobile apps? Is there a "Mobile RUP"?
Jensen, C. S., A. Kligys, T. B. Pedersen, I. Timko, "Multidimensional Data Modeling For Location-Based Services," The VLDB Journal, Vol. 13, No. 1, pp. 1-21, January 2004.

Publication

ACM Author-Izer
With the recent and continuing advances in areas such as wireless communications and positioning technologies, mobile, location-based services are becoming possible. Such services deliver location-dependent content to their users. More specifically, these services may capture the movements and requests of their users in multidimensional databases, i.e., data warehouses, and content delivery may be based on the results of complex queries on these data warehouses. Such queries aggregate detailed data in order to find useful patterns, e.g., in the interaction of a particular user with the services. The application of multidimensional technology in this context poses a range of new challenges. The specific challenge addressed here concerns the provision of an appropriate multidimensional data model. In particular, the paper extends an existing multidimensional data model and algebraic query language to accommodate spatial values that exhibit partial containment relationships instead of the total containment relationships normally assumed in multidimensional data models. Partial containment introduces imprecision in aggregation paths. The paper proposes a method for evaluating the imprecision of such paths. The paper also offers transformations of dimension hierarchies with partial containment relationships to simple hierarchies, to which existing precomputation techniques are applicable.
Torp, K., C. S. Jensen, R. T. Snodgrass, "Modification Semantics in Now-relative Databases," Information Systems, Vol. 18, No. 8, pp. 653-683, December 2004.

Publication [not publicly available]

Online by Elsevier at ScienceDirect
Most real-world databases record time-varying information. In such databases, the notion of "the current time," or NOW, occurs naturally and prominently. For example, when capturing the past states of a relation using begin and end time columns, tuples that are part of the current state have some past time as their begin time and NOW as their end time. While the semantics of such variable databases has been described in detail and is well understood, the modification of variable databases remains unexplored. This paper defines the semantics of modifications involving the variable NOW. More specifically, the problems with modifications in the presence of NOW are explored, illustrating that the main problems are with modifications of tuples that reach into the future. The paper defines the semantics of modifications, including insertions, deletions, and updates, of databases without NOW, with NOW, and with values of the type NOW + D, where D is a non-variable time duration. To accommodate these semantics, three new timestamp values are introduced. Finally, implementation is explored. We show how to represent the variable NOW with columns of standard SQL data types and give a mapping from SQL on NOW-relative data to standard SQL on these columns. The paper thereby completes the semantics, the querying, and the modification of now-relative databases.
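To make the begin and end time columns concrete, the following sketch stores NOW as a maximum-date sentinel in an ordinary text column. The sentinel encoding, the table, and the helper names are assumptions made for illustration; the abstract does not state which representation the paper uses, and the sketch does not cover values of the type NOW + D or the three new timestamp values.

```python
import sqlite3

NOW_SENTINEL = "9999-12-31"  # assumed stand-in for the variable NOW

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, t_begin TEXT, t_end TEXT)")
conn.execute(
    "INSERT INTO emp VALUES ('Ann', 'Sales', '2003-01-01', ?)", (NOW_SENTINEL,)
)

def update_dept(name: str, new_dept: str, today: str) -> None:
    """Logical update: close the current tuple at today and open a new one."""
    conn.execute(
        "UPDATE emp SET t_end = ? WHERE name = ? AND t_end = ?",
        (today, name, NOW_SENTINEL),
    )
    conn.execute(
        "INSERT INTO emp VALUES (?, ?, ?, ?)",
        (name, new_dept, today, NOW_SENTINEL),
    )

def state_at(day: str):
    """Tuples valid on day; ISO dates compare correctly as strings, and the
    sentinel compares as later than any real date."""
    rows = conn.execute(
        "SELECT name, dept FROM emp WHERE t_begin <= ? AND ? < t_end ORDER BY name",
        (day, day),
    )
    return rows.fetchall()

update_dept("Ann", "Research", "2004-06-01")
```

After the update, `state_at("2004-01-01")` still returns the Sales tuple, while `state_at("2004-06-01")` returns the Research tuple.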
Huang, X., C. S. Jensen, "In-Route Skyline Querying for Location-Based Services," in Proceedings of the Fourth International Workshop on Web and Wireless Geographic Information Systems, Goyang, South Korea, pp. 223-238, November 26-27, 2004.

Publication [not publicly available]
With the emergence of an infrastructure for location-aware mobile services, the processing of advanced, location-based queries that are expected to underlie such services is gaining in relevance. While most work has assumed that mobile objects move in Euclidean space, this paper considers the case where movement is constrained to a road network and where points of interest can be reached via the network. More specifically, the paper assumes that the queries are issued by objects moving along routes towards destinations. The paper defines in-route nearest-neighbor skyline queries in this setting and considers their efficient implementation. These skyline queries take into account several spatial preferences, and they intuitively return a set of most interesting results for each result returned by the corresponding non-skyline queries. The paper also reports on an empirical performance evaluation of the proposed implementations based on real road network and point-of-interest data.
Brilingaite, A., C. S. Jensen, N. Zokaite, "Enabling Routes as Context in Mobile Services," in Proceedings of the Twelfth ACM International Symposium on Advances in Geographic Information Systems, Washington DC, USA, pp. 127-136, November 12-13, 2004.

Publication [not publicly available]

Online at ACM Digital Library
With the continuing advances in wireless communications, geo-positioning, and portable electronics, an infrastructure is emerging that enables the delivery of on-line, location-enabled services to very large numbers of mobile users. A typical usage situation for mobile services is one characterized by a small screen and no keyboard, and by the service being only a secondary focus of the user. It is therefore particularly important to deliver the "right" information and service at the right time, with as little user interaction as possible. This may be achieved by making services context aware.
Mobile users frequently follow the same route to a destination as they did during previous trips to the destination, and the route and destination are important aspects of the context for a range of services. This paper presents key concepts underlying a software component that discovers the routes of a user along with their usage patterns and that makes the accumulated routes available to services. Experiences from using the component with real GPS logs are reported.
Huang, X., C. S. Jensen, "Towards A Streams-Based Framework for Defining Location-Based Queries," in Proceedings of the Second International Workshop on Spatio-Temporal Database Management, pp. 73-80, August 30, 2004.

Publication
An infrastructure is emerging that supports the delivery of on-line, location-enabled services to mobile users. Such services involve novel database queries, and the database research community is quite active in proposing techniques for the efficient processing of such queries. In parallel to this, the management of data streams has become an active area of research.
While most research in mobile services concerns performance issues, this paper aims to establish a formal framework for defining the semantics of queries encountered in mobile services, most notably the so-called continuous queries that are particularly relevant in this context. Rather than inventing an entirely new framework, the paper proposes a framework that builds on concepts from data streams and temporal databases. Definitions of example queries demonstrate how the framework enables clear formulation of query semantics and the comparison of queries. The paper also proposes a categorization of location-based queries.
Jensen, C. S., D. Lin, B. C. Ooi, "Query and Update Efficient B+-Tree Based Indexing of Moving Objects," in Proceedings of the Thirtieth International Conference on Very Large Data Bases, pp. 768-779, August 30-September 3, 2004.

Publication

Online at VLDB
A number of emerging applications of data management technology involve the monitoring and querying of large quantities of continuous variables, e.g., the positions of mobile service users, termed moving objects. In such applications, large quantities of state samples obtained via sensors are streamed to a database. Indexes for moving objects must support queries efficiently, but must also support frequent updates. Indexes based on minimum bounding regions (MBRs) such as the R-tree exhibit high concurrency overheads during node splitting, and each individual update is known to be quite costly. This motivates the design of a solution that enables the B+-tree to manage moving objects. We represent moving-object locations as vectors that are timestamped based on their update time. By applying a novel linearization technique to these values, it is possible to index the resulting values using a single B+-tree that partitions values according to their timestamp and otherwise preserves spatial proximity. We develop algorithms for range and k nearest neighbor queries, as well as continuous queries. The proposal can be grafted into existing database systems cost effectively. An extensive experimental study explores the performance characteristics of the proposal and also shows that it is capable of substantially outperforming the R-tree based TPR-tree for both single and concurrent access scenarios.
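The idea of mapping timestamped locations to one-dimensional keys can be sketched as below. The sketch assumes a Z-order (Morton) curve over integer grid cells and a fixed number of time partitions derived from the update time; the abstract does not name the curve or the partitioning parameters, so `phase_len`, `n_partitions`, and `bits` are hypothetical. The resulting integers could be stored in any ordered index; a B+-tree itself is not implemented here.

```python
def interleave_bits(x: int, y: int, bits: int = 8) -> int:
    """Z-order (Morton) value of grid cell (x, y) using `bits` bits per axis."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x occupies the even bit positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y occupies the odd bit positions
    return z

def index_key(x: int, y: int, update_time: float,
              phase_len: float = 60.0, n_partitions: int = 2,
              bits: int = 8) -> int:
    """One-dimensional key: time partition in the high bits, Z-value in the low bits."""
    partition = int(update_time // phase_len) % n_partitions
    return (partition << (2 * bits)) | interleave_bits(x, y, bits)

k1 = index_key(3, 5, update_time=10.0)   # partition 0
k2 = index_key(3, 4, update_time=10.0)   # neighboring cell, same partition
k3 = index_key(3, 5, update_time=70.0)   # same cell, next partition
```

Keys from the same partition that are close on the curve tend to come from nearby cells, while keys from different partitions fall into disjoint key ranges.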
Friis-Christensen, A., J. V. Christensen, C. S. Jensen, "A Framework for Conceptual Modeling of Geographic Data Quality," in Proceedings of the Eleventh International Symposium on Spatial Data Handling, pp. 605-616, August 23-25, 2004.

Publication [not publicly available]

Online at SpringerLink
The notion of data quality is of particular importance to geographic data. One reason is that such data is often inherently imprecise. Another is that the usability of the data is in large part determined by how "good" the data is, as different applications of geographic data require that different qualities of the data be met. Such qualities concern the object level as well as the attribute level of the data. This paper presents a systematic and integrated approach to the conceptual modeling of geographic data and quality. The approach integrates quality information with the basic model constructs. This results in a model that enables object-oriented specification of quality requirements and of acceptable quality levels. More specifically, it extends the Unified Modeling Language with new modeling constructs based on standard classes, attributes, and associations that include quality information. A case study illustrates the utility of the quality-enabled model.
Civilis, A., C. S. Jensen, J. Nenortaite, S. Pakalnis, "Efficient Tracking of Moving Objects with Precision Guarantees," in Proceedings of the First Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services, pp. 164-173, August 22-25, 2004.

Publication

Online at IEEE
Sustained advances in wireless communications, geo-positioning, and consumer electronics pave the way to a kind of location-based service that relies on the tracking of the continuously changing positions of an entire population of service users. This type of service is characterized by large volumes of updates, giving prominence to techniques for location representation and update.
This paper presents several representations, along with associated update techniques, that predict the present and future positions of moving objects. An update occurs when the deviation between the predicted and the actual position of an object exceeds a given threshold. For the case where the road network, in which an object is moving, is known, we propose a so-called segment-based policy that predicts an object's movement according to the road's shape. Map matching is used for determining the road on which an object is moving. Empirical performance studies based on a real road network and GPS logs from cars are reported.
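As a rough illustration of a representation that predicts positions, the sketch below lets the server extrapolate linearly from the last update and counts how many updates a sequence of fixes triggers. Linear extrapolation from the last two fixes is an assumption chosen for brevity; the segment-based policy and map matching mentioned above are not implemented, and all names are hypothetical.

```python
import math
from typing import List, Tuple

# A fix is (time, x, y); units are arbitrary in this sketch.
Fix = Tuple[float, float, float]

def count_updates(fixes: List[Fix], threshold: float) -> int:
    """Count updates when positions are predicted by linear extrapolation."""
    t0, x0, y0 = fixes[0]
    vx = vy = 0.0
    updates = 1                       # the initial position report
    prev = fixes[0]
    for t, x, y in fixes[1:]:
        # Position predicted from the last update and its velocity vector.
        px, py = x0 + vx * (t - t0), y0 + vy * (t - t0)
        if math.hypot(x - px, y - py) > threshold:
            dt = t - prev[0]
            # New velocity estimated from the two most recent fixes.
            vx, vy = (x - prev[1]) / dt, (y - prev[2]) / dt
            t0, x0, y0 = t, x, y
            updates += 1
        prev = (t, x, y)
    return updates

# An object moving at constant speed 10 along the x-axis.
fixes = [(float(t), 10.0 * t, 0.0) for t in range(10)]
n = count_updates(fixes, threshold=5.0)
```

For this constant-speed trace, only the initial report and one correction are needed, whereas a representation that keeps the position fixed between updates would report at every fix.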
Gregersen, H., C. S. Jensen, "Conceptual Modeling of Time-Varying Information," in Proceedings of the International Conference on Computing, Communications and Control Technologies, pp. 248-255, August 14-17, 2004.

Publication [not publicly available]
A wide range of database applications manage information that varies over time. Many of the underlying database schemas of these were designed using the Entity-Relationship (ER) model. In the research community as well as in industry, it is common knowledge that the temporal aspects of the mini-world are important, but difficult to capture using the ER model. Several enhancements to the ER model have been proposed in an attempt to support the modeling of temporal aspects of information. Common to the existing temporally extended ER models is that their designers gave few or no specific requirements for the models.
With the existing proposals, an ontological foundation, and novel requirements as its basis, this paper defines a graphical, temporally extended ER model. The ontological foundation serves to ensure an orthogonal design, and the requirements aim, in part, at ensuring a design that naturally extends the syntax and semantics of the regular ER model. The result is a novel model that satisfies an array of properties not satisfied by any single previously proposed model.
Pedersen, T. B., C. S. Jensen, "Multidimensional Databases," Chapter 12, pp. 12-1 - 12-13, in Industrial Information Technology Handbook, edited by R. Zurawski, CRC Press, November 2004.

Publication [not publicly available]
Jensen, C. S., "Database Aspects of Location-Based Services," Chapter 5, pp. 115-147 in Location-Based Services, edited by J. Schiller and A. Voisard, Morgan Kaufmann Publishers, 2004.

Publication [not publicly available]
Adopting a data management perspective on location-based services, this chapter explores central challenges to data management posed by location-based services. Because service users typically travel in, and are constrained to, transportation infrastructures, such structures must be represented in the databases underlying high-quality services. Several integrated representations - which capture different aspects of the same infrastructure - are needed. Further, all other content that can be related to geographical space must be integrated with the infrastructure representations.
The chapter describes the general concepts underlying one approach to data modeling for location-based services. The chapter also covers techniques that are needed to keep a database for location-based services up to date with the reality it models. As part of this, caching is touched upon briefly. The notion of linear referencing plays an important role in the chapter's approach to data modeling. Thus, the chapter offers an overview of linear referencing concepts and describes the support for linear referencing in Oracle.
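The basic idea of linear referencing, identifying a location by a measure along a linear feature rather than by coordinates, can be sketched as follows. This is a generic illustration with hypothetical names; it is not Oracle's linear referencing API and is not taken from the chapter.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def locate(polyline: List[Point], measure: float) -> Point:
    """Return the point at distance `measure` along the polyline from its start."""
    if measure < 0:
        raise ValueError("measure must be non-negative")
    remaining = measure
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        seg_len = math.hypot(x2 - x1, y2 - y1)
        if seg_len > 0 and remaining <= seg_len:
            f = remaining / seg_len           # fraction of the segment covered
            return (x1 + f * (x2 - x1), y1 + f * (y2 - y1))
        remaining -= seg_len
    raise ValueError("measure exceeds the length of the polyline")

# A road made of a 4-unit segment followed by a 3-unit segment.
road = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]
poi = locate(road, 5.5)   # 1.5 units into the second segment
```

Storing content as a (road, measure) pair keeps it attached to the road even if the road's geometry is later corrected.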
Jensen, C. S., H. Lahrmann, S. Pakalnis, J. Runge, "The INFATI Data," TimeCenter Technical Report TR-79 and CoRR cs.DB/0410001 (2004), 10 pages, July 2004.

Publication

Publication at CoRR
The ability to perform meaningful empirical studies is of essence in research in spatio-temporal query processing. Such studies are often necessary to gain detailed insight into the functional and performance characteristics of proposals for new query processing techniques. We present a collection of spatio-temporal data, collected during an intelligent speed adaptation project, termed INFATI, in which some two dozen cars equipped with GPS receivers and logging equipment took part. We describe how the data was collected and how it was "modified" to afford the drivers some degree of anonymity. We also present the road network in which the cars were moving during data collection. The GPS data is publicly available for non-commercial purposes. It is our hope that this resource will help the spatio-temporal research community in its efforts to develop new and better query processing techniques.
Bohm, M., E. Bonnerup, C. Elberling, C. S. Jensen, C. P. Knudsen, L. Leffland, H. H. Lund, N. Olsen, L. Pallesen, K. Sørensen, "Det begynder i skolen - En ATV-rapport om naturfagenes vilkår og fremtidige udviklingsmuligheder i grundskolen," Danish Academy of Technical Sciences, 81 pages, April 2004.

Publication [not publicly available]

Online at ATV
With this discussion paper, ATV for the first time offers its view on how science teaching in particular and academic standards in general can be strengthened in primary and lower secondary school. We do so because the Danish primary and lower secondary school is struggling with academic standards, and especially with science. Teaching in Natur og Teknik (Nature and Technology) is given low priority, the science communities at the schools have little impact in many places, and far too many pupils lose interest in the sciences during their school years. This is not a satisfactory situation for the largest and most important educational and cultural institution in Denmark.
It is in primary and lower secondary school that we must act to nurture and preserve pupils' interest in science and technology. Pupils should be able to carry this interest with them on their way through the education system, into the labor market, and onward into society. Insight into nature and technology is necessary in order to function in a knowledge society. This applies to everyone, regardless of education, job, and way of life. A basic knowledge of science gives the individual pupil good opportunities to later choose an exciting course of education and, subsequently, an exciting career. Equally important is the ability to understand, form opinions about, and act in relation to all the areas of life in which science and technology play a role, with respect to the individual, the organization of society, and future prospects.
Denmark's welfare begins in primary and lower secondary school. We want the school to become a better workplace for teachers and a more interesting school for pupils. We invite a constructive and open-minded dialogue.
The discussion paper addresses two overall focus areas:
1. The possibilities for creating a science focus area in the Danish primary and lower secondary school and for optimizing the use of the resources currently spent on the school.
2. The possibilities for strengthening general academic standards in primary and lower secondary school and for ensuring better coherence between the teaching there and in subsequent upper secondary education.
ATV's Science Education Committee (Naturfagsudvalg) is a committee under ATV's Think Tank (Tænketank). The Think Tank is concurrently working on a project on the conditions for science subjects in the upper secondary school (gymnasium) as it will look after the new upper secondary school reform.
ATV's Think Tank works with technical and scientific topics and issues that are of relevance and importance to society.
Pelanis, M., S. Šaltenis, C. S. Jensen, "Indexing the Past, Present and Anticipated Future Positions of Moving Objects," TimeCenter Technical Report TR-78, 30 pages, July 2004.

Publication
With the proliferation of wireless communications and geo-positioning, e-services are envisioned that exploit the positions of a set of continuously moving users to provide context-aware functionality to each individual user. Because advances in disk capacities continue to outperform Moore's Law, it becomes increasingly feasible to store on-line all the position information obtained from the moving e-service users. With the much slower advances in I/O speeds and many concurrent users, indexing techniques are of essence in this scenario.
Past indexing techniques capture the position of an object up until the time of the most recent position sample, or they represent an object's position as a constant or linear function of time and capture the position from the current time and into the (near) future. This paper offers an indexing technique capable of capturing the positions of moving objects at all points in time. The index substantially extends partial persistence techniques, which support transaction time, to support valid time for monitoring applications. The performance of a query is independent of the number of past position samples stored for an object. No existing indices have these characteristics.
Lee, M. L., W. Hsu, C. S. Jensen, B. Cui, "Supporting Frequent Updates in R-Trees: A Bottom-Up Approach," DB Technical Report TR-6, 23 pages (also TRA4/04, School of Computing, National University of Singapore), April 2004.

Publication
Advances in hardware-related technologies promise to enable new data management applications that monitor continuous processes. In these applications, enormous amounts of state samples are obtained via sensors and are streamed to a database. Further, updates are very frequent and may exhibit locality. While the R-tree is the index of choice for multi-dimensional data with low dimensionality, and is thus relevant to these applications, R-tree updates are also relatively inefficient. We present a bottom-up update strategy for R-trees that generalizes existing update techniques and aims to improve update performance. It has different levels of reorganization - ranging from global to local - during updates, avoiding expensive top-down updates. A compact main-memory summary structure that allows direct access to the R-tree index nodes is used together with efficient bottom-up algorithms. Empirical studies indicate that the bottom-up strategy outperforms the traditional top-down technique, leads to indices with better query performance, achieves higher throughput, and is scalable.
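The core bottom-up idea, reaching an object's leaf directly and updating in place when the new position still falls inside the leaf's bounding rectangle, can be sketched with a toy structure. The classes below are hypothetical and greatly simplified: the leaves are a flat list, the hash table stands in for the main-memory summary structure, and the fallback is a linear scan instead of a real top-down R-tree insertion. The intermediate reorganization levels described above are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]

@dataclass
class Leaf:
    # Minimum bounding rectangle as (xmin, ymin, xmax, ymax).
    mbr: Tuple[float, float, float, float]
    entries: Dict[int, Point] = field(default_factory=dict)

    def contains(self, p: Point) -> bool:
        xmin, ymin, xmax, ymax = self.mbr
        return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax

class BottomUpIndex:
    """Toy stand-in: leaves plus a hash table giving direct leaf access."""

    def __init__(self, leaves: Iterable[Leaf]) -> None:
        self.leaves = list(leaves)
        self.leaf_of: Dict[int, Leaf] = {}   # object id -> leaf holding it
        self.top_down_updates = 0

    def update(self, oid: int, p: Point) -> None:
        leaf = self.leaf_of.get(oid)
        if leaf is not None and leaf.contains(p):
            leaf.entries[oid] = p            # local, in-place update
            return
        # Fallback: remove the old entry and re-insert from the top.
        if leaf is not None:
            del leaf.entries[oid]
        self.top_down_updates += 1
        # Raises StopIteration if no leaf covers p; a real tree would grow.
        target = next(l for l in self.leaves if l.contains(p))
        target.entries[oid] = p
        self.leaf_of[oid] = target

idx = BottomUpIndex([Leaf((0, 0, 10, 10)), Leaf((10, 0, 20, 10))])
idx.update(1, (2.0, 2.0))    # first insertion goes top-down
idx.update(1, (3.0, 2.5))    # stays within the same leaf: in place
idx.update(1, (15.0, 5.0))   # leaves the rectangle: top-down fallback
```

When updates exhibit locality, most of them take the in-place path, which is where the savings over purely top-down updates would come from.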
Pfoser, D., N. Tryfona, C. S. Jensen, "Indeterminacy and Spatiotemporal Data: the Moving Point Object Case," Technical Report No. 2004/02/03, Computer Technology Institute, February 2004.

Publication [not publicly available]
Civilis, A., C. S. Jensen, J. Nenortaite, S. Pakalnis, "Efficient Tracking of Moving Objects with Precision Guarantees," DB Technical Report TR-5, 23 pages, February 2004.

Publication
We are witnessing continued improvements in wireless communications and geo-positioning. In addition, the performance/price ratio for consumer electronics continues to improve. These developments pave the way to a kind of location-based service that relies on the tracking of the continuously changing positions of the entire population of service users. This type of service is characterized by large volumes of updates, giving prominence to techniques for location representation and update.
In this paper, we present several representations, along with associated update techniques, that predict the future positions of moving objects. For all representations, the predicted position of a moving object is updated whenever the deviation between it and the actual position of the object exceeds a given threshold. For the case where the road network, in which the object is moving, is known, we propose a so-called segment-based policy that represents and predicts an object's movement according to the road's shape. Map matching is used for determining the road on which an object is moving. Empirical performance studies and comparisons of the proposed techniques based on a real road network and GPS logs from cars are reported.
2003
Skyt, J., C. S. Jensen, L. Mark, "A Foundation for Vacuuming Temporal Databases," Transactions on Data and Knowledge Engineering, Vol. 44, No. 1, pp. 1-29, January 2003.

Publication [not publicly available]

Online by Elsevier at ScienceDirect
A wide range of real-world database applications, including financial and medical applications, are faced with accountability and traceability requirements. These requirements lead to the replacement of the usual update-in-place policy by an append-only policy that retains all previous states in the database. This policy results in so-called transaction-time databases, which are ever-growing. A variety of physical storage structures and indexing techniques as well as query languages have been proposed for transaction-time databases, but the support for physical removal of data, termed vacuuming, has only received little attention. Such vacuuming is called for by, e.g., the laws of many countries and the policies of many businesses. Although necessary, with vacuuming, the database's perfect recollection of the past may be compromised via, e.g., selective removal of records pertaining to past states. This paper provides a semantic foundation for the vacuuming of transaction-time databases. The main focus is to establish a foundation for the correct processing of queries and updates against vacuumed databases. However, options for user, application, and database interactions in response to queries and updates against vacuumed data are also outlined.
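The interplay between an append-only policy and vacuuming can be sketched with a small in-memory table. The class, the cutoff-based vacuuming rule, and the choice to raise an error for queries against vacuumed states are assumptions for illustration; the paper outlines several possible responses, and this sketch shows only one simple option.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Version:
    key: str
    value: str
    tt_start: int            # transaction time at which the version was inserted
    tt_end: Optional[int]    # None while the version is current

class TransactionTimeTable:
    def __init__(self) -> None:
        self.versions: List[Version] = []
        self.vacuumed_before = 0   # states earlier than this may be incomplete

    def put(self, key: str, value: str, now: int) -> None:
        """Append-only update: close the current version, append a new one."""
        for v in self.versions:
            if v.key == key and v.tt_end is None:
                v.tt_end = now
        self.versions.append(Version(key, value, now, None))

    def vacuum(self, cutoff: int) -> None:
        """Physically remove versions that ceased to be current by the cutoff."""
        self.versions = [v for v in self.versions
                         if v.tt_end is None or v.tt_end > cutoff]
        self.vacuumed_before = max(self.vacuumed_before, cutoff)

    def as_of(self, key: str, t: int) -> Optional[str]:
        """Value of key as of time t; refuses to answer for vacuumed times."""
        if t < self.vacuumed_before:
            raise LookupError("state may have been vacuumed")
        for v in self.versions:
            if v.key == key and v.tt_start <= t and (v.tt_end is None or t < v.tt_end):
                return v.value
        return None

table = TransactionTimeTable()
table.put("a", "1", now=1)
table.put("a", "2", now=5)
table.put("a", "3", now=9)
table.vacuum(cutoff=6)       # removes the version that was current during [1, 5)
```

Queries for times at or after the cutoff are unaffected, since every version that was current at such a time ended after the cutoff and is therefore retained.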
Nytun, J. P., C. S. Jensen, V. A. Oleshchuk, "Towards a Data Consistency Modeling and Testing Framework for MOF Defined Languages," Norsk informatikkonferanse 2003, Oslo, Norway, 12 pages, November 24-26, 2003.

Publication
The number of online data sources is continuously increasing, and related data are often available from several sources. However, accessing data from multiple sources is hindered by the use of different languages and schemas at the sources, as well as by inconsistencies among the data. There is thus a growing need for tools that enable the testing of consistency among data from different sources.
This paper puts forward the concept of a framework that supports the integration of UML models and ontologies written in languages such as the W3C Web Ontology Language (OWL). The framework will be based on the Meta Object Facility (MOF); a MOF metamodel (e.g., a metamodel for OWL) can be input as a specification, and the framework will then allow the user to instantiate the specified metamodel.
Consistency requirements are specified using a special modeling technique that is characterized by its use of special Boolean class attributes, termed consistency attributes, to which OCL expressions are attached. The framework makes it possible to exercise the modeling technique on two or more legacy models and in this way specify consistency between models. The output of the consistency modeling is called an integration model, which consists of the legacy models and the consistency model. The resulting integration model enables the testing of consistency between instances of legacy models; the consistency model is automatically instantiated, and the consistency attribute values that are false indicate inconsistencies.
Speicys, L., C. S. Jensen, A. Kligys, "Computational Data Modeling for Network-Constrained Moving Objects," in Proceedings of the Eleventh International Symposium on Advances in Geographic Information Systems, New Orleans, LA, pp. 118-125, November 7-8, 2003.

Publication [not publicly available]

ACM Author-Izer
Advances in wireless communications, positioning technology, and other hardware technologies combine to enable a range of applications that use a mobile user's geo-spatial data to deliver online, location-enhanced services, often referred to as location-based services. Assuming that the service users are constrained to a transportation network, this paper develops data structures that model road networks, the mobile users, and stationary objects of interest. The proposed framework encompasses two supplementary road network representations, namely a two-dimensional representation and a graph representation. These capture aspects of the problem domain that are required in order to support the querying that underlies the envisioned location-based services.
Jensen, C. S., J. Kolar, T. B. Pedersen, I. Timko, "Nearest Neighbor Queries in Road Networks," in Proceedings of the Eleventh International Symposium on Advances in Geographic Information Systems, New Orleans, LA, pp. 1-8, November 7-8, 2003.

Publication [not publicly available]

ACM Author-Izer
With wireless communications and geo-positioning being widely available, it becomes possible to offer new e-services that provide mobile users with information about other mobile objects. This paper concerns active, ordered k-nearest neighbor queries for query and data objects that are moving in road networks. Such queries may be of use in many services.
Specifically, we present an easily implementable data model that serves well as a foundation for such queries. We also present the design of a prototype system that implements the queries based on the data model. The algorithm used for the nearest neighbor search in the prototype is presented in detail. In addition, the paper reports on results from experiments with the prototype system.
Pfoser, D., C. S. Jensen, "Indexing of Network Constrained Moving Objects," in Proceedings of the Eleventh International Symposium on Advances in Geographic Information Systems, New Orleans, LA, pp. 25-32, November 7-8, 2003.

Publication [not publicly available]

ACM Author-Izer
With the proliferation of mobile computing, the ability to index efficiently the movements of mobile objects becomes important. Objects are typically seen as moving in two-dimensional (x,y) space, which means that their movements across time may be embedded in the three-dimensional (x,y,t) space. Further, the movements are typically represented as trajectories, sequences of connected line segments. In certain cases, movement is restricted, and specifically in this paper, we aim at exploiting that movements occur in transportation networks to reduce the dimensionality of the data. Briefly, the idea is to reduce movements to occur in one spatial dimension. As a consequence, the movement data becomes two-dimensional (x,t). The advantages of considering such lower-dimensional trajectories are the reduced overall size of the data and the lower-dimensional indexing challenge. Since off-the-shelf database management systems typically do not offer higher-dimensional indexing, this reduction in dimensionality allows us to use such DBMSes to store and index trajectories. Moreover, we argue that, given the right circumstances, indexing these dimensionality-reduced trajectories can be more efficient than using a three-dimensional index. This hypothesis is verified by an experimental study that incorporates trajectories stemming from real and synthetic road networks.
Nytun, J. P., C. S. Jensen, "Modeling and Testing Legacy Data Consistency Requirements," in Proceedings of the Sixth International Conference on the Unified Modeling Language, San Francisco, CA, USA, pp. 341-355, October 20-24, 2003.

Publication

Online at SpringerLink
An increasing number of data sources are available on the Internet, many of which offer semantically overlapping data, but based on different schemas, or models. While it is often of interest to integrate such data sources, the lack of consistency among them makes this integration difficult. This paper addresses the need for new techniques that enable the modeling and consistency checking for legacy data sources. Specifically, the paper contributes to the development of a framework that enables consistency testing of data coming from different types of data sources. The vehicle is UML and its accompanying XMI. The paper presents techniques for modeling consistency requirements using OCL and other UML modeling elements: it studies how models that describe the required consistencies among instances of legacy models can be designed in standard UML tools that support XMI. The paper also considers the automatic checking of consistency in the context of one of the modeling techniques. The legacy model instances that are inputs to the consistency check must be represented in XMI.
Lee, M. L., W. Hsu, C. S. Jensen, B. Cui, K. L. Teo, "Supporting Frequent Updates in R-Trees: A Bottom-Up Approach," in Proceedings of the Twenty-ninth International Conference on Very Large Data Bases, Berlin, Germany, pp. 608-619, September 9-11, 2003.

Publication
Advances in hardware-related technologies promise to enable new data management applications that monitor continuous processes. In these applications, enormous amounts of state samples are obtained via sensors and are streamed to a database. Further, updates are very frequent and may exhibit locality. While the R-tree is the index of choice for multi-dimensional data with low dimensionality, and is thus relevant to these applications, R-tree updates are also relatively inefficient. We present a bottom-up update strategy for R-trees that generalizes existing update techniques and aims to improve update performance. It has different levels of reorganization, ranging from global to local, during updates, avoiding expensive top-down updates. A compact main-memory summary structure that allows direct access to the R-tree index nodes is used together with efficient bottom-up algorithms. Empirical studies indicate that the bottom-up strategy outperforms the traditional top-down technique, leads to indices with better query performance, achieves higher throughput, and is scalable.
Hage, C., C. S. Jensen, T. B. Pedersen, L. Speicys, I. Timko, "Integrated Data Management for Mobile Services in the Real World," in Proceedings of the Twenty-ninth International Conference on Very Large Data Bases, Berlin, Germany, pp. 1019-1030, September 9-11, 2003.

Publication
Market research companies predict a huge market for services to be delivered to mobile users. Services include route guidance, point-of-interest search, metering services such as road pricing and parking payment, traffic monitoring, etc. We believe that no single such service will be the killer service, but that suites of integrated services are called for. Such integrated services reuse integrated content obtained from multiple content providers.
This paper describes concepts and techniques underlying the data management system deployed by a Danish mobile content integrator. While geo-referencing of content is important, it is even more important to relate content to the transportation infrastructure. The data management system thus relies on several sophisticated, integrated representations of the infrastructure, each of which supports its own kind of use. The paper covers data modeling, querying, and update, as well as the applications using the system.
Friis-Christensen, A., C. S. Jensen, "Object-Relational Management of Multiply Represented Geographic Entities," in Proceedings of the Fifteenth International Conference on Scientific and Statistical Database Management, Cambridge, MA, USA, pp. 183-192, July 9-11, 2003.

Publication
Multiple representation occurs when information about the same geographic entity is represented electronically more than once. This occurs frequently in practice, and it invariably results in the occurrence of inconsistencies among the different representations. We propose to resolve this situation by introducing a multiple representation management system (MRMS), the schema of which includes rules that specify how to identify representations of the same entity, rules that specify consistency requirements, and rules used to restore consistency when necessary. In this paper, we demonstrate by means of a prototype and a real-world case study that it is possible to implement a multiple representation schema language on top of an object-relational database management system. Specifically, it is demonstrated how it is possible to map the constructs of the language used for specifying the multiple representation schema to functionality available in Oracle. Though some limitations exist, Oracle has proven to be a suitable platform for implementing an MRMS.
Böhlen, M. H., C. S. Jensen, "Temporal Data Model and Query Language Concepts," Encyclopedia of Information Systems, Vol. 4, pp. 437-453, Academic Press, Inc., 2003.

Publication [not publicly available]
Sellis, T., M. Koubarakis, A. Frank, S. Grumbach, R. H. Güting, C. S. Jensen, N. Lorentzos, Y. Manolopoulos, E. Nardelli, B. Pernici, H.-J. Schek, M. Scholl, B. Theodoulidis, N. Tryfona, editors, "Spatiotemporal Databases: The Chorochronos Approach," Lecture Notes in Computer Science, Volume 2520, Springer-Verlag, 352+xiv pages, 2003.

Jensen, C. S., editor, Special Issue of the IEEE Data Engineering Bulletin on Infrastructure for Research in Spatio-Temporal Query Processing, 26(2), 54 pages, June 2003.

Ed. letter

Online at Microsoft Research
Tryfona, N., R. Price, C. S. Jensen, "Conceptual Models for Spatio-temporal Applications," in T. Sellis et al., editors, Spatiotemporal Databases: The Chorochronos Approach, Lecture Notes in Computer Science, Volume 2520, Springer-Verlag, pp. 79-116, September 2003.

Publication

Online at SpringerLink
Güting, R. H., M. H. Böhlen, M. Erwig, L. Forlizzi, C. S. Jensen, N. Lorentzos, E. Nardelli, M. Schneider, M. Vazirgiannis, "Spatiotemporal Models and Languages: An Approach Based on Data Types," in T. Sellis et al., editors, Spatiotemporal Databases: The Chorochronos Approach, Lecture Notes in Computer Science, Volume 2520, Springer-Verlag, pp. 117-176, September 2003.

Publication

Online at SpringerLink
Di Pasquale, A., L. Forlizzi, C. S. Jensen, Y. Manolopoulos, E. Nardelli, D. Pfoser, G. Proietti, S. Šaltenis, Y. Theodoridis, T. Tzouramanis, M. Vassilakopoulos, "Access Methods and Query Processing," in T. Sellis et al., editors, Spatiotemporal Databases: The Chorochronos Approach, Lecture Notes in Computer Science, Volume 2520, Springer-Verlag, pp. 203-261, September 2003.

Publication

Online at SpringerLink
Breunig, M., T. Can, M. H. Böhlen, S. Dieker, R. H. Güting, C. S. Jensen, L. Relly, P. Rigaux, H.-J. Schek, M. Scholl, "Architecture and Implementation of Spatio-Temporal DBMS," in T. Sellis et al., editors, Spatiotemporal Databases: The Chorochronos Approach, Lecture Notes in Computer Science, Volume 2520, Springer-Verlag, pp. 263-318, September 2003.

Publication

Online at SpringerLink
Jensen, C. S., T. B. Pedersen, L. Speicys, I. Timko, "Data Modeling for Mobile Services in the Real World," in Proceedings of the Eighth International Symposium on Spatial and Temporal Databases, Santorini, Greece, pp. 1-9. Lecture Notes in Computer Science, Volume 2750, July 24-27, 2003.

Publication

Online at SpringerLink
Research contributions on data modeling, data structures, query processing, and indexing for mobile services may have an impact in the longer term, but each contribution typically offers an isolated solution to one small part of the practical problem of delivering mobile services in the real world. In contrast, this paper describes holistic concepts and techniques for mobile data modeling that are readily applicable in practice. Focus is on services to be delivered to mobile users, such as route guidance, point-of-interest search, road pricing, parking payment, traffic monitoring, etc. While geo-referencing of content is important, it is even more important to relate content to the transportation infrastructure. In addition, several sophisticated, integrated representations of the infrastructure are needed.
Jensen, C. S., A. Schmidt, "Spatio-Temporal Data Exchange Standards," in C. S. Jensen, editor: Special Issue on Infrastructure for Research in Spatio-Temporal Query Processing, IEEE Data Engineering Bulletin, 26(2), pp. 51-55, June 2003.

Publication
We believe that research that concerns aspects of spatio-temporal data management may benefit from taking into account the various standards for spatio-temporal data formats. For example, this may contribute to rendering prototype software "open" and more readily useful. This paper thus identifies and briefly surveys standardization in relation to primarily the exchange and integration of spatio-temporal data. An overview of several data exchange languages is offered, along with reviews of their potential for facilitating the collection of test data and the leveraging of prototypes. The standards, most of which are XML-based, lend themselves to the integration of prototypes into middleware architectures, e.g., as Web services.
Jensen, C. S., "Introduction to Special Issue with Best Papers from EDBT 2002," Information Systems, 28(1-2), pp. 1-2, March-April 2003.

Publication [not publicly available]

Online by Elsevier at ScienceDirect
2002

Agarwal, P. K., L. J. Guibas, H. Edelsbrunner, J. Erickson, M. Isard, S. Har-Peled, J. Hershberger, C. S. Jensen, L. E. Kavraki, P. Koehl, M. Lin, D. Manocha, D. N. Metaxas, B. Mirtich, D. M. Mount, S. Muthukrishnan, D. K. Pai, E. Sacks, J. Snoeyink, S. Suri, O. Wolfson, "Algorithmic Issues in Modeling Motion," ACM Computing Surveys, Vol. 34, No. 4, pp. 550-572, 2002.

Publication [not publicly available]

ACM Author-Izer
This article is a survey of research areas in which motion plays a pivotal role. The aim of the article is to review current approaches to modeling motion together with related data structures and algorithms, and to summarize the challenges that lie ahead in producing a more unified theory of motion representation that would be useful across several disciplines.
Skyt, J., C. S. Jensen, "Persistent Views - A Mechanism for Managing Ageing Data," The Computer Journal, Vol. 45, No. 5, pp. 481-493, 2002.

Publication [not publicly available]

Online at Oxford Journals
Enabled by the continued advances in storage technologies, the amounts of on-line data grow at a rapidly increasing pace. This development is witnessed in many data warehouse-type applications, including so-called data webhouses that accumulate click streams from portals. The presence of very large and continuously growing amounts of data introduces new challenges, one of them being the need for effective management of aged data. In very large and growing databases, some data eventually becomes inaccurate or outdated and may be of reduced interest to the database applications. This paper offers a mechanism, termed persistent views, that aids in flexibly reducing the volume of data, for example, by enabling the replacement of such 'low-interest', detailed data with aggregated data. The paper motivates persistent views and precisely defines and contrasts these with the related mechanisms of views, snapshots and physical deletion. The paper also offers a provably correct foundation for implementing persistent views.
Šaltenis, S., C. S. Jensen, "Indexing of now-relative spatio-bitemporal data," The VLDB Journal, Vol. 11, No. 1, pp. 1-16, 2002.

Publication

Online at SpringerLink
Real-world entities are inherently spatially and temporally referenced, and database applications increasingly exploit databases that record the past, present, and anticipated future locations of entities, e.g., the residences of customers obtained by the geo-coding of addresses. Indices that efficiently support queries on the spatio-temporal extents of such entities are needed. However, past indexing research has progressed in largely separate spatial and temporal streams. Adding time dimensions to spatial indices, as if time were a spatial dimension, neither supports nor exploits the special properties of time. On the other hand, temporal indices are generally not amenable to extension with spatial dimensions. This paper proposes the first efficient and versatile index for a general class of spatio-temporal data: the discretely changing spatial aspect of an object may be a point or may have an extent; both transaction time and valid time are supported, and a generalized notion of the current time, now, is accommodated for both temporal dimensions. The index is based on the R*-tree and provides means of prioritizing space versus time, which enables it to adapt to spatially and temporally restrictive queries. Performance experiments are reported that evaluate pertinent aspects of the index.
Snaprud, M. H., C. S. Jensen, N. Ulltveit-Moe, J. P. Nytun, M. E. Rafoshei-Klev, A. Sawicka, O. Hanssen, "Towards a Web Accessibility Monitor," in Proceedings of the Second European Medical and Biological Engineering Conference, Vienna, Austria, December 4-8, 2002.

Publication
A tool for the assessment and monitoring of web content accessibility is proposed. The experimental prototype utilises an Internet robot and stores the collected accessibility data in a data warehouse for further analysis. The evaluation is partly based on the Web Accessibility guidelines from W3C.
Jensen, C. S., A. Kligys, T. B. Pedersen, I. Timko, "Multidimensional Data Modeling for Location-Based Services," in Proceedings of the Tenth ACM International Symposium on Advances in Geographic Information Systems, McLean, VA, pp. 55-61, November 2002.

Publication [not publicly available]

Online at ACM Digital Library
With the recent and continuing advances in areas such as wireless communications and positioning technologies, mobile, location-based services are becoming possible. Such services deliver location-dependent content to their users. More specifically, these services may capture the movements of their users in multidimensional databases, and their delivery of content in response to user requests may be based on the issuing of complex, multidimensional queries.
The application of multidimensional technology in this and other contexts poses a range of new challenges. This paper aims to provide an appropriate multidimensional data model. In particular, the paper extends an existing multidimensional data model to accommodate spatial values that exhibit partial containment relationships instead of the total containment relationships normally assumed in multidimensional data models.
Friis-Christensen, A. A., D. Skogan, C. S. Jensen, G. Skagenstein, N. Tryfona, "Management of Multiply Represented Geographic Entities," in Proceedings of the 2002 International Data Engineering and Applications Symposium, Edmonton, Canada, pp. 150-159, 2002.

Publication
Multiple representation of geographic information occurs when a real-world entity is represented more than once in the same or different databases. In this paper, we propose a new approach to the modeling of multiply represented entities and the relationships among the entities and their representations. A Multiple Representation Management System is outlined that can manage multiple representations consistently over a number of autonomous databases. Central to our approach is the Multiple Representation Schema Language that is used to configure the system. It provides an intuitive and declarative means of modeling multiple representations and specifying rules that are used to maintain consistency, match objects representing the same entity, and restore consistency if necessary.
Benetis, R., C. S. Jensen, G. Karčiauskas, S. Šaltenis, "Nearest Neighbor and Reverse Nearest Neighbor Queries for Moving Objects," in Proceedings of the 2002 International Data Engineering and Applications Symposium, Edmonton, Canada, pp. 44-53, July 17-19, 2002.

Publication
With the proliferation of wireless communications and the rapid advances in technologies for tracking the positions of continuously moving objects, algorithms for efficiently answering queries about large numbers of moving objects increasingly are needed. One such query is the reverse nearest neighbor (RNN) query that returns the objects that have a query object as their closest object. While algorithms have been proposed that compute RNN queries for non-moving objects, there have been no proposals for answering RNN queries for continuously moving objects. Another such query is the nearest neighbor (NN) query, which has been studied extensively and in many contexts. Like the RNN query, the NN query has not been explored for moving query and data points.
This paper proposes an algorithm for answering RNN queries for continuously moving points in the plane. As a part of the solution to this problem and as a separate contribution, an algorithm for answering NN queries for continuously moving points is also proposed. The results of performance experiments are reported.
Šaltenis, S., C. S. Jensen, "Indexing of Moving Objects for Location-Based Services," in Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, pp. 463-472, February 26-March 1, 2002.

Publication

Online at IEEE
Visionaries predict that the Internet will soon extend to billions of wireless devices, or objects, a substantial fraction of which will offer their changing positions to location-based services. This paper assumes an Internet-service scenario where objects that have not reported their position within a specified duration of time are expected to no longer be interested in, or of interest to, the service. Due to the possibility of many "expiring" objects, a highly dynamic database results. The paper presents an R-tree based technique for the indexing of the current positions of such objects. Different types of bounding regions are studied, and new algorithms are provided for maintaining the tree structure. Performance experiments indicate that, when compared to the approach where the objects are not assumed to expire, the new indexing technique can improve search performance by a factor of two or more without sacrificing update performance.
Skyt, J., C. S. Jensen, T. B. Pedersen, "Specification-Based Data Reduction in Dimensional Data Warehouses," in Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, p. 278, February 26-March 1, 2002.

Publication
Jensen, C. S., editor, Special Issue of the IEEE Data Engineering Bulletin on Indexing of Moving Objects, Vol. 25, No. 2, 60 pages, June 2002.

Ed. letter

Online at Microsoft Research
Jensen, C. S., K. Jeffrey, J. Pokorny, S. Šaltenis, E. Bertino, K. Böhm, M. Jarke, editors, "Advances in Database Technology," Eighth International Conference on Extending Database Technology, Prague, Czech Republic, Lecture Notes in Computer Science, Volume 2287, Springer-Verlag, 776+xvi pages, March 2002.

Online at SpringerLink
Šaltenis, S., C. S. Jensen, "Indexing of Objects on the Move," Mining Spatio-Temporal Information Systems, pp. 21-41, 2002.

Publication [not publicly available]
Visionaries predict that the Internet will soon extend to billions of wireless devices, or objects, a substantial fraction of which will offer their changing positions to location-enabled services. This chapter assumes an Internet-service scenario where objects that have not reported their position within a specified duration of time are expected to no longer be interested in, or of interest to, the service. Due to the possibility of many "expiring" objects, a highly dynamic database results.
The chapter describes two R-tree based techniques for the indexing of the current positions of such objects. These indexing techniques accommodate object positions that are described by linear functions of time. They employ novel types of bounding regions, as well as new algorithms for maintaining their tree structure. The results of quite encouraging performance experiments are described briefly.
Price, R., N. Tryfona, C. S. Jensen, "Extending UML for Space and Time-Dependent Applications," Chapter XVII, pp. 342-366, in Advanced Topics in Database Research, Vol. 1, edited by K. Siau, Idea Group Publishing, 2002.

Publication [not publicly available]
Jensen, C. S., "Location-Enabled Services - A Data Management Perspective," in The Nordic GIS Conference 2002: GI - Communication and Perspective, Aalborg, Denmark, p. 16, November 25-27, 2002.

Jensen, C. S., J. P. Nytun, M. Snaprud, "Towards Virtual Worlds and Augmented Realities: A Research Agenda," in M. Pätzold (ed.), Proceedings of the Second International Workshop on Research Directions in Mobile Communications and Services, Grimstad, Norway, pp. 19-22, September 2002.

Publication
Powerful drivers combine to enable the capture of reality in computers as well as the ubiquitous delivery of information content and services, based on the captured reality. We outline key drivers, exemplify application areas related to the research agenda, and describe briefly general software challenges as well as one specific data representation challenge to software technologies posed by the research agenda.
Jensen, C. S., S. Šaltenis, "Towards Increasingly Update Efficient Moving-Object Indexing," in Special Issue on Indexing of Moving Objects, IEEE Data Engineering Bulletin, Vol. 25, No. 2, edited by C. S. Jensen, pp. 35-40, June 2002.

Publication
Current moving-object indexing concentrates on point-objects capable of continuous movement in one-, two-, and three-dimensional Euclidean spaces, and most approaches are based on well-known, conventional spatial indices. Approaches that aim at indexing the current and anticipated future positions of moving objects generally must contend with very large update loads because of the agility of the objects indexed. At the same time, conventional spatial indices were often originally proposed in settings characterized by few updates and focus on query performance. In this paper, we characterize the challenge of moving-object indexing and discuss a range of techniques, the use of which may lead to better update performance.
Jensen, C. S., "Location-Based Services - A Data Management Perspective," in MapDays 2002 (Kartdagar 2002), Jönköping, Sweden, p. 44, April 17-19, 2002.

Publication
We are heading rapidly towards a global computing and information infrastructure that will contain billions of wirelessly connected devices, many of which will offer so-called location-based services to their mobile users always and everywhere. Indeed, users will soon take ubiquitous wireless access to information and services for granted. This scenario is made possible by the rapid advances in the underlying hardware technologies, which continue to follow variants of Moore's Law.
Examples of location-based services include tracking, way-finding, traffic management, safety-related services, and mixed-reality games, to name but a few. This paper outlines a general usage scenario for location-based services, in which data warehousing plays a central role, and it describes central challenges to be met by the involved software technologies in order for them to reach their full potential for usage in location-based services.
Jensen, C. S., S. Šaltenis, "Data Representation and Indexing in Location-Enabled M-Services," in National Science Foundation Workshop on Context-Aware Mobile Database Management, Providence, RI, USA, 3 pages, January 24-25, 2002.

Publication
Rapid, sustained advances in key computing technologies combine to enable a new class of computing services that aim to meet needs of mobile users. These ubiquitous and intelligent services adapt to each user's particular preferences and current circumstances; they are personalized. The services exploit data available from multiple sources, including data on past interactions with the users, data accessible via the Internet, and data obtained from sensors. The user's geographical location is particularly central to these services. We outline some of the research challenges that aim to meet the data representation and indexing needs of such services.
Slivinskas, G., C. S. Jensen, R. T. Snodgrass, "Bringing Order to Query Optimization," ACM SIGMOD Record 31(2), pp. 5-14, June 2002.

Publication [not publicly available]

ACM Author-Izer
A variety of developments combine to highlight the need for respecting order when manipulating relations. For example, new functionality is being added to SQL to support OLAP-style querying in which order is frequently an important aspect. The set- or multiset-based frameworks for query optimization that are currently being taught to database students are increasingly inadequate.
This paper presents a foundation for query optimization that extends existing frameworks to also capture ordering. A list-based relational algebra is provided along with three progressively stronger types of algebraic equivalences, concrete query transformation rules that obey the different equivalences, and a procedure for determining which types of transformation rules are applicable for optimizing a query. The exposition follows the style chosen by many textbooks, making it relatively easy to teach this material in continuation of the material covered in the textbooks, and to integrate this material into the textbooks.
Jensen, C. S., "Research Challenges in Location-Enabled M-Services," in Proceedings of the Third International Conference on Mobile Data Management, Singapore, pp. 3-7, January 8-11, 2002.

Publication
Rapid, sustained advances in key computing hardware technologies combine to enable a new class of computing services that aim to meet needs of mobile users. These ubiquitous and intelligent services adapt to each user's particular preferences and current circumstances; they are personalized. The services exploit data available from multiple sources, including data on past interactions with the users, data accessible via the Internet, and data obtained from sensors. The user's geographical location is particularly central to these services.
We outline some of the research challenges that aim to meet the computing needs of such services. In particular, focus is on update and query processing in the context of geo-referenced data, where certain challenges related to the data representation, indexing, and precomputation are described.
Gao, D., C. S. Jensen, R. T. Snodgrass, M. D. Soo, "Join Operations in Temporal Databases," TimeCenter Technical Report TR-71, 50 pages, October 2002.

Publication
Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the evaluation of joins with equality predicates, rather than the inequality predicates prevalent in valid-time queries. Second, the presence of temporally-varying data dramatically increases the size of the database. These factors indicate that specialized techniques are needed to efficiently evaluate temporal joins.
We address this need for efficient join evaluation in temporal databases. Our purpose is two-fold. We first survey all previously proposed temporal join operators. While many temporal join operators have been defined in previous work, this work has been done largely in isolation from competing proposals, with little, if any, comparison of the various operators. We then address evaluation algorithms, comparing the applicability of various algorithms to the temporal join operators, and describing a performance study involving algorithms for one important operator, the temporal equijoin. Our focus, with respect to implementation, is on non-index based join algorithms. Such algorithms do not rely on auxiliary access paths, but may exploit sort orderings to achieve efficiency.
Jensen, C. S., A. Kligys, T. B. Pedersen, I. Timko, "Multidimensional Data Modeling For Location-Based Services," DB Technical Report TR-2, 30 pages, September 2002.

Publication
With the recent and continuing advances in areas such as wireless communications and positioning technologies, mobile, location-based services are becoming possible. Such services deliver location-dependent content to their users. More specifically, these services may capture the movements of their users in multidimensional databases, and their delivery of content in response to user requests may be based on the issuing of complex, multidimensional queries.
The application of multidimensional technology in this context poses a range of new challenges. The specific challenge addressed here concerns the provision of an appropriate multidimensional data model. In particular, the paper extends an existing multidimensional data model and algebraic query language to accommodate spatial values that exhibit partial containment relationships instead of the total containment relationships normally assumed in multidimensional data models. Partial containment introduces imprecision in aggregation paths. The paper proposes a method for evaluating the imprecision of such paths. The paper also offers transformations of dimension hierarchies with partial containment relationships to simple hierarchies, to which existing precomputation techniques are applicable.
Slivinskas, G., C. S. Jensen, R. T. Snodgrass, "Bringing Order to Query Optimization," DB Technical Report TR-1, 19 pages, September 2002.

Publication
A variety of developments combine to highlight the need for respecting order when manipulating relations. For example, new functionality is being added to SQL to support OLAP-style querying in which order is frequently an important aspect. The set- or multiset-based frameworks for query optimization that are currently being taught to database students are increasingly inadequate.
This paper presents a foundation for query optimization that extends existing frameworks to also capture ordering. A list-based relational algebra is provided along with three progressively stronger types of algebraic equivalences, concrete query transformation rules that obey the different equivalences, and a procedure for determining which types of transformation rules are applicable for optimizing a query. The exposition follows the style chosen by many textbooks, making it relatively easy to teach this material in continuation of the material covered in the textbooks, and to integrate this material into the textbooks.
Slivinskas, G., C. S. Jensen, "Enhancing an Extensible Query Optimizer with Support for Multiple Equivalence Types," TimeCenter Technical Report TR-70, 21 pages, August 2002.

Publication
Database management systems are continuously being extended with support for new types of data and more advanced querying capabilities. In large part because of this, query optimization has remained a very active area of research throughout the past two decades. At the same time, current commercial optimizers are hard to modify, to incorporate desired changes in, e.g., query algebras, transformation rules, and search strategies. This has led to a number of research contributions that aim at creating extensible query optimizers. Examples include Starburst, Volcano, and OPT++.
This paper reports on a study that has enhanced Volcano to support a relational algebra with added temporal operators, such as temporal join and aggregation. This includes the handling of algorithms and cost formulas for these new operators, six types of query equivalences, and accompanying query transformation rules. The paper shows how the Volcano search-space generation and plan-search algorithms were extended to support the six equivalence types, describes other key implementation tasks, and evaluates the extensibility of Volcano.
Gregersen, H., C. S. Jensen, "On the Ontological Expressiveness of Temporal Extensions to the Entity-Relationship Model," TimeCenter Technical Report TR-69, 21 pages, August 2002.

Publication
It is widely recognized that temporal aspects of database schemas are prevalent, but also difficult to capture using the ER model. The database research community's response has been to develop temporally enhanced ER models. However, these models have not been subjected to systematic evaluation. In contrast, the evaluation of modeling methodologies for information systems development is a very active area of research in the information systems engineering community, where the need for systematic evaluations of modeling methodologies is well recognized.
Based on a framework from information systems engineering, this paper evaluates the ontological expressiveness of three different temporal enhancements to the ER model, the Entity-Relation-Time model, the TERC+ model, and the Time Extended ER model. Each of these temporal ER model extensions is well-documented, and together the models represent a substantial range of the design space for temporal ER extensions. The evaluation considers the uses of the models for both analysis and design, and the focus is on how well the models capture temporal aspects of reality as well as of relational database designs.
2001
Dyreson, C. E., M. H. Böhlen, C. S. Jensen, "MetaXPath," Journal of Digital Information, Vol. 2, No. 2, December 2001.

Publication

Online at JoDI
This paper presents the METAXPath data model and query language. METAXPath extends XPath with support for XML metadata. XPath is a specification language for locations in an XML document. It serves as the basis for XML query languages like XSLT and the XML Query Algebra.
The METAXPath data model is a nested XPath tree. Each level of metadata induces a new level of nesting. The data model separates metadata and data into different dataspaces, supports meta-metadata, and enables sharing of metadata common to a group of nodes without duplication. The METAXPath query language has a level shift operator to shift a query from a data level to a metadata level. METAXPath maximally reuses XPath, hence the changes needed to support metadata are few. METAXPath is fully compatible with XPath.
Pedersen, T. B., C. S. Jensen, "Multidimensional Databases," IEEE Computer, Vol. 34, No. 12, pp. 40-46, December 2001.

Publication
Multidimensional database technology is a key factor in the interactive analysis of large amounts of data for decision-making purposes. In contrast to previous technologies, these databases view data as multidimensional cubes that are particularly well suited for data analysis. Multidimensional models categorize data either as facts with associated numerical measures or as textual dimensions that characterize the facts. Queries aggregate measure values over a range of dimension values to provide results such as total sales per month of a given product. Multidimensional database technology is being applied to distributed data and to new types of data that current technology often cannot adequately analyze. For example, classic techniques such as preaggregation cannot ensure fast query response times when data, such as that obtained from sensors or GPS-equipped moving objects, changes continuously. Multidimensional database technology will increasingly be applied where analysis results are fed directly into other systems, thereby eliminating humans from the loop. When coupled with the need for continuous updates, this context poses stringent performance requirements not met by current technology.
Pedersen, T. B., C. S. Jensen, C. E. Dyreson, "A Foundation for Capturing and Querying Complex Multidimensional Data," Information Systems (special issue on data warehousing), Vol. 26, No. 5, pp. 383-423, July 2001.

Publication [not publicly available]

Online by Elsevier at ScienceDirect
On-line analytical processing (OLAP) systems considerably improve data analysis and are finding wide-spread use. OLAP systems typically employ multidimensional data models to structure their data. This paper identifies 11 modeling requirements for multidimensional data models. These requirements are derived from an assessment of complex data found in real-world applications. A survey of 14 multidimensional data models reveals shortcomings in meeting some of the requirements. Existing models do not support many-to-many relationships between facts and dimensions, lack built-in mechanisms for handling change and time, lack support for imprecision, and are generally unable to insert data with varying granularities. This paper defines an extended multidimensional data model and algebraic query language that address all 11 requirements. The model reuses the common multidimensional concepts of dimension hierarchies and granularities to capture imprecise data. For queries that cannot be answered precisely due to the imprecise data, techniques are proposed that take into account the imprecision in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. In addition, alternative queries unaffected by imprecision are offered. The data model and query evaluation techniques discussed in this paper can be implemented using relational database technology. The approach is also capable of exploiting multidimensional query processing techniques like pre-aggregation. This yields a practical solution with low computational overhead.
Slivinskas, G., C. S. Jensen, R. T. Snodgrass, "A Foundation for Conventional and Temporal Query Optimization Addressing Duplicates and Ordering," IEEE Transactions on Knowledge and Data Engineering (special issue with extended versions of best papers from ICDE'2000), Vol. 13, No. 1, pp. 21-49, January/February 2001.

Publication
Most real-world databases contain substantial amounts of time-referenced, or temporal, data. Recent advances in temporal query languages show that such database applications may benefit substantially from built-in temporal support in the DBMS. To achieve this, temporal query representation, optimization, and processing mechanisms must be provided. This paper presents a foundation for query optimization that integrates conventional and temporal query optimization and is suitable for both conventional DBMS architectures and ones where the temporal support is obtained via a layer on top of a conventional DBMS. This foundation captures duplicates and ordering for all queries, as well as coalescing for temporal queries, thus generalizing all existing approaches known to the authors. It includes a temporally extended relational algebra to which SQL and temporal SQL queries may be mapped, six types of algebraic equivalences, concrete query transformation rules that obey different equivalences, a procedure for determining which types of transformation rules are applicable for optimizing a query, and a query plan enumeration algorithm. The presented approach partitions the work required by the database implementor to develop a provably correct query optimizer into four stages: The database implementor has to 1) specify operations formally, 2) design and prove correct appropriate transformation rules that satisfy any of the six equivalence types, 3) augment the mechanism that determines when the different types of rules are applicable to ensure that the enumeration algorithm applies the rules correctly, and 4) ensure that the mapping generates a correct initial query plan.
Price, R., N. Tryfona, C. S. Jensen, "Modeling Topological Constraints in Spatial Part-Whole Relationships," in Proceedings of the Twentieth International Conference on Conceptual Modeling, Yokohama, Japan, pp. 27-40, November 27-30, 2001.

Publication

Online at SpringerLink
To facilitate development of spatial applications, we investigate the problem of modeling topological constraints in part-whole relationships between spatial objects, where the related objects may themselves be composite. An example would be countries that belong to a supranational organization, where the countries are themselves composed of states. Current topological classification schemes are restricted to simple, bounded, regular, and/or 0-2D spatial data types; do not support the set-based topological constraints required to describe inter-part relationships such as those between members of a supranational organization; and focus primarily on query rather than design. We propose an approach to modeling topological relationships that allows specification of binary and set-based topological constraints on composite spatial objects. This approach does not depend on restricting the type of spatial objects, can be used to describe part-whole and inter-part relationships, and is at a level of detail suitable for use in conceptual modeling.
Friis-Christensen, A., N. Tryfona, C. S. Jensen, "Requirements and Research Issues in Geographic Data Modeling," in Proceedings of the Ninth ACM International Symposium on Advances in Geographic Information Systems, Atlanta, GA, USA, pp. 2-8, November 9-10, 2001.

Publication [not publicly available]

ACM Author-Izer
It is well-documented in the literature that geographic data have special characteristics that make the use of extensions to standard modeling languages and techniques, such as the Unified Modeling Language, attractive. Based on a real-world application from the Danish National Survey and Cadastre, this paper presents requirements to geographic data modeling notations. Existing notations are then evaluated against the requirements, and a case study is carried out. The result is an identification of pertinent aspects of geographic data modeling (including roles of geographic objects, constraints on objects, and quality of data) that are not handled satisfactorily by existing proposals.
Dyreson, C. E., M. H. Böhlen, C. S. Jensen, "MetaXPath," in Proceedings of the 2001 International Conference on Dublin Core and Metadata Applications, Tokyo, Japan, pp. 17-23, October 22-26, 2001.

Publication

Online at NII
This paper presents the MetaXPath data model and query language. MetaXPath extends XPath with support for XML metadata. XPath is a specification language for locations in an XML document. It serves as the basis for XML query languages like XSLT and the XML Query Algebra.
The MetaXPath data model is a nested XPath tree. Each level of metadata induces a new level of nesting. The data model separates metadata and data into different dataspaces, supports meta-metadata, and enables sharing of metadata common to a group of nodes without duplication. The MetaXPath query language has a level shift operator to shift a query from a data level to a metadata level. MetaXPath maximally reuses XPath, hence the changes needed to support metadata are few. MetaXPath is fully compatible with XPath.
Slivinskas, G., C. S. Jensen, "Enhancing an Extensible Query Optimizer with Support for Multiple Equivalence Types," in Proceedings of the Fifth East-European Conference on Advances in Databases and Information Systems, Vilnius, Lithuania, pp. 55-69, September 25-28, 2001.

Publication

Online at SpringerLink
Database management systems are continuously being extended with support for new types of data and advanced querying capabilities. In large part because of this, query optimization has remained a very active area of research throughout the past two decades. At the same time, current commercial optimizers are hard to modify, to incorporate desired changes in, e.g., query algebras or transformation rules. This has led to a number of research contributions aiming to create extensible query optimizers, such as Starburst, Volcano, and OPT++.
This paper reports on a study that has enhanced Volcano to support a relational algebra with added temporal operators, such as temporal join and aggregation. These enhancements include the introduction of algorithms and cost formulas for the new operators, six types of query equivalences, and accompanying query transformation rules. The paper describes extensions to Volcano's structure and algorithms and summarizes implementation experiences.
Lomet, D., C. S. Jensen, "Transaction Timestamping in (Temporal) Databases," in Proceedings of the 27th International Conference on Very Large Databases, Rome, Italy, pp. 441-450, September 11-14, 2001.

Publication
Many database applications need accountability and traceability that necessitate retaining previous database states. For a transaction-time database supporting this, the choice of times used to timestamp database records, to establish when records are or were current, needs to be consistent with a committed transaction serialization order. Previous solutions have chosen timestamps at commit time, selecting a time that agrees with commit order. However, SQL standard databases can require an earlier choice because a statement within a transaction may request "current time." Managing timestamps chosen before a serialization order is established is the challenging problem we solve here.
By building on two-phase locking concurrency control, we can delay a transaction's choice of a timestamp, reducing the chance that transactions may need to be aborted in order to keep timestamps consistent with a serialization order. Also, while timestamps stored with records in a transaction-time database make it possible to directly identify write-write and write-read conflicts, handling read-write conflicts requires more. Our simple auxiliary structure conservatively detects read-write conflicts, and hence provides transaction timestamps that are consistent with a serialization order.
Slivinskas, G., C. S. Jensen, R. T. Snodgrass, "Adaptable Query Optimization and Evaluation in Temporal Middleware," in Proceedings of the 2001 ACM SIGMOD International Conference on the Management of Data, Santa Barbara, CA, USA, pp. 127-138, May 21-24, 2001.

Publication [not publicly available]

ACM Author-Izer
Time-referenced data are pervasive in most real-world databases. Recent advances in temporal query languages show that such database applications may benefit substantially from built-in temporal support in the DBMS. To achieve this, temporal query optimization and evaluation mechanisms must be provided, either within the DBMS proper or as a source level translation from temporal queries to conventional SQL. This paper proposes a new approach: using a middleware component on top of a conventional DBMS. This component accepts temporal SQL statements and produces a corresponding query plan consisting of algebraic as well as regular SQL parts. The algebraic parts are processed by the middleware, while the SQL parts are processed by the DBMS. The middleware uses performance feedback from the DBMS to adapt its partitioning of subsequent queries into middleware and DBMS parts. The paper describes the architecture and implementation of the temporal middleware component, termed TANGO, which is based on the Volcano extensible query optimizer and the XXL query processing library. Experiments with the system demonstrate the utility of the middleware's internal processing capability and its cost-based mechanism for apportioning the processing between the middleware and the underlying DBMS.
Pfoser, D., C. S. Jensen, "Querying the Trajectories of On-Line Mobile Objects," in Proceedings of the Second ACM International Workshop on Data Engineering for Wireless and Mobile Access, Santa Barbara, California, USA, pp. 66-73, May 20, 2001.

Publication [not publicly available]

ACM Author-Izer
Position data is expected to play a central role in a wide range of mobile computing applications, including advertising, leisure, safety, security, tourist, and traffic applications. Applications such as these are characterized by large quantities of wirelessly Internet-worked, position-aware mobile objects that receive services where the objects' position is essential. The movement of an object is captured via sampling, resulting in a trajectory consisting of a sequence of connected line segments for each moving object. This paper presents a technique for querying these trajectories. The technique uses indices for the processing of spatiotemporal range queries on trajectories. If object movement is constrained by the presence of infrastructure, e.g., lakes, park areas, etc., the technique is capable of exploiting this to reduce the range query, the purpose being to obtain better query performance. Specifically, an algorithm is proposed that segments the original range query based on the infrastructure contained in its range. The applicability and limitations of the proposal are assessed via empirical performance studies with varying datasets and parameter settings.
Pedersen, T. B., C. S. Jensen, C. E. Dyreson, "Preaggregation for irregular OLAP Hierarchies with the TreeScape System," in Demo Proceedings of the Seventeenth International Conference on Data Engineering, Heidelberg, Germany, pp. 1-3, April 2-6, 2001.

Publication
We present the TreeScape system which, unlike any other system known to the authors, enables the reuse of pre-computed aggregate query results involving the kinds of irregular dimension hierarchies that occur frequently in practice. The system establishes a foundation for obtaining high-performance query processing while precomputing only few aggregates. It is demonstrated how this reuse of aggregates is enabled through dimension transformations that occur transparently to the user.
Jensen, C. S., M. Schneider, B. Seeger, V. J. Tsotras, (Eds), "Advances in Spatial and Temporal Databases," Seventh International Symposium, SSTD 2001, Redondo Beach, CA, USA, Lecture Notes in Computer Science, Volume 2121, Springer-Verlag, July 2001.

Online at SpringerLink
Jensen, C. S., "Temporal Database Concepts," in Encyclopedia of Microcomputers, Volume 27, Supplement 6, pp. 371-391, edited by A. Kent and J. Williams, published by Marcel Dekker, New York, NY, 2001.

Publication [not publicly available]
Agarwal, P. K., L. J. Guibas, H. Edelsbrunner, J. Erickson, M. Isard, S. Har-Peled, J. Hershberger, C. S. Jensen, L. Kavraki, P. Koehl, M. Lin, D. Manocha, D. Metaxas, B. Mirtich, D. Mount, S. Muthukrishnan, D. Pai, E. Sacks, J. Snoeyink, S. Suri, O. Wolfson, "Algorithmic Issues in Modeling Motion," Report from the National Science Foundation/Army Research Office Workshop on Motion Algorithms. The workshop was held in August 2000. 27 pages, September 2001.

Publication
This report presents the results of the workshop on Algorithmic Issues in Modeling Motion, funded by NSF and ARO, held on August 6 and 7, 2000 at Duke University, Durham, NC. The report identifies research areas in which motion plays a pivotal role, summarizes the challenges that lie ahead in dealing with motion, and makes a number of specific recommendations to address some of the challenges presented.
Jensen, C. S., A. Friis-Christensen, T. B. Pedersen, D. Pfoser, S. Šaltenis, N. Tryfona, "Location-Based Services - A Database Perspective," in Proceedings of the Eighth Scandinavian Research Conference on Geographical Information Sciences, Norway, pp. 59-68, June 25-27, 2001.

Publication
We are heading rapidly towards a global computing and information infrastructure that will contain billions of wirelessly connected devices, many of which will offer so-called location-based services to their mobile users always and everywhere. Indeed, users will soon take ubiquitous wireless access to information and services for granted. This scenario is made possible by the rapid advances in the underlying hardware technologies, which continue to follow variants of Moore's Law.
Examples of location-based services include tracking, way-finding, traffic management, safety-related services, and mixed-reality games, to name but a few. This paper outlines a general usage scenario for location-based services, in which data warehousing plays a central role, and it describes central challenges to be met by the involved software technologies in order for them to reach their full potential for usage in location-based services.
Jensen, C. S., T. B. Pedersen, "Mobile E-Services and Their Challenges to Data Warehousing," in Proceedings of the Workshop des Arbeitskreises `Konzepte des Data Warehousing', Oldenburg, Germany, 9 pages, March 6-7, 2001.

Publication
Continued advances in hardware technologies combine to create a new class of information services, termed mobile e-services, or simply m-services, which exploits the advances in, among others, wireless communications, positioning, and miniaturization. Because the users do not merely interact with the services from behind stationary desktop computers, but from a variety of increasingly unobtrusive information appliances while on the move, location information plays a fundamental role, and new types of services become of interest. Such services include tracking, way-finding, traffic management, safety-related services, and mixed-reality games, to name but a few. Data warehousing has the potential for playing an essential part in m-services. However, for data warehousing to be successful in an m-service scenario, new challenges must be met by data warehousing technologies. Such challenges include support for non-standard dimension hierarchies and imprecision and varying precision in the data; transportation networks; continuous change; closed-loop usage; and dynamic services. This paper outlines a general m-services scenario and describes central challenges to be met by data warehousing in order for it to reach its full potential for usage in m-services.
Damsgaard, J., J. Hørlück, C. S. Jensen, "From Financial Wholesale to Retail - Preparing an IT-infrastructure for e-commerce," Teaching case. 28 pages, May, 2001.

In the beginning of 2001, the newly employed Internet director of the largest Danish mortgage company Nykredit was on his way to a meeting to review and revise Nykredit's Internet strategy. The principal consideration was how to distribute Nykredit's many products and services in the near future. One basic question summarized the pending decision well: Should Nykredit primarily rely on a call-center and a strong web presence or should Nykredit extend its present branch network, focus on building up a finely meshed physical network, and distribute its services mainly through branches and offices? Nykredit was in a unique position in that it did not have many physical offices, and therefore Nykredit did in fact have a real choice that many other financial companies did not have in practice. On the other hand, the choices were only real if Nykredit could meet the technological challenge and if they, by the use of technology, could convince the customers that personal service was possible without a strong physical presence.
The meeting was to take place at Nykredit's Data Center in Aalborg. As he was walking through the building's long passageway, he was thinking to himself how IT was once a remote back-office support service and how it had now moved to the center stage and had become the business storefront. The fact that the strategy meeting took place at the data center underlined the importance of IT to Nykredit. Any decisions on distribution had to be carefully aligned not only to traditional considerations such as Nykredit's own organization, the market and the competitors' initiatives, but also to Nykredit's existing IT infrastructure and its IT department's capabilities to produce modern software fast.
Damsgaard, J., J. Hørlück, C. S. Jensen, "From Financial Wholesale to Retail - Preparing an IT-infrastructure for e-commerce," Teaching note accompanying the above, 8 pages, May, 2001.

Jensen, C. S., "Virtual Worlds and Augmented Realities," Vision statement invited by the Danish Technical Research Council, 2 pages, July 1, 2001.

Benetis, R., C. S. Jensen, G. Karčiauskas, S. Šaltenis, "Nearest Neighbor and Reverse Nearest Neighbor Queries for Moving Objects," TimeCenter Technical Report TR-66, 21 pages, October 2001.

Publication
With the proliferation of wireless communications and the rapid advances in technologies for tracking the positions of continuously moving objects, algorithms for efficiently answering queries about large numbers of moving objects increasingly are needed. One such query is the reverse nearest neighbor (RNN) query that returns the objects that have a query object as their closest object. While algorithms have been proposed that compute RNN queries for non-moving objects, there have been no proposals for answering RNN queries for continuously moving objects. Another such query is the nearest neighbor (NN) query, which has been studied extensively and in many contexts. Like the RNN query, the NN query has not been explored for moving query and data points. This paper proposes an algorithm for answering RNN queries for continuously moving points in the plane. As a part of the solution to this problem and as a separate contribution, an algorithm for answering NN queries for continuously moving points is also proposed. The results of performance experiments are reported.
Skyt, J., C. S. Jensen, "Persistent Views - a Mechanism for Managing Aging Data," TimeCenter Technical Report TR-65, 29 pages, August 2001.

Publication
Enabled by the continued advances in storage technologies, the amounts of on-line data grow at a rapidly increasing pace. This development is witnessed in, e.g., so-called data webhouses that accumulate click streams from portals, and in other data warehouse-type applications. The presence of very large and continuously growing amounts of data introduces new challenges, one of them being the need for effective management of aged data. In very large and growing databases, some data eventually becomes inaccurate or outdated, and may be of reduced interest to the database applications. This paper offers a mechanism, termed persistent views, that aids in flexibly reducing the volume of data, e.g., by enabling the replacement of such "low-interest," detailed data with aggregated data. The paper motivates persistent views and precisely defines and contrasts these with the related mechanisms of views, snapshots, and physical deletion. The paper also offers a provably correct foundation for implementing persistent views.
Šaltenis, S., C. S. Jensen, "Indexing of Moving Objects for Location-Based Services," TimeCenter Technical Report TR-63, 24 pages, July 2001.

Publication
With the continued proliferation of wireless networks, e.g., based on such evolving standards as WAP and Bluetooth, visionaries predict that the Internet will soon extend to billions of wireless devices, or objects. A substantial fraction of these will offer their changing positions to the (location-based) services they either use or support. As a result, software technologies that enable the management of the positions of objects capable of continuous movement are in increasingly high demand. This paper assumes what we consider a realistic Internet-service scenario where objects that have not reported their position within a specified duration of time are expected to no longer be interested in, or of interest to, the service. In this scenario, the possibility of substantial quantities of "expiring" objects introduces a new kind of implicit update, which contributes to rendering the database highly dynamic. The paper presents an R-tree based technique for the indexing of the current positions of such objects. Extensive performance experiments explore the properties of the types of bounding regions that are candidates for being used in the internal entries of the index, and they show that, when compared to the approach where the objects are not assumed to expire, the new indexing technique can improve the search performance by as much as a factor of two or more without sacrificing update performance.
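The expiry rule described in the abstract above can be illustrated with a small sketch: an object counts as live only if its last position report is recent enough. This shows the scenario only, not the paper's R-tree based index; the names `last_report`, `now`, and `max_age` are assumptions made for illustration.

```python
def live_objects(last_report, now, max_age):
    """Return ids of objects whose last report is at most max_age time units old.

    last_report maps object id -> time of the most recent position report.
    Objects that fall outside the window expire implicitly, without any
    explicit delete being issued.
    """
    return {oid for oid, t in last_report.items() if now - t <= max_age}

reports = {"a": 100, "b": 40, "c": 95}
live = live_objects(reports, now=110, max_age=20)
```

Here object "b" drops out of the result solely because time has passed, which is the kind of implicit update the abstract refers to.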
Pfoser, D., C. S. Jensen, "Querying the Trajectories of On-Line Mobile Objects," TimeCenter Technical Report TR-57, 19 pages, June 2001.

Publication
Position data is expected to play a central role in a wide range of mobile computing applications, including advertising, leisure, safety, security, tourist, and traffic applications. Applications such as these are characterized by large quantities of wirelessly Internet-worked, position-aware mobile objects that receive services where the objects' position is essential. The movement of an object is captured via sampling, resulting in a trajectory consisting of a sequence of connected line segments for each moving object. This paper presents a technique for querying these trajectories. The technique uses indices for the processing of spatiotemporal range queries on trajectories. If object movement is constrained by the presence of infrastructure, e.g., lakes, park areas, etc., the technique is capable of exploiting this to reduce the range query, the purpose being to obtain better query performance. Specifically, an algorithm is proposed that segments the original range query based on the infrastructure contained in its range. The applicability and limitations of the proposal are assessed via empirical performance studies with varying datasets and parameter settings.
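To make the trajectory representation in the abstract above concrete, here is a minimal sketch that treats sampled positions as endpoints of connected line segments and interpolates linearly between them. It assumes samples of the form (t, x, y) with strictly increasing timestamps; the function name is hypothetical, and the sketch does not reproduce the paper's indexing or query-segmentation technique.

```python
from bisect import bisect_right

def position_at(samples, t):
    """Estimate (x, y) at time t from samples [(t, x, y), ...] sorted by t.

    Consecutive samples are joined by straight line segments. Returns None
    when t lies outside the sampled time span.
    """
    times = [s[0] for s in samples]
    if not times[0] <= t <= times[-1]:
        return None
    i = bisect_right(times, t) - 1      # segment starting at samples[i]
    if i == len(samples) - 1:           # t is exactly the last sample time
        return samples[i][1:]
    t0, x0, y0 = samples[i]
    t1, x1, y1 = samples[i + 1]
    f = (t - t0) / (t1 - t0)            # fraction of the segment covered
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))

trajectory = [(0, 0.0, 0.0), (10, 10.0, 0.0), (20, 10.0, 5.0)]
midway_first = position_at(trajectory, 5)
midway_second = position_at(trajectory, 15)
```

A spatiotemporal range query could then be answered naively by testing each segment against the query window; an index avoids that full scan.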
Slivinskas, G., C. S. Jensen, R. T. Snodgrass, "Adaptable Query Optimization and Evaluation in Temporal Middleware," TimeCenter Technical Report TR-56, 28 pages, March 2001.

Publication
Time-referenced data are pervasive in most real-world databases. Recent advances in temporal query languages show that such database applications may benefit substantially from built-in temporal support in the DBMS. To achieve this, temporal query optimization and evaluation mechanisms must be provided, either within the DBMS proper or as a source level translation from temporal queries to conventional SQL. This paper proposes a new approach: using a middleware component on top of a conventional DBMS. This component accepts temporal SQL statements and produces a corresponding query plan consisting of algebraic as well as regular SQL parts. The algebraic parts are processed by the middleware, while the SQL parts are processed by the DBMS. The middleware uses performance feedback from the DBMS to adapt its partitioning of subsequent queries into middleware and DBMS parts. The paper describes the architecture and implementation of the temporal middleware component, termed TANGO, which is based on the Volcano extensible query optimizer and the XXL query processing library. Experiments with the system demonstrate the utility of the middleware's internal processing capability and its cost-based mechanism for apportioning the processing between the middleware and the underlying DBMS.
2000
Price, R., N. Tryfona, C. S. Jensen, "Extended Spatiotemporal UML: Motivations, Requirements, and Constructs," Journal of Database Management (special issue: Systems Analysis and Design Using UML), Vol. 11, No. 4, pp. 13-27, October/December 2000.

Publication [not publicly available]
Böhlen, M. H., C. S. Jensen, R. T. Snodgrass, "Temporal Statement Modifiers," ACM Transactions on Database Systems, Vol. 25, No. 4, pp. 407-456, December 2000.

Publication [not publicly available]

ACM Author-Izer
A wide range of database applications manage time-varying data. Many temporal query languages have been proposed, each one the result of many carefully made yet subtly interacting design decisions. In this article we advocate a different approach to articulating a set of requirements, or desiderata, that directly imply the syntactic structure and core semantics of a temporal extension of an (arbitrary) nontemporal query language. These desiderata facilitate transitioning applications from a nontemporal query language and data model, which has received only scant attention thus far. The paper then introduces the notion of statement modifiers that provide a means of systematically adding temporal support to an existing query language. Statement modifiers apply to all query language statements, for example, queries, cursor definitions, integrity constraints, assertions, views, and data manipulation statements. We also provide a way to systematically add temporal support to an existing implementation. The result is a temporal query language syntax, semantics, and implementation that derives from first principles. We exemplify this approach by extending SQL-92 with statement modifiers. This extended language, termed ATSQL, is formally defined via a denotational-semantics-style mapping of temporal statements to expressions using a combination of temporal and conventional relational algebraic operators.
Güting, R. H., M. Böhlen, M. Erwig, C. S. Jensen, N. Lorentzos, M. Schneider, M. Vazirgiannis, "A Foundation for Representing and Querying Moving Objects," ACM Transactions on Database Systems, Vol. 25, No. 1, pp. 1-42, March 2000.

Publication [not publicly available]

ACM Author-Izer
Spatio-temporal databases deal with geometries changing over time. The goal of our work is to provide a DBMS data model and query language capable of handling such time-dependent geometries, including those changing continuously that describe moving objects. Two fundamental abstractions are moving point and moving region, describing objects for which only the time-dependent position, or position and extent, respectively, are of interest. We propose to represent such time-dependent geometries as attribute data types with suitable operations, that is, to provide an abstract data type extension to a DBMS data model and query language. This paper presents a design of such a system of abstract data types. It turns out that besides the main types of interest, moving point and moving region, a relatively large number of auxiliary data types are needed. For example, one needs a line type to represent the projection of a moving point into the plane, or a moving real to represent the time-dependent distance of two points. It then becomes crucial to achieve (i) orthogonality in the design of the system, i.e., type constructors can be applied uniformly; (ii) genericity and consistency of operations, i.e., operations range over as many types as possible and behave consistently; and (iii) closure and consistency between structure and operations of nontemporal and related temporal types. Satisfying these goals leads to a simple and expressive system of abstract data types that may be integrated into a query language to yield a powerful language for querying spatio-temporal data, including moving objects. The paper formally defines the types and operations, offers detailed insight into the considerations that went into the design, and exemplifies the use of the abstract data types using SQL. The paper offers a precise and conceptually clean foundation for implementing a spatio-temporal DBMS extension.
Torp, K., R. T. Snodgrass, C. S. Jensen, "Effective Timestamping in Databases," The VLDB Journal, Vol. 8, No. 4, pp. 267-288, February 2000.

Publication

Online at SpringerLink
Many existing database applications place various timestamps on their data, rendering temporal values such as dates and times prevalent in database tables. During the past two decades, several dozen temporal data models have appeared, all with timestamps being integral components. The models have used timestamps for encoding two specific temporal aspects of database facts, namely transaction time, when the facts are current in the database, and valid time, when the facts are true in the modeled reality. However, with few exceptions, the assignment of timestamp values has been considered only in the context of individual modification statements.
This paper takes the next logical step: It considers the use of timestamping for capturing transaction and valid time in the context of transactions. The paper initially identifies and analyzes several problems with straightforward timestamping, then proceeds to propose a variety of techniques aimed at solving these problems. Timestamping the results of a transaction with the commit time of the transaction is a promising approach. The paper studies how this timestamping may be done using a spectrum of techniques. While many database facts are valid until now, the current time, this value is absent from the existing temporal types. Techniques that address this problem using different substitute values are presented. Using a stratum architecture, the performance of the different proposed techniques is studied. Although querying and modifying time-varying data is accompanied by a number of subtle problems, we present a comprehensive approach that provides application programmers with simple, consistent, and efficient support for modifying bitemporal databases in the context of user transactions.
Price, R., N. Tryfona, C. S. Jensen, "Modeling part-whole relationships for spatial data," in Proceedings of the Eighth International Symposium of ACM GIS, Washington DC, pp. 1-8, November 10-11, 2000.

Publication [not publicly available]

ACM Author-Izer
Spatial applications must manage part-whole (PW) relationships between spatial objects, for example, the division of an administrative region into zones based on land use. Support for conceptual modeling of relationships between parts and whole, such as aggregation and membership, has been well researched in the object-oriented (OO) community; however, spatial data has generally not been considered. We propose here a practical approach to integrating support for spatial PW relationships into conceptual modeling languages. Three different types of relationships - spatial part, spatial membership, and spatial inclusion - that are of general utility in spatial applications are identified and formally defined using a consistent classification framework based on spatial derivation and constraint relationships. An extension of the Unified Modeling Language (UML) for spatiotemporal data, namely Extended Spatiotemporal UML, is used to demonstrate the feasibility of using such an approach to define modeling constructs supporting spatial PW relationships.
Pedersen, T. B., A. Shoshani, J. Gu, C. S. Jensen, "Extending OLAP Querying to External Object Databases," in Proceedings of the Ninth International Conference on Information and Knowledge Management, Washington, DC, pp. 405-413, November 6-11, 2000.

Publication [not publicly available]

ACM Author-Izer
On-Line Analytical Processing (OLAP) systems based on a multidimensional view of data have found widespread use in business applications and are being used increasingly in non-standard applications. These systems provide good performance and ease-of-use. However, the complex structures and relationships inherent in data in non-standard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well.
This paper presents the concepts and techniques underlying a flexible, "multi-model" federated system that enables OLAP users to exploit simultaneously the features of OLAP and object database systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for multidimensional data and object database systems for more complex, general data. Additionally, physical data integration can be avoided. As a vehicle for demonstrating the capabilities of the system, a prototypical OLAP language is defined and extended to naturally support queries that involve data in object databases. The language permits selection criteria that reference object data, queries that return combinations of OLAP and object data, and queries that group multidimensional data according to object data. The system is designed to be aggregation-safe, in the sense that it exploits the aggregation semantics of the data to prevent incorrect or meaningless query results. A prototype implementation of the system is reported.
Skyt, J., C. S. Jensen, "Managing Aging Data Using Persistent Views (Extended Abstract)," in Proceedings of the Fifth IFCIS International Conference on Cooperative Information Systems, Eilat, Israel, pp. 132-137. The proceedings were published as Lecture Notes in Computer Science 1901, September 2000.

Publication
Enabled by the continued advances in storage technologies, the amounts of on-line data grow at a rapidly increasing pace. For example, this development is witnessed in the so-called data webhouses that accumulate data derived from clickstreams. The presence of very large and continuously growing amounts of data introduces new challenges, one of them being the need for effectively managing aging data that is perhaps inaccurate, partly outdated, and of reduced interest. This paper describes a new mechanism, persistent views, that aids in flexibly reducing the volume of data, e.g., by enabling the replacement of such "low-interest," detailed data with aggregated data; and it outlines a strategy for implementing persistent views.
Pedersen, T. B., C. S. Jensen, C. E. Dyreson, "The TreeScape System: Reuse of Pre-Computed Aggregates over Irregular OLAP Hierarchies," in Proceedings of the 26th International Conference on Very Large Databases, Cairo, Egypt, pp. 595-598 (demo), September 10-14, 2000.

Publication
We present the TreeScape system that, unlike any other system known to the authors, enables the reuse of pre-computed aggregate query results for irregular dimension hierarchies, which occur frequently in practice. The system establishes a foundation for obtaining high query processing performance while pre-computing only limited aggregates. The paper shows how this reuse of aggregates is enabled through dimension transformations that occur transparently to the user.
Pfoser, D., C. S. Jensen, Y. Theodoridis, "Novel Approaches in Query Processing for Moving Objects," in Proceedings of the 26th International Conference on Very Large Databases, Cairo, Egypt, pp. 395-406, September 10-14, 2000.

Publication
The domain of spatiotemporal applications is a treasure trove of new types of data and queries. However, work in this area is guided by related research from the spatial and temporal domains, so far, with little attention towards the true nature of spatiotemporal phenomena. In this work, the focus is on a spatiotemporal sub-domain, namely the trajectories of moving point objects. We present new types of spatiotemporal queries, as well as algorithms to process those. Further, we introduce two access methods for this kind of data, namely the Spatio-Temporal R-tree (STR-tree) and the Trajectory-Bundle tree (TB-tree). The former is an R-tree based access method that considers the trajectory identity in the index as well, while the latter is a hybrid structure, which preserves trajectories as well as allows for R-tree typical range search in the data. We present performance studies that compare the two indices with the R-tree (appropriately modified, for a fair comparison) under a varying set of spatiotemporal queries, and we provide guidelines for a successful choice among them.
Bliujute, R., C. S. Jensen, S. Šaltenis, G. Slivinskas, "Light-Weight Indexing of General Bitemporal Data," in Proceedings of the Twelfth International Conference on Scientific and Statistical Database Management, Berlin, Germany, pp. 125-138, July 2000.

Publication
Most data managed by existing, real-world database applications is time referenced. Often, two temporal aspects of data are of interest, namely valid time, when data is true in the mini-world, and transaction time, when data is current in the database, resulting in so-called bitemporal data. Like spatial data, bitemporal data thus has associated two-dimensional regions. Such data is in part naturally now relative: some data is true until the current time, and some data is part of the current database state. Therefore, unlike for spatial data, bitemporal data regions may grow continuously. Existing indices, e.g., B+- and R-trees, typically do not contend well with even small amounts of now-relative data. In contrast, the 4-R index presented in the paper is capable of indexing general bitemporal data efficiently. The different kinds of growing data regions are transformed into stationary regions, which are then indexed by R*-trees. Queries are also transformed to counter the data transformations, yielding a technique with perfect precision and recall. Performance studies indicate that the technique is competitive with the best existing index; and unlike this existing index, the new technique does not require extension of the DBMS kernel.
Šaltenis, S., C. S. Jensen, S. Leutenegger, M. Lopez, "Indexing the Positions of Continuously Moving Objects," in Proceedings of the 2000 ACM SIGMOD International Conference on the Management of Data, Dallas, TX, USA, pp. 331-342, May 14-1, 2000.

Publication [not publicly available]

ACM Author-Izer
The coming years will witness dramatic advances in wireless communications as well as positioning technologies. As a result, tracking the changing positions of objects capable of continuous movement is becoming increasingly feasible and necessary. The present paper proposes a novel, R*-tree based indexing technique that supports the efficient querying of the current and projected future positions of such moving objects. The technique is capable of indexing objects moving in one-, two-, and three-dimensional space. Update algorithms enable the index to accommodate a dynamic data set, where objects may appear and disappear, and where changes occur in the anticipated positions of existing objects. A comprehensive performance study is reported.
Tryfona, N., C. S. Jensen, "Using Abstractions for Spatio-Temporal Conceptual Modeling," in Proceedings of the 2000 ACM Symposium on Applied Computing, Villa Olmo, Como, Italy, pp. 313-322, March 19-21, 2000.

Publication [not publicly available]

ACM Author-Izer
Conceptual data modeling for complex applications, such as multimedia and spatiotemporal applications, often results in large, complicated and difficult-to-comprehend diagrams. These diagrams frequently involve repetition of autonomous, semantically meaningful parts that capture similar situations and characteristics. By recognizing such parts and treating them as modeling units, it is possible to simplify the diagrams, as well as the conceptual modeling process. In this paper, based on requirements drawn from real applications, we present a set of modeling units that capture spatial, temporal, and spatiotemporal aspects. To facilitate the conceptual design process, these units are abbreviated in the conceptual diagrams by corresponding spatial, temporal, and spatiotemporal modeling abstractions. The result is more elegant and less-detailed diagrams that are easier to comprehend (for the user) and to construct (for the designer), yet semantically rich. An extension of the Entity-Relationship model serves as the context for this study. An example from a real cadastral application illustrates the benefits of using an abstraction-based conceptual model.
Slivinskas, G., C. S. Jensen, R. T. Snodgrass, "Query Plans for Conventional and Temporal Queries Involving Duplicates and Ordering," in Proceedings of the Sixteenth IEEE International Conference on Data Engineering, San Diego, CA, USA, pp. 547-558, February 28-March 3, 2000.

Publication
Most real-world database applications contain a substantial portion of time-referenced, or temporal, data. Recent advances in temporal query languages show that such database applications could benefit substantially from built-in temporal support in the DBMS. To achieve this, temporal query representation, optimization, and processing mechanisms must be provided. This paper presents a general, algebraic foundation for query optimization that integrates conventional and temporal query optimization and is suitable for providing temporal support both via a stand-alone temporal DBMS and via a layer on top of a conventional DBMS. By capturing duplicate removal and retention and order preservation for all queries, as well as coalescing for temporal queries, this foundation formalizes and generalizes existing approaches.
Jensen, C. S., R. T. Snodgrass, "Temporally Enhanced Database Design," Chapter 7, pp. 163-193, in Advances in Object-Oriented Data Modeling, edited by M. P. Papazoglou, S. Spaccapietra, and Z. Tari, MIT Press, 367+xxv pages, 2000.

Publication
Pedersen, T. B., C. S. Jensen, "Advanced Implementation Techniques for Scientific Data Warehouses," in Proceedings of the Workshop of Management and Integration of Biochemical Data, Villa Bosch, Heidelberg, Germany, 9 pages, September 25-26, 2000.

Publication
Data warehouses using a multidimensional view of data have become very popular in both business and science in recent years. Data warehouses for scientific purposes such as medicine and bio-chemistry pose several great challenges to existing data warehouse technology. Data warehouses usually use pre-aggregated data to ensure fast query response. However, pre-aggregation cannot be used in practice if the dimension structures or the relationships between facts and dimensions are irregular. A technique for overcoming this limitation and some experimental results are presented. Queries over scientific data warehouses often need to reference data that is external to the data warehouse, e.g., data that is too complex to be handled by current data warehouse technology, data that is "owned" by other organizations, or data that is updated frequently. Examples of this are the public genome databases such as Swissprot. This paper presents a federation architecture that allows the integration of multidimensional warehouse data with complex external data.
Jensen, C. S., "Themes and Challenges in Temporal Databases," pp. 175-176, in H. Schmidt, editor, Modellierung Betrieblicher Informationssysteme. (Proceedings der MOBIS-Fachtagung, Universität Siegen, Germany), October 11-12, 2000.

Publication
Jensen, C. S., "Review - The Logical Access Path Schema of a Database," ACM SIGMOD Digital Review, Vol. 2, 2000.

Online at DBLP
Price, R., N. Tryfona, C. S. Jensen, "Supporting Conceptual Modeling of Complex Spatial Relationships," Chorochronos Technical Report CH-00-5, July 2000.

Geographic Information Systems must manage complex relationships between spatial entities such as the division of administrative regions into zones based on land use. Support for conceptual modeling of complex relationships such as aggregation and membership has been well-researched in the object-oriented community; however, spatial data has generally not been considered. In this paper, we propose a practical approach to integrating support for complex spatial relationships into a conceptual modeling language. Three different types of complex spatial relationships - spatial part, overlay, membership, and inclusion - which are of general utility in GIS applications are identified and formally defined using a consistent classification framework based on spatial derivation and constraint relationships. An extension of the Unified Modeling Language (UML) for spatiotemporal data, Extended Spatiotemporal UML, is used to demonstrate the feasibility of using such an approach to define modeling constructs supporting complex spatial relationships.
Slivinskas, G., C. S. Jensen, R. T. Snodgrass, "A Foundation for Conventional and Temporal Query Optimization Addressing Duplicates and Ordering," TimeCenter Technical Report TR-49, 44 pages, February 2000.

Publication
Most real-world databases contain substantial amounts of time-referenced, or temporal, data. Recent advances in temporal query languages show that such database applications may benefit substantially from built-in temporal support in the DBMS. To achieve this, temporal query representation, optimization, and processing mechanisms must be provided. This paper presents a foundation for query optimization that integrates conventional and temporal query optimization and is suitable for both conventional DBMS architectures and ones where the temporal support is obtained via a layer on top of a conventional DBMS. This foundation captures duplicates and ordering for all queries, as well as coalescing for temporal queries, thus generalizing all existing approaches known to the authors. It includes a temporally extended relational algebra to which SQL and temporal SQL queries may be mapped, six types of algebraic equivalences, concrete query transformation rules that obey different equivalences, a procedure for determining which types of transformation rules are applicable for optimizing a query, and a query plan enumeration algorithm.
The presented approach partitions the work required by the database implementor to develop a provably correct query optimizer into four stages: the database implementor has to (1) specify operations formally; (2) design and prove correct appropriate transformation rules that satisfy any of the six equivalence types; (3) augment the mechanism that determines when the different types of rules are applicable to ensure that the enumeration algorithm applies the rules correctly; and (4) ensure that the mapping generates a correct initial query plan.
Pfoser, D., C. S. Jensen, Y. Theodoridis, "Novel Approaches in Query Processing for Moving Objects," Chorochronos Technical Report CH-00-03, 26 pages, February 2000.

Publication
The domain of spatiotemporal applications is a treasure trove of new types of data as well as queries. However, work in this area is guided by related research from the spatial and temporal domains, so far, with little attention towards the true nature of spatiotemporal phenomena. In this work the focus is on a spatiotemporal sub-domain, namely moving point objects. We present new types of spatiotemporal queries, as well as algorithms to process those. Further, we introduce two access methods to index these kinds of data, namely the Spatio-Temporal R-tree (STR-tree) and the Trajectory-Bundle tree (TB-tree). The former is an R-tree based access method that considers the trajectory identity in the index as well, while the latter is a hybrid structure, which preserves trajectories as well as allows for R-tree typical range search in the data. We present performance studies that compare the two indices with the R-tree (appropriately modified, for a fair comparison) under a varying set of spatiotemporal queries and provide guidelines for a successful choice among them.
1999
Tryfona, N., C. S. Jensen, "Conceptual Data Modeling for Spatiotemporal Applications," Geoinformatica, Vol. 3, No. 3, pp. 245-268, September 1999.

Publication [not publicly available]

Online at SpringerLink
Many exciting potential application areas for database technology manage time-varying, spatial information. In contrast, existing database techniques, languages, and associated tools provide little built-in support for the management of such information. The focus of this paper is on enhancing existing conceptual data models with new constructs, improving their ability to conveniently model spatiotemporal aspects of information. The goal is to speed up the data modeling process and to make diagrams easier to comprehend and maintain. Based on explicitly formulated ontological foundations, the paper presents a small set of new, generic modeling constructs that may be introduced into different conceptual data models. The ER model is used as the concrete context for presenting the constructs. The semantics of the resulting spatiotemporal ER model, STER, is given in terms of the underlying ER model. STER is accompanied by a textual counterpart, and a CASE tool based on STER is currently being implemented, using the textual counterpart as its internal representation.
Gregersen, H., C. S. Jensen, "Temporal Entity-Relationship Models - a Survey," IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 3, pp. 464-497, May/June 1999.

Publication
The Entity-Relationship (ER) Model, using varying notations and with some semantic variations, is enjoying remarkable, and increasing, popularity in the research community, the computer science curriculum, and industry. In step with the increasing diffusion of relational platforms, ER modeling is growing in popularity. It has been widely recognized that temporal aspects of database schemas are prevalent and difficult to model using the ER model. As a result, how to enable the ER model to properly capture time-varying information has for a decade and a half been an active area of the database research community. This has led to the proposal of almost a dozen temporally enhanced ER models.
This paper surveys all temporally enhanced ER models known to the authors. It is the first paper to provide a comprehensive overview of temporal ER modeling, and it thus meets a need for consolidating and providing easy access to the research in temporal ER modeling. In the presentation of each model, the paper examines how the time-varying information is captured in the model and presents the new concepts and modeling constructs of the model. A total of 19 different design properties for temporally enhanced ER models are defined, and each model is characterized according to these properties.
Jensen, C. S., R. T. Snodgrass, "Temporal Data Management," in IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 1, pp. 36-44, January/February 1999.

Publication
A wide range of database applications manage time-varying information. Existing database technology currently provides little support for managing such data. The research area of temporal databases has made important contributions in characterizing the semantics of such information and in providing expressive and efficient means to model, store, and query temporal data. This paper introduces the reader to temporal data management, surveys state-of-the-art solutions to challenging aspects of temporal data management, and points to research directions.
Gregersen, H., C. S. Jensen, "On the Ontological Expressiveness of Temporal Extensions to the Entity-Relationship Model," in Proceedings of the First International Workshop on Evolution and Change in Data Management, Versailles, France, pp. 110-121, November 15-18, 1999.

Publication
It is widely recognized that temporal aspects of database schemas are prevalent, but also difficult to capture using the ER model. The database research community's response has been to develop temporally enhanced ER models. However, these models have not been subjected to systematic evaluation. In contrast, the evaluation of modeling methodologies for information systems development is a very active area of research in the information systems engineering community, where the need for systematic evaluations of modeling methodologies is well recognized.
Based on a framework from information systems engineering, this paper evaluates the ontological expressiveness of three different temporal enhancements to the ER model, the Entity-Relation-Time model, the TERC+ model, and the Time Extended ER model. Each of these temporal ER model extensions is well-documented, and together the models represent a substantial range of the design space for temporal ER extensions. The evaluation considers the uses of the models for both analysis and design, and the focus is on how well the models capture temporal aspects of reality as well as of relational database designs.
Dyreson, C. E., M. H. Böhlen, C. S. Jensen, "Capturing and Querying Multiple Aspects of Semistructured Data," in Proceedings of the Twentyfifth International Conference on Very Large Databases, pp. 290-301, September 7-10, 1999.

Publication
Motivated to a large extent by the substantial and growing prominence of the World-Wide Web and the potential benefits that may be obtained by applying database concepts and techniques to web data management, new data models and query languages have emerged that contend with web data. These models organize data in graphs where nodes denote objects or values and edges are labeled with single words or phrases. Nodes are described by the labels of the paths that lead to them, and these descriptions serve as the basis for querying.
This paper proposes an extensible framework for capturing and querying meta-data properties in a semistructured data model. Properties such as temporal aspects of data, prices associated with data access, quality ratings associated with the data, and access restrictions on the data are considered. Specifically, the paper defines an extensible data model and an accompanying query language that provides new facilities for matching, slicing, collapsing, and coalescing properties. It also briefly introduces an implemented, SQL-like query language for the extended data model that includes additional constructs for the effective querying of graphs with properties.
Pedersen, T. B., C. S. Jensen, C. E. Dyreson, "Extending Practical Pre-Aggregation in On-Line Analytical Processing," in Proceedings of the Twentyfifth International Conference on Very Large Databases, pp. 663-674, September 7-10, 1999.

Publication
On-Line Analytical Processing (OLAP) based on a dimensional view of data is being used increasingly for the purpose of analyzing very large amounts of data. To improve query performance, modern OLAP systems use a technique known as practical pre-aggregation, where select combinations of aggregate queries are materialized and re-used to compute other aggregates; full pre-aggregation, where all combinations of aggregates are materialized, is infeasible. However, this reuse of aggregates is contingent on the dimension hierarchies and the relationships between facts and dimensions satisfying stringent constraints, which severely limits the scope of practical pre-aggregation. This paper significantly extends the scope of practical pre-aggregation to cover a much wider range of realistic situations. Specifically, algorithms are given that transform irregular dimension hierarchies and fact-dimension relationships, which often occur in real-world OLAP applications, into well-behaved structures that, when used by existing OLAP systems, enable practical pre-aggregation. The algorithms have low computational complexity and may be applied incrementally to reduce the cost of updating OLAP structures.
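The dependence of aggregate reuse on well-behaved hierarchies can be illustrated with a minimal sketch. It is not taken from the paper: the function name `roll_up` and the city, country, and diagnosis-group data are hypothetical, and the sketch only shows the problem that the abstract describes, not the paper's transformation algorithms.

```python
from collections import defaultdict


def roll_up(child_totals, parents_of):
    """Compute parent-level sums by reusing materialized child-level sums.

    child_totals: {child: sum of its facts}
    parents_of:   {child: list of its parents in the dimension hierarchy}
    Both structures are hypothetical stand-ins for an OLAP dimension.
    """
    totals = defaultdict(int)
    for child, total in child_totals.items():
        for parent in parents_of[child]:
            totals[parent] += total
    return dict(totals)


# Strict hierarchy: every city has exactly one country, so reuse is safe.
city_sales = {"Aalborg": 10, "Aarhus": 20, "Oslo": 5}
country_of = {"Aalborg": ["Denmark"], "Aarhus": ["Denmark"], "Oslo": ["Norway"]}
by_country = roll_up(city_sales, country_of)

# Non-strict hierarchy: one child has two parents, so its total is counted
# twice and the parent-level sums no longer add up to the true grand total.
diagnosis_counts = {"D1": 4, "D2": 6}
groups_of = {"D1": ["GroupA"], "D2": ["GroupA", "GroupB"]}
by_group = roll_up(diagnosis_counts, groups_of)
```

In the strict case the country-level sums add up to the same grand total as the city-level sums. In the non-strict case they do not, which is the kind of irregularity that prevents naive reuse of materialized aggregates.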
Tryfona, N., S. Andersen, S. R. Mogensen, C. S. Jensen, "A Methodology and a Tool for Spatiotemporal Database Design," in Proceedings of the Seventh Hellenic Conference on Informatics, Ioannina, Greece, pp. III 53-60, August 26-29, 1999.

Publication
This paper concerns a methodology and its supporting prototype tool for database design of spatiotemporal applications. The methodology focuses on the main phases of conceptual and logical modeling with each phase being accompanied by models specifically constructed to handle spatiotemporal peculiarities. A database design tool that guides the designer through the conceptual and logical modeling as well as implementation, while dealing with applications involving space and time, is further presented. Starting from the conceptual modeling phase, the tool provides a specific environment to support the SpatioTemporal Entity-Relationship (STER) model, an extension of the Entity-Relationship model, towards the spatial and temporal dimension. An intermediate representation phase, namely, the logical phase, follows; in this, conceptual schemata are mapped into maps and relations, using an extension of the relational model, the SpatioTemporal Relational model (STR). Translation rules from conceptual to logical schemata are given. The resulting logical schemata are further translated into different underlying target DBMSs with spatial support; Oracle and the Spatial Data Option are used as a prototype. The STER and STR models, as well as the proposed tool, are tested with extended examples from real applications.
Pfoser, D., C. S. Jensen, "Incremental Join of Time-Oriented Data," in Proceedings of the Eleventh International Conference on Scientific and Statistical Database Management, Cleveland, Ohio, pp. 232-243, July 28-30, 1999.

Publication
Data warehouses as well as a wide range of other databases exhibit a strong temporal orientation: it is important to track the temporal variation of data over several months or years. In addition, databases often exhibit append-only characteristics where old data is retained while new data is appended. Performing joins efficiently on large databases such as these is essential to obtain good overall query processing performance. This paper presents a sort-merge-based incremental algorithm for time-oriented data. While incremental computation techniques have proven competitive in many settings, they also introduce a space overhead in the form of differential files. For the temporal data explored here, this overhead is avoided because the differential files are already part of the database. In addition, data is naturally sorted, leaving only merging. The incremental algorithm works in a partitioned storage environment and does not assume the availability of indices, making it a competitor to sort-based and nested-loop joins. The paper presents analytical as well as simulation-based characterizations of the performance of the join.
Pedersen, T. B., C. S. Jensen, C. E. Dyreson, "Supporting Imprecision in Multidimensional Databases Using Granularities," in Proceedings of the Eleventh International Conference on Scientific and Statistical Database Management, Cleveland, Ohio, pp. 90-101, July 28-30, 1999.

Publication
On-Line Analytical Processing (OLAP) technologies are being used widely, but the lack of effective means of handling data imprecision, which occurs when exact values are not known precisely or are entirely missing, represents a major obstacle in applying these technologies in many domains. This paper develops techniques for handling imprecision that aim to maximally reuse existing OLAP modeling constructs such as dimension hierarchies and granularities. With imprecise data available in the database, queries are tested to determine whether or not they may be answered precisely given the available data; if not, alternative queries unaffected by the imprecision are suggested. When processing queries affected by imprecision, techniques are proposed that take into account the imprecision in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. The approach is capable of exploiting existing OLAP query processing techniques such as pre-aggregation, yielding an effective approach that has low computational overhead and may be implemented using current technology.
Pfoser, D., C. S. Jensen, "Capturing the Uncertainty of Moving-Object Representations," in Proceedings of the Sixth International Symposium on Spatial Databases, Hong Kong, pp. 111-132, July 20-23, 1999.

Publication

Online at SpringerLink
Spatiotemporal applications, such as fleet management and air traffic control, involving continuously moving objects are increasingly at the focus of research efforts. The representation of the continuously changing positions of the objects is fundamentally important in these applications. This paper reports on on-going research in the representation of the positions of moving-point objects. More specifically, object positions are sampled using the Global Positioning System, and interpolation is applied to determine positions in-between the samples. Special attention is given in the representation to the quantification of the position uncertainty introduced by the sampling technique and the interpolation. In addition, the paper considers the use for query processing of the proposed representation in conjunction with indexing. It is demonstrated how queries involving uncertainty may be answered using the standard filter-and-refine approach known from spatial query processing.
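The interpolation step mentioned in the abstract can be sketched as follows. This is a minimal illustration that assumes linear interpolation between consecutive samples with strictly increasing timestamps; the function name `interpolate_position` and the sample layout `(time, x, y)` are hypothetical, and the sketch does not model the uncertainty quantification that the paper addresses.

```python
from bisect import bisect_right


def interpolate_position(samples, t):
    """Estimate the (x, y) position at time t from sampled positions.

    samples: list of (time, x, y) tuples, sorted by strictly increasing time.
    Linear interpolation between neighboring samples is an assumption of
    this sketch.
    """
    times = [s[0] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled time range")
    i = bisect_right(times, t) - 1  # index of the last sample at or before t
    if i == len(samples) - 1:       # t coincides with the final sample
        return samples[i][1], samples[i][2]
    t0, x0, y0 = samples[i]
    t1, x1, y1 = samples[i + 1]
    f = (t - t0) / (t1 - t0)        # fraction of the way from sample i to i+1
    return x0 + f * (x1 - x0), y0 + f * (y1 - y0)


# Hypothetical track: three position samples taken 10 time units apart.
track = [(0, 0.0, 0.0), (10, 10.0, 20.0), (20, 10.0, 40.0)]
midpoint = interpolate_position(track, 5)
```

The true position between two samples generally deviates from the interpolated one, which is why the representation needs to quantify position uncertainty.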
Bliujute, R., S. Šaltenis, G. Slivinskas, C. S. Jensen, "Developing a DataBlade for a New Index," in Proceedings of the Fifteenth IEEE International Conference on Data Engineering, Sydney, Australia, pp. 314-323, March 23-26, 1999.

Publication
In order to better support current and new applications, the major DBMS vendors are stepping beyond uninterpreted binary large objects, termed BLOBs, and are beginning to offer extensibility features that allow external developers to extend the DBMS with, e.g., their own data types and accompanying access methods. Existing solutions include DB2 extenders, Informix DataBlades, and Oracle cartridges. Extensible systems offer new and exciting opportunities for researchers and third-party developers alike. This paper reports on an implementation of an Informix DataBlade for the GR-tree, a new R-tree based index. This effort represents a stress test of the perhaps currently most extensible DBMS, in that the new DataBlade aims to achieve better performance, not just to add functionality. The paper provides guidelines for how to create an access method DataBlade, describes the sometimes surprising challenges that must be negotiated during DataBlade development, and evaluates the extensibility of the Informix Dynamic Server.
Pedersen, T. B., C. S. Jensen, "Multidimensional Data Modeling for Complex Data," in Proceedings of the Fifteenth IEEE International Conference on Data Engineering, Sydney, Australia, pp. 336-345, March 23-26, 1999.

Publication
On-Line Analytical Processing (OLAP) systems considerably ease the process of analyzing business data and have become widely used in industry. Such systems primarily employ multidimensional data models to structure their data. However, current multidimensional data models fall short in their abilities to model the complex data found in some real-world application domains. The paper presents nine requirements to multidimensional data models, each of which is exemplified by a real-world, clinical case study. A survey of the existing models reveals that the requirements not currently met include support for many-to-many relationships between facts and dimensions, built-in support for handling change and time, and support for uncertainty as well as different levels of granularity in the data. The paper defines an extended multidimensional data model, and an associated algebra, which address all nine requirements.
Böhlen, M. H., C. S. Jensen, M. Scholl, editors, "Spatio-Temporal Database Management," in Proceedings of the International Workshop on Spatio-Temporal Database Management, Edinburgh, Scotland, September, Lecture Notes in Computer Science, Volume 1678, Springer-Verlag, 1999.

Online at SpringerLink
Frank, A., R. H. Güting, C. S. Jensen, M. Koubarakis, N. Lorentzos, Y. Manolopoulos, E. Nardelli, B. Pernici, H.-J. Schek, M. Scholl, T. Sellis, B. Theodoulidis, P. Widmayer, "Chorochronos: A Research Network for Spatiotemporal Database Systems," in ACM SIGMOD Record, Vol. 28, No. 3, pp. 12-21, September 1999. (Submissions go through a mini-review process.)

Publication [not publicly available]

ACM Author-Izer
Jensen, C. S., "Review - Equivalence of Relational Algebra and Relational Calculus Query Languages Having Aggregate Functions," ACM SIGMOD Digital Review, Vol. 1, 1999.

Online at DBLP
Jensen, C. S., "Review - Multi-Step Processing of Spatial Joins," ACM SIGMOD Digital Review, Vol. 1, 1999.

Online at DBLP
Jensen, C. S., "Review - R-Trees: A Dynamic Index Structure for Spatial Searching," ACM SIGMOD Digital Review, Vol. 1, 1999.

Online at DBLP
Jensen, C. S., "Databasen er strategisk," (in Danish), Computerworld, November 5, page 27, 1999.

Price, R., N. Tryfona, C. S. Jensen, "A Conceptual Modeling Language for Spatiotemporal Applications," Chorochronos Technical Report CH-99-20, December 1999.

Publication
This paper presents a conceptual modeling language for spatiotemporal applications that offers built-in support for capturing geo-referenced, time-varying information. More specifically, the well-known object-oriented Unified Modeling Language (UML) is extended to capture the semantics of space and time as they appear in spatiotemporal applications. Language clarity and simplicity is maintained in the new language, the Extended Spatiotemporal UML, which introduces a small base set of modeling constructs, namely, the spatial, temporal and thematic constructs, which can then be combined and applied at different levels (i.e., attribute, association, object class) in the object-oriented model. An example is used to illustrate the simplicity and flexibility of this approach, and a formal functional specification of the semantic constructs and their symbolic combinations is given.
Šaltenis, S., C. S. Jensen, S. Leutenegger, M. Lopez, "Indexing the Positions of Continuously Moving Objects," TimeCenter Technical Report TR-44, November 1999, 27 pages, and Chorochronos Technical Report CH-99-19, 26 pages, December 1999.

Publication
The coming years will witness dramatic advances in wireless communications as well as positioning technologies. As a result, tracking the changing positions of objects capable of continuous movement is becoming increasingly feasible and necessary. The present paper proposes a novel, R*-tree based indexing technique that supports the efficient querying of the current and projected future positions of such moving objects. The technique is capable of indexing objects moving in one-, two-, and three-dimensional space. Update algorithms enable the index to accommodate a dynamic data set, where objects may appear and disappear, and where changes occur in the anticipated positions of existing objects. In addition, a bulkloading algorithm is provided for building and rebuilding the index. A comprehensive performance study is reported.
Šaltenis, S., C. S. Jensen, "R-Tree Based Indexing of General Spatio-Temporal Data," TimeCenter Technical Report TR-45, December 1999, 23 pages, and Chorochronos Technical Report CH-99-18, 22 pages, December 1999.

Publication
Real-world objects are inherently spatially and temporally referenced, and many database applications rely on databases that record the past, present, and anticipated future locations of, e.g., people or land parcels. As a result, indices that efficiently support queries on the spatio-temporal extents of objects are needed. In contrast, past indexing research has progressed in largely separate spatial and temporal streams. In the former, focus has been on one-, two-, or three-dimensional space; and in the latter, focus has been on one or both of the temporal aspects, or dimensions, of data known as transaction time and valid time. Adding time dimensions to spatial indices, as if time was a spatial dimension, neither supports nor exploits the special properties of time. On the other hand, temporal indices are generally not amenable to extension with spatial dimensions.
This paper proposes an efficient and versatile technique for the indexing of spatio-temporal data with discretely changing spatial extents: the spatial aspect of an object may be a point or may have an extent; both the transaction time and valid time are supported; and a generalized notion of the current time, now, is accommodated for the temporal dimensions. The technique extends the previously proposed R*-tree and borrows from the GR-tree, and it provides means of prioritizing space versus time, enabling it to adapt to spatially and temporally restrictive queries. The paper includes performance experiments that evaluate different aspects of the proposed indexing technique.
Torp, K., C. S. Jensen, R. T. Snodgrass, "Modification of Now-Relative Databases," TimeCenter Technical Report TR-43, 37 pages, September 1999.

Publication
Most real-world databases record time-varying information. In such databases, the notion of "the current time," or "now", occurs naturally and prominently. For example, when capturing the past states of a relation using begin and end time attributes, tuples that are part of the current state have some past time as their begin time and "now" as their end time. While the semantics of such variable databases has been described in detail and is well understood, the modification of variable databases remains unexplored.
This paper defines the semantics of modifications involving the variable "now". More specifically, the problems with modifications in the presence of "now" are explored, illustrating that the main problems are with modifications and tuples that reach into the future. The paper defines the semantics of modifications - including insertions, deletions, and updates - of databases without "now", with "now", and with values of the type "now" + Delta, where Delta is a non-variable time duration. To accommodate these semantics, three new timestamp values are introduced. An approximate semantics that does not rely on new timestamp values is also provided. Finally, implementation is explored.
Pedersen, T. B., C. S. Jensen, C. E. Dyreson, "Extending Practical Pre-Aggregation in On-Line Analytical Processing," Technical Report R-99-5004, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 33 pages, September 1999.

Publication
On-Line Analytical Processing (OLAP) based on a dimensional view of data is being used increasingly in traditional business applications as well as in applications such as health care for the purpose of analyzing very large amounts of data. Pre-aggregation, the prior materialization of aggregate queries for later use, is an essential technique for ensuring adequate response time during data analysis. Full pre-aggregation, where all combinations of aggregates are materialized, is infeasible. Instead, modern OLAP systems adopt the practical pre-aggregation approach of materializing only select combinations of aggregates and then re-use these for efficiently computing other aggregates. However, this re-use of aggregates is contingent on the dimension hierarchies and the relationships between facts and dimensions satisfying stringent constraints. This severely limits the scope of the practical pre-aggregation approach. This paper significantly extends the scope of practical pre-aggregation to cover a much wider range of realistic situations. Specifically, algorithms are given that transform irregular dimension hierarchies and fact-dimension relationships, which often occur in real-world OLAP applications, into well-behaved structures that, when used by existing OLAP systems, enable practical pre-aggregation. The algorithms have low computational complexity and may be applied incrementally to reduce the cost of updating OLAP structures.
Pfoser, D., Y. Theodoridis, C. S. Jensen, "Indexing Trajectories of Moving Point Objects," Chorochronos Technical Report CH-99-3, 23 pages, June 1999.

Publication
Spatiotemporal applications attract more and more attention from researchers as well as application developers. In particular, the peculiarities of spatiotemporal data are the focus of an increasing research effort. In this paper we extend the well-known R-tree method to handle trajectory data stemming from moving point objects. The resulting access method, termed (Spatio-Temporal) STR-tree, differs from the R-tree in that it stores additional information in the entries at the leaf level and, further, has modified insertion and split algorithms. Besides the description of the STR-tree algorithms, we provide an extensive performance study examining the behaviour of the new method as compared to the original R-tree under a varying set of queries and datasets. The collection of queries comprises the typical point and range queries as well as pure spatiotemporal queries based on the semantics of objects' trajectories, the so-called trajectory and navigational queries.
Pedersen, T. B., C. S. Jensen, C. E. Dyreson, "Supporting Imprecision in Multidimensional Databases Using Granularities," Technical Report R-99-5003, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 29+iii pages, May 1999.

Publication
On-Line Analytical Processing (OLAP) technologies are being used widely for business-data analysis, and these technologies are also being used increasingly in medical applications, e.g., for patient-data analysis. The lack of effective means of handling data imprecision, which occurs when exact values are not known precisely or are entirely missing, represents a major obstacle in applying OLAP technology to the medical domain, as well as many other domains. OLAP systems are mainly based on a multidimensional model of data and include constructs such as dimension hierarchies and granularities. This paper develops techniques for the handling of imprecision that aim to maximally reuse these already existing constructs. With imprecise data now available in the database, queries are tested to determine whether or not they may be answered precisely given the available data; if not, alternative queries that are unaffected by the imprecision are suggested. When a user elects to proceed with a query that is affected by imprecision, techniques are proposed that take into account the imprecision in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. The approach is capable of exploiting existing multidimensional query processing techniques such as pre-aggregation, yielding an effective approach that has low computational overhead and may be implemented using current technology. The paper illustrates how to implement the approach using SQL databases.
1998
Torp, K., L. Mark, C. S. Jensen, "Efficient Differential Timeslice Computation," in IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 4, pp. 599-611, July/August 1998.

Publication
Transaction-time databases support access to not only the current database state, but also previous database states. Supporting access to previous database states requires large quantities of data and necessitates efficient temporal query processing techniques. In previous work, we have presented a log-based storage structure and algorithms for the differential computation of previous database states. Timeslices - i.e., previous database states - are computed by traversing a log of database changes, using previously computed and cached timeslices as outsets. When computing a new timeslice, the cache will contain two candidate outsets: an earlier outset and a later outset. The new timeslice can be computed by either incrementally updating the earlier outset or decrementally "downdating" the later outset using the log. The cost of this computation is determined by the size of the log between the outset and the new timeslice. This paper proposes an efficient algorithm that identifies the cheaper outset for the differential computation. The basic idea is to compute the sizes of the two pieces of the log by maintaining and using a tree structure on the timestamps of the database changes in the log. The lack of a homogeneous node structure, of a controllable and high fill-factor for nodes, and of appropriate node allocation in existing tree structures (e.g., B+-trees, Monotonic B+-trees, and Append-only trees) renders existing tree structures unsuited for our use. Consequently, a specialized tree structure, the Pointer-less Insertion tree, is developed to support the algorithm. As a proof of concept, we have implemented a main memory version of the algorithm and its tree structure.
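The choice between the two candidate outsets can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it swaps in a sorted Python list with binary search for the Pointer-less Insertion tree, and the function name `choose_outset` and its parameters are hypothetical.

```python
from bisect import bisect_right


def choose_outset(log_timestamps, earlier, later, target):
    """Pick the cached timeslice from which the target timeslice is cheapest.

    log_timestamps: sorted timestamps of the changes recorded in the log.
    earlier, later: times of the two cached outsets, earlier <= target <= later.
    The cost is taken to be the number of log entries between the outset and
    the target, as the abstract describes; a sorted list stands in for the
    paper's specialized tree structure.
    """
    n_target = bisect_right(log_timestamps, target)
    forward = n_target - bisect_right(log_timestamps, earlier)   # update cost
    backward = bisect_right(log_timestamps, later) - n_target    # downdate cost
    if forward <= backward:
        return "earlier", forward
    return "later", backward


# Hypothetical log with one change per time unit.
log = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
near_start = choose_outset(log, earlier=2, later=9, target=4)
near_end = choose_outset(log, earlier=2, later=9, target=8)
```

Counting entries by position in a sorted sequence avoids scanning the log itself, which is the role the tree on timestamps plays in the paper.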
Bliujute, R., C. S. Jensen, S. Šaltenis, G. Slivinskas, "R-tree-based Indexing of Now-Relative Bitemporal Data," in Proceedings of the 24th International Conference on Very Large Databases, New York City, NY, pp. 345-356, August 24-27, 1998.

Publication
The databases of a wide range of applications, e.g., in data warehousing, store multiple states of time-evolving data. These databases contain a substantial part of now-relative data: data that became valid at some past time and remains valid until the current time. More specifically, two temporal aspects of data are frequently of interest, namely valid time, when data is true, and transaction time, when data is current in the database, leading to bitemporal data. Little work, based mostly on R-trees, has addressed the indexing of bitemporal data. No indices exist that contend well with now-relative data, which leads to temporal data regions that are continuous functions of time. The paper proposes two extended R*-trees that permit the indexing of data regions that grow continuously over time, by also letting the internal bounding regions grow. Internal bounding regions may be triangular as well as rectangular. New heuristics for the algorithms that govern the index structure are provided. As a result, dead space and overlap, now also functions of time, are reduced. Performance studies indicate that the best extended index is typically 3-5 times faster than the existing R-tree based indices.
Torp, K., C. S. Jensen, R. T. Snodgrass, "Stratum Approaches to Temporal DBMS Implementation," in Proceedings of the 1998 International Database Engineering and Applications Symposium, Cardiff, Wales, U.K., pp. 4-13, IEEE Computer Society, July 8-10, 1998.

Publication
Previous approaches to implementing temporal DBMSs have assumed that a temporal DBMS must be built from scratch, employing an integrated architecture and using new temporal implementation techniques such as temporal indexes and join algorithms. However, this is a very large and time-consuming task. The paper explores approaches to implementing a temporal DBMS as a stratum on top of an existing non-temporal DBMS, rendering implementation more feasible by reusing much of the functionality of the underlying conventional DBMS. More specifically, the paper introduces three stratum meta-architectures, each with several specific architectures. Based on a new set of evaluation criteria, advantages and disadvantages of the specific architectures are identified. The paper also classifies all existing temporal DBMS implementations according to the specific architectures they employ. It is concluded that a stratum architecture is the best short-, medium-, and perhaps even long-term approach to implementing a temporal DBMS.
Pedersen, T. B., C. S. Jensen, "Research Issues in Clinical Data Warehousing," in Proceedings of the Tenth International Conference on Scientific and Statistical Database Management, Capri, Italy, pp. 43-52, IEEE Computer Society, July 1-3, 1998.

Publication
Medical informatics has been an important area for the application of computing and database technology for at least four decades. This area may benefit from the functionality offered by data warehousing. However, the special nature of clinical applications poses different and new requirements to data warehousing technologies, over those posed by conventional data warehouse applications. This article presents a number of exciting new research challenges posed by clinical applications, to be met by the database research community. These include the need for complex-data modeling features, advanced temporal support, advanced classification structures, continuously valued data, dimensionally reduced data, and the integration of very complex data. In addition, the support for clinical treatment protocols and medical research are interesting areas for research.
Pedersen, T. B., C. S. Jensen, "Clinical Data Warehousing - A Survey," in Proceedings of the VIII Mediterranean Conference on Medical and Biological Engineering and Computing, Lemesos, Cyprus, Section 20.3, 6 pages (CD-rom proceedings), June 14-17, 1998.

Publication
In this article we present the concept of data warehousing, and its use in the clinical area. Clinical data warehousing will become very important in the near future, as healthcare enterprises need to gain more information from their clinical, administrative, and financial data, in order to improve quality and reduce costs. Adoption of data warehousing in health care has been slowed by lack of understanding of the benefits offered by the technology. This paper contributes by providing needed understanding, by introducing the opportunities offered by data warehousing, describing current efforts in the area, and providing criteria for comparing clinical data warehouse systems.
Bliujute, R., S. Šaltenis, G. Slivinskas, C. S. Jensen, "Systematic Change Management in Dimensional Data Warehousing," in Proceedings of the Third International Baltic Workshop on DB and IS, Riga, Latvia, pp. 27-41, April 15-17, 1998.

Publication
With the widespread and increasing use of data warehousing in industry, the design of effective data warehouses and their maintenance has become a focus of attention. Independently of this, the area of temporal databases has been an active area of research for well beyond a decade. This article identifies shortcomings of so-called star schemas, which are widely used in industrial warehousing, in their ability to handle change and subsequently studies the application of temporal techniques for solving these shortcomings.
Star schemas represent a new approach to database design and have gained widespread popularity in data warehousing, but while they have many attractive properties, star schemas do not contend well with so-called slowly changing dimensions and with state-oriented data. We study the use of so-called temporal star schemas that may provide a solution to the identified problems while not fundamentally changing the database design approach. More specifically, we study the relative database size and query performance when using regular star schemas and their temporal counterparts for state-oriented data. We also offer some insight into the relative ease of understanding and querying databases with regular and temporal star schemas.
Böhlen, M. H., C. S. Jensen, B. Skjellaug, "Spatio-Temporal Database Support for Legacy Applications," in Proceedings of the 1998 ACM Symposium on Applied Computing, Atlanta, Georgia, pp. 226-234, February 27-March 1, 1998.

Publication [not publicly available]

ACM Author-Izer
In areas such as finance, marketing, and property and resource management, many database applications manage spatio-temporal data. These applications typically run on top of a relational DBMS and either manage spatio-temporal data using the DBMS, which provides little support, or employ the services of a proprietary system that co-exists with the DBMS but is separate from and not integrated with it. This wealth of applications may benefit substantially from built-in, integrated spatio-temporal DBMS support. Providing a foundation for such support is an important and substantial challenge.
This paper initially defines technical requirements to a spatio-temporal DBMS aimed at protecting business investments in the existing legacy applications and at reusing personnel expertise. These requirements provide a foundation for making it economically feasible to migrate legacy applications to a spatio-temporal DBMS. The paper next presents the design of the core of a spatio-temporal, multi-dimensional extension to SQL-92, called STSQL, that satisfies the requirements. STSQL does so by supporting so-called upward compatible, dimensional upward compatible, reducible, and non-reducible queries. In particular, dimensional upward compatibility and reducibility were designed to address migration concerns and complement proposals based on abstract data types.
Böhlen, M. H., R. Busatto, C. S. Jensen, "Point versus Interval-based Temporal Data Models," in Proceedings of the Fourteenth IEEE International Conference on Data Engineering, Orlando, Florida, pp. 192-200, February 23-27, 1998.

Publication
The association of timestamps with various data items such as tuples or attribute values is fundamental to the management of time-varying information. Using intervals in timestamps, as do most data models, leaves a data model with a variety of choices for giving a meaning to timestamps. Specifically, some such data models claim to be point-based while other data models claim to be interval-based. The meaning chosen for timestamps is important: it has a pervasive effect on most aspects of a data model, including database design, a variety of query language properties, and query processing techniques, e.g., the availability of query optimization opportunities.
This paper precisely defines the notions of point-based and interval-based temporal data models, thus providing a new, formal basis for characterizing temporal data models and obtaining new insights into the properties of their query languages. Queries in point-based models treat snapshot equivalent argument relations identically. This renders point-based models insensitive to coalescing. In contrast, queries in interval-based models give significance to the actual intervals used in the timestamps, thus generally treating non-identical, but possibly snapshot equivalent, relations differently. The paper identifies the notion of time-fragment preservation as the essential defining property of an interval-based data model.
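As a rough illustration of the distinction drawn in this abstract (a sketch written for this page, not code from the paper), the following Python fragment models timestamped tuples with closed-open integer intervals. The names `snapshot`, `snapshot_equivalent`, and `coalesce`, and the integer time domain, are assumptions made for the example. It shows two relations that are snapshot equivalent yet not identical, which a point-based model would treat identically and an interval-based model could distinguish.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A timestamped tuple (value, start, end) with a closed-open interval [start, end).
Row = Tuple[str, int, int]


def snapshot(relation: List[Row], t: int) -> FrozenSet[str]:
    """Return the set of values valid at time point t."""
    return frozenset(v for v, s, e in relation if s <= t < e)


def snapshot_equivalent(r1: List[Row], r2: List[Row]) -> bool:
    """Two relations are snapshot equivalent if all their snapshots agree."""
    # Snapshots can only change at interval endpoints, so checking those suffices.
    points: Set[int] = {p for _, s, e in r1 + r2 for p in (s, e)}
    return all(snapshot(r1, t) == snapshot(r2, t) for t in points)


def coalesce(relation: List[Row]) -> List[Row]:
    """Merge overlapping or adjacent intervals of tuples with equal values."""
    by_value: Dict[str, List[Tuple[int, int]]] = {}
    for v, s, e in relation:
        by_value.setdefault(v, []).append((s, e))
    result: List[Row] = []
    for v, intervals in sorted(by_value.items()):
        intervals.sort()
        cur_s, cur_e = intervals[0]
        for s, e in intervals[1:]:
            if s <= cur_e:  # overlaps or meets the current interval
                cur_e = max(cur_e, e)
            else:
                result.append((v, cur_s, cur_e))
                cur_s, cur_e = s, e
        result.append((v, cur_s, cur_e))
    return result


fragmented = [("Ann", 1, 5), ("Ann", 5, 9)]
merged = coalesce(fragmented)
```

Here `fragmented` and `merged` differ as sets of interval-stamped tuples but have the same snapshot at every time point, so a query that depends only on snapshots cannot tell them apart.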
Snodgrass, R. T., M. H. Böhlen, C. S. Jensen, A. Steiner, "Transitioning Temporal Support in TSQL2 to SQL3," pp. 150-194 in O. Etzion, S. Jajodia, and S. Sripada, editors, Temporal Databases: Research and Practice, Lecture Notes in Computer Science 1399, Springer-Verlag 1998. (Proceedings of the Dagstuhl Seminar on Temporal Databases, Schloss Dagstuhl, Germany), June 23-27, 1998.

Publication
This document summarizes the proposals before the SQL3 committees to allow the addition of tables with valid-time and transaction-time support into SQL/Temporal, and explains how to use these facilities to migrate smoothly from a conventional relational system to one encompassing temporal support. Initially, important requirements to a temporal system that may facilitate such a transition are motivated and discussed. The proposal then describes the language additions necessary to add valid-time support to SQL3 while fulfilling these requirements. The constructs of the language are divided into four levels, with each level adding increased temporal functionality to its predecessor. A prototype system implementing these constructs on top of a conventional DBMS is publicly available.
C. S. Jensen, C. E. Dyreson, (editors, with multiple other contributors), "A Consensus Glossary of Temporal Database Concepts - February 1998 Version," in O. Etzion, S. Jajodia, and S. Sripada, editors, Temporal Databases: Research and Practice, Lecture Notes in Computer Science 1399, Springer-Verlag, pp. 367-405, 1998.

Publication
This document contains definitions of a wide range of concepts specific to and widely used within temporal databases. In addition to providing definitions, the document also includes explanations of concepts as well as discussions of the adopted names.
The consensus effort that led to this glossary was initiated in early 1992. Earlier versions appeared in SIGMOD Record in September 1992 and March 1994. The present glossary subsumes all the previous documents. The glossary meets the need for creating a higher degree of consensus on the definition and naming of temporal database concepts.
Two sets of criteria are included. First, all included concepts were required to satisfy four relevance criteria, and, second, the naming of the concepts was resolved using a set of evaluation criteria. The concepts are grouped into three categories: concepts of general database interest, of temporal database interest, and of specialized interest.
Pedersen, T. B., C. S. Jensen, "Clinical Data Warehousing - A Survey," presented at Conference on Healthcare Computing 1998, Harrogate, United Kingdom, 6 pages, March 23-25, 1998.

Publication
In this article we present the concept of data warehousing, and its use in the clinical area. Clinical data warehousing will become very important in the near future, as healthcare enterprises need to gain more information from their clinical, administrative, and financial data, in order to improve quality and reduce costs. Adoption of data warehousing in health care has been slowed by lack of understanding of the benefits offered by the technology. This paper contributes by providing needed understanding, by introducing the opportunities offered by data warehousing, describing current efforts in the area, and providing criteria for comparing clinical data warehouse systems.
Tsotras, V., C. S. Jensen, R. T. Snodgrass, "An Extensible Notation for Spatiotemporal Index Queries," in ACM SIGMOD Record, Vol. 27, No. 1, pp. 47-53, March, 1998.

Publication [not publicly available]

ACM Author-Izer
Temporal, spatial and spatiotemporal queries are inherently multidimensional, combining predicates on explicit attributes with predicates on time dimension(s) and spatial dimension(s). Much confusion has prevailed in the literature on access methods because no consistent notation exists for referring to such queries. As a contribution towards eliminating this problem, we propose a new and simple notation for spatiotemporal queries. The notation aims to address the selection-based spatiotemporal queries commonly studied in the literature of access methods. The notation is extensible and can be applied to more general multidimensional, selection-based queries.
Gregersen, H., L. Mark, C. S. Jensen, "Mapping Temporal ER Diagrams to Relational Schemas," TimeCenter Technical Report TR-39, 37 pages, December 1998.

Publication
Many database applications manage information that varies over time, and most of the database schemas for these applications were designed using one of the several versions of the Entity-Relationship (ER) model. In the research community as well as in industry, it is common knowledge that temporal aspects of data are pervasive and important to the applications, but are also difficult to capture using the ER model. The research community has developed temporal ER models, in an attempt to provide modeling constructs that more naturally and elegantly support capturing the temporal aspects. Specifically, the temporal models provide enhanced support for capturing aspects such as lifespans, valid time, and transaction time of data.
Because commercial database management systems support neither the ER model nor any temporal ER model as a model for data manipulation, but rather support various versions of the relational model for this purpose, we provide a two-step transformation from temporal ER diagrams, with built-in support for lifespans and valid and transaction time, to relational schemas. The first step of the algorithm translates a temporal ER diagram into relations in a surrogate-based relational target model; and the second step further translates this relational schema into a schema in a lexically-based relational target model.
Tryfona, N., C. S. Jensen, "A Component-Based Conceptual Model for Spatiotemporal Application Design," Chorochronos Technical Report CH-98-10, November 1998.

Publication
Conceptual data modeling for complex applications, such as multimedia and spatiotemporal applications, often results in large, complicated and difficult-to-comprehend diagrams. One reason for this is that these diagrams frequently involve repetition of autonomous, semantically meaningful parts that capture similar situations and characteristics. By recognizing such parts and treating them as units, it is possible to simplify the diagrams, as well as the conceptual modeling process. We propose to capture autonomous and semantically meaningful excerpts of diagrams that occur frequently as modeling patterns. Specifically, the paper concerns modeling patterns for conceptual design of spatiotemporal databases. Based on requirements drawn from real applications, it presents a set of modeling patterns that capture spatial, temporal, and spatiotemporal aspects. To facilitate the conceptual design process, these patterns are abbreviated by corresponding spatial, temporal, and spatiotemporal pattern abstractions, termed components. The result is more elegant and less-detailed diagrams that are easier to comprehend, but yet semantically rich. The Entity-Relationship model serves as the context for this study. An extensive example from a real cadastral application illustrates the benefits of using a component-based conceptual model.
Tryfona, N., C. S. Jensen, "Conceptual Modeling for Spatiotemporal Applications," Chorochronos Technical Report CH-98-8, November 1998.

Publication
Many exciting potential application areas for database technology manage time-varying, spatial information. In contrast, existing database techniques, languages, and associated tools provide little built-in support for the management of such information. The focus of this paper is on enhancing existing conceptual data models with new constructs, improving their ability to conveniently model spatiotemporal aspects of information. The goal is to speed up the data modeling process and to make diagrams easier to comprehend and maintain. Based on explicitly formulated ontological foundations, the paper presents a small set of new, generic modeling constructs that may be introduced into different conceptual data models. The ER model is used as the concrete context for presenting the constructs. The semantics of the resulting spatiotemporal ER model, STER, is given in terms of the underlying ER model. STER is accompanied by a textual counterpart, and a CASE tool based on STER is currently being implemented, using the textual counterpart as its internal representation.
Pedersen, T. B., C. S. Jensen, "Multidimensional Data Modeling for Complex Data," TimeCenter Technical Report TR-37, 25 pages, November 1998.

Publication
Systems for On-Line Analytical Processing (OLAP) considerably ease the process of analyzing business data and have become widely used in industry. OLAP systems primarily employ multidimensional data models to structure their data. However, current multidimensional data models fall short in their ability to model the complex data found in some real-world application domains. The paper presents nine requirements to multidimensional data models, each of which is exemplified by a real-world, clinical case study. A survey of the existing models reveals that the requirements not currently met include support for many-to-many relationships between facts and dimensions, built-in support for handling change and time, and support for uncertainty as well as different levels of granularity in the data. The paper defines an extended multidimensional data model, which addresses all nine requirements. Along with the model, we present an associated algebra, and outline how to implement the model using relational databases.
Dyreson, C. E., M. H. Böhlen, C. S. Jensen, "Capturing and Querying Multiple Aspects of Semistructured Data," TimeCenter Technical Report TR-36, 21 pages, November 1998.

Publication
Motivated to a large extent by the substantial and growing prominence of the World-Wide Web and the potential benefits that may be obtained by applying database concepts and techniques to web data management, new data models and query languages have emerged that contend with the semistructured nature of web data. These models organize data in graphs. The nodes in a graph denote objects or values, and each edge is labeled with a single word or phrase. Nodes are described by the labels of the paths that lead to them, and these descriptions serve as the basis for querying.
This paper proposes an extensible framework for capturing more data semantics in semistructured data models. Inspired by the multidimensional paradigm that finds application in on-line analytical processing and data warehousing, the framework makes it possible to associate values drawn from an extensible set of dimensions with edges. The paper considers dimensions that capture temporal aspects of data, prices associated with data access, quality ratings associated with the data, and access restrictions on the data. In this way, it accommodates notions from temporal databases, electronic commerce, information quality, and database security.
The paper defines the extensible data model and an accompanying query language that provides new facilities for matching, slicing, and collapsing the enriched paths and for coalescing edges. The paper describes an implemented, SQL-like query language for the extended data model that includes additional constructs for the effective querying of graphs with enriched paths.
Gregersen, H., C. S. Jensen, "Conceptual Modeling of Time-Varying Information," TimeCenter Technical Report TR-35, 31 pages, September 1998.

Publication
A wide range of database applications manage information that varies over time. Many of the underlying database schemas of these were designed using one of the several versions, with varying syntax and semantics, of the Entity-Relationship (ER) model. In the research community as well as in industry, it is common knowledge that the temporal aspects of the mini-world are pervasive and important, but are also difficult to capture using the ER model. Not surprisingly, several enhancements to the ER model have been proposed in an attempt to more naturally and elegantly support the modeling of temporal aspects of information. Common to the existing temporally extended ER models, few or no specific requirements to the models were given by their designers.
With the existing proposals, an ontological foundation, and novel requirements as its basis, this paper formally defines a graphical, temporally extended ER model. The ontological foundation serves to aid in ensuring a maximally orthogonal design, and the requirements aim, in part, at ensuring a design that naturally extends the syntax and semantics of the regular ER model. The result is a novel model that satisfies an array of properties not satisfied by any single previously proposed model.
Güting, R. H., M. H. Böhlen, M. Erwig, C. S. Jensen, N. Lorentzos, M. Schneider, M. Vazirgiannis, "A Foundation for Representing and Querying Moving Objects," Informatik Berichte 238 - 9/1998, FernUniversität Hagen, Fachbereich Informatik, 49 pages, September 1998.

Publication
Spatio-temporal databases deal with geometries changing over time. The goal of our work is to provide a DBMS data model and query language capable of handling such time-dependent geometries, including those changing continuously which describe moving objects. Two fundamental abstractions are moving point and moving region, describing objects for which only the time-dependent position, or position and extent, are of interest, respectively. We propose to represent such time-dependent geometries as attribute data types with suitable operations, that is, to provide an abstract data type extension to a DBMS data model and query language.
This paper presents a design of such a system of abstract data types. It turns out that besides the main types of interest, moving point and moving region, a relatively large number of auxiliary data types is needed. For example, one needs a line type to represent the projection of a moving point into the plane, or a "moving real" to represent the time-dependent distance of two moving points. It then becomes crucial to achieve (i) orthogonality in the design of the type system, i.e., type constructors can be applied uniformly, (ii) genericity and consistency of operations, i.e., operations range over as many types as possible and behave consistently, and (iii) closure and consistency between structure and operations of non-temporal and related temporal types. Satisfying these goals leads to a simple and expressive system of abstract data types that may be integrated into a query language to yield a powerful language for querying spatio-temporal data, including moving objects. The paper formally defines the types and operations, offers detailed insight into the considerations that went into the design, and exemplifies the use of the abstract data types using SQL. The paper offers a precise and conceptually clean foundation for implementing a spatio-temporal DBMS extension.
Pfoser, D., C. S. Jensen, "Incremental Join of Time-Oriented Data," TimeCenter Technical Report TR-34, 22 pages, September 1998.

Publication
Data warehouses as well as a wide range of other databases exhibit a strong temporal orientation: it is important to track the temporal variation of data over several months or, often, years. In addition, data warehouses and databases often exhibit append-only characteristics where old data pertaining to the past is retained while new data pertaining to the present is appended. Performing joins on large databases such as these can be very costly, and the efficient processing of joins is essential to obtain good overall query processing performance. This paper presents a sort-merge-based incremental algorithm for time-oriented data. While incremental computation techniques have proven competitive in many settings, they also introduce a space overhead in the form of differential files. However, for the temporal data explored here, this overhead is avoided because the differential files are already part of the database. In addition, data is naturally sorted, leaving only merging. The incremental algorithm works in a partitioned storage environment and does not assume the availability of indices, making it a competitor to sort-based and nested-loop joins. The paper presents analytical cost formulas as well as simulation-based studies that characterize the performance of the join.
Skyt, J., C. S. Jensen, "Vacuuming Temporal Databases," TimeCenter Technical Report TR-32, 20 pages, September 1998.

Publication
A wide range of real-world database applications, including financial and medical applications, are faced with accountability and trace-ability requirements. These requirements lead to the replacement of the usual update-in-place policy by an append-only policy, yielding so-called transaction-time databases. With logical deletions being implemented as insertions at the physical level, these databases retain all previously current states and are ever-growing. A variety of physical storage structures and indexing techniques as well as query languages have been proposed for transaction-time databases, but the support for physical deletion, termed vacuuming, has received precious little attention. Such vacuuming is called for by, e.g., the laws of many countries. Although necessary, with vacuuming, the database's previously perfect and reliable recollection of the past may be manipulated via, e.g., selective removal of records pertaining to past states. This paper provides a semantic framework for the vacuuming of transaction-time databases. The main focus is to establish a foundation for the correct and user-friendly processing of queries and updates against vacuumed databases. Queries that may return results affected by vacuuming are intercepted, and the user is presented with the option of issuing similar queries that are not affected by vacuuming.
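The idea of intercepting queries affected by vacuuming can be pictured with a small Python sketch. This is an illustration written for this page under simplifying assumptions, not the semantic framework of the report: the class `TransactionTimeTable`, the exception `VacuumedError`, integer transaction times, and the single cutoff-based vacuuming rule are all hypothetical.

```python
from typing import Dict, List, Optional


class VacuumedError(Exception):
    """Raised when a query touches states that vacuuming may have removed."""


class TransactionTimeTable:
    def __init__(self) -> None:
        self.rows: List[dict] = []
        self.cutoff: Optional[int] = None  # latest vacuuming cutoff, if any

    def insert(self, key: str, value: str, now: int) -> None:
        self.rows.append({"key": key, "value": value, "start": now, "end": None})

    def delete(self, key: str, now: int) -> None:
        # Logical deletion: the version is closed, not physically removed.
        for row in self.rows:
            if row["key"] == key and row["end"] is None:
                row["end"] = now

    def vacuum(self, cutoff: int) -> None:
        # Physical deletion (assumed rule): drop versions closed at or before cutoff.
        self.rows = [r for r in self.rows if r["end"] is None or r["end"] > cutoff]
        self.cutoff = cutoff if self.cutoff is None else max(self.cutoff, cutoff)

    def as_of(self, t: int) -> Dict[str, str]:
        # Intercept queries whose answer may be affected by vacuuming.
        if self.cutoff is not None and t < self.cutoff:
            raise VacuumedError(f"state at {t} may be incomplete; try t >= {self.cutoff}")
        return {r["key"]: r["value"] for r in self.rows
                if r["start"] <= t and (r["end"] is None or t < r["end"])}


table = TransactionTimeTable()
table.insert("acct", "v1", now=1)
table.delete("acct", now=5)
table.insert("acct", "v2", now=5)
before = table.as_of(3)   # answered from the full history
table.vacuum(cutoff=5)
after = table.as_of(6)    # unaffected by vacuuming
```

After `vacuum(cutoff=5)`, a call such as `table.as_of(3)` raises `VacuumedError` and names a time from which answers are unaffected, loosely mirroring the option of issuing a similar, unaffected query.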
Bliujute, R., C. S. Jensen, S. Šaltenis, G. Slivinskas, "Light-Weight Indexing of General Bitemporal Data," TimeCenter Technical Report TR-30, 20 pages, September 1998.

Publication
Most data managed by existing, real-world database applications is time referenced. Data warehouses are good examples. Often, two temporal aspects of data are of interest, namely valid time, when data is true in the mini-world, and transaction time, when data is current in the database, resulting in so-called bitemporal data. Like spatial data, bitemporal data thus has associated two-dimensional regions. Such data is in part naturally now-relative: some data is currently true in the mini-world or is part of the current database state. So, unlike for spatial data, the regions of now-relative bitemporal data grow continuously. Existing indices, including commercially available indices such as B+- and R-trees, typically do not contend well with even small amounts of now-relative data.
This paper proposes a new indexing technique that indexes general bitemporal data efficiently. The technique eliminates the different kinds of growing data regions by means of transformations and then indexes the resulting stationary data regions with four R*-trees, and queries on the original data are mapped to corresponding queries on the transformed data. Extensive performance studies are reported that provide insight into the characteristics and behavior of the four trees storing differently-shaped regions, and they indicate that the new technique yields a performance that is competitive with the best existing index; and unlike this existing index, the new technique does not require extension of the kernel of the DBMS.
Bliujute, R., S. Šaltenis, G. Slivinskas, C. S. Jensen, "Developing a DataBlade for a New Index," TimeCenter Technical Report TR-29, 20 pages, September 1998.

Publication
Many current and potential applications of database technology, e.g., geographical, medical, spatial, and multimedia applications, require efficient support for the management of data with new, complex data types. As a result, the major DBMS vendors are stepping beyond the support for uninterpreted binary large objects, termed BLOBs, and are beginning to offer extensibility features that allow external developers to extend the DBMS with, e.g., their own data types and accompanying access methods. Existing solutions include DB2 extenders, Informix DataBlades, and Oracle cartridges. Extensible systems offer new and exciting opportunities for researchers and third-party developers alike.
This paper reports on an implementation of an Informix DataBlade for the GR-tree, a new R-tree based index. This effort represents a stress test of what is perhaps currently the most extensible DBMS, in that the new DataBlade aims to achieve better performance, not just to add functionality. The paper provides guidelines for how to create an access method DataBlade, describes the sometimes surprising challenges that must be negotiated during DataBlade development, and evaluates the extensibility of the Informix Dynamic Server.
Bliujute, R., C. S. Jensen, S. Šaltenis, G. Slivinskas, "R-tree-based Indexing of Now-Relative Bitemporal Data," TimeCenter Technical Report TR-25, 21 pages, March 1998.

Publication
The databases of a wide range of applications, e.g., in data warehousing, store multiple states of time-evolving data. These databases contain a substantial part of now-relative data: data that became valid at some past time and remains valid until the current time. More specifically, two temporal aspects of data are frequently of interest, namely valid time, when data is true, and transaction time, when data is current in the database. The latter aspect is essential in all applications where accountability or trace-ability are required. When both aspects are captured, data is termed bitemporal.
A number of indices have been devised for the efficient support of operations on time-varying data with one time dimension, but only little work, based mostly on R-trees, has addressed the indexing of two- or higher-dimensional temporal data. No indices exist that contend well with now-relative data, which leads to temporal data regions that are continuous functions of time. The paper proposes two extended R-tree based indices.
Bliujute, R., S. Šaltenis, G. Slivinskas, C. S. Jensen, "Systematic Change Management in Dimensional Data Warehousing," TimeCenter Technical Report TR-23, 14 pages, January 1998.

Publication
With the widespread and increasing use of data warehousing in industry, the design of effective data warehouses and their maintenance has become a focus of attention. Independently of this, the area of temporal databases has been an active area of research for well beyond a decade. This article identifies shortcomings of so-called star schemas, which are widely used in industrial warehousing, in their ability to handle change and subsequently studies the application of temporal techniques for solving these shortcomings.
Star schemas represent a new approach to database design and have gained widespread popularity in data warehousing, but while they have many attractive properties, star schemas do not contend well with so-called slowly changing dimensions and with state-oriented data. We study the use of so-called temporal star schemas that may provide a solution to the identified problems while not fundamentally changing the database design approach. More specifically, we study the relative database size and query performance when using regular star schemas and their temporal counterparts for state-oriented data. We also offer some insight into the relative ease of understanding and querying databases with regular and temporal star schemas.
Böhlen, M., R. Busatto, C. S. Jensen, "Point versus Interval-based Temporal Data Models," TimeCenter Technical Report TR-21, 14 pages, January 1998.

Publication
The association of timestamps with various data items such as tuples or attribute values is fundamental to the management of time-varying information. Using intervals in timestamps, as do most data models, leaves a data model with a variety of choices for giving a meaning to timestamps. Specifically, some such data models claim to be point-based while other data models claim to be interval-based. The meaning chosen for timestamps is important: it has a pervasive effect on most aspects of a data model, including database design, a variety of query language properties, and query processing techniques, e.g., the availability of query optimization opportunities.
This paper precisely defines the notions of point-based and interval-based temporal data models, thus providing a new, formal basis for characterizing temporal data models and obtaining new insights into the properties of their query languages. Queries in point-based models treat snapshot equivalent argument relations identically. This renders point-based models insensitive to coalescing. In contrast, queries in interval-based models give significance to the actual intervals used in the timestamps, thus generally treating non-identical, but possibly snapshot equivalent, relations differently. The paper identifies the notion of time-fragment preservation as the essential defining property of an interval-based data model.
1997

Clifford, J., C. Dyreson, T. Isakowitz, C. S. Jensen, R. T. Snodgrass, "On the Semantics of "Now" in Databases," ACM Transactions on Database Systems. Vol. 22, No. 2, pp. 171-214, June 1997.

Publication [not publicly available]

ACM Author-Izer
Although "now" is expressed in SQL as CURRENT_TIMESTAMP within queries, this value cannot be stored in the database. However, this notion of an ever-increasing current-time value has been reflected in some temporal data models by inclusion of database-resident variables, such as "now", "until-changed," "x," "@," and "-". Time variables are very desirable, but their use also leads to a new type of database, consisting of tuples with variables, termed a variable database.
This article proposes a framework for defining the semantics of the variable databases of the relational and temporal relational data models. A framework is presented because several reasonable meanings may be given to databases that use some of the specific temporal variables that have appeared in the literature. Using the framework, the article defines a useful semantics for such databases. Because situations occur where the existing time variables are inadequate, two new types of modeling entities that address these shortcomings, timestamps that we call now-relative and now-relative indeterminate, are introduced and defined within the framework. Moreover, the article provides a foundation, using algebraic bind operators, for the querying of variable databases via existing query languages. This transition to variable databases presented here requires minimal change to the query processor. Finally, to underline the practical feasibility of variable databases, we show that database variables can be precisely specified and efficiently implemented in conventional query languages, such as SQL, and in temporal query languages, such as TSQL2.
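To give a feel for what binding a time variable means, here is a minimal Python sketch. It is an illustration under stated assumptions rather than the article's formal definitions: the marker `NOW`, the function `bind`, the integer time domain, and the rule that tuples starting after the reference time are dropped are all choices made for this example.

```python
from typing import List, Tuple, Union

NOW = "NOW"  # hypothetical marker standing in for a database-resident time variable

Endpoint = Union[int, str]
VarRow = Tuple[str, int, Endpoint]  # (value, start, end); end may be the variable NOW
Row = Tuple[str, int, int]


def bind(relation: List[VarRow], reference_time: int) -> List[Row]:
    """Replace each NOW by the reference time, yielding a variable-free relation."""
    bound: List[Row] = []
    for value, start, end in relation:
        end_bound = reference_time if end == NOW else end
        # Assumption for this sketch: a tuple starting after the bound end is omitted.
        if start <= end_bound:
            bound.append((value, start, end_bound))
    return bound


employees: List[VarRow] = [("Ann", 3, NOW), ("Bob", 1, 4), ("Eve", 12, NOW)]
at_10 = bind(employees, 10)
at_15 = bind(employees, 15)
```

The same stored relation yields different variable-free relations depending on the reference time, which is why a single stored tuple with `NOW` can stand for an ever-growing interval.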
Bair, J., M. H. Böhlen, C. S. Jensen, R. T. Snodgrass, "Notions of Upward Compatibility of Temporal Query Languages," Wirtschaftsinformatik, Vol. 39, No. 1, pp. 25-34, February 1997.

Publication [not publicly available]

Online at Wirtschaftsinformatik
Migrating applications from conventional to temporal database management technology has received scant mention in the research literature. This paper formally defines three increasingly restrictive notions of upward compatibility which capture properties of a temporal SQL with respect to conventional SQL that, when satisfied, provide for a smooth migration of legacy applications to a temporal system. The notions of upward compatibility dictate the semantics of conventional SQL statements and constrain the semantics of extensions to these statements. The paper evaluates the seven extant temporal extensions to SQL, all of which are shown to complicate migration through design decisions that violate one or more of these notions. We then outline how SQL-92 can be systematically extended to become a temporal query language that satisfies all three notions.
Gregersen, H., C. S. Jensen, L. Mark, "Evaluating Temporally Extended ER-Models," in Proceedings of the CAiSE'97/IFIP 8.1 International Workshop on Evaluation of Modeling Methods in Systems Analysis and Design, Barcelona, Spain, 12 pages, June 16-17, 1997.

Publication
The Entity-Relationship (ER) Model is enjoying a remarkable popularity in industry. It has been widely recognized that while temporal aspects of data play a prominent role in database applications, these aspects are difficult to capture using the ER model. Some industrial users have responded to this deficiency by ignoring all temporal aspects in their ER diagrams and simply supplementing the diagrams with phrases like "full temporal support." The research community has responded by developing about a dozen proposals for temporally extended ER models. These temporally extended ER models were accompanied by only few or no specific criteria for designing them, making it difficult to appreciate their properties and to conduct an insightful comparison of the models. This paper defines a set of design criteria that may be used for evaluating and comparing the existing temporally extended ER models.
Torp, K., C. S. Jensen, M. H. Böhlen, "Layered Temporal DBMSs - Concepts and Techniques," Fifth International Conference on Database Systems for Advanced Applications, Melbourne, Australia, 10 pages, pp. 371-380, April 1-4, 1997.

Publication
A wide range of database applications manage time-varying data, and it is well-known that querying and correctly updating time-varying data is difficult and error-prone when using standard SQL. Temporal extensions of SQL offer substantial benefits over SQL when managing time-varying data.
The topic of this paper is the effective implementation of temporally extended SQLs. Traditionally, it has been assumed that a temporal DBMS must be built from scratch, utilizing new technologies for storage, indexing, query optimization, concurrency control, and recovery. In contrast, this paper explores the concepts and techniques involved in implementing a temporally enhanced SQL while maximally reusing the facilities of an existing SQL implementation. The topics covered span the choice of an adequate timestamp domain that includes the time variable NOW, a comparison of query processing architectures, and transaction processing, the latter including how to ensure ACID properties and assign timestamps to updates.
Tryfona, N., C. S. Jensen, "Conceptual Design of Spatio-Temporal Applications: Requirements and Solutions (extended abstract)," Proceedings of the First Chorochronos Intensive Workshop on Spatio-Temporal Database Systems, Petronell-Carnuntum, Austria, 6 pages, November 13-15, 1997.

Publication
Jensen, C. S., R. T. Snodgrass, "TimeCenter Prospectus," TimeCenter Technical Report, Internal TR-1, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 18 pages, January 1997.

Böhlen, M. H., C. S. Jensen, B. Skjellaug, "Spatio-Temporal Database Support for Legacy Applications," TimeCenter Technical Report TR-20, 21 pages, July 1997.

Publication
In areas such as finance, marketing, and property and resource management, many database applications manage spatio-temporal data. These applications typically run on top of a relational DBMS and either manage spatio-temporal data using the DBMS, which provides little support, or employ the services of a proprietary system that co-exists with the DBMS but is separate from, and not integrated with, it. This wealth of applications may benefit substantially from built-in, integrated spatio-temporal DBMS support. Providing a foundation for such support is an important and substantial challenge.
This paper initially defines technical requirements to a spatio-temporal DBMS aimed at protecting business investments in the existing legacy applications and at reusing personnel expertise. These requirements provide a foundation for making it economically feasible to migrate legacy applications to a spatio-temporal DBMS. The paper next presents the design of the core of a spatio-temporal extension to SQL 92, called STSQL, that satisfies the requirements. STSQL supports multiple temporal as well as spatial dimensions. Queries may ignore any dimension; this provides an important kind of upward compatibility with SQL 92. Queries may also view the tables in a dimensional fashion, where the DBMS provides so-called snapshot reducible query processing for each dimension. Finally, queries may view dimension attributes as if they are no different from other attributes.
Jensen, C. S., R. T. Snodgrass, "Temporal Data Management," TimeCenter Technical Report TR-17, 12 pages, June 1997.

Publication
A wide range of database applications manage time-varying information. Existing database technology currently provides little support for managing such data. The research area of temporal databases has made important contributions in characterizing the semantics of such information and in providing expressive and efficient means to model, store, and query temporal data. This paper introduces the reader to temporal data management, surveys state-of-the-art solutions to challenging aspects of temporal data management, and points to research directions.
Tsotras, V., C. S. Jensen, R. T. Snodgrass, "A Notation for Spatiotemporal Queries," TimeCenter Technical Report TR-10, 13 pages, April 1997.

Publication
Temporal, spatial and spatiotemporal queries are inherently multidimensional, combining predicates on time dimension(s) with predicates on explicit attributes and/or several spatial dimensions. In the past there was no consistent way to refer to temporal or spatiotemporal queries, thus leading to considerable confusion. In an attempt to eliminate this problem, we propose a new notation for such queries. Our notation is simple and extensible and can be easily applied to multidimensional queries in general.
Snodgrass, R. T., M. H. Böhlen, C. S. Jensen, A. Steiner, "Transitioning Temporal Support in TSQL2 to SQL3," TimeCenter Technical Report TR-8, 28 pages, April 1997.

Publication
This document summarizes the proposals before the SQL3 committees to allow the addition of tables with valid-time and transaction-time support into SQL/Temporal, and explains how to use these facilities to migrate smoothly from a conventional relational system to one encompassing temporal support. Initially, important requirements to a temporal system that may facilitate such a transition are motivated and discussed. The proposal then describes the language additions necessary to add valid-time support to SQL3 while fulfilling these requirements. The constructs of the language are divided into four levels, with each level adding increased temporal functionality to its predecessor. A prototype system implementing these constructs on top of a conventional DBMS is publicly available.
Torp, K., C. S. Jensen, R. T. Snodgrass, "Stratum Approaches to Temporal DBMS Implementation," TimeCenter Technical Report TR-5, 18 pages, March 1997.

Publication
Previous approaches to temporal databases have assumed that a temporal database management system (temporal DBMS) must be implemented from scratch, as an integrated architecture to yield adequate performance and to use new temporal implementation techniques, such as temporal indexes and join algorithms. However, this is a very large and time-consuming task.
In this paper we explore how a temporal DBMS can be implemented in a stratum on top of an existing non-temporal DBMS, rendering the task more feasible because it reuses much of the functionality of the underlying conventional DBMS.
At the outset, we discuss the advantages and disadvantages of the stratum architecture compared to the integrated architecture, and we present a set of criteria for a stratum architecture. Subsequently, three different meta architectures for implementing a temporal DBMS in a stratum are identified. Each meta architecture contains several specific architectures, which are examined in turn. Existing temporal DBMS implementations are classified according to the specific architectures identified. Finally, the specific architectures are evaluated according to our criteria.
We conclude that a stratum architecture is the best short-, medium-, and perhaps even long-term approach to implementing a temporal DBMS. Further, it is possible to integrate existing conventional DBMSs with new temporal implementation techniques, blurring the differences between integrated and stratum architectures.
Torp, K., R. T. Snodgrass, C. S. Jensen, "Correct and Efficient Timestamping of Temporal Data," TimeCenter Technical Report TR-4, 20 pages, March 1997.

Publication
Previous approaches to timestamping temporal data have implicitly assumed that transactions have no duration. In this paper we identify several situations where a sequence of operations over time within a single transaction can violate ACID properties.
It has been previously shown that the transaction-time dimension must be timestamped after commit. This time is not known within the transaction. We describe how to correctly implement most queries that make explicit reference to this (unknown) transaction time, and argue that the rest, which can be syntactically identified, can only be answered with an approximation of the correct value.
The drawback of timestamping after commit is that it requires revisiting tuples. We show that this expensive revisiting step is required only before any queries or modifications in subsequent transactions that access prior states; in most cases, revisiting tuples can be postponed, and when to revisit can be syntactically determined. We propose several strategies for revisiting tuples, and we empirically evaluate these strategies in order to determine under which circumstances each is best.
1996 top Jensen, C. S., R. T. Snodgrass, M. D. Soo, "Extending Existing Dependency Theory to Temporal Databases," IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 4, pp. 563-582, August 1996.

Publication
Transaction-time databases support access to not only the current database state, but also previous database states. Supporting access to previous database states requires large quantities of data and necessitates efficient temporal query processing techniques. Previously, we presented a log-based storage structure and algorithms for the differential computation of previous database states. Timeslices, i.e., previous database states, are computed by traversing a log of database changes, using previously computed and cached timeslices as outsets. When computing a new timeslice, the cache will contain two candidate outsets: an earlier outset and a later outset. The new timeslice can be computed by either incrementally updating the earlier outset or decrementally downdating the later outset using the log. The cost of this computation is determined by the size of the log between the outset and the new timeslice. The paper proposes an efficient algorithm that identifies the cheaper outset for the differential computation. The basic idea is to compute the sizes of the two pieces of the log by maintaining and using a tree structure on the timestamps of the database changes in the log. The lack of a homogeneous node structure, of a controllable and high fill factor for nodes, and of appropriate node allocation in existing tree structures (e.g., B+ trees, Monotonic B+ trees, and Append-only trees) renders them unsuited for our use. Consequently, a specialized tree structure, the pointer-less insertion tree, is developed to support the algorithm. As a proof of concept, we have implemented a main memory version of the algorithm and its tree structure.
Jensen, C. S., R. T. Snodgrass, "Semantics of Time-Varying Information," Information Systems, Vol. 21, No. 4, pp. 311-352, 1996.

Publication [not publicly available]

Online by Elsevier at ScienceDirect
This paper provides a systematic and comprehensive study of the underlying semantics of temporal databases, summarizing the results of an intensive collaboration between the two authors over the last five years. We first examine how facts may be associated with time, most prominently with one or more dimensions of valid time and transaction time. One common case is that of a bitemporal relation, in which facts are associated with timestamps from exactly one valid-time and one transaction-time dimension. These two times may be related in various ways, yielding temporal specialization. Multiple transaction times arise when a fact is stored in one database, then later replicated or transferred to another database. By retaining the transaction times, termed temporal generalization, the original relation can be effectively queried by referencing only the final relation. We attempt to capture the essence of time-varying information via a very simple data model, the bitemporal conceptual data model. Emphasis is placed on the notion of snapshot equivalence of the information content of relations of different data models. The logical design of temporal databases is a natural next topic. Normal forms play a central role during the design of conventional relational databases. We show how to extend the existing relational dependency theory, including the dependencies themselves, keys, normal forms, and schema decomposition algorithms, to apply to temporal relations. However, this theory does not fully take into account the temporal semantics of the attributes of temporal relations. To address this deficiency, we study the semantics of individual attributes. One aspect is the observation and update patterns of attributes - when an attribute changes value and when the changes are recorded in the database, respectively. A related aspect is when an attribute has some value, termed its lifespan. 
Yet another aspect is the values themselves of attributes - how to derive a value for an attribute at any point in time from stored values, termed temporal derivation. This study of attribute semantics leads to the formulation of temporal guidelines for logical database design.
Jensen, C. S., R. T. Snodgrass, "Semantics of Time-Varying Information," Technical Report R-96-2008, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 50 pages, March 1996.

Snodgrass, R. T., M. Böhlen, C. S. Jensen, A. Steiner, "Adding Valid Time to SQL/Temporal," ANSI Expert's Contribution, ANSI X3H2-96-501r1, ISO/IEC JTC1/SC21/WG3 DBL MAD-146r2, International Organization for Standardization, 77 pages, November 1996.

Publication

Online at University of Arizona
This change proposal specifies the addition of tables with valid-time support into SQL/Temporal, and explains how to use these facilities to migrate smoothly from a conventional relational system to a temporal system. Initially, important requirements to a temporal system that may facilitate such a transition are motivated and discussed. The proposal then describes the language additions necessary to add valid-time support to SQL3 while fulfilling these requirements. The constructs of the language are divided into four levels, with each level adding increased temporal functionality to its predecessor. The proposal formally defines the semantics of the query language by providing a denotational semantics mapping to well-defined algebraic expressions. Several alternatives for implementing the language constructs are listed. A prototype system implementing these constructs on top of a conventional DBMS is publicly available.
Snodgrass, R. T., M. Böhlen, C. S. Jensen, A. Steiner, "Adding Transaction Time to SQL/Temporal," ANSI Expert's Contribution, ANSI X3H2-96-502r2, ISO/IEC JTC1/SC21/WG3 DBL MAD-147r2, International Organization for Standardization, 47 pages, November 1996.

Publication
Transaction time identifies when data was asserted in the database. If transaction time is supported, the states of the database at all previous points of time are retained. This change proposal specifies the addition of transaction time, in a fashion consistent with that already proposed for valid time. In particular, constructs to create tables with valid-time and transaction-time support and to query such tables with temporal upward compatibility, sequenced semantics, and nonsequenced semantics, orthogonally for valid and transaction time, are defined. These constructs can also be used in modifications, assertions, cursors, and views.
Böhlen, M. H., C. S. Jensen, "Seamless Integration of Time into SQL," Technical Report R-96-2049, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 51 pages, December 1996.

Gregersen, H., C. S. Jensen, "Temporal Entity-Relationship Models-a Survey," Technical Report R-96-2039, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 41 pages. Also TimeCenter Technical Report TR-3, September 1996.

Publication
The Entity-Relationship (ER) Model, using varying notations and with some semantic variations, is enjoying a remarkable, and increasing, popularity in the research community, the computer science curriculum, and industry. In step with the increasing diffusion of relational platforms, ER modeling is growing in popularity. It has been widely recognized that temporal aspects of database schemas are prevalent and difficult to model using the ER model. As a result, how to enable the ER model to properly capture time-varying information has for a decade and a half been an active area of the database research community. This has led to the proposal of almost a dozen temporally enhanced ER models.
This paper surveys all temporally enhanced ER models known to the authors. It is the first paper to provide a comprehensive overview of temporal ER modeling, and it thus meets a need for consolidating and providing easy access to the research in temporal ER modeling. In the presentation of each model, the paper examines how the time-varying information is captured in the model and presents the new concepts and modeling constructs of the model. A total of 20 different design properties for temporally enhanced ER models are defined, and each model is characterized according to these properties.
Bair, J., M. Böhlen, C. S. Jensen, R. T. Snodgrass, "Notions of Upward Compatibility of Temporal Query Languages," Technical Report R-96-2038, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 25 pages. Also TimeCenter Technical Report TR-6, September 1996.

Publication
Migrating applications from conventional to temporal database management technology has received scant mention in the research literature. This paper formally defines three increasingly restrictive notions of upward compatibility which capture properties of a temporal SQL with respect to conventional SQL that, when satisfied, provide for a smooth migration of legacy applications to a temporal system. The notions of upward compatibility dictate the semantics of conventional SQL statements and constrain the semantics of extensions to these statements. The paper evaluates the seven extant temporal extensions to SQL, all of which are shown to complicate migration through design decisions that violate one or more of these notions. We then outline how SQL-92 can be systematically extended to become a temporal query language that satisfies all three notions.
Torp, K., C. S. Jensen, M. Böhlen, "Layered Implementation of Temporal DBMSs - Concepts and Techniques," Technical Report R-96-2037, Aalborg University, Department of Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 25 pages. Also TimeCenter Technical Report TR-2, September 1996.

Publication
A wide range of database applications manage time-varying data. Examples include accounting, personnel, schedule, and data warehousing applications. At the same time, it is well-known that querying and correctly updating time-varying data is difficult and error-prone when using standard SQL. As a result of a decade of intensive exploration, temporal extensions of SQL have reached a level of maturity and sophistication where it is clear that they offer substantial benefits over SQL when managing time-varying data.
The topic of this paper is the effective implementation of temporally extended SQLs. Traditionally, it has been assumed that a temporal DBMS must be built from scratch, utilizing new technologies for storage, indexing, query optimization, concurrency control, and recovery. This paper adopts a quite different approach. Specifically, it explores the concepts and techniques involved in implementing a temporally enhanced SQL while maximally reusing the facilities of an existing SQL implementation, e.g., Oracle or DB2. The topics covered span the choice of an adequate timestamp domain that includes the time variable "now," a comparison of alternative query processing architectures including a partial parser approach, update processing, and transaction processing, the latter including how to ensure ACID properties and assign correct timestamps.
1995 top Jensen, C. S., R. T. Snodgrass, "Semantics of Time-Varying Attributes and Their Use for Temporal Database Design," Fourteenth International Conference on Object-Oriented and Entity Relationship Modeling, Queensland, Australia, pp. 366-377, December 13-15, 1995.

Publication
Based on a systematic study of the semantics of temporal attributes of entities, this paper provides new guidelines for the design of temporal relational databases. The notions of observation and update patterns of an attribute capture when the attribute changes value and when the changes are recorded in the database. A lifespan describes when an attribute has a value. And derivation functions describe how the values of an attribute for all times within its lifespan are computed from stored values. The implications for temporal database design of the semantics that may be captured using these concepts are formulated as schema decomposition rules.
Böhlen, M. H., C. S. Jensen, R. T. Snodgrass, "Evaluating the Completeness of TSQL2," in Proceedings of the International Workshop on Temporal Databases, Zürich, Switzerland, pp. 153-172. The proceedings are entitled Recent Advances in Temporal Databases and are published by Springer-Verlag in their Workshops in Computing Series, September 17-18, 1995.

Publication
The question of what is a well-designed temporal data model and query language is a difficult, but also an important one. The consensus temporal query language TSQL2 attempts to take advantage of the accumulated knowledge gained from designing and studying many of the earlier models and languages. In this sense, TSQL2 represents a constructive answer to this question. Others have provided analytical answers by developing criteria, formulated as completeness properties, for what is a good model and language.
This paper applies important existing completeness notions to TSQL2 in order to evaluate the design of TSQL2. It is shown that TSQL2 satisfies only a subset of these completeness notions.
Snodgrass, R. T., C. S. Jensen, C. E. Dyreson, W. Käfer, N. Kline, J. F. Roddick, "A Second Example," Chapter 4, pp. 47-70, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages, 1995.

Publication [not publicly available]
Jensen, C. S., R. T. Snodgrass, "The Surrogate Data Type," Chapter 9, pp. 149-152, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages, 1995.

Publication [not publicly available]
Jensen, C. S., R. T. Snodgrass, M. D. Soo, "The TSQL2 Data Model," Chapter 10, pp. 153-238, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages, 1995.

Publication [not publicly available]
Snodgrass, R. T., C. S. Jensen, M. D. Soo, "Schema Specification," Chapter 11, pp. 239-242, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages, 1995.

Publication [not publicly available]
Snodgrass, R. T., C. S. Jensen, F. Grandi, "The From Clause," Chapter 12, pp. 243-248, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages, 1995.

Publication [not publicly available]
Hsu, S., C. S. Jensen, R. T. Snodgrass, "Valid-Time Selection and Projection," Chapter 13, pp. 249-296, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages, 1995.

Publication [not publicly available]
Leung, T. Y. C., C. S. Jensen, R. T. Snodgrass, "Modification," Chapter 14, pp. 297-302, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages, 1995.

Publication [not publicly available]
Jensen, C. S., R. T. Snodgrass, T. Y. C. Leung, "Cursors," Chapter 15, pp. 303-308, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages, 1995.

Publication [not publicly available]
Clifford, J., C. E. Dyreson, R. T. Snodgrass, T. Isakowitz, C. S. Jensen, "Now," Chapter 20, pp. 383-392, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages, 1995.

Publication [not publicly available]
Jensen, C. S., "Vacuuming," Chapter 23, pp. 447-458, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages, 1995.

Publication [not publicly available]
Soo, M. D., C. S. Jensen, R. T. Snodgrass, "An Architectural Framework," Chapter 24, pp. 461-470, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages, 1995.

Publication [not publicly available]
Soo, M. D., C. S. Jensen, R. T. Snodgrass, "An Algebra for TSQL2," Chapter 27, pp. 501-541, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages, 1995.

Publication [not publicly available]
Snodgrass, R. T., I. Ahn, G. Ariav, D. S. Batory, J. Clifford, C. E. Dyreson, R. Elmasri, F. Grandi, C. S. Jensen, W. Käfer, N. Kline, K. Kulkarni, T. Y. C. Leung, N. Lorentzos, J. F. Roddick, A. Segev, M. D. Soo, S. M. Sripada, "Language Syntax," Chapters 28-38, pp. 549-629, in The TSQL2 Temporal Query Language, edited by R. T. Snodgrass, Kluwer Academic Publishers, 674+xxiv pages, 1995.

Publication [not publicly available]
Segev, A., C. S. Jensen, R. T. Snodgrass, "Report on The 1995 International Workshop on Temporal Databases," in ACM SIGMOD Record, Vol. 24, No. 4, pp. 46-52, December 1995.

Publication [not publicly available]

ACM Author-Izer
This paper provides an overview of the 1995 International Workshop on Temporal Databases. It summarizes the technical papers and related discussions, and three panels: "Wither TSQL3?", "Temporal Data Management in Financial Applications," and "Temporal Data Management Infrastructure & Beyond."
Snodgrass, R. T., M. H. Böhlen, C. S. Jensen, A. Steiner, "Change Proposal for SQL/Temporal: Adding Valid Time-Part A," Technical Report R-95-2024, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 36 pages, December 1995.

Publication
The effective management of time-varying information is of essence in a wide range of applications. Such applications may benefit substantially from built-in temporal support in the database management system. It is also important to be able to transition effectively from a non-temporal to a temporal DBMS.
This change proposal specifies the addition of temporal tables into SQL/Temporal, and explains how to use these facilities to migrate smoothly from a conventional relational system to a temporal system. Initially, important requirements to a temporal system that may facilitate a smooth transition are motivated and discussed. The proposal then describes the language additions necessary to add temporal support to SQL3 while fulfilling these requirements. The constructs of the language are divided into four levels, with each level adding increased temporal functionality to its predecessor.
The proposal formally defines the semantics of the query language by providing a denotational semantics mapping to well-defined algebraic expressions. Several alternatives for implementing the language constructs are listed, ranging from minimal extensions of SQL3 systems to alternatives that may exploit temporal query processing techniques and indices to achieve better performance. A prototype system implementing these constructs on top of a conventional DBMS is publicly available.
Böhlen, M. H., C. S. Jensen, R. T. Snodgrass, "Evaluating and Enhancing the Completeness of TSQL2," Technical Report 95-5, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 26 pages, July 1995.

Publication
The question of what is a well-designed temporal data model and query language is a difficult, but also an important one. The consensus temporal query language TSQL2 attempts to take advantage of the accumulated knowledge gained from designing and studying many of the earlier models and languages. In this sense, TSQL2 represents a constructive answer to this question. Others have provided analytical answers by developing criteria, formulated as completeness properties, for what is a good model and language.
This paper applies important existing completeness notions to TSQL2 in order to evaluate the design of TSQL2. It is shown that TSQL2 satisfies only a subset of these completeness notions. In response to this, a minimally modified version of TSQL2, termed Applied TSQL2, is proposed; this new language satisfies the notions of temporal semi-completeness and completeness which are not satisfied by TSQL2. An outline of the formal semantics for Applied TSQL2 is given.
Jensen, C. S., R. T. Snodgrass, "Semantics of Time-Varying Attributes and Their Use for Temporal Database Design," Technical Report R-95-2012, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 28 pages, May 1995.

Publication
This paper concerns the design of temporal relational database schemas.
Normal forms play a central role during the design of conventional relational databases, and we have previously extended all existing relational normal forms to apply to temporal relations. However, these normal forms are all atemporal in nature and do not fully take into account the temporal semantics of the attributes of temporal relations. Consequently, additional guidelines for the design of temporal relations are required.
This paper presents a systematic study of important aspects of the temporal semantics of attributes. One such aspect is the observation and update patterns of attributes - when an attribute changes value and when the changes are recorded in the database. A related aspect is when the attributes have values. A third aspect is the values themselves of attributes - how to derive a value for an attribute at any point in time from stored values. Guidelines for the design of the logical schema of a temporal database are introduced, and implications of the temporal-attribute semantics for the design of views and the physical schema are considered. The Bitemporal Conceptual Data Model, the data model of the consensus temporal query language TSQL2, serves as the context for the study.
1994 top Jensen, C. S., M. D. Soo, R. T. Snodgrass, "Unifying Temporal Data Models via a Conceptual Model," Information Systems, Vol. 19, No. 7, pp. 513-547, December 1994.

Publication [not publicly available]

Online by Elsevier at ScienceDirect
To add time support to the relational model, both first normal form (1NF) and non-1NF data models have been proposed. Each has associated advantages and disadvantages. For example, remaining within 1NF when time support is added may introduce data redundancy. On the other hand, well-established storage organization and query evaluation techniques require atomic attribute values, and are thus intended for 1NF models; utilizing a non-1NF model may degrade performance. This paper describes a new temporal data model designed with the single purpose of capturing the time-dependent semantics of data. Here, tuples of bitemporal relations are stamped with sets of two-dimensional chronons in transaction-time/valid-time space. We use the notion of snapshot equivalence to map temporal relation instances and temporal operators of one existing model to equivalent instances and operators of another. We examine five previously proposed schemes for representing bitemporal data: two are tuple-timestamped 1NF representations, one is a Backlog relation composed of 1NF timestamped change requests, and two are non-1NF attribute value-timestamped representations. The mappings between these models are possible using mappings to and from the new conceptual model. The framework of well-behaved mappings between models, with the new conceptual model at the center, illustrates how it is possible to use different models for display and storage purposes in a temporal database system. Some models provide rich structure and are useful for display of temporal data, while other models provide regular structure useful for storing temporal data. The equivalence mappings effectively move the distinction between the investigated data models from a semantic basis to a display-related or a physical, performance-relevant basis, thereby allowing the exploitation of different data models by using each for the task(s) for which they are best suited.
Jensen, C. S., R. Snodgrass, "Temporal Specialization and Generalization," IEEE Transactions on Knowledge and Data Engineering, Vol. 6, No. 6, pp. 954-974, December 1994.

Publication
A standard relation has two dimensions: attributes and tuples. A temporal relation contains two additional orthogonal time dimensions: valid time records when facts are true in the modeled reality, and transaction time records when facts are stored in the temporal relation. Although there are no restrictions between the valid time and transaction time associated with each fact, in many practical applications the valid and transaction times exhibit restricted interrelationships that define several types of specialized temporal relations. This paper examines areas where different specialized temporal relations are present. In application systems with multiple, interconnected temporal relations, multiple time dimensions may be associated with facts as they flow from one temporal relation to another. The paper investigates several aspects of the resulting generalized temporal relations, including the ability to query a predecessor relation from a successor relation. The presented framework for generalization and specialization allows one to precisely characterize and compare temporal relations and the application systems in which they are embedded. The framework's comprehensiveness and its use in understanding temporal relations are demonstrated by placing previously proposed temporal data models within the framework. The practical relevance of the defined specializations and generalizations is illustrated by sample realistic applications in which they occur. The additional semantics of specialized relations are especially useful for improving the performance of query processing.
Soo, M. D., R. T. Snodgrass, C. S. Jensen, "Efficient Evaluation of the Valid-Time Natural Join," in Proceedings of the Tenth IEEE International Conference on Data Engineering, Houston, TX, pp. 282-292, February 14-18, 1994.

Publication
Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the optimization of joins with equality predicates, rather than the inequality predicates prevalent in valid-time queries. Second, the presence of temporally-varying data dramatically increases the size of the database. These factors require new techniques to efficiently evaluate valid-time joins.
We address this need for efficient join evaluation in databases supporting valid-time. A new temporal join algorithm based on tuple partitioning is introduced. This algorithm avoids the quadratic cost of nested-loop evaluation methods; it also avoids sorting. Performance comparisons between the partition-based algorithm and other evaluation methods are provided. While we focus on the important valid-time natural join, the techniques presented are also applicable to other valid-time joins.
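As a point of reference for the operator described in this abstract, the following minimal Python sketch shows the semantics of a valid-time natural join using a naive nested loop. This is the quadratic baseline the abstract contrasts against, not the partition-based algorithm of the paper. The tuple layout (a dict of explicit attributes plus a half-open valid-time interval of integer chronons) and the name `valid_time_natural_join` are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (explicit attributes, valid-time start, valid-time end),
# with the valid time given as a half-open interval [start, end).
VTuple = Tuple[Dict[str, object], int, int]


def valid_time_natural_join(r: List[VTuple], s: List[VTuple]) -> List[VTuple]:
    """Naive nested-loop valid-time natural join (illustrative baseline only)."""
    result: List[VTuple] = []
    for r_attrs, r_start, r_end in r:
        for s_attrs, s_start, s_end in s:
            shared = r_attrs.keys() & s_attrs.keys()
            # Equality predicate on the shared explicit attributes.
            if any(r_attrs[a] != s_attrs[a] for a in shared):
                continue
            # Inequality predicates on time: the two intervals must overlap.
            start, end = max(r_start, s_start), min(r_end, s_end)
            if start < end:
                # The result tuple is stamped with the intersection of the intervals.
                result.append(({**r_attrs, **s_attrs}, start, end))
    return result


employees = [({"name": "Ann", "dept": "Toy"}, 1, 10)]
managers = [({"dept": "Toy", "mgr": "Bob"}, 5, 20),
            ({"dept": "Toy", "mgr": "Eve"}, 20, 30)]
joined = valid_time_natural_join(employees, managers)
```

In this made-up example, only the first manager tuple overlaps Ann's valid time, so the join yields a single tuple valid during [5, 10).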
Snodgrass, R. T., I. Ahn, G. Ariav, D. S. Batory, J. Clifford, C. E. Dyreson, R. Elmasri, F. Grandi, C. S. Jensen, W. Käfer, N. Kline, K. Kulkarni, T. Y. C. Leung, N. Lorentzos, J. F. Roddick, A. Segev, M. D. Soo, S. M. Sripada, "A TSQL2 Tutorial," in ACM SIGMOD Record, Vol. 23, No. 3, pp. 27-33, September 1994.

Publication [not publicly available]

ACM Author-Izer
Snodgrass, R. T., I. Ahn, G. Ariav, D. S. Batory, J. Clifford, C. E. Dyreson, R. Elmasri, F. Grandi, C. S. Jensen, W. Käfer, N. Kline, K. Kulkarni, T. Y. C. Leung, N. Lorentzos, J. F. Roddick, A. Segev, M. D. Soo, S. M. Sripada, "TSQL2 Language Specification," in ACM SIGMOD Record, Vol. 23, No. 1, pp. 65-86, March 1994.

Publication [not publicly available]

ACM Author-Izer
Dahl, K., H. Gregersen, C. A. Have, C. S. Jensen, J. Sigurdsson, J. S. Winter, "Databasebenchmarks," in PROSA-bladet, No. 5, pp. 15-17 (in Danish), May 1994.

Publication
Jensen, C. S. et al. (editors, with multiple other contributors), "A Consensus Glossary of Temporal Database Concepts," in ACM SIGMOD Record, Vol. 23, No. 1, pp. 52-65 (Special Section: Temporal Database Infrastructure), March 1994.

Publication [not publicly available]

Online at ACM Digital Library
This document contains definitions of a wide range of concepts specific to and widely used within temporal databases. In addition to providing definitions, the document also includes separate explanations of many of the defined concepts. Two sets of criteria are included. First, all included concepts were required to satisfy four relevance criteria, and, second, the naming of the concepts was resolved using a set of evaluation criteria. The concepts are grouped into three categories: concepts of general database interest, of temporal database interest, and of specialized interest. This document is a digest of a full version of the glossary. In addition to the material included here, the full version includes substantial discussions of the naming of the concepts.
The consensus effort that led to this glossary was initiated in early 1992. Earlier status documents appeared in March 1993 and December 1992 and included terms proposed after an initial glossary appeared in SIGMOD Record in September 1992. The present glossary subsumes all the previous documents. It was most recently discussed at the "ARPA/NSF International Workshop on an Infrastructure for Temporal Databases," in Arlington, TX, June 1993, and is recommended by a significant part of the temporal database community. The glossary meets a need for creating a higher degree of consensus on the definition and naming of temporal database concepts.
Snodgrass, R. T. (ed.), I. Ahn, G. Ariav, P. Bayer, J. Clifford, C. E. Dyreson, F. Grandi, L. Hermosilla, C. S. Jensen, W. Käfer, N. Kline, K. Kulkarni, T. Y. C. Leung, N. Lorentzos, Y. Mitsopoulos, J. F. Roddick, M. D. Soo, S. M. Sripada, "An Evaluation of TSQL2," 53 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
This document evaluates the TSQL2 language against a sizable consensus test suite of temporal database queries. The test suite consists of a database schema, an instance for the schema, and a set of approximately 150 queries on this database. The reader is cautioned that the queries have not been independently validated nor tested on an implementation of TSQL2. Given the number and complexity of some of the queries, there are most certainly errors.
Soo, M. D., C. S. Jensen, R. T. Snodgrass, "An Algebra for TSQL2," Technical Report R-94-2053, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 33 pages, December 1994.

Publication
Hsu, S., C. S. Jensen, R. T. Snodgrass, "Valid-time Selection and Projection in TSQL2," Technical Report R-94-2052, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 36 pages, December 1994.

Publication
Temporal databases have now been studied for more than a decade. During that period of time, numerous query languages have been proposed for temporal databases. One of the essential components of a temporal query language is valid-time selection, which allows the user to retrieve tuples according to their valid-time relationship. Often, this component is closely tied to another important component, valid-time projection, which defines the timestamps of the tuples in query results.
Here, nine different temporal query languages, primarily SQL and QUEL extensions, are examined with a focus on valid-time selection and projection. Based on that survey, the specific design of the valid-time selection and projection components of the consensus temporal query language TSQL2 is presented.
Jensen, C. S., R. T. Snodgrass, M. D. Soo, "The TSQL2 Data Model," Technical Report R-94-2051, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 60 pages, December 1994.

Publication
Torp, K., L. Mark, C. S. Jensen, "Efficient Differential Timeslice Computation," Technical Report R-94-2055, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 26 pages, December 1994.

Publication
Transaction-time databases record all previous database states and are ever-growing, leading to potentially huge quantities of data. For that reason, efficient query processing and the utilization of cheap write-once storage media are of particular importance. This is facilitated by adopting a log-based storage structure. Timeslices, i.e., relation states or snapshots, are computed by traversing the logs, using previously computed and cached timeslices as outsets. When computing a new timeslice, the cache will contain two candidate outsets: an earlier outset and a later outset. The new timeslice can be computed by either incrementally updating the earlier outset or decrementally downdating the later outset. This paper proposes an algorithm that efficiently identifies the cheaper outset.
Perhaps the most obvious algorithm uses the proximity in time between the earlier and later outsets and the new timeslice as the basis for its measure of cost. Unfortunately, this is not a reliable measure of the number of changes recorded in the logs between each of the two outsets and the new timeslice, because the amount of change to the database may vary substantially over time. We subsequently investigated a number of index structures on the timestamps in the logs, including B+-trees, Monotonic B+-trees, and Append-only trees. The fundamental idea was that determining the relative positioning in an index of the timestamps of the earlier, the new, and the later timeslices would allow a computation of the corresponding number of changes recorded in the logs between these times. Unfortunately, the lack in these index structures of a homogeneous node structure, a controllable fill-factor for nodes, or an appropriate node allocation algorithm greatly complicated the computation.
Consequently, a specialized index structure was developed to support the algorithm. We present and analyze two variations of this index structure, the Insertion tree (I-tree) and the Pointer-less Insertion tree (PLI-tree). The cost of using one of these trees for picking the optimal outset for timeslice computation is only slightly lower than that of using a B+-tree. However, being sparse and packed, I-trees and PLI-trees require little space overhead, and they are cheap to maintain as the underlying relations are updated. The trees also provide a basis for an algorithm that precisely and efficiently predicts the actual costs of computing timeslices in advance. This is useful for query optimization and can be essential in real-time applications. Finally, it is demonstrated how the trees can be used in the computation of other types of queries. As a proof of the functionality of the I- and PLI-tree, we have implemented main memory versions of both.
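The outset choice described in this abstract can be illustrated with a small Python sketch. It counts the log entries that separate each cached outset from the requested timeslice and picks the outset with fewer entries to process. A sorted list of timestamps searched with `bisect` stands in for the index; it is not the I-tree or PLI-tree of the report, and the function names and sample timestamps are hypothetical.

```python
from bisect import bisect_right
from typing import List


def changes_between(timestamps: List[int], t_from: int, t_to: int) -> int:
    """Count log entries with t_from < timestamp <= t_to (timestamps must be sorted)."""
    return bisect_right(timestamps, t_to) - bisect_right(timestamps, t_from)


def pick_outset(timestamps: List[int], t_early: int, t_new: int, t_late: int) -> str:
    """Return 'earlier' or 'later', whichever outset has fewer log entries to apply."""
    forward = changes_between(timestamps, t_early, t_new)   # incremental update
    backward = changes_between(timestamps, t_new, t_late)   # decremental downdate
    return "earlier" if forward <= backward else "later"


# Made-up log: a burst of changes right after the earlier outset, then a quiet period.
log_timestamps = [10, 11, 12, 13, 14, 15, 90]
choice = pick_outset(log_timestamps, t_early=9, t_new=20, t_late=100)
```

Here the earlier outset is much closer in time to the new timeslice (11 versus 80 time units), yet six log entries separate it from the new timeslice while only one separates the later outset, so the sketch selects the later outset. This mirrors the abstract's point that proximity in time is not a reliable cost measure.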
Jensen, C. S., R. T. Snodgrass, M. D. Soo, "Extending Existing Dependency Theory to Temporal Databases," Technical Report R-94-2050, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 58 pages, December 1994.

Publication
Normal forms play a central role in the design of relational databases. Several normal forms for temporal relational databases have been proposed. These definitions are particular to specific temporal data models, which are numerous and incompatible.
This paper attempts to rectify this situation. We define a consistent framework of temporal equivalents of the important conventional database design concepts: functional dependencies, primary keys, and third and Boyce-Codd normal forms. This framework is enabled by making a clear distinction between the logical concept of a temporal relation and its physical representation. As a result, the role played by temporal normal forms during temporal database design closely parallels that of normal forms during conventional database design. These new normal forms apply equally well to all temporal data models that have timeslice operators, including those employing tuple timestamping, backlogs, and attribute value timestamping.
As a basis for our research, we conduct a thorough examination of existing proposals for temporal dependencies, keys, and normal forms. To demonstrate the generality of our approach, we outline how normal forms and dependency theory can also be applied to spatial and spatiotemporal databases.
Jensen, C. S., "Vacuuming in TSQL2," Technical Report R-94-2049, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 17 pages, November 1994.

Publication
Updates, including (logical) deletions, to temporal tables that support transaction time result in insertions at the physical level. Despite the continuing decrease in cost of data storage, it is still, for various reasons, not always acceptable that all data be retained forever. Therefore, there is a need for a new mechanism for the vacuuming, i.e., physical deletion, of data when such tables are being managed.
We propose syntax and informal semantics for vacuuming of data from temporal tables in TSQL2 which support transaction time. The mechanism allows - at schema definition time, as well as later, during the life span of a table - for the specification of so-called cut-off points. A cut-off point for a table is a timestamp that evaluates to a time instant. The timestamp may be either absolute or a bound or unbound now-relative timestamp. Conceptually, the cut-off point indicates that all data, current in the table solely before the (current value of the) timestamp, has been physically deleted. Vacuuming based on cut-off points is an example of a more general notion of vacuuming where arbitrary subsets of data may be physically deleted.
Clifford, J., C. Dyreson, T. Isakowitz, C. S. Jensen, R. T. Snodgrass, "On the Semantics of "Now" in Temporal Databases," Technical Report R-94-2047, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 50 pages, November 1994.

Publication
Most databases record time-varying data, and significant efforts have been devoted to the convenient and efficient management of such data. Perhaps most prominently, numerous data models with varying degrees of built-in support for the temporal dimension of data have been proposed. Some models are quite restricted and simply support uninterpreted attribute domains for times and dates. Other models incorporate either a valid-time dimension, recording when the stored data is true, or a transaction-time dimension, recording when the stored data is current in the database. Bitemporal data models incorporate both valid and transaction time. The special temporal notion of an ever-increasing current-time value has been reflected in some of these data models by inclusion of current-time variables, such as "now," "until-changed," "∞," "@" and "-." As timestamp values associated with facts in temporal databases, such variables may be conveniently used for indicating that a fact is currently valid. Although the services of time variables are very desirable, their use leads to a new type of database, consisting of tuples with variables, termed variable databases.
This paper proposes a framework for defining the semantics of the variable databases of temporal relational data models. A framework is presented because several reasonable meanings may be given to databases that use some of the specific temporal variables that have appeared in the literature. Using the framework, the paper defines a useful semantics for such databases. Because situations occur where the existing time variables are inadequate, two new types of modeling entities that address these shortcomings, timestamps which we call now-relative and now-relative indeterminate, are introduced and defined within the framework. Moreover, the paper provides a foundation, using algebraic bind operators, for the querying of variable databases via existing query languages. The transition to variable databases presented here requires minimal change to the query processor. Finally, to underline the practical feasibility of variable databases, we show that variables may be represented and manipulated efficiently, incurring little space or execution time overhead.
Snodgrass, R. T., I. Ahn, G. Ariav, D. Batory, J. Clifford, C. E. Dyreson, R. Elmasri, F. Grandi, C. S. Jensen, W. Käfer, N. Kline, K. Kulkarni, T. Y. C. Leung, N. Lorentzos, J. F. Roddick, A. Segev, M. D. Soo, S. M. Sripada, "The TSQL2 Language Specification," (the language specification proper) 68 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
Jensen, C. S., R. T. Snodgrass, "The Surrogate Data Type in TSQL2," 4 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
This document proposes syntax and informal semantics for the inclusion of a SURROGATE data type in the TSQL2 query language.
Jensen, C. S., R. T. Snodgrass, M. D. Soo, "The TSQL2 Data Model," 62 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
Snodgrass, R. T., C. S. Jensen, F. Grandi, "Schema Specification in TSQL2," 4 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
This document proposes syntax and informal semantics for extended Create and Alter statements that permit valid-time tables to be defined.
Snodgrass, R. T., C. S. Jensen, "The From Clause in TSQL2," 6 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
This document proposes syntax and informal semantics for an extended From clause in the Select statement.
Hsu, S., C. S. Jensen, R. T. Snodgrass, "A Survey of Valid-time Selection and Projection in Temporal Query Languages," 23 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
Temporal databases have now been studied for more than a decade. During that period of time, numerous query languages have been proposed for temporal databases. One of the essential parts of a temporal query language is valid-time selection, which allows the user to retrieve tuples according to their valid-time relationships. Valid-time projection is another important ingredient which defines the timestamps of the tuples in query results. Here, nine different temporal query languages are examined with a focus on valid-time selection and projection. This document is intended to provide an ideal foundation for designing the valid-time selection and projection components of the consensus query language TSQL2 that is currently being designed.
Hsu, S., C. S. Jensen, R. T. Snodgrass, "Valid-time Selection in TSQL2," 14 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
Temporal databases have now been studied for more than a decade, and numerous temporal query languages have been proposed. One of the essential parts of a temporal query language is valid-time selection, which allows the user to retrieve tuples based on their underlying valid-times. We have previously surveyed valid-time selection and projection in nine temporal query languages, primarily SQL and Quel extensions. Based on that survey, this document proposes a specific design of the valid-time selection component of the consensus temporal query language TSQL2 that is currently being designed.
Hsu, S., C. S. Jensen, R. T. Snodgrass, "Valid-time Projection in TSQL2," 10 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
Temporal databases have now been studied for more than a decade, and numerous temporal query languages have been proposed. Valid-time projection, which defines the timestamps of the tuples in query results, is an important ingredient of a temporal query language. Often, valid-time projection is closely tied to another important component, namely valid-time selection, which allows the user to retrieve tuples based on their underlying valid-times. We have previously surveyed valid-time selection and projection in nine temporal query languages, primarily SQL and Quel extensions. Based on that survey, this document proposes a specific design of the valid-time projection component of the consensus temporal query language TSQL2 that is currently being designed.
Leung, T. Y. C., C. S. Jensen, R. T. Snodgrass, "Update in TSQL2," 5 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
This document proposes syntax and informal semantics for update in TSQL2.
Jensen, C. S., R. T. Snodgrass, T. Y. C. Leung, "Cursors in TSQL2," 4 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
This document proposes syntax and informal semantics for cursors in TSQL2.
Clifford, J., C. E. Dyreson, R. T. Snodgrass, T. Isakowitz, C. S. Jensen, "Now in TSQL2," 12 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
"Now" is a distinguished timestamp value used by many temporal data model proposals. In this paper, we propose a new kind of event, a now-relative event, that more accurately captures the semantics of "now". We discuss query language constructs, representation, and query processing strategies for such values. We demonstrate that these values incur no storage overhead and nominal additional query execution cost. The related concepts of "infinite future" and "infinite past" are also considered.
Jensen, C. S., "Vacuuming in TSQL2," 10 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
Updates, including (logical) deletions, to temporal tables that support transaction time result in insertions at the physical level. Despite the continuing decrease in cost of data storage, it is still, for various reasons, not always acceptable that all data be retained forever. Therefore, there is a need for a new mechanism for the vacuuming, i.e., physical deletion, of data when such tables are being managed.
We propose syntax and informal semantics for vacuuming of data from temporal tables in TSQL2 which support transaction time. The mechanism allows - at schema definition time, as well as later, during the life span of a table - for the specification of so-called cut-off points. A cut-off point for a table is a timestamp that evaluates to a time instant. The timestamp may be either absolute or a bound or unbound now-relative timestamp. Conceptually, the cut-off point indicates that all data, current in the table solely before the (current value of the) timestamp, has been physically deleted. Vacuuming based on cut-off points is an example of a more general notion of vacuuming where arbitrary subsets of data may be physically deleted.
Soo, M. D., C. S. Jensen, R. T. Snodgrass, "An Algebra for TSQL2," 36 pages, in The TSQL2 Language Specification, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 546 pages, September 1994.

Publication
Torp, K., L. Mark, C. S. Jensen, "Efficient Differential Timeslice Computation," Technical Report GIT-CC-94-12, Georgia Institute of Technology, College of Computing, Atlanta, Georgia 30332, USA, 15 pages, February 1994.

Publication
Transaction-time databases record all previous database states and are ever-growing, leading to potentially huge quantities of data. For that reason, efficient query processing is of particular importance.
Due to the large size of transaction-time relations, it is advantageous to utilize cheap write-once storage media for storage. This is facilitated by adopting a log-based storage structure. Timeslices, i.e., relation states or snapshots, are computed by traversing the logs, using previously computed and cached timeslices as outsets. When computing a new timeslice, the cache will contain two candidate outsets: an earlier outset and a later outset. We provide efficient means of always picking the optimal one. Specifically, we define and investigate the use of a new data structure, the B+-tree-like Insertion Tree (I-tree), for this purpose. The cost of using an I-tree for picking the optimal outset is similar to that of using a B+-tree. Being sparse, I-trees require little space overhead, and they are cheap to maintain as the underlying relations are updated.
I-trees also provide a basis for precisely and efficiently estimating the costs of performing timeslices in advance. This is useful for query optimization and can be essential in real-time applications. Finally, it is demonstrated how I-trees can be used in the computation of other types of queries.
1993 top Jensen, C. S., L. Mark, N. Roussopoulos, T. Sellis, "Using Differential Techniques to Efficiently Support Transaction Time," The VLDB Journal, Vol. 2, No. 1, pp. 75-111, January 1993.

Publication
We present an architecture for query processing in the relational model extended with transaction time. The architecture integrates standard query optimization and computation techniques with new differential computation techniques. Differential computation computes a query incrementally or decrementally from the cached and indexed results of previous computations. The use of differential computation techniques is essential to provide efficient processing of queries that access very large temporal relations. Alternative query plans are integrated into a state transition network, where the state space includes backlogs of base relations, cached results from previous computations, a cache index, and intermediate results; the transitions include standard relational algebra operators, operators for constructing differential files, operators for differential computation, and combined operators. A rule set is presented to prune away parts of state transition networks that are not promising, and dynamic programming techniques are used to identify the optimal plans from the remaining state transition networks. An extended logical access path serves as a "structuring" index on the cached results and contains, in addition, vital statistics for the query optimization process (including statistics about base relations, backlogs, and queries - previously computed and cached, previously computed, or just previously estimated).
Jensen, C. S., R. T. Snodgrass, "Three Proposals for a Third-Generation Temporal Data Model," in Proceedings of the International Workshop on an Infrastructure for Temporal Databases, Arlington, TX, pp. T-1-T-10, June 14-16, 1993.

Publication
We present three general proposals for a next-generation temporal data model. Each of these proposals expresses a synthesis of a variety of contributions from diverse sources within temporal databases. We believe that the proposals may aid in bringing consensus to the area of temporal data models.
The current plethora of diverse and incompatible temporal data models has an impeding effect on the design of a consensus temporal data model. A single data model is highly desirable, both to the temporal database community and to the database user community at large. It is our contention that the simultaneous foci on the modeling, presentation, representation, and querying of temporal data have been a major cause of the proliferation of models. We advocate instead a separation of concerns.
As the next step, we propose a data model for the single, central task of temporal data modeling. In this model, tuples are stamped with bitemporal elements, i.e., sets of pairs of valid and transaction time chronons. This model has no intention of being suitable for the other tasks, where existing models may perhaps be more appropriate. However, this model does capture time-varying data in a natural way.
Finally, we argue that flexible support for physical deletion is needed in bitemporal databases. Physical deletion requires special attention in order not to compromise the correctness of query processing.
Jensen, C. S., M. D. Soo, R. T. Snodgrass, "Unification of Temporal Data Models," in Proceedings of the Ninth IEEE International Conference on Data Engineering, Vienna, Austria, pp. 262-271, April 19-23, 1993.

Publication
To add time support to the relational model, both first normal form (1NF) and non-1NF approaches have been proposed. Each has associated difficulties. Remaining within 1NF when time support is added may introduce data redundancy. The non-1NF models may not be capable of directly using existing relational storage structures or query evaluation strategies.
This paper describes a new, conceptual temporal data model that better captures the time-dependent semantics of the data while permitting multiple data models at the representation level. This conceptual model effectively moves the distinction between the various existing data models from a semantic basis to a physical, performance-relevant basis.
We define a conceptual notion of a bitemporal relation where tuples are stamped with sets of two-dimensional chronons in transaction-time/valid-time space. We introduce a tuple-timestamped 1NF representation to exemplify how the conceptual bitemporal data model is related, by means of snapshot equivalence, with representational models. We then consider querying within the two-level framework. We first define an algebra at the conceptual level. We proceed to map this algebra to the sample representational model in such a way that new operators compute equivalent results for different representations of the same conceptual bitemporal relation. This demonstrates that the representational model is faithful to the semantics of the conceptual data model, with many choices available that may be exploited to improve performance.
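The two-level idea in this abstract can be made concrete with a small Python sketch. A conceptual bitemporal relation maps each fact to a set of (transaction time, valid time) chronons, and a tuple-timestamped 1NF representation stores rectangles in that space. Two representations are treated as snapshot equivalent when they expand to the same conceptual relation, which over a finite discrete time domain is the same as agreeing on every snapshot. The row layout, the integer chronons, and all function names are assumptions for illustration, not the paper's definitions.

```python
from typing import Dict, List, Set, Tuple

Chronon = Tuple[int, int]              # (transaction time, valid time)
Fact = Tuple[str, ...]                 # explicit attribute values
Conceptual = Dict[Fact, Set[Chronon]]  # fact -> set of bitemporal chronons
# Hypothetical 1NF row: fact plus half-open ranges [tt_s, tt_e) x [vt_s, vt_e).
Row = Tuple[Fact, int, int, int, int]


def to_conceptual(rows: List[Row]) -> Conceptual:
    """Expand rectangle-stamped rows into facts stamped with chronon sets."""
    rel: Conceptual = {}
    for fact, tt_s, tt_e, vt_s, vt_e in rows:
        chronons = rel.setdefault(fact, set())
        chronons.update((tt, vt) for tt in range(tt_s, tt_e)
                        for vt in range(vt_s, vt_e))
    return rel


def snapshot(rel: Conceptual, tt: int, vt: int) -> Set[Fact]:
    """Facts that the database state at time tt recorded as valid at time vt."""
    return {fact for fact, chronons in rel.items() if (tt, vt) in chronons}


def snapshot_equivalent(a: List[Row], b: List[Row]) -> bool:
    """True if both representations denote the same conceptual relation."""
    ca, cb = to_conceptual(a), to_conceptual(b)
    # Facts with empty chronon sets never appear in any snapshot, so ignore them.
    return ({f: c for f, c in ca.items() if c}
            == {f: c for f, c in cb.items() if c})


one_rectangle = [(("Jake", "Ship"), 0, 4, 0, 4)]
two_rectangles = [(("Jake", "Ship"), 0, 2, 0, 4),
                  (("Jake", "Ship"), 2, 4, 0, 4)]
same = snapshot_equivalent(one_rectangle, two_rectangles)
```

The two sample representations cover the same region with different rectangles, so they are snapshot equivalent even though they differ as sets of rows. That freedom of choice at the representation level is what the abstract suggests can be exploited for performance.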
Jensen, C. S., L. Mark, "Differential Query Processing in Transaction-Time Databases," Chapter 19, pp. 457-491, in Temporal Databases: Theory, Design, and Implementation, edited by A. Tansel et al., Benjamin/Cummings Publishers, Database Systems and Applications Series, 1993.

Publication [not publicly available]
Jensen, C. S., J. Clifford, S. K. Gadia, A. Segev, R. T. Snodgrass, "A Glossary of Temporal Database Concepts," Appendix A, pp. 621-633, in Temporal Databases: Theory, Design, and Implementation, edited by A. Tansel et al., Benjamin/Cummings Publishers, Database Systems and Applications Series, 1993.

Publication [not publicly available]

ACM Author-Izer
Jensen, C. S. (editor, with multiple other contributors), "Proposed Temporal Database Concepts - May 1993," in Proceedings of the International Workshop on an Infrastructure for Temporal Databases, Arlington, TX, pp. A-1-A-24, June 14-16, 1993.

Publication
This document contains the complete set of glossary entries proposed by members of the temporal database community from Spring 1992 until May 1993. It is part of an initiative aimed at establishing an infrastructure for temporal databases. As such, the proposed concepts will be discussed during the "International Workshop on an Infrastructure for Temporal Databases," in Arlington, TX, June 1993, with the specific purpose of defining a consensus glossary of temporal database concepts and names.
Earlier status documents appeared in March 1993 and December 1992 and included terms proposed after an initial glossary appeared in SIGMOD Record in September 1992. This document subsumes all the previous documents. Additional information related to the initiative may be found at cs.arizona.edu in the tsql directory, accessible via anonymous ftp.
Jensen, C. S. (editor, with multiple other contributors), "Addendum to 'Proposed Temporal Database Concepts - May 1993'," in Proceedings of the International Workshop on an Infrastructure for Temporal Databases, Arlington, TX, pp. A-25-A-29, June 14-16, 1993.

Publication
The paper "Proposed Temporal Database Concepts - May 1993" contained a complete set of glossary entries proposed by members of the temporal database community from Spring 1992 until May 1993. The aim of the proposal was to define a consensus glossary of temporal database concepts and names. Several glossary entries (Section 3) were included in the proposal, but were still unresolved at the time of the deadline. This addendum reflects on-going discussions and contains revised versions of several unresolved entries. The entries here thus supersede the corresponding entries in Section 3 of the proposal.
Jensen, C. S. (editor, with multiple other contributors), "The TSQL2 Benchmark," in Proceedings of the International Workshop on an Infrastructure for Temporal Databases, Arlington, TX, pp. QQ-1-QQ-28, June 14-16, 1993.

Publication
This document presents the temporal database community with an extensive, consensus benchmark for temporal query languages. The benchmark is semantic in nature. It is intended to be helpful when evaluating the user-friendliness of temporal query languages, including proposals for the consensus temporal SQL that is currently being developed.
The benchmark consists of a database schema, an instance for the schema, and a set of queries on this database. The queries are classified according to a taxonomy, which is also part of the benchmark.
Jensen, C. S. (ed.) et al., "A Consensus Test Suite of Temporal Database Queries," Technical Report R-93-2034, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 45 pages, November 1993.

Publication
This document presents the temporal database community with a sizable consensus test suite of temporal database queries. The test suite is intended to be helpful when evaluating the user-friendliness of temporal relational query languages.
The test suite consists of a database schema, an instance for the schema, and a set of approximately 150 queries on this database. The queries are classified according to a taxonomy, which is also included in the document.
Snodgrass, R. T., C. E. Dyreson, C. S. Jensen, N. Kline, L. Soo, M. D. Soo, "The MultiCal System," Manual and Systems Documentation, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, ca. 300 pages, June 1993.

Dyreson, C., R. Snodgrass, C. S. Jensen, "On the Semantics of `Now' in Temporal Databases," TempIS Technical Report 42, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 22 pages, April 1993.

Publication
"Now" is a distinguished timestamp value used by many temporal data model proposals. In this paper, we examine the different semantics given this familiar term, and propose two new timestamp values, a deterministic now-relative value and an indeterminate now-relative value, that more accurately capture the semantics of "now." We discuss query language constructs, representation, and query processing strategies for such values. Both the valid-time and transaction-time dimensions are considered. We demonstrate that these values incur no storage overhead and nominal additional query execution cost. The related concepts of "indefinite future" and "indefinite now" are also considered.
Jensen, C. S. et al. (eds.), "A Consensus Glossary of Temporal Database Concepts," Technical Report R-93-2035, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 55 pages, November 1993.

Publication
This document contains definitions of a wide range of concepts specific to and widely used within temporal databases. In addition to providing definitions, the document also includes explanations of concepts as well as discussions of the adopted names.
The consensus effort that led to this glossary was initiated in early 1992. Earlier status documents appeared in March 1993 and December 1992 and included terms proposed after an initial glossary appeared in SIGMOD Record in September 1992. The present glossary subsumes all the previous documents. It was most recently discussed at the "ARPA/NSF International Workshop on an Infrastructure for Temporal Databases," in Arlington, TX, June 1993, and the present glossary is recommended by a significant part of the temporal database community. The glossary meets a need for creating a higher degree of consensus on the definition and naming of temporal database concepts.
Two sets of criteria are included. First, all included concepts were required to satisfy four relevance criteria, and, second, the naming of the concepts was resolved using a set of evaluation criteria. The concepts are grouped into three categories: concepts of general database interest, of temporal database interest, and of specialized interest.
Jensen, C. S., M. D. Soo, R. T. Snodgrass, "Unifying Temporal Data Models via a Conceptual Model," Technical Report 93-31, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 36 pages, September 1993.

Publication
To add time support to the relational model, both first normal form (1NF) and non-1NF approaches have been proposed. Each has associated difficulties. Remaining within 1NF when time support is added may introduce data redundancy. The non-1NF models may be incapable of directly using existing relational storage structures or query evaluation technologies.
This paper describes a new, conceptual temporal data model that better captures the time-dependent semantics of the data, while permitting multiple data models at the representation level. This conceptual model effectively moves the distinction between the various existing data models from a semantic basis to a physical, performance-relevant basis.
We define a conceptual notion of a bitemporal relation where tuples are stamped with sets of two-dimensional chronons in transaction-time/valid-time space. Next, we describe five representation schemes that support both valid and transaction time; these representations include both 1NF and non-1NF models. We use snapshot equivalence to relate the representation data models with the bitemporal conceptual data model.
We then consider querying within the two-level framework. To do so, we define an algebra at the conceptual level. We then map this algebra to the representation level in such a way that new operators compute equivalent results for different representations of the same bitemporal conceptual relation. This demonstrates that all of these representations are faithful to the semantics of the conceptual data model, with many choices available that may be exploited to improve performance.
Soo, M. D., R. T. Snodgrass, C. S. Jensen, "Efficient Evaluation of the Valid-Time Natural Join," Technical Report 93-17, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 26 pages, June 1993.

Publication
Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the optimization of joins with equality predicates, rather than the inequality predicates prevalent in valid-time queries. Second, the presence of temporally-varying data dramatically increases the size of the database. These factors require new techniques to efficiently evaluate valid-time joins.
We address this need for efficient join evaluation in databases supporting valid-time. A new temporal-join algorithm based on tuple partitioning is introduced. This algorithm avoids the quadratic cost of nested-loop evaluation methods; it also avoids sorting. Performance comparisons between the partition-based algorithm and other evaluation methods are provided. While we focus on the important valid-time natural join, the techniques presented are also applicable to other valid-time joins.
Jensen, C. S. (ed.), "Proposed Glossary Entries - March 1993," Status report for the Terminology Subtask of the TSQL2 Design Initiative. Distributed to the TSQL Mailing List, 25 pages, March 1993.

Publication
This document describes the current status, as of March 30, 1993, of an initiative aimed at creating a consensus glossary of temporal database concepts and names. An earlier status document appeared in December 1992 and included terms proposed after an initial glossary appeared in SIGMOD Record. This document contains a set of new terms, proposed since December 1992, and the terms from the December 1992 document. To provide a context, the terms from the initial glossary are included in an appendix in dictionary format, and criteria for evaluation of glossary entries are also listed in the appendix.
The document is intended to help future contributors of glossary entries. Proposed glossary entries should be sent to tsql@cs.arizona.edu. Other information related to the initiative may be found at cs.arizona.edu in the tsql directory, accessible via anonymous ftp.
Soo, M. D., R. Snodgrass, C. E. Dyreson, N. Kline, C. S. Jensen, "Architectural Extensions to Support Multiple Calendars," The MULTICAL Project, Department of Computer Science, The University of Arizona, Tucson, Arizona 85721, USA, 78 pages, October 1993.

Publication
We describe in detail a system architecture for supporting a time-stamp attribute domain in conventional relational database management systems. This architecture underlies previously proposed temporal modifications to SQL. We describe the major components of the system and how they interact. For each component of the system, we provide specifications for the routines exported by that component. Finally, we describe a preliminary design for a toolkit that aids in the generation of the components of the database management system that support time.
1992 top Jensen, C. S., L. Mark, "Queries on Change in an Extended Relational Model," IEEE Transactions on Knowledge and Data Engineering, Vol. 4, No. 2, pp. 192-200, April 1992.

Publication
A data model that allows for the storage of detailed change history in so-called backlog relations is described. Its extended relational algebra, in conjunction with the extended data structures, provides a powerful tool for the retrieval of patterns and exceptions in change history. An operator, S, based on the notion of compact active domain is introduced. It groups data not in predefined groups but in groups that fit the data. This operator further expands the retrieval capabilities of the algebra. The expressive power of the algebra is demonstrated by examples, some of which show how patterns and exceptions in change history can be detected. Sample applications of this work are statistical and scientific databases, monitoring (of databases, manufacturing plants, power plants, etc.), CAD, and CASE.
Jensen, C. S., R. Snodgrass, "Temporal Specialization," in Proceedings of the Eighth IEEE International Conference on Data Engineering, Phoenix, AZ, pp. 594-603, February 3-6, 1992.

Publication [not publicly available]

Online at ACM Digital Library
This glossary contains concepts specific to temporal databases that are well-defined, well understood, and widely used. In addition to defining and naming the concepts, the glossary also explains the decisions made. It lists competing alternatives and discusses the pros and cons of these. It also includes evaluation criteria for the naming of concepts.
This paper is a structured presentation of the results of e-mail discussions initiated during the preparation of the first book on temporal databases, Temporal Databases: Theory, Design, and Implementation, published by Benjamin/Cummings, to appear January 1993. Independently of the book, an initiative aimed at designing a consensus Temporal SQL is under way. The paper is a contribution towards establishing common terminology, an initial subtask of this initiative.
Jensen, C. S., R. Snodgrass, "Proposal for a Data Model for the Temporal Structured Query Language," TempIS Technical Report 37, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 32 pages, July 1992.

Publication
Adding time to the relational model has been a daunting task. More than two dozen time-extended relational data models have been proposed over the last fifteen years. We feel that the reason why so many temporal data models have been proposed is that these models attempt to simultaneously retain the simplicity of the relational model, present all the information concerning an object in one tuple, and ensure ease of implementation and query evaluation efficiency. We advocate instead a separation of concerns. We propose a new, conceptual temporal data model that better captures the time-dependent semantics of the data while permitting multiple data models at the representation level. This conceptual model effectively moves the distinction between the various existing data models from a semantic basis to a physical, performance-relevant basis. We compare our model with the previously proposed temporal data models.
Jensen, C. S., M. D. Soo, "Temporal Joins in a Two-Level Data Model," Unpublished manuscript, 24 pages, June 1992.

This paper focuses on the outer natural join of temporal relations and the simpler temporal joins by which it is defined. Joins are of particular interest because they are both frequently used and computationally expensive.
The temporal joins are defined within a model that completely separates the conceptual notion of a temporal relation from the actual representation of the temporal relation. Conceptually, tuples in a temporal relation have two associated times - valid time, recording changes in reality, and transaction time, recording changes in the database. We have adopted a representation scheme where temporal relations are embedded in conventional relations. One conceptual temporal relation may be represented by many such embeddings. The join operators are representation independent in that they compute equivalent results for different representations of the same temporal relation. The separation of a temporal relation from its representation is conceptually desirable; it allows significant flexibility when implementing the join operators which, in turn, may be exploited to gain better performance.
The joins proposed are natural generalizations of their counterparts in the conventional relational algebra. Just as the natural join is used when creating loss-less join decompositions in conventional relational database design, the generalized counterpart, the temporal natural join, plays the identical role when designing temporal relational database schemas.
Soo, M. D., R. T. Snodgrass, C. E. Dyreson, C. S. Jensen, N. Kline, "Architectural Extensions to Support Multiple Calendars," TempIS Technical Report 32, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 74 pages, May 1992.

Publication
This paper is a detailed description of the design of the architectural extensions to a database management system (DBMS) supporting multiple calendars. The paper contains descriptions of the modules comprising the architectural extensions, with detailed descriptions of the services provided by each module. In addition, the data structures used by each system module are included.
Jensen, C. S. (ed.), "Proposed Glossary Entries - December 1992," Status report for the Terminology Subtask of the TSQL2 Design Initiative. Distributed to the TSQL Mailing List, 14 pages, December 1992.

Publication
This document describes the current status, as of December 15, 1992, of an initiative aimed at creating a consensus glossary of temporal database concepts and names. It contains the set of currently proposed, complete glossary entries. Existing terms and criteria for evaluation of glossary entries are contained in appendices.
The document is intended to help future contributors of glossary entries. Proposed glossary entries should be sent to tsql@cs.arizona.edu. Other information related to the initiative may be found at cs.arizona.edu in the tsql directory, accessible via anonymous ftp.
Jensen, C. S., R. T. Snodgrass, M. D. Soo, "Extending Normal Forms to Temporal Relations," Technical Report 92-17, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 24 pages, July 1992.

Publication
Normal forms play a central role in the design of relational databases. Recently several normal forms for temporal relational databases have been proposed. The result is a number of isolated and sometimes contradictory contributions that only apply within specialized settings.
This paper attempts to rectify this situation. We define a consistent framework of temporal equivalents of all the important conventional database design concepts: functional and multivalued dependencies, primary keys, and third, Boyce-Codd, and fourth normal forms. This framework is enabled by making a clear distinction between the logical concept of a temporal relation and its physical representation. As a result, the role played by temporal normal forms during temporal database design closely parallels that of normal forms during conventional database design.
We compare our approach with previously proposed definitions of temporal normal forms and temporal keys. To demonstrate the generality of our approach, we outline how normal forms and dependency theory can also be applied to spatial databases, as well as to spatial-temporal databases.
Jensen, C. S., M. D. Soo, R. T. Snodgrass, "Unification of Temporal Data Models," Technical Report TR 92-15, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA, 28 pages, July 1992.

Publication
To add time support to the relational model, both first normal form (1NF) and non-1NF approaches have been proposed. Each has associated difficulties. Remaining within 1NF when time support is added may introduce data redundancy. The non-1NF models may not be capable of directly using existing relational storage structures or query evaluation technologies. This paper describes a new, conceptual temporal data model that better captures the time-dependent semantics of the data while permitting multiple data models at the representation level. This conceptual model effectively moves the distinction between the various existing data models from a semantic basis to a physical, performance-relevant basis.
We define a conceptual notion of a bitemporal relation where tuples are stamped with sets of two-dimensional chronons in transaction-time/valid-time space. Next, we describe three representation schemes: a tuple-timestamped 1NF representation, a backlog relation composed of 1NF timestamped change requests, and a non-1NF attribute value-timestamped representation. We further investigate several variants of these representations. We use snapshot equivalence to relate the representation data models with the conceptual bitemporal data model.
We then consider querying within the two-level framework. To do so, we define first an algebra at the conceptual level. We proceed to map this algebra to the representation level in such a way that new operators compute equivalent results for different representations of the same conceptual bitemporal relation. This demonstrates that all of these representations are faithful to the semantics of the conceptual data model, with many choices available that may be exploited to gain improved performance.
1991 top Jensen, C. S., L. Mark, N. Roussopoulos, "Incremental Implementation Model for Relational Databases with Transaction Time," IEEE Transactions on Knowledge and Data Engineering, Vol. 3, No. 4, pp. 461-473, December 1991.

Publication
An implementation model for the standard relational data model extended with transaction time is presented. The implementation model integrates techniques of view materialization, differential computation, and deferred update into a coherent whole. It is capable of storing any view (reflecting past or present states) and subsequently using stored views as outsets for incremental and decremental computations of requested views, making it more flexible than previously proposed partitioned storage models. The working and the expressiveness of the model are demonstrated by sample queries that show how historical data are retrieved.
Jensen, C. S., R. Snodgrass, "Temporal Specialization and Generalization," Technical Report R 91-45, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, August, 47 pages (also Technical Report 91-25, University of Arizona, Department of Computer Science, Tucson, Arizona 85721, USA), 1991.

Publication
A standard relation is two-dimensional with attributes and tuples as dimensions. A temporal relation contains two additional, orthogonal time dimensions, namely valid time and transaction time. Valid time records when facts are true in the modeled reality, and transaction time records when facts are stored in the temporal relation.
While, in general, there are no restrictions between the valid time and transaction time associated with each fact, in many practical applications the valid and transaction times exhibit more or less restricted interrelationships which define several types of specialized temporal relations. The paper examines five different areas where a variety of types of specialized temporal relations are present.
In application systems with multiple, interconnected temporal relations, multiple time dimensions may be associated with facts as they flow from one temporal relation to another. For example, a fact may have associated multiple transaction times telling when it was stored in previous temporal relations. The paper investigates several aspects of the resulting generalized temporal relations, including the ability to query a predecessor relation from a successor relation.
The presented framework for generalization and specialization allows researchers as well as database and system designers to precisely characterize, compare, and thus better understand temporal relations and the application systems in which they are embedded. The framework's comprehensiveness and its use in understanding temporal relations are demonstrated by placing previously proposed temporal data models within the framework. The practical relevance of the defined specializations and generalizations is illustrated by sample realistic applications in which they occur. The additional semantics of specialized relations are especially useful for improving the performance of query processing.
Jensen, C. S., R. Snodgrass, "Specialized Temporal Relations," Technical Report R 91-26, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 27 pages, June 1991.

Publication
In temporal specialization, the database designer restricts the relationship between the valid time-stamp (recording when something is true in the reality being modeled) and the transaction time-stamp (recording when a fact is stored in the database). An example is a retroactive temporal event relation, where the event must have occurred before it was stored, i.e., the valid time-stamp is restricted to be less than the transaction time-stamp. We discuss some two dozen useful restrictions, defining as many specialized types of temporal relations, and indicate some of their applications. We present a detailed taxonomy of specialized temporal relations. This taxonomy may be employed during database design to specify the particular time semantics of temporal relations. Additionally, the DBMS may exploit such a characterization to more efficiently store and access those temporal relations.
Many previous research efforts that considered only one kind of time also apply to certain specialized temporal relations with both kinds of time. We classify a wide range of such efforts, identifying the particular specialization each concerns. Similarly, implementation approaches that assume only one kind of time often also apply to specialized temporal relations. We analyze the extent to which each technique may be modified to work with temporal relations, thereby achieving improved performance when supporting such relations. An implication of this work is that much of previous and current research that heretofore has applied only to rollback or historical databases is also relevant to restricted forms of temporal databases.
1990 top Jensen, C. S., L. Mark, "Replication Gives High Performance Query Processing in Relational Models Extended with Transaction Time," The First IEEE Workshop on the Management of Replicated Data, Houston, TX, 4 pages, November 1990.

Publication
Jensen, C. S., L. Mark, "A Framework for Vacuuming Temporal Databases," Technical Report CS-TR-2516, UMIACS-TR-90-105, Department of Computer Science, University of Maryland, College Park, MD 20742, 46 pages, August 1990.

Publication
In conventional databases, the amount of data typically reaches a certain level and is then relatively stable. In databases supporting transaction time, old data are retained, and the amount of data is ever growing. Even with continued advances in mass storage technology, vacuuming (i.e., deletion or off-line storage of data) will eventually be necessary. At the same time, the fundamental principle of transaction time databases, that history cannot be changed, must be obeyed.
This paper provides a framework for vacuuming subsystems for relational transaction time databases. Our main focus is to establish a foundation for correct and cooperative query processing through the modification of queries that cannot be processed due to vacuuming. In doing this, we provide language facilities for specifying vacuuming; we present three classifications of vacuuming specifications; and we define correctness criteria for vacuuming specifications. Based on the classifications, we provide a comprehensive set of rules for expressing modified queries. For some of the classes, modified queries can be expressed using relational algebra - for others, this is impossible, and an extended, tagged relational algebra is used instead.
The framework is a useful tool for designers of specific vacuuming subsystems. The framework is presented in the context of a previously developed relational model with transaction time support, DM/T.
C. S. Jensen, L. Mark, N. Roussopoulos, T. Sellis, "Using Caching, Cache Indexing, and Differential Techniques to Efficiently Support Transaction Time," Technical Report CS-TR-2413, UMIACS-TR-90-25, Department of Computer Science, University of Maryland, College Park, MD 20742, 28 pages, February 1990.

Publication
We present a framework for query processing in the relational model extended with transaction time. The framework integrates standard query optimization and computation techniques with new differential computation techniques. Differential computation incrementally or decrementally computes a query from the cached and indexed results of previous computations. The use of differential computation techniques is essential in order to provide efficient processing of queries that access very large temporal relations. Alternative query plans are integrated into a state transition network, where the state space includes backlogs of base relations, cached results from previous computations, a cache index, and intermediate results; the transitions include standard relational algebra operators, operators for constructing differential files, operators for differential computation, and combined operators. A rule set is presented to prune away parts of state transition networks that are not promising, and dynamic programming techniques are used to identify the optimal plans from remaining state transition networks. An extended logical access path serves as a "structuring" index on the cached results and contains in addition vital statistics for the query optimization process, including statistics about base relations, backlogs, about previously computed and cached, previously computed, or just previously estimated queries.
1989 top C. S. Jensen, L. Mark, N. Roussopoulos, T. Sellis, "A Framework for Efficient Query Processing Using Caching, Cache Indexing, and Differential Techniques in the Relational Model Extended with Transaction Time," Technical Report R-89-45, Aalborg University, Department of Mathematics and Computer Science, Fredrik Bajers Vej 7E, DK-9220 Aalborg Øst, Denmark, 1989.

Publication
We present a framework for query processing in the relational model extended with transaction time. The framework integrates standard techniques for query optimization and computation with techniques for incremental and decremental, i.e., differential, computation from cached and indexed results of previous computation in order to provide efficient processing of queries on very large temporal relations. Alternative query plans are integrated into a state transition network, where the state space includes backlogs of base relations, cached results from previous computations, the cache index, and intermediate results; transitions include standard relational algebra operators, operators for constructing differential files, operators for differential computation, and combined operators. A rule set is presented to prune away parts that are not promising, and dynamic programming techniques are used to identify the optimal plan of the resulting state transition network. An extended logical access path serves as a "structuring" index on the cached results and contains in addition vital statistics for the query optimization process, including statistics about base relations, backlogs, about previously computed and cached, previously computed, or just previously estimated queries.
The framework exploits eager, threshold triggered, and lazy propagation of update to ensure consistency between base data and cached data. It integrates previously proposed approaches to supporting views, i.e., recomputation, storage of data snapshots, and storage of pointer structures, and it generalizes incremental computation techniques to differential computation techniques.
C. S. Jensen, L. Mark, "Queries on Change in an Extended Relational Model," Technical Report CS-TR-2299, UMIACS-TR-89-80, Department of Computer Science, University of Maryland, College Park, MD 20742, 37 pages, August 1989.

Publication
A data model is a means of modeling, communicating about, and managing part of reality. In our understanding one of the most fundamental characteristics of reality is change; whereas change is fundamental, stability is relative and temporary. Change is an often critical aspect of database systems applications; in many applications change itself and previous states are of interest. Change presupposes the concept of time. We provide a data model that allows for the storage of detailed historical data in so-called backlog relations. The query language extends the standard relational algebra to take advantage of the additional data. In particular, we introduce an operator Sigma based on the notion of compact active domain. This operator groups data, not in predefined groups, but in groups that "fit" the data. The expressive power of the operator is demonstrated by examples showing how patterns and exceptions in change history can be detected. Sample applications of this work are statistical and scientific databases, monitoring (of production systems, databases, power plants, etc.), CAD and CASE.
C. S. Jensen, L. Mark, N. Roussopoulos, "Incremental Implementation Model for Relational Databases with Transaction Time," Technical Report CS-TR-2275, UMIACS-TR-89-63, Department of Computer Science, University of Maryland, College Park, MD 20742, 28 pages, July 1989.

Publication
The database literature contains numerous contributions to the understanding of time in relational database systems. In the past, the focus has been on data model issues and only recently has efficient implementation been addressed. We present an implementation model for the standard relational data model extended with transaction time. The implementation model exploits techniques of view materialization, incremental computation, and deferred update. It is more flexible than previously presented partitioned storage models. A new and interesting class of detailed queries on change behaviour of the database is supported.
