morph-streams is an ontology-based data access system that allows evaluating SPARQL-Stream queries over a range of data streaming systems, which are mapped using the W3C R2RML language. More specifically, the current version of morph-streams provides wrappers for:
Previous versions of morph-streams also supported the API of Pachube (now Xively), although this has been deprecated.
Morph-streams supports two modes of operation:
The morph-streams project repository can be found at https://github.com/oeg-upm/morph-streams, together with instructions on how to install and use it. In addition, a live deployment of morph-streams with several types of streaming data sources can be found at http://streams.linkeddata.es/.
The purpose of this research is to design and implement an engine that allows complex queries over heterogeneous data streams in near real-time at Web scale.
A growing number of applications depend on real-time spatiotemporal data, making it possible to move from the usual three levels of decision making (strategic, tactical, and operational) to real-time decision making. One example is real-time geomarketing, where decisions on offering discount coupons to customers may be made within very short time windows by combining spatiotemporal data streams from different providers, e.g. public transport card validations or weather information. Extracting information from these streams is complex because of the heterogeneity of the data, the rate at which it is generated, and its volume. To tap these data sources and obtain relevant information, scalable processing infrastructures are required, as well as approaches for data integration and fusion.
Our plan is to build a distributed stream processing engine capable of adapting to changing conditions while serving complex continuous queries. First, adapters for various formats are used to convert heterogeneous streams to Linked Data streams. Then, Adaptive Query Processing (AQP) allows adjusting the query execution plan to varying conditions of the data input, the incoming queries, and the system.
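The adapter step described above can be illustrated with a minimal sketch. It is not the morph-streams implementation: the input format (a `sensor_id,epoch_seconds,value` line), the `http://example.org/stream/` namespace, and the function names are all assumptions made for illustration. It shows one raw reading being turned into a few N-Triples statements, and a generator that applies this lazily to a stream of lines.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator, List

# Hypothetical namespace, chosen for illustration only.
EX = "http://example.org/stream/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def csv_line_to_triples(line: str) -> List[str]:
    """Convert one 'sensor_id,epoch_seconds,value' line into N-Triples strings."""
    sensor_id, epoch, value = (field.strip() for field in line.split(","))
    # Render the epoch timestamp as an ISO 8601 string in UTC.
    timestamp = datetime.fromtimestamp(int(epoch), tz=timezone.utc).isoformat()
    # One observation resource per (sensor, timestamp) pair.
    obs = f"<{EX}observation/{sensor_id}/{epoch}>"
    return [
        f"{obs} <{EX}observedBy> <{EX}sensor/{sensor_id}> .",
        f'{obs} <{EX}hasValue> "{float(value)}"^^<{XSD}double> .',
        f'{obs} <{EX}observedAt> "{timestamp}"^^<{XSD}dateTime> .',
    ]


def adapt(lines: Iterable[str]) -> Iterator[str]:
    """Lazily turn an iterable of raw lines into a stream of triples."""
    for line in lines:
        yield from csv_line_to_triples(line)
```

For instance, `list(adapt(["s1,0,21.5"]))` yields three statements about a single observation. An adapter for another format would only need to replace the parsing step and keep the same output vocabulary.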
Our engine will address real-time processing following the principles of the Lambda architecture. Lambda is a three-layer architecture designed to alleviate the complexities of Big Data management: a batch layer stores all the incoming data in an immutable master dataset and pre-computes batch views; a serving layer indexes views on the master dataset; and a speed layer manages the real-time processing issues and requests data views depending on incoming queries. We will combine this design with AQP techniques and compressed RDF data structures, which reduce access time in large datasets as well as data transmission time among processing nodes.
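The three layers can be made concrete with a toy, single-process sketch. It is only an illustration under simplifying assumptions: the class and method names are hypothetical, the view is a plain per-sensor observation count, and the batch recomputation runs synchronously, whereas a real deployment would distribute these layers and run them concurrently.

```python
from collections import Counter


class LambdaSketch:
    """Toy model of the batch, serving, and speed layers (hypothetical names)."""

    def __init__(self):
        self.master = []             # batch layer: append-only master dataset
        self.batch_view = Counter()  # serving layer: indexed, precomputed view
        self.speed_view = Counter()  # speed layer: data not yet covered by a batch run

    def ingest(self, sensor_id, value):
        """Store a new observation and update the real-time view incrementally."""
        self.master.append((sensor_id, value))
        self.speed_view[sensor_id] += 1

    def run_batch(self):
        """Recompute the batch view from the full master dataset."""
        self.batch_view = Counter(sensor_id for sensor_id, _ in self.master)
        # Everything ingested so far is now in the batch view.
        self.speed_view.clear()

    def query(self, sensor_id):
        """Answer by merging the batch view with the real-time view."""
        return self.batch_view[sensor_id] + self.speed_view[sensor_id]
```

Queries return the same answer before and after `run_batch()`. The batch run only moves the work from the incremental view to the precomputed one, which is what keeps the speed layer small.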
Created under Creative Commons License - 2015 OEG.