Phase 1 Report

Jeremy Morley, Amir Pourabdollah and Tracey Mooney - Nottingham Geospatial Institute, 29 March 2012.

Project OSM-GB
The OSM-GB project is exploring the improvement of OpenStreetMap (OSM) data through rule-based processing and the merging of other data sources, such as Ordnance Survey data. The contributing partners are the Nottingham Geospatial Institute (NGI), based at the University of Nottingham, 1Spatial and KnowWhere. This report sets out the agreed aims and objectives of the NGI during phase 1 of the project. Detailed progress and feedback are provided in sections 2 to 4, and specific tasks and timescales are outlined in section 5, with a view to revising the task-specific timescales at the next meeting of the parties.

The project is divided into three broad areas:


 * Infrastructure development and implementation
 * Rules and action research
 * Community engagement

Infrastructure Development and Implementation
During phase 1 the aim was to establish the web mapping infrastructure for the OSM-GB project across three servers. This involved system design and the installation and configuration of the software needed to store data in formats suitable for processing, rendering, tiling and delivery to the community. A more comprehensive account is provided in section 2.

Rules and Action Research
The aim for phase 1 centred on establishing the 1Spatial Radius Studio data processing system so that it could take OSM data and produce quality-controlled OSM-GB data to feed into the web mapping infrastructure outlined above (described further in section 2). Section 3 covers the research in this area and the progress made.

Community Engagement
During this phase of the project, the main aim is to introduce the project to the OSM and wider communities via press releases, a live web page, a project blog and a project wiki. In addition, establishing a steering group and inviting early adopters to engage with the project are considered advantageous. This is addressed in further detail in section 4.

Infrastructure Development
The system consists of three servers (Figure 2.1). OSM-GB1 and OSM-GB3 are primarily used for the processing and storage of data relevant to the project and will remain hidden from the public, while OSM-GB2 is the public-facing server that will deliver the final OSM-GB product and community relevant information.

Figure 2.1 OSM-GB System Diagram


(NGI, 2012)


 * OSM-GB 1 imports and processes OSM and OS data. It hosts the Oracle database, importing tools and Radius Studio. Oracle stores the original and processed OSM and OS data, generating the raw OSM-GB data to send through to OSM-GB3 for cartographic rendering and tile generation.
 * OSM-GB 2 is public-facing and has the relevant public interfaces installed. It will serve the raster and vector maps generated on OSM-GB3 through OGC and OSM standards, as well as other public interfaces, including the project website, weblog and wiki.
 * OSM-GB 3 will receive and store the most up-to-date processed data from OSM-GB1 for cartographic rendering, tile generation and storage. The PostGIS database installed on it holds a copy of the processed data in a format suitable for open web services and tile rendering. OSM-GB 3 also hosts the Mapnik tiling service.

During phase 1, the following components have been implemented:


 * OSM-GB1: Snowflake Software’s Go Loader and Go Publisher have been installed and used to import OSM data into the Oracle Spatial database. Radius Studio is now able to read from the database, apply a set of defined rules, and write the results back to the database.
 * OSM-GB2: Apache web server and GeoServer are installed. This server has access to the map tiles produced by OSM-GB3 and now displays the resulting map on the OSM-GB project home page using OpenLayers. WordPress and MediaWiki are installed and active to serve the Web 2.0 components of the system.
 * OSM-GB3: a number of tools have been installed to make this server a back end for OSM-GB2, of which PostGIS and Mapnik are the main components. The “Tile-by-request” component has been developed on OSM-GB3 using ‘renderd’, so that the tiling service runs on the fly. However, the more up-to-date Tirex and Mod_Tile are also installed on OSM-GB3 and are currently being researched and trialled.
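The “Tile-by-request” approach relies on the standard slippy-map addressing used by OSM tile servers, in which each tile is identified by a zoom level z and indices x and y in the Web Mercator grid (2^z tiles along each axis). The sketch below computes which tile a WGS84 point falls in. It is a generic illustration of that formula, not code from the OSM-GB servers, and the function names are our own.

```python
import math
from typing import Tuple


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) index of the Web Mercator tile containing a WGS84 point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the right/bottom edge (or beyond the Mercator
    # latitude limit) still map to a valid tile index.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_path(zoom: int, x: int, y: int) -> str:
    """Conventional zoom/x/y path under which a rendered PNG tile is requested."""
    return f"{zoom}/{x}/{y}.png"


x, y = latlon_to_tile(51.5074, -0.1278, 10)  # central London
print(tile_path(10, x, y))  # 10/511/340.png
```

A tile server rendering on request only needs this (z, x, y) triple to decide which area to draw, which is why the same addressing works whether the tiles come from renderd or Tirex.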

At the time of writing, the public-facing services that can be accessed are the OSM-GB project home page, the project blog and the project wiki, as illustrated in Figures 2.2 to 2.4.

Figure 2.2 OSM-GB Project Home Page as at March 2012.


        Front page as at March 2012 ('http://www.osmgb.org.uk/') 

Figure 2.3 OSM-GB Project Blog as at March 2012 (http://lgosmgb2.nottingham.ac.uk/blog/)

Figure 2.4 OSM-GB Project Wiki as at March 2012 (http://www.osmgb.org.uk/osm-gb-wiki/index.php/Main_Page)

To-do:

 * 1) Further development of OGC data access mechanisms on OSM-GB2 (i.e. WMS and WFS).
 * 2) The “ingesting” component still needs to be developed on OSM-GB1 to automatically feed PostGIS, hosted on OSM-GB3, with the contents of the Oracle database (a Java tool will be developed to achieve this). The issue is that the schema produced by the GO Loader and the Radius Studio rules differs from the OSM schema that Mapnik expects. Osm2pgsql has been used to examine how this conversion is currently done and to determine a PostGIS database structure compatible with Mapnik. It is proposed that a similar tool will be developed for the OSM-GB project.
 * 3) The “Tile-by-request” component is currently running on OSM-GB3 using the “renderd” tool. This could be replaced by “Tirex”, a more advanced technology for serving tiles. However, research suggests that tile rendering must be conducted on the same server on which the web server is installed. This means that either the tile rendering process needs to be moved to OSM-GB2, or OSM-GB3 must host another web server. Further research will determine the positions of the tile server and the web server in the final system architecture. At present, pre-rendered tiles have been produced and stored on OSM-GB2 for public access.
 * 4) Optimizing inter-server communications through hardware (direct cabling) or software (e.g. NFS) solutions.
 * 5) Developing vector rendering engines on OSM-GB3 (e.g. Osmarender).
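The schema change at the heart of the ingesting component is essentially a pivot: a normalized store keeps one row per tag, whereas osm2pgsql-style tables, which Mapnik stylesheets read, hold one row per feature with a column for each commonly used key. The sketch below shows only that pivot, in Python rather than the planned Java tool. It assumes a hypothetical (feature_id, key, value) input and a small illustrative column list; it does not handle geometry and does not reflect the actual GO Loader table layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical column set for illustration; osm2pgsql derives its real
# column list from a style file.
STYLE_COLUMNS = ("highway", "name", "building", "landuse", "waterway")


def pivot_tags(tag_rows: Iterable[Tuple[int, str, str]]) -> List[dict]:
    """Turn normalized (feature_id, key, value) rows into one wide row per feature.

    Keys listed in STYLE_COLUMNS become columns; every other key is kept in an
    'other_tags' dict, playing a role similar to an hstore column.
    """
    grouped: Dict[int, Dict[str, str]] = defaultdict(dict)
    for feature_id, key, value in tag_rows:
        grouped[feature_id][key] = value

    wide_rows = []
    for feature_id in sorted(grouped):
        tags = grouped[feature_id]
        row = {"osm_id": feature_id}
        for column in STYLE_COLUMNS:
            row[column] = tags.get(column)  # None where the tag is absent
        row["other_tags"] = {k: v for k, v in tags.items() if k not in STYLE_COLUMNS}
        wide_rows.append(row)
    return wide_rows


rows = [(1, "highway", "residential"), (1, "name", "Example Road"),
        (1, "surface", "asphalt"), (2, "building", "yes")]
for r in pivot_tags(rows):
    print(r)
```

The trade-off noted in this report is visible here: the wide layout is convenient for rendering, but any key outside the chosen columns either ends up in a catch-all field or is lost.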

3.1 Research on OSM Data Management
The semi-structured nature of the OSM data model, which uses “tags” to describe the metadata of geographical features, has been found to be both beneficial and challenging. The benefit is the simplicity and rigidity of the core data structure; the challenge is the very thing that gives OSM data some of its richness: the openness of tagging, which allows any key-value pair to be associated with any feature.
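To make the tag model concrete, the following is a minimal Python sketch of the three OSM primitives, each carrying a free-form dictionary of tags. The class names, the `unrecognised_keys` helper and the small `KNOWN_KEYS` set are illustrative assumptions, not part of any OSM tool. The point is that nothing in the structure itself stops a mistyped key such as `amenty` from being stored, which is why rule-based checking is needed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)  # any key-value pairs


@dataclass
class Way:
    id: int
    node_refs: List[int]  # ordered node ids; a closed way repeats its first id
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:
    id: int
    members: List[Tuple[str, int, str]]  # (member type, member id, role)
    tags: Dict[str, str] = field(default_factory=dict)


# Illustrative subset only; the recommended keys are listed on the
# Map_features wiki page.
KNOWN_KEYS = {"amenity", "highway", "name", "building", "landuse"}


def unrecognised_keys(tags: Dict[str, str]) -> List[str]:
    """Return tag keys not in KNOWN_KEYS, sorted for stable output."""
    return sorted(k for k in tags if k not in KNOWN_KEYS)


# A hypothetical feature with a misspelled key.
pub = Node(id=1, lat=52.94, lon=-1.19,
           tags={"amenity": "pub", "name": "The Crown", "amenty": "bar"})
print(unrecognised_keys(pub.tags))  # ['amenty']
```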

Within the scope of this project, there are six aspects of OSM data modelling that need to be studied. Research continues on the following:


 * 1) The conceptual data model: in this view, all OSM data consist of three data primitives (Nodes, Ways and Relations). Tags are key-value pairs associated with any of these three primitives. The recommended key-value pairs for describing geographical features are documented at http://wiki.openstreetmap.org/wiki/Map_features
 * 2) The way that data are structured within the OSM internal database: OSM has its own data structure for storing data internally, as illustrated at http://wiki.openstreetmap.org/wiki/Database/Model
 * 3) The way that OSM produces its dump and update files (the OSM-XML schema): this schema can be found at http://wiki.openstreetmap.org/wiki/API_v0.6/XSD
 * 4) The way that we need to store OSM data on our servers for analysis: this has been a trade-off between the optimum structure for applying the rules and the structure created by loader tools (e.g. Go Loader and FME). The structure created by Go Loader has been selected because of its performance, its simplicity, its normalization and its effective use of tags. This structure is detailed at http://wiki.snowflakesoftware.com/display/GLDOC/GO+Loader+OSM+Tables. Oracle Spatial 11g has been selected as the database because Radius Studio works effectively with it. In the import process we have been particularly concerned to keep the data as close as possible to the original OSM conceptual model. For example, the shapefiles from CloudMade already incorporate model changes: they split the data into thematic layers and thereby hide an interpretation of the features’ tags that decides which layer each feature belongs to.
 * 5) The way that we need to store the processed data on our servers in order to serve the public: PostGIS has been selected for this because it is an open database and is well suited to the web services that are to be established. The structure is the one produced by common conversion tools in the OpenStreetMap community (e.g. Osm2pgsql and Osmosis). Although this structure is, in our view, lossy and non-normalized, it is used efficiently by the Mapnik tiling service.
 * 6) The way that we encode and export data for our consumers: in addition to raster outputs, GML delivered through a WFS service has been selected, owing to the wide acceptance of these two open standards within the open geospatial community.
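As an illustration of keeping data close to the OSM conceptual model (item 4 above), the sketch below reads a small OSM-XML (API 0.6) fragment into three normalized row sets: nodes, ordered way-node references and tags. These row shapes are hypothetical and are not the actual GO Loader tables documented by Snowflake Software; relations are deliberately skipped to keep the example short.

```python
import xml.etree.ElementTree as ET


def osm_xml_to_rows(xml_text):
    """Split an OSM-XML document into normalized node, way-node and tag rows.

    Returns three lists:
      nodes:     (node_id, lat, lon)
      way_nodes: (way_id, sequence, node_id), preserving node order in the way
      tags:      (element_type, element_id, key, value)
    Relations (and any other element types) are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    nodes, way_nodes, tags = [], [], []
    for elem in root:
        if elem.tag == "node":
            nodes.append((int(elem.get("id")), float(elem.get("lat")), float(elem.get("lon"))))
        elif elem.tag == "way":
            way_id = int(elem.get("id"))
            for seq, nd in enumerate(elem.findall("nd")):
                way_nodes.append((way_id, seq, int(nd.get("ref"))))
        else:
            continue  # skip relations, bounds, etc.
        # All tags go into one generic table, whatever their key.
        for tag in elem.findall("tag"):
            tags.append((elem.tag, int(elem.get("id")), tag.get("k"), tag.get("v")))
    return nodes, way_nodes, tags


sample = """<osm version="0.6">
  <node id="1" lat="52.9" lon="-1.2"><tag k="amenity" v="pub"/></node>
  <node id="2" lat="52.91" lon="-1.21"/>
  <way id="10"><nd ref="1"/><nd ref="2"/><tag k="highway" v="residential"/></way>
</osm>"""

nodes, way_nodes, tags = osm_xml_to_rows(sample)
print(way_nodes)  # [(10, 0, 1), (10, 1, 2)]
print(tags)       # [('node', 1, 'amenity', 'pub'), ('way', 10, 'highway', 'residential')]
```

For a whole-country extract, `xml.etree.ElementTree.iterparse` with element clearing would be needed to keep memory use bounded; `fromstring` is used here only for brevity.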

In addition, research has been conducted on the possible need to import and export to and from the above items, as outlined below:


 * To import from OSM-XML to Oracle Spatial: this has been a critical step, as all data validation will be based on querying an optimized data structure of converted OSM data, which will be updated regularly. Osmosis/OGR2OGR tools, QuantumGIS plugins, FME and the GO Loader/GO Publisher tools have been reviewed. As a result, an academic license for GO Loader/GO Publisher has been obtained from Snowflake Software. This work also includes ongoing collaboration with the Snowflake team to optimize the conversion procedure. Using these tools, the whole UK OSM dataset has been loaded into the project’s Oracle database from GeoFabrik’s OSM XML dump files.
 * To convert between Oracle Spatial and PostGIS: a Java tool for this is under development but was not complete at the time of writing. This means that the data improvement and reporting carried out in Radius Studio is currently disconnected from the data representation on the web, as data cannot yet be fed from Oracle Spatial to PostGIS. This is the final component of this phase, although it may also be feasible to use GO Publisher and GO Loader to achieve this.
 * To import from OSM-XML to PostGIS: this may not be necessary in the later stages of the project; however, it has been studied for testing purposes and to investigate the database structure. The PostGIS database on the OSM-GB3 server was initially populated from OSM-XML data using Osm2pgsql.

Research on OSM Quality Improvement
Having Radius Studio as our rules/actions development tool made it necessary to collect and implement a set of rules and actions and to apply them to the database. Initial activities were:


 * Getting to know the Radius Studio environment and rules/actions development.
 * Studying the quality measurement and correction tools listed on the OSM wiki Quality Assurance page. A working review of these tools is given in Appendix A.
 * Reviewing available catalogues and guides for geospatial validation rules and how-to-fix guides, including documents from Socium rules and the JOSM validator. Appendix B provides a summary table extracted from JOSM documents.

Additionally, a set of 4 geometry validation rules and fixing actions has been developed in Radius Studio, tested and applied to the imported data in the Oracle Spatial database. These are simple geometry rules, including duplicate points, spikes, small rings, kickbacks and intersections in ways and polygons. Also, using the “Action Map” feature of Radius Studio, it has been possible to create new, corrected tables in the database in which some of the above abnormalities are corrected.
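Two of the simpler checks listed above, duplicate points and spikes/kickbacks along a way, can be illustrated in plain Python. This is a sketch of the idea only, not how the Radius Studio rules are implemented. It assumes planar (projected) coordinates, and the 1-degree spike threshold is an arbitrary assumption.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def consecutive_duplicates(coords: List[Point]) -> List[int]:
    """Indices i where coords[i] repeats the previous vertex."""
    return [i for i in range(1, len(coords)) if coords[i] == coords[i - 1]]


def remove_consecutive_duplicates(coords: List[Point]) -> List[Point]:
    """Simple fixing action: drop vertices identical to the one before."""
    cleaned = coords[:1]
    for p in coords[1:]:
        if p != cleaned[-1]:
            cleaned.append(p)
    return cleaned


def spikes(coords: List[Point], max_angle_deg: float = 1.0) -> List[int]:
    """Indices of interior vertices where the line doubles back on itself.

    The angle at vertex b between the vectors b->a and b->c is close to
    0 degrees for a spike or kickback. Zero-length segments are skipped,
    since duplicates are reported by consecutive_duplicates instead.
    """
    result = []
    for i in range(1, len(coords) - 1):
        (ax, ay), (bx, by), (cx, cy) = coords[i - 1], coords[i], coords[i + 1]
        v1 = (ax - bx, ay - by)
        v2 = (cx - bx, cy - by)
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        if n1 == 0 or n2 == 0:
            continue
        cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        if angle < max_angle_deg:
            result.append(i)
    return result


line = [(0, 0), (1, 0), (1, 0), (2, 0), (0, 0.001)]
print(consecutive_duplicates(line))  # [2]
print(spikes(line))                  # [3]
```

Separating detection from the fixing action mirrors the rules/actions split described above: a rule reports offending vertices, and a separate action decides how to repair them.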

To-do:
More research is needed in the next phase on developing more complex rules, both geometric and non-geometric. In addition, so far we have used the built-in functions of Radius Studio to correct geometry errors, but this is only a first step within the range of corrections that can be applied.

Community Engagement
The project team has established contacts with OSM developers, the Socium team and the Snowflake Software team, as well as with the wider crowd-sourced geospatial community, including experts from AGILE and EuroSDR, through last month’s Crowd-sourcing in National Mapping workshop. Moreover, the project has been promoted, and areas of cooperation discussed, with CSIRO and the Committee for Geographic Names Australasia.

In addition, the project blog has been regularly updated with progress, and the wider community has been informed via press releases, the live web page and the project wiki, all of which have been publicised through Twitter.

Paper abstracts are in preparation for the AGI Geocommunity (18th-20th September) and OSGIS (4th-5th September) conferences. It would also be beneficial to present at this year’s “State of the Map” conference (6th-8th September); however, its timing relative to the local OSGIS conference, together with SotM being held in Japan and travel budget constraints, means that attendance is unlikely. Abstract deadlines are as follows:


 * Geocommunity: 25th April
 * State of the Map: 30th April
 * OSGIS: 10th May

Specific Tasks and Timescales
Table 5.1 is extracted from the initial project Gantt chart and highlights tasks pertinent to phase 1 only. Tasks highlighted in green have been completed; those in orange are partially completed and still a work in progress (WIP); and those in red are incomplete, with work either yet to commence or under research.

Table 5.1 Infrastructure development and implementation objectives
=Appendix A: Review of the quality assurance tools for OpenStreetMap=

(As listed in http://wiki.openstreetmap.org/wiki/Quality_Assurance )

OpenStreetBugs

 * This tool allows anybody to report a “bug”.
 * The bugs are described by users as text with no structure.
 * Others can comment on a bug and/or edit the affected area using another editing environment (e.g. Potlatch) and mark the bug as fixed if they wish.
 * This looks like a Web 2.0 tool for discussing errors, relying mostly on personal observations and descriptions.
 * The bugs data files can be downloaded here.
 * Developed by an individual; open source.

MapDust

 * “The intention is to enable the widest possible range of people to improve the OpenStreetMap database regardless of their technical skills.”
 * No error structure; many of the reported errors in the UK have no description. Many other descriptions concern routing errors or the location of POIs, based on personal observations.
 * Daily data files can be downloaded as SQL here.
 * Web 2.0 features like OpenStreetBugs.
 * Developed by Skobbler GmbH.

KeepRight

 * Automatic error detection, following a hierarchy of errors and warnings.
 * 44 errors in 19 groups, 4 warnings.
 * Interfacing is possible through here.
 * A list of error messages can be found here.
 * The 30 rules currently applied are documented.
 * Private, individual development.

OSMOS (has nothing to do with “OSM + OS”!)

 * Supposed to be similar to KeepRight, but I found it to have a poor interface, and errors were difficult to find. Some parts are in French; I guess it works better on maps of France than on UK maps.
 * No automatic fix.

DuplicateNodeMap

 * Finds OSM nodes with different identifiers at the same location, which can be an error indicator. However, the website did not show any duplicate nodes when tested.
 * No automatic fix.
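The check that DuplicateNodeMap describes amounts to grouping nodes by position. A minimal sketch, assuming nodes arrive as (id, lat, lon) tuples and rounding to 7 decimal places (the precision at which OSM stores coordinates); this is an illustration of the idea, not DuplicateNodeMap's own code.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def duplicate_node_groups(nodes: Iterable[Tuple[int, float, float]],
                          precision: int = 7) -> List[List[int]]:
    """Return groups of node ids that share the same (rounded) location."""
    by_location = defaultdict(list)
    for node_id, lat, lon in nodes:
        by_location[(round(lat, precision), round(lon, precision))].append(node_id)
    # Only locations holding more than one node are potential errors.
    return [sorted(ids) for ids in by_location.values() if len(ids) > 1]


nodes = [(1, 52.9386, -1.1970), (2, 52.9386, -1.1970), (3, 52.94, -1.2)]
print(duplicate_node_groups(nodes))  # [[1, 2]]
```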

JOSM Validator

 * The internal validation tool of JOSM; it finds warnings, errors and “other” bugs.
 * The validation rules and how-to-fix guidance can be found in Appendix B.
 * Can sometimes fix the bug, but this seems to happen very rarely.

Gary68 tools

 * A set of Perl scripts developed by an OSM wiki user called Gary68, who is currently inactive owing to other commitments.
 * The tools are a set of scripts, each addressing a particular type of problem.
 * The analysis output is in XML or GPX files that can be read in JOSM.
 * No bug fixing.

OSM Inspector

 * Developed by GeoFabrik in Germany (the same company that developed MapCompare).
 * Has different validation “layers”: Geometry, Tagging, Places, Highways, Multipolygons, ...

=Appendix B: Rules and Fixes, According to JOSM Validation Tool=