My project is to build a series of map layers showing developments over an extended period, bronze age to 19th century with primary features of the present, covering the area of Cornwall, Devon, Dorset & Somerset.

I am trying to import sections of OSM into a postgis database using osm2postgresql-04.sh with osmosis but am finding a combination of the default schema and 'free spirited' tagging is resulting in the loss of significant information. (e.g. created_by, source, note, comment, etc. tags causing the tags field in the db to be truncated before relevant tags are reached).

4 months of manually hacking at the data has failed to solve the problem and convinced my the solution is to be able to configure the db to have columns for EVERY tag in the data and the use sql queries to pick what I need.

The problem is I have no knowledge of postgres and can just barely get by with mysql.

my present procedure is using osm2postgresql-04.sh to load the data, Qgis to convert it to MapInfo files for editing.

Can SKS point me to a primer or other help.

asked 16 Oct '11, 06:30

sparrowhawk's gravatar image

sparrowhawk
56224
accept rate: 0%


You do not want columns for ALL tags as this would mean tens of thousands of columns.

There are various ways of importing OSM data into a database and which one you choose depends largely on what you want to do with the database. What you describe above - having all tags in one text field - is something nobody does nowadays and I am intrigued that you should try that at all; you must have been following a 4 year old README or so.

Basically, we have importers geared towards rendering the data - these will tend to drop everything not needed for map drawing, and will make sure to create "geometry" objects for quick access. Having a "geometry" object e.g. for a road in the database means that you can quickly access the road as a whole, and have the database answer questions like "does road A intersect with forest B" or so, without having to first read and process all the nodes making up the road. Examples of such importers are the widely-used osm2pgsql or the newer Imposm. With both you have considerable control over which tags get imported into their own columns, by way of a config file. osm2pgsql will always create exactly four database tables holding your geometries, whereas with Imposm this is configurable.

The other kind of importers we have is geared towards further processing of the data and these will usually make an effort to presever all information about an OSM object, including all tags and meta data like who created it when, and can optionally also preserve history, i.e. they can keep older versions of the same object. Osmosis is the prime example of such importers.

Both areas are not 100% distinct. osm2pgsql for example allows you, by way of extra commandline switches, to store all tags - not only those for which a separate column is created - in a so-called "hstore" column. This is a feature of PostgreSQL where you can have an unlimited number of key-value pairs in one database column. osm2pgsql also has a feature where you can request some metadata (user id, date of last modification) to be recorded in a separate column. On the other hand, Osmosis offers two distinct database schemas, one being the "API DB" schema where all tags and all metadata are stored (tags have their own database tables with one row per tag nowadays - not the old "all tags in one text field" scheme anymore!), and the other being the "Snapshot" schema which optionally includes built geometries.

Additionally, there are specialist importers like osm2pgrouting that import data optimized for specialist purposes.

If you plan to edit in Mapinfo, you could also use osm2shp or osmjs to convert OSM data to shape files and edit those, or you could use the OSM plugin for QGis to load OSM data directly into QGis.

permanent link

answered 16 Oct '11, 10:41

Frederik%20Ramm's gravatar image

Frederik Ramm ♦
71.0k836431107
accept rate: 24%

In addition to Frederick's excelent pointers, I'd advise you to browse the postres docs (http://www.postgresql.org/docs/current/static/) and postgis docs (http://www.postgis.org/docs/) they are very high quality and should help you understand what can be done with th db.

(16 Oct '11, 14:10) Vincent de P... ♦

Thanks for your response, you have given me a lot to digest. The process is going to be far more complex than I had thought

You do not want columns for ALL tags as this would mean tens of thousands of columns.

Not in the finished product but I need to simplify the extraction and sorting of significant objects.

In a perfect world, there would be a clearly defined hiarachy of tags and no one would make typos but we are stuck in the real world, full of human fraility and creativity.

The area I am working with contains 60,000 unique tag key = value sets covering about 300 tag keys that can be reduced to about 80-100 keys by merging various ways of saying the same thing.

I have been using grep to find the key = value sets in the .osm file then using awk to delete the unwanted tags but this takes 3 days for the grep/awk and another 2 -3 days to manually cut/paste the unprocessed tags into separate columns just for a mapinfo table of highway = primary, trunk, secondary, & tertiary

What you describe above - having all tags in one text field - is something nobody does nowadays and I am intrigued that you should try that at all; you must have been following a 4 year old README or so.

This is done by both osm2postgresql-04.sh (which seems to be a fronted to osmosis-0.38) and the Qgis OSM importer. I did wonder why anyone would want to roll all tags into a single field.

If you plan to edit in Mapinfo, you could also use osm2shp or osmjs to convert OSM data to shape files and edit those, or you could use the OSM plugin for QGis to load OSM data directly into QGis.

I use MapInfo for editing as I find it the most comfortable to use, I find QGis painfully slow for editing when it lets me select edit mode, but mapinfo is plagued by the 254 character limitation for text fields and I can't get it to search containing quote or equals symbols.

The final product for this phase of the process is to produce a db with tables for places, motor vehicle roads, borders, coastline & waterways to anchor the present, buildings, archealogical sites, etc. to illustrate the past.

I hope this clarifies things.

permanent link

answered 17 Oct '11, 04:03

sparrowhawk's gravatar image

sparrowhawk
56224
accept rate: 0%

I see. osm2postgis uses Osmosis to import data into PostgreSQL, and Osmosis duly creates a "hstore" column that contains all tags, but then osm2postgresql-04.sh flattens that into a single text column. Unless you have a very good reason to use this program (i.e. unless you really need exactly the data structure it creates) - do not use osm2postgresql-04.sh. If you use Osmosis directly then you get a database with a "hstore" column and you could easily break that out into "proper" columns for the 80 or so tags you are interested in, with a few lines of SQL. Or else use osm2pgsql which already creates any number of columns for you, you just have to add them to the config file.

It sounds as if you should really discuss your project in depth with someone more familiar with OSM and databases. Maybe there's a local pub meet or some other OSM event that you could go to?

permanent link

answered 17 Oct '11, 08:13

Frederik%20Ramm's gravatar image

Frederik Ramm ♦
71.0k836431107
accept rate: 24%

Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×219
×121

question asked: 16 Oct '11, 06:30

question was seen: 6,887 times

last updated: 17 Oct '11, 08:13

powered by OSQA