I realize this is a technical question about shapefiles rather than about OSM, but I hope you will excuse me. For what it is worth, I am a newbie, but I expect to be involved in OSM for a while, so helping me won't be a lost cause: I think I will contribute back.

Anyway: I have downloaded shapefiles from both Cloudmade and Geofabrik. I have also downloaded OSM files and converted them.

My problem: all the shapefiles have different dBase (.dbf) structures. For example, the Cloudmade file is nice and neat with four columns, but it is missing a lot of tag information. QGIS saves shapefiles with anywhere from four to twenty columns so far, with the tags column holding a variety of information such as "bicycle"="yes","hiking"="yes","information"="map","internet_access"="terminal". Some of this information is duplicated in a separate column: e.g. the tags column lists parking/fees/surface... and the amenity column lists 'parking'.

This means that I have to parse this data somewhere else, which means that I not only have to write a parser (or at least a macro), but it would also have to deal with unknowns each time. What is weird is that the program I am using, NetLogo (for the moment), is oddly successful in dealing with some of the data. Unfortunately, I don't have details on how it handles this data, so I can't use that to solve the problem. However, I would like to move forward with the data side solved and then deal with issues on the programming side.

So what I am looking for is:

  • An explanation of why the converters give such different results (or is it the same result and I am just missing the conversion schema?).
  • Ideas for how to filter the data if I download directly from OSM. (If I missed that, please don't jump on me. I am sorry, but all I've seen is the very nice export option. I haven't found any filtering options.)
  • Any ideas for a program to handle the data in order to convert it after download? I have been using QGIS so far, but I am a newbie at pretty much everything and can't get QGIS to present just the data I want. I would be happy enough to do the work manually and separate the layers at the QGIS stage, but the information is scattered in the 'tags' column and I can't seem to get at it.

I realize this is only partially an OSM question, but any help would be appreciated.


asked 13 Mar '12, 11:50

jake cimilo

The shapefile format has a lot of limitations, so it is not possible to create a shapefile with all the information from OSM. When you convert, you (or the converter) have to decide what to include, which is why different converters give different columns.
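To illustrate the "decide what to include" step, here is a minimal Python sketch. It assumes the tags text looks like the example quoted in the question ("key"="value" pairs separated by commas); the function names and the list of wanted keys are made up for illustration. It also assumes the usual .dbf restriction that column names are at most 10 characters long, which is one reason long keys get truncated.

```python
import re

# Matches one "key"="value" (or "key"=>"value") pair.
# Backslash escapes inside the quotes are tolerated but not unescaped.
PAIR_RE = re.compile(r'"((?:[^"\\]|\\.)*)"\s*=>?\s*"((?:[^"\\]|\\.)*)"')


def parse_tags(text):
    """Turn a tags string such as '"bicycle"="yes","hiking"="yes"' into a dict."""
    return {key: value for key, value in PAIR_RE.findall(text)}


def dbf_field_name(key):
    """Make a tag key usable as a .dbf column name (at most 10 characters)."""
    return re.sub(r"\W", "_", key)[:10]


def to_columns(tags, wanted_keys):
    """Keep only the wanted keys, one column per key; missing keys become None."""
    return {dbf_field_name(key): tags.get(key) for key in wanted_keys}


tags = parse_tags(
    '"bicycle"="yes","hiking"="yes","information"="map","internet_access"="terminal"'
)
row = to_columns(tags, ["bicycle", "internet_access", "mtb:scale"])
print(row)
```

Every feature ends up with the same columns, whatever tags it happens to carry, which is the kind of fixed structure a .dbf file needs.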

You don't describe what data you actually need, so it is hard to say for sure what is the best solution for you. If you need more than just simple shapes, you might be better off importing the planet file (or one of the extracts from Geofabrik or Cloudmade) into a PostGIS database.

Again, depending on your needs, there are several ways to do that. Some of the options are Osm2pgsql, Imposm and Osmosis.

With Osmosis, PostgreSQL and PostGIS you can do something like this:

Create a PostGIS- and hstore-enabled database, set up the pgsnapshot schema, and then use Osmosis to import the data:

osmosis --read-pbf file=denmark.osm.pbf outPipe.0=1 --write-pgsql database=osm user=XXX password=XXX inPipe.0=1

Then create and index new qid columns to make QGIS happy:

CREATE SEQUENCE nodes_qid_seq;
ALTER TABLE nodes ADD COLUMN qid integer NOT NULL DEFAULT nextval('nodes_qid_seq');
ALTER TABLE nodes ADD CONSTRAINT unique_nodes_qid UNIQUE (qid);

CREATE SEQUENCE ways_qid_seq;
ALTER TABLE ways ADD COLUMN qid integer NOT NULL DEFAULT nextval('ways_qid_seq');
ALTER TABLE ways ADD CONSTRAINT unique_ways_qid UNIQUE (qid);

When you use the pgsnapshot schema, all the tags are stored in an hstore column. To make it easier to work with from QGIS, you can create views like this:

  -- "Amenities": tagged nodes (points) and ways (lines) in one view.
  CREATE OR REPLACE VIEW "Amenities" AS
  SELECT nodes.qid, nodes.tags -> 'amenity' AS amenity, nodes.tags -> 'leisure' AS leisure, 
    nodes.tags -> 'sport' AS sport, nodes.tags -> 'tourism' AS tourism, nodes.geom
  FROM nodes
  WHERE nodes.tags ? 'amenity' OR nodes.tags ? 'leisure' OR nodes.tags ? 'sport' OR nodes.tags ? 'tourism'
  UNION ALL
  SELECT ways.qid, ways.tags -> 'amenity' AS amenity, ways.tags -> 'leisure' AS leisure, 
    ways.tags -> 'sport' AS sport, ways.tags -> 'tourism' AS tourism, ways.linestring AS geom
  FROM ways
  WHERE ways.tags ? 'amenity' OR ways.tags ? 'leisure' OR ways.tags ? 'sport' OR ways.tags ? 'tourism';

  -- "Roads": one column per routing-related tag, taken from the hstore column.
  CREATE OR REPLACE VIEW "Roads" AS
  SELECT w.qid, w.tags -> 'highway' AS highway, w.tags -> 'foot' AS foot, w.tags -> 'bicycle' AS bicycle, 
    w.tags -> 'tracktype' AS tracktype, w.tags -> 'access' AS access, w.tags -> 'surface' AS surface, 
    w.tags -> 'sac_scale' AS sac_scale, w.tags -> 'mtb:scale' AS mtb_scale, w.linestring
  FROM ways w
  WHERE w.tags ? 'highway' OR w.tags ? 'foot' OR w.tags ? 'bicycle' OR w.tags ? 'tracktype' OR 
    w.tags ? 'access' OR w.tags ? 'surface' OR w.tags ? 'sac_scale' OR w.tags ? 'mtb:scale';

  -- "Walkable": roads that do not forbid walking, restrict access, or belong to a major road class.
  CREATE OR REPLACE VIEW "Walkable" AS
  SELECT "Roads".qid, "Roads".highway, "Roads".foot, "Roads".bicycle, "Roads".tracktype, "Roads".access, 
    "Roads".surface, "Roads".sac_scale, "Roads".mtb_scale, "Roads".linestring
  FROM "Roads"
  WHERE COALESCE("Roads".foot, 'empty') <> 'no' AND 
    (COALESCE("Roads".access, 'empty') <> ALL (ARRAY['private', 'no', 'restricted', 'emergency'])) AND 
    (COALESCE("Roads".highway, 'empty') <> ALL (ARRAY['primary', 'cycleway', 'trunk', 'motorway', 'primary_link', 'trunk_link', 'motorway_link', 'motorway_junction']));

Then in QGIS, you can add the Amenities and Walkable views as PostGIS layers.
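If you want to try the "walkable" rule outside the database first, the same three conditions can be sketched in Python. This is only an illustration of the filter above: it assumes each feature's tags are available as a plain dict, and the function and constant names are made up. A missing tag behaves like the COALESCE(..., 'empty') in the SQL, i.e. it never excludes a road.

```python
# Values copied from the WHERE clause of the walkable filter.
BLOCKED_ACCESS = {"private", "no", "restricted", "emergency"}
NON_WALKABLE_HIGHWAYS = {
    "primary", "cycleway", "trunk", "motorway",
    "primary_link", "trunk_link", "motorway_link", "motorway_junction",
}


def is_walkable(tags):
    """Return True if a road's tags pass all three exclusion tests."""
    return (
        tags.get("foot") != "no"
        and tags.get("access") not in BLOCKED_ACCESS
        and tags.get("highway") not in NON_WALKABLE_HIGHWAYS
    )


roads = [
    {"highway": "path"},
    {"highway": "motorway"},
    {"highway": "residential", "foot": "no"},
    {"highway": "service", "access": "private"},
]
walkable = [tags for tags in roads if is_walkable(tags)]
print(walkable)
```

Only the first road survives; the other three each fail exactly one of the conditions.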

You should change the views to use your own classifications.
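Since every classification needs the same kind of view (one column per tag key, plus a filter on those keys), you could generate the SQL instead of typing it. The sketch below is just one way to do that; build_view_sql and the "Paths" example are hypothetical, the table and geometry column names follow the pgsnapshot tables used above, and the view and table names are assumed to be trusted input (they are not escaped).

```python
import re


def quote_literal(key):
    """Quote a tag key as an SQL string literal."""
    return "'" + key.replace("'", "''") + "'"


def column_alias(key):
    """Derive a column name from a tag key, e.g. 'mtb:scale' -> 'mtb_scale'."""
    return re.sub(r"\W", "_", key)


def build_view_sql(view_name, table, geom_column, keys):
    """Build a CREATE VIEW statement exposing each hstore key as its own column."""
    columns = ",\n    ".join(
        f'{table}.tags -> {quote_literal(key)} AS "{column_alias(key)}"' for key in keys
    )
    where = " OR ".join(f"{table}.tags ? {quote_literal(key)}" for key in keys)
    return (
        f'CREATE OR REPLACE VIEW "{view_name}" AS\n'
        f"  SELECT {table}.qid,\n"
        f"    {columns},\n"
        f"    {table}.{geom_column} AS geom\n"
        f"  FROM {table}\n"
        f"  WHERE {where};"
    )


sql = build_view_sql("Paths", "ways", "linestring", ["highway", "mtb:scale"])
print(sql)
```

The printed statement can be pasted into psql or pgAdmin, and the resulting view added to QGIS like any other PostGIS layer.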

You can also use osm2pgsql and change the import style to create a schema that matches your needs.

There are no fixed rules for how anything should be tagged in OSM, so your biggest task is probably identifying what tag/value combinations you should use to make your classifications.

You can start by looking at some of these keys and especially their link to taginfo on the right side of the page - the "values" tab can give you an idea about values you should consider:

highway, foot, bicycle, tracktype, surface, access, mtb:scale and sac_scale.
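Taginfo shows which values are used worldwide; for your own extract you can make a similar tally yourself. A rough Python sketch, assuming you already have each feature's tags as a dict (the sample features here are invented):

```python
from collections import Counter, defaultdict


def value_counts(features, keys):
    """Count how often each value occurs for each of the given tag keys."""
    counts = defaultdict(Counter)
    for tags in features:
        for key in keys:
            if key in tags:
                counts[key][tags[key]] += 1
    return counts


features = [
    {"highway": "residential", "surface": "asphalt"},
    {"highway": "residential"},
    {"highway": "path", "bicycle": "yes", "mtb:scale": "1"},
    {"amenity": "parking"},
]
counts = value_counts(features, ["highway", "bicycle", "mtb:scale"])
for key, counter in counts.items():
    print(key, counter.most_common())
```

The values that show up most often in your area are the ones your classification has to handle first.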


answered 13 Mar '12, 12:37

Dymo12

edited 13 Mar '12, 20:12

I would like to do that. Eventually I will get to the point where I am storing the various layers I need, including the ones I generate, in an external database. For the moment, I wanted the .dbf file to hold just certain layers.

For example, I would like to have separate shapefiles with:

  • Highways but with the capability to show which roads are hikeable and bikeable.
  • Paths that can be shown to be bikeable to various degrees: mountain-bikeable, walkable, hikeable, climbable.
  • Amenities, but broken out separately: gas stations, parking, toilets, bars, restaurants, hotels, etc.
  • Natural parks, recreation sites, camping, etc.

To take the first example, bikeable roads have a tag: "bicycle"="yes". All the .dbf files I have right now just list that tag in a text record with all the other tags that apply to that feature. If the tags were separated, or if I could download them in a way that the .dbf file showed a column "bicycle" and a column "level" (for 1 to 5, or whichever standard is used), then I could sort the roads using those columns, e.g. "highways" with "bicycle"="yes" and "level"<"3".

I admit that my reason is just that my system has built-in ways to do this, which I prefer over writing my own script. But if I am running hundreds of agents and hundreds of scenarios, I prefer having the data presorted... aye, which I guess is what I will have to do.

Still, if you know an answer without me having to code something, I would appreciate it.


(13 Mar '12, 13:17) jake cimilo

Are you aware that QGIS can load OSM files directly (through its OSM plugin)? This may cause problems if your data uses multipolygons, but it would be enough for simple OSM elements (ways, nodes).
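For those simple elements the .osm file itself is plain XML: each node or way carries its tags as child elements with a k and a v attribute. If you want to look at the tags before involving QGIS, a small Python sketch along these lines may help. The sample data and the function name are invented, and for a real file you would use ET.parse on the file instead of ET.fromstring.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written sample in the OSM XML layout.
SAMPLE = """
<osm version="0.6">
  <node id="1" lat="55.0" lon="12.0">
    <tag k="amenity" v="parking"/>
    <tag k="fee" v="no"/>
  </node>
  <node id="2" lat="55.1" lon="12.1"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="path"/>
    <tag k="bicycle" v="yes"/>
  </way>
</osm>
""".strip()


def read_elements(osm_xml, kind):
    """Return (id, tags) for every element of the given kind that has at least one tag."""
    root = ET.fromstring(osm_xml)
    result = []
    for element in root.iter(kind):
        tags = {tag.get("k"): tag.get("v") for tag in element.findall("tag")}
        if tags:
            result.append((element.get("id"), tags))
    return result


nodes = read_elements(SAMPLE, "node")
ways = read_elements(SAMPLE, "way")
print(nodes)
print(ways)
```

Untagged nodes (the ones that only give a way its shape) are skipped, so what remains are the features you would actually classify.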

(13 Mar '12, 14:55) Pieren

I think it is difficult to get the level of detail you want without some coding/scripting, but it is not that hard, and it also makes it easier for you to keep your data updated. I have added some examples to my answer.

(13 Mar '12, 20:12) Dymo12

Dymo12 - thanks.

I have a lot of sorting out of my project to do before I attempt any of your suggestions. The guts of it is cognitive models of people, and OSM is just the playground they will eventually use. For the moment, I can 'fake' a playground without sorting the OSM data, and I need to do further work on the cognitive model rather than the playground.

However, I will come back to this because I really like OSM. I even like that "There are no fixed rules for how anything should be tagged in OSM": I much prefer a flexible system, even at the cost of having to learn how to code. When I get back to it, I will post what happened and how I succeeded. : )

(14 Mar '12, 09:20) jake cimilo
question asked: 13 Mar '12, 11:50

question was seen: 11,199 times

last updated: 17 Mar '12, 11:11