I would like to start with the planet database and filter it down so that I can use it only for geocoding administrative and geographic place names to lat/long (using Nominatim). I don't care about roads, POIs, etc. I want to reduce the import time and storage requirements.

I first tried this in a straightforward way: I started with the whole planet db (22GB) and then removed all relations and ways so I was only left with nodes (15GB). I further filtered the nodes to places and some geographic features (down to 117MB). The import took less than overnight.

However, this filtering does not work well, because some containment relationships are specified by boundary=administrative. Since boundaries are mapped as ways (and relations built from them), I lost them when I removed all ways and relations. For instance, I lost the state boundaries, so some US cities came back with lookup results like "City, County, USA" instead of "City, County, State, USA".

I'll put back the administrative boundaries. But I would very much appreciate any further advice about what to remove and what not to remove so as not to lose too much information but still reduce the database size significantly.

FOLLOWUP: For my purposes, removing buildings and roads was fine. The input data does not mention roads, so they could be safely removed. I used osmfilter:

 osmfilter --keep="name= and ( mountain_pass= or natural= or place= or waterway= or boundary=administrative or boundary=national_park or boundary=protected_area )"

I reduced the planet pbf from 22GB to 1.6GB using the above filtering. Doing --drop-author reduces it further to 1.4GB.
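
To make the logic of the keep expression above easier to check, here is a minimal Python sketch of the same rule applied to one element's tags. It assumes that `key=` in the expression means "key present with any value", which is how the filter appears to be intended. The names `keep_element`, `ANY_VALUE_KEYS` and `BOUNDARY_VALUES` are made up for this illustration and are not part of osmfilter.

```python
# Keys that qualify an element when present with any value
# (mirrors "mountain_pass= or natural= or place= or waterway=").
ANY_VALUE_KEYS = {"mountain_pass", "natural", "place", "waterway"}

# Accepted values of the boundary key.
BOUNDARY_VALUES = {"administrative", "national_park", "protected_area"}


def keep_element(tags):
    """Return True if an element with these tags passes the rule.

    tags is a plain dict of OSM key -> value (an assumed representation).
    """
    # The rule requires a name in every case.
    if "name" not in tags:
        return False
    # Any of the "any value" keys is enough.
    if any(key in tags for key in ANY_VALUE_KEYS):
        return True
    # Otherwise only the listed boundary values qualify.
    return tags.get("boundary") in BOUNDARY_VALUES


if __name__ == "__main__":
    samples = [
        {"name": "Boston", "place": "city"},
        {"name": "Massachusetts", "boundary": "administrative"},
        {"name": "Main Street", "highway": "residential"},
        {"place": "hamlet"},  # unnamed, so it is dropped
    ]
    for tags in samples:
        print(keep_element(tags), tags)
```

A predicate like this only helps for reasoning about which tagged elements survive; it says nothing about how the filtering tool handles the nodes that kept ways depend on.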

asked 23 Oct '13, 18:28

dhalbert

edited 28 Oct '13, 20:25


Removing ways whose only tags are a building tag; a building tag and a source tag; or a building tag, a source tag and a wall tag will significantly reduce the file size. On the other hand, Nominatim might do this already. You'd also want to remove their nodes if you're not planning to use updates.
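
As a rough illustration of that rule, the following Python sketch marks ways whose tag keys are exactly one of the three combinations and then collects the nodes used only by those ways. The in-memory layout (a dict of way id to tags and node ids) and the names `is_plain_building` and `droppable` are assumptions made for this example, not anything a particular tool provides.

```python
# The three tag-key combinations named in the answer.
PLAIN_BUILDING_KEYSETS = (
    frozenset({"building"}),
    frozenset({"building", "source"}),
    frozenset({"building", "source", "wall"}),
)


def is_plain_building(tags):
    """True if the way's keys are exactly one of the three combinations."""
    return frozenset(tags) in PLAIN_BUILDING_KEYSETS


def droppable(ways):
    """ways: dict of way_id -> (tags dict, list of node ids).

    Returns (way ids to drop, node ids referenced only by dropped ways).
    """
    drop_ways = {wid for wid, (tags, _) in ways.items() if is_plain_building(tags)}
    kept_nodes, dropped_nodes = set(), set()
    for wid, (_, nodes) in ways.items():
        if wid in drop_ways:
            dropped_nodes.update(nodes)
        else:
            kept_nodes.update(nodes)
    # A node shared with a surviving way must stay.
    return drop_ways, dropped_nodes - kept_nodes


if __name__ == "__main__":
    ways = {
        1: ({"building": "yes"}, [10, 11, 12, 10]),
        2: ({"building": "yes", "name": "Town Hall"}, [12, 13, 14, 12]),
        3: ({"highway": "residential"}, [11, 20]),
    }
    print(droppable(ways))
```

The sketch deliberately ignores two things a real run would have to handle: nodes that carry tags of their own, and ways or nodes that are members of relations.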

You can't always remove roads because some geocoders rely on them.

From a file size perspective, the planet is largely buildings, roads and the nodes used for them. natural/landuse are the only other tags that spring to mind, but you don't want to remove any named landuse.

answered 26 Oct '13, 12:07

pnorman

edited 29 Oct '13, 21:18

I think you are potentially underestimating the complexity of what you want to do. Nominatim employs a lot of logic to try to get it right using the whole database, has to fall back on heuristics in some circumstances, and still doesn't always get it right.

Probably the best approach would be to remove only data where you can be fairly sure that doing so will have no consequences for your application. For example, just removing building outlines that have no interesting tags and are not members of a relation should already result in a noticeable size reduction.
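
A small Python sketch of that conservative rule, assuming the ways and relations have already been loaded into simple Python structures. The set `UNINTERESTING_KEYS` is a hypothetical choice for illustration; which tags count as "interesting" depends on the application.

```python
# Hypothetical set of keys treated as carrying no useful information here.
UNINTERESTING_KEYS = {"building", "source", "created_by"}


def removable_buildings(ways, relations):
    """Return ids of building ways that look safe to remove.

    ways: dict of way_id -> tags dict
    relations: iterable of member lists, each a list of (member_type, member_id)
    """
    # Every way that is referenced by any relation must be kept.
    member_ways = {
        ref
        for members in relations
        for member_type, ref in members
        if member_type == "way"
    }
    return {
        wid
        for wid, tags in ways.items()
        if "building" in tags
        and set(tags) <= UNINTERESTING_KEYS  # nothing beyond the ignorable keys
        and wid not in member_ways
    }


if __name__ == "__main__":
    ways = {
        1: {"building": "yes"},
        2: {"building": "yes", "source": "survey"},  # relation member, kept
        3: {"building": "yes", "name": "Library"},   # named, kept
        4: {"natural": "water"},                     # not a building, kept
    }
    relations = [[("way", 2), ("node", 5)]]
    print(removable_buildings(ways, relations))
```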

answered 24 Oct '13, 10:42

SimonPoole ♦
