I would like to start with the planet database and filter it down for use only in geocoding administrative and geographic placenames to lat-long (using Nominatim). I don't care about roads, POIs, etc. I want to reduce the import time and storage requirements.

I first tried this in a straightforward way: I started with the whole planet db (22GB) and removed all relations and ways, so that I was left with only nodes (15GB). I then filtered the nodes down to places and some geographic features (117MB). The import took less than overnight.

However, this filtering does not work very well, because some containment relationships are specified by boundary=administrative. Since boundaries are ways, I lost them when I removed all ways. For instance, I lost the state boundaries, so some US cities came back with lookup results like City, County, USA instead of City, County, State, USA.

I'll put back the administrative boundaries. But I would very much appreciate any further advice about what to remove and what to keep, so that I reduce the database size significantly without losing too much information.

FOLLOWUP: For my purposes, removing buildings and roads was fine. The input data does not mention roads, so they could be safely removed. I used
I reduced the planet pbf from 22GB to 1.6GB using the above filtering. Doing

asked 23 Oct '13, 18:28 dhalbert
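
For illustration only, here is a minimal Python sketch of the kind of tag-based rule the question describes: keep place nodes and put back anything tagged boundary=administrative. It is not the command actually used. The function name and the `(kind, tags)` representation are assumptions made for this example, and the "geographic features" mentioned in the question are left out because the thread does not list them. A real extract would also need to keep the nodes that the retained boundary ways refer to.

```python
def keep_for_placename_geocoding(kind, tags):
    """Decide whether an OSM element should survive the filter.

    kind -- "node", "way" or "relation" (hypothetical representation)
    tags -- dict of the element's tags
    """
    # Administrative boundaries carry the containment information
    # (county inside state, and so on), so keep them for every element type.
    if tags.get("boundary") == "administrative":
        return True
    # Place nodes (cities, towns, villages, ...) are what gets geocoded.
    if kind == "node" and "place" in tags:
        return True
    # Everything else (roads, buildings, POIs) is dropped.
    return False
```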
Removing ways that carry only a building tag, only building and source tags, or only building, source and wall tags will significantly reduce the file size. On the other hand, Nominatim might do this already. You'd also want to remove their nodes if you're not planning to use updates. You can't always remove roads, because some geocoders rely on them. From a file size perspective, the planet is largely buildings, roads and the nodes used for them. natural/landuse are the only other tags that spring to mind, but you don't want to remove any named landuse.

answered 26 Oct '13, 12:07 pnorman
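
As a rough sketch of the rule in this answer, the Python below marks a way as removable when its tag keys are exactly one of the three combinations named above, then finds the nodes that no surviving way still uses. The data layout (a dict mapping way ID to a `(tags, node_refs)` pair) and all function names are assumptions made for this example, not part of any OSM tool. Nodes that carry tags of their own would need a separate check, which is not shown here.

```python
# The three tag-key combinations named in the answer.
BARE_BUILDING_KEYSETS = {
    frozenset({"building"}),
    frozenset({"building", "source"}),
    frozenset({"building", "source", "wall"}),
}


def is_bare_building(tags):
    """True if the way's tag keys are exactly one of the listed combinations."""
    return frozenset(tags) in BARE_BUILDING_KEYSETS


def split_ways(ways):
    """Split {way_id: (tags, node_refs)} into (kept, dropped) dicts."""
    kept = {wid: w for wid, w in ways.items() if not is_bare_building(w[0])}
    dropped = {wid: w for wid, w in ways.items() if is_bare_building(w[0])}
    return kept, dropped


def orphaned_nodes(kept, dropped):
    """Node IDs referenced by dropped ways and by no kept way."""
    still_used = {n for _, refs in kept.values() for n in refs}
    candidates = {n for _, refs in dropped.values() for n in refs}
    return candidates - still_used


ways = {
    1: ({"building": "yes"}, [10, 11, 12, 10]),                       # bare building
    2: ({"building": "yes", "name": "Town Hall"}, [12, 13, 14, 12]),  # named, kept
    3: ({"highway": "residential"}, [11, 20]),                        # road, kept here
}
kept, dropped = split_ways(ways)
removable_node_ids = orphaned_nodes(kept, dropped)
```

In this toy input only way 1 is dropped, and only node 10 becomes removable, because nodes 11 and 12 are shared with ways that are kept.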
I think you are potentially underestimating the complexity of what you want to do. Nominatim employs a lot of logic to try to get it right using the whole database, has to use heuristics in some circumstances, and still doesn't "always" get it right. Probably the best approach would be to remove only data where you can be fairly sure that removal will have no consequences for your application. For example, just removing building outlines that have no interesting tags and are not a member of a relation should already result in a noticeable size reduction.

answered 24 Oct '13, 10:42 SimonPoole ♦
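
The conservative check suggested here could look something like the following Python sketch. It first collects the IDs of all ways that are relation members, then drops a building outline only if it is not in that set and has no tags beyond an "uninteresting" list. What counts as uninteresting is not defined in this thread, so `UNINTERESTING_KEYS` is a hypothetical choice, as are the function names and the representation of relations as lists of `(member_type, ref)` tuples.

```python
# Hypothetical: tag keys assumed to carry nothing useful for placename geocoding.
UNINTERESTING_KEYS = {"building", "source", "created_by"}


def relation_member_way_ids(relations):
    """Collect IDs of ways referenced by any relation.

    relations -- iterable of member lists; each member is a (type, ref) tuple
    """
    return {
        ref
        for members in relations
        for member_type, ref in members
        if member_type == "way"
    }


def can_drop_building(way_id, tags, member_way_ids):
    """True only for building outlines that are safe to drop under these assumptions."""
    if "building" not in tags:
        return False  # not a building outline at all
    if way_id in member_way_ids:
        return False  # part of a relation (e.g. a multipolygon), keep it
    # Drop only if every tag key is on the uninteresting list.
    return set(tags) <= UNINTERESTING_KEYS


relations = [[("way", 5), ("node", 7)]]
member_ids = relation_member_way_ids(relations)
```

Building the member set in a first pass keeps the per-way check cheap, at the cost of reading the relations before the ways.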