Filtering the planet file for placename-only geocoding
I would like to start with the planet database and filter it down so that it serves only one purpose: geocoding administrative and geographic place names to lat/long with Nominatim. I don't care about roads, POIs, etc. I want to reduce import time and storage requirements.
I first tried this in a straightforward way. I started with the whole planet file (22GB) and removed all relations and ways, which left only nodes (15GB). I then filtered the nodes down to places and some geographic features (117MB). The import finished in less than a night.
However, this filtering doesn't work well, because some containment relationships come from *boundary=administrative* features. Those boundaries are mapped as ways and, mostly, as relations built from ways, so removing all relations and ways removed them too. For instance, I lost the state boundaries, so some US cities came back with lookup results like *City, County, USA* instead of *City, County, State, USA*.
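To see why the state disappears, here is a simplified Python sketch, not Nominatim's actual code, that builds an address by collecting every administrative polygon containing a place's point and ordering them by `admin_level` (in the US, 2 is the country, 4 a state, 6 a county). The `Boundary` class, the `square` helper, and the names are hypothetical; real boundaries are usually multipolygon relations rather than single rings.

```python
from dataclasses import dataclass


@dataclass
class Boundary:
    name: str
    admin_level: int  # lower = larger unit (US: 2 country, 4 state, 6 county)
    polygon: list[tuple[float, float]]  # (lon, lat) vertices of one closed ring


def contains(polygon, lon, lat):
    """Ray-casting point-in-polygon test (even-odd rule)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def address_for(place_name, lon, lat, boundaries):
    """Return the place name followed by enclosing boundaries, smallest first."""
    enclosing = [b for b in boundaries if contains(b.polygon, lon, lat)]
    enclosing.sort(key=lambda b: b.admin_level, reverse=True)
    return ", ".join([place_name] + [b.name for b in enclosing])


def square(x0, y0, x1, y1):
    """Hypothetical axis-aligned test polygon."""
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


# Hypothetical nested boundaries around a city at (0.5, 0.5).
country = Boundary("USA", 2, square(-10, -10, 10, 10))
state = Boundary("State", 4, square(-5, -5, 5, 5))
county = Boundary("County", 6, square(-1, -1, 1, 1))

print(address_for("City", 0.5, 0.5, [country, state, county]))
print(address_for("City", 0.5, 0.5, [country, county]))  # state boundary filtered out
```

With all three boundaries the first call builds `City, County, State, USA`. Leaving the state polygon out of the list produces `City, County, USA`, the same symptom as in the lookup results described here.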
I'll put the administrative boundaries back. Beyond that, I would very much appreciate advice on what to remove and what to keep, so that I don't lose too much information but still reduce the database size significantly.
**FOLLOWUP:** For my purposes, removing buildings and roads was fine. The input data I am geocoding does not mention roads, so they could be safely removed. I used `osmfilter`:
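    # file names are placeholders; osmfilter reads .osm/.o5m, not .pbf, so convert the planet with osmconvert first
    osmfilter planet.o5m --keep="name= and ( mountain_pass= or natural= or place= or waterway= or boundary=administrative or boundary=national_park or boundary=protected_area )" -o=places.o5m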
Filtering this way reduced the planet pbf from 22GB to 1.6GB. Adding `--drop-author`, which drops author metadata such as user and changeset, reduces it further to 1.4GB.
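The `--keep` expression can be read as a predicate on an object's tags. The sketch below restates it in Python, assuming that a bare `key=` in osmfilter means "this key is present with any value", which is how the followup uses `name=`. The `keep` function and the constant names are hypothetical, and the sketch only checks tags. It does not model how osmfilter handles the ways and nodes that kept relations reference.

```python
# Hypothetical restatement of:
#   name= and ( mountain_pass= or natural= or place= or waterway=
#     or boundary=administrative or boundary=national_park
#     or boundary=protected_area )

# Keys that qualify with any value (assumed meaning of "key=").
ANY_VALUE_KEYS = {"mountain_pass", "natural", "place", "waterway"}
# boundary=* values that qualify.
BOUNDARY_VALUES = {"administrative", "national_park", "protected_area"}


def keep(tags: dict[str, str]) -> bool:
    """Return True if an object with these tags passes the filter."""
    if "name" not in tags:  # unnamed objects are useless for placename lookup
        return False
    if any(key in tags for key in ANY_VALUE_KEYS):
        return True
    return tags.get("boundary") in BOUNDARY_VALUES


print(keep({"name": "Denver", "place": "city"}))
print(keep({"name": "Main Street", "highway": "residential"}))
```

Because `name=` is combined with `and`, a place or boundary without a name tag is dropped as well, which is reasonable for placename geocoding but worth keeping in mind.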