NOTICE: help.openstreetmap.org is no longer in use from 1st March 2024. Please use the OpenStreetMap Community Forum

Hi,

I want to know if there is a script to standardise OSM data. Some examples:

  • Address tags can be given to closed ways (with a building tag), to nodes, and to ways with an addr:interpolation tag. The street can be given as an addr:street tag, as an associatedStreet relation, or implied by the closest street available. Is there any tool to transform all of this into plain nodes with addr:housenumber, addr:street, addr:city ... tags?
  • Sometimes, when boundaries aren't available, is_in tags are used. Since we can't draw boundaries from is_in tags, can we put is_in tags on streets if we do have the boundaries? Or, if we have neither boundaries nor is_in tags, use the Nominatim approach and choose the closest city/village?
  • ...

This would result in an OSM file that isn't suited for editing anymore, but can more easily be used by other tools.

This should be possible, since it's only a fraction of the work that Nominatim does, but I wonder if there is also a separate tool or script to do it.
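The closest-place fallback from the second bullet could be sketched roughly as below. This is only an illustration under stated assumptions: the place list, the coordinates and the helper names (`haversine_km`, `closest_place`) are made up for the example, and it is a simplified stand-in, not Nominatim's actual algorithm.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def closest_place(lat, lon, places):
    """Return the name of the place node nearest to (lat, lon)."""
    nearest = min(places, key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]))
    return nearest["name"]

# Hypothetical sample data: two place nodes and one street without is_in.
places = [
    {"name": "Gent", "lat": 51.05, "lon": 3.72},
    {"name": "Brugge", "lat": 51.21, "lon": 3.22},
]
street = {"lat": 51.06, "lon": 3.70, "tags": {"highway": "residential"}}

# Only fill in is_in when the mapper has not supplied one already.
street["tags"].setdefault("is_in", closest_place(street["lat"], street["lon"], places))
```

Using `setdefault` keeps any is_in value that was mapped explicitly, so the guess is only a fallback.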

asked 14 Dec '10, 13:50

Sanderd17

I'm not entirely sure what you're asking in this question. When you say "OSM data", what specific kind of data do you mean? Addresses? Something else?

(15 Dec '10, 12:15) emacsen

Well,

I would love to have a general script: where all data is kept, as long as it's documented and in one format.

But if you could give me an example that does the things I mentioned, I think I would be able to adapt it to other generalisations as well.

What I want most is:

  • apply a tag to all objects in a closed way or a closed relation

  • place a node on the centre of a closed way or relation with tags that come from that way/relation

Of course in script form, so that it can be automated.

Thanks

(15 Dec '10, 18:43) Sanderd17
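The second request above (a node at the centre of a closed way, carrying that way's tags) might look something like this. It is a minimal sketch, assuming the way has already been resolved into a closed ring of (lon, lat) pairs and that a planar centroid is good enough at building scale; the dictionary layout and the names `centroid` and `way_to_node` are hypothetical.

```python
def centroid(ring):
    """Planar centroid of a closed ring [(lon, lat), ...] where ring[0] == ring[-1]."""
    ox, oy = ring[0]
    # Shift to a local origin so small polygons far from (0, 0) stay numerically stable.
    pts = [(x - ox, y - oy) for x, y in ring]
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        # Degenerate ring (no area): fall back to the average of the vertices.
        inner = pts[:-1]
        return (ox + sum(p[0] for p in inner) / len(inner),
                oy + sum(p[1] for p in inner) / len(inner))
    return (ox + cx / (3 * area2), oy + cy / (3 * area2))

def way_to_node(way):
    """Collapse a closed way into a node at its centroid, copying the tags."""
    lon, lat = centroid(way["ring"])
    return {"type": "node", "lat": lat, "lon": lon, "tags": dict(way["tags"])}

# Hypothetical building outline with address tags.
building = {
    "ring": [(3.700, 51.050), (3.702, 51.050), (3.702, 51.052),
             (3.700, 51.052), (3.700, 51.050)],
    "tags": {"building": "yes", "addr:housenumber": "12", "addr:street": "Example Street"},
}
node = way_to_node(building)
```

Note that the centroid of a concave outline can fall outside the outline itself, so a real script may want a "point on surface" style fallback for such shapes.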

In general, there is no current tool available to do what you want. What you are looking for is a script that, for everything in the world that can be represented in OpenStreetMap in two or more different ways, pick one of the representations and convert all the others.

If you are just interested in sorting out tagging, then I would recommend the TagTransform plugin for Osmosis. I use this plugin daily in my rendering toolchain, for example to take the multiple different ways of tagging a zebra crossing and consolidate them into one tag. This makes the stylesheets easier to deal with.

More advanced geometry manipulation is done in a variety of tools, most notably during import by osm2pgsql / Nominatim, but these are special-purpose transformations into a specific data format rather than general-purpose routines that output OpenStreetMap data.
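To make the idea of tag consolidation concrete, here is a small rule-based sketch in Python. It is not TagTransform's configuration format or API; the rule list, the `zebra=yes` variant and the function name `normalise_tags` are assumptions made up for illustration.

```python
# Each rule is (tags that must all match, tags to write instead).
# These variants are illustrative only, not an authoritative list of crossing tags.
RULES = [
    ({"crossing_ref": "zebra"}, {"crossing": "zebra"}),
    ({"zebra": "yes"}, {"crossing": "zebra"}),  # hypothetical variant
]

def normalise_tags(tags, rules=RULES):
    """Return a copy of tags with every matching variant rewritten to its canonical form."""
    out = dict(tags)
    for match, output in rules:
        if all(out.get(k) == v for k, v in match.items()):
            for k in match:
                del out[k]  # drop the variant spelling
            out.update(output)  # write the canonical tag
    return out

variant = {"highway": "crossing", "crossing_ref": "zebra"}
canonical = normalise_tags(variant)
```

Tags that match no rule pass through unchanged, so the same function can be run over every object in a file.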

There is nothing stopping such a utility from being created; it simply hasn't been done yet. The ultimate tool would let you pick between the representations depending on your needs - for example some people want is_in generated from polygons, but others prefer polygons generated from is_in tags.

answered 16 Dec '10, 13:42

Andy Allan

The tagtransform plugin comes close. The only disadvantage is that it can't change between relations, ways and nodes. It can only edit the tags.

As for osm2pgsql, I've read that it loses the information about whether two ways are connected, and I would not want to lose that data.

I agree on your vision about what the ultimate tool would be.

(16 Dec '10, 13:58) Sanderd17

You could probably do this sort of thing with PostGIS. You can use Osmosis or osm2pgsql to read from an .osm file, and output to a PostgreSQL database.

Then you can use PostGIS queries to find if an object is within a boundary polygon, or near a place, and add tags as appropriate. You can also collapse areas into points (e.g. make a POI out of a building or so).
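In PostGIS the containment test would be a spatial predicate such as ST_Contains. As a rough, database-free illustration of the same idea, the sketch below uses a plain ray-casting test in Python instead. The data layout, the sample boundary and the names `point_in_ring` and `tag_within` are assumptions for the example, and it ignores holes and multipolygons.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the closed ring [(lon, lat), ...]?"""
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through the point?
        if (y0 > lat) != (y1 > lat):
            x_cross = x0 + (lat - y0) * (x1 - x0) / (y1 - y0)
            if lon < x_cross:
                inside = not inside
    return inside

def tag_within(boundary, nodes, key="addr:city"):
    """Copy the boundary's name onto every node inside it that lacks the tag."""
    for n in nodes:
        if point_in_ring(n["lon"], n["lat"], boundary["ring"]):
            n["tags"].setdefault(key, boundary["tags"]["name"])
    return nodes

# Hypothetical boundary polygon and two address nodes.
boundary = {
    "ring": [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)],
    "tags": {"boundary": "administrative", "name": "Exampleville"},
}
nodes = [
    {"lon": 1, "lat": 1, "tags": {"addr:housenumber": "7"}},
    {"lon": 5, "lat": 1, "tags": {"addr:housenumber": "9"}},
]
tag_within(boundary, nodes)
```

For anything beyond a toy file, a spatial index (which PostGIS provides) matters far more than the test itself, since this loop checks every node against every boundary.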

Such queries could be scripted and automated if you want, though I don't know of any ready-made scripts to do the processing/standardisation that you describe. As you say, Nominatim does some of this processing, and its source code is available, which might be helpful.

answered 16 Dec '10, 04:32

Vclaw

edited 16 Dec '10, 15:17

Frederik Ramm ♦

I'm afraid I think the reason no one's answered this question is that, like me, they don't understand it.

I don't know what a script where all data is kept means. I thought you wanted address data, but now I'm just too confused, and I'm afraid others here are as well.

I'm going to take a stab at answering what I think your question is:

The first part seems to be "How can I programmatically access OSM data?"

If so, the answer is several fold.

OSM is a database of geographic information and we provide several mechanisms of storing that data. The most popular consumable format of OSM data is the OSM XML file format. Usually when you see a .osm file, that file is in the OSM XML format, which is described in detail at http://wiki.openstreetmap.org/wiki/.osm
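For a feel of what reading such a file involves, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The inline sample document and the function name `parse_osm` are made up for the example; relations and metadata attributes are left out, and a real extract would be far too large to load into memory this way.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in OSM XML form (hypothetical data).
SAMPLE = """<osm version="0.6" generator="example">
  <node id="1" lat="51.05" lon="3.72">
    <tag k="addr:housenumber" v="12"/>
    <tag k="addr:street" v="Example Street"/>
  </node>
  <node id="2" lat="51.06" lon="3.73"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""

def read_tags(element):
    """Collect the <tag k=... v=...> children of an element into a dict."""
    return {t.get("k"): t.get("v") for t in element.findall("tag")}

def parse_osm(xml_text):
    """Return (nodes, ways) dictionaries keyed by the element id string."""
    root = ET.fromstring(xml_text)
    nodes = {
        el.get("id"): {
            "lat": float(el.get("lat")),
            "lon": float(el.get("lon")),
            "tags": read_tags(el),
        }
        for el in root.findall("node")
    }
    ways = {
        el.get("id"): {
            "refs": [nd.get("ref") for nd in el.findall("nd")],
            "tags": read_tags(el),
        }
        for el in root.findall("way")
    }
    return nodes, ways

nodes, ways = parse_osm(SAMPLE)
```

Ways only store references to node ids, so any geometry work has to look the coordinates up in `nodes` first.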

The second part seems to be about programmatically modifying OSM features to (using your terminology) "standardise them".

The short answer is that we don't provide a mechanism to do this, and you shouldn't consider modifying OSM data in an automated way without having a deep understanding of the data you're modifying (and often not even then).

Let's take your example of modifying addresses, using the is_in to add street or town data.

Speaking just about the US, I know this wouldn't work. There are places where the city/town you might think corresponds to an address doesn't. There's a long and complex story around this having to do with the names used for towns historically being the name of the post office and not the city, but the bottom line is you don't know.

Similarly, you might assume that a building in proximity to a street is addressed to that street, but you can't know that.

And if you make an assumption like that and modify the data accordingly, you make it more difficult to find the error and correct it in the future.

Of course if you know the information for an area, you should modify it yourself, but this is not the same as carrying out an automated edit (i.e. running a bot). Bots are generally frowned upon because of their negative history with the project. More information on bots can be found here: http://wiki.openstreetmap.org/wiki/Bot

answered 16 Dec '10, 01:59

emacsen

The file shouldn't be uploaded back to OSM. It's just that, where less information is available, the script guesses what it is and places it in the file. That way, a tool which uses the file doesn't have to guess again. But more importantly: where the data is available, it is sometimes a node, sometimes a way and sometimes a relation. I would like to simplify it to nodes where possible.

I don't want to re-upload the file; I know this would do bad things to the OSM database, since the database is made for editing, not for use by tools.

(16 Dec '10, 09:49) Sanderd17
question asked: 14 Dec '10, 13:50

question was seen: 5,838 times

last updated: 16 Dec '10, 15:17