NOTICE: help.openstreetmap.org is no longer in use from 1st March 2024. Please use the OpenStreetMap Community Forum

There are many thousands of objects in the OSM source data that lie so close to another object of the same class that the situation is impossible in the real world. For illustration, see one example (of thousands) taken from the riverbank data layer in the latest dump: https://goo.gl/vpxQeQ. Of course, most solid data-preparation programs detect these anomalies, yet there is a dilemma: which one of those "close" objects to choose? That is my question. There are many options: keep the larger one, the one with more detail (nodes), the latest, both, and so on.
Any suggestions? Thanks.

asked 27 Sep '17, 14:53

sanser


The linked area in OSM, https://www.openstreetmap.org/#map=19/46.94866/-0.93791, shows a bunch of duplicate crap river imports in France.

So in this case: none of them. Look at the best sources you can find (survey the area yourself if you can), use the best imagery, and map it properly. It would also help to comment on the importing changesets, explaining the problems that occurred.

answered 27 Sep '17, 14:59

SomeoneElse ♦

In addition to what SomeoneElse has said: our goal is to deliver quality data curated by humans. OpenStreetMap is not a dump where everyone can throw in some public data sets and then users (like yourself) somehow have to pick which of three overlapping river outlines they want to use. What you see here is a badly executed import in an area that has too few active mappers to notice and repair the problem. The import is 6 years old; today, we have much stricter import guidelines that would hopefully avert such disasters. Data as bad as this is not "normal" for OSM, and must never become so.

answered 27 Sep '17, 15:43

Frederik Ramm ♦

Thanks to you both for the quick answers. Implicitly, there are two suggestions: one, ignore all replications, and two, don't do anything (wait for human corrections). In my opinion both are extremes and hardly acceptable options for vector-based OSM GIS and digital map makers. The first would result in too many river breaks and empty places (the distribution of the anomalies in river geometry is shown here https://goo.gl/CaHTgz, with over ten thousand replication markers). The second results in too many logical anomalies in rendering, like the replication from the link in the question and a huge number of missing or overwritten islands like here https://osm.org/go/9DnlbY, not to mention formal traps in GIS apps, data generalisation, and so on. By the way, I currently select the polygon with the most detail (node points) from the replicated (outer or inner) border polygons. But my question remains open: is this the right decision?
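The "keep the one with the most node points" rule mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a recommended pipeline: polygons are plain lists of (lon, lat) tuples, two polygons count as replications when their bounding boxes overlap strongly, and the names `bbox_iou`, `pick_most_detailed` and the `min_iou` threshold are hypothetical. A real tool would compare the actual geometries rather than bounding boxes.

```python
from typing import List, Tuple

Point = Tuple[float, float]   # (lon, lat)
Ring = List[Point]            # outer ring of one polygon
Box = Tuple[float, float, float, float]


def bbox(ring: Ring) -> Box:
    """Return (min_x, min_y, max_x, max_y) of a ring."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def bbox_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two bounding boxes (0.0 if disjoint)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def pick_most_detailed(rings: List[Ring], min_iou: float = 0.8) -> List[Ring]:
    """Keep one ring per group of near-duplicates: the one with most nodes.

    ASSUMPTION: "near-duplicate" means bounding-box IoU >= min_iou.
    """
    kept: List[Ring] = []
    for ring in rings:
        box = bbox(ring)
        for i, rep in enumerate(kept):
            if bbox_iou(box, bbox(rep)) >= min_iou:
                # Same group: keep whichever ring has more nodes.
                if len(ring) > len(rep):
                    kept[i] = ring
                break
        else:
            kept.append(ring)  # no similar ring seen yet
    return kept
```

Note that this only automates the choice; it cannot tell whether the most detailed outline is also the most accurate one.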

(30 Sep '17, 10:08) sanser
The third option, obviously, is: Help fix the problem! Sometimes individual imports can be identified that caused the problem, and the data from those imports can then simply be removed. If that is not the case, or proves unworkable since it would remove too many good data sets, then the second best option is to use a QA tool to help mappers fix the problems. Since you seem to have identified many problematic locations already, you could think about creating a "MapRoulette" task that would point mappers to the areas in question, and they can then be fixed one by one. Don't assume the OSM community has no interest in fixing the issue - they might simply not be aware of it. You can help!
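As a rough sketch of how a list of problem locations could be prepared for such a task, the snippet below turns marker coordinates into a GeoJSON FeatureCollection. It assumes the challenge is created from a GeoJSON upload; the function name `markers_to_geojson` and the `note` property are hypothetical, and the sample coordinate is simply the location from the link in the first answer.

```python
import json
from typing import Dict, List, Tuple

# One marker: (lon, lat, short description of the problem)
Marker = Tuple[float, float, str]


def markers_to_geojson(markers: List[Marker]) -> Dict:
    """Build a GeoJSON FeatureCollection with one Point per marker."""
    features = []
    for lon, lat, note in markers:
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"note": note},  # hypothetical property name
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = [(-0.93791, 46.94866, "overlapping riverbank outlines")]
    print(json.dumps(markers_to_geojson(sample), indent=2))
```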

(30 Sep '17, 10:33) Frederik Ramm ♦

Aha - I think I misunderstood the question that you're asking. From the perspective of a data consumer who doesn't get the option to remap, but only to pick and choose what they use, then yes, there will be occasions when you need to not use a certain class of data from a particular source.

For example, at one point I dropped leisure=common from a rendering I maintain because it was at the time frequently misused. I can think of another example where someone who needed an urban/rural divide used a non-OSM source on a mostly OSM map because getting that data from OSM was considerably more cumbersome.

(30 Sep '17, 10:38) SomeoneElse ♦
@sanser, I am the author of MapRoulette and am happy to help you set up a challenge to crowdsource the cleanup of these almost-duplicates. Send me an OSM message or email me at m@rtijn.org and we can take it from there if you are interested. An example of another MapRoulette 'cleanup' challenge that is currently running is here: http://maproulette.org/map/2774 -- cleaning up nonexistent schools from an old import in the United States. Thanks, Frederik, for pointing me to this discussion.

(30 Sep '17, 19:30) mvexel

question asked: 27 Sep '17, 14:53

question was seen: 2,081 times

last updated: 30 Sep '17, 19:30
