Note: this is a crosspost of https://gis.stackexchange.com/questions/284608/difference-between-osm-regional-extracts-unexpectedly-large

I want to create a difference (change file) between two Europe extracts of OSM data. The files I want to compare are:

Because the files are quite large, I first tried the same processing on smaller extracts:

The workflow I use is:

  • first convert xxx-180401.osm.pbf to xxx-ref.o5m, then create the diff:

osmconvert europe-180401.osm.pbf -o=eu-ref.o5m

osmconvert eu-ref.o5m europe-latest.osm.pbf --diff -o=eu-changes.o5c

Or:

osmconvert czech-republic-180401.osm.pbf -o=cz-ref.o5m

osmconvert cz-ref.o5m czech-republic-latest.osm.pbf --diff -o=cz-changes.o5c
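As the osmconvert wiki page cited later in this thread describes, `--diff` decides what changed by comparing object versions rather than object content. The Python sketch below is a simplified model of that idea, not osmconvert's actual code: each file is reduced to a hypothetical index mapping `(type, id)` to a version number, and objects are classified as created, modified or deleted. It also shows what happens when one side has no versions at all (`None`): every object present in both files is reported as modified, which would inflate the change file.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical index: (object type, id) -> version, or None if the file has no version
Index = Dict[Tuple[str, int], Optional[int]]


def version_diff(old: Index, new: Index) -> Dict[str, List[Tuple[str, int]]]:
    """Classify objects by comparing version numbers only (simplified model)."""
    created = sorted(k for k in new if k not in old)
    deleted = sorted(k for k in old if k not in new)
    # Any version mismatch counts as a modification; a missing version (None)
    # never equals a real version number, so such objects are always reported.
    modified = sorted(k for k in new if k in old and old[k] != new[k])
    return {"create": created, "modify": modified, "delete": deleted}


if __name__ == "__main__":
    old = {("node", 1): 3, ("way", 539372444): 1, ("node", 2): 5}
    new = {("node", 1): 4, ("way", 539372444): 1, ("node", 3): 1}
    print(version_diff(old, new))
    # Same data, but the reference file lacks versions entirely
    print(version_diff({k: None for k in old}, new))
```

With real extracts the indexes would have to be built by reading the files; the sketch only illustrates the classification step.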

The process works fine with the Czech Republic data. The resulting o5c file looks reasonable; its size is ~30 MB (the Czech Republic input file is ~700 MB).

I get strange results with the Europe extract. The resulting o5c file is ~40 GB, while the Europe input is ~20 GB. When inspecting the file, I found many objects that were not changed in OSM at all this year, such as way 539372444.
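To see how much of a change file consists of supposed modifications, the `.o5c` output can be converted to OsmChange XML (assumed here to work with `osmconvert eu-changes.o5c -o=eu-changes.osc`) and the objects counted per action. This is a minimal Python sketch; the function name is hypothetical. It streams the XML and clears each object after counting to keep memory use down, but it is meant for inspecting samples rather than as a tuned tool for a 40 GB file.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_osc(source) -> Counter:
    """Count OsmChange objects per (action, object type), e.g. ("modify", "way")."""
    counts: Counter = Counter()
    action = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag in ("create", "modify", "delete"):
            action = elem.tag  # remember which section we are in
        elif event == "end" and elem.tag in ("node", "way", "relation") and action:
            counts[(action, elem.tag)] += 1
            elem.clear()  # drop child elements (tags, nd, member) once counted
    return counts


if __name__ == "__main__":
    import sys
    for (action, kind), n in sorted(summarize_osc(sys.argv[1]).items()):
        print(f"{action:7} {kind:9} {n}")
```

A "modify" section dominated by objects that were not edited in the period would be consistent with the two files disagreeing on version numbers rather than on data.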

I have also tried the comparison using Osmium instead of Osmconvert, but the result was the same: the change file was huge.

Am I doing something wrong, or are the Europe extracts unsuitable for the comparison for some reason?

asked 30 May '18, 14:34

Ondrej Spanel

Have you downloaded the 20180401 file for Europe in April, or in May?

(30 May '18, 14:50) Frederik Ramm ♦

In May, just a few days ago, same as the Czech Republic extract.

(30 May '18, 14:59) Ondrej Spanel

You've likely become a casualty of GDPR-related changes on the Geofabrik download server, where we've removed user information from download files. It is possible that the two files you are comparing use different methods of removing user data (one has NO user data, the other has fake user data with uid=0), and this confuses the program that computes the diffs.

You could either try removing the user, uid and changeset fields from both files before you compare, or you could download the old-style, complete files from osm-internal.download.geofabrik.de.
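The first suggestion, removing the user, uid and changeset fields from both files so that their metadata matches, can be illustrated on OSM XML with Python's standard library. This is a minimal sketch assuming a small XML sample (for example a region exported to `.osm` with osmconvert); the function name is hypothetical. It deliberately keeps the `version` attribute, which the version-based diff relies on. For multi-gigabyte Europe files a streaming tool would be the practical route; osmconvert's `--drop-author` option is one candidate, but check its help output before relying on it.

```python
import xml.etree.ElementTree as ET

# Metadata fields suggested for removal; "version" is intentionally not listed
AUTHOR_ATTRS = ("user", "uid", "changeset")


def strip_author_fields(osm_xml: str) -> str:
    """Return the OSM XML with user, uid and changeset removed from all objects."""
    root = ET.fromstring(osm_xml)
    for elem in root:
        if elem.tag in ("node", "way", "relation"):
            for attr in AUTHOR_ATTRS:
                elem.attrib.pop(attr, None)  # ignore objects lacking the field
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = ('<osm version="0.6"><node id="1" version="2" user="x" uid="0" '
              'changeset="5" lat="1" lon="2"/></osm>')
    print(strip_author_fields(sample))
```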

answered 30 May '18, 15:02

Frederik Ramm ♦

Given that the osmconvert diff does not use content, only version numbers, how can metadata removal affect this? (Cf. https://wiki.openstreetmap.org/wiki/Osmconvert#Retrieving_the_Differences_between_two_OSM_Files)

(30 May '18, 15:15) Ondrej Spanel

Inspecting the result of a conversion to OSM XML answers this immediately: europe-180401 is completely missing the version attribute. If this is the result of the GDPR data removal, I think it is an oversight, as the version number does not seem like personal data to me.

(30 May '18, 15:27) Ondrej Spanel
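The check described in the comment above can also be scripted instead of done by eye: stream the OSM XML and count objects that lack a `version` attribute. A minimal Python sketch, assuming the file has already been converted to OSM XML (e.g. `osmconvert europe-180401.osm.pbf -o=eu-ref.osm`; the output name is only an example) and using a hypothetical function name:

```python
import xml.etree.ElementTree as ET


def count_missing_versions(source) -> tuple:
    """Return (objects seen, objects without a version attribute)."""
    total = missing = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in ("node", "way", "relation"):
            total += 1
            if "version" not in elem.attrib:
                missing += 1
            elem.clear()  # release child elements; helps on large files
    return total, missing


if __name__ == "__main__":
    import sys
    seen, without = count_missing_versions(sys.argv[1])
    print(f"{without} of {seen} objects have no version attribute")
```

A file where nearly every object lacks a version would match what was observed for europe-180401; a healthy file should report zero.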

Note: it seems only historical data is affected by this (files like europe-180401.osm.pbf). The file europe-latest.osm.pbf seems fine. I will try the historical data from https://osm-internal.download.geofabrik.de/europe.html# and report the result here.

(30 May '18, 15:35) Ondrej Spanel

I can confirm that data downloaded from this location contains version numbers and works fine for my purpose.

(31 May '18, 09:41) Ondrej Spanel

http://download.geofabrik.de/europe-180401.osm.pbf (MD5: 7cd103991af26a5299ccf8dd9577171f) definitely contains version numbers, I just checked.

(05 Jun '18, 15:09) Frederik Ramm ♦

question asked: 30 May '18, 14:34

question was seen: 663 times

last updated: 05 Jun '18, 15:09