I need to extract all addresses in NYC together with their coordinates.

I am using osmconvert and osmfilter.

I downloaded the New York .pbf extract from Geofabrik.

After some initial filtering, I ran this to keep only buildings:

osmfilter.exe 1_to-node.o5m --keep="building AND addr*" --drop-author --drop-version -o=2.building-addr.o5m

Then I ran this to produce the output:

osmconvert.exe 2.building-addr.o5m -o=3.addr.csv --csv-headline --csv-separator=; --csv="@id addr:street addr:housenumber @lat @lon"

The problem is that the @id values are wrong. Some addresses come out with a normal ID, but the rest have IDs in the 1000000000000000 range. What can I do?

asked 21 Aug, 13:19

Skrypch


The convert-to-nodes feature of osmconvert adds a (configurable) offset to the IDs of ways, as described in the documentation.
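To illustrate, here is a minimal Python sketch of how the offset IDs in the CSV could be mapped back to the original object type and ID. It assumes the default offsets are 10^15 for ways and 2×10^15 for relations, which should be checked against the documentation of your osmconvert version. The names `split_offset_id` and `read_rows` are made up for this example, and the second sample row is hypothetical.

```python
import csv
import io

# Assumed default offsets; verify them for your osmconvert version.
WAY_OFFSET = 10**15
RELATION_OFFSET = 2 * 10**15


def split_offset_id(csv_id):
    """Return (object_type, original_id) for an ID taken from the CSV."""
    if csv_id >= RELATION_OFFSET:
        return "relation", csv_id - RELATION_OFFSET
    if csv_id >= WAY_OFFSET:
        return "way", csv_id - WAY_OFFSET
    return "node", csv_id


def read_rows(csv_text):
    """Parse semicolon-separated CSV text that starts with a headline row."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    for row in reader:
        obj_type, obj_id = split_offset_id(int(row["@id"]))
        yield obj_type, obj_id, row["addr:street"], row["addr:housenumber"]


sample = (
    "@id;addr:street;addr:housenumber;@lat;@lon\n"
    "2322155479;10th Avenue;192;40.7467964;-74.0048356\n"
    # Hypothetical row for a way that was converted to a node:
    "1000000000012345;West 26th Street;500;40.7500000;-74.0000000\n"
)
rows = list(read_rows(sample))
for obj_type, obj_id, street, number in rows:
    print(obj_type, obj_id, number, street)
```

Keeping the offset and subtracting it afterwards preserves the information about which rows came from ways and which from real nodes.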

answered 21 Aug, 15:15

SK53 ♦

Thanks a lot. Could you please explain how to make the IDs count from 0? Would --object-type-offset=0 do it?

(21 Aug, 15:20) Skrypch

I also don't understand why some of the coordinates are wrong. I was aiming to get the coordinates of all buildings in NYC, but I got, for example: { "@id;addr:street;addr:housenumber;@lat;@lon": "2322155479;10th Avenue;192;40.7467964;-74.0048356" },

These coordinates point to the road, not to a building.

It's also rather strange that NYC has more than 1 million buildings, but my list has only 700k+.

(21 Aug, 15:24) Skrypch

That node (2322155479) is mapped as being out in the road, so the coordinates you have are correct (insofar as they match what's in the database).

As for not getting all of the buildings, you're only filtering for buildings that have an address tagged on them. There are probably lots of buildings that don't have an address tagged at all or have the contained addresses tagged separately on nodes within the building area. You can see examples of this on West 26th between 9th and 10th, where the addresses are on separate nodes (example node).

(21 Aug, 16:57) alester

I see. Thanks, I was thinking along the same lines. So it means OSM still has a lot of mistakes to deal with. However, could you please explain your example more clearly? I didn't quite get it. I understand that some buildings don't have an address tagged (which is a pity), but I don't understand what "have the contained addresses tagged separately on nodes within the building area" means.

It may be worth explaining my idea: every few hours I will scrape data from rental websites and get ~40k addresses of NYC buildings (actually, I will only take the coordinates). Since I use OSM polygons for NYC buildings, I need to check whether these coordinates fall within any of those polygons. I decided that running this check with a function in Python or another language every time would make the work harder, so I realised I need a database of all NYC building polygons and of the coordinates of all those polygons. Once I have checked which coordinate is inside which polygon and saved this data into a database (MySQL), it will be easier to handle new coordinates after scraping, because I will only need to check whether a coordinate is already in my database and which polygon it corresponds to.
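For reference, the point-in-polygon check described above can be sketched in plain Python with the ray casting technique. This is only an illustration under simplifying assumptions: longitude and latitude are treated as planar coordinates (reasonable at building scale), holes and multipolygons are ignored, and points lying exactly on an edge may fall on either side. The function names and the rectangle used as a building outline are hypothetical.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge j -> i straddle the horizontal line through the point?
        if (yi > lat) != (yj > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def find_building(lon, lat, buildings):
    """Return the ID of the first polygon containing the point, else None."""
    for building_id, ring in buildings.items():
        if point_in_polygon(lon, lat, ring):
            return building_id
    return None


# Hypothetical building outline (a plain rectangle), keyed by a made-up ID.
buildings = {
    "way/1": [
        (-74.006, 40.746),
        (-74.004, 40.746),
        (-74.004, 40.747),
        (-74.006, 40.747),
    ],
}

print(find_building(-74.005, 40.7465, buildings))
print(find_building(-74.003, 40.7465, buildings))
```

A linear scan over a million polygons for each of 40k points would be slow, so in practice a bounding-box pre-check or a spatial index in the database would probably be needed in front of the exact test.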

(21 Aug, 17:21) Skrypch

I hope that's clear. So the more coordinates of NYC buildings I have, the better. That's why I also need to solve the ID problem.

(21 Aug, 17:21) Skrypch
Explaining the example, consider the case where one building may have multiple addresses, such as when there are multiple businesses in it. In such a case, the address tags would typically be on separate nodes for each of the businesses, not on the building object itself. If you look at the example node I linked to, you can see that it only has address information and no building tag. Therefore, your filter wouldn't find it. The same can be seen with the "Avenues" building across 10th, which isn't tagged with an address but has three nodes with addresses located within it.
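As a rough illustration of these cases, the following Python sketch sorts objects into the categories described above, based only on their tags. It mimics the intent of the filter (a building tag plus at least one address tag), not osmfilter's exact matching rules, and the tag sets are hypothetical examples rather than real OSM objects.

```python
def has_address(tags):
    """True if any tag key starts with 'addr:' (for example addr:street)."""
    return any(key.startswith("addr:") for key in tags)


def classify(tags):
    """Sort an object into one of the cases discussed in this thread."""
    is_building = "building" in tags
    if is_building and has_address(tags):
        return "building with address"  # kept by a building-and-address filter
    if is_building:
        return "building without address"  # dropped by that filter
    if has_address(tags):
        return "address-only node"  # dropped too: it has no building tag
    return "other"


# Hypothetical tag sets for illustration only.
examples = [
    {"building": "yes", "addr:street": "10th Avenue", "addr:housenumber": "192"},
    {"building": "school", "name": "Avenues"},
    {"addr:street": "West 26th Street", "addr:housenumber": "500"},
    {"highway": "residential"},
]
for tags in examples:
    print(classify(tags))
```

Only the first category survives a filter that requires both tags, which is one reason the output can contain fewer rows than there are buildings.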

As for your goal, I don't completely understand what it is you're trying to do. If you want to get data for all of the buildings, you could just filter for that tag and ignore addresses. If you also need address information for all of the buildings, it becomes much more complicated due to the cases I described, as well as alternate addressing schemes, mis-tagging, etc.

(21 Aug, 18:15) alester

Thanks for the great explanation! I guess I will need to get these "multiple addresses" too, because I may end up in a situation where I want to check whether a specific address is within a specific building (polygon), but my database doesn't contain the polygons with the addresses I need.

(22 Aug, 11:40) Skrypch
