This is a static archive of our old OpenStreetMap Help Site. Please post any new questions and answers at community.osm.org.

How to query overpass to find name tags with Chinese characters?

I was recently looking at the Indian state of Arunachal Pradesh and noticed that many rivers had been given Chinese names. Soon I found villages and lakes with Chinese names too.

This state has been administered by India since India's independence, but it is claimed by China as part of Tibet, and therefore as part of China. I suspect that this claim is related to the fact that some Chinese names have ended up on objects here.

I have been working on moving Chinese names to the name:zh tag, but it's hard to find everything. I would like to construct an Overpass query that returns only objects whose name tags contain characters in Unicode's Chinese range (or the CJK range, as this would be accurate enough).

This is way over my head, as I know next to nothing about Overpass queries or character encoding. Can someone lend a hand?

overpass unicode name chinese query

asked 12 Jul '17, 08:32

keithonearth

edited 13 Jul '17, 08:19

I don't see anything like that in the Overpass documentation. Could you just download objects with name tags and do the analysis yourself offline?

(12 Jul '17, 15:16) neuhausr

Try to download all names, then sort them. This should lead to all Chinese names being next to each other. Maybe you can use the CSV output to generate a list with name,type,OSM ID or something similar.

(12 Jul '17, 15:36) scai ♦

I guess you could find some of them with a regular expression that tests for some common characters in names of villages.

(12 Jul '17, 16:59) escada
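The offline approach suggested in the comments above could look roughly like the sketch below. It assumes the names were exported from Overpass as tab-separated CSV with a `name` column; the `@type` and `@id` column headers, the sample rows, and the helper names `has_cjk` and `rows_with_cjk_names` are illustrative assumptions, not something given in this thread. The character class covers only the main CJK Unified Ideographs blocks, so it is an approximation of "Chinese characters" rather than the complete Han script.

```python
import csv
import io
import re

# Approximate "Chinese character" test: CJK Unified Ideographs Extension A,
# the main CJK Unified Ideographs block, and CJK Compatibility Ideographs.
# Rarer extension blocks are deliberately left out of this sketch.
CJK_PATTERN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff]")


def has_cjk(text):
    """Return True if the text contains at least one character in the ranges above."""
    return bool(CJK_PATTERN.search(text))


def rows_with_cjk_names(csv_text, delimiter="\t"):
    """Parse CSV text with a header row and keep rows whose 'name' column has CJK characters."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [row for row in reader if has_cjk(row.get("name") or "")]


# Hypothetical sample data standing in for a real export (column headers assumed).
sample = (
    "@type\t@id\tname\n"
    "way\t1\tSiang River\n"
    "way\t2\t雅鲁藏布江\n"
    "node\t3\tTawang 达旺\n"
)

matches = rows_with_cjk_names(sample)
for row in matches:
    print(row["@type"], row["@id"], row["name"])
```

Each matching row keeps its object type and ID, so the objects can then be opened in an editor and the name moved to name:zh by hand. Names that mix Latin and Chinese text are matched as well, since a single CJK character is enough.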

2 Answers:

One of the Overpass-API developers runs a server with prototype support for ICU character ranges in regex:

https://www.openstreetmap.org/user/mmd/diary/40197

This makes the query straightforward:

http://overpass-turbo.eu/s/qlv

answered 12 Jul '17, 18:00