The protocol buffer specified for StringTable states that it stores repeated bytes.
To the best of my knowledge this could easily be stated as
As ProtocolBuffer already defines string to be a utf-8 or equivalent ASCII subset. In its current state I don't see any definition on how strings should be encoded (in PBF), and that is very bad. asked 26 Jul '13, 06:32 he_the_great |
From the Protobuf documentation it looks like "bytes" and "string" are treated almost the same everywhere. Internally they seem to be the same, only when setting or getting the data, there might be differences depending on the language used, because some languages have special types for UTF-8 strings. Using Protobuf from C++ there is no difference between these two types. I don't know what the original reason was, maybe to optimize away any UTF-8 validity check that the library might do internally. But I don't really see any difficulties. All strings in OSM are always UTF-8, so that's what those are, too. If that's not documented it should be. answered 28 Jul '13, 08:01 Jochen Topf |
This is a very technical question and you are more likely to hear answers to this on the dev list (lists.openstreetmap.org/listinfo/dev).