This is a static archive of our old OpenStreetMap Help Site. Please post any new questions and answers at community.osm.org.

Why is the PBF StringTable defined to use bytes?


The protocol buffer definition of StringTable declares its entries as repeated bytes:

message StringTable {
   repeated bytes s = 1;
}

To the best of my knowledge, this could easily be stated as:

message StringTable {
   repeated string s = 1;
}

Protocol Buffers already defines string as text that must be UTF-8 encoded (or 7-bit ASCII, which is a subset of UTF-8). As it stands, I can't find any definition of how strings should be encoded in PBF, and that is very bad.

asked 26 Jul '13, 06:32


he_the_great

edited 26 Jul '13, 06:33


This is a very technical question, and you are more likely to get answers to it on the dev list (lists.openstreetmap.org/listinfo/dev).

(26 Jul '13, 08:49) Frederik Ramm ♦

One Answer:


From the Protobuf documentation, it looks like "bytes" and "string" are treated almost the same everywhere. Internally they seem to be identical; differences only show up when setting or getting the data, depending on the language used, because some languages have dedicated types for UTF-8 strings. When using Protobuf from C++, there is no difference between the two types.
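To illustrate the point about the two types being the same internally, here is a minimal Python sketch that encodes one field by hand, without the Protobuf library. It assumes the standard Protobuf wire format, in which both "bytes" and "string" fields are written as length-delimited fields (wire type 2). The helper names `encode_varint` and `encode_length_delimited` are made up for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_length_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field (wire type 2).

    The tag carries only the field number and the wire type, so nothing
    in the output says whether the schema declared `bytes` or `string`.
    """
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# One StringTable entry (field `s = 1`) holding UTF-8 text.
entry = encode_length_delimited(1, "Straße".encode("utf-8"))
print(entry.hex())
```

A reader that sees these bytes cannot tell which of the two schema declarations the writer used; only the generated accessor code differs.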

I don't know what the original reason was; maybe it was to avoid any UTF-8 validity check that the library might do internally. But I don't really see any difficulty. All strings in OSM are always UTF-8, so the StringTable entries are UTF-8 as well. If that's not documented, it should be.
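Under that assumption (every entry is UTF-8), a reader can simply decode each entry strictly and reject anything invalid. The following is a rough sketch, not how any particular PBF library does it: it parses a serialized StringTable message by hand, and the function names `read_varint` and `decode_string_table` are hypothetical.

```python
from typing import List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read a base-128 varint at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_string_table(buf: bytes) -> List[str]:
    """Decode a serialized StringTable, treating every entry as UTF-8."""
    strings = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        if tag != 0x0A:  # field 1, wire type 2 (length-delimited)
            raise ValueError("unexpected tag %#x" % tag)
        length, pos = read_varint(buf, pos)
        raw = buf[pos:pos + length]
        if len(raw) != length:
            raise ValueError("truncated entry")
        pos += length
        # Strict decoding: raises UnicodeDecodeError on invalid UTF-8.
        strings.append(raw.decode("utf-8"))
    return strings


# Two entries: an empty string and "Straße".
table = bytes.fromhex("0a00" + "0a0753747261c39f65")
print(decode_string_table(table))
```

Doing the check at the point of decoding means the writer pays nothing for validation, which fits the guess above about why "bytes" may have been chosen.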

answered 28 Jul '13, 08:01


Jochen Topf
