I am looking for a tool (or tools) that can find profane language in tag values within a specific country or region. I'd also like to find cases where a tag value contains a Unicode character that is not expected in that particular part of the world. Does such a tool exist?

asked 03 Nov '21, 22:11

RobJN

Please refrain from fixing your findings in OSM via automated edits (https://wiki.openstreetmap.org/wiki/Automated_Edits_code_of_conduct).

(04 Nov '21, 12:43) scai ♦
Thanks, but I have no intention of making edits. I just want to understand what is in, and going into, OSM.

(04 Nov '21, 13:35) RobJN

Not sure that it is altogether wise to try and solve the Scunthorpe problem. The context of OSM is likely to produce a very high false-positive rate.

However, an obvious approach would be a tool chain using osmium-tool => OPL format => standard UNIX pattern-matching tools (grep, sed etc.). Osmium can provide bbox filtering and transformation. The one-object-per-line format allows just the tag keys and values to be examined (e.g., by pulling out that field with awk or perl). The UNIX tools provide a wide range of means of matching regular expressions (for instance, tr could be used to identify specific Unicode characters if that was a particular issue).
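If you would rather do the last stage in a script than with grep/awk, here is a rough Python sketch of the same idea, reading OPL lines as produced by osmium-tool. Treat it as an illustration under assumptions, not a finished tool: the function names are mine, I am assuming the tags sit in the space-separated field starting with `T` as comma-separated `key=value` pairs with special characters escaped as `%hex%`, and the "expected script" check is only approximated by looking at the start of each character's Unicode name. The blocklist is a placeholder you would have to supply yourself.

```python
import re
import unicodedata

# Assumption: OPL escapes special characters as %<hex code point>%.
_ESCAPE = re.compile(r"%([0-9a-fA-F]+)%")


def opl_unescape(text):
    """Turn %hex% escapes back into the characters they stand for."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def parse_tags(opl_line):
    """Return the tags of one OPL line as a dict (empty if there are none)."""
    for field in opl_line.rstrip("\n").split(" "):
        # Assumption: the tags field is the one starting with an upper-case T.
        if field.startswith("T") and len(field) > 1:
            tags = {}
            for pair in field[1:].split(","):
                key, _, value = pair.partition("=")
                tags[opl_unescape(key)] = opl_unescape(value)
            return tags
    return {}


def unexpected_chars(value, allowed_prefixes=("LATIN",)):
    """List letters whose Unicode name does not start with an allowed prefix.

    This is a crude stand-in for a real script lookup: digits, spaces and
    punctuation are ignored, and only alphabetic characters are checked.
    """
    found = []
    for ch in value:
        if not ch.isalpha():
            continue
        if not unicodedata.name(ch, "").startswith(allowed_prefixes):
            found.append(ch)
    return found


def blocklisted_words(value, blocklist):
    """Return whole words from value that appear in blocklist.

    Matching whole words rather than substrings avoids the classic
    Scunthorpe false positive, at the cost of missing embedded words.
    """
    words = re.findall(r"\w+", value.casefold())
    return [w for w in words if w in blocklist]
```

You would loop over the OPL file line by line, call `parse_tags`, and run the two checks on each value, choosing `allowed_prefixes` per region (e.g. `("LATIN",)` for the UK). Expect to review the hits by hand; as noted above, the false-positive rate is likely to be high.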

answered 04 Nov '21, 09:26

SK53 ♦

question asked: 03 Nov '21, 22:11

question was seen: 442 times

last updated: 04 Nov '21, 13:35