# Linguistic Hacking with Ruby

Unicode is a revolution in the history of human language. No, really! For the first time in history, Unicode has made it possible, and increasingly easy, for just about anyone with a computer to produce texts in any language, and in any combination of languages. Stop and think about that for a second, about what it means:

* some
* examples
* here

And then there's the internet. Unicode has become the standard way to represent text on the internet, in a wide variety of human languages. I could try to give you a bunch of numbers to convince you that there is ever more content in ever more languages appearing on the internet, but I doubt you need convincing of that. It's obvious, just a part of the digital landscape.

What might not be obvious is that all this new text hasn't just empowered communication; it is also enabling a new variety of linguistics.

And that's what this book is about. Doing linguistics with computers. It's a hands-on, down-and-dirty dive into the brave new worlds of math, language, and computing.

Some things you'll learn:

* How to identify what language a file is in automatically
* How to build bilingual dictionaries automatically
* All about writing systems
* And more.

The language we'll be using is Ruby. This might surprise some hackers, as Ruby doesn't have a reputation for handling Unicode very well. With the appearance of Ruby version 1.9, however,