What makes a good brownie? There are three essentials:
- Good ingredients
- A good recipe
- An experienced chef
That’s surprisingly similar to building a machine learning system. What you need is good, clean input data, algorithms suited to the output you want, and a system that keeps learning. Let us see how we tackle each of these at Bewgle.
By far the biggest win we have found comes from cleaning the input data. User-generated data is messy, with lots of grammatical and spelling mistakes and unclear prose, for example a mixture of two languages. Bewgle has learnt some of the rules to clean this data. Clean data leads to great results (this is analogous to the better-known Garbage In, Garbage Out principle 🙂).
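To make the idea of rule-based cleaning concrete, here is a minimal sketch in Python. It is not Bewgle's actual pipeline: the `clean_review` function, the `SHORTHAND` table and the specific rules are hypothetical, chosen only to illustrate the kind of noise that shows up in reviews (links, stretched-out words, chat shorthand, stray whitespace).

```python
import re

# Hypothetical shorthand table, for illustration only; a real system
# would curate or learn a much larger one.
SHORTHAND = {"gud": "good", "pls": "please", "thx": "thanks"}


def clean_review(text: str) -> str:
    """Apply a few simple, assumed cleaning rules to one review."""
    # Rule 1: drop URLs, which carry no opinion content.
    text = re.sub(r"https?://\S+", " ", text)
    # Rule 2: squeeze any character repeated 3+ times down to 2
    # ("sooooo" becomes "soo", "!!!!" becomes "!!").
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    # Rule 3: expand known shorthand, matching whole alphabetic words
    # so that trailing punctuation is preserved.
    text = re.sub(
        r"[A-Za-z]+",
        lambda m: SHORTHAND.get(m.group().lower(), m.group()),
        text,
    )
    # Rule 4: collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


print(clean_review("Camera is sooooo gud, pls buy!!!! http://example.com/p"))
```

Rules like these are cheap to write and easy to inspect, which is why they make a reasonable first pass before any heavier language processing. Harder cases such as mixed-language text need more than regular expressions and are left out of this sketch.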
While there are many algorithms to analyze natural language data, at Bewgle we have chosen some of the most relevant ones, which work very well specifically with user-generated content. Generic algorithms are frequently tested on newspaper articles or Wikipedia content, but how many people write that kind of English in reviews?
Finally, at Bewgle, we continuously tweak our algorithms and our approach to ensure we keep improving every single week. Quite simply, we strive to beat our own benchmarks.
The result is that Bewgle works very effectively with user-generated content like reviews. This focus lets us iterate quickly and helps our customers make the most sense of the user-generated data that they have.