Founded in 2003, Taobao started as a little-known e-commerce platform for trading second-hand items by auction. Today, the website sells an enormous variety of products, with hundreds of millions of registered users and total sales exceeding RMB 1 trillion. You may well be a loyal Taobao member yourself. But have you ever noticed that, while you are making purchases, the system is running a series of statistical analyses on you behind the scenes?
A basic approach starts from the product dimension: for example, inferring your literacy level from your book purchasing records and recommending books that may suit your interests, or inferring the brand and year of your motorcycle from your parts purchasing records and providing relevant maintenance information. In this approach, similar products are recommended to users based on their own purchasing records. Sometimes, however, users have already bought the recommended products, and what they need instead are recommendations of items bought by customers similar to themselves, that is, recommendations from the personal dimension. Once a sizable volume of customer purchasing records has been collected, statistical analysis can be used to model the purchasing behaviour and habits of different consumer groups and to identify users' characteristics through their preferred media and websites, which makes it possible to explore their potential demands.
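To make the personal dimension concrete, here is a minimal sketch in Python. It is only an illustration, not a description of how Taobao actually works: the customer names, the purchase records, and the choice of Jaccard similarity (the share of items two customers have in common) are all assumptions made for the example. The idea is to score each item a user has not yet bought by how similar its buyers are to that user.

```python
# Hypothetical purchase records (illustrative data only): customer -> items bought.
purchases = {
    "alice": {"novel", "cookbook", "helmet"},
    "bob": {"novel", "cookbook", "atlas"},
    "carol": {"helmet", "brake pads", "engine oil"},
    "dave": {"novel", "atlas", "dictionary"},
}


def jaccard(a, b):
    """Similarity of two customers: shared items divided by all items either bought."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, purchases, top_n=3):
    """Rank items the user has not bought by the similarity of the customers who bought them."""
    own = purchases[user]
    scores = {}
    for other, items in purchases.items():
        if other == user:
            continue
        sim = jaccard(own, items)
        if sim == 0:
            continue  # customers with nothing in common contribute no evidence
        for item in items - own:  # skip items the user already owns
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties are broken alphabetically so the result is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("alice", purchases, top_n=1))
```

In this toy data set, the top suggestion for "alice" is "atlas", because the two customers who bought it both share purchases with her. Items she already owns are never suggested, which is exactly the weakness of the product dimension that this approach avoids.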
Jack Ma once said, “The future of the Internet is the era of big data.” It is no exaggeration to say that the application of big data has been subtly changing the way business decisions are made. Big data is, in essence, a massive amount of data collected through information engineering and similar techniques. On its own, it has no significant impact; its value is realized only when useful information is extracted from it through statistical approaches. In the previous example, the buying behaviour of different consumer groups is extracted from a huge volume of customer purchasing records.