Extracting User Profiles from Large Scale Data
In this paper, the authors present the details of a large-scale user profiling framework that they developed at IBM on top of Apache Hadoop. They address the problem of extracting and maintaining a very large number of user profiles from large-scale data. In this context, a user profile is often used to classify a given user into predefined user segments (e.g., by demographics or tastes) or to capture the user's online behavior, including the user's private interests and preferences. A user profile can be defined explicitly by the user herself, e.g., during registration for some service. User profiling, by contrast, is usually defined as the process of implicitly learning a user profile from data associated with the user.