Recommendation Engine and its Working in Machine Learning
In this article, I am going to discuss the Recommendation Engine and its Working in Machine Learning with Examples. Please read our previous article where we discussed Association Rules and its Use Cases in Machine Learning with Examples.
Recommendation Engine in Machine Learning
Recommendation engines are now used by businesses of all sizes, so let’s look at how they work.
Recommender systems are programs that suggest items to users based on signals such as past behavior and stated preferences. They predict which products a customer is most likely to be interested in and to buy.
Netflix, Amazon, and other companies use recommender systems to help their users find the right product or movie. A recommender system works with a vast amount of data, filtering it down to the most relevant information based on data provided by the user and on other signals such as the user’s preferences and interests.
To make recommendations, it estimates how well a user and an item match, using the similarities between users and between items. Such systems benefit both the users, who find relevant items faster and can make better-informed choices, and the services that deploy them.
The algorithm can recommend a wide range of items, including movies, books, news, articles, jobs, and adverts. No recommendation is guaranteed to be right for a given user, because the system only estimates that user’s preferences.
It has a wide range of applications. Some of them are –
- Personalized Content: Enhances the on-site experience by generating dynamic recommendations for various audiences, similar to what Netflix does.
- Improved Product Search: Helps categorize products based on their features, such as material or season.
There are three main types of recommendation systems –
- Content-Based Filtering
- Collaborative Filtering
- Hybrid Recommendation System
Content-Based Filtering –
This form of recommendation system suggests items based on the content of items the user has previously searched for or liked. Here, “content” means the attributes or tags of the products the user likes. Items are labeled with keywords, the system infers what the user wants by matching those keywords against its database, and it then recommends other products with similar attributes.
Take, for example, a movie recommendation system in which each film is assigned a genre, which plays the role of the tag or attribute mentioned above. Suppose a new user A arrives and the system has no information about them. At first, the system either proposes popular movies or gathers information by asking the user to fill out a form. After some time, the user will have rated some films, for example giving high ratings to action films and low ratings to animation films.
As a result, the algorithm suggests action movies to this user. However, it cannot conclude that the user dislikes animation movies: the user may have disliked a particular film for another reason, such as the acting or the story, and may genuinely enjoy animation. The system needs more information before drawing that conclusion.
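The idea above can be sketched in a few lines. This is only a minimal illustration, not the method used later in this article: the movie names, the genre tags, the neutral rating of 3, and the helper names build_user_profile and recommend are all assumptions made up for the example. It builds a taste profile from a user’s ratings and scores unrated movies against it.

```python
import numpy as np

# Hypothetical toy catalogue: each row is a movie, each column a genre tag.
GENRES = ["action", "animation", "comedy"]
MOVIES = ["Movie A", "Movie B", "Movie C", "Movie D"]
ITEM_TAGS = np.array([
    [1, 0, 0],  # Movie A: action
    [0, 1, 0],  # Movie B: animation
    [1, 0, 1],  # Movie C: action + comedy
    [0, 1, 1],  # Movie D: animation + comedy
], dtype=float)


def build_user_profile(ratings, item_tags, neutral=3.0):
    """Average the tag vectors of rated movies, weighted by (rating - neutral)."""
    profile = np.zeros(item_tags.shape[1])
    for item_idx, rating in ratings.items():
        profile += (rating - neutral) * item_tags[item_idx]
    return profile / max(len(ratings), 1)


def recommend(ratings, item_tags, top_n=2):
    """Rank unrated movies by the dot product of their tags with the profile."""
    profile = build_user_profile(ratings, item_tags)
    scores = item_tags @ profile
    unrated = [i for i in range(len(item_tags)) if i not in ratings]
    return sorted(unrated, key=lambda i: scores[i], reverse=True)[:top_n]


# The user liked an action film (5 stars) and disliked an animation film (1 star).
user_ratings = {0: 5.0, 1: 1.0}
recommended = [MOVIES[i] for i in recommend(user_ratings, ITEM_TAGS)]
print(recommended)
```

With only two ratings the profile is crude: a single low rating pushes every animation film down, which is exactly the over-generalization described above.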
Advantages of Content-Based Filtering –
- Because recommendations are specific to a single user, the model does not require data from other users.
- This makes scaling to a large number of users easier.
- The model can capture a user’s individual interests and recommend niche items that very few other users are interested in.
Disadvantages of Content-Based Filtering –
- Feature representations of items are hand-engineered to some extent, so this technique requires a great deal of domain knowledge.
- The model can only make suggestions based on the user’s existing interests, so it has limited ability to broaden them.
Collaborative Filtering –
Collaborative filtering recommends new items to a user based on the interests and preferences of other, similar users. For example, when we shop on Amazon, it suggests new products with messages like “Customers who bought this also bought.”
This avoids the drawbacks of content-based filtering by relying on user interactions rather than on the content of the items. All it requires is the users’ past behavior. The underlying assumption is that users who have agreed in the past will agree again in the future.
Collaborative filtering can be divided into two categories:
- User-Based Collaborative Filtering: In this type, the system identifies users who have similar purchasing preferences; similarity between users is calculated from their purchase behavior.
- Item-Based Collaborative Filtering: Here the algorithm looks for items that are similar to those the user has purchased. For the prediction, similarity is computed between items rather than between users.
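The two variants can be compared on a tiny example. The sketch below is illustrative only: the purchase matrix is invented, cosine similarity is one common choice of similarity measure (the text above does not prescribe one), and the function names are hypothetical.

```python
import numpy as np

# Hypothetical purchase matrix: rows are users, columns are items (1 = bought).
PURCHASES = np.array([
    [1, 1, 0, 0],  # user 0
    [1, 1, 1, 0],  # user 1
    [0, 0, 1, 1],  # user 2
], dtype=float)


def cosine_matrix(rows):
    """Pairwise cosine similarity between the rows of a matrix."""
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty rows
    unit = rows / norms
    return unit @ unit.T


def user_based_scores(purchases, user):
    """Score items by summing other users' purchases, weighted by user similarity."""
    sim = cosine_matrix(purchases)[user].copy()
    sim[user] = 0.0  # a user should not vote for themselves
    scores = sim @ purchases
    scores[purchases[user] > 0] = -np.inf  # hide items already bought
    return scores


def item_based_scores(purchases, user):
    """Score items by their similarity to the items the user already bought."""
    item_sim = cosine_matrix(purchases.T)
    np.fill_diagonal(item_sim, 0.0)
    scores = item_sim @ purchases[user]
    scores[purchases[user] > 0] = -np.inf  # hide items already bought
    return scores


best_user_based = int(np.argmax(user_based_scores(PURCHASES, user=0)))
best_item_based = int(np.argmax(item_based_scores(PURCHASES, user=0)))
print(best_user_based, best_item_based)
```

Both variants pick item 2 for user 0 here: user 1 has similar purchases and also bought item 2, and item 2 was bought alongside the items user 0 already owns.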
Advantages of Collaborative Filtering –
- Even if the data is small, it works well.
- The model helps users discover new interests: even if a user has never shown interest in a specific item, the model may still recommend it because similar users are interested in it.
- Domain knowledge isn’t required.
Disadvantages of Collaborative Filtering –
- It can’t handle new items, because the model has not been trained on items newly added to the database and has no interactions for them. This is known as the cold-start problem.
- Side features are hard to incorporate. Side features are any attributes beyond the user and item IDs; in the context of movie recommendations, they can include actor names or release years.
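The cold-start problem is easy to reproduce. In this small sketch (the ratings are invented, and item_cosine is a hypothetical helper), the last column is a newly added item that nobody has rated yet, so its similarity to every other item is zero and an item-based recommender has no basis for suggesting it.

```python
import numpy as np

# Hypothetical ratings matrix (0 = no rating); the last column is a new item.
ratings = np.array([
    [5, 4, 0],
    [4, 5, 0],
    [1, 2, 0],
], dtype=float)


def item_cosine(ratings):
    """Cosine similarity between item columns; unrated items get zero vectors."""
    norms = np.linalg.norm(ratings, axis=0)
    safe_norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero
    unit = ratings / safe_norms
    return unit.T @ unit


sim = item_cosine(ratings)
new_item_similarities = sim[2]
print(new_item_similarities)
```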
Hybrid Recommendation System –
Different types of recommendation systems each have their own set of advantages and disadvantages. When employed in isolation, several of these strategies can appear to be restrictive, especially when multiple sources of data are available for the problem. Hybrid recommender systems are those that make use of a variety of data sources to make reliable inferences.
Parallel and sequential are the two most common designs for hybrid recommendation systems. The parallel architecture feeds the input to several recommendation systems, whose outputs are then combined into a single output. The sequential architecture passes the input to a single recommendation engine, whose output is handed on to the next recommender in the series. A graphic illustration of both designs can be found in the diagram below.
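The two designs can also be sketched in code. This is one possible reading, under stated assumptions: the scores are made up, the parallel design is shown as a simple weighted average, and the sequential design is shown as “the first recommender shortlists, the second re-ranks”. Real systems may combine models quite differently, and the function names are hypothetical.

```python
import numpy as np


def parallel_hybrid(content_scores, collab_scores, content_weight=0.5):
    """Parallel design: run both recommenders on the same input, then blend."""
    content_scores = np.asarray(content_scores, dtype=float)
    collab_scores = np.asarray(collab_scores, dtype=float)
    return content_weight * content_scores + (1 - content_weight) * collab_scores


def sequential_hybrid(content_scores, collab_scores, shortlist_size=2):
    """Sequential design: the first recommender shortlists, the second re-ranks."""
    content_scores = np.asarray(content_scores, dtype=float)
    collab_scores = np.asarray(collab_scores, dtype=float)
    shortlist = np.argsort(-content_scores)[:shortlist_size]
    return sorted(shortlist.tolist(), key=lambda i: collab_scores[i], reverse=True)


# Hypothetical scores for four items from two already-trained recommenders.
content = [0.9, 0.8, 0.1, 0.4]
collab = [0.2, 0.7, 0.9, 0.5]

blended = parallel_hybrid(content, collab, content_weight=0.5)
parallel_ranking = np.argsort(-blended).tolist()
sequential_ranking = sequential_hybrid(content, collab, shortlist_size=2)
print(parallel_ranking, sequential_ranking)
```

Note how the designs differ on item 2: the parallel blend can still surface it thanks to its high collaborative score, while the sequential pipeline never considers it because the first stage filtered it out.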
Advantages of Hybrid Recommendation System –
Hybrid systems integrate several models to overcome the shortcomings of one model. Overall, this mitigates the drawbacks of utilizing individual models and facilitates the generation of more reliable suggestions. Users will receive more robust and tailored recommendations as a result of this.
Disadvantages of Hybrid Recommendation System –
These models are typically computationally expensive, and they need a large database of ratings and other signals to stay current. Without up-to-date data (user interactions, ratings, etc.), it is difficult to retrain the model and deliver new recommendations that reflect updated items and ratings from diverse users.
Recommendation systems have changed how users find content by making it simple to pick out options that match their preferences and areas of interest. They suggest material tailored to each user, and they are now used on many platforms.
Case Study on Recommendation System in Machine Learning
Problem Statement
This dataset comprises 1,000,209 anonymous ratings from 6,040 MovieLens users on around 3,900 films. Users were chosen at random for this 1M version, which was released in February 2003. All of the individuals that were chosen had rated at least 20 films. An id is assigned to each user, and no additional information is supplied. The data was originally stored in three files: movies.dat, ratings.dat, and users.dat. We turned the data into CSV files to make it easier to work with.
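The conversion to CSV is not shown in this article, so the following is only a sketch of how it might be done. It assumes the fields in the .dat files are separated by “::” and that movies.dat holds an ID, a title, and a genre string per row. The column name movie_id and the helper dat_to_frame are made up for the example; title and genres match the column names used in the code later on.

```python
import io

import pandas as pd


def dat_to_frame(source, columns):
    """Read a '::'-separated .dat file (a path or a file-like object)."""
    # A multi-character separator requires pandas' 'python' parsing engine.
    return pd.read_csv(source, sep="::", engine="python", header=None, names=columns)


# Two sample rows in the assumed movies.dat layout: MovieID::Title::Genres
sample = io.StringIO(
    "1::Toy Story (1995)::Animation|Children's|Comedy\n"
    "2::Jumanji (1995)::Adventure|Children's|Fantasy\n"
)
movies_sample = dat_to_frame(sample, ["movie_id", "title", "genres"])
csv_text = movies_sample.to_csv(index=False)
print(csv_text)
```

For the real files you would pass a path instead of the in-memory sample and write the result with to_csv('movies.csv', index=False). An explicit encoding argument may also be needed if the titles contain non-ASCII characters.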
Importing Libraries
    # Note: several of these imports (Afinn, plotly, StandardScaler, KMeans,
    # train_test_split, mean_squared_error, cosine_similarity, surprise) are
    # not used by the content-based example shown in this article.
    import pandas as pd
    import numpy as np
    from afinn import Afinn
    import plotly.graph_objs as go
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import linear_kernel
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error
    from sklearn.metrics.pairwise import cosine_similarity
    from surprise import Reader, Dataset, KNNBasic, SVD
    from surprise.model_selection import cross_validate

    # Load the movie metadata (the code below uses the 'title' and 'genres' columns)
    movies = pd.read_csv('movies.csv')
    print('Movies Data Shape:', movies.shape)
    movies.head()
    # Break up the big genre string into a list of genres
    movies['genres'] = movies['genres'].str.split('|')

    # Define a TF-IDF vectorizer object that removes all English stop words.
    # min_df=1 (the default) is used here because recent scikit-learn
    # versions reject an integer min_df of 0.
    tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')

    # Replace NaN with an empty string and convert each genre list to a string
    movies['genres'] = movies['genres'].fillna('').astype('str')

    # Construct the required TF-IDF matrix on the genre feature
    tfidf_matrix = tf.fit_transform(movies['genres'])

    # Output the shape of tfidf_matrix
    print('Final Shape of TF-IDF Matrix:', tfidf_matrix.shape)
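To see what the vectorizer actually extracts, it can help to run it on a couple of genre strings. The two strings below are made-up examples in the same “|”-separated style. The sketch uses the default token pattern, which keeps only tokens of two or more word characters, together with the same ngram_range and stop_words settings as above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two hypothetical genre strings in the '|'-separated style of the dataset.
genres = ["Action|Adventure", "Animation|Children's|Comedy"]

vectorizer = TfidfVectorizer(analyzer="word", ngram_range=(1, 2), stop_words="english")
matrix = vectorizer.fit_transform(genres)

# vocabulary_ maps each unigram or bigram to its column in the matrix.
terms = sorted(vectorizer.vocabulary_)
print(terms)
print(matrix.shape)
```

Each genre becomes a lowercase unigram, and each pair of adjacent genres becomes a bigram such as “action adventure”. The trailing “s” of “Children's” is dropped because single-character tokens are ignored.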
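    # Plotting libraries needed for the heatmap below
    import matplotlib.pyplot as plt
    import seaborn as sns

    # TF-IDF rows are L2-normalized by default, so the linear kernel
    # (dot product) of the matrix with itself gives the cosine similarity
    cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

    # Print the shape of the cosine similarity matrix
    print('Cosine Similarity Matrix Shape:', cosine_sim.shape)

    # Visualize the similarities among the first 8 movies
    sns.heatmap(data=cosine_sim[:8, :8])
    plt.show()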
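The code above calls linear_kernel even though the goal is cosine similarity. The short check below, run on three made-up documents, illustrates why that works: TfidfVectorizer normalizes each row to unit length by default, so the plain dot product and the cosine similarity coincide.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical documents standing in for genre strings.
docs = ["Action Adventure", "Action Comedy", "Animation Comedy"]
tfidf = TfidfVectorizer().fit_transform(docs)

# With the default norm='l2', every row of the TF-IDF matrix has length 1.
row_norms = np.sqrt(tfidf.multiply(tfidf).sum(axis=1))

dot_products = linear_kernel(tfidf, tfidf)
cosines = cosine_similarity(tfidf, tfidf)
same = np.allclose(dot_products, cosines)
print(same)
```

Skipping the normalization step that cosine_similarity performs is a small saving on toy data but can matter on a matrix with thousands of movies.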
    def genre_recommendations(title):
        '''Return the 10 movies whose genres are most similar to the given title.'''
        # Movie titles, in the same row order as cosine_sim
        # (assumes movies has the default 0..n-1 index)
        titles = movies['title']
        # Build a reverse mapping from title to row index,
        # keeping the first row for any duplicate title
        indices = pd.Series(movies.index, index=movies['title'])
        indices = indices[~indices.index.duplicated(keep='first')]
        # Obtain the index of the movie that matches the title
        idx = indices[title]
        # Get the pairwise similarity scores of all movies with that movie
        # as a list of (movie index, score) tuples
        sim_scores = list(enumerate(cosine_sim[idx]))
        # Sort the movies by cosine similarity score, highest first
        sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
        # Drop the queried movie itself, then keep the 10 most similar movies
        sim_scores = [s for s in sim_scores if s[0] != idx][:10]
        # Get the movie indices
        movie_indices = [i[0] for i in sim_scores]
        # Return the titles of the top 10 most similar movies
        return titles.iloc[movie_indices]

    # Call the function to get recommendations for one movie
    genre_recommendations('Good Will Hunting (1997)')
In the next article, I am going to discuss the Naive Bayes Algorithm in Machine Learning with Examples. Here, in this article, I tried to explain the Recommendation Engine and its Working in Machine Learning with Examples. I hope you enjoyed this article.