Dr. Ateendra Jha
IMDb Movie Analysis - Web Scraping and Analysis
Data was retrieved from the IMDb website by web scraping. Among the top 1000 voters, Romance received the fewest votes and Sci-Fi was the most popular genre.
Web Scraping
October 3, 2021
Romance received the fewest votes from the top 1000 voters, while Sci-Fi is the most popular genre among them.
The dataset contains more Drama movies than movies of any other genre, but Sci-Fi, with very few movies in the dataset (as depicted in the previous count bar chart), received the most votes from male, female, and top 1000 IMDb voters.
Sci-Fi movies also have the highest ratings across genders in almost all age groups.
There is no significant difference in popularity among Action, Thriller, and Adventure, as they mostly target the same kind of audience.
The previous boxplot shows that USA movies have the highest median rating.
Introduction :
IMDb is an online database of information related to films, television programs, home videos, video games, and streaming content online, including cast, production crew and personal biographies, plot summaries, trivia, ratings, and fan and critical reviews.
Impact :
Understanding audience interest is crucial for drawing viewers to newly released series or movies. Audience interest determines which genres are worth hosting, which in turn helps movie and series makers decide which genres to invest in. It is also important to follow the ratings given by different age groups to gain insight into their interests.
Problem Statement :
To identify the most popular genres.
To identify which age groups rated which genres the most.
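The two questions above can be sketched with a pandas group-by. This is only an illustration under stated assumptions: the column names (`genre`, `votes_under_18`, `votes_18_29`, `votes_30_44`) are hypothetical and the numbers are placeholder values, not figures from the IMDb data.

```python
import pandas as pd

# Toy rows with hypothetical column names; the vote counts are
# placeholders chosen for illustration, not scraped values.
movies = pd.DataFrame({
    "genre": ["Drama", "Drama", "Sci-Fi", "Romance"],
    "votes_under_18": [10, 20, 50, 5],
    "votes_18_29": [100, 80, 300, 40],
    "votes_30_44": [90, 70, 200, 30],
})

vote_cols = ["votes_under_18", "votes_18_29", "votes_30_44"]

# Total votes per genre, broken down by age group
votes_by_genre = movies.groupby("genre")[vote_cols].sum()

# Genre with the most votes across all age groups
most_popular = votes_by_genre.sum(axis=1).idxmax()

# For each genre, the age-group column with the most votes
top_age_group = votes_by_genre.idxmax(axis=1)

print(most_popular)
print(top_age_group)
```

Summing per genre before comparing matters because a genre with many movies (such as Drama in this dataset) and a genre with few movies are otherwise not compared on the same footing; dividing by the movie count per genre would be a reasonable next step.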
Code :
# To add a new cell, type '# %%'
# To add a new markdown cell, type '# %% [markdown]'
# %% [markdown]
# # IMDb Top Rated Movies
# %% [markdown]
# ### Web Scraping
# %%
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import re
# %%
weburl = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"
URLread = uReq(weburl)
htmlpage = URLread.read()
URLread.close()
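# %%
# Name the parser explicitly so bs4 does not have to guess one
websoup = soup(htmlpage, "html.parser")
# %%
filename = "IMBD-Movie2.csv"
# UTF-8 keeps non-ASCII titles intact and matches the pandas default when reading
f = open(filename, 'w', encoding="utf-8")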
# %%
# Each <tr> inside the chart's <tbody> is one movie
containers = websoup.find("tbody")
rows = containers.findAll('tr')
headers = "Name###Year###Rating\n"
f.write(headers)
for row in rows:
    col = row.findAll('td')
    col = [x.text.strip() for x in col]
    rating = (col[-3]).strip()
    nameyearstring = col[1]
    # Drop the leading rank, the two characters after it, and the trailing "(YYYY)"
    name = ((re.sub(r"^\d+", "", nameyearstring))[2:-6]).strip()
    year = (nameyearstring[-5:-1]).strip()
    print(name, ",", year, ",", rating)
    f.write(name + "###" + year + "###" + rating + "\n")
f.close()
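# %%
import pandas as pd
# %%
# A multi-character separator is treated as a regex and needs the Python engine
df = pd.read_csv(r"C:\Users\ateen\OneDrive\Project Files\IMBD-Movie2.csv", sep="###", engine="python")
pd.set_option('display.max_rows', None)
df
# %%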
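The name and year extraction in the loop above relies on fixed slice positions, and the `###` separator is used in place of a plain comma. A possible alternative is sketched below, assuming each title cell has the shape "rank. title (year)" with arbitrary whitespace between the parts. The helpers `parse_title_cell` and `rows_to_csv` are hypothetical names introduced for this illustration, not part of the original notebook.

```python
import csv
import io
import re

# Assumed cell shape: "<rank>. <title> (<year>)"; whitespace may include newlines.
TITLE_CELL = re.compile(r"^\s*(\d+)\.\s*(.+?)\s*\((\d{4})\)\s*$", re.DOTALL)


def parse_title_cell(text):
    """Split a title cell into (rank, name, year); raise if the shape differs."""
    match = TITLE_CELL.match(text)
    if match is None:
        raise ValueError(f"unexpected title cell: {text!r}")
    rank, name, year = match.groups()
    return int(rank), name, int(year)


def rows_to_csv(rows):
    """Serialize (name, year, rating) tuples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Name", "Year", "Rating"])
    writer.writerows(rows)
    return buffer.getvalue()


# Example with a cell shaped like the ones the loop above expects
print(parse_title_cell("1.\n      The Shawshank Redemption\n(1994)"))

# Placeholder row: csv.writer quotes the field that contains a comma
print(rows_to_csv([("Title, With Comma", 2000, "9.9")]))
```

Because `csv.writer` quotes any field containing a comma, the output can be read back with `pd.read_csv` using the default separator, which avoids the regex separator and the Python parsing engine.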