Saturday, 30 December 2017

Sentiment Analysis in Python Using the Large Movie Review Dataset

Dataset

I am using the "Large Movie Review Dataset" for this sentiment analysis tutorial. The dataset contains 50,000 movie reviews collected from the Internet Movie Database (IMDb) for this specific task. Each review is labeled with one of two polarities, negative or positive, and each class contains 25,000 reviews.


After downloading, extract the archive (for example with WinRAR) into your working directory.
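
If you prefer to script the extraction instead of using a GUI tool, Python's standard tarfile module can do it. The following is only a minimal sketch: it assumes the download is a gzip-compressed tarball named aclImdb_v1.tar.gz in the current directory, and the target folder name "sentiment" is a placeholder, so adjust both to your setup.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract a gzip-compressed tar archive into target_dir and return that path."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(path=target)
    return target


# Assumed archive name and target folder; change both to match your setup.
ARCHIVE = Path("aclImdb_v1.tar.gz")
if ARCHIVE.exists():
    extract_archive(ARCHIVE, "sentiment")
```

The extraction only runs when the archive is actually present, so the snippet is safe to paste into a script before the download has finished.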


Convert Dataset into .CSV file


import os
import pandas as pd

# Map the folder names to readable sentiment labels
labels = {'pos': 'positive', 'neg': 'negative'}
# Note: change this path to the directory where you extracted the dataset
base_path = r'C:\Users\nlpgeek\sentiment\aclImdb'
rows = []
for directory in ('test', 'train'):
    for sentiment in ('pos', 'neg'):
        path = os.path.join(base_path, directory, sentiment)
        for review_file in os.listdir(path):
            with open(os.path.join(path, review_file), 'r', encoding='utf8') as input_file:
                review = input_file.read()
            rows.append([review, labels[sentiment]])
# Build the DataFrame once; DataFrame.append was removed in pandas 2.0
dataset = pd.DataFrame(rows, columns=['review', 'sentiment'])
# Shuffle the rows so that positive and negative reviews are mixed
dataset = dataset.sample(frac=1).reset_index(drop=True)
dataset.to_csv('movie_reviews.csv', index=False)

Converting 50,000 text files into a single .CSV file will take some time.
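
Once the conversion has finished, it is worth checking that the file looks as expected. Here is a small sketch using only the standard library; the helper name count_labels is made up for this example, and it assumes the CSV has a "sentiment" column as written by the conversion script.

```python
import csv
from collections import Counter
from pathlib import Path


def count_labels(csv_path):
    """Count how many rows carry each sentiment label in the CSV file."""
    with open(csv_path, newline="", encoding="utf8") as csv_file:
        reader = csv.DictReader(csv_file)
        return Counter(row["sentiment"] for row in reader)


# Assumed output file name from the conversion step; skipped if it is missing.
CSV_PATH = Path("movie_reviews.csv")
if CSV_PATH.exists():
    print(count_labels(CSV_PATH))
```

If every review was read, the two labels should each appear 25,000 times, matching the dataset description.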

Posted by: Khanx