While searching across different platforms, I collected some interesting project ideas in natural language processing. Answers on Quora were especially helpful for this article.
"Pathan Karimkhan" answered on Quora:
- Sentiment analysis for Twitter and web articles - Identify the overall sentiment of web articles, product reviews, movie reviews, and tweets. A lexicon-based approach or machine learning techniques can be used.
- Web article classification/summarization - Use clustering or classification techniques to categorize web articles, and perform semantic analysis to summarize them.
- Recommendation system based on users' social media profiles - Use social media APIs to collect a user's interests from Facebook, Twitter, etc., and implement a recommendation system based on those interests.
- Tweet classification and trend detection - Classify tweets into categories such as sports, business, politics, and entertainment, and detect trending tweets in each domain.
- Movie Review Prediction - Use online movie reviews to predict reviews of new movies.
- Summarize Restaurant Reviews - Take a list of reviews about a restaurant, and generate a single English summary for that restaurant.
- AutoBot - Build a system that can have a conversation with you. The user types messages, and your system replies based on the user's text. Many approaches here ... you could use a large Twitter corpus and do language similarity.
- Twitter-based news system - Collect tweets for various categories on an hourly or daily basis, identify trending discussions, perform semantic analysis, and create a kind of news system (check the Frrole product).
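To make the first idea in the list more concrete, here is a minimal sketch of the lexicon-based approach to sentiment analysis. The word list, the negator set, and the function names `sentiment_score` and `classify` are all made up for illustration; a real project would rely on a published sentiment lexicon or a trained classifier.

```python
import re

# Hypothetical toy lexicon (an assumption for this sketch, not a real resource).
LEXICON = {
    "good": 1, "great": 2, "love": 2, "excellent": 2,
    "bad": -1, "boring": -1, "terrible": -2, "hate": -2,
}
# Words that flip the polarity of the word immediately after them.
NEGATORS = {"not", "no", "never"}


def sentiment_score(text):
    """Sum word polarities; a word right after a negator has its sign flipped."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(token, 0)
        score += -polarity if negate else polarity
        negate = False  # negation only affects the next token
    return score


def classify(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in ["I love this movie, it is great",
                   "The plot was not good and the ending was terrible"]:
        print(review, "->", classify(review))
```

The negation handling is intentionally crude (it only looks one word ahead), which is one reason machine learning techniques are often preferred for tweets and reviews.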
Some basic building blocks are not yet robust enough to automatically process user-generated content, i.e. text from the real world:
- context awareness (user history, recent queries, location, current headlines and so on)
- text correction (transliteration, spelling, truecasing, punctuation and even grammar)
- language detection (of text, voice or images)
- languageless models (like multi-language speech synthesis and word2vec, instead of creating a separate model for each standardized language that happens to have an ISO code), which is, in a way, another solution to language detection
Note that humans do all of this when learning and processing natural language. Some of it is inherently application-specific, but some of it can be researched and mostly solved in the abstract.
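As a rough illustration of one of these building blocks, language detection of text can be approximated by counting stopwords. This is only a sketch: the stopword lists below are tiny and hypothetical, and `detect_language` is a name chosen for this example, not part of any library.

```python
import re

# Hypothetical, deliberately tiny stopword lists (assumptions for this sketch).
# A real detector would use much larger profiles or character n-grams.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "es": {"el", "la", "y", "es", "de", "que", "en", "los"},
    "de": {"der", "die", "und", "ist", "das", "nicht", "ein", "zu"},
}


def detect_language(text):
    """Return the code of the language whose stopwords occur most often, or None."""
    tokens = re.findall(r"\w+", text.lower())
    counts = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(counts, key=counts.get)
    # No stopword matched at all: refuse to guess.
    return best if counts[best] > 0 else None


if __name__ == "__main__":
    for sample in ["The cat is in the garden",
                   "El gato es de la casa",
                   "Der Hund ist nicht das Problem"]:
        print(sample, "->", detect_language(sample))
```

A heuristic like this tends to fail on exactly the kind of input described above: short, misspelled, or mixed-language user-generated text, which is why robust language detection is still listed as an open building block.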