Analyzing crowd-sourced reviews

Analyzing crowd-sourced reviews about local businesses - YELP dataset
PART I - Predicting a star rating for businesses based on the category, check-ins and review count.
The goal was to classify businesses by star rating (1, 1.5, 2, 2.5, 3, 3.5, 4 and 4.5 stars) so that a decision tree could be built to predict which star rating a business is likely to receive, given its category, number of check-ins and review count.

PART II - Predicting the number of fans of a particular user based on the number of votes of each vote type, the number of reviews written by the user and the star rating received by the user.
The goal was to classify users so that a decision tree could predict whether a particular user will have a high or low number of fans. To reduce the cost of the decision-making process, the tree needed as few nodes as possible while still predicting accurately. This was achieved by classifying the data using fewer attributes.
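Decision-tree learners pick split attributes by how much each one reduces uncertainty about the class, which is also a natural way to keep only the most useful attributes and get a smaller tree. The sketch below ranks attributes by plain information gain. It assumes the Yelp attributes have already been discretized into categories, and the attribute names and records are hypothetical. It is not the project's R/Weka code; C4.5-style learners such as Weka's J48 use a normalized variant (gain ratio).

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows: List[Dict[str, str]], attribute: str, target: str) -> float:
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

def rank_attributes(rows: List[Dict[str, str]], attributes: List[str], target: str) -> List[str]:
    """Most informative attributes first; low-gain ones are candidates to drop."""
    return sorted(attributes, key=lambda a: information_gain(rows, a, target), reverse=True)

# Hypothetical discretized user records.
users = [
    {"reviews": "many", "useful_votes": "low", "fans": "high"},
    {"reviews": "many", "useful_votes": "high", "fans": "high"},
    {"reviews": "few", "useful_votes": "low", "fans": "low"},
    {"reviews": "few", "useful_votes": "high", "fans": "low"},
]
print(rank_attributes(users, ["useful_votes", "reviews"], "fans"))  # ['reviews', 'useful_votes']
```

Here `useful_votes` has zero gain, so dropping it would not cost any accuracy on this toy data while shrinking the tree.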

  • Duration: March 2016 - May 2016
  • Tags: Big Data Analytics, R, Weka, Java program for data cleaning
  • Client: Academic Project
  • Technical Paper: View Paper
  • GitHub Link: View Project

Subgraph matching algorithms and improvements

This project implements subgraph matching algorithms along with improvements such as:
a. optimized search space calculations
b. neighborhood profiling
c. optimized search order
d. optimized methods for verifying subgraph isomorphism.

1. Transforming different datasets into Neo4j data graphs.
2. Implementing subgraph matching algorithms and improvements.
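To make the listed improvements concrete, here is a minimal sketch of backtracking subgraph matching on small in-memory labeled graphs (not the project's Neo4j-based code). It narrows each query vertex's search space by label and by a neighborhood profile (counts of neighbor labels), orders the search so the query vertices with the fewest candidates come first, and verifies edges incrementally as vertices are mapped. The graph representation and function names are illustrative assumptions, and it finds non-induced matches: every query edge must exist in the data graph, but extra data edges are allowed.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]   # undirected graph as adjacency sets
Labels = Dict[int, str]       # vertex -> label

def label_profile(graph: Graph, labels: Labels, v: int) -> Dict[str, int]:
    """Neighborhood profile: how many neighbors of v carry each label."""
    profile: Dict[str, int] = {}
    for n in graph[v]:
        profile[labels[n]] = profile.get(labels[n], 0) + 1
    return profile

def candidate_sets(q: Graph, ql: Labels, g: Graph, gl: Labels) -> Dict[int, Set[int]]:
    """Search space: same-label data vertices whose profile covers the query vertex's profile."""
    data_profiles = {v: label_profile(g, gl, v) for v in g}
    cands: Dict[int, Set[int]] = {}
    for u in q:
        qp = label_profile(q, ql, u)
        cands[u] = {
            v for v in g
            if gl[v] == ql[u]
            and all(data_profiles[v].get(lab, 0) >= cnt for lab, cnt in qp.items())
        }
    return cands

def subgraph_matches(q: Graph, ql: Labels, g: Graph, gl: Labels) -> List[Dict[int, int]]:
    cands = candidate_sets(q, ql, g, gl)
    # Search order: most constrained query vertices (fewest candidates) first.
    order = sorted(q, key=lambda u: len(cands[u]))
    mapping: Dict[int, int] = {}
    results: List[Dict[int, int]] = []

    def edges_ok(u: int, v: int) -> bool:
        # Every already-mapped query neighbor of u must map to a data neighbor of v.
        return all(mapping[w] in g[v] for w in q[u] if w in mapping)

    def backtrack(i: int) -> None:
        if i == len(order):
            results.append(dict(mapping))
            return
        u = order[i]
        used = set(mapping.values())
        for v in cands[u]:
            if v not in used and edges_ok(u, v):
                mapping[u] = v
                backtrack(i + 1)
                del mapping[u]

    backtrack(0)
    return results

# Query: a triangle A-B-C. Data: triangle 10-11-12 plus a pendant C vertex 13.
query = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
qlabels = {0: "A", 1: "B", 2: "C"}
data = {10: {11, 12}, 11: {10, 12, 13}, 12: {10, 11}, 13: {11}}
dlabels = {10: "A", 11: "B", 12: "C", 13: "C"}
print(subgraph_matches(query, qlabels, data, dlabels))  # [{0: 10, 1: 11, 2: 12}]
```

Vertex 13 never enters the search because its profile lacks an `A` neighbor, which is the point of neighborhood profiling: cheap filtering before the expensive verification step.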

  • Duration: 3 weeks
  • Tags: Subgraph matching, neighborhood profiling, optimized search space, optimized search order, optimized isomorphism verification
  • Type: Academic Project
  • GitHub Link: View

MendhiCoat - Card Game with over 60,000 downloads

Mendhi Coat is a mind-blowing Indian card game. It's all about winning as many of the 10-numbered cards as possible for your team and completing as many Coats as you can against the opponents. This version includes features such as a single-player option and customized themes, fonts and buttons.
The next, more advanced version of the game will have a multiplayer option over the internet, customizable team size and much more. If you have played card games like Poker, Uno, Rummy, Solitaire or FreeCell, you will like this game too.
Hope you have fun playing it!

  • Duration: Improved over 1 month
  • Tags: Android app, Card game, Multiplayer, Canvas
  • Client: Own part-time venture, I'm Curious Studios
  • Live: Google Play Link

Demand Response Management in Smart Grid

A smart grid supports two-way communication between the electricity supplier and the consumer, which means it can take responses from consumers to make the service more reliable and minimize the possibility of power disruptions. A demand response system consists of sensors that detect excessive load on a station; based on the system's decision, it can either deny service or divert the load to the most strategically located station. Here, we study historical data on power demand and consumption patterns from households and businesses, collected at fixed intervals of time, in order to reset the grid setup.

We implemented demand response management with distribution intelligence. Our implementation requires biweekly peak power requirements from all connected households and companies at each station, consisting of hourly peak consumption values. We then analyze the trend in this historical peak data and calculate the average peak value per hour across all included households or industries. These input files are collected per station, and new average peak values are set per day and per hour for the respective stations.

Whenever a consumer requests power from the power station, the station redirects the request to the demand response system, which makes a decision based on the peak value for the given location at the given hour. If the requested value is within the threshold limit, the system gives the grid a green signal to send the requested amount of electrical energy. If the value exceeds the peak limit, the system still serves the request but sends the consumer a warning for exceeding the threshold at that time (consumption beyond a particular limit can be charged at a higher rate per unit). If the current usage of a station reaches its threshold value, the station borrows energy from other stations.

Since peak value data is collected biweekly, at the end of the second week the peak value table is updated with the new peak values observed in the current demand trend, and this updated table becomes the input to the demand response system for the next cycle.
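The biweekly cycle described above can be sketched as follows. This is an illustration under stated assumptions, not the project's Java RMI implementation: the reading format `(station, day, hour, peak_kw)`, the use of the averaged peak as the per-request threshold, and the largest-spare-capacity-first borrowing rule are all hypothetical choices.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Reading = Tuple[str, int, int, float]   # hypothetical format: (station, day_of_week, hour, peak_kw)
PeakTable = Dict[Tuple[str, int, int], float]

def build_peak_table(readings: List[Reading]) -> PeakTable:
    """Average the hourly peaks reported over the last two weeks, per station, day and hour."""
    buckets: Dict[Tuple[str, int, int], List[float]] = defaultdict(list)
    for station, day, hour, peak_kw in readings:
        buckets[(station, day, hour)].append(peak_kw)
    return {key: mean(values) for key, values in buckets.items()}

def handle_request(table: PeakTable, station: str, day: int, hour: int, requested_kw: float) -> str:
    """GREEN: within the threshold. WARN: still served, but the consumer is warned
    (and may be charged more per unit above the limit)."""
    limit = table.get((station, day, hour))
    if limit is None or requested_kw <= limit:
        return "GREEN"
    return "WARN"

def borrow(shortfall_kw: float, spare_kw: Dict[str, float]) -> Dict[str, float]:
    """When a station hits its threshold, take the shortfall from other stations,
    largest spare capacity first. Returns how much is borrowed from each."""
    taken: Dict[str, float] = {}
    for name, spare in sorted(spare_kw.items(), key=lambda kv: kv[1], reverse=True):
        if shortfall_kw <= 0:
            break
        amount = min(spare, shortfall_kw)
        if amount > 0:
            taken[name] = amount
            shortfall_kw -= amount
    return taken

table = build_peak_table([("S1", 0, 18, 4.0), ("S1", 0, 18, 6.0), ("S1", 0, 19, 3.0)])
print(table[("S1", 0, 18)])                      # 5.0
print(handle_request(table, "S1", 0, 18, 7.0))   # WARN
print(borrow(30.0, {"S2": 20.0, "S3": 50.0}))    # {'S3': 30.0}
```

At the end of each two-week cycle, calling `build_peak_table` on the latest readings produces the table for the next cycle.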

After running this implementation for a few iterations, we observed reduced power demand at peak hours, because consumers receive warnings (analogous to a price increase). Energy utilization at peak hours decreased by 10-15 percent. This helps increase end users' awareness of a more rational use of energy.

  • Date: March 2016 - May 2016
  • Tags: Distributed Systems, Smart Grids, Demand Response Management, Network programming, RMI, Java
  • Client: Academic project
  • Technical Paper: View Paper
  • GitHub Link: View

Tic-Tac-Toe using Minimax algorithm

Implementation of the Minimax algorithm, both regular and with alpha-beta pruning, for the game of Tic-Tac-Toe.
Output: two complete games, showing the state of the game board during play and reporting the number of search tree nodes generated for each computer move, using:
1) Regular Minimax
2) Minimax using alpha-beta pruning
The output file shows that pruning does not change the moves chosen by the Minimax algorithm: the sequence of moves suggested by both algorithms was exactly the same. However, regular Minimax generates around 65,000 search tree nodes, whereas alpha-beta pruning brings that down to around 3,000.

If moves are ordered properly, the effectiveness of pruning can be improved further.
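The comparison above can be reproduced in spirit with the following sketch (not the original assignment code). Both searches count every node they generate below the root. Alpha-beta returns the same minimax value as plain Minimax, and because the root keeps the first move with a strictly better score, both versions pick the same move. The node counts depend on the position and move order and are not claimed to match the figures above.

```python
from typing import List, Optional, Tuple

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board: List[str]) -> Optional[str]:
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def other(player: str) -> str:
    return "O" if player == "X" else "X"

def terminal_score(board: List[str]) -> Optional[int]:
    """+1 if X has won, -1 if O has won, 0 for a draw, None if the game goes on."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    return 0 if " " not in board else None

def minimax(board: List[str], player: str, stats: List[int]) -> int:
    stats[0] += 1  # count every node generated
    score = terminal_score(board)
    if score is not None:
        return score
    values = []
    for i in range(9):
        if board[i] == " ":
            board[i] = player
            values.append(minimax(board, other(player), stats))
            board[i] = " "
    return max(values) if player == "X" else min(values)

def alphabeta(board: List[str], player: str, alpha: int, beta: int, stats: List[int]) -> int:
    stats[0] += 1
    score = terminal_score(board)
    if score is not None:
        return score
    best = -2 if player == "X" else 2
    for i in range(9):
        if board[i] != " ":
            continue
        board[i] = player
        value = alphabeta(board, other(player), alpha, beta, stats)
        board[i] = " "
        if player == "X":
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # the opponent would never allow this line: prune the remaining moves
    return best

def best_move(board: List[str], player: str, prune: bool) -> Tuple[int, int]:
    """Return (chosen square, nodes generated) for the player to move."""
    stats = [0]
    alpha, beta = -2, 2
    choice, choice_value = -1, None
    for i in range(9):
        if board[i] != " ":
            continue
        board[i] = player
        if prune:
            value = alphabeta(board, other(player), alpha, beta, stats)
        else:
            value = minimax(board, other(player), stats)
        board[i] = " "
        if choice_value is None or (value > choice_value if player == "X" else value < choice_value):
            choice, choice_value = i, value
            if player == "X":
                alpha = max(alpha, value)
            else:
                beta = min(beta, value)
    return choice, stats[0]

board = ["X", " ", " ",
         " ", "O", " ",
         " ", " ", "X"]
for prune in (False, True):
    print(best_move(board, "O", prune))  # same square both times, fewer nodes with pruning
```

Trying moves that look strongest first (for example, center and corners before edges) tends to produce cut-offs earlier, which is the move-ordering effect mentioned above.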

  • Duration: 1 week
  • Tags: Tic-Tac-Toe, minimax algorithm, alpha-beta pruning, optimal solution
  • Type: Academic assignment
  • GitHub Link: View

Training and executing MLPs with hidden layers

Building a classifier using Multi-Layer Perceptrons with hidden layers.
Wrote separate programs to train and execute the MLP on the given dataset.
Also wrote a program that trains and executes a decision tree
1) after automatic induction, and
2) after subsequent pruning using the Chi-Squared test.
The program displays the minimum, maximum and average depth and the number of nodes and leaves, before and after pruning.
It also calculates the confusion matrix and total profit before and after pruning.
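The Chi-Squared pruning test and the evaluation step can be sketched as follows (an illustration, not the assignment's code). A split is pruned when the class distribution across its branches is not significantly different from what the parent's distribution would predict. The sketch uses the standard 95% critical values for up to 4 degrees of freedom, and the profit matrix passed to `total_profit` is a hypothetical input, since the assignment's profit values are not given here.

```python
from typing import Dict, List, Sequence, Tuple

# Upper 5% critical values of the chi-squared distribution, by degrees of freedom.
CHI2_CRITICAL_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def chi_squared(branch_counts: List[List[int]]) -> float:
    """branch_counts[b][c]: training examples of class c that reach branch b of the split."""
    n_classes = len(branch_counts[0])
    class_totals = [sum(row[c] for row in branch_counts) for c in range(n_classes)]
    total = sum(class_totals)
    stat = 0.0
    for row in branch_counts:
        branch_total = sum(row)
        for c in range(n_classes):
            expected = class_totals[c] * branch_total / total
            if expected > 0:
                stat += (row[c] - expected) ** 2 / expected
    return stat

def should_prune(branch_counts: List[List[int]]) -> bool:
    """Prune (turn the node into a leaf) if the split is not significant at the 5% level.
    Only covers up to 4 degrees of freedom; extend the table for larger splits."""
    df = (len(branch_counts) - 1) * (len(branch_counts[0]) - 1)
    return chi_squared(branch_counts) < CHI2_CRITICAL_95[df]

def confusion_matrix(actual: Sequence[str], predicted: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """Counts of (actual, predicted) pairs."""
    counts: Dict[Tuple[str, str], int] = {}
    for a, p in zip(actual, predicted):
        counts[(a, p)] = counts.get((a, p), 0) + 1
    return counts

def total_profit(matrix: Dict[Tuple[str, str], int], profit: Dict[Tuple[str, str], float]) -> float:
    """Weight each confusion-matrix cell by a (hypothetical) per-outcome profit."""
    return sum(n * profit.get(cell, 0.0) for cell, n in matrix.items())

print(chi_squared([[10, 0], [0, 10]]))   # 20.0: branches separate the classes perfectly
print(should_prune([[5, 5], [5, 5]]))    # True: branches look exactly like the parent
```

Applied bottom-up over the induced tree, `should_prune` decides which internal nodes collapse into leaves; the confusion matrix and profit are then recomputed on the pruned tree for the before/after comparison.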

  • Duration: 2 weeks
  • Tags: Multi-Layer-Perceptron, classification, hidden layers, decision tree
  • Type: Academic assignment
  • GitHub Link: Link

Connect4 - Multiplayer iOS App

Connect4 game on iOS with multiplayer and player vs AI modes.
The objective of the game is to connect four discs of your own color next to each other vertically, horizontally or diagonally before your opponent does.
It was fun: Apple introduced us to the world of Swift and the iPhone SDK in a small workshop followed by a hackathon. We were a team of three and managed to develop a full-fledged Connect 4 game with player 1 vs player 2 and player vs AI modes in just 8 hours (the rest of the time we were submitting our academic assignments :/).
It's not live on the Apple App Store because of the yearly fee for running a developer account. Please have a look at the Devpost or GitHub link.
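The core rule, four discs of one color in a line, can be checked locally around the disc that was just dropped instead of rescanning the whole board. Here is a minimal sketch in Python (the app itself was written in Swift); the 6-row by 7-column board, the `.` empty marker and the function names are assumptions for illustration.

```python
from typing import List

ROWS, COLS = 6, 7
EMPTY = "."
Board = List[List[str]]

def new_board() -> Board:
    return [[EMPTY] * COLS for _ in range(ROWS)]

def drop(board: Board, col: int, disc: str) -> int:
    """Drop a disc into a column and return the row it lands in (row 0 is the top)."""
    for row in range(ROWS - 1, -1, -1):
        if board[row][col] == EMPTY:
            board[row][col] = disc
            return row
    raise ValueError(f"column {col} is full")

def is_win(board: Board, row: int, col: int) -> bool:
    """True if the disc at (row, col) is part of four or more in a line."""
    disc = board[row][col]
    # Horizontal, vertical, and the two diagonals.
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1
        for sign in (1, -1):  # walk both ways from the new disc
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < ROWS and 0 <= c < COLS and board[r][c] == disc:
                count += 1
                r += sign * dr
                c += sign * dc
        if count >= 4:
            return True
    return False

board = new_board()
for col in range(4):
    row = drop(board, col, "R")
print(is_win(board, row, 3))  # True: four R discs along the bottom row
```

Checking only the four lines through the last move keeps each turn cheap, which also matters for a player vs AI mode that evaluates many hypothetical moves.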

  • Duration: Developed in 8 hours as part of a hackathon entry
  • Tags: iPhone-sdk, Swift, iOS, Connect4, Multiplayer game, Player vs AI mode
  • Client: Developed as a hackathon entry at the RIT Apple iOS App Challenge 2016
  • Live View: DevPost Link
  • GitHub Link: View

Color reducer using clustering

Reducing the number of colors in a digital picture to only five colors.
To measure how close a color is to the original colors, we used the Manhattan distance, which gave better color replacement than the Euclidean distance. We settled on the K-Means algorithm with 5 clusters (since the picture must use 5 colors). The idea was to run K-Means without predefined initial centroids. After clustering, we mapped the RGB values of the final centroids to the nearest of the 5 chosen colors.
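A minimal sketch of this pipeline, written without Weka and therefore only an approximation of the project's setup: pixels are clustered with Manhattan distance, centroids are updated with the per-channel median (the center that minimizes total Manhattan distance, which makes this k-medians style), initial centroids are drawn at random from the pixels, and each final centroid is mapped to the nearest palette color. The five palette colors are a hypothetical input, since the chosen colors are not listed here.

```python
import random
from statistics import median
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]

def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest(point: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the center closest to `point` under Manhattan distance."""
    return min(range(len(centers)), key=lambda i: manhattan(point, centers[i]))

def cluster_manhattan(pixels: List[Color], k: int, iters: int = 20, seed: int = 0) -> List[Color]:
    """K-Means-style clustering with Manhattan distance; the centroid update uses the
    per-channel median, which minimizes total Manhattan distance (k-medians)."""
    rng = random.Random(seed)
    centers = [tuple(float(ch) for ch in p) for p in rng.sample(pixels, k)]  # random initial centroids
    for _ in range(iters):
        clusters: List[List[Color]] = [[] for _ in range(k)]
        for p in pixels:
            clusters[nearest(p, centers)].append(p)
        updated = [
            tuple(float(median(channel)) for channel in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers

def reduce_colors(pixels: List[Color], palette: List[Color], seed: int = 0) -> List[Color]:
    """Cluster into len(palette) groups, map each centroid to its nearest palette color,
    then recolor every pixel with its cluster's palette color."""
    centers = cluster_manhattan(pixels, k=len(palette), seed=seed)
    center_color = [palette[nearest(c, palette)] for c in centers]
    return [center_color[nearest(p, centers)] for p in pixels]

# Hypothetical 5-color palette and a handful of pixels.
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
pixels = [(250, 10, 5), (240, 0, 0), (5, 5, 5), (250, 250, 245), (10, 240, 10), (0, 10, 250), (20, 20, 20)]
print(reduce_colors(pixels, palette))
```

Because the result depends on the random initial centroids, a fixed `seed` keeps runs reproducible; different seeds can converge to different clusterings.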

  • Duration: 2 days
  • Tags: Color reducer, Clustering, Weka library, Java
  • Type: Academic assignment
  • GitHub Link: View

Text mining using Bag of Words & N-gram

Here, the objective is to examine newsgroup postings and predict the category or topic of each posting.
Approach: the concept of bag of words and keywords for document similarity, improved with a scoring function based on the frequency and rank of the most frequently occurring words. The first part was to build a model for determining the newsgroup; the second was to use this model together with the scoring function to classify a given text file into a particular newsgroup. The first part involved dividing the files into training and testing sets. For this project, we selected a 70-30 split: 70% of the files from each folder were used for training and 30% for testing.

The next step was to check whether the model works correctly. For this, we supplied the testing dataset (the remaining 30% of files in every newsgroup), and each file was tested to see whether the model correctly predicts its newsgroup. This step is performed by another of our Java programs, using the scoring function we developed: the frequencies of the words in the test file and in each newsgroup model are fed to the scoring function to calculate a score for each newsgroup, and the newsgroup with the highest score is assigned to the test file.
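The 70-30 split, the per-newsgroup bag-of-words model and the score-then-pick-the-maximum classification can be sketched as follows (in Python rather than the project's Java). The project's exact scoring function is not given here, so `score` is a hypothetical one that weights each test-word frequency by how highly the word ranks among the newsgroup's most frequent words; the tokenizer and `top_n` value are also assumptions.

```python
import random
import re
from collections import Counter
from typing import Dict, List, Tuple

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())

def split_70_30(files: List[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle one newsgroup's files and keep 70% for training, 30% for testing."""
    shuffled = files[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]

def build_model(train_docs: List[str], top_n: int = 100) -> Dict[str, int]:
    """Bag of words for one newsgroup: its top_n most frequent words mapped to their rank (0 = most frequent)."""
    counts = Counter(word for doc in train_docs for word in tokenize(doc))
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(top_n))}

def score(test_doc: str, model: Dict[str, int], top_n: int = 100) -> float:
    """Hypothetical scoring: each word's frequency in the test file, weighted higher the better it ranks in the model."""
    freqs = Counter(tokenize(test_doc))
    return sum(f * (top_n - model[w]) / top_n for w, f in freqs.items() if w in model)

def classify(test_doc: str, models: Dict[str, Dict[str, int]], top_n: int = 100) -> str:
    """Assign the newsgroup with the highest score."""
    return max(models, key=lambda group: score(test_doc, models[group], top_n))

models = {
    "sci.space": build_model(["rocket orbit launch nasa", "orbit shuttle launch"]),
    "rec.autos": build_model(["engine car brake", "car dealer engine oil"]),
}
print(classify("The shuttle will launch into orbit", models))  # sci.space
```

In a real run, common words such as "the" would dominate every model, so removing stop words (and stemming, as the tags suggest) before counting is usually worthwhile.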

  • Duration: Developed in 1 week
  • Tags: Text analysis, text mining, bag of words, n-gram, stemming, data cleaning and preparation
  • Type: Academic assignment
  • GitHub Link: View

Ctrl-F through physical copies of books

Ctrl-F makes it easy to search through soft copies of books. But how about finding something in a physical copy of a book, say, your academic reference book?
The app allows users to scan the barcode of a book they have; based on the ISBN, it extracts the book's information using the Google Books API. Users can also search for content from the book within the app and get page information, such as the page number and a page view, in return.
We used the Google Books API for book data extraction and the open-source ZXing library for barcode scanning.
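A sketch of the ISBN lookup step in Python (the app itself was an Android app using ZXing for scanning). It assumes the scanner has already decoded the book's EAN-13 barcode into a 13-digit string, validates the ISBN-13 check digit, and queries the public Google Books volumes endpoint with an `isbn:` search term; the fields read here (`items`, `volumeInfo`, `title`, `authors`, `pageCount`) belong to that API's volume resource. The sample response is made up and trimmed, and the in-book page search is not covered.

```python
import json
import urllib.request
from typing import Optional

def is_valid_isbn13(isbn: str) -> bool:
    """ISBN-13 check: digits weighted 1,3,1,3,... must sum to a multiple of 10."""
    if len(isbn) != 13 or not all(ch in "0123456789" for ch in isbn):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(isbn))
    return total % 10 == 0

def books_query_url(isbn: str) -> str:
    return f"https://www.googleapis.com/books/v1/volumes?q=isbn:{isbn}"

def parse_book(payload: str) -> Optional[dict]:
    """Pull a few fields out of the first matching volume, if any."""
    items = json.loads(payload).get("items") or []
    if not items:
        return None
    info = items[0].get("volumeInfo", {})
    return {"title": info.get("title"), "authors": info.get("authors", []), "pages": info.get("pageCount")}

def lookup(isbn: str) -> Optional[dict]:
    """Network call to Google Books; not exercised in the example below."""
    if not is_valid_isbn13(isbn):
        raise ValueError(f"not a valid ISBN-13: {isbn}")
    with urllib.request.urlopen(books_query_url(isbn), timeout=10) as response:
        return parse_book(response.read().decode("utf-8"))

print(is_valid_isbn13("9780306406157"))  # True
print(books_query_url("9780306406157"))
sample = '{"items": [{"volumeInfo": {"title": "Example", "authors": ["A. Author"], "pageCount": 320}}]}'
print(parse_book(sample))  # {'title': 'Example', 'authors': ['A. Author'], 'pages': 320}
```

Validating the check digit before calling the API filters out misreads from the camera without spending a network request.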

  • Duration: Developed as part of a 36-hour hackathon entry in April 2016
  • Fields: Android Studio, Google Books API, ZXing open-source library
  • Client: Hackathon entry at DandyHacks - University of Rochester 2016
  • DevPost Link: View

Alternative to Google Traffic Info - Back in 2011

Mumbai On Mobile was a J2ME mobile application offering an alternative to Google's real-time traffic information, which was not available for the Mumbai region in 2011. We achieved this by extracting information from the regular updates received from the Traffic Police Mumbai server. First, we wrote .asp files to parse relevant information from these text messages at regular intervals.

We then matched the location keywords on the map and calculated traffic severity from the traffic status extracted from each message. Using Google's Java Maps API, we colored the matched path based on the severity of the traffic.

For example, if a text message said "heavy traffic on LBS way", LBS way was marked red; if a message said an oil tanker had stalled on a particular road with medium traffic, that road was marked yellow, and so on.
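The message-to-color step can be sketched as keyword rules applied to each update. This Python version is an illustration only (the original parsing ran in .asp files): the keyword patterns and the list of known road names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rules, most severe first: (regex over the lower-cased message, map color).
SEVERITY_RULES: List[Tuple[str, str]] = [
    (r"\bheavy traffic\b|\bjam\b|\bblocked\b", "red"),
    (r"\bmedium traffic\b|\bstalled\b|\bslow\b", "yellow"),
    (r"\bclear\b|\bsmooth\b", "green"),
]

# Hypothetical road names that the map layer knows how to draw.
KNOWN_ROADS = ["LBS way", "SV Road", "Western Express Highway"]

def parse_update(message: str) -> Optional[Tuple[str, str]]:
    """Return (road, color) for a traffic message, or None if no known road or severity is found."""
    text = message.lower()
    road = next((r for r in KNOWN_ROADS if r.lower() in text), None)
    if road is None:
        return None
    for pattern, color in SEVERITY_RULES:
        if re.search(pattern, text):
            return road, color
    return None

print(parse_update("Heavy traffic on LBS way"))                       # ('LBS way', 'red')
print(parse_update("Oil tanker stalled on SV Road, medium traffic"))  # ('SV Road', 'yellow')
```

Checking the most severe rule first means a message that mentions both a stalled vehicle and a jam is drawn red rather than yellow.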

Google Maps already provided different paths to a destination; because we added traffic information to the map as well, users could decide which path to take instead of getting caught in a bottleneck.

Today, Google provides traffic information for Mumbai, but back in 2011 this information was not available. We thought we could make a difference, hence this project.
The application also had add-on features such as railway schedules, bus routes and an auto-rickshaw/taxi fare calculator.

Presented a technical paper at the National Conference on 'Innovations in Electronics, Computers and IT' organized by Vidyalankar Institute of Technology.

  • Duration: Jan 2011 to May 2011
  • Tags: Google Java Maps API, J2ME, Mobile App, Traffic information
  • Client: Final Year Undergrad Project
  • Technical Paper: View Paper

Artificial Intelligence - Twitter bot

Under construction - implementing a Twitter bot that impersonates a real person by training a model on their past tweets.

  • Duration: September 2016 - ongoing
  • Tags: AI, Twitter bot, Impersonation, Training a model, feeding past tweets
  • Client: Personal project
  • Live View: Under Construction