Data Science Projects
This page describes, in some detail, the data science projects I have recently completed.
To obtain more information about each project (including a detailed description and the Python code), hover your mouse over the pictures and click on the PROJECT buttons that appear.
To see just a list with very brief descriptions, click on the button ALL PROJECTS below this line.
MACHINE LEARNING PROJECTS
Retail Expansion Analysis with Lasso & Ridge Regressions
Using purchase data from Iowa liquor stores, regularized regression models are used to provide recommendations on where new stores should be opened.
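To give a feel for how the two penalties differ, here is a minimal sketch in pure Python for the simplest possible case: one centered feature and no intercept. It is not the project's code and does not use the Iowa data; the function names, the toy numbers, and the penalty strength `lam` are all assumptions made for illustration.

```python
def soft_threshold(value: float, lam: float) -> float:
    """Shrink value toward zero by lam, clipping at zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def ols_coef(x, y):
    """Unpenalized least-squares slope for a single feature, no intercept."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / sxx


def ridge_coef(x, y, lam):
    """Minimizer of sum((y - b*x)^2) + lam * b^2 for a single feature."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def lasso_coef(x, y, lam):
    """Minimizer of 0.5 * sum((y - b*x)^2) + lam * |b| for a single feature."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return soft_threshold(sxy, lam) / sxx


# Hypothetical toy data: x is centered, y is roughly 2 * x.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.8]

for lam in (0.0, 5.0, 25.0):
    print(lam, ols_coef(x, y), ridge_coef(x, y, lam), lasso_coef(x, y, lam))
```

In this one-feature setting, ridge shrinks the coefficient smoothly but never makes it exactly zero, while lasso sets it to exactly zero once the penalty is large enough. That is why lasso can double as a feature selector.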
Predicting Outcomes of Tennis Matches
The probability that the higher-ranked player will win a match is estimated using logistic regression, decision trees, and random forests.
Predicting Comments on Reddit using Ensemble Tree Models
In this project I identified the features of Reddit posts (obtained via web scraping) that contribute most to the number of comments. Natural Language Processing was then used to build features for a Random Forest Classifier.
Predicting Customer Churn
Churn rate is the rate at which users leave a product or service. In this project, the churn rate at a mobile phone company is predicted using Support Vector Machines and Random Forest Classifiers.
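As a small illustration of the definition, churn rate over a period can be computed as the share of starting customers who left. The sketch below assumes that simple definition; the function name and the figures are hypothetical and are not taken from the project's dataset.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of the customers present at the start of a period who left."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


# Hypothetical example: 150 of 2,000 subscribers cancel during a month.
rate = churn_rate(2000, 150)
print(f"{rate:.1%}")  # prints 7.5%
```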
Predicting Clicks on Digital Ads
In this project I predict the likelihood that a given online ad will be clicked. This type of model is useful for companies that include online advertising in their media mix.
Principal Components Analysis (PCA) and Wines
In this project I introduce the basic concepts of principal component analysis (PCA) and apply the technique to a dataset containing information about wines.
DEEP LEARNING PROJECTS
Painter Identification Using Convolutional Neural Networks
I built a Convolutional Neural Net to identify the artist of a painting via transfer learning, instantiating the convolutional part of the Inception V3 model, and training a fully-connected network on top (see slides).
Deep Learning, Bitcoin Price Data and Altcoins Correlations
I built predictive models for Bitcoin price data using recurrent neural networks (LSTMs). Correlations between altcoins are also considered.
Deep Learning with Keras and TensorFlow
A neural networks tutorial in which I build fully-connected networks and convolutional neural networks using both Keras and TensorFlow (in progress).
NATURAL LANGUAGE PROCESSING
Neural Language Models and Spinoza's Ethics
Spinoza's Ethics was used to build a language model for text generation with recurrent neural nets.
Sentiment Analysis of Movie Reviews
A "reverse sentiment analysis" using Bernoulli Naive Bayes was performed on movie reviews (already classified) to identify which words appear more frequently in reviews from each class.
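The idea can be sketched in a few lines of pure Python. A Bernoulli Naive Bayes model estimates, for each class, the probability that a word is present in a review; comparing those probabilities across classes ranks the words by how strongly they point to one class. This is a minimal sketch rather than the project's code: the function name, the `"pos"`/`"neg"` labels, the smoothing value `alpha`, and the four toy reviews are all assumptions.

```python
import math
from collections import Counter


def word_log_odds(docs, labels, alpha=1.0):
    """Log ratio of P(word present | "pos") to P(word present | "neg").

    Probabilities are Bernoulli (presence/absence per document) with
    Laplace smoothing. Assumes exactly two labels: "pos" and "neg".
    """
    n_docs = Counter(labels)
    doc_freq = {"pos": Counter(), "neg": Counter()}
    for text, label in zip(docs, labels):
        # set() so each word counts at most once per document
        doc_freq[label].update(set(text.lower().split()))

    vocab = set(doc_freq["pos"]) | set(doc_freq["neg"])
    scores = {}
    for word in vocab:
        p_pos = (doc_freq["pos"][word] + alpha) / (n_docs["pos"] + 2 * alpha)
        p_neg = (doc_freq["neg"][word] + alpha) / (n_docs["neg"] + 2 * alpha)
        scores[word] = math.log(p_pos) - math.log(p_neg)
    return scores


# Hypothetical toy corpus of already-classified reviews.
reviews = [
    "a great and moving film",
    "great acting and a great script",
    "a dull and boring film",
    "boring plot and dull acting",
]
labels = ["pos", "pos", "neg", "neg"]

scores = word_log_odds(reviews, labels)
ranked = sorted(scores, key=scores.get, reverse=True)
print("most positive:", ranked[:3])
print("most negative:", ranked[-3:])
```

Words with a large positive score appear mostly in positive reviews, words with a large negative score appear mostly in negative ones, and words such as "and" that occur equally in both classes score zero.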
Topic Modeling Applied to a Corpus of Research Papers
In topic modeling, topics are identified in a document or a set of documents. This project uses one specific method, Latent Dirichlet Allocation (LDA), and applies it to a corpus of research papers.
STATISTICAL ANALYSIS AND DATA EXPLORATION
Analysis of the Decline of Participation in Standardized Tests
The goal was to make suggestions to the College Board, which administers the SAT, to help it increase participation rates in its exams. I performed a comprehensive statistical analysis of both SAT and ACT participation rates.