This page describes, in some detail, several projects I have recently completed.

To obtain more information about a project (including a detailed description and the Python code), hover the mouse over its picture and click on the PROJECT button that appears.

To see just a list with very brief descriptions, click on the button ALL PROJECTS below this line.


Retail Expansion Analysis with Lasso & Ridge Regressions

Using purchase data from Iowa liquor stores, regularized regression models are used to provide recommendations on where new stores should be opened.

Predicting Outcomes of Tennis Matches

The probability that the higher-ranked player will win a match is estimated using logistic regression, decision trees, and random forests. 

Predicting Comments on Reddit using Ensemble Tree Models

Here, the features of Reddit posts (obtained via web scraping) that contribute most to the number of comments were identified. Natural Language Processing was then used to feed data into a Random Forest Classifier.

Predicting Customer Churn

Churn rate is the rate at which users leave a product or service. In this project, the churn rate at a mobile phone company is predicted using Support Vector Machines and Random Forest Classifiers.
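
As a quick illustration of the definition above, the sketch below computes a churn rate for a single period. The function name and the customer counts are hypothetical and are not taken from the project's dataset.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of customers who left during a period.

    Both arguments are hypothetical inputs chosen for illustration.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


# Hypothetical example: 150 of 2000 subscribers cancel during a month.
rate = churn_rate(2000, 150)
print(f"Monthly churn rate: {rate:.1%}")
```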

Predicting Clicks on Digital Ads

In this project I predict the likelihood that a given online ad will be clicked. This type of model is useful for companies that include online advertising in their media mix.

Principal Components Analysis (PCA) and Wines

In this project I introduce the basic concepts of principal component analysis (PCA) and apply the technique to a dataset containing information about wines.



Painter Identification Using Convolutional Neural Networks

I built a Convolutional Neural Net to identify the artist of a painting via transfer learning, instantiating the convolutional part of the Inception V3 model, and training a fully-connected network on top (see slides).

Deep Learning, Bitcoin Price Data and Altcoins Correlations

I built predictive models for Bitcoin price data using recurrent neural networks (LSTMs). Correlations between altcoins are also considered.

Deep Learning with Keras and TensorFlow 

A neural networks tutorial in which I build fully-connected networks and convolutional neural networks using Keras and TensorFlow (in progress).


Analysis of Geospatial Data

Geospatial or GIS data is data that identifies geographical locations. In this notebook I explore simple applications of this type of data.


Neural Language Models and Spinoza's Ethics

Spinoza's Ethics was used to build a language model for text generation with recurrent neural nets.

(Another) Sentiment Analysis of Movie Reviews

A "reverse sentiment analysis" using Bernoulli Naive Bayes was performed on movie reviews (already classified) to identify which words appear more frequently in reviews from each class.


Credit Card Fraud Prediction

In this notebook I use deep learning (i.e., artificial neural networks) to predict the probability of fraud in credit card transaction data. The dataset was downloaded from Kaggle. The code is written in Python and uses the TensorFlow library.

Algorithmic Trading with Python

Introductory tutorial showing how to use Python and its libraries to build simple strategies for algorithmic trading. Backtesting is also discussed.

Linear Optimization using Python

This notebook shows how to write generic, complex optimization problems using PuLP, a Python-based linear optimization library. PuLP is particularly useful in cases where the constraints and/or the objective function involve products of quantities containing a large number of indices.


Dynamic Pricing Applied to Hotel Management

In this notebook I explain the concept of dynamic pricing in the context of the hospitality industry. In hotel management in particular, dynamic pricing is the strategy of adjusting prices as a function of factors such as room availability and demand (occupancy), among others.
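
To make the idea concrete, here is a minimal sketch of one possible occupancy-based pricing rule. It is not the model used in the notebook: the function name, the linear form, and every parameter (target occupancy, sensitivity, floor and ceiling multipliers) are assumptions chosen for illustration.

```python
def dynamic_price(base_price: float,
                  occupancy: float,
                  target_occupancy: float = 0.7,
                  sensitivity: float = 0.5,
                  floor: float = 0.8,
                  ceiling: float = 1.5) -> float:
    """Adjust a room price according to current occupancy.

    Hypothetical rule: the price multiplier moves linearly with the gap
    between current and target occupancy, and is clamped to the range
    [floor, ceiling] so prices never swing too far from the base price.
    """
    if not 0.0 <= occupancy <= 1.0:
        raise ValueError("occupancy must be a fraction between 0 and 1")
    multiplier = 1.0 + sensitivity * (occupancy - target_occupancy)
    multiplier = max(floor, min(ceiling, multiplier))
    return round(base_price * multiplier, 2)


# High demand pushes the price above the base rate; low demand pulls it down.
print(dynamic_price(100.0, occupancy=0.9))
print(dynamic_price(100.0, occupancy=0.3))
```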


Topic Modeling Applied to a Corpus of Research Papers

In topic modeling, topics are identified in a document or a set of documents. This project applies one specific method, Latent Dirichlet Allocation (LDA), to a corpus of research papers.

Clustering Algorithms Applied to Data with Mixed Types (In R)

In this project, clustering algorithms are applied to a data set containing both categorical and continuous variables. In such cases, commonly used distance measures (e.g. Euclidean) cannot be applied.
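
One common way to measure distance between records with mixed types is a Gower-style distance. The sketch below is an illustration of that idea only; it is written in Python rather than R, it is not necessarily the measure used in the project, and the feature names and ranges are hypothetical.

```python
def gower_distance(a: dict, b: dict, numeric_ranges: dict) -> float:
    """Gower-style distance between two records with mixed feature types.

    a, b            : records mapping feature name -> value
    numeric_ranges  : for each numeric feature, its range (max - min) over
                      the dataset; features not listed here are treated as
                      categorical.
    """
    total = 0.0
    for feature in a:
        if feature in numeric_ranges:
            # Numeric feature: absolute difference scaled to [0, 1].
            value_range = numeric_ranges[feature]
            if value_range > 0:
                total += abs(a[feature] - b[feature]) / value_range
        else:
            # Categorical feature: 0 if the values match, 1 otherwise.
            total += 0.0 if a[feature] == b[feature] else 1.0
    return total / len(a)


# Hypothetical customer records and dataset-wide ranges.
x = {"age": 30, "income": 50000, "plan": "basic"}
y = {"age": 40, "income": 50000, "plan": "premium"}
ranges = {"age": 50, "income": 100000}
print(gower_distance(x, y, ranges))
```

Each feature contributes a value between 0 and 1, so categorical and continuous variables are placed on a comparable scale before averaging.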



Vector Error Correction Model (VECM): Applications in Finance

Error correction models estimated with the Engle–Granger method suffer from several limitations. The VECM addresses many of those issues. In this project, VECMs are used for pairs trading of stocks.

Bayesian Structural Time-Series (BSTS) Models

In this project, Bayesian structural time-series (BSTS) models are used to analyze the impact of the 2010 BP oil spill on BP's stock price. The analysis uses the CausalImpact library in R.


Many-Body Localization (MBL)

When disorder and interactions are both present in an isolated quantum system, a many-body localized (MBL) phase can emerge. Here I used the QuSpin package to study the adiabatic ramping of parameters for a spin chain in a disordered field.



Analysis of the Decline of Participation in Standardized Tests

The goal was to make suggestions to the College Board, which administers the SAT, to help it increase participation rates in its exams. I performed a comprehensive statistical analysis of both SAT and ACT participation rates.

Exploratory Analysis of the West Nile Virus Problem Dataset

In this notebook I performed a detailed exploratory data analysis (EDA) of the dataset from the Kaggle West Nile Virus problem. This project is ongoing and progress can be tracked here.



Under construction

Automated Theorem Proving

Towards a unified approach to model interpretation in Machine Learning

Information Extraction from Scientific Articles