Portfolio

Texas hospital price transparency (Python)

US hospitals must post their prices online, but file formats vary widely from one hospital to the next. The first goal was to find the files on the hospitals' websites, clean and transform the data, and load it into a PostgreSQL database for analysis.

The second goal was to identify the hospital group with the best relative prices, using almost 1 million data points covering the most common procedures.

Collect, clean, transform and analyze data
Python, Docker, PostgreSQL, Flyway

Project link

ULB credit card fraud prediction (Python)

This data set includes over 280,000 credit card transactions made in September 2013 by European cardholders.

The goal is to predict whether a transaction is fraudulent based on the 30 variables provided. Most of the variables are principal components obtained by applying PCA to the original data.

Classification
Logistic Regression, LDA, Random Forest, Gradient Boosting

Project link

ULB credit card fraud prediction (R)

I redid the ULB credit card fraud prediction project in R for practice and to compare the R and Python implementations.

Classification
Logistic Regression, SVM, Random Forest, Gradient Boosting

Report in pdf
GitHub link

Lending Club loans (Python)

This was my entry in the Workspace competition at DataCamp. The data set contains almost 10,000 loans issued by Lending Club.

There were multiple goals for this project: explore and visualize the data, extract valuable insights, estimate the time it takes borrowers to pay back their loans, and explore the different types of customers who take loans for various purposes.

As a final step, the goal was to build a model to predict whether a loan would be paid back in full.

Classification
Logistic Regression, Random Forest, Gradient Boosting

Project link

UK used cars market (Python)

This is the project I presented to earn the Data Scientist Professional certification from DataCamp. The data set contains price information for 6,700 used cars in the UK.

The goal is to estimate the price of a used car to within £1,500 using the eight variables provided.
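One way to read the £1,500 target is as a bound on prediction error. The sketch below assumes that reading and shows two ways to check it: the mean absolute error, and the share of predictions that land within the tolerance. The function names and the prices are made up for illustration and are not taken from the project.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted prices, in pounds."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def share_within(actual, predicted, tolerance=1500):
    """Fraction of predictions within `tolerance` pounds of the actual price."""
    hits = sum(abs(a - p) <= tolerance for a, p in zip(actual, predicted))
    return hits / len(actual)


# Hypothetical prices, for illustration only (not project data).
actual = [12000, 8500, 15250, 6000]
predicted = [11000, 9900, 15000, 8000]

mae = mean_absolute_error(actual, predicted)   # absolute errors: 1000, 1400, 250, 2000
within = share_within(actual, predicted)       # three of the four errors are <= 1500
print(f"MAE: £{mae:,.2f}, within £1,500: {within:.0%}")
```

The two numbers answer different questions: the average can sit under £1,500 while individual cars are still priced well outside it, as the last car in this toy example shows.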

Regression
Regularized Regression, Random Forest, Gradient Boosting

Project link

Budgeting Web App (Flask, Python, SQL)

A budgeting app where users track their expenses using the envelope method. It is inspired by Goodbudget, an excellent commercial application.
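For readers unfamiliar with envelope budgeting: money is split into named envelopes, and every expense is drawn from exactly one of them. The sketch below is a minimal in-memory illustration of that idea. The class and method names are hypothetical and do not reflect the app's actual code or database schema.

```python
from dataclasses import dataclass, field


@dataclass
class Envelope:
    """One spending category with a fixed amount set aside for it."""
    name: str
    budgeted: float
    spent: float = 0.0

    @property
    def remaining(self) -> float:
        return self.budgeted - self.spent


@dataclass
class Budget:
    """A collection of envelopes keyed by name (hypothetical structure)."""
    envelopes: dict = field(default_factory=dict)

    def add_envelope(self, name: str, budgeted: float) -> None:
        self.envelopes[name] = Envelope(name, budgeted)

    def record_expense(self, name: str, amount: float) -> float:
        """Charge an expense to one envelope and return what is left in it."""
        envelope = self.envelopes[name]
        envelope.spent += amount
        return envelope.remaining


budget = Budget()
budget.add_envelope("Groceries", 400.0)
budget.add_envelope("Transport", 100.0)

left = budget.record_expense("Groceries", 120.0)
print(f"Groceries remaining: {left:.2f}")
```

A negative `remaining` value signals an overspent envelope; whether to block such an expense or merely flag it is a design choice this sketch leaves open.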

JavaScript, Python, Flask, and SQL

Project video

MovieLens movie recommendation system (R)

A movie recommendation system using the MovieLens data set.

This data set contains 10 million ratings of more than 10,000 movies given by about 70,000 users. It was a guided project: we were given some code to set up the data set and based the recommendation system on an example from the course series.

Recommendation System
Regularization
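The listing does not spell out the model, so the following is only a sketch of one common way regularization is applied to rating data: a penalized per-movie effect, where each movie's average deviation from the overall mean is shrunk toward zero by a penalty `lam`. It is written in Python rather than R, and the function name, the penalty value, and the ratings are all assumptions made for illustration.

```python
from collections import defaultdict


def regularized_movie_effects(ratings, lam):
    """Return the overall mean rating and a shrunken effect per movie.

    ratings: list of (user_id, movie_id, rating) tuples.
    lam: penalty; larger values pull effects of rarely rated movies toward 0.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    residual_sums = defaultdict(float)
    counts = defaultdict(int)
    for _, movie, rating in ratings:
        residual_sums[movie] += rating - mu
        counts[movie] += 1
    # Penalized estimate: sum of residuals divided by (count + lam).
    effects = {m: residual_sums[m] / (counts[m] + lam) for m in residual_sums}
    return mu, effects


# Toy ratings, for illustration only (not MovieLens data).
ratings = [
    ("u1", "A", 5.0),
    ("u2", "A", 5.0),
    ("u3", "A", 5.0),
    ("u1", "B", 1.0),
]

mu, effects = regularized_movie_effects(ratings, lam=1.0)
predicted_b = mu + effects.get("B", 0.0)
print(mu, effects, predicted_b)
```

In this toy data, movie B has a single rating, so its effect is halved by the penalty, while movie A, with three ratings, keeps three quarters of its unpenalized effect. That asymmetry is the point of the technique: estimates backed by few ratings are trusted less.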

Report in pdf
GitHub link

DataCamp projects (Python)

I have taken many DataCamp courses. These are all the projects I have completed.

Data Manipulation
Data Visualization
Importing and Cleaning Data
Machine Learning

DataCamp projects

White wine & red wine (Python)

There are two data sets, one for white wine (4,898 wines) and one for red wine (1,599 wines). The goal is to create a model that estimates the subjective rating of wine as graded by experts, using the physicochemical properties of the wine in the data set.

Regression
Regularized Regression, Random Forest, Gradient Boosting

White wine link
Red wine link