John Bica

A creative data scientist, programmer, and analytics consultant interested in applying machine learning methods and statistical techniques to drive insights and performance and to tackle business problems with a data-supported approach.

Get in touch:


I'm John, a data scientist and analytics senior consultant currently working in EY's Data, AI, & Automation practice, specializing in advanced data analytics for digital ecosystems. Current and past projects include A/B testing, data visualization, ad hoc insights and analysis, and predictive modeling. Prior to rejoining EY, I completed my MS in Data Science at Northeastern University. I'm skilled in Python, R, SQL, and various data visualization tools and frameworks, and I'm familiar with machine learning applications as they pertain to predictive analytics, optimization techniques, and classification/regression use cases. I also have experience with Google Cloud Platform and AWS, and I enjoy picking up new skills and working on exciting, useful projects that broaden my programming and data analysis toolkit.

My past volunteer projects include working with the UCLA School of Law on their Covid-19 Behind Bars Data Project to scrape historical data on death rates in the US prison and jail systems and the impact Covid-19 has on these populations, as well as building d3.js dashboards to help another organization track its PPE requests and donation needs.

Let's connect!


Programming Languages & Tools
Other Tools & Frameworks
  • Microsoft SQL Server Management Studio
  • Pandas, NumPy, SciPy
  • TensorFlow, Scikit-learn
  • Dash-Plotly
  • R Shiny
  • Google Cloud Platform (GCP)
  • Databricks
  • Adobe Analytics
  • Tableau
  • Jupyter Notebook


Churn Analysis Customers Dashboard

An interactive dashboard hosted on Tableau Public analyzing the Telco Customer Dataset, an open-source dataset commonly used for churn analysis and predictive modeling for subscription-based or SaaS businesses.

Health Tweet Topic Modeling

An interactive dashboard built with Dash-Plotly, a Python framework, that explores topic modeling approaches such as Latent Dirichlet Allocation and the Gibbs Sampling Dirichlet Mixture Model on scraped health-related tweets from 2014 to 2020.

Google Natural Question Answering System

An implementation of an ensemble of Bidirectional Encoder Representations from Transformers (BERT) models submitted to Google's Natural Question Answering Challenge. The BERT models were fine-tuned and trained with TensorFlow on TPUs on Google Cloud Platform (GCP). A Docker image of the models was submitted to Google and placed among the top 10 entries for both the long and short answer tasks.

Distributed Matrix Factorization

A distributed implementation of matrix factorization using alternating least squares (ALS), written in Scala on Apache Spark. The Netflix movie rating data set is used to predict ratings of unseen movies for users, which can drive recommendations through collaborative filtering.

Spanish High Speed Rail - Price Prediction

Machine learning regression models implemented in Python using scikit-learn and Jupyter Notebook. The goal is to predict train ticket prices on Spain's high-speed rail network across five major cities to determine the best time to buy a fare.

PageRank Distributed Algorithm

A Scala implementation of Larry Page's famous PageRank algorithm. The algorithm was run on a simplified synthetic graph, including dangling nodes, that consists of k linear chains, each with k vertices.
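
As a rough illustration of that graph structure (this is not the project's Scala code), a minimal single-machine Python sketch might look like the following. The function names, the damping factor of 0.85, the iteration count, and the choice to spread dangling-node mass uniformly over all vertices are assumptions made for the example, not details taken from the original implementation.

```python
def build_chain_graph(k):
    """Build k linear chains of k vertices each.

    Returns a dict mapping each vertex to its single successor,
    or None for the last vertex of a chain (a dangling node).
    Vertices are numbered 1..k*k (a numbering assumed for this sketch).
    """
    graph = {}
    for chain in range(k):
        for pos in range(k):
            v = chain * k + pos + 1
            graph[v] = v + 1 if pos < k - 1 else None
    return graph


def pagerank(graph, damping=0.85, iterations=10):
    """Iterative PageRank; dangling mass is redistributed uniformly (assumed)."""
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        # Total rank currently held by vertices with no outgoing edge.
        dangling_mass = sum(ranks[v] for v, succ in graph.items() if succ is None)
        # Every vertex receives the random-jump share plus its share of dangling mass.
        new_ranks = {
            v: (1 - damping) / n + damping * dangling_mass / n for v in graph
        }
        # Each non-dangling vertex passes all of its rank to its one successor.
        for v, succ in graph.items():
            if succ is not None:
                new_ranks[succ] += damping * ranks[v]
        ranks = new_ranks
    return ranks


ranks = pagerank(build_chain_graph(3))
for vertex in sorted(ranks):
    print(vertex, round(ranks[vertex], 4))
```

In this sketch the ranks always sum to 1, and vertices further along a chain accumulate more rank because each one inherits the damped rank of its predecessor.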


Apart from my professional and academic endeavors, I enjoy spending most of my time outdoors. In the winter, I am an avid skier. During the warmer months, I enjoy hiking, running, visiting new national parks, and playing volleyball or soccer with friends.

When forced indoors, I enjoy reading life-skills and knowledge-seeking books on a variety of topics, as well as picking up new foreign languages. I'm currently teaching myself the harmonica and improving my Spanish, which I look forward to using more in the workplace someday.