I have started taking the courses at Kaggle. It is a fantastic website. Thank you, Google, for keeping it free.

At first, I was intimidated by how fast the courses jump into machine learning: you build a model in the first lesson of the first course. The pace is quick, but I think it helps to start getting used to the concepts early, and I can go deeper in other classes or practice in the competitions later.

I was also watching a video by Joma Tech where he talks about the whole data science pipeline, what he calls the Data Science Hierarchy of Needs:

Data Science Hierarchy of Needs

The video explains how the job title of data scientist, analyst, or engineer can mean very different things at different companies, and how there is often a gap between what data science articles and videos show and what a company actually requires.

In a different video, Krish Naik talks about his version of the data science ecosystem. It helped me get an overview, see which areas I am attacking in my learning, and what other topics/tools I will need to focus on in the future. Here is his list:

  • Programming Language
    • Python
    • R
    • Java
  • Web Scraping
    • Beautiful Soup
    • Scrapy
    • urllib
  • Data Analysis
    • Feature Engineering
    • Data Wrangling
    • Exploratory Data Analysis
  • Data Visualization
    • Tableau
    • Power BI
    • Matplotlib, ggplot, Seaborn
  • Machine Learning
    • Classification
    • Regression
    • Reinforcement Learning
    • Deep Learning
    • Dimensionality Reduction
    • Clustering
  • IDE
    • PyCharm
    • Jupyter
    • Spyder
    • RStudio
    • Visual Studio
  • Math
    • Statistics
    • Linear Algebra
    • Differential Equations
  • Deploy
    • AWS
    • Azure