Python is gaining an ever larger footprint in the interesting world of Data Science. According to KDnuggets, Python has even overtaken R as the leading platform for AI and Machine Learning. But what makes Python so special for data science? One of the reasons is that Python is a general-purpose programming language. This means that Python is not tied to one specific purpose, which makes it possible to use models directly in a broader context. R, on the other hand, is not a general-purpose language in the same sense, since its main goal is statistical modeling. Another reason for Python's popularity is that it is an easy language to learn, so even non-programmers (like me) can pick it up quickly.

Is it better to learn only Python and stop working with R? No! Python is complementary to R. R is strong in both explanatory and predictive analysis, while Python’s focus is really on the predictive side: Machine Learning, Data Mining and Artificial Intelligence applications are more likely to be written in Python than in R. TensorFlow, the Deep Learning framework designed by Google, is also used primarily from Python. The fact that Python is overtaking R in Machine Learning does not mean that Python replaces R everywhere; especially when it comes to making predictions, R is still much larger than Python.

How do you learn Python?

Learning Python

You can also see the popularity of the language in the number of courses that are currently available. Coursera now has 115 courses and specializations with Python, many universities and websites have made online courses available, and there are plenty of books. You can no longer see the forest for the trees. So where do you start?

Learning a programming language can be compared to learning a new spoken language. You can study it, but you only really get the hang of it by practicing a lot. We therefore recommend that you start with one of these two courses: Codecademy or Udemy. These courses have a low barrier to entry and a logical structure. When you’re done with one of them, just get started. Go build things with Python: for once, build a model in Python instead of in the tool you’re used to. Or, for example, build a web scraper that automatically searches for the cheapest tickets with the shortest travel times.
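To give an idea of how small such a first project can be, here is a minimal sketch of the ticket idea. It uses only the standard library and runs on a made-up HTML snippet rather than a real website. The markup, the `data-price` and `data-minutes` attributes, and the names `TicketParser` and `find_best_ticket` are all assumptions for illustration. A real scraper would have to follow the markup of the site in question, and would probably use one of the web scraping packages listed further down.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real site will have its own markup.
SAMPLE_HTML = """
<ul>
  <li class="ticket" data-price="129.00" data-minutes="95">Airline A</li>
  <li class="ticket" data-price="89.50" data-minutes="140">Airline B</li>
  <li class="ticket" data-price="89.50" data-minutes="110">Airline C</li>
</ul>
"""


class TicketParser(HTMLParser):
    """Collects every <li class="ticket"> element as a small dictionary."""

    def __init__(self):
        super().__init__()
        self.tickets = []
        self._current = None  # ticket being read, or None outside a ticket

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "ticket":
            self._current = {
                "name": "",
                "price": float(attrs["data-price"]),
                "minutes": int(attrs["data-minutes"]),
            }

    def handle_data(self, data):
        # Text inside the <li> is treated as the ticket name.
        if self._current is not None:
            self._current["name"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "li" and self._current is not None:
            self.tickets.append(self._current)
            self._current = None


def find_best_ticket(html):
    """Return the cheapest ticket; ties are broken by the shortest time."""
    parser = TicketParser()
    parser.feed(html)
    parser.close()
    if not parser.tickets:
        return None
    return min(parser.tickets, key=lambda t: (t["price"], t["minutes"]))


if __name__ == "__main__":
    print(find_best_ticket(SAMPLE_HTML))
```

Sorting on the tuple `(price, minutes)` means price is compared first and travel time only decides between equally priced tickets. Swap the two if time matters more to you.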

While building, keep Google close at hand. 99.9% of the things you run into have already been asked by others, and there are usually good answers on websites such as Stack Overflow. In addition, the following two books are very useful as reference works: Python Data Science Handbook by Jake VanderPlas and Python for Data Analysis by Wes McKinney. Both books explain everything clearly and simply, with a lot of example code.

In addition to these starting points, there are some other tips that we can give you:

  • Install Anaconda instead of the normal Python installers. Anaconda comes with many packages preinstalled and is therefore easy to use.
  • Use Jupyter Notebooks (installed with Anaconda) instead of the standard Python interpreter. The advantage of notebooks is that you can work interactively.
  • Handy packages that you will want to install anyway when you get started with Machine Learning are:
    • General must-haves: numpy, scipy, pandas
    • Machine Learning: statsmodels, scikit-learn
    • Deep Learning: TensorFlow, Theano, Keras
    • Visualization: matplotlib, plotly, ggplot
    • Web scraping: Selenium, Beautiful Soup, scrapy
    • Text Mining / NLP: NLTK, Gensim
    • Creating user interfaces: Flask (web), PyGame, Tkinter
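
If you are not sure which of these packages are already on your machine, a short script can tell you. The sketch below uses only the standard library. The function name `check_packages` and the selection of packages are our own choices for illustration. Note that the name you import is not always the name you install: scikit-learn is imported as `sklearn` and Beautiful Soup as `bs4`.

```python
from importlib.util import find_spec

# Display name -> name used in an import statement.
PACKAGES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "statsmodels": "statsmodels",
    "scikit-learn": "sklearn",
    "TensorFlow": "tensorflow",
    "matplotlib": "matplotlib",
    "plotly": "plotly",
    "Selenium": "selenium",
    "Beautiful Soup": "bs4",
    "scrapy": "scrapy",
    "NLTK": "nltk",
    "Gensim": "gensim",
    "Flask": "flask",
    "Tkinter": "tkinter",
}


def check_packages(packages):
    """Return {display name: True/False} without importing the packages."""
    return {
        name: find_spec(module) is not None
        for name, module in packages.items()
    }


if __name__ == "__main__":
    for name, installed in check_packages(PACKAGES).items():
        status = "installed" if installed else "missing"
        print(f"{name:15} {status}")
```

Because `find_spec` only looks a package up and does not load it, the check stays fast even for heavy packages such as TensorFlow.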

Did we miss any interesting resources that are ideal for learning Python for Data Science? Let us know!