What it takes to become a Data Scientist

This career sheet is based on an interview with Dr Alexander Vapirev, who works as an HPC [1] analyst and consultant at Facilities for Education, Research, Communication and Collaboration, ICTS services of KU Leuven, Belgium. After receiving a PhD in Physics, he worked as a computational physicist and researcher in computational sciences.

As more and more companies rely on data analytics to understand their audience, improve their services and make use of the information available to them, the demand for data scientists grows. The role of a data scientist is to organise, clean, process and make sense of the collected data. To do so, they use different tools and methods from statistics and programming. Companies hiring data scientists include search engines (e.g. Google, Yahoo, Bing), social media companies (e.g. Facebook, Twitter, LinkedIn), engineering-related companies (e.g. Intel, IBM, Boeing, HP), e-commerce and payment companies (e.g. Amazon, eBay, PayPal, Visa), banks and other finance-related companies (e.g. JP Morgan, HSBC) and many more.




  • Python Coding
    Python is the most common coding language required in data science roles, along with Java, Perl, or C/C++. Because of its versatility, you can use Python for almost all the steps involved in data science processes.
  • SQL Database/Coding
    As a data scientist, you need to be proficient in SQL, because it is specifically designed to help you access, communicate with and work on data stored in databases. Querying a database with SQL gives you insights directly, and its concise commands save time and reduce the amount of programming needed for difficult queries. Learning SQL will also help you to better understand relational databases and boost your profile as a data scientist.
  • Apache Spark
    Apache Spark is becoming one of the most popular big data technologies worldwide. It is a distributed computing engine that helps data scientists run complicated algorithms faster on large data sets.
  • Machine Learning and AI
    If you want to stand out from other data scientists, you need to know machine learning techniques such as supervised learning, decision trees and logistic regression. These skills will help you to solve data science problems that are based on predicting major organizational outcomes.
  • Data Visualization
    As a data scientist, you must be able to visualize data with the aid of data visualization tools such as ggplot, d3.js, Matplotlib and Tableau. These tools will help you to convert complex results from your projects into a format that is easy to understand for people who are not data-savvy.
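
As a small illustration of the Python and SQL skills listed above, the sketch below cleans a few records in Python and then lets a single SQL query do the grouping and averaging. The table name, column names and sample values are hypothetical and chosen only for this example; it uses SQLite from the Python standard library, not any particular production database.

```python
import sqlite3

# Hypothetical sample data: (customer, country, amount).
# None marks a missing value that has to be cleaned out first.
RAW_ORDERS = [
    ("alice", "BE", 120.0),
    ("bob", "BE", 80.0),
    ("carol", "NL", 200.0),
    ("dave", "NL", None),
    ("erin", "FR", 50.0),
]


def average_order_by_country(rows):
    """Clean the rows in Python, then let SQL group and average them."""
    # Cleaning step: drop records with a missing amount.
    clean = [row for row in rows if row[2] is not None]

    # An in-memory SQLite database keeps the example self-contained.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (customer TEXT, country TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean)
        # One declarative query replaces a hand-written grouping loop.
        cursor = conn.execute(
            "SELECT country, AVG(amount) FROM orders "
            "GROUP BY country ORDER BY country"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for country, avg_amount in average_order_by_country(RAW_ORDERS):
        print(f"{country}: {avg_amount:.2f}")
```

The same grouping could be written as a Python loop, but the SQL version states what is wanted rather than how to compute it, which is the time saving the SQL item above refers to.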


  • Presentation skills
    As a data scientist, you are going to have to, at some time or another, prepare and deliver a presentation. It can be a one-to-one type of presentation, or you might need to address small groups or even larger audiences.
  • Strong communication skills
    Communication skills can include anything from listening and understanding requirements to writing reports, sending emails and adapting the language you use to your audience.
  • Data intuition
    This is perhaps one of the most significant non-technical skills that a data scientist needs. Great data intuition means perceiving patterns that are not immediately obvious, which makes data scientists more efficient in their work. It is a skill which comes with experience and can be cultivated with time.


Which subjects are essential for a career?

Many different paths can lead to a career as a data scientist. At secondary school level, subjects like algebra, calculus and programming will help you obtain the basic skills that you will need for your next steps. Most students start at the undergraduate level, with Bachelor’s degrees in data science that can lead to jobs such as data visualization specialist, management analyst and market research analyst. From there, many students go on to Master’s degrees that can lead to roles such as machine learning algorithm developer, statistician or data engineer. Many students then pursue Doctorate degrees which, although not necessary, can lead to roles such as business solutions scientist, data scientist and enterprise science analytics manager.
Various certifications are also available, although these are usually intended for professionals wishing to change careers and move from another field into data science.


It is very common for young candidates to have all the skills needed for their first job but still have trouble getting hired. One way to work around this is to prepare a portfolio, which gives you the opportunity to show potential employers what you can actually do instead of just listing your skills and qualifications.
In the portfolio, you can include data science projects you have worked on either during your studies or in your own time. The projects should demonstrate your interest in data science, your methodology, creative thinking, writing skills and presentation of results.


According to the European Commission report “Realising the European Open Science Cloud” [2], 500,000 data scientists are needed for European open research data by 2020.
At the same time, the LinkedIn Workforce Report for the U.S. [3] calculated that in August 2018 employers were seeking 151,717 more data scientists than exist in the United States. The U.S. Bureau of Labor Statistics also reports that the rise of data science needs [4] will create 11.5 million job openings by 2026. The same report underlines that not only is there a huge demand, but there is also a noticeable shortage of qualified data scientists.

Q: If you could start all over again, how would you change your career path?
A: I would have made the same choices. It is the skills that define a person and their career path, not the diplomas. If you understand how things work, or you are at least willing to understand how things work and ask questions, then this is the path for you.

Dr Alexander Vapirev, HPC analyst and consultant, ICTS services of KU Leuven


Data science career shaping

Hy.p.a.t.i.a. – Hybrid Pupils’ Analysis Tool For Interactions In Atlas
HYPATIA is an event analysis tool for data collected by the ATLAS experiment of the LHC at CERN. Its goal is to allow high school and university students to visualize the complexity of hadron-hadron interactions through the graphical representation of ATLAS event data, and to interact with the events in order to study different aspects of the fundamental building blocks of nature. HYPATIA aims to show students how real high-energy physics research is done. It provides students with real data and an environment that closely resembles what actual researchers use, giving them the opportunity to conduct their own analysis and “discover” new particles.

*No company mentioned in the text supports this publication in any way, and we do not promote any specific tool; instead, we give examples of the most popular ones.

Sources of information:

1 High Performance Computing

2 https://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud
3 https://economicgraph.linkedin.com/resources/linkedin-workforce-report-august-2018
4 https://www.forbes.com/sites/louiscolumbus/2017/12/11/linkedins-fastest-growing-jobs-today-are-in-data-science-machine-learning/#7ffc9e9b51bd

All Career Material created by the Project is published under Creative Commons license, Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0).
