Getting Started With Data Science

Imaad Mohamed Khan
6 min read · Jul 13, 2019

Hello World! It’s been a very very long time since my last post. I’ve been trying to find time, but somehow haven’t been able to. Anyway, now that I’m back, let me jump right into the topic!

Over the past many months, I’ve received hundreds of messages from people asking me how they could get started with Data Science. Now, I’ve been trying to reply to many of these messages, but increasingly I’ve been finding it difficult to keep up. Therefore, I thought it would be useful to write down a framework for those wanting to get started with Data Science.

One of the many messages I’ve received

Data Science is in general an umbrella term. A lot of different fields have come together to contribute to the term ‘Data Science’. As a result, there is a lot of confusion out there. In fact, there are many different ways to actually get started with Data Science.

One thing that struck me when answering the messages that I get was the background of the people asking me questions. From students to engineers to accounting executives, there are a lot of people from diverse backgrounds wanting to get into Data Science. That’s great for the field! But it’s difficult for me to offer generic advice that works both for someone who has never programmed and for someone who’s been programming all their life! That’s why I’ve come up with this framework for getting started with Data Science based on the background of the person.

It’s my hope that this framework will help you identify the path you can possibly take. With this said, please go ahead and read the sections of the blog post that are most relevant to you. Happy learning!

Note that this is a living document and I’ll keep updating it and adding resources as I find them useful.

Framework based on background
Note: Some points are repeated in different sections. Go directly to the section that’s most relevant to your background.

1. Developer/Programmer/Software Engineer/Coder
Developers dream in code

a. You perhaps already know how to program or code and have been doing it for the past many years. You may be primarily using Java/C++ at work.
b. Great! So you can now get started with either Python or R. It shouldn’t take you long since you have already been programming. Work on acquiring specific skills in data acquisition, building data pipelines, data munging and data visualization. Learning about databases and SQL will also be extremely useful.
c. Work on your math skills. Try to understand the basics of statistics and probability. This is important for you to be able to grasp concepts in ML. It is also important for performing experiments, determining causality, etc.
d. Work on understanding Machine Learning algorithms. Try to understand the conceptual differences between supervised and unsupervised, classification and regression etc.
e. Once you’ve understood some theory, start implementing. This is important. There’s a lot of theory out there, so make sure you don’t drown yourself only in theory. Start implementing small projects that use the knowledge you’ve gained above. If you get stuck with some concept, go back to theory.
f. To implement stuff, you’ll have to think of a project idea that is unique. Don’t build another MNIST classifier. It won’t help you stand out. Observe the world around you and try and see if you can apply your knowledge to improve your surroundings. Learn how to use Python/R well enough to build. Make yourself aware of all the possible libraries or packages you can use to implement your projects quickly.
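
To make points (b) and (c) a little more concrete, here is a minimal sketch of what "SQL plus basic statistics" can look like in Python. It uses only the standard library (`sqlite3` and `statistics`). The table name, the column names and the temperature readings are all made up for illustration, so treat it as a starting point rather than a recommended pipeline.

```python
import sqlite3
import statistics

# Hypothetical data: (city, temperature in Celsius) readings, invented for this example.
READINGS = [
    ("Bangalore", 24.0),
    ("Bangalore", 26.0),
    ("Chennai", 31.0),
    ("Chennai", 33.0),
    ("Chennai", 35.0),
]


def average_by_city(rows):
    """Load rows into an in-memory SQLite table and average them per city with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (city TEXT, temp REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT city, AVG(temp) FROM readings GROUP BY city ORDER BY city"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


averages = average_by_city(READINGS)

# Cross-check one city in plain Python and also look at the spread of its readings.
chennai = [temp for city, temp in READINGS if city == "Chennai"]
print(averages)
print(statistics.mean(chennai), statistics.stdev(chennai))
```

If you come from Java/C++, the SQL part will feel familiar from any JDBC-style work; the new part is usually the statistics, so spend your time there.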

2. Researcher/PhD/STEM Professional

Black Hole Researcher Dr. Bouman used Python for her research

a. You probably have a PhD/post-doc or are a researcher (PhD student or otherwise) in a STEM field and have been working on your research projects and now want to move into the industry as a Data Scientist.
b. You have likely already been programming in Python or R. This is great! You may be lacking in the database side of things. Read up on how data is actually stored in databases, the different types of databases, how to query them (SQL), etc. Apart from this, you might have to work on your programming skills to write code that is maintainable, scalable and efficient. Learn the best practices of your language. Also, learn a version control system like Git. That will help you organise your code much better!
c. Your math skills are also probably there. You’ve been working on research so it’s likely that you have some background in stats and probability. Just refresh on the basics.
d. Work on understanding Machine Learning algorithms. Try to understand the conceptual differences between supervised and unsupervised, classification and regression etc.
e. Once you’ve understood some theory, start implementing. This is important. There’s a lot of theory out there, so make sure you don’t drown yourself only in theory. Start implementing small projects that use the knowledge you’ve gained above. If you get stuck with some concept, go back to theory.
f. To implement stuff, you’ll have to think of a project idea that is unique. Don’t build another MNIST classifier. It won’t help you stand out. Observe the world around you and try and see if you can apply your knowledge to improve your surroundings. Learn how to use Python/R well enough to build. Make yourself aware of all the possible libraries or packages you can use to implement your projects quickly.
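
The conceptual difference in point (d) is easier to see in code than in words. The sketch below is a toy illustration in plain Python, not how you would do it in practice (you would normally reach for a library). The "hours studied" numbers and the pass/fail labels are invented. The supervised function learns from examples that come with labels; the unsupervised one only sees raw numbers and has to find the groups by itself.

```python
def nearest_neighbor_predict(labeled_points, x):
    """Supervised: predict the label of x from the closest labeled example."""
    _, closest_label = min(labeled_points, key=lambda point: abs(point[0] - x))
    return closest_label


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two clusters (k-means with k=2)."""
    low, high = min(values), max(values)  # initial cluster centers
    if low == high:
        raise ValueError("need at least two distinct values to form two clusters")
    for _ in range(iterations):
        # Assign every value to the nearer center, then move each center to its cluster mean.
        left = [v for v in values if abs(v - low) <= abs(v - high)]
        right = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(left) / len(left)
        high = sum(right) / len(right)
    return sorted(left), sorted(right)


# Supervised: hypothetical (hours studied, outcome) pairs with known labels.
labeled = [(1.0, "fail"), (2.0, "fail"), (8.0, "pass"), (9.0, "pass")]
print(nearest_neighbor_predict(labeled, 7.0))

# Unsupervised: the same kind of numbers, but with no labels at all.
unlabeled = [1.0, 2.0, 1.5, 8.0, 9.0, 8.5]
print(two_means(unlabeled))
```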

3. Non-Engineer/Non-STEM background professional

Lawyer/Accountant/Generic Suit Wearing Professional

a. You are not an engineer. You haven’t worked in a field related to science, technology or mathematics. You may have had a commerce background and may now be working in a bank. You may have studied arts or any other field which isn’t STEM.
b. If you have no programming background, start learning how to program with Python or R. My personal favorite is Python, but R has just as many supporters. Learn how to write basic programs. Then graduate to more complex concepts like inheritance, multiprocessing, etc. Learning these is not necessary in the beginning, but it becomes important when you want to start working. Anyway, just start with writing small programs and take it from there. Again, try to learn more about how to read data, manipulate data, visualize data, etc. There are tons of things you can do with programming, so make sure you’re learning things related to data.
c. You probably haven’t touched math since maybe high school. That’s alright! Start learning simple concepts in Linear Algebra like how matrices work, how to add them, multiply them, etc. Refresh your statistics and probability knowledge. Make sure you understand the concepts and are not just memorizing them.
d. Work on understanding Machine Learning algorithms. Select the most commonly used algorithms and work on learning how they function. Try to understand the conceptual differences between supervised and unsupervised, classification and regression etc.
e. Once you’ve understood some theory, start implementing. This is important. There’s a lot of theory out there, so make sure you don’t drown yourself only in theory. Start implementing small projects that use the knowledge you’ve gained above. If you get stuck with some concept, go back to theory.
f. To implement stuff, you’ll have to think of a project idea that is unique. Don’t build another MNIST classifier. It won’t help you stand out. Observe the world around you and try and see if you can apply your knowledge to improve your surroundings. Learn how to use Python/R well enough to build. Make yourself aware of all the possible libraries or packages you can use to implement your projects quickly.
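
If you’re following points (b) and (c), a nice first exercise is to combine them: write the matrix operations yourself before using a library. Here’s a small sketch in plain Python. The function names `mat_add` and `mat_mul` and the two example matrices are my own choices for illustration, and the code assumes the matrices have compatible shapes.

```python
def mat_add(a, b):
    """Add two matrices of the same shape, element by element."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mat_mul(a, b):
    """Multiply a by b: each entry is the dot product of a row of a with a column of b."""
    columns_of_b = list(zip(*b))
    return [
        [sum(x * y for x, y in zip(row, col)) for col in columns_of_b]
        for row in a
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

print(mat_add(A, B))  # [[6, 8], [10, 12]]
print(mat_mul(A, B))  # [[19, 22], [43, 50]]
print(mat_mul(B, A))  # [[23, 34], [31, 46]]
```

Notice that `mat_mul(A, B)` and `mat_mul(B, A)` give different answers. Matrix multiplication depends on the order, which is exactly the kind of thing that sticks once you’ve worked through it yourself.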

Alright! Hope you’ve been able to identify exactly where you belong and have started working on it! Note that this is only a framework. You don’t have to follow it to a T. Feel free to pick and choose the parts of the framework that suit you. But from what I’ve seen, if you can do most of what’s mentioned in the framework, it will help you perform well in your Data Science interviews and get a job in Data Science! Let me know if this framework has helped you in any way.

To read more content like this, please follow me on LinkedIn. It’s where I write often. You can find my LinkedIn here:
https://in.linkedin.com/in/imaad-mohamed-khan-218b3999
