So, if you’ve heard enough about “Big Data” and “Data Science” and are eager to become a data scientist, this post lays out a map of the available tools and required skills, and what you can do to become one. If you don’t know why you should learn Data Science, review our post:
There’s a HUGE number of resources online for learning about data science, and nobody can take in all of that information. So what we’re trying to do here is discuss a possible path that someone could take to reach their goals, which takes us to these questions:
“What areas exist in the field of data science?”
“What types of professionals are there in the Data Science field, and who is the Data Scientist?”
“What are your goals in learning about Data Science?”
“Do you believe that it’s fulfilling? Are you thrilled to learn about it?”
After answering those questions, the natural follow-ups would be:
“And what areas require a skill set that I could enjoy learning the most?”
“What skill path ‘P’ should I take to gain skill set ‘X’ if my current skill set is ‘Y’?”
“How can I acquire the required skill set?”
The answers to these questions aren’t easy, and we CAN’T promise to provide you with the correct ones. Instead, we’ll enumerate a list of POSSIBLE answers to show you which questions are worth asking and what the answers could look like, so that you can answer them yourself. So, let’s try to answer the first question:
1. “What areas exist in the field of data science?”
According to Wikipedia’s article on the subject, Data Science consists of the following disciplines:
Too much, huh! Well, luckily, you don’t have to be an expert in all of these fields to become a Data Scientist. You’ll touch the basics and scratch the surface of every field, but to be a great practitioner you’ll probably need to dig into one or two disciplines from the illustration above.
So, now that you know what the thousand-foot view looks like, let’s try to answer the second question:
2. “What types of professionals are there in the Data Science field, and who is the Data Scientist?”
As illustrated in a blog post from Hortonworks, these are the types of professionals we can meet in the field:
As discussed in that post:
Software Engineers are the professionals who enjoy creating production-ready applications that are secure, scalable, maintainable, etc.
Data Engineers are Software Engineers who specialize in the data field. They excel at writing SQL and working with MapReduce, Hadoop, Hive/Pig and other data processing technologies, and at building efficient and scalable pipelines with these tools.
Research Scientists are the professionals who enjoy crafting new algorithms to solve previously unseen problems in Machine Learning, Information Retrieval, NLP, etc. They are interested in publishing papers and doing fundamental research, and most probably don’t care about any specific production system.
Applied Scientists are mostly researchers who may hold PhD or Master’s degrees and do research just like their Research Scientist peers. Unlike them, though, these professionals don’t care much about papers and fundamental research (although they may publish some papers). Instead, they care about researching problems in production-ready systems, and they apply state-of-the-art techniques created by researchers to solve real-world problems.
Data Scientists are professionals who master skills from both the Applied Scientist and the Data Engineer: they can apply advanced statistics, mathematics and visualisations (as Applied Scientists do), and at the same time they know how to build and work with production-ready systems, which requires engineering qualities.
Now that you know who the Data Scientist is, why do you want to become one? Let’s find out by answering the following question:
3. “What are your goals in learning about Data Science?”
You may be interested in becoming one simply because they’re so rare and there are plenty of jobs out there asking for a Data Scientist, and the number of vacancies is expected to keep growing, as illustrated in our previous post.
Or you may be working as one of the other professional types from the chart above, have worked with or seen Data Scientists up close, and been fascinated by what they’ve done, and that made you want to become one of them.
In any case, you must define your goals clearly and make sure you have enough interest before delving into the subject. The path is long and hard, and along the way you may lose interest, at which point you’ll need to look back at your original goals and reassess them.
Now that you know why you want to become one, let’s answer the next important question:
4. “Do you believe that it’s fulfilling? Are you thrilled to learn about it?”
You could be a multidisciplinary person who enjoys combining scientific disciplines with practical matters, or you may just enjoy working with massive amounts of data or finding patterns in chaos. Either way, you should be willing to invest enough time and effort to stay close to the subjects you enjoy.
You must be passionate about learning the skills needed because, as we mentioned in the answer to the previous question, it’s not easy at all and the path is long. Unless you’re passionate about learning the required skills (we’ll touch on them shortly), you probably won’t get far before losing interest.
The next logical question is: what areas and skills are required? Let’s try to answer it:
5. “And what areas require a skill set that I could enjoy learning the most?”
You can get a sense of the general areas and disciplines required from the Wikipedia chart mentioned earlier in this article. There is a large number of disciplines, and most probably you won’t be an expert in all of them (unless you’re a superman!), but you definitely need to touch all of these areas, at least at a breadth level.
As a Data Scientist you’re expected to understand all the disciplines mentioned (to varying degrees), but it’s up to you to decide how much expertise to build in each one: which disciplines you’ll know well enough to write books about and teach to others (that’s what expert means), and which you’ll know just well enough to use for data processing purposes.
Now that you know what areas exist, it’s time to ask what skills are required in these areas. Let’s try to answer that with the following question:
6. “What skill path ‘P’ should I take to gain skill set ‘X’ if my current skill set is ‘Y’?”
Unfortunately, that’s not an easy question and there’s no silver bullet for it. We can give you a possible answer, just something to look at to get a sense of what’s out there, but you’ll have to craft your own answer along the journey.
Swami Chandrasekaran did a great job in the following post of illustrating a sample of the skills required to become a Data Scientist: the skills for each discipline, the order in which you can gain them, and where the areas intersect. See the following image for a sample:
7. “How can I acquire the required skill set?”
Answering this question is much easier now than ever before: the amount of online learning resources for Data Science is enormous, and a lot of graduate degrees in Data Science now exist. So it’s not hard to find a way to learn a specific skill set. The issue is how to choose from the seemingly infinite set of online resources and books that target the field.
Generally, three approaches exist:
1. Self-crafted track.
2. Online free tracks.
3. Accredited graduate degrees.
We’ll talk about each one and discuss its pros and cons.
- Self-crafted track: This means scanning online blogs and articles, asking domain experts for recommendations, and gathering a list of books, trainings and maybe free or paid online courses, accredited or not. This approach has the advantage of letting you own the learning experience and control when and how to learn each part, but it carries a lot of risks, such as losing sight of the target or straying from the path. Here is an example of such a track:
- Online free tracks: These are free or almost free online tracks that give you BREADTH of knowledge in the area. They have the advantage of being a specific, well-tailored path crafted by domain experts, and are expected to give you a frame for what you should learn and how to learn it. Their disadvantage is that they are shallow, and in most cases they don’t give you control over when to dive deep into a certain subject.
Here are two examples:
- Accredited graduate degrees: This is a step further than the online free tracks. It means you have the motivation and the passion to learn about the field, and to secure a spot in a top-ranked program you should also have some research experience in the field. As for the advantages, this approach will put you in a slightly better position for being hired by big corporations (which are mostly the ones that need data scientists). But it means you’re willing to spend a huge sum of money and a lot of time, so it’s a tradeoff. It also has the same disadvantages as the mostly free online tracks: you won’t have direct control over the syllabus, and in most cases it’ll be shallow. Here are some examples of online accredited Data Science Master’s degrees:
And the following links list schools that offer graduate Data Science degrees:
This is not an easy question to answer. It’s rather a journey that starts with asking yourself a few questions and then trying to answer them. In this article, I just tried to give sample answers to these questions, but you’ll still have a lot of homework to do!!
Christopher R Ball, a friend and a client I’ve worked with for a long time, asked me some of these questions when I asked him for advice, and in this post I tried to answer those questions, among others.
- Wikipedia’s Data Science article
- Hortonworks: How to build a Hadoop data science team
- Swami Chandrasekaran’s Blog: Becoming a data scientist
- Colleges with Data Science degrees
- Survey of graduate degree programs in analytics