So you’ve decided to study data science. Now what?
— Three things you need to know to break into data science
Congratulations! You just made a choice that will make your career sound really fancy. Predicting elections, forecasting business growth, identifying people at risk of cancer… a future doing projects like these is really motivating.
As a Duke senior who has been on a long journey studying data science and will join a tech company in San Francisco after graduation, I want to share with you, the aspiring freshman, sophomore, and junior data science students, some insights I wish I had learned earlier. Imagine we are grabbing coffee and you ask me, “What should I do to become a data science (DS) wiz after undergrad?” Here are the three points I would make:
A Data Science candidate needs a combination of skills, but you don’t need to be a master of everything.
These skills range from data wrangling and statistics to business sense and interpersonal skills.
This is hard because we need to be well-rounded and proactive about learning these things, some of which are not taught in undergrad courses. It is also easy, because learning them is so fun and so powerful.
From my experience, you can think of a new-grad DS as a consultant who solves problems by coding up data analyses. We don’t need to be as strong a coder as a software engineer, nor as talkative a strategic thinker as a management consultant. But some coding skills and some business/product sense, on top of a solid understanding of statistics, will help us excel in DS.
You don’t need a DS title to do data science
“Data scientist” (DS) sounds fancy, while “data analyst” (DA) sounds cheap, so a DS is definitely better than a DA, right? No.
The whole DS industry is very young; the term “data science” was invented in 1996, and the role is still poorly defined. At one company, a DS could be a PhD with five years of experience building production-level model pipelines for fraud detection. At another, a DS could be a new grad making charts in Excel. So when you go on the job search, don’t only look at jobs titled “Data Scientist”. Sometimes the exciting jobs that draw on machine learning, data wrangling, and business recommendations carry titles such as “Data Analyst”, “Product Analyst”, or “Business Analyst”, while the “Data Scientist” roles go to people with advanced degrees and industry experience. In my case, I turned down a DS offer from company A and signed a product analyst offer from company B because, for those two specific offers, the product analyst role at company B was the more intellectually challenging position.
So don’t just look at the title. Read the job descriptions carefully and talk to people inside the company to learn what it is like to work there.
What should I do now in order to prepare for becoming a DS after college?
1. You should take more applied classes in Stats and CS (electives & grad level)
As a stats major at Duke, I found that while the core undergraduate statistics classes laid a strong foundation in statistical theory, it was the electives and graduate-level courses that actually helped me in interviews and in carrying out projects at a company. I recommend classes that teach you to code in R/Python for statistical computing (STA 323, STA 663) and classes that teach you to build models and do machine learning (STA 325, STA 521, CS 571). In interviews, you will be tested on your data wrangling skills and statistical modelling knowledge, but rarely on deriving the gamma distribution.
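To give a flavor of the data wrangling questions mentioned above, here is a hypothetical interview-style exercise (the event log and the function name are made up for illustration): given raw event records, filter to purchases and aggregate each user's purchase count and total spend. It is a minimal sketch using only the Python standard library; in practice you would likely reach for pandas or dplyr.

```python
from collections import defaultdict

# Hypothetical raw event log: (user_id, event_type, amount)
EVENTS = [
    ("u1", "view", 0.0),
    ("u1", "purchase", 20.0),
    ("u2", "purchase", 5.5),
    ("u1", "purchase", 10.0),
    ("u3", "view", 0.0),
]


def purchase_summary(events):
    """Return {user_id: (num_purchases, total_spend)} for users who purchased."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for user_id, event_type, amount in events:
        if event_type != "purchase":
            continue  # filter step: ignore non-purchase events
        counts[user_id] += 1        # group-by step: count purchases per user
        totals[user_id] += amount   # group-by step: sum spend per user
    return {user: (counts[user], totals[user]) for user in counts}


if __name__ == "__main__":
    for user, (n, spend) in sorted(purchase_summary(EVENTS).items()):
        print(user, n, spend)
```

The filter-then-group-then-aggregate pattern is the same one you would express with `groupby` in pandas or `group_by`/`summarise` in dplyr.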
2. You should develop your business / product sense
Doing DS (rather than software engineering) means you will not be coding 100% of the time. Besides coding, the job involves strategy, problem-solving, and presentation. Since the summer after sophomore year, I’ve been taking business classes (I&E, Marketing), subscribing to business/tech email newsletters, and listening to business podcasts. In the days before flying out to interviews, I spent hours reading up on each employer, testing their products, and thinking about their competitive advantages, revenue streams, and business logic. These product/business insights not only enabled me to ask smart questions during super days (on-site interviews), but also carried me through the case/product interview sessions (yes, you still face low-key case interviews even though it’s not consulting). I recommend a list of ways to learn below.
3. You should build relationships with Duke alums
One of the biggest disadvantages for aspiring Duke data science students is that the companies hiring for DS don’t come to Duke to recruit. So while our software and consulting friends were chatting cheerfully with recruiters at career fairs, my DS friends and I found ourselves with nowhere to go.
Don’t wait for suitors, go out to find them!
Taking advice from Howie Rhee (a Duke I&E mentor), I cold-emailed, called, and flew out to the SF Bay Area to build relationships (I hate the word “networking”) with Duke alums working in the industry. Over lunches, coffee, and drinks, I learned so much from these good people about the DS industry, life in the Bay Area, and the recruiting process (which is also why I feel the urge to give back). I eventually landed only a few interviews through these internal referrals (since most companies don’t recruit undergraduate DS), but those were the opportunities I converted into offers; my referrals gave me tips throughout the interview process.
My Final Recommendation List:
List of reads
My friend Scott from Uber recommended a list of newsletters for catching up on the tech world. Of these, I found the following the most helpful:
· Recode
· I also subscribe to the Wall Street Journal and read it a couple of times a week.
List of Duke Courses (try to find similar courses if you go to another school)
· R/Python for statistical computing: STA 323, STA 663
· Building models and doing machine learning: STA 325, STA 521, CS 571
· Study design: STA 322
· Some I&E or marketing classes to give you an entrepreneurial vibe
List of Online Courses
· DataCamp: data wrangling & machine learning in R & Python
· Coursera: machine learning theory
· Elite Data Science: expensive, but very good in-depth machine learning in Python (great free blogs too)
Product/Business Sense
· Udacity: free courses on A/B testing (interviewers love to ask about it), product design, etc.
· How I Built This: stories about founding businesses, available in the Podcast app
· 得到 (Dedao): a Chinese app of short podcast lessons on everything. I listen daily before going to bed.
· Case in Point: for reading and for case practice. If you don’t want to practice cases, just read it.
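Since A/B testing comes up so often in interviews, here is a minimal sketch of one common way to analyze a simple conversion-rate test: a two-sided two-proportion z-test with a pooled standard error. The conversion counts are made up, and the normal approximation assumes reasonably large samples; this is one standard textbook approach, not the only way to analyze an experiment.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled SE).

    Returns (z, p_value). Assumes samples are large enough for the
    normal approximation to hold.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis p_a == p_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)) for standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical test: 200/2000 conversions in control, 250/2000 in treatment
    z, p = two_proportion_z_test(200, 2000, 250, 2000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Interviewers often follow up on the assumptions here (independent users, a sample size fixed in advance, no peeking at results), which matter as much as the formula.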
Learn SQL
· Learn the syntax on the PostgreSQL website
· Practice problems on HackerRank
· Find projects that let you handle real databases (see my examples)
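If you want to practice SQL without installing a database server, one option is Python's built-in `sqlite3` module with an in-memory database. This is a minimal sketch; the `orders` table, its columns, and the rows are hypothetical. SQLite's dialect differs from PostgreSQL in places, but a basic `GROUP BY`/`HAVING` aggregation like this one is standard SQL.

```python
import sqlite3


def make_demo_db():
    """Create a throwaway in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 5.0), ("bob", 7.5)],
    )
    return conn


def top_customers(conn, min_orders):
    """Return (customer, order_count, total) for customers with >= min_orders orders."""
    query = """
        SELECT customer, COUNT(*) AS order_count, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        HAVING COUNT(*) >= ?
        ORDER BY total DESC
    """
    # Parameterized query: the ? placeholder is filled safely by sqlite3
    return conn.execute(query, (min_orders,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    for row in top_customers(conn, min_orders=2):
        print(row)
    conn.close()
```

Once queries like this feel natural, the same `SELECT ... GROUP BY ... HAVING` pattern carries over directly to PostgreSQL and to HackerRank-style problems.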