Dana Seidel was “traipsing around rural Alberta, following herds of elk,” trying to figure out their movement patterns, what they ate, what brought them back to the same spot, when she had an epiphany: Data could help answer these questions.
At the time, she was enrolled in a master’s program at the University of Alberta and was interested in tracking the movement of deer, elk and other central foragers. Seidel realized that she could use her math and ecology background from Cornell University to help evaluate a model that could answer these questions. She continued her studies, earning a Ph.D. at the University of California, Berkeley, related to animal movement and the spread of diseases, which she monitored, in part, by collecting data from collars. The collars work kind of like a Fitbit, Seidel explained, “tracking wherever you go throughout the day,” and yield GPS data points that can be connected to land data, such as satellite images, offering a window into the movement of this wildlife.
Seidel, 31, has since transitioned from academia to the startup world, working as the lead data scientist at Plenty, an indoor vertical farming company. Or, as she would call herself, a “data scientist who is interested in spatial-temporal time series data.”
Seidel was born in Tennessee, but grew up in Kansas. She’s 31, which she said is “old” for the startup world. As someone who spent her twenties “investing in one career path and then switching over,” she doesn’t necessarily have the same industry experience as her colleagues. So while she is grateful for her experience, a degree is not a necessity, she said.
“I’m not sure that my Ph.D. helps me in my current job,” she said. One area where it did help her, however, was by giving her access to internships—at Google Maps, in Quantitative Analysts and RStudio—where she gained experience in software development.
“But I don’t think writing more papers about anthrax and zebras really convinced anybody that I was a data scientist,” she said.
Seidel learned the programming language R, which she loved, in college, and in her master’s program started building databases. She said she “generally taught myself alongside these courses to use the tools.” The biggest skill of being a data scientist “may very well just be knowing how to Google things,” she said. “That’s all coding really is, creative problem-solving.”
The field of data science is about a decade old, Seidel said—previously, it was statistics. “The idea of having somebody who has a statistics background or understands inferential modeling or machine learning has existed for a lot longer than we’ve called it a data scientist,” she said, and a master’s in data science didn’t exist until the last year of her Ph.D.
Additionally, “data scientist” is very broad. Among data scientists, many different jobs can exist. “There are data scientists that focus very much on advanced analytics. Some data scientists only do natural language processing,” she said. And the work encompasses many diverse skills, she said, including “project management skills, data skills, analysis skills, critical thinking skills.”
Seidel has mentored others interested in getting into the field, starting with a weekly Women in Machine Learning and Data Science coffee hour at Berkeley. The first piece of advice? “I would tell them: ‘You have skills,’” Seidel said. Many young students, especially women, don’t realize how much they already know. “I don’t think we communicate often to ourselves in a positive way, all of the things we know how to do, and how that might translate,” she said.
For those interested in transitioning from academia to industry, she also advises getting experience in software development and best practices, which may have been missing from formal education. “If you understand things like standard industry practices, like version control and git and bash scripting a little bit so that you have some of that language, some of that knowledge, you can be a more effective collaborator.” Seidel also recommends learning SQL (one of the easiest languages, in her opinion), which she calls “the lingua franca of data analytics and data science. Even though I think it’s something you can absolutely learn on the job, it’s going to be the main way you access data if you’re working in an industry data science team. They’re going to have large databases with data and you need a way to communicate that,” she said. She also recommends building skills, through things like the 25-day Advent of Code, and other ways to demonstrate a clean coding style. “That takes a good amount of legwork, and until you have your industry job, it’s unpaid legwork, but it can really help make you stand out,” she said.
On a typical morning at her current job, working from home, Seidel is drinking coffee and answering Slack messages in her home office/quilting studio. She checks to see if there are questions about the data, something wrong with the dashboard, or a question about plant health. Software engineers working on the data may also have questions, she said. There’s often a scrum meeting in the morning, and they operate with sprint teams (meeting every two weeks) and agile workflows.
“I have a pretty unique position where I can float between various data scrums we do, we have a farm performance scrum versus a perception team or a data infrastructure team,” Seidel explained. “I can decide: What am I going to contribute to in this sprint?” Twice a week there’s a leadership meeting, where she is among the software and data leads and can listen in on what else is being worked on and what’s coming up ahead. She said it is one of the most important meetings for her, since she can hear directly “when a change is happening on the software side or there’s a new requirement coming out of ops for software or for data that’s coming.”
In the afternoon, she has a good block of development time, “to dig into whatever issue I’m working on that sprint,” she said.
Seidel manages the data warehouse and ensures data streams are “being surfaced to end users in core data models.” Last week, she worked on the farm performance scrum, “validating measurements that are coming out of the farm, thinking ahead about the new measurements we need to be collecting, and thinking about the measurements that we have in our South San Francisco farm, measurements streaming in from a couple of thousand devices.” She needs to make sure the measurement streams, which cover everything from temperature to irrigation, are accurate, both to safeguard plant health and to answer questions like: “Why did last week’s arugula do better than this week’s arugula?”
The primary task is to know if they’re measuring the right thing, and to push back and say, “Oh, OK, what is it that you want that data to be explaining? What is the question you’re asking?” She needs to stay a few steps ahead, she said, and ask: “What are all the new data sources that I need to be aware of that we need to be supporting?”
The toughest part of the job? “I really hate not having the answer. I hate having to say, ‘No, we don’t measure that thing yet.’ Or, ‘We’ll have that in the next sprint.’” Balancing giving people the answers with giving them tools to access the answers themselves is a daily challenge, she said, with the ultimate goal of making data accessible.
And saying, “Oh, yes, that data is there and it’s this simple query,” or, “Oh, have you seen this tool I built a year ago that can solve this problem?” is really gratifying.
“Helping someone learn how to ask and answer questions from data is like giving them a superpower,” Seidel said.