Open Source at Uber: An Interview with Shan He, Lead Architect of the Data Visualization Project kepler.gl
Like many engineers working in data visualization, Shan He’s path was far from traditional. After studying architecture for seven years, she realized that her interests in design were too diverse to be contained by the physical world. So she went back to the drawing board and pursued a master’s degree in design computation, acquiring the skills necessary to bridge the divide between atoms and bits. In the process, she fell in love with the interconnected worlds of open source software and data visualization.
In 2014, Shan joined Uber as the first member of our Data Visualization team, a new group dedicated to leveraging location data to create maps, diagrams, and other actionable visualizations of the physical world. Over the past four years, Shan has created hundreds of maps illuminating everything from a single day of activity on the Uber platform in New York City to app usage during the 2017 solar eclipse.
Now a senior data visualization engineer, Shan is the primary architect behind kepler.gl, an open source geospatial framework released today.
We sat down with Shan to learn about her journey to data visualization, the importance of open sourcing these tools, and how kepler.gl will empower members of the open source community to leverage location data to make better, more useful maps.
When did you first get interested in computer science?
My background is actually in architecture. In architecture school, there’s this practice called parametric design where you can build 3D building models based on algorithms and parameters. Rhino, a 3D modeling program, includes plugins to generate shapes using parametric design. With just a couple of lines of code, I was able to create complex structures and geometries. This experience got me interested in coding and computer science because I learned that you can actually use code to assist in the design process. This type of 3D modeling opened the door to a completely new domain for me.
How did you transition from computer science to data visualization?
Starting with architecture and then transitioning to computer science, I realized that I wanted to do something with design that combined my love for both these areas. I didn’t want to put design away just because I was also interested in computer science. So when I started taking classes in computer science, I tried to look for opportunities where I could apply both of my passions. Data visualization became a natural fit. I became a researcher in data visualization at the MIT Senseable City Lab. After graduating from MIT, I got my first data visualization job at Uber in 2014, and the rest is history.
Given your untraditional path to data visualization, was it difficult to feel accepted by this community?
The data visualization community combines design and software engineering, so at least for me, the community was very welcoming, in part because everyone comes from a different background. You can come from computer science, user interface design, electrical engineering, or even architecture, and people will respect you. But when it comes to establishing yourself outside the data visualization community, the resistance is still noticeable. The typical reaction is that designers think of me as a software engineer and software engineers think of me as a designer. I remember once working with a software engineer who expected me to be a weaker coder than he was just because I know how to design. And when I meet with designers about data visualization inputs, they usually think that we’re not as good at design as they are because they primarily think of us as people who write code. It can be tricky to maintain this balance, and it usually takes some time before people get a sense of your skill set.
What advice would you give architects or designers like yourself who are interested in getting into coding?
When I realized that I wanted to do more software engineering, I asked myself: “What can I do that combines creativity with coding?” That’s a question I get asked a lot, too, by students of design and architecture. I wouldn’t say that data visualization is for everyone because I know some designers just have no interest in writing code. They’re fantastic designers and they can do many things that I could never imagine doing, but they have no interest in actually going into the digits, the zeroes and ones. But if you find yourself interested in math and excited about seeing designs generated by a couple of lines of code, I think data visualization is a perfect path.
When you first came to Uber, what projects were you working on?
I was Uber’s first data visualization hire. At the time, people gave me projects based on what they thought data visualization was. I was hired into the Data Science team, and they used data visualization to help them plot charts and make them look nice. My very first project was writing a template for R so that data scientists could use my templates to plot nicer-looking charts. Essentially, it is an R package that helps you make your charts sharper and more on-brand, Uber style. We still use it!
What’s the coolest visualization you’ve ever made?
My career in data visualization is tied closely with Uber. There’s one visualization that I made when I first joined that everyone loves: a day’s worth of Uber trips in London (above). The visualization depicts one full day of Uber trips as an animation of lines moving along the streets. You can see each line fade in and out as it moves; imagine each line as one car driving on the road. When there are a hundred thousand cars driving around the city, the visualization makes it look like they are flying down from the sky. It looks like lightning is flowing through every single street in London.
Are there a lot of other companies doing data visualization? Or is Uber one of the few companies really prioritizing and open sourcing it?
Almost every big tech firm has a visualization branch, but I think Uber was one of the first to have a strong visualization team in terms of working across different domains (both physical and virtual) and open sourcing our tools.
How did you first get involved with open source at Uber?
I first got involved with open source by contributing to deck.gl, Uber’s WebGL-powered framework for visual exploratory analysis. deck.gl is a major open source effort that the Data Visualization team has been working on for a while. Most recently, I built kepler.gl with deck.gl as a main dependency.
With data visualization, you need to know what you want to find before you start using visualizations to look for answers inside the data. You can have many different questions when you’re looking at data. We build data visualization tools to help people get insights faster. Instead of spending two to three weeks building everything from scratch, these tools help you quickly explore the data and validate your hypothesis without having to write a lot of code. At Uber, we think it’s important to give back to the data visualization community, to improve this experience for everyone in the ecosystem.
Why did your team first decide to open source kepler.gl?
When I built the in-house version of kepler.gl, it was well-liked by everyone: engineers, data scientists, and designers. People found it super easy to use. It doesn’t require coding, works in the browser, requires no installation, is highly exploratory, and makes beautiful maps. We realized there is really not much software out there like kepler.gl. That’s why we decided to open source it. As a software engineer, when you create something truly awesome, you naturally want to get it out to the world so everybody else can use it. Once you open source it, other people can also contribute back to it. You are not the sole author anymore. You have a small community of people who are interested in using the tool and contributing back to the project. In my opinion, that’s how you make a library better and better.
What exactly does kepler.gl do?
kepler.gl is a tool to visualize large-scale geospatial data in the browser. You can create maps with any type of geospatial data by exploring and interacting within the app. It’s high-performance and super easy to use, given that all you need to do is drop in a CSV or a JSON file. It is built for fast exploration of geospatial data and the creation of beautiful and insightful maps with just a few clicks of a button. kepler.gl can render over 1 million points on the map, and allows you to filter, aggregate, color, and size them based on the type of data you have.
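To make the filter, aggregate, and color steps mentioned above more concrete, here is a minimal Python sketch of those operations on a tiny CSV of points. It is only an illustration under stated assumptions: it is not kepler.gl’s code or API, and the column names (`lat`, `lng`, `fare`), the sample rows, the grid cell size, and every function name are hypothetical.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical sample data; a real workflow would read a CSV file from disk.
SAMPLE_CSV = """lat,lng,fare
40.7128,-74.0060,12.5
40.7130,-74.0055,8.0
40.7306,-73.9866,23.0
40.7308,-73.9869,4.5
40.7580,-73.9855,30.0
"""


def load_points(text):
    """Parse CSV text into a list of point dictionaries with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"lat": float(r["lat"]), "lng": float(r["lng"]), "fare": float(r["fare"])}
        for r in reader
    ]


def filter_points(points, min_fare):
    """Keep only the points whose fare is at least min_fare."""
    return [p for p in points if p["fare"] >= min_fare]


def aggregate_to_grid(points, cell_size_deg=0.01):
    """Count points per square grid cell, keyed by (lat index, lng index)."""
    counts = Counter()
    for p in points:
        cell = (
            math.floor(p["lat"] / cell_size_deg),
            math.floor(p["lng"] / cell_size_deg),
        )
        counts[cell] += 1
    return counts


def color_for_count(count, max_count):
    """Map a count to an RGB tuple on a linear white-to-red ramp."""
    t = count / max_count if max_count else 0.0
    shade = round(255 * (1 - t))
    return (255, shade, shade)


if __name__ == "__main__":
    points = load_points(SAMPLE_CSV)
    kept = filter_points(points, min_fare=5.0)
    cells = aggregate_to_grid(kept)
    busiest = max(cells.values())
    for cell, count in sorted(cells.items()):
        print(cell, count, color_for_count(count, busiest))
```

In an interactive tool these steps happen through the interface rather than in code; the sketch only shows the kind of data transformation that sits behind a filtered, aggregated, color-coded map.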
What is most challenging about open sourcing something?
Emotionally, it’s the fact that everyone is going to see your code. You can’t hide it anymore. You can show how the tool makes pretty things, but the nightmare of any software engineer is being critiqued on your code.
A lot of people say, “Oh, this visualization is beautiful,” but when you dive into the code, it might actually be built with hangers and duct tape. You don’t want people to think less of your visualizations because of the code you write to build them. Luckily we have experts in every data visualization-related domain on Uber’s Visualization team, such as web programmers, computer graphics engineers, information designers, and UX designers. When I was building kepler.gl, I was able to consult them whenever a tough engineering problem came along. It really made me feel supported and gave me a lot of confidence.
And what is most rewarding about the process?
The most rewarding part about open sourcing something is seeing the number of downloads and knowing that people actually find it useful, which was certainly the case for deck.gl. Seeing people building apps modeled off of deck.gl was one of the most rewarding experiences for me. I want people to use the library I wrote to create something meaningful.
How would you describe Uber’s open source culture?
At any company, when something gets open sourced there’s always a question about what business value it brings. To open source internal software, you have to argue strongly for the value it can bring, as you have to put your internal work on that project aside to work on open sourcing it.
At Uber, I feel like everyone is excited about open source. When I first announced that I wanted to open source kepler.gl, the response from the team at Uber Engineering was phenomenal. My engineering manager helped me put together a road map; my product manager offered to help us market it; designers helped me design the demo app and the marketing website; engineers from my team contributed to it despite being assigned other projects; and the Tech Brand team helped us write blog articles and host visualization-related events. Open sourcing kepler.gl was a group effort and everyone had a fun time doing it.
Uber’s Visualization team has open sourced the following projects: deck.gl, kepler.gl, luma.gl, react-vis, and react-map-gl. If you want to learn about all of Uber’s open source projects, check out our GitHub page, and, if you are interested in joining our team, visit the Uber Careers page.
If working on open source projects like kepler.gl appeals to you, consider applying for a role on our Visualization team!