Privacy, Twitter, and Machine Learning: six questions with Andrew Trask
Andrew Trask is the author of Grokking Deep Learning.
The interview was conducted by Frances Lefkowitz, a Development Editor at Manning Publications.
Andrew Trask is a researcher pursuing a Doctorate at Oxford University, where he focuses on Deep Learning with an emphasis on human language. He is also a leader at OpenMined.org, an open-source community of researchers and developers working on creating free and accessible tools for secure AI. Previously, Andrew was analytics product manager at Digital Reasoning, where he trained the world’s largest artificial neural network (with over 160 billion parameters) and helped guide the analytics for the Synthesys cognitive computing platform, which tackles problems in government intelligence, finance, and healthcare. Grokking Deep Learning is his first book.
Find Andrew online at his blog (iamtrask.github.io) and @iamtrask on Twitter.
You’ve got 26.5k Twitter followers! How did you get so many?
Social media is about finding an audience and then growing it through creating content that is valuable for them. In my case, I started a Machine Learning blog which linked to my Twitter as a means of informing people when I wrote more content. Since then, I have expanded the content I feature on Twitter to include other great Machine Learning tutorials and blogs, as well. The number of followers is more a reflection of just how big the interest is in simple, intuitive explanations for Machine Learning concepts than it is a reflection of me as a person.
You’ve talked about how the algorithms in Deep Learning get all the attention, at the expense of the data. Can you explain what you mean?
The actual “information” about a person hasn’t changed that much, only our ability to measure it. This gives the impression that data is not innovative, that it is merely a part of the supply chain. Getting excited about data feels a bit like getting excited about the sand that creates microchips. So people tend to be more excited about the “refinement” part.
Two dominant trends in Deep Learning breakthroughs over the last several years are: 1. Learning when and where to re-use the same neural network weights (LSTMs, ConvNets, Dropout, CapsNet, etc.) and 2. Figuring out how to generate more training data (bigger datasets, bigger hardware, unsupervised training techniques, GANs, Deep Reinforcement Learning). The first is about algorithms and the second is about data. However, the reason that data is so important is that algorithms are plentiful. Everyone has them. But not everyone has data. In my opinion, the potential social implications (both positive and negative) have more to do with who has data than who has algorithms–although access to talented people who know algorithms is certainly a factor.
What was the hardest thing about writing your book Grokking Deep Learning?
Finding time. Writing a book is a hugely time-consuming endeavor. It’s also such a long project that you have to take a long view of your daily priorities, forsaking activities which might be urgent but unimportant in favor of those which are important but not urgent. My wife and the amazing team at Manning got me through it–through a balance of both patience and encouragement.
You’re completing a doctorate at Oxford, and yet your aim with your book is “to create the lowest possible barrier to the practice of Deep Learning.” Why is it important to make these topics accessible to people who have just basic skills in math and coding?
Unlike other careers, a career in Machine Learning can be very meritocratic. You don’t need a certification to do it–if you learn the skills, and show that you have them on GitHub or Kaggle, you can usually find your way into a high-paying job (often at a startup within your local community). I find great fulfillment in the idea that this book might help people on the road to finding jobs which would previously have been out of their reach. It may or may not happen in the end, but that vision was certainly a motivating factor for me.
Can you talk about the issue of privacy amid the increasing availability of training data?
One of the most exciting new developments in the Machine Learning industry is the broader awareness of privacy, and of its importance, in the public narrative. People (and governments) are waking up to the idea that privacy isn’t just for “people with something to hide.” It’s for everyone.
Maintaining control over your own data is a lot like maintaining the right to vote. If you can control who has access to your data, you can control who can use information about you for their own personal gain. Basically, information is pricing power–and it applies to every product you buy. If you search online for a few minutes, you’ll find a plethora of online companies which change the price of their goods and services based on what they know about you; airlines are particularly known for this. And the insurance industry is absolutely invested in knowing as much as it can about you. The more accurately an insurance company can predict whether you will be sick, the more accurately the company can price how much you “should pay” for insurance. And if you happen to be more likely to be sick than someone else, you get a higher bill.
So gaining privacy is putting money back in your pocket.
Is there anything to be done about this — as consumers, and as data scientists?
The good news is that we, as a society, are getting a better sense of just how valuable our personal data is. But this is also going to have a dramatic effect on the Machine Learning/Deep Learning industry. It’s going to be harder to get access to training data in the future, and getting that access is going to require an entirely different set of tools. Researching solutions, building these tools, and teaching people how to use them is my main focus in life. If this sounds interesting to you, join the OpenMined community (openmined.org), an open-source group of researchers and developers working on building free tools for private, secure, and governed AI.
Originally published at freecontent.manning.com.