pandas, scikit-learn, text extraction, K-means/DBSCAN clustering
We just launched our liveProject platform — where you can sign up for a structured project and get real-world experience.
Our pilot project, which is especially timely right now, puts you in the role of a data scientist at the World Health Organization (WHO). The WHO is responsible for responding to international epidemics, and a critical part of that work is monitoring global news headlines for signs of disease outbreaks. However, the daily flood of news data is far too large to analyze manually. Your challenge is to pull geographic information from headlines and determine where in the world outbreaks are occurring. Problems you will solve include extracting information from text using regular expressions, using the Basemap extension for Matplotlib to plot locations on a map and look for patterns that indicate an epidemic, and reporting your findings to your superiors so that resources can be dispatched.
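To give a feel for the first of those problems, here is a minimal Python sketch of regex-based place-name extraction. The headlines and the short list of city names are hypothetical stand-ins invented for illustration, not the project's data. Matching against a fixed list of known names is only one possible approach, and it is not necessarily the one the project uses.

```python
import re

# Hypothetical sample headlines, invented for illustration only.
headlines = [
    "Zika Outbreak Hits Miami",
    "Could Zika Reach New York City?",
    "Mumps Cases Rise in Sydney",
    "Stock Markets Close Higher",
]

# Hypothetical list of known place names (a tiny stand-in for a real gazetteer).
cities = ["Miami", "New York", "New York City", "Sydney"]


def build_city_pattern(names):
    """Compile one regex that matches any known city name as a whole word."""
    # Try longer names first so "New York City" wins over "New York".
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(rf"\b(?:{alternation})\b")


def extract_city(headline, pattern):
    """Return the first known city mentioned in the headline, or None."""
    match = pattern.search(headline)
    return match.group(0) if match else None


pattern = build_city_pattern(cities)
for headline in headlines:
    print(f"{headline!r} -> {extract_city(headline, pattern)}")
```

Sorting the names by length matters because Python's regex alternation takes the first alternative that matches, not the longest one. The pattern is also case-sensitive. A fuller pipeline would probably need to handle case differences and accented spellings before the extracted locations are mapped or clustered.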
Here’s the best part: the solo track for this project is FREE! Go try it out today at: https://www.manning.com/liveproject/discovering-disease-outbreaks-from-news-headlines
Learn more about liveProject here: https://liveproject.manning.com/