

Learn the fundamentals of data science, entirely free.
By completing a series of end-to-end data science projects, students build the wrangling, analysis, model-building, and communication skills to prepare for success in data-centric careers. They use their skills to build models that predict everything from real estate prices to customer retention, create interactive dashboards that run statistical experiments, and build APIs to incorporate their insights into web applications.
Data science is becoming a cornerstone of modern business. Practitioners use analytical tools and techniques to extract meaningful insights from data that drive critical business decisions. Data scientists are in demand across industries, and the number of positions is projected to grow by 35% through 2030.
“WQU helped me massively in my ML/AI journey. It gave me all the fundamental knowledge I needed in my career.
From WQU, I had more confidence in taking on ML projects. It was a lot easier to continue studying myself as well, and I also got a few decent roles due to the knowledge I gained here.”
Applied Data Science Lab Graduate, Nigeria
2023
Applied Data Science Lab
Next Deadline | Rolling Admissions |
---|---|
Lab Start Date | Upon Acceptance |
Cost | Entirely Free |
Length | 16 weeks (recommended) |
Applicant Requirements | |
Commitment | 10-15 hours per week |
Credentials Awarded | |

Upon successful completion of the Applied Data Science Lab, students receive both a digital certificate and a shareable, verified credential.
What You Will Learn
In this dynamic learning environment, students get real-time feedback and opportunities to collaborate with their peers and participate in live office hours with their instructor. After successfully completing the Lab, students earn an easily shareable WQU badge issued by Credly.
Project Descriptions
Project 1 analyzes a dataset of 21,000 properties for sale in Mexico from Properati.com to investigate whether property prices are more strongly influenced by size or location. Learners organize information using Python data structures and import and clean CSV data with the pandas library. They create data visualizations such as scatter and box plots to explore the data, then examine the relationships between variables through correlation analysis to answer the central research question.
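As an illustration of the correlation step, the sketch below computes a Pearson correlation coefficient between property size and price using only the Python standard library. The figures are invented placeholder values, not Properati data, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample: surface area (square meters) and asking price (USD).
area_m2 = [50, 65, 80, 120, 150]
price_usd = [60_000, 72_000, 95_000, 130_000, 170_000]

r = pearson_r(area_m2, price_usd)
print(f"size vs. price correlation: {r:.3f}")
```

A coefficient close to 1 indicates a strong positive linear relationship. Comparing it with the coefficient for a location variable is one way to approach the project's research question.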
Building on the data wrangling and visualization skills from Project 1, Project 2 marks a transition from descriptive to predictive data science. Focusing on real estate in Buenos Aires, Argentina, learners create a machine learning model to predict apartment prices. The project covers building linear regression models using the scikit-learn library and constructing data pipelines for imputing missing values and encoding categorical features. Learners also explore techniques to improve model performance by reducing overfitting and conclude by creating a dynamic dashboard for interacting with their completed prediction model.
Project 3 utilizes data from openAfrica, one of Africa's largest open data platforms, to analyze air quality measurements from Nairobi, Lagos, and Dar es Salaam. Learners build a time series model to predict PM 2.5 readings throughout the day, gaining experience in querying MongoDB databases and preparing time series data for analysis. The project covers building autoregression models and improving performance through hyperparameter tuning. These time series modeling skills are valuable beyond public health applications, serving as foundational concepts for financial engineering and natural language processing work.
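To make the idea of autoregression concrete, here is a minimal sketch of a first-order autoregressive model, AR(1), fitted by ordinary least squares. The readings are invented values constructed to follow an exact AR(1) relationship so the fit is easy to check, the helper names `fit_ar1` and `forecast_next` are hypothetical, and the models and tooling used in the Lab may differ.

```python
def fit_ar1(series):
    """Fit y[t] = c + phi * y[t-1] by ordinary least squares."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    lagged = series[:-1]   # y[t-1]
    current = series[1:]   # y[t]
    mean_x = sum(lagged) / len(lagged)
    mean_y = sum(current) / len(current)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, current))
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_next(series, c, phi):
    """One-step-ahead forecast from the last observed value."""
    return c + phi * series[-1]


# Hypothetical hourly PM 2.5 readings, generated as y[t] = 2 + 0.5 * y[t-1].
readings = [12.0, 8.0, 6.0, 5.0, 4.5, 4.25]

c, phi = fit_ar1(readings)
next_value = forecast_next(readings, c, phi)
print(f"c={c:.3f}, phi={phi:.3f}, next reading forecast={next_value:.3f}")
```

The number of lagged values used as inputs is an example of a hyperparameter that can be tuned.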
Project 4 uses data from Open Data Nepal to build a model predicting building damage from the 2015 Nepal earthquake, focusing primarily on the Gorkha district with additional examples from Ramechhap. Learners query SQL databases to retrieve data and develop classification models using both logistic regression and decision tree approaches. The project emphasizes the importance of incorporating ethical considerations into model building, giving learners experience in responsible machine learning practices while addressing real-world disaster response scenarios.
Project 5 explores bankruptcy data collected by a team of Polish economists to build a predictive model that determines whether a company will go bankrupt. Learners develop skills in navigating file systems from the Linux command line and loading and saving files using Python. The project addresses the challenge of imbalanced datasets through resampling techniques and covers model evaluation using classification metrics such as precision and recall, providing practical experience with real-world financial data analysis.
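The following sketch shows how precision and recall are computed from true and predicted labels, and why they matter when one class is rare. The labels are invented for illustration, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = bankrupt, 0 = not bankrupt (imbalanced, 2 of 10 positive).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")

# A model that always predicts "not bankrupt" is right 8 times out of 10 here,
# yet it never finds a bankrupt company, so its recall is zero.
always_negative = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
_, baseline_recall = precision_recall(y_true, always_negative)
print(f"baseline accuracy={accuracy:.2f}, baseline recall={baseline_recall:.2f}")
```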
Project 6 uses data from the 2019 Survey of Consumer Finances to identify households facing credit access challenges and build a model to segment them into distinct subgroups. As an example of unsupervised learning through clustering, the project has applications in commercial marketing, customer segmentation, and sociological studies of social stratification. Learners compare subgroup characteristics using side-by-side bar charts and build k-means clustering models. The project covers feature selection for clustering based on variance and dimensionality reduction through principal component analysis (PCA). Finally, learners design, build, and deploy an interactive Dash web application to share their findings.
Project 7 involves designing and conducting an A/B testing experiment to determine whether WQU can increase quiz completion rates. Learners explore the Applied Data Science Lab (ADSL) applicant pool to formulate research questions and hypotheses, then run and analyze their experiments. The project demonstrates randomized controlled experimentation techniques used across industries, from email marketing to campaign testing and scientific research. Learners build choropleth maps to visualize the global distribution of ADSL students and create custom Python classes for ETL processes. The project covers experimental design and chi-square test analysis, culminating in building an interactive web application that follows a three-tiered design pattern.
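As a sketch of the chi-square analysis, the code below runs a Pearson chi-square test of independence on a 2x2 table of A/B test outcomes. The counts are invented, the function name `chi_square_2x2` is hypothetical, and no continuity correction is applied, so results may differ slightly from library defaults.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts. Rows: control, treatment. Columns: completed quiz, did not.
observed = [[30, 70], [45, 55]]

stat, p_value = chi_square_2x2(observed)
print(f"chi-square={stat:.2f}, p-value={p_value:.4f}")
```

A p-value below the chosen significance level (0.05 is a common choice) suggests that the difference in completion rates between the two groups is unlikely to be due to chance alone.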
The final project focuses on building a model to predict volatility on the Bombay Stock Exchange. Learners explore stock data for two companies using the AlphaVantage stock API, making HTTP requests to retrieve the data and using custom Python classes to transform and load it into SQL databases. They then calculate asset volatility and build GARCH models for prediction. The project concludes with deployment: learners build and deploy a web API and server to serve model predictions. As volatility models are essential tools in econometrics and financial engineering, this project provides practical experience with real-world financial applications while reinforcing time series concepts from earlier coursework.
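The volatility calculation can be sketched as the standard deviation of daily returns. This is a simplified illustration with invented prices: the function names are hypothetical, the 252-trading-day annualization factor is an assumption, and a GARCH model goes further by letting the variance change over time, which is not shown here.

```python
import math
import statistics


def daily_returns(prices):
    """Percentage change between consecutive closing prices."""
    return [(curr / prev - 1) * 100 for prev, curr in zip(prices, prices[1:])]


def volatility(returns):
    """Sample standard deviation of a list of returns."""
    return statistics.stdev(returns)


# Hypothetical closing prices for one stock over four trading days.
closing_prices = [100.0, 102.0, 99.96, 101.9592]

returns = daily_returns(closing_prices)
daily_vol = volatility(returns)

# Assumption: 252 trading days per year, a common convention.
annual_vol = daily_vol * math.sqrt(252)

print("daily returns (%):", [round(x, 2) for x in returns])
print(f"daily volatility: {daily_vol:.4f}%, annualized: {annual_vol:.2f}%")
```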
Lab Outcomes

Database Management
Extract data from SQL and NoSQL databases

Regression & Classification Modeling
Build predictive models for regression and classification

Ethics in Machine Learning
Discuss the ethical implications of deploying models in the real world and the environmental impact of machine learning models

Data Cleaning and Preprocessing
Clean authentic, messy datasets

Data Visualization
Create compelling visualizations to explain data characteristics and model performance

Business Insight & Intelligence
Learn how to apply machine learning to business problems