Data Science Lab

Real Data. Real Problems. Real Skills.

The foundation every serious data scientist - and every AI practitioner - needs to build on.

Completely Online
100% Free of Cost
Rigorous Focus on Applied Learning

Eight Projects. End-to-End. No Shortcuts.

The Data Science Lab is organized into two sequential units of four projects each. Every project is grounded in a real dataset from around the world and guides you through the full analytical workflow, from raw data to deployed, interpretable results.

Unit 1 covers the foundational data science workflow, taking you from wrangling and exploring data to building your first regression, classification, and time series models.

Unit 2 scales up to applied machine learning, tackling ensemble methods, unsupervised learning, experimentation, and real deployment workflows.

You’ll gain practical experience in:

Cleaning and analyzing the kind of messy data you actually encounter on the job, not tidy classroom examples
Building predictive models that forecast housing prices, detect bankruptcy risk, and classify earthquake damage
Working with time-based data to spot trends and make forecasts
Designing experiments to test whether interventions actually work, and communicating results to non-technical audiences
Delivering a complete, deployable data science project from raw data all the way to production-ready code

Data Science in the Age of AI

Every AI system depends on human decisions made long before a model is trained. Someone has to decide what data to use, how to frame the problem, and which questions are even worth asking. These are fundamentally data science decisions.

As AI takes on more consequential roles, organizations need people who can build reliable data pipelines, evaluate model outputs critically, and recognize when statistically impressive results fail in the real world. They also need professionals who can detect bias, spot instability, and know when a model is confidently wrong. The more powerful AI becomes, the more it needs people who truly understand data. Rather than a threat, the age of AI presents an opportunity for data scientists to do the work that makes AI useful, reliable, and trustworthy.

The Data Science Lab builds exactly these skills, through hands-on projects grounded in real business and policy problems, using the tools and workflows professional data scientists use every day.

“WQU helped me massively in my ML/AI journey. It gave me all the fundamental knowledge I needed in my career.

From WQU, I had more confidence in taking on ML projects. It was a lot easier to continue studying myself as well, and I also got a few decent roles due to the knowledge I gained here.”

Applied Data Science Lab Graduate, Nigeria
2023

Data Science Lab

Next Deadline	Rolling Admissions
Lab Start Date	Upon Acceptance
Cost	Entirely Free
Length	16 weeks (recommended)
Applicant Requirements	Beginner-level Python skills; familiarity with basic statistics; passing score on Admissions Quiz (66% or higher)
Commitment	10-15 hours per week
Credentials Awarded	Shareable Credly Badge upon successful completion of Unit 1; Verified Digital Badge and Certificate upon successful completion of Units 1 & 2


Earn a Credential That Proves You Can Do the Work

By the end of this lab, you'll know how to:

Clean and analyze messy, real-world datasets
Build and evaluate predictive models across regression, classification, and time series
Apply ensemble methods and unsupervised learning to complex problems
Design experiments and communicate findings clearly
Deploy an end-to-end data science system using professional tools

Your progress is tracked through interactive notebooks and auto-graded coding tasks, embedded in our virtual lab environment. After completing all projects, you’ll receive a shareable digital credential via Credly, recognized by employers worldwide.

Project Descriptions

The Data Science Lab comprises two units, each with four end-to-end projects.

Each successful project completion unlocks the registration for the next.

Project 1 introduces the foundational data science workflow using real estate data from Mexico City. Students load, clean, and visualize property listings, then quantify relationships among variables using correlation analysis. The central question, whether price is driven more by size or location, runs through all four notebooks and prepares students for the modeling work in Project 2.

Project 2 introduces supervised machine learning by predicting housing prices in Buenos Aires. Students prepare data for modeling, build their first linear regression model, and directly confront overfitting — addressing it with Ridge and Lasso regularization. The focus is on understanding generalization, not just fitting numbers.
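To give a feel for what regularization does, here is a minimal sketch of the Ridge penalty on a one-feature model. It is not taken from the Lab's notebooks: the numbers are synthetic, the helper `fit_slope` is a hypothetical name, and the example shows only the shrinkage mechanism rather than a full overfitting workflow.

```python
# Illustrative sketch only: synthetic numbers, not the Buenos Aires dataset.

def fit_slope(xs, ys, alpha=0.0):
    """Slope of a no-intercept, one-feature linear model.

    alpha = 0 gives ordinary least squares; alpha > 0 adds the Ridge (L2)
    penalty, which shrinks the slope toward zero.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

# Hypothetical, already-centered values: size and price deviations from the mean.
sizes = [-20.0, -10.0, 0.0, 10.0, 20.0]
prices = [-45.0, -15.0, 5.0, 20.0, 35.0]

ols_slope = fit_slope(sizes, prices)                  # no penalty
ridge_slope = fit_slope(sizes, prices, alpha=500.0)   # penalized

print(f"OLS slope:   {ols_slope:.2f}")
print(f"Ridge slope: {ridge_slope:.2f}")
```

The penalized slope is smaller in magnitude than the unpenalized one. With many features, the same shrinkage keeps a model from chasing noise in the training data.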

Project 3 moves from independent and identically distributed (i.i.d.) data to time-indexed data, using Nairobi air quality measurements stored in MongoDB. Students learn to query, prepare, and forecast time series, building from lagged regression to AR and ARMA models. The key conceptual shift: in time series, the past is a feature.
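The idea that the past is a feature can be sketched in a few lines. The example below is an assumption-laden illustration, not the Lab's code: the readings are made up and noise-free, the helper names `make_lag_pairs` and `fit_line` are hypothetical, and it covers only a one-step lagged regression.

```python
# Illustrative sketch only: made-up, noise-free readings, not Nairobi data.

def make_lag_pairs(series, lag=1):
    """Pair each value with the value observed `lag` steps earlier."""
    return [(series[i - lag], series[i]) for i in range(lag, len(series))]

def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

readings = [10.0, 12.0, 13.0, 13.5, 13.75]  # hypothetical sensor values

# The lagged value is the feature; the current value is the target.
pairs = make_lag_pairs(readings, lag=1)
intercept, slope = fit_line(pairs)

# One-step-ahead forecast from the most recent observation.
forecast = intercept + slope * readings[-1]
print(f"next value is roughly {forecast:.3f}")
```

Real measurements are noisy, so a fitted line will not match this cleanly. AR and ARMA models extend the same idea with more lags and a model of the error terms.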

Project 4 introduces classification using earthquake-damage data from Nepal stored in SQLite. Students work through the full pipeline — querying relational data, training logistic regression and decision tree models, and evaluating performance with metrics suited for discrete outcomes. It closes Unit 1 with a real humanitarian decision-making context.

Project 5 opens Unit 2 by scaling up to ensemble methods. Using corporate bankruptcy records from Poland and Taiwan in JSON format, students build Random Forest and Gradient Boosting classifiers while handling severe class imbalance. The emphasis is on moving from a working model to a production-ready prediction pipeline.

Project 6 introduces unsupervised learning by segmenting consumer finance data. Without a target variable, students explore how clustering algorithms reveal latent structure in financial data. The key idea: clustering results are analytical constructs, not ground truth, and interpretability matters as much as the algorithm.

Project 7 shifts from prediction to experimentation. Using applicant data from the WQU Data Science Lab admissions process stored in MongoDB, students build ETL pipelines, test hypotheses with chi-square statistics, and communicate results through interactive dashboards. The question is no longer "what will happen?" but "did this intervention work?"
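As a rough illustration of the "did this intervention work?" question, the sketch below computes a chi-square statistic for a two-by-two table by hand. The counts are hypothetical, the function name `chi_square_statistic` is an assumption for this example, and no continuity correction is applied.

```python
# Illustrative sketch only: hypothetical counts, not WQU admissions data.

def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (no correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if group and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: control group, treatment group. Columns: completed, did not complete.
observed = [[30, 70],
            [50, 50]]

stat = chi_square_statistic(observed)
critical_value = 3.841  # chi-square critical value, 1 degree of freedom, 5% level

print(f"chi-square = {stat:.2f}")
print("reject independence" if stat > critical_value else "fail to reject")
```

A statistic above the critical value suggests the difference between groups is unlikely to be chance alone. Library routines may differ slightly; for example, SciPy's `chi2_contingency` applies a continuity correction to two-by-two tables by default.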

Project 8 is the capstone. Students model financial volatility using GARCH on real market data while applying test-driven development and sound software engineering practices. The goal is not just a working model but a reproducible, packaged system that could plausibly be deployed.

Lab Outcomes

Unit 1 Learning Outcomes

Explore, Clean, and Visualize Real-World Data

Explore, clean, and visualize real-world datasets to extract patterns and formulate data-driven hypotheses.

Train and Regularize Regression Models

Build, train, and evaluate supervised regression models, including regularization strategies for controlling model complexity.

Analyze and Forecast Time Series Data

Analyze time-indexed data and apply forecasting techniques using temporal modeling frameworks.

Classify Outcomes and Interpret Results

Implement classification models, evaluate performance on discrete outcomes, and interpret results in applied contexts.

Unit 2 Learning Outcomes

Apply Ensemble Methods to Imbalanced Problems

Apply ensemble methods — Random Forests and Gradient Boosting — to high-stakes classification problems with class imbalance.

Segment Customers with Clustering Algorithms

Perform unsupervised learning and customer segmentation using clustering algorithms, with emphasis on interpretability and feature selection.

Design Experiments and Build Interactive Dashboards

Design and evaluate data-driven experiments using hypothesis testing, ETL pipelines, and interactive dashboards.

Deploy Reproducible End-to-End Data Science Systems

Build reproducible, deployable end-to-end data science systems that integrate modeling, testing, and software engineering best practices.

Frequently Asked Questions

Before you can start your Lab, you will need to take an Admissions Assessment. We want to make sure you have a solid foundation on which you can build the skills we teach in our program. Information on the number of questions in the Assessment, the time to completion, and the passing grade will be provided within the Admissions Assessment page.

We ask that you not use supplementary materials to ensure you are measuring your actual indivi

If you fail the Admissions Assessment for a Lab program, you may make a second attempt after a 7-day waiting period. Applicants who do not pass on their second attempt may reapply to the Lab 6 months after the date of that attempt.

Important Warning: Creating multiple accounts to attempt the Admissions Assessment is a violation of the University’s Academi

All of WQU’s Labs are 100% free, online, and self-paced, so you can set your own study schedule and deadlines as you move through the projects. We generally recommend setting aside about 10-15 hours per week to ensure your continued progress, but you may take as much time to complete the Lab as you need. As you successfully complete each project you will gain access to the next pro

No, our Labs are hands-on continuing education opportunities that do not require a prior degree.

