Data Science Lab
Real Data. Real Problems. Real Skills.
Eight Projects. End-to-End. No Shortcuts.
The Data Science Fundamentals Lab is organized into two sequential units of four projects each. Every project is grounded in real data drawn from around the world and guides you through the full analytical workflow: from raw data to deployed, interpretable results.
Unit 1 covers the foundational data science workflow, taking you from wrangling and exploring data to building your first regression, classification, and time series models.
Unit 2 scales up to applied machine learning, tackling ensemble methods, unsupervised learning, experimentation, and real deployment workflows.
You’ll gain practical experience in:
- Cleaning and analyzing the kind of messy data you actually encounter on the job, not tidy classroom examples
- Building predictive models that forecast housing prices, detect bankruptcy risk, and classify earthquake damage
- Working with time-based data to spot trends and make forecasts
- Designing experiments to test whether interventions actually work, and communicating results to non-technical audiences
- Delivering a complete, deployable data science project from raw data all the way to production-ready code
Data Science in the Age of AI
Every AI system depends on human decisions made long before a model is trained. Someone has to decide what data to use, how to frame the problem, and which questions are even worth asking. These are fundamentally data science decisions.
As AI takes on more consequential roles, organizations need people who can build reliable data pipelines, evaluate model outputs critically, and recognize when statistically impressive results fail in the real world. They also need professionals who can detect bias, spot instability, and know when a model is confidently wrong. The more powerful AI becomes, the more it needs people who truly understand data. Rather than a threat, the age of AI presents an opportunity for data scientists to do the work that makes AI useful, reliable, and trustworthy.
The Data Science Fundamentals Lab builds exactly these skills, through hands-on projects grounded in real business and policy problems, using the tools and workflows professional data scientists use every day.
“WQU helped me massively in my ML/AI journey. It gave me all the fundamental knowledge I needed in my career.
From WQU, I had more confidence in taking on ML projects. It was a lot easier to continue studying myself as well, and I also got a few decent roles due to the knowledge I gained here.”
Applied Data Science Lab Graduate, Nigeria
2023
Applied Data Science Lab
| Next Deadline | Rolling Admissions |
|---|---|
| Lab Start Date | Upon Acceptance |
| Cost | Entirely Free |
| Length | 16 weeks (recommended) |
| Applicant Requirements | |
| Commitment | 10-15 hours per week |
| Credentials Awarded | |
Earn a Credential That Proves You Can Do the Work
By the end of this lab, you'll know how to:
- Clean and analyze messy, real-world datasets
- Build and evaluate predictive models across regression, classification, and time series
- Apply ensemble methods and unsupervised learning to complex problems
- Design experiments and communicate findings clearly
- Deploy an end-to-end data science system using professional tools
Your progress is tracked through interactive notebooks and auto-graded coding tasks, embedded in our virtual lab environment. After completing all projects, you’ll receive a shareable digital credential via Credly, recognized by employers worldwide.
Project Descriptions
The Data Science Lab comprises two units, each with four end-to-end projects.
Completing each project unlocks registration for the next.
Project 1 introduces the foundational data science workflow using real estate data from Mexico City. Students load, clean, and visualize property listings, then quantify relationships among variables using correlation analysis. The central question, whether price is driven more by size or location, runs through all four notebooks and prepares students for the modeling work in Project 2.
Project 2 introduces supervised machine learning by predicting housing prices in Buenos Aires. Students prepare data for modeling, build their first linear regression model, and directly confront overfitting — addressing it with Ridge and Lasso regularization. The focus is on understanding generalization, not just fitting numbers.
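To make the regularization idea concrete, here is a minimal sketch of how an L2 (Ridge-style) penalty shrinks a regression slope. It is not taken from the lab notebooks: the function name `fit_ridge_1d`, the penalty value, and the area and price figures are all assumptions made up for illustration, and it uses a single feature so the closed-form solution stays readable.

```python
def fit_ridge_1d(x, y, alpha):
    """Fit y = intercept + slope * x with an L2 penalty `alpha` on the slope.

    Hypothetical helper for illustration; the intercept is left unpenalized.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centered sums of products and squares
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    # alpha = 0 recovers ordinary least squares; larger alpha shrinks the slope
    slope = sxy / (sxx + alpha)
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up data, for illustration only: area in square meters, price in thousands
area = [40, 55, 70, 85, 100]
price = [82, 105, 150, 160, 210]

ols = fit_ridge_1d(area, price, alpha=0.0)
ridge = fit_ridge_1d(area, price, alpha=500.0)

print("OLS slope:  ", ols[1])
print("Ridge slope:", ridge[1])
```

The penalized slope is smaller in magnitude than the unpenalized one. That shrinkage is the mechanism regularization uses to trade a little training fit for better generalization.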
Project 3 moves from independent and identically distributed data to time-indexed data, using Nairobi air quality measurements stored in MongoDB. Students learn to query, prepare, and forecast time series, building from lagged regression to AR and ARMA models. The key conceptual shift: in time series, the past is a feature.
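The idea that "the past is a feature" can be sketched in a few lines: pair each reading with the one before it, then fit an ordinary least-squares line to those pairs. This is only an illustrative sketch, not the lab's code. The readings are made up, and `make_lagged_pairs` and `fit_line` are hypothetical helper names.

```python
def make_lagged_pairs(series, lag=1):
    """Pair each value with the value `lag` steps earlier (hypothetical helper)."""
    return series[:-lag], series[lag:]


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Made-up hourly readings, for illustration only
readings = [12.0, 13.5, 13.0, 14.2, 15.1, 14.8, 16.0, 16.4]

# The previous reading is the feature; the current reading is the target
previous, current = make_lagged_pairs(readings, lag=1)
intercept, slope = fit_line(previous, current)

# One-step-ahead forecast from the most recent observation
forecast = intercept + slope * readings[-1]
print("Next-step forecast:", forecast)
```

Fitting a line to one lagged value is the simplest form of an autoregressive model. AR and ARMA models extend the same idea with more lags and with past forecast errors.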
Project 4 introduces classification using earthquake-damage data from Nepal stored in SQLite. Students work through the full pipeline — querying relational data, training logistic regression and decision tree models, and evaluating performance with metrics suited for discrete outcomes. It closes Unit 1 with a real humanitarian decision-making context.
Project 5 opens Unit 2 by scaling up to ensemble methods. Using corporate bankruptcy records from Poland and Taiwan in JSON format, students build Random Forest and Gradient Boosting classifiers and learn to handle severe class imbalance. The emphasis is on moving from a working model to a production-ready prediction pipeline.
Project 6 introduces unsupervised learning by segmenting consumer finance data. Without a target variable, students explore how clustering algorithms reveal latent structure in financial data. The key idea: clustering results are analytical constructs, not ground truth, and interpretability matters as much as the algorithm.
Project 7 shifts from prediction to experimentation. Using applicant data from the WQU Data Science Lab admissions process stored in MongoDB, students build ETL pipelines, test hypotheses with chi-square statistics, and communicate results through interactive dashboards. The question is no longer “what will happen?” but “did this intervention work?”
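As a rough illustration of the kind of test involved, the sketch below computes a Pearson chi-square statistic for a 2x2 table of outcomes by group. The counts are hypothetical and are not drawn from the admissions data. `chi_square_2x2` is an assumed helper name, and the p-value shortcut applies only to the one-degree-of-freedom case shown here.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table.

    Hypothetical helper for illustration; no continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Count expected in this cell if group and outcome were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed_count - expected) ** 2 / expected

    # With 1 degree of freedom, the upper-tail probability reduces to erfc
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = control / treatment,
# columns = completed / did not complete
observed = [[30, 70], [45, 55]]
statistic, p_value = chi_square_2x2(observed)
print("chi-square:", statistic, "p-value:", p_value)
```

A small p-value suggests the difference between groups is unlikely under independence. Whether the intervention caused the difference still depends on how the experiment was designed.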
Project 8 is the capstone. Students model financial volatility using GARCH on real market data while applying test-driven development and proper software engineering practices. The goal is not just a working model but a reproducible, packaged system that could plausibly be deployed.
Lab Outcomes
Unit 1 Learning Outcomes
Explore, Clean, and Visualize Real-World Data
Explore, clean, and visualize real-world datasets to extract patterns and formulate data-driven hypotheses.
Train and Regularize Regression Models
Build, train, and evaluate supervised regression models, including regularization strategies for controlling model complexity.
Analyze and Forecast Time Series Data
Analyze time-indexed data and apply forecasting techniques using temporal modeling frameworks.
Classify Outcomes and Interpret Results
Implement classification models, evaluate performance on discrete outcomes, and interpret results in applied contexts.
Unit 2 Learning Outcomes
Apply Ensemble Methods to Imbalanced Problems
Apply ensemble methods — Random Forests and Gradient Boosting — to high-stakes classification problems with class imbalance.
Segment Customers with Clustering Algorithms
Perform unsupervised learning and customer segmentation using clustering algorithms, with emphasis on interpretability and feature selection.
Design Experiments and Build Interactive Dashboards
Design and evaluate data-driven experiments using hypothesis testing, ETL pipelines, and interactive dashboards.
Deploy Reproducible End-to-End Data Science Systems
Build reproducible, deployable end-to-end data science systems that integrate modeling, testing, and software engineering best practices.