LUCENT GRID
Data Engineering Lab ยท Data Cleaning & Normalisation
Loading dirty dataset...
๐Ÿ–ฅ๏ธ
SQL Terminal
โš ๏ธ
Dirty Data
โœ…
Before/After
๐Ÿ“Œ OBJECTIVES
โ—‹Find NULL values
โ—‹Fix inconsistent casing
โ—‹Remove duplicate rows
โ—‹Standardise date format
โ—‹Trim whitespace
โ—‹Verify clean dataset
0 / 6 complete
SQL TERMINAL โ€” retail_db.customers
MySQL 8.0 โ€” retail_db loaded Table: customers (312 rows โ€” imported from legacy CRM) WARNING: Data quality issues detected โ€” see Dirty Data window โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
mysql>
DIRTY DATA PREVIEW โ€” customers table
idfirst_namelast_nameemailcountryjoined_date
1 Alice SMITHalice@mail.comuk15/01/2023
2BobjonesBOB@MAIL.COMUSA2023-02-20
3NULLNULLNULLFR2023-03-01
4Bobjonesbob@mail.comus20/02/2023
5Carol Whitecarol@mail.comGB01-04-2023
6DavidBrowndavid@mail.comGB2023-05-10
โš  Issues: whitespace, inconsistent casing, NULLs, duplicates, mixed date formats
BEFORE / AFTER VIEWER
DATA QUALITY LOG
Run cleaning queries to see changes appear here.
โ˜ฐ BRIEF
๐Ÿ–ฅ Terminal
โš ๏ธ Dirty Data
โœ… Before/After
MISSION BRIEF

MISSION BRIEFING

DATA CLEANING โ€” THORNFIELD RETAIL CRM

SCENARIO

Thornfield Retail migrated their customer database from a legacy CRM three months ago. The export was messy โ€” inconsistent casing, trailing spaces, NULL records, duplicate rows, and three different date formats. Marketing cannot use the data until it is clean. That is your job.

ISSUES TO FIX

  • Trailing/leading whitespace on names
  • Inconsistent casing (SMITH, jones, MAIL.COM)
  • NULL first_name / last_name / email rows
  • Duplicate customers (same email)
  • Mixed date formats (DD/MM/YYYY, YYYY-MM-DD, MM-DD-YYYY)

USEFUL COMMANDS

  • SELECT * FROM customers WHERE email IS NULL;
  • UPDATE customers SET first_name = TRIM(INITCAP(first_name));
  • UPDATE customers SET email = LOWER(email);
  • DELETE FROM customers WHERE id IN (SELECT MIN(id)...);
  • UPDATE customers SET joined_date = STR_TO_DATE(joined_date,'%d/%m/%Y') WHERE joined_date LIKE '%/%';
  • SELECT COUNT(*) FROM customers WHERE first_name IS NULL;
  • SELECT * FROM customers ORDER BY email;

TIP

Work through each issue one at a time. Check your results with a SELECT after each UPDATE. The Before/After viewer logs your cleaning actions.

Display Mode
LAB COMPLETE
๐Ÿ†
DATA CLEANED
Data Cleaning & Normalisation โ€” Thornfield Retail
โšก
+50 XP
EXPERIENCE EARNED
โฌ† LEVEL UP โ€” Level
โ†’ Return to Labs