Data cleaning: Tidying up messy datasets (online)

This highly practical hands-on course empowers data analysts and subject matter experts to use Python to clean, question and visualise messy structured and unstructured datasets.

About this course

This course is a collaboration between UTS and Coder Academy, aimed at data professionals - whether you model data or interrogate it for conclusions.

Whether your job is to analyse, extract, or draw conclusions from data, this course will give you new tools to clean and process the large number of messy datasets out there.

It’s reported that data scientists spend between 50 and 80% of their time on data cleaning and feature engineering, which can be the most time-consuming and least enjoyable part of their work. This course aims to help data professionals work through that stage faster and become more proficient at data manipulation.

For more information on the UTS & Coder Academy course collaboration, or to contact the Coder Academy team directly, follow this link.

Course structure

This intensive course is conducted over two three-hour evening sessions and covers:

  • Intro to Data Cleaning
    • What is Data Cleaning?
    • What is Feature Engineering?
    • Why should we learn how to clean data?
    • What are the best Data Cleaning Tools?
    • Misconceptions about cleaning data
  • Data Cleaning best practices
  • Cleaning an all-numerical dataset
  • Flash introduction to regular expressions in Python
  • Cleaning a text-only dataset
  • Feature Engineering
  • Exploring our cleaned datasets
    • Descriptive statistics
    • Data Visualisation
  • Making sure anyone can reproduce our results using the same data


Learning outcomes

  • Become adept at exploring different datasets and spotting inconsistencies in the data
  • Clean messy datasets and prepare them for analysis
  • Use different merging techniques to combine datasets
  • Version control your code and prepare it for future use


Who is this course for?

This course is designed for professionals or researchers with a working knowledge of Python who are frustrated with the challenges of working with messy or unstructured data. Attendees might include business analysts, consultants, data analysts, digital marketers, data journalists, librarians, and researchers.



04 August




6 hrs

Meet the Expert

Ramon Perez

Ramon is a data scientist and instructor at Coder Academy and a research associate at INSEAD. He works at the intersection of education, data science, and research in the areas of entrepreneurship and strategy. He has previously worked in consumer behaviour and development economics research in professional and academic settings, helping multinational companies understand their customers better and developing new methods to study the levels of financial literacy across the globe. Ramon holds a BSc in economics, finance and marketing, and an MA in Economics. In his spare time, he enjoys cycling, baseball, CrossFit, and finding new coffee shops around Sydney.

Book a session

Tue 04 Aug 2020 - Thu 06 Aug 2020
Expert: Ramon Perez
  • Online via Canvas virtual classroom
  • Online
  • 2 sessions, 6 hours total

This online course runs for a small group of up to 50 participants, and registration for each session closes at midnight the day prior to the class.
