SHORT COURSE

Dealing with unstructured data: Get your own data from the web and prepare it for analysis (online)

This course empowers data professionals to tap into the world of unstructured data. It teaches them how to crawl, extract, clean and prepare data from web sources in a range of formats — from text to images and beyond.

About this course

This course is a collaboration between UTS and Coder Academy, aimed at data professionals. Whether your job involves cleaning, manipulating, extracting, or drawing conclusions from data, this course will provide you with new tools to access the massive amounts of data stored on the web.

A recent report from Deloitte Access Economics suggests that the world creates an additional 2.5 quintillion bytes of data each year, with 90 percent of all data in existence created in just the last two years. Unstructured data makes up around 80 percent of the data that organisations produce. This includes emails, social media posts, photos, videos, Word documents, PDFs and instant messages.

Unstructured data presents both huge opportunities and challenges. If managed appropriately, it can support new product development, increase sales leads, help meet compliance requirements, improve data governance, reveal patterns across social media channels, and inform decision making.

On the flip side, poorly managed unstructured data can become a growing and expensive storage problem and a major cyber risk.

This course will provide you with the tools necessary to capture, manipulate, clean, and store unstructured data coming from the web.

For more information on the UTS & Coder Academy course collaboration, or to contact the Coder Academy team directly, follow this link.

Course structure

This intensive course is conducted over two three-hour evening sessions and covers:

  • What is Unstructured Data?
  • Different kinds of unstructured data
  • How do we get unstructured data with Python?
  • Scrapy vs Beautiful Soup vs Selenium
  • Flash introduction to HTML
  • XPath Selectors
  • CSS Selectors
  • Introduction to Scrapy
  • Scraping one website
  • Crawling multiple websites
  • Crawling multiple links in one website
  • Fine-tuning your Spider
  • Data Cleaning
  • Flash intro to Git
  • Making sure our work is reproducible
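
As a rough illustration of the selector and cleaning topics listed above, the sketch below pulls links out of a small HTML fragment with an XPath-style selector and tidies the extracted text. It is not course material: it uses only Python's standard library (`xml.etree.ElementTree`) as a stand-in for Scrapy or Beautiful Soup, and the page fragment, the `course` class name, the URLs, and the `extract_courses` helper are all hypothetical. Real web pages are rarely well-formed XML, which is one reason dedicated scraping libraries are usually preferred.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML fragment standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <ul>
      <li><a class="course" href="/courses/web-data">  Dealing with
          unstructured data </a></li>
      <li><a class="course" href="/courses/git-intro">Flash intro   to Git</a></li>
      <li><a class="nav" href="/cart">Cart</a></li>
    </ul>
  </body>
</html>
"""


def extract_courses(page):
    """Return a list of {'title', 'url'} dicts for links with class 'course'."""
    root = ET.fromstring(page.strip())
    records = []
    # XPath-style selector: every <a> element whose class attribute is "course".
    for link in root.findall(".//a[@class='course']"):
        raw_title = link.text or ""
        # Cleaning step: collapse runs of whitespace and trim both ends.
        title = " ".join(raw_title.split())
        records.append({"title": title, "url": link.get("href")})
    return records


courses = extract_courses(PAGE)
for course in courses:
    print(course["title"], "->", course["url"])
```

The selector skips the navigation link because its class is `nav`, and the whitespace clean-up turns the ragged link text into single-spaced titles ready for analysis.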

 

Learning outcomes

  • Articulate the differences between structured and unstructured data
  • Create custom web crawlers to extract data from websites ranging from single pages to nested, complex sites
  • Clean unstructured data coming from the web
  • Use version control for reproducibility purposes

 

Who is this course for?

This course is designed for professionals or researchers with a working knowledge of Python who are looking to work better with data from multiple sources. Attendees might include business analysts, consultants, data analysts, digital marketers, data journalists, librarians, and researchers.

$450.00

START DATE

14 July

MODE

Online

DURATION

6 hrs

Meet the Expert

Ramon Perez

Ramon is a data scientist and instructor at Coder Academy and a research associate at INSEAD. He works at the intersection of education, data science, and research in the areas of entrepreneurship and strategy. He has previously worked in consumer behaviour and development economics research in professional and academic settings, helping multinational companies understand their customers better and developing new methods to study the levels of financial literacy across the globe. Ramon holds a BSc in economics, finance and marketing, and an MA in Economics. In his spare time, he enjoys cycling, baseball, CrossFit, and finding new coffee shops around Sydney.

Book a session

Tue 14 Jul 2020 - Thu 16 Jul 2020
Expert: Ramon Perez
  • Online via Canvas virtual classroom
  • 2 sessions, 6 hours total

This online course runs for a small group of up to 50 participants, and registration for each session closes at midnight the day prior to the class.
