Mastering Big Data

Topics Overview (non-exhaustive)

Introduction to Big Data

  • Introduction to Big Data concepts, problems and challenges.
  • Exploring Big Data storage and processing concepts.

Hadoop Distributions

  • Overview of the market-leading Hadoop distributions, such as Cloudera, Hortonworks and MapR
  • How to choose a Hadoop distribution

Big Data Processing Tools

Students will learn how to manage and analyze structured and unstructured data.

Hands-on workshops explore Big Data storage and processing, focusing on the most common tools, such as:

  • MapReduce
  • Hive
  • Flume
  • Pig
  • HDFS
  • Spark and Spark SQL
  • NiFi
  • Kafka
  • HBase

Data Storage Formats

A detailed presentation of the file formats most widely used in Big Data, such as Avro, Parquet and ORC. Students will also learn how to choose the best file format for a given use case.
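
As a rough illustration of the trade-off behind that choice, the sketch below uses plain Python lists and dictionaries, not the actual Avro, Parquet or ORC libraries. Avro stores data row by row, while Parquet and ORC store it column by column. The sample records and the helper names `to_columns` and `average` are hypothetical.

```python
# Hypothetical sample records. A row-oriented format such as Avro
# keeps all the fields of one record together.
rows = [
    {"user": "a", "country": "FR", "amount": 10.0},
    {"user": "b", "country": "US", "amount": 25.0},
    {"user": "c", "country": "FR", "amount": 5.0},
]


def to_columns(records):
    """Pivot row records into one list per field, the way a columnar
    format (Parquet, ORC) groups values."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


columns = to_columns(rows)

# An analytical query on a single field only needs to read that column.
avg_amount = average(columns["amount"])
print(avg_amount)
```

Column layouts tend to suit analytical scans over a few fields, whereas row layouts tend to suit writing or reading whole records, for example during streaming ingestion.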

Introduction to Spark programming languages

A presentation of the Scala language and of PySpark (the Python API for Spark), with practical exercises in writing Spark applications.

End-to-End Big Data Projects (Workshops)

To validate the knowledge acquired, participants will complete a full project at the end of the training session, which includes:

  • Real-time data collection from Twitter, Meetup, etc.
  • Analyzing, filtering, cleaning and indexing the collected data
  • Creating a real-time dashboard to display the collected data
  • Exporting data to an RDBMS
  • Performing sentiment analysis on the collected data
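
The filtering, cleaning and sentiment analysis steps above can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: the sample messages, the word lists `POSITIVE` and `NEGATIVE`, and the helpers `clean` and `sentiment` are hypothetical, and the workshop project itself would rely on the Big Data tools listed earlier, not on in-memory lists.

```python
import re

# Hypothetical mini-lexicon; a real project would use a trained model
# or a much larger lexicon.
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "slow", "hate"}


def clean(text):
    """Lowercase the text, drop URLs and keep only alphabetic words."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z]+", text)


def sentiment(words):
    """Label a list of words by counting lexicon hits."""
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical collected messages, standing in for a real-time feed.
messages = [
    "I love Spark, it is great! https://example.com",
    "Hive queries are slow and bad",
    "   ",
    "Kafka moves messages",
]

# Filter out messages that are empty after cleaning, then label the rest.
results = [(msg, sentiment(clean(msg))) for msg in messages if clean(msg)]
labels = [label for _, label in results]
print(labels)
```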


Use of JDBC/ODBC connectors to extract data from a Hadoop data lake and visualize it with a tool such as Tableau Software, MS Power BI, MicroStrategy or MS Excel.