Apache Avro: Data serialization for distributed applications Training Course

Course Code



Duration

14 hours (usually 2 days including breaks)


Requirements

A general familiarity with distributed computing


This course is intended for

  • Developers

Format of the course

  • Lectures, hands-on practice, small tests along the way to gauge understanding

Course Outline

Principles of distributed computing

  • Apache Spark
  • Hadoop

Principles of data serialization

  • How a data object is passed over the network
  • Serialization of objects
  • Serialization approaches
    • Thrift
    • Protocol Buffers
    • Apache Avro
      • data structure
      • size, speed, format characteristics
      • persistent data storage
      • integration with dynamic languages
      • dynamic typing
      • schemas
        • untagged data
        • change management
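The outline's points about untagged data, size and dynamic typing are easier to see in Avro's binary encoding. The sketch below hand-rolls a tiny subset of that encoding in Python: only int/long, boolean, string and record. It follows the rules in the Avro specification. Integers are zig-zag varints. A string is a length followed by its UTF-8 bytes. Record fields are written back to back in schema order, with no field names or tags, so a reader needs the schema to decode anything. The `User` schema and its fields are hypothetical, and the function names are not the API of the official `avro` package. This is only an illustration; real projects would use a library.

```python
import json
from typing import Any, Tuple

# Hypothetical example schema; the record and field names are illustrative only.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "active", "type": "boolean"}
  ]
}
""")


def write_long(value: int, out: bytearray) -> None:
    """Zig-zag encode a signed 64-bit integer, then emit it as a base-128 varint."""
    n = (value << 1) ^ (value >> 63)  # small magnitudes -> small unsigned numbers
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return


def read_long(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a zig-zag varint starting at pos; return (value, next position)."""
    n = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        n |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (n >> 1) ^ -(n & 1), pos


def write_datum(schema: Any, datum: Any, out: bytearray) -> None:
    """Encode datum using only the schema: no field names or tags are written."""
    kind = schema["type"] if isinstance(schema, dict) else schema
    if kind in ("int", "long"):  # both use zig-zag varints on the wire
        write_long(datum, out)
    elif kind == "boolean":
        out.append(1 if datum else 0)
    elif kind == "string":  # length prefix, then UTF-8 bytes
        data = datum.encode("utf-8")
        write_long(len(data), out)
        out.extend(data)
    elif kind == "record":  # fields back to back, in schema order
        for field in schema["fields"]:
            write_datum(field["type"], datum[field["name"]], out)
    else:
        raise NotImplementedError(f"type {kind!r} is not covered by this sketch")


def read_datum(schema: Any, buf: bytes, pos: int) -> Tuple[Any, int]:
    """Decode a value; the schema alone says how to read each field."""
    kind = schema["type"] if isinstance(schema, dict) else schema
    if kind in ("int", "long"):
        return read_long(buf, pos)
    if kind == "boolean":
        return buf[pos] == 1, pos + 1
    if kind == "string":
        length, pos = read_long(buf, pos)
        return buf[pos:pos + length].decode("utf-8"), pos + length
    if kind == "record":
        record = {}
        for field in schema["fields"]:
            record[field["name"]], pos = read_datum(field["type"], buf, pos)
        return record, pos
    raise NotImplementedError(f"type {kind!r} is not covered by this sketch")


if __name__ == "__main__":
    user = {"id": 42, "name": "Ada", "active": True}
    buf = bytearray()
    write_datum(SCHEMA, user, buf)
    print(buf.hex())  # prints 540641646101 (6 bytes)
    print(read_datum(SCHEMA, bytes(buf), 0))
```

The whole record occupies six bytes: one byte for `id`, four for `name` (a length byte plus three characters) and one for `active`. Decoding yields a plain dictionary driven by the schema at run time, with no generated classes. This is the property the outline refers to as integration with dynamic languages.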

Data serialization and distributed computing

  • Avro as a subproject of Hadoop
    • Java serialization
    • Hadoop serialization
    • Avro serialization
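Avro serialization differs from Hadoop's schema-less Writable types in one key respect. Avro data is always read with the writer's schema at hand, and the Avro specification defines how that schema is resolved against a possibly newer reader schema. Fields are matched by name. A reader field missing from the writer takes its declared default, and it is an error if there is no default. Writer fields unknown to the reader are ignored.

The sketch below applies those record rules to a datum that has already been decoded with the writer's schema. This is a simplification, since Avro libraries resolve while decoding. It also assumes field types match exactly and leaves out type promotion, aliases and nested records. Both schemas and the `resolve_record` helper are hypothetical.

```python
import json
from typing import Any, Dict

# Hypothetical schemas: version 1 (used by the writer) and version 2 (used by the reader).
WRITER_SCHEMA = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "id", "type": "long"},
            {"name": "name", "type": "string"},
            {"name": "legacy_flag", "type": "boolean"}]}
""")
READER_SCHEMA = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "id", "type": "long"},
            {"name": "name", "type": "string"},
            {"name": "email", "type": "string", "default": ""}]}
""")


def resolve_record(writer_schema: Dict[str, Any], reader_schema: Dict[str, Any],
                   datum: Dict[str, Any]) -> Dict[str, Any]:
    """Apply Avro's record resolution rules to a datum decoded with writer_schema.

    Simplified: field types are assumed to match exactly; type promotion,
    aliases and nested-record resolution are left out.
    """
    if writer_schema["name"] != reader_schema["name"]:
        raise ValueError("record names differ; schemas cannot be resolved")
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            resolved[name] = datum[name]  # matched by name, not by position
        elif "default" in field:
            resolved[name] = field["default"]  # field added later: use its default
        else:
            raise ValueError(
                f"reader field {name!r} has no default and is absent from the writer schema")
    # Writer-only fields (here, legacy_flag) are dropped.
    return resolved


if __name__ == "__main__":
    old_record = {"id": 7, "name": "Ada", "legacy_flag": True}
    print(resolve_record(WRITER_SCHEMA, READER_SCHEMA, old_record))
    # prints {'id': 7, 'name': 'Ada', 'email': ''}
```

Defaults are what make schema changes safe to roll out. A field added with a default can be read from old data, while a field added without one breaks readers of existing files.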

Using Avro with

  • Hive (AvroSerDe)
  • Pig (AvroStorage)

Porting existing RPC frameworks

Bookings, Prices and Enquiries

Guaranteed to run even with a single delegate!

Private Classroom

From £2500

Private Remote

From £2200

