Big Data Masters Program

    Introduction to Big Data

    1
    Introduction to Big Data
    2
    Big Data System Requirements
    3
    Monolithic vs Distributed System
    4
    Distributed System Architecture
    5
    What is Hadoop And Evolution of Hadoop
    6
    Core Components of Hadoop
    7
    HDFS Architecture:
    8
    What is Node And What is Cluster
    9
    Data Block & Block Size
    10
    Slave Node, Master Node, Data Node & Name Node
    11
    Metadata And Replication Factor
    12
    Heart Beat & Fault Tolerance
    13
    Handling Namenode Failure
    14
    What is SPOF
    15
    FSimage & Edit Logs
    16
    Secondary Namenode
    17
    Name Node Recovery
    18
    Check Pointing
    19
    Understanding Replication Factor
    20
    What is Rack And Rack Failure
    21
    Rack Awareness Mechanism
    22
    Block Report
    23
    Namenode High Availability
    24
    Quorum Journal Manager & Quorum Journal Node
    25
    Understanding Linux File System
    26
    List & Parameters of List Command
    27
    Touch, Mkdir, Rmdir & Other Linux Commands
    28
    HDFS Commands:
    29
    List Files & Directories
    30
    How HDFS Commands Work
    31
    ‘ls’ Command With Various Parameters
    32
    Create, Remove File/Directory
    33
    Copy & Get Files/Folders From Local to HDFS & Vice Versa
    34
    Move Files/Folders From HDFS to HDFS
    35
    Change Replication Factor Dynamically
    36
    View File Metadata Information

    MapReduce - Distributed Computing Framework

    1
    Introduction to MapReduce
    2
    Stages in MapReduce
    3
    What is Key-Value
    4
    What is Map & What is Reduce
    5
    Example to Understand Map & Reduce
    6
    Word Count Example in MapReduce
    7
    Record Reader
    8
    Mapper Phase
    9
    Reducer Phase
    10
    MapReduce Shuffle & Sort
    11
    Inside Map & Reduce Phase
    12
    Wordcount Example in MapReduce
    13
    Typical MapReduce Flow
    14
    Blocks in MapReduce
    15
    Default Number of Mappers & Reducers
    16
    Understanding Number of Mappers/Reducers
    17
    MapReduce Framework Behind the Scenes
    18
    Role of Hash Function in MapReduce
    19
    Partitioning in MapReduce
    20
    How to Choose Number of Reducers
    21
    How Hash Function Works
    22
    Understanding Shuffle & Sort
    23
    Example: Calculating Max Temperature in a Day
    24
    Combiner Function in MapReduce
    25
    Advantages of Combiners
    26
    When to Use or Not to Use Combiner
    27
    Example1: Filtering Data using MapReduce
    28
    Example2: Finding Distinct Values
    29
    Example3: Finding Top 3 Most Influential users
    30
    Realtime Use Case: Google Web Search
    31
    MapReduce Programming
    32
    MR Code Explanation
    33
    How to Write MapReduce Code
    34
    Mapper Code
    35
    Reducer Code
    36
    Main Code
    37
    Finding the Frequency of Each Word in a File
    38
    MapReduce Jars
    39
    MapReduce Practical Sessions
    40
    Word Count Program – Practical Session1
    41
    Jar Creation & Execution – Practical Session2:
    42
    How to Create a Jar
    43
    How to Execute the Jar
    44
    How to Track a Job
    45
    How to Track All Previous Jobs
    46
    MR Program Variations – Practical Session3:
    47
    How to Change Number of Reducers
    48
    Writing Custom Partitioner Logic
    49
    Changing Number of Reducers to Zero
    50
    Introducing Combiner
    51
    Writing Custom Combiner logic
    52
    Introduction to Partitioners
    53
    Partitioners Code Example

    Apache Sqoop - Data Ingestion to Hadoop

    1
    Sqoop Fundamentals
    2
    Sqoop Basics
    3
    What is Sqoop
    4
    Sqoop Workflow
    5
    Key Features of Sqoop
    6
    Sqoop Import
    7
    Sqoop Export
    8
    Connecting to MySQL
    9
    Accessing MySQL Databases from Hadoop
    10
    Accessing MySQL Tables from Hadoop
    11
    Sqoop Import Practicals
    12
    Sqoop Export Practicals
    13
    Sqoop Job
    14
    Sqoop Incremental Load
    15
    Sqoop Default Import
    16
    Sqoop Free-Form Query Import
    17
    Sqoop Direct Import
    18
    Importing Data Into Hive
    19
    Importing Data Into HBase
    20
    Sqoop Validate
    21
    When a Sqoop Export May Fail

    Apache Hive

    1
    Hive Overview:
    2
    Transactional System and Analytical System
    3
    Examples of Transactional Systems
    4
    Examples of Analytical Systems
    5
    What is Hive
    6
    Hive Query Language (HQL)
    7
    Understanding Hive Table
    8
    Introduction to Hive Metadata
    9
    Why Hive over traditional databases
    10
    Transactional and Analytical Processing
    11
    What is Data Warehouse
    12
    Hive Architecture
    13
    Hive on top of Hadoop
    14
    How Hive Works
    15
    Transactional vs Analytical Processing
    16
    Data Warehouse Concept
    17
    The Hive Metastore
    18
    Hive vs RDBMS
    19
    HQL vs SQL
    20
    Hive Subqueries Views & Index
    21
    Transactional and Analytical Processing
    22
    What is Data Warehouse
    23
    Hive Architecture
    24
    Hive on Hadoop
    25
    Hive Metastore
    26
    Hive vs. RDBMS
    27
    Hive Complex Data Types
    28
    Hive Array, Map & Struct
    29
    Hive Built-in Functions
    30
    Hive UDF, UDAF & UDTF
    31
    Hive Lateral Views
    32
    Hive Subqueries
    33
    Hive Views
    34
    Hive Normalization vs Denormalization

    Apache Hive Advanced

    1
    Hive Structure Level Optimizations:
    2
    Hive Partitioning
    3
    Hive Partitioning With 2 Columns
    4
    Hive Bucketing
    5
    Hive Partitioning With Bucketing
    6
    Hive Query Level Optimizations:
    7
    Hive Join Optimizations
    8
    Hive Bucket Map Join Optimizations
    9
    Hive Window Functions
    10
    Hive Ranking
    11
    Hive Sorting
    12
    Hive File Format
    13
    Row vs Column File Formats
    14
    Specialized File Formats
    15
    Internals of ORC File Formats
    16
    Internals of Parquet File Formats
    17
    ORC vs Parquet File Formats
    18
    Hive Compression Techniques
    19
    Hive Vectorization
    20
    Changing the Hive Engine
    21
    Hive Thrift Server

    NoSQL Databases - HBase

    1
    HBase Basics
    2
    Key requirements of database
    3
    Limitations of Hadoop
    4
    Google Bigtable concept for quick searching
    5
    Implementation of Bigtable as HBase
    6
    Properties of HBase
    7
    What HBase can offer
    8
    Row based storage vs Columnar storage
    9
    Advantages of columnar storage
    10
    Normalization vs Denormalization
    11
    CRUD Operation
    12
    RDBMS vs HBase
    13
    HBase data model
    14
    4-Dimensional data model
    15
    CAP Theorem
    16
    HBase Architecture
    17
    HBase Region Server
    18
    Region, MemStore, WAL & Block Cache
    19
    HFile
    20
    ZooKeeper
    21
    HMaster & Meta Table
    22
    HBase Architecture components in detail
    23
    HBase Read/Write operations
    24
    Compaction
    25
    HBase Data Update
    26
    HBase Data Deletion
    27
    Handling Server Failures
    28
    HBase Practicals
    29
    Handling HBase Failure Services
    30
    Create & List Table
    31
    Insert Records in Table
    32
    Scan(view) & Get records from table
    33
    Delete a column
    34
    Describe a table
    35
    Check table exists or not
    36
    Drop table – Understanding how it works
    37
    Parameters of get command
    38
    Parameters of scan command
    39
    HBase file structure in HDFS
    40
    How to disable/enable a table
    41
    Various filters in HBase
    42
    Count Records

    NoSQL Databases - Cassandra Overview

    1
    What is Cassandra
    2
    What a Cassandra Cluster Looks Like
    3
    Tunable read/write Consistency
    4
    HBase vs Cassandra
    5
    Integration with Hadoop (Mini Project)
    6
    HBase-Hive Integration

    Learning Scala - A Guide to Functional Programming

    1
    Why Scala
    2
    Where to Run Scala Code
    3
    Scala Code Using IDE
    4
    Scala Basics
    5
    Var vs val
    6
    Type inference
    7
    Data types in Scala
    8
    String Interpolation
    9
    String Comparison
    10
    Flow control: If else
    11
    Match Case
    12
    For Loop
    13
    While loop
    14
    Scala Functional Programming
    15
    How to define a function
    16
    Higher order function
    17
    Anonymous function
    18
    Scala Collections
    19
    Array
    20
    List
    21
    Tuple
    22
    Range
    23
    Set
    24
    Map
    25
    Scala Functional Programming:
    26
    Why Scala
    27
    Modes of writing Scala code
    28
    What is functional programming
    29
    What is a function
    30
    What is a pure function?
    31
    First class function
    32
    Higher order function
    33
    Anonymous function
    34
    Immutability
    35
    Loop
    36
    Recursion
    37
    Tail recursion
    38
    Statement vs Expression
    39
    Closure
    40
    Scala type system
    41
    Scala operators
    42
    Anonymous function
    43
    Placeholder syntax
    44
    Partially applied functions
    45
    Function currying

    Apache Spark - General Purpose Cluster Computing Framework

    1
    What is App class in Scala
    2
    Default args, named args & variable args
    3
    Difference between Nil, null, None & Nothing
    4
    What is Option in Scala
    5
    What is Unit in Scala
    6
    Dealing with nulls in Scala
    7
    What is yield
    8
    What is Vector
    9
    Scala if guards & pattern guards
    10
    What is “for comprehensions”
    11
    Difference between “==” in Java and Scala
    12
    Difference between strict val vs lazy val
    13
    What are default packages in Scala
    14
    What is Scala apply method
    15
    What is the diamond problem in Scala
    16
    What is a trait
    17
    Why Scala is a top choice for big data
    18
    What is Apache Spark

    Apache Spark Introduction

    1
    What is Apache Spark
    2
    Understanding Spark cluster
    3
    Is Spark a replacement for Hadoop
    4
    Why Spark is faster than MapReduce
    5
    How data is stored in Spark
    6
    What is RDD
    7
    What is DAG
    8
    RDD Lineage
    9
    Resiliency
    10
    Immutability
    11
    Transformation & Action
    12
    Lazy Evaluation
    13
    Word count program in Spark
    14
    Word count program in PySpark
    15
    Word count problem real-time example

    Apache Spark - Advanced

    1
    Spark Real-Time Example
    2
    Broadcast Variable
    3
    Accumulators
    4
    How Spark Executes Program on the Cluster
    5
    Spark Driver and Executors
    6
    Client Mode, Cluster Mode and Local Mode Analyzing Log Messages – Hands on
    7
    Narrow vs Wide Transformations
    8
    Stages in Spark
    9
    Difference Between reduceByKey & reduce
    10
    Difference Between groupByKey & reduceByKey
    11
    Pair RDD
    12
    Pair RDD vs Map
    13
    Understanding Default Parallelism
    14
    Difference Between repartition & coalesce
    15
    When to Increase/Decrease Partitions
    16
    Spark on YARN Architecture
    17
    YARN – Yet Another Resource Negotiator
    18
    Application Master
    19
    Containers

    Apache Spark - Structured API Part-1

    1
    Cache vs Persist
    2
    Spark Storage Levels
    3
    Difference Between DAG & Lineage
    4
    How to Submit a Spark Job
    5
    Real-time example – Finding top movies based on ratings
    6
    Spark Ecosystem
    7
    Map vs Map Partitions
    8
    Introduction to Spark Structured API
    9
    Spark DataFrame
    10
    Understanding SparkSession
    11
    SparkSession vs SparkContext
    12
    Dataframe with Various Transformations
    13
    RDD vs DataFrame vs Datasets
    14
    Challenges with DataFrame
    15
    Spark Dataset API
    16
    Difference Between DataFrame and Dataset
    17
    Benefits of Dataset
    18
    Creating Dataframe/Datasets from Various File Formats
    19
    Read Modes & Schema
    20
    Ways to Define the Schema
    21
    Defining an Explicit Schema

    Apache Spark - Structured API Part-2

    1
    Writing Output to Sink (spark.write)
    2
    Spark File Layout
    3
    Benefits of Repartitions
    4
    partitionBy & bucketBy
    5
    Saving Files in Various File Formats
    6
    Introduction to Spark SQL
    7
    Storing Data in Persistent Manner
    8
    Handling Spark Metadata
    9
    Low & High level Transformations
    10
    Referring to a Column in Dataframe/Dataset
    11
    Column String
    12
    Column Object
    13
    Column Expression
    14
    Spark UDF using Structured API
    15
    Adding Column in Dataframe
    16
    Dataframe to Dataset Using Case Class
    17
    Dataset to DataFrame Conversion
    18
    Spark Catalog
    19
    Registering UDF with Driver
    20
    Transformations Hands on Examples
    21
    Aggregate Transformations
    22
    Simple Aggregations
    23
    Grouping Aggregations
    24
    Window Aggregations
    25
    Joins on DataFrame
    26
    Simple Join (Shuffle Sort Merge Join)
    27
    Broadcast Join
    28
    Dealing With Ambiguous Column Names
    29
    Dealing With Nulls
    30
    Internals of Join Operations
    31
    When to Use Simple Join vs Broadcast Join
    32
    Grouping Aggregation Real-time Example
    33
    Inferring Data in Spark SQL
    34
    Quiz
    35
    Assignment
    36
    Assignment Solution

    Apache Spark - Optimization Part-1

    1
    Level of Optimizations
    2
    Resource level optimizations
    3
    Application level optimizations
    4
    Cluster level optimizations
    5
    How to calculate no of Executors
    6
    Thin Executor
    7
    Fat Executor
    8
    How to calculate no of Executors
    9
    How to Calculate Memory Allocation
    10
    How to Calculate No of Cores
    11
    Heap Memory
    12
    Off-Heap Memory
    13
    Hands on With Real-time cluster
    14
    Understanding Cluster Configurations
    15
    Real-time Example: Moving Data to HDFS Using an Edge Node and Working With It in a Real-time Cluster
    16
    Static Resource allocation
    17
    Dynamic Resource allocation
    18
    Understanding Memory Usage in Spark
    19
    Execution Memory
    20
    Storage Memory
    21
    Practical Demonstration: Cache & Persist
    22
    Java Serializer vs Kryo Serializer
    23
    Quiz
    24
    Assignment
    25
    Assignment Solution

    Apache Spark - Optimization Part-2

    1
    Broadcast Join Practical Demonstrations
    2
    Broadcast Join Using RDD
    3
    When to Use Broadcast Join
    4
    Broadcast Join Using Dataframe
    5
    Visualizing Broadcast Join with Structured API
    6
    Practical Demo on Repartition vs Coalesce
    7
    Client Mode vs Cluster Mode When using Spark submit
    8
    Spark Join Optimizations
    9
    Spark Advanced Optimizations: Sort Aggregate vs Hash Aggregate
    10
    Spark Catalyst Optimizer
    11
    Quiz
    12
    Assignment
    13
    Assignment Solution

    Apache Spark - Streaming

    1
    What is Real-time Processing
    2
    The Importance of Real-time Processing
    3
    Batch processing vs Real-time Stream Processing
    4
    Spark Streaming Data
    5
    Spark discretized stream or DStream
    6
    Batch & Batch Interval
    7
    Is Spark a real-time streaming engine
    8
    Stream Processing in Spark
    9
    Transformed DStream
    10
    Understanding Producer & Consumer
    11
    Practical on Real-time Processing
    12
    Stream Transformations
    13
    Stateless Transformations
    14
    Stateful Transformations
    15
    Window Operations
    16
    Batch Interval
    17
    Window Size
    18
    Sliding Interval
    19
    Practical on Stateless Transformation
    20
    Practical on Stateful Transformation
    21
    reduceByKey vs updateStateByKey
    22
    Working With Sliding Window
    23
    reduceByKeyAndWindow Transformation
    24
    reduceByWindow Transformation
    25
    countByWindow Transformation
    26
    Quiz
    27
    Assignment
    28
    Assignment Solution

    Apache Spark - Streaming Part-2

    1
    What Is Structured Streaming
    2
    Requirement Of Structured Streaming
    3
    Limitations Of Spark Streaming
    4
    Benefits Of Spark Structured Streaming
    5
    Practical – Wordcount Example On Structured Streaming
    6
    Dynamically Setting The Shuffle Partitions
    7
    Data Stream Writer Output Modes
    8
    Datastream Output Modes – append, update & complete
    9
    Spark Streaming Graceful Shutdown
    10
    How Does Spark Streaming Code Execute Internally
    11
    How a Job Is Converted to Micro-batches
    12
    Trigger Point For Micro Batches
    13
    Types of Triggers – unspecified, time interval, one time, continuous
    14
    Types of Data Sources – Socket Source, Rate Source, File Source, Kafka Source
    15
    Limitations of socket source
    16
    Practical on File Data Source
    17
    Types of Spark Streaming Output Data Options
    18
    Fault Tolerance and Exactly Once Guarantee
    19
    Understanding Checkpoint Location
    20
    Stateful vs Stateless Transformations
    21
    Managed Stateful Operations vs UnManaged Stateful Operations
    22
    Types of Aggregations – Continuous Aggregations vs Time Bound Aggregations
    23
    Window Transformations
    24
    updateStateByKey, reduceByKeyAndWindow, reduceByWindow, countByWindow
    25
    Types of windows – Tumbling Time Window, Sliding Time Window
    26
    Streaming Joins
    27
    Streaming Dataframe to static dataframe
    28
    Streaming Dataframe With Another Streaming Dataframes
    29
    Quiz
    30
    Assignment
    31
    Assignment Solution

    Apache Kafka - Distributed Event Streaming Platform

    1
    Introduction To Kafka
    2
    Kafka Architecture
    3
    Kafka Key Concepts/Fundamentals
    4
    Overview Of ZooKeeper And Its Role In Kafka Cluster
    5
    Cluster, Nodes, Brokers, Topics
    6
    Consumer, Producers, Logs, Partitions
    7
    Concept Of Consumer Groups
    8
    Leader & Follower Partition
    9
    Installing One Node Kafka Cluster On Local
    10
    Installing Multi Broker Kafka Cluster On Local
    11
    Command Line Producer And Consumer
    12
    Replication Concept For Fault Tolerance
    13
    How Data Is Stored In Brokers
    14
    Log Segments, Message Offsets, Message Index
    15
    ISR List / Minimum ISR
    16
    Committed Vs Uncommitted Messages
    17
    Writing A Kafka Producer In Java
    18
    Writing A Kafka Consumer In Java
    19
    Achieving Exactly Once Semantics
    20
    Integrating Kafka With Spark Structured Streaming
    21
    Quiz
    22
    Assignment
    23
    Assignment Solution

    Big Data on Cloud

    1
    AWS EMR (Elastic MapReduce)
    2
    What is a VM (Virtual Machine)
    3
    On-Premise vs Cloud Setup
    4
    Major Vendors of Hadoop Distribution
    5
    Why Cloud & Big Data on Cloud
    6
    Major Cloud Providers of Big Data
    7
    What is EMR
    8
    HDFS vs S3
    9
    What Is S3
    10
    Important Instances in AWS
    11
    Kinds of Nodes in Cluster
    12
    Transient vs Long Running Cluster
    13
    Running Spark Code on EMR
    14
    How to Track Your Job
    15
    Copy File From S3 to Local
    16
    Zeppelin Notebook
    17
    Types of EC2 Instances
    18
    How to Create a VM
    19
    What is a Keypair
    20
    Elastic IP
    21
    AWS Storage, Networking & CLI
    22
    Instance Store
    23
    S3 & EBS
    24
    Public IP Vs Private IP
    25
    Network Switches
    26
    Security Group
    27
    AWS Command Line Interface
    28
    Launch an EMR Cluster Using Advanced Options

    AWS Athena:

    1
    What is Athena
    2
    When do we require Athena
    3
    What Problem Athena Solves
    4
    How Athena Works
    5
    Athena Practical Demonstration
    6
    How to create a normal table manually on CSV data residing in S3
    7
    How to minimize data scanning in Athena
    8
    How to create partition table on Parquet file
    9
    Inferring Schema automatically using AWS Glue
    10
    Glue Catalog
    11
    Quiz
    12
    Assignment
    13
    Assignment Solution

    Final Project

    1
    One end-to-end pipeline project involving all major components such as Sqoop, HDFS, Hive, HBase, Spark, etc.
    2
    Interview Preparation Tips:
    3
    Resume Building
    4
    15+ Mock Interview Recordings
    5
    Mock Interview
    6
    Interview Questions
    7
    How to Handle Managerial Round Questions