Is Spark a modified version of Hadoop?

Spark has ------------ data storage. (in-memory)

Spark is 40 times faster than -------------. (Hadoop)

Why do we want a new model after MapReduce?

How does data sharing take place in MapReduce?
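In MapReduce, the only way to share data between jobs is through stable storage (e.g., HDFS): each job writes its output to disk and the next job reads it back, which is slow for iterative algorithms. Spark instead keeps intermediate results in memory. A minimal pure-Python analogy (not actual Hadoop or Spark code; the job function and sample lines are made up for illustration):

```python
import os
import tempfile

# MapReduce-style sharing: a job's output goes to disk,
# and the next job must read it back from disk.
def mapreduce_job(input_lines, out_path):
    results = [line.upper() for line in input_lines]  # stand-in "map" work
    with open(out_path, "w") as f:
        f.write("\n".join(results))

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "job1_output.txt")
    mapreduce_job(["error: disk full", "info: ok"], path)
    # "Job 2" starts by re-reading job 1's output from disk.
    with open(path) as f:
        shared = f.read().splitlines()

# Spark-style sharing: the intermediate result simply stays in memory.
in_memory = [line.upper() for line in ["error: disk full", "info: ok"]]
```

Both paths produce the same data; the difference is that the MapReduce-style path pays a disk round-trip between every pair of jobs.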

What is the key idea behind Spark Programming Model?

RDDs are manipulated through various -------------- operators. (parallel)

RDDs are automatically ------------- on failure. (rebuilt)
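Spark can rebuild lost data because each RDD remembers its lineage: the parent dataset and the transformation that derived it. A toy sketch of that idea, assuming a made-up `Dataset` class (not the real RDD API):

```python
class Dataset:
    """Toy stand-in for an RDD: remembers how it was derived (its lineage)."""
    def __init__(self, data=None, parent=None, transform=None):
        self.data = data            # may be lost (set to None) on "failure"
        self.parent = parent        # lineage: where the data came from
        self.transform = transform  # lineage: how it was derived

    def map(self, fn):
        return Dataset(data=[fn(x) for x in self.data],
                       parent=self, transform=fn)

    def recover(self):
        # Rebuild lost data by re-applying the transform to the parent,
        # rather than keeping a replicated copy of the data itself.
        if self.data is None:
            self.data = [self.transform(x) for x in self.parent.data]
        return self.data

base = Dataset(data=[1, 2, 3])
doubled = base.map(lambda x: x * 2)
doubled.data = None   # simulate losing the computed data on failure
recovered = doubled.recover()
```

The design point: storing the recipe is cheap, so recomputation replaces replication for fault tolerance.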

What are the supported languages in the Spark programming model? (Python, Scala, Java)

What is log mining?

Load error messages from ------------ into --------------- and interactively search for various patterns.
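The classic Spark log-mining example filters a log down to its error messages, keeps them in memory, and then runs interactive searches over them. A pure-Python analogy of that pattern (the log lines are invented; in Spark the filter would be a lazy transformation on a DataFrame/RDD):

```python
log_lines = [
    "ERROR mysql connection refused",
    "INFO request served",
    "ERROR php fatal error",
    "WARN low disk space",
]

# "Transformation": keep only the error messages.
errors = [line for line in log_lines if line.startswith("ERROR")]

# Interactive "queries" over the in-memory errors.
mysql_errors = [line for line in errors if "mysql" in line]
php_errors = [line for line in errors if "php" in line]
```

Because `errors` stays in memory, each new pattern search touches only the small filtered set, not the full log.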

Give Hive Architecture.

Give Spark SQL architecture.

Spark SQL has column-oriented storage and uses ----------- of primitive types. (arrays)

Give the diagram for Spark Platform.

What is GraphX?

GraphX involves parallel -------------- processing.(Graph)

What are the algorithms currently included in GraphX? (PageRank, Connected Components, Triangle Counting)

The ----------- parameter for SparkContext determines which type and size of cluster to use. (master)

Give Master parameter along with its description.

The ------- master parameter runs Spark locally with one worker thread. (local)

The -------- master parameter runs Spark locally with K worker threads. (local[K])

The -------- master parameter connects to a Spark standalone cluster; PORT depends on the config (7077 by default). (spark://HOST:PORT)

The ------- master parameter connects to a Mesos cluster; PORT depends on the config (5050 by default). (mesos://HOST:PORT)
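The master strings above follow fixed textual patterns (per the Spark cluster-mode documentation, the standalone default port is 7077). An illustrative classifier, useful for memorizing the forms; the helper name is mine, not a Spark API:

```python
import re

def describe_master(master):
    """Illustrative helper: classify a Spark master URL string."""
    if master == "local":
        return "run locally with one worker thread"
    m = re.fullmatch(r"local\[(\d+)\]", master)
    if m:
        return "run locally with %s worker threads" % m.group(1)
    if master.startswith("spark://"):
        return "connect to a Spark standalone cluster (port 7077 by default)"
    if master.startswith("mesos://"):
        return "connect to a Mesos cluster (port 5050 by default)"
    return "unknown master"
```

For example, `describe_master("local[4]")` reports four local worker threads, matching the `local[K]` form.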

What is the primary abstraction in Spark?

A DataFrame is -------- once constructed. (immutable)
-------- tracks lineage information to efficiently recompute ------- data. (Spark, lost)

DataFrames enable operations on a collection of elements in --------. (parallel)

Each row of a DataFrame is a ------- object. (Row)

When is a transformed DataFrame executed?
When an action runs on it.
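Lazy execution can be mimicked in plain Python with a generator: building it records the recipe but does no work, and the work runs only when the result is materialized (the analogue of an action). A sketch with invented helper names:

```python
executed = []

def parse(line):
    executed.append(line)   # record when the work actually happens
    return line.strip()

lines = ["  a  ", "  b  "]

# "Transformation": building the generator does no work yet (lazy).
transformed = (parse(line) for line in lines)
assert executed == []       # nothing has run so far

# "Action": materializing the result triggers the computation.
result = list(transformed)
```

Until `list()` runs, `executed` stays empty; afterwards every line has been processed, mirroring how a Spark transformation only runs when an action demands its output.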

From where do we create DataFrames?
We create DataFrames from data sources.

How do we create a DataFrame from a file?
distFile = sqlContext.read.text("......")

Spark uses ------- to optimize the required calculations.(Catalyst)

Spark recovers from ------- and --------.(failures,slow workers)

The --------- method creates a DataFrame from one column.

What does the drop method do? Explain with an example.

The drop method returns a new DataFrame that drops the --------. (specified columns)
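In PySpark, `df.drop('colName')` returns a new DataFrame without that column, leaving the original unchanged. A pure-Python analogy using lists of dicts as "rows" (the data and the `drop` helper here are invented for illustration):

```python
rows = [
    {"name": "ada", "age": 36, "tmp": 1},
    {"name": "alan", "age": 41, "tmp": 2},
]

def drop(rows, *cols):
    """Toy analogue of DataFrame.drop: new rows without the given columns."""
    return [{k: v for k, v in row.items() if k not in cols}
            for row in rows]

smaller = drop(rows, "tmp")   # "tmp" column removed in the new result
```

Note that `rows` itself still contains the `tmp` key afterwards, mirroring DataFrame immutability: drop builds a new DataFrame rather than mutating the old one.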

How do we transform a DataFrame?
linesDF = sqlContext.read.text('....')
commentsDF = linesDF.filter(isComment)

-------- cause Spark to execute the recipe to transform the source. (Spark Actions)

List some useful actions along with their descriptions.

Action that prints the first n rows of the DataFrame:
show(n, truncate)

Action that returns the first n rows as a list of Row objects:
take(n)

Action that returns all records as a list of Row objects:
collect()
(warning: make sure the result fits in the driver program's memory)

Action that returns the number of rows in the DataFrame:
count()

Exploratory data analysis action that computes statistics for numeric columns:
describe(*cols)

print linesDF.count()
What does count() cause Spark to do?
1. read the data
2. sum within partitions
3. combine the partial sums in the driver
Give the Spark program lifecycle.
1. Create DataFrames from external data, or from existing DataFrames.
2. Lazily transform them (old DataFrames) into ------ DataFrames. (new)
3. Cache some DataFrames for -------. (reuse)
4. Perform ------- to execute parallel computation and produce results. (actions)
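The four lifecycle steps can be mimicked in plain Python, with a call counter showing why caching matters: without it, every action would redo the transformation work. All names here are invented for illustration:

```python
calls = []

def expensive_parse(x):
    calls.append(x)            # count how often the work actually runs
    return x * 10

# 1. "Create" a DataFrame from external data.
source = [1, 2, 3]

# 2. Lazy transformation: a recipe, not computed data.
def transformed():
    return (expensive_parse(x) for x in source)

# 3. Cache: materialize the result once for reuse.
cached = list(transformed())

# 4. Actions reuse the cached result instead of recomputing it.
count_action = len(cached)
sum_action = sum(cached)
```

After both actions, `expensive_parse` has still run only once per element; without the cache in step 3, each action would have re-run the whole recipe.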

Where does the code run?
1. Locally, in the driver
2. Distributed at the --------. (executors)
3. Both at the driver and the executors.

Do executors run in parallel? (yes)

Transformations run at --------. (executors)
Actions run at ------- and -------. (executors, driver)
