Terminology clarification in Spark

I have a hard time distinguishing terminologies of SparkSQL. While SparkSQL are quite flexible in terms of abstraction layers, its really difficult for beginner to navigate around those options.

1. When we say " using SparkSQL to perform .....", does it mean that we can use any API/abstraction layers such as Scala, Python, HiveQL to query? As long as the core dataframe is in spark, we should be fine?

2. Can we manipulate data in both PySpark and Scala sequentially?

For example, may I clean up the data in Scala, then perform follow up manipulation in PySpark, then go back to Scala?

3. As demonstrated in the tutorial, we can query with SQL command by using the api spark.sql("My SQL command"). does it count as SQL or SPARK?

Image result for spark sql

1 Answer

Terminology clarification in Spark

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Related questions

Categories