Architecture overview v2

Hadoop is a framework that allows you to store a large data set in a distributed file system.

The Hadoop data wrapper provides an interface between a Hadoop file system and a Postgres database. The Hadoop data wrapper transforms a Postgres SELECT statement into a query that is understood by the HiveQL or Spark SQL interface.

Using a Hadoop distributed file system with Postgres

When possible, the foreign data wrapper asks the Hive or Spark server to perform the actions associated with the WHERE clause of a SELECT statement. Pushing down the WHERE clause improves performance by decreasing the amount of data moving across the network.