
Class LocalHiveContext

SQLContext --+    
             |    
   HiveContext --+
                 |
                LocalHiveContext

Starts up an instance of Hive where metadata is stored locally.

An in-process metadata database is created with data stored in ./metadata. Warehouse data is stored in ./warehouse.
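
Both locations are relative paths, so where they end up depends on the directory the program is started from. The following is a minimal sketch, not part of the PySpark API, that resolves the two paths and reports whether they already exist. The helper name local_hive_paths and its base_dir parameter are hypothetical, and the sketch assumes the paths are resolved against the current working directory of the driver process.

```python
import os


def local_hive_paths(base_dir=None):
    """Hypothetical helper: absolute locations of ./metadata and ./warehouse.

    Assumes both directories are resolved against base_dir, which
    defaults to the current working directory.
    """
    base = os.path.abspath(base_dir or os.getcwd())
    return {
        "metadata": os.path.join(base, "metadata"),
        "warehouse": os.path.join(base, "warehouse"),
    }


# Report where the directories would live and whether they exist yet.
for name, path in sorted(local_hive_paths().items()):
    print("%s -> %s (exists: %s)" % (name, path, os.path.isdir(path)))
```

Checking these paths before creating a LocalHiveContext can help explain why a table from an earlier run is still present.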

>>> import os
>>> hiveCtx = LocalHiveContext(sc)  # sc is an existing SparkContext
>>> try:
...     # Drop any table left over from a previous run.
...     suppress = hiveCtx.hql("DROP TABLE src")
... except Exception:
...     pass
>>> kv1 = os.path.join(os.environ["SPARK_HOME"], 'examples/src/main/resources/kv1.txt')
>>> suppress = hiveCtx.hql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
>>> suppress = hiveCtx.hql("LOAD DATA LOCAL INPATH '%s' INTO TABLE src" % kv1)
>>> # Extract the integer after the underscore in each value.
>>> results = hiveCtx.hql("FROM src SELECT value").map(lambda r: int(r.value.split('_')[1]))
>>> num = results.count()
>>> reduce_sum = results.reduce(lambda x, y: x + y)
>>> num
500
>>> reduce_sum
130091
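
The map and reduce steps in the example above can be tried without a Spark or Hive installation. The following is a minimal plain-Python sketch of the same parsing and summing logic. The three sample values are hypothetical and assume that each value has the form val_<n>; they are not taken from kv1.txt, so the totals differ from the 500 and 130091 shown above.

```python
from functools import reduce

# Hypothetical sample values, assumed to follow the "val_<n>" pattern.
sample_values = ["val_238", "val_86", "val_311"]


def parse_value(value):
    """Return the integer that follows the underscore, as in the map step."""
    return int(value.split('_')[1])


numbers = [parse_value(v) for v in sample_values]
num = len(numbers)                                  # plays the role of results.count()
reduce_sum = reduce(lambda x, y: x + y, numbers)    # same lambda as results.reduce(...)
print(num, reduce_sum)
```

In the real example these steps run over the distributed result of the query rather than over a local list.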

Instance Methods

Inherited from HiveContext: hiveql, hql

Inherited from SQLContext: __init__, cacheTable, inferSchema, jsonFile, jsonRDD, parquetFile, registerRDDAsTable, sql, table, uncacheTable