Airflow BaseHook


class airflow.hooks.base_hook.BaseHook[source]

Bases: airflow.utils.log.logging_mixin.LoggingMixin

Abstract base class for hooks. Hooks are meant as an interface to interact with external systems.
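
A rough sketch (not taken from the text above) of how a custom hook typically builds on BaseHook: subclass it and call get_connection to resolve credentials from an Airflow connection. The hook name, connection id, and returned client shape below are hypothetical.

# Minimal sketch of a custom hook built on BaseHook; uses the 1.x import path
# shown elsewhere on this page. "my_service_default" is a made-up connection id.
from airflow.hooks.base_hook import BaseHook


class MyServiceHook(BaseHook):
    def __init__(self, my_conn_id="my_service_default"):
        self.my_conn_id = my_conn_id

    def get_conn(self):
        # get_connection reads host/login/password/extra from the Airflow connection
        conn = self.get_connection(self.my_conn_id)
        # Build whatever client the external system needs from those fields.
        return {"host": conn.host, "login": conn.login, "password": conn.password}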

Module Contents¶

  1. See the License for the specific language governing permissions and limitations under the License. From the Cloudant hook source: from past.builtins import unicode; import cloudant; from airflow.exceptions import AirflowException; from airflow.hooks.base_hook import BaseHook; from airflow.utils.log.logging_mixin import LoggingMixin.
  2. Source code for airflow.hooks.dbapi: Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership.
  3. From the Azure base hook source: from airflow.exceptions import AirflowException; from airflow.hooks.base import BaseHook; class AzureBaseHook(BaseHook): this hook acts as a base hook for Azure services. It offers several authentication mechanisms to authenticate the client library used for upstream Azure hooks. :param sdk_client: the SDKClient to use.
airflow.hooks.hive_hooks.HIVE_QUEUE_PRIORITIES = ['VERY_HIGH', 'HIGH', 'NORMAL', 'LOW', 'VERY_LOW'][source]
airflow.hooks.hive_hooks.get_context_from_env_var()[source]
Extract context from env variable, e.g. dag_id, task_id and execution_date,
so that they can be used inside BashOperator and PythonOperator.
Returns

The context of interest.

class airflow.hooks.hive_hooks.HiveCliHook(hive_cli_conn_id='hive_cli_default', run_as=None, mapred_queue=None, mapred_queue_priority=None, mapred_job_name=None)[source]

Bases: airflow.hooks.base_hook.BaseHook

Simple wrapper around the hive CLI.

It also supports beeline, a lighter CLI that runs JDBC and is replacing the heavier traditional CLI. To enable beeline, set the use_beeline param in the extra field of your connection as in {'use_beeline': true}.

Note that you can also set default hive CLI parameters using hive_cli_params in your connection's extra field, as in {'hive_cli_params': '-hiveconf mapred.job.tracker=some.jobtracker:444'}. Parameters passed here can be overridden by run_cli's hive_conf param.

The extra connection parameter auth gets passed as in the jdbc connection string as is.

Parameters
  • mapred_queue (str) – queue used by the Hadoop Scheduler (Capacity or Fair)

  • mapred_queue_priority (str) – priority within the job queue. Possible settings include: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW

  • mapred_job_name (str) – This name will appear in the jobtracker. This can make monitoring easier.
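
A minimal sketch, assuming the connection extras described above, of instantiating HiveCliHook; the queue and job names are placeholders, not values from this page.

# Hypothetical setup: the connection "hive_cli_default" carries extras such as
#   {"use_beeline": true,
#    "hive_cli_params": "-hiveconf mapred.job.tracker=some.jobtracker:444"}
from airflow.hooks.hive_hooks import HiveCliHook

hook = HiveCliHook(
    hive_cli_conn_id="hive_cli_default",
    mapred_queue="reporting",         # placeholder Hadoop scheduler queue
    mapred_queue_priority="HIGH",     # one of the HIVE_QUEUE_PRIORITIES values
    mapred_job_name="daily_aggregation",
)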

_get_proxy_user(self)[source]

This function sets the proper proxy_user value in case the user overwrites the default.

_prepare_cli_cmd(self)[source]

This function creates the command list from available information

static _prepare_hiveconf(d)[source]

This function prepares a list of hiveconf params from a dictionary of key value pairs.

Parameters

d (dict) –

run_cli(self, hql, schema=None, verbose=True, hive_conf=None)[source]

Run an hql statement using the hive cli. If hive_conf is specified it should be a dict and the entries will be set as key/value pairs in HiveConf.

Parameters

hive_conf (dict) – if specified these key value pairs will be passed to hive as -hiveconf 'key'='value'. Note that they will be passed after the hive_cli_params and thus will override whatever values are specified in the database.
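
A short sketch of run_cli with a hive_conf override; the query, schema, and setting are illustrative only.

from airflow.hooks.hive_hooks import HiveCliHook

hook = HiveCliHook(hive_cli_conn_id="hive_cli_default")
hook.run_cli(
    hql="SELECT COUNT(*) FROM my_db.my_table;",           # made-up query
    schema="my_db",
    hive_conf={"mapreduce.job.queuename": "reporting"},   # passed as -hiveconf key=value
)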

test_hql(self, hql)[source]

Test an hql statement using the hive cli and EXPLAIN

load_df(self, df, table, field_dict=None, delimiter=',', encoding='utf8', pandas_kwargs=None, **kwargs)[source]

Loads a pandas DataFrame into hive.

Hive data types will be inferred if not passed but column names will not be sanitized.

Parameters
  • df (pandas.DataFrame) – DataFrame to load into a Hive table

  • table (str) – target Hive table, use dot notation to target a specific database

  • field_dict (collections.OrderedDict) – mapping from column name to hive data type. Note that it must be OrderedDict so as to keep columns' order.

  • delimiter (str) – field delimiter in the file

  • encoding (str) – str encoding to use when writing DataFrame to file

  • pandas_kwargs (dict) – passed to DataFrame.to_csv

  • kwargs – passed to self.load_file
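
A hedged sketch of load_df with a small in-memory DataFrame; the table name and column types are made up.

from collections import OrderedDict

import pandas as pd

from airflow.hooks.hive_hooks import HiveCliHook

df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
hook = HiveCliHook(hive_cli_conn_id="hive_cli_default")
hook.load_df(
    df=df,
    table="my_db.users_staging",               # dot notation targets the database
    field_dict=OrderedDict([("id", "BIGINT"),  # OrderedDict keeps the column order
                            ("name", "STRING")]),
    delimiter=",",
    encoding="utf8",
)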

load_file(self, filepath, table, delimiter=',', field_dict=None, create=True, overwrite=True, partition=None, recreate=False, tblproperties=None)[source]

Loads a local file into Hive

Note that the table generated in Hive uses STORED AS textfile, which isn't the most efficient serialization format. If a large amount of data is loaded and/or if the table gets queried considerably, you may want to use this operator only to stage the data into a temporary table before loading it into its final destination using a HiveOperator.

Parameters
  • filepath (str) – local filepath of the file to load

  • table (str) – target Hive table, use dot notation to target a specific database

  • delimiter (str) – field delimiter in the file

  • field_dict (collections.OrderedDict) – A dictionary of the field names in the file as keys and their Hive types as values. Note that it must be OrderedDict so as to keep columns' order.

  • create (bool) – whether to create the table if it doesn't exist

  • overwrite (bool) – whether to overwrite the data in table or partition

  • partition (dict) – target partition as a dict of partition columns and values

  • recreate (bool) – whether to drop and recreate the table at every execution

  • tblproperties (dict) – TBLPROPERTIES of the hive table being created
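
A hedged sketch of load_file staging a local CSV into a partitioned table; the file path, table, and partition values are placeholders.

from collections import OrderedDict

from airflow.hooks.hive_hooks import HiveCliHook

hook = HiveCliHook(hive_cli_conn_id="hive_cli_default")
hook.load_file(
    filepath="/tmp/users.csv",                 # made-up local file
    table="my_db.users",
    delimiter=",",
    field_dict=OrderedDict([("id", "BIGINT"), ("name", "STRING")]),
    create=True,                               # create the table if it doesn't exist
    overwrite=True,                            # overwrite the target partition
    partition={"ds": "2021-01-01"},            # partition column -> value
    tblproperties={"comment": "staged by load_file"},
)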

kill(self)[source]
class airflow.hooks.hive_hooks.HiveMetastoreHook(metastore_conn_id='metastore_default')[source]

Bases: airflow.hooks.base_hook.BaseHook

Wrapper to interact with the Hive Metastore

MAX_PART_COUNT = 32767[source]
__getstate__(self)[source]
__setstate__(self, d)[source]
get_metastore_client(self)[source]

Returns a Hive thrift client.

get_conn(self)[source]
check_for_partition(self, schema, table, partition)[source]

Checks whether a partition exists

Parameters
  • schema (str) – Name of hive schema (database) @table belongs to

  • table – Name of hive table @partition belongs to

Partition

Expression that matches the partitions to check for (e.g. a = 'b' AND c = 'd')

Return type

bool

check_for_named_partition(self, schema, table, partition_name)[source]

Checks whether a partition with a given name exists

Parameters
  • schema (str) – Name of hive schema (database) @table belongs to

  • table – Name of hive table @partition belongs to

Partition

Name of the partitions to check for (e.g. a=b/c=d)

Return type

bool

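A brief sketch of the two partition checks above; the schema, table, and partition values are invented for illustration.

from airflow.hooks.hive_hooks import HiveMetastoreHook

hms = HiveMetastoreHook(metastore_conn_id="metastore_default")
# Filter-expression form vs. exact partition-name form; both return a bool.
hms.check_for_partition("my_db", "events", "ds='2021-01-01'")
hms.check_for_named_partition("my_db", "events", "ds=2021-01-01")
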
get_table(self, table_name, db='default')[source]

Get a metastore table object

get_tables(self, db, pattern='*')[source]

Get metastore table objects matching the pattern.

get_databases(self, pattern='*')[source]

Get metastore databases matching the pattern.

get_partitions(self, schema, table_name, filter=None)[source]

Returns a list of all partitions in a table. Works only for tables with less than 32767 partitions (java short max val). For subpartitioned tables, the number might easily exceed this.

static _get_max_partition_from_part_specs(part_specs, partition_key, filter_map)[source]

Helper method to get max partition of partitions with partition_key from part specs. key:value pairs in filter_map will be used to filter out partitions.

Parameters
  • part_specs (list) – list of partition specs.

  • partition_key (str) – partition key name.

  • filter_map (map) – partition_key:partition_value map used for partition filtering, e.g. {'key1': 'value1', 'key2': 'value2'}. Only partitions matching all partition_key:partition_value pairs will be considered as candidates of max partition.

Returns

Max partition or None if part_specs is empty.

max_partition(self, schema, table_name, field=None, filter_map=None)[source]

Returns the maximum value for all partitions with given field in a table. If only one partition key exists in the table, the key will be used as field. filter_map should be a partition_key:partition_value map and will be used to filter out partitions.

Parameters
  • schema (str) – schema name.

  • table_name (str) – table name.

  • field (str) – partition key to get max partition from.

  • filter_map (map) – partition_key:partition_value map used for partition filtering.
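
A hedged sketch of max_partition with a filter_map; the names and values are illustrative.

from airflow.hooks.hive_hooks import HiveMetastoreHook

hms = HiveMetastoreHook(metastore_conn_id="metastore_default")
latest_ds = hms.max_partition(
    schema="my_db",
    table_name="events",
    field="ds",                       # partition key to take the max over
    filter_map={"country": "us"},     # only consider partitions where country=us
)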

table_exists(self, table_name, db='default')[source]

Check if table exists

class airflow.hooks.hive_hooks.HiveServer2Hook(hiveserver2_conn_id='hiveserver2_default')[source]

Bases: airflow.hooks.base_hook.BaseHook

Wrapper around the pyhive library

Notes:

  • the default authMechanism is PLAIN, to override it you can specify it in the extra of your connection in the UI
  • the default for run_set_variable_statements is true, if you are using impala you may need to set it to false in the extra of your connection in the UI

get_conn(self, schema=None)[source]

Returns a Hive connection object.

_get_results(self, hql, schema='default', fetch_size=None, hive_conf=None)[source]
get_results(self, hql, schema='default', fetch_size=None, hive_conf=None)[source]

Get results of the provided hql in target schema.

Parameters
  • hql (str or list) – hql to be executed.

  • schema (str) – target schema, default to 'default'.

  • fetch_size (int) – max size of result to fetch.

  • hive_conf (dict) – hive_conf to execute along with the hql.

Returns

results of hql execution, dict with data (list of results) and header

Return type

dict

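A hedged sketch of get_results and of unpacking the returned dict; the query and schema are made up.

from airflow.hooks.hive_hooks import HiveServer2Hook

hs2 = HiveServer2Hook(hiveserver2_conn_id="hiveserver2_default")
results = hs2.get_results(hql="SELECT id, name FROM users LIMIT 10", schema="my_db")
rows = results["data"]        # list of result rows
columns = results["header"]   # column metadata for the result set
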
to_csv(self, hql, csv_filepath, schema='default', delimiter=',', lineterminator='\r\n', output_header=True, fetch_size=1000, hive_conf=None)[source]

Execute hql in target schema and write results to a csv file.

Parameters
  • hql (str or list) – hql to be executed.

  • csv_filepath (str) – filepath of csv to write results into.

  • schema (str) – target schema, default to 'default'.

  • delimiter (str) – delimiter of the csv file, default to ','.

  • lineterminator (str) – lineterminator of the csv file.

  • output_header (bool) – header of the csv file, default to True.

  • fetch_size (int) – number of result rows to write into the csv file, default to 1000.

  • hive_conf (dict) – hive_conf to execute alone with the hql.
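
A hedged sketch of to_csv exporting query results to a local file; the path and query are placeholders.

from airflow.hooks.hive_hooks import HiveServer2Hook

hs2 = HiveServer2Hook(hiveserver2_conn_id="hiveserver2_default")
hs2.to_csv(
    hql="SELECT id, name FROM users",
    csv_filepath="/tmp/users_export.csv",   # made-up output path
    schema="my_db",
    delimiter=",",
    lineterminator="\r\n",
    output_header=True,                     # write column names as the first row
    fetch_size=1000,                        # rows fetched/written per batch
)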

get_records(self, hql, schema='default')[source]

Get a set of records from a Hive query.

Parameters
  • hql (str or list) – hql to be executed.

  • schema (str) – target schema, default to 'default'.

  • hive_conf (dict) – hive_conf to execute along with the hql.

Returns

result of hive execution

Return type

list

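A hedged sketch of get_records; the query is illustrative.

from airflow.hooks.hive_hooks import HiveServer2Hook

hs2 = HiveServer2Hook(hiveserver2_conn_id="hiveserver2_default")
records = hs2.get_records("SELECT COUNT(*) FROM users", schema="my_db")
row_count = records[0][0]   # records is a list of result rows
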
get_pandas_df(self, hql, schema='default')[source]

Get a pandas dataframe from a Hive query

Parameters
  • hql (str or list) – hql to be executed.

  • schema (str) – target schema, default to 'default'.

Returns

result of hql execution

Return type

pandas.DataFrame
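
A hedged sketch of get_pandas_df; the query is illustrative.

from airflow.hooks.hive_hooks import HiveServer2Hook

hs2 = HiveServer2Hook(hiveserver2_conn_id="hiveserver2_default")
df = hs2.get_pandas_df("SELECT id, name FROM users LIMIT 100", schema="my_db")
print(df.head())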

pylint-airflow

Latest version: 0.1.0a1

A Pylint plugin to lint Apache Airflow code.

Project description

Pylint plugin for static code analysis on Airflow code.

Usage

Installation:

pip install pylint-airflow

Usage:

pylint --load-plugins=pylint_airflow [your-dag-files]

This plugin runs on Python 3.6 and higher.

Error codes

The Pylint-Airflow codes follow the structure {I,C,R,W,E,F}83{0-9}{0-9}, where:

  • The characters show:
    • I = Info
    • C = Convention
    • R = Refactor
    • W = Warning
    • E = Error
    • F = Fatal
  • 83 is the base id (see all here https://github.com/PyCQA/pylint/blob/master/pylint/checkers/__init__.py)
  • {0-9}{0-9} is any number 00-99

The current codes are:

Code | Symbol | Description
C8300 | different-operator-varname-taskid | For consistency assign the same variable name and task_id to operators.
C8301 | match-callable-taskid | For consistency name the callable function '_[task_id]', e.g. PythonOperator(task_id='mytask', python_callable=_mytask).
C8302 | mixed-dependency-directions | For consistency don't mix directions in a single statement, instead split over multiple statements.
C8303 | task-no-dependencies | Sometimes a task without any dependency is desired, however often it is the result of a forgotten dependency.
C8304 | task-context-argname | Indicate you expect Airflow task context variables in the **kwargs argument by renaming to **context.
C8305 | task-context-separate-arg | To avoid unpacking kwargs from the Airflow task context in a function, you can set the needed variables as arguments in the function.
C8306 | match-dagid-filename | For consistency match the DAG filename with the dag_id.
R8300 | unused-xcom | Return values from a python_callable function or execute() method are automatically pushed as XCom.
W8300 | basehook-top-level | Airflow executes DAG scripts periodically and anything at the top level of a script is executed. Therefore, move BaseHook calls into functions/hooks/operators.
E8300 | duplicate-dag-name | DAG name should be unique.
E8301 | duplicate-task-name | Task name within a DAG should be unique.
E8302 | duplicate-dependency | Task dependencies can be defined only once.
E8303 | dag-with-cycles | A DAG is acyclic and cannot contain cycles.
E8304 | task-no-dag | A task must know a DAG instance to run.
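
As an illustration of the W8300 (basehook-top-level) check in the table above, the sketch below shows the pattern the plugin flags and the usual fix; the connection id and callable are made up, not taken from the plugin's documentation.

from airflow.hooks.base_hook import BaseHook

# Flagged by W8300 if uncommented: this would run on every scheduler parse of the DAG file.
# conn = BaseHook.get_connection("my_api_default")

def _fetch_data(**context):
    # Preferred: resolve the connection only when the task actually runs.
    conn = BaseHook.get_connection("my_api_default")
    return conn.host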

Documentation

Documentation is available on Read the Docs.

Contributing

Suggestions for more checks are always welcome; please create an issue on GitHub. Read CONTRIBUTING.rst for more details.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for pylint-airflow, version 0.1.0a1
Filename, size | File type | Python version
pylint_airflow-0.1.0a1-py3-none-any.whl (9.5 kB) | Wheel | py3
pylint-airflow-0.1.0a1.tar.gz (7.0 kB) | Source | None

Hashes for pylint_airflow-0.1.0a1-py3-none-any.whl

Algorithm | Hash digest
SHA256 | 530a43e902831f2806dc3f672c7eb46ac1378106c400a04dce2ea1ccd5c4d0c3
MD5 | c8fba07d31ad7d45b16f91212414c971
BLAKE2-256 | 5f795fff2161bfecfc249eb8050685e2403488d83290ac4e6237781176f3968e

Hashes for pylint-airflow-0.1.0a1.tar.gz

Algorithm | Hash digest
SHA256 | a0e3edb13932f35b24b24e66ebe596541600575b755b78a2a33d74d8bf94b620
MD5 | d8d8fa0b1ad4f553a1cfd2ab59141e51
BLAKE2-256 | ff55619e9e6b5710eee8439c725ab18e10ca6da47d1c6b90c7a43770debd1915



