Operator

class bigquery_operator.operator.Operator(client: Client, dataset_id: str)

Bases: object

Wrapper for common operations on a single, fixed BigQuery dataset.

Parameters
  • client (google.cloud.bigquery.client.Client) – Client to manage connections to the BigQuery API.

  • dataset_id (str) – The dataset id in the format ‘project_id.dataset_name’.

build_table_id(table_name: str, dataset_id: Optional[str] = None)

Return a table id.

Parameters
  • table_name (str) – A table name.

  • dataset_id (str, optional) – A dataset id in the format ‘project_id.dataset_name’. If not passed, falls back to self.dataset_id.

Returns

A table id in the format ‘project_id.dataset_name.table_name’.

Return type

str
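
The id format and the fallback described above can be illustrated with a standalone sketch. This is not the library's implementation: the default_dataset_id parameter is a hypothetical stand-in for self.dataset_id, and the project, dataset and table names are made up.

```python
from typing import Optional


def build_table_id(
    table_name: str,
    dataset_id: Optional[str] = None,
    default_dataset_id: str = "my_project.my_dataset",  # hypothetical stand-in for self.dataset_id
) -> str:
    """Join a dataset id and a table name into 'project_id.dataset_name.table_name'."""
    # Fall back to the default dataset when no dataset id is passed.
    resolved = dataset_id if dataset_id is not None else default_dataset_id
    return f"{resolved}.{table_name}"


print(build_table_id("events"))                           # my_project.my_dataset.events
print(build_table_id("events", "other_project.staging"))  # other_project.staging.events
```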

clean_dataset() → None

Delete all the tables from the dataset.

property client: Client

The client.

Type

google.cloud.bigquery.client.Client

copy_table(source_table_name: str, destination_table_name: str, source_dataset_id: Optional[str] = None, write_disposition: Optional[WriteDisposition] = 'WRITE_TRUNCATE') → None

Copy a table. source_dataset_id must be given in the format ‘project_id.dataset_name’. If not passed, falls back to self.dataset_id.

copy_tables(source_table_names: List[str], destination_table_names: List[str], source_dataset_id: Optional[str] = None, write_disposition: Optional[WriteDisposition] = 'WRITE_TRUNCATE') → None

Copy tables. source_dataset_id must be given in the format ‘project_id.dataset_name’. If not passed, falls back to self.dataset_id.
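
The two list arguments presumably pair up element by element, so that the i-th source table is copied to the i-th destination table. That pairing is an assumption, not something this page states. Under that assumption, a caller may want to check the lists before submitting anything; pair_tables is a hypothetical helper and is not part of the library.

```python
from typing import List, Tuple


def pair_tables(
    source_table_names: List[str],
    destination_table_names: List[str],
) -> List[Tuple[str, str]]:
    """Pair each source table with the destination at the same position."""
    # Lists of different lengths cannot be matched one to one.
    if len(source_table_names) != len(destination_table_names):
        raise ValueError(
            f"{len(source_table_names)} source tables but "
            f"{len(destination_table_names)} destination tables"
        )
    return list(zip(source_table_names, destination_table_names))


for source, destination in pair_tables(["users", "orders"], ["users_copy", "orders_copy"]):
    print(f"{source} -> {destination}")
```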

create_dataset(location: Optional[str] = None, default_time_to_live: Optional[int] = None) → None

Create the dataset.

Parameters
  • location (str, optional) – Location in which the dataset is hosted.

  • default_time_to_live (int, optional) – The default time to live in days for the tables created in the dataset.
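
default_time_to_live is given in days, whereas the underlying google.cloud.bigquery Dataset object stores its default table expiration in milliseconds (default_table_expiration_ms). How the wrapper converts between the two is not shown on this page; the sketch below only illustrates the arithmetic, and days_to_expiration_ms is a hypothetical helper name.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def days_to_expiration_ms(days: int) -> int:
    """Convert a time to live in days to a duration in milliseconds."""
    if days <= 0:
        raise ValueError("The time to live must be a positive number of days.")
    return days * MS_PER_DAY


print(days_to_expiration_ms(7))  # 604800000
```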

create_empty_table(table_name: str, schema: Optional[List[SchemaField]] = None, time_partitioning: Optional[TimePartitioning] = None, range_partitioning: Optional[RangePartitioning] = None, require_partition_filter: Optional[bool] = None, clustering_fields: Optional[List[str]] = None) → None

Create an empty table. Specify at most one of time_partitioning or range_partitioning.
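
The "at most one" constraint can be checked before the call is made. The sketch below is a caller-side guard under that reading. check_partitioning is a hypothetical helper, and whether the library performs a similar check itself is not stated here.

```python
from typing import Any, Optional


def check_partitioning(
    time_partitioning: Optional[Any] = None,
    range_partitioning: Optional[Any] = None,
) -> None:
    """Raise if both partitioning options are set; passing neither is allowed."""
    if time_partitioning is not None and range_partitioning is not None:
        raise ValueError(
            "Specify at most one of time_partitioning or range_partitioning."
        )


check_partitioning()                              # no partitioning: accepted
check_partitioning(time_partitioning=object())    # one option: accepted
```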

create_view(query: str, destination_table_name: str) → None

Create a view.

dataset_exists() → bool

Return True if the dataset exists.

property dataset_id: str

The dataset id in the format ‘project_id.dataset_name’.

Type

str

property dataset_name: str

The dataset name.

Type

str
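
Since dataset_id follows the format ‘project_id.dataset_name’, the dataset name is the part after the last dot. The sketch below shows one way to split such an id; it is an illustration and not necessarily how the property is implemented. split_dataset_id is a hypothetical helper. Splitting at the last dot is a deliberate choice, on the assumption that dataset names never contain dots.

```python
from typing import Tuple


def split_dataset_id(dataset_id: str) -> Tuple[str, str]:
    """Split 'project_id.dataset_name' into (project part, dataset name)."""
    # rpartition splits at the LAST dot and returns empty strings if there is none.
    project_part, separator, dataset_name = dataset_id.rpartition(".")
    if not separator or not project_part or not dataset_name:
        raise ValueError(
            f"Expected 'project_id.dataset_name', got {dataset_id!r}"
        )
    return project_part, dataset_name


print(split_dataset_id("my_project.my_dataset"))  # ('my_project', 'my_dataset')
```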

delete_dataset() → None

Delete the dataset.

delete_table(table_name: str) → None

Delete a table.

extract_table(source_table_name: str, destination_uri: str, field_delimiter: Optional[str] = '|', print_header: Optional[bool] = True) → None

Extract a table. The data is extracted as a gzip-compressed CSV file. destination_uri must end with ‘.csv.gz’.

extract_tables(source_table_names: List[str], destination_uris: List[str], field_delimiter: Optional[str] = '|', print_header: Optional[bool] = True) → None

Extract tables from BigQuery to Storage. Each source table is extracted as one or more gzip-compressed CSV files. Each destination URI must end with ‘.csv.gz’.
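
The ‘.csv.gz’ suffix requirement is easy to verify before launching an extract. The sketch below is a caller-side check only. check_destination_uri is a hypothetical helper and the bucket name is made up. The second URI uses the ‘*’ wildcard that BigQuery accepts in export URIs to shard a table over several files; that the wrapper passes such a URI through unchanged is an assumption.

```python
def check_destination_uri(uri: str) -> str:
    """Return the URI unchanged if it ends with '.csv.gz', else raise."""
    if not uri.endswith(".csv.gz"):
        raise ValueError(f"Destination URI must end with '.csv.gz': {uri!r}")
    return uri


# Hypothetical bucket and object names.
check_destination_uri("gs://my-bucket/exports/users.csv.gz")
check_destination_uri("gs://my-bucket/exports/users-*.csv.gz")
```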

get_columns(table_name: str) → List[str]

Return the column names of a table.

get_dataset() → Dataset

Get the dataset. An API call is made.

get_format_attributes(table_name)

Return the following table attributes: schema, time_partitioning, range_partitioning, require_partition_filter, clustering_fields.

get_table(table_name: str) → Table

Get a table. An API call is made.

instantiate_dataset() → Dataset

Instantiate the dataset. No API call is made.

instantiate_table(table_name: str) → Table

Instantiate a table. No API call is made.

list_tables() → List[str]

List the names of the tables in the dataset.

load_table(source_uri: str, destination_table_name: str, schema: Optional[List[SchemaField]] = None, field_delimiter: Optional[str] = '|', write_disposition: Optional[WriteDisposition] = 'WRITE_TRUNCATE') → None

Load one or more Storage CSV files into one BigQuery table.

load_tables(source_uris: List[str], destination_table_names: List[str], schemas: Optional[List[List[SchemaField]]] = None, field_delimiter: Optional[str] = '|', write_disposition: Optional[WriteDisposition] = 'WRITE_TRUNCATE') → None

Load Storage CSV files into BigQuery tables.

property project_id: str

The id of the project on whose behalf the client acts.

Type

str

run_queries(queries: List[str], destination_table_names: List[str], write_disposition: Optional[WriteDisposition] = 'WRITE_TRUNCATE') → dict

Run queries. Return monitoring information as a dict in the format {‘duration’: d, ‘cost’: c}, where d is the execution duration in seconds and c is the execution cost in dollars.

run_query(query: str, destination_table_name: str, write_disposition: Optional[WriteDisposition] = 'WRITE_TRUNCATE') → dict

Run a query. Return monitoring information as a dict in the format {‘duration’: d, ‘cost’: c}, where d is the execution duration in seconds and c is the execution cost in dollars.
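
Because the monitoring dict has a fixed shape, results from several separate run_query calls are easy to combine on the caller side. The sketch below sums them. total_monitoring is a hypothetical helper and the sample values are made up. How run_queries itself aggregates duration and cost is not specified on this page.

```python
from typing import Dict, List


def total_monitoring(monitorings: List[Dict[str, float]]) -> Dict[str, float]:
    """Sum the 'duration' (seconds) and 'cost' (dollars) of several runs."""
    return {
        "duration": sum(m["duration"] for m in monitorings),
        "cost": sum(m["cost"] for m in monitorings),
    }


# Made-up monitoring dicts, shaped like the documented return value.
runs = [
    {"duration": 1.5, "cost": 0.25},
    {"duration": 2.5, "cost": 0.5},
]
print(total_monitoring(runs))  # {'duration': 4.0, 'cost': 0.75}
```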

static sample_query(query: str, size: int) → str

Randomly sample the result of a query.

The output query returns a subset of the rows returned by the input query, containing approximately size rows. The output query nonetheless costs the same as the input query.
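
One way such a sampling query could be built is to wrap the input query and keep each row with probability size divided by the total row count. The sketch below is an assumption about the general technique, not the library's actual SQL. It also shows why the cost does not drop: the whole input query still has to be evaluated before any rows are discarded. The table name in the example is made up.

```python
def sample_query(query: str, size: int) -> str:
    """Wrap a query so that roughly `size` of its rows are kept at random."""
    # Each row is kept independently with probability size / total_rows,
    # so the number of rows returned is only approximately `size`.
    return (
        f"SELECT * FROM ({query}) "
        f"WHERE RAND() < {size} / (SELECT COUNT(*) FROM ({query}))"
    )


print(sample_query("SELECT id FROM `my_project.my_dataset.users`", 100))
```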

set_time_to_live(table_name: str, nb_days: int) → None

Set the time to live of a table in days.
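
In the underlying google.cloud.bigquery client, a table's expiration is a datetime (Table.expires), so a time to live in days has to be turned into a point in time. The sketch below shows that conversion only. expiration_from_days is a hypothetical helper, and counting the days from the current UTC time is an assumption about how the wrapper behaves.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiration_from_days(nb_days: int, now: Optional[datetime] = None) -> datetime:
    """Return the datetime that lies nb_days after `now` (default: current UTC time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now + timedelta(days=nb_days)


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(expiration_from_days(30, now=start))  # 2024-01-31 00:00:00+00:00
```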

table_exists(table_name: str) → bool

Return True if the table exists.

table_is_empty(table_name: str) → bool

Return True if the table is empty.