Operator
- class bigquery_operator.operator.Operator(client: Client, dataset_id: str)
Bases: object
Wrapper for usual operations on a fixed BigQuery dataset.
- Parameters:
client (google.cloud.bigquery.client.Client) – Client to manage connections to the BigQuery API.
dataset_id (str) – The dataset id in the format ‘project_id.dataset_name’.
- build_table_id(table_name: str) -> str
Return a table id.
- Parameters:
table_name (str) – A table name.
- Returns:
- A table id in the format ‘dataset_id.table_name’ where
dataset_id is the argument passed to the __init__ method.
- Return type:
str
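The format described above amounts to a plain string join. The following is a minimal standalone sketch, not the library's implementation; the dataset id `my-project.my_dataset` and the table name `events` are made-up values.

```python
def build_table_id(dataset_id: str, table_name: str) -> str:
    """Join a dataset id and a table name into 'dataset_id.table_name'."""
    return f"{dataset_id}.{table_name}"


# Hypothetical values, used only for illustration.
table_id = build_table_id("my-project.my_dataset", "events")
print(table_id)
```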
- property client: Client
The client.
- Type:
google.cloud.bigquery.client.Client
- property client_project_id: str
The id of the project which the client acts on behalf of.
- Type:
str
- copy_table(source_table_name: str, destination_table_name: str, time_to_live: int | None = None, source_dataset_id: str | None = None, write_disposition: WriteDisposition | None = 'WRITE_TRUNCATE') -> None
Copy a table.
source_dataset_id must be given in the format ‘project_id.dataset_name’. If not passed, it falls back to self.dataset_id.
- copy_tables(source_table_names: List[str], destination_table_names: List[str], time_to_live: int | None = None, source_dataset_id: str | None = None, write_disposition: WriteDisposition | None = 'WRITE_TRUNCATE') -> None
Copy tables.
source_dataset_id must be given in the format ‘project_id.dataset_name’. If not passed, it falls back to self.dataset_id.
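The fallback described for source_dataset_id can be pictured as follows. This is a sketch under assumptions: `resolve_source_table_id` is a hypothetical helper that does not exist in the library, and the ids are made up.

```python
from typing import Optional


def resolve_source_table_id(
    source_table_name: str,
    source_dataset_id: Optional[str],
    default_dataset_id: str,
) -> str:
    """Build the source table id, falling back to the operator's own dataset."""
    # Use the explicit source dataset when given, else the default one.
    dataset_id = source_dataset_id if source_dataset_id is not None else default_dataset_id
    return f"{dataset_id}.{source_table_name}"


# No source dataset passed: the operator's dataset is used.
same_dataset = resolve_source_table_id("events", None, "my-project.my_dataset")
# Explicit source dataset, in the format 'project_id.dataset_name'.
other_dataset = resolve_source_table_id("events", "other-project.raw", "my-project.my_dataset")
```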
- create_dataset_if_not_exist(location: str) -> None
Create the dataset if it does not exist. Otherwise, check that the actual location of the existing dataset equals the location specified in the argument.
- create_empty_table(table_name: str, pre_delete_if_exists: bool | None = False, time_to_live: int | None = None, schema: List[SchemaField] | None = None, time_partitioning: TimePartitioning | None = None, range_partitioning: RangePartitioning | None = None, require_partition_filter: bool | None = None, clustering_fields: List[str] | None = None) -> None
Create an empty table. Specify at most one of time_partitioning or range_partitioning.
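The "at most one" constraint on the partitioning arguments could be checked as in the sketch below. `check_partitioning` is a hypothetical helper, and raising `ValueError` is an assumption: the documentation does not say how the library reacts when both arguments are passed.

```python
from typing import Optional


def check_partitioning(
    time_partitioning: Optional[object],
    range_partitioning: Optional[object],
) -> None:
    """Reject calls that specify both kinds of partitioning at once."""
    if time_partitioning is not None and range_partitioning is not None:
        # Assumed behaviour: fail early rather than send both to BigQuery.
        raise ValueError(
            "Specify at most one of time_partitioning or range_partitioning."
        )


# Plain objects stand in for TimePartitioning / RangePartitioning instances.
check_partitioning(object(), None)  # accepted
check_partitioning(None, None)      # accepted: the table is not partitioned
```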
- create_view(query: str, destination_table_name: str, pre_delete_if_exists: bool | None = False, time_to_live: int | None = None) -> None
Create a view.
- property dataset_id: str
The dataset id in the format ‘project_id.dataset_name’.
- Type:
str
- property dataset_name: str
The dataset name.
- Type:
str
- property dataset_project_id: str
The project id of the dataset.
- Type:
str
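The relationship between dataset_id, dataset_project_id and dataset_name can be illustrated by splitting the ‘project_id.dataset_name’ string. The helper `split_dataset_id` below is hypothetical, and splitting on the last dot is an assumption about how the two parts are separated.

```python
from typing import Tuple


def split_dataset_id(dataset_id: str) -> Tuple[str, str]:
    """Split 'project_id.dataset_name' into (project_id, dataset_name)."""
    # Split on the last dot, assuming the dataset name itself contains none.
    project_id, separator, dataset_name = dataset_id.rpartition(".")
    if not separator or not project_id or not dataset_name:
        raise ValueError(f"expected 'project_id.dataset_name', got {dataset_id!r}")
    return project_id, dataset_name


# Made-up dataset id, used only for illustration.
project_id, dataset_name = split_dataset_id("my-project.my_dataset")
```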
- delete_table_if_mismatches(reference: str, table_name: str) -> None
Delete a table if the format attributes of the table and the reference table are not the same. The format attributes are given by the method get_format_attributes.
- extract_table(source_table_name: str, destination_uri: str, compression: str | None = None, field_delimiter: str | None = '|', print_header: bool | None = True) -> None
Extract a table.
- extract_tables(source_table_names: List[str], destination_uris: List[str], compression: Compression | None = None, field_delimiter: str | None = '|', print_header: bool | None = True) -> None
Extract tables from BigQuery to Storage. Each source table is extracted as one or more CSV files.
- get_format_attributes(table_name)
Return the following table attributes: schema, time_partitioning, range_partitioning, require_partition_filter, clustering_fields.
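The comparison that delete_table_if_mismatches relies on can be sketched with plain dictionaries standing in for BigQuery tables. Both functions below are illustrative stand-ins that take dictionaries rather than table names, and the attribute values are made up.

```python
from typing import Any, Dict

# The five format attributes listed in the documentation.
FORMAT_ATTRIBUTES = (
    "schema",
    "time_partitioning",
    "range_partitioning",
    "require_partition_filter",
    "clustering_fields",
)


def get_format_attributes(table: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the format attributes of a table description."""
    return {name: table.get(name) for name in FORMAT_ATTRIBUTES}


def formats_mismatch(reference: Dict[str, Any], table: Dict[str, Any]) -> bool:
    """Return True when the two tables differ on any format attribute."""
    return get_format_attributes(reference) != get_format_attributes(table)


reference = {"schema": [("id", "INTEGER")], "clustering_fields": ["id"], "description": "a"}
same_format = {"schema": [("id", "INTEGER")], "clustering_fields": ["id"], "description": "b"}
other_format = {"schema": [("id", "STRING")], "clustering_fields": ["id"]}
```

In this sketch, attributes outside the five listed ones (here `description`) do not affect the comparison.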
- load_table(source_uri: str, destination_table_name: str, time_to_live: int | None = None, schema: List[SchemaField] | None = None, field_delimiter: str | None = '|', write_disposition: WriteDisposition | None = 'WRITE_TRUNCATE') -> None
Load one or more Storage CSV files into one BigQuery table.
- load_tables(source_uris: List[str], destination_table_names: List[str], time_to_live: int | None = None, schemas: List[List[SchemaField]] | None = None, field_delimiter: str | None = '|', write_disposition: WriteDisposition | None = 'WRITE_TRUNCATE') -> None
Load Storage CSV files into BigQuery tables.
- run_queries(queries: List[str], destination_table_names: List[str], sample_size: int | None = None, time_to_live: int | None = None, write_disposition: WriteDisposition | None = 'WRITE_TRUNCATE') -> dict
Run queries. Return monitoring as a dict in the format {‘duration’: d, ‘GB’: gb} where d is the execution duration in seconds and gb the number of gigabytes processed by the queries.
- run_query(query: str, destination_table_name: str, sample_size: int | None = None, time_to_live: int | None = None, write_disposition: WriteDisposition | None = 'WRITE_TRUNCATE') -> dict
Run a query. Return monitoring as a dict in the format {‘duration’: d, ‘GB’: gb} where d is the execution duration in seconds and gb the number of gigabytes processed by the query.
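The shape of the monitoring dict can be illustrated with a small timing wrapper. This is a sketch, not the library's code: `run_with_monitoring` is hypothetical, the job is simulated by a function returning a byte count, and converting bytes to gigabytes by dividing by 10**9 is an assumption (the library may use another convention).

```python
import time
from typing import Callable, Dict


def run_with_monitoring(job: Callable[[], int]) -> Dict[str, float]:
    """Run a job returning a number of processed bytes and report monitoring."""
    start = time.monotonic()
    bytes_processed = job()
    duration = time.monotonic() - start
    # Assumption: 1 GB is counted as 10**9 bytes.
    return {"duration": duration, "GB": bytes_processed / 10**9}


# Simulated job that "processes" 2.5 billion bytes.
monitoring = run_with_monitoring(lambda: 2_500_000_000)
print(f"{monitoring['GB']} GB processed in {monitoring['duration']:.3f} s")
```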
- static sample_query(query: str, size: int) -> str
Randomly sample a query.
The output query returns a subset of the rows returned by the input query. This subset has approximately size rows. Nonetheless, the number of gigabytes processed is the same for the output query and the input query.
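One possible way to build such a sampling query is to keep each row with probability size / total row count. The function below is an assumption about how sampling could be written, not the library's actual SQL. Because the filter is applied after the inner query has run, the sampling does not reduce the amount of data scanned. An input query returning no rows would cause a division by zero in this sketch.

```python
def sample_query(query: str, size: int) -> str:
    """Wrap a query so that it returns roughly `size` randomly chosen rows."""
    # Each row is kept with probability size / (total number of rows).
    return (
        f"SELECT * FROM ({query}) "
        f"WHERE RAND() < {size} / (SELECT COUNT(*) FROM ({query}))"
    )


# Hypothetical table name, used only for illustration.
sampled = sample_query("SELECT id FROM `my-project.my_dataset.events`", 1000)
print(sampled)
```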
- set_time_to_live(table_name: str, nb_days: int, retry_delays: Tuple[int, ...] = (10, 30)) -> None
Set the time to live of a table in days. More precisely, the expires attribute of the table is set to the UTC midnight between (today + nb_days) and (today + nb_days + 1), unless it already has this value.
We have noticed that unexpected google.api_core.exceptions.PreconditionFailed exceptions can occur, so the table update is attempted several times: if this exception is raised, we try again after a delay. The retry delays are specified in seconds in the argument retry_delays.
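The expiry computation and the retry loop can be sketched as follows. Everything here is illustrative: `expiry_for` and `update_with_retries` are hypothetical helpers, the local `PreconditionFailed` class stands in for google.api_core.exceptions.PreconditionFailed, and making one final attempt after the last delay is an assumption about the retry semantics.

```python
import datetime
import time
from typing import Callable, Tuple


class PreconditionFailed(Exception):
    """Local stand-in for google.api_core.exceptions.PreconditionFailed."""


def expiry_for(today: datetime.date, nb_days: int) -> datetime.datetime:
    """UTC midnight between (today + nb_days) and (today + nb_days + 1)."""
    expiry_date = today + datetime.timedelta(days=nb_days + 1)
    return datetime.datetime.combine(
        expiry_date, datetime.time.min, tzinfo=datetime.timezone.utc
    )


def update_with_retries(
    update: Callable[[], None],
    retry_delays: Tuple[int, ...] = (10, 30),
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call `update`, retrying after each delay when PreconditionFailed is raised."""
    for delay in retry_delays:
        try:
            update()
            return
        except PreconditionFailed:
            sleep(delay)  # wait before the next attempt
    update()  # final attempt; any exception propagates to the caller


expires = expiry_for(datetime.date(2024, 1, 1), 3)
print(expires.isoformat())
```

With the default delays, this sketch makes at most three attempts: one immediately, one after 10 seconds, and one after a further 30 seconds.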