Operator
- class bigquery_operator.operator.Operator(client: Client, dataset_id: str)
Bases: object
Wrapper for usual operations on a fixed BigQuery dataset.
- Parameters
client (google.cloud.bigquery.client.Client) – Client to manage connections to the BigQuery API.
dataset_id (str) – The dataset id in the format ‘project_id.dataset_name’.
- build_table_id(table_name: str, dataset_id: Optional[str] = None)
Return a table id.
- Parameters
table_name (str) – A table name.
dataset_id (str, optional) – A dataset id in the format ‘project_id.dataset_name’. If not passed, falls back to self.dataset_id.
- Returns
A table id in the format ‘project_id.dataset_name.table_name’.
- Return type
str
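The id format described above can be illustrated with a small standalone sketch. It is not the library's implementation: the function name build_table_id_sketch and the default_dataset_id argument (standing in for self.dataset_id) are hypothetical and exist only for illustration.

```python
from typing import Optional


def build_table_id_sketch(
    table_name: str,
    dataset_id: Optional[str] = None,
    default_dataset_id: str = "my_project.my_dataset",  # hypothetical stand-in for self.dataset_id
) -> str:
    """Join a dataset id and a table name into 'project_id.dataset_name.table_name'."""
    # Fall back to the default dataset id when none is passed.
    effective_dataset_id = dataset_id if dataset_id is not None else default_dataset_id
    return f"{effective_dataset_id}.{table_name}"


print(build_table_id_sketch("events"))
print(build_table_id_sketch("events", dataset_id="other_project.staging"))
```

The first call relies on the fallback, the second overrides it with an explicit dataset id.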
- property client: Client
The client.
- Type
google.cloud.bigquery.client.Client
- copy_table(source_table_name: str, destination_table_name: str, source_dataset_id: Optional[str] = None, write_disposition: Optional[WriteDisposition] = 'WRITE_TRUNCATE') -> None
Copy a table.
source_dataset_id must be given in the format ‘project_id.dataset_name’. If not passed, falls back to self.dataset_id.
- copy_tables(source_table_names: List[str], destination_table_names: List[str], source_dataset_id: Optional[str] = None, write_disposition: Optional[WriteDisposition] = 'WRITE_TRUNCATE') -> None
Copy tables.
source_dataset_id must be given in the format ‘project_id.dataset_name’. If not passed, falls back to self.dataset_id.
- create_dataset(location: Optional[str] = None, default_time_to_live: Optional[int] = None) -> None
Create the dataset.
- Parameters
location (str, optional) – Location in which the dataset is hosted.
default_time_to_live (int, optional) – The default time to live in days for the tables created in the dataset.
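default_time_to_live is expressed in days, whereas the BigQuery Python client exposes a dataset's default table expiration in milliseconds (Dataset.default_table_expiration_ms). If the two are related, a conversion along these lines would be needed. This page does not confirm which property the library sets, so the sketch below is an assumption, and the helper name days_to_milliseconds is hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # hours * minutes * seconds * milliseconds


def days_to_milliseconds(days: int) -> int:
    """Convert a time to live in days into milliseconds."""
    return days * MS_PER_DAY


print(days_to_milliseconds(30))
```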
- create_empty_table(table_name: str, schema: Optional[List[SchemaField]] = None, time_partitioning: Optional[TimePartitioning] = None, range_partitioning: Optional[RangePartitioning] = None, require_partition_filter: Optional[bool] = None, clustering_fields: Optional[List[str]] = None) -> None
Create an empty table. Specify at most one of time_partitioning or range_partitioning.
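The "at most one" constraint can be checked before calling the method. The following is a minimal sketch of such a guard. The helper check_partitioning_sketch is hypothetical, and this page does not say whether the library performs a similar check itself.

```python
from typing import Any, Optional


def check_partitioning_sketch(
    time_partitioning: Optional[Any] = None,
    range_partitioning: Optional[Any] = None,
) -> None:
    """Raise if both partitioning options are given; passing neither or one is fine."""
    if time_partitioning is not None and range_partitioning is not None:
        raise ValueError(
            "Specify at most one of time_partitioning or range_partitioning."
        )


# Neither option, or exactly one, passes silently.
check_partitioning_sketch()
check_partitioning_sketch(time_partitioning="placeholder")
```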
- property dataset_id: str
The dataset id in the format ‘project_id.dataset_name’.
- Type
str
- property dataset_name: str
The dataset name.
- Type
str
- extract_table(source_table_name: str, destination_uri: str, field_delimiter: Optional[str] = '|', print_header: Optional[bool] = True) -> None
Extract a table. Data is extracted as a gzip-compressed CSV file.
destination_uri must end with ‘.csv.gz’.
- extract_tables(source_table_names: List[str], destination_uris: List[str], field_delimiter: Optional[str] = '|', print_header: Optional[bool] = True) -> None
Extract tables from BigQuery to Storage. Each source table is extracted as one or more gzip-compressed CSV files. Each destination URI must end with ‘.csv.gz’.
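The suffix requirement can be verified up front. The sketch below builds one destination URI per table and checks each one. The bucket name, the helper names and the equal-length check are assumptions made for illustration, not documented behaviour of the library. The `*` wildcard reflects BigQuery's own convention for exports that are split across several files.

```python
from typing import List


def build_destination_uris_sketch(bucket: str, table_names: List[str]) -> List[str]:
    """Build one wildcard URI per table, each ending with '.csv.gz'."""
    return [f"gs://{bucket}/{name}/{name}-*.csv.gz" for name in table_names]


def check_destination_uris_sketch(
    source_table_names: List[str], destination_uris: List[str]
) -> None:
    """Check that tables and URIs pair up and that every URI has the required suffix."""
    # Assumption: tables and URIs are matched by position, so lengths must agree.
    if len(source_table_names) != len(destination_uris):
        raise ValueError("Expected one destination URI per source table.")
    for uri in destination_uris:
        if not uri.endswith(".csv.gz"):
            raise ValueError(f"Destination URI must end with '.csv.gz': {uri}")


tables = ["users", "orders"]
uris = build_destination_uris_sketch("my-bucket", tables)  # hypothetical bucket
check_destination_uris_sketch(tables, uris)
print(uris)
```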
- get_format_attributes(table_name)
Return the following table attributes: schema, time_partitioning, range_partitioning, require_partition_filter, clustering_fields.
- load_table(source_uri: str, destination_table_name: str, schema: Optional[List[SchemaField]] = None, field_delimiter: Optional[str] = '|', write_disposition: Optional[WriteDisposition] = 'WRITE_TRUNCATE') -> None
Load one or more Storage CSV files into one BigQuery table.
- load_tables(source_uris: List[str], destination_table_names: List[str], schemas: Optional[List[List[SchemaField]]] = None, field_delimiter: Optional[str] = '|', write_disposition: Optional[WriteDisposition] = 'WRITE_TRUNCATE') -> None
Load Storage CSV files into BigQuery tables.
- property project_id: str
The id of the project which the client acts on behalf of.
- Type
str
- run_queries(queries: List[str], destination_table_names: List[str], write_disposition: Optional[WriteDisposition] = 'WRITE_TRUNCATE') -> dict
Run queries. Return monitoring as a dict in the format {‘duration’: d, ‘cost’: c} where d is the execution duration in seconds and c the execution cost in dollars.
- run_query(query: str, destination_table_name: str, write_disposition: Optional[WriteDisposition] = 'WRITE_TRUNCATE') -> dict
Run a query. Return monitoring as a dict in the format {‘duration’: d, ‘cost’: c} where d is the execution duration in seconds and c the execution cost in dollars.
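Monitoring dicts in the {‘duration’: d, ‘cost’: c} format are easy to combine when several queries are run one after another. The sketch below sums them. The helper total_monitoring_sketch and the sample values are made up for illustration, and summing durations assumes the queries ran sequentially rather than in parallel.

```python
from typing import Dict, List


def total_monitoring_sketch(monitorings: List[Dict[str, float]]) -> Dict[str, float]:
    """Sum durations (seconds) and costs (dollars) over several monitoring dicts."""
    return {
        "duration": sum(m["duration"] for m in monitorings),
        "cost": sum(m["cost"] for m in monitorings),
    }


# Made-up values in the documented format, not real measurements.
monitorings = [
    {"duration": 1.5, "cost": 0.25},
    {"duration": 2.5, "cost": 0.5},
]
print(total_monitoring_sketch(monitorings))
```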
- static sample_query(query: str, size: int) -> str
Randomly sample a query.
The output query returns a subset of the rows returned by the input query. This subset has approximately size rows. Nonetheless, the cost of the output query is the same as the cost of the input query.
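One possible way to obtain such a sampled query is to keep each row with probability size divided by the total row count. The sketch below only builds the SQL string and is an assumption about how sampling could be done. It is not the SQL that sample_query actually generates, the function name sample_query_sketch is hypothetical, and it makes no claim about cost. It also assumes the input query has no trailing semicolon.

```python
def sample_query_sketch(query: str, size: int) -> str:
    """Wrap a query so that each row is kept with probability size / row_count."""
    return (
        f"SELECT * FROM ({query}) "
        f"WHERE RAND() < {size} / (SELECT COUNT(*) FROM ({query}))"
    )


print(sample_query_sketch("SELECT name FROM my_dataset.users", 100))
```

Because each row is kept independently, the number of rows returned varies from run to run around size rather than matching it exactly.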