Commit 485a3c1

update rendered to be unexecuted versions with output examples for command line
1 parent 3d3ed76 commit 485a3c1

13 files changed

+446
-1848
lines changed

notebooks/rendered/bigquery-basics.md

Lines changed: 19 additions & 178 deletions
@@ -1,5 +1,5 @@
11

2-
# BigQuery Basics
2+
# BigQuery basics
33

44
[BigQuery](https://cloud.google.com/bigquery/docs/) is a petabyte-scale analytics data warehouse that you can use to run SQL queries over vast amounts of data in near realtime. This page shows you how to get started with the Google BigQuery API using the Python client library.
55

@@ -16,13 +16,11 @@ import pandas
1616
To use the BigQuery Python client library, start by initializing a client. The BigQuery client is used to send and receive messages from the BigQuery API.
1717

1818
### Client project
19-
The project used by the client will default to the project associated with the credentials file stored in the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.
20-
21-
See the [google-auth](https://google-auth.readthedocs.io/en/latest/reference/google.auth.html) for more information about Application Default Credentials.
19+
The `bigquery.Client` object uses your default project. Alternatively, you can specify a project in the `Client` constructor. For more information about how the default project is determined, see the [google-auth documentation](https://google-auth.readthedocs.io/en/latest/reference/google.auth.html).
2220

2321

2422
### Client location
25-
Locations are required for certain BigQuery operations such as creating a Dataset. If a location is provided to the client when it is initialized, it will be the default location for jobs, datasets, and tables.
23+
Locations are required for certain BigQuery operations such as creating a dataset. If a location is provided to the client when it is initialized, it will be the default location for jobs, datasets, and tables.
2624

2725
Run the following to create a client with your default project:
2826

@@ -32,10 +30,7 @@ client = bigquery.Client(location="US")
3230
print("Client creating using default project: {}".format(client.project))
3331
```
3432

35-
Client creating using default project: your-project-id
36-
37-
38-
Alternatively, you can explicitly specify a project when constructing the client:
33+
To explicitly specify a project when constructing the client, set the `project` parameter:
3934

4035

4136
```python
@@ -44,15 +39,17 @@ Alternatively, you can explicitly specify a project when constructing the client
4439

4540
## Run a query on a public dataset
4641

47-
The following example runs a query on the BigQuery `usa_names` public dataset, which is a Social Security Administration dataset that contains all names from Social Security card applications for births that occurred in the United States after 1879.
42+
The following example queries the BigQuery `usa_names` public dataset to find the 10 most popular names. `usa_names` is a Social Security Administration dataset that contains all names from Social Security card applications for births that occurred in the United States after 1879.
4843

49-
Use the [Client.query()](https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client.query) method to run the query, and the [QueryJob.to_dataframe()](https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.job.QueryJob.html#google.cloud.bigquery.job.QueryJob.to_dataframe) method to return the results as a [pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).
44+
Use the [Client.query](https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.client.Client.html#google.cloud.bigquery.client.Client.query) method to run the query, and the [QueryJob.to_dataframe](https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.job.QueryJob.html#google.cloud.bigquery.job.QueryJob.to_dataframe) method to return the results as a pandas [`DataFrame`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).
5045

5146

5247
```python
5348
query = """
54-
SELECT name FROM `bigquery-public-data.usa_names.usa_1910_current`
55-
WHERE state = "TX"
49+
SELECT name, SUM(number) as total
50+
FROM `bigquery-public-data.usa_names.usa_1910_current`
51+
GROUP BY name
52+
ORDER BY total DESC
5653
LIMIT 10
5754
"""
5855
query_job = client.query(
@@ -65,72 +62,13 @@ df = query_job.to_dataframe()
6562
df
6663
```
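
The revised query aggregates across all states instead of filtering to Texas. As a rough local illustration of what `GROUP BY name` with `SUM(number)` and `ORDER BY total DESC` computes, the sketch below runs a query of the same shape against an in-memory SQLite table. The table name `usa_names`, the helper `top_names`, and the rows are made up for illustration, and SQLite stands in for BigQuery here, so this shows only the aggregation logic, not the client library.

```python
import sqlite3

# Made-up rows in the shape (state, name, number); not real SSA data.
ROWS = [
    ("TX", "Mary", 120),
    ("CA", "Mary", 80),
    ("TX", "James", 150),
    ("NY", "James", 90),
    ("TX", "Ruby", 40),
]


def top_names(rows, limit=10):
    """Sum `number` per name across all states and return the largest totals."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usa_names (state TEXT, name TEXT, number INTEGER)")
    conn.executemany("INSERT INTO usa_names VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT name, SUM(number) AS total
        FROM usa_names
        GROUP BY name
        ORDER BY total DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return result


print(top_names(ROWS))
```

Each name appears once in the result, with its counts from every state added together before the rows are ranked.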
6764

68-
69-
70-
71-
<div>
72-
73-
<table>
74-
<thead>
75-
<tr>
76-
<th></th>
77-
<th>name</th>
78-
</tr>
79-
</thead>
80-
<tbody>
81-
<tr>
82-
<th>0</th>
83-
<td>Mary</td>
84-
</tr>
85-
<tr>
86-
<th>1</th>
87-
<td>Ruby</td>
88-
</tr>
89-
<tr>
90-
<th>2</th>
91-
<td>Annie</td>
92-
</tr>
93-
<tr>
94-
<th>3</th>
95-
<td>Willie</td>
96-
</tr>
97-
<tr>
98-
<th>4</th>
99-
<td>Ruth</td>
100-
</tr>
101-
<tr>
102-
<th>5</th>
103-
<td>Gladys</td>
104-
</tr>
105-
<tr>
106-
<th>6</th>
107-
<td>Maria</td>
108-
</tr>
109-
<tr>
110-
<th>7</th>
111-
<td>Frances</td>
112-
</tr>
113-
<tr>
114-
<th>8</th>
115-
<td>Margaret</td>
116-
</tr>
117-
<tr>
118-
<th>9</th>
119-
<td>Helen</td>
120-
</tr>
121-
</tbody>
122-
</table>
123-
</div>
124-
125-
126-
12765
## Run a parameterized query
12866

129-
BigQuery supports query parameters to help prevent [SQL injection](https://en.wikipedia.org/wiki/SQL_injection) when queries are constructed using user input. This feature is only available with [standard SQL syntax](https://cloud.google.com/bigquery/docs/reference/standard-sql/). Query parameters can be used as substitutes for arbitrary expressions. Parameters cannot be used as substitutes for identifiers, column names, table names, or other parts of the query.
67+
BigQuery supports query parameters to help prevent [SQL injection](https://en.wikipedia.org/wiki/SQL_injection) when you construct a query with user input. Query parameters are only available with [standard SQL syntax](https://cloud.google.com/bigquery/docs/reference/standard-sql/). Query parameters can be used as substitutes for arbitrary expressions. Parameters cannot be used as substitutes for identifiers, column names, table names, or other parts of the query.
13068

131-
To specify a named parameter, use the `@` character followed by an [identifier](https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#identifiers), such as `@param_name`. For example, this query finds all the words in a specific Shakespeare corpus with counts that are at least the specified value.
69+
To specify a parameter, use the `@` character followed by an [identifier](https://cloud.google.com/bigquery/docs/reference/standard-sql/lexical#identifiers), such as `@param_name`. For example, the following query finds all the words in a specific Shakespeare corpus with counts that are at least the specified value.
13270

133-
For more information, see [Running Parameterized Queries](https://cloud.google.com/bigquery/docs/parameterized-queries) in the BigQuery documentation.
71+
For more information, see [Running parameterized queries](https://cloud.google.com/bigquery/docs/parameterized-queries) in the BigQuery documentation.
13472

13573

13674
```python
@@ -158,89 +96,9 @@ query_job = client.query(sql, location="US", job_config=job_config)
15896
query_job.to_dataframe()
15997
```
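
The hunk above elides most of the parameterized query example. A minimal sketch of how such a query could be wired up with the Python client library follows, assuming the `google-cloud-bigquery` package is installed and credentials are configured. The parameter names `corpus` and `min_word_count`, the helpers `build_parameters` and `run_query`, and the example values are assumptions for illustration, not necessarily what the notebook uses.

```python
# Named parameters appear in the SQL as @name; values are sent separately,
# so user input is never spliced into the query text.
SQL = """
SELECT word, word_count
FROM `bigquery-public-data.samples.shakespeare`
WHERE corpus = @corpus
  AND word_count >= @min_word_count
ORDER BY word_count DESC
"""


def build_parameters(corpus, min_word_count):
    """Return hypothetical (name, BigQuery type, value) triples for SQL."""
    return [
        ("corpus", "STRING", corpus),
        ("min_word_count", "INT64", min_word_count),
    ]


def run_query(corpus, min_word_count):
    """Run SQL with bound parameters; needs network access and credentials."""
    # Imported lazily so the rest of this sketch works without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig()
    job_config.query_parameters = [
        bigquery.ScalarQueryParameter(name, type_, value)
        for name, type_, value in build_parameters(corpus, min_word_count)
    ]
    return client.query(SQL, job_config=job_config).to_dataframe()
```

Calling `run_query("romeoandjuliet", 250)` would send the two values alongside the query rather than inside it. Parameters can stand in for values such as these, but not for table or column names.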
16098

161-
162-
163-
164-
<div>
165-
166-
<table>
167-
<thead>
168-
<tr>
169-
<th></th>
170-
<th>word</th>
171-
<th>word_count</th>
172-
</tr>
173-
</thead>
174-
<tbody>
175-
<tr>
176-
<th>0</th>
177-
<td>the</td>
178-
<td>614</td>
179-
</tr>
180-
<tr>
181-
<th>1</th>
182-
<td>I</td>
183-
<td>577</td>
184-
</tr>
185-
<tr>
186-
<th>2</th>
187-
<td>and</td>
188-
<td>490</td>
189-
</tr>
190-
<tr>
191-
<th>3</th>
192-
<td>to</td>
193-
<td>486</td>
194-
</tr>
195-
<tr>
196-
<th>4</th>
197-
<td>a</td>
198-
<td>407</td>
199-
</tr>
200-
<tr>
201-
<th>5</th>
202-
<td>of</td>
203-
<td>367</td>
204-
</tr>
205-
<tr>
206-
<th>6</th>
207-
<td>my</td>
208-
<td>314</td>
209-
</tr>
210-
<tr>
211-
<th>7</th>
212-
<td>is</td>
213-
<td>307</td>
214-
</tr>
215-
<tr>
216-
<th>8</th>
217-
<td>in</td>
218-
<td>291</td>
219-
</tr>
220-
<tr>
221-
<th>9</th>
222-
<td>you</td>
223-
<td>271</td>
224-
</tr>
225-
<tr>
226-
<th>10</th>
227-
<td>that</td>
228-
<td>270</td>
229-
</tr>
230-
<tr>
231-
<th>11</th>
232-
<td>me</td>
233-
<td>263</td>
234-
</tr>
235-
</tbody>
236-
</table>
237-
</div>
238-
239-
240-
24199
## Create a new dataset
242100

243-
A dataset is contained within a specific [project](https://cloud.google.com/bigquery/docs/projects). Datasets are top-level containers that are used to organize and control access to your [tables](https://cloud.google.com/bigquery/docs/tables) and [views](https://cloud.google.com/bigquery/docs/views). A table or view must belong to a dataset, so you need to create at least one dataset before [loading data into BigQuery](https://cloud.google.com/bigquery/loading-data-into-bigquery).
101+
A dataset is contained within a specific [project](https://cloud.google.com/bigquery/docs/projects). Datasets are top-level containers that are used to organize and control access to your [tables](https://cloud.google.com/bigquery/docs/tables) and [views](https://cloud.google.com/bigquery/docs/views). A table or view must belong to a dataset. You need to create at least one dataset before [loading data into BigQuery](https://cloud.google.com/bigquery/loading-data-into-bigquery).
244102

245103

246104
```python
@@ -253,7 +111,7 @@ dataset = client.create_dataset(dataset_id) # API request
253111

254112
## Write query results to a destination table
255113

256-
For more information, see [Writing Query Results](https://cloud.google.com/bigquery/docs/writing-results) in the BigQuery documentation.
114+
For more information, see [Writing query results](https://cloud.google.com/bigquery/docs/writing-results) in the BigQuery documentation.
257115

258116

259117
```python
@@ -274,9 +132,6 @@ query_job.result() # Waits for the query to finish
274132
print("Query results loaded to table {}".format(table_ref.path))
275133
```
276134

277-
Query results loaded to table /projects/your-project-id/datasets/your_new_dataset/tables/your_new_table_id
278-
279-
280135
## Load data from a pandas DataFrame to a new table
281136

282137

@@ -301,12 +156,9 @@ job.result() # Waits for table load to complete.
301156
print("Loaded dataframe to {}".format(table_ref.path))
302157
```
303158

304-
Loaded dataframe to /projects/your-project-id/datasets/your_new_dataset/tables/monty_python
305-
306-
307159
## Load data from a local file to a table
308160

309-
The example below demonstrates how to load a local CSV file into a new or existing table. See [SourceFormat](https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.job.SourceFormat.html#google.cloud.bigquery.job.SourceFormat) in the Python client library documentation for a list of available source formats. For more information, see [Loading Data into BigQuery from a Local Data Source](https://cloud.google.com/bigquery/docs/loading-data-local) in the BigQuery documentation.
161+
The following example demonstrates how to load a local CSV file into a new table. See [SourceFormat](https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.job.SourceFormat.html#google.cloud.bigquery.job.SourceFormat) in the Python client library documentation for a list of available source formats. For more information, see [Loading Data into BigQuery from a local data source](https://cloud.google.com/bigquery/docs/loading-data-local) in the BigQuery documentation.
310162

311163

312164
```python
@@ -332,12 +184,9 @@ print('Loaded {} rows into {}:{}.'.format(
332184
job.output_rows, dataset_id, table_ref.path))
333185
```
334186

335-
Loaded 50 rows into your_new_dataset:/projects/your-project-id/datasets/your_new_dataset/tables/us_states_from_local_file.
187+
## Load data from Cloud Storage to a table
336188

337-
338-
## Load data from Google Cloud Storage to a table
339-
340-
The example below demonstrates how to load a local CSV file into a new or existing table. See [SourceFormat](https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.job.SourceFormat.html#google.cloud.bigquery.job.SourceFormat) in the Python client library documentation for a list of available source formats. For more information, see [Introduction to Loading Data from Cloud Storage](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage) in the BigQuery documentation.
189+
The following example demonstrates how to load a CSV file from Cloud Storage into a new table. See [SourceFormat](https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.job.SourceFormat.html#google.cloud.bigquery.job.SourceFormat) in the Python client library documentation for a list of available source formats. For more information, see [Introduction to loading data from Cloud Storage](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage) in the BigQuery documentation.
341190

342191

343192
```python
@@ -348,7 +197,7 @@ job_config = bigquery.LoadJobConfig(
348197
bigquery.SchemaField('post_abbr', 'STRING')
349198
],
350199
skip_leading_rows=1,
351-
# The source format defaults to CSV, so the line below is optional.
200+
# The source format defaults to CSV. The line below is optional.
352201
source_format=bigquery.SourceFormat.CSV
353202
)
354203
uri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv'
@@ -367,11 +216,6 @@ destination_table = client.get_table(table_ref)
367216
print('Loaded {} rows.'.format(destination_table.num_rows))
368217
```
369218

370-
Starting job 1c54e163-d785-4551-b4b7-7170ba07e00a
371-
Job finished.
372-
Loaded 50 rows.
373-
374-
375219
## Cleaning Up
376220

377221
The following code deletes the dataset created for this tutorial, including all tables in the dataset.
@@ -386,6 +230,3 @@ client.delete_dataset(dataset, delete_contents=True)
386230

387231
print('Deleted dataset: {}'.format(dataset.path))
388232
```
389-
390-
Deleted dataset: /projects/your-project-id/datasets/your_new_dataset
391-
