Commit f6befff, authored and committed by sirtorrybeccasaurus (initial commit; update census; update notebooks)

AutoML Tables enables your entire team to automatically build and deploy state-of-the-art machine learning models on structured data at massively increased speed and scale.

## Problem Description

The model uses a real dataset, the [Census Income Dataset](https://archive.ics.uci.edu/ml/datasets/Census+Income).

The goal is to predict whether a given individual's income is above or below 50k, given information such as the person's age, education level, marital status, and occupation.

This is framed as a binary classification problem: each individual is labeled as having an income either above or below 50k.
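AutoML Tables trains directly on the label column, so you do not need to encode the label yourself. Purely to make the binary framing concrete, here is a minimal sketch that maps a raw income value to a 0/1 class. It assumes the UCI encoding of the income column (`>50K` or `<=50K`, with a trailing `.` in the UCI test split); `income_to_label` is a hypothetical helper, not code from the colab.

```python
def income_to_label(income: str) -> int:
    """Map a raw census income string to a binary class label.

    Assumes the UCI Adult encoding: ">50K" (positive class) or "<=50K"
    (negative class), possibly padded with spaces or ending in ".".
    """
    value = income.strip().rstrip(".")
    if value == ">50K":
        return 1
    if value == "<=50K":
        return 0
    raise ValueError(f"unexpected income value: {income!r}")


labels = [income_to_label(v) for v in [" <=50K", ">50K", "<=50K.", " >50K."]]
print(labels)  # [0, 1, 0, 1]
```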

### Dataset Details

The dataset consists of over 30k rows, each corresponding to a different person. Each row has 14 features that the model uses to predict that person's income. A few of the features are named above; the full list can be found at the dataset link above or in the colab.
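For reference, here is a minimal sketch of that row structure, assuming the column names and order documented for the UCI `adult.data` file (14 features followed by the `income` label). The CSV used in the colab may name or order its columns differently, so treat `FEATURES` and `parse_rows` as illustrative assumptions rather than the notebook's code.

```python
import csv
import io

# Column order as documented for the UCI Adult / Census Income file (adult.data).
# Assumption: the colab's CSV may use different names or ordering.
FEATURES = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
]
LABEL = "income"


def parse_rows(text: str) -> list[dict[str, str]]:
    """Parse headerless, UCI-style CSV text into one dict per person."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for record in reader:
        if not record:  # skip blank lines
            continue
        if len(record) != len(FEATURES) + 1:
            raise ValueError(f"expected {len(FEATURES) + 1} fields, got {len(record)}")
        rows.append(dict(zip(FEATURES + [LABEL], record)))
    return rows


# One illustrative line in the UCI format.
sample = ("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
          "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n")
row = parse_rows(sample)[0]
print(len(FEATURES), row["occupation"], row[LABEL])  # 14 Adm-clerical <=50K
```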

## Solution Walkthrough

The solution was developed in a [Google Colab notebook](https://colab.research.google.com/notebooks/welcome.ipynb).

Steps Involved

### 1. Set up

The first step is to set up the project. Following the [AutoML Tables documentation](https://cloud.google.com/automl-tables/docs/), take these steps:

* Create a Google Cloud Platform (GCP) project
* Enable billing
* Enable the AutoML API
* Enable the AutoML Tables API
* Create a service account, grant it the required permissions, and download its private key.

### 2. Initialize and authenticate

Installing the client library is self-explanatory in the colab.

The authentication process is only slightly more complex: run the second code block, entitled "Authenticate using service account key", and then upload the service account key you created in the setup step.
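If you want to run the same steps outside Colab, a common alternative to the upload cell is to point Application Default Credentials at the key file through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which Google Cloud client libraries read. The sketch below is a hypothetical helper (`use_service_account_key` is not part of the colab) that sanity-checks the downloaded JSON key before setting that variable.

```python
import json
import os

# Fields present in a downloaded service account key that this sketch checks for.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def use_service_account_key(path: str) -> str:
    """Validate a service account key file and point ADC at it.

    Returns the project ID recorded in the key file.
    """
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    missing = REQUIRED_KEYS - set(key)
    if missing:
        raise ValueError(f"key file is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"expected a service account key, got type={key['type']!r}")
    # Google Cloud client libraries pick this up via Application Default Credentials.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(path)
    return key["project_id"]
```

For example, `project_id = use_service_account_key("key.json")` (the file name is a placeholder) returns the project recorded in the key, which should match the project ID you use in the next step. Set the variable before constructing any client objects, since clients resolve credentials when they are created.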

To make sure your colab is authenticated and has access to your project, replace `project_id` with your own project ID and run the subsequent code blocks. You should see lists of your datasets and of any models you previously created in AutoML Tables.

### 3. Import training data

This section has you create a dataset and import the data into it. You can either import the CSV directly from a Cloud Storage bucket, or upload the CSV into BigQuery and import it from there.
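The two import options differ mainly in the source URI. Here is a minimal sketch, assuming the `InputConfig` message shape of the v1beta1 AutoML API: a Cloud Storage import takes a list of `gs://` URIs under `gcs_source.input_uris`, and a BigQuery import takes a single `bq://project.dataset.table` URI under `bigquery_source.input_uri`. `make_input_config` and the example URIs are hypothetical; check the colab for the exact import call it passes such a config to.

```python
def make_input_config(uri: str) -> dict:
    """Build an input config dict for a CSV in Cloud Storage or a BigQuery table.

    Assumed v1beta1 shapes:
      gs://bucket/path.csv       -> {"gcs_source": {"input_uris": [...]}}
      bq://project.dataset.table -> {"bigquery_source": {"input_uri": ...}}
    """
    if uri.startswith("gs://"):
        return {"gcs_source": {"input_uris": [uri]}}
    if uri.startswith("bq://"):
        return {"bigquery_source": {"input_uri": uri}}
    raise ValueError(f"unsupported source URI: {uri!r}")


print(make_input_config("gs://my-bucket/census.csv"))
# {'gcs_source': {'input_uris': ['gs://my-bucket/census.csv']}}
```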

### 4. Update dataset: assign a label column and enable nullable columns

This section is important because it is where you specify which column (that is, which feature) to use as the label. The model is then trained to predict this label from all the other features in the row. This is also where you mark columns as nullable, meaning they are allowed to contain missing values.
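Before applying these settings through the API, it can help to work out locally which columns become features and which ones actually contain missing values. Here is a minimal sketch, assuming rows parsed into dicts keyed by column name, and treating empty cells and `?` (the marker the UCI census files use for unknown values) as missing. `plan_columns` and the two toy rows are hypothetical, for illustration only.

```python
def plan_columns(rows: list[dict[str, str]], label: str,
                 missing_markers: tuple[str, ...] = ("", "?")) -> tuple[list[str], list[str]]:
    """Return (features, nullable_columns) for a chosen label column.

    features: every column except the label, in original order.
    nullable_columns: columns where at least one row holds a missing-value marker.
    """
    if not rows:
        raise ValueError("need at least one row")
    columns = list(rows[0].keys())
    if label not in columns:
        raise ValueError(f"label column {label!r} not found in {columns}")
    features = [c for c in columns if c != label]
    nullable = [c for c in columns
                if any(row[c].strip() in missing_markers for row in rows)]
    return features, nullable


# Toy rows for illustration only.
rows = [
    {"age": "50", "workclass": "?", "occupation": "Sales", "income": "<=50K"},
    {"age": "28", "workclass": "Private", "occupation": "", "income": ">50K"},
]
features, nullable = plan_columns(rows, label="income")
print(features)  # ['age', 'workclass', 'occupation']
print(nullable)  # ['workclass', 'occupation']
```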

### 5. Create a model

This section trains the model. You can specify how long the model should train for.
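In the AutoML Tables API, training time is expressed as a budget in milli node hours (`train_budget_milli_node_hours`), where 1,000 equals one node hour; the documented range was 1,000 to 72,000. Here is a minimal sketch that converts a budget in hours, assuming those limits still apply (the helper name `train_budget` is hypothetical).

```python
MIN_MILLI_NODE_HOURS = 1_000   # 1 node hour (assumed lower bound)
MAX_MILLI_NODE_HOURS = 72_000  # 72 node hours (assumed upper bound)


def train_budget(hours: float) -> int:
    """Convert a training budget in node hours to milli node hours.

    Raises ValueError if the result falls outside the assumed 1,000 to
    72,000 range for train_budget_milli_node_hours.
    """
    milli = round(hours * 1000)
    if not MIN_MILLI_NODE_HOURS <= milli <= MAX_MILLI_NODE_HOURS:
        raise ValueError(f"budget of {hours} node hours is outside 1 to 72 hours")
    return milli


print(train_budget(1), train_budget(2.5))  # 1000 2500
```

A larger budget lets AutoML Tables search more model configurations, at a correspondingly higher training cost.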

### 6. Make a prediction

This section lets you make a single online prediction. You can set the value of each numeric feature, and choose the value of each categorical feature from the drop-down menus.
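The values you choose end up in the prediction payload. Here is a minimal sketch, assuming a payload of the form `{"row": {"values": [...]}}` in which each value is a protobuf `Value` written as a dict (`number_value` for numeric features, `string_value` for categorical ones), listed in the same order as the model's input feature columns. `make_payload` and the three-feature example are hypothetical; compare against the payload the colab builds.

```python
def make_payload(features: dict, column_order: list[str]) -> dict:
    """Build an online-prediction payload of the form {"row": {"values": [...]}}.

    Numeric features become {"number_value": float}; everything else becomes
    {"string_value": str}. Values follow `column_order`, which is assumed to
    match the model's input feature columns.
    """
    values = []
    for name in column_order:
        value = features[name]
        if isinstance(value, (int, float)):
            values.append({"number_value": float(value)})
        else:
            values.append({"string_value": str(value)})
    return {"row": {"values": values}}


payload = make_payload(
    {"age": 39, "education": "Bachelors", "hours-per-week": 40},
    ["age", "education", "hours-per-week"],
)
print(payload)
# {'row': {'values': [{'number_value': 39.0}, {'string_value': 'Bachelors'}, {'number_value': 40.0}]}}
```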

Deploying the model for online prediction takes a while, and at the time of writing the SDK provides no feedback on deployment progress, so you need to wait until deployment finishes before running the online prediction. You can see in the [UI](https://console.cloud.google.com/automl-tables) when the deployment started by `response = client.deploy_model(model_name)` has finished.

To check, open the UI link above, navigate to the dataset you just uploaded, and go to the Predict tab. Click the "online prediction" text near the top to open the online prediction interface. On the far right of the screen you should see "model deployed" if the model is deployed, or "deploying model" if it is still deploying.
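If you prefer to wait in code rather than refreshing the UI, a generic polling loop works with any status check. Here is a minimal sketch; `wait_for_deployment` and the `is_deployed` callable are hypothetical, and what `is_deployed` actually checks (for example, a deployment state you fetch yourself, or a manual confirmation) is left as an assumption.

```python
import time
from typing import Callable


def wait_for_deployment(is_deployed: Callable[[], bool],
                        timeout_s: float = 3600.0,
                        poll_interval_s: float = 60.0) -> bool:
    """Call `is_deployed` until it returns True or `timeout_s` elapses.

    Returns True if deployment finished in time, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_deployed():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval_s, remaining))
```

With the defaults it checks once a minute for up to an hour; run `prediction_client.predict(model_name, payload)` only after it returns `True`.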

Once the model has finished deploying, run the `prediction_client.predict(model_name, payload)` line.

Note: if the model has not finished deploying, the prediction will NOT work.

### 7. Batch Prediction

A validation CSV file is provided for running a batch prediction; it contains a few rows of data that were not used in training or testing. The CSV is linked in the text of the colab as well as [here](https://storage.cloud.google.com/cloud-ml-data/automl-tables/notebooks/census_income_batch_prediction_input.csv).
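The link above is a browser URL, while batch prediction inputs in Cloud Storage are normally referenced by `gs://` URI. Here is a minimal sketch that converts one form to the other, assuming the `https://storage.cloud.google.com/BUCKET/OBJECT` layout, and then builds input and output configs in the assumed v1beta1 shape (`gcs_source.input_uris`, `gcs_destination.output_uri_prefix`). The helper name and the output bucket are hypothetical placeholders.

```python
from urllib.parse import urlparse


def browser_url_to_gcs_uri(url: str) -> str:
    """Convert https://storage.cloud.google.com/BUCKET/OBJECT to gs://BUCKET/OBJECT."""
    parsed = urlparse(url)
    if parsed.netloc != "storage.cloud.google.com":
        raise ValueError(f"not a Cloud Storage browser URL: {url!r}")
    bucket, _, obj = parsed.path.lstrip("/").partition("/")
    if not bucket or not obj:
        raise ValueError(f"URL must include a bucket and an object: {url!r}")
    return f"gs://{bucket}/{obj}"


input_uri = browser_url_to_gcs_uri(
    "https://storage.cloud.google.com/cloud-ml-data/automl-tables/notebooks/"
    "census_income_batch_prediction_input.csv"
)

# Assumed v1beta1 config shapes; replace the output bucket with one you own.
batch_input_config = {"gcs_source": {"input_uris": [input_uri]}}
batch_output_config = {"gcs_destination": {"output_uri_prefix": "gs://your-bucket/predictions/"}}

print(input_uri)
# gs://cloud-ml-data/automl-tables/notebooks/census_income_batch_prediction_input.csv
```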
