The good parts and the disappointing parts of Cloud Datalab

This entry is a continuation of a previous post.

Basics of Cloud Datalab

I wrote something similar before as well, but to recap, Cloud Datalab is roughly the following:

  • An interactive analysis environment based on Jupyter
  • Integrated with GCP
  • Jupyter and the Python libraries are packaged as a container
  • The container can be spun up on (and torn down from) GCE easily via the datalab command

Premise of Datalab

Datalab is designed to work closely with a GCP project.

If you don't specify anything in particular, the defaults apply: a 200GB disk, an n1-standard-1 instance, and a Cloud Source Repository created alongside the instance, as we'll see below.

Start-up

With that premise in mind, let's try a few things. First of all, start up Datalab:

$ datalab create --disk-size-gb 10 --no-create-repository datalab-test
 

  • --disk-size-gb specifies the disk size.

    • The default is 200GB, so I specified a smaller 10GB.
  • --no-create-repository skips creating the Cloud Source Repository.

    • At one point, when I had deleted just the repository, Datalab would not start unless I passed --no-create-repository... What is going on there? I need to investigate properly.

Integration with BigQuery

Datalab's integration with BigQuery is very nice. This is a slight digression, but Jupyter has magic commands, the cell commands whose lines begin with %%, and Datalab provides magics for BigQuery and GCS as well.

Running a query as a magic command

This is straight from the sample, but trying it out made me appreciate how great it is to be able to write the query directly in a cell:

%%bq query

SELECT id, title, num_characters
FROM `publicdata.samples.wikipedia`
WHERE wp_namespace = 0
ORDER BY num_characters DESC
LIMIT 10

Running queries through google.datalab.bigquery

Once you can put a BQ query into a cell, the natural next step is to want to process the results, and as in the sample, you can receive the results of a query as a pandas DataFrame. Wonderful.

%%bq query -n requests

SELECT timestamp, latency, endpoint
FROM `cloud-datalab-samples.httplogs.logs_20140615`
WHERE endpoint = 'Popular' OR endpoint = 'Recent'

import google.datalab.bigquery as bq
import pandas as pd

df = requests.execute(output_options=bq.QueryOutput.dataframe()).result()

Doing it a bit more API-style looks something like this:

import google.datalab.bigquery as bq
import pandas as pd

# The query to issue
query = """SELECT timestamp, latency, endpoint
FROM `cloud-datalab-samples.httplogs.logs_20140615`
WHERE endpoint = 'Popular' OR endpoint = 'Recent'"""

# Create a query object
qobj = bq.Query(query)
# Get the query results as a pandas DataFrame
df2 = qobj.execute(output_options=bq.QueryOutput.dataframe()).result()
# From here on, the usual pandas operations apply
df2.head()

Come to think of it, it makes sense that the API is provided first and the magic command is built as a layer on top of it. In fact, if you look here, you can see that %%bq is defined as a magic command.
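
To illustrate that layering, here is a minimal sketch of how a cell magic along these lines could be defined with IPython's register_cell_magic. This is not Datalab's actual implementation, and the magic name bq_sketch is made up to avoid clashing with the real %%bq:

# A minimal sketch, not Datalab's actual implementation: a cell magic
# that simply delegates the cell body to the google.datalab.bigquery API.
import google.datalab.bigquery as bq
from IPython.core.magic import register_cell_magic

@register_cell_magic('bq_sketch')  # hypothetical name; the real one is %%bq
def bq_sketch(line, cell):
    # Treat the cell body as SQL and return the result as a DataFrame,
    # the same call chain as the API example above
    return bq.Query(cell).execute(
        output_options=bq.QueryOutput.dataframe()).result()

Registered this way, a cell starting with %%bq_sketch runs its body as a query, which is essentially the convenience that %%bq provides.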

Integration with GCS

As with BigQuery, following the sample, you can manipulate objects on GCS from a cell. The point is that you can read and write files. BigQuery results can serve as a data source through the integration too, but being able to handle data sitting on GCS transparently as a data source is the really attractive part.
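
As a rough illustration, here is a minimal sketch of reading and writing a GCS object with the google.datalab.storage API. The bucket and object names are placeholders, and I am assuming read_stream returns the object's contents as bytes:

import google.datalab.storage as storage
import pandas as pd
from io import BytesIO

# Placeholder names; substitute your own bucket and object
obj = storage.Bucket('my-bucket').object('logs/sample.csv')

# Read the object's contents and hand them straight to pandas
df = pd.read_csv(BytesIO(obj.read_stream()))

# Writing goes the other way: serialize, then upload with a content type
obj.write_stream(df.to_csv(index=False), 'text/csv')

The GCS magics mentioned earlier are built on this same API.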

Integration with Cloud ML

I confirmed that something does run through the API, but there is still a lot about its behavior I don't understand, so I'll skip it this time.

Changing the instance type

This is where the cloud shows its true value. If you need more specs than an on-premises machine can offer, you can just scale up. The datalab create command's --machine-type option lets you specify the instance type. By default it apparently comes up as n1-standard-1.

# Delete the instance with the delete command
# The disk that was attached stays intact
$ datalab delete datalab-test

# Recreate it with the same machine name, changing the instance type.
# The disk is named by the convention "machine name + pd", so using
# the same machine name gets the disk reattached for you.
$ datalab create --no-create-repository \
--machine-type n1-standard-4 \
datalab-test

Now, you can raise or lower the specs of the machine if necessary.

A GPU analysis environment!

Anyway, this was supposed to be the highlight of this post.

With this!!! All that's left is to specify a GPU instance!!!! And an easy, handy GPU machine learning environment is yours!!!!

...or so I thought, but the world does not go that easily: GPU instances are not supported in Datalab.

Summary

Datalab does have its disappointing parts: GPU instances aren't supported (though I hold a faint hope that support will come eventually), there's the Cloud Source Repository quirk, and the Cloud ML Engine side still isn't quite there. Those aside, I think it is an important piece for building a data analysis environment these days. Next time I want to look at this area a bit more closely.

Other reference information