Installing, Deploying, Using TensorFlow in Python

Speaker: Chris Fregly, Pipeline.ai (former Netflix, Databricks, kickstarted Spark.)

https://github.com/PipelineAI/pipeline/

Zero respect for Python until he found Scientific Python.

PySpark is not his favorite; he would rather use Scala for Spark. Originally a Java guy.

Slides are available on the meetup page.

Great GPU notebooks and examples

This guy is such a salesman. He used port 6969 because of port collisions. Clever.

Notes from meetup site

Main talk (~30 mins + Q&A)

Optimizing, Profiling, and Deploying High-Performance Spark ML and TensorFlow AI Models in Production with GPUs

Using the latest advancements from TensorFlow including the Accelerated Linear Algebra (XLA) Framework, JIT/AOT Compiler, and Graph Transform Tool, I’ll demonstrate how to optimize, profile, and deploy TensorFlow Models - and the TensorFlow Runtime - in GPU-based production environment. This talk is 100% demo based on open source tools and completely reproducible through Docker on your own GPU cluster.

Bio: Chris Fregly is Founder and Research Engineer at PipelineAI, a Streaming Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O’Reilly Training and Video Series titled "High-Performance TensorFlow in Production".

Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.

Tensorflow!

Mostly in C++.

Neural networks never want to go to production
always want to be retrained
want someone else to do the last 20%

Pipeline.ai

A framework to get your models into production. Something about data pipelines!

Today - TensorFlow (TF) - Condensed talk: 8 hours into 30 minutes

Optimize and Deploy TF models to production.

Nice to have a working knowledge of TF.

GPU heavy talk.

Content Breakdown

50% training optimizations (GPUs, Training Pipeline, JIT)
50% prediction optimizations (AOT Compile, TF Serving)

Why heavy focus on 'model prediction' vs 'just training'?

Training - boring and batch! Prediction - exciting and real time!

Agenda - see the slides

The biggest thing is telling data scientists that they can deploy from their Jupyter notebook in a controlled manner, with the ability to push a model out and roll it back.

DOCKER - Package model and runtime as one

Docker for Mac - opened up a ton of use cases!
- no surprises in production
- deploy and tune model + runtime together
- same local, dev, production env

Before this, production was always a bit different from development. A bunch of VirtualBox cruft.

Did he consider Vagrant? I think Docker just blows Vagrant away for developers and removes tons of devops.

Tune Model + Runtime together! (see slides!)

Offline models aren't always going to do well online... (see slides)

Online - real-time metrics. Cost per prediction!

The prediction profiling and tuning stuff is about bot identification.

Pipeline AI - Continuous Model Training

Identify and fix borderline predictions. Hotdog - not hotdog.

Use CrowdFlower to label the data, then relabel along the classification's edges.
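One way to picture "relabel along the classification's edges": pick the predictions whose score sits close to the decision threshold and send only those back for human labeling. This is a minimal sketch, not PipelineAI's implementation. The function name `borderline`, the 0.5 threshold, the 0.1 margin, and the sample scores are all assumptions made up for illustration.

```python
def borderline(predictions, threshold=0.5, margin=0.1):
    """Return the ids whose score falls within `margin` of the threshold."""
    return [
        item_id
        for item_id, score in predictions
        if abs(score - threshold) <= margin
    ]


# Hypothetical (image id, P(hotdog)) pairs.
scores = [("img_1", 0.97), ("img_2", 0.52), ("img_3", 0.08), ("img_4", 0.44)]
to_relabel = borderline(scores)
print(to_relabel)
```

Confident predictions (0.97, 0.08) are left alone; only the uncertain ones go back to the labelers.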

Shift traffic to winning model using AI Bandit Algorithms.
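The talk did not say which bandit algorithm is used, so the following is only a sketch of the general idea using epsilon-greedy, one of the simplest bandit strategies. The class name `EpsilonGreedyRouter`, the model names, and the simulated conversion rates are hypothetical.

```python
import random


class EpsilonGreedyRouter:
    """Route each request to a model, favoring the best observed reward rate."""

    def __init__(self, model_names, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {name: 0 for name in model_names}
        self.rewards = {name: 0.0 for name in model_names}

    def mean_reward(self, name):
        pulls = self.pulls[name]
        return self.rewards[name] / pulls if pulls else 0.0

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the current leader.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.pulls))
        return max(self.pulls, key=self.mean_reward)

    def record(self, name, reward):
        self.pulls[name] += 1
        self.rewards[name] += reward


# Simulated experiment: model_b "wins" far more often than model_a.
true_rates = {"model_a": 0.05, "model_b": 0.30}
router = EpsilonGreedyRouter(list(true_rates), epsilon=0.1, seed=42)
simulator = random.Random(7)
for _ in range(5000):
    model = router.choose()
    reward = 1.0 if simulator.random() < true_rates[model] else 0.0
    router.record(model, reward)

print(router.pulls)
```

Most traffic ends up on the better model, while the small exploration share keeps checking whether the other one has improved.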

"Once we drain this experiment of its value... we can stop this experiment."

Shift traffic to minimum "Cloud Cost"

Google Cloud vs Amazon Web Services

Shift to Amazon when the cloud instances are cheaper. Shift back and forth.

He made fun of Microsoft Azure but plugged their Kubernetes support.

NVIDIA GPU Half-Precision Support

All about Volta V100 (2017) - AMD - tensor cores / Google TPUs
Pascal P100 (2016)
FP32 = Full Precision
FP16 = Half Precision
Half precision is good for approximate deep learning use cases
Fit two FP16s into FP32 GPU cores for 2x throughput!!
Thunk to 32 bit
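To get a feel for the FP16 vs FP32 trade-off without a GPU, here is a small sketch using Python's standard `struct` module, whose `e` format is IEEE 754 half precision. It only illustrates storage size and rounding, not GPU throughput; the helper name `round_trip` is made up for this example.

```python
import struct


def round_trip(value, fmt):
    """Store `value` in the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


FP16, FP32 = "<e", "<f"  # IEEE 754 half and single precision

print(struct.calcsize(FP16), struct.calcsize(FP32))  # bytes per value
print(round_trip(0.1, FP16), round_trip(0.1, FP32))  # FP16 keeps fewer digits

# Two FP16 values occupy the same 4 bytes as one FP32 value.
packed_pair = struct.pack("<2e", 1.5, -2.25)
print(len(packed_pair) == struct.calcsize(FP32))

# FP16 has an 11-bit significand, so 2049 cannot be stored exactly.
print(round_trip(2049.0, FP16))
```

The rounding error is why half precision suits approximate workloads, and the 2-for-1 packing is the storage side of the 2x throughput claim.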

Nvidia: P100, M40, K40 - all tested for whatever reason

AMD? - GPU CUDA Programming

Barbaric, fun. Must know hardware. Probably hate your life.
Optimized for single instruction, multiple threads (SIMT).
GPUs do not like if-statements: when threads diverge, something like half the cores go unused.
Independent thread scheduling - finally
CUDA Streams
- Tensorflow uses this heavily
- goal is to saturate GPUs

Check out Batch Normalization

Almost always use batch normalization; the cases where you should not use it are rare
Technique from 2015
gradient descent
don't want the network to learn the order the data is showing up
first part of pipeline is shuffle
normalize per batch
normalize per layer
each mini batch may have wildly different distributions
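The shuffle-then-normalize steps above can be sketched in plain Python. This is a simplified, training-time-only illustration on scalar activations: real batch normalization learns `gamma` and `beta` and uses running averages at inference time. The function names and sample values are assumptions for the example.

```python
import math
import random


def mini_batches(rows, batch_size, seed=0):
    """Shuffle first, then yield fixed-size mini-batches."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one mini-batch of scalars to roughly zero mean, unit variance."""
    mean = sum(batch) / len(batch)
    variance = sum((x - mean) ** 2 for x in batch) / len(batch)
    scale = gamma / math.sqrt(variance + eps)
    return [scale * (x - mean) + beta for x in batch]


# Hypothetical activations with wildly different scales.
activations = [0.1, 0.2, 0.3, 0.4, 100.0, 200.0, 300.0, 400.0]
for batch in mini_batches(activations, batch_size=4):
    print([round(x, 3) for x in batch_norm(batch)])
```

Whatever mix of values lands in a batch, the next layer sees inputs on a comparable scale.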

Dropout

snip connections in your network
prevent overfitting
ensemble different neural architectures
randomly snip about 50% of them, purposely crippling the network
figure out ways to distort things they are trying to block out (spaces, numbers)
the more difficult it was for the network, the better the final network was
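A minimal sketch of the idea, using the common "inverted dropout" formulation: each activation is zeroed with probability `rate`, and the survivors are scaled up so the expected sum stays the same. The talk did not specify a formulation, so the function name, the scaling choice, and the sample values are assumptions.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=None):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in activations]


layer_output = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(dropout(layer_output, rate=0.5, rng=random.Random(0)))
print(dropout(layer_output, training=False))  # untouched at prediction time
```

Dropout is only applied during training; at prediction time the full network is used.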

this guy something or other - his friend

https://github.com/yaroslavvb/stuff

DON'T USE FEED_DICT

feed_dict requires Python<->C++ serialization
Use the Dataset API instead
What happened to DataFrame? It's gone, I guess. TensorFlow went straight to the Dataset API

TensorFlow Debugger - you can use the mouse from a terminal. It's crazy, only Google...

ALWAYS START WITH `estimator` AND `experiment` APIs

These come from Google trying to productionize tensorflow, probably successfully.

See the slide, has the above title!!!!!!!!

Skip all the early demos you see; they are old and crusty.
These APIs simplify model building.
flexible parameter tuning
enable rapid model experiments

Estimator API

See the slides.

Train-to-Serve Design
Create custom or use a canned Estimator
Hides session, graph layers,
Chief:Worker
Supervisor:Worker
getting away from master but not improving their language... just changing it...

Canned Estimators

See the slides, 60% use prebuilt, 40% roll their own estimator

Commonly used estimators
Pre tested and pre tuned

Multi-headed inference: `Single-Objective Estimator` vs `Multi-Objective Estimator`

Get two answers (objectives) out of one estimator, or something like that.

Hparams - Hyper parameter tuning

Do a big grid search, with each parameter given as a range.
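Independent of any TensorFlow API, a grid search just enumerates every combination of the parameter ranges and keeps the best-scoring one. A minimal sketch follows. The parameter names, the ranges, and the `score` function (a stand-in for "train a model and return a validation metric") are invented for illustration.

```python
import itertools


def grid(search_space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(search_space)
    for values in itertools.product(*(search_space[n] for n in names)):
        yield dict(zip(names, values))


def score(params):
    # Stand-in for "train a model and return a validation metric".
    lr_penalty = abs(params["learning_rate"] - 0.01)
    size_penalty = 0.001 * abs(params["hidden_units"] - 64)
    return -lr_penalty - size_penalty


search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "hidden_units": [32, 64, 128],
}
candidates = list(grid(search_space))
best = max(candidates, key=score)
print(len(candidates), best)
```

The number of combinations multiplies with every added parameter, which is why this is usually farmed out to a cluster.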

Layers API exists

Use kubernetes or mesos

Good cluster organizer for working on this stuff...

JIT Compiler - and visualizing

Need to do a Python Context Manager with device as... whatever to use JIT.

Historically: Spark & Project Tungsten would take your code, turn its plan into an optimized one, and then into a physical plan.

Built on XLA - accelerated linear algebra framework.

Goals
- reduce memory movement
- reduce overhead of multiple function calls
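Both goals can be pictured with operator fusion, one of the optimizations XLA performs. The snippet below is only a plain-Python analogy, not XLA or TensorFlow code: the unfused version makes three passes and writes two intermediate lists, while the fused version does the same arithmetic in one pass. Function names and data are made up for the example.

```python
def unfused(xs, scale, bias):
    """Three separate passes, each writing a full intermediate list."""
    scaled = [x * scale for x in xs]
    shifted = [x + bias for x in scaled]
    return [max(x, 0.0) for x in shifted]


def fused(xs, scale, bias):
    """One pass: scale, shift, and ReLU per element with no intermediates."""
    return [max(x * scale + bias, 0.0) for x in xs]


data = [-2.0, -0.5, 0.0, 1.5, 3.0]
print(unfused(data, 2.0, 1.0) == fused(data, 2.0, 1.0))
print(fused(data, 2.0, 1.0))
```

Same result, fewer trips through memory and fewer separate operations to launch.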

AOT Compiler - Standalone, Ahead-Of-Time (AOT) Compiler

For super tiny devices... pass in your graph.
Build up DAGs of operations in spark and tensorflow.

tfcompile - point to input for graph and output for graph
built on XLA framework
creates functions with feeds (inputs) and fetches (outputs)
figures out what everything... .so files, what bare minimum is necessary
creates a binary as small as 600k - this decides whether we fit in the Apple App Store or not.
shrink, shrink, tons of slides, freeze for production

request batch tuning

There's like 60 slides left !! Damn, lots of content.
