GPUs have become essential for applications of deep learning, and the landscape of this hardware has lately become quite interesting. The cryptocurrency fad has made GPUs expensive and/or unavailable. One solution to this problem is to rent a computer with a GPU from Google, Amazon, Microsoft, or others. Google even recently lowered its GPU prices quite substantially. I’ll save a full comparison of each platform for later; what I’ll go over today is getting started with a relatively new service, Paperspace, based in my home of Brooklyn <3.
The virtue of Paperspace is that its entry-level GPU offering costs $0.40/hr (compared to Google’s new $0.45/hr and Amazon’s $0.90/hr) but benchmarks ahead of Amazon’s and other offerings based on the Tesla K80.
Below, I’ll show you how to get up and running quickly and how to set up cost-saving measures, like auto-shutdown.
Steps to get running
First, as with any new machine, update the current packages with
sudo apt update
sudo apt upgrade
(Note: I had to add the `sudo apt update` step.)
Add a new user according to these instructions.
open ports with ufw
By default, Paperspace has a very strict firewall (this is a good thing). We’re going to want to get to our jupyter notebooks, though, so we need to open up some ports. You can do that with these instructions.
My version is pretty unsafe (it allows access from any IP) so feel free to check that link for info on restricting the IP that can access your jupyter port.
sudo ufw allow 8888
sudo ufw allow 60000:61000/udp
set up jupyter
fix autoshutdown with ssh
set up ssh keys
Install CUDA 9.1
I often find myself in roughly this situation:
I’ve done a lot of work to pre-process data, researched different DL architectures and selected one (or more), modified it so that it works well on my data. And now I’ve got a model that does well, but not quite as well as I want it to.
Below I’ll outline steps that I find very useful in this situation. Of course, more detail about how to get to that point is for another post. Also, I’m using Keras so many of these are specific to that tool, but they could absolutely be applied to other DL packages.
Clean up training code and improve logging.
My typical workflow when I’m building or tweaking a model by hand is to run Keras in a Jupyter notebook. This works fine if the entire pipeline runs in a few minutes, but not if I need to close my laptop while the pipeline is running. Jupyter often has trouble gracefully reconnecting, and then I lose all the output.
Porting my code into a self-contained Python script allows me to connect to a machine over mosh and tmux, run the script, and then forget about it. This is handy on its own, but it’ll be extremely handy when we get to the later tips.
The tools I use for this are:
`CSVLogger` takes the output from `verbose` and stores it in a CSV file for later. If you’re using tmux, it should be preserving your terminal log (and the `verbose` output), but the CSV puts all of that history in a format you can easily use.
`TensorBoard` is super handy if you want to be able to monitor the progress of a long-running model from somewhere other than your terminal. I can access it from my iPad and it looks great!
`ModelCheckpoint` saves your model every so often (you set the frequency as a parameter). This is nice on its own because it allows you to access a model that was trained in a script from anywhere (including my preferred tinkering environment, Jupyter). Even better, if you use `save_best_only=True`, then the script automatically saves space by keeping only the best-performing model (you should use `val_loss` or some other validation metric with this option).
Learning Rate Schedule
`keras.callbacks.ReduceLROnPlateau` is my preference. I use it in conjunction with `EarlyStopping` and `ModelCheckpoint` all the time.
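Putting these callbacks together, a setup might look roughly like this. The filenames, patience values, and the commented-out `fit` call are placeholders, not the exact values from my pipeline:

```python
from keras.callbacks import (CSVLogger, ModelCheckpoint,
                             ReduceLROnPlateau, EarlyStopping, TensorBoard)

callbacks = [
    CSVLogger("training_log.csv"),                   # persist per-epoch metrics
    TensorBoard(log_dir="./logs"),                   # monitor from a browser (or iPad!)
    ModelCheckpoint("best_model.h5", monitor="val_loss",
                    save_best_only=True),            # keep only the best model
    ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                      patience=3),                   # halve the LR when stuck
    EarlyStopping(monitor="val_loss", patience=10),  # give up eventually
]

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=callbacks)
```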
Use noise and other transformations to enlarge your datasets
- Tensorflow blog discussing this with background noise
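The idea in miniature, using plain NumPy rather than a full input pipeline (the function and its parameters here are just illustrative):

```python
import numpy as np

def augment_with_noise(x, y, n_copies=2, noise_std=0.05, seed=0):
    """Enlarge a dataset by appending Gaussian-noised copies of each sample."""
    rng = np.random.RandomState(seed)
    xs = [x] + [x + rng.normal(0.0, noise_std, size=x.shape)
                for _ in range(n_copies)]
    ys = [y] * (n_copies + 1)  # labels are unchanged by the noise
    return np.concatenate(xs), np.concatenate(ys)

x = np.random.rand(100, 16)   # 100 samples, 16 features
y = np.arange(100)
x_aug, y_aug = augment_with_noise(x, y)
print(x_aug.shape)  # (300, 16)
```

The same pattern extends to other label-preserving transformations (shifts, flips, pitch changes for audio), as in the linked post.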
There are many hyperparameters that can be optimized:
- Learning rate
- Dropout rate
- Preprocessing steps
- Different architectures (size and shape of layers as well as count) and parameters
- Regularizers and parameters
- Initializers and parameters
- and more!
These create an enormous space of possible models to test. My recommendation is to find something that works, use that functioning model to determine plausible ranges for the hyperparameters above, and then throw what you have into hyperopt.
- HyperOpt does this automatically. It’s poorly documented but fairly easy to use.
Last week I presented at the Data Science Study Group on a project of mine where I built a deep learning platform from scratch in Python.
My project drew primarily from two sets of sources, without which I never would have completed it. First, there are many examples of folks doing this online. Here’s an incomplete list in Python:
- **Deep Learning From Scratch I–V** by Daniel Sabinasz
- Implementing a Neural Network from Scratch in Python – An Introduction by Denny Britz
- Understanding and coding Neural Networks From Scratch in Python and R by Sunil Ray
- How to Implement the Backpropagation Algorithm From Scratch In Python
While all of these are useful, Sabinasz’s was what I based my project on because he implements a system that builds a computational graph and includes a true backpropagation algorithm. The others I saw do this implicitly by calculating the gradients operation by operation. That approach is fine for a single demo, but I wanted something that mimicked the flexibility of TensorFlow, allowing me to compare different network structures and activations without starting over each time.
In addition to these resources, I drew heavily from Deep Learning by Goodfellow, Bengio, and Courville. I’m certain that the other examples I looked toward used this book as well.
While I started with Sabinasz’s code, I made a few modifications and improvements including:
- Add graph visualization with python Graphviz
- Remove the use of globals for the computational graph
- Simplify backprop algorithm by adding gradient calculations to the operation classes
- Add a ReLU activation function
- Tweak the visualizations
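To sketch what tracking the graph buys you (this is a toy illustration, not the actual code from the talk): each operation records its local gradients, and a single backprop routine walks the graph, so adding a new operation never requires rewriting the derivative bookkeeping by hand.

```python
class Node:
    """A node in a computational graph that knows its own local gradients."""
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # upstream nodes
        self.local_grads = local_grads  # d(self)/d(parent) for each parent
        self.grad = 0.0

def add(a, b):
    return Node(a.value + b.value, (a, b), (1.0, 1.0))

def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))

def backprop(output):
    """Accumulate gradients from the output node via the chain rule.
    (A real implementation would visit nodes in reverse topological order.)"""
    output.grad = 1.0
    stack = [output]
    while stack:
        node = stack.pop()
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += node.grad * local
            stack.append(parent)

# f(x, y) = x * y + x  =>  df/dx = y + 1 = 3, df/dy = x = 3
x, y = Node(3.0), Node(2.0)
f = add(mul(x, y), x)
backprop(f)
print(x.grad, y.grad)  # 3.0 3.0
```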
Here’s the learning curve plotted along with the classification boundary for a ReLU network with 4 hidden nodes.
And here’s the computational graph. You can really see the benefit of tracking the graph and automating the backprop algorithm for a graph of this size.
What I still want to do
- I want to write up a blog post summarizing my talk and the process for creating this. I think it could be a very useful explanatory tool.
- I have a strong feeling that some of the gradients in here are inaccurate:
  - In many cases the network fails to learn under any learning rate schedule unless I give it much higher capacity than it needs (e.g. 4+ hidden nodes in the XOR task).
  - In simple cases, like separable data, the model should be able to get arbitrarily close to $J=0$ but fails to do so.
  - The softmax gradient seems to differ from that found in other sources.
- I want to extend this model to larger datasets and deeper networks. Right now it runs into what I think are underflow errors in these cases but they should be possible to avoid.
Trans friends, here’s a periodic reminder that it might be useful to update your documents (name, gender marker, id). I’ve recently done this. Here’s the quickest and cheapest path for US citizens:
1. Update your passport gender marker. For this you need proof of citizenship (like a birth certificate), a letter from your doctor, and a new passport photo. Use the template in the link below. The cost is $110 and this can be done in one or two days. info here
2. Update your name in court. This process differs state by state. Roughly, you’ll schedule a hearing, gather documents including a birth certificate, publish an announcement in the paper (trans folks are generally able to waive the publication requirement in NY), then get certified copies of the court order. NYC, NY, NJ
3. Update your passport name. Do this within a year of step one and it’s free and only requires the court order (form here).
4. Everything else (except sometimes a birth certificate) can be done with an updated passport and possibly some additional documentation.
If you need help, I’d love to help, or to find someone who can. If money is a problem, Trans Assistance Project, TLDEF, SRLP, and other organizations have provided it in the past. Let me know if you have any trouble.
One thing about being trans is that I get to come up with my own name. Plenty of non-trans people do this, sure, but to me it sort of feels like a rite of passage for trans folk. I’ve chosen “Sophia” for myself. I wanted to go through my thought process in choosing this name, both for myself and for others, as I found reading other people’s accounts of choosing their name to be quite helpful in choosing mine.
My checklist was something like the following (in approximate order of importance):
- It had to feel right.
- It had to feel feminine.
- None of my close friends have that name.
- It doesn’t stand out too much.
A common approach would be to feminize my current masculine name. So if my parents had named me Henry at birth, I might use Henrietta. This would have been fine, as I’m not personally bothered by deriving my female name from a male name, but it just doesn’t seem to fit. Another common option is to ask my parents what they would have named me if I had arrived as a girl to them. I think my mom has told me I would have been “Lindsey” which, for whatever reason, does not feel right. So back to the drawing board.
Sophia is a name that my wife and I have discussed for years as a potential name for a hypothetical daughter. But since we first started discussing it, the name has become enormously popular, rising from around the 50th most popular girls’ name in the ’90s, to around 10th in the ’00s, to between 1st and 3rd now. We’ve always wanted to give our children names that fall farther down the list (because of course they will be unique little snowflakes!), so we’re not quite as excited about it as a name for a child now.
But for me, this could be a good thing. The popularity of the name means that “Sophia” is very recognizably feminine, but, because it was more rare when I was young, I know essentially no one who has that name. It thus meets the last three criteria very well. It also has a weird personal connection, since I’ve envisioned a daughter with that name; I’m still not sure whether that’s good or bad.
But outside of all of these practical considerations, Sophia has a certain importance to me. To explain, let’s fire up the flashback machine to 2004 or so. At that time in my life I had just begun to play tabletop role-playing games with friends. It’s the kind of thing where you build a character out of the clay of your imagination and a giant rulebook and then roll dice to see what that character is able to do. You do this with friends who have made characters in a similar way, and try to make the voices the characters would make, et cetera. For me, it presented this strange dilemma: I was quite curious about playing a female character but totally apprehensive of doing so in front of the teenage boys I played the game with. [I’d be interested to learn more about how these kinds of games impacted other trans people.]
Around that time I started playing what would become my favorite video game, The Knights of the Old Republic. It was a game for Xbox that followed the exact same rules as the pen and paper games I played with my friends. But this was single player on the Xbox. For the first time I got to choose the gender of my character, name her, and play her, all to myself. Sure I might have to explain to my brother or friends why my save game had a chick on it but that was so much less daunting than performing as a female in front of teenage boys.
Anyway, in this game I chose the character who looked most like me: dark hair, light skin, a woman. I named her Sophia after a character in the book I was reading at the time, Sophie Neveu in The Da Vinci Code. I’ve sort of grown out of The Da Vinci Code, though at the time my Dan Brown obsession was burning hot. And, yes, I realize Sophie’s character in The Da Vinci Code is problematic for several reasons, but let’s give 14-year-old me a break. The important thing is that it all just felt comfortable. And, props to BioWare, Sophia could do everything that any male character could, including pursuing plot-impacting romances with major female Non-Player-Characters. So adolescent me got to try on the skin of a lesbian jedi who always saved the day. And I fucking loved it.
For the next decade-plus, Sophia and I (or, I as Sophia) played so many role-playing games: KOTOR 2, Mass Effect 1–3, Jade Empire, Dragon Age, … the list goes on (and certainly includes non-BioWare games, though those are my favorites). By the time I began to appreciate the fact that I am and have always been trans, at age 27, I’d already been trying on Sophia for nearly half of my life. Looking back and realizing that fact, there is no way I can “choose” anything else. It doesn’t even feel like choosing at that point. I am Sophia, and I have been for longer than even I realized.