Migration to a new technology tends to depend on how easy it is to set it up, and what ecosystem exists to support the technology and plug the gaps. In this post, we’ll cover the following:
- Part 1: A quick overview of the Docker ecosystem
- Part 2: Setting up Docker and tools
- Part 3: A Hello World example with Docker
This is the second article in the Docker for Data Science Series. You can read the first article here, in which we covered the use cases where Docker can help.
Part 1: Tools in the Docker Ecosystem
Let us first look at the components that we’ll be installing:
docker cli & docker engine: The Docker CLI (Command Line Interface) and Docker Engine are the core components that let us run Docker as a service on our system.
docker-compose: Often an application uses several Docker images that are completely independent of each other. docker-compose helps us build and deploy such multi-container applications. For example, a multi-container web app that uses php for the backend and mysql as the database server, each running in its own container, becomes easy to manage with docker-compose.
docker-machine: It helps create and manage many hosts, locally as well as remotely (think cloud platforms). Via docker-machine, one can run all the docker commands on any remote server from one’s own (local) system.
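To make the php plus mysql example a little more concrete, here is a minimal sketch of what such a setup could look like. The service names (web, db), the host port 8080, the image tags and the placeholder password are my own assumptions for illustration, not something prescribed by Docker. The snippet only writes the file and, if docker-compose is installed, validates it.

```shell
# Hypothetical project directory; any empty directory works.
COMPOSE_DIR="$(mktemp -d)"

# Write a two-service compose file: a PHP web container and a MySQL database.
cat > "$COMPOSE_DIR/docker-compose.yml" <<'EOF'
version: "3"
services:
  web:
    image: php:apache              # PHP with Apache (assumed tag)
    ports:
      - "8080:80"                  # host port 8080 -> container port 80
    depends_on:
      - db
  db:
    image: mysql:5.7               # MySQL server (assumed tag)
    environment:
      MYSQL_ROOT_PASSWORD: example # placeholder, change before real use
EOF

# Validate the file if docker-compose is installed; otherwise just report.
if command -v docker-compose >/dev/null 2>&1; then
  (cd "$COMPOSE_DIR" && docker-compose config -q) && echo "compose file is valid"
else
  echo "docker-compose not installed; file written to $COMPOSE_DIR"
fi
```

From inside that directory, `docker-compose up -d` would start both containers in the background and `docker-compose down` would stop and remove them.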
Part 2: Setting up Docker
We will be installing the stable release of Docker Community Edition (CE).
Installing in macOS
- Step 1: Download the .dmg installer from the download page for macOS
- Step 2: Do the usual drag-and-drop into the Applications folder to install Docker
- Step 3: Fire up Docker from Applications. You will see a Docker icon (see the image below), which means that the Docker daemon is running.
If you face any difficulties related to installation, you can refer to the official Docker for macOS page.
Installing in Windows
- Step 1: Download the .exe installer from the download page for Windows
- Step 2: Install from the file. If you get stuck with any issue related to installation, you can refer to the official Docker for Windows page.
- Docker for Windows is only available for 64-bit Windows 10 Pro, Enterprise and Education versions (1607 Anniversary Update, Build 14393 or later)
- For other Windows versions, install Docker Toolbox. Refer to this link on how to install.
- Docker Toolbox is similar to the Docker CE install mentioned above, minus Hyper-V support. Hyper-V is what creates the virtual machines on Windows 10; on versions that don’t support Hyper-V, VirtualBox can be used instead, and it suffices.
NOTE: All Docker tools (docker command line, docker-compose, docker-machine and a host of other utilities) are packaged together in the executables and you don’t have to download anything else.
Installing in Linux based distributions
Distributions supported by Docker are:
I’ll quickly outline the steps required to install Docker (and tools) in Ubuntu. Steps for other distros can be found in the corresponding help page.
- Step 1: Update your repositories if you have not done that in a while (to ensure you get the latest stable version)
- Step 2: Install required packages (apt-transport-https, ca-certificates, curl, software-properties-common)
- Step 3: Add GPG Key for Docker (repo)
- Step 4: Verify the key
- Step 5: Since we want the stable version, we’ll add that repository
- Step 6: Update the package index (so that apt knows where to install the Docker-CE stable release from)
- Step 7: Inspect the available versions so that you can choose one!
- Step 8: Install Docker (finally!)
- Step 9: Add (current) user to the docker user group (so that the user has necessary permissions)
- Step 10: Install docker-compose and docker-machine
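As a rough guide, steps 1 to 9 can be sketched as the shell function below. This is my reconstruction, assuming an amd64 Ubuntu machine with sudo access and the apt-key based flow that Docker documented for Ubuntu at the time; on newer Ubuntu releases apt-key is deprecated, so treat it as a sketch and check the official install page. The function name is hypothetical, and nothing runs until you call it. Step 10 is left out because the download URLs for docker-compose and docker-machine depend on the release version you choose.

```shell
# Sketch of steps 1-9 on Ubuntu. The body runs in a subshell with `set -e`
# so that it stops at the first failing command.
install_docker_ce_ubuntu() (
  set -e
  # Step 1: refresh the package lists
  sudo apt-get update
  # Step 2: packages needed to use a repository over HTTPS
  sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
  # Step 3: add Docker's GPG key
  curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
  # Step 4: verify the key by the last 8 characters of its fingerprint
  sudo apt-key fingerprint 0EBFCD88
  # Step 5: add the stable repository for this Ubuntu release
  sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  # Step 6: update the package index again so apt sees the new repository
  sudo apt-get update
  # Step 7: list the docker-ce versions that can be installed
  apt-cache madison docker-ce
  # Step 8: install the latest stable docker-ce
  sudo apt-get install -y docker-ce
  # Step 9: let the current user run docker without sudo (log out and in again)
  sudo usermod -aG docker "$(id -un)"
)
```

Running `install_docker_ce_ubuntu` in a terminal performs the steps in order; the group change in step 9 only takes effect in a new login session.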
I have listed the necessary commands to perform the steps outlined above (in Ubuntu) in this page on GitHub. Once that’s done, we are set for the Docker Magic!
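Whichever platform you installed on, a quick sanity check is to ask each tool for its version. The function name below is just a hypothetical helper; it assumes a bash-compatible shell and reports a tool as missing instead of failing.

```shell
# Print the version of each Docker tool, or note that it is missing.
check_docker_tools() {
  for tool in docker docker-compose docker-machine; do
    if command -v "$tool" >/dev/null 2>&1; then
      "$tool" --version
    else
      echo "$tool: not found on PATH"
    fi
  done
}

check_docker_tools
```

If docker itself is found, `docker run hello-world` is a handy end-to-end test: it pulls a tiny test image and runs it, which confirms that the CLI can talk to the daemon.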
Part 3: Hello Docker
To start off, let’s do a small hello-world example. We’ll start an RStudio Server without any installation, purely via Docker. I chose RStudio Server because
- RStudio is one of the most popular IDEs among Data Analysts and Data Scientists who prefer to use R
- RStudio Desktop edition is very easy to install, but installing the server edition can be very tricky
- Having to use the RStudio Server edition is a frequent use case, particularly on the cloud
What do we need for this?
- Docker Image that we can trust.
- Docker Command Line Interface (CLI) to run Docker commands.
The steps we’ll follow:
- Search for a Docker image
- Pick the right image
- Pull the Docker image to your local system
- Run the Docker image
- Visit localhost on port 8787 in your browser
Choosing and picking the right Docker image
The biggest factor in choosing a Docker image for professional use is whether it is an official Docker image. To give an example, in the snapshot below, any official image provided by the good folks at Docker would be tagged as official.
If we don’t have an official image, the next best bet is to go by the number of stars, or to visit the image’s GitHub repository and check the Dockerfile contents to be sure. For our example, we’ll pick the rocker/rstudio Docker image to run.
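The same search can be done from the command line. The sketch below wraps two docker search calls in a hypothetical helper, find_images; the threshold of 10 stars is an arbitrary choice of mine, and the command needs a running Docker daemon and network access.

```shell
# List official images matching a search term, then any matching images
# with at least 10 stars.
find_images() {
  term="$1"
  echo "Official images for '$term':"
  docker search --filter is-official=true "$term"
  echo "Images for '$term' with 10 or more stars:"
  docker search --filter stars=10 "$term"
}

# Example call (left commented out so that pasting the block runs nothing):
# find_images rstudio
```

The output has STARS and OFFICIAL columns, which map to the two criteria discussed above.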
Pull the Docker image and run it
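A minimal sketch of this step, assuming port 8787 is free on your machine. The container name and the password are placeholders I picked. As far as I know, newer releases of the rocker/rstudio image expect a password through the PASSWORD environment variable and may reject the old default, so the sketch sets one explicitly; if you do this, log in as rstudio with that password.

```shell
IMAGE="rocker/rstudio"
CONTAINER_NAME="rstudio"             # hypothetical container name
RSTUDIO_PASSWORD="change-me-please"  # placeholder password, pick your own

# Pull the image, then start it in the background with RStudio Server
# published on host port 8787.
start_rstudio() {
  docker pull "$IMAGE" &&
  docker run -d --name "$CONTAINER_NAME" \
    -p 8787:8787 \
    -e PASSWORD="$RSTUDIO_PASSWORD" \
    "$IMAGE"
}

# Example call (left commented out so that pasting the block starts nothing):
# start_rstudio
```

Here -d runs the container detached, -p maps the container’s port to the host, and -e passes the environment variable into the container.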
Visit RStudio Server (that’s it!)
Now open your browser and go to http://localhost:8787/, which will redirect you to the RStudio GUI (see below).
The default username:password is rstudio:rstudio. Once logged in, you will have the screen below.
Now you can do your analysis, install packages and all the other things you can do via a local RStudio application!
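When you are done, the container can be inspected, stopped and removed from the command line. The sketch below assumes the container was started with the name rstudio, which is a hypothetical name; substitute whatever `docker ps` shows on your system.

```shell
CONTAINER_NAME="rstudio"   # hypothetical name; use the one your container has

# Show the container's status and its most recent log lines.
rstudio_status() {
  docker ps --all --filter "name=$CONTAINER_NAME"
  docker logs --tail 20 "$CONTAINER_NAME"
}

# Stop the container and remove it; the pulled image stays on disk.
rstudio_cleanup() {
  docker stop "$CONTAINER_NAME" && docker rm "$CONTAINER_NAME"
}
```

Keep in mind that files saved inside the container are gone once it is removed, unless a volume was mounted when it was started.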
I hope you had your Aha! moment. Installing Docker should have felt easy to you and running docker images off the bat should have felt even easier! I see a few nodding heads there!
So that was just a test drive for Docker applications. Next we’ll try to understand how Docker works to ensure that we know how to debug errors when we are building more complicated apps.
If you liked this post, do share it. I would also like to hear from you about how I could have improved this post, and whether the setup process worked out for you.