Update docs for GCP #9

Merged
merged 9 commits on Sep 25, 2019
1 change: 1 addition & 0 deletions .gitignore
@@ -1 +1,2 @@
build/
_build/
25 changes: 19 additions & 6 deletions source/finalise.rst
@@ -8,8 +8,10 @@ In the meantime, you can connect to the cluster and follow its progress.
Setting service limits
----------------------

You can log into the management node at ``yourusername@mgmtipaddress``,
You can log into the management node at ``provisionerusername@mgmtipaddress``,
using the IP address that terraform printed at the end of its run.

On Oracle, the username is ``opc`` and on Google, the username is ``provisioner``.
For example:

.. code-block:: shell-session
@@ -20,7 +22,7 @@ Once logged in, you can run the ``finish`` script:

.. code-block:: shell-session

[opc@mgmt ~]$ ./finish
[opc@mgmt ~]$ finish

It will most likely tell you that the system has not finished configuring.
If the ``finish`` script is not there, wait a minute or two and it should appear.
@@ -35,15 +37,13 @@ Use ``ctrl-c`` to stop following the log.

You can keep trying to run ``finish`` until the node has finished configuring.
Once it has, you need to tell the system what limits you want to place on the scaling of the cloud.
To decide what to put here, you should refer to your service limits in the OCI website.
In the future, we hope to be able to extract these automatically but for now you need to replicate it manually.
Edit the file ``/home/opc/limits.yaml`` with:

.. code-block:: shell-session

[opc@mgmt ~]$ vim limits.yaml

and set its contents to something like:
On an Oracle-based system, you should set its contents to something like:

.. code-block:: yaml

@@ -60,11 +60,24 @@ which specifies, for each shape, what the service limit is for each AD in the region
In this case each of the shapes ``VM.Standard2.1`` and ``VM.Standard2.2`` have a service limit of 1 in each AD.
The system will automatically adjust for the shape used by the management node.

To decide what to put here, you should refer to your service limits in the OCI website.
In the future, we hope to be able to extract these automatically but for now you need to replicate it manually.

On a Google-based system, you should set it to something like:

.. code-block:: yaml

n1-standard-1: 3
n1-standard-2: 3

which restricts the cluster to having at most 3 ``n1-standard-1`` and 3 ``n1-standard-2`` nodes.
Since Google does not have per-node-type limits, you can make these numbers as large as you like in principle.

Run ``finish`` again and it should configure and start the Slurm server:

.. code-block:: shell-session

[opc@mgmt ~]$ ./finish
[opc@mgmt ~]$ finish

If your service limits change, you can update the file and run the script again.
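
Once ``finish`` completes successfully, a quick and entirely optional way to confirm that Slurm is responding is to ask it for its partition and node summary. ``sinfo`` is a standard Slurm command; the exact partitions and node names it prints will depend on your cluster, so no particular output is assumed here:

.. code-block:: shell-session

[opc@mgmt ~]$ sinfo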

131 changes: 131 additions & 0 deletions source/google-infrastructure.rst
@@ -0,0 +1,131 @@
Creating the infrastructure on Google
=====================================

Setting up the environment
--------------------------

Before we can install the cluster onto your cloud environment, we need to do some initial one-off setup.
Firstly, we need to install the command-line tool ``gcloud`` which allows you to configure Google Cloud.
Download and setup ``gcloud`` based on the `instructions from Google <https://cloud.google.com/sdk/docs/>`_.
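
To confirm that the installation worked and that ``gcloud`` is on your ``PATH``, you can ask it to report its version (the exact version you see will depend on when you installed it):

.. code-block:: shell-session

$ gcloud version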

Once you have ``gcloud`` installed, start by associating it with your Google account:

.. code-block:: shell-session

$ gcloud config set account <[email protected]>

where you should replace ``<[email protected]>`` with your email address.

Now that it knows who you are, you should set a default project.
You can find the ID of your project with ``gcloud projects list``.
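
For example, the listing might look something like the following, where the project ID and name are purely illustrative:

.. code-block:: shell-session

$ gcloud projects list
PROJECT_ID   NAME          PROJECT_NUMBER
citc-123456  CitC cluster  123456789012

Once you know your project's ID, set it as the default: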

.. code-block:: shell-session

$ gcloud config set project <citc-123456>

Once the project has been set, we can enable the required APIs to build Cluster in the Cloud.
This step will likely take a few minutes so be patient:

.. code-block:: shell-session

$ gcloud services enable compute.googleapis.com \
iam.googleapis.com \
cloudresourcemanager.googleapis.com \
file.googleapis.com
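
If you would like to double-check that the APIs have been switched on, you can list the services that are now enabled for the project (this is purely a sanity check and not required):

.. code-block:: shell-session

$ gcloud services list --enabled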

That's all the structural setup needed for the account.
The last ``gcloud`` thing we need to do is create a service account which Terraform uses to communicate with GCP.
Make sure to replace every instance of ``<citc-123456>`` with your project ID:

.. code-block:: shell-session

$ gcloud iam service-accounts create citc-terraform --display-name "CitC Terraform"
$ gcloud projects add-iam-policy-binding <citc-123456> --member serviceAccount:citc-terraform@<citc-123456>.iam.gserviceaccount.com --role='roles/editor'
$ gcloud projects add-iam-policy-binding <citc-123456> --member serviceAccount:citc-terraform@<citc-123456>.iam.gserviceaccount.com --role='roles/iam.securityAdmin'
$ gcloud iam service-accounts keys create citc-terraform-credentials.json --iam-account=citc-terraform@<citc-123456>.iam.gserviceaccount.com

This will create a local JSON file which contains the credentials for this user.
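
If you want to verify that the key was created, you can list the keys attached to the service account (again replacing ``<citc-123456>`` with your project ID):

.. code-block:: shell-session

$ gcloud iam service-accounts keys list --iam-account=citc-terraform@<citc-123456>.iam.gserviceaccount.com

Keep ``citc-terraform-credentials.json`` safe: anyone who has it can act with the roles granted above.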

The final step of setup needed is to create a key which Terraform will use to communicate with the server to upload some configuration.
For now this must be created with no passphrase:

.. code-block:: shell-session

$ ssh-keygen -t rsa -f ~/.ssh/citc-google -C provisioner -N ""
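
This creates the private key at ``~/.ssh/citc-google`` and the matching public key at ``~/.ssh/citc-google.pub``; you can confirm that both files exist with:

.. code-block:: shell-session

$ ls ~/.ssh/citc-google*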

Setting the config
------------------

To initialise the local Terraform repo, start by running the following:

.. code-block:: shell-session

$ terraform init google-cloud-platform

Now, when you check the Terraform version, you should see the Google provider showing up:

.. code-block:: shell-session

$ terraform version
Terraform v0.12.9
+ provider.external v1.2.0
+ provider.google v2.10.0
+ provider.tls v1.3.0
+ provider.template v2.1.0

Rename the example config file ``google-cloud-platform/terraform.tfvars.example`` to ``terraform.tfvars`` and open it in a text editor:

.. code-block:: shell-session

$ mv google-cloud-platform/terraform.tfvars.example terraform.tfvars
$ vim terraform.tfvars

There are a few variables which we need to change in here.
First you must set the ``region`` and ``zone`` variables to the correct values for your account.
This will depend on what regions you have access to and where you want to build your cluster.
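
If you are not sure which values are valid for your account, ``gcloud`` can list the available regions and zones for you:

.. code-block:: shell-session

$ gcloud compute regions list
$ gcloud compute zones list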

Then the ``project`` variable must be set to the project ID we used above when running ``gcloud``.

Finally, you can change the node type used for the management node.
By default it is a lightweight single-core VM which should be sufficient for most uses, but you can change it if you wish.

The rest of the variables should usually be left as they are.
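
As a rough sketch only (the values below are illustrative and the exact variable names should be taken from the example file itself), the edited file might end up containing something like::

region  = "europe-west4"
zone    = "europe-west4-a"
project = "citc-123456"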

Running Terraform
-----------------

At this point, we are ready to provision our infrastructure.
Check that there's no immediate errors with

.. code-block:: shell-session

$ terraform validate google-cloud-platform

It should return with no errors.
If there are any problems, fix them before continuing.

Next, check that Terraform is ready to run with

.. code-block:: shell-session

$ terraform plan google-cloud-platform

which should have, near the end, something like ``Plan: 11 to add, 0 to change, 0 to destroy.``.

We're now ready to go. Run

.. code-block:: shell-session

$ terraform apply google-cloud-platform

and, when prompted, tell it that "yes", you do want to apply.

It will take some time but should return without any errors with something green that looks like::

Apply complete! Resources: 11 added, 0 changed, 0 destroyed.

Outputs:

ManagementPublicIP = 130.61.43.69

You are now ready to move on to :doc:`finalising the setup on the cluster <finalise>`.
10 changes: 7 additions & 3 deletions source/index.rst
@@ -6,15 +6,19 @@ Create a cluster in the cloud
:caption: Contents:

infrastructure
oracle-infrastructure
google-infrastructure
finalise
running

Welcome to the documentation for cluster in the cloud.
By the end of this you will have a fully-operational, elastically-scaling, heterogeneous Slurm cluster running on cloud resources.

In the future, the intention is that this tutorial will cover installing on all major cloud providers but for now only Oracle Public Cloud is covered.
In the future, the intention is that this tutorial will cover installing on all major cloud providers
but for now Oracle Cloud Infrastructure and Google Cloud Platform are covered.

This tutorial was created by `Matt Williams <https://github.com/milliams/>`_ at the `ACRC in Bristol <http://www.bristol.ac.uk/acrc/>`_.
This tutorial and the Cluster in the Cloud software were created by `Matt Williams <https://github.com/milliams/>`_
at the `ACRC in Bristol <http://www.bristol.ac.uk/acrc/>`_.
Contributions to this tutorial document are welcome `at GitHub <https://github.com/ACRC/cluster-in-the-cloud>`_.

.. admonition:: If you need help
@@ -28,7 +32,7 @@ To complete this tutorial you will need:

* access to a command line (e.g. Linux, macOS Terminal or WSL)
* an `SSH key pair <https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/>`_
* an account with credit on Oracle cloud
* an account with credit on Oracle or Google cloud

* the account must have admin permissions to create infrastructure

100 changes: 10 additions & 90 deletions source/infrastructure.rst
@@ -1,8 +1,8 @@
Set up the cloud infrastructure
===============================

Getting ready
-------------
Getting Terraform
-----------------

The first step is to get a bunch of servers powered on in your cloud.
We do this using a tool called `Terraform <https://www.terraform.io/>`_.
@@ -19,102 +19,22 @@ you should get output like:

Terraform v0.12.9

We're now ready to start configuring our infrastructure.
Getting the Terraform config
----------------------------

Setting the config
------------------
Terraform is a tool for creating infrastructure but we need to provide it with some configuration.

Start by making a new directory which will hold all our configuration.
We will refer to this directory as the *base config directory*.
Change to that directory in your terminal.
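
For example (the directory name here is just an illustration; any empty directory will do):

.. code-block:: shell-session

$ mkdir citc
$ cd citc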

Grab the Terraform config from Git using:

.. code-block:: shell-session

$ git clone https://github.com/ACRC/oci-cluster-terraform.git

Now move into that directory and initialise the Terraform repo:

.. code-block:: shell-session

$ terraform init

Now, when you check the Terraform version, you should see the OCI provider showing up:

.. code-block:: shell-session

$ terraform version
Terraform v0.12.9
+ provider.oci v3.44.0
+ provider.template v2.1.0
+ provider.tls v2.1.0

Rename the example config file ``terraform.tfvars.example`` to ``terraform.tfvars`` and open it in a text editor:

.. code-block:: shell-session

$ mv terraform.tfvars.example terraform.tfvars
$ vim terraform.tfvars

Following the instructions at the `Oracle Terraform plugin docs <https://www.terraform.io/docs/providers/oci/index.html#authentication>`_,
set the values of ``tenancy_ocid``, ``user_ocid``, ``private_key_path``, ``fingerprint`` and ``region``.
Make sure that the user account you use for ``user_ocid`` has admin access in your tenancy to create infrastructure.

You will also need to set the compartment OCID of the compartment that you are using.
If you are using the default root compartment, this will be the same as your tenancy OCID.

The next thing to set is an SSH key that you will use to connect to the server once it is built.
See `GitHub's documentation <https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/>`_ for information on how to do this
and then paste the contents of the public key into the ``ssh_public_key`` config variable between the two ``EOF``\ s.

You will want a simple, lightweight VM for the management node, so
for this tutorial we will use ``VM.Standard2.1``.

Set the ``ManagementShape`` config variable to the shape you want for the management node::

ManagementShape = "VM.Standard2.1"

The second thing we need to do for the management node is decide which AD it should reside in.
Set the variable ``ManagementAD`` to whichever AD you'd like to use::

ManagementAD = "1"

Running Terraform
-----------------

At this point, we are ready to provision our infrastructure.
Check that there's no immediate errors with

.. code-block:: shell-session

$ terraform validate

It should return with no errors.
If there are any problems, fix them before continuing.

Next, check that Terraform is ready to run with

.. code-block:: shell-session

$ terraform plan

which should have, near the end, something like ``Plan: 9 to add, 0 to change, 0 to destroy.``.

We're now ready to go. Run

.. code-block:: shell-session

$ terraform apply

and, when prompted, tell it that "yes", you do want to apply.

It will take some time but should return without any errors with something green that looks like::

Apply complete! Resources: 9 added, 0 changed, 0 destroyed.

Outputs:
$ git clone https://github.com/ACRC/citc-terraform.git
$ cd citc-terraform

ManagementPublicIP = 130.61.43.69
We're now ready to start configuring our infrastructure on either:

You are now ready to move on to :doc:`finalising the setup on the cluster <finalise>`.
- :doc:`Oracle Cloud Infrastructure <oracle-infrastructure>` or
- :doc:`Google Cloud Platform <google-infrastructure>`.