Configuring Galaxy

Configuring Galaxy 19.01 or higher

  1. First install the GalaxyCloudRunner into your Galaxy virtual environment.
cd <galaxy_home>
source .venv/bin/activate
pip install galaxycloudrunner
  2. Edit your job_conf.xml in the <galaxy_home>/config folder and add the GalaxyCloudRunner-related sections shown below to it.

    You will need to replace the cloudlaunch_api_token value in the file with your own token. Instructions on how to obtain your CloudLaunch API token are given below.

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
    </plugins>
    <destinations default="galaxycloudrunner">
        <destination id="local" runner="local"/>
        <destination id="galaxycloudrunner" runner="dynamic">
            <param id="type">python</param>
            <param id="function">cloudlaunch_pulsar_burst</param>
            <param id="rules_module">galaxycloudrunner.rules</param>
            <param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
            <!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
            <param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
            <!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
            <param id="pulsar_runner_id">pulsar</param>
            <!-- Destination to fallback to if no nodes are available -->
            <param id="fallback_destination_id">local</param>
            <!-- Pick next available server and resubmit if an unknown error occurs -->
            <resubmit condition="unknown_error and attempt &lt;= 3" destination="galaxycloudrunner" />
        </destination>
    </destinations>
    <tools>
        <tool id="upload1" destination="local"/>
    </tools>
</job_conf>
  3. Launch as many Pulsar nodes as you need through CloudLaunch. The job rule will periodically query CloudLaunch, discover these new nodes, and route jobs to them. Instructions on how to launch new Pulsar nodes are below.
  4. Submit your jobs as usual.
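
Before restarting Galaxy, it can help to confirm that the edited job_conf.xml is internally consistent. The following is a minimal sketch, not part of GalaxyCloudRunner or Galaxy, that parses the file with Python's standard library and reports a few common mistakes: a missing or empty CloudLaunch param, a pulsar_runner_id that does not name a loaded plugin, or a fallback_destination_id that does not name a defined destination. The function name check_job_conf and the list of required params are our own assumptions, based on the Galaxy 19.01+ example above.

```python
import xml.etree.ElementTree as ET

# Params this sketch treats as mandatory for the Galaxy 19.01+ destination
# (an assumption based on the example job_conf.xml above).
REQUIRED_PARAMS = ("type", "function", "rules_module",
                   "cloudlaunch_api_endpoint", "cloudlaunch_api_token")


def check_job_conf(xml_text, destination_id="galaxycloudrunner"):
    """Return a list of human-readable problems found in a job_conf.xml string."""
    root = ET.fromstring(xml_text)
    plugin_ids = {p.get("id") for p in root.iter("plugin")}
    destinations = {d.get("id"): d for d in root.iter("destination")}

    dest = destinations.get(destination_id)
    if dest is None:
        return [f"no destination with id {destination_id!r}"]

    problems = []
    if dest.get("runner") != "dynamic":
        problems.append("destination runner should be 'dynamic'")

    # Collect <param id="...">value</param> children as a dict.
    params = {p.get("id"): (p.text or "").strip() for p in dest.findall("param")}
    for name in REQUIRED_PARAMS:
        if not params.get(name):
            problems.append(f"missing or empty param {name!r}")

    # pulsar_runner_id defaults to "pulsar" and must match a <plugin> id.
    runner_id = params.get("pulsar_runner_id", "pulsar")
    if runner_id not in plugin_ids:
        problems.append(f"pulsar_runner_id {runner_id!r} matches no plugin")

    # The fallback, if given, must be another defined destination.
    fallback = params.get("fallback_destination_id")
    if fallback and fallback not in destinations:
        problems.append(f"fallback destination {fallback!r} is not defined")
    return problems
```

To use it, read the contents of <galaxy_home>/config/job_conf.xml, pass them to check_job_conf, and print each returned problem; an empty list only means that none of these particular checks failed, not that the configuration is guaranteed to work.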

Configuring Galaxy versions lower than 19.01

  1. First install the GalaxyCloudRunner into your Galaxy virtual environment.
cd <galaxy_home>
source .venv/bin/activate
pip install galaxycloudrunner
  2. For Galaxy versions prior to 19.01, you will need to add a GalaxyCloudRunner job rule to your Galaxy configuration by placing the following file in your Galaxy job rules folder, <galaxy_home>/lib/galaxy/jobs/rules/.

    Create a file named galaxycloudrunner.py at that location and paste the following contents into it.

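# Compatibility rule for Galaxy < 19.01: Galaxy looks up job rule functions in
# lib/galaxy/jobs/rules/, so this thin wrapper forwards to GalaxyCloudRunner.
# On Python 2, make "galaxycloudrunner" resolve to the installed package
# rather than to this file, which has the same name.
from __future__ import absolute_import

from galaxycloudrunner.runners.cl_pulsar_burst import get_destination


def cloudlaunch_pulsar_burst(app, referrer,
                             cloudlaunch_api_endpoint=None,
                             cloudlaunch_api_token=None,
                             pulsar_runner_id="pulsar",
                             pulsar_file_action_config=None,
                             fallback_destination_id=None):
    # Galaxy supplies app and referrer; the remaining arguments are filled
    # from <param> entries with the same id in the job_conf.xml destination.
    return get_destination(app, referrer,
                           cloudlaunch_api_endpoint,
                           cloudlaunch_api_token,
                           pulsar_runner_id,
                           pulsar_file_action_config,
                           fallback_destination_id)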
  3. Edit your job_conf.xml in the <galaxy_home>/config folder and add the GalaxyCloudRunner-related sections shown below to it.

    You will need to replace the cloudlaunch_api_token value in the file with your own token. Instructions on how to obtain your CloudLaunch API token are given below. In Galaxy versions prior to 19.01, the <param id="rules_module">galaxycloudrunner.rules</param> line used in the destination for newer versions does not work, which is why step 2 is needed.

<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
    </plugins>
    <destinations default="galaxycloudrunner">
        <destination id="local" runner="local"/>
        <destination id="galaxycloudrunner" runner="dynamic">
            <param id="type">python</param>
            <param id="function">cloudlaunch_pulsar_burst</param>
            <param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
            <!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
            <param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
            <!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
            <param id="pulsar_runner_id">pulsar</param>
            <!-- Destination to fallback to if no nodes are available -->
            <param id="fallback_destination_id">local</param>
        </destination>
    </destinations>
    <tools>
        <tool id="upload1" destination="local"/>
    </tools>
</job_conf>
  4. Launch as many Pulsar nodes as you need through CloudLaunch. The job rule will periodically query CloudLaunch, discover these new nodes, and route jobs to them. Instructions on how to launch new Pulsar nodes are below.
  5. Submit your jobs as usual.
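
The names in galaxycloudrunner.py and job_conf.xml have to line up. As we understand Galaxy's dynamic destination handling, the function param must name a function defined in the rules folder, and every other param is passed to that function only when its id exactly matches one of the function's argument names (values such as app and referrer are supplied by Galaxy itself). A param whose id matches no argument is silently ignored. The sketch below imitates that matching with inspect so you can check your own param ids. It is a simplified illustration, not Galaxy's code: the stub only copies the signature of the rule above, and matched_kwargs, unmatched_params and NON_ARGUMENT_PARAMS are hypothetical names.

```python
import inspect


def cloudlaunch_pulsar_burst(app, referrer,
                             cloudlaunch_api_endpoint=None,
                             cloudlaunch_api_token=None,
                             pulsar_runner_id="pulsar",
                             pulsar_file_action_config=None,
                             fallback_destination_id=None):
    """Stub with the rule's signature; it only reports what it received."""
    return {"endpoint": cloudlaunch_api_endpoint,
            "runner": pulsar_runner_id,
            "fallback": fallback_destination_id}


# Param ids that configure Galaxy itself rather than the rule function.
NON_ARGUMENT_PARAMS = {"type", "function", "rules_module"}


def matched_kwargs(func, destination_params, builtins):
    """Build keyword arguments the way this sketch assumes Galaxy does:
    a destination param or built-in value is passed only if func has an
    argument with exactly that name."""
    arg_names = set(inspect.signature(func).parameters)
    kwargs = {k: v for k, v in destination_params.items() if k in arg_names}
    kwargs.update({k: v for k, v in builtins.items() if k in arg_names})
    return kwargs


def unmatched_params(func, destination_params):
    """Param ids that no argument of func would receive."""
    arg_names = set(inspect.signature(func).parameters)
    return sorted(k for k in destination_params
                  if k not in arg_names and k not in NON_ARGUMENT_PARAMS)
```

For example, a param id such as pulsar_fallback_destination_id matches no argument of the rule above, so unmatched_params reports it and, under this model, the rule never receives a fallback destination.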

Reducing data transfers

If you would like to control how Pulsar transfers data, an additional param can be specified in the GalaxyCloudRunner destination in job_conf.xml. This is particularly useful for Galaxy’s reference data, because the remote Pulsar nodes are configured to mount the Galaxy Project’s public CVMFS repository, which contains pre-formatted reference data for a number of tools. Pointing jobs at that repository speeds up job execution and reduces data transfers from your Galaxy instance, because the relevant files no longer need to be transferred to the remote node with each job.

Note that this configuration is necessary only if your file system paths differ from those on the remote Pulsar nodes. For reference data specifically, Pulsar nodes mount the Galaxy Project’s CVMFS repository, which is available under the /cvmfs/data.galaxyproject.org/ directory. The layout of that directory can be inspected here: https://gist.github.com/afgane/b527eb857244f43a680c9654b30deb1f

To enable this feature, add the following param to the existing GalaxyCloudRunner job destination in job_conf.xml:

<!-- Path for the Pulsar destination config file for path rewrites. -->
<param id="pulsar_file_action_config">config/pulsar_actions.yml</param>

In addition, you need to define transfer actions that specify how paths are translated between the two systems. These go in the dedicated file referenced by the param above (config/pulsar_actions.yml in this example). A basic example of the file is shown below; complete details about the available transfer action options are in the Pulsar documentation.

paths:
  - path: /galaxy/server/tool-data/sacCer2/bwa_mem_index/sacCer2/
    path_types: unstructured
    action: rewrite
    source_directory: /galaxy/server/tool-data/sacCer2/bwa_mem_index/sacCer2/
    destination_directory: /cvmfs/data.galaxyproject.org/managed/bwa_mem_index/sacCer2/
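
Conceptually, a rewrite action replaces the source_directory prefix of a matching local path with destination_directory, so the job on the Pulsar node reads the file from CVMFS instead of having it transferred. The sketch below illustrates only that prefix substitution; it is not Pulsar's implementation, and the function name rewrite_path and the example file name sacCer2.fa.bwt are hypothetical.

```python
import posixpath


def rewrite_path(path, source_directory, destination_directory):
    """Replace the source_directory prefix of path with destination_directory.

    Returns None if path does not lie under source_directory, i.e. this
    rewrite rule would not apply to it.
    """
    src = posixpath.normpath(source_directory)
    dst = posixpath.normpath(destination_directory)
    norm = posixpath.normpath(path)
    if norm == src:
        return dst
    # Require a directory boundary so /a/b does not match /a/bc.
    prefix = src if src.endswith("/") else src + "/"
    if not norm.startswith(prefix):
        return None
    return posixpath.join(dst, posixpath.relpath(norm, src))


src = "/galaxy/server/tool-data/sacCer2/bwa_mem_index/sacCer2/"
dst = "/cvmfs/data.galaxyproject.org/managed/bwa_mem_index/sacCer2/"
print(rewrite_path(src + "sacCer2.fa.bwt", src, dst))
```

Because the substitution in this sketch only applies when source_directory is a prefix of the local path, source_directory should name the same local directory that path matches; otherwise nothing is rewritten.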