Configuring Galaxy
Configuring Galaxy 19.01 or higher
- First install the GalaxyCloudRunner into your Galaxy virtual environment.
cd <galaxy_home>
source .venv/bin/activate
pip install galaxycloudrunner
- Edit your job_conf.xml in the <galaxy_home>/config folder and add the relevant sections from the example below.
You will need to add your own value for the cloudlaunch_api_token param to the file. Instructions on how to obtain your CloudLaunch API token are given below.
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
    </plugins>
    <destinations default="galaxycloudrunner">
        <destination id="local" runner="local"/>
        <destination id="galaxycloudrunner" runner="dynamic">
            <param id="type">python</param>
            <param id="function">cloudlaunch_pulsar_burst</param>
            <param id="rules_module">galaxycloudrunner.rules</param>
            <param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
            <!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
            <param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
            <!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
            <param id="pulsar_runner_id">pulsar</param>
            <!-- Destination to fallback to if no nodes are available -->
            <param id="fallback_destination_id">local</param>
            <!-- Pick next available server and resubmit if an unknown error occurs -->
            <resubmit condition="unknown_error and attempt &lt;= 3" destination="galaxycloudrunner" />
        </destination>
    </destinations>
    <tools>
        <tool id="upload1" destination="local"/>
    </tools>
</job_conf>
- Launch as many Pulsar nodes as you need through CloudLaunch. The job rule will periodically query CloudLaunch, discover these new nodes, and route jobs to them. Instructions on how to launch new Pulsar nodes are below.
- Submit your jobs as usual.
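Before restarting Galaxy, it can help to sanity-check the edited job_conf.xml. The following is a minimal sketch and not part of GalaxyCloudRunner: the helper name check_job_conf and the set of required params are assumptions taken from the example configuration above. It verifies that the galaxycloudrunner destination exists, that the expected params are present, and that the runner and fallback ids point at entries defined in the same file.

```python
import xml.etree.ElementTree as ET

# Params the example destination above relies on (an assumption drawn
# from this page, not an official schema).
REQUIRED_PARAMS = frozenset({
    "type",
    "function",
    "rules_module",
    "cloudlaunch_api_endpoint",
    "cloudlaunch_api_token",
})


def check_job_conf(xml_text, destination_id="galaxycloudrunner",
                   required=REQUIRED_PARAMS):
    """Return a list of problems found in a job_conf.xml string."""
    root = ET.fromstring(xml_text)
    plugin_ids = {p.get("id") for p in root.findall("./plugins/plugin")}
    dest_ids = {d.get("id")
                for d in root.findall("./destinations/destination")}

    dest = root.find(
        "./destinations/destination[@id='%s']" % destination_id)
    if dest is None:
        return ["destination %r not found" % destination_id]

    params = {p.get("id"): (p.text or "").strip()
              for p in dest.findall("param")}
    problems = ["missing param %r" % name
                for name in sorted(set(required) - set(params))]

    # pulsar_runner_id defaults to "pulsar" when it is not set.
    runner = params.get("pulsar_runner_id", "pulsar")
    if runner not in plugin_ids:
        problems.append("no plugin with id %r" % runner)

    fallback = params.get("fallback_destination_id")
    if fallback and fallback not in dest_ids:
        problems.append("no destination with id %r" % fallback)
    return problems
```

An empty list means none of these checks failed; it does not confirm that the API token is valid or that any Pulsar nodes are reachable.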
Configuring Galaxy versions lower than 19.01
- First install the GalaxyCloudRunner into your Galaxy virtual environment.
cd <galaxy_home>
source .venv/bin/activate
pip install galaxycloudrunner
- For Galaxy versions prior to 19.01, you will need to add a GalaxyCloudRunner job rule to your Galaxy configuration.
Create a file named galaxycloudrunner.py in your Galaxy job rules folder, <galaxy_home>/lib/galaxy/jobs/rules/, and paste the following contents into it.
from galaxycloudrunner.runners.cl_pulsar_burst import get_destination


# The function name must match the "function" param of the dynamic
# destination in job_conf.xml.
def cloudlaunch_pulsar_burst_compat(app, referrer,
                                    cloudlaunch_api_endpoint=None,
                                    cloudlaunch_api_token=None,
                                    pulsar_runner_id="pulsar",
                                    pulsar_file_action_config=None,
                                    fallback_destination_id=None):
    # Delegate to GalaxyCloudRunner, which picks a Pulsar node or the fallback.
    return get_destination(app, referrer,
                           cloudlaunch_api_endpoint,
                           cloudlaunch_api_token,
                           pulsar_runner_id,
                           pulsar_file_action_config,
                           fallback_destination_id)
- Edit your job_conf.xml in the <galaxy_home>/config folder and add the relevant sections from the example below.
You will need to add your own cloudlaunch_api_token to the file. Instructions on how to obtain your CloudLaunch API token are given below. If you have a Galaxy version prior to 19.01, the line <param id="rules_module">galaxycloudrunner.rules</param> passed to your destination will not work, which is why the job rule file from the previous step is needed.
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/>
    </plugins>
    <destinations default="galaxycloudrunner">
        <destination id="local" runner="local"/>
        <destination id="galaxycloudrunner" runner="dynamic">
            <param id="type">python</param>
            <param id="function">cloudlaunch_pulsar_burst_compat</param>
            <param id="cloudlaunch_api_endpoint">https://launch.usegalaxy.org/cloudlaunch/api/v1</param>
            <!-- Obtain your CloudLaunch token by visiting: https://launch.usegalaxy.org/profile -->
            <param id="cloudlaunch_api_token">37c46c89bcbea797bc7cd76fee10932d2c6a2389</param>
            <!-- id of the PulsarRESTJobRunner plugin. Defaults to "pulsar" -->
            <param id="pulsar_runner_id">pulsar</param>
            <!-- Destination to fallback to if no nodes are available -->
            <param id="fallback_destination_id">local</param>
        </destination>
    </destinations>
    <tools>
        <tool id="upload1" destination="local"/>
    </tools>
</job_conf>
- Launch as many Pulsar nodes as you need through CloudLaunch. The job rule will periodically query CloudLaunch, discover these new nodes, and route jobs to them. Instructions on how to launch new Pulsar nodes are below.
- Submit your jobs as usual.
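With the pre-19.01 setup, the function named in the function param of the dynamic destination has to exist in the rule file you created, so a mismatch between the two is an easy mistake to make. The following is a rough sketch for catching it; the helper names are hypothetical and it only inspects the two files as text, without importing Galaxy.

```python
import ast
import xml.etree.ElementTree as ET


def module_level_names(py_source):
    """Names of functions defined in, or imported into, a rule module."""
    names = set()
    for node in ast.parse(py_source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names.add(node.name)
        elif isinstance(node, ast.ImportFrom):
            # "from x import f as g" makes g available in the module.
            names.update(alias.asname or alias.name for alias in node.names)
    return names


def configured_functions(job_conf_xml):
    """Function names referenced by dynamic destinations in job_conf.xml."""
    root = ET.fromstring(job_conf_xml)
    names = set()
    for dest in root.iter("destination"):
        if dest.get("runner") != "dynamic":
            continue
        for param in dest.findall("param"):
            if param.get("id") == "function" and param.text:
                names.add(param.text.strip())
    return names


def missing_rule_functions(py_source, job_conf_xml):
    """Configured function names that the rule module does not provide."""
    return sorted(configured_functions(job_conf_xml)
                  - module_level_names(py_source))
```

Passing the contents of galaxycloudrunner.py and job_conf.xml to missing_rule_functions should return an empty list when the names line up.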
Reducing data transfers
If you would like to control the data transfer configuration for Pulsar, an
additional option can be specified in the job_conf destination for the
GalaxyCloudRunner rule. This is particularly useful for Galaxy’s reference data
because the remote Pulsar nodes have been configured to mount the Galaxy public
file system repository with pre-formatted reference data for a number of tools.
In turn, this speeds up job execution and reduces data transfers from your
Galaxy instance because the relevant files do not need to be transferred to the
remote node with each job.
Note that this configuration is necessary only if your file system paths differ
from those on the remote Pulsar nodes. Specifically for the reference data,
Pulsar nodes mount the Galaxy Project’s CVMFS repository, which is available
under the /cvmfs/data.galaxyproject.org/ directory. The layout of that
directory can be inspected here:
https://gist.github.com/afgane/b527eb857244f43a680c9654b30deb1f
To enable this feature for the GalaxyCloudRunner, add the following param to
the existing job destination in job_conf.xml:
<!-- Path for the Pulsar destination config file for path rewrites. -->
<param id="pulsar_file_action_config">config/pulsar_actions.yml</param>
In addition, transfer actions need to be defined that specify how paths should
be translated between the systems. This is done in a dedicated file pointed to
by the above param tag (config/pulsar_actions.yml in the example above). A
basic example of the file is shown below; complete details about the available
transfer action options are given in the Pulsar documentation.
paths:
  - path: /galaxy/server/tool-data/sacCer2/bwa_mem_index/sacCer2/
    path_types: unstructured
    action: rewrite
    source_directory: /galaxy/server/tool-data/sacCer2/bwa_mem_index/sacCer2/
    destination_directory: /cvmfs/data.galaxyproject.org/managed/bwa_mem_index/sacCer2/
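To illustrate what a rewrite action is meant to achieve, the sketch below shows the general idea of swapping a local directory prefix for the one used on the Pulsar node. It is a simplified illustration and not Pulsar's implementation: the function name rewrite_path, the local directory layout and the file name sacCer2.fa are all assumptions made for the example.

```python
import posixpath


def rewrite_path(path, source_directory, destination_directory):
    """Swap the source_directory prefix of path for destination_directory.

    Returns None when path does not live under source_directory, in which
    case the rewrite would not apply to it.
    """
    source = source_directory.rstrip("/") + "/"
    if not path.startswith(source):
        return None
    relative = path[len(source):]
    return posixpath.join(destination_directory, relative)


# Illustrative values only; the file name is hypothetical.
local_index = ("/galaxy/server/tool-data/sacCer2/"
               "bwa_mem_index/sacCer2/sacCer2.fa")
remote_index = rewrite_path(
    local_index,
    "/galaxy/server/tool-data/sacCer2/bwa_mem_index/sacCer2/",
    "/cvmfs/data.galaxyproject.org/managed/bwa_mem_index/sacCer2/",
)
print(remote_index)
```

Because the job refers to the file by its path on the remote node, the index itself never has to be copied from the Galaxy server.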