Configure Shuffle

2 minutes to read

Documentation for configuring Shuffle. Most information is related to onprem and hybrid versions of Shuffle.


With Shuffle being Open Sourced, there is a need for a place to read about configuration. There are quite a few options, and this article aims to delve into those.

Shuffle is based on Docker and is started using docker-compose with configuration items in a .env file. .env has the configuration items to be used for default environment changes, database locations, port forwarding, github locations and more.

Installing Shuffle

Check out the installation guide for full details. System requirements may be found further down in the Servers section.

If you're on Linux, run the following:

git clone
cd Shuffle
docker-compose up -d

Updating Shuffle

From version v1.1 onwards, we will use* registry instead of*

As long as you use Docker, updating Shuffle is straightforward. To stay as secure and up to date as possible, update as often as you like. To use a specific version of Shuffle, see Specific Versioning below. We recommend always sticking to the "latest" tag, and if you want experimental changes, use the "nightly" tag.

From the main repository directory, here is how to update Shuffle:

docker-compose down
git pull
docker pull frikky/shuffle:app_sdk    # Force update the App SDK
docker-compose pull
docker-compose up -d

PS: This will NOT update your apps, meaning they may be outdated. To update your apps, go to /apps and click both buttons in the top right corner (reload apps locally & Download from Github)

Specific Versioning

To use a specific version of Shuffle, you'll need to manually edit the docker-compose.yml file to reflect the version, usually for the frontend and backend, but sometimes also for the other containers. You can see all our released versions here. We recommend keeping the frontend and backend on the same version rather than mixing versions, as seen in the image below.


Production readiness

Shuffle is by default configured to be easy to start using. This means we've had to make some tradeoffs which can be enabled/disabled to make it easier to use the first time. This part outlines a lot of what's necessary to make Shuffle's security, availability and scalability better.


When setting up Shuffle for production, we always recommend using a minimum of two servers (VMs). This is because you don't want your executions to clog the webserver, which again clogs the executions (orborus). You can put Orborus on multiple servers with different environments to ensure better availability, or talk to us about Kubernetes/Swarm. These are MINIMUM requirements, and we recommend adding more.

Basic network overview:

- Shuffle backend starts a backend listener on port 5001 (default)
- Orborus POLLS for Jobs. Orborus needs access to port 5001 on backend (default)
- Orborus creates a worker for each job.
- The Worker runs the workflow and sends the payload back to the backend on port 5001 (default)

Orborus: Runs all workflows and may be CPU heavy, as well as memory heavy when running at scale with gigabytes of data flowing through. If you do a lot of file transfers, deal with large API payloads, or do memory analysis, make sure to add RAM accordingly. No persistent storage is necessary.

  • Services: Orborus, Worker, Apps
  • CPU: 4 vCPU
  • RAM: 4 GB
  • Disk: 10 GB (SSD)

Webserver: The webserver is where your users and our API are. It is RAM heavy, as we do A LOT of caching to ensure scalability.

  • Services: Frontend, Backend, Database
  • CPU: 2 vCPU
  • RAM: 8 GB
  • Disk: >100 GB (SSD)

Docker configuration

These are the Docker configurations for the two different servers described above. To use them, put the information in files called docker-compose.yml on each respective server, to start the containers.

PS: The data below is based on this docker-compose file

Orborus: Below is the Orborus configuration. Make sure to change "BASE_URL" in the environment to match the external Shuffle backend URL. The configuration can be modified to reduce or increase load, to add proxies, and much more. See environment variables for all options.

PS: Replace SHUFFLE-BACKEND with the IP of the Shuffle backend in the specification below. Using a hostname MAY cause issues in certain environments.

PPS: By default, the environments (executions) are NOT authenticated.

version: '3'
services:
  orborus:
    #build: ./functions/onprem/orborus
    container_name: shuffle-orborus
    hostname: shuffle-orborus
    networks:
      - shuffle
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - BASE_URL=http://SHUFFLE-BACKEND:5001
      - ORG_ID=Shuffle
      - ENVIRONMENT_NAME=Shuffle
      - CLEANUP=true
    restart: unless-stopped
networks:
  shuffle:
    driver: bridge

Webserver: The webserver should run the Frontend, Backend and Database. Make sure THIS .env file exists in the same folder. Further, make sure that Opensearch has the right access:

sudo sysctl -w vm.max_map_count=262144             # Required for Opensearch
sudo chown 1000:1000 -R shuffle-database         # Required for Opensearch


version: '3'
services:
  frontend:
    container_name: shuffle-frontend
    hostname: shuffle-frontend
    ports:
      - "${FRONTEND_PORT}:80"
      - "${FRONTEND_PORT_HTTPS}:443"
    networks:
      - shuffle
    restart: unless-stopped
    depends_on:
      - backend
  backend:
    container_name: shuffle-backend
    hostname: ${BACKEND_HOSTNAME}
    # Here for debugging:
    ports:
      - "${BACKEND_PORT}:5001"
    networks:
      - shuffle
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ${SHUFFLE_APP_HOTLOAD_LOCATION}:/shuffle-apps
      - ${SHUFFLE_FILE_LOCATION}:/shuffle-files
      #- ${SHUFFLE_OPENSEARCH_CERTIFICATE_FILE}:/shuffle-files/es_certificate
    env_file: .env
    environment:
      - SHUFFLE_APP_HOTLOAD_FOLDER=/shuffle-apps
      - SHUFFLE_FILE_LOCATION=/shuffle-files
    restart: unless-stopped
    depends_on:
      - opensearch
  opensearch:
    image: opensearchproject/opensearch:2.5.0
    hostname: shuffle-opensearch
    container_name: shuffle-opensearch
    environment:
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms4096m -Xmx4096m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
      - cluster.routing.allocation.disk.threshold_enabled=false
      - discovery.seed_hosts=shuffle-opensearch
      - cluster.initial_master_nodes=shuffle-opensearch
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - ${DB_LOCATION}:/usr/share/opensearch/data:rw
    networks:
      - shuffle
    restart: unless-stopped
networks:
  shuffle:
    driver: bridge
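
The heap comment in the Opensearch service above recommends setting both the minimum and maximum heap to 50% of system RAM. The value can be derived on the host with a small shell helper. This is a minimal sketch: the function names opensearch_heap_opts and system_ram_mb are hypothetical, and system_ram_mb assumes a Linux host where /proc/meminfo is available.

```shell
# Hypothetical helper: print OPENSEARCH_JAVA_OPTS for a given amount of RAM in MB.
# Half of the RAM is used for both the minimum (-Xms) and maximum (-Xmx) heap.
opensearch_heap_opts() {
    total_mb=$1
    heap_mb=$((total_mb / 2))
    echo "OPENSEARCH_JAVA_OPTS=-Xms${heap_mb}m -Xmx${heap_mb}m"
}

# Hypothetical helper: read total system RAM in MB from /proc/meminfo (Linux only).
system_ram_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Example for the 8 GB (8192 MB) webserver described earlier
opensearch_heap_opts 8192
```

On the server itself, opensearch_heap_opts "$(system_ram_mb)" would use the detected RAM instead of a fixed number.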

Hybrid Cloud Configuration

Scaling Shuffle with Swarm

Orborus can run in Docker-swarm mode, and in early 2023, with Kubernetes. This makes the workflow executions A LOT faster, use less resources, and makes it more scalable across multiple servers. This is a paid service, and requires the Enterprise / Scale license. There are ways to achieve the same by using multiple environments without swarm.

To begin with, let's sort out the prerequisites:

Our setup would end up looking like this:

Setting up Docker, Docker Compose, and a Docker Swarm network with two manager nodes involves several steps. Below is a step-by-step guide to achieve this:

Step 1: Install Docker

Install Docker on both machines by following the official Docker installation guide for your operating system. Docker Installation Guide:

Step 2: Install Docker Compose

Install Docker Compose on both machines by following the official Docker Compose installation guide. Docker Compose Installation Guide:

Now, let's begin setting up the Docker swarm network on Machine A:

You will be provided with a url to download the Worker image from Shuffle. Orborus does not need changing.

  1. Download the new worker you were provided (bear in mind that URL is a placeholder):
    wget URL # URL is the url provided by Shuffle
    docker load -i
  2. Set Orborus to latest in your docker-compose.yml file
  3. Add and change environment variables for Orborus in the docker-compose.yml file. BASE_URL is the external URL of the server you're running Shuffle on (the one you visit Shuffle with in your browser):
    BASE_URL=http://YOUR-BACKEND-URL:5001 # YOUR-BACKEND-URL can be replaced by your public IP (considering your ports are open)

To make swarm work, make sure that these ports are open on both machines (at least internally, between the two machines): 2377, 7946 and 4789

We recommend opening these ports ONLY internally to keep the setup secure.

  4. When all is done, take down the stack and bring it back up AFTER initializing swarm:
docker swarm init
docker-compose down
docker-compose up -d
docker swarm join-token manager # copy the command given

PS: In certain scenarios you may need extra configuration, e.g. for network MTUs, Docker download locations, proxies etc. See more in the production readiness section.
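
The swarm ports mentioned above can be restricted to an internal subnet with a host firewall. The sketch below only prints the rules so they can be reviewed before running them. It assumes ufw as the firewall, which is an assumption and not something Shuffle requires; the function name and the example subnet are hypothetical. The protocol per port follows Docker's swarm documentation: 2377 over TCP, 7946 over TCP and UDP, and 4789 over UDP.

```shell
# Hypothetical helper: print ufw rules that allow the Docker swarm ports
# only from the given internal subnet. Nothing is applied, only printed.
swarm_ufw_rules() {
    subnet=$1
    for rule in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
        port=${rule%/*}     # part before the slash
        proto=${rule#*/}    # part after the slash
        echo "ufw allow from $subnet to any port $port proto $proto"
    done
}

# Example with a placeholder subnet shared by Machine A and Machine B
swarm_ufw_rules 10.0.0.0/24
```

Review the printed lines and run them with sudo on both machines if they match your network.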

Add the other machine (Machine B) on docker swarm:

Again, make sure Docker works on this machine. Then paste the command printed by the last step above (docker swarm join-token manager). It adds this machine to the Docker swarm as a manager, which is required to orchestrate the app containers.

It should look something like this:

docker swarm join --token SWMTKN-1-{token} {internal IP}:2377

Verify swarm

Run the following command to get logs from Orborus:

docker logs -f shuffle-orborus

And to check if services have started:

docker service ls

If the list is empty, or you see any of the "replicas" have 0/1, then something is wrong. In case of any swarm issues, contact us at or contact your account representative.
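
The replica check described above can be scripted. This is a sketch, not an official Shuffle tool: the function name unhealthy_services is hypothetical, and it expects one "NAME CURRENT/DESIRED" pair per line, which is the shape printed by docker service ls with the --format template shown in the usage comment.

```shell
# Hypothetical helper: read "NAME CURRENT/DESIRED" lines on stdin and print
# the services that run fewer replicas than desired (for example 0/1).
unhealthy_services() {
    awk '{
        split($2, r, "/")
        if (r[1] + 0 < r[2] + 0) print $1 " " $2
    }'
}

# Usage on a swarm manager (requires Docker):
#   docker service ls --format "{{.Name}} {{.Replicas}}" | unhealthy_services
```

No output means every listed service has all its replicas. An empty service list also produces no output, so that case still has to be checked by looking at docker service ls directly.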

If you get EOFs or timeouts for workers in machine B, look here.

Environment Variables

Shuffle has a few toggles that make it noticeably faster, but which remove a lot of the checks that are done during your first tries of Shuffle.


# Set the encryption key to ensure all app authentication is being encrypted. If this is NOT defined, we do not encrypt your apps. If this is defined, all authentications - both old and new will start using this key. 
# Do NOT lose this key if specified, as that means you will need to reset all keys.


# PS: Encryption is available from Shuffle backend version >=0.9.17.
# PPS: There's a known bug with proxies and git

# Set up distributed memcaching. See "Distributed Caching" for more.


# Cleans up all containers after they're done. Necessary to help Docker scale. Default=false

# Cleans up any containers related to Shuffle that have been up for more than 600 seconds.

# Decides the max amount of workflows to concurrently run. Defaults to 10.
# Example math: 10 workflows (1 worker each) WITH 10 apps each / second = 10 + 100 = 110 containers per second.
# We recommend starting with 10 and going higher as need be.

# Configures a HTTP proxy to use when talking to the Shuffle Backend
# Configures a HTTPS proxy when speaking to the Shuffle Backend

# Decides if the Worker should use the same proxy as Orborus (HTTP_PROXY). Default=true

# Decides if the Apps should use the same proxy as Orborus (HTTP_PROXY). Default=false

### PAID: The environment variables below only work when you've acquired a paid license of Shuffle (not required, but VERY useful when scaling Shuffle):

# Set up distributed caching for Orborus & Worker(s). See "Distributed Caching" for more.
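
The example math for concurrent workflows above can be reproduced for other values before raising the limit. The sketch assumes, as the numbers in that comment suggest, one Worker container per workflow plus one container per app in the workflow. The function name is hypothetical.

```shell
# Hypothetical helper: estimate containers started per second for a given
# number of concurrent workflows and apps per workflow.
# Each workflow needs 1 worker container plus 1 container per app.
containers_per_second() {
    workflows=$1
    apps_per_workflow=$2
    echo $((workflows + workflows * apps_per_workflow))
}

containers_per_second 10 10   # prints 110
```

Doubling the concurrency to 20 with the same workflows gives 220 containers per second, which is why starting at 10 and increasing gradually is the safer route.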

Distributed Caching

Once you have a Scalable version of Shuffle, using Docker swarm, it becomes important for data to flow correctly throughout the platform. In version 1.1 of Shuffle, we introduce distributed caching in the form of Memcached. Memcached helps reduce the load on the database, as well as to ensure all executions are handled adequately. These services are supported:

  • Backend
  • Orborus
  • Worker

To make use of Memcached, you have to start a memcached service on a host Shuffle can access, before configuring each service to use it with a single environment variable. The default port is 11211. Here is a quickstart that reserves 1024 MB of memory:

docker run --name shuffle-cache -p 11211:11211 -d memcached -m 1024

PS: This requires swap limit capabilities on the Docker host. More about running it in Docker here

Once this is up, it will be listening on port 11211. From here, you may set up the SHUFFLE_MEMCACHED environment variable on the previously mentioned services. We recommend starting with the backend. Here's an example that fits into your docker-compose file:


If you need help with this, please contact us.


You may configure multiple instances with a load balancer and docker-swarm/kubernetes. An official guide for high availability is still in the making. Please contact us if this is a need.

A good place to start is this blogpost by one of our contributors:

Proxy configuration

Proxies are a requirement for many enterprises, so supporting them is important. There are two places where proxies can be implemented:

  • Shuffle Backend: Connects to Github and Dockerhub.
  • Shuffle Orborus: Connects to Dockerhub and Shuffle Backend.

PS: Orborus settings are also set for the Worker

To configure these, there are two options:

  • Individual containers
  • Globally for Docker

Global Docker proxy configuration

Follow this guide from Docker:

Individual container proxy

To set up proxies in individual containers, open docker-compose.yml and add the following lines with your proxy settings ( in my case).

PS: Make sure to use uppercase letters, and not lowercase (HTTP_PROXY, NOT http_proxy)

Proxy containers

Orborus running on a different network

All you'll need to do is allow Orborus to access the backend port, and your setup will work fine.


HTTPS is enabled by default on port 3443 with a self-signed certificate for localhost. If you would like to change this, the only way (currently) is to configure and rebuild the frontend. If you don't have HTTPS enabled, check updating shuffle to get the latest configuration. Another workaround is to set up an Nginx reverse proxy you can control yourself. See further down for more details.

After setting this up, make sure to change the BASE_URL for Orborus to talk to your new HTTPS url if you want encrypted traffic everywhere.
Default Routing: Orborus -> Backend:5001.
New Routing: Orborus -> Nginx -> Frontend -> Backend.

The New Routing steps are automatic as long as you update the BASE_URL to point to your new reverse proxy URL.

Necessary info for the truststore to create TLS/SSL certificates:

  • Certificates are located in ./frontend/certs.
  • ./frontend/ contains information on generating a self-signed cert
  • (default): Privatekey is named privkey.pem
  • (default): Fullchain is named fullchain.pem

If you want to change this, edit ./frontend/Dockerfile and ./frontend/nginx.conf.

After changing certificates, you can rebuild the entire frontend by running (./frontend)


Using the Nginx Reverse Proxy for TLS/SSL

If you intend to use Nginx as a Reverse Proxy, the main steps are below. Here is a basic single-server architecture for it. The Docker version is further down.

  1. Install Nginx on your server (find the correct distro), or in a Docker container by itself.
  2. Make sure you have a VALID certificate that matches your domain/hostname and add this to your Nginx server
  3. In the nginx.conf file (/etc/nginx/conf.d/default.conf or similar), under "server", add the information below. Make sure to change the "proxy_pass" part. This is how it will redirect all /api requests.
location / {
    proxy_pass SHUFFLE FRONTENDIP;
    proxy_buffering off;
    proxy_http_version 1.1;

    proxy_connect_timeout 900;
    proxy_send_timeout 900;
    proxy_read_timeout 900;
    send_timeout 900;
    proxy_ssl_verify off;
}
  4. Restart Nginx: systemctl restart nginx

Nginx in Docker

  1. Add the following service to your docker-compose.yml
    image: nginx:latest
    container_name: shuffle-nginx-proxy
    networks:
      - shuffle
    ports:
      - "80:80"
    volumes:
      - ./nginx-conf:/etc/nginx/conf.d
      - ./certs:/etc/nginx/certs
    restart: always
  2. Create a folder called nginx-conf and add an Nginx configuration file to it (any name ending in .conf) with the following (you may add additional Nginx configuration to this):
server {
    listen 443 ssl;

    ssl_certificate /etc/nginx/certs/cert.crt;
    ssl_certificate_key /etc/nginx/certs/cert.key;

    location / {
        proxy_pass http://shuffle-frontend:80;

        proxy_buffering off;
        proxy_http_version 1.1;

        proxy_connect_timeout 900;
        proxy_send_timeout 900;
        proxy_read_timeout 900;
        send_timeout 900;
        proxy_ssl_verify off;
    }
}
  3. Add a folder called "certs" with your certificates named cert.crt and cert.key.
  4. Restart everything: docker-compose down; docker-compose up -d

Internal Certificate Authority

By default, certificates are not being verified when outbound traffic goes from Shuffle. This is due to the massive use of self-signed certificates when using internal services. If you want to accept your Certificate Authority for all requests, there are a few ways to do this:

  1. Docker Daemon level (recommended) - point to your cert: $ dockerd --tlscacert=/path/to/custom-ca-cert.pem
  2. Add it to every app (per-image configuration). You can do this by modifying the Dockerfile for an app and manually building it with the certificate in the Dockerfile of each Docker image. Restart Shuffle after this is done.

As this may require advanced Docker understanding, reach out to ask us about it:


Shuffle supports IPv6 in Docker by default, but your docker engine may not. IPv6 can be enabled in Docker by adding it to the /etc/docker/daemon.json file on the host as per this article by Docker:


Using Shuffle with Kubernetes is now possible thanks to help from our contributors. This has not been extensively tested, so please reach out to @frikkylikeme if you're having execution issues.

Configuring Kubernetes

To configure Kubernetes, you need to specify a single environment variable for Orborus: RUNNING_MODE. By setting the environment variable RUNNING_MODE=kubernetes, execution should work as expected!

Network configuration

In most enterprise environments, Shuffle will be behind multiple firewalls, proxies and other networking equipment. If this is the case, below are the requirements to make Shuffle work anywhere. The most common issue has to do with downloads from Alpine linux during runtime.

PS: If external connections are blocked, you may further have issues running Apps. Read more about manual image transfers here.

Domain Whitelisting

These URLs are used to get Shuffle up and running. Whitelisting them for the Shuffle services should make all processes work seamlessly.

PS: We do intend to make this JUST in the future.

# Can be closed after install with working Workflows
# Initial setup & future app/workflow sync
# Downloading apps, workflows and documentation
# Downloads from Github Container registry (
# Downloads our Documentation raw from github (

# Should stay open
# Used for building apps in realtime
# Downloads apps if they don't exist locally
# Github Docker registry
# Dockerhub authentication
# Dockerhub registry (for apps)
# Protects of DockerHub

Incoming Domain Whitelisting

When using Shuffle in the cloud (*, the incoming IP to your services by default will be from our cloud functions. The range is not static, and may vary based on region. Here's a list (mostly IPv6 as of 2023):

Default (London): 2600:1900:2000:2a:400::0 -> 2600:1900:2000:2a:400::ffff
European Union (eu): TBA
United States (us): TBA
Canada (ca): TBA
India (in): TBA


Proxy settings

The main proxy issues may arise with the "Backend", along with the "Orborus" container, which runs workflows. This has to do with how this server can contact the backend (Orborus), along with how apps can be downloaded (Worker), down to how apps engage with external systems (Apps).

Environment variables to be sent to the Orborus container:

# Configures a HTTP proxy to use when talking to the Shuffle Backend
# Configures a HTTPS proxy when speaking to the Shuffle Backend

# Decides if the Worker should use the same proxy as Orborus (HTTP_PROXY). Default=true

# Decides if the Apps should use the same proxy as Orborus (HTTP_PROXY). Default=false

Environment variables for the Backend container:

# A proxy to be used if Opensearch / Elasticsearch (database) is behind a proxy.

# Configures a HTTP proxy for external downloads
# Configures a HTTPS proxy for external downloads

Manual Docker image transfers

In certain cases you may not have access to download or build images at all. If that's the case, you'll need to manually transfer them to the appropriate server. If the image to transfer is an app, it should be moved to the "Orborus" server. Otherwise, move it to the backend server.

# 1. Download the image you want. Go to []( and find the image. Download with docker pull. E.g. for Shuffle-tools:
docker pull frikky/shuffle:shuffle-tools_1.1.0

# 2. Save the image to a file to be transferred.
docker save frikky/shuffle:shuffle-tools_1.1.0 > shuffle_tools.tar

# 3. Transfer the file to a remote server
scp shuffle_tools.tar username@<server>:/path/to/destination/shuffle_tools.tar

# 4. Log into the remote server and find the repository
ssh username@<server>
cd /path/to/destination #same path as above

# 5. Load the file!
docker load -i shuffle_tools.tar

## All done!

# Transfer between 2 remote hosts:
#scp -3 centos@ centos@

No Internet Install

This procedure will help you export what you need to run Shuffle on a no internet host.

  1. Prerequisites
  • Both machines have Docker and Docker Compose installed already

  • Your host machine already needs the images on it to make them exportable

  2. Pull images on the original machine

Shuffle needs a few base images to work:

  • shuffle-frontend
  • shuffle-backend
  • shuffle-orborus
  • shuffle-worker
  • shuffle:app_sdk
  • opensearch
  • shuffle-subflow
 docker pull & docker pull & docker pull &  docker pull frikky/shuffle:app_sdk & docker pull & docker pull opensearchproject/opensearch:2.5.0 & docker pull

Be careful with the versioning for opensearch; all others use the tag "latest". You will also need to download and transfer ALL the apps you want to use. These can be discovered as such:

docker images | grep -i shuffle
  3. Save images and archive them
mkdir shuffle-export && cd shuffle-export

docker save > backend.tar
docker save > frontend.tar
docker save > orborus.tar
docker save frikky/shuffle:app_sdk > app_sdk.tar
docker save > worker.tar
docker save opensearchproject/opensearch:2.5.0 > opensearch.tar
docker save > subflow.tar

git pull


cd .. && tar cvf shuffle-export.tar.gz shuffle-export
  4. Export data to the targeted machine

Use scp, usb key, ..., to copy the previous archive to the machine. More about manual transfers here

  5. Import docker images to host without internet

    tar xvf shuffle-export.tar.gz && cd shuffle-export
    find . -type f -name "*.tar" -exec docker load --input "{}" \;
  6. Deploy Shuffle without Internet

Create folders to add the python apps

mkdir shuffle-apps
cp -a python-apps/* shuffle-apps/

Now, you just need to configure and install Shuffle as in the normal procedure.

Uptime Monitoring

Uptime monitoring of Shuffle can be done by periodically polling the API for userinfo located at /api/v1/getinfo. This API connects to our database, and will hang if any platform issues occur, whether in your local instance or in our Cloud instance on

Shuffle has not had, and will not have, any planned downtime for services on, and we have built our architecture around being able to upgrade and roll back without any downtime at all. If this occurs in the future for our Cloud platform, we will make sure to notify any active users. We plan to launch a status monitor for our services in 2022.

Basic monitoring can be done with a curl request + sendmail + cronjob as seen in this blogpost with the curl command below. Your personal API key can be found on or in the same location (/settings) in your local instance.

curl -H "Authorization: Bearer apikey"
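
The polling described above could be wrapped in a small check suitable for a cronjob. This is a sketch under assumptions: the function names, the example URL and the 10 second timeout are all hypothetical choices. Only the /api/v1/getinfo path and the Bearer header come from the text above.

```shell
# Hypothetical helper: build the health-check URL from a base URL,
# tolerating a trailing slash on the base.
getinfo_url() {
    echo "${1%/}/api/v1/getinfo"
}

# Hypothetical helper: return 0 if the API answers successfully within
# 10 seconds, non-zero otherwise. --fail makes curl exit non-zero on HTTP
# errors, and --max-time catches an API that is stuck.
check_shuffle() {
    curl --silent --fail --max-time 10 --output /dev/null \
        -H "Authorization: Bearer $2" "$(getinfo_url "$1")"
}

# Usage, for example from cron (URL and API key are placeholders):
#   check_shuffle "https://shuffle.example.com" "$SHUFFLE_APIKEY" || echo "Shuffle is down"
```

The timeout matters here because a stuck API would otherwise leave the check hanging instead of reporting a failure.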


To modify the database location, change "DB_LOCATION" in .env (root dir) to your new location.

Database indexes (opensearch)

  • workflowapp
  • workflowexecution
  • workflow
  • apikey
  • app_execution_values
  • environments
  • files
  • hooks
  • openapi3
  • organizations
  • schedules
  • sessions
  • syncjobs
  • trigger_auth
  • workflowappauth
  • users
  • workflowqueue-*

PS: workflowqueue-* is based on the environment used for execution.

Database migration

With the change from 0.8 to 0.9 we're changing databases from Google's Datastore to Opensearch. This has to be done due to unforeseen errors with Datastore, including issues with scale, search and debugging. The next section will detail how you can go about migrating from 0.8.X to 0.9.0 without losing access to your workflows, apps, organizations, triggers, users etc.

Indexes not being migrated:

  • workflowexecutions
  • app_execution_values
  • files
  • sessions
  • syncjobs
  • trigger_auth
  • workflowqueue

Before you start: If you have data of the same kind in the same index within Opensearch, these will be overwritten. Example: you have the user "admin" in the index "users" within Opensearch and Datastore; this will be overwritten with the version that's in Datastore.


  • Admin user in Shuffle using Datastore
  • An available Elasticsearch / Opensearch database.

1. Set main database to be Datastore

    1. Open .env
    2. Scroll down and look for "SHUFFLE_ELASTIC"
    3. Set it to false: SHUFFLE_ELASTIC=false
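
The same change to .env can be scripted, which is handy because the value is flipped back again in step 5. This is a minimal sketch: set_env_var is a hypothetical helper, and it assumes simple KEY=VALUE lines with values that do not contain the "|" character.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing an existing
# KEY= line or appending one if the key is missing.
set_env_var() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        # Write to a temporary file first, since sed -i is not portable.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    else
        echo "${key}=${value}" >> "$file"
    fi
}

# Usage from the folder that contains .env:
#   set_env_var .env SHUFFLE_ELASTIC false
```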

2. Set up Datastore and Opensearch

In order to run the migration, we have to run both databases at once, connected to Shuffle. This means to run both containers at the same time in the Docker-compose file like the image below, before restarting.

docker-compose down
docker-compose pull
docker-compose up         # PS: Notice that we don't add -d here. This is to make it easier to follow the logs. It's ok as we'll stop the instance later.


3. Find your API-key!

Now that you have both databases set up, we need to find the API-key.

1. http://localhost:3000/settings     # You may need to log in
2. Copy the API-key
3. Go to next step

4. Run the migration!

We'll now run a curl command that starts the migration. It shouldn't take more than a few seconds, max a few minutes at scale.

PS: This is NOT a destructive action. It just reads data from one place and moves it to the other. The server will restart after it has finished.

Change the part that says "APIKEY" to your actual API key from the previous step.

curl -XPOST -v localhost:5001/api/v1/migrate_database -H 'Authorization: Bearer APIKEY'


5. Change database back to Opensearch

Let's reverse step 1 by choosing elastic as main database

    1. Open .env
    2. Scroll down and look for "SHUFFLE_ELASTIC"
    3. Set it to true: SHUFFLE_ELASTIC=true

Got any issues? Ask on Discord or contact us.

Docker Version error

Shuffle runs using Docker in every step, from the frontend to the workers and apps. For certain systems however, it requires manual configuration of the version of Docker you're running. This has a self-correcting feature to it within Orborus > v0.8.98, but before then you'll have to manually correct for it.

Error getting containers: Error response from daemon: client version 1.40 is too new. Maximum supported API version is 1.35

To fix this issue, we need to set the version from 1.40 down to 1.35 in the Shuffle environment. This can be done by opening the docker-compose.yml file, then changing environment variable "DOCKER_API_VERSION" from 1.40 to 1.35 for the "orborus" service as seen below, then restarting Shuffle.

Error with Docker version
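
The version to use for DOCKER_API_VERSION is the one named at the end of the error message above. As a small sketch, the value can be pulled out of the message with sed. The function name is hypothetical, and the pattern assumes the message wording shown above.

```shell
# Hypothetical helper: extract the maximum supported API version from the
# Docker error message. Prints nothing if the message does not match.
max_docker_api_version() {
    echo "$1" | sed -n 's/.*Maximum supported API version is \([0-9][0-9.]*\).*/\1/p'
}

msg="Error getting containers: Error response from daemon: client version 1.40 is too new. Maximum supported API version is 1.35"
max_docker_api_version "$msg"   # prints 1.35
```

The printed value is what goes into DOCKER_API_VERSION for the "orborus" service.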


As Shuffle has a lot of individual parts, debugging can be quite tricky. To get started, here's a list of the different parts, with the latter three being modular / location independent.

| Type | Container name | Technology | Note |
|---|---|---|---|
| Frontend | shuffle-frontend | ReactJS | Cytoscape graphs & Material design |
| Backend | shuffle-backend | Golang | Rest API that connects all the different parts |
| Database | shuffle-database | Google Datastore | Has all non-volatile information. Will probably move to elastic or similar. |
| Orborus | shuffle-orborus | Golang | Runs workers in a specific environment to connect locations. Defaults to the environment "Shuffle" onprem. |
| Worker | worker-id | Golang | Deploys Apps to run Actions defined in a workflow |
| app sdk | appname_appversion_id | Python | Used by Apps to talk to the backend |

Execution debugging

Execution issues might be the most notable problems you run into. This is because there are a ton of reasons an execution might crash. Before going into techniques to find what's going on, you'll need to understand what exactly happens when you click the big execution button.

Frontend click -> Backend verifies and deploys executions -> (based on environments) orborus deploys a new worker -> worker finds actions to execute -> your app is executed.

  1. A workflow is executed
  2. The backend verifies whether you can execute and deploys to environment
  3. Orborus is listening to environment and deploys worker if it's the correct one
  4. Worker deploys actions if they have the right environment
  5. App executes and returns data back to the execution

As previously stated, a lot can go wrong. Here are the most common issues:

  • Networking (firewalls / proxies)
  • Badly formed apps.
  • Bad environment

General debugging

This part is meant to describe how to go about finding the issue you're having with executions. In most cases, you should start from the top of the list previously described in the following way:

  1. Find out what environment your action(s) are running under by clicking the App and checking the "Environment" dropdown. In this case (and by default) it is "Shuffle". Environments can be specified / changed under the path /admin.

  2. Check if the workflow executed at all by finding the execution line in the shuffle-backend container. Take note that it mentions environment "Shuffle", as found in the previous step.

    docker logs -f shuffle-backend

Check execution 1

  3. If it executed, check whether Orborus is running, before checking its logs for "Container \<container_id> is created". The container_id is the worker it has deployed. Take note of the environment again at the end of the line. If you don't see this line, it's most likely because it's running in the wrong environment.

Check if shuffle-orborus is running

docker ps # Check if shuffle-orborus is running

Find whether it was deployed or not

docker logs -f shuffle-orborus  # Get logs from shuffle-orborus

Check execution 2

Check environment of running shuffle-orborus container.

docker inspect shuffle-orborus | grep -i "ENV"

Expected env result, where "Shuffle" corresponds to the environment:

  4. Check whether the worker executed your app. Remember that we found \<container_id> previously by checking the logs of shuffle-orborus? Now we need that one. Workers are and will always be verbose, specifically to make debugging easier.

Find logs from a docker container

docker logs -f CONTAINER_ID

Check execution 5

As can be seen in the image above, is shows the exact execution order it takes. It starts by finding the parents, before executing the child process after it's finished. Take note of the specific apps being executed as well. It says "Time to execute \<app_id> with app \<app_name:app_version>. This indicates the app THAT WILL be executed. The following lines saying "Container \<container_id> is the container created with this app.

  5. App debugging in itself might be the trickiest. Many factors, such as branches or a badly built workflow, might come into play. This builds on the same concept as the worker, where you pass the container ID it specified.

Get the app logs

docker logs -f CONTAINER_ID # The CONTAINER_ID found in the previous worker logs

As you will notice, app logs can be quite verbose (optional in a later build). In essence, if you see "RUNNING NORMAL EXECUTION" at the end, there's a 99.9% chance that it worked; otherwise some issue might have occurred.
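Because the logs are verbose, it can help to check for that marker automatically. This is a minimal sketch with a hypothetical helper name; the 20-line window is an arbitrary choice, not something Shuffle defines.

```shell
#!/bin/bash
# finished_normally: succeed if the last 20 log lines on stdin contain the marker.
# The marker text is the one quoted above; the window size is an assumption.
finished_normally() {
  tail -n 20 | grep -q "RUNNING NORMAL EXECUTION"
}

# Usage (CONTAINER_ID is the app container found in the worker logs):
#   docker logs CONTAINER_ID 2>&1 | finished_normally && echo "looks OK" || echo "check the logs"
```

The check only says the marker was printed near the end; it does not replace reading the log when an action returns unexpected data.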

Please notify me if you need help debugging app executions ASAP, as I've done a lot of it, but it's more tricky than the other steps.

Hybrid docker image handling

We currently don't have a Docker Registry for Shuffle, meaning you need some minor configuration to get Orborus running remotely with the right containers. This only applies to containers not on Docker Hub, as we automatically push Python containers there when updated (not OpenAPI).

Here's an example of how to handle this with two different servers and Docker:

ssh user@
docker save frikky/shuffle:wazuh_api_rest_1.0.0 > wazuh.tar
scp -3 centos@ centos@
ssh user@
docker load -i wazuh.tar    # -i reads the image from the tar file; without it docker load expects STDIN
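The save, copy and load steps can also be combined into one pipe, which avoids the intermediate tar file. This is a minimal sketch: the function name push_image is hypothetical, and it assumes you run it on the server that already has the image and that the remote user is allowed to run docker.

```shell
#!/bin/bash
# push_image IMAGE REMOTE: stream a local image to a remote Docker host over SSH.
# `docker save` writes the image tar to stdout and `docker load` reads it from stdin.
push_image() {
  local image="$1" remote="$2"
  if [ -z "$image" ] || [ -z "$remote" ]; then
    echo "usage: push_image IMAGE USER@HOST" >&2
    return 2
  fi
  docker save "$image" | ssh "$remote" docker load
}

# Example (USER@HOST is a placeholder for your remote Orborus server):
#   push_image frikky/shuffle:wazuh_api_rest_1.0.0 USER@HOST
```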

Docker socket

For now, the docker socket is required to run Shuffle. Whether you run with Kubernetes or another clustering technology, Shuffle WILL need access to ContainerD, which is what the docker socket provides. If this is against internal policies and you want a single point of contact for controlling permissions, please have a look at the docker socket proxy further down.

Usage of the socket:

  • Backend (Not required, but used for app management)
  • Orborus (Required, deploying Workers)
  • Worker (Required, deploying Apps. Apps DON'T have access to the socket.)

APIs in use

  • Backend: Create, Make, Export docker images. No direct container management.
  • Orborus: Download and Remove images. Make, List and Remove containers. Make, List and Remove services.
  • Worker: Download and Remove images. Make, List and Remove containers. Make, List and Remove services.

Docker Socket Proxy

In certain scenarios or environments, you may find that the docker socket does not have the right permissions, or that mounting the socket directly into your software is against internal policies. To solve this problem, we've built support for the docker socket proxy, which gives the containers the same permissions, but without the socket being directly mounted in the same container. Another good reason to use the docker socket proxy is to control the docker permissions required.

To use the docker socket proxy, add the following to your docker-compose.yml as a service. This will launch it together with the rest:

  docker-socket-proxy:
    image: tecnativa/docker-socket-proxy
    privileged: true
    environment:
      - SERVICES=1
      - TASKS=1
      - NETWORKS=1
      - NODES=1
      - BUILD=1
      - IMAGES=1
      - GRPC=1
      - CONTAINERS=1
      - PLUGINS=1
      - SYSTEM=1
      - VOLUMES=1
      - INFO=1
      - POST=1
      - AUTH=1
      - SECRETS=1
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - shuffle

When done, remove the "/var/run/docker.sock" volume from the backend and orborus services in the docker-compose file. To enable the docker rerouting, add this environment variable to both of them:

      - DOCKER_HOST=tcp://docker-socket-proxy:2375

This will route all docker traffic through the docker-socket-proxy giving you granular access to each API.

Shuffle Server Healthcheck

There are several things to check on the Shuffle server to make sure it is in a healthy state:

  • Disk Space
  • Memory
  • Elasticsearch service state

The scripts below check each of these and send an alert when something is wrong.

Disk Space Script

This script checks whether any disk is 75% full or more. If so, an alert is sent to your Webhook URL. Replace <Webhook-URL> in the script with your Webhook URL.

# List usage per filesystem, skipping the header and pseudo filesystems
df -H | grep -vE '^Filesystem|tmpfs|cdrom' | awk '{ print $5 " " $1 }' | grep -v overlay | while read -r output;
do
  #echo $output
  usep=$(echo "$output" | awk '{ print $1 }' | cut -d '%' -f1)
  partition=$(echo "$output" | awk '{ print $2 }')
  if [ "$usep" -ge 75 ]; then
    curl -X POST -H 'Content-type: application/json' --data '{"Alert":"Almost out of disk space","Server":"Local-Lab Shuffle Server"}' <Webhook-URL>
  fi
done
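The parsing step can be isolated into a function that is easy to try against sample input before wiring it to an alert. This is a minimal sketch: the function name partitions_over is hypothetical, and it assumes `df -P` (the POSIX output format, one line per filesystem) rather than the `df -H` call used above.

```shell
#!/bin/bash
# partitions_over LIMIT: read `df -P` output on stdin and print "USE% FILESYSTEM"
# for every filesystem whose usage is at or above LIMIT percent.
partitions_over() {
  local limit="$1"
  awk -v limit="$limit" '
    NR == 1 { next }                        # skip the header row
    $0 ~ /tmpfs|cdrom|overlay/ { next }     # ignore the same entries the script above filters out
    {
      use = $5; sub(/%/, "", use)           # "81%" -> "81"
      if (use + 0 >= limit + 0) print $5, $1
    }'
}

# Usage: df -P | partitions_over 75
```

Keeping the threshold as an argument makes it possible to run the same function with a lower limit for an early warning.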

Memory Check Script

This script checks whether memory utilization is above 70%. If so, an alert is sent to your Webhook URL. Replace <Webhook-URL> in the script with your Webhook URL.

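#check server health
# Assumption: utilization is taken as used/total from the "Mem:" row of `free`
USAGE=$(free | awk '/^Mem:/ { printf "%d", $3 / $2 * 100 }')
if [ "${USAGE}" -gt 70 ]; then
    curl -X POST -H 'Content-type: application/json' --data '{"Alert":"There is a problem with this server, memory utilization is above 70%"}' <Webhook-URL>
    exit 1
fi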
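The utilization figure itself can be derived from /proc/meminfo. This is a minimal sketch: it assumes Linux with a kernel that reports MemAvailable (3.14 or later), the function name mem_used_percent is hypothetical, and counting everything that is not "available" as used is one possible definition among several.

```shell
#!/bin/bash
# mem_used_percent: read /proc/meminfo-style text on stdin and print the share of
# memory in use as a whole number, computed as (MemTotal - MemAvailable) / MemTotal.
mem_used_percent() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }'
}

# Usage on Linux:
#   if [ "$(mem_used_percent < /proc/meminfo)" -gt 70 ]; then echo "memory above 70%"; fi
```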

Elasticsearch Service Script

This script checks whether the Elasticsearch service is running. If not, an alert is sent to your Webhook URL. Replace <Elasticsearch-IP> in the script with the Elasticsearch IP of your environment, and <Webhook-URL> with your Webhook URL.

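#check server health
# Any HTTP status other than 200 counts as a failure (an unreachable host is reported as 000)
STATUS="$(curl -s -o /dev/null -w '%{http_code}' <Elasticsearch-IP>)"
if [ "${STATUS}" != "200" ]; then
    curl -X POST -H 'Content-type: application/json' --data '{"Alert":"There is a problem with this server, status is not OK"}' <Webhook-URL>
    exit 1
fi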
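A reachable service is not necessarily a healthy cluster, so a stricter check could read the cluster health API, which reports a status of green, yellow or red. This is a minimal sketch: the function name cluster_status is hypothetical, and the scheme, port 9200 and missing credentials in the usage line are assumptions that depend on your setup.

```shell
#!/bin/bash
# cluster_status: read a cluster health JSON document on stdin and print the
# value of its "status" field (green, yellow or red).
cluster_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Usage (host, port and scheme are assumptions; add credentials if security is enabled):
#   curl -s "http://ELASTICSEARCH_IP:9200/_cluster/health" | cluster_status
```

Whether yellow should raise an alert is a judgment call; on a single-node setup it is often the normal state because replica shards cannot be assigned.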

Cron Jobs to automate the Process

You can set up cron jobs that run the scripts every 15 minutes, so that the whole process is automated.

*/15 * * * * bash /root/
*/15 * * * * bash /root/
*/15 * * * * bash /root/