Configure Shuffle

Documentation for configuring Shuffle.

PS: This is only for on-prem / open-source, not cloud


Introduction

Now that Shuffle is open source, there needs to be a place to read about configuration. There are quite a few options, and this article covers them.

Shuffle is based on Docker and is started using docker-compose, with configuration items in a .env file. The .env file holds the settings for the default environment, database locations, port forwarding, GitHub locations and more.


Installing Shuffle

Check out the installation guide. If you're on Linux, the short version is:

git clone https://github.com/frikky/Shuffle
cd Shuffle
docker-compose up -d


Updating Shuffle

As long as you use Docker, updating Shuffle is pretty straightforward. To make sure you're as secure and up to date as possible, do this as often as you please.

From the root of the main repository, run:

docker-compose down
git pull
docker-compose pull
docker-compose up -d
docker pull frikky/shuffle:app_sdk

PS: This will NOT update your apps, meaning they may be outdated. To update your apps, go to /apps and click both buttons in the top right corner (reload apps locally & Download from Github)


Production readiness

Shuffle is by default configured to be easy to start using. This means we've made some tradeoffs to make first-time use easier, and these can be enabled or disabled. This part outlines much of what's necessary to improve Shuffle's security, availability and scalability.


Servers

When setting up Shuffle for production, we always recommend using a minimum of two servers (VMs). This is because you don't want your executions to clog the webserver, which in turn clogs the executions (Orborus). You can put Orborus on multiple servers with different environments to ensure better availability, or talk to us about Kubernetes/Swarm.

Orborus: Runs all workflows and is CPU heavy. If you do a lot of file transfers or memory analysis, make sure to add RAM accordingly.

  • Services: Orborus, Worker, Apps
  • CPU: 4 vCPU
  • RAM: 4 GB
  • Disk: 10 GB (SSD)

Webserver: The webserver is where your users and our API live. It is RAM heavy because it does a lot of caching to ensure scalability.

  • Services: Frontend, Backend, Database
  • CPU: 2 vCPU
  • RAM: 8 GB
  • Disk: 100 GB (SSD)
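
As a rough pre-flight check, the sizing above can be compared against what a host actually reports. This is a minimal sketch and not part of Shuffle: meets_minimum is a hypothetical helper, and the usage shown in the comments assumes a Linux host that provides nproc and /proc/meminfo.

```shell
# meets_minimum HAVE_CPU HAVE_RAM_GB NEED_CPU NEED_RAM_GB
# Succeeds (exit status 0) when the host has at least the required
# number of vCPUs and at least the required amount of RAM in GB.
meets_minimum() {
  [ "$1" -ge "$3" ] && [ "$2" -ge "$4" ]
}

# Example for the Orborus server (4 vCPU, 4 GB), Linux only:
#   cpus=$(nproc)
#   ram_gb=$(awk '/MemTotal/ {print int($2 / 1048576 + 0.5)}' /proc/meminfo)
#   meets_minimum "$cpus" "$ram_gb" 4 4 || echo "below the recommended size"
```

For the webserver, the last two arguments would be 2 and 8 instead.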


Docker configuration

These are the Docker configurations for the different servers. To use them, save each configuration in a file called docker-compose.yml on the corresponding server, and run

docker-compose up -d

to start the containers.

PS: The data below is based on this docker-compose file

Orborus: Below is the Orborus configuration. Make sure to change "BASE_URL" in the environment to match the Shuffle backend location. The configuration can be modified to reduce or increase load, to add proxies, to change which backend environment to execute in, and much more.

PS: By default, the environments (executions) are NOT authenticated.

version: '3'
services:
  orborus:
    #build: ./functions/onprem/orborus
    image: ghcr.io/frikky/shuffle-orborus:0.8.92
    container_name: shuffle-orborus
    hostname: shuffle-orborus
    networks:
      - shuffle
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - BASE_URL=http://SHUFFLE-BACKEND:BACKEND-PORT
      - SHUFFLE_APP_SDK_VERSION=0.8.90
      - SHUFFLE_WORKER_VERSION=0.8.90
      - ORG_ID=Shuffle
      - ENVIRONMENT_NAME=Shuffle
      - DOCKER_API_VERSION=1.40
      - SHUFFLE_ORBORUS_EXECUTION_TIMEOUT=600
      - SHUFFLE_BASE_IMAGE_NAME=frikky
      - SHUFFLE_BASE_IMAGE_REGISTRY=ghcr.io
      - SHUFFLE_BASE_IMAGE_TAG_SUFFIX="-0.8.60"
      - HTTP_PROXY=""
      - HTTPS_PROXY=""
      - SHUFFLE_PASS_WORKER_PROXY=false
      - SHUFFLE_PASS_APP_PROXY=false
      - CLEANUP=true
    restart: unless-stopped
networks:
  shuffle:
    driver: bridge

Webserver: The webserver should run the Frontend, Backend and Database. Here's the docker-compose file. Make sure the .env file (from the repository root) exists as well.

version: '3'
services:
  frontend:
    image: ghcr.io/frikky/shuffle-frontend:0.8.80
    container_name: shuffle-frontend
    hostname: shuffle-frontend
    ports:
      - "${FRONTEND_PORT}:80"
      - "${FRONTEND_PORT_HTTPS}:443"
    networks:
      - shuffle
    environment:
      - BACKEND_HOSTNAME=${BACKEND_HOSTNAME}
    restart: unless-stopped
    depends_on:
      - backend
  backend:
    image: ghcr.io/frikky/shuffle-backend:0.8.80
    container_name: shuffle-backend
    hostname: ${BACKEND_HOSTNAME}
    # Here for debugging:
    ports:
      - "${BACKEND_PORT}:5001"
    networks:
      - shuffle
    volumes: 
      - /var/run/docker.sock:/var/run/docker.sock 
      - ${SHUFFLE_APP_HOTLOAD_LOCATION}:/shuffle-apps     
      - ${SHUFFLE_FILE_LOCATION}:/shuffle-files
    environment:
      - DATASTORE_EMULATOR_HOST=shuffle-database:8000
      - SHUFFLE_APP_HOTLOAD_FOLDER=/shuffle-apps
      - SHUFFLE_FILE_LOCATION=/shuffle-files
      - ORG_ID=${ORG_ID}
      - SHUFFLE_APP_DOWNLOAD_LOCATION=${SHUFFLE_APP_DOWNLOAD_LOCATION}
      - SHUFFLE_DOWNLOAD_AUTH_BRANCH=${SHUFFLE_DOWNLOAD_AUTH_BRANCH}
      - SHUFFLE_DEFAULT_USERNAME=${SHUFFLE_DEFAULT_USERNAME}
      - SHUFFLE_DEFAULT_PASSWORD=${SHUFFLE_DEFAULT_PASSWORD}
      - SHUFFLE_DEFAULT_APIKEY=${SHUFFLE_DEFAULT_APIKEY}
      - SHUFFLE_APP_FORCE_UPDATE=${SHUFFLE_APP_FORCE_UPDATE}
      - HTTP_PROXY=${SHUFFLE_HTTP_PROXY}
      - HTTPS_PROXY=${SHUFFLE_HTTPS_PROXY}
    restart: unless-stopped
    depends_on:
      - database
  database:
    image: frikky/shuffle:database
    container_name: shuffle-database
    hostname: shuffle-database
    networks:
      - shuffle
    environment:
      - _JAVA_OPTIONS="-Xmx2g"
    restart: unless-stopped
    volumes:
      - ${DB_LOCATION}:/etc/shuffle
networks:
  shuffle:
    driver: bridge
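
The webserver compose file references values such as ${FRONTEND_PORT} that have to come from .env. As a quick sanity check before starting the containers, something like the following can list referenced variables that .env does not define. This is a sketch and not part of Shuffle: missing_env_vars is a hypothetical helper, and it only recognizes the plain ${NAME} form used above.

```shell
# missing_env_vars COMPOSE_FILE ENV_FILE
# Prints every variable referenced as ${NAME} in COMPOSE_FILE that has no
# NAME= line in ENV_FILE. Prints nothing when all of them are defined.
# Limitation: forms such as ${NAME:-default} or $NAME are not recognized.
missing_env_vars() {
  grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$1" | tr -d '${}' | sort -u |
  while read -r mev_name; do
    grep -q "^${mev_name}=" "$2" || printf '%s\n' "$mev_name"
  done
}

# Example:
#   missing_env_vars docker-compose.yml .env
```

A variable that is present but empty (for example a proxy setting left blank) counts as defined here.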


Hybrid Configuration

If you want to try Hybrid Shuffle, which gives you access to cloud executions, failovers and backups, email us.


Environment Variables

Shuffle has a few toggles that make it considerably faster, but they remove many of the checks that are done during your first runs of Shuffle.

Database:

_JAVA_OPTIONS="-Xmx6g" # "6g" means 6 GB of RAM. This matters because it lets the database keep caching. If it is not set, you may lose your progress as you scale.

Orborus:

CLEANUP=true 	# Cleans up all containers after they're done. Necessary to help Docker scale. Default=false
HTTP_PROXY= 	# Configures an HTTP proxy to use when talking to the Shuffle Backend
HTTPS_PROXY= 	# Configures an HTTPS proxy to use when talking to the Shuffle Backend


Redundancy

TBD: We have yet to decide how this should be implemented for Shuffle. For now, you may configure multiple instances with a load balancer, but there's no easy way to synchronize data between them to ensure they're in the same state.

A good place to start is this blogpost by one of our contributors: https://azgaviperr.github.io/3-nodes-swarm/DockerSwarm/Stacks/Shuffler/


Proxy configuration

Proxies are another requirement for many enterprises, so they are an important feature to support. There are two places where proxies can be implemented:

  • Shuffle Backend: Connects to GitHub and Docker Hub.
  • Shuffle Orborus: Connects to Docker Hub and the Shuffle Backend.

PS: Orborus settings are also set for the Worker

To configure these, there are two options:

  • Individual containers
  • Globally for Docker


Global Docker proxy configuration

Follow this guide from Docker: https://docs.docker.com/network/proxy/


Individual container proxy

To set up proxies in individual containers, open docker-compose.yml and add the following lines with your proxy settings (http://my-proxy.com:8080 in my case).

PS: Make sure to use uppercase letters, and not lowercase (HTTP_PROXY, NOT http_proxy)

[Image: Proxy containers]
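
The screenshot with the exact lines is not reproduced here. As a sketch based on the Orborus configuration earlier in this document, the proxy entries go in a service's environment list; in the webserver compose file above, the backend reads them from SHUFFLE_HTTP_PROXY and SHUFFLE_HTTPS_PROXY in .env instead. The helper names below are hypothetical and not part of Shuffle.

```shell
# proxy_env_lines PROXY_URL
# Prints the two entries to paste under a service's "environment:" list
# in docker-compose.yml, indented like the compose files in this document.
proxy_env_lines() {
  printf '      - HTTP_PROXY=%s\n' "$1"
  printf '      - HTTPS_PROXY=%s\n' "$1"
}

# lowercase_proxy_vars COMPOSE_FILE
# Prints (with line numbers) any line that sets http_proxy or https_proxy
# in lowercase, which should be uppercase. Prints nothing if there are none.
lowercase_proxy_vars() {
  grep -nE '(^|[^A-Za-z_])https?_proxy=' "$1" || true
}

# Example:
#   proxy_env_lines http://my-proxy.com:8080
#   lowercase_proxy_vars docker-compose.yml
```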


HTTPS

HTTPS is enabled by default on port 3443 with a self-signed certificate for localhost. If you would like to change this, the only way (currently) is to reconfigure and rebuild the frontend. If you don't have HTTPS enabled, see Updating Shuffle to get the latest configuration.

Necessary info:

  • Certificates are located in ./frontend/certs.
  • ./frontend/README.md contains information on generating a self-signed cert
  • (default): Privatekey is named privkey.pem
  • (default): Fullchain is named fullchain.pem

If you want to change this, edit ./frontend/Dockerfile and ./frontend/nginx.conf.

After changing certificates, you can rebuild the entire frontend by running the following from ./frontend:

./run.sh


Kubernetes

Running Shuffle on Kubernetes is now possible thanks to help from our contributors. This has not been extensively tested, so please reach out to @frikkylikeme if you're having execution issues.


Configuring Kubernetes

To configure Kubernetes, you need to specify a single environment variable for Orborus: RUNNING_MODE. By setting the environment variable RUNNING_MODE=kubernetes, execution should work as expected!


Database

To modify the database location, change "DB_LOCATION" in .env (root dir) to your new location.
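
Editing .env by hand works fine. For scripted setups, the change can be sketched as below. This is an illustration and not part of Shuffle: set_env_var is a hypothetical helper, it assumes plain KEY=value lines in .env, and it only changes the setting (it does not move any existing data).

```shell
# set_env_var ENV_FILE KEY VALUE
# Replaces the existing KEY=... line in ENV_FILE, or appends one if the
# key is missing. A backup of the file is kept as ENV_FILE.bak on replace.
# Note: VALUE must not contain '|' or '&', which are special to sed here.
set_env_var() {
  if grep -q "^$2=" "$1"; then
    sed -i.bak "s|^$2=.*|$2=$3|" "$1"
  else
    printf '%s=%s\n' "$2" "$3" >> "$1"
  fi
}

# Example (the target path is a placeholder):
#   set_env_var .env DB_LOCATION /path/to/new/location
```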


Database indexes (opensearch)

  • workflowapp
  • workflowexecution
  • workflow
  • apikey
  • app_execution_values
  • environments
  • files
  • hooks
  • openapi3
  • organizations
  • schedules
  • sessions
  • syncjobs
  • trigger_auth
  • workflowappauth
  • users
  • workflowqueue-*

PS: workflowqueue-* is based on the environment used for execution.


Docker Version error

Shuffle runs using Docker in every step, from the frontend to the workers and apps. For certain systems however, it requires manual configuration of the version of Docker you're running. This has a self-correcting feature to it within Orborus > v0.8.98, but before then you'll have to manually correct for it.

Error getting containers: Error response from daemon: client version 1.40 is too new. Maximum supported API version is 1.35

To fix this issue, we need to set the version from 1.40 down to 1.35 in the Shuffle environment. This can be done by opening the docker-compose.yml file, changing the environment variable "DOCKER_API_VERSION" from 1.40 to 1.35 for the "orborus" service as seen below, and then restarting Shuffle.

[Image: Error with Docker version]
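
The same fix can be sketched in shell: pull the maximum supported version out of the error message, then rewrite the setting in docker-compose.yml. Both helper names are hypothetical and not part of Shuffle; the message format is assumed to match the error shown above.

```shell
# max_supported_api_version
# Reads a Docker error message on stdin and prints the version that
# follows "Maximum supported API version is".
max_supported_api_version() {
  sed -n 's/.*Maximum supported API version is \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# set_docker_api_version COMPOSE_FILE VERSION
# Rewrites DOCKER_API_VERSION=<old> to DOCKER_API_VERSION=VERSION and
# keeps a backup of the file as COMPOSE_FILE.bak.
set_docker_api_version() {
  sed -i.bak "s/DOCKER_API_VERSION=[0-9.]*/DOCKER_API_VERSION=$2/" "$1"
}

# Example, followed by a restart of Shuffle:
#   set_docker_api_version docker-compose.yml 1.35
# The daemon's own API version can also be read with:
#   docker version --format '{{.Server.APIVersion}}'
```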


Debugging

As Shuffle has a lot of individual parts, debugging can be quite tricky. To get started, here's a list of the different parts, with the latter three being modular / location independent.

| Type | Container name | Technology | Note |
| --- | --- | --- | --- |
| Frontend | shuffle-frontend | ReactJS | Cytoscape graphs & Material design |
| Backend | shuffle-backend | Golang | Rest API that connects all the different parts |
| Database | shuffle-database | Google Datastore | Has all non-volatile information. Will probably move to elastic or similar. |
| Orborus | shuffle-orborus | Golang | Runs workers in a specific environment to connect locations. Defaults to the environment "Shuffle" onprem. |
| Worker | worker-id | Golang | Deploys Apps to run Actions defined in a workflow |
| app sdk | appname_appversion_id | Python | Used by Apps to talk to the backend |

worker-8a666e4f-e544-440e-bf0f-4220e7cc9e25


Execution debugging

Execution issues might be the most notable problems you encounter, because there are a ton of reasons an execution might crash. Before going into techniques to find what's going on, you'll need to understand what exactly happens when you click the big execution button.

Frontend click -> Backend verifies and deploys executions -> (based on environments) orborus deploys a new worker -> worker finds actions to execute -> your app is executed.

  1. A workflow is executed
  2. The backend verifies whether you can execute and deploys to environment
  3. Orborus is listening to environment and deploys worker if it's the correct one
  4. Worker deploys actions if they have the right environment
  5. App executes and returns data back to the execution

As previously stated, a lot can go wrong. Here are the most common issues:

  • Networking (firewalls / proxies)
  • Badly formed apps.
  • Bad environment


General debugging

This part is meant to describe how to go about finding the issue you're having with executions. In most cases, you should start from the top of the list previously described, in the following way:

  1. Find out which environment your action(s) are running under by clicking the App and checking the "Environment" dropdown. In this case (and by default) it is "Shuffle". Environments can be specified or changed under the path /admin.

[Image: Check execution 3]

  2. Check if the workflow executed at all by finding the execution line in the shuffle-backend container. Take note that it mentions environment "Shuffle", as found in the previous step.

docker logs -f shuffle-backend

[Image: Check execution 1]

  3. If it executed, check whether Orborus is running, then check its logs for "Container <container_id> is created". The container_id is the worker it has deployed. Take note of the environment again at the end of the line. If you don't see this line, it's most likely because it's running in the wrong environment.

Check if shuffle-orborus is running

docker ps # Check if shuffle-orborus is running

Find whether it was deployed or not

docker logs -f shuffle-orborus  # Get logs from shuffle-orborus
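
To pick the worker IDs out of a long log, a small filter can help. This is a sketch and not part of Shuffle: created_workers is a hypothetical helper, and it assumes the log lines contain the literal text "Container <container_id> is created" quoted in the step above.

```shell
# created_workers
# Reads shuffle-orborus logs on stdin and prints the container ID from
# every line that contains "Container <container_id> is created".
created_workers() {
  sed -n 's/.*Container \([^ ]*\) is created.*/\1/p'
}

# Example:
#   docker logs shuffle-orborus 2>&1 | created_workers
```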

[Image: Check execution 2]

Check environment of running shuffle-orborus container.

docker inspect shuffle-orborus | grep -i "ENV"

Expected env result, where "Shuffle" corresponds to the environment:

[Image: Check execution 4]
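
Instead of scanning the grep output by eye, the environment name can be extracted directly. This is a sketch: env_value is a hypothetical helper, and ENVIRONMENT_NAME is the variable set in the Orborus configuration earlier in this document.

```shell
# env_value NAME
# Reads KEY=VALUE lines on stdin and prints the value of NAME.
env_value() {
  sed -n "s/^$1=//p"
}

# Example: print one environment variable per line, then pick one out.
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' shuffle-orborus \
#     | env_value ENVIRONMENT_NAME
```

The printed value should match the environment found in step 1.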

  4. Check whether the worker executed your app. Remember that we found <container_id> previously by checking the logs of shuffle-orborus? Now we need that one. Workers are, and always will be, verbose, specifically to make debugging easier.

Find logs from a docker container

docker logs -f CONTAINER_ID

[Image: Check execution 5]

As can be seen in the image above, it shows the exact execution order. It starts by finding the parents, then executes the child process once they have finished. Take note of the specific apps being executed as well. It says "Time to execute <app_id> with app <app_name:app_version>". This indicates the app THAT WILL be executed. The following lines saying "Container <container_id>" show the container created with this app.

  5. App debugging in itself might be the trickiest. There are a lot of factors, like branches and bad workflow building, that might come into play. This builds on the same concept as the worker, where you pass the container ID it specified.

Get the app logs

docker logs -f CONTAINER_ID # The CONTAINER_ID found in the previous worker logs

As you will notice, app logs can be quite verbose (optional in a later build). In essence, if you see "RUNNING NORMAL EXECUTION" at the end, there's a 99.9% chance that it worked; otherwise some issue might have occurred.
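
Because the logs are verbose, it can be easier to test for that marker than to read everything. This is a sketch and not part of Shuffle: execution_looks_ok is a hypothetical helper, and it checks for the marker anywhere in the output rather than only at the end.

```shell
# execution_looks_ok
# Reads app logs on stdin. Succeeds (exit status 0) if the text
# "RUNNING NORMAL EXECUTION" appears, and fails otherwise.
execution_looks_ok() {
  grep -q 'RUNNING NORMAL EXECUTION'
}

# Example:
#   docker logs CONTAINER_ID 2>&1 | execution_looks_ok && echo "probably fine"
```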

If you need help debugging app executions, please notify me as soon as possible. I've done a lot of it, and it's trickier than the other steps.


Hybrid docker image handling

We currently don't have a Docker Registry for Shuffle, meaning you need some minor configuration to get Orborus running remotely with the right containers. This only applies to containers not on Docker Hub, as we automatically push Python containers there when updated (not OpenAPI).

Here's an example of how to handle this with two different servers and Docker

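# On the first server: save the image to a tar file
ssh user@10.0.0.1
docker save frikky/shuffle:wazuh_api_rest_1.0.0 > wazuh.tar
exit
# From your own machine: copy the tar file between the two servers
scp -3 user@10.0.0.1:/home/user/wazuh.tar user@10.0.0.2:/home/user/wazuh.tar
# On the second server: load the image from the tar file
ssh user@10.0.0.2
docker load -i wazuh.tar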

TBD: We'll make this an API-call for ContainerD later.


Known Bugs