Skip to content

dask-ecs-lib is a Python library that effortlessly spins up a Dask cluster on AWS ECS using Fargate, allowing you to seamlessly execute and parallelize your functions.

Notifications You must be signed in to change notification settings

rolani/dask-ecs-lib

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

dask_ecs_lib

Create a dask cluster to run a provided function

Packages in Image

Package version

blosc --> 1.10.6
cloudpickle --> 2.2.1
dask --> 2023.3.0
distributed --> 2023.3.0
lz4 --> 4.3.2
msgpack --> 1.0.4
numpy --> 1.24.2
pandas --> 1.5.3
toolz --> 0.12.0 tornado --> 6.2
python --> 3.8.0.final.0

You will want the client (the environment you are launching the ECS cluster from) to have the above libraries and versions matched. Check client_requirements_py38.txt.

How do I get set up?

  • Importing:

    import svclusterlib`
    
    from svclusterlib import daskfunctions
    
  • Usage:

    daskfunctions.runTask(mode, func, project, num_workers, worker_vcpu, worker_memory, return_result)
    
    • param mode --> string --> "local", "ecsspot" or "fargate" --REQUIRED
    • param func --> function --> The function to be run in the cluster -- REQUIRED
    • param project --> string --> The project to run, which determines the image to use. "research", "data_quality" or "latest" --> default is "research" -- REQUIRED
    • param num_workers --> int --> Number of workers to start on cluster creation. Default 3 --OPTIONAL
    • param worker_vcpu --> int --> The amount of CPU to request for worker tasks in milli-cpu (1/1024). Valid options (1024, 2048, 4096, etc). Default 4096 --OPTIONAL
    • param worker_memory --> int --> The amount of memory to request for worker tasks in MB. Default 8192 --OPTIONAL
    • param return_result -> boolean -> Whether to return the results of the ran function -- default is False
  • Examples: *

    daskfunctions.runTask("fargate", func, "research", 3, 2048, 8192)
    

    --> Creates a fargate cluster using research image with 3 workers, each with 2vCPUs and 8GB memory to run the function func. *

    daskfunctions.runTask("fargate", func, "data_quality", 4, 2048, 4096)
    

    --> Creates a fargate cluster using data_quality image with 4 workers, each with 2vCPUs and 4GB memory to run the function func. *

    daskfunctions.runTask("fargate", func, "latest", 4, 2048, 4096)
    

    --> Creates a fargate cluster using latest Dask image (currently py3.8) with 4 workers, each with 2vCPUs and 4GB memory to run the function func. *

    daskfunctions.runTask("fargate", func, "research", 4, 2048, 4096, True)
    

    --> Creates a fargate cluster using research image with 4 workers, each with 2vCPUs and 4GB memory to run the function func and return the result of the function. *

    daskfunctions.runTask("local", func, "research", 4)
    

    --> Creates a local dask cluster using research image with 4 workers to run the function func.

  • Note:

    • The function to run must be include a joblib context.

      with joblib.parallel_backend("dask"):
      
          searcher.fit(X,y)
      

About

dask-ecs-lib is a Python library that effortlessly spins up a Dask cluster on AWS ECS using Fargate, allowing you to seamlessly execute and parallelize your functions.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published