This repo contains a build.sh script intended to be run in an Amazon Linux Docker container to build numpy, pandas, and scipy for use in AWS Lambda. For more info about how the script works and how to use it, see this blog post on deploying sklearn to Lambda.
An older version of this repo, now archived in the ec2-build-process branch, used an EC2 instance to perform the build and an Ansible playbook to execute it. That version still works, but the new Dockerized version doesn't require you to launch a remote instance.
To build the zipfile, pull the Amazon Linux image and run the build script in it.
$ docker pull amazonlinux:2017.09
$ docker run -v $(pwd):/outputs -it amazonlinux:2017.09 /bin/bash /outputs/build.sh
Note that the script no longer works with the amazonlinux:latest image, so use the one specified above, amazonlinux:2017.09.
That will make a file called venv.zip in the local directory that's around 67 MB.
Once you run this, you'll have a zipfile containing scipy, pandas, and numpy. To use them, add your handler file to the zip, and include the lib directory so it can be used for shared libraries. The minimum viable scipy handler would thus look like:
import os
import ctypes

for d, _, files in os.walk('lib'):
    for f in files:
        if f.endswith('.a'):
            continue
        ctypes.cdll.LoadLibrary(os.path.join(d, f))

import scipy

def handler(event, context):
    # do scipy stuff here
    return {'yay': 'done'}
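Appending a handler like the one above to the archive can be sketched with Python's zipfile module; the add_handler helper and the handler.py name are illustrative choices, not part of build.sh:

```python
import zipfile

def add_handler(archive_path, handler_path, arcname='handler.py'):
    # Open the existing archive in append mode and add the handler file
    # under the name the Lambda runtime will import it by (arcname).
    with zipfile.ZipFile(archive_path, 'a', zipfile.ZIP_DEFLATED) as zf:
        zf.write(handler_path, arcname)

# Usage: add_handler('venv.zip', 'handler.py')
```

The plain `zip` CLI works just as well; the Python version is handy if you script the packaging step.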
To add extra packages to the build, create a requirements.txt file alongside build.sh in this repo. All packages listed there will be installed in addition to scipy, pandas, numpy, and related dependencies.
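For example, a minimal requirements.txt with one pinned package per line (these names and versions are placeholders, not packages the build requires):

```
requests==2.19.1
joblib==0.12.5
```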
This script was edited to allow us to import our private repos, such as amper-core. This repository must include a private_key.txt file containing an SSH private key; here is info on how to generate an SSH key and how to add it to your GitHub profile. Docker uses the key in private_key.txt to access our private repo.
The build.sh script was also updated to use Python 3.6 and to install git, so that the packages in requirements.txt can be installed.
With just compression and stripped binaries, the full sklearn stack weighs in at 65 MB, and could probably be reduced further by:
- Pre-compiling all .pyc files and deleting their source
- Removing test files
- Removing documentation
For my purposes, 39 MB is sufficiently small; if you have any improvements to share, pull requests or issues are welcome.
Optimizations 2 and 3 above were completed by following this link. Optimization 1 was attempted, but it broke the packages.
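Optimizations 2 and 3 amount to pruning test and documentation directories from the unpacked package tree before zipping. A minimal sketch (the slim_package name and the exact set of directory names to prune are assumptions, adjust for your packages):

```python
import os
import shutil

# Directory names to delete; this set is an assumption, extend as needed.
PRUNE = {'tests', 'test', 'doc', 'docs'}

def slim_package(root):
    """Remove test and documentation directories (optimizations 2 and 3)
    from an unpacked site-packages tree rooted at `root`."""
    removed = []
    for dirpath, dirnames, _ in os.walk(root):
        for name in list(dirnames):
            if name in PRUNE:
                target = os.path.join(dirpath, name)
                shutil.rmtree(target)
                dirnames.remove(name)  # don't descend into deleted dirs
                removed.append(target)
    return removed
```

Run this against the virtualenv's site-packages directory before creating venv.zip.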
This project is MIT licensed; for license info on the numpy, scipy, and sklearn packages, see their respective sites. The full text of the MIT license is in LICENSE.txt.
- venv-3-1-18.zip - package for state-gen lambda function in covfefe (numpy, scipy, requirements)
- venv-4-23-18.zip - package for auto tuner (no longer used)
- venv-5-32-18.zip - package for cycles-gen lambda function in covfefe (numpy, scipy, pandas, requirements)
- venv-8-2-18.zip - package for cycles-gen lambda function in covfefe (numpy, scipy, pandas, scikit-learn, requirements)