
Package detail

serverless-aws-glue

Author: sanand404 · License: MIT · Version: 0.0.44

Serverless plugin to deploy AWS Glue Jobs

Keywords: serverless, aws, glue, job, gluejob

readme

Serverless Glue

This is a plugin for the Serverless Framework that provides the ability to deploy AWS Glue Jobs.

Install

  1. Run npm install --save-dev serverless-aws-glue
  2. Add serverless-aws-glue to the plugins section of serverless.yml:
     plugins:
         - serverless-aws-glue
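Putting both steps together, a minimal serverless.yml might look like this (the service name, region, bucket, and role below are placeholders, not values the plugin requires):

```yml
service: my-glue-service        # placeholder service name

provider:
  name: aws
  region: us-east-1             # placeholder region

plugins:
  - serverless-aws-glue

custom:
  Glue:
    bucketDeploy: someBucket    # bucket the plugin uploads job scripts to
    jobs:
      - job:
          name: super-glue-job
          script: src/glueJobs/test-job.py
          type: spark
          glueVersion: python3-2.0
          role: arn:aws:iam::000000000:role/someRole
```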

How it works

The plugin creates CloudFormation resources from your configuration and adds them to the serverless template before the serverless deploy runs.

So any Glue job deployed with this plugin is part of your stack too.
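As a rough illustration, a job configured with this plugin corresponds to an AWS::Glue::Job resource in the generated stack. The sketch below shows the general shape only; the logical ID and the exact set of properties the plugin emits may differ:

```yml
Resources:
  SuperGlueJob:                   # hypothetical logical ID derived from the job name
    Type: AWS::Glue::Job
    Properties:
      Name: super-glue-job
      Role: arn:aws:iam::000000000:role/someRole
      GlueVersion: "2.0"
      Command:
        Name: glueetl             # glueetl for spark jobs, pythonshell for pythonshell jobs
        PythonVersion: "3"
        ScriptLocation: s3://someBucket/glueJobs/test-job.py
      ExecutionProperty:
        MaxConcurrentRuns: 3
```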

How to configure your Glue jobs

Configure your Glue jobs in the custom section like this:

custom:
  Glue:
    bucketDeploy: someBucket # Required
    s3Prefix: some/s3/key/location/ # optional, default = 'glueJobs/'
    jobs:
      - job:
          name: super-glue-job # Required
          script: src/glueJobs/test-job.py # Required. The file is uploaded to the s3Prefix location under its base name (the part after the last '/')
          tempDir: true # Optional true | false
          type: spark # spark / pythonshell # Required
          glueVersion: python3-2.0 # Required python3-1.0 | python3-2.0 | python2-1.0 | python2-0.9 | scala2-1.0 | scala2-0.9 | scala2-2.0 
          role: arn:aws:iam::000000000:role/someRole # Required
          MaxConcurrentRuns: 3 # Optional
          WorkerType: Standard  # Optional  | Standard  | G1.X | G2.X
          NumberOfWorkers: 1 # Optional
          Connections: "RDS-MySQL5.7-Connection1,RDS-MySQL5.7-Connection2" # Optional
          extraPyFilePaths: "/path/to/file1.py,/path/to/file2.py" # Optional
          extraJarPaths: "/path/to/file1.jar,/path/to/file2.jar" # Optional
          additionalModules: "mysql-connector-python==8.0.5,pymongo==3.11.4" # Optional
          sparkUIPath: "s3://path" # Optional
          DefaultArguments: # Optional
            stage: "dev"
            table_name: "test"
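Inside the job script, the DefaultArguments above arrive as --key value command-line pairs. In a Spark job you would normally read them with awsglue.utils.getResolvedOptions; the sketch below uses plain argparse instead (suitable for a pythonshell job), with the stage and table_name keys taken from the example above:

```python
import argparse


def parse_job_args(argv):
    """Parse Glue DefaultArguments passed to the script as --key value pairs.

    The stage and table_name keys mirror the DefaultArguments example above;
    your own jobs may define different keys.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--stage")
    parser.add_argument("--table_name")
    # Glue also passes its own internal arguments, so ignore unknown flags.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    import sys

    args = parse_job_args(sys.argv[1:])
    print(f"Running in stage={args.stage} against table={args.table_name}")
```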

You can define multiple jobs:

custom:
  Glue:
    bucketDeploy: someBucket
    jobs:
      - job:
          ...
      - job:
          ...

Glue configuration parameters

| Parameter | Type | Description | Required |
|---|---|---|---|
| bucketDeploy | String | S3 bucket name | true |
| s3Prefix | String | S3 key prefix for uploaded scripts (default: glueJobs/) | false |
| jobs | Array | Array of Glue jobs to deploy | true |

Jobs configuration parameters

| Parameter | Type | Description | Required |
|---|---|---|---|
| name | String | Name of the job | true |
| script | String | Script path in the project | true |
| tempDir | Boolean | Whether the job requires a temp folder; if true, the plugin creates a bucket for temporary files | false |
| type | String | Job type: spark or pythonshell | true |
| glueVersion | String | Language and Glue version to use ([language][version]-[glue version]): python3-1.0, python3-2.0, python2-1.0, python2-0.9, scala2-1.0, scala2-0.9, scala2-2.0 | true |
| role | String | ARN of the role used to execute the job | true |
| MaxConcurrentRuns | Double | Max concurrent runs of the job | false |
| WorkerType | String | Worker type; defaults to Standard | false |
| NumberOfWorkers | Integer | Number of workers | false |
| Connections | String | Database connections (use , to separate multiple connections) | false |
| extraPyFilePaths | String | Python file paths (use , to separate multiple files) | false |
| extraJarPaths | String | Jar file paths (use , to separate multiple files) | false |
| additionalModules | String | Additional Python modules (use , to separate multiple modules) | false |
| sparkUIPath | String | S3 path for the Spark UI | false |
| DefaultArguments | JSON | Key-value pairs passed to the job as default arguments | false |

And now?...

Just run serverless deploy.