
Package detail

serverless-aws-glue

Author: sanand404 · License: MIT · Version: 0.0.44

Serverless plugin to deploy AWS Glue Jobs

Keywords: serverless, aws, glue, job, gluejob

readme

Serverless Glue

This is a plugin for the Serverless Framework that provides the ability to deploy AWS Glue Jobs.

Install

  1. Run npm install --save-dev serverless-aws-glue
  2. Add serverless-aws-glue to the plugins section of serverless.yml:
     plugins:
         - serverless-aws-glue
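Putting both steps together, a minimal serverless.yml might look like this (the service name, region, bucket, and role below are placeholders, not values the plugin requires):

```yml
service: my-glue-service        # placeholder service name

provider:
  name: aws
  region: us-east-1             # placeholder region

plugins:
  - serverless-aws-glue

custom:
  Glue:
    bucketDeploy: someBucket    # bucket the plugin uploads job scripts to
    jobs:
      - job:
          name: super-glue-job
          script: src/glueJobs/test-job.py
          type: spark
          glueVersion: python3-2.0
          role: arn:aws:iam::000000000:role/someRole
```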

How it works

The plugin creates CloudFormation resources from your configuration and adds them to the serverless template before the serverless deploy runs.

So any Glue job deployed with this plugin is part of your stack too.
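As a rough illustration, a job configured with this plugin corresponds to an AWS::Glue::Job resource in the generated stack. The sketch below shows the general shape only; the logical ID and the exact set of properties the plugin emits may differ:

```yml
Resources:
  SuperGlueJob:                   # hypothetical logical ID derived from the job name
    Type: AWS::Glue::Job
    Properties:
      Name: super-glue-job
      Role: arn:aws:iam::000000000:role/someRole
      GlueVersion: "2.0"
      Command:
        Name: glueetl             # glueetl for spark jobs, pythonshell for pythonshell jobs
        PythonVersion: "3"
        ScriptLocation: s3://someBucket/glueJobs/test-job.py
      ExecutionProperty:
        MaxConcurrentRuns: 3
```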

How to configure your Glue jobs

Configure your Glue jobs in the custom section like this:

custom:
  Glue:
    bucketDeploy: someBucket # Required
    s3Prefix: some/s3/key/location/ # optional, default = 'glueJobs/'
    jobs:
      - job:
          name: super-glue-job # Required
          script: src/glueJobs/test-job.py # Required. The file is uploaded to the s3Prefix location under its base name (the part after the last '/')
          tempDir: true # Optional true | false
          type: spark # spark / pythonshell # Required
          glueVersion: python3-2.0 # Required python3-1.0 | python3-2.0 | python2-1.0 | python2-0.9 | scala2-1.0 | scala2-0.9 | scala2-2.0 
          role: arn:aws:iam::000000000:role/someRole # Required
          MaxConcurrentRuns: 3 # Optional
          WorkerType: Standard  # Optional  | Standard  | G1.X | G2.X
          NumberOfWorkers: 1 # Optional
          Connections: "RDS-MySQL5.7-Connection1,RDS-MySQL5.7-Connection2" # Optional
          extraPyFilePaths: "/path/to/file1.py,/path/to/file2.py" # Optional
          extraJarPaths: "/path/to/file1.jar,/path/to/file2.jar" # Optional
          additionalModules: "mysql-connector-python==8.0.5,pymongo==3.11.4" # Optional
          sparkUIPath: "s3://path" # Optional
          DefaultArguments: # Optional
            stage: "dev"
            table_name: "test"
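Inside the job script, the DefaultArguments above arrive as --key value command-line pairs. In a Spark job you would normally read them with awsglue.utils.getResolvedOptions; the sketch below uses plain argparse instead (suitable for a pythonshell job), with the stage and table_name keys taken from the example above:

```python
import argparse


def parse_job_args(argv):
    """Parse Glue DefaultArguments passed to the script as --key value pairs.

    The stage and table_name keys mirror the DefaultArguments example above;
    your own jobs may define different keys.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--stage")
    parser.add_argument("--table_name")
    # Glue also passes its own internal arguments, so ignore unknown flags.
    args, _unknown = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    import sys

    args = parse_job_args(sys.argv[1:])
    print(f"Running in stage={args.stage} against table={args.table_name}")
```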

You can define multiple jobs:

custom:
  Glue:
    bucketDeploy: someBucket
    jobs:
      - job:
          ...
      - job:
          ...

Glue configuration parameters

| Parameter | Type | Description | Required |
|---|---|---|---|
| bucketDeploy | String | S3 bucket name | true |
| s3Prefix | String | S3 key prefix for uploaded scripts (default: glueJobs/) | false |
| jobs | Array | Array of Glue jobs to deploy | true |

Jobs configuration parameters

| Parameter | Type | Description | Required |
|---|---|---|---|
| name | String | Name of the job | true |
| script | String | Script path in the project | true |
| tempDir | Boolean | Whether the job requires a temp folder; if true, the plugin creates a bucket for temporary files | false |
| type | String | Job type: spark or pythonshell | true |
| glueVersion | String | Language and Glue version to use ([language][version]-[glue version]): python3-1.0, python3-2.0, python2-1.0, python2-0.9, scala2-1.0, scala2-0.9, scala2-2.0 | true |
| role | String | ARN of the role used to execute the job | true |
| MaxConcurrentRuns | Double | Max concurrent runs of the job | false |
| WorkerType | String | Worker type; defaults to Standard | false |
| NumberOfWorkers | Integer | Number of workers | false |
| Connections | String | Database connections (use , to separate multiple connections) | false |
| extraPyFilePaths | String | Python file paths (use , to separate multiple files) | false |
| extraJarPaths | String | Jar file paths (use , to separate multiple files) | false |
| additionalModules | String | Additional Python modules (use , to separate multiple modules) | false |
| sparkUIPath | String | S3 path for the Spark UI | false |
| DefaultArguments | JSON | Key-value pairs passed to the job as default arguments | false |

And now?...

Just run serverless deploy.