@turbodocx/html-to-docx

turbodocx · 11.8k · MIT · 1.13.4 · TypeScript support: included

HTML to DOCX converter

TurboDocx

html-to-docx

@turbodocx/html-to-docx is a powerful JavaScript library designed to convert HTML documents to DOCX format, compatible with Microsoft Word 2007+, LibreOffice Writer, Google Docs, WPS Writer, and other word processors. Inspired by @PrivateOmega, this is supported by TurboDocx to ensure ongoing development and improvements.

Disclaimer

While @turbodocx/html-to-docx is robust and used in production environments, it is continually evolving. Please ensure it meets your specific needs through thorough testing. Note that it currently does not work directly in the browser.

Installation

Use npm to install the package.

npm install @turbodocx/html-to-docx

TypeScript Support

This package includes TypeScript typings. No additional installation is required to use it with TypeScript projects.

TypeScript Example

import HtmlToDocx from "@turbodocx/html-to-docx";

const htmlString = `<!DOCTYPE html>
    <html lang="en">
        <head>
            <meta charset="UTF-8" />
            <title>Document</title>
        </head>
        <body>
            <h1>Hello world</h1>
        </body>
    </html>`;

// Basic usage
async function basicExample() {
  const docx = await HtmlToDocx(htmlString);
  // docx is ArrayBuffer in Node.js or Blob in browser environments
}

// With header
async function withHeader() {
  const headerHtml = "<p>Document Header</p>";
  const docx = await HtmlToDocx(htmlString, headerHtml);
}

// With document options
async function withOptions() {
  const docx = await HtmlToDocx(htmlString, null, {
    orientation: "landscape",
    title: "TypeScript Example",
    creator: "TurboDocx",
    table: {
      row: {
        cantSplit: true,
      },
      borderOptions: {
        size: 1,
        color: "000000"
      }
    },
    pageNumber: true,
    footer: true
  });
}

// With all parameters
async function complete() {
  const headerHtml = "<p>Document Header</p>";
  const footerHtml = "<p>Page Footer</p>";

  const docx = await HtmlToDocx(
    htmlString,
    headerHtml,
    {
      orientation: "landscape",
      pageSize: {
        width: 12240,
        height: 15840
      },
      margins: {
        top: 1440,
        right: 1800,
        bottom: 1440,
        left: 1800
      },
      title: "Complete Example",
      creator: "TurboDocx",
    },
    footerHtml
  );
}

For more comprehensive TypeScript examples, check out the following files in the example/typescript directory:

  • typescript-example.ts - A complete example showing how to generate and save DOCX files using TypeScript
  • type-test.ts - Demonstrates the type checking capabilities provided by the TypeScript definitions

Running the TypeScript Examples

To run the TypeScript examples:

# Navigate to the example directory
cd example/typescript

# Install ts-node globally (if not already installed)
npm install -g ts-node typescript

# Ensure @turbodocx/html-to-docx is built and accessible
# From the root directory of the project:
# npm install
# npm run build

# Run the TypeScript example directly
ts-node typescript-example.ts

This will generate two DOCX files in the example/typescript directory:

  • basic-example.docx - A simple document with minimal configuration
  • advanced-example.docx - A document with headers, footers, and advanced formatting options

Usage

await HTMLtoDOCX(htmlString, headerHTMLString, documentOptions, footerHTMLString)

Full-fledged examples can be found under example/.

Parameters

  • htmlString <String> clean html string equivalent of document content.
  • headerHTMLString <String> clean html string equivalent of header. Defaults to <p></p> if header flag is true.
  • documentOptions <?Object>
    • orientation <"portrait"|"landscape"> defines the general orientation of the document. Defaults to portrait.
    • pageSize <?Object> Defaults to U.S. letter portrait orientation.
      • width <Number> width of the page for all pages in this section in TWIP. Defaults to 12240. Maximum 31680. Supports equivalent measurement in pixel, cm or inch.
      • height <Number> height of the page for all pages in this section in TWIP. Defaults to 15840. Maximum 31680. Supports equivalent measurement in pixel, cm or inch.
    • margins <?Object>
      • top <Number> distance between the top of the text margins for the main document and the top of the page for all pages in this section in TWIP. Defaults to 1440. Supports equivalent measurement in pixel, cm or inch.
      • right <Number> distance between the right edge of the page and the right edge of the text extents for this document in TWIP. Defaults to 1800. Supports equivalent measurement in pixel, cm or inch.
      • bottom <Number> distance between the bottom of text margins for the document and the bottom of the page in TWIP. Defaults to 1440. Supports equivalent measurement in pixel, cm or inch.
      • left <Number> distance between the left edge of the page and the left edge of the text extents for this document in TWIP. Defaults to 1800. Supports equivalent measurement in pixel, cm or inch.
      • header <Number> distance from the top edge of the page to the top edge of the header in TWIP. Defaults to 720. Supports equivalent measurement in pixel, cm or inch.
      • footer <Number> distance from the bottom edge of the page to the bottom edge of the footer in TWIP. Defaults to 720. Supports equivalent measurement in pixel, cm or inch.
      • gutter <Number> amount of extra space added to the specified margin, above any existing margin values. This setting is typically used when a document is being created for binding in TWIP. Defaults to 0. Supports equivalent measurement in pixel, cm or inch.
    • title <?String> title of the document.
    • subject <?String> subject of the document.
    • creator <?String> creator of the document. Defaults to html-to-docx.
    • keywords <?Array<String>> keywords associated with the document. Defaults to ['html-to-docx'].
    • description <?String> description of the document.
    • lastModifiedBy <?String> last modifier of the document. Defaults to html-to-docx.
    • revision <?Number> revision of the document. Defaults to 1.
    • createdAt <?Date> time of creation of the document. Defaults to current time.
    • modifiedAt <?Date> time of last modification of the document. Defaults to current time.
    • headerType <"default"|"first"|"even"> type of header. Defaults to default.
    • header <?Boolean> flag to enable header. Defaults to false.
    • footerType <"default"|"first"|"even"> type of footer. Defaults to default.
    • footer <?Boolean> flag to enable footer. Defaults to false.
    • font <?String> font name to be used. Defaults to Times New Roman.
    • fontSize <?Number> size of the font in HIP (half-points). Defaults to 22. Supports equivalent measure in pt.
    • complexScriptFontSize <?Number> size of the complex script font in HIP (half-points). Defaults to 22. Supports equivalent measure in pt.
    • table <?Object>
      • row <?Object>
        • cantSplit <?Boolean> flag to prevent a table row from splitting across pages. Defaults to false.
      • borderOptions <?Object>
        • size <?Number> denotes the border size. Defaults to 0.
        • stroke <?String> denotes the border stroke style. Defaults to nil.
        • color <?String> determines the border color. Defaults to 000000.
      • addSpacingAfter <?Boolean> flag to add an empty paragraph after tables for spacing. Defaults to true.
    • pageNumber <?Boolean> flag to enable page number in footer. Defaults to false. Page number works only if footer flag is set as true.
    • skipFirstHeaderFooter <?Boolean> flag to skip first page header and footer. Defaults to false.
    • lineNumber <?Boolean> flag to enable line numbering. Defaults to false.
    • lineNumberOptions <?Object>
      • start <Number> starting line number minus 1. Defaults to 0.
      • countBy <Number> interval at which line numbers are shown, i.e. the number of lines skipped in between plus 1. Defaults to 1.
      • restart <"continuous"|"newPage"|"newSection"> numbering restart strategy. Defaults to continuous.
    • numbering <?Object>
      • defaultOrderedListStyleType <?String> default ordered list style type. Defaults to decimal.
    • decodeUnicode <?Boolean> flag to enable unicode decoding of header, body and footer. Defaults to false.
    • lang <?String> language localization code for spell checker to work properly. Defaults to en-US.
    • preProcessing <?Object>
      • skipHTMLMinify <?Boolean> flag to skip minification of HTML. Defaults to false.
  • footerHTMLString <String> clean html string equivalent of footer. Defaults to <p></p> if footer flag is true.
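
The TWIP and HIP units used in the list above relate to more familiar ones as follows: one inch is 1440 TWIP, one point is 20 TWIP, and one point is 2 half-points. The following is a minimal sketch, not part of the library, that converts inches, centimetres and points and assembles a documentOptions object for an A4 page. The helper names (inchesToTwip, cmToTwip, ptToHip, a4Options) and the local DocumentOptionsSketch type are hypothetical; only the option field names, defaults and the 31680 maximum come from the parameter list.

```typescript
// Hypothetical helpers; not exported by @turbodocx/html-to-docx.
const TWIP_PER_INCH = 1440; // 1 inch = 72 pt, 1 pt = 20 TWIP
const CM_PER_INCH = 2.54;
const MAX_PAGE_TWIP = 31680; // documented maximum for pageSize width/height

function inchesToTwip(inches: number): number {
  return Math.round(inches * TWIP_PER_INCH);
}

function cmToTwip(cm: number): number {
  return Math.round((cm / CM_PER_INCH) * TWIP_PER_INCH);
}

// Font sizes are given in half-points (HIP): 11 pt -> 22.
function ptToHip(pt: number): number {
  return Math.round(pt * 2);
}

// Local type for illustration only; mirrors a subset of the documented fields.
interface DocumentOptionsSketch {
  orientation: "portrait" | "landscape";
  pageSize: { width: number; height: number };
  margins: { top: number; right: number; bottom: number; left: number };
  fontSize: number;
  footer: boolean;
  pageNumber: boolean;
}

// Builds options for an A4 portrait page with 1-inch margins and an 11 pt font.
function a4Options(): DocumentOptionsSketch {
  const width = cmToTwip(21);
  const height = cmToTwip(29.7);
  if (width > MAX_PAGE_TWIP || height > MAX_PAGE_TWIP) {
    throw new RangeError("page size exceeds 31680 TWIP");
  }
  const oneInch = inchesToTwip(1);
  return {
    orientation: "portrait",
    pageSize: { width, height },
    margins: { top: oneInch, right: oneInch, bottom: oneInch, left: oneInch },
    fontSize: ptToHip(11),
    footer: true, // pageNumber only takes effect when footer is true
    pageNumber: true,
  };
}

const options = a4Options();
```

An object like this would be passed as the third argument, as in HtmlToDocx(htmlString, null, options). Because the parameter list states that pixel, cm and inch equivalents are supported, such conversion may not be needed in practice; the sketch only makes the unit relationships explicit.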

Returns

<Promise<Buffer|Blob>>

Notes

Currently, a page break can be inserted with a div that has the class name "page-break" or the style "page-break-after", regardless of the value given to "page-break-after". Any contents inside the div element are ignored. For example: <div class="page-break" style="page-break-after: always;"></div>
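
As an illustration of the page-break note, the following sketch joins several HTML fragments with that div so that each fragment would start on a new page. The helper joinWithPageBreaks and the constant PAGE_BREAK are hypothetical names, not part of the library.

```typescript
// Hypothetical helper; not part of @turbodocx/html-to-docx.
// The marker is the div from the note above; its contents would be ignored.
const PAGE_BREAK =
  '<div class="page-break" style="page-break-after: always;"></div>';

// Joins HTML fragments, placing one page-break marker between each pair.
function joinWithPageBreaks(sections: string[]): string {
  return sections.join(PAGE_BREAK);
}

const body = joinWithPageBreaks([
  "<h1>Chapter 1</h1><p>First page.</p>",
  "<h1>Chapter 2</h1><p>Second page.</p>",
]);
```

The resulting string would be used as the htmlString argument.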

CSS list-style-type values for <ol> elements are now supported. Just do something like this in the HTML:

  <ol style="list-style-type:lower-alpha;">
    <li>List item</li>
    ...
  </ol>

List of supported list-style-types:

  • upper-alpha, will result in A. List item
  • lower-alpha, will result in a. List item
  • upper-roman, will result in I. List item
  • lower-roman, will result in i. List item
  • lower-alpha-bracket-end, will result in a) List item
  • decimal-bracket-end, will result in 1) List item
  • decimal-bracket, will result in (1) List item
  • decimal, (the default) will result in 1. List item

You can also add the attribute data-start="n" to start the numbering from the n-th item.

For example, <ol data-start="2"> starts the numbering at the second marker (B., b., II., ii. or 2., depending on the style).
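
The list markup described above can be generated programmatically. The following is a minimal sketch under the assumption that the caller supplies already-escaped item HTML; the orderedList helper and the OrderedListStyle type are hypothetical, and only the style names and the data-start attribute come from the notes.

```typescript
// The style names are the ones listed in the notes; the helper is hypothetical.
type OrderedListStyle =
  | "upper-alpha"
  | "lower-alpha"
  | "upper-roman"
  | "lower-roman"
  | "lower-alpha-bracket-end"
  | "decimal-bracket-end"
  | "decimal-bracket"
  | "decimal";

// Builds an <ol> with the given list-style-type and an optional data-start.
// Items are inserted as-is, so they must already be valid, escaped HTML.
function orderedList(
  items: string[],
  style: OrderedListStyle = "decimal",
  start?: number
): string {
  const startAttr = start === undefined ? "" : ` data-start="${start}"`;
  const lis = items.map((item) => `<li>${item}</li>`).join("");
  return `<ol style="list-style-type:${style};"${startAttr}>${lis}</ol>`;
}

const html = orderedList(["Second point", "Third point"], "lower-roman", 2);
```

Restricting the style argument to a union type lets the compiler reject list-style-type values that are not in the supported list.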

Font family does not work consistently across word processors:

  • Word Desktop works as intended
  • LibreOffice ignores the fontTable.xml file, and finds a font by itself
  • Word Online ignores the fontTable.xml file, and finds closest font in their font library

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please create new branches off develop when contributing.

Support

Proudly Sponsored by TurboDocx

License

MIT

Contributors

Made with contrib.rocks.

Changelog

All notable changes to this project will be documented in this file. See standard-version for commit guidelines.

1.1.2 (2020-05-29)

Features

  • packaging: added jszip for packaging (89619ec)
  • packaging: added method to create container (9808cf2)
  • abstracted conversion using docxDocument class (c625a01)
  • template: added base docx template (abdb87b)
  • added builder methods for images (9e2720f)
  • added document file render helper (6dd9c3a)
  • added escape-html (1a231d5)
  • added header generation (25fb44f)
  • added hyperlinks support (3560ce9)
  • added method to archive images with other files (b6da74b)
  • added more xml builder methods (ffc584b)
  • added more xml statement builder methods (337e530)
  • added text formatting to paragraph (bacd888)
  • added vdom to xml method (8b5a618)
  • added virtual-dom and html-to-vdom (feaa396)
  • added xbuilder (f13b5cc)
  • added xml builder methods for images (f413ad8)
  • added xml statement builder helper (5e23c16)
  • enabling header on flag (516463c)
  • handle line breaks (164c0f5)
  • template: added numbering schema (d179d73)
  • template: added styles schema (d83d230)
  • template: added XML schemas (42232da)

Bug Fixes

  • added attributes to anchor drawing (62e4a29)
  • added default options (4590800)
  • added effectextent and srcrect fragment (5f5e975)
  • added extent fragment (7ce81f2)
  • added header override in content-types xml (5de681b)
  • added image conversion handler (f726e71)
  • added inline attributes (0a4d2ce)
  • added italics, underline and bold in runproperties (34c2e18)
  • added more namespaces (68636b4)
  • added namespace aliases to header and numbering xmls (d0b4101)
  • added numbering and styles relationship (c7e29af)
  • added other namespaces to the xml root (afbbca9)
  • added override for relationship (30acddc)
  • added override for settings and websettings (977af04)
  • added overrides for relationships (22b9cac)
  • added padding between image and wrapping text (e45fbf5)
  • added positioning fragments (e6f7e1c)
  • added required attributes to anchor fragment (d01c9f9)
  • added settings and websettings relation (34aeedc)
  • added settings and websettings to ooxml package (6c829b5)
  • added simple positioning to anchor (5006cc4)
  • added table borders (12864db)
  • added wrap elements (c951688)
  • changed attribute field for picture name (aef241d)
  • changed attribute used for name (3885233)
  • changed default namespace of relationship to solve render issue (56a3554)
  • changed file extension if octet stream is encountered (32c5bf1)
  • changed namespaces to original ecma 376 spec (51be86e)
  • fix table render issue due to grid width (636d499)
  • fixed abstract numbering id (9814cb8)
  • fixed coloring and refactored other text formatting (c288f80)
  • fixed document rels and numbering bug (d6e3152)
  • fixed docx generation (3d96acf)
  • fixed incorrect table row generation (742dd18)
  • fixed internal mode and added extensions (1266121)
  • fixed margin issues (f841b76)
  • fixed numbering and header issue due to wrong filename (64a04bc)
  • fixed table and image rendering (c153092)
  • handled figure wrapper for images and tables (4182a95)
  • handled table width (237ddfd)
  • handling multiple span children and multilevel formatting of text (4c81f58)
  • modified example to use esm bundle (491a83d)
  • moved namespaces into separate file (75cdf30)
  • namespace updated to 2016 standards (6fc2ac2)
  • template: fixed document templating (5f6a74f)
  • template: fixed numbering templating (8b09691)
  • template: removed word xml schema (ee0e1ed)
  • removed unwanted attribute (f3caf44)
  • renamed document rels schema file (10c3fda)
  • updated document abstraction to track generation ids (c34810f)
  • updated documentrels xml generation (433e4b4)
  • updated numbering xml generation (81b7a82)
  • updated xml builder to use namespace and child nodes (2e28b5e)
  • wrapped drawing inside paragraph tag (d0476b4)

1.1.1 (2020-05-28)

Bug Fixes

  • modified example to use esm bundle (dcd7f4b)

1.1.0 (2020-05-28)

Features

  • packaging: added jszip for packaging (89619ec)
  • packaging: added method to create container (9808cf2)
  • template: added base docx template (abdb87b)
  • template: added numbering schema (d179d73)
  • template: added styles schema (d83d230)
  • abstracted conversion using docxDocument class (c625a01)
  • added builder methods for images (9e2720f)
  • added document file render helper (6dd9c3a)
  • added escape-html (1a231d5)
  • added header generation (25fb44f)
  • added hyperlinks support (3560ce9)
  • added method to archive images with other files (b6da74b)
  • added more xml builder methods (ffc584b)
  • added more xml statement builder methods (337e530)
  • added text formatting to paragraph (bacd888)
  • added vdom to xml method (8b5a618)
  • added virtual-dom and html-to-vdom (feaa396)
  • added xbuilder (f13b5cc)
  • added xml builder methods for images (f413ad8)
  • added xml statement builder helper (5e23c16)
  • handle line breaks (164c0f5)
  • template: added XML schemas (42232da)

Bug Fixes

  • added attributes to anchor drawing (62e4a29)
  • added effectextent and srcrect fragment (5f5e975)
  • added extent fragment (7ce81f2)
  • added header override in content-types xml (5de681b)
  • added image conversion handler (f726e71)
  • added inline attributes (0a4d2ce)
  • added italics, underline and bold in runproperties (34c2e18)
  • added more namespaces (68636b4)
  • added namespace aliases to header and numbering xmls (d0b4101)
  • added numbering and styles relationship (c7e29af)
  • added other namespaces to the xml root (afbbca9)
  • added override for relationship (30acddc)
  • added override for settings and websettings (977af04)
  • added overrides for relationships (22b9cac)
  • added padding between image and wrapping text (e45fbf5)
  • added positioning fragments (e6f7e1c)
  • added required attributes to anchor fragment (d01c9f9)
  • added settings and websettings relation (34aeedc)
  • added settings and websettings to ooxml package (6c829b5)
  • added simple positioning to anchor (5006cc4)
  • added table borders (12864db)
  • added wrap elements (c951688)
  • changed attribute field for picture name (aef241d)
  • changed attribute used for name (3885233)
  • changed default namespace of relationship to solve render issue (56a3554)
  • changed file extension if octet stream is encountered (32c5bf1)
  • changed namespaces to original ecma 376 spec (51be86e)
  • fix table render issue due to grid width (636d499)
  • fixed abstract numbering id (9814cb8)
  • fixed coloring and refactored other text formatting (c288f80)
  • fixed document rels and numbering bug (d6e3152)
  • fixed docx generation (3d96acf)
  • fixed incorrect table row generation (742dd18)
  • fixed internal mode and added extensions (1266121)
  • fixed margin issues (f841b76)
  • fixed numbering and header issue due to wrong filename (64a04bc)
  • fixed table and image rendering (c153092)
  • handled figure wrapper for images and tables (4182a95)
  • handled table width (237ddfd)
  • handling multiple span children and multilevel formatting of text (4c81f58)
  • moved namespaces into separate file (75cdf30)
  • namespace updated to 2016 standards (6fc2ac2)
  • removed unwanted attribute (f3caf44)
  • renamed document rels schema file (10c3fda)
  • updated document abstraction to track generation ids (c34810f)
  • template: fixed document templating (5f6a74f)
  • template: fixed numbering templating (8b09691)
  • updated documentrels xml generation (433e4b4)
  • updated numbering xml generation (81b7a82)
  • updated xml builder to use namespace and child nodes (2e28b5e)
  • wrapped drawing inside paragraph tag (d0476b4)
  • template: removed word xml schema (ee0e1ed)