@himorishige/noren-core

Author: himorishige · License: MIT · Version: 0.6.4 · TypeScript support: included

Core PII detection, masking, and tokenization library built on Web Standards

pii, privacy, redaction, masking, validation, japanese, web-standards, streams, webcrypto, edge

Fast, lightweight PII detection and masking library built on Web Standards

The core library of the Noren PII protection suite - designed for simplicity, performance, and universal compatibility.

✨ Key Features

  • 🚀 Ultra-lightweight: 124KB bundled size (77% code reduction)
  • High performance: 102K+ ops/sec with pre-compiled patterns
  • 🌐 Web Standards: Works everywhere (Node.js, Edge, Browsers)
  • 🎯 Smart detection: Built-in patterns with confidence scoring
  • 🛡️ Advanced validation: Context-aware false positive filtering with 3 strictness levels
  • 📊 JSON/NDJSON Support: Native structured data detection with key-based matching
  • Prefilter optimization: Fast screening before expensive regex operations
  • 🔒 Enhanced security: HMAC-based tokenization with 32-char minimum key
  • 📦 Zero dependencies: Pure JavaScript, no external deps
  • 🎚️ Confidence scoring: Rule-based detection accuracy control

🚀 Installation

npm install @himorishige/noren-core

📖 Quick Start

Basic Usage

import { Registry, redactText } from '@himorishige/noren-core'

// Create registry with default settings
const registry = new Registry({
  defaultAction: 'mask'
})

// Detect and mask PII
const input = 'Contact: john@company.com, Card: 4242-4242-4242-4242'
const result = await redactText(registry, input)

console.log(result)
// Output: Contact: [REDACTED:email], Card: [REDACTED:credit_card]

With Custom Rules

const registry = new Registry({
  defaultAction: 'mask',
  enableConfidenceScoring: true,     // Enhanced in v0.6.0+
  validationStrictness: 'balanced',  // New in v0.6.0: Advanced validation
  environment: 'production',         // Smart defaults with context-aware filtering
  rules: {
    email: { action: 'mask' },
    credit_card: { action: 'mask', preserveLast4: true }
  }
})

const input = 'Email: user@company.com, Card: 4242-4242-4242-4242'
const result = await redactText(registry, input)
// Output: Email: [REDACTED:email], Card: **** **** **** 4242
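
The preserveLast4 output above can be pictured with a small stand-alone helper. This is only a sketch of the idea: maskCardKeepLast4 is a hypothetical function written for this README, not an export of @himorishige/noren-core, and the real masker may group or pad digits differently.

```javascript
// Hypothetical helper, not part of @himorishige/noren-core.
// Masks every digit except the last four and regroups the result in blocks of four.
function maskCardKeepLast4(card) {
  const digits = card.replace(/\D/g, '')
  // Too short to keep four digits: mask everything.
  if (digits.length < 4) return '*'.repeat(digits.length)
  const masked = '*'.repeat(digits.length - 4) + digits.slice(-4)
  return masked.match(/.{1,4}/g).join(' ')
}

console.log(maskCardKeepLast4('4242-4242-4242-4242'))
// → **** **** **** 4242
```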

Tokenization

const registry = new Registry({
  defaultAction: 'tokenize',
  hmacKey: 'your-secure-32-character-key-here-123456' // Min 32 chars required
})

const input = 'User: alice@company.com'
const result = await redactText(registry, input)
// Output: User: TKN_EMAIL_AbC123XyZ...

// Same input always produces same token
const sameResult = await redactText(registry, input)
// Tokens will be identical

Advanced Validation (v0.6.0+)

Control false positive detection with context-aware validation:

const registry = new Registry({
  defaultAction: 'mask',
  validationStrictness: 'balanced' // 'fast' | 'balanced' | 'strict'
})

// Test data is automatically filtered out in balanced/strict modes
const testInput = 'Test email: test@example.com, Real email: john@company.com'
const result = await redactText(registry, testInput)
// Output: Test email: test@example.com, Real email: [REDACTED:email]

// Different strictness levels:
// - 'fast': No validation (maximum performance)
// - 'balanced': Filter test data and weak contexts (recommended)
// - 'strict': Aggressive filtering with context requirements
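
As a rough illustration of test-data filtering, the sketch below treats email addresses on reserved documentation domains (RFC 2606) as test data unless the strictness is 'fast'. The function names and the rule itself are assumptions for this example. The library's actual heuristics are not described in this README and are likely broader.

```javascript
// Assumption: reserved documentation/test names (RFC 2606) count as test data.
const RESERVED_DOMAINS = ['example.com', 'example.net', 'example.org']
const RESERVED_TLDS = ['test', 'example', 'invalid', 'localhost']

function isReservedEmail(email) {
  const domain = email.slice(email.lastIndexOf('@') + 1).toLowerCase()
  if (RESERVED_DOMAINS.some((d) => domain === d || domain.endsWith('.' + d))) return true
  return RESERVED_TLDS.includes(domain.split('.').pop())
}

// Hypothetical gate: 'fast' skips validation, other levels drop reserved addresses.
function shouldRedactEmail(email, strictness = 'balanced') {
  if (strictness === 'fast') return true
  return !isReservedEmail(email)
}

console.log(shouldRedactEmail('test@example.com'))         // false
console.log(shouldRedactEmail('john@company.com'))         // true
console.log(shouldRedactEmail('test@example.com', 'fast')) // true
```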

🎯 Supported PII Types

Core Package:

Type        | Pattern                              | Example             | Notes
email       | Email addresses                      | john@company.com    | ✓ Unicode support, validation
credit_card | Credit card numbers (Luhn validated) | 4242-4242-4242-4242 | ✓ Brand detection, validation
phone_e164  | International phone numbers          | +1-555-123-4567     | ✓ Format validation
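
The credit_card entry mentions Luhn validation. For readers unfamiliar with it, here is a minimal stand-alone sketch of the Luhn checksum. passesLuhn is a hypothetical name for this example, and the 12 to 19 digit length window is an assumption rather than the library's documented rule.

```javascript
// Hypothetical helper: Luhn checksum over a card number with optional spaces or hyphens.
function passesLuhn(cardNumber) {
  const digits = cardNumber.replace(/[\s-]/g, '')
  // Assumed length window for payment card numbers.
  if (!/^\d{12,19}$/.test(digits)) return false
  let sum = 0
  let double = false
  // Walk from the rightmost digit, doubling every second digit.
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48
    if (double) {
      d *= 2
      if (d > 9) d -= 9
    }
    sum += d
    double = !double
  }
  return sum % 10 === 0
}

console.log(passesLuhn('4242-4242-4242-4242')) // true
console.log(passesLuhn('4242-4242-4242-4241')) // false
```

A checksum like this is what lets a detector reject random 16-digit numbers (order IDs, timestamps) that merely look like card numbers.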

Network Detection (v0.6.0+):

⚠️ Breaking Change: Network PII detection (IPv4/IPv6/MAC) has been moved to a dedicated plugin for better modularity:

npm install @himorishige/noren-plugin-network

import { Registry, redactText } from '@himorishige/noren-core'
import * as networkPlugin from '@himorishige/noren-plugin-network'

const registry = new Registry({ defaultAction: 'mask' })
registry.use(networkPlugin.detectors, networkPlugin.maskers)

// Now IPv4, IPv6, and MAC detection works
const result = await redactText(registry, 'Server: 192.168.1.1, MAC: 00:11:22:33:44:55')
// Output: Server: [REDACTED:ipv4], MAC: [REDACTED:mac]

📊 Stream Processing

For large data processing:

import { Registry, createRedactionTransform } from '@himorishige/noren-core'

const registry = new Registry({ defaultAction: 'mask' })
const transform = createRedactionTransform(registry)

// Process any ReadableStream
const inputStream = new ReadableStream({
  start(controller) {
    controller.enqueue('Data with john@company.com...')
    controller.enqueue('More data with 4242-4242-4242-4242...')
    controller.close()
  }
})

const outputStream = inputStream.pipeThrough(transform)

// Collect results
const reader = outputStream.getReader()
const chunks = []
let done = false

while (!done) {
  const { value, done: readerDone } = await reader.read()
  done = readerDone
  if (value) chunks.push(value)
}

console.log(chunks.join(''))
// Output: Data with [REDACTED:email]...More data with [REDACTED:credit_card]...

🔧 Advanced Configuration

Data Types & Object Processing

Noren processes text strings only. Objects and arrays must be converted to strings before processing:

import { Registry, redactText } from '@himorishige/noren-core'

const registry = new Registry({ defaultAction: 'mask' })

// ❌ This will fail - objects not supported
const badExample = { email: 'user@example.com' }
// await redactText(registry, badExample) // Error: s.normalize is not a function

// ✅ Convert to JSON string first  
const jsonString = JSON.stringify({ email: 'user@company.com', phone: '090-1234-5678' })
const result = await redactText(registry, jsonString)
// Output: {"email":"[REDACTED:email]","phone":"•••-••••-••••"}

// ✅ Custom object processing helper
async function redactObject(registry, obj, options = {}) {
  if (typeof obj === 'string') {
    return await redactText(registry, obj, options)
  }

  if (Array.isArray(obj)) {
    const results = []
    for (const item of obj) {
      results.push(await redactObject(registry, item, options))
    }
    return results
  }

  if (obj && typeof obj === 'object') {
    const result = {}
    for (const [key, value] of Object.entries(obj)) {
      result[key] = await redactObject(registry, value, options)
    }
    return result
  }

  return obj // numbers, booleans, etc. returned as-is
}

// Process complex nested structures
const complexData = {
  user: { email: 'user@company.com', phones: ['090-1111-2222', '03-3333-4444'] },
  messages: ['Contact: admin@company.com', 'Phone: 080-5555-6666']
}

const redacted = await redactObject(registry, complexData, {
  hmacKey: 'your-secure-32-character-key-here-123456'
})
// Output: Nested objects with PII properly masked in string values only

Full-Width Character Support

Noren automatically handles full-width (zenkaku) characters through Unicode NFKC normalization:

const registry = new Registry({ defaultAction: 'mask' })

// Full-width characters are automatically normalized before processing
const fullWidthInput = 'Email: ｕｓｅｒ＠ｅｘａｍｐｌｅ．ｃｏｍ Phone: ０９０－１２３４－５６７８'
const result = await redactText(registry, fullWidthInput)
// Output: Email: [REDACTED:email] Phone: •••-••••-••••

// Detection works the same as half-width equivalents
const halfWidthInput = 'Email: user@company.com Phone: 090-1234-5678'  
const sameResult = await redactText(registry, halfWidthInput)
// Both inputs produce equivalent masking results
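
NFKC normalization is a standard JavaScript string operation, so its effect can be checked without the library. The snippet below only demonstrates String.prototype.normalize. How Noren maps match offsets back to the original text is not covered here.

```javascript
// Full-width Latin letters, digits, punctuation, and an ideographic space (U+3000).
const fullWidth = 'ｕｓｅｒ＠ｅｘａｍｐｌｅ．ｃｏｍ　０９０－１２３４－５６７８'

// NFKC folds compatibility characters into their ASCII equivalents.
const normalized = fullWidth.normalize('NFKC')

console.log(normalized)
// → user@example.com 090-1234-5678
```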

Environment-Aware Processing

const registry = new Registry({
  environment: 'development', // Automatically excludes test patterns
  allowDenyConfig: {
    allowList: ['test@company.com'], // Never treat as PII
    denyList: ['admin@'] // Always treat as PII
  }
})

Performance Tuning

const registry = new Registry({
  enableConfidenceScoring: false, // Disable for maximum performance
  sensitivity: 'relaxed' // Less aggressive detection
})

🌐 Plugin System

Extend functionality with plugins:

// Use plugins for extended functionality
import * as networkPlugin from '@himorishige/noren-plugin-network'
import * as jpPlugin from '@himorishige/noren-plugin-jp'
import * as securityPlugin from '@himorishige/noren-plugin-security'

const registry = new Registry({ defaultAction: 'mask' })

// Add network detection (IPv4/IPv6/MAC)
registry.use(networkPlugin.detectors, networkPlugin.maskers)

// Add Japanese PII detection
registry.use(jpPlugin.detectors, jpPlugin.maskers)

// Add security token detection
registry.use(securityPlugin.detectors, securityPlugin.maskers)

Plugin Validation Integration (v0.6.0+)

Plugins automatically inherit the registry's validation settings:

const registry = new Registry({
  defaultAction: 'mask',
  validationStrictness: 'balanced' // Applies to plugins too
})

registry.use(jpPlugin.detectors, jpPlugin.maskers)

// Plugin detections are validated using the same rules as core detectors
const text = 'テスト電話: 03-1234-5678, 本番電話: 03-9876-5432'
const result = await redactText(registry, text)
// Only real phone numbers are detected, test patterns are filtered out

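Available Plugins

  • @himorishige/noren-plugin-network: Network detection (IPv4/IPv6/MAC)
  • @himorishige/noren-plugin-jp: Japanese PII detection
  • @himorishige/noren-plugin-security: Security token detection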

📊 JSON/Structured Data Processing

Noren v0.5.0+ includes native support for JSON and NDJSON (newline-delimited JSON) processing:

const registry = new Registry({
  defaultAction: 'mask',
  enableJsonDetection: true // Enable structured data processing
})

// JSON object detection
const jsonInput = JSON.stringify({
  user: {
    email: 'admin@company.com',
    phone: '+1-555-123-4567',
    creditCard: '4242-4242-4242-4242'
  }
})

const result = await redactText(registry, jsonInput)
// Detects PII within JSON structure and provides path information

// NDJSON processing
const ndjsonInput = [
  JSON.stringify({ id: 1, email: 'user1@company.com' }),
  JSON.stringify({ id: 2, email: 'user2@company.com' })
].join('\n')

const ndjsonResult = await redactText(registry, ndjsonInput)
// Processes each JSON line independently

JSON Detection Features

  • Key-based detection: Enhanced accuracy using JSON key names as context
  • Path tracking: Provides full JSON path for detected PII (e.g., $.user.email)
  • Nested objects: Recursive detection in deeply nested structures
  • NDJSON support: Line-by-line processing for streaming data
  • Type safety: Validates JSON structure before processing
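
Path tracking and line-by-line NDJSON handling can be sketched with a small recursive walker. The stringLeaves and ndjsonLeaves functions are hypothetical helpers written for this example. They only show how a path such as $.user.email could be derived, not how the library implements detection.

```javascript
// Hypothetical walker: yields { path, value } for every string leaf in a parsed JSON value.
function* stringLeaves(node, path = '$') {
  if (typeof node === 'string') {
    yield { path, value: node }
  } else if (Array.isArray(node)) {
    for (let i = 0; i < node.length; i++) yield* stringLeaves(node[i], `${path}[${i}]`)
  } else if (node && typeof node === 'object') {
    for (const [key, child] of Object.entries(node)) yield* stringLeaves(child, `${path}.${key}`)
  }
}

// NDJSON: each non-empty line is parsed independently and tagged with its line number.
function ndjsonLeaves(ndjson) {
  return ndjson
    .split('\n')
    .filter((line) => line.trim() !== '')
    .flatMap((line, index) =>
      [...stringLeaves(JSON.parse(line))].map((leaf) => ({ line: index + 1, ...leaf }))
    )
}

const sample = [
  JSON.stringify({ id: 1, user: { email: 'user1@company.com' } }),
  JSON.stringify({ id: 2, tags: ['a', 'user2@company.com'] })
].join('\n')

console.log(ndjsonLeaves(sample).map((leaf) => `${leaf.line}:${leaf.path}`))
// → [ '1:$.user.email', '2:$.tags[0]', '2:$.tags[1]' ]
```

Knowing the key path is what makes key-based matching possible: a string found under a key named email can be scored more confidently than the same string in free text.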

🔗 MCP (Model Context Protocol) Integration

Noren provides specialized support for MCP servers that communicate via JSON-RPC over stdio. This is particularly useful for AI tools like Claude Code that need to process communication with external services while protecting sensitive data.

MCP Transform Stream

For real-time stdio processing in MCP servers:

import { 
  Registry, 
  createMCPRedactionTransform,
  redactJsonRpcMessage 
} from '@himorishige/noren-core'

// Create registry with comprehensive PII detection
const registry = new Registry({
  defaultAction: 'mask',
  validationStrictness: 'fast', // Optimized for real-time processing
  enableJsonDetection: true,
  rules: {
    email: { action: 'mask' },
    api_key: { action: 'remove' },
    jwt_token: { action: 'tokenize' }
  },
  hmacKey: 'mcp-server-redaction-key-32-chars-minimum-length-required'
})

// Create MCP-optimized transform stream
const transform = createMCPRedactionTransform({
  registry,
  policy: { defaultAction: 'mask' },
  lineBufferSize: 64 * 1024
})

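// Process stdio communication.
// process.stdin/stdout are Node.js streams, so convert them to Web Streams first.
import { Readable, Writable } from 'node:stream'

await Readable.toWeb(process.stdin)
  .pipeThrough(transform)
  .pipeTo(Writable.toWeb(process.stdout))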
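
The lineBufferSize option suggests that incoming chunks are buffered until complete lines are available, since stdio chunks rarely align with JSON-RPC message boundaries. A minimal sketch of that idea with the standard TransformStream follows. createLineSplitter, its overflow policy, and collectLines are assumptions for illustration and are not taken from the library.

```javascript
// Hypothetical helper: re-chunks arbitrary string chunks into complete lines.
function createLineSplitter(maxBuffer = 64 * 1024) {
  let buffer = ''
  return new TransformStream({
    transform(chunk, controller) {
      buffer += chunk
      const lines = buffer.split('\n')
      buffer = lines.pop() // the trailing partial line stays buffered
      for (const line of lines) controller.enqueue(line)
      if (buffer.length > maxBuffer) {
        // Assumed overflow policy: flush the oversized partial line as-is.
        controller.enqueue(buffer)
        buffer = ''
      }
    },
    flush(controller) {
      // Emit whatever is left when the input ends without a newline.
      if (buffer) controller.enqueue(buffer)
    }
  })
}

// Feeds string chunks through the splitter and collects the emitted lines.
async function collectLines(chunks) {
  const source = new ReadableStream({
    start(controller) {
      for (const c of chunks) controller.enqueue(c)
      controller.close()
    }
  })
  const out = []
  const reader = source.pipeThrough(createLineSplitter()).getReader()
  for (;;) {
    const { value, done } = await reader.read()
    if (done) break
    out.push(value)
  }
  return out
}

collectLines(['{"id":1,"em', 'ail":"a@b.co"}\n{"id":2}\n']).then((lines) => console.log(lines))
// → [ '{"id":1,"email":"a@b.co"}', '{"id":2}' ]
```

Without this step, an email address split across two chunks would never appear whole to a per-chunk detector.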

JSON-RPC Message Processing

For processing individual JSON-RPC messages:

// Process a JSON-RPC request
const request = {
  jsonrpc: '2.0',
  method: 'getUserProfile',
  params: {
    email: 'user@company.com',
    phone: '+1-555-123-4567'
  },
  id: 1
}

const redacted = await redactJsonRpcMessage(request, { registry })
console.log(redacted)
// Output: {
//   jsonrpc: '2.0',
//   method: 'getUserProfile', 
//   params: {
//     email: '[REDACTED:email]',
//     phone: '•••-•••-••••'
//   },
//   id: 1
// }

MCP Server Proxy Example

Create a proxy server that automatically redacts PII from stdio communication:

#!/usr/bin/env node
import { Registry, createMCPRedactionTransform } from '@himorishige/noren-core'
import { Readable, Writable } from 'node:stream'

class MCPRedactionProxy {
  constructor(options = {}) {
    this.registry = new Registry({
      defaultAction: 'mask',
      enableJsonDetection: true,
      validationStrictness: 'fast'
    })
  }

  async start() {
    const inputStream = Readable.toWeb(process.stdin)
    const outputStream = Writable.toWeb(process.stdout)

    const transform = createMCPRedactionTransform({
      registry: this.registry,
      policy: { defaultAction: 'mask' }
    })

    await inputStream
      .pipeThrough(transform)
      .pipeTo(outputStream)
  }
}

// Start the proxy
const proxy = new MCPRedactionProxy()
await proxy.start()

MCP Use Cases

1. AI Assistant Communication

  • Protect user data in interactions with AI assistants such as Claude Code
  • Redact PII from external API communications
  • Safe logging of AI model conversations

2. Development Tools Integration

  • IDE extensions with PII protection
  • Code analysis tools with privacy features
  • Debug logging with automatic data sanitization

3. CI/CD Pipeline Protection

  • Build logs with PII redaction
  • Test data anonymization
  • Environment variable protection

MCP Utilities

The library also provides utility functions for MCP processing:

import {
  parseJsonLines,
  isValidJsonRpcMessage,
  extractSensitiveContent,
  containsJsonRpcPattern,
  getMessageType
} from '@himorishige/noren-core'

// Parse line-delimited JSON messages
const messages = parseJsonLines(ndjsonString)

// Validate JSON-RPC message format
if (isValidJsonRpcMessage(message)) {
  const type = getMessageType(message) // 'request' | 'response' | 'notification' | 'error'
}

// Extract potentially sensitive content
const sensitiveContent = extractSensitiveContent(jsonRpcMessage)
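
For reference, the four message types listed above follow from the JSON-RPC 2.0 message shapes. The classifier below is a hypothetical re-implementation based on that specification. The library's getMessageType and isValidJsonRpcMessage may apply stricter or different rules.

```javascript
// Hypothetical classifier based on JSON-RPC 2.0 message shapes.
// Returns null when the value does not look like a JSON-RPC 2.0 message.
function classifyJsonRpc(message) {
  if (!message || typeof message !== 'object' || message.jsonrpc !== '2.0') return null
  if (typeof message.method === 'string') {
    // A call with an id expects a reply; without one it is a notification.
    return 'id' in message ? 'request' : 'notification'
  }
  if ('error' in message) return 'error'
  if ('result' in message) return 'response'
  return null
}

console.log(classifyJsonRpc({ jsonrpc: '2.0', method: 'getUserProfile', id: 1 })) // request
console.log(classifyJsonRpc({ jsonrpc: '2.0', method: 'log' }))                    // notification
console.log(classifyJsonRpc({ jsonrpc: '2.0', result: {}, id: 1 }))                // response
```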

📚 API Reference

Registry

Main class for PII detection and configuration.

Constructor Options

interface RegistryOptions {
  defaultAction?: 'mask' | 'remove' | 'tokenize'
  rules?: Record<string, { action: Action, preserveLast4?: boolean }>
  hmacKey?: string // Required for tokenization
  environment?: 'production' | 'development' | 'test'
  allowDenyConfig?: AllowDenyConfig
  enableConfidenceScoring?: boolean
  enableJsonDetection?: boolean // v0.5.0+: Enable JSON/NDJSON processing
  sensitivity?: 'strict' | 'balanced' | 'relaxed'
  contextHints?: string[] // Keywords to improve detection
  validationStrictness?: 'fast' | 'balanced' | 'strict' // v0.6.0+: Context validation level
}

Methods

  • use(detectors, maskers, contextHints?): Add plugins
  • detect(text, contextHints?): Detect PII (returns hits)
  • maskerFor(type): Get masker for PII type

redactText(registry, input, overrides?)

Process text and apply redaction rules.

createRedactionTransform(registry, overrides?)

Create transform stream for large data processing.

⚡ Performance

Benchmarks (v0.5.0)

  • Bundle Size: 124KB optimized distribution
  • Processing Speed: 102,229 operations/second (0.0098ms per iteration)
  • Memory Efficiency: Object pooling with automatic cleanup
  • TypeScript Codebase: 1,782 lines (40%+ reduction from v0.4.x)
  • API Surface: 14 exports (65% reduction for better tree-shaking)

Best Practices

  1. Reuse Registry instances - avoid creating new ones frequently
  2. Use streams for large data processing
  3. Disable confidence scoring for maximum performance
  4. Pre-compile patterns by loading plugins at startup

🔒 Security Considerations

HMAC Keys

  • Minimum 32 characters required (enforced in v0.5.0)
  • Store in environment variables, never in code
  • Use different keys per environment
  • Rotate keys regularly
  • Tokens use the URL-safe Base64URL format
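
One way to follow the key-handling advice above is to read the key from the environment and fail fast when it is missing or too short. NOREN_HMAC_KEY and loadHmacKey are hypothetical names chosen for this sketch. The library does not prescribe a variable name.

```javascript
// NOREN_HMAC_KEY is a hypothetical variable name used only in this sketch.
function loadHmacKey(env = process.env) {
  const key = env.NOREN_HMAC_KEY
  if (typeof key !== 'string' || key.length < 32) {
    throw new Error('NOREN_HMAC_KEY must be set to at least 32 characters')
  }
  return key
}

// Usage (assuming the Registry options shown earlier):
// const registry = new Registry({ defaultAction: 'tokenize', hmacKey: loadHmacKey() })
```

Using a separate variable value per environment keeps development tokens from matching production tokens for the same input.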

Memory Safety

  • Automatic object pooling reduces GC pressure
  • Sensitive data is cleared from memory after processing
  • Configurable limits prevent DoS attacks

🛠 Development Tools

For advanced features like benchmarking and A/B testing:

npm install @himorishige/noren-devtools

See @himorishige/noren-devtools for development and testing tools.

🔄 Version History

v0.6.0 (Latest) - Advanced Validation & Architecture Optimization

🚨 Breaking Changes:

  • Network detection separation: IPv4/IPv6/MAC detection moved to @himorishige/noren-plugin-network
  • Smaller core bundle: 35% reduction in core package size by removing network patterns
  • Plugin-based architecture: Better modularity and optional feature loading

🛡️ New Features:

  • Advanced validation system: Context-aware false positive filtering with 3 strictness levels (fast/balanced/strict)
  • Plugin validation integration: Automatic validation for plugin-detected PII types with seamless inheritance
  • 🇯🇵 Enhanced Japanese language support: Specialized validators and expanded context keywords for improved accuracy
  • 📋 Debug utilities: New debugValidation() function for detailed validation analysis
  • ⚡ Performance optimized: Validation adds minimal overhead while significantly reducing false positives
  • 🎯 Context-aware filtering: Smart detection of test data, examples, and weak contexts
  • 🔄 Backward compatible: All existing APIs work without changes (except network detection)

📦 Migration Guide:

// Before v0.6.0 (network detection included)
const result = await redactText(registry, 'IP: 192.168.1.1')

// v0.6.0+ (install the network plugin first, from a shell):
//   npm install @himorishige/noren-plugin-network

import * as networkPlugin from '@himorishige/noren-plugin-network'
registry.use(networkPlugin.detectors, networkPlugin.maskers)
const result = await redactText(registry, 'IP: 192.168.1.1')

v0.5.0 - Performance & Structured Data Support

  • JSON/NDJSON detection: Native support for structured data with key-based matching
  • Prefilter optimization: Fast screening reduces processing time for non-PII text
  • 77% code reduction: Streamlined from 8,153 to 1,782 lines
  • Single-pass detection: Unified pattern matching for better performance
  • Optimized IPv6 parser: 31% size reduction with enhanced validation
  • Streamlined Hit Pool: 47% size reduction with object pooling
  • Reduced API surface: 65% fewer exports for better tree-shaking
  • Enhanced security: Stricter boundaries and improved validation
  • Code quality improvements: Full TypeScript strict mode compliance

v0.4.0 - Confidence Scoring & Advanced Features

  • Added confidence scoring system
  • Environment-aware processing
  • Enhanced HMAC security with 32-character minimum
  • Development tools package separation

📄 License

MIT License - see LICENSE for details.


Part of the Noren PII protection suite