@himorishige/noren-core
Fast, lightweight PII detection and masking library built on Web Standards
The core library of the Noren PII protection suite, designed for simplicity, performance, and universal compatibility.
✨ Key Features
- 🚀 Ultra-lightweight: 124KB bundled size (77% code reduction)
- ⚡ High performance: 102K+ ops/sec with pre-compiled patterns
- 🌐 Web Standards: Runs in Node.js, edge runtimes, and browsers
- 🎯 Smart detection: Built-in patterns with confidence scoring
- 🛡️ Advanced validation: Context-aware false positive filtering with 3 strictness levels
- 📊 JSON/NDJSON support: Native structured data detection with key-based matching
- ⚡ Prefilter optimization: Fast screening before expensive regex operations
- 🔒 Enhanced security: HMAC-based tokenization with 32-char minimum key
- 📦 Zero dependencies: Pure JavaScript, no external deps
- 🎚️ Confidence scoring: Rule-based detection accuracy control
🚀 Installation
npm install @himorishige/noren-core
📖 Quick Start
Basic Usage
import { Registry, redactText } from '@himorishige/noren-core'
// Create registry with default settings
const registry = new Registry({
defaultAction: 'mask'
})
// Detect and mask PII
const input = 'Contact: john@company.com, Card: 4242-4242-4242-4242'
const result = await redactText(registry, input)
console.log(result)
// Output: Contact: [REDACTED:email], Card: [REDACTED:credit_card]
With Custom Rules
const registry = new Registry({
defaultAction: 'mask',
enableConfidenceScoring: true, // Enhanced in v0.6.0+
validationStrictness: 'balanced', // New in v0.6.0: Advanced validation
environment: 'production', // Smart defaults with context-aware filtering
rules: {
email: { action: 'mask' },
credit_card: { action: 'mask', preserveLast4: true }
}
})
const input = 'Email: user@company.com, Card: 4242-4242-4242-4242'
const result = await redactText(registry, input)
// Output: Email: [REDACTED:email], Card: **** **** **** 4242
Tokenization
const registry = new Registry({
defaultAction: 'tokenize',
hmacKey: 'your-secure-32-character-key-here-123456' // Min 32 chars; load from an environment variable in production
})
const input = 'User: alice@company.com'
const result = await redactText(registry, input)
// Output: User: TKN_EMAIL_AbC123XyZ...
// The same input and key always produce the same token
const sameResult = await redactText(registry, input)
// Tokens will be identical
Advanced Validation (v0.6.0+)
Control false positive detection with context-aware validation:
const registry = new Registry({
defaultAction: 'mask',
validationStrictness: 'balanced' // 'fast' | 'balanced' | 'strict'
})
// Test data is automatically filtered out in balanced/strict modes
const testInput = 'Test email: test@example.com, Real email: john@company.com'
const result = await redactText(registry, testInput)
// Output: Test email: test@example.com, Real email: [REDACTED:email]
// Different strictness levels:
// - 'fast': No validation (maximum performance)
// - 'balanced': Filter test data and weak contexts (recommended)
// - 'strict': Aggressive filtering with context requirements
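One widely used signal for test data is the set of domain names reserved by RFC 2606 for documentation and testing: `example.com`, `example.net`, `example.org`, and the `.test`, `.example`, `.invalid`, and `.localhost` TLDs. The sketch below shows that heuristic in isolation. `isLikelyTestEmail`, `redactRealEmails`, and the `test`/`dummy`/`sample` local-part rule are hypothetical; Noren's actual `balanced`/`strict` rules are not spelled out here and may differ.

```javascript
// Hypothetical test-data heuristic, not Noren's validator.
// Addresses on RFC 2606 reserved domains, or with an obvious "test"
// local part, are treated as test data and left unmasked.
const RESERVED_SECOND_LEVEL = new Set(['example.com', 'example.net', 'example.org'])
const RESERVED_TLDS = new Set(['test', 'example', 'invalid', 'localhost'])

function isLikelyTestEmail(email) {
  const at = email.lastIndexOf('@')
  if (at < 1) return false
  const local = email.slice(0, at).toLowerCase()
  const labels = email.slice(at + 1).toLowerCase().split('.')
  const tld = labels[labels.length - 1]
  const lastTwo = labels.slice(-2).join('.')
  // Subdomains of reserved names (e.g. mail.example.org) count as well
  if (RESERVED_TLDS.has(tld) || RESERVED_SECOND_LEVEL.has(lastTwo)) return true
  return /^(test|dummy|sample)\b/.test(local)
}

// Mask only addresses that do not look like test data
function redactRealEmails(text) {
  return text.replace(/[^\s@,]+@[^\s@,]+\.[a-z]{2,}/gi, (match) =>
    isLikelyTestEmail(match) ? match : '[REDACTED:email]'
  )
}
```

Applied to the sample input above, `redactRealEmails` leaves `test@example.com` alone and masks `john@company.com`, matching the documented `balanced` output.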
🎯 Supported PII Types
Core Package:
| Type | Pattern | Example | Notes |
|---|---|---|---|
| `email` | Email addresses | `john@company.com` | ✓ Unicode support, validation |
| `credit_card` | Credit card numbers (Luhn validated) | `4242-4242-4242-4242` | ✓ Brand detection, validation |
| `phone_e164` | International phone numbers | `+1-555-123-4567` | ✓ Format validation |
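The table notes that credit card matches are Luhn validated. For reference, the standard Luhn (mod 10) check is sketched below. This is a generic implementation, not Noren's source; the 12 to 19 digit length bound and the stripping of spaces and hyphens are assumptions of the sketch.

```javascript
// Generic Luhn (mod 10) checksum sketch.
// Spaces and hyphens are ignored; the 12-19 digit bound is an assumption.
function luhnValid(input) {
  const digits = input.replace(/[\s-]/g, '')
  if (!/^\d{12,19}$/.test(digits)) return false
  let sum = 0
  let double = false
  // Walk from the rightmost digit, doubling every second digit
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48
    if (double) {
      d *= 2
      if (d > 9) d -= 9 // same as summing the two digits of d
    }
    sum += d
    double = !double
  }
  return sum % 10 === 0
}
```

`luhnValid('4242-4242-4242-4242')` returns `true`, while changing the last digit to `1` makes it `false`. A checksum like this is what lets a detector reject random 16-digit numbers that merely look like card numbers.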
Network Detection (v0.6.0+):
⚠️ Breaking Change: Network PII detection (IPv4/IPv6/MAC) has been moved to a dedicated plugin for better modularity:
npm install @himorishige/noren-plugin-network
import { Registry, redactText } from '@himorishige/noren-core'
import * as networkPlugin from '@himorishige/noren-plugin-network'
const registry = new Registry({ defaultAction: 'mask' })
registry.use(networkPlugin.detectors, networkPlugin.maskers)
// Now IPv4, IPv6, and MAC detection works
const result = await redactText(registry, 'Server: 192.168.1.1, MAC: 00:11:22:33:44:55')
// Output: Server: [REDACTED:ipv4], MAC: [REDACTED:mac]
📊 Stream Processing
For large data processing:
import { Registry, createRedactionTransform } from '@himorishige/noren-core'
const registry = new Registry({ defaultAction: 'mask' })
const transform = createRedactionTransform(registry)
// Process any ReadableStream
const inputStream = new ReadableStream({
start(controller) {
controller.enqueue('Data with john@company.com...')
controller.enqueue('More data with 4242-4242-4242-4242...')
controller.close()
}
})
const outputStream = inputStream.pipeThrough(transform)
// Collect results
const reader = outputStream.getReader()
const chunks = []
let done = false
while (!done) {
const { value, done: readerDone } = await reader.read()
done = readerDone
if (value) chunks.push(value)
}
console.log(chunks.join(''))
// Output: Data with [REDACTED:email]...More data with [REDACTED:credit_card]...
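A stream redactor has to cope with PII that straddles chunk boundaries: if `john@com` and `pany.com` arrive in separate chunks, a per-chunk regex misses the address. One common technique is to buffer text until a newline and redact only complete lines. The sketch below shows that technique with a plain `TransformStream`. `createLineRedactor`, `maskEmails`, and `collect` are hypothetical helpers, and whether `createRedactionTransform` buffers this way is not documented here.

```javascript
// Hypothetical line-buffered redaction stream (not Noren's implementation).
// Text is held until a newline arrives, so a match split across chunks is
// still seen whole; any remainder is redacted when the stream closes.
function createLineRedactor(redactLine) {
  let buffer = ''
  return new TransformStream({
    transform(chunk, controller) {
      buffer += chunk
      const lastNewline = buffer.lastIndexOf('\n')
      if (lastNewline === -1) return // wait for more input
      controller.enqueue(redactLine(buffer.slice(0, lastNewline + 1)))
      buffer = buffer.slice(lastNewline + 1)
    },
    flush(controller) {
      if (buffer) controller.enqueue(redactLine(buffer))
    },
  })
}

// Deliberately simple email pattern for the demo
const EMAIL = /[^\s@]+@[^\s@]+\.[a-z]{2,}/gi
const maskEmails = (text) => text.replace(EMAIL, '[REDACTED:email]')

// Read a stream of strings to completion and concatenate the chunks
async function collect(stream) {
  const reader = stream.getReader()
  let out = ''
  for (;;) {
    const { value, done } = await reader.read()
    if (done) return out
    out += value
  }
}
```

The trade-off is latency and memory: nothing is emitted until a newline arrives, so a production version would also need to cap the buffer for input that never contains one.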
🔧 Advanced Configuration
Data Types & Object Processing
Noren processes text strings only. Objects and arrays must be converted to strings before processing:
import { Registry, redactText } from '@himorishige/noren-core'
const registry = new Registry({ defaultAction: 'mask' })
// ❌ This will fail - objects not supported
const badExample = { email: 'user@example.com' }
// await redactText(registry, badExample) // Error: s.normalize is not a function
// ✅ Convert to JSON string first
const jsonString = JSON.stringify({ email: 'user@company.com', phone: '090-1234-5678' })
const result = await redactText(registry, jsonString)
// Output: {"email":"[REDACTED:email]","phone":"•••-••••-••••"}
// ✅ Custom object processing helper
async function redactObject(registry, obj, options = {}) {
if (typeof obj === 'string') {
return await redactText(registry, obj, options)
}
if (Array.isArray(obj)) {
const results = []
for (const item of obj) {
results.push(await redactObject(registry, item, options))
}
return results
}
if (obj && typeof obj === 'object') {
const result = {}
for (const [key, value] of Object.entries(obj)) {
result[key] = await redactObject(registry, value, options)
}
return result
}
return obj // numbers, booleans, etc. returned as-is
}
// Process complex nested structures
const complexData = {
user: { email: 'user@company.com', phones: ['090-1111-2222', '03-3333-4444'] },
messages: ['Contact: admin@company.com', 'Phone: 080-5555-6666']
}
const redacted = await redactObject(registry, complexData, {
hmacKey: 'your-secure-32-character-key-here-123456'
})
// Output: Nested objects with PII properly masked in string values only
Full-Width Character Support
Noren automatically handles full-width (zenkaku) characters through Unicode NFKC normalization:
const registry = new Registry({ defaultAction: 'mask' })
// Full-width characters are automatically normalized before processing
const fullWidthInput = 'Email: ｕｓｅｒ＠ｅｘａｍｐｌｅ．ｃｏｍ Phone: ０９０－１２３４－５６７８'
const result = await redactText(registry, fullWidthInput)
// Output: Email: [REDACTED:email] Phone: •••-••••-••••
// Detection works the same as half-width equivalents
const halfWidthInput = 'Email: user@company.com Phone: 090-1234-5678'
const sameResult = await redactText(registry, halfWidthInput)
// Both inputs produce equivalent masking results
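NFKC is a standard Unicode normalization form, available in JavaScript as `String.prototype.normalize('NFKC')`. The snippet below shows what that step does to full-width input on its own, independent of Noren:

```javascript
// NFKC folds full-width (zenkaku) letters, digits, and symbols into their
// ASCII equivalents, so one set of half-width patterns can match both forms.
const fullWidth = 'ｕｓｅｒ＠ｅｘａｍｐｌｅ．ｃｏｍ ０９０－１２３４－５６７８'
const normalized = fullWidth.normalize('NFKC')
console.log(normalized) // user@example.com 090-1234-5678
```

NFKC does not map every dash-like character to `-`. For example, the katakana prolonged sound mark `ー` (U+30FC), sometimes typed in Japanese phone numbers, is left unchanged. Whether Noren handles such characters separately is not covered in this README.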
Environment-Aware Processing
const registry = new Registry({
environment: 'development', // Automatically excludes test patterns
allowDenyConfig: {
allowList: ['test@company.com'], // Never treat as PII
denyList: ['admin@'] // Always treat as PII
}
})
Performance Tuning
const registry = new Registry({
enableConfidenceScoring: false, // Disable for maximum performance
sensitivity: 'relaxed' // Less aggressive detection
})
🌐 Plugin System
Extend functionality with plugins:
// Use plugins for extended functionality
import { Registry } from '@himorishige/noren-core'
import * as networkPlugin from '@himorishige/noren-plugin-network'
import * as jpPlugin from '@himorishige/noren-plugin-jp'
import * as securityPlugin from '@himorishige/noren-plugin-security'
const registry = new Registry({ defaultAction: 'mask' })
// Add network detection (IPv4/IPv6/MAC)
registry.use(networkPlugin.detectors, networkPlugin.maskers)
// Add Japanese PII detection
registry.use(jpPlugin.detectors, jpPlugin.maskers)
// Add security token detection
registry.use(securityPlugin.detectors, securityPlugin.maskers)
Plugin Validation Integration (v0.6.0+)
Plugins automatically inherit the registry's validation settings:
const registry = new Registry({
defaultAction: 'mask',
validationStrictness: 'balanced' // Applies to plugins too
})
registry.use(jpPlugin.detectors, jpPlugin.maskers)
// Plugin detections are validated using the same rules as core detectors
// テスト電話 = "test phone", 本番電話 = "production phone"
const text = 'テスト電話: 03-1234-5678, 本番電話: 03-9876-5432'
const result = await redactText(registry, text)
// Only real phone numbers are detected, test patterns are filtered out
Available Plugins
- @himorishige/noren-plugin-network: IPv4/IPv6 addresses, MAC addresses (Required for network detection in v0.6.0+)
- @himorishige/noren-plugin-jp: Japanese phone numbers, postal codes, My Number
- @himorishige/noren-plugin-us: US phone numbers, ZIP codes, SSNs
- @himorishige/noren-plugin-security: HTTP headers, API tokens, cookies
- @himorishige/noren-dict-reloader: Dynamic policy reloading
📊 JSON/Structured Data Processing
Noren v0.5.0+ includes native support for JSON and NDJSON (newline-delimited JSON) processing:
const registry = new Registry({
defaultAction: 'mask',
enableJsonDetection: true // Enable structured data processing
})
// JSON object detection
const jsonInput = JSON.stringify({
user: {
email: 'admin@company.com',
phone: '+1-555-123-4567',
creditCard: '4242-4242-4242-4242'
}
})
const result = await redactText(registry, jsonInput)
// Detects PII within JSON structure and provides path information
// NDJSON processing
const ndjsonInput = [
JSON.stringify({ id: 1, email: 'user1@company.com' }),
JSON.stringify({ id: 2, email: 'user2@company.com' })
].join('\n')
const ndjsonResult = await redactText(registry, ndjsonInput)
// Processes each JSON line independently
JSON Detection Features
- Key-based detection: Enhanced accuracy using JSON key names as context
- Path tracking: Provides the full JSON path for detected PII (e.g., `$.user.email`)
- Nested objects: Recursive detection in deeply nested structures
- NDJSON support: Line-by-line processing for streaming data
- Type safety: Validates JSON structure before processing
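Path tracking and key-based context can be pictured as a recursive walk that records a `$.`-style path for every string value and flags values whose key looks sensitive. The sketch below is a hypothetical illustration: `findSensitivePaths`, `findSensitivePathsNdjson`, and the `SENSITIVE_KEYS` pattern are assumptions, not Noren's API or its actual key list.

```javascript
// Hypothetical sketch: walk parsed JSON, build JSONPath-style paths,
// and flag string values whose key name suggests PII.
const SENSITIVE_KEYS = /^(e-?mail|phone|tel|credit_?card|card_?number)$/i

function findSensitivePaths(value, path = '$', key = '', hits = []) {
  if (typeof value === 'string') {
    if (SENSITIVE_KEYS.test(key)) hits.push({ path, key, value })
  } else if (Array.isArray(value)) {
    // Array elements inherit the key of the array that holds them
    value.forEach((item, i) => findSensitivePaths(item, `${path}[${i}]`, key, hits))
  } else if (value && typeof value === 'object') {
    for (const [k, v] of Object.entries(value)) {
      findSensitivePaths(v, `${path}.${k}`, k, hits)
    }
  }
  return hits
}

// NDJSON: each non-empty line is an independent JSON document
function findSensitivePathsNdjson(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => findSensitivePaths(JSON.parse(line)))
}
```

For the `jsonInput` example above, this walk reports `$.user.email`, `$.user.phone`, and `$.user.creditCard`. Keys are joined with dot notation only, so keys containing dots or spaces would need bracket quoting in a fuller version.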
🔗 MCP (Model Context Protocol) Integration
Noren provides specialized support for MCP servers that communicate via JSON-RPC over stdio. This is particularly useful for AI tools like Claude Code that need to process communication with external services while protecting sensitive data.
MCP Transform Stream
For real-time stdio processing in MCP servers:
import { Readable, Writable } from 'node:stream'
import {
Registry,
createMCPRedactionTransform,
redactJsonRpcMessage
} from '@himorishige/noren-core'
// Create registry with comprehensive PII detection
const registry = new Registry({
defaultAction: 'mask',
validationStrictness: 'fast', // Optimized for real-time processing
enableJsonDetection: true,
rules: {
email: { action: 'mask' },
api_key: { action: 'remove' },
jwt_token: { action: 'tokenize' }
},
hmacKey: 'mcp-server-redaction-key-32-chars-minimum-length-required'
})
// Create MCP-optimized transform stream
const transform = createMCPRedactionTransform({
registry,
policy: { defaultAction: 'mask' },
lineBufferSize: 64 * 1024
})
// Process stdio communication: process.stdin/stdout are Node.js streams,
// so convert them to Web Streams before piping
await Readable.toWeb(process.stdin)
.pipeThrough(transform)
.pipeTo(Writable.toWeb(process.stdout))
JSON-RPC Message Processing
For processing individual JSON-RPC messages:
// Process a JSON-RPC request
const request = {
jsonrpc: '2.0',
method: 'getUserProfile',
params: {
email: 'user@company.com',
phone: '+1-555-123-4567'
},
id: 1
}
const redacted = await redactJsonRpcMessage(request, { registry })
console.log(redacted)
// Output: {
// jsonrpc: '2.0',
// method: 'getUserProfile',
// params: {
// email: '[REDACTED:email]',
// phone: '•••-•••-••••'
// },
// id: 1
// }
MCP Server Proxy Example
Create a proxy server that automatically redacts PII from stdio communication:
#!/usr/bin/env node
import { Registry, createMCPRedactionTransform } from '@himorishige/noren-core'
import { Readable, Writable } from 'node:stream'
class MCPRedactionProxy {
constructor() {
this.registry = new Registry({
defaultAction: 'mask',
enableJsonDetection: true,
validationStrictness: 'fast'
})
}
async start() {
const inputStream = Readable.toWeb(process.stdin)
const outputStream = Writable.toWeb(process.stdout)
const transform = createMCPRedactionTransform({
registry: this.registry,
policy: { defaultAction: 'mask' }
})
await inputStream
.pipeThrough(transform)
.pipeTo(outputStream)
}
}
// Start the proxy
const proxy = new MCPRedactionProxy()
await proxy.start()
MCP Use Cases
1. AI Assistant Communication
- Protect user data in Claude Code AI interactions
- Redact PII from external API communications
- Safe logging of AI model conversations
2. Development Tools Integration
- IDE extensions with PII protection
- Code analysis tools with privacy features
- Debug logging with automatic data sanitization
3. CI/CD Pipeline Protection
- Build logs with PII redaction
- Test data anonymization
- Environment variable protection
MCP Utilities
The library also provides utility functions for MCP processing:
import {
parseJsonLines,
isValidJsonRpcMessage,
extractSensitiveContent,
containsJsonRpcPattern,
getMessageType
} from '@himorishige/noren-core'
// Parse line-delimited JSON messages
const messages = parseJsonLines(ndjsonString)
// Validate JSON-RPC message format
if (isValidJsonRpcMessage(message)) {
const type = getMessageType(message) // 'request' | 'response' | 'notification' | 'error'
}
// Extract potentially sensitive content
const sensitiveContent = extractSensitiveContent(jsonRpcMessage)
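`getMessageType` distinguishes four message kinds. Under the JSON-RPC 2.0 specification they can be told apart structurally: a request has `method` and `id`, a notification has `method` but no `id`, a success response carries `result`, and an error response carries `error`. The sketch below applies those rules. `classifyJsonRpc` and `parseMessages` are hypothetical stand-ins; Noren's own `getMessageType` and `parseJsonLines` may check more, or different, fields.

```javascript
// Hypothetical classifier following the JSON-RPC 2.0 message shapes.
// Returns null for anything that is not a recognizable JSON-RPC 2.0 message.
function classifyJsonRpc(msg) {
  if (!msg || typeof msg !== 'object' || msg.jsonrpc !== '2.0') return null
  if (typeof msg.method === 'string') {
    return 'id' in msg ? 'request' : 'notification'
  }
  if ('error' in msg) return 'error'
  if ('result' in msg) return 'response'
  return null
}

// Parse newline-delimited messages, skipping blank or malformed lines
function parseMessages(ndjson) {
  const out = []
  for (const line of ndjson.split('\n')) {
    if (!line.trim()) continue
    try {
      out.push(JSON.parse(line))
    } catch {
      // ignore lines that are not valid JSON
    }
  }
  return out
}
```

Skipping malformed lines instead of throwing keeps a stdio proxy alive when a server writes stray log output to stdout. Whether Noren's utilities behave the same way is not stated here.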
📚 API Reference
Registry
Main class for PII detection and configuration.
Constructor Options
type Action = 'mask' | 'remove' | 'tokenize'
interface RegistryOptions {
defaultAction?: 'mask' | 'remove' | 'tokenize'
rules?: Record<string, { action: Action, preserveLast4?: boolean }>
hmacKey?: string // Required for tokenization
environment?: 'production' | 'development' | 'test'
allowDenyConfig?: AllowDenyConfig
enableConfidenceScoring?: boolean
enableJsonDetection?: boolean // v0.5.0+: Enable JSON/NDJSON processing
sensitivity?: 'strict' | 'balanced' | 'relaxed'
contextHints?: string[] // Keywords to improve detection
validationStrictness?: 'fast' | 'balanced' | 'strict' // v0.6.0+: Context validation level
}
Methods
- use(detectors, maskers, contextHints?): Add plugins
- detect(text, contextHints?): Detect PII (returns hits)
- maskerFor(type): Get the masker for a PII type
redactText(registry, input, overrides?)
Process text and apply redaction rules.
createRedactionTransform(registry, overrides?)
Create transform stream for large data processing.
⚡ Performance
Benchmarks (v0.5.0)
- Bundle Size: 124KB optimized distribution
- Processing Speed: 102,229 operations/second (0.0098ms per iteration)
- Memory Efficiency: Object pooling with automatic cleanup
- TypeScript Codebase: 1,782 lines (40%+ reduction from v0.4.x)
- API Surface: 14 exports (65% reduction for better tree-shaking)
Best Practices
- Reuse Registry instances - avoid creating new ones frequently
- Use streams for large data processing
- Disable confidence scoring for maximum performance
- Pre-compile patterns by loading plugins at startup
🔒 Security Considerations
HMAC Keys
- Minimum 32 characters required (enforced in v0.5.0)
- Store in environment variables, never in code
- Use different keys per environment
- Rotate keys regularly
- Tokens use Base64URL encoding, so they are safe in URLs and file names
Memory Safety
- Automatic object pooling reduces GC pressure
- Sensitive data is cleared from memory after processing
- Configurable limits prevent DoS attacks
🛠 Development Tools
For advanced features like benchmarking and A/B testing:
npm install @himorishige/noren-devtools
See @himorishige/noren-devtools for development and testing tools.
🔄 Version History
v0.6.0 (Latest) - Advanced Validation & Architecture Optimization
🚨 Breaking Changes:
- Network detection separation: IPv4/IPv6/MAC detection moved to @himorishige/noren-plugin-network
- Smaller core bundle: 35% reduction in core package size by removing network patterns
- Plugin-based architecture: Better modularity and optional feature loading
🛡️ New Features:
- Advanced validation system: Context-aware false positive filtering with 3 strictness levels (fast/balanced/strict)
- Plugin validation integration: Automatic validation for plugin-detected PII types with seamless inheritance
- 🇯🇵 Enhanced Japanese language support: Specialized validators and expanded context keywords for improved accuracy
- 📋 Debug utilities: New debugValidation() function for detailed validation analysis
- ⚡ Performance optimized: Validation adds minimal overhead while significantly reducing false positives
- 🎯 Context-aware filtering: Smart detection of test data, examples, and weak contexts
- 🔄 Backward compatible: All existing APIs work without changes (except network detection)
📦 Migration Guide:
// Before v0.6.0: network detection was built into the core package
const before = await redactText(registry, 'IP: 192.168.1.1')
// v0.6.0+: install the network plugin first
// npm install @himorishige/noren-plugin-network
import * as networkPlugin from '@himorishige/noren-plugin-network'
registry.use(networkPlugin.detectors, networkPlugin.maskers)
const result = await redactText(registry, 'IP: 192.168.1.1')
v0.5.0 - Performance & Structured Data Support
- JSON/NDJSON detection: Native support for structured data with key-based matching
- Prefilter optimization: Fast screening reduces processing time for non-PII text
- 77% code reduction: Streamlined from 8,153 to 1,782 lines
- Single-pass detection: Unified pattern matching for better performance
- Optimized IPv6 parser: 31% size reduction with enhanced validation
- Streamlined Hit Pool: 47% size reduction with object pooling
- Reduced API surface: 65% fewer exports for better tree-shaking
- Enhanced security: Stricter boundaries and improved validation
- Code quality improvements: Full TypeScript strict mode compliance
v0.4.0 - Confidence Scoring & Advanced Features
- Added confidence scoring system
- Environment-aware processing
- Enhanced HMAC security with 32-character minimum
- Development tools package separation
📄 License
MIT License - see LICENSE for details.
Part of the Noren PII protection suite