britfone

British English pronunciation dictionary

cmudict, phonetic, ipa, british, english

Britfone

British English (RP/Standard Southern British) pronunciation dictionary:

  • 16,000+ entries, including the top 10,000 most frequent words as per the BNC and the Google Web Corpus
  • IPA transcription including primary and secondary stress
  • MIT license
  • separate expansion dictionary spelling out punctuation and abbreviations
  • both American and British spelling variants
  • all UK counties
  • all London boroughs
  • all major UK towns
  • all European capitals
  • all US states
  • all common irregular plurals
  • all common irregular verbs

Format

In the main dictionary, words are in upper case and separated by a comma from their pronunciation, whose phonemes are space-separated. For words with multiple pronunciations, a parenthesised number is attached to the end of the word:

RAINBOW, ɹ ˈeɪ n b ˌəʊ
RAINING, ɹ ˈeɪ n ɪ ŋ
RAISE, ɹ ˈeɪ z
RAISED, ɹ ˈeɪ z d
RAISES, ɹ ˈeɪ z ɪ z
RAISING, ɹ ˈeɪ z ɪ ŋ
RAISINS, ɹ ˈeɪ z ɪ n z
RALEIGH(1), ɹ ˈɑː l i
RALEIGH(2), ɹ ˈɔː l i
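
A minimal sketch of how lines in this format could be read, assuming only what the examples above show (one entry per line, the first comma ends the headword, an optional parenthesised variant number). The names `parse_entry` and `load_lexicon` are hypothetical helpers and not part of Britfone:

```python
import re
from collections import defaultdict

# Matches "WORD" or "WORD(2)"; the variant number is optional.
HEADWORD = re.compile(r"^(?P<word>.+?)(?:\((?P<variant>\d+)\))?$")


def parse_entry(line):
    """Split one main-dictionary line into (word, variant, phonemes)."""
    head, _, pronunciation = line.partition(",")
    match = HEADWORD.match(head.strip())
    variant = match.group("variant")
    return (
        match.group("word"),
        int(variant) if variant else None,
        pronunciation.split(),
    )


def load_lexicon(lines):
    """Group every pronunciation under its bare headword."""
    lexicon = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            word, _, phonemes = parse_entry(line)
            lexicon[word].append(phonemes)
    return dict(lexicon)


sample = [
    "RAISE, ɹ ˈeɪ z",
    "RALEIGH(1), ɹ ˈɑː l i",
    "RALEIGH(2), ɹ ˈɔː l i",
]
lexicon = load_lexicon(sample)
```

Grouping the variants under the bare headword means a lookup of `RALEIGH` returns both pronunciations without the caller having to know how many there are.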

Stress marks are attached to the stressed vowel/diphthong.
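
Because the stress mark is a prefix of the vowel token, it can be split off with a simple string check. The sketch below assumes the marks are the IPA characters U+02C8 (primary) and U+02CC (secondary); `split_stress` and `strip_stress` are hypothetical names used for illustration only:

```python
PRIMARY = "\u02c8"    # ˈ primary stress
SECONDARY = "\u02cc"  # ˌ secondary stress


def split_stress(phoneme):
    """Return (level, bare_phoneme): 1 = primary, 2 = secondary, 0 = none."""
    if phoneme.startswith(PRIMARY):
        return 1, phoneme[len(PRIMARY):]
    if phoneme.startswith(SECONDARY):
        return 2, phoneme[len(SECONDARY):]
    return 0, phoneme


def strip_stress(phonemes):
    """Drop stress marks, keeping the phoneme sequence intact."""
    return [split_stress(p)[1] for p in phonemes]


# The RAINBOW entry from the examples above, built token by token.
rainbow = ["ɹ", PRIMARY + "eɪ", "n", "b", SECONDARY + "əʊ"]
levels = [split_stress(p)[0] for p in rainbow]
bare = strip_stress(rainbow)
```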

The parts of a multi-word entry are joined by an underscore (_), which stands for an actual space. This eases further processing:

COSTA_RICA, k ˌɒ s t ə ɹ ˈiː k ə
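
Converting between running text and dictionary keys is then a one-line operation in each direction. This is a sketch under the assumption that keys are upper case with single underscores, as in the example above; `to_key` and `to_display` are hypothetical names:

```python
def to_key(text):
    """Turn running text such as 'Costa Rica' into a dictionary key."""
    return "_".join(text.upper().split())


def to_display(key):
    """Turn a dictionary key back into a space-separated form."""
    return key.replace("_", " ")


key = to_key("Costa  Rica")   # repeated whitespace collapses to one underscore
label = to_display("COSTA_RICA")
```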

In the expansions dictionary, entries are also in upper case, tab-separated from their expansions:

MON    MONDAY(1)
MON.    MONDAY(1)
MPG    MILES PER(1) GALLON
MPH    MILES PER(1) HOUR
MR    MISTER
MR.    MISTER
MRS    MISSIS
MRS.    MISSIS
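
The expansions file can be read in much the same way. The sketch below assumes a single tab between the abbreviation and its expansion, and that a suffix such as `(1)` in an expansion points at a specific pronunciation variant in the main dictionary; the function names are hypothetical:

```python
def parse_expansion(line):
    """Split one line into (abbreviation, list of expansion words)."""
    abbreviation, expansion = line.rstrip("\n").split("\t", 1)
    return abbreviation, expansion.split()


def load_expansions(lines):
    """Build an abbreviation -> expansion-words table."""
    return dict(parse_expansion(line) for line in lines if line.strip())


def expand(tokens, expansions):
    """Replace known abbreviations; pass other tokens through in upper case."""
    expanded = []
    for token in tokens:
        key = token.upper()
        expanded.extend(expansions.get(key, [key]))
    return expanded


sample = ["MPH\tMILES PER(1) HOUR", "MR.\tMISTER"]
table = load_expansions(sample)
words = expand(["60", "mph"], table)
```

The expanded words can then be looked up in the main dictionary one by one.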

Issues and remarks

  • strict IPA versus traditional phonetic symbols: the phonetic symbols are strictly as defined by the IPA, as opposed to how they have traditionally been used in many dictionaries and the language learning literature. In particular:

    • /ɐ/ instead of traditional /ʌ/
    • /ɹ/ instead of traditional /r/
    • /ɛ/ instead of traditional /e/
    • /ɜː/ instead of traditional /əː/
  • unstressed vowels as /ə/ and /ɪ/: due to the diversity of the sources for phonetic transcription, there's some inconsistency in how weak vowels are transcribed, though in most cases /ɪ/ is used, following the Collins Dictionary.

  • final i: final unstressed i's are given a short tense "i" phoneme /i/, different from both /iː/ and /ɪ/, to reflect happy-tensing. Most dictionaries show this vowel as either the long tense /iː/ or the short lax /ɪ/ (see https://en.wikipedia.org/wiki/English_phonology). The transcription might be somewhat inconsistent: in spoken English, happy-tensing is preserved in inflected variants (e.g., studied inherits it from study, and so contrasts with studded), yet this might not always be reflected in the dictionary.

  • secondary stress: secondary stress is not always marked (the primary always is).

  • stems and inflections: not all inflected open-class words (nouns, verbs, adjectives and adverbs) have all their inflected variants, and not all variants show all of the alternative pronunciations. The possessive form -'s of nouns is not included, and neither is the superlative form of most adjectives and adverbs.

  • acronyms vs initialisms: the expansions dictionary only contains acronyms, i.e., words that are not pronounced by spelling out the individual letters (e.g. NATO). Initialisms (e.g. BBC, NHS), on the other hand, are excluded. Their pronunciation can be obtained by looking up the names of the individual letters in the main dictionary, then concatenating them.
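
The letter-by-letter lookup for initialisms could be sketched as follows. This assumes that letter names are keyed in the main dictionary by the letter itself (e.g. `B`), which the text above does not state. The two pronunciations in `letter_names` are illustrative placeholders and are not copied from the dictionary, and `spell_out` is a hypothetical helper:

```python
def spell_out(initialism, letter_names):
    """Concatenate the letter-name pronunciations of an initialism."""
    phonemes = []
    for letter in initialism.upper():
        if letter not in letter_names:
            raise KeyError(f"no letter-name entry for {letter!r}")
        phonemes.extend(letter_names[letter])
    return phonemes


# Illustrative placeholder entries, not taken from the dictionary files.
letter_names = {
    "B": ["b", "ˈiː"],
    "C": ["s", "ˈiː"],
}
bbc = spell_out("BBC", letter_names)
```

Plain concatenation keeps a primary stress on every letter name; adjusting the stress for the initialism as a whole would be a separate step.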

Sources

The initial source of the phonetic transcriptions is cmudict, supplemented by a number of other sources for British English specifics: Wiktionary, Wikipedia, the Collins Dictionary, the Oxford Dictionary, the Cambridge Dictionary and the Macmillan Dictionary.

The main sources of the frequency-filtered vocabulary are the top 10K words in the British National Corpus, the Google Web Corpus and the New General Service Lists. Not all words in these lists are included: because of sampling bias, the lists contain uncommon words such as athelstan or phentermine, as well as foreign words. Initialisms are also excluded.

Changelog

See Changelog

Contributing

If you'd like to contribute a correction or an addition, or make a request for an addition, you can make a pull request or open an issue.

MIT License (MIT)

Copyright (c) 2017 by Jose Llarena

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Change Log

v3.0.0 (2017/10/02 19:41 +00:00)

  • 5f7f7d7 build: update version number (@JoseLlarena)
  • b2b4242 doc: add changelog; update readme accordingly (@JoseLlarena)
  • f871071 feat: replace space with underscore (@JoseLlarena)
  • 7046142 fix: replace long with short i in derived verb forms (@JoseLlarena)
  • af4eb31 fix: happy-tensing for -ing forms of verbs; feat: replacement of space with underscore (@JoseLlarena)
  • 6690946 fix: kayak, trans(s)exual, trickle, qualitative, bras, dumfries; feat: add stickler, noodle, propagat(e,ed,es,ing,ion) (@JoseLlarena)

v2.0.1 (2017/04/30 07:14 +00:00)

  • 449a9c7 ref: rename after version bump (@JoseLlarena)
  • 84c1013 doc: fix examples (@JoseLlarena)

v2.0.0 (2017/04/30 06:53 +00:00)

  • dc657e0 doc: add github links to contributing section (@JoseLlarena)
  • ad47471 fix: herpes (@JoseLlarena)
  • 73cfb6a feat: replace long tense i in plurls with short tense i; fix: campuses (@JoseLlarena)
  • c8427e1 doc: add features to intro; add contribution section (@JoseLlarena)
  • ac75b75 ref: renaming files after bumping version (@JoseLlarena)
  • 5bad4ea doc: update to reflect new use of short tense 'i' (@JoseLlarena)
  • 739f438 feat: update list of symbols with new short i (@JoseLlarena)
  • e0f6159 feat: conversion of final weak 'i's from long to short (@JoseLlarena)
  • 8c001cc fix: pyramid (@JoseLlarena)
  • a422a57 fix: wikipedia link in readme (@JoseLlarena)
  • a2229c7 add: old english verb forms (@JoseLlarena)
  • 7dc9f3b fix: wikipedia, bucharest; add: irregular plurals and others (@JoseLlarena)

v1.2.0 (2017/02/13 17:59 +00:00)

  • 411d6bb ref: update file names to current version (@JoseLlarena)
  • 099b9f1 add: irregular plurals (@JoseLlarena)
  • ded8b5c doc: add missing links (@JoseLlarena)
  • 73858b7 doc: add changelog section (@JoseLlarena)
  • a135fcd doc: rewording, clarification; add: example entry from documentation (@JoseLlarena)
  • 97c872d doc: update and clarification (@JoseLlarena)
  • bcdb36a license: update year (@JoseLlarena)
  • 0ec459e add: words from bnc and google corpora (@JoseLlarena)
  • 287fcfb add: more words from google corpus (@JoseLlarena)
  • c69c58b fix: vertigo; add: more words from google corpus (@JoseLlarena)
  • 2984a9f add: more words from google corpus (@JoseLlarena)
  • d3b28d1 add from google corpus: lesotho, judiciary, incarnate, imagery, garner, garnet, burundi, buckle, trivia, trickle, treacle, tiffany, sturdy, trackback(s), trifle, severity, retailing, quartz, prevents, patriot, newfoundland, mesa, myrtle (@JoseLlarena)
  • 1dd60e3 fix: un-parantheses, christianity; modify and add incorporate (@JoseLlarena)
  • 6996190 fix: tournaments, specifically, progressively, complementary, businessmen, interestingly; modify: tournament(s), researcher(s),engaging, document(s), extreme, extremely, discriminate, businessmen; add: blimp, tournament(s), researcher(s), businessman, specific, extreme, extremely, engaging, document(s), buisenesmann, discriminate; remove: vis, specific (@JoseLlarena)
  • 903cb18 fix: allegation(s); add; brexit (@JoseLlarena)
  • dd862e1 fix: university, twentieth, thirtieth, millimetres; modify: twentieth; remove; twentieth, duplicate of indices (@JoseLlarena)
  • 6082b7d fix: unacceptable, surfaces, supplemental, sovereignty, selective, selections, select, selected, selecting, secretary, secreatries, representation, precedent, perspiration, matrixes, legitimate, legitimacy, necklace, lengendary, investigation(s), insulation, inspiration, incredible, incredibly, illegitimate, fundraising, elevation, easier, easiest, dialect(s), compatibility, collectibles, boulevard, benifit(s), appendixes, algebra; modified: resources, arsenic; add: indices, arsenic, resources (@JoseLlarena)
  • feed266 fix: willingness, wellness, ultimately, tortoise, terminations, soonest, sinuses, sadness, mattress, integration, hottest, highest, happiness, fastest, darkness, effectiveness, consciousness, coincidence, challenge, challenging, biggest, closest, awareness, countess;modify: porcelain, countess; removed: interest(s), interesting (@JoseLlarena)
  • 35b66e0 fix: characters, characteristic(s), primarily; modify: primarily; add: primarily (@JoseLlarena)
  • a7ffca5 fix: vegetarian, tremendous, purchas(es), purchased, purchasing, purchaser(s),realism, surface, provinces, plagiarism, mountain(s), mistress, courtesy, fountain, actress, amphibian, calendar(s), avenue; modify: indirectly, tremendous, decorative; add: decorative, indirectly; remove: tremendous (@JoseLlarena)
  • 6c5314a fix: scenarios, outrageous, discipline(s), disciplinary, captain, accesses, harvest (@JoseLlarena)
  • 00c167e fix: sinuses, relevant, relevance, re-sign, periodic, mozilla, mommy, memorial, mechanic(s), mechanical, loughton, jeremy, impulses,laundry, exploit, elephant; modify: salt, salty, opposite, opera, anybody; add: anybody, salt, salty, grandma, opposite; remove: sandwich (@JoseLlarena)
  • 1607b5f fix: witness(es), wilderness, weakness(es), thickness, sickness, madness, kindness, highness, goodness, forgiveness, fitness (@JoseLlarena)
  • e1800ce fix: unfortunate, unfortunately, template(s), tablet(s), syndicate, secret(s), private, priately, secrets, pirates, estimates; add: pirate (@JoseLlarena)
  • 9a6123f fix:ultimate, repetition, portrait(s), petition, passionate, moderate, legitimate, leaflet(s), intimate, inadequate, ilness, fortunate, fortunately, illegitimate, moderate, elaborate, duplicate, delicate, deliberate, deliberately, estimate, delegate(s), corporate, competition(s), coordinate(s), certificate(s),candidate(s), cannot, beloved, approximate, alternate, adequate, accurate, accurately, climate, benett; modify:climate, bennet, portraits; add: portraits; remove: schwa alternatives to climate and benett (@JoseLlarena)
  • 8f11eff fix: vitamins, andorra, worthless, useless, stainless, regardless, pointless, needless, hopeless, helpless, endless, cordless, doubtless (@JoseLlarena)
  • f2d2a7a fix: hymen, linen, daryl, banal (@JoseLlarena)
  • b531d06 fix: replaced (@JoseLlarena)
  • 469b8fb fix: volvo, topless, toilet(s), telegram, telegraph, telephone, telecom, separate, separately, germaine; modify: transit, separate; add: transit, separate (@JoseLlarena)
  • 5e3d54e fix: visual, visualis/zation, trademarks, surmise, petrol, nirvana, joseph, garage(s), dirge, architects; modify: visual, visualis/zation, garage(s), didcot, batman; add: visual, visualization, garage(s), didcot, batman; delete: joseph (@JoseLlarena)
  • eb56452 fix: worried, nissan, midi, married, hurried, fancied, celebs, accompanied; modify: nissan, mosaic, chi; add: nissan, mosaic, chi (@JoseLlarena)
  • fde7f22 fix: displacement (@JoseLlarena)
  • a1d64eb fix: modem(s); modify: export(s), embark; add: export(s), embark, buffet (@JoseLlarena)
  • 1ef88fe modify: as, pylon; add: weak as, pylon (@JoseLlarena)
  • 3ba505a fix: yous, wonky, spunk, plonker, plank, lancaster, flank, conquer, conrete, condom,concord; modify: module(s), leads, condom, coral; add: module(s), leads, choral (@JoseLlarena)
  • dda88ea fix: velvet, wander, vastly, subaru, statue, samuel, revive, palace, oldest, latest, houses, helmet, fossil, helmet, detach, detached, caters, biopsy; modify yearly, superb, second, seconds, ordeal, greasy, domain, berlin; add: yearly, superb, ordeal, statue, second, seconds, cursed, berlin (@JoseLlarena)
  • 861e23f fix: unite, united; modify: gill, volt; add: gill, volt, pry (@JoseLlarena)
  • 0070962 fix: paths, romeo; modify: year, dowry; add: modify, dowry (@JoseLlarena)
  • 29ae837 fix: stew, steer, semen, moses, heal, docs, cater, bios; modify: moped, momentum; add: moped, momentum (@JoseLlarena)
  • 1b534d3 fix: roar (@JoseLlarena)
  • b0dee7f fix: coral; modify: troll; add: new alt pronun to troll (@JoseLlarena)
  • 7015c2a fix: bung, walt; modify: pate; add: alternative for pate (@JoseLlarena)
  • 356d55f fix: wove (@JoseLlarena)
  • 3ef93d7 fix: calf, calves, catastrophic (@JoseLlarena)
  • bf0fe05 add: ruffle (@JoseLlarena)
  • b3a84ec fix: temporarily, television(s), terribly, terrible, decimal, beautiful, beautifully, anonymous, chocolate(s); add: alternative pronunciations to temporary, television(s), chocolate(s), controversy; remove: community with schwa (@JoseLlarena)
  • 84856d8 fix: bracelet, dusty, monsignor, stilt, teletext, wrestle; add:stilts (@JoseLlarena)
  • 6805902 fix: vixen (@JoseLlarena)
  • 4128e47 fix: invocation (@JoseLlarena)
  • 9d8a563 fix: artifact, lanky; remove: spurious rifh (@JoseLlarena)
  • 4109d1b fix: Cairo (@JoseLlarena)
  • 553fae0 fix: cheapest, shaken (@JoseLlarena)
  • 783c466 fix: yup, waked, quagmire, iffy, froth, eynsham, dine, diner, alas, eynsham, age, aging (@JoseLlarena)
  • 3f15ca1 modify: berkeley, adding alternate pronunciation; add: unintuitive UK place names (@JoseLlarena)
  • 8afd898 fix: loom; add: a few more words from NGSL (@JoseLlarena)
  • 39cce6d few more words from NGSL (@JoseLlarena)
  • 621a56c add: few more words from NGSL (@JoseLlarena)
  • f5083e8 fix: name of symbols file (@JoseLlarena)
  • b02d226 fix: fastest, indirect, ecological, transexual, transsexual; add: words from NGSL (@JoseLlarena)
  • 6481749 fix: whirring; add: European capitals, bnc words, irregular verbs (@JoseLlarena)
  • 8d3f583 feat: add irregular verbs (@JoseLlarena)
  • d8e0192 delete: sta; fix: textbook, orangutang, stockholm; add: bnc words, european capitals (@JoseLlarena)
  • 8c44fa5 fix: age, foray; feat: a few more words from bnc (@JoseLlarena)
  • 2e1b0a9 more words from bnc (@JoseLlarena)
  • d352485 fix: hymn; feat: add 100+ words from bnc (@JoseLlarena)
  • b1f1c6b ref: add prefix to symbols file (@JoseLlarena)
  • 969fab6 fix: SHITTY,PORTUGUESE,GUI,PONY,SUNDAY,BRITNEY,HAWAII,NEWBIE,THIRTY,MORAY,GRANDE,CRITICIZED,TWENTIETH,HARRIET,WARRIOR,WARRIORS,MARRIOTT,HAMPSHIRE,SOCIOLOGICAL (@JoseLlarena)
  • d279ce4 fix: muslims; add/modify: muslim (@JoseLlarena)
  • 54c756d fix: anaesthetise, mole, helix, squelch (@JoseLlarena)
  • 5cfd865 fix: karma, circulation, parentheses, electron, lieutenant; modify: comfortable; add: 600+ difficult to spell or pronounce words (@JoseLlarena)

v1.1.3 (2016/12/17 18:09 +00:00)

  • e1709b7 ref: rename files to current release (@JoseLlarena)
  • 4b3adb6 fix: chage short to long o in absorption (@JoseLlarena)
  • 53c486e fix: evelyn, pennsylvania; add fizzy (@JoseLlarena)
  • ec94c99 fix: syria (@JoseLlarena)
  • cca101a fix: ambiguous, conspicuous (@JoseLlarena)

v1.1.2 (2016/10/16 21:16 +00:00)

  • 93d7479 ref: rename files for release (@JoseLlarena)
  • 7e026e5 fix: marxist, shakespeare, thereafter (@JoseLlarena)
  • b4f8155 fix: remove stray i with missing length symbol in symbols list (@JoseLlarena)

v1.1.1 (2016/10/16 16:05 +00:00)

  • 2d545c5 update file name to release number (@JoseLlarena)
  • ca9f09c fix: add all symbols in main dictionary, sorted (@JoseLlarena)
  • 5dc4ba7 fix: resort main dictionary (@JoseLlarena)
  • 47648bb fix: confront, shrink (@JoseLlarena)
  • 8f6c9ef fix: pron. of confrontation; add: more entries from bnc top frequencies (@JoseLlarena)
  • 47f8a69 fix: accompany (@JoseLlarena)
  • b85e121 fix: sort newly added entries (@JoseLlarena)
  • ee8f53a fix: empty last line in expansions file (@JoseLlarena)
  • d46e8b6 add: entries from bnc frequency lists (@JoseLlarena)
  • f3ee6fb fix: been, moldova, departmental, will now show all words properly torted; add ax, axes, dormouse , dormice (@JoseLlarena)

v1.0.0 (2016/10/13 17:27 +00:00)

  • f18900a feat: delete old dictionary files to start semver (@JoseLlarena)
  • 6b6b455 feat: rename dictionary files to start semver (@JoseLlarena)
  • 694256b add: hamlet, hamlets (@JoseLlarena)
  • c653bf3 fix: missing dipthong in symbol list (@JoseLlarena)
  • b802a53 fix: p. of alex, lettuce (@JoseLlarena)
  • 1812680 fix: add au dipthong missiong from symbols.txt (@JoseLlarena)
  • 08f1ba6 fix: genealogy, intention(s), thesaurus (@JoseLlarena)
  • 4031642 feat: add annexe entry; fix: ally, annex, auburn, audio, ethiopia, mature, niger, oath, premature, regime, thereafter, and wallace (@JoseLlarena)
  • 97fe75b fix: change long to short a in aspirations (@JoseLlarena)
  • 678c765 replaced bad phonemes (@JoseLlarena)
  • 79119a2 corrected sound and spelling of various words (@JoseLlarena)
  • 321f1fb corrected evolutionary word, kilometre and vocabulary sounds (@JoseLlarena)
  • 1156b04 corrected joined, incorrect sounds (@JoseLlarena)
  • 4c7d6e9 corrected pronunciations of 've, brunei(2), mali and query (@JoseLlarena)
  • 12b00a1 corrected duplicate/missing stress marks, cosmetic changes to readme (@JoseLlarena)
  • 36a4149 corrected readme (@JoseLlarena)
  • 565ea55 corrected readme (@JoseLlarena)
  • dd45936 first commit (@JoseLlarena)