Morpheme Dataset

03 Jun 2019

This dataset (in morphemes.json) has been curated with the objective of enabling automated access.

It houses morphemes (prefixes, roots/stems, and suffixes), along with meanings and other associated characteristics.

An initial count shows 2400+ groupings, containing 3000+ prefixes, 1000+ roots/stems, and 800+ suffixes.

Prefixes:

Fields associated with prefixes are as follows:

form: the prefix characters as they would exist in a word (eg. “ab”)
root: reference to the form,with “-“ appended (eg. “ab-“)
loc: the fixed value “prefix”
attach_to: a list of parts of speech that this prefix would attach to (eg. [“noun”])
category: one of the following values (this list is extensible):
- preposition-like
- relating to
- negation
- …

For example, here is the entry for prefix “be-“:

prefix only

"be-": {
"forms": [{
  "root": "be-",
  "form": "be",
  "loc": "prefix",
  "attach_to": ["verb"],
  "category": "preposition-like"
}],
"meaning": ["equipped with", "covered with", "beset with"],
"origin": "",
"etymology": "",
"examples": ["bedeviled", "becalm", "bedazzle", "bewitch"]
},

Suffixes:

Fields associated with suffixes are as follows:

form: the suffix characters as they would exist in a word (eg. “able”)
root: reference to the form,with “-“ prepended (eg. “-able”)
loc: the fixed value “suffix”
pos: part of speech (allows for multiples)
type:
- derivational
- inflectional

For example, here is the entry for suffix “-acy”:

suffix only:

"-acy": {
"forms": [{
  "root": "-acy",
  "form": "acy",
  "pos": "noun",
  "type": "derivational",
  "loc": "suffix"
}],
"meaning": ["state", "quality", "resemblance"],
"origin": "",
"etymology": "",
"examples": ["democracy", "accuracy", "lunacy"]
},

Roots/Stems:

Fields associated with roots/stems are as follows:

form: the root/stem characters as they would exist in a word (eg. “fess”)
root: reference to the form,with “-“ prepended and appended (eg. “-fess-“)
loc: the fixed value “embedded”

For example, here is the entry for “-laps-“:

embedded root/stem only:

"laps": {
  "forms": [{
    "loc": "embedded",
    "root": "-laps-",
    "form": "laps"
  }],
  "meaning": ["slip"],
  "origin": "",
  "examples": ["collapse", "elapse", "lapse", "collapsible"]
},

Entries are grouped according to meaning. Fields in a grouping are:

meaning: a list of words or phrases that apply to all the prefixes, suffixes, and roots/stems in the grouping
theme: (where available) a categorization of the grouping (eg. “society”). Some values of theme are:
- color
- society
- diminutive
- human_body
- gender
- movement
- negation
- proportion
- …
origin: language of origin (eg. “Latin”)
etymology: details where available
examples: a list of words that contain one or more affixes in the grouping

Note 1: Some of the fields above do not have values. This is particularly true of prefix fields “attach_to”, “category”, suffix fields “pos”, “type”, and grouping fields “origin”, “etymology”. Completion of these field values is an ongoing project. Please contribute if you can.

Note 2: Ordering of some entries, particularly prefixes, is important to facilitate search.

See the README file under code/wordnet in the repo for details on Python scripts that can be used in conjunction with this dataset.

Morphemes with Meanings

Common English morphemes, organized for automated access

Morpheme Dataset