parse-latin
Latin-script (natural language) parser
Last updated 3 years ago by wooorm .

Natural language parser, for Latin-script languages, that produces nlcst.

What is this?

This package exposes a parser that takes Latin-script natural language and produces a syntax tree.

When should I use this?

If you want to handle natural language as syntax trees manually, use this.

Alternatively, you can use the retext plugin retext-latin, which wraps this project to also parse natural language at a higher-level (easier) abstraction.

Whether it is Old English (“þā gewearþ þǣm hlāforde and þǣm hȳrigmannum wiþ ānum penninge”), Icelandic (“Hvað er að frétta”), or French (“Où sont les toilettes?”), this project does a good job at tokenizing it.

For English and Dutch, you can instead use parse-english and parse-dutch.

To some extent, you can also use this for Latin-like scripts, such as Cyrillic (“привет”), Georgian (“გამარჯობა”), and Armenian (“Բարեւ”).

Install

This package is ESM only. In Node.js (version 16+), install with npm:

npm install parse-latin

In Deno with esm.sh:

import {ParseLatin} from 'https://esm.sh/parse-latin@7'

In browsers with esm.sh:

<script type="module">
  import {ParseLatin} from 'https://esm.sh/parse-latin@7?bundle'
</script>

Use

import {ParseLatin} from 'parse-latin'
import {inspect} from 'unist-util-inspect'

const tree = new ParseLatin().parse('A simple sentence.')

console.log(inspect(tree))

Yields:

RootNode[1] (1:1-1:19, 0-18)
└─0 ParagraphNode[1] (1:1-1:19, 0-18)
    └─0 SentenceNode[6] (1:1-1:19, 0-18)
        ├─0 WordNode[1] (1:1-1:2, 0-1)
        │   └─0 TextNode "A" (1:1-1:2, 0-1)
        ├─1 WhiteSpaceNode " " (1:2-1:3, 1-2)
        ├─2 WordNode[1] (1:3-1:9, 2-8)
        │   └─0 TextNode "simple" (1:3-1:9, 2-8)
        ├─3 WhiteSpaceNode " " (1:9-1:10, 8-9)
        ├─4 WordNode[1] (1:10-1:18, 9-17)
        │   └─0 TextNode "sentence" (1:10-1:18, 9-17)
        └─5 PunctuationNode "." (1:18-1:19, 17-18)
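
To illustrate what handling such a tree manually can look like, here is a minimal sketch that walks a hand-written copy of the tree above (positional info omitted) and collects the words. The helpers `toText` and `collectWords` are hypothetical names made up for this example and are not part of parse-latin; in practice, utilities such as unist-util-visit and nlcst-to-string cover the same ground.

```javascript
// A hand-written copy of the tree shown above, without positional info.
const tree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'A'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'simple'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'sentence'}]},
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

// Hypothetical helper: serialize a node by concatenating its literal values.
function toText(node) {
  return 'value' in node ? node.value : node.children.map(toText).join('')
}

// Hypothetical helper: collect the text of every WordNode, depth first.
function collectWords(node, words = []) {
  if (node.type === 'WordNode') {
    words.push(toText(node))
  } else if (node.children) {
    node.children.forEach((child) => collectWords(child, words))
  }
  return words
}

console.log(collectWords(tree)) // [ 'A', 'simple', 'sentence' ]
console.log(toText(tree)) // A simple sentence.
```

Parent nodes carry `children` and literal nodes carry `value`, so a recursive walk like this is enough for small tasks.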

API

This package exports the identifier ParseLatin. There is no default export.

ParseLatin()

Create a new parser.

ParseLatin#parse(value)

Turn natural language into a syntax tree.

Parameters
  • value (string, optional) — value to parse
Returns

Tree (RootNode).

Algorithm

Note: The easiest way to see how parse-latin parses is by using the online parser demo, which shows the syntax tree corresponding to the typed text.

parse-latin splits text into white space, punctuation, symbol, and word tokens:

  • “word” is one or more unicode letters or numbers
  • “white space” is one or more unicode white space characters
  • “punctuation” is one or more unicode punctuation characters
  • “symbol” is one or more of anything else
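
The four token classes above can be approximated with Unicode property escapes. This is a rough sketch only: the patterns and the `tokenize` function are assumptions made for illustration, not the expressions or API that parse-latin itself uses.

```javascript
// Approximate patterns for the four token classes (assumed, not parse-latin's own).
const classes = [
  ['word', /^[\p{L}\p{N}]+/u],
  ['whiteSpace', /^\s+/u],
  ['punctuation', /^\p{P}+/u],
  // Anything that is not a letter, number, white space, or punctuation.
  ['symbol', /^[^\p{L}\p{N}\s\p{P}]+/u]
]

// Hypothetical helper: split a string into a flat list of typed tokens.
function tokenize(value) {
  const tokens = []
  let rest = value

  while (rest.length > 0) {
    // Exactly one class matches at any position, because `symbol` is the
    // complement of the other three.
    for (const [type, expression] of classes) {
      const match = expression.exec(rest)
      if (match) {
        tokens.push({type, value: match[0]})
        rest = rest.slice(match[0].length)
        break
      }
    }
  }

  return tokens
}

console.log(tokenize('Hi, 2 + 2!').map((token) => token.type).join(' '))
// word punctuation whiteSpace word whiteSpace symbol whiteSpace word punctuation
```

Note that `+` lands in the symbol class because Unicode categorizes it as a math symbol rather than punctuation.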

Then, it manipulates and merges those tokens into a syntax tree, adding sentences and paragraphs where needed.

  • some punctuation marks are part of the word they occur in, such as non-profit, she’s, G.I., 11:00, N/A, &c, nineteenth- and…
  • some periods do not mark a sentence end, such as 1., e.g., id.
  • although periods, question marks, and exclamation marks (sometimes) end a sentence, that end might not occur directly after the mark, such as .), ."
  • …and many more exceptions
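
As an illustration of the first kind of exception, the sketch below merges a single punctuation mark into a word when it sits directly between two word tokens. The set of marks, the flat token shape, and the `mergeInnerWordMarks` function are all assumptions made for this example. It also flattens the merged word into one string, which is a simplification compared with a real syntax tree, and it covers only a small part of what parse-latin handles.

```javascript
// Assumption: only these marks join two words in this sketch.
const innerWordMarks = new Set(['-', '’', "'", '/', ':'])

// Hypothetical helper: merge `word mark word` runs into a single word token.
function mergeInnerWordMarks(tokens) {
  const result = []

  for (let index = 0; index < tokens.length; index++) {
    const token = tokens[index]
    const previous = result[result.length - 1]
    const next = tokens[index + 1]

    if (
      token.type === 'punctuation' &&
      innerWordMarks.has(token.value) &&
      previous &&
      previous.type === 'word' &&
      next &&
      next.type === 'word'
    ) {
      // Glue the mark and the following word onto the previous word.
      previous.value += token.value + next.value
      index++ // Skip the word that was just merged.
    } else {
      // Copy the token so the input is left untouched.
      result.push({...token})
    }
  }

  return result
}

const merged = mergeInnerWordMarks([
  {type: 'word', value: 'she'},
  {type: 'punctuation', value: '’'},
  {type: 'word', value: 's'},
  {type: 'whiteSpace', value: ' '},
  {type: 'word', value: 'non'},
  {type: 'punctuation', value: '-'},
  {type: 'word', value: 'profit'},
  {type: 'punctuation', value: '.'}
])

console.log(merged.map((token) => token.value))
// [ 'she’s', ' ', 'non-profit', '.' ]
```

The trailing period stays separate because no word follows it, so it remains available for sentence-end detection.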

Types

This package is fully typed with TypeScript. It exports no additional types.

Compatibility

Projects maintained by me are compatible with maintained versions of Node.js.

When I cut a new major release, I drop support for unmaintained versions of Node. This means I try to keep the current release line, parse-latin@^7, compatible with Node.js 16.

Security

This package is safe.

Related

Contribute

Yes please! See How to Contribute to Open Source.

License

MIT © Titus Wormer

Current Tags

  • 0.1.0-rc.3                                ...           0.1.0-rc.3 (12 years ago)
  • 0.1.0-rc.4                                ...           0.1.0-rc.4 (12 years ago)
  • 7.0.0                                ...           latest (3 years ago)
  • 0.4.0-rc.1                                ...           next (11 years ago)

37 Versions

  • 7.0.0                                ...           3 years ago
  • 6.0.2                                ...           3 years ago
  • 6.0.1                                ...           3 years ago
  • 6.0.0                                ...           3 years ago
  • 5.0.1                                ...           3 years ago
  • 5.0.0                                ...           5 years ago
  • 4.3.0                                ...           5 years ago
  • 4.2.1                                ...           6 years ago
  • 4.2.0                                ...           7 years ago
  • 4.1.1                                ...           8 years ago
  • 4.1.0                                ...           9 years ago
  • 4.0.3                                ...           9 years ago
  • 4.0.2                                ...           9 years ago
  • 4.0.1                                ...           9 years ago
  • 4.0.0                                ...           9 years ago
  • 3.2.0                                ...           10 years ago
  • 3.1.1                                ...           10 years ago
  • 3.1.0                                ...           10 years ago
  • 3.0.0                                ...           10 years ago
  • 2.0.0                                ...           11 years ago
  • 1.0.0                                ...           11 years ago
  • 0.5.2                                ...           11 years ago
  • 0.5.1                                ...           11 years ago
  • 0.5.0                                ...           11 years ago
  • 0.4.2                                ...           11 years ago
  • 0.4.1                                ...           11 years ago
  • 0.4.0                                ...           11 years ago
  • 0.4.0-rc.2                                ...           11 years ago
  • 0.4.0-rc.1                                ...           11 years ago
  • 0.3.0                                ...           11 years ago
  • 0.3.0-rc.1                                ...           11 years ago
  • 0.2.0                                ...           11 years ago
  • 0.1.3                                ...           12 years ago
  • 0.1.2                                ...           12 years ago
  • 0.1.0                                ...           12 years ago
  • 0.1.0-rc.4                                ...           12 years ago
  • 0.1.0-rc.3                                ...           12 years ago
