natural-language-understanding
Here are 491 public repositories matching this topic...
The Split class accepts a SplitDelimiterBehavior, which is really useful. Punctuation, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).
impl PreTokenizer for Punctuation {
fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
pretokenized.split(|_, s| s.spl
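To make the difference between the two delimiter behaviors mentioned above concrete, here is a minimal pure-Python sketch. It is not the tokenizers library's implementation: `split_on_punctuation` and its `behavior` argument are hypothetical names used only for illustration, and the sketch assumes ASCII punctuation as the delimiter set.

```python
import re
import string


def split_on_punctuation(text, behavior="isolated"):
    """Hypothetical helper (not a library API): split on ASCII punctuation.

    behavior="isolated": each punctuation character becomes its own piece.
    behavior="removed":  punctuation characters are dropped from the output.
    """
    # Character class matching any single ASCII punctuation character.
    pattern = "[" + re.escape(string.punctuation) + "]"
    # A capturing group makes re.split keep the delimiters in the result;
    # empty strings produced between adjacent delimiters are filtered out.
    pieces = [p for p in re.split("(" + pattern + ")", text) if p]
    if behavior == "isolated":
        return pieces
    if behavior == "removed":
        return [p for p in pieces if not re.fullmatch(pattern, p)]
    raise ValueError("unknown behavior: " + repr(behavior))


isolated = split_on_punctuation("Hello, world!", behavior="isolated")
removed = split_on_punctuation("Hello, world!", behavior="removed")
print(isolated)
print(removed)
```

Making the behavior a parameter, as this sketch does, is roughly what the issue asks for: the caller chooses how delimiters are handled rather than the pre-tokenizer hard-coding one behavior.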
[Error Message] Improve error message in SentencepieceTokenizer when arguments are not expected.
Description
When calling tokenizers.create with the model and vocab files for a custom corpus, the code throws an error and cannot generate the BERT vocab file.
Error Message
ValueError: Mismatch vocabulary! All special tokens specified must be control tokens in the sentencepiece vocabulary.
To Reproduce
from gluonnlp.data import tokenizers
tokenizers.create('spm', model_p
Hello, I was thinking it would be a great help if I could get the start and end time offsets of each word.
Motivation
I was going through the Google Speech-to-Text documentation, found this feature, and thought it would be really amazing to have something similar here.