The Wayback Machine - https://web.archive.org/web/20210427024955/https://github.com/topics/natural-language-understanding

natural-language-understanding

Here are 491 public repositories matching this topic...

transformers
tokenizers
david-waterworth commented Feb 27, 2021

The Split pre-tokenizer accepts a SplitDelimiterBehavior, which is really useful. Punctuation, however, always uses SplitDelimiterBehavior::Isolated (and Whitespace, on the other hand, behaves like SplitDelimiterBehavior::Removed).

impl PreTokenizer for Punctuation {
    fn pre_tokenize(&self, pretokenized: &mut PreTokenizedString) -> Result<()> {
        pretokenized.split(|_, s| s.spl
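The distinction the issue draws can be illustrated in plain Python. This is a minimal sketch of the two delimiter behaviors mentioned above ("isolated" vs. "removed"); the function name and signature are invented for illustration and are not part of the tokenizers API.

```python
import re

def split_with_behavior(text, pattern, behavior):
    """Split `text` on `pattern`, handling matched delimiters per `behavior`.

    Illustrative semantics only (hypothetical helper, not the crate's API):
    - "removed":  delimiters are dropped (roughly what Whitespace does)
    - "isolated": each delimiter becomes its own token (Punctuation's
      hard-coded choice that the issue complains about)
    """
    pieces = []
    last = 0
    for m in re.finditer(pattern, text):
        # Emit the text between the previous delimiter and this one.
        if m.start() > last:
            pieces.append(text[last:m.start()])
        if behavior == "isolated":
            pieces.append(m.group())
        elif behavior != "removed":
            raise ValueError(f"unsupported behavior: {behavior}")
        last = m.end()
    if last < len(text):
        pieces.append(text[last:])
    return pieces

print(split_with_behavior("hello, world!", r"[,!]", "isolated"))
# → ['hello', ',', ' world', '!']
print(split_with_behavior("hello, world!", r"[,!]", "removed"))
# → ['hello', ' world']
```

With a configurable behavior parameter, Punctuation could offer the same flexibility that Split already has, which is what the issue requests.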
gluon-nlp
preeyank5 commented Dec 3, 2020

Description

While using tokenizers.create with the model and vocab file for a custom corpus, the code throws an error and fails to generate the BERT vocab file.

Error Message

ValueError: Mismatch vocabulary! All special tokens specified must be control tokens in the sentencepiece vocabulary.

To Reproduce

from gluonnlp.data import tokenizers
tokenizers.create('spm', model_p
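The error message says every special token passed in must appear in the sentencepiece vocabulary as a control token. A minimal sketch of that kind of validation is below; the function name and the dict-based vocabulary representation are assumptions for illustration, not gluon-nlp's actual internals.

```python
def check_special_tokens(vocab_types, special_tokens):
    """Raise the same kind of mismatch error the issue reports.

    vocab_types: dict mapping token -> sentencepiece piece type
                 (e.g. "control", "normal"); a hypothetical data shape,
                 the real check lives inside gluon-nlp / sentencepiece.
    """
    # A special token fails if it is missing or not flagged as "control".
    bad = [tok for tok in special_tokens
           if vocab_types.get(tok) != "control"]
    if bad:
        raise ValueError(
            "Mismatch vocabulary! All special tokens specified must be "
            f"control tokens in the sentencepiece vocabulary: {bad}")

vocab = {"<pad>": "control", "<unk>": "normal", "the": "normal"}
check_special_tokens(vocab, ["<pad>"])      # passes silently
# check_special_tokens(vocab, ["<cls>"])    # would raise ValueError
```

Under this reading, the fix on the user's side is to retrain the sentencepiece model with the desired special tokens declared as control symbols, or to pass only tokens the model already marks as control.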

Improve this page

Add a description, image, and links to the natural-language-understanding topic page so that developers can more easily learn about it.


Add this topic to your repo

To associate your repository with the natural-language-understanding topic, visit your repo's landing page and select "manage topics."
