huggingface-tokenizers

We use phonetics as a feature to create a joint semantic-phonetic embedding that improves neural machine translation between Chinese and Japanese. 🥳

translation word2vec machine-translation pytorch embeddings transformer seq2seq gensim neural-machine-translation phonetics jieba nmt rnn-pytorch pytorch-lightning wandb janome huggingface-tokenizers

Updated Aug 3, 2021
Jupyter Notebook

TarunSingh2002 / Youtube-Comments-Sentiment-Analysis

This project leverages deep learning transformers to classify YouTube comments into six distinct emotions.

nlp flask data-science machine-learning youtube deep-learning sentiment-analysis tensorflow transformer comments html-css-javascript sentiment-classification keras-tensorflow googletrans tflite-models huggingface-transformers huggingface-tokenizers pycld3 huggingfacespace

Updated Sep 10, 2024
Jupyter Notebook

miyako / text-splitter

Tool to split text into semantic chunks

huggingface-tokenizers tiktoken 4d-dependency

Updated Mar 30, 2026
Ruby

Kratugautam99 / Natural-Language-Processing-Practice

Natural Language Processing Practice — a hands‑on repository spanning the full spectrum of NLP, from classical algorithms to cutting‑edge large language models (LLMs). Built around the Hugging Face LLM Course, it’s enriched with practical notebooks on foundational libraries and advanced fine‑tuning workflows.

jupyter-notebook huggingface-transformers huggingface-tokenizers huggingface-accelerate huggingface-datasets huggingface-spaces huggingface-peft huggingface-trl pytorch-tensorflow gensim-fasttext bitsandbytes-gptq awq-unsloth numpy-scipy-pandas matplotlib-seaborn-sklearn nltk-spacy-word2vec argilla-gradio-wandb-marimo

Updated Mar 21, 2026
Jupyter Notebook

cesarsiuu2316 / Auto-Glossary-Builder-QA-Assistant

Automated glossary generation and QA assistant that extracts technical terms from text corpora using regex and Levenshtein clustering, tokenizes them with custom BPE, and generates definitions and examples using a local Ollama LLM, all accessible through a CLI interface.

nlp regex levenshtein-distance text-processing cli-tool huggingface-tokenizers local-llm ollama bpe-tokenizer glossary-builder

Updated Feb 20, 2026
Python

ocramz / tokenizers-hs

Huggingface tokenizers for Haskell

nlp tokenizer huggingface-tokenizers

Updated Jul 16, 2023
Rust

Steffo99 / unimore-bda-6-steffo

Sixth activity for Big Data Analytics

python sentiment-analysis tensorflow nltk unimore-informatica huggingface-tokenizers

Updated May 15, 2024
Python

MihranD / HuggingFace

Hugging Face Transformers offers a powerful framework for state-of-the-art NLP, with the Pipeline API for easy inference, tokenization for efficient preprocessing, and quantization for optimized deployment.

google-colab huggingface quantizing huggingface-tokenizers huggingface-pipelines huggingface-quantizing

Updated Dec 18, 2024

Here are 11 public repositories matching this topic...

daulet / tokenizers

dineshsoudagar / local-llms-on-android

georgian-io / Transformers-Domain-Adaptation

windsuzu / Joint-Semantic-Phonetic-Embedding

TarunSingh2002 / Youtube-Comments-Sentiment-Analysis

miyako / text-splitter

Kratugautam99 / Natural-Language-Processing-Practice

cesarsiuu2316 / Auto-Glossary-Builder-QA-Assistant

ocramz / tokenizers-hs

Steffo99 / unimore-bda-6-steffo

MihranD / HuggingFace
