<h1 align='center'> 🌴 dobbi 🦕 </h1> <p align='center'> Takes care of all of this boring NLP stuff <br> <br> <img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/dobbi"> <a href='https://pypi.org/project/dobbi/'><img alt="Version" src="https://img.shields.io/pypi/v/dobbi?logo=pypi"></a> <a href='https://opensource.org/licenses/Apache-2.0'><img alt="GitHub" src="https://img.shields.io/github/license/iaramer/dobbi"></a><br> </p>

Description

An open-source NLP library: fast text cleaning and preprocessing.

TL;DR

This library provides quick, ready-to-use tools for text cleaning and normalization. You can remove hashtags, nicknames, emoji, URLs, punctuation, extra whitespace and more.
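To make "cleaning" concrete, here is a minimal sketch of the kind of work involved, written with the standard `re` module only. It is not dobbi's implementation: the pattern names and the `basic_clean` helper are hypothetical, and the regexes are deliberately simplified.

```python
import re

# Hypothetical, simplified patterns; dobbi's own patterns may differ.
HASHTAG = re.compile(r"#\w+")
NICKNAME = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")
PUNCTUATION = re.compile(r"[^\w\s]")
WHITESPACE = re.compile(r"\s+")


def basic_clean(text: str) -> str:
    """Remove hashtags, nicknames, URLs and punctuation, then collapse whitespace."""
    # URLs are removed before punctuation; otherwise "https://..." would be
    # split into fragments that the URL pattern no longer matches.
    for pattern in (HASHTAG, NICKNAME, URL, PUNCTUATION):
        text = pattern.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


print(basic_clean("#fun #lol   Why  @Alex33 is so funny here: https://some-url.com"))
# Why is so funny here
```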

Installation

To install dobbi, either fork this GitHub repository or install it from PyPI via pip:

$ pip install dobbi

Usage

Import the library:

import dobbi

Interaction

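The library uses method chaining: each call adds one processing step, and the chain ends by returning a function that can be applied to a string or mapped over a pandas column: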

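import dobbi
import pandas as pd

# Sample data: two noisy social-media messages
d = {'text': ['#fun #lol   Why  @Alex33 is so funny here: https://some-url.com',
              '#looool     =)      😍 such lovely!?*!!!%&']}
df = pd.DataFrame(d)

# Build a cleaning function that removes hashtags, nicknames and URLs
cln_func = dobbi.clean() \
    .hashtag() \
    .nickname() \
    .url() \
    .function()
df['text'] = df['text'].map(cln_func)

# Build a replacement function for emoji, emoticons and punctuation
repl_func = dobbi.replace() \
    .emoji() \
    .emoticon() \
    .punctuation() \
    .function()
df['text'] = df['text'].map(repl_func)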

Result:

print(df['text'][0])  # 'Why is so funny here'
print(df['text'][1])  # 'TOKEN_EMOTICON_HAPPY_FACE_OR_SMILEY TOKEN_EMOJI_SMILING_FACE_WITH_HEART_EYES such lovely'
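The chaining style works because every intermediate method returns the same object, and the terminal `.function()` turns the collected steps into a plain callable, which is why it can be passed to `Series.map`. The sketch below imitates that pattern with a hypothetical `Pipeline` class built on `re`; it is an illustration of the design, not dobbi's source code, and it maps over a plain list so it runs without pandas.

```python
from __future__ import annotations

import re
from typing import Callable


class Pipeline:
    """Hypothetical imitation of a chainable text-cleaning builder."""

    def __init__(self, replacement: str = " ") -> None:
        self._replacement = replacement
        self._steps: list[re.Pattern[str]] = []

    def _add(self, pattern: str) -> Pipeline:
        self._steps.append(re.compile(pattern))
        return self  # returning self is what makes the calls chainable

    def hashtag(self) -> Pipeline:
        return self._add(r"#\w+")

    def nickname(self) -> Pipeline:
        return self._add(r"@\w+")

    def url(self) -> Pipeline:
        return self._add(r"https?://\S+")

    def function(self) -> Callable[[str], str]:
        # Snapshot the steps so later chaining on this builder does not alter the result.
        steps = tuple(self._steps)
        replacement = self._replacement

        def apply(text: str) -> str:
            for pattern in steps:
                text = pattern.sub(replacement, text)
            # Collapse the gaps left behind by removed entities.
            return " ".join(text.split())

        return apply


cln_func = Pipeline().hashtag().nickname().url().function()
texts = ["#fun #lol   Why  @Alex33 is so funny here: https://some-url.com"]
print(list(map(cln_func, texts)))  # ['Why is so funny here:']
```

The colon survives here because this hypothetical pipeline has no punctuation step; in the README example, punctuation is handled by the second chain.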

Supported methods and patterns

The process consists of three stages:

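Initialization methods: create a dobbi Work object (for example, dobbi.clean() or dobbi.replace())
Intermediate methods: chain the patterns in the order they should be applied
Terminal methods: choose whether you need a reusable function (for example, .function()) or a processed result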
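Why the order of intermediate methods matters can be shown with a small experiment. In this sketch (hypothetical helper `make_function` and hypothetical step tuples, not dobbi's API), replacing emoticons before stripping punctuation keeps the emoticon as a token, while the reverse order destroys it first. The token name only imitates the style of the README output.

```python
from __future__ import annotations

import re
from typing import Callable


def make_function(steps: list[tuple[re.Pattern[str], str]]) -> Callable[[str], str]:
    """Terminal stage: freeze an ordered list of (pattern, replacement) steps into a callable."""
    frozen = tuple(steps)

    def apply(text: str) -> str:
        for pattern, replacement in frozen:
            text = pattern.sub(replacement, text)
        return " ".join(text.split())  # collapse leftover whitespace

    return apply


# Hypothetical intermediate steps.
emoticon = (re.compile(r"=\)|:\)"), " TOKEN_EMOTICON_HAPPY_FACE_OR_SMILEY ")
punctuation = (re.compile(r"[^\w\s]"), " ")  # "_" counts as \w, so tokens survive

text = "=)      such lovely!?*!!!%&"
emoticon_first = make_function([emoticon, punctuation])
punctuation_first = make_function([punctuation, emoticon])

print(emoticon_first(text))     # TOKEN_EMOTICON_HAPPY_FACE_OR_SMILEY such lovely
print(punctuation_first(text))  # such lovely
```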
