Classifiers

gsitk offers a common interface for implementing classifiers that is compatible with scikit-learn predictors. A classifier is a model that makes predictions; it can either be trained first or come ready to use.
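As a rough illustration of what a scikit-learn-style predictor looks like, the sketch below defines a toy class exposing fit and predict. KeywordClassifier and its positive_words parameter are hypothetical names invented for this example; they are not part of gsitk, and the sketch only assumes that "compatible with scikit-learn predictors" refers to this fit/predict convention.

```python
class KeywordClassifier:
    """Hypothetical toy classifier (not part of gsitk) with fit/predict methods."""

    def __init__(self, positive_words):
        self.positive_words = set(positive_words)

    def fit(self, X, y=None):
        # Nothing is learned in this toy example; returning self follows
        # the scikit-learn convention so calls can be chained.
        return self

    def predict(self, X):
        # X is a list of tokenized documents (lists of words).
        # Predict 1.0 if any positive word appears, otherwise 0.0.
        return [
            1.0 if any(word in self.positive_words for word in doc) else 0.0
            for doc in X
        ]


clf = KeywordClassifier(['good', 'happy']).fit([])
preds = clf.predict([['a', 'good', 'day'], ['nothing', 'special']])
print(preds)
```

A classifier that is "already prepared" corresponds here to a fit method with nothing to learn; a trainable one would store what it learns in fit and use it in predict.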

Currently, gsitk has two classifier types:

Use of Lexicon

Sums the lexicon annotations of the words in a given document, following the lexicon's annotation schema. The output is normalized to one of three values: -1, 0, or 1. The following example shows its use:

from gsitk.classifiers import LexiconSum

# use a custom lexicon
ls = LexiconSum({'good': 1, 'bad': -1, 'happy': 1, 'sad': -1, 'mildly': -0.1})

text = [
    ['my', 'dog', 'is', 'a', 'good', 'and', 'happy', 'pet'],
    ['my', 'cat', 'is', 'not', 'sad', 'just', 'mildly', 'bad'],
    ['not', 'happy', 'nor', 'sad'],
]

ls.predict(text)
# output
array([ 1., -1.,  0.])
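The summing and normalization described above can be sketched in plain Python. This is not gsitk's source code: lexicon_sum_sign is a hypothetical helper, and scoring words absent from the lexicon as 0 is an assumption. It reuses the lexicon and documents from the example above.

```python
def lexicon_sum_sign(tokens, lexicon):
    """Sum the lexicon scores of the tokens and reduce the total to -1, 0, or 1."""
    # Assumption: words that are not in the lexicon contribute 0.
    total = sum(lexicon.get(word, 0.0) for word in tokens)
    if total > 0:
        return 1.0
    if total < 0:
        return -1.0
    return 0.0


lexicon = {'good': 1, 'bad': -1, 'happy': 1, 'sad': -1, 'mildly': -0.1}

docs = [
    ['my', 'dog', 'is', 'a', 'good', 'and', 'happy', 'pet'],
    ['my', 'cat', 'is', 'not', 'sad', 'just', 'mildly', 'bad'],
    ['not', 'happy', 'nor', 'sad'],
]

scores = [lexicon_sum_sign(doc, lexicon) for doc in docs]
print(scores)  # [1.0, -1.0, 0.0]
```

The second document scores -1 because the sum ignores the negation "not": 'sad', 'mildly', and 'bad' add up to -2.1. In the third document 'happy' and 'sad' cancel out, giving 0.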

Vader

Wraps the original author's implementation of VADER. Unlike LexiconSum, this classifier takes raw strings, so the text does not need to be tokenized, as the following example shows:

from gsitk.classifiers import VaderClassifier

text = [
    'my dog is a good and happy pet',
    'my cat is not sad just mildly bad',
    'not happy nor sad',
]

VaderClassifier().predict(text)
# output
array([ 1.,  0., -1.])