Classifiers
gsitk offers a common interface for implementing classifiers that is compatible with scikit-learn predictors. A classifier is a model that can either be trained or come already prepared to make predictions.
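To illustrate what "compatible with scikit-learn predictors" usually means, here is a minimal sketch of that convention: `fit(X, y)` learns from data and returns `self`, and `predict(X)` returns one prediction per input document. `MajorityClassifier` is a hypothetical class written only for this illustration; it is not part of gsitk, and gsitk's own base classes may differ.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


class MajorityClassifier:
    """Hypothetical example (not part of gsitk): always predicts the most frequent training label."""

    def __init__(self) -> None:
        # Fitted state uses a trailing underscore, following scikit-learn naming.
        self.majority_: Optional[Hashable] = None

    def fit(self, X: Sequence, y: Sequence[Hashable]) -> "MajorityClassifier":
        # Learn from (X, y) and return self so that calls can be chained.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: Sequence) -> List[Hashable]:
        # Return exactly one prediction per input document.
        if self.majority_ is None:
            raise RuntimeError("call fit() before predict()")
        return [self.majority_ for _ in X]


clf = MajorityClassifier().fit(["good day", "bad day", "fine day"], [1, -1, 1])
print(clf.predict(["any text", "another text"]))  # [1, 1]
```

A model that is "already prepared", such as a lexicon-based one, can skip training and answer `predict` directly.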
Currently, gsitk has two classifier types:
LexiconSum
: designed for simple use of an annotated lexicon.
VaderClassifier
: a wrapper for the popular Vader sentiment analysis classifier.
Use of Lexicon
Sums the lexicon annotations of the words in a given document, so the scores follow the lexicon's own annotation schema. The sum is then normalized to one of the values -1, 0, or 1. The following example shows its use:
from gsitk.classifiers import LexiconSum
# use a custom lexicon that maps words to polarity scores
ls = LexiconSum({'good': 1, 'bad': -1, 'happy': 1, 'sad': -1, 'mildly': -0.1})
text = [
['my', 'dog', 'is', 'a', 'good', 'and', 'happy', 'pet'],
['my', 'cat', 'is', 'not', 'sad', 'just', 'mildly', 'bad'],
['not', 'happy', 'nor', 'sad'],
]
ls.predict(text)
# output
array([ 1., -1., 0.])
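The arithmetic behind this output can be reproduced with a short sketch. It assumes that words missing from the lexicon contribute 0 and that normalization keeps only the sign of the sum; `lexicon_sum_sketch` is a hypothetical helper for illustration, not gsitk's actual implementation.

```python
from typing import Dict, List, Sequence


def lexicon_sum_sketch(docs: Sequence[Sequence[str]], lexicon: Dict[str, float]) -> List[int]:
    """Hypothetical sketch: sum lexicon scores per document, then keep only the sign."""
    labels = []
    for tokens in docs:
        # Assumption: words not in the lexicon contribute 0 to the sum.
        total = sum(lexicon.get(token, 0.0) for token in tokens)
        # Map the sum to -1, 0 or 1 according to its sign.
        labels.append((total > 0) - (total < 0))
    return labels


lexicon = {'good': 1, 'bad': -1, 'happy': 1, 'sad': -1, 'mildly': -0.1}
docs = [
    ['my', 'dog', 'is', 'a', 'good', 'and', 'happy', 'pet'],   # 1 + 1 = 2
    ['my', 'cat', 'is', 'not', 'sad', 'just', 'mildly', 'bad'],  # -1 - 0.1 - 1 = -2.1
    ['not', 'happy', 'nor', 'sad'],                             # 1 - 1 = 0
]
print(lexicon_sum_sketch(docs, lexicon))  # [1, -1, 0]
```

The sketch also shows why the second sentence ends up negative: "not" has no entry in the lexicon, so "sad" keeps its negative score.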
Vader
A wrapper around the original Vader implementation. Unlike LexiconSum, it does not require tokenized text: it takes raw strings, as seen in the following example:
from gsitk.classifiers import VaderClassifier
text = [
'my dog is a good and happy pet',
'my cat is not sad just mildly bad',
'not happy nor sad',
]
VaderClassifier().predict(text)
# output
array([ 1., 0., -1.])
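The original Vader package (`vaderSentiment`) reports, through `SentimentIntensityAnalyzer().polarity_scores(text)`, a `compound` score in [-1, 1], and its authors suggest treating scores of at least 0.05 as positive, scores of at most -0.05 as negative, and anything in between as neutral. Assuming the wrapper discretizes the compound score in a similar way (this is not confirmed here), the mapping could look like the sketch below. The helper names and the input scores are hypothetical and chosen only for illustration; they are not produced by Vader.

```python
from typing import List, Sequence


def compound_to_label(compound: float, threshold: float = 0.05) -> int:
    """Assumed mapping from a Vader compound score in [-1, 1] to -1, 0 or 1."""
    if compound >= threshold:
        return 1
    if compound <= -threshold:
        return -1
    return 0


def labels_from_compound(scores: Sequence[float], threshold: float = 0.05) -> List[int]:
    # Apply the same threshold to every document's compound score.
    return [compound_to_label(score, threshold) for score in scores]


# Hypothetical compound scores, for illustration only.
print(labels_from_compound([0.75, 0.02, -0.4]))  # [1, 0, -1]
```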