public static class PTBTokenizer.PTBTokenizerFactory<T extends HasWord> extends Object implements TokenizerFactory<T>

Type Parameters: T - The class of the returned tokens

See PTBTokenizer for details of the parameters and options.

See Also: PTBTokenizer, Serialized Form

| Modifier and Type | Field and Description |
|---|---|
| protected LexedTokenFactory<T> | factory |
| protected String | options |

| Modifier and Type | Method and Description |
|---|---|
| Iterator<T> | getIterator(Reader r) Returns a tokenizer wrapping the given Reader. |
| Tokenizer<T> | getTokenizer(Reader r) Returns a tokenizer wrapping the given Reader. |
| Tokenizer<T> | getTokenizer(Reader r, String extraOptions) Get a tokenizer for this reader. |
| static PTBTokenizer.PTBTokenizerFactory<CoreLabel> | newCoreLabelTokenizerFactory(String options) Constructs a new PTBTokenizerFactory whose tokenizers return CoreLabel objects and use the options passed in. |
| static PTBTokenizer.PTBTokenizerFactory<CoreLabel> | newPTBTokenizerFactory(boolean tokenizeNLs, boolean invertible) |
| static <T extends HasWord> PTBTokenizer.PTBTokenizerFactory<T> | newPTBTokenizerFactory(LexedTokenFactory<T> tokenFactory, String options) Constructs a new PTBTokenizerFactory whose tokenizers use the LexedTokenFactory and options passed in. |
| static TokenizerFactory<Word> | newTokenizerFactory() Constructs a new TokenizerFactory that returns Word objects and treats carriage returns as normal whitespace. |
| static PTBTokenizer.PTBTokenizerFactory<Word> | newWordTokenizerFactory(String options) Constructs a new PTBTokenizerFactory whose tokenizers return Word objects and use the options passed in. |
| void | setOptions(String options) Sets default options for how tokenizers built from this factory should behave. |

protected final LexedTokenFactory<T extends HasWord> factory
protected String options

public static TokenizerFactory<Word> newTokenizerFactory()

public static PTBTokenizer.PTBTokenizerFactory<Word> newWordTokenizerFactory(String options)
Parameters: options - A String of options

public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newCoreLabelTokenizerFactory(String options)
Parameters: options - A String of options. For the default, recommended options for PTB-style tokenization compatibility, pass in an empty String.

public static <T extends HasWord> PTBTokenizer.PTBTokenizerFactory<T> newPTBTokenizerFactory(LexedTokenFactory<T> tokenFactory, String options)
Parameters:
tokenFactory - The LexedTokenFactory
options - A String of options

public static PTBTokenizer.PTBTokenizerFactory<CoreLabel> newPTBTokenizerFactory(boolean tokenizeNLs, boolean invertible)
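
The boolean overload of newPTBTokenizerFactory takes two flags instead of an option string. As a rough illustration only, the sketch below shows one way such flags could be folded into a single option string. OptionStringSketch and buildOptions are hypothetical names, and both the comma-separated format and the use of the parameter names as option names are assumptions, not behavior stated on this page.

```java
import java.util.StringJoiner;

// Hypothetical helper for illustration only; it is not part of the library.
class OptionStringSketch {

    // Assumption: an option string is a comma-separated list of flag names,
    // and each enabled flag contributes its own name.
    static String buildOptions(boolean tokenizeNLs, boolean invertible) {
        StringJoiner joiner = new StringJoiner(",");
        if (tokenizeNLs) {
            joiner.add("tokenizeNLs");
        }
        if (invertible) {
            joiner.add("invertible");
        }
        // With no flags set this is the empty string, which the parameter
        // notes above describe as the default for newCoreLabelTokenizerFactory.
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildOptions(true, true));   // prints tokenizeNLs,invertible
        System.out.println(buildOptions(false, true));  // prints invertible
    }
}
```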

public Iterator<T> getIterator(Reader r)
Specified by: getIterator in interface IteratorFromReaderFactory<T extends HasWord>
Parameters: r - Where to read objects from

public Tokenizer<T> getTokenizer(Reader r)
Specified by: getTokenizer in interface TokenizerFactory<T extends HasWord>
Parameters: r - A Reader (which is assumed to already be buffered, if appropriate)

public Tokenizer<T> getTokenizer(Reader r, String extraOptions)
Description copied from interface: TokenizerFactory
Get a tokenizer for this reader.
Specified by: getTokenizer in interface TokenizerFactory<T extends HasWord>
Parameters:
r - A Reader (which is assumed to already be buffered, if appropriate)
extraOptions - Options for how this tokenizer should behave

public void setOptions(String options)
Description copied from interface: TokenizerFactory
Sets default options for how tokenizers built from this factory should behave.
Specified by: setOptions in interface TokenizerFactory<T extends HasWord>
Parameters: options - Options for how this tokenizer should behave
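
The methods documented here follow a factory pattern: the factory holds default options, and each getTokenizer call wraps a fresh Reader, optionally with extra per-call options. The sketch below models that contract with self-contained stand-in types so it runs without the library. FactorySketch, SketchFactory, SketchTokenizer and the "lowercase" option are all hypothetical, the whitespace splitting is not PTB tokenization, and appending extraOptions to the defaults with a comma is an assumption about how the two option sources combine.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for illustration only; these are NOT the library classes.
class FactorySketch {

    /** Stand-in tokenizer: splits on whitespace; the made-up "lowercase" option lowercases tokens. */
    static final class SketchTokenizer {
        private final Reader reader;
        private final String options;

        SketchTokenizer(Reader reader, String options) {
            this.reader = reader;
            this.options = options == null ? "" : options;
        }

        List<String> tokenize() {
            // Read the whole input; the Reader is assumed to be buffered already.
            StringBuilder text = new StringBuilder();
            try {
                int c;
                while ((c = reader.read()) != -1) {
                    text.append((char) c);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            boolean lower = Arrays.asList(options.split(",")).contains("lowercase");
            List<String> tokens = new ArrayList<>();
            for (String piece : text.toString().trim().split("\\s+")) {
                if (!piece.isEmpty()) {
                    tokens.add(lower ? piece.toLowerCase(Locale.ROOT) : piece);
                }
            }
            return tokens;
        }
    }

    /** Stand-in factory mirroring getTokenizer(Reader), getTokenizer(Reader, String) and setOptions(String). */
    static final class SketchFactory {
        private String options;

        SketchFactory(String options) {
            this.options = options;
        }

        // Changes the defaults used by tokenizers created after this call.
        void setOptions(String options) {
            this.options = options;
        }

        SketchTokenizer getTokenizer(Reader r) {
            return new SketchTokenizer(r, options);
        }

        // Assumption: per-call options are appended to the factory defaults.
        SketchTokenizer getTokenizer(Reader r, String extraOptions) {
            if (options == null || options.isEmpty()) {
                return new SketchTokenizer(r, extraOptions);
            }
            return new SketchTokenizer(r, options + "," + extraOptions);
        }
    }

    public static void main(String[] args) {
        SketchFactory factory = new SketchFactory("");
        // prints [Hello, World]
        System.out.println(factory.getTokenizer(new StringReader("Hello World")).tokenize());
        // prints [hello, world]
        System.out.println(factory.getTokenizer(new StringReader("Hello World"), "lowercase").tokenize());
    }
}
```

Keeping the options on the factory means one configured instance can be shared and reused for many inputs, while the two-argument getTokenizer allows a one-off override without touching the shared defaults.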