public abstract class Treebank extends AbstractCollection<Tree>
A Treebank object provides access to a corpus of examples with
given tree structures.
This class now implements the Collection interface. However, it may offer
less than the full power of the Collection interface: some Treebanks are
read only, and so may throw the UnsupportedOperationException.

| Modifier and Type | Field and Description |
|---|---|
| static String | DEFAULT_TREE_FILE_SUFFIX |
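
The read-only behaviour mentioned in the class description can be illustrated with plain java.util types. The following is only a sketch: ReadOnlyBank is a hypothetical stand-in (not part of this API) that stores strings instead of Tree objects, and it assumes that a read-only Treebank behaves like any other AbstractCollection whose mutators are unsupported.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class ReadOnlyCollectionSketch {

    /** Hypothetical read-only collection; elements are bracketed tree strings. */
    static class ReadOnlyBank extends AbstractCollection<String> {
        private final List<String> items;

        ReadOnlyBank(List<String> items) {
            this.items = Collections.unmodifiableList(items);
        }

        @Override
        public Iterator<String> iterator() {
            // The iterator of an unmodifiable list rejects remove() as well.
            return items.iterator();
        }

        @Override
        public int size() {
            return items.size();
        }

        @Override
        public boolean remove(Object o) {
            throw new UnsupportedOperationException("read-only collection");
        }
        // add(...) is not overridden: AbstractCollection.add already throws
        // UnsupportedOperationException by default.
    }

    public static void main(String[] args) {
        ReadOnlyBank bank = new ReadOnlyBank(
                Arrays.asList("(S (NP a) (VP b))", "(S (NP c) (VP d))"));
        System.out.println("size = " + bank.size());
        try {
            bank.remove("(S (NP a) (VP b))");
        } catch (UnsupportedOperationException e) {
            System.out.println("remove is not supported");
        }
    }
}
```

Callers that receive a Collection of this kind should be prepared to catch UnsupportedOperationException, or avoid mutating methods altogether.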

| Constructor and Description |
|---|
| Treebank(): Create a new Treebank (using a LabeledScoredTreeReaderFactory). |
| Treebank(int initialCapacity): Create a new Treebank. |
| Treebank(int initialCapacity, TreeReaderFactory trf): Create a new Treebank. |
| Treebank(TreeReaderFactory trf): Create a new Treebank. |
| Treebank(TreeReaderFactory trf, String encoding): Create a new Treebank. |

| Modifier and Type | Method and Description |
|---|---|
| abstract void | apply(TreeVisitor tp): Apply a TreeVisitor to each tree in the Treebank. |
| abstract void | clear(): Empty a Treebank. |
| void | decimate(Writer trainW, Writer devW, Writer testW): Divide a Treebank into 3, by taking every 9th sentence for the dev set and every 10th for the test set. |
| String | encoding(): Returns the encoding in use for treebank file bytestream access. |
| void | loadPath(File path): Load a sequence of trees from given file or directory and its subdirectories. |
| abstract void | loadPath(File path, FileFilter filt): Load trees from given path specification. |
| void | loadPath(File path, String suffix, boolean recursively): Load trees from given directory. |
| void | loadPath(String pathName): Load a sequence of trees from given directory and its subdirectories. |
| void | loadPath(String pathName, FileFilter filt): Load a sequence of trees from given directory and its subdirectories which match the file filter. |
| void | loadPath(String pathName, String suffix, boolean recursively): Load trees from given directory. |
| boolean | remove(Object o): This operation isn't supported for a Treebank. |
| int | size(): Returns the size of the Treebank. |
| String | textualSummary(): Return various statistics about the treebank (number of sentences, words, tag set, etc.). |
| String | textualSummary(TreebankLanguagePack tlp): Return various statistics about the treebank (number of sentences, words, tag set, etc.). |
| String | toString(): Return the whole treebank as a series of big bracketed lists. |
| Treebank | transform(TreeTransformer treeTrans): Return a Treebank (actually a TransformingTreebank) where each Tree in the current treebank has been transformed using the TreeTransformer. |
| TreeReaderFactory | treeReaderFactory(): Get the TreeReaderFactory for a Treebank; this method is provided in order to make the TreeReaderFactory available to subclasses. |
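
The loadPath overloads that take a suffix and a recursively flag select files by extension, while the FileFilter overloads leave the selection to the caller. As a rough sketch of how the two relate, the hypothetical helper below (suffixFilter is not part of this API) builds a java.io.FileFilter that accepts files ending in "." plus the suffix, accepts every file when the suffix is null, and accepts directories only when recursion is requested. The "mrg" extension is used purely as an example value.

```java
import java.io.File;
import java.io.FileFilter;

public class SuffixFilterSketch {

    /**
     * Hypothetical helper: a FileFilter approximating the documented
     * suffix/recursively behaviour. This is an assumption about how such a
     * filter could look, not the library's own implementation.
     */
    static FileFilter suffixFilter(String suffix, boolean recursively) {
        return file -> {
            if (file.isDirectory()) {
                // Directories are accepted only when descending into them.
                return recursively;
            }
            // A null suffix means "all files"; otherwise match ".suffix".
            return suffix == null || file.getName().endsWith("." + suffix);
        };
    }

    public static void main(String[] args) {
        FileFilter filt = suffixFilter("mrg", true);
        System.out.println(filt.accept(new File("wsj_0001.mrg")));
        System.out.println(filt.accept(new File("README.txt")));
    }
}
```

A filter built this way could be handed to loadPath(String pathName, FileFilter filt) or loadPath(File path, FileFilter filt) when the selection rule is more involved than a single extension.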
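
Methods inherited from class java.util.AbstractCollection: add, addAll, contains, containsAll, isEmpty, iterator, removeAll, retainAll, toArray, toArray

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface java.util.Collection: equals, hashCode, parallelStream, removeIf, spliterator, stream

public static final String DEFAULT_TREE_FILE_SUFFIX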

public Treebank()

public Treebank(TreeReaderFactory trf)
trf - the factory class to be called to create a new TreeReader

public Treebank(TreeReaderFactory trf, String encoding)
trf - the factory class to be called to create a new TreeReader
encoding - The charset encoding to use for treebank file decoding

public Treebank(int initialCapacity)
initialCapacity - The initial size of the underlying Collection (if a Collection-based storage mechanism is being provided)

public Treebank(int initialCapacity, TreeReaderFactory trf)
initialCapacity - The initial size of the underlying Collection (if a Collection-based storage mechanism is being provided)
trf - the factory class to be called to create a new TreeReader

public TreeReaderFactory treeReaderFactory()
Get the TreeReaderFactory for a Treebank; this method is provided in order to make the TreeReaderFactory available to subclasses.

public String encoding()

public abstract void clear()
Empty a Treebank.
clear in interface Collection<Tree>
clear in class AbstractCollection<Tree>

public void loadPath(String pathName)
pathName - file or directory name

public void loadPath(File path)
path - File specification

public void loadPath(String pathName, String suffix, boolean recursively)
pathName - File or directory name
suffix - Extension of files to load: If pathName is a directory, then, if this is non-null, all and only files ending in "." followed by this extension will be loaded; if it is null, all files in directories will be loaded. If pathName is not a directory, this parameter is ignored.
recursively - descend into subdirectories as well

public void loadPath(File path, String suffix, boolean recursively)
path - file or directory to load from
suffix - suffix of files to load
recursively - descend into subdirectories as well

public void loadPath(String pathName, FileFilter filt)
pathName - file or directory name
filt - A filter used to determine which files match

public abstract void loadPath(File path, FileFilter filt)
path - file or directory to load from
filt - a FileFilter selecting the files to load

public abstract void apply(TreeVisitor tp)
tp - The TreeVisitor to be applied

public Treebank transform(TreeTransformer treeTrans)
treeTrans - The TreeTransformer to use

public String toString()
toString in class AbstractCollection<Tree>

public int size()
size in interface Collection<Tree>
size in class AbstractCollection<Tree>

public void decimate(Writer trainW, Writer devW, Writer testW)
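
The split performed by decimate (every 9th sentence to the dev set, every 10th to the test set) can be read as a repeating block of ten: the first eight trees go to training, the ninth to dev, and the tenth to test. The sketch below follows that reading, which is an assumption about the exact indexing rather than a confirmed description of the implementation. It uses plain strings in place of Tree objects, and the class DecimateSketch and its static decimate helper are hypothetical names.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class DecimateSketch {

    /**
     * Hypothetical helper: writes each tree (one per line) to the train,
     * dev, or test writer according to its position in a block of ten.
     */
    static void decimate(Iterable<String> trees, Writer trainW, Writer devW, Writer testW)
            throws IOException {
        int position = 0; // position within the current block of ten: 0..9
        for (String tree : trees) {
            Writer target;
            if (position < 8) {
                target = trainW;      // 1st through 8th tree of the block
            } else if (position == 8) {
                target = devW;        // 9th tree of the block
            } else {
                target = testW;       // 10th tree of the block
            }
            target.write(tree);
            target.write("\n");
            position = (position + 1) % 10;
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> trees = new ArrayList<>();
        for (int i = 1; i <= 20; i++) {
            trees.add("(S tree" + i + ")");
        }
        StringWriter train = new StringWriter();
        StringWriter dev = new StringWriter();
        StringWriter test = new StringWriter();
        decimate(trees, train, dev, test);
        System.out.print(dev);  // trees 9 and 19
        System.out.print(test); // trees 10 and 20
    }
}
```

Under this reading the result is an 80/10/10 split that preserves corpus order within each of the three outputs.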

public String textualSummary()

public String textualSummary(TreebankLanguagePack tlp)
tlp - The TreebankLanguagePack used to determine punctuation and an appropriate character encoding

public boolean remove(Object o)
remove in interface Collection<Tree>
remove in class AbstractCollection<Tree>