Code for the Text Engineering seminar (see the seminar schedule).

| package ir (Information Retrieval) | Contents | Resources/Dependencies | Literature |
|---|---|---|---|
| basic | Corpus, linear search, term-document matrix | Shakespeare | IIR ch. 1 |
| boole | Inverted index, list intersection, preprocessing, positional index, PositionalIntersect | | IIR ch. 1 + 2 |
| ranked | Ranked retrieval: term weighting, vector space model | | IIR ch. 6 + 7 |
| evaluation | Evaluation: precision, recall, F-measure | | IIR ch. 8 |
| lucene | Lucene: indexer and searcher | lucene-core, lucene-queryparser, lucene-analyzers-common | *Lucene in Action* |
| web | Crawler, WebDocument | commons-io, nekohtml, jrobotx | IIR ch. 19 + 20 |

| package tm (Text Mining) | Contents | Resources/Dependencies | Literature |
|---|---|---|---|
| document | Document, Topics, TermIndex, FeatureVector | | |
| corpus | Corpus, DB, DocumentIndex, Crawler | db4o, crawler (see package ir.web) | |
| classification | TextClassifier, Naive Bayes | | |

IIR: Manning, Raghavan, Schütze: *Introduction to Information Retrieval*, Cambridge University Press, 2008.
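The `boole` package covers an inverted index and the intersection of postings lists (IIR ch. 1). The following is a minimal sketch of that idea, not the seminar's actual code: the class `BooleanIndexSketch` and its methods are hypothetical, and the preprocessing (lowercasing, splitting on non-letters) is a deliberately naive assumption. Postings lists are kept sorted by document ID so that an AND query can be answered with a single linear merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical class name; not taken from the seminar's ir.boole package.
class BooleanIndexSketch {

    // Builds term -> sorted postings list (document IDs = positions in the list)
    static Map<String, List<Integer>> buildIndex(List<String> docs) {
        Map<String, TreeSet<Integer>> tmp = new TreeMap<>();
        for (int id = 0; id < docs.size(); id++) {
            // Naive preprocessing (assumption): lowercase, split on non-letters
            for (String token : docs.get(id).toLowerCase().split("\\P{L}+")) {
                if (token.isEmpty()) continue;
                tmp.computeIfAbsent(token, t -> new TreeSet<>()).add(id);
            }
        }
        Map<String, List<Integer>> index = new TreeMap<>();
        tmp.forEach((term, ids) -> index.put(term, new ArrayList<>(ids)));
        return index;
    }

    // Linear merge of two sorted postings lists (AND query), O(x + y)
    static List<Integer> intersect(List<Integer> p1, List<Integer> p2) {
        List<Integer> answer = new ArrayList<>();
        int i = 0, j = 0;
        while (i < p1.size() && j < p2.size()) {
            int d1 = p1.get(i), d2 = p2.get(j);
            if (d1 == d2) {
                answer.add(d1);
                i++;
                j++;
            } else if (d1 < d2) {
                i++; // advance the list with the smaller docID
            } else {
                j++;
            }
        }
        return answer;
    }
}
```

For the documents "Brutus killed Caesar", "Caesar was ambitious" and "Brutus and Caesar", the query `brutus AND caesar` intersects `[0, 2]` with `[0, 1, 2]` and returns `[0, 2]`.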
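The `boole` package also lists a positional index and PositionalIntersect (IIR ch. 2). A minimal sketch, assuming whitespace tokenization and a hypothetical class `PositionalIndexSketch`: each term maps to docID -> ascending token positions, and the intersection returns `{docID, pos1, pos2}` triples where the two terms occur within `k` tokens of each other. It follows the sliding-window idea of the IIR PositionalIntersect pseudocode, but looks up matching docIDs in a map instead of walking both postings lists in parallel.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names are illustrative, not the seminar's API.
class PositionalIndexSketch {

    // term -> (docID -> ascending token positions), whitespace tokenization
    static Map<String, TreeMap<Integer, List<Integer>>> build(List<String> docs) {
        Map<String, TreeMap<Integer, List<Integer>>> index = new TreeMap<>();
        for (int id = 0; id < docs.size(); id++) {
            String[] tokens = docs.get(id).toLowerCase().trim().split("\\s+");
            for (int pos = 0; pos < tokens.length; pos++) {
                index.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                     .computeIfAbsent(id, d -> new ArrayList<>())
                     .add(pos);
            }
        }
        return index;
    }

    // Returns {docID, pos1, pos2} triples with |pos1 - pos2| <= k
    static List<int[]> positionalIntersect(TreeMap<Integer, List<Integer>> p1,
                                           TreeMap<Integer, List<Integer>> p2, int k) {
        List<int[]> answer = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> e : p1.entrySet()) {
            List<Integer> pp2 = p2.get(e.getKey());
            if (pp2 == null) continue; // docID does not contain term 2
            Deque<Integer> window = new ArrayDeque<>();
            int j = 0; // never reset: positions only move forward
            for (int pos1 : e.getValue()) {
                // Collect positions of term 2 that lie within k of pos1
                while (j < pp2.size()) {
                    int pos2 = pp2.get(j);
                    if (Math.abs(pos1 - pos2) <= k) window.addLast(pos2);
                    else if (pos2 > pos1) break;
                    j++;
                }
                // Drop positions that have fallen behind the window
                while (!window.isEmpty() && Math.abs(window.peekFirst() - pos1) > k) {
                    window.pollFirst();
                }
                for (int pos2 : window) answer.add(new int[] {e.getKey(), pos1, pos2});
            }
        }
        return answer;
    }
}
```

For the documents "to be or not to be" and "be to", intersecting `to` with `be` at `k = 1` yields the triples (0, 0, 1), (0, 4, 5) and (1, 1, 0). Because the distance is taken as an absolute value, word order is ignored; a strict phrase query would instead require `pos2 - pos1 == 1`.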
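The `ranked` package deals with term weighting and the vector space model (IIR ch. 6 + 7). As a sketch under stated assumptions, the following uses the common tf-idf scheme w(t, d) = tf(t, d) · log10(N / df(t)) with raw term counts and scores documents by cosine similarity. The class `VectorSpaceSketch` is hypothetical, documents are assumed to be pre-tokenized, and document frequencies are recomputed on every call, which is only acceptable for toy corpora.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of tf-idf weighting and cosine scoring.
class VectorSpaceSketch {

    // Raw term counts for one tokenized document
    static Map<String, Integer> termFrequencies(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    // Sparse tf-idf vector: w(t,d) = tf(t,d) * log10(N / df(t))
    static Map<String, Double> tfIdf(List<String> doc, List<List<String>> corpus) {
        int n = corpus.size();
        Map<String, Double> vector = new HashMap<>();
        termFrequencies(doc).forEach((term, tf) -> {
            long df = corpus.stream().filter(d -> d.contains(term)).count();
            double idf = df == 0 ? 0.0 : Math.log10((double) n / df);
            vector.put(term, tf * idf);
        });
        return vector;
    }

    // Cosine of the angle between two sparse vectors (0 if either is all zeros)
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
        }
        double normA = Math.sqrt(a.values().stream().mapToDouble(v -> v * v).sum());
        double normB = Math.sqrt(b.values().stream().mapToDouble(v -> v * v).sum());
        return (normA == 0 || normB == 0) ? 0.0 : dot / (normA * normB);
    }
}
```

A term that occurs in every document (df = N) gets idf = log10(1) = 0 and therefore contributes nothing to the score, which is the intended effect of idf weighting.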
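The `evaluation` package computes precision, recall and the F-measure (IIR ch. 8). A minimal sketch, assuming results and relevance judgments are given as sets of document IDs; the class `EvaluationSketch` is hypothetical. The F-measure uses the weighted form F_β = (1 + β²)·P·R / (β²·P + R), where β = 1 gives the harmonic mean of precision and recall.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of set-based retrieval evaluation.
class EvaluationSketch {

    // Precision = |relevant ∩ retrieved| / |retrieved|
    static double precision(Set<Integer> relevant, Set<Integer> retrieved) {
        return retrieved.isEmpty() ? 0.0 : (double) truePositives(relevant, retrieved) / retrieved.size();
    }

    // Recall = |relevant ∩ retrieved| / |relevant|
    static double recall(Set<Integer> relevant, Set<Integer> retrieved) {
        return relevant.isEmpty() ? 0.0 : (double) truePositives(relevant, retrieved) / relevant.size();
    }

    // F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); guards the 0/0 case
    static double fMeasure(double p, double r, double beta) {
        double b2 = beta * beta;
        return (p == 0 && r == 0) ? 0.0 : (1 + b2) * p * r / (b2 * p + r);
    }

    // Number of retrieved documents that are also relevant
    private static int truePositives(Set<Integer> relevant, Set<Integer> retrieved) {
        Set<Integer> tp = new HashSet<>(retrieved);
        tp.retainAll(relevant);
        return tp.size();
    }
}
```

With relevant = {1, 2, 3, 4} and retrieved = {2, 3, 5}, two hits give P = 2/3 and R = 1/2, so F1 = 2 · (2/3) · (1/2) / (2/3 + 1/2) = 4/7 ≈ 0.571.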
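The `classification` package in `tm` lists a TextClassifier with Naive Bayes. The following is a sketch of one common variant, multinomial Naive Bayes with add-one (Laplace) smoothing, as described in IIR ch. 13; it is an assumption that the seminar code uses this variant, and the class `NaiveBayesSketch` with its `train`/`classify` methods is hypothetical, not the seminar's TextClassifier interface. Scores are summed in log space to avoid floating-point underflow, and terms never seen in training are ignored.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of multinomial Naive Bayes with add-one smoothing.
class NaiveBayesSketch {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Counts documents, tokens and term occurrences per class
    void train(List<String> tokens, String label) {
        totalDocs++;
        docsPerClass.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, l -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            vocabulary.add(t);
        }
        tokensPerClass.merge(label, tokens.size(), Integer::sum);
    }

    // argmax_c [ log P(c) + sum_t log P(t|c) ], P(t|c) = (count + 1) / (tokens_c + |V|)
    String classify(List<String> tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String c : docsPerClass.keySet()) {
            double score = Math.log((double) docsPerClass.get(c) / totalDocs);
            Map<String, Integer> counts = termCounts.get(c);
            double denominator = tokensPerClass.get(c) + vocabulary.size();
            for (String t : tokens) {
                if (!vocabulary.contains(t)) continue; // unseen term: skip
                score += Math.log((counts.getOrDefault(t, 0) + 1) / denominator);
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

Trained on three documents labeled `china` ("chinese beijing chinese", "chinese chinese shanghai", "chinese macao") and one labeled `japan` ("tokyo japan chinese"), the document "chinese chinese chinese tokyo japan" is assigned to `china`: the three occurrences of "chinese" outweigh the two terms that favor `japan`.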