[R] What is this cora thing?

I'm reading the lda.pdf documentation (on discriminant analysis), and there is some cora in it that is used throughout the examples:

cora 
    A subset of the Cora dataset of scientific documents.
Description
    A collection of 2410 scientific documents in LDA format with links and titles from the Cora search
    engine.
Usage
    data(cora.documents)
    data(cora.vocab)
    data(cora.cites)
    data(cora.titles)
Format
    cora.documents and cora.vocab comprise a corpus of 2410 documents conforming to the
    LDA format.
    cora.titles is a character vector of titles for each document (i.e., each entry of cora.documents).
    cora.cites is a list representing the citations between the documents in the collection (see
    related for format).
Source
    Automating the construction of internet portals with machine learning. McCallum et al. Information
    Retrieval. 2000.

Where do I actually get it? It's not in the MASS package, which contains lda.

Link to the documentation:

http://cran.r-project.org/web/packages/lda/lda.pdf

http://netkit-srl.sourceforge.net/data.html - is this it?

CoRA
This data set is based on the cora data set (McCallum et al., 2000), which comprises computer science research papers. It includes the full citation graph as well as labels for the topic of each paper (and potentially sub- and sub-subtopics). There are seven possible labels.
The file contains two data sets, one using only citation links and one using both citation and shared-author links. The edge weights are added: one per shared author and one for a citation (two if the papers cite each other).

fjfalcon ★★★
(05.06.10 21:06:43 MSD)
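
For reference, the Usage lines quoted in the question can be tried directly. This is a minimal sketch that assumes the data sets ship with the CRAN package named lda, the one whose manual is the lda.pdf linked above (as far as I know that package covers latent Dirichlet allocation topic models, which is a different thing from the lda() function in MASS). The install step and the two inspection calls at the end are my own additions, not taken from the manual.

```r
# Assumption: cora.* live in the CRAN package "lda", not in MASS.
if (!requireNamespace("lda", quietly = TRUE)) {
  install.packages("lda")
}
library(lda)

# These four calls are copied from the Usage section of the manual.
data(cora.documents)
data(cora.vocab)
data(cora.cites)
data(cora.titles)

# Quick look at what was loaded (illustrative only).
length(cora.documents)  # the manual describes a corpus of 2410 documents
head(cora.titles, 3)    # first few document titles
```

If this loads, nothing needs to be downloaded separately; the NetKit-SRL file mentioned above is a different packaging based on the same underlying collection.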