splearn.datasets package

Submodules

splearn.datasets.base module

splearn.datasets.base.load_data_sample(adr, filetype='SPiCe', pickle=False)

Load a sample from a file and return it as a DataSample (a dict subclass).

Input:

Parameters:

adr (str) – path and name of the file to load.
filetype (str) – (default value = ‘SPiCe’) indicates the structure of the file; should be either ‘SPiCe’ or ‘Pautomac’.
pickle (boolean) – if enabled, a pickle file is created from the loaded file. Default is False.

Output:

Returns:	corresponding DataSample
Return type:	DataSample
Example:

>>> from splearn.datasets.base import load_data_sample
>>> from splearn.tests.datasets.get_dataset_path import get_dataset_path
>>> train_file = '3.pautomac_light.train' # '4.spice.train'
>>> data = load_data_sample(adr=get_dataset_path(train_file))
>>> data.nbL
4
>>> data.nbEx
5000
>>> data.data
SplearnArray([[ 3.,  0.,  3., ..., -1., -1., -1.],
              [ 3.,  3., -1., ..., -1., -1., -1.],
              [ 3.,  2.,  0., ..., -1., -1., -1.],
              ...,
              [ 3.,  1.,  3., ..., -1., -1., -1.],
              [ 3.,  0.,  3., ..., -1., -1., -1.],
              [ 3.,  3.,  1., ..., -1., -1., -1.]])
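The loading step can be pictured with a small pure-Python sketch. It is not splearn's implementation: the helper name parse_sample_lines is hypothetical, and the file layout it assumes (a header line "nbEx nbL", then one word per line written as its length followed by its integer letters) is an assumption about the SPiCe/Pautomac formats rather than a specification of them.

```python
from collections import Counter


def parse_sample_lines(lines):
    """Hypothetical parser (not splearn's code).

    ASSUMED layout: the first line is 'nbEx nbL'; every following line is
    'length letter_1 ... letter_length' with integer letters.
    """
    it = iter(lines)
    nb_ex, nb_l = (int(tok) for tok in next(it).split())
    words = []
    for line in it:
        tokens = line.split()
        if not tokens:
            continue  # ignore blank lines
        length = int(tokens[0])
        words.append(tuple(int(tok) for tok in tokens[1:1 + length]))
    if len(words) != nb_ex:
        raise ValueError(f"header announces {nb_ex} words, found {len(words)}")
    # Pad every word with -1 up to the length of the longest word.
    width = max((len(w) for w in words), default=0)
    rows = [list(w) + [-1] * (width - len(w)) for w in words]
    counts = Counter(words)  # (word, count) dictionary; duplicates are summed
    return nb_l, nb_ex, rows, counts


# Usage: four words over a 4-letter alphabet; the third one is empty.
lines = ["4 4", "3 0 1 3", "1 2", "0", "3 0 1 3"]
nb_l, nb_ex, rows, counts = parse_sample_lines(lines)
```

Keeping both the padded rows and the Counter mirrors the two views of a sample described on this page: one row per occurrence, and one count per distinct word.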

splearn.datasets.data_sample module

This module contains the DataSample and SplearnArray classes.

class splearn.datasets.data_sample.DataSample(data=None, **kwargs)

Bases: dict

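A container for a sample of words: a dict subclass that stores the alphabet size (nbL), the number of examples (nbEx) and the 2d data array (data).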

Input:

Parameters:

data (tuple) – a tuple (nbL, nbEx, data) of types (int, int, numpy.array), where nbL is the number of letters in the alphabet, nbEx is the number of examples and data is the 2d data array.
Example:

>>> from splearn.datasets.base import load_data_sample
>>> from splearn.tests.datasets.get_dataset_path import get_dataset_path
>>> train_file = '3.pautomac_light.train' # '4.spice.train'
>>> data = load_data_sample(adr=get_dataset_path(train_file))
>>> print(data.__class__)
<class 'splearn.datasets.data_sample.DataSample'>
>>> data.nbL
4
>>> data.nbEx
5000
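>>> data.data
SplearnArray([[ 3.,  0.,  3., ..., -1., -1., -1.],
              [ 3.,  3., -1., ..., -1., -1., -1.],
              [ 3.,  2.,  0., ..., -1., -1., -1.],
              ...,
              [ 3.,  1.,  3., ..., -1., -1., -1.],
              [ 3.,  0.,  3., ..., -1., -1., -1.],
              [ 3.,  3.,  1., ..., -1., -1., -1.]])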

data: SplearnArray

nbEx: Number of examples

nbL: Number of letters
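To show how these three attributes sit on top of the dict base class, here is a hypothetical toy class (ToyDataSample is an illustrative name, not part of splearn) that stores the (nbL, nbEx, data) tuple and exposes it as read-only properties. It is a sketch under that assumption; the real DataSample may store and expose these values differently.

```python
class ToyDataSample(dict):
    """Hypothetical stand-in for DataSample (not splearn's code): a dict
    subclass that keeps the (nbL, nbEx, data) tuple as read-only properties."""

    def __init__(self, data=None, **kwargs):
        super().__init__(**kwargs)  # extra keyword arguments become dict items
        nb_l, nb_ex, array = data if data is not None else (0, 0, [])
        self._nbL = nb_l
        self._nbEx = nb_ex
        self._data = array

    @property
    def nbL(self):
        """Number of letters in the alphabet."""
        return self._nbL

    @property
    def nbEx(self):
        """Number of examples (duplicated words counted separately)."""
        return self._nbEx

    @property
    def data(self):
        """2d array of integer letters padded with -1."""
        return self._data


# Usage: three examples over a 2-letter alphabet ("ab", "b", "ab").
sample = ToyDataSample(data=(2, 3, [[0, 1], [1, -1], [0, 1]]))
```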

class splearn.datasets.data_sample.SplearnArray

Bases: numpy.ndarray

Sample data array used by the splearn spectral estimation

The SplearnArray class inherits from numpy.ndarray and holds the sample as a 2d array.

Example of a possible 2d array:

 0   1   0   3  -1
 0   0   3   3   1
 1   1  -1  -1  -1
 5  -1  -1  -1  -1
-1  -1  -1  -1  -1

is equivalent to:

word (0103) or abad
word (00331) or aaddb
word (11) or bb
word (5) or f
word () or empty
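The decoding rule (integer letters, -1 as padding) can be written as a small pure-Python helper. row_to_word is a hypothetical name used only for illustration; applied to the example array, it yields the five words listed.

```python
def row_to_word(row):
    """Decode one row into letters: 0->a, 1->b, 2->c, ...; the first -1
    ends the word (int() also accepts float entries such as 3.0)."""
    letters = []
    for value in row:
        if value == -1:
            break
        letters.append(chr(ord("a") + int(value)))
    return "".join(letters)


# The example array from the text, one row per word.
matrix = [
    [0, 1, 0, 3, -1],
    [0, 0, 3, 3, 1],
    [1, 1, -1, -1, -1],
    [5, -1, -1, -1, -1],
    [-1, -1, -1, -1, -1],
]
words = [row_to_word(row) for row in matrix]
```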

Each row represents one word of the sample. Letters are encoded as integers (0->a, 1->b, 2->c …), and -1 fills the positions after the end of the word. The number of rows is the total number of words in the sample (nbEx), and the number of columns is the length of the longest word. Duplicates are not merged: a word that occurs twice in the sample is counted as two separate examples and occupies two rows.
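Conversely, a list of words can be encoded into such a matrix. The sketch below (words_to_matrix is a hypothetical helper, not a splearn function) shows that the number of rows equals nbEx with duplicates kept, and that the number of columns is the length of the longest word.

```python
def words_to_matrix(words):
    """Encode words over a, b, c, ... as integer rows padded with -1.

    One row per word occurrence (duplicates kept), so len(rows) == nbEx;
    the width is the length of the longest word.
    """
    width = max((len(w) for w in words), default=0)
    return [[ord(c) - ord("a") for c in w] + [-1] * (width - len(w)) for w in words]


# Usage: "abad" occurs twice, so it occupies two rows.
sample = ["abad", "bb", "abad"]
matrix = words_to_matrix(sample)
nb_ex = len(matrix)
```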

The SplearnArray class also encapsulates the sample’s parameters ‘nbL’ and ‘nbEx’ (the number of letters in the alphabet and the number of examples) and four dictionaries holding the sample, prefix, suffix and factor counts (‘sample’, ‘pref’, ‘suff’ and ‘fact’), which are populated when an estimator is fitted.
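The four dictionaries can be illustrated with collections.Counter. The exact counting convention splearn uses is not stated here, so the sketch below assumes one: each occurrence of a word adds one count to each of its prefixes and suffixes (including the empty word and the word itself), one count to each occurrence of a non-empty factor, and one count to the empty factor. The function name count_dictionaries is hypothetical.

```python
from collections import Counter


def count_dictionaries(words):
    """Hypothetical sketch; the counting convention is an ASSUMPTION.

    Every occurrence of a word contributes all of its prefixes and suffixes
    (empty word and full word included), every occurrence of a non-empty
    factor, and one count for the empty factor.
    """
    sample, pref, suff, fact = Counter(), Counter(), Counter(), Counter()
    for w in words:
        sample[w] += 1
        for i in range(len(w) + 1):
            pref[w[:i]] += 1
            suff[w[i:]] += 1
        fact[""] += 1
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                fact[w[i:j]] += 1
    return sample, pref, suff, fact


# Usage: "ab" occurs twice and "b" once.
sample, pref, suff, fact = count_dictionaries(["ab", "ab", "b"])
```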

Input:

Parameters:

input_array (numpy.ndarray) – input ndarray that will be converted into a SplearnArray
nbL (int) – the number of letters
nbEx (int) – the total number of examples
sample (dict) – the keys are the words and the values are the number of times each word appears in the sample.
pref (dict) – the keys are the prefixes and the values are the number of times each prefix appears in the sample.
suff (dict) – the keys are the suffixes and the values are the number of times each suffix appears in the sample.
fact (dict) – the keys are the factors and the values are the number of times each factor appears in the sample.
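Attaching such metadata to an ndarray subclass usually follows NumPy's standard subclassing pattern: set the attributes in __new__ and propagate them in __array_finalize__ so that views and slices keep them. ToySplearnArray below is a hypothetical sketch of that pattern using the parameter names listed above; it is not splearn's implementation and requires NumPy.

```python
import numpy as np


class ToySplearnArray(np.ndarray):
    """Hypothetical ndarray subclass carrying sample metadata (not splearn's
    code), following the __new__ + __array_finalize__ subclassing pattern."""

    def __new__(cls, input_array, nbL=None, nbEx=None,
                sample=None, pref=None, suff=None, fact=None):
        # View the input as this subclass, then attach the metadata.
        obj = np.asarray(input_array).view(cls)
        obj.nbL = nbL
        obj.nbEx = nbEx
        obj.sample = {} if sample is None else sample
        obj.pref = {} if pref is None else pref
        obj.suff = {} if suff is None else suff
        obj.fact = {} if fact is None else fact
        return obj

    def __array_finalize__(self, obj):
        # Runs for views and slices too: copy metadata from the source array,
        # or fall back to defaults when the source is a plain ndarray.
        if obj is None:
            return
        self.nbL = getattr(obj, "nbL", None)
        self.nbEx = getattr(obj, "nbEx", None)
        self.sample = getattr(obj, "sample", {})
        self.pref = getattr(obj, "pref", {})
        self.suff = getattr(obj, "suff", {})
        self.fact = getattr(obj, "fact", {})


# Usage: two words ("ab", "bba") over a 2-letter alphabet.
arr = ToySplearnArray([[0, 1, -1], [1, 1, 0]], nbL=2, nbEx=2)
first_row = arr[0]  # slicing keeps the subclass and its metadata
```

Propagating the attributes in __array_finalize__ rather than only in __new__ matters because NumPy creates slices and views without calling __new__.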

Example:

>>> from splearn.datasets.base import load_data_sample
>>> from splearn.tests.datasets.get_dataset_path import get_dataset_path
>>> train_file = '3.pautomac_light.train' # '4.spice.train'
>>> data = load_data_sample(adr=get_dataset_path(train_file))
>>> print(data.__class__)
<class 'splearn.datasets.data_sample.DataSample'>
>>> data.data
SplearnArray([[ 3.,  0.,  3., ..., -1., -1., -1.],
              [ 3.,  3., -1., ..., -1., -1., -1.],
              [ 3.,  2.,  0., ..., -1., -1., -1.],
              ...,
              [ 3.,  1.,  3., ..., -1., -1., -1.],
              [ 3.,  0.,  3., ..., -1., -1., -1.],
              [ 3.,  3.,  1., ..., -1., -1., -1.]])
