splearn package

Submodules

splearn.automaton module

This module contains the Automaton class

class splearn.automaton.Automaton(nbL=0, nbS=0, initial=[], final=[], transitions=[], type='classic')

Bases: object

Define an automaton with the following parameters; a minimal construction sketch follows the parameter list.

  • Input:
Parameters:
  • nbL (int) – the number of letters
  • nbS (int) – the number of states
  • initial (list) – the initial vector
  • final (list) – the final vector
  • transitions (list) – the transition tables
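A minimal construction sketch based on the parameters above; passing numpy arrays for the initial and final vectors and for the transition matrices is an assumption consistent with the attribute descriptions below, and all numerical values are arbitrary:
>>> import numpy as np
>>> from splearn.automaton import Automaton
>>> # two illustrative transition matrices, one per letter (arbitrary values)
>>> transitions = [np.array([[0.3, 0.1], [0.0, 0.4]]),
>>>                np.array([[0.2, 0.2], [0.1, 0.2]])]
>>> A = Automaton(nbL=2, nbS=2, initial=np.array([1.0, 0.0]),
>>>               final=np.array([0.2, 0.3]), transitions=transitions)
>>> print(A.nbL, A.nbS)   # expected: 2 2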
BuildHankels(lrows=[], lcolumns=[])

Return all (dense) Hankel matrices built on lrows and lcolumns from the automaton

  • Input:
Parameters:
  • lrows (list) – the list of rows (prefixes)
  • lcolumns (list) – the list of columns (suffixes)
  • Output:
Returns:list of all Hankel matrices built on lrows and lcolumns
Return type:list
HouseholderReductionFw(tau)

Algorithm (Fig. 3) from the paper “Stability and Complexity of Minimising Probabilistic Automata” by Kiefer and Wachter

  • Input:
Parameters:
  • self (Automaton) – an object of the automaton class
  • tau (float) – error tolerance parameter >=0
  • Output:
Returns:The canonical forward reduction computed to the tolerance tau
Return type:Automaton
static HouseholderReflector(x)

Return the vector which defines the Householder reflector for x

  • Input:
Parameters:x (vector) – a vector in R^k different from 0
  • Output:
Returns:v = u/||u|| where u_1 = x_1 + sign(x_1)||x|| and u_i = x_i for i >= 2
Return type:vector
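The returned vector can be checked directly with numpy; the following standalone sketch recomputes the formula above (it does not call the library method) and assumes the usual convention that the associated reflector is I - 2vv^T:
>>> import numpy as np
>>> x = np.array([3.0, 4.0])
>>> u = x.copy()
>>> u[0] += np.sign(x[0]) * np.linalg.norm(x)   # u_1 = x_1 + sign(x_1)||x||
>>> v = u / np.linalg.norm(u)                   # v = u/||u||, u_i = x_i for i >= 2
>>> # the reflector I - 2 v v^T sends x onto a multiple of the first basis vector
>>> print((np.eye(2) - 2 * np.outer(v, v)) @ x)   # approximately [-5., 0.]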
static SimpleExample()

A Probabilistic Automaton with two states and two letters.

  • Output:
Returns:An automaton instance example with simple values
Return type:Automaton
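A short usage sketch of this example automaton; encoding the word passed to val() as a list of letter indices (rather than a character string) is an assumption of this sketch:
>>> from splearn.automaton import Automaton
>>> A = Automaton.SimpleExample()
>>> print(A.nbL, A.nbS)   # expected: 2 2
>>> # val() (documented below) computes the weight r_A(w) of a word
>>> print(A.val([0, 1, 0]))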
calc_prefix_completion_weights(prefix)

Compute the completion weights of each alphabet letter after the given prefix (used, for instance, for the SPiCe competition)

  • Input:
Parameters:
  • self (Automaton) – be careful: the automaton should be a prefix transformation of an automaton (see transformation())
  • prefix (List) – list of integers representing a prefix
  • Output:
Returns:a dictionary with all alphabet letters as keys; the associated values are the weights of each letter being the next one
Return type:dict
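A usage sketch combining this method with transformation() (documented further below), since the automaton must first be turned into its prefix transformation:
>>> from splearn.automaton import Automaton
>>> A = Automaton.SimpleExample()
>>> # the prefix transformation required by this method (see transformation())
>>> Ap = A.transformation(source="classic", target="prefix")
>>> # weights of each alphabet letter being the next one after the prefix [0, 1]
>>> weights = Ap.calc_prefix_completion_weights(prefix=[0, 1])
>>> print(weights)   # a dict keyed by the alphabet letters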
final

The vector containing the final weight of each state

get_dot(threshold=0.0, nb_dec=2, title='Weighted Automata')

Return a string that contains the automaton in dot (Graphviz) format

Example:
>>> from splearn.datasets.base import load_data_sample
>>> from splearn.tests.datasets.get_dataset_path import get_dataset_path
>>> from splearn import Spectral
>>> train_file = '3.pautomac_light.train'
>>> data = load_data_sample(adr=get_dataset_path(train_file))
>>> sp = Spectral()
>>> sp.fit(X=data.data)
>>> dotfile = "3.pautomac_light.train.dot"
>>> dot = sp.automaton.get_dot(threshold = 0.2, title = dotfile)
>>> # To display the dot string one can use graphviz:
>>> from graphviz import Source
>>> src = Source(dot)
>>> src.render(dotfile + '.gv', view=True) 
  • Input:
Parameters:
  • self (Automaton) –
  • threshold (float) – (default value = 0.0) threshold for the values to keep: if |weight| < threshold, the corresponding transition is not kept as an edge in the final dot string
  • nb_dec (int) – (default value = 2) the number of decimals to keep for the weights
  • title (string) – the top comment of the string
  • Output:
Returns:a string with the current automaton in dot format
Return type:str
initial

The vector containing the initial weight of each state

isAbsConv

Boolean indicating whether the automaton meets the sufficient condition to be absolutely convergent

static load_Pautomac_Automaton(adr)

Load an automaton from a PAutomaC file and return an Automaton instance; works for PFA and PDFA, not for HMM.

  • Input:
Parameters:adr (string) – path and name of the file to load
  • Output:
Returns:An automaton instance
Return type:Automaton
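A brief usage sketch, assuming the adr argument is passed positionally; the file name below is hypothetical and stands for an actual PAutomaC model file describing a PFA or PDFA:
>>> from splearn.automaton import Automaton
>>> # hypothetical path to a PAutomaC model file (PFA or PDFA)
>>> A = Automaton.load_Pautomac_Automaton("1.pautomac_model.txt")
>>> print(A.nbL, A.nbS)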
minimisation(tau)

Compute an equivalent minimal automaton, up to the precision tau

  • Input:
Parameters:
  • self (Automaton) –
  • tau (float) – error tolerance parameter >= 0
  • Output:
Returns:B, equivalent to A with a minimal number of states
Return type:Automaton
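A usage sketch with an arbitrarily chosen tolerance:
>>> from splearn.automaton import Automaton
>>> A = Automaton.SimpleExample()
>>> # equivalent automaton with a minimal number of states, up to tolerance 1e-6
>>> B = A.minimisation(tau=1e-6)
>>> print(B.nbS, "<=", A.nbS)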
mirror()

Compute the mirror automaton

  • Input:
Parameters:self (Automaton) – Automaton(nbL, nbS, initial, final, transitions)
  • Output:
Returns:mA = Automaton(nbL, nbS, final, initial, Newtransitions) where Newtransitions[x] = transpose(transitions[x])
Return type:Automaton
static mulHouseholderReflector(u, v)

Return the product of u with the n×n Householder reflector matrix based on v

  • Input:
Parameters:
  • u (vector) – row vector of R^n
  • v (vector) – vector of R^k (k<=n)
  • Output:
Returns:w, row vector of R^n, w = uP(v) where P(v) = [I_{n-k} 0; 0 R] ∈ R^{n×n} and R = I_k - 2v^T v
Return type:vector
nbL

The number of letters

nbS

The number of states

static read(filename, format='json')

Return an Automaton built with attributes read from a file

  • Input:
Parameters:
  • filename (str) – the name of the input file.
  • format (str) – ‘json’ or ‘yaml’
  • Output:
Returns:the output automaton
Return type:Automaton
sum()

the sum of a rational series

  • Input:
Parameters:self (Automaton) –
  • Output:
Returns:sum over all samples of transitions
Return type:ndarray
to_hankel(lrows, lcolumns, mode_quiet=False)

Return a Hankel instance (dense, classic and not partial) with matrices built on lrows and lcolumns from the automaton

  • Input:
Parameters:
  • lrows (list) – prefixes
  • lcolumns (list) – suffixes
  • mode_quiet (boolean) – (default value = False) True for no output message.
  • Output:
Returns:Hankel instance
Return type:Hankel
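A sketch of the automaton → Hankel → automaton round trip using to_hankel() and Hankel.to_automaton() (documented in the splearn.hankel module below); the choice of prefixes and suffixes, their encoding as tuples of letter indices, and the rank are illustrative assumptions:
>>> from splearn.automaton import Automaton
>>> A = Automaton.SimpleExample()
>>> # prefixes (lrows) and suffixes (lcolumns): the empty word and both letters
>>> lrows = [(), (0,), (1,)]
>>> lcolumns = [(), (0,), (1,)]
>>> H = A.to_hankel(lrows=lrows, lcolumns=lcolumns, mode_quiet=True)
>>> # back to an automaton, keeping rank 2 (the number of states of A)
>>> B = H.to_automaton(rank=2, mode_quiet=True)
>>> print(B.nbS)   # expected: 2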
transformation(source='classic', target='prefix')

Takes an automaton as input and transforms it.

  • Input:
Parameters:
  • source (str) – “classic” (default), “prefix”, “factor” or “suffix”
  • target (str) – “prefix” (default), “factor”, “classic” or “suffix”
  • Output:
Returns:The result automaton instance
Return type:Automaton

The transformation is done according to the source and target parameters. Warning: it does not check the convergence.

transitions

The list of arrays defining the transitions

type

The string indicating the type of automaton

val(word)

Compute the value of the automaton on word

  • Input:
Parameters:
  • self (Automaton) – weighted automaton
  • word (str) – a string
  • Output:
Returns:probability r_A(w)
Return type:float
static write(automaton_in, filename, format='json')

Write the input automaton into a file with the given format.

  • Input:
Parameters:
  • automaton_in (Automaton) – automaton to write into the file
  • filename (str) – the name of the file. If it does not exist, the file is created.
  • format (str) – ‘json’ or ‘yaml’
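A JSON write/read round trip sketch, assuming the documented parameter order (automaton_in, filename, format) for write(); the file name is arbitrary:
>>> from splearn.automaton import Automaton
>>> A = Automaton.SimpleExample()
>>> # write to an arbitrary file name, then load it back
>>> Automaton.write(A, "simple_example.json", format="json")
>>> B = Automaton.read("simple_example.json", format="json")
>>> print(B.nbL == A.nbL and B.nbS == A.nbS)   # expected: True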

splearn.hankel module

This module contains the Hankel class

class splearn.hankel.Hankel(sample_instance=None, lrows=[], lcolumns=[], version='classic', partial=False, sparse=False, full_svd_calculation=False, mode_quiet=False, lhankel=None)

Bases: object

A Hankel instance computes the list of Hankel matrices

  • Input:
Parameters:
  • sample_instance (SplearnArray) – instance of SplearnArray
  • lrows (int or list of int) – number or list of rows: a list of strings if partial=True; otherwise, based on self.pref if version=”classic” or “prefix”, on self.fact otherwise
  • lcolumns (int or list of int) – number or list of columns: a list of strings if partial=True; otherwise, based on self.suff if version=”classic” or “suffix”, on self.fact otherwise
  • version (string) – (default = “classic”) version name
  • partial (boolean) – (default value = False) build a partial Hankel matrix
  • sparse (boolean) – (default value = False) True if the Hankel matrix is sparse
  • full_svd_calculation (boolean) – (default value = False) if True, the entire SVD is calculated when building the Hankel matrix; otherwise it is computed with the scikit-learn randomized algorithm, only for the largest k=rank singular values.
  • mode_quiet (boolean) – (default value = False) True for no output message.
  • lhankel (list) – list of all Hankel matrices. At least one of the two parameters sample_instance or lhankel has to be not None. If sample_instance is given, the Hankel instance is built directly from the sample dictionary, else it is deduced from the lhankel list of matrices.
Example:
>>> from splearn import Learning, Hankel, Spectral
>>> from splearn.datasets.base import load_data_sample
>>> train_file = '0.spice.train'
>>> pT = load_data_sample(adr=train_file)
>>> sp = Spectral()
>>> sp.fit(X=pT.data)
>>> lhankel = Hankel(sample_instance=pT.sample,
>>>                  lrows=6, lcolumns=6, version="classic",
>>>                  partial=True, sparse=True, mode_quiet=True).lhankel
build(sample, pref, suff, fact, lrows, lcolumns, mode_quiet)

Create a Hankel matrix

  • Input:
Parameters:
  • sample (dict) – the keys are the words and the values are the number of times each appears in the sample.
  • pref (dict) – the keys are the prefixes and the values are the number of times each appears in the sample.
  • suff (dict) – the keys are the suffixes and the values are the number of times each appears in the sample.
  • fact (dict) – the keys are the factors and the values are the number of times each appears in the sample.
  • lrows (int or list of int) – number or list of rows: a list of strings if partial=True; otherwise, based on self.pref if version=”classic” or “prefix”, on self.fact otherwise
  • lcolumns (int or list of int) – number or list of columns: a list of strings if partial=True; otherwise, based on self.suff if version=”classic” or “suffix”, on self.fact otherwise
  • mode_quiet (boolean) – True for no output message.
  • Output:
Returns:lhankel, a list of Hankel matrices: DoK-based sparse matrices if sparse, numpy matrices otherwise
Return type:list of matrix
build_from_sample

Boolean that indicates whether the matrices have been built from a sample or not (in the latter case they are built directly from an Automaton)

nbEx

Number of examples

nbL

Number of letters

static read(filename, format='json')

Return a Hankel built with attributes read from a file

  • Input:
Parameters:
  • filename (str) – the name of the input file.
  • format (str) – ‘json’ or ‘yaml’
  • Output:
Returns:the output hankel
Return type:Hankel
to_automaton(rank, mode_quiet=False)

Return an automaton from the current Hankel matrix

  • Input:
Parameters:
  • rank (int) – the matrix rank
  • mode_quiet (boolean) – True for no output message.
  • Output:
Returns:An automaton instance
Return type:Automaton
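A sketch that re-extracts an automaton at a smaller rank from the Hankel matrices kept by a fitted Spectral estimator (see the hankel attribute in the splearn.spectral module below); the training file is the one used in the get_dot example of splearn.automaton and the rank values are illustrative:
>>> from splearn.datasets.base import load_data_sample
>>> from splearn.tests.datasets.get_dataset_path import get_dataset_path
>>> from splearn.spectral import Spectral
>>> data = load_data_sample(adr=get_dataset_path('3.pautomac_light.train'))
>>> sp = Spectral(rank=5, lrows=6, lcolumns=6, partial=True, mode_quiet=True)
>>> sp.fit(X=data.data)
>>> # rebuild an automaton with a smaller rank from the same Hankel matrices
>>> small = sp.hankel.to_automaton(rank=3, mode_quiet=True)
>>> print(small.nbS)   # expected: 3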
static write(hankel_in, filename, format='json')

Write the input Hankel into a file with the given format.

  • Input:
Parameters:
  • hankel_in (Hankel) – the Hankel instance to write into the file
  • filename (str) – the name of the file. If it does not exist, the file is created.
  • format (str) – ‘json’ or ‘yaml’

splearn.serializer module

This module contains the Serializer class

class splearn.serializer.Serializer

Bases: object

Serializer is a helper object for data serialization

static data_to_json(data)

Return a string in JSON format that contains the input data.

  • Input:
Parameters:data – data composed of any serializable types
  • Output:
Returns:the output string
Return type:str
static data_to_yaml(data)

Return a string in YAML format that contains the input data.

  • Input:
Parameters:data – data composed of any serializable types
  • Output:
Returns:the output string
Return type:str
static json_to_data(json_data_str)

Return the data from the input JSON string.

  • Input:
Parameters:json_data_str – the json input string
  • Output:
Returns:the data
Return type:deduced from the JSON input string
static yaml_to_data(yaml_data_str)

Return the data from the input YAML string.

  • Input:
Parameters:yaml_data_str – the yaml input string
  • Output:
Returns:the data
Return type:deduced from the YAML input string
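A JSON round trip sketch, assuming the data and json_data_str arguments documented above are passed positionally:
>>> from splearn.serializer import Serializer
>>> # any serializable structure plays the role of the data argument above
>>> payload = {"nbL": 2, "nbS": 2, "initial": [1.0, 0.0]}
>>> json_str = Serializer.data_to_json(payload)
>>> restored = Serializer.json_to_data(json_str)
>>> print(restored == payload)   # expected: True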

splearn.spectral module

This module contains the Spectral and Learning class

class splearn.spectral.Spectral(rank=5, lrows=7, lcolumns=7, version='classic', partial=True, sparse=True, full_svd_calculation=False, smooth_method='none', mode_quiet=False)

Bases: sklearn.base.BaseEstimator

A Spectral estimator instance

  • Input:
Parameters:
  • rank (int) – (default value = 5) the rank used to build the automaton from the Hankel matrix
  • lrows (int or tuple of int) – (default value = 7) number or list of rows: a list of strings, or an integer indicating the max length of elements to consider if partial=True; otherwise, based on self.pref if version=”classic” or “prefix”, on self.fact otherwise
  • lcolumns (int or tuple of int) – (default value = 7) number or list of columns: a list of strings, or an integer indicating the max length of elements to consider if partial=True; otherwise, based on self.suff if version=”classic” or “suffix”, on self.fact otherwise
  • version (string) – (default value = “classic”) version name
  • partial (boolean) – (default value = True) build a partial Hankel matrix
  • sparse (boolean) – (default value = True) True if the Hankel matrix is sparse
  • full_svd_calculation (boolean) – (default value = False) if True, the entire SVD is calculated when building the Hankel matrix; otherwise it is computed with the scikit-learn randomized algorithm, only for the largest k=rank singular values.
  • smooth_method (string) –

    (default value = “none”) method of smoothing

    • ’trigram’: the trigram dictionary is computed and used by the predict function; in this case the trigram probability is used instead of the spectral probability when the latter is negative
    • ’none’ or anything else: no smoothing method is used by the predict function.
  • mode_quiet (boolean) – (default value = False) True for no output message.
Example:
>>> from splearn.datasets.base import load_data_sample
>>> from splearn.tests.datasets.get_dataset_path import get_dataset_path
>>> from splearn.spectral import Spectral
>>> data = load_data_sample(adr=get_dataset_path('3.pautomac_light.train'))
>>> sp = Spectral()
>>> sp.set_params(partial=True, lcolumns=6, lrows=6, smooth_method='trigram')
Spectral(lcolumns=6, lrows=6, mode_quiet=False, partial=True, rank=5,
 smooth_method='trigram', sparse=True, version='classic')
>>> sp.fit(data.data)
Start Hankel matrix computation
End of Hankel matrix computation
Start Building Automaton from Hankel matrix
End of Automaton computation
Spectral(lcolumns=6, lrows=6, partial=True, rank=5, smooth_method='trigram', sparse=True, version='classic')
>>> sp.automaton.initial
array([-0.00049249,  0.00304676, -0.04405996, -0.10765322, -0.08660063])
>>> sp.predict(data.data)
array([  4.38961058e-04,   1.10616861e-01,   1.35569353e-03, ...,
    4.66041996e-06,   4.68177275e-02,   5.24287604e-20])
>>> sp.loss(data.data, normalize=True)
-10.530029936056017
>>> sp.score(data.data)
10.530029936056017
automaton

Automaton built by the fit method; None by default

fit(X, y=None)

Fit the model

  • Input:
Parameters:
  • X (SplearnArray) – training data of shape [n_samples, n_features]
  • y (ndarray) – (default value = None) numpy array of shape [n_samples], target values; not used by the Spectral estimator
  • Output:
Returns:self, the Spectral estimator, with its automaton attribute instantiated
Return type:Spectral
get_params(deep=True)

Return the parameter values of the Spectral estimator

  • Output:
Returns:dictionary of the Spectral estimator parameters (name: value)
Return type:dict
hankel

Hankel built by the fit method; None by default

loss(X, y=None, normalize=True)

Log probability using the Spectral model

  • Input:
Parameters:
  • X (SplearnArray) – samples of shape (n_samples, n_features); X is the validation data.
  • y (ndarray) – (default value = None) numpy array of shape [n_samples], target values: the ground truth target for X (in the supervised case) or None (in the unsupervised case)
  • normalize (boolean) – (default value = True) if True, the result is normalized by the number of samples
  • Output:
Returns:mean (resp. sum) of the log probability corresponding to the input X if normalize is True (resp. False) and y is None. If y is a vector of target values, the mean (resp. sum) is computed over the squared differences.
Return type:float
nb_trigram()

Return the number of indices affected by the trigram computation

  • Output:
Returns:the number of trigram indices
Return type:int
polulate_dictionnaries(X)

Populates the sample, pref, suff, fact dictionaries of X

  • Input:
Parameters:X (SplearnArray) – training data of shape [n_samples, n_features]
predict(X)

Predict using the Spectral model

  • Input:
Parameters:X (SplearnArray) – samples of shape (n_samples, n_features)
  • Output:
Returns:Probability corresponding to the input X, array-like of shape = n_samples
Return type:ndarray
predict_proba(X)

Predict probability using the Spectral model

  • Input:
Parameters:X (SplearnArray) – samples of shape (n_samples, n_features)
  • Output:
Returns:Probability corresponding to the input X of shape = (n_samples)
Return type:ndarray
score(X, y=None, scoring='perplexity')

Compute the score on the input data

  • Input:
Parameters:
  • X (SplearnArray) – samples of shape (n_samples, n_features)
  • y (ndarray) – (default value = None) numpy array of shape [n_samples], target values: the ground truth target for X (in the supervised case) or None (in the unsupervised case)
  • scoring (string) – (default value = “perplexity”) method for score computation
  • Output:
Returns:score, on the input X
Return type:float
set_params(**parameters)

Set the values of the Spectral estimator parameters

  • Output:
Returns:Spectral estimator with new parameters
Return type:Spectral
trigram

The trigram dictionary

Module contents