splearn package

Submodules

splearn.automaton module

This module contains the Automaton class

class splearn.automaton.Automaton(nbL=0, nbS=0, initial=[], final=[], transitions=[], type='classic')

Bases: object

Define an automaton with the following parameters; a minimal construction sketch follows the parameter list.

  • Input:
Parameters:
  • nbL (int) – the number of letters
  • nbS (int) – the number of states
  • initial (list) – the initial vector
  • final (list) – the final vector
  • transitions (list) – the transition tables
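A minimal construction sketch based on the parameters above; passing numpy arrays for the initial and final vectors and for the transition matrices is an assumption consistent with the attribute descriptions below, and all numerical values are arbitrary:
>>> import numpy as np
>>> from splearn.automaton import Automaton
>>> # two illustrative transition matrices, one per letter (arbitrary values)
>>> transitions = [np.array([[0.3, 0.1], [0.0, 0.4]]),
>>>                np.array([[0.2, 0.2], [0.1, 0.2]])]
>>> A = Automaton(nbL=2, nbS=2, initial=np.array([1.0, 0.0]),
>>>               final=np.array([0.2, 0.3]), transitions=transitions)
>>> print(A.nbL, A.nbS)   # expected: 2 2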
BuildHankels(lrows=[], lcolumns=[])

Return all (dense) Hankel matrices built on lrows and lcolumns from the automaton

  • Input:
Parameters:
  • lrows (list) – the list of rows (prefixes)
  • lcolumns (list) – the list of columns (suffixes)
  • Output:
Returns:list of all Hankel matrices built on lrows and lcolumns
Return type:list
HouseholderReductionFw(tau)

Algorithm (Fig. 3) from the paper “Stability and Complexity of Minimising Probabilistic Automata” by Kiefer and Wachter

  • Input:
Parameters:
  • self (Automaton) – an object of the automaton class
  • tau (float) – error tolerance parameter >=0
  • Output:
Returns:The canonical forward reduction computed to the tolerance tau
Return type:Automaton
static HouseholderReflector(x)

Return the vector which defines the Householder reflector for x

  • Input:
Parameters:x (vector) – a vector in R^k different from 0
  • Output:
Returns:v = u/||u|| where u_1 = x_1 + sign(x_1)||x|| and u_i = x_i for i >= 2
Return type:vector
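The returned vector can be checked directly with numpy; the following standalone sketch recomputes the formula above (it does not call the library method) and assumes the usual convention that the associated reflector is I - 2vv^T:
>>> import numpy as np
>>> x = np.array([3.0, 4.0])
>>> u = x.copy()
>>> u[0] += np.sign(x[0]) * np.linalg.norm(x)   # u_1 = x_1 + sign(x_1)||x||
>>> v = u / np.linalg.norm(u)                   # v = u/||u||, u_i = x_i for i >= 2
>>> # the reflector I - 2 v v^T sends x onto a multiple of the first basis vector
>>> print((np.eye(2) - 2 * np.outer(v, v)) @ x)   # approximately [-5., 0.]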
static SimpleExample()

A Probabilistic Automaton with two states and two letters.

  • Output:
Returns:An automaton instance example with simple values
Return type:Automaton
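A short usage sketch of this example automaton; encoding the word passed to val() as a list of letter indices (rather than a character string) is an assumption of this sketch:
>>> from splearn.automaton import Automaton
>>> A = Automaton.SimpleExample()
>>> print(A.nbL, A.nbS)   # expected: 2 2
>>> # val() (documented below) computes the weight r_A(w) of a word
>>> print(A.val([0, 1, 0]))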
calc_prefix_completion_weights(prefix)

Compute the completion weights of each alphabet letter after the given prefix (used, for instance, for the SPiCe competition)

  • Input:
Parameters:
  • self (Automaton) – be careful: the automaton should be a prefix transformation of an automaton (see transformation())
  • prefix (List) – list of integers representing a prefix
  • Output:
Returns:a dictionary with all alphabet letters as keys; the associated values are the weights of each letter being the next one
Return type:dict
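A usage sketch combining this method with transformation() (documented further below), since the automaton must first be turned into its prefix transformation:
>>> from splearn.automaton import Automaton
>>> A = Automaton.SimpleExample()
>>> # the prefix transformation required by this method (see transformation())
>>> Ap = A.transformation(source="classic", target="prefix")
>>> # weights of each alphabet letter being the next one after the prefix [0, 1]
>>> weights = Ap.calc_prefix_completion_weights(prefix=[0, 1])
>>> print(weights)   # a dict keyed by the alphabet letters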
final

The vector containing the final weight of each state

get_dot(threshold=0.0, nb_dec=2, title='Weighted Automata')

Return a string that contains the automaton in dot (Graphviz) format

Example:
>>> from splearn.datasets.base import load_data_sample
>>> from splearn.tests.datasets.get_dataset_path import get_dataset_path
>>> from splearn import Spectral
>>> train_file = '3.pautomac_light.train'
>>> data = load_data_sample(adr=get_dataset_path(train_file))
>>> sp = Spectral()
>>> sp.fit(X=data.data)
>>> dotfile = "3.pautomac_light.train.dot"
>>> dot = sp.automaton.get_dot(threshold = 0.2, title = dotfile)
>>> # To display the dot string one can use graphviz:
>>> from graphviz import Source
>>> src = Source(dot)
>>> src.render(dotfile + '.gv', view=True) 
  • Input:
Parameters:
  • self (Automaton) –
  • threshold (float) – (default value = 0.0) threshold for the values to keep: if |weight| < threshold, the corresponding transition is not kept as an edge in the final dot string
  • nb_dec (int) – (default value = 2) the number of decimals to keep for the weights
  • title (string) – the top comment of the string
  • Output:
Returns:a string with the current automaton in dot format
Return type:str
initial

The vector containing the initial weight of each state

isAbsConv

Boolean indicating whether the automaton meets the sufficient condition to be absolutely convergent

static load_Pautomac_Automaton(adr)

Load an automaton from a PAutomaC file and return an Automaton instance; works for PFA and PDFA, not for HMM.

  • Input:
Parameters:adr (string) – path and name of the file to load
  • Output:
Returns:An automaton instance
Return type:Automaton
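A brief usage sketch, assuming the adr argument is passed positionally; the file name below is hypothetical and stands for an actual PAutomaC model file describing a PFA or PDFA:
>>> from splearn.automaton import Automaton
>>> # hypothetical path to a PAutomaC model file (PFA or PDFA)
>>> A = Automaton.load_Pautomac_Automaton("1.pautomac_model.txt")
>>> print(A.nbL, A.nbS)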
minimisation(tau)

Compute an equivalent minimal automaton, up to the precision tau

  • Input:
Parameters:
  • self (Automaton) –
  • tau (float) – error tolerance parameter >= 0
  • Output:
Returns:B, equivalent to A with a minimal number of states
Return type:Automaton
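A usage sketch with an arbitrarily chosen tolerance:
>>> from splearn.automaton import Automaton
>>> A = Automaton.SimpleExample()
>>> # equivalent automaton with a minimal number of states, up to tolerance 1e-6
>>> B = A.minimisation(tau=1e-6)
>>> print(B.nbS, "<=", A.nbS)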
mirror()

Compute the mirror automaton

  • Input:
Parameters:self (Automaton) – Automaton(nbL, nbS, initial, final, transitions)
  • Output:
Returns:mA = Automaton(nbL, nbS, final, initial, Newtransitions) where Newtransitions[x] = transpose(transitions[x])
Return type:Automaton
static mulHouseholderReflector(u, v)

Return the product of u with the n×n Householder reflector matrix based on v

  • Input:
Parameters:
  • u (vector) – row vector of R^n
  • v (vector) – vector of R^k (k<=n)
  • Output:
Returns:w, row vector of R^n, w = uP(v) where P(v) = [I_{n-k} 0; 0 R] ∈ R^{n×n} and R = I_k - 2v^T v
Return type:vector
nbL

The number of letters

nbS

The number of states

static read(filename, format='json')

Return an Automaton built with attributes read from a file

  • Input:
Parameters:
  • filename (str) – the name of the input file.
  • format (str) – ‘json’ or ‘yaml’
  • Output:
Returns:the output automaton
Return type:Automaton
sum()

the sum of a rational series

  • Input:
Parameters:self (Automaton) –
  • Output:
Returns:sum over all samples of transitions
Return type:ndarray
to_hankel(lrows, lcolumns, mode_quiet=False)

Return a Hankel instance (dense, classic and not partial) with matrices built on lrows and lcolumns from the automaton

  • Input:
Parameters:
  • lrows (list) – prefixes
  • lcolumns (list) – suffixes
  • mode_quiet (boolean) – (default value = False) True for no output message.
  • Output:
Returns:Hankel instance
Return type:Hankel
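A sketch of the automaton → Hankel → automaton round trip using to_hankel() and Hankel.to_automaton() (documented in the splearn.hankel module below); the choice of prefixes and suffixes, their encoding as tuples of letter indices, and the rank are illustrative assumptions:
>>> from splearn.automaton import Automaton
>>> A = Automaton.SimpleExample()
>>> # prefixes (lrows) and suffixes (lcolumns): the empty word and both letters
>>> lrows = [(), (0,), (1,)]
>>> lcolumns = [(), (0,), (1,)]
>>> H = A.to_hankel(lrows=lrows, lcolumns=lcolumns, mode_quiet=True)
>>> # back to an automaton, keeping rank 2 (the number of states of A)
>>> B = H.to_automaton(rank=2, mode_quiet=True)
>>> print(B.nbS)   # expected: 2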
transformation(source='classic', target='prefix')

Takes an automaton as input and transforms it.

  • Input:
Parameters:
  • source (str) – “classic” (default), “prefix”, “factor” or “suffix”
  • target (str) – “prefix” (default), “factor”, “classic” or “suffix”
  • Output:
Returns:The result automaton instance
Return type:Automaton

The transformation is done according to the source and target parameters. Warning: it does not check the convergence.

transitions

The list of arrays defining the transitions

type

The string indicating the type of automaton

val(word)

Compute the value of the automaton on word

  • Input:
Parameters:
  • self (Automaton) – weighted automaton
  • word (str) – a string
  • Output:
Returns:probability r_A(w)
Return type:float
static write(automaton_in, filename, format='json')

Write the input automaton into a file with the given format.

  • Input:
Parameters:
  • automaton_in (Automaton) – automaton to write into the file
  • filename (str) – the name of the file. If it does not exist, the file is created.
  • format (str) – ‘json’ or ‘yaml’
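A JSON write/read round trip sketch, assuming the documented parameter order (automaton_in, filename, format) for write(); the file name is arbitrary:
>>> from splearn.automaton import Automaton
>>> A = Automaton.SimpleExample()
>>> # write to an arbitrary file name, then load it back
>>> Automaton.write(A, "simple_example.json", format="json")
>>> B = Automaton.read("simple_example.json", format="json")
>>> print(B.nbL == A.nbL and B.nbS == A.nbS)   # expected: True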

splearn.hankel module

This module contains the Hankel class

class splearn.hankel.Hankel(sample_instance=None, lrows=[], lcolumns=[], version='classic', partial=False, sparse=False, full_svd_calculation=False, mode_quiet=False, lhankel=None)

Bases: object

A Hankel instance computes the list of Hankel matrices

  • Input:
Parameters:
  • sample_instance (SplearnArray) – instance of SplearnArray
  • lrows (int or list of int) – number or list of rows: a list of strings if partial=True; otherwise, based on self.pref if version=”classic” or “prefix”, on self.fact otherwise
  • lcolumns (int or list of int) – number or list of columns: a list of strings if partial=True; otherwise, based on self.suff if version=”classic” or “suffix”, on self.fact otherwise
  • version (string) – (default = “classic”) version name
  • partial (boolean) – (default value = False) build a partial Hankel matrix
  • sparse (boolean) – (default value = False) True if the Hankel matrix is sparse
  • full_svd_calculation (boolean) – (default value = False) if True, the entire SVD is calculated when building the Hankel matrix; otherwise it is computed with the scikit-learn randomized algorithm, only for the largest k=rank singular values.
  • mode_quiet (boolean) – (default value = False) True for no output message.
  • lhankel (list) – list of all Hankel matrices. At least one of the two parameters sample_instance or lhankel has to be not None. If sample_instance is given, the Hankel instance is built directly from the sample dictionary, else it is deduced from the lhankel list of matrices.
Example:
>>> from splearn import Learning, Hankel, Spectral
>>> from splearn.datasets.base import load_data_sample
>>> train_file = '0.spice.train'
>>> pT = load_data_sample(adr=train_file)
>>> sp = Spectral()
>>> sp.fit(X=pT.data)
>>> lhankel = Hankel(sample_instance=pT.sample,
>>>                  lrows=6, lcolumns=6, version="classic",
>>>                  partial=True, sparse=True, mode_quiet=True).lhankel
build(sample, pref, suff, fact, lrows, lcolumns, mode_quiet)

Create a Hankel matrix

  • Input:
Parameters:
  • sample (dict) – the keys are the words and the values are the number of times each appears in the sample.
  • pref (dict) – the keys are the prefixes and the values are the number of times each appears in the sample.
  • suff (dict) – the keys are the suffixes and the values are the number of times each appears in the sample.
  • fact (dict) – the keys are the factors and the values are the number of times each appears in the sample.
  • lrows (int or list of int) – number or list of rows: a list of strings if partial=True; otherwise, based on self.pref if version=”classic” or “prefix”, on self.fact otherwise
  • lcolumns (int or list of int) – number or list of columns: a list of strings if partial=True; otherwise, based on self.suff if version=”classic” or “suffix”, on self.fact otherwise
  • mode_quiet (boolean) – True for no output message.
  • Output:
Returns:lhankel, a list of Hankel matrices: DoK-based sparse matrices if sparse, numpy matrices otherwise
Return type:list of matrix
build_from_sample

Boolean that indicates whether the matrices have been built from a sample or not (in the latter case they are built directly from an Automaton)

nbEx

Number of examples

nbL

Number of letters

static read(filename, format='json')

Return a Hankel built with attributes read from a file

  • Input:
Parameters:
  • filename (str) – the name of the input file.
  • format (str) – ‘json’ or ‘yaml’
  • Output:
Returns:the output hankel
Return type:Hankel
to_automaton(rank, mode_quiet=False)

Return an automaton from the current Hankel matrix

  • Input:
Parameters:
  • rank (int) – the matrix rank
  • mode_quiet (boolean) – True for no output message.
  • Output:
Returns:An automaton instance
Return type:Automaton
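A sketch that re-extracts an automaton at a smaller rank from the Hankel matrices kept by a fitted Spectral estimator (see the hankel attribute in the splearn.spectral module below); the training file is the one used in the get_dot example of splearn.automaton and the rank values are illustrative:
>>> from splearn.datasets.base import load_data_sample
>>> from splearn.tests.datasets.get_dataset_path import get_dataset_path
>>> from splearn.spectral import Spectral
>>> data = load_data_sample(adr=get_dataset_path('3.pautomac_light.train'))
>>> sp = Spectral(rank=5, lrows=6, lcolumns=6, partial=True, mode_quiet=True)
>>> sp.fit(X=data.data)
>>> # rebuild an automaton with a smaller rank from the same Hankel matrices
>>> small = sp.hankel.to_automaton(rank=3, mode_quiet=True)
>>> print(small.nbS)   # expected: 3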
static write(hankel_in, filename, format='json')

Write the input Hankel into a file with the given format.

  • Input:
Parameters:
  • hankel_in (Hankel) – the Hankel instance to write into the file
  • filename (str) – the name of the file. If it does not exist, the file is created.
  • format (str) – ‘json’ or ‘yaml’

splearn.serializer module

This module contains the Serializer class

class splearn.serializer.Serializer

Bases: object

Serializer is a helper object for data serialization

static data_to_json(data)

Return a string in JSON format that contains the input data.

  • Input:
Parameters:data – data composed of any serializable types
  • Output:
Returns:the output string
Return type:str
static data_to_yaml(data)

Return a string in YAML format that contains the input data.

  • Input:
Parameters:data – data composed of any serializable types
  • Output:
Returns:the output string
Return type:str
static json_to_data(json_data_str)

Return the data from the input JSON string.

  • Input:
Parameters:json_data_str – the json input string
  • Output:
Returns:the data
Return type:deduced from the JSON input string
static yaml_to_data(yaml_data_str)

Return the data from the input YAML string.

  • Input:
Parameters:yaml_data_str – the yaml input string
  • Output:
Returns:the data
Return type:deduced from the YAML input string
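A JSON round trip sketch, assuming the data and json_data_str arguments documented above are passed positionally:
>>> from splearn.serializer import Serializer
>>> # any serializable structure plays the role of the data argument above
>>> payload = {"nbL": 2, "nbS": 2, "initial": [1.0, 0.0]}
>>> json_str = Serializer.data_to_json(payload)
>>> restored = Serializer.json_to_data(json_str)
>>> print(restored == payload)   # expected: True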

splearn.spectral module

This module contains the Spectral and Learning class

class splearn.spectral.Spectral(rank=5, lrows=7, lcolumns=7, version='classic', partial=True, sparse=True, full_svd_calculation=False, smooth_method='none', mode_quiet=False)

Bases: sklearn.base.BaseEstimator

A Spectral estimator instance

  • Input:
Parameters:
  • rank (int) – (default value = 5) the rank used to build the automaton from the Hankel matrix
  • lrows (int or tuple of int) – (default value = 7) number or list of rows: a list of strings, or an integer indicating the max length of elements to consider if partial=True; otherwise, based on self.pref if version=”classic” or “prefix”, on self.fact otherwise
  • lcolumns (int or tuple of int) – (default value = 7) number or list of columns: a list of strings, or an integer indicating the max length of elements to consider if partial=True; otherwise, based on self.suff if version=”classic” or “suffix”, on self.fact otherwise
  • version (string) – (default value = “classic”) version name
  • partial (boolean) – (default value = True) build a partial Hankel matrix
  • sparse (boolean) – (default value = True) True if the Hankel matrix is sparse
  • full_svd_calculation (boolean) – (default value = False) if True, the entire SVD is calculated when building the Hankel matrix; otherwise it is computed with the scikit-learn randomized algorithm, only for the largest k=rank singular values.
  • smooth_method (string) –

    (default value = “none”) method of smoothing

    • ’trigram’: the trigram dictionary is computed and used by the predict function; in this case the trigram probability is used instead of the spectral probability when the latter is negative
    • ’none’ or anything else: no smoothing method is used by the predict function.
  • mode_quiet (boolean) – (default value = False) True for no output message.
Example:
>>> from splearn.datasets.base import load_data_sample
>>> from splearn.tests.datasets.get_dataset_path import get_dataset_path
>>> from splearn.spectral import Spectral
>>> data = load_data_sample(adr=get_dataset_path('3.pautomac_light.train'))
>>> sp = Spectral()
>>> sp.set_params(partial=True, lcolumns=6, lrows=6, smooth_method='trigram')
Spectral(lcolumns=6, lrows=6, mode_quiet=False, partial=True, rank=5,
 smooth_method='trigram', sparse=True, version='classic')
>>> sp.fit(data.data)
Start Hankel matrix computation
End of Hankel matrix computation
Start Building Automaton from Hankel matrix
End of Automaton computation
Spectral(lcolumns=6, lrows=6, partial=True, rank=5, smooth_method='trigram', sparse=True, version='classic')
>>> sp.automaton.initial
array([-0.00049249,  0.00304676, -0.04405996, -0.10765322, -0.08660063])
>>> sp.predict(data.data)
array([  4.38961058e-04,   1.10616861e-01,   1.35569353e-03, ...,
    4.66041996e-06,   4.68177275e-02,   5.24287604e-20])
>>> sp.loss(data.data, normalize=True)
-10.530029936056017
>>> sp.score(data.data)
10.530029936056017
automaton

Automaton built by the fit method; None by default

fit(X, y=None)

Fit the model

  • Input:
Parameters:
  • X (SplearnArray) – training data of shape [n_samples, n_features]
  • y (ndarray) – (default value = None) numpy array of shape [n_samples], target values; not used by the Spectral estimator
  • Output:
Returns:self, the Spectral estimator, with its automaton attribute instantiated
Return type:Spectral
get_params(deep=True)

Return the parameter values of the Spectral estimator

  • Output:
Returns:dictionary of the Spectral estimator parameters (name: value)
Return type:dict
hankel

Hankel built by the fit method; None by default

loss(X, y=None, normalize=True)

Log probability using the Spectral model

  • Input:
Parameters:
  • X (SplearnArray) – samples of shape (n_samples, n_features); X is the validation data.
  • y (ndarray) – (default value = None) numpy array of shape [n_samples], target values: the ground truth target for X (in the supervised case) or None (in the unsupervised case)
  • normalize (boolean) – (default value = True) if True, the result is normalized by the number of samples
  • Output:
Returns:mean (resp. sum) of the log probability corresponding to the input X if normalize is True (resp. False) and y is None. If y is a vector of target values, the mean (resp. sum) is computed over the squared differences.
Return type:float
nb_trigram()

Return the number of indices affected by the trigram computation

  • Output:
Returns:the number of trigram indices
Return type:int
polulate_dictionnaries(X)

Populates the sample, pref, suff, fact dictionaries of X

  • Input:
Parameters:X (SplearnArray) – training data of shape [n_samples, n_features]
predict(X)

Predict using the Spectral model

  • Input:
Parameters:X (SplearnArray) – samples of shape (n_samples, n_features)
  • Output:
Returns:Probability corresponding to the input X, array-like of shape = n_samples
Return type:ndarray
predict_proba(X)

Predict probability using the Spectral model

  • Input:
Parameters:X (SplearnArray) – samples of shape (n_samples, n_features)
  • Output:
Returns:Probability corresponding to the input X of shape = (n_samples)
Return type:ndarray
score(X, y=None, scoring='perplexity')

Compute the score on the input data

  • Input:
Parameters:
  • X (SplearnArray) – samples of shape (n_samples, n_features)
  • y (ndarray) – (default value = None) numpy array of shape [n_samples], target values: the ground truth target for X (in the supervised case) or None (in the unsupervised case)
  • scoring (string) – (default value = “perplexity”) method for score computation
  • Output:
Returns:score, on the input X
Return type:float
set_params(**parameters)

Set the values of the Spectral estimator parameters

  • Output:
Returns:Spectral estimator with new parameters
Return type:Spectral
trigram

The trigram dictionary

Module contents