splearn package
Subpackages
Submodules
splearn.automaton module
This module contains the Automaton class
-
class
splearn.automaton.
Automaton
(nbL=0, nbS=0, initial=[], final=[], transitions=[], type='classic') Bases:
object
Define an automaton with parameters
- Input:
Parameters: - nbL (int) – the number of letters
- nbS (int) – the number of states
- initial (list) – the initial vector
- final (list) – the final vector
- transitions (list) – the transition tables
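To illustrate how these parameters fit together, the following sketch computes the weight that a weighted automaton assigns to a word, as the product initial · M_{w1} ··· M_{wn} · final. It is plain Python written for this page, not splearn code: the helper name `word_weight` and the numeric values are hypothetical, and it assumes that `transitions[x]` is an nbS x nbS matrix for letter x.

```python
def word_weight(initial, final, transitions, word):
    """Weight of `word` (a sequence of letter indices) under the automaton."""
    # Start from the initial row vector and push it through one
    # transition matrix per letter of the word.
    state = list(initial)
    for letter in word:
        matrix = transitions[letter]
        state = [
            sum(state[i] * matrix[i][j] for i in range(len(state)))
            for j in range(len(matrix[0]))
        ]
    # Close the product with the final vector.
    return sum(s * f for s, f in zip(state, final))


# Hypothetical automaton with 2 letters (nbL=2) and 2 states (nbS=2).
initial = [1.0, 0.0]
final = [0.25, 0.5]
transitions = [
    [[0.25, 0.0], [0.0, 0.25]],  # matrix of letter 0
    [[0.0, 0.5], [0.25, 0.0]],   # matrix of letter 1
]

empty_word_weight = word_weight(initial, final, transitions, [])
weight_01 = word_weight(initial, final, transitions, [0, 1])
```

With these values the empty word gets the weight initial · final, and the word made of letter 0 followed by letter 1 goes through both matrices before the final vector is applied.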
-
BuildHankels
(lrows=[], lcolumns=[]) Return all the dense Hankel matrices built on lrows and lcolumns from an automaton.
- Input:
Parameters: - lrows (list) – prefixes
- lcolumns (list) – suffixes
- Output:
Returns: list of all Hankel matrices built on lrows and lcolumns
Return type: list
-
HouseholderReductionFw
(tau) Algorithm (Fig. 3) from the paper "Stability and Complexity of Minimising Probabilistic Automata" by Kiefer and Wachter.
- Input:
Parameters: - self (Automaton) – an object of the Automaton class
- tau (float) – error tolerance parameter, tau >= 0
- Output:
Returns: the canonical forward reduction computed to the tolerance tau
Return type: Automaton
-
static
HouseholderReflector
() Return the vector that defines the Householder reflector for x.
- Input:
Parameters: x (vector) – a vector in R^k different from 0
- Output:
Returns: v = u/||u||, where u_1 = x_1 + sign(x_1)||x|| and u_i = x_i for i >= 2
Return type: vector
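The formula above can be sketched in a few lines of plain Python. This is an illustration of the stated definition, not the splearn implementation: the function names are hypothetical, and the convention sign(0) = +1 is an assumption made here so that u is never the zero vector.

```python
import math


def householder_reflector(x):
    """Return v = u/||u|| with u_1 = x_1 + sign(x_1)||x|| and u_i = x_i for i >= 2."""
    norm_x = math.sqrt(sum(c * c for c in x))
    if norm_x == 0.0:
        raise ValueError("x must be different from 0")
    # Assumption: sign(0) is taken as +1.
    sign = 1.0 if x[0] >= 0 else -1.0
    u = [x[0] + sign * norm_x] + list(x[1:])
    norm_u = math.sqrt(sum(c * c for c in u))
    return [c / norm_u for c in u]


def reflect(x, v):
    """Apply the reflection x - 2 (v . x) v defined by the unit vector v."""
    dot = sum(a * b for a, b in zip(v, x))
    return [a - 2.0 * dot * b for a, b in zip(x, v)]


x = [3.0, 4.0]
v = householder_reflector(x)
reflected = reflect(x, v)  # only the first coordinate remains nonzero
```

Reflecting x with its own reflector sends it to -sign(x_1)||x|| times the first basis vector, which is the property Householder-based reductions rely on.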
-
static
SimpleExample
() A probabilistic automaton with two states and two letters.
- Output:
Returns: an example Automaton instance with simple values
Return type: Automaton
-
calc_prefix_completion_weights
(prefix) Compute the weight of each letter as the next symbol after a prefix, as needed for instance for the SPiCe competition.
- Input:
Parameters: - self (Automaton) – the automaton, which must be the prefix transformation of an automaton (see transformation())
- prefix (list) – list of integers representing a prefix
- Output:
Returns: a dictionary with all alphabet letters as keys. The associated values are the weights of being the next letter.
Return type: dict
-
final
The vector containing the final weight of each state
-
get_dot
(threshold=0.0, nb_dec=2, title='Weighted Automata') Return a string that describes the automaton in dot (Graphviz) format.
Example:
>>> from splearn.datasets.base import load_data_sample
>>> from splearn.tests.datasets.get_dataset_path import get_dataset_path
>>> from splearn import Spectral
>>> train_file = '3.pautomac_light.train'
>>> data = load_data_sample(adr=get_dataset_path(train_file))
>>> sp = Spectral()
>>> sp.fit(X=data.data)
>>> dotfile = "3.pautomac_light.train.dot"
>>> dot = sp.automaton.get_dot(threshold=0.2, title=dotfile)
>>> # To display the dot string one can use graphviz:
>>> from graphviz import Source
>>> src = Source(dot)
>>> src.render(dotfile + '.gv', view=True)
- Input:
Parameters: - self (Automaton) – the automaton to export
- threshold (float) – minimum value to keep. If |weight| < threshold, the corresponding transition is not kept as an edge in the final dot string.
- nb_dec (int) – the number of decimals to keep for the weights
- title (string) – the top comment of the string
- Output:
Returns: a string with the current automaton in dot format
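The roles of threshold, nb_dec and title can be sketched as follows. This is a hypothetical stand-alone function, not the output format of get_dot: the node and edge labels are assumptions chosen for the illustration. Only the three documented behaviours are taken from the description above (edges with |weight| < threshold are dropped, weights are rounded to nb_dec decimals, and title becomes the top comment).

```python
def automaton_to_dot(initial, final, transitions,
                     threshold=0.0, nb_dec=2, title="Weighted Automata"):
    """Build a Graphviz dot string for a weighted automaton (illustrative layout)."""
    lines = ["// " + title, "digraph automaton {"]
    # One node per state, labelled with its initial and final weights.
    for state in range(len(initial)):
        label = "{}\\ninit={:.{d}f}\\nfinal={:.{d}f}".format(
            state, initial[state], final[state], d=nb_dec)
        lines.append('  {} [label="{}"];'.format(state, label))
    # One edge per transition whose absolute weight reaches the threshold.
    for letter, matrix in enumerate(transitions):
        for src, row in enumerate(matrix):
            for dst, weight in enumerate(row):
                if abs(weight) < threshold:
                    continue
                lines.append('  {} -> {} [label="{}:{:.{d}f}"];'.format(
                    src, dst, letter, weight, d=nb_dec))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical automaton with 2 letters and 2 states.
initial = [1.0, 0.0]
final = [0.25, 0.5]
transitions = [
    [[0.25, 0.0], [0.0, 0.25]],  # matrix of letter 0
    [[0.0, 0.5], [0.25, 0.0]],   # matrix of letter 1
]
dot = automaton_to_dot(initial, final, transitions, threshold=0.2, title="demo")
```

Raising the threshold thins out the graph, which is mostly useful for automata learned from data, where many transitions carry weights close to zero.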
-
initial
The vector containing the initial weight of each state
-
isAbsConv
Whether the automaton meets the sufficient condition to be absolutely convergent
-
static
load_Pautomac_Automaton
() Load an automaton from a PAutomaC file and return an object of the class Automaton. Works for PFA and PDFA, not for HMM.
- Input:
Parameters: adr (string) – address and name of the file to load
- Output:
Returns: an automaton instance
Return type: Automaton
-
minimisation
(tau) Compute an equivalent minimal automaton, to the precision tau.
- Input:
Parameters: self (Automaton) – the automaton A to minimise
- Output:
Returns: B, equivalent to A with a minimal number of states
Return type: Automaton
-
mirror
() Compute the mirror automaton.
- Input:
Parameters: self (Automaton) – Automaton(nbL, nbS, initial, final, transitions)
- Output:
Returns: mA = Automaton(nbL, nbS, final, initial, Newtransitions) where Newtransitions[x] = transpose(transitions[x])
Return type: Automaton
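The construction above amounts to swapping the initial and final vectors and transposing every transition matrix. The sketch below shows this on plain Python lists, together with the property that the mirror assigns to a reversed word the weight the original assigns to the word. The function names and numeric values are hypothetical, not part of splearn.

```python
def word_weight(initial, final, transitions, word):
    """Weight initial . M_w1 ... M_wn . final of a word given as letter indices."""
    state = list(initial)
    for letter in word:
        matrix = transitions[letter]
        state = [
            sum(state[i] * matrix[i][j] for i in range(len(state)))
            for j in range(len(matrix[0]))
        ]
    return sum(s * f for s, f in zip(state, final))


def mirror(initial, final, transitions):
    """Swap initial and final vectors and transpose each transition matrix."""
    new_transitions = [[list(column) for column in zip(*matrix)]
                       for matrix in transitions]
    return list(final), list(initial), new_transitions


# Hypothetical automaton with 2 letters and 2 states.
initial = [1.0, 0.0]
final = [0.25, 0.5]
transitions = [
    [[0.25, 0.0], [0.0, 0.25]],  # matrix of letter 0
    [[0.0, 0.5], [0.25, 0.0]],   # matrix of letter 1
]

m_initial, m_final, m_transitions = mirror(initial, final, transitions)
word = [0, 1]
original_weight = word_weight(initial, final, transitions, word)
mirrored_weight = word_weight(m_initial, m_final, m_transitions, word[::-1])
```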
-
static
mulHouseholderReflector
(v) Return the product of u by the n x n Householder reflector matrix based on v.
- Input:
Parameters: - u (vector) – row vector of R^n
- v (vector) – vector of R^k (k <= n)
- Output:
Returns: w, row vector of R^n, w = u P(v), where P(v) = [I_{n-k} 0; 0 R] \in R^{n \times n} and R = I_k - 2 v^T v
Return type: vector
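Because P(v) is block diagonal, the product never needs the full n x n matrix: the first n-k entries of u are copied unchanged and only the last k entries are reflected. The following plain Python sketch follows that reading of the formula, taking v as a unit row vector. The function name is hypothetical and this is not the splearn implementation.

```python
import math


def mul_householder_reflector(u, v):
    """Return w = u P(v) with P(v) = [I_{n-k} 0; 0 R] and R = I_k - 2 v^T v."""
    n, k = len(u), len(v)
    if k > n:
        raise ValueError("v must not be longer than u")
    head, tail = list(u[:n - k]), list(u[n - k:])
    # tail R = tail - 2 (tail . v) v, since v^T v is the outer product of v.
    dot = sum(a * b for a, b in zip(tail, v))
    return head + [a - 2.0 * dot * b for a, b in zip(tail, v)]


# Unit vector v of R^2 applied to a row vector u of R^3.
v = [2.0 / math.sqrt(5.0), 1.0 / math.sqrt(5.0)]
u = [1.0, 3.0, 4.0]
w = mul_householder_reflector(u, v)
```

Here the leading entry of u is left alone and the trailing pair (3, 4) is sent to (-5, 0).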
-
nbL
The number of letters
-
nbS
The number of states
-
static
read
(format='json') Return an Automaton built with attributes read from a file.
- Input:
Parameters: - filename (str) – the name of the input file
- format (str) – 'json' or 'yaml'
- Output:
Returns: the output automaton
Return type: Automaton
-
sum
() Compute the sum of a rational series.
- Input:
Parameters: self (Automaton) – the automaton defining the series
- Output:
Returns: sum over all samples of transitions
Return type: ndarray
-
to_hankel
(lrows, lcolumns, mode_quiet=False) Return a Hankel instance (dense, classic and not partial) with matrices built on lrows and lcolumns from an automaton.
- Input:
Parameters: - lrows (list) – prefixes
- lcolumns (list) – suffixes
- mode_quiet (boolean) – (default value = False) True for no output message
- Output:
Returns: Hankel instance
Return type: Hankel
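For a series computed by an automaton, the entry of the Hankel matrix at row u (a prefix) and column v (a suffix) is the weight of the concatenated word uv. The sketch below builds that single dense matrix in plain Python. It does not create a splearn Hankel object, and the names, the encoding of words as tuples of letter indices, and the numeric values are assumptions made for the illustration.

```python
def word_weight(initial, final, transitions, word):
    """Weight initial . M_w1 ... M_wn . final of a word given as letter indices."""
    state = list(initial)
    for letter in word:
        matrix = transitions[letter]
        state = [
            sum(state[i] * matrix[i][j] for i in range(len(state)))
            for j in range(len(matrix[0]))
        ]
    return sum(s * f for s, f in zip(state, final))


def hankel_from_automaton(initial, final, transitions, lrows, lcolumns):
    """Dense matrix whose entry [i][j] is the weight of lrows[i] + lcolumns[j]."""
    return [
        [word_weight(initial, final, transitions, list(u) + list(v))
         for v in lcolumns]
        for u in lrows
    ]


# Hypothetical automaton with 2 letters and 2 states.
initial = [1.0, 0.0]
final = [0.25, 0.5]
transitions = [
    [[0.25, 0.0], [0.0, 0.25]],  # matrix of letter 0
    [[0.0, 0.5], [0.25, 0.0]],   # matrix of letter 1
]
lrows = [(), (0,)]      # prefixes: empty word, then the word "0"
lcolumns = [(), (1,)]   # suffixes: empty word, then the word "1"
hankel = hankel_from_automaton(initial, final, transitions, lrows, lcolumns)
```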
-
transformation
(source='classic', target='prefix') Take an automaton as input and transform it.
- Input:
Parameters: - source (str) – "classic" (default), "prefix", "factor" or "suffix"
- target (str) – "prefix" (default), "factor", "classic" or "suffix"
- Output:
Returns: the resulting automaton instance
Return type: Automaton
The transformation is done according to the source and target parameters.
Warning: it does not check the convergence.
-
transitions
The list of arrays defining the transitions
-
type
The string indicating the type of automaton
splearn.hankel module
This module contains the Hankel class
-
class
splearn.hankel.
Hankel
(sample_instance=None, lrows=[], lcolumns=[], version='classic', partial=False, sparse=False, full_svd_calculation=False, mode_quiet=False, lhankel=None) Bases:
object
A Hankel instance, which computes the list of Hankel matrices
- Input:
Parameters: - sample_instance (SplearnArray) – instance of SplearnArray
- lrows (int or list of int) – number or list of rows; a list of strings if partial=True, otherwise based on self.pref if version="classic" or "prefix", self.fact otherwise
- lcolumns (int or list of int) – number or list of columns; a list of strings if partial=True, otherwise based on self.suff if version="classic" or "suffix", self.fact otherwise
- version (string) – (default value = "classic") version name
- partial (boolean) – (default value = False) build partial Hankel matrices
- sparse (boolean) – (default value = False) True if the Hankel matrix is sparse
- full_svd_calculation (boolean) – (default value = False) if True, the entire SVD is calculated for building the Hankel matrix. Otherwise it is done by the sklearn randomized algorithm, only for the k = rank largest singular values.
- mode_quiet (boolean) – (default value = False) True for no output message
- lhankel (list) – list of all Hankel matrices. At least one of the two parameters sample_instance or lhankel must not be None. If sample_instance is given, the Hankel instance is built directly from the sample dictionary; otherwise it is deduced from the lhankel list of matrices.
Example:
>>> from splearn import Learning, Hankel, Spectral
>>> from splearn.datasets.base import load_data_sample
>>> train_file = '0.spice.train'
>>> pT = load_data_sample(adr=train_file)
>>> sp = Spectral()
>>> sp.fit(X=pT.data)
>>> lhankel = Hankel(sample_instance=pT.data,
...                  lrows=6, lcolumns=6, version="classic",
...                  partial=True, sparse=True, mode_quiet=True).lhankel
-
build
(sample, pref, suff, fact, lrows, lcolumns, mode_quiet) Create a Hankel matrix.
- Input:
Parameters: - sample (dict) – the keys are the words and the values are the number of times they appear in the sample
- pref (dict) – the keys are the prefixes and the values are the number of times they appear in the sample
- suff (dict) – the keys are the suffixes and the values are the number of times they appear in the sample
- fact (dict) – the keys are the factors and the values are the number of times they appear in the sample
- lrows (int or list of int) – number or list of rows; a list of strings if partial=True, otherwise based on self.pref if version="classic" or "prefix", self.fact otherwise
- lcolumns (int or list of int) – number or list of columns; a list of strings if partial=True, otherwise based on self.suff if version="classic" or "suffix", self.fact otherwise
- mode_quiet (boolean) – True for no output message
- Output:
Returns: lhankel, the list of Hankel matrices, either DoK-based sparse matrices or dense numpy matrices
Return type: list of matrix
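As a rough picture of what such a matrix holds in the "classic" case, the sketch below fills a dense matrix from a sample dictionary: the entry at row u and column v is the number of times the word uv appears in the sample. This is a simplified illustration, not the build method itself. The function name is hypothetical, words are encoded as tuples of letter indices by assumption, and only one matrix is built rather than the full list.

```python
def count_hankel(sample, rows, columns):
    """Dense matrix with entry [i][j] = occurrences of rows[i] + columns[j]."""
    # Words absent from the sample count as zero occurrences.
    return [[sample.get(u + v, 0) for v in columns] for u in rows]


# Hypothetical sample: word -> number of occurrences.
sample = {(): 2, (0,): 3, (0, 1): 1, (1,): 4}
rows = [(), (0,)]      # prefixes
columns = [(), (1,)]   # suffixes
hankel = count_hankel(sample, rows, columns)
```

Most entries of such a matrix are zero on real data, which is why a dictionary-of-keys sparse layout is offered next to the dense one.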
-
build_from_sample
Boolean that indicates whether the matrices have been built from a sample or not (in which case they are built directly from an Automaton)
-
nbEx
Number of examples
-
nbL
Number of letters
-
static
read
(format='json') Return a Hankel built with attributes read from a file.
- Input:
Parameters: - filename (str) – the name of the input file
- format (str) – 'json' or 'yaml'
- Output:
Returns: the output Hankel
Return type: Hankel
splearn.serializer module
This module contains the Serializer class
-
class
splearn.serializer.
Serializer
Bases:
object
Serializer is a helper object for data serialization
-
static
data_to_json
() Return a string in JSON format that contains the input data.
- Input:
Parameters: data – data composed of any serializable types
- Output:
Returns: the output string
Return type: str
-
static
data_to_yaml
() Return a string in YAML format that contains the input data.
- Input:
Parameters: data – data composed of any serializable types
- Output:
Returns: the output string
Return type: str
-
static
json_to_data
() Return the data from an input JSON string.
- Input:
Parameters: json_data_str – the JSON input string
- Output:
Returns: the data
Return type: deduced from the JSON input string
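The round trip that data_to_json and json_to_data are described as providing can be pictured with the standard library alone. The sketch below is not the Serializer implementation and says nothing about how splearn encodes arrays: the two function names and the dictionary layout are hypothetical, and the data is restricted to types that the json module handles natively.

```python
import json


def sketch_data_to_json(data):
    """Serialize plain Python data (dict, list, str, int, float) to a JSON string."""
    return json.dumps(data, sort_keys=True)


def sketch_json_to_data(json_data_str):
    """Rebuild the Python data; its type is deduced from the JSON content."""
    return json.loads(json_data_str)


# Hypothetical automaton-like record made only of JSON-friendly types.
record = {"nbL": 2, "nbS": 2, "initial": [1.0, 0.0], "type": "classic"}
encoded = sketch_data_to_json(record)
decoded = sketch_json_to_data(encoded)
```

Types without a JSON counterpart, such as tuples or numpy arrays, would not survive this naive round trip unchanged.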
-
static
yaml_to_data
() Return the data from an input YAML string.
- Input:
Parameters: yaml_data_str – the YAML input string
- Output:
Returns: the data
Return type: deduced from the YAML input string
splearn.spectral module
This module contains the Spectral and Learning classes
-
class
splearn.spectral.
Spectral
(rank=5, lrows=7, lcolumns=7, version='classic', partial=True, sparse=True, full_svd_calculation=False, smooth_method='none', mode_quiet=False) Bases:
sklearn.base.BaseEstimator
A Spectral estimator instance
- Input:
Parameters: - rank (int) – (default value = 5) the rank
- lrows (int or tuple of int) – (default value = 7) number or list of rows; a list of strings, or an integer indicating the maximum length of the elements to consider, if partial=True; otherwise based on self.pref if version="classic" or "prefix", self.fact otherwise
- lcolumns (int or tuple of int) – (default value = 7) number or list of columns; a list of strings, or an integer indicating the maximum length of the elements to consider, if partial=True; otherwise based on self.suff if version="classic" or "suffix", self.fact otherwise
- version (string) – (default value = "classic") version name
- partial (boolean) – (default value = True) build a partial Hankel matrix
- sparse (boolean) – (default value = True) True if the Hankel matrix is sparse
- full_svd_calculation (boolean) – (default value = False) if True, the entire SVD is calculated for building the Hankel matrix. Otherwise it is done by the sklearn randomized algorithm, only for the k = rank largest singular values.
- smooth_method (string) – (default value = "none") method of smoothing
- 'trigram': the trigram (3-gram) dictionary is computed and used by the predict function; in this case the trigram probability is used instead of the spectral probability when the latter is negative
- 'none' or anything else: no smoothing method is used in the predict function
- mode_quiet (boolean) – (default value = False) True for no output message
Example:
>>> from splearn.spectral import Spectral
>>> sp = Spectral()
>>> sp.set_params(partial=True, lcolumns=6, lrows=6, smooth_method='trigram')
Spectral(lcolumns=6, lrows=6, mode_quiet=False, partial=True, rank=5, smooth_method='trigram', sparse=True, version='classic')
>>> sp.fit(data.data)
Start Hankel matrix computation
End of Hankel matrix computation
Start Building Automaton from Hankel matrix
End of Automaton computation
Spectral(lcolumns=6, lrows=6, partial=True, rank=5, smooth_method='trigram', sparse=True, version='classic')
>>> sp.automaton.initial
array([-0.00049249, 0.00304676, -0.04405996, -0.10765322, -0.08660063])
>>> sp.predict(data.data)
array([ 4.38961058e-04, 1.10616861e-01, 1.35569353e-03, ..., 4.66041996e-06, 4.68177275e-02, 5.24287604e-20])
>>> sp.loss(data.data, normalize=True)
-10.530029936056017
>>> sp.score(data.data)
10.530029936056017
-
automaton
Automaton built by the fit method. None by default
-
fit
(X, y=None) Fit the model.
- Input:
Parameters: - X (SplearnArray) – object of shape [n_samples, n_features], training data
- y (ndarray) – (default value = None) numpy array of shape [n_samples], target values; not used by the Spectral estimator
- Output:
Returns: the Spectral instance itself, with its automaton attribute set
Return type: Spectral
-
get_params
(deep=True) Return the parameter values of the Spectral estimator.
- Output:
Returns: dictionary of the Spectral estimator parameters, name : value
Return type: dict
-
hankel
Hankel built by the fit method. None by default
-
loss
(X, y=None, normalize=True) Log probability using the Spectral model.
- Input:
Parameters: - X (SplearnArray) – samples, of shape (n_samples, n_features). X is validation data.
- y (ndarray) – (default value = None) numpy array of shape [n_samples], target values: the ground truth target for X (in the supervised case) or None (in the unsupervised case)
- normalize (boolean) – (default value = True) if True, the result is normalized by the number of samples
- Output:
Returns: mean (resp. sum) of the log probability corresponding to the input X if normalize is True (resp. False) and y is None. If y is a vector of target values, the mean (resp. sum) is calculated over the squares of the differences.
Return type: float
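For the unsupervised case (y is None), the quantity described above can be sketched as the mean or the sum of the logarithms of the predicted probabilities. The function below is a hypothetical stand-alone helper, not the loss method itself. It assumes strictly positive probabilities, and how splearn treats zero or negative predictions is not covered here.

```python
import math


def log_probability_loss(probabilities, normalize=True):
    """Mean (normalize=True) or sum (normalize=False) of log probabilities."""
    # Assumption: every probability is strictly positive.
    logs = [math.log(p) for p in probabilities]
    total = sum(logs)
    return total / len(logs) if normalize else total


# Hypothetical predictions for two samples.
probabilities = [0.5, 0.25]
mean_loss = log_probability_loss(probabilities, normalize=True)
sum_loss = log_probability_loss(probabilities, normalize=False)
```

Since probabilities are at most 1, both values are negative or zero, which matches the sign of the loss shown in the class example.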
-
nb_trigram
() Return the number of indices affected by the trigram computation.
- Output:
Returns: number of trigram_index
Return type: int
-
polulate_dictionnaries
(X) Populate the sample, pref, suff and fact dictionaries of X.
- Input:
Parameters: X (SplearnArray) – object of shape [n_samples, n_features], training data
-
predict
(X) Predict using the Spectral model.
- Input:
Parameters: X (SplearnArray) – samples, of shape (n_samples, n_features)
- Output:
Returns: probability corresponding to the input X, array-like of shape = n_samples
Return type: ndarray
-
predict_proba
(X) Predict probability using the Spectral model.
- Input:
Parameters: X (SplearnArray) – samples, of shape (n_samples, n_features)
- Output:
Returns: probability corresponding to the input X, of shape = (n_samples)
Return type: ndarray
-
score
(X, y=None, scoring='perplexity') Score of the input target.
- Input:
Parameters: - X (SplearnArray) – samples, of shape (n_samples, n_features)
- y (ndarray) – (default value = None) numpy array of shape [n_samples], target values: the ground truth target for X (in the supervised case) or None (in the unsupervised case)
- scoring (string) – (default value = "perplexity") method for score computation
- Output:
Returns: score on the input X
Return type: float
-
set_params
(**parameters) Set the values of the Spectral estimator parameters.
- Output:
Returns: Spectral estimator with new parameters
Return type: Spectral
-
trigram
The trigram dictionary