Lasso Regression model with built-in CV using LARS algorithm.
Use mlr_model_type: lasso_lars_cv to use this MLR model in the recipe.
Classes:
LassoLarsCVModel(input_datasets, **kwargs)
Lasso Regression model with built-in CV using LARS algorithm.
- class esmvaltool.diag_scripts.mlr.models.lasso_lars_cv.LassoLarsCVModel(input_datasets, **kwargs)[source]#
Bases: LinearModel
Lasso Regression model with built-in CV using LARS algorithm.
Attributes:
categorical_features
Categorical features.
data
Input data of the MLR model.
features
Features of the input data.
features_after_preprocessing
Features of the input data after preprocessing.
features_types
Types of the features.
features_units
Units of the features.
fit_kwargs
Keyword arguments for fit().
group_attributes
Group attributes of the input data.
label
Label of the input data.
label_units
Units of the label.
mlr_model_type
MLR model type.
numerical_features
Numerical features.
parameters
Parameters of the complete MLR model pipeline.
random_state
Random state instance.
Methods:
create(mlr_model_type,*args,**kwargs)
Create desired MLR model subclass (factory method).
efecv(**kwargs)
Perform exhaustive feature elimination using cross-validation.
export_prediction_data([filename])
Export all prediction data contained in self._data.
export_training_data([filename])
Export all training data contained in self._data.
fit()
Print final alpha after successful fitting.
get_ancestors([label,features,...])
Return ancestor files.
get_data_frame(data_type[,impute_nans])
Return data frame of specified type.
get_x_array(data_type[,impute_nans])
Return x data of specific type.
get_y_array(data_type[,impute_nans])
Return y data of specific type.
grid_search_cv(param_grid,**kwargs)
Perform exhaustive parameter search using cross-validation.
plot_1d_model([filename,n_points])
Plot lineplot that represents the MLR model.
plot_coefs([filename])
Plot linear coefficients of models.
plot_feature_importance([filename,color_coded])
Plot feature importance given by linear coefficients.
plot_partial_dependences([filename])
Plot partial dependences for every feature.
plot_prediction_errors([filename])
Plot predicted vs. true values.
plot_residuals([filename])
Plot residuals of training and test (if available) data.
plot_residuals_distribution([filename])
Plot distribution of residuals of training and test data (KDE).
plot_residuals_histogram([filename])
Plot histogram of residuals of training and test data.
plot_scatterplots([filename])
Plot scatterplots label vs. feature for every feature.
predict([save_mlr_model_error,...])
Perform prediction using the MLR model(s) and write *.nc files.
print_correlation_matrices()
Print correlation matrices for all datasets.
print_regression_metrics([logo])
Print all available regression metrics for training data.
register_mlr_model(mlr_model_type)
Add MLR model (subclass of this class) (decorator).
reset_pipeline()
Reset regressor pipeline.
rfecv(**kwargs)
Perform recursive feature elimination using cross-validation.
test_normality_of_residuals()
Perform Shapiro-Wilk test for normality of residuals.
update_parameters(**params)
Update parameters of the whole pipeline.
- property categorical_features#
Categorical features.
- Type:
- classmethod create(mlr_model_type, *args, **kwargs)#
Create desired MLR model subclass (factory method).
- property data#
Input data of the MLR model.
- Type:
- efecv(**kwargs)#
Perform exhaustive feature elimination using cross-validation.
- Parameters:
**kwargs (keyword arguments, optional) – Additional options for esmvaltool.diag_scripts.mlr.custom_sklearn.cross_val_score_weighted().
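As a rough illustration of what an exhaustive feature search with cross-validation can look like, the sketch below scores every non-empty feature subset with plain scikit-learn. It assumes that "exhaustive" means evaluating all subsets, which this page does not spell out. It also uses sklearn.model_selection.cross_val_score rather than the weighted ESMValTool variant named above. The synthetic data and variable names are hypothetical.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical synthetic data: only feature 0 carries signal.
rng = np.random.default_rng(0)
x_data = rng.normal(size=(80, 3))
y_data = 3.0 * x_data[:, 0] + 0.1 * rng.normal(size=80)

best_score = -np.inf
best_subset = None
n_features = x_data.shape[1]

# Evaluate every non-empty subset of feature indices.
for subset_size in range(1, n_features + 1):
    for subset in combinations(range(n_features), subset_size):
        # Mean R^2 over 5 folds (default scoring for regressors).
        score = cross_val_score(
            LinearRegression(), x_data[:, list(subset)], y_data, cv=5
        ).mean()
        if score > best_score:
            best_score, best_subset = score, subset

print(f"Best feature subset: {best_subset}")
```

The number of subsets grows as 2^n - 1, so a search like this is only practical for a small number of features.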
- export_prediction_data(filename=None)#
Export all prediction data contained in self._data.
- Parameters:
filename (str, optional (default: '{data_type}_{pred_name}.csv')) – Name of the exported files.
- export_training_data(filename=None)#
Export all training data contained in self._data.
- Parameters:
filename (str, optional (default: '{data_type}.csv')) – Name of the exported files.
- property features#
Features of the input data.
- Type:
- property features_after_preprocessing#
Features of the input data after preprocessing.
- Type:
- property features_types#
Types of the features.
- Type:
- property features_units#
Units of the features.
- Type:
- fit()[source]#
Print final alpha after successful fitting.
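To give a sense of what the final alpha is, here is a minimal sketch using scikit-learn's LassoLarsCV directly. It assumes that this class builds on that estimator, which this page does not state explicitly. The synthetic data is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV

# Hypothetical synthetic data: 5 features, only two of them informative.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 5))
true_coefs = np.array([2.0, 0.0, -1.0, 0.0, 0.0])
y_train = x_train @ true_coefs + 0.1 * rng.normal(size=100)

# The regularization strength alpha is chosen by 5-fold cross-validation.
model = LassoLarsCV(cv=5)
model.fit(x_train, y_train)

# alpha_ is the regularization strength selected by cross-validation.
print(f"Final alpha: {model.alpha_}")
print(f"Coefficients: {model.coef_}")
```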
- property fit_kwargs#
Keyword arguments for fit().
- Type:
- get_ancestors(label=True, features=None, prediction_names=None, prediction_reference=False)#
Return ancestor files.
- Parameters:
label (bool, optional (default: True)) – Return label files.
features (list of str, optional (default: None)) – Features for which files should be returned. If None, return files for all features.
prediction_names (list of str, optional (default: None)) – Prediction names for which files should be returned. If None, return files for all prediction names.
prediction_reference (bool, optional (default: False)) – Return prediction_reference files if available for given prediction_names.
- Returns:
Ancestor files.
- Return type:
- Raises:
ValueError – Invalid feature or prediction_name given.
- get_data_frame(data_type, impute_nans=False)#
Return data frame of specified type.
- Parameters:
- Returns:
Desired data.
- Return type:
- Raises:
TypeError – data_type is invalid or data does not exist (e.g. test data is not set).
- get_x_array(data_type, impute_nans=False)#
Return x data of specific type.
- Parameters:
- Returns:
Desired data.
- Return type:
- Raises:
TypeError – data_type is invalid or data does not exist (e.g. test data is not set).
- get_y_array(data_type, impute_nans=False)#
Return y data of specific type.
- Parameters:
- Returns:
Desired data.
- Return type:
- Raises:
TypeError – data_type is invalid or data does not exist (e.g. test data is not set).
- grid_search_cv(param_grid, **kwargs)#
Perform exhaustive parameter search using cross-validation.
- Parameters:
param_grid (dict or list of dict) – Parameter names (keys) and ranges (values) for the search. Have to be given for each step of the pipeline separated by two underscores, i.e. s__p is the parameter p for step s.
**kwargs (keyword arguments, optional) – Additional options for sklearn.model_selection.GridSearchCV.
- Raises:
ValueError – Final regressor does not supply the attributes best_estimator_ or best_params_.
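The s__p naming convention is the one scikit-learn uses for pipelines. The sketch below shows a param_grid of that shape passed to sklearn.model_selection.GridSearchCV on a standalone pipeline. The step names scaler and regressor and the synthetic data are hypothetical. The step names of the actual MLR pipeline are not given on this page.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data.
rng = np.random.default_rng(0)
x_data = rng.normal(size=(100, 3))
y_data = 2.0 * x_data[:, 0] - x_data[:, 1] + 0.1 * rng.normal(size=100)

# Hypothetical two-step pipeline; step names are illustrative only.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", Lasso()),
])

# Keys follow the pattern <step>__<parameter>.
param_grid = {
    "scaler__with_mean": [True, False],
    "regressor__alpha": [0.01, 0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(x_data, y_data)
print(f"Best parameters: {search.best_params_}")
```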
- property group_attributes#
Group attributes of the input data.
- Type:
- property label#
Label of the input data.
- Type:
- property label_units#
Units of the label.
- Type:
- property mlr_model_type#
MLR model type.
- Type:
- property numerical_features#
Numerical features.
- Type:
- property parameters#
Parameters of the complete MLR model pipeline.
- Type:
- plot_1d_model(filename=None, n_points=1000)#
Plot lineplot that represents the MLR model.
Note
This only works for a model with a single feature.
- Parameters:
- Raises:
sklearn.exceptions.NotFittedError – MLR model is not fitted.
ValueError – MLR model is built from more than 1 feature.
- plot_coefs(filename=None)#
Plot linear coefficients of models.
Note
The features plotted here are not necessarily the real input features, but the ones after preprocessing.
- Parameters:
filename (str, optional (default: 'coefs')) – Name of the plot file.
- plot_feature_importance(filename=None, color_coded=True)#
Plot feature importance given by linear coefficients.
Note
The features plotted here are not necessarily the real input features, but the ones after preprocessing.
- plot_partial_dependences(filename=None)#
Plot partial dependences for every feature.
- Parameters:
filename (str, optional (default: 'partial_dependece_{feature}')) – Name of the plot file.
- Raises:
sklearn.exceptions.NotFittedError – MLR model is not fitted.
- plot_prediction_errors(filename=None)#
Plot predicted vs. true values.
- Parameters:
filename (str, optional (default: 'prediction_errors')) – Name of the plot file.
- Raises:
sklearn.exceptions.NotFittedError – MLR model is not fitted.
- plot_residuals(filename=None)#
Plot residuals of training and test (if available) data.
- Parameters:
filename (str, optional (default: 'residuals')) – Name of the plot file.
- Raises:
sklearn.exceptions.NotFittedError – MLR model is not fitted.
- plot_residuals_distribution(filename=None)#
Plot distribution of residuals of training and test data (KDE).
- Parameters:
filename (str, optional (default: 'residuals_distribution')) – Name of the plot file.
- Raises:
sklearn.exceptions.NotFittedError – MLR model is not fitted.
- plot_residuals_histogram(filename=None)#
Plot histogram of residuals of training and test data.
- Parameters:
filename (str, optional (default: 'residuals_histogram')) – Name of the plot file.
- Raises:
sklearn.exceptions.NotFittedError – MLR model is not fitted.
- plot_scatterplots(filename=None)#
Plot scatterplots label vs. feature for every feature.
- Parameters:
filename (str, optional (default: 'scatterplot_{feature}')) – Name of the plot file.
- Raises:
sklearn.exceptions.NotFittedError – MLR model is not fitted.
- predict(save_mlr_model_error=None, save_lime_importance=False, save_propagated_errors=False, **kwargs)#
Perform prediction using the MLR model(s) and write *.nc files.
- Parameters:
save_mlr_model_error (str or int, optional) – Additionally saves estimated squared MLR model error. This error represents the uncertainty of the prediction caused by the MLR model itself and not by errors in the prediction input data (errors in the input data are considered by including datasets with var_type set to prediction_input_error and setting save_propagated_errors to True). If the option is set to 'test', the (constant) error is estimated as RMSEP using a (hold-out) test data set. Only possible if test data is available, i.e. the option test_size is not set to False during class initialization. If the option is set to 'logo', the (constant) error is estimated as RMSEP using leave-one-group-out cross-validation using the group_attributes. Only possible if group_datasets_by_attributes is given. If the option is set to an integer n (!= 0), the (constant) error is estimated as RMSEP using n-fold cross-validation.
save_lime_importance (bool, optional (default: False)) – Additionally saves local feature importance given by LIME (Local Interpretable Model-agnostic Explanations).
save_propagated_errors (bool, optional (default: False)) – Additionally saves propagated errors from prediction_input_error datasets. Only possible when these are available.
**kwargs (keyword arguments, optional) – Additional options for the final regressor's predict() function.
- Raises:
RuntimeError – return_var and return_cov are both set to True.
sklearn.exceptions.NotFittedError – MLR model is not fitted.
ValueError – An invalid value for save_mlr_model_error is given.
ValueError – save_propagated_errors is True and no prediction_input_error data is available.
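The two RMSEP estimates described for save_mlr_model_error ('test' and an integer n) can be sketched with plain scikit-learn as follows. This is not the internal implementation. The helper rmsep, the LinearRegression stand-in, and the synthetic data are hypothetical. RMSEP is taken here to be the root mean squared error of prediction on data not used for fitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict, train_test_split


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction (hypothetical helper)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical synthetic data with noise of standard deviation 0.5.
rng = np.random.default_rng(0)
x_data = rng.normal(size=(200, 3))
y_data = x_data @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.normal(size=200)

# Variant 1: hold-out test set (analogous to the 'test' option).
x_train, x_test, y_train, y_test = train_test_split(
    x_data, y_data, test_size=0.25, random_state=0
)
model = LinearRegression().fit(x_train, y_train)
rmsep_test = rmsep(y_test, model.predict(x_test))

# Variant 2: n-fold cross-validation (analogous to an integer option n).
n_folds = 5
y_pred_cv = cross_val_predict(LinearRegression(), x_data, y_data, cv=n_folds)
rmsep_cv = rmsep(y_data, y_pred_cv)

# The page describes the saved quantity as the squared error.
print(f"Squared RMSEP (hold-out): {rmsep_test ** 2}")
print(f"Squared RMSEP ({n_folds}-fold CV): {rmsep_cv ** 2}")
```

In both variants the result is a single constant, which matches the "(constant) error" wording above.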
- print_correlation_matrices()#
Print correlation matrices for all datasets.
- print_regression_metrics(logo=False)#
Print all available regression metrics for training data.
- Parameters:
logo (bool, optional (default: False)) – Print regression metrics using sklearn.model_selection.LeaveOneGroupOut cross-validation. Only possible when group_datasets_by_attributes was given during class initialization.
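For readers unfamiliar with leave-one-group-out cross-validation, the sketch below computes one metric per held-out group with sklearn.model_selection.LeaveOneGroupOut. The group labels, the LinearRegression stand-in, and the synthetic data are hypothetical. In the MLR model the groups would come from group_datasets_by_attributes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Hypothetical synthetic data split into 4 groups of 25 samples each.
rng = np.random.default_rng(0)
x_data = rng.normal(size=(100, 2))
y_data = x_data[:, 0] - x_data[:, 1] + 0.1 * rng.normal(size=100)
groups = np.repeat([0, 1, 2, 3], 25)

# Each split holds out all samples of exactly one group.
scores = cross_val_score(
    LinearRegression(),
    x_data,
    y_data,
    groups=groups,
    cv=LeaveOneGroupOut(),
    scoring="neg_root_mean_squared_error",
)

# scikit-learn reports errors as negative scores, so flip the sign.
rmse_per_group = -scores
print(f"RMSE per held-out group: {rmse_per_group}")
```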
- property random_state#
Random state instance.
- classmethod register_mlr_model(mlr_model_type)#
Add MLR model (subclass of this class) (decorator).
- reset_pipeline()#
Reset regressor pipeline.
- rfecv(**kwargs)#
Perform recursive feature elimination using cross-validation.
Note
This only works for final estimators that provide information about feature importance either through a coef_ attribute or through a feature_importances_ attribute.
- Parameters:
**kwargs (keyword arguments, optional) – Additional options for sklearn.feature_selection.RFECV.
- Raises:
RuntimeError – Final estimator does not provide coef_ or feature_importances_ attribute.
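A minimal sketch of recursive feature elimination with sklearn.feature_selection.RFECV, using an estimator that exposes coef_. The LinearRegression stand-in and the synthetic data are hypothetical. How the MLR model wires RFECV into its own pipeline is not shown on this page.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: feature 0 dominates, the rest are noise.
rng = np.random.default_rng(0)
x_data = rng.normal(size=(120, 5))
y_data = 4.0 * x_data[:, 0] + 0.1 * rng.normal(size=120)

# LinearRegression provides coef_, which RFECV uses to rank features.
selector = RFECV(LinearRegression(), step=1, cv=5)
selector.fit(x_data, y_data)

print(f"Number of selected features: {selector.n_features_}")
print(f"Selected feature mask: {selector.support_}")
```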
- test_normality_of_residuals()#
Perform Shapiro-Wilk test for normality of residuals.
- Raises:
sklearn.exceptions.NotFittedError – MLR model is not fitted.
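The Shapiro-Wilk test itself is available as scipy.stats.shapiro. The sketch below applies it to the residuals of a simple fitted regressor. Whether this method calls SciPy internally is an assumption. The LinearRegression stand-in and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data with Gaussian noise.
rng = np.random.default_rng(0)
x_data = rng.normal(size=(150, 2))
y_data = x_data[:, 0] + 2.0 * x_data[:, 1] + 0.3 * rng.normal(size=150)

model = LinearRegression().fit(x_data, y_data)
residuals = y_data - model.predict(x_data)

# Null hypothesis: the residuals are drawn from a normal distribution.
statistic, p_value = stats.shapiro(residuals)
print(f"Shapiro-Wilk statistic: {statistic}, p-value: {p_value}")
```

A small p-value suggests that the residuals deviate from normality. A large one means only that the test found no evidence against it.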
- update_parameters(**params)#
Update parameters of the whole pipeline.
Note
Parameter names have to be given for each step of the pipeline separated by two underscores, i.e. s__p is the parameter p for step s.
- Parameters:
**params (keyword arguments, optional) – Parameters for the pipeline which should be updated.
- Raises:
ValueError – Invalid parameter for pipeline given.
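The same naming rule applies to set_params on a plain scikit-learn Pipeline, which also raises ValueError for unknown parameters. The sketch below illustrates that behaviour on a standalone pipeline, not on the MLR model. The step names scaler and regressor are hypothetical.

```python
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical two-step pipeline; step names are illustrative only.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", Lasso(alpha=1.0)),
])

# Update parameter "alpha" of step "regressor".
pipeline.set_params(regressor__alpha=0.1)
new_alpha = pipeline.get_params()["regressor__alpha"]
print(f"Updated alpha: {new_alpha}")

# An unknown parameter name is rejected with a ValueError.
try:
    pipeline.set_params(regressor__not_a_parameter=1)
    invalid_rejected = False
except ValueError as exc:
    invalid_rejected = True
    print(f"Rejected: {exc}")
```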