The cascade.cpm_model module¶
Module defining the causal model.
The cpm_model module defines the solver and other functionality for the regression model used in the causal pixel model.
- ols(design_matrix, data, covariance=None)[source]¶
Ordinary least squares.
- Parameters:
design_matrix ('numpy.ndarray') – The design or regression matrix used in the regression modeling
data ('numpy.ndarray') – Vector of data points to be modeled.
weights ('numpy.ndarray', optional) – Weights used in the regression. Typically the inverse of the covariance matrix. The default is None.
- Returns:
fit_parameters (‘numpy.ndarray’) – Linear regression parameters.
err_fit_parameters (‘numpy.ndarray’) – Error estimate on the regression parameters.
sigma_hat_sqr (‘float’) – Mean squared error.
Notes
This routine solves the linear equation
\[A x = y\]
by finding the optimal solution \(\hat{x}\) that minimizes
\[|| y - A \hat{x} ||^2\]
References
Examples
>>> import numpy as np
>>> from cascade.cpm_model import ols
>>> A = np.array([[1, 0, -1], [0, 1, 0], [1, 0, 1], [1, 1, 0], [-1, 1, 0]])
>>> coef = np.array([4, 2, 7])
>>> b = np.dot(A, coef)
>>> b = b + np.random.normal(0.0, 0.01, size=b.size)
>>> results = ols(A, b)
>>> print(results)
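To illustrate the three quantities returned by ols, the normal-equation solution can be written directly in NumPy. This is a minimal sketch and not the CASCADe implementation: the name ols_sketch is hypothetical, the weights are assumed to be a full (n x n) matrix, and the error estimate assumes the textbook formula based on the diagonal of the inverse normal matrix.

```python
import numpy as np


def ols_sketch(design_matrix, data, weights=None):
    """Weighted least squares via the normal equations (illustrative only)."""
    A = np.asarray(design_matrix, dtype=float)
    y = np.asarray(data, dtype=float)
    if weights is None:
        # Assumption: without weights every data point counts equally.
        weights = np.identity(y.size)
    AtW = A.T @ weights
    normal_matrix = AtW @ A
    fit_parameters = np.linalg.solve(normal_matrix, AtW @ y)
    residual = y - A @ fit_parameters
    # Mean squared error, normalised by the degrees of freedom n - k.
    sigma_hat_sqr = (residual @ weights @ residual) / (y.size - A.shape[1])
    err_fit_parameters = np.sqrt(
        sigma_hat_sqr * np.diag(np.linalg.inv(normal_matrix)))
    return fit_parameters, err_fit_parameters, sigma_hat_sqr


A = np.array([[1, 0, -1], [0, 1, 0], [1, 0, 1], [1, 1, 0], [-1, 1, 0]])
coef = np.array([4, 2, 7])
b = A @ coef  # noise-free data, so the fit should recover coef
fit, err, mse = ols_sketch(A, b)
```

With noise-free data the fitted parameters match coef and the mean squared error is numerically zero; adding noise to b makes the error estimates non-trivial.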
- check_causality()[source]¶
Check if all data has a causal connection.
- Returns:
causal_mask (ndarray of ‘bool’) – Mask of data which has good causal connection with other data.
- select_regressors(selection_mask, exclusion_distance)[source]¶
Return a list with the indices of the regressors for each wavelength data point.
- Parameters:
selection_mask ('ndarray' of 'bool') – Mask selecting all data for which a regressor matrix has to be constructed.
exclusion_distance ('int') – Minimum distance to the data point within which no data is selected to be used as regressor.
- Returns:
regressor_list (‘list’) – List of index pairs: the data index and the indices of the data used as regressors for the specified data point.
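The role of the exclusion distance can be shown with a one-dimensional toy version. This is a sketch under stated assumptions, not the library code: the name select_regressors_sketch is hypothetical, True in the mask is assumed to mark usable data (the library may use the opposite convention), and a point counts as a regressor when it is at least exclusion_distance away.

```python
import numpy as np


def select_regressors_sketch(selection_mask, exclusion_distance):
    """Pair each selected index with the selected indices far enough away."""
    # Assumption: True marks data that may be modeled and used as regressor.
    selected = np.flatnonzero(np.asarray(selection_mask, dtype=bool))
    regressor_list = []
    for index in selected:
        # Keep only points at least `exclusion_distance` away from `index`.
        far_enough = np.abs(selected - index) >= exclusion_distance
        regressor_list.append((int(index), selected[far_enough]))
    return regressor_list


mask = np.array([True, True, False, True, True, True])
regressors = select_regressors_sketch(mask, exclusion_distance=2)
```

Each entry pairs a data index with the indices that may serve as its regressors, so neighbouring points that are likely to share the same signal are kept out of the design matrix.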
- return_design_matrix(data, selection_list)[source]¶
Return the design matrix based on the data set itself.
- Parameters:
data ('ndarray') – Input timeseries data.
selection_list ('tuple') – Tuple containing the indices of the data used as regressors for a given wavelength (index).
- Returns:
design_matrix (‘ndarray’) – Design matrix.
- log_likelihood(data, covariance, model)[source]¶
Calculate the log likelihood.
- Parameters:
data ('ndarray') – Data array to be modeled
covariance ('ndarray') – The covariance of the data.
model ('ndarray') – Regression model of the data.
- Returns:
lnL (‘float’) – Log likelihood.
Notes
For the determinant term in the log likelihood calculation use:
2*np.sum(np.log(np.diag(np.linalg.cholesky(covariance))))
np.dot(np.dot((data-model), np.diag(weights)), (data-model))
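The Cholesky expression in the note above can be combined with the residual term into a Gaussian log likelihood. The following is a minimal sketch assuming a multivariate normal likelihood that includes the constant 2*pi term; whether the library keeps that constant is not stated here, and the name log_likelihood_sketch is hypothetical.

```python
import numpy as np


def log_likelihood_sketch(data, covariance, model):
    """Gaussian log likelihood using a Cholesky factor of the covariance."""
    residual = np.asarray(data, dtype=float) - np.asarray(model, dtype=float)
    cholesky_factor = np.linalg.cholesky(covariance)
    # log(det(covariance)) from the diagonal of the Cholesky factor.
    log_det = 2.0 * np.sum(np.log(np.diag(cholesky_factor)))
    # Solve L z = r so that z.z equals r^T covariance^{-1} r.
    z = np.linalg.solve(cholesky_factor, residual)
    chi_squared = np.dot(z, z)
    n_data = residual.size
    # Assumption: the constant n*log(2*pi) term is included.
    return -0.5 * (chi_squared + log_det + n_data * np.log(2.0 * np.pi))


data = np.array([1.0, 2.0, 3.0])
model = np.array([1.1, 1.9, 3.2])
covariance = np.diag([0.04, 0.04, 0.09])
lnL = log_likelihood_sketch(data, covariance, model)
```

Working with the Cholesky factor avoids forming the determinant directly, which can overflow or underflow for large covariance matrices.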
- modified_AIC(lnL, n_data, n_parameters)[source]¶
Calculate the modified AIC.
- Parameters:
lnL ('float') – Log likelihood.
n_data ('int') – Number of data points
n_parameters ('int') – Number of free model parameters.
- Returns:
AICc (‘float’) – Modified Akaike information criterion.
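As an illustration, the widely used small-sample corrected AIC can be computed as follows. This sketch assumes that the "modified AIC" is this standard AICc; the exact expression used by modified_AIC is not given in this page, and the name aicc_sketch is hypothetical.

```python
def aicc_sketch(lnL, n_data, n_parameters):
    """Standard small-sample corrected Akaike information criterion."""
    aic = 2.0 * n_parameters - 2.0 * lnL
    # Correction term that vanishes when n_data is much larger
    # than n_parameters.
    correction = (2.0 * n_parameters * (n_parameters + 1)
                  / (n_data - n_parameters - 1))
    return aic + correction


aicc_value = aicc_sketch(lnL=-10.0, n_data=100, n_parameters=3)
```

Lower values indicate a better trade-off between goodness of fit and the number of free parameters.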
- create_regularization_matrix(method, n_regressors, n_not_regularized)[source]¶
Create regularization matrix.
Two options are implemented: The first one, ‘value’, returns a penalty matrix for classical ridge regression. The second option, ‘derivative’, is consistent with the fused ridge penalty (as introduced by Goeman, 2008).
- Parameters:
method ('string') – Method used to calculate the regularization matrix. Allowed values are ‘value’ or ‘derivative’.
n_regressors ('int') – Number of regressors.
n_not_regularized ('int') – Number of regressors which should not have a regularization term.
- Raises:
ValueError – Raised if the method input parameter has an invalid value.
- Returns:
delta (‘ndarray’) – Regularization matrix.
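The difference between the two penalty types can be sketched as below. This is an illustration under assumptions, not the library implementation: the name regularization_matrix_sketch is hypothetical, the unpenalized regressors are assumed to come first, and the matrix is built as a quadratic-form penalty P so that the penalty equals x^T P x.

```python
import numpy as np


def regularization_matrix_sketch(method, n_regressors, n_not_regularized):
    """Penalty matrix P such that the penalty term is x^T P x."""
    if method not in ('value', 'derivative'):
        raise ValueError("method must be 'value' or 'derivative'")
    n_penalized = n_regressors - n_not_regularized
    delta = np.zeros((n_regressors, n_regressors))
    if method == 'value':
        # Classical ridge: identity on the penalized block.
        block = np.identity(n_penalized)
    else:
        # Fused ridge: squared first differences of neighbouring parameters.
        first_difference = np.diff(np.identity(n_penalized), axis=0)
        block = first_difference.T @ first_difference
    # Assumption: the first n_not_regularized regressors are unpenalized.
    delta[n_not_regularized:, n_not_regularized:] = block
    return delta


delta_value = regularization_matrix_sketch('value', 4, 1)
delta_derivative = regularization_matrix_sketch('derivative', 4, 1)
```

The ‘value’ penalty shrinks each coefficient towards zero, whereas the ‘derivative’ penalty only penalizes differences between neighbouring coefficients, so a constant coefficient vector is not penalized.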
- return_lambda_grid(lambda_min, lambda_max, n_lambda)[source]¶
Create grid for regularization parameters lambda.
- Parameters:
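lambda_min ('float') – Minimum value of the regularization parameter lambda.
lambda_max ('float') – Maximum value of the regularization parameter lambda.
n_lambda ('int') – Number of grid points.
- Returns:
lambda_grid ('ndarray') – Grid of values for the regularization parameter lambda.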
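A grid of regularization strengths is often spaced logarithmically, because useful values can span many orders of magnitude. The sketch below assumes such a logarithmic spacing; the spacing actually used by return_lambda_grid is not documented here, and the name lambda_grid_sketch is hypothetical.

```python
import numpy as np


def lambda_grid_sketch(lambda_min, lambda_max, n_lambda):
    """Logarithmically spaced grid between lambda_min and lambda_max."""
    # Assumption: log spacing; both end points are included in the grid.
    return np.logspace(np.log10(lambda_min), np.log10(lambda_max), n_lambda)


lambda_grid = lambda_grid_sketch(1.0e-3, 1.0e3, 7)
```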
- class regressionDataServer(dataset, regressor_dataset)[source]¶
Bases: object
Class which provides all input data needed for the regression modeling.
This class loads the data and the cleaned data to define, for each wavelength, the timeseries data at that wavelength which will be analysed and the regressors which will be used in the analysis.
- sync_with_parameter_server(parameter_server_handle)[source]¶
Sync data server with the parameter server.
- Parameters:
parameter_server_handle ('regressionParameterServer') – instance of the regressionParameterServer class.
- Returns:
None.
- get_data_info(*, _ray_trace_ctx=None)[source]¶
Get the relevant information of the observations.
- Returns:
ndim (‘int’) – Dimension of the dataset.
shape (‘tuple’) – Shape of the dataset.
ROI (‘ndarray’) – Region of interest.
data_unit (‘astropy unit’) – Physical unit of the data.
wavelength_unit (‘astropy unit’) – Physical unit of the wavelength.
time_unit (‘astropy unit’) – Unit of the time.
time_bjd_zero (‘float’) – Time in BJD of first integration
data_product (‘string’) – Data product.
- initialze_lightcurve_model(*, _ray_trace_ctx=None)[source]¶
Initialize the lightcurve model.
- Returns:
None.
- get_lightcurve_model(*, _ray_trace_ctx=None)[source]¶
Get the lightcurve model.
- Returns:
‘tuple’ – Tuple containing the lightcurve model, the limb-darkening correction, the dilution correction, the lightcurve model parameters and the mid-transit time.
- unpack_datasets(*, _ray_trace_ctx=None)[source]¶
Unpack all datasets into masked arrays.
- Returns:
None.
- unpack_regressor_dataset(*, _ray_trace_ctx=None)[source]¶
Unpack dataset containing data to be used as regressors.
- Returns:
None.
- unpack_fit_dataset(*, _ray_trace_ctx=None)[source]¶
Unpack dataset containing data to be fitted.
- Returns:
None.
- static select_regressors(data, selection, bootstrap_indici=None)[source]¶
Return the design matrix for a given selection.
This function selects the data to be used as regressor. To be used in combination with the select_data function.
- Parameters:
data ('ndarray') – Spectroscopic data.
selection ('tuple') – Tuple containing the indices of the data to be used as regressors for each wavelength (index).
bootstrap_indici ('ndarray' of 'int', optional) – The time indices indicating which data are to be used for a bootstrap sampling. The default is None.
- Returns:
design_matrix (‘ndarray’) – The design matrix used in the regression analysis.
- static select_data(data, selection, bootstrap_indici=None)[source]¶
Return the data for a given selection.
This function selects the data to be used in the regression analysis. To be used in combination with the select_regressors function.
- Parameters:
data ('ndarray') – Spectroscopic data.
selection ('tuple') – Tuple containing the indices of the data to be used as regressors for each wavelength (index).
bootstrap_indici ('ndarray' of 'int', optional) – The time indices indicating which data are to be used for a bootstrap sampling. The default is None.
- Returns:
design_matrix (‘ndarray’) – The selected data to be modeled.
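How the two static methods might work together can be sketched with plain NumPy indexing. The names and layout here are assumptions for illustration only: data is taken to have shape (n_wavelength, n_time), a selection is taken to be a pair (target index, regressor indices), and the bootstrap indices are applied along the time axis.

```python
import numpy as np


def design_matrix_for_selection(data, selection, bootstrap_indici=None):
    """Design matrix with one column per regressor wavelength."""
    _, regressor_indices = selection
    if bootstrap_indici is None:
        bootstrap_indici = np.arange(data.shape[1])
    # Pick regressor rows, resample in time, put time along the rows.
    return data[regressor_indices, :][:, bootstrap_indici].T


def data_for_selection(data, selection, bootstrap_indici=None):
    """Time series at the wavelength that is to be modeled."""
    target_index, _ = selection
    if bootstrap_indici is None:
        bootstrap_indici = np.arange(data.shape[1])
    return data[target_index, bootstrap_indici]


data = np.arange(20.0).reshape(4, 5)     # 4 wavelengths, 5 time steps
selection = (0, np.array([2, 3]))        # model row 0 using rows 2 and 3
bootstrap = np.array([0, 0, 2, 4, 4])    # time indices drawn with replacement
design_matrix = design_matrix_for_selection(data, selection, bootstrap)
y = data_for_selection(data, selection, bootstrap)
```

Applying the same bootstrap indices to both the data and the design matrix keeps the rows of the regression problem aligned in time.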
- setup_regression_data(selection, bootstrap_indici=None, *, _ray_trace_ctx=None)[source]¶
Set up the data which will be fitted.
- Parameters:
selection ('tuple') – Tuple containing the indices of the data to be used as regressors for each wavelength (index).
bootstrap_indici ('ndarray' of 'int', optional) – The time indices indicating which data are to be used for a bootstrap sampling. The default is None.
- Returns:
None.
- setup_regression_matrix(selection, bootstrap_indici=None, *, _ray_trace_ctx=None)[source]¶
Define the regression matrix.
- Parameters:
selection ('tuple') – Tuple containing the indices of the data to be used as regressors for each wavelength (index).
bootstrap_indici ('ndarray' of 'int', optional) – The time indices indicating which data are to be used for a bootstrap sampling. The default is None.
- Returns:
None.
- get_regression_data(selection, bootstrap_indici=None, return_data_only=False, *, _ray_trace_ctx=None)[source]¶
Get all relevant data.
- Parameters:
selection ('tuple') – Tuple containing the indices of the data to be used as regressors for each wavelength (index).
bootstrap_indici ('ndarray' of 'int', optional) – The time indices indicating which data are to be used for a bootstrap sampling. The default is None.
return_data_only ('bool', optional) – If set, the design matrix is not determined and returned as None.
- Returns:
‘ndarray’ – Data to be modeled.
‘ndarray’ – Design matrix for the regression analysis of the data.
- get_all_regression_data(selection_list, bootstrap_indici=None, return_data_only=False, *, _ray_trace_ctx=None)[source]¶
Get all relevant data for a selection list for a single bootstrap step.
- Parameters:
selection_list ('list') – List of tuples containing the indices of the data to be used as regressors for each wavelength (index).
bootstrap_indici ('ndarray' of 'int', optional) – The time indices indicating which data are to be used for a bootstrap sampling. The default is None.
return_data_only ('bool', optional) – If set, the design matrix is not determined and returned as None.
- Returns:
‘ndarray’ – Data to be modeled.
‘ndarray’ – Design matrix for the regression analysis of the data.
- get_regression_data_chunk(iterator_chunk, *, _ray_trace_ctx=None)[source]¶
Get all relevant data for a chunk of the regression iteration.
- Parameters:
iterator_chunk ('list') – List of tuples, each containing the indices of the data to be used as regressors for each wavelength (index) and the bootstrap time indices indicating which data are to be used for a bootstrap sampling.
- Returns:
regression_selection_list (‘list’) – List containing the data to be modeled and the corresponding design matrix for the regression analysis of the data.
- class regressionControler(cascade_configuration, dataset, regressor_dataset, number_of_workers=1, number_of_data_servers=1)[source]¶
Bases: object
The main server for the causal regression modeling.
This class defines the controller for the regression modeling. It starts the data and parameter servers and distributes the tasks to the workers. After completion it processes all results and stores the extracted planetary spectra in spectral data format.
- instantiate_data_server(dataset, regressor_dataset)[source]¶
Instantiate the data server.
- Parameters:
dataset ('SpectralDataTimeSeries') – The spectral timeseries dataset to be modeled.
regressor_dataset ('SpectralDataTimeSeries') – The cleaned version of the spectral timeseries dataset used for constructing the regression matrices.
- Returns:
None.
- initialize_servers()[source]¶
Initialize both the data server and the parameter server.
Note that the order of initialization is important: first the data server and then the parameter server.
- Returns:
None.
- get_fit_parameters_from_server(bootstrap_chunk=None)[source]¶
Get the regression fit parameters from the parameter server.
- Parameters:
bootstrap_chunk ('list') – Indices of a subset of all bootstrap samples.
- Returns:
fitted_parameters (‘simpleNamespace’) – This namespace contains all relevant fit parameters used in the extraction and calibration of the planetary signal.
- get_processed_parameters_from_server(bootstrap_chunk=None)[source]¶
Get the processed regression fit parameters from the parameter server.
- Returns:
fitted_parameters (‘simpleNamespace’) – This namespace contains all relevant fit parameters used in the extraction and calibration of the planetary signal.
- get_regularization_parameters_from_server()[source]¶
Get the regularization parameters from the parameter server.
- Returns:
‘simpleNamespace’ – Namespace containing all regularization variables and parameters.
- get_control_parameters()[source]¶
Get the control parameters from the parameter server.
This function returns all relevant parameters needed to determine the behaviour and settings of the regression modeling.
- Returns:
control_parameters (‘SimpleNamespace’) – This namespace contains all control parameters of the regression model.
- get_lightcurve_model()[source]¶
Get the lightcurve model.
- Returns:
‘simpleNamespace’ – Namespace containing all variables and parameters defining the lightcurve model.
- initialize_regression_iterators(nchunks=1)[source]¶
Initialize the iterators required in the regression analysis.
- Returns:
None.
- get_regression_iterators(*, _ray_trace_ctx=None)[source]¶
Get all iterators used in the regression analysis.
- Returns:
‘simpleNamespace’ – Namespace containing all iterators (data indices, bootstrap indices) for the regression analysis.
- static grouper_it(it, nchunks, number_of_iterators)[source]¶
Split iterator into chunks.
- Parameters:
it ('itertools.product') – Iterator to be split into chunks.
nchunks ('int') – Number of chunks.
number_of_iterators ('int') – Number of iterators.
- Yields:
chunk_it (‘list’) – Chunk of the input iterator.
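Splitting an iterator into chunks without materialising it first can be done with itertools.islice. The sketch below is one possible approach and not necessarily how grouper_it works: the name grouper_sketch is hypothetical and the chunk size is assumed to be the total number of items divided by nchunks, rounded up.

```python
import itertools
import math


def grouper_sketch(it, nchunks, number_of_iterators):
    """Yield successive lists of items taken from the iterator `it`."""
    # Assumption: all chunks have the same size except possibly the last.
    chunk_size = math.ceil(number_of_iterators / nchunks)
    iterator = iter(it)
    while True:
        chunk = list(itertools.islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


pairs = itertools.product(range(3), range(2))  # 6 (data, bootstrap) pairs
chunks = list(grouper_sketch(pairs, nchunks=2, number_of_iterators=6))
```

Each chunk can then be handed to a separate worker, which is the purpose of the chunking in the regression controller.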
- chunk_iterators(nchunks=1, *, _ray_trace_ctx=None)[source]¶
Split iterators into chunks.
- Parameters:
nchunks ('int', optional) – Number of chunks in which to split the iterators. The default is 1.
- Returns:
None.
- add_fit_parameters_to_parameter_server(new_parameters)[source]¶
Add the fitted regression parameters to the parameter server.
- Parameters:
new_parameters ('simpleNamespace') – Updated fit parameters.
- Returns:
None.
- run_regression_model()[source]¶
Run the regression model.
This method runs the regression method for the instrument systematics and the transit depth determination.
- Returns:
None.
- ridge(input_regression_matrix, input_data, input_covariance, input_delta, input_alpha)[source]¶
Ridge regression.
- Parameters:
input_regression_matrix ('numpy.ndarray') – The design or regression matrix used in the regularized least squares fit.
input_data ('numpy.ndarray') – Vector of data to be fit.
input_covariance ('numpy.ndarray') – Covariance matrix used as weight in the least squares fit.
input_delta ('numpy.ndarray') – Regularization matrix. For ridge regression this is the unity matrix.
input_alpha ('float' or 'numpy.ndarray') – Regularization strength.
- Returns:
beta (‘numpy.ndarray’) – Fitted regression parameters.
rss (‘float’) – Sum of squared residuals.
mse (‘float’) – Mean square error
degrees_of_freedom (‘float’) – The effective degrees of freedom of the fit.
model_unscaled (‘numpy.ndarray’) – The fitted regression model.
optimal_regularization (‘numpy.ndarray’) – The optimal regularization strength determined by generalized cross validation.
aicc (‘float’) – Corrected Akaike information criterion.
Notes
This routine solves the linear equation
\[A x = y\]
by finding the optimal solution \(\hat{x}\) that minimizes
\[|| y - A \hat{x} ||^2 + \lambda || \hat{x} ||^2\]
References
[5] PhD thesis by Diana Maria Sima, “Regularization techniques in Model Fitting and Parameter estimation”, KU Leuven 2006
[6] Hogg et al. 2010, “Data analysis recipes: Fitting a model to data”
[7] Rust & O’Leary, “Residual periodograms for choosing regularization parameters for ill-posed problems”
[8] Krakauer et al., “Using generalized cross-validation to select parameters in inversions for regional carbon fluxes”
Examples
>>> import numpy as np
>>> from cascade.cpm_model import solve_linear_equation
>>> A = np.array([[1, 0, -1], [0, 1, 0], [1, 0, 1], [1, 1, 0], [-1, 1, 0]])
>>> coef = np.array([4, 2, 7])
>>> b = np.dot(A, coef)
>>> b = b + np.random.normal(0.0, 0.01, size=b.size)
>>> results = solve_linear_equation(A, b)
>>> print(results)
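The penalized minimization described in the notes has a closed-form solution, which the following NumPy sketch illustrates. It is not the CASCADe implementation: the name ridge_sketch is hypothetical, the regularization matrix is assumed to enter as a quadratic penalty alpha * x^T delta x, and the effective degrees of freedom are assumed to be the trace of the hat matrix. The generalized cross validation step is left out.

```python
import numpy as np


def ridge_sketch(A, y, covariance, delta, alpha):
    """Closed-form generalized ridge solution (illustrative only)."""
    weights = np.linalg.inv(covariance)
    AtW = A.T @ weights
    # Penalized normal equations: (A^T W A + alpha * delta) beta = A^T W y.
    lhs = AtW @ A + alpha * delta
    beta = np.linalg.solve(lhs, AtW @ y)
    # Effective degrees of freedom as the trace of the hat matrix.
    degrees_of_freedom = np.trace(A @ np.linalg.solve(lhs, AtW))
    residual = y - A @ beta
    rss = residual @ weights @ residual
    return beta, rss, degrees_of_freedom


A = np.array([[1, 0, -1], [0, 1, 0], [1, 0, 1], [1, 1, 0], [-1, 1, 0]],
             dtype=float)
coef = np.array([4.0, 2.0, 7.0])
y = A @ coef  # noise-free data
covariance = np.identity(y.size)
delta = np.identity(coef.size)  # unity matrix: classical ridge
beta_0, rss_0, dof_0 = ridge_sketch(A, y, covariance, delta, alpha=0.0)
beta_10, rss_10, dof_10 = ridge_sketch(A, y, covariance, delta, alpha=10.0)
```

With alpha set to zero the solution reduces to ordinary least squares; increasing alpha shrinks the coefficients and lowers the effective degrees of freedom.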
- make_bootstrap_samples(ndata, nsamples)[source]¶
Make bootstrap sample indices.
- Parameters:
ndata ('int') – Number of data points.
nsamples ('int') – Number of bootstrap samples.
- Returns:
bootstrap_indici (‘ndarray’ of ‘int’) – (nsamples+1 x ndata) array containing the permuted indices of the data array. The first row is the unsampled list of indices.
non_common_indici (‘list’) – For each bootstrap sampling, list of indices not sampled.
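The shape of the two return values can be illustrated with a short sketch. It rests on assumptions that this page does not confirm: indices are drawn with replacement, each sample is sorted to keep the time order, and the seed argument is a hypothetical addition for reproducibility. The name bootstrap_samples_sketch is also hypothetical.

```python
import numpy as np


def bootstrap_samples_sketch(ndata, nsamples, seed=0):
    """Return (nsamples+1, ndata) index array and the left-out indices."""
    rng = np.random.default_rng(seed)
    all_indices = np.arange(ndata)
    bootstrap_indices = np.empty((nsamples + 1, ndata), dtype=int)
    bootstrap_indices[0] = all_indices  # first row: the unsampled indices
    non_common_indices = [np.array([], dtype=int)]
    for i in range(1, nsamples + 1):
        # Assumption: draw with replacement and sort to keep time order.
        sample = np.sort(rng.choice(ndata, size=ndata, replace=True))
        bootstrap_indices[i] = sample
        non_common_indices.append(np.setdiff1d(all_indices, sample))
    return bootstrap_indices, non_common_indices


samples, left_out = bootstrap_samples_sketch(ndata=10, nsamples=4)
```

The left-out indices of each sample are the data points that were never drawn, which makes them available for out-of-sample checks.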
- class regressionParameterServer(cascade_configuration)[source]¶
Bases: object
Class which provides the parameter server for the regression modeling.
This class contains all parameters needed for the regression analysis and the fitted results.
- initialize_regression_configuration(*, _ray_trace_ctx=None)[source]¶
Initialize all regression control parameters.
- Returns:
None.
- get_regression_parameters(*, _ray_trace_ctx=None)[source]¶
Get all parameters controlling the regression analysis.
- Returns:
‘simpleNameSpace’ – Namespace holding all parameters controlling the regression analysis.
- get_configuration(*, _ray_trace_ctx=None)[source]¶
Get the CASCADe configuration.
- Returns:
‘cascade.initialize.cascade_configuration’ – Singleton containing the cascade configuration.
- sync_with_data_server(data_server_handle)[source]¶
Sync the parameter server with the data server.
- Returns:
None.
- get_data_parameters(*, _ray_trace_ctx=None)[source]¶
Get all parameters characterizing the data.
- Returns:
‘simpleNameSpace’ – Namespace holding all relevant parameters describing the dataset.
- initialize_regularization(*, _ray_trace_ctx=None)[source]¶
Initialize the regularization parameter test grid and results array.
- Returns:
None.
- get_regularization(*, _ray_trace_ctx=None)[source]¶
Get the regularization parameters.
- Returns:
‘simpleNameSpace’ – Namespace holding all relevant parameters for the regularization.
- update_optimal_regulatization(new_regularization, *, _ray_trace_ctx=None)[source]¶
Update the fitted optimal regularization strength.
- Parameters:
new_regularization ('simpleNamespace') – New namespace holding the updated optimal regularization.
- Returns:
None.
- initialize_parameters(*, _ray_trace_ctx=None)[source]¶
Initialize the arrays holding the fit results.
- Returns:
None.
- update_fitted_parameters(new_parameters, data_chunk, *, _ray_trace_ctx=None)[source]¶
Apply a new update and return the weights.
- update_processed_parameters(new_parameters, bootstrap_chunk=None, *, _ray_trace_ctx=None)[source]¶
Update the processed parameters.
- Parameters:
new_parameters (TYPE) – DESCRIPTION.
bootstrap_chunk (TYPE, optional) – DESCRIPTION. The default is None.
- Returns:
None.
- get_fitted_parameters(bootstrap_chunk=None, *, _ray_trace_ctx=None)[source]¶
Return the fitted parameters.
- Parameters:
bootstrap_chunk ('list') – List of indices of a subset of bootstrap samples.
- Returns:
‘simpleNamespace’ – Returns a namespace containing all fitted parameters.
- get_processed_parameters(bootstrap_chunk=None, *, _ray_trace_ctx=None)[source]¶
Return the fitted parameters.
- Parameters:
bootstrap_chunk ('list') – List of indices of a subset of bootstrap samples.
- Returns:
‘simpleNamespace’ – Returns a namespace containing all fitted parameters.
- add_new_parameters(new_parameters, *, _ray_trace_ctx=None)[source]¶
Add additional fitted parameters.
- Parameters:
new_parameters ('dictionary') – Dictionary defining additional fit parameters of the regression model.
- Returns:
None.
- reset_parameters(*, _ray_trace_ctx=None)[source]¶
Reset all regression and regularization parameters.
- Returns:
None.
- initialize_parameter_server(data_server_handle, *, _ray_trace_ctx=None)[source]¶
Initialize the parameter server.
- Parameters:
data_server_handle ('regressionDataServer') – Instance of the regressionDataServer class.
- Returns:
None.
- reset_parameter_server(cascade_configuration, data_server_handle, *, _ray_trace_ctx=None)[source]¶
Reset the parameter server.
- Parameters:
cascade_configuration ('cascade.initialize.cascade_configuration') – Singleton containing all cascade configuration parameters.
data_server_handle ('regressionDataServer') – Instance of the regressionDataServer class.
- Returns:
None.
- class regressionWorker(initial_regularization, iterator_chunk)[source]¶
Bases: object
Regression worker class.
This class defines the workers used in the regression analysis to determine the systematics and transit model parameters.
- update_initial_parameters(updated_regularization, updated_iterator_chunk, *, _ray_trace_ctx=None)[source]¶
Update all parameters.
- Parameters:
updated_fit_parameters ('simpleNameSpace') – All parameters controlling the regression model.
updated_regularization ('simpleNameSpace') – All parameters controlling the regularization.
updated_iterator_chunk ('list') – Iterator chunk over data and bootstrap selections.
- Returns:
None.
- compute_model(regression_data, regularization_method, alpha, *, _ray_trace_ctx=None)[source]¶
Compute the regression model.
- Parameters:
regression_selection ('list') – DESCRIPTION.
bootstrap_selection ('list') – DESCRIPTION.
data_server_handle ('regressionDataServer') – DESCRIPTION.
regularization_method ('str') – DESCRIPTION.
alpha ('float' or 'ndarray') – DESCRIPTION.
- Returns:
beta_optimal (‘ndarray’) – DESCRIPTION.
rss (‘float’) – DESCRIPTION.
mse (‘float’) – DESCRIPTION.
degrees_of_freedom (‘float’) – DESCRIPTION.
model_unscaled (‘ndarray’) – DESCRIPTION.
alpha (‘float’) – DESCRIPTION.
- static get_data_chunck(data_server_handle, regression_selection, bootstrap_selection)[source]¶
Get a chunk of the data.
- Parameters:
data_server_handle ('regressionDataServer') – Instance of the regressionDataServer class.
regression_selection ('list') – List of indices defining the data to be modeled and the corresponding data to be used as regressors.
bootstrap_selection ('list') – List of indices defining the bootstrap selection.
- Returns:
regression_data_selection (‘ndarray’) – Selection of data to be modeled.
regression_matrix_selection (TYPE) – Selection of data used as regression matrix.
- static get_regression_data_chunk(data_server_handle, iterator_chunk)[source]¶
Get the regression data for a chunk of the regression iteration.
- Parameters:
data_server_handle (TYPE) – DESCRIPTION.
iterator_chunk (TYPE) – DESCRIPTION.
- Returns:
selection_list (TYPE) – DESCRIPTION.
- static get_regression_parameters(parameter_server_handle)[source]¶
Get regression control parameters from the parameter server.
- Parameters:
parameter_server_handle (regressionParameterServer) – instance of the parameter server.
- Returns:
n_additional (‘int’) – Number of additional regressors.
ncorrect (‘int’) – Number of data points at the short wavelength side cut by the region of interest compared to the full dataset. This parameter is used to make sure the parameters are stored correctly in an array with a size corresponding to the total data volume.
- update_parameters_on_server(parameter_server_handle, data_chunk)[source]¶
Update parameters on parameter server.
- Parameters:
parameter_server_handle ('regressionParameterServer') – Instance of the parameter server class.
- Returns:
None.
- async_update_loop(parameter_server_handle, data_server_handle, *, _ray_trace_ctx=None)[source]¶
Regression loop over regression and bootstrap selection.
- Parameters:
parameter_server_handle ('regressionParameterServer') – Instance of the parameter server.
data_server_handle ('regressionDataServer') – Instance of the data server.
- Returns:
None.