pylag.file_reader module

A set of classes for managing access to input data, including reading data from file.

class pylag.file_reader.DatasetReader[source]

Bases: object

Abstract base class for DatasetReaders

DatasetReaders are responsible for opening and reading single Datasets. Abstract base class introduced to assist with testing objects of type FileReader.

read_dataset(file_name, set_auto_mask_and_scale=True)[source]

Open a dataset for reading

Parameters
  • file_name (str) – The name or path of the file to open

  • set_auto_mask_and_scale (bool) – Flag controlling whether masking and scaling are applied automatically when data are read. Optional, default: True.

Returns

A dataset.

Return type

N/A
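Because the base class exists to support testing, a test double can be substituted for a reader that touches the disk. The following is a minimal sketch only: the DatasetReader defined here is a stand-in that mirrors the documented interface so that the block runs without PyLag installed, and InMemoryDatasetReader is a hypothetical name that is not part of the library.

```python
class DatasetReader:
    """Stand-in mirroring the documented interface (not PyLag's own code)."""

    def read_dataset(self, file_name, set_auto_mask_and_scale=True):
        raise NotImplementedError


class InMemoryDatasetReader(DatasetReader):
    """Hypothetical test double that serves pre-built datasets by file name."""

    def __init__(self, datasets):
        # Mapping of file name -> any object standing in for a dataset.
        self._datasets = dict(datasets)

    def read_dataset(self, file_name, set_auto_mask_and_scale=True):
        # The flag is accepted for interface compatibility but ignored here.
        return self._datasets[file_name]


reader = InMemoryDatasetReader({"data_0001.nc": {"time": [0.0, 3600.0]}})
dataset = reader.read_dataset("data_0001.nc")
```

An object of this kind could be passed wherever a DatasetReader is expected, so that no files need to exist during a test.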

class pylag.file_reader.DiskFileNameReader[source]

Bases: FileNameReader

Disk file name reader which reads in NetCDF file names from disk

Derived class for reading in file names from disk.

get_file_names(file_dir, file_name_stem)[source]

Get file names

Read file names from disk. A natural sorting algorithm is applied.

Parameters
  • file_dir (str) – Path to the input files.

  • file_name_stem (str) – Unique string identifying valid input files.

Returns

A list of file names.

Return type

list[str]
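Natural sorting orders embedded numbers by value, so that a file numbered 2 comes before a file numbered 10. The sketch below illustrates the idea with the standard library only. The helper names are hypothetical, and matching the stem by substring is an assumption, not a description of how PyLag selects files.

```python
import re


def natural_sort_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


def filter_and_sort(names, file_name_stem):
    """Keep names containing the stem (an assumed rule), then sort naturally."""
    return sorted((n for n in names if file_name_stem in n),
                  key=natural_sort_key)


names = ["ocean_10.nc", "ocean_2.nc", "grid_metrics.nc", "ocean_1.nc"]
sorted_names = filter_and_sort(names, "ocean_")
```

A plain lexicographic sort would place ocean_10.nc before ocean_2.nc, which would put the files out of time order if the numbers encode a sequence.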

class pylag.file_reader.FileNameReader[source]

Bases: object

Abstract base class for FileNameReaders

File name readers are responsible for reading in and sorting file names, which will usually be stored on disk. An abstract base class was added in order to assist with testing FileReader’s behaviour under circumstances when all dependencies on reading data from disk have been removed.

get_file_names(file_dir, file_name_stem)[source]

Get file names

Return a list of file names

Parameters
  • file_dir (str) – Path to the input files.

  • file_name_stem (str) – Unique string identifying valid input files.

Returns

A list of file names.

Return type

list[str]

class pylag.file_reader.FileReader(config, data_source, file_name_reader, dataset_reader, datetime_start, datetime_end)[source]

Bases: object

Read in and manage access to input grid and field data

Objects of type FileReader manage all access to input data stored in files on disk. Support for data stored in multiple files covering non-overlapping time intervals is included. On initialisation, the object scans the specified list of input data files to find the file or files that span the specified simulation start date/time. Two datasets are opened, one for each of the two input time points that straddle the current simulation time point. These are referred to as the first and second data files or time points respectively, with the first always corresponding to the time point that is earlier in time than the current simulation time point. Time indices for the two bounding time points are also stored.

Through calls to update_reading_frames, both the indices corresponding to the bounding time points and the input datasets can be updated as the simulation progresses. Support for running simulations either forward or backward in time is included.

Parameters
  • config (ConfigParser) – Configuration object.

  • data_source (str) – String indicating what type of data the datetime objects will be associated with. Options are: ‘ocean’, ‘atmosphere’, and ‘wave’.

  • file_name_reader (FileNameReader) – Object to assist with reading in file names.

  • dataset_reader (DatasetReader) – Object to assist with reading in datasets

  • datetime_start (Datetime) – Simulation start date/time.

  • datetime_end (Datetime) – Simulation end date/time.

Variables
  • config (ConfigParser) – Run configuration object.

  • config_section_name (str) – String identifying the section of the config where parameters describing the data are listed (e.g. WAVE_DATA).

  • file_name_reader (FileNameReader) – Object to assist with reading in file names from disk

  • dataset_reader (DatasetReader) – Object to assist with reading in NetCDF4 datasets

  • datetime_reader (DateTimeReader) – Object to assist with reading dates/times in input data.

  • data_dir (str) – Path to the directory containing input data

  • data_file_name_stem (str) – File name stem, used for building path names

  • grid_metrics_file_name (str) – File name or path to the grid metrics file

  • grid_file (Dataset) – NetCDF4 grid metrics dataset

  • data_file_names (list[str]) – A list of input data files that were found in data_dir

  • first_data_file_name (str) – Name of the data file containing the first time point bounding the current point in time.

  • second_data_file_name (str) – Name of data file containing the second time point bounding the current point in time.

  • first_data_file (Dataset) – Dataset containing the first time point bounding the current point in time.

  • second_data_file (Dataset) – Dataset containing the second time point bounding the current point in time.

  • time_direction (int) – Flag indicating the direction of integration. 1 forward, -1 backward.

  • first_time (array_like[float]) – Time array containing the first time point bounding the current point in time.

  • second_time (array_like[float]) – Time array containing the second time point bounding the current point in time.

  • tidx_first (int) – Array index corresponding to the first time point bounding the current point in time.

  • tidx_second (int) – Array index corresponding to the second time point bounding the current point in time.

  • sim_start_datatime (Datetime) – The current simulation start date/time. This is not necessarily fixed for the lifetime of the object - it can be updated through calls to setup_data_access. This helps support the running of ensemble simulations.

  • sim_end_datatime (Datetime) – The current simulation end date/time. This is not necessarily fixed for the lifetime of the object - it can be updated through calls to setup_data_access. This helps support the running of ensemble simulations.
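The relationship between a time array and the two bounding indices (tidx_first and tidx_second) can be sketched as follows. This is an illustration under stated assumptions, not PyLag's implementation: the function name is hypothetical, the time array is assumed to be sorted in ascending order and to lie within a single file, and the boundary convention (first time less than or equal to the current time, second time strictly greater) is chosen for the example.

```python
from bisect import bisect_right


def bounding_time_indices(times, current_time):
    """Return (tidx_first, tidx_second) such that
    times[tidx_first] <= current_time < times[tidx_second]."""
    if not times[0] <= current_time < times[-1]:
        raise ValueError("current_time lies outside the time array")
    # Index of the first entry strictly greater than current_time.
    tidx_second = bisect_right(times, current_time)
    return tidx_second - 1, tidx_second


times = [0.0, 3600.0, 7200.0, 10800.0]
tidx_first, tidx_second = bounding_time_indices(times, 4000.0)
```

Here a current time of 4000 s falls between the entries at indices 1 and 2. When the two bounding time points sit in different files, the indices would refer to separate time arrays, which is why first_time and second_time are stored separately.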

compute_time_delta_between_datasets(data_file_name, forward)[source]

Compute time delta between datasets

If there is only one dataset, or the last data file is given, a value of zero is returned. Otherwise, time delta is the time difference in seconds between the last (first) time point in the data file and the first (last) time point in the next (previous) data file, as stored in self.data_file_names. The forward argument determines whether time delta is computed with respect to the next or the previous file.

Parameters
  • data_file_name (str) – Dataset file name.

  • forward (bool) – If True, compute time delta between the last time point in the current file and the first time point in the next file. If False, compute time delta between the first time point in the current file and the last time point in the previous file.

Returns

time_delta – The absolute time difference in seconds.

Return type

float
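The calculation described above can be sketched with plain Python containers. The function below is a hypothetical stand-in, not the method itself: file_times is an assumed mapping from file name to a sorted list of times in seconds, and returning zero whenever no neighbouring file exists in the requested direction (including the first file when forward is False) is an assumption made for the example.

```python
def time_delta_between_datasets(file_names, file_times, data_file_name, forward):
    """Absolute gap in seconds between a file and its neighbour.

    file_names: ordered list of file names.
    file_times: hypothetical mapping of file name -> sorted times in seconds.
    """
    idx = file_names.index(data_file_name)
    neighbour = idx + 1 if forward else idx - 1
    # No neighbour in the requested direction (covers the single-file case).
    if neighbour < 0 or neighbour >= len(file_names):
        return 0.0
    if forward:
        t_here = file_times[data_file_name][-1]         # last point, this file
        t_there = file_times[file_names[neighbour]][0]  # first point, next file
    else:
        t_here = file_times[data_file_name][0]           # first point, this file
        t_there = file_times[file_names[neighbour]][-1]  # last point, previous file
    return abs(t_there - t_here)


file_names = ["a.nc", "b.nc"]
file_times = {"a.nc": [0.0, 3600.0], "b.nc": [7200.0, 10800.0]}
delta_forward = time_delta_between_datasets(file_names, file_times, "a.nc", True)
delta_backward = time_delta_between_datasets(file_names, file_times, "b.nc", False)
delta_last = time_delta_between_datasets(file_names, file_times, "b.nc", True)
```

In this example the gap between the end of a.nc and the start of b.nc is one hour in both directions, and asking for the forward delta from the last file gives zero.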

get_dimension_variable(var_name)[source]

Get the size of the NetCDF4 dimension variable

Parameters

var_name (str) – The name of the dimension variable.

Returns

The size of the dimension variable.

Return type

int

get_grid_variable(var_name)[source]

Get the NetCDF4 grid variable

Parameters

var_name (str) – The name of the variable.

Returns

The grid variable.

Return type

NDArray

get_grid_variable_dimensions(var_name)[source]

Get the variable dimensions

Parameters

var_name (str) – The name of the variable.

Returns

The variable’s dimensions

Return type

tuple(str)

get_mask_at_last_time_index(var_name)[source]

Get the mask at the last time index

Parameters

var_name (str) – The name of the variable.

Returns

The variable mask

Return type

NDArray

get_mask_at_next_time_index(var_name)[source]

Get the mask at the next time index

Parameters

var_name (str) – The name of the variable.

Returns

The variable mask

Return type

NDArray

get_time_at_last_time_index()[source]

Get the time at the last time index

Returns

The time at the last time index.

Return type

float

get_time_at_next_time_index()[source]

Get the time at the next time index

Returns

The time at the next time index.

Return type

float

get_time_dependent_variable_at_last_time_index(var_name)[source]

Get the variable at the last time index

Parameters

var_name (str) – The name of the variable.

Returns

The variable array

Return type

NDArray

get_time_dependent_variable_at_next_time_index(var_name)[source]

Get the variable at the next time index

Parameters

var_name (str) – The name of the variable.

Returns

The variable array

Return type

NDArray

get_variable_dimensions(var_name, include_time=True)[source]

Get the variable dimensions

Parameters
  • var_name (str) – The name of the variable.

  • include_time (bool) – If False, the time dimension is not included in the dimensions. Optional, default: True.

Returns

The variable’s dimensions

Return type

tuple(str)

get_variable_shape(var_name, include_time=True)[source]

Get the variable shape

Parameters
  • var_name (str) – The name of the variable.

  • include_time (bool) – If False, the time dimension is not included in the shape. Optional, default: True.

Returns

The variable’s shape

Return type

tuple(int)
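The effect of include_time=False on both the dimensions and the shape can be illustrated with a small helper. This is a sketch only: the function is hypothetical, and it assumes the time dimension is named "time", which may differ in real input data.

```python
def drop_time(dimensions, shape, time_dim_name="time"):
    """Return (dimensions, shape) with the time dimension removed, if present."""
    kept = [(d, n) for d, n in zip(dimensions, shape) if d != time_dim_name]
    dims = tuple(d for d, _ in kept)
    sizes = tuple(n for _, n in kept)
    return dims, sizes


dims, sizes = drop_time(("time", "depth", "node"), (24, 10, 500))
grid_dims, grid_sizes = drop_time(("node",), (500,))
```

A variable without a time dimension is returned unchanged, as the second call shows.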

setup_data_access(start_datetime, end_datetime)[source]

Open data files for reading and initialise all time variables

Use the supplied start and end times to establish which input data file(s) contain data spanning the specified start time.

Parameters
  • start_datetime (Datetime) – Simulation start date/time.

  • end_datetime (Datetime) – Simulation end date/time.

update_reading_frames(time)[source]

Update input datasets and reading frames

Update input datasets and reading frames using the given time, which is the current simulation time in seconds.

Parameters

time (float) – The current simulation time in seconds.

class pylag.file_reader.NetCDFDatasetReader[source]

Bases: DatasetReader

NetCDF dataset reader

Return a NetCDF4 dataset object.

read_dataset(file_name, set_auto_maskandscale=True)[source]

Open a dataset for reading

Parameters
  • file_name (str) – The name or path of the file to open

  • set_auto_maskandscale (bool) – Flag controlling whether masking and scaling are applied automatically when data are read. Optional, default: True.

Returns

A NetCDF4 dataset.

Return type

NetCDF4 Dataset