Data Utilities
General utilities for common data file manipulations
Making Labels
Insert Realization Index
pygeostat.datautils.labels.insert_real_idx(data, num_real=0, bindex=True, real_column='Realization', bi_column='BlockIndex')
Insert realization index columns. By default, the griddef associated with the file is used.
Parameters: - num_real (int) – Number of realizations in the file; use this if no griddef is associated with the file
- bindex (bool) – True or False for adding a block index
- real_column (str) – Name of the column used for the realization index
- bi_column (str) – Name of the column used for the block index
Process
If the “real_column” or “bi_column” columns already exist, their values are overwritten. If they are not in the dataframe, they are inserted at the front.
Code author: Tyler Acorn - 2015-Sept-30
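To illustrate what the two index columns contain, here is a minimal pure-Python sketch. It is not pygeostat's implementation: it assumes the rows are stored one realization after another, that both indices start at 1, and that the number of cells per realization is the row count divided by num_real. The helper name realization_indices is hypothetical.

```python
def realization_indices(n_rows, num_real):
    """Return (realization, block index) pairs for n_rows stacked rows.

    Assumes each realization occupies a contiguous run of rows and that
    both indices are 1-based (an assumption, not confirmed by the docs).
    """
    if num_real <= 0 or n_rows % num_real != 0:
        raise ValueError("n_rows must be a multiple of a positive num_real")
    cells = n_rows // num_real  # cells per realization
    return [(i // cells + 1, i % cells + 1) for i in range(n_rows)]


# Two realizations of a three-cell grid.
pairs = realization_indices(6, 2)
```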
Make Labels
pygeostat.datautils.labels.make_labels(prefix, num, padding=0)
Returns a series of labels combining a prefix and a number with leading zeros.
Parameters: - prefix (str) – any letter(s) that you want as the prefix (for example B for blockindex)
- num (int) – The number of labels you want.
- padding (int) – if given an integer value will pad the numbers with zeros until the prefix + the numbers equal the length of the padding value
Returns: This will return a series with “n” number of labels starting from 1
Return type: Series
Note
Borrowed from http://pandas.pydata.org/pandas-docs/stable/advanced.html#advanced
Examples
Creating an array of labels
>>> label = gs.datautils.make_labels('R', 3, padding=3)
>>> label
[R001, R002, R003]
Code author: Tyler Acorn 2015-09-21
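The labelling scheme can be sketched in plain Python as follows. This is an illustration rather than the library code: it returns a list instead of a Series, and it assumes padding is the width of the zero-padded number, which is the reading that reproduces the example above. The name make_labels_sketch is hypothetical.

```python
def make_labels_sketch(prefix, num, padding=0):
    """Build labels such as 'R001' by zero-padding the running number.

    Assumes `padding` is the width of the numeric part only.
    """
    return [f"{prefix}{str(i).zfill(padding)}" for i in range(1, num + 1)]


labels = make_labels_sketch("R", 3, padding=3)
```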
Assorted Utility Functions
Check Len of Gridded Ascii File
pygeostat.datautils.utils.check_grid_file_size(gridfl, griddef, nreal=None)
Check the gridded data file to see if the number of lines in the file matches the number of cells specified in the griddef. Returns True if nreal * griddef.count() = nlines - (2 + nvar).
Relies on the GNU wc tool, which is included with Cygwin.
Parameters: - gridfl (str) – Gridded data file to check
- griddef (GridDef) – standard pygeostat griddef
- nreal (int) – optional number of realizations in the file
Returns: True if there is a match, False if there is a mismatch
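The line-count check can be sketched without the wc dependency. The version below counts lines in Python instead, and takes the cell count as a plain integer (ncell, a hypothetical stand-in for griddef.count()). It assumes the standard GSLIB layout: a title line, a line with the number of variables, one line per variable name, then one line per cell.

```python
import os
import tempfile


def grid_file_size_matches(gridfl, ncell, nreal=1):
    """Check nreal * ncell == nlines - (2 + nvar) for a GSLIB-style file."""
    with open(gridfl) as fh:
        fh.readline()                          # line 1: title
        nvar = int(fh.readline().split()[0])   # line 2: number of variables
        nlines = 2 + sum(1 for _ in fh)        # remaining lines plus the two read
    return nreal * ncell == nlines - (2 + nvar)


# Build a tiny file: 2 variables, 4 cells, 1 realization.
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    tmp.write("demo\n2\nvar1\nvar2\n" + "0.1 0.2\n" * 4)
matches = grid_file_size_matches(tmp.name, ncell=4)
mismatch = grid_file_size_matches(tmp.name, ncell=5)
os.remove(tmp.name)
```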
Round to Significant Figures
pygeostat.datautils.utils.round_sigfig(value, sigfigs)
Round a float or integer to a specified number of significant figures. Also handles effectively zero, infinity, and negative infinity values.
From: http://stackoverflow.com/questions/3410976/
Parameters: - value (int or float) – Value that requires rounding
- sigfigs (int) – Number of significant figures to round the value to
Returns: Rounded value
Return type: new_value (int or float)
Example
>>> gs.round_sigfig(-0.00032161, 3)
-0.000322
Code author: Warren Black - 2015-10-13
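One common way to round to significant figures is to find the order of magnitude of the value and pass the matching number of decimals to round. The sketch below follows that approach; it is not necessarily how pygeostat implements it, it only special-cases exact zero and infinities, and the name round_sigfig_sketch is hypothetical.

```python
import math


def round_sigfig_sketch(value, sigfigs):
    """Round value to sigfigs significant figures."""
    if value == 0 or math.isinf(value):
        return value  # nothing to round
    # Position of the leading digit, e.g. -4 for 0.00032161.
    magnitude = math.floor(math.log10(abs(value)))
    return round(value, sigfigs - 1 - magnitude)


rounded = round_sigfig_sketch(-0.00032161, 3)
```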
Get Collocated Data from Grid
pygeostat.datautils.utils.getcollocated(data, secdatfl, seccols=None, concat=True)
Retrieve gridded exhaustive secondary data at the collocated sample locations.
If concat is True, the secondary data will be added to the gs.DataFile passed and nothing will be returned.
Warning
The Fortran code has not been rigorously tested; use at your own risk.
Parameters: - data (gs.DataFile) – A gs.DataFile class that must contain the appropriate coordinate information (i.e., x, y, and z attributes). The gs.DataFile must also have its griddef parameter specified, pointing to a gs.GridDef class.
- secdatfl (str) – Location of the gridded secondary data
- seccols (list) – List of the columns containing the secondary data to extract. Default is to extract all columns in the file
- concat (bool) – Indicate if the secondary data should be concatenated onto the input gs.DataFile dataframe (i.e., data.data)
Returns: Dataframe containing the secondary data. Whether it is returned depends on the value of concat
Return type: secdat (pd.DataFrame)
Example
A simple call:
>>> data = gs.DataFile(data, griddef=griddef, x='x', y='y', z='z')
>>> secdatfl = '../secdat.dat'
>>> gs.getcollocated(data, secdatfl)
Code author: Warren E. Black - 2016-02-15
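The idea behind a collocated lookup can be sketched in plain Python: convert each sample coordinate to a grid cell index, then read the gridded value stored at that cell. The sketch below is 2-D for brevity and is not the Fortran routine used by pygeostat. It assumes the GSLIB conventions that the grid minimum is the centre of the first cell and that x cycles fastest in the file; the function names are hypothetical.

```python
import math


def cell_index(coord, mn, siz, n):
    """0-based cell index along one axis; mn is the centre of the first cell."""
    idx = math.floor((coord - (mn - 0.5 * siz)) / siz)
    return idx if 0 <= idx < n else None  # None when outside the grid


def collocated_value(grid_values, x, y, nx, xmn, xsiz, ny, ymn, ysiz):
    """Return the gridded value at the cell containing (x, y), or None."""
    ix = cell_index(x, xmn, xsiz, nx)
    iy = cell_index(y, ymn, ysiz, ny)
    if ix is None or iy is None:
        return None
    return grid_values[iy * nx + ix]  # GSLIB order: x cycles fastest


# nx=3, ny=2, unit cells, first cell centred at (0.5, 0.5).
grid = [10, 11, 12, 20, 21, 22]
value = collocated_value(grid, 2.4, 1.1, 3, 0.5, 1.0, 2, 0.5, 1.0)
outside = collocated_value(grid, 3.5, 1.1, 3, 0.5, 1.0, 2, 0.5, 1.0)
```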
Get File Header
pygeostat.datautils.utils.fileheader(datafl, mute=False)
Read a GSLIB file from Python and return the header information. Useful for large files.
Code author: Warren E. Black - 2016-02-15
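Reading only the header is cheap because a GSLIB file stores it in the first few lines: a title, the number of variables, then one variable name per line. The sketch below shows that idea; the return value (a title and a list of names) is an assumption, since the return format of fileheader is not documented here, and the name read_gslib_header is hypothetical.

```python
import os
import tempfile


def read_gslib_header(datafl):
    """Return (title, variable names) without reading the data rows."""
    with open(datafl) as fh:
        title = fh.readline().strip()
        nvar = int(fh.readline().split()[0])
        names = [fh.readline().strip() for _ in range(nvar)]
    return title, names


# Small demonstration file with two variables.
with tempfile.NamedTemporaryFile("w", suffix=".dat", delete=False) as tmp:
    tmp.write("demo data\n2\nAu\nCu\n1.0 2.0\n3.0 4.0\n")
title, names = read_gslib_header(tmp.name)
os.remove(tmp.name)
```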
Convert Corrmat to GSLIB string for USGSIM
pygeostat.datautils.utils.corrmatstr(corrmat, fmt)
Converts a correlation matrix that is currently a numpy matrix or a pandas dataframe into a space delimited string. Correlation matrix strings are required in the parameter files of CCG programs such as USGSIM and supersec.
Currently, this function is hard coded to return two formats as specified by the fmt argument, one for 'usgsim' and one for 'supersec'. 'usgsim' returns the full correlation matrix while 'supersec' returns only the upper triangle of the matrix, without the diagonal values.
Parameters: - corrmat – Correlation matrix as either a pandas dataframe (pd.DataFrame) or numpy matrix (np.ndarray).
- fmt (str) – Indicate which format to return. Accepts only one of ['usgsim', 'supersec']
Returns: Correlation matrix as a space delimited string.
Return type: corrstr (str)
Code author: Warren E. Black - 2016-03-15
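The difference between the two formats can be sketched with a nested list standing in for the matrix. The layout chosen here (one matrix row per line, values written with str) is an assumption made for illustration; the exact string produced by corrmatstr may differ. The name corrmat_to_string is hypothetical.

```python
def corrmat_to_string(corrmat, fmt):
    """Space-delimited string of a square matrix given as a list of rows."""
    n = len(corrmat)
    if fmt == "usgsim":
        rows = [corrmat[i] for i in range(n)]               # full matrix
    elif fmt == "supersec":
        rows = [corrmat[i][i + 1:] for i in range(n - 1)]   # upper triangle, no diagonal
    else:
        raise ValueError("fmt must be 'usgsim' or 'supersec'")
    return "\n".join(" ".join(str(v) for v in row) for row in rows)


corrmat = [[1.0, 0.5, 0.2],
           [0.5, 1.0, 0.3],
           [0.2, 0.3, 1.0]]
full = corrmat_to_string(corrmat, "usgsim")
upper = corrmat_to_string(corrmat, "supersec")
```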
Get the 2D Slice of a 3D Grid
pygeostat.datautils.utils.slicegrid(data, griddef, orient, sliceno, slicethickness=None, nullv=None)
Slice a 3-D grid.
Parameters: - data – 1-D array or a tidy long-form dataframe with a single column containing the variable in question, where each row is an observation
- griddef (GridDef) – A pygeostat GridDef class created using gs.GridDef
- orient (str) – Orientation to slice data. 'xy', 'xz', 'yz' are the only accepted values
- sliceno (int) – Grid cell location along the axis not plotted to take the slice of data to plot
Returns: 1-D array of the sliced data
Return type: view (np.ndarray)
Code author: Matthew Deutsch - 2014-04-19
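Slicing a 1-D gridded array amounts to index arithmetic on the GSLIB ordering, where x cycles fastest, then y, then z. The sketch below illustrates this with a plain list; it assumes sliceno is 0-based and ignores slicethickness and nullv, so it is not a drop-in replacement for slicegrid. The name slice_grid is hypothetical.

```python
def slice_grid(values, nx, ny, nz, orient, sliceno):
    """Return the cells of one slice from a 1-D list in GSLIB order."""
    def at(ix, iy, iz):
        return values[iz * nx * ny + iy * nx + ix]

    if orient == "xy":   # fixed z
        return [at(ix, iy, sliceno) for iy in range(ny) for ix in range(nx)]
    if orient == "xz":   # fixed y
        return [at(ix, sliceno, iz) for iz in range(nz) for ix in range(nx)]
    if orient == "yz":   # fixed x
        return [at(sliceno, iy, iz) for iz in range(nz) for iy in range(ny)]
    raise ValueError("orient must be 'xy', 'xz', or 'yz'")


values = list(range(2 * 3 * 2))  # nx=2, ny=3, nz=2
xy = slice_grid(values, 2, 3, 2, "xy", 1)
xz = slice_grid(values, 2, 3, 2, "xz", 0)
yz = slice_grid(values, 2, 3, 2, "yz", 1)
```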
Get a Slice of a 3D Point Dataset
pygeostat.datautils.utils.slicescatter(data, orient, sliceno, slicetol, griddef=None, x=None, y=None, z=None)
Slice scattered data based on a GSLIB style grid definition.
Parameters: - data (pd.DataFrame or gs.DataFile) – Dataframe where each column is a variable and each row is an observation. Must contain the coordinate columns required depending on the value of orient. If a gs.DataFile class is passed, its attributes griddef, x, y, and z will be extracted.
- var (str) – Column header of variable under investigation
- orient (str) – Orientation to slice data. 'xy', 'xz', 'yz' are the only accepted values
- sliceno (int) – Grid cell location along the axis not plotted to take the slice of data to plot
- slicetol (float) – Slice tolerance to plot point data (i.e. plot +/- slicetol from the center of the slice). Any negative value plots all data. Default is to plot all data.
- griddef (GridDef) – A pygeostat GridDef class created using gs.GridDef. Required if the attribute cannot be retrieved from data if it is a gs.DataFile class.
- x (str) – Column header of x-coordinate. Required if the attribute cannot be retrieved from data if it is a gs.DataFile class.
- y (str) – Column header of y-coordinate. Required if the attribute cannot be retrieved from data if it is a gs.DataFile class.
- z (str) – Column header of z-coordinate. Required if the attribute cannot be retrieved from data if it is a gs.DataFile class.
Returns: pd.DataFrame of the sliced data
Return type: pointview (pd.DataFrame)
Code author: Warren E. Black - 2016-04-11
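The tolerance test can be sketched as a simple filter on the coordinate perpendicular to the slice. The version below works on a list of dictionaries rather than a dataframe and takes the first cell centre (mn) and cell size (siz) of the sliced axis directly instead of a GridDef. It assumes sliceno is 0-based; the name slice_scatter is hypothetical.

```python
def slice_scatter(points, orient, sliceno, slicetol, mn, siz):
    """Keep points within slicetol of the slice centre.

    points: list of dicts with 'x', 'y', 'z' keys.
    mn, siz: centre of the first cell and cell size along the sliced axis.
    """
    axis = {"xy": "z", "xz": "y", "yz": "x"}[orient]  # axis not plotted
    if slicetol < 0:
        return list(points)  # negative tolerance keeps everything
    centre = mn + sliceno * siz
    return [p for p in points if abs(p[axis] - centre) <= slicetol]


points = [
    {"x": 1.0, "y": 1.0, "z": 0.5},
    {"x": 2.0, "y": 1.0, "z": 1.4},
    {"x": 3.0, "y": 2.0, "z": 2.5},
]
kept = slice_scatter(points, "xy", 1, 0.5, mn=0.5, siz=1.0)
everything = slice_scatter(points, "xy", 1, -1, mn=0.5, siz=1.0)
```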
Get Absolute Filepath
pygeostat.datautils.utils.fixpath(path)
Convert a file path to an absolute path if required and make sure there are only forward slashes.
If copying the path directly from Windows Explorer, or anything else that produces a path with backslashes, make sure to indicate to Python that the string is raw. This is done by placing an r in front of the string. For example:
>>> string = r"A string with backslashes \ \ \ \ in it"
Example
Make sure to place an r in front of the string so that backslashes are not interpreted as escape sequences. A simple call:
>>> gs.fixpath(r"D:\Data\data.dat")
Code author: Warren E. Black - 2016-02-07
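The behaviour described above can be approximated with the standard library. This is a sketch under the assumption that the function simply resolves the path and swaps the separators; the name fix_path is hypothetical.

```python
import os


def fix_path(path):
    """Return an absolute version of path that uses only forward slashes."""
    return os.path.abspath(path).replace("\\", "/")


# The raw-string prefix keeps the backslash from being read as an escape.
fixed = fix_path(r"data\grid.dat")
```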
Test if Data is Numeric
pygeostat.datautils.utils.is_numeric(s)
Returns True if a value can be converted to a floating point number.
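A check of this kind is usually written as an attempted conversion. The sketch below shows that pattern; note that under this approach strings such as 'nan' and 'inf' also count as numeric, because float accepts them. The name is_numeric_sketch is hypothetical.

```python
def is_numeric_sketch(s):
    """Return True if s can be converted to a float."""
    try:
        float(s)
    except (TypeError, ValueError):
        return False
    return True


checks = [is_numeric_sketch(v) for v in ("3.14", "abc", None, 7)]
```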
Ensure a Directory Exists
pygeostat.datautils.utils.ensure_dir(f)
Make sure that the directory (or directories) exists and, if not, create it.
Ensure a Path-to-Directory Exists
pygeostat.datautils.utils.ensure_path(path)
Ensure that all folders in a given path or list of paths are created if they do not exist.
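Creating missing folders for one path or a list of paths can be sketched with os.makedirs, which builds intermediate directories and, with exist_ok=True, does nothing when the folder is already there. This is an illustration, not the pygeostat source; the name ensure_paths is hypothetical.

```python
import os
import tempfile


def ensure_paths(paths):
    """Create every directory in paths (a string or a list of strings)."""
    if isinstance(paths, str):
        paths = [paths]
    for p in paths:
        os.makedirs(p, exist_ok=True)  # no error if it already exists


root = tempfile.mkdtemp()
targets = [os.path.join(root, "a", "b"), os.path.join(root, "c")]
ensure_paths(targets)
created = all(os.path.isdir(p) for p in targets)
```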
Get the Euclidean Distance to the Nearest Sample
pygeostat.datautils.utils.nearest_eucdist(x, y=None, z=None)
Calculate the euclidean distance to the nearest sample for each sample.
Parameters: x (np.array) – Array of the coordinate in the x direction
Keyword Arguments: - y (np.array) – Array of the coordinate in the y direction
- z (np.array) – Array of the coordinate in the z direction
Returns: Array of the euclidean distance to the nearest sample for each sample
Return type: dist (np.array)
Code author: Warren E. Black - 2016-07-28
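The calculation can be sketched with a brute-force comparison of every pair of samples. This is only an illustration of the quantity being computed: it uses plain lists and math.dist (Python 3.8+), runs in O(n²) time, and needs at least two samples. The library may use a faster method; the name nearest_distances is hypothetical.

```python
import math


def nearest_distances(x, y=None, z=None):
    """Distance from each sample to its nearest neighbour."""
    n = len(x)
    # Missing axes are treated as zero so 1-D and 2-D data also work.
    ys = y if y is not None else [0.0] * n
    zs = z if z is not None else [0.0] * n
    coords = list(zip(x, ys, zs))
    return [
        min(math.dist(p, q) for j, q in enumerate(coords) if j != i)
        for i, p in enumerate(coords)
    ]


dist = nearest_distances([0.0, 3.0, 10.0], [0.0, 4.0, 4.0])
```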