Recent Pygeostat Changes¶
DataFile Changes¶
Demo of the DataFile Functionality¶
The following notebook demos:
- New DataFile functionality as it relates to regular gridded data
- New DataFile functionality as it relates to irregular point data
- A modification to infergriddef, which is a class function of DataFile
- Extensions of pandas DataFrame functionality to the DataFile class
- A new data spacing calculation function
- Assignment of NaN's based on DataFile.null on write output
import pygeostat as gs
import matplotlib.pyplot as plt
import copy
%matplotlib inline
gs.gsParams['plotting.locmap.s'] = 3
1. DataFile with Gridded Data¶
The gsParams will be used for setting a default griddef¶
gs.gsParams['data.griddef'] = gs.GridDef(gridfl='../data/griddef.txt')
print(gs.gsParams['data.griddef'])
dat = gs.DataFile(flname='../data/sgsim.gsb', nreals=1)
The griddef default matches the length of dat, so it is assigned¶
print(dat.griddef)
The DataFile also initializes as dftype='grid' since the griddef is assigned¶
print(dat.dftype)
In the absence of other kwargs, the variables attribute is all non-specialized columns¶
Not very interesting in this case since variables equals dat.data.columns, but this is more useful with the irregular point data that follows. Note that pixelplt no longer requires a griddef kwarg (separate update).
fig, axes = gs.subplots(2, 2, cbar_mode='single',
axes_pad=(0.05, 0.4), figsize=(10, 10))
for ax, var in zip(axes, dat.variables):
gs.pixelplt(dat, var=var, ax=ax, vlim=(0, .5),
cbar_label='standardized units')
2. DataFile with Irregular Data¶
This data file does not match the length of the gsParams griddef, so it is not associated with that grid definition by default.
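The association rule can be sketched in plain pandas; the grid cell count below is hypothetical:

```python
import pandas as pd

# Hypothetical sketch of the auto-association rule: a default grid
# definition is only attached when the record count equals the grid's
# cell count (e.g., 115 x 78 x 1 = 8970 cells).
griddef_count = 8970
dat_df = pd.DataFrame({'Au': [0.1, 0.2, 0.3]})   # irregular point data
griddef = griddef_count if len(dat_df) == griddef_count else None
print(griddef)  # None: lengths differ, so no grid is associated
```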
dat = gs.DataFile(flname='../data/data.dat')
print(dat.griddef)
The variables attribute is the non-specialized columns¶
print('dat.x, dat.y, dat.z:', dat.x, dat.y, dat.z)
print('dat.variables:', dat.variables)
These can be further reduced with various kwargs¶
# A list of notvariables may be provided, leading to their exclusion from variables
dat = gs.DataFile(flname='../data/data.dat', notvariables='Keyout')
print('dat.variables:', dat.variables)
# A list of variables may be provided, leading to their isolated selection
dat = gs.DataFile(flname='../data/data.dat',
                  variables=['Au', 'Sulfides'])
print('dat.variables:', dat.variables)
Note that a cat attribute has also been added.¶
This is intended for use as the categorical modeling variable (e.g., rocktype), which is generally singular. Keyout is a quasi-rocktype variable that often facilitates simulation with usgsim. Specifying it on initialization leads to its exclusion from variables.
dat = gs.DataFile(flname='../data/data.dat', cat='Keyout')
print('dat.variables = ', dat.variables)
print('dat.cat = ', dat.cat)
The addition of a variables attribute should allow for future wrapping convenience¶
For example, a data file object could be provided to an nscore routine, which would then assume that all variables should be normal scored in the absence of kwargs saying otherwise.
For now, it provides marginal convenience through the initialization and storage of this information. Consider that no variable list needs to be initialized for plotting maps of each variable.
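A minimal sketch of how such a variables attribute can be derived; the column names and the special set here are illustrative, not the pygeostat internals:

```python
import pandas as pd

# Any column that is not specialized (coordinates, categorical variable,
# keyout flag, etc.) is treated as a modeling variable.
df = pd.DataFrame(columns=['East', 'North', 'Elev', 'Au', 'Carbon', 'Keyout'])
special = {'East', 'North', 'Elev', 'Keyout'}   # x, y, z and cat columns
variables = [c for c in df.columns if c not in special]
print(variables)  # ['Au', 'Carbon']
```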
fig, axes = gs.subplots(2, 2, cbar_mode='single',
axes_pad=(0.05, 0.4), figsize=(10, 10))
for ax, var in zip(axes, dat.variables):
gs.locmap(dat, var=var, ax=ax, vlim=(0, .5),
cbar_label='standardized units')
Also consider the added convenience of the list for functions such as DataFile.gscol¶
As well as the utility of the nvar attribute, which is calculated based on the number of variables.
print('columns of the variables in the data file = ', dat.gscol(dat.variables))
print('number of variables = ', dat.nvar)
Similarly, an xyz attribute has been added to the DataFile object, which reports the x, y and z attributes as a list¶
Variables and xyz columns are frequently used within a list for iterating loops, parameter specifications, etc. Their presence as data.variables and data.xyz simplifies things.
print('dat.xyz is a list:', dat.xyz)
print('this simplifies gscols, among other things:', dat.gscol(dat.xyz))
3. Infer Grid Definition¶
New functionality allows the block sizes to be specified before inferring the required number of blocks from the data extents.
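A sketch of the per-axis inference under an assumed reading of the extent-plus-buffer logic (infer_nblk is a hypothetical helper, not the pygeostat function):

```python
import math

def infer_nblk(coords, blksize, databuffer=0.0):
    # Cover the data extent, padded by the buffer on each side,
    # with the smallest whole number of blocks of the given size.
    lo = min(coords) - databuffer
    hi = max(coords) + databuffer
    return max(1, math.ceil((hi - lo) / blksize))

print(infer_nblk([10.0, 15.5, 42.0], blksize=2.0, databuffer=1.5))  # 18
```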
griddef = dat.infergriddef(blksize=(2, 2, None), databuffer=1.5)
print('A grid definition is output as a variable\n', griddef)
print('Though it is also added as an attribute of the data\n', dat.griddef)
The old functionality remains¶
Specify the number of blocks before inferring the block size.
griddef = dat.infergriddef(nblk=(115, 78, None), databuffer=1.5)
print('A grid definition is output as a variable\n', griddef)
print('Though it is also added as an attribute of the data\n', dat.griddef)
4. Extensions of Pandas DataFrame Functionality¶
Previously, new columns had to be set on the underlying DataFrame¶
dat.data['new column'] = 0
Now, you can simply use the same notation on the DataFile object¶
This provides the setitem function of the DataFile.
dat['new column 2'] = 0
Similarly, the getitem functionality of a pandas DataFrame is extended to DataFile¶
dat['new column 2'].head()
list(dat.columns)
## Clean for the next section
dat.drop(['new column', 'new column 2', 'Keyout'])
cdat = copy.deepcopy(dat)
dat = copy.deepcopy(cdat)
The pandas DataFrame columns functionality has also been extended to DataFile, including get and set¶
Unlike the Pandas columns, special attributes such as data.x, data.variables, etc. are updated if the previously set name is changed.
print('the columns:', list(dat.columns), '\n')
print('the x and variables attributes:', dat.x, dat.variables, '\n')
columns = copy.deepcopy(dat.columns.values)
columns[[0, 2]] = 'Easting', 'Gold'
dat.columns = columns
print('the columns after altering:', list(dat.columns), '\n')
print('the x and variables attributes after altering:', dat.x, dat.variables)
The Pandas rename functionality is applied similarly¶
Note that the x and variables attributes are adjusted back
dat.rename({'Easting': 'X', 'Gold': 'Au'})
print('the columns after altering:', list(dat.columns), '\n')
print('the x and variables attributes after altering:', dat.x, dat.variables)
As is the drop functionality¶
dat.drop(['X', 'Organic Carbon'])
print('the columns after altering:', list(dat.columns), '\n')
print('the x and variables attributes after altering:', dat.x, dat.variables)
DataFrame.shape is now extended¶
dat.shape
# Reset
dat = gs.DataFile(flname='../data/data.dat', cat='Keyout')
5. A 2-D Data Spacing Function Allows for its Fast Calculation¶
Data spacing is calculated at the data location only, and refers (for now) to the distance in the x/y plane (not downhole or 3-D spacing).
kwargs can override, but otherwise the function is based on DataFile properties (dh, x and y). If a dh is present, then the data spacing is based on the distance to the average x/y location of each drill hole.
The calculation is based on the average distance to the n_nearest locations. It is vector-based and very fast, but memory intensive. A compiled version will likely be required for extending this calculation to 3-D or larger data sets.
The output is added as another column in the DataFile, unless the kwarg inplace is set to False.
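The vector-based calculation can be sketched with numpy broadcasting; this is an illustrative reimplementation, and the full pairwise distance matrix is what makes it memory intensive:

```python
import numpy as np

def data_spacing(x, y, n_nearest):
    # Average 2-D distance from each datum to its n_nearest neighbours.
    xy = np.column_stack([x, y])
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a datum is not its own neighbour
    nearest = np.partition(d, n_nearest - 1, axis=1)[:, :n_nearest]
    return nearest.mean(axis=1)

spacing = data_spacing([0.0, 1.0, 2.0, 10.0], [0.0, 0.0, 0.0, 0.0], 1)
print(spacing.tolist())  # [1.0, 1.0, 1.0, 8.0]
```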
Data spacing using the DataFile attributes (no kwargs here)¶
Here, the output dspace array is the average distance to the nearest 8 data.
dat.spacing(8)
This is useful for plotting as a distribution¶
Informs declustering cell size, variogram lag distance, etc.
gs.histplt(dat['Data Spacing (m)'], icdf=True, stat_blk='all')
This may also be useful for plotting in map view¶
E.g., for determining modeling domains/strategies
gs.locmap(dat, var='Data Spacing (m)', vlim=(6, 10))
You can also perform the calculation on specific variables¶
Where the calculation ignores records that are NaN for that variable
fig, axes = gs.subplots(2, 2, cbar_mode='single',
axes_pad=(0.05, 0.4), figsize=(10, 10))
for ax, var in zip(axes, dat.variables):
dat.spacing(8, var=var)
gs.locmap(dat, var=var+' Data Spacing (m)',
ax=ax, vlim=(6, 10))
fig, axes = gs.subplots(2, 2, label_mode='all', aspect=False,
axes_pad=(0.8, 0.8), figsize=(10, 10))
for ax, var in zip(axes, dat.variables):
gs.histplt(dat[var+' Data Spacing (m)'], ax=ax,
icdf=True, stat_blk='all')
6. NaN's are Assigned a Valid GSLIB Null Value on Output¶
May be specified in the function call, otherwise based on DataFile.null and gsParams['data.null'] in that order of priority.
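The priority order amounts to taking the first non-None value; a sketch using a hypothetical resolve_null helper and a pandas fillna:

```python
import numpy as np
import pandas as pd

def resolve_null(call_null=None, file_null=None, global_null=-99.0):
    # First non-None wins: function kwarg, then DataFile.null,
    # then the global gsParams['data.null'] default.
    for null in (call_null, file_null, global_null):
        if null is not None:
            return null

out = pd.Series([1.2, np.nan, 0.4]).fillna(resolve_null(file_null=-99.0))
print(out.tolist())  # [1.2, -99.0, 0.4]
```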
# Inspect to see the -99's in place of NaN's
print(dat.null)
dat.writefile('test.dat')
gs.rmfile('test.dat')
GridDef Changes¶
Demo of Updates to GridDef¶
The following notebook demos:
- Deprecation of old functions
- index3d_to_index1d, which replaces and improves Indices_to_Index
- index1d_to_index3d, which replaces and improves Index_to_Indices
- coord_to_index1d, which replaces and improves gridIndexCoords
- coord_to_index3d, which replaces and improves gridIndicesCoords
- gridcoord, which replaces the semi-redundant gridcoords and gengridpoints
import pygeostat as gs
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
griddef = gs.GridDef(gridfl='../data/griddef.txt')
griddef
1. Deprecation of old naming conventions¶
All class functions of GridDef are now lower case, aligning with the Python standard. Based on discussion with contributors, more intuitive naming conventions are now used as well. Merging of potentially redundant functions has been initiated, though more work is required in this regard.
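A deprecated alias that points at its replacement can be sketched as a thin wrapper; the function names and block dimensions below are illustrative:

```python
import warnings

def deprecated_alias(new_func, old_name):
    # Call through to the replacement, warning on each use.
    def wrapper(*args, **kwargs):
        warnings.warn('{} is deprecated; use {}'.format(old_name, new_func.__name__),
                      DeprecationWarning)
        return new_func(*args, **kwargs)
    return wrapper

def blockvolume():
    return 2.0 * 2.0 * 1.0  # hypothetical block dimensions

blockVolume = deprecated_alias(blockvolume, 'blockVolume')
print(blockVolume())  # 4.0, with a DeprecationWarning
```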
The current function name returns no warning¶
vol = griddef.blockvolume()
print('block volume = {}'.format(vol))
2. index3d_to_index1d¶
Highlights include:
- index3d_to_index1d replaces Indices_to_Index, which previously required a loop for multiple conversions. numpy vector functionality is used, which is much faster than loops.
- The previous functionality, operating on a single set of indices, may still be used as well
- The old function name is deprecated, pointing to the current function with a warning (now removed!)
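Assuming the standard GSLIB ordering (x cycles fastest, then y, then z), the vectorized conversion reduces to integer arithmetic; the grid dimensions here (nx=115, ny=78, nz=1) are assumed for illustration:

```python
import numpy as np

def index3d_to_index1d(ix, iy, iz, nx, ny, nz):
    ix, iy, iz = (np.asarray(a) for a in (ix, iy, iz))
    idx = ix + iy * nx + iz * nx * ny          # x fastest, then y, then z
    ingrid = ((0 <= ix) & (ix < nx) & (0 <= iy) & (iy < ny)
              & (0 <= iz) & (iz < nz))
    return idx, ingrid

idx, ingrid = index3d_to_index1d([2, 120], [4, 0], [0, 0], nx=115, ny=78, nz=1)
print(idx.tolist(), ingrid.tolist())  # [462, 120] [True, False]
```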
ix, iy, iz = 2, 4, 0
Execution with a single 3-D index¶
idx, ingrid = griddef.index3d_to_index1d(ix, iy, iz)
print('idx={}, ingrid={}'.format(idx, ingrid))
Execution with arrays of 3-D indices¶
ix, iy, iz = np.arange(0, 5), np.arange(0, 5), np.zeros(5)
idx, ingrid = griddef.index3d_to_index1d(ix, iy, iz)
print('idx={}, ingrid={}'.format(idx, ingrid))
3. index1d_to_index3d¶
Highlights include:
- index1d_to_index3d replaces Index_to_Indices, which previously required a loop for multiple conversions. numpy vector functionality is used, which is much faster than loops.
- The previous functionality, operating on a single index, may still be used as well
- The old function name is deprecated, pointing to the current function with a warning (now removed!)
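The inverse conversion is a pair of integer divisions, again assuming x-fastest ordering and the same hypothetical grid dimensions:

```python
import numpy as np

def index1d_to_index3d(idx, nx, ny, nz):
    idx = np.asarray(idx)
    iz, rem = np.divmod(idx, nx * ny)   # undo the x-fastest ordering
    iy, ix = np.divmod(rem, nx)
    ingrid = (0 <= idx) & (idx < nx * ny * nz)
    return ix, iy, iz, ingrid

ix, iy, iz, ingrid = index1d_to_index3d([462, 918], nx=115, ny=78, nz=1)
print(ix.tolist(), iy.tolist(), iz.tolist())  # [2, 113] [4, 7] [0, 0]
```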
idx = 918
Execution with a single 1-D index¶
ix, iy, iz, ingrid = griddef.index1d_to_index3d(idx)
print('ix={}, iy={}, iz={}, ingrid={}'.format(ix, iy, iz, ingrid))
Execution with an array of 1-D indices¶
idx = np.array([0, 230, 460, 690, 920])
ix, iy, iz, ingrid = griddef.index1d_to_index3d(idx)
print('ix={}, iy={}, iz={}, ingrid={}'.format(ix, iy, iz, ingrid))
4. coord_to_index1d¶
Highlights include:
- coord_to_index1d replaces gridIndexCoords, which previously required a loop for multiple conversions. numpy vector functionality is used, which is much faster than loops.
- The previous functionality, operating on a single coordinate, may still be used as well
- The old function name is deprecated, pointing to the current function with a warning (now removed!)
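A sketch of the coordinate-to-index snap, assuming the GSLIB convention that the grid origin parameters hold the centroid of the first block; the origin and block sizes here are hypothetical:

```python
import numpy as np

def coord_to_index3d(x, y, z, mn=(0.5, 0.5, 0.5), siz=(1.0, 1.0, 1.0)):
    # Snap each coordinate to the index of its nearest block centroid.
    return tuple(int(np.floor((c - m) / s + 0.5))
                 for c, m, s in zip((x, y, z), mn, siz))

print(coord_to_index3d(15.0, 30.0, 0.5))  # (15, 30, 0)
```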
Execution with a single coordinate¶
x, y, z = 15, 30, .5
idx, ingrid = griddef.coord_to_index1d(x, y, z)
print('idx={}, ingrid={}'.format(idx, ingrid))
Execution with arrays of coordinates¶
x, y = np.linspace(30.5, 100.5, 5), np.linspace(30.5, 100.5, 5)
z = np.zeros(x.shape)
idx, ingrid = griddef.coord_to_index1d(x, y, z)
print('idx={}, ingrid={}'.format(idx, ingrid))
5. coord_to_index3d¶
Highlights include:
- coord_to_index3d replaces gridIndicesCoords, which previously required a loop for multiple conversions. numpy vector functionality is used, which is much faster than loops.
- The previous functionality, operating on a single coordinate, may still be used as well
- The old function name is deprecated, pointing to the current function with a warning (now removed!)
Execution with a single coordinate¶
x, y, z = 15, 30, .5
ix, iy, iz, ingrid = griddef.coord_to_index3d(x, y, z)
print('ix={}, iy={}, iz={}, ingrid={}'.format(ix, iy, iz, ingrid))
Execution with arrays of coordinates¶
x, y = np.linspace(30.5, 100.5, 5), np.linspace(30.5, 100.5, 5)
z = np.zeros(x.shape)
ix, iy, iz, ingrid = griddef.coord_to_index3d(x, y, z)
print('ix={}, iy={}, iz={}, ingrid={}'.format(ix, iy, iz, ingrid))
6. gridcoord¶
Highlights include:
- gridcoord replaces gridcoords and gengridpoints, which previously provided the single execution (coordinates of one grid node) and global execution (coordinates of all grid nodes) respectively
- The old function names are deprecated, pointing to the current function in the appropriate manner with a warning (now removed!)
gridcoord in the context of gridcoords¶
x, y, z = griddef.gridcoord(ix, iy, iz)
print('x={}, y={}, z={}'.format(x, y, z))
Deprecated gengridpoints¶
The old gengridpoints outputs a single array where each column corresponds with x, y and z. gridcoord is instead consistent with the execution above, outputting 3 separate arrays.
gridcoord in the context of gengridpoints¶
x, y, z = griddef.gridcoord()
print('x={}, y={}, z={}'.format(x[:5], y[:5], z[:5]))
GsParams Addition¶
Demo of the gsParams Functionality¶
The following notebook demos how the gsParams object:
- May be inspected, described and used for setting defaults
- May be used for modifying default plotting label behaviour
- May be used for modifying default plotting style behaviour
- May be used for modifying default grid-related behaviour
- May be used for modifying default data-related behaviour
- May have its user settings saved/loaded within each notebook instance, providing consistency and convenience
import pygeostat as gs
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
dat = gs.DataFile(flname='../data/data.dat')
1. Introducing the gsParams Object¶
Stores project defaults, excluding matplotlib parameters that are handled by the set_style functionality or with native matplotlib functionality. This object mirrors the matplotlib rcParams object, which is a dictionary that validates inputs and is queried by pygeostat plotting functions. Use of keyword arguments in individual functions overrides the object.
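The rcParams-style validate-on-set behaviour can be sketched with a small dict subclass (an illustration, not the gsParams implementation):

```python
class ValidatedParams(dict):
    # Minimal rcParams-like sketch: the key set is fixed at construction
    # and every assignment runs through a validator.
    def __init__(self, validators, defaults):
        self._validators = validators
        super().__init__(defaults)

    def __setitem__(self, key, value):
        if key not in self._validators:
            raise KeyError('{} is not a valid parameter'.format(key))
        super().__setitem__(key, self._validators[key](value))

params = ValidatedParams({'data.tmin': float}, {'data.tmin': -98.0})
params['data.tmin'] = -998          # coerced to -998.0 by the validator
```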
As a dictionary object, all of the standard functionality applies, such as printing the keys and values¶
print(gs.gsParams)
Additionally, the describe function may be used for an explicit description of individual keys¶
gs.gsParams.describe('data.griddef')
Using that function without a key leads to printing of the entire dictionary¶
gs.gsParams.describe()
Error checking is performed when describing parameters¶
# Describe with an invalid key
try:
gs.gsParams.describe('plotting.test')
except Exception as e:
print(e)
Error checking of keys and values is also performed when setting parameters¶
# Set with an invalid key
try:
gs.gsParams['plotting.test'] = 'something'
except Exception as e:
print(e)
# Set with an invalid value
try:
gs.gsParams['data.tmin'] = 'something'
except Exception as e:
print(e)
2. Plotting Label Functionality¶
gs.locmap(dat, title='Units are meter, based on the standard default')
gs.gsParams['plotting.unit'] = 'ft'
gs.gsParams['plotting.locmap.s'] = 3
gs.locmap(dat, title='Units are feet, based on the modified default')
Modify the default axis labels¶
The following change to gsParams will impact pixelplt, locmap, pitplt, drillplt, varplt, etc.
gs.gsParams['plotting.xname'] = 'X'
gs.gsParams['plotting.yname'] = 'Y'
gs.locmap(dat)
Logical labels if unit == None¶
Entering None or an empty string for plotting.unit leads to its exclusion from labels (no empty brackets after the label).
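The labeling rule can be sketched as a one-liner that appends bracketed units only when a unit is actually set (axis_label is a hypothetical helper):

```python
def axis_label(name, unit):
    # None or '' yields a clean label with no empty brackets.
    return '{} ({})'.format(name, unit) if unit else name

print(axis_label('X', 'm'), '|', axis_label('X', None), '|', axis_label('X', ''))
# X (m) | X | X
```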
gs.gsParams['plotting.unit'] = None
gs.locmap(dat)
Override the defaults with kwarg of each function¶
Providing keyword labels overrides the defaults for a particular plot, without altering the defaults themselves.
gs.locmap(dat, xlabel='U', ylabel='V', title='Overriding Defaults')
gs.locmap(dat, title='Reverting to Defaults')
Restore the original pygeostat defaults¶
If wishing to restore the original defaults, a class function is provided
gs.gsParams.restore_defaults()
gs.locmap(dat)
3. Plotting Style Functionality¶
gsParams may be used for altering the default plotting style in several ways. Note that customization provided via gsParams is intended to complement matplotlib.rcParams, which provides basic defaults such as font families, font size, figure size, etc.
The background grid may be globally enabled in pygeostat plotting functions¶
gs.gsParams['plotting.locmap.s'] = 3
gs.locmap(dat, title='Without Grid Lines')
gs.gsParams['plotting.grid'] = True
gs.locmap(dat, title='With Grid Lines')
The face and edge color of histplt may be set globally¶
Also note that a GSLIB-style axis_xy may be set, removing the top and right borders.
gs.histplt(dat['Organic Carbon'], title='Default Histogram Color Now Mimics GSLIB')
gs.gsParams['plotting.axis_xy'] = True
gs.histplt(dat['Organic Carbon'],
title='Top and right borders are now hidden for non-spatial plots')
gs.gsParams['plotting.histplt.facecolor'] = 'C0'
gs.gsParams['plotting.histplt.edgecolor'] = 'C1'
gs.histplt(dat['Organic Carbon'], title='Any Face and Edge Color Can be Used')
The color of CDFs may also be set globally¶
gs.histplt(dat['Organic Carbon'], icdf=True, title='Default CDF Color')
gs.gsParams['plotting.histplt.cdfcolor'] = 'C2'
gs.histplt(dat['Organic Carbon'], icdf=True, title='Any CDF Color Can be Used')
Various statistic block parameters may be altered¶
gs.gsParams['plotting.histplt.stat_blk'] = 'minimal'
gs.histplt(dat['Organic Carbon'], icdf=True,
title=('Minimal stats with 2 significant digits'))
gs.gsParams['plotting.sigfigs'] = 4
gs.histplt(dat['Organic Carbon'], icdf=True,
title=('4 significant digits'))
gs.gsParams['plotting.sigfigs'] = 3
gs.gsParams['plotting.roundstats'] = False
gs.gsParams['plotting.histplt.stat_xy_cdf'] = (0.7, 0.05)
gs.gsParams['plotting.stat_ha'] = 'left'
gs.histplt(dat['Organic Carbon'], icdf=True, title=('3 sig figs and left alignment'))
# Restore for future plots
gs.gsParams['plotting.stat_ha'] = 'right'
gs.gsParams['plotting.axis_xy'] = False
Note that the color is validated when set as a default¶
try:
gs.gsParams['plotting.histplt.facecolor'] = 'this isnt a valid color'
except Exception as e:
print(e)
# Return some defaults
gs.gsParams['plotting.histplt.facecolor'] = '.9'
gs.gsParams['plotting.histplt.edgecolor'] = 'k'
The 'axis_xy' style of GSLIB may be globally enabled¶
def subplot_figure(title):
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
gs.histplt(dat['Organic Carbon'], ax=axes[0], stat_blk='all')
gs.locmap(dat, var='Organic Carbon', ax=axes[1])
fig.tight_layout()
fig.suptitle(title, **{'y': 1.02})
subplot_figure('The default axis borders follow matplotlib')
gs.gsParams['plotting.axis_xy'] = True
subplot_figure('The axis borders now mimic GSLIB-style (only full borders for spatial plots)')
gs.gsParams['plotting.axis_xy_spatial'] = True
subplot_figure('The axis borders are now consistent')
4. Grid-related Default Behaviour¶
Grid definitions must otherwise be repeatedly specified for initializing DataFiles, using grid-related functionality, etc. gsParams allows a default to be set and applied in the absence of kwargs.
Observe the behaviour of DataFile with no griddef default¶
No griddef is associated with the DataFile when loaded, in the absence of kwarg
keyout = gs.DataFile(flname='../data/keyout.gsb')
print(keyout.griddef)
Now, initialize and set a default grid definition with one line¶
Here, the griddef is initialized through passing a file name with the new gridfl option in GridDef
# The grid definition works
gs.gsParams['data.griddef'] = gs.GridDef(gridfl='../data/griddef.txt')
The grid definition is now automatically associated with a DataFile when initialized, if its length matches¶
It will be assigned so long as DataFile.DataFrame.shape[0] matches gs.gsParams['data.griddef'].count()
keyout = gs.DataFile(flname='../data/keyout.gsb')
print(keyout.griddef)
5. Trimming (NaN Assignment) Related Default Behaviour¶
NaN is the standard for missing values in Pandas, Numpy, Scipy, Paraview, Matplotlib and others, and is therefore adopted within Pygeostat.
Observe the default behaviour of the DataFile¶
Here, the gsParams['data.tmin'] default of -98.0 is trimming null values, leading to their assignment as NaN. The pandas describe then ignores them in the count, stats, etc., pixelplt displays them as white, np.nanmean ignores them, etc.
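A sketch of the trimming rule with plain pandas, assuming values at or below tmin are treated as missing:

```python
import numpy as np
import pandas as pd

tmin = -98.0                                  # the gsParams['data.tmin'] default
raw = pd.Series([0.12, -999.0, 0.34])
trimmed = raw.where(raw > tmin)               # -999 becomes NaN
print(trimmed.count(), np.nanmean(trimmed))   # count 2; mean is about 0.23
```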
sgsim = gs.DataFile(flname='../data/sgsim.gsb', nreals=1)
sgsim.describe()
# Note that matplotlib prints nans as white by default
gs.pixelplt(sgsim, vlim=(0, .2))
# Note that this matches the pandas describe above
print(np.nanmean(sgsim.data['variable_001']))
Observe the behaviour of DataFile if tmin is altered (not used in this case)¶
No tmin is used, so that -999's in the data file are included in the pandas describe. This functionality may be more useful, however, if a differing tmin tolerance must be used (e.g., tmin = -998, tmin = -9998, etc.)
gs.gsParams['data.tmin'] = None
sgsim = gs.DataFile(flname='../data/sgsim.gsb', nreals=1)
# Note that -999s are now included in the count and stats by pandas
sgsim.describe()
# Note that -999's are plotted as blue since matplotlib considers them valid, motivating the use of NaN
gs.pixelplt(sgsim, vlim=(0, .2))
# Note that there is no built-in functionality within numpy for ignoring
# -999, which again motivates the use of NaN
print(np.mean(sgsim.data['variable_001']))
Default NaN Replacement Behaviour with Output¶
When writing to external files, calling fortran wrappers, etc., the NaN's are replaced with gsParams['data.null'] by default. This may be overridden with function kwargs, which is recommended when writing to VTK where NaN's are implicitly handled.
# Read in data to initialize NaNs
gs.gsParams['data.tmin'] = -98.0
dat = gs.DataFile(flname='../data/data.dat')
dat.describe()
# Write this data to a file, which may be inspected (note the -99's)
# before removing
dat.writefile('test.dat')
Altered NaN Replacement Behaviour with Output¶
The gsParams may be used for modifying this default output behaviour, as well as function kwargs.
# First, note that -999s are now present in the output file (rather than -99s) due
# to the use of the kwarg below
dat.writefile('test.dat', null=-999.0)
# gsParams may be used for globally modifying null to anything, including NaN
gs.gsParams['data.null'] = -9
# The DataFile must be re-initialized for the global null to be applied
# as its attribute
dat = gs.DataFile(flname='../data/data.dat')
# -9s are now visible in this file
dat.writefile('test.dat')
gs.rmfile('test.dat')
6. Save and Load gsParams Settings¶
gs.gsParams.save('gsParams_user_defaults.txt')
gs.gsParams.load('gsParams_user_defaults.txt')
Plotting Styles Addition¶
Demo of new default style performance and set_style functionality¶
The following notebook demos how:
- The appearance of pygeostat plots in the absence of style specifications now matches matplotlib defaults, or whatever the previously defined style of a user is (according to modifications in matplotlib.rcParams)
- The default style may be modified with custom dictionary settings, before being used in all future plots
- The default style may be modified with pre-defined style dictionaries, such as the 'ccgpaper' style that used to be the default pygeostat plot style
- The original matplotlib defaults may be restored if changes are made
- The use of style specifications in individual pygeostat plot functions are only applied to those individual plots, and do not impact the default settings
- The complementary (and not redundant) nature of pygeostat.set_style (via matplotlib.rcParams) and the pygeostat.gsParams settings.
Although these changes are demoed with locmap, they apply to all pygeostat plot functions. A paramdiff function is also defined in this notebook to explicitly display the changes to the plotting style from each step.
import pygeostat as gs
import matplotlib as mpl
%matplotlib inline
dat = gs.DataFile(flname='../data/data.dat')
Generate a function for displaying changes to Matplotlib Parameters¶
The altered parameters would be visible in the plots themselves, but the paramdiff function explicitly displays rcParams changes throughout this notebook.
origparams = mpl.rcParams.copy()
def paramdiff():
    return {k: [v, mpl.rcParams[k]] for k, v in origparams.items()
            if v != mpl.rcParams[k]}
1. Plot without any modifications to style¶
No mangling of style occurs in the absence of provided style specifications
gs.locmap(dat, title='This plot uses the matplotlib defaults')
paramdiff()
2. Modify the default plot style with a custom dictionary¶
Note that this amounts to using the gs.set_style one-liner in place of:
import matplotlib as mpl
mpl.rcParams['font.size'] = 30
This may be easier for some users...
gs.set_style(custom={'font.size':20, 'figure.figsize':(10, 10)})
gs.locmap(dat, title='The default style is modified through custom specifications')
paramdiff()
3. Modify the default plot style with preset styles¶
This is the required command for pygeostat to plot in its old default, 'ccgpaper'
gs.set_style('ccgpaper')
gs.locmap(dat, figsize=(12, 12), title='The ccgpaper style is now the default')
paramdiff()
4. Restore default styles¶
The original matplotlib defaults may be restored via restore_mpl_style()
gs.gsPlotStyle.restore_defaults()
gs.locmap(dat, title='The matplotlib defaults are restored')
paramdiff()
gs.set_style("pt3")
gs.locmap(dat, title='The pt3 style is now the default')
paramdiff()
gs.restore_mpl_style()
gs.locmap(dat, title='The matplotlib defaults are restored')
paramdiff()
5. Styles set using the pltstyle kwarg are non-permanent¶
gs.locmap(dat, pltstyle="pt3", title='pt3 style due to pltstyle kwarg')
gs.plt.show()
print(paramdiff())
gs.locmap(dat, pltstyle="presentation",
title='presentation style due to pltstyle kwarg')
gs.plt.show()
print(paramdiff())
gs.locmap(dat, title='return to the default style, which was not impacted by pltstyle kwargs')
gs.plt.show()
print(paramdiff())
gs.locmap(dat, cust_style={'font.family':'Times New Roman'},
title='Times due to cust_style kwarg')
gs.plt.show()
print(paramdiff())
gs.locmap(dat, title='Not Times since the default is used')
gs.plt.show()
print(paramdiff())
gs.histplt(dat['Au'], title='Default histplt style')
Alter defaults to provide a GSLIB look¶
GSLIB style plots remove the top and right axes. Also add a grid to all plots.
gs.gsParams['plotting.axis_xy'] = True
gs.gsParams['plotting.grid'] = True
gs.histplt(dat['Au'], title='GSLIB-style histplt')
Alter the default content and style of the statistics¶
# Horizontal alignment of all pygeostat statistics
gs.gsParams['plotting.stat_ha'] = 'left'
# Use of a fraction leads to the statistics font being a fraction of the
# regular font size
gs.gsParams['plotting.stat_fontsize'] = 0.8
# Histplt in the argument means that these arguments only pertain to that function
gs.gsParams['plotting.histplt.stat_blk'] = 'minimal'
gs.gsParams['plotting.histplt.stat_xy'] = (.8, .05)
gs.histplt(dat['Au'], title='Altered statistics content and style')
Specifics of spatial plots¶
GSLIB-style plots for spatial programs (locmap, pixelplt, etc.) displayed the top and right borders. This is therefore a separate argument within gsParams that is not impacted by the axis_xy default above.
gs.locmap(dat, title='Spatial plots maintain a GSLIB-style axis')
gs.gsParams['plotting.axis_xy_spatial'] = True
gs.locmap(dat, title='Spatial plots are now consistent with all pygeostat Plots')
Scatter Plot Addition¶
import pygeostat as gs
import numpy as np
%matplotlib inline
Load the data¶
dat = gs.DataFile(flname='../data/data.dat', notvariables=['Keyout'])
dat.variables
Set some default plot parameters¶
gs.set_style(custom={'font.size':12, 'figure.figsize':(5, 5)})
1. Basic Scatplt Properties¶
Scatplt is heavily integrated with gsParams. All of the kwargs demonstrated in this section may be set as project defaults to avoid their repetition.
Basic defaults¶
Axis labels are drawn from the Pandas series if not provided. Coloring according to a calculated KDE is the default.
gs.scatplt(dat['Au'], dat['Sulfides'])
Color bar functionality¶
Note that a specialized (and relatively clean) color bar labeling is provided for KDE.
gs.scatplt(dat['Au'], dat['Sulfides'], cbar=True)
Coloring with any arbitrary array¶
Here, the colorbar label is drawn from the provided data in the absence of a kwarg
gs.scatplt(dat['Au'], dat['Sulfides'], c=dat['Carbon'], cbar=True,
clim=(0, .5))
Other color and opacity options¶
Commonly manipulated properties are found in the function kwargs, although any Matplotlib.scatter kwargs may be passed as well.
gs.scatplt(dat['Au'], dat['Sulfides'], c='k', alpha=.1)
2. Scatplt Statistics¶
Available statistics¶
As with histplt, scatplt provides an 'all' argument for stat_blk, which for now displays the number of pairs, the Pearson correlation and the Spearman rank correlation.
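The 'all' block can be sketched with numpy alone; Spearman is computed here as Pearson on ordinal ranks (no tie handling), which is an illustrative simplification:

```python
import numpy as np

def scatter_stats(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    keep = ~np.isnan(x) & ~np.isnan(y)        # drop pairs with any NaN
    x, y = x[keep], y[keep]

    def rank(a):                              # ordinal ranks (ties not handled)
        return a.argsort().argsort().astype(float)

    return {'n': len(x),
            'pearson': np.corrcoef(x, y)[0, 1],
            'spearman': np.corrcoef(rank(x), rank(y))[0, 1]}

# n = 3; both correlations are 1.0 for this linear example
stats = scatter_stats([1.0, 2.0, 3.0, np.nan], [2.0, 4.0, 6.0, 1.0])
print(stats)
```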
gs.scatplt(dat['Au'], dat['Sulfides'], figsize=(5, 5), stat_blk='all')
# Also accomplished as a default via:
# gs.gsParams['plotting.scatplt.stat_blk'] = 'all'
Custom statistics¶
A list of the desired statistics may be provided, as well as the stat location.
gs.scatplt(dat['Au'], dat['Sulfides'],
stat_blk=('count', 'pearson'), stat_xy=(.95, .95))
# Also accomplished as a default via:
# gs.gsParams['plotting.scatplt.stat_xy'] = (.95, .95)
# gs.gsParams['plotting.scatplt.stat_blk'] = ('count', 'pearson')
# Setting new defaults for the next section
gs.gsParams['plotting.scatplt.stat_blk'] = 'all'
gs.gsParams['plotting.scatplt.stat_xy'] = (0.95, 0.95)
Declustered Stats¶
Declustering weights may be used for the calculated statistics. In the future, this may also be used for functions such as KDE calculations.
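A weighted Pearson correlation can be sketched by swapping weighted moments in for the equal-weighted ones (an illustration; pygeostat's exact weighting is not shown here):

```python
import numpy as np

def weighted_pearson(x, y, wt):
    # Weighted means and (co)variances replace the equal-weighted ones.
    x, y, w = (np.asarray(a, float) for a in (x, y, wt))
    w = w / w.sum()
    mx, my = (w * x).sum(), (w * y).sum()
    cov = (w * (x - mx) * (y - my)).sum()
    return cov / np.sqrt((w * (x - mx) ** 2).sum() * (w * (y - my) ** 2).sum())

x, y = [1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 8.0]
print(weighted_pearson(x, y, np.ones(4)))  # equal weights match np.corrcoef
```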
# Generate some random weights for this demo
wt = np.random.rand(dat.shape[0])
gs.scatplt(dat['Au'], dat['Sulfides'], wt=wt)
3. Multiple Scatter Plots via scatplts¶
Multiple scatterplots may be plotted with the scatplts wrapper, which continues to provide all of the flexibility that was demonstrated with scatplt.
# Set a larger default figure size
gs.set_style(custom={'figure.figsize':(10, 10)})
# Note the variables attribute of the DataFile
print('variables:', dat.variables)
Basic Functionality¶
The defaults of this wrapper function are largely drawn from the underlying scatplt defaults and its related gsParams. Note that heterotopic data may be passed, as scatplt automatically determines the pairs with no NaN values (observe the differing $n$ in each panel).
If a DataFile is passed, the function defaults to using the DataFile.variables attribute.
print(dat.columns)
print(dat.variables)
fig = gs.scatplts(dat, pad=(-5, -3.5))
Another example with Au coloring and specified variables¶
Here, a DataFrame is passed to control the variables that are plotted.
fig = gs.scatplts(dat[dat.variables], figsize=(10, 10), pad=(-5, -3.5),
                  c=dat['Au'])
Another example with differing plot parameters¶
Demonstrating plot flexibility with a few parameters.
fig = gs.scatplts(dat, figsize=(10, 10), pad=(-4.5, -3.2), c='k',
alpha=.1, s=6, stat_blk=False, axis_xy=True,
grid=True)
4. Scatterplot Comparisons via scatplts_lu¶
The scatplts_lu function facilitates the comparison of multiple scatterplots, placing pairs in the upper and lower triangle. The orientation of the lower triangle plots may be aligned with the upper plots to further ease comparison.
Generate transformed data for comparison¶
Note that the following block of code is included as an example of program calling with the latest DataFile attributes, though it is commented out since the PPMT program is required.
# ppmt = gs.Program(program='ppmt', getpar=True)
ppmtpar = """ Parameters for PPMT
*******************
START OF PARAMETERS:
{datafl} -input data file
{nvar} {varcols} 0 - number of variables, variable cols, and wt col
-5 1.0e7 - trimming limits
25 50 50 -min/max iterations and targeted Gauss perc. (see Note 1)
1 -spatial decorrelation? (0=no,1=yes) (see Note 2)
1 2 0 - x, y, z columns (0=none for z)
50 25 - lag distance, lag tolerance
nscore.out -output data file with normal score transformed variables
{outfl} -output data file with PPMT transformed variables
ppmt.trn -output transformation table (binary)
Note 1: Optional stopping criteria, where the projection pursuit algorithm will terminate
after reaching the targeted Gaussian percentile. The input percentile range is 1 (very Gaussian)
to 99 (barely Gaussian); the percentiles are calculated using random Gaussian distributions.
The min/max iterations override the targeted Gaussian percentile.
Note 2: Option to apply min/max autocorrelation factors after the projection pursuit algorithm
to decorrelate the variables at the specified non-zero lag distance.
"""
ppmtfl = '../data/ppmt.out'
# ppmt.run(parstr=ppmtpar.format(datafl=dat.flname, nvar=dat.nvar,
# varcols=dat.gscol(dat.variables),
# outfl=ppmtfl))
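For clarity, the `{datafl}`, `{nvar}`, `{varcols}` and `{outfl}` placeholders in `ppmtpar` are filled by `str.format` in the (commented) `run` call. A standalone sketch with hypothetical values:

```python
# A minimal template with the same brace placeholders as ppmtpar
template = """START OF PARAMETERS:
{datafl}                 -input data file
{nvar}   {varcols}   0   -number of variables, variable cols, and wt col
{outfl}                  -output data file with PPMT transformed variables
"""

# str.format substitutes each {name}; a literal brace would need doubling ({{)
parstr = template.format(datafl='data.dat', nvar=3, varcols='3 4 5',
                         outfl='ppmt.out')
```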
gvariables = ['PPMT:'+a for a in dat.variables]
datg = gs.DataFile(ppmtfl, variables=gvariables)
Set some new defaults¶
All of these items could also be set via kwargs, though using gsParams avoids repeating them.
gs.gsParams['plotting.scatplt.s'] = 3
gs.gsParams['plotting.scatplt.stat_blk'] = ['pearson', 'noweightflag']
gs.gsParams['plotting.scatplt.stat_xy'] = (1., 1.05)
gs.gsParams['plotting.sigfigs'] = 3
gs.gsParams['plotting.roundstats'] = True
fig = gs.scatplts_lu(dat, datg, pad=(-4, -3.2))
Orientation and titles¶
Here, the orientation of the axes in the lower triangle is aligned with the upper triangle to ease comparison. Also note that the lower and upper triangles may be titled.
dat['Weight'] = wt
fig = gs.scatplts_lu(dat, dat, lowwt='Weight', pad=(-4, -3.2),
titles=('Data', 'Realizations'), titlesize=14,
align_orient=True)
Visualization Toolkit (VTK) Changes¶
Write VTK Demo¶
The following notebook demonstrates how the write_vtk function (and its DataFile.writefile wrapper) may be used for outputting:
- Point data
- Regular grid data
- Structured surface data
- Structured grid data
import pygeostat as gs
import pandas as pd
import numpy as np
1. Point Data¶
Drill hole data is stored in a point VTK format, which is relatively inefficient in terms of storage. Users may consider the options displayed below for reducing file size (and increasing load speed in Paraview).
Load the point data and inspect its attributes¶
Note that these attributes are detected automatically based on the names (variables are unassigned columns).
datadir = '../data/pycourse/'
dat = gs.DataFile(datadir+'data.dat')
#
print('special attributes:', dat.dh, dat.xyz, dat.ifrom, dat.ito)
print('variables:', dat.variables)
print('dftype:', dat.dftype)
Convenience with DataFile.writefile¶
Minimal options are provided with DataFile.writefile, but this convenience function may be used since x, y, z and dftype are registered with the DataFile.
dat.writefile('point.vtk')
Flexibility with write_vtk¶
Individual variables may be selected for writing, reducing file size. Further, the precision of variables (vdtype) and coordinates (cdtype) may be specified to reduce file sizes if the default 'float64' is not necessary.
gs.write_vtk(dat, 'point_float32_variables.vtk', variables=[dat.dh]+dat.variables,
vdtype='float32', cdtype='float32')
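The file-size effect of `vdtype='float32'` follows directly from the array item size: each value occupies 4 bytes instead of 8, at the cost of roughly single-precision rounding error. A quick NumPy illustration:

```python
import numpy as np

values = np.linspace(0.0, 1.0, 1000)     # float64 by default
values32 = values.astype('float32')      # 4 bytes per value instead of 8

bytes64 = values.nbytes
bytes32 = values32.nbytes
# Rounding error is bounded by float32 machine epsilon (~1.2e-7 relative)
max_err = np.max(np.abs(values - values32))
```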
Integration of gsParams for defaults¶
Note that options such as vdtype and cdtype may have their defaults altered, allowing for the use of writefile without the required kwarg.
gs.gsParams['data.write_vtk.vdtype'] = 'float32'
gs.gsParams['data.write_vtk.cdtype'] = 'float32'
dat.writefile('point_float32.vtk')
2. Regular Grid Data¶
The vast majority of grid data generated by CCG programs is regular or rectilinear. It is stored efficiently by the rectilinear VTK file (.vtr), although large file sizes can still result. Users should consider using float and integer precision as appropriate to reduce file sizes (see the previous section).
Generate some grid data¶
Load the grid definition and create a variable that is a function of the x, y and z coordinates (completely arbitrary).
griddef = gs.GridDef(gridfl=datadir+'griddef.txt')
x, y, z = griddef.gridcoord()
sim = gs.DataFile(data=pd.DataFrame(np.multiply(np.multiply(x, y), z),
columns=['Multiplied Coordinates']),
griddef=griddef, dftype='grid')
sim.dftype
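gridcoord returns coordinates in GSLIB ordering: x cycles fastest, then y, then z. A standalone sketch of that ordering for a hypothetical 2 x 3 x 2 cell-centred grid, using a Fortran-order ravel:

```python
import numpy as np

# Hypothetical grid definition: origins are cell centres, unit cell sizes
nx, ny, nz = 2, 3, 2
xmn = ymn = zmn = 0.5
xsiz = ysiz = zsiz = 1.0

ix = np.arange(nx) * xsiz + xmn
iy = np.arange(ny) * ysiz + ymn
iz = np.arange(nz) * zsiz + zmn

# meshgrid with 'ij' indexing, then Fortran-order ravel -> x varies fastest
X, Y, Z = np.meshgrid(ix, iy, iz, indexing='ij')
x = X.ravel(order='F')
y = Y.ravel(order='F')
z = Z.ravel(order='F')
```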
Convenience with DataFile.writefile¶
Since the file has a grid dftype and a grid definition, writefile outputs it as a regular grid format.
sim.writefile('regular_grid.vtk')
3. Structured Surface Data¶
Structured grids differ from regular or rectilinear grids in that the coordinates of each centroid do not align with a simple definition. Rather, the coordinates of each grid location must be specified.
Like regular grids, however, the relative location/ordering of each grid node is maintained. Structured grids should iterate in the x, y and then z direction, as with GSLIB-style regular grids.
The first form of structured grid data is 2-D surfaces, which are relatively common. In the below example, the x/y coordinates of the surface are regularly spaced, but the z coordinate of the grid (surface elevation) is irregular.
Load the surface data¶
Note that x, y and z are included in this file, and are recognized on import.
sim = gs.DataFile(datadir+'surface.gsb', griddef=griddef.convert_to_2d())
print('Coordinate columns of the surface:', sim.x, sim.y, sim.z)
Output the VTK with two available methods¶
Two methods are presented below. The dftype must be specified, either as a DataFile attribute or as a kwarg to the low-level write function. The x, y and z columns must also be specified, either as attributes or kwargs.
gs.write_vtk(sim, 'structured_surface_method1.vtk', dftype='sgrid')
sim.dftype = 'sgrid'
sim.writefile('structured_surface_method2.vtk')
4. Structured Grid Data¶
# The 3-D structured grid will conform to the 2-D surface
surf = sim['Z'].values.reshape(griddef.nx, griddef.ny, order='F')
# Generate the z coordinates
_, _, z = griddef.gridcoord()
# Add some variability to the z coordinates of the grid, making it structured
z = z.reshape(griddef.nx, griddef.ny, griddef.nz, order='F')
for iy in range(griddef.ny):
for ix in range(griddef.nx):
z[ix, iy, :] = z[ix, iy, :] + surf[ix, iy]
z = z.reshape(griddef.nx*griddef.ny*griddef.nz, order='F')
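The nested loop above adds the surface elevation to every vertical column of z; the same result can be obtained in a single NumPy broadcasting step, which is worth knowing for large grids. A standalone sketch with small random stand-ins for the grid and surface:

```python
import numpy as np

# Small stand-ins for the notebook's grid and surface
nx, ny, nz = 4, 3, 5
rng = np.random.default_rng(1)
z = rng.normal(size=(nx, ny, nz))
surf = rng.normal(size=(nx, ny))

# Loop version (as in the notebook)
z_loop = z.copy()
for iy in range(ny):
    for ix in range(nx):
        z_loop[ix, iy, :] += surf[ix, iy]

# Broadcasting version: surf[:, :, None] aligns (nx, ny) with (nx, ny, nz)
z_vec = z + surf[:, :, None]
```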
# Create some cell data for output - simply the grid index
idx = np.arange(griddef.count())
# Create a structured datafile
sim = gs.DataFile(data=pd.DataFrame(np.stack((z, idx), axis=1),
columns=['Z', 'Index']),
griddef=griddef, dftype='sgrid')
Output the VTK¶
Note that since x and y are not attributes of the sgrid object, they are assumed to follow the regular grid definition.
sim.writefile('structured_grid.vtk')
Weight Changes¶
Declustering Weights Demo¶
This notebook demonstrates how the declustering weights attribute of a DataFile (wts) and the weight kwarg of functions (wt) may be used.
Note that wts is plural, because data may have multiple declustering weights for each variable. wt kwargs are generally singular (excluding future functions such as nscore, which will accept a wts argument for multiple variables).
The notebook comprises 7 sections:
- Load and inspect the data
- Calculate data spacing in advance of declustering
- Calculate declustering weights for each variable
- Set the DataFile.wts attribute
- Use the wts attribute with histplt
- Use the wts attribute with scatplts
- Use the wts attribute with histplt (with one variable/weight)
import pygeostat as gs
import matplotlib.pyplot as plt
% matplotlib inline
gs.gsParams['plotting.locmap.s'] = 5
1. Load and Inspect the Data¶
Note that Au is more densely sampled, and will therefore have differing declustering weights relative to the secondary variables.
dat = gs.DataFile(flname='../data/data.dat', cat='Keyout')
print('variables = ', dat.variables)
fig, axes = gs.subplots(2, 2, cbar_mode='single',
axes_pad=(0.05, 0.4), figsize=(10, 10))
for ax, var in zip(axes, dat.variables):
gs.locmap(dat, var=var, ax=ax, vlim=(0, .5),
cbar_label='standardized units')
dat.columns
2. Calculate Data Spacing¶
The data spacing is calculated to select an appropriate declustering cell size for each variable (corresponding with approximately the 95th percentile of the spacing here).
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes = axes.flatten()
tnames = []
for ax, var in zip(axes, dat.variables):
dat.spacing(3, var)
tnames.append(var+' Data Spacing (m)')
gs.histplt(dat, var=tnames[-1], ax=ax, icdf=True)
# Don't need these variables anymore
tnames.append('Keyout')
dat.drop(tnames)
cellsizes = [8, 16, 16, 16]
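For intuition, the spacing statistic can be approximated in plain NumPy: compute each point's mean distance to its n nearest neighbours, then take a high percentile as a cell size candidate. A brute-force sketch (not pygeostat's implementation; `nn_spacing` is a hypothetical helper):

```python
import numpy as np

def nn_spacing(xy, n):
    """Mean distance to the n nearest neighbours of each point (brute force)."""
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2))
    d.sort(axis=1)                     # column 0 is the zero self-distance
    return d[:, 1:n + 1].mean(axis=1)

rng = np.random.default_rng(2)
xy = rng.uniform(0, 100, size=(50, 2))
spacing = nn_spacing(xy, 3)
cellsize_guess = np.percentile(spacing, 95)
```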
3. Decluster Each Variable¶
# Note that users must have a declus program on their system path
# or in the folder where this is executed
declus = gs.Program(program='declus', getpar=True)
parstr = """ Parameters for DECLUS
*********************
START OF PARAMETERS:
../data/data.dat -file with data
1 2 0 {varcol} - columns for X, Y, Z, and variable
-1.0e21 1.0e21 - trimming limits
declus.sum -file for summary output
declus.out -file for output with data & weights
1.0 1.0 -Y and Z cell anisotropy (Ysize=size*Yanis)
0 -0=look for minimum declustered mean (1=max)
1 {cellsize} {cellsize} -number of cell sizes, min size, max size
5 -number of origin offsets
"""
tnames = []
for var, cellsize in zip(dat.variables, cellsizes):
declus.run(parstr=parstr.format(varcol=dat.gscol(var),
cellsize=cellsize),
liveoutput=False)
temp = gs.DataFile('declus.out')
tnames.append(var+' Wt')
dat[tnames[-1]] = temp['Declustering Weight']
gs.rmfile(['declus.out', 'declus.sum', 'temp'])
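Conceptually, cell declustering assigns each datum a weight inversely proportional to the number of data sharing its cell, standardized to average 1. The declus program additionally searches cell sizes, origin offsets, and anisotropy; the simplified sketch below (hypothetical `cell_decluster` helper) covers only the core idea:

```python
import numpy as np

def cell_decluster(x, y, cellsize):
    """Cell-declustering weights: 1/(n in cell), scaled to average 1."""
    ix = np.floor(x / cellsize).astype(int)
    iy = np.floor(y / cellsize).astype(int)
    # Unique cell per point, plus the count of points sharing each cell
    _, inv, counts = np.unique(np.stack((ix, iy)), axis=1,
                               return_inverse=True, return_counts=True)
    wt = 1.0 / counts[inv]
    return wt * len(x) / wt.sum()      # standardize to mean 1

rng = np.random.default_rng(3)
x = rng.uniform(0, 100, 200)
y = rng.uniform(0, 100, 200)
wt = cell_decluster(x, y, 16.0)
```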
print('Data columns = ', list(dat.columns))
print('Data weights = ', dat.wts)
4. Set the DataFile.wts Attribute¶
wts may be set using the setcol function¶
This is consistent with using setcol for dat.x, dat.y, etc.
dat.setcol('wts', tnames)
print('Data weights = ', dat.wts)
wts may be specified on initialization¶
The data is written to a temporary file below to permit initialization. wts are specified on the re-initialization of the DataFile, which must be done in this case due to their non-standard naming convention. Note that these weight columns are not registered as variables since they are registered as special attributes.
dat.writefile('declus.out')
dat = gs.DataFile('declus.out', wts=tnames)
gs.rmfile('declus.out')
print('Data weights = ', dat.wts)
print('Variables = ', dat.variables)
5. Use of wts with histplt¶
The wts naming convention allows for natural iteration of wt in a loop.
fig, axes = plt.subplots(2, 2, figsize=(10, 10))
axes = axes.flatten()
for var, wt, ax in zip(dat.variables, dat.wts, axes):
gs.histplt(dat, var=var, wt=wt, ax=ax)
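Under the hood, a declustered histogram is simply a weighted histogram. A standalone NumPy sketch using the weights argument of np.histogram:

```python
import numpy as np

# Declustering weights enter the histogram as per-sample weights
rng = np.random.default_rng(4)
values = rng.lognormal(size=500)
wt = rng.random(500)

counts, edges = np.histogram(values, bins=20, weights=wt, density=True)

# With density=True the weighted histogram integrates to 1
area = np.sum(counts * np.diff(edges))
```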
6. Use of wts with scatplts¶
Passing the full wts list to the singular wt kwarg raises an error, as demonstrated below.
try:
fig = gs.scatplts(dat, wt=dat.wts)
except Exception as e:
print(e)
Dropping wts columns alters the wts attribute¶
DataFile.drop updates the DataFile.wts attribute (if necessary). The call below leaves only a single wts column.
dat.drop(dat.wts[1:])
print('Data weights = ', dat.wts)
Scatplt now accepts dat.wts, since it is no longer a list¶
Note that an entry of dat.wts could previously be used as well.
fig = gs.scatplts(dat, wt=dat.wts, figsize=(10, 10), pad=-3,
stat_xy=(.95, .95))
Scatplt with a boolean wt¶
If dat.wts is a single column, a boolean may also be used for activating weights, adding further convenience.
fig = gs.scatplts(dat, wt=True, figsize=(10, 10), pad=-3,
stat_xy=(.95, .95))
try:
gs.histplt(dat, wt=True)
except Exception as e:
print(e)
7. Histplt with a boolean wt¶
After reducing the number of variables to 1, histplt operates on the DataFile. It also permits a boolean wt.
dat.drop(dat.variables[1:])
print('Data variables = ', dat.variables)
gs.histplt(dat, wt=True)