Fortran in pygeostat¶
Various Fortran subroutines have been wrapped for Python using a compiling tool Fortran Builder developed to permit F2PY compilation with a wide variety of Fortran and C compiler combinations. Compiling Fortran extensions for python allows data to be passed directly from Python to Fortran. This allows users to take advantage of the speed of Fortran, the breadth of historical libraries and the seamless integration with the wide variety of flexible and powerful libraries accessible through Python.
When pygeostat is imported, these compiled extensions are not automatically imported. However, full functionality of pygeostat assumes that the compiled extensions are present. If they are not found then usage of the package is still possible; various errors and warnings are thrown if functions requiring the compiled extensions are used.
The simplest way to compile the extensions is to run the following commands from a python interpreter after installing pygeostat:
>>> import pygeostat as gs
>>> gs.build_pygeostat_fortran()
Alternatively, a compile script is provided that is command line callable. Starting a terminal in
pygeostat/fortran/
and using the following command will result in the extensions built in the
current directory:
> python compile.py -compiler=gnu all
NOTES
If pygeostat was installed from a
.whl
file, the fortran extensions are already present and this step is not required.If installing using the setup.py, try
python setup.py install -buildfortran
to pre-build the fortran extensions and copy them over to the site-packages. The wheel installation is preferred over this stepThe Fortran Builder is compatible with gfortran and intel compilers and tested with the following configurations:
- MinGW 4.7.1 installed with
conda install mingw
- MinGW 7.3.0 installed from sourceforge
- Intel Fortran 12 and Visual Studio 2012
- Intel Fortran 15+ and Visual Studio 2015
4. To compile custom fortran extensions (e.g. fortran code not contained in pygeostat), consider the
custom fortran builder
Fortran Extesion Building Functions¶
Functions developed to permit compilation of pygeostat
and arbitrary fortran modules. Using
f2py
and flexible Fortran and C compilers.
Note
compile.py
is command line callable. Open a terminal in pygeostat/fortran/
and build the
pygeostat fortran extensions with python compile.py -compiler=gnu all
Code author: Ryan Martin - 24-04-2018
Build Pygeostat Fortran¶
-
pygeostat.fortran.compile.
build_pygeostat_fortran
(pygeostatmodules='all', compiler='gnu', mode='release', exflags='', opt='O2', clean=True, verbose=False)¶ This function builds the f2py extensions with minimal interaction from the user. The goal is to make this function callable with
gs.build_pygeostat_fortran()
, and have all requirements sorted out automatically resulting in a set of compiled pygeostat fortran extensions.Modules are compiled to
pygeostat/fortran/
, regardless of where the function is called from. If no gnu compiling tools are found on the path (gfortran -v
,gcc -v
returns nothing), thenconda install mingw -y
is run to install MinGW, and this compiler is used.Parameters: - pygeostatmodules (str) – either
"all"
or one of the dictionary keys found inpygeostat/fortran/sourcedefs.py
- compiler (str) – either
"intel"
,"gnu"
or a path to a local folder containing gfortran compilers, e.g.,C:/mingw/bin/
- mode (str) – “release” or “debug”
- exflags (str) – compiler-specific permissable fortran compiler flags
- opt (str) – optimization level, defaults to O2
- clean (bool) – if True then the build directory is cleaned before compiling (recommended)
- verbose (bool) – write all output to the terminal
Code author: Ryan Martin - 07-04-2018
- pygeostatmodules (str) – either
Pygeostat Fortran Modules¶
backtr¶
backtr.f90
covasubs¶
covasubs.for
dgm¶
dgm.for
esri_io¶
esri_io.f90
fgslib_io¶
fgslib_io.f90
gausskde¶
gausskde.f90
getcollocated¶
subs/sortem.f90
, getcollocated.f90
histcorrect¶
subs/random.f90
, subs/normaldist.f90
, subs/sortem.f90
, histcorrect.f90
nscore¶
subs/quicksort.f90
, subs/acorni.f90
, subs/normaldist.f90
, subs/nscore_module.f90
, nscore.f90
pygsb¶
subs/fileprocessing.f90
, subs/gslib_binary.f90
, pygsb.f90
sgvertices¶
sgvertices.f90
spbusim¶
subs/random.f90
, subs/sortem.f90
, subs/normaldist.f90
, linearinterp.f90
, covasubs.for
, gausskde.f90
, spbusim.f90
subsample¶
subs/random.f90
, subsample.f90
supersec¶
supersec.f90
varcalc¶
subs/sortem.f90
, subs/random.f90
, subs/varsubs.f90
, varcalc.f90
varmodel¶
subs/random.f90
, covasubs.for
, varmodel.f90
varsim¶
subs/fileprocessing.f90
, subs/gslib_binary.f90
, subs/varsim.for
, varsim_wrap.f90
Build Custom Fortran¶
-
pygeostat.fortran.compile.
build_custom_fortran
(sources, includes={}, wraponly={}, name=None, srcdir='./', outdir='./', tmpdir='./tmp/', compiler='gnu', mode='release', exflags='', opt='O2', clean=True, verbose=True)¶ This function is intended to allow arbitrary f2py extensions to be constructed using the tooling provided in this module. This function replaces FortranBuild as a more succinct methodology to compile the requred sources.
Parameters: - sources (list or dict) – either a list of fortran files where the last depends on the first
and the last file in the list contains the f2py wrapping code, or a dictionary where
keys in the dictionary indicate the name of the module to be built and the values are the
lists of associated sources to generate that module. See
pygeostat/fortran/sourcedefs.py
for inspiration on the structure of these dictionaries - includes (list or dict) – a matching item to sources that contains a list or dict of extra
items to include on the link step of compilation. See
pygeostat/fortran/sourcedefs.py
. - wraponly (list or dict) – matching item to
sources
andincludes
that contains a list or dictionary of functions that get wrapped for python, other functions in the f2py fortran code are ignored. Seepygeostat/fortran/sourcedefs.py
. - name (str) – if
sources
is a list, a name must be provided for the resulting<name>.pyd
- tmpdir, outdir (srcdir,) – various locations to consider
- compiler (str) – either
"intel"
,"gnu"
or a path to a compilerbin
directory - mode (str) –
"release"
or"debug"
- exflags (str) – compiler-specific permissable fortran compiler flags
- opt (str) – optimization level, defaults to
O2
- clean (bool) – if
True
then the build directory is cleaned before compiling (recommended) - verbose (bool) – write all output to the terminal
Code author: Ryan Martin - 07-04-2018
- sources (list or dict) – either a list of fortran files where the last depends on the first
and the last file in the list contains the f2py wrapping code, or a dictionary where
keys in the dictionary indicate the name of the module to be built and the values are the
lists of associated sources to generate that module. See
Fortran Subroutines¶
Simulated Realizations Accuracy Plot¶
Used and wrapped within gs.accpltsim()
Covariance Calculation Subroutines¶
cova3¶
Used and wrapped within gs.VariogramModel.getcova()
sqdist¶
Used and wrapped within gs.VariogramModel.setcova()
setrot¶
Used and wrapped within gs.VariogramModel.setcova()
Fast GSLIB File I/O Subroutines¶
File Read¶
Used and wrapped within gs.read_gslib_f()
and gs.DataFile()
File Write¶
Used and wrapped within gs.write_gslib_f()
and gs.DataFile()
Gaussian KDE Subroutines¶
Not used in any functions, but can be called.
Get Collocated Exhaustive Secondary Data¶
Used and wrapped within gs.getcollocated()
Histogram Correction¶
Not used in any functions, but can be called.
GSLIB-Binary I/O Subroutines¶
File Read¶
Used and wrapped within gs.read_gsb()
and gs.DataFile()
File Write¶
Used and wrapped within gs.write_gsb()
and gs.DataFile()
Semiparametric Bayesian Updating¶
Not used in any functions, but can be called.
Subsample Dataset¶
-
pygeostat.fortran.wrappers.
subsample
(datafl, col, ncell, nsub, nreal, rseed)¶ When datasets become too large to efficiently use, a sub-sample maybe extracted and used in ieu. A Fisher-Yates shuffle is implemented.This subroutine was designed with the intent of wrapping it for usein python. Motivated from Jarred L. Deutsch’s use of Fisher-Yates shuffle in gslib program histpltsim.
Assumes that the data file being read is in GSLIB format. Outputs an 1D array in the case that only one realization is sub-sampled, or a2D array in the case where multiple realizations are sub-sampled.Each realizations sub-sample is within a unique column.
Parameters: - datafl (str) – A single input datafile with the variable being sampled
- col (int) – Column containing data to sub-sample
- ncell (int) – The number of cells within a single realization, can also be interpreted as the number of data to read, then sub-sample
- nsub (int) – Number of sub-samples to output
- nreal (int) – Number of realizations to sub-sample
- rseed (int) – A seed value to pass to the random number generator
Returns: Output array with a realization in each column
Return type: subsamp (np.array)
Examples
First, the function needs to be loaded into Python:
>>> from pygeostat.fortran.subsample import subsample
A simple call:
>>> subsamp = subsample(datafl='./sgsim.out', col=1, ncell=griddef.count(), nsub=5000, ... nreal=100, rseed=gs.rseed())
Code author: Warren E. Black - 2015-09-22
Experimental Variogram Calculation¶
Used and wrapped within gs.varcalc()
Variogram Modeling and Fitting¶
Used and wrapped within gs.Variogram.fitmodel()
Gridded Variogram Calculation¶
Used and wrapped within gs.varsim()
F2PY¶
F2PY Intro, Installation, Troubleshooting Tips¶
F2PY compiles Fortran subroutines into .pyd files, allowing them to be called as Python functions.The module is included within numpy. Prerequisites for F2PY include (possibly not limited to, depending on your system, setup, and experience with the F2PY quirks):
- A C compiler from one of:
- MinGW (cygwin or other)
- MSVC compilers (probably the hardest option)
- A Fortran compiler:
- gfortran from MinGW
- Intel Fortran Compiler (see: resource/f2py_ifort_Makefile/)
3. The right path variables (this list may not be exhaustive, and is a common source of frustration in Windows)
- Anaconda path variables (should already be setup)
- Path to MinGW bin (<Anacondapath>/MinGW/bin)
- Path to Cygwin bin (<Cygwinpath>/bin)
- “Wrappable” Fortran code
- see here for more details
Getting F2PY to work directly and properly on windows can be an issue as the right combination of C and Fortran compilers are required based on your operating system and whether your machine has 32 or 64-bit architecture. Libraries also need to be in the correct location.
Warning
F2PY compilation with Cygwin does not presently work in Anaconda3 2.4.1 for Python 3.5 due to a change in Visual Studio Libraries (Bug Report: https://bugs.Python.org/issue25251). Recommendation is to use Anaconda3 2.3.0 in the meantime.
Installation instructures are outlined below, some examples can be found here. It is assumed that the Anaconda distribution is being used and is already installed
Windows 64-Bit
This procedure works for Windows 7 OS, it may work for others but has not been tested.
Install mingw to Anaconda by running the following in the command line or cygwin prompt:
>>> conda install mingw
Install cygwin with the following packages from the development category:
- mingw64-x86_64-gcc-core
- mingw64-x86_64-gcc-g++
F2PY Examples¶
Typical Fortran code allows for dynamically allocatable arrays depending on size constraints determined during run time. This can be accounted for in 2 ways with F2PY. Either write your Fortran subroutines to explicitly define all array sizes in the subroutine call. i.e.:
subroutine example(data,size)
integer, intent(in) :: size
real*8, intent(inout) :: data(size)
...
end subroutine example
If the above subroutine was contained in a file called ‘example.f90’ it could immediately be wrapped with F2PY using the command line F2PY functionality:
>>> f2py -c example.f90 -m example
There are commonly errors thrown at this stage, some of which are covered in the installation section, others can be sorted out by combing ‘the forumns’… The simplest method for troubleshooting is to use various combinations of --fcompiler=
or --compiler=
flags to determine the working combination for your machine. So the modified call may be something like this:
>>> f2py --fcompiler=gfortran --compiler=cygwin -c example.f90 -m example
The above procedure assumes that your functions are passed and return variables of known size. That is, sizes of the variables are known (there are no ‘allocatable’ arrays here). Additionally, all types are standard, derived types cannot be passed. If the code you are developing MUST account for unknown sizes, or needs to work with some kind of derived type, a working methodology is to use module allocatable arrays and private module variables, respectively:
module example
! things we can access from Python !
real*8,allocatable,public :: data(:,:)
public :: usedata
! things we cannot access from Python, but can use in Fortran !
type(customtype),private :: derived_variable
contains
subroutine usedata()
do
!something with data and/or derived_variable
enddo
end subroutine
end module example
In this case, you define the size of the allocatable arrays from Python by assigning data to the module variable after compiling the module with F2PY (more on this below):
>>> import numpy as np
>>> from example import example as ex
>>> ex.data = np.array([[1,1,1],[0,0,0]])
Using these simple rules, it is possible to wrap existing Fortran codes that are too ‘complex’ for F2PY, so they can be used in Python. Conider the example below. The kdtree2_module (from the CCG Knowledge Base) contains complex Fortran code including pointers, allocatable arrays, derived types, recursive subroutines. It also relies on allocatable arrays and pointers in order to be used from calling code, since you need to have dynamic storage for nearest neighbor searches, and must store pointers to the kdtree once it is constructed. The following wrapping module can be compiled by F2PY, to allow access to the complex code from Python:
module kd
use kdtree2_module
implicit none
!things that we can access from Python!
public :: createtree,nnearest,rnearest
integer,allocatable,dimension(:),public :: inds
real*8,allocatable,dimension(:),public :: dis
real*8,allocatable,dimension(:,:),public :: dlocs
integer, public :: nd
!things we can't access from Python:
type(kdtree2),pointer,private :: tree
type(kdtree2_result),allocatable,target,private :: datainds(:)
contains
subroutine createtree()
real*8,allocatable,dimension(:,:) :: tlocs
if(allocated(dlocs))then
if(size(dlocs,1)>3)then
write(*,*)'Data must be dim x nd dimensioned!'
return
endif
if(associated(tree)) call kdtree2_destroy(tree)
tree => kdtree2_create(dlocs,3,.true.)
nd = size(dlocs,2)
else
write(*,*)'Data not defined'
endif
end subroutine createtree
subroutine nnearest(n,pt)
integer,intent(in) :: n
real*8,intent(in),dimension(3) :: pt
if(allocated(inds))deallocate(datainds,inds,dis)
allocate(datainds(n),inds(n),dis(n))
call kdtree2_n_nearest(tree,pt,n,datainds)
inds=datainds%idx
dis=sqrt(datainds%dis)
end subroutine nnearest
subroutine rnearest(r,pt)
real*8,intent(in) :: r
real*8,intent(in),dimension(3) :: pt
integer :: n,nf
real*8 :: r2
r2 = r * r
n = kdtree2_r_count(tree,pt,r2)
if(allocated(inds))deallocate(datainds,inds,dis)
allocate(datainds(n),inds(n),dis(n))
call kdtree2_r_nearest(tree,pt,r2,nf,n,datainds)
inds=datainds%idx
dis=sqrt(datainds%dis)
end subroutine
end module kd
In order to build the above module with F2PY, we need to first pre-compile the kdtree2.f90 codes into a linkable module for F2PY. Do this with:
>>> gfortran -c -O2 kdtree2.f90 -fstack-arrays
The -c flag instructs gfortran to build the codes without linking to an executable, which generates a .mod file for every module in the kdtree2.f90 codes, and links all .mod modules into the .o object file that is used in the next step. The -fstack-arrays flag is required in this case since intels compilers assign arrays to the stack by default, whereas gfortran does not. This flag may not be required for all modules. Additionally it may be necessary to define the stack size explicitly here with another flag (see website for more options). Pre-compiling the complex codes is the critical step for using complex Fortran codes in Python. The next step is to build the wrapping module kd, contained in the kd_wrap.f90 codes (shown above), with F2PY. Building the module with F2PY generates the C-wrappers that allows the Fortran code to be called from Python. This step may be a bit of trial and error with compiler flags and whatnot, but in this case, and from a machine with both mingw, cygwin and Python 3.4 installed, the following call worked:
>>> f2py --fcompiler=gfortran --compiler=cygwin --opt=-O2 --f90flags=-fstack-arrays -I.
... kdtree2.o -c kd_wrap.f90 -m kd_wrap
Various errors will likely be thrown at this step, varying from 64 bit compiler issues, Python issues, distutils, paths, etc. For some troubleshooting see the F2PY install section.
Once compiled into a .pyd file, it can be called within Python as a module, and any module variables can be assigned data, any module subroutines may also be called, for example:
from kd_wrap import kd
import pygeostat as gs
import numpy as np
#import some data:
dat = gs.DataFile(flname='GC_df.out',readfl=True)
#in the kdtree2 module, data is assumed to be dim x nd dimensioned
kd.dlocs = np.array(dat.data[[0,1,2]]).transpose()
kd.createtree()
# Find points around some arbitrary location
pt = np.array([10,10,10])
# Do a nearest neighbor search with 10 neighbors
kd.nnearest(10,pt)
# Now we can access the resulting distances and indices
print(kd.dis)
print(kd.inds)
# Do a radius search
kd.rnearest(150,pt)
# Now we can access the resulting distances and indices
print(kd.dis)
print(kd.inds)
Compiling and calling Fortran dll’s¶
The standard Python library contains a package called ctypes
which provides interfaces to foreign functions that are found in shared libraries. With the Intel compiler, a Fortran function or subroutine can be exported to dll by ensuring that the !DEC$ is present to define the exported symbol:
! Note: this only works with the Intel Compiler
subroutine sqr(val)
!DEC$ ATTRIBUTES DLLEXPORT, ALIAS:'sqr' :: sqr
integer, intent(inout) :: val
val = val ** 2
end subroutine sqr
If the above subroutine is contained in the example.f90
file, it is compiled with :
>>> ifort /dll example.f90
which produces the shared library example.dll
. Inspecting this object with dumpbin:
>>> dumpbin /exports example.dll
1 0 00001000 sqr
The function can be imported to Python with the ctypes
module and called with the following Python code:
import ctypes as ct
fortlib = ct.CDLL('example.dll')
val = 100
val = ct.pointer(ct.c_int(val)) # setup the pointer to the correct data structure
_ = fortlib.sqr(val) # call the function
print(val[0]) # index the memory address setup above
Considering the F2PY examples above, immediately it is clear that compiling Fortran in this way is easier, and calling the Fortran from Python in this way is more difficult. The advantages are that this is a C-independent (except for Python and the ctypes
library) method to call a Fortran function from Python, and the same dll called from Python here could also be called from other languages (i.e. C# for petrel).
Maintaining GNU - Intel compatibility¶
Since many of the pygeostat functions can be compiled with either GNU or Intel tools using the FortranBuild class, a method to maintain a compatible Python calling sequence for the dll’s is required since one set of Python code should be able to call the Fortran function regardless of how it was compiled. This can be done by using the iso_c_binding
module:
subroutine sqr2(val) BIND(C, NAME='sqr2')
use iso_c_binding
!DEC$ ATTRIBUTES DLLEXPORT :: sqr2
integer, intent(inout) :: val
val = val ** 2
end subroutine sqr2
The iso_c_binding
in this case overwrites the name mangling for each compiler (gfortran and ifort), and ensures the call sequence from Python is constant. In this case the !DEC$ statement is ignored by gfortran and the function is available in the exported dll. Notably, the ALIAS is dropped from the !DEC$ export for Intel Fortran since the name in the binding overwrites this setting.
More Complicated Data Types¶
The main concern for calling Fortran functions in the dll’s from Python is how to setup the data structures. Consider the following function which take a 2-dimensional array of integers and computes some arbitrary value from the inputs by replacing the values in the array:
module example
use iso_c_binding
implicit none
contains
subroutine sqr_2d_arr(nd, val) BIND(C, NAME='sqr_2d_arr')
!DEC$ ATTRIBUTES DLLEXPORT :: sqr_2d_arr
integer, intent(in) :: nd
integer, intent(inout) :: val(nd, nd)
integer :: i, j
do j = 1, nd
do i = 1, nd
val(i, j) = (val(i, j) + val(j, i)) ** 2
enddo
enddo
end subroutine sqr_2d_arr
end module example
The module is compiled with:
>>> ifort /dll example.f90
or:
>>> gfortran -shared example.f90 -o example.dll
and can be called from Python with:
import ctypes as ct
import numpy as np
# import the dll
fortlib = ct.CDLL('example.dll')
# setup the data
N = 10
nd = ct.pointer( ct.c_int(N) ) # setup the pointer
pyarr = np.arange(0, N, dtype=int) * 5 # setup the N-long
for i in range(1, N): # concatenate columns until it is N x N
pyarr = np.c_[pyarr, np.arange(0, N, dtype=int) * 5]
# call the function by passing the ctypes pointer using the numpy function:
_ = fortlib.sqr_2d_arr(nd, np.ctypeslib.as_ctypes(pyarr))
print(pyarr)
Important Considerations¶
In the above Python code some things are notable:
All data is passed by pointer reference to the Fortran dll, the data is modified in place if the variable is intent(inout), or replaced if the variable is intent(out). All memory is controlled on the Python side of things
The memory must be allocated in some form on the Python side using
np.zeros(size, dtype=type)
(or similar) even if the variable is intent(out)
- The types of all data initialized on the Python size must match those being called in the Fortran module
This is very important. For example, use
int
forinteger
,float
forreal*8
, etc. Pythonfloat
is double precision by default. This may not be a correct list, and is definitely not exhaustive, but it has worked with the limited testing that has been done with this method of calling Fortranthe function
ctypes.c_pointer()
can be used to create the necessary references to non-array elements, be sure to usectypes.c_int()
andctypes.c_double()
(and others) as required for the needed parameterinitializing arrays on the python end is strange. For example, say a Fortran subroutine is defined with an integer, intent(out) array with size nd, to initialize this array on the python side, you can use:
import ctypes as ct intarray = (ct.c_int * nd)()and to give the array values for input:
import ctypes as ct pyarray = np.zeros(nd, dtype=np.int) valarray = (ct.c_int * nd)(*pyarray)Numpy provides some convenient methods
np.ctypeslib.as_ctypes()
to pass initialized arrays with the correct types to the Fortran functions (as demonstrated above)
Mixed Language Debugging¶
The use of dll’s and ctypes
library for calling the Fortran code provides some exciting opportunities to speed up code development while leveraging the plotting and data management capabilities of pygeostat with the GSLIB Fortran geostatistical library. Visual Studio 2015 Community is currently free and has recently developed a set of Python tools that will utilize Python distributions found on the machine. Combined with the Intel Compiler, projects containing Fortran code compiled to dll, and called VIA the Python libraries and tools presented above, can be debugged after it is called from Python!