Reusing code: scripts and modules#
So far, we have typed all instructions in the interpreter. For longer sets of instructions we need to change track and write the code in text files (using a text editor), which we will call either scripts or modules. Use your favorite text editor (provided it offers syntax highlighting for Python), or the editor that comes with the Scientific Python Suite you may be using.
Scripts#
Note
Let us first write a script, that is a file with a sequence of instructions that are executed each time the script is called. Instructions may be e.g. copied-and-pasted from the interpreter (but take care to respect indentation rules!).
The extension for Python files is .py. Write or copy-and-paste the following lines in a file called test.py:
# Contents of test.py
message = "Hello how are you?"
for word in message.split():
    print(word)
Note
Let us now execute the script interactively, that is inside the Jupyter or IPython interpreter. This is maybe the most common use of scripts in scientific computing.
In Jupyter or IPython, the syntax to execute a script is %run script.py. For example:
%run test.py
Hello
how
are
you?
message
'Hello how are you?'
The script has been executed. Moreover, the variables defined in the script (such as message) are now available inside the interpreter’s namespace.
Note
Other interpreters also offer the possibility to execute scripts (e.g., exec(open("test.py").read()) in the plain Python 3 interpreter, which replaces the execfile function of Python 2).
It is also possible to execute this script as a standalone program, by running it inside a shell terminal (Linux/Mac console or cmd Windows console). For example, if we are in the same directory as the test.py file, we can execute this in a console:
$ python test.py
Hello
how
are
you?
Tip
Standalone scripts may also take command-line arguments:
# Contents of my_file.py
import sys
print(sys.argv)
$ python my_file.py test arguments
['my_file.py', 'test', 'arguments']
Warning
Don’t implement option parsing like this yourself. Use a dedicated module such as argparse.
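As a minimal sketch of what the argparse approach might look like, the script below declares its arguments and lets the module do the parsing. The argument names (words, --upper) and the helper functions are hypothetical, invented only for this illustration.

```python
import argparse


def build_parser():
    # Hypothetical options, chosen only to illustrate the idea.
    parser = argparse.ArgumentParser(description="Print words one per line.")
    parser.add_argument("words", nargs="+", help="words to print")
    parser.add_argument("--upper", action="store_true", help="print in upper case")
    return parser


def main(argv=None):
    # With argv=None, argparse reads sys.argv[1:], as a real script would.
    args = build_parser().parse_args(argv)
    lines = [word.upper() if args.upper else word for word in args.words]
    for line in lines:
        print(line)
    return lines


# Passing an explicit list mimics: python my_file.py test arguments --upper
result = main(["test", "arguments", "--upper"])
```

Compared with reading sys.argv by hand, this gives a generated --help message and error reporting for missing or unknown arguments.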
Importing objects from modules#
import os
os
<module 'os' (frozen)>
os.listdir('.')
['oop.md',
'exceptions.md',
'basic_types.md',
'standard_library.md',
'junk.txt',
'python-logo.png',
'demo2.py',
'solutions',
'workfile',
'__pycache__',
'test.py',
'first_steps.md',
'test.pkl',
'functions.md',
'my_file.py',
'control_flow.md',
'demo.py',
'python_language.md',
'data.txt',
'io.md',
'reusing_code.md']
And also:
from os import listdir
Importing shorthands:
import numpy as np
Warning
The following code is an example of what is called a star import. Please do not use it:
from os import *
Makes the code harder to read and understand: where do symbols come from?
Makes it impossible to guess the functionality by the context and the name (hint: os.name is the name of the OS), and to profit usefully from tab completion.
Restricts the variable names you can use: os.name might override name, or vice versa.
Creates possible name clashes between modules.
Makes the code impossible to statically check for undefined symbols.
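The name-clash problem can be made concrete with a short sketch. It runs the star import inside a throwaway dictionary (an assumption made here so that the current session is not polluted) that already contains a variable called name.

```python
import os

# Simulate a fresh module namespace that already defines `name`.
namespace = {"name": "Emma"}
exec("from os import *", namespace)

# Our own `name` was silently replaced by os.name.
name_was_overridden = namespace["name"] == os.name

# The built-in open() is now shadowed by the lower-level os.open().
open_is_shadowed = namespace["open"] is os.open

print(name_was_overridden, open_is_shadowed)
```

Both checks come out true: the star import overwrote an existing variable and hid a built-in function without any warning.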
Modules are a good way to organize code in a hierarchical way. Actually, all the scientific computing tools we are going to use are modules:
import numpy as np # Module for data arrays
import scipy as sp # Module for scientific computing
# Use Numpy
np.linspace(0, 10, 6)
array([ 0., 2., 4., 6., 8., 10.])
Creating modules#
Note
If we want to write larger and better organized programs (compared to simple scripts), in which some objects (variables, functions, classes) are defined that we want to reuse several times, we have to create our own modules.
Let us create a module demo contained in the file demo.py:
"A demo module."

def print_b():
    "Prints b."
    print("b")

def print_a():
    "Prints a."
    print("a")

c = 2
d = 2
Note
In this file, we defined two functions print_a and print_b. Suppose we want to call the print_a function from the interpreter. We could execute the file as a script, but since we just want to have access to the function print_a, we are rather going to import it as a module. The syntax is as follows.
The syntax is as follows.
import demo
demo.print_a()
a
demo.print_b()
b
Importing the module gives access to its objects, using the module.object syntax. Don’t forget to put the module’s name before the object’s name, otherwise Python won’t recognize the instruction.
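A small sketch of what happens when the module name is left out. It uses the standard-library math module instead of demo (a substitution made here so that the example needs no file on disk).

```python
import math

# Qualified access works: the name `sqrt` lives inside the `math` namespace.
root = math.sqrt(16.0)

# The bare name was never imported into our namespace, so Python raises NameError.
try:
    sqrt(16.0)
    bare_name_found = True
except NameError:
    bare_name_found = False

print(root, bare_name_found)
```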
Introspection#
help(demo)
Help on module demo:
NAME
demo - A demo module.
FUNCTIONS
print_a()
Prints a.
print_b()
Prints b.
DATA
c = 2
d = 2
FILE
/Volumes/zorg/mb312/dev_trees/scientific-python-lectures/intro/language/demo.py
You can get the same output (in Jupyter / IPython) from:
demo?
An example session:
In [4]: demo?
Type: module
Base Class: <type 'module'>
String Form: <module 'demo' from 'demo.py'>
Namespace: Interactive
File: /home/varoquau/Projects/Python_talks/scipy_2009_tutorial/source/demo.py
Docstring:
A demo module.
In [5]: who
demo
In [6]: whos
Variable Type Data/Info
------------------------------
demo module <module 'demo' from 'demo.py'>
In [7]: dir(demo)
Out[7]:
['__builtins__',
'__doc__',
'__file__',
'__name__',
'__package__',
'c',
'd',
'print_a',
'print_b']
In [8]: demo.<TAB>
demo.c demo.print_a demo.py
demo.d demo.print_b demo.pyc
Importing objects from modules into the main namespace
In [9]: from demo import print_a, print_b
In [10]: whos
Variable Type Data/Info
--------------------------------
demo module <module 'demo' from 'demo.py'>
print_a function <function print_a at 0xb7421534>
print_b function <function print_b at 0xb74214c4>
In [11]: print_a()
a
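The same kind of introspection can be done programmatically. The sketch below builds a stand-in for the demo module in memory with types.ModuleType (an assumption made so that no demo.py file is required) and then lists its public names with dir() and inspect.

```python
import inspect
import types

# Stand-in for the contents of demo.py, so the example needs no file on disk.
source = '''
def print_b():
    "Prints b."
    print("b")

def print_a():
    "Prints a."
    print("a")

c = 2
d = 2
'''
demo = types.ModuleType("demo")
exec(source, demo.__dict__)

# dir() returns a sorted list; drop the double-underscore attributes.
public_names = [name for name in dir(demo) if not name.startswith("_")]

# inspect.getmembers can filter by kind, here plain functions only.
function_names = [name for name, obj in inspect.getmembers(demo, inspect.isfunction)]

print(public_names)
print(function_names)
print(inspect.getdoc(demo.print_a))
```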
Warning
Module caching
Modules are cached: if you modify demo.py and re-import it in the old session, you will get the old one.
Solution
In [12]: import importlib
In [13]: importlib.reload(demo)
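The caching behaviour can be observed outside an interactive session with a sketch like the following. The module name demo_reload_example and the temporary directory are assumptions made for this illustration; bytecode writing is switched off so that the quick edit is not masked by a stale .pyc file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

old_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # keep the demo free of stale .pyc files

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / "demo_reload_example.py"  # hypothetical module
    module_file.write_text("c = 2\n")
    sys.path.insert(0, tmp)
    try:
        import demo_reload_example

        module_file.write_text("c = 300\n")  # edit the file on disk

        import demo_reload_example  # no effect: served from sys.modules
        value_after_reimport = demo_reload_example.c

        importlib.reload(demo_reload_example)  # re-executes the file
        value_after_reload = demo_reload_example.c
    finally:
        # Undo the changes made to the interpreter state.
        sys.path.remove(tmp)
        sys.modules.pop("demo_reload_example", None)
        sys.dont_write_bytecode = old_flag

print(value_after_reimport, value_after_reload)
```

The second import statement leaves c at its old value; only importlib.reload picks up the edited file.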
‘__main__’ and module loading#
Note
Sometimes we want code to be executed when a module is run directly, but not when it is imported by another module. if __name__ == '__main__' allows us to check whether the module is being run directly.
File demo2.py:
def print_b():
    "Prints b."
    print("b")

def print_a():
    "Prints a."
    print("a")

# print_b() runs on import
print_b()

if __name__ == "__main__":
    # print_a() is only executed when the module is run directly.
    print_a()
Importing it:
import demo2
b
Importing it again in the same session:
import demo2
Running it:
%run demo2
b
a
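The difference between the two cases comes down to the value of __name__ while the file executes. As a rough sketch, the standard-library runpy module can execute the same file under two different names and capture what is printed. The file name demo_main_example.py is hypothetical, and runpy is used here only as a stand-in for import and %run.

```python
import contextlib
import io
import runpy
import tempfile
from pathlib import Path

source = '''
def print_a():
    print("a")

print("b")  # runs whatever the value of __name__

if __name__ == "__main__":
    print_a()
'''


def run_with_name(path, run_name):
    """Execute the file with __name__ set to run_name and return its output."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        runpy.run_path(str(path), run_name=run_name)
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_main_example.py"  # hypothetical file
    path.write_text(source)
    output_as_module = run_with_name(path, "demo_main_example")  # like an import
    output_as_script = run_with_name(path, "__main__")           # like a direct run

print(repr(output_as_module), repr(output_as_script))
```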
Scripts or modules? How to organize your code#
Note
Rule of thumb
Sets of instructions that are called several times should be written inside functions for better code reusability.
Functions (or other bits of code) that are called from several scripts should be written inside a module, so that only the module is imported in the different scripts (do not copy-and-paste your functions in the different scripts!).
How modules are found and imported#
When the import mymodule statement is executed, the module mymodule is searched for in a given list of directories. This list includes a list of installation-dependent default paths (e.g., /usr/lib64/python3.11) as well as the list of directories specified by the environment variable PYTHONPATH.
The list of directories searched by Python is given by the sys.path variable:
import sys
sys.path
['/Library/Frameworks/Python.framework/Versions/3.12/lib/python312.zip',
'/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12',
'/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload',
'',
'/Volumes/zorg/mb312/.virtualenvs/sp-lectures/lib/python3.12/site-packages',
'/Volumes/zorg/mb312/dev_trees/jupytext/jupyterlab',
'/Volumes/zorg/mb312/dev_trees/jupytext/src',
'/Volumes/zorg/mb312/dev_trees/sphinx-book-theme/src']
Modules must be located in the search path, therefore you can:
write your own modules within directories already defined in the search path (e.g. $HOME/.venv/lectures/lib64/python3.11/site-packages). You may use symbolic links (on Linux) to keep the code somewhere else.
modify the environment variable PYTHONPATH to include the directories containing the user-defined modules.
Tip
On Linux/Unix, add the following line to a file read by the shell at startup (e.g. /etc/profile, .profile)
export PYTHONPATH=$PYTHONPATH:/home/emma/user_defined_modules
On Windows, https://support.microsoft.com/kb/310519 explains how to handle environment variables.
or modify the sys.path variable itself within a Python script.
Tip
import sys
new_path = '/home/emma/user_defined_modules'
if new_path not in sys.path:
    sys.path.append(new_path)
This method is not very robust, however, because it makes the code less portable (user-dependent path) and because you have to add the directory to your sys.path each time you want to import from a module in this directory.
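To see the sys.path approach end to end, here is a self-contained sketch. The helper add_to_search_path and the module user_defined_example are hypothetical names, and a temporary directory stands in for a real user directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def add_to_search_path(directory):
    """Append directory to sys.path unless it is already listed."""
    directory = str(directory)
    if directory not in sys.path:
        sys.path.append(directory)
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical user module, created only for this demonstration.
    (Path(tmp) / "user_defined_example.py").write_text("answer = 42\n")

    added_first = add_to_search_path(tmp)
    added_second = add_to_search_path(tmp)  # already listed: nothing to do
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("user_defined_example")
        answer = module.answer
    finally:
        # Clean up so the interpreter state is left unchanged.
        sys.path.remove(tmp)
        sys.modules.pop("user_defined_example", None)

print(added_first, added_second, answer)
```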
See also
See https://docs.python.org/3/tutorial/modules.html for more information about modules.
Packages#
A directory that contains many modules is called a package. A package is a module with submodules (which can have submodules themselves, etc.). A special file called __init__.py (which may be empty) tells Python that the directory is a Python package, from which modules can be imported.
$ ls
_build_utils/ fft/ _lib/ odr/ spatial/
cluster/ fftpack/ linalg/ optimize/ special/
conftest.py __init__.py linalg.pxd optimize.pxd special.pxd
constants/ integrate/ meson.build setup.py stats/
datasets/ interpolate/ misc/ signal/
_distributor_init.py io/ ndimage/ sparse/
$ cd ndimage
$ ls
_filters.py __init__.py _measurements.py morphology.py src/
filters.py _interpolation.py measurements.py _ni_docstrings.py tests/
_fourier.py interpolation.py meson.build _ni_support.py utils/
fourier.py LICENSE.txt _morphology.py setup.py
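The same layout can be reproduced in miniature. The sketch below creates a toy package in a temporary directory and imports a submodule from it. The names toy_package, shapes and square_area are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: toy_package/__init__.py and toy_package/shapes.py
    package_dir = Path(tmp) / "toy_package"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")  # an empty file is enough
    (package_dir / "shapes.py").write_text(
        "def square_area(side):\n    return side * side\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Importing the submodule also imports the parent package.
        shapes = importlib.import_module("toy_package.shapes")
        area = shapes.square_area(3)
        package = sys.modules["toy_package"]
        is_package = hasattr(package, "__path__")  # packages carry __path__
    finally:
        sys.path.remove(tmp)
        for name in ("toy_package.shapes", "toy_package"):
            sys.modules.pop(name, None)

print(area, is_package)
```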
From Jupyter / IPython:
import scipy as sp
sp.__file__
'/Volumes/zorg/mb312/.virtualenvs/sp-lectures/lib/python3.12/site-packages/scipy/__init__.py'
sp.version.version
'1.15.2'
# Also available as sp.ndimage.binary_dilation?
help(sp.ndimage.binary_dilation)
Help on function binary_dilation in module scipy.ndimage._morphology:
binary_dilation(input, structure=None, iterations=1, mask=None, output=None, border_value=0, origin=0, brute_force=False, *, axes=None)
Multidimensional binary dilation with the given structuring element.
Parameters
----------
input : array_like
Binary array_like to be dilated. Non-zero (True) elements form
the subset to be dilated.
structure : array_like, optional
Structuring element used for the dilation. Non-zero elements are
considered True. If no structuring element is provided an element
is generated with a square connectivity equal to one.
iterations : int, optional
The dilation is repeated `iterations` times (one, by default).
If iterations is less than 1, the dilation is repeated until the
result does not change anymore. Only an integer of iterations is
accepted.
mask : array_like, optional
If a mask is given, only those elements with a True value at
the corresponding mask element are modified at each iteration.
output : ndarray, optional
Array of the same shape as input, into which the output is placed.
By default, a new array is created.
border_value : int (cast to 0 or 1), optional
Value at the border in the output array.
origin : int or tuple of ints, optional
Placement of the filter, by default 0.
brute_force : boolean, optional
Memory condition: if False, only the pixels whose value was changed in
the last iteration are tracked as candidates to be updated (dilated)
in the current iteration; if True all pixels are considered as
candidates for dilation, regardless of what happened in the previous
iteration. False by default.
axes : tuple of int or None
The axes over which to apply the filter. If None, `input` is filtered
along all axes. If an `origin` tuple is provided, its length must match
the number of axes.
Returns
-------
binary_dilation : ndarray of bools
Dilation of the input by the structuring element.
See Also
--------
grey_dilation, binary_erosion, binary_closing, binary_opening,
generate_binary_structure
Notes
-----
Dilation [1]_ is a mathematical morphology operation [2]_ that uses a
structuring element for expanding the shapes in an image. The binary
dilation of an image by a structuring element is the locus of the points
covered by the structuring element, when its center lies within the
non-zero points of the image.
References
----------
.. [1] https://en.wikipedia.org/wiki/Dilation_%28morphology%29
.. [2] https://en.wikipedia.org/wiki/Mathematical_morphology
Examples
--------
>>> from scipy import ndimage
>>> import numpy as np
>>> a = np.zeros((5, 5))
>>> a[2, 2] = 1
>>> a
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
>>> ndimage.binary_dilation(a)
array([[False, False, False, False, False],
[False, False, True, False, False],
[False, True, True, True, False],
[False, False, True, False, False],
[False, False, False, False, False]], dtype=bool)
>>> ndimage.binary_dilation(a).astype(a.dtype)
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 1., 1., 1., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0.]])
>>> # 3x3 structuring element with connectivity 1, used by default
>>> struct1 = ndimage.generate_binary_structure(2, 1)
>>> struct1
array([[False, True, False],
[ True, True, True],
[False, True, False]], dtype=bool)
>>> # 3x3 structuring element with connectivity 2
>>> struct2 = ndimage.generate_binary_structure(2, 2)
>>> struct2
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
>>> ndimage.binary_dilation(a, structure=struct1).astype(a.dtype)
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 1., 1., 1., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0.]])
>>> ndimage.binary_dilation(a, structure=struct2).astype(a.dtype)
array([[ 0., 0., 0., 0., 0.],
[ 0., 1., 1., 1., 0.],
[ 0., 1., 1., 1., 0.],
[ 0., 1., 1., 1., 0.],
[ 0., 0., 0., 0., 0.]])
>>> ndimage.binary_dilation(a, structure=struct1,\
... iterations=2).astype(a.dtype)
array([[ 0., 0., 1., 0., 0.],
[ 0., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 1.],
[ 0., 1., 1., 1., 0.],
[ 0., 0., 1., 0., 0.]])
Good practices#
Use meaningful object names
Indentation: no choice!
Tip
Indenting is compulsory in Python! Every command block following a colon bears an additional indentation level with respect to the previous line with a colon. One must therefore indent after def f(): or while:. At the end of such logical blocks, one decreases the indentation depth (and re-increases it if a new block is entered, etc.).
Strict respect of indentation is the price to pay for getting rid of { or ; characters that delineate logical blocks in other languages. Improper indentation leads to errors such as
------------------------------------------------------------
IndentationError: unexpected indent (test.py, line 2)
All this indentation business can be a bit confusing in the beginning. However, with the clear indentation, and in the absence of extra characters, the resulting code is very nice to read compared to other languages.
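An indentation error of this kind can be reproduced without creating a file, by compiling a badly indented snippet. In this sketch the source string is invented for the illustration and "test.py" is only a label passed to compile().

```python
# The second line is indented although no block was opened before it.
bad_source = "x = 1\n    y = 2\n"

try:
    compile(bad_source, "test.py", "exec")
    error_message = None
    error_line = None
except IndentationError as err:
    error_message = err.msg
    error_line = err.lineno

print(error_message, error_line)
```

The error is raised at compile time, before any line of the snippet runs.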
Indentation depth: Inside your text editor, you may choose to indent with any positive number of spaces (1, 2, 3, 4, …). However, it is considered good practice to indent with 4 spaces. You may configure your editor to map the Tab key to a 4-space indentation.
Style guidelines
Long lines: you should not write very long lines that span over more than (e.g.) 80 characters. Long lines can be broken with the \ character:
long_line = "Here is a very very long line \
that we break in two parts."
Spaces
Write well-spaced code: put whitespaces after commas, around arithmetic operators, etc.:
a = 1 # yes
a=1 # too cramped
A certain number of rules for writing “beautiful” code (and more importantly using the same conventions as anybody else!) are given in the Style Guide for Python Code.
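As one illustration of these conventions, the Style Guide prefers implied line continuation inside parentheses over the backslash. The sketch below compares the two spellings of the long string used earlier. The variable name long_line_parens is introduced here only for the comparison.

```python
# Backslash inside the string literal: the line break is dropped from the value.
long_line = "Here is a very very long line \
that we break in two parts."

# Implied continuation: adjacent string literals inside parentheses are joined.
long_line_parens = (
    "Here is a very very long line "
    "that we break in two parts."
)

same = long_line == long_line_parens
print(same)
```

The parenthesized form also lets the continuation line be indented freely, whereas any leading spaces after a backslash inside a string would become part of the value.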
Quick read
If you want to do a first quick pass through the Scientific Python Lectures to learn the ecosystem, you can directly skip to the next chapter: NumPy: creating and manipulating numerical data.
The remainder of this chapter is not necessary to follow the rest of the intro part. But be sure to come back and finish this chapter later.