dec2003.tar

Deploying Modules

Dale Southard

Under most current Unix/X11 systems, the user experience is primarily controlled by four groups of settings:

The user's choice of shell interpreter (csh, sh, bash, ...).
The setting of environment variables like $PATH, $MANPATH, $INFOPATH, $LD_LIBRARY_PATH, and $LM_LICENSEFILE.
Shell aliases and macros.
The X11 resources available through the X11 file search path or through xrdb.

Typically, these settings are statically configured by sourcing "rc" files located in system directories like /etc, or in the user's home directory when the user either logs in or launches a shell. For simple environments, this system works well in both standalone systems and moderate-sized networks.

Unfortunately, in larger, more complex environments that must serve a broader spectrum of users, a single statically configured environment can present problems for both users and systems administrators. Specifically, the following problems affect most large statically configured environments to varying degrees:

1. Applications are not "version controlled" -- Users are provided a single version of most common apps. If the app is upgraded, the newer version is both deployed (made available for use) and committed (made the default version for all users) at the same time. This means that updates generally require extensive testing prior to deployment to ensure that they do not adversely affect any existing users.

2. Central points of failure -- All application maintainers require modification privileges on the network "bin" directories and rc files in order to make their apps available to users. The larger the number of people working with the same directories and files, the greater both the opportunity for mistakes and the effect of those mistakes on the user community. For example, a single error in a centralized rc file can interrupt things for all users, even those whose workflow was otherwise independent of the application maintained by the person who made the error.

3. Namespace issues -- Software packages often contain applications that conflict with the names of applications in other packages. This can become confusing for users, since resolving the conflict often means renaming an application in ways that do not match the documentation. Worse yet, it is easy for application maintainers to overwrite each other's executables or shell variables when deploying new packages.

4. Shell dependence -- On well-configured systems, users are free to choose their shell interpreter based on personal preference. Unfortunately, many application maintainers, and several commercial applications, only support their application under a single shell or family of shells.

The modules package provides an answer to these problems by providing users with a dynamically reconfigurable environment that is independent of their choice of shells. Rather than installing all applications in a common space managed by a single rc file, the use of modules encourages installation of each version of each application in its own directory tree (see the sidebar "Installing Applications in Custom Locations"). It encapsulates the environment changes for each version of each application in a separate modulefile that the user can load, unload, or query using the module command.

The modules package itself is open source and available via the Internet. It was originally described in John L. Furlani's paper (LISA Conference, 1991) as a set of functions for csh and sh shells, but the modules package has become a standalone application with an embedded Tcl/TclX scripting language. It now supports most common shell interpreters, including csh, tcsh, sh, ksh, bash, and zsh, as well as Perl. Additional features include enhanced logging, modulefile tracing, and support of hierarchical modulefile organization. Recent releases also include more advanced configuration options and the ability to do apropos-style searches of the installed modulefiles.

This article will focus on the basics of using and configuring modules from the viewpoint of users, systems administrators, and application maintainers. The examples used in this article are from a system running modules 2.2.2.4 under IRIX, but were also tested against release 3.1.6 under Debian Linux. In the examples, the module files were installed in a directory tree located at /depot/modulefiles, but sites could easily use other locations.

Modules for Users

The core of the modules system is the module command interpreter. During shell initialization, a shell alias or macro that calls the module command is defined. When this macro is called, the module interpreter runs the specified modules sub-command, which in turn performs actions based on one or more modulefiles.

From a user's perspective, the most important functions of modules can be found in the following modules sub-commands:

The module list sub-command shows which modules are currently loaded in the user's shell.
The module load and module unload sub-commands are used to make a software package available for use, or to remove a package and its settings from the user's environment.
The module swap sub-command is equivalent to a module unload followed by a module load. It is typically used to switch between different versions of the same package.
The module avail sub-command lists all the modulefiles available for the user to load/unload/swap. This provides a mechanism for users to "discover" newly upgraded/installed software packages without resorting to email/Web/paper memos or lists.

The following transcript illustrates use of the avail, list, and swap modules commands in a simplified modules setup:

myhost%  module avail

--------- /depot/modulefiles ---------
MIPSpro            purify
MIPSpro/7.2.1.3    purify/2002.05.00
MIPSpro/7.3        totalview        
MIPSpro/7.3.1.1    totalview/4.1.0-1
MIPSpro/7.3.1.2    totalview/4.1.0-6
MIPSpro/7.3.1.3    totalview/5.0.0-1
modules            totalview/5.0.0-4
modules/2.2.2.4
       
myhost%  module list
Currently Loaded Modulefiles:
          1) modules        2) MIPSpro/7.3  

myhost%  cc -version
MIPSpro Compilers: Version 7.30

myhost%  module swap MIPSpro MIPSpro/7.3.1.2
Switching 'MIPSpro/7.3' to 'MIPSpro/7.3.1.2'...ok.

myhost% cc -version
MIPSpro Compilers: Version 7.3.1.2m

The modules package also provides mechanisms for users to incorporate their own collections of modulefiles, to set up the default modules they load at login, and to display what changes a modulefile will make to their environment. The newest release also allows searching through a whatis-style database for related modulefiles. These advanced features are beyond the scope of this introductory article.

Modules Initialization

Installing modules for use on a system requires initializing modules during startup of the user's shell and providing one or more modulefiles for loading/unloading/swapping. Initialization of modules is done by adding a stanza like the following to the appropriate cshrc or profile file:

#
# setup csh for modules and load/unload some modulefiles
#
if (-r /depot/modules/modules/init/csh) then
      source /depot/modules/modules/init/csh
      module load modules
      module load default
      module unload totalview
endif
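
The stanza above is for csh. For sh-family shells, an equivalent stanza might look like the following sketch. It assumes that the installation provides an init/sh file in the same directory as the init/csh file used above; that path is an assumption based on the csh example and should be adjusted to the local installation.

```shell
#
# setup sh for modules and load/unload some modulefiles
# (sketch; the init/sh path is assumed by analogy with init/csh)
#
if [ -r /depot/modules/modules/init/sh ]; then
      # read the definitions into the current shell
      . /depot/modules/modules/init/sh
      module load modules
      module load default
      module unload totalview
fi
```

Because the whole stanza is guarded by the readability test, a host without the init file simply skips it.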

When deploying modules, it is important to consider where modules will be initialized. Two obvious choices are the centralized rc file for each shell (e.g., /etc/profile for sh), or the user's home directory rc files (e.g., ~/.cshrc for csh).

Using the centralized rc files is somewhat easier to implement. It also provides a convenient hierarchy for customizing the modulefiles that users are given by default:

1. A network-wide "default" modulefile contains the modules sub-commands to load a standard set of modulefiles.

2. Each system can modify this default set of modulefiles via the addition of modules sub-commands to the same rc files where modules is initialized (e.g., in the above example the "totalview" application is unloaded after the default modulefile is loaded).

3. Individual users can add module sub-commands to their startup files to further customize the starting set of modulefiles in their account.

Unfortunately, using the centralized rc files for modules initialization comes with one huge disadvantage -- the centralized rc files are not sourced when a non-login shell is executed. That means that subshells launched from within applications like vi or emacs will not initialize modules and thus will not allow users to utilize modules commands. Even more confusing for advanced users, launching remote applications via rsh/ssh will also fail to initialize modules.

Finally, some module sub-commands, such as initadd and initrm, will not work correctly unless the modules initialization is present in the user's home directory rc files. None of these problems arises if the modules system is initialized from the rc files in each user's home directory.
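
One way to find out whether a particular shell picked up the initialization is to test whether the module command is defined in it. The following sh sketch does that; is_defined is a hypothetical helper name used only for illustration and is not part of the modules package.

```shell
# is_defined NAME: succeed if NAME is known to this shell as a
# command, function, builtin, or alias (hypothetical helper).
is_defined() {
    type "$1" >/dev/null 2>&1
}

if is_defined module; then
    echo "module is available in this shell"
else
    echo "module is NOT available; modules was not initialized here"
fi
```

Running the same check in a subshell started from an editor, or in a command run on a remote host, shows whether that non-login shell was initialized.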

Note that it is possible (though not recommended) to initialize modules in both the centralized and home directory rc files. In fact, I've used such a setup for several years without problems.

Modules Modulefiles

The modules system uses one modulefile for each version of each software package that will be managed by modules. The modulefiles themselves contain the commands necessary for making an application available to users, including modifying the $PATH or other environment variables, setting aliases, and making resources available to X11. In version 2 and later, these modulefiles are written in the Tcl scripting language and begin with the string "#%Module". The modules system has provided several extensions to Tcl that are specific to managing user environments:

setenv -- Sets an environment variable in the shell.

prepend-path -- Adds a value at the beginning of a colon-separated list.

append-path -- Adds a value at the end of a colon-separated list.

set-alias -- Creates an alias using sh-style args.

x-resource -- Merges a resource into the X11 resource db.

module -- Allows a modulefile to execute other modules sub-commands.

uname -- Provides access to host info on the target system.

The first four commands perform an inverse of their normal function when called by a module unload sub-command. Most modulefiles contain fewer than a dozen lines of code. For example, the modulefile below would suffice for controlling an installation of XEmacs:

#%Module
#
#   xemacs 21.4.13 modulefile
#
########## standard defs ##########
set sys     [uname sysname]
set os      [uname release]
set mach    [uname machine]
##### program specific stuff ######
set root                "/depot/emacs/xemacs-21.4.13"
prepend-path  PATH      $root/bin
prepend-path  MANPATH   $root/man
prepend-path  INFOPATH  $root/lib/xemacs/info
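
What prepend-path does on load, and what its inverse does on unload, can be approximated in plain sh. The sketch below only illustrates the idea and is not how the modules package is implemented. The names prepend_path and remove_path are hypothetical helpers, and the starting value /usr/bin:/bin is an assumed example path.

```shell
# Hypothetical helpers, not part of the modules package.

# prepend_path LIST DIR: print LIST with DIR added at the front.
prepend_path() {
    if [ -n "$1" ]; then
        printf '%s\n' "$2:$1"
    else
        printf '%s\n' "$2"
    fi
}

# remove_path LIST DIR: print LIST with every entry equal to DIR removed.
# The body runs in a subshell so the IFS and set -f changes stay local.
remove_path() (
    IFS=:
    set -f              # no filename expansion while splitting
    result=
    for entry in $1; do
        [ "$entry" = "$2" ] && continue
        result=${result:+$result:}$entry
    done
    printf '%s\n' "$result"
)

root=/depot/emacs/xemacs-21.4.13
before=/usr/bin:/bin                               # assumed starting path
loaded=$(prepend_path "$before" "$root/bin")       # roughly "module load"
unloaded=$(remove_path "$loaded" "$root/bin")      # roughly "module unload"
echo "loaded:   $loaded"
echo "unloaded: $unloaded"
```

After the simulated unload, the list is back to its starting value, which is the property that lets users load and unload packages repeatedly without accumulating stale entries.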

The modulefiles are usually gathered into directory trees that are grouped by application. Within these directories, the modulefiles are generally named using the version of the application. For example:

/depot/modulefiles/totalview:
  4.1.0-1
  4.1.0-6
  5.0.0-1
  5.0.0-4
/depot/modulefiles/MIPSpro:
  7.2.1.3
  7.3
  7.3.1.1
  7.3.1.2
  7.3.1.3

With a structure like this, modulefiles can be referred to as "application/version". For example, the command module load totalview/5.0.0-1 would load the modulefile at /depot/modulefiles/totalview/5.0.0-1.

If the modulefile is not fully qualified (e.g., module load totalview), modules will load the lexicographically highest modulefile name (in our example, /depot/modulefiles/totalview/5.0.0-4). If desired, a ".version" file can be used to override the lexicographical sort. For example, if the following .version file were in the /depot/modulefiles/totalview directory listed above, the command module load totalview would default to the totalview/4.1.0-6 modulefile rather than the lexicographically higher totalview/5.0.0-4 modulefile:

#%Module
set ModulesVersion 4.1.0-6

Note that the ModulesVersion variable names the modulefile version to use as the default within the hierarchy rooted in the directory where the .version file was found. It does not refer to the version of the modules package that the user is running.
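
The "lexicographically highest" rule for unqualified names can be mimicked with standard tools. The sh sketch below is an illustration only, not the package's own selection code. It sorts the totalview modulefile names from the earlier listing in byte order and takes the last one.

```shell
# Modulefile names from the totalview directory shown earlier.
versions='4.1.0-1
4.1.0-6
5.0.0-1
5.0.0-4'

# Pick the lexicographically highest name (byte order, via LC_ALL=C).
default=$(printf '%s\n' "$versions" | LC_ALL=C sort | tail -n 1)
echo "default: totalview/$default"

# Lexicographic order is not numeric order: 5.10 sorts below 5.9.
highest=$(printf '%s\n' 5.9 5.10 | LC_ALL=C sort | tail -n 1)
echo "highest of 5.9 and 5.10: $highest"
```

The second comparison shows one reason a .version file can be useful: if the comparison is purely lexicographic, as described above, a release numbered 5.10 would not be picked ahead of 5.9 without one.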

Modules in Practice

Once configured, the modules system provides a mechanism to address the issues mentioned in the introduction to this article:

1. Applications are "version controlled". Users can switch between installed versions on the fly. In practice, this vastly simplifies the process of deploying new software for sys admins. New packages are simply installed and a new modulefile is created. "Friendly users" can test the cutting edge versions at their leisure while those in the trenches can continue to access older versions as long as needed. No one is ever tied to a "default" version.

2. Application maintainers no longer need modification privileges on common directories and rc files in order to maintain software. Instead, they are given two directories: a directory in which to install their software releases (e.g., /depot/xemacs), and a directory in which to install their modulefiles (e.g., /depot/modulefiles/xemacs). Since these directories are not shared with other maintainers, the impact of their activities is limited to the applications they are maintaining. Other software packages are unaffected by any errors that may occur.

3. Namespace issues are under the control of the user. Because users can control which modules are loaded and in what order, name conflicts are either eliminated entirely or can be unambiguously controlled.

4. Modules are shell-independent. A single modulefile will work for csh, tcsh, sh, ksh, bash, zsh, and Perl users.

Beyond solving these problems, the modules package also provides a more powerful environment for users and developers. Here are some examples of setups I've used:

Providing the environment for the ubiquitous /usr/local tree as a modulefile allows developers to module unload it prior to compiling software that shouldn't be linked to non-system libraries.
Providing a GNU modulefile that prepends the common GNU commands to the head of the user's command search path. This provides an easy route for GNU/Linux users to experience a familiar environment even when they are working on a non-GNU system like Solaris or AIX.
Providing "meta modulefiles" that use multiple module commands to load customized environments for different tasks like development, graphic design, quantum chemistry, or CAD/CAM work.
Providing multiple, highly customized versions of the same package to better meet the needs of specific researchers or groups. This is a common need in quantum chemistry and other disciplines where researchers need specialized versions of common software to handle the particulars of their project.

Additionally, the modules system forms the underpinnings of the OSCAR Linux clustering effort's "env-switcher" application. The combination of modules plus env-switcher allows users to access all the power of the underlying modules system for on-the-fly environment changes, as well as providing a mechanism for user-controlled environment changes that are persistent across logins and ssh/rsh remote program invocations.

From my experience, the single largest benefit of modules is the time the system saves. Because modulefiles allow deploying applications without committing the entire user base to using them, tasks that previously required extensive planning, testing, and administrative overhead to prevent versioning conflicts can instead be solved immediately by simply installing the software in parallel under modules control. This is appreciated by the users as well as the systems administrators and application maintainers.

References

http://modules.sourceforge.net

http://hpcf.nersc.gov/software/os/modules.html

http://env-switcher.sourceforge.net/

Dale Southard is currently a sys admin with the Advanced Simulation and Computing VIEWS project at Lawrence Livermore National Laboratory in Livermore, California. This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48.