CASINO frequently-asked questions
=================================

This file contains the answers to questions that have been asked often about the CASINO quantum Monte Carlo software, and also to questions that are useful for the potentially confused, even if they have not been asked yet.

CASINO web page: www.tcm.phy.cam.ac.uk/~mdt26/casino.html

A. INSTALLING CASINO
--------------------

A1. What do I need for installing CASINO?

You'll need a Fortran 90 compiler, a UNIX environment with the bash shell and the [t]csh shell, and, if you have a parallel machine, an MPI library. Optionally, if you plan to use the provided plotting utilities, you should install xmgrace and gnuplot.

These dependencies can be installed automatically by the 'install' script, so you can go straight to A2 below. Keep reading if you would rather do this by hand. Here are full setup commands for popular Linux distributions, which you can copy and paste into your terminal:

* Ubuntu 10.04+, Linux Mint 9+:

    sudo apt-get install make gcc gfortran g++ tcsh openmpi-bin \
     libopenmpi-dev grace gnuplot

* Debian Lenny+ (5.0+):

    su -c "apt-get install make gcc gfortran g++ tcsh openmpi-bin \
     libopenmpi-dev grace gnuplot"

* Fedora 9+, CentOS:

    su -c "yum install make gcc gcc-gfortran gcc-c++ tcsh openmpi \
     openmpi-devel grace gnuplot"

* openSUSE 11.3+:

    sudo zypper install make gcc gcc-fortran gcc-c++ tcsh openmpi \
     openmpi-devel xmgrace gnuplot

* Mandriva 2010.2+:

    su -c "gurpmi make gcc gcc-gfortran gcc-c++ tcsh openmpi grace gnuplot"

* Gentoo:

    su -c "emerge make gcc tcsh openmpi grace sci-visualization/gnuplot"

* Arch Linux:

    su -c "pacman -S make gcc gcc-fortran tcsh openmpi grace gnuplot"

* Slackware: this distribution has no official package manager with automated dependency resolution. Slackware users, have fun. Both of you.

Notes:

- For openSUSE (up to at least version 11.3), after installing the packages above and before trying to compile the code, you may need to log out and back in for the changes to take effect.

- As a bleeding-edge rolling-release distro, Arch Linux as of May 2011 is the first to hit a compilation problem with the default gfortran 4.6.0. This will get fixed eventually on gfortran's side, or worked around on ours.

- Installing OpenMPI under Ubuntu 10.10 and 11.04 will pull in a package called blcr-dkms as a 'recommends', but it does not compile against the Linux kernel versions distributed in either of these releases. Ignore the error message after installing the above packages, and run

    sudo apt-get remove --purge blcr-dkms

  to remove the problematic package. It is not needed for normal operation. Ubuntu bug: https://bugs.launchpad.net/bugs/700036.

After this, go to A2 below. Your CASINO_ARCH is 'linuxpc-gcc' for the non-parallel version and 'linuxpc-gcc-parallel' for the parallel one. Fedora 12 and later versions on multi-core systems should use 'linuxpc-gcc-parallel.fedora' instead.

A2. How do I install CASINO?

Change into the CASINO directory and type:

    ./install

Then follow the prompts. This script helps you find the correct CASINO_ARCH for your machine, or to create a new one if you need to. The script allows you to configure several CASINO_ARCHs for the same installation, e.g., allowing you to keep binaries from different compilers, and to share the installation among machines with different architectures. The script creates a ~/.bashrc.casino file, which is loaded on login and defines a function called casinoarch, so that at any point you can type 'casinoarch' at the prompt to switch to whichever of the configured CASINO_ARCHs you want to use.
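For example, a first-time setup session might look like the following (a minimal sketch only; the directory and CASINO_ARCH names are those used elsewhere in this FAQ, so adapt them to your own machine):

    cd CASINO
    ./install                  # follow the prompts to choose or create a CASINO_ARCH
    source ~/.bashrc.casino    # or simply log out and back in
    casinoarch                 # switch between the CASINO_ARCHs you have configured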
You can run the install script as many times as you like, to do things like adding/removing CASINO_ARCHs or reordering them by preference. Should the abilities of the script fail to satisfy your needs, you can always define a CASINO_ARCH by creating the file CASINO/arch/data/<CASINO_ARCH>.arch. The syntax of .arch files is explained in CASINO/arch/README and in Appendix 5 of the manual.

Many more details of this procedure are given in the 'Installation' section of the CASINO manual.

A3. I use a PC or PC-based parallel system. Which Fortran 90 compiler should I use?

In our tests the Intel Fortran compiler (ifort v11) generates the fastest executables for x86-based processors (NB, the PathScale Fortran compiler is tied in speed with ifort on AMD processors). The Gnu GCC gfortran compiler is free, widely available and reasonably fast.

This is a sample set of benchmarks run in March 2007 (too long ago):

    ---------------------
    Compiler        Score
    ---------------------
    g95 0.91          42%
    af90 10.0.2       91%
    gcc 4.3           93%
    ifort 9.1        100%
    path 2.9.99      103%
    ---------------------

---------------------------------------------------------------------
Here is another one, run on 16 processors of a Cray XK6 system (Jaguar) in December 2011 (MDT). Again, Intel ifort wins, with Gnu gfortran now a close second. The test was examples/crystal/blips/silicon with the following keywords changed:

    vmc_nstep         : 800    #*! Number of steps (Integer)
    vmc_nconfig_write : 800    #*! Number of configs to write (Integer)
    dmc_equil_nstep   : 20     #*! Number of steps (Integer)
    dmc_stats_nstep   : 20     #*! Number of steps (Integer)
    dmc_target_weight : 800.d0 #*! Total target weight in DMC (Real)

                  Total CASINO CPU time       DMC energy
    Ifort     :   54.6394 seconds    -63.253019810592 +/- 0.007612017376
    Gnu       :   62.6400 seconds    -63.253019816018 +/- 0.007612049672
    Pathscale :   67.8962 seconds    -63.259034999613 +/- 0.011000738147
    PGF       :   72.9000 seconds    -63.245897141026 +/- 0.009693089487
    Cray      :   84.0300 seconds    -63.192701255463 +/- 0.006393938977

NB: The pathscale compiler is being deprecated on Crays and is no longer supported on Jaguar (as of June 2012).
---------------------------------------------------------------------
Here is another set of runs, for H on graphene (MDT 1.2013), on a Cray XK7 (Titan).

    VMC
    Gnu      79.65 seconds   -282.890716861446 +/- 0.160222570613
    PGF      83.74 seconds   -282.895599950580 +/- 0.121131726135
    Ifort    84.48 seconds   -282.970112923443 +/- 0.128262416613
    Cray     87.28 seconds   -282.890716855184 +/- 0.160222562116

    DMC
    Gnu     397.20 seconds   -284.256578870330 +/- 0.048408651743
    Cray    456.63 seconds   -284.256578740638 +/- 0.048408642717
    Ifort   477.75 seconds   -284.252588591933 +/- 0.117686071014
    PGF     490.10 seconds   -283.915347739145 +/- 0.077180929508

Moral: use the GNU compiler (linuxpc-gcc-pbs-parallel.titan).

Note: the GNU and Cray answers essentially agree with each other, but Ifort (VMC) and PGF (DMC) are giving significantly different answers. This needs to be investigated.
----------------------------------------------------------------------

A4. What if I want to link to my hardware-optimized BLAS and LAPACK libraries to improve the performance of CASINO?

Use the install script to create a new CASINO_ARCH. If you choose to edit the compiler options when prompted, you will be offered the choice of enabling external BLAS and LAPACK libraries. You will need to provide the linker flags required to link to these libraries (this depends on your setup; ask your system administrator or read your library's documentation).
For example, the Intel MKL is linked (at least in TCM) by setting:

    BLAS:   -lmkl -lguide -lpthread
    LAPACK: -lmkl_lapack -lmkl -lguide -lpthread

NB, CASINO does not make particularly intensive use of linear algebra routines, so expect the improvements to be minor. In fact, early tests seemed to show that CASINO was *slower* when the MKL library was linked instead of the generic library we provide. This may or may not be true in your case, but we recommend doing a couple of tests before switching libraries.

A5. I want to run CASINO on my multi-processor PC, which I administer myself. How do I set that up? How do I set up multiple compilers to work with the MPI library?

Recent Linux distributions offer ready-to-use MPI-enabled compilers. Installing the relevant packages should provide you with an 'mpif90' executable, an 'mpirun' launcher, libraries etc., and you would not need to go through the instructions below (see A1 above instead). If you want a different compiler, or if your distribution does not provide said package(s), follow these instructions:

1] Install the Fortran compilers you want to use (ifort, g95, gfortran, etc.).

2] Download OpenMPI from www.open-mpi.org. (NB, we use OpenMPI in this example, but you can choose any other MPI implementation so long as it fully supports MPI-2 [LAM/MPI can *not* be used].)

3] Extract the archive and change into the newly created directory.

4] Choose a directory for the installation. We will refer to this directory as <basedir> (e.g. /opt/openmpi).

5] For each compiler you want OpenMPI to work with, let:

     <F90>  = name of the Fortran compiler binary,
     <F77>  = name of the Fortran 77 compiler (may be the same as <F90>),
     <CC>   = name of the C compiler,
     <CXX>  = name of the C++ compiler, and
     <name> = name under which we will refer to this setup
     (e.g., <F90>=<F77>=<name>=ifort, <CC>=icc, <CXX>=icpc).

   Configure OpenMPI with:

     ./configure CC=<CC> CXX=<CXX> FC=<F90> F77=<F77> --prefix=<basedir>/<name>

   Then run:

     make all
     sudo make install
     make clean

   Repeat this step with the next compiler.

6] Add <basedir>/<name>/bin to your PATH, and <basedir>/<name>/lib to your LD_LIBRARY_PATH. Use the compiler as mpif90 / mpicc.

7] Refer to A2 above to set up CASINO for this compiler.

8] To switch compilers, change your PATH, LD_LIBRARY_PATH and CASINO_ARCH. Simple bash functions (which can be put in your .bashrc file) can be written to simplify this task (see the sketch below this list).
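As an illustration of step 8, here is a minimal sketch of such bash functions. The installation paths and CASINO_ARCH names below are examples only (they assume OpenMPI builds installed under /opt/openmpi/ifort and /opt/openmpi/gfortran); substitute the ones you actually use:

    use_ifort() {
      # prepend the ifort-built OpenMPI to the search paths
      export PATH=/opt/openmpi/ifort/bin:$PATH
      export LD_LIBRARY_PATH=/opt/openmpi/ifort/lib:$LD_LIBRARY_PATH
      export CASINO_ARCH=linuxpc-ifort-parallel   # example CASINO_ARCH name
    }

    use_gfortran() {
      # prepend the gfortran-built OpenMPI to the search paths
      export PATH=/opt/openmpi/gfortran/bin:$PATH
      export LD_LIBRARY_PATH=/opt/openmpi/gfortran/lib:$LD_LIBRARY_PATH
      export CASINO_ARCH=linuxpc-gcc-parallel     # example CASINO_ARCH name
    }

Put these in your .bashrc and type 'use_ifort' or 'use_gfortran' before compiling or running.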
A6. Should I install CASINO as root? Should I modify the *rc files under /etc?

Short answer: no. Long answer: you shouldn't unless you know what you're doing. CASINO is designed to be installed under the user's home directory. If you want to do a system-wide installation we can provide no help, as we haven't ever done this. A future version of CASINO will support this.

A7. I have compilation/setup problems. What do I do?

Contact your system administrator to check you have done the right setup. If you have, contact the CASINO developers.

A8. Compilation problems on the archaic CASINO_ARCH=ibm_sp3.*:

a) In the linking stage, I get something like:

    1586-346 (S) An error occurred during code generation. The code
    generation return code was -1.
    ld: 0706-005 Cannot find or open file: /tmp/ipajR8xEe
    ld:open(): No such file or directory
    1586-347 (S) An error occurred during linking of the object produced
    by the IPA Link step. The link return code was 255.

Your machine probably has a small CPU-time limit for interactive commands set by default. Type 'limit cputime 10:00' (or a larger value) and type 'make' again.

b) Why does the utilities compilation die on an IBM SP3 sometimes?

If you have a 'TMPDIR' environment variable set, unset it. The compiler uses it and may be confused by it already having a value.

A9. Compilation problems on the ancient CASINO_ARCH=origin.*:

a) Why do I get the error message "cannot fork: too many processes"?

Use something like 'jlimit processes 24' to increase the maximum number of concurrent processes.

A10. How do I install xmgrace on my machine?

If you can get root privileges and xmgrace is available for your distribution in .rpm or .deb form, go for that. The following are instructions for installing xmgrace as a non-root user when Motif-compatible libraries are not available:

1] Get the SOURCES of grace (http://plasma-gate.weizmann.ac.il/Grace/) and lesstif (http://www.lesstif.org/), and untar them in a temporary directory.

2] Go into the lesstif directory and install lesstif by issuing:

    $ ./configure --prefix $HOME/misc/lesstif
    $ make all install

3] Go into the grace directory and install it by issuing:

    $ ./configure --prefix $HOME/misc \
        --with-extra-incpath="$HOME/misc/lesstif/include:/usr/X11R6/include" \
        --with-extra-ldpath="$HOME/misc/lesstif/lib:/usr/X11R6/lib:/usr/X11R6/lib64" \
        --with-motif-library="-lXm -lXt -lXft -lXrender -lXext -lX11"
    $ make ; make install

4] Then link $HOME/misc/grace/bin/xmgrace into your path (e.g., into $HOME/bin) for convenience.

You should be able to start xmgrace by typing 'xmgrace' at the command prompt.

A11. How do I set up CASINO under Windows?

The following instructions may be out of date.

1] Install cygwin.

2] Install the MS Platform SDK and MS Visual C Studio (versions of which are free(-as-in-beer)ly available from Microsoft); the ifort installer will complain if an appropriate version of either of these is not present.

3] Install the ifort compiler for Windows (notice that there is no non-commercial version of ifort for Windows, but there is a 30-day trial).

4] Neither the SDK nor the C Studio will set their environment variables by default (ifort does so on demand; ask the installer to do it when prompted). To set the missing variables by hand, go to System Properties [e.g., right-click on My Computer > Properties] > Advanced tab > Environment variables button. Add whatever variables/values are set in the vsvars32.bat script, which is somewhere in your Visual Studio directory (try Program Files\Microsoft Visual Studio 8\Common7\Tools). Cygwin should then import and translate the variables on startup, and everything should go smoothly from there on, as far as tested.

5] Set up the environment as usual; your CASINO_ARCH is windowspc-ifort.

6] Cross your fingers and compile CASINO.

Note that ifort-compiled binaries will ignore the Cygwin-provided UNIX layer, which leads, e.g., to symlinks not being understood, so be patient. Also note that we do not support this setup. You are on your own. However, we will be happy to amend these instructions if you find a mistake or if you'd like to tell us how you built an MPI-enabled version.

A12. When compiling with the pathscale compiler on a Cray system, I get huge numbers of warning messages saying "the use of 'mktemp' is dangerous, better use 'mkstemp'". How do I stop this?

This warning message is generated at link time when the '--whole-archive' option and static linking of the SuSE Linux-provided libpthread.a library are used. The warning message will not occur if the program is linked dynamically, and since that is what the general Linux community does, it is unlikely that SuSE will address this anytime soon. For all compilers except PathScale, Cray was able to remove these options and replace them with others.
Cray has added the following to the Cray Application Developer's Environment User's Guide (S-2396), section 4.7.2 'Known Warnings':

    Code compiled using the options --whole-archive,-lpthread gets the
    following warning message, issued by libpthread.a(sem_open.o):

      warning: the use of 'mktemp' is dangerous, better use 'mkstemp'

    The --whole-archive option is necessary to avoid a runtime segmentation
    fault when using OpenMP libraries. This warning can be safely ignored.

In short, there is nothing you/we/Cray can do about this until SuSE addresses it.

A13. When compiling with the pathscale compiler, use of the Fortran TINY function triggers warnings similar to the following:

    File = /home/billy/CASINO/src/nl2sol.f90, Line = 58, Column = 12
    This numeric constant is out of range.

This appears to be a harmless pathscale/glibc bug - ignore it.

A14. When compiling on various Cray machines, e.g. with gfortran, I get lots of messages like this:

    mkdir: cannot create directory `/tmp/1362687766.mdt26.altd': File exists

This happens when using make in parallel (which the install script does automatically using 'make -j') and is the result of some harmless bug in the local software. It does not affect the end result of the compilation. I have reported it to User Support on Titan (Feb 2013) and they have promised to fix it.

B. USING CASINO
---------------

B1. I have set all the parameters in the Jastrow factor to zero, but the VMC energy that I get differs from the expected (Hartree-Fock) energy. What's wrong?

To remove the Jastrow factor, you need to *delete* all parameters (except the cutoffs) from all Jastrow terms in the correlation.data file. CASINO will apply the cusp conditions on the Jastrow parameters (=> non-zero alpha_1 parameter) if any parameter is provided in the file, even if it's zero. Alternatively, you can set USE_JASTROW=F in the input file, provided you do not want to optimize the Jastrow parameters in this run.

B2. What does keyword xxx mean? What values can it take? What structure does block-keyword xxx have?

Type 'casinohelp xxx', or 'casinohelp search xxx', or 'casinohelp all'. The 'casinohelp' utility is very useful, and is the most up-to-date keyword reference available -- often more so than the CASINO manual.

B3. My machine/cluster crashed when I tried to optimize a Slater-Jastrow-backflow wave function with CASINO v2.1 or earlier. How can this happen?

Your compiler/OS is not correctly handling over-sized memory allocation; CASINO should exit with an 'Allocation problem' error rather than crash the machine. CASINO v2.2 solves this problem.

B4. Jastrow optimization consistently fails for an electron-hole bilayer; the VMC energies are all over the place. What's wrong?

This is likely to be the case when using fluid orbitals in an excitonic regime, where the HF configurations appear not to be very good for optimizing Jastrow parameters. Try setting OPT_MAXITER=1 and OPT_METHOD=madmin, which will run single-iteration NL2SOL optimizations. It may take a few VMC-madmin cycles to reach sensible energies, after which you can use the default OPT_MAXITER (=20) to get to the minimum more quickly.

B5. I've done VMC + varmin + VMC, and the energy (or variance) of the second VMC calculation is greater than the final 'mean energy' reported during varmin. Does this mean that the optimization has failed?

Short answer: no. You should compare the VMC energies and variances and ignore what varmin tells you. Optimization has failed only if the energy of the second VMC run is significantly greater than that of the first VMC run.
Long answer: the 'unreweighted variance' is a good target function to minimize in order to lower the TRUE energy (that of the later VMC run), but the values it takes are of no physical significance. This often applies to the 'reweighted variance' as well. Note that the initial value of the two target functions will be the true variance of the first VMC run (or the true variance of a subset of the configurations if you set VMC_NCONFIG_WRITE < VMC_NSTEP, which is usually the case). The same applies to the 'mean energy' reported in varmin.

B6. I've run a calculation on my PC and the reported time per block oscillates wildly by factors of 2--10. Why is this? OR I've run a calculation on my PC and the reported CPU time is 2--10 times longer than I expected. Why is this?

Your Linux computer is dynamically changing the processor frequency to save power. It should set the frequency to the maximum when you run CASINO, but by default it ignores processes with a nice value greater than zero (CASINO's default is +15). To fix this, supply the '--user.nice=0' option to runqmc. You will see wild fluctuations in the block timings only if some other process triggers a change in the frequency; otherwise you will only notice slowness. This problem should not appear in modern Linux distributions (from 2008 onwards).

Other than this, block times are bound to oscillate, since during the course of the simulation particles are moved in and out of (orbital/Jastrow/backflow) cutoff regions, which increases or reduces the expense of calculating things for particular configurations. However, provided the blocks contain a sufficient number of moves, the block times should be equally long on average.

B7. What is the 'Kinetic energy check' that appears after VMC equilibration? Why does it say that the gradient or Laplacian is 'poor'? What does it mean if it fails?

In this check, CASINO computes the numerical gradient and Laplacian of the wave function at a VMC-equilibrated configuration by finite differences with respect to the position of an electron. The results are compared against the analytic gradient and Laplacian, which are used in the computation of kinetic energies all over CASINO. CASINO reports the degree of accuracy to which the numerical and analytical derivatives agree using four levels: optimal, good, poor and bad. These correspond to relative differences of <1E-7, <1E-5, <1E-3 and >1E-3, respectively. If the accuracy is 'bad', the test reports a failure, and the analytical and numerical gradient and Laplacian are printed for debugging purposes.

This check should detect any inconsistencies between the coded expressions for the values, gradients and Laplacians of orbitals, Jastrow terms and backflow terms, as well as inconsistencies in the process of calculating kinetic energies and wave-function ratios. Therefore it's reassuring to see that a given wave function passes the kinetic energy test. However, there are cases where the results from the test should not be taken too seriously:

- The thresholds defining the optimal/good/poor/bad levels are arbitrary. For small systems they seem to be a good partition (one usually gets good or optimal), but for large systems the procedure is bound to become more numerically unstable and the thresholds may not be appropriate. Thus a 'poor' gradient or Laplacian need not be signalling an error.

- For ill-behaved wave functions (e.g. after an unsuccessful optimization) it is not uncommon for the check to report a failure ('bad' level).
  This is not a bug in the code; you'll just need to try harder at optimizing the wave function.

B8. To use CASINO in shared memory mode on a Blue Gene system I need to set (at runtime) an environment variable BG_SHAREDMEMPOOLSIZE (Blue Gene/P) or BG_SHAREDMEMSIZE (Blue Gene/Q) to the size of the shared memory partition in Mb. How do I do this, why do I need to do it, and what value should I set it to?

This variable may be set by using the --user.shmemsize option to the runqmc script (exactly which environment variable this defines is set in the appropriate .arch file for the machine in question). Note that on Blue Gene/Ps the default for this is zero (i.e. the user always needs to set it explicitly), whereas on Blue Gene/Qs the default is something like 32-64 Mb and depends on the number of cores per node requested -- thus for very small jobs on Blue Gene/Qs you don't need to set it explicitly.

An example of such a machine is Intrepid (a Blue Gene/P at Argonne National Lab), and you can look in the following file to see exactly what is done with the value of --user.shmemsize:

    CASINO/arch/data/bluegene-xlf-cobalt-parallel.intrepid.arch

For a Blue Gene/Q you can look in:

    CASINO/arch/data/bluegene-xlf-ll-parallel.bluejoule.arch
    CASINO/arch/data/bluegene-xlf-ll-parallel.mira.arch
    CASINO/arch/data/bluegene-xlf-ll-parallel.cetus.arch
    CASINO/arch/data/bluegene-xlf-ll-parallel.vesta.arch

Let us use the Intrepid Blue Gene/P as an example. Nodes on Intrepid have 4 cores and 2Gb of available memory, and the machine can run in 3 different "modes":

    SMP  - 1 process per node, which can use all 2Gb of memory.
    DUAL - 2 processes per node, each of which can use 1Gb of memory.
    VN   - 4 processes per node, each of which can use 512Mb of memory.

Using shared memory in SMP mode doesn't seem to work (any job run in this way is 'Killed with signal 11'), presumably due to an IBM bug. (One might consider using OpenMP to split the configs over the 4 cores you would run in SMP mode and have multiple --tpp threads.)

So - taking VN mode as an example - we would like to be able to allocate (say) 1.5Gb of blip coefficients on the node, and for all four cores to have access to this single copy of the data, which is the point of shared memory runs. Such a calculation would be impossible if all 4 cores had to have a separate copy of the data.

Unfortunately, on Intrepid, the user needs to know how much of the 2Gb will be taken up by shared memory allocations. This is easy enough, since the only vector which is 'shallocked' is the vector of blip coefficients. This will be a little bit smaller than the size of the binary blip file, which you can work out with e.g. 'du -sh bwfn.data.bin'. On a machine with limited memory like this one, it will pay to use the 'sp_blips' option in input, and to use a smaller plane-wave cutoff if you can get away with it.

Thus if your blip vector is 1.2Gb in size, then run the code with something like:

    runqmc --shmem=4 --ppn=4 --user.shmemsize=1250 --walltime=2h30m

where --shmem indicates that we want to share memory over 4 cores, --ppn ('processes per node') indicates that we want to use VN mode, and walltime is the maximum job time. Note that the value for --user.shmemsize is in Mb.

A technical explanation for this behaviour (from a sysadmin) might run as follows: "The reason why the pool allocation needs to be up-front is that when the node boots it sets up the TLB to divide the memory space corresponding to the mode (SMP, DUAL, or VN).
If you look at Figure 5-3 in the Apps Red Book (see http://www.redbooks.ibm.com/abstracts/sg247287.html?Open), you'll see how the (shared) kernel and shared memory pool are laid out first, then the remainder of the node's memory is split in two for DUAL mode. Diagram 5-2, I believe, is erroneous and should look more like the one for DUAL, except with 4 processes. With this scheme, it is impossible to grow the pool dynamically unless you fixed each process's memory instead (probably more of a burden). In theory, in VN mode you should be able to allocate a pool that is 2GB - kernel (~10MB IIRC) - RO program text - 4 * RW process_size. There is a limitation that the TLB has not that many slots, and the pages referenced there can only have certain sizes: 1MB, 16MB, 256MB or 1GB. (There is some evidence that 1GB may only be used in SMP mode.) So depending on the size of the pool you ask for, it may take more than one slot, and there is the possibility of running out of slots. We don't know whether or not the pool size is padded in any way. So one thing you could try is to increase it slightly to 3*256MB."

How much memory can you shallocate in general? In practice I (MDT) find on Intrepid that any attempt to set BG_SHAREDMEMPOOLSIZE to be greater than 1800 Mb results in an "Insufficient memory to start application" error. Values up to that should be fine. Sometimes with larger values (from 1500 Mb to 1800 Mb) one sees the following error:

    * Out of memory in file /bghome/bgbuild/V1R4M2_200_2010-100508P/ppc/bgp/comm/lib/dev/mpich2/src/mpi/romio/adio/ad_bgl/ad_bgl_wrcoll.c, line 498

but this doesn't seem to affect the answer. I prefer Cray machines.

B9. Are there any other special considerations for running on Blue Gene/Q machines apart from the shared memory thing described in question B8?

Yes. If you do a direct comparison between the speeds of, say, a Cray XE6 and a Blue Gene/Q on the same number of cores, you will find that the Blue Gene is significantly slower (4 times slower in the test that I - MDT - did). What can we do about this?

With a bit of digging you find that BG/Qs are *supposed* to be run with multiple threads per core, since you simply don't get full instruction throughput with 1 thread per core. It's not that more threads help; it's that fewer than 2 threads per core is like running with only one leg. You just can't compare a BG/Q to anything unless you're running on at least 2 hardware threads per core. Most applications max out at 3 or 4 per core (and CASINO maxes out at 4 - see below). BG/Q is a completely in-order, single-issue-per-hardware-thread core, whereas x86 cores are usually multi-issue.

Here are some timings for a short VMC run on a large system (TiO2 - 648 electrons) on 512 cores:

    Hector - Cray XE6     :  55.10 sec
    Vesta  - Blue Gene/Q  : 222.19 sec

Now try 'overloading' each core on the BG/Q
-------------------------------------------
It can take up to 4 threads/core (in powers of 2):

    --ppn=16   222.19 sec
    --ppn=32   150.02 sec
    --ppn=48   "48 is not a valid ranks per node value"
    --ppn=64   138.44 sec

So by using 512 cores to run 2048 MPI processes, we improve to ~2.5 times slower than Hector (512 cores running 512 processes) rather than 4 times slower as before. Note that the number of processes per core has to be a power of 2 (i.e. not 3) - unless you do special tricks.
Now try OpenMP on the cores instead of extra MPI processes (OpenmpShm mode)
---------------------------------------------------------------------------

    --ppn=1 --tpp=1   222.19 sec
    --ppn=1 --tpp=2   197.18 sec
    --ppn=1 --tpp=4   196.42 sec

Thus (a) there is not much point going beyond tpp=2, and (b) OpenMP is not as good as just using extra MPI processes.

Thus the best way to run CASINO on a Blue Gene/Q such as Mira at Argonne National Lab seems to be:

    runqmc -n xxx --ppn=64 -s --user.shmemsize=yyy

i.e. run a shared-memory calculation with a shared memory block of yyy Mb on xxx 16-core nodes, with 64 MPI processes per node (4 per core).

B10. How come binary blip wave function files (bwfn.data.bin or .b1) produced in different ways for the same system have very different file sizes?

The authors of the various interfaces between CASINO and plane-wave codes such as PWSCF/CASTEP/ABINIT make different assumptions about which unoccupied orbitals should be included in the file. For example, with CASTEP/ABINIT you always have to produce a formatted pwfn.data file as a first step; this must then be transformed into a formatted blip bwfn.data file using the blip utility, and when this is read in by CASINO it is transformed into a much smaller bwfn.data.bin file (if keyword WRITE_BINARY_BLIPS is T, which is the default). With PWSCF, you may produce any of these files (pwfn.data, bwfn.data or old-format bwfn.data.b1) directly with the DFT code, without passing through a sequence of intermediaries.

In the CASTEP case, the unstated assumption is that the formatted file is kept as a reference (either pwfn.data or the larger bwfn.data, depending on whether disk space is a problem) and that this contains all the orbitals written out by the DFT code. When converting to binary, only the orbitals occupied in the requested state are written out. If you later want to do a different excited state, then the bwfn.data.bin file should be regenerated for that state. In the PWSCF case, because the formatted file need never exist, all orbitals written by the DFT code are included in the binary blip file (old-format bwfn.data.b1), including all unoccupied ones. Thus these files can be considerably larger than ones produced through the ABINIT/CASTEP/blip utility/CASINO route.

In general one should control the number of unoccupied orbitals in the blip file through some parameter in the DFT code itself. For example, you might try the following:

CASTEP : Increase 'nextra_bands' to something positive. Note that CASTEP has a help system just like CASINO's casinohelp: type 'castep -help nextra_bands'.

PWSCF  : Play with the 'nbnd' keyword (see the sketch below).

ABINIT : Play with the 'nband' keyword. See http://www.abinit.org/documentation/helpfiles/for-v6.8/input_variables/varbas.html#nband

Note that it is planned to tidy this system considerably in a future release of CASINO (including making the blip utility write in binary directly from the pwfn.data, without bwfn.data ever having existed).
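As an illustration of the PWSCF route, a minimal sketch of the relevant fragment of the PWSCF input is given below; the value of nbnd is purely illustrative (choose the number of occupied orbitals plus however many unoccupied ones you actually want in the blip file), and the rest of the &system namelist is assumed to be whatever you already use:

    &system
       ! ... your existing &system settings (ibrav, nat, ntyp, ecutwfc, ...) ...
       nbnd = 40    ! total number of bands PWSCF computes and writes out
    /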
C. USING THE UTILITIES
----------------------

C1. How do I use utility xxx?

There are README files in most subdirectories under CASINO/utils, which should contain the information you need.

D. USING CASINO WITH EXTERNAL PROGRAMS
--------------------------------------

Please send MDT any additional incantations or advice to include here; we do not always have 'in-house' people who understand how to use all the external wave-function generating programs that we supposedly support, so your advice could be very useful. Note that this sort of advice is likely to date quickly.

D1. How do I compile PWSCF/Quantum Espresso on machine X?

UK Hector machine (PWSCF current SVN version 26/11/2011)
--------------------------------------------------------

None of the four compilers works by default - I (MDT) eventually managed to make the GNU one work. There may be other ways.

Check the current default module with 'module list'. If PrgEnv-gnu is not listed, then run the relevant one of the next three lines:

    module unload PrgEnv-path
    module unload PrgEnv-cray
    module unload PrgEnv-pgi

then

    module load PrgEnv-gnu

then, in the espresso base directory,

    ./configure ARCH=crayxt

Edit the make.sys file that gets produced, changing the CFLAGS line from

    CFLAGS = -fast $(DFLAGS) $(IFLAGS)

to

    CFLAGS = -O3 $(DFLAGS) $(IFLAGS)

Then type 'make pw'.

JaguarPF
--------

Having loaded module PrgEnv-pgi:

    ./configure ARCH=crayxt4
    make pw

UK Hartree Centre - Blue Joule Blue Gene/Q (Espresso version 5.02, Feb 2013)
----------------------------------------------------------------------------

This requires a bit of hacking. Not all of the following may be necessary, or even the right thing to do (you're supposed to fiddle with some preliminary files in the install directory), but this is a recipe that worked for me.

(1) module load scalapack

(2) cd ~/espresso

(3) ./configure

(4) This will create ~/espresso/make.sys, in which you should change the following things (again, these may not all be necessary, but I couldn't be bothered to check):

 - change

     MPIF90 = mpixlf90
     #F90 = /opt/ibmcmp/xlf/bg/14.1/bin/bgxlf90_r
     CC = /opt/ibmcmp/vacpp/bg/12.1/bin/bgxlc_r
     F77 = /opt/ibmcmp/xlf/bg/14.1/bin/bgxlf_r

   to

     MPIF90 = mpixlf90_r
     CC = mpixlc_r
     F77 = mpixlf77_r

 - change

     LD = /opt/ibmcmp/vacpp/bg/12.1/bin/bgxlc_r -qarch=qp -qtune=qp

   to

     LD = mpixlf90_r -qarch=qp -qtune=qp

 - change

     LD_LIBS =

   to

     LD_LIBS = -L/opt/ibmcmp/xlf/bg/14.1/lib64 -lxlopt -lxl -lxlf90_r \
               -lxlfmath

 - and the BLAS/LAPACK libraries should be changed from whatever they are to

     BLAS_LIBS = -L/bgsys/ibm_essl/prod/opt/ibmmath/essl/5.1/lib64/ -lesslbg
     BLAS_LIBS_SWITCH = external
     LAPACK_LIBS = -L/bgsys/ibm_essl/prod/opt/ibmmath/essl/5.1/lib64/ \
                   -L/gpfs/packages/ibm/lapack/3.4.2/lib -lesslbg -llapack
     LAPACK_LIBS_SWITCH = external

   Note that the 3.4.2 version of the lapack directory might have changed by the time you come to read this - check that it exists and has liblapack.a in it.

(5) Then type 'make pw' in ~/espresso.

D2. When using an xwfn.data file produced by PWSCF, I get a CASINO error:

    ERROR : CHECK_KPOINTS
    Two k points (1 and 2) are equivalent.

Why?

Various versions of PWSCF from summer 2011 produce an apparently miscompiled executable which, when told to print out a list of k points in the output file (or in xwfn.data), just prints out a string of zeroes, i.e. all k points are listed as (0.0 0.0 0.0). Having dug around in the PWSCF source code, it seems that the k-point grid was being defined before the reciprocal lattice vectors, hence all the k points really were zero. I have no idea why no one noticed this - you would think it would be pretty fundamental. Looking at the latest version (26/11/2011), this has now been fixed. Solution: upgrade your PWSCF. EDIT: later investigation showed this problem was extant between commit 8051 and commit 8121.

D3: When I try to use a pseudopotential produced by casino2upf in PWSCF, it stops and whines about not having been compiled with support for hybrid functionals. I didn't specify a hybrid functional, so why?

The casino2upf utility marks any UPF files it creates as having been generated using Hartree-Fock (since they generally are). If you do not supply a value for the 'input_dft' keyword in the 'system' section of the PWSCF input file, then PWSCF will attempt to use the functional specified in the pseudopotential file, i.e. it will try to do a Hartree-Fock calculation, and -- given that this is only possible with PWSCF if you compiled it having invoked 'configure' with the '--enable-exx' flag -- the code may stop and whine about not having been compiled with support for hybrid functionals. This can be confusing. Solution: specify 'input_dft' in the input file.
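For example, the relevant fragment of the PWSCF input might look like the following sketch (the functional name 'PBE' is just an illustration; use whichever functional you actually want):

    &system
       ! ... your other &system settings ...
       input_dft = 'PBE'   ! overrides the functional recorded in the UPF file
    /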
D4: When I try to use a CRYSTAL input file containing a user-defined pseudopotential from the CASINO library, CRYSTAL09 stops with an error message:

    ERROR **** PSINPU **** RADIAL POWER IN INPUT PSEUDO OUT OF RANGE

Why?

According to CRYSTAL author Roberto Orlando: "There was a stupid restriction in the input for pseudopotentials that limited the maximum allowed angular quantum number to L=4, even if the algorithms are general. Thus, we have extended it to L=5. Unfortunately, this implies the addition of one datum in the record below INPUT. This change is reported in the manual, but everybody using pseudopotentials now fails. Maybe we should change the error message ..."

There are clearly backwards-compatible ways in which this could have been done, but anyhow, the point is that from CRYSTAL09 onwards, all input decks constructed from pseudopotentials obtained from the CASINO pseudopotential library before Feb 2012 will fail. The solution is to add an extra zero to the end of the second line of each pseudopotential (effectively stating that your pseudopotential contains no g functions). Thus:

    INPUT
    1.000 8 8 8 0 0
    51.12765602 1.00000000 -1
    38.05848906 -860.41240728 0
    etc..

becomes

    INPUT
    1.000 8 8 8 0 0 0
    51.12765602 1.00000000 -1
    38.05848906 -860.41240728 0
    etc..

On 14/2/2012 MDT converted all the files in the CASINO pseudopotential library and in the examples to reflect this change, so everything should now work with CRYSTAL09.

D5: I get the wrong energy when calculating spin-polarized molecules with gwfn.data files derived from CRYSTAL06 or CRYSTAL09. Why?

The converters in utils/wfn_converters/crystal0[6,9] were broken for a period of several years. The error was introduced in patch 2.4.41 and fixed in 2.11.5 (and backported into the official 2.10 release). The essence of the problem was that the block of orbital coefficients for the down-spin orbitals was incorrectly just a copy of those for the up-spin orbitals, rather than being the correct down-spin ones. Apologies for our having taken so long to notice this.

To fix this problem manually, find line 466 in the following two files:

    CASINO/utils/wfn_converters/crystal06/casino_interface.f90
    CASINO/utils/wfn_converters/crystal09/casino_interface.f90

In the incorrect version of these files, this line reads:

    ak_pointer(1,:)=0

Change this to:

    ak_pointer(1,1)=0
    ak_pointer(1,2)=ndfrf

D6: I use the runpwscf script included with CASINO to run the PWSCF code, and I get no output. Why?

Like runqmc, the runpwscf script uses the CASINO architecture scheme in order to know how to run calculations on any given machine. The data file containing the machine definition is CASINO/arch/data/xx.arch (the 'arch file'). If your CASINO arch file defines a command for running CASINO and PWSCF (such as SCRIPT_RUN: mpirun -np &NPROC& &BINARY&), then it must include a tag &BINARY_ARGS& following the &BINARY& tag. This is because the PWSCF executable takes command-line arguments such as '-pw2casino', '-npool' and '< in.pwscf >> out.pwscf' etc., which are not required by CASINO. For this reason an arch file which works for CASINO may not necessarily work for PWSCF without this modification.

To check this on a batch machine, type 'runpwscf --qmc --check-only' and examine the resulting 'pw.x' batch file. If the line containing e.g. the mpirun command above does not have '-pw2casino' following 'pw.x', then you will need to add &BINARY_ARGS& to your arch file.
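For instance, using the example run line quoted above, the arch-file entry would change along these lines (a sketch only; the actual launcher command depends on your machine):

    SCRIPT_RUN: mpirun -np &NPROC& &BINARY& &BINARY_ARGS&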
E. QMC QUESTIONS
----------------

E1. Pseudopotential issues.

a) When generating a trial wave function, do I have to use the same pseudopotentials that I am going to use in the QMC calculations?

In general, yes. Although there may be cases where using different pseudopotentials is of little importance, this is not true in most situations. CASINO requires the pseudopotential on a grid, whereas quantum chemistry codes tend to require it expanded in Gaussians. The online library at www.tcm.phy.cam.ac.uk/~mdt26/casino2.html has the pseudopotentials in both formats (with the latter done specifically for GAUSSIAN, CRYSTAL and GAMESS). There are also some notes supplied below the table which you should read.

Have a look in the CASINO/utils/pseudo_converters directory for utilities that convert pseudopotentials formatted for other non-Gaussian codes into the correct format for CASINO. Utilities are currently available for ABINIT, CASTEP, PWSCF and GP.

b) Can I generate a wave function using an all-electron method and then use pseudopotentials in CASINO?

You shouldn't; see E1(a) above. If you generate all-electron orbitals, you should run all-electron QMC calculations. Note that the scaling of all-electron QMC with atomic number is problematic; you should almost always use pseudopotentials to simulate everything but first-row atoms.

F. REQUESTS/CRITICISM
---------------------

F1. Why don't you distribute .deb/.rpm packages?

The source code is necessary for getting CASINO to work where it's most useful, i.e., on high-performance clusters. It's impractical to generate binaries for each machine we know of, particularly given that we presently don't have access to most of the supported architectures. However, .deb and .rpm packages for popular Linux distributions are a planned feature, which we will get to once we enable system-wide installations (see A6 above).

F2. The name of the old 'rundmc' script - was it a homage to the 80s hip-hop group of the same name?

Nope, that was accidental.

[MDT: That's what Pablo thinks. In fact it was entirely deliberate..]
[PLR: Knew it!]
[MDT: Don't ask me, because I don't know why. But it's like that, and that's the way it is]