Tom Westerhout
2 February 2021
Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.
Source: Pro Git Book
Example:
$ ls
main_fixed_eq5.tex main_prb.tex main_prl_final.tex
main_prl.tex main.tex main_v2.tex
$ # Thinking...
$ cp main_prb.tex main_prb_final.tex
Git
Of the professional developers who responded to the survey, almost 82% use GitHub as a collaborative tool
repository = files + history
Files:
quantum_skyrmions/
├── Analysis
│ ├── 19_site_cluster.yml
│ ├── 7_site_cluster.yml
│ ├── slurm_main.sh
│ └── SpinED-x86_64.AppImage
├── Drafts
│ ├── paper.tex
│ └── references.bib
├── Figures
│ ├── ground_state_energy.pdf
│ └── topological_invariant.pdf
├── Proofs
├── Published
├── Raw data
│ ├── exact_diagonalization_result_19.h5
│ └── exact_diagonalization_result_7.h5
└── Submitted
i.e. your whole project folder
The file history appears as snapshots in time called commits
Source: Git Handbook
Each commit is identified by a unique hash, e.g. 9d33835a8e744c5f9cc950f672885dd706c0852f
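The 40-character hash above is how Git names the objects it stores. As a sketch of the idea (not Git's own code; the helper name git_blob_hash is ours), the Python function below computes the ID Git gives to a single file's content, a "blob". It hashes a short header "blob <size>" plus a NUL byte, followed by the raw bytes. This assumes a repository in the default SHA-1 object format. Commit IDs are computed the same way over the commit object (tree, parent commits, author, committer and message), which is why changing any of these gives a different hash.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the object ID Git assigns to a file with this content.

    Assumes the default SHA-1 object format: Git hashes the header
    b"blob <size>\\0" followed by the raw file bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Should match `git hash-object /dev/null` (the empty file)
    print(git_blob_hash(b""))
    # Should match `echo hello | git hash-object --stdin`
    print(git_blob_hash(b"hello\n"))
```

For a file on disk, git_blob_hash(open(path, "rb").read()) should agree with git hash-object path, unless Git filters (line-ending conversion, LFS) rewrite the content first.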
Questions?
GitLab is technically superior,
but all the cool kids hang out on GitHub.
→ we will use GitHub for examples
$ cd quantum_skyrmions/
$ git init
Initialized empty Git repository in .../quantum_skyrmions/.git/
$ git remote add origin https://github.com/twesterhout/quantum_skyrmions.git
What your collaborators will do after you have created a repository:
$ git clone https://github.com/twesterhout/quantum_skyrmions.git
Cloning into 'quantum_skyrmions'...
Username for 'https://github.com': username
Password for 'https://username@github.com': password
remote: Enumerating objects: 147, done.
remote: Counting objects: 100% (147/147), done.
remote: Compressing objects: 100% (119/119), done.
remote: Total 147 (delta 30), reused 140 (delta 26), pack-reused 0
Receiving objects: 100% (147/147), 2.46 MiB | 777.00 KiB/s, done.
Resolving deltas: 100% (30/30), done.
$ cd quantum_skyrmions/
$ vi "Analysis/SimpleTests.wl" # Making the changes...
$ WolframKernel -script Analysis/SimpleTests.wl # Testing...
Hello world!
$ git add Analysis/SimpleTests.wl # Track SimpleTests.wl
$ git commit -m "Implement hello world" # Commit the changes
$ git push origin main # Uploading to remote server...
Username for 'https://github.com': username
Password for 'https://username@github.com': password
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 8 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 496.66 KiB | 12.42 MiB/s, done.
Total 6 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/twesterhout/quantum_skyrmions
bd922d6..817e53d main -> main
Now you are safe: your work is stored on the remote server.
To get the latest changes from your collaborators:
$ cd quantum_skyrmions/
$ git pull origin main # fetch & merge changes from remote
Username for 'https://github.com': username
Password for 'https://username@github.com': password
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 3 (delta 2), reused 3 (delta 2), pack-reused 0
Unpacking objects: 100% (3/3), 1.76 KiB | 451.00 KiB/s, done.
From https://github.com/twesterhout/quantum_skyrmions
* branch main -> FETCH_HEAD
817e53d..88e8c26 main -> origin/main
Updating 817e53d..88e8c26
Fast-forward
SimpleTests.wl | 139 ++++++++++++++++++++++++++++++++--------
1 file changed, 114 insertions(+), 25 deletions(-)
git init: create a new local repository
git remote add: add a new remote repository
git clone: clone an existing repository
git add: add changes to the next commit
git commit: commit changes
git push: publish local changes on a remote
git pull: get all changes from the remote into the local repository
git status: view changes in the working directory
Git is not meant to be used with large files!
Solutions:
Git LFS handles large files by storing references to the file in the repository, but not the actual file itself.
Source: GitHub Docs
Use git-lfs to track specific files:
$ git lfs track "*.h5" # Track all HDF5 files
Adding path *.h5
$ git add "heisenberg_37.h5" # Work using standard git commands
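What "storing references to the file" means concretely: for each tracked file, Git LFS commits a small text pointer (a version line, the SHA-256 object ID and the size in bytes), while the file itself goes to the LFS server. The sketch below builds such a pointer following the Git LFS pointer specification; the function name lfs_pointer is hypothetical. Note also that git lfs track records the pattern in .gitattributes, which has to be committed as well.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(content: bytes) -> str:
    """Build the text pointer that Git LFS commits in place of a large file.

    The actual bytes live on the LFS server under their SHA-256 hash;
    the Git repository only stores these three lines.
    """
    oid = hashlib.sha256(content).hexdigest()
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(content)}\n"


if __name__ == "__main__":
    print(lfs_pointer(b"pretend this is a large HDF5 file"), end="")
```

To compare with the real tool, git lfs pointer --file=heisenberg_37.h5 prints the pointer Git LFS would generate for that file.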
| Hosting | Limit |
|---|---|
| GitHub Free | 2G |
| GitLab.com Free | 10G |
| Science GitLab | 10G (probably) |
DVC is built to make ML models shareable and reproducible. It is designed to handle large files, data sets, machine learning models, and metrics as well as code.
ML = Machine Learning, Source: DVC Homepage
dvc add, dvc push, etc.
if you are broke (like me) then Surfdrive (500G per RU employee) else Amazon S3 (easier to collaborate)
(In the near future, the Ceph storage cluster managed by C&CZ might give you the best of both.)
Questions?
Virtualization refers to the act of creating a virtual (rather than actual) version of something, including virtual computer hardware platforms, storage devices, and computer network resources.
Source: Wikipedia
Virtual environments (e.g. venv, Conda):
Package, dependency and environment management for any language — Python, R, Ruby, Lua, Scala, Java, JavaScript, C/ C++, FORTRAN, and more.
Source: Conda documentation
Pre-installed on the TCM cluster:
$ module load Tcm; module load Anaconda-3-2020.07
and you are good to go.
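Besides Conda, the lighter venv option needs nothing beyond Python itself, since it ships in the standard library. A minimal sketch that creates an environment programmatically, similar in spirit to running python -m venv on the command line; the helper make_env and the directory name my_env are hypothetical, and installing pip is skipped only to keep the example fast and offline.

```python
import pathlib
import tempfile
import venv


def make_env(path: pathlib.Path) -> pathlib.Path:
    """Create an isolated Python environment at `path` and return its config file."""
    # with_pip=False keeps this example fast and offline; use with_pip=True in practice
    venv.create(path, with_pip=False)
    return path / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = make_env(pathlib.Path(tmp) / "my_env")
        print(cfg.read_text())
```

In day-to-day use you would run python -m venv my_env, activate it with source my_env/bin/activate, and commit a requirements.txt (e.g. produced by pip freeze) to git, much like the Conda environment file.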
$ # Creating an environment...
$ conda env create -f conda-devel.yml
$ # Adding to git...
$ git add conda-devel.yml
$ git commit -m "Create environment"
conda-devel.yml:
name: lattice_symmetries_devel
channels:
- defaults
- weinbe58 # for QuSpin
dependencies:
- python
- pip:
- black
- neovim
- loguru
- numpy
- scipy
# Stuff to compile the package locally
- gcc_linux-64
- gxx_linux-64
- cmake
- ninja
# For benchmarks and testing
- numba ==0.48 # QuSpin doesn't work with the latest version
- omp # Get multi-threading support for QuSpin
- quspin
...
If you already know what Conda is:
Don't use the base environment!
Don't use conda install (except for testing)!
Singularity containers can be built to include all of the programs, libraries, data and scripts such that an entire demonstration can be contained and either archived or distributed for others to replicate no matter what version of Linux they are presently running.
Source: Singularity User Guide
Pre-installed on the TCM cluster:
$ module load Singularity
and you are good to go.
$ # no GPU locally...
$ nvcc hello.cu -o hello
bash: nvcc: command not found
$ # Singularity to the rescue!
$ singularity build hello.sif Singularity
INFO: Starting build...
INFO: Running setup scriptlet
+ mkdir -p /workdir
INFO: Copying hello.cu to /workdir/
INFO: Running post scriptlet
+ /bin/bash /.post.script
INFO: Adding runscript
INFO: Creating SIF file...
INFO: Build complete: hello.sif
hello.cu:
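#include <cstdio>

__global__ void cuda_hello() {
    printf("Hello World from GPU!\n");
}

int main() {
    cuda_hello<<<1, 1>>>();
    // Wait for the kernel to finish so that its printf output is flushed
    cudaDeviceSynchronize();
    return 0;
}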
Singularity:
Bootstrap: docker
From: nvidia/cuda:11.0-devel-ubuntu20.04

%setup
    mkdir -p ${SINGULARITY_ROOTFS}/workdir

%files
    hello.cu /workdir/

%post
    cd /workdir
    nvcc hello.cu -o hello

%runscript
    /workdir/hello
Disclaimer: this is an advanced example
$ # Compiling locally...
$ g++-10 -std=c++20 thread.cpp -o thread -lpthread
$ # Works locally
$ ./thread
Stopping...
$ # And on lilo6
$ scp thread lilo.science.ru.nl:
$ ssh lilo6.science.ru.nl ./thread
Stopping...
$ # But breaks on lilo5
$ ssh lilo5.science.ru.nl ./thread
[...] version `GLIBCXX_3.4.22' not found [...]
$ # Compile statically!
$ g++-10 -std=c++20 thread.cpp -o thread \
-static -static-libgcc -static-libstdc++ \
-Wl,--whole-archive -lpthread -Wl,--no-whole-archive
$ # Now works on lilo5 as well!
$ scp thread lilo.science.ru.nl:
$ ssh lilo5.science.ru.nl ./thread
Stopping...
thread.cpp:
#include <chrono>
#include <cstdio>
#include <stop_token>
#include <thread>

auto main() -> int {
    using namespace std::chrono_literals;
    // A sleepy worker thread that periodically checks whether it should stop
    auto sleepy_worker = std::jthread{
        [](std::stop_token stoken) {
            for (;;) {
                std::this_thread::sleep_for(100ms);
                if (stoken.stop_requested()) {
                    std::printf("Stopping...\n");
                    return;
                }
            }
        }};
    sleepy_worker.request_stop();
    sleepy_worker.join();
}
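The GLIBCXX_3.4.22 error means the binary asks for a libstdc++ symbol version that the older system library does not provide. A quick heuristic, similar to strings thread | grep GLIBCXX, is to scan the binary for such version tags and report the highest one; objdump -T gives the precise list of versioned dynamic symbols. A Python sketch (the script name glibcxx.py and the function glibcxx_versions are hypothetical):

```python
from __future__ import annotations

import re
import sys

# Matches version tags such as b"GLIBCXX_3.4.22"
GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions(binary: bytes) -> list[tuple[int, ...]]:
    """Return the sorted, de-duplicated GLIBCXX version tags found in a binary."""
    found = {
        tuple(int(part) for part in m.group(1).decode("ascii").split("."))
        for m in GLIBCXX_RE.finditer(binary)
    }
    return sorted(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:  # e.g. python glibcxx.py ./thread
        versions = glibcxx_versions(f.read())
    if versions:
        print("highest tag: GLIBCXX_" + ".".join(map(str, versions[-1])))
    else:
        print("no GLIBCXX version tags found")
```

Applying the same scan to the libstdc++.so.6 on the target machine shows which versions it provides; if the binary needs a newer one, static linking (as above) is one way out.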
Download an application, make it executable, and run! No need to install. No system libraries or system preferences are altered.
Source: AppImage Homepage
Example:
$ # Hmmm, our cluster does not have NeoVim installed...
$ # No problem!
$ wget -q https://github.com/neovim/neovim/releases/download/v0.4.4/nvim.appimage
$ chmod +x nvim.appimage
$ # Yay!
$ ./nvim.appimage
Questions?
Level 0:
Level 1: put your tests in a file & use a testing framework, e.g.
$ pytest
========================== test session starts ===========================
platform linux -- Python 3.x.y, pytest-6.x.y, py-1.x.y, pluggy-0.x.y
cachedir: $PYTHON_PREFIX/.pytest_cache
rootdir: $REGENDOC_TMPDIR
collected 1 item
test_sample.py F [100%]
================================ FAILURES ================================
______________________________ test_answer _______________________________
def test_answer():
> assert inc(3) == 5
E assert 4 == 5
E + where 4 = inc(3)
test_sample.py:6: AssertionError
======================== short test summary info =========================
FAILED test_sample.py::test_answer - assert 4 == 5
=========================== 1 failed in 0.12s ============================
test_sample.py:
def inc(x):
return x + 1
def test_answer():
assert inc(3) == 5
Source: pytest documentation
Level 0.5 (i.e. a half measure): use asserts liberally, e.g.
# Python
assert x > 0, "real log is undefined for non-positive inputs"
// C and C++
assert(x > 0 && "real log is undefined for non-positive inputs");
# Julia
@assert x > 0 "real log is undefined for non-positive inputs"
Note: asserts can be disabled for production runs (e.g. python -O in Python, compiling with -DNDEBUG in C and C++), so no, they will not slow down your production code.
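For Python this is easy to check: running the interpreter with -O removes assert statements entirely. The sketch below runs the same two-line snippet in a fresh interpreter with and without -O (the helper run_snippet is hypothetical).

```python
import os
import subprocess
import sys

# An assertion that always fails, followed by a line that only runs if the assert is skipped
SNIPPET = "assert False, 'this check is active'\nprint('assert skipped')"


def run_snippet(*flags: str) -> subprocess.CompletedProcess:
    """Run SNIPPET in a fresh Python interpreter with extra command-line flags."""
    # Drop PYTHONOPTIMIZE so that only the explicit flags decide whether asserts run
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
        env=env,
    )


if __name__ == "__main__":
    debug = run_snippet()
    optimized = run_snippet("-O")
    print("default:", debug.returncode, debug.stderr.strip().splitlines()[-1])
    print("with -O:", optimized.returncode, optimized.stdout.strip())
```

Without -O the snippet dies with an AssertionError; with -O the check is compiled away and execution continues.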
Level 2: Continuous Integration
CI.yml:
name: Ubuntu

on: [push, pull_request]

env:
  BUILD_TYPE: Debug
  INSTALL_LOCATION: .local

jobs:
  build:
    strategy:
      matrix:
        gcc-version: [7, 8, 9, 10]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          submodules: true
      - name: configure
        run: |
          cmake -Bbuild \
            -DCMAKE_CXX_COMPILER=g++-${{ matrix.gcc-version }} \
            -DCMAKE_C_COMPILER=gcc-${{ matrix.gcc-version }} \
            -DCMAKE_BUILD_TYPE=$BUILD_TYPE \
            -DCMAKE_INSTALL_PREFIX=$GITHUB_WORKSPACE/$INSTALL_LOCATION
      - name: build
        run: cmake --build build -j4
      - name: run tests
        run: cd build && ctest -VV
      - name: install project
        run: cmake --build build --target install
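The strategy.matrix block tells GitHub Actions to run the build job once per combination of matrix values, so this workflow runs four jobs, one per GCC version. A simplified Python model of that expansion (it ignores include/exclude; the helper expand_matrix is hypothetical):

```python
from __future__ import annotations

import itertools


def expand_matrix(matrix: dict[str, list]) -> list[dict]:
    """Return one dict of matrix values per job: the Cartesian product of all axes."""
    keys = list(matrix)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*(matrix[k] for k in keys))
    ]


if __name__ == "__main__":
    for job in expand_matrix({"gcc-version": [7, 8, 9, 10]}):
        # Mirrors the configure step, where matrix.gcc-version is substituted per job
        print(f"cmake -Bbuild -DCMAKE_CXX_COMPILER=g++-{job['gcc-version']} ...")
```

Adding a second axis, say a hypothetical build-type: [Debug, Release], would double the number of jobs to eight.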
How to come up with test cases?
https://twesterhout.github.io/programming-practices-talk-2021