This repository was archived by the owner on Jul 31, 2024. It is now read-only.

Commit b36b167

Merge branch 'release-3.0.0'
2 parents ec25487 + 3a5e311 commit b36b167

30 files changed, +704 -325 lines changed

CMakeLists.txt

+1-1
@@ -81,7 +81,7 @@ foreach(_TYPE DATAROOT CMAKE INCLUDE LIB BIN MAN DOC)
 endforeach()

 configure_file(${PROJECT_SOURCE_DIR}/source/timemory/version.h.in
-${PROJECT_BINARY_DIR}/source/timemory/_version.h @ONLY)
+${PROJECT_SOURCE_DIR}/source/timemory/version.h @ONLY)

 # execute_process(
 # COMMAND ${CMAKE_COMMAND} -E copy_if_different

README.md

+10-4
@@ -25,9 +25,12 @@
 | PyPi | `pip install timemory` |
 | Anaconda Cloud | `conda install -c jrmadsen timemory` |

+
+Timemory is a performance measurement and analysis framework.
+
 ## Why Use timemory?

-- __*Timemory is arguably the most customizable performance analysis and tuning API available*__
+- __*Timemory is arguably the most customizable performance measurement and analysis API available*__
 - __*High-performance*__: very low overhead when enabled and borderline negligible runtime disabled
 - Ability to arbitrarily switch and combine different measurement types anywhere in application
 - Provides static reporting (fixed at compile-time), dynamic reporting (selected at run-time), or hybrid
@@ -106,11 +109,14 @@ you want to measure and run your code: initialization and output are automated.

 ## Profiling and timemory

-Timemory is not a full profiler and is intended to supplement profilers, not be used in lieu of profiling,
-which are important for _discovering where to place timemory markers_.
+Timemory is not a full profiler (yet). The ultimate goal is to create a customizable profiler.
+Currently, timemory supports explicit instrumentation (i.e. minor modifications to source code)
+and explicit wrapping of dynamically-linked functions.
+Using profilers are currently important for _discovering where to place timemory markers_ or
+_which dynamically function calls to wrap with GOTCHA_.
 The library provides an easy-to-use method for always-on general HPC analysis metrics
 (i.e. timing, memory usage, etc.) with the same or less overhead than if these metrics were to
-records and stored in a custom solution (there is zero polymorphism) and, for C++ code, extensively
+records and stored in a custom solution and, for C++ code, extensively
 inlined.
 Functionally, the overhead is non-existant: sampling profilers (e.g. gperftools, VTune)
 at standard sampling rates barely notice the presence of timemory unless it is been

cmake/Modules/Options.cmake

-2
@@ -199,8 +199,6 @@ if(${PROJECT_NAME}_MASTER_PROJECT)
 endif()

 # timemory options
-add_option(TIMEMORY_USE_EXCEPTIONS
-"Signal handler throws exceptions (default: exit)" OFF ${_FEATURE})
 add_option(TIMEMORY_USE_EXTERN_INIT
 "Do initialization in library instead of headers" OFF)
 add_option(TIMEMORY_USE_MPI

docker/Dockerfile

+1-1
@@ -18,7 +18,7 @@ WORKDIR /tmp

 # build and env args used by package-manager
 ARG COMPILER_TYPE=gcc
-ARG GCC_VERSION=9
+ARG GCC_VERSION=8
 ARG CLANG_VERSION=9
 ARG ENABLE_DISPLAY=1

docker/config/apt.sh

+5-3
@@ -60,8 +60,8 @@ run-verbose apt-get -y install clang-${CLANG_VERSION} libc++-dev libc++abi-dev

 DISPLAY_PACKAGES="xserver-xorg freeglut3-dev libx11-dev libx11-xcb-dev libxpm-dev libxft-dev libxmu-dev libxv-dev libxrandr-dev \
 libglew-dev libftgl-dev libxkbcommon-x11-dev libxrender-dev libxxf86vm-dev libxinerama-dev qt5-default \
-emacs-nox vim-nox"
-CUDA_VER=$(dpkg --get-selections | grep cuda-cudart- | awk '{print $1}' | head -n 1 | sed 's/cuda-cudart-//g')
+emacs-nox vim-nox firefox"
+CUDA_VER=$(dpkg --get-selections | grep cuda-cudart- | awk '{print $1}' | tail -n 1 | sed 's/cuda-cudart-//g' | sed 's/dev-//g')

 #-----------------------------------------------------------------------------#
 #
@@ -165,6 +165,8 @@ bash miniconda.sh -b -p /opt/conda
 export PATH="/opt/conda/bin:${PATH}"
 conda config --set always_yes yes --set changeps1 yes
 conda update -c defaults -n base conda
-conda install -n base -c defaults -c conda-forge python=3.6 pyctest cmake scikit-build numpy matplotlib pillow
+conda install -n base -c defaults -c conda-forge python=3.6 pyctest cmake scikit-build numpy matplotlib pillow ipykernel jupyter
+source activate
+python -m ipykernel install --name base --display-name base
 conda clean -a -y
 conda config --set always_yes no

docker/config/timemory-install.sh

+41-8
@@ -20,6 +20,10 @@ export LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}
 ROOT_DIR=${PWD}
 : ${TIMEMORY_BRANCH:="master"}

+#--------------------------------------------------------------------------------------------#
+# LIKWID
+#--------------------------------------------------------------------------------------------#
+
 run-verbose cd ${ROOT_DIR}
 run-verbose git clone https://github.com/RRZE-HPC/likwid.git
 run-verbose cd likwid
@@ -31,19 +35,37 @@ ssed -i 's/@install/install/g' Makefile
 ssed -i 's/@cd/cd/g' Makefile
 run-verbose make install -j6

+#--------------------------------------------------------------------------------------------#
+# TAU
+#--------------------------------------------------------------------------------------------#
+
 run-verbose cd ${ROOT_DIR}
 run-verbose wget http://tau.uoregon.edu/tau.tgz
 run-verbose tar -xzf tau.tgz
 run-verbose cd tau-*
-export CFLAGS="-O3"
-export CPPFLAGS="-O3"
+export CFLAGS="-O3 -fPIC"
+export CPPFLAGS="-O3 -fPIC"
 # run-verbose ./configure -python -prefix=/usr/local -pthread -papi=/usr -mpi -mpiinc=/usr/include/mpich -cuda=/usr/local/cuda
 run-verbose ./configure -python -prefix=/usr/local -pthread -papi=/usr -mpi -mpiinc=/usr/include/mpich
 run-verbose make -j6
 run-verbose make install -j6
 unset CFLAGS
 unset CPPFLAGS

+#--------------------------------------------------------------------------------------------#
+# UPC++
+#--------------------------------------------------------------------------------------------#
+
+run-verbose git clone https://[email protected]/berkeleylab/upcxx.git
+run-verbose cd upcxx
+export CFLAGS="-fPIC"
+export CPPFLAGS="-fPIC"
+run-verbose ./install /usr/local
+
+#--------------------------------------------------------------------------------------------#
+# timemory
+#--------------------------------------------------------------------------------------------#
+
 run-verbose cd ${ROOT_DIR}
 run-verbose git clone -b ${TIMEMORY_BRANCH} https://github.com/NERSC/timemory.git timemory-source
 run-verbose cd timemory-source
@@ -56,12 +78,23 @@ run-verbose cmake -DCMAKE_INSTALL_PREFIX=/usr/local -DCMAKE_BUILD_TYPE=Release -
 run-verbose ninja -j6
 run-verbose ninja install

-run-verbose git clone https://[email protected]/berkeleylab/upcxx.git
-run-verbose cd upcxx
-export CFLAGS="-fPIC"
-export CPPFLAGS="-fPIC"
-run-verbose ./install /usr/local
+#--------------------------------------------------------------------------------------------#
+# tomopy
+#--------------------------------------------------------------------------------------------#
+
+run-verbose cd ${ROOT_DIR}
+run-verbose git clone https://github.com/jrmadsen/tomopy.git tomopy
+run-verbose cd tomopy
+run-verbose git checkout accelerated-redesign
+run-verbose conda env create -n tomopy -f envs/linux-36.yml
+source activate
+run-verbose conda activate tomopy
+run-verbose python -m pip install -vvv .
+run-verbose conda clean -a -y

-cd ${ROOT_DIR}
+#--------------------------------------------------------------------------------------------#
+# Cleanup
+#--------------------------------------------------------------------------------------------#

+run-verbose cd ${ROOT_DIR}
 run-verbose rm -rf ${ROOT_DIR}/*

docker/docker-compose.yml

+12-9
@@ -7,9 +7,10 @@ version: "3.3"

 services:
 #--------------------------------------------------------------------------#
-# TiMemory development container
-timemory-dev:
-image: jrmadsen/timemory:dev
+# timemory development container w/ CUDA 10.0
+#
+timemory-dev-10-0:
+image: jrmadsen/timemory:cuda-10.0
 stdin_open: true
 tty: true
 build:
@@ -25,9 +26,10 @@ services:
 ENABLE_DISPLAY: "1"

 #--------------------------------------------------------------------------#
-# TiMemory development container
-timemory-dev-edge:
-image: jrmadsen/timemory:dev-edge
+# timemory development container w/ CUDA 10.1
+#
+timemory-dev-10-1:
+image: jrmadsen/timemory:cuda-10.1
 stdin_open: true
 tty: true
 build:
@@ -43,7 +45,8 @@ services:
 ENABLE_DISPLAY: "1"

 #--------------------------------------------------------------------------#
-# TiMemory development container
+# timemory development container w/ CUDA 10.2
+#
 timemory-latest:
 image: jrmadsen/timemory:latest
 stdin_open: true
@@ -53,9 +56,9 @@ services:
 dockerfile: Dockerfile
 args:
 BASE_IMG: "nvidia/cuda"
-BASE_TAG: "latest"
+BASE_TAG: "10.2-devel-ubuntu18.04"
 COMPILER_TYPE: "gcc"
-GCC_VERSION: "9"
+GCC_VERSION: "8"
 CLANG_VERSION: "9"
 REQUIRE_CUDA_VERSION: "10.1"
 ENABLE_DISPLAY: "1"

docs/about.md

+16-10
@@ -1,18 +1,24 @@
 # About

-Timemory is very _lightweight_, _cross-language_ timing, resource usage, and hardware counter utility
-for reporting timing, resource usage, and hardware counters for the CPU and GPU.
-
-Timemory is implemented as a generic C++11 template library but supports implementation in C, C++, CUDA, and Python codes.
-The design goal of timemory is to enable "always-on" performance analysis that can be standard part of the source code
-with a negligible amount of overhead.
-
-Timemory is not intended to replace profiling tools such as Intel's VTune, GProf, etc. -- instead,
-it complements them by enabling one to verify timing and memory usage without the overhead of the profiler.
+Timemory is a modular API for performance measurements and analysis with a very lightweight overhead.
+If timemory does not support a particular measurement type or analysis method, user applications
+can easily create their own component that accomplishes the desired task.
+
+Timemory is implemented as a generic C++11 template library but supports implementation
+in C, C++, CUDA, and Python codes.
+The design goal of timemory is to create an easy-to-use framework for generating
+performance measurements and analysis methods which are extremely flexible
+with respect to both how the data is stored/accumulated and which methods the measurement
+or analysis supports. In order to keep the overhead as low as reasonable achievable,
+a significant amount of logic is evaluated at compile-time. As a result, applications
+which directly utilize the C++ template interface tend to see increases in compilation
+time, binary size (especially when debug info is included), and compiler memory usage.
+If this aspect of timemory impedes productivity, the best course of action is to
+utilize the library interface.

 ## Credits

-Timemory is actively maintained by NERSC at Lawrence Berkeley National Laboratory
+Timemory is actively developed by NERSC at Lawrence Berkeley National Laboratory

 | Name | Affiliation | GitHub |
 | ------------------ | :---------------------------------------------------------------------------------------: | :-------------------------------------------: |

docs/components/gotcha.md

+61-1
@@ -18,6 +18,14 @@ where `Size` is the maximum number of external functions to be wrapped,
 `Diff` is an optional template parameter for differentiating `gotcha` components with equivalent `Size` and `Tools`
 parameters but wrap different functions. Note: the `Tools` type cannot contain other `gotcha` components.

+### Use Cases
+
+The `gotcha` component in timemory can provide either of the following functionalities:
+
+1. Scoped instrumentation around external dynamically-linked function calls
+2. Wholesale replacement of external dynamically-linked function calls
+
+
 ## Traditional GOTCHA in C

 Writing a traditional GOTCHA wrapper in C requires a bit of work and the recommended methods require
@@ -77,7 +85,59 @@ A GOTCHA wrapper with timemory can be defined in a single line of code and there
 macros provided that eliminate the need for specifying the function signature (return-type and
 arguments) due to the ability for templates to extract these parameters.

-## GOTCHA Example
+## Function Replacement with GOTCHA Example
+
+Suppose that an application is spending a signifincant amount of run-time calling the standard math library
+double-precision `exp` function and you would like to investigate whether using single-precision `expf` is an
+acceptable substitute in certain regions. Instead of writing the [full specification](#traditional-gotcha-in-c)
+shown previously and manually enabling and disabling the wrapper in the region of interest, you can use timemory.
+
+Provided below is the full component specification require to implement the replacement function.
+
+```cpp
+// NOTE: declared in tim::component::
+struct exp_intercept : public base<exp_intercept, void>
+{
+double operator()(double val)
+{ return expf(static_cast<float>(val)); }
+};
+```
+
+When the `exp_intercept` component is _appropriately_ configured within a `gotcha` component,
+whenever `double exp(double)` is invoked, timemory will (via the GOTCHA library) redirect this function call to
+`double exp_intercept::operator()(double)` -- and within this function, the replaced call to `expf` is implemented.
+Configuring the `gotcha` component is slightly different, however. The goal of this component is __*optimization*__
+instead of __*measurement or analysis*__ so the `gotcha` component is specified as such:
+
+```cpp
+using empty_t = component_tuple<>;
+using exp_gotcha_t = gotcha<1, empty_t, exp_intercept>;
+```
+
+In other words, we define a `gotcha` component with an empty set of measurement/analysis components and
+then we specify _a component_ as the third template parameter. The _combination_ of an empty measurement/analysis
+collection as the second template parameter and a component as the third template parameter trigger a special
+optimized wrapper around the original function call which is explicitly designed to minimize the overhead of
+the redirection to the wrapper.
+
+All that remains is implementing the initializer that specifies which functions are wrapped by the `gotcha` component:
+
+```cpp
+__attribute__((constructor))
+void init_gotcha()
+{
+exp_gotcha_t::get_initializer() = [=]()
+{ TIMEMORY_C_GOTCHA(exp_gotcha_t, 0, exp); };
+}
+```
+
+In the above, using the constructor attribute (only available with certain compilers) creates a function
+that is automatically executed before main starts. Since this function configured the gotcha within a call-back,
+instead of explicitly invoking `TIMEMORY_C_GOTCHA`, the gotcha wrapper is not activated during this function,
+meaning that the redirection of `exp` to `expf` is explicitly tied to the allocation of
+at least one instance of `exp_gotcha_t`.
+
+## Instrumentation with GOTCHA Example

 > Reference: [source/tests/gotcha_tests.cpp](https://github.com/NERSC/timemory/blob/master/source/tests/gotcha_tests.cpp)
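
To gauge whether the `exp` to `expf` substitution documented in docs/components/gotcha.md is acceptable for a given input range, the precision loss can be measured before any wrapper is installed. The following is a minimal standalone sketch and is not part of this commit or of timemory: the helper name `max_rel_error_expf` is hypothetical, and `std::exp` applied to a `float` argument is assumed to be a fair stand-in for `expf`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of timemory): returns the largest relative
// error observed when exp(x) is replaced by its single-precision counterpart
// over n evenly spaced samples in [lo, hi].
inline double max_rel_error_expf(double lo, double hi, int n)
{
    assert(n >= 2);  // need at least the two end points
    double worst = 0.0;
    for(int i = 0; i < n; ++i)
    {
        const double x     = lo + (hi - lo) * i / (n - 1);
        const double exact = std::exp(x);
        // std::exp on a float selects the single-precision overload,
        // mirroring expf(static_cast<float>(val)) in exp_intercept
        const double approx = static_cast<double>(std::exp(static_cast<float>(x)));
        worst = std::max(worst, std::abs(approx - exact) / exact);
    }
    return worst;
}
```

For example, `max_rel_error_expf(0.0, 10.0, 1000)` reports the worst relative error over 1000 samples in [0, 10]. If that value exceeds the tolerance of the region of interest, the replacement is probably not a safe substitute there.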
