Commit b0e999a

Fuzzing: ClusterFuzz integration (#7079)
The main addition here is a bundle_clusterfuzz.py script, which packages up the exact files that should be uploaded to ClusterFuzz. It also documents the process of bundling and testing. You can do

    bundle_clusterfuzz.py OUTPUT_FILE.tgz

That bundles wasm-opt from ./bin, which is enough for local testing. For actually uploading to ClusterFuzz, we need a portable build, and @dschuff had the idea to reuse the emsdk build, which works nicely. Doing

    bundle_clusterfuzz.py OUTPUT_FILE.tgz --build-dir=/path/to/emsdk/upstream/

will bundle wasm-opt (+libs) from the emsdk. I verified that those builds work on ClusterFuzz.

I added several forms of testing here. First, our main fuzzer fuzz_opt.py now has a ClusterFuzz testcase handler, which simulates a ClusterFuzz environment. Second, there are smoke tests that run in the unit test suite, and can also be run separately:

    python -m unittest test/unit/test_cluster_fuzz.py

Those unit tests can also run on a given bundle, e.g. one created from an emsdk build, for testing right before upload:

    BINARYEN_CLUSTER_FUZZ_BUNDLE=/path/to/bundle.tgz python -m unittest test/unit/test_cluster_fuzz.py

A third piece of testing is to add a --fuzz-passes test. That is a mode for -ttf (translate random data into a valid wasm fuzz testcase) that uses random data to pick and run a set of passes, to further shape the wasm. (--fuzz-passes had no previous testing, and this PR fixes it and tidies it up a little, adding some newer passes too.)

Otherwise, this PR includes the key run.py script that is bundled and then executed by ClusterFuzz: basically a Python script that runs wasm-opt -ttf [..] to generate testcases, sets up their JS, and emits them.

fuzz_shell.js, which is the JS to execute testcases, now checks whether it is provided the binary data of a wasm file. If so, it does not read a wasm file from argv[1]. (This is needed because ClusterFuzz expects a single file for the testcase, so we make a JS file with bundled wasm inside it.)
1 parent 25b8e6a commit b0e999a

11 files changed: +808 -22 lines changed

scripts/bundle_clusterfuzz.py

+135
@@ -0,0 +1,135 @@
#!/usr/bin/python3

'''
Bundle files for uploading to ClusterFuzz.

Usage:

bundle_clusterfuzz.py OUTPUT_FILE.tgz [--build-dir=BUILD_DIR]

The output file will be a .tgz file.

If a build directory is provided, we will look under there to find bin/wasm-opt
and lib/libbinaryen.so. A useful place to get builds from is the Emscripten SDK,
as you can do

    ./emsdk install tot

after which ./upstream/ (from the emsdk dir) will contain builds of wasm-opt and
libbinaryen.so (that are designed to run on as many systems as possible, by not
depending on newer libc symbols, etc., as opposed to a normal local build).
Thus, the full workflow could be

    cd emsdk
    ./emsdk install tot
    cd ../binaryen
    python3 scripts/bundle_clusterfuzz.py binaryen_wasm_fuzzer.tgz --build-dir=../emsdk/upstream

When using --build-dir in this way, you are responsible for ensuring that the
wasm-opt in the build dir is compatible with the scripts in the current dir
(e.g., if run.py here passes a flag that is only in a newer/older version of
wasm-opt, a problem can happen).

Before uploading to ClusterFuzz, it is worth doing the following:

  1. Run the local fuzzer (scripts/fuzz_opt.py). That includes a ClusterFuzz
     testcase handler, which simulates what ClusterFuzz does.

  2. Run the unit tests, which include smoke tests for our ClusterFuzz support:

        python -m unittest test/unit/test_cluster_fuzz.py

     Look at the logs, which will contain statistics on the wasm files the
     fuzzer emits, and see that they look reasonable.

     You should run the unit tests on the bundle you are about to upload, by
     setting the proper env var like this (using the same filename as above):

        BINARYEN_CLUSTER_FUZZ_BUNDLE=`pwd`/binaryen_wasm_fuzzer.tgz python -m unittest test/unit/test_cluster_fuzz.py

     Note that you must pass an absolute filename (e.g. using pwd as shown).

     The unittest logs should reflect that that bundle is being used at the
     very start ("Using existing bundle: ..." rather than "Making a new
     bundle"). Note that some of the unittests also create their own bundles, to
     test the bundling script itself, so later down you will see logging of
     bundle creation even if you provide a bundle.

After uploading to ClusterFuzz, you can wait a while for it to run, and then:

  1. Inspect the log to see that we generate all the testcases properly, and
     their sizes look reasonably random, etc.

  2. Inspect the sample testcase and run it locally, to see that

        d8 --wasm-staging testcase.js

     properly runs the testcase, emitting logging etc.

  3. Check the stats and crashes page (known crashes should at least be showing
     up). Note that these may take longer to show up than 1 and 2.
'''

import os
import sys
import tarfile

# Read the filenames first, as importing |shared| changes the directory.
output_file = os.path.abspath(sys.argv[1])
print(f'Bundling to: {output_file}')
assert output_file.endswith('.tgz'), 'Can only generate a .tgz'

build_dir = None
if len(sys.argv) >= 3:
    assert sys.argv[2].startswith('--build-dir=')
    build_dir = sys.argv[2].split('=')[1]
    build_dir = os.path.abspath(build_dir)
    # Delete the argument, as importing |shared| scans it.
    sys.argv.pop()

from test import shared  # noqa

# Pick where to get the builds
if build_dir:
    binaryen_bin = os.path.join(build_dir, 'bin')
    binaryen_lib = os.path.join(build_dir, 'lib')
else:
    binaryen_bin = shared.options.binaryen_bin
    binaryen_lib = shared.options.binaryen_lib

with tarfile.open(output_file, "w:gz") as tar:
    # run.py
    run = os.path.join(shared.options.binaryen_root, 'scripts', 'clusterfuzz', 'run.py')
    print(f' .. run: {run}')
    tar.add(run, arcname='run.py')

    # fuzz_shell.js
    fuzz_shell = os.path.join(shared.options.binaryen_root, 'scripts', 'fuzz_shell.js')
    print(f' .. fuzz_shell: {fuzz_shell}')
    tar.add(fuzz_shell, arcname='scripts/fuzz_shell.js')

    # wasm-opt binary
    wasm_opt = os.path.join(binaryen_bin, 'wasm-opt')
    print(f' .. wasm-opt: {wasm_opt}')
    tar.add(wasm_opt, arcname='bin/wasm-opt')

    # For a dynamic build we also need libbinaryen.so and possibly other files.
    # Try both .so and .dylib suffixes for more OS coverage.
    for suffix in ['.so', '.dylib']:
        libbinaryen = os.path.join(binaryen_lib, f'libbinaryen{suffix}')
        if os.path.exists(libbinaryen):
            print(f' .. libbinaryen: {libbinaryen}')
            tar.add(libbinaryen, arcname=f'lib/libbinaryen{suffix}')

        # The emsdk build also includes some more necessary files.
        for name in [f'libc++{suffix}', f'libc++{suffix}.2', f'libc++{suffix}.2.0']:
            path = os.path.join(binaryen_lib, name)
            if os.path.exists(path):
                print(f' ......... : {path}')
                tar.add(path, arcname=f'lib/{name}')

print('Done.')
print('To run the tests on this bundle, do:')
print()
print(f'BINARYEN_CLUSTER_FUZZ_BUNDLE={output_file} python -m unittest test/unit/test_cluster_fuzz.py')
print()
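
As a quick sanity check on a bundle produced by the script above, its layout can be inspected with a few lines of Python. This is a minimal sketch and not part of the commit: `missing_bundle_members` and `make_dummy_bundle` are hypothetical helper names, and the required entries are simply the `arcname` values that bundle_clusterfuzz.py always adds (the shared libraries are only added when present, so they are not checked here).

```python
import io
import tarfile

# Entries that bundle_clusterfuzz.py always adds (its arcname values).
# Assumption: libraries under lib/ are optional and therefore not listed.
REQUIRED_MEMBERS = {'run.py', 'scripts/fuzz_shell.js', 'bin/wasm-opt'}


def missing_bundle_members(fileobj):
    """Return the required entries absent from a gzipped tar bundle.

    fileobj is a binary file object, e.g. open('bundle.tgz', 'rb').
    """
    with tarfile.open(fileobj=fileobj, mode='r:gz') as tar:
        return REQUIRED_MEMBERS - set(tar.getnames())


def make_dummy_bundle(names):
    """Hypothetical helper: build an in-memory .tgz of empty files.

    Useful only for trying out the check without a real build.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode='w:gz') as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name), io.BytesIO(b''))
    buffer.seek(0)
    return buffer
```

For a real bundle, one would pass `open('binaryen_wasm_fuzzer.tgz', 'rb')` and expect an empty set back. This only checks the layout; the unit tests in test/unit/test_cluster_fuzz.py remain the way to verify that the bundle actually runs.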

scripts/clusterfuzz/run.py

+163
@@ -0,0 +1,163 @@
#
# Copyright 2024 WebAssembly Community Group participants
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

'''
ClusterFuzz run.py script: when run by ClusterFuzz, it uses wasm-opt to generate
a fixed number of testcases. This is a "blackbox fuzzer", see

https://google.github.io/clusterfuzz/setting-up-fuzzing/blackbox-fuzzing/

This file should be bundled up together with the other files it needs, see
bundle_clusterfuzz.py.
'''

import os
import getopt
import random
import subprocess
import sys

# The V8 flags we put in the "fuzzer flags" files, which tell ClusterFuzz how to
# run V8. By default we apply all staging flags.
FUZZER_FLAGS_FILE_CONTENTS = '--wasm-staging'

# Maximum size of the random data that we feed into wasm-opt -ttf. This is
# smaller than fuzz_opt.py's INPUT_SIZE_MAX because that script is tuned for
# fuzzing large wasm files (to reduce the overhead we have of launching many
# processes per file), which is less of an issue on ClusterFuzz.
MAX_RANDOM_SIZE = 15 * 1024

# The prefix for fuzz files.
FUZZ_FILENAME_PREFIX = 'fuzz-'

# The prefix for flags files.
FLAGS_FILENAME_PREFIX = 'flags-'

# The name of the fuzzer (appears after FUZZ_FILENAME_PREFIX /
# FLAGS_FILENAME_PREFIX).
FUZZER_NAME_PREFIX = 'binaryen-'

# The root directory of the bundle this will be in, which is the directory of
# this very file.
ROOT_DIR = os.path.dirname(os.path.abspath(__file__))

# The path to the wasm-opt binary that we run to generate testcases.
FUZZER_BINARY_PATH = os.path.join(ROOT_DIR, 'bin', 'wasm-opt')

# The path to the fuzz_shell.js script that will execute the wasm in each
# testcase.
JS_SHELL_PATH = os.path.join(ROOT_DIR, 'scripts', 'fuzz_shell.js')

# The arguments we provide to wasm-opt to generate wasm files.
FUZZER_ARGS = [
    # Generate a wasm from random data.
    '--translate-to-fuzz',
    # Run some random passes, to further shape the random wasm we emit.
    '--fuzz-passes',
    # Enable all features but disable ones not yet ready for fuzzing. This may
    # be a smaller set than fuzz_opt.py, as that enables a few experimental
    # flags, while here we just fuzz with d8's --wasm-staging.
    '-all',
    '--disable-shared-everything',
    '--disable-fp16',
]


# Returns the file name for fuzz or flags files.
def get_file_name(prefix, index):
    return f'{prefix}{FUZZER_NAME_PREFIX}{index}.js'


# Returns the contents of a .js fuzz file, given particular wasm contents that
# we want to be executed.
def get_js_file_contents(wasm_contents):
    # Start with the standard JS shell.
    with open(JS_SHELL_PATH) as file:
        js = file.read()

    # Prepend the wasm contents, so they are used (rather than the normal
    # mechanism where the wasm file's name is provided in argv).
    wasm_contents = ','.join([str(c) for c in wasm_contents])
    js = f'var binary = new Uint8Array([{wasm_contents}]);\n\n' + js
    return js


def main(argv):
    # Parse the options. See
    # https://google.github.io/clusterfuzz/setting-up-fuzzing/blackbox-fuzzing/#uploading-a-fuzzer
    output_dir = '.'
    num = 100
    expected_flags = ['input_dir=', 'output_dir=', 'no_of_files=']
    optlist, _ = getopt.getopt(argv[1:], '', expected_flags)
    for option, value in optlist:
        if option == '--output_dir':
            output_dir = value
        elif option == '--no_of_files':
            num = int(value)

    for i in range(1, num + 1):
        input_data_file_path = os.path.join(output_dir, f'{i}.input')
        wasm_file_path = os.path.join(output_dir, f'{i}.wasm')

        # wasm-opt may fail to run in rare cases (when the fuzzer emits code it
        # detects as invalid). Just try again in such a case.
        for attempt in range(0, 100):
            # Generate random data.
            random_size = random.SystemRandom().randint(1, MAX_RANDOM_SIZE)
            with open(input_data_file_path, 'wb') as file:
                file.write(os.urandom(random_size))

            # Generate wasm from the random data.
            cmd = [FUZZER_BINARY_PATH] + FUZZER_ARGS
            cmd += ['-o', wasm_file_path, input_data_file_path]
            try:
                subprocess.check_call(cmd)
            except subprocess.CalledProcessError:
                # Try again.
                print('(oops, retrying wasm-opt)')
                attempt += 1
                if attempt == 99:
                    # Something is very wrong!
                    raise
                continue
            # Success, leave the loop.
            break

        # Generate a testcase from the wasm
        with open(wasm_file_path, 'rb') as file:
            wasm_contents = file.read()
        testcase_file_path = os.path.join(output_dir,
                                          get_file_name(FUZZ_FILENAME_PREFIX, i))
        js_file_contents = get_js_file_contents(wasm_contents)
        with open(testcase_file_path, 'w') as file:
            file.write(js_file_contents)

        # Emit a corresponding flags file.
        flags_file_path = os.path.join(output_dir,
                                       get_file_name(FLAGS_FILENAME_PREFIX, i))
        with open(flags_file_path, 'w') as file:
            file.write(FUZZER_FLAGS_FILE_CONTENTS)

        print(f'Created testcase: {testcase_file_path}, {len(wasm_contents)} bytes')

        # Remove temporary files.
        os.remove(input_data_file_path)
        os.remove(wasm_file_path)

    print(f'Created {num} testcases.')


if __name__ == '__main__':
    main(sys.argv)
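
The option handling and file naming in run.py can be tried out without a wasm-opt binary. The sketch below is not part of the commit: it copies the `getopt` call and the three prefix constants from run.py into two hypothetical helpers, `parse_options` and `expected_outputs`, to show which files a given ClusterFuzz-style invocation should leave in the output directory.

```python
import getopt

# Constants copied from run.py.
FUZZ_FILENAME_PREFIX = 'fuzz-'
FLAGS_FILENAME_PREFIX = 'flags-'
FUZZER_NAME_PREFIX = 'binaryen-'


def parse_options(args):
    """Mirror run.py's option parsing; returns (output_dir, num).

    args is the argument list without the program name (argv[1:]).
    """
    output_dir = '.'
    num = 100
    expected_flags = ['input_dir=', 'output_dir=', 'no_of_files=']
    optlist, _ = getopt.getopt(args, '', expected_flags)
    for option, value in optlist:
        if option == '--output_dir':
            output_dir = value
        elif option == '--no_of_files':
            num = int(value)
    return output_dir, num


def expected_outputs(num):
    """List the (testcase, flags) file name pairs emitted for num testcases."""
    return [
        (f'{FUZZ_FILENAME_PREFIX}{FUZZER_NAME_PREFIX}{i}.js',
         f'{FLAGS_FILENAME_PREFIX}{FUZZER_NAME_PREFIX}{i}.js')
        for i in range(1, num + 1)
    ]
```

For example, `parse_options(['--output_dir=out', '--no_of_files=2'])` yields `('out', 2)`, and `expected_outputs(2)` lists `fuzz-binaryen-1.js` with `flags-binaryen-1.js`, and likewise for index 2. Note that `--input_dir` is accepted but ignored, matching run.py, since a blackbox fuzzer generates its inputs from random data.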
