Audio PR - Augmentation support [ Downmix and ToDecibels ] #125

Merged on Jun 19, 2024 (351 commits)
Showing changes from 250 commits
Commits
6018273
Merge branch 'swbs/audio/pr1' of https://github.com/swetha097/rocAL i…
swetha097 Mar 20, 2024
9b6c93a
Merge branch 'swbs/audio/pr2' of https://github.com/swetha097/rocAL i…
fiona-gladwin Mar 20, 2024
bfbbc7b
Fix formatting issues - Minor changes
Mar 20, 2024
bcc222b
Merge branch 'swbs/audio/pr2' into swbs/audio/pr3
Mar 20, 2024
08cf352
Clean up C++ audio unit test
fiona-gladwin Mar 20, 2024
9282f05
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Mar 20, 2024
4d6982a
Remove max frames and channels from decoder
fiona-gladwin Mar 20, 2024
0c3a974
Minor changes
fiona-gladwin Mar 20, 2024
ca9c74b
Add Output comparison for python audio unittests
swetha097 Mar 20, 2024
7789333
Modify rocal audio unit test
fiona-gladwin Mar 20, 2024
741a677
Minor change
fiona-gladwin Mar 20, 2024
2150bbc
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Mar 20, 2024
e1eda57
Merge branch 'swbs/audio/pr1' of https://github.com/swetha097/rocAL i…
fiona-gladwin Mar 20, 2024
4a4cb3c
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Mar 20, 2024
582bbfb
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Mar 20, 2024
19d26be
Merge branch 'swbs/audio/pr2' into audio_pr4
SundarRajan28 Mar 20, 2024
242900e
Merge remote-tracking branch 'origin/audio_pr4' into swbs/audio/pr5
swetha097 Mar 21, 2024
635f247
Remove NSR
swetha097 Mar 21, 2024
da8b26c
Resolve PR comments
swetha097 Mar 21, 2024
dd97798
Merge branch 'swbs/audio/pr2' of https://github.com/swetha097/rocAL i…
swetha097 Mar 21, 2024
db1bf39
Minor changes - Modifying the names of the arguments
Mar 21, 2024
fb23b54
Initial commit for removing file list reader
SundarRajan28 Mar 21, 2024
a2c6a42
Merge remote-tracking branch 'origin/swbs/audio/pr2' into swbs/audio/pr3
Mar 21, 2024
2a39822
Added a bried desc for the rocAL enum for border type
Mar 21, 2024
aaae9fa
Add a WRN statement in PreEmphasis Filter to only use FP32 dtype
Mar 21, 2024
9567eab
minor changes
swetha097 Mar 21, 2024
46b0bcc
Change the borderType enum to int32 from uint32 dtype
Mar 21, 2024
191a88d
Minor changes
fiona-gladwin Mar 21, 2024
ebb6212
Minor change
fiona-gladwin Mar 21, 2024
61ac122
Update README for audio unit test
fiona-gladwin Mar 21, 2024
79ff879
Parameters for rocALAudioIterator
Mar 21, 2024
968570e
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Mar 21, 2024
6873d81
Merge branch 'swbs/audio/pr2' of https://github.com/swetha097/rocAL i…
swetha097 Mar 21, 2024
16403cb
Removing file list reader and metadata reader
SundarRajan28 Mar 21, 2024
6595c43
Merge branch 'swbs/audio/pr2' into swbs/audio/pr3
swetha097 Mar 21, 2024
da86d51
Minor change
fiona-gladwin Mar 21, 2024
a79d484
Remove the reset_tensor_roi() from the PreEmphasis augmentation makin…
Mar 21, 2024
c42638b
Del the Unit Test Files introduced earlier
Mar 21, 2024
4c31be2
Merge branch 'swbs/audio/pr2' into audio_pr4
SundarRajan28 Mar 22, 2024
f5e4877
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Mar 22, 2024
ba65268
Changing python unittests for QA mode
SundarRajan28 Mar 22, 2024
e2efa14
Resolving review comments
SundarRajan28 Mar 22, 2024
7e6c498
Adding comment for file list case in file source reader
SundarRajan28 Mar 22, 2024
d5d826f
Add pre_emphasis function and gollden output comparison in audio unit…
swetha097 Mar 22, 2024
17824e4
minor change - add update val in create array
swetha097 Mar 22, 2024
db7f126
Minor variable name change
fiona-gladwin Mar 22, 2024
defb6bb
Minor additions in the .h file
Mar 22, 2024
238b30e
Merge remote-tracking branch 'origin/audio_pr4' into swbs/audio/pr5
Mar 22, 2024
acffb66
minor change
swetha097 Mar 22, 2024
e05a967
Formatting Changes
Mar 22, 2024
f887af3
Update unit test
fiona-gladwin Mar 22, 2024
03dee69
Minor change
fiona-gladwin Mar 22, 2024
df64a4a
minor change
swetha097 Mar 22, 2024
6302b4e
Merge branch 'swbs/audio/pr3' of https://github.com/swetha097/rocAL i…
swetha097 Mar 22, 2024
0404a9e
Merge branch 'swbs/audio/pr1' into swbs/audio/pr3
SundarRajan28 Mar 22, 2024
b1d11ce
Merge branch 'swbs/audio/pr2' into swbs/audio/pr3
SundarRajan28 Mar 22, 2024
45680cf
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Mar 22, 2024
7674427
Resolving review comments
SundarRajan28 Mar 22, 2024
beefee7
Merge branch 'audio_pr4' into swbs/audio/pr5
SundarRajan28 Mar 22, 2024
1ed0c5b
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Mar 22, 2024
87ab2c3
Minor changes
fiona-gladwin Mar 22, 2024
6077ff8
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Mar 22, 2024
2da5856
Minor change
fiona-gladwin Mar 22, 2024
d5bf0d1
Merge branch 'swbs/audio/pr2' into swbs/audio/pr3
fiona-gladwin Mar 22, 2024
0455d20
Add file reader to python audio unit test
fiona-gladwin Mar 22, 2024
0507647
Add to_decibels augmentations to rocAL
SundarRajan28 Mar 22, 2024
784f77b
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Mar 22, 2024
4d34561
Formatting , review comments resolution and change enum dtype to int3…
Mar 22, 2024
e21d617
Update C++ unit test
fiona-gladwin Mar 22, 2024
b46c6f8
Update python audio unit test
fiona-gladwin Mar 22, 2024
1ed758c
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Mar 22, 2024
433857d
Merge branch 'swbs/audio/pr2' into swbs/audio/pr3
fiona-gladwin Mar 22, 2024
8f3b595
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Mar 22, 2024
0527822
Remove the unused variable output - Resolve warnings in cpp unit test
Mar 22, 2024
41ecb2a
Remove the dst_roi arg passed to rpp
Mar 22, 2024
80907a6
Minor changes
fiona-gladwin Mar 22, 2024
f4550b2
Merge branch 'swbs/audio/pr3' of https://github.com/swetha097/rocAL i…
fiona-gladwin Mar 22, 2024
41ea790
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Mar 22, 2024
47fde8a
Adding file list reader to C++ unit tests
SundarRajan28 Mar 22, 2024
77a59ed
Fixing issues with C++ audio unit tests
SundarRajan28 Mar 22, 2024
0dc2d7f
Merge branch 'audio_pr4' into swbs/audio/pr6
SundarRajan28 Mar 25, 2024
c48ec9e
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Mar 25, 2024
601f5e0
Adding test case for to_decibels and downmix
SundarRajan28 Mar 25, 2024
5cc237c
Modifying python unittests
SundarRajan28 Mar 25, 2024
0332cf3
Merge audio_pr4 branch
swetha097 Mar 25, 2024
8296f07
Fixing spectogram test case
SundarRajan28 Mar 25, 2024
67faf8f
Remove the reset_tensor_roi calls
Mar 25, 2024
54e802c
Resolving review comments
SundarRajan28 Mar 25, 2024
51a7498
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Mar 25, 2024
59a8a97
Resolve some PR comments
Mar 25, 2024
2f8b5ab
Merge branch 'audio_pr4' into swbs/audio/pr6
SundarRajan28 Mar 25, 2024
4128bf8
Merge branch 'audio_pr4' into swbs/audio/pr5
Mar 25, 2024
cb10bdc
Minor changes
Mar 25, 2024
797d708
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Mar 25, 2024
411c913
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Mar 25, 2024
ac56161
Resolving review comments
SundarRajan28 Mar 25, 2024
d791b9a
Change the dims[0] and dims[1] positioning for Spectrogram
Mar 25, 2024
9abad24
Resolving review comments
SundarRajan28 Mar 25, 2024
d52755e
Resolving review comments
SundarRajan28 Mar 25, 2024
8503473
Resolving PR comments
swetha097 Mar 25, 2024
d782686
Updating audio unit tests for default file list path
SundarRajan28 Mar 25, 2024
ede4508
Minor changes
fiona-gladwin Mar 25, 2024
62920e4
Merge remote-tracking branch 'swe_fork/audio_pr4' into swbs/audio/pr5
swetha097 Mar 25, 2024
75b28c1
Minor Change
Mar 25, 2024
1320512
Minor change
Mar 25, 2024
58e3c63
Merge branch 'audio_pr4' into swbs/audio/pr6
SundarRajan28 Mar 25, 2024
6389cf9
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Mar 25, 2024
c9a6dbb
Merge branch 'generic-name-change' into swbs/audio/pr1
fiona-gladwin Mar 26, 2024
ca6f311
Merge branch 'develop' of https://github.com/ROCm/rocAL into generic-…
fiona-gladwin Mar 26, 2024
8d34902
Merge branch 'generic-name-change' into swbs/audio/pr1
fiona-gladwin Mar 26, 2024
ffb284d
Name change from sample to data
Mar 26, 2024
ff12843
Merge branch 'generic-name-change' of https://github.com/swetha097/ro…
Mar 26, 2024
e53388f
Change from decoded_data_info to DecodedDataInfo
Mar 26, 2024
5f23def
Revert "Change the dims[0] and dims[1] positioning for Spectrogram"
swetha097 Mar 26, 2024
0774f69
Remove audio_decoder_factory.cpp file
fiona-gladwin Mar 26, 2024
90b9d83
Minor change
fiona-gladwin Mar 26, 2024
531e5fb
Change variable name
fiona-gladwin Mar 26, 2024
8b28c37
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Mar 26, 2024
0753163
Add Spectrogram Case in unit tests
swetha097 Mar 26, 2024
03d66d5
Merge branch 'swbs/audio/pr5' of https://github.com/swetha097/rocAL i…
swetha097 Mar 26, 2024
8bd9d59
Add spectrogram case in python unit tests
swetha097 Mar 26, 2024
6c4e381
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Mar 26, 2024
98ce527
Merge branch 'generic-name-change' into swbs/audio/pr1
fiona-gladwin Mar 26, 2024
7d4c1fd
Update the struct variable name in audio files
fiona-gladwin Mar 26, 2024
a3898a8
Fixing issues with downmix node output
SundarRajan28 Mar 27, 2024
ac545ff
Adding ROI updation in downmix node
SundarRajan28 Mar 27, 2024
8eae103
Adding downmix test case for python unit tests
SundarRajan28 Mar 27, 2024
2accd5d
Adding downmix and to_decibels test case in C++ tests
SundarRajan28 Mar 27, 2024
a9e6497
Minor changes
fiona-gladwin Mar 27, 2024
85d21e6
Change ROCAL_DATA_PATH to exclude rocal_data
fiona-gladwin Mar 27, 2024
6856e9b
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Mar 27, 2024
57c8a0d
Update ROCAL_DATA_PATH to exclude rocal_data
fiona-gladwin Mar 27, 2024
3a86507
Use Pascal case for function names in audio decoder
fiona-gladwin Mar 27, 2024
ef91012
Add audio path for downmix test case
SundarRajan28 Apr 1, 2024
36caf22
Fix review comments
swetha097 Apr 1, 2024
7f46a25
Modify cmake to have SNDFILE in all capital
fiona-gladwin Apr 2, 2024
70aa700
Minor changes
fiona-gladwin Apr 2, 2024
0693605
Add struct for audio info in AudioReadAndDecode
fiona-gladwin Apr 2, 2024
44e654d
Merge branch 'develop' of https://github.com/ROCm/rocAL into generic-…
fiona-gladwin Apr 2, 2024
c77140c
Merge branch 'generic-name-change' into swbs/audio/pr1
fiona-gladwin Apr 2, 2024
eabf7aa
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 2, 2024
f96a92b
Fix merge conflict
fiona-gladwin Apr 2, 2024
46b7e5e
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 2, 2024
91d0615
Renaming crop_image_info to CropImageInfo
swetha097 Apr 3, 2024
bb4e5a5
Remove - actual_host_buffers - Unused
swetha097 Apr 3, 2024
50829f6
Rename TimingDBG to TimingDbg
swetha097 Apr 3, 2024
d0a456b
Move the instances of DecodedDataInfo to its base class LoaderModule
swetha097 Apr 3, 2024
a80a3a6
Fix a WRN msg in master_graph.cpp
swetha097 Apr 3, 2024
f648feb
Remove a dangling comment
swetha097 Apr 3, 2024
6146bac
Rename _circ_data_info to _circ_buff_data_info
swetha097 Apr 3, 2024
47263d9
Add Glob to CMakeLists.txt
fiona-gladwin Apr 4, 2024
8623be3
Rename SndFileDecoder to GenericAudioDecoder
fiona-gladwin Apr 4, 2024
c4af22c
Merge branch 'develop' of https://github.com/ROCm/rocAL into generic-…
fiona-gladwin Apr 4, 2024
5b9be3d
Merge branch 'generic-name-change' of https://github.com/swetha097/ro…
fiona-gladwin Apr 4, 2024
47dea85
Merge branch 'generic-name-change' into swbs/audio/pr1
fiona-gladwin Apr 4, 2024
660071b
Fix build issues
fiona-gladwin Apr 4, 2024
5bd4fa8
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 4, 2024
4f9ab6b
Minor change
fiona-gladwin Apr 4, 2024
6c430cf
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 4, 2024
416180e
Update python API README.md for audio unit test
fiona-gladwin Apr 4, 2024
0f0be88
Update audio unit test README
fiona-gladwin Apr 4, 2024
3114a18
Merge branch 'swbs/audio/pr2' into swbs/audio/pr3
SundarRajan28 Apr 8, 2024
ac6e6e3
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Apr 8, 2024
876c9ad
Merge remote-tracking branch 'swe_fork/audio_pr4' into swbs/audio/pr5
swetha097 Apr 8, 2024
9445e6c
Adding missed param in python unit tests
SundarRajan28 Apr 8, 2024
12b4801
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Apr 8, 2024
e496f3a
Revert "Add Glob to CMakeLists.txt"
fiona-gladwin Apr 10, 2024
5df0055
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Apr 10, 2024
7dc7092
Fix include headers for Audio files
fiona-gladwin Apr 10, 2024
19e30cf
Fix copy data 2D
fiona-gladwin Apr 10, 2024
34deb3b
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 10, 2024
4c02dfb
Minor changes
fiona-gladwin Apr 11, 2024
e3f350f
Pass decoded data info to load routine instead of separate vectors
fiona-gladwin Apr 11, 2024
67cda83
Update CHANGELOG.md
fiona-gladwin Apr 11, 2024
d36df07
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 11, 2024
42c844d
Update CHANGELOG.md
fiona-gladwin Apr 11, 2024
8b1c59f
Change swap_handle_time variable name in loader
fiona-gladwin Apr 11, 2024
241ce67
Merge remote-tracking branch 'swe_fork/swbs/audio/pr2' into swbs/audi…
swetha097 Apr 11, 2024
07ba1f6
Update the changelog.md
swetha097 Apr 11, 2024
83513fb
Update ChangeLog.md
swetha097 Apr 11, 2024
31959c2
Merge branch 'swbs/audio/pr5' of https://github.com/swetha097/rocAL i…
swetha097 Apr 11, 2024
abc63c9
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Apr 11, 2024
bb8908b
Update CHANGELOG.md
SundarRajan28 Apr 11, 2024
91fed39
Formatting changes
fiona-gladwin Apr 11, 2024
ee3606b
Merge branch 'audio_pr4' into swbs/audio/pr5
SundarRajan28 Apr 11, 2024
6a80714
Update doxygen comments
fiona-gladwin Apr 11, 2024
a19086b
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Apr 11, 2024
689985d
Move file source reader from readers/image to readers folder
fiona-gladwin Apr 11, 2024
db758fd
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 11, 2024
67190bf
Update README and add doxygen description
fiona-gladwin Apr 11, 2024
ffdcb0a
Update CMakeLists and README for audio test
fiona-gladwin Apr 11, 2024
b2de5f4
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 11, 2024
d000af0
Update README for audio test
fiona-gladwin Apr 11, 2024
7415447
Minor fix
fiona-gladwin Apr 12, 2024
f6bffef
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Apr 12, 2024
cb034b0
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 12, 2024
d8031b5
Merge remote-tracking branch 'swe_fork/swbs/audio/pr2' into swbs/audi…
swetha097 Apr 12, 2024
d894aba
Fix merge from PR 2
swetha097 Apr 12, 2024
689c55f
Minor changes shard_count argument name
fiona-gladwin Apr 12, 2024
1079d50
Rename set and get functions of data_info to decoded_data_info
fiona-gladwin Apr 12, 2024
1f63cab
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 12, 2024
2967b68
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Apr 16, 2024
36a9516
Merge branch 'audio_pr4' into swbs/audio/pr5
SundarRajan28 Apr 17, 2024
fb7a52b
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Apr 17, 2024
42d1bb1
Merge remote-tracking branch 'upstream/develop' into swbs/audio/pr2
SundarRajan28 Apr 17, 2024
3375f41
Merge branch 'swbs/audio/pr2' into swbs/audio/pr3
SundarRajan28 Apr 17, 2024
d7c8884
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Apr 17, 2024
513fd78
Merge branch 'audio_pr4' into swbs/audio/pr5
SundarRajan28 Apr 17, 2024
44cefd6
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
SundarRajan28 Apr 17, 2024
d928c48
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Apr 17, 2024
c0d2309
Merge branch 'swbs/audio/pr1' into swbs/audio/pr2
fiona-gladwin Apr 17, 2024
c01325d
Revert empty line removed in CMakeLists.txt
fiona-gladwin Apr 17, 2024
549def5
Removed prefix original for audio vectors
fiona-gladwin Apr 17, 2024
c1d9cc5
Resolve PR comments
swetha097 Apr 18, 2024
7874f09
Add @params to all args in pytorch.py
swetha097 Apr 18, 2024
ef9a21b
Fix build issue
swetha097 Apr 18, 2024
0f48da9
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Apr 22, 2024
37921de
Minor changes in unit test
swetha097 Apr 22, 2024
96ace00
Merge branch 'swbs/audio/pr2' of https://github.com/swetha097/rocAL i…
swetha097 Apr 22, 2024
6602895
Minor changes
swetha097 Apr 22, 2024
aa13a35
Change ROCAL instaces to rocAL in pytorch.py
swetha097 Apr 22, 2024
2873d8c
Merge branch 'swbs/audio/pr2' into swbs/audio/pr3
fiona-gladwin Apr 22, 2024
2dd31f8
Resolve the PR comments
swetha097 Apr 23, 2024
1cd9779
Merge branch 'swbs/audio/pr3' of https://github.com/swetha097/rocAL i…
swetha097 Apr 23, 2024
d1d5241
Minor changes in decoders.py - Modify the comment for shard_size
swetha097 Apr 23, 2024
f4bcbca
Merge branch 'swbs/audio/pr2' of https://github.com/swetha097/rocAL i…
fiona-gladwin Apr 23, 2024
d152dca
Merge branch 'swbs/audio/pr3' of https://github.com/swetha097/rocAL i…
fiona-gladwin Apr 23, 2024
e4c5788
Merge branch 'develop' of https://github.com/ROCm/rocAL into swbs/aud…
fiona-gladwin Apr 23, 2024
fb33f06
Merge branch 'swbs/audio/pr3' into audio_pr4
SundarRajan28 Apr 24, 2024
be416ef
Minor changes
swetha097 Apr 24, 2024
8a7bb3c
Address the PR comments
swetha097 Apr 25, 2024
2021ab9
Address Review comments
swetha097 Apr 25, 2024
0c900a9
Introduce Audio layouts
fiona-gladwin May 9, 2024
e75616c
Add layout changes for spectrogram
fiona-gladwin May 9, 2024
e7ed0d8
Fix the unit tests - c++ & python
swetha097 May 9, 2024
528a87a
Merge branch 'swbs/audio/pr5' of https://github.com/swetha097/rocAL i…
fiona-gladwin May 9, 2024
ab993d0
Minor fix
fiona-gladwin May 10, 2024
9757256
Adding changes for spec layout changes
SundarRajan28 May 15, 2024
ce91644
Merge branch 'swbs/audio/pr5_layout' into swbs/audio/pr5
fiona-gladwin May 17, 2024
60133c6
Merge branch 'swbs/audio/pr5' into swbs/audio/pr6
fiona-gladwin May 17, 2024
c41f363
Merge remote-tracking branch 'open_source/develop' into swbs/audio/pr3
swetha097 May 17, 2024
70e12cd
Merge branch 'swbs/audio/pr3' into audio_pr4
swetha097 May 18, 2024
b858b69
Merge branch 'audio_pr4' into swbs/audio/pr5
swetha097 May 18, 2024
5e79034
Merge remote-tracking branch 'origin/swbs/audio/pr5' into HEAD
swetha097 May 18, 2024
66be5a2
Merge branch 'temp_swbs/audio/pr6' into swbs/audio/pr6
swetha097 May 19, 2024
c91794b
Merge remote-tracking branch 'upstream/develop' into swbs/audio/pr6
SundarRajan28 Jun 5, 2024
f7b83c7
Merge remote-tracking branch 'upstream/develop' into swbs/audio/pr6
SundarRajan28 Jun 12, 2024
f5223a9
Fix merge conflicts
SundarRajan28 Jun 13, 2024
8887b6d
Merge remote-tracking branch 'upstream/develop' into swbs/audio/pr6
SundarRajan28 Jun 14, 2024
42a9144
Resolving review comments
SundarRajan28 Jun 17, 2024
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -17,6 +17,11 @@
* Pytorch iterator for Audio
* Python audio unit test, and support to verify outputs
* rocDecode for HW decode
* Support for Audio augmentation - PreEmphasis filter
* Support for reading from file lists in file reader
* Support for Audio augmentation - Spectrogram
* Support for Audio augmentation - ToDecibels
* Support for downmixing audio channels during decoding

### Optimizations

65 changes: 65 additions & 0 deletions rocAL/include/api/rocal_api_augmentation.h
@@ -1098,4 +1098,69 @@ extern "C" RocalTensor ROCAL_API_CALL rocalSSDRandomCrop(RocalContext context, R
RocalTensorLayout output_layout = ROCAL_NONE,
RocalTensorOutputType output_datatype = ROCAL_UINT8);

/*! \brief Applies preemphasis filter to the input data.
* \ingroup group_rocal_augmentations
* \param [in] context Rocal context
* \param [in] input Input Rocal tensor
* \param [in] is_output Set to true if the output tensor is part of the graph output
* \param [in] preemph_coeff Preemphasis coefficient
* \param [in] preemph_border_type Border value policy. Possible values are ROCAL_ZERO, ROCAL_CLAMP and ROCAL_REFLECT.
* \param [in] output_datatype The data type of the output tensor
* \return RocalTensor
*/
extern "C" RocalTensor ROCAL_API_CALL rocalPreEmphasisFilter(RocalContext context,
RocalTensor input,
bool is_output,
RocalFloatParam preemph_coeff = NULL,
RocalAudioBorderType preemph_border_type = RocalAudioBorderType::ROCAL_CLAMP,
RocalTensorOutputType output_datatype = ROCAL_FP32);
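
The border policy documented above is easiest to see in a small reference implementation. The following is a minimal CPU sketch, assuming the conventional pre-emphasis definition `out[n] = in[n] - coeff * in[n - 1]`, where the border type only decides what stands in for the missing sample before `in[0]`. `BorderType` and `pre_emphasis` are hypothetical names used for illustration; they are not part of the rocAL API, and the actual kernel behind `rocalPreEmphasisFilter` may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for RocalAudioBorderType (same numeric values as in the diff).
enum class BorderType { Zero = 0, Clamp = 1, Reflect = 2 };

// Reference pre-emphasis: out[n] = in[n] - coeff * in[n - 1].
// The border policy only affects the first output sample.
std::vector<float> pre_emphasis(const std::vector<float> &in, float coeff, BorderType border) {
    // Assumption of this sketch: a typical coefficient such as 0.97 lies in [0, 1].
    assert(coeff >= 0.0f && coeff <= 1.0f);
    std::vector<float> out(in.size());
    if (in.empty()) return out;

    float previous = 0.0f;                         // Zero: in[-1] is treated as 0
    if (border == BorderType::Clamp) {
        previous = in[0];                          // Clamp: in[-1] repeats in[0]
    } else if (border == BorderType::Reflect) {
        previous = in.size() > 1 ? in[1] : in[0];  // Reflect: in[-1] mirrors in[1]
    }
    out[0] = in[0] - coeff * previous;
    for (std::size_t n = 1; n < in.size(); ++n) {
        out[n] = in[n] - coeff * in[n - 1];
    }
    return out;
}
```

With an input of `{1, 2, 3}` and a coefficient of `0.5`, only the first output sample depends on the policy; the remaining samples are identical for all three border types.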

/*! \brief Produces a spectrogram from a 1D audio signal.
* \ingroup group_rocal_augmentations
* \param [in] context Rocal context
* \param [in] input Input Rocal tensor
* \param [in] is_output Set to true if the output tensor is part of the graph output
* \param [in] window_fn Values of the window function
* \param [in] center_windows Specifies whether extracted windows should be padded so that the window function is centered at multiples of window_step
* \param [in] reflect_padding Indicates the padding policy when sampling outside the bounds of the audio data
* \param [in] power Exponent of the magnitude of the spectrum
* \param [in] nfft Size of the Fast Fourier transform (FFT)
* \param [in] window_length Window size in number of samples
* \param [in] window_step Step between the Short-time Fourier transform (STFT) windows in number of samples
* \param [in] output_layout Layout of the output spectrogram
* \param [in] output_datatype The data type of the output tensor
* \return RocalTensor
*/
extern "C" RocalTensor ROCAL_API_CALL rocalSpectrogram(RocalContext context,
RocalTensor input,
bool is_output,
std::vector<float> &window_fn,
bool center_windows,
bool reflect_padding,
int power,
int nfft,
int window_length = 512,
int window_step = 256,
RocalTensorLayout output_layout = ROCAL_NFT,
RocalTensorOutputType output_datatype = ROCAL_FP32);
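
To make the interaction of `nfft`, `window_length`, `window_step` and `center_windows` concrete, the sketch below computes the shape a spectrogram would have under a common STFT convention: a one-sided spectrum with `nfft / 2 + 1` frequency bins, and one window per `window_step` samples. This convention is an assumption, not something the diff states, and `SpectrogramShape` and `spectrogram_shape` are hypothetical helper names rather than rocAL functions.

```cpp
#include <cassert>

// Hypothetical helper type: number of frequency bins and time windows.
struct SpectrogramShape {
    int freq_bins;
    int windows;
};

// Shape of an STFT-based spectrogram under a common convention (assumed here).
SpectrogramShape spectrogram_shape(int num_samples, int nfft, int window_length,
                                   int window_step, bool center_windows) {
    assert(num_samples >= 0 && nfft > 0 && window_length > 0 && window_step > 0);
    SpectrogramShape shape;
    shape.freq_bins = nfft / 2 + 1;  // one-sided spectrum of a real-valued signal
    if (center_windows) {
        // Windows are centered at multiples of window_step; the edges are padded.
        shape.windows = num_samples / window_step + 1;
    } else if (num_samples < window_length) {
        shape.windows = 0;           // not enough samples for a single full window
    } else {
        // Only windows that fit entirely inside the signal are extracted.
        shape.windows = (num_samples - window_length) / window_step + 1;
    }
    return shape;
}
```

For 1024 samples with `nfft = 512` and the default `window_length = 512`, `window_step = 256`, this convention gives 257 frequency bins, with 5 windows when centering is enabled and 3 when it is not.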

/*! \brief Converts the input magnitudes to the decibel scale.
* \ingroup group_rocal_augmentations
* \param [in] p_context Rocal context
* \param [in] p_input Input Rocal tensor
* \param [in] is_output Set to true if the output tensor is part of the graph output
* \param [in] cutoff_db Minimum or cut-off ratio in dB
* \param [in] multiplier Factor by which the logarithm is multiplied
* \param [in] reference_magnitude Reference magnitude; if not provided, the maximum value of the input is used as the reference
* \param [in] rocal_tensor_output_type The data type of the output tensor
* \return RocalTensor
*/
extern "C" RocalTensor ROCAL_API_CALL rocalToDecibels(RocalContext p_context,
RocalTensor p_input,
bool is_output,
float cutoff_db,
float multiplier,
float reference_magnitude,
RocalTensorOutputType rocal_tensor_output_type);
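
The three parameters `cutoff_db`, `multiplier` and `reference_magnitude` fit the usual decibel conversion `multiplier * log10(max(x / reference, min_ratio))` with `min_ratio = 10^(cutoff_db / multiplier)`. The sketch below is a plain CPU reference under that assumption. `to_decibels` is a hypothetical function, not the rocAL implementation, and treating a non-positive `reference_magnitude` as "not provided" is a convention chosen for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Reference decibel conversion (assumed formula, see lead-in).
std::vector<float> to_decibels(const std::vector<float> &in, float cutoff_db,
                               float multiplier, float reference_magnitude) {
    assert(multiplier != 0.0f);
    float reference = reference_magnitude;
    if (reference <= 0.0f && !in.empty()) {
        // Assumption: a non-positive reference means "use the maximum of the input".
        reference = *std::max_element(in.begin(), in.end());
    }
    if (reference <= 0.0f) reference = 1.0f;  // guard against division by zero

    // Ratios below this value are clamped, so the output never drops below cutoff_db.
    const float min_ratio = std::pow(10.0f, cutoff_db / multiplier);

    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float ratio = in[i] / reference;
        out[i] = multiplier * std::log10(std::max(ratio, min_ratio));
    }
    return out;
}
```

A multiplier of 10 corresponds to power quantities and 20 to amplitude quantities; the cutoff keeps silent input from producing negative infinity.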

#endif // MIVISIONX_ROCAL_API_AUGMENTATION_H
8 changes: 6 additions & 2 deletions rocAL/include/api/rocal_api_data_loaders.h
@@ -882,7 +882,8 @@ extern "C" RocalTensor ROCAL_API_CALL rocalJpegExternalFileSource(RocalContext p
/*! Creates Audio file reader and decoder. It allocates the resources and objects required to read and decode audio files stored on the file system. It has internal sharding capability to load/decode in parallel if the user wants.
* If the files are not in a standard audio compression format they will be ignored. Currently the wav format is supported.
* \param [in] context Rocal context
- * \param [in] source_path A NULL terminated char string pointing to the location of files on the disk
+ * \param [in] source_path A NULL terminated char string pointing to the location on the disk
* \param [in] source_file_list_path A char string pointing to the file list location on the disk
* \param [in] shard_count Defines the parallelism level by internally sharding the input dataset and load/decode using multiple decoder/loader instances. Using shard counts bigger than 1 improves the load/decode performance if compute resources (CPU cores) are available.
* \param [in] is_output Boolean variable to enable the audio to be part of the output.
* \param [in] shuffle Boolean variable to shuffle the dataset.
@@ -892,6 +893,7 @@ extern "C" RocalTensor ROCAL_API_CALL rocalJpegExternalFileSource(RocalContext p
*/
extern "C" RocalTensor ROCAL_API_CALL rocalAudioFileSource(RocalContext context,
const char* source_path,
const char* source_file_list_path,
unsigned shard_count,
bool is_output,
bool shuffle = false,
@@ -901,7 +903,8 @@ extern "C" RocalTensor ROCAL_API_CALL rocalAudioFileSource(RocalContext context,
/*! Creates Audio file reader and decoder. It allocates the resources and objects required to read and decode audio files stored on the file system. It has internal sharding capability to load/decode in parallel if the user wants.
* If the files are not in a standard audio compression format they will be ignored.
* \param [in] context Rocal context
- * \param [in] source_path A NULL terminated char string pointing to the location of files on the disk
+ * \param [in] source_path A NULL terminated char string pointing to the location on the disk
* \param [in] source_file_list_path A char string pointing to the file list location on the disk
* \param [in] shard_id Shard id for this loader
* \param [in] shard_count Defines the parallelism level by internally sharding the input dataset and load/decode using multiple decoder/loader instances. Using shard counts bigger than 1 improves the load/decode performance if compute resources (CPU cores) are available.
* \param [in] is_output Boolean variable to enable the audio to be part of the output.
@@ -912,6 +915,7 @@ extern "C" RocalTensor ROCAL_API_CALL rocalAudioFileSource(RocalContext context,
*/
extern "C" RocalTensor ROCAL_API_CALL rocalAudioFileSourceSingleShard(RocalContext p_context,
const char* source_path,
const char* source_file_list_path,
unsigned shard_id,
unsigned shard_count,
bool is_output,
3 changes: 2 additions & 1 deletion rocAL/include/api/rocal_api_meta_data.h
@@ -36,9 +36,10 @@ THE SOFTWARE.
* \ingroup group_rocal_meta_data
* \param [in] rocal_context rocal context
* \param [in] source_path path to the folder that contains the dataset or metadata file
* \param [in] file_list_path Path to the file list that contains the file names and their corresponding labels
* \return RocalMetaData object, can be used to inquire about the rocal's output (processed) tensors
*/
- extern "C" RocalMetaData ROCAL_API_CALL rocalCreateLabelReader(RocalContext rocal_context, const char* source_path);
+ extern "C" RocalMetaData ROCAL_API_CALL rocalCreateLabelReader(RocalContext rocal_context, const char* source_path, const char* file_list_path = "");

/*! \brief creates video label reader
* \ingroup group_rocal_meta_data
36 changes: 34 additions & 2 deletions rocAL/include/api/rocal_api_types.h
@@ -218,9 +218,20 @@ enum RocalTensorLayout {
/*! \brief AMD ROCAL_NFCHW
*/
ROCAL_NFCHW = 3,
/*! \brief AMD ROCAL_NHW
*/
ROCAL_NHW = 4,
/*! \brief AMD ROCAL_NFT
* Spectrogram Layout FT
*/
ROCAL_NFT = 5,
/*! \brief AMD ROCAL_NTF
* Spectrogram Layout TF
*/
ROCAL_NTF = 6,
/*! \brief AMD ROCAL_NONE
*/
- ROCAL_NONE = 4 // Layout for generic tensors (Non-Image or Non-Video)
+ ROCAL_NONE = 7 // Layout for generic tensors (Non-Image or Non-Video)
};
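
The two spectrogram layouts added above differ only in which axis is contiguous in memory. The sketch below shows the element offset within a single sample, assuming a dense row-major buffer in which `ROCAL_NFT` means (batch, frequency, time) and `ROCAL_NTF` means (batch, time, frequency). That reading of the names, as well as `SpecLayout` and `spectrogram_index`, are assumptions made for illustration and are not taken from rocAL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the ROCAL_NFT and ROCAL_NTF layouts.
enum class SpecLayout { FT, TF };

// Offset of the element at frequency bin f and time window t within one sample,
// assuming a dense row-major buffer.
std::size_t spectrogram_index(SpecLayout layout, std::size_t f, std::size_t t,
                              std::size_t num_freq, std::size_t num_time) {
    assert(f < num_freq && t < num_time);
    if (layout == SpecLayout::FT) {
        return f * num_time + t;  // time is the fastest-varying axis
    }
    return t * num_freq + f;      // frequency is the fastest-varying axis
}
```

Code that indexes a spectrogram buffer directly therefore needs to know which layout was requested, since the same (f, t) pair maps to different offsets.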

/*! \brief rocAL Tensor Output Type enum
@@ -238,7 +249,13 @@ enum RocalTensorOutputType {
ROCAL_UINT8 = 2,
/*! \brief AMD ROCAL_INT8
*/
- ROCAL_INT8 = 3
+ ROCAL_INT8 = 3,
/*! \brief AMD ROCAL_UINT32
*/
ROCAL_UINT32 = 4,
/*! \brief AMD ROCAL_INT32
*/
ROCAL_INT32 = 5
};

/*! \brief rocAL Decoder Type enum
@@ -377,6 +394,21 @@ enum RocalExternalSourceMode {
ROCAL_EXTSOURCE_RAW_UNCOMPRESSED = 2,
};

/*! \brief rocAL Audio Border Type enum
* \ingroup group_rocal_types
*/
enum RocalAudioBorderType {
/*! \brief AMD ROCAL_ZERO
*/
ROCAL_ZERO = 0,
/*! \brief AMD ROCAL_CLAMP
*/
ROCAL_CLAMP = 1,
/*! \brief AMD ROCAL_REFLECT
*/
ROCAL_REFLECT = 2
};

/*! \brief Tensor Last Batch Policies
* \ingroup group_rocal_types
*/
34 changes: 34 additions & 0 deletions rocAL/include/augmentations/audio_augmentations/node_downmix.h
@@ -0,0 +1,34 @@
/*
Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once
#include "pipeline/graph.h"
#include "pipeline/node.h"
class DownmixNode : public Node {
public:
DownmixNode(const std::vector<Tensor *> &inputs, const std::vector<Tensor *> &outputs);
DownmixNode() = delete;

protected:
void create_node() override;
void update_node() override;
};
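
The header above only declares the node, so the sketch below illustrates what downmixing typically means: collapsing interleaved multi-channel audio to mono by averaging the channels of each frame. Averaging is an assumption about the operation, not something this diff confirms, and `downmix_to_mono` is a hypothetical free function rather than part of `DownmixNode`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Averages the channels of each frame of an interleaved buffer
// (frame 0: ch0, ch1, ...; frame 1: ch0, ch1, ...) into one mono sample.
std::vector<float> downmix_to_mono(const std::vector<float> &interleaved, int channels) {
    assert(channels > 0);
    const std::size_t num_channels = static_cast<std::size_t>(channels);
    assert(interleaved.size() % num_channels == 0);

    const std::size_t frames = interleaved.size() / num_channels;
    std::vector<float> mono(frames);
    for (std::size_t frame = 0; frame < frames; ++frame) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < num_channels; ++c) {
            sum += interleaved[frame * num_channels + c];
        }
        mono[frame] = sum / static_cast<float>(num_channels);
    }
    return mono;
}
```

Mono input passes through unchanged, and the output has one sample per frame, which is why the output region of interest has to be updated after downmixing.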
@@ -0,0 +1,44 @@
/*
Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once
#include "pipeline/graph.h"
#include "pipeline/node.h"
#include "parameters/parameter_factory.h"
#include "parameters/parameter_vx.h"
#include "rocal_api_types.h"

class PreemphasisFilterNode : public Node {
   public:
    PreemphasisFilterNode(const std::vector<Tensor *> &inputs, const std::vector<Tensor *> &outputs);
    PreemphasisFilterNode() = delete;
    void init(FloatParam *preemph_coeff, RocalAudioBorderType preemph_border);

   protected:
    void create_node() override;
    void update_node() override;

   private:
    ParameterVX<float> _preemph_coeff;
    constexpr static float PREEMPH_COEFF_RANGE[2] = {0.97, 0.97};
    RocalAudioBorderType _preemph_border;
};
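Reviewer note: `init` takes a coefficient and a border type, which matches the common first-order pre-emphasis filter `y[t] = x[t] - coeff * x[t - 1]`, where the border type decides what stands in for `x[-1]`. The sketch below shows that idea under stated assumptions: the `BorderSketch` enum and its three modes are hypothetical stand-ins for `RocalAudioBorderType`, and the actual kernel is not shown in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for RocalAudioBorderType; names are illustrative only.
enum class BorderSketch { Zero, Clamp, Reflect };

// Applies y[t] = x[t] - coeff * x[t - 1]; the border mode picks the value used as x[-1].
std::vector<float> preemphasis(const std::vector<float> &x, float coeff, BorderSketch border) {
    assert(coeff >= 0.0f && coeff <= 1.0f);
    std::vector<float> y(x.size());
    if (x.empty()) return y;

    float prev = 0.0f;  // Zero: treat the sample before the signal as silence
    if (border == BorderSketch::Clamp)
        prev = x[0];  // Clamp: repeat the first sample
    else if (border == BorderSketch::Reflect)
        prev = x.size() > 1 ? x[1] : x[0];  // Reflect: mirror around the first sample

    y[0] = x[0] - coeff * prev;
    for (std::size_t t = 1; t < x.size(); t++)
        y[t] = x[t] - coeff * x[t - 1];
    return y;
}
```

Only the first output sample differs between border modes; every later sample uses the real previous input.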
58 changes: 58 additions & 0 deletions rocAL/include/augmentations/audio_augmentations/node_spectrogram.h
@@ -0,0 +1,58 @@
/*
Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once
#include <cmath>

#include "pipeline/graph.h"
#include "pipeline/node.h"
#include "rocal_api_types.h"

/// @brief Generates hann window for spectrogram
/// @param output Buffer that receives window_size window coefficients
/// @param window_size Number of samples in the window
inline void hann_window(float *output, int window_size) {
    double a = (2.0 * M_PI) / window_size;
    for (int t = 0; t < window_size; t++) {
        double phase = a * (t + 0.5);  // half-sample offset keeps the window symmetric
        output[t] = (0.5 * (1.0 - std::cos(phase)));
    }
}
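
Reviewer note: because the phase is evaluated at `t + 0.5`, the window is symmetric about its centre and never reaches exactly zero at the ends. The sketch below restates the same formula in a small self-contained helper so this can be checked in isolation; `make_hann_window` is a hypothetical name used only for this example, and it uses a local pi constant instead of `M_PI`.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Same formula as hann_window in the header, restated with a local pi constant
// so the sketch does not depend on the non-standard M_PI macro.
std::vector<float> make_hann_window(int window_size) {
    assert(window_size > 0);
    const double pi = std::acos(-1.0);
    const double a = (2.0 * pi) / window_size;
    std::vector<float> window(window_size);
    for (int t = 0; t < window_size; t++)
        window[t] = static_cast<float>(0.5 * (1.0 - std::cos(a * (t + 0.5))));
    return window;
}
```

For `window_size = 4` the phases are pi/4, 3pi/4, 5pi/4 and 7pi/4, giving roughly `{0.146, 0.854, 0.854, 0.146}`.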

class SpectrogramNode : public Node {
   public:
    SpectrogramNode(const std::vector<Tensor *> &inputs, const std::vector<Tensor *> &outputs);
    SpectrogramNode() = delete;
    void init(bool is_center_windows, bool is_reflect_padding, int power, int nfft,
              int window_length, int window_step, std::vector<float> &window_fn);

   protected:
    void create_node() override;
    void update_node() override;

   private:
    std::vector<float> _window_fn;
    int _power = 2;
    int _nfft = 2048;
    int _window_length = 512;
    int _window_step = 256;
    bool _is_center_windows = true;
    bool _is_reflect_padding = true;
};
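Reviewer note: the parameters above determine the spectrogram's output size. The diff does not show how rocAL computes it, so the following is only a sketch under the usual STFT conventions: `nfft / 2 + 1` frequency bins for a real input, one window per step when windows are centered (the signal is padded at both ends), and only fully contained windows otherwise. `SpectrogramShape` and `spectrogram_shape` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical result type for the sketch: frequency bins x time windows.
struct SpectrogramShape {
    int bins;
    int windows;
};

// Assumed conventions (not confirmed by this diff):
//  - a real-input FFT of size nfft yields nfft / 2 + 1 bins
//  - centered windows: one window every window_step samples, padding covers the edges
//  - non-centered windows: only windows that fit entirely inside the signal
SpectrogramShape spectrogram_shape(int num_samples, int nfft, int window_length,
                                   int window_step, bool center_windows) {
    assert(num_samples >= 0 && nfft > 0 && window_length > 0 && window_step > 0);
    SpectrogramShape shape{};
    shape.bins = nfft / 2 + 1;
    if (center_windows)
        shape.windows = num_samples / window_step + 1;
    else
        shape.windows = num_samples >= window_length
                            ? (num_samples - window_length) / window_step + 1
                            : 0;
    return shape;
}
```

With the defaults in the header (`nfft = 2048`, `window_length = 512`, `window_step = 256`) and a 16000-sample input, these conventions give 1025 bins and 63 centered windows.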
@@ -0,0 +1,41 @@
/*
Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
*/

#pragma once
#include "pipeline/graph.h"
#include "pipeline/node.h"

class ToDecibelsNode : public Node {
   public:
    ToDecibelsNode(const std::vector<Tensor *> &inputs, const std::vector<Tensor *> &outputs);
    ToDecibelsNode() = delete;
    void init(float cutoff_db, float multiplier, float reference_magnitude);

   protected:
    void create_node() override;
    void update_node() override;

   private:
    float _cutoff_db = -200.0;
    float _multiplier = 10.0;
    float _reference_magnitude = 0.0;
};
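Reviewer note: the three parameters map onto the standard decibel conversion `multiplier * log10(magnitude / reference)`, with `cutoff_db` acting as a floor on the result. The sketch below illustrates that formula only; `to_decibels` is a hypothetical helper, and it requires a positive reference, so it does not model whatever rocAL does with the header's default `_reference_magnitude = 0.0`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not rocAL's kernel): converts one magnitude to decibels.
// Ratios below 10^(cutoff_db / multiplier) are clamped, so the result never
// drops under cutoff_db and a zero magnitude does not produce -infinity.
float to_decibels(float magnitude, float cutoff_db, float multiplier, float reference) {
    assert(reference > 0.0f && multiplier > 0.0f);
    const double min_ratio = std::pow(10.0, static_cast<double>(cutoff_db) / multiplier);
    const double ratio = std::max(min_ratio, static_cast<double>(magnitude) / reference);
    return static_cast<float>(multiplier * std::log10(ratio));
}
```

A multiplier of 10 suits power spectrograms and 20 suits amplitude spectrograms, which is consistent with the header's default of 10 alongside `_power = 2` in `SpectrogramNode`.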
3 changes: 3 additions & 0 deletions rocAL/include/augmentations/augmentations_nodes.h
@@ -55,3 +55,6 @@ THE SOFTWARE.
#include "augmentations/node_copy.h"
#include "augmentations/node_nop.h"
#include "augmentations/node_sequence_rearrange.h"
#include "augmentations/audio_augmentations/node_preemphasis_filter.h"
#include "augmentations/audio_augmentations/node_spectrogram.h"
#include "augmentations/audio_augmentations/node_to_decibels.h"
Contributor

@SundarRajan28 SundarRajan28 Jun 14, 2024

The Downmix augmentation is used in the audio loader function, so it is added in rocal_api_dataloader.cpp.

3 changes: 3 additions & 0 deletions rocAL/include/pipeline/commons.h
@@ -46,6 +46,9 @@ enum class RocalTensorlayout {
    NCHW,
    NFHWC,
    NFCHW,
    NHW,
    NFT,
    NTF,
    NONE
};

2 changes: 1 addition & 1 deletion rocAL/include/pipeline/master_graph.h
@@ -110,7 +110,7 @@ class MasterGraph {
std::shared_ptr<T> meta_add_node(std::shared_ptr<M> node);
Tensor *create_tensor(const TensorInfo &info, bool is_output);
Tensor *create_loader_output_tensor(const TensorInfo &info);
std::vector<rocalTensorList *> create_label_reader(const char *source_path, MetaDataReaderType reader_type);
std::vector<rocalTensorList *> create_label_reader(const char *source_path, MetaDataReaderType reader_type, const char *file_list_path = "");
std::vector<rocalTensorList *> create_video_label_reader(const char *source_path, MetaDataReaderType reader_type, unsigned sequence_length, unsigned frame_step, unsigned frame_stride, bool file_list_frame_num = true);
std::vector<rocalTensorList *> create_coco_meta_data_reader(const char *source_path, bool is_output, MetaDataReaderType reader_type, MetaDataType label_type, bool ltrb_bbox = true, bool is_box_encoder = false,
bool avoid_class_remapping = false, bool aspect_ratio_grouping = false, bool is_box_iou_matcher = false, float sigma = 0.0, unsigned pose_output_width = 0, unsigned pose_output_height = 0);