HuggingFace

Compare model performance with three different transformation types

A cross-validated approach to verify the impact of a feature on a model

In this notebook, I walk you through three different transformation types for audio (wav) files for a ten-class classification problem. In this example, I am using a vision-based algorithm; hence it is easy to visualize the importance of features from a visual perspective and their impact on model performance.

The three different transformation types are:

Linear Spectrograms
Log Spectrograms
Mel Spectrograms

You can learn more about these three transformations in Scott Duda's article and Ketan Doshi's writing, reasoning why Mel Spectrograms perform better in general for visual transformations of audio files.

This notebook will test these three transforms on this Urban Sounds 8K dataset and how they perform with a pre-trained vision-based model (Resnet-34) leveraging Fastaiv2. This notebook converts these sounds to a spectrogram then uses FastAI2 code base to classify these sounds. Code and approach in this notebook

There are ten folders in this dataset as part of the data source and we will approach this as a ten-fold cross-validation for a proper comparative metric with other research papers.

About the UrbanSounds8K dataset

Urban Sounds is a dataset of 8732 labeled sounds of less than 4 seconds each from 10 classes. Dataset for UrbanSounds8K contains these 10 classes:

air_conditioner
car_horn
children_playing
dog_bark
drilling
engine_idling
gun_shot
jackhammer
siren
street_music

Research with this dataset as of 2019 and optimized ML approaches as of late 2019 had classification accuracy at 74% with a k-nearest neighbours (KNN) algorithm. A deep learning neural network trained from scratch obtained accuracy at 76% accuracy.

(accuracy metrics for research article)

Setup Section

The installs and includes

You could use a non-GPU machine type for some file conversions as they are computationally expensive. For this, I am using an ml.p3.2xlarge. For the deep learning model, you could run it just as well on an ml.g4.2xlarge at a reduced cost.

#collapse-hide
#One time installs  - On AWS useconda_pytorch_p38 environment and add using ml.p3.2xlarge for this notebook
# !pip install librosa
# !pip install fastbook

#collapse-hide
#all the one time imports for this nb
import pandas as pd

from fastai.vision.all import *
from fastai.data.all import *
import matplotlib.pyplot as plt
from matplotlib.pyplot import specgram
import librosa
import librosa.display
import numpy as np
from pathlib import Path
import os
import random
import IPython
from tqdm import tqdm

from collections import OrderedDict

Download files from source then uncompress

These subsequent steps ensure adequate space to get the initial files and following transformations. I set aside 100GB.

File classification information

Note that this file provides classification information once unpacked.

	slice_file_name	fsID	start	end	salience	fold	classID	class
0	100032-3-0-0.wav	100032	0.0	0.317551	1	5	3	dog_bark
1	100263-2-0-117.wav	100263	58.5	62.500000	1	5	2	children_playing
2	100263-2-0-121.wav	100263	60.5	64.500000	1	5	2	children_playing
3	100263-2-0-126.wav	100263	63.0	67.000000	1	5	2	children_playing
4	100263-2-0-137.wav	100263	68.5	72.500000	1	5	2	children_playing

Data exploration

Class distribution across the sound types

#collapse-hide
df.groupby('class').classID.count().sort_values(ascending=False).plot.bar()
plt.ylabel('count')
plt.title('Class distribution in the dataset')

Text(0.5, 1.0, 'Class distribution in the dataset')

These classification files show that overall classes are represented, with a gunshot and car horns being slightly underrepresented relative to all others.

Each of the folds has a relatively similar number of wav files.

#collapse-hide
df.groupby(['fold']).classID.count().sort_values(ascending=False).plot.bar()
plt.ylabel('Files in each fold')
plt.title('Files in each fold')

Text(0.5, 1.0, 'Files in each fold')

Inspect the files - audio and single tranform of a audio file

Listen to a sample wav file - it happens to be a dog barking, an urban sound.

#collapse-hide

audio_file= 'UrbanSound8K/audio/fold5/100032-3-0-0.wav'   #dog bark in fold 5

IPython.display.Audio(audio_file)

Your browser does not support the audio element.

Single File Transformation

Linear Spectrogram

With the librosa module, these are the steps to convert a single wave file to a simple spectrogram. We stay with the dog barking audio representation as a spectrogram.

#collapse-hide

samples, sample_rate = librosa.load(audio_file)
Ydb = librosa.amplitude_to_db(librosa.stft(samples), ref=sample_rate)
plt.figure(figsize=(18, 6))
librosa.display.specshow(Ydb, sr=sample_rate, x_axis='time', y_axis='linear')
plt.colorbar()

/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/librosa/util/decorators.py:88: UserWarning: amplitude_to_db was called on complex input so phase information will be discarded. To suppress this warning, call amplitude_to_db(np.abs(S)) instead.
  return f(*args, **kwargs)





<matplotlib.colorbar.Colorbar at 0x7f86eeaa0be0>

Log Spectrogram

Again, a spectrogram can have a log representation for the dog barking in log space.

#collapse-hide

plt.figure(figsize=(18, 6))
librosa.display.specshow(Ydb, sr=sample_rate, x_axis='time', y_axis='log')
plt.colorbar()

<matplotlib.colorbar.Colorbar at 0x7f86eec16ac0>

Mel Spectrogram

And finally, this code transforms the same audio file into a mel-spectrogram. Note the added level of intensity and representation in a mel-spectrogram.

#collapse-hide

S = librosa.feature.melspectrogram(y=samples, sr=sample_rate)
Sdb = librosa.power_to_db(S, ref=np.max)
plt.figure(figsize=(18, 6))
librosa.display.specshow(Sdb, sr=sample_rate, x_axis='time', y_axis='mel')
plt.colorbar()

<matplotlib.colorbar.Colorbar at 0x7f86eec09910>

8K Files into Three Transformations

The below code creates folders for the transformed image files from the 8K wav files. Note the minor change from the single-file approach for images - we can drop the axes as machines don't need these - a picture without axes works best.

This step takes a significant amount of time; it needs to be done once.

#collapse-hide

audio_path = Path('UrbanSound8K/audio/')  # un zipped source audio files are in this location as wav files
tranform_store_path = 'UrbanSoundTransforms/'  #destination folder for each transformed image state

#collapse-hide

#make initial folders once
#os.mkdir(tranform_store_path)
# os.mkdir(tranform_store_path +'linear_spectrogram')
# os.mkdir(tranform_store_path +'log_spectrogram')
# os.mkdir(tranform_store_path +'mel_spectrogram')

Validate all files are transformed in destination folds

Validate that all 8K files converted into the three types of transformations

#collapse-show
transforms = ['linear_spectrogram/','log_spectrogram/','mel_spectrogram/']
for transform in transforms:
    count = 0
    for fold in np.arange (1,11):
        count += len(list(Path(tranform_store_path+transform+str(fold)).glob('*.png')))
    print ('%s file count is %s'%(transform[:-1],count))
    assert (len(df)==count)

linear_spectrogram file count is 8732
log_spectrogram file count is 8732
mel_spectrogram file count is 8732

#collapse-show
classes = OrderedDict(sorted(df.set_index('classID').to_dict()['class'].items()))
classes

OrderedDict([(0, 'air_conditioner'),
             (1, 'car_horn'),
             (2, 'children_playing'),
             (3, 'dog_bark'),
             (4, 'drilling'),
             (5, 'engine_idling'),
             (6, 'gun_shot'),
             (7, 'jackhammer'),
             (8, 'siren'),
             (9, 'street_music')])

Visual inspection for each sound category

In this bit of code, we look at each of the transformed depictions across the ten-classes

#collapse-show
fig, ax = plt.subplots(10,3, figsize=(16,16))
for k,v in classes.items():
    sample = df[df['class']==v].sample(1)
    sample_fold = sample['fold'].values[0]
    sample_file = sample['slice_file_name'].values[0].replace('wav','png')
    t_counter=0
    for transform in transforms:
        img = plt.imread(tranform_store_path+transform+str(sample_fold)+'/'+sample_file)
        ax[k][t_counter].imshow(img, aspect='equal')
        ax[k][t_counter].set_title(v+' transformed with '+ transform[:-1])
        ax[k][t_counter].title.set_size(10)
        ax[k][t_counter].set_axis_off()
        
        t_counter+=1
fig.tight_layout()
plt.show()

Fast AI Model Build

From the reference file where our sources were in wav format, we will change them to png for each file name and create a dictionary objection with class for a filename.

#collapse-show
df['fname'] = df[['slice_file_name','fold']].apply (lambda x: str(x['slice_file_name'][:-4])+'.png'.strip(),axis=1 )

This labelling function uses the dictionary object and returns the class. We drop the parts of the path and focus on the filename, which is unique across the 8K files.

#collapse-show
my_dict = dict(zip(df.fname,df['class']))

#collapse-show
def label_func(f_name):
    f_name = str(f_name).split('/')[-1:][0]
    return my_dict[f_name]

#collapse-show
all_folds = list(np.arange(1,11))
all_folders = [str(i) for i in all_folds]
all_folders

['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

#collapse-show
results = pd.DataFrame()

We create a list of the ten folds as strings for a downstream code in this bit of code. A Resnet-34 model runs for three epochs for comparative analysis. This code runs the k-fold type prediction with a single fold as a test across the three file transformation folders. This code takes a significant amount of time and uses the GPU.

#collapse-show
for transform in transforms:
    all_files = get_image_files(path=tranform_store_path+transform,recurse=True, folders =all_folders )
    
    for test_folder in all_folds:
        
        dblock = DataBlock(blocks=(ImageBlock,CategoryBlock),
                   get_y     = label_func,
                   
                   splitter  = FuncSplitter(lambda s: Path(s).parent.name==str(test_folder)),
                   
                  )
        dl = dblock.dataloaders(all_files)
        
        print ('Train has {0} images and test has {1} images. Test is on folder {2} of transform type {3}.' .format(len(dl.train_ds),len(dl.valid_ds),test_folder,transform[:-1]))
        learn = vision_learner(dl, resnet34, metrics=accuracy)
        learn.fine_tune(3)
        r = learn.validate()
        results.at[test_folder,transform[:-1]] = r[1]

Train has 7859 images and test has 873 images. Test is on folder 1 of transform type linear_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.646356	1.303435	0.592211	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.734405	1.158944	0.670103	00:41
1	0.349336	1.062735	0.715922	00:41
2	0.120102	1.035805	0.735395	00:42

Train has 7844 images and test has 888 images. Test is on folder 2 of transform type linear_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.618032	1.471006	0.550676	00:32

epoch	train_loss	valid_loss	accuracy	time
0	0.750616	1.395346	0.643018	00:42
1	0.339488	1.213989	0.685811	00:42
2	0.122644	1.223365	0.698198	00:42

Train has 7807 images and test has 925 images. Test is on folder 3 of transform type linear_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.609933	1.212611	0.585946	00:32

epoch	train_loss	valid_loss	accuracy	time
0	0.713603	1.071660	0.663784	00:42
1	0.354776	1.287965	0.656216	00:42
2	0.110639	1.320493	0.678919	00:42

Train has 7742 images and test has 990 images. Test is on folder 4 of transform type linear_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.643499	1.288041	0.597980	00:32

epoch	train_loss	valid_loss	accuracy	time
0	0.720948	1.036399	0.688889	00:42
1	0.335754	0.818471	0.760606	00:42
2	0.113218	0.871289	0.763636	00:42

Train has 7796 images and test has 936 images. Test is on folder 5 of transform type linear_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.648136	1.279705	0.588675	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.761581	0.919619	0.705128	00:42
1	0.345561	0.714063	0.792735	00:42
2	0.121579	0.691167	0.791667	00:42

Train has 7909 images and test has 823 images. Test is on folder 6 of transform type linear_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.640491	1.461258	0.562576	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.722605	1.607899	0.646416	00:43
1	0.325290	1.414237	0.693803	00:43
2	0.110730	1.377959	0.712029	00:43

Train has 7894 images and test has 838 images. Test is on folder 7 of transform type linear_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.594230	1.479910	0.517900	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.705556	0.951557	0.717184	00:43
1	0.306930	1.075511	0.704057	00:43
2	0.109320	0.981054	0.717184	00:43

Train has 7926 images and test has 806 images. Test is on folder 8 of transform type linear_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.618264	1.568802	0.529777	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.736232	1.685889	0.589330	00:43
1	0.322088	1.566928	0.645161	00:43
2	0.113493	1.512315	0.676179	00:43

Train has 7916 images and test has 816 images. Test is on folder 9 of transform type linear_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.662708	1.191227	0.609069	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.738244	0.937917	0.752451	00:43
1	0.342161	0.828607	0.797794	00:43
2	0.110757	0.838220	0.789216	00:43

Train has 7895 images and test has 837 images. Test is on folder 10 of transform type linear_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.639923	1.204564	0.646356	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.715795	0.857385	0.756272	00:43
1	0.351518	0.753865	0.792115	00:43
2	0.122999	0.661824	0.832736	00:43

Train has 7859 images and test has 873 images. Test is on folder 1 of transform type log_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.650926	1.132629	0.621993	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.664109	0.927699	0.726231	00:42
1	0.279372	0.983903	0.741123	00:43
2	0.087220	0.917430	0.746850	00:43

Train has 7844 images and test has 888 images. Test is on folder 2 of transform type log_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.580881	1.473966	0.564189	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.660235	1.674381	0.662162	00:43
1	0.261079	1.347339	0.721847	00:43
2	0.086787	1.303460	0.730856	00:43

Train has 7807 images and test has 925 images. Test is on folder 3 of transform type log_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.554419	1.265764	0.601081	00:32

epoch	train_loss	valid_loss	accuracy	time
0	0.645222	1.155872	0.647568	00:42
1	0.276456	1.479933	0.660541	00:42
2	0.085445	1.405435	0.684324	00:42

Train has 7742 images and test has 990 images. Test is on folder 4 of transform type log_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.574416	1.410470	0.570707	00:32

epoch	train_loss	valid_loss	accuracy	time
0	0.651727	0.948870	0.710101	00:42
1	0.285824	0.963551	0.746465	00:42
2	0.092143	0.833476	0.763636	00:42

Train has 7796 images and test has 936 images. Test is on folder 5 of transform type log_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.584739	1.040504	0.685897	00:32

epoch	train_loss	valid_loss	accuracy	time
0	0.659326	0.788796	0.753205	00:42
1	0.294902	0.901573	0.777778	00:42
2	0.096744	0.684919	0.795940	00:43

Train has 7909 images and test has 823 images. Test is on folder 6 of transform type log_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.568830	1.537969	0.565006	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.629603	1.329358	0.693803	00:43
1	0.270114	1.239127	0.761847	00:43
2	0.093767	1.230025	0.765492	00:43

Train has 7894 images and test has 838 images. Test is on folder 7 of transform type log_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.551806	1.253360	0.609785	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.634409	1.354680	0.698091	00:43
1	0.257725	1.049563	0.757757	00:43
2	0.086514	1.006423	0.762530	00:43

Train has 7926 images and test has 806 images. Test is on folder 8 of transform type log_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.594089	1.724049	0.547146	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.639673	1.433562	0.689826	00:43
1	0.279755	1.512132	0.697270	00:43
2	0.083887	1.416594	0.719603	00:43

Train has 7916 images and test has 816 images. Test is on folder 9 of transform type log_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.617574	1.245562	0.634804	00:32

epoch	train_loss	valid_loss	accuracy	time
0	0.661609	0.909678	0.781863	00:43
1	0.275348	0.984878	0.787990	00:43
2	0.088735	1.004140	0.794118	00:43

Train has 7895 images and test has 837 images. Test is on folder 10 of transform type log_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.547163	1.147652	0.640382	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.634231	0.794007	0.801673	00:43
1	0.255871	0.729826	0.813620	00:43
2	0.083886	0.691131	0.835125	00:43

Train has 7859 images and test has 873 images. Test is on folder 1 of transform type mel_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.438649	0.855717	0.723940	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.538291	1.242817	0.709049	00:42
1	0.215221	1.033411	0.757159	00:43
2	0.067709	1.043618	0.750286	00:42

Train has 7844 images and test has 888 images. Test is on folder 2 of transform type mel_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.419769	1.318493	0.602477	00:32

epoch	train_loss	valid_loss	accuracy	time
0	0.576866	1.232487	0.674550	00:43
1	0.199539	1.151864	0.721847	00:43
2	0.065879	1.218944	0.724099	00:42

Train has 7807 images and test has 925 images. Test is on folder 3 of transform type mel_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.399783	1.344811	0.575135	00:32

epoch	train_loss	valid_loss	accuracy	time
0	0.530457	1.246315	0.652973	00:42
1	0.206290	1.100291	0.717838	00:42
2	0.066169	1.039851	0.722162	00:42

Train has 7742 images and test has 990 images. Test is on folder 4 of transform type mel_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.398213	1.226020	0.614141	00:32

epoch	train_loss	valid_loss	accuracy	time
0	0.530371	0.979655	0.722222	00:42
1	0.212438	0.769903	0.780808	00:42
2	0.068044	0.774077	0.772727	00:43

Train has 7796 images and test has 936 images. Test is on folder 5 of transform type mel_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.424381	0.938384	0.684829	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.540425	0.622973	0.818376	00:42
1	0.230166	0.432847	0.876068	00:42
2	0.075813	0.379239	0.887821	00:42

Train has 7909 images and test has 823 images. Test is on folder 6 of transform type mel_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.394642	1.405592	0.620899	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.513183	1.323630	0.707169	00:43
1	0.226495	1.160470	0.746051	00:43
2	0.076572	1.182628	0.755772	00:43

Train has 7894 images and test has 838 images. Test is on folder 7 of transform type mel_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.361491	1.206686	0.630072	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.514640	1.079973	0.744630	00:43
1	0.215227	0.871684	0.768496	00:43
2	0.073965	0.989660	0.779236	00:43

Train has 7926 images and test has 806 images. Test is on folder 8 of transform type mel_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.353415	1.392421	0.629032	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.528919	1.156052	0.700993	00:43
1	0.209949	1.182361	0.745658	00:43
2	0.069247	1.138816	0.771712	00:43

Train has 7916 images and test has 816 images. Test is on folder 9 of transform type mel_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.420336	1.109759	0.703431	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.546160	1.094824	0.769608	00:43
1	0.217296	1.037101	0.813725	00:43
2	0.067287	0.964782	0.833333	00:43

Train has 7895 images and test has 837 images. Test is on folder 10 of transform type mel_spectrogram.

epoch	train_loss	valid_loss	accuracy	time
0	1.366528	1.066314	0.714456	00:33

epoch	train_loss	valid_loss	accuracy	time
0	0.544914	0.848490	0.790920	00:43
1	0.219007	0.895404	0.808841	00:43
2	0.079304	0.896217	0.824373	00:43

Prediction Results

#collapse-show
results

	linear_spectrogram	log_spectrogram	mel_spectrogram
1	0.735395	0.746850	0.750286
2	0.698198	0.730856	0.724099
3	0.678919	0.684324	0.722162
4	0.763636	0.763636	0.772727
5	0.791667	0.795940	0.887821
6	0.712029	0.765492	0.755772
7	0.717184	0.762530	0.779236
8	0.676179	0.719603	0.771712
9	0.789216	0.794118	0.833333
10	0.832736	0.835125	0.824373

#collapse-show
results.describe()

	linear_spectrogram	log_spectrogram	mel_spectrogram
count	10.000000	10.000000	10.000000
mean	0.739516	0.759847	0.782152
std	0.052835	0.042857	0.052126
min	0.676179	0.684324	0.722162
25%	0.701656	0.734854	0.751658
50%	0.726289	0.763083	0.772220
75%	0.782821	0.786961	0.813089
max	0.832736	0.835125	0.887821

#collapse-show
results.plot.line(figsize=(15,7))

<AxesSubplot:>

Across the board, mel-spectrogram seems to outperform the other two transformations with a minor exception at the 10th fold.

We can inspect the losses in this category through the fast ai library, spot any classification issue or potentially these sounds having multiple classes, and change them. But since this is a provided dataset, we will not change any of the categories.

#collapse-show

interp = ClassificationInterpretation.from_learner(learn)

losses,idxs = interp.top_losses()

len(dl.valid_ds)==len(losses)==len(idxs)

True

interp.plot_top_losses(9, figsize=(15,11))

doc(interp.plot_top_losses)

Interpretation.plot_top_losses(k, largest=True, **kwargs)
Show `k` largest(/smallest) preds and losses. `k` may be int, list, or `range` of desired results.

To get a prettier result with hyperlinks to source code and documentation, install nbdev: pip install nbdev

interp.plot_confusion_matrix(figsize=(12,12), dpi=60)

Summary

Mel spectrograms tend to outperform linear and log spectrograms for human audio sounds.
We didn't explore further tuning the model or trying other models, but a 90% range accuracy for the mean is attainable for this dataset.
The 95% results in an earlier notebook are attributable to randomness from how the train-test split worked out on this model.

You can try out the mel-spectrogram trained with a random split from this hugging face link or find out how I made this.

Compare model performance with three different transformation types

About the UrbanSounds8K dataset

Setup Section

The installs and includes

Download files from source then uncompress

File classification information

Data exploration

Class distribution across the sound types

Inspect the files - audio and single tranform of a audio file

Single File Transformation

Linear Spectrogram

Log Spectrogram

Mel Spectrogram

8K Files into Three Transformations

Validate all files are transformed in destination folds

Visual inspection for each sound category

Fast AI Model Build

Prediction Results

Summary

Read next

Urban Sounds 8k Classification with Hugging Face Demo

Deploying my first model to HuggingFace spaces

Comments ()

About the UrbanSounds8K dataset

Setup Section

The installs and includes

Download files from source then uncompress

File classification information

Data exploration

Class distribution across the sound types

Inspect the files - audio and single tranform of a audio file

Single File Transformation

Linear Spectrogram

Log Spectrogram

Mel Spectrogram

8K Files into Three Transformations

Validate all files are transformed in destination folds

Visual inspection for each sound category

Fast AI Model Build

Prediction Results

Summary

Read next

Comments ( )

Comments ()