Presentation

Repairing data #

Sensors produce all sorts of faulty files.
Problems are documented in an open-source known-problems repository¹.
Categorizing problems lets us describe them in a common language.

Some Examples:

ID	Vendor	Description
OE001	N/A	No date in filename
FL008	Frontier Labs	Invalid datestamps in file names (space instead of a zero)
FL011	Frontier Labs	Partial files named `data`
WA002	Wildlife Acoustics	Generating files with no data

¹ https://github.com/ecoacoustics/known-problems

Introducing EMU #

The Ecoacoustics Metadata Utility.

Renames files
Fixes problems
Extracts metadata
Open source
Cross platform
QutEcoacoustics/emu

emu help page

Using EMU to rename files #

Can convert dates:

> emu rename **/*.WAV
Looking for targets...
-   Renamed 5B07FAC0.WAV
        to 20180525T120000Z.WAV
1 files, 1 renamed, 0 unchanged, 0 failed
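
The conversion above can be checked by hand. A minimal sketch, assuming the original stem `5B07FAC0` is a Unix timestamp (seconds since 1970-01-01 UTC) written in hexadecimal; that is inferred from this one example, not a statement about how EMU works internally. `hex_stem_to_iso` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def hex_stem_to_iso(stem: str) -> str:
    """Interpret a hexadecimal file stem as Unix seconds and format it as a compact UTC date."""
    seconds = int(stem, 16)  # 0x5B07FAC0 == 1527249600
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y%m%dT%H%M%SZ")


print(hex_stem_to_iso("5B07FAC0"))  # 20180525T120000Z
```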

Can add timezone offsets¹:

> emu rename **/*.wav --offset "+11:00"
Looking for targets...
-   Renamed PILLIGA_20121204_234600.wav
        to PILLIGA_20121204T234600+1100.wav
1 files, 1 renamed, 0 unchanged, 0 failed

¹ There is one true date format: ISO 8601
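
The offset rename above is essentially a pattern rewrite. A rough sketch of the same transformation, assuming the file name contains a `YYYYMMDD_HHMMSS` stamp as in the PILLIGA example; `add_offset` and its regular expression are illustrative, not EMU's own logic.

```python
import re

# Matches a local datestamp such as 20121204_234600 (assumed layout).
LOCAL_STAMP = re.compile(r"(\d{8})_(\d{6})")


def add_offset(name: str, offset: str) -> str:
    """Rewrite the first local datestamp in `name` with a UTC offset appended."""
    compact = offset.replace(":", "")  # "+11:00" -> "+1100"
    return LOCAL_STAMP.sub(
        lambda m: f"{m.group(1)}T{m.group(2)}{compact}", name, count=1
    )


print(add_offset("PILLIGA_20121204_234600.wav", "+11:00"))
# PILLIGA_20121204T234600+1100.wav
```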

Using EMU to rename files #

Can read metadata from the files to use in rename:

$ emu rename --template "{StartDate}_{SampleRateHertz}{Extension}" --scan-metadata
Looking for targets...
-   Renamed /mnt/f/tmp/fixes/renames/20210621T205706-0300.wav
        to /mnt/f/tmp/fixes/renames/20210621T205706-0300_256000.wav
-   Renamed /mnt/f/tmp/fixes/renames/20220331T094902-0300.flac
        to /mnt/f/tmp/fixes/renames/20220331T094902-0300_44100.flac
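
To make the template idea concrete, here is a small sketch of placeholder substitution. It only mimics the naming step: the metadata values are typed in by hand from the example above, whereas EMU reads them from the file. `render_template` is a hypothetical helper.

```python
def render_template(template: str, metadata: dict) -> str:
    """Fill {Placeholder} fields in a file-name template from a metadata mapping."""
    return template.format(**metadata)


# Values copied from the first renamed file above (assumed to match its metadata).
metadata = {
    "StartDate": "20210621T205706-0300",
    "SampleRateHertz": 256000,
    "Extension": ".wav",
}

print(render_template("{StartDate}_{SampleRateHertz}{Extension}", metadata))
# 20210621T205706-0300_256000.wav
```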

Real use case: recovering dates from a corrupted memory card:

$ emu rename --template "{StartDate}{Extension}" --scan-metadata **/F*
Looking for targets...
-   Renamed /mnt/f/tmp/fixes/renames/F4622343428908
         to /mnt/f/tmp/fixes/renames/20220331T094902-0300.flac
-   Renamed /mnt/f/tmp/fixes/renames/F4623864286243
         to /mnt/f/tmp/fixes/renames/20210621T205706-0300.wav
2 files, 2 renamed, 0 unchanged, 0 failed

See what emu can fix #

Using EMU to fix problems #

FL010: Repairing an invalid duration

Using EMU to fix problems #

OE004, FL001, WA002: Renaming empty (or near-empty) files

Command used: ~/emu/emu fix apply -f OE004 -f FL001 -f WA002 .
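
For a sense of what "empty or near-empty" can mean, the sketch below lists files at or below a size threshold. The 44-byte default is the size of a canonical PCM WAV header and is only an assumed cut-off; the criteria EMU actually applies are not shown here. `find_near_empty` is a hypothetical helper, and it only reports files without renaming anything.

```python
from pathlib import Path

# Size of a canonical PCM WAV header; used here as an assumed threshold.
WAV_HEADER_BYTES = 44


def find_near_empty(root: Path, max_bytes: int = WAV_HEADER_BYTES) -> list[Path]:
    """Return files under `root` whose size is at most `max_bytes`, sorted by path."""
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.stat().st_size <= max_bytes
    )


if __name__ == "__main__":
    # Report only; nothing on disk is modified.
    for path in find_near_empty(Path(".")):
        print(path)
```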

Why EMU? #

“I could fix this myself”

Can you do it for 10,000 files in 1,000 folders?
Is your fix destructive?
Does it destroy metadata?
Is it idempotent?
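
As an illustration of the last two questions, here is a sketch of a rename that is planned before it is applied and that is idempotent: running it on its own output proposes no further changes. The regular expression, the default offset, and the names `normalise` and `plan_renames` are assumptions made for this example, not EMU's implementation.

```python
import re

# A datestamp with "_" or "T" between date and time, and an optional UTC offset.
STAMP = re.compile(r"(\d{8})[_T](\d{6})(Z|[+-]\d{4})?")


def normalise(name: str, offset: str = "+1100") -> str:
    """Rewrite the first datestamp; keep an existing offset rather than adding another."""
    def rebuild(match: re.Match) -> str:
        date, time, existing = match.groups()
        return f"{date}T{time}{existing or offset}"

    return STAMP.sub(rebuild, name, count=1)


def plan_renames(names, offset: str = "+1100"):
    """Dry run: list (old, new) pairs without touching any files."""
    return [(n, normalise(n, offset)) for n in names if normalise(n, offset) != n]


first = plan_renames(["PILLIGA_20121204_234600.wav"])
second = plan_renames([new for _, new in first])
print(first)   # [('PILLIGA_20121204_234600.wav', 'PILLIGA_20121204T234600+1100.wav')]
print(second)  # []
```

Because the plan is only a list of name pairs, it can be reviewed before any file is touched, and the original names are never lost.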

EMU is used to clean and repair files ingested into Ecosounds and the A2O.

It has scanned more than 1 million files and fixed about 400,000 of them.
