How to publish a recognizer

This guide will walk you through publishing your own recognizer.

Why publish your recognizer? #

See the why publish your recognizer page.

What is GitHub and why is it great for this purpose? #

Git is a version control tool for code and text. Git allows you to track changes, collaborate with others, and manage different versions of your work. GitHub provides a web-based social platform for hosting and sharing Git repositories. GitHub is a popular way to share scientific work and is free for most of the features we need. You can read more about git and its uses from GitHub’s documentation page.

Version Control #

Version control is a system for recording changes to files over time, allowing you to track any modifications and revert back to previous versions. Version control is great for making collaboration with others seamless and manageable.

Zenodo - DOI generation #

Zenodo is an open-source project for generating DOIs that integrates well with GitHub. We recommend using Zenodo for generating a permanent DOI (Digital Object Identifier) for your recognizer. A DOI provides a unique and persistent identifier for your work, enabling others to cite it in their publications. Obtaining a DOI through Zenodo enhances the discoverability and visibility of your recognizer contributing to the impact and recognition of your work within the scientific community.

How to publish your recognizer #

We recommend using our Recognizer Template in GitHub for publishing your recognizer. If you are familiar with GitHub you can proceed directly to the template. However, if you need more information on how to use it effectively, please continue reading this article.

What goes in the repository? #

One or more recognizers, which should consist of:

  • Code to run the recognizers (e.g. scripts)
  • The code for the recognizers or scripts to download and install the software/packages needed
  • Test and training data (or instructions on how to access said data)
  • Models or artifacts (or instructions on how to access said data)

A README.md file that describes:

  • What species are detected by the recognizer.
  • The performance of the recognizer.
  • License and attribution information.
  • Instructions for setting up and running the recognizer

How do I use the GitHub template to publish my recognizer? #

Step 1. Log in to GitHub #

If you already have a GitHub account, log in. If you are new to GitHub, see the help centre page about GitHub.

Step 2. Use the template as a start for a repository #

Once you are logged in, go to the Recognizer Template repository. You can use the template as a base for creating a new repository by clicking the button that says Use this template and then selecting Create a new repository.

The Use this template button is located on the top right side of the page.

Then, in the following page:

  • input a name for your recognizer repository.

You should use kebab-case (all lower case, using hyphens instead of spaces) to title your repository as this is the convention used for most repositories on GitHub. For example:

  • A repository that contained a recognizer for a Red-tailed Black-cockatoo might be named red-tailed-black-cockatoo-recognizer.
  • a project that detects frogs in SE Queensland might be named se-queensland-frogs-recognizer
  • repository that hosts all the recognizers for your group might simply be called ecoacoustics-recognizers
  • You can choose if you want your repository to be public or private. Private is a good choice if you’re not ready to publicly publish but you want to get started. See Step 5. Publish your recognizer in GitHub for more information.

  • Leave the box that says include all branches unchecked.

  • Click Create repository from template.

  • Then review the instructions in the README.md file in your new repository

Step 3. Add test and training data #

Adding your training data makes it very transparent what kind of audio events your recognizers was trained on, which helps others understand how applicable your recognizer is to their situation. Adding testing data makes your reported accuracy results verifiable by others. Both of these will increase the likelihood that other researchers will use your recognizer.

If your datasets are too large to include in a git repository, which will often be the case, you might like to include a link where they can be downloaded (see What to do when your training data set is too large).

Step 4. Add code and models #

If you want to publish code with the recognizer, add it to the src folder.

Similarly, if you have a trained model or other artefacts produced while developing your recognizer, put them to artifacts folder. Examples of this might be

  • a configuration file which dictates how the code should be run specifically for the species and model in question,
  • file which contains the parameters of the function that maps your audio segments to a detection result, such as the weights of a neural network.

Step 5. Publish your recognizer in GitHub #

Once you are ready to publish your recognizer you can do so by making your repository public.

  1. In your repository go to the Settings tab Picture of GitHub settings tab
  2. In the General section, scroll to the bottom Danger Zone
  3. Click the button Change visibility and select Change to public Screenshot of Visibility settings

Step 6. Generate a DOI using Zenodo #

Generating a DOI will make this repository Findable and Citable, enabling others in the scientific community reference and build upon your work.

  1. Go to Zenodo.
  2. Use your GitHub account to log in.
  3. Once you are logged in, choose Github from the drop-down menu by clicking the arrow next to your email address in the top right-hand side menu. Picture of Zenodo drop down menu
  4. On this page you will see a list of your repositories. Flip the switch next to the recognizer repository you created to the ON-position. Picture of Zenodo repository switch
  5. Go to Github and create a release. Zenodo will automatically generate a DOI for this release which you should now see in Zenodo GitHub page. Zenodo will generate a new DOI version for each release. Read about Zenodo Versioning.
  6. Go to your Zenodo page for your repository.
  7. Finally, you can copy the DOI-badge from the DOI-page by clicking the DOI-badge on the side panel. Picture of Zenodo DOI-badge in the side panel We recommend copying the Markdown text and adding it to your README-file in your GitHub recognizer repository.

Step 7. Add or modify a citation information file #

To make your GitHub recognizer repository easily citable we recommend adding a citation information file (CITATION.cff).

  1. Please use the CITATION.cff generation website to create a CITATION.cff file.
  2. Replace the CITATION.cff file in your repository with the file you generated. Screenshot of the Citation.cff file. You can edit files in your browser by clicking the pen icon.
  3. Once you are happy with your changes, save the file by clicking the Commit changes… button. You can commit your changes directly to the main branch.
  4. If you will be collaborating with multiple authors we recommend learning more about working with branches in GitHub.

Step 8. List your recognizer in the registry #

Please see the Contribute to the registry page.

FAQ #

What to do when your training data set is too large? #

Storing data in a repository is not always the right choice. In each folder where it is relevant you should include:

  1. Small sets of audio samples
  2. A README.md containing
    • provenance of any data included
    • instructions on how to obtain more data
  3. Any scripts needed to download data from remote repositories

For larger datasets it is not ideal to store the data in the GitHub repository. In this case the test and training data should be stored via a different method. Some options you can use are:

  • An ecoacoustics repository
  • A bioacoustics repository
  • Cloud storage options like DropBox, OneDrive, etc.
  • Commercial services like Amazon S3, Google Cloud Storage, etc.

How do I keep my recognizer private until I’m ready to publish? #

It is a good idea to set up a repository as soon as you start working on your recognizer, however there are good reasons to keep your repository private until you’re ready to publish it.

Reasons to keep your repository private:

  • Waiting to publish the recognizer along with a publication.
  • Keeping sensitive recognizers hidden until it is safe to release them.
  • Allowing you time to develop your work before you release it.
  • GitHub has strong access control mechanisms so you can invite specific people to collaborate with you until you’re ready to release the recognizer publicly.

After these concerns have passed it’s a great time to publish your recognizer. You can do so by changing the repository settings:

How to change the repository settings? #

  1. In your repository go to the Settings tab
  2. In the General section, scroll to the bottom of the page to heading Danger Zone
  3. Click the button Change visibility and select Change to private. Once you are ready to publish, go to this same setting and select Change to public. Screenshot of Visibility settings

See more in GitHub Documentation: setting repository visibility.

Who owns my recognizer? #

You! That’s one of the reasons we think GitHub is great: you control and own your IP and this guide is just helping you do that.

Is the template and GitHub still a good choice for a recognizer that will only ever be private? #

Yes! version control, access control, common pattern for publishing, all other advantages listed in this article.