@mthrok commented Sep 21, 2021

[BC-Breaking] Move fine-tune specific module out of wav2vec2 encoder

Previously, the Linear module (called `readout`) used for an ASR fine-tuning task was placed in the encoder module. Conceptually, the encoder has nothing to do with a module that is specific to a fine-tuning / downstream task.

The problems here are:

  1. The encoder can also be used in the pre-training phase, which does not require such a module.
  2. The choice of a Linear module is arbitrary, and it is inconvenient for users to have the module structure hard-coded in the encoder.

Therefore, this commit moves the Linear module out of the encoder and places it as the `aux` attribute of `Wav2Vec2Model`. (As a result, `Wav2Vec2Model` has `feature_extractor`, `encoder` and `aux` modules.)
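To make the new layout concrete, here is a minimal PyTorch sketch. Only the attribute names `feature_extractor`, `encoder` and `aux` come from this PR; the classes `SketchWav2Vec2Model` and `ToyFeatureExtractor`, the `toy_encoder` helper, and all sizes (including the 29-way output head) are hypothetical and are not torchaudio's actual implementation.

```python
from typing import Optional

import torch
from torch import nn


class ToyFeatureExtractor(nn.Module):
    """Hypothetical stand-in: split (batch, time) audio into frames and project them."""

    def __init__(self, frame_len: int = 16, dim: int = 32):
        super().__init__()
        self.frame_len = frame_len
        self.proj = nn.Linear(frame_len, dim)

    def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
        batch, time = waveforms.shape
        usable = time - time % self.frame_len  # drop a trailing partial frame
        frames = waveforms[:, :usable].reshape(batch, -1, self.frame_len)
        return self.proj(frames)  # (batch, num_frames, dim)


class SketchWav2Vec2Model(nn.Module):
    """Hypothetical simplification of the layout described above."""

    def __init__(
        self,
        feature_extractor: nn.Module,
        encoder: nn.Module,
        aux: Optional[nn.Module] = None,
    ):
        super().__init__()
        self.feature_extractor = feature_extractor
        # The encoder no longer owns a task-specific readout layer.
        self.encoder = encoder
        # Fine-tuning-specific head; None when the model is used for pre-training.
        self.aux = aux

    def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
        x = self.feature_extractor(waveforms)
        x = self.encoder(x)
        if self.aux is not None:
            x = self.aux(x)
        return x


def toy_encoder(dim: int = 32) -> nn.Module:
    # Hypothetical stand-in for the Transformer encoder.
    return nn.Sequential(nn.Linear(dim, dim), nn.GELU())


pretrain_model = SketchWav2Vec2Model(ToyFeatureExtractor(), toy_encoder())
asr_model = SketchWav2Vec2Model(
    ToyFeatureExtractor(), toy_encoder(), aux=nn.Linear(32, 29)
)

waveforms = torch.randn(2, 160)  # 2 clips of 160 samples each -> 10 frames
pretrain_out = pretrain_model(waveforms)  # shape (2, 10, 32)
asr_out = asr_model(waveforms)  # shape (2, 10, 29)
```

Because the head sits beside the encoder rather than inside it, the same encoder definition serves both pre-training (`aux=None`) and fine-tuned inference, and the head's parameters show up under the `aux.` prefix of the `state_dict`.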

An alternative approach is to define another module and place `Wav2Vec2Model` and the aux module alongside each other. However, that would introduce a new class that we need to maintain. Moreover, `aux` is only expected to be used for loading the pre-trained parameters published by `fairseq` (and their variations from Hugging Face), and it is not general enough for downstream adaptations, which will involve many different, more complicated models (e.g. s3prl).
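For comparison, the alternative that was decided against might look like the sketch below: a separate wrapper class (here the hypothetical `HypotheticalWav2Vec2ForASR`) that holds a backbone and a head side by side. All class names and sizes are illustrative assumptions, not code from this PR.

```python
import torch
from torch import nn


class ToyBackbone(nn.Module):
    """Hypothetical backbone with only feature_extractor and encoder."""

    def __init__(self, frame_len: int = 16, dim: int = 32):
        super().__init__()
        self.feature_extractor = nn.Linear(frame_len, dim)
        self.encoder = nn.Sequential(nn.Linear(dim, dim), nn.GELU())

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.encoder(self.feature_extractor(frames))


class HypotheticalWav2Vec2ForASR(nn.Module):
    """The rejected design: one more public class wrapping backbone + head."""

    def __init__(self, backbone: nn.Module, head: nn.Module):
        super().__init__()
        self.backbone = backbone
        self.head = head

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(frames))


wrapped = HypotheticalWav2Vec2ForASR(ToyBackbone(), nn.Linear(32, 29))
out = wrapped(torch.randn(2, 10, 16))  # (batch, frames, frame_len) -> (2, 10, 29)
# Every parameter name gains a "backbone." or "head." prefix.
keys = list(wrapped.state_dict())
```

This works, but it is one more public class to document and keep compatible, and it changes every parameter name, which matters little for a head that exists mainly to load the published checkpoints.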

So, following a minimalistic approach, we put it inside `Wav2Vec2Model`.
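Because this change is BC-breaking, a `state_dict` saved before it would presumably store the readout parameters under the encoder. The sketch below remaps such keys. The prefixes `encoder.readout.` (before) and `aux.` (after) are assumptions inferred from the description above rather than taken from the diff, and `migrate_readout_keys` is a hypothetical helper, so verify the prefixes against a real checkpoint first.

```python
from collections import OrderedDict
from typing import Mapping, TypeVar

V = TypeVar("V")

# Assumed parameter-name prefixes; verify against real checkpoints.
OLD_PREFIX = "encoder.readout."
NEW_PREFIX = "aux."


def migrate_readout_keys(state_dict: Mapping[str, V]) -> "OrderedDict[str, V]":
    """Return a copy of state_dict with encoder.readout.* renamed to aux.*.

    Values (typically tensors) are passed through untouched and key order is kept.
    The input mapping is not modified.
    """
    migrated: "OrderedDict[str, V]" = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(OLD_PREFIX):
            key = NEW_PREFIX + key[len(OLD_PREFIX):]
        migrated[key] = value
    return migrated


# Hypothetical old-style checkpoint; strings stand in for tensors.
old_state = {
    "feature_extractor.conv.weight": "w0",
    "encoder.layer.weight": "w1",
    "encoder.readout.weight": "w2",
    "encoder.readout.bias": "b2",
}
new_state = migrate_readout_keys(old_state)
```

The migrated dictionary could then be passed to `load_state_dict` on a model built with the new layout.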

@mthrok changed the title from "Move aux" to "[BC-Breaking] Move fine-tune specific module out of wav2vec2 encoder" on Sep 21, 2021
@mthrok marked this pull request as ready for review on September 22, 2021 17:32

@carolineechen left a comment

LGTM

@nateanl left a comment

LGTM

@mthrok merged commit 40f2a08 into pytorch:main on Sep 22, 2021
@mthrok deleted the move-aux branch on September 22, 2021 19:20
facebook-github-bot pushed a commit that referenced this pull request Sep 23, 2021
Summary: Import torchaudio by commit 40f2a08

Reviewed By: carolineechen

Differential Revision: D31056614

fbshipit-source-id: b04e83fe5460faad8f5d106da44a6e0f3aa2756b
hwangjeff added a commit that referenced this pull request Nov 4, 2021
* set version.

* Re-sync with internal repository (#592)

* Set up ShipIt

fbshipit-source-id: 4fb853c391900d3070b936e5a3e4609eb78a780d

* 20200428 pytorch/audio import

Summary: [10:30:47: cpuhrsch@devvm3140 pytorch]$ ./fb_build/import_audio.sh

Reviewed By: vincentqb

Differential Revision: D21282421

fbshipit-source-id: 9bde1455ca6a19defbf33dbbfc5f0d49a8e4dc6a

* Import torchaudio 20200528

Summary: Import Up to #664

Reviewed By: cpuhrsch

Differential Revision: D21728204

fbshipit-source-id: 648dd622087fa762194ca5f89a310500e777263d

* Remove unnecessary config file from torchaudio

Summary: Turned out .use_external_sox is not necessary for building torchaudio in fbcode.

Reviewed By: vincentqb

Differential Revision: D21792939

fbshipit-source-id: c0fb5173c6533e67114f50ddc8e9425bd129574f

* Import torchaudio 20200605

Summary: import torchaudio 0.5.0 in fbcode using import_audio.sh:

Reviewed By: vincentqb

Differential Revision: D21884426

fbshipit-source-id: b6f2cc308e597caef2dd767c315b167c09fb0d4c

* Change parameterized testing system to be compatible with unittest

Summary: The previous implementation of parameterized testing worked by modifying test.common_utils in place. This doesn't work in general, because unittest's contract with test modules is that it must be able to load the module and run the tests itself. Because the previous implementation needed to load the module and modify it, it was incompatible with that contract.

Reviewed By: mthrok

Differential Revision: D21964676

fbshipit-source-id: 9bb71e8c3f9fab074239b22306f3bbddb0f3975b

* Import torchaudio 20200618 #718

Summary: Import torchaudio up to #719

Reviewed By: zhangguanheng66

Differential Revision: D22119491

fbshipit-source-id: e14842278a32c9373179fc132e8111a0ffe66d93

* Import torchaudio 20200714 #782 (#784)

Summary:
Pull Request resolved: #784

 - Import torchaudio.
 - Change test util module name from test_case_utils to case_utils

Reviewed By: cpuhrsch

Differential Revision: D22261638

fbshipit-source-id: eb4df500c1d7db0a60baa100dd22795a63851438

* remediation of S205607

fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac

* remediation of S205607

fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3

* Import torchaudio 20200723

Summary: Import torchaudio 20200723 #814

Reviewed By: fmassa

Differential Revision: D22666393

fbshipit-source-id: 50df07b5c158fe4e95ada7ea54381b2e26f6aecd

* Support custom exception message (#41907)

Summary:
Raise and assert used to have a hard-coded error message "Exception". User provided error message was ignored. This PR adds support to represent user's error message in TorchScript.

This breaks backward compatibility because now we actually need to script the user's error message, which can potentially contain unscriptable expressions. Such programs can break when scripting, but saved models can still continue to work.

Increased an op count in test_mobile_optimizer.py because now we need aten::format to form the actual exception message.

This is built upon a WIP PR: pytorch/pytorch#34112 by driazati

Pull Request resolved: pytorch/pytorch#41907

Reviewed By: ngimel

Differential Revision: D22778301

Pulled By: gmagogsfm

fbshipit-source-id: 2b94f0db4ae9fe70c4cd03f4048e519ea96323ad

* Import torchaudio 20200804

Summary: Up to #804

Reviewed By: vincentqb

Differential Revision: D22947671

fbshipit-source-id: d1a005cec2f1a00913c41eda380b9f4b993ef779

* Remove .python3 markers

Reviewed By: ashwinp-fb

Differential Revision: D22955630

fbshipit-source-id: f00ef17a905e4c7cd9196c8924db39f9cdfe8cfa

* Import torchaudio 20200821

Reviewed By: cpuhrsch

Differential Revision: D23273584

fbshipit-source-id: 2fe7effa11b7f7cdf0cee1da6b1cac5556e9f55b

* Import torchaudio 20200922

Summary: Up to #914

Reviewed By: vincentqb, cpuhrsch

Differential Revision: D23846718

fbshipit-source-id: 9feb4e58563b900965467bd9ff66c979211c50df

* replace max-sentences with batch-size for dependencies

Summary: This fixes some regressions introduced by D24121305. The fairseq configuration is changing from command-line options to dataclasses (eventually via Hydra), which no longer supports option aliases. One such alias is --max-sentences / --batch-size, and D24121305 removed --max-sentences because --batch-size is more appropriate (fairseq is not just an NLP framework dealing with sentences). Unfortunately, it seems some existing flows broke, and this diff attempts to fix them.

Differential Revision: D24142488

fbshipit-source-id: 075180ea10a9d706a3f8d64b978d66dfd83c3d2b

* Import torchaudio #996 758f6c2

Reviewed By: cpuhrsch

Differential Revision: D24606263

fbshipit-source-id: 4301b1df84d20c671783ec34c52d5b257374abf1

* Import torchaudio #1004 5e54c77

Summary: Import torchaudio up to #1004 5e54c77

Reviewed By: vincentqb, cpuhrsch

Differential Revision: D24841498

fbshipit-source-id: 3829130636f36779d84f01ff0d0120b80b2396d7

* Import torchaudio #1034 70f429a

Summary: Import torchaudio #1027 0cf4b8a

Reviewed By: vincentqb, cpuhrsch

Differential Revision: D24958707

fbshipit-source-id: d06dd6b59197cc2c16bec5a9012cbf33a172b6b3

* Import torchaudio #1066 4406a6b

Summary: Import up to #1066

Reviewed By: cpuhrsch

Differential Revision: D25373068

fbshipit-source-id: 890d36a25259b93428b3037c3123ff5a2cacfa04

* Import torchaudio #1105 37692d8

Summary: Import torchaudio up to #1105 37692d8

Reviewed By: datumbox

Differential Revision: D25671497

fbshipit-source-id: 5af11c801321f2bb964245ac6ed74979310f4b5f

* Import torchaudio #1161 7a36c55

Summary: Import torchaudio #1161 7a36c55

Reviewed By: cpuhrsch

Differential Revision: D25827050

fbshipit-source-id: 31e07ace85f7e1417884cd721bc80c5c6c33960f

* Import torchaudio #1182 d53e404

Summary: Import torchaudio #1182 d53e404

Reviewed By: datumbox

Differential Revision: D25975367

fbshipit-source-id: feac3187a82b0e3de23562fde11fcfc5bb13461d

* Import #1217 828df46

Summary: Import #1217 828df46

Reviewed By: cpuhrsch

Differential Revision: D26180248

fbshipit-source-id: 34b1e18e86436472f47070c4d3c748a10a4153a3

* Import torchaudio #1233 135e966

Reviewed By: mthrok

Differential Revision: D26228762

fbshipit-source-id: 9acc587adb5e7ca7867d8a5df44ba73166099fd9

* Import torchaudio #1250 5a69911

Summary: Imported from Github

Reviewed By: mthrok

Differential Revision: D26344055

fbshipit-source-id: 163f308e43f514c0b885f4ed0ed87efc0ad26982

* Remove reference_cast in make_boxed_from_unboxed_functor (#51319)

Summary:
Pull Request resolved: pytorch/pytorch#51319

We were going out of our way to accommodate `IValue::to<Tensor>` returning a copy of the inner Tensor. `IValue::toTensor` is capable of returning a reference without copying, so if we use it directly, we can allow kernels that want to take `Tensor &` to do so!
As a bonus, we get reduced build times.
ghstack-source-id: 121378961

Reviewed By: bhosmer

Differential Revision: D26138549

fbshipit-source-id: b0f830527da360c542c815bef2f7e1692615b32a

* Add missing file to facilitate fixup patch (#1417)

* Sync environment.yml such that patch applies (#1418)

* Import #1396 dd76e9d

Summary: Import #1396 dd76e9d

Reviewed By: vincentqb

Differential Revision: D26772272

fbshipit-source-id: 5fb10b8e4bfe955372eaf588d33ab96e1a83ef8d

* Fix broken list of checks (#1401)

Summary:
Pull Request resolved: #1401

Extra spaces broke list of checks.

Reviewed By: mthrok

Differential Revision: D27125520

fbshipit-source-id: 506924f9b73266b3f3ac174a020830f33b0c7489

* Import torchaudio #1412 c0bfb03

Summary:
Import latest from github to fbcode

Pass: 951
Skip: 19
Omit: 1
ListingSuccess: 26

Result available at: https://www.internalfb.com/intern/testinfra/testrun/8444249336935844

Reviewed By: mthrok

Differential Revision: D27448988

fbshipit-source-id: 61f63ffa1295a31b4452abaf2c74ebfefb827dcf

* fbshipit-source-id: 5ee15b601a9c5c08836e4a7198401d54aa50aa3f

* Import torchaudio #1420 ad534c1

Summary: Import torchaudio from github

Reviewed By: carolineechen

Differential Revision: D27770998

fbshipit-source-id: 0b4a4a143769ae49cc30478dd9f8e075594074e8

* Remove CI script that is removed in GitHub

Summary: `run-clang-format.py` has been renamed to `run_clang_format.py` in github.com/pytorch/audio, but both files exist in fbcode. This removes the unneeded one.

Reviewed By: carolineechen

Differential Revision: D27822084

fbshipit-source-id: 132de34b85b866342757bf4648cc1b6b81ff12be

* Import torchaudio #1466 9d50acf

Reviewed By: vincentqb, mthrok

Differential Revision: D27922742

fbshipit-source-id: 6fa96728171687089abe6d734c23fc98bd29430b

* Import torchaudio #1475 b5d8027

Reviewed By: mthrok

Differential Revision: D28098981

fbshipit-source-id: 48231fc919f3fda2bf946a9a6f0c666f9a417017

* Replace prototype RNNTL with PySpeech RNNTL

Summary:
- bind RNNTL to `torchaudio`
- remove previous version of RNNTL, including submodule
- replace references to previous RNNTL with new PySpeech RNNTL

Reviewed By: vincentqb

Differential Revision: D27973417

fbshipit-source-id: 992eab9f82edc7fdec18851c7a393c9bb3169f30

* Combine old and new RNNTL tests

Summary:
- remove unused code from `numpy_transducer`
- merge prototype transducer loss tests into internal transducer loss tests

Reviewed By: vincentqb

Differential Revision: D27973416

fbshipit-source-id: cc8f3b566c48dd584cd0400dceb406f3c84471ac

* Move rnnt files externally

Summary:
move rnnt files out of internal-only folders
- `csrc/facebook/transducer` --> `csrc/rnnt`
- `torchaudio_unittest/facebook/transducer` --> `torchaudio_unittest/rnnt`
- `torchaudio/facebook/transducer` --> `torchaudio/rnnt`

Reviewed By: vincentqb

Differential Revision: D28072192

fbshipit-source-id: 9d01736d37a3eb7110fb2adc8bda5544d3340e7d

* Remove unused file

Summary: `kernels.h` was copied into both cpu and gpu folders previously, but it is not actually necessary for cpu rnnt

Reviewed By: vincentqb

Differential Revision: D28072209

fbshipit-source-id: 74427ee7d0c81aafdb82d1151035e89e4faec359

* Remove sparse support for A-R RNNT

Summary:
sparse support is a layer on top of alignment-restriction rnnt, but we do not wish to release either of them initially

this diff removes sparse functionality along with sparse-related parameters (`valid_ranges`, `cells_per_sample`)

Reviewed By: vincentqb

Differential Revision: D28072213

fbshipit-source-id: 9a88368af1a730b4167ffb9cebdd5eddcc6e4bf9

* Remove alignment restriction support

Summary:
we do not wish to support alignment-restriction for the first release.

this diff removes alignment restriction support along with relevant parameters (`wordpiece_ends`, `left_buffer`, `right_buffer`) and unit tests

Reviewed By: vincentqb

Differential Revision: D28072228

fbshipit-source-id: daf62b10a1e004ab4c22d498811c8bee3f0a22e0

* Remove GPU code

Summary: To make fixing the CI build easier, we first remove the GPU code and export only the CPU code. We will add the GPU code back after the CPU code is merged into open source.

Reviewed By: vincentqb

Differential Revision: D28076934

fbshipit-source-id: 9e12298b0ba8733853999c1127f0ee9d9368e25f

* Import torchaudio #1479 0c263a9

Summary: This diff syncs torchaudio GH with fbcode

Reviewed By: cpuhrsch

Differential Revision: D28321222

fbshipit-source-id: 8c5b5ed87c5b7c3aa87495ccb68ccbf9eaaab152

* Import torchaudio #1513 08f2bde

Summary: Import from github

Reviewed By: mthrok

Differential Revision: D28606124

fbshipit-source-id: 05dcb07efc5537d928bec682a68e6ccee7cc325e

* Import audio #1497 ffe735b

Reviewed By: mthrok

Differential Revision: D28678814

fbshipit-source-id: 3356fd88dc33ad9f20294ca19b0c3958ce55f1ae

* Import torchaudio #1554 afb6626

Summary: Import torchaudio #1554 afb6626

Reviewed By: NicolasHug

Differential Revision: D28891382

fbshipit-source-id: 9b6e06ff94b2ec2f6d948049cc74046dee721471

* Import torchaudio #1575 e39ece6

Summary: Import torchaudio #1575 e39ece6

Reviewed By: NicolasHug

Differential Revision: D29120301

fbshipit-source-id: df209aa765ad0309452c1759c7a04ca9167d52a8

* Import torchaudio #1584 89807cf

Summary: Import torchaudio #1584 89807cf

Reviewed By: carolineechen

Differential Revision: D29369638

fbshipit-source-id: 13acc60ba0c639957f8fb93ec6601be48cdbc57c

* Import torchaudio #1597 284bd10

Summary: Import from Github

Reviewed By: carolineechen

Differential Revision: D29518488

fbshipit-source-id: 34b3d3f2f8035bf734d047c7b6e6ec6e15ff65f1

* Import torchaudio #1633 8d374c4

Summary: Import from github

Reviewed By: carolineechen, mthrok

Differential Revision: D29855617

fbshipit-source-id: cb80a0b419a83a9e6a7fd17be8ce1acd348531fd

* Fix timeout issue in torchaudio unit test.

Summary: As titled. Reduce the sample rate in the test method to fix the timeout issue.

Reviewed By: mthrok

Differential Revision: D29884117

fbshipit-source-id: 80ab1cebfc34801ede11e644ca543f81f5b15102

* Re-sync with internal repository (#1643)

Co-authored-by: Facebook Community Bot

* Remove out-of-sync files

Summary: Remove files already removed on GitHub

Reviewed By: nateanl

Differential Revision: D29910346

fbshipit-source-id: 309a883f7e1c1a29c93aba5f09f39c5b6aad2d7e

* Remove out-of-sync files

Summary: Remove files already removed on GitHub

Reviewed By: hwangjeff

Differential Revision: D29910411

fbshipit-source-id: 5dbd0240da262f3829ac8d6abe1af089455ce0dc

* Remove out-of-sync files

Summary: After D29910632, some files are detected as out-of-sync. These files are removed on GitHub.

Reviewed By: hwangjeff, nateanl

Differential Revision: D29912630

fbshipit-source-id: 7de604ffcc8bbe7aea048d0ad987e800258e3003

* Import torchaudio #1639 37dbf29

Summary: Import torchaudio #1639 37dbf29

Reviewed By: carolineechen, mthrok

Differential Revision: D29920658

fbshipit-source-id: 94ba8c04edcfb50e355b1ca8e937f612917ecf38

* Move fbcode-specific logic into fb directory

Summary: Move the fb-specific logic to the `fb` directory, so that it is no longer visible in OSS (well, unless one digs through the commit history).

Reviewed By: carolineechen, nateanl

Differential Revision: D30080845

fbshipit-source-id: 85b04dab2d362e94110a9ce90f54523a49b6fc74

* Import torchaudio #1620 af652ca

Summary: Import torchaudio #1620 af652ca

Reviewed By: nateanl, mthrok

Differential Revision: D30079698

fbshipit-source-id: 9ade6df7bd006782f146a04dfdbd4549981cb001

* Simplify extension initialization

Summary:
For the case where torchaudio is used with `mode/opt`, D29973934 introduced extension initialization module specific for fbcode and the override process.

This diff simplifies the process by just inserting the step to extract extension module as a regular file at the beginning of torchaudio extension initialization process, so that OSS and fbcode use the same process / code.

Reviewed By: carolineechen

Differential Revision: D29989551

fbshipit-source-id: 9f30d0a36c220f0eb669244c9bb2da1b833d6f03

* Reduce length of waveform in pitch_shift batch_consistency test

Summary: To address the test failure in T96406395

Reviewed By: carolineechen

Differential Revision: D30163741

fbshipit-source-id: f88d86b3da7b1ee52518934567b0b0a62700ee58

* Fix batch consistency test in transforms

Summary: The stress test still fails. Add n_fft to address it.

Reviewed By: mthrok

Differential Revision: D30218279

fbshipit-source-id: 7858efd3e5ac0073193a7883fd314486efc73814

* Import torchaudio #1700 1a64530

Summary: title

Reviewed By: nateanl

Differential Revision: D30304880

fbshipit-source-id: 7b9c5ab6fbc06266c8421f1fdc0217effbc7e609

* Import torchaudio #1711 2c11582

Summary: as titled

Reviewed By: carolineechen

Differential Revision: D30449599

fbshipit-source-id: 7b3faaf6d7dbfa2e5ca9c263554b18e7364be77e

* Import torchaudio #1726 560c082

Summary: Import torchaudio up to #1726 560c082

Reviewed By: carolineechen

Differential Revision: D30579288

fbshipit-source-id: 324cf0eb089786605e1a10e5f44f8114424dd0a6

* Import torchaudio #1734 e8cc7f9

Summary: import torchaudio #1734 e8cc7f9

Reviewed By: nateanl, mthrok

Differential Revision: D30675712

fbshipit-source-id: 0529dde7e94d53e5ba1b386ab66b6f8eb73ba079

* Import torchaudio #1755 e11d27c

Summary: Import torchaudio #1755 e11d27c

Reviewed By: carolineechen

Differential Revision: D30844075

fbshipit-source-id: 1295cc142dda23cb4b029b332b4ef78bb0a67432

* Update reference from master to main elsewhere in fbcode

Summary: Update reference from master to main elsewhere in fbcode

Reviewed By: alexeib

Differential Revision: D30938472

fbshipit-source-id: 243b98550207f241c9d3265bf3d4060350aaf0a8

* Import torchaudio #1782 40f2a08

Summary: Import torchaudio by commit 40f2a08

Reviewed By: carolineechen

Differential Revision: D31056614

fbshipit-source-id: b04e83fe5460faad8f5d106da44a6e0f3aa2756b

* Import torchaudio #1803 b75e3bb

Summary: title

Reviewed By: nateanl, mthrok

Differential Revision: D31271175

fbshipit-source-id: d0b6c44d71a4434fa75e6cd481724632dbd1a3ae

* torchaudio: torch.quantization -> torch.ao.quantization (#1817)

Summary:
Pull Request resolved: #1817

This changes the imports in `torchaudio` to use the new import locations.

```
codemod -d pytorch/audio --extensions py 'torch.quantization' 'torch.ao.quantization'
```

Reviewed By: mthrok

Differential Revision: D31302450

fbshipit-source-id: f31a0d4f453f840ea690edb688555a9d585787b5

* Import torchaudio #1828 60aeb78

Summary: title

Reviewed By: carolineechen

Differential Revision: D31476921

fbshipit-source-id: c790146b133921de8bbda67c8e8c7a1b321b4bd4

* Import torchaudio #1890 211270d

Reviewed By: mthrok

Differential Revision: D31728916

fbshipit-source-id: 4b932f285c274c5f2197325ce73ecbd930e3597e

* Import torchaudio #1942 ab50909

Summary: title

Reviewed By: nateanl, mthrok

Differential Revision: D31997978

fbshipit-source-id: cfbfa192780f7d786a658eb84cc0685881a4f398

Co-authored-by: Vincent Quenneville-Belair
Co-authored-by: cpuhrsch
Co-authored-by: Moto Hira
Co-authored-by: Ji Chen
Co-authored-by: Ben Mehne
Co-authored-by: Stanislau Hlebik
Co-authored-by: Yanan Cao
Co-authored-by: Andres Suarez
Co-authored-by: moto
Co-authored-by: Alexei Baevski
Co-authored-by: Vincent Quenneville-Belair
Co-authored-by: Nicolas Hug
Co-authored-by: Vasilis Vryniotis
Co-authored-by: Scott Wolchok
Co-authored-by: Dmitry Polukhin
Co-authored-by: Parmeet Singh Bhatia
Co-authored-by: Artyom Astafurov
Co-authored-by: Zhaoheng Ni
Co-authored-by: Facebook Community Bot
Co-authored-by: Facebook Community Bot
Co-authored-by: Yao-Yuan Yang
Co-authored-by: Jeff Hwang
Co-authored-by: Diana Liskovich
Co-authored-by: Zafar Takhirov