Improving multimodal data labeling: fewer assessors, more layers

Hello! We, researchers from the ITMO machine learning lab and the VKontakte Core ML team, are doing joint research. One of VK's important tasks is the automatic classification of posts: it is needed not only to generate thematic feeds but also to identify unwanted content. Assessors are brought in to label posts. The cost of their work can be reduced significantly with a machine learning paradigm known as active learning.

This article is about applying active learning to the classification of multimodal data. We cover the general principles and methods of active learning, the specifics of applying them to our task, and the insights gained during the research.


Introduction

— machine learning, . , , , .

, (, Amazon Mechanical Turk, .) . — reCAPTCHA, , , , — Google Street View. — .

. , Voyage — , . , , . , .

Amazon DALC (Deep Active Learning from targeted Crowds). , . Monte Carlo Dropout ( ). — noisy annotation. , « , », .

Amazon . : / . , , . , : , . .

— ! , . pool-based sampling.

Fig. 1. General scheme of a pool-based active learning scenario

. , , ( ). : , .

, — . (. — query). , . ( , ) .

, , .

, — . ( ). ≈250 . . () 50 — — :

, (. embedding), ;
.

, (. . 2).

. 2 —


ML — . , .

. , . , , , . , , early stopping. , .

. residual , highway , (. encoder). , (. fusion): , .

— , . -.

, — , . , .

. , (. 3):

. 3.


. , . , , . , ( + ) — .

, . 3, :

. 4.


, , . , ó , , .

, : ? :

;
;
.

. : maximum likelihood , - . :

$L = \frac{1}{\sigma_1^2} L_1 + \frac{1}{\sigma_2^2} L_2 + \frac{1}{\sigma_3^2} L_3 + \log \sigma_1 + \log \sigma_2 + \log \sigma_3$

$L_1, L_2, L_3$ — ( -), $\sigma_1, \sigma_2, \sigma_3$ — , .
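To make the weighting in the formula above concrete, here is a minimal plain-Python sketch that evaluates $L$ for given per-task losses and $\sigma$ values. The function name `combined_loss` is hypothetical, and the sketch treats the $\sigma_i$ as fixed numbers; in an actual training setup they would presumably be learned together with the model weights, which is an assumption here and not something the sketch implements.

```python
import math


def combined_loss(losses, sigmas):
    """Evaluate L = sum_i L_i / sigma_i^2 + sum_i log(sigma_i).

    losses -- per-task loss values (L_1, L_2, L_3 in the formula)
    sigmas -- positive weights (sigma_1, sigma_2, sigma_3 in the formula)
    """
    if len(losses) != len(sigmas):
        raise ValueError("need exactly one sigma per loss")
    if any(s <= 0 for s in sigmas):
        raise ValueError("sigmas must be positive")
    # Each task loss is scaled down by its own sigma squared ...
    weighted = sum(l / s ** 2 for l, s in zip(losses, sigmas))
    # ... and the log terms penalise making every sigma arbitrarily large.
    penalty = sum(math.log(s) for s in sigmas)
    return weighted + penalty


# With all sigmas equal to 1 the formula reduces to a plain sum of losses.
print(combined_loss([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```

A larger $\sigma_i$ reduces the contribution of task $i$, while the $\log \sigma_i$ term keeps the weights from growing without bound.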

Pool-based sampling

— , . pool-based sampling :

- .
.
, , .
.
( ).
3–5 (, ).

, 3–6 — .
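The step list above did not survive intact, so the following is only a generic pool-based loop as it is usually described in the active learning literature, not necessarily the exact procedure used in this work. All names (`pool_based_loop`, `train`, `score`, `oracle`) are hypothetical, and the toy one-dimensional threshold classifier exists only to make the sketch runnable.

```python
def pool_based_loop(labeled, pool, oracle, train, score, query_size, rounds):
    """Train, score the unlabeled pool, query labels for the top items, repeat."""
    labeled, pool = list(labeled), list(pool)
    model = train(labeled)
    for _ in range(rounds):
        if not pool:
            break
        # Rank pool items from most to least informative under the current model.
        order = sorted(range(len(pool)),
                       key=lambda i: score(model, pool[i]), reverse=True)
        picked = order[:query_size]
        # Ask the oracle (the assessors) for labels of the selected items.
        labeled += [(pool[i], oracle(pool[i])) for i in picked]
        picked_set = set(picked)
        pool = [x for i, x in enumerate(pool) if i not in picked_set]
        # Retrain on the enlarged labeled set.
        model = train(labeled)
    return model, labeled, pool


# Toy setup: the "model" is a single decision threshold on the real line.
def train(labeled):
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (max(zeros) + min(ones)) / 2


def score(threshold, x):
    # Points closest to the current threshold are treated as most uncertain.
    return -abs(x - threshold)


def oracle(x):
    return int(x >= 0.7)  # hidden true boundary, known only to the "assessor"


seed = [(0.0, 0), (1.0, 1)]
pool = [i / 20 for i in range(1, 20)]
model, labeled, rest = pool_based_loop(seed, pool, oracle, train, score,
                                       query_size=1, rounds=4)
print(model, len(labeled), len(rest))
```

In this toy run, four queries are enough to move the learned threshold from 0.5 to within 0.06 of the true boundary at 0.7, because each query is spent on the point the current model is least sure about.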

, , :

, . , : . , , , . . , 2 000.
. , . ( ). , , . , . 20 .

. , . — , . 100 200.

, , , .

№1: batch size

baseline , ( ) (. 5).

. 5. baseline- .

random state. .

. «» , , .

, (. batch size). 512 — - (50). , batch size . . :

upsample, ;
, .

batch size: (1).

$current\_batch\_size = b + \Big\lfloor \frac{n \bmod b}{\lfloor n / b \rfloor} \Big\rfloor \qquad (1)$

$b$ — batch size, $n$ — .
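Formula (1) can be checked with a few lines of Python. In this sketch `flexible_batch_size` is a hypothetical name, and $n$ is taken to be the current number of training samples; its definition is truncated in the text, so that reading is an assumption. The case $n < b$ is not covered by the formula (the denominator would be zero), and using a single batch there is also an assumption.

```python
def flexible_batch_size(n, b):
    """Spread the remainder of n / b evenly over the full batches, as in (1)."""
    if n <= 0 or b <= 0:
        raise ValueError("n and b must be positive")
    full_batches = n // b
    if full_batches == 0:
        # Assumption: with fewer than b samples, use everything as one batch.
        return n
    return b + (n % b) // full_batches


# 2200 samples with base size 512: 4 full batches and 152 leftover samples,
# so each batch grows by 152 // 4 = 38.
print(flexible_batch_size(2200, 512))
```

With $n = 2200$ and $b = 512$ the adjusted size is 550, and four batches of 550 cover the data exactly, so no small trailing batch remains.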

“” (. 6).

. 6. batch size (passive ) (passive + flexible )

: c . , , batch size . .

Uncertainty

— uncertainty sampling. , , .

1. (. Least confident sampling)

, :

$x^{*}_{LC} = \underset{x}{\arg\max}\ \big(1 - P_{\theta}(\hat{y} \mid x)\big) \qquad (2)$

$\hat{y} = \underset{y}{\arg\max}\ P_{\theta}(y|x)$ — , $y$ — , $x$ — , $x^{*}_{LC}$ — , .

. , $1-\hat{y}$ . , . .

. , : {0,5; 0,49; 0,01}, — {0,49; 0,255; 0,255}. , (0,49) , (0,5). , ó : . , .
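The two distributions from the example above can be run through formula (2) directly. This is a minimal sketch with hypothetical function names; it assumes the model's class probabilities for each pool object are already available as plain lists.

```python
def least_confidence(probs):
    """Score from (2): one minus the probability of the most likely class."""
    return 1.0 - max(probs)


def select_least_confident(pool_probs, k):
    """Return indices of the k pool objects with the highest score."""
    scores = [least_confidence(p) for p in pool_probs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


pool_probs = [
    [0.5, 0.49, 0.01],     # first distribution from the example
    [0.49, 0.255, 0.255],  # second distribution from the example
]
print([least_confidence(p) for p in pool_probs])
print(select_least_confident(pool_probs, 1))
```

The strategy picks the second object, because its top probability (0.49) is lower than that of the first (0.5), even though the first object has two nearly tied classes. Only the largest probability enters the score; the rest of the distribution is ignored.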

2. (. Margin sampling)

, , , :

$x^{*}_{M} = \underset{x}{\arg\min}\ \big(P_{\theta}(\hat{y}_1 \mid x) - P_{\theta}(\hat{y}_2 \mid x)\big) \qquad (3)$

$\hat{y}_1$ — $x$ , $\hat{y}_2$ — .

, . , . , , MNIST ( ) — , . .
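Formula (3) can be sketched in the same style. The function names are hypothetical, and the two probability vectors are reused from the least confident sampling example so the strategies can be compared on identical inputs.

```python
def margin(probs):
    """Gap between the two most probable classes, as in (3)."""
    if len(probs) < 2:
        raise ValueError("margin needs at least two classes")
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def select_smallest_margin(pool_probs, k):
    """Return indices of the k pool objects with the smallest margin."""
    scores = [margin(p) for p in pool_probs]
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[:k]


pool_probs = [
    [0.5, 0.49, 0.01],
    [0.49, 0.255, 0.255],
]
print([round(margin(p), 3) for p in pool_probs])
print(select_smallest_margin(pool_probs, 1))
```

Here the first object is chosen: its two leading classes differ by only 0.01, against 0.235 for the second object. On this pair, margin sampling and least confident sampling make opposite choices.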

3. (. Entropy sampling)

$x^{*}_{H} = \underset{x}{\arg\max}\ \Big(-\sum_{i} P_{\theta}(y_i \mid x) \log P_{\theta}(y_i \mid x)\Big) \qquad (4)$

$y_{i}$ — $i$ - $x$ .

, , . :

, , ;
, .

, , . , entropy sampling .
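A minimal sketch of formula (4), again with hypothetical function names and the same two example distributions. Natural logarithms are used; the base of the logarithm is an assumption and does not affect the ranking.

```python
import math


def entropy(probs):
    """Shannon entropy of a class distribution, as in (4); 0 * log 0 is taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_max_entropy(pool_probs, k):
    """Return indices of the k pool objects with the highest entropy."""
    scores = [entropy(p) for p in pool_probs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


pool_probs = [
    [0.5, 0.49, 0.01],
    [0.49, 0.255, 0.255],
]
print([round(entropy(p), 3) for p in pool_probs])
print(select_max_entropy(pool_probs, 1))
```

Unlike the previous two strategies, entropy takes the whole distribution into account, so the second object, whose probability mass is spread over all three classes, gets the higher score.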

(. 7).

. 7. uncertainty sampling ( — , — , — )

, least confident entropy sampling , . margin sampling .

, , : MNIST. , , entropy sampling , . , .

. $O(p\log{q})$ , $p$ — , $q$ — . , .

BALD

, , — BALD sampling (Bayesian Active Learning by Disagreement). .

, query-by-committee (QBC). — . uncertainty sampling. , . QBC Monte Carlo Dropout, .

, , — . dropout . dropout , ( ). , dropout- (. 8). Monte Carlo Dropout (MC Dropout) . , . ( dropout) Mutual Information (MI). MI , , — , . .

. 8. MC Dropout BALD

, QBC MC Dropout uncertainty sampling. , (. 9).

. 9. uncertainty sampling QBC ( - , - , - )

. 9. uncertainty sampling ( QBC ) ( — , — , — )

BALD. , Mutual Information :

$a_{BALD} = \mathbb{H}(y_1, \dots, y_n) - \mathbb{E}[\mathbb{H}(y_1, \dots, y_n \mid \omega)] \qquad (5)$

$\mathbb{E}[\mathbb{H}(y_1, \dots, y_n \mid \omega)] = \frac{1}{k} \sum_{i=1}^{n} \sum_{j=1}^{k} \mathbb{H}(y_i \mid \omega_j) \qquad (6)$

$n$ — , $k$ — .
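For a single object, the mutual information in (5) is commonly estimated from $k$ stochastic forward passes as the entropy of the averaged prediction minus the average entropy of the individual predictions. The sketch below implements that common estimator in plain Python; whether it matches the implementation used in this work is an assumption, and `bald_score` is a hypothetical name. The input is a list of $k$ probability vectors, one per pass with dropout left active.

```python
import math


def entropy(probs):
    """Shannon entropy with 0 * log 0 taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bald_score(mc_probs):
    """Mutual information estimate for one object.

    mc_probs -- k probability vectors, one per stochastic forward pass.
    """
    k = len(mc_probs)
    n_classes = len(mc_probs[0])
    # Average the predictions of all passes, class by class.
    mean_probs = [sum(p[c] for p in mc_probs) / k for c in range(n_classes)]
    predictive_entropy = entropy(mean_probs)                  # first term of (5)
    expected_entropy = sum(entropy(p) for p in mc_probs) / k  # second term of (5)
    return predictive_entropy - expected_entropy


agree = [[0.5, 0.5], [0.5, 0.5]]     # every pass is equally unsure
disagree = [[1.0, 0.0], [0.0, 1.0]]  # passes are confident but contradict each other
print(bald_score(agree), bald_score(disagree))
```

Both inputs have the same averaged prediction, yet only the second gets a high score: the estimate is large when individual passes are confident but disagree, and zero when every pass returns the same distribution.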

(5) , — . , , . BALD . 10.

. 10. BALD

, , .

query-by-committee BALD , . , uncertainty sampling. , — $O(kp\log(q))$ , $p$ — , $q$ — , $k$ — , .

BALD tf.keras, . PyTorch, dropout , batch normalization , .

№2: batch normalization

batch normalization. batch normalization — , . , , , , batch normalization. , . , . BALD. (. 11).

. 11. batch normalization BALD

, , .

batch normalization, . , .

Learning loss

. , . , .

, . — . , . learning loss, . , (. 12).

. 12. Learning loss

learning loss . .

. , . «» learning loss: , , . ideal learning loss (. 13).

. 13. ideal learning loss

, learning loss.

, . , , - , . :

(2000 ), ;
10000 ( );
;
;
100 ;
, , 1;
.

, , . , ( margin sampling).

1.

		p-value
loss	-0.2518	0.0115
margin	0.2461	0.0136

, margin sampling — , , , . c .

: ?

, , (. 14).

. 14. ideal learning loss ideal learning loss

, MNIST :

2. MNIST

		p-value
loss	0.2140	0.0326
	0.2040	0.0418

ideal learning loss , (. 15).

Fig. 15. Active learning of the character classifier on the MNIST dataset with the ideal learning loss strategy. Blue curve: ideal learning loss; orange curve: passive learning

, , , , . .

learning loss , uncertainty sampling: $O(p\log{q})$ , $p$ — , $q$ — . , , . , .

, . . , margin sampling — . 16.

Fig. 16. Comparison of training on randomly selected data (passive learning) and on data selected by the margin sampling strategy

: ( — margin sampling), — , , . ≈25 . . 25% — .

, . , , .

, , . , :

batch size;
, , — , batch normalization.
