


---
title: "Pretrained Character Embeddings for Deep Learning and Automatic Text Generation"
author: "Max Woolf (@minimaxir)"
date: "2017-04-04"
output:
  html_notebook:
    highlight: tango
    mathjax: null
    number_sections: yes
    theme: spacelab
    toc: yes
    toc_float: yes
---

This R Notebook is the complement to my blog post [Pretrained Character Embeddings for Deep Learning and Automatic Text Generation](http://minimaxir.com/2017/04/char-embeddings/).

This notebook is licensed under the MIT License. If you use the code or data visualization designs contained within this notebook, it would be greatly appreciated if proper attribution is given back to this notebook and/or myself. Thanks! :)

```{r}
source("Rstart.R")
library(tsne)
```

# Visualize GloVe Vectors

```{r}
# use an em-dash as the quote character (it never appears in the data),
# so literal quote marks in the file parse as ordinary characters
df <- read_delim("glove.840B.300d-char.txt", col_names = F, delim=" ", quote = "—")
df[,1:6]
```

Assign colors by character type (warning: janky nested-`ifelse` implementation for determining character type).

```{r}
type <- ifelse(df$X1 %in% letters, "lowercase",
          ifelse(df$X1 %in% LETTERS, "uppercase",
            ifelse(df$X1 %in% c(0:9), "numeric", "punctuation")))

type <- factor(type, levels=c("lowercase", "uppercase", "numeric", "punctuation"))
type
```
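
For reference, the nested `ifelse()` above can be written more readably with `dplyr::case_when()`. This helper is a sketch, not part of the original pipeline; it produces the same factor.

```{r}
library(dplyr)

# Sketch: the same classification with case_when(), which reads
# top-to-bottom instead of nesting three ifelse() calls.
char_type <- function(x) {
  factor(case_when(
    x %in% letters ~ "lowercase",
    x %in% LETTERS ~ "uppercase",
    x %in% as.character(0:9) ~ "numeric",
    TRUE ~ "punctuation"
  ), levels = c("lowercase", "uppercase", "numeric", "punctuation"))
}

char_type(c("a", "Z", "7", "!"))
```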


```{r}
perplexity = 7
initial_dims = 16
max_iter = 5000

set.seed(123)
df_reduce <- tsne(df %>% select(X2:X301) %>% data.matrix(), perplexity = perplexity,
                  initial_dims = initial_dims, max_iter = max_iter)

df_reduce <- data.frame(char = df$X1, type = type, df_reduce) %>%
                tbl_df() %>%
                mutate(char = as.character(char))
df_reduce
```
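
As a quick reminder of what `tsne()` returns (toy data, illustrative only): it takes an N×D matrix and produces N×2 coordinates, which is what gets plotted below.

```{r}
library(tsne)

set.seed(123)
toy <- matrix(rnorm(20 * 5), nrow = 20)    # 20 points in 5 dimensions
toy_2d <- tsne(toy, perplexity = 5, max_iter = 100)
dim(toy_2d)                                # 20 rows, 2 columns
```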

```{r}
plot <- ggplot(df_reduce, aes(x=X1, y=X2, label=char, color = type)) +
          geom_text(family="Source Code Pro Semibold") +
          theme_void(base_family = "Source Sans Pro", base_size=8) +
          scale_color_brewer(palette="Set1") + 
          labs(title = "Projection of 300D GloVe Character Vectors into 2D Space (16D, perplexity = 7)",
               subtitle = "Characters closer to each other are more similar in usage context.",
               color = '') +
          theme(plot.margin = unit(c(0.2,0.2,0.2,0.2),"cm"),
                plot.subtitle = element_text(family="Open Sans Condensed Bold", size=8, color="#666666"))

max_save(plot, "char-tsne", w=5, h=4, "Stanford NLP")
```

![](char-tsne.png)

```{r}
perplexity = 2
initial_dims = 64
max_iter = 5000

set.seed(123)
df_reduce <- tsne(df %>% select(X2:X301) %>% data.matrix(), perplexity = perplexity,
                  initial_dims = initial_dims, max_iter = max_iter)

df_reduce <- data.frame(char = df$X1, type = type, df_reduce) %>%
                tbl_df() %>%
                mutate(char = as.character(char))

plot <- ggplot(df_reduce, aes(x=X1, y=X2, label=char, color = type)) +
          geom_text(family="Source Code Pro Semibold") +
          theme_void(base_family = "Source Sans Pro", base_size=8) +
          scale_color_brewer(palette="Set1") + 
          labs(title = "Projection of 300D GloVe Character Vectors into 2D Space (64D, perplexity = 2)",
               subtitle = "Characters closer to each other are more similar in usage context.",
               color = '') +
          theme(plot.margin = unit(c(0.2,0.2,0.2,0.2),"cm"),
                plot.subtitle = element_text(family="Open Sans Condensed Bold", size=8, color="#666666"))

max_save(plot, "char-tsne-2", w=5, h=4, "Stanford NLP")
```

![](char-tsne-2.png)

# Visualize Embedded Magic Characters

```{r}
# em-dash quote character again, so literal quote marks parse as characters
df_embed <- read_delim("char-embeddings.txt", col_names = F, delim=" ", quote = "—")
df_embed <- na.omit(df_embed)   # drop the space and newline characters, whose rows fail to parse
df_embed[,1:6]
```

```{r}
type_embed <- ifelse(df_embed$X1 %in% letters, "lowercase",
                ifelse(df_embed$X1 %in% LETTERS, "uppercase",
                 ifelse(df_embed$X1 %in% c(0:9), "numeric", "punctuation")))

type_embed <- factor(type_embed, levels=c("lowercase", "uppercase", "numeric", "punctuation"))
```

```{r}
perplexity = 10
initial_dims = 30
max_iter = 5000

set.seed(123)
df_reduce <- tsne(df_embed %>% select(X2:X301) %>% data.matrix(), perplexity = perplexity,
                  initial_dims = initial_dims, max_iter = max_iter)

df_reduce <- data.frame(char = df_embed$X1, type = type_embed, df_reduce) %>%
                tbl_df() %>%
                mutate(char = as.character(char))

plot <- ggplot(df_reduce, aes(x=X1, y=X2, label=char, color = type)) +
          geom_text(family="Source Code Pro Semibold") +
          theme_void(base_family = "Source Sans Pro", base_size=8) +
          scale_color_brewer(palette="Set1") + 
          labs(title = "Projection of 300D Magic Card Character Vectors into 2D Space (30D, perplexity = 10)",
               subtitle = "Characters closer to each other are more similar in usage context.",
               color = '') +
          theme(plot.margin = unit(c(0.2,0.2,0.2,0.2),"cm"),
                plot.subtitle = element_text(family="Open Sans Condensed Bold", size=8, color="#666666"))

max_save(plot, "char-tsne-embed", w=5, h=4, "Keras Logging")
```

![](char-tsne-embed.png)

# Training Performance

```{r}
batches_per_epoch = 7850

df_train <- read_csv("log.csv") %>% filter(iteration <= 20) %>%
              mutate(cumbatch = (iteration-1) * batches_per_epoch + batch)
df_train %>% head(200)
```
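
The `cumbatch` column flattens (epoch, batch) pairs onto a single running batch axis; a quick sanity check of the formula:

```{r}
# Sanity check: the last batch of epoch 1 should be immediately
# followed by the first batch of epoch 2 on the flattened axis.
bpe <- 7850
iteration <- c(1, 1, 2, 2)
batch     <- c(1, 7850, 1, 7850)
(iteration - 1) * bpe + batch   # → 1 7850 7851 15700
```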

```{r}
#df_train_reshape <- df_train %>% gather(key = loss_type, value = loss, batch_loss, epoch_loss) %>%
#                      mutate(loss_type = factor(loss_type))
#df_train_reshape %>% head(100)
```


```{r}
plot <- ggplot(df_train, aes(x=cumbatch, y=batch_loss, color=factor(iteration))) +
          geom_line(size=0.1) +
          scale_y_sqrt(breaks=c(0, 0.25, 0.5, 1, 2, 4)) +
          scale_x_continuous(breaks = seq(0, max(df_train$cumbatch)+batches_per_epoch, by=batches_per_epoch), labels = c(0:20)) +
          fte_theme() +
          theme(panel.grid.major = element_line(size=0.1)) +
    labs(title = "Batch Loss Over Time While Training Magic Card Generator",
          x = "# Epoch",
          y = "Batch Loss (128 Samples per Batch)")

max_save(plot, "batch-losses", "Keras Logging")
```

![](batch-losses.png)


```{r}
plot <- ggplot(df_train, aes(x=cumbatch, y=epoch_loss, color=factor(iteration))) +
          geom_line(size=0.5) +
          scale_y_sqrt(breaks=c(0, 0.25, 0.5, 1, 2, 4)) +
          scale_x_continuous(breaks = seq(0, max(df_train$cumbatch)+batches_per_epoch, by=batches_per_epoch), labels = c(0:20)) +
          fte_theme() +
          theme(panel.grid.major = element_line(size=0.1)) +
    labs(title = "Epoch Loss Over Time While Training Magic Card Generator",
          x = "# Epoch",
          y = "Epoch Loss (Average Batch Loss During Epoch)")

max_save(plot, "epoch-losses", "Keras Logging")
```

![](epoch-losses.png)

# LICENSE

The MIT License (MIT)

Copyright (c) 2017 Max Woolf

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.