
Neural Networks Engineering

Channel address: @neural_network_engineering
Categories: Technologies
Language: English
Subscribers: 2.35K
Description from channel

An author-run channel about neural network development and mastering machine learning. Experiments, tool reviews, personal research.
#deep_learning
#NLP
Author @generall93

Ratings & Reviews

3.00 (2 reviews)

Reviews can be left only by registered users. All reviews are moderated by admins.

5 stars: 1
4 stars: 0
3 stars: 0
2 stars: 0
1 star: 1


The latest messages

2019-05-27 11:00:22 I have finished building a demo and landing page for my project on mention classification. The idea of this project is to create a model which can assign labels to objects based on their mentions in context. Right now it works only for mentions of people, but if this work attracts interest, I will extend the model to other types, such as organizations or events. For now, you can check out the online demo of the neural network.

The current implementation can take several mentions into account at a time, so it can distinguish the relevant parts of the context rather than just averaging predictions.
It is also open source and built with the AllenNLP framework, from training to serving. Take a look at it.
More technical details of the implementation are coming later.
5.3K views, nne_controll_bot, edited 08:00
2019-04-28 17:40:01 I wrote an article on Medium about squeezing fastText into Colab.

Tl;dr: the original binary fastText model is too large for Colab.
We can shrink it, but this is a little tricky for the n-gram matrix: we need to consider the uniformity of the collision distribution.

The final model takes 2 GB of RAM instead of 16 GB and is 94% similar to the original model.

Code is also provided.
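
Below is only a minimal sketch of the core idea, with hypothetical names and shapes (not the article's actual code): fold the n-gram matrix into fewer buckets while keeping collisions roughly uniform.

import numpy as np

def shrink_ngram_matrix(ngram_vectors: np.ndarray, new_buckets: int) -> np.ndarray:
    # Fold an (old_buckets, dim) n-gram matrix into (new_buckets, dim).
    # Vectors whose bucket ids collide under the new mapping are averaged,
    # so a roughly uniform collision distribution keeps the quality loss low.
    old_buckets, dim = ngram_vectors.shape
    shrunk = np.zeros((new_buckets, dim), dtype=ngram_vectors.dtype)
    counts = np.zeros(new_buckets, dtype=np.int64)
    for old_id in range(old_buckets):
        new_id = old_id % new_buckets  # simple re-hashing; fastText uses its own hash
        shrunk[new_id] += ngram_vectors[old_id]
        counts[new_id] += 1
    return shrunk / np.maximum(counts, 1)[:, None]
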
3.5K views, nne_controll_bot, edited 14:40
2019-02-15 10:00:49 Neural networks have achieved great success at various NLP tasks; however, they are limited in handling infrequent patterns. In this article, the problem is described in the context of the machine translation task.

The authors note that NMT is good at learning translation pairs that are observed frequently, but the system may ‘forget’ to use low-frequency pairs when they should be used. In contrast, in traditional rule-based systems, low-frequency pairs cannot be smoothed out, no matter how rare they are. One solution is to combine both approaches.

The authors propose to use a large external memory along with a selection mechanism that allows the NN to use this memory. The selection step fetches all relevant translation variants using the words of the source sentence, and then an attention mechanism selects among these variants. After that, the neural net decides which source of prediction should be used at each translation step.

The important thing is that the vectors for the external memory were trained separately. That basically means that we can build knowledge bases for neural nets, which seems like a promising way to construct really large-scale models with huge capacity.
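
As a rough illustration only (not the paper's actual model), the per-step choice between the decoder's own distribution and the external memory can be sketched as a gated mixture. All names, shapes and the dot-product scoring below are assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryAugmentedOutput(nn.Module):
    # Gate between the decoder's own word distribution and a distribution
    # read from an external memory of retrieved translation variants.
    def __init__(self, hidden_dim, memory_dim, vocab_size):
        super().__init__()
        self.decoder_proj = nn.Linear(hidden_dim, vocab_size)  # standard NMT output layer
        self.query_proj = nn.Linear(hidden_dim, memory_dim)    # maps the state to a memory query
        self.gate = nn.Linear(hidden_dim, 1)                   # decides which source to trust

    def forward(self, decoder_state, memory_keys, memory_token_ids):
        # decoder_state: (batch, hidden_dim)
        # memory_keys: (batch, n_mem, memory_dim) - separately trained memory vectors
        # memory_token_ids: (batch, n_mem) - target-vocabulary ids of the retrieved variants
        nmt_dist = F.softmax(self.decoder_proj(decoder_state), dim=-1)
        query = self.query_proj(decoder_state).unsqueeze(1)                # (batch, 1, memory_dim)
        scores = torch.bmm(query, memory_keys.transpose(1, 2)).squeeze(1)  # (batch, n_mem)
        mem_weights = F.softmax(scores, dim=-1)
        # scatter the memory attention weights onto the target vocabulary
        mem_dist = torch.zeros_like(nmt_dist).scatter_add_(1, memory_token_ids, mem_weights)
        g = torch.sigmoid(self.gate(decoder_state))  # per-step choice of prediction source
        return g * nmt_dist + (1 - g) * mem_dist
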
4.1K views, nne_controll_bot, edited 07:00
2018-12-28 10:00:52 Let's continue our dive into Question Answering. Last time we generated several variants of synthetic sequences, from which we need to extract "answers". Each sequence type has its own pattern, and we want a neural network to find it.
In the most general sense, this task looks like a sequence transformation (seq2seq), similar to NMT.
In this post, I will describe how to implement a simple seq2seq network with the AllenNLP framework.

The AllenNLP library includes components that standardize and simplify the creation of neural networks for text processing.
Its developers have done a great job decomposing a variety of NLP tasks into separate blocks, which allowed them to implement a set of universal pipeline components suitable for reuse.
The implemented components can be used directly from code or referenced in config files.

I have created a repository for my experiments. It contains a simple config file along with some accessory files. Let's take a look.

One of the main configuration parameters is the model. The model determines what happens to the data and the network during training and prediction. The model parameter itself refers to a class that derives from allennlp.models.model.Model and implements the forward method.
We will use the simple_seq2seq model, which implements a basic sequence transformation scheme.

In classical seq2seq, the source sequence is transformed by the Encoder into a representation, which is then read by the Decoder to generate the target sequence.
The simple_seq2seq module implements only the Decoder. The Encoder has to be implemented in another class and passed as a model parameter.
We will use the simplest encoder option: an LSTM.

Here are some other useful model parameters (a rough config sketch follows the list):

- source_embedder - this class assigns a pre-trained vector to each input token. We have no pre-trained vectors for synthetic data, so we will use random vectors. We will also make them untrainable to prevent overfitting.
- attention - the attention function used at each decoding step; the attention vector is concatenated with the decoder state.
- beam_size - the number of variants generated by beam search during decoding.
- scheduled_sampling_ratio - controls how often the decoder is fed its own prediction instead of the gold token as the previous element during training.
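
To show how these parameters fit together, here is a rough sketch of a config, written as a Python dict that mirrors config.json. The paths, dimensions and exact option names are assumptions, not the repository's actual config:

config = {
    "dataset_reader": {"type": "seq2seq"},
    "train_data_path": "data/train.tsv",        # hypothetical path
    "validation_data_path": "data/valid.tsv",   # hypothetical path
    "model": {
        "type": "simple_seq2seq",
        "source_embedder": {
            "token_embedders": {
                "tokens": {"type": "embedding", "embedding_dim": 64, "trainable": False}
            }
        },
        "encoder": {"type": "lstm", "input_size": 64, "hidden_size": 128},
        "max_decoding_steps": 20,
        "attention": {"type": "dot_product"},
        "beam_size": 5,
        "scheduled_sampling_ratio": 0.5,
    },
    "iterator": {"type": "bucket", "batch_size": 32,
                 "sorting_keys": [["source_tokens", "num_tokens"]]},
    "trainer": {"num_epochs": 10, "optimizer": {"type": "adam"}},
}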

Then we save our dataset in a format that the seq2seq dataset reader, implemented in AllenNLP, can work with (see the snippet below). Now we can launch training with the single command allennlp train config.json and observe training statistics in TensorBoard.
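
As an illustration with made-up data, the seq2seq reader expects one tab-separated source/target pair per line, which we can write out like this:

examples = [("a b c d", "d c b a"), ("1 2 3", "3 2 1")]  # made-up pairs
with open("data/train.tsv", "w") as f:
    for source, target in examples:
        f.write(source + "\t" + target + "\n")
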
A trained model can easily be used from Python; here is an example.
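
As a minimal sketch only (not the post's actual example), assuming an AllenNLP 0.x-style API and a hypothetical archive path, prediction could look roughly like this:

from allennlp.models.archival import load_archive
from allennlp.predictors import Predictor

archive = load_archive("output/model.tar.gz")                  # produced by allennlp train
predictor = Predictor.from_archive(archive, "simple_seq2seq")  # predictor name is an assumption
result = predictor.predict(source="a b c d")
print(result["predicted_tokens"])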

It should be noted that the model overfits quickly on synthetic data, so I generated a lot of it.

Unfortunately, the AllenNLP seq2seq module is still under construction, and it can't handle all existing variants of seq2seq models. For example, you can't implement the Transformer architecture from the paper "Attention Is All You Need": the Transformer requires a custom decoder, but the decoder is hardcoded in simple_seq2seq. If you want to contribute to the AllenNLP seq2seq model, please take a look at this issue. Leaving your reaction will help focus the AllenNLP developers' attention on it.
3.5K views, nne_controll_bot, edited 07:00
2018-09-17 10:00:48 Another approach to transfer learning in NLP is Question Answering.
In the most general case, Question Answering is the generation of a textual answer to a given question from a given set of facts in some form.
You can find a demo of a QA system here.

There are many types of these systems:

Categorized by facts representation:

A. Relational database
B. Complex data structure - ontology, semantic web, etc.
C. Text

Categorized by answer types

1. Yes/No - a particular case of matching models
2. Finding the bounding indexes of the answer in the text
3. Generating an answer from the given text and question

Categorized by question type

a. The only possible question - the model has no input for questions; it learns to answer the single question defined by the training set
b. A constant number of questions - the model has a one-hot encoded input for questions
c. Textual question in a special query language - projects like this
d. Textual question in free form - the model is supposed to somehow encode the text of the question

For example, this article deals with combination C-2-d in this categorization.
This combination leads to the need for complex bi-directional attention mechanisms like BiDAF.
I, on the contrary, want to concentrate on generating answers without initial markup in the form of answer boundaries. And I will not care about complex question representations for now.
Let's start with a synthetic data baseline, as described in my previous posts.
In this notebook I wrote a list of data generators; each one is slightly more complicated than the previous one (a rough sketch of such a generator follows below).
In the next posts, I will describe my attempts to implement a neural network architecture that should be able to generate correct answers for these datasets, starting from the simplest ones.
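
The notebook contains the actual generators; as an idea-only sketch, a simple one might insert a marker token into a random sequence and take the next token as the "answer" (the pattern and names here are made up):

import random

VOCAB = [str(i) for i in range(10)]

def generate_marker_example(length=8, marker="M"):
    # The "answer" is the token that immediately follows the marker token.
    tokens = [random.choice(VOCAB) for _ in range(length)]
    position = random.randrange(length - 1)
    tokens.insert(position, marker)
    answer = tokens[position + 1]
    return " ".join(tokens), answer
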
2.9K views, nne_controll_bot, edited 07:00
2018-08-05 22:11:05 Named Entity Linking with text matching neural network https://github.com/generall/OneShotNLP/blob/master/NEL.ipynb
2.4K views, Andrey, 19:11
2018-08-04 18:35:58 There are some cases when you need to run your model on a small machine.
For example, if your model is called once per hour, or you just don't want to pay Amazon $150 per month for a t2.2xlarge instance with 32 GB of RAM.
The problem is that the size of most pre-trained word embeddings can reach tens of gigabytes.

In this post, I will describe a method of accessing word vectors without loading them into memory.
The idea is to simply save the word vectors as an on-disk matrix, so that we can compute the position of each row and read it without touching any other rows.
Fortunately, all this logic is already implemented in numpy.memmap.
The only thing we need to implement ourselves is the function which converts a word into the appropriate row index. We can simply store the whole vocabulary in memory or use the hashing trick; it does not matter at this point.
It is slightly harder to store fastText vectors this way, because obtaining a word vector requires additional computation over n-grams.
So, for simplicity, we will just pre-compute vectors for all the necessary words.
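
Here is a minimal sketch of the idea (not the repository's DiscVectors implementation; file names and the vocabulary format are assumptions):

import json
import numpy as np

class OnDiskVectors:
    def __init__(self, matrix_path, vocab_path):
        # mmap_mode="r" maps the .npy file lazily: a row is read from disk only when accessed
        self.matrix = np.load(matrix_path, mmap_mode="r")
        with open(vocab_path) as f:
            self.word_to_row = json.load(f)  # {"word": row_index, ...}

    def get_word_vector(self, word):
        row = self.word_to_row.get(word)
        if row is None:
            return np.zeros(self.matrix.shape[1], dtype=self.matrix.dtype)
        return np.asarray(self.matrix[row])  # reads a single row from disk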

You may take a look at a simple implementation of the described approach here:
https://github.com/generall/OneShotNLP/blob/master/src/utils/disc_vectors.py

The DiscVectors class contains a method for converting a fastText .bin model into an on-disk matrix representation plus a JSON file with the vocabulary and meta-information.
Once the model is converted, you can retrieve vectors with the get_word_vector method. A performance check shows that in the worst case it takes 20 µs to retrieve a single vector, which is pretty good given that we are not using any significant amount of RAM.
2.3K views, Andrey, 15:35
2018-07-24 00:24:19 Parallel preprocessing with multiprocessing

Using multiple processes to construct training batches may significantly reduce the total training time of your network.
Basically, if you are using a GPU for training, you can reduce the additional batch construction time almost to zero. This is achieved by pipelining the computations: while the GPU crunches numbers, the CPU does the preprocessing. The Python multiprocessing module allows us to implement such pipelining about as elegantly as is possible in a language with a GIL.

The PyTorch DataLoader class, for example, also uses multiprocessing internally.
Unfortunately, DataLoader lacks flexibility: it is hard to create batches with an arbitrarily complex structure within the standard DataLoader class. So it is useful to know how to apply raw multiprocessing.

multiprocessing gives us a set of useful APIs to distribute computations among several processes. Processes do not share memory with each other, so data is transmitted via inter-process communication; for example, on Linux-like operating systems multiprocessing uses pipes. This organization leads to some pitfalls that I am going to describe.

* map vs imap

Methods map and imap may be used to apply preprocessing to batches. Both take a processing function and an iterable as arguments. The difference is that imap is lazy: it returns processed elements as soon as they are ready, so all the processed batches do not have to be stored in RAM simultaneously. For training a NN you should always prefer imap:

from multiprocessing import Pool

def process(batch_reader):
    # foo is the per-batch preprocessing function; threads is the number of worker processes
    with Pool(threads) as pool:
        for batch in pool.imap(foo, batch_reader):
            # ...
            yield batch
            # ...

* Serialization

Another pitfall is associated with the need to transfer objects via pipes. In addition to the processing results, multiprocessing will also serialize the transformation object if it is used like this: pool.imap(transformer.foo, batch_reader). The transformer will be serialized and sent to the subprocess, which may lead to problems if the transformer object has large properties. In this case it may be better to store the large properties as singleton class variables:


class Transformer:
    large_dictionary = None  # stored on the class, so it is not pickled with each instance

    def __init__(self, large_dictionary, **kwargs):
        self.__class__.large_dictionary = large_dictionary

    def foo(self, x):
        # ...
        y = self.large_dictionary[x]
        # ...

Another difficulty you may encounter arises when the preprocessing is faster than the GPU training. In this case unprocessed batches accumulate in memory, and if your memory is not large enough you will get an out-of-memory error. One way to solve this problem is to pause batch preprocessing until the GPU catches up.
A Semaphore is a perfect fit for this task:

from multiprocessing import Pool, Semaphore


def batch_reader(semaphore):
    for batch in source:
        semaphore.acquire()  # blocks when limit batches are already in flight
        yield batch


def process(x):
    return x + 1


def pooling():
    semaphore = Semaphore(limit)
    with Pool(threads) as pool:
        for x in pool.imap(process, batch_reader(semaphore)):
            yield x
            semaphore.release()  # free a slot once the batch has been consumed


for x in pooling():
    learn_gpu(x)

The Semaphore keeps an internal counter synchronized across all worker processes. semaphore.acquire() blocks execution once limit batches have been acquired but not yet released, and semaphore.release() frees up a slot after the GPU has consumed a batch.
3.7K views, Andrey, 21:24