This webpage showcases the master project developed by João Carvalho at the Chair of Algorithms and Data Structures of the University of Freiburg, as part of the MSc degree in Computer Science. In this project we explored the capabilities of neural language models. More precisely, we asked whether a neural network can encode Part-of-Speech (POS) tags in its neurons just by being trained as a simple language model.

We first trained a byte-level language model with a Long Short-Term Memory (LSTM) network on a large collection of text. Then, taking a sentence, which can be viewed as a byte sequence, we used its inner representations (the cell states of the LSTM), along with its corresponding POS tags, as the inputs and targets to train a logistic regression classifier. Looking at the classifier weights, we observed that some concepts (POS tags) are encoded in a single neuron, i.e., the POS tag of a byte can be read off from one neuron's activation value, while others are only recoverable from several neurons combined through the logistic regression classifier. For some tags, using three neurons already yielded satisfactory results. Minimal code sketches of the training and probing steps are given at the end of this page.

The idea for this project started from the OpenAI paper (Radford et al., 2017). In that article, the authors found a dimension in the cell states (a neuron) that strongly correlates with the semantic concept of sentiment, which they called the Sentiment Neuron. In this project we also replicated their results.

Natural Language is the system humans use to communicate with each other. Natural Language Processing (NLP) is a field of study concerned with the automatic processing of natural language (Manning et al., 1999). Its main focus is to enable machines to independently process the base concepts of linguistics: morphology (word formation), syntax (sentence structure / grammar) and semantics (word / sentence meaning). This means that machines should not only recognize the structure of language, but also understand it.

When analysing and processing language, there are two fundamentally different approaches (Manning et al., 1999). On the one hand, rationalists (usually associated with Noam Chomsky's ideas) assume language is an innate characteristic of humans, not derived from the senses, but rather somehow encoded in the genome. On the other hand, empiricists assume the brain has some initial cognitive ability to learn and organize language, but that ultimately its structure is not innate and must be learned. The difference between these views can sometimes be subtle, but a clear distinction exists, because the two parties describe different things. Rationalists seek to describe how the human mind models language, for instance in the form of grammars, treating written or spoken language as indirect evidence of that internal model. Empiricists describe language as it actually occurs. In this work we analyse language from the empiricists' perspective.

Due to the large amount of written and spoken language available in digital form, which can easily be processed by computers, we can employ statistical methods to analyse it. This approach is commonly referred to as Statistical NLP, since it performs statistical inference and prediction on natural language. In this setting, we model language using probability distributions instead of formal grammars (Manning et al., 1999). In the context of NLP, a language model is a function that assigns a probability measure to a sequence of tokens drawn from a vocabulary (Bengio et al., 2016).
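Concretely, a language model factorizes the probability of a token sequence with the chain rule; this is the standard textbook formulation rather than anything specific to this project:

$$
P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})
$$

A byte-level model simply takes the tokens $w_t$ to be bytes, so the vocabulary has 256 entries.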
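To make the first step of the pipeline concrete, here is a minimal sketch of a byte-level LSTM language model in PyTorch. It is an illustrative assumption of how such a model can be set up, not the project's actual code: the layer sizes, optimizer, and toy corpus are all placeholder choices.

```python
import torch
import torch.nn as nn

class ByteLSTM(nn.Module):
    """Byte-level language model: predicts the next byte from the previous ones."""
    def __init__(self, hidden_size=512, embed_size=64):
        super().__init__()
        self.embed = nn.Embedding(256, embed_size)  # one entry per possible byte
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 256)      # logits over the next byte

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.out(h), state

model = ByteLSTM()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy stand-in for a large text collection.
text = b"the quick brown fox jumps over the lazy dog " * 32
data = torch.tensor(list(text), dtype=torch.long).unsqueeze(0)
inputs, targets = data[:, :-1], data[:, 1:]  # predict each byte from its prefix

for step in range(100):
    logits, _ = model(inputs)
    loss = loss_fn(logits.reshape(-1, 256), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```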
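The probing step can be sketched in the same spirit: collect the LSTM cell state at each byte of a tagged sentence, then fit a logistic regression classifier from cell states to POS tags and inspect its weights. The arrays below are random placeholders standing in for real cell states and labels, so the outputs mean nothing; only the shape of the procedure matters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder probing data (hypothetical): in the real pipeline, one would run
# tagged sentences through the trained LSTM one byte at a time and record the
# cell-state vector (here 512-dimensional) and the POS tag of each byte.
rng = np.random.default_rng(0)
cell_states = rng.normal(size=(5000, 512))  # stand-in for real cell states
pos_tags = rng.integers(0, 12, size=5000)   # stand-in for real POS-tag labels

# Cell states are the inputs, POS tags the targets.
clf = LogisticRegression(max_iter=1000)
clf.fit(cell_states, pos_tags)

# Inspect the learned weights: for each tag, which neurons carry the most
# signal? A tag dominated by one large-magnitude weight is a candidate for
# being encoded in a single neuron; other tags need several neurons combined
# through the classifier.
for tag in range(clf.coef_.shape[0]):
    top = np.argsort(np.abs(clf.coef_[tag]))[::-1][:3]
    print(f"tag {tag}: strongest neurons {top}")
```

Ranking coefficients by magnitude, as above, is one plausible way to arrive at observations like "for some tags, three neurons already suffice".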