Artificial Intelligence Class Problem: Four Missionaries and Four Cannibals
Question
Artificial Intelligence Class Problem.
Four missionaries and four cannibals are on the west bank (W) of a river, along with a boat that can hold up to three people. Find a way to get everyone to the east bank (E) without ever leaving a group of missionaries in one place outnumbered by the cannibals in that place.
Solve the same problem with a second heuristic function, h2, using A* with GRAPH-SEARCH.
(a) Define your 2nd heuristic function, h2, which is admissible and consistent.
(b) What is your optimal solution?
(c) What is the optimal cost of the solution?
(d) Give the total number of expanded nodes, i.e. the states in the explored list.
(e) Compare the performance of both heuristics h1 and h2, in terms of the total number of expanded nodes.
Any help would be greatly appreciated!!!
Explanation / Answer
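One way to set the problem up concretely is the following minimal sketch (assumptions: unit cost per crossing; a state is (missionaries on W, cannibals on W, boat side); safety is checked on the banks only). A possible choice for h2, admissible and consistent, is h2(n) = ceil(people remaining on W / 3): each crossing carries at most three people east, so it never overestimates, and a single unit-cost move changes it by at most 1. Running the sketch prints an optimal path, its cost, and the number of expanded states.

import heapq
from math import ceil

M, C, CAP = 4, 4, 3   # four missionaries, four cannibals, boat holds three

def safe(m, c):
    # A bank is safe if it has no missionaries, or at least as many
    # missionaries as cannibals.
    return m == 0 or m >= c

def valid(state):
    m, c, _ = state   # m, c = people still on the west bank
    return 0 <= m <= M and 0 <= c <= C and safe(m, c) and safe(M - m, C - c)

def successors(state):
    m, c, boat = state
    sign = -1 if boat == 'W' else 1   # the boat carries people off its own bank
    for dm in range(CAP + 1):
        for dc in range(CAP + 1 - dm):
            if 1 <= dm + dc <= CAP:
                nxt = (m + sign * dm, c + sign * dc, 'E' if boat == 'W' else 'W')
                if valid(nxt):
                    yield nxt

def h2(state):
    # One admissible, consistent choice: every crossing moves at most CAP
    # people east, so ceil(remaining people / CAP) never overestimates, and
    # one unit-cost move changes it by at most 1.
    m, c, _ = state
    return ceil((m + c) / CAP)

def astar():
    start, goal = (M, C, 'W'), (0, 0, 'E')
    frontier = [(h2(start), 0, start, [start])]
    explored = set()
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, g, len(explored)
        if state in explored:   # GRAPH-SEARCH: expand each state once
            continue
        explored.add(state)
        for nxt in successors(state):
            if nxt not in explored:
                heapq.heappush(frontier, (g + 1 + h2(nxt), g + 1, nxt, path + [nxt]))
    return None, None, len(explored)

path, cost, expanded = astar()
print('optimal cost:', cost, '  expanded nodes:', expanded)
for state in path:
    print(state)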
1. MOTIVATION AND CONTEXT FOR NATURAL LANGUAGE PROCESSING
1.1 The Nature and Role of Natural Language Processing
Natural Language Processing (NLP) is a sub-field of Artificial Intelligence. It is also known as Computational Linguistics.
NLP is concerned with the production and comprehension of natural languages such as English or Russian. It deals largely with written language or text, but there is some consideration of spoken language, including phonology, the study of the sounds that make up a language.
You can find an interesting glossary of terms used in NLP on the World Wide Web at:
http://www.cs.bham.ac.uk/~pxc/nlpa/nlpgloss.html
1.1.1 Functions of language in human communication
Language is the principal method of communication between humans. We use it for a number of different purposes: informing, requesting, commanding, querying, answering, acknowledging and sharing.
Notice that some of these (informing, answering, acknowledging and sharing) are intended to transfer information to the listener, while others (requesting, commanding, querying) are intended to prompt the listener to take some action.
Some communications such as greetings ‘Good morning. How are you today?’ ‘I’m fine thanks. How are you?’ are intended only to build and reinforce social links and convey little or no real information.
1.1.2 Language as a sign of intelligence
No one really knows if we use language because we’re intelligent or if we’re intelligent because we use language. Jerison suggests that human language arises from the need for better ‘cognitive maps’ of our territory. He points out that dogs and other carnivores rely largely on scent marking and their sense of smell to tell them where they are and what other animals have been there. The early primates (30 million years ago) lacked this well-developed sense of smell and substituted sounds for scent marking. Language may simply be a means of compensating for our inadequate noses!
1.1.3 Natural language and other forms of communication
Natural language is not the only form of communication that exists. We'll look briefly at four others: sign language, non-human communication, programming languages and formal logic.
Sign Language
Sign languages, such as British Sign Language (BSL) and American Sign Language (ASL), are true languages, with vocabularies of thousands of words and grammars as complex and sophisticated as those of any spoken or written language. BSL is now the fourth most widely used language in the UK, and a major campaign is under way to have it officially recognised by the government, as has already happened in most EU countries.
ASL is quite different from BSL; it is more closely related to French Sign Language, due to the influence of Laurent Clerc, the first teacher of the deaf in the United States. ASL has a Topic-Comment syntax, while English uses Subject-Verb-Object order. In terms of syntax, ASL shares more with spoken Japanese than it does with English.
Sign languages are not invented systems like Esperanto. They are linguistically complete, natural languages and are the native languages of many deaf men and women, as well as some hearing children born into deaf families.
Sign languages are sometimes described as gestural languages. This is not absolutely correct because hand gestures are only one component. Facial features such as eyebrow motion and lip-mouth movements are also significant and form a crucial part of the grammatical system. Sign languages also make use of the space surrounding the signer to describe places and persons that are not present.
Sign languages have a very complex grammar. Unlike spoken languages where there is only a single stream of sounds, sign languages can have several things happening at the same time. For instance, the concept of ‘very big’ is conveyed by the simultaneous use of a hand gesture for ‘big’ and a mouth/cheek shape for ‘very’. Sign languages have their own morphology (rules for the creation of words), phonetics (rules for hand shapes), and grammar that are very unlike those found in spoken languages.
Sign languages should not be confused with Signed English, which is a word-for-word signed equivalent of English. Deaf people tend to find it tiring, because its grammar, like that of spoken languages, is linear, while that of sign languages is primarily spatial.
Non-Human Communication
It is often stated that one of the great differences between humans and animals is the ability of humans to use language. This has frequently been challenged, particularly in studies of the use of language by primates. These studies have generally followed one of two paths: the use of language by primates in the wild and attempts to teach some form of language (not necessarily spoken) to primates in captivity.
Wild primates use a variety of methods of communication. Many use scents to mark their territory and they use touch to indicate relationships: mothers carry their young and adults may sit and/or sleep together or groom each other. The higher primates look at whatever they are paying attention to. Important visual cues include facial expression, hair erection, general posture, and tail position.
Primates use vocal communication, from soft grunts to whoops, when they want to attract the attention of others. Sounds may be used to signal danger of an attack or the location of a food source. The meaning of primate communication depends on the social and environmental context as well as the particular signals being used. Most animals use a fixed set of signals to represent messages, which are important to their survival (food here, predator nearby, approach, withdraw etc.).
Vervet monkeys have the most sophisticated animal communication that we know of. The sounds they use are learned, rather than instinctive. They have a variety of calls for different predators: a loud bark for leopards, a short cough for eagles and a chatter for snakes. They also use one type of grunt to communicate with dominant members of their own group, another to communicate with subordinate members and a third type to communicate with members of other groups. They are even capable of lying! A vervet that is losing a fight may make the leopard alarm, causing the whole group to run for the trees and forget the fight.
There have been numerous attempts to teach some kind of language to primates. Researchers argue that projects of this nature can provide valuable information, not only about the nature of language and cognitive and intellectual capacities, but also about such issues as the uniqueness of human language and thought. Such projects also shed light on the early development of language in humans. Another reason for teaching language to primates is the hope of discovering better methods for training children with learning difficulties who fail to develop linguistic skills during their early years.
Allen and Beatrice Gardner began teaching American Sign Language to an infant chimpanzee named Washoe in 1966. They provided a friendly environment that they believed would be conducive to learning. The people who looked after Washoe used only sign language in her presence. She was able to transfer her signs spontaneously to a new situation, e.g. she used the word ‘more’ in a variety of contexts, not just for more tickling, which was the first context.
The Gardners reported that Washoe began to use combinations of signs spontaneously after learning only about eight or ten of them. At one stage Washoe ‘adopted’ an infant chimpanzee named Loulis. For the next five years, no sign language was used by humans in Loulis' presence; however, Loulis still managed to learn over 50 signs from the other chimpanzees.
The year after Project Washoe began, David and Ann Premack started an experiment with a different kind of language. They used plastic tokens, which represented words and varied in shape, size, texture, and colour to train a chimpanzee named Sarah. Sentences were formed by placing the tokens in a line. Sarah was taught nouns, verbs, adjectives, pronouns, quantifiers, same-difference, negation, and compound sentences. To show that she was not simply responding to cues from her trainers, she was introduced to a new trainer who didn’t know her language. When this trainer presented her with questions, she gave the correct answers less frequently than usual, but still well above chance.
A chimpanzee named Lana learned to use another language system, a keyboard with keys for various lexigrams, each representing one word. When Lana pressed a key with a lexigram on it, the key would light up and the lexigram would appear on a projector. If keys were pressed accidentally, Lana used the period key as an eraser so that she could restart the sentence - she did this on her own before it occurred to the researchers.
Lana started using ‘no’ as a protest (e.g. when someone else was drinking a Coke and she did not have one) after having learned it as a negation. Lana acquired many skills which showed her ability to abstract and generalise, e.g. she spontaneously used ‘this’ to refer to things for which she had no name, and she invented names for things by combining lexigrams in novel ways.
However, many linguists, including the highly influential Noam Chomsky, argue that language is a uniquely human gift. According to this school, chimpanzees and other close relatives cannot use language because they lack the human brain structures that make language work. Chomsky argues that trying to teach language to a chimpanzee is a bit like teaching a human being to fly. An athlete may be able to jump 20 feet, but it’s a crude imitation of flying.
Programming Languages
Programming languages have a number of features in common with natural languages, but there are also significant differences. Programming languages have a lexicon (or vocabulary) and rules governing how sentences in the languages are constructed. Most languages allow two different kinds of words, usually referred to as keywords and identifiers. There are a fixed number of keywords, e.g. begin, end, do, while etc. and these have a fixed function. There are an infinite number of identifiers. These are usually associated with a fixed function at the time of declaration, e.g. procedure name, variable name etc. In general, computer programmers have far more ability to generate new words than the speakers of a natural language, although their new words are often influenced by natural languages, e.g. CustName, TotPrice etc.
The syntax (grammatical rules) of modern programming languages can be rigorously defined and can often be expressed in a formal notation such as BNF (Backus-Naur Form) or syntax diagrams. Unfortunately, it's not quite as easy to describe the semantics (meaning) of a programming language in a formal manner, and this is normally still done by means of an English description. However, it's still considerably more rigorous than for a natural language.
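As a rough illustration, the toy grammar and parser below are invented for this example (echoing the begin/end/do/while keywords and the CustName-style identifiers above): a two-rule BNF-style grammar, a tokenizer that separates the fixed keyword set from the open-ended identifier space, and a matching recursive-descent parser.

import re

# An invented toy grammar, written in BNF style:
#   <expr> ::= <term> { ("+" | "-") <term> }
#   <term> ::= NUMBER | IDENTIFIER
KEYWORDS = {'begin', 'end', 'do', 'while'}     # fixed, closed set
TOKEN = re.compile(r'\s*(?:(\d+)|([A-Za-z_]\w*)|(.))')

def tokenize(src):
    tokens = []
    for num, word, op in TOKEN.findall(src):
        if num:
            tokens.append(('NUMBER', int(num)))
        elif word:
            # Keywords are a fixed set; identifiers are open-ended.
            tokens.append(('KEYWORD' if word in KEYWORDS else 'IDENT', word))
        else:
            tokens.append(('OP', op))
    return tokens

def parse_term(tokens, i):
    kind, value = tokens[i]
    if kind in ('NUMBER', 'IDENT'):
        return value, i + 1
    raise SyntaxError(f'unexpected token: {value!r}')

def parse_expr(tokens, i=0):
    # <expr> ::= <term> { ("+" | "-") <term> }
    node, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i][0] == 'OP' and tokens[i][1] in '+-':
        op = tokens[i][1]
        rhs, i = parse_term(tokens, i + 1)
        node = (op, node, rhs)
    return node, i

print(tokenize('while TotPrice'))               # keyword vs identifier
print(parse_expr(tokenize('CustName + 3 - x'))[0])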
Formal Logic
One important area of NLP is the study of the semantics or meaning of natural language statements. In many cases, the most important aspect of semantics is determining whether a sentence is true or false. We can simplify this task by defining a formal language with simple semantics and mapping natural language sentences on to it.
This formal language should be unambiguous, have simple rules of interpretation and inference and have a logical structure determined by the form of the sentence. Two commonly used formal languages are propositional logic and predicate logic. Formal Logic is covered in greater detail in section 4.1.4.
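A minimal sketch of this idea, with an invented mapping: natural-language sentences become propositional symbols, formulas are nested tuples, and truth is computed against a model (an assignment of truth values to the symbols).

# A formula is a nested tuple: ('and', p, q), ('or', p, q), ('not', p),
# ('implies', p, q), or a bare symbol string.

def holds(formula, model):
    # Evaluate a propositional formula under a truth assignment.
    if isinstance(formula, str):               # atomic proposition
        return model[formula]
    op, *args = formula
    if op == 'not':
        return not holds(args[0], model)
    if op == 'and':
        return holds(args[0], model) and holds(args[1], model)
    if op == 'or':
        return holds(args[0], model) or holds(args[1], model)
    if op == 'implies':
        return (not holds(args[0], model)) or holds(args[1], model)
    raise ValueError(f'unknown operator: {op}')

# 'If it is raining then the ground is wet' -> (implies Raining GroundWet)
sentence = ('implies', 'Raining', 'GroundWet')
print(holds(sentence, {'Raining': True, 'GroundWet': True}))    # True
print(holds(sentence, {'Raining': True, 'GroundWet': False}))   # False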
1.1.4 Natural Language Modalities (Speech and Text)
Natural language occurs in two distinct forms, text and speech. Although these can be considered as two different ways of expressing the same information, there are important distinctions. Speech is usually less formal than text, but it can convey important additional information by means of volume, tone of voice etc. that are absent in text. It can also be more confusing as a result of accent, mispronunciation etc.
NLP has traditionally focused on text with Speech Recognition and Speech Generation being regarded as relatively disparate fields. However, in recent years there has been a degree of convergence as researchers have realised that a knowledge of language structure can assist in recognition or generation. We’ll look briefly at these fields.
Speech Recognition
Speech recognition is the process by which a computer converts an acoustic speech signal to text. It should be distinguished from speech understanding, the process by which a computer converts an acoustic speech signal to some form of abstract meaning.
Speech recognition systems can be speaker-dependent or speaker-independent. A speaker-dependent system is designed to operate for a single speaker. These systems are usually easier to develop, cheaper to buy and more accurate, but not as flexible as speaker-adaptive or speaker-independent systems.
A speaker-independent system is designed to operate for any speaker of a particular language. These systems are the most difficult to develop and the most expensive, and their accuracy is lower than that of speaker-dependent systems. However, they are more flexible.
A speaker-adaptive system is designed to adapt its operation to the characteristics of new speakers. Its difficulty lies somewhere between that of speaker-independent and speaker-dependent systems.
The size of the vocabulary of a speech recognition system affects its complexity, processing requirements and accuracy. Some applications only require a few words (e.g. numbers only), others require very large dictionaries (e.g. dictation machines).
An isolated-word system operates on one word at a time, requiring a pause between words. This is the simplest form of recognition to perform, because the end points of each word are easier to find and the pronunciation of one word tends not to affect the others. Because the occurrences of words are more consistent, they are easier to recognise.
A continuous-speech system operates on speech in which words are not separated by pauses. Continuous speech is more difficult to handle for a variety of reasons. It is difficult to find the start and end points of words. Another problem is coarticulation: the production of each phoneme is affected by the production of surrounding phonemes, and similarly the start and end of each word are affected by the preceding and following words. Recognition of continuous speech is also affected by the rate of speech; rapid speech is harder to recognise.
Speech recognition starts with the digital sampling of speech, followed by acoustic signal processing. The next stage is recognition of phonemes, groups of phonemes and words. Most systems utilise some knowledge of the language to aid the recognition process. Some systems try to ‘understand’ speech, i.e. they try to convert the words into a representation of what the speaker intended to mean or achieve.
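The sketch below illustrates only the front end of this pipeline; everything in it is invented for the example (the sampling rate, frame size, synthetic 'phoneme' templates and nearest-template matching). Real recognisers use far richer features and statistical models.

import numpy as np

SAMPLE_RATE = 8000                      # assumed sampling rate (Hz)
FRAME = 256                             # samples per analysis frame

def spectral_frames(signal):
    # Digital sampling -> framing -> magnitude spectrum per frame.
    n_frames = len(signal) // FRAME
    frames = signal[: n_frames * FRAME].reshape(n_frames, FRAME)
    window = np.hanning(FRAME)          # reduce spectral leakage at frame edges
    return np.abs(np.fft.rfft(frames * window, axis=1))

def nearest_template(frame_spectrum, templates):
    # Toy 'phoneme recognition': pick the closest stored spectrum.
    names = list(templates)
    dists = [np.linalg.norm(frame_spectrum - templates[n]) for n in names]
    return names[int(np.argmin(dists))]

# Invented example: two synthetic 'phoneme' templates and a test tone.
t = np.arange(FRAME) / SAMPLE_RATE
templates = {
    'ah': np.abs(np.fft.rfft(np.sin(2 * np.pi * 440 * t) * np.hanning(FRAME))),
    'ee': np.abs(np.fft.rfft(np.sin(2 * np.pi * 880 * t) * np.hanning(FRAME))),
}
test = np.sin(2 * np.pi * 440 * np.arange(4 * FRAME) / SAMPLE_RATE)
for spectrum in spectral_frames(test):
    print(nearest_template(spectrum, templates))   # prints 'ah' per frame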
Speech Synthesis
Speech synthesis programs convert written input to spoken output by automatically generating synthetic speech. Speech synthesis is often referred to as ‘Text-to-Speech’ (TTS) conversion. There are several approaches. The simplest is to record the voice of a person speaking the desired phrases. This is useful if only a restricted set of phrases and sentences is needed, e.g. announcements in a train station, or schedule information by phone. The quality depends on how the recording is done.
More sophisticated, but poorer in quality, are algorithms that split the speech into smaller pieces. The smaller the units, the fewer of them are needed, but the quality also decreases. One frequently used unit is the phoneme, the smallest linguistic element. Depending on the language, there are about 35-50 phonemes in western European languages, i.e. only 35-50 individual recordings are needed. The problem is combining them, as fluent speech requires smooth transitions between the elements. The intelligibility is therefore lower, but the memory required is small.
One solution to this dilemma is the use of diphones. Instead of splitting at the transitions, the cut is made at the centre of each phoneme, leaving the transitions themselves intact. For 35-50 phonemes this gives at most the square of the phoneme count, roughly 1,200-2,500 elements, and the quality increases.
The longer the units become, the more elements there are, and the quality increases along with the memory required. Other units that are widely used are half-syllables, syllables, words, or combinations of them, e.g. word stems and inflectional endings.
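To make the size/quality trade-off concrete, the short sketch below works through the arithmetic; the phoneme counts come from the text above, while the storage figures (0.1 s per unit at 8 kHz, 16-bit mono) are assumed purely for illustration.

# Inventory size vs. unit length (phoneme counts per the text above).
for n_phonemes in (35, 50):
    n_diphones = n_phonemes ** 2        # upper bound: every pair of phonemes
    print(f'{n_phonemes} phonemes -> at most {n_diphones} diphones')

# Rough storage estimate, with assumed figures: 0.1 s per unit at
# 8 kHz, 16-bit mono = 1600 bytes per unit.
bytes_per_unit = int(0.1 * 8000 * 2)
for units in (40, 1600, 10000):         # phonemes, diphones, half-syllables
    print(f'{units:>6} units -> ~{units * bytes_per_unit / 1024:.0f} KiB')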