Question
Hi, I've been trying to write a single regular expression that extracts only the description text from the following 3 description tags, but I have found the tags to be slightly different from each other. I'm hoping someone can provide one expression that works for all 3 below. (Please note that 1 of the description tags seems to have no description listed; I would still want my regular expression to recognise it and simply return no words.) For example, for the first one I would only want "The Los Angeles City Council agreed Wednesday to pay $1.9 million to the family of a man who was shot to death by police officers after he had stabbed himself in the abdomen during an apparent suicide attempt" extracted. Thanks
<description><![CDATA[<br /><img src="http://media.nbclosangeles.com/images/213*120/luis+molina+martinez.JPG" align="left" hspace="5" /><br />The Los Angeles City Council agreed Wednesday to pay $1.9 million to the family of a man who was shot to death by police officers after he had stabbed himself in the abdomen during an apparent suicide attempt.<br/><br/>Photo Credit: Martinez Family]]></description>
<description><![CDATA[<br /><img src="http://media.nbclosangeles.com/images/262*120/171011-CandyFactoryEvictionVillageValley.JPG" align="left" hspace="5" /><br />The famed Candy Factory on Magnolia Boulevard in Valley Village has been forced to close up shop and move to a cheaper location due to rising retail prices.]]></description> </item>
<photo:thumbnail>http://media.nbclosangeles.com/images/231*120/10-10-2017-sky-fire-anaheim-hills-3.jpg</photo:thumbnail> </media:content> <description><![CDATA[<br /><img src="http://media.nbclosangeles.com/images/231*120/10-10-2017-sky-fire-anaheim-hills-3.jpg" align="left" hspace="5" /><br /><br/><br/>Photo Credit: KNBC-TV]]></description> </item> <item> <dc:creator><![CDATA[]]></dc:creator> <title><![CDATA[Photos: Memorable Dodger Moments From 2017]]></title> <link><![CDATA[http://www.nbclosangeles
Explanation / Answer
Since you need to find multiple matches, you can use /(expression)/g. The g (global) flag lets the expression match multiple times, remembering the index where the last match ended. Now, to match a description: the sentences are placed between two tags, so we need to match a string between a closing > and an opening <.
So far we have this expression: />.*</g
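To see what this first expression actually captures, here is a minimal sketch in Python. It uses the `re` module instead of the `/.../g` syntax (`re.findall` plays the role of the g flag), and the input line is a made-up example, not taken from the feed.

```python
import re

# Hypothetical one-line input with two tagged pieces of text.
line = "<b>One.</b><i>Two.</i>"

# Greedy .* runs from the first ">" to the last "<" on the line,
# so both pieces of text and the tags between them land in one match.
matches = re.findall(r">.*<", line)
print(matches)  # ['>One.</b><i>Two.<']
```

The single long match shows why the expression needs to be narrowed down in the next steps.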
Note that here .* matches any sequence of characters (other than a newline). Now, to match a sentence, we need it to start with a capital letter, i.e. a character in [A-Z], and end with "."; in between we can have any characters. Since . is a special character, we need to escape it with a backslash: \.
So we get ([A-Z].*\.) for matching a sentence. Now we want to stop at the first "." instead of running on to the last ".". For this we add a "?" after the * to make the search non-greedy (lazy), i.e. it stops at the first possible match. So we get:
([A-Z].*?\.)
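The difference between the greedy and the lazy form can be checked with a small Python sketch. The sample text is an assumption chosen for illustration, and Python's `re` module stands in for the `/.../` syntax used here.

```python
import re

# Hypothetical text containing three sentences.
text = "First sentence. Second sentence. Third."

# Greedy: .* grabs as much as possible, so the match ends at the LAST period.
greedy = re.search(r"([A-Z].*\.)", text).group(1)

# Lazy: .*? grabs as little as possible, so the match ends at the FIRST period.
lazy = re.search(r"([A-Z].*?\.)", text).group(1)

print(greedy)  # First sentence. Second sentence. Third.
print(lazy)    # First sentence.
```

Note that the lazy form stops at any period, including one inside a number or a URL.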
After putting this into the earlier expression we get:
/>([A-Z].*?\.).*</g
Here, printing the value of $1 will give you your output, since we have grouped the sentence part together and left out the > and <.
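As a rough check, the final expression can be run against shortened stand-ins for the three tags. The sketch below is in Python, where `re.findall` returns what $1 would hold for each match. The sample strings, the image file names, and the second pattern (`layout_pattern`) are assumptions added for illustration and are not part of the answer above. On these samples, the decimal point in "$1.9" ends the lazy match early and the tag without a description produces no match at all, so a pattern anchored on the tag layout may be a more robust starting point.

```python
import re

# Shortened, hypothetical stand-ins shaped like the three tags in the question.
with_decimal = ('<description><![CDATA[<br /><img src="a.jpg" /><br />'
                'The council agreed to pay $1.9 million.<br/><br/>'
                'Photo Credit: Family]]></description>')
plain = ('<description><![CDATA[<br /><img src="b.jpg" /><br />'
         'The shop has been forced to close.]]></description>')
empty = ('<description><![CDATA[<br /><img src="c.jpg" /><br />'
         '<br/><br/>Photo Credit: KNBC-TV]]></description>')

# The expression from the answer; findall returns group 1 for each match.
answer_pattern = re.compile(r">([A-Z].*?\.).*<")

# Assumed alternative: rely on the <br /><img ... /><br /> layout seen in the
# samples, then capture everything up to the next "<" or "]".
layout_pattern = re.compile(
    r"<description><!\[CDATA\[<br\s*/?><img[^>]*><br\s*/?>([^<\]]*)"
)

print(answer_pattern.findall(plain))         # ['The shop has been forced to close.']
print(answer_pattern.findall(with_decimal))  # ['The council agreed to pay $1.']
print(answer_pattern.findall(empty))         # []

for sample in (with_decimal, plain, empty):
    # Prints the full sentence for the first two samples and '' for the third.
    print(repr(layout_pattern.search(sample).group(1)))
```

The layout-based pattern only holds as long as every description starts with the same `<br />`, `<img>`, `<br />` sequence. For feeds with more variation, an XML parser is usually a safer choice than a single regular expression.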