In which the author describes a long-running project on engineering creative headline generation using a range of natural language processing libraries and draws analogies to a Herzogian tale of obsession.
*Sample title generated by Pundit on a draft of an academic paper on this topic.
The idea of the epic or quest is as old as history itself, depicted at length from Classical Antiquity through the tales of the Crusades and the Norse sagas, right up to Melville’s whaling epic, the novels of Thomas Pynchon and the cinema of Werner Herzog. Some people (generally, but not exclusively, male) attach themselves to chasing insane dreams which may come to nothing. “Think big, young man,” someone once said, and they never stopped believing.
The 1982 motion picture Fitzcarraldo tells such a tale: that of an Irish(?) rubber baron, portrayed in typically manic fashion by Herzog muse Klaus Kinski, who hauls a steamship over a hill in the Peruvian Amazon.
Stranger than fiction is the parallel insanity of making the film itself, on location in the Amazon: Herzog’s mad dream and his quarrels with the local people drew comparisons between the German director and the real-life Peruvian Carlos Fitzcarrald, on whom the film was based. It is not a million miles from the grand designs of modern titans of industry such as Howard Hughes and Elon Musk, with their projects involving titanic aircraft and interplanetary transportation.
However, my own Amazonian steamboat, White Whale or Mars One (a.k.a. the folly quest that just won’t die) is the Pundit system.
Initially conceived as a side project during a particularly fallow period of my Ph.D. years, it was a system which allowed the user to input phrases and receive a number of nonsensical puns, as seen below:
- Harry Potter and the Half Blood Prince -> Hairy Potter and the Half Cut Prince
- The Motorcycle Diaries -> The Motorcycle Dowries
In the years since, it has morphed into something beyond a mere distraction, incorporating context and a barrage of natural language techniques to simulate linguistic creativity, on topic and on tap. In the meantime, a pun generator appeared online, using Wikipedia as a data source and providing similar functionality in a dynamic fashion. The only missing feature was context.
The Pun also Rises
Long derided as the lowest form of wit, or as mere dad jokes, puns still retain favour among news outlets such as The Economist, which has eked plenty of headline value out of puns on the Chinese economy (“Yuan small step for man…”). However, coming up with puns which are both creative and relevant to an article is by no means a cognitively cheap endeavour. Given my interest in natural language processing, I began to think about how such a process could be engineered, or at the very least augmented. Imagine a web service that suggested relevant, on-topic creative titles as you composed a blog post or news article. A Clippy for puns?
It started with a spark
As an example of the process, take the article title (Love Me Tinder), which has been common of late in articles discussing a certain swipe-happy dating application. The phrase is almost over-used to the point of cliché: I counted a handful of articles and even a short film with this title at the time of writing. As the zeitgeist has taken a shine to this particular phrase, it should be a prime candidate for deconstructing the mechanics. The very name of the dating app Tinder has connotations of matches, sparks and other such metaphors used for coupling and finding love. Such an article may also contain mentions of terms such as love, sex, dating, marriage, apps, technology, iPhones and other related terminology.
Imagine for a moment that we have access to a large database of stock phrases, book titles and cultural memes (such a database existed and was used in the early version of this system, although it has recently been acquired and will be taken offline). We could query this database for phrases that contain our keywords, or combinations of same (Game Set Match, Match Point). As an added constraint, we also search for phrases that contain one of our keywords and a term that sounds like another keyword. Tinder sounds sort of like tender and cinder, and our search now returns a number of results containing tender.
- Tender is the Night – (1934) F. Scott Fitzgerald novel, (1999) Blur song
- Try A Little Tenderness (1966) – Otis Redding song
- Love me Tender (1956) – Elvis Presley song and film
Once we’ve obtained a list of possible candidates, now the fun begins!
We need to rank these according to some combination of metrics so that the most appropriate rise to the top.
In this case, it seems pretty clear why Love me Tinder is the ideal (albeit overused) title. It encapsulates the name of the product (sort of), a related topical concept (love) and, to top it all, it is the title of not one but two creative works featuring one of the most beloved recording artists of all time. I’m still partial to Try a Little Tinderness (1,080 Google hits) or even Tinder is the Night (18,000 Google hits), although the internet and the publishing industry appear to think differently (41,000 Google hits for Love me Tinder).
Now we’ve seen the process in action, can we automate it computationally?
We can do wordplay for you wholesale
Going back to our article, our first goal is to extract topics. The very definition of a topic is vague, but generally we want to group keywords and phrases by semantic relatedness. In the example above, we can imagine a dating topic (love, date, romance, marriage, couple, heart) and a technology topic (app, technology, internet, web, data, swipe…).
There are numerous algorithms which attempt to extract topics from text. Among the most widely used is the family of topic-modelling approaches, the most popular of which are LDA (Latent Dirichlet Allocation) and NMF (non-negative matrix factorization).
There are issues here, however: these algorithms may require a large corpus of text, which is fine for longform articles but not necessarily feasible for shorter opinion pieces or breaking news. So, in the spirit of the project, I implemented my own approach.
The first part, which post hoc appears similar to the RAKE algorithm, looks like this:
- Remove stopwords based on a standard list
- Keep any tokens that appear at least twice in the text
- From this list, keep only nouns and verbs.
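The extraction steps above can be sketched as follows. The stop-word list and the `POS` lookup table here are toy stand-ins for a proper list and a real POS tagger (e.g. NLTK’s); this is a sketch of the idea, not the system’s code:

```python
import re
from collections import Counter

# Toy stand-ins: a real system would use a standard stop-word list
# and a trained POS tagger instead of this tiny hand-built lexicon.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on"}
POS = {"swipe": "VB", "app": "NN", "date": "NN", "match": "NN", "love": "NN"}

def candidate_keywords(text, pos_lookup=POS):
    tokens = re.findall(r"[a-z]+", text.lower())
    # Remove stopwords, then keep tokens appearing at least twice
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    repeated = [t for t, n in counts.items() if n >= 2]
    # From this list, keep only nouns and verbs
    return [t for t in repeated if pos_lookup.get(t, "").startswith(("NN", "VB"))]

text = ("Swipe the app to find a date. "
        "The app makes every date a match, and love of the swipe.")
print(candidate_keywords(text))  # → ['swipe', 'app', 'date']
```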
Video, Audio, DISCO
With the resulting list of n keywords, I computed the semantic similarity for each keyword with all of the others using an external library called DISCO. This tool allows comparison of words by semantic similarity similar to word2vec.
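In sketch form this looks like the snippet below, with invented word vectors standing in for DISCO’s distributional data (the real system queries DISCO directly; the vector values here are made up purely for illustration):

```python
import numpy as np

# Hypothetical word vectors standing in for DISCO's distributional data.
VECTORS = {
    "love":  np.array([0.9, 0.1, 0.0]),
    "date":  np.array([0.8, 0.2, 0.1]),
    "app":   np.array([0.1, 0.9, 0.3]),
    "swipe": np.array([0.2, 0.8, 0.4]),
}

def similarity_matrix(words):
    """Pairwise cosine similarity between keyword vectors."""
    M = np.stack([VECTORS[w] for w in words])
    M = M / np.linalg.norm(M, axis=1, keepdims=True)  # normalise rows
    return M @ M.T

words = ["love", "date", "app", "swipe"]
S = similarity_matrix(words)  # S[i, j] = similarity of words[i] and words[j]
```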
(K)-Means to an end
This matrix of word similarities is converted to a Euclidean distance matrix, and then k-means clustering is carried out to group the tokens into topics. As with any unsupervised clustering, the trick here was to choose a good number of clusters: not too many, not too few. I settled on five topic clusters as a rough rule of thumb, although there are existing methods (such as the elbow method or silhouette scores) to determine the optimal number of clusters if the additional processing time is available. One edge case was added: if the number of keywords is less than ten, only three clusters are computed.
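A minimal sketch of this stage. The naive k-means below, with deterministic first-k initialisation, stands in for a production clustering routine; the toy distance matrix is invented to show two obvious keyword groups:

```python
import numpy as np

def choose_k(n_keywords):
    """Rule of thumb from the text: five clusters, or three for short lists."""
    return 3 if n_keywords < 10 else 5

def kmeans(X, k, iters=50):
    """Naive k-means over the rows of X (here, rows of the distance matrix).
    First-k initialisation keeps the sketch deterministic; a real
    implementation would use k-means++ or similar."""
    centers = X[:k].astype(float).copy()
    for _ in range(iters):
        # squared Euclidean distance of every row to every center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy distance matrix with two obvious groups of keywords.
D = np.array([[0.0, 0.1, 5.0, 5.0],
              [0.1, 0.0, 5.0, 5.0],
              [5.0, 5.0, 0.0, 0.1],
              [5.0, 5.0, 0.1, 0.0]])
labels = kmeans(D, 2)  # rows 0-1 and rows 2-3 end up in separate clusters
```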
Hanging on the Metaphone
Once we have a set of topic clusters, the next step involves augmentation of these. There are two processes for augmentation, semantic and phonetic.
The first step leverages the DISCO API to obtain the five most similar terms for each term in the topic cluster; any multi-word terms returned are discarded. Another possible approach here would be to use data sources such as ConceptNet or WordNet to find synonyms and semantically related terms.
The phonetic step is where the pun mechanism comes in: the Metaphone library is used to return the five most similar-sounding terms for each of the topic terms. I’d like to think its inventor, Lawrence Philips, was aware of the deviant use cases for his orthographic matching technology, described in a 1990 paper whose title riffs on a New Wave classic made famous by Blondie.
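As a toy illustration of the bucketing idea: words are reduced to a phonetic key, and vocabulary terms sharing a key with a topic word become pun candidates. The `phonetic_key` function below is a far cruder reduction than real Metaphone, purely to make the mechanism visible:

```python
import re

def phonetic_key(word):
    """Crude stand-in for Metaphone: keep the first letter, drop later
    vowels, collapse runs of repeated letters."""
    w = word.lower()
    key = w[0] + re.sub(r"[aeiouy]", "", w[1:])
    return re.sub(r"(.)\1+", r"\1", key).upper()

def sound_alikes(term, vocabulary):
    """Vocabulary terms sharing a phonetic key with `term`."""
    key = phonetic_key(term)
    return [v for v in vocabulary if v != term and phonetic_key(v) == key]

vocab = ["tender", "tinder", "thunder", "cinder", "tandoor"]
print(sound_alikes("tinder", vocab))  # → ['tender', 'tandoor']
```

Note the false positive tandoor: real Metaphone is considerably more discriminating, although a little phonetic noise is arguably a feature when hunting for puns.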
Memes to an end
As we saw above, tinder sounds like tender, cinder and possibly other more obscure terms, and these are added to our topic list. For each topic cluster pair, we try to find stock phrases, titles and memes that contain a term from each. However, there is of course the possibility that the vast majority of juxtapositions are unlikely or ungrammatical, so we employ a filtering step.
For each word pair, we query a language resource to see whether the two words co-occur within any five-word sequence in an existing corpus. If they do, they are probably a sensible match. The original point of this step was to reduce the query load on the Freebase database, so in theory it can be skipped if running locally, although it cuts down the number of possible query pairs in a neat way. The only downside is the risk of missing a creative pairing that happens to be rare in your language model. I used the COCA 5-grams set, but the Google Books Ngrams corpus may be a wiser choice, depending on the storage space available.
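The co-occurrence filter can be sketched like so, with a handful of invented 5-grams standing in for the COCA set:

```python
from itertools import combinations

# A handful of invented 5-grams standing in for the COCA n-gram set.
FIVE_GRAMS = [
    ("love", "me", "tender", "love", "me"),
    ("a", "box", "of", "damp", "tinder"),
    ("try", "a", "little", "tenderness", "tonight"),
]

def cooccurring_pairs(words, ngrams=FIVE_GRAMS):
    """Keep only word pairs that appear together in at least one 5-gram."""
    return [(a, b) for a, b in combinations(words, 2)
            if any(a in g and b in g for g in ngrams)]

print(cooccurring_pairs(["love", "tender", "tinder"]))  # → [('love', 'tender')]
```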
Once a list of possible pairs is obtained, we query it against our source corpus. The Freebase database was used in the prototype system; textual queries returned book titles, song titles, names of musical groups and titles of film and TV programmes.
There are a number of possible matches given our two query terms, e.g. tinder and love. Pun matches are restricted to content words only, e.g. nouns or verbs, although POS taggers don’t work so well on short sentences and titles (which can actually benefit creativity).
1. Phrase matches tinder: none
2. Phrase matches love: Love Hurts, Love Bites, Love in the Time of Cholera…
3. Phrase contains tinder and love: none
4. Phrase contains Pun(tinder): Tender is the Night, Tenderness, The Tender Trap
5. Phrase contains Pun(love): Hand in Glove, Maple Leaf Rag
6. Phrase contains Pun(tinder) + love: Love Me Tender
7. Phrase contains Pun(love) + tinder: none
8. Phrase contains Pun(tinder) + Pun(love): Tender Glove, Tender Leaf
Given these combinations, and depending on the size of our corpus, we may obtain hundreds of plausible results. In this case, I decided to focus only on combinations 3, 6, 7 and 8.
Rank and File (IO)
Even with the added restrictions, the system may return a high number of results. Accurate filtering is required to ensure the system is usable in a real application:
- Remove duplicate titles (duplicate counts could also be kept and used in sorting)
- Remove over-long and one-word titles (a length of between 2 and 6 tokens was used in the experiments)
- Remove subtitles and bracketed text
- Remove non-English titles (more frequent than you might think)
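A sketch of these four filters chained together; the ASCII test is a crude stand-in for a real language-identification step:

```python
import re

def filter_titles(titles, is_english=str.isascii):
    """Dedupe, strip subtitles/brackets, enforce 2-6 tokens, drop
    (crudely detected) non-English titles."""
    seen, out = set(), []
    for t in titles:
        t = re.sub(r"\s*[\(\[].*?[\)\]]", "", t)  # strip bracketed text
        t = t.split(":")[0].strip()               # strip colon subtitles
        n = len(t.split())
        if t.lower() in seen or not 2 <= n <= 6 or not is_english(t):
            continue
        seen.add(t.lower())
        out.append(t)
    return out

titles = ["Love Me Tender", "Love Me Tender (1956)", "Tender: A Memoir",
          "Tenderness", "Zärtliche Liebe", "Tender Is the Night"]
print(filter_titles(titles))  # → ['Love Me Tender', 'Tender Is the Night']
```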
Once the filtering is done, the remaining output must be ranked.
Currently, ranking is done based on:
- Length : rank shorter phrases higher
- Edit distance between output and source phrase : ranks heavily altered puns lower
- Semantic distance between topics and pun : Attempt to rank based on similarity with original topic
- Corpus frequency of original keywords : Are these commonly occurring terms?
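A toy version of the ranking, combining just the first two signals; the weights and the two-term scoring form are illustrative, not the system’s actual formula:

```python
def edit_distance(a, b):
    """Plain Levenshtein distance via the classic two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def score(candidate, source_phrase, w_len=1.0, w_edit=0.5):
    """Toy ranking score: shorter phrases and lighter edits score higher."""
    return (-w_len * len(candidate.split())
            - w_edit * edit_distance(candidate, source_phrase))

pairs = [("Love Me Tinder", "Love Me Tender"),
         ("Tinder is the Night", "Tender is the Night"),
         ("Try a Little Tinderness", "Try a Little Tenderness")]
ranked = sorted(pairs, key=lambda p: score(*p), reverse=True)
print(ranked[0][0])  # → Love Me Tinder
```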
The ranking is not an exact art, and several other methods could be useful, for example:
- Sentiment analysis: Is there a discrepancy between the tone of the title and original article?
- Additional terms occurrence: Do the other non-topic terms in the phrase occur in the original article or not?
- Grammaticality and/or euphony: Does the phrase flow, is there an even or odd number of terms, does it rhyme, (If the glove don’t fit, you must acquit)
A trial evaluation was carried out, where the system was fed ten articles and generated a number of titles for each. These titles were ranked using the initial four criteria, and a number of candidates were presented to the user.
100 titles in total were presented for ranking, and seven non-expert users were asked to give values between one and five for:
- Grammaticality: How well does the title read?
- Relevance: Does it correspond to the article?
- Appropriateness: Could it be construed as offensive to print this?
- Creativity: Can this title be classed as creative?
The main lesson learned from this evaluation is that reading ten articles is a bit too time-intensive for willing volunteers. Future evaluations will involve less demanding reading tasks, perhaps a paragraph summary, to allow users to evaluate titles. Other critiques bemoaned the lack of a baseline headline for comparison; the actual article title could be used here to compare user preferences.
Below is a list of the original ten articles, side by side with generated titles which were preferred and disliked. Links are given to the original creative title which spawned them.
| Link | Ranked above average | Ranked below average | Original title/topic |
|------|----------------------|----------------------|----------------------|
| 1 | Obama of the People, Other People’s Good News | People, News and Views, People in the News | On Twitter, a number of high-profile users dominate the conversation |
| 2 | Cancer of the Country | Age of Cancer, Country Blues, Number One | British have lower rates of cancer, less likely to survive than Americans |
| 3 | Shellshock, Day Late, Dollar Short, Another Day, Another Dollar | Two Dollar Day, The Dollar-a-year Man | Greece’s weak debt-ridden, jobless future |
| 4 | Mother Misery’s Favourite Child, Father Music, Mother Dance, Pre-Paid Child Support, Some Mother’s Son | Sweet Child O’ Mine, Sleeping Beauty Overture | The divorce divide: How the US legal system screws poor parents |
| 5 | none | Fear, Anxiety and Depression, Rip Than Thing | Tech has a depression problem |
| 6 | The Peanut Butter Solution, Pay Beyond the Bill, Helping People Help People | Wandering Child Come Home, The Peanut Butter Genocide | A US cafe’s peanut butter sandwich charity campaign |
| 7 | The Food Fist Way, Food Time for Change, Soul Food | Song Farm Lacy’s Kitchen, When Something is Food | The Norwegian women making a song and dance about farming |
| 8 | Glitch in the System, January, February | Video Computer System | The numbers that lead to disaster |
| 9 | Wisdom of Life, High Rise Low Life, Hi-Fi Low-Life | Modern Life, Changing People, High on Life, Where Low Life Grows | The downsides of being clever |
| 10 | Half Man Half Woman, World of a Woman, No Man’s Woman | Wicked Woman, Foolish Man, Woman Beat Their Man, Thirteen Woman | What Norway can teach the US about getting more women into boardrooms |
It turns out the principles of software engineering are actually pretty useful. When you have a bunch of vaguely related linguistic resources, it makes sense to pre-compute similarities, load semantic matrices into memory and organize efficient data structures for Metaphone search. Otherwise, the system can take hours to generate headlines for a single article.
Ideas for the future include an interactive web interface allowing users to manually enter URLs and/or topic lists for the system to operate on, and a trace feature explaining each step of the process, which may help users understand the building blocks of the creative process.
Of course, with any research topic, once one digs deeper a wealth of associated research is found.
Carlo Strapparava and his team at FBK Trento have been working on computational linguistic creativity for decades, automating advertising slogan creation, company-name brainstorming and even the (excellently titled) EU project HAHAcronym. The latter gifted the world the popular WordNet-Affect resource, a valuable side product of a grant whose initial goal was to create humorous acronyms à la Central Ineptitude Agency or the Fantastic Bureau of Intimidation, and a veritable poster child for the benefits of “basic research”.
The New Yorker magazine, in conjunction with Microsoft, is carrying out trials of AI to evaluate entries to its famous cartoon caption contest.
A group of Finnish researchers recently developed a system that creates semantically vacuous yet plausible raps from a database of existing rap lyrics.
Deep learning can be used to train chat-bots on the sum total of knowledge from the Golden Age of Hollywood, so they can answer Big Questions with a cynical slant.
Frankly my dear, I don’t give a damn
Maybe the idea wasn’t so crazy after all.