Thursday, March 27, 2008

Markov Chains

Last night my friend Eric challenged me to write a Markov chaining algorithm to create fictitious Italian last names. For those of you not familiar with it, Markov chaining is a simple process whereby you construct a word by looking at the previous 2 (or 3 or 4) letters and adding a new letter that is statistically likely to come afterward. You need a reference set of words, of course, so I stole one from Wikipedia. Here's the list of generated names. They have been tested to ensure that none of them coincides with a real name in the reference set.


abbatti
sabbiano
donardo
fortoni
fontano
posito
espossi
argenovese
spernabò
areschini
spintori
politali
tibalbi
gabrielli
bandretti
tremonte
roti
lombo
baiani
baglianchi
sola
giuliano
loggio
vita
molito
robbiani
gasparelliano
garo
rocchini
ungelista
morossetti
moscarlo
sartelli
giore
battei
bassini
barassano
devitaliparelli
evangenito
macci

cerugini
defendretti
ruso
manci
endretta
mini
mazzo
mily
matti
buonato
ventierrugia
codazzatti
pettista
silviattei
pelli
bocchiavoso
con
colo
cortolamini
costino
coppolini
annunziattei
antoro
anserminnella
flaminnellegri
aneri
anfredini
angenis
gentino
beroni
turielli
farro
famini
benis
pucciolombarbierrarini
belli
andido
sca
chiavonello
dano

bettista
fabriz
amoro
schio
eppola
testoro
davitabile
sclafano
ducciolombo
valcandriz
carlatto
fierrero
cantinola
napoletti
capinelli
vavone
finziatti
cavalcandro
sungaro
caccio
caetano
rizzi
campani
bianchiaparello
cali
ciprio
pacci
ale
lazzi
paladini
grini
picciolo
albernabò
panelli
bondini
bortino
ramacci
pinti
auriero
pisantoroso
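
For the curious, here is a minimal Python sketch of the process described at the top of this post. It is not the original Markov Chainer. The function names (build_model, generate, fictitious_names), the "^" and "$" padding markers, and the short reference list are all assumptions made for illustration; the real run used a list taken from Wikipedia.

```python
import random
from collections import defaultdict


def build_model(words, order=2):
    """Map each `order`-letter context to the list of letters seen after it."""
    model = defaultdict(list)
    for word in words:
        # "^" pads the start of the word, "$" marks its end (assumed markers).
        padded = "^" * order + word + "$"
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model


def generate(model, order=2, rng=random, max_len=30):
    """Build one name letter by letter until the end marker is drawn."""
    context, out = "^" * order, []
    while len(out) < max_len:
        # Letters that followed this context more often appear more often
        # in the list, so a uniform choice is weighted by frequency.
        nxt = rng.choice(model[context])
        if nxt == "$":
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out)


def fictitious_names(words, count, order=2, seed=None, max_tries=10000):
    """Return up to `count` generated names that are not in `words`."""
    rng = random.Random(seed)
    model = build_model(words, order)
    real = set(words)
    names = set()
    for _ in range(max_tries):
        if len(names) >= count:
            break
        name = generate(model, order, rng)
        if name and name not in real:  # drop anything that matches a real name
            names.add(name)
    return sorted(names)


# Hypothetical stand-in for the reference list (not the list used in the post).
reference = ["rossi", "russo", "ferrari", "esposito", "bianchi", "romano",
             "colombo", "ricci", "marino", "greco", "bruno", "gallo",
             "conti", "costa", "giordano", "mancini", "rizzo", "lombardi"]

print(fictitious_names(reference, 5, seed=1))
```

Raising order makes the output look more like the reference names but leaves less room for novelty. The filter only checks against the reference list, so a generated name can still turn out to be a real surname.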


Update: I ran my squelch newsflashes through the Markov Chainer. I got these:
"Bush would have to go on a group of unarmed shoppers Friday at a loss to retrieve the missing item."
"Though the 23rd meta extension of irony a level first reached by a Wisconsin machinist in June 2000 the fact that he just rambles a lot better said Kenyan orphan Mutheru Ubatto through an interpreter."
"Really what's a couple of pints of blood compared to a very difficult thing to do."
"While Greenberg's decision to purchase the single Kimberly Diaz a noted expert on irony explains because that song is still a pretty crappy song."
"Chameny blankly replied Will & Grace."
"Like many Americans who never considered African AIDS orphans to be a good topic joining the ideas of Chomsky and Searle in a bid for a diff'rent era."
"A needle was misplaced in a purely post modern constructivist framework."
"Said one Pentecostal Christian All we want is to get him to just answer your question without using knowledge only a dedicated grad student Josh Greenberg purchased the song after a child ..."

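Sentences like the ones above come out when the chain works on whole words instead of letters. The following is a rough Python sketch of that variant, assuming the chainer was simply pointed at words. The names build_word_model and babble and the sample text are made up for illustration and are not from the original code.

```python
import random
from collections import defaultdict


def build_word_model(text, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        model[tuple(words[i:i + order])].append(words[i + order])
    return model


def babble(model, length=25, seed=None):
    """Start from a random context and keep appending likely next words."""
    rng = random.Random(seed)
    # Sorting the keys keeps the starting point reproducible for a given seed.
    context = rng.choice(sorted(model))
    out = list(context)
    while len(out) < length:
        followers = model.get(context)
        if not followers:  # dead end: this context only occurs at the very end
            break
        out.append(rng.choice(followers))
        context = tuple(out[-len(context):])
    return " ".join(out)


# Hypothetical sample text; the text needs more than `order` words.
sample = "the cat sat on the mat and the cat ran off the mat"
print(babble(build_word_model(sample), length=12, seed=3))
```

Every run of three consecutive words in the output also appears somewhere in the source text, which is why the results read as almost grammatical while making no sense.
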
Update: Here's the code for the original Markov Chainer for names. If anyone sees any problems with it, let me know.

*snip*

I'm going to move the code to a different post.

17 comments:

Zack said...

Dibs on Cavalcandro

Anonymous said...

"Belli" is a real surname, and "Giuliano" is a real given name, so I'd be wary of using these for any serious purpose....

Tommaso Sciortino said...

I agree. I recognized many of these as real names even though they weren't in my original list of common names. I didn't mean for these to be used for serious purposes. I intended them for silly purposes only.

Mark said...

This is nothing new. Go to:
http://www.makewords.com/default.aspx
and select "Italian".

Tommaso Sciortino said...

Fair enough. I will no longer labor under the delusion that I was the inventor of Markov Chaining.

Pete Cockerell said...

LOL, nice riposte, Tommaso....

Derek said...

hmm, I didn't realize that's what it was called. I made a few word generators before that used markov chaining, but I never knew that's what it was called. I'll have to research that.

Anonymous said...

There's a (python) implementation of name generation in "Game Programming for Python".

Anonymous said...

Check out the Steampunk name generator at brassgoggles.co.uk sometime.
Yours truly,
Prince Leander Yardley

Haatem Reda said...

Sounds like you could run anything through the chainer and have hours of hilarity. Hours! The possibilities are endless! I want one!

Vogler said...

Could you automate the process of acquiring reference data, so that we can all sit around, thinking of fun things to try, without it being such a pain to get the data? That would be awesome. Please express your response as the product of its word scores modulo 1000000.

Also, I just played Thought Bubble again. I think the puns, the slide whistle sound, and the jiggliness are together enough to keep me entertained.

Suspect said...

Okay, I'm officially guilty of fizzbuzzization: I _rushed_ off to write my own after seeing this post. Here's what mine came up with:

Pelli
Pelliorin
Peni
Peno
Pentabri
Pernarretredi
Pernet
Pertonerardorvo
Petillo
Petrellucchi
Pettigararsanato
Picci
Pinte
Pintinello
Piraspaccinarruttierrugini
Pisandini
Pisanicuccio
Possini
Pucchi
Pucchianaleni
Quinarrellia
Rizzo
Robbato
Robbato
Rocchiavoso
Rocchio
Roce
Roso
Rostigliorerrazzon
Rotta
Ruffoneroccarro
Sabri
Sabus
Santino
Scario
Scarlo
Schiatoli
Scla
Sclatello
Silo
Soli
Spino
Ste
Strugiaparona
Tesori
Tozzoni
Trestoli
Ungenti
Ungentini
Vavalluschini
Vavo
Vecolinnarotilei
Ventornartelladi

For added hilarity, try running a Markov chainer on Alice in Wonderland, or Hamlet. Comedy GOLD, I tell ya ;)

~Angela~ said...

Wasn't Benis the last name of Elaine on Seinfeld?

ENTJ said...

Very nice, I love it. You can actually create your own parallel universe populated by Italian names. Funnily enough, that universe overlaps with our own (e.g. I know a couple of Rizzis). Good exercise, anyway.

Craig said...

A better test of the fictionality of imaginary surnames would be a simple Google search. If Google pulls up few or no results, chances are very good that your made-up names are in fact original (though that's not an absolute guarantee, since my mother's name is nowhere on Google except as a typographical error!).

Craig said...

For example, suppose I imagined the surname "Frangiovanni". When I search for it on a search engine and pull up results, I may only get one or two, and these may not even be in precisely the same form. Another one I like to use is "Scipalma", which sounds very southern Italian, I think. Try to come up with some imaginary names and see how often they come up in an internet search engine. That in itself should be enough to gauge a name's originality.

Anonymous said...

This is interesting - have you thought about generating Markov chain music? Like in here: http://thepasqualian.com/?p=1831