Saturday, February 28, 2009

"Knowing him as I do..." (Update post)

A short while ago, I published a post which regurgitated, in large measure, a private email exchange published (without permission) by John at Liberal Rapture. The exchange was between two individuals whose names were redacted from the published version. One correspondent was a Democratic party activist who had come to know Obama well, and who had many critical things to say about our current president. The other correspondent was a well-known writer who was and remains fervently pro-Obama.

Rather flippantly, I called the anti-O writer "Fred" and the pro-O writer "Ed." "Fred" offers some rather Suetonian "behind the scenes" views of our beloved leader, who emerges as a self-centered individual with plenty of ambition but no convictions or ideology.

In my earlier piece, I guessed that "Ed" (the die-hard admirer) might be Josh Marshall or Joe Conason. Those suspicions, I am told, were mistaken. I will venture no further guesses. I do not know who either person is, or even if the exchange was real.

One of my readers -- code-named "G" -- is versed in the ways of linguistic analysis. He insists that the same person wrote the words attributed to Fred and Ed. In other words, the exchange is -- in his view -- a forgery.

G and I agree that if the thing is a forgery, John at Liberal Rapture didn't do it.

So I wrote to John. I told him that I did not want him to divulge private info -- I simply wanted confirmation that two people were involved with the writing.

In his reply, John asserted that he believes the exchange to be real but cannot prove it.

He also says that he has seen other emails from "Ed" which are very similar, in both tone and style, to those which we see in the questioned exchange.

The person who sent the emails to John did not disguise the identities of the two writers. I don't know if Fred or someone else passed the stuff on to Liberal Rapture, but I do know that the contact did NOT ask for the exchange to be published.

John insists that Liberal Rapture is not a sufficiently important blog to warrant the creation of an elaborate, large-scale hoax. On the other hand, he admits that the whole thing could have been "a creative writing exercise," designed to "see if you can make a fool of the minor blogger."

I doubt it. I think, but cannot prove, that the exchange is real.

9 comments:

Anonymous said...

Fitz raises an interesting possibility – that Ed is actually Jim Kunstler.

I went to Kunstler’s blog, expecting to quickly dismiss this possibility. But what I got is ambiguity.

Kunstler’s writing does have some resemblance to Ed’s – to a substantially greater degree than other writers I’ve looked at – with the exception of Fred.

Cross entropy using lexical features (such as text character n-grams or words) assigns Ed to Fred, and not to Kunstler. But Support Vector Machine – another algorithm that has proven its worth in authorship attribution – assigns Ed to Kunstler when using words as a feature (using Support Vector Machine with text character n-grams assigns Ed to Fred). Also, there are certain consistencies between Kunstler and Ed – for example, Ed calls Obama “Mr. O”, and Kunstler does this also; Ed refers to FDR taking office in 1933, something that Kunstler often mentions; both like to use words in quotes and parentheticals; etc. I’ve only had a chance to do limited analysis on syntax, and Ed tends to come out closer to Fred than to Kunstler, but the Ed text sample is small and the syntactical differences aren’t huge, so it’s hard to be sure.
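For readers unfamiliar with the idea, here is a minimal sketch of cross-entropy attribution with character n-grams. It is not the analysis described above, whose exact method is not given. It assumes a simple add-one-smoothed trigram frequency model, and the function names (char_ngrams, cross_entropy, attribute) and the toy sample texts are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def cross_entropy(questioned, reference, n=3, alpha=1.0):
    """Average bits per n-gram of `questioned` under a smoothed n-gram
    frequency model built from `reference`. Lower means a closer match."""
    ref_counts = Counter(char_ngrams(reference, n))
    test_grams = char_ngrams(questioned, n)
    if not test_grams:
        raise ValueError("questioned text is shorter than n characters")
    # Additive smoothing so n-grams unseen in the reference get nonzero probability.
    vocab = set(ref_counts) | set(test_grams)
    total = sum(ref_counts.values()) + alpha * len(vocab)
    bits = 0.0
    for gram in test_grams:
        bits -= math.log2((ref_counts[gram] + alpha) / total)
    return bits / len(test_grams)


def attribute(questioned, candidates, n=3):
    """Score the questioned text against each candidate's writing sample
    and return the name with the lowest cross entropy, plus all scores."""
    scores = {name: cross_entropy(questioned, sample, n)
              for name, sample in candidates.items()}
    return min(scores, key=scores.get), scores


# Toy data, purely illustrative.
candidates = {
    "A": "the cat sat on the mat and the cat sat on the hat " * 5,
    "B": "quantum flux zigzag jukebox vexing wizard " * 5,
}
best, scores = attribute("the cat sat on the mat", candidates)
```

With real samples, each candidate would need far more text than this, and the scores would only be comparable when the reference samples are of similar size and genre.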

On another note - various algorithms have been developed for detecting deception in texts (i.e. is the text deceptive or honest communication). The best of these approaches can do pretty well (e.g. ~80% accuracy).

I found one such program on the web – by Rajarathnam Chandramouli (and co-workers) out of the Stevens Institute of Technology. I’ll note that they haven’t published their algorithm, making it difficult to assess their methodology and compare to other published approaches. But I tried a smallish set of sample texts of various sorts (dry/cerebral, emotional, political, apolitical, etc.) and it actually seemed to do quite well. I got no false positives (no erroneous ascriptions of deception) and one false negative (a known instance of deception that was erroneously labeled as normal communication). Interestingly, when I submitted the Fred text, it was labeled as deceptive communication.

Anonymous said...

The anti-Obot should be easier to identify if his/her quoted autobiographical info is correct:
"More importantly, I've had the opportunity to meet and talk with Obama in small, intimate settings and gain a better sense of the man than anyone can based on t.v. interviews and his ubiquitous speechifying. (I was for many years active in the Democratic party. I also worked for some years for an investment bank which was a major backer of Obama from the time he ran for the Senate. And, he was eager to spend lots of time with the top bankers of Wall St in small dinner settings in glorious NY penthouses. I was often invited along simply because I was one of the few blacks at my bank they could invite to "color" these dinners.)"

Find a rich black investment banker who used to work for an Obie-backing big bank in 2004, and who used to be active in rarefied Dem circles, and there's your guy/girl. He/she also says he/she has extended family in Germany.
How many people like that can there be? He/she even claims their "uniqueness" is significant to their access to him.

John Smart said...

Just to be clear, I state in the original post that I had no idea if the email chain was real or concocted.

I believe them. But the value is in the fundraiser's ability to be clear about her support of Clinton and her negative assessment of Obama not being automatically connected.

Joseph Cannon said...

We have a pronoun...!

Anonymous said...

I am convinced that the exchange is real and that James Kunstler is, in fact, the Obot in question. My reason for this conclusion (perhaps it's the same as the aforementioned Fitz's) is the anti-Obot's referencing of the phrase "basically honest and intelligent" by the Obot. Kunstler, in fact, on more than one occasion used exactly those words to describe Obama, as can be verified on this post.

Also, Kunstler had a particularly nasty case of CDS (or perhaps it should be called HDS because of the level of misogyny involved) regarding Hillary Clinton, so it makes sense that he would dismiss the anti-Obot's claims as sour grapes from a disappointed Hillary supporter.

Moreover, I tend to doubt textual analysis in an exchange as short as this one because the writers will very often repeat one another's phrasings in order to rebut their arguments. Although perhaps I'm missing something. I did notice, however, that the anti-Obot made a few common grammatical errors (a misplaced modifier here, a faulty parallelism there) that the writer who I assume to be Kunstler did not. This only makes sense, since Kunstler is certainly a careful (and at times quite brilliant) writer.

Btw, I really love your blog.

Inky

Anonymous said...

Inky is definitely right that the major problem for an authorship analysis in this case is the very short length of the Ed text.

Kunstler is amazingly consistent in his use of language (words, syntax, etc.). Across interviews, articles, and blog entries, how he uses language stays remarkably constant.

But with a very short text, you have a very small sample of language – and stochasticity that can make it difficult to reach clear conclusions.
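A rough way to see the sample-size problem: if each word is treated as an independent draw (a simplifying binomial assumption, not something claimed above), the standard error of an observed word rate shrinks only with the square root of the text length. The function name rate_standard_error is hypothetical.

```python
import math


def rate_standard_error(p, n_words):
    """Standard error of an observed relative frequency, assuming each of
    n_words is an independent draw with true rate p (binomial approximation)."""
    return math.sqrt(p * (1.0 - p) / n_words)


# A feature with a true rate of 2%, measured in a short and a long sample.
short_se = rate_standard_error(0.02, 300)      # a few paragraphs
long_se = rate_standard_error(0.02, 30000)     # a large writing sample
```

Under this assumption, a 2% feature measured in 300 words carries an uncertainty of roughly 0.8 percentage points, which is about 40% of the rate itself, while in 30,000 words the uncertainty is ten times smaller.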

Algorithms based on cross entropy are known to work well, even for short texts. There’s less published about Support Vector Machine (SVM) for short texts. In general, it’s a powerful technique. But SVM can sometimes give unstable (flaky) results.

Basically, using these types of approaches, and using different samples of Kunstler’s language (both writing and interviews), here’s what I got:

Character n-grams (of various lengths) always ascribed authorship (of Ed) to Fred, using both cross entropy and SVM.

Cross entropy using words (and word bigrams and trigrams) also ascribed authorship to Fred.

SVM using words ascribed authorship to Kunstler in most but not all cases (i.e. under this method, when Fred was paired with various samples of Kunstler’s writing, authorship of Ed was generally, but not always, awarded to Fred). However, using longer word n-grams (word bigrams and trigrams) with SVM tended to award authorship to Fred.

Using words plus parts-of-speech (in my experience, a very informative feature set) always gave Fred as the author, under both cross entropy and SVM.

Further analysis of syntax (many aspects – e.g. use of adverbs in adjective phrases) also tended to place Ed closer to Fred than to Kunstler, but Kunstler was not highly divergent from Ed and Fred (and actually somewhat closer to Ed for certain features).

So I’m hesitant to draw a definitive conclusion.

There are studies showing that remarkably good rates of authorship attribution are possible with very short texts, e.g. >90% correct attribution, in a panel of 100 possible authors, for eBay comments and short Enron e-mails. But there’s a lot less information in a short text, and things go awry when you have a short text and particular potential authors with similar lexical and syntactic patterns. In this case, there are quite a lot of lexical and syntactic similarities between Fred and Kunstler, so discriminating between them for the short Ed text is difficult.

The potential echoing of phrasing that Inky notes occurred to me also. I tried eliminating anything that obviously looked like recapitulation, and it didn’t affect the results. Also, Kunstler’s language usage seems to remain constant even across interviews (i.e. I don’t see a large ‘echoing’ effect in that context – at least for many of the features I’m using). But this is still a consideration.

There are other similarities that I’ve noticed between Ed and Kunstler also – e.g. the use of ‘IMO’, the use of a double dash, particular somewhat unusual words, etc.

So, at this point, I’m on the fence. I can’t confidently exclude Kunstler, and the quoted “basically honest and intelligent” phrase provides some evidence in favor of Kunstler.

Cinie’s suggestion is an interesting one. If this is a real communication, and Fred did not misrepresent himself/herself, he/she might not be hard to find. On the other hand, it’s hard to prove non-existence. If Fred does not exist (as self-described), searching for such an individual would turn into a bottomless pit of time. But it might be worth looking into this a bit.

Anonymous said...

How cool that somebody actually used "suetonian"!

Anonymous said...

Additional text analyses I ran on Sunday* suggested that ‘Ed’ actually was Kunstler. So I e-mailed him to inquire.

He confirmed that this is an actual e-mail exchange with ‘Frederica’ – that he is actually the blogger in question.

*[In case anyone’s interested in the minutiae of the linguistic analysis…

For SVM trained on extremely large samples of Kunstler writing, authorship was ascribed to Kunstler (and not Frederica) for not just words, but also longer word n-grams (bigrams and trigrams).

I also found a large sample of text from an interview that Kunstler had conducted, in which he was speaking pretty ‘loosely’. Using this sample, both cross entropy and SVM assigned authorship of Ed to Kunstler under both words and word n-grams (and for a set of function words as well).

Focusing on very common syntactic features (such that the sample size would be adequately large in a short text sample): The use of determiners – especially the frequent use of ‘a’ or ‘an’ – is inconsistent with Frederica and is more consistent with Kunstler, especially when he’s communicating more informally. The frequency of use of noun phrases starting with a noun was also more consistent with Kunstler than Frederica.

There was also a set of idiosyncrasies (e.g. use of double dashes, occasional single quotation marks, “Mr. O”, etc.) shared by Ed and Kunstler.

Gender analysis tended to ascribe female gender to the primary correspondent, and male gender to Ed.

As multiple people have noted, there were stylistic differences between Ed and Frederica. And Ed had stylistic resemblance to Kunstler. This was even stronger in Kunstler e-mail text (a few samples of which I found online).]
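The indefinite-article comparison mentioned in the notes above could be approximated roughly as follows. This is only a sketch: it counts surface tokens with a regular expression rather than using a syntactic parser, and the function name indefinite_article_rate and the sample sentence are hypothetical.

```python
import re


def indefinite_article_rate(text, per=1000):
    """Occurrences of 'a' or 'an' per `per` words, using a crude tokenizer
    that keeps only runs of letters and apostrophes."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in ("a", "an"))
    return per * hits / len(words)


# Illustrative sentence: 8 words, 3 of them indefinite articles.
rate = indefinite_article_rate("A dog saw an owl and a cat")
```

Comparing this rate across the questioned text and each candidate's samples gives one weak signal; on a short text it is best read alongside other features rather than on its own.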

Anonymous said...

Probably no one cares, but I noticed a typo in one of my comment entries from yesterday (posted at 8:17 PM). The offending passage should read:
“SVM using words ascribed authorship to Kunstler in most but not all cases (i.e. under this method, when Fred was paired with various samples of Kunstler’s writing, authorship of Ed was generally, but not always, awarded to Kunstler).”
i.e. the last word in the parenthetical should be Kunstler, not Fred.