Tuesday, May 21, 2013

Listening to your telephone calls

Back on May 5, we looked at some disturbing comments made by former FBI counterterrorism agent Tim Clemente regarding the Boston bombing investigation. In essence, Clemente said that it would be possible to retrieve a stored phone conversation involving one of the bombers:
"...Welcome to America. All of that stuff is being captured as we speak whether we know it or like it or not."
Does this mean what it seems to mean -- that the NSA records and stores all of our telephone conversations? The comments here give some insight into the matter, especially these words from someone calling himself "Dilbert"...
...I know Tim personally and I believe he knows exactly what he's talking about. He shared these same views with me roughly 10-12 years ago. This is likely an extension of the old Echelon program. I doubt they're storing audio; more likely using voice-recognition and dumping this all to text. Sure, they could be doing the old keyword flags but I doubt that (too much noise). I expect it's all dumped into massive databases for after-the-fact investigation.

For more on the capabilities, just do a search on "echelon semantic forests"
On the other hand, a comment from one "Alex" points out the technical barriers -- the need for high compression, the ability to store everything, and the simple fact that "speech to text sucks."

A certain BT adds:
It is in the National Security State's best interests to imply that it can do more than what it actually admits doing even if it can't really do what it wants to imply, and I wouldn't put it past certain TLAs to feed deliberate misinformation to a former employee in order to make sure it gets into the press.
You must come to your own conclusions about the credibility of this offering, from someone calling himself Stratego:
I don't know if the NSA is monitoring every phone call. I know how they work and find it difficult to believe that they are monitoring everything. It's the US Government after all. However, I certainly know for a FACT that they are illegally monitoring US persons' calls, and dragging in loads of US citizen and US person data, public and private.

NSA has had tons of problems over the last few years simply with data center power. They can't get enough power from the local utilities, hence Bluffdale, UT and a temporary Austin, TX location.
How much would it cost to store all of the audio from all of those calls? Perhaps less than you think...
I'll assume that speech-to-text is not good enough, or that we want to keep the audio around for some other reason.

Assume the NSA is using something like Codec2 at 1400 bits/sec (175 bytes/sec). That's 10.25 KB/min/person.

Extrapolating this to the entire country (310 million people), we get about 3 terabytes per minute, or about one large hard drive.

Amazon S3 glacier storage is about $0.01/GB/month, so storing one [minute] of recordings would round up to $31/month. At 500 minutes of talking for each person (average) per month, that gives us $15,500/month ($186,000/year) to record the entire country.
Wow. You'd expect the cost of such a project to be in the millions.
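
The arithmetic in the quoted estimate can be retraced in a few lines of Python. This is only a sketch of the quote's own assumptions (Codec2 at 1400 bits/sec, 310 million people, 500 minutes per person per month, $0.01/GB/month). The variable names are mine, and binary units (1 KB = 1024 bytes) are assumed because that is what reproduces the quoted 10.25 KB figure. Note that the $31 figure is the monthly charge for holding a single minute of nationwide audio.

```python
import math

BITS_PER_SEC = 1400          # Codec2 bit rate assumed in the quote
PEOPLE = 310_000_000         # population figure used in the quote
MINUTES_PER_MONTH = 500      # average talk time per person, per the quote
PRICE_PER_GB_MONTH = 0.01    # storage price cited in the quote, in dollars

bytes_per_min = BITS_PER_SEC / 8 * 60                    # bytes per person-minute
kb_per_min = bytes_per_min / 1024                        # about 10.25 KB
gb_per_national_min = bytes_per_min * PEOPLE / 1024**3   # roughly 3 TB, in GB

# Monthly charge for holding ONE minute of nationwide audio,
# rounded up to whole dollars as the quote does.
dollars_per_national_min = math.ceil(gb_per_national_min * PRICE_PER_GB_MONTH)

monthly = dollars_per_national_min * MINUTES_PER_MONTH
yearly = monthly * 12

print(f"{kb_per_min:.2f} KB/min/person")
print(f"{gb_per_national_min / 1024:.2f} TB per nationwide minute")
print(f"${dollars_per_national_min}/month per stored minute")
print(f"${monthly:,}/month, ${yearly:,}/year")
```

The yearly figure here simply multiplies one month's bill by twelve, exactly as the quote does; it does not account for earlier months' recordings still sitting in storage.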

7 comments:

stickler said...

It always pays to check the arithmetic.

At 3TB/min for the entire country, and 500 minutes/month/person, the annual storage necessary is 3TB/min X 500 min/mo/person X 12 mo/yr = 18,000TB/yr.

If the Amazon S3 glacier storage costs $0.01/GB/mo, that equals $120/TB/yr.

18,000TB/yr X $120/TB/yr = $2,160,000/yr to store all US citizen telephone calls.

Still pocket change.
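
A short Python check of the figures in this comment, assuming decimal units (1 TB = 1000 GB) as the comment does. The variable names are illustrative only.

```python
TB_PER_NATIONAL_MIN = 3      # rounded figure from the post
MINUTES_PER_MONTH = 500      # average talk time per person per month
PRICE_PER_GB_MONTH = 0.01    # dollars, as cited in the post

# Total audio generated in one year, in TB.
tb_per_year = TB_PER_NATIONAL_MIN * MINUTES_PER_MONTH * 12

# $0.01/GB/month expressed per TB per year (rounded to cents to avoid float noise).
price_per_tb_year = round(PRICE_PER_GB_MONTH * 1000 * 12, 2)

annual_cost = tb_per_year * price_per_tb_year

print(f"{tb_per_year:,} TB/yr at ${price_per_tb_year:,.0f}/TB/yr = ${annual_cost:,.0f}/yr")
```

This treats the full year's 18,000 TB as if it were held in storage for all twelve months.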

Propertius said...

And, of course, NSA's internal costs to store data are far less than Amazon's inflated retail cloud storage costs, even counting the redundancy you'd need to make the retrieval process "hadoopable".

You would certainly want to store audio rather than text - with sufficient fidelity to permit analysis of background sounds (particularly voices).

Jack said...

Interesting. But not an original idea. Bill Gates made this very point in reference to what he predicted (back in the 1990s) would become ubiquitous video surveillance in public places. The cost of storage is just so trivial that it's not a factor in determining what to record and what to keep. And, again, this was the 1990s.

Anonymous said...

A bit of back of the envelope orders of magnitude calculation, from another commenter in your cited source:

"Regarding data requirements: GSM-encoded voice mail, Asterisk's documentation informs me, takes approximately 500 megabytes per 2,000 hours. I did a quick back-of-envelope on how many disk drives would be required to record 100 million people's phone calls for a year, and we're talking about around 14 million disk drives, or about the output of a nuclear power plant just to power them, nevermind the air conditioning, the servers built around them, and so on and so forth. Not happening.

As far as voice transcription, the NSA doesn't have any better technology than the private sector nowadays, and someone already mentioned how pathetic Siri etc. are. I don't doubt that they *try* to do machine transcriptions and stash them back, including the phoneme-based compression mentioned above, but that's a Hard Job, and I doubt they're any better than Siri.

Now, *digital text* communications... I presume that email and SMS and IM are all being logged and stored somewhere. They are compact and easily stored. Same deal with "pen register" data (who called whom when). But voice? That's a Hard Job, and it would surprise me if anything other than "flagged" calls are being recorded continuously."

However, the Department of Homeland Security claims that whatever they're holding is purged after a five-year period.

See page 11, at http://www.dhs.gov/sites/default/files/publications/privacy/PIAs/privacy_pia_ops_NOC%20MMC%20Update_April2013.pdf

"The NOC MMC retains information for no more than five years to provide situational awareness and establish a common operating picture. This five-year retention schedule is based on the operational needs of the Department."

XI

mr kite said...

Actually, assuming that this data accumulated at a linear rate over the first year, the cost for a year of storage would be $1,080,000.

At $.01/GB/month and estimating a data growth rate of 5% annually, you could store all U.S. telephonic conversations for 10 years for a total cost of under $130,000,000 and never have to throw anything away.

How little money is this to our criminal elite? Well, it's about 1/3000th of the budget for Lockheed Martin's "troubled" F-35 fighter jet.
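
One possible reading of the ten-year estimate in this comment, sketched in Python. The model is my assumption, not necessarily the commenter's: each year's recordings arrive at a steady rate (so on average half of the new data is billed that year), everything already stored is billed for the full year, and annual volume grows 5%. The 18,000 TB/yr and $120/TB/yr inputs come from stickler's comment; the function name is hypothetical.

```python
TB_FIRST_YEAR = 18_000     # annual volume from stickler's comment
PRICE_PER_TB_YEAR = 120    # $0.01/GB/month in decimal units
GROWTH = 0.05              # assumed annual growth in data volume
YEARS = 10

def cumulative_cost(years, growth=GROWTH):
    """Total storage bill over `years`, never deleting anything."""
    stored_tb = 0.0   # data already in storage at the start of the year
    total = 0.0
    for year in range(years):
        new_tb = TB_FIRST_YEAR * (1 + growth) ** year
        # New data accumulates linearly, so on average half of it is
        # in storage over the course of the year.
        total += (stored_tb + new_tb / 2) * PRICE_PER_TB_YEAR
        stored_tb += new_tb
    return total

first_year = cumulative_cost(1)
ten_years = cumulative_cost(YEARS)

print(f"first year: ${first_year:,.0f}")
print(f"ten years:  ${ten_years:,.0f}")
```

Under these assumptions the first-year bill matches the $1,080,000 above, and the ten-year total stays below the $130,000,000 ceiling the comment gives.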

amspirnational said...

Did you hear about the nation's premier antiwar site, www.antiwar.com, suing the FBI over surveillance?
Check it out. Just announced.

Anonymous said...

Oh, but I'm sure the cost to the taxpayer really is in the billions. You're thinking of the cost to the contractor.

Invade your privacy and get rich too. These people are geniuses!