Here are a few stories from the data space that caught my attention this week.
Amazon, BitYota launch data warehousing services
Amazon announced the beta launch of Amazon Redshift, its Amazon Web Services (AWS) data warehouse service, this week. Paul Sawers at The Next Web reports that Amazon hopes to democratize data warehousing, with pricing that makes such services viable for small businesses while enticing large companies with a cheaper alternative. Depending on the service plan, customers can launch Redshift clusters scaling to more than a petabyte for less than $1,000 per terabyte per year.
So far, the service has drawn in some big players — Sawers notes that the initial private beta has more than 20 customers, including NASA/JPL, Netflix, and Flipboard.
Brian Proffitt at ReadWrite took an in-depth look at the service, noting its potential speed and the importance of its architecture. Proffitt writes that Redshift’s massively parallel processing (MPP) architecture “means that unlike Hadoop, where data just sits cheaply waiting to be batch processed, data stored in Redshift can be worked on fast — fast enough for even transactional work.”
Proffitt also notes that Redshift isn’t without its red flags, pointing out that a public cloud service raises issues not only of data security but also of data access costs — the bandwidth charges for transferring data back and forth. He also raises concerns that the service may play into Amazon’s typical business model of luring customers into its ecosystem a piece at a time. Proffitt writes:
“If you have been keeping your data and applications local, shifting to Redshift could also mean shifting your applications to some other part of the AWS ecosystem as well, just to keep the latency times and bandwidth costs reasonable. In some ways, Redshift may be the AWS equivalent of putting the milk in the back of the grocery store.”
In related news, startup BitYota also launched a data warehousing service this week. Larry Dignan reports at ZDNet that BitYota is built on a cloud infrastructure and uses SQL technology, and that service plans will start at $1,500 per month for 500GB of data. As to competition with AWS Redshift, BitYota co-founder and CEO Dev Patel told Dignan that it’s a non-issue: “[Redshift is] not a competitor to us. Amazon is taking the traditional data warehouse and making it available. We focus on a SaaS approach where the hardware layer is abstracted away,” he said.
Big data problems, in real time?
Some of the big problems with big data might not be what we think they are — and some might not end up being problems at all. The New York Times’ Quentin Hardy talked this week with technology pioneer, brain researcher and Numenta co-founder Jeff Hawkins, who believes data storage companies, relational databases and historical data analysis may soon be things of the past. They’ll be replaced by real-time analysis of information streams from sensors, what Hawkins calls “the future of machine intelligence.” Hardy reports:
“‘Hadoop won’t go away, but it will manage a lot less stuff,’ [Hawkins] said in an interview at Numenta’s headquarters in Redwood City, Calif. ‘Querying databases won’t matter as much, as people worry instead about millions of streams of real-time data.’ In a sensor-rich world of data feeds, he is saying, we will model ourselves more closely on the constant change that is the real world.”
Hawkins’ company Numenta has been developing a real-time artificial intelligence product called Grok, modeled on how Hawkins believes the brain works: taking in information and seeking patterns in order to predict what will happen next. As described on its website, Grok is a cloud-based service that pulls in data streams from sensors — servers, consumer devices, machines, etc. — to make actionable predictions that help inform decisions. Hardy reports that Numenta claims the service’s limited release has been successful, with results 10% to 20% better than traditional predictive benchmarks. The company expects to sell the product on a broader scale in early 2013.
Stolen cellphones give the NYPD carte blanche with phone records
A report this week from the New York Times brought to light new data privacy concerns regarding cellphone records. Joseph Goldstein reports that when a cellphone is reported stolen, the New York Police Department’s standard procedure is to subpoena the phone records for the stolen phone’s number, starting from the day of the theft. These records are then added to a searchable database, with each phone number hyperlinked so that detectives can cross-reference calls in other case files. This may seem reasonable from a crime-fighting standpoint, but there’s a problem with scope. Goldstein reports:
“The subpoenas not only cover the records of the thief’s calls, but also encompass calls to and from the victim on the day of the theft. In some cases the records can include calls made to and from a victim’s new cellphone, if the stolen phone’s number has been transferred, three detectives said in interviews.”
Goldstein points out that all this information is obtained without having to acquire a court order, and the phone records remain in the system to “conceivably be used for any investigative purpose.” Civil rights lawyer Norman Siegel called the situation “eye-opening and alarming” and told Goldstein, “There is absolutely no legitimate purpose for doing this. If I’m an innocent New Yorker, why should any of my information be in a police database?”
Tip us off
News tips and suggestions are always welcome, so please send them along.