SIGMOD 2010 Notes
Here are the notes from my SIGMOD conference attendance in 2010.
RDBMS: SQL Server 2008 R2
ETL, reporting, analysis, search, CEP (StreamInsight), OLAP (PowerPivot), collaboration, MDM (Master Data Services); named as the analytical data platform.
Platform Providers; application devs, end users.
More than 95% of enterprise data is unstructured.
2007: Hadoop is created by Yahoo.
Bryan’s DISC lecture.
2009: Hadoop is in the enterprise.
2010: Hadoop is completely integrated into the enterprise (security, etc.).
A coupling complex map collection from MS is going to be announced in the upcoming weeks.
Mesos-enabled frameworks will be there in 2012.
What is elastic search?
Uses Hadoop; the shared-disk model is not directly used in SciDB.
Arrays are the basic data structure.
Functional Query Language
What is Powerscript?
Integrating Hadoop and parallel
Why is log data important?
6 TB of log data per day at Facebook.
China Mobile stores 5-8 TB of call data per day.
Do linear scans; don't build indexes on this huge log data.
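A minimal sketch of the linear-scan idea, assuming the log is plain text with one record per line. The function name `scan_logs` and the sample records are hypothetical, not from the talk.

```python
from typing import Callable, Iterable, Iterator


def scan_logs(lines: Iterable[str], predicate: Callable[[str], bool]) -> Iterator[str]:
    """Stream over every record once and yield the ones that match.

    No index is built or consulted: each query pays for one full pass.
    """
    for line in lines:
        record = line.rstrip("\n")
        if predicate(record):
            yield record


# Hypothetical sample records; a real input would be an open file handle.
sample = [
    "2010-06-08 12:00:01 GET /home 200\n",
    "2010-06-08 12:00:02 GET /missing 404\n",
    "2010-06-08 12:00:03 POST /login 200\n",
]

errors = list(scan_logs(sample, lambda r: r.endswith(" 404")))
print(errors)
```

Because the function is a generator, memory use stays flat no matter how large the log is; the cost is purely the time of one sequential read.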
Hadoop is an open-source implementation of MapReduce.
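To make the MapReduce model concrete, here is a single-process word-count sketch. It does not use Hadoop's API; the names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical and only mimic the three stages.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["hadoop runs mapreduce", "mapreduce runs jobs"])
print(counts)
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; the sketch only shows the data flow.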
Hadoop has problems with deep analytics.
Rmpi and snow are examples of R packages for parallel computing.
Look at: comparative advantage (David Ricardo).
Hadoop + Jaql: scalability for large-scale data management.
Jaql is open source.
Friend recommendation @ MySpace
PYMK = People You May Know
Look at: their MapReduce system; they said it is comparable to C#.
People: a MapReduce system specific to friend recommendation.
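A rough sketch of how a PYMK-style recommendation could be phrased as map and reduce steps, using mutual-friend counts. This is a common textbook approach and an assumption on my part, not MySpace's actual method; the graph and the names `map_friend_list` and `recommend` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterator, List, Set, Tuple


def map_friend_list(friends: Set[str]) -> Iterator[Tuple[Tuple[str, str], int]]:
    """Map: any two friends of the same user share that user as a mutual friend."""
    for a, b in combinations(sorted(friends), 2):
        yield (a, b), 1


def recommend(graph: Dict[str, Set[str]], top_n: int = 3) -> Dict[str, List[Tuple[str, int]]]:
    # Reduce: sum the emitted ones per pair to get mutual-friend counts.
    mutual: Counter = Counter()
    for friends in graph.values():
        for pair, one in map_friend_list(friends):
            mutual[pair] += one

    suggestions: Dict[str, List[Tuple[str, int]]] = {user: [] for user in graph}
    for (a, b), count in mutual.items():
        if b in graph.get(a, set()):
            continue  # already friends, nothing to recommend
        suggestions.setdefault(a, []).append((b, count))
        suggestions.setdefault(b, []).append((a, count))

    # Rank candidates by mutual-friend count, then by name for stable output.
    return {
        user: sorted(cands, key=lambda c: (-c[1], c[0]))[:top_n]
        for user, cands in suggestions.items()
    }


# Hypothetical symmetric friendship graph.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat"},
}
result = recommend(graph)
print(result)
```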
Forecasting High-Dimensional Data
Early January (submit), mid January (Yahoo accepts), loads up and gets paid.
Full independence model (FIM).
Partwise independence model (PIM).
Sample-based joint model (SJM).
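A small sketch of what a full-independence estimate could look like, assuming that "full independence" means every attribute is treated as independent of the others, so a joint count is approximated from the per-attribute marginals. This is my reading of the model name, not the talk's definition; the function name and sample rows are hypothetical.

```python
from typing import Dict, List


def full_independence_estimate(rows: List[Dict[str, str]], query: Dict[str, str]) -> float:
    """Estimate how many rows match `query` as N * product of marginal fractions."""
    total = len(rows)
    if total == 0:
        return 0.0
    estimate = float(total)
    for attr, value in query.items():
        matches = sum(1 for row in rows if row.get(attr) == value)
        estimate *= matches / total  # marginal fraction for this attribute
    return estimate


# Hypothetical sample data.
rows = [
    {"gender": "f", "region": "us"},
    {"gender": "f", "region": "eu"},
    {"gender": "m", "region": "us"},
    {"gender": "m", "region": "us"},
]
query = {"gender": "f", "region": "us"}

est = full_independence_estimate(rows, query)
actual = sum(1 for row in rows if all(row[k] == v for k, v in query.items()))
print(est, actual)
```

The gap between the estimate and the actual count is the error introduced by ignoring correlations between attributes, which is presumably what the partwise and sample-based models try to reduce.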
Hive + Hadoop: 99% of the analysis at Facebook is done with them.
Learn about: HDFS-RAID, Scribe-HDFS.
Wikipedia data may not always be correct :)
Twitter's data will be more valuable than Wikipedia's data.
SQL for Azure
Google Fusion queries are available in Google Docs and Maps.
Look at: "public tables google".
XQE is a graphical query generator from AquaLogic, aquastudio.
BEA's graphical query editor is now Oracle's editor.
FAA data is available.
Look at: Tableau lets you create graphical data queries.
If you have 2000 servers, 10 servers crash every day.
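The arithmetic behind that figure, as a quick sketch. It assumes crashes are spread evenly across servers and days, which the talk did not state; the variable names are mine.

```python
servers = 2000
crashes_per_day = 10

# Fraction of the fleet that crashes on a given day.
daily_failure_rate = crashes_per_day / servers

# Average number of days a single server runs between crashes.
days_between_crashes_per_server = servers / crashes_per_day

# Total crashes the operations team sees in a year.
crashes_per_year = crashes_per_day * 365

print(f"{daily_failure_rate:.1%} of servers crash per day")
print(f"each server crashes about once every {days_between_crashes_per_server:.0f} days")
print(f"{crashes_per_year} crashes per year across the fleet")
```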
One hour of laptop usage produces a 20 g carbon footprint.