SIGMOD 2010 Notes



Here are my notes from attending the SIGMOD 2010 conference.

RDBMS: SQL Server 2008 R2
ETL, reporting, analysis, search, CEP (StreamInsight), OLAP (PowerPivot), collaboration, and MDM (Master Data Services); presented as an analytical data platform.

Platform providers, application developers, end users.
Over 95% of enterprise data is unstructured.

2007: Hadoop is developed at Yahoo.
Bryant's DISC lecture.

2009: Hadoop enters the enterprise.
2010: Hadoop is fully integrated into the enterprise (security, etc.).

Coupling complex map collection from MS is going to be announced in the upcoming weeks.

Mesos-enabled frameworks will be available in 2012.

What is elastic search?

SciDB
Uses Hadoop; the shared-disk model is not used directly in SciDB.
Arrays are the basic data structure.

Functional Query Language

What is Powerscript?
scidb.org

Integrating Hadoop and Parallel DBMS
Teradata's input.

Why is log data important?

Facebook: 6 TB of log data per day.
China Mobile stores 5-8 TB of call data per day.
Do linear scans; don't build indexes on log data this large.
Hadoop is an open-source implementation of MapReduce.
Learn MapReduce.
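
To make the linear-scan point concrete for myself, here is a minimal sketch of the map, shuffle, and reduce steps simulated in plain Python on one machine (this is not Hadoop's API). The log format `<timestamp> <user_id> <action>` and all function names are hypothetical, chosen only for illustration; each line is read once and no index is built or consulted.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Hypothetical log format: "<timestamp> <user_id> <action>"
    parts = line.split()
    if len(parts) == 3:
        yield parts[2], 1  # emit (action, 1); malformed lines are skipped


def shuffle(pairs: Iterable[Tuple[str, int]]) -> dict:
    # Group values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list) -> Tuple[str, int]:
    return key, sum(values)


def count_actions(lines: Iterable[str]) -> dict:
    # A single linear scan over every line; no index is involved
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


logs = [
    "1276000000 u1 click",
    "1276000005 u2 view",
    "1276000009 u1 view",
    "malformed line",
]
print(count_actions(logs))  # {'click': 1, 'view': 2}
```

On a real cluster the map calls run in parallel over file splits and the shuffle moves data across machines, but the per-line work stays a sequential scan.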

Personalized recommendations:
Hadoop has problems with deep analytics.

Rmpi and snow are examples of parallel R packages.

Look at: comparative advantage, David Ricardo.

Hadoop + Jaql: scalability for large-scale data management.

Jaql is open source.


Friend recommendation @ MySpace
PYMK = People You May Know
Look at: their MapReduce system; they said it can be comparable to C#.

A MapReduce system specific to friend recommendation.
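
A common way to compute PYMK with MapReduce is to count mutual friends: for each user's friend list, the map step emits every pair of those friends (the user is a mutual friend of each pair), and the reduce step sums the counts per pair. The sketch below simulates that in plain Python; it is not the MySpace system from the talk, the function names and toy graph are hypothetical, and it assumes the friend graph is symmetric.

```python
from collections import Counter, defaultdict
from itertools import combinations


def map_friend_list(user, friends):
    # `user` is a mutual friend of every pair drawn from their friend list
    for a, b in combinations(sorted(friends), 2):
        yield (a, b), 1
        yield (b, a), 1


def people_you_may_know(graph, top_k=3):
    # Map + reduce: sum the mutual-friend counts for each ordered pair
    mutual = Counter()
    for user, friends in graph.items():
        for pair, count in map_friend_list(user, friends):
            mutual[pair] += count
    # Drop pairs that are already friends, then rank by mutual-friend count
    candidates = defaultdict(list)
    for (a, b), count in mutual.items():
        if b not in graph.get(a, set()):
            candidates[a].append((-count, b))  # negative count sorts highest first
    return {user: [b for _, b in sorted(cands)[:top_k]] for user, cands in candidates.items()}


graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat"},
}
print(people_you_may_know(graph))  # {'ann': ['dan'], 'dan': ['ann']}
```

Users with no non-friend candidates simply don't appear in the result; ties in mutual-friend count are broken alphabetically.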

Forecasting High-Dimensional Data
Early January: submit; mid-January: Yahoo accepts; then it loads up and gets paid.
Full Independence Model (FIM)
Partwise Independence Model (PIM)
Sample-based Joint Model (SJM)
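
My reading of the first and last model names (an assumption, not the paper's exact definitions): a full independence model estimates how many records match a multi-attribute query by multiplying per-attribute marginal probabilities, while a sample-based joint model evaluates the whole query on a random sample, so correlations between attributes are preserved. The toy sketch below, with hypothetical names and data, shows why that matters when attributes are correlated. PIM is left out because the notes don't say how it works.

```python
import random


def matches(record, query):
    return all(record[attr] == value for attr, value in query.items())


def fim_estimate(records, query):
    # Full independence: multiply per-attribute marginal probabilities
    n = len(records)
    prob = 1.0
    for attr, value in query.items():
        prob *= sum(1 for r in records if r[attr] == value) / n
    return prob * n


def sjm_estimate(records, query, sample_size, seed=0):
    # Sample-based joint: evaluate the whole query on a uniform random sample,
    # so correlations between attributes are kept
    sample = random.Random(seed).sample(records, sample_size)
    hits = sum(1 for r in sample if matches(r, query))
    return hits / sample_size * len(records)


# Hypothetical impression log in which age and gender are perfectly correlated
records = [{"age": "18-24", "gender": "m"}] * 500 + [{"age": "25-34", "gender": "f"}] * 500
query = {"age": "18-24", "gender": "m"}
print(sum(1 for r in records if matches(r, query)))  # 500
print(fim_estimate(records, query))                  # 250.0
print(sjm_estimate(records, query, sample_size=100))
```

Independence halves the true count here, while the sample-based estimate only carries sampling noise around 500.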

Facebook
99% of the analysis at Facebook is done with Hive + Hadoop.
Chronos Scheduler
Learn about: HDFS-RAID, Scribe-HDFS!
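
HDFS-RAID adds erasure-coded parity to HDFS so that a lost block can be rebuilt from parity instead of relying only on full replicas. The simplest code it supports is XOR parity; a minimal sketch of the idea on in-memory byte strings (hypothetical function names, not the HDFS-RAID code) looks like this:

```python
from functools import reduce


def xor_blocks(blocks):
    # Byte-wise XOR across equal-length blocks
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_parity(stripe):
    # One parity block protects the whole stripe of data blocks
    return xor_blocks(stripe)


def recover_block(stripe, parity):
    # `stripe` holds the data blocks with the lost one replaced by None
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one lost block per stripe")
    survivors = [block for block in stripe if block is not None]
    # parity XOR (all surviving blocks) == the missing block
    return xor_blocks(survivors + [parity])


stripe = [b"hdfs", b"raid", b"demo"]
parity = make_parity(stripe)
print(recover_block([b"hdfs", None, b"demo"], parity))  # b'raid'
```

With three data blocks, one parity block costs a third of the data size, versus a full copy for each extra replica; the catch is that XOR survives only one loss per stripe, which is why HDFS-RAID also offers Reed-Solomon codes.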


Cloud Computing
Wikipedia data may not always be correct :)
Twitter's data will be more valuable than Wikipedia's.


SQL for Azure
Gopal Kakivaya

Google Fusion Tables queries are available in Google Docs and Maps.

CIKM conference

tables.googlelabs.com

Look at: "public tables google"

XQE is a graphical query generator from AquaLogic, AquaStudio.
BEA's graphical query editor is now Oracle's editor.

FAA data is available.

Look at: Tableau lets you create graphical data queries.

If you have 2,000 servers, 10 of them crash every day.
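
Back-of-the-envelope numbers behind that figure, assuming failures are spread evenly over time and across servers (a simplification): 10 failures out of 2,000 servers is 0.5% per day, each server fails about once every 200 days, and the cluster as a whole sees a failure roughly every 2.4 hours. The helper name is hypothetical.

```python
def failure_stats(servers: int, failures_per_day: float) -> tuple:
    # Fraction of the fleet that fails on a given day
    daily_failure_rate = failures_per_day / servers
    # Mean days between failures of a single server (constant-rate assumption)
    server_mtbf_days = servers / failures_per_day
    # Mean hours between failures anywhere in the cluster
    cluster_hours_between_failures = 24 / failures_per_day
    return daily_failure_rate, server_mtbf_days, cluster_hours_between_failures


print(failure_stats(2000, 10))  # (0.005, 200.0, 2.4)
```

So even very reliable individual machines add up to a cluster where failure handling has to be automatic.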

One hour of laptop usage produces a 20 g carbon footprint.
