Notes from the Yahoo 2013 Tech Conference

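I was honored that Yahoo HR arranged for me to attend the 2013 Tech Conference. The invitation promised an "unlimited self-service buffet," so on the evening of September 12 I showed up at Yahoo's office in the Nangang Software Park, planning to eat for free and enjoy the show.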

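As soon as I walked in, I saw a crowd packed around two long tables in the middle of the room, like a swarm of ants. The so-called "unlimited" buffet was nearly wiped out already, so I hurried to join the swarm and dig in.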

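Then, without warning, a familiar face in the swarm caught me stuffing myself, and I had no time to hide. It turned out that a former colleague of mine becomes an officer on the Taiwan R Group organizing team after work!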

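The conference theme was Big Data, and the R community was also invited. R is a language and environment for statistical computing, widely used to implement data-mining and statistical analysis algorithms, so it is closely tied to Big Data. The R Group is led by researchers from Academia Sinica, and I was lucky enough to exchange business cards with several of them. It turns out the R Group holds a technical meetup every Monday evening, so I encourage everyone to attend when they can.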

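I made a point of asking an Academia Sinica academician to let me know whenever a meetup serves food the way Yahoo did, so that I can be sure to show up.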

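My former colleague may have caught me red-handed in my gluttony, but I looked on the bright side: there is strength in numbers. The two of us teamed up to sweep the tables clean, proving that 1 + 1 > 2.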

The agenda for the evening's technical sessions was as follows:

Talk 1: How Yahoo! Dance with Big Data – Ecommerce Yahoo! Taiwan

* Ecommerce Data Solutions/Products Introduction 
* Data Solution Design and Planning
* Hadoop Application

Speaker:

Wennie Hwang (E-Commerce Engineering Director, Yahoo!)
Wennie has been working in the e-commerce industry since 1999. She led an engineering team that built a B2B auction platform for Autotradecenter, which was acquired by Adesa in 2011 for USD 210 million. At Cyberlink, as Online Marketing and Sales Director, her work was driving numbers using numbers. Over the past two years at Yahoo!, part of her work has been to drive Big Data analysis for e-commerce stakeholders and to deliver value-added features and product offerings to consumers.

Talk 2: The Big Data architecture in Yahoo!

Speaker:

Tim Tully (Distinguished Architect, Yahoo!)
Tim Tully is a Distinguished Architect at Yahoo! and an experienced big data expert. At Yahoo!, he designed the Yahoo! data technology platform end to end, including data warehousing, aggregation, visualization, instrumentation, ETL, and everything else involving analytics. He currently leads the architecture of multi-petabyte solutions at Yahoo! on Hadoop and other big data ecosystems, and is responsible for bringing Spark and Shark from UC Berkeley to Yahoo!. He also won Yahoo!'s prestigious Individual Superstar award in 2011.

Talk 3: Spark and Shark: High-speed Analytics over Hadoop Data

In this talk, we present two emerging, popular open-source projects: Spark and Shark. Spark is an open-source cluster computing system that aims to make data analytics fast, both fast to run and fast to write. It outperforms Hadoop MapReduce by up to 100x in many real-world applications. Thanks to its high-level APIs and language integration in Java, Scala, and Python, Spark programs are often much shorter than their MapReduce counterparts. Shark is an analytic query engine built on top of Spark that is compatible with Hive. It can run Hive queries much faster on existing Hive warehouses without modification.

These systems have been adopted by many organizations, large and small (e.g., Yahoo, Intel, Adobe, Alibaba, Tencent), to implement data-intensive applications such as ETL, interactive SQL, and machine learning.

Speaker:

Reynold Xin (PhD Candidate, UC Berkeley)
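Reynold Xin is a student at UC Berkeley's AMP Lab, one of the top research groups in the Big Data world. He is the author of Shark and a core developer of Spark. Shark, the project he leads, is an open-source implementation of SQL on Spark that stays compatible with Hive while running up to 100 times faster than Hive. This breakthrough earned Shark the Best Demo Award at SIGMOD 2012. Before coming to Berkeley, he worked at Google and IBM. His interests include data management systems, distributed systems, and algorithm design for large-scale data processing.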

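In fact, the whole event was really Yahoo showing off its firepower in big data applications as a recruiting drive! I suspect that many attendees, like me, found the technical results exciting and were eager to sign up.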