This week, Stephen Ibaraki, ISP, has an
interview with Mike Stonebraker.
Dr. Stonebraker, widely acknowledged as the
world’s foremost database expert, brings a long history of outstanding database
research, entrepreneurial achievement and experience to his ventures. This
year, Dr. Stonebraker received the 2005 IEEE John von Neumann Medal,
which is the IEEE’s most prestigious technical honor.
Amongst his numerous accomplishments, Dr.
Stonebraker was the chief designer behind the Ingres relational database
management system (DBMS), the object-relational Postgres DBMS, and the Mariposa
federated system. His seminal research spawned the launching of several
successful database companies as founder and CTO: Ingres (acquired by Computer
Associates), Illustra (based upon Postgres, bought by Informix and acquired by
IBM), and more recently, Cohera.
Currently, from research conducted at MIT,
Brown, and Brandeis Universities on real-time data stream processing engines, Dr. Stonebraker is
founder and CTO of StreamBase Systems Inc., a company based in Lexington, Massachusetts, which has raised $16M in investment
funding since 2003. The company aims to revolutionize the handling of data streams,
allowing real-time message correlation, analysis, and queries on data flows. This
has a substantial impact, delivering performance improvements of up to 150 times for
applications in financial services (analyzing multiple stock feeds), military
battlefield intelligence, industrial control, weather tracking, data tracking from
RFID tags and sensors, and individual performance monitoring (casinos): anywhere
one needs SQL joins, aggregates, and correlation of information from multiple
streams, processed in real time without storing the data.
Moreover, using the supplied tools, years of application development compress dramatically.
Until 2000, Dr. Stonebraker was an electrical
engineering and computer science professor at the University of California at
Berkeley. He is currently a professor at MIT. Dr. Stonebraker has held visiting
professorships at the Pontifícia Universidade Católica (PUC), Rio de Janeiro, Brazil; the University of California, Santa Cruz; and the
University of Grenoble, France. Additional professional activities include: leading
alternative data management strategies for NASA’s Earth Observing System;
Chairman, Technology Council, Science Tools Corporation; General Chairman and
other significant positions for SIGMOD from 1982; member of the Technical
Advisory Committee for Citicorp, DB Software, and Bull; and member of SIMC’s
(Security Industry Middleware Council, Inc.) Board of Directors.
Q: Mike, with your remarkable research and entrepreneurial history,
we are indeed fortunate to have you for this interview. Congratulations on your
John von Neumann Medal, a particularly signature achievement! Can you comment?
A: I am thrilled to be the 14th
recipient of this prestigious prize and to thereby be included in a group of
winners that includes Gordon Bell, Fred Brooks, and Carver Mead. It is indeed a special honor.
Q: What are your short, medium, and long-term
strategies, goals, hopes for StreamBase?
A: The short term goal for StreamBase is to make the first collection of
customers very successful. Our medium goal
is to change the way computer people think about streaming data and get them to
realize that StreamSQL (SQL with stream extensions) is the right paradigm for
real-time low-latency stream processing. The long term goal is to participate in the “sea change” that will be
caused by cheap micro-sensor technology. This will cause everything on the planet of material significance to be
sensor tagged to report its state and/or location in real time. The downstream firehose of information will
be processed by engines such as ours.
Q: Your company is currently focused on
financial services since they have an immediate need to analyze/correlate
information from multiple feeds in milliseconds and at much lower costs. One
test, in which you processed 140,000 messages per second on a $1,500 PC
versus 900 for a major RDBMS, illustrates the clear advantages. How do you
see your technology specifically being applied to this and other areas in the
future, and what competitive advantages will it bring?
A: Our stream processing engine is especially beneficial in low-latency
high volume financial services applications such as feed processing, electronic
trading, real time risk analysis, compliance and real-time bond pricing. Off into the future, similar opportunities
exist in network management, homeland security, military applications,
real-time weather alerting, and industrial process control; any place where
there is a firehose of real-time information that must be processed quickly. One way to think of electronic trading is to
imagine a field being plowed by a collection of bulldozers. One of them turns up a nickel, and if you are
the quickest one to run out and get it, then you get to keep it. Our engine provides competitive advantage in
exactly that kind of race.
Q: You have a proven method of
demonstrating the power of your StreamBase technology by solving a customer’s
most difficult problems within a week. Detail a typical scenario and explain
why standard solutions do not work effectively.
A: StreamSQL provides the right high level operations to build a certain
class of applications very quickly. For
example, one large firm subscribes to several feeds of financial trade
data. They wanted to forward to their
trading engines the best available data; i.e. they wanted to consolidate the
feeds by passing on the first arriving data from whichever feed is most timely
and then discard the late duplicates. We
wrote this pilot in half a day using a total of 18 of our operations. It could process more than 100,000 messages
per second. This compares very favorably
with a general purpose language such as Java or C++, where development times
would be measured in weeks or months.
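The consolidation logic described here can be sketched in ordinary Python. This is purely an illustration, not StreamBase's StreamSQL or its 18 operators; the message fields (`seq`, `feed`, `px`) are hypothetical:

```python
# Illustrative sketch (not StreamBase code): forward the first-arriving
# copy of each message and discard late duplicates from slower feeds.
# Message fields "seq", "feed", and "px" are hypothetical.

def consolidate(messages):
    """Yield only the first copy of each message, keyed by sequence number."""
    seen = set()
    for msg in messages:            # messages arrive interleaved, in arrival order
        if msg["seq"] not in seen:
            seen.add(msg["seq"])
            yield msg               # first arrival wins
        # else: late duplicate from a slower feed -- silently dropped

interleaved = [
    {"seq": 1, "feed": "A", "px": 10.0},
    {"seq": 1, "feed": "B", "px": 10.0},   # late duplicate
    {"seq": 2, "feed": "B", "px": 10.1},
    {"seq": 2, "feed": "A", "px": 10.1},   # late duplicate
]
best = list(consolidate(interleaved))
print([m["feed"] for m in best])   # prints ['A', 'B']
```

In a real deployment the `seen` set would need bounded memory (e.g. expiring old sequence numbers), but the core idea, keeping only a key per message rather than the messages themselves, is what makes this feasible at feed rates.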
Q: What prompted your decision to support Sun
Solaris, Linux, and Windows running on Intel servers?
A: These are the platforms that our customers ask us to support.
Q: With StreamBase, you read TCP/IP data
streams producing asynchronous messages and you have APIs for consuming the
messages in customer applications. Without the need to store data, your StreamSQL
creation allows for the processing of data streams producing SQL joins and
aggregates and correlation of multiple streams. You have adaptors for working
with popular financial services’ feed formats and a workflow-based GUI for
rapid application development. Can you further comment on how this works? What
types of time-series operations can be performed?
A: Your question contains a high level description of the StreamBase
engine. Using our GUI, you “drag and
drop” our operators from a palette onto a workspace. When you are satisfied with an application
you can test it with our synthetic message generator. When an application works correctly, you can
deploy it across multiple computer systems (for ultimate scalability). On each system, StreamBase has a real time
scheduler that “pushes” messages through our operators as quickly as
possible. By avoiding process switches
and the necessity of storing the data, we can produce exceptional performance.
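As a rough illustration of the kind of operator such an engine pushes messages through, here is a minimal Python sketch of a sliding-window aggregate (a moving average over the last N ticks), computed incrementally so the full stream is never stored. The function and field names are assumptions for the sketch, not StreamBase APIs:

```python
from collections import deque

def moving_average(stream, window=3):
    """Emit one result per input tick: the mean of the last `window` ticks."""
    buf, total = deque(), 0.0
    for price in stream:
        buf.append(price)
        total += price
        if len(buf) > window:
            total -= buf.popleft()   # evict the oldest tick; O(1) per message
        yield total / len(buf)       # result is pushed downstream immediately

ticks = [10.0, 11.0, 12.0, 13.0]
print(list(moving_average(ticks)))   # prints [10.0, 10.5, 11.0, 12.0]
```

Note that state is bounded by the window size regardless of how long the stream runs, which is the property that lets a stream engine avoid storing data before querying it.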
Q: What are the challenges in solving real-time
and “real-world” data feed processing problems and how do you provide an
effective solution ending in a realized business value proposition? Which
benchmarks can be used?
A: Since StreamBase is a novel paradigm
that requires customers to rethink the way they do things, we usually ask them
to point us at their hardest problem. We
go away and write a StreamBase application to solve this problem and then
return in a few days to show them how we worked on their problem. The customer usually can see how to move our
application into one they can deploy in production. Hence, the most effective benchmark is the
customer’s actual business problem.
Q: Ingres and Postgres are open source, and
you support the model, as with Linux. Describe its future in 5 and 10 years.
A: I think open source DBMSs will capture a substantial piece of the DBMS market
because of their attractive price. I
expect the open source movement to grow healthier over the next decade.
Q: With your deep knowledge of enterprises,
technology and business value, choose three areas that need addressing and
share your views in each area.
Area 1: Constructing a solution that will allow one to
retrieve information from a mix of textual files (e.g. HTML) and structured
data in databases. One can think of this
as providing a Google-like extension to what has been called “the hidden web”,
because it is hidden behind databases. I
wish I had a good idea on how to do this. Obviously, text retrieval is ineffective in the hidden web and SQL (even
extended with text) is not an end-user language.
Area 2: Data warehouses. Customers are
putting data into data warehouses at an accelerating rate and then asking
ad-hoc queries that paw over very large subsets. Many warehouse users are in considerable pain
and I have some ideas on how to provide pain-relief in this area.
Area 3: Semantic heterogeneity. There is great hype surrounding web services
to glue together information and services from different enterprises. However, imagine that you are the French
administrator and your salaries are net pay after taxes, including a lunch
allowance, and in Euros. In contrast, I
am the U.S. administrator and my salaries are gross amounts in dollars. Although these two elements can both be
called “salary”, obviously there is considerable meta-data required to
interpret a value. A web service to
merely read the value from a local database will be unhelpful, because the
reader has no idea what the units are or how to interpret the data. Dealing with schemas that were designed
independently but need to inter-operate is a really hard challenge.
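The salary example can be made concrete with a toy sketch of metadata-driven normalization. The exchange rate and net-to-gross factor below are purely illustrative assumptions, not real figures:

```python
# Toy sketch of the semantic-heterogeneity problem: two "salary" values
# cannot be compared until their metadata (currency, net vs. gross) is
# reconciled. Both conversion constants are illustrative assumptions.

EUR_TO_USD = 1.20          # assumed exchange rate
NET_TO_GROSS = 1.60        # assumed net->gross multiplier (incl. lunch allowance)

def to_gross_usd(record):
    """Normalize a salary record to gross U.S. dollars using its metadata."""
    value = record["salary"]
    if record["currency"] == "EUR":
        value *= EUR_TO_USD
    if record["basis"] == "net":
        value *= NET_TO_GROSS
    return round(value, 2)

fr = {"salary": 30000, "currency": "EUR", "basis": "net"}    # French record
us = {"salary": 60000, "currency": "USD", "basis": "gross"}  # U.S. record
print(to_gross_usd(fr), to_gross_usd(us))   # prints 57600.0 60000
```

The point of the sketch is that the raw numbers 30000 and 60000 are meaningless to compare; a web service that merely reads the values without this metadata cannot interoperate, which is exactly the hard problem described above.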
Q: Mike, it is such a privilege to have you
come in sharing your deep insights. Considering your busy schedule, we
appreciate and thank you for the time you have spent with us.
A: Thank you for your time. It has indeed been a pleasure.