<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dan Weinreb's blog</title>
	<atom:link href="http://danweinreb.org/blog/feed" rel="self" type="application/rss+xml" />
	<link>http://danweinreb.org/blog</link>
	<description>Software and Innovation</description>
	<lastBuildDate>Mon, 05 Sep 2011 20:12:36 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Learning French</title>
		<link>http://danweinreb.org/blog/learning-french</link>
		<comments>http://danweinreb.org/blog/learning-french#comments</comments>
		<pubDate>Mon, 05 Sep 2011 20:12:36 +0000</pubDate>
		<dc:creator>Dan Weinreb</dc:creator>
				<category><![CDATA[Travel]]></category>

		<guid isPermaLink="false">http://danweinreb.org/blog/?p=680</guid>
		<description><![CDATA[I&#8217;m going to France for the next two weeks and thought it would be good to try to learn some of the language. When I was a kid, I had three years of French instruction in school, but it was by far my worst subject. I got some French language instruction CD&#8217;s, which I am [...]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;m going to France for the next two weeks and thought it would be good to try to learn some of the language.  When I was a kid, I had three years of French instruction in school, but it was by far my worst subject.</p>
<p>I got some French language instruction CD&#8217;s, which I am listening to in my car as I commute.  They&#8217;re called &#8220;Michel Thomas Method French for Beginners&#8221;.  I was afraid that these would be just as boring.  In fact, they&#8217;re great fun to listen to and practice with.</p>
<p>The format is that there&#8217;s a French man teaching two students.  After teaching a few words and a bit of grammar, he gives them a sentence to say, and I can try to formulate it before they do (or pause the CD), and then I get the immediate feedback.</p>
<p>When I was a kid learning French in school, we had to recite conjugations (&#8220;er&#8221;, &#8220;ir&#8221;, and &#8220;re&#8221; verbs, in many tenses), which was totally boring.  Well, it turns out that you can say a whole lot of useful stuff without any of that.  You can just use infinitives for all kinds of things (&#8220;je voudrais manger&#8221;).  So you only need to learn the present-tense conjugations of a few verbs and you can say all kinds of useful things.</p>
<p>And you don&#8217;t need to use the future tense at all, since, just as in English you can say &#8220;I am going to eat&#8221;, in French you can say literally the same thing, &#8220;je vais manger&#8221;.  Cool.</p>
<p>Now, whether I&#8217;ll be able to construct sentences on the fly while someone is standing there waiting for me and evaluating whether I&#8217;m right, is another question.  I can easily imagine myself being embarrassed; maybe under pressure I&#8217;ll forget it all.</p>
<p>Also, whether I can understand what&#8217;s spoken to me is yet another issue.</p>
<p>I downloaded some French-English apps for the Android phone, but so far haven&#8217;t found any free apps better than Google&#8217;s own Translate.  I&#8217;d rather have one that doesn&#8217;t depend on network access, though.  I&#8217;m still looking.</p>
<p>Anyway, doing this is fun, which is perhaps what really matters.  So if you had the same negative language-learning experience as I did, try doing it this way.</p>
]]></content:encoded>
			<wfw:commentRss>http://danweinreb.org/blog/learning-french/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Seed Funding and Angel Groups: The Fast and The Furious</title>
		<link>http://danweinreb.org/blog/seed-funding-and-angel-groups-the-fast-and-the-furious</link>
		<comments>http://danweinreb.org/blog/seed-funding-and-angel-groups-the-fast-and-the-furious#comments</comments>
		<pubDate>Wed, 29 Jun 2011 11:06:49 +0000</pubDate>
		<dc:creator>Dan Weinreb</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://danweinreb.org/blog/?p=674</guid>
		<description><![CDATA[I have written a blog post called Seed Funding and Angel Groups: The Fast and The Furious, which was posted on Dharmesh Shah&#8217;s On Startups blog. It&#8217;s about the speed at which entrepreneurs can acquire seed financing, whether angel groups or venture capital partnerships can move faster, and how much all this matters. I [...]]]></description>
				<content:encoded><![CDATA[<p>I have written a blog post called <a href="http://onstartups.com/tabid/3339/bid/57145/Seed-Funding-and-Angel-Groups-The-Fast-and-The-Furious.aspx">Seed Funding and Angel Groups: The Fast and The Furious</a>, which was posted on Dharmesh Shah&#8217;s <a href="http://onstartups.com">On Startups blog</a>.  It&#8217;s about the speed at which entrepreneurs can acquire seed financing, whether angel groups or venture capital partnerships can move faster, and how much all this matters.</p>
<p>I put it on Dharmesh&#8217;s blog at his request.  If you have comments, and if it&#8217;s all the same to you, it&#8217;s probably better to put them on his blog rather than here, just to keep all comments in the same place.</p>
]]></content:encoded>
			<wfw:commentRss>http://danweinreb.org/blog/seed-funding-and-angel-groups-the-fast-and-the-furious/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comments on &#8220;Urban Myths about NoSQL&#8221;</title>
		<link>http://danweinreb.org/blog/657</link>
		<comments>http://danweinreb.org/blog/657#comments</comments>
		<pubDate>Fri, 17 Jun 2011 12:41:44 +0000</pubDate>
		<dc:creator>Dan Weinreb</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://danweinreb.org/blog/?p=657</guid>
		<description><![CDATA[Dr. Michael Stonebraker recently posted a presentation entitled &#8220;Urban Myths about NoSQL&#8221;. Its primary point is to defend SQL, i.e. relational, database systems against the claims of the new &#8220;NoSQL&#8221; data stores. Dr. Stonebraker is one of the original inventors of relational database technology, and has been one of the most eminent database researchers and [...]]]></description>
				<content:encoded><![CDATA[<p>Dr. Michael Stonebraker recently posted a <a href="http://voltdb.com/_pdf/VoltDB-MikeStonebraker-SQLMythsWebinar-060310.pdf">presentation entitled &#8220;Urban Myths about NoSQL&#8221;</a>.  Its primary point is to defend SQL, i.e. relational, database systems against the claims of the new &#8220;NoSQL&#8221; data stores.  Dr. Stonebraker is one of the original inventors of relational database technology, and has been one of the most eminent database researchers and practitioners for decades.</p>
<p>Many of the virtues of relational databases described here are specifically about a new and highly innovative RDBMS called <a href="http://voltdb.com">VoltDB</a>.  VoltDB is made by a company called VoltDB.com, of which Dr. Stonebraker is co-founder and CTO. (There is also a good writeup about VoltDB <a href="http://highscalability.com/blog/2010/6/28/voltdb-decapitates-six-sql-urban-myths-and-delivers-internet.html">here</a>.)</p>
<p>The following are some comments about four of the six points in the presentation.  I don&#8217;t consider any of these to &#8220;debunk&#8221; the presentation or anything like that, but they point out considerations that I feel should be taken into account.</p>
<p><strong>#1: SQL is too slow:<br />
</strong><br />
This argument assumes a perfect (or excellent) query optimizer.  Talk to anyone who has ever built a high-performance system on Oracle or DB2, and you will find out about serious problems in query optimizers.  I am not saying that rolling your own C code is the answer, but query strategies often have to be provided explicitly by the developer or DBA.</p>
<p>Stored procedures have a serious problem: you can&#8217;t interleave your own code with database operations.  This can particularly be a problem if each stored procedure is its own transaction rather than an operation within a transaction, as in VoltDB.  Existing large systems may not be able to operate within that constraint, although new systems designed with that in mind might not have any problem with this.</p>
<p>The &#8220;to go a lot faster&#8221; point requires the whole database to be in main memory, as it is with VoltDB.  (The points on the slides here do not apply to RDBMSs other than VoltDB.)  The reason VoltDB can get rid of buffer management is that there are no (disk) buffers.  VoltDB need not do lock management because there is no concurrency control: you just run every transaction to completion, since with no I/O waits there is no reason to interleave transactions.</p>
<p>This is great if it works for your application.  In point #5, he says that most OLTP databases are not very big, e.g. &lt; 1TB, and for a database that size, using main memory is quite feasible these days.  The size requirements for OLTP databases will probably rise with time.  Of course, computers and memory are also getting faster and larger for the same price.</p>
<p><strong>#3: SQL Systems don&#8217;t scale</strong></p>
<p>If you have ever been involved in benchmarking, you know how difficult it is to interpret benchmark results.  Is it possible that these results were obtained by choosing a benchmark that is particularly favorable to VoltDB?  The only benchmark that really matters is your own application: they are all different.  Of course, the problem with that is that it&#8217;s hard to port your application merely to test performance.  But ignoring that and looking at other benchmarks is like looking for a lost key under the streetlight because it&#8217;s easier to look there.  I&#8217;m not saying that these numbers are misleading, and certainly not that they are intentionally misleading, but they are very hard to interpret without knowing exactly what was benchmarked, how everything was tuned, and so on.  I say this from my own experience, having done benchmarking of database systems for years.</p>
<p>(Also note that by TPC-C, he does not mean the officially defined TPC-C benchmark; look it up and you&#8217;ll see that it is a huge, major project to do it.  He means a very simplified example based on the key concepts in TPC-C.  (You can see this in the academic papers by him and others.)  That said, if you do want a micro-benchmark that is as close as possible to what people agree is a good measure of online transaction performance, this might be the best one can do.)</p>
<p><strong>#5: ACID is too slow</strong></p>
<p>ACID is great for software developers, providing them a very clean and easy-to-understand model.  Ease of understanding is crucial for achieving simplicity, which is the Holy Grail of software development, enhancing maintainability and correctness.  I&#8217;m all for ACID.</p>
<p>To clarify something often not explained well: the NoSQL stores are ACID.  It&#8217;s just that what they can do within one ACID transaction is usually quite limited.  For example, a transaction might only be able to fetch a value (or store a value, or increment a value) given the key, and then the transaction is over.  That operation is ACID.</p>
<p>In a classic RDBMS, you can do many operations within one transaction.  Your program says &#8220;begin transaction&#8221; (sometimes this is tacit), and then you can do computations that include both code and database queries/updates, interleaved.  At the end you say &#8220;commit transaction&#8221;.  (During or at the end of a transaction, the DBMS might have to abort the transaction.)</p>
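<p>As a rough illustration of that interleaving, here is a minimal sketch using Python&#8217;s built-in sqlite3 module.  The account table, the names, and the transfer function are all made up for this example; they are assumptions for illustration, not taken from any system discussed here.</p>

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK statements below are the only transaction control.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 20)])


def transfer(con, src, dst, amount):
    """Move money between two hypothetical accounts inside one transaction."""
    con.execute("BEGIN")
    try:
        row = con.execute(
            "SELECT balance FROM account WHERE name = ?", (src,)
        ).fetchone()
        balance = row[0]
        # Ordinary application code, running between database operations.
        if amount > balance:
            raise ValueError("insufficient funds")
        con.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        con.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        con.execute("COMMIT")
    except Exception:
        con.execute("ROLLBACK")  # the application aborts the transaction
        raise


transfer(con, "alice", "bob", 30)
balances = dict(con.execute("SELECT name, balance FROM account"))
print(balances)
```

<p>The balance check in the middle is plain application logic sitting between the query and the updates, all inside one transaction.  With a store whose transactions are single key operations, that check and the two updates could not be grouped this way.</p>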
<p>Right now, very few DBMSs provide true ACID properties in the way they are really used in practice, for two reasons.  First, they run at reduced &#8220;isolation levels&#8221;, which means that the &#8220;I&#8221; in ACID is compromised.  See my <a href="http://danweinreb.org/blog/nosql-storage-systems-never-violate-acid-never-well-hardly-ever">blog article</a> for an explanation of this.</p>
<p>Second, one often wants to provide a way to recover from the failure of an entire data center.  This is done by having a second data center that is far enough away that it won&#8217;t be damaged by the failure of the primary data center.  This means you can keep going in the face of a &#8220;disaster&#8221; such as a regional power outage, a tsunami, etc.</p>
<p>The problem is that if the data center is far enough away to have truly independent failure modes, then the network connection will have latency so high that it is not feasible to do synchronous commits that update the distant copy for every transaction.  Most often, commit results are sent asynchronously to the distant copy.  If the local data center fails, any transactions that had been committed, but had not yet reached the distant copy, are lost.  So these transactions were not durable, which compromises the &#8220;D&#8221; in ACID.  So there is a tradeoff here.  (People live with this by being willing to do manual fixups in the face of a disaster.)</p>
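<p>To make the lost-commit window concrete, here is a toy sketch in Python.  The Primary class, its outbox queue, and the ship_one method are hypothetical names invented for this example; real replication protocols are far more involved.</p>

```python
from collections import deque


class Primary:
    """Toy primary data center that replicates commits asynchronously."""

    def __init__(self):
        self.committed = []    # transactions already acknowledged to clients
        self.outbox = deque()  # commits not yet shipped to the distant copy

    def commit(self, txn):
        self.committed.append(txn)  # the client is told "committed" right away
        self.outbox.append(txn)     # shipping to the distant copy happens later

    def ship_one(self, replica):
        """Send the oldest unshipped commit to the distant copy, if any."""
        if self.outbox:
            replica.append(self.outbox.popleft())


primary = Primary()
replica = []  # stands in for the distant data center

for txn in ["t1", "t2", "t3"]:
    primary.commit(txn)

primary.ship_one(replica)  # only t1 crosses the slow link in time

# The primary data center fails here: whatever is still queued never arrives.
lost = [t for t in primary.committed if t not in replica]
print(lost)
```

<p>In this run, t2 and t3 were acknowledged as committed but never reached the distant copy, which is exactly the gap in durability described above.</p>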
<p>As discussed above, VoltDB transactions do not allow you to interleave code in your application with database operations.  (The stored procedures can run arbitrary code, in Java, but that&#8217;s not the same as what I described above.)</p>
<p><strong>#6: In CAP, choose AP over CA</strong></p>
<p>I disagree that network partitions are not a major concern.  Very simple local-area networks do not suffer from partitions and network failures much, but even a medium-size network is vulnerable, and networks in large data centers are quite vulnerable, as you can easily learn from network operations experts.  For example, routers fail, or are misconfigured.</p>
<p>Both Amazon and Google have published papers about their large-scale data stores.  The papers talk a lot about how they deal with network partitions.  If partitions were so unlikely, why are these large companies taking the problem so seriously, and using rather sophisticated techniques to deal with the partitions?  Also, the study of how to deal with network partitions has been a hot topic of research for the last 35 years; again, why would that be true if partitions were not an important concern?</p>
<p>So, as your network becomes larger and more complex, dealing with partitions becomes more and more of an issue.  My impression (I may be wrong) is that the &#8220;sweet spot&#8221; for VoltDB, at least at the moment, is for distributed systems that are not at the kind of very-large scale of an Amazon or Google, and indeed for a much smaller scale, which makes network partitions much less of a problem.  There&#8217;s nothing wrong with this at all; I&#8217;m just trying to clarify the issue and explain the reason for the controversy about this point.</p>
<p><strong>Final Note</strong></p>
<p>There has been an exciting explosion of innovative database technology in the last few years.  Many different kinds of applications have different requirements.  It&#8217;s great news for all of us that there are so many solutions at different points in the requirement space.</p>
]]></content:encoded>
			<wfw:commentRss>http://danweinreb.org/blog/657/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>What are &#8220;Human-Generated Data&#8221; and &#8220;In-RAM Databases&#8221;?</title>
		<link>http://danweinreb.org/blog/what-are-human-generated-data-and-in-ram-databases</link>
		<comments>http://danweinreb.org/blog/what-are-human-generated-data-and-in-ram-databases#comments</comments>
		<pubDate>Tue, 24 May 2011 14:13:00 +0000</pubDate>
		<dc:creator>Dan Weinreb</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://danweinreb.org/blog/?p=645</guid>
		<description><![CDATA[For thoughtful commentary on all kinds of database and data storage systems, one of the best sources is Curt Monash&#8217;s DBMS2 blog.  Recently he posted an article called Traditional Databases will eventually wind up in RAM.  I have two comments about his points from that article. Human-Generated Data I&#8217;m still not totally comfortable with Curt&#8217;s [...]]]></description>
				<content:encoded><![CDATA[<p>For thoughtful commentary on all kinds of database and data storage systems, one of the best sources is <a href="http://www.dbms2.com/" target="_blank">Curt Monash&#8217;s DBMS2 blog</a>.  Recently he posted an article called <a href="http://www.dbms2.com/2011/05/23/databases-ram/" target="_blank">Traditional Databases will eventually wind up in RAM</a>.  I have two comments about his points from that article.</p>
<p><strong>Human-Generated Data</strong></p>
<p>I&#8217;m still not totally comfortable with Curt&#8217;s distinction between &#8220;human-generated&#8221; and &#8220;machine-generated&#8221; data.  Data from humans always goes through machines, so at some level all data is machine-generated. I think what he&#8217;s saying is that the number of humans is roughly constant (on the time scale he means), and they only have so much time in a day to key in data, etc.  But what about trends that create more bits from any particular bit of human activity?<br />
In the old days, records in databases were created when a person &#8220;keyed in&#8221; some fields.  Now, data is generated every time you click on something.  As data systems increase in capacity, won&#8217;t computers start gathering more and more data for each human interaction?  For example, every time I click, the system records what I clicked on, plus such context as the entire contents of what was on my browser screen, how long it was since my last click, plus the times for each of the previous 1,000 clicks, everything it currently (this keeps changing) knows about my buying habits, etc.<br />
That may be far-fetched, but I&#8217;m not so sure: betting on things staying the same size as they are has usually turned out to be less than prescient.  In any case, the underlying principle is analogous to the &#8220;Freeway Effect&#8221;: no matter how much data rates and database capacities grow, there will never be &#8220;enough&#8221;.</p>
<p>We&#8217;ll find more data to transmit and more to store, forever and ever.</p>
<p><strong>In-RAM Database Systems</strong></p>
<p>Having a database &#8220;in RAM&#8221; can mean more than one thing.</p>
<p>In traditional DBMS design, data &#8220;in RAM&#8221; is vulnerable to a very common failure mode, namely, the machine crashing.  So no database data is considered to be durable (the &#8220;D&#8221; in &#8220;ACID&#8221;) until it has been written to disk, which is less vulnerable, especially if you use RAID, etc.  So traditionally writes are sent to a log and forced to disk.  You can still keep the data itself in RAM, but recovery from the log will take longer and longer as the log grows in size, so you &#8220;checkpoint&#8221; the data by writing it out to disk.  That can be done in the background if everything is designed properly.  This is utterly standard.</p>
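<p>Here is a minimal sketch of that log-plus-checkpoint scheme in Python.  The TinyStore class and its file names are invented for illustration, and it makes simplifying assumptions: the checkpoint runs in the foreground rather than the background, and there is only a single writer.</p>

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store: data lives in RAM, durability comes from a log."""

    def __init__(self, directory):
        self.log_path = os.path.join(directory, "log.jsonl")
        self.ckpt_path = os.path.join(directory, "checkpoint.json")
        self.data = {}
        self.recover()
        self.log = open(self.log_path, "a")

    def put(self, key, value):
        # Write the log record and force it to disk before touching RAM,
        # so an acknowledged write survives a machine crash.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def checkpoint(self):
        # Write the whole in-RAM state to disk, then start a fresh log so
        # that recovery time does not keep growing with the log.
        tmp_path = self.ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.ckpt_path)
        self.log.close()
        self.log = open(self.log_path, "w")  # truncates the old log

    def recover(self):
        # Load the last checkpoint, then replay whatever the log holds.
        if os.path.exists(self.ckpt_path):
            with open(self.ckpt_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value


directory = tempfile.mkdtemp()
store = TinyStore(directory)
store.put("a", 1)
store.checkpoint()
store.put("b", 2)

# Simulate a crash: forget the in-RAM state and rebuild it from disk.
recovered = TinyStore(directory)
print(recovered.data)
```

<p>After the simulated crash, the value for "a" comes back from the checkpoint and the value for "b" comes back from replaying the log.</p>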
<p>It&#8217;s also traditional that there isn&#8217;t enough RAM to hold the whole database, so RAM is used as a cache.  This creates some issues when you have to write a modified page back to disk NOT as part of a checkpoint, and there are very standard ways to deal with that.</p>
<p>&#8220;In RAM&#8221; can mean (a) as above, but usually/always the RAM cache is so big that you never overflow the cache; (b) the database system is designed so that data must fit in RAM, which can simplify buffer management and recovery algorithms; (c) you get around the machine-crash problem some way or other and really do keep everything only in RAM.</p>
<p>One way to do (c) is to keep all data in (at least) two copies, such that they&#8217;ll never both be down.  This requires that the machines (1) have very, very independent failure modes, which is not as easy to do as one might think, and (2) get fixed very quickly, since while one is down you have fewer copies.  Issue (2) is one reason to keep more than two copies; usually three copies are recommended, with one being at a &#8220;distant&#8221; data center.</p>
<p>This approach can be used for the log even if not for the whole DBMS.  HDFS, the Hadoop Distributed File System, and VoltDB consider this the preferred/canonical way to go.  In both cases, some users still feel uncomfortable with approach (c), and so both have put in ways to commit the log to a conventional disk.  The hope is that as approach (c) proves itself in real production environments over the years, it will be more and more accepted.</p>
]]></content:encoded>
			<wfw:commentRss>http://danweinreb.org/blog/what-are-human-generated-data-and-in-ram-databases/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Come to see SPACE OPERA!</title>
		<link>http://danweinreb.org/blog/come-to-see-space-opera</link>
		<comments>http://danweinreb.org/blog/come-to-see-space-opera#comments</comments>
		<pubDate>Mon, 04 Apr 2011 12:58:53 +0000</pubDate>
		<dc:creator>Dan Weinreb</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://danweinreb.org/blog/?p=634</guid>
		<description><![CDATA[Please come to the North Cambridge Family Opera&#8217;s production of SPACE OPERA by NCFO founder David Bass. Space Opera is a light-hearted galactic odyssey, based on a familiar science fiction tale of heroes and villains, robots and aliens, unlikely adventures, and supernatural nonsense. Featuring entertaining lyrics set to singable music in a variety of popular [...]]]></description>
				<content:encoded><![CDATA[<p>Please come to the North Cambridge Family Opera&#8217;s production of</p>
<p style="text-align: center;">SPACE OPERA</p>
<p>by NCFO founder David Bass.  Space Opera is a light-hearted galactic odyssey, based on a familiar science fiction tale of heroes and villains, robots and aliens, unlikely adventures, and supernatural nonsense.</p>
<p>Featuring entertaining lyrics set to singable music in a variety of popular and classical styles, Space Opera is presented by an inter-generational cast in English with side titles.  This entirely sung 110-minute show (plus intermission) is full of dancing Stormtroopers, singing Jawas, droids of all shapes and sizes, and a cantina (with a live band!) that is indeed a wretched hive of scum and villainy. The sets and lighting add some stunning effects to the show. <a href="http://www.familyopera.org/drupal/node/86">Here is a synopsis with samples of the music.</a></p>
<p>Space Opera will be performed eight times at the Peabody School, 70 Rindge Avenue, Cambridge MA, which is in <a href="http://www.familyopera.org/drupal/node/peabody">North Cambridge, between Porter Square and Arlington</a>.  The first four shows have already happened; upcoming shows are:</p>
<ul>
<li>Saturday, April 9 at 2:00pm</li>
<li>Saturday, April 9 at 7:00pm</li>
<li>Sunday, April 10 at 1:00pm</li>
<li>Sunday, April 10 at 5:30pm</li>
</ul>
<p>The cast of more than 150 soloists and chorus members is drawn from many Greater Boston communities and ranges in age from 7 to 84. They are divided into two casts, and both are excellent, but I&#8217;m partial to the cast performing in shows 1, 3, 6 and 8 because I&#8217;ll be performing (as Owen Lars, Luke&#8217;s uncle).  Information on which cast members perform in which shows can be found <a href="http://www.FamilyOpera.org">here.</a></p>
<p>Admission this year is free, with a suggested donation of $5 for children, $10 for adults. Snacks will be available for purchase at intermission, as well as pizza at the Sunday 5:30pm shows.  T-shirts, CDs and DVDs will also be available for purchase at intermission.</p>
<p>Come early to make sure you get a seat! For more information about NCFO, visit <a href="http://www.FamilyOpera.org">www.FamilyOpera.org</a>, and please forward this information to anyone you think may be interested.</p>
<p>We&#8217;d love it if you RSVP on our <a href="http://www.facebook.com/event.php?eid=210766615604782">Facebook page</a>.</p>
<p>Feel free to invite your friends and help us spread the word.</p>
<p>See you at the show!</p>
]]></content:encoded>
			<wfw:commentRss>http://danweinreb.org/blog/come-to-see-space-opera/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Participate in SPLASH 2011!</title>
		<link>http://danweinreb.org/blog/participate-in-splash-2011</link>
		<comments>http://danweinreb.org/blog/participate-in-splash-2011#comments</comments>
		<pubDate>Fri, 11 Mar 2011 14:16:00 +0000</pubDate>
		<dc:creator>Dan Weinreb</dc:creator>
				<category><![CDATA[Conference]]></category>
		<category><![CDATA[Event]]></category>
		<category><![CDATA[OOPSLA]]></category>

		<guid isPermaLink="false">http://danweinreb.org/blog/?p=625</guid>
		<description><![CDATA[The SPLASH Conference is my favorite technical conference every year. Wait, don&#8217;t stop reading just because you&#8217;re not a &#8220;researcher&#8221;! I started this blog so that I could post a trip report from the conference in 2007. The name has since changed from &#8220;OOPSLA&#8221; to &#8220;SPLASH&#8221;. Actually it now has three tracks. Here&#8217;s what each [...]]]></description>
				<content:encoded><![CDATA[<p>The <a href="http://splashcon.org/2011">SPLASH Conference</a> is my favorite technical conference every year.  Wait, don&#8217;t stop reading just because you&#8217;re not a &#8220;researcher&#8221;!</p>
<p>I started this blog so that I could post a trip report from the <a href="http://danweinreb.org/blog/3">conference in 2007</a>.  The name has since changed from &#8220;OOPSLA&#8221; to &#8220;SPLASH&#8221;.  Actually it now has three tracks.  Here&#8217;s what each is about:</p>
<ul>
<li>OOPSLA: High quality research work that uses established scientific methodologies, written using high standards of academic technical publications.</li>
<li>Wavefront: Original and innovative architecture, design, and/or implementation techniques used in actual leading-edge software systems.  Our goal with Wavefront is to engage the software developers who are actually creating the next generation of software systems and to make sure that their innovations are captured in the technical archives of computing.</li>
<li>Onward!: Innovative ideas that challenge existing beliefs, or early work well written and well argued for; essays and ideas worth hearing about. (This includes things too &#8220;far out&#8221; to get published in existing journals and other conferences.)</li>
</ul>
<p>If I am judging my own audience properly, Wavefront is what most of you would be interested in.  <a href="http://www.wirfs-brock.com/allen/posts/204">Here is a great blog post</a> by Allen Wirfs-Brock, who has been a leader in object-oriented and dynamic programming for decades.  The Wavefront track is meant to revive what made the early OOPSLA conferences so energetic and valuable: interaction between researchers and people out there getting stuff done.  Both groups have a lot to tell, and learn from, the other.</p>
<p>I am the &#8220;Panels Chair&#8221;, and I&#8217;m actively seeking people who would like to lead a panel, be on a panel, and/or suggest topics for panels.  Panels are fun to be on.  It&#8217;s far less work than submitting a paper.  All you have to do is come up with five minutes of something to say, after which it&#8217;s all questions and answers.  Let me know!  Tell your friends!</p>
<p>The overall theme of the conference is <em>The Internet as the world-wide Virtual Machine</em>.  This theme captures the change in the order of magnitude of computing that happened over the past few years. These days, software systems are rarely designed in isolation; they connect to pieces written by 3rd parties, they communicate with other pieces over the Internet, they use big data produced elsewhere, and they touch millions of interacting users through an ever larger variety of physical devices. In other words, the “machine” is now a global computing network. What does this entail for software development itself?</p>
<p>SPLASH 2011 will be in Portland, OR, October 22 &#8211; 27; the main tracks will probably be from Tuesday Oct 25 to Thursday Oct 27.  The information is all here, including the calls for papers for each track. If you want to submit a paper, the deadline is April 8, 2011.</p>
<p>Just to get you started, and to show how broad the scope of the conference is, here are some possible areas.  You could come up with something that impinges on one of these, but that&#8217;s not necessary.  Panels can be in any of the tracks of SPLASH (OOPSLA, Wavefront, or Onward!).</p>
<ul>
<li>Any aspect of software development, including prototyping, design, testing, evaluation, maintenance, reuse, static or dynamic analysis, frameworks and toolkits.</li>
<li>Language design issues, such as dynamic or static programming, type systems and type inference, use of modularity and parallelism, patterns.  Dynamic languages are welcome.  JavaScript, in particular, has become important lately, but any language is fine.</li>
<li>Language implementation issues: virtual machines, garbage collectors, compilers/interpreters, power efficiency.</li>
<li>Tools designed to reduce the time, effort, and/or cost of software systems.</li>
<li>And any of a wide range of topics: cloud computing and web platforms, mobile platforms, security and privacy issues, UI technology, location-awareness, storage, reliability.</li>
</ul>
<p>I hope to see you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://danweinreb.org/blog/participate-in-splash-2011/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Egyptian Revolution Was Organized</title>
		<link>http://danweinreb.org/blog/the-egyptian-revolution-was-organized</link>
		<comments>http://danweinreb.org/blog/the-egyptian-revolution-was-organized#comments</comments>
		<pubDate>Thu, 10 Feb 2011 22:04:00 +0000</pubDate>
		<dc:creator>Dan Weinreb</dc:creator>
				<category><![CDATA[Politics]]></category>

		<guid isPermaLink="false">http://danweinreb.org/blog/?p=620</guid>
		<description><![CDATA[[Note added much later, 5/31/2011: I recently learned more about what actually happened in Egypt, and now I don't believe what I say in this post. In fact, Facebook played a very major role in organizing the demonstrations.  It turns out that you can propose a demonstration using Facebook and get a lot of people [...]]]></description>
				<content:encoded><![CDATA[<p><em>[Note added much later, 5/31/2011: I recently learned more about what actually happened in Egypt, and now I don't believe what I say in this post. In fact, Facebook played a very major role in organizing the demonstrations.  It turns out that you can propose a demonstration using Facebook and get a lot of people to come to it, and that can have a huge role in triggering a revolt, even if the people know that there might be some danger.]</em></p>
<p>At least in the beginning, the current revolution in Egypt was guided by a small team.  It was not entirely a spontaneous uprising.  How do these things happen?</p>
<p>A few months ago, Malcolm Gladwell wrote <a href="http://www.newyorker.com/reporting/2010/10/04/101004fa_fact_gladwell?currentPage=all">an article in The New Yorker</a> about the use of Internet social media for social activism.  He basically debunks a lot of what has been said about the use of Facebook and Twitter in recent revolutions.  You may remember hearing about the great importance of Twitter in the Iran uprisings, e.g. how it was used to get information out of Iran, or allow the protesters to work together.  He points out:<br />
The people tweeting about the demonstrations were almost all in the West. &#8220;It is time to get Twitter&#8217;s role in the events in Iran right,&#8221; Golnaz Esfandiari wrote, this past summer, in Foreign Policy. &#8220;Simply put: There was no Twitter Revolution inside Iran.&#8221; The cadre of prominent bloggers, like Andrew Sullivan, who championed the role of social media in Iran, Esfandiari continued, misunderstood the situation. &#8220;Western journalists who couldn&#8217;t reach &#8212; or didn&#8217;t bother reaching &#8212; people on the ground in Iran simply scrolled through the English-language tweets post with tag #iranelection,&#8221; she wrote. &#8220;Through it all, no one seemed to wonder why people trying to coordinate protests in Iran would be writing in any language other than Farsi.&#8221;</p>
<p>Gladwell shows, in some detail, how the civil rights movement in the United States in the 1960s worked.  The volunteers had very strong personal connections.  What they were doing was extremely risky and required a lot of physical courage.  Making this work required organization like that of a military campaign, with strong central leadership and careful planning.</p>
<p>Today, the New York Times ran <a href="http://www.nytimes.com/2011/02/10/world/middleeast/10youth.html?hp">an article about how the present Egyptian uprisings started.</a> It fits the very model that Gladwell described a few months ago: a small central group, starting with careful planning and pre-testing.  Anyone looking at what&#8217;s been happening in Egypt for the last few weeks could easily think that it was all disorganized and spontaneous.  But the organization wasn&#8217;t easily apparent; in fact, the organizers kept their activities secret as long as possible, for obvious reasons.  It&#8217;s only now that we are finding out in more detail what happened.</p>
<p>I see this as a vindication of, or at least a strong data-point in support of, the argument Gladwell put forth in his article.  As Gladwell points out, social media such as Facebook, with their weaker but more widespread links and their decentralization, are excellent for some purposes, just not for these, and we should not jump to conclusions.</p>
<p>By the way, when the protests started, I was not optimistic about success.  Success is very difficult.  I expected brutal retaliation followed by the regime&#8217;s becoming even more autocratic and harsh.  I am very surprised and very pleased at what has happened.  The people of Egypt have worked very hard and taken considerable risk to make this happen.  As I write this, the news is that Mubarak is still refusing to step down, despite expectations earlier today that he would do so.  He refused to give any assurances that he would step down in September.  It&#8217;s hard for me to see why any such &#8220;assurances&#8221; would carry weight anyway; if the protests stop, he can just carry on business as usual.</p>
<p>Not surprisingly, he is blaming &#8220;foreigners&#8221; despite the obvious fact that the protesters are ordinary Egyptians.  This is very, very different from foreign overthrows such as those of Mohammad Mossaddegh in Iran in 1953, or of Jacobo Arbenz in Guatemala in 1954, which were staged primarily by the C.I.A.  I have read quite extensively about those events and there is absolutely no way that the present events in Egypt match that pattern.</p>
<p>I continue to hope for their success.</p>
]]></content:encoded>
			<wfw:commentRss>http://danweinreb.org/blog/the-egyptian-revolution-was-organized/feed</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Improving the PACELC Taxonomy</title>
		<link>http://danweinreb.org/blog/improving-the-pacelc-taxonomy</link>
		<comments>http://danweinreb.org/blog/improving-the-pacelc-taxonomy#comments</comments>
		<pubDate>Wed, 12 Jan 2011 20:51:47 +0000</pubDate>
		<dc:creator>Dan Weinreb</dc:creator>
				<category><![CDATA[Concurrency]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[High Availabilty]]></category>
		<category><![CDATA[NoSQL]]></category>

		<guid isPermaLink="false">http://danweinreb.org/blog/?p=607</guid>
		<description><![CDATA[Daniel Abadi of Yale published a blog article last April criticizing the characterization of distributed storage systems using &#8220;you only get two of C, A, and P&#8221;. He proposed a new taxonomy with the acronym PACELC, to be read &#8220;If P, then trade off A for C; if E, trade off L for C&#8221;, with [...]]]></description>
				<content:encoded><![CDATA[<p>Daniel Abadi of Yale published a <a href="http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html">blog article last April</a> criticizing the characterization of distributed storage systems using &#8220;you only get two of C, A, and P&#8221;.  He proposed a new taxonomy with the acronym PACELC, to be read &#8220;If P, then trade off A for C; if E, trade off L for C&#8221;, with this meaning:</p>
<ul>
<li>When there is a partition, how does the system trade off between:</li>
<ul>
<li>Availability and</li>
<li>Consistency</li>
</ul>
</ul>
<ul>
<li>Else, when there are no partitions, how does the system trade off between:</li>
<ul>
<li>Latency and</li>
<li>Consistency</li>
</ul>
</ul>
<p>For example, a system characterized as PA/EL means that in the face of partitions, it favors availability over consistency, and if everything&#8217;s working, it favors low latency over consistency.</p>
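<p>To make the notation concrete, here is a minimal sketch in Python that expands a PACELC label into the sentence it stands for.  The <code>Pacelc</code> class and its field names are hypothetical helpers invented for this illustration; they do not come from Prof. Abadi&#8217;s article.</p>

```python
from dataclasses import dataclass

# Hypothetical helper, not from Abadi's article: it only spells out the notation.
_ON_PARTITION = {"A": "availability", "C": "consistency"}
_ELSE = {"L": "low latency", "C": "consistency"}


@dataclass(frozen=True)
class Pacelc:
    on_partition: str  # "A" or "C": what the system favors during a partition
    otherwise: str     # "L" or "C": what it favors when there is no partition

    def __post_init__(self):
        if self.on_partition not in _ON_PARTITION or self.otherwise not in _ELSE:
            raise ValueError("expected P in {A, C} and E in {L, C}")

    def label(self):
        return "P" + self.on_partition + "/E" + self.otherwise

    def describe(self):
        p_choice = _ON_PARTITION[self.on_partition]
        p_other = "consistency" if self.on_partition == "A" else "availability"
        e_choice = _ELSE[self.otherwise]
        e_other = "consistency" if self.otherwise == "L" else "low latency"
        return (
            self.label() + ": under a partition, favors " + p_choice
            + " over " + p_other + "; otherwise, favors " + e_choice
            + " over " + e_other
        )


print(Pacelc("A", "L").describe())
```

<p>Only four labels are possible (PA/EL, PA/EC, PC/EL, PC/EC), which is part of why the taxonomy is easy to apply.</p>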
<p>I think this is moving very much in the right direction, and I hope I can contribute and help develop these ideas a bit.</p>
<p><b>Problems with the &#8220;Proof of the CAP theorem&#8221;</b></p>
<p>The &#8220;CAP&#8221; characterization has a lot of problems.  It is especially poorly applied, if not actually misused, when someone trots out the &#8220;proof of the CAP theorem&#8221; to show how they were forced into a tradeoff.  While the proof is correct, what it proves is too crude to model what we really care about.</p>
<p>I discussed the proof in an <a href="http://danweinreb.org/blog/what-does-the-proof-of-the-cap-theorem-mean">earlier post.</a>  In the proof, each attribute is problematic:</p>
<ol>
<li>&#8220;Consistency&#8221; means the behavior that would have happened on a single server that never crashed and did each operation in serial, which is fine, but lack of consistency means that the system makes <i>no</i> guarantees or representations about the result of an operation <i>whatsoever</i>.</li>
<li>&#8220;Available&#8221; means that you get an answer &#8220;eventually&#8221;, but since eventually can mean any amount of time (a trillion years), there&#8217;s no practical difference between A and not A.</li>
<li>&#8220;Partition-tolerance&#8221; is never actually defined in the paper.</li>
</ol>
<p><b>&#8220;E&#8221; implies &#8220;A&#8221;</b></p>
<p>The reason the acronym doesn&#8217;t need to be &#8220;PACELCA&#8221; is that if there are no partitions, then the system must be available.  Adding an &#8220;A&#8221; to the second part is redundant.  But for me (maybe not for you), putting the redundant &#8220;A&#8221; in the &#8220;E&#8221; case helps.  A PA/EL system is always &#8220;available&#8221;, and calling it PA/ELA makes it easier for me to see that availability is always there.</p>
<p><b>How do Availability and Latency relate?</b></p>
<p>Consider what &#8220;highly available&#8221; and &#8220;low latency&#8221; mean.  They are not entirely distinct and orthogonal.  The only useful meaning of &#8220;A&#8221; is that the system replies within a maximum latency.  It could be something like &#8220;response within 10ms at least 90% of the time and within 100ms in any case&#8221; rather than a simple deadline.  We can call this &#8220;fast enough&#8221; to meet the system requirements.  So availability is about latency.</p>
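<p>A requirement of this two-part shape can be checked mechanically against a set of measured response times.  The following is a minimal sketch in Python; the function name <code>meets_requirement</code> and its parameters are assumptions made for this example, with defaults taken from the numbers in the paragraph above.</p>

```python
def meets_requirement(latencies_ms, soft_ms=10.0, soft_fraction=0.9, hard_ms=100.0):
    """Return True if the samples satisfy a two-part latency requirement.

    Hypothetical helper for illustration: at least `soft_fraction` of the
    responses must arrive within `soft_ms`, and every response must arrive
    within `hard_ms`.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    within_soft = sum(1 for t in latencies_ms if t <= soft_ms)
    all_within_hard = max(latencies_ms) <= hard_ms
    return all_within_hard and within_soft / len(latencies_ms) >= soft_fraction


# 19 of 20 responses are fast, and the slow one still beats the hard deadline.
mostly_fast = [5.0] * 19 + [80.0]
# Same mix, but the slow response misses the hard deadline.
one_too_slow = [5.0] * 19 + [150.0]

print(meets_requirement(mostly_fast))
print(meets_requirement(one_too_slow))
```

<p>Under this reading, a system is &#8220;available&#8221; exactly when such a check passes; nothing in the check says what caused a slow response.</p>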
<p>There is, however, an important practical difference.  &#8220;Available&#8221; refers to a system&#8217;s latency relative to the amount of time it takes to repair a partition.</p>
<p>To see this, consider two web sites (with human users) that are based on a system that can have partitions:</p>
<ul>
<li>The operators of the system move so quickly that they always fix partitions within 10ms.  The system is &#8220;available&#8221; even in the face of any single partition, without any special mechanism to be &#8220;partition tolerant&#8221;.</li>
<li>The operators of the system move so slowly that it takes them five minutes to fix a partition.  If the system has no way to be &#8220;partition tolerant&#8221;, it&#8217;s not available.</li>
</ul>
<p>Latency (the &#8220;L&#8221; in PACELC) has nothing to do with repair time, since it only applies when there are no partitions.  A web site is far better with a (maximum/average/whatever) latency of 10ms than with 1000ms.</p>
<p>So &#8220;A&#8221; and &#8220;L&#8221; are different.  But, that said, even if a system meets its &#8220;A&#8221; (fast enough) requirement, it can be valuable to lower the latency below that requirement.  The &#8220;PAC&#8221; characterization does not take this into account.</p>
<p><b>PC/EL is confusing</b></p>
<p>If a system is consistent when there are partitions, then surely it&#8217;s also consistent when there aren&#8217;t any partitions.  If the components work better, the service should not be worse.</p>
<p>At first glance, this seems to mean that &#8220;if PC, then EC&#8221;.  That would mean that PC/EL can&#8217;t describe any realistic system, but Prof. Abadi characterizes PNUTS/Sherpa (as originally presented) as PC/EL.  I&#8217;m sure that there isn&#8217;t really a paradoxical situation with any real system, but rather that there is a way to misinterpret the PACELC notation.  What do PC and EL really mean?</p>
<p>PC means that if a client sends a request when there are partitions that prevent the system from answering promptly and correctly, then the system does not answer, rather than providing an answer that might be incorrect.  Indeed, it might not be able to reply at all, since a total failure is a kind of partition, and there just isn&#8217;t anybody to send back a reply.</p>
<p>EL means that if a client sends a request, and the system can choose between waiting a longer time to send a consistent answer, versus waiting a shorter time to send an inconsistent answer, it chooses (or tilts toward) the latter.</p>
<p><b>Loose ends</b></p>
<ul>
<li>What does &#8220;C&#8221; really mean?  Can&#8217;t we say something better than &#8220;we don&#8217;t guarantee consistency&#8221;?  Dynamo can give you answers that are not definitive but are very useful, with semantics that the application can understand.  What about &#8220;eventual consistency&#8221;?</li>
<li>What about durability?  There&#8217;s a big difference between some data being temporarily offline versus data being lost forever.  Some systems use &#8220;commits over a WAN&#8221; to replace the use of disks, and then the tradeoff of latency versus correctness, from synchronous to asynchronous commits, is important.</li>
<li>Should we distinguish between &#8220;available for read&#8221; vs. &#8220;available for write&#8221;?  This can come up in, e.g., a master-slaves configuration.</li>
</ul>
<p>Stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://danweinreb.org/blog/improving-the-pacelc-taxonomy/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Errors in Database Systems Still Must Consider Network Partitions</title>
		<link>http://danweinreb.org/blog/errors-in-database-systems-still-must-consider-network-partitions</link>
		<comments>http://danweinreb.org/blog/errors-in-database-systems-still-must-consider-network-partitions#comments</comments>
		<pubDate>Tue, 14 Dec 2010 14:11:46 +0000</pubDate>
		<dc:creator>Dan Weinreb</dc:creator>
				<category><![CDATA[Database]]></category>

		<guid isPermaLink="false">http://danweinreb.org/blog/?p=597</guid>
		<description><![CDATA[Prof. Michael Stonebraker wrote a paper, published in the April 2010 issue of CACM, entitled &#8220;Errors in Database Systems: Eventual Consistency and the CAP Theorem&#8221;. As I see it, the overall point of the paper is that the kinds of failures that cause partition-tolerance problems are rare, and not too significant compared to the other [...]]]></description>
				<content:encoded><![CDATA[<p>Prof. Michael Stonebraker wrote a paper, published in the April 2010 issue of CACM, entitled <a href="http://cacm.acm.org/blogs/blog-cacm/83396-errors-in-database-systems-eventual-consistency-and-the-cap-theorem/fulltext">&#8220;Errors in Database Systems: Eventual Consistency and the CAP Theorem&#8221;.</a> As I see it, the overall point of the paper is that the kinds of failures that cause partition-tolerance problems are rare, and not too significant compared to the other ways that a DBMS can fail.  Therefore, people are worrying too much about <a href="http://danweinreb.org/blog/what-does-the-proof-of-the-cap-theorem-mean">the CAP issue</a>, specifically network partitions.</p>
<p>The paper enumerates eight causes of DBMS failure as seen by an application, such as software errors in the DBMS itself, operating system errors, and so on.  Number six in his list is &#8220;a network partition in a local cluster&#8221;, and number eight is &#8220;a network failure in the WAN connecting clusters together; the WAN failed and clusters can no longer all communicate with each other&#8221;.  (The usual reason one would have multiple clusters connected by a WAN is for &#8220;disaster recovery&#8221;, i.e. dealing with a problem that causes an entire cluster to fail, such as power loss over all hardware on which the cluster depends.)</p>
<p>Number six is the crucial issue in the paper, as far as the CAP issue goes.  He has two answers:</p>
<ol>
<li>In my experience, this is exceedingly rare, especially if one replicates the LAN (as Tandem did).</li>
<li>The overwhelming majority [of local failures] cause a single node to fail, which is a degenerate case of a network partition that is easily survived by lots of algorithms.</li>
</ol>
<p>About #1, I have spent some time talking to the operations architects at ITA Software, which has run high-availability servers for many years now, and heard about their experience.  It depends on what you mean by a &#8220;LAN&#8221;.  If you mean a few computers connected together by an Ethernet, with redundant hardware all around, then the chance of a failure of the network itself is relatively low.  However, real-world data centers with a relatively large number of servers rarely work this way.  The problem is that a real network is very complicated.  It depends on switches at both level 3 (routers) and level 2 (hubs).  Situations can arise in which pieces of the network are mis-configured by accident; these can be hard to find due, ironically, to the very redundancy that was added to avoid failures.  In particular, there is no way to make any kind of guarantee about the latency within the network, nor the likelihood that a packet will make it from its source to its destination.</p>
<p>About #2, it would have been helpful if there were citations in the paper.  It is hard to reply to such a claim without specifics.  One of the techniques to deal with one server being down involves &#8220;quorums&#8221;, but they can introduce problems with high availability.</p>
<p>But, more importantly, consider some of the failure modes that <a href="http://www.google.com/search?hl=&amp;q=amazon+dynamo+paper&amp;sourceid=navclient-ff&amp;rlz=1B3GGLL_en___US410&amp;ie=UTF-8&amp;aq=1&amp;oq=amazon+dynamo">Amazon&#8217;s &#8220;Dynamo&#8221; highly-available key-value store</a> is built to deal with.  Suppose we have a key-value pair that resides on two servers, for high availability.  Call them A1 and A2.  An application changes the value of the pair, but does so at a time when A1 is down or unreachable.  So the update is made only to replica A2.  Later, a second application reads the value associated with the key, but this time server A2 is down or unreachable, and server A1 is available.  The second application might not see the new value written by the first application.  I don&#8217;t know of any of the &#8220;lots of algorithms&#8221; that deals with this sort of scenario while providing complete consistency/correctness.</p>
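<p>The scenario is easy to reproduce in a toy model: a write that lands on only one replica, followed by a read that can reach only the other.  The sketch below, in Python, is an illustration under simplifying assumptions and is not Dynamo&#8217;s actual protocol; the <code>Replica</code> class and the <code>write</code> and <code>read</code> functions are hypothetical, and the model deliberately favors availability by accepting any operation that reaches at least one replica.</p>

```python
class Replica:
    """One copy of the data; `up` models whether it is reachable."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


def write(replicas, key, value):
    # Availability-first assumption: apply the write to every reachable
    # replica and succeed as long as at least one of them took it.
    reached = [r for r in replicas if r.up]
    if not reached:
        raise RuntimeError("no replica reachable")
    for r in reached:
        r.data[key] = value
    return [r.name for r in reached]


def read(replicas, key):
    # Answer from the first reachable replica, whatever it happens to hold.
    for r in replicas:
        if r.up:
            return r.data.get(key)
    raise RuntimeError("no replica reachable")


a1, a2 = Replica("A1"), Replica("A2")
write([a1, a2], "k", "old")            # both replicas hold "old"

a1.up = False                          # A1 goes down or becomes unreachable
updated = write([a1, a2], "k", "new")  # the update lands on A2 only

a1.up, a2.up = True, False             # A1 returns, A2 becomes unreachable
stale = read([a1, a2], "k")            # the reader is served by A1

print(updated, stale)
```

<p>In this model the second reader gets the old value even though the write succeeded.  Refusing the write or the read whenever a replica is unreachable would restore consistency, but only by giving up availability, which is exactly the tradeoff in question.</p>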
<p>The conclusion of the paper is:  &#8220;In summary, one should not throw out the C so quickly, since there are real error scenarios where CAP does not apply and it seems like a bad tradeoff in many of the other situations.&#8221;  But since network-partition failures really can happen, it&#8217;s not clear that one can simply <em>decide</em> not to throw out the consistency/correctness criterion.</p>
]]></content:encoded>
			<wfw:commentRss>http://danweinreb.org/blog/errors-in-database-systems-still-must-consider-network-partitions/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>New Ruling from the FCC Affecting Net Neutrality</title>
		<link>http://danweinreb.org/blog/new-ruling-from-the-fcc-affecting-net-neutrality</link>
		<comments>http://danweinreb.org/blog/new-ruling-from-the-fcc-affecting-net-neutrality#comments</comments>
		<pubDate>Fri, 03 Dec 2010 13:19:14 +0000</pubDate>
		<dc:creator>Dan Weinreb</dc:creator>
				<category><![CDATA[Corporations]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[Net Neutrality]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://danweinreb.org/blog/?p=580</guid>
		<description><![CDATA[Some time ago I published some blog entries about net neutrality, and the actions of the FCC and the carriers. Yesterday there was a new announcement about these topics, which have many Internet analysts and net neutrality advocates quite upset. The issues are, unfortunately, hard to summarize, so I&#8217;ll just provide links to some good [...]]]></description>
				<content:encoded><![CDATA[<p>Some time ago I published some blog entries about net neutrality, and the actions of the FCC and the carriers.  Yesterday there was a new announcement about these topics, which have many Internet analysts and net neutrality advocates quite upset.  The issues are, unfortunately, hard to summarize, so I&#8217;ll just provide links to some good things to read.</p>
<p><a href="http://www.wired.com/epicenter/2010/12/net-neutrality-reaction/">This article in Wired Magazine</a> is a good place to start.</p>
<p>The negative comment from Marvin Ammori, quoted in the Wired story, comes from <a href="http://www.huffingtonpost.com/marvin-ammori/fcc-chair-proposes-garbag_b_790262.html"> this story published (perhaps in other places as well) in the Huffington Post</a>, with a lot of analysis.</p>
<p><a href="http://www.savetheinternet.com/blog/10/12/01/fcc-chairman-announces-fake-net-neutrality-proposal">There is more analysis here, from Josh Silver</a> of Free Press:</p>
<p>and <a href="http://www.freepress.net/press-release/2010/12/1/fcc-peddling-fake-net-neutrality">more from him, the day before the announcement:</a></p>
<p><a href="http://www.savetheinternet.com/blog/10/12/02/damning-praise-genachowskis-plan">This article</a> shows how happy the carriers are with the announced plan.</p>
<p>For some historical perspective, <a href="http://www.washingtonpost.com/wp-dyn/content/article/2010/12/01/AR2010120100014.html">see this story from the Washington Post</a>.</p>
<p>This is <a href="http://www.cnbc.com/id/40458075">another attempt to explain what it all means</a>, from CNBC.</p>
<p><a href="http://www.theatlanticwire.com/opinions/view/opinion/The-FCC-Pleases-No-One-With-Net-Neutrality-Proposal-6000">This is an article published in Atlantic Monthly</a> opposing the proposal.</p>
<p>What I don&#8217;t quite get is why <a href="http://news.yahoo.com/s/pcworld/20101201/tc_pcworld/fccsgenachowskipushesforvoteonnetneutrality">prominent Silicon Valley investors are happy</a> about this. This one quotes Ron Conway, the most famous of the Silicon Valley &#8220;super-angel&#8221; investors.  John Doerr, perhaps the most famous VC anywhere, is also in favor, although I could only find that short quote.</p>
<p>I have been told that Google and other internet companies are running full page advertisements asking Obama to live up to his commitment to net neutrality.</p>
]]></content:encoded>
			<wfw:commentRss>http://danweinreb.org/blog/new-ruling-from-the-fcc-affecting-net-neutrality/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
