Feed aggregator

Four short links: 21 March 2019

O'Reilly Radar - Thu, 2019/03/21 - 05:15

Newsletters, Confidence Intervals, Reverse Engineering, and Human Scale

  1. Email Newsletters: The New Social Media (NYT) -- “With newsletters, we can rebuild all of the direct connections to people we lost when the social web came along.”
  2. Scientists Rise Up Against Statistical Significance (Nature) -- want to replace p-values with confidence intervals, which are easier to interpret without special training. Sample intro to p-values and confidence intervals.
  3. Cutter -- A Qt and C++ GUI for radare2 reverse engineering framework. Its goal is making an advanced, customizable, and FOSS reverse-engineering platform while keeping the user experience in mind. Cutter is created by reverse engineers for reverse engineers.
  4. Computer Latency at a Human Scale -- if a CPU cycle is 1 second, then SSD I/O takes 1.5-4 days, and rotational disk I/O takes 1-9 months. Also in the Hacker News thread, human-scale storage: if a byte is a letter, then a 4 KB page of memory is 1 sheet of paper, a 256 KB L2 cache is a 64-page binder on the desk, and a 1 TB SSD is a warehouse of books.
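The scaling in item 4 is plain proportional arithmetic: divide each real latency by the length of one CPU cycle and read the result as seconds. Here is a minimal Python sketch of that conversion. The 0.4 ns cycle (a 2.5 GHz clock) and the SSD and disk latency ranges are assumptions chosen to land near the figures quoted above; they are not taken from the linked post.

```python
# Assumed figures (not from the linked post): a 2.5 GHz core, so one cycle is 0.4 ns.
CYCLE_SECONDS = 0.4e-9

# Assumed typical latency ranges, in seconds.
LATENCIES = {
    "SSD I/O": (50e-6, 150e-6),
    "rotational disk I/O": (1e-3, 10e-3),
}

SECONDS_PER_DAY = 86_400


def human_scale_days(latency_seconds: float) -> float:
    """Rescale a latency so that one CPU cycle lasts one second; return days."""
    scaled_seconds = latency_seconds / CYCLE_SECONDS
    return scaled_seconds / SECONDS_PER_DAY


def sheets_of_paper(num_bytes: int, bytes_per_sheet: int = 4096) -> float:
    """Storage analogy: one byte is one letter, one 4 KB page is one sheet."""
    return num_bytes / bytes_per_sheet


if __name__ == "__main__":
    for name, (low, high) in LATENCIES.items():
        print(f"{name}: {human_scale_days(low):.1f} to {human_scale_days(high):.1f} days")
    print(f"256 KB L2 cache: {sheets_of_paper(256 * 1024):.0f} sheets")
```

Under these assumptions the SSD range comes out at roughly one and a half to a little over four days, and a 256 KB cache is 64 sheets, matching the binder in the analogy.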

The fundamental problem with Silicon Valley’s favorite growth strategy

O'Reilly Radar - Thu, 2019/03/21 - 03:00

Our entire economy seems to have forgotten that workers are also consumers, and suppliers are also customers.

The pursuit of monopoly has led Silicon Valley astray.

Look no further than the race between Lyft and Uber to dominate the online ride-hailing market. Both companies are gearing up for their IPOs in the next few months. Street talk has Lyft shooting for a valuation between $15 billion and $30 billion, and Uber valued at an astonishing $120 billion. Neither company is profitable; their enormous valuations are based on the premise that if a company grows big enough and fast enough, profits will eventually follow.

Most monopolies or duopolies develop over time, and have been considered dangerous to competitive markets; now they are sought after from the start and are the holy grail for investors. If LinkedIn co-founder Reid Hoffman and entrepreneur Chris Yeh’s new book Blitzscaling is to be believed, the Uber-style race to the top (or the bottom, depending on your point of view) is the secret of success for today’s technology businesses.

Blitzscaling promises to teach techniques that are “the lightning-fast path to building massively valuable companies.” Hoffman and Yeh argue that in today’s world, it’s essential to “achieve massive scale at incredible speed” in order to seize the ground before competitors do. By their definition, blitzscaling (derived from the blitzkrieg or “lightning war” strategy of Nazi general Heinz Guderian) “prioritizes speed over efficiency,” and risks “potentially disastrous defeat in order to maximize speed and surprise.”

Many of these businesses depend on network effects, which means the company that gets to scale first is likely to stay on top. So, for startups, this strategy typically involves raising a lot of capital and moving quickly to dominate a new market, even when the company’s leaders may not know how they are going to make money in the long term.

This premise has become doctrine in Silicon Valley. But is it correct? And is it good for society? I have my doubts.

Imagine, for a moment, a world in which Uber and Lyft hadn’t been able to raise billions of dollars in a winner-takes-all race to dominate the online ride-hailing market. How might that market have developed differently?

Uber and Lyft have developed powerful services that delight their users and are transforming urban transportation. But if they hadn’t been given virtually unlimited capital to offer rides at subsidized prices taxicabs couldn’t match in order to grow their user base at blitzscaling speed, would they be offering their service for less than it actually costs to deliver? Would each company be spending 55% of net revenue on driver incentives, passenger discounts, sales, and marketing to acquire passengers and drivers faster than the other? Would these companies now be profitable instead of hemorrhaging billions of dollars a year? Would incumbent transportation companies have had more time to catch up, leading to a more competitive market? Might drivers have gotten a bigger share of the pie? Would a market that grew more organically—like the web, e-commerce, smartphones, or mobile mapping services—have created more value over the long term?

We’ll never know, because investors, awash in cheap capital, anointed the winners rather than letting the market decide who should succeed and who should fail. This created a de-facto duopoly long before either company had proven that it has a sustainable business model. And because these two giants are now locked in a capital-fueled deathmatch, the market is largely closed off to new ideas except from within the existing, well-funded companies.

The case for blitzscaling

There are plenty of reasons to believe that blitzscaling makes sense. The internet is awash in billionaires who made their fortunes by following a strategy summed up in Mark Zuckerberg’s advice to “move fast and break things.” Hoffman and Yeh invoke the storied successes of Apple, Microsoft, Amazon, Google, and Facebook, all of whom have dominated their respective markets and made their founders rich in the process, and suggest that it is blitzscaling that got them there. And the book tells compelling tales of current entrepreneurs who have outmaneuvered competitors by pouring on the gas and moving more quickly. Hoffman recalls his own success with the blitzscaling philosophy during the early days of Paypal. Back in 2000, the company was growing 5% per day, letting people settle their charges using credit cards while using the service for free. This left the company to absorb, ruinously, the 3% credit card charge on each transaction. He writes:

I remember telling my old college friend and Paypal co-founder/CEO Peter Thiel, 'Peter, if you and I were standing on the roof of our office and throwing stacks of hundred-dollar bills off the edge as fast as our arms could go, we still wouldn’t be losing money as quickly as we are right now.'

But it worked out. Paypal built an enormous user base quickly, giving the company enough market power to charge businesses to accept Paypal payments. They also persuaded most customers to make those payments via direct bank transfers, which have much lower fees than credit cards. If they’d waited to figure out the business model, someone else might have beaten them to the customer that made them a success: eBay, which went on to buy Paypal for $1.5 billion (which everyone thought was a lot of money in those days), launching Thiel and Hoffman on their storied careers as serial entrepreneurs and investors.
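To see why absorbing the card fee looked so ruinous, it helps to compound the two numbers in the story. The Python sketch below uses the 5% daily growth and the 3% card charge quoted above. It assumes that payment volume grew at the same 5% daily rate as the company, and it uses a made-up starting volume of $100,000 a day; neither assumption comes from the book.

```python
import math

DAILY_GROWTH = 0.05             # from the text: growth of 5% per day
CARD_FEE_RATE = 0.03            # from the text: 3% card charge absorbed per transaction
START_DAILY_VOLUME = 100_000.0  # hypothetical starting payment volume, dollars per day


def doubling_time_days(daily_growth: float) -> float:
    """Days for a quantity to double under constant compound daily growth."""
    return math.log(2) / math.log(1 + daily_growth)


def daily_fee_cost(day: int) -> float:
    """Card fees absorbed on a given day, assuming volume compounds daily."""
    volume = START_DAILY_VOLUME * (1 + DAILY_GROWTH) ** day
    return volume * CARD_FEE_RATE


if __name__ == "__main__":
    print(f"doubling time: {doubling_time_days(DAILY_GROWTH):.1f} days")
    for day in (0, 30, 90):
        print(f"day {day}: ${daily_fee_cost(day):,.0f} in absorbed fees")
```

At 5% a day, volume doubles about every two weeks, so under these assumptions the daily fee bill more than quadruples within a month.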

Of course, for every company like Paypal that pulled off that feat of hypergrowth without knowing where the money would come from, there is a dotcom graveyard of hundreds or thousands of companies that never figured it out. That’s the “risks potentially disastrous defeat” part of the strategy that Hoffman and Yeh talk about. A strong case can be made that blitzscaling isn’t really a recipe for success but rather survivorship bias masquerading as a strategy.

However, Hoffman and Yeh do a good job of explaining the conditions in which blitzscaling makes sense: The market has to be really big; there has to be a sustainable competitive advantage (e.g., network effects) from getting bigger faster than the competition; you have to have efficient means to bootstrap distribution; and you have to have high gross margins so the business will generate positive cash flow and profits when it does get to scale. This is good management advice for established companies as well as startups, and the book is chock full of it.

Hoffman and Yeh also make the point that what most often drives the need for blitzscaling is competition; an entrepreneur with a good idea can be too close to the center of the bullseye, inevitably drawing imitators. The book opens with an excellent tale of how Airbnb used blitzscaling to respond to the threat of a European copycat company by raising money to open and aggressively expand its own European operations years before the company would otherwise have chosen to do so.

But sometimes it isn’t just the threat of competition that drives the need to turbocharge growth: it’s the size and importance of the opportunity, and the need to get big fast enough to effect change. For example, you can make the case that if Uber and Lyft and Airbnb hadn’t blitzscaled, they would have been tied up in bureaucratic red tape, and the future they are trying to build wouldn’t just have happened more slowly; it would never have happened.

The strategic use of blitzscaling isn’t limited to startups. It can also apply to large companies, governments, and even nonprofits. For example, we’re facing a blitzscaling opportunity right now at Code for America, the non-profit founded and run by my wife Jennifer Pahlka, and on whose board I serve.

Our mission is to use the principles and practices of the digital age to improve how government serves the American public, and how the public improves government. Since Code for America is a non-profit, we aren’t trying to “take the market.” There’s no financial imperative to seize an opportunity before someone else does. Our goal is to show what’s possible, to build a constituency and a consensus for a change in the way government does things, and to encourage the development of an ecosystem of new vendors who can work with government the same way we do. By demonstrating that the work of government can be done quickly and cheaply at massive scale using open source software, machine learning, and other 21st-century technology, we look to shape the expectations of the market.

So why is blitzscaling relevant to us? It’s not about making millions and snuffing out the competition—as in many of the most compelling cases for blitzscaling—it’s about building enough momentum to break through the stone walls of an old established order. In our case, we are attempting to save taxpayers money and radically alter the lives of millions of Americans.

Here’s a concrete example: one of the areas we’ve gotten deeply involved in is criminal justice reform. Specifically, we’re helping governments implement new laws and initiatives to redress 30 years of over-aggressive policy that has left almost 70 million Americans with some kind of criminal record and 2.2 million behind bars. (That’s the highest percentage in the world.) A broad consensus is emerging on both left and right that it’s time to rethink our criminal justice system.

Too often, though, those passing new laws have given insufficient thought to their implementation, leaving existing bureaucratic processes in place. For example, to clear a criminal record under 2014’s California Proposition 47, which reduced the penalty for many crimes by reclassifying them as misdemeanors rather than felonies, a person must go to the District Attorney’s office in each jurisdiction where he or she has a record, ask the DA to download their rap sheet, determine eligibility by assessing the obscure codes on the rap sheet, and, if eligible, petition the court for clearance. Facing such a cumbersome, expensive process, only a few thousand of those eligible were able to clear their records.

After the passage in 2016 of California Proposition 64, which decriminalized marijuana and added millions to the rolls of those who had criminal records eligible to be expunged, San Francisco District Attorney George Gascon announced a program for automatic expungement. The DA’s office would not wait for petitioners to appear, but would preemptively download and evaluate all eligible records. Unfortunately, lacking technology expertise, Gascon’s office set out to do it with manual labor, hiring paralegals to download and evaluate the rap sheets and file the necessary paperwork with the courts.

When we demonstrated that we could download the records in bulk and automate the evaluation of rap sheets, working through thousands of records in minutes and automatically generating the paperwork for clearance, they were all in. True automatic expungement looks like a real possibility. Now we aim to scale up our team to support the entire state in this ambitious program.

So what’s the rush? The first reason for urgency is the human toll we can alleviate by getting the work done more quickly. When people can clear their records, it gives them better access to jobs, subsidized housing, and many other benefits.

The second reason is that many other states are also reducing sentences and pushing for record clearance. While we’ve already got our Clear My Record project well underway in California, other states are turning to legacy vendors working through legacy procurement processes. If existing vendors exploit this opportunity and persuade states to sign traditional contracts before we show how cheaply and effectively the job can be done, millions of dollars in public money may be wasted doing it the old way, and years of delay in implementation are at stake. (These contracts typically cost hundreds of millions of dollars and take years to deliver on.)

So we’re asking ourselves, is it enough to show what’s possible and hope that others do the right thing? Or might we get to our desired outcome more effectively if we scale up our own capability to address the problem? The key question we are wrestling with is “how can we move faster?”—which is exactly the question that Hoffman and Yeh’s book seeks to answer.

In short, there are compelling reasons to blitzscale, and the book provides a great deal of wisdom for those facing a strategic inflection point where success depends on moving much faster. But I worry that the book oversells the idea, and that too many entrepreneurs will believe this is the only way to succeed.

Why I’m skeptical of blitzscaling

To understand why I’m skeptical about blitzscaling, you have to understand a bit about my own entrepreneurial history. I started my company, O’Reilly Media, 40 years ago with a $500 initial investment. We took in no venture capital, but despite that have built a business that generates hundreds of millions of dollars of profitable annual revenue. We got there through steady, organic growth, funded by customers who love our products and pay us for them.

Emulating the tortoise, not the hare, has been our goal. We’ve always preferred opportunities where time is an ally, not an enemy. That’s not to say that we haven’t had our share of blitzscaling opportunities, but in each of them, we kickstarted a new market and then let others take the baton.

In 1993, my company launched GNN, the Global Network Navigator, which was the first advertising-supported site on the World Wide Web, and the first web portal. We were so early that we had to persuade the world that advertising was the natural business model for this new medium. We plowed every penny we were making from our business writing and publishing computer-programming manuals into GNN, our own version of throwing hundred-dollar bills off the rooftop. And for two years, from 1993 until we sold GNN to AOL in mid-1995, we were the place where people went to find new websites.

As the commercial web took off, however, it became clear that we couldn’t play by the same rules as we had in the publishing market, where a deep understanding of what customers were looking for, superior products, innovative marketing, and fair prices gave us all the competitive advantages we needed. Here, the market was exploding, and unless we could grow as fast or faster than the market, we’d soon be left behind. And we could see the only way to do that would be to take in massive amounts of capital, with the price of chasing the new opportunity being the loss of control over the company.

Wanting to build a company that I would own and control for the long term, I decided instead to spin out GNN and sell it to AOL. Jerry Yang and David Filo made a different decision at Yahoo!, founded a year after GNN. They took venture capital, blitzscaled their company, and beat AOL to the top of the internet heap—before being dethroned in their turn by Google.

It happened again in 1996 when O’Reilly launched a product called Website, the first commercial Windows-based web server. We’d been drawn to the promise of a web in which everyone was a publisher; instead, websites were being built on big centralized servers, and all most people could do was consume web content via ubiquitous free browsers. So we set out to democratize the web, with the premise that everyone who had a browser should also be able to have a server. Website took off, and became a multimillion-dollar product line for us. But our product was soon eclipsed by Netscape, which had raised tens of millions of dollars of venture capital, and eventually had a multibillion-dollar IPO—before being crushed in turn by Microsoft.

In the case of both GNN and Website, you can see the blitzscaling imperative: a new market is accelerating, and there is a clear choice between keeping up and being left behind. In those two examples, I made a conscious choice against blitzscaling because I had a different vision for my company. But far too many entrepreneurs don’t understand the choice, and are simply overtaken by better-funded competitors who seize the opportunity more boldly. And in too many markets, in the absence of antitrust enforcement, there is always the risk that no matter how much money you raise and how fast you go, the entrenched giants will be able to leverage their existing business to take over the new market you’ve created. That’s what Microsoft did to Netscape, and what Facebook did to Snapchat.

Had we followed Hoffman and Yeh’s advice, we would have taken on a contest we were very unlikely to win, no matter how much money we raised or how fast we moved. And even though we abandoned these two opportunities when the blitzscalers arrived, O’Reilly Media has had enormous financial success, becoming a leader in each of our chosen markets.

Winning at all costs

There’s another point that Hoffman and Yeh fail to address. It matters what stories we tell ourselves about what success looks like. Blitzscaling can be used by any company, but it can encourage a particular kind of entrepreneur: hard-charging, willing to crash through barriers, and often ruthless.

We see the consequences of this selection bias in the history of the on-demand ride-hailing market. Why did Uber emerge the winner in the ride-hailing wars? Sunil Paul, the founder of Sidecar, was the visionary who came up with the idea of a peer-to-peer taxi service provided by people using their own cars. Logan Green, the co-founder of Lyft, was the visionary who had set out to reinvent urban transportation by filling all the empty cars on the road. But Travis Kalanick, the co-founder and CEO of Uber, was the hyper-aggressive entrepreneur who raised money faster, broke the rules more aggressively, and cut the most corners in the race to the top.

In 2000, a full eight years before Uber was founded, Sunil Paul filed a patent that described many of the possibilities that the newly commercialized GPS capabilities would provide for on-demand car sharing. He explored founding a company at that time, but realized that GPS-enabled phones weren’t common enough. It was just too early for his ideas to take hold.

The market Paul had envisioned began, in fits and starts, around 2007. That year, Logan Green and John Zimmer, the founders of Lyft, started a company called Zimride that was inspired by the bottom-up urban jitneys Green had fallen in love with during a trip to Zimbabwe. They began with a web app to match up college students making long-distance trips with others going in the same direction. In 2008, Garrett Camp and Travis Kalanick founded Uber as a high-end service using SMS to summon black-car drivers.

Neither Zimride nor Uber had yet realized the full idea we now think of as smartphone-enabled ride hailing, and each was working toward it from a different end—peer-to-peer, and mobile on-demand respectively. The two ideas were about to meet in an explosive combination, fueled by the wide adoption of GPS-enabled smartphones and app marketplaces. Following the 2007 introduction of the iPhone and, at least as importantly, the 2008 introduction of the iPhone App Store, the iPhone became a platform for mobile applications.

Once again, it was Paul who first saw the future. Inspired by Airbnb’s success in getting ordinary people to rent out their homes, he realized people might also be willing to rent out their cars. He worked on several versions of this idea. In 2009, while working with the founders of what became Getaround in a class he was teaching at Singularity University, Paul explored peer-to-peer fractional car rental. Then, in 2012, he launched a new company, Sidecar, to provide the equivalent of peer-to-peer taxi service, with ordinary people providing the service using their own cars. He set out to get permission from regulatory agencies for this new approach.

Green and Zimmer heard about Paul’s work on Sidecar and realized immediately that this model could help them realize their original vision for Zimride. They pivoted quickly from their existing product, launching Lyft as a project within Zimride about three months after Sidecar. When Lyft took off, they sold the original Zimride product and went all-in on the new offering. (That’s blitzscaling for you. Seize the ground first.)

Uber was an even more aggressive blitzscaler. Hearing about Lyft’s plans, Uber announced UberX, its down-market version of Uber using ordinary people driving their own cars instead of chauffeurs with limousines, the day before Lyft launched, even though all that it had developed in the way of a peer-to-peer driver platform was a press release. In fact, Kalanick, the co-founder and CEO, had been skeptical about the legality of the peer-to-peer driver model, telling Jason Calacanis, the host of the podcast This Week in Startups, “It would be illegal.”

And the race was on. Despite Kalanick’s earlier reservations about the legality of the model, Uber out-executed its smaller rivals, in part by ignoring regulation while those rivals attempted to change the rules, and became the market leader. Uber also followed the blitzscaling playbook more closely, raising far more money than its rivals, and growing far faster. Lyft managed to become a strong number two. But by 2015, Sidecar was a footnote in history, going out of business after having raised only $35.5 million to Uber’s $7.3 billion and Lyft’s $2 billion. To date, Uber has raised a total of $24.3 billion, and Lyft $4.9 billion.

Hoffman and Yeh embrace this dark pattern as a call to action. Early in their book, Blake, the cynical sales manager played by Alec Baldwin in the movie Glengarry Glen Ross, appears as an oracle dispensing wise advice:

As you all know, first prize is a Cadillac Eldorado. Anyone wanna see second prize? Second prize is a set of steak knives. Third prize is you’re fired. Get the picture?

In the real world, though, while Sunil Paul’s company went out of business, it was Travis Kalanick of Uber who got fired. Stung by scandal after scandal as Uber deceived regulators, spied on passengers, and tolerated a culture of sexual harassment, the board eventually asked for Kalanick’s resignation. Not only that, Uber’s worldwide blitzscaling attempts (competing in ride-hailing not only with Lyft in the U.S. but with Didi in China, Grab in Southeast Asia, and Ola in India, and with Google on self-driving cars) eventually spread the company too thin, just as Guderian’s blitzkrieg techniques, which had worked so well against France and Poland, failed during the invasion of Russia.

Meanwhile, the forced bloom of Uber’s market share lead became a liability even in the U.S. Even though Uber had far more money, the price war between the two companies cost Uber far more in markets where its share was large and Lyft’s was small. Lyft focused on the U.S. market and began to chip away at Uber’s early lead. It also made significant gains on Uber as passengers and drivers, stung by the sense that Uber was an amoral company, began to abandon the service. Uber is still the larger and more valuable company, and Dara Khosrowshahi, the new CEO, has made enormous progress in stabilizing its business and restoring its reputation. But Lyft’s gains appear to be sustainable.

Blitzscaling—or sustainable scaling?

While Hoffman and Yeh’s book claims that companies like Google, Facebook, Microsoft, Apple, and Amazon are icons of the blitzscaling approach, this idea is plausible only with quite a bit of revisionist history. Each of these companies achieved profitability (or in Amazon’s case, positive cash flow) long before its IPO, and growth wasn’t driven by a blitzkrieg of spending to acquire customers below cost but by breakthrough products and services, and by strategic business model innovations that were rooted in a future the competition didn’t yet understand. These companies didn’t blitzscale; they scaled sustainably.

Google raised only $36 million before its IPO—an amount that earned Sidecar’s Sunil Paul the dismal third prize of going out of business. For that same level of investment, Google was already hugely profitable.

Facebook’s rise to dominance was far more capital-intensive than Google’s. The company raised $2.3 billion before its IPO, but it too was already profitable long before it went public; according to insiders, it ran close to breakeven from fairly early in its life. The money raised was strategic, a way of hedging against risk, and of stepping up the valuation of the company while delaying the scrutiny of a public offering. As Hoffman and Yeh note in their book, in today’s market, “Even if the money doesn’t prove to be necessary, a major financing round can have positive signaling effects—it helps convince the rest of the world that your company is likely to emerge as the market leader.”

Even Amazon, which lost billions before achieving profitability, raised only $108 million in venture capital before its IPO. How was this possible? Bezos realized his business generated enormous positive cash flow that he could borrow against. It was his boldness in taking the risk of borrowing billions (preserving a larger ownership stake for himself and his team than if he had raised billions in equity), not just Amazon’s commitment to growth over profits, that helped make him the world’s richest man today.

In short, none of these companies (except arguably Amazon) followed the path that Hoffman and Yeh lay out as a recipe for today’s venture-backed companies. Venture-backed blitzscaling was far less important to their success than product and business-model innovation, brilliant execution, and relentless strategic focus. Hypergrowth was the result rather than the cause of these companies’ success.

Ironically, Hoffman and Yeh’s book is full of superb practical advice about innovation, execution, and strategic focus, but it’s wrapped in the flawed promise that startups can achieve similar market dominance as these storied companies by force-feeding inefficient growth.

For a company like Airbnb, a company with both strong network effects and a solid path to profitability, blitzscaling is a good strategy. But blitzscaling also enables too many companies like Snap, which managed to go public while still losing enormous amounts of money, making its founders and investors rich while passing on to public market investors the risk that the company will never actually become a profitable business. Like Amazon and Airbnb, some of these companies may become sustainable, profitable businesses and grow into their valuation over time, but as of now, they are still bleeding red ink.

Sustainability may not actually matter, though, according to the gospel of blitzscaling. After all, the book’s marketing copy does not promise the secret of building massively profitable or enduring companies, but merely “massively valuable” ones.

What is meant by value? To too many investors and entrepreneurs, it means building companies that achieve massive financial exits, either by being sold or going public. And as long as the company can keep getting financing, either from private or public investors, the growth can go on.

But is a business really a business if it can’t pay its own way?

Is it a business or a financial instrument?

Benjamin Graham, the father of value investing, is widely reported to have said: “In the short run, the market is a voting machine. In the long run, it’s a weighing machine.” That is, in the short term, investors vote (or more accurately, place bets) on the present value of the future earnings of a company. Over the long term, the market discovers whether they were right in their bets. (That’s the weighing machine.)

But what is happening today is that the market has almost entirely turned into a betting machine. Not only that, it’s a machine for betting on a horse race in which it’s possible to cash your winning ticket long before the race has actually finished. In the past, entrepreneurs got rich when their companies succeeded and were able to sell shares to the public markets. Increasingly, though, investors are allowing insiders to sell their stock much earlier than that. And even when companies do reach the point of a public offering, these days, many of them still have no profits.

According to University of Florida finance professor Jay Ritter, 76% of all IPOs in 2017 were for companies with no profits. By October 2018, the percentage was 83%, exceeding even the 81% seen right before the dotcom bust in 2000.

Would profitless companies with huge scale be valued so highly in the absence of today’s overheated financial markets?

Too many of the companies likely to follow Hoffman and Yeh’s advice are actually financial instruments instead of businesses, designed by and for speculators. The monetization of the company is sought not via the traditional means of accumulated earnings and the value of a continuing business projecting those earnings into the future, but via the exit, that holy grail of today’s Silicon Valley. The hope is that either the company will be snapped up by another company that does have a viable business model but lacks the spark and sizzle of internet-fueled growth, or will pull off a profitless IPO, like Snap or Box.

The horse-race investment mentality has a terrible side effect: companies that are not contenders to win, place, or show are starved of investment. Funding dries up, and companies that could have built a sustainable niche if they’d grown organically go out of business instead. “Go big or go home” results in many companies that once would have been members of a thriving business ecosystem indeed going home. As Hoffman and Yeh put it:

Here is one of the ruthless practices that has helped make Silicon Valley so successful: investors will look at a company that is on an upward trajectory but doesn’t display the proverbial hockey stick of exponential growth and conclude that they need to either sell the business or take on additional risk that might increase the chances of achieving exponential growth... Silicon Valley venture capitalists want entrepreneurs to pursue exponential growth even if doing so costs more money and increases the chances that the business will fail.

Because this blitzscaling model requires raising ever more money in pursuit of the hockey stick venture capitalists are looking for, the entrepreneur’s ownership is relentlessly diluted. Even if the company is eventually sold, unless the company is a breakout hit, most of the proceeds go to investors whose preferred shares must be repaid before the common shares owned by the founders and employees get anything. There are few small wins for the entrepreneur; only the big bets pay off. And, as in Las Vegas, the house always wins.

Bryce Roberts, my partner at O’Reilly AlphaTech Ventures (OATV), recently wrote about the probability of winning big in business:

Timely reminder that the VCs aren’t even in the home run business.

They’re in the grand slam business.

Interestingly, odds of hitting a grand slam (.07%) are uncannily similar to odds of backing a unicorn (.07% of VC backed startups)

—indievc (@indievc) December 20, 2018

This philosophy has turned venture capitalists into movie studios, financing hundreds of companies in pursuit of the mega-hit that will make their fund, and at its worst turns entrepreneurs into the equivalent of Hollywood actors, moving from one disposable movie to another. (“The Uber of Parking” is sure to be a hit! And how about “the Uber of Dry Cleaning”?)

The losses from the blitzscaling mentality are felt not just by entrepreneurs but by society more broadly. When the traditional venture-capital wisdom is to shutter companies that aren’t achieving hypergrowth, businesses that would once have made meaningful contributions to our economy are not funded, or are starved of further investment once it is clear that they no longer have a hope of becoming a home run.

Winners-take-all is an investment philosophy perfectly suited for our age of inequality and economic fragility, where a few get enormously rich, and the rest get nothing. In a balanced economy, there are opportunities for success at all scales, from the very small, through successful mid-size companies, to the great platforms.

Is Glengarry Glen Ross’s sales competition really the economy we aspire to?

There is another way

There are business models, even in the technology sector, where cash flow from operations can fund the company, not venture capitalists gripping horse-race tickets.

Consider these companies: Mailchimp, funded by $490 million in annual revenue from its 12 million customers, profitable from day one without a penny of venture capital; Atlassian, bootstrapped for eight years before raising capital in 2010 after it had reached nearly $75 million in self-funded annual revenue; and Shutterstock, which took in venture capital only after it had already bootstrapped its way to becoming the largest subscription-based stock photo agency in the world. (In the case of both Atlassian and Shutterstock, outside financing was a step toward liquidity through a public offering, rather than strictly necessary to fund company growth.) All of these companies made their millions through consumer-focused products and patience, not blitzscaling.

Jason Fried and David Heinemeier Hansson, the founders of Basecamp, a 20-year-old, privately held, profitable Chicago company whose core product is used by millions of software developers, have gone even further: they entirely abandoned the growth imperative, shedding even successful products to keep their company under 50 people. Their book about their approach, It Doesn’t Have to Be Crazy At Work, should be read as an essential counterpoint to Blitzscaling.

Another story of self-funded growth I particularly like is far from tech. RxBar, a Chicago-based nutrition bar company with $130 million of self-funded annual revenue, was acquired last year by Kellogg for $600 million. Peter Rahal, one of the two founders, recalls that he and co-founder Jared Smith were in his parents’ kitchen, discussing how they would go about raising capital to start their business. His immigrant father said something like, “You guys need to shut the fuck up and just sell a thousand bars.”

And that’s exactly what they did, putting in $5,000 each, and hustling to sell their bars via Crossfit gyms. It was that hustle and bias toward customers, rather than outside funding, that got them their win. Their next breakthrough was in their distinctive “No BS” packaging, which made the ingredients, rather than extravagant claims about them, the centerpiece of the message.

The exit for RxBar, when it came, was not the objective, but a strategy for growing a business that was already working. “Jared and I never designed the business to sell it; we designed it to solve a problem for customers,” Rahal recalled. “In January 2017, Jared and I were sitting around and asked what do we want to do with this business? Do we want to continue and make it a family business? Or do we want to roll it up into a greater company, really scale this thing and take it to the next level? We wanted to go put fire on this thing.”

They could have raised growth capital at that point, like Mike Cannon-Brookes of Atlassian or Jon Oringer of Shutterstock did, but acquisition provided a better path to sustainable impact. Kellogg brought them not just an exit, but additional capabilities to grow their business. Rahal continues to lead the brand at Kellogg, still focusing on customers.

Raise less, own more

The fact that the Silicon Valley blitzscaling model is not suited for many otherwise promising companies has led a number of venture capitalists, including my partner Bryce Roberts at OATV, to develop an approach for finding, inspiring, and financing cash-flow positive companies. Indie.VC, a project at OATV, has developed a new kind of venture financing instrument. It’s a convertible loan designed to be repaid out of operating cash flow rather than via an exit, but that can convert to equity if the company, having established there is a traditional venture business opportunity, decides to go that route. This optionality effectively takes away the pressure for companies to raise ever more money in pursuit of the hypergrowth that, as Hoffman and Yeh note, traditional venture capitalists are looking for. The program also includes a year of mentorship and networking, providing access to experienced entrepreneurs and experts in various aspects of growing a business.

In the FAQ, Bryce wrote:

We believe deeply that there are hundreds, even thousands, of businesses that could be thriving, at scale, if they focused on revenue growth over raising another round of funding. On average, the companies we’ve backed have increased revenues over 100% in the first 12 months of the program and around 300% after 24 months post-investment. We aim to be the last investment our founders NEED to take. We call this Permissionless Entrepreneurship.

This is a bit like the baseball scouting revolution that Michael Lewis chronicled in Moneyball. While all the other teams were looking for home-run hitters, Oakland A’s general manager Billy Beane realized that on-base percentage was a far more important statistic for actually winning. He took that insight all the way from the league basement to the playoffs, winning against far richer teams despite the A’s own low-salary budget.

One result of an investment model looking for the equivalent of on-base percentage—that is, the ability to deliver a sustainable business for as little money as possible—is that many entrepreneurs can do far better than they can in the VC blitzscale model. They can build a business that they love, like I did, and continue to operate it for many years. If they do decide to exit, they will own far more of the proceeds.

Even successful swing-for-the-fences VCs like Bill Gurley of Benchmark Capital agree. As Gurley, an early Uber investor and board member, tweeted recently:

100% agree with this article, & have voiced this opinion my whole career. The vast majority of entrepreneurs should NOT take venture capital. Why? Article nails it: it is a binary "swing for the fences" exercise. Bootstrapping more likely to lead to individual financial success.

— Bill Gurley (@bgurley) January 11, 2019

Indie.VC’s search for profit-seeking rather than exit-seeking companies has also led to a far more diverse venture portfolio, with more than half of the companies led by women and 20% by people of color. (This is in stark contrast to traditional venture capital, where 98% of venture dollars go to men.) Many are from outside the Bay Area or other traditional venture hotbeds. The 2019 tour, in which Roberts looks for startups to join the program, hosts stops in Kansas City, Boise, Detroit, Denver, and Salt Lake City, as well as the obligatory San Francisco, Seattle, New York, and Boston.

Where conventional startup wisdom would suggest that aiming for profits, not rounds of funding, will lead to plodding growth, many of our companies are growing just as fast as those from the early-stage portfolios in our previous OATV funds.

Nice Healthcare is a good example. Its founder, Thompson Aderinkomi, had been down the traditional blitzscaling path with his prior venture and wanted to take a decidedly different approach to funding and scaling his new business. Seven months post-investment by Indie.VC, Nice achieved 400% revenue growth and over $1 million in annual recurring revenue, and is now profitable. All while being run by a black founder in Minneapolis. Now that’s a real unicorn! Some of the other fast-growing companies in the Indie.VC portfolio include The Shade Room, Fohr, Storq, re:3d, and Chopshop.

OATV has invested in its share of companies that have gone on to raise massive amounts of capital—Foursquare, Planet, Fastly, Acquia, Signal Sciences, Figma, and Devoted Health for example—but we’ve also funded companies that were geared toward steady growth, profitability, and positive cash flow from operations, like Instructables, SeeClickFix, PeerJ, and OpenSignal. In our earlier funds, though, we were trying to shoehorn these companies into a traditional venture model when what we really needed was a new approach to financing. So many VCs throw companies like these away when they discover they aren’t going to hit the hockey stick. But Roberts kept working on the problem, and now his approach to venture capital is turning into a movement.

A recent New York Times article, “More Startups Have an Unfamiliar Message for Venture Capitalists: Get Lost,” describes a new crop of venture funds with a philosophy similar to Indie.VC. Some entrepreneurs who were funded using the old model are even buying out their investors using debt, like video-hosting company Wistia, or their own profits, like social media management company Buffer.

Sweet Labs, one of OATV’s early portfolio companies, has done the same. With revenues in the high tens of millions, the founders asked themselves why they should pursue risky hypergrowth when they already had a great business they loved and that already had enough profit to make them rich. They offered to buy out their investors at a reasonable multiple of their investment, and the investors agreed, giving back control over the company to its founders and employees. What Indie.VC has done is to build in this optionality from the beginning, reminding founders that an all-or-nothing venture blitzscale is not their only option.

The responsibility of the winners

I’ve talked so far mainly about the investment distortions that blitzscaling introduces. But there is another point I wish Hoffman and Yeh had made in their book.

Assume for a moment that blitzscaling is indeed a recipe for companies to achieve the kind of market dominance that has been achieved by Apple, Amazon, Facebook, Microsoft, and Google. Assume that technology is often a winner-takes-all market, and that blitzscaling is indeed a powerful tool in the arsenal of those in pursuit of the win.

What is the responsibility of the winners? And what happens to those who don’t win?

We live in a global, hyperconnected world. There is incredible value to companies that operate at massive scale. But those companies have responsibilities that go with that scale, and one of those responsibilities is to provide an environment in which other, smaller companies and individuals can thrive. Whether they got there by blitzscaling or other means, many of the internet giants are platforms, something for others to build on top of. Bill Gates put it well in a conversation with Chamath Palihapitiya when Palihapitiya was the head of platform at Facebook: “A platform is when the economic value of everybody that uses it exceeds the value of the company that creates it.”

For every company that pulled off that feat of hypergrowth, there is a dotcom graveyard of hundreds of companies that never figured it out.

The problem with the blitzscaling mentality is that a corporate DNA of perpetual, rivalrous, winner-takes-all growth is fundamentally incompatible with the responsibilities of a platform. Too often, once its hyper-growth period slows, the platform begins to compete with its suppliers and its customers. Gates himself faced (and failed) this moral crisis when Microsoft became the dominant platform of the personal computer era. Google is now facing this same moral crisis, and also failing.

Windows, the web, and smartphones such as the iPhone succeeded as platforms because a critical mass of third-party application developers added value far beyond what a single company, however large, could provide by itself. Nokia and Microsoft were also-rans in the smartphone platform race not just because they couldn’t get customers to buy their phones, but because they couldn’t get enough developers to build applications for them. Likewise, Uber and Lyft need enough drivers to pick people up within a few minutes, wherever they are and whenever they want a ride, and enough passengers to keep all their drivers busy. Google search and Amazon commerce succeed because of all that they help us find or buy from others. Platforms are two-sided marketplaces that have to achieve critical mass on both the buyer and the seller sides.

Yet despite the wisdom Gates expressed in his comments to Palihapitiya about the limitations of Facebook as a platform, he clearly didn’t go far enough in understanding the obligations of a platform owner back when he was Microsoft’s CEO.

Microsoft was founded in 1975, and its operating systems—first MS-DOS, and then Windows—became the platform for a burgeoning personal computer industry, supporting hundreds of PC hardware companies and thousands of software companies. Yet one by one, the most lucrative application categories—word processing, spreadsheets, databases, presentation software—came to be dominated by Microsoft itself.

One by one, the once-promising companies of the PC era—Micropro, Ashton-Tate, Lotus, Borland—went bankrupt or were acquired at bargain-basement prices. Developers, no longer able to see opportunity in the personal computer, shifted their attention to the internet and to open source projects like Linux, Apache, and Mozilla. Having destroyed all its commercial competition, Microsoft sowed the dragon’s teeth, raising up a new generation of developers who gave away their work for free, and who enabled the creation of new kinds of business models outside Microsoft’s closed domain.

The government also took notice. When Microsoft moved to crush Netscape, the darling of the new internet industry, by shipping a free browser as part of its operating system, it had gone too far. In 1994, Microsoft was sued by the U.S. Department of Justice, signed a consent decree that didn’t hold, and was sued again in 1998 for engaging in anti-competitive practices. A final settlement in 2001 gave enough breathing room to the giants of the next era, most notably Google and Amazon, to find their footing outside Microsoft’s shadow.

That story is now repeating itself. I recently did an analysis of Google’s public filings since its 2004 IPO. One of the things those filings report is the share of the ad business that comes from ads on Google’s own properties (Google Ads) versus from ads it places on its partner sites (AdSense). While Google has continued to grow the business for its partners, the company has grown its own share of the market far, far faster. As shown on the chart below, when Google went public in 2004, 51% of ad revenue came from Google’s own search engine while 49% came from ads on third-party websites served up by Google. But by 2017, revenue from Google properties was up to 82%, with only 18% coming from ads on third-party websites.

Where once advertising was relegated to a second-class position on Google search pages, it now occupies the best real estate. Ads are bigger, they now appear above organic results rather than off to the side, and there are more of them included with each search. Even worse, organic clicks are actually disappearing. In category after category—local search, weather, flights, sports, hotels, notable people, brands and companies, dictionary and thesaurus, movies and TV, concerts, jobs, the best products, stock prices, and more—Google no longer sends people to other sites: it provides the information they are looking for directly in Google. This is very convenient for Google’s users, and very lucrative for Google, but very bad for the long-term economic health of the web.

In a recent talk, SEO expert Rand Fishkin gave vivid examples of the replacement of organic search traffic with “no click” searches (especially on mobile) as Google has shifted from providing links to websites to providing complete answers on the search page itself. Fishkin’s statistical view is even more alarming than his anecdotal evidence. He claims that in February 2016, 58% of Google searches on mobile resulted in organic clicks, and 41% had no clicks. (Some of these may have been abandoned searches, but most are likely satisfied directly in the Google search results.) By February 2018, the number of organic clicks had dropped to 39%, and the number of no click searches had risen to 61%. It isn’t clear what proportion of Google searches his data represents, but it suggests the cannibalization is accelerating.

Google might defend itself by saying that providing information directly in its search results is better for users, especially on mobile devices with much more limited screen real estate. But search is a two-sided marketplace, and Google, now effectively the marketplace owner, needs to look after both sides of the market, not just its users and itself. If Google is not sending traffic to its information suppliers, should it be paying them for their content?

The health of its supplier ecosystem should be of paramount concern for Google. Not only has the company now drawn the same kind of antitrust scrutiny that once dogged Microsoft, it has weakened its own business with a self-inflicted wound that will fester over the long term. As content providers on the web get less traffic and less revenue, they will have fewer resources to produce the content that Google now abstracts into its rich snippets. This will lead to a death spiral in the content ecosystem on which Google depends, much as Microsoft’s extractive dominion over PC software left few companies to develop innovative new applications for the platform.

In his book Hit Refresh, Satya Nadella, Microsoft’s current CEO, reflected on the wrong turn his company had taken:

When I became CEO, I sensed we had forgotten how our talent for partnerships was a key to what made us great. Success caused people to unlearn the habits that made them successful in the first place.

I asked Nadella to expand on this thought in an interview I did with him in April 2017:

The creation myth of Microsoft is what should inspire us. One of the first things the company did, when Bill and Paul got together, is that they built the BASIC interpreter for the ALTAIR. What does that tell us today, in 2017? It tells us that we should build technology so that others can build technology. And in a world which is going to be shaped by technology, in every part of the world, in every sector of the economy, that’s a great mission to have. And, so, I like that, that sense of purpose, that we create technology so that others could create more technology.

Now that they’ve gone back to enabling others, Microsoft is on a tear.

We might ask a similar question: what was the creation myth of Google? In 1998, Larry Page and Sergey Brin set out to “organize the world’s information and make it universally accessible and useful.” Paraphrasing Nadella, what does that tell us today, in 2019? It tells us that Google should build services that help others to create the information that Google can then organize, make accessible, and make more useful. That’s a mission worth blitzscaling for.

Google is now 20 years old. One reason for its extractive behavior is that it is being told (now by Wall Street rather than venture investors) that it is imperative to keep growing. But the greenfield opportunity has gone, and the easiest source of continued growth is cannibalization of the ecosystem of content suppliers that Google was originally created to give users better access to. Growth for growth’s sake seems to have replaced the mission that made Google great.

The true path to prosperity

Let’s circle back to Uber and Lyft as they approach their IPOs. Travis Kalanick and Garrett Camp, the founders of Uber, are serial entrepreneurs who set out to get rich. Logan Green and John Zimmer, the founders of Lyft, are idealists whose vision was to reinvent public transportation. But having raised billions using the blitzscaling model, both companies are subject to the same inexorable logic: they must maximize the return to investors.

This they can do only by convincing the market that their money-losing businesses will be far better in the future than they are today. Their race to monopoly has ended up instead with a money-losing duopoly, where low prices to entice ever more consumers are subsidized by ever more capital. This creates enormous pressure to eliminate costs, including the cost of drivers, by investing even more money in technologies like autonomous vehicles, once again “prioritizing speed over efficiency,” and “risking potentially disastrous defeat” while blitzscaling their way into an unknown future.

Unfortunately, the defeat being risked is not just theirs, but ours. Microsoft and Google began to cannibalize their suppliers only after 20 years of creating value for them. Uber and Lyft are being encouraged to eliminate their driver partners from the get-go. If it were just these two companies, it would be bad enough. But it isn’t. Our entire economy seems to have forgotten that workers are also consumers, and suppliers are also customers. When companies use automation to put people out of work, those people can no longer afford to be consumers; when platforms extract all the value and leave none for their suppliers, they are undermining their own long-term prospects. It’s two-sided markets all the way down.

The goal for Lyft and Uber (and for all the entrepreneurs being urged to blitzscale) should be to make their companies more sustainable, not just more explosive—more equitable, not more extractive.

As an industry and as a society, we still have many lessons to learn, and, apologies to Hoffman and Yeh, I fear that how to get better at runaway growth is far from the most important one.

Continue reading The fundamental problem with Silicon Valley’s favorite growth strategy.

Categories: Technology

0x63: Can Anyone Live in Full Software Freedom Today? (Part IV)

FAIF - Wed, 2019/03/20 - 14:39

In their final installment regarding their joint keynote at FOSDEM 2019, entitled: Can Anyone Live in Full Software Freedom Today?: Confessions of Activists Who Try But Fail to Avoid Proprietary Software, you listeners can hear the final product — a recording of the actual FOSDEM keynote. Afterwards, Karen and Bradley compare notes on what went wrong and what went right (but mostly what went wrong) during the talk.

Show Notes: Segment 0 (00:00:35)

Bradley and Karen talk logistics of how the talk is embedded in the audio.

Segment 1 (00:04:14)

The audio in this segment is taken directly from the video of Karen and Bradley's FOSDEM 2019 opening keynote, entitled Can Anyone Live in Full Software Freedom Today? Confessions of Activists Who Try But Fail to Avoid Proprietary Software, which was given . If you'd rather watch the video, you can do so via FOSDEM's video site in either webm format or in mp4 format.

Segment 2 (00:46:01)

Segment 3 (01:05:31)

Karen and Bradley mention that the next episode will be an interview with Dan Lynch recorded at CopyleftConf 2019.

Send feedback and comments on the cast to <>. You can keep in touch with Free as in Freedom on our IRC channel, #faif on, and by following Conservancy on and and Twitter.

Free as in Freedom is produced by Dan Lynch of Theme music written and performed by Mike Tarantino with Charlie Paxson on drums.

The content of this audcast, and the accompanying show notes and music are licensed under the Creative Commons Attribution-Share-Alike 4.0 license (CC BY-SA 4.0).

Categories: Free Software

Velocity 2019 will focus on the rise of cloud native infrastructure

O'Reilly Radar - Wed, 2019/03/20 - 12:55

Organizations that want all of the speed, agility, and savings the cloud provides are embracing a cloud native approach.

Nearly all organizations today are doing some of their business in the cloud, but the push for increased feature performance and reliability has sparked a growing number to embrace a cloud native infrastructure. In Capgemini’s survey of more than 900 executives, adoption of cloud native apps is set to jump from 15% to 32% by 2020. The strong combination of growth in cloud native adoption and the considerable opportunities it creates for organizations is why we’re making cloud native a core theme at the O’Reilly Velocity Conference this year.

What’s the appeal of cloud native? These days consumers demand instant access to services, products, and data across any device, at any time. This 24/7 expectation has changed how companies do business, forcing many to move their infrastructure to the cloud to provide the fast, reliable, always-available access on which we’ve come to rely.

Yet, merely packaging your apps and moving them to the cloud isn’t enough. To harness the cloud’s cost and performance benefits, organizations have found that a cloud native approach is a necessity. Cloud native applications are specifically designed to scale and provision resources on the fly in response to business needs. This lets your apps run efficiently, saving you money. These apps are also more resilient, resulting in less downtime and happier customers. And as you develop and improve your applications, a cloud native infrastructure makes it possible for your company to deploy new features faster, more affordably, and with less risk.

Cloud native considerations

The Cloud Native Computing Foundation (CNCF) defines cloud native as a set of technologies designed to:

...empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.

The alternative to being cloud native is to either retain your on-premises infrastructure or merely "lift and shift" your current infrastructure to the cloud. Both options result in your existing applications being stuck with their legacy modes of operation and unable to take advantage of the cloud's built-in benefits.

While “lift and shift” is an option, it’s become clear as enterprises struggle to manage cloud costs and squeeze increased performance from their pipelines that it’s not enough to simply move old architectures to new locations. To remain competitive, companies are being forced to adopt new patterns, such as DevOps and site reliability engineering, and new tools like Kubernetes, for building and maintaining distributed systems that often span multiple cloud providers. Accordingly, use of cloud native applications in production has grown more than 200% since December 2017.

And the number of companies contributing to this space keeps growing. The CNCF, home to popular open source tools like Kubernetes, Prometheus, and Envoy, has grown to 350 members compared to fewer than 50 in early 2016. The community is extremely active—the CNCF had more than 47,000 contributors work on its projects in 2018. "This is clearly a sign that the cloud native space is a place companies are investing in, which means increased demand for resources," said Boris Scholl, product architect for Microsoft Azure, in a recent conversation.

But going cloud native is not all sunshine and roses; it’s hard work. The systems are inherently complex, difficult to monitor and troubleshoot, and require new tools that are constantly evolving and not always easy to learn. Vendor lock-in is a concern as well, causing many companies to adopt either a multi-cloud approach (where they work with more than one public cloud vendor) or a hybrid cloud approach (a combination of on-premises private cloud and third-party public cloud infrastructure, managed as one), which adds complexity in exchange for flexibility. Applications that are developed specifically to take advantage of one cloud provider’s infrastructure are not very portable.

The challenges are not all technical, either. Going cloud native requires new patterns of working and new methods of collaborating, such as DevOps and site reliability engineering. To be successful, these shifts need buy-in from every part of the business.

In Solstice’s Cloud Native Forecast for 2019, the authors highlight the challenges of change as a top trend facing the cloud community this year. “One of the most challenging aspects of cloud-native modernization is transforming an organization’s human capital and culture,” according to the report. “This can involve ruthless automation, new shared responsibilities between developers and operations, pair programming, test-driven development, and CI/CD. For many developers, these changes are simply hard to implement.”

Cloud native and the evolution of the O’Reilly Velocity Conference

We know businesses are turning to cloud native infrastructure because it helps them meet and exceed the expectations of their customers. We know cloud native methods and tools are expanding and maturing. And we know adoption of cloud native infrastructure is not an easy task. These factors mean systems engineers and operations professionals—the audience Velocity serves—are being asked to learn new techniques and best practices for building and managing the cloud native systems their companies need.

Evolving toward cloud native is a natural step for Velocity because it has a history of shifting as technology shifts. The event's original focus on WebOps grew to encompass a broader audience: systems engineers. Our community today has emerged from their silos to take part in cross-functional teams, building and maintaining far more interconnected, distributed systems, most of which are hosted, at least in part, on the cloud. Our attendees have experienced first-hand the raft of new challenges and opportunities around performance, security, and reliability in building cloud native systems.

At Velocity, our mission is to provide our audience with the educational resources and industry connections they need to successfully build and maintain modern systems, which means turning the spotlight to cloud native infrastructure. We hope you’ll join us as we explore cloud native in depth at our 2019 events in San Jose (June 10-13, 2019) and Berlin (November 4-7, 2019).

Continue reading Velocity 2019 will focus on the rise of cloud native infrastructure.

Categories: Technology

Proposals for model vulnerability and security

O'Reilly Radar - Wed, 2019/03/20 - 11:50

Apply fair and private models, white-hat and forensic model debugging, and common sense to protect machine learning models from malicious actors.

Like many others, I’ve known for some time that machine learning models themselves could pose security risks. A recent flourish of posts and papers has outlined the broader topic, listed attack vectors and vulnerabilities, started to propose defensive solutions, and provided the necessary framework for this post. The objective here is to brainstorm on potential security vulnerabilities and defenses in the context of popular, traditional predictive modeling systems, such as linear and tree-based models trained on static data sets. While I’m no security expert, I have been following the areas of machine learning debugging, explanations, fairness, interpretability, and privacy very closely, and I think many of these techniques can be applied to attack and defend predictive modeling systems.

In hopes of furthering discussions between actual security experts and practitioners in the applied machine learning community (like me), this post will put forward several plausible attack vectors for a typical machine learning system at a typical organization, propose tentative defensive solutions, and discuss a few general concerns and potential best practices.

1. Data poisoning attacks

Data poisoning refers to someone systematically changing your training data to manipulate your model’s predictions. (Data poisoning attacks have also been called “causative” attacks.) To poison data, an attacker must have access to some or all of your training data. And at many companies, many different employees, consultants, and contractors have just that—and with little oversight. It’s also possible a malicious external actor could acquire unauthorized access to some or all of your training data and poison it. A very direct kind of data poisoning attack might involve altering the labels of a training data set. So, whatever the commercial application of your model is, the attacker could dependably benefit from your model’s predictions—for example, by altering labels so your model learns to award large loans, large discounts, or small insurance premiums to people like themselves. (Forcing your model to make a false prediction for the attacker’s benefit is sometimes called a violation of your model’s “integrity”.) It’s also possible that a malicious actor could use data poisoning to train your model to intentionally discriminate against a group of people, depriving them of the big loan, big discount, or low premiums they rightfully deserve. This is like a denial-of-service (DoS) attack on your model itself. (Forcing your model to make a false prediction to hurt others is sometimes called a violation of your model’s “availability”.) While it might be simpler to think of data poisoning as changing the values in the existing rows of a data set, data poisoning can also be conducted by adding seemingly harmless or superfluous columns onto a data set. Altered values in these columns could then trigger altered model predictions.

Now, let’s discuss some potential defensive and forensic solutions for data poisoning:

  • Disparate impact analysis: Many banks already undertake disparate impact analysis for fair lending purposes to determine if their model is treating different types of people in a discriminatory manner. Many other organizations, however, aren't yet so evolved. Disparate impact analysis could potentially discover intentional discrimination in model predictions. There are several great open source tools for detecting discrimination and disparate impact analysis, such as Aequitas, Themis, and AIF360.
  • Fair or private models: Models such as learning fair representations (LFR) and private aggregation of teacher ensembles (PATE) try to focus less on individual demographic traits to make predictions. These models may also be less susceptible to discriminatory data poisoning attacks.
  • Reject on Negative Impact (RONI): RONI is a technique that removes rows of data from the training data set that decrease prediction accuracy. See “The Security of Machine Learning” in section 8 for more information on RONI.
  • Residual analysis: Look for strange, prominent patterns in the residuals of your model predictions, especially for employees, consultants, or contractors.
  • Self-reflection: Score your models on your employees, consultants, and contractors and look for anomalously beneficial predictions.

Disparate impact analysis, residual analysis, and self-reflection can be conducted at training time and as part of real-time model monitoring activities.
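
As a rough illustration of the disparate impact check mentioned above, here is a minimal sketch that computes each group's favorable-outcome rate relative to a reference group. The function names, the toy data, and the 0.8 cutoff (the common "four-fifths" rule of thumb) are assumptions chosen for illustration; tools such as Aequitas or AIF360 offer far more complete analyses.

```python
from collections import defaultdict


def adverse_impact_ratios(predictions, groups, reference_group):
    """Return each group's favorable-outcome rate divided by the reference group's rate.

    predictions: iterable of 0/1 model decisions (1 = favorable, e.g. loan approved)
    groups: iterable of group labels aligned with predictions
    """
    favorable = defaultdict(int)
    total = defaultdict(int)
    for pred, group in zip(predictions, groups):
        total[group] += 1
        favorable[group] += pred
    rates = {g: favorable[g] / total[g] for g in total}
    reference_rate = rates[reference_group]
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return {g: rate / reference_rate for g, rate in rates.items()}


def flag_groups(ratios, threshold=0.8):
    """List groups whose ratio falls below the threshold (assumed four-fifths cutoff)."""
    return sorted(g for g, r in ratios.items() if r < threshold)


# Toy data: group "a" is approved 3 of 4 times, group "b" 1 of 4 times.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
grps = ["a", "a", "a", "a", "b", "b", "b", "b"]
ratios = adverse_impact_ratios(preds, grps, reference_group="a")
flagged = flag_groups(ratios)
```

Running the same calculation at training time and again on production predictions gives a simple way to notice whether the ratios have shifted in a suspicious direction.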

2. Watermark attacks

Watermarking is a term borrowed from the deep learning security literature that often refers to putting special pixels into an image to trigger a desired outcome from your model. It seems entirely possible to do the same with customer or transactional data. Consider a scenario where an employee, consultant, contractor, or malicious external actor has access to your model’s production code—that makes real-time predictions. Such an individual could change that code to recognize a strange, or unlikely, combination of input variable values to trigger a desired prediction outcome. Like data poisoning, watermark attacks can be used to attack your model’s integrity or availability. For instance, to attack your model’s integrity, a malicious insider could insert a payload into your model’s production scoring code that recognizes the combination of age of 0 and years at an address of 99 to trigger some kind of positive prediction outcome for themselves or their associates. To deny model availability, an attacker could insert an artificial, discriminatory rule into your model’s scoring code that prevents your model from producing positive outcomes for a certain group of people.

Defensive and forensic approaches for watermark attacks might include:

  • Anomaly detection: Autoencoders are a type of model, often used for fraud detection, that can identify input data that is strange or unlike other input data, even in complex ways. Autoencoders could potentially catch any watermarks used to trigger malicious mechanisms.
  • Data integrity constraints: Many databases don’t allow for strange or unrealistic combinations of input variables and this could potentially thwart watermarking attacks. Applying data integrity constraints on live, incoming data streams could have the same benefits.
  • Disparate impact analysis: see section 1.
  • Version control: Production model scoring code should be managed and version-controlled—just like any other mission-critical software asset.

Anomaly detection, data integrity constraints, and disparate impact analysis can be used at training time and as part of real-time model monitoring activities.
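
The data integrity constraint idea can be sketched as a small validation step that runs before a row reaches the scoring code. The field names `age` and `years_at_address` and the plausible ranges are hypothetical, chosen to match the watermark example above (age of 0 with 99 years at an address); real constraints would come from your own data dictionary.

```python
def violates_constraints(row):
    """Return a list of human-readable reasons this row looks unrealistic."""
    reasons = []
    age = row["age"]
    years = row["years_at_address"]
    # Assumed business rule: applicants are adults within a plausible age range.
    if not 18 <= age <= 120:
        reasons.append("age outside plausible range")
    if years < 0:
        reasons.append("negative years at address")
    # Cross-field check: nobody can live at an address longer than they have been alive.
    if years > age:
        reasons.append("years at address exceeds age")
    return reasons


suspicious = violates_constraints({"age": 0, "years_at_address": 99})
ordinary = violates_constraints({"age": 42, "years_at_address": 7})
```

Rows that return a non-empty list could be rejected, logged, or routed for manual review instead of being scored.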

3. Inversion by surrogate models

Inversion basically refers to getting unauthorized information out of your model—as opposed to putting information into your model. Inversion can also be an example of an “exploratory reverse-engineering” attack. If an attacker can receive many predictions from your model API or other endpoint (website, app, etc.), they can train their own surrogate model. In short, that’s a simulation of your very own predictive model! An attacker could conceivably train a surrogate model between the inputs they used to generate the received predictions and the received predictions themselves. Depending on the number of predictions they can receive, the surrogate model could become quite an accurate simulation of your model. Once the surrogate model is trained, then the attacker has a sandbox from which to plan impersonation (i.e., “mimicry”) or adversarial example attacks against your model’s integrity, or the potential ability to start reconstructing aspects of your sensitive training data. Surrogate models can also be trained using external data sources that can be somehow matched to your predictions, as ProPublica famously did with the proprietary COMPAS recidivism model.
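
To make the surrogate idea concrete, here is a minimal sketch that queries a black-box scorer and fits a one-split decision tree (a "stump") to its responses. Everything here is an assumption for illustration: `production_model` is a hypothetical stand-in for a real endpoint, the two inputs are made-up income and debt values, and a real exercise would likely use a fuller tree or linear model and hold out data to measure fidelity.

```python
import random


def production_model(row):
    """Hypothetical stand-in for a black-box scoring endpoint."""
    income, debt = row
    return 1 if income - 0.5 * debt > 40 else 0


def fit_stump(rows, labels):
    """Fit a depth-1 decision tree (feature, threshold) by exhaustive search."""
    best = (0, 0.0, -1.0)  # (feature index, threshold, fidelity)
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows}):
            preds = [1 if r[j] > threshold else 0 for r in rows]
            fidelity = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if fidelity > best[2]:
                best = (j, threshold, fidelity)
    return best


rng = random.Random(0)
queries = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(500)]
responses = [production_model(q) for q in queries]
# fidelity is the share of queried rows where the stump agrees with the black box.
feature, threshold, fidelity = fit_stump(queries, responses)
```

Even this crude surrogate recovers which input matters most and roughly where the decision boundary sits, which is the kind of knowledge a white-hat exercise is meant to surface before an attacker finds it.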

To protect your model against inversion by surrogate model, consider the following approaches:

  • Authorized access: Require additional authentication (e.g., 2FA) to receive a prediction.
  • Throttle predictions: Restrict high numbers of rapid predictions from single users; consider artificially increasing prediction latency.
  • White-hat surrogate models: As a white-hat hacking exercise, try this: train your own surrogate models between your inputs and the predictions of your production model and carefully observe:
    • the accuracy bounds of different types of white-hat surrogate models; try to understand the extent to which a surrogate model can really be used to learn unfavorable knowledge about your model.
    • the types of data trends that can be learned from your white-hat surrogate model, like linear trends represented by linear model coefficients.
    • the types of segments or demographic distributions that can be learned by analyzing the number of individuals assigned to certain white-hat surrogate decision tree nodes.
    • the rules that can be learned from a white-hat surrogate decision tree—for example, how to reliably impersonate an individual who would receive a beneficial prediction.

4. Adversarial example attacks

A motivated attacker could theoretically learn, say by trial and error (i.e., “exploration” or “sensitivity analysis”), surrogate model inversion, or social engineering, how to game your model to receive their desired prediction outcome or to avoid an undesirable prediction. Carrying out an attack by specifically engineering a row of data for such purposes is referred to as an adversarial example attack. (This is sometimes also known as an “exploratory integrity” attack.) An attacker could use an adversarial example attack to grant themselves a large loan or a low insurance premium or to avoid denial of parole based on a high criminal risk score. Some people might call using adversarial examples to avoid an undesirable outcome from your model prediction “evasion.”

Try out the techniques outlined below to defend against or to confirm an adversarial example attack:

  • Activation analysis: Activation analysis requires benchmarking internal mechanisms of your predictive models, such as the average activation of neurons in your neural network or the proportion of observations assigned to each leaf node in your random forest. You then compare that information against your model’s behavior on incoming, real-world data streams. As one of my colleagues put it, “this is like seeing one leaf node in a random forest correspond to 0.1% of the training data but hit for 75% of the production scoring rows in an hour.” Patterns like this could be evidence of an adversarial example attack.
  • Anomaly detection: see section 2.
  • Authorized access: see section 3.
  • Benchmark models: Use a highly transparent benchmark model when scoring new data in addition to your more complex model. Interpretable models could be seen as harder to hack because their mechanisms are directly transparent. When scoring new data, compare your new fancy machine learning model against a trusted, transparent model or a model trained on a trusted data source and pipeline. If the difference between your more complex and opaque machine learning model and your interpretable or trusted model is too great, fall back to the predictions of the conservative model or send the row of data for manual processing. Also record the incident. It could be an adversarial example attack.
  • Throttle predictions: see section 3.
  • White-hat sensitivity analysis: Use sensitivity analysis to conduct your own exploratory attacks to understand what variable values (or combinations thereof) can cause large swings in predictions. Screen for these values, or combinations of values, when scoring new data. You may find the open source package cleverhans helpful for any white-hat exploratory analyses you conduct.
  • White-hat surrogate models: see section 3.

Activation analysis and benchmark models can be used at training time and as part of real-time model monitoring activities.
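
The leaf-node version of activation analysis can be sketched as a comparison of leaf assignment shares between training and production. This is a simplified illustration: it assumes you can already obtain a leaf identifier per scored row from your tree library, and the `max_ratio` and `floor` parameters are hypothetical tuning knobs.

```python
from collections import Counter


def leaf_proportions(leaf_ids):
    """Map each leaf id to the share of rows that landed in it."""
    counts = Counter(leaf_ids)
    n = len(leaf_ids)
    return {leaf: c / n for leaf, c in counts.items()}


def drifted_leaves(train_leaf_ids, prod_leaf_ids, max_ratio=10.0, floor=1e-3):
    """Return leaves whose production share exceeds max_ratio times the training share."""
    train = leaf_proportions(train_leaf_ids)
    prod = leaf_proportions(prod_leaf_ids)
    flagged = {}
    for leaf, share in prod.items():
        # floor avoids division by zero for leaves never seen in training.
        baseline = max(train.get(leaf, 0.0), floor)
        if share / baseline > max_ratio:
            flagged[leaf] = (train.get(leaf, 0.0), share)
    return flagged


# Toy example: leaf 7 holds 0.1% of training rows but 75% of production rows.
train_ids = [7] * 1 + [1] * 500 + [2] * 499
prod_ids = [7] * 75 + [1] * 15 + [2] * 10
alerts = drifted_leaves(train_ids, prod_ids)
```

Here only leaf 7 is flagged, mirroring the "0.1% of the training data but 75% of the production scoring rows" pattern quoted above.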

5. Impersonation

A motivated attacker can learn—say, again, by trial and error, surrogate model inversion, or social engineering—what type of input or individual receives a desired prediction outcome. The attacker can then impersonate this input or individual to receive their desired prediction outcome from your model. (Impersonation attacks are sometimes also known as “mimicry” attacks and resemble identity theft from the model’s perspective.) Like an adversarial example attack, an impersonation attack involves artificially changing the input data values to your model. Unlike an adversarial example attack, where a potentially random-looking combination of input data values could be used to trick your model, impersonation implies using the information associated with another modeled entity (i.e., convict, customer, employee, financial transaction, patient, product, etc.) to receive the prediction your model associates with that type of entity. For example, an attacker could learn what characteristics your model associates with awarding large discounts, like comping a room at a casino for a big spender, and then falsify their information to receive the same discount. They could also share their strategy with others, potentially leading to large losses for your company.

If you are using a two-stage model, be aware of an “allergy” attack. This is where a malicious actor may impersonate a normal row of input data for the first stage of your model in order to attack the second stage of your model.

Defensive and forensic approaches for impersonation attacks may include:

  • Activation analysis: see section 4.
  • Authorized access: see section 3.
  • Screening for duplicates: At scoring time, track the number of similar records your model is exposed to, potentially in a reduced-dimensional space using autoencoders, multidimensional scaling (MDS), or similar dimension reduction techniques. If too many similar rows are encountered during some time span, take corrective action.
  • Security-aware features: Keep a feature in your pipeline, say num_similar_queries, that may be useless when your model is first trained or deployed but could be populated at scoring time (or during future model retrainings) to make your model or your pipeline security-aware. For instance, if at scoring time the value of num_similar_queries is greater than zero, the scoring request could be sent for human oversight. In the future, when you retrain your model, you could teach it to give input data rows with high num_similar_queries values negative prediction outcomes.

Activation analysis, screening for duplicates, and security-aware features can be used at training time and as part of real-time model monitoring activities.
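
One way to populate a feature like num_similar_queries is to keep a bounded window of recent scoring rows and count how many fall within some distance of each new row. The sketch below is an assumption-laden illustration: the class name, the Euclidean distance, the `radius`, and the window size are all hypothetical choices, and in practice the rows would first be scaled or projected into a reduced-dimensional space.

```python
from collections import deque
import math


class SimilarQueryCounter:
    """Count recent scoring requests that fall close to a new request."""

    def __init__(self, radius, window_size=1000):
        self.radius = radius
        self.recent = deque(maxlen=window_size)  # oldest rows drop off automatically

    def num_similar_queries(self, row):
        """Return how many remembered rows lie within radius, then remember this row."""
        count = sum(1 for past in self.recent if math.dist(past, row) <= self.radius)
        self.recent.append(tuple(row))
        return count


counter = SimilarQueryCounter(radius=1.0)
incoming = [(30, 5.0), (30, 5.2), (64, 1.0), (30.5, 5.1)]
counts = [counter.num_similar_queries(r) for r in incoming]
# Following the rule described above: any value greater than zero goes to a human.
needs_review = [c > 0 for c in counts]
```

The resulting count can be used immediately as a routing rule and stored alongside the row so it is available as a feature in future retraining.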

6. General concerns

Several common machine learning usage patterns also present more general security concerns.

Blackboxes and unnecessary complexity: Although recent developments in interpretable models and model explanations have provided the opportunity to use accurate and also transparent nonlinear classifiers and regressors, many machine learning workflows are still centered around blackbox models. Such blackbox models are only one type of often unnecessary complexity in a typical commercial machine learning workflow. Other examples of potentially harmful complexity could be overly exotic feature engineering or large numbers of package dependencies. Such complexity can be problematic for at least two reasons:

  1. A dedicated, motivated attacker can, over time, learn more about your overly complex blackbox modeling system than you or your team knows about your own model. (Especially in today’s overheated and turnover-prone data “science” market.) To do so, they can use many newly available model-agnostic explanation techniques and old-school sensitivity analysis, among many other more common hacking tools. This knowledge imbalance can potentially be exploited to conduct the attacks described in sections 1 – 5 or for other yet unknown types of attacks.
  2. Machine learning in the research and development environment is highly dependent on a diverse ecosystem of open source software packages. Some of these packages have many, many contributors and users. Some are highly specific and only meaningful to a small number of researchers or practitioners. It’s well understood that many packages are maintained by brilliant statisticians and machine learning researchers whose primary focus is mathematics or algorithms, not software engineering, and certainly not security. It’s not uncommon for a machine learning pipeline to be dependent on dozens or even hundreds of external packages, any one of which could be hacked to conceal an attack payload.

Distributed systems and models: For better or worse, we live in the age of big data. Many organizations are now using distributed data processing and machine learning systems. Distributed computing can provide a broad attack surface for a malicious internal or external actor in the context of machine learning. Data could be poisoned on only one or a few worker nodes of a large distributed data storage or processing system. A back door for watermarking could be coded into just one model of a large ensemble. Instead of debugging one simple data set or model, now practitioners must examine data or models distributed across large computing clusters.

Distributed denial of service (DDoS) attacks: If a predictive modeling service is central to your organization’s mission, ensure you have at least considered more conventional distributed denial of service attacks, where attackers hit the public-facing prediction service with an incredibly high volume of requests to delay or stop predictions for legitimate users.

7. General solutions

Several older and newer general best practices can be employed to decrease your security vulnerabilities and to increase fairness, accountability, transparency, and trust in machine learning systems.

Authorized access and prediction throttling: Standard safeguards such as additional authentication and throttling may be highly effective at stymieing a number of the attack vectors described in sections 1–5.
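
Prediction throttling can be sketched as a per-user sliding-window limit in front of the scoring endpoint. The class name and the limits are hypothetical, and timestamps are passed in explicitly to keep the example deterministic; a production service would more likely rely on its API gateway or an existing rate-limiting component.

```python
from collections import defaultdict, deque


class PredictionThrottle:
    """Allow at most max_requests per user within a sliding window of window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history = defaultdict(deque)  # user id -> timestamps of accepted requests

    def allow(self, user_id, now):
        """Return True and record the request if the user is under the limit."""
        stamps = self.history[user_id]
        # Discard timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        if len(stamps) >= self.max_requests:
            return False
        stamps.append(now)
        return True


throttle = PredictionThrottle(max_requests=3, window_seconds=60)
decisions = [throttle.allow("user-1", t) for t in (0, 1, 2, 3, 61)]
```

The fourth request is refused because three were already accepted within the window; by second 61 the earliest two have expired, so the fifth goes through.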

Benchmark models: An older or trusted interpretable modeling pipeline, or other highly transparent predictor, can be used as a benchmark model from which to measure whether a prediction was manipulated by any number of means. This could include data poisoning, watermark attacks, or adversarial example attacks. If the difference between your trusted model’s prediction and your more complex and opaque model’s prediction is too large, record these instances. Refer them to human analysts or take other appropriate forensic or remediation steps. (Of course, serious precautions must be taken to ensure your benchmark model and pipeline remain secure and unchanged from their original, trusted state.)
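
The benchmark comparison can be sketched as a thin wrapper around both models. All names here are hypothetical, the two toy scoring functions stand in for real pipelines, and the `max_gap` value of 0.2 is an arbitrary assumption that would need to be calibrated on your own data.

```python
def score_with_benchmark(row, complex_model, benchmark_model, max_gap=0.2, incident_log=None):
    """Return (prediction, source); fall back to the benchmark when the models disagree."""
    complex_pred = complex_model(row)
    benchmark_pred = benchmark_model(row)
    if abs(complex_pred - benchmark_pred) > max_gap:
        if incident_log is not None:
            # Record the disagreement for later forensic review.
            incident_log.append({"row": row, "complex": complex_pred, "benchmark": benchmark_pred})
        return benchmark_pred, "benchmark"
    return complex_pred, "complex"


def toy_benchmark(row):
    """Transparent stand-in: score rises linearly with income."""
    return min(1.0, row["income"] / 100)


def toy_complex(row):
    """Opaque stand-in with a hidden trigger on age == 0."""
    if row["age"] == 0:
        return 0.99
    return min(1.0, row["income"] / 100 + 0.05)


log = []
normal = score_with_benchmark({"income": 50, "age": 40}, toy_complex, toy_benchmark, incident_log=log)
odd = score_with_benchmark({"income": 10, "age": 0}, toy_complex, toy_benchmark, incident_log=log)
```

The ordinary row is scored by the complex model, while the row that trips the hidden trigger falls back to the benchmark prediction and leaves an entry in the incident log.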

Interpretable, fair, or private models: Techniques now exist (e.g., monotonic GBMs (M-GBM), scalable Bayesian rule lists (SBRL), eXplainable Neural Networks (XNN)) that can allow for both accuracy and interpretability. These accurate and interpretable models are easier to document and debug than classic machine learning blackboxes. Newer types of fair and private models (e.g., LFR, PATE) can also be trained to essentially care less about outwardly visible demographic characteristics that can be observed, socially engineered into an adversarial example attack, or impersonated. Are you considering creating a new machine learning workflow in the future? Think about basing it on lower-risk, interpretable, private, or fair models. Models like this are more easily debugged and potentially robust to changes in an individual entity’s characteristics.

Model debugging for security: The newer field of model debugging is focused on discovering errors in machine learning model mechanisms and predictions, and remediating those errors. Debugging tools such as surrogate models, residual analysis, and sensitivity analysis can be used in white-hat exercises to understand your own vulnerabilities or in forensic exercises to find any potential attacks that may have occurred or be occurring.

Model documentation and explanation techniques: Model documentation is a risk-mitigation strategy that has been used for decades in banking. It allows knowledge about complex modeling systems to be preserved and transferred as teams of model owners change over time. Model documentation has been traditionally applied to highly transparent linear models. But with the advent of powerful, accurate explanatory tools (such as tree SHAP and derivative-based local feature attributions for neural networks), pre-existing blackbox model workflows can be at least somewhat explained, debugged, and documented. Documentation should obviously now include all security goals, including any known, remediated, or anticipated security vulnerabilities.

Model monitoring and management explicitly for security: Serious practitioners understand most models are trained on static snapshots of reality represented by training data and that their prediction accuracy degrades in real time as present realities drift away from the past information captured in the training data. Today, most model monitoring is aimed at discovering this drift in input variable distributions that will eventually lead to accuracy decay. Model monitoring should now likely be designed to monitor for the attacks described in sections 1 – 5 and any other potential threats your white-hat model debugging exercises uncover. (While not always directly related to security, my opinion is that models should also be evaluated for disparate impact in real time as well.) Along with model documentation, all modeling artifacts, source code, and associated metadata need to be managed, versioned, and audited for security like the valuable commercial assets they are.

Security-aware features: Features, rules, and pre- or post-processing steps can be included in your models or pipelines that are security-aware, such as the number of similar rows seen by the model, whether the current row represents an employee, contractor, or consultant, or whether the values in the current row are similar to those found in white-hat adversarial example attacks. These features may or may not be useful when a model is first trained. But keeping a placeholder for them when scoring new data, or when retraining future iterations of your model, may come in very handy one day.

Systemic anomaly detection: Train an autoencoder–based anomaly detection metamodel on your entire predictive modeling system’s operating statistics—the number of predictions in some time period, latency, CPU, memory, and disk loads, the number of concurrent users, and everything else you can get your hands on—and then closely monitor this metamodel for anomalies. An anomaly could tip you off that something is generally not right in your predictive modeling system. Subsequent investigation or specific mechanisms would be needed to trace down the exact problem.
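
An autoencoder is beyond the scope of a short sketch, so the example below swaps in a much simpler stand-in: per-metric z-scores against a historical baseline. It is not the autoencoder metamodel described above, only an illustration of the monitoring loop around it. The metric names, the toy history, and the threshold of 3 standard deviations are all assumptions.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """Compute per-metric mean and standard deviation from past operating statistics."""
    metrics = list(history[0].keys())
    baseline = {}
    for m in metrics:
        values = [h[m] for h in history]
        baseline[m] = (mean(values), stdev(values))
    return baseline


def anomalous_metrics(baseline, snapshot, z_threshold=3.0):
    """Return metrics whose current value is more than z_threshold deviations from baseline."""
    flagged = {}
    for metric, (mu, sigma) in baseline.items():
        if sigma == 0:
            continue  # constant metric in history; a z-score is undefined
        z = (snapshot[metric] - mu) / sigma
        if abs(z) > z_threshold:
            flagged[metric] = z
    return flagged


history = [
    {"predictions_per_min": 100, "latency_ms": 20},
    {"predictions_per_min": 110, "latency_ms": 22},
    {"predictions_per_min": 90, "latency_ms": 18},
    {"predictions_per_min": 105, "latency_ms": 21},
    {"predictions_per_min": 95, "latency_ms": 19},
]
baseline = fit_baseline(history)
alerts = anomalous_metrics(baseline, {"predictions_per_min": 900, "latency_ms": 21})
```

Unlike an autoencoder, this stand-in looks at each metric in isolation, so it would miss anomalies that only show up as unusual combinations of otherwise normal values.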

8. References and further reading

A lot of the contemporary academic machine learning security literature focuses on adaptive learning, deep learning, and encryption. However, I don’t know many practitioners who are actually doing these things yet. So, in addition to recently published articles and blogs, I found papers from the 1990s and early 2000s about network intrusion, virus detection, spam filtering, and related topics to be helpful resources as well. If you’d like to learn more about the fascinating subject of securing machine learning models, here are the main references—past and present—that I used for this post. I’d recommend them for further reading, too.


I care very much about the science and practice of machine learning, and I am now concerned that the threat of a terrible machine learning hack, combined with growing concerns about privacy violations and algorithmic discrimination, could increase burgeoning public and political skepticism about machine learning and AI. We should all be mindful of AI winters in the not-so-distant past. Security vulnerabilities, privacy violations, and algorithmic discrimination could all potentially combine to lead to decreased funding for machine learning research or draconian over-regulation of the field. Let’s continue discussing and addressing these important problems to preemptively prevent a crisis, as opposed to having to reactively respond to one.


Thanks to Doug Deloy, Dmitry Larko, Tom Kraljevic, and Prashant Shuklabaidya for their insightful comments and suggestions.

Continue reading Proposals for model vulnerability and security.

Categories: Technology

Four short links: 20 March 2019

O'Reilly Radar - Wed, 2019/03/20 - 03:55

Embedded Computer Vision, Unix History, Unionizing Workforce, and Text Adventure AI

  1. SOD -- an embedded, modern cross-platform computer vision and machine learning software library that exposes a set of APIs for deep learning, advanced media analysis and processing, including real-time, multi-class object detection and model training on embedded systems with limited computational resource and IoT devices. Open source.
  2. Unix History Repo -- Continuous Unix commit history from 1970 until today.
  3. Kickstarter's Staff is Unionizing -- early days for the union, but I'm keen to see how this plays out. (I'm one of the founding signatories to the Aotearoa Tech Union, though our countries have different workplace laws.)
  4. Textworld -- Microsoft Research project, it's an open source, extensible engine that both generates and simulates text games. You can use it to train reinforcement learning (RL) agents to learn skills such as language understanding and grounding, combined with sequential decision-making. Cue "Microsoft teaches AI to play Zork" headlines. And they have a competition.

Continue reading Four short links: 20 March 2019.

Categories: Technology

Security Topic for 3/21

PLUG - Tue, 2019/03/19 - 18:24
Aaron Jones: Rehash - Shodan

Because Aaron will be presenting his Shodan talk to the Fed, he will be presenting it again at SPLUG for practice.

Introduction To Shodan is a two-hour course designed to provide an overview of the search engine for finding devices connected to the internet. Shodan is a security researcher tool that works by scanning the entire internet, locating and parsing banners, and then returning this information to the user. Shodan is an excellent tool to familiarize yourself with if you do not have the infrastructure or tools necessary to run masscan yourself. Shodan is useful in the target selection phase of an operation.

About Aaron:
Aaron, the owner of Retro64XYZ, is a software developer who currently creates applications for law enforcement. He is also an AZ POST certified public speaker. He earned a B.Sc. in Computer Information Systems from Park University in 2013 and an M.A. in Intelligence Analysis with a focus in Cyber Security in 2014. During that period of his life he took a double course load and completed his Masters with a 3.695 GPA in a year. He has been the recipient of recognition from the El Paso Police Department, State Of Texas, Texas Military Forces, Chandler Police Department, and others.

Aaron is also active in the community as the founder of the Phoenix Linux Users Group Cyber Security Meetup and regularly teaches members of the public a myriad of topics related to Cyber Security. His audience includes students, teachers, law enforcement, military, government officials, and concerned members of the public with a strong desire to learn what is going on in the world of technology.

When Aaron isn’t teaching, working, or spending time with his family, he enjoys relaxing at the pond with a fishing pole while not catching fish, operating a pistol at the shooting range, or reading books. He owns a Sega Saturn and a Sega Dreamcast and his favorite video games are Panzer Dragoon, Road Rash, Phantasy Star Online 2, and Power Stone. He is currently engrossed in building content for his site and looking for more ways to reach the public. You should reach Aaron through his Mastodon or on Keybase. He would love to hear from you, answer your questions, or find out about the projects you are involved with.

Four short links: 19 March 2019

O'Reilly Radar - Tue, 2019/03/19 - 04:05

Digital Life, Information Abundance, Quantum Computing, Language Design

  1. Timeliner -- All your digital life on a single timeline, stored locally. Great idea; I hope its development continues.
  2. What's Wrong with Blaming "Information" for Political Chaos (Cory Doctorow) -- a response to yesterday's "What The Hell is Going On?" link. I think Perell is wrong. His theory omits the most salient, obvious explanation for what's going on (the creation of an oligarchy that has diminished the efficacy of public institutions and introduced widespread corruption in every domain), in favor of rationalizations that let the wealthy and their enablers off the hook, converting a corrupt system with nameable human actors who have benefited from it and who spend lavishly to perpetuate it into a systemic problem that emerges from a historical moment in which everyone is blameless, prisoners of fate and history. I think it's both: we have far more of every medium than we can consume because the information industrial engines are geared to production and distraction, not curation for quality. This has crippled the internet's ability to be a fightback mechanism. My country's recent experiences with snuff videos and white supremacist evangelicals don't predispose me to think as Perell does that the deluge of undifferentiated information is a marvelous thing, so I think Cory and I have a great topic of conversation the next time we're at the same conference together.
  3. Quantum Computing for the Very Curious (Michael Nielsen) -- an explanation of quantum computing with built-in spaced repetition testing of key concepts. Clever!
  4. 3 Things I Wish I Knew When I Began Designing Languages (Peter Alvaro) -- when I presented at my job talk at Harvard, a systems researcher who I admire very much, said something along the lines of, "Yes, this kind of reminds me of a Racket, and in Racket everything is a parenthesis. So, in your language, what is the thing that is everything that I don't buy?" That was nice.

Continue reading Four short links: 19 March 2019.

Categories: Technology

Four short links: 18 March 2019

O'Reilly Radar - Mon, 2019/03/18 - 04:20

Information Abundance, Meritocracy Considered Harmful, Tracking, and Blockchain Basics

  1. What the Hell is Going On? -- I’ll show how the shift from information scarcity to information abundance is transforming commerce, education, and politics. The structure of each industry was shaped by the information-scarce, mass media environment. First, we’ll focus on commerce. Education will be second. Then, we’ll zoom out for a short history of America since World War II. We’ll see how information scarcity creates authority and observe the effects of the internet on knowledge. Finally, we’ll return to politics and tie these threads together.
  2. Meritocracy (Fast Company) -- in companies that explicitly held meritocracy as a core value, managers assigned greater rewards to male employees over female employees with identical performance evaluations. This preference disappeared where meritocracy was not explicitly adopted as a value.
  3. Client-Side Instrumentation for Under $1/Month -- open source JavaScript tracker, AWS Lambda collecting it, Cloudflare logs into S3, AWS Athena integrating. (via Simon Willison)
  4. Crypto Canon -- A16Z recommended reading list to come up to speed in cryptocurrency/blockchain. Contains an awful lot of Medium posts.

Continue reading Four short links: 18 March 2019.

Categories: Technology

Four short links: 15 March 2019

O'Reilly Radar - Fri, 2019/03/15 - 04:05

Monopsony, Debugging Neural Nets, Future of Wearables, and Event Audio

  1. Facebook is Not a Monopoly, But Should Be Broken Up (Wired) -- Demand monopsonists integrate horizontally, acquiring or copying user demand adjacent to their existing demand and gaining leverage over their suppliers (and advertisers, if that’s the model). Facebook is unlikely to ever own a media production company, just as Airbnb and Uber will not soon own a hotel or a physical taxi company. But if they can, they’ll own every square foot of demand that feeds those industries. (via Cory Doctorow)
  2. Debugging Neural Networks -- 1. Start simple; 2. Confirm your loss; 3. Check intermediate outputs and connections; 4. Diagnose parameters; 5. Tracking your work.
  3. A Peek into the Future of Wearables (IEEE) -- Mind reading glasses, goggles that erase chronic pain, a wristband that can hear what the wearer can’t, and more futuristic wearables are on the horizon.
  4. Event Audio -- I wrote up a guide for event organizers on providing microphones so all the speakers can give their best performance.

Continue reading Four short links: 15 March 2019.

Categories: Technology

Algorithms are shaping our lives—here’s how we wrest back control

O'Reilly Radar - Thu, 2019/03/14 - 05:10

The O’Reilly Data Show Podcast: Kartik Hosanagar on the growing power and sophistication of algorithms.

In this episode of the Data Show, I spoke with Kartik Hosanagar, professor of technology and digital business, and professor of marketing at The Wharton School of the University of Pennsylvania. Hosanagar is also the author of a newly released book, A Human’s Guide to Machine Intelligence, an interesting tour through the recent evolution of AI applications that draws from his extensive experience at the intersection of business and technology.

Continue reading Algorithms are shaping our lives—here’s how we wrest back control.

Categories: Technology

Four short links: 14 March 2019

O'Reilly Radar - Thu, 2019/03/14 - 03:50

Ethical Data, Iodide Notebook, Alexa Discovery, and Game Engine

  1. Changing Contexts and Intents (O'Reilly) -- context and intent as framing mechanisms for determining whether a use of data is appropriate.
  2. Iodide (Mozilla) -- notebook, but with multiple languages (eventually) compiling down to WebAssembly. Create, share, collaborate, and reproduce powerful reports and visualizations with tools you already know.
  3. Amazon's Alexa: 80,000 Apps and No Runaway Hit (Bloomberg) -- voice has a massive discoverability problem. As Alan Cooper said, I really have no idea what the boundaries of the domains are, because I would have to go experiment endlessly with Siri and Alexa and all the others, and I don’t have the patience. But that’s the point: I have no idea even roughly what I’m likely to be able to ask about. And it’s a moving target because the platform makers assume that more content is better, so they shovel new content into the system as fast as they can. So in a very real sense, the burden of memorizing the list of commands is increasing over time, as the system “improves.”
  4. ESP Little Game Engine -- Game engine with web emulator and compiler. [...] The game engine has a virtual screen resolution of 128x128 pixels, 16 colors, one background layer, 32 soft sprites with collision tracking and rotation, 20kb of memory for the game and variables. The virtual machine performs approximately 900,000 operations per second at a drawing rate of 20 frames per second. Control of eight buttons. Built for the ESP8266 chipset.
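
To put the numbers in item 4 in perspective, here is a rough back-of-the-envelope sketch. The figures (900,000 operations per second, 20 frames per second, 128x128 pixels, 16 colors) come from the project description quoted above; the idea that the screen is stored as a packed framebuffer at the minimum bit depth is purely an assumption for illustration, not something the project states.

```python
# Figures quoted in the project description.
OPS_PER_SECOND = 900_000
FRAMES_PER_SECOND = 20
WIDTH = HEIGHT = 128
COLORS = 16

# Virtual machine operations available to the game for each drawn frame.
ops_per_frame = OPS_PER_SECOND // FRAMES_PER_SECOND

# Total pixels on the virtual screen.
pixels = WIDTH * HEIGHT

# Minimum bits needed to index one of 16 colors.
bits_per_pixel = (COLORS - 1).bit_length()

# ASSUMPTION: a packed framebuffer at that bit depth (hypothetical layout).
framebuffer_bytes = pixels * bits_per_pixel // 8

# Average operations the VM could spend per pixel per frame.
ops_per_pixel = ops_per_frame / pixels

print(ops_per_frame, pixels, bits_per_pixel, framebuffer_bytes)
print(round(ops_per_pixel, 2))
```

That works out to 45,000 operations per frame, or a bit under three operations per pixel, which suggests why sprites and a background layer are provided by the engine rather than drawn pixel by pixel in game code.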

Continue reading Four short links: 14 March 2019.

Categories: Technology

Changing contexts and intents

O'Reilly Radar - Wed, 2019/03/13 - 03:00

The internet itself is a changing context—we’re right to worry about data flows, but we also have to worry about the context changing even when data doesn’t flow.

Every day, someone comes up with a new use for old data. Recently, IBM scraped a million photos from Flickr and turned them into a training data set for an AI project intending to reduce bias in facial recognition. That’s a noble goal, promoted to researchers as an opportunity to make more ethical AI.

Yet, the project raises numerous ethical questions of its own. Photographers and subjects weren’t asked if their photos could be included; while the photos are all covered by a Creative Commons non-commercial license, one of the photographers quoted in an NBC article about the project asks by what rationale anything IBM does with his photographs can be considered “non-commercial.” It’s almost impossible to get your photographs removed from the database; it’s possible in principle, but IBM requires you to have the URL of the original photograph—which means you have to know which photographs were included in the first place. (NBC provides a tool to check whether your photos are in the database.) And there are plenty of questions about how people will make use of this data, which has been annotated with many measurements that are useful for face recognition systems.

Not only that, photographic subjects were, in effect, turned into research subjects without their consent. And even though their photos were public on Flickr, a strong case can be made that the new context violates their privacy.

Cornell Tech professor Helen Nissenbaum, author of the book Privacy in Context, reminds us that we need to think about privacy in terms of when data moves from one context to another, rather than in absolute terms. Thinking about changes in context is difficult, but essential: we’ve long passed the point where any return to absolute privacy was possible—if it ever was possible in the first place.

Meredith Whittaker, co-director of the AI Now Institute, made a striking extension to this insight in a quote from the same NBC article: “People gave their consent to sharing their photos in a different internet ecosystem.”

We do indeed live in a different internet ecosystem than the one many of our original privacy rules were invented for. The internet is not what it was 30 years ago. The web is not what it was 30 years ago, when it was invented. Flickr is not what it was when it was founded. 15 or 20 years ago, we had some vague ideas about face recognition, but it was a lot closer to science fiction. People weren’t actually automating image tagging, which is creepy enough; they certainly weren’t building applications to scan attendees at concerts or sporting events.

IBM’s creation of a new database obviously represents a change of context. But Whittaker is saying that the internet itself is a changing context. It isn’t what it has been in the past; it probably never could have stayed the same; but regardless of what it is now, the data’s context has changed, without the data moving. We’re right to worry about data flows, but we also have to worry about the context changing even when data doesn’t flow. It’s easy to point fingers at IBM for using Flickr’s data irresponsibly—as we read it, we’re sympathetic with that position. But the real challenge is that the meaning of the images on Flickr has changed. They're not just photos: they're a cache of data for training machine learning systems.

What do we do when the contexts themselves change? That’s a question we must work hard to answer. Part of the problem is that contexts change slowly, and that changes in a context are much easier to ignore than a new data-driven application.

Some might argue that data can never be used without consent. But that has led us down a path of over-broad clickwrap agreements that force people to give consent for things that are not yet even imagined in order to use a valuable service.

One special type of meta-context to consider is intent. While context may change, it is possible to look through that changing context to the intent of a user’s consent to the use of their data. For example, when someone uses Google Maps, they implicitly consent to Google using location data to guide them from place to place. When Google then provides an API that allows Uber or Lyft or DoorDash to leverage that data to guide a driver to them, the context has changed but the intent has not. The data was part of a service transaction, and the intent can be "well-intentionedly" transferred to a new context, as long as it is still in service of the user, rather than simply for the advantage of the data holder.

When Google decides to use your location to target advertisements, that's not only a different context but a different intent. As it turns out, Google actually expresses the intent of their data collection very broadly, and asks its users to consent to Google’s use of their data in many evolving contexts as the cost of providing free services. There would surely be value in finer grained expression of intent. At the same time, you can make the case that there was a kind of meta-intent expressed and agreed to, which can survive the context transition to new services. What we still need in a case like this is some mechanism for redress, for users to say “in this case, you went too far” even as they often are delighted by other new and unexpected uses for their data.

There are other cases where the breach of intent is far clearer. For example, when my cell phone provider gets my location as a byproduct of connecting me to a cell tower, other use of my data was never part of the transaction, and when they resell it (as they do), that is a breach not only of context but of intent.

Data ethics raises many a hard problem. Fortunately, the framing of context as a guiding principle by Nissenbaum and Whittaker gives us a powerful way to work toward solving them.

Continue reading Changing contexts and intents.

Categories: Technology

Four short links: 13 March 2019

O'Reilly Radar - Wed, 2019/03/13 - 01:00

O'Reilly Radar, Speech Recognition, Super Sensors, Burnout

  1. What is O'Reilly Radar? -- trends we see breaking: Next Economy; Future of the Firm; Machine Learning/AI; Next Architecture; Responding to Disruption. Report on Future of the Firm is already out.
  2. An All-Neural On-Device Speech Recognizer (Google) -- on-device is the important bit here: no more uploading all your ambient audio to the cloud. After compression, the final model is 80MB. That's impressive too.
  3. Super Sensors -- a single, highly capable sensor can indirectly monitor a large context, without direct instrumentation of objects.
  4. Observations on Burnout -- I’ll add some data points, go in-depth on what I think causes it, and attempt to offer some advice for engineers and managers.

Continue reading Four short links: 13 March 2019.

Categories: Technology

0x62: Can Anyone Live in Full Software Freedom Today? (Part III)

FAIF - Tue, 2019/03/12 - 11:56

Bradley and Karen have the last pre-talk installment of discussing the preparation for their joint keynote at FOSDEM 2019, entitled: Can Anyone Live in Full Software Freedom Today?: Confessions of Activists Who Try But Fail to Avoid Proprietary Software. This episode is the third of three episodes where Bradley and Karen record their preparation conversations for this keynote address. In this particular episode, they discuss the issue of letting others use proprietary software on your behalf, the problem of relying too much on that, and then finish up by discussing how they'll include this material in the final talk.

Show Notes: Segment 0 (00:34)
  • Karen discussed the idea of a shabbos goy, and the analogy between that and allowing other people to use proprietary software on your behalf. (02:58)
  • Bradley and Karen discussed that it is equally abhorrent to ask someone else to use proprietary software for you as it is to use it yourself, since someone's software freedom is compromised in any event (06:58)
  • Bradley mentioned that he had previously applied to serve on the USA's Internal Revenue Service (IRS)'s Electronic Tax Administration Advisory Committee (ETAAC). Bradley mentioned that, sadly, the IRS typically accepts people from proprietary software companies like Intuit but has to his knowledge never accepted anyone involved in FOSS software for IRS form preparation (10:02)
  • Bradley mentioned the Free Software PDF fill-in tools evince and flpsed (12:24)
  • Karen stated that Conservancy's policy is that: We care so much about software freedom that we would rather use proprietary software than have someone else lose their software freedom. (15:20)
  • Karen mentioned her Linux Conf Australia 2019 talk, Right to Not Broadcast, which you can view online. (22:18)
Segment 1 (23:15)

Send feedback and comments on the cast to <>. You can keep in touch with Free as in Freedom on our IRC channel, #faif on, and by following Conservancy on and and Twitter.

Free as in Freedom is produced by Dan Lynch of Theme music written and performed by Mike Tarantino with Charlie Paxson on drums.

The content of this audcast, and the accompanying show notes and music are licensed under the Creative Commons Attribution-Share-Alike 4.0 license (CC BY-SA 4.0).

Categories: Free Software

Four short links: 12 March 2019

O'Reilly Radar - Tue, 2019/03/12 - 04:10

Digital Music, Smart Camera, Cell Network Software, and Gender Equity

  1. Protocols: Duty, Despair and Decentralization (Mat Dryhurst) -- Another cold-light-of-day re-reading of the surge of poptimism in the press over the past decade is to see it as the bargaining stage of grief over the seemingly inexorable charge of bot-like popular figures who hoover up ideas from the margins and deploy significant resources to capture a moment with music fortified from any potentially critical angle one might level at it. Pop stars are better understood as monarchic CEO’s of content production studios atop a feudal, trickle up, creative economy. They have adapted to the online ecosystem far faster than the critical systems that might have one day raised objection to them. A fascinating and energetic stream of consciousness about the internet-disabled/enabled music industry.
  2. Under the Hood: Portal's Smart Camera (Facebook) -- how it follows you as you move around the room, with interesting pictures of the prototypes and how they automated what directors do (in some cases).
  3. Magma -- Magma is an open source software platform that gives network operators an open, flexible, and extendable mobile core network solution. Magma enables better connectivity by: (1) Allowing operators to offer cellular service without vendor lock-in with a modern, open source core network; (2) Enabling operators to manage their networks more efficiently with more automation, less downtime, better predictability, and more agility to add new services and applications; (3) Enabling federation between existing MNOs and new infrastructure providers for expanding rural infrastructure; (4) Allowing operators who are constrained with licensed spectrum to add capacity and reach by using Wi-Fi and CBRS. Want to spin up your own LTE network? (via Facebook blog)
  4. Gender Equity Resources (NAVA) -- is for the GLAM (Galleries, Libraries, Archives, Museums) sector, but there's a lot to adapt for your tech workplace, too. (via Courtney Johnston)

Continue reading Four short links: 12 March 2019.

Categories: Technology

Future of the firm

O'Reilly Radar - Tue, 2019/03/12 - 03:00

Mapping the complex forces that are reshaping organizations and changing the employee/employer relationship.

The “future of the firm” is a big deal. As jobs become more automated, and people more often work in teams, with work increasingly done on a contingent and contract basis, you have to ask: “What does a firm really do?” Yes, successful businesses are increasingly digital and technologically astute. But how do they attract and manage people in a world where two billion people work part-time? How do they develop their workforce when automation is advancing at light speed? And how do they attract customers and full-time employees when competition is high and trust is at an all-time low?

When thinking about the big-picture items affecting the future of the firm, we identified several topics that we discuss in detail in this report:

Continue reading Future of the firm.

Categories: Technology

What is O'Reilly Radar?

O'Reilly Radar - Tue, 2019/03/12 - 03:00

Radar spots and explores emerging technology themes so organizations can succeed amid constant change.

O’Reilly Radar is a process that assimilates signals and data to track, map, and name technology trends that impact many aspects of modern business and living. Almost as old as O’Reilly itself, Radar has a history of playing a key role in the development and amplification of influential themes, including open source, Web 2.0, big data, DevOps, Next Economy, and others.

O'Reilly applies the Radar approach to spot what's coming next and show how technology is changing our world. Using reports, conferences, and conversations, the Radar group provides decision-makers with the tools and connections they need to thrive during these dynamic times.

Our insights come from many sources: our own reading of the industry tea leaves, our many contacts in the industry, our analysis of usage on the O’Reilly online learning platform, and data we assemble on technology trends. After we track and identify emergent trends, we map them into themes that address their broader impact on employees, organizations, and society at large.

The process is not easy or simplistic. It requires follow-up analysis, much internal debate, and a healthy dose of realism in response to industry hype. Our goal is to provide insights and confidence to folks making decisions about technology, strategy, purpose, and mission.

We take a few fundamental approaches to exploring technology adoption, each building on the other:

  • We convene communities of interest through our conferences, summits, and our Foo Camps (short for “Friends of O’Reilly,” these are invite-only, unconference-style events).
  • We conduct qualitative research, taking advantage of our ability to convene communities, using surveys, interviews, and salons to gain intimate access to thought leaders, business leaders, and those in the trenches who are wrestling with technology and change.
  • We conduct quantitative analysis to track technology adoption, from the esoteric and emergent to the everyday world of developers, designers, administrators, managers, and architects. The O’Reilly online learning platform is one of the quantitative tools we use; it serves as a massive sensor that we analyze for insights into users’ engagement with technology.

Through this process, we’ve identified the following five themes business and technology leaders should consider. These themes are not discrete; we see much bleed between topics and how they interact—a characteristic of the current technology environment that affects, well, nearly everything organizations touch.

Next Economy

Next Economy defines how business leaders, policymakers, and technologists can chart a course from the economy we experience today to a better future for all, acknowledging the wonders and challenges that we have collectively wrought.

This research area focuses on big picture economic trends that nearly all organizations face, including:

  • How is technology changing the shape of the corporation and the nature of work?
  • What skills become more valuable as more types of work are subject to automation?
  • What economic incentives encourage businesses to treat people as costs to be eliminated rather than as assets to invest in? How do we change those incentives?
  • How do algorithmic systems drive value, manifest bias, and affect fairness—particularly in closed platforms with their own economics?
  • What is the impact of behavioral economics?
  • How does diversity improve all aspects of decision-making?
  • Do we need a new model and way of assessing antitrust in the age of internet-scale platforms?
  • What does technology let us do now that was formerly impossible?

This Radar theme offers a new and empowering perspective on creating value and success, leveraging innovation, and embracing disruption and change.

Radar has been looking at the Next Economy for the last five years, including running Next:Economy conferences in 2015 and 2016. Tim O’Reilly’s book WTF?: What’s the Future and Why It’s Up to Us provides a deeper dive into Next Economy topics. Lately, we have sharpened our attention on Next Economy topics into a focus on the Future of the Firm, as covered in the next section and in our “Future of the Firm” report.

Future of the Firm

As jobs become more automated and work is increasingly done on a contingent and contract basis, you have to ask: what does a firm really do?

Yes, successful businesses are increasingly digital and technologically astute. But how do they attract, retain, incent, and manage people in a world where by choice or by circumstance two billion people work part-time? How do they develop their workforce when automation is advancing at light speed? And how do they attract customers and full-time employees when competition is high and trust is at an all-time low?

Modern businesses are being reshaped by a number of factors, including:

  • Increasing demand for trust, responsibility, credibility, honesty, and transparency in organizations.
  • Employees’ search for meaning.
  • New leadership models with networks replacing hierarchies—the recognition that the top-down approach is too slow, catalyzing a move toward decentralization and teams.
  • The impact of generational change on employee and customer expectations.
  • Big systemic thinking—the need to understand and consider organizations as operating in complex, interconnected environments.
  • Automation creating new kinds of partnerships between people and machines.
  • Free agency, personal brands, and the evolving employer/employee relationship.
  • Compensation beyond pay.
  • Diversity, inclusion, and fairness at work.
  • Governance and the case for cognitive and experiential diversity.

We explore each of these trends in our report “Future of the Firm.”

Machine Learning / Artificial Intelligence

Few technologies have as much potential to change the nature of work, of the firm, and how we live as machine learning (ML) and artificial intelligence (AI). The impact of ML/AI on the future of our economy is both uncertain and undeniable. With new tools and computing power, ML/AI has become more effective at predictions, recommendations, certain types of pattern matching, and optimizing processes. And while the space has matured quickly, organizations continue to grapple with how to best apply ML/AI models—we are still in early days with much to learn.

We see a lot of the effort around ML/AI aimed at improving or reframing the customer experience—i.e., making processes simpler, faster, more convenient, more intuitive, and anticipating requests or actions. However, results don’t always meet expectations due to a number of factors:

  • The need for large quantities of well-organized, accurate data.
  • The time and resources required to train machine learning models.
  • Algorithms that are difficult or too complex to understand.
  • Bias and fairness issues.
  • The need for constant monitoring.
  • Unreliable accuracy.

The result: ML/AI work requires a different approach and different perspective from how we develop software. Managing risk and measuring success means more unpredictable schedules and more tolerance for failure—think of ML/AI projects as a portfolio of experiments to monitor and evaluate.

Learn more about how organizations are evaluating and implementing ML/AI in our report “AI Adoption in the Enterprise.”

Next Architecture

Analysis of the O’Reilly online learning platform shows growing engagement with cloud, orchestration, and microservices topics. When coupled with continued interest in containers, this engagement paints a picture of increased use of a new kind of software architecture for building an organization’s digital presence. This architecture, which we call the Next Architecture, is cloud based, with functionality decomposed into microservices that are modularized into containers and managed and monitored by dynamic orchestration. Conversations with thought leaders across many industries confirm that the combination of cloud, containers, orchestration, and decomposition does indeed represent the path many organizations are taking for their next architecture.

Why the change? Organizations see the need to support agility, flexibility, scaling, resiliency, and productivity in building their digital properties as intrinsic to their value propositions and their ability to compete. The Next Architecture is not a cure-all or magic bullet. It’s a way of thinking about and designing systems that promises to be more flexible and adaptable than traditional monolith approaches.

Moving to the Next Architecture is not to be taken lightly, requiring new skills and the ability to manage complexity, including the particularly difficult task of turning complex functionality into modular, stand-alone services that can be easily upgraded or replaced. For most organizations, these challenges are worth confronting as a more flexible, agile, scalable architecture becomes essential for their digital properties.

We plan to release a report covering the Next Architecture in more detail in the coming months.

Responding to Innovation & Disruption

How do you run a business when everything is always changing, and innovation and disruption have become the new norms? Once you acknowledge that change is inevitable, how do you embrace it? Moonshots? Incremental change? Innovation centers? Skunkworks? Key hires? Some mix? It’s confusing, and each approach has its benefits and risks.

At a fundamental level, the best way to thrive in a world of constant innovation and disruption is to constantly reinvent yourself—to pay attention to technology, to your customers, to thought leaders, and adapt. Leaders and staff at all levels need to embrace continuous learning to avoid surprising and existential threats. History is littered with organizations that failed to adapt: look at Digital Equipment Corporation. Look at Kodak. Look at Sears.

But we also see companies making profound turnarounds. Five years ago, Microsoft looked stagnant and irrelevant. Nobody would say that now. Microsoft adapted to a future that looks different from the past, embracing change, embracing open source, and embracing the cloud.

Taking a note from Microsoft, what are the adaptations your organization needs to make? What technologies and shifts do you see on the horizon that will need to be addressed through innovation and disruption?

For example, blockchain, the distributed trust data structure, offers the potential for great disruption. And, our analysis shows organizations using our online learning platform paying increasing and considerable attention to blockchain as a topic.

While banks and other financial institutions are trialing blockchain applications, we see a wealth of possibilities beyond finance to apply blockchain’s encrypted and distributed data structure: supply chain / asset tracking, customer loyalty, identity management, government records, educational credentials, and distributed energy generation. Blockchain brings incredible disruptive potential to the future—or, it may not. It’s too early to tell. Nonetheless, if you’re not paying attention to blockchain, you can be sure someone you compete with is.

Get an introduction to blockchain’s components and uses in “What is a blockchain?”

More to come from O’Reilly Radar

Our first report of 2019, “Future of the Firm,” is now available.

Over the coming months we’ll explore each of these themes through reports and analytic studies, event tracks, interviews with leaders and experts, and through other content and activities. We hope you’ll join us.

Continue reading What is O'Reilly Radar?.

Categories: Technology

March 14th Meeting topic

PLUG - Mon, 2019/03/11 - 04:46
For our March meeting, William Lindley will present to us "75 Years of Computing in 60 Minutes".

William Lindley: 75 Years of Computing in 60 Minutes
The roots of modern digital computing go back nearly two hundred years, and through a series of pass-around artifacts from the past century, we will explore how the pioneers of the field and their groundbreaking decisions and technologies have led us -- for better or worse -- to today's Internet-enabled world.

About William
Mr. Lindley has been in the computer industry since he sold his first program (a printer driver for Heathkit HDOS) in 1980. He has used systems from the earliest 8-bit microprocessors, through the PDP-11 and VAX, up to IBM mainframes, and has managed to write programs that did not crash on most of them. Mr. Lindley has been a GNU/Linux user since 1992 and has been free of proprietary software since 2001. Most recently he has been pleased to be an adjunct professor at Mesa Community College.

Four short links: 11 March 2019

O'Reilly Radar - Mon, 2019/03/11 - 03:46

Health Care NLP, Same Sex Databases, Apple Maps RE, and Science Articles

  1. Lessons Learned Building Natural Language Processing Systems in Health Care -- The next mistake I made, like many others, was building models that “solve health care.” Amazon’s Comprehend Medical is now taking this approach with a universal medical-NLP-as-a-service. This assumes that health care is one language. In reality, every sub-specialty and form of communication is fundamentally different.
  2. Obergefell v. Hodges: The Database Engineering Perspective -- discussion of the database implications of same-sex marriage. But the more interesting thing is that you just incidentally let in a whole bunch of edge cases. Up until now, it wasn't possible for an individual to marry themself. Now it is, and you need a new check constraint to ensure that partner_1_id and partner_2_id are different. Regardless of concerns about duplicate rows/couples remarrying, you also now have to contend with swapped partners: Alice marries Eve, and also Eve marries Alice, resulting in two rows recording the same marriage.
  3. Apple Maps Flyover Reverse Engineering -- This is an attempt to reverse-engineer Flyover (= 3D satellite mode) from Apple Maps. The main goal is to document the results and to provide code that emerges.
  4. -- The best articles for science lovers shortened to five bullet points or less. At first, I was sniffy at the "five bullet points," but then I realized my modal amount retained from reading science articles is 0, so...
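
The edge cases in item 2 (self-marriage and swapped partners) can be made concrete with a small sketch. This is not the schema from the linked article: the table name `marriage`, the helper functions, and the use of SQLite are all assumptions for illustration. The article asks only that partner_1_id and partner_2_id be different; the sketch swaps in a slightly stricter ordering rule (partner_1_id < partner_2_id), which rules out self-marriage and, combined with a uniqueness constraint, also collapses "Alice marries Eve" and "Eve marries Alice" into one row.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical marriage table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE marriage (
            partner_1_id INTEGER NOT NULL,
            partner_2_id INTEGER NOT NULL,
            -- Strict ordering: forbids self-marriage and fixes one
            -- canonical order for every couple.
            CHECK (partner_1_id < partner_2_id),
            -- With a canonical order, uniqueness catches swapped duplicates.
            UNIQUE (partner_1_id, partner_2_id)
        )
        """
    )
    return conn


def marry(conn, a, b):
    """Record a marriage; return False if a constraint rejects it."""
    lo, hi = sorted((a, b))  # store the pair in canonical order
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO marriage VALUES (?, ?)", (lo, hi))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
print(marry(conn, 1, 2))  # first record of the couple
print(marry(conn, 2, 1))  # same couple, partners swapped
print(marry(conn, 3, 3))  # someone marrying themself
```

Only the first call should succeed. A real schema would also need to handle the remarriage case the article mentions (for example with date columns), which this sketch deliberately leaves out.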

Continue reading Four short links: 11 March 2019.

Categories: Technology

