O'Reilly Radar

All of our Ideas and Learning material from all of our topics.

Becoming a machine learning practitioner

Thu, 2019/08/29 - 05:00

The O’Reilly Data Show Podcast: Kesha Williams on how she added machine learning to her software developer toolkit.

In this episode of the Data Show, I speak with Kesha Williams, technical instructor at A Cloud Guru, a training company focused on cloud computing. As a full stack web developer, Williams became intrigued by machine learning and started teaching herself the ML tools on Amazon Web Services. Fast forward to today, Williams has built some well-regarded Alexa skills, mastered ML services on AWS, and has now firmly added machine learning to her developer toolkit.

Categories: Technology

Four short links: 29 August 2019

Thu, 2019/08/29 - 04:30

Debugging a Scale Problem, Verifying Cryptographic Protocols, Remote Team Stress, and PAC-MAN Source

  1. 6 Lessons we Learned When Debugging a Scaling Problem on -- When you choose specific non-default settings, leave a comment or link to documentation/issues as to why; future people will thank you. This.
  2. Verifpal -- software for verifying the security of cryptographic protocols. Building upon contemporary research in symbolic formal verification, Verifpal’s main aim is to appeal more to real-world practitioners, students, and engineers without sacrificing comprehensive formal verification features.
  3. Stress in Remote Teams -- features a good list of the causes of stress in remote teams. The section on work-family conflict struck close to home (so to speak).
  4. Atari PAC-MAN Source Code -- original Atari 8-bit PAC-MAN source code. You can even compare versions with and without use of the macro assembler.

Categories: Technology

One simple chart: Who is interested in Apache Pulsar?

Wed, 2019/08/28 - 04:00

Multi-layer architecture, scalability, multitenancy, and durability are just some of the reasons companies have been using Pulsar.

With companies producing data from an increasing number of systems and devices, messaging and event streaming solutions—particularly Apache Kafka—have gained widespread adoption. Over the past year, we’ve been tracking the progress of Apache Pulsar (Pulsar), a less well-known but highly capable open source solution originated by Yahoo. Pulsar is designed to intelligently process, analyze, and deliver data from an expanding array of services and applications, and thus it fits nicely into modern data platforms. Pulsar is also designed to ease the operational burdens normally associated with complex, distributed systems.

Who else is interested in Pulsar? Karthik Ramasamy, CEO of Streamlio, was kind enough to share geo-demographic data of recent visitors to the project’s homepage:

Of the thousands of recent visitors to the site, 33% are from the Americas, 36% from Asia-Pacific, and 27% from the EMEA region.

While Apache Kafka is by far the most popular pub/sub solution, over the last year, we’ve started to come across numerous companies that use Pulsar. It turns out that Pulsar has a few features these companies value, including:

  • Multi-layer architecture comprised of a serving layer (brokers that coordinate how messages are received, stored, processed, and delivered), a storage layer (Apache BookKeeper nodes are used to persist messages), and a processing layer (via Pulsar functions or Pulsar SQL).
  • High performance and scalability: Pulsar has been used at Yahoo for several years to handle 100 billion messages per day on over two million topics. It is able to support millions of topics while delivering high-throughput and low-latency performance.
  • Easily add storage or serving without having to rebalance the entire cluster: the multi-layer architecture allows for storage to be added independently of serving. One is also able to make serving and storage layer expansions without any down time.
  • Support for popular messaging models including pub/sub messaging and message queuing.
  • Multitenancy allows a single Pulsar cluster to support an entire enterprise and lets each team have a separate namespace with its own quotas.
  • Durability (no data loss): data is replicated and synced to disk.
  • Geo-replication: out-of-box support for geographically distributed applications. Pulsar supports several different modes for replicating the data between clusters.
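To make the multitenancy point concrete, here is a minimal sketch of how a per-team namespace shows up in a topic name. It assumes Pulsar's documented naming scheme, `persistent://tenant/namespace/topic` (or `non-persistent://...`). The `TopicName` class, the `parse_topic` helper, and the tenant and namespace names are hypothetical illustrations, not part of any Pulsar client library.

```python
from dataclasses import dataclass

# Hypothetical helper types for illustration only; not a Pulsar client API.
VALID_SCHEMES = ("persistent", "non-persistent")


@dataclass(frozen=True)
class TopicName:
    persistence: str  # "persistent" or "non-persistent"
    tenant: str       # e.g. the enterprise or business unit
    namespace: str    # e.g. one team's namespace, which can carry its own quotas
    topic: str

    def __str__(self) -> str:
        return f"{self.persistence}://{self.tenant}/{self.namespace}/{self.topic}"


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified topic name into its four parts."""
    scheme, sep, rest = name.partition("://")
    if not sep or scheme not in VALID_SCHEMES:
        raise ValueError(f"unsupported topic scheme in {name!r}")
    parts = rest.split("/", 2)  # tenant, namespace, remainder is the topic
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {name!r}")
    return TopicName(scheme, *parts)


# Two teams sharing one cluster, each in its own namespace (made-up names).
billing = TopicName("persistent", "acme", "billing", "invoices")
search = parse_topic("persistent://acme/search/click-events")
print(billing)
print(search.namespace)
```

Because the namespace is part of every topic name, policies such as quotas can be attached per team without running separate clusters.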

While previous generation messaging systems focused primarily on moving data, newer frameworks like Pulsar add data processing capabilities essential for feeding data into analytics and AI applications. The rise of connected devices, the introduction of 5G, and the growing importance of machine learning and AI will require that companies build infrastructure for capturing, processing, and moving many data streams. And they will increasingly need to perform these tasks in (near) real time. The good news is that critical components for data management, processing, transport, and orchestration continue to improve and automation technologies should ease operational burdens moving forward.


Categories: Technology

Four short links: 28 August 2019

Wed, 2019/08/28 - 03:00

Tech and Politics, Crypto-Mining Malware, Cost of Securing DNS, and Anti-Fuzzing Techniques

  1. Summer School Presentations -- a great selection of talks on technology and political structures.
  2. A First Look at the Crypto-Mining Malware Ecosystem: A Decade of Unrestricted Wealth -- In this paper, we conduct the largest measurement of crypto-mining malware to date, analyzing approximately 4.4 million malware samples (one million malicious miners), over a period of 12 years from 2007 to 2018. We then analyze publicly available payments sent to the wallets from mining-pools as a reward for mining, and estimate profits for the different campaigns. Our profit analysis reveals campaigns with multi-million earnings, associating over 4.3% of Monero with illicit mining.
  3. Analyzing the Costs (and Benefits) of DNS, DoT, and DoH for the Modern Web -- two new protocols have been proposed: DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT). Rather than sending queries and responses as cleartext, these protocols establish encrypted tunnels between clients and resolvers. This fundamental architectural change has implications for the performance of DNS, as well as for content delivery. In this paper, we measure the effect of DoH and DoT on name resolution performance and content delivery.
  4. Fuzzification -- anti-fuzzing techniques.

Categories: Technology

Four short links: 27 August 2019

Tue, 2019/08/27 - 04:10

Personal Information, Research Data, Massive Lambda Scale, and The Moral Character of Cryptographic Work

  1. Presidio -- recognizers for personally identifiable information, assembled into a pipeline that helps you scrub sensitive text such as credit card numbers, names, locations, social security numbers, bitcoin wallets, US phone numbers, and financial data.
  2. Microsoft's Academic Knowledge Graph -- a large RDF data set with over eight billion triples with information about scientific publications and related entities, such as authors, institutions, journals, and fields of study. The data set is based on the Microsoft Academic Graph and licensed under the Open Data Attributions license. Furthermore, we provide entity embeddings for all 210M represented scientific papers.
  3. gg -- code from the paper From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers, describing a framework and a set of command-line tools that helps people execute everyday applications—e.g., software compilation, unit tests, video encoding, or object recognition—using thousands of parallel threads on a cloud functions service to achieve near-interactive completion times. In the future, instead of running these tasks on a laptop, or keeping a warm cluster running in the cloud, users might push a button that spawns 10,000 parallel cloud functions to execute a large job in a few seconds from start. gg is designed to make this practical and easy. (via Hacker News)
  4. The Moral Character of Cryptographic Work -- Cryptography rearranges power: it configures who can do what, from what. This makes cryptography an inherently political tool, and it confers on the field an intrinsically moral dimension. The Snowden revelations motivate a reassessment of the political and moral positioning of cryptography. They lead one to ask if our inability to effectively address mass surveillance constitutes a failure of our field. I believe that it does. I call for a community-wide effort to develop more effective means to resist mass surveillance. I plead for a reinvention of our disciplinary culture to attend not only to puzzles and math, but, also, to the societal implications of our work.

Categories: Technology

Four short links: 26 August 2019

Mon, 2019/08/26 - 04:00

Avoiding Sexual Predators, YouTube Radicalization, Brian Behlendorf, and Cyberpunk Present

  1. How to Avoid Supporting Sexual Predators (Valerie Aurora) -- Your research process will look different depending on your situation, but the key elements will be: (1) Assume that sexual predators exist in your field and you don’t know who all of them are. (2) When you are asked to work with or support someone new, do research to find out if they are a sexual predator. (3) When you find out someone is probably a sexual predator, refuse to support them.
  2. Auditing Radicalization Pathways on YouTube -- the three communities increasingly share the same user base; that users consistently migrate from milder to more extreme content; and that a large percentage of users who consume Alt-right content now consumed Alt-lite and IDW [Intellectual Dark Web] content in the past. And recommendations steer people to more extreme content.
  3. Brian Behlendorf Interview -- Where a distributed database that was not just, “Here is a master MySQL node and slaves that hang off of it,” was not just a multi-write kind of system, but one that actually supported consensus, one that actually had the network enforcing rules about valid transactions versus invalid transactions. One that was programmable, with smart contracts on top. This started to make sense to me, and was something that was appealing to me in a way that financial instruments and proof-of-work was not. Hyperledger was announced by a set of large companies, along with the Linux Foundation to try to research this space further, and try to figure out the enterprise applications of these technologies.
  4. Employees Connect Nuclear Plant to the Internet so They can Mine Cryptocurrency (ZDNet) -- on the one hand I'm, "we're living in a Cyberpunk novel!", and on the other hand I'm, "oh god, we're living in a Cyberpunk novel!".

Categories: Technology

How organizations are sharpening their skills to better understand and use AI

Mon, 2019/08/26 - 04:00

To successfully implement AI technologies, companies need to take a holistic approach toward retraining their workforces.

Continuous learning is critical to business success, but providing employees with an easily accessible, results-driven solution they can access from wherever they are, whenever they need it, is no easy feat. Additionally, delivering valuable content in a variety of formats—whether that is through books, videos, or live online training—is crucial to supporting employees to upskill and reskill on the job. These are some of the features O’Reilly Online Learning provides to its 2.25 million platform users to encourage personal and professional development, and there’s no better time to take advantage.

According to Deloitte, evolving work demands and skills requirements are one big reason why continuous learning is critical, and there is no sector experiencing this more abruptly than technology. Executives and employees alike are worried about how emerging tech, such as robotics and AI, are changing jobs and how people should prepare for them. In fact, a recent World Economic Forum report found that more than half (54%) of all employees will require significant reskilling and upskilling in just three years. So, what exactly are the skills data scientists and other tech titles are honing in response to this shift?

As the co-chair of the O'Reilly Artificial Intelligence conference, I regularly track broad changes in consumption patterns and preferences on our platform. For example, Figure 1 shows usage across a few select topics related to AI and Data. More precisely, it provides total usage across all content types in this subset of topics. We measure consumption with Units, a metric tuned specifically for the type of content (e.g., page views for books, minutes for videos):

Figure 1. Content usage across a few select AI and Data topics on Image by Ben Lorica.

Python is the largest topic on our platform, and it also happens to be a popular language among data scientists (the second largest topic is another programming language, Java). Overall content usage, across all topics combined, grew by 8% from 2018 to 2019 (January to July). Among the fastest-growing topics are those central to building AI applications: machine learning (up 58% from 2018), data science (up 53%), data engineering (up 58%), and AI itself (up 52%).

One of the main reasons Python has been ascendant as a programming language is because of its popularity among data scientists and machine learning researchers and practitioners. In fact, of the top 20 most-consumed Python titles on O’Reilly Online Learning in 2019, several were focused mainly on data science and machine learning applications, including:

In a survey we conducted earlier this year about AI adoption in the enterprise, respondents cited culture, organization, and lack of skilled people among the leading reasons holding back their adoption of AI technologies. As I noted in a recent article, adopting and sustaining AI and machine learning within a company will require retraining your entire organization. To succeed in implementing and incorporating AI and machine learning technologies, companies need to take a more holistic approach toward retraining their workforces. The rapid growth in consumption of content in training-relevant topics on (including machine learning, data engineering, data science, and AI) provides early signs that companies and individuals are taking training seriously.

At our upcoming Artificial Intelligence conferences in San Jose and London, we have assembled a roster of two-day training sessions, tutorial sessions, and presentations to help individuals (across job roles and functions) sharpen their skills and understanding of AI and machine learning. In addition to our usual strong slate of technical training, tutorials, and talks, we return with a two-day Business Summit designed specifically for executives and business leaders. Wholesale transformation will require cross-functional teams who are familiar with digital, data, and AI technologies. With this in mind, the AI conference in San Jose also will feature several outstanding new tutorials as well as executive briefings and case studies from leading companies and research organizations.

Categories: Technology

Four short links: 23 August 2019

Fri, 2019/08/23 - 01:00

Open Source Economics, Program Synthesis, YouTube Influence, and ChatBot Papers

  1. The Economics of Open Source (CJ Silverio) -- I'm going to tell you a story about who owns the Javascript language commons, how we got into the situation that the language commons is *by* someone, and why we need to change it.
  2. State of the Art in Program Synthesis -- conference, with talks to be posted afterwards, run by a YC startup. Program Synthesis is one of the most exciting fields in software today, in my humble opinion: Programs that write programs are the happiest programs in the world, in the words of Andrew Hume. It'll give coders superpowers, or make us redundant, but either way it's interesting.
  3. Alternative Influence (Data and Society) -- amazing report. Extremely well-written, it lays out how the alt right uses YouTube. These strategies reveal a tension underlying the content produced by these influencers: while they present themselves as news sources, their content strategies often more accurately consist of marketing and advertising approaches. These approaches are meant to provoke feelings, memories, emotions, and social ties. In this way, the “accuracy” of their messaging can be difficult to assess through traditional journalistic tactics like fact-checking. Specifically, they recount ideological testimonials that frame ideology in terms of personal growth and self-betterment. They engage in self-branding techniques that present traditional, white, male-dominated values as desirable and aspirational. They employ search engine optimization (SEO) to highly rank their content against politically charged keywords. And they strategically use controversy to gain attention and frame political ideas as fun entertainment.
  4. Chatbot and Related Research Paper Notes with Images -- Papers related to chatbot models in chronological order spanning about five years from 2014. Some papers are not about chatbots, but I included them because they are interesting, and they may provide insights into creating new and different conversation models. For each paper I provided a link, the names of the authors, and GitHub implementations of the paper (noting the deep learning framework) if I happened to find any. Since I tried to make these notes as concise as possible they are in no way summarizing the papers but are merely a starting point to get a hang of what the paper is about, and to mention main concepts with the help of pictures.

Categories: Technology

Four short links: 22 August 2019

Thu, 2019/08/22 - 05:55

I Don't Know, Map Quirks, UI Toolkit, and Open Power Chip Architecture

  1. I Don't Know (Wired) -- Two percent of Brits don’t know whether they’ve lived in London before. Five percent don’t know whether they’ve been attacked by a seagull or not. A staggering one in 20 residents of this fine isle don’t know whether or not they pick their nose. (via Flowing Data)
  2. Haberman -- interesting research into one way that online maps end up with places that aren't places.
  3. Blueprint -- a React-based UI toolkit for the web. It is optimized for building complex, data-dense web interfaces for desktop applications that run in modern browsers and IE11. This is not a mobile-first UI toolkit.
  4. IBM Open Sources Power Chip Instruction Set (Next Platform) -- To be precise about what IBM is doing, it is opening up the Power ISA [Instruction Set Architecture] and giving it to the OpenPower Foundation royalty free with patent rights, and that means companies can implement a chip using the Power ISA without having to pay IBM or OpenPower a dime, and they have patent rights to what they develop. Companies have to maintain compatibility with the instruction set, King explains, and there are a whole set of compatibility requirements, which we presume are precisely as stringent as Arm and are needed to maintain runtime compatibility should many Power chips be developed, as IBM hopes will happen.

Categories: Technology

Four short links: 21 August 2019

Wed, 2019/08/21 - 04:40

Competition vs. Convenience, Super-Contributors and Power Users, Forecasting Time Series, and Appreciating Non-Scalability

  1. Less than Half of Google Searches Now Result in a Click (Sparktoro) -- We can see a consistent pattern: organic shrinks while zero-click searches and paid CTR rise. But the devil’s in the details, and, in this case, mostly the mobile details, where Google’s gotten more aggressive with how ads and instant answer-type features appear. Everyone has to beware of the self-serving, "hey, we're doing people a favor by taking (some action that results in greater market domination for us)" because there's a time when the fact that you have meaningful competition is better for the user than a marginal increase in value add from keeping them in your property longer. (via Slashdot)
  2. Super-Contributors and Power Laws (MySociety) -- Overall, two-thirds of users made only one report—but the reports made by this large set of users only makes up 20% of the total number of reports. This means that different questions can lead you to very different conclusions about the service. If you’re interested in the people who are using FixMyStreet, that two-thirds is where most of the action is. If you’re interested in the outcomes of the service, this is mostly due to a much smaller group of people. This dynamic applies pretty much everywhere and is worth understanding.
  3. Facebook Prophet -- a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. Written in Python and R.
  4. On Nonscalability: The Living World Is Not Amenable to Precision-Nested Scales -- to scale well is to develop the quality called scalability, that is, the ability to expand—and expand, and expand—without rethinking basic elements. [...] [B]y its design, scalability allows us to see only uniform blocks, ready for further expansion. This essay recalls attention to the wild diversity of life on earth through the argument that it is time for a theory of nonscalability. (via Robin Sloan)
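The arithmetic behind the super-contributors item (link 2 above) can be worked through in a few lines. This is a rough sketch that treats the quoted "two-thirds" and "20%" as exact figures, which the source almost certainly rounded. The function name is a hypothetical one chosen for illustration.

```python
from fractions import Fraction


def reports_per_repeat_user(single_user_share: Fraction,
                            single_report_share: Fraction) -> Fraction:
    """Average number of reports filed by users who made more than one report.

    single_user_share:   fraction of users who filed exactly one report
    single_report_share: fraction of all reports filed by those users
    """
    # One-time users file one report each, so per user overall they
    # contribute `single_user_share` reports. That is `single_report_share`
    # of the total, which gives the total reports per user.
    total_reports_per_user = single_user_share / single_report_share
    repeat_reports_per_user = total_reports_per_user - single_user_share
    # Spread the remaining reports over the remaining users.
    return repeat_reports_per_user / (1 - single_user_share)


# Figures quoted in the item: two-thirds of users, 20% of reports.
average = reports_per_repeat_user(Fraction(2, 3), Fraction(1, 5))
print(average)  # 8
```

Under those assumptions, the remaining third of users average eight reports each and account for 80% of all reports, which is why questions about users and questions about outcomes point at different groups.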

Categories: Technology

Four short links: 20 August 2019

Tue, 2019/08/20 - 04:00

Content Moderation, Robust Learning, Archiving Floppies, and xkcd Charting

  1. Information Operations Directed at Hong Kong (Twitter) -- Today we are adding archives containing complete tweet and user information for the 936 accounts we’ve disclosed to our archive of information operations, the largest of its kind in the industry. This is a goldmine for researchers, as you can see from Renee DiResta's notes. Facebook also removed accounts for the same reason but hasn't shared the data. Google has not taken a position yet, which prompted Alex Stamos to say, "Two of the three relevant companies have made public statements. Neither have realistic prospects in the PRC, the other does. Lots of lessons from this episode, but one might be a reinforcement of how Russia represents “easy mode” for platforms doing state attribution. It’s a lot harder when the actor is financially critical, like the PRC or India." We're in interesting times, and research around content moderation is the most interesting thing I've seen on the internet since SaaS. This work cuts to human truths, technical capability, and the limits of openness.
  2. Robust Learning from Untrusted Sources (Morning Paper) -- designed to let you incorporate data from multiple "weakly supervised" (i.e., noisy) data sources. Snorkel replaces labels with probability-weighted labels, and then trains the final classifier using those.
  3. Imaging Floppies (Jason Scott) -- recording the magnetic strength everywhere on the disk so you archive all the data not just the data you can read once. The result of this hardware is that it takes a 140 kilobyte floppy disk (140k) and reads it into a 20 megabyte (20,000 kilobyte) disk image. This means a LOT of the magnetic aspects of the floppy are read in for analysis. [...] This doesn't just dupe the data, but the copy protection, unique track setup, and a bunch of variance around each byte on the floppy to make it easier to work with. The software can then do all sorts of analysis to give us excellent, bootable disk images. Don't ever think that archiving is easy, or problems are solved.
  4. Chart.xkcd -- a chart library that plots “sketchy,” “cartoony,” or “hand-drawn” styled charts. The world needs more whimsy.

Categories: Technology

Four short links: 19 August 2019

Mon, 2019/08/19 - 04:20

Developer Tool, Deep Fakes, DNA Tests, and Retro Coding Hacks

  1. CROKAGE: A New Way to Search Stack Overflow -- a paper about a service [that] takes the description of a programming task as a query and then provides relevant, comprehensive programming solutions containing both code snippets and their succinct explanations. There's a replication package on GitHub. Follows in the footsteps of Douglas Adams's Electric Monk (which people bought to pay for them) and DVRs (which people use to watch TV for them), now we have software that'll copy dodgy code from the web for you. Programmers, software is coming for your jobs.
  2. Cheap Fakes Beat Deep Fakes -- One of the fundamental rules of information warfare is that you never lie (except when necessary.) Deepfakes are detectable as artificial content, which reveals the lie. This discredits the source of the information and the rest of their argument. For an information warfare campaign, using deepfakes is a high-risk proposition.
  3. I Took 9 Different Commercial DNA Tests and Got 6 Different Results -- refers to the dubious ancestry measures. "Ancestry itself is a funny thing, in that humans have never been these distinct groups of people," said Alexander Platt, an expert in population genetics at Temple University in Philadelphia. "So, you can't really say that somebody is 92.6 percent descended from this group of people when that's not really a thing."
  4. Dirty Tricks 6502 Programmers Use -- wonderfully geeky dissection of a simple task rendered in as few bytes as possible.

Categories: Technology

Antitrust regulators are using the wrong tools to break up Big Tech

Mon, 2019/08/19 - 04:00

What we really need is disclosure of information about the growth and health of the supply side of Big Tech's marketplaces.

It’s a nerve-wracking time to be a Big Tech company. Yesterday, a US subcommittee on antitrust grilled representatives from Amazon, Google, Facebook, and Apple in Congress, and presidential candidates have gone so far as to suggest that these behemoths should be broken up. In the European Union, regulation is already happening: in March, the EU levied its third multibillion-dollar fine against Google for anti-competitive behavior.

In his 2018 letter to shareholders, published this past April, Jeff Bezos was already prepping for conversations with regulators. He doesn’t think Amazon is a monopoly. Instead, the company’s founder argues it is “just a small player in global retail.”

In Bezos’s defense, for many of the products Amazon sells, there are indeed many alternative sources, suggesting plenty of competition. Despite Amazon’s leadership in online retail, Walmart is more than double Amazon’s size as a general retailer, with Costco not far behind Amazon. Specialty retailers like Walgreens and CVS in the pharmacy world and Kroger and Albertson’s in groceries also dwarf Amazon’s presence in their categories.

But Amazon does not just compete with Walmart, CVS, Kroger, and other retailers—it also competes with the merchants who sell products through its platform.

This competition isn’t just the obvious kind, such as the Amazon Basics-branded batteries that by 2016 represented one third of all online battery sales, as well as similar Amazon products in audio, home electronics, baby wipes, bed sheets, and kitchenware. Amazon also competes with its merchants for visibility on its platform, and charges them additional fees for favored placement. And because Amazon is now leading with featured products rather than those its customers think are the best, its merchants are incentivized to advertise on the platform. Amazon’s fast-growing advertising business is thus a kind of tax on its merchants.

Likewise, Google does not just compete with other search engines like Bing and DuckDuckGo, but with everyone who produces content on the world wide web. Apple’s iPhone and Google’s Android don’t just compete with each other as smartphone platforms, but also with the app vendors who rely on smartphones to sell their products.

This kind of competition is taken for granted by antitrust regulators, who are generally more concerned with the end cost for consumers. And as anyone who has shopped online will know, Amazon is nearly always the cheaper option. (In fact, surveys have suggested that between seven and nine out of 10 Americans will check Amazon to compare the price of a purchase.) As long as the monopoly doesn’t lead to us forking out more money, then antitrust regulators traditionally leave it alone.

However, this view of antitrust leaves out some unique characteristics of digital platforms and marketplaces. These giants don’t just compete on the basis of product quality and price—they control the market through the algorithms and design features that decide which products users will see and be able to choose from. And these choices are not always in consumers’ best interests.

A fresh approach to antitrust

All of the internet giants—Amazon, Google, Facebook, and insofar as app stores are considered, Apple—provide the illusion of free markets, in which billions of consumers choose among millions of suppliers’ offerings, which compete on the basis of price, quality, and availability.

But if you recognize that what consumers really choose from is not the universe of all possible products, but those that are offered up to them either on the homepage or the search screen, the “shelf space” provided by these platforms is in fact far more limited than the tiniest of local markets—and what is placed on that shelf is uniquely under the control of the platform owner. And with mobile playing a larger and larger role, that digital shelf space is visibly shrinking rather than growing.

In short, the designers of marketplace-platform algorithms and screen layouts can arbitrarily allocate value to whom they choose. The marketplace is designed and controlled by its owners, and that design shapes “who gets what and why” (to use the marvelous phrase from Alvin E. Roth, who received a Nobel prize in economics for his foundational work in the field of market design.)

When it comes to antitrust, the question of market power must be answered by analyzing the effect of these marketplace designs on both buyers and sellers, and how they change over time. How much of the value goes to the platform, how much to consumers, and how much to suppliers?

The platforms have the power to take advantage of either side of their marketplace. Any abuse of market power is likely to show up first on the supply side. A dominant platform can squeeze its suppliers while continuing to pass along part of the benefit to consumers, but keeping more and more of it for itself.

Over time, though, consumers feel the bite. Power over sellers ultimately translates into power over customers as well. As the platform owner favors its own offerings over those of its suppliers, choice is reduced, though it is only in the endgame that consumer pricing—the typical measure of a monopoly—begins to be affected.

The control that the platforms have over placement and visibility puts them in a unique position to collect what economists call rents: that is, value extracted through the ownership of a limited resource. These rents may come in the form of additional advantage given to the marketplace’s own private-label products, but also through the fees that are paid by merchants who sell through that platform. These fees can take many forms, including the necessity for merchants to spend more on advertising in order to gain visibility; Amazon products don’t have to pay such a levy.

The term “rents” dates back to the very earliest days of modern economics, when agricultural land was still the primary source of wealth. That land was worked productively by tenant farmers, who produced value through their labor. But the bulk of the benefit was taken by the landed gentry, who lived lives of ease on the unearned income that accrued to them simply through the ownership of their vast estates. In today’s parlance, Amazon’s merchants are becoming sharecroppers. The cotton field has been replaced by a search field.

Not all rents are bad. Economist Joseph Schumpeter pointed out that technological innovation often can lead to temporary rents, as innovators initially have a corner on a new product or service. But he also pointed out that these so-called Schumpeterian rents can, over time, become traditional monopolistic rents.

This is what antitrust regulators should be looking at when evaluating internet platform monopolies. Has control over the algorithms and designs that allocate attention become the latest tool in the landlord’s toolbox?

Big Tech has become the internet’s landlord—and rents are rising as a result.

In her book, The Value of Everything, economist Mariana Mazzucato makes the case that if we are really to understand the sources of inequality in our economy, economists must turn their attention back to rents. One of the central questions of classical economics was what activities are actually creating value for society, and which are merely value extracting—in effect charging a kind of tax on value that has actually been created elsewhere.

In today’s neoclassical economics, rents are seen as a temporary aberration, the result of market defects that will disappear given sufficient competition. But whether we are asking fundamental questions about value creation, or merely insufficient competition, rent extraction gives us a new lens through which to consider antitrust policy.

How internet platforms increase choice

Before digital marketplaces curtailed our choices as consumers, they first expanded our options.

Amazon’s virtually unlimited virtual shelf space radically expanded opportunity for both suppliers and consumers. After all, Amazon carries 120 million unique products in the US alone, compared to about 120,000 in a Walmart superstore or 35 million on Walmart.com. What’s more, Amazon operates a marketplace with over 2.5 million third-party sellers, whose products, collectively, provide 58% of all Amazon retail revenue, with only 42% coming from Amazon’s first-party retail operation.

In the first-party retail operation, Amazon buys products from its suppliers and then resells them to consumers. In the third-party operation, Amazon collects fees for providing marketplace services to sellers (including display on Amazon.com, warehousing, shipping, and sometimes even financing) but never legally takes possession of the companies’ merchandise. This is what allows it to have so many more products to sell than its competitors: because Amazon never takes possession of inventory but instead charges suppliers for the services it provides, the risk of offering a slow-moving product is transferred from Amazon to its suppliers.

All of this appears to add up to the closest approximation ever seen in retail to what economists call “perfect competition.” This term refers to market conditions in which a large number of sellers with offers to provide comparable products at a range of prices are met by a large number of buyers looking for those products. Those buyers are armed not only with the ability to compare the price at which products are offered, but also to compare the quality of those products via consumer ratings and reviews. In order to win the business of consumers, suppliers must not only offer the best products at the best prices, but must compete for customers to express their satisfaction with the products they have bought.

So far, at least according to the statistics Bezos shared in his annual letter, the success of the Amazon marketplace is a triumph for both suppliers and consumers, and antitrust regulators should look elsewhere. As he put it, “Third-party sellers are kicking our first-party butt.”

He may well be right, but there are warning signs from other internet marketplaces like Google search that suggest the situation may not be as rosy as it appears. As it turns out, regulators need to consider some additional factors in order to understand the market power of internet platforms.

How internet platforms take away choice

If Amazon has become “the everything store” for physical goods, Google is the everything store for information.

Even more than Amazon, Google appears to meet the conditions for perfect competition. It matches up consumers with a near-infinite source of supply. Ask any question, and you’ll be provided with answers from hundreds or even thousands of competing content suppliers.

To do this, Google searches hundreds of billions of web pages created by hundreds of millions of information suppliers. Traditional price matching is absent, since much of the content is offered for free, but Google uses hundreds of other signals to determine what answers its customers are likely to find “best.” They measure such things as the reputation of the sites linking to any other site (PageRank); the words those sites use to make those links (anchor text); the content of the document itself (via an AI engine referred to as “the Google Brain”); how likely people are to click on a given result in the list, based on millions of iterations, all recorded and measured; and even whether people clicked on a link and appear to have gone away satisfied (“a long click”) or came back and clicked on another (“a short click”).

The same goes for advertising on Google. Its “pay per click” ad auction model was a breakthrough in the direction of perfect competition: advertisers pay only when customers click on their ads. Both Google and advertisers are thus incentivized to feature ads that users actually want to see.

Only about 6% of Google search results pages contain any advertising at all. Both content producers and consumers have the benefit of Google’s immense effort to index and search all web pages, not just those that are commercially valuable. Google is like a store where all of the goods are free to consumers, but some merchants pay, in the form of advertising, to have their goods placed front and center.

The company is well aware of the risk that advertising will lead Google to favor the needs of advertisers over those of searchers. In fact, “Advertising and mixed motives” is the title of the appendix to Google founders Larry Page and Sergey Brin’s original 1998 research paper on Google’s search algorithms, written while they were still graduate students at Stanford.

“The goals of the advertising business model do not always correspond to providing quality search to users,” they thoughtfully observed. Google made enormous efforts to overcome those mixed motives by clearly separating their advertising results from their organic results, but the company has blurred those boundaries over time, perhaps without even recognizing the extent to which they have done so.

It is undeniable that the Google search results pages of today look nothing like they did when the company went public in 2004. The list of 10 “organic” results, with three paid listings on the top and a sidebar of advertising results on the right, that once characterized Google is long gone.

Dutch search engine consultant Eduard Blacquière documented the changes in size and placement of adwords (link in Dutch), the pay-per-click advertisements that run alongside searches, between 2010 and 2014. Here’s a page he captured in June 2010, the result for a search for the word “autoverzekering” (“auto insurance” in Dutch).

Figure 1. Screengrab: Eduard Blacquière.

Note that the adwords at the top of the page have a background tint, and those at the side have a narrower column width, setting both off clearly from the organic results. Take a quick glance at this page, and your eye can quickly jump to the organic results while ignoring the ads if that’s what you prefer.

Here is Blacquière’s dramatization of the change in size of that top block of adwords. As you can see, the ad block has both dramatically changed in size and lost its background color between 2010 and 2019, making it much harder to distinguish ads from organic results.

Figure 2. Screengrab: Eduard Blacquière.

Today, paid results can push organic results almost off the screen, so that the searcher has to scroll down to see them at all. On mobile pages with advertisements, this is almost always the case. Blacquière also documented the result of several studies done over a five-year period, which found the likelihood of a click on the first organic search result fell from over 40% in 2010 to less than 20% in 2014. This shows that through changes in homepage design alone, Google was able to shift significant attention from organic search results to ads.

Not only is paid advertising supplanting organic search results, but for more and more queries, Google itself has now collected enough information to provide what it considers to be the best answer directly to the consumer, eradicating the need to send us to a third-party website at all.

That’s the box that often appears above the search results when you ask a question, such as What are the lyrics to “Don’t Stop Believing,” or What date did WWII end?; the box to the right that pops up with restaurant reviews and opening hours; or a series of visual cards midway down the screen that show you the actors who appeared in a movie or different kinds of pastries common to a geographic region.

Where does this information come from? In 2010, with the acquisition of Metaweb, Google committed to a project it called “the knowledge graph,” a collection of facts about well-known entities such as places, people, and events. This knowledge graph provides immediate answers for many of the most common queries.

The knowledge graph was initially culled from the web by ingesting information from sources such as Wikipedia, Wikidata, and the CIA Factbook, but since then, it has become far more encyclopedic and has ingested information from all over the web. In 2016, Google CEO Sundar Pichai claimed that the Google knowledge graph contained more than 70 billion facts.

As shown in the figure below, for a popular search that has commercial potential, like visit Yellowstone, not only is the search results page dominated by paid search results (ads) and content directly supplied by Google, but Google’s “answer boxes” are themselves filled with links to other Google pages rather than to third-party websites. (Note that Google personalizes results and also runs hundreds of thousands of A/B tests a day on the effect of minor changes in position, so your own results for this identical search may differ from those shown here.)

Figure 3. Screengrab: Tim O'Reilly.

As of March 2017, user clickstream data provided by web analytics firm Jumpshot suggests that up to 40% of all Google queries no longer result in a click through to an external website. Think of all the questions you go to Google for that no longer require a second click: what’s the weather? What’s the current value of the euro against the dollar? What’s that song that’s playing in the background? What’s the best local restaurant? Biographies of eminent people, descriptions of cities, neighborhoods, businesses, historical events, quotes by famous authors, song lyrics, stock prices, and flight times all now appear as immediate answers from Google.

I am not necessarily suggesting anti-competitive intent. Google claims, with considerable justice, that all of these changes to search engine result pages are designed to improve user experience. And indeed, it is often helpful to get an immediate answer to a query rather than having to click through to another web site. Furthermore, much of this data is in fact licensed. But these deals seem like a step backward from the perfect competition represented by Google’s original reliance on multi-factor search algorithms to surface the very best information from independent web sites.

The net effect on Google’s financial performance is striking. In 2004, the year that Google went public, it had two principal advertising revenue engines: Adwords (those pay-per-click advertisements that run alongside searches on Google’s own site) and Adsense (pay-per-click advertisements that Google places on third-party websites on their behalf, either in search results on their site or directly alongside their content). In 2004, the two revenue sources were very close to equal. But by 2018, Google’s revenue from advertising on its own properties had grown to 82% of its total advertising revenue, with only 18% coming from the advertising it provides on third-party sites.

These examples illustrate the power of a platform to shape, both by placement on the screen and algorithmic priority, the pages users click on and the products they decide to buy—and therefore also the economic success for the supply side of its marketplace. Google maintains a rigorous separation between the search and advertising teams, but despite that fact, changes in the layout of Google’s pages and its algorithms have played an enormous role in shaping the attention of its users to favor those who advertise with Google.

When Google decides unilaterally on the size and position that its own products take on the screen, it also stops consumers from organically deciding what content to click on or what socks to buy. That’s what antitrust regulators should be considering: whether the algorithmic and design control exerted by sites like Google or Amazon reduces the choices we have as consumers.

Maintaining the illusion of choice

If Google has monopolized our access to information, Amazon’s fast-growing advertising business is now shaping what products consumers are actually given to choose from. Have they, too, taken a bite from the poisoned apple of advertising’s mixed motives?

Like Google, Amazon used to rely heavily on the collective intelligence of its users to recommend the best products from its suppliers. It did this by using information such as the supplier-provided description of the product, the number and quality of reviews, the number of inbound links, the sales rank of similar products, and so on, to determine the order in which search results would appear. These were all factored into Amazon’s default search ranking, which put products that were considered “Most Popular” first.

But as with Google, this eden of internet collective intelligence may be in danger of coming to an end.

In the example below, you can see that the default search for “best science fiction books” on Amazon now turns up only “Featured” (i.e., paid for) products. Are these the results you’d expect from this search? Where are the Hugo and Nebula award winners? Where are the books and authors with thousands of five-star reviews?

Contrast these results with those for the same search on Google, shown in the figure below. A knowledgeable science-fiction fan might quibble with some of these selections, but this is indeed a list of widely acknowledged classics in the field. In this case, Google presents no advertising, and so the results instead simply reflect the collective intelligence of what the web thinks is best.

While this might be taken as a reflection of the superiority of Google’s search algorithms over Amazon’s, the more important point is to note how differently a platform treats results when it has no particular commercial axe to grind.

Amazon has long claimed that the company is fanatically focused on the needs of its customers. A search like the one shown above, which favors paid results, demonstrates how far the quest for advertising dollars takes them from that avowed goal.

Advice for antitrust regulators

So how are we best to decide whether these Big Tech platforms need to be regulated?

In one famous exchange, Bill Gates, the founder and former CEO of Microsoft, told Chamath Palihapitiya, the one-time head of the Facebook platform:

“This isn’t a platform. A platform is when the economic value of everybody that uses it exceeds the value of the company that creates it. Then it’s a platform.”

Given this understanding of the role of a platform, regulators should be looking to measure whether companies like Amazon or Google are continuing to provide opportunity for their ecosystem of suppliers, or if they’re increasing their own returns at the expense of that ecosystem.

Rather than just asking whether consumers benefit in the short term from the companies’ actions, regulators should be looking at the long-term health of the marketplace of suppliers—they are the real source of that consumer benefit, not the platforms alone. Have Amazon, Apple, or Google earned their profits, or are they coming from monopolistic rents?

How might we know whether a company operating an algorithmically managed marketplace is extracting rents rather than simply taking a reasonable cut for the services it provides? The first sign may not be that it is raising prices for consumers, but that it is taking a larger percentage from its suppliers, or competing unfairly with them.

Before antitrust authorities look to remedies like breaking up these companies, a good first step would be to require disclosure of information about the growth and health of the supply side of their marketplaces. The statistics about the growth of its third-party marketplace that Bezos trumpeted in his shareholder letter tell only half the story. The questions to ask are who profits, by how much, and how that allocation of rewards is changing over time.

Regulators such as the SEC should require regular financial reporting on the allocation of value between the platform and its marketplace. I have done limited analysis for Google and Amazon based on information provided in their annual public filings, but much of the information required for a rigorous analysis is just not available.

Google provides an annual economic impact report analyzing value provided to its advertisers, but there is no comparable report for the value created for its content suppliers. Nor is there any visibility into the changing fortunes of app suppliers in the Play Store, Google’s Android app marketplace, or into the fortunes of content providers on YouTube.

Questions of who gets what and why must be asked of Amazon’s marketplace and its other operating units, including its dominant cloud-computing division, or Apple’s App Store. The role of Facebook’s algorithms in deciding what content appears in its readers’ newsfeeds has been widely scrutinized with regard to political bias and manipulation by hostile actors, but there’s been little rigorous economic analysis of economic bias in the algorithms of any of these companies.

Data is the currency of these companies. It should also be the currency of those looking to regulate them. You cannot regulate what you don’t understand. The algorithms that these companies use may be defended as trade secrets, but their outcomes should be open to inspection.

Continue reading Antitrust regulators are using the wrong tools to break up Big Tech.

Categories: Technology

Labeling, transforming, and structuring training data sets for machine learning

Thu, 2019/08/15 - 04:30

The O’Reilly Data Show Podcast: Alex Ratner on how to build and manage training data with Snorkel.

In this episode of the Data Show, I speak with Alex Ratner, project lead for Stanford’s Snorkel open source project; Ratner also recently garnered a faculty position at the University of Washington and is currently working on a company supporting and extending the Snorkel project. Snorkel is a framework for building and managing training data. Based on our survey from earlier this year, labeled data remains a key bottleneck for organizations building machine learning applications and services.

Ratner was a guest on the podcast a little over two years ago when Snorkel was a relatively new project. Since then, Snorkel has added more features, expanded into computer vision use cases, and now boasts many users, including Google, Intel, IBM, and other organizations. Along with his thesis advisor professor Chris Ré of Stanford, Ratner and his collaborators have long championed the importance of building tools aimed squarely at helping teams build and manage training data. With today’s release of Snorkel version 0.9, we are a step closer to having a framework that enables the programmatic creation of training data sets.

Continue reading Labeling, transforming, and structuring training data sets for machine learning.

Categories: Technology

Four short links: 15 August 2019

Thu, 2019/08/15 - 04:05

Data Businesses, Data Science Class, Tiny Mouse, and Training Bias

  1. Making Uncommon Knowledge Common -- The Rich Barton playbook is building data content loops to disintermediate incumbents and dominate search, and then using this traction to own demand in their industries.
  2. Data: Past, Present, and Future -- Data and data-empowered algorithms now shape our professional, personal, and political realities. This course introduces students both to critical thinking and practice in understanding how we got here, and the future we now are building together as scholars, scientists, and citizens. The way "Intro to Data Science" classes ought to be.
  3. Clever Travel Mouse -- very small presenter tool, mouse and pointer.
  4. Training Bias in "Hate Speech Detector" Means Black Speech is More Likely to be Censored (BoingBoing) -- The authors do a pretty good job of pinpointing the cause: the people who hand-labeled the training data for the algorithm were themselves biased, and incorrectly, systematically misidentified AAE writing as offensive. And since machine learning models are no better than their training data (though they are often worse!), the bias in the data propagated through the model.

Continue reading Four short links: 15 August 2019.

Categories: Technology

Four short links: 14 August 2019

Wed, 2019/08/14 - 04:00

Hardware Deplatforming, Hiring Groupthink, Loot Boxes and Problem Gambling, and Interoperability and Privacy

  1. Getting Deplatformed from Apple (BoingBoing) -- It turned out that getting locked out of his Apple account made all of Luke's Apple hardware almost useless. I think it should be illegal to do this. I believe in deplatforming (with appropriate boundaries and appeal) but breaking my hardware is bollocks.
  2. How to Avoid Groupthink When Hiring (HBR) -- abridged process: First, make it clear to interviewers that they should not share their interview experiences with each other before the final group huddle. Next, ask each interviewer to perform a few steps before the group huddle: distill their interview rating to a single numerical score; write down their main arguments for and against hiring this person and their final conclusion; If interviewers are emailing in their numerical scores and thoughts on a candidate, don’t include the entire group in the email. Finally, the hiring managers should take note of the average score for a candidate.
  3. Loot Boxes a Matter of "Life or Death," says Researcher -- "There's one clear message that I want to get across today, and it stands in stark contrast to mostly everything you've heard so far," Zendle said. "The message is this: spending money on loot boxes is linked to problem gambling. The more money people spend on loot boxes, the more severe their problem gambling is. This isn't just my research. This is an effect that has been replicated numerous times across the world by multiple independent labs. This is something the games industry does not engage with."
  4. Interoperability and Privacy (BoingBoing) -- latest in the tear that Cory's been on about how to deal with the centralized power of BigSocial.

Continue reading Four short links: 14 August 2019.

Categories: Technology

Four short links: 13 August 2019

Tue, 2019/08/13 - 03:55

Recognizing Fact, YouTube & Brazil, Programming Zine, and Credit Blacklists

  1. Younger Americans are Better than Older Americans at Telling Factual News Statements from Opinions (Pew Research) -- About a third of 18- to 49-year-olds (32%) correctly identified all five of the factual statements as factual, compared with two-in-ten among those ages 50 and older. A similar pattern emerges for the opinion statements. Among 18- to 49-year-olds, 44% correctly identified all five opinion statements as opinions, compared with 26% among those ages 50 and older. Or, 68% of 18-49 year olds couldn't tell whether five factual statements were factual? (via @pewjournalism)
  2. How YouTube Radicalized Brazil (NYT) -- He was killing time on the site one day, he recalled, when the platform showed him a video by a right-wing blogger. He watched out of curiosity. It showed him another, and then another. “Before that, I didn’t have an ideological political background,” Mr. Martins said. YouTube’s auto-playing recommendations, he declared, were “my political education.” “It was like that with everyone,” he said.
  3. Paged Out -- a new experimental (one article == one page) free magazine about programming (especially programming tricks!), hacking, security hacking, retro computers, modern computers, electronics, demoscene, and other similar topics.
  4. Credit Blacklists, Not the Solution to Every Problem -- translated Chinese article on blacklists. As the aforementioned source explained, Wulian County is one of the first in Shandong Province to trial the construction of a social credit system, that began last year. The blacklist is a disciplinary measure restricted to persons within the county. It is different from the People’s Bank of China’s credit information evaluation system blacklist, or the blacklist for those deemed to be untrustworthy by the People’s Court. It does not affect the educational opportunities of anyone’s children, whether or not they themselves can ride a train or plane, and so on. Activities such as volunteering, donating blood, charitable contributions, and so on, can add to one’s personal credit (score), and can also be used to restore and upgrade credit ratings, removing themselves from the blacklist. (via ChinAI)

Continue reading Four short links: 13 August 2019.

Categories: Technology

Blockchain solutions in enterprise

Mon, 2019/08/12 - 04:00

A review of the crucial steps for a successful blockchain-based solution.

Blockchain is a solution for business networks. It makes sense to deploy a blockchain-based solution only where there is a network of collaborating participants who are issuing transactions around a set of common assets in the network. In this article, we’ll identify the initial crucial steps to identifying scenarios for a successful blockchain-based solution, and the first steps toward transforming your business model.

Our first observation of when blockchain is the right solution is that there must be a business network of multiple participants. Our second would be that they require a shared view of assets and their associated transactions.

We then use the following four key blockchain features to further define the benefits of a blockchain-based solution:

Consensus
The process of agreeing on new transactions and distributing them to participants in the network.

Provenance
A complete history of all transactions related to the assets recorded on the blockchain.

Immutability
Once a transaction has been stored on the blockchain, it cannot be edited, deleted, or have transactions inserted before it.

Finality
Once a transaction is committed to the blockchain, it is considered “final” and can no longer be “rolled back” or undone.
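
To make immutability and provenance more concrete, here is a minimal Python sketch of a hash-chained ledger. It is an illustration under stated assumptions, not how any particular blockchain platform is implemented: the names `Ledger`, `append`, `verify`, and `history` are hypothetical, the sample assets are made up, and consensus is left out entirely (a single process stands in for the whole network).

```python
import hashlib
import json


def block_hash(index, prev_hash, payload):
    """Hash a block's contents together with the previous block's hash."""
    body = json.dumps(
        {"index": index, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Ledger:
    """Append-only chain of blocks (hypothetical, single-process sketch)."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first block

    def __init__(self):
        self.blocks = []

    def append(self, payload):
        """Record a transaction by linking it to the hash of the previous block."""
        prev_hash = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        index = len(self.blocks)
        block = {
            "index": index,
            "prev_hash": prev_hash,
            "payload": payload,
            "hash": block_hash(index, prev_hash, payload),
        }
        self.blocks.append(block)
        return block

    def verify(self):
        """Return True only if every block still matches its recorded hash."""
        prev_hash = self.GENESIS
        for index, block in enumerate(self.blocks):
            expected = block_hash(index, prev_hash, block["payload"])
            if block["prev_hash"] != prev_hash or block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True

    def history(self, asset_id):
        """Provenance: every recorded transaction that touched one asset."""
        return [b["payload"] for b in self.blocks
                if b["payload"].get("asset") == asset_id]


ledger = Ledger()
ledger.append({"asset": "pallet-42", "action": "create", "by": "supplier"})
ledger.append({"asset": "pallet-42", "action": "ship", "by": "carrier"})
ledger.append({"asset": "pallet-7", "action": "create", "by": "supplier"})

provenance = ledger.history("pallet-42")
intact = ledger.verify()

# Simulate someone editing a transaction that was already recorded.
ledger.blocks[1]["payload"]["by"] = "someone-else"
tampered_ok = ledger.verify()
```

In this sketch the edit to an earlier block is detected because its stored hash no longer matches its contents. In a real network, the other participants' copies of the ledger would also disagree with the altered one.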

There are several other blockchain benefits that underpin these four key benefits, and they are worth keeping in mind as you review any potential scenarios:

  • All participants in a permissioned blockchain network have an identity in the form of a digital certificate, the same technology that underpins the security and trust when we use a web browser to access our online bank.
  • Every transaction in the permissioned network is cryptographically signed, which provides authenticity of which participant sent it, nonrepudiation (meaning they can’t deny sending it), and integrity (meaning it hasn’t been changed since it was sent).
  • Smart contracts hold the business logic for transactions and are executed across the network by the participants endorsing a transaction.

These benefits help engender trust between the participants in business networks, and we can use them as a litmus test when checking to see if blockchain is a good technology fit. We should note that while it’s not necessary for a scenario to require every benefit just listed, the more that are required, the more the case is strengthened for using blockchain.
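
The litmus test can be sketched as a small scoring function. This is only an illustration: the `Scenario` fields, the `blockchain_fit` function, and the example scenarios are hypothetical, and the labels `identity`, `signed_transactions`, and `smart_contracts` are shorthand chosen here for the three supporting benefits. The sketch treats a multi-participant network with shared assets as a precondition and then simply counts how many benefits the scenario requires.

```python
from dataclasses import dataclass, field

# Four key features plus three supporting benefits (labels are shorthand).
BENEFITS = (
    "consensus", "provenance", "immutability", "finality",
    "identity", "signed_transactions", "smart_contracts",
)


@dataclass
class Scenario:
    name: str
    participants: int          # number of organizations in the business network
    shared_assets: bool        # do they need a shared view of assets?
    required_benefits: set = field(default_factory=set)


def blockchain_fit(scenario):
    """Return (is_candidate, score); score counts the benefits required."""
    unknown = scenario.required_benefits - set(BENEFITS)
    if unknown:
        raise ValueError(f"unknown benefits: {sorted(unknown)}")
    # Preconditions: more than one participant and a shared view of assets.
    is_candidate = scenario.participants > 1 and scenario.shared_assets
    score = len(scenario.required_benefits) if is_candidate else 0
    return is_candidate, score


disputes = Scenario(
    "supplier disputes", participants=12, shared_assets=True,
    required_benefits={"consensus", "provenance", "immutability"},
)
solo = Scenario(
    "internal inventory", participants=1, shared_assets=True,
    required_benefits={"immutability"},
)

disputes_fit = blockchain_fit(disputes)
solo_fit = blockchain_fit(solo)
```

A higher score only means more of the listed benefits are needed. It says nothing about cost or effort, which have to be weighed separately.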

We should always be wary of thinking that blockchain is a panacea for all problems. There are many reasons why blockchain wouldn’t be a good fit. For example:

  • Blockchain is not suitable if there’s only a single participant in the business network.
  • Although we talk about transactions and world state databases in blockchain, it shouldn’t be thought of as a replacement for traditional database or transaction servers.
  • Blockchain by design is a distributed peer-to-peer network, and is heavily based on cryptography. With this comes a number of nonfunctional requirement considerations. For example, performance and latency won’t match a traditional database or transaction server, but scalability, redundancy, and high availability are built in.

Assets, participants, and transactions

When thinking about a potential blockchain solution and the benefits it brings to the network of participants, it is useful to view it in relation to the following concepts:

  • Assets
  • Participants
  • Transactions

We have already introduced some examples of these. They are core concepts in a blockchain network that benefit from the four primary trust benefits introduced in the previous section.

Assets
Either purely digital, or backed by a physical object, an asset represents something that is recorded on the blockchain. The asset may be shared across the whole network, or can be kept private depending on the requirements. A smart contract defines the asset.

Participants
Participants occupy different levels in a blockchain network. There are those participants who run parts of the network and endorse transactions. Other members may consume services of the network but may rely on and trust other participants to run the network and endorse transactions. Then there are the end users who are interacting with the blockchain network through a user interface. The end user may not even be aware that a blockchain underpins the system.

Transactions
The transactions are coded inside the smart contracts alongside the assets to which the transactions belong. Think of the transactions as the interaction points between the assets and the participants; a participant can create, delete, and update a given asset, assuming they are authorized to do so. It is these transactions that are stored immutably on the blockchain, which also provides the provenance of any changes to the asset over time.
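
The relationship between the three concepts can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the API of any blockchain platform: `AssetContract`, `submit`, `provenance`, the permission table, and the sample participants are all hypothetical, and the plain list used as a log stands in for the blockchain itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    participant: str
    action: str      # "create", "update", or "delete"
    asset_id: str
    data: dict


class AssetContract:
    """Hypothetical stand-in for a smart contract managing one asset type."""

    def __init__(self, permissions):
        # permissions maps a participant name to the set of actions it may take
        self.permissions = permissions
        self.world_state = {}   # current value of each asset
        self.log = []           # append-only record of accepted transactions

    def submit(self, participant, action, asset_id, data=None):
        """Apply a transaction if the participant is authorized, then log it."""
        if action not in self.permissions.get(participant, set()):
            raise PermissionError(f"{participant} may not {action} assets")
        exists = asset_id in self.world_state
        if action == "create":
            if exists:
                raise ValueError(f"{asset_id} already exists")
            self.world_state[asset_id] = dict(data or {})
        elif action == "update":
            if not exists:
                raise KeyError(asset_id)
            self.world_state[asset_id].update(data or {})
        elif action == "delete":
            if not exists:
                raise KeyError(asset_id)
            del self.world_state[asset_id]
        else:
            raise ValueError(f"unknown action: {action}")
        tx = Transaction(participant, action, asset_id, dict(data or {}))
        self.log.append(tx)
        return tx

    def provenance(self, asset_id):
        """Every accepted transaction for one asset, in order."""
        return [tx for tx in self.log if tx.asset_id == asset_id]


contract = AssetContract({
    "supplier": {"create", "update"},
    "carrier": {"update"},
})
contract.submit("supplier", "create", "order-1", {"status": "packed"})
contract.submit("carrier", "update", "order-1", {"status": "in transit"})

try:
    contract.submit("carrier", "delete", "order-1")
    rejected = False
except PermissionError:
    rejected = True

trail = [(tx.participant, tx.action) for tx in contract.provenance("order-1")]
```

Here the carrier's unauthorized delete is rejected and never reaches the log, while the two accepted transactions form the asset's history.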

The blockchain fit

First and foremost, check that there is a business network in place. Identify how many suppliers and partners are involved in both the internal and external network. If there is a good business network in place, consider the rest of the blockchain features.

Some of the disputes relate to differences between what was ordered and what was subsequently received. This is often the result of different participants in a business network (partners, suppliers, and delivery companies) tracking goods in separate, siloed systems.

Therefore, a shared ledger with consensus and finality provided by blockchain across the business network will help to reduce the overall number of disputes as it will give all participants the same information on the assets being tracked.

Furthermore, if changes to the tracked data, whether intentional or unintentional, are part of the root cause of these disputes, then the provenance and immutability features of blockchain could also help.

Last, consider the amount of time taken to resolve these issues. If there are multiple systems (including third-party systems) that someone needs to check in order to resolve any transactions in dispute, having a single shared ledger that is maintained through consensus will help reduce the time taken to resolve them.

Some further observations about how a blockchain-based solution can benefit this business network:

  • Each participant in the business network has an identity and is permissioned in the network. This could help with your processes related to know your customer (KYC) and anti-money laundering (AML).
  • Smart contracts could be designed to resolve some of the disputes automatically by maintaining consistency across the business network and therefore further reducing the number of disputes.
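As a rough illustration of the last point, the sketch below shows the kind of rule a smart contract might encode to settle an order-versus-delivery mismatch without manual intervention. It is plain Python rather than code for any specific smart contract platform, and the function name, parameters, statuses, and settlement policy are all assumptions made for the example.

```python
def settle_delivery(ordered_qty, received_qty, unit_price, tolerance=0):
    """Hypothetical settlement rule shared by all participants."""
    shortfall = ordered_qty - received_qty
    if abs(shortfall) <= tolerance:
        # Quantities agree (within tolerance): bill the order as placed.
        return {"status": "accepted", "amount_due": ordered_qty * unit_price}
    if shortfall > 0:
        # Short delivery: bill only for what actually arrived.
        return {"status": "auto_adjusted", "amount_due": received_qty * unit_price}
    # Over-delivery is left for people to resolve in this sketch.
    return {"status": "needs_review", "amount_due": None}


exact = settle_delivery(ordered_qty=10, received_qty=10, unit_price=5)
short = settle_delivery(ordered_qty=10, received_qty=8, unit_price=5)
over = settle_delivery(ordered_qty=10, received_qty=12, unit_price=5)
```

Because every participant would run the same rule against the same shared record, the outcome is consistent across the network, which is what removes the dispute. Which cases are safe to settle automatically is a business decision, not something the sketch can answer.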

Choosing a first scenario

You may be considering multiple scenarios where blockchain provides a good solution fit. In this case, you will need to compare each to determine which is the best scenario to work on first.

We recommend a simple approach for comparing each scenario using a quadrant chart, where each is placed on the chart based on its relative benefit and simplicity.

In Figure 1, the x-axis is the simplicity of the scenario (simpler to the right) and the y-axis represents the benefit (more beneficial to the top). Place each scenario on the quadrant chart, considering its expected benefit and simplicity as a blockchain solution. This is best done as a group exercise with appropriate stakeholders who can provide the necessary insight into where each scenario falls on the chart, based on its level of simplicity and potential benefits.

Once all scenarios have been plotted on the chart, it becomes obvious which are the first scenarios to concentrate on—those that will provide the most benefits and are the simplest.

Figure 1. Comparing scenarios based on their benefit and simplicity
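The quadrant comparison can also be sketched in a few lines of Python, which may help when recording the outcome of the group exercise. The 0 to 1 scoring scale, the 0.5 midpoint, the quadrant labels, the ranking by summed score, and the scenario names are all assumptions for illustration; the article itself prescribes only the two axes.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    benefit: float     # y-axis: 0 (low benefit) to 1 (high benefit)
    simplicity: float  # x-axis: 0 (complex) to 1 (simple)


def quadrant(scenario, midpoint=0.5):
    # Classify a scenario by which side of each axis midpoint it falls on.
    high_benefit = scenario.benefit >= midpoint
    simple = scenario.simplicity >= midpoint
    if high_benefit and simple:
        return "start here"
    if high_benefit:
        return "high benefit, complex"
    if simple:
        return "simple, low benefit"
    return "deprioritize"


def rank(scenarios):
    # Order by combined score so the top-right corner comes first.
    return sorted(scenarios, key=lambda s: s.benefit + s.simplicity, reverse=True)


scenarios = [
    Scenario("cross-border settlement", benefit=0.9, simplicity=0.3),
    Scenario("delivery dispute tracking", benefit=0.9, simplicity=0.8),
    Scenario("internal audit log", benefit=0.2, simplicity=0.9),
]
ordered = rank(scenarios)
labels = {s.name: quadrant(s) for s in scenarios}
```

The scores themselves still have to come from the stakeholders; the code only makes the placement explicit and repeatable.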

Transforming the business network

Once your first blockchain scenario has been identified, you will want to move to the next phase: building the minimum viable product (MVP). An MVP represents the minimum product that can be built to accomplish a goal of the blockchain scenario. Starting an MVP with blockchain shouldn’t differ from starting one with any other technology, and good software engineering practices, such as using Agile principles, will always be applicable. Following are some observations that will help as you start to transform your business with a new blockchain-based solution:

  • Blockchain is a team sport. There will be multiple stakeholders from different organizations in the business network. Some of these organizations may not have traditionally worked directly with one another. Therefore, a clear understanding of the requirements and issues across all participants, and clear lines of communication and agreement, are critical to the success of the project.
  • Use design thinking techniques, which focus on the goals of the user, to agree on the scope of the MVP.
  • Use agile software engineering best practices, such as continuous integration and stakeholder feedback, to iterate throughout the development of the MVP. Keep stakeholders informed and act on feedback.
  • Start with a small network and grow. There will be some challenges ahead, as this may be a paradigm shift for the business network.
  • If replacing an existing system, consider running the blockchain-based solution as a shadow chain to mitigate risk. By this we mean that, during the pilot phase, you run the new platform alongside the legacy system. Ideally, you would pass real production data to the new blockchain-based system to test and validate it, while continuing to rely on the legacy system for this phase of the project. Only after thorough testing has been completed and the new system has been proven should you switch from the legacy system to the new one.
  • Although blockchain is likely to be a core foundational part of the solution, it probably won’t make up the majority of it. The blockchain network will still integrate with other external systems, providing additional functions such as off-chain data storage, identity access management, application programming interface (API) management, presentation layers, and so on.
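The shadow chain idea from the list above can be sketched as a small harness: each request goes to both systems, the legacy result remains authoritative, and any disagreement is logged for review. This is a minimal sketch under stated assumptions; shadow_run, the two handler functions, and the request shape are hypothetical stand-ins for whatever the real legacy and blockchain-based systems expose.

```python
def shadow_run(requests, legacy_handler, shadow_handler):
    """Serve from the legacy system while validating the new one."""
    results, mismatches = [], []
    for request in requests:
        legacy_result = legacy_handler(request)  # still the system of record
        results.append(legacy_result)
        try:
            shadow_result = shadow_handler(request)
        except Exception as exc:
            # A failure in the shadow system must never affect production.
            mismatches.append((request, legacy_result, repr(exc)))
            continue
        if shadow_result != legacy_result:
            mismatches.append((request, legacy_result, shadow_result))
    return results, mismatches


def legacy_invoice(request):
    return request["qty"] * request["price"]


def shadow_invoice(request):
    # Deliberate, made-up defect so the harness has something to catch.
    total = request["qty"] * request["price"]
    return total - 1 if request["qty"] >= 100 else total


requests = [{"qty": 2, "price": 5}, {"qty": 100, "price": 1}]
results, mismatches = shadow_run(requests, legacy_invoice, shadow_invoice)
```

An empty mismatch list over a long enough period of real traffic is one piece of evidence that the new system is ready to take over; it does not replace the thorough testing the pilot phase calls for.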

Read the full free ebook here.

This post is a collaboration between O'Reilly and IBM. See our statement of editorial independence.

Continue reading Blockchain solutions in enterprise.

Categories: Technology

Four short links: 12 August 2019

Mon, 2019/08/12 - 04:00

Retro Hacking, Explaining AI, Teacher Ratings, and Algorithmic Bias

  1. First Person Adventure via Mario Maker (Vice) -- the remarkable “3D Maze House (P59-698-55G)” by creator ねぎちん somehow manages to credibly re-create the experience of playing a first-person (!!) adventure game like Wizardry, something Nintendo clearly never intended.
  2. Measurable Counterfactual Local Explanations for Any Classifier -- generates w-counterfactual explanations that state minimum changes necessary to flip a prediction’s classification [and ...] builds local regression models, using the w-counterfactuals to measure and improve the fidelity of its regressions. Making AI "explain itself" is useful and hard; this seems like an interesting step forward.
  3. Student Evaluation of Teaching Ratings and Student Learning are Not Related (Science Direct) -- Students do not learn more from professors with higher student evaluation of teaching (SET) ratings. [...] New meta-analyses of multisection studies show that SET ratings are unrelated to student learning. (via Sciblogs)
  4. Apparent Gender-Based Discrimination in the Display of STEM Career Ads -- women disproportionately click on job ads, so bidding algorithms charge more to advertisers to show to women, so men see more job ads. (via Ethan Mollick)

Continue reading Four short links: 12 August 2019.

Categories: Technology

Four short links: 9 August 2019

Fri, 2019/08/09 - 04:05

Shadow Ban Patent, Abusing Unix Tools, Deblurring Photos, and Postal Vectors

  1. Facebook Patents Shadow Banning -- which has a long history elsewhere.
  2. Living Off The Land in Linux -- legitimate functions of Unix binaries that can be abused to break out restricted shells, escalate or maintain elevated privileges, transfer files, spawn bind and reverse shells, and facilitate the other post-exploitation tasks. Interesting to see the surprising functionality built into some utilities.
  3. Neural Blind Deconvolution Using Deep Priors -- deblurring photos with neural nets. Very cool, and they've posted code. (via @roadrunning01)
  4. Warshipping (TechCrunch) -- I mail you a package that contains a Wi-Fi sniffer with cellular connection back to me. It ships me your Wi-Fi handshake, I crack it, ship it back, now it joins your network and the game is afoot. (via BoingBoing)

Continue reading Four short links: 9 August 2019.

Categories: Technology