Feed aggregator

0x68: Molly De Blanc at CopyleftConf 2019

FAIF - Fri, 2019/05/31 - 04:26

Bradley and Karen enjoy and discuss Molly De Blanc's keynote at the first annual CopyleftConf, entitled The Margins of Software Freedom, followed by an exclusive interview with Molly!

Show Notes: Segment 0 (00:37)
Categories: Free Software

Four short links: 30 May 2019

O'Reilly Radar - Thu, 2019/05/30 - 04:05

Open Insulin, Sonification of Data, Security UX, and Advanced Data Structures

  1. Open Insulin -- open biohacking group working on (and making progress toward) the open and cheap manufacturing of insulin.
  2. The Sound of Data -- a gentle intro to sonification for historians.
  3. UX of Security -- an interesting talk on the relationship between UX and security. I particularly liked: "the most expensive dialog box in the world costs an Australian bank $750,000,000/year for password resets." Slides available. (via Jared Spool)
  4. Advanced Data Structures -- MIT course.

Continue reading Four short links: 30 May 2019.

Categories: Technology

Four short links: 29 May 2019

O'Reilly Radar - Wed, 2019/05/29 - 03:50

Robustness Principle, End of Mobile, Beautiful Hack, and Autonomous Radios

  1. The Harmful Consequences of the Robustness Principle (IETF) -- Time and experience shows that negative consequences to interoperability accumulate over time if implementations apply the robustness principle. This problem originates from an assumption implicit in the principle that it is not possible to effect change in a system the size of the internet. That is, the idea that once a protocol specification is published, changes that might require existing implementations to change are not feasible.
  2. The End of Mobile (Stratechery) -- deep dive into numbers on mobile adoption around the world. The end is the kicker, though: I’m not updating my smartphone model anymore. The next fundamental trends in tech, today, are probably machine learning, crypto, and regulation.
  3. GLS: Goroutine Local Storage -- using the call stack to implement local storage for goroutines, against the language's intentions. A splendidly hacky hack. What are people saying? "Wow, that's horrifying." "This is the most terrible thing I have seen in a very long time." "Where is it getting a context from? Is this serializing all the requests? What the heck is the client being bound to? What are these tags? Why does he need callers? Oh god no. No no no."
  4. If DARPA Has Its Way, AI Will Rule the Wireless Spectrum (IEEE) -- To tackle spectrum scarcity, I created the Spectrum Collaboration Challenge (SC2) at the U.S. Defense Advanced Research Projects Agency (DARPA), where I am a program manager. [...] Teams are designing new radios that use artificial intelligence (AI) to learn how to share spectrum with their competitors, with the ultimate goal of increasing overall data throughput. These teams are vying for nearly $4 million in prizes to be awarded at the SC2 championship this coming October in Los Angeles. Thanks to two years of competition, we have witnessed, for the first time, autonomous radios collectively sharing wireless spectrum to transmit far more data than would be possible by assigning exclusive frequencies to each radio.

Continue reading Four short links: 29 May 2019.

Categories: Technology

Four short links: 28 May 2019

O'Reilly Radar - Tue, 2019/05/28 - 03:50

Research Libraries, Disinformation Campaign, Unstructured Text Mining, and Building a PiDP-11

  1. The Books of College Libraries Are Turning Into Wallpaper (The Atlantic) -- University libraries across the country, and around the world, are seeing steady, and in many cases precipitous, declines in the use of the books on their shelves. [...] Statistics show that today’s undergraduates have read fewer books before they arrive on campus than in prior decades, and just placing students in an environment with more books is unlikely to turn that around. (The time to acquire the reading bug is much earlier than freshman year.) And while correlation does not equal causation, it is all too conspicuous that we reached Peak Book in universities just before the iPhone came out. Part of this story is undoubtedly about the proliferation of electronic devices that are consuming the attention once devoted to books. The interaction between the book format, information scarcity, and digitalization plays out in research libraries.
  2. Live Coverage of a Disinformation Operation Against the 2019 EU Parliamentary Elections (F-Secure) -- the visualization and research into botnet clusters is interesting.
  3. Knowledge Extraction from Unstructured Texts -- an interesting rundown of approaches and papers.
  4. PiDP-11 Retro Computer Build (YouTube) -- building and operating the PiDP-11.

Continue reading Four short links: 28 May 2019.

Categories: Technology

Four short links: 27 May 2019

O'Reilly Radar - Mon, 2019/05/27 - 01:00

Better Figures, Neal Stephenson, Reputation Inflation, Interactive Code

  1. Ten Simple Rules for Better Figures -- A more accurate definition for scientific visualization would be a graphical interface between people and data. In this short article, we do not pretend to explain everything about this interface. [...] Instead we aim to provide a basic set of rules to improve figure design and to explain some of the common pitfalls.
  2. Neal Stephenson Explains His Vision of the Digital Afterlife -- I saw someone recently describe social media in its current state as a doomsday machine, and I think that's not far off. We've turned over our perception of what's real to algorithmically driven systems that are designed not to have humans in the loop, because if humans are in the loop they're not scalable and if they're not scalable they can't make tons and tons of money. The result is the situation we see today where no one agrees on what factual reality is and everyone is driven in the direction of content that is "more engaging," which almost always means that it's more emotional, it's less factually based, it's less rational, and kind of destructive from a basic civics standpoint.
  3. Reputation Inflation -- A solution to marketplace information asymmetries is to have trading partners publicly rate each other post-transaction. Many have shown that these ratings are effective; we show that their effectiveness deteriorates over time. The problem is that ratings are prone to inflation, with raters feeling pressure to leave “above average” ratings, which in turn pushes the average higher. This pressure stems from raters’ desire to not harm the rated seller. As the potential to harm is what makes ratings effective, reputation systems, as currently designed, sow the seeds of their own irrelevance. AAAAAAAAA++ article, would read again.
  4. Dal Segno -- interactive code editor (the language is a bit like Scheme) that, when you change a function, rewinds until the last time that function was called. It's like magic.

Continue reading Four short links: 27 May 2019.

Categories: Technology

Four short links: 24 May 2019

O'Reilly Radar - Fri, 2019/05/24 - 05:10

Forms by Configuration, GitHub Sponsors, SpaceX's LEO Internet, and a Gallery of Programmer Interfaces

  1. ncform -- a very nice configuration-driven way to develop forms.
  2. GitHub Sponsors -- allowing donations.
  3. Starlink -- SpaceX is developing a low latency, broadband internet system to meet the needs of consumers across the globe. Enabled by a constellation of low Earth orbit satellites, Starlink will provide fast, reliable internet to populations with little or no connectivity, including those in rural communities and places where existing services are too expensive or unreliable.
  4. Gallery of Programmer Interfaces -- These images bear witness to the passionate work of so many people striving to improve programming. So often the cobbler's children are barefoot.

Continue reading Four short links: 24 May 2019.

Categories: Technology

Applications of data science and machine learning in financial services

O'Reilly Radar - Thu, 2019/05/23 - 04:25

The O’Reilly Data Show Podcast: Jike Chong on the many exciting opportunities for data professionals in the U.S. and China.

In this episode of the Data Show, I spoke with Jike Chong, chief data scientist at Acorns, a startup focused on building tools for micro-investing. Chong has extensive experience using analytics and machine learning in financial services, and he has experience building data science teams in the U.S. and in China.

We had a great conversation spanning many topics, including:

  • Potential applications of data science in financial services.

  • The current state of data science in financial services in both the U.S. and China.

  • His experience recruiting, training, and managing data science teams in both the U.S. and China.

Continue reading Applications of data science and machine learning in financial services.

Categories: Technology

Four short links: 23 May 2019

O'Reilly Radar - Thu, 2019/05/23 - 04:05

Deep Fakes, GPU-Friendly Codec, Retro OS, and Production Readiness

  1. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models -- astonishing work, where you can essentially do deep-fakes from one or two photos. See the YouTube clip for amazing footage of it learning from historical photos and even a painting. (via Dmitry Ulyanov)
  2. Basis Universal GPU Texture Codec -- open source codec for a super-compressed image file format that can be quickly transcoded to something ready for GPUs. See this Hacker News comment for a very readable explanation of why it's important for game developers.
  3. Serenity -- open source OS for x86 machines that looks like Unix with a Windows 98 UI.
  4. The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction -- We present a rubric as a set of 28 actionable tests, and offer a scoring system to measure how ready for production a given machine learning system is. With an implementation in Excel.

Continue reading Four short links: 23 May 2019.

Categories: Technology

Four short links: 22 May 2019

O'Reilly Radar - Wed, 2019/05/22 - 04:05

Software-Defined Memory, SQL Analyzer, Wolfram Engine, and Victims of Passion

  1. Software-Defined Memory in Warehouse-Scale Computers (ACM) -- when you're Google, you invent new types of memory. In this case, a cheaper, but slower, "far memory" that is slower than DRAM but faster than Flash. Of course you do!
  2. ZetaSQL -- Google's SQL parser and analyzer. Cf Apache Calcite. (via Hacker News)
  3. Wolfram Engine -- a locally downloadable Wolfram Engine to put computational intelligence into your applications. The Free Wolfram Engine for Developers is available for pre-production software development.
  4. Love Your Job? Someone May be Taking Advantage of You (Duke) -- people see it as more acceptable to make passionate employees do extra, unpaid, and more demeaning work than they did for employees without the same passion. Which goes some way to explaining why I've found passion to be strongly correlated with burnout.

Continue reading Four short links: 22 May 2019.

Categories: Technology

Four short links: 21 May 2019

O'Reilly Radar - Tue, 2019/05/21 - 04:00

Computational Socioeconomics, AI on Code, AMP, and Social Media's Effect on Adolescents

  1. Computational Socioeconomics -- In this review, we will make a brief manifesto about a new interdisciplinary research field named Computational Socioeconomics, followed by a detailed introduction about data resources, computational tools, data-driven methods, theoretical models, and novel applications at multiple resolutions—including the quantification of global economic inequality and complexity, the map of regional industrial structure and urban perception, the estimation of individual socioeconomic status and demographic, and the real-time monitoring of emergent events.
  2. Microsoft Applying AI to Entire Developer Lifecycle -- Microsoft looks at three different types of code when gathering data: source code—logic and markup (e.g., structure, logic, declarations, comments, variables), distinct learning from public, org, and personal repositories; metadata—interactions (e.g., pull requests, bugs/tickets, codeflow), telemetry (e.g., diagnostics for your app, profiling, etc.); and adjacent sources—documentation, tutorials, and samples; discussion forums (e.g., StackOverflow, Teams / Slack).
  3. Report from the AMP Advisory Committee Meeting -- We heard, several times, that publishers don't like AMP. They feel forced to use it because otherwise they don't get into Google's news carousel—right at the top of the search results.
  4. Social Media’s Enduring Effect on Adolescent Life Satisfaction (PNAS) -- We found that social media use is not, in and of itself, a strong predictor of life satisfaction across the adolescent population. Instead, social media effects are nuanced, small at best, reciprocal over time, gender specific, and contingent on analytic methods.

Continue reading Four short links: 21 May 2019.

Categories: Technology

Becoming a machine learning company means investing in foundational technologies

O'Reilly Radar - Tue, 2019/05/21 - 04:00

Companies successfully adopt machine learning either by building on existing data products and services, or by modernizing existing models and algorithms.

In this post, I share slides and notes from a keynote I gave at the Strata Data Conference in London earlier this year. I will highlight the results of a recent survey on machine learning adoption, and along the way describe recent trends in data and machine learning (ML) within companies. This is a good time to assess enterprise activities, as there are many indications that companies are already beginning to use machine learning. For example, in a July 2018 survey that drew more than 11,000 respondents, we found strong engagement among companies: 51% stated they already had machine learning models in production.

With all the hype around AI, it can be tempting to jump into use cases involving data types with which you aren’t familiar. We found that companies that have successfully adopted machine learning do so either by building on existing data products and services, or by modernizing existing models and algorithms. Here are some typical ways organizations begin using machine learning:

  • Build upon existing analytics use cases: e.g., one can use existing data sources for business intelligence and analytics, and use them in an ML application.
  • Modernize existing applications such as recommenders, search ranking, time series forecasting, etc.
  • Use ML to unlock new data types—e.g., images, audio, video.
  • Tackle completely new use cases and applications.

Consider deep learning, a specific form of machine learning that resurfaced in 2011/2012 due to record-setting models in speech and computer vision. While we continue to read about impressive breakthroughs in speech and computer vision, companies are beginning to use deep learning to augment or replace existing models and algorithms. A famous example is Google’s machine translation system, which shifted from “stats focused” approaches to TensorFlow. In our own conferences, we see strong interest in training sessions and tutorials on deep learning for time series and natural language processing—two areas where organizations likely already have existing solutions, and for which deep learning is beginning to show some promise.

Machine learning is not only appearing in more products and systems, but as we noted in a previous post, ML will also change how applications themselves get built in the future. Developers will find themselves increasingly building software that has ML elements. Thus, many developers will need to curate data, train models, and analyze the results of models. With that said, we are still in a highly empirical era for ML: we need big data, big models, and big compute.

Figure 1. A typical data pipeline for machine learning. Source: O'Reilly.

If anything, deep learning models are even more data hungry than previous algorithms favored by data scientists. Data is key to machine learning applications, and getting data flowing, cleaned, and in usable form is going to be key to sustaining a machine learning practice.
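Getting data "cleaned and in usable form" often starts with nothing fancier than normalizing records as they flow in. A minimal, stdlib-only sketch (the field names and the idea of a single `clean_row` step are illustrative, not any particular pipeline's API):

```python
import csv
import io

def clean_row(row):
    """Normalize one raw record: trim whitespace, lowercase keys,
    and coerce the (hypothetical) amount field to a float."""
    cleaned = {k.strip().lower(): v.strip() for k, v in row.items()}
    cleaned["amount"] = float(cleaned.get("amount") or 0.0)
    return cleaned

# Messy input: inconsistent spacing, mixed-case headers, a missing value.
raw = "Name , AMOUNT\n Alice , 3.5\n Bob ,\n"
rows = [clean_row(r) for r in csv.DictReader(io.StringIO(raw))]
print(rows)  # [{'name': 'Alice', 'amount': 3.5}, {'name': 'Bob', 'amount': 0.0}]
```

Real pipelines add schema validation, deduplication, and joins against other sources, but the shape is the same: a pure function applied uniformly to every record, which is what makes the step testable and repeatable.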

With an eye toward the growing importance of machine learning, we recently completed a data infrastructure survey that drew more than 3,200 respondents. Our goal was twofold: (1) find out what tools and platforms people are using, and (2) determine whether or not companies are building the foundational tools needed to sustain their ML initiatives. Many respondents signaled that they were using open source tools (Apache Spark, Kafka, TensorFlow, PyTorch, etc.) and managed services in the cloud.

One of the main questions we asked was: what are you currently building or evaluating?

  • Not surprisingly, data integration and ETL were among the top responses, with 60% currently building or evaluating solutions in this area. In an age of data-hungry algorithms, everything really begins with collecting and aggregating data.
  • An important part of getting your data ready for machine learning is to normalize, standardize, and augment it with other data sources. 52% of survey respondents indicated they were building or evaluating solutions for data preparation and cleaning, including human-in-the-loop systems: tools that allow domain experts to train automated systems to do data preparation and cleaning at scale. In fact, there is an exciting new research area called data programming, which unifies techniques for the programmatic creation of training sets.
  • You also need solutions that let you understand what data you have and who can access it. About a third of the respondents in the survey indicated they are interested in data governance systems and data catalogs. Some companies are beginning to build their own solutions, and several will be presenting them at Strata Data in NYC this coming Fall—e.g., Marquez (WeWork) and Databook (Uber). But this is also an area where startups—Alation, Immuta, Okera, and others—are beginning to develop interesting offerings.
  • 21% of survey respondents said they are building or evaluating data lineage solutions. In the past, we got by with a casual attitude toward data sources. Discussions of data ethics, privacy, and security have made data scientists aware of the importance of data lineage and provenance. Specifically, companies will need to know where the data comes from, how it was gathered, and how it was modified along the way. The need to audit or reproduce ML pipelines is increasingly a legal and security issue. Fortunately, we are beginning to see open source projects (including DVC, Pachyderm, Delta Lake, DOLT) that address the need for data lineage and provenance. At recent conferences, we’ve also had talks from companies that have built data lineage systems—Intuit, Lyft, Accenture, and Netflix, among others—and there will be more presentations on data lineage solutions at Strata Data in New York City this coming fall.
  • As the number of data scientists and machine learning engineers grow within an organization, tools have to be standardized, models and features need to be shared, and automation starts getting introduced. 58% of survey respondents indicated they are building or evaluating data science platforms. Our Strata Data conference consistently features several sessions on how companies built their internal data science platforms, specifically in regard to what tradeoffs and design choices they made, and what lessons they’ve learned along the way.
Figure 2. Key features of many data science platforms. Source: O'Reilly.
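The core idea behind data programming, mentioned above, fits in a few lines: domain experts write several noisy "labeling functions," and their votes are combined into a programmatic training label. A toy sketch with invented keywords and a simple majority vote (real systems learn to weight the functions rather than counting votes):

```python
# Toy data-programming sketch: noisy labeling functions vote on each
# example; a majority vote produces the programmatic training label.
# The keywords and label scheme here are invented for illustration.

SPAM, HAM, ABSTAIN = 1, 0, -1

def lf_has_link(text):
    return SPAM if "http://" in text else ABSTAIN

def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_greeting(text):
    return HAM if text.lower().startswith(("hi", "hello")) else ABSTAIN

LABELING_FUNCTIONS = [lf_has_link, lf_mentions_prize, lf_short_greeting]

def label(text):
    votes = [lf(text) for lf in LABELING_FUNCTIONS if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)  # simple majority vote

print(label("Claim your PRIZE at http://example.com"))  # 1 (spam)
print(label("Hi, lunch tomorrow?"))                     # 0 (ham)
```

The payoff is scale: once the labeling functions exist, they can label millions of examples, and disagreements between them become a signal for where the experts' rules need refinement.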

What about the cloud? In our recent survey, we found a majority are already using a public cloud for portions of their data infrastructure, and more than a third have been using serverless. We have had many training sessions, tutorials, and talks on serverless at recent conferences, including a talk by Eric Jonas on a recent paper laying out the UC Berkeley view on serverless, followed by a talk by Avner Braverman on the role of serverless in AI and data applications.

Companies are just getting started building machine learning applications, and I believe the use of machine learning will continue to grow over the next few years for a couple of reasons:

  • 5G is beginning to be rolled out, and 5G will lead to the development of machine-to-machine applications, many of which will incorporate ML.
  • Specialized hardware for machine learning (specifically, deep learning) will come online: we are already seeing new hardware for model inference for edge devices and servers. Sometime in Q3/Q4 of 2019, specialized hardware for training deep learning models will become available. Imagine systems that will let data scientists and machine learning experts run experiments at a fraction of the cost and a fraction of the time. This new generation of specialized hardware for machine learning training and inference will allow data scientists to explore and deploy many new types of models.

There are a couple of early indicators that ML will continue to grow within companies, both of which point to the growing number of companies interested in productionizing machine learning. First, while we read a lot of press coverage about data scientists, a new role dedicated to productionizing ML began to emerge a few years ago.

Figure 3. Data results from a Twitter poll. Source: O'Reilly.

Machine learning engineers sit between data science and engineering/ops; they tend to be higher paid than data scientists, and they generally have stronger technical and programming skills. As my Twitter poll above suggests, there seem to be early indications that data scientists are “rebranding” themselves under this new job title.

Figure 4. Model development tools like MLflow are catching on. Source: O'Reilly.

Another signal that interest in ML is increasing emerges when you look at the traction of new projects like MLflow: in the roughly 10 months since it launched, we already see strong interest from many companies. As we noted in a previous post, a common use case for MLflow is experiment tracking and management—before MLflow, there weren’t good open source tools for this. Projects like MLflow and Kubeflow (as well as products from companies like Verta.AI) make ML development easier for companies to manage.

MLflow is an interesting new tool, but it is focused on model development. As your machine learning practice expands to many parts of your organization, it becomes clear that you’ll need other specialized tools. In speaking with many companies that have built data platforms and infrastructure for machine learning, a few important factors arise that have to be taken into account as you design your toolchain:

  • Support for different modeling approaches and tools: while deep learning has become more important, the reality is that even the leading technology companies use a variety of modeling approaches, including SVM, XGBoost, and statistical learning methods.
  • Duration and frequency of model training will vary, depending on the use case, the amount of data, and the specific type of algorithms used.
  • How much model inference is involved in specific applications?
Figure 5. Important considerations when designing your ML platform. Source: O'Reilly.

Just like data are assets that require specialized tools (including data governance solutions and data catalogs), models are also valuable assets that will need to be managed and protected. As we noted in a previous post, tools for model governance and model operations will also be increasingly critical: the next big step in the democratization of machine learning is making it more manageable. Model governance and model ops will require solutions that contain items like:

  • A database for authorization and security: who has read/write access to certain models
  • A catalog or a database that lists models, including when they were tested, trained, and deployed
  • Metadata and artifacts needed for audits
  • Systems for deployment, monitoring, and alerting: who approved and pushed the model out to production, who is able to monitor its performance and receive alerts, and who is responsible for it
  • A dashboard that provides custom views for all principals (operations, ML engineers, data scientists, business owners)
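A minimal catalog entry covering several of the items above can be as simple as a record type. This is a sketch under stated assumptions: the field names and the `ModelRecord` type are invented for illustration, not any product's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ModelRecord:
    """One entry in a hypothetical model catalog, capturing the
    audit trail the checklist above calls for."""
    name: str
    version: str
    trained_at: datetime
    approved_by: str                               # who pushed it to production
    readers: list = field(default_factory=list)    # authorization: read access
    writers: list = field(default_factory=list)    # authorization: write access
    artifacts: dict = field(default_factory=dict)  # metadata needed for audits

record = ModelRecord(
    name="churn-classifier",
    version="2019.05.1",
    trained_at=datetime(2019, 5, 20, tzinfo=timezone.utc),
    approved_by="ml-eng-team",
    readers=["ops", "data-science"],
    writers=["ml-eng-team"],
    artifacts={"training_data": "s3://bucket/churn/2019-05-20"},
)
print(record.name, record.version)  # churn-classifier 2019.05.1
```

In practice this record lives in a database behind the catalog, and the deployment, monitoring, and dashboard systems all key off the same entry, which is what gives every principal a consistent view of the model.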

Companies are learning that there are many important considerations that arise with the use of ML. Thankfully, the research community has begun rolling out techniques and tools to address some of the important challenges ML presents, including fairness, explainability, safety and reliability, and especially security and privacy. Machine learning often interacts with and impacts users, so companies not only need to put in place processes that will let them deploy ML responsibly, they need to build foundational technologies that will allow them to retain oversight, particularly when things go wrong. The technologies I’ve alluded to above—data governance, data lineage, model governance—are all going to be useful for helping manage these risks. In particular, auditing and testing machine learning systems will rely on many of the tools I’ve described above.

There are real, not just theoretical, risks and considerations. These foundational tools will increasingly be essential and no longer optional. For example, a recent DLA Piper survey provides an estimate of GDPR breaches that have been reported to regulators: more than 59,000 personal data breaches as of February, 2019.

Figure 6. Machine learning involves a series of interrelated algorithms. Source: O'Reilly.

While we tend to think of ML as producing a “model” or “algorithm” that we deploy, auditing ML systems can be challenging, as there are actually two algorithms to keep track of:

  • The actual model that one deploys and uses in an application or product
  • Another algorithm (the “trainer” and “pipeline”) that uses data to produce the model that optimizes some objective function
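The distinction is easy to make concrete: the trainer is a function from data to a model, and the model is a function from inputs to predictions, so an audit has to version both. A toy sketch with a one-feature threshold classifier (nothing here reflects a real pipeline; it only shows the two artifacts):

```python
# Two artifacts to track: the trainer (data -> model) and the model
# it produces (input -> prediction). Toy threshold classifier for
# illustration only.

def train(examples):
    """Trainer: place the decision threshold midway between the
    mean of the positive and negative examples."""
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

    def model(x):  # the deployed artifact
        return 1 if x >= threshold else 0

    return model

# Reproducing the model requires the trainer's code AND the training data;
# auditing the deployed model alone tells you neither.
model = train([(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)])
print(model(7.0), model(2.5))  # 1 0
```

Change the training data or the trainer's logic and you get a different deployed model, which is why lineage for the data, versioning for the trainer, and a catalog entry for the resulting model all have to travel together.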

So, managing ML really means building a set of tools that can manage a series of interrelated algorithms. Based on the survey results I’ve described above, companies are beginning to build the important foundational technologies—data integration and ETL, data governance and data catalogs, data lineage, model development and model governance—that are important to sustaining a responsible machine learning practice.

But challenges remain, particularly as the use of ML grows within companies that are already having to grapple with many IT, software, and cloud solutions (besides having to manage the essential task of “keeping the lights on”). The good news is that there are early indicators that companies are beginning to acknowledge the need to build or acquire the requisite foundational technologies.

Continue reading Becoming a machine learning company means investing in foundational technologies.

Categories: Technology

Four short links: 20 May 2019

O'Reilly Radar - Mon, 2019/05/20 - 03:55

Account Hygiene, Conversational AI Playbook, Unix Time Falsehoods, and Testing/Debugging Machine Learning

  1. Basic Account Hygiene to Prevent Hijacking (Google) -- SMS 2FA blocked 100% of automated bots, 96% of bulk phishing attacks, and 76% of targeted attacks. On-device prompts, a more secure replacement for SMS, helped prevent 100% of automated bots, 99% of bulk phishing attacks and 90% of targeted attacks.
  2. Conversational AI Playbook -- The detailed instructions, practical advice, and real-world examples provided here should empower developers to improve the quality and variety of conversational experiences of the coming months and years.
  3. Falsehoods Programmers Believe about Unix Time -- These three facts all seem eminently sensible and reasonable, right? Unix time is the number of seconds since 1 January 1970 00:00:00 UTC. If I wait exactly one second, Unix time advances by exactly one second. Unix time can never go backward. False, false, false.
  4. Testing and Debugging in Machine Learning (Google) -- Testing and debugging machine learning systems differs significantly from testing and debugging traditional software. This course describes how, starting from debugging your model all the way to monitoring your pipeline in production.

Continue reading Four short links: 20 May 2019.

Categories: Technology

Four short links: 17 May 2019

O'Reilly Radar - Fri, 2019/05/17 - 04:10

Productsec, Supply Chain Attack, Sparse Neural Networks, and the Christchurch Call

  1. Six Buckets of Productsec -- There are six buckets a security bug can fall into on its journey through life: Prevented—best outcome, never turned into code. Found automatically—found via static analysis or other tools, “cheap” time cost. Found manually—good even if it took more time; a large set of bugs can only be found this way. Found externally—usually via bug bounty, put users at real risk, expensive time cost but 100x better than other outcomes. Never found—most bugs probably end up here. Exploited—the worst.
  2. ShadowHammer (Bruce Schneier) -- The common thread through all of the above-mentioned cases is that attackers got valid certificates and compromised their victims’ development environments.
  3. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks -- dense, randomly initialized, feed-forward networks contain subnetworks ("winning tickets") that—when trained in isolation—reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective.
  4. Christchurch Call -- first time governments and companies have, en masse, sat at a table to figure out how to curb violent extremist content on the platforms.

Continue reading Four short links: 17 May 2019.

Categories: Technology

Four short links: 16 May 2019

O'Reilly Radar - Thu, 2019/05/16 - 04:05

Regulating Platforms, Amazon Development, Still Love Tech, and ML Cheatsheets

  1. The Platform Challenge (Alex Stamos) -- absolute cracker of a talk about regulating the social media platforms. Must watch.
  2. Amazon's Away Teams -- Capturing the way things are at an organization as large as Amazon is always a challenge. The company has never publicly codified its management system as it has done for its leadership principles. But this picture might offer new ideas for people seeking to coordinate technology development at scale. (via Simon Willison)
  3. Why I Still Love Tech (Wired) -- I love the whole made world. But I can’t deny that the miracle is over, and that there is an unbelievable amount of work left for us to do.
  4. Illustrated Machine Learning Cheatsheets -- what it says on the box.

Continue reading Four short links: 16 May 2019.

Categories: Technology

The topics to watch in software architecture

O'Reilly Radar - Thu, 2019/05/16 - 03:00

Microservices, serverless, AI, ML, and Kubernetes are among the most notable topics in our analysis of proposals from the O’Reilly Software Architecture Conference.

The speaker proposals we receive for the O’Reilly Software Architecture Conference are valuable because they come from speakers who are often the leading names in their fields. These go-to experts and practitioners operate on the front lines of technology. They also understand that business and architecture can no longer be compartmentalized, and that revenue is at stake.

Our recent analysis[1] of speaker proposals from the O’Reilly Software Architecture Conference turned up a number of interesting findings:

Continue reading The topics to watch in software architecture.

Categories: Technology

Four short links: 15 May 2019

O'Reilly Radar - Wed, 2019/05/15 - 03:55

Privacy, Decision Trees, Other People's Problems, and Programming Tools

  1. Senate Testimony (Maciej Ceglowski) -- This is an HTMLized version of written testimony I provided on May 7, 2019, to the Senate Committee on Banking, Housing, and Urban Affairs for their hearing on Privacy Rights and Data Collection in a Digital Economy. [...] The leading ad networks in the European Union have chosen to respond to the GDPR by stitching together a sort of Frankenstein’s monster of consent, a mechanism whereby a user wishing to visit, say, a weather forecast page is first prompted to agree to share data with a consortium of 119 entities, including the aptly named “A Million Ads” network. The user can scroll through this list of intermediaries one by one, or give or withhold consent en bloc, but either way she must wait a further two minutes for the consent collection process to terminate before she is allowed to find out whether or not it is going to rain.
  2. Awesome Decision Tree Papers -- A collection of research papers on decision, classification, and regression trees with implementations.
  3. Other People's Problems (Camille Fournier) -- There’s always going to be something you can’t fix. So how do you decide where to exert your energy? Step one: figure out who owns this problem.
  4. Toward the Next Generation of Programming Tools (Mike Loukides) -- one of the most interesting research areas in artificial intelligence is the ability to generate code.
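
For readers who want a refresher before diving into the decision-tree paper collection above: the core operation every such tree repeats is choosing a split that minimizes impurity. A minimal Gini-based split on a single numeric feature might look like this (the churn data is invented for illustration):

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Choose the threshold on one numeric feature that minimizes the
    weighted Gini impurity of the two resulting partitions."""
    best_t, best_score = None, float("inf")
    n = len(ys)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy data: customers churn (label 1) once monthly visits drop to 2 or fewer.
visits = [1, 2, 2, 3, 5, 8]
churned = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_split(visits, churned)
```

The papers in the collection refine exactly this step: better split criteria, pruning, and ensembling.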

Continue reading Four short links: 15 May 2019.

Categories: Technology

How AI and machine learning are improving customer experience

O'Reilly Radar - Tue, 2019/05/14 - 04:05

From data quality to personalization, to customer acquisition and retention, and beyond, AI and ML will shape the customer experience of the future.

What can artificial intelligence (AI) and machine learning (ML) do to improve customer experience? AI and ML already have been intimately involved in online shopping since, well, the beginning of online shopping. You can’t use Amazon or any other shopping service without getting recommendations, which are often personalized based on the vendor’s understanding of your traits: your purchase history, your browsing history, and possibly much more. Amazon and other online businesses would love to invent a digital version of the (possibly mythical) sales person who knows you and your tastes, and can unerringly guide you to products you will enjoy.

Everything begins with better data

To make that vision a reality, we need to start with some heavy lifting on the back end. Who are your customers? Do you really know who they are? All customers leave behind a data trail, but that data trail is a series of fragments, and it’s hard to relate those fragments to each other. If one customer has multiple accounts, can you tell? If a customer has separate accounts for business and personal use, can you link them? And if an organization uses many different names (we remember a presentation in which someone talked of the hundreds of names—literally—that resolved to IBM), can you discover the single organization responsible for them? Customer experience starts with knowing exactly who your customers are and how they’re related. Scrubbing your customer lists to eliminate duplicates is called entity resolution; it used to be the domain of large companies that could afford substantial data teams. Entity resolution is now being democratized: startups offer entity resolution software and services appropriate for small to mid-sized organizations.
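
At its simplest, entity resolution is clustering of near-duplicate records. The sketch below (illustrative only, not any vendor's approach; the customer names are invented) does a greedy string-similarity pass using only the standard library:

```python
import difflib

def normalize(record):
    """Lowercase and drop punctuation so trivial formatting differences vanish."""
    return "".join(ch for ch in record.lower() if ch.isalnum() or ch.isspace()).strip()

def resolve_entities(records, threshold=0.85):
    """Greedily cluster records whose normalized names are near-duplicates.
    Returns a list of clusters (lists of the original records)."""
    clusters = []
    for rec in records:
        key = normalize(rec)
        for cluster in clusters:
            rep = normalize(cluster[0])  # compare against the cluster's first member
            if difflib.SequenceMatcher(None, key, rep).ratio() >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])  # no match: this record starts a new entity
    return clusters

customers = [
    "International Business Machines",
    "Internat'l Business Machines",
    "Acme Corp.",
    "ACME Corp",
]
clusters = resolve_entities(customers)
```

Production systems replace the similarity function with learned models and use blocking strategies to avoid comparing every pair, but the clustering skeleton is the same.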

Once you’ve found out who your customers are, you have to ask how well you know them. Getting a holistic view of a customer’s activities is central to understanding their needs. What data do you have about them, and how do you use it? ML and AI are now being used as tools in data gathering: in processing the data streams that come from sensors, apps, and other sources. Gathering customer data can be intrusive and ethically questionable; as you build your understanding of your customers, make sure you have their consent and that you aren’t compromising their privacy.

ML isn’t fundamentally different from any other kind of computing: the rule “garbage in, garbage out” still applies. If your training data is low quality, your results will be poor. As the number of data sources grows, the number of potential data fields and variables increases, along with the potential for error: transcription errors, typographic errors, and so on. Correcting and repairing data by hand is an error-prone, tedious task that still occupies much of most data scientists’ time. As with entity resolution, data quality and data repair have been the subject of recent research, and a new set of machine learning tools for automating data cleaning is beginning to appear.
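
One simple form of automated cleaning is snapping mistyped field values back to a canonical vocabulary. This toy sketch (the vocabulary and dirty values are invented) uses fuzzy matching from the standard library; real repair tools learn the vocabulary and the error model from the data itself:

```python
import difflib

CANONICAL = ["california", "new york", "texas", "florida"]

def repair(value, vocabulary=CANONICAL, cutoff=0.7):
    """Map a possibly mistyped field to its closest canonical value,
    or leave it untouched when nothing matches closely enough."""
    candidates = difflib.get_close_matches(
        value.lower().strip(), vocabulary, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else value

dirty = ["Calfornia", " new york ", "texsa", "oregon"]
clean = [repair(v) for v in dirty]
```

Note that "oregon" survives unchanged: a sane cleaner only repairs values it can confidently map, rather than forcing everything into the vocabulary.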


One common application of machine learning and AI to customer experience is in personalization and recommendation systems. In recent years, hybrid recommender systems—applications that combine multiple recommender strategies—have become much more common. Many hybrid recommenders rely on many different sources and large amounts of data, and deep learning models are frequently part of such systems. While it’s common for recommendations to be based on models that are only retrained periodically, advanced recommendation and personalization systems will need to be real time. Using reinforcement learning, online learning, and bandit algorithms, companies are beginning to build recommendation systems that constantly train models against live data.
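
A minimal version of the bandit approach mentioned above is epsilon-greedy: mostly serve the item with the best observed reward, but keep exploring. The item names and click-through rates below are made up for the simulation:

```python
import random

class EpsilonGreedyRecommender:
    """Online recommender: explore with probability epsilon, otherwise
    exploit the item with the best observed reward estimate."""

    def __init__(self, items, epsilon=0.1):
        self.items = list(items)
        self.epsilon = epsilon
        self.counts = {item: 0 for item in self.items}
        self.values = {item: 0.0 for item in self.items}

    def recommend(self):
        if random.random() < self.epsilon:
            return random.choice(self.items)  # explore
        return max(self.items, key=lambda i: self.values[i])  # exploit

    def update(self, item, reward):
        """Incrementally fold one observed reward (e.g., click = 1.0) into the estimate."""
        self.counts[item] += 1
        self.values[item] += (reward - self.values[item]) / self.counts[item]

random.seed(0)
true_ctr = {"article": 0.05, "video": 0.20, "podcast": 0.10}  # hidden ground truth
rec = EpsilonGreedyRecommender(true_ctr)
for _ in range(5000):
    item = rec.recommend()
    rec.update(item, 1.0 if random.random() < true_ctr[item] else 0.0)
best = max(rec.values, key=rec.values.get)
```

Because the model updates on every interaction, there's no retraining lag: the recommender converges on "video" from live feedback alone, which is exactly the property that makes bandits attractive for real-time personalization.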

Machine learning and AI are automating many different enterprise tasks and workflows, including customer interactions. We’ve all experienced chatbots that automate various aspects of customer service. So far, chatbots have been more annoying than helpful, though well-designed, simple “frequently asked question” bots can lead to good customer acquisition rates. But we’re still in the early stages of natural language processing and understanding, and the past year has brought many breakthroughs. As our ability to build sophisticated language models improves, we will see chatbots progress through a number of stages: from providing notifications, to managing simple question and answer scenarios, to understanding context and participating in simple dialogs, and finally to personal assistants that are “aware” of their users’ needs. As chatbots improve, we expect them to become an integral part of customer service, not merely an annoyance you have to work through to get to a human. And for chatbots to reach this level of performance, they will need to incorporate real-time recommendation and personalization. They will need to understand customers as well as a human does.
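
The "frequently asked question" stage of that progression can be surprisingly simple. As an illustrative sketch (the FAQ entries are invented), this bot matches a user's question to canned answers by word overlap and hands off to a human when nothing matches well:

```python
def tokenize(text):
    """Split into lowercase words, stripping surrounding punctuation."""
    return {w.strip("?.,!").lower() for w in text.split()}

FAQ = {
    "How do I reset my password?": "Visit the account page and choose 'Forgot password'.",
    "What are your support hours?": "Support is available 9am-5pm, Monday to Friday.",
    "How can I cancel my subscription?": "Open Billing and click 'Cancel subscription'.",
}

def answer(question, faq=FAQ, min_overlap=0.3):
    """Return the canned answer whose question best overlaps the user's,
    or a fallback that hands off to a human."""
    query = tokenize(question)
    best_q, best_score = None, 0.0
    for known_q in faq:
        known = tokenize(known_q)
        score = len(query & known) / max(len(query | known), 1)  # Jaccard similarity
        if score > best_score:
            best_q, best_score = known_q, score
    if best_q is None or best_score < min_overlap:
        return "Let me connect you with a human agent."
    return faq[best_q]
```

The later stages of the progression (context, dialog, awareness of user needs) replace the word-overlap score with learned language models, but the escalate-when-unsure fallback remains good practice at every stage.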

Fraud detection is another area that machine learning is transforming. It is a constant battle between the good guys and the criminals, and the stakes are constantly increasing. Fraud artists are inventing more sophisticated techniques for online crime. Fraud is no longer person-to-person: it is automated, as in a bot that buys up all the tickets to events so they can be resold by scalpers. As we’ve seen in many recent elections, it is easy for criminals to penetrate social media by creating a bot that floods conversations with automated responses. It is much harder to discover those bots and block them in real time. That’s only possible with machine learning, and even then, it’s a difficult problem that’s only partially solved. But solving it is a critical part of rebuilding an online world in which people feel safe and respected.
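
Catching a ticket-buying bot often starts with nothing fancier than rate statistics: an account whose request volume sits many standard deviations above the population's deserves a closer look. A toy sketch with invented traffic numbers:

```python
from statistics import mean, pstdev

def flag_bots(request_counts, z_threshold=3.0):
    """Flag accounts whose request volume is an extreme outlier.
    request_counts maps account_id -> requests per minute."""
    counts = list(request_counts.values())
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return set()  # everyone behaves identically: nothing to flag
    return {acct for acct, c in request_counts.items() if (c - mu) / sigma > z_threshold}

traffic = {f"user{i}": 10 + (i % 5) for i in range(50)}  # ordinary humans: 10-14 req/min
traffic["scalper-bot"] = 500                             # one automated ticket buyer
suspects = flag_bots(traffic)
```

Real fraud systems layer learned behavioral models on top of simple statistics like this, precisely because sophisticated bots learn to stay under any fixed threshold.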

Advances in speech technologies and emotion detection will reduce friction in automated customer interactions even further. Multi-modal models that combine different kinds of inputs (audio, text, vision) will make it easier to respond to customers appropriately; customers might be able to show you what they want or send a live video of a problem they’re facing. While interactions between humans and robots frequently place users in the creepy “uncanny valley,” it’s a safe bet that customers of the future will be more comfortable with robots than we are now.

But if we’re going to get customers through to the other side of the uncanny valley, we also have to respect what they value. AI and ML applications that affect customers will have to respect privacy; they will have to be secure; and they will have to be fair and unbiased. None of these challenges are simple, but technology won’t improve customer experience if customers end up feeling abused. The result may be more efficient, but that’s a bad tradeoff.

What will machine learning and artificial intelligence do for customer experience? They have already done much. But there is much more that they can do, and that they have to do, to build the frictionless customer experience of the future.

Related Content:

Continue reading How AI and machine learning are improving customer experience.

Categories: Technology

Four short links: 14 May 2019

O'Reilly Radar - Tue, 2019/05/14 - 04:00

Designing for AI, Opinions, Data Moats, and Trans-inclusive Design

  1. People + AI Guidebook (Google) -- Designing human-centered AI products.
  2. Strong Opinions Loosely Held Might Be The Worst Idea in Tech -- What really happens? The loudest, most bombastic engineer states their case with certainty, and that shuts down discussion. Other people either assume the loudmouth knows best, or don’t want to stick out their neck and risk criticism and shame. This is especially true if the loudmouth is senior, or there is any other power differential.
  3. The Empty Promise of Data Moats (A16Z) -- Most data network effects are really scale effects.
  4. Trans-inclusive Design -- this is GOLD and should be required reading for every software designer and developer.

Continue reading Four short links: 14 May 2019.

Categories: Technology

Making Facebook a scapegoat is a mistake

O'Reilly Radar - Tue, 2019/05/14 - 03:00

Breaking up Facebook won't solve the disinformation or privacy problems. It might well make it harder for Facebook to work on those problems.

Chris Hughes, a co-founder of Facebook, recently wrote an opinion piece for the New York Times presenting an argument for breaking up Facebook. I was trying to stay out of this discussion, and Nick Clegg's response pretty much summed up my opinions. But I got one too many emails from friends that simply assumed I'd be in agreement with Chris, especially given my critique of blitzscaling and Silicon Valley's obsession with "world domination."

If Facebook should be broken up, it should be on grounds of anticompetitive behavior, and Chris barely even touches on that topic. Dragging in the history of all Facebook's various corporate missteps for which a breakup would provide no remedy just muddies the water and lets everyone think that punishment has been done. It's actually a way of NOT solving the actual problems. Breaking up Facebook won't solve the disinformation problem. It won't solve the privacy problem. It might well make it harder for Facebook to work on those problems.

In addition, if harm to our society is sufficient reason for a breakup, there are a lot of companies ahead of Facebook in the queue. Big banks. The pharma companies that brought us the opioid crisis. The energy cartel that's still fighting against responding to climate change 70 years after the warning bells were sounded.

Even if you restrict yourself to surveillance capitalism, picking out one high-profile malefactor, issuing draconian punishment, and then going back to business as usual everywhere else is a lose-lose, not a win. Facebook is not at the root of this problem, nor even the worst offender. To solve this problem, we need to look not only beyond Facebook, but also beyond the entire tech industry. Banks and telcos are often worse privacy offenders than Google or Facebook. And if you restrict yourself to harm from disinformation, polarization, and radicalization, Twitter and YouTube are just as culpable and, out of the spotlight, appear to be doing less than Facebook to police their problems. Reddit is far worse, and traditional media outlets, especially the Fox News empire, almost as bad. (See Yochai Benkler's excellent book Network Propaganda.)

Yes, Facebook has scale, but even there, I thought Chris' arguments were weak. In particular, I found the New York Times infographic to be quite disingenuous:

Figure 1. Screenshot of the infographic on Chris Hughes' opinion piece in the New York Times.

What's wrong with this infographic? It lists all of Facebook's properties, including those that are primarily messaging, while limiting the competition purely to social media, and failing to include other messaging products. Take out WhatsApp and Messenger, and Facebook suddenly doesn't look so big. Or add in other messaging products from companies like Apple, Google, Microsoft, and plain old SMS, and again, suddenly Facebook doesn't look so dominant. Add in Google's other properties besides YouTube, especially those like search and the new newsfeed on Android phones that influence what people see and understand about the world. Restrict it to the individual national markets where the competition actually happens, and it would look different again. This isn't analysis. This is polemics.

I'm not saying there aren't grounds for investigation of anticompetitive behavior in the tech industry, or by Facebook specifically, but if someone's going to talk breakup, I'd love to see reporting and opinion pieces that make a true antitrust case, if there is one.

If the world really wants to tackle the problems that networked media creates or amplifies, making Facebook a scapegoat and letting everyone else off the hook is a massive mistake.

Continue reading Making Facebook a scapegoat is a mistake.

Categories: Technology

Four short links: 13 May 2019

O'Reilly Radar - Mon, 2019/05/13 - 04:00

Git-rebase, Swift on the Web, Deepfake Dalí, and ML Style Guide

  1. Git-rebase in Depth -- These tools can be a little bit intimidating to the novice or even intermediate git user, but this guide will help to demystify the powerful git-rebase.
  2. SwiftWasm -- Run Swift in browsers. SwiftWasm compiles your Swift code to WebAssembly.
  3. Deepfake Salvador Dalí Takes Selfies with Museum Visitors (Verge) -- Using archival footage from interviews, GS&P pulled over 6,000 frames and used 1,000 hours of machine learning to train the AI algorithm on Dalí’s face. His facial expressions were then imposed over an actor with Dalí’s body proportions, and quotes from his interviews and letters were synced with a voice actor who could mimic his unique accent, a mix of French, Spanish, and English. The selfie is magic, though. Prize to whoever thought that up!
  4. Rules of ML (Google) -- a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming. If you have taken a class in machine learning, or built or worked on a machine­-learned model, then you have the necessary background to read this document.

Continue reading Four short links: 13 May 2019.

Categories: Technology

