O'Reilly Radar

All of our Ideas and Learning material from all of our topics.
Updated: 1 week 6 days ago

Four short links: 13 May 2019

Mon, 2019/05/13 - 04:00

Git-rebase, Swift on the Web, Deepfake Dalí, and ML Style Guide

  1. Git-rebase in Depth -- These tools can be a little bit intimidating to the novice or even intermediate git user, but this guide will help to demystify the powerful git-rebase.
  2. SwiftWasm -- Run Swift in browsers. SwiftWasm compiles your Swift code to WebAssembly.
  3. Deepfake Salvador Dalí Takes Selfies with Museum Visitors (Verge) -- Using archival footage from interviews, GS&P pulled over 6,000 frames and used 1,000 hours of machine learning to train the AI algorithm on Dalí’s face. His facial expressions were then imposed over an actor with Dalí’s body proportions, and quotes from his interviews and letters were synced with a voice actor who could mimic his unique accent, a mix of French, Spanish, and English. The selfie is magic, though. Prize to whoever thought that up!
  4. Rules of ML (Google) -- a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming. If you have taken a class in machine learning, or built or worked on a machine-learned model, then you have the necessary background to read this document.

Continue reading Four short links: 13 May 2019.

Categories: Technology

Toward the next generation of programming tools

Mon, 2019/05/13 - 03:00

Programmers have built great tools for others. It’s time they built some for themselves.

In a Quora post, Alan Kay lamented the state of tooling for programmers. Every other engineering discipline has built modern computational tools: for computer aided design, simulation and testing, and for manufacturing. But programming hasn’t progressed significantly since the 1970s. We’ve built great tools for others, but not ourselves. The shoemaker’s children have holes in their shoes.

Kay isn’t being entirely fair, but before looking at what he’s missing, it’s important to think about how he’s right. If we don’t understand how he’s right, we certainly won’t understand what to build next, and where the future may be staring us in the face.

Let’s start with programming itself. We’re still using punch cards, now emulated on modern high-resolution monitors. We’re still doing line-oriented programming with an alpha-numeric character set. We’re still using programming languages that, for the most part, behave like C or like LISP, and the biggest debates in the programming community are about which of these ancient paradigms is better. We have IDEs that make it somewhat easier to generate those virtual punch cards, but don’t fundamentally change the nature of the beast. We have some tools for unit testing, but they work by requiring us to write more punch cards (unit tests). We have version control tools for managing changes to those punch cards. And we even have tools for continuous integration, continuous deployment, and container orchestration—all of which are programmed by creating more virtual punch cards.

Database developers are in somewhat better shape. Non-procedural languages like SQL lend themselves more readily to visual programming styles, yielding tools like Tableau, though those tools don't help much if you're connecting an enterprise application to a back-end database.

Where can we go from here? I’ve long thought that the real next-generation programming language won’t be a rehash of LISP, C, or Smalltalk syntax. It won’t be character based at all: it will be visual. Rather than typing, we’ll draw what we want. I’ve yet to see a language that fits the bill. Teaching platforms like Alice and Scratch are an interesting attempt, but they don’t go anywhere near far enough: they just take the programming languages we already know and apply a visual metaphor. A C-clamp instead of a loop. Plug-together blocks instead of keywords. Nothing has really changed.

I suspect that the visual programming language we need will borrow ideas from all of our programming paradigms: it will pass messages, it will have objects, it will support concurrency, and so on. What it won’t have is a textual representation; it won’t be a visual sugarcoating to an underlying language like C.

I have some ideas about where such a language might come from. I see two trends that might help us think about the future of programming.

First, I see increasing evidence that there are two kinds of programmers: programmers who build things by connecting other things together, and programmers who create the things that others connect together. The first is the “blue collar” group of programmers; the second is the “academic” group of programmers–for example, the people doing new work in AI. Neither group is more valuable or important than the other. Building trades provide a good analogy. If I need to install a new kitchen sink, I call a plumber. He knows how to connect the sink to the rest of the plumbing; he doesn’t know how to design the sink. There’s a sink designer in an office somewhere who probably understands (in a rudimentary way) how to connect the sink to the plumbing, but whose real expertise is designing sinks, not installing them. You can’t do without either, though the world needs more plumbers than designers. The same holds true for software: the number of people who build web applications is far greater than the number of people who build web frameworks like React or who design new algorithms and do fundamental research.

Should the computational plumber use the same tools as the algorithm designer? I don’t think so; I can imagine a language that is highly optimized for connecting pre-built operations, but not for building the operations themselves. You can see a glimpse of this in languages that apply functions to every element of a list: you no longer need loops. (Python’s map() applies a function to every element of a list; there are languages where functions behave like this automatically.) You can see PHP as a language that’s good for connecting web pages to databases, but horrible for implementing cryptographic algorithms. And perhaps this connective language might be visual: if we’re just connecting a bunch of things, why not use lines rather than lines of code? (Mel Conway is working on “wiring diagrams” that allow consumers to build applications by describing interaction scenarios.)
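As a minimal sketch of the map() point above (the function `double` and the sample list are hypothetical, chosen only for illustration), the same transformation can be written with an explicit loop or by handing a pre-built operation to map():

```python
def double(x: int) -> int:
    """A pre-built operation that we only want to connect, not redesign."""
    return 2 * x


values = [1, 2, 3]

# Loop version: we spell out how to walk the list and collect results.
doubled_loop = []
for v in values:
    doubled_loop.append(double(v))

# map() version: we only say which operation is applied to which data.
doubled_map = list(map(double, values))

print(doubled_loop)  # [2, 4, 6]
print(doubled_map)   # [2, 4, 6]
```

Both versions produce the same list. The second one describes a connection between an operation and its data rather than the steps of an iteration, which is the kind of "plumbing" a connective language might plausibly make its primary notation.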

Second, one of the most interesting research areas in artificial intelligence is the ability to generate code. A couple of years ago, we saw that AI was capable of optimizing database queries. Andrej Karpathy has argued that this ability places us on the edge of Software 2.0, in which AI will generate the algorithms we need. If, in the future, AI systems will be able to write our code, what kind of language will we need to describe the code we want? That language certainly won’t be a language with loops and conditionals; nor do I think it will be a language based on the mathematical abstraction given by functions. Karpathy suggests that the core of this language will be tagged datasets for training AI models. Instead of writing step-by-step instructions, we will show the computer what we want it to do. If you think such a programming paradigm is too radical, too far away from the process of making a machine do your bidding, think about what the machine language programmers of the 1950s would think about a modern optimizing compiler.

So, while I can’t yet imagine what a new visual language will look like, I can sense that we’re on the edge of being able to build it. In fact, we’re building it already. One of Jeremy Howard’s projects at platform.ai is a system that allows subject matter experts to build machine learning applications without any traditional programming. And Microsoft has launched a drag-and-drop machine learning tool: a graphical interface for assembling training data, cleaning it, and using it to build a model, again without any traditional programming. I suppose one could argue that this “isn’t real programming,” but that’s tantamount to defining programming by its dependence on archaic tools.

What about the rest? What about the tools for building, testing, deploying, and managing software? Here’s where Kay underestimates what we have. I’d agree that it isn’t much, but it’s something; it’s a foundation that we can build upon. We have more than 40 years of experience with build tools (starting with make in 1976), and similar experience with automated configuration (CFengine, the precursor to Chef and Puppet, goes back to the ’90s), network monitoring (Nagios dates back to 2002), and continuous integration (Hudson, predecessor of Jenkins, dates to 2005). Kubernetes, which handles container orchestration, is the “new kid on the block.” Kubernetes is the robotically automated factory for distributed systems. It’s a tool for managing and operating large software deployments, running across many nodes. That’s really a complete tool suite that runs from automated assembly through automated testing to fully automated production. Lights off; let the computers do the work.

Sadly, these tools are still configured with virtual punch cards (text files, usually in XML, YAML, JSON, or something equally unpleasant), and that’s a problem that has to be overcome. I think the problem isn’t difficulty–creating a visual language for any of these tools strikes me as significantly easier than creating a visual language for programming–it’s tradition.

Software people are used to bad tools. And while we’d hate being forced to use physical punch cards (I’ve done that, it’s no fun), if you virtualize those punch cards, we’re just fine. Perhaps it’s a rite of passage, a sort of industrial hazing. “We survived this suckage, you should too if you’re going to be a real programmer.” We’re happy with that.

Kay is right that we shouldn’t be happy with this state of affairs. The pain of building software using tools that would be immediately understandable to developers in the 1960s keeps us thinking about the bits, rather than the meaning of those bits. As an industry, we need to get beyond that. We have prototypes for some of the tools we need. We just need to finish the job.

We need to imagine a different future for software development.

Continue reading Toward the next generation of programming tools.

Categories: Technology

Four short links: 10 May 2019

Fri, 2019/05/10 - 01:00

Flip Disc Display, Misinformation, Surveillance, and TCP/IP over Logs

  1. Tetris on a Flip-Disc Display (YouTube) -- the update click is ridiculously satisfying. (via BoingBoing)
  2. Agnotology and Epistemological Fragmentation (danah boyd) -- In 1995, Robert Proctor and Iain Boal coined the term “agnotology” to describe the strategic and purposeful production of ignorance. [...] there’s an increasing number of people who are propagating conspiracy theories or simply asking questions as a way of enabling and magnifying white supremacy. This is agnotology at work. Fascinating in the details of how the misinformers do their work online.
  3. Reverse Engineering a Xinjiang Police Mass Surveillance App (Human Rights Watch) -- discovering the data (online) saved by the surveillance system. TechCrunch even shows the tables.
  4. TCP/IP Over Amazon Cloudwatch Logs -- Running in a standard Go process, Richard Linklayer tunnels IP packets over Amazon Cloudwatch Log Streams that follow a special naming convention — the stream and log group names are just MAC addresses. A cute hack.

Continue reading Four short links: 10 May 2019.

Categories: Technology

Real-time entity resolution made accessible

Thu, 2019/05/09 - 03:00

The O’Reilly Data Show Podcast: Jeff Jonas on the evolution of entity resolution technologies.

In this episode of the Data Show, I spoke with Jeff Jonas, CEO, founder and chief scientist of Senzing, a startup focused on making real-time entity resolution technologies broadly accessible. He was previously a fellow and chief scientist of context computing at IBM. Entity resolution (ER) refers to techniques and tools for identifying and linking manifestations of the same entity/object/individual. Ironically, ER itself has many different names (e.g., record linkage, duplicate detection, object consolidation/reconciliation, etc.).

ER is an essential first step in many domains, including marketing (cleaning up databases), law enforcement (background checks and counterterrorism), and financial services and investing. Knowing exactly who your customers are is an important task for security, fraud detection, marketing, and personalization. The proliferation of data sources and services has made ER very challenging in the internet age. In addition, many applications now increasingly require near real-time entity resolution.

Continue reading Real-time entity resolution made accessible.

Categories: Technology

Four short links: 9 May 2019

Thu, 2019/05/09 - 01:00

Adversarial Examples, War Crimes, Open Source Firmware, and Better Questions

  1. Adversarial Examples Are Not Bugs, They Are Features -- Adversarial vulnerability is a direct result of our models’ sensitivity to well-generalizing features in the data.
  2. Tech Companies Are Deleting Evidence of War Crimes (The Atlantic) -- By piecing together information that becomes publicly accessible on social media and other sites, internet users can hold the perpetrators accountable—that is, unless algorithms developed by the tech giants expunge the evidence first. Facebook's automatic content removal tech is removing evidence these investigators use to hold war criminals to account. We live in an age when software designed to get college students laid is critical to prosecuting war criminals.
  3. Why Open Source Firmware is Important for Security (Jessie Frazelle) -- It’s counter-intuitive that the code that we have the least visibility into has the most privileges. This is what open source firmware is aiming to fix.
  4. Tukey, Design Thinking, and Better Questions (Roger Peng) -- In my view, the most useful thing a data scientist can do is to devote serious effort towards improving the quality and sharpness of the question being asked. In my experience as well.

Continue reading Four short links: 9 May 2019.

Categories: Technology

Four short links: 8 May 2019

Wed, 2019/05/08 - 04:10

Old Timers, Web Flashback, Software Collapse, Revisions to Paxos

  1. Brian Kernighan interviews Ken Thompson (YouTube) -- wonderful footage from Vintage Computer Festival East 2019.
  2. Hypertext and our Collective Destiny (Tim Berners-Lee) -- a 1995 talk honouring Vannevar Bush. I had (and still have) a dream that the web could be less of a television channel and more of an interactive sea of shared knowledge. (via Daniel G. Siegel)
  3. Dealing with Software Collapse -- The main issue with the rot metaphor is that it puts the blame on the wrong piece of the puzzle. If software becomes unusable over time, it's not because of any alteration to that software that needs to be reversed. Rather, it's the foundation on which the software has been built, ranging from the actual hardware via the operating system to programming languages and libraries, that has changed so much that the software is no longer compatible with it.
  4. Distributed Consensus Revised (Part II) (The Morning Paper) -- In today’s post, we’re going to be looking at Chapter 3 of Dr. Howard’s thesis, which is a tour (“systematization of knowledge,” SoK) of some of the major known revisions to the classic Paxos algorithm.

Continue reading Four short links: 8 May 2019.

Categories: Technology

Four short links: 7 May 2019

Tue, 2019/05/07 - 04:05

AI in Dev Tools, Reproducibility, Computing Space, and Social Media and Free Speech

  1. Microsoft's AI-Assisted Coding (TechCrunch) -- What makes IntelliCode different is that the company trained it by feeding it the code of thousands of open source projects from GitHub that have at least 100 stars. Using this data, the tool can then make smarter code-completion suggestions. It also takes the current code and context into account as it makes its recommendations. The first of (hopefully) many AI tools for coders. Interestingly, AI-style centralized training on large data sets isn't something that's a natural advantage for open source tools, so I wonder whether this marks the start of a dev tools shift in power to Microsoft.
  2. Reproducibility as a Vehicle for Engineering Best Practices (Joel Grus) -- a talk aimed at machine learning folks, on how delivering reproducibility requires you to use better (more modern) development practices.
  3. Defining the Dimensions of the Space of Computing (JoDS) -- Glass rectangles and black cylinders are not the future. We can imagine other possible futures—paths not taken—by searching within a “space of alternative” computing systems, as Simon has suggested. In this “space,” even though some dimensions are currently less recognizable than others, by investigating and hence illuminating the less-explored dimensions together, we can co-create alternative futures. (via Daniel G. Siegel)
  4. On Social Media (Tom Coates) -- Tom's Twitter thread about what's already filtered out in social media platforms in order to make it clear how fundamental it is that platforms filter content online.

Continue reading Four short links: 7 May 2019.

Categories: Technology

Four short links: 6 May 2019

Mon, 2019/05/06 - 03:50

Intercepting App's Network Traffic, Rules Engine, CC Search, and Pronouncing Names

  1. Privacy International's Data Interception Environment -- an emulator + MITM networking so you can spy on what apps are saying.
  2. Jess -- an open source rules engine from Sandia. Jess uses an enhanced version of the Rete algorithm to process rules. Rete is a very efficient mechanism for solving the difficult many-to-many matching problem. [...] Jess has many unique features, including backward chaining and working memory queries, and of course Jess can directly manipulate and reason about Java objects. Jess is also a powerful Java scripting environment, from which you can create Java objects, call Java methods, and implement Java interfaces without compiling any Java code.
  3. Creative Commons Search Engine -- finally a good search for CC-licensed images.
  4. Pronounce Like a Polyglot: A Guide to Saying Foreign Names (NPR) -- pronouncing someone's name correctly is a basic respect that you should accord them. Failure to do so will make everything else harder. ("Foreign" here is from an American perspective.)

Continue reading Four short links: 6 May 2019.

Categories: Technology

Four short links: 3 May 2019

Fri, 2019/05/03 - 04:05

Knowledge Graph, Volumetric Viewer, Neural Painting, Weird Codes

  1. Beam -- eBay's open source distributed knowledge graph store. (via eBay tech blog)
  2. Neuroglancer -- Neuroglancer is a WebGL-based viewer for volumetric data. It is capable of displaying arbitrary (non axis-aligned) cross-sectional views of volumetric data, as well as 3-D meshes and line-segment based models (skeletons). This is not an official Google product. Open source. A live demo is hosted here.
  3. Neural Painters -- We explore neural painters, a generative model for brushstrokes learned from a real non-differentiable and non-deterministic painting program. [...] Finally, we present a new concept called intrinsic style transfer. By minimizing only the content loss from neural style transfer, we allow the artistic medium, in this case, brushstrokes, to naturally dictate the resulting style. Source available (in CoLab notebooks no less). (via Reiichiro Nakano)
  4. Real and Strange ICD-10 Codes (John D. Cook) -- Some of the ICD-10 codes are awfully specific, and bizarre. For example, V95.4: Unspecified spacecraft accident injuring occupant; V97.33XA: Sucked into jet engine, initial encounter; V97.33XD: Sucked into jet engine, subsequent encounter. Your reminder that no classification system survives contact with the weirdness of reality.

Continue reading Four short links: 3 May 2019.

Categories: Technology

Rise of the (advertising) machines

Thu, 2019/05/02 - 08:00

Mike Tidmarsh looks at how data and AI are radically reshaping the world of marketing communications.

Continue reading Rise of the (advertising) machines.

Categories: Technology

Building data science capacity in your organization

Thu, 2019/05/02 - 08:00

Shingai Manjengwa shares insights from teaching data science to 300,000 online learners.

Continue reading Building data science capacity in your organization.

Categories: Technology

The unstoppable rise of white box data

Thu, 2019/05/02 - 08:00

Chris Taggart explains the benefits of “white box data” and outlines the structural shifts that are moving the data world toward this model.

Continue reading The unstoppable rise of white box data.

Categories: Technology

Privacy, identity, and autonomy in the age of big data and AI

Thu, 2019/05/02 - 08:00

Sandra Wachter argues that a right to reasonable inferences could protect against new forms of discrimination.

Continue reading Privacy, identity, and autonomy in the age of big data and AI.

Categories: Technology

Combining creativity and analytics

Thu, 2019/05/02 - 08:00

David Boyle shares lessons on how analysts can harness data and creativity to build partnerships.

Continue reading Combining creativity and analytics.

Categories: Technology

Four short links: 2 May 2019

Thu, 2019/05/02 - 04:05

Network Computation, Algorithmic Bias, Social Robotics, Single Founders Do Fine

  1. When Should the Network Be the Computer? -- Researchers have repurposed programmable network devices to place small amounts of application computation in the network, sometimes yielding orders-of-magnitude performance gains. At the same time, effectively using these devices requires careful use of limited resources and managing deployment challenges. This paper provides a framework for principled use of in-network processing. We provide a set of guidelines for building robust and deployable in-network primitives, along with a taxonomy to help identify which applications can benefit from in-network processing and what types of devices they should use.
  2. The Myth of the Impartial Machine -- nifty exploration of algorithmic bias and where it comes from, with interactive data demos.
  3. What Can We Learn from Social Robotics Failures? (IEEE Spectrum) -- Long-term engagement is the holy grail, and the Gordian knot; We need artists; Embodiment does create emotional bonds; Design matters. This line got me: All-in-all, I predict that when designers will start their own social robotics companies and hire engineers, rather than the other way around, we will finally discover what the hidden need for home robots was in the first place.
  4. Cofounders and Single Founders -- In this paper, we examine the implications of founding alone versus as a group by using a unique data set of crowdfunded companies that together generated approximately $358 million in total revenue. We show that companies started by solo founders survive longer than those started by teams. Further, organizations started by solo founders generate more revenue than organizations started by founder pairs, and do not perform significantly different than larger teams. This suggests that the taken-for-granted assumption among scholars that entrepreneurship is best performed by teams should be reevaluated, with implications for theories of team performance and entrepreneurial strategy.

Continue reading Four short links: 2 May 2019.

Categories: Technology

Highlights from the Strata Data Conference in London 2019

Wed, 2019/05/01 - 08:00

Watch highlights from expert talks covering machine learning, predictive analytics, data regulation, and more.

People from across the data world came together in London for the Strata Data Conference. Below you'll find links to highlights from the event.

Making the future

James Burke asks if we can use data and predictive analytics to take the guesswork out of prediction.

Finding your North Star

Cait O’Riordan discusses the North Star metric the Financial Times uses to drive subscriber growth.

Making data science useful

Cassie Kozyrkov explains how organizations can extract more value from their data.

Sustaining machine learning in the enterprise

Drawing insights from recent surveys, Ben Lorica analyzes important trends in machine learning.

Privacy, identity, and autonomy in the age of big data and AI

Sandra Wachter argues that a right to reasonable inferences could protect against new forms of discrimination.

Building data science capacity in your organization

Shingai Manjengwa shares insights from teaching data science to 300,000 online learners.

Combining creativity and analytics

David Boyle shares lessons on how analysts can harness data and creativity to build partnerships.

The enterprise data cloud

Mick Hollison explains why hybrid and multi-cloud will help organizations capitalize on the potential of machine learning and AI.

Rise of the (advertising) machines

Mike Tidmarsh looks at how data and AI are radically reshaping the world of marketing communications.

The unstoppable rise of white box data

Chris Taggart explains the benefits of white box data and outlines the structural shifts that are moving the data world toward this model.

BMW’s journey to the data-driven enterprise from the edge to AI

Tobias Burger and Amr Awadallah discuss some of BMW's most important use cases for data and AI.


Continue reading Highlights from the Strata Data Conference in London 2019.

Categories: Technology

Making data science useful

Wed, 2019/05/01 - 08:00

Cassie Kozyrkov explains how organizations can extract more value from their data.

Continue reading Making data science useful.

Categories: Technology

Making the future

Wed, 2019/05/01 - 08:00

James Burke asks if we can use data and predictive analytics to take the guesswork out of prediction.

Continue reading Making the future.

Categories: Technology

The enterprise data cloud

Wed, 2019/05/01 - 08:00

Mick Hollison describes why hybrid and multi-cloud are the future for organizations that want to capitalize on machine learning and AI.

Continue reading The enterprise data cloud.

Categories: Technology

Sustaining machine learning in the enterprise

Wed, 2019/05/01 - 08:00

Drawing insights from recent surveys, Ben Lorica analyzes important trends in machine learning.

Continue reading Sustaining machine learning in the enterprise.

Categories: Technology
