Feed aggregator

Highlights from the Strata Data Conference in San Francisco 2019

O'Reilly Radar - Thu, 2019/03/28 - 16:00

Watch highlights from expert talks covering AI, machine learning, data analytics, and more.

People from across the data world came together in San Francisco for the Strata Data Conference. Below you'll find links to highlights from the event.

Hacking the vote: The neuropolitical universe

Elizabeth Svoboda explains how biosensors and predictive analytics are being applied by political campaigns and what they mean for the future of free and fair elections.

AI and cryptography: Challenges and opportunities

Shafi Goldwasser explains why the next frontier of cryptography will help establish safe machine learning.

Likewar: How social media is changing the world and how the world is changing social media

Peter Singer explores the new rules of power in the age of social media and how we can navigate a world increasingly shaped by "likes" and lies.

Forecasting uncertainty at Airbnb

Theresa Johnson outlines the AI powering Airbnb’s metrics forecasting platform.

Chatting with machines: Strange things 60 billion bot logs say about human nature

Lauren Kunze discusses lessons learned from an analysis of interactions between humans and chatbots.

The journey to the data-driven enterprise from the edge to AI

Amy O'Connor explains how Cloudera applies an "edge to AI" approach to collect, process, and analyze data.

Streamlining your data assets: A strategy for the journey to AI

Dinesh Nirmal shares a data asset framework that incorporates current business structures and the elements you need for an AI-fluent data platform.

Scoring your business in the AI matrix

Jed Dougherty plots AI examples on a matrix to clarify the various interpretations of AI.

Data warehousing is not a use case

Google BigQuery co-creator Jordan Tigani shares his vision for where cloud-scale data analytics is heading.

The enterprise data cloud

Mike Olson describes the key capabilities an enterprise data cloud system requires, and why hybrid and multi-cloud is the future.

Winners of the Strata Data Awards 2019

The Strata Data Award is given to the most disruptive startup, the most innovative industry technology, the most impactful data science project, and the most notable open source contribution.

It’s in the game: A rare look into how EA brought data science into the creative process of game design

Zachery Anderson discusses EA's history of combining data with development, from the early days of balancing gameplay to the future of personalized games for everyone.

Continue reading Highlights from the Strata Data Conference in San Francisco 2019.

Categories: Technology

Winners of the Strata Data Awards 2019

O'Reilly Radar - Thu, 2019/03/28 - 13:00

The Strata Data Award is given to the most disruptive startup, the most innovative industry technology, the most impactful data science project, and the most notable open source contribution.

Continue reading Winners of the Strata Data Awards 2019.

Categories: Technology

Hacking the vote: The neuropolitical universe

O'Reilly Radar - Thu, 2019/03/28 - 13:00

Elizabeth Svoboda explains how biosensors and predictive analytics are being applied by political campaigns and what they mean for the future of free and fair elections.

Continue reading Hacking the vote: The neuropolitical universe.

Categories: Technology

Data warehousing is not a use case

O'Reilly Radar - Thu, 2019/03/28 - 13:00

Google BigQuery co-creator Jordan Tigani shares his vision for where cloud-scale data analytics is heading.

Continue reading Data warehousing is not a use case.

Categories: Technology

The enterprise data cloud

O'Reilly Radar - Thu, 2019/03/28 - 13:00

Mike Olson describes the key capabilities an enterprise data cloud system requires, and why hybrid and multi-cloud is the future.

Continue reading The enterprise data cloud.

Categories: Technology

Likewar: How social media is changing the world and how the world is changing social media

O'Reilly Radar - Thu, 2019/03/28 - 13:00

Peter Singer explores the new rules of power in the age of social media and how we can navigate a world increasingly shaped by "likes" and lies.

Continue reading Likewar: How social media is changing the world and how the world is changing social media.

Categories: Technology

Chatting with machines: Strange things 60 billion bot logs say about human nature

O'Reilly Radar - Thu, 2019/03/28 - 13:00

Lauren Kunze discusses lessons learned from an analysis of interactions between humans and chatbots.

Continue reading Chatting with machines: Strange things 60 billion bot logs say about human nature.

Categories: Technology

Forecasting uncertainty at Airbnb

O'Reilly Radar - Thu, 2019/03/28 - 13:00

Theresa Johnson outlines the AI powering Airbnb’s metrics forecasting platform.

Continue reading Forecasting uncertainty at Airbnb.

Categories: Technology

It’s time for data scientists to collaborate with researchers in other disciplines

O'Reilly Radar - Thu, 2019/03/28 - 05:45

The O’Reilly Data Show Podcast: Forough Poursabzi-Sangdeh on the interdisciplinary nature of interpretable and interactive machine learning.

In this episode of the Data Show, I spoke with Forough Poursabzi-Sangdeh, a postdoctoral researcher at Microsoft Research New York City. Poursabzi works in the interdisciplinary area of interpretable and interactive machine learning. As models and algorithms become more widespread, many important considerations are becoming active research areas: fairness and bias, safety and reliability, security and privacy, and Poursabzi’s area of focus—explainability and interpretability.

Continue reading It’s time for data scientists to collaborate with researchers in other disciplines.

Categories: Technology

Four short links: 28 March 2019

O'Reilly Radar - Thu, 2019/03/28 - 05:10

Data-Oriented Design, Time Zone Hell, Music Algorithms, and Fairness in ML

  1. Data-Oriented Design -- A curated list of data-oriented design resources.
  2. Storing UTC is Not a Silver Bullet -- time zones will drive you to drink; see the sketch after this list.
  3. Warner Music Signed an Algorithm to a Record Deal (Verge) -- Although Endel signed a deal with Warner, the deal is crucially not for “an algorithm,” and Warner is not in control of Endel’s product. The label approached Endel with a distribution deal and Endel used its algorithm to create 600 short tracks on 20 albums that were then put on streaming services, returning a 50/50 royalty split to Endel. Unlike a typical major label record deal, Endel didn’t get any advance money paid upfront, and it retained ownership of the master recordings.
  4. 50 Years of Unfairness: Lessons for Machine Learning -- We trace how the notion of fairness has been defined within the testing communities of education and hiring over the past half century, exploring the cultural and social context in which different fairness definitions have emerged. In some cases, earlier definitions of fairness are similar or identical to definitions of fairness in current machine learning research, and foreshadow current formal work. In other cases, insights into what fairness means and how to measure it have largely gone overlooked. We compare past and current notions of fairness along several dimensions, including the fairness criteria, the focus of the criteria (e.g., a test, a model, or its use), the relationship of fairness to individuals, groups, and subgroups, and the mathematical method for measuring fairness (e.g., classification, regression). This work points the way toward future research and measurement of (un)fairness that builds from our modern understanding of fairness while incorporating insights from the past.
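
To make item 2 concrete: the classic failure is storing a future event as a UTC instant and then having the region's offset rules change before the event happens. Below is a minimal sketch (assuming Python 3.9+ and its zoneinfo module) of the usual remedy: store the intended wall-clock time plus the IANA zone name, and convert to UTC as late as possible, so the conversion picks up whatever time zone rules are current.

    # Store the user's intent ("9:00 on 2025-11-03 in New York"), not a
    # precomputed UTC instant that can go stale if the zone rules change.
    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo

    event_local = datetime(2025, 11, 3, 9, 0, tzinfo=ZoneInfo("America/New_York"))

    # Convert to UTC only when needed; the conversion uses the tzdata
    # rules in force now, not the rules that applied at scheduling time.
    event_utc = event_local.astimezone(timezone.utc)
    print(event_local.isoformat())  # 2025-11-03T09:00:00-05:00
    print(event_utc.isoformat())    # 2025-11-03T14:00:00+00:00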

Continue reading Four short links: 28 March 2019.

Categories: Technology

The journey to the data-driven enterprise from the edge to AI

O'Reilly Radar - Wed, 2019/03/27 - 13:00

Amy O'Connor explains how Cloudera applies an "edge to AI" approach to collect, process, and analyze data.

Continue reading The journey to the data-driven enterprise from the edge to AI.

Categories: Technology

AI and cryptography: Challenges and opportunities

O'Reilly Radar - Wed, 2019/03/27 - 13:00

Shafi Goldwasser explains why the next frontier of cryptography will help establish safe machine learning.

Continue reading AI and cryptography: Challenges and opportunities.

Categories: Technology

Streamlining your data assets: A strategy for the journey to AI

O'Reilly Radar - Wed, 2019/03/27 - 13:00

Dinesh Nirmal shares a data asset framework that incorporates current business structures and the elements you need for an AI-fluent data platform.

Continue reading Streamlining your data assets: A strategy for the journey to AI.

Categories: Technology

Scoring your business in the AI matrix

O'Reilly Radar - Wed, 2019/03/27 - 13:00

Jed Dougherty plots AI examples on a matrix to clarify the various interpretations of AI.

Continue reading Scoring your business in the AI matrix.

Categories: Technology

0x64: Our Producer Dan Lynch Interviewed at Copyleft Conf 2019

FAIF - Wed, 2019/03/27 - 11:19

Bradley and Karen interview their own producer, Dan Lynch, on site at Copyleft Conf 2019.

Show Notes:

Segment 0 (00:46)

Segment 1 (5:19)

Segment 2 (28:23)

Bradley and Karen briefly dissect the interview with Dan.

Segment 3 (32:22)

Karen and Bradley mention that they'll discuss the Linux Foundation initiative, “Community Bridge,” in the next episode. If you want a preview of Bradley and Karen's thoughts, you can read their blog post about Linux Foundation's “Community Bridge” initiative.

Send feedback and comments on the cast to <oggcast@faif.us>. You can keep in touch with Free as in Freedom on our IRC channel, #faif on irc.freenode.net, and by following Conservancy on identi.ca and Twitter.

Free as in Freedom is produced by Dan Lynch of danlynch.org. Theme music written and performed by Mike Tarantino with Charlie Paxson on drums.

The content of this audiocast, and the accompanying show notes and music, are licensed under the Creative Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).

Categories: Free Software

Four short links: 27 March 2019

O'Reilly Radar - Wed, 2019/03/27 - 04:00

Linkers and Loaders, Low-Low-Low Power Bluetooth, Voice, and NVC

  1. Linkers and Loaders -- the uncorrected manuscript chapters for my Linkers and Loaders, published by Morgan Kaufmann.
  2. <1mW Bluetooth LE Transmitter -- Consuming just 0.6 milliwatts during transmission, it would broadcast for 11 years using a typical 5.8-mm coin battery. Such a millimeter-scale BLE radio would allow these ant-sized sensors to communicate with ordinary equipment, even a smartphone. Ingenious engineering hacks to make this work.
  3. Mumble -- an open source, low-latency, high-quality voice chat software primarily intended for use while gaming.
  4. A Guide to Difficult Conversations (Dave Bailey) -- your quarterly reminder that non-violent communication exists and is a good thing.

Continue reading Four short links: 27 March 2019.

Categories: Technology

Four short links: 26 March 2019

O'Reilly Radar - Tue, 2019/03/26 - 04:15

Software Stack, Gig Economy, Simple Over Flexible, and Packet Radio

  1. Thoughts on Conway's Law and the Software Stack (Jessie Frazelle) -- All these problems are not small by any means. They are miscommunications at various layers of the stack. They are people thinking an interface or feature is secure when it is merely a window dressing that can be bypassed with just a bit more knowledge about the stack. I really like the advice Lea Kissner gave: “take the long view, not just the broad view.” We should do this more often when building systems.
  2. Troubles with the Open Source Gig Economy and Sustainability Tip Jar (Chris Aniszczyk) -- thoughtful long essay with a lot of links for background reading, on the challenges of sustainability via Patreon, etc., through to some signs of possibly-working models.
  3. Choose Simple Solutions Over Flexible Ones -- flexibility does not come for free.
  4. New Packet Radio (Hackaday) -- a custom radio protocol, designed to transport bidirectional IP traffic over 430MHz radio links (ham radio). This protocol is optimized for "point to multipoint" topology, with the help of managed-TDMA. Note that Hacker News commenters indicate some possible FCC violations; though, as the project comes from France, that's probably not a problem for the creators of the software.

Continue reading Four short links: 26 March 2019.

Categories: Technology

Four short links: 25 March 2019

O'Reilly Radar - Mon, 2019/03/25 - 04:00

Hiring for Neurodiversity, Reprogrammable Molecular Computing, Retro UUCP, and Industrial Go

  1. Dell's Neurodiversity Program -- excellent work from Dell making themselves an attractive destination for folks on the autistic spectrum.
  2. Reprogrammable Molecular Computing System (Caltech) -- The researchers were able to experimentally demonstrate 6-bit molecular algorithms for a diverse set of tasks. In mathematics, their circuits tested inputs to assess if they were multiples of three, performed equality checks, and counted to 63. Other circuits drew "pictures" on the DNA "scarves," such as a zigzag, a double helix, and irregularly spaced diamonds. Probabilistic behaviors were also demonstrated, including random walks as well as a clever algorithm (originally developed by computer pioneer John von Neumann) for obtaining a fair 50/50 random choice from a biased coin; a sketch of that trick follows this list. Paper.
  3. Dataforge UUCP -- it's like Cory Doctorow guest-wrote our timeline: UUCP over SSH to give decentralized comms for freedom fighters.
  4. Go for Industrial Programming (Peter Bourgon) -- I’m speaking today about programming in an industrial context. By that I mean: in a startup or corporate environment; within a team where engineers come and go; on code that outlives any single engineer; and serving highly mutable business requirements. [...] I’ve tried to select for areas that have routinely tripped up new and intermediate Gophers in organizations I’ve been a part of, and particularly those things that may have nonobvious or subtle implications. (via ceej)
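
The von Neumann trick mentioned in item 2 is worth sketching, since it fits in a few lines of Python. Flip the biased coin twice: heads-tails yields heads, tails-heads yields tails, and matching pairs are discarded. The two mixed outcomes each occur with probability p(1-p), so the output is fair regardless of the bias.

    import random

    def biased_flip(p: float) -> int:
        """Return 1 (heads) with probability p, else 0 (tails)."""
        return 1 if random.random() < p else 0

    def fair_flip(p: float) -> int:
        """Extract a fair bit from a coin of bias p."""
        while True:
            a, b = biased_flip(p), biased_flip(p)
            if a != b:
                return a  # HT -> 1, TH -> 0, each with probability p*(1-p)

    # Even a heavily biased coin yields roughly 50/50 output.
    flips = [fair_flip(0.9) for _ in range(10_000)]
    print(sum(flips) / len(flips))  # ~0.5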

Continue reading Four short links: 25 March 2019.

Categories: Technology

Four short links: 22 March 2019

O'Reilly Radar - Fri, 2019/03/22 - 04:40

Explainable AI, Product Management, REPL for Games, and Open Source Inventory

  1. XAI -- An explainability toolbox for machine learning. Follows the Institute for Ethical AI & Machine Learning's 8 principles.
  2. The Producer Playbook -- Guidelines and best practices for producers and project managers.
  3. Repl.it Adds Graphics -- PyGame in the browser, in fast turnaround time.
  4. ScanCode Toolkit -- detects licenses, copyrights, package manifests and dependencies, and more by scanning code ... to discover and inventory open source and third-party packages used in your code.

Continue reading Four short links: 22 March 2019.

Categories: Technology

Automating ethics

O'Reilly Radar - Fri, 2019/03/22 - 04:15

Machines will need to make ethical decisions, and we will be responsible for those decisions.

We are surrounded by systems that make ethical decisions: systems approving loans, trading stocks, forwarding news articles, recommending jail sentences, and much more. They act for us or against us, but almost always without our consent or even our knowledge. In recent articles, I've suggested the ethics of artificial intelligence itself needs to be automated. But my suggestion ignores the reality that ethics has already been automated: merely claiming to make data-based recommendations without taking anything else into account is an ethical stance. We need to do better, and the only way to do better is to build ethics into those systems. This is a problematic and troubling position, but I don't see any alternative.

The problem with data ethics is scale. Scale brings a fundamental change to ethics, and not one that we're used to taking into account: the sheer number of decisions that need to be made means that we can't expect humans to make them. Every time data moves from one site to another, from one context to another, from one intent to another, there is an action that requires some kind of ethical decision.

Gmail’s handling of spam is a good example of a program that makes ethical decisions responsibly. We’re all used to spam blocking, and we don’t object to it, at least partly because email would be unusable without it. And blocking spam requires making ethical decisions automatically: deciding that a message is spam means deciding what other people can and can’t say, and who they can say it to.

There’s a lot we can learn from spam filtering. It only works at scale; Google and other large email providers can do a good job of spam filtering because they see a huge volume of email. (Whether this centralization of email is a good thing is another question.) When their servers see an incoming message that matches certain patterns across their inbound email, that message is marked as spam and sorted into recipients’ spam folders. Spam detection happens in the background; we don’t see it. And the automated decisions aren’t final: you can check the spam folder and retrieve messages that were spammed by mistake, and you can mark messages that are misclassified as not-spam.

Credit card fraud detection is another system that makes ethical decisions for us. Most of us have had a credit card transaction rejected and, upon calling the company, found that the card had been cancelled because of a fraudulent transaction. (In my case, a motel room in Oklahoma.) Unfortunately, fraud detection doesn’t work as well as spam detection; years later, when my credit card was repeatedly rejected at a restaurant that I patronized often, the credit card company proved unable to fix the transactions or prevent future rejections. (Other credit cards worked.) I’m glad I didn’t have to pay for someone else’s stay in Oklahoma, but an implementation of ethical principles that can’t be corrected when it makes mistakes is seriously flawed.

So, machines are already making ethical decisions, and often doing so badly. Spam detection is the exception, not the rule. And those decisions have an increasingly powerful effect on our lives. Machines determine what posts we see on Facebook, what videos are recommended to us on YouTube, what products are recommended on Amazon. Why did Google News suddenly start showing me alt-right articles about a conspiracy to deny Cornell University students’ inalienable right to hamburgers? I think I know; I’m a Cornell alum, and Google News “thought” I’d be interested. But I’m just guessing, and I have precious little control over what Google News decides to show me. Does real news exist if Google or Facebook decides to show me burger conspiracies instead? What does “news” even mean if fake conspiracy theories are on the same footing? Likewise, does a product exist if Amazon doesn’t recommend it? Does a song exist if YouTube doesn’t select it for your playlist?

These data flows go both ways. Machines determine who sees our posts, who receives data about our purchases, who finds out what websites we visit. We’re largely unaware of those decisions, except in the most grotesque sense: we read about (some of) them in the news, but we’re still unaware of how they impact our lives.

Don’t misconstrue this as an argument against the flow of data. Data flows, and data becomes more valuable to all of us as a result of those flows. But as Helen Nissenbaum argues in her book Privacy in Context, those flows result in changes in context, and when data changes context, the issues quickly become troublesome. I am fine with medical imagery being sent to a research study where it can be used to train radiologists and the AI systems that assist them. I’m not OK with those same images going to an insurance consortium, where they can become evidence of a “pre-existing condition,” or to a marketing organization that can send me fake diagnoses. I believe fairly deeply in free speech, so I’m not too troubled by the existence of conspiracy theories about Cornell’s dining service; but let those stay in the context of conspiracy theorists. Don’t waste my time or my attention.

I’m also not suggesting that machines make ethical choices in the way humans do: ultimately, humans bear responsibility for the decisions their machines make. Machines only follow instructions, whether those instructions are concrete rules or the arcane computations of a neural network. Humans can’t absolve themselves of responsibility by saying, “The machine did it.” We are the only ethical actors, even when we put tools in place to scale our abilities.

If we’re going to automate ethical decisions, we need to start from some design principles. Spam detection gives us a surprisingly good start. Gmail’s spam detection assists users. It has been designed to happen in the background and not get into the user’s way. That’s a simple but important statement: ethical decisions need to stay out of the user’s way. It’s easy to think that users should be involved with these decisions, but that defeats the point: there are too many decisions, and giving permission each time an email is filed as spam would be much worse than clicking on a cookie notice for every website you visit. But staying out of the user's way has to be balanced against human responsibility: ambiguous or unclear situations need to be called to the users' attention. When Gmail can't decide whether or not a message is spam, it passes it on to the user, possibly with a warning.
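
Here is a minimal sketch of that principle: confident decisions happen silently in the background, and only ambiguous cases are surfaced to the user. The thresholds and the notion of a spam score are hypothetical illustrations, not Gmail's actual pipeline.

    # Hypothetical thresholds; a real system would tune these empirically.
    SPAM_THRESHOLD = 0.95    # above this, file as spam without asking
    CLEAR_THRESHOLD = 0.30   # below this, deliver normally

    def route_message(msg_id: str, spam_score: float) -> str:
        """Route a message given a classifier's spam probability."""
        if spam_score >= SPAM_THRESHOLD:
            return "spam_folder"        # background decision, revocable later
        if spam_score <= CLEAR_THRESHOLD:
            return "inbox"              # background decision, no friction
        return "inbox_with_warning"     # ambiguous: pass it to the user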

A second principle we can draw from spam filtering is that decisions can’t be irrevocable. Emails tagged as spam aren’t deleted for 30 days; at any time during that period, the user can visit the spam folder and say “that’s not spam.” In a conversation, Anna Lauren Hoffmann said it’s less important to make every decision correctly than to have a means of redress by which bad decisions can be corrected. That means of redress must be accessible by everyone, and it needs to be human, even though we know humans are frequently biased and unfair. It must be possible to override machine-made decisions, and moving a message out of the spam folder overrides that decision.
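
These two properties, deferred deletion and a user override that always wins, can be sketched in the same hypothetical vein; the override is also recorded, anticipating the correction mechanism discussed next.

    from datetime import datetime, timedelta, timezone

    RETENTION = timedelta(days=30)
    spam_folder: dict[str, datetime] = {}  # message id -> time it was filed
    corrections: list[str] = []            # overrides, kept for retraining

    def file_as_spam(msg_id: str) -> None:
        spam_folder[msg_id] = datetime.now(timezone.utc)  # deferred, not final

    def mark_not_spam(msg_id: str) -> None:
        """The user's decision overrides the machine's."""
        spam_folder.pop(msg_id, None)
        corrections.append(msg_id)  # evidence that the model was wrong

    def purge_expired() -> None:
        """Only after 30 days does a spam decision become irrevocable."""
        now = datetime.now(timezone.utc)
        for msg_id, filed in list(spam_folder.items()):
            if now - filed > RETENTION:
                del spam_folder[msg_id]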

When the model for spam detection is systematically wrong, users can correct it. It’s easy to mark a message as “spam” or “not spam.” This kind of correction might not be appropriate for more complex applications. For example, we wouldn’t want real estate agents “correcting” a model to recommend houses based on race or religion; and we could even discuss whether similar behavior would be appropriate for spam detection. Designing effective means of redress and correction may be difficult, and we’ve only dealt with the simplest cases.

Ethical problems arise when a company’s interest in profit comes before the interests of the users. We see this all the time: in recommendations designed to maximize ad revenue via “engagement”; in recommendations that steer customers to Amazon’s own products, rather than other products on their platform. The customer’s interest must always come before the company’s. That applies to recommendations in a news feed or on a shopping site, but also to how the customer’s data is used and where it’s shipped. Facebook believes deeply that “bringing the world closer together” is a social good but, as Mary Gray said on Twitter, when we say that something is a “social good,” we need to ask: “good for whom?” Good for advertisers? Stockholders? Or for the people who are being brought together? The answers aren’t all the same, and depend deeply on who’s connected and how.

Many discussions of ethical problems revolve around privacy. But privacy is only the starting point. Again, Nissenbaum clarifies that the real issue isn’t whether data should be private; it’s what happens when data changes context. No privacy tool could have protected the pregnant Target customer who was outed to her parents. The problem wasn’t with privacy technology, but with the intention: to use purchase data to target advertising circulars. How can we control data flows so those flows benefit, rather than harm, the user? "Datasheets for datasets" is a proposal for a standard way to describe data sets; "model cards" is a similar proposal for describing models. While neither of these is a complete solution, I can imagine a future version of these proposals that standardizes metadata so data routing protocols can determine which flows are appropriate and which aren't. It’s conceivable that the metadata for data could describe what kinds of uses are allowable (extending the concept of informed consent), and metadata for models could describe how data might be used. That's work that hasn't been started, but it's work that's needed.
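
To illustrate where that might lead, here is a minimal sketch of machine-readable metadata gating a data flow. The schema and the use categories are invented for illustration; they are not the actual "Datasheets for datasets" or model cards formats.

    from dataclasses import dataclass, field

    @dataclass
    class Datasheet:
        """Hypothetical machine-readable metadata attached to a data set."""
        name: str
        allowed_uses: set[str] = field(default_factory=set)

    def may_flow(sheet: Datasheet, destination_use: str) -> bool:
        """A routing check: permit a flow only for uses consented to."""
        return destination_use in sheet.allowed_uses

    scans = Datasheet("radiology-images", allowed_uses={"medical-research"})
    print(may_flow(scans, "medical-research"))        # True
    print(may_flow(scans, "insurance-underwriting"))  # False: context change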

Whatever solutions we end up with, we must not fall in love with the tools. It’s entirely too easy for technologists to build some tools and think they’ve solved a problem, only to realize the tools have created their own problems. Differential privacy can safeguard personal data by adding carefully calibrated noise to a dataset or its query results without significantly changing aggregate statistics, but it can also probably protect criminals by hiding evidence. Homomorphic encryption, which allows systems to do computations on encrypted data without first decrypting it, can probably be used to hide the real significance of computations. Thirty years of experience on the internet has taught us that routing protocols can be abused in many ways; protocols that use metadata to route data safely can no doubt be attacked. It's possible to abuse or to game any solution. That doesn’t mean we shouldn’t build solutions, but we need to build them knowing they aren’t bulletproof, that they’re subject to attack, and that we are ultimately responsible for their behavior.
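
For concreteness, here is a minimal sketch of one standard differential privacy technique, the Laplace mechanism applied to a count query. The epsilon value is an assumed privacy budget; a real deployment involves far more care than this.

    import random

    def private_count(true_count: int, epsilon: float = 0.5) -> float:
        """Answer a count query with Laplace noise of scale 1/epsilon."""
        # A count has sensitivity 1: one person joining or leaving the
        # data changes the answer by at most 1, so noise of scale
        # 1/epsilon is enough to mask any individual's presence.
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return true_count + noise  # difference of exponentials is Laplace

    print(private_count(1042))  # near 1042, but hides any one record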

Our lives are integrated with data in ways our parents could never have predicted. Data transfers have gone way beyond faxing a medical record or two to an insurance company, or authorizing a credit card purchase over an analog phone line. But as Thomas Wolfe wrote, we can’t go home again. There's no way back to some simpler world where your medical records were stored on paper in your doctor’s office, your purchases were made with cash, and your smartphone didn’t exist. And we wouldn’t want to go back. The benefits of the new data-rich world are immense. Yet, we live in a "data smog" that contains everyone's purchases, everyone's medical records, everyone’s location, and even everyone’s heart rate and blood pressure.

It's time to start building the systems that will truly help us manage our data. These machines will need to make ethical decisions, and we will be responsible for those decisions. We can’t avoid that responsibility; we must take it up, difficult and problematic as it is.

Continue reading Automating ethics.

Categories: Technology
