Feed aggregator

Topic for Security meeting on Feb 20th

PLUG - Tue, 2020/02/18 - 14:46

Donald ‘Mac’ McCarthy: Firewall Fundamentals and Network Segmentation with pfSense

Description:
This presentation will cover two fundamental steps in properly securing networks.

Firewalls are one of the most basic and universal network and security controls in a security engineer’s tool bag. We will cover some basic setup and rule creation with pfSense.

Network segmentation is critical for maximizing the effectiveness of security controls. We will discuss some strategies for segmenting networks, practical segmentation, segmentation challenges, and how to best deploy network controls within network segments.

About Mac:
Mac is a 17-year veteran of the IT industry. He has worked for organizations ranging in size from 10 to 200,000+ employees. Mac has been involved in information security for the past 9 years with organizations in the academic, healthcare, financial, and public sectors. Mac is a Linux enthusiast with a passion for using large compute clusters to help solve the most challenging problems in security analytics. He has given presentations globally on business email compromise and credential stuffing. Mac currently serves as the Director of Field Operations for Open Source Context.

Four short links: 5 February 2020

O'Reilly Radar - Tue, 2020/02/04 - 22:01
  1. Discord Switching from Go to Rust — memory management proved a deciding factor: Go’s garbage collector versus Rust’s compile-time ownership of memory.
  2. What Happened with DNC Tech (Rabble) — a lot of background and context, but the crux is: the decision-makers refuse to use free software, alienating the progcoders/ragtag communities. They also refuse to fund projects between cycles to build reusable platforms. Best pithy comment, from William Pietri: underinvestment in quality is still the median software project experience.
  3. Out of Control — Norwegian Consumer Council report on the ad tech industry. Many actors in the online advertising industry collect information about us from a variety of places, including web browsing, connected devices, and social media. When combined, this data provides a complex picture of individuals, revealing what we do in our daily lives, our secret desires, and our most vulnerable moments. This massive commercial surveillance is systematically at odds with our fundamental rights and can be used to discriminate, manipulate, and exploit us. They’re filing lawsuits against companies, too.
  4. Morning Report: Smart Streetlights Are Experiencing Mission Creep — it’s my theory that once something is shown to be possible with bits, it will happen because there’s such diversity of motivations in the world that someone wants it to happen, and it’s nigh impossible to prevent things without physical restrictions. Not everyone can put cameras and other sensors into streetlights, but once they’re there, we’ll see every application of that data justified. It’s the Tragic Inevitability of Software, which we’re only beginning to appreciate.
Categories: Technology

Four short links: 4 February 2020

O'Reilly Radar - Mon, 2020/02/03 - 22:01
  1. The Missing Semester of Your MIT Education — We’ll teach you how to master the command-line, use a powerful text editor, use fancy features of version control systems, and much more.
  2. Reasons Not to Be Famous (Tim Ferris) — my buddy Rowan Simpson talked about this over a decade ago, and it’s still something most people don’t think about.
  3. Expert Programmers Have Fine-tuned Cortical Representations of Source Code — This approach enabled us to identify seven brain regions, widely distributed in the frontal, parietal, and temporal cortices, that have a tight relationship with programming expertise. In these brain regions, functional categories of source code could be decoded from brain activity and the decoding accuracies were significantly correlated with individual behavioral performances on source-code categorization. Our results suggest that programming expertise is built up on fine-tuned cortical representations specialized for the domain of programming.
  4. nfstream — a Python package providing fast, flexible, and expressive data structures designed to make working with online or offline network data both easy and intuitive.
Categories: Technology

AI meets operations

O'Reilly Radar - Sun, 2020/02/02 - 22:01

One of the biggest challenges operations groups will face over the coming year will be learning how to support AI- and ML-based applications. On one hand, ops groups are in a good position to do this; they’re already heavily invested in testing, monitoring, version control, reproducibility, and automation. On the other hand, they will have to learn a lot about how AI applications work and what’s needed to support them. There’s a lot more to AI Operations than Kubernetes and Docker. The operations community has the right language, and that’s a great start; I do not mean that in a dismissive sense. But on inspection, AI stretches the meanings of those terms in important but unfamiliar directions.

Three things need to be understood about AI.

First, the behavior of an AI application depends on a model, which is built from source code and training data. A model isn’t source code, and it isn’t data; it’s an artifact built from the two. Source code is less important than in typical applications; the training data is what determines how the model behaves, and the training process is all about tweaking parameters in the application so that it delivers correct results most of the time.

This means that, to have a history of how an application was developed, you have to look at more than the source code. You need a repository for models and for the training data. There are many tools for managing source code, from git back to the venerable SCCS, but we’re only starting to build tools for data versioning. And that’s essential: if you need to understand how your model behaves, and you don’t have the training data, you’re sunk. The same is true for the models themselves; if you don’t have the artifacts you produced, you won’t be able to make statements about how they performed. Given source code and the training data, you could reproduce a model, but it almost certainly wouldn’t be the same because of randomization in the training process.
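The lineage bookkeeping described above can be sketched in a few lines. A minimal illustration, assuming plain files on disk; the function names and the JSON layout are invented for this sketch, not any particular tool's format:

```python
import hashlib
import json
from pathlib import Path

def sha256_of_file(path: Path) -> str:
    """Content hash of a file, streamed so large datasets fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def record_lineage(code_rev: str, data_path: Path, model_path: Path,
                   out_path: Path) -> dict:
    """Tie a model artifact to the exact code revision and training data
    that produced it, so the build can be audited later."""
    lineage = {
        "code_revision": code_rev,
        "training_data_sha256": sha256_of_file(data_path),
        "model_sha256": sha256_of_file(model_path),
    }
    out_path.write_text(json.dumps(lineage, indent=2))
    return lineage
```

Note that, because of training randomization, the model hash is an identifier for one particular artifact, not something you can expect to regenerate from the code and data hashes.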

Second, the behavior of AI systems changes over time. Unlike a web application, they aren’t strictly dependent on the source. Models almost certainly react to incoming data; that’s their point. They may be retrained automatically. They almost certainly grow stale over time: users change the way they behave (often, the model is responsible for that change), and the model’s picture of them grows outdated.

This changes what we mean by “monitoring.” AI applications need to be monitored for staleness—whatever that might mean for your particular application. They also need to be monitored for fairness and bias, which can certainly creep in after deployment. And these results are inherently statistical. You need to collect a large number of data points to tell that a model has grown stale. It’s not like pinging a server to see if it’s down; it’s more like analyzing long-term trends in response time. We have the tools for that analysis; we just need to learn how to re-deploy them around issues like fairness.
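One way to make "monitor for staleness" concrete is to compare the distribution of recent model outputs against a baseline captured at deployment time. A minimal sketch in pure Python; the 0.2 threshold is an arbitrary placeholder that would need tuning per application:

```python
def ks_statistic(baseline, recent):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of two samples. 0 means identical distributions,
    1 means completely disjoint ones."""
    xs = sorted(set(baseline) | set(recent))
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    return max(abs(ecdf(baseline, x) - ecdf(recent, x)) for x in xs)

def model_looks_stale(baseline_scores, recent_scores, threshold=0.2):
    """A drift alarm is then just a threshold on that statistic."""
    return ks_statistic(baseline_scores, recent_scores) > threshold
```

This is exactly the "long-term trend" style of monitoring the article describes: no single prediction triggers the alarm; only a statistically meaningful shift across many data points does.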

We should also ask what “observability” means in a context where even “explainability” is always an issue. Is it important to observe what happens on each layer of a neural network? I don’t know, but that’s a question that certainly needs answering. Charity Majors’ emphasis on cardinality and inferring the internal states of a system from its outputs is certainly the right direction in which to be looking, but in AI systems, the number of internal states grows by many, many orders of magnitude.

Last, and maybe most important: AI applications are, above all, probabilistic. Given the same inputs, they don’t necessarily return the same results each time. This has important implications for testing. We can do unit testing, integration testing, and acceptance testing—but we have to acknowledge that AI is not a world in which testing whether 2 == 1+1 counts for much. And conversely, if you need software with that kind of accuracy (for example, a billing application), you shouldn’t be using AI. In the last two decades, a tremendous amount of work has been done on testing and building test suites. Now, it looks like that’s just a start. How do we test software whose behavior is fundamentally probabilistic? We will need to learn.
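A test for probabilistic software might look like the following sketch: instead of asserting an exact output, it asserts an accuracy bound over many trials. Here flaky_classifier is a made-up stand-in for a real model, and the 0.85 bound is illustrative:

```python
import random

def flaky_classifier(x):
    """Hypothetical stand-in for a model that is right ~90% of the time."""
    return x if random.random() < 0.9 else 1 - x

def test_accuracy_statistically(n_trials=10_000, required=0.85):
    """Assert an accuracy bound over many trials rather than exact
    outputs. required sits below the expected rate to leave slack
    for sampling noise."""
    random.seed(0)  # pin the RNG so the test itself is reproducible
    correct = sum(flaky_classifier(1) == 1 for _ in range(n_trials))
    assert correct / n_trials >= required, f"accuracy {correct / n_trials:.3f}"

test_accuracy_statistically()
```

The design choice is the interesting part: the test trades "2 == 1+1" certainty for a statistical guarantee, which means it can still flake, and the margin between the expected rate and the required bound is what controls how often.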

That’s the basics. There are other issues lurking. Collaboration between AI developers and operations teams will lead to growing pains on both sides, especially since many data scientists and AI researchers have had limited exposure to, or knowledge of, software engineering. The creation and management of data pipelines isn’t something that operations groups are responsible for—though, despite the proliferation of new titles like “data engineer” and “data ops,” I suspect that in the future these jobs will be subsumed into “operations.”

It’s going to be an interesting few years as operations assimilates AI. The operations community is asking the right questions; we’ll learn the right answers.

Upcoming events

O’Reilly conferences combine expert insights from industry leaders with hands-on guidance about today’s most important technology topics.

We hope you’ll join us at our upcoming events:

O’Reilly Software Architecture Conference, New York, February 23-26

O’Reilly Strata Data & AI Conference, San Jose, March 15-18

Smart Cities & Mobility Ecosystems Conference, Phoenix, April 15-16

Categories: Technology

Four short links: 3 February 2020

O'Reilly Radar - Sun, 2020/02/02 - 22:01
  1. Standing on the Shoulders of Giants (Ben Evans) — I like his framing of the problems (“tech companies being bad to other companies,” “tech companies being bad to us,” “bad guys using tech”).
  2. Algo Deck — an open-source collection of +200 algorithmic cards [for Anki]. It helps you prepare for and succeed in your algorithm and data structure interview. The code examples are in Java.
  3. Things I Believe — these two provoke a lot of thought: There are many fundamental discoveries in computer science that are yet to be found. Peak productivity for most software engineers happens closer to two hours of work a day than eight hours.
  4. Google Maps Hacks — 99 secondhand smartphones are transported in a handcart to generate a virtual traffic jam in Google Maps. Through this activity, it is possible to turn a green street red, which has an impact in the physical world by navigating cars on another route to avoid being stuck in traffic.
Categories: Technology

Topics for the February 13th meeting

PLUG - Sun, 2020/02/02 - 15:09
Jill Rouleau: Ansible Everything!

Description:
Ansible is an IT automation tool designed for simplicity and ease of use. This talk will cover what that means, how it works, and what you can do with Ansible.

About Jill:
Jill is a long-time member of the Free Software community, serving on the PLUG steering committee and as BoF organizer for the Southern California Linux Expo (SCaLE). They are a member of the Ansible core development team at Red Hat, focused on AWS and other cloud modules.

Four short links: 31 January 2020

O'Reilly Radar - Thu, 2020/01/30 - 22:01
  1. Thunderbird on the Move (ZDNet) — the news is not that interesting, except that it represents signs of life for Thunderbird.
  2. SourceGen Disassembly — a collection of disassembly projects, mostly Apple II games. The machine-language portions and embedded graphics have been converted to readable form.
  3. OpenSK — an open-source implementation for security keys written in Rust that supports both FIDO U2F and FIDO2 standards—from Google.
  4. Machine Learning Systems Design — 27 open-ended questions that test your ability to […] design systems to solve practical problems.
Categories: Technology

Four short links: 30 January 2020

O'Reilly Radar - Wed, 2020/01/29 - 22:01
  1. Towards a Human-like Open-Domain Chatbot — Google’s paper on making a chatbot that can have a vaguely plausible conversation on any subject. (via Google’s AI Blog)
  2. fast.ai’s Coding Style — interesting to see how different they are from historic coding standards, but they justify those differences (and aren’t dogmatic about the code they accept).
  3. All Your Mods Are Belong to Us — if you make a “custom game” (aka mod), then Blizzard owns the copyright. An example of attempting to capture all the value you create.
  4. Engineering a Repairable World — we need to view repair not only as an entryway to the field, but also as an essential or even ethical element of sustainable design and engineering.
Categories: Technology

Four short links: 29 January 2020

O'Reilly Radar - Tue, 2020/01/28 - 22:01
  1. Reverb — speculative debugging for web applications. […] Reverb has three features that enable a fundamentally more powerful debugging experience. First, Reverb tracks precise value provenance, allowing a developer to quickly identify the reads and writes to JavaScript state that affected a particular variable’s value. Second, Reverb enables speculative bug fix analysis. […] Third, Reverb supports wide-area debugging for applications whose server-side components use event-driven architectures. (via Morning Paper)
  2. Planned vs. Unplanned Work (John Allspaw) — Planned work comes from experience, and experience comes from unplanned work.
  3. Legible News — scraped from Wikipedia’s current affairs section, and presented in a wonderfully minimalist fashion.
  4. Ops Lessons We Learn the Hard Way — Your network team has a way into the network that your security team doesn’t know about. (via BoingBoing)
Categories: Technology

Four short links: 28 January 2020

O'Reilly Radar - Mon, 2020/01/27 - 22:01
  1. TinyML Book — machine learning for embedded systems, an O’Reilly book by Pete Warden and Daniel Situnayake.
  2. Useful Probability for Systems Programmers — interesting findings like: If you have a 1/N chance of success, then you’re more likely than not to have succeeded after N tries, but the probability is only about two thirds.
  3. Cost of 51% Attacks — This is a collection of coins and the theoretical cost of a 51% attack on each network.
  4. Brad Fitzpatrick Leaving Google — with a concise summary of his amazing track record. What next? TBA. But building something new.
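The probability claim in the second item is easy to check numerically: the chance of at least one success in N independent tries, each with probability 1/N, is 1 - (1 - 1/N)^N, which converges to 1 - 1/e, roughly 0.632, hence "about two thirds." A quick sketch:

```python
import math

def success_after_n_tries(n: int) -> float:
    """Probability of at least one success in n independent tries,
    each with chance 1/n: the complement of n straight failures."""
    return 1 - (1 - 1 / n) ** n

# As n grows, this converges to 1 - 1/e, the "about two thirds":
for n in (10, 100, 10_000):
    print(n, success_after_n_tries(n))
print("limit:", 1 - 1 / math.e)
```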
Categories: Technology

Four short links: 27 January 2020

O'Reilly Radar - Sun, 2020/01/26 - 22:01
  1. The Developer Coefficient (Stripe) — Access to developers is a bigger threat to success than access to capital. […] The average developer spends more than 17 hours a week dealing with maintenance issues, such as debugging and refactoring. In addition, they spend approximately four hours a week on “bad code,” which equates to nearly $85 billion worldwide in opportunity cost lost annually, according to Stripe’s calculations on average developer salary by country.
  2. Stanford Compilers Course — self-directed MOOC goes away on March 26, so get amongst it while you can.
  3. Terminal Phase — a space shooter game you can play in your terminal.
  4. Synthesizing Data-Structure Transformations from Input-Output Examples (Morning Paper) — I believe I’ve linked to the paper before, but I just noticed this interesting point: It is known from prior work that such [functional] languages offer natural advantages in program synthesis. Good to see Adrian (the Morning Paper guy) is interested in the same “future of coding” areas that I am. This promises to be an interesting series of papers he looks at.
Categories: Technology

Four short links: 24 January 2020

O'Reilly Radar - Fri, 2020/01/24 - 04:00
  1. China Open Sourcing the Wuhan Coronavirus Genomes (Twitter) — fast-tracking research.
  2. kube-scanOctarine k8s cluster risk assessment tool.
  3. Copyright is in Crisis (Cory Doctorow) — excellent excoriation of the state of the creative industries, where consolidation and regulation work against the creators and for the middlemen.
  4. Validating Startup Ideas — Our goal in publishing this is to help other founders think about how to do early validation the way that we do inside the studio.
Categories: Technology

Four short links: 23 January 2020

O'Reilly Radar - Thu, 2020/01/23 - 04:00
  1. The Business Case for Formal Methods — a short explanation, a list of benefits and case studies, and a demo. Everything’s in TLA+, but the arguments apply equally well to Alloy, B, statecharts, etc. (via Lobsters)
  2. Backend Lore — From late 2012 to the present I have been writing backends (server-side code) for web applications. This document summarizes many aspects of how I write these pieces of code.
  3. float-toy — play with the binary representation of IEEE floats.
  4. matterbridge — a [chat] bridge between mattermost, IRC, gitter, xmpp, slack, discord, telegram, rocket.chat, steam, twitch, ssh-chat, zulip, whatsapp, keybase, matrix, and more with REST API (mattermost not required!)
Categories: Technology

Four short links: 22 January 2020

O'Reilly Radar - Tue, 2020/01/21 - 22:01
  1. Elements of Scheduling — notable for several things, but my eye was caught by: finite convergence to completion fell beyond our reach. I know that state.
  2. Dungeons and Deadlines — a game of work/life balance.
  3. Microsoft Application Inspector — open source software characterization source code analyzer that helps you understand what a program does by identifying interesting features and characteristics using static analysis and a customizable json-based rules engine.
  4. Understanding Team Dynamics — We find that highly successful teams are significantly more focused than average teams of the same size, that their members have worked on more diverse sets of projects, and the members of highly successful teams are more likely to be core members or “leads” of other teams.
Categories: Technology

Four short links: 21 January 2020

O'Reilly Radar - Mon, 2020/01/20 - 22:01
  1. Cytoscape — an open source software platform for visualizing complex networks and integrating these with any type of attribute data.
  2. What’s Wrong with Computational Notebooks? Pain Points, Needs, and Design Opportunities — Our findings suggest that data scientists face numerous pain points throughout the entire workflow—from setting up notebooks to deploying to production—across many notebook environments. Our data scientists report essential notebook requirements, such as supporting data exploration and visualization. The results of our study inform and inspire the design of computational notebooks.
  3. Advent of Computing — podcast of computing history.
  4. Privacy-Preserving Record Linkage — a toolbox for deterministic, probabilistic, and privacy-preserving record linkage techniques.
Categories: Technology

Four short links: 20 January 2020

O'Reilly Radar - Sun, 2020/01/19 - 22:01
  1. AR Contact Lens — The path ahead is not a short one; contact lenses are considered medical devices and therefore need US Food and Drug Administration (FDA) approval. But the Mojo Lens has been designated as an FDA Breakthrough Device, which will speed things up a little. And clinical studies have begun.
  2. Bucklespring — This project emulates the sound of my old faithful IBM Model-M space saver bucklespring keyboard while typing on my notebook, mainly for the purpose of annoying the hell out of my coworkers.
  3. Orange Badge (Tim Bray) — At some point, it’s going to be a real problem being management in a sector that’s widely feared and distrusted. But we in the tech tribe haven’t really internalized much about this yet. This. Silicon Valley failed to die a hero, so has lived long enough to see itself become the villain.
  4. Great Expectations — Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality. Software developers have long known that automated testing is essential for managing complex codebases. Great Expectations brings the same discipline, confidence, and acceleration to data science and engineering teams.
Categories: Technology

Four short links: 17 January 2020

O'Reilly Radar - Thu, 2020/01/16 - 22:01
  1. cursedfs — Make a disk image formatted with both ext2 and FAT at once. Silliness!
  2. cats — Here, placed side-by-side for comparison, are GNU’s implementation of cat, Plan 9’s implementation, Busybox’s implementation, NetBSD’s implementation, Seventh Edition Unix (1979), Tenth Edition Unix (1989), and 4.3BSD. There’s a lot to learn from the differences!
  3. wav2letter++ — a fast, open source speech processing toolkit the speech team at Facebook AI Research built to facilitate research in end-to-end models for speech recognition. It is written entirely in C++ and uses the ArrayFire tensor library and the flashlight machine learning library for maximum efficiency.
  4. Work is Work (Coda Hale) — Neither your employee handbook nor your calendar are accurate depictions of how work in the organization is done. Unless your organization is staffed with zombies, members of the organization will constantly be subverting standard operating procedure in order to get actual work done. Even ants improvise. (via Ben Gracewood)
Categories: Technology

Topic for Jan 16th Security meeting

PLUG - Thu, 2020/01/16 - 09:16

Sebastian Tuchband: Webserver Practices Through Nextcloud

Description:
A showcase of Nextcloud and using it to show a few modern security practices on web servers.

About Sebastian:
Sysadmin and privacy advocate who finds and implements open-source, self-hosted software for control and ownership.

Four short links: 16 January 2020

O'Reilly Radar - Wed, 2020/01/15 - 22:01
  1. Zero Trust Architecture Principles — Ten principles to help you design and deploy a zero trust architecture. They are: know your architecture; create a single strong user identity; create a strong device identity; authenticate everywhere; know the health of your devices and services; focus your monitoring on devices and services; set policies according to the value of services or data; control access to your services and data; don’t trust the network, including the local network; choose services designed for zero trust.
  2. Ten Things Technology Platforms Can Do To Safeguard The 2020 US Election — (and everyone else’s elections, you bumptious yokels). They’re all good suggestions. Google, Twitter, and Facebook do not share common language or definitions for political ads—the primary social media companies should agree on a common, broad set of definitions for political ads and adopt them across platforms. Seems like “the limits of free speech online” is an issue without a widely agreed success condition, making it unsuited to the competing-and-changing nature of free enterprise, which thrives better in “sell more widgets / make more money” types of clear-cut goals. If there’ll never be a market-led solution, citizens should direct suggestions like this post to their government rather than to the companies themselves.
  3. Evidence-based Design Heuristics for Idea Generation — Observations go beyond products to consider multiple concepts generated for a given problem.
  4. Terrier — an image and container analysis tool that can be used to scan images and containers to identify and verify the presence of specific files according to their hashes.
Categories: Technology

Reinforcement learning for the real world

O'Reilly Radar - Wed, 2020/01/15 - 04:00

Roger Magoulas recently sat down with Edward Jezierski, reinforcement learning AI principal program manager at Microsoft, to talk about reinforcement learning (RL). They discuss why RL’s role in AI is so important, challenges of applying RL in a business environment, and how to approach ethical and responsible use questions.

Here are some highlights from their conversation:

Reinforcement learning is different from simply trying to detect something in an image or extract something from a data set, Jezierski explains—it’s about making decisions. “That entails a whole set of concepts that are about exploring the unknown,” he says. “You have the notion of exploring versus exploiting, which is do the tried and true versus trying something new. You bring in high-level concepts like the notion of curiosity—how much should you buy as you try new things? The notion of creativity—how crazy are the things you’re willing to try out? Reinforcement learning is a science that studies how these things come together in a learning system.” (00:18)
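The explore-versus-exploit tradeoff described here is classically illustrated with a multi-armed bandit. A minimal epsilon-greedy sketch in Python; the function names, the 0.1 epsilon, and the reward setup are invented for illustration and are not Microsoft's implementation:

```python
import random

def epsilon_greedy(pull, n_arms, steps=1000, epsilon=0.1, rng=None):
    """Minimal bandit loop: with probability epsilon explore a random
    arm, otherwise exploit the arm with the best observed mean reward."""
    rng = rng or random.Random(0)
    counts = [0] * n_arms      # pulls per arm
    totals = [0.0] * n_arms    # summed reward per arm
    for _ in range(steps):
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(n_arms)       # explore: try something new
        else:
            means = [t / c for t, c in zip(totals, counts)]
            arm = means.index(max(means))     # exploit: the tried and true
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
    return counts, totals
```

A higher epsilon means the agent tries "crazier" things more often; an epsilon near zero means it sticks to the tried and true, which is exactly the curiosity dial Jezierski is describing.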

The biggest challenge for businesses, Jezierski says, is correctly identifying and defining goals, and deciding how to measure success. For example, is it the click you’re after or something a bit deeper? This honest, clarifying conversation is key, he says. “This is why we’re focused first on the applied use of services because it can be very abstract otherwise. It’s like, ‘Oh, I’ve got to make decisions. I get rewards, and I’m going to explore—how do I look at my own business problem through that light?’ A lot of people get tripped up in that. So we’ll try to say, ‘Look, we’re going to draw a smaller box. We’re going to say we want to define personalization using RL as ‘choose the right thing’ for my menu in a context and tell us how well it went.’ That’s not the universe of possibility, but 90% of people can frame a part of their problem that way. If we can design a small box where people in it can have guaranteed results and we can tell you whether you fit in the box or not, that’s a great way to get people started with RL.” (3:24)

Ethics and responsible use are essential facets of reinforcement learning, Jezierski notes. Guidelines in this area aren’t necessarily addressing bad actors, but are aiming to help those unaware of the consequences of what they’re doing become more aware and to help those who are aware of the consequences and have good intentions to have more backing. Asking the right questions, Jezierski explains, is the difficult part. “In reinforcement learning, you get very specific questions about ethics and personalization—like, where is it reasonable to apply reinforcement learning? Where is it consequential to explore or exploit? Should insurance policies be personalized in a webpage using reinforcement learning, and what are the attributes that should drive that? Or is an algorithm trying to find out better ways that are not goaled toward the purpose of insurance, which is a long-term financial pool of risk and social safety net. Is it even ethical to apply to that sort of scenario?” It’s important, Jezierski says, to make these types of conversations non-taboo in team environments, to empower anyone on the team to hit the brakes to address a potential issue. “If you have an ethical or responsible use concern, you can stop the process and it’s up to everybody else to justify why it should restart. It’s not up to you to justify why you stopped it. We take it very seriously because in the real world, these decisions will have consequences.” (9:40)

Categories: Technology
