Real-world examples of telemetry being useful

Dear community members,

After the Audacity debacle and the question of opt-in telemetry in free software, I’d like to ask the community whether there are any real-world examples of software telemetry that was useful for the development of a product and gave the developers information they couldn’t have gathered any other way.

I know that many people, including hosts of podcasts from the DLN network (as well as hosts of other shows), say that “well, telemetry can indeed be useful”, but I have never heard even an anecdotal example where telemetry actually gave the developers information they couldn’t have gathered any other way. “I can imagine that it is useful”, and it seems “common sense” that collecting massive data on software use would be useful, but as many cases in the real world show, including in the sciences, economics, maths, etc., “common sense” doesn’t always make a good marker of whether something is true or not.

Another point that comes up often is that “due to the lack of telemetry we don’t know exactly how many users Linux / Audacity / etc. has”. What is the real-world importance of this information? If you are developing free software, what difference does it make how many people use it? Even if you develop it as part of a business, that’s not what’s important for your business: how many people are willing to pay for the service or donate is the important point.

Thank you for reading my rant, I’d be curious about your answers.

Knowing how many people use your software can roughly tell you the expected load on your servers when everyone is rushing to download the latest version in the same day or week of its release.

For FOSS projects this would be good information for deciding how to split the spending of their donations. Should they spend more on a higher-end DigitalOcean droplet, or host it themselves on a Raspberry Pi? It also helps you understand all the other data you collect. If 100 people post on the forum about an issue, is that 100% of your users or 1%? Is this issue worth fixing today, or can a small fraction of users get by with a workaround for a week?

For Audacity, knowing which features are being used or which plugins are loaded when a project opens helps the team know where to focus developer time. This could also decide a tool’s prominence or location in the GUI or menus.

Ultimately, telemetry is a more reliable way of collecting this data than asking users to fill out a survey every year. What percentage of users would actually take the time to complete an Audacity survey? It’s complicated software, so the survey would be lengthy. My guess is fewer people would do this than opt in to telemetry. Telemetry also gives you real-time feedback: you can see the implications of changes the day after they are implemented. A survey, on the other hand, tells you that everyone hated your decision and has been dealing with it for a year, or that half the users stopped using the software entirely, and you’ll never know which change in the last 12 months made them leave, because they’re not going to fill out a survey for software they don’t use anymore.

Who better than Stephen Fry to describe the empiricism-vs-rationalism problem? I caught a recent interview where he described his preference for empiricism as…

“testing an idea against what actually happens and how people actually behave rather than devising a system of reason. It’s not that rationalism and empiricism are always absolutely opposed but they sometimes are.”

The tricky part is that few people can answer this without using rationalism, as most people don’t manage telemetry-based decision making.

From an empirical side, I could say I adapt my jokes socially because of the “telemetry” I receive through people laughing. The jokes no one laughs at I discontinue or improve, but if I didn’t know whether anyone laughed, I’d be in a difficult position. That isn’t to say I need telemetry for every joke, or that I couldn’t improve without it, but feedback goes a long way.

I can give you a couple of very real-world, concrete examples of where telemetry was useful.

The Ubuntu desktop installer data uncovered information the project never knew: the popularity of Ubuntu on virtual machines vs real machines, and the popularity of one GPU vs another. That very specific information allowed the QA team to focus attention on specific hardware. Without that data, they were wasting time testing hardware/software combinations that weren’t popular. Time and people are valuable resources; there’s no point wasting them.

Most proprietary software has telemetry. Slack, Spotify, Skype etc, all know which distro people are using, and other data too. They use this to know where to focus development (or not). If they see that certain distros are not popular, then they focus attention elsewhere. When I worked at Canonical, some proprietary software vendors would share their metrics with us (under NDA). It was very enlightening, and useful data to enable us to focus too.

There was a bug affecting many thousands of users, which resulted in them not receiving important updates. We identified this only through the error reporting metrics Ubuntu gathers. This was instrumental in us getting fixes out to users. I wrote the full story in a blog post, which you’ll find here: A Tale of Two Updates - Alan Pope's blog. It’s a bit long, but gives a very real example of what you’re asking for.


Thank you for your insightful answers!
I have thought about some of the points, and I’ll try to address those I have some contentions with clearly and respectfully. If anyone is in the mood to argue about the topic, I’m totally up for it.
I might be a bit argumentative, but in the end, I’m doing this because I’m really curious about this issue, and would like to have a clearer picture on this topic. Please correct me whenever you feel like my comments are incorrect.

During my university years studying statistics, the most important thing I learned is that more data does not always mean better data. In theory, I can imagine that continuously surveying users’ behavior might give developers valuable information, but the more types of data you collect in parallel, the more likely it is that random correlations will appear.

Doing statistical inference like this is something that looks good when we first think about it, but I wonder whether the people who try to use mass data collection for this purpose have the proper understanding of statistics to separate real signal from noise, and to find real causal connections. For instance, doing A/B testing is kind of the minimum needed to make the statistics meaningful.
I don’t know whether FOSS developers put in the effort to plan out these tests to make sure their conclusions are correct, or whether they just look at the big pile of data they get and try to make sense of it.
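To make that concrete, here is a minimal, made-up sketch (the metrics, groups, and counts are invented, not from any real project): scanning many parallel metrics for patterns surfaces “significant” correlations by pure chance, whereas a pre-planned A/B comparison asks one question decided in advance.

```python
# Minimal, made-up sketch: many parallel metrics vs a pre-planned A/B test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_users, n_metrics = 1000, 50

# 50 usage metrics that are pure noise: no real relationship between any pair.
metrics = rng.normal(size=(n_users, n_metrics))

# Scan every pair for a "significant" correlation at the usual p < 0.05 level.
false_hits = 0
for i in range(n_metrics):
    for j in range(i + 1, n_metrics):
        _, p = stats.pearsonr(metrics[:, i], metrics[:, j])
        if p < 0.05:
            false_hits += 1
print(f"'significant' correlations found in pure noise: {false_hits} of 1225 pairs")
# Roughly 5% (about 60) will look significant even though nothing is related.

# A pre-planned A/B test, by contrast, asks one question decided in advance:
# does showing a new export dialog change how many users finish an export?
exports_a, n_a = 180, 1000   # control group (hypothetical counts)
exports_b, n_b = 215, 1000   # group shown the new dialog (hypothetical counts)
_, p_value = stats.fisher_exact([[exports_a, n_a - exports_a],
                                 [exports_b, n_b - exports_b]])
print(f"A/B test p-value: {p_value:.3f}")
```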

This seems like a valid point, but I guess this is something you only have to do once per installation, as an automated survey of the hardware. I get the point of things like this; I’m more concerned about the effectiveness of data collection that is continuous throughout the use of the software.

I remember hearing about this story, very interesting point, especially how it ended up giving insight into something that wasn’t really planned for. I personally wouldn’t want my OS to send automatic crash information to the distro maintainer, but I do see the value in easy, direct crash reporting.

I get that it’s a joke, but it really misses the mark for the discussion I would like to have. I wholeheartedly agree that feedback is super important, there is no denying that. But not every method of gathering feedback is telemetry, and not every method of gathering feedback gives you the feedback you actually need. Staying with your analogy: if you want your jokes to get better, you don’t try to get feedback from your employees, whose paychecks depend on you, for example.

Thank you for reading my replies this far. Based on the current discussion, there is one thing I’m still looking for an answer for:
Can anyone give me a good example (or counterexample) where continuous and repeated surveying of the behavior of a massive chunk of the users resulted in valuable information for developing a piece of software?
Did it work better than having a few select people try the software out in a controlled environment, with maybe a pre- and a post-trial interview?

I think you’re missing the point. Reliability can also mean having consistent and accurate data: not necessarily a large volume of data, but a representative set. For example, the desktop installer on Ubuntu has an option to opt out of sending the install data. In that case a ping is still sent (contentiously) with no data. This allows the team gathering the metrics to gauge how many of the people who were presented with the option decided to send data, and how many didn’t. Having a gauge on what proportion of the userbase sent that data, and what proportion didn’t, allows you to be more confident that the data you got is reliable.
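As a rough illustration of why that empty ping matters (the figures below are hypothetical, not Ubuntu’s actual numbers):

```python
# The empty pings give you a denominator, so you know what fraction of
# installs your data actually represents. Numbers below are made up.
pings_total = 120_000      # every install sends a ping (hypothetical count)
pings_with_data = 78_000   # only opted-in installs attach data (hypothetical)

opt_in_rate = pings_with_data / pings_total
print(f"opt-in rate: {opt_in_rate:.1%} of {pings_total:,} installs")

# 65% opting in is a very different situation from 5% opting in, even if the
# absolute number of data points happens to be the same.
```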

Some of us had people with data science backgrounds work on the data. Not every project looks at a pile of data and makes uneducated guesses.

Absence of evidence isn’t evidence of absence. :slight_smile: Just because you don’t have the examples, doesn’t mean your assertion that it’s bad is correct.

I can give you another example. Some of the software I’ve worked on continuously reports back to home base about the version of the software running. Imagine you have a small team of developers who are working across multiple versions of your software. How long should you continue devoting developer time to maintaining the older version? What if you don’t know how many users the old release and the new release have? How do you know whether to focus attention on the 1.x series or the new 2.x series? Perhaps people are cautious about upgrading. Maybe people like the features in 1.x and don’t like what’s in 2.x. If you have actual data that shows how many people are on 1.x and how many are on 2.x, and over what period those users migrated, you can predict with some certainty when the number of 1.x users will shrink to a point where you don’t need to worry. This happens regularly. Knowing how many people are on what release is crucial to allocating resources. It measurably makes the 2.x version of your product better, because you have more resources to work on it and you’re not wasting effort on a 1.x version that “nobody” is using.
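A small, hypothetical sketch of the kind of bookkeeping this implies (the version strings and counts are invented, not from any real product):

```python
# Each reporting install sends only its version string; the weekly share of
# 1.x installs tells you when it is safe to stop spending time on that series.
from collections import Counter

# Hypothetical weekly snapshots of version pings.
weekly_pings = [
    ["1.8", "1.8", "1.8", "2.0", "2.0", "1.8", "2.0", "1.8"],   # week 1
    ["1.8", "2.0", "2.0", "2.0", "2.1", "2.1", "1.8", "2.1"],   # week 2
    ["2.0", "2.1", "2.1", "2.1", "2.1", "1.8", "2.1", "2.1"],   # week 3
]

for week, pings in enumerate(weekly_pings, start=1):
    counts = Counter(pings)
    total = sum(counts.values())
    legacy = sum(n for version, n in counts.items() if version.startswith("1."))
    print(f"week {week}: {legacy / total:.0%} of reporting installs still on 1.x")

# Once the 1.x share falls below whatever threshold the team picks (say 5%),
# it becomes defensible to stop backporting fixes to that series.
```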


Thank you for taking the time answering my questions and addressing my concerns!

I agree with you wholeheartedly. I think many laypeople, however, don’t understand what it takes to get consistent and accurate data that carries real information.

Yeah, I really hope so, but wouldn’t the ideal case be that only projects that are actually equipped to meaningfully analyze the data would undertake these data collection endeavors? Especially since data collection and analysis done the wrong way is a waste of human resources.

True. But also, just because every Linux / FOSS podcast and their mother goes out and tells us “I am not against telemetry in general, because it helps developers”, it doesn’t remove the burden of proof from those who make the statement. :slight_smile:

I like your example about following the software versions; it makes sense, if you are maintaining multiple versions of the same software, to use this to help focus the efforts. But addressing this:

I find it hard to believe that, apart from teams that have someone well versed in statistics, this information can be reliably extracted from telemetry alone.

To summarize, my main point is that when I click “allow telemetry”, I would like to know that the data I volunteer to give to the developers will actually serve a good purpose in the development. And I am at the point where people repeating “it is useful” is just not enough for me.

Thank you for the great input, I really enjoyed reading your reply.

I’m not saying the “why” can necessarily be extracted from the telemetry, and that’s not the point. The point is that some people may be holding back, for any one of a number of reasons, and this was just one example of those reasons. Further data gathering could be done (a survey, or something else) to understand why, but the first part is to understand that they’re doing it at all. Most software developers won’t even know.

Have you seen this Linux Mint blog post? Update your computer! – The Linux Mint Blog
Scroll down to the part about statistics. It’s interesting to see just how little they know about their users because they do not collect telemetry. The stats are all over the place and pretty unreliable. They could, however, conclude that more than NONE of their users were using a very old release with now known vulnerabilities. Since then they have revamped the way that Linux Mint notifies users about updates to make it more obvious and pushed an emergency update to their old release repo.

I thought it was interesting and fits into the discussion here.

Mozilla recently published a blog post in which they highlight the usefulness of their data gathering.


@nicemicro having thought about this a bit more… while examples can prove that telemetry can be useful, they can’t prove telemetry is more useful than detrimental on average unless you’re privy to the opportunity costs of every telemetry-based decision a company has ever made.

Selecting only success stories just proves a success can exist; it doesn’t show the odds of success or degrees of benefit on a continuum, and given enough background noise, all things are true at least once.

Just for fun here… this reminds me of one of the most controversial MythBusters findings that Adam Savage described recently. I was certain what the data meant until halfway through. :slight_smile:

I’m guessing there’s some optimal mix of rationalism and empiricism, guessing what the data means and then experimenting, that gets the best results over time. How to measure that won’t be easy and probably needs some guessing and experimenting.