Package weights

#1

One of the common questions we’re seeing is about package weights, and I thought it’d be worth starting a thread for ideas and feelings on this.

A reminder that the “weight” is one of the factors in our payment formula, which is summarized here: https://tidelift.com/docs/lifting/paying

The overall effect of the formula is that both small, popular packages and large, niche packages can earn money on Tidelift; large, popular packages of course earn the most money.

I’m feeling great about the “usage” part of the formula - this is the royalty-like aspect: if a project builds and maintains a package that subscribers actually use, then it gets paid in proportion to its success. But if I just upload some code to npm and nobody but me uses it, I get nothing.

The “weight” part is trickier. I thought I’d give some behind-the-scenes on what we’re doing right now and why, and leave this thread open a while for feedback and ideas.

Short story

There are some downsides to weighting by code size, but on balance we think it’s better than equal-weighting - a 2-line function and a huge framework simply shouldn’t get the same portion of fees because they aren’t the same amount of work or creating the same impact. We are mitigating the downsides in these ways:

  • The formula also incorporates usage; if you make a package that’s full of junky code, someone can make a forked version without the junk and take away your usage.
  • We adjust the code size measurement to remove things like generated code, vendored code, boilerplate, and tests.
  • We adjust the code size measurement to make different languages comparable.
  • We will actively shut down any identifiable attempts to game this, and since gaming it hurts other maintainers in your ecosystem, we expect that gaming attempts are likely to be reported. Also, we think most OSS maintainers are ethical.
  • We hope to incorporate some other measures into the weight over time.

Any way we split things up will be a little arbitrary and a little game-able.

We think “adjusted code size” has some virtues such as being fairly objective and having some relationship to maintenance effort. It’s better in our minds than equal-weighted for those reasons.

But better doesn’t mean perfect; very open to suggestions on how we should evolve the weight factor or what else should go into it.

Longer story

Rejected approach: equal weighting

The simplest approach to weight is equal-weighting (which is the same as “don’t have a weight factor, only consider usage”). The issues with this include:

  • a very strong incentive to split up a package into many small packages
  • the intuition that a huge framework and a 2-line function are not the same amount of maintenance effort or the same amount of value-to-subscribers

The issue with bad incentives isn’t only that people might game Tidelift on purpose, it’s also that in the wild people have already sometimes split things up and sometimes haven’t, for technical or practical reasons.
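To make the splitting incentive concrete, here is a minimal sketch. It assumes that a maintainer's share of the fees is proportional to package weight and that every package has the same usage; that is a simplification for illustration, not the published formula. The package names and sizes are made up.

```python
def shares(weights):
    """Return each package's fraction of the pool, proportional to its weight.

    Assumption: usage is identical for every package, so only weight matters.
    """
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def maintainer_share(weights, owned):
    """Sum the shares of the packages owned by one maintainer."""
    s = shares(weights)
    return sum(s[name] for name in owned)


# Hypothetical ecosystem: a framework (9000 bytes of code) and a helper (1000 bytes).
sizes = {"framework": 9000, "helper": 1000}

# The same helper, split into four packages of 250 bytes each.
split_sizes = {"framework": 9000, "helper-a": 250, "helper-b": 250,
               "helper-c": 250, "helper-d": 250}
split_owned = ["helper-a", "helper-b", "helper-c", "helper-d"]

# Equal weighting: every package counts as 1, whatever its size.
equal_before = maintainer_share({name: 1 for name in sizes}, ["helper"])
equal_after = maintainer_share({name: 1 for name in split_sizes}, split_owned)

# Size weighting: every package counts by its code size.
size_before = maintainer_share(sizes, ["helper"])
size_after = maintainer_share(split_sizes, split_owned)

print(round(equal_before, 2), round(equal_after, 2))  # 0.5 0.8
print(round(size_before, 2), round(size_after, 2))    # 0.1 0.1
```

Under equal weighting the helper's maintainer goes from half of the pool to four fifths just by splitting; under size weighting the split changes nothing.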

Rejected approach: size of the entire package

The next-simplest approach we came up with was “the size of the package” (like “make an http HEAD request to the package’s download URL and get the Content-Length”). The issues with this include:

  • some packages include N copies of their code, like a regular version, minified version, minified-a-different-way version, etc.
  • some packages include various data files, test files, vendored code, autogenerated code, etc.
  • sensitive to level of gzip or zip applied to the package
  • includes package manager metadata which means a slight gain from splitting up packages still

I actually tried this since it seemed like the simplest thing that could possibly work, and it did not work. The results were not good. For example, I was surprised how many packages ship their unit tests, including massive fixtures sometimes, right in the released package!
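For reference, the measurement described above is only a few lines. This is a minimal Python sketch using the standard library, not necessarily the code that was actually run; the function names are hypothetical. Note that it reports the size of the compressed archive as served, which is exactly why it is sensitive to compression level and to bundled non-code files.

```python
import urllib.request


def parse_content_length(headers):
    """Return the Content-Length header as an int, or None if missing or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def package_download_size(url, timeout=10):
    """Issue an HTTP HEAD request and return the reported size in bytes.

    Returns None when the server does not send a usable Content-Length.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_content_length(response.headers)
```

Calling `package_download_size(...)` with a package's download URL would give the raw number that this rejected approach used as the weight.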

Current approach: adjusted code size

Getting a little more complex is an “adjusted code size.” What this means is that we unpack the package (removing compression), filter out files that aren’t code, filter out various kinds of code that shouldn’t count (like vendored dependencies), and then add up the sizes of the remaining code files.

This is what we’re doing now and the results feel pretty good; the packages at the top of the weightings are substantive packages with a lot of maintenance work going into them. Splitting up packages into smaller packages ought to have no effect, since the total weight would remain the same.
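The unpack, filter, and sum steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the production filter: the extension list, the excluded directory names, the `.min.js` rule, and all function names are hypothetical, and it only handles tar archives.

```python
import io
import tarfile
from pathlib import PurePosixPath

# Assumed filter rules, for illustration only.
CODE_EXTENSIONS = {".js", ".py", ".rb", ".java"}
EXCLUDED_DIRS = {"test", "tests", "__tests__", "fixtures", "vendor", "node_modules"}


def counts_as_code(path):
    """Decide whether a file inside the package should count toward the weight."""
    p = PurePosixPath(path)
    if p.suffix not in CODE_EXTENSIONS:
        return False  # data files, docs, package metadata
    if p.name.endswith(".min.js"):
        return False  # minified duplicate of code that is already counted
    # Skip tests, fixtures, and vendored dependencies by directory name.
    return not any(part in EXCLUDED_DIRS for part in p.parts[:-1])


def adjusted_code_size(archive_bytes):
    """Sum the uncompressed sizes of the files that pass the filter."""
    total = 0
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as archive:
        for member in archive.getmembers():
            if member.isfile() and counts_as_code(member.name):
                total += member.size
    return total


def make_tarball(files):
    """Build an in-memory .tar.gz from {path: bytes}; used only for the demo."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


demo = make_tarball({
    "package/index.js": b"x" * 100,            # counted
    "package/index.min.js": b"x" * 40,         # minified copy, skipped
    "package/test/index.test.js": b"x" * 500,  # tests, skipped
    "package/vendor/lib.js": b"x" * 300,       # vendored code, skipped
    "package/README.md": b"x" * 50,            # not code, skipped
})
print(adjusted_code_size(demo))  # 100
```

Because member sizes are read uncompressed and metadata files are filtered out, splitting `index.js` across two packages would produce two weights that add up to the same total.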

We also do some normalization by ecosystem to make npm and Java more comparable (since many subscribers are using multiple ecosystems).

Future ideas

Conceptually, the weight indicates how much relative value each package provides to a single subscriber. Usage then considers how many subscribers are receiving that value.
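As a rough sketch of how the two factors combine, assume a package's score is simply weight × usage and the pool is split in proportion to the scores. The real formula is summarized in the linked docs and may differ; the function name, package names, and numbers below are invented for illustration.

```python
def payout_shares(packages):
    """Split a pool in proportion to weight * usage.

    `packages` maps a package name to a (weight, usage) pair.
    Returns a mapping from package name to its fraction of the pool.
    """
    scores = {name: weight * usage for name, (weight, usage) in packages.items()}
    total = sum(scores.values())
    if total == 0:
        return {name: 0.0 for name in packages}
    return {name: score / total for name, score in scores.items()}


# Hypothetical packages as (weight, number of subscribers using the package).
example = {
    "small-popular": (1, 100),
    "large-niche": (50, 2),
    "large-popular": (50, 100),
    "unused": (10, 0),
}

for name, share in payout_shares(example).items():
    print(f"{name}: {share:.1%}")
```

With these made-up numbers the small popular package and the large niche package earn the same share, the large popular package earns the most, and the package nobody uses earns nothing.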

We could incorporate some signal from subscribers of “how much I care about this package” into the weight number. I don’t think it’s a good idea to let subscribers completely pick-and-choose which packages get what, because if they get no value from something, why are they using it? We want to lift all boats. Also, no subscriber wants to micromanage weights on 3000 packages.

But perhaps there are ways for them to say “I really really care about package xyz” and factor that in, and we’ve heard the desire to do so from them.

I tend to think we should avoid anything in the weight that’s redundant with usage. For example, download counts or GitHub stars or other popularity measures. I might expect these to correlate with usage, so pulling them in might double-count the same factor.

If people do start to game things, it could work out a lot like Google’s search algorithm or spam filtering algorithms, where we keep having to adapt. However, the absolute numbers of “packages people actually use” are a lot lower than the number of web pages or spam emails on the Internet, so manual-intervention solutions are more practical. We also have a business relationship and contract with all lifters, which helps.

By the way: if we do change the weighting algorithm in the future, mitigating the impact on lifters will be an important consideration. There are several ways to do that so we don’t pull the rug out from under anyone.

Feelings and ideas welcome

We can definitely make changes and evolve things from here.

#2

I completely understand the trickiness of figuring this out, and it sounds like you have put good thought and experiments into it. You mentioned “how much I care about this package” as a factor. Another way to look at it is that some packages “punch above their weight.” That is, they provide outsized value compared to their bulk. Perhaps if the subscribers had some way to indicate that?

I have no idea how engaged the subscribers are, if they care about this formula, or if they even have a sense of the size of packages? You’d probably have to show them a list, with the “value assumed based on size”, so that they could gauge how well it matches their perception. But as I type that out, I can guess that no one will put that much effort into it :slightly_smiling_face:

#3

The simplest idea I’ve come up with is to ask subscribers: “please tell us which dependencies are most critical to you” and allow subscribers to put a check by some of their deps. I think we’d want to user test this some and see if it makes sense to anyone.

I’d love to have some more automated way to get at this without someone having to do work - but it’s tough to think of what it’d be.

#4

I would have chosen a type-weighting approach.

You would have different tiers:

  0. programming language / compiler
  1. framework
  2. library
  3. polyfill
  4. helper function

I mainly provide 2, 3 and 4 and would completely understand if the maintainers in the top tiers (0 and 1) get more.

#5

Interesting! That’s a new idea. Can think of some challenging aspects of it. But there’s intuitive appeal that the things at the higher tiers feel more “central” so this could be a useful factor.

#6

I’m not sure tiers will add much to the scheme you already have. A programming language should only get more funding if it is widely used, and you’ve already got the breadth of use as a factor. Should a little-used esoteric programming language get funding just because it’s a programming language?

I think the number of users and the size of the project will already pay programming languages well without adding an extra factor based on the kind of software it is.

#7

I am talking about the ratio effort/remuneration here: i.e. whether the programming language is popular doesn’t matter to me in that regard. There are frameworks based on a language making more money than the language creator himself, how’s that fair?

#8

If the value is calculated as “size of code” * “number of users”, then a language should make more than any framework it’s based on, no?

Or perhaps the thing to acknowledge here is that FrameworkX uses LanguageX, so some of the calculated value for FrameworkX should be re-directed to LanguageX?
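A minimal sketch of what such a redirection could look like, purely as an illustration of the idea and not something Tidelift does: each package passes a fixed fraction of its calculated value to the things it depends on, split evenly. The function name, the fraction, and the numbers are all hypothetical.

```python
def redirect_to_dependencies(values, depends_on, fraction):
    """Move `fraction` of each package's value to its direct dependencies.

    Single pass over direct dependencies only; the moved amount is split
    evenly among them, so the total value is preserved.
    """
    result = dict(values)
    for package, deps in depends_on.items():
        if not deps:
            continue
        moved = values[package] * fraction
        result[package] -= moved
        for dep in deps:
            result[dep] += moved / len(deps)
    return result


values = {"FrameworkX": 100.0, "LanguageX": 40.0}
depends_on = {"FrameworkX": ["LanguageX"]}

adjusted = redirect_to_dependencies(values, depends_on, fraction=0.25)
print(adjusted)  # {'FrameworkX': 75.0, 'LanguageX': 65.0}
```

Choosing the fraction is the hard part; any fixed number would be as arbitrary as the weight itself.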

#9

So untested code has the same “weight” as code with tests (that have to be written and maintained)?

#10

@mbrookes It seems like you’re drawing a conclusion from something someone here has said, but I’m not sure from what.

#11

Remember that usage is also a factor (and hopefully, over time, a referendum on quality and utility). My hope/belief is that many activities such as documentation, testing, patch review, etc. are reflected in increased usage.

A practical reason to exclude tests is that we’re weighing the shipped package, not the source repo. (Many repos publish multiple packages.)

Most packages don’t ship the tests; a surprising number I’ve looked at do, but it kinda seems like they do it accidentally (the consumer of the package afaik doesn’t import or use the tests).

So that’s why I’ve currently adjusted the tests out of those packages that seem to accidentally ship them, so weights are comparable between packages that do and don’t bundle their tests.

#12

“so some of the calculated value for FrameworkX should be re-directed to LanguageX?”

It should if it’s accepting donations, yes.

#13

Package weight is a subjective measure that depends on the value of the package in a specific project. In my opinion, in cases where people can see that value, this personal measure should take priority. Every person on a project may have a package that he or she likes and wants to support, and people need to be given this ability. This makes Tidelift a more social story.

This almost fits the “Future ideas” section if “micromanagement” is replaced with “let people play”. These could be your personal preferences, but if a company hires you, they import your settings and may or may not distribute their weights accordingly. This makes you a valuable artifact in your community regardless of whether you can participate or are busy with a new job.

Speaking of gameplay: in agile development people already play with story points, so let them play with infrastructure points to estimate the impact.

The subjective feeling of a project’s well-being is important as well. If one project has sufficient funds, then another might find them more important, but that’s a source of speculation, so unless people know each other personally, it may not work out well.

#14

Something I’d emphasize as background that relates to some of this discussion, is that we are paying lifters to do work and to be on the hook for certain things; Tidelift is not a donation system, it’s a system for maintainers to collaboratively provide valuable benefits to subscribers. (See also Product roadmap snapshot as of January 2019 and https://tidelift.com/docs/lifting/tasks-overview )

The way we frame it currently is that subscribers are covered for all packages they report that they use (see https://tidelift.com/subscription/support), so if they’re reporting it, they are getting the subscription benefits.

A related point is that subscription benefits and paying lifters are linked. So for example we don’t have a way to sign up to lift C packages right now, but we also don’t have a way for subscribers to get subscription benefits on C packages. In the current model, we’d want to add both of those at once.