Philosophies and Papers

The diffusers Philosophy document says

code that can be read alongside the original paper

and

In contrast, diffusion pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementations.

There’s something in this that I think hints at some of the mismatched expectations I’ve had when dealing with the diffusers project.

To let you know where I am coming from:

In my twentysomething years of software library and application development experience, it was very rare for me to read a paper in a peer-reviewed journal. And I can’t think of a time when I was ever asked to cite one. I think I’ve read more papers in the last six months (since getting involved in diffusion-powered generative art) than I had in my entire career leading up to that point.

The professional community I’ve interacted with does have conferences with presentations and poster sessions, but they aren’t the kind of thing that habitually comes with a DOI and a BibTeX entry. Some people present on what they have implemented themselves; others are evangelists or educators who provide guidance on working with the implementations that are out there.

References are URLs to source code or self-published blog posts and slide decks.

I’m not young and brash enough to say my experience is the only valid one, but I do want you to know that there are a lot of software developers who will see that the first bullet point in your philosophy talks about an “original paper” and who have legitimately never read a journal article in their lives, or even had it occur to them that reading one might be something they should do.

As for “stay as close as possible to their original implementations” –

Okay, even knowing that academic papers exist and are sometimes relevant to software, and having some appreciation for a document that describes intent, prior art, and methodology, this one is a hard pill for me to swallow.

With rare exceptions, the only places with code that stays unchanged from its original implementation are derelicts: relics that nobody interacts with. Any software project that is actually being used, anything people pick up and make part of their daily work, is changed by that process.

The original implementation might be an interesting historical artifact, in the way that a museum of science and industry might show a steam engine or the first airplane, but it’s not something you would ever want to use.

And why would you? Even setting aside the reputation that code from academia has when it comes up against real-world usage, the original publication was likely the minimum required to demonstrate that something works. The people of the present – authors included! – have since had the opportunity to learn from it, explore its uses and behaviors, and learn from each other, and we are now much better informed than that artifact frozen in the past.

I’ve been working in academia my entire life, but my research group straddles the border between research software and software engineering. There is a huge difference between the software that graduate students produce and the software that engineers produce, and my group reflects this. The graduate students write quick-and-dirty scripts that serve as proofs of principle that the algorithm works, write papers, get their PhDs, and then are off. If I want the code to be a legacy, I hand it over to the engineer half of the team, who usually rewrite the thing from scratch, document it properly, and commit to maintaining it as a public resource. There is no expectation that one can read the research paper side by side with the polished code.

That being said, there are great benefits to an integrated team, and the knowledge and techniques that each side brings provide great cross-fertilization and strengthen the team as a whole, even though it can sometimes be a bit jarring to have a group meeting one week that focuses on continuous integration, and a meeting the next week where the topic is the regulation of apoptosis in medulloblastoma (don’t even ask).

Hey keturn!

Lots of good points :slight_smile: I think we try to abide by the points in our philosophy doc precisely so we can empower people coming from the kinds of backgrounds you describe.

code that can be read alongside the original paper

This is admittedly one of our more aspirational goals, and one we don’t always achieve. The main point here is that diffusion models are a very fast-moving, very technical area of research where sometimes the best or only docs are the papers themselves. We just want to strive to keep the code and the papers as aligned as possible.

We totally understand that papers might be a bit off the beaten path for many people. We see diffusers as a benefit here: it can provide more paths to working on and understanding diffusion models than just reading papers. On a personal note, this is how I got into working on diffusers in the first place: I wanted a reference implementation to look through while reading the latent diffusion paper :slight_smile:

In contrast, diffusion pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementations.

We see self-contained implementations that stay close to the original version as a means to help individuals build the newer and greater versions. We find this creates really stable and hackable software: when someone comes up with the next great diffusion model, it will likely derive from an existing implementation, and because that implementation is self-contained and has stayed stable, the new version can be put together super quickly. We’re taking a lot from our learnings with transformers here :slight_smile:
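To sketch what we mean here (the class name below is invented purely for illustration, not a real diffusers pipeline):

```python
from diffusers import StableDiffusionPipeline


class MyNextGreatPipeline(StableDiffusionPipeline):
    """Hypothetical sketch: an invented name, not a real diffusers class.

    Because StableDiffusionPipeline is self-contained and stable, a new
    variant can start as a thin subclass (or a copied file) and override
    only the steps it actually changes, e.g. prompt handling, the
    denoising loop, or decoding. Everything else is inherited unchanged.
    """
```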

Really love the feedback on the philosophy doc. If you have any ideas on how we might reword it to make it clearer, or other tweaks we could make, we’re super happy to discuss :slight_smile:

Thanks for starting this super interesting / important discussion!

Maybe just one more comment from my side, because I think we haven’t been super clear about this:

As for “stay as close as possible to their original implementations” –

I guess we mostly copied that from the transformers philosophy (see goal 2 here: Philosophy), and what is meant is more the following:

If the original codebase accompanying the paper presents a model called Stable Diffusion that does Y = F(X), but some later codebase states that it has improved upon Stable Diffusion so that it now does Y* = F(X), then we won’t adopt the improved version Y* = F(X); instead, we stick with the original version Y = F(X).

So essentially, when people use Stable Diffusion through diffusers, we want to give the reader/user certainty that the diffusers Stable Diffusion model behaves exactly as described in the paper.
An example of this is that both sd-v1-4 and sd-v1-5 still use the PNDM/PLMS scheduler as the default scheduler (see here: scheduler/scheduler_config.json · runwayml/stable-diffusion-v1-5 at main), because that is how they were released. Similarly, we implemented all the schedulers to match the original implementation/paper rather than adding semi-official improvements (see, e.g., this issue: sample_dpmpp_2m slightly different from paper · crowsonkb/k-diffusion · Discussion #48 · GitHub).
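As a minimal sketch (assuming a recent diffusers version that provides these classes), the pipeline loads with the scheduler from the original release, and switching to another one is an explicit user decision rather than a silent default change:

```python
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

# The pipeline comes up with whatever scheduler the original release
# specified in scheduler_config.json; for sd-v1-5 that is PNDM.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
print(type(pipe.scheduler).__name__)  # PNDMScheduler

# Opting in to a different scheduler is explicit, never a silent "improvement".
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
```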

Now, we do have exceptions, and we add new features that are not mentioned in the original paper (attention slicing, …), but we’re very careful to keep the original Y = F(X) behavior; see the sketch below.
Also note that this philosophy was much easier to commit to when working on transformers, because back then every paper released its weights; this has changed with diffusers, because many labs don’t release their code/weights anymore, so we either have to implement the model ourselves from the paper (dreambooth) or rely on community contributions.
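On the attention-slicing example, here is a minimal sketch (assuming a diffusers version that ships enable_attention_slicing) of how such a feature stays opt-in without touching the model’s behavior:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Compute attention in slices to reduce peak memory usage. The generated
# images are the same as without slicing (up to floating-point noise),
# so the original Y = F(X) behavior is preserved.
pipe.enable_attention_slicing()
```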
