Draft:International Standard Content Code

International Standard Content Code (ISCC)
Abbreviation: ISCC
Status: Published
Year started: 2016
First published: 15 May 2024
Latest version: ISO 24138:2024
Organization: ISO/TC 46/SC 9
Domain: Digital media content identification
Website: iscc.io

The International Standard Content Code (ISCC) is a similarity-preserving identifier for digital media assets such as text, images, audio, and video, standardized as ISO 24138:2024.[1] An ISCC is computed from the file itself, so any party holding the same file can derive the same code. In this it differs from identifiers such as the ISBN or DOI, which are assigned by a registration authority. The fingerprint it produces also remains largely stable after a work is edited or re-encoded.[1][2]

An ISCC combines several code units, each derived from one aspect of a file: its metadata, its perceptual features, its raw byte sequence, or a cryptographic hash of its contents. Because similar files yield similar codes, an ISCC can be used to detect near-duplicates and match content across formats and encodings.[1][2]

The standard was published in 2024 after development within ISO/TC 46/SC 9. The ISCC was added to the ONIX book-trade metadata standard in 2020 and to the C2PA list of soft-binding algorithms in 2024, and it has been used in rights-management and AI-governance settings. Its open-source reference implementation is maintained by the ISCC Foundation.[1][3]

History

Development

The ISCC was created by Titusz Pan (ORCID 0000-0002-0521-4214), an open-source developer working on content identification.[4] According to the ISCC Foundation, Pan was also the principal editor of ISO 24138:2024.[5] Pan developed the first ideas for the ISCC in early 2016; later that year the work was taken up by the Content Blockchain Project, a consortium that studied blockchain technology for journalism and digital media and received funding from the Google Digital News Initiative.[5] The project published an early specification and a prototype, and released an open-source ISCC 1.0 specification and reference code in 2018.[5][6] In 2019 the project received one of Germany's inaugural Digital Publishing Awards at the Leipzig Book Fair.[7]

Standardization

In 2019 the International Organization for Standardization took up the ISCC as a work item within Technical Committee 46, Subcommittee 9 (TC 46/SC 9, Identification and description). A dedicated working group, WG 18, was established to develop it and held its first meeting on 29 October 2019.[5] The committee reviewed the draft through the usual ISO ballot stages, including a draft international standard (DIS) review in 2023.[8] ISO 24138:2024 was published on 15 May 2024.[1] It defines the syntax, structure, and algorithms for generating ISCC codes and describes their use alongside existing identifier schemes such as DOI, ISAN, ISBN, ISRC, ISSN, and ISWC.[1] The standard includes a reference implementation, published as a freely available electronic insert under its normative Annex D, "Reference implementation".[1][9] In 2025 the standard was adopted nationally in Australia and New Zealand as AS/NZS ISO 24138:2025.[10]

Industry adoption

In July 2020 the ISCC was added to ONIX, the international book-trade metadata standard maintained by EDItEUR, allowing an ISCC to be carried alongside the ISBN in book-trade metadata.[11]

In May 2024 the Coalition for Content Provenance and Authenticity (C2PA) added the ISCC to its list of approved soft binding algorithms, registered as io.iscc.v0 with an entry date of 17 May 2024. A soft binding algorithm re-associates a content-provenance manifest with a file after the file's embedded metadata has been removed; most other entries on the list are digital watermarking schemes.[12][13]

In 2025 a joint technical report by the IEC, ISO, and ITU, prepared through the AI and Multimedia Authenticity Standards (AMAS) collaboration, reviewed standards for AI-generated and altered media and listed the ISCC among asset-identifier standards.[14]

Structure

An ISCC-CODE is a composite of several ISCC-UNITs, each computed from a different aspect of the content.[1][15]

Units

The standard defines the following unit types:[1][3]

  • Meta-Code: a similarity hash (SimHash) of basic metadata, typically the name and an optional description, used to cluster assets with similar descriptive information.[16]
  • Semantic-Code: reserved as a code type in ISO 24138, with its algorithm not yet specified in the standard. Experimental implementations use deep learning embeddings of the semantic content of text and images.
  • Content-Code: a modality-specific perceptual fingerprint, with variants for text, image, audio, and video content.[17][18][19][20]
  • Data-Code: a similarity-preserving hash of the raw file bytes, using content-defined chunking and MinHash.[21]
  • Instance-Code: a BLAKE3 cryptographic hash of the file, used for exact integrity verification.[22]
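
The SimHash idea behind the Meta-Code can be illustrated with a short sketch. This is not the algorithm specified in ISO 24138: the character n-gram width, the use of BLAKE2b from Python's standard library as the feature hash, and the function name simhash64 are assumptions chosen only for illustration.

```python
import hashlib


def simhash64(text: str, ngram: int = 3) -> int:
    """Return a 64-bit SimHash of `text` (illustrative, not the ISO 24138 algorithm)."""
    # Normalise case and whitespace, then slide a window of `ngram` characters.
    normalized = " ".join(text.lower().split())
    count = max(1, len(normalized) - ngram + 1)
    features = [normalized[i:i + ngram] for i in range(count)]

    # Every feature votes +1 or -1 on each of the 64 bit positions.
    votes = [0] * 64
    for feature in features:
        digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
        value = int.from_bytes(digest, "big")
        for bit in range(64):
            votes[bit] += 1 if (value >> bit) & 1 else -1

    # A result bit is set when more features voted for it than against it.
    result = 0
    for bit, vote in enumerate(votes):
        if vote > 0:
            result |= 1 << bit
    return result


# Hypothetical metadata strings that differ only slightly.
a = simhash64("The International Standard Content Code")
b = simhash64("The International Standard Content Code (ISCC)")
print(f"{a:016x}")
print(f"{b:016x}")
print("differing bits:", bin(a ^ b).count("1"))
```

Because most n-grams are shared between the two strings, most bit positions receive nearly the same votes, which is why similar inputs tend to differ in only a few bits.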

A unit can stand alone or be combined into an ISCC-CODE. Unit bodies are sized in 32-bit steps, from 32 up to 256 bits, with a default of 64 bits.[23] When units are combined into an ISCC-CODE, each body is truncated to 64 bits before they are concatenated. A minimum ISCC-CODE contains the Data-Code and the Instance-Code; the other units are optional and precede them in canonical order.[1][15]
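
The composition rule can be sketched as follows. The unit names, the function compose_bodies, and the ordering tuple are hypothetical; the order is assumed to follow the unit list above, and the header that a real ISCC-CODE carries is deliberately left out.

```python
# Assumed canonical order, taken from the order of the unit list above.
CANONICAL_ORDER = ("meta", "semantic", "content", "data", "instance")
# Unit bodies come in 32-bit steps from 32 to 256 bits.
VALID_BODY_BITS = tuple(range(32, 257, 32))


def compose_bodies(units: dict[str, bytes]) -> bytes:
    """Truncate each unit body to 64 bits and join them in canonical order.

    Only the bodies are handled here; no ISCC header is produced.
    """
    for required in ("data", "instance"):
        if required not in units:
            raise ValueError(f"missing mandatory unit: {required}")
    for name, body in units.items():
        if name not in CANONICAL_ORDER:
            raise ValueError(f"unknown unit type: {name}")
        if len(body) * 8 not in VALID_BODY_BITS:
            raise ValueError(f"{name}: body must be 32 to 256 bits in 32-bit steps")
        if len(body) < 8:
            raise ValueError(f"{name}: need at least 64 bits to combine")

    # Keep the first 8 bytes (64 bits) of every unit that is present.
    return b"".join(units[name][:8] for name in CANONICAL_ORDER if name in units)


# Hypothetical unit bodies: a 128-bit Meta-Code, a 64-bit Data-Code
# and a 256-bit Instance-Code.
example = {
    "instance": bytes(range(32)),
    "data": bytes([0xAA]) * 8,
    "meta": bytes([0x01]) * 16,
}
print(compose_bodies(example).hex())
```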

Format

Each ISCC-UNIT begins with a variable-length header, commonly two bytes, that encodes the unit's main type, subtype, version, and length, followed by a variable-length body holding the fingerprint data. The units are concatenated and encoded in Base32 (RFC 4648, without padding) for the canonical string form, prefixed with ISCC:.[1][23] A representative ISCC-CODE that combines Meta-Code, text Content-Code, Data-Code, and Instance-Code units is:

ISCC:KACT4EBWK27737D2AYCJRAL5Z36G76RFRMO4554RU26HZ4ORJGIVHDI
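
As a rough illustration, the string above can be decoded with Python's standard library. Splitting the result into one two-byte header followed by four 64-bit bodies is an assumption based on the description in this section and on the length of the decoded data, not a statement of how the reference implementation parses codes; the function name decode_iscc is hypothetical.

```python
import base64

code = "ISCC:KACT4EBWK27737D2AYCJRAL5Z36G76RFRMO4554RU26HZ4ORJGIVHDI"


def decode_iscc(iscc: str) -> bytes:
    """Decode a canonical ISCC string into raw bytes."""
    text = iscc.removeprefix("ISCC:")
    # RFC 4648 Base32 works in blocks of 8 characters, so the padding that
    # the canonical form omits has to be restored before decoding.
    padding = "=" * (-len(text) % 8)
    return base64.b32decode(text + padding)


raw = decode_iscc(code)

# Assumed layout: a two-byte header, then 64-bit (8-byte) unit bodies.
header, body = raw[:2], raw[2:]
units = [body[i:i + 8] for i in range(0, len(body), 8)]

print(len(raw), "bytes in total")
print("header:", header.hex())
for index, unit in enumerate(units):
    print(f"body {index}:", unit.hex())
```

The decoded length of 34 bytes is consistent with a two-byte header followed by four 64-bit bodies.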

Similarity preservation

Unlike a cryptographic hash, which changes completely after any edit, an ISCC-UNIT changes only partially: similar inputs produce similar codes, and the Hamming distance between two units approximates the similarity of the underlying content. Applications use this to detect near-duplicates and to cluster related content by comparing codes.[1][2]
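
A minimal sketch of such a comparison is shown below. The two 64-bit values are made up for illustration and are not real ISCC-UNIT bodies, and the similarity score is just one simple way to normalise the distance, not something defined by the standard.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("units must have the same length to be compared")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def similarity(a: bytes, b: bytes) -> float:
    """Map the Hamming distance to a score between 0.0 and 1.0."""
    return 1 - hamming_distance(a, b) / (len(a) * 8)


# Hypothetical 64-bit unit bodies that differ in their two lowest bits.
unit_a = bytes.fromhex("f0f0f0f0f0f0f0f0")
unit_b = bytes.fromhex("f0f0f0f0f0f0f0f3")

print(hamming_distance(unit_a, unit_b))  # 2
print(similarity(unit_a, unit_b))        # 0.96875
```

An application would typically treat two units as near-duplicates when the distance falls below a threshold that it chooses for its own data.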

Comparison with other identifiers

Comparison of the ISCC with registry-assigned identifiers
Identifier | Scope | Assignment | Similarity detection
ISBN | Book editions | Assigned by national agencies | No
DOI | Scholarly works | Assigned by registration agencies | No
ISRC | Sound recordings | Assigned by national agencies | No
ISWC | Musical compositions | Assigned via collecting societies | No
ISCC | Digital media files | Computed from content | Yes

Traditional identifiers are assigned by registration authorities and identify abstract works or specific editions. The ISCC identifies the digital file, so a single book edition with one ISBN can correspond to many ISCCs, one per format, compression level, or excerpt. ISO 24138 specifies the ISCC for use alongside DOI, ISAN, ISBN, ISRC, ISSN, and ISWC rather than as a replacement.[1] A 2026 European Union Intellectual Property Office (EUIPO) study mapping EU copyright databases and metadata standards discussed the ISCC among the content-identification schemes it surveyed.[24]

Applications

Documented and proposed uses of the ISCC include:

  • Detecting exact and near-duplicate content for deduplication and database synchronization.
  • Verifying file integrity through the Instance-Code.[25]
  • Tracking versions of the same underlying content.
  • Identifying AI-generated content and recording text and data mining (TDM) opt-out declarations under the EU AI Act. In a 2025 Electronic Imaging paper, researchers at the Fraunhofer Institute for Secure Information Technology proposed using the ISCC as a robust hashing method within an infrastructure for tagging AI-generated content.[2] In a January 2026 submission to a European Commission consultation, the International Federation of Reproduction Rights Organisations suggested the ISCC as a possible basis for machine-readable TDM rights reservations.[26]
  • Research data management. The ELIXIR Galaxy platform integrated the ISCC for content-based reproducibility validation and dataset deduplication in bioimage analysis workflows.[25]

Use in cultural heritage

In a 2024 blog post, staff at the TIB and the Berlin State Library argued that libraries, archives, and museums should adopt the ISCC, citing content authentication, comparison of similar works, and registration of machine-learning training data.[27]

CommonsDB

CommonsDB is a European Commission-funded pilot registry of rights information for public domain and openly licensed works that uses the ISCC as its content-derived identifier. A user can check the rights status of a file by generating its ISCC and looking up matching declarations.[28] By March 2026 the registry contained over one million rights declarations.[29]

Reception

In the scholarly-publishing and standards communities, the ISCC has been cited as an example of an "intrinsic" identifier: a 2024 report on the PIDfest 2024 conference, co-authored by the chair of ISO/TC 46/SC 9, named it among such identifier systems.[30] In a guest column in Music Ally, Virginie Berger pointed to the ISCC as an existing ISO-standard fingerprinting method that the music industry could use for content traceability.[31]

Implementation

The open-source reference implementation of ISO 24138 is maintained by the ISCC Foundation on GitHub.[32] The core repositories are:

Core ISCC reference repositories
Repository | Description
iscc-core | Reference implementation of the ISO 24138 algorithms
iscc-sdk | High-level Python SDK for ISCC generation
iscc-schema | JSON Schema definitions and metadata models
iscc-web | REST API service powering the public demonstration

The Foundation also publishes experimental generators for semantic codes (iscc-sct for text and iscc-sci for images) and iscc-lib, a Rust implementation of the core algorithms with bindings for several languages.[32] Independent implementations include iscc-core-ts, a TypeScript port,[33] and iscc-sum, a Rust tool that generates the Data-Code and Instance-Code.[34]

ISCC Foundation

The ISCC Foundation (Stichting ISCC) is a nonprofit foundation (stichting) under Dutch law, founded in Leiden in May 2019 and based in Hengelo, Netherlands.[5][35] According to the Foundation, its activities include research on content identification, participation in open-standards work, maintenance of the open-source reference implementation, and support for community adoption.[35]

References

  1. "ISO 24138:2024 Information and documentation - International Standard Content Code (ISCC)". International Organization for Standardization. 2024-05-15. Retrieved 2026-06-02.
  2. Heeger, Julian; Berchtold, Waldemar; Bugert, Simon; Steinebach, Martin (2025). "EU AI-Act: Tagging GenAI Content". Electronic Imaging. 37 (4). Society for Imaging Science and Technology: MWSF-301. doi:10.2352/EI.2025.37.4.MWSF-301. Retrieved 2026-06-02.
  3. Carpenter, Todd A. (June 2024). "Introducing the Newest ISO Identifier Standard". National Information Standards Organization. Retrieved 2026-06-02.
  4. Nawotka, Ed (2025-10-15). "Frankfurt Book Fair 2025: Identity Stamps". Publishers Weekly. Retrieved 2026-06-02.
  5. "ISCC – History". ISCC Foundation. Retrieved 2026-06-02.
  6. "iscc/iscc-specs: ISCC Specification v1.0.0". GitHub (ISCC Foundation). 2018-03-31. Retrieved 2026-06-02.
  7. Anderson, Porter (2019-03-25). "Content Blockchain Project Wins One of Germany's Digital Publishing Awards". Publishing Perspectives. Retrieved 2026-06-02.
  8. "ISO/DIS 24138, International Standard Content Code (ISCC)". Association for Information Science and Technology. 2023-12-13. Retrieved 2026-06-02.
  9. "ISO 24138:2024 Electronic inserts (reference software)". International Organization for Standardization. Retrieved 2026-06-02.
  10. "AS/NZS ISO 24138:2025 Information and documentation - International Standard Content Code (ISCC)". Standards Australia / Standards New Zealand. 2025. Retrieved 2026-06-02.
  11. "ONIX for Books Codelists Issue 50, List 5 (Product identifier type), code 39". EDItEUR. 2020-07-09. Retrieved 2026-06-02.
  12. "Soft Binding Algorithm List". C2PA. Retrieved 2026-06-02.
  13. "C2PA Technical Specification 2.2" (PDF). Coalition for Content Provenance and Authenticity. 2025-05-01. Retrieved 2026-06-02.
  14. "Technical Report on AI and Multimedia Authenticity Standards" (PDF). Geneva: World Standards Cooperation (IEC, ISO, ITU). 2025-07-11. ISBN 978-2-8399-4720-6. Retrieved 2026-06-02.
  15. Pan, Titusz (2026-01-19). "IEP-0010: ISCC-CODE". ISCC Enhancement Proposals (ISCC Foundation). Retrieved 2026-06-02.
  16. "IEP-0002: ISCC-UNIT Meta-Code". ISCC Foundation. Retrieved 2026-06-02.
  17. "IEP-0003: Content-Code Text". ISCC Foundation. Retrieved 2026-06-02.
  18. "IEP-0004: Content-Code Image". ISCC Foundation. Retrieved 2026-06-02.
  19. "IEP-0005: Content-Code Audio". ISCC Foundation. Retrieved 2026-06-02.
  20. "IEP-0006: Content-Code Video". ISCC Foundation. Retrieved 2026-06-02.
  21. "IEP-0008: Data-Code". ISCC Foundation. Retrieved 2026-06-02.
  22. "IEP-0009: Instance-Code". ISCC Foundation. Retrieved 2026-06-02.
  23. Pan, Titusz (2026-01-19). "IEP-0001: ISCC Structure and Format". ISCC Enhancement Proposals (ISCC Foundation). Retrieved 2026-06-02.
  24. European Union Intellectual Property Office (2026-05-28). Mapping of EU Databases and Metadata Standards Providing Information on Copyright-Protected Works (Report). European Union Intellectual Property Office. doi:10.2814/4041636. ISBN 978-92-9156-373-9.
  25. Paul, Maarten; Etzrodt, Martin (2026-02-14). "Content Tracking and Verification in Galaxy Workflows with ISCC-SUM". Galaxy Training Network. Retrieved 2026-06-02.
  26. International Federation of Reproduction Rights Organisations (January 2026). "Response to European Commission Consultation on Protocols for Reserving Rights from Text and Data Mining under the AI Act and the GPAI Code of Practice" (PDF). Retrieved 2026-06-02.
  27. Heller, Lambert; Gragert, Gerrit (2024-07-05). "Why libraries, archives and museums should use the International Standard Content Code (ISCC)". TIB-Blog. Retrieved 2026-06-02.
  28. Europeana Foundation. "CommonsDB". Europeana PRO. Retrieved 2026-06-02.
  29. Price, Gary (2026-03-31). "Openly Licensed Works: CommonsDB Surpasses One Million Declarations". Library Journal infoDOCKET. Retrieved 2026-06-02.
  30. Meadows, Alice; Jones, Phill; Carpenter, Todd A. (2024-07-18). "A Successful Start to a New Festival of Identifiers: PIDfest 2024". The Scholarly Kitchen. Society for Scholarly Publishing. Retrieved 2026-06-02.
  31. Berger, Virginie (2025-05-08). "Licensing AI Music: The Industry Is Focusing on the Wrong Problem". Music Ally. Retrieved 2026-06-02.
  32. "ISCC Foundation". GitHub. Retrieved 2026-06-02.
  33. "iscc-core-ts: TypeScript implementation of iscc-core". GitHub. Retrieved 2026-06-02.
  34. "iscc-sum". GitHub. Retrieved 2026-06-02.
  35. "Foundation". ISCC Foundation. Retrieved 2026-06-02.