Gregg Vision v0.2.1

Gregg Vision v0.2.1 generates a Grascii representation of a Gregg Shorthand form.

Uses

Given a grayscale image of a single shorthand form, Gregg Vision generates its Grascii representation. Combined with Grascii Search, this output can be used to obtain possible English interpretations of the shorthand form.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForVision2Seq, AutoImageProcessor, AutoTokenizer
from PIL import Image
import numpy as np


model_id = "grascii/gregg-vision-v0.2.1"
model = AutoModelForVision2Seq.from_pretrained(model_id)
processor = AutoImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)


def generate_grascii(image: Image.Image) -> str:
  # convert image to a single channel
  grayscale = image.convert("L")

  # prepare processor input
  images = np.array([grayscale])

  # preprocess image
  pixel_values = processor(images, return_tensors="pt").pixel_values

  # generate token ids
  ids = model.generate(pixel_values, max_new_tokens=12)[0]

  # decode ids and return grascii
  return tokenizer.decode(ids, skip_special_tokens=True)

Note: As of transformers v4.47.0, the model is incompatible with the transformers pipeline API because the model expects single-channel image input.
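
The single-channel input mentioned in the note can be inspected without downloading the model. The sketch below mirrors the first two steps of generate_grascii on a blank placeholder image. The helper name to_single_channel_batch and the 64x48 image size are illustrative assumptions, not part of the model's API.

```python
import numpy as np
from PIL import Image


def to_single_channel_batch(image: Image.Image) -> np.ndarray:
    """Convert a PIL image into a batch holding one single-channel array."""
    # Mode "L" is 8-bit grayscale, so each pixel becomes one value.
    grayscale = image.convert("L")
    # Wrapping the 2-D array in a list adds a leading batch dimension.
    return np.array([np.asarray(grayscale)])


# A blank 64x48 RGB image stands in for a scanned shorthand form.
placeholder = Image.new("RGB", (64, 48), color="white")
batch = to_single_channel_batch(placeholder)

# The batch has shape (batch, height, width) with no channel axis.
print(batch.shape, batch.dtype)
```

Because the array has no separate channel axis, code that assumes three-channel RGB input would need adapting before it could feed this model.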

Technical Details

Model Architecture and Objective

Gregg Vision v0.2.1 is a transformer model with a ViT encoder and a RoBERTa decoder.

For training, the encoder was warm-started from vit-small-patch16-224-single-channel, and the decoder was a randomly initialized RoBERTa network.
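
As a rough illustration of this layout, the sketch below wires a single-channel ViT encoder to a RoBERTa decoder using the transformers VisionEncoderDecoderModel class. It assumes that class is a reasonable stand-in for the architecture described here. The layer counts, hidden sizes, and vocabulary size are deliberately tiny, hypothetical values rather than the published configuration, and all weights are randomly initialized.

```python
import torch
from transformers import (
    RobertaConfig,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

# Encoder: ViT with 16x16 patches on 224x224 single-channel images.
# The sizes below are hypothetical and much smaller than the real model.
encoder_config = ViTConfig(
    image_size=224,
    patch_size=16,
    num_channels=1,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)

# Decoder: RoBERTa with a hypothetical 64-token vocabulary.
decoder_config = RobertaConfig(
    vocab_size=64,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    max_position_embeddings=32,
)

# from_encoder_decoder_configs marks the second config as a decoder
# and enables cross-attention over the encoder output.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_config, decoder_config
)
model = VisionEncoderDecoderModel(config=config)
model.eval()

# One blank single-channel image and three arbitrary decoder token ids.
pixel_values = torch.zeros(1, 1, 224, 224)
decoder_input_ids = torch.tensor([[0, 5, 6]])

with torch.no_grad():
    logits = model(
        pixel_values=pixel_values, decoder_input_ids=decoder_input_ids
    ).logits

# One score per vocabulary entry at each decoder position.
print(logits.shape)
```

The decoder attends to the encoder's patch embeddings through cross-attention, which is how image content conditions each predicted token.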

Training Data

Gregg Vision v0.2.1 was trained on the gregg-preanniversary-words dataset.

Training Hardware

Gregg Vision v0.2.1 was trained on a single NVIDIA T4 GPU.

Model size: 36M params (Safetensors, tensor type F32)