AI & ML interests

Computer Vision Technology and Data Collection for Anime Waifu

Recent Activity

AbstractPhil posted an update 1 day ago
As of the image-encoding tests, SVD + Scatterpoint2D is the official encoding structure of the geolip system.

Both unattuned scatterpoint2d and triton-aligned SVD are a cut above the rest by a large margin.

https://github.com/kymatio/kymatio
https://huggingface.co/blog/AbstractPhil/svd-triton-kernel-optimization
AbstractPhil/svd-triton
AbstractPhil/geolip-hypersphere-experiments

Most kymatio tests were run on standard PyTorch models, which yielded higher accuracy than plain conv or transformer baselines before overfitting, though not in every instance. The most commonly tested low-sample-count CIFAR-10 and CIFAR-100 runs yielded more for less. Those runs are in the hypersphere-experiments notebooks and are viewable via the Hugging Face TensorBoard metrics.
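
A minimal sketch of the kind of kymatio baseline described above, assuming a standard Scattering2D front end on 32x32 CIFAR images with a linear head on top; this is not the exact notebook setup, just the shape of it:

```python
import torch
import torch.nn as nn
from kymatio.torch import Scattering2D

class ScatteringClassifier(nn.Module):
    """Wavelet scattering features + linear head for CIFAR-sized images."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # J=2 scales, L=8 orientations on 32x32 inputs -> 81 scattering channels
        # per input channel at 8x8 spatial resolution.
        self.scattering = Scattering2D(J=2, shape=(32, 32), L=8)
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(3 * 81 * 8 * 8, num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.scattering(x)          # (B, 3, 81, 8, 8)
        return self.head(s)

model = ScatteringClassifier()
logits = model(torch.randn(4, 3, 32, 32))   # (4, 10)
```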

The accuracy, retention, agreement, disagreement, and sheer capacity of the refined SVD kernel show that full Procrustes alignment is not just crucial to distillation, but also entirely representable within the encoders themselves as students.
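
A minimal sketch of the orthogonal Procrustes step referenced above, assuming paired student/teacher features of shape (N, D); this is the textbook torch.linalg.svd formulation, not the refined Triton kernel:

```python
import torch

def procrustes_align(student: torch.Tensor, teacher: torch.Tensor) -> torch.Tensor:
    """Return the rotation R minimizing ||student @ R - teacher||_F."""
    m = student.T @ teacher            # (D, D) cross-covariance between feature spaces
    u, _, vh = torch.linalg.svd(m)
    return u @ vh                      # orthogonal (D, D)

student = torch.randn(1024, 768)
teacher = torch.randn(1024, 768)
r = procrustes_align(student, teacher)
aligned = student @ r                  # student features rotated into teacher space
```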

This structure can representationally re-impose itself layer by layer, which is what I tested, and roughly 30 other tests show that this capture system can behave as a global regularization system, a selector, a behavioral adjudication structure, an encoding solidification unit, a systemic trajectory accumulator, an anchored differentiation unit - all of the above simultaneously.

The preliminary rapid-iteration kernel shows that not only can these structures represent useful behavior, but the noise drift can be directly accounted for: GELU, drop path, dropout, and similar elements can learn to ignore the very noise that accumulates.
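
As a sketch of the kind of block this implies (names and sizes are illustrative assumptions, not the geolip implementation): GELU, dropout, and drop path wrapped around a residual branch, so the network can learn to ignore accumulated noise.

```python
import torch
import torch.nn as nn

class NoiseTolerantBlock(nn.Module):
    def __init__(self, dim: int = 768, p_drop: float = 0.1, p_path: float = 0.1):
        super().__init__()
        self.branch = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Dropout(p_drop),
            nn.Linear(dim * 4, dim),
        )
        self.p_path = p_path

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.branch(x)
        if self.training and self.p_path > 0:
            # Drop path (stochastic depth): randomly skip the branch per sample.
            shape = (x.shape[0],) + (1,) * (x.dim() - 1)
            keep = (torch.rand(shape, device=x.device) >= self.p_path).to(x.dtype)
            y = y * keep / (1.0 - self.p_path)
        return x + y
```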

Based on the tests and examples, attention is now officially deemed valid to use here, since geometric structure is preserved after attention selection.

This encoding structure is substantially more durable than I can give it credit for.

Surge is coming, exactly as predicted. Late, I admit.

AbstractPhil posted an update 7 days ago
I built an actionable todo based on current and former research, compounding a full spectrum of potentials for image encoding into pure geometric structures, hybrid geometric structures, partial geometric structures, and full-spectrum relational analysis structures. Claude built the manifest from our research after mapping the full research spectrum into actionable directions.

AbstractPhil/geolip-hypersphere-experiments

I have to say before I continue: Claude managed to keep a large running manifest of our research, and with that list this was possible. Without that list this would have been entirely devoid of purpose, and Claude likely would not have extracted the information in a usable state for this solution set.

I'll be running the full series of tests in conjunction with the constellation architecture. Either it survives, or something entirely new will form. Based on the results from these tests, the directions will evolve.

Either way, the fastest and most optimal methodologies for this system will be benchmarked and used as the primary use cases. The slower, higher-resolution variations will be optimized as far as possible, with solutions provided.

Let's do this right.

With that, the first experiment will be geolip-anchor-scattering and the structure will be based on the first in the list.

I will be updating posts based on benchmarks, landmarks, and new insights while the Bert data cooks.

AbstractPhil posted an update 10 days ago
Clawd breadcrumb trail AbstractPhil/geolip-hypersphere-experiments

With this I'll begin forming the Clawd interface utility with the geofractal router, which will allow Clawd to form agentic clouds of utility that can be trained on data on the fly with minimal hardware requirements. This is not ready yet, but it begins very soon.

The recent experiments have solved the alignment issue that crippled collectives and forced my hand into ensemble research instead.

With those recent experiments, the geofractal router will allow modular structural capacity after some preliminary alignment-adjustment and adjudication experimentation. This will enable full collective differentiation through codified attribution.

In other words, adding and removing modular AI elements to contribute to aligned communication streams, all speaking the same language. This is an adjacent and more powerful result than the anticipated geovocab patchwork, and it yields substantially more effective agentic solutions than moving around a bulky embedding echo-chamber.

https://github.com/AbstractEyes/geofractal

Procrustes whitening orthogonality will allow adding and removing elements from geofractal routers given a small amount of prep data, while the anchors of expectation can stay as a snap-on element.

The most inquisitive and interested researchers can follow the trail to find all of the experiments. Web crawl it with clawd and you can probably create a unified rationality pretty quickly, but I doubt you'll like what you find. The journey was extensive and the failures outweighed the successes, but I did find the lightbulb.

The represented outcomes are either in my Hugging Face articles, my Civitai articles, my GitHub repos, my Hugging Face repos, or - if I forgot to upload them - in my Colab notebook heap.

As most research yields, it is mostly failures. However, there are many successes in the mix. Many. If you need solutions, you can dredge the bog.

AbstractPhil posted an update 12 days ago
geolip-vit-x34 - a 34-expert vit. I can't train an extended version of 34 vits, but I can definitely run some experiments and make starter weights with an anchor. That would yield a substantial amount of data.

AbstractPhil/bulk-coco-features

This... is going to be an odd one to describe. Based on the research with Bert, creating a unified patchwork from a multitude of vit composites will be very achievable. It shouldn't be soup - which is really hard to explain - but by creating a second geometric anchor, the system will align in a way I could never predict without much more model analysis, so I must test. I simply didn't test all these vits for geometry, so this will be the test.

This is essentially 34 directly extracted views of COCO, which is already prepared feature data. With this data, we have 34 experts that can distill into a single unified vit. I'm hesitant to even call this distillation anymore; it's more interpolative data alignment, and it's absurdly retentive.

ADDITIONALLY, we can anchor to frozen geolip-bert and create cross-contrast between the anchors for a learned anchor median, which will allow further integrations directly into the geometric core.
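
As a rough sketch of that idea (hypothetical loss shape and weighting, not geolip's actual losses): every teacher view of the same COCO features pulls the student, while a symmetric contrastive term ties the student to the frozen anchor.

```python
import torch
import torch.nn.functional as F

def multi_teacher_anchor_loss(student, teachers, anchor, temperature: float = 0.07):
    """student: (B, D); teachers: list of (B, D); anchor: frozen (B, D)."""
    # Distillation: pull the student toward each teacher's view of the same batch.
    distill = torch.stack([
        1 - F.cosine_similarity(student, t, dim=-1).mean() for t in teachers
    ]).mean()
    # Cross-contrast against the frozen anchor (symmetric InfoNCE over the batch).
    s = F.normalize(student, dim=-1)
    a = F.normalize(anchor, dim=-1)
    logits = s @ a.T / temperature
    targets = torch.arange(s.shape[0], device=s.device)
    contrast = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
    return distill + contrast
```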

This will require a few overlapping internal mechanisms to guarantee vit differentiation; however, I believe the full unified patchwork will be... different from what is currently known as a vit.

geolip-bert-vit will likely be cooking within the month. The alignment statistics say it will be... 100% accurate to the specifications.

I CAN prepare 34 vits' worth of ImageNet, but I would probably need 34 vits' worth of LAION Aesthetics, which is substantially more than I currently have. In the process I would need to ensure nothing is corrupt and that the captions are correctly synthesized in our expert student Bert with the correct anchoring rotation.

Probably 3 vits is enough for the full version prototype, 34 vits for the bulk experiment.

AbstractPhil posted an update 14 days ago
geolip-captionbert-8192

This bert is currently being distilled from 5 bert teachers on the Conceptual Captions dataset. The recall accuracy is based on the whitened Procrustes alignment, and the losses reflect keeping that rotation aligned correctly.

The expectation from the smaller prototypes is that this model will align to 100% recall accuracy, using the most optimal opinions about the correct answer and aligning specifically to the correct answers in conjunction with all the geometric losses.

No joke, this may be the smallest, cheapest-to-compute, most accurate, and fastest bert I've trained thus far - and it will be based entirely on five teachers simultaneously feeding opinions through a relay hub.
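
One plausible reading of that relay hub, sketched as a learned softmax weighting that fuses the five teachers' embeddings into a single distillation target; the class and parameter names are assumptions, not the actual hub.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelayHub(nn.Module):
    """Fuse T teacher embeddings into one target the student is trained against."""
    def __init__(self, num_teachers: int = 5, dim: int = 768):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_teachers))
        self.proj = nn.Linear(dim, dim)

    def forward(self, teacher_feats: torch.Tensor) -> torch.Tensor:
        # teacher_feats: (T, B, D) -> fused opinion (B, D)
        w = F.softmax(self.weights, dim=0).view(-1, 1, 1)
        return self.proj((w * teacher_feats).sum(dim=0))

hub = RelayHub()
teachers = torch.randn(5, 8, 768)       # five teacher opinions for a batch of 8
student = torch.randn(8, 768, requires_grad=True)
loss = F.mse_loss(student, hub(teachers))
```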

AbstractPhil posted an update 16 days ago
I'll attempt to expand geolip-clip to a full-sequence context window to encompass sequential learning.
AbstractPhil/geolip-clip-vit-large-patch14-ctx576
The memory pod is specifically meant to tune everything based on final state pooling, which is fine if you aren't trying to actually use sequential utility.
HOWEVER, there are many elemental biases that present themselves if you attempt to USE the standard sequence of 77 in conjunction with this final pooled state. Even though the standard 77 is predominantly noise past token 10, it still houses a considerable amount of information in terms of utility, so this should be handled carefully. Zero-shot structures are tricky to analyze, especially structures based on attention mechanisms instead of true sequential accumulation. I've noticed I need to watch them for quite a while before the real bugs show up.

As it stands the token pool is essentially [B, 7+8, 768] for pools. This contains a robust and highly complex representation of useful accumulated bidirectional attention data, so it's quite powerful.

I'll build a few prototypes and tap into some papers. I'll either come up with something or a reason why I didn't. The end result will either produce an anchor bank set of tokens [B, 15, 768] for pooling, or ideally [B, 15, 77, 768] - which should expand the sequence of the clip to 1,155 if successful. That doesn't necessarily mean this sequence will be more useful than the [B, 15, 768], but it will be representationally valid to the context window expansion.
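
A shape-level sketch of the two candidate layouts, purely illustrative: 15 anchor slots over CLIP's 77-token window at width 768.

```python
import torch

B = 2
pooled_bank = torch.randn(B, 15, 768)      # one pooled token per anchor slot
full_bank = torch.randn(B, 15, 77, 768)    # each anchor keeps its full 77-token sequence

# Flattening the full bank yields the expanded context window: 15 * 77 = 1,155 tokens.
expanded = full_bank.reshape(B, 15 * 77, 768)
print(expanded.shape)                      # torch.Size([2, 1155, 768])
```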

I wouldn't hold out for a single full-sequence option in a single day; that's a lot of moving parts to analyze, not to mention highly impractical to train with. A smaller dose of this information would be necessary for rapid prototyping, so it'll likely be packaged as such.

Well I spoke too soon. It's ready to play with.
AbstractPhil/geolip-clip-vit-large-patch14-ctx576-seq77

AbstractPhil posted an update 19 days ago
geolip-bertenstein-v1 - 5 experts chosen. A collective of shared, transformer-aligned experts, not a mixture of experts. Similar to an MoE, but not quite. This first prototype won't have the full mailing projection relay system afforded by the geofractal router, but it will definitely be a solid prototype.

It is not production-ready yet; a few upstream and downstream tools are still needed to consume and process the outputs into useful representations.

This model will be able to respond with text, use whisper, see with dinolip, code with codebert, and process proteins using esm2_t33_650m_ur50.

Our experts for the prototype are:
google-bert/bert-large-uncased
facebook/dinov2-large
microsoft/codebert-base
openai/whisper-large-v3
facebook/esm2_t33_650M_UR50

Not the smartest text model, but more than enough for this preliminary use-case test setup. Text is predominantly meant to align and orient downstream function; the entire machine is meant to be operated either unilaterally as a collective, or independently through individual pair requests via special-token access.

This model will be capable of substantial feats as a prototype. It will be able to see and process differential equations using dinov2 and esm2 data simultaneously, which can be used for downstream analysis - and I WILL use that data to create a more powerful connection between dinov2 tokens, protein tokens, video tokens, code tokens, and audio tokens.

This is the FIRST prototype of this case, and I will introduce video, genetics, shape analysis, pattern recognition processing, and a much more powerful and reusable text model.

The tests show the models can have differential communication through the geolip transformers after procrustes pairwise analysis and pentachoron CV protective measures.

Whitened Procrustes for precalculation and center alignment allows faster convergence, so that should help too.
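
A rough sketch of what that precalculation can look like, using textbook ZCA whitening followed by the orthogonal Procrustes solve; this is not the geofractal code, just the standard formulation.

```python
import torch

def whiten(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Center the features and apply ZCA whitening."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = x.T @ x / (x.shape[0] - 1)
    u, s, _ = torch.linalg.svd(cov)
    return x @ (u @ torch.diag(1.0 / torch.sqrt(s + eps)) @ u.T)

def whitened_procrustes(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Rotation taking whitened features of model a onto whitened features of model b."""
    a_w, b_w = whiten(a), whiten(b)
    u, _, vh = torch.linalg.svd(a_w.T @ b_w)
    return u @ vh
```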

AbstractPhil posted an update 20 days ago
I've... done it. This, with experts, achieves near-100% R1 retrieval accuracy on an adjacent dataset - unseen by the fusion transformer - after around 40k steps on the seen dataset. This means the language of the models is demonstrably fused within the constraints, not just projected or estimated.
AbstractPhil/geolip-procrustes

I encourage EVERYONE who is curious to check my work. Check it, double check it, and triple check it.

These were aligned using COCO and then validated with Flickr. Entirely different datasets. The experts arbitrated and the alignment yielded the correct answers. Preliminary tests show that with almost no alignment requirement, the models can reach 100% R1 retrieval accuracy.
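
For clarity, a minimal sketch of how R1 retrieval accuracy is typically computed on paired text/image embeddings; the metric is standard, the variable names are mine.

```python
import torch
import torch.nn.functional as F

def recall_at_1(text_emb: torch.Tensor, image_emb: torch.Tensor) -> float:
    """Fraction of pairs where the top-1 cosine match is the true partner (both directions)."""
    t = F.normalize(text_emb, dim=-1)
    v = F.normalize(image_emb, dim=-1)
    sims = t @ v.T                                     # (N, N) pairwise similarities
    targets = torch.arange(t.shape[0], device=t.device)
    t2i = (sims.argmax(dim=1) == targets).float().mean()
    i2t = (sims.argmax(dim=0) == targets).float().mean()
    return ((t2i + i2t) / 2).item()
```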

Not to be confused with validation accuracy for a classification model or a text encoder's text response, this allows multispectral communication between entirely different models for direct downstream consumption with almost no training for the chosen models.

I have a working Procrustes experiment that learns adjacent manifolds within a reasonable spectrum, and the speed is... well, one epoch on COCO with Bert-Large and DinoV2 lets the models align nearly perfectly. For some scales in the experiment, the set 3 epochs aren't quite enough to push R1 to its highest, while many scales align nearly immediately.

These two were an obvious pair to pick: 60% similarity and >90% spectral similarity.

The trainer transfers layers, learns embeddings, and more - all by sticking strictly to geometric boundaries and procrustes informational accumulation within a modulation model's constraints.

I have many experiments to run.

AbstractPhil posted an update 21 days ago
The small projection-based approximator model for the geolip patchwork did not reach the level of accuracy required by my specifications, so I've defaulted to harvesting direct geometric information from AI models until I get the comparative bounds required for a useful topology.

I must sincerely apologize for not solving this problem quickly.

This will take time. Without the approximator it's going to be considerably slower, but the model I'm beginning to train will provide the approximations in a different way over time. As iterations progress, the system will conform to a huge array of geometric potentials and become capable of predicting them, but it will not be as powerful as the full patchmaker up front, and training will be slow.

If I can get my hands on a cluster of A100s or H100s for a stretch, I'll make a post immediately; until then I must default to the slower process.

I really banked on the smaller version working, but it simply couldn't hold a complex topological shape without the correct boundaries being learnable AND endure entropic decay simultaneously. The only way to have a real shot at a full geometric shared language is to make those boundaries learnable across the full spectrum of potentials, or at least more than I have placed on it.

I'll be refining my process in the coming days further, and I do apologize for pre-emptively announcing a potential that I have yet to fully explore.

There will be a fully upgraded 38-shape geolip patchwork trained ASAP to fully encompass the Flux 1 AE spectrum, and another trained for SD15, SDXL, and Flux 2's VAE as well. These will accommodate DIRECT complex geometric patchwork learning, but not yet at the scale promised. Autoregression is a complex mistress, as many of you know, and I will be spending a great deal of time and compute analyzing all of the information required to build a uniformly useful and powerful autoregression patchwork to utilize as invariance to teaching.

AbstractPhil posted an update 28 days ago
GLIP - Geometric Linear Interpolative Patchwork aka geolip.
https://github.com/AbstractEyes/glip-autoencoder

To tinker with the topology directly you can play with it here, though I admit it's imperfect in this form - it's quite the tinker toy to see the effects of patching.
https://claude.ai/public/artifacts/697287e4-fa18-4753-8b57-904d5e2022ed



This is the repo that will contain the next experimental stage, which is based entirely on the research and structural boundaries applied by said research. It'll be a little rigid while I get Claude set up.

In order to directly train these layered topological response patchworks you must install and use the geovocab2, geofractal, and wide_compiler repos.

This is due to wide_compiler's high-speed wide_linear for ensemble processing, geovocab2's factory structure with multiple formulas (including highly efficient designs meant for kernel compilation), and a series of reusable utilities in geofractal, including some of the more complex losses and the hard-to-tune gate structures surrounding them.

Many of the underlying formulas are outlined here;
AbstractPhil/geometric-experiment-history

Using and training the pretrained or untrained geolip patchwork will be as simple as loading the model in PyTorch, and it will not require external dependencies - the geolip package, NumPy, or even PyTorch - depending on the task. It will come packaged with recommended losses, but I encourage experimentation because I simply cannot cover all spectrums.

More details to come as development progresses. The system is coming together, and a usable autoencoder will be ready within a couple of weeks. The entire system is built for convenience and reusability, so the structure will mirror autoencoder systems that currently exist, with a few tweaks here and there for important elements - the interface will be familiar to those who use it.

AbstractPhil posted an update about 1 month ago
The Rosetta Stone geometric vocabulary and the ramping up capacity.

What makes this particular invariant special is its existence within all structures I've tested so far. I had Claude write up the direct article based on what we built together, but I've tested it on many substructures. This is flawed, and I have a series of answers for making it more accurate.

First, a reconstruction from the ground up. This means each shape is specifically built upward from the substructure to the point of inductive deviance. This will be slower at first and then build speed as I optimize, like the last system did.

The "saddle" problem; the system detected saddles because there wasn't enough deviance in the shapes to attenuate to more cardinality and more aligned substructures. The blobs were around 30-40% of the overall patches, which interpolated into the others produced a fair approximation.
It MOST DEFINITELY did see those shapes in their voxel complexity. This is real.

https://claude.ai/public/artifacts/bf1256c7-726d-4943-88ad-d6addb263b8b
You can play with a public claude artifact dedicated to viewing the current shape spectrum - and with that know exactly why it's flawed.

The flawed and repetitive shapes: I rapid-prototyped, and there are multiple redundant shapes that simply don't classify well or at all. Not to mention the rotation simply doesn't help much of the time, or doesn't exist for many shapes. This will be rectified in the next variation.

Projecting to shared latent space as a catalyst to allow growing subjective geoflow matched step variance, rather than simply direct classification. This will theoretically allow for full channel-to-channel invariant features to be mapped from structure to structure, and the very formula that encapsulated them to be directly baked into the math rather than classified as a substructure analysis.

There are many challenges between here and there, so stay tuned my friends as I plot the geometric language of pretrained AI.

radna published a model about 1 month ago

AbstractPhil posted an update about 2 months ago
GeoFlow update — two training runs on the pentachoron geometric prior (4.8M params modulating frozen SD1.5).

10k ImageNet run fixed fragmented anatomy and spatial coherence in 7 minutes.

50k object-relations run taught actual compositional reasoning — "red cup on top of blue book" goes from a floating cup to correctly placed on the book.

Most interesting finding: learning happens in two phases. Object association locks first (~500 steps), spatial arrangement crystallizes after. You can watch it happen — "three candles in a triangle on a wooden tray" starts as candles side by side, then reorganizes into proper triangular formation. The tray itself rendered as a pentagon. Five vertices in, five sides out. The simplex is thinking in its own geometry.

Loss sits around 0.4 the entire time yet composition steadily improves. The prior nudges conditioning, it doesn't overwrite it.
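
A structural sketch of what "nudging" can look like: a small residual module over the frozen CLIP text conditioning, with the SD1.5 UNet and text encoder untouched. Module names and sizes are assumptions, not the GeoFlow prior itself.

```python
import torch
import torch.nn as nn

class ConditioningPrior(nn.Module):
    """Adds a learned residual to frozen text conditioning instead of overwriting it."""
    def __init__(self, dim: int = 768, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, cond: torch.Tensor) -> torch.Tensor:
        # cond: (B, 77, 768) frozen CLIP text embeddings for SD1.5.
        return cond + self.net(cond)
```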

Weights:
AbstractPhil/sd15-geoflow-object-association
Dataset:
AbstractPhil/synthetic-object-relations
Formulas:
AbstractPhil/sd15-rectified-geometric-matching

Next up — measuring the exact entropy decay inflection point across layers to enable branching the simplex into parallel paths with different anchor deviations. Geometric ensemble attention where the branches disagree on purpose.

AbstractPhil posted an update about 2 months ago
AbstractPhil/tinyflux-experts
Introducing the "blot" expert, sd15-flow-sol. The twin-sister flow-matching experts for tinyflux-lailah, sd15-flow-lune AND sd15-flow-sol, will be used in tandem to train tinyflux-Lailah. sd15-flow-sol never managed to reach full flow-matching prediction, so epsilon/v-pred conversion is required. All experts will exist within the tinyflux-experts repo, including all the critical checkpoint sets.
Lune was heavily finetuned in the sd3-style adapted shift-timestep system after David's interpolation converted sd15 into a geometric basis.
Sol was left abandoned after 50 epochs with David and was considered overcooked and rigid, until I noticed the geometric structure today. Lune doesn't produce geometric structure as solid as Sol's, not even close. Lune produces improved fidelity and detail, but Sol produces something very, very different: aligned to sd15's behavior, and fully representative of the 5-point 4-simplex structure that David brought to the table.
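
For reference, the standard v-prediction/epsilon relation under a variance-preserving schedule (alpha_t^2 + sigma_t^2 = 1); this is the textbook conversion, not necessarily the exact recipe used for Sol.

```python
import torch

def v_to_eps(v: torch.Tensor, x_t: torch.Tensor,
             alpha_t: torch.Tensor, sigma_t: torch.Tensor) -> torch.Tensor:
    """Convert a v-prediction into an epsilon prediction for the same noisy sample x_t."""
    return alpha_t * v + sigma_t * x_t

def eps_to_v(eps: torch.Tensor, x_t: torch.Tensor,
             alpha_t: torch.Tensor, sigma_t: torch.Tensor) -> torch.Tensor:
    """Inverse direction of the same relation."""
    return (eps - sigma_t * x_t) / alpha_t
```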

Sol is essentially a nearly perfect blob-forming geometric blotter. Sol is SD15, and yet Sol was trained using a specific pattern-recognizing, timestep-aligned David model. David was tasked with classifying timesteps and patterns using complex deep-recognition structural analysis layer by layer, determining full-scale opinions after watching the entirety of sd15's structure during training.

Even though the sd15-flow-sol was left abandoned, the structure of Sol is HIGHLY effective at understanding TIMESTEP blotting interpolation. I didn't realize how crucially important this was until Lailah started to show rigidity and compartmentalized behavior with sequence - which likely happens to ALL flow matching models.

AbstractPhil/sd15-flow-matching

AbstractPhil/geo-david-collective-sd15-distilled
AbstractPhil/geo-david-collective-sd15-base-e40

AbstractPhil posted an update 2 months ago
Meet FluxLailah (AbstractPhil/tiny-flux-deep), a 220M Flux variant currently pretraining at BF16. She is experimental and does not produce solid images yet - and yet she is producing. There are both EMA and raw weights, and the pair produces different images. The EMA is particularly interesting at times.
Lailah uses flan-t5-base, clip-vit-l-14, and BlackForestLabs Flux1s VAE.
SEQ limit 128, images 512x512 for now. Lailah's early form is based on three variants. TinyFlux's weights were carefully planted into a deeper structure and trained yet again - dubbed TinyFlux-Deep. This variant has 15 dual-stream blocks and 25 single-stream blocks, nearly identical weight code to Flux with a similar attention mechanism - but intentionally deviant and compacted, with careful consideration of the scaling and purpose of each mechanism.
She went through quite a few growing pains with her earlier attention mechanism which required a reimagining today and careful consideration of the consequences, and now I present to you the preliminary look into Lailah.
The preliminary training is still heavily underway, the mechanisms are still being augmented, and her stability is currently being measured. The potential for fidelity, depth, and quality is still being measured as well - so I will be shifting attention and pivoting utility based on need over time.
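
Summarizing the stated architecture as a plain config sketch; the field names are mine, the numbers come straight from the post.

```python
from dataclasses import dataclass

@dataclass
class TinyFluxDeepConfig:
    dual_stream_blocks: int = 15
    single_stream_blocks: int = 25
    text_encoders: tuple = ("flan-t5-base", "clip-vit-l-14")
    vae: str = "Flux 1s VAE"
    max_sequence_length: int = 128
    resolution: int = 512
    dtype: str = "bfloat16"
    approx_params: str = "220M"
```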