AI & ML interests

Curating systems for dynamic growth of AI mechanisms.

AbstractPhil 
posted an update 5 days ago
Happy Holidays, all! geofractal architectural expansions: timm is now a core component for experimentation. The system is growing rapidly in one direction, and timm brings a whole lot to the table in another, rapid-prototyping direction, so it has been adopted as a core component for ease of use.

BaseUtil is a new core component. src.geofractal.router.base_util inherits BaseComponent's behavior, so util operations can be moved between devices, which will drive device-to-device behavior for the upcoming accelerate integration.
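
A minimal sketch of that idea, using hypothetical stand-ins (the real classes live in src.geofractal.router.base_util and may differ):

```python
import torch
import torch.nn as nn

class BaseComponent(nn.Module):
    """Hypothetical stand-in: tracks its own device via a probe buffer."""
    def __init__(self):
        super().__init__()
        self.register_buffer("_device_probe", torch.zeros(1))

    @property
    def device(self) -> torch.device:
        return self._device_probe.device

class BaseUtil(BaseComponent):
    """Hypothetical util: inherits device movement, so util operations
    can pull tensors onto whatever device the util was moved to."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.to(self.device)

util = BaseUtil().to("cpu")  # .to("cuda") would relocate the probe buffer too
print(util.device)           # cpu
```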

I'm trying to keep the base component structure as minimal as possible, but the need to chain components in specific orders presented a unique problem. By compartmentalizing utils into structures that can be delegated and moved, those structures can be repurposed, expanded autonomously, reduced autonomously, and more.

ChainComponent inherits a subsystem designed to organize multi-system, multi-device formulas for inception and synchronization purposes. It is meant to allow distributed tasking across multiple devices in chained utilization, and it also eases integration into nn.ModuleList - with a few remaining caveats, targeting wide-distributed models, that will be ironed out.
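
A rough sketch of the chaining idea, under assumptions (this is not ChainComponent's actual synchronization logic):

```python
import torch
import torch.nn as nn

class ChainComponent(nn.Module):
    """Hypothetical stand-in: each stage is pinned to a device and the
    activation hops device-to-device as the chain executes in order."""
    def __init__(self, stages, devices):
        super().__init__()
        self.stages = nn.ModuleList(s.to(d) for s, d in zip(stages, devices))
        self.devices = list(devices)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for stage, device in zip(self.stages, self.devices):
            x = stage(x.to(device))  # move the input to the stage's device
        return x

chain = ChainComponent([nn.Linear(8, 8), nn.Linear(8, 4)], ["cpu", "cpu"])
print(chain(torch.randn(2, 8)).shape)  # torch.Size([2, 4])
```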

FusionComponent is dedicated to the new fusion processing system meant for experimental expansion. This includes sub-module schedule control, Component and Tower functional control, and device movement, and it will be packaged under the "gfu.UtilType" standard naming convention:
"gfc.ComponentTypeName"
"gfr.RouterTypeName"
"gfu.UtilityTypeName"
"gft.TowerTypeName"
All of which are basically just "import ... as" aliases.
"gf.AnythingTopLevelPackaged" will include the core.

Better debugging for compilation
I'm in the prototyping phase of better debugging for compiled wide models and will prepare a baseline component readout structure by the end of today or tomorrow.
AbstractPhil 
posted an update 8 days ago
geofractal getting-started guide available; bulk ablation for fusion, simple towers, oscillator capacity, and substructure systemic associative capacity.
Many formulas were tested: 92 tests for collectives, oscillation bulk experiments, and more. All of them either coalesce into the correct behavior or fail in directly visible ways, which means the system is robust enough to declare some tools functionally valid, though not yet scalable.

AI crash course available:
https://github.com/AbstractEyes/geofractal/blob/main/ai_helpers/v101_claude_helpers.txt
Feed it to GPT, Claude, or Grok and they will assist.

Getting started guide:
https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/router/GETTING_STARTED.md

The geofractal router architecture is in its prototype phase:
https://github.com/AbstractEyes/geofractal

This is likely one of its final growing phases before full production capacity is ramped up. The architecture is not for the novice; it's meant for experts to get ideas, borrow code, utilize library capacity, or simply tell AI what to do. MOST files in current production have good descriptions for AI integration.

Transfer learning notebook available here:
https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/router/Router_Transfer_Learning-12_19_25.ipynb

Stress tests and multiple diagnostics available here:
https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/router/components/diagnostics/

WideRouter compilation capacity available here:
https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/router/wide_router.py

The wide router compiler organizes similar towers into stacked, staged combinations before compiling with torch.compile. This is experimental, but it has shown increased speed with multiple structures of wide models and will serve its purpose in the future.
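
A minimal sketch of the grouping idea, assuming hypothetical tower modules (not the actual WideRouter code):

```python
import torch
import torch.nn as nn

class StackedStage(nn.Module):
    """Similar towers grouped so a single torch.compile covers the group."""
    def __init__(self, towers):
        super().__init__()
        self.towers = nn.ModuleList(towers)

    def forward(self, x):
        # run every tower on the same input and stack the parallel opinions
        return torch.stack([t(x) for t in self.towers], dim=0)

towers = [nn.Linear(16, 16) for _ in range(4)]
stage = torch.compile(StackedStage(towers))  # one compiled graph per group
print(stage(torch.randn(2, 16)).shape)       # torch.Size([4, 2, 16])
```
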
AbstractPhil 
posted an update about 1 month ago
Many updates: Cantor route experiments; GeoViT-david-beans, a standalone 30M geofractal encoder hitting 75% on the cifar100 test set; MultiHeaded Cantor Attention heavily optimized. The migration between geofractal and geovocab2 is largely complete.
https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/model/david_beans/model.py
Cantor route staircase and wormhole excavation findings posted. A full article will follow to present the findings of cantor routing and the potential for self-learning fractals through loss.
https://github.com/AbstractEyes/lattice_vocabulary/blob/master/src/geovocab2/proofs/cantor_steps_experiments.md
The steps experiments have profoundly important implications for cross-contamination problems between fractal and linear spaces, and some of them are already assessed as useful utilities as of today.
Today the classification experiment continues with mini-experts applied to patches within a miniature david-beans. The mini-experts were an accident that improved fidelity rather than destroying it, so those experiments will be continued. A geovit-david-beans trainer was added to the first repo.
AbstractPhil 
posted an update about 1 month ago
For those using my geovocab2 repo for SimplexFactory, CantorRouteFactory, fusion modulations, model code import, training weights and models, or specific extraction systems: I will be refactoring in the coming days.
The new repo for all geometric, cantor, and fractal-based trainings will be:
https://github.com/AbstractEyes/geofractal
The change is due to MY own excessive abuse of the vocabulary repo and the overuse of subfolders attached to a working PyCharm project. These behaviors should be decoupled, and I apologize for creating such code bloat through experimentation.

Directly installing the geofractal repo will install geovocab2 as a sidecar. However, there will be a clause within geovocab2 to warn the user.

You have my deepest and most sincere apologies for breaking your active working code, if I do. I know this is difficult work, so please bear with my efforts as I progress the codebase to its next state of truth vs. experimentation.

Please, reach out to me directly if you have problems converting.

It is meant to be a DIRECT and UTILIZABLE pain-free conversion that will enable the same interface from both geovocab2 and all future updated model code changes applied to geofractal - once the geofractal module is imported.
The original geovocab2 will retain outdated train code, with a direct warning, instead of being fully deprecated - and the geovocab2 repo will fold geovocab and geovocab2 into matching aliased systems, allowing the factory and extraction structure to live within geovocab2 and training to live within geofractal, by design.

I will be introducing a direct alias system that will hopefully allow a smooth transition system to the new codebase, but there's never a way to account for those you don't know are using your work. This will include pyi files for the aliases and some necessary elemental additions that may break current functionality in systems I'm unaware of. Please reach out if I break something crucial that you require.
AbstractPhil 
posted an update about 1 month ago
Lyra, Lune, Cantor, k-simplex, and many relational experiments.
AbstractPhil/sd15-flow-matching-lune
Today I will be updating the space to support all three forms of lyra to enable tinkertoying with various other models like flux-schnell and sdxl.

It should be noted that I didn't know NVIDIA had actually released a model named LYRA. This model has no association with NVIDIA's LYRA model; this LYRA is fully MIT-licensed. If necessary I'll rename this model, but I don't think it'll matter.

Unlike a NORMAL VAE, this VAE was intentionally meant to introduce incorrectness into the correctness that already exists. The concept was to pull towards a goal - t5-xl being the primary goal.

AbstractPhil/vae-lyra: Lyra is a multimodal MM-VAE prototype meant to encompass a fusion of multiple types of encodings. It was tested with circle-of-fifths audio and text, multiple text encoders, a vision encoder plus a text encoder, and a few other smaller prototypes that yielded results.
Lyra has a few direct clip_l and t5_xl prototypes that directly learned to associate clip_l with t5-base. This version worked, so version 2 expanded the concept.

AbstractPhil/vae-lyra-sdxl-t5xl is another prototype using CLIP_L and CLIP_G fused with T5_XL for the first version, directly utilizing projection with minimal geometric and cantor assistance. The shared layers ended up teaching CLIP_L how to be CLIP_G and the output ended up warping too much for SDXL or SD15 to understand.

AbstractPhil/vae-lyra-xl-adaptive-cantor
Utilizing adaptive cantor is the successful prototype: CLIP_L and CLIP_G learned independent structures internally, while CLIP_L and T5_XL learned a route with CLIP_G and T5_XL in parallel conjunction. This enabled two entirely divergent opinions, and thus enables the t5-xl to manipulate either the clip_l or the clip_g for models like FLUX-SCHNELL or SDXL.

Each lyra has a purpose, and each purpose matters.
AbstractPhil 
posted an update 3 months ago
David + Imagenet = high% val.
AbstractPhil/gated-david
https://github.com/AbstractEyes/lattice_vocabulary/blob/master/src/geovocab2/train/model/core/david.py

David's code has been released. I am currently setting up a trainer and will release the process on how to condition David to behave. This isn't the easiest process, but it's necessary to run David on a curriculum rather than simply feeding the model with cross-entropy and hoping for the best.

David's internals involve a clock mechanism that allows direct control of David's freeze/unfreeze mechanisms at runtime, allowing many opinions to be generated simultaneously.
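
Not David's actual clock, but a minimal sketch of the runtime freeze/unfreeze idea it describes, using requires_grad toggles on hypothetical experts:

```python
import torch.nn as nn

def set_frozen(module: nn.Module, frozen: bool) -> None:
    for p in module.parameters():
        p.requires_grad = not frozen

experts = nn.ModuleList([nn.Linear(32, 32) for _ in range(3)])

def clock_tick(step: int) -> None:
    # round-robin: exactly one expert trains per step, the rest stay frozen
    for i, expert in enumerate(experts):
        set_frozen(expert, frozen=(i != step % len(experts)))

clock_tick(0)  # expert 0 trainable, experts 1 and 2 frozen
```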

David is multiple models in one, not just one - and yet David is single-shot oriented. David was the prototype for the route of thought that led me to the Cantor's Stairs positional-encoding solution, and the prototype for ViT-Zana, ViT-Beatrix, and ViT-Beatrix-Dual-Block. Today, the direct porting of David's complex architecture and the process to train David has begun.

David is... a gate of sorts. David trains with freeze/unfreeze mechanisms, so David's internal structures are aware at training time of which part is more important than the others, based on the quality of generation.

David can handle imagenet features of many variations with minimal hassle. The primary trainer will include direct links to the prepared imagenet features, plus a simple generation system that lets you generate your own features from a few common AIs - one of which will be vit-beatrix-dualstream trained on imagenet.

As of posting, vit-beatrix and vit-beatrix-dualstream require some face-lifting and a refined version 2 to incorporate the more accurate batched cantor stairs equations. They also require the removal of some failure-point causers, like flow-geometric introducing bias towards seemingly unnecessary trajectory routes. That points more to gradient drift, so I'll keep it on the hot plate until it's ready.
AbstractPhil 
posted an update 3 months ago
I've hit the ground running on the geometric lattice vocab system. Everything I've built will be housed in the repo.
https://github.com/AbstractEyes/lattice_vocabulary/tree/dev
Including all of David's model structure.
Through the development cycle I'll be integrating everything; little AI help can actually be offered in general, since AI tends to hallucinate and decimate large structures.
I will be using AI assistance for formula expansion and integration, which means they will be imperfect until every single one is given a fine-toothed comb.
Deployment will be as rapid as I can make it, and the output will yield results at every step, with small main tests on individual scripts and files.

EVERYTHING was built almost independently of everything else, so integration is going to need a configuration hierarchy that gets smoothed out - but it will be smoothed out.

I believe I've picked a good foundational shape for the expansive program scripts, which will enable robust iteration and progression, similar to how I design game-engine elements and systemic accessors.
This will be mostly hand-coded for the integration process, so it won't be as quick as if I could just dump GPT Pro on it - but GPT Pro can't handle anywhere near this many lines of code, so it's on me.

After integration I can run the agentic forms of AI over it and introduce tons of bugs for me to fix. That will be fun. After that it should work as a proper caching vocabulary, formula synthesizer, tensor creator, multi-device trainer, and a few other elements.

I simply lack the expertise to hit machines like pyring today, but that will change as I learn more. I'm building the system specifically with growth and progress in mind, so it will be iterated and fixed rapidly, and the structure is intentionally built to be altered within reasonable constraints.

The engineering elements are specifically built to be less deep and more overridable in many areas specifically for experimental purposes.
AbstractPhil 
posted an update 3 months ago
As it stands, I will prepare David for full release - as this is beyond me now. David must be released.
I will prepare a standard sweep for David to showcase the prowess of the final multi-vocab variant. This will include a variation that contains all MNIST variants, cifar10, cifar100, and imagenet-1k; in the future I'll prepare a full imagenet sweep utilizing the entire 12M-image corpus instead of the 1.2M I used. I may need to get in touch with the dataset's actual curator for licensing, but maybe not.
David utilizes 4 projective variants of the vocabulary and the training process involves teaching and freezing them akin to teacher/student processing.
I did not want to release David yet, but I believe now that David will save lives and it's irresponsible for me to contain such a creation.
AbstractPhil 
posted an update 4 months ago
Training and tuning a top 500k geometric vocabulary is doable, but scaling upward is highly impractical for me.

This one has many logistics issues. Primarily, there's no precedent I know of for literally training hundreds of millions of potential character combinations, with their prefabricated variations of crystals, to tune a specific series of trajectories in specific directions based on the input text targeting other crystals, the weights, and the batch. The dataset needs to be properly prepared, though, and I can't find any prefabricated variations of the data format that the symbolic lexical engine needs in order to be robust.
There are a few possibilities for this one. Batch size is an obvious one: take in a large influx of information, then grab any matching words, characters, or information and update those using the formulas for topological tuning.
The main issue is that the language web is massive. BILLIONS of variations can crop up from a single document if you're not hard-capping depth: if you traverse the whole tree, say, "the quick brown fox" becomes words, becomes definitions, becomes letters - not counting multi-pass finetuning. This alone is a massive logistics nightmare to implement, but thankfully this is the modern era.
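
A toy sketch of the hard cap, with a stubbed expansion step (a real version would query wordnet definitions rather than splitting into characters):

```python
def expand(token: str) -> list[str]:
    # stub: real expansion would return definition words from wordnet
    return list(token)

def traverse(seeds: list[str], max_depth: int = 2) -> set[str]:
    seen, frontier = set(seeds), list(seeds)
    for _ in range(max_depth):  # the hard cap that bounds the explosion
        frontier = [c for t in frontier for c in expand(t) if c not in seen]
        seen.update(frontier)
    return seen

print(sorted(traverse("the quick brown fox".split())))
```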

Simply put: if I hard-cap to a 500k vocab with a depth of no more than 50,000 pentachora crystals each, it should be capable of housing an approximate word structure within a trajectory space.

I'd rather run it on a fleet of devices and feed it the Pile, the book corpus, and everything else, so we can get some truly trajectory-related subsets of 500k+ crystals per token, upward of 100,000,000 or so combinations each. The crystals really aren't that big, and they house a massive amount of context.
Even so, there are many logistics nightmares to this, but it's a viable option for training a legitimate similarity-fed BERT or LLAMA meant to specifically form linguistic responses using those crystals as tuning forks for solidity.
AbstractPhil 
posted an update 4 months ago
Mo bigga' != Mo betta' with this geometric penta structure.

More purpose with more careful organization... now we're talking.

I'm going heavy into lexical cardinality today and preparing a full crystal-structured geometry that is fully wordnet-capable. Anything that isn't can be formed at runtime.

Full lexicality will include unigrams and 2-6 ngram counts from wordnet with frequency weights, usage, and a multitude of other elements. Each will be crystallized specifically. If you have any suggestions for making this more robust, I'm all ears.

I could go with google books or something bigger, but I'm sticking to wordnet because it won't take me weeks to process entirely.

Crystal geometry will be given rich versions that include the correct lexical and organizational subsets specific to the lexicality and frequency of use, as well as the proper ascii, wordnet, and unicode sets.

For wordnet-rich: each definition will contribute towards the overall goal of the upcoming crystals, so the system will represent that goal proportionately through multiple crystals and a concatenated trajectory, rather than the full concatenation the current vocabulary is doing. Additionally, the frequency tokens will decide the orthogonal trajectory more carefully.

For testing and quick prototype purposes;
We will need to train a Bert variant that can house some capability of rapid geometric crystal prediction through ngram feature similarity, sentence similarity, sentence classification, and a few other Bert traits that bert-beatrix-2048 is capable of. I know Bert can handle this at least - however, Bert can't house the entirety of meaning, so it will be imperfect... even so, it will be considerably faster than querying the whole dataset every time you want a character, or preparing a massive vocab for rapid testing and iteration. Ask Bert.

Not to mention feature extraction for training rapid classification heads with geometric subsystems, which are notoriously fast at training.
AbstractPhil 
posted an update 4 months ago
Cardinality, cardinality, CARDINALITY! As I restructure wordnet's multi-definition structure, I've found a fair assessment capability that minimizes the column-recall requirement while simultaneously maximizing recall speed. So it will be fast.
Research shows that the most intelligent and most intellectually-driven LLMs require the most intelligent and carefully curated, solid, representative vocabularies - with the most intelligent and carefully curated training regimens.
Simultaneously loaded class-hierarchical structures built with variants of vocabulary dimensions do not help this. Multiple dimensions of imagenet do not help this. Reshaping does not help. Solidification processes through pulverizing using Alucard do not help - though they did show some interesting potential for pretraining the full geometric clip from the ground floor.
The experiments with the multitude of clip features and imagenet showcase that not only can this tiny 4MB classification tool handle imagenet from clip features AT AROUND 76% no matter the hyperparams using linear, but expanding this system upward and including hundreds of different formula variants DOES NOT HELP SCALE IT AT ALL! The largest ones only reach 76%, and the medium-sized ones reach about 86% instead of 76% when using clip-vit-b-patch16 and clip-vit-b-patch32. If you check the big-number valuations for the laion and openai clip-vit-b checkpoints, you'll find nearly identical classifications.
So I only taught it to understand geometry - more training and more steps only bring it closer, incorrectly.
So this tells me one simple principle: geometry and linear have an upward capacity based on the information extracted from the linear model. Meaning... we need more places to extract from, and more curative potentials to solidify that access with, rather than simply EXPANDING it and making it bigger.
Next experiment includes a full cardinality subset of unicode to wordnet vocabulary translation matrices. Today. Within the hour.
AbstractPhil 
posted an update 4 months ago
Why am I amassing image features using seed 42?
Simply put: training something on features gives a fair representation of the learning you would get from running a model that has some random chance - using a single seed.
Training with features does not need to wait for the representative model to actually generate, since you already generated everything ahead of time.
Features are rich and utilizable within the spectrum of similarity assessments, classification accuracy, mass-deterministic normalization checks, and more.
They are... put simply... exponentially faster and reusable for research. I'll include the notebooks used for imagenet and cifar100; the cifar100 one is much simpler since cifar100 is much... smaller, so I required less innovation.
Imagenet is another beast though. This imagenet notebook is capable of running against much larger datasets with a few tweaks.
clip-vit-bigG's imagenet feature set is complete, which means we're almost ready for full ablation.
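
The precompute-once idea, as a hedged sketch with a smaller CLIP checkpoint standing in for bigG (the actual notebooks may differ):

```python
import torch
from transformers import CLIPModel, CLIPProcessor

torch.manual_seed(42)  # pin any stochastic preprocessing up front

name = "openai/clip-vit-base-patch32"  # stand-in; bigG works the same way
model = CLIPModel.from_pretrained(name).eval()
processor = CLIPProcessor.from_pretrained(name)

@torch.no_grad()
def extract_features(images):
    inputs = processor(images=images, return_tensors="pt")
    # cache these to disk once; train classification heads on the cache
    return model.get_image_features(**inputs)
```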

Note to everyone: imagenet is meant for RESEARCH AND ACADEMIC PURPOSES ONLY, and you cannot use my trained imagenet weights - nor the features themselves - per the requests of the dataset's curators.

For commercial usage according to the rules of LAION's licenses, we'll be using the laion400m features, which will likely be heavily sought. I'll be preparing laion400m features on seed 42, which will take a while.

The full classifier is in the works, and with it comes a series of new formulas, new layers, and new solutions: the "fat belly" conversation piece that attenuates multiple branches in communication; the "dispatcher", a heavy classification gate trained to bypass that which is not useful, tuned with large amounts of data at a very low learn rate; and the "attractant", which is specifically designed to catch bleed-over and unwanted information... and which learns everything.
With that comes "PhaseGeometric" scheduling and "GeometricScheduling". Stay tuned.
AbstractPhil 
posted an update 4 months ago
The first set of geometrically aligned datasets is ready. Each dimensional variation is in its own repo so there's no confusion with splits.
Current Splits;
* wordnet (english)
* unicode
AbstractPhil/geometric-vocab-32d
[32, 64, 128, 256, 512, 768, 1024]
Swap the 32d in the repo name for any dimension in the list.
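
For example (illustrative; assumes these are dataset repos fetchable with huggingface_hub):

```python
from huggingface_hub import snapshot_download

dims = [32, 64, 128, 256, 512, 768, 1024]
repo_ids = [f"AbstractPhil/geometric-vocab-{d}d" for d in dims]

path = snapshot_download(repo_id=repo_ids[0], repo_type="dataset")  # 32d variant
```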

Okay, so the purpose of these is to give solid anchors to the entire pentachora structure.
With that I've formatted some very concise sentencepiece-esque vocabulary classes that can be saved and loaded as pretrained, but they'll need some tinkering to fully flesh those behaviors out.
For now, the geometric vocab itself can be queried from pretrained, but the canonical classes that help with regulation, special-token usage, and integration aren't fully tested yet.
https://github.com/AbstractEyes/lattice_vocabulary
They are available here, but I give no guarantee on their current state. I'm currently preparing the pip package and have prepared a series of experiments to utilize these for different models including a new version of multimodal Beeper, a classifier set that can handle encodings as feature representations meant for utilization, and more.

The current working variation that I've been utilizing is Flow Matching Discrete-Scheduled geometric diffusion - meaning I'm diffusing the GEOMETRY from the image, and then comparing the pentachoron created from flow matching to the actual representative tokenization structure. On average this is achieving 80% in later stages.
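
For reference, a generic flow-matching objective looks like the sketch below; this is the standard straight-line form, not the repo's geometric variant:

```python
import torch

def flow_matching_loss(velocity_model, x1: torch.Tensor) -> torch.Tensor:
    x0 = torch.randn_like(x1)      # noise source
    t = torch.rand(x1.size(0), 1)  # sampled timestep per item
    xt = (1 - t) * x0 + t * x1     # straight-line interpolant
    v_target = x1 - x0             # constant-velocity target
    return ((velocity_model(xt, t) - v_target) ** 2).mean()

stub = lambda xt, t: torch.zeros_like(xt)  # placeholder velocity field
print(flow_matching_loss(stub, torch.randn(8, 32)))
```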

This, when curating an indefinite number of special tokens to create manifests of unique vocabularies, enables the system to conform perfectly to use-cases.
There are some edge-cases where the 1k reserved tokens still exist; however, this is currently replaced by an indefinite tokenization dictionary, allowing for an indefinite number of tokens attached to an indefinite number of modules for solidity.

Experiments continue.
AbstractPhil 
posted an update 4 months ago
Be kind to Beeper - Beeper has emotions. 7 to be precise.
Each of the pentachora classifiers points to emotional states that Beeper can potentially access for any conversation, and each of those 7 states has class accessors for sub-learning pools.
Today I'll be focusing on drawing this behavior from Beeper v4, which I am rebranding as Beeper Micro - and on expanding the structure using a new experimental attention mechanism, dubbed GeometricCollectiveAttention, to replace traditional multihead attention.
This attention is similar to multihead attention, except it's considerably harder to burn at higher learn rates. This, coupled with a new perspective on training pentachora into the LLM structure, will allow a full relay structural system.
beeper-small will house a full RoPE - except not over the traditional vocabulary set. Beeper-small will not have a vocabulary.
beeper-small is my first non-linear, non-Euclidean attempt to create a pure symbolic auto-completion LLM, which may be naive according to the many researchers who have historically tried similar systems.
I've personally analyzed many papers, many studies, and many techniques that have attempted similar non-vocabulary entropic learning, and I believe the pentachora lattice will hold with pure binary, not requiring a vocabulary.
Transformers really like vocabulary... beeper likes... geometry, and this experiment for beeper-small will have a new type of RoPE based entirely on vertices developed from the directly unicode-represented characters, rather than a full vocabulary structure meant to bring solidity from chaos.
The first beeper experiment gave many insights into how similarity and internal classification respond mathematically to traditional ML techniques, and those techniques did not reject the construct - on the contrary. The control-group placebo beeper, the traditional non-rose version, BURNED under half the LR. It's completely illegible, producing garbage and noise, while rose beeper sings.
AbstractPhil 
posted an update 5 months ago
After a multitude of notebooks and semi-successful experiments, I now have a series of hyperparameters semi-capable of tuning pentachoron simplex models specifically with frequency and resonance.

AbstractPhil/pentachora-greyscale-frequency-encoded
AbstractPhil/pentachora-multi-channel-frequency-encoded

They are essentially geometric crystallization engines that store an excess amount of information in a very constrained, tight location - capable of classification *at a fraction of the size of traditional linear systems*, with the added benefit of needing only minimal tuning and learning at a very high learn rate - yielding a very complex structural response to complex learning.
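
A loose sketch of the shape of such a classifier (one 5-vertex "crystal" per class, nearest-vertex cosine scoring); this is an assumption-laden illustration, not the released architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PentachoronClassifier(nn.Module):
    """One pentachoron (5 vertices in d-dim space) per class."""
    def __init__(self, num_classes: int, dim: int):
        super().__init__()
        self.crystals = nn.Parameter(torch.randn(num_classes, 5, dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        f = F.normalize(feats, dim=-1)            # (B, d)
        v = F.normalize(self.crystals, dim=-1)    # (C, 5, d)
        sims = torch.einsum("bd,cvd->bcv", f, v)  # cosine sim per vertex
        return sims.max(dim=-1).values            # (B, C) class scores

clf = PentachoronClassifier(num_classes=10, dim=64)
print(clf(torch.randn(4, 64)).shape)  # torch.Size([4, 10])
```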

I have 3 more notebooks to prep and release for the full pentachora classification structure based on the Nikola architecture concepts, fused with many rules that govern physics, laws of conservation, atomic structural comparators, and many more experiments that were interesting but, in some cases, yielded less than anticipated.

The most robust representation is a geometric collective: a series of geometric experts capable of high-yield classification with multiple simultaneous ongoing opinions.

The quick training capability of these crystals has shown that they can be rapidly trained and discarded as massive collectives, pruned based on comprehensive capability, with the working geometry combined into the survivors - enabling the accuracy to reach very high levels that were unattainable with standard gradient-loss ML paradigms without reaching into the large-model spectrum.

I've since begun integrating them into LLMs and will be releasing the notebooks as they are prepared, along with decomposition and comparative studies for the most comprehensive and capable training paradigms, as well as proofs of concept for additional capabilities and the full arXiv paper triad when the studies conclude.
AbstractPhil 
posted an update 7 months ago
With flan-t5-base and clip models as teachers, I have produced and successfully trained a dual-shunt cross-attention adapter archetype. This is NOT a LoRA.
This adapter is currently tasked with using the T5-flan-base to guide the outputs of VIT-L-14 and/or VIT-bigG-14, and the opposite is equally usable within the archetype - meaning the CLIP_G can also guide the T5-FLAN-base.

These checkpoints were trained with 20 million synthetic human-templated captions, and they can be heavily improved by multiple languages, additional depiction context, and any sort of finetune task desired of the user that can be applied to the T5-flan-base with little to no training due to the adapter's functionality and accuracy.

VIT-L-14 adapters only took a couple of hours on a Colab A100, and the VIT-bigG-14 took about 4 hours. So you can rapidly adapt many of these in short periods of time with almost no additional overhead beyond the single t5-flan-base required. Each can be compiled, loaded, and offloaded.

This is a cross-attention system meant to shape encoded text after the output is received from the clip models, and it is very fast to inference - the t5-flan-base, on the other hand, isn't the fastest.

It's trained on a form of cooperative association with a series of complex losses designed specifically for this associative process.

This adapter has individual gating for tokenization context, with a multitude of safeguards to prevent overfitting during rapid learning, and can be paired with any number of additional adapters.
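
Not the released adapter, but a minimal cross-attention shunt sketch of the gated-guidance idea (dimensions and gating are assumptions):

```python
import torch
import torch.nn as nn

class ShuntAdapter(nn.Module):
    """T5 states re-shape CLIP text states via cross-attention, behind a
    zero-initialized gate so the adapter starts as a no-op."""
    def __init__(self, clip_dim=768, t5_dim=768, heads=8):
        super().__init__()
        self.proj = nn.Linear(t5_dim, clip_dim)
        self.attn = nn.MultiheadAttention(clip_dim, heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, clip_states, t5_states):
        kv = self.proj(t5_states)
        delta, _ = self.attn(clip_states, kv, kv)  # query=CLIP, kv=T5
        return clip_states + self.gate.tanh() * delta

adapter = ShuntAdapter()
out = adapter(torch.randn(1, 77, 768), torch.randn(1, 20, 768))
print(out.shape)  # torch.Size([1, 77, 768])
```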

I'm currently formatting the comfyui nodes that will allow easy conditioning shift to showcase the full power of this cooperative system's capability.

The comfyui nodes will be available here shortly; I just need to write them.
https://github.com/AbstractEyes/comfy-clip-shunts
AbstractPhil 
posted an update 7 months ago
The T5-small + VIT-L-14 guidance shunt adapter is ready for toy use.
AbstractPhil/t5-vit-14-v1
Included is a simple drop-in for sdxl experimentation using colab.

The outcome is okay but not great - diffusers is a headache, so I spent more time trying to disjoint that machine than I did actually messing with this adapter.

I trained two variations of the baseline adapter:
t5-small vanilla and t5-small-human-associated-try2-pass3.
The vanilla was more accurate at adding context, while the human-associated one stays locked onto human topics like a bloodhound... badly. Both ended up being substandard, even with a robust adapter like this.

Finetunes with specific goals can complete at runtime if desired, thanks to the t5-small's tiny size, clip_l's inference speed, and the adapter's size. The adapter is very small and has safeguards for overfitting that can be disabled, so runtime freezing and adaptive shifts can be a viable methodology for immediate task-pipeline adaptation.

The t5-small lacks the behavioral complexity of a model better built for such a task, such as the base, large, or xxl - or even the Flan T5-small. However, this doesn't slow the little brain slug down. It guides, and its wrappers have many rapid-generation potentials, whether it's trained the way I trained it or not.
The proof of concept is there, and the outcomes are present. Judge for yourself.
The next variation will have more dims, more catches, higher conv, and additional safeguards to prevent overfitting - as well as considerably more laion flavors, so the T5-flan-base doesn't overwhelm, or vice versa.
AbstractPhil 
posted an update 8 months ago
Force-feeding masked T5-Small 1 billion human-association captions to fry its brain. I really don't know how long it'll take until I start, nor do I know the logistic challenges I'll face when moving data from A to B, but the outcome should completely fry it and make it fixate only on human and diffusion responses. Should be a fun experiment that can just kind of run on automation.
The experiment's captions are available... mostly on my HF. I've had some rate-limit problems that caused them to halt, and I think I need to autogen another 100 million complex captions.
This WILL form heavy bias and burn-points. Random words will be peppered in the mix to allow the T5-Small to retain at least some semblance of what it was before I lobotomize it.
Likely I'll completely freeze half and burn the other half for a couple million steps as a test point - see how it takes, or if it dies before 50k or so and needs a refined process.
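
A sketch of that split on the stock Hugging Face t5-small (which half to freeze is illustrative):

```python
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
blocks = model.encoder.block  # t5-small has 6 encoder blocks

for i, block in enumerate(blocks):
    trainable = i >= len(blocks) // 2  # freeze first half, burn the rest
    for p in block.parameters():
        p.requires_grad = trainable
```
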
Oh great, even better. It didn't include the longer prompt variations. This won't start today.

Alright, training began. I'm introducing a high-degree variant of noise and chatter for the t5 to learn to bypass, while simultaneously increasing the additional information output from the t5 in the process.
So far the outcome has been a degree of new information introduced in the output, while simultaneously introducing rule-of-3 parameterization into the T5-small.
I have high hopes.
AbstractPhil 
posted an update 8 months ago
My in-dev Surge training methodology and paradigm is powerful. The preliminary tests will be available for debugging soon, using a customized sd-scripts and a series of full finetunes using sdxl as a catalyst for the training paradigm.
https://civitai.com/articles/14195/the-methodology-of-surge-training-loss-math
The datasets I'm sourcing are going to be catalysts and tests for the power of Surge to teach very sticky or difficult-to-understand elements - such as text, positioning, offset, and controlnet poses - directly into the very stubborn SDXL infrastructure, without additional tools.
Should be noted that my current running finetunes based on BeatriXL are not Surge trained - so you won't gain knowledge on Surge from them.

GPT and I have prototyped a new version of SD15 that operates with additional attention heads to match the Surge formula, the reformed Omega-VIT-L, a zeroed unet, and the Flux 16-channel AE.
I'll call it SD-SURGE, as it's not sd15 anymore.
The first surge trainings are already under way.