[Solar-talk] validation - unicode?
Paul M Jones
pmjones at solarphp.com
Sat Jul 22 09:20:06 PDT 2006
On Jul 22, 2006, at 10:42 AM, Jeff Surgeson wrote:
> Between the 100 - 500 and the 35 - 60 are these extended minus
> signs. I can display them and copy them, but can't type them, as they
> do not exist on my keyboard layout. As far as I know it has something
> to do with Windows locality options. My col scheme is standard: it
> validates for length (I am learning), accepts only letters/numbers
> and punctuation, and won't allow blank.
>
> $this->_col['title'] = array(
>     'type'    => 'varchar',
>     'size'    => 64,
>     'require' => true,
>     'valid'   => array('multiple', 'VALID_TEXT_64',
>         array(
>             array('maxLength', 64),
>             array('regex', '/^[[:alpha:][:digit:][:punct:]\s]+$/', 'NOT_BLANK'),
>         )
>     ),
> );
>
> This all works fine until someone enters a strange character, which
> is going to happen.
>
> How do I deal with it as far as Solar is concerned?
Some things to try:
1. At the form level, you can add a 'filter' rule that strips any
character that is not alpha/digit/punct/space. See the docs for
Solar_Filter, and the related GettingStarted page in the Solar_Form
docs.
http://solarphp.com/index.php/docs/read/Solar_Filter
http://solarphp.com/index.php/docs/read/Solar_Form/GettingStarted
2. At the table level, you may wish to redo your validation regex so
that it allows for those characters using hex escapes: \xHH for a
single byte or, with the /u modifier, \x{HHHH} for a Unicode code
point (based on the character code ranges; I have no idea what they
are). Alternatively, if the field is a free-form entry line, you
might want to take the regex validation off entirely and leave only
the basic checks (maxLength, notBlank).
3. In your table-level regex, try using the undocumented /u pattern
modifier for Unicode, which may cause [:punct:] to recognize the
additional high-code punctuation marks (don't know if that'll work
for sure, but it's a thought). See this comment for more info:
http://us3.php.net/manual/en/reference.pcre.pattern.modifiers.php#54805
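A minimal sketch of the three options in plain PHP, not Solar's own
filter or validation API. The function names strip_unexpected() and
is_valid_title() are hypothetical, and the code assumes the input is
UTF-8 and that the mbstring extension is loaded. Instead of relying on
[:punct:] picking up the extra characters under /u, it uses the Unicode
property classes \p{L}, \p{N}, \p{P} and \p{S}, which PCRE supports
when the /u modifier is set.

```php
<?php
// Hypothetical helpers for illustration only; not part of Solar.

// Option 1: strip anything that is not a letter, digit, punctuation
// mark, symbol, or whitespace. \p{S} is included because ASCII
// [:punct:] covers characters such as $ and +, which Unicode
// classifies as symbols rather than punctuation.
function strip_unexpected($text)
{
    $clean = preg_replace('/[^\p{L}\p{N}\p{P}\p{S}\s]+/u', '', $text);
    // preg_replace() returns null on error (e.g. invalid UTF-8).
    return $clean === null ? '' : $clean;
}

// Options 2 and 3: validate with a Unicode-aware pattern, counting
// length in characters rather than bytes.
function is_valid_title($text, $max = 64)
{
    if (mb_strlen($text, 'UTF-8') > $max) {
        return false;
    }
    // The + quantifier also rejects the empty string.
    return preg_match('/^[\p{L}\p{N}\p{P}\p{S}\s]+$/u', $text) === 1;
}
```

With this, a title such as "100 \xE2\x80\x93 500" (an en dash encoded
as UTF-8) passes validation, while control characters are removed by
the stripping filter.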
> What worries me was that I noticed that you make use of very little
> validation except where it is a predictable entry like an
> email/ipv4/single or multi words, numbers only. But if it is a large
> block of unknown entry like a description, you just type it as
> "clob" with no validation.
>
> Is this because of the charset problems and the problem I am having?
Kind of. If a field is free-form, then you can't really validate it;
there's no pattern for it to match against. CLOBs in particular are
like that.
> Rodrigo's post talks about "convert all existent locale files to
> UTF-8". What exactly do you mean? I know this has nothing to do with
> form validation, but as far as I can make out it has to do with
> unicode/utf-8/charsets etc.
When I was using BBEdit, it defaulted to a charset other than UTF-8
(iso-latin-8, maybe? don't recall). That meant some high-code
characters weren't being saved properly, and were distributed in
their corrupted forms. Rodrigo moved them all to UTF-8 and now that
I use TextMate (which is UTF-8 by default) it's no longer an issue.
Does that help?
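For illustration, a conversion like that can be sketched in PHP as
follows. The helper name to_utf8() is hypothetical, the source charset
ISO-8859-1 is only an assumption (the original charset is not known for
certain), and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper; assumes the legacy text is ISO-8859-1 unless
// another source charset is passed in.
function to_utf8($bytes, $from = 'ISO-8859-1')
{
    // Leave the text alone if it is already valid UTF-8, so that
    // running the conversion twice does not double-encode it.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $from);
}
```

The validity check is a heuristic: a legacy file that happens to be
valid UTF-8 byte-for-byte would be left untouched.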
> My limited knowledge tells me that the change to unicode solves all
> the different locale / character code problems, not so?
So far as I can tell, this is the case, yes.
Does this help at all, or have I missed an important point in your
questions?
--
Paul M. Jones <http://paul-m-jones.com>
Solar: Simple Object Library and Application Repository
for PHP5. <http://solarphp.com>
Savant: The simple, elegant, and powerful solution for
templates in PHP. <http://phpsavant.com>